From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from h2.hallyn.com ([78.46.35.8]:57914 "EHLO mail.hallyn.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933569AbeGCQyv (ORCPT ); Tue, 3 Jul 2018 12:54:51 -0400 Date: Tue, 3 Jul 2018 11:54:50 -0500 From: "Serge E. Hallyn" To: James Bottomley Cc: Amir Goldstein , linux-fsdevel , Seth Forshee , Linux Containers , Tyler Hicks , Christian Brauner Subject: Re: shiftfs status and future development Message-ID: <20180703165450.GB22894@mail.hallyn.com> References: <20180614184448.GC30028@ubuntu-xps13> <20180615135638.GA29299@mail.hallyn.com> <20180615145917.GF30028@ubuntu-xps13> <1529118185.4048.46.camel@HansenPartnership.com> <20180618134032.GP30028@ubuntu-xps13> <1529333819.4021.4.camel@HansenPartnership.com> <1530085696.4243.5.camel@HansenPartnership.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1530085696.4243.5.camel@HansenPartnership.com> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Quoting James Bottomley (James.Bottomley@HansenPartnership.com): > On Mon, 2018-06-18 at 20:11 +0300, Amir Goldstein wrote: > > On Mon, Jun 18, 2018 at 5:56 PM, James Bottomley > > wrote: > > [...] > > > > > > �- Does not break inotify > > > > > > > > > > I don't expect it does, but I haven't checked. > > > > > > > > I haven't checked either; I'm planning to do so soon. This is a > > > > concern that was expressed to me by others, I think because > > > > inotify doesn't work with overlayfs. > > > > > > I think shiftfs does work simply because it doesn't really do > > > overlays, so lots of stuff that doesn't work with overlays does > > > work with it. > > > > > > > I'm afraid shiftfs suffers from the same problems that the old naiive > > overlayfs inode implementation suffered from. > > > > This problem is demonstrated with LTP tests inotify08 inotify09. > > shiftfs_new_inode() is called on every lookup, so inotify watch > > may be set on an inode object, then dentry is evicted from cache > > and then all events on new dentry are not reported on the watched > > inode. You will need to implement hashed inodes to solve it. > > Can be done as overlay does - hashing by real inode pointer. > > > > This is just one of those subtle things about stacked fs and there > > may be other in present and more in future - if we don't have a > > shared code base for the two stacked fs, I wager you are going to end > > up "cherry picking" fixes often. > > > > IMO, an important question to ask is, since both shiftfs and > > overlayfs are strongly coupled with container use cases, are there > > users that are interested in both layering AND shifting? on the same > > "mark"? If the answer is yes, then this may be an argument in favor > > of integrating at least some of shittfs functionality into overlayfs. > > My container use case is interested in shifting but not layering. Even > the docker use case would only mix the two with the overlay graph > driver. There seem to be quite a few clouds using non overlayfs graph > drivers (the dm one being the most popular). > > > Another argument is that shiftfs itself takes the maximum allowed > > 2 levels of s_stack_depth for it's 2 mounts, so it is actually not > > possible with current VFS limitation to combine shiftfs with > > overlayfs. > > That's an artificial, not an inherent, restriction that was introduced > to keep the call stack small. It can be increased or even eliminated > (although then we'd risk a real run off the end of the kernel stack > problem). > > > This could be solved relatively easily by adding "-o mark" support > > to overlayfs and allowing to mount shiftfs also over "marked" > > overlayfs inside container. > > Can we please decided whether the temporary mark, as implemented in the > current patch set or a more permanent security. xattr type > mark is preferred for this? It's an important question that's been > asked, but we have no resolution on. I think something permanent is mandatory. Otherwise users may be able to induce a reboot into a state where the temp mark isn't made. A security. xattr has the problem that an older kernel may not know about it. Two possibilities which have been mentioned before: 1. just demand that the *source* idmap doesn't start at 0. Ideally it would be something like 100k uids starting at 100k. The kernel would refuse to do a shiftfs mount if the source idmap includes uid 0. I suppose the "let-them-shoot-themselves-in-the-foot" approach would be to just strongly recommend using such a source uid mapping, but not enforce it. 2. Enforce that the base directory have perms 700 for shiftfs-mount to be allowed. So /var/lib/shiftfs/base-rootfs might be root-owned with 700 perms. Root then can shiftfs-mount it to /container1 uid-shifted to 100000:0:100000. Yes, root could stupidly change the perms later, but failing that uid 1000 can never exploit a setuid-root script under /var/lib/shiftfs/base-rootfs -serge