From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from bedivere.hansenpartnership.com ([66.63.167.143]:55536
	"EHLO bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752095AbeF0HsW (ORCPT );
	Wed, 27 Jun 2018 03:48:22 -0400
Message-ID: <1530085696.4243.5.camel@HansenPartnership.com>
Subject: Re: shiftfs status and future development
From: James Bottomley
To: Amir Goldstein
Cc: Seth Forshee, linux-fsdevel, Tyler Hicks, Linux Containers,
	Christian Brauner
Date: Wed, 27 Jun 2018 15:48:16 +0800
In-Reply-To:
References: <20180614184448.GC30028@ubuntu-xps13>
	<20180615135638.GA29299@mail.hallyn.com>
	<20180615145917.GF30028@ubuntu-xps13>
	<1529118185.4048.46.camel@HansenPartnership.com>
	<20180618134032.GP30028@ubuntu-xps13>
	<1529333819.4021.4.camel@HansenPartnership.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Mon, 2018-06-18 at 20:11 +0300, Amir Goldstein wrote:
> On Mon, Jun 18, 2018 at 5:56 PM, James Bottomley wrote:
> [...]
> > > > >  - Does not break inotify
> > > >
> > > > I don't expect it does, but I haven't checked.
> > >
> > > I haven't checked either; I'm planning to do so soon. This is a
> > > concern that was expressed to me by others, I think because
> > > inotify doesn't work with overlayfs.
> >
> > I think shiftfs does work simply because it doesn't really do
> > overlays, so lots of stuff that doesn't work with overlays does
> > work with it.
> >
>
> I'm afraid shiftfs suffers from the same problems that the old naive
> overlayfs inode implementation suffered from.
>
> This problem is demonstrated with LTP tests inotify08 inotify09.
> shiftfs_new_inode() is called on every lookup, so inotify watch
> may be set on an inode object, then dentry is evicted from cache
> and then all events on new dentry are not reported on the watched
> inode. You will need to implement hashed inodes to solve it.
> Can be done as overlay does - hashing by real inode pointer.
>
> This is just one of those subtle things about stacked fs and there
> may be others at present and more in future - if we don't have a
> shared code base for the two stacked fs, I wager you are going to end
> up "cherry picking" fixes often.
>
> IMO, an important question to ask is, since both shiftfs and
> overlayfs are strongly coupled with container use cases, are there
> users that are interested in both layering AND shifting? on the same
> "mark"? If the answer is yes, then this may be an argument in favor
> of integrating at least some of shiftfs functionality into overlayfs.

My container use case is interested in shifting but not layering. Even
the docker use case would only mix the two with the overlay graph
driver. There seem to be quite a few clouds using non-overlayfs graph
drivers (the dm one being the most popular).

> Another argument is that shiftfs itself takes the maximum allowed
> 2 levels of s_stack_depth for its 2 mounts, so it is actually not
> possible with current VFS limitation to combine shiftfs with
> overlayfs.

That's an artificial, not an inherent, restriction that was introduced
to keep the call stack small. It can be increased or even eliminated
(although then we'd risk a real run off the end of the kernel stack
problem).

> This could be solved relatively easily by adding "-o mark" support
> to overlayfs and allowing to mount shiftfs also over "marked"
> overlayfs inside container.

Can we please decide whether the temporary mark, as implemented in the
current patch set, or a more permanent security. xattr type mark is
preferred for this? It's an important question that's been asked, but
one we have no resolution on.

> Anyway, I'm just playing devil's advocate to the idea of two stacked
> fs implementation, so presenting this point of view. I am fully aware
> that there are also plenty of disadvantages to couple two unrelated
> functionalities together.

The biggest one seems to be that the points at which overlayfs and
shiftfs do credential shifting are subtly different. That's not to say
they can't be unified, but there's some work to do to prove it's
possible.

James
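
To make the inotify problem quoted above concrete, here is a minimal userspace sketch. It is a toy model only, not shiftfs or overlayfs code: the class and function names (RealInode, StackedInode, NaiveStackedFS, HashedStackedFS, events_seen_after_relookup) are all hypothetical, and a Python dict keyed by object identity stands in for "hashing by real inode pointer". In the kernel, a lookup of this kind would presumably go through the inode hash (for example iget5_locked()), but that is an assumption about how a fix might look, not something taken from the patch set.

```python
class RealInode:
    """Stands in for an inode of the underlying filesystem (hypothetical)."""

    def __init__(self, ino):
        self.ino = ino


class StackedInode:
    """Stands in for the stacked filesystem's own inode wrapping a real one."""

    def __init__(self, real):
        self.real = real
        self.watchers = []  # callbacks registered by a toy "watch"

    def notify(self, event):
        # Deliver the event to every watch registered on this inode object.
        for callback in self.watchers:
            callback(event)


class NaiveStackedFS:
    """Allocates a fresh stacked inode on every lookup (the problem case)."""

    def lookup(self, real):
        return StackedInode(real)


class HashedStackedFS:
    """Keeps one stacked inode per real inode, keyed by the real object."""

    def __init__(self):
        # Plain objects hash and compare by identity, so this dict behaves
        # like a table keyed by the real inode pointer.
        self._by_real = {}

    def lookup(self, real):
        inode = self._by_real.get(real)
        if inode is None:
            inode = StackedInode(real)
            self._by_real[real] = inode
        return inode


def events_seen_after_relookup(fs):
    """Set a watch, force a second lookup, then report which events arrived."""
    real = RealInode(42)
    seen = []
    fs.lookup(real).watchers.append(seen.append)  # watch set on first lookup
    # Simulate dentry eviction: the next access has to look the inode up again.
    fs.lookup(real).notify("IN_MODIFY")
    return seen


if __name__ == "__main__":
    print("naive: ", events_seen_after_relookup(NaiveStackedFS()))
    print("hashed:", events_seen_after_relookup(HashedStackedFS()))
```

In the naive variant the second lookup hands back a different object than the one carrying the watch, so the event is never delivered; in the hashed variant both lookups resolve to the same object and the watch fires.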