* Unionmount status? @ 2011-04-12 15:00 Michal Suchanek 2011-04-12 20:31 ` Ric Wheeler 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-12 15:00 UTC (permalink / raw) To: linux-fsdevel, linux-kernel Hello, as some already know the Unionmount VFS union which has been in development for some years now is the only True Union (TM) that can be accepted into the kernel mainline by the VFS maintainers (for reasons of their own which you can surely find if you search the web or ask them directly). The current UnionMount version that can be found here: http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works works for me as good as aufs does. That is I can build a live CD using this unioning solution and it boots and runs without any apparent issues. There are probably many possible uses of the union which I did not test nor did I test long term stability of using the unioned filesystem. As far as ephemeral live systems go it works fine for me, though. The issue is that while the code is (nearly) finished it is not yet merged into mainline and as I am not familiar with the details of ever-changing Linux VFS layer forward-porting this code to current kernels is somewhat challenging. What is the plan with unionmount now? What is required for it to be merged into mainline? Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-12 15:00 Unionmount status? Michal Suchanek @ 2011-04-12 20:31 ` Ric Wheeler 2011-04-12 21:36 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Ric Wheeler @ 2011-04-12 20:31 UTC (permalink / raw) To: Michal Suchanek Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, miklos, Christoph Hellwig On 04/12/2011 11:00 AM, Michal Suchanek wrote: > Hello, > > as some already know the Unionmount VFS union which has been in > development for some years now is the only True Union (TM) that can be > accepted into the kernel mainline by the VFS maintainers (for reasons > of their own which you can surely find if you search the web or ask > them directly). > > The current UnionMount version that can be found here: > > http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works > > works for me as good as aufs does. That is I can build a live CD using > this unioning solution and it boots and runs without any apparent > issues. > > There are probably many possible uses of the union which I did not > test nor did I test long term stability of using the unioned > filesystem. As far as ephemeral live systems go it works fine for me, > though. > > The issue is that while the code is (nearly) finished it is not yet > merged into mainline and as I am not familiar with the details of > ever-changing Linux VFS layer forward-porting this code to current > kernels is somewhat challenging. > > What is the plan with unionmount now? > > What is required for it to be merged into mainline? > > Thanks > > Michal Hi Michal, People are actively looking to see what union mount (or overlayfs) solution to pursue. Val has shifted her focus away from kernel hacking these days, but did refresh her patch set in the last month or so. Miklos has an overlay file system that has also been posted upstream. I think that testing like you did and getting more eyes to look at the code is the next key step for both projects. Thanks! Ric ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-12 20:31 ` Ric Wheeler @ 2011-04-12 21:36 ` Michal Suchanek 2011-04-13 14:18 ` Jiri Kosina 2011-04-13 17:26 ` Ric Wheeler 0 siblings, 2 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-12 21:36 UTC (permalink / raw) To: Ric Wheeler Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, miklos, Christoph Hellwig On 12 April 2011 22:31, Ric Wheeler <ricwheeler@gmail.com> wrote: > On 04/12/2011 11:00 AM, Michal Suchanek wrote: >> >> Hello, >> >> as some already know the Unionmount VFS union which has been in >> development for some years now is the only True Union (TM) that can be >> accepted into the kernel mainline by the VFS maintainers (for reasons >> of their own which you can surely find if you search the web or ask >> them directly). >> >> The current UnionMount version that can be found here: >> >> >> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works >> >> works for me as good as aufs does. That is I can build a live CD using >> this unioning solution and it boots and runs without any apparent >> issues. >> >> There are probably many possible uses of the union which I did not >> test nor did I test long term stability of using the unioned >> filesystem. As far as ephemeral live systems go it works fine for me, >> though. >> >> The issue is that while the code is (nearly) finished it is not yet >> merged into mainline and as I am not familiar with the details of >> ever-changing Linux VFS layer forward-porting this code to current >> kernels is somewhat challenging. >> >> What is the plan with unionmount now? >> >> What is required for it to be merged into mainline? >> >> Thanks >> >> Michal > > Hi Michal, > > People are actively looking to see what union mount (or overlayfs) solution > to pursue. Val has shifted her focus away from kernel hacking these days, > but did refresh her patch set in the last month or so. I am not aware of such refreshed patch set, at least it is not published in her repo. > > Miklos has an overlay file system that has also been posted upstream. I have no idea about his other overlay filesystem either. > > I think that testing like you did and getting more eyes to look at the code > is the next key step for both projects. > Could you provide more details? I am not following the Linux fs development closely. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-12 21:36 ` Michal Suchanek @ 2011-04-13 14:18 ` Jiri Kosina 2011-04-13 15:13 ` Michal Suchanek 2011-04-13 17:26 ` Ric Wheeler 1 sibling, 1 reply; 81+ messages in thread From: Jiri Kosina @ 2011-04-13 14:18 UTC (permalink / raw) To: Michal Suchanek Cc: Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, miklos, Christoph Hellwig On Tue, 12 Apr 2011, Michal Suchanek wrote: > > Miklos has an overlay file system that has also been posted upstream. > > I have no idea about his other overlay filesystem either. https://lkml.org/lkml/2011/3/22/222 -- Jiri Kosina SUSE Labs, Novell Inc. ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-13 14:18 ` Jiri Kosina @ 2011-04-13 15:13 ` Michal Suchanek 2011-04-14 8:38 ` Miklos Szeredi 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-13 15:13 UTC (permalink / raw) To: Jiri Kosina Cc: Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, miklos, Christoph Hellwig On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote: > On Tue, 12 Apr 2011, Michal Suchanek wrote: > >> > Miklos has an overlay file system that has also been posted upstream. >> >> I have no idea about his other overlay filesystem either. > > https://lkml.org/lkml/2011/3/22/222 > Thanks for pointing out this announcement. However, neither this announcement nor the document Documentation/filesystems/overlayfs.txt included in the git tree details how to mount the filesystem. Without that it is kind of useless. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-13 15:13 ` Michal Suchanek @ 2011-04-14 8:38 ` Miklos Szeredi 2011-04-14 9:48 ` Sedat Dilek 2011-04-15 11:22 ` Michal Suchanek 0 siblings, 2 replies; 81+ messages in thread From: Miklos Szeredi @ 2011-04-14 8:38 UTC (permalink / raw) To: Michal Suchanek Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote: > On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote: >> https://lkml.org/lkml/2011/3/22/222 >> > > Thanks for pointing out this announcement. > > However, neither this announcement nor the document > Documentation/filesystems/overlayfs.txt included in the git tree > details how to mount the filesystem. Without that it is kind of > useless. Oh, I'll fix that in the docs. Thanks for pointing it out. Here's how to mount: mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-14 8:38 ` Miklos Szeredi @ 2011-04-14 9:48 ` Sedat Dilek 2011-04-14 9:58 ` Miklos Szeredi 2011-04-15 11:22 ` Michal Suchanek 1 sibling, 1 reply; 81+ messages in thread From: Sedat Dilek @ 2011-04-14 9:48 UTC (permalink / raw) To: Miklos Szeredi Cc: Michal Suchanek, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On Thu, Apr 14, 2011 at 10:38 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >> On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote: >>> https://lkml.org/lkml/2011/3/22/222 >>> >> >> Thanks for pointing out this announcement. >> >> However, neither this announcement nor the document >> Documentation/filesystems/overlayfs.txt included in the git tree >> details how to mount the filesystem. Without that it is kind of >> useless. > > Oh, I'll fix that in the docs. Thanks for pointing it out. > > Here's how to mount: > > mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay > > Thanks, > Miklos > Hi Miklos, did you stop working on OverlayFS (no hear no read since 3 weeks :-))? You made some post-v7-patchset patches, but did not publish a v8. Also, Linus requested a finer-grained split of the big overlayfs.c file like in the other FS. What's the status on this? Regards, - Sedat - ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-14 9:48 ` Sedat Dilek @ 2011-04-14 9:58 ` Miklos Szeredi 0 siblings, 0 replies; 81+ messages in thread From: Miklos Szeredi @ 2011-04-14 9:58 UTC (permalink / raw) To: sedat.dilek Cc: Sedat Dilek, Michal Suchanek, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On Thu, Apr 14, 2011 at 11:48 AM, Sedat Dilek <sedat.dilek@googlemail.com> wrote: > did you stop working on OverlayFS (no hear no read since 3 weeks :-))? > You made some post-v7-patchset patches, but did not publish a v8. > > Also, Linus requested a finer-grained split of the big overlayfs.c > file like in the other FS. > What's the status on this? I'm working on that. Will post patches shortly. Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-14 8:38 ` Miklos Szeredi 2011-04-14 9:48 ` Sedat Dilek @ 2011-04-15 11:22 ` Michal Suchanek 2011-04-15 11:31 ` Miklos Szeredi 1 sibling, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-15 11:22 UTC (permalink / raw) To: Miklos Szeredi Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 14 April 2011 10:38, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >> On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote: >>> https://lkml.org/lkml/2011/3/22/222 >>> >> >> Thanks for pointing out this announcement. >> >> However, neither this announcement nor the document >> Documentation/filesystems/overlayfs.txt included in the git tree >> details how to mount the filesystem. Without that it is kind of >> useless. > > Oh, I'll fix that in the docs. Thanks for pointing it out. > > Here's how to mount: > > mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay > OK, I built a small live CD with the v7 patch and I can boot it but I get errors like Cleaning up temporary files... [ nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd' find: cannot delete `./motd': Operation not supported Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 11:22 ` Michal Suchanek @ 2011-04-15 11:31 ` Miklos Szeredi 2011-04-15 11:51 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Miklos Szeredi @ 2011-04-15 11:31 UTC (permalink / raw) To: Michal Suchanek Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote: > On 14 April 2011 10:38, Miklos Szeredi <miklos@szeredi.hu> wrote: >> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >>> On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote: >>>> https://lkml.org/lkml/2011/3/22/222 >>>> >>> >>> Thanks for pointing out this announcement. >>> >>> However, neither this announcement nor the document >>> Documentation/filesystems/overlayfs.txt included in the git tree >>> details how to mount the filesystem. Without that it is kind of >>> useless. >> >> Oh, I'll fix that in the docs. Thanks for pointing it out. >> >> Here's how to mount: >> >> mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay >> > > OK, I built a small live CD with the v7 patch and I can boot it but I > get errors like > > Cleaning up temporary files... > [ nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd' > find: cannot delete `./motd': Operation not supported What is the upper filesystem type? Is xattr support enabled? Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 11:31 ` Miklos Szeredi @ 2011-04-15 11:51 ` Michal Suchanek 2011-04-15 12:29 ` Miklos Szeredi 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-15 11:51 UTC (permalink / raw) To: Miklos Szeredi Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >> On 14 April 2011 10:38, Miklos Szeredi <miklos@szeredi.hu> wrote: >>> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >>>> On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote: >>>>> https://lkml.org/lkml/2011/3/22/222 >>>>> >>>> >>>> Thanks for pointing out this announcement. >>>> >>>> However, neither this announcement nor the document >>>> Documentation/filesystems/overlayfs.txt included in the git tree >>>> details how to mount the filesystem. Without that it is kind of >>>> useless. >>> >>> Oh, I'll fix that in the docs. Thanks for pointing it out. >>> >>> Here's how to mount: >>> >>> mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay >>> >> >> OK, I built a small live CD with the v7 patch and I can boot it but I >> get errors like >> >> Cleaning up temporary files... >> [ nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd' >> find: cannot delete `./motd': Operation not supported > > What is the upper filesystem type? Is xattr support enabled? > The upper filesystem is tmpfs and there is not option regarding XATTR in the config. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 11:51 ` Michal Suchanek @ 2011-04-15 12:29 ` Miklos Szeredi 2011-04-15 12:34 ` Michal Suchanek ` (2 more replies) 0 siblings, 3 replies; 81+ messages in thread From: Miklos Szeredi @ 2011-04-15 12:29 UTC (permalink / raw) To: Michal Suchanek Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <hramrach@centrum.cz> wrote: > On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote: >> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >>> >>> Cleaning up temporary files... >>> [ nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd' >>> find: cannot delete `./motd': Operation not supported >> >> What is the upper filesystem type? Is xattr support enabled? >> > > The upper filesystem is tmpfs and there is not option regarding XATTR > in the config. Apparently tmpfs does not support generic xattr. I understand why tmpfs is an attractive choice for an upper filesystem, so this should be addressed. I see two options here: 1) implement generic xattr in tmpfs 2) take whiteout/opaque support from union mounts and use that Both have advantages and disadvantages. Any thoughts? Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 12:29 ` Miklos Szeredi @ 2011-04-15 12:34 ` Michal Suchanek 2011-04-15 21:48 ` Hugh Dickins 2011-04-15 22:18 ` Andreas Dilger 2 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-15 12:34 UTC (permalink / raw) To: Miklos Szeredi Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 15 April 2011 14:29, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >> On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote: >>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >>>> >>>> Cleaning up temporary files... >>>> [ nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd' >>>> find: cannot delete `./motd': Operation not supported >>> >>> What is the upper filesystem type? Is xattr support enabled? >>> >> >> The upper filesystem is tmpfs and there is not option regarding XATTR >> in the config. > > Apparently tmpfs does not support generic xattr. I understand why > tmpfs is an attractive choice for an upper filesystem, so this should > be addressed. > > I see two options here: > > 1) implement generic xattr in tmpfs IMHO this is a general feature that keeps overlayfs in itself clean and small and can benefit other uses of tmpfs. > 2) take whiteout/opaque support from union mounts and use that How far from importing full unionmount is that? Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? @ 2011-04-15 12:34 ` Michal Suchanek 0 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-15 12:34 UTC (permalink / raw) To: Miklos Szeredi Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 15 April 2011 14:29, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >> On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote: >>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >>>> >>>> Cleaning up temporary files... >>>> [ nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd' >>>> find: cannot delete `./motd': Operation not supported >>> >>> What is the upper filesystem type? Is xattr support enabled? >>> >> >> The upper filesystem is tmpfs and there is not option regarding XATTR >> in the config. > > Apparently tmpfs does not support generic xattr. I understand why > tmpfs is an attractive choice for an upper filesystem, so this should > be addressed. > > I see two options here: > > 1) implement generic xattr in tmpfs IMHO this is a general feature that keeps overlayfs in itself clean and small and can benefit other uses of tmpfs. > 2) take whiteout/opaque support from union mounts and use that How far from importing full unionmount is that? Thanks Michal -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 12:34 ` Michal Suchanek (?) @ 2011-04-15 12:48 ` Miklos Szeredi -1 siblings, 0 replies; 81+ messages in thread From: Miklos Szeredi @ 2011-04-15 12:48 UTC (permalink / raw) To: Michal Suchanek Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On Fri, Apr 15, 2011 at 2:34 PM, Michal Suchanek <hramrach@centrum.cz> wrote: > On 15 April 2011 14:29, Miklos Szeredi <miklos@szeredi.hu> wrote: >> 2) take whiteout/opaque support from union mounts and use that > > How far from importing full unionmount is that? Whiteout/opaque support is quite separate from the rest of union mounts, and could be reusable for overlayfs. There are several reasons why I didn't want to go that way with: - lots of filesystems would have to be updated - it introduces incompatibility in the filesystem format, which can be a real pain (not for tmpfs, obviously, since tmpfs doesn't have a persistent backing) There *are* advantages to doing whiteouts in the filesystem, for example it makes file removal atomic. But atomicity is something that needs to be addressed in lots of other places (e.g. copy up) not just during whiteout, and there are other ways to do that than push the responsibility into each and every filesystem. Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 12:29 ` Miklos Szeredi 2011-04-15 12:34 ` Michal Suchanek @ 2011-04-15 21:48 ` Hugh Dickins 2011-04-15 22:18 ` Andreas Dilger 2 siblings, 0 replies; 81+ messages in thread From: Hugh Dickins @ 2011-04-15 21:48 UTC (permalink / raw) To: Miklos Szeredi Cc: Michal Suchanek, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Eric Paris, James Morris On Fri, Apr 15, 2011 at 5:29 AM, Miklos Szeredi <miklos@szeredi.hu> wrote: > On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >> On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote: >>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote: >>>> >>>> Cleaning up temporary files... >>>> [ nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd' >>>> find: cannot delete `./motd': Operation not supported >>> >>> What is the upper filesystem type? Is xattr support enabled? >>> >> >> The upper filesystem is tmpfs and there is not option regarding XATTR >> in the config. > > Apparently tmpfs does not support generic xattr. I understand why > tmpfs is an attractive choice for an upper filesystem, so this should > be addressed. > > I see two options here: > > 1) implement generic xattr in tmpfs > 2) take whiteout/opaque support from union mounts and use that > > Both have advantages and disadvantages. > > Any thoughts? Please get together with Eric: he's having trouble whipping up enthusiasm for his tmpfs xattr patch, he and I would both appreciate your support (and if I've persuaded him to leave out a part of it that you would need, join forces to call me an idiot). Mainly I'm hoping to hear from hch and jmorris on it, being xattr-illiterate myself. https://lkml.org/lkml/2011/3/29/302 Hugh ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 12:29 ` Miklos Szeredi 2011-04-15 12:34 ` Michal Suchanek 2011-04-15 21:48 ` Hugh Dickins @ 2011-04-15 22:18 ` Andreas Dilger 2011-04-18 13:31 ` Michal Suchanek ` (2 more replies) 2 siblings, 3 replies; 81+ messages in thread From: Andreas Dilger @ 2011-04-15 22:18 UTC (permalink / raw) To: Miklos Szeredi Cc: Michal Suchanek, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote: > Apparently tmpfs does not support generic xattr. I understand why > tmpfs is an attractive choice for an upper filesystem, so this should > be addressed. > > I see two options here: > > 1) implement generic xattr in tmpfs There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels: From: Eric Paris <eparis@redhat.com> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace Date: March 29, 2011 12:56:49 PM MDT Cheers, Andreas ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 22:18 ` Andreas Dilger @ 2011-04-18 13:31 ` Michal Suchanek 2011-04-18 13:34 ` Michal Suchanek 2011-04-18 13:37 ` Michal Suchanek 2 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-18 13:31 UTC (permalink / raw) To: Andreas Dilger Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote: > On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote: >> Apparently tmpfs does not support generic xattr. I understand why >> tmpfs is an attractive choice for an upper filesystem, so this should >> be addressed. >> >> I see two options here: >> >> 1) implement generic xattr in tmpfs > > There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels: > > From: Eric Paris <eparis@redhat.com> > Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace > Date: March 29, 2011 12:56:49 PM MDT > > Cheers, Andreas Applying this patch is not sufficient. Apparently more xattrs are needed but adding them on top of this patch should be easy. The ones mentioned in the overlayfs doc are trusted.overlay.whiteout trusted.overlay.opaque The patch implements security.* Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? @ 2011-04-18 13:31 ` Michal Suchanek 0 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-18 13:31 UTC (permalink / raw) To: Andreas Dilger Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote: > On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote: >> Apparently tmpfs does not support generic xattr. I understand why >> tmpfs is an attractive choice for an upper filesystem, so this should >> be addressed. >> >> I see two options here: >> >> 1) implement generic xattr in tmpfs > > There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels: > > From: Eric Paris <eparis@redhat.com> > Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace > Date: March 29, 2011 12:56:49 PM MDT > > Cheers, Andreas Applying this patch is not sufficient. Apparently more xattrs are needed but adding them on top of this patch should be easy. The ones mentioned in the overlayfs doc are trusted.overlay.whiteout trusted.overlay.opaque The patch implements security.* Thanks Michal -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH] tmpfs: implement generic xattr support 2011-04-18 13:31 ` Michal Suchanek (?) @ 2011-04-19 20:04 ` Miklos Szeredi 2011-04-20 2:18 ` Phillip Lougher ` (2 more replies) -1 siblings, 3 replies; 81+ messages in thread From: Miklos Szeredi @ 2011-04-19 20:04 UTC (permalink / raw) To: Michal Suchanek Cc: Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris Michal Suchanek <hramrach@centrum.cz> writes: > Applying this patch is not sufficient. Apparently more xattrs are > needed but adding them on top of this patch should be easy. > > The ones mentioned in the overlayfs doc are > > trusted.overlay.whiteout > trusted.overlay.opaque > > The patch implements security.* Here's an updated patch. It changes a number of things: - it implements shmem specific xattr ops instead of using the generic_* functions. Which means that there's no back and forth between the VFS and the filesystem. I basically copied the btrfs way of doing things. - adds a new config option: CONFIG_TMPFS_XATTR and makes CONFIG_TMPFS_POSIX_ACL depend on this. This way xattr support can be turned on without also adding ACL support. - now supports trusted.* namespace needed by overlayfs in addition to security.*. Doesn't yet support user.* since that needs more thought wrt. kernel memory use. - supports xattrs on symlinks, again needed by overlayfs Hugh, Eric, does this look acceptable? Thanks, Miklos --- From: Eric Paris <eparis@redhat.com> Subject: tmpfs: implement generic xattr support This patch implements generic xattrs for tmpfs filesystems. The feodra project, while trying to replace suid apps with file capabilities, realized that tmpfs, which is used on the build systems, does not support file capabilities and thus cannot be used to build packages which use file capabilities. Xattrs are also needed for overlayfs. The xattr interface is a bit, odd. If a filesystem does not implement any {get,set,list}xattr functions the VFS will call into some random LSM hooks and the running LSM can then implement some method for handling xattrs. SELinux for example provides a method to support security.selinux but no other security.* xattrs. As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have xattr handler routines specifically to handle acls. Because of this tmpfs would loose the VFS/LSM helpers to support the running LSM. To make up for that tmpfs had stub functions that did nothing but call into the LSM hooks which implement the helpers. This new patch does not use the LSM fallback functions and instead just implements a native get/set/list xattr feature for the full security.* and trusted.* namespace like a normal filesystem. This means that tmpfs can now support both security.selinux and security.capability, which was not previously possible. The basic implementation is that I attach a: struct shmem_xattr { struct list_head list; /* anchored by shmem_inode_info->xattr_list */ char *name; size_t size; char value[0]; }; Into the struct shmem_inode_info for each xattr that is set. This implementation could easily support the user.* namespace as well, except some care needs to be taken to prevent large amounts of unswappable memory being allocated for unprivileged users. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> --- fs/Kconfig | 18 ++ include/linux/shmem_fs.h | 1 mm/shmem.c | 302 +++++++++++++++++++++++++++++++++++++++-------- 3 files changed, 273 insertions(+), 48 deletions(-) Index: linux-2.6/fs/Kconfig =================================================================== --- linux-2.6.orig/fs/Kconfig 2011-04-19 21:09:33.000000000 +0200 +++ linux-2.6/fs/Kconfig 2011-04-19 21:09:35.000000000 +0200 @@ -121,9 +121,25 @@ config TMPFS See <file:Documentation/filesystems/tmpfs.txt> for details. +config TMPFS_XATTR + bool "Tmpfs extended attributes" + depends on TMPFS + default y + help + Extended attributes are name:value pairs associated with inodes by + the kernel or by users (see the attr(5) manual page, or visit + <http://acl.bestbits.at/> for details). + + Currently this enables support for the trusted.* and + security.* namespaces. + + If unsure, say N. + + You need this for POSIX ACL support on tmpfs. + config TMPFS_POSIX_ACL bool "Tmpfs POSIX Access Control Lists" - depends on TMPFS + depends on TMPFS_XATTR select GENERIC_ACL help POSIX Access Control Lists (ACLs) support permissions for users and Index: linux-2.6/include/linux/shmem_fs.h =================================================================== --- linux-2.6.orig/include/linux/shmem_fs.h 2011-04-19 21:09:25.000000000 +0200 +++ linux-2.6/include/linux/shmem_fs.h 2011-04-19 21:09:35.000000000 +0200 @@ -19,6 +19,7 @@ struct shmem_inode_info { struct page *i_indirect; /* top indirect blocks page */ swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */ struct list_head swaplist; /* chain of maybes on swap */ + struct list_head xattr_list; /* list of shmem_xattr */ struct inode vfs_inode; }; Index: linux-2.6/mm/shmem.c =================================================================== --- linux-2.6.orig/mm/shmem.c 2011-04-19 21:09:25.000000000 +0200 +++ linux-2.6/mm/shmem.c 2011-04-19 21:09:35.000000000 +0200 @@ -99,6 +99,13 @@ static struct vfsmount *shm_mnt; /* Pretend that each entry is of this size in directory's i_size */ #define BOGO_DIRENT_SIZE 20 +struct shmem_xattr { + struct list_head list; /* anchored by shmem_inode_info->xattr_list */ + char *name; /* xattr name */ + size_t size; + char value[0]; +}; + /* Flag allocation requirements to shmem_getpage and shmem_swp_alloc */ enum sgp_type { SGP_READ, /* don't exceed i_size, don't allocate page */ @@ -822,6 +829,7 @@ static int shmem_notify_change(struct de static void shmem_evict_inode(struct inode *inode) { struct shmem_inode_info *info = SHMEM_I(inode); + struct shmem_xattr *xattr, *nxattr; if (inode->i_mapping->a_ops == &shmem_aops) { truncate_inode_pages(inode->i_mapping, 0); @@ -834,6 +842,11 @@ static void shmem_evict_inode(struct ino mutex_unlock(&shmem_swaplist_mutex); } } + + list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) { + kfree(xattr->name); + kfree(xattr); + } BUG_ON(inode->i_blocks); shmem_free_inode(inode->i_sb); end_writeback(inode); @@ -1597,6 +1610,7 @@ static struct inode *shmem_get_inode(str spin_lock_init(&info->lock); info->flags = flags & VM_NORESERVE; INIT_LIST_HEAD(&info->swaplist); + INIT_LIST_HEAD(&info->xattr_list); cache_no_acl(inode); switch (mode & S_IFMT) { @@ -2048,62 +2062,225 @@ static void shmem_put_link(struct dentry } } -static const struct inode_operations shmem_symlink_inline_operations = { - .readlink = generic_readlink, - .follow_link = shmem_follow_link_inline, -}; - -static const struct inode_operations shmem_symlink_inode_operations = { - .readlink = generic_readlink, - .follow_link = shmem_follow_link, - .put_link = shmem_put_link, -}; - -#ifdef CONFIG_TMPFS_POSIX_ACL +#ifdef CONFIG_TMPFS_XATTR /* - * Superblocks without xattr inode operations will get security.* xattr - * support from the VFS "for free". As soon as we have any other xattrs + * Superblocks without xattr inode operations may get some security.* xattr + * support from the LSM "for free". As soon as we have any other xattrs * like ACLs, we also need to implement the security.* handlers at * filesystem level, though. */ -static size_t shmem_xattr_security_list(struct dentry *dentry, char *list, - size_t list_len, const char *name, - size_t name_len, int handler_flags) +static int shmem_xattr_get(struct dentry *dentry, const char *name, + void *buffer, size_t size) { - return security_inode_listsecurity(dentry->d_inode, list, list_len); -} + struct shmem_inode_info *info; + struct shmem_xattr *xattr; + int ret = -ENODATA; -static int shmem_xattr_security_get(struct dentry *dentry, const char *name, - void *buffer, size_t size, int handler_flags) -{ - if (strcmp(name, "") == 0) - return -EINVAL; - return xattr_getsecurity(dentry->d_inode, name, buffer, size); + info = SHMEM_I(dentry->d_inode); + + spin_lock(&dentry->d_inode->i_lock); + list_for_each_entry(xattr, &info->xattr_list, list) { + if (strcmp(name, xattr->name)) + continue; + + ret = xattr->size; + if (buffer) { + if (size < xattr->size) + ret = -ERANGE; + else + memcpy(buffer, xattr->value, xattr->size); + } + break; + } + spin_unlock(&dentry->d_inode->i_lock); + return ret; } -static int shmem_xattr_security_set(struct dentry *dentry, const char *name, - const void *value, size_t size, int flags, int handler_flags) +static int shmem_xattr_set(struct dentry *dentry, const char *name, + const void *value, size_t size, int flags) { - if (strcmp(name, "") == 0) - return -EINVAL; - return security_inode_setsecurity(dentry->d_inode, name, value, - size, flags); + struct inode *inode = dentry->d_inode; + struct shmem_inode_info *info = SHMEM_I(inode); + struct shmem_xattr *xattr; + struct shmem_xattr *new_xattr = NULL; + size_t len; + int err = 0; + + /* value == NULL means remove */ + if (value) { + /* wrap around? */ + len = sizeof(*new_xattr) + size; + if (len <= sizeof(*new_xattr)) + return -ENOMEM; + + new_xattr = kmalloc(len, GFP_NOFS); + if (!new_xattr) + return -ENOMEM; + + new_xattr->name = kstrdup(name, GFP_NOFS); + if (!new_xattr->name) { + kfree(new_xattr); + return -ENOMEM; + } + + new_xattr->size = size; + memcpy(new_xattr->value, value, size); + } + + spin_lock(&inode->i_lock); + list_for_each_entry(xattr, &info->xattr_list, list) { + if (!strcmp(name, xattr->name)) { + if (flags & XATTR_CREATE) { + xattr = new_xattr; + err = -EEXIST; + } else if (new_xattr) { + list_replace(&xattr->list, &new_xattr->list); + } else { + list_del(&xattr->list); + } + goto out; + } + } + if (flags & XATTR_REPLACE) { + xattr = new_xattr; + err = -ENODATA; + } else { + list_add(&new_xattr->list, &info->xattr_list); + xattr = NULL; + } +out: + spin_unlock(&inode->i_lock); + if (xattr) + kfree(xattr->name); + kfree(xattr); + return 0; } -static const struct xattr_handler shmem_xattr_security_handler = { - .prefix = XATTR_SECURITY_PREFIX, - .list = shmem_xattr_security_list, - .get = shmem_xattr_security_get, - .set = shmem_xattr_security_set, -}; static const struct xattr_handler *shmem_xattr_handlers[] = { +#ifdef CONFIG_TMPFS_POSIX_ACL &generic_acl_access_handler, &generic_acl_default_handler, - &shmem_xattr_security_handler, +#endif NULL }; + +static int shmem_xattr_validate(const char *name) +{ + struct { const char *prefix; size_t len; } arr[] = { + { XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN }, + { XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN }}; + int i; + + for (i = 0; i < ARRAY_SIZE(arr); i++) { + size_t preflen = arr[i].len; + if (strncmp(name, arr[i].prefix, preflen) == 0) { + if (!name[preflen]) + return -EINVAL; + return 0; + } + } + return -EOPNOTSUPP; +} + +static ssize_t shmem_getxattr(struct dentry *dentry, const char *name, + void *buffer, size_t size) +{ + int err; + + /* + * If this is a request for a synthetic attribute in the system.* + * namespace use the generic infrastructure to resolve a handler + * for it via sb->s_xattr. + */ + if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN)) + return generic_getxattr(dentry, name, buffer, size); + + err = shmem_xattr_validate(name); + if (err) + return err; + + return shmem_xattr_get(dentry, name, buffer, size); +} + +static int shmem_setxattr(struct dentry *dentry, const char *name, + const void *value, size_t size, int flags) +{ + int err; + + /* + * If this is a request for a synthetic attribute in the system.* + * namespace use the generic infrastructure to resolve a handler + * for it via sb->s_xattr. + */ + if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN)) + return generic_setxattr(dentry, name, value, size, flags); + + err = shmem_xattr_validate(name); + if (err) + return err; + + if (size == 0) + value = ""; /* empty EA, do not remove */ + + return shmem_xattr_set(dentry, name, value, size, flags); + +} + +static int shmem_removexattr(struct dentry *dentry, const char *name) +{ + int err; + + /* + * If this is a request for a synthetic attribute in the system.* + * namespace use the generic infrastructure to resolve a handler + * for it via sb->s_xattr. + */ + if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN)) + return generic_removexattr(dentry, name); + + err = shmem_xattr_validate(name); + if (err) + return err; + + return shmem_xattr_set(dentry, name, NULL, 0, XATTR_REPLACE); +} + +static bool xattr_is_trusted(const char *name) +{ + return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN); +} + +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size) +{ + bool trusted = capable(CAP_SYS_ADMIN); + struct shmem_xattr *xattr; + struct shmem_inode_info *info; + size_t used = 0; + + info = SHMEM_I(dentry->d_inode); + + spin_lock(&dentry->d_inode->i_lock); + list_for_each_entry(xattr, &info->xattr_list, list) { + /* skip "trusted." attributes for unprivileged callers */ + if (!trusted && xattr_is_trusted(xattr->name)) + continue; + + used += strlen(xattr->name) + 1; + if (buffer) { + if (size < used) { + used = -ERANGE; + break; + } + strncpy(buffer, xattr->name, strlen(xattr->name) + 1); + buffer += strlen(xattr->name) + 1; + } + } + spin_unlock(&dentry->d_inode->i_lock); + + return used; +} #endif static struct dentry *shmem_get_parent(struct dentry *child) @@ -2384,8 +2561,10 @@ int shmem_fill_super(struct super_block sb->s_magic = TMPFS_MAGIC; sb->s_op = &shmem_ops; sb->s_time_gran = 1; -#ifdef CONFIG_TMPFS_POSIX_ACL +#ifdef CONFIG_TMPFS_XATTR sb->s_xattr = shmem_xattr_handlers; +#endif +#ifdef CONFIG_TMPFS_POSIX_ACL sb->s_flags |= MS_POSIXACL; #endif @@ -2483,16 +2662,41 @@ static const struct file_operations shme static const struct inode_operations shmem_inode_operations = { .setattr = shmem_notify_change, .truncate_range = shmem_truncate_range, +#ifdef CONFIG_TMPFS_XATTR + .setxattr = shmem_setxattr, + .getxattr = shmem_getxattr, + .listxattr = shmem_listxattr, + .removexattr = shmem_removexattr, +#endif #ifdef CONFIG_TMPFS_POSIX_ACL - .setxattr = generic_setxattr, - .getxattr = generic_getxattr, - .listxattr = generic_listxattr, - .removexattr = generic_removexattr, .check_acl = generic_check_acl, #endif }; +static const struct inode_operations shmem_symlink_inline_operations = { + .readlink = generic_readlink, + .follow_link = shmem_follow_link_inline, +#ifdef CONFIG_TMPFS_XATTR + .setxattr = shmem_setxattr, + .getxattr = shmem_getxattr, + .listxattr = shmem_listxattr, + .removexattr = shmem_removexattr, +#endif +}; + +static const struct inode_operations shmem_symlink_inode_operations = { + .readlink = generic_readlink, + .follow_link = shmem_follow_link, + .put_link = shmem_put_link, +#ifdef CONFIG_TMPFS_XATTR + .setxattr = shmem_setxattr, + .getxattr = shmem_getxattr, + .listxattr = shmem_listxattr, + .removexattr = shmem_removexattr, +#endif +}; + static const struct inode_operations shmem_dir_inode_operations = { #ifdef CONFIG_TMPFS .create = shmem_create, @@ -2505,23 +2709,27 @@ static const struct inode_operations shm .mknod = shmem_mknod, .rename = shmem_rename, #endif -#ifdef CONFIG_TMPFS_POSIX_ACL - .setattr = shmem_notify_change, +#ifdef CONFIG_TMPFS_XATTR .setxattr = generic_setxattr, .getxattr = generic_getxattr, .listxattr = generic_listxattr, .removexattr = generic_removexattr, +#endif +#ifdef CONFIG_TMPFS_POSIX_ACL + .setattr = shmem_notify_change, .check_acl = generic_check_acl, #endif }; static const struct inode_operations shmem_special_inode_operations = { -#ifdef CONFIG_TMPFS_POSIX_ACL - .setattr = shmem_notify_change, +#ifdef CONFIG_TMPFS_XATTR .setxattr = generic_setxattr, .getxattr = generic_getxattr, .listxattr = generic_listxattr, .removexattr = generic_removexattr, +#endif +#ifdef CONFIG_TMPFS_POSIX_ACL + .setattr = shmem_notify_change, .check_acl = generic_check_acl, #endif }; ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-19 20:04 ` [PATCH] tmpfs: implement generic xattr support Miklos Szeredi @ 2011-04-20 2:18 ` Phillip Lougher 2011-04-20 16:00 ` Serge E. Hallyn 2011-05-12 4:20 ` Hugh Dickins 2 siblings, 0 replies; 81+ messages in thread From: Phillip Lougher @ 2011-04-20 2:18 UTC (permalink / raw) To: Miklos Szeredi Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size) > + spin_lock(&dentry->d_inode->i_lock); > + list_for_each_entry(xattr, &info->xattr_list, list) { > + /* skip "trusted." attributes for unprivileged callers */ > + if (!trusted && xattr_is_trusted(xattr->name)) > + continue; > + > + used += strlen(xattr->name) + 1; > + if (buffer) { > + if (size < used) { > + used = -ERANGE; > + break; > + } > + strncpy(buffer, xattr->name, strlen(xattr->name) + 1); >+ buffer += strlen(xattr->name) + 1; Why are you doing a strncpy here? strcpy() isn't going to copy more than strlen(xattr->name) + 1 bytes, and you know buffer is large enough to hold that because of the previous if (size < used) check? If you assigned the first strlen(xattr->name) + 1 to a temporary variable, you could use memcpy here, and avoid the 3 repeated strlen(xattr->name) calls. Phillip ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support @ 2011-04-20 2:18 ` Phillip Lougher 0 siblings, 0 replies; 81+ messages in thread From: Phillip Lougher @ 2011-04-20 2:18 UTC (permalink / raw) To: Miklos Szeredi Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size) > + spin_lock(&dentry->d_inode->i_lock); > + list_for_each_entry(xattr, &info->xattr_list, list) { > + /* skip "trusted." attributes for unprivileged callers */ > + if (!trusted && xattr_is_trusted(xattr->name)) > + continue; > + > + used += strlen(xattr->name) + 1; > + if (buffer) { > + if (size < used) { > + used = -ERANGE; > + break; > + } > + strncpy(buffer, xattr->name, strlen(xattr->name) + 1); >+ buffer += strlen(xattr->name) + 1; Why are you doing a strncpy here? strcpy() isn't going to copy more than strlen(xattr->name) + 1 bytes, and you know buffer is large enough to hold that because of the previous if (size < used) check? If you assigned the first strlen(xattr->name) + 1 to a temporary variable, you could use memcpy here, and avoid the 3 repeated strlen(xattr->name) calls. Phillip -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-20 2:18 ` Phillip Lougher (?) @ 2011-04-20 13:43 ` Miklos Szeredi 2011-04-21 6:59 ` Michal Suchanek -1 siblings, 1 reply; 81+ messages in thread From: Miklos Szeredi @ 2011-04-20 13:43 UTC (permalink / raw) To: Phillip Lougher Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris Phillip Lougher <phillip.lougher@gmail.com> writes: > On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: > >> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size) > >> + spin_lock(&dentry->d_inode->i_lock); >> + list_for_each_entry(xattr, &info->xattr_list, list) { >> + /* skip "trusted." attributes for unprivileged callers */ >> + if (!trusted && xattr_is_trusted(xattr->name)) >> + continue; >> + >> + used += strlen(xattr->name) + 1; >> + if (buffer) { >> + if (size < used) { >> + used = -ERANGE; >> + break; >> + } >> + strncpy(buffer, xattr->name, strlen(xattr->name) + 1); >>+ buffer += strlen(xattr->name) + 1; > > Why are you doing a strncpy here? strcpy() isn't going to copy more > than strlen(xattr->name) + 1 bytes, and you know buffer is large > enough to hold that because of the previous if (size < used) check? > > If you assigned the first strlen(xattr->name) + 1 to a temporary > variable, you could use memcpy here, and avoid the 3 repeated > strlen(xattr->name) calls. Yeah, makes sense. Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-20 13:43 ` Miklos Szeredi @ 2011-04-21 6:59 ` Michal Suchanek 2011-04-21 9:08 ` Miklos Szeredi 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-21 6:59 UTC (permalink / raw) To: Miklos Szeredi Cc: Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris On 20 April 2011 15:43, Miklos Szeredi <miklos@szeredi.hu> wrote: > Phillip Lougher <phillip.lougher@gmail.com> writes: > >> On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote: >> >>> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size) >> >>> + spin_lock(&dentry->d_inode->i_lock); >>> + list_for_each_entry(xattr, &info->xattr_list, list) { >>> + /* skip "trusted." attributes for unprivileged callers */ >>> + if (!trusted && xattr_is_trusted(xattr->name)) >>> + continue; >>> + >>> + used += strlen(xattr->name) + 1; >>> + if (buffer) { >>> + if (size < used) { >>> + used = -ERANGE; >>> + break; >>> + } >>> + strncpy(buffer, xattr->name, strlen(xattr->name) + 1); >>>+ buffer += strlen(xattr->name) + 1; >> >> Why are you doing a strncpy here? strcpy() isn't going to copy more >> than strlen(xattr->name) + 1 bytes, and you know buffer is large >> enough to hold that because of the previous if (size < used) check? >> >> If you assigned the first strlen(xattr->name) + 1 to a temporary >> variable, you could use memcpy here, and avoid the 3 repeated >> strlen(xattr->name) calls. > > Yeah, makes sense. > Can you put up a branch with both overlayfs and xattrs? I would like to point people at a repo with working implementation and mirroring the kernel tree to add one patch is a bit overkill. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-21 6:59 ` Michal Suchanek @ 2011-04-21 9:08 ` Miklos Szeredi 2011-04-21 10:59 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Miklos Szeredi @ 2011-04-21 9:08 UTC (permalink / raw) To: Michal Suchanek Cc: Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris Michal Suchanek <hramrach@centrum.cz> writes: > Can you put up a branch with both overlayfs and xattrs? > > I would like to point people at a repo with working implementation and > mirroring the kernel tree to add one patch is a bit overkill. Okay, added the tmpfs-xattr patch to the overlayfs.v8 branch. Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-21 9:08 ` Miklos Szeredi @ 2011-04-21 10:59 ` Michal Suchanek 2011-04-21 14:58 ` Jordi Pujol 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-21 10:59 UTC (permalink / raw) To: Miklos Szeredi Cc: Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris On 21 April 2011 11:08, Miklos Szeredi <miklos@szeredi.hu> wrote: > Michal Suchanek <hramrach@centrum.cz> writes: > >> Can you put up a branch with both overlayfs and xattrs? >> >> I would like to point people at a repo with working implementation and >> mirroring the kernel tree to add one patch is a bit overkill. > > Okay, added the tmpfs-xattr patch to the overlayfs.v8 branch. > Okay, I built a small live system using this branch and it boots without errors. Now it's time to build a larger image I guess. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-21 10:59 ` Michal Suchanek @ 2011-04-21 14:58 ` Jordi Pujol 2011-04-21 15:22 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Jordi Pujol @ 2011-04-21 14:58 UTC (permalink / raw) To: Michal Suchanek Cc: Miklos Szeredi, Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris A Dijous 21 Abril 2011 12:59:01, Michal Suchanek va escriure: > Now it's time to build a larger image I guess. Hello, I have done it, building a desktop image that contains kde4 and some services, The tests show that tmpfs is broken, it makes first operations but at some point it becomes erratic; that occurs in non-persistent mode, where overlayfs upperdir is on tmpfs, and in persistent mode also but in this case more user working is needed before tmpfs failure, in Linux single user mode I have installed the nvidia packages and driver using dpkg, only some files are installed, therefore I tried creating manually a required directory and in this manner the process has done some steps more. kernel source: http://livenet.selfip.com/ftp/debian/linux-2.6/linux-2.6.38-3.jpp.3- lnet_2.6.38-15.tar.bz2 logs: http://livenet.selfip.com/ftp/debian/overlayfs/test-tmpfs-xattrs.tar.bz2 thanks, Jordi Pujol Live never ending Tale GNU/Linux Live forever! http://livenet.selfip.com ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-21 14:58 ` Jordi Pujol @ 2011-04-21 15:22 ` Michal Suchanek 2011-04-21 15:43 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-21 15:22 UTC (permalink / raw) To: Jordi Pujol Cc: Miklos Szeredi, Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris On 21 April 2011 16:58, Jordi Pujol <jordipujolp@gmail.com> wrote: > A Dijous 21 Abril 2011 12:59:01, Michal Suchanek va escriure: >> Now it's time to build a larger image I guess. > > Hello, > > I have done it, building a desktop image that contains kde4 and some services, > The tests show that tmpfs is broken, it makes first operations but at some > point it becomes erratic; that occurs in non-persistent mode, where > overlayfs upperdir is on tmpfs, and in persistent mode also but in this case > more user working is needed before tmpfs failure, > > in Linux single user mode I have installed the nvidia packages and driver > using dpkg, only some files are installed, therefore I tried creating manually > a required directory and in this manner the process has done some steps more. > > kernel source: > > http://livenet.selfip.com/ftp/debian/linux-2.6/linux-2.6.38-3.jpp.3- > lnet_2.6.38-15.tar.bz2 > > logs: > > http://livenet.selfip.com/ftp/debian/overlayfs/test-tmpfs-xattrs.tar.bz2 > Yes, while the system appears to run generally fine unpacking dpkg packages fails. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-21 15:22 ` Michal Suchanek @ 2011-04-21 15:43 ` Michal Suchanek 2011-04-21 17:26 ` Miklos Szeredi 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-21 15:43 UTC (permalink / raw) To: Jordi Pujol Cc: Miklos Szeredi, Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris On 21 April 2011 17:22, Michal Suchanek <hramrach@centrum.cz> wrote: > On 21 April 2011 16:58, Jordi Pujol <jordipujolp@gmail.com> wrote: >> A Dijous 21 Abril 2011 12:59:01, Michal Suchanek va escriure: >>> Now it's time to build a larger image I guess. >> >> Hello, >> >> I have done it, building a desktop image that contains kde4 and some services, >> The tests show that tmpfs is broken, it makes first operations but at some >> point it becomes erratic; that occurs in non-persistent mode, where >> overlayfs upperdir is on tmpfs, and in persistent mode also but in this case >> more user working is needed before tmpfs failure, >> >> in Linux single user mode I have installed the nvidia packages and driver >> using dpkg, only some files are installed, therefore I tried creating manually >> a required directory and in this manner the process has done some steps more. >> >> kernel source: >> >> http://livenet.selfip.com/ftp/debian/linux-2.6/linux-2.6.38-3.jpp.3- >> lnet_2.6.38-15.tar.bz2 >> >> logs: >> >> http://livenet.selfip.com/ftp/debian/overlayfs/test-tmpfs-xattrs.tar.bz2 >> > > Yes, while the system appears to run generally fine unpacking dpkg > packages fails. You will be probably interested in this log: ls: cannot access /usr/share/doc/ucb*: No such file or directory ... rmdir("/usr/share/doc/ucblogo.dpkg-new") = -1 ENOENT (No such file or directory) rmdir("/usr/share/doc/ucblogo.dpkg-tmp") = -1 ENOENT (No such file or directory) mkdir("/usr/share/doc/ucblogo.dpkg-new", 0) = 0 chown32("/usr/share/doc/ucblogo.dpkg-new", 0, 0) = 0 chmod("/usr/share/doc/ucblogo.dpkg-new", 0755) = 0 utime("/usr/share/doc/ucblogo.dpkg-new", [2011/04/21-15:37:05, 2008/06/08-11:38:17]) = 0 rename("/usr/share/doc/ucblogo.dpkg-new", "/usr/share/doc/ucblogo") = -1 EOPNOTSUPP (Operation not supported) Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-21 15:43 ` Michal Suchanek @ 2011-04-21 17:26 ` Miklos Szeredi 2011-04-21 19:17 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Miklos Szeredi @ 2011-04-21 17:26 UTC (permalink / raw) To: Michal Suchanek Cc: Jordi Pujol, Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris Michal Suchanek <hramrach@centrum.cz> writes: > You will be probably interested in this log: > > ls: cannot access /usr/share/doc/ucb*: No such file or directory > ... > rmdir("/usr/share/doc/ucblogo.dpkg-new") = -1 ENOENT (No such file or directory) > rmdir("/usr/share/doc/ucblogo.dpkg-tmp") = -1 ENOENT (No such file or directory) > mkdir("/usr/share/doc/ucblogo.dpkg-new", 0) = 0 > chown32("/usr/share/doc/ucblogo.dpkg-new", 0, 0) = 0 > chmod("/usr/share/doc/ucblogo.dpkg-new", 0755) = 0 > utime("/usr/share/doc/ucblogo.dpkg-new", [2011/04/21-15:37:05, > 2008/06/08-11:38:17]) = 0 > rename("/usr/share/doc/ucblogo.dpkg-new", "/usr/share/doc/ucblogo") = > -1 EOPNOTSUPP (Operation not supported) Oops, stupid mistake. Following patch should fix it. Thanks, Miklos commit e90f1946938da7c9502337340126beae1931d239 Author: Miklos Szeredi <mszeredi@suse.cz> Date: Thu Apr 21 19:18:23 2011 +0200 tmpfs: xattr: fix xattr for directories and special files Forgot to update the xattr methods for some file types. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> diff --git a/mm/shmem.c b/mm/shmem.c index ac8ec9e..c527484 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2714,10 +2714,10 @@ static const struct inode_operations shmem_dir_inode_operations = { .rename = shmem_rename, #endif #ifdef CONFIG_TMPFS_XATTR - .setxattr = generic_setxattr, - .getxattr = generic_getxattr, - .listxattr = generic_listxattr, - .removexattr = generic_removexattr, + .setxattr = shmem_setxattr, + .getxattr = shmem_getxattr, + .listxattr = shmem_listxattr, + .removexattr = shmem_removexattr, #endif #ifdef CONFIG_TMPFS_POSIX_ACL .setattr = shmem_notify_change, @@ -2727,10 +2727,10 @@ static const struct inode_operations shmem_dir_inode_operations = { static const struct inode_operations shmem_special_inode_operations = { #ifdef CONFIG_TMPFS_XATTR - .setxattr = generic_setxattr, - .getxattr = generic_getxattr, - .listxattr = generic_listxattr, - .removexattr = generic_removexattr, + .setxattr = shmem_setxattr, + .getxattr = shmem_getxattr, + .listxattr = shmem_listxattr, + .removexattr = shmem_removexattr, #endif #ifdef CONFIG_TMPFS_POSIX_ACL .setattr = shmem_notify_change, ^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-21 17:26 ` Miklos Szeredi @ 2011-04-21 19:17 ` Michal Suchanek 0 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-21 19:17 UTC (permalink / raw) To: Miklos Szeredi Cc: Jordi Pujol, Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris On 21 April 2011 19:26, Miklos Szeredi <miklos@szeredi.hu> wrote: > Michal Suchanek <hramrach@centrum.cz> writes: > >> You will be probably interested in this log: >> >> ls: cannot access /usr/share/doc/ucb*: No such file or directory >> ... >> rmdir("/usr/share/doc/ucblogo.dpkg-new") = -1 ENOENT (No such file or directory) >> rmdir("/usr/share/doc/ucblogo.dpkg-tmp") = -1 ENOENT (No such file or directory) >> mkdir("/usr/share/doc/ucblogo.dpkg-new", 0) = 0 >> chown32("/usr/share/doc/ucblogo.dpkg-new", 0, 0) = 0 >> chmod("/usr/share/doc/ucblogo.dpkg-new", 0755) = 0 >> utime("/usr/share/doc/ucblogo.dpkg-new", [2011/04/21-15:37:05, >> 2008/06/08-11:38:17]) = 0 >> rename("/usr/share/doc/ucblogo.dpkg-new", "/usr/share/doc/ucblogo") = >> -1 EOPNOTSUPP (Operation not supported) > > Oops, stupid mistake. > > Following patch should fix it. > Yes, with the additional patch package installation now works. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-19 20:04 ` [PATCH] tmpfs: implement generic xattr support Miklos Szeredi 2011-04-20 2:18 ` Phillip Lougher @ 2011-04-20 16:00 ` Serge E. Hallyn 2011-05-12 4:20 ` Hugh Dickins 2 siblings, 0 replies; 81+ messages in thread From: Serge E. Hallyn @ 2011-04-20 16:00 UTC (permalink / raw) To: Miklos Szeredi Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins, Eric Paris Quoting Miklos Szeredi (miklos@szeredi.hu): > Michal Suchanek <hramrach@centrum.cz> writes: > > Applying this patch is not sufficient. Apparently more xattrs are > > needed but adding them on top of this patch should be easy. > > > > The ones mentioned in the overlayfs doc are > > > > trusted.overlay.whiteout > > trusted.overlay.opaque > > > > The patch implements security.* > > Here's an updated patch. It changes a number of things: > > - it implements shmem specific xattr ops instead of using the generic_* > functions. Which means that there's no back and forth between the > VFS and the filesystem. I basically copied the btrfs way of doing > things. > > - adds a new config option: CONFIG_TMPFS_XATTR and makes > CONFIG_TMPFS_POSIX_ACL depend on this. This way xattr support can be > turned on without also adding ACL support. > > - now supports trusted.* namespace needed by overlayfs in addition to > security.*. Doesn't yet support user.* since that needs more thought > wrt. kernel memory use. > > - supports xattrs on symlinks, again needed by overlayfs > > Hugh, Eric, does this look acceptable? > > Thanks, > Miklos > > --- > From: Eric Paris <eparis@redhat.com> > Subject: tmpfs: implement generic xattr support > > This patch implements generic xattrs for tmpfs filesystems. The feodra > project, while trying to replace suid apps with file capabilities, > realized that tmpfs, which is used on the build systems, does not > support file capabilities and thus cannot be used to build packages > which use file capabilities. Xattrs are also needed for overlayfs. > > The xattr interface is a bit, odd. If a filesystem does not implement any > {get,set,list}xattr functions the VFS will call into some random LSM hooks and > the running LSM can then implement some method for handling xattrs. SELinux > for example provides a method to support security.selinux but no other > security.* xattrs. > > As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have > xattr handler routines specifically to handle acls. Because of this tmpfs > would loose the VFS/LSM helpers to support the running LSM. To make up for > that tmpfs had stub functions that did nothing but call into the LSM hooks > which implement the helpers. > > This new patch does not use the LSM fallback functions and instead > just implements a native get/set/list xattr feature for the full > security.* and trusted.* namespace like a normal filesystem. This > means that tmpfs can now support both security.selinux and > security.capability, which was not previously possible. > > The basic implementation is that I attach a: > > struct shmem_xattr { > struct list_head list; /* anchored by shmem_inode_info->xattr_list */ > char *name; > size_t size; > char value[0]; > }; > > Into the struct shmem_inode_info for each xattr that is set. This > implementation could easily support the user.* namespace as well, > except some care needs to be taken to prevent large amounts of > unswappable memory being allocated for unprivileged users. > > Signed-off-by: Eric Paris <eparis@redhat.com> > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Looks good to me. Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> thanks, -serge > --- > fs/Kconfig | 18 ++ > include/linux/shmem_fs.h | 1 > mm/shmem.c | 302 +++++++++++++++++++++++++++++++++++++++-------- > 3 files changed, 273 insertions(+), 48 deletions(-) > > Index: linux-2.6/fs/Kconfig > =================================================================== > --- linux-2.6.orig/fs/Kconfig 2011-04-19 21:09:33.000000000 +0200 > +++ linux-2.6/fs/Kconfig 2011-04-19 21:09:35.000000000 +0200 > @@ -121,9 +121,25 @@ config TMPFS > > See <file:Documentation/filesystems/tmpfs.txt> for details. > > +config TMPFS_XATTR > + bool "Tmpfs extended attributes" > + depends on TMPFS > + default y > + help > + Extended attributes are name:value pairs associated with inodes by > + the kernel or by users (see the attr(5) manual page, or visit > + <http://acl.bestbits.at/> for details). > + > + Currently this enables support for the trusted.* and > + security.* namespaces. > + > + If unsure, say N. > + > + You need this for POSIX ACL support on tmpfs. > + > config TMPFS_POSIX_ACL > bool "Tmpfs POSIX Access Control Lists" > - depends on TMPFS > + depends on TMPFS_XATTR > select GENERIC_ACL > help > POSIX Access Control Lists (ACLs) support permissions for users and > Index: linux-2.6/include/linux/shmem_fs.h > =================================================================== > --- linux-2.6.orig/include/linux/shmem_fs.h 2011-04-19 21:09:25.000000000 +0200 > +++ linux-2.6/include/linux/shmem_fs.h 2011-04-19 21:09:35.000000000 +0200 > @@ -19,6 +19,7 @@ struct shmem_inode_info { > struct page *i_indirect; /* top indirect blocks page */ > swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */ > struct list_head swaplist; /* chain of maybes on swap */ > + struct list_head xattr_list; /* list of shmem_xattr */ > struct inode vfs_inode; > }; > > Index: linux-2.6/mm/shmem.c > =================================================================== > --- linux-2.6.orig/mm/shmem.c 2011-04-19 21:09:25.000000000 +0200 > +++ linux-2.6/mm/shmem.c 2011-04-19 21:09:35.000000000 +0200 > @@ -99,6 +99,13 @@ static struct vfsmount *shm_mnt; > /* Pretend that each entry is of this size in directory's i_size */ > #define BOGO_DIRENT_SIZE 20 > > +struct shmem_xattr { > + struct list_head list; /* anchored by shmem_inode_info->xattr_list */ > + char *name; /* xattr name */ > + size_t size; > + char value[0]; > +}; > + > /* Flag allocation requirements to shmem_getpage and shmem_swp_alloc */ > enum sgp_type { > SGP_READ, /* don't exceed i_size, don't allocate page */ > @@ -822,6 +829,7 @@ static int shmem_notify_change(struct de > static void shmem_evict_inode(struct inode *inode) > { > struct shmem_inode_info *info = SHMEM_I(inode); > + struct shmem_xattr *xattr, *nxattr; > > if (inode->i_mapping->a_ops == &shmem_aops) { > truncate_inode_pages(inode->i_mapping, 0); > @@ -834,6 +842,11 @@ static void shmem_evict_inode(struct ino > mutex_unlock(&shmem_swaplist_mutex); > } > } > + > + list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) { > + kfree(xattr->name); > + kfree(xattr); > + } > BUG_ON(inode->i_blocks); > shmem_free_inode(inode->i_sb); > end_writeback(inode); > @@ -1597,6 +1610,7 @@ static struct inode *shmem_get_inode(str > spin_lock_init(&info->lock); > info->flags = flags & VM_NORESERVE; > INIT_LIST_HEAD(&info->swaplist); > + INIT_LIST_HEAD(&info->xattr_list); > cache_no_acl(inode); > > switch (mode & S_IFMT) { > @@ -2048,62 +2062,225 @@ static void shmem_put_link(struct dentry > } > } > > -static const struct inode_operations shmem_symlink_inline_operations = { > - .readlink = generic_readlink, > - .follow_link = shmem_follow_link_inline, > -}; > - > -static const struct inode_operations shmem_symlink_inode_operations = { > - .readlink = generic_readlink, > - .follow_link = shmem_follow_link, > - .put_link = shmem_put_link, > -}; > - > -#ifdef CONFIG_TMPFS_POSIX_ACL > +#ifdef CONFIG_TMPFS_XATTR > /* > - * Superblocks without xattr inode operations will get security.* xattr > - * support from the VFS "for free". As soon as we have any other xattrs > + * Superblocks without xattr inode operations may get some security.* xattr > + * support from the LSM "for free". As soon as we have any other xattrs > * like ACLs, we also need to implement the security.* handlers at > * filesystem level, though. > */ > > -static size_t shmem_xattr_security_list(struct dentry *dentry, char *list, > - size_t list_len, const char *name, > - size_t name_len, int handler_flags) > +static int shmem_xattr_get(struct dentry *dentry, const char *name, > + void *buffer, size_t size) > { > - return security_inode_listsecurity(dentry->d_inode, list, list_len); > -} > + struct shmem_inode_info *info; > + struct shmem_xattr *xattr; > + int ret = -ENODATA; > > -static int shmem_xattr_security_get(struct dentry *dentry, const char *name, > - void *buffer, size_t size, int handler_flags) > -{ > - if (strcmp(name, "") == 0) > - return -EINVAL; > - return xattr_getsecurity(dentry->d_inode, name, buffer, size); > + info = SHMEM_I(dentry->d_inode); > + > + spin_lock(&dentry->d_inode->i_lock); > + list_for_each_entry(xattr, &info->xattr_list, list) { > + if (strcmp(name, xattr->name)) > + continue; > + > + ret = xattr->size; > + if (buffer) { > + if (size < xattr->size) > + ret = -ERANGE; > + else > + memcpy(buffer, xattr->value, xattr->size); > + } > + break; > + } > + spin_unlock(&dentry->d_inode->i_lock); > + return ret; > } > > -static int shmem_xattr_security_set(struct dentry *dentry, const char *name, > - const void *value, size_t size, int flags, int handler_flags) > +static int shmem_xattr_set(struct dentry *dentry, const char *name, > + const void *value, size_t size, int flags) > { > - if (strcmp(name, "") == 0) > - return -EINVAL; > - return security_inode_setsecurity(dentry->d_inode, name, value, > - size, flags); > + struct inode *inode = dentry->d_inode; > + struct shmem_inode_info *info = SHMEM_I(inode); > + struct shmem_xattr *xattr; > + struct shmem_xattr *new_xattr = NULL; > + size_t len; > + int err = 0; > + > + /* value == NULL means remove */ > + if (value) { > + /* wrap around? */ > + len = sizeof(*new_xattr) + size; > + if (len <= sizeof(*new_xattr)) > + return -ENOMEM; > + > + new_xattr = kmalloc(len, GFP_NOFS); > + if (!new_xattr) > + return -ENOMEM; > + > + new_xattr->name = kstrdup(name, GFP_NOFS); > + if (!new_xattr->name) { > + kfree(new_xattr); > + return -ENOMEM; > + } > + > + new_xattr->size = size; > + memcpy(new_xattr->value, value, size); > + } > + > + spin_lock(&inode->i_lock); > + list_for_each_entry(xattr, &info->xattr_list, list) { > + if (!strcmp(name, xattr->name)) { > + if (flags & XATTR_CREATE) { > + xattr = new_xattr; > + err = -EEXIST; > + } else if (new_xattr) { > + list_replace(&xattr->list, &new_xattr->list); > + } else { > + list_del(&xattr->list); > + } > + goto out; > + } > + } > + if (flags & XATTR_REPLACE) { > + xattr = new_xattr; > + err = -ENODATA; > + } else { > + list_add(&new_xattr->list, &info->xattr_list); > + xattr = NULL; > + } > +out: > + spin_unlock(&inode->i_lock); > + if (xattr) > + kfree(xattr->name); > + kfree(xattr); > + return 0; > } > > -static const struct xattr_handler shmem_xattr_security_handler = { > - .prefix = XATTR_SECURITY_PREFIX, > - .list = shmem_xattr_security_list, > - .get = shmem_xattr_security_get, > - .set = shmem_xattr_security_set, > -}; > > static const struct xattr_handler *shmem_xattr_handlers[] = { > +#ifdef CONFIG_TMPFS_POSIX_ACL > &generic_acl_access_handler, > &generic_acl_default_handler, > - &shmem_xattr_security_handler, > +#endif > NULL > }; > + > +static int shmem_xattr_validate(const char *name) > +{ > + struct { const char *prefix; size_t len; } arr[] = { > + { XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN }, > + { XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN }}; > + int i; > + > + for (i = 0; i < ARRAY_SIZE(arr); i++) { > + size_t preflen = arr[i].len; > + if (strncmp(name, arr[i].prefix, preflen) == 0) { > + if (!name[preflen]) > + return -EINVAL; > + return 0; > + } > + } > + return -EOPNOTSUPP; > +} > + > +static ssize_t shmem_getxattr(struct dentry *dentry, const char *name, > + void *buffer, size_t size) > +{ > + int err; > + > + /* > + * If this is a request for a synthetic attribute in the system.* > + * namespace use the generic infrastructure to resolve a handler > + * for it via sb->s_xattr. > + */ > + if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN)) > + return generic_getxattr(dentry, name, buffer, size); > + > + err = shmem_xattr_validate(name); > + if (err) > + return err; > + > + return shmem_xattr_get(dentry, name, buffer, size); > +} > + > +static int shmem_setxattr(struct dentry *dentry, const char *name, > + const void *value, size_t size, int flags) > +{ > + int err; > + > + /* > + * If this is a request for a synthetic attribute in the system.* > + * namespace use the generic infrastructure to resolve a handler > + * for it via sb->s_xattr. > + */ > + if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN)) > + return generic_setxattr(dentry, name, value, size, flags); > + > + err = shmem_xattr_validate(name); > + if (err) > + return err; > + > + if (size == 0) > + value = ""; /* empty EA, do not remove */ > + > + return shmem_xattr_set(dentry, name, value, size, flags); > + > +} > + > +static int shmem_removexattr(struct dentry *dentry, const char *name) > +{ > + int err; > + > + /* > + * If this is a request for a synthetic attribute in the system.* > + * namespace use the generic infrastructure to resolve a handler > + * for it via sb->s_xattr. > + */ > + if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN)) > + return generic_removexattr(dentry, name); > + > + err = shmem_xattr_validate(name); > + if (err) > + return err; > + > + return shmem_xattr_set(dentry, name, NULL, 0, XATTR_REPLACE); > +} > + > +static bool xattr_is_trusted(const char *name) > +{ > + return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN); > +} > + > +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size) > +{ > + bool trusted = capable(CAP_SYS_ADMIN); > + struct shmem_xattr *xattr; > + struct shmem_inode_info *info; > + size_t used = 0; > + > + info = SHMEM_I(dentry->d_inode); > + > + spin_lock(&dentry->d_inode->i_lock); > + list_for_each_entry(xattr, &info->xattr_list, list) { > + /* skip "trusted." attributes for unprivileged callers */ > + if (!trusted && xattr_is_trusted(xattr->name)) > + continue; > + > + used += strlen(xattr->name) + 1; > + if (buffer) { > + if (size < used) { > + used = -ERANGE; > + break; > + } > + strncpy(buffer, xattr->name, strlen(xattr->name) + 1); > + buffer += strlen(xattr->name) + 1; > + } > + } > + spin_unlock(&dentry->d_inode->i_lock); > + > + return used; > +} > #endif > > static struct dentry *shmem_get_parent(struct dentry *child) > @@ -2384,8 +2561,10 @@ int shmem_fill_super(struct super_block > sb->s_magic = TMPFS_MAGIC; > sb->s_op = &shmem_ops; > sb->s_time_gran = 1; > -#ifdef CONFIG_TMPFS_POSIX_ACL > +#ifdef CONFIG_TMPFS_XATTR > sb->s_xattr = shmem_xattr_handlers; > +#endif > +#ifdef CONFIG_TMPFS_POSIX_ACL > sb->s_flags |= MS_POSIXACL; > #endif > > @@ -2483,16 +2662,41 @@ static const struct file_operations shme > static const struct inode_operations shmem_inode_operations = { > .setattr = shmem_notify_change, > .truncate_range = shmem_truncate_range, > +#ifdef CONFIG_TMPFS_XATTR > + .setxattr = shmem_setxattr, > + .getxattr = shmem_getxattr, > + .listxattr = shmem_listxattr, > + .removexattr = shmem_removexattr, > +#endif > #ifdef CONFIG_TMPFS_POSIX_ACL > - .setxattr = generic_setxattr, > - .getxattr = generic_getxattr, > - .listxattr = generic_listxattr, > - .removexattr = generic_removexattr, > .check_acl = generic_check_acl, > #endif > > }; > > +static const struct inode_operations shmem_symlink_inline_operations = { > + .readlink = generic_readlink, > + .follow_link = shmem_follow_link_inline, > +#ifdef CONFIG_TMPFS_XATTR > + .setxattr = shmem_setxattr, > + .getxattr = shmem_getxattr, > + .listxattr = shmem_listxattr, > + .removexattr = shmem_removexattr, > +#endif > +}; > + > +static const struct inode_operations shmem_symlink_inode_operations = { > + .readlink = generic_readlink, > + .follow_link = shmem_follow_link, > + .put_link = shmem_put_link, > +#ifdef CONFIG_TMPFS_XATTR > + .setxattr = shmem_setxattr, > + .getxattr = shmem_getxattr, > + .listxattr = shmem_listxattr, > + .removexattr = shmem_removexattr, > +#endif > +}; > + > static const struct inode_operations shmem_dir_inode_operations = { > #ifdef CONFIG_TMPFS > .create = shmem_create, > @@ -2505,23 +2709,27 @@ static const struct inode_operations shm > .mknod = shmem_mknod, > .rename = shmem_rename, > #endif > -#ifdef CONFIG_TMPFS_POSIX_ACL > - .setattr = shmem_notify_change, > +#ifdef CONFIG_TMPFS_XATTR > .setxattr = generic_setxattr, > .getxattr = generic_getxattr, > .listxattr = generic_listxattr, > .removexattr = generic_removexattr, > +#endif > +#ifdef CONFIG_TMPFS_POSIX_ACL > + .setattr = shmem_notify_change, > .check_acl = generic_check_acl, > #endif > }; > > static const struct inode_operations shmem_special_inode_operations = { > -#ifdef CONFIG_TMPFS_POSIX_ACL > - .setattr = shmem_notify_change, > +#ifdef CONFIG_TMPFS_XATTR > .setxattr = generic_setxattr, > .getxattr = generic_getxattr, > .listxattr = generic_listxattr, > .removexattr = generic_removexattr, > +#endif > +#ifdef CONFIG_TMPFS_POSIX_ACL > + .setattr = shmem_notify_change, > .check_acl = generic_check_acl, > #endif > }; > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-04-19 20:04 ` [PATCH] tmpfs: implement generic xattr support Miklos Szeredi 2011-04-20 2:18 ` Phillip Lougher 2011-04-20 16:00 ` Serge E. Hallyn @ 2011-05-12 4:20 ` Hugh Dickins 2011-05-12 7:52 ` Michal Suchanek 2011-05-12 12:27 ` Miklos Szeredi 2 siblings, 2 replies; 81+ messages in thread From: Hugh Dickins @ 2011-05-12 4:20 UTC (permalink / raw) To: Miklos Szeredi Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Eric Paris, Andrew Morton, James Morris, Serge Hallyn Hi Miklos, I understand from the mm-commits discussion that you're probably preparing another version: so let me make just a few comments on this old one now, in case you can factor them in. On Tue, 19 Apr 2011, Miklos Szeredi wrote: > Michal Suchanek <hramrach@centrum.cz> writes: > > Applying this patch is not sufficient. Apparently more xattrs are > > needed but adding them on top of this patch should be easy. > > > > The ones mentioned in the overlayfs doc are > > > > trusted.overlay.whiteout > > trusted.overlay.opaque > > > > The patch implements security.* > > Here's an updated patch. It changes a number of things: > > - it implements shmem specific xattr ops instead of using the generic_* > functions. Which means that there's no back and forth between the > VFS and the filesystem. I basically copied the btrfs way of doing > things. > > - adds a new config option: CONFIG_TMPFS_XATTR and makes > CONFIG_TMPFS_POSIX_ACL depend on this. This way xattr support can be > turned on without also adding ACL support. > > - now supports trusted.* namespace needed by overlayfs in addition to > security.*. Doesn't yet support user.* since that needs more thought > wrt. kernel memory use. > > - supports xattrs on symlinks, again needed by overlayfs > > Hugh, Eric, does this look acceptable? Pretty much, I think. As ever I'm somewhat bemused by these xattrs, so don't count too much on my opinion. But if it looks good to xattr people, then it's welcome in tmpfs - thank you. Eric's ack would be good. Just a few comments... > > Thanks, > Miklos > > --- > From: Eric Paris <eparis@redhat.com> > Subject: tmpfs: implement generic xattr support > > This patch implements generic xattrs for tmpfs filesystems. The feodra > project, while trying to replace suid apps with file capabilities, > realized that tmpfs, which is used on the build systems, does not > support file capabilities and thus cannot be used to build packages > which use file capabilities. Xattrs are also needed for overlayfs. > > The xattr interface is a bit, odd. If a filesystem does not implement any > {get,set,list}xattr functions the VFS will call into some random LSM hooks and > the running LSM can then implement some method for handling xattrs. SELinux > for example provides a method to support security.selinux but no other > security.* xattrs. > > As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have > xattr handler routines specifically to handle acls. Because of this tmpfs > would loose the VFS/LSM helpers to support the running LSM. To make up for > that tmpfs had stub functions that did nothing but call into the LSM hooks > which implement the helpers. > > This new patch does not use the LSM fallback functions and instead > just implements a native get/set/list xattr feature for the full > security.* and trusted.* namespace like a normal filesystem. This > means that tmpfs can now support both security.selinux and > security.capability, which was not previously possible. > > The basic implementation is that I attach a: > > struct shmem_xattr { > struct list_head list; /* anchored by shmem_inode_info->xattr_list */ > char *name; > size_t size; > char value[0]; > }; > > Into the struct shmem_inode_info for each xattr that is set. This > implementation could easily support the user.* namespace as well, > except some care needs to be taken to prevent large amounts of > unswappable memory being allocated for unprivileged users. > > Signed-off-by: Eric Paris <eparis@redhat.com> > Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> > --- > fs/Kconfig | 18 ++ > include/linux/shmem_fs.h | 1 > mm/shmem.c | 302 +++++++++++++++++++++++++++++++++++++++-------- > 3 files changed, 273 insertions(+), 48 deletions(-) > > Index: linux-2.6/fs/Kconfig > =================================================================== > --- linux-2.6.orig/fs/Kconfig 2011-04-19 21:09:33.000000000 +0200 > +++ linux-2.6/fs/Kconfig 2011-04-19 21:09:35.000000000 +0200 > @@ -121,9 +121,25 @@ config TMPFS > > See <file:Documentation/filesystems/tmpfs.txt> for details. > > +config TMPFS_XATTR > + bool "Tmpfs extended attributes" > + depends on TMPFS > + default y > + help > + Extended attributes are name:value pairs associated with inodes by > + the kernel or by users (see the attr(5) manual page, or visit > + <http://acl.bestbits.at/> for details). > + > + Currently this enables support for the trusted.* and > + security.* namespaces. > + > + If unsure, say N. > + > + You need this for POSIX ACL support on tmpfs. > + I disagree with Andrew on this, it should default to off, consistent with the "If unsure, say N" comment. Partly because that's how Linus prefers it, for good anti-bloat reasons: if people have got along without a feature for years, whyever should we suddenly default it on? And partly because TMPFS_POSIX_ACL already defaults to off (as do EXT2_FS_XATTR and EXT2_FS_POSIX_ACL). However... if someone already has CONFIG_TMPFS_POSIX_ACL=y in their old config, it would be nice to make CONFIG_TMPFS_XATTR=y automatically, so we're not in danger of removing their old functionality. But I haven't thought of the right way to achieve that - maybe your helpful POSIX ACL comment is enough, but automatic would be better. Few people would choose TMPFS_XATTR on and TMPFS_POSIX_ACL off: perhaps it would work to make CONFIG_TMPFS_XATTR the new config option to cover both, and select it from config TMPFS_POSIX_ACL (without a prompt) so the oldconfig propagates correctly. Perhaps. But I haven't tried myself, and you may be forgiveably disinclined to make config experiments! > config TMPFS_POSIX_ACL > bool "Tmpfs POSIX Access Control Lists" > - depends on TMPFS > + depends on TMPFS_XATTR > select GENERIC_ACL > help > POSIX Access Control Lists (ACLs) support permissions for users and > Index: linux-2.6/include/linux/shmem_fs.h > =================================================================== > --- linux-2.6.orig/include/linux/shmem_fs.h 2011-04-19 21:09:25.000000000 +0200 > +++ linux-2.6/include/linux/shmem_fs.h 2011-04-19 21:09:35.000000000 +0200 > @@ -19,6 +19,7 @@ struct shmem_inode_info { > struct page *i_indirect; /* top indirect blocks page */ > swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */ > struct list_head swaplist; /* chain of maybes on swap */ > + struct list_head xattr_list; /* list of shmem_xattr */ > struct inode vfs_inode; > }; > > Index: linux-2.6/mm/shmem.c > =================================================================== > --- linux-2.6.orig/mm/shmem.c 2011-04-19 21:09:25.000000000 +0200 > +++ linux-2.6/mm/shmem.c 2011-04-19 21:09:35.000000000 +0200 ... > -#ifdef CONFIG_TMPFS_POSIX_ACL > +#ifdef CONFIG_TMPFS_XATTR > /* > - * Superblocks without xattr inode operations will get security.* xattr > - * support from the VFS "for free". As soon as we have any other xattrs > + * Superblocks without xattr inode operations may get some security.* xattr > + * support from the LSM "for free". As soon as we have any other xattrs > * like ACLs, we also need to implement the security.* handlers at > * filesystem level, though. > */ > > -static size_t shmem_xattr_security_list(struct dentry *dentry, char *list, > - size_t list_len, const char *name, > - size_t name_len, int handler_flags) > +static int shmem_xattr_get(struct dentry *dentry, const char *name, > + void *buffer, size_t size) > { > - return security_inode_listsecurity(dentry->d_inode, list, list_len); > -} > + struct shmem_inode_info *info; > + struct shmem_xattr *xattr; > + int ret = -ENODATA; > > -static int shmem_xattr_security_get(struct dentry *dentry, const char *name, > - void *buffer, size_t size, int handler_flags) > -{ > - if (strcmp(name, "") == 0) > - return -EINVAL; > - return xattr_getsecurity(dentry->d_inode, name, buffer, size); > + info = SHMEM_I(dentry->d_inode); > + > + spin_lock(&dentry->d_inode->i_lock); Not important, but I suggest you use info->lock throughout for this, instead of dentry->d_inode->i_lock: in each case you need "info" for info->xattr_list (and don't need "inode" at all I think), so info->lock seems appropriate, and may be in the same cacheline once I make shmem_inode_info smaller. But don't worry if you'd prefer to leave it. > -static int shmem_xattr_security_set(struct dentry *dentry, const char *name, > - const void *value, size_t size, int flags, int handler_flags) > +static int shmem_xattr_set(struct dentry *dentry, const char *name, > + const void *value, size_t size, int flags) > { > - if (strcmp(name, "") == 0) > - return -EINVAL; > - return security_inode_setsecurity(dentry->d_inode, name, value, > - size, flags); > + struct inode *inode = dentry->d_inode; > + struct shmem_inode_info *info = SHMEM_I(inode); > + struct shmem_xattr *xattr; > + struct shmem_xattr *new_xattr = NULL; > + size_t len; > + int err = 0; > + > + /* value == NULL means remove */ > + if (value) { > + /* wrap around? */ > + len = sizeof(*new_xattr) + size; > + if (len <= sizeof(*new_xattr)) > + return -ENOMEM; > + > + new_xattr = kmalloc(len, GFP_NOFS); There's no need for GFP_NOFS in this patch, the page reclaim path won't ever recurse into xattr handling: just use the usual GFP_KERNEL throughout. > + if (!new_xattr) > + return -ENOMEM; > + > + new_xattr->name = kstrdup(name, GFP_NOFS); > + if (!new_xattr->name) { > + kfree(new_xattr); > + return -ENOMEM; > + } > + > + new_xattr->size = size; > + memcpy(new_xattr->value, value, size); > + } > + > + spin_lock(&inode->i_lock); > + list_for_each_entry(xattr, &info->xattr_list, list) { > + if (!strcmp(name, xattr->name)) { > + if (flags & XATTR_CREATE) { > + xattr = new_xattr; > + err = -EEXIST; > + } else if (new_xattr) { > + list_replace(&xattr->list, &new_xattr->list); > + } else { > + list_del(&xattr->list); > + } > + goto out; > + } > + } > + if (flags & XATTR_REPLACE) { > + xattr = new_xattr; > + err = -ENODATA; > + } else { > + list_add(&new_xattr->list, &info->xattr_list); > + xattr = NULL; > + } > +out: > + spin_unlock(&inode->i_lock); > + if (xattr) > + kfree(xattr->name); > + kfree(xattr); > + return 0; I think you meant to return err rather than always 0. ... > +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size) > +{ > + bool trusted = capable(CAP_SYS_ADMIN); > + struct shmem_xattr *xattr; > + struct shmem_inode_info *info; > + size_t used = 0; > + > + info = SHMEM_I(dentry->d_inode); > + > + spin_lock(&dentry->d_inode->i_lock); > + list_for_each_entry(xattr, &info->xattr_list, list) { > + /* skip "trusted." attributes for unprivileged callers */ > + if (!trusted && xattr_is_trusted(xattr->name)) > + continue; > + > + used += strlen(xattr->name) + 1; > + if (buffer) { > + if (size < used) { > + used = -ERANGE; > + break; > + } > + strncpy(buffer, xattr->name, strlen(xattr->name) + 1); > + buffer += strlen(xattr->name) + 1; > + } > + } > + spin_unlock(&dentry->d_inode->i_lock); > + > + return used; > +} > #endif #endif /* CONFIG_TMPFS_XATTR */ would be welcome by the time we get here. Thanks, Hugh ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-05-12 4:20 ` Hugh Dickins @ 2011-05-12 7:52 ` Michal Suchanek 2011-05-12 12:27 ` Miklos Szeredi 1 sibling, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-05-12 7:52 UTC (permalink / raw) To: Hugh Dickins Cc: Miklos Szeredi, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Eric Paris, Andrew Morton, James Morris, Serge Hallyn On 12 May 2011 06:20, Hugh Dickins <hughd@google.com> wrote: > On Tue, 19 Apr 2011, Miklos Szeredi wrote: ... > > I disagree with Andrew on this, it should default to off, consistent with > the "If unsure, say N" comment. Partly because that's how Linus prefers > it, for good anti-bloat reasons: if people have got along without a > feature for years, whyever should we suddenly default it on? And > partly because TMPFS_POSIX_ACL already defaults to off (as do > EXT2_FS_XATTR and EXT2_FS_POSIX_ACL). > > However... if someone already has CONFIG_TMPFS_POSIX_ACL=y in their > old config, it would be nice to make CONFIG_TMPFS_XATTR=y automatically, > so we're not in danger of removing their old functionality. But I haven't > thought of the right way to achieve that - maybe your helpful POSIX ACL > comment is enough, but automatic would be better. > > Few people would choose TMPFS_XATTR on and TMPFS_POSIX_ACL off: perhaps it > would work to make CONFIG_TMPFS_XATTR the new config option to cover both, > and select it from config TMPFS_POSIX_ACL (without a prompt) so the > oldconfig propagates correctly. Perhaps. But I haven't tried myself, > and you may be forgiveably disinclined to make config experiments! The xattrs are required for overlayfs so "generic" kernels will probably want those. However, the "generic" kernels likely have the acls already so this is not that much of an issue. Perhaps the acls should select xattrs so that in configs that had acls the xattrs are added automagically. Or does it already work with depends? Thanks Michal > >> config TMPFS_POSIX_ACL >> bool "Tmpfs POSIX Access Control Lists" >> - depends on TMPFS >> + depends on TMPFS_XATTR >> select GENERIC_ACL >> help ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-05-12 4:20 ` Hugh Dickins 2011-05-12 7:52 ` Michal Suchanek @ 2011-05-12 12:27 ` Miklos Szeredi 2011-05-12 14:00 ` Miklos Szeredi 1 sibling, 1 reply; 81+ messages in thread From: Miklos Szeredi @ 2011-05-12 12:27 UTC (permalink / raw) To: Hugh Dickins Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Eric Paris, Andrew Morton, James Morris, Serge Hallyn Hi Hugh, Hugh Dickins <hughd@google.com> writes: > I understand from the mm-commits discussion that you're probably > preparing another version: so let me make just a few comments > on this old one now, in case you can factor them in. For sure. Thanks for your comments. >> +config TMPFS_XATTR >> + bool "Tmpfs extended attributes" >> + depends on TMPFS >> + default y >> + help >> + Extended attributes are name:value pairs associated with inodes by >> + the kernel or by users (see the attr(5) manual page, or visit >> + <http://acl.bestbits.at/> for details). >> + >> + Currently this enables support for the trusted.* and >> + security.* namespaces. >> + >> + If unsure, say N. >> + >> + You need this for POSIX ACL support on tmpfs. >> + > > I disagree with Andrew on this, it should default to off, consistent with > the "If unsure, say N" comment. Partly because that's how Linus prefers > it, for good anti-bloat reasons: if people have got along without a > feature for years, whyever should we suddenly default it on? And > partly because TMPFS_POSIX_ACL already defaults to off (as do > EXT2_FS_XATTR and EXT2_FS_POSIX_ACL). Okay, changed to "default n". > However... if someone already has CONFIG_TMPFS_POSIX_ACL=y in their > old config, it would be nice to make CONFIG_TMPFS_XATTR=y automatically, > so we're not in danger of removing their old functionality. But I haven't > thought of the right way to achieve that - maybe your helpful POSIX ACL > comment is enough, but automatic would be better. AFAICS we don't really support backward compatibility in Kconfig. If something breaks, it's the user's (kernel builder's) responsibility to fix up. > Few people would choose TMPFS_XATTR on and TMPFS_POSIX_ACL off: perhaps it > would work to make CONFIG_TMPFS_XATTR the new config option to cover both, > and select it from config TMPFS_POSIX_ACL (without a prompt) so the > oldconfig propagates correctly. Perhaps. But I haven't tried myself, > and you may be forgiveably disinclined to make config experiments! I think it would work, but I prefer to have these options separate and avoid messy back compat options. On overlayfs, for example, TMPFS_XATTR would be useful but TMPFS_POSIX_ACL would not. >> + info = SHMEM_I(dentry->d_inode); >> + >> + spin_lock(&dentry->d_inode->i_lock); > > Not important, but I suggest you use info->lock throughout for this, > instead of dentry->d_inode->i_lock: in each case you need "info" for > info->xattr_list (and don't need "inode" at all I think), so info->lock > seems appropriate, and may be in the same cacheline once I make > shmem_inode_info smaller. But don't worry if you'd prefer to leave > it. Makes sense. I updated the locking. >> + new_xattr = kmalloc(len, GFP_NOFS); > > There's no need for GFP_NOFS in this patch, the page reclaim path won't > ever recurse into xattr handling: just use the usual GFP_KERNEL > throughout. Yes. I didn't notice this in Eric's patch, but GFP_NOFS is obviously not necessary here. >> + if (!new_xattr) >> + return -ENOMEM; >> + >> + new_xattr->name = kstrdup(name, GFP_NOFS); >> + if (!new_xattr->name) { >> + kfree(new_xattr); >> + return -ENOMEM; >> + } >> + >> + new_xattr->size = size; >> + memcpy(new_xattr->value, value, size); >> + } >> + >> + spin_lock(&inode->i_lock); >> + list_for_each_entry(xattr, &info->xattr_list, list) { >> + if (!strcmp(name, xattr->name)) { >> + if (flags & XATTR_CREATE) { >> + xattr = new_xattr; >> + err = -EEXIST; >> + } else if (new_xattr) { >> + list_replace(&xattr->list, &new_xattr->list); >> + } else { >> + list_del(&xattr->list); >> + } >> + goto out; >> + } >> + } >> + if (flags & XATTR_REPLACE) { >> + xattr = new_xattr; >> + err = -ENODATA; >> + } else { >> + list_add(&new_xattr->list, &info->xattr_list); >> + xattr = NULL; >> + } >> +out: >> + spin_unlock(&inode->i_lock); >> + if (xattr) >> + kfree(xattr->name); >> + kfree(xattr); >> + return 0; > > I think you meant to return err rather than always 0. Right. Well spotted. >> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size) >> +{ >> + bool trusted = capable(CAP_SYS_ADMIN); >> + struct shmem_xattr *xattr; >> + struct shmem_inode_info *info; >> + size_t used = 0; >> + >> + info = SHMEM_I(dentry->d_inode); >> + >> + spin_lock(&dentry->d_inode->i_lock); >> + list_for_each_entry(xattr, &info->xattr_list, list) { >> + /* skip "trusted." attributes for unprivileged callers */ >> + if (!trusted && xattr_is_trusted(xattr->name)) >> + continue; >> + >> + used += strlen(xattr->name) + 1; >> + if (buffer) { >> + if (size < used) { >> + used = -ERANGE; >> + break; >> + } >> + strncpy(buffer, xattr->name, strlen(xattr->name) + 1); >> + buffer += strlen(xattr->name) + 1; >> + } >> + } >> + spin_unlock(&dentry->d_inode->i_lock); >> + >> + return used; >> +} >> #endif > > #endif /* CONFIG_TMPFS_XATTR */ > > would be welcome by the time we get here. Yes, fixed. Updated patch will follow shortly. Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-05-12 12:27 ` Miklos Szeredi @ 2011-05-12 14:00 ` Miklos Szeredi 2011-05-12 16:52 ` Hugh Dickins 0 siblings, 1 reply; 81+ messages in thread From: Miklos Szeredi @ 2011-05-12 14:00 UTC (permalink / raw) To: Hugh Dickins Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Eric Paris, Andrew Morton, James Morris, Serge Hallyn Miklos Szeredi <miklos@szeredi.hu> writes: >>> + info = SHMEM_I(dentry->d_inode); >>> + >>> + spin_lock(&dentry->d_inode->i_lock); >> >> Not important, but I suggest you use info->lock throughout for this, >> instead of dentry->d_inode->i_lock: in each case you need "info" for >> info->xattr_list (and don't need "inode" at all I think), so info->lock >> seems appropriate, and may be in the same cacheline once I make >> shmem_inode_info smaller. But don't worry if you'd prefer to leave >> it. > > Makes sense. I updated the locking. This uncovered a nasty bug lurking in there: the "info" area, including lock and xattr_list, may be overwritten by inline symlinks. Because xattr_list is near the end, this wasn't noticed with casual testing, but info->lock would immediately Oops on getxattr for symlinks. I propose the following solution. It results in slightly less space for inline symlinks, but correct operation for xattrs. Does the anonymous union/struct solution look acceptable? Thanks, Miklos Index: linux-2.6/include/linux/shmem_fs.h =================================================================== --- linux-2.6.orig/include/linux/shmem_fs.h 2011-05-12 15:59:08.000000000 +0200 +++ linux-2.6/include/linux/shmem_fs.h 2011-05-12 15:58:25.000000000 +0200 @@ -11,15 +11,39 @@ struct shmem_inode_info { spinlock_t lock; - unsigned long flags; - unsigned long alloced; /* data pages alloced to file */ - unsigned long swapped; /* subtotal assigned to swap */ - unsigned long next_index; /* highest alloced index + 1 */ - struct shared_policy policy; /* NUMA memory alloc policy */ - struct page *i_indirect; /* top indirect blocks page */ - swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */ - struct list_head swaplist; /* chain of maybes on swap */ - struct list_head xattr_list; /* list of shmem_xattr */ + + /* list of shmem_xattr */ + struct list_head xattr_list; + + union { + char inline_symlink[0]; + + /* Members not used by inline symlinks: */ + struct { + unsigned long flags; + + /* data pages alloced to file */ + unsigned long alloced; + + /* subtotal assigned to swap */ + unsigned long swapped; + + /* highest alloced index + 1 */ + unsigned long next_index; + + /* NUMA memory alloc policy */ + struct shared_policy policy; + + /* top indirect blocks page */ + struct page *i_indirect; + + /* first blocks */ + swp_entry_t i_direct[SHMEM_NR_DIRECT]; + + /* chain of maybes on swap */ + struct list_head swaplist; + }; + }; struct inode vfs_inode; }; Index: linux-2.6/mm/shmem.c =================================================================== --- linux-2.6.orig/mm/shmem.c 2011-05-12 15:59:08.000000000 +0200 +++ linux-2.6/mm/shmem.c 2011-05-12 15:50:10.000000000 +0200 @@ -2029,9 +2029,9 @@ static int shmem_symlink(struct inode *d info = SHMEM_I(inode); inode->i_size = len-1; - if (len <= (char *)inode - (char *)info) { + if (len <= (char *)inode - info->inline_symlink) { /* do it inline */ - memcpy(info, symname, len); + memcpy(info->inline_symlink, symname, len); inode->i_op = &shmem_symlink_inline_operations; } else { error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL); @@ -2057,7 +2057,7 @@ static int shmem_symlink(struct inode *d static void *shmem_follow_link_inline(struct dentry *dentry, struct nameidata *nd) { - nd_set_link(nd, (char *)SHMEM_I(dentry->d_inode)); + nd_set_link(nd, SHMEM_I(dentry->d_inode)->inline_symlink); return NULL; } ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: [PATCH] tmpfs: implement generic xattr support 2011-05-12 14:00 ` Miklos Szeredi @ 2011-05-12 16:52 ` Hugh Dickins 0 siblings, 0 replies; 81+ messages in thread From: Hugh Dickins @ 2011-05-12 16:52 UTC (permalink / raw) To: Miklos Szeredi Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig, Eric Paris, Andrew Morton, James Morris, Serge Hallyn On Thu, 12 May 2011, Miklos Szeredi wrote: > Miklos Szeredi <miklos@szeredi.hu> writes: > > >>> + info = SHMEM_I(dentry->d_inode); > >>> + > >>> + spin_lock(&dentry->d_inode->i_lock); > >> > >> Not important, but I suggest you use info->lock throughout for this, > >> instead of dentry->d_inode->i_lock: in each case you need "info" for > >> info->xattr_list (and don't need "inode" at all I think), so info->lock > >> seems appropriate, and may be in the same cacheline once I make > >> shmem_inode_info smaller. But don't worry if you'd prefer to leave > >> it. > > > > Makes sense. I updated the locking. > > This uncovered a nasty bug lurking in there: the "info" area, including > lock and xattr_list, may be overwritten by inline symlinks. Because > xattr_list is near the end, this wasn't noticed with casual testing, but > info->lock would immediately Oops on getxattr for symlinks. Yikes, I'd completely forgotten about those inline symlinks: many thanks for reminding me. > > I propose the following solution. It results in slightly less space for > inline symlinks, but correct operation for xattrs. Does the anonymous > union/struct solution look acceptable? You're being conscientious to minimize the space reduction, and I wonder if I'm being sloppy about it: but I think I'd prefer you to keep it simple and just make a union of the i_direct[SHMEM_NR_DIRECT] array and the inline symlink buffer. That does waste space that was occasionally being put to use before, but saves us from embarrassment next time we forget about the inline symlinks. I intend to be removing that i_direct array very soon: I guess I'll want to kmalloc for short symlinks then, certainly not overlaying over what fields are left: so you'd be moving in that direction if you just reuse the i_direct area now. Hugh > > Thanks, > Miklos > > Index: linux-2.6/include/linux/shmem_fs.h > =================================================================== > --- linux-2.6.orig/include/linux/shmem_fs.h 2011-05-12 15:59:08.000000000 +0200 > +++ linux-2.6/include/linux/shmem_fs.h 2011-05-12 15:58:25.000000000 +0200 > @@ -11,15 +11,39 @@ > > struct shmem_inode_info { > spinlock_t lock; > - unsigned long flags; > - unsigned long alloced; /* data pages alloced to file */ > - unsigned long swapped; /* subtotal assigned to swap */ > - unsigned long next_index; /* highest alloced index + 1 */ > - struct shared_policy policy; /* NUMA memory alloc policy */ > - struct page *i_indirect; /* top indirect blocks page */ > - swp_entry_t i_direct[SHMEM_NR_DIRECT]; /* first blocks */ > - struct list_head swaplist; /* chain of maybes on swap */ > - struct list_head xattr_list; /* list of shmem_xattr */ > + > + /* list of shmem_xattr */ > + struct list_head xattr_list; > + > + union { > + char inline_symlink[0]; > + > + /* Members not used by inline symlinks: */ > + struct { > + unsigned long flags; > + > + /* data pages alloced to file */ > + unsigned long alloced; > + > + /* subtotal assigned to swap */ > + unsigned long swapped; > + > + /* highest alloced index + 1 */ > + unsigned long next_index; > + > + /* NUMA memory alloc policy */ > + struct shared_policy policy; > + > + /* top indirect blocks page */ > + struct page *i_indirect; > + > + /* first blocks */ > + swp_entry_t i_direct[SHMEM_NR_DIRECT]; > + > + /* chain of maybes on swap */ > + struct list_head swaplist; > + }; > + }; > struct inode vfs_inode; > }; > > Index: linux-2.6/mm/shmem.c > =================================================================== > --- linux-2.6.orig/mm/shmem.c 2011-05-12 15:59:08.000000000 +0200 > +++ linux-2.6/mm/shmem.c 2011-05-12 15:50:10.000000000 +0200 > @@ -2029,9 +2029,9 @@ static int shmem_symlink(struct inode *d > > info = SHMEM_I(inode); > inode->i_size = len-1; > - if (len <= (char *)inode - (char *)info) { > + if (len <= (char *)inode - info->inline_symlink) { > /* do it inline */ > - memcpy(info, symname, len); > + memcpy(info->inline_symlink, symname, len); > inode->i_op = &shmem_symlink_inline_operations; > } else { > error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL); > @@ -2057,7 +2057,7 @@ static int shmem_symlink(struct inode *d > > static void *shmem_follow_link_inline(struct dentry *dentry, struct nameidata *nd) > { > - nd_set_link(nd, (char *)SHMEM_I(dentry->d_inode)); > + nd_set_link(nd, SHMEM_I(dentry->d_inode)->inline_symlink); > return NULL; > } ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 22:18 ` Andreas Dilger @ 2011-04-18 13:34 ` Michal Suchanek 2011-04-18 13:34 ` Michal Suchanek 2011-04-18 13:37 ` Michal Suchanek 2 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-18 13:34 UTC (permalink / raw) To: Andreas Dilger Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote: > On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote: >> Apparently tmpfs does not support generic xattr. I understand why >> tmpfs is an attractive choice for an upper filesystem, so this should >> be addressed. >> >> I see two options here: >> >> 1) implement generic xattr in tmpfs > > There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels: > > From: Eric Paris <eparis@redhat.com> > Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace > Date: March 29, 2011 12:56:49 PM MDT > > Cheers, Andreas Applying this patch is not sufficient. Apparently more xattrs are needed but adding them on top of this patch should be easy. The ones mentioned in the overlayfs doc are trusted.overlay.whiteout trusted.overlay.opaque The patch implements security.* Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? @ 2011-04-18 13:34 ` Michal Suchanek 0 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-18 13:34 UTC (permalink / raw) To: Andreas Dilger Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote: > On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote: >> Apparently tmpfs does not support generic xattr. I understand why >> tmpfs is an attractive choice for an upper filesystem, so this should >> be addressed. >> >> I see two options here: >> >> 1) implement generic xattr in tmpfs > > There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels: > > From: Eric Paris <eparis@redhat.com> > Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace > Date: March 29, 2011 12:56:49 PM MDT > > Cheers, Andreas Applying this patch is not sufficient. Apparently more xattrs are needed but adding them on top of this patch should be easy. The ones mentioned in the overlayfs doc are trusted.overlay.whiteout trusted.overlay.opaque The patch implements security.* Thanks Michal -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-15 22:18 ` Andreas Dilger @ 2011-04-18 13:37 ` Michal Suchanek 2011-04-18 13:34 ` Michal Suchanek 2011-04-18 13:37 ` Michal Suchanek 2 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-18 13:37 UTC (permalink / raw) To: Andreas Dilger Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote: > On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote: >> Apparently tmpfs does not support generic xattr. I understand why >> tmpfs is an attractive choice for an upper filesystem, so this should >> be addressed. >> >> I see two options here: >> >> 1) implement generic xattr in tmpfs > > There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels: > > From: Eric Paris <eparis@redhat.com> > Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace > Date: March 29, 2011 12:56:49 PM MDT > > Cheers, Andreas Applying this patch is not sufficient. Apparently more xattrs are needed but adding them on top of this patch should be easy. The ones mentioned in the overlayfs doc are trusted.overlay.whiteout trusted.overlay.opaque The patch implements security.* Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? @ 2011-04-18 13:37 ` Michal Suchanek 0 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-18 13:37 UTC (permalink / raw) To: Andreas Dilger Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote: > On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote: >> Apparently tmpfs does not support generic xattr. I understand why >> tmpfs is an attractive choice for an upper filesystem, so this should >> be addressed. >> >> I see two options here: >> >> 1) implement generic xattr in tmpfs > > There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels: > > From: Eric Paris <eparis@redhat.com> > Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace > Date: March 29, 2011 12:56:49 PM MDT > > Cheers, Andreas Applying this patch is not sufficient. Apparently more xattrs are needed but adding them on top of this patch should be easy. The ones mentioned in the overlayfs doc are trusted.overlay.whiteout trusted.overlay.opaque The patch implements security.* Thanks Michal -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-12 21:36 ` Michal Suchanek 2011-04-13 14:18 ` Jiri Kosina @ 2011-04-13 17:26 ` Ric Wheeler 2011-04-13 18:58 ` Michal Suchanek 1 sibling, 1 reply; 81+ messages in thread From: Ric Wheeler @ 2011-04-13 17:26 UTC (permalink / raw) To: Michal Suchanek Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, miklos, Christoph Hellwig On 04/12/2011 05:36 PM, Michal Suchanek wrote: > On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com> wrote: >> On 04/12/2011 11:00 AM, Michal Suchanek wrote: >>> Hello, >>> >>> as some already know the Unionmount VFS union which has been in >>> development for some years now is the only True Union (TM) that can be >>> accepted into the kernel mainline by the VFS maintainers (for reasons >>> of their own which you can surely find if you search the web or ask >>> them directly). >>> >>> The current UnionMount version that can be found here: >>> >>> >>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works >>> >>> works for me as good as aufs does. That is I can build a live CD using >>> this unioning solution and it boots and runs without any apparent >>> issues. >>> >>> There are probably many possible uses of the union which I did not >>> test nor did I test long term stability of using the unioned >>> filesystem. As far as ephemeral live systems go it works fine for me, >>> though. >>> >>> The issue is that while the code is (nearly) finished it is not yet >>> merged into mainline and as I am not familiar with the details of >>> ever-changing Linux VFS layer forward-porting this code to current >>> kernels is somewhat challenging. >>> >>> What is the plan with unionmount now? >>> >>> What is required for it to be merged into mainline? >>> >>> Thanks >>> >>> Michal >> Hi Michal, >> >> People are actively looking to see what union mount (or overlayfs) solution >> to pursue. Val has shifted her focus away from kernel hacking these days, >> but did refresh her patch set in the last month or so. > I am not aware of such refreshed patch set, at least it is not > published in her repo. > Val posted the refreshed patches with the title on March 22nd: http://lwn.net/Articles/435019/ Ric >> Miklos has an overlay file system that has also been posted upstream. > I have no idea about his other overlay filesystem either. > >> I think that testing like you did and getting more eyes to look at the code >> is the next key step for both projects. >> > Could you provide more details? > > I am not following the Linux fs development closely. > > Thanks > > Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-13 17:26 ` Ric Wheeler @ 2011-04-13 18:58 ` Michal Suchanek 2011-04-13 19:11 ` Ric Wheeler 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-13 18:58 UTC (permalink / raw) To: Ric Wheeler Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, miklos, Christoph Hellwig On 13 April 2011 19:26, Ric Wheeler <ricwheeler@gmail.com> wrote: > On 04/12/2011 05:36 PM, Michal Suchanek wrote: >> >> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com> wrote: >>> >>> On 04/12/2011 11:00 AM, Michal Suchanek wrote: >>>> >>>> Hello, >>>> >>>> as some already know the Unionmount VFS union which has been in >>>> development for some years now is the only True Union (TM) that can be >>>> accepted into the kernel mainline by the VFS maintainers (for reasons >>>> of their own which you can surely find if you search the web or ask >>>> them directly). >>>> >>>> The current UnionMount version that can be found here: >>>> >>>> >>>> >>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works >>>> >>>> works for me as good as aufs does. That is I can build a live CD using >>>> this unioning solution and it boots and runs without any apparent >>>> issues. >>>> >>>> There are probably many possible uses of the union which I did not >>>> test nor did I test long term stability of using the unioned >>>> filesystem. As far as ephemeral live systems go it works fine for me, >>>> though. >>>> >>>> The issue is that while the code is (nearly) finished it is not yet >>>> merged into mainline and as I am not familiar with the details of >>>> ever-changing Linux VFS layer forward-porting this code to current >>>> kernels is somewhat challenging. >>>> >>>> What is the plan with unionmount now? >>>> >>>> What is required for it to be merged into mainline? >>>> >>>> Thanks >>>> >>>> Michal >>> >>> Hi Michal, >>> >>> People are actively looking to see what union mount (or overlayfs) >>> solution >>> to pursue. Val has shifted her focus away from kernel hacking these days, >>> but did refresh her patch set in the last month or so. >> >> I am not aware of such refreshed patch set, at least it is not >> published in her repo. >> > > Val posted the refreshed patches with the title on March 22nd: > > http://lwn.net/Articles/435019/ > That article references the same four months old repo which I mentioned at the start of the thread, only a slightly different branch. While it maybe useful for testing unionmount (which I already tried) it is not a patch against current kernel which could be used to build current live images. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-13 18:58 ` Michal Suchanek @ 2011-04-13 19:11 ` Ric Wheeler 2011-04-13 19:47 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Ric Wheeler @ 2011-04-13 19:11 UTC (permalink / raw) To: Michal Suchanek Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, miklos, Christoph Hellwig On 04/13/2011 02:58 PM, Michal Suchanek wrote: > On 13 April 2011 19:26, Ric Wheeler<ricwheeler@gmail.com> wrote: >> On 04/12/2011 05:36 PM, Michal Suchanek wrote: >>> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com> wrote: >>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote: >>>>> Hello, >>>>> >>>>> as some already know the Unionmount VFS union which has been in >>>>> development for some years now is the only True Union (TM) that can be >>>>> accepted into the kernel mainline by the VFS maintainers (for reasons >>>>> of their own which you can surely find if you search the web or ask >>>>> them directly). >>>>> >>>>> The current UnionMount version that can be found here: >>>>> >>>>> >>>>> >>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works >>>>> >>>>> works for me as good as aufs does. That is I can build a live CD using >>>>> this unioning solution and it boots and runs without any apparent >>>>> issues. >>>>> >>>>> There are probably many possible uses of the union which I did not >>>>> test nor did I test long term stability of using the unioned >>>>> filesystem. As far as ephemeral live systems go it works fine for me, >>>>> though. >>>>> >>>>> The issue is that while the code is (nearly) finished it is not yet >>>>> merged into mainline and as I am not familiar with the details of >>>>> ever-changing Linux VFS layer forward-porting this code to current >>>>> kernels is somewhat challenging. >>>>> >>>>> What is the plan with unionmount now? >>>>> >>>>> What is required for it to be merged into mainline? >>>>> >>>>> Thanks >>>>> >>>>> Michal >>>> Hi Michal, >>>> >>>> People are actively looking to see what union mount (or overlayfs) >>>> solution >>>> to pursue. Val has shifted her focus away from kernel hacking these days, >>>> but did refresh her patch set in the last month or so. >>> I am not aware of such refreshed patch set, at least it is not >>> published in her repo. >>> >> Val posted the refreshed patches with the title on March 22nd: >> >> http://lwn.net/Articles/435019/ >> > That article references the same four months old repo which I > mentioned at the start of the thread, only a slightly different > branch. > > While it maybe useful for testing unionmount (which I already tried) > it is not a patch against current kernel which could be used to build > current live images. > > Thanks > > Michal She did post the patch series that same date in March - you can probably grab the series from linux-fsdevel, look for this series: "[PATCH 00/74] Union mounts version something or other" Al Viro was planning on looking at her refreshed patches (he had reviewed them with her in person), but that is not going to happen any time soon so getting more eyes and testing would be great! Ric ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-13 19:11 ` Ric Wheeler @ 2011-04-13 19:47 ` Michal Suchanek 2011-04-14 4:50 ` Ian Kent 2011-04-14 19:14 ` David Howells 0 siblings, 2 replies; 81+ messages in thread From: Michal Suchanek @ 2011-04-13 19:47 UTC (permalink / raw) To: Ric Wheeler Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer, miklos, Christoph Hellwig On 13 April 2011 21:11, Ric Wheeler <ricwheeler@gmail.com> wrote: > On 04/13/2011 02:58 PM, Michal Suchanek wrote: >> >> On 13 April 2011 19:26, Ric Wheeler<ricwheeler@gmail.com> wrote: >>> >>> On 04/12/2011 05:36 PM, Michal Suchanek wrote: >>>> >>>> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com> wrote: >>>>> >>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote: >>>>>> >>>>>> Hello, >>>>>> >>>>>> as some already know the Unionmount VFS union which has been in >>>>>> development for some years now is the only True Union (TM) that can be >>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons >>>>>> of their own which you can surely find if you search the web or ask >>>>>> them directly). >>>>>> >>>>>> The current UnionMount version that can be found here: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works >>>>>> >>>>>> works for me as good as aufs does. That is I can build a live CD using >>>>>> this unioning solution and it boots and runs without any apparent >>>>>> issues. >>>>>> >>>>>> There are probably many possible uses of the union which I did not >>>>>> test nor did I test long term stability of using the unioned >>>>>> filesystem. As far as ephemeral live systems go it works fine for me, >>>>>> though. >>>>>> >>>>>> The issue is that while the code is (nearly) finished it is not yet >>>>>> merged into mainline and as I am not familiar with the details of >>>>>> ever-changing Linux VFS layer forward-porting this code to current >>>>>> kernels is somewhat challenging. >>>>>> >>>>>> What is the plan with unionmount now? >>>>>> >>>>>> What is required for it to be merged into mainline? >>>>>> >>>>>> Thanks >>>>>> >>>>>> Michal >>>>> >>>>> Hi Michal, >>>>> >>>>> People are actively looking to see what union mount (or overlayfs) >>>>> solution >>>>> to pursue. Val has shifted her focus away from kernel hacking these >>>>> days, >>>>> but did refresh her patch set in the last month or so. >>>> >>>> I am not aware of such refreshed patch set, at least it is not >>>> published in her repo. >>>> >>> Val posted the refreshed patches with the title on March 22nd: >>> >>> http://lwn.net/Articles/435019/ >>> >> That article references the same four months old repo which I >> mentioned at the start of the thread, only a slightly different >> branch. >> >> While it maybe useful for testing unionmount (which I already tried) >> it is not a patch against current kernel which could be used to build >> current live images. >> >> Thanks >> >> Michal > > She did post the patch series that same date in March - you can probably > grab the series from linux-fsdevel, look for this series: > > "[PATCH 00/74] Union mounts version something or other" > > Al Viro was planning on looking at her refreshed patches (he had reviewed > them with her in person), but that is not going to happen any time soon so > getting more eyes and testing would be great! > Even gmame can't collect the patches back from the ML, I don't want to try. However, the discussion suggests that these are exactly the 4 months old branch ending in a commit with the summary "Temporary commit" which did not inspire confidence in me so I used the previous (also 4 moths old) branch. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-13 19:47 ` Michal Suchanek @ 2011-04-14 4:50 ` Ian Kent 2011-04-14 9:32 ` Michal Suchanek 2011-04-14 19:14 ` David Howells 1 sibling, 1 reply; 81+ messages in thread From: Ian Kent @ 2011-04-14 4:50 UTC (permalink / raw) To: Michal Suchanek Cc: Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Jeff Moyer, miklos, Christoph Hellwig On Wed, 2011-04-13 at 21:47 +0200, Michal Suchanek wrote: > On 13 April 2011 21:11, Ric Wheeler <ricwheeler@gmail.com> wrote: > > On 04/13/2011 02:58 PM, Michal Suchanek wrote: > >> > >> On 13 April 2011 19:26, Ric Wheeler<ricwheeler@gmail.com> wrote: > >>> > >>> On 04/12/2011 05:36 PM, Michal Suchanek wrote: > >>>> > >>>> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com> wrote: > >>>>> > >>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote: > >>>>>> > >>>>>> Hello, > >>>>>> > >>>>>> as some already know the Unionmount VFS union which has been in > >>>>>> development for some years now is the only True Union (TM) that can be > >>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons > >>>>>> of their own which you can surely find if you search the web or ask > >>>>>> them directly). > >>>>>> > >>>>>> The current UnionMount version that can be found here: > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works > >>>>>> > >>>>>> works for me as good as aufs does. That is I can build a live CD using > >>>>>> this unioning solution and it boots and runs without any apparent > >>>>>> issues. > >>>>>> > >>>>>> There are probably many possible uses of the union which I did not > >>>>>> test nor did I test long term stability of using the unioned > >>>>>> filesystem. As far as ephemeral live systems go it works fine for me, > >>>>>> though. > >>>>>> > >>>>>> The issue is that while the code is (nearly) finished it is not yet > >>>>>> merged into mainline and as I am not familiar with the details of > >>>>>> ever-changing Linux VFS layer forward-porting this code to current > >>>>>> kernels is somewhat challenging. > >>>>>> > >>>>>> What is the plan with unionmount now? > >>>>>> > >>>>>> What is required for it to be merged into mainline? > >>>>>> > >>>>>> Thanks > >>>>>> > >>>>>> Michal > >>>>> > >>>>> Hi Michal, > >>>>> > >>>>> People are actively looking to see what union mount (or overlayfs) > >>>>> solution > >>>>> to pursue. Val has shifted her focus away from kernel hacking these > >>>>> days, > >>>>> but did refresh her patch set in the last month or so. > >>>> > >>>> I am not aware of such refreshed patch set, at least it is not > >>>> published in her repo. > >>>> > >>> Val posted the refreshed patches with the title on March 22nd: > >>> > >>> http://lwn.net/Articles/435019/ > >>> > >> That article references the same four months old repo which I > >> mentioned at the start of the thread, only a slightly different > >> branch. > >> > >> While it maybe useful for testing unionmount (which I already tried) > >> it is not a patch against current kernel which could be used to build > >> current live images. > >> > >> Thanks > >> > >> Michal > > > > She did post the patch series that same date in March - you can probably > > grab the series from linux-fsdevel, look for this series: > > > > "[PATCH 00/74] Union mounts version something or other" > > > > Al Viro was planning on looking at her refreshed patches (he had reviewed > > them with her in person), but that is not going to happen any time soon so > > getting more eyes and testing would be great! > > > > Even gmame can't collect the patches back from the ML, I don't want to try. > > However, the discussion suggests that these are exactly the 4 months > old branch ending in a commit with the summary "Temporary commit" > which did not inspire confidence in me so I used the previous (also 4 > moths old) branch. Yes, that's the impression I have too. I believe David was working to update the patches and his silence indicates he is probably bogged down with other priority work. If that's the case, and your still interested, I might be able to help updating the series some time soon. I haven't reviewed any of Val's series posts for a while now so I'd need to catch up with the current state of the project first. I guess the first thing is to find out if David has made any progress, David? As for the overlayfs from Miklos I haven't looked closely at it but since Miklos hasn't replied so far I'm guessing there's still a way to with that as well. Ian ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-14 4:50 ` Ian Kent @ 2011-04-14 9:32 ` Michal Suchanek 2011-04-14 9:40 ` Miklos Szeredi 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-14 9:32 UTC (permalink / raw) To: Ian Kent Cc: Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Jeff Moyer, miklos, Christoph Hellwig On 14 April 2011 06:50, Ian Kent <ikent@redhat.com> wrote: > On Wed, 2011-04-13 at 21:47 +0200, Michal Suchanek wrote: >> On 13 April 2011 21:11, Ric Wheeler <ricwheeler@gmail.com> wrote: >> > On 04/13/2011 02:58 PM, Michal Suchanek wrote: >> >> >> >> On 13 April 2011 19:26, Ric Wheeler<ricwheeler@gmail.com> wrote: >> >>> >> >>> On 04/12/2011 05:36 PM, Michal Suchanek wrote: >> >>>> >> >>>> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com> wrote: >> >>>>> >> >>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote: >> >>>>>> >> >>>>>> Hello, >> >>>>>> >> >>>>>> as some already know the Unionmount VFS union which has been in >> >>>>>> development for some years now is the only True Union (TM) that can be >> >>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons >> >>>>>> of their own which you can surely find if you search the web or ask >> >>>>>> them directly). >> >>>>>> >> >>>>>> The current UnionMount version that can be found here: >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works >> >>>>>> >> >>>>>> works for me as good as aufs does. That is I can build a live CD using >> >>>>>> this unioning solution and it boots and runs without any apparent >> >>>>>> issues. >> >>>>>> >> >>>>>> There are probably many possible uses of the union which I did not >> >>>>>> test nor did I test long term stability of using the unioned >> >>>>>> filesystem. As far as ephemeral live systems go it works fine for me, >> >>>>>> though. >> >>>>>> >> >>>>>> The issue is that while the code is (nearly) finished it is not yet >> >>>>>> merged into mainline and as I am not familiar with the details of >> >>>>>> ever-changing Linux VFS layer forward-porting this code to current >> >>>>>> kernels is somewhat challenging. >> >>>>>> >> >>>>>> What is the plan with unionmount now? >> >>>>>> >> >>>>>> What is required for it to be merged into mainline? >> >>>>>> >> >>>>>> Thanks >> >>>>>> >> >>>>>> Michal >> >>>>> >> >>>>> Hi Michal, >> >>>>> >> >>>>> People are actively looking to see what union mount (or overlayfs) >> >>>>> solution >> >>>>> to pursue. Val has shifted her focus away from kernel hacking these >> >>>>> days, >> >>>>> but did refresh her patch set in the last month or so. >> >>>> >> >>>> I am not aware of such refreshed patch set, at least it is not >> >>>> published in her repo. >> >>>> >> >>> Val posted the refreshed patches with the title on March 22nd: >> >>> >> >>> http://lwn.net/Articles/435019/ >> >>> >> >> That article references the same four months old repo which I >> >> mentioned at the start of the thread, only a slightly different >> >> branch. >> >> >> >> While it maybe useful for testing unionmount (which I already tried) >> >> it is not a patch against current kernel which could be used to build >> >> current live images. >> >> >> >> Thanks >> >> >> >> Michal >> > >> > She did post the patch series that same date in March - you can probably >> > grab the series from linux-fsdevel, look for this series: >> > >> > "[PATCH 00/74] Union mounts version something or other" >> > >> > Al Viro was planning on looking at her refreshed patches (he had reviewed >> > them with her in person), but that is not going to happen any time soon so >> > getting more eyes and testing would be great! >> > >> >> Even gmame can't collect the patches back from the ML, I don't want to try. >> >> However, the discussion suggests that these are exactly the 4 months >> old branch ending in a commit with the summary "Temporary commit" >> which did not inspire confidence in me so I used the previous (also 4 >> moths old) branch. > > Yes, that's the impression I have too. > > I believe David was working to update the patches and his silence > indicates he is probably bogged down with other priority work. If that's > the case, and your still interested, I might be able to help updating > the series some time soon. I haven't reviewed any of Val's series posts > for a while now so I'd need to catch up with the current state of the > project first. I guess overlayfs includes the better part of unionmount and achieves similar level of functionality in much smaller code size and is actively developed. This might make it the best candidate for inclusion so far. It does not (yet?) support NFS which is one of the options commonly used with union solutions, though. I personally don;t use NFS and have not tested overlayfs so far so I can't tell. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-14 9:32 ` Michal Suchanek @ 2011-04-14 9:40 ` Miklos Szeredi 2011-04-14 13:21 ` Ric Wheeler 0 siblings, 1 reply; 81+ messages in thread From: Miklos Szeredi @ 2011-04-14 9:40 UTC (permalink / raw) To: Michal Suchanek Cc: Ian Kent, Ric Wheeler, linux-fsdevel, linux-kernel, David Howells, Jeff Moyer, Christoph Hellwig On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek <hramrach@centrum.cz> wrote: > I guess overlayfs includes the better part of unionmount and achieves > similar level of functionality in much smaller code size and is > actively developed. > > This might make it the best candidate for inclusion so far. > > It does not (yet?) support NFS which is one of the options commonly > used with union solutions, though. NFS is supported as a lower (read-only) layer, but not as an upper (read-write) layer. Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-14 9:40 ` Miklos Szeredi @ 2011-04-14 13:21 ` Ric Wheeler 2011-04-14 14:54 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Ric Wheeler @ 2011-04-14 13:21 UTC (permalink / raw) To: Miklos Szeredi, Christoph Hellwig Cc: Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel, David Howells, Jeff Moyer On 04/14/2011 05:40 AM, Miklos Szeredi wrote: > On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek<hramrach@centrum.cz> wrote: >> I guess overlayfs includes the better part of unionmount and achieves >> similar level of functionality in much smaller code size and is >> actively developed. >> >> This might make it the best candidate for inclusion so far. >> >> It does not (yet?) support NFS which is one of the options commonly >> used with union solutions, though. > NFS is supported as a lower (read-only) layer, but not as an upper > (read-write) layer. > > Thanks, > Miklos I am not that concerned with the state of Val's repo, her intention was to hand off the project cleanly to others and have them drive the code (that hand off was the posting of the patch set). Several people (Ian, David Howells and Al Viro) had been involved with union mounts recently, so we do have reasonable candidates for a hand off. One of the concerns with unionfs is the duplication of data. Union mounts avoids this with that implementation. That might make unionfs more of a burden for very large file systems, but probably not a concern for many use cases. The union mount patch set is certainly considerably larger. Christoph had expressed some concerns about locking that I think it would be good to discuss again as well. Ric ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-14 13:21 ` Ric Wheeler @ 2011-04-14 14:54 ` Michal Suchanek 2011-04-15 16:31 ` Ric Wheeler 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-04-14 14:54 UTC (permalink / raw) To: Ric Wheeler Cc: Miklos Szeredi, Christoph Hellwig, Ian Kent, linux-fsdevel, linux-kernel, David Howells, Jeff Moyer On 14 April 2011 15:21, Ric Wheeler <ricwheeler@gmail.com> wrote: > On 04/14/2011 05:40 AM, Miklos Szeredi wrote: >> >> On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek<hramrach@centrum.cz> >> wrote: >>> >>> I guess overlayfs includes the better part of unionmount and achieves >>> similar level of functionality in much smaller code size and is >>> actively developed. >>> >>> This might make it the best candidate for inclusion so far. >>> >>> It does not (yet?) support NFS which is one of the options commonly >>> used with union solutions, though. >> >> NFS is supported as a lower (read-only) layer, but not as an upper >> (read-write) layer. >> >> Thanks, >> Miklos > > I am not that concerned with the state of Val's repo, her intention was to > hand off the project cleanly to others and have them drive the code (that > hand off was the posting of the patch set). Several people (Ian, David > Howells and Al Viro) had been involved with union mounts recently, so we do > have reasonable candidates for a hand off. > > One of the concerns with unionfs is the duplication of data. Union mounts > avoids this with that implementation. That might make unionfs more of a > burden for very large file systems, but probably not a concern for many use > cases. Just to make things clear, what is a very large filesystem? A heavily compressed DVD image? Tens or hundreds of gigabytes? Terabytes? Hundreds, thousands or hundreds of thousands of inodes? Or is testing required to determine at what size the performance becomes unacceptable? Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-14 14:54 ` Michal Suchanek @ 2011-04-15 16:31 ` Ric Wheeler 0 siblings, 0 replies; 81+ messages in thread From: Ric Wheeler @ 2011-04-15 16:31 UTC (permalink / raw) To: Michal Suchanek Cc: Miklos Szeredi, Christoph Hellwig, Ian Kent, linux-fsdevel, linux-kernel, David Howells, Jeff Moyer On 04/14/2011 10:54 AM, Michal Suchanek wrote: > On 14 April 2011 15:21, Ric Wheeler<ricwheeler@gmail.com> wrote: >> On 04/14/2011 05:40 AM, Miklos Szeredi wrote: >>> On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek<hramrach@centrum.cz> >>> wrote: >>>> I guess overlayfs includes the better part of unionmount and achieves >>>> similar level of functionality in much smaller code size and is >>>> actively developed. >>>> >>>> This might make it the best candidate for inclusion so far. >>>> >>>> It does not (yet?) support NFS which is one of the options commonly >>>> used with union solutions, though. >>> NFS is supported as a lower (read-only) layer, but not as an upper >>> (read-write) layer. >>> >>> Thanks, >>> Miklos >> I am not that concerned with the state of Val's repo, her intention was to >> hand off the project cleanly to others and have them drive the code (that >> hand off was the posting of the patch set). Several people (Ian, David >> Howells and Al Viro) had been involved with union mounts recently, so we do >> have reasonable candidates for a hand off. >> >> One of the concerns with unionfs is the duplication of data. Union mounts >> avoids this with that implementation. That might make unionfs more of a >> burden for very large file systems, but probably not a concern for many use >> cases. > Just to make things clear, what is a very large filesystem? > > A heavily compressed DVD image? > > Tens or hundreds of gigabytes? Terabytes? > > Hundreds, thousands or hundreds of thousands of inodes? > > Or is testing required to determine at what size the performance > becomes unacceptable? > > Thanks > > Michal Very large in the number of inodes more so than fs size... Ric ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Unionmount status? 2011-04-13 19:47 ` Michal Suchanek 2011-04-14 4:50 ` Ian Kent @ 2011-04-14 19:14 ` David Howells 2011-06-29 9:39 ` Union mount and overlayfs bake off? Ric Wheeler 2011-06-29 10:17 ` David Howells 1 sibling, 2 replies; 81+ messages in thread From: David Howells @ 2011-04-14 19:14 UTC (permalink / raw) To: Ian Kent Cc: dhowells, Michal Suchanek, Ric Wheeler, linux-fsdevel, linux-kernel, Jeff Moyer, miklos, Christoph Hellwig Ian Kent <ikent@redhat.com> wrote: > I believe David was working to update the patches and his silence > indicates he is probably bogged down with other priority work. If that's > the case, and your still interested, I might be able to help updating > the series some time soon. I haven't reviewed any of Val's series posts > for a while now so I'd need to catch up with the current state of the > project first. > > I guess the first thing is to find out if David has made any progress, > David? I started trying to forwardport them. It's not trivial because of Nick's RCU pathwalk patches. However, I noticed that the collection of patches I was working on wasn't the most recent in Val's GIT tree, so I stopped work on them whilst I found out from Val which was the correct branch to use. Her reply came just before I jetted out to LSF, and I haven't got round to working on them again. David ^ permalink raw reply [flat|nested] 81+ messages in thread
* Union mount and overlayfs bake off? 2011-04-14 19:14 ` David Howells @ 2011-06-29 9:39 ` Ric Wheeler 2011-06-29 11:40 ` Michal Suchanek 2011-06-29 10:17 ` David Howells 1 sibling, 1 reply; 81+ messages in thread From: Ric Wheeler @ 2011-06-29 9:39 UTC (permalink / raw) To: David Howells, Michal Suchanek, Alexander Viro Cc: Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos, Christoph Hellwig I think that it has been a while since David reposted the refreshed patch set for union mounts & know that overlayfs has recent updates as well. Despite that, I have not seen a lot of feedback from reviewers or testers. Al, from your point of view, where we stand? Thanks! Ric ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and overlayfs bake off? 2011-06-29 9:39 ` Union mount and overlayfs bake off? Ric Wheeler @ 2011-06-29 11:40 ` Michal Suchanek 0 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-06-29 11:40 UTC (permalink / raw) To: Ric Wheeler Cc: David Howells, Alexander Viro, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos, Christoph Hellwig On 29 June 2011 11:39, Ric Wheeler <ricwheeler@gmail.com> wrote: > > I think that it has been a while since David reposted the refreshed patch > set for union mounts & know that overlayfs has recent updates as well. > > Despite that, I have not seen a lot of feedback from reviewers or testers. > I can only attest that I can build working live CD on top of either overlayfs or unionmount. This is not a good metric for reliability because issues were found with both solutions after the live CDs were working fine already. However, it is an usability metric in that both solutions are from the alpha stage in which crucial features were missing or broken way into the beta stage in which corner case issues are discovered but the solution is generally working. Somebody was running racer on top of overlayfs which uncovered some problems with overlayfs itself and linux vfs in general. Unionmount has supposedly its own testsuite but I have never seen that. There are some issues with neither creating full emulation of a traditional filesystem and the cost of hiding more effects of the union from userspace to make the filesystem look more traditional. With something touching VFS, however, it is important to decide how (and if) these corner case issues should be resolved because they could possibly affect unrelated parts of the system. Reviewing all raised concerns is unfortunately taking quite some time, and then addressing them in code will probably require more. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and overlayfs bake off? 2011-04-14 19:14 ` David Howells 2011-06-29 9:39 ` Union mount and overlayfs bake off? Ric Wheeler @ 2011-06-29 10:17 ` David Howells 2011-06-30 12:44 ` Miklos Szeredi 2011-07-10 8:28 ` Union mount and lockdep design issues Ric Wheeler 1 sibling, 2 replies; 81+ messages in thread From: David Howells @ 2011-06-29 10:17 UTC (permalink / raw) To: Ric Wheeler Cc: dhowells, Michal Suchanek, Alexander Viro, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos, Christoph Hellwig Ric Wheeler <ricwheeler@gmail.com> wrote: > I think that it has been a while since David reposted the refreshed patch set > for union mounts & know that overlayfs has recent updates as well. > > Despite that, I have not seen a lot of feedback from reviewers or testers. The main problem I've got is that it causes lockdep to generate warnings when the top layer and one of the lower layers are of the same filesystem type. The obvious way round this is to give each superblock its own i_mutex lock class rather than putting this in the file_system_type struct, but I'm not sure of the consequences (the two obvious problems are superblock transience and the fact that there may be so many more of them that it may explode lockdep). I've split out some of the VFS patches that we might be interested in taking upstream anyway. They're currently sat on Al's plate for his consideration. I've been dealing with some of Al's issues with the unionmount patches, but I know he's got more - I just can't remember them all. David ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and overlayfs bake off? 2011-06-29 10:17 ` David Howells @ 2011-06-30 12:44 ` Miklos Szeredi 2011-07-10 8:28 ` Union mount and lockdep design issues Ric Wheeler 1 sibling, 0 replies; 81+ messages in thread From: Miklos Szeredi @ 2011-06-30 12:44 UTC (permalink / raw) To: David Howells Cc: Ric Wheeler, Michal Suchanek, Alexander Viro, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, Christoph Hellwig David Howells <dhowells@redhat.com> writes: > I've been dealing with some of Al's issues with the unionmount > patches, but I know he's got more - I just can't remember them all. A couple of questions I have: 1) What happens to the union in a cloned namespace (CLONE_NEWNS)? 2) What's the overhead for non-unioned filesystems if CONFIG_UNION_MOUNTS is enabled? Does it show up in any microbenchmarks? 3) Is there a future strategy for making atomic operations really atomic? E.g. what happens if power is lost in the middle of a copy-up? Or if whiteout of source fails after a successful rename()? 4) Have you looked at overlayfs? Do you have any thoughts about the relative merrits of each solution? Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Union mount and lockdep design issues 2011-06-29 10:17 ` David Howells 2011-06-30 12:44 ` Miklos Szeredi @ 2011-07-10 8:28 ` Ric Wheeler 2011-07-10 13:48 ` Peter Zijlstra 2011-07-11 11:01 ` David Howells 1 sibling, 2 replies; 81+ messages in thread From: Ric Wheeler @ 2011-07-10 8:28 UTC (permalink / raw) To: David Howells, Alexander Viro, Christoph Hellwig, Peter Zijlstra, Ingo Molnar Cc: Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On 06/29/2011 11:17 AM, David Howells wrote: > Ric Wheeler<ricwheeler@gmail.com> wrote: > >> I think that it has been a while since David reposted the refreshed patch set >> for union mounts& know that overlayfs has recent updates as well. >> >> Despite that, I have not seen a lot of feedback from reviewers or testers. > The main problem I've got is that it causes lockdep to generate warnings when > the top layer and one of the lower layers are of the same filesystem type. The > obvious way round this is to give each superblock its own i_mutex lock class > rather than putting this in the file_system_type struct, but I'm not sure of > the consequences (the two obvious problems are superblock transience and the > fact that there may be so many more of them that it may explode lockdep). > > I've split out some of the VFS patches that we might be interested in taking > upstream anyway. They're currently sat on Al's plate for his consideration. > > I've been dealing with some of Al's issues with the unionmount patches, but I > know he's got more - I just can't remember them all. > > David After sitting down in person to dive into the lockdep issues with David over some very nice food (thanks David!), it does seem that this is really more of a lockdep issue and the way it is designed than a union mount issue. Peter, Ingo, are either of you the right people to think about how to fix lockdep to handle stacked components (like ext4 used in union mounts stacked on top of another ext4 fs) where both layers will routinely lock to the same object? Should we do a specific hack to work around this for union mounts or look for lockdep to change? Thanks! Ric ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-10 8:28 ` Union mount and lockdep design issues Ric Wheeler @ 2011-07-10 13:48 ` Peter Zijlstra 2011-07-11 8:35 ` Michal Suchanek 2011-07-11 11:01 ` David Howells 1 sibling, 1 reply; 81+ messages in thread From: Peter Zijlstra @ 2011-07-10 13:48 UTC (permalink / raw) To: Ric Wheeler Cc: David Howells, Alexander Viro, Christoph Hellwig, Ingo Molnar, Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On Sun, 2011-07-10 at 09:28 +0100, Ric Wheeler wrote: > On 06/29/2011 11:17 AM, David Howells wrote: > > Ric Wheeler<ricwheeler@gmail.com> wrote: > > > >> I think that it has been a while since David reposted the refreshed patch set > >> for union mounts& know that overlayfs has recent updates as well. > >> > >> Despite that, I have not seen a lot of feedback from reviewers or testers. > > The main problem I've got is that it causes lockdep to generate warnings when > > the top layer and one of the lower layers are of the same filesystem type. The > > obvious way round this is to give each superblock its own i_mutex lock class > > rather than putting this in the file_system_type struct, but I'm not sure of > > the consequences (the two obvious problems are superblock transience and the > > fact that there may be so many more of them that it may explode lockdep). Can't, that would involve classes in dynamically allocated memory (as you cannot a-priori determine how many instances there will be of a particular sb). There a number of good (although at times rather frustrating) reasons why lockdep cannot do dynamic memory. Most of those arguments center around things like: allocating memory involves locks, therefore we could end up wanting to allocate memory while in the allocator etc. Also, why would you want to have a class per sb-instance? From last talking to David, he said there could only ever be 2 filesystems involved in this, the top and bottom, and it is determined on (union) mount time which is which. I'm also assuming that once a filesystem is part of a union mount, it cannot be accessed from outside of said union (can it? can the botton be itself be the top layer of another union?) Therefore, why can't we, on constructing the union layers, reset the lock classes? Also, in what state are the filesystems on construction of the union? Are they already fully formed and populated (do inodes already exist?) > Peter, Ingo, are either of you the right people to think about how to fix > lockdep to handle stacked components (like ext4 used in union mounts stacked on > top of another ext4 fs) where both layers will routinely lock to the same object? We would be. > Should we do a specific hack to work around this for union mounts or look for > lockdep to change? I still don't understand the problem at all, so lets start there ;-) ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-10 13:48 ` Peter Zijlstra @ 2011-07-11 8:35 ` Michal Suchanek 0 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-07-11 8:35 UTC (permalink / raw) To: Peter Zijlstra Cc: Ric Wheeler, David Howells, Alexander Viro, Christoph Hellwig, Ingo Molnar, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On 10 July 2011 15:48, Peter Zijlstra <peterz@infradead.org> wrote: > On Sun, 2011-07-10 at 09:28 +0100, Ric Wheeler wrote: >> On 06/29/2011 11:17 AM, David Howells wrote: >> > Ric Wheeler<ricwheeler@gmail.com> wrote: >> > >> >> I think that it has been a while since David reposted the refreshed patch set >> >> for union mounts& know that overlayfs has recent updates as well. >> >> >> >> Despite that, I have not seen a lot of feedback from reviewers or testers. > >> > The main problem I've got is that it causes lockdep to generate warnings when >> > the top layer and one of the lower layers are of the same filesystem type. The >> > obvious way round this is to give each superblock its own i_mutex lock class >> > rather than putting this in the file_system_type struct, but I'm not sure of >> > the consequences (the two obvious problems are superblock transience and the >> > fact that there may be so many more of them that it may explode lockdep). > > Can't, that would involve classes in dynamically allocated memory (as > you cannot a-priori determine how many instances there will be of a > particular sb). There a number of good (although at times rather > frustrating) reasons why lockdep cannot do dynamic memory. > > Most of those arguments center around things like: allocating memory > involves locks, therefore we could end up wanting to allocate memory > while in the allocator etc. > > Also, why would you want to have a class per sb-instance? From last > talking to David, he said there could only ever be 2 filesystems > involved in this, the top and bottom, and it is determined on (union) > mount time which is which. > > I'm also assuming that once a filesystem is part of a union mount, it > cannot be accessed from outside of said union (can it? can the botton be > itself be the top layer of another union?) > It can still be accessed, there is nothing preventing access, just like mount --bind won't prevent access to the original location. It might be possible to block access to it but it would be another artifical limitation. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-10 8:28 ` Union mount and lockdep design issues Ric Wheeler 2011-07-10 13:48 ` Peter Zijlstra @ 2011-07-11 11:01 ` David Howells 2011-07-11 12:00 ` Peter Zijlstra 2011-07-11 13:54 ` David Howells 1 sibling, 2 replies; 81+ messages in thread From: David Howells @ 2011-07-11 11:01 UTC (permalink / raw) To: Peter Zijlstra Cc: dhowells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos Peter Zijlstra <peterz@infradead.org> wrote: > > > The main problem I've got is that it causes lockdep to generate warnings > > > when the top layer and one of the lower layers are of the same > > > filesystem type. The obvious way round this is to give each superblock > > > its own i_mutex lock class rather than putting this in the > > > file_system_type struct, but I'm not sure of the consequences (the two > > > obvious problems are superblock transience and the fact that there may > > > be so many more of them that it may explode lockdep). > > Can't, that would involve classes in dynamically allocated memory (as > you cannot a-priori determine how many instances there will be of a > particular sb). There a number of good (although at times rather > frustrating) reasons why lockdep cannot do dynamic memory. What does this mean for filesystem modules that get removed and inserted again? That's something I do during development rather than rebooting the machine. > Most of those arguments center around things like: allocating memory > involves locks, therefore we could end up wanting to allocate memory > while in the allocator etc. I'm not sure what these arguments are. Initialising the lock class doesn't need to be done with any locks held. I assumed the problems came from key reuse and the storage of out-of-date keys, and an over-abundance of keys, where a lock class's key is simply the pointer to its struct. > Also, why would you want to have a class per sb-instance? From last > talking to David, he said there could only ever be 2 filesystems > involved in this, the top and bottom, and it is determined on (union) > mount time which is which. There can be more than 2 - one upperfs (the actual union) and many lowerfs - though I think only one lowerfs is accessed at a time. However, I was wondering that if in the future it could be possible to make it possible to union over a union. I think that conceptually it shouldn't be that hard, but definitely lockdep presents a barrier unless the top union goes behind the scenes of the lower union and interacts with its lowerfs's directly. > I'm also assuming that once a filesystem is part of a union mount, it > cannot be accessed from outside of said union (can it? can the botton be > itself be the top layer of another union?) Not at the moment; the hard read-only requirements on the lowerfs versus the writeability requirements of the upperfs (you can't enter a directory that you can't mirror up) prevent it. However, at some point I'd be interested in trying to make it possible to union over a writeable filesystem. This is pretty much a requirement for unioning over NFS (as you can't tell the server to make the volume you're mounting hard read-only). > Therefore, why can't we, on constructing the union layers, reset the lock > classes? Reset in what sense? > Also, in what state are the filesystems on construction of the union? Are > they already fully formed and populated (do inodes already exist?) The lower filesystems must be fully formed and, at present, may not be modified whilst in the union. The upper filesystem can be empty or filled by a previous union. In fact, there's nothing stopping the upper fs being an ordinary fs that's then used as the upper layer in a union, but I'm not sure you can then access the lower echelons as the directories don't contain fallthru entries. David ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 11:01 ` David Howells @ 2011-07-11 12:00 ` Peter Zijlstra 2011-07-11 13:36 ` Michal Suchanek 2011-07-11 13:54 ` David Howells 1 sibling, 1 reply; 81+ messages in thread From: Peter Zijlstra @ 2011-07-11 12:00 UTC (permalink / raw) To: David Howells Cc: Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote: > Peter Zijlstra <peterz@infradead.org> wrote: > > > > > The main problem I've got is that it causes lockdep to generate warnings > > > > when the top layer and one of the lower layers are of the same > > > > filesystem type. The obvious way round this is to give each superblock > > > > its own i_mutex lock class rather than putting this in the > > > > file_system_type struct, but I'm not sure of the consequences (the two > > > > obvious problems are superblock transience and the fact that there may > > > > be so many more of them that it may explode lockdep). > > > > Can't, that would involve classes in dynamically allocated memory (as > > you cannot a-priori determine how many instances there will be of a > > particular sb). There a number of good (although at times rather > > frustrating) reasons why lockdep cannot do dynamic memory. > > What does this mean for filesystem modules that get removed and inserted > again? That's something I do during development rather than rebooting the > machine. At some point lockdep runs out of resources and you'll have to reboot. This is true for all module muck. > > Most of those arguments center around things like: allocating memory > > involves locks, therefore we could end up wanting to allocate memory > > while in the allocator etc. > > I'm not sure what these arguments are. Initialising the lock class doesn't > need to be done with any locks held. Correct, however we require all class keys to be of static storage, using dynamic storage for keys would entail having to tear down lock relations on free. Furthermore, once a lock class gets used it consumes resources, its needs to get linked into dependency chains etc.. all that is at lock usage sites. > I assumed the problems came from key reuse and the storage of out-of-date > keys, and an over-abundance of keys, where a lock class's key is simply the > pointer to its struct. Right, that too is a problem (partially already exposed by using modules), compile time static storage isn't actually static in the face of removable modules, and we leak resources from our static pools on rmmod. > > Also, why would you want to have a class per sb-instance? From last > > talking to David, he said there could only ever be 2 filesystems > > involved in this, the top and bottom, and it is determined on (union) > > mount time which is which. > > There can be more than 2 - one upperfs (the actual union) and many lowerfs - > though I think only one lowerfs is accessed at a time. Right, however I understood from our earlier discussion that the vfs would only ever try to lock 2 filesystems at a time, the top and one lower. > However, I was wondering that if in the future it could be possible to make it > possible to union over a union. I think that conceptually it shouldn't be that > hard, but definitely lockdep presents a barrier unless the top union goes > behind the scenes of the lower union and interacts with its lowerfs's directly. Aside from lockdep, how many fs locks will you nest and how will you enforce the filesystem relations remain a DAG? But yeah, that'll be a tad harder to do. One of the ways we could tackle that is create a lock class per depth, and statically create say 16 of those, allowing for a DAG with span of 16. > > I'm also assuming that once a filesystem is part of a union mount, it > > cannot be accessed from outside of said union (can it? can the botton be > > itself be the top layer of another union?) > > Not at the moment; the hard read-only requirements on the lowerfs versus the > writeability requirements of the upperfs (you can't enter a directory that you > can't mirror up) prevent it. > > However, at some point I'd be interested in trying to make it possible to union > over a writeable filesystem. This is pretty much a requirement for unioning > over NFS (as you can't tell the server to make the volume you're mounting hard > read-only). Right, ok, but lets try and make the current situation work first. I understand the desire to later grow. > > Therefore, why can't we, on constructing the union layers, reset the lock > > classes? > > Reset in what sense? Change lock class, we can change the lock class of a lock (easier if not held, not impossible when held, although in the latter case you'll obviously establish a relation between the new class and possibly other held locks). > > Also, in what state are the filesystems on construction of the union? Are > > they already fully formed and populated (do inodes already exist?) > > The lower filesystems must be fully formed and, at present, may not be modified > whilst in the union. > > The upper filesystem can be empty or filled by a previous union. In fact, > there's nothing stopping the upper fs being an ordinary fs that's then used as > the upper layer in a union, but I'm not sure you can then access the lower > echelons as the directories don't contain fallthru entries. Right, so in both cases they can be fully formed, in that case we'll need to iterate all inodes and change their lock class as well. Going from pure vfs-ignorance, something like the below (does the inode locks only, are the sb locks also required?), where the union-mount will call reclassify_fs() and provide the correct fs depth (max 1 for now, to be extended later when you do the full DAG thing). --- fs/inode.c | 43 +++++++++++++++++++++++++++++++++++-------- fs/super.c | 7 +++++++ include/linux/fs.h | 22 ++++++++++++++++++---- 3 files changed, 60 insertions(+), 12 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 43566d1..dfb5b11 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -171,13 +171,13 @@ int inode_init_always(struct super_block *sb, struct inode *inode) if (security_inode_alloc(inode)) goto out; spin_lock_init(&inode->i_lock); - lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key); + lockdep_set_class(&inode->i_lock, sb->i_lock_key); mutex_init(&inode->i_mutex); - lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key); + lockdep_set_class(&inode->i_mutex, sb->i_mutex_key); init_rwsem(&inode->i_alloc_sem); - lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key); + lockdep_set_class(&inode->i_alloc_sem, sb->i_alloc_sem_key); mapping->a_ops = &empty_aops; mapping->host = inode; @@ -882,18 +882,16 @@ void unlock_new_inode(struct inode *inode) { #ifdef CONFIG_DEBUG_LOCK_ALLOC if (S_ISDIR(inode->i_mode)) { - struct file_system_type *type = inode->i_sb->s_type; + struct super_block *sb = inode->i_sb; /* Set new key only if filesystem hasn't already changed it */ - if (!lockdep_match_class(&inode->i_mutex, - &type->i_mutex_key)) { + if (!lockdep_match_class(&inode->i_mutex, sb->i_mutex_key)) { /* * ensure nobody is actually holding i_mutex */ mutex_destroy(&inode->i_mutex); mutex_init(&inode->i_mutex); - lockdep_set_class(&inode->i_mutex, - &type->i_mutex_dir_key); + lockdep_set_class(&inode->i_mutex, sb->i_mutex_dir_key); } } #endif @@ -905,6 +903,35 @@ void unlock_new_inode(struct inode *inode) } EXPORT_SYMBOL(unlock_new_inode); +void reclassify_sb(struct super_block *sb, int level) +{ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct file_system_type *type = sb->s_type; + struct inode *inode; + + WARN_ON_ONCE(level >= LOCKDEP_FS_KEYS); + + s->i_lock_key = &type->i_lock_key[level]; + s->i_mutex_key = &type->i_mutex_key[level]; + s->i_mutex_dir_key = &type->i_mutex_dir_key[level]; + s->i_alloc_sem_key = &type->i_alloc_sem_key[level]; + + spin_lock(&inode_sb_list_lock); + list_for_each_entry(inode, sb->s_inodes, i_sb_list) { + // XXX is the inode unused and unlocked ?! + + lockdep_set_class(&inode->i_lock, sb->i_lock_key); + if (S_ISDIR(inode->i_mode)) + lockdep_set_class(&inode->i_mutex, sb->i_mutex_dir_key); + else + lockdep_set_class(&inode->i_mutex, sb->i_mutex_key); + lockdep_set_class(&inode->i_alloc_sem, sb->i_alloc_sem_key); + } + spin_unlock(&inode_sb_list_lock); +#endif +} +EXPORT_SYMBOL(reclassify_sb); + /** * iget5_locked - obtain an inode from a mounted file system * @sb: super block of file system diff --git a/fs/super.c b/fs/super.c index ab3d672..dca208b 100644 --- a/fs/super.c +++ b/fs/super.c @@ -114,6 +114,13 @@ static struct super_block *alloc_super(struct file_system_type *type) s->s_op = &default_op; s->s_time_gran = 1000000000; s->cleancache_poolid = -1; + +#ifdef CONFIG_DEBUG_LOCK_ALLOC + s->i_lock_key = &type->i_lock_key[0]; + s->i_mutex_key = &type->i_mutex_key[0]; + s->i_mutex_dir_key = &type->i_mutex_dir_key[0]; + s->i_alloc_sem_key = &type->i_alloc_sem_key[0]; +#endif } out: return s; diff --git a/include/linux/fs.h b/include/linux/fs.h index b5b9792..ef43ac3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1438,6 +1438,13 @@ struct super_block { * Saved pool identifier for cleancache (-1 means none) */ int cleancache_poolid; + +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lock_class_key *i_lock_key; + struct lock_class_key *i_mutex_key; + struct lock_class_key *i_mutex_dir_key; + struct lock_class_key *i_alloc_sem_key; +#endif }; extern struct timespec current_fs_time(struct super_block *sb); @@ -1799,6 +1806,13 @@ static inline void file_accessed(struct file *file) int sync_inode(struct inode *inode, struct writeback_control *wbc); int sync_inode_metadata(struct inode *inode, int wait); +/* + * The number of different inode lock keys, used for union mounts + * so that layers of the same file_system_type may still nest without + * making lockdep upset. + */ +#define LOCKDEP_FS_KEYS 2 + struct file_system_type { const char *name; int fs_flags; @@ -1813,10 +1827,10 @@ struct file_system_type { struct lock_class_key s_umount_key; struct lock_class_key s_vfs_rename_key; - struct lock_class_key i_lock_key; - struct lock_class_key i_mutex_key; - struct lock_class_key i_mutex_dir_key; - struct lock_class_key i_alloc_sem_key; + struct lock_class_key i_lock_key[LOCKDEP_FS_KEYS]; + struct lock_class_key i_mutex_key[LOCKDEP_FS_KEYS]; + struct lock_class_key i_mutex_dir_key[LOCKDEP_FS_KEYS]; + struct lock_class_key i_alloc_sem_key[LOCKDEP_FS_KEYS]; }; extern struct dentry *mount_ns(struct file_system_type *fs_type, int flags, ^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 12:00 ` Peter Zijlstra @ 2011-07-11 13:36 ` Michal Suchanek 0 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-07-11 13:36 UTC (permalink / raw) To: Peter Zijlstra Cc: David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote: >> Peter Zijlstra <peterz@infradead.org> wrote: >> >> > Also, why would you want to have a class per sb-instance? From last >> > talking to David, he said there could only ever be 2 filesystems >> > involved in this, the top and bottom, and it is determined on (union) >> > mount time which is which. >> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs - >> though I think only one lowerfs is accessed at a time. > > Right, however I understood from our earlier discussion that the vfs > would only ever try to lock 2 filesystems at a time, the top and one > lower. This is true from local point of view. However, it is technically possible to use overlayfs as the upper layer of another overlayfs which allows layering multiple readonly "branches" into a single overlay. Since the vfs will lock the "union" and one (or possibly both) of its branches and one of the branches may be itself an union you can get arbitrary depth (which is currently limited by a constant in the code to cut recursion depth and stack usage). > >> However, I was wondering that if in the future it could be possible to make it >> possible to union over a union. I think that conceptually it shouldn't be that >> hard, but definitely lockdep presents a barrier unless the top union goes >> behind the scenes of the lower union and interacts with its lowerfs's directly. > > Aside from lockdep, how many fs locks will you nest and how will you > enforce the filesystem relations remain a DAG? But yeah, that'll be a > tad harder to do. One of the ways we could tackle that is create a lock > class per depth, and statically create say 16 of those, allowing for a > DAG with span of 16. This would be consistent with the limit on nesting imposed by stack size but there should be probably some mechanism to infer one of the numbers from the other. > >> > I'm also assuming that once a filesystem is part of a union mount, it >> > cannot be accessed from outside of said union (can it? can the botton be >> > itself be the top layer of another union?) >> >> Not at the moment; the hard read-only requirements on the lowerfs versus the >> writeability requirements of the upperfs (you can't enter a directory that you >> can't mirror up) prevent it. >> >> However, at some point I'd be interested in trying to make it possible to union >> over a writeable filesystem. This is pretty much a requirement for unioning >> over NFS (as you can't tell the server to make the volume you're mounting hard >> read-only). I don't think that there is a hard readonly requirement. As far s a I understand the current status is that "The filesystem should not be modified directly" and "doing so will lead to undefined behaviour but no crash or lockup". Unless there are bugs, obviously. >> > Also, in what state are the filesystems on construction of the union? Are >> > they already fully formed and populated (do inodes already exist?) >> >> The lower filesystems must be fully formed and, at present, may not be modified >> whilst in the union. >> >> The upper filesystem can be empty or filled by a previous union. In fact, >> there's nothing stopping the upper fs being an ordinary fs that's then used as >> the upper layer in a union, but I'm not sure you can then access the lower >> echelons as the directories don't contain fallthru entries. As overlayfs does not have explicit fallthru entries layering any two fully formed filesystems gives an union of the two. You will only lose access to entries that were previously deleted in an union and have a whiteout entry in the upper layer. Unionmount makes any directories which were touched in an upper union layer opaque and requires explicit fallthru entries to access the lower layer. A normal filesystem does not have opaque directories and allows access to the lower layer when it is used as the top layer for the first time. Traversing the union will make it opaque, though. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues @ 2011-07-11 13:36 ` Michal Suchanek 0 siblings, 0 replies; 81+ messages in thread From: Michal Suchanek @ 2011-07-11 13:36 UTC (permalink / raw) To: Peter Zijlstra Cc: David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote: >> Peter Zijlstra <peterz@infradead.org> wrote: >> >> > Also, why would you want to have a class per sb-instance? From last >> > talking to David, he said there could only ever be 2 filesystems >> > involved in this, the top and bottom, and it is determined on (union) >> > mount time which is which. >> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs - >> though I think only one lowerfs is accessed at a time. > > Right, however I understood from our earlier discussion that the vfs > would only ever try to lock 2 filesystems at a time, the top and one > lower. This is true from local point of view. However, it is technically possible to use overlayfs as the upper layer of another overlayfs which allows layering multiple readonly "branches" into a single overlay. Since the vfs will lock the "union" and one (or possibly both) of its branches and one of the branches may be itself an union you can get arbitrary depth (which is currently limited by a constant in the code to cut recursion depth and stack usage). > >> However, I was wondering that if in the future it could be possible to make it >> possible to union over a union. I think that conceptually it shouldn't be that >> hard, but definitely lockdep presents a barrier unless the top union goes >> behind the scenes of the lower union and interacts with its lowerfs's directly. > > Aside from lockdep, how many fs locks will you nest and how will you > enforce the filesystem relations remain a DAG? But yeah, that'll be a > tad harder to do. One of the ways we could tackle that is create a lock > class per depth, and statically create say 16 of those, allowing for a > DAG with span of 16. This would be consistent with the limit on nesting imposed by stack size but there should be probably some mechanism to infer one of the numbers from the other. > >> > I'm also assuming that once a filesystem is part of a union mount, it >> > cannot be accessed from outside of said union (can it? can the botton be >> > itself be the top layer of another union?) >> >> Not at the moment; the hard read-only requirements on the lowerfs versus the >> writeability requirements of the upperfs (you can't enter a directory that you >> can't mirror up) prevent it. >> >> However, at some point I'd be interested in trying to make it possible to union >> over a writeable filesystem. This is pretty much a requirement for unioning >> over NFS (as you can't tell the server to make the volume you're mounting hard >> read-only). I don't think that there is a hard readonly requirement. As far s a I understand the current status is that "The filesystem should not be modified directly" and "doing so will lead to undefined behaviour but no crash or lockup". Unless there are bugs, obviously. >> > Also, in what state are the filesystems on construction of the union? Are >> > they already fully formed and populated (do inodes already exist?) >> >> The lower filesystems must be fully formed and, at present, may not be modified >> whilst in the union. >> >> The upper filesystem can be empty or filled by a previous union. In fact, >> there's nothing stopping the upper fs being an ordinary fs that's then used as >> the upper layer in a union, but I'm not sure you can then access the lower >> echelons as the directories don't contain fallthru entries. As overlayfs does not have explicit fallthru entries layering any two fully formed filesystems gives an union of the two. You will only lose access to entries that were previously deleted in an union and have a whiteout entry in the upper layer. Unionmount makes any directories which were touched in an upper union layer opaque and requires explicit fallthru entries to access the lower layer. A normal filesystem does not have opaque directories and allows access to the lower layer when it is used as the top layer for the first time. Traversing the union will make it opaque, though. Thanks Michal -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 13:36 ` Michal Suchanek (?) @ 2011-07-11 13:50 ` Ian Kent 2011-07-11 16:17 ` Michal Suchanek -1 siblings, 1 reply; 81+ messages in thread From: Ian Kent @ 2011-07-11 13:50 UTC (permalink / raw) To: Michal Suchanek Cc: Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On Mon, 2011-07-11 at 15:36 +0200, Michal Suchanek wrote: > On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote: > >> Peter Zijlstra <peterz@infradead.org> wrote: > >> > > >> > Also, why would you want to have a class per sb-instance? From last > >> > talking to David, he said there could only ever be 2 filesystems > >> > involved in this, the top and bottom, and it is determined on (union) > >> > mount time which is which. > >> > >> There can be more than 2 - one upperfs (the actual union) and many lowerfs - > >> though I think only one lowerfs is accessed at a time. > > > > Right, however I understood from our earlier discussion that the vfs > > would only ever try to lock 2 filesystems at a time, the top and one > > lower. > > This is true from local point of view. However, it is technically > possible to use overlayfs as the upper layer of another overlayfs > which allows layering multiple readonly "branches" into a single > overlay. Since the vfs will lock the "union" and one (or possibly > both) of its branches and one of the branches may be itself an union > you can get arbitrary depth (which is currently limited by a constant > in the code to cut recursion depth and stack usage). Off topic but can you elaborate on that? Are you saying the "unioned stack" can consist of more than two file systems and can have more than two layers and possibly a mix of multiple read-only and read-write file systems? Ian ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 13:50 ` Ian Kent @ 2011-07-11 16:17 ` Michal Suchanek 2011-07-11 17:23 ` Ian Kent 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-07-11 16:17 UTC (permalink / raw) To: Ian Kent Cc: Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On 11 July 2011 15:50, Ian Kent <ikent@redhat.com> wrote: > On Mon, 2011-07-11 at 15:36 +0200, Michal Suchanek wrote: >> On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote: >> >> Peter Zijlstra <peterz@infradead.org> wrote: >> >> >> >> >> > Also, why would you want to have a class per sb-instance? From last >> >> > talking to David, he said there could only ever be 2 filesystems >> >> > involved in this, the top and bottom, and it is determined on (union) >> >> > mount time which is which. >> >> >> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs - >> >> though I think only one lowerfs is accessed at a time. >> > >> > Right, however I understood from our earlier discussion that the vfs >> > would only ever try to lock 2 filesystems at a time, the top and one >> > lower. >> >> This is true from local point of view. However, it is technically >> possible to use overlayfs as the upper layer of another overlayfs >> which allows layering multiple readonly "branches" into a single >> overlay. Since the vfs will lock the "union" and one (or possibly >> both) of its branches and one of the branches may be itself an union >> you can get arbitrary depth (which is currently limited by a constant >> in the code to cut recursion depth and stack usage). > > Off topic but can you elaborate on that? > > Are you saying the "unioned stack" can consist of more than two file > systems and can have more than two layers and possibly a mix of multiple > read-only and read-write file systems? > This is how requirements are described in documentation: > The lower filesystem can be any filesystem supported by Linux and does > not need to be writable. The lower filesystem can even be another > overlayfs. The upper filesystem will normally be writable and if it > is it must support the creation of trusted.* extended attributes, and > must provide valid d_type in readdir responses, at least for symbolic > links - so NFS is not suitable. In no place it says that the lower filesystem is required to be readonly, only that it should not be modified. This is what the documentation gives as example: > mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper /overlay This is how it can be expanded: mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay lower1 and lower2 are readonly branches you want to union; tmpoverlay is writable so it fulfills the requirement for upper layer of overlay. Note that the order is backwards, lower1 appears under lower2 which is mounted earlier if I understand the layering correctly. Depending on the way overlayfs handles creation of trusted.* extended attributes on itself this overlay may work or break in interesting ways. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 16:17 ` Michal Suchanek @ 2011-07-11 17:23 ` Ian Kent 2011-07-11 18:08 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Ian Kent @ 2011-07-11 17:23 UTC (permalink / raw) To: Michal Suchanek Cc: Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On Mon, 2011-07-11 at 18:17 +0200, Michal Suchanek wrote: > On 11 July 2011 15:50, Ian Kent <ikent@redhat.com> wrote: > > On Mon, 2011-07-11 at 15:36 +0200, Michal Suchanek wrote: > >> On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote: > >> > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote: > >> >> Peter Zijlstra <peterz@infradead.org> wrote: > >> >> > >> > >> >> > Also, why would you want to have a class per sb-instance? From last > >> >> > talking to David, he said there could only ever be 2 filesystems > >> >> > involved in this, the top and bottom, and it is determined on (union) > >> >> > mount time which is which. > >> >> > >> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs - > >> >> though I think only one lowerfs is accessed at a time. > >> > > >> > Right, however I understood from our earlier discussion that the vfs > >> > would only ever try to lock 2 filesystems at a time, the top and one > >> > lower. > >> > >> This is true from local point of view. However, it is technically > >> possible to use overlayfs as the upper layer of another overlayfs > >> which allows layering multiple readonly "branches" into a single > >> overlay. Since the vfs will lock the "union" and one (or possibly > >> both) of its branches and one of the branches may be itself an union > >> you can get arbitrary depth (which is currently limited by a constant > >> in the code to cut recursion depth and stack usage). > > > > Off topic but can you elaborate on that? > > > > Are you saying the "unioned stack" can consist of more than two file > > systems and can have more than two layers and possibly a mix of multiple > > read-only and read-write file systems? > > > > This is how requirements are described in documentation: > > > The lower filesystem can be any filesystem supported by Linux and does > > not need to be writable. The lower filesystem can even be another > > overlayfs. The upper filesystem will normally be writable and if it > > is it must support the creation of trusted.* extended attributes, and > > must provide valid d_type in readdir responses, at least for symbolic > > links - so NFS is not suitable. > > In no place it says that the lower filesystem is required to be > readonly, only that it should not be modified. > > > This is what the documentation gives as example: > > > mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper /overlay > > This is how it can be expanded: > > mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay > mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay OK, I'll have to think about what this means but I suspect that it is broken. I'll have a look at the overlayfs code and see if there are globally enforced ordering of stacked file systems. If there is none then I believe overlayfs is probably open to AB <-> BA deadlock due to the possibility of locking two file systems in one overlayfs stack in one order and the same two file systems in the opposite order in another. I don't remember seeing any unioning file system that checks and enforces this type of global ordering, although I think the special case checks of union mount pretty much cover it, AFAICT. Its been a while since I looked at the code for any of the unioning file systems so I may be wrong. Assuming I am correct though, that then defines restrictions on what should (or can) be aloud from a lockdep POV. Ian ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 17:23 ` Ian Kent @ 2011-07-11 18:08 ` Michal Suchanek 2011-07-12 8:30 ` Miklos Szeredi 2011-07-13 12:02 ` David Howells 0 siblings, 2 replies; 81+ messages in thread From: Michal Suchanek @ 2011-07-11 18:08 UTC (permalink / raw) To: Ian Kent Cc: Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On 11 July 2011 19:23, Ian Kent <ikent@redhat.com> wrote: > On Mon, 2011-07-11 at 18:17 +0200, Michal Suchanek wrote: >> On 11 July 2011 15:50, Ian Kent <ikent@redhat.com> wrote: >> > On Mon, 2011-07-11 at 15:36 +0200, Michal Suchanek wrote: >> >> On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote: >> >> > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote: >> >> >> Peter Zijlstra <peterz@infradead.org> wrote: >> >> >> >> >> >> >> >> > Also, why would you want to have a class per sb-instance? From last >> >> >> > talking to David, he said there could only ever be 2 filesystems >> >> >> > involved in this, the top and bottom, and it is determined on (union) >> >> >> > mount time which is which. >> >> >> >> >> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs - >> >> >> though I think only one lowerfs is accessed at a time. >> >> > >> >> > Right, however I understood from our earlier discussion that the vfs >> >> > would only ever try to lock 2 filesystems at a time, the top and one >> >> > lower. >> >> >> >> This is true from local point of view. However, it is technically >> >> possible to use overlayfs as the upper layer of another overlayfs >> >> which allows layering multiple readonly "branches" into a single >> >> overlay. Since the vfs will lock the "union" and one (or possibly >> >> both) of its branches and one of the branches may be itself an union >> >> you can get arbitrary depth (which is currently limited by a constant >> >> in the code to cut recursion depth and stack usage). >> > >> > Off topic but can you elaborate on that? >> > >> > Are you saying the "unioned stack" can consist of more than two file >> > systems and can have more than two layers and possibly a mix of multiple >> > read-only and read-write file systems? >> > >> >> This is how requirements are described in documentation: >> >> > The lower filesystem can be any filesystem supported by Linux and does >> > not need to be writable. The lower filesystem can even be another >> > overlayfs. The upper filesystem will normally be writable and if it >> > is it must support the creation of trusted.* extended attributes, and >> > must provide valid d_type in readdir responses, at least for symbolic >> > links - so NFS is not suitable. >> >> In no place it says that the lower filesystem is required to be >> readonly, only that it should not be modified. >> >> >> This is what the documentation gives as example: >> >> > mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper /overlay >> >> This is how it can be expanded: >> >> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay >> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay > > OK, I'll have to think about what this means but I suspect that it is > broken. I'll have a look at the overlayfs code and see if there are > globally enforced ordering of stacked file systems. If there is none > then I believe overlayfs is probably open to AB <-> BA deadlock due to > the possibility of locking two file systems in one overlayfs stack in > one order and the same two file systems in the opposite order in > another. I think this is fine as long as the same layer does not appear in two different unions. The locking order is likely determined by the structure of the union and not some system-wide order of filesystems so assuming the readonly layers are locked as well you will probably get a deadlock with technically correct mount: mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2 mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2 because now lower1 and lower2 are differently ordered in the two overlays. System-wide locking order and some optimizations are reasonably possible only when the mount is actually aware that it has multiple branches like mount -t overlayfs overlayfs -olowerdirs=/lower1:/lower2,upperdir=/upper3 /not-possible-overlay Note also that there is no guarantee that /lower1 and /lower2 are in any way distinct or don't have intermingled hardlinks or symlinks. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 18:08 ` Michal Suchanek @ 2011-07-12 8:30 ` Miklos Szeredi 2011-07-12 9:58 ` Michal Suchanek 2011-07-13 12:02 ` David Howells 1 sibling, 1 reply; 81+ messages in thread From: Miklos Szeredi @ 2011-07-12 8:30 UTC (permalink / raw) To: Michal Suchanek Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer Michal Suchanek <hramrach@centrum.cz> writes: > The locking order is likely determined by the structure of the union > and not some system-wide order of filesystems so assuming the readonly > layers are locked as well you will probably get a deadlock with > technically correct mount: > > mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay > mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay > > mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2 > mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2 > > because now lower1 and lower2 are differently ordered in the two > overlays. Overlayfs never locks both upper and lower at the same time, which means there's no AB-BA locking dependency. The lock orderings are: -> /overlay -> /lower1 -> /tmpoverlay -> /lower2 -> /upper -> /overlay2 -> /lower2 -> /tmpoverlay2 -> /lower1 -> /upper2 As you can see there's no nesting of lower2 and lower1 into each other. When you combine two filesystems, a completely new ordering is created each time, there's no possibility to make an AB-BA nesting. At least I cannot see one. Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-12 8:30 ` Miklos Szeredi @ 2011-07-12 9:58 ` Michal Suchanek 2011-07-12 11:45 ` Miklos Szeredi 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-07-12 9:58 UTC (permalink / raw) To: Miklos Szeredi Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer On 12 July 2011 10:30, Miklos Szeredi <miklos@szeredi.hu> wrote: > Michal Suchanek <hramrach@centrum.cz> writes: > >> The locking order is likely determined by the structure of the union >> and not some system-wide order of filesystems so assuming the readonly >> layers are locked as well you will probably get a deadlock with >> technically correct mount: >> >> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay >> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay >> >> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2 >> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2 >> >> because now lower1 and lower2 are differently ordered in the two >> overlays. > > Overlayfs never locks both upper and lower at the same time, which means > there's no AB-BA locking dependency. The lock orderings are: > > -> /overlay > -> /lower1 > -> /tmpoverlay > -> /lower2 > -> /upper > -> /overlay2 > -> /lower2 > -> /tmpoverlay2 > -> /lower1 > -> /upper2 > > As you can see there's no nesting of lower2 and lower1 into each other. > > When you combine two filesystems, a completely new ordering is created > each time, there's no possibility to make an AB-BA nesting. At least I > cannot see one. Except you can get in situation where overlay locks lower1 and tmpoverlay waits for lower2 which is held by overlay2 waiting for lower1 in tmpoverlay2. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-12 9:58 ` Michal Suchanek @ 2011-07-12 11:45 ` Miklos Szeredi 2011-07-12 18:49 ` Michal Suchanek 0 siblings, 1 reply; 81+ messages in thread From: Miklos Szeredi @ 2011-07-12 11:45 UTC (permalink / raw) To: Michal Suchanek Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer Michal Suchanek <hramrach@centrum.cz> writes: > On 12 July 2011 10:30, Miklos Szeredi <miklos@szeredi.hu> wrote: >> Michal Suchanek <hramrach@centrum.cz> writes: >> >>> The locking order is likely determined by the structure of the union >>> and not some system-wide order of filesystems so assuming the readonly >>> layers are locked as well you will probably get a deadlock with >>> technically correct mount: >>> >>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay >>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay >>> >>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2 >>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2 >>> >>> because now lower1 and lower2 are differently ordered in the two >>> overlays. >> >> Overlayfs never locks both upper and lower at the same time, which means >> there's no AB-BA locking dependency. The lock orderings are: >> >> -> /overlay >> -> /lower1 >> -> /tmpoverlay >> -> /lower2 >> -> /upper >> -> /overlay2 >> -> /lower2 >> -> /tmpoverlay2 >> -> /lower1 >> -> /upper2 >> >> As you can see there's no nesting of lower2 and lower1 into each other. >> >> When you combine two filesystems, a completely new ordering is created >> each time, there's no possibility to make an AB-BA nesting. At least I >> cannot see one. > > Except you can get in situation where overlay locks lower1 and > tmpoverlay waits for lower2 Note: tmpoverlay lock does *not* nest into lower1 lock, they are both on the same nesting level. There's no dependency between the two. > which is held by overlay2 waiting for > lower1 in tmpoverlay2. No deadlock there. Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-12 11:45 ` Miklos Szeredi @ 2011-07-12 18:49 ` Michal Suchanek 2011-07-13 9:49 ` Miklos Szeredi 0 siblings, 1 reply; 81+ messages in thread From: Michal Suchanek @ 2011-07-12 18:49 UTC (permalink / raw) To: Miklos Szeredi Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer On 12 July 2011 13:45, Miklos Szeredi <miklos@szeredi.hu> wrote: > Michal Suchanek <hramrach@centrum.cz> writes: > >> On 12 July 2011 10:30, Miklos Szeredi <miklos@szeredi.hu> wrote: >>> Michal Suchanek <hramrach@centrum.cz> writes: >>> >>>> The locking order is likely determined by the structure of the union >>>> and not some system-wide order of filesystems so assuming the readonly >>>> layers are locked as well you will probably get a deadlock with >>>> technically correct mount: >>>> >>>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay >>>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay >>>> >>>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2 >>>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2 >>>> >>>> because now lower1 and lower2 are differently ordered in the two >>>> overlays. >>> >>> Overlayfs never locks both upper and lower at the same time, which means >>> there's no AB-BA locking dependency. The lock orderings are: >>> >>> -> /overlay >>> -> /lower1 >>> -> /tmpoverlay >>> -> /lower2 >>> -> /upper >>> -> /overlay2 >>> -> /lower2 >>> -> /tmpoverlay2 >>> -> /lower1 >>> -> /upper2 >>> >>> As you can see there's no nesting of lower2 and lower1 into each other. >>> >>> When you combine two filesystems, a completely new ordering is created >>> each time, there's no possibility to make an AB-BA nesting. At least I >>> cannot see one. >> >> Except you can get in situation where overlay locks lower1 and >> tmpoverlay waits for lower2 > > Note: tmpoverlay lock does *not* nest into lower1 lock, they are both on > the same nesting level. There's no dependency between the two. > >> which is held by overlay2 waiting for >> lower1 in tmpoverlay2. > > No deadlock there. > That's nice. You can still do mount --bind /lower1 /lower2/lower1 mount --bind /lower2 /lower1/lower2 Which is technically not against usage guidelines, unlike mount --bind /upper /lower1/upper If crossing mount boundaries is forbidden try with symlinks or hardlinks. Thanks Michal ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-12 18:49 ` Michal Suchanek @ 2011-07-13 9:49 ` Miklos Szeredi 0 siblings, 0 replies; 81+ messages in thread From: Miklos Szeredi @ 2011-07-13 9:49 UTC (permalink / raw) To: Michal Suchanek Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer Michal Suchanek <hramrach@centrum.cz> writes: > You can still do > > mount --bind /lower1 /lower2/lower1 > mount --bind /lower2 /lower1/lower2 > > Which is technically not against usage guidelines, unlike mount --bind > /upper /lower1/upper Overlayfs will take the directory you specify as upperdir and lowerdir and will clone those mounts internally. This means that any present or future submounts are ignored and the mount containting those directories may be moved or umounted and overlayfs will still function. None of the bind mounts above will have any effect on an overlayfs that uses /lower[12] and /upper. > If crossing mount boundaries is forbidden try with symlinks or hardlinks. I'm not sure what you mean here. Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 18:08 ` Michal Suchanek @ 2011-07-13 12:02 ` David Howells 2011-07-13 12:02 ` David Howells 1 sibling, 0 replies; 81+ messages in thread From: David Howells @ 2011-07-13 12:02 UTC (permalink / raw) To: Miklos Szeredi Cc: dhowells, Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer Miklos Szeredi <miklos@szeredi.hu> wrote: > Overlayfs never locks both upper and lower at the same time, which means > there's no AB-BA locking dependency. The lock orderings are: What you're talking about is not analogous to the situation I'm seeing with unionmount. You actually have three filesystems in overlayfs. The interaction between overlayfs-and-upperfs and overlayfs-and-lowerfs is the equivalent in unionmount terms to upperfs-and-lowerfs. This is where the lockdep issue lies. David ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues @ 2011-07-13 12:02 ` David Howells 0 siblings, 0 replies; 81+ messages in thread From: David Howells @ 2011-07-13 12:02 UTC (permalink / raw) To: Miklos Szeredi Cc: dhowells, Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer Miklos Szeredi <miklos@szeredi.hu> wrote: > Overlayfs never locks both upper and lower at the same time, which means > there's no AB-BA locking dependency. The lock orderings are: What you're talking about is not analogous to the situation I'm seeing with unionmount. You actually have three filesystems in overlayfs. The interaction between overlayfs-and-upperfs and overlayfs-and-lowerfs is the equivalent in unionmount terms to upperfs-and-lowerfs. This is where the lockdep issue lies. David ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-13 12:02 ` David Howells (?) @ 2011-07-13 13:20 ` Miklos Szeredi -1 siblings, 0 replies; 81+ messages in thread From: Miklos Szeredi @ 2011-07-13 13:20 UTC (permalink / raw) To: David Howells Cc: Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer David Howells <dhowells@redhat.com> writes: > Miklos Szeredi <miklos@szeredi.hu> wrote: > >> Overlayfs never locks both upper and lower at the same time, which means >> there's no AB-BA locking dependency. The lock orderings are: > > What you're talking about is not analogous to the situation I'm seeing with > unionmount. > > You actually have three filesystems in overlayfs. The interaction between > overlayfs-and-upperfs and overlayfs-and-lowerfs is the equivalent in > unionmount terms to upperfs-and-lowerfs. This is where the lockdep issue > lies. Normally overlayfs would be immune to this, since lower/upper are different filesystem types. But overlayfs-over-overlayfs would see the same lockdep warnings, right? Thanks, Miklos ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-13 12:02 ` David Howells @ 2011-07-14 0:57 ` David Howells -1 siblings, 0 replies; 81+ messages in thread From: David Howells @ 2011-07-14 0:57 UTC (permalink / raw) To: Miklos Szeredi Cc: dhowells, Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer Miklos Szeredi <miklos@szeredi.hu> wrote: > Normally overlayfs would be immune to this, since lower/upper are > different filesystem types. But overlayfs-over-overlayfs would see the > same lockdep warnings, right? Yes. David ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues @ 2011-07-14 0:57 ` David Howells 0 siblings, 0 replies; 81+ messages in thread From: David Howells @ 2011-07-14 0:57 UTC (permalink / raw) To: Miklos Szeredi Cc: dhowells, Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel, Jeff Moyer Miklos Szeredi <miklos@szeredi.hu> wrote: > Normally overlayfs would be immune to this, since lower/upper are > different filesystem types. But overlayfs-over-overlayfs would see the > same lockdep warnings, right? Yes. David ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 11:01 ` David Howells 2011-07-11 12:00 ` Peter Zijlstra @ 2011-07-11 13:54 ` David Howells 2011-07-11 14:02 ` Peter Zijlstra 1 sibling, 1 reply; 81+ messages in thread From: David Howells @ 2011-07-11 13:54 UTC (permalink / raw) To: Peter Zijlstra Cc: dhowells, Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos Peter Zijlstra <peterz@infradead.org> wrote: > At some point lockdep runs out of resources and you'll have to reboot. > This is true for all module muck. Fair enough; I guess for normal use, modules just don't get unloaded. > > There can be more than 2 - one upperfs (the actual union) and many lowerfs - > > though I think only one lowerfs is accessed at a time. > > Right, however I understood from our earlier discussion that the vfs > would only ever try to lock 2 filesystems at a time, the top and one > lower. That's what I meant by 'only one lowerfs is accessed at a time'. > Aside from lockdep, how many fs locks will you nest and how will you > enforce the filesystem relations remain a DAG? This ought to be fine. Unionmount requires the superblock for the upperfs be fresh and unsullied when it gets to union it. Furthermore, it shouldn't allow that partition to be mounted elsewhere in a second mount whilst it is still unioned. Bind mounts should be okay, since they're effectively a reference on the union we already have. > Right, ok, but lets try and make the current situation work first. I > understand the desire to later grow. Indeed. > > The upper filesystem can be empty or filled by a previous union. In fact, > > there's nothing stopping the upper fs being an ordinary fs that's then used > > as the upper layer in a union, but I'm not sure you can then access the > > lower echelons as the directories don't contain fallthru entries. > > Right, so in both cases they can be fully formed, in that case we'll > need to iterate all inodes and change their lock class as well. I think I misunderstood you, then. 'Fully formed' in what sense? I assumed you meant populated on disk. I'll post the two patches I have to deal with this. The first propagates the mount flags to sget() and the second makes use of MS_UNION in sget() to key the locks appropriately. David ^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: Union mount and lockdep design issues 2011-07-11 13:54 ` David Howells @ 2011-07-11 14:02 ` Peter Zijlstra 2011-07-11 14:50 ` [PATCH 1/2] VFS: Pass mount flags to sget() David Howells 2011-07-11 14:50 ` [PATCH 2/2] union-mount: Duplicate the i_{, dir_}mutex lock classes and use for upper layer David Howells 0 siblings, 2 replies; 81+ messages in thread From: Peter Zijlstra @ 2011-07-11 14:02 UTC (permalink / raw) To: David Howells Cc: Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar, Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos On Mon, 2011-07-11 at 14:54 +0100, David Howells wrote: > > I think I misunderstood you, then. 'Fully formed' in what sense? I assumed > you meant populated on disk. I meant, sb exists and has inodes. It would be easier to deal with if the union mount creates all sb involved, because then we can set the proper lock class from the start and not have to worry about existing state. > I'll post the two patches I have to deal with this. The first propagates the > mount flags to sget() and the second makes use of MS_UNION in sget() to key the > locks appropriately. Right, note that the patch I provided has a number of problems, for one it assumes inodes are unused and unlocked in that reclassify loop, furthermore even by holding inode_sb_list_lock there is a race with inode creation because we set the lock keys before adding to the s_inodes list. Anyway, it does illustrate the idea and someone more versed in vfs can make it work, I hope ;-) ^ permalink raw reply [flat|nested] 81+ messages in thread
* [PATCH 1/2] VFS: Pass mount flags to sget() 2011-07-11 14:02 ` Peter Zijlstra @ 2011-07-11 14:50 ` David Howells 2011-07-11 14:50 ` [PATCH 2/2] union-mount: Duplicate the i_{, dir_}mutex lock classes and use for upper layer David Howells 1 sibling, 0 replies; 81+ messages in thread From: David Howells @ 2011-07-11 14:50 UTC (permalink / raw) To: peterz Cc: dhowells, aviro, hch, hramrach, ikent, linux-fsdevel, linux-kernel, jmoyer, miklos Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: David Howells <dhowells@redhat.com> --- drivers/mtd/mtdsuper.c | 4 +--- fs/9p/vfs_super.c | 4 ++-- fs/afs/super.c | 3 +-- fs/btrfs/super.c | 4 ++-- fs/ceph/super.c | 2 +- fs/cifs/cifsfs.c | 9 ++++----- fs/devpts/inode.c | 6 +++--- fs/ecryptfs/main.c | 3 +-- fs/gfs2/ops_fstype.c | 5 ++--- fs/libfs.c | 4 ++-- fs/logfs/super.c | 3 +-- fs/nfs/super.c | 10 +++++----- fs/nilfs2/super.c | 4 ++-- fs/proc/root.c | 3 +-- fs/reiserfs/procfs.c | 2 +- fs/super.c | 22 +++++++++++----------- fs/sysfs/mount.c | 3 +-- fs/ubifs/super.c | 3 +-- include/linux/fs.h | 2 +- kernel/cgroup.c | 2 +- 20 files changed, 44 insertions(+), 54 deletions(-) diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c index 16b02a1..45ade10 100644 --- a/drivers/mtd/mtdsuper.c +++ b/drivers/mtd/mtdsuper.c @@ -62,7 +62,7 @@ static struct dentry *mount_mtd_aux(struct file_system_type *fs_type, int flags, struct super_block *sb; int ret; - sb = sget(fs_type, get_sb_mtd_compare, get_sb_mtd_set, mtd); + sb = sget(fs_type, get_sb_mtd_compare, get_sb_mtd_set, flags, mtd); if (IS_ERR(sb)) goto out_error; @@ -73,8 +73,6 @@ static struct dentry *mount_mtd_aux(struct file_system_type *fs_type, int flags, DEBUG(1, "MTDSB: New superblock for device %d (\"%s\")\n", mtd->index, mtd->name); - sb->s_flags = flags; - ret = fill_super(sb, data, flags & MS_SILENT ? 1 : 0); if (ret < 0) { deactivate_locked_super(sb); diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c index feef6cd..0d163c5 100644 --- a/fs/9p/vfs_super.c +++ b/fs/9p/vfs_super.c @@ -89,7 +89,7 @@ v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses, if (v9ses->cache) sb->s_bdi->ra_pages = (VM_MAX_READAHEAD * 1024)/PAGE_CACHE_SIZE; - sb->s_flags = flags | MS_ACTIVE | MS_DIRSYNC | MS_NOATIME; + sb->s_flags |= MS_ACTIVE | MS_DIRSYNC | MS_NOATIME; if (!v9ses->cache) sb->s_flags |= MS_SYNCHRONOUS; @@ -137,7 +137,7 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags, goto close_session; } - sb = sget(fs_type, NULL, v9fs_set_super, v9ses); + sb = sget(fs_type, NULL, v9fs_set_super, flags, v9ses); if (IS_ERR(sb)) { retval = PTR_ERR(sb); goto clunk_fid; diff --git a/fs/afs/super.c b/fs/afs/super.c index 356dcf0..3c859b9 100644 --- a/fs/afs/super.c +++ b/fs/afs/super.c @@ -398,7 +398,7 @@ static struct dentry *afs_mount(struct file_system_type *fs_type, as->volume = vol; /* allocate a deviceless superblock */ - sb = sget(fs_type, afs_test_super, afs_set_super, as); + sb = sget(fs_type, afs_test_super, afs_set_super, flags, as); if (IS_ERR(sb)) { ret = PTR_ERR(sb); afs_put_volume(vol); @@ -409,7 +409,6 @@ static struct dentry *afs_mount(struct file_system_type *fs_type, if (!sb->s_root) { /* initial superblock/root creation */ _debug("create"); - sb->s_flags = flags; ret = afs_fill_super(sb, ¶ms); if (ret < 0) { deactivate_locked_super(sb); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 0bb4ebb..c6d6836 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -808,7 +808,8 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags, tree_root->fs_info = fs_info; bdev = fs_devices->latest_bdev; - s = sget(fs_type, btrfs_test_super, btrfs_set_super, tree_root); + s = sget(fs_type, btrfs_test_super, btrfs_set_super, flags | MS_NOSEC, + tree_root); if (IS_ERR(s)) goto error_s; @@ -825,7 +826,6 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags, } else { char b[BDEVNAME_SIZE]; - s->s_flags = flags | MS_NOSEC; strlcpy(s->s_id, bdevname(bdev, b), sizeof(s->s_id)); error = btrfs_fill_super(s, fs_devices, data, flags & MS_SILENT ? 1 : 0); diff --git a/fs/ceph/super.c b/fs/ceph/super.c index f2f77fd..82710f6 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -823,7 +823,7 @@ static struct dentry *ceph_mount(struct file_system_type *fs_type, if (ceph_test_opt(fsc->client, NOSHARE)) compare_super = NULL; - sb = sget(fs_type, compare_super, ceph_set_super, fsc); + sb = sget(fs_type, compare_super, ceph_set_super, flags, fsc); if (IS_ERR(sb)) { res = ERR_CAST(sb); goto out; diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index 35f9154..834367e 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -679,7 +679,10 @@ cifs_do_mount(struct file_system_type *fs_type, mnt_data.cifs_sb = cifs_sb; mnt_data.flags = flags; - sb = sget(fs_type, cifs_match_super, cifs_set_super, &mnt_data); + /* BB should we make this contingent on mount parm? */ + flags |= MS_NODIRATIME | MS_NOATIME; + + sb = sget(fs_type, cifs_match_super, cifs_set_super, flags, &mnt_data); if (IS_ERR(sb)) { root = ERR_CAST(sb); cifs_umount(cifs_sb); @@ -690,10 +693,6 @@ cifs_do_mount(struct file_system_type *fs_type, cFYI(1, "Use existing superblock"); cifs_umount(cifs_sb); } else { - sb->s_flags = flags; - /* BB should we make this contingent on mount parm? */ - sb->s_flags |= MS_NODIRATIME | MS_NOATIME; - rc = cifs_read_super(sb); if (rc) { root = ERR_PTR(rc); diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c index 2f27e57..4154d08 100644 --- a/fs/devpts/inode.c +++ b/fs/devpts/inode.c @@ -369,15 +369,15 @@ static struct dentry *devpts_mount(struct file_system_type *fs_type, return ERR_PTR(error); if (opts.newinstance) - s = sget(fs_type, NULL, set_anon_super, NULL); + s = sget(fs_type, NULL, set_anon_super, flags, NULL); else - s = sget(fs_type, compare_init_pts_sb, set_anon_super, NULL); + s = sget(fs_type, compare_init_pts_sb, set_anon_super, flags, + NULL); if (IS_ERR(s)) return ERR_CAST(s); if (!s->s_root) { - s->s_flags = flags; error = devpts_fill_super(s, data, flags & MS_SILENT ? 1 : 0); if (error) goto out_undo_sget; diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c index 9f1bb74..46ac2b2 100644 --- a/fs/ecryptfs/main.c +++ b/fs/ecryptfs/main.c @@ -489,13 +489,12 @@ static struct dentry *ecryptfs_mount(struct file_system_type *fs_type, int flags goto out; } - s = sget(fs_type, NULL, set_anon_super, NULL); + s = sget(fs_type, NULL, set_anon_super, flags, NULL); if (IS_ERR(s)) { rc = PTR_ERR(s); goto out; } - s->s_flags = flags; rc = bdi_setup_and_register(&sbi->bdi, "ecryptfs", BDI_CAP_MAP_COPY); if (rc) goto out1; diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 8ac9ae1..80f24ce 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -1278,7 +1278,7 @@ static struct dentry *gfs2_mount(struct file_system_type *fs_type, int flags, error = -EBUSY; goto error_bdev; } - s = sget(fs_type, test_gfs2_super, set_gfs2_super, bdev); + s = sget(fs_type, test_gfs2_super, set_gfs2_super, flags, bdev); mutex_unlock(&bdev->bd_fsfreeze_mutex); error = PTR_ERR(s); if (IS_ERR(s)) @@ -1308,7 +1308,6 @@ static struct dentry *gfs2_mount(struct file_system_type *fs_type, int flags, } else { char b[BDEVNAME_SIZE]; - s->s_flags = flags; s->s_mode = mode; strlcpy(s->s_id, bdevname(bdev, b), sizeof(s->s_id)); sb_set_blocksize(s, block_size(bdev)); @@ -1352,7 +1351,7 @@ static struct dentry *gfs2_mount_meta(struct file_system_type *fs_type, dev_name, error); return ERR_PTR(error); } - s = sget(&gfs2_fs_type, test_gfs2_super, set_meta_super, + s = sget(&gfs2_fs_type, test_gfs2_super, set_meta_super, flags, path.dentry->d_inode->i_sb->s_bdev); path_put(&path); if (IS_ERR(s)) { diff --git a/fs/libfs.c b/fs/libfs.c index c88eab5..130d4f7 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -220,15 +220,15 @@ struct dentry *mount_pseudo(struct file_system_type *fs_type, char *name, const struct super_operations *ops, const struct dentry_operations *dops, unsigned long magic) { - struct super_block *s = sget(fs_type, NULL, set_anon_super, NULL); + struct super_block *s; struct dentry *dentry; struct inode *root; struct qstr d_name = {.name = name, .len = strlen(name)}; + s = sget(fs_type, NULL, set_anon_super, MS_NOUSER, NULL); if (IS_ERR(s)) return ERR_CAST(s); - s->s_flags = MS_NOUSER; s->s_maxbytes = MAX_LFS_FILESIZE; s->s_blocksize = PAGE_SIZE; s->s_blocksize_bits = PAGE_SHIFT; diff --git a/fs/logfs/super.c b/fs/logfs/super.c index ce03a18..322888a 100644 --- a/fs/logfs/super.c +++ b/fs/logfs/super.c @@ -541,7 +541,7 @@ static struct dentry *logfs_get_sb_device(struct logfs_super *super, log_super("LogFS: Start mount %x\n", mount_count++); err = -EINVAL; - sb = sget(type, logfs_sb_test, logfs_sb_set, super); + sb = sget(type, logfs_sb_test, logfs_sb_set, flags | MS_NOATIME, super); if (IS_ERR(sb)) { super->s_devops->put_device(super); kfree(super); @@ -563,7 +563,6 @@ static struct dentry *logfs_get_sb_device(struct logfs_super *super, */ sb->s_maxbytes = (1ull << 43) - 1; sb->s_op = &logfs_super_operations; - sb->s_flags = flags | MS_NOATIME; err = logfs_read_sb(sb, sb->s_flags & MS_RDONLY); if (err) diff --git a/fs/nfs/super.c b/fs/nfs/super.c index ce40e5c..4d2ea32 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -2250,7 +2250,7 @@ static struct dentry *nfs_fs_mount(struct file_system_type *fs_type, compare_super = NULL; /* Get a superblock - note that we may end up sharing one that already exists */ - s = sget(fs_type, compare_super, nfs_set_super, &sb_mntdata); + s = sget(fs_type, compare_super, nfs_set_super, flags, &sb_mntdata); if (IS_ERR(s)) { mntroot = ERR_CAST(s); goto out_err_nosb; @@ -2362,7 +2362,7 @@ nfs_xdev_mount(struct file_system_type *fs_type, int flags, compare_super = NULL; /* Get a superblock - note that we may end up sharing one that already exists */ - s = sget(&nfs_fs_type, compare_super, nfs_set_super, &sb_mntdata); + s = sget(&nfs_fs_type, compare_super, nfs_set_super, flags, &sb_mntdata); if (IS_ERR(s)) { error = PTR_ERR(s); goto out_err_nosb; @@ -2629,7 +2629,7 @@ nfs4_remote_mount(struct file_system_type *fs_type, int flags, compare_super = NULL; /* Get a superblock - note that we may end up sharing one that already exists */ - s = sget(&nfs4_fs_type, compare_super, nfs_set_super, &sb_mntdata); + s = sget(&nfs4_fs_type, compare_super, nfs_set_super, flags, &sb_mntdata); if (IS_ERR(s)) { error = PTR_ERR(s); goto out_free; @@ -2923,7 +2923,7 @@ nfs4_xdev_mount(struct file_system_type *fs_type, int flags, compare_super = NULL; /* Get a superblock - note that we may end up sharing one that already exists */ - s = sget(&nfs4_fs_type, compare_super, nfs_set_super, &sb_mntdata); + s = sget(&nfs4_fs_type, compare_super, nfs_set_super, flags, &sb_mntdata); if (IS_ERR(s)) { error = PTR_ERR(s); goto out_err_nosb; @@ -3010,7 +3010,7 @@ nfs4_remote_referral_mount(struct file_system_type *fs_type, int flags, compare_super = NULL; /* Get a superblock - note that we may end up sharing one that already exists */ - s = sget(&nfs4_fs_type, compare_super, nfs_set_super, &sb_mntdata); + s = sget(&nfs4_fs_type, compare_super, nfs_set_super, flags, &sb_mntdata); if (IS_ERR(s)) { error = PTR_ERR(s); goto out_err_nosb; diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c index 8351c44..1a36d5d 100644 --- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -1290,7 +1290,8 @@ nilfs_mount(struct file_system_type *fs_type, int flags, err = -EBUSY; goto failed; } - s = sget(fs_type, nilfs_test_bdev_super, nilfs_set_bdev_super, sd.bdev); + s = sget(fs_type, nilfs_test_bdev_super, nilfs_set_bdev_super, flags, + sd.bdev); mutex_unlock(&sd.bdev->bd_fsfreeze_mutex); if (IS_ERR(s)) { err = PTR_ERR(s); @@ -1303,7 +1304,6 @@ nilfs_mount(struct file_system_type *fs_type, int flags, s_new = true; /* New superblock instance created */ - s->s_flags = flags; s->s_mode = mode; strlcpy(s->s_id, bdevname(sd.bdev, b), sizeof(s->s_id)); sb_set_blocksize(s, block_size(sd.bdev)); diff --git a/fs/proc/root.c b/fs/proc/root.c index d6c3b41..e393f12 100644 --- a/fs/proc/root.c +++ b/fs/proc/root.c @@ -49,12 +49,11 @@ static struct dentry *proc_mount(struct file_system_type *fs_type, else ns = current->nsproxy->pid_ns; - sb = sget(fs_type, proc_test_super, proc_set_super, ns); + sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns); if (IS_ERR(sb)) return ERR_CAST(sb); if (!sb->s_root) { - sb->s_flags = flags; err = proc_fill_super(sb); if (err) { deactivate_locked_super(sb); diff --git a/fs/reiserfs/procfs.c b/fs/reiserfs/procfs.c index 7a99811..de42b03 100644 --- a/fs/reiserfs/procfs.c +++ b/fs/reiserfs/procfs.c @@ -404,7 +404,7 @@ static void *r_start(struct seq_file *m, loff_t * pos) if (l) return NULL; - if (IS_ERR(sget(&reiserfs_fs_type, test_sb, set_sb, s))) + if (IS_ERR(sget(&reiserfs_fs_type, test_sb, set_sb, 0, s))) return NULL; up_write(&s->s_umount); diff --git a/fs/super.c b/fs/super.c index ab3d672..1a36e98 100644 --- a/fs/super.c +++ b/fs/super.c @@ -41,11 +41,12 @@ DEFINE_SPINLOCK(sb_lock); /** * alloc_super - create new superblock * @type: filesystem type superblock should belong to + * @flags: the mount flags * * Allocates and initializes a new &struct super_block. alloc_super() * returns a pointer new superblock or %NULL if allocation had failed. */ -static struct super_block *alloc_super(struct file_system_type *type) +static struct super_block *alloc_super(struct file_system_type *type, int flags) { struct super_block *s = kzalloc(sizeof(struct super_block), GFP_USER); static const struct super_operations default_op; @@ -72,6 +73,7 @@ static struct super_block *alloc_super(struct file_system_type *type) #else INIT_LIST_HEAD(&s->s_files); #endif + s->s_flags = flags; s->s_bdi = &default_backing_dev_info; INIT_LIST_HEAD(&s->s_instances); INIT_HLIST_BL_HEAD(&s->s_anon); @@ -311,11 +313,13 @@ EXPORT_SYMBOL(generic_shutdown_super); * @type: filesystem type superblock should belong to * @test: comparison callback * @set: setup callback + * @flags: mount flags * @data: argument to each of them */ struct super_block *sget(struct file_system_type *type, int (*test)(struct super_block *,void *), int (*set)(struct super_block *,void *), + int flags, void *data) { struct super_block *s = NULL; @@ -345,7 +349,7 @@ retry: } if (!s) { spin_unlock(&sb_lock); - s = alloc_super(type); + s = alloc_super(type, flags); if (!s) return ERR_PTR(-ENOMEM); goto retry; @@ -730,13 +734,12 @@ struct dentry *mount_ns(struct file_system_type *fs_type, int flags, { struct super_block *sb; - sb = sget(fs_type, ns_test_super, ns_set_super, data); + sb = sget(fs_type, ns_test_super, ns_set_super, flags, data); if (IS_ERR(sb)) return ERR_CAST(sb); if (!sb->s_root) { int err; - sb->s_flags = flags; err = fill_super(sb, data, flags & MS_SILENT ? 1 : 0); if (err) { deactivate_locked_super(sb); @@ -797,7 +800,8 @@ struct dentry *mount_bdev(struct file_system_type *fs_type, error = -EBUSY; goto error_bdev; } - s = sget(fs_type, test_bdev_super, set_bdev_super, bdev); + s = sget(fs_type, test_bdev_super, set_bdev_super, flags | MS_NOSEC, + bdev); mutex_unlock(&bdev->bd_fsfreeze_mutex); if (IS_ERR(s)) goto error_s; @@ -822,7 +826,6 @@ struct dentry *mount_bdev(struct file_system_type *fs_type, } else { char b[BDEVNAME_SIZE]; - s->s_flags = flags | MS_NOSEC; s->s_mode = mode; strlcpy(s->s_id, bdevname(bdev, b), sizeof(s->s_id)); sb_set_blocksize(s, block_size(bdev)); @@ -867,13 +870,11 @@ struct dentry *mount_nodev(struct file_system_type *fs_type, int (*fill_super)(struct super_block *, void *, int)) { int error; - struct super_block *s = sget(fs_type, NULL, set_anon_super, NULL); + struct super_block *s = sget(fs_type, NULL, set_anon_super, flags, NULL); if (IS_ERR(s)) return ERR_CAST(s); - s->s_flags = flags; - error = fill_super(s, data, flags & MS_SILENT ? 1 : 0); if (error) { deactivate_locked_super(s); @@ -896,11 +897,10 @@ struct dentry *mount_single(struct file_system_type *fs_type, struct super_block *s; int error; - s = sget(fs_type, compare_single, set_anon_super, NULL); + s = sget(fs_type, compare_single, set_anon_super, flags, NULL); if (IS_ERR(s)) return ERR_CAST(s); if (!s->s_root) { - s->s_flags = flags; error = fill_super(s, data, flags & MS_SILENT ? 1 : 0); if (error) { deactivate_locked_super(s); diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c index e34f0d9..a6674da 100644 --- a/fs/sysfs/mount.c +++ b/fs/sysfs/mount.c @@ -118,13 +118,12 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type, for (type = KOBJ_NS_TYPE_NONE; type < KOBJ_NS_TYPES; type++) info->ns[type] = kobj_ns_grab_current(type); - sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info); + sb = sget(fs_type, sysfs_test_super, sysfs_set_super, flags, info); if (IS_ERR(sb) || sb->s_fs_info != info) free_sysfs_super_info(info); if (IS_ERR(sb)) return ERR_CAST(sb); if (!sb->s_root) { - sb->s_flags = flags; error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0); if (error) { deactivate_locked_super(sb); diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c index 529be05..d628df8 100644 --- a/fs/ubifs/super.c +++ b/fs/ubifs/super.c @@ -2142,7 +2142,7 @@ static struct dentry *ubifs_mount(struct file_system_type *fs_type, int flags, dbg_gen("opened ubi%d_%d", c->vi.ubi_num, c->vi.vol_id); - sb = sget(fs_type, sb_test, sb_set, c); + sb = sget(fs_type, sb_test, sb_set, flags, c); if (IS_ERR(sb)) { err = PTR_ERR(sb); kfree(c); @@ -2159,7 +2159,6 @@ static struct dentry *ubifs_mount(struct file_system_type *fs_type, int flags, goto out_deact; } } else { - sb->s_flags = flags; err = ubifs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0); if (err) goto out_deact; diff --git a/include/linux/fs.h b/include/linux/fs.h index b5b9792..602167b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1840,7 +1840,7 @@ int set_anon_super(struct super_block *s, void *data); struct super_block *sget(struct file_system_type *type, int (*test)(struct super_block *,void *), int (*set)(struct super_block *,void *), - void *data); + int flags, void *data); extern struct dentry *mount_pseudo(struct file_system_type *, char *, const struct super_operations *ops, const struct dentry_operations *dops, diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 2731d11..916924a 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1499,7 +1499,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, opts.new_root = new_root; /* Locate an existing or new sb for this hierarchy */ - sb = sget(fs_type, cgroup_test_super, cgroup_set_super, &opts); + sb = sget(fs_type, cgroup_test_super, cgroup_set_super, 0, &opts); if (IS_ERR(sb)) { ret = PTR_ERR(sb); cgroup_drop_root(opts.new_root); ^ permalink raw reply related [flat|nested] 81+ messages in thread
* [PATCH 2/2] union-mount: Duplicate the i_{, dir_}mutex lock classes and use for upper layer 2011-07-11 14:02 ` Peter Zijlstra 2011-07-11 14:50 ` [PATCH 1/2] VFS: Pass mount flags to sget() David Howells @ 2011-07-11 14:50 ` David Howells 1 sibling, 0 replies; 81+ messages in thread From: David Howells @ 2011-07-11 14:50 UTC (permalink / raw) To: peterz Cc: dhowells, aviro, hch, hramrach, ikent, linux-fsdevel, linux-kernel, jmoyer, miklos Duplicate the i_mutex and i_dir_mutex lock classes and use for unionmount upper layer superblock instead of the normal lock classes. This solves some of the lockdep noise when the VFS tries to hold locks on inodes in both layers at the same time. Note these only occur if both layers are of the same filesystem type. As far as I can tell, most of the lockdep warnings are false positives since the inodes being locked are part of different superblocks; however, because lockdep works on lock *classes*, it can't determine this. I suspect that giving each superblock its own lock class would overextend lockdep. --- fs/inode.c | 48 ++++++++++++++++++++++++++++++++++++------------ fs/namespace.c | 2 +- fs/super.c | 8 ++++++++ include/linux/fs.h | 5 +++-- 4 files changed, 48 insertions(+), 15 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 43566d1..95d076d 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -173,8 +173,14 @@ int inode_init_always(struct super_block *sb, struct inode *inode) spin_lock_init(&inode->i_lock); lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key); + /* Duplicate the code with separate indices so that when lockdep print + * a warning, the numeric index is seen. + */ mutex_init(&inode->i_mutex); - lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key); + if (sb->s_lock_class == 0) + lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key[0]); + else + lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key[1]); init_rwsem(&inode->i_alloc_sem); lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key); @@ -882,18 +888,36 @@ void unlock_new_inode(struct inode *inode) { #ifdef CONFIG_DEBUG_LOCK_ALLOC if (S_ISDIR(inode->i_mode)) { - struct file_system_type *type = inode->i_sb->s_type; + struct super_block *sb = inode->i_sb; + struct file_system_type *type = sb->s_type; - /* Set new key only if filesystem hasn't already changed it */ - if (!lockdep_match_class(&inode->i_mutex, - &type->i_mutex_key)) { - /* - * ensure nobody is actually holding i_mutex - */ - mutex_destroy(&inode->i_mutex); - mutex_init(&inode->i_mutex); - lockdep_set_class(&inode->i_mutex, - &type->i_mutex_dir_key); + /* Set new key only if filesystem hasn't already changed it + * + * Duplicate the code with separate indices so that when + * lockdep print a warning, the numeric index is seen. + */ + if (sb->s_lock_class == 0) { + if (!lockdep_match_class(&inode->i_mutex, + &type->i_mutex_key[0])) { + /* + * ensure nobody is actually holding i_mutex + */ + mutex_destroy(&inode->i_mutex); + mutex_init(&inode->i_mutex); + lockdep_set_class(&inode->i_mutex, + &type->i_mutex_dir_key[0]); + } + } else { + if (!lockdep_match_class(&inode->i_mutex, + &type->i_mutex_key[1])) { + /* + * ensure nobody is actually holding i_mutex + */ + mutex_destroy(&inode->i_mutex); + mutex_init(&inode->i_mutex); + lockdep_set_class(&inode->i_mutex, + &type->i_mutex_dir_key[1]); + } } } #endif diff --git a/fs/namespace.c b/fs/namespace.c index 18958fd..59f0942 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2611,7 +2611,7 @@ long do_mount(char *dev_name, char *dir_name, char *type_page, flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN | MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT | - MS_STRICTATIME | MS_UNION); + MS_STRICTATIME); if (flags & MS_REMOUNT) retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags, diff --git a/fs/super.c b/fs/super.c index 95a2ebc..cd60a34 100644 --- a/fs/super.c +++ b/fs/super.c @@ -74,6 +74,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags) INIT_LIST_HEAD(&s->s_files); #endif s->s_flags = flags; + s->s_lock_class = (flags & MS_UNION) ? 1 : 0; s->s_bdi = &default_backing_dev_info; INIT_LIST_HEAD(&s->s_instances); INIT_HLIST_BL_HEAD(&s->s_anon); @@ -346,6 +347,13 @@ retry: deactivate_locked_super(old); goto retry; } +#ifdef CONFIG_UNION_MOUNT + if (unlikely((old->s_flags | flags) & MS_UNION)) { + up_write(&old->s_umount); + deactivate_locked_super(old); + return ERR_PTR(-EINVAL); + } +#endif return old; } } diff --git a/include/linux/fs.h b/include/linux/fs.h index 4bf3903..2fd73a9 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1373,6 +1373,7 @@ struct super_block { dev_t s_dev; /* search index; _not_ kdev_t */ unsigned char s_dirt; unsigned char s_blocksize_bits; + u8 s_lock_class; /* Set of lock classes to use */ unsigned long s_blocksize; loff_t s_maxbytes; /* Max file size */ struct file_system_type *s_type; @@ -1842,8 +1843,8 @@ struct file_system_type { struct lock_class_key s_vfs_rename_key; struct lock_class_key i_lock_key; - struct lock_class_key i_mutex_key; - struct lock_class_key i_mutex_dir_key; + struct lock_class_key i_mutex_key[2]; + struct lock_class_key i_mutex_dir_key[2]; struct lock_class_key i_alloc_sem_key; }; ^ permalink raw reply related [flat|nested] 81+ messages in thread
end of thread, other threads:[~2011-07-14 0:58 UTC | newest] Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-04-12 15:00 Unionmount status? Michal Suchanek 2011-04-12 20:31 ` Ric Wheeler 2011-04-12 21:36 ` Michal Suchanek 2011-04-13 14:18 ` Jiri Kosina 2011-04-13 15:13 ` Michal Suchanek 2011-04-14 8:38 ` Miklos Szeredi 2011-04-14 9:48 ` Sedat Dilek 2011-04-14 9:58 ` Miklos Szeredi 2011-04-15 11:22 ` Michal Suchanek 2011-04-15 11:31 ` Miklos Szeredi 2011-04-15 11:51 ` Michal Suchanek 2011-04-15 12:29 ` Miklos Szeredi 2011-04-15 12:34 ` Michal Suchanek 2011-04-15 12:34 ` Michal Suchanek 2011-04-15 12:48 ` Miklos Szeredi 2011-04-15 21:48 ` Hugh Dickins 2011-04-15 22:18 ` Andreas Dilger 2011-04-18 13:31 ` Michal Suchanek 2011-04-18 13:31 ` Michal Suchanek 2011-04-19 20:04 ` [PATCH] tmpfs: implement generic xattr support Miklos Szeredi 2011-04-20 2:18 ` Phillip Lougher 2011-04-20 2:18 ` Phillip Lougher 2011-04-20 13:43 ` Miklos Szeredi 2011-04-21 6:59 ` Michal Suchanek 2011-04-21 9:08 ` Miklos Szeredi 2011-04-21 10:59 ` Michal Suchanek 2011-04-21 14:58 ` Jordi Pujol 2011-04-21 15:22 ` Michal Suchanek 2011-04-21 15:43 ` Michal Suchanek 2011-04-21 17:26 ` Miklos Szeredi 2011-04-21 19:17 ` Michal Suchanek 2011-04-20 16:00 ` Serge E. Hallyn 2011-05-12 4:20 ` Hugh Dickins 2011-05-12 7:52 ` Michal Suchanek 2011-05-12 12:27 ` Miklos Szeredi 2011-05-12 14:00 ` Miklos Szeredi 2011-05-12 16:52 ` Hugh Dickins 2011-04-18 13:34 ` Unionmount status? Michal Suchanek 2011-04-18 13:34 ` Michal Suchanek 2011-04-18 13:37 ` Michal Suchanek 2011-04-18 13:37 ` Michal Suchanek 2011-04-13 17:26 ` Ric Wheeler 2011-04-13 18:58 ` Michal Suchanek 2011-04-13 19:11 ` Ric Wheeler 2011-04-13 19:47 ` Michal Suchanek 2011-04-14 4:50 ` Ian Kent 2011-04-14 9:32 ` Michal Suchanek 2011-04-14 9:40 ` Miklos Szeredi 2011-04-14 13:21 ` Ric Wheeler 2011-04-14 14:54 ` Michal Suchanek 2011-04-15 16:31 ` Ric Wheeler 2011-04-14 19:14 ` David Howells 2011-06-29 9:39 ` Union mount and overlayfs bake off? Ric Wheeler 2011-06-29 11:40 ` Michal Suchanek 2011-06-29 10:17 ` David Howells 2011-06-30 12:44 ` Miklos Szeredi 2011-07-10 8:28 ` Union mount and lockdep design issues Ric Wheeler 2011-07-10 13:48 ` Peter Zijlstra 2011-07-11 8:35 ` Michal Suchanek 2011-07-11 11:01 ` David Howells 2011-07-11 12:00 ` Peter Zijlstra 2011-07-11 13:36 ` Michal Suchanek 2011-07-11 13:36 ` Michal Suchanek 2011-07-11 13:50 ` Ian Kent 2011-07-11 16:17 ` Michal Suchanek 2011-07-11 17:23 ` Ian Kent 2011-07-11 18:08 ` Michal Suchanek 2011-07-12 8:30 ` Miklos Szeredi 2011-07-12 9:58 ` Michal Suchanek 2011-07-12 11:45 ` Miklos Szeredi 2011-07-12 18:49 ` Michal Suchanek 2011-07-13 9:49 ` Miklos Szeredi 2011-07-13 12:02 ` David Howells 2011-07-13 12:02 ` David Howells 2011-07-13 13:20 ` Miklos Szeredi 2011-07-14 0:57 ` David Howells 2011-07-14 0:57 ` David Howells 2011-07-11 13:54 ` David Howells 2011-07-11 14:02 ` Peter Zijlstra 2011-07-11 14:50 ` [PATCH 1/2] VFS: Pass mount flags to sget() David Howells 2011-07-11 14:50 ` [PATCH 2/2] union-mount: Duplicate the i_{, dir_}mutex lock classes and use for upper layer David Howells
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.