All of lore.kernel.org
 help / color / mirror / Atom feed
* Unionmount status?
@ 2011-04-12 15:00 Michal Suchanek
  2011-04-12 20:31 ` Ric Wheeler
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-12 15:00 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel

Hello,

as some already know the Unionmount VFS union which has been in
development for some years now is the only True Union (TM) that can be
accepted into the kernel mainline by the VFS maintainers (for reasons
of their own which you can surely find if you search the web or ask
them directly).

The current UnionMount version that can be found here:

http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works

works for me as good as aufs does. That is I can build a live CD using
this unioning solution and it boots and runs without any apparent
issues.

There are probably many possible uses of the union which I did not
test nor did I test long term stability of using the unioned
filesystem. As far as ephemeral live systems go it works fine for me,
though.

The issue is that while the code is (nearly) finished it is not yet
merged into mainline and as I am not familiar with the details of
ever-changing Linux VFS layer forward-porting this code to current
kernels is somewhat challenging.

What is the plan with unionmount now?

What is required  for it to be merged into mainline?

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-12 15:00 Unionmount status? Michal Suchanek
@ 2011-04-12 20:31 ` Ric Wheeler
  2011-04-12 21:36   ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Ric Wheeler @ 2011-04-12 20:31 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	miklos, Christoph Hellwig

On 04/12/2011 11:00 AM, Michal Suchanek wrote:
> Hello,
>
> as some already know the Unionmount VFS union which has been in
> development for some years now is the only True Union (TM) that can be
> accepted into the kernel mainline by the VFS maintainers (for reasons
> of their own which you can surely find if you search the web or ask
> them directly).
>
> The current UnionMount version that can be found here:
>
> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>
> works for me as good as aufs does. That is I can build a live CD using
> this unioning solution and it boots and runs without any apparent
> issues.
>
> There are probably many possible uses of the union which I did not
> test nor did I test long term stability of using the unioned
> filesystem. As far as ephemeral live systems go it works fine for me,
> though.
>
> The issue is that while the code is (nearly) finished it is not yet
> merged into mainline and as I am not familiar with the details of
> ever-changing Linux VFS layer forward-porting this code to current
> kernels is somewhat challenging.
>
> What is the plan with unionmount now?
>
> What is required  for it to be merged into mainline?
>
> Thanks
>
> Michal

Hi Michal,

People are actively looking to see what union mount (or overlayfs) solution to 
pursue. Val has shifted her focus away from kernel hacking these days, but did 
refresh her patch set in the last month or so.

Miklos has an overlay file system that has also been posted upstream.

I think that testing like you did and getting more eyes to look at the code is 
the next key step for both projects.

Thanks!

Ric




^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-12 20:31 ` Ric Wheeler
@ 2011-04-12 21:36   ` Michal Suchanek
  2011-04-13 14:18     ` Jiri Kosina
  2011-04-13 17:26     ` Ric Wheeler
  0 siblings, 2 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-12 21:36 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	miklos, Christoph Hellwig

On 12 April 2011 22:31, Ric Wheeler <ricwheeler@gmail.com> wrote:
> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>
>> Hello,
>>
>> as some already know the Unionmount VFS union which has been in
>> development for some years now is the only True Union (TM) that can be
>> accepted into the kernel mainline by the VFS maintainers (for reasons
>> of their own which you can surely find if you search the web or ask
>> them directly).
>>
>> The current UnionMount version that can be found here:
>>
>>
>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>
>> works for me as good as aufs does. That is I can build a live CD using
>> this unioning solution and it boots and runs without any apparent
>> issues.
>>
>> There are probably many possible uses of the union which I did not
>> test nor did I test long term stability of using the unioned
>> filesystem. As far as ephemeral live systems go it works fine for me,
>> though.
>>
>> The issue is that while the code is (nearly) finished it is not yet
>> merged into mainline and as I am not familiar with the details of
>> ever-changing Linux VFS layer forward-porting this code to current
>> kernels is somewhat challenging.
>>
>> What is the plan with unionmount now?
>>
>> What is required  for it to be merged into mainline?
>>
>> Thanks
>>
>> Michal
>
> Hi Michal,
>
> People are actively looking to see what union mount (or overlayfs) solution
> to pursue. Val has shifted her focus away from kernel hacking these days,
> but did refresh her patch set in the last month or so.

I am not aware of such refreshed patch set, at least it is not
published in her repo.

>
> Miklos has an overlay file system that has also been posted upstream.

I have no idea about his other overlay filesystem either.

>
> I think that testing like you did and getting more eyes to look at the code
> is the next key step for both projects.
>

Could you provide more details?

I am not following the Linux fs development closely.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-12 21:36   ` Michal Suchanek
@ 2011-04-13 14:18     ` Jiri Kosina
  2011-04-13 15:13       ` Michal Suchanek
  2011-04-13 17:26     ` Ric Wheeler
  1 sibling, 1 reply; 81+ messages in thread
From: Jiri Kosina @ 2011-04-13 14:18 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Ric Wheeler, linux-fsdevel, linux-kernel, David Howells,
	Ian Kent, Jeff Moyer, miklos, Christoph Hellwig

On Tue, 12 Apr 2011, Michal Suchanek wrote:

> > Miklos has an overlay file system that has also been posted upstream.
> 
> I have no idea about his other overlay filesystem either.

https://lkml.org/lkml/2011/3/22/222

-- 
Jiri Kosina
SUSE Labs, Novell Inc.


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-13 14:18     ` Jiri Kosina
@ 2011-04-13 15:13       ` Michal Suchanek
  2011-04-14  8:38         ` Miklos Szeredi
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-13 15:13 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Ric Wheeler, linux-fsdevel, linux-kernel, David Howells,
	Ian Kent, Jeff Moyer, miklos, Christoph Hellwig

On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote:
> On Tue, 12 Apr 2011, Michal Suchanek wrote:
>
>> > Miklos has an overlay file system that has also been posted upstream.
>>
>> I have no idea about his other overlay filesystem either.
>
> https://lkml.org/lkml/2011/3/22/222
>

Thanks for pointing out this announcement.

However, neither this announcement nor the document
Documentation/filesystems/overlayfs.txt included in the git tree
details how to mount the filesystem. Without that it is kind of
useless.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-12 21:36   ` Michal Suchanek
  2011-04-13 14:18     ` Jiri Kosina
@ 2011-04-13 17:26     ` Ric Wheeler
  2011-04-13 18:58       ` Michal Suchanek
  1 sibling, 1 reply; 81+ messages in thread
From: Ric Wheeler @ 2011-04-13 17:26 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	miklos, Christoph Hellwig

On 04/12/2011 05:36 PM, Michal Suchanek wrote:
> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com>  wrote:
>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>> Hello,
>>>
>>> as some already know the Unionmount VFS union which has been in
>>> development for some years now is the only True Union (TM) that can be
>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>>> of their own which you can surely find if you search the web or ask
>>> them directly).
>>>
>>> The current UnionMount version that can be found here:
>>>
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>>
>>> works for me as good as aufs does. That is I can build a live CD using
>>> this unioning solution and it boots and runs without any apparent
>>> issues.
>>>
>>> There are probably many possible uses of the union which I did not
>>> test nor did I test long term stability of using the unioned
>>> filesystem. As far as ephemeral live systems go it works fine for me,
>>> though.
>>>
>>> The issue is that while the code is (nearly) finished it is not yet
>>> merged into mainline and as I am not familiar with the details of
>>> ever-changing Linux VFS layer forward-porting this code to current
>>> kernels is somewhat challenging.
>>>
>>> What is the plan with unionmount now?
>>>
>>> What is required  for it to be merged into mainline?
>>>
>>> Thanks
>>>
>>> Michal
>> Hi Michal,
>>
>> People are actively looking to see what union mount (or overlayfs) solution
>> to pursue. Val has shifted her focus away from kernel hacking these days,
>> but did refresh her patch set in the last month or so.
> I am not aware of such refreshed patch set, at least it is not
> published in her repo.
>

Val posted the refreshed patches with the title on March 22nd:

http://lwn.net/Articles/435019/

Ric

>> Miklos has an overlay file system that has also been posted upstream.
> I have no idea about his other overlay filesystem either.
>
>> I think that testing like you did and getting more eyes to look at the code
>> is the next key step for both projects.
>>
> Could you provide more details?
>
> I am not following the Linux fs development closely.
>
> Thanks
>
> Michal


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-13 17:26     ` Ric Wheeler
@ 2011-04-13 18:58       ` Michal Suchanek
  2011-04-13 19:11         ` Ric Wheeler
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-13 18:58 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	miklos, Christoph Hellwig

On 13 April 2011 19:26, Ric Wheeler <ricwheeler@gmail.com> wrote:
> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
>>
>> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com>  wrote:
>>>
>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>>>
>>>> Hello,
>>>>
>>>> as some already know the Unionmount VFS union which has been in
>>>> development for some years now is the only True Union (TM) that can be
>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>>>> of their own which you can surely find if you search the web or ask
>>>> them directly).
>>>>
>>>> The current UnionMount version that can be found here:
>>>>
>>>>
>>>>
>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>>>
>>>> works for me as good as aufs does. That is I can build a live CD using
>>>> this unioning solution and it boots and runs without any apparent
>>>> issues.
>>>>
>>>> There are probably many possible uses of the union which I did not
>>>> test nor did I test long term stability of using the unioned
>>>> filesystem. As far as ephemeral live systems go it works fine for me,
>>>> though.
>>>>
>>>> The issue is that while the code is (nearly) finished it is not yet
>>>> merged into mainline and as I am not familiar with the details of
>>>> ever-changing Linux VFS layer forward-porting this code to current
>>>> kernels is somewhat challenging.
>>>>
>>>> What is the plan with unionmount now?
>>>>
>>>> What is required  for it to be merged into mainline?
>>>>
>>>> Thanks
>>>>
>>>> Michal
>>>
>>> Hi Michal,
>>>
>>> People are actively looking to see what union mount (or overlayfs)
>>> solution
>>> to pursue. Val has shifted her focus away from kernel hacking these days,
>>> but did refresh her patch set in the last month or so.
>>
>> I am not aware of such refreshed patch set, at least it is not
>> published in her repo.
>>
>
> Val posted the refreshed patches with the title on March 22nd:
>
> http://lwn.net/Articles/435019/
>

That article references the same four months old repo which I
mentioned at the start of the thread, only a slightly different
branch.

While it maybe useful for testing unionmount (which I already tried)
it is not a patch against current kernel which could be used to build
current live images.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-13 18:58       ` Michal Suchanek
@ 2011-04-13 19:11         ` Ric Wheeler
  2011-04-13 19:47           ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Ric Wheeler @ 2011-04-13 19:11 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	miklos, Christoph Hellwig

On 04/13/2011 02:58 PM, Michal Suchanek wrote:
> On 13 April 2011 19:26, Ric Wheeler<ricwheeler@gmail.com>  wrote:
>> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
>>> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com>    wrote:
>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>>>> Hello,
>>>>>
>>>>> as some already know the Unionmount VFS union which has been in
>>>>> development for some years now is the only True Union (TM) that can be
>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>>>>> of their own which you can surely find if you search the web or ask
>>>>> them directly).
>>>>>
>>>>> The current UnionMount version that can be found here:
>>>>>
>>>>>
>>>>>
>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>>>>
>>>>> works for me as good as aufs does. That is I can build a live CD using
>>>>> this unioning solution and it boots and runs without any apparent
>>>>> issues.
>>>>>
>>>>> There are probably many possible uses of the union which I did not
>>>>> test nor did I test long term stability of using the unioned
>>>>> filesystem. As far as ephemeral live systems go it works fine for me,
>>>>> though.
>>>>>
>>>>> The issue is that while the code is (nearly) finished it is not yet
>>>>> merged into mainline and as I am not familiar with the details of
>>>>> ever-changing Linux VFS layer forward-porting this code to current
>>>>> kernels is somewhat challenging.
>>>>>
>>>>> What is the plan with unionmount now?
>>>>>
>>>>> What is required  for it to be merged into mainline?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Michal
>>>> Hi Michal,
>>>>
>>>> People are actively looking to see what union mount (or overlayfs)
>>>> solution
>>>> to pursue. Val has shifted her focus away from kernel hacking these days,
>>>> but did refresh her patch set in the last month or so.
>>> I am not aware of such refreshed patch set, at least it is not
>>> published in her repo.
>>>
>> Val posted the refreshed patches with the title on March 22nd:
>>
>> http://lwn.net/Articles/435019/
>>
> That article references the same four months old repo which I
> mentioned at the start of the thread, only a slightly different
> branch.
>
> While it maybe useful for testing unionmount (which I already tried)
> it is not a patch against current kernel which could be used to build
> current live images.
>
> Thanks
>
> Michal

She did post the patch series that same date in March - you can probably grab 
the series from linux-fsdevel, look for this series:

"[PATCH 00/74] Union mounts version something or other"

Al Viro was planning on looking at her refreshed patches (he had reviewed them 
with her in person), but that is not going to happen any time soon so getting 
more eyes and testing would be great!

Ric


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-13 19:11         ` Ric Wheeler
@ 2011-04-13 19:47           ` Michal Suchanek
  2011-04-14  4:50             ` Ian Kent
  2011-04-14 19:14             ` David Howells
  0 siblings, 2 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-13 19:47 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	miklos, Christoph Hellwig

On 13 April 2011 21:11, Ric Wheeler <ricwheeler@gmail.com> wrote:
> On 04/13/2011 02:58 PM, Michal Suchanek wrote:
>>
>> On 13 April 2011 19:26, Ric Wheeler<ricwheeler@gmail.com>  wrote:
>>>
>>> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
>>>>
>>>> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com>    wrote:
>>>>>
>>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> as some already know the Unionmount VFS union which has been in
>>>>>> development for some years now is the only True Union (TM) that can be
>>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>>>>>> of their own which you can surely find if you search the web or ask
>>>>>> them directly).
>>>>>>
>>>>>> The current UnionMount version that can be found here:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>>>>>>
>>>>>> works for me as good as aufs does. That is I can build a live CD using
>>>>>> this unioning solution and it boots and runs without any apparent
>>>>>> issues.
>>>>>>
>>>>>> There are probably many possible uses of the union which I did not
>>>>>> test nor did I test long term stability of using the unioned
>>>>>> filesystem. As far as ephemeral live systems go it works fine for me,
>>>>>> though.
>>>>>>
>>>>>> The issue is that while the code is (nearly) finished it is not yet
>>>>>> merged into mainline and as I am not familiar with the details of
>>>>>> ever-changing Linux VFS layer forward-porting this code to current
>>>>>> kernels is somewhat challenging.
>>>>>>
>>>>>> What is the plan with unionmount now?
>>>>>>
>>>>>> What is required  for it to be merged into mainline?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Michal
>>>>>
>>>>> Hi Michal,
>>>>>
>>>>> People are actively looking to see what union mount (or overlayfs)
>>>>> solution
>>>>> to pursue. Val has shifted her focus away from kernel hacking these
>>>>> days,
>>>>> but did refresh her patch set in the last month or so.
>>>>
>>>> I am not aware of such refreshed patch set, at least it is not
>>>> published in her repo.
>>>>
>>> Val posted the refreshed patches with the title on March 22nd:
>>>
>>> http://lwn.net/Articles/435019/
>>>
>> That article references the same four months old repo which I
>> mentioned at the start of the thread, only a slightly different
>> branch.
>>
>> While it maybe useful for testing unionmount (which I already tried)
>> it is not a patch against current kernel which could be used to build
>> current live images.
>>
>> Thanks
>>
>> Michal
>
> She did post the patch series that same date in March - you can probably
> grab the series from linux-fsdevel, look for this series:
>
> "[PATCH 00/74] Union mounts version something or other"
>
> Al Viro was planning on looking at her refreshed patches (he had reviewed
> them with her in person), but that is not going to happen any time soon so
> getting more eyes and testing would be great!
>

Even gmame can't collect the patches back from the ML, I don't want to try.

However, the discussion suggests that these are exactly the 4 months
old branch ending in a commit with the summary "Temporary commit"
which did not inspire confidence in me so I used the previous (also 4
moths old) branch.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-13 19:47           ` Michal Suchanek
@ 2011-04-14  4:50             ` Ian Kent
  2011-04-14  9:32               ` Michal Suchanek
  2011-04-14 19:14             ` David Howells
  1 sibling, 1 reply; 81+ messages in thread
From: Ian Kent @ 2011-04-14  4:50 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Ric Wheeler, linux-fsdevel, linux-kernel, David Howells,
	Jeff Moyer, miklos, Christoph Hellwig

On Wed, 2011-04-13 at 21:47 +0200, Michal Suchanek wrote:
> On 13 April 2011 21:11, Ric Wheeler <ricwheeler@gmail.com> wrote:
> > On 04/13/2011 02:58 PM, Michal Suchanek wrote:
> >>
> >> On 13 April 2011 19:26, Ric Wheeler<ricwheeler@gmail.com>  wrote:
> >>>
> >>> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
> >>>>
> >>>> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com>    wrote:
> >>>>>
> >>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> as some already know the Unionmount VFS union which has been in
> >>>>>> development for some years now is the only True Union (TM) that can be
> >>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
> >>>>>> of their own which you can surely find if you search the web or ask
> >>>>>> them directly).
> >>>>>>
> >>>>>> The current UnionMount version that can be found here:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
> >>>>>>
> >>>>>> works for me as good as aufs does. That is I can build a live CD using
> >>>>>> this unioning solution and it boots and runs without any apparent
> >>>>>> issues.
> >>>>>>
> >>>>>> There are probably many possible uses of the union which I did not
> >>>>>> test nor did I test long term stability of using the unioned
> >>>>>> filesystem. As far as ephemeral live systems go it works fine for me,
> >>>>>> though.
> >>>>>>
> >>>>>> The issue is that while the code is (nearly) finished it is not yet
> >>>>>> merged into mainline and as I am not familiar with the details of
> >>>>>> ever-changing Linux VFS layer forward-porting this code to current
> >>>>>> kernels is somewhat challenging.
> >>>>>>
> >>>>>> What is the plan with unionmount now?
> >>>>>>
> >>>>>> What is required  for it to be merged into mainline?
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> Michal
> >>>>>
> >>>>> Hi Michal,
> >>>>>
> >>>>> People are actively looking to see what union mount (or overlayfs)
> >>>>> solution
> >>>>> to pursue. Val has shifted her focus away from kernel hacking these
> >>>>> days,
> >>>>> but did refresh her patch set in the last month or so.
> >>>>
> >>>> I am not aware of such refreshed patch set, at least it is not
> >>>> published in her repo.
> >>>>
> >>> Val posted the refreshed patches with the title on March 22nd:
> >>>
> >>> http://lwn.net/Articles/435019/
> >>>
> >> That article references the same four months old repo which I
> >> mentioned at the start of the thread, only a slightly different
> >> branch.
> >>
> >> While it maybe useful for testing unionmount (which I already tried)
> >> it is not a patch against current kernel which could be used to build
> >> current live images.
> >>
> >> Thanks
> >>
> >> Michal
> >
> > She did post the patch series that same date in March - you can probably
> > grab the series from linux-fsdevel, look for this series:
> >
> > "[PATCH 00/74] Union mounts version something or other"
> >
> > Al Viro was planning on looking at her refreshed patches (he had reviewed
> > them with her in person), but that is not going to happen any time soon so
> > getting more eyes and testing would be great!
> >
> 
> Even gmame can't collect the patches back from the ML, I don't want to try.
> 
> However, the discussion suggests that these are exactly the 4 months
> old branch ending in a commit with the summary "Temporary commit"
> which did not inspire confidence in me so I used the previous (also 4
> moths old) branch.

Yes, that's the impression I have too.

I believe David was working to update the patches and his silence
indicates he is probably bogged down with other priority work. If that's
the case, and your still interested, I might be able to help updating
the series some time soon. I haven't reviewed any of Val's series posts
for a while now so I'd need to catch up with the current state of the
project first.

I guess the first thing is to find out if David has made any progress,
David?

As for the overlayfs from Miklos I haven't looked closely at it but
since Miklos hasn't replied so far I'm guessing there's still a way to
with that as well.

Ian



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-13 15:13       ` Michal Suchanek
@ 2011-04-14  8:38         ` Miklos Szeredi
  2011-04-14  9:48           ` Sedat Dilek
  2011-04-15 11:22           ` Michal Suchanek
  0 siblings, 2 replies; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-14  8:38 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel,
	David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig

On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
> On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote:
>> https://lkml.org/lkml/2011/3/22/222
>>
>
> Thanks for pointing out this announcement.
>
> However, neither this announcement nor the document
> Documentation/filesystems/overlayfs.txt included in the git tree
> details how to mount the filesystem. Without that it is kind of
> useless.

Oh, I'll fix that in the docs.  Thanks for pointing it out.

Here's how to mount:

  mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-14  4:50             ` Ian Kent
@ 2011-04-14  9:32               ` Michal Suchanek
  2011-04-14  9:40                 ` Miklos Szeredi
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-14  9:32 UTC (permalink / raw)
  To: Ian Kent
  Cc: Ric Wheeler, linux-fsdevel, linux-kernel, David Howells,
	Jeff Moyer, miklos, Christoph Hellwig

On 14 April 2011 06:50, Ian Kent <ikent@redhat.com> wrote:
> On Wed, 2011-04-13 at 21:47 +0200, Michal Suchanek wrote:
>> On 13 April 2011 21:11, Ric Wheeler <ricwheeler@gmail.com> wrote:
>> > On 04/13/2011 02:58 PM, Michal Suchanek wrote:
>> >>
>> >> On 13 April 2011 19:26, Ric Wheeler<ricwheeler@gmail.com>  wrote:
>> >>>
>> >>> On 04/12/2011 05:36 PM, Michal Suchanek wrote:
>> >>>>
>> >>>> On 12 April 2011 22:31, Ric Wheeler<ricwheeler@gmail.com>    wrote:
>> >>>>>
>> >>>>> On 04/12/2011 11:00 AM, Michal Suchanek wrote:
>> >>>>>>
>> >>>>>> Hello,
>> >>>>>>
>> >>>>>> as some already know the Unionmount VFS union which has been in
>> >>>>>> development for some years now is the only True Union (TM) that can be
>> >>>>>> accepted into the kernel mainline by the VFS maintainers (for reasons
>> >>>>>> of their own which you can surely find if you search the web or ask
>> >>>>>> them directly).
>> >>>>>>
>> >>>>>> The current UnionMount version that can be found here:
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> http://git.kernel.org/?p=linux/kernel/git/val/linux-2.6.git;a=shortlog;h=refs/heads/ext2_works
>> >>>>>>
>> >>>>>> works for me as good as aufs does. That is I can build a live CD using
>> >>>>>> this unioning solution and it boots and runs without any apparent
>> >>>>>> issues.
>> >>>>>>
>> >>>>>> There are probably many possible uses of the union which I did not
>> >>>>>> test nor did I test long term stability of using the unioned
>> >>>>>> filesystem. As far as ephemeral live systems go it works fine for me,
>> >>>>>> though.
>> >>>>>>
>> >>>>>> The issue is that while the code is (nearly) finished it is not yet
>> >>>>>> merged into mainline and as I am not familiar with the details of
>> >>>>>> ever-changing Linux VFS layer forward-porting this code to current
>> >>>>>> kernels is somewhat challenging.
>> >>>>>>
>> >>>>>> What is the plan with unionmount now?
>> >>>>>>
>> >>>>>> What is required  for it to be merged into mainline?
>> >>>>>>
>> >>>>>> Thanks
>> >>>>>>
>> >>>>>> Michal
>> >>>>>
>> >>>>> Hi Michal,
>> >>>>>
>> >>>>> People are actively looking to see what union mount (or overlayfs)
>> >>>>> solution
>> >>>>> to pursue. Val has shifted her focus away from kernel hacking these
>> >>>>> days,
>> >>>>> but did refresh her patch set in the last month or so.
>> >>>>
>> >>>> I am not aware of such refreshed patch set, at least it is not
>> >>>> published in her repo.
>> >>>>
>> >>> Val posted the refreshed patches with the title on March 22nd:
>> >>>
>> >>> http://lwn.net/Articles/435019/
>> >>>
>> >> That article references the same four months old repo which I
>> >> mentioned at the start of the thread, only a slightly different
>> >> branch.
>> >>
>> >> While it maybe useful for testing unionmount (which I already tried)
>> >> it is not a patch against current kernel which could be used to build
>> >> current live images.
>> >>
>> >> Thanks
>> >>
>> >> Michal
>> >
>> > She did post the patch series that same date in March - you can probably
>> > grab the series from linux-fsdevel, look for this series:
>> >
>> > "[PATCH 00/74] Union mounts version something or other"
>> >
>> > Al Viro was planning on looking at her refreshed patches (he had reviewed
>> > them with her in person), but that is not going to happen any time soon so
>> > getting more eyes and testing would be great!
>> >
>>
>> Even gmame can't collect the patches back from the ML, I don't want to try.
>>
>> However, the discussion suggests that these are exactly the 4 months
>> old branch ending in a commit with the summary "Temporary commit"
>> which did not inspire confidence in me so I used the previous (also 4
>> moths old) branch.
>
> Yes, that's the impression I have too.
>
> I believe David was working to update the patches and his silence
> indicates he is probably bogged down with other priority work. If that's
> the case, and your still interested, I might be able to help updating
> the series some time soon. I haven't reviewed any of Val's series posts
> for a while now so I'd need to catch up with the current state of the
> project first.

I guess overlayfs includes the better part of unionmount and achieves
similar level of functionality in much smaller code size and is
actively developed.

This might make it the best candidate for inclusion so far.

It does not (yet?) support NFS which is one of the options commonly
used with union solutions, though.

I personally don;t use NFS and have not tested overlayfs so far so I can't tell.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-14  9:32               ` Michal Suchanek
@ 2011-04-14  9:40                 ` Miklos Szeredi
  2011-04-14 13:21                   ` Ric Wheeler
  0 siblings, 1 reply; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-14  9:40 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Ian Kent, Ric Wheeler, linux-fsdevel, linux-kernel,
	David Howells, Jeff Moyer, Christoph Hellwig

On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek <hramrach@centrum.cz> wrote:
> I guess overlayfs includes the better part of unionmount and achieves
> similar level of functionality in much smaller code size and is
> actively developed.
>
> This might make it the best candidate for inclusion so far.
>
> It does not (yet?) support NFS which is one of the options commonly
> used with union solutions, though.

NFS is supported as a lower (read-only) layer, but not as an upper
(read-write) layer.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-14  8:38         ` Miklos Szeredi
@ 2011-04-14  9:48           ` Sedat Dilek
  2011-04-14  9:58             ` Miklos Szeredi
  2011-04-15 11:22           ` Michal Suchanek
  1 sibling, 1 reply; 81+ messages in thread
From: Sedat Dilek @ 2011-04-14  9:48 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Michal Suchanek, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig

On Thu, Apr 14, 2011 at 10:38 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>> On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote:
>>> https://lkml.org/lkml/2011/3/22/222
>>>
>>
>> Thanks for pointing out this announcement.
>>
>> However, neither this announcement nor the document
>> Documentation/filesystems/overlayfs.txt included in the git tree
>> details how to mount the filesystem. Without that it is kind of
>> useless.
>
> Oh, I'll fix that in the docs.  Thanks for pointing it out.
>
> Here's how to mount:
>
>  mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay
>
> Thanks,
> Miklos
>

Hi Miklos,

did you stop working on OverlayFS (no hear no read since 3 weeks :-))?
You made some post-v7-patchset patches, but did not publish a v8.

Also, Linus requested a finer-grained split of the big overlayfs.c
file like in the other FS.
What's the status on this?

Regards,
- Sedat -

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-14  9:48           ` Sedat Dilek
@ 2011-04-14  9:58             ` Miklos Szeredi
  0 siblings, 0 replies; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-14  9:58 UTC (permalink / raw)
  To: sedat.dilek
  Cc: Sedat Dilek, Michal Suchanek, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig

On Thu, Apr 14, 2011 at 11:48 AM, Sedat Dilek
<sedat.dilek@googlemail.com> wrote:
> did you stop working on OverlayFS (no hear no read since 3 weeks :-))?
> You made some post-v7-patchset patches, but did not publish a v8.
>
> Also, Linus requested a finer-grained split of the big overlayfs.c
> file like in the other FS.
> What's the status on this?

I'm working on that.  Will post patches shortly.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-14  9:40                 ` Miklos Szeredi
@ 2011-04-14 13:21                   ` Ric Wheeler
  2011-04-14 14:54                     ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Ric Wheeler @ 2011-04-14 13:21 UTC (permalink / raw)
  To: Miklos Szeredi, Christoph Hellwig
  Cc: Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel,
	David Howells, Jeff Moyer

On 04/14/2011 05:40 AM, Miklos Szeredi wrote:
> On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek<hramrach@centrum.cz>  wrote:
>> I guess overlayfs includes the better part of unionmount and achieves
>> similar level of functionality in much smaller code size and is
>> actively developed.
>>
>> This might make it the best candidate for inclusion so far.
>>
>> It does not (yet?) support NFS which is one of the options commonly
>> used with union solutions, though.
> NFS is supported as a lower (read-only) layer, but not as an upper
> (read-write) layer.
>
> Thanks,
> Miklos
I am not that concerned with the state of Val's repo, her intention was to hand 
off the project cleanly to others and have them drive the code (that hand off 
was the posting of the patch set). Several people (Ian, David Howells and Al 
Viro) had been involved with union mounts recently, so we do have reasonable 
candidates for a hand off.

One of the concerns with unionfs is the duplication of data. Union mounts avoids 
this with that implementation. That might make unionfs more of a burden for very 
large file systems, but probably not a concern for many use cases.

The union mount patch set is certainly considerably larger.

Christoph had expressed some concerns about locking that I think it would be 
good to discuss again as well.

Ric




^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-14 13:21                   ` Ric Wheeler
@ 2011-04-14 14:54                     ` Michal Suchanek
  2011-04-15 16:31                       ` Ric Wheeler
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-14 14:54 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Miklos Szeredi, Christoph Hellwig, Ian Kent, linux-fsdevel,
	linux-kernel, David Howells, Jeff Moyer

On 14 April 2011 15:21, Ric Wheeler <ricwheeler@gmail.com> wrote:
> On 04/14/2011 05:40 AM, Miklos Szeredi wrote:
>>
>> On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek<hramrach@centrum.cz>
>>  wrote:
>>>
>>> I guess overlayfs includes the better part of unionmount and achieves
>>> similar level of functionality in much smaller code size and is
>>> actively developed.
>>>
>>> This might make it the best candidate for inclusion so far.
>>>
>>> It does not (yet?) support NFS which is one of the options commonly
>>> used with union solutions, though.
>>
>> NFS is supported as a lower (read-only) layer, but not as an upper
>> (read-write) layer.
>>
>> Thanks,
>> Miklos
>
> I am not that concerned with the state of Val's repo, her intention was to
> hand off the project cleanly to others and have them drive the code (that
> hand off was the posting of the patch set). Several people (Ian, David
> Howells and Al Viro) had been involved with union mounts recently, so we do
> have reasonable candidates for a hand off.
>
> One of the concerns with unionfs is the duplication of data. Union mounts
> avoids this with that implementation. That might make unionfs more of a
> burden for very large file systems, but probably not a concern for many use
> cases.

Just to make things clear, what is a very large filesystem?

A heavily compressed DVD image?

Tens or hundreds  of gigabytes? Terabytes?

Hundreds, thousands or hundreds of thousands of inodes?

Or is testing required to determine at what size the performance
becomes unacceptable?

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-13 19:47           ` Michal Suchanek
  2011-04-14  4:50             ` Ian Kent
@ 2011-04-14 19:14             ` David Howells
  2011-06-29  9:39               ` Union mount and overlayfs bake off? Ric Wheeler
  2011-06-29 10:17               ` David Howells
  1 sibling, 2 replies; 81+ messages in thread
From: David Howells @ 2011-04-14 19:14 UTC (permalink / raw)
  To: Ian Kent
  Cc: dhowells, Michal Suchanek, Ric Wheeler, linux-fsdevel,
	linux-kernel, Jeff Moyer, miklos, Christoph Hellwig

Ian Kent <ikent@redhat.com> wrote:

> I believe David was working to update the patches and his silence
> indicates he is probably bogged down with other priority work. If that's
> the case, and your still interested, I might be able to help updating
> the series some time soon. I haven't reviewed any of Val's series posts
> for a while now so I'd need to catch up with the current state of the
> project first.
> 
> I guess the first thing is to find out if David has made any progress,
> David?

I started trying to forwardport them.  It's not trivial because of Nick's RCU
pathwalk patches.

However, I noticed that the collection of patches I was working on wasn't the
most recent in Val's GIT tree, so I stopped work on them whilst I found out
from Val which was the correct branch to use.  Her reply came just before I
jetted out to LSF, and I haven't got round to working on them again.

David

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-14  8:38         ` Miklos Szeredi
  2011-04-14  9:48           ` Sedat Dilek
@ 2011-04-15 11:22           ` Michal Suchanek
  2011-04-15 11:31             ` Miklos Szeredi
  1 sibling, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-15 11:22 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel,
	David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig

On 14 April 2011 10:38, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>> On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote:
>>> https://lkml.org/lkml/2011/3/22/222
>>>
>>
>> Thanks for pointing out this announcement.
>>
>> However, neither this announcement nor the document
>> Documentation/filesystems/overlayfs.txt included in the git tree
>> details how to mount the filesystem. Without that it is kind of
>> useless.
>
> Oh, I'll fix that in the docs.  Thanks for pointing it out.
>
> Here's how to mount:
>
>  mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay
>

OK, I built a small live CD with the v7 patch and I can boot it but I
get errors like

Cleaning up temporary files...
[    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
find: cannot delete `./motd': Operation not supported

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 11:22           ` Michal Suchanek
@ 2011-04-15 11:31             ` Miklos Szeredi
  2011-04-15 11:51               ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-15 11:31 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel,
	David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig

On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
> On 14 April 2011 10:38, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>>> On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote:
>>>> https://lkml.org/lkml/2011/3/22/222
>>>>
>>>
>>> Thanks for pointing out this announcement.
>>>
>>> However, neither this announcement nor the document
>>> Documentation/filesystems/overlayfs.txt included in the git tree
>>> details how to mount the filesystem. Without that it is kind of
>>> useless.
>>
>> Oh, I'll fix that in the docs.  Thanks for pointing it out.
>>
>> Here's how to mount:
>>
>>  mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay
>>
>
> OK, I built a small live CD with the v7 patch and I can boot it but I
> get errors like
>
> Cleaning up temporary files...
> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
> find: cannot delete `./motd': Operation not supported

What is the upper filesystem type?  Is xattr support enabled?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 11:31             ` Miklos Szeredi
@ 2011-04-15 11:51               ` Michal Suchanek
  2011-04-15 12:29                 ` Miklos Szeredi
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-15 11:51 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel,
	David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig

On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>> On 14 April 2011 10:38, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Wed, Apr 13, 2011 at 5:13 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>>>> On 13 April 2011 16:18, Jiri Kosina <jkosina@suse.cz> wrote:
>>>>> https://lkml.org/lkml/2011/3/22/222
>>>>>
>>>>
>>>> Thanks for pointing out this announcement.
>>>>
>>>> However, neither this announcement nor the document
>>>> Documentation/filesystems/overlayfs.txt included in the git tree
>>>> details how to mount the filesystem. Without that it is kind of
>>>> useless.
>>>
>>> Oh, I'll fix that in the docs.  Thanks for pointing it out.
>>>
>>> Here's how to mount:
>>>
>>>  mount -t overlayfs overlayfs -olowerdir=/lower -oupperdir=/upper /overlay
>>>
>>
>> OK, I built a small live CD with the v7 patch and I can boot it but I
>> get errors like
>>
>> Cleaning up temporary files...
>> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
>> find: cannot delete `./motd': Operation not supported
>
> What is the upper filesystem type?  Is xattr support enabled?
>

The upper filesystem is tmpfs and there is not option regarding XATTR
in the config.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 11:51               ` Michal Suchanek
@ 2011-04-15 12:29                 ` Miklos Szeredi
  2011-04-15 12:34                     ` Michal Suchanek
                                     ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-15 12:29 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel,
	David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig

On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
> On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>>>
>>> Cleaning up temporary files...
>>> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
>>> find: cannot delete `./motd': Operation not supported
>>
>> What is the upper filesystem type?  Is xattr support enabled?
>>
>
> The upper filesystem is tmpfs and there is not option regarding XATTR
> in the config.

Apparently tmpfs does not support generic xattr.  I understand why
tmpfs is an attractive choice for an upper filesystem, so this should
be addressed.

I see two options here:

1) implement generic xattr in tmpfs
2) take whiteout/opaque support from union mounts and use that

Both have advantages and disadvantages.

Any thoughts?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 12:29                 ` Miklos Szeredi
@ 2011-04-15 12:34                     ` Michal Suchanek
  2011-04-15 21:48                   ` Hugh Dickins
  2011-04-15 22:18                   ` Andreas Dilger
  2 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-15 12:34 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel,
	David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig

On 15 April 2011 14:29, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>> On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>>>>
>>>> Cleaning up temporary files...
>>>> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
>>>> find: cannot delete `./motd': Operation not supported
>>>
>>> What is the upper filesystem type?  Is xattr support enabled?
>>>
>>
>> The upper filesystem is tmpfs and there is not option regarding XATTR
>> in the config.
>
> Apparently tmpfs does not support generic xattr.  I understand why
> tmpfs is an attractive choice for an upper filesystem, so this should
> be addressed.
>
> I see two options here:
>
> 1) implement generic xattr in tmpfs

IMHO this is a general feature that keeps overlayfs in itself clean
and small and can benefit other uses of tmpfs.

> 2) take whiteout/opaque support from union mounts and use that

How far from importing full unionmount is that?

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
@ 2011-04-15 12:34                     ` Michal Suchanek
  0 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-15 12:34 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel,
	David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig

On 15 April 2011 14:29, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>> On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>>>>
>>>> Cleaning up temporary files...
>>>> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
>>>> find: cannot delete `./motd': Operation not supported
>>>
>>> What is the upper filesystem type?  Is xattr support enabled?
>>>
>>
>> The upper filesystem is tmpfs and there is not option regarding XATTR
>> in the config.
>
> Apparently tmpfs does not support generic xattr.  I understand why
> tmpfs is an attractive choice for an upper filesystem, so this should
> be addressed.
>
> I see two options here:
>
> 1) implement generic xattr in tmpfs

IMHO this is a general feature that keeps overlayfs in itself clean
and small and can benefit other uses of tmpfs.

> 2) take whiteout/opaque support from union mounts and use that

How far from importing full unionmount is that?

Thanks

Michal
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 12:34                     ` Michal Suchanek
  (?)
@ 2011-04-15 12:48                     ` Miklos Szeredi
  -1 siblings, 0 replies; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-15 12:48 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Jiri Kosina, Ric Wheeler, linux-fsdevel, linux-kernel,
	David Howells, Ian Kent, Jeff Moyer, Christoph Hellwig

On Fri, Apr 15, 2011 at 2:34 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
> On 15 April 2011 14:29, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> 2) take whiteout/opaque support from union mounts and use that
>
> How far from importing full unionmount is that?

Whiteout/opaque support is quite separate from the rest of union
mounts, and could be reusable for overlayfs.

There are several reasons why I didn't want to go that way with:

- lots of filesystems would have to be updated
- it introduces incompatibility in the filesystem format, which can be
a real pain (not for tmpfs, obviously, since tmpfs doesn't have a
persistent backing)

There *are* advantages to doing whiteouts in the filesystem, for
example it makes file removal atomic.  But atomicity is something that
needs to be addressed in lots of other places (e.g. copy up) not just
during whiteout, and there are other ways to do that than push the
responsibility into each and every filesystem.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-14 14:54                     ` Michal Suchanek
@ 2011-04-15 16:31                       ` Ric Wheeler
  0 siblings, 0 replies; 81+ messages in thread
From: Ric Wheeler @ 2011-04-15 16:31 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Miklos Szeredi, Christoph Hellwig, Ian Kent, linux-fsdevel,
	linux-kernel, David Howells, Jeff Moyer

On 04/14/2011 10:54 AM, Michal Suchanek wrote:
> On 14 April 2011 15:21, Ric Wheeler<ricwheeler@gmail.com>  wrote:
>> On 04/14/2011 05:40 AM, Miklos Szeredi wrote:
>>> On Thu, Apr 14, 2011 at 11:32 AM, Michal Suchanek<hramrach@centrum.cz>
>>>   wrote:
>>>> I guess overlayfs includes the better part of unionmount and achieves
>>>> similar level of functionality in much smaller code size and is
>>>> actively developed.
>>>>
>>>> This might make it the best candidate for inclusion so far.
>>>>
>>>> It does not (yet?) support NFS which is one of the options commonly
>>>> used with union solutions, though.
>>> NFS is supported as a lower (read-only) layer, but not as an upper
>>> (read-write) layer.
>>>
>>> Thanks,
>>> Miklos
>> I am not that concerned with the state of Val's repo, her intention was to
>> hand off the project cleanly to others and have them drive the code (that
>> hand off was the posting of the patch set). Several people (Ian, David
>> Howells and Al Viro) had been involved with union mounts recently, so we do
>> have reasonable candidates for a hand off.
>>
>> One of the concerns with unionfs is the duplication of data. Union mounts
>> avoids this with that implementation. That might make unionfs more of a
>> burden for very large file systems, but probably not a concern for many use
>> cases.
> Just to make things clear, what is a very large filesystem?
>
> A heavily compressed DVD image?
>
> Tens or hundreds  of gigabytes? Terabytes?
>
> Hundreds, thousands or hundreds of thousands of inodes?
>
> Or is testing required to determine at what size the performance
> becomes unacceptable?
>
> Thanks
>
> Michal

Very large in the number of inodes more so than fs size...

Ric


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 12:29                 ` Miklos Szeredi
  2011-04-15 12:34                     ` Michal Suchanek
@ 2011-04-15 21:48                   ` Hugh Dickins
  2011-04-15 22:18                   ` Andreas Dilger
  2 siblings, 0 replies; 81+ messages in thread
From: Hugh Dickins @ 2011-04-15 21:48 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Michal Suchanek, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Eric Paris, James Morris

On Fri, Apr 15, 2011 at 5:29 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Apr 15, 2011 at 1:51 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>> On 15 April 2011 13:31, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> On Fri, Apr 15, 2011 at 1:22 PM, Michal Suchanek <hramrach@centrum.cz> wrote:
>>>>
>>>> Cleaning up temporary files...
>>>> [    nn.nnnnnn] overlayfs: ERROR - failed to whiteout 'motd'
>>>> find: cannot delete `./motd': Operation not supported
>>>
>>> What is the upper filesystem type?  Is xattr support enabled?
>>>
>>
>> The upper filesystem is tmpfs and there is not option regarding XATTR
>> in the config.
>
> Apparently tmpfs does not support generic xattr.  I understand why
> tmpfs is an attractive choice for an upper filesystem, so this should
> be addressed.
>
> I see two options here:
>
> 1) implement generic xattr in tmpfs
> 2) take whiteout/opaque support from union mounts and use that
>
> Both have advantages and disadvantages.
>
> Any thoughts?

Please get together with Eric: he's having trouble whipping up
enthusiasm for his tmpfs xattr patch, he and I would both appreciate
your support (and if I've persuaded him to leave out a part of it that
you would need, join forces to call me an idiot).  Mainly I'm hoping
to hear from hch and jmorris on it, being xattr-illiterate myself.

https://lkml.org/lkml/2011/3/29/302

Hugh

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 12:29                 ` Miklos Szeredi
  2011-04-15 12:34                     ` Michal Suchanek
  2011-04-15 21:48                   ` Hugh Dickins
@ 2011-04-15 22:18                   ` Andreas Dilger
  2011-04-18 13:31                       ` Michal Suchanek
                                       ` (2 more replies)
  2 siblings, 3 replies; 81+ messages in thread
From: Andreas Dilger @ 2011-04-15 22:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Michal Suchanek, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig

On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
> Apparently tmpfs does not support generic xattr.  I understand why
> tmpfs is an attractive choice for an upper filesystem, so this should
> be addressed.
> 
> I see two options here:
> 
> 1) implement generic xattr in tmpfs

There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:

From: Eric Paris <eparis@redhat.com>
Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
Date: March 29, 2011 12:56:49 PM MDT

Cheers, Andreas






^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 22:18                   ` Andreas Dilger
@ 2011-04-18 13:31                       ` Michal Suchanek
  2011-04-18 13:34                       ` Michal Suchanek
  2011-04-18 13:37                       ` Michal Suchanek
  2 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-18 13:31 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig

On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
>> Apparently tmpfs does not support generic xattr.  I understand why
>> tmpfs is an attractive choice for an upper filesystem, so this should
>> be addressed.
>>
>> I see two options here:
>>
>> 1) implement generic xattr in tmpfs
>
> There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:
>
> From: Eric Paris <eparis@redhat.com>
> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
> Date: March 29, 2011 12:56:49 PM MDT
>
> Cheers, Andreas

Applying this patch is not sufficient. Apparently more xattrs are
needed but adding them on top of this patch should be easy.

The ones mentioned in the overlayfs doc are

trusted.overlay.whiteout
trusted.overlay.opaque

The patch implements security.*

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
@ 2011-04-18 13:31                       ` Michal Suchanek
  0 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-18 13:31 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig

On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
>> Apparently tmpfs does not support generic xattr.  I understand why
>> tmpfs is an attractive choice for an upper filesystem, so this should
>> be addressed.
>>
>> I see two options here:
>>
>> 1) implement generic xattr in tmpfs
>
> There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:
>
> From: Eric Paris <eparis@redhat.com>
> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
> Date: March 29, 2011 12:56:49 PM MDT
>
> Cheers, Andreas

Applying this patch is not sufficient. Apparently more xattrs are
needed but adding them on top of this patch should be easy.

The ones mentioned in the overlayfs doc are

trusted.overlay.whiteout
trusted.overlay.opaque

The patch implements security.*

Thanks

Michal
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 22:18                   ` Andreas Dilger
@ 2011-04-18 13:34                       ` Michal Suchanek
  2011-04-18 13:34                       ` Michal Suchanek
  2011-04-18 13:37                       ` Michal Suchanek
  2 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-18 13:34 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig

On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
>> Apparently tmpfs does not support generic xattr.  I understand why
>> tmpfs is an attractive choice for an upper filesystem, so this should
>> be addressed.
>>
>> I see two options here:
>>
>> 1) implement generic xattr in tmpfs
>
> There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:
>
> From: Eric Paris <eparis@redhat.com>
> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
> Date: March 29, 2011 12:56:49 PM MDT
>
> Cheers, Andreas

Applying this patch is not sufficient. Apparently more xattrs are
needed but adding them on top of this patch should be easy.

The ones mentioned in the overlayfs doc are

trusted.overlay.whiteout
trusted.overlay.opaque

The patch implements security.*

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
@ 2011-04-18 13:34                       ` Michal Suchanek
  0 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-18 13:34 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig

On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
>> Apparently tmpfs does not support generic xattr.  I understand why
>> tmpfs is an attractive choice for an upper filesystem, so this should
>> be addressed.
>>
>> I see two options here:
>>
>> 1) implement generic xattr in tmpfs
>
> There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:
>
> From: Eric Paris <eparis@redhat.com>
> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
> Date: March 29, 2011 12:56:49 PM MDT
>
> Cheers, Andreas

Applying this patch is not sufficient. Apparently more xattrs are
needed but adding them on top of this patch should be easy.

The ones mentioned in the overlayfs doc are

trusted.overlay.whiteout
trusted.overlay.opaque

The patch implements security.*

Thanks

Michal
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
  2011-04-15 22:18                   ` Andreas Dilger
@ 2011-04-18 13:37                       ` Michal Suchanek
  2011-04-18 13:34                       ` Michal Suchanek
  2011-04-18 13:37                       ` Michal Suchanek
  2 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-18 13:37 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig

On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
>> Apparently tmpfs does not support generic xattr.  I understand why
>> tmpfs is an attractive choice for an upper filesystem, so this should
>> be addressed.
>>
>> I see two options here:
>>
>> 1) implement generic xattr in tmpfs
>
> There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:
>
> From: Eric Paris <eparis@redhat.com>
> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
> Date: March 29, 2011 12:56:49 PM MDT
>
> Cheers, Andreas

Applying this patch is not sufficient. Apparently more xattrs are
needed but adding them on top of this patch should be easy.

The ones mentioned in the overlayfs doc are

trusted.overlay.whiteout
trusted.overlay.opaque

The patch implements security.*

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Unionmount status?
@ 2011-04-18 13:37                       ` Michal Suchanek
  0 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-18 13:37 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Miklos Szeredi, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig

On 16 April 2011 00:18, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-04-15, at 6:29 AM, Miklos Szeredi wrote:
>> Apparently tmpfs does not support generic xattr.  I understand why
>> tmpfs is an attractive choice for an upper filesystem, so this should
>> be addressed.
>>
>> I see two options here:
>>
>> 1) implement generic xattr in tmpfs
>
> There was a patch posted recently to add xattr support to tmpfs, so that it can use security labels:
>
> From: Eric Paris <eparis@redhat.com>
> Subject: [PATCH] tmpfs: implement xattr support for the entire security namespace
> Date: March 29, 2011 12:56:49 PM MDT
>
> Cheers, Andreas

Applying this patch is not sufficient. Apparently more xattrs are
needed but adding them on top of this patch should be easy.

The ones mentioned in the overlayfs doc are

trusted.overlay.whiteout
trusted.overlay.opaque

The patch implements security.*

Thanks

Michal
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH] tmpfs: implement generic xattr support
  2011-04-18 13:31                       ` Michal Suchanek
  (?)
@ 2011-04-19 20:04                       ` Miklos Szeredi
  2011-04-20  2:18                           ` Phillip Lougher
                                           ` (2 more replies)
  -1 siblings, 3 replies; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-19 20:04 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Andreas Dilger, Jiri Kosina, Ric Wheeler, linux-fsdevel,
	linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Hugh Dickins, Eric Paris

Michal Suchanek <hramrach@centrum.cz> writes:
> Applying this patch is not sufficient. Apparently more xattrs are
> needed but adding them on top of this patch should be easy.
>
> The ones mentioned in the overlayfs doc are
>
> trusted.overlay.whiteout
> trusted.overlay.opaque
>
> The patch implements security.*

Here's an updated patch.  It changes a number of things:

 - it implements shmem specific xattr ops instead of using the generic_*
   functions.  Which means that there's no back and forth between the
   VFS and the filesystem.  I basically copied the btrfs way of doing
   things.

 - adds a new config option: CONFIG_TMPFS_XATTR and makes
   CONFIG_TMPFS_POSIX_ACL depend on this.  This way xattr support can be
   turned on without also adding ACL support.

 - now supports trusted.* namespace needed by overlayfs in addition to
   security.*.  Doesn't yet support user.* since that needs more thought
   wrt. kernel memory use.

 - supports xattrs on symlinks, again needed by overlayfs

Hugh, Eric, does this look acceptable?

Thanks,
Miklos

---
From: Eric Paris <eparis@redhat.com>
Subject: tmpfs: implement generic xattr support

This patch implements generic xattrs for tmpfs filesystems.  The feodra
project, while trying to replace suid apps with file capabilities,
realized that tmpfs, which is used on the build systems, does not
support file capabilities and thus cannot be used to build packages
which use file capabilities.  Xattrs are also needed for overlayfs.

The xattr interface is a bit, odd.  If a filesystem does not implement any
{get,set,list}xattr functions the VFS will call into some random LSM hooks and
the running LSM can then implement some method for handling xattrs.  SELinux
for example provides a method to support security.selinux but no other
security.* xattrs.

As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
xattr handler routines specifically to handle acls.  Because of this tmpfs
would loose the VFS/LSM helpers to support the running LSM.  To make up for
that tmpfs had stub functions that did nothing but call into the LSM hooks
which implement the helpers.

This new patch does not use the LSM fallback functions and instead
just implements a native get/set/list xattr feature for the full
security.* and trusted.* namespace like a normal filesystem.  This
means that tmpfs can now support both security.selinux and
security.capability, which was not previously possible.

The basic implementation is that I attach a:

struct shmem_xattr {
	struct list_head list; /* anchored by shmem_inode_info->xattr_list */
	char *name;
	size_t size;
	char value[0];
};

Into the struct shmem_inode_info for each xattr that is set.  This
implementation could easily support the user.* namespace as well,
except some care needs to be taken to prevent large amounts of
unswappable memory being allocated for unprivileged users.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/Kconfig               |   18 ++
 include/linux/shmem_fs.h |    1 
 mm/shmem.c               |  302 +++++++++++++++++++++++++++++++++++++++--------
 3 files changed, 273 insertions(+), 48 deletions(-)

Index: linux-2.6/fs/Kconfig
===================================================================
--- linux-2.6.orig/fs/Kconfig	2011-04-19 21:09:33.000000000 +0200
+++ linux-2.6/fs/Kconfig	2011-04-19 21:09:35.000000000 +0200
@@ -121,9 +121,25 @@ config TMPFS
 
 	  See <file:Documentation/filesystems/tmpfs.txt> for details.
 
+config TMPFS_XATTR
+	bool "Tmpfs extended attributes"
+	depends on TMPFS
+	default y
+	help
+	  Extended attributes are name:value pairs associated with inodes by
+	  the kernel or by users (see the attr(5) manual page, or visit
+	  <http://acl.bestbits.at/> for details).
+
+	  Currently this enables support for the trusted.* and
+	  security.* namespaces.
+
+	  If unsure, say N.
+
+	  You need this for POSIX ACL support on tmpfs.
+
 config TMPFS_POSIX_ACL
 	bool "Tmpfs POSIX Access Control Lists"
-	depends on TMPFS
+	depends on TMPFS_XATTR
 	select GENERIC_ACL
 	help
 	  POSIX Access Control Lists (ACLs) support permissions for users and
Index: linux-2.6/include/linux/shmem_fs.h
===================================================================
--- linux-2.6.orig/include/linux/shmem_fs.h	2011-04-19 21:09:25.000000000 +0200
+++ linux-2.6/include/linux/shmem_fs.h	2011-04-19 21:09:35.000000000 +0200
@@ -19,6 +19,7 @@ struct shmem_inode_info {
 	struct page		*i_indirect;	/* top indirect blocks page */
 	swp_entry_t		i_direct[SHMEM_NR_DIRECT]; /* first blocks */
 	struct list_head	swaplist;	/* chain of maybes on swap */
+	struct list_head	xattr_list;	/* list of shmem_xattr */
 	struct inode		vfs_inode;
 };
 
Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c	2011-04-19 21:09:25.000000000 +0200
+++ linux-2.6/mm/shmem.c	2011-04-19 21:09:35.000000000 +0200
@@ -99,6 +99,13 @@ static struct vfsmount *shm_mnt;
 /* Pretend that each entry is of this size in directory's i_size */
 #define BOGO_DIRENT_SIZE 20
 
+struct shmem_xattr {
+	struct list_head list;	/* anchored by shmem_inode_info->xattr_list */
+	char *name;		/* xattr name */
+	size_t size;
+	char value[0];
+};
+
 /* Flag allocation requirements to shmem_getpage and shmem_swp_alloc */
 enum sgp_type {
 	SGP_READ,	/* don't exceed i_size, don't allocate page */
@@ -822,6 +829,7 @@ static int shmem_notify_change(struct de
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct shmem_xattr *xattr, *nxattr;
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		truncate_inode_pages(inode->i_mapping, 0);
@@ -834,6 +842,11 @@ static void shmem_evict_inode(struct ino
 			mutex_unlock(&shmem_swaplist_mutex);
 		}
 	}
+
+	list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
+		kfree(xattr->name);
+		kfree(xattr);
+	}
 	BUG_ON(inode->i_blocks);
 	shmem_free_inode(inode->i_sb);
 	end_writeback(inode);
@@ -1597,6 +1610,7 @@ static struct inode *shmem_get_inode(str
 		spin_lock_init(&info->lock);
 		info->flags = flags & VM_NORESERVE;
 		INIT_LIST_HEAD(&info->swaplist);
+		INIT_LIST_HEAD(&info->xattr_list);
 		cache_no_acl(inode);
 
 		switch (mode & S_IFMT) {
@@ -2048,62 +2062,225 @@ static void shmem_put_link(struct dentry
 	}
 }
 
-static const struct inode_operations shmem_symlink_inline_operations = {
-	.readlink	= generic_readlink,
-	.follow_link	= shmem_follow_link_inline,
-};
-
-static const struct inode_operations shmem_symlink_inode_operations = {
-	.readlink	= generic_readlink,
-	.follow_link	= shmem_follow_link,
-	.put_link	= shmem_put_link,
-};
-
-#ifdef CONFIG_TMPFS_POSIX_ACL
+#ifdef CONFIG_TMPFS_XATTR
 /*
- * Superblocks without xattr inode operations will get security.* xattr
- * support from the VFS "for free". As soon as we have any other xattrs
+ * Superblocks without xattr inode operations may get some security.* xattr
+ * support from the LSM "for free". As soon as we have any other xattrs
  * like ACLs, we also need to implement the security.* handlers at
  * filesystem level, though.
  */
 
-static size_t shmem_xattr_security_list(struct dentry *dentry, char *list,
-					size_t list_len, const char *name,
-					size_t name_len, int handler_flags)
+static int shmem_xattr_get(struct dentry *dentry, const char *name,
+			   void *buffer, size_t size)
 {
-	return security_inode_listsecurity(dentry->d_inode, list, list_len);
-}
+	struct shmem_inode_info *info;
+	struct shmem_xattr *xattr;
+	int ret = -ENODATA;
 
-static int shmem_xattr_security_get(struct dentry *dentry, const char *name,
-		void *buffer, size_t size, int handler_flags)
-{
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return xattr_getsecurity(dentry->d_inode, name, buffer, size);
+	info = SHMEM_I(dentry->d_inode);
+
+	spin_lock(&dentry->d_inode->i_lock);
+	list_for_each_entry(xattr, &info->xattr_list, list) {
+		if (strcmp(name, xattr->name))
+			continue;
+
+		ret = xattr->size;
+		if (buffer) {
+			if (size < xattr->size)
+				ret = -ERANGE;
+			else
+				memcpy(buffer, xattr->value, xattr->size);
+		}
+		break;
+	}
+	spin_unlock(&dentry->d_inode->i_lock);
+	return ret;
 }
 
-static int shmem_xattr_security_set(struct dentry *dentry, const char *name,
-		const void *value, size_t size, int flags, int handler_flags)
+static int shmem_xattr_set(struct dentry *dentry, const char *name,
+			   const void *value, size_t size, int flags)
 {
-	if (strcmp(name, "") == 0)
-		return -EINVAL;
-	return security_inode_setsecurity(dentry->d_inode, name, value,
-					  size, flags);
+	struct inode *inode = dentry->d_inode;
+	struct shmem_inode_info *info = SHMEM_I(inode);
+	struct shmem_xattr *xattr;
+	struct shmem_xattr *new_xattr = NULL;
+	size_t len;
+	int err = 0;
+
+	/* value == NULL means remove */
+	if (value) {
+		/* wrap around? */
+		len = sizeof(*new_xattr) + size;
+		if (len <= sizeof(*new_xattr))
+			return -ENOMEM;
+
+		new_xattr = kmalloc(len, GFP_NOFS);
+		if (!new_xattr)
+		return -ENOMEM;
+
+		new_xattr->name = kstrdup(name, GFP_NOFS);
+		if (!new_xattr->name) {
+			kfree(new_xattr);
+			return -ENOMEM;
+		}
+
+		new_xattr->size = size;
+		memcpy(new_xattr->value, value, size);
+	}
+
+	spin_lock(&inode->i_lock);
+	list_for_each_entry(xattr, &info->xattr_list, list) {
+		if (!strcmp(name, xattr->name)) {
+			if (flags & XATTR_CREATE) {
+				xattr = new_xattr;
+				err = -EEXIST;
+			} else if (new_xattr) {
+				list_replace(&xattr->list, &new_xattr->list);
+			} else {
+				list_del(&xattr->list);
+			}
+			goto out;
+		}
+	}
+	if (flags & XATTR_REPLACE) {
+		xattr = new_xattr;
+		err = -ENODATA;
+	} else {
+		list_add(&new_xattr->list, &info->xattr_list);
+		xattr = NULL;
+	}
+out:
+	spin_unlock(&inode->i_lock);
+	if (xattr)
+		kfree(xattr->name);
+	kfree(xattr);
+	return 0;
 }
 
-static const struct xattr_handler shmem_xattr_security_handler = {
-	.prefix = XATTR_SECURITY_PREFIX,
-	.list   = shmem_xattr_security_list,
-	.get    = shmem_xattr_security_get,
-	.set    = shmem_xattr_security_set,
-};
 
 static const struct xattr_handler *shmem_xattr_handlers[] = {
+#ifdef CONFIG_TMPFS_POSIX_ACL
 	&generic_acl_access_handler,
 	&generic_acl_default_handler,
-	&shmem_xattr_security_handler,
+#endif
 	NULL
 };
+
+static int shmem_xattr_validate(const char *name)
+{
+	struct { const char *prefix; size_t len; } arr[] = {
+		{ XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN },
+		{ XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN }};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(arr); i++) {
+		size_t preflen = arr[i].len;
+		if (strncmp(name, arr[i].prefix, preflen) == 0) {
+			if (!name[preflen])
+				return -EINVAL;
+			return 0;
+		}
+	}
+	return -EOPNOTSUPP;
+}
+
+static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
+			      void *buffer, size_t size)
+{
+	int err;
+
+	/*
+	 * If this is a request for a synthetic attribute in the system.*
+	 * namespace use the generic infrastructure to resolve a handler
+	 * for it via sb->s_xattr.
+	 */
+	if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
+		return generic_getxattr(dentry, name, buffer, size);
+
+	err = shmem_xattr_validate(name);
+	if (err)
+		return err;
+
+	return shmem_xattr_get(dentry, name, buffer, size);
+}
+
+static int shmem_setxattr(struct dentry *dentry, const char *name,
+			  const void *value, size_t size, int flags)
+{
+	int err;
+
+	/*
+	 * If this is a request for a synthetic attribute in the system.*
+	 * namespace use the generic infrastructure to resolve a handler
+	 * for it via sb->s_xattr.
+	 */
+	if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
+		return generic_setxattr(dentry, name, value, size, flags);
+
+	err = shmem_xattr_validate(name);
+	if (err)
+		return err;
+
+	if (size == 0)
+		value = "";  /* empty EA, do not remove */
+
+	return shmem_xattr_set(dentry, name, value, size, flags);
+
+}
+
+static int shmem_removexattr(struct dentry *dentry, const char *name)
+{
+	int err;
+
+	/*
+	 * If this is a request for a synthetic attribute in the system.*
+	 * namespace use the generic infrastructure to resolve a handler
+	 * for it via sb->s_xattr.
+	 */
+	if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
+		return generic_removexattr(dentry, name);
+
+	err = shmem_xattr_validate(name);
+	if (err)
+		return err;
+
+	return shmem_xattr_set(dentry, name, NULL, 0, XATTR_REPLACE);
+}
+
+static bool xattr_is_trusted(const char *name)
+{
+	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+}
+
+static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
+{
+	bool trusted = capable(CAP_SYS_ADMIN);
+	struct shmem_xattr *xattr;
+	struct shmem_inode_info *info;
+	size_t used = 0;
+
+	info = SHMEM_I(dentry->d_inode);
+
+	spin_lock(&dentry->d_inode->i_lock);
+	list_for_each_entry(xattr, &info->xattr_list, list) {
+		/* skip "trusted." attributes for unprivileged callers */
+		if (!trusted && xattr_is_trusted(xattr->name))
+			continue;
+
+		used += strlen(xattr->name) + 1;
+		if (buffer) {
+			if (size < used) {
+				used = -ERANGE;
+				break;
+			}
+			strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
+			buffer += strlen(xattr->name) + 1;
+		}
+	}
+	spin_unlock(&dentry->d_inode->i_lock);
+
+	return used;
+}
 #endif
 
 static struct dentry *shmem_get_parent(struct dentry *child)
@@ -2384,8 +2561,10 @@ int shmem_fill_super(struct super_block
 	sb->s_magic = TMPFS_MAGIC;
 	sb->s_op = &shmem_ops;
 	sb->s_time_gran = 1;
-#ifdef CONFIG_TMPFS_POSIX_ACL
+#ifdef CONFIG_TMPFS_XATTR
 	sb->s_xattr = shmem_xattr_handlers;
+#endif
+#ifdef CONFIG_TMPFS_POSIX_ACL
 	sb->s_flags |= MS_POSIXACL;
 #endif
 
@@ -2483,16 +2662,41 @@ static const struct file_operations shme
 static const struct inode_operations shmem_inode_operations = {
 	.setattr	= shmem_notify_change,
 	.truncate_range	= shmem_truncate_range,
+#ifdef CONFIG_TMPFS_XATTR
+	.setxattr	= shmem_setxattr,
+	.getxattr	= shmem_getxattr,
+	.listxattr	= shmem_listxattr,
+	.removexattr	= shmem_removexattr,
+#endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= generic_listxattr,
-	.removexattr	= generic_removexattr,
 	.check_acl	= generic_check_acl,
 #endif
 
 };
 
+static const struct inode_operations shmem_symlink_inline_operations = {
+	.readlink	= generic_readlink,
+	.follow_link	= shmem_follow_link_inline,
+#ifdef CONFIG_TMPFS_XATTR
+	.setxattr	= shmem_setxattr,
+	.getxattr	= shmem_getxattr,
+	.listxattr	= shmem_listxattr,
+	.removexattr	= shmem_removexattr,
+#endif
+};
+
+static const struct inode_operations shmem_symlink_inode_operations = {
+	.readlink	= generic_readlink,
+	.follow_link	= shmem_follow_link,
+	.put_link	= shmem_put_link,
+#ifdef CONFIG_TMPFS_XATTR
+	.setxattr	= shmem_setxattr,
+	.getxattr	= shmem_getxattr,
+	.listxattr	= shmem_listxattr,
+	.removexattr	= shmem_removexattr,
+#endif
+};
+
 static const struct inode_operations shmem_dir_inode_operations = {
 #ifdef CONFIG_TMPFS
 	.create		= shmem_create,
@@ -2505,23 +2709,27 @@ static const struct inode_operations shm
 	.mknod		= shmem_mknod,
 	.rename		= shmem_rename,
 #endif
-#ifdef CONFIG_TMPFS_POSIX_ACL
-	.setattr	= shmem_notify_change,
+#ifdef CONFIG_TMPFS_XATTR
 	.setxattr	= generic_setxattr,
 	.getxattr	= generic_getxattr,
 	.listxattr	= generic_listxattr,
 	.removexattr	= generic_removexattr,
+#endif
+#ifdef CONFIG_TMPFS_POSIX_ACL
+	.setattr	= shmem_notify_change,
 	.check_acl	= generic_check_acl,
 #endif
 };
 
 static const struct inode_operations shmem_special_inode_operations = {
-#ifdef CONFIG_TMPFS_POSIX_ACL
-	.setattr	= shmem_notify_change,
+#ifdef CONFIG_TMPFS_XATTR
 	.setxattr	= generic_setxattr,
 	.getxattr	= generic_getxattr,
 	.listxattr	= generic_listxattr,
 	.removexattr	= generic_removexattr,
+#endif
+#ifdef CONFIG_TMPFS_POSIX_ACL
+	.setattr	= shmem_notify_change,
 	.check_acl	= generic_check_acl,
 #endif
 };

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-19 20:04                       ` [PATCH] tmpfs: implement generic xattr support Miklos Szeredi
@ 2011-04-20  2:18                           ` Phillip Lougher
  2011-04-20 16:00                         ` Serge E. Hallyn
  2011-05-12  4:20                         ` Hugh Dickins
  2 siblings, 0 replies; 81+ messages in thread
From: Phillip Lougher @ 2011-04-20  2:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Hugh Dickins, Eric Paris

On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:

> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)

> +       spin_lock(&dentry->d_inode->i_lock);
> +       list_for_each_entry(xattr, &info->xattr_list, list) {
> +               /* skip "trusted." attributes for unprivileged callers */
> +               if (!trusted && xattr_is_trusted(xattr->name))
> +                       continue;
> +
> +               used += strlen(xattr->name) + 1;
> +               if (buffer) {
> +                       if (size < used) {
> +                               used = -ERANGE;
> +                               break;
> +                       }
> +                       strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
>+                        buffer += strlen(xattr->name) + 1;

Why are you doing a strncpy here?  strcpy() isn't going to copy more
than strlen(xattr->name) + 1 bytes, and you know buffer is large
enough to hold that because of the previous if (size < used) check?

If you assigned the first strlen(xattr->name) + 1 to a temporary
variable, you could use memcpy here, and avoid the 3 repeated
strlen(xattr->name) calls.

Phillip

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
@ 2011-04-20  2:18                           ` Phillip Lougher
  0 siblings, 0 replies; 81+ messages in thread
From: Phillip Lougher @ 2011-04-20  2:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Hugh Dickins, Eric Paris

On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:

> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)

> +       spin_lock(&dentry->d_inode->i_lock);
> +       list_for_each_entry(xattr, &info->xattr_list, list) {
> +               /* skip "trusted." attributes for unprivileged callers */
> +               if (!trusted && xattr_is_trusted(xattr->name))
> +                       continue;
> +
> +               used += strlen(xattr->name) + 1;
> +               if (buffer) {
> +                       if (size < used) {
> +                               used = -ERANGE;
> +                               break;
> +                       }
> +                       strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
>+                        buffer += strlen(xattr->name) + 1;

Why are you doing a strncpy here?  strcpy() isn't going to copy more
than strlen(xattr->name) + 1 bytes, and you know buffer is large
enough to hold that because of the previous if (size < used) check?

If you assigned the first strlen(xattr->name) + 1 to a temporary
variable, you could use memcpy here, and avoid the 3 repeated
strlen(xattr->name) calls.

Phillip
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-20  2:18                           ` Phillip Lougher
  (?)
@ 2011-04-20 13:43                           ` Miklos Szeredi
  2011-04-21  6:59                             ` Michal Suchanek
  -1 siblings, 1 reply; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-20 13:43 UTC (permalink / raw)
  To: Phillip Lougher
  Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Hugh Dickins, Eric Paris

Phillip Lougher <phillip.lougher@gmail.com> writes:

> On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
>
>> +       spin_lock(&dentry->d_inode->i_lock);
>> +       list_for_each_entry(xattr, &info->xattr_list, list) {
>> +               /* skip "trusted." attributes for unprivileged callers */
>> +               if (!trusted && xattr_is_trusted(xattr->name))
>> +                       continue;
>> +
>> +               used += strlen(xattr->name) + 1;
>> +               if (buffer) {
>> +                       if (size < used) {
>> +                               used = -ERANGE;
>> +                               break;
>> +                       }
>> +                       strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
>>+                        buffer += strlen(xattr->name) + 1;
>
> Why are you doing a strncpy here?  strcpy() isn't going to copy more
> than strlen(xattr->name) + 1 bytes, and you know buffer is large
> enough to hold that because of the previous if (size < used) check?
>
> If you assigned the first strlen(xattr->name) + 1 to a temporary
> variable, you could use memcpy here, and avoid the 3 repeated
> strlen(xattr->name) calls.

Yeah, makes sense.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-19 20:04                       ` [PATCH] tmpfs: implement generic xattr support Miklos Szeredi
  2011-04-20  2:18                           ` Phillip Lougher
@ 2011-04-20 16:00                         ` Serge E. Hallyn
  2011-05-12  4:20                         ` Hugh Dickins
  2 siblings, 0 replies; 81+ messages in thread
From: Serge E. Hallyn @ 2011-04-20 16:00 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Hugh Dickins, Eric Paris

Quoting Miklos Szeredi (miklos@szeredi.hu):
> Michal Suchanek <hramrach@centrum.cz> writes:
> > Applying this patch is not sufficient. Apparently more xattrs are
> > needed but adding them on top of this patch should be easy.
> >
> > The ones mentioned in the overlayfs doc are
> >
> > trusted.overlay.whiteout
> > trusted.overlay.opaque
> >
> > The patch implements security.*
> 
> Here's an updated patch.  It changes a number of things:
> 
>  - it implements shmem specific xattr ops instead of using the generic_*
>    functions.  Which means that there's no back and forth between the
>    VFS and the filesystem.  I basically copied the btrfs way of doing
>    things.
> 
>  - adds a new config option: CONFIG_TMPFS_XATTR and makes
>    CONFIG_TMPFS_POSIX_ACL depend on this.  This way xattr support can be
>    turned on without also adding ACL support.
> 
>  - now supports trusted.* namespace needed by overlayfs in addition to
>    security.*.  Doesn't yet support user.* since that needs more thought
>    wrt. kernel memory use.
> 
>  - supports xattrs on symlinks, again needed by overlayfs
> 
> Hugh, Eric, does this look acceptable?
> 
> Thanks,
> Miklos
> 
> ---
> From: Eric Paris <eparis@redhat.com>
> Subject: tmpfs: implement generic xattr support
> 
> This patch implements generic xattrs for tmpfs filesystems.  The feodra
> project, while trying to replace suid apps with file capabilities,
> realized that tmpfs, which is used on the build systems, does not
> support file capabilities and thus cannot be used to build packages
> which use file capabilities.  Xattrs are also needed for overlayfs.
> 
> The xattr interface is a bit, odd.  If a filesystem does not implement any
> {get,set,list}xattr functions the VFS will call into some random LSM hooks and
> the running LSM can then implement some method for handling xattrs.  SELinux
> for example provides a method to support security.selinux but no other
> security.* xattrs.
> 
> As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
> xattr handler routines specifically to handle acls.  Because of this tmpfs
> would loose the VFS/LSM helpers to support the running LSM.  To make up for
> that tmpfs had stub functions that did nothing but call into the LSM hooks
> which implement the helpers.
> 
> This new patch does not use the LSM fallback functions and instead
> just implements a native get/set/list xattr feature for the full
> security.* and trusted.* namespace like a normal filesystem.  This
> means that tmpfs can now support both security.selinux and
> security.capability, which was not previously possible.
> 
> The basic implementation is that I attach a:
> 
> struct shmem_xattr {
> 	struct list_head list; /* anchored by shmem_inode_info->xattr_list */
> 	char *name;
> 	size_t size;
> 	char value[0];
> };
> 
> Into the struct shmem_inode_info for each xattr that is set.  This
> implementation could easily support the user.* namespace as well,
> except some care needs to be taken to prevent large amounts of
> unswappable memory being allocated for unprivileged users.
> 
> Signed-off-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

Looks good to me.

Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>

thanks,
-serge

> ---
>  fs/Kconfig               |   18 ++
>  include/linux/shmem_fs.h |    1 
>  mm/shmem.c               |  302 +++++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 273 insertions(+), 48 deletions(-)
> 
> Index: linux-2.6/fs/Kconfig
> ===================================================================
> --- linux-2.6.orig/fs/Kconfig	2011-04-19 21:09:33.000000000 +0200
> +++ linux-2.6/fs/Kconfig	2011-04-19 21:09:35.000000000 +0200
> @@ -121,9 +121,25 @@ config TMPFS
>  
>  	  See <file:Documentation/filesystems/tmpfs.txt> for details.
>  
> +config TMPFS_XATTR
> +	bool "Tmpfs extended attributes"
> +	depends on TMPFS
> +	default y
> +	help
> +	  Extended attributes are name:value pairs associated with inodes by
> +	  the kernel or by users (see the attr(5) manual page, or visit
> +	  <http://acl.bestbits.at/> for details).
> +
> +	  Currently this enables support for the trusted.* and
> +	  security.* namespaces.
> +
> +	  If unsure, say N.
> +
> +	  You need this for POSIX ACL support on tmpfs.
> +
>  config TMPFS_POSIX_ACL
>  	bool "Tmpfs POSIX Access Control Lists"
> -	depends on TMPFS
> +	depends on TMPFS_XATTR
>  	select GENERIC_ACL
>  	help
>  	  POSIX Access Control Lists (ACLs) support permissions for users and
> Index: linux-2.6/include/linux/shmem_fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/shmem_fs.h	2011-04-19 21:09:25.000000000 +0200
> +++ linux-2.6/include/linux/shmem_fs.h	2011-04-19 21:09:35.000000000 +0200
> @@ -19,6 +19,7 @@ struct shmem_inode_info {
>  	struct page		*i_indirect;	/* top indirect blocks page */
>  	swp_entry_t		i_direct[SHMEM_NR_DIRECT]; /* first blocks */
>  	struct list_head	swaplist;	/* chain of maybes on swap */
> +	struct list_head	xattr_list;	/* list of shmem_xattr */
>  	struct inode		vfs_inode;
>  };
>  
> Index: linux-2.6/mm/shmem.c
> ===================================================================
> --- linux-2.6.orig/mm/shmem.c	2011-04-19 21:09:25.000000000 +0200
> +++ linux-2.6/mm/shmem.c	2011-04-19 21:09:35.000000000 +0200
> @@ -99,6 +99,13 @@ static struct vfsmount *shm_mnt;
>  /* Pretend that each entry is of this size in directory's i_size */
>  #define BOGO_DIRENT_SIZE 20
>  
> +struct shmem_xattr {
> +	struct list_head list;	/* anchored by shmem_inode_info->xattr_list */
> +	char *name;		/* xattr name */
> +	size_t size;
> +	char value[0];
> +};
> +
>  /* Flag allocation requirements to shmem_getpage and shmem_swp_alloc */
>  enum sgp_type {
>  	SGP_READ,	/* don't exceed i_size, don't allocate page */
> @@ -822,6 +829,7 @@ static int shmem_notify_change(struct de
>  static void shmem_evict_inode(struct inode *inode)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
> +	struct shmem_xattr *xattr, *nxattr;
>  
>  	if (inode->i_mapping->a_ops == &shmem_aops) {
>  		truncate_inode_pages(inode->i_mapping, 0);
> @@ -834,6 +842,11 @@ static void shmem_evict_inode(struct ino
>  			mutex_unlock(&shmem_swaplist_mutex);
>  		}
>  	}
> +
> +	list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
> +		kfree(xattr->name);
> +		kfree(xattr);
> +	}
>  	BUG_ON(inode->i_blocks);
>  	shmem_free_inode(inode->i_sb);
>  	end_writeback(inode);
> @@ -1597,6 +1610,7 @@ static struct inode *shmem_get_inode(str
>  		spin_lock_init(&info->lock);
>  		info->flags = flags & VM_NORESERVE;
>  		INIT_LIST_HEAD(&info->swaplist);
> +		INIT_LIST_HEAD(&info->xattr_list);
>  		cache_no_acl(inode);
>  
>  		switch (mode & S_IFMT) {
> @@ -2048,62 +2062,225 @@ static void shmem_put_link(struct dentry
>  	}
>  }
>  
> -static const struct inode_operations shmem_symlink_inline_operations = {
> -	.readlink	= generic_readlink,
> -	.follow_link	= shmem_follow_link_inline,
> -};
> -
> -static const struct inode_operations shmem_symlink_inode_operations = {
> -	.readlink	= generic_readlink,
> -	.follow_link	= shmem_follow_link,
> -	.put_link	= shmem_put_link,
> -};
> -
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> +#ifdef CONFIG_TMPFS_XATTR
>  /*
> - * Superblocks without xattr inode operations will get security.* xattr
> - * support from the VFS "for free". As soon as we have any other xattrs
> + * Superblocks without xattr inode operations may get some security.* xattr
> + * support from the LSM "for free". As soon as we have any other xattrs
>   * like ACLs, we also need to implement the security.* handlers at
>   * filesystem level, though.
>   */
>  
> -static size_t shmem_xattr_security_list(struct dentry *dentry, char *list,
> -					size_t list_len, const char *name,
> -					size_t name_len, int handler_flags)
> +static int shmem_xattr_get(struct dentry *dentry, const char *name,
> +			   void *buffer, size_t size)
>  {
> -	return security_inode_listsecurity(dentry->d_inode, list, list_len);
> -}
> +	struct shmem_inode_info *info;
> +	struct shmem_xattr *xattr;
> +	int ret = -ENODATA;
>  
> -static int shmem_xattr_security_get(struct dentry *dentry, const char *name,
> -		void *buffer, size_t size, int handler_flags)
> -{
> -	if (strcmp(name, "") == 0)
> -		return -EINVAL;
> -	return xattr_getsecurity(dentry->d_inode, name, buffer, size);
> +	info = SHMEM_I(dentry->d_inode);
> +
> +	spin_lock(&dentry->d_inode->i_lock);
> +	list_for_each_entry(xattr, &info->xattr_list, list) {
> +		if (strcmp(name, xattr->name))
> +			continue;
> +
> +		ret = xattr->size;
> +		if (buffer) {
> +			if (size < xattr->size)
> +				ret = -ERANGE;
> +			else
> +				memcpy(buffer, xattr->value, xattr->size);
> +		}
> +		break;
> +	}
> +	spin_unlock(&dentry->d_inode->i_lock);
> +	return ret;
>  }
>  
> -static int shmem_xattr_security_set(struct dentry *dentry, const char *name,
> -		const void *value, size_t size, int flags, int handler_flags)
> +static int shmem_xattr_set(struct dentry *dentry, const char *name,
> +			   const void *value, size_t size, int flags)
>  {
> -	if (strcmp(name, "") == 0)
> -		return -EINVAL;
> -	return security_inode_setsecurity(dentry->d_inode, name, value,
> -					  size, flags);
> +	struct inode *inode = dentry->d_inode;
> +	struct shmem_inode_info *info = SHMEM_I(inode);
> +	struct shmem_xattr *xattr;
> +	struct shmem_xattr *new_xattr = NULL;
> +	size_t len;
> +	int err = 0;
> +
> +	/* value == NULL means remove */
> +	if (value) {
> +		/* wrap around? */
> +		len = sizeof(*new_xattr) + size;
> +		if (len <= sizeof(*new_xattr))
> +			return -ENOMEM;
> +
> +		new_xattr = kmalloc(len, GFP_NOFS);
> +		if (!new_xattr)
> +		return -ENOMEM;
> +
> +		new_xattr->name = kstrdup(name, GFP_NOFS);
> +		if (!new_xattr->name) {
> +			kfree(new_xattr);
> +			return -ENOMEM;
> +		}
> +
> +		new_xattr->size = size;
> +		memcpy(new_xattr->value, value, size);
> +	}
> +
> +	spin_lock(&inode->i_lock);
> +	list_for_each_entry(xattr, &info->xattr_list, list) {
> +		if (!strcmp(name, xattr->name)) {
> +			if (flags & XATTR_CREATE) {
> +				xattr = new_xattr;
> +				err = -EEXIST;
> +			} else if (new_xattr) {
> +				list_replace(&xattr->list, &new_xattr->list);
> +			} else {
> +				list_del(&xattr->list);
> +			}
> +			goto out;
> +		}
> +	}
> +	if (flags & XATTR_REPLACE) {
> +		xattr = new_xattr;
> +		err = -ENODATA;
> +	} else {
> +		list_add(&new_xattr->list, &info->xattr_list);
> +		xattr = NULL;
> +	}
> +out:
> +	spin_unlock(&inode->i_lock);
> +	if (xattr)
> +		kfree(xattr->name);
> +	kfree(xattr);
> +	return 0;
>  }
>  
> -static const struct xattr_handler shmem_xattr_security_handler = {
> -	.prefix = XATTR_SECURITY_PREFIX,
> -	.list   = shmem_xattr_security_list,
> -	.get    = shmem_xattr_security_get,
> -	.set    = shmem_xattr_security_set,
> -};
>  
>  static const struct xattr_handler *shmem_xattr_handlers[] = {
> +#ifdef CONFIG_TMPFS_POSIX_ACL
>  	&generic_acl_access_handler,
>  	&generic_acl_default_handler,
> -	&shmem_xattr_security_handler,
> +#endif
>  	NULL
>  };
> +
> +static int shmem_xattr_validate(const char *name)
> +{
> +	struct { const char *prefix; size_t len; } arr[] = {
> +		{ XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN },
> +		{ XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN }};
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(arr); i++) {
> +		size_t preflen = arr[i].len;
> +		if (strncmp(name, arr[i].prefix, preflen) == 0) {
> +			if (!name[preflen])
> +				return -EINVAL;
> +			return 0;
> +		}
> +	}
> +	return -EOPNOTSUPP;
> +}
> +
> +static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
> +			      void *buffer, size_t size)
> +{
> +	int err;
> +
> +	/*
> +	 * If this is a request for a synthetic attribute in the system.*
> +	 * namespace use the generic infrastructure to resolve a handler
> +	 * for it via sb->s_xattr.
> +	 */
> +	if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
> +		return generic_getxattr(dentry, name, buffer, size);
> +
> +	err = shmem_xattr_validate(name);
> +	if (err)
> +		return err;
> +
> +	return shmem_xattr_get(dentry, name, buffer, size);
> +}
> +
> +static int shmem_setxattr(struct dentry *dentry, const char *name,
> +			  const void *value, size_t size, int flags)
> +{
> +	int err;
> +
> +	/*
> +	 * If this is a request for a synthetic attribute in the system.*
> +	 * namespace use the generic infrastructure to resolve a handler
> +	 * for it via sb->s_xattr.
> +	 */
> +	if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
> +		return generic_setxattr(dentry, name, value, size, flags);
> +
> +	err = shmem_xattr_validate(name);
> +	if (err)
> +		return err;
> +
> +	if (size == 0)
> +		value = "";  /* empty EA, do not remove */
> +
> +	return shmem_xattr_set(dentry, name, value, size, flags);
> +
> +}
> +
> +static int shmem_removexattr(struct dentry *dentry, const char *name)
> +{
> +	int err;
> +
> +	/*
> +	 * If this is a request for a synthetic attribute in the system.*
> +	 * namespace use the generic infrastructure to resolve a handler
> +	 * for it via sb->s_xattr.
> +	 */
> +	if (!strncmp(name, XATTR_SYSTEM_PREFIX, XATTR_SYSTEM_PREFIX_LEN))
> +		return generic_removexattr(dentry, name);
> +
> +	err = shmem_xattr_validate(name);
> +	if (err)
> +		return err;
> +
> +	return shmem_xattr_set(dentry, name, NULL, 0, XATTR_REPLACE);
> +}
> +
> +static bool xattr_is_trusted(const char *name)
> +{
> +	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
> +}
> +
> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
> +{
> +	bool trusted = capable(CAP_SYS_ADMIN);
> +	struct shmem_xattr *xattr;
> +	struct shmem_inode_info *info;
> +	size_t used = 0;
> +
> +	info = SHMEM_I(dentry->d_inode);
> +
> +	spin_lock(&dentry->d_inode->i_lock);
> +	list_for_each_entry(xattr, &info->xattr_list, list) {
> +		/* skip "trusted." attributes for unprivileged callers */
> +		if (!trusted && xattr_is_trusted(xattr->name))
> +			continue;
> +
> +		used += strlen(xattr->name) + 1;
> +		if (buffer) {
> +			if (size < used) {
> +				used = -ERANGE;
> +				break;
> +			}
> +			strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
> +			buffer += strlen(xattr->name) + 1;
> +		}
> +	}
> +	spin_unlock(&dentry->d_inode->i_lock);
> +
> +	return used;
> +}
>  #endif
>  
>  static struct dentry *shmem_get_parent(struct dentry *child)
> @@ -2384,8 +2561,10 @@ int shmem_fill_super(struct super_block
>  	sb->s_magic = TMPFS_MAGIC;
>  	sb->s_op = &shmem_ops;
>  	sb->s_time_gran = 1;
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> +#ifdef CONFIG_TMPFS_XATTR
>  	sb->s_xattr = shmem_xattr_handlers;
> +#endif
> +#ifdef CONFIG_TMPFS_POSIX_ACL
>  	sb->s_flags |= MS_POSIXACL;
>  #endif
>  
> @@ -2483,16 +2662,41 @@ static const struct file_operations shme
>  static const struct inode_operations shmem_inode_operations = {
>  	.setattr	= shmem_notify_change,
>  	.truncate_range	= shmem_truncate_range,
> +#ifdef CONFIG_TMPFS_XATTR
> +	.setxattr	= shmem_setxattr,
> +	.getxattr	= shmem_getxattr,
> +	.listxattr	= shmem_listxattr,
> +	.removexattr	= shmem_removexattr,
> +#endif
>  #ifdef CONFIG_TMPFS_POSIX_ACL
> -	.setxattr	= generic_setxattr,
> -	.getxattr	= generic_getxattr,
> -	.listxattr	= generic_listxattr,
> -	.removexattr	= generic_removexattr,
>  	.check_acl	= generic_check_acl,
>  #endif
>  
>  };
>  
> +static const struct inode_operations shmem_symlink_inline_operations = {
> +	.readlink	= generic_readlink,
> +	.follow_link	= shmem_follow_link_inline,
> +#ifdef CONFIG_TMPFS_XATTR
> +	.setxattr	= shmem_setxattr,
> +	.getxattr	= shmem_getxattr,
> +	.listxattr	= shmem_listxattr,
> +	.removexattr	= shmem_removexattr,
> +#endif
> +};
> +
> +static const struct inode_operations shmem_symlink_inode_operations = {
> +	.readlink	= generic_readlink,
> +	.follow_link	= shmem_follow_link,
> +	.put_link	= shmem_put_link,
> +#ifdef CONFIG_TMPFS_XATTR
> +	.setxattr	= shmem_setxattr,
> +	.getxattr	= shmem_getxattr,
> +	.listxattr	= shmem_listxattr,
> +	.removexattr	= shmem_removexattr,
> +#endif
> +};
> +
>  static const struct inode_operations shmem_dir_inode_operations = {
>  #ifdef CONFIG_TMPFS
>  	.create		= shmem_create,
> @@ -2505,23 +2709,27 @@ static const struct inode_operations shm
>  	.mknod		= shmem_mknod,
>  	.rename		= shmem_rename,
>  #endif
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> -	.setattr	= shmem_notify_change,
> +#ifdef CONFIG_TMPFS_XATTR
>  	.setxattr	= generic_setxattr,
>  	.getxattr	= generic_getxattr,
>  	.listxattr	= generic_listxattr,
>  	.removexattr	= generic_removexattr,
> +#endif
> +#ifdef CONFIG_TMPFS_POSIX_ACL
> +	.setattr	= shmem_notify_change,
>  	.check_acl	= generic_check_acl,
>  #endif
>  };
>  
>  static const struct inode_operations shmem_special_inode_operations = {
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> -	.setattr	= shmem_notify_change,
> +#ifdef CONFIG_TMPFS_XATTR
>  	.setxattr	= generic_setxattr,
>  	.getxattr	= generic_getxattr,
>  	.listxattr	= generic_listxattr,
>  	.removexattr	= generic_removexattr,
> +#endif
> +#ifdef CONFIG_TMPFS_POSIX_ACL
> +	.setattr	= shmem_notify_change,
>  	.check_acl	= generic_check_acl,
>  #endif
>  };
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-20 13:43                           ` Miklos Szeredi
@ 2011-04-21  6:59                             ` Michal Suchanek
  2011-04-21  9:08                               ` Miklos Szeredi
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-21  6:59 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Hugh Dickins, Eric Paris

On 20 April 2011 15:43, Miklos Szeredi <miklos@szeredi.hu> wrote:
> Phillip Lougher <phillip.lougher@gmail.com> writes:
>
>> On Tue, Apr 19, 2011 at 9:04 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>>> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
>>
>>> +       spin_lock(&dentry->d_inode->i_lock);
>>> +       list_for_each_entry(xattr, &info->xattr_list, list) {
>>> +               /* skip "trusted." attributes for unprivileged callers */
>>> +               if (!trusted && xattr_is_trusted(xattr->name))
>>> +                       continue;
>>> +
>>> +               used += strlen(xattr->name) + 1;
>>> +               if (buffer) {
>>> +                       if (size < used) {
>>> +                               used = -ERANGE;
>>> +                               break;
>>> +                       }
>>> +                       strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
>>>+                        buffer += strlen(xattr->name) + 1;
>>
>> Why are you doing a strncpy here?  strcpy() isn't going to copy more
>> than strlen(xattr->name) + 1 bytes, and you know buffer is large
>> enough to hold that because of the previous if (size < used) check?
>>
>> If you assigned the first strlen(xattr->name) + 1 to a temporary
>> variable, you could use memcpy here, and avoid the 3 repeated
>> strlen(xattr->name) calls.
>
> Yeah, makes sense.
>

Can you put up a branch with both overlayfs and xattrs?

I would like to point people at a repo with working implementation and
mirroring the kernel tree to add one patch is a bit overkill.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-21  6:59                             ` Michal Suchanek
@ 2011-04-21  9:08                               ` Miklos Szeredi
  2011-04-21 10:59                                 ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-21  9:08 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Hugh Dickins, Eric Paris

Michal Suchanek <hramrach@centrum.cz> writes:

> Can you put up a branch with both overlayfs and xattrs?
>
> I would like to point people at a repo with working implementation and
> mirroring the kernel tree to add one patch is a bit overkill.

Okay, added the tmpfs-xattr patch to the overlayfs.v8 branch.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-21  9:08                               ` Miklos Szeredi
@ 2011-04-21 10:59                                 ` Michal Suchanek
  2011-04-21 14:58                                   ` Jordi Pujol
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-21 10:59 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Phillip Lougher, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Hugh Dickins, Eric Paris

On 21 April 2011 11:08, Miklos Szeredi <miklos@szeredi.hu> wrote:
> Michal Suchanek <hramrach@centrum.cz> writes:
>
>> Can you put up a branch with both overlayfs and xattrs?
>>
>> I would like to point people at a repo with working implementation and
>> mirroring the kernel tree to add one patch is a bit overkill.
>
> Okay, added the tmpfs-xattr patch to the overlayfs.v8 branch.
>

Okay, I built a small live system using this branch and it boots without errors.

Now it's time to build a larger image I guess.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-21 10:59                                 ` Michal Suchanek
@ 2011-04-21 14:58                                   ` Jordi Pujol
  2011-04-21 15:22                                     ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Jordi Pujol @ 2011-04-21 14:58 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Miklos Szeredi, Phillip Lougher, Andreas Dilger, Jiri Kosina,
	Ric Wheeler, linux-fsdevel, linux-kernel, David Howells,
	Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins,
	Eric Paris

A Dijous 21 Abril 2011 12:59:01, Michal Suchanek va escriure:
> Now it's time to build a larger image I guess.

Hello,

I have done it, building a desktop image that contains kde4 and some services,
The tests show that tmpfs is broken, it makes first operations but at some 
point it becomes erratic; that occurs in non-persistent mode, where 
overlayfs upperdir is on tmpfs, and in persistent mode also but in this case 
more user working is needed before tmpfs failure,

in Linux single user mode I have installed the nvidia packages and driver 
using dpkg, only some files are installed, therefore I tried creating manually 
a required directory and in this manner the process has done some steps more.

kernel source:

http://livenet.selfip.com/ftp/debian/linux-2.6/linux-2.6.38-3.jpp.3-
lnet_2.6.38-15.tar.bz2

logs:

http://livenet.selfip.com/ftp/debian/overlayfs/test-tmpfs-xattrs.tar.bz2

thanks,

Jordi Pujol

Live never ending Tale
GNU/Linux Live forever!
http://livenet.selfip.com

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-21 14:58                                   ` Jordi Pujol
@ 2011-04-21 15:22                                     ` Michal Suchanek
  2011-04-21 15:43                                       ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-21 15:22 UTC (permalink / raw)
  To: Jordi Pujol
  Cc: Miklos Szeredi, Phillip Lougher, Andreas Dilger, Jiri Kosina,
	Ric Wheeler, linux-fsdevel, linux-kernel, David Howells,
	Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins,
	Eric Paris

On 21 April 2011 16:58, Jordi Pujol <jordipujolp@gmail.com> wrote:
> A Dijous 21 Abril 2011 12:59:01, Michal Suchanek va escriure:
>> Now it's time to build a larger image I guess.
>
> Hello,
>
> I have done it, building a desktop image that contains kde4 and some services,
> The tests show that tmpfs is broken, it makes first operations but at some
> point it becomes erratic; that occurs in non-persistent mode, where
> overlayfs upperdir is on tmpfs, and in persistent mode also but in this case
> more user working is needed before tmpfs failure,
>
> in Linux single user mode I have installed the nvidia packages and driver
> using dpkg, only some files are installed, therefore I tried creating manually
> a required directory and in this manner the process has done some steps more.
>
> kernel source:
>
> http://livenet.selfip.com/ftp/debian/linux-2.6/linux-2.6.38-3.jpp.3-
> lnet_2.6.38-15.tar.bz2
>
> logs:
>
> http://livenet.selfip.com/ftp/debian/overlayfs/test-tmpfs-xattrs.tar.bz2
>

Yes, while the system appears to run generally fine unpacking dpkg
packages fails.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-21 15:22                                     ` Michal Suchanek
@ 2011-04-21 15:43                                       ` Michal Suchanek
  2011-04-21 17:26                                         ` Miklos Szeredi
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-04-21 15:43 UTC (permalink / raw)
  To: Jordi Pujol
  Cc: Miklos Szeredi, Phillip Lougher, Andreas Dilger, Jiri Kosina,
	Ric Wheeler, linux-fsdevel, linux-kernel, David Howells,
	Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins,
	Eric Paris

On 21 April 2011 17:22, Michal Suchanek <hramrach@centrum.cz> wrote:
> On 21 April 2011 16:58, Jordi Pujol <jordipujolp@gmail.com> wrote:
>> A Dijous 21 Abril 2011 12:59:01, Michal Suchanek va escriure:
>>> Now it's time to build a larger image I guess.
>>
>> Hello,
>>
>> I have done it, building a desktop image that contains kde4 and some services,
>> The tests show that tmpfs is broken, it makes first operations but at some
>> point it becomes erratic; that occurs in non-persistent mode, where
>> overlayfs upperdir is on tmpfs, and in persistent mode also but in this case
>> more user working is needed before tmpfs failure,
>>
>> in Linux single user mode I have installed the nvidia packages and driver
>> using dpkg, only some files are installed, therefore I tried creating manually
>> a required directory and in this manner the process has done some steps more.
>>
>> kernel source:
>>
>> http://livenet.selfip.com/ftp/debian/linux-2.6/linux-2.6.38-3.jpp.3-
>> lnet_2.6.38-15.tar.bz2
>>
>> logs:
>>
>> http://livenet.selfip.com/ftp/debian/overlayfs/test-tmpfs-xattrs.tar.bz2
>>
>
> Yes, while the system appears to run generally fine unpacking dpkg
> packages fails.

You will be probably interested in this log:

ls: cannot access /usr/share/doc/ucb*: No such file or directory
...
rmdir("/usr/share/doc/ucblogo.dpkg-new") = -1 ENOENT (No such file or directory)
rmdir("/usr/share/doc/ucblogo.dpkg-tmp") = -1 ENOENT (No such file or directory)
mkdir("/usr/share/doc/ucblogo.dpkg-new", 0) = 0
chown32("/usr/share/doc/ucblogo.dpkg-new", 0, 0) = 0
chmod("/usr/share/doc/ucblogo.dpkg-new", 0755) = 0
utime("/usr/share/doc/ucblogo.dpkg-new", [2011/04/21-15:37:05,
2008/06/08-11:38:17]) = 0
rename("/usr/share/doc/ucblogo.dpkg-new", "/usr/share/doc/ucblogo") =
-1 EOPNOTSUPP (Operation not supported)

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-21 15:43                                       ` Michal Suchanek
@ 2011-04-21 17:26                                         ` Miklos Szeredi
  2011-04-21 19:17                                           ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Miklos Szeredi @ 2011-04-21 17:26 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Jordi Pujol, Phillip Lougher, Andreas Dilger, Jiri Kosina,
	Ric Wheeler, linux-fsdevel, linux-kernel, David Howells,
	Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins,
	Eric Paris

Michal Suchanek <hramrach@centrum.cz> writes:

> You will be probably interested in this log:
>
> ls: cannot access /usr/share/doc/ucb*: No such file or directory
> ...
> rmdir("/usr/share/doc/ucblogo.dpkg-new") = -1 ENOENT (No such file or directory)
> rmdir("/usr/share/doc/ucblogo.dpkg-tmp") = -1 ENOENT (No such file or directory)
> mkdir("/usr/share/doc/ucblogo.dpkg-new", 0) = 0
> chown32("/usr/share/doc/ucblogo.dpkg-new", 0, 0) = 0
> chmod("/usr/share/doc/ucblogo.dpkg-new", 0755) = 0
> utime("/usr/share/doc/ucblogo.dpkg-new", [2011/04/21-15:37:05,
> 2008/06/08-11:38:17]) = 0
> rename("/usr/share/doc/ucblogo.dpkg-new", "/usr/share/doc/ucblogo") =
> -1 EOPNOTSUPP (Operation not supported)

Oops, stupid mistake.

Following patch should fix it. 

Thanks,
Miklos

commit e90f1946938da7c9502337340126beae1931d239
Author: Miklos Szeredi <mszeredi@suse.cz>
Date:   Thu Apr 21 19:18:23 2011 +0200

    tmpfs: xattr: fix xattr for directories and special files
    
    Forgot to update the xattr methods for some file types.
    
    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

diff --git a/mm/shmem.c b/mm/shmem.c
index ac8ec9e..c527484 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2714,10 +2714,10 @@ static const struct inode_operations shmem_dir_inode_operations = {
 	.rename		= shmem_rename,
 #endif
 #ifdef CONFIG_TMPFS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= generic_listxattr,
-	.removexattr	= generic_removexattr,
+	.setxattr	= shmem_setxattr,
+	.getxattr	= shmem_getxattr,
+	.listxattr	= shmem_listxattr,
+	.removexattr	= shmem_removexattr,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,
@@ -2727,10 +2727,10 @@ static const struct inode_operations shmem_dir_inode_operations = {
 
 static const struct inode_operations shmem_special_inode_operations = {
 #ifdef CONFIG_TMPFS_XATTR
-	.setxattr	= generic_setxattr,
-	.getxattr	= generic_getxattr,
-	.listxattr	= generic_listxattr,
-	.removexattr	= generic_removexattr,
+	.setxattr	= shmem_setxattr,
+	.getxattr	= shmem_getxattr,
+	.listxattr	= shmem_listxattr,
+	.removexattr	= shmem_removexattr,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	.setattr	= shmem_notify_change,

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-21 17:26                                         ` Miklos Szeredi
@ 2011-04-21 19:17                                           ` Michal Suchanek
  0 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-04-21 19:17 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jordi Pujol, Phillip Lougher, Andreas Dilger, Jiri Kosina,
	Ric Wheeler, linux-fsdevel, linux-kernel, David Howells,
	Ian Kent, Jeff Moyer, Christoph Hellwig, Hugh Dickins,
	Eric Paris

On 21 April 2011 19:26, Miklos Szeredi <miklos@szeredi.hu> wrote:
> Michal Suchanek <hramrach@centrum.cz> writes:
>
>> You will be probably interested in this log:
>>
>> ls: cannot access /usr/share/doc/ucb*: No such file or directory
>> ...
>> rmdir("/usr/share/doc/ucblogo.dpkg-new") = -1 ENOENT (No such file or directory)
>> rmdir("/usr/share/doc/ucblogo.dpkg-tmp") = -1 ENOENT (No such file or directory)
>> mkdir("/usr/share/doc/ucblogo.dpkg-new", 0) = 0
>> chown32("/usr/share/doc/ucblogo.dpkg-new", 0, 0) = 0
>> chmod("/usr/share/doc/ucblogo.dpkg-new", 0755) = 0
>> utime("/usr/share/doc/ucblogo.dpkg-new", [2011/04/21-15:37:05,
>> 2008/06/08-11:38:17]) = 0
>> rename("/usr/share/doc/ucblogo.dpkg-new", "/usr/share/doc/ucblogo") =
>> -1 EOPNOTSUPP (Operation not supported)
>
> Oops, stupid mistake.
>
> Following patch should fix it.
>

Yes, with the additional patch package installation now works.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-04-19 20:04                       ` [PATCH] tmpfs: implement generic xattr support Miklos Szeredi
  2011-04-20  2:18                           ` Phillip Lougher
  2011-04-20 16:00                         ` Serge E. Hallyn
@ 2011-05-12  4:20                         ` Hugh Dickins
  2011-05-12  7:52                           ` Michal Suchanek
  2011-05-12 12:27                           ` Miklos Szeredi
  2 siblings, 2 replies; 81+ messages in thread
From: Hugh Dickins @ 2011-05-12  4:20 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Eric Paris, Andrew Morton, James Morris,
	Serge Hallyn

Hi Miklos,

I understand from the mm-commits discussion that you're probably
preparing another version: so let me make just a few comments
on this old one now, in case you can factor them in.

On Tue, 19 Apr 2011, Miklos Szeredi wrote:

> Michal Suchanek <hramrach@centrum.cz> writes:
> > Applying this patch is not sufficient. Apparently more xattrs are
> > needed but adding them on top of this patch should be easy.
> >
> > The ones mentioned in the overlayfs doc are
> >
> > trusted.overlay.whiteout
> > trusted.overlay.opaque
> >
> > The patch implements security.*
> 
> Here's an updated patch.  It changes a number of things:
> 
>  - it implements shmem specific xattr ops instead of using the generic_*
>    functions.  Which means that there's no back and forth between the
>    VFS and the filesystem.  I basically copied the btrfs way of doing
>    things.
> 
>  - adds a new config option: CONFIG_TMPFS_XATTR and makes
>    CONFIG_TMPFS_POSIX_ACL depend on this.  This way xattr support can be
>    turned on without also adding ACL support.
> 
>  - now supports trusted.* namespace needed by overlayfs in addition to
>    security.*.  Doesn't yet support user.* since that needs more thought
>    wrt. kernel memory use.
> 
>  - supports xattrs on symlinks, again needed by overlayfs
> 
> Hugh, Eric, does this look acceptable?

Pretty much, I think.  As ever I'm somewhat bemused by these xattrs,
so don't count too much on my opinion.  But if it looks good to xattr
people, then it's welcome in tmpfs - thank you.  Eric's ack would be good.

Just a few comments...

> 
> Thanks,
> Miklos
> 
> ---
> From: Eric Paris <eparis@redhat.com>
> Subject: tmpfs: implement generic xattr support
> 
> This patch implements generic xattrs for tmpfs filesystems.  The feodra
> project, while trying to replace suid apps with file capabilities,
> realized that tmpfs, which is used on the build systems, does not
> support file capabilities and thus cannot be used to build packages
> which use file capabilities.  Xattrs are also needed for overlayfs.
> 
> The xattr interface is a bit, odd.  If a filesystem does not implement any
> {get,set,list}xattr functions the VFS will call into some random LSM hooks and
> the running LSM can then implement some method for handling xattrs.  SELinux
> for example provides a method to support security.selinux but no other
> security.* xattrs.
> 
> As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
> xattr handler routines specifically to handle acls.  Because of this tmpfs
> would loose the VFS/LSM helpers to support the running LSM.  To make up for
> that tmpfs had stub functions that did nothing but call into the LSM hooks
> which implement the helpers.
> 
> This new patch does not use the LSM fallback functions and instead
> just implements a native get/set/list xattr feature for the full
> security.* and trusted.* namespace like a normal filesystem.  This
> means that tmpfs can now support both security.selinux and
> security.capability, which was not previously possible.
> 
> The basic implementation is that I attach a:
> 
> struct shmem_xattr {
> 	struct list_head list; /* anchored by shmem_inode_info->xattr_list */
> 	char *name;
> 	size_t size;
> 	char value[0];
> };
> 
> Into the struct shmem_inode_info for each xattr that is set.  This
> implementation could easily support the user.* namespace as well,
> except some care needs to be taken to prevent large amounts of
> unswappable memory being allocated for unprivileged users.
> 
> Signed-off-by: Eric Paris <eparis@redhat.com>
> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
> ---
>  fs/Kconfig               |   18 ++
>  include/linux/shmem_fs.h |    1 
>  mm/shmem.c               |  302 +++++++++++++++++++++++++++++++++++++++--------
>  3 files changed, 273 insertions(+), 48 deletions(-)
> 
> Index: linux-2.6/fs/Kconfig
> ===================================================================
> --- linux-2.6.orig/fs/Kconfig	2011-04-19 21:09:33.000000000 +0200
> +++ linux-2.6/fs/Kconfig	2011-04-19 21:09:35.000000000 +0200
> @@ -121,9 +121,25 @@ config TMPFS
>  
>  	  See <file:Documentation/filesystems/tmpfs.txt> for details.
>  
> +config TMPFS_XATTR
> +	bool "Tmpfs extended attributes"
> +	depends on TMPFS
> +	default y
> +	help
> +	  Extended attributes are name:value pairs associated with inodes by
> +	  the kernel or by users (see the attr(5) manual page, or visit
> +	  <http://acl.bestbits.at/> for details).
> +
> +	  Currently this enables support for the trusted.* and
> +	  security.* namespaces.
> +
> +	  If unsure, say N.
> +
> +	  You need this for POSIX ACL support on tmpfs.
> +

I disagree with Andrew on this, it should default to off, consistent with
the "If unsure, say N" comment.  Partly because that's how Linus prefers
it, for good anti-bloat reasons: if people have got along without a
feature for years, whyever should we suddenly default it on?  And
partly because TMPFS_POSIX_ACL already defaults to off (as do
EXT2_FS_XATTR and EXT2_FS_POSIX_ACL).

However... if someone already has CONFIG_TMPFS_POSIX_ACL=y in their
old config, it would be nice to make CONFIG_TMPFS_XATTR=y automatically,
so we're not in danger of removing their old functionality.  But I haven't
thought of the right way to achieve that - maybe your helpful POSIX ACL
comment is enough, but automatic would be better.

Few people would choose TMPFS_XATTR on and TMPFS_POSIX_ACL off: perhaps it
would work to make CONFIG_TMPFS_XATTR the new config option to cover both,
and select it from config TMPFS_POSIX_ACL (without a prompt) so the
oldconfig propagates correctly.  Perhaps.  But I haven't tried myself,
and you may be forgiveably disinclined to make config experiments!

>  config TMPFS_POSIX_ACL
>  	bool "Tmpfs POSIX Access Control Lists"
> -	depends on TMPFS
> +	depends on TMPFS_XATTR
>  	select GENERIC_ACL
>  	help
>  	  POSIX Access Control Lists (ACLs) support permissions for users and
> Index: linux-2.6/include/linux/shmem_fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/shmem_fs.h	2011-04-19 21:09:25.000000000 +0200
> +++ linux-2.6/include/linux/shmem_fs.h	2011-04-19 21:09:35.000000000 +0200
> @@ -19,6 +19,7 @@ struct shmem_inode_info {
>  	struct page		*i_indirect;	/* top indirect blocks page */
>  	swp_entry_t		i_direct[SHMEM_NR_DIRECT]; /* first blocks */
>  	struct list_head	swaplist;	/* chain of maybes on swap */
> +	struct list_head	xattr_list;	/* list of shmem_xattr */
>  	struct inode		vfs_inode;
>  };
>  
> Index: linux-2.6/mm/shmem.c
> ===================================================================
> --- linux-2.6.orig/mm/shmem.c	2011-04-19 21:09:25.000000000 +0200
> +++ linux-2.6/mm/shmem.c	2011-04-19 21:09:35.000000000 +0200
...
> -#ifdef CONFIG_TMPFS_POSIX_ACL
> +#ifdef CONFIG_TMPFS_XATTR
>  /*
> - * Superblocks without xattr inode operations will get security.* xattr
> - * support from the VFS "for free". As soon as we have any other xattrs
> + * Superblocks without xattr inode operations may get some security.* xattr
> + * support from the LSM "for free". As soon as we have any other xattrs
>   * like ACLs, we also need to implement the security.* handlers at
>   * filesystem level, though.
>   */
>  
> -static size_t shmem_xattr_security_list(struct dentry *dentry, char *list,
> -					size_t list_len, const char *name,
> -					size_t name_len, int handler_flags)
> +static int shmem_xattr_get(struct dentry *dentry, const char *name,
> +			   void *buffer, size_t size)
>  {
> -	return security_inode_listsecurity(dentry->d_inode, list, list_len);
> -}
> +	struct shmem_inode_info *info;
> +	struct shmem_xattr *xattr;
> +	int ret = -ENODATA;
>  
> -static int shmem_xattr_security_get(struct dentry *dentry, const char *name,
> -		void *buffer, size_t size, int handler_flags)
> -{
> -	if (strcmp(name, "") == 0)
> -		return -EINVAL;
> -	return xattr_getsecurity(dentry->d_inode, name, buffer, size);
> +	info = SHMEM_I(dentry->d_inode);
> +
> +	spin_lock(&dentry->d_inode->i_lock);

Not important, but I suggest you use info->lock throughout for this,
instead of dentry->d_inode->i_lock: in each case you need "info" for
info->xattr_list (and don't need "inode" at all I think), so info->lock
seems appropriate, and may be in the same cacheline once I make
shmem_inode_info smaller.  But don't worry if you'd prefer to leave it.

> -static int shmem_xattr_security_set(struct dentry *dentry, const char *name,
> -		const void *value, size_t size, int flags, int handler_flags)
> +static int shmem_xattr_set(struct dentry *dentry, const char *name,
> +			   const void *value, size_t size, int flags)
>  {
> -	if (strcmp(name, "") == 0)
> -		return -EINVAL;
> -	return security_inode_setsecurity(dentry->d_inode, name, value,
> -					  size, flags);
> +	struct inode *inode = dentry->d_inode;
> +	struct shmem_inode_info *info = SHMEM_I(inode);
> +	struct shmem_xattr *xattr;
> +	struct shmem_xattr *new_xattr = NULL;
> +	size_t len;
> +	int err = 0;
> +
> +	/* value == NULL means remove */
> +	if (value) {
> +		/* wrap around? */
> +		len = sizeof(*new_xattr) + size;
> +		if (len <= sizeof(*new_xattr))
> +			return -ENOMEM;
> +
> +		new_xattr = kmalloc(len, GFP_NOFS);

There's no need for GFP_NOFS in this patch, the page reclaim path won't
ever recurse into xattr handling: just use the usual GFP_KERNEL throughout.

> +		if (!new_xattr)
> +		return -ENOMEM;
> +
> +		new_xattr->name = kstrdup(name, GFP_NOFS);
> +		if (!new_xattr->name) {
> +			kfree(new_xattr);
> +			return -ENOMEM;
> +		}
> +
> +		new_xattr->size = size;
> +		memcpy(new_xattr->value, value, size);
> +	}
> +
> +	spin_lock(&inode->i_lock);
> +	list_for_each_entry(xattr, &info->xattr_list, list) {
> +		if (!strcmp(name, xattr->name)) {
> +			if (flags & XATTR_CREATE) {
> +				xattr = new_xattr;
> +				err = -EEXIST;
> +			} else if (new_xattr) {
> +				list_replace(&xattr->list, &new_xattr->list);
> +			} else {
> +				list_del(&xattr->list);
> +			}
> +			goto out;
> +		}
> +	}
> +	if (flags & XATTR_REPLACE) {
> +		xattr = new_xattr;
> +		err = -ENODATA;
> +	} else {
> +		list_add(&new_xattr->list, &info->xattr_list);
> +		xattr = NULL;
> +	}
> +out:
> +	spin_unlock(&inode->i_lock);
> +	if (xattr)
> +		kfree(xattr->name);
> +	kfree(xattr);
> +	return 0;

I think you meant to return err rather than always 0.

...

> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
> +{
> +	bool trusted = capable(CAP_SYS_ADMIN);
> +	struct shmem_xattr *xattr;
> +	struct shmem_inode_info *info;
> +	size_t used = 0;
> +
> +	info = SHMEM_I(dentry->d_inode);
> +
> +	spin_lock(&dentry->d_inode->i_lock);
> +	list_for_each_entry(xattr, &info->xattr_list, list) {
> +		/* skip "trusted." attributes for unprivileged callers */
> +		if (!trusted && xattr_is_trusted(xattr->name))
> +			continue;
> +
> +		used += strlen(xattr->name) + 1;
> +		if (buffer) {
> +			if (size < used) {
> +				used = -ERANGE;
> +				break;
> +			}
> +			strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
> +			buffer += strlen(xattr->name) + 1;
> +		}
> +	}
> +	spin_unlock(&dentry->d_inode->i_lock);
> +
> +	return used;
> +}
>  #endif

#endif /* CONFIG_TMPFS_XATTR */

would be welcome by the time we get here.

Thanks,
Hugh

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-05-12  4:20                         ` Hugh Dickins
@ 2011-05-12  7:52                           ` Michal Suchanek
  2011-05-12 12:27                           ` Miklos Szeredi
  1 sibling, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-05-12  7:52 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Miklos Szeredi, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Eric Paris, Andrew Morton, James Morris,
	Serge Hallyn

On 12 May 2011 06:20, Hugh Dickins <hughd@google.com> wrote:
> On Tue, 19 Apr 2011, Miklos Szeredi wrote:
...
>
> I disagree with Andrew on this, it should default to off, consistent with
> the "If unsure, say N" comment.  Partly because that's how Linus prefers
> it, for good anti-bloat reasons: if people have got along without a
> feature for years, whyever should we suddenly default it on?  And
> partly because TMPFS_POSIX_ACL already defaults to off (as do
> EXT2_FS_XATTR and EXT2_FS_POSIX_ACL).
>
> However... if someone already has CONFIG_TMPFS_POSIX_ACL=y in their
> old config, it would be nice to make CONFIG_TMPFS_XATTR=y automatically,
> so we're not in danger of removing their old functionality.  But I haven't
> thought of the right way to achieve that - maybe your helpful POSIX ACL
> comment is enough, but automatic would be better.
>
> Few people would choose TMPFS_XATTR on and TMPFS_POSIX_ACL off: perhaps it
> would work to make CONFIG_TMPFS_XATTR the new config option to cover both,
> and select it from config TMPFS_POSIX_ACL (without a prompt) so the
> oldconfig propagates correctly.  Perhaps.  But I haven't tried myself,
> and you may be forgiveably disinclined to make config experiments!

The xattrs are required for overlayfs so "generic" kernels will
probably want those. However, the "generic" kernels likely have the
acls already so this is not that much of an issue.

Perhaps the acls should select xattrs so that in configs that had acls
the xattrs are added automagically.

Or does it already work with depends?

Thanks

Michal

>
>>  config TMPFS_POSIX_ACL
>>       bool "Tmpfs POSIX Access Control Lists"
>> -     depends on TMPFS
>> +     depends on TMPFS_XATTR
>>       select GENERIC_ACL
>>       help

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-05-12  4:20                         ` Hugh Dickins
  2011-05-12  7:52                           ` Michal Suchanek
@ 2011-05-12 12:27                           ` Miklos Szeredi
  2011-05-12 14:00                             ` Miklos Szeredi
  1 sibling, 1 reply; 81+ messages in thread
From: Miklos Szeredi @ 2011-05-12 12:27 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Eric Paris, Andrew Morton, James Morris,
	Serge Hallyn

Hi Hugh,

Hugh Dickins <hughd@google.com> writes:

> I understand from the mm-commits discussion that you're probably
> preparing another version: so let me make just a few comments
> on this old one now, in case you can factor them in.

For sure.  Thanks for your comments.

>> +config TMPFS_XATTR
>> +	bool "Tmpfs extended attributes"
>> +	depends on TMPFS
>> +	default y
>> +	help
>> +	  Extended attributes are name:value pairs associated with inodes by
>> +	  the kernel or by users (see the attr(5) manual page, or visit
>> +	  <http://acl.bestbits.at/> for details).
>> +
>> +	  Currently this enables support for the trusted.* and
>> +	  security.* namespaces.
>> +
>> +	  If unsure, say N.
>> +
>> +	  You need this for POSIX ACL support on tmpfs.
>> +
>
> I disagree with Andrew on this, it should default to off, consistent with
> the "If unsure, say N" comment.  Partly because that's how Linus prefers
> it, for good anti-bloat reasons: if people have got along without a
> feature for years, whyever should we suddenly default it on?  And
> partly because TMPFS_POSIX_ACL already defaults to off (as do
> EXT2_FS_XATTR and EXT2_FS_POSIX_ACL).

Okay, changed to "default n".

> However... if someone already has CONFIG_TMPFS_POSIX_ACL=y in their
> old config, it would be nice to make CONFIG_TMPFS_XATTR=y automatically,
> so we're not in danger of removing their old functionality.  But I haven't
> thought of the right way to achieve that - maybe your helpful POSIX ACL
> comment is enough, but automatic would be better.

AFAICS we don't really support backward compatibility in Kconfig.  If
something breaks, it's the user's (kernel builder's) responsibility to
fix up.

> Few people would choose TMPFS_XATTR on and TMPFS_POSIX_ACL off: perhaps it
> would work to make CONFIG_TMPFS_XATTR the new config option to cover both,
> and select it from config TMPFS_POSIX_ACL (without a prompt) so the
> oldconfig propagates correctly.  Perhaps.  But I haven't tried myself,
> and you may be forgiveably disinclined to make config experiments!

I think it would work, but I prefer to have these options separate and
avoid messy back compat options.

On overlayfs, for example, TMPFS_XATTR would be useful but
TMPFS_POSIX_ACL would not.

>> +	info = SHMEM_I(dentry->d_inode);
>> +
>> +	spin_lock(&dentry->d_inode->i_lock);
>
> Not important, but I suggest you use info->lock throughout for this,
> instead of dentry->d_inode->i_lock: in each case you need "info" for
> info->xattr_list (and don't need "inode" at all I think), so info->lock
> seems appropriate, and may be in the same cacheline once I make
> shmem_inode_info smaller.  But don't worry if you'd prefer to leave
> it.

Makes sense.  I updated the locking.

>> +		new_xattr = kmalloc(len, GFP_NOFS);
>
> There's no need for GFP_NOFS in this patch, the page reclaim path won't
> ever recurse into xattr handling: just use the usual GFP_KERNEL
> throughout.

Yes.  I didn't notice this in Eric's patch, but GFP_NOFS is obviously
not necessary here.

>> +		if (!new_xattr)
>> +		return -ENOMEM;
>> +
>> +		new_xattr->name = kstrdup(name, GFP_NOFS);
>> +		if (!new_xattr->name) {
>> +			kfree(new_xattr);
>> +			return -ENOMEM;
>> +		}
>> +
>> +		new_xattr->size = size;
>> +		memcpy(new_xattr->value, value, size);
>> +	}
>> +
>> +	spin_lock(&inode->i_lock);
>> +	list_for_each_entry(xattr, &info->xattr_list, list) {
>> +		if (!strcmp(name, xattr->name)) {
>> +			if (flags & XATTR_CREATE) {
>> +				xattr = new_xattr;
>> +				err = -EEXIST;
>> +			} else if (new_xattr) {
>> +				list_replace(&xattr->list, &new_xattr->list);
>> +			} else {
>> +				list_del(&xattr->list);
>> +			}
>> +			goto out;
>> +		}
>> +	}
>> +	if (flags & XATTR_REPLACE) {
>> +		xattr = new_xattr;
>> +		err = -ENODATA;
>> +	} else {
>> +		list_add(&new_xattr->list, &info->xattr_list);
>> +		xattr = NULL;
>> +	}
>> +out:
>> +	spin_unlock(&inode->i_lock);
>> +	if (xattr)
>> +		kfree(xattr->name);
>> +	kfree(xattr);
>> +	return 0;
>
> I think you meant to return err rather than always 0.

Right.  Well spotted.

>> +static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
>> +{
>> +	bool trusted = capable(CAP_SYS_ADMIN);
>> +	struct shmem_xattr *xattr;
>> +	struct shmem_inode_info *info;
>> +	size_t used = 0;
>> +
>> +	info = SHMEM_I(dentry->d_inode);
>> +
>> +	spin_lock(&dentry->d_inode->i_lock);
>> +	list_for_each_entry(xattr, &info->xattr_list, list) {
>> +		/* skip "trusted." attributes for unprivileged callers */
>> +		if (!trusted && xattr_is_trusted(xattr->name))
>> +			continue;
>> +
>> +		used += strlen(xattr->name) + 1;
>> +		if (buffer) {
>> +			if (size < used) {
>> +				used = -ERANGE;
>> +				break;
>> +			}
>> +			strncpy(buffer, xattr->name, strlen(xattr->name) + 1);
>> +			buffer += strlen(xattr->name) + 1;
>> +		}
>> +	}
>> +	spin_unlock(&dentry->d_inode->i_lock);
>> +
>> +	return used;
>> +}
>>  #endif
>
> #endif /* CONFIG_TMPFS_XATTR */
>
> would be welcome by the time we get here.

Yes, fixed.

Updated patch will follow shortly.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-05-12 12:27                           ` Miklos Szeredi
@ 2011-05-12 14:00                             ` Miklos Szeredi
  2011-05-12 16:52                               ` Hugh Dickins
  0 siblings, 1 reply; 81+ messages in thread
From: Miklos Szeredi @ 2011-05-12 14:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Eric Paris, Andrew Morton, James Morris,
	Serge Hallyn

Miklos Szeredi <miklos@szeredi.hu> writes:

>>> +	info = SHMEM_I(dentry->d_inode);
>>> +
>>> +	spin_lock(&dentry->d_inode->i_lock);
>>
>> Not important, but I suggest you use info->lock throughout for this,
>> instead of dentry->d_inode->i_lock: in each case you need "info" for
>> info->xattr_list (and don't need "inode" at all I think), so info->lock
>> seems appropriate, and may be in the same cacheline once I make
>> shmem_inode_info smaller.  But don't worry if you'd prefer to leave
>> it.
>
> Makes sense.  I updated the locking.

This uncovered a nasty bug lurking in there: the "info" area, including
lock and xattr_list, may be overwritten by inline symlinks.  Because
xattr_list is near the end, this wasn't noticed with casual testing, but
info->lock would immediately Oops on getxattr for symlinks.

I propose the following solution.  It results in slightly less space for
inline symlinks, but correct operation for xattrs.  Does the anonymous
union/struct solution look acceptable?

Thanks,
Miklos

Index: linux-2.6/include/linux/shmem_fs.h
===================================================================
--- linux-2.6.orig/include/linux/shmem_fs.h	2011-05-12 15:59:08.000000000 +0200
+++ linux-2.6/include/linux/shmem_fs.h	2011-05-12 15:58:25.000000000 +0200
@@ -11,15 +11,39 @@
 
 struct shmem_inode_info {
 	spinlock_t		lock;
-	unsigned long		flags;
-	unsigned long		alloced;	/* data pages alloced to file */
-	unsigned long		swapped;	/* subtotal assigned to swap */
-	unsigned long		next_index;	/* highest alloced index + 1 */
-	struct shared_policy	policy;		/* NUMA memory alloc policy */
-	struct page		*i_indirect;	/* top indirect blocks page */
-	swp_entry_t		i_direct[SHMEM_NR_DIRECT]; /* first blocks */
-	struct list_head	swaplist;	/* chain of maybes on swap */
-	struct list_head	xattr_list;	/* list of shmem_xattr */
+
+	/* list of shmem_xattr */
+	struct list_head	xattr_list;
+
+	union {
+		char			inline_symlink[0];
+
+		/* Members not used by inline symlinks: */
+		struct {
+			unsigned long		flags;
+
+			/* data pages alloced to file */
+			unsigned long		alloced;
+
+			/* subtotal assigned to swap */
+			unsigned long		swapped;
+
+			/* highest alloced index + 1 */
+			unsigned long		next_index;
+
+			/* NUMA memory alloc policy */
+			struct shared_policy	policy;
+
+			/* top indirect blocks page */
+			struct page		*i_indirect;
+
+			/* first blocks */
+			swp_entry_t		i_direct[SHMEM_NR_DIRECT];
+
+			/* chain of maybes on swap */
+			struct list_head	swaplist;
+		};
+	};
 	struct inode		vfs_inode;
 };
 
Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c	2011-05-12 15:59:08.000000000 +0200
+++ linux-2.6/mm/shmem.c	2011-05-12 15:50:10.000000000 +0200
@@ -2029,9 +2029,9 @@ static int shmem_symlink(struct inode *d
 
 	info = SHMEM_I(inode);
 	inode->i_size = len-1;
-	if (len <= (char *)inode - (char *)info) {
+	if (len <= (char *)inode - info->inline_symlink) {
 		/* do it inline */
-		memcpy(info, symname, len);
+		memcpy(info->inline_symlink, symname, len);
 		inode->i_op = &shmem_symlink_inline_operations;
 	} else {
 		error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL);
@@ -2057,7 +2057,7 @@ static int shmem_symlink(struct inode *d
 
 static void *shmem_follow_link_inline(struct dentry *dentry, struct nameidata *nd)
 {
-	nd_set_link(nd, (char *)SHMEM_I(dentry->d_inode));
+	nd_set_link(nd, SHMEM_I(dentry->d_inode)->inline_symlink);
 	return NULL;
 }
 




^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH] tmpfs: implement generic xattr support
  2011-05-12 14:00                             ` Miklos Szeredi
@ 2011-05-12 16:52                               ` Hugh Dickins
  0 siblings, 0 replies; 81+ messages in thread
From: Hugh Dickins @ 2011-05-12 16:52 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Michal Suchanek, Andreas Dilger, Jiri Kosina, Ric Wheeler,
	linux-fsdevel, linux-kernel, David Howells, Ian Kent, Jeff Moyer,
	Christoph Hellwig, Eric Paris, Andrew Morton, James Morris,
	Serge Hallyn

On Thu, 12 May 2011, Miklos Szeredi wrote:
> Miklos Szeredi <miklos@szeredi.hu> writes:
> 
> >>> +	info = SHMEM_I(dentry->d_inode);
> >>> +
> >>> +	spin_lock(&dentry->d_inode->i_lock);
> >>
> >> Not important, but I suggest you use info->lock throughout for this,
> >> instead of dentry->d_inode->i_lock: in each case you need "info" for
> >> info->xattr_list (and don't need "inode" at all I think), so info->lock
> >> seems appropriate, and may be in the same cacheline once I make
> >> shmem_inode_info smaller.  But don't worry if you'd prefer to leave
> >> it.
> >
> > Makes sense.  I updated the locking.
> 
> This uncovered a nasty bug lurking in there: the "info" area, including
> lock and xattr_list, may be overwritten by inline symlinks.  Because
> xattr_list is near the end, this wasn't noticed with casual testing, but
> info->lock would immediately Oops on getxattr for symlinks.

Yikes, I'd completely forgotten about those inline symlinks:
many thanks for reminding me.

> 
> I propose the following solution.  It results in slightly less space for
> inline symlinks, but correct operation for xattrs.  Does the anonymous
> union/struct solution look acceptable?

You're being conscientious to minimize the space reduction, and I wonder
if I'm being sloppy about it: but I think I'd prefer you to keep it simple
and just make a union of the i_direct[SHMEM_NR_DIRECT] array and the inline
symlink buffer.  That does waste space that was occasionally being put to
use before, but saves us from embarrassment next time we forget about the
inline symlinks.

I intend to be removing that i_direct array very soon: I guess I'll want
to kmalloc for short symlinks then, certainly not overlaying over what
fields are left: so you'd be moving in that direction if you just reuse
the i_direct area now.

Hugh

> 
> Thanks,
> Miklos
> 
> Index: linux-2.6/include/linux/shmem_fs.h
> ===================================================================
> --- linux-2.6.orig/include/linux/shmem_fs.h	2011-05-12 15:59:08.000000000 +0200
> +++ linux-2.6/include/linux/shmem_fs.h	2011-05-12 15:58:25.000000000 +0200
> @@ -11,15 +11,39 @@
>  
>  struct shmem_inode_info {
>  	spinlock_t		lock;
> -	unsigned long		flags;
> -	unsigned long		alloced;	/* data pages alloced to file */
> -	unsigned long		swapped;	/* subtotal assigned to swap */
> -	unsigned long		next_index;	/* highest alloced index + 1 */
> -	struct shared_policy	policy;		/* NUMA memory alloc policy */
> -	struct page		*i_indirect;	/* top indirect blocks page */
> -	swp_entry_t		i_direct[SHMEM_NR_DIRECT]; /* first blocks */
> -	struct list_head	swaplist;	/* chain of maybes on swap */
> -	struct list_head	xattr_list;	/* list of shmem_xattr */
> +
> +	/* list of shmem_xattr */
> +	struct list_head	xattr_list;
> +
> +	union {
> +		char			inline_symlink[0];
> +
> +		/* Members not used by inline symlinks: */
> +		struct {
> +			unsigned long		flags;
> +
> +			/* data pages alloced to file */
> +			unsigned long		alloced;
> +
> +			/* subtotal assigned to swap */
> +			unsigned long		swapped;
> +
> +			/* highest alloced index + 1 */
> +			unsigned long		next_index;
> +
> +			/* NUMA memory alloc policy */
> +			struct shared_policy	policy;
> +
> +			/* top indirect blocks page */
> +			struct page		*i_indirect;
> +
> +			/* first blocks */
> +			swp_entry_t		i_direct[SHMEM_NR_DIRECT];
> +
> +			/* chain of maybes on swap */
> +			struct list_head	swaplist;
> +		};
> +	};
>  	struct inode		vfs_inode;
>  };
>  
> Index: linux-2.6/mm/shmem.c
> ===================================================================
> --- linux-2.6.orig/mm/shmem.c	2011-05-12 15:59:08.000000000 +0200
> +++ linux-2.6/mm/shmem.c	2011-05-12 15:50:10.000000000 +0200
> @@ -2029,9 +2029,9 @@ static int shmem_symlink(struct inode *d
>  
>  	info = SHMEM_I(inode);
>  	inode->i_size = len-1;
> -	if (len <= (char *)inode - (char *)info) {
> +	if (len <= (char *)inode - info->inline_symlink) {
>  		/* do it inline */
> -		memcpy(info, symname, len);
> +		memcpy(info->inline_symlink, symname, len);
>  		inode->i_op = &shmem_symlink_inline_operations;
>  	} else {
>  		error = shmem_getpage(inode, 0, &page, SGP_WRITE, NULL);
> @@ -2057,7 +2057,7 @@ static int shmem_symlink(struct inode *d
>  
>  static void *shmem_follow_link_inline(struct dentry *dentry, struct nameidata *nd)
>  {
> -	nd_set_link(nd, (char *)SHMEM_I(dentry->d_inode));
> +	nd_set_link(nd, SHMEM_I(dentry->d_inode)->inline_symlink);
>  	return NULL;
>  }

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Union mount and overlayfs bake off?
  2011-04-14 19:14             ` David Howells
@ 2011-06-29  9:39               ` Ric Wheeler
  2011-06-29 11:40                 ` Michal Suchanek
  2011-06-29 10:17               ` David Howells
  1 sibling, 1 reply; 81+ messages in thread
From: Ric Wheeler @ 2011-06-29  9:39 UTC (permalink / raw)
  To: David Howells, Michal Suchanek, Alexander Viro
  Cc: Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer, miklos,
	Christoph Hellwig


I think that it has been a while since David reposted the refreshed patch set 
for union mounts & know that overlayfs has recent updates as well.

Despite that, I have not seen a lot of feedback from reviewers or testers.

Al, from your point of view, where we stand?

Thanks!

Ric


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and overlayfs bake off?
  2011-04-14 19:14             ` David Howells
  2011-06-29  9:39               ` Union mount and overlayfs bake off? Ric Wheeler
@ 2011-06-29 10:17               ` David Howells
  2011-06-30 12:44                 ` Miklos Szeredi
  2011-07-10  8:28                 ` Union mount and lockdep design issues Ric Wheeler
  1 sibling, 2 replies; 81+ messages in thread
From: David Howells @ 2011-06-29 10:17 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: dhowells, Michal Suchanek, Alexander Viro, Ian Kent,
	linux-fsdevel, linux-kernel, Jeff Moyer, miklos,
	Christoph Hellwig

Ric Wheeler <ricwheeler@gmail.com> wrote:

> I think that it has been a while since David reposted the refreshed patch set
> for union mounts & know that overlayfs has recent updates as well.
> 
> Despite that, I have not seen a lot of feedback from reviewers or testers.

The main problem I've got is that it causes lockdep to generate warnings when
the top layer and one of the lower layers are of the same filesystem type.  The
obvious way round this is to give each superblock its own i_mutex lock class
rather than putting this in the file_system_type struct, but I'm not sure of
the consequences (the two obvious problems are superblock transience and the
fact that there may be so many more of them that it may explode lockdep).

I've split out some of the VFS patches that we might be interested in taking
upstream anyway.  They're currently sat on Al's plate for his consideration.

I've been dealing with some of Al's issues with the unionmount patches, but I
know he's got more - I just can't remember them all.

David

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and overlayfs bake off?
  2011-06-29  9:39               ` Union mount and overlayfs bake off? Ric Wheeler
@ 2011-06-29 11:40                 ` Michal Suchanek
  0 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-06-29 11:40 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: David Howells, Alexander Viro, Ian Kent, linux-fsdevel,
	linux-kernel, Jeff Moyer, miklos, Christoph Hellwig

On 29 June 2011 11:39, Ric Wheeler <ricwheeler@gmail.com> wrote:
>
> I think that it has been a while since David reposted the refreshed patch
> set for union mounts & know that overlayfs has recent updates as well.
>
> Despite that, I have not seen a lot of feedback from reviewers or testers.
>

I can only attest that I can build working live CD on top of either
overlayfs or unionmount.

This is not a good metric for reliability because issues were found
with both solutions after the live CDs were working fine already.

However, it is an usability metric in that both solutions are from the
alpha stage in which crucial features were missing or broken way into
the beta stage  in which corner case issues are discovered but the
solution is generally working.

Somebody was running racer on top of overlayfs which uncovered some
problems with overlayfs itself and linux vfs in general. Unionmount
has supposedly its own testsuite but I have never seen that.

There are some issues with neither creating full emulation of a
traditional filesystem and the cost of hiding more effects of the
union from userspace to make the filesystem look more traditional.

With something touching VFS, however, it is important to decide how
(and if) these corner case issues should be resolved because they
could possibly affect unrelated parts of the system. Reviewing all
raised concerns is unfortunately taking quite some time, and then
addressing them in code will probably require more.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and overlayfs bake off?
  2011-06-29 10:17               ` David Howells
@ 2011-06-30 12:44                 ` Miklos Szeredi
  2011-07-10  8:28                 ` Union mount and lockdep design issues Ric Wheeler
  1 sibling, 0 replies; 81+ messages in thread
From: Miklos Szeredi @ 2011-06-30 12:44 UTC (permalink / raw)
  To: David Howells
  Cc: Ric Wheeler, Michal Suchanek, Alexander Viro, Ian Kent,
	linux-fsdevel, linux-kernel, Jeff Moyer, Christoph Hellwig

David Howells <dhowells@redhat.com> writes:

> I've been dealing with some of Al's issues with the unionmount
> patches, but I know he's got more - I just can't remember them all.

A couple of questions I have:

1) What happens to the union in a cloned namespace (CLONE_NEWNS)?

2) What's the overhead for non-unioned filesystems if CONFIG_UNION_MOUNTS
is enabled?  Does it show up in any microbenchmarks?

3) Is there a future strategy for making atomic operations really atomic?
E.g. what happens if power is lost in the middle of a copy-up?  Or if
whiteout of source fails after a successful rename()?

4) Have you looked at overlayfs?  Do you have any thoughts about the
relative merrits of each solution?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Union mount and lockdep design issues
  2011-06-29 10:17               ` David Howells
  2011-06-30 12:44                 ` Miklos Szeredi
@ 2011-07-10  8:28                 ` Ric Wheeler
  2011-07-10 13:48                   ` Peter Zijlstra
  2011-07-11 11:01                   ` David Howells
  1 sibling, 2 replies; 81+ messages in thread
From: Ric Wheeler @ 2011-07-10  8:28 UTC (permalink / raw)
  To: David Howells, Alexander Viro, Christoph Hellwig, Peter Zijlstra,
	Ingo Molnar
  Cc: Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel,
	Jeff Moyer, miklos

On 06/29/2011 11:17 AM, David Howells wrote:
> Ric Wheeler<ricwheeler@gmail.com>  wrote:
>
>> I think that it has been a while since David reposted the refreshed patch set
>> for union mounts&  know that overlayfs has recent updates as well.
>>
>> Despite that, I have not seen a lot of feedback from reviewers or testers.
> The main problem I've got is that it causes lockdep to generate warnings when
> the top layer and one of the lower layers are of the same filesystem type.  The
> obvious way round this is to give each superblock its own i_mutex lock class
> rather than putting this in the file_system_type struct, but I'm not sure of
> the consequences (the two obvious problems are superblock transience and the
> fact that there may be so many more of them that it may explode lockdep).
>
> I've split out some of the VFS patches that we might be interested in taking
> upstream anyway.  They're currently sat on Al's plate for his consideration.
>
> I've been dealing with some of Al's issues with the unionmount patches, but I
> know he's got more - I just can't remember them all.
>
> David

After sitting down in person to dive into the lockdep issues with David over 
some very nice food (thanks David!), it does seem that this is really more of a 
lockdep issue and the way it is designed than a union mount issue.

Peter, Ingo, are either of you the right people to think about how to fix 
lockdep to handle stacked components (like ext4 used in union mounts stacked on 
top of another ext4 fs) where both layers will routinely lock to the same object?

Should we do a specific hack to work around this for union mounts or look for 
lockdep to change?

Thanks!

Ric



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-10  8:28                 ` Union mount and lockdep design issues Ric Wheeler
@ 2011-07-10 13:48                   ` Peter Zijlstra
  2011-07-11  8:35                     ` Michal Suchanek
  2011-07-11 11:01                   ` David Howells
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2011-07-10 13:48 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: David Howells, Alexander Viro, Christoph Hellwig, Ingo Molnar,
	Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel,
	Jeff Moyer, miklos

On Sun, 2011-07-10 at 09:28 +0100, Ric Wheeler wrote:
> On 06/29/2011 11:17 AM, David Howells wrote:
> > Ric Wheeler<ricwheeler@gmail.com>  wrote:
> >
> >> I think that it has been a while since David reposted the refreshed patch set
> >> for union mounts&  know that overlayfs has recent updates as well.
> >>
> >> Despite that, I have not seen a lot of feedback from reviewers or testers.

> > The main problem I've got is that it causes lockdep to generate warnings when
> > the top layer and one of the lower layers are of the same filesystem type.  The
> > obvious way round this is to give each superblock its own i_mutex lock class
> > rather than putting this in the file_system_type struct, but I'm not sure of
> > the consequences (the two obvious problems are superblock transience and the
> > fact that there may be so many more of them that it may explode lockdep).

Can't, that would involve classes in dynamically allocated memory (as
you cannot a-priori determine how many instances there will be of a
particular sb). There a number of good (although at times rather
frustrating) reasons why lockdep cannot do dynamic memory.

Most of those arguments center around things like: allocating memory
involves locks, therefore we could end up wanting to allocate memory
while in the allocator etc.

Also, why would you want to have a class per sb-instance? From last
talking to David, he said there could only ever be 2 filesystems
involved in this, the top and bottom, and it is determined on (union)
mount time which is which.

I'm also assuming that once a filesystem is part of a union mount, it
cannot be accessed from outside of said union (can it? can the botton be
itself be the top layer of another union?)

Therefore, why can't we, on constructing the union layers, reset the
lock classes?

Also, in what state are the filesystems on construction of the union?
Are they already fully formed and populated (do inodes already exist?)

> Peter, Ingo, are either of you the right people to think about how to fix 
> lockdep to handle stacked components (like ext4 used in union mounts stacked on 
> top of another ext4 fs) where both layers will routinely lock to the same object?

We would be.

> Should we do a specific hack to work around this for union mounts or look for 
> lockdep to change?

I still don't understand the problem at all, so lets start there ;-)

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-10 13:48                   ` Peter Zijlstra
@ 2011-07-11  8:35                     ` Michal Suchanek
  0 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-07-11  8:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ric Wheeler, David Howells, Alexander Viro, Christoph Hellwig,
	Ingo Molnar, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer,
	miklos

On 10 July 2011 15:48, Peter Zijlstra <peterz@infradead.org> wrote:
> On Sun, 2011-07-10 at 09:28 +0100, Ric Wheeler wrote:
>> On 06/29/2011 11:17 AM, David Howells wrote:
>> > Ric Wheeler<ricwheeler@gmail.com>  wrote:
>> >
>> >> I think that it has been a while since David reposted the refreshed patch set
>> >> for union mounts&  know that overlayfs has recent updates as well.
>> >>
>> >> Despite that, I have not seen a lot of feedback from reviewers or testers.
>
>> > The main problem I've got is that it causes lockdep to generate warnings when
>> > the top layer and one of the lower layers are of the same filesystem type.  The
>> > obvious way round this is to give each superblock its own i_mutex lock class
>> > rather than putting this in the file_system_type struct, but I'm not sure of
>> > the consequences (the two obvious problems are superblock transience and the
>> > fact that there may be so many more of them that it may explode lockdep).
>
> Can't, that would involve classes in dynamically allocated memory (as
> you cannot a-priori determine how many instances there will be of a
> particular sb). There a number of good (although at times rather
> frustrating) reasons why lockdep cannot do dynamic memory.
>
> Most of those arguments center around things like: allocating memory
> involves locks, therefore we could end up wanting to allocate memory
> while in the allocator etc.
>
> Also, why would you want to have a class per sb-instance? From last
> talking to David, he said there could only ever be 2 filesystems
> involved in this, the top and bottom, and it is determined on (union)
> mount time which is which.
>
> I'm also assuming that once a filesystem is part of a union mount, it
> cannot be accessed from outside of said union (can it? can the botton be
> itself be the top layer of another union?)
>

It can still be accessed, there is nothing preventing access, just
like mount --bind won't prevent access to the original location.

It might be possible to block access to it but it would be another
artifical limitation.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-10  8:28                 ` Union mount and lockdep design issues Ric Wheeler
  2011-07-10 13:48                   ` Peter Zijlstra
@ 2011-07-11 11:01                   ` David Howells
  2011-07-11 12:00                     ` Peter Zijlstra
  2011-07-11 13:54                     ` David Howells
  1 sibling, 2 replies; 81+ messages in thread
From: David Howells @ 2011-07-11 11:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, Ric Wheeler, Alexander Viro, Christoph Hellwig,
	Ingo Molnar, Michal Suchanek, Ian Kent, linux-fsdevel,
	linux-kernel, Jeff Moyer, miklos

Peter Zijlstra <peterz@infradead.org> wrote:

> > > The main problem I've got is that it causes lockdep to generate warnings
> > > when the top layer and one of the lower layers are of the same
> > > filesystem type.  The obvious way round this is to give each superblock
> > > its own i_mutex lock class rather than putting this in the
> > > file_system_type struct, but I'm not sure of the consequences (the two
> > > obvious problems are superblock transience and the fact that there may
> > > be so many more of them that it may explode lockdep).
> 
> Can't, that would involve classes in dynamically allocated memory (as
> you cannot a-priori determine how many instances there will be of a
> particular sb). There a number of good (although at times rather
> frustrating) reasons why lockdep cannot do dynamic memory.

What does this mean for filesystem modules that get removed and inserted
again?  That's something I do during development rather than rebooting the
machine.

> Most of those arguments center around things like: allocating memory
> involves locks, therefore we could end up wanting to allocate memory
> while in the allocator etc.

I'm not sure what these arguments are.  Initialising the lock class doesn't
need to be done with any locks held.

I assumed the problems came from key reuse and the storage of out-of-date
keys, and an over-abundance of keys, where a lock class's key is simply the
pointer to its struct.

> Also, why would you want to have a class per sb-instance? From last
> talking to David, he said there could only ever be 2 filesystems
> involved in this, the top and bottom, and it is determined on (union)
> mount time which is which.

There can be more than 2 - one upperfs (the actual union) and many lowerfs -
though I think only one lowerfs is accessed at a time.

However, I was wondering that if in the future it could be possible to make it
possible to union over a union.  I think that conceptually it shouldn't be that
hard, but definitely lockdep presents a barrier unless the top union goes
behind the scenes of the lower union and interacts with its lowerfs's directly.

> I'm also assuming that once a filesystem is part of a union mount, it
> cannot be accessed from outside of said union (can it? can the botton be
> itself be the top layer of another union?)

Not at the moment; the hard read-only requirements on the lowerfs versus the
writeability requirements of the upperfs (you can't enter a directory that you
can't mirror up) prevent it.

However, at some point I'd be interested in trying to make it possible to union
over a writeable filesystem.  This is pretty much a requirement for unioning
over NFS (as you can't tell the server to make the volume you're mounting hard
read-only).

> Therefore, why can't we, on constructing the union layers, reset the lock
> classes?

Reset in what sense?

> Also, in what state are the filesystems on construction of the union?  Are
> they already fully formed and populated (do inodes already exist?)

The lower filesystems must be fully formed and, at present, may not be modified
whilst in the union.

The upper filesystem can be empty or filled by a previous union.  In fact,
there's nothing stopping the upper fs being an ordinary fs that's then used as
the upper layer in a union, but I'm not sure you can then access the lower
echelons as the directories don't contain fallthru entries.

David

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 11:01                   ` David Howells
@ 2011-07-11 12:00                     ` Peter Zijlstra
  2011-07-11 13:36                         ` Michal Suchanek
  2011-07-11 13:54                     ` David Howells
  1 sibling, 1 reply; 81+ messages in thread
From: Peter Zijlstra @ 2011-07-11 12:00 UTC (permalink / raw)
  To: David Howells
  Cc: Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar,
	Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel,
	Jeff Moyer, miklos

On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote:
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > > The main problem I've got is that it causes lockdep to generate warnings
> > > > when the top layer and one of the lower layers are of the same
> > > > filesystem type.  The obvious way round this is to give each superblock
> > > > its own i_mutex lock class rather than putting this in the
> > > > file_system_type struct, but I'm not sure of the consequences (the two
> > > > obvious problems are superblock transience and the fact that there may
> > > > be so many more of them that it may explode lockdep).
> > 
> > Can't, that would involve classes in dynamically allocated memory (as
> > you cannot a-priori determine how many instances there will be of a
> > particular sb). There a number of good (although at times rather
> > frustrating) reasons why lockdep cannot do dynamic memory.
> 
> What does this mean for filesystem modules that get removed and inserted
> again?  That's something I do during development rather than rebooting the
> machine.

At some point lockdep runs out of resources and you'll have to reboot.
This is true for all module muck.

> > Most of those arguments center around things like: allocating memory
> > involves locks, therefore we could end up wanting to allocate memory
> > while in the allocator etc.
> 
> I'm not sure what these arguments are.  Initialising the lock class doesn't
> need to be done with any locks held.

Correct, however we require all class keys to be of static storage,
using dynamic storage for keys would entail having to tear down lock
relations on free.

Furthermore, once a lock class gets used it consumes resources, its
needs to get linked into dependency chains etc.. all that is at lock
usage sites.

> I assumed the problems came from key reuse and the storage of out-of-date
> keys, and an over-abundance of keys, where a lock class's key is simply the
> pointer to its struct.

Right, that too is a problem (partially already exposed by using
modules), compile time static storage isn't actually static in the face
of removable modules, and we leak resources from our static pools on
rmmod.

> > Also, why would you want to have a class per sb-instance? From last
> > talking to David, he said there could only ever be 2 filesystems
> > involved in this, the top and bottom, and it is determined on (union)
> > mount time which is which.
> 
> There can be more than 2 - one upperfs (the actual union) and many lowerfs -
> though I think only one lowerfs is accessed at a time.

Right, however I understood from our earlier discussion that the vfs
would only ever try to lock 2 filesystems at a time, the top and one
lower.

> However, I was wondering that if in the future it could be possible to make it
> possible to union over a union.  I think that conceptually it shouldn't be that
> hard, but definitely lockdep presents a barrier unless the top union goes
> behind the scenes of the lower union and interacts with its lowerfs's directly.

Aside from lockdep, how many fs locks will you nest and how will you
enforce the filesystem relations remain a DAG? But yeah, that'll be a
tad harder to do. One of the ways we could tackle that is create a lock
class per depth, and statically create say 16 of those, allowing for a
DAG with span of 16.

> > I'm also assuming that once a filesystem is part of a union mount, it
> > cannot be accessed from outside of said union (can it? can the botton be
> > itself be the top layer of another union?)
> 
> Not at the moment; the hard read-only requirements on the lowerfs versus the
> writeability requirements of the upperfs (you can't enter a directory that you
> can't mirror up) prevent it.
> 
> However, at some point I'd be interested in trying to make it possible to union
> over a writeable filesystem.  This is pretty much a requirement for unioning
> over NFS (as you can't tell the server to make the volume you're mounting hard
> read-only).

Right, ok, but lets try and make the current situation work first. I
understand the desire to later grow.

> > Therefore, why can't we, on constructing the union layers, reset the lock
> > classes?
> 
> Reset in what sense?

Change lock class, we can change the lock class of a lock (easier if not
held, not impossible when held, although in the latter case you'll
obviously establish a relation between the new class and possibly other
held locks).

> > Also, in what state are the filesystems on construction of the union?  Are
> > they already fully formed and populated (do inodes already exist?)
> 
> The lower filesystems must be fully formed and, at present, may not be modified
> whilst in the union.
> 
> The upper filesystem can be empty or filled by a previous union.  In fact,
> there's nothing stopping the upper fs being an ordinary fs that's then used as
> the upper layer in a union, but I'm not sure you can then access the lower
> echelons as the directories don't contain fallthru entries.

Right, so in both cases they can be fully formed, in that case we'll
need to iterate all inodes and change their lock class as well.

Going from pure vfs-ignorance, something like the below (does the inode
locks only, are the sb locks also required?), where the union-mount will
call reclassify_fs() and provide the correct fs depth (max 1 for now, to
be extended later when you do the full DAG thing).


---
 fs/inode.c         |   43 +++++++++++++++++++++++++++++++++++--------
 fs/super.c         |    7 +++++++
 include/linux/fs.h |   22 ++++++++++++++++++----
 3 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 43566d1..dfb5b11 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -171,13 +171,13 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	if (security_inode_alloc(inode))
 		goto out;
 	spin_lock_init(&inode->i_lock);
-	lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key);
+	lockdep_set_class(&inode->i_lock, sb->i_lock_key);
 
 	mutex_init(&inode->i_mutex);
-	lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
+	lockdep_set_class(&inode->i_mutex, sb->i_mutex_key);
 
 	init_rwsem(&inode->i_alloc_sem);
-	lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
+	lockdep_set_class(&inode->i_alloc_sem, sb->i_alloc_sem_key);
 
 	mapping->a_ops = &empty_aops;
 	mapping->host = inode;
@@ -882,18 +882,16 @@ void unlock_new_inode(struct inode *inode)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	if (S_ISDIR(inode->i_mode)) {
-		struct file_system_type *type = inode->i_sb->s_type;
+		struct super_block *sb = inode->i_sb;
 
 		/* Set new key only if filesystem hasn't already changed it */
-		if (!lockdep_match_class(&inode->i_mutex,
-		    &type->i_mutex_key)) {
+		if (!lockdep_match_class(&inode->i_mutex, sb->i_mutex_key)) {
 			/*
 			 * ensure nobody is actually holding i_mutex
 			 */
 			mutex_destroy(&inode->i_mutex);
 			mutex_init(&inode->i_mutex);
-			lockdep_set_class(&inode->i_mutex,
-					  &type->i_mutex_dir_key);
+			lockdep_set_class(&inode->i_mutex, sb->i_mutex_dir_key);
 		}
 	}
 #endif
@@ -905,6 +903,35 @@ void unlock_new_inode(struct inode *inode)
 }
 EXPORT_SYMBOL(unlock_new_inode);
 
+void reclassify_sb(struct super_block *sb, int level)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct file_system_type *type = sb->s_type;
+	struct inode *inode;
+
+	WARN_ON_ONCE(level >= LOCKDEP_FS_KEYS);
+
+	s->i_lock_key = &type->i_lock_key[level];
+	s->i_mutex_key = &type->i_mutex_key[level];
+	s->i_mutex_dir_key = &type->i_mutex_dir_key[level];
+	s->i_alloc_sem_key = &type->i_alloc_sem_key[level];
+
+	spin_lock(&inode_sb_list_lock);
+	list_for_each_entry(inode, sb->s_inodes, i_sb_list) {
+		// XXX is the inode unused and unlocked ?!
+		
+		lockdep_set_class(&inode->i_lock, sb->i_lock_key);
+		if (S_ISDIR(inode->i_mode))
+			lockdep_set_class(&inode->i_mutex, sb->i_mutex_dir_key);
+		else
+			lockdep_set_class(&inode->i_mutex, sb->i_mutex_key);
+		lockdep_set_class(&inode->i_alloc_sem, sb->i_alloc_sem_key);
+	}
+	spin_unlock(&inode_sb_list_lock);
+#endif
+}
+EXPORT_SYMBOL(reclassify_sb);
+
 /**
  * iget5_locked - obtain an inode from a mounted file system
  * @sb:		super block of file system
diff --git a/fs/super.c b/fs/super.c
index ab3d672..dca208b 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -114,6 +114,13 @@ static struct super_block *alloc_super(struct file_system_type *type)
 		s->s_op = &default_op;
 		s->s_time_gran = 1000000000;
 		s->cleancache_poolid = -1;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+		s->i_lock_key = &type->i_lock_key[0];
+		s->i_mutex_key = &type->i_mutex_key[0];
+		s->i_mutex_dir_key = &type->i_mutex_dir_key[0];
+		s->i_alloc_sem_key = &type->i_alloc_sem_key[0];
+#endif
 	}
 out:
 	return s;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b5b9792..ef43ac3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1438,6 +1438,13 @@ struct super_block {
 	 * Saved pool identifier for cleancache (-1 means none)
 	 */
 	int cleancache_poolid;
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+	struct lock_class_key *i_lock_key;
+	struct lock_class_key *i_mutex_key;
+	struct lock_class_key *i_mutex_dir_key;
+	struct lock_class_key *i_alloc_sem_key;
+#endif
 };
 
 extern struct timespec current_fs_time(struct super_block *sb);
@@ -1799,6 +1806,13 @@ static inline void file_accessed(struct file *file)
 int sync_inode(struct inode *inode, struct writeback_control *wbc);
 int sync_inode_metadata(struct inode *inode, int wait);
 
+/*
+ * The number of different inode lock keys, used for union mounts
+ * so that layers of the same file_system_type may still nest without
+ * making lockdep upset.
+ */
+#define LOCKDEP_FS_KEYS	2
+
 struct file_system_type {
 	const char *name;
 	int fs_flags;
@@ -1813,10 +1827,10 @@ struct file_system_type {
 	struct lock_class_key s_umount_key;
 	struct lock_class_key s_vfs_rename_key;
 
-	struct lock_class_key i_lock_key;
-	struct lock_class_key i_mutex_key;
-	struct lock_class_key i_mutex_dir_key;
-	struct lock_class_key i_alloc_sem_key;
+	struct lock_class_key i_lock_key[LOCKDEP_FS_KEYS];
+	struct lock_class_key i_mutex_key[LOCKDEP_FS_KEYS];
+	struct lock_class_key i_mutex_dir_key[LOCKDEP_FS_KEYS];
+	struct lock_class_key i_alloc_sem_key[LOCKDEP_FS_KEYS];
 };
 
 extern struct dentry *mount_ns(struct file_system_type *fs_type, int flags,


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 12:00                     ` Peter Zijlstra
@ 2011-07-11 13:36                         ` Michal Suchanek
  0 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-07-11 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig,
	Ingo Molnar, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer,
	miklos

On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote:
>> Peter Zijlstra <peterz@infradead.org> wrote:
>>

>> > Also, why would you want to have a class per sb-instance? From last
>> > talking to David, he said there could only ever be 2 filesystems
>> > involved in this, the top and bottom, and it is determined on (union)
>> > mount time which is which.
>>
>> There can be more than 2 - one upperfs (the actual union) and many lowerfs -
>> though I think only one lowerfs is accessed at a time.
>
> Right, however I understood from our earlier discussion that the vfs
> would only ever try to lock 2 filesystems at a time, the top and one
> lower.

This is true from local point of view. However, it is technically
possible to use overlayfs as the upper layer of another overlayfs
which allows layering multiple readonly "branches" into a single
overlay. Since the vfs will lock the "union" and one (or possibly
both) of its branches and one of the branches may be itself an union
you can get arbitrary depth (which is currently limited by a constant
in the code to cut recursion depth and stack usage).

>
>> However, I was wondering that if in the future it could be possible to make it
>> possible to union over a union.  I think that conceptually it shouldn't be that
>> hard, but definitely lockdep presents a barrier unless the top union goes
>> behind the scenes of the lower union and interacts with its lowerfs's directly.
>
> Aside from lockdep, how many fs locks will you nest and how will you
> enforce the filesystem relations remain a DAG? But yeah, that'll be a
> tad harder to do. One of the ways we could tackle that is create a lock
> class per depth, and statically create say 16 of those, allowing for a
> DAG with span of 16.

This would be consistent with the limit on nesting imposed by stack
size but there should be probably some mechanism to infer one of the
numbers from the other.

>
>> > I'm also assuming that once a filesystem is part of a union mount, it
>> > cannot be accessed from outside of said union (can it? can the botton be
>> > itself be the top layer of another union?)
>>
>> Not at the moment; the hard read-only requirements on the lowerfs versus the
>> writeability requirements of the upperfs (you can't enter a directory that you
>> can't mirror up) prevent it.
>>
>> However, at some point I'd be interested in trying to make it possible to union
>> over a writeable filesystem.  This is pretty much a requirement for unioning
>> over NFS (as you can't tell the server to make the volume you're mounting hard
>> read-only).

I don't think that there is a hard readonly requirement. As far s a I
understand the current status is that "The filesystem should not be
modified directly" and "doing so will lead to undefined behaviour but
no crash or lockup". Unless there are bugs, obviously.

>> > Also, in what state are the filesystems on construction of the union?  Are
>> > they already fully formed and populated (do inodes already exist?)
>>
>> The lower filesystems must be fully formed and, at present, may not be modified
>> whilst in the union.
>>
>> The upper filesystem can be empty or filled by a previous union.  In fact,
>> there's nothing stopping the upper fs being an ordinary fs that's then used as
>> the upper layer in a union, but I'm not sure you can then access the lower
>> echelons as the directories don't contain fallthru entries.

As overlayfs does not have explicit fallthru entries layering any two
fully formed filesystems gives an union of the two. You will only lose
access to entries that were previously deleted in an union and have a
whiteout entry in the upper layer.

Unionmount makes any directories which were touched in an upper union
layer opaque and requires explicit fallthru entries to access the
lower layer. A normal filesystem does not have opaque directories and
allows access to the lower layer when it is used as the top layer for
the first time. Traversing the union will make it opaque, though.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
@ 2011-07-11 13:36                         ` Michal Suchanek
  0 siblings, 0 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-07-11 13:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Howells, Ric Wheeler, Alexander Viro, Christoph Hellwig,
	Ingo Molnar, Ian Kent, linux-fsdevel, linux-kernel, Jeff Moyer,
	miklos

On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote:
>> Peter Zijlstra <peterz@infradead.org> wrote:
>>

>> > Also, why would you want to have a class per sb-instance? From last
>> > talking to David, he said there could only ever be 2 filesystems
>> > involved in this, the top and bottom, and it is determined on (union)
>> > mount time which is which.
>>
>> There can be more than 2 - one upperfs (the actual union) and many lowerfs -
>> though I think only one lowerfs is accessed at a time.
>
> Right, however I understood from our earlier discussion that the vfs
> would only ever try to lock 2 filesystems at a time, the top and one
> lower.

This is true from local point of view. However, it is technically
possible to use overlayfs as the upper layer of another overlayfs
which allows layering multiple readonly "branches" into a single
overlay. Since the vfs will lock the "union" and one (or possibly
both) of its branches and one of the branches may be itself an union
you can get arbitrary depth (which is currently limited by a constant
in the code to cut recursion depth and stack usage).

>
>> However, I was wondering that if in the future it could be possible to make it
>> possible to union over a union.  I think that conceptually it shouldn't be that
>> hard, but definitely lockdep presents a barrier unless the top union goes
>> behind the scenes of the lower union and interacts with its lowerfs's directly.
>
> Aside from lockdep, how many fs locks will you nest and how will you
> enforce the filesystem relations remain a DAG? But yeah, that'll be a
> tad harder to do. One of the ways we could tackle that is create a lock
> class per depth, and statically create say 16 of those, allowing for a
> DAG with span of 16.

This would be consistent with the limit on nesting imposed by stack
size but there should be probably some mechanism to infer one of the
numbers from the other.

>
>> > I'm also assuming that once a filesystem is part of a union mount, it
>> > cannot be accessed from outside of said union (can it? can the botton be
>> > itself be the top layer of another union?)
>>
>> Not at the moment; the hard read-only requirements on the lowerfs versus the
>> writeability requirements of the upperfs (you can't enter a directory that you
>> can't mirror up) prevent it.
>>
>> However, at some point I'd be interested in trying to make it possible to union
>> over a writeable filesystem.  This is pretty much a requirement for unioning
>> over NFS (as you can't tell the server to make the volume you're mounting hard
>> read-only).

I don't think that there is a hard readonly requirement. As far s a I
understand the current status is that "The filesystem should not be
modified directly" and "doing so will lead to undefined behaviour but
no crash or lockup". Unless there are bugs, obviously.

>> > Also, in what state are the filesystems on construction of the union?  Are
>> > they already fully formed and populated (do inodes already exist?)
>>
>> The lower filesystems must be fully formed and, at present, may not be modified
>> whilst in the union.
>>
>> The upper filesystem can be empty or filled by a previous union.  In fact,
>> there's nothing stopping the upper fs being an ordinary fs that's then used as
>> the upper layer in a union, but I'm not sure you can then access the lower
>> echelons as the directories don't contain fallthru entries.

As overlayfs does not have explicit fallthru entries layering any two
fully formed filesystems gives an union of the two. You will only lose
access to entries that were previously deleted in an union and have a
whiteout entry in the upper layer.

Unionmount makes any directories which were touched in an upper union
layer opaque and requires explicit fallthru entries to access the
lower layer. A normal filesystem does not have opaque directories and
allows access to the lower layer when it is used as the top layer for
the first time. Traversing the union will make it opaque, though.

Thanks

Michal
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 13:36                         ` Michal Suchanek
  (?)
@ 2011-07-11 13:50                         ` Ian Kent
  2011-07-11 16:17                           ` Michal Suchanek
  -1 siblings, 1 reply; 81+ messages in thread
From: Ian Kent @ 2011-07-11 13:50 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro,
	Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel,
	Jeff Moyer, miklos

On Mon, 2011-07-11 at 15:36 +0200, Michal Suchanek wrote:
> On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote:
> >> Peter Zijlstra <peterz@infradead.org> wrote:
> >>
> 
> >> > Also, why would you want to have a class per sb-instance? From last
> >> > talking to David, he said there could only ever be 2 filesystems
> >> > involved in this, the top and bottom, and it is determined on (union)
> >> > mount time which is which.
> >>
> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs -
> >> though I think only one lowerfs is accessed at a time.
> >
> > Right, however I understood from our earlier discussion that the vfs
> > would only ever try to lock 2 filesystems at a time, the top and one
> > lower.
> 
> This is true from local point of view. However, it is technically
> possible to use overlayfs as the upper layer of another overlayfs
> which allows layering multiple readonly "branches" into a single
> overlay. Since the vfs will lock the "union" and one (or possibly
> both) of its branches and one of the branches may be itself an union
> you can get arbitrary depth (which is currently limited by a constant
> in the code to cut recursion depth and stack usage).

Off topic but can you elaborate on that?

Are you saying the "unioned stack" can consist of more than two file
systems and can have more than two layers and possibly a mix of multiple
read-only and read-write file systems?

Ian



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 11:01                   ` David Howells
  2011-07-11 12:00                     ` Peter Zijlstra
@ 2011-07-11 13:54                     ` David Howells
  2011-07-11 14:02                       ` Peter Zijlstra
  1 sibling, 1 reply; 81+ messages in thread
From: David Howells @ 2011-07-11 13:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dhowells, Ric Wheeler, Alexander Viro, Christoph Hellwig,
	Ingo Molnar, Michal Suchanek, Ian Kent, linux-fsdevel,
	linux-kernel, Jeff Moyer, miklos

Peter Zijlstra <peterz@infradead.org> wrote:

> At some point lockdep runs out of resources and you'll have to reboot.
> This is true for all module muck.

Fair enough; I guess for normal use, modules just don't get unloaded.

> > There can be more than 2 - one upperfs (the actual union) and many lowerfs -
> > though I think only one lowerfs is accessed at a time.
> 
> Right, however I understood from our earlier discussion that the vfs
> would only ever try to lock 2 filesystems at a time, the top and one
> lower.

That's what I meant by 'only one lowerfs is accessed at a time'.

> Aside from lockdep, how many fs locks will you nest and how will you
> enforce the filesystem relations remain a DAG?

This ought to be fine.  Unionmount requires the superblock for the upperfs be
fresh and unsullied when it gets to union it.  Furthermore, it shouldn't allow
that partition to be mounted elsewhere in a second mount whilst it is still
unioned.  Bind mounts should be okay, since they're effectively a reference on
the union we already have.

> Right, ok, but lets try and make the current situation work first. I
> understand the desire to later grow.

Indeed.

> > The upper filesystem can be empty or filled by a previous union.  In fact,
> > there's nothing stopping the upper fs being an ordinary fs that's then used
> > as the upper layer in a union, but I'm not sure you can then access the
> > lower echelons as the directories don't contain fallthru entries.
> 
> Right, so in both cases they can be fully formed, in that case we'll
> need to iterate all inodes and change their lock class as well.

I think I misunderstood you, then.  'Fully formed' in what sense?  I assumed
you meant populated on disk.

I'll post the two patches I have to deal with this.  The first propagates the
mount flags to sget() and the second makes use of MS_UNION in sget() to key the
locks appropriately.

David

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 13:54                     ` David Howells
@ 2011-07-11 14:02                       ` Peter Zijlstra
  2011-07-11 14:50                         ` [PATCH 1/2] VFS: Pass mount flags to sget() David Howells
  2011-07-11 14:50                         ` [PATCH 2/2] union-mount: Duplicate the i_{, dir_}mutex lock classes and use for upper layer David Howells
  0 siblings, 2 replies; 81+ messages in thread
From: Peter Zijlstra @ 2011-07-11 14:02 UTC (permalink / raw)
  To: David Howells
  Cc: Ric Wheeler, Alexander Viro, Christoph Hellwig, Ingo Molnar,
	Michal Suchanek, Ian Kent, linux-fsdevel, linux-kernel,
	Jeff Moyer, miklos

On Mon, 2011-07-11 at 14:54 +0100, David Howells wrote:
> 
> I think I misunderstood you, then.  'Fully formed' in what sense?  I assumed
> you meant populated on disk.

I meant, sb exists and has inodes. It would be easier to deal with if
the union mount creates all sb involved, because then we can set the
proper lock class from the start and not have to worry about existing
state.

> I'll post the two patches I have to deal with this.  The first propagates the
> mount flags to sget() and the second makes use of MS_UNION in sget() to key the
> locks appropriately.

Right, note that the patch I provided has a number of problems, for one
it assumes inodes are unused and unlocked in that reclassify loop,
furthermore even by holding inode_sb_list_lock there is a race with
inode creation because we set the lock keys before adding to the
s_inodes list.

Anyway, it does illustrate the idea and someone more versed in vfs can
make it work, I hope ;-)

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 1/2] VFS: Pass mount flags to sget()
  2011-07-11 14:02                       ` Peter Zijlstra
@ 2011-07-11 14:50                         ` David Howells
  2011-07-11 14:50                         ` [PATCH 2/2] union-mount: Duplicate the i_{, dir_}mutex lock classes and use for upper layer David Howells
  1 sibling, 0 replies; 81+ messages in thread
From: David Howells @ 2011-07-11 14:50 UTC (permalink / raw)
  To: peterz
  Cc: dhowells, aviro, hch, hramrach, ikent, linux-fsdevel,
	linux-kernel, jmoyer, miklos

Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called.  They could also be passed to the
compare function.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 drivers/mtd/mtdsuper.c |    4 +---
 fs/9p/vfs_super.c      |    4 ++--
 fs/afs/super.c         |    3 +--
 fs/btrfs/super.c       |    4 ++--
 fs/ceph/super.c        |    2 +-
 fs/cifs/cifsfs.c       |    9 ++++-----
 fs/devpts/inode.c      |    6 +++---
 fs/ecryptfs/main.c     |    3 +--
 fs/gfs2/ops_fstype.c   |    5 ++---
 fs/libfs.c             |    4 ++--
 fs/logfs/super.c       |    3 +--
 fs/nfs/super.c         |   10 +++++-----
 fs/nilfs2/super.c      |    4 ++--
 fs/proc/root.c         |    3 +--
 fs/reiserfs/procfs.c   |    2 +-
 fs/super.c             |   22 +++++++++++-----------
 fs/sysfs/mount.c       |    3 +--
 fs/ubifs/super.c       |    3 +--
 include/linux/fs.h     |    2 +-
 kernel/cgroup.c        |    2 +-
 20 files changed, 44 insertions(+), 54 deletions(-)

diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
index 16b02a1..45ade10 100644
--- a/drivers/mtd/mtdsuper.c
+++ b/drivers/mtd/mtdsuper.c
@@ -62,7 +62,7 @@ static struct dentry *mount_mtd_aux(struct file_system_type *fs_type, int flags,
 	struct super_block *sb;
 	int ret;
 
-	sb = sget(fs_type, get_sb_mtd_compare, get_sb_mtd_set, mtd);
+	sb = sget(fs_type, get_sb_mtd_compare, get_sb_mtd_set, flags, mtd);
 	if (IS_ERR(sb))
 		goto out_error;
 
@@ -73,8 +73,6 @@ static struct dentry *mount_mtd_aux(struct file_system_type *fs_type, int flags,
 	DEBUG(1, "MTDSB: New superblock for device %d (\"%s\")\n",
 	      mtd->index, mtd->name);
 
-	sb->s_flags = flags;
-
 	ret = fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
 	if (ret < 0) {
 		deactivate_locked_super(sb);
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index feef6cd..0d163c5 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -89,7 +89,7 @@ v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses,
 	if (v9ses->cache)
 		sb->s_bdi->ra_pages = (VM_MAX_READAHEAD * 1024)/PAGE_CACHE_SIZE;
 
-	sb->s_flags = flags | MS_ACTIVE | MS_DIRSYNC | MS_NOATIME;
+	sb->s_flags |= MS_ACTIVE | MS_DIRSYNC | MS_NOATIME;
 	if (!v9ses->cache)
 		sb->s_flags |= MS_SYNCHRONOUS;
 
@@ -137,7 +137,7 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags,
 		goto close_session;
 	}
 
-	sb = sget(fs_type, NULL, v9fs_set_super, v9ses);
+	sb = sget(fs_type, NULL, v9fs_set_super, flags, v9ses);
 	if (IS_ERR(sb)) {
 		retval = PTR_ERR(sb);
 		goto clunk_fid;
diff --git a/fs/afs/super.c b/fs/afs/super.c
index 356dcf0..3c859b9 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -398,7 +398,7 @@ static struct dentry *afs_mount(struct file_system_type *fs_type,
 	as->volume = vol;
 
 	/* allocate a deviceless superblock */
-	sb = sget(fs_type, afs_test_super, afs_set_super, as);
+	sb = sget(fs_type, afs_test_super, afs_set_super, flags, as);
 	if (IS_ERR(sb)) {
 		ret = PTR_ERR(sb);
 		afs_put_volume(vol);
@@ -409,7 +409,6 @@ static struct dentry *afs_mount(struct file_system_type *fs_type,
 	if (!sb->s_root) {
 		/* initial superblock/root creation */
 		_debug("create");
-		sb->s_flags = flags;
 		ret = afs_fill_super(sb, &params);
 		if (ret < 0) {
 			deactivate_locked_super(sb);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0bb4ebb..c6d6836 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -808,7 +808,8 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
 	tree_root->fs_info = fs_info;
 
 	bdev = fs_devices->latest_bdev;
-	s = sget(fs_type, btrfs_test_super, btrfs_set_super, tree_root);
+	s = sget(fs_type, btrfs_test_super, btrfs_set_super, flags | MS_NOSEC,
+		 tree_root);
 	if (IS_ERR(s))
 		goto error_s;
 
@@ -825,7 +826,6 @@ static struct dentry *btrfs_mount(struct file_system_type *fs_type, int flags,
 	} else {
 		char b[BDEVNAME_SIZE];
 
-		s->s_flags = flags | MS_NOSEC;
 		strlcpy(s->s_id, bdevname(bdev, b), sizeof(s->s_id));
 		error = btrfs_fill_super(s, fs_devices, data,
 					 flags & MS_SILENT ? 1 : 0);
diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index f2f77fd..82710f6 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -823,7 +823,7 @@ static struct dentry *ceph_mount(struct file_system_type *fs_type,
 
 	if (ceph_test_opt(fsc->client, NOSHARE))
 		compare_super = NULL;
-	sb = sget(fs_type, compare_super, ceph_set_super, fsc);
+	sb = sget(fs_type, compare_super, ceph_set_super, flags, fsc);
 	if (IS_ERR(sb)) {
 		res = ERR_CAST(sb);
 		goto out;
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 35f9154..834367e 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -679,7 +679,10 @@ cifs_do_mount(struct file_system_type *fs_type,
 	mnt_data.cifs_sb = cifs_sb;
 	mnt_data.flags = flags;
 
-	sb = sget(fs_type, cifs_match_super, cifs_set_super, &mnt_data);
+	/* BB should we make this contingent on mount parm? */
+	flags |= MS_NODIRATIME | MS_NOATIME;
+
+	sb = sget(fs_type, cifs_match_super, cifs_set_super, flags, &mnt_data);
 	if (IS_ERR(sb)) {
 		root = ERR_CAST(sb);
 		cifs_umount(cifs_sb);
@@ -690,10 +693,6 @@ cifs_do_mount(struct file_system_type *fs_type,
 		cFYI(1, "Use existing superblock");
 		cifs_umount(cifs_sb);
 	} else {
-		sb->s_flags = flags;
-		/* BB should we make this contingent on mount parm? */
-		sb->s_flags |= MS_NODIRATIME | MS_NOATIME;
-
 		rc = cifs_read_super(sb);
 		if (rc) {
 			root = ERR_PTR(rc);
diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
index 2f27e57..4154d08 100644
--- a/fs/devpts/inode.c
+++ b/fs/devpts/inode.c
@@ -369,15 +369,15 @@ static struct dentry *devpts_mount(struct file_system_type *fs_type,
 		return ERR_PTR(error);
 
 	if (opts.newinstance)
-		s = sget(fs_type, NULL, set_anon_super, NULL);
+		s = sget(fs_type, NULL, set_anon_super, flags, NULL);
 	else
-		s = sget(fs_type, compare_init_pts_sb, set_anon_super, NULL);
+		s = sget(fs_type, compare_init_pts_sb, set_anon_super, flags,
+			 NULL);
 
 	if (IS_ERR(s))
 		return ERR_CAST(s);
 
 	if (!s->s_root) {
-		s->s_flags = flags;
 		error = devpts_fill_super(s, data, flags & MS_SILENT ? 1 : 0);
 		if (error)
 			goto out_undo_sget;
diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index 9f1bb74..46ac2b2 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -489,13 +489,12 @@ static struct dentry *ecryptfs_mount(struct file_system_type *fs_type, int flags
 		goto out;
 	}
 
-	s = sget(fs_type, NULL, set_anon_super, NULL);
+	s = sget(fs_type, NULL, set_anon_super, flags, NULL);
 	if (IS_ERR(s)) {
 		rc = PTR_ERR(s);
 		goto out;
 	}
 
-	s->s_flags = flags;
 	rc = bdi_setup_and_register(&sbi->bdi, "ecryptfs", BDI_CAP_MAP_COPY);
 	if (rc)
 		goto out1;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 8ac9ae1..80f24ce 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1278,7 +1278,7 @@ static struct dentry *gfs2_mount(struct file_system_type *fs_type, int flags,
 		error = -EBUSY;
 		goto error_bdev;
 	}
-	s = sget(fs_type, test_gfs2_super, set_gfs2_super, bdev);
+	s = sget(fs_type, test_gfs2_super, set_gfs2_super, flags, bdev);
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	error = PTR_ERR(s);
 	if (IS_ERR(s))
@@ -1308,7 +1308,6 @@ static struct dentry *gfs2_mount(struct file_system_type *fs_type, int flags,
 	} else {
 		char b[BDEVNAME_SIZE];
 
-		s->s_flags = flags;
 		s->s_mode = mode;
 		strlcpy(s->s_id, bdevname(bdev, b), sizeof(s->s_id));
 		sb_set_blocksize(s, block_size(bdev));
@@ -1352,7 +1351,7 @@ static struct dentry *gfs2_mount_meta(struct file_system_type *fs_type,
 		       dev_name, error);
 		return ERR_PTR(error);
 	}
-	s = sget(&gfs2_fs_type, test_gfs2_super, set_meta_super,
+	s = sget(&gfs2_fs_type, test_gfs2_super, set_meta_super, flags,
 		 path.dentry->d_inode->i_sb->s_bdev);
 	path_put(&path);
 	if (IS_ERR(s)) {
diff --git a/fs/libfs.c b/fs/libfs.c
index c88eab5..130d4f7 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -220,15 +220,15 @@ struct dentry *mount_pseudo(struct file_system_type *fs_type, char *name,
 	const struct super_operations *ops,
 	const struct dentry_operations *dops, unsigned long magic)
 {
-	struct super_block *s = sget(fs_type, NULL, set_anon_super, NULL);
+	struct super_block *s;
 	struct dentry *dentry;
 	struct inode *root;
 	struct qstr d_name = {.name = name, .len = strlen(name)};
 
+	s = sget(fs_type, NULL, set_anon_super, MS_NOUSER, NULL);
 	if (IS_ERR(s))
 		return ERR_CAST(s);
 
-	s->s_flags = MS_NOUSER;
 	s->s_maxbytes = MAX_LFS_FILESIZE;
 	s->s_blocksize = PAGE_SIZE;
 	s->s_blocksize_bits = PAGE_SHIFT;
diff --git a/fs/logfs/super.c b/fs/logfs/super.c
index ce03a18..322888a 100644
--- a/fs/logfs/super.c
+++ b/fs/logfs/super.c
@@ -541,7 +541,7 @@ static struct dentry *logfs_get_sb_device(struct logfs_super *super,
 	log_super("LogFS: Start mount %x\n", mount_count++);
 
 	err = -EINVAL;
-	sb = sget(type, logfs_sb_test, logfs_sb_set, super);
+	sb = sget(type, logfs_sb_test, logfs_sb_set, flags | MS_NOATIME, super);
 	if (IS_ERR(sb)) {
 		super->s_devops->put_device(super);
 		kfree(super);
@@ -563,7 +563,6 @@ static struct dentry *logfs_get_sb_device(struct logfs_super *super,
 	 */
 	sb->s_maxbytes	= (1ull << 43) - 1;
 	sb->s_op	= &logfs_super_operations;
-	sb->s_flags	= flags | MS_NOATIME;
 
 	err = logfs_read_sb(sb, sb->s_flags & MS_RDONLY);
 	if (err)
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index ce40e5c..4d2ea32 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2250,7 +2250,7 @@ static struct dentry *nfs_fs_mount(struct file_system_type *fs_type,
 		compare_super = NULL;
 
 	/* Get a superblock - note that we may end up sharing one that already exists */
-	s = sget(fs_type, compare_super, nfs_set_super, &sb_mntdata);
+	s = sget(fs_type, compare_super, nfs_set_super, flags, &sb_mntdata);
 	if (IS_ERR(s)) {
 		mntroot = ERR_CAST(s);
 		goto out_err_nosb;
@@ -2362,7 +2362,7 @@ nfs_xdev_mount(struct file_system_type *fs_type, int flags,
 		compare_super = NULL;
 
 	/* Get a superblock - note that we may end up sharing one that already exists */
-	s = sget(&nfs_fs_type, compare_super, nfs_set_super, &sb_mntdata);
+	s = sget(&nfs_fs_type, compare_super, nfs_set_super, flags, &sb_mntdata);
 	if (IS_ERR(s)) {
 		error = PTR_ERR(s);
 		goto out_err_nosb;
@@ -2629,7 +2629,7 @@ nfs4_remote_mount(struct file_system_type *fs_type, int flags,
 		compare_super = NULL;
 
 	/* Get a superblock - note that we may end up sharing one that already exists */
-	s = sget(&nfs4_fs_type, compare_super, nfs_set_super, &sb_mntdata);
+	s = sget(&nfs4_fs_type, compare_super, nfs_set_super, flags, &sb_mntdata);
 	if (IS_ERR(s)) {
 		error = PTR_ERR(s);
 		goto out_free;
@@ -2923,7 +2923,7 @@ nfs4_xdev_mount(struct file_system_type *fs_type, int flags,
 		compare_super = NULL;
 
 	/* Get a superblock - note that we may end up sharing one that already exists */
-	s = sget(&nfs4_fs_type, compare_super, nfs_set_super, &sb_mntdata);
+	s = sget(&nfs4_fs_type, compare_super, nfs_set_super, flags, &sb_mntdata);
 	if (IS_ERR(s)) {
 		error = PTR_ERR(s);
 		goto out_err_nosb;
@@ -3010,7 +3010,7 @@ nfs4_remote_referral_mount(struct file_system_type *fs_type, int flags,
 		compare_super = NULL;
 
 	/* Get a superblock - note that we may end up sharing one that already exists */
-	s = sget(&nfs4_fs_type, compare_super, nfs_set_super, &sb_mntdata);
+	s = sget(&nfs4_fs_type, compare_super, nfs_set_super, flags, &sb_mntdata);
 	if (IS_ERR(s)) {
 		error = PTR_ERR(s);
 		goto out_err_nosb;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 8351c44..1a36d5d 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1290,7 +1290,8 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
 		err = -EBUSY;
 		goto failed;
 	}
-	s = sget(fs_type, nilfs_test_bdev_super, nilfs_set_bdev_super, sd.bdev);
+	s = sget(fs_type, nilfs_test_bdev_super, nilfs_set_bdev_super, flags,
+		 sd.bdev);
 	mutex_unlock(&sd.bdev->bd_fsfreeze_mutex);
 	if (IS_ERR(s)) {
 		err = PTR_ERR(s);
@@ -1303,7 +1304,6 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
 		s_new = true;
 
 		/* New superblock instance created */
-		s->s_flags = flags;
 		s->s_mode = mode;
 		strlcpy(s->s_id, bdevname(sd.bdev, b), sizeof(s->s_id));
 		sb_set_blocksize(s, block_size(sd.bdev));
diff --git a/fs/proc/root.c b/fs/proc/root.c
index d6c3b41..e393f12 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -49,12 +49,11 @@ static struct dentry *proc_mount(struct file_system_type *fs_type,
 	else
 		ns = current->nsproxy->pid_ns;
 
-	sb = sget(fs_type, proc_test_super, proc_set_super, ns);
+	sb = sget(fs_type, proc_test_super, proc_set_super, flags, ns);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
 	if (!sb->s_root) {
-		sb->s_flags = flags;
 		err = proc_fill_super(sb);
 		if (err) {
 			deactivate_locked_super(sb);
diff --git a/fs/reiserfs/procfs.c b/fs/reiserfs/procfs.c
index 7a99811..de42b03 100644
--- a/fs/reiserfs/procfs.c
+++ b/fs/reiserfs/procfs.c
@@ -404,7 +404,7 @@ static void *r_start(struct seq_file *m, loff_t * pos)
 	if (l)
 		return NULL;
 
-	if (IS_ERR(sget(&reiserfs_fs_type, test_sb, set_sb, s)))
+	if (IS_ERR(sget(&reiserfs_fs_type, test_sb, set_sb, 0, s)))
 		return NULL;
 
 	up_write(&s->s_umount);
diff --git a/fs/super.c b/fs/super.c
index ab3d672..1a36e98 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -41,11 +41,12 @@ DEFINE_SPINLOCK(sb_lock);
 /**
  *	alloc_super	-	create new superblock
  *	@type:	filesystem type superblock should belong to
+ *	@flags: the mount flags
  *
  *	Allocates and initializes a new &struct super_block.  alloc_super()
  *	returns a pointer new superblock or %NULL if allocation had failed.
  */
-static struct super_block *alloc_super(struct file_system_type *type)
+static struct super_block *alloc_super(struct file_system_type *type, int flags)
 {
 	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
 	static const struct super_operations default_op;
@@ -72,6 +73,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
 #else
 		INIT_LIST_HEAD(&s->s_files);
 #endif
+		s->s_flags = flags;
 		s->s_bdi = &default_backing_dev_info;
 		INIT_LIST_HEAD(&s->s_instances);
 		INIT_HLIST_BL_HEAD(&s->s_anon);
@@ -311,11 +313,13 @@ EXPORT_SYMBOL(generic_shutdown_super);
  *	@type:	filesystem type superblock should belong to
  *	@test:	comparison callback
  *	@set:	setup callback
+ *	@flags:	mount flags
  *	@data:	argument to each of them
  */
 struct super_block *sget(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
+			int flags,
 			void *data)
 {
 	struct super_block *s = NULL;
@@ -345,7 +349,7 @@ retry:
 	}
 	if (!s) {
 		spin_unlock(&sb_lock);
-		s = alloc_super(type);
+		s = alloc_super(type, flags);
 		if (!s)
 			return ERR_PTR(-ENOMEM);
 		goto retry;
@@ -730,13 +734,12 @@ struct dentry *mount_ns(struct file_system_type *fs_type, int flags,
 {
 	struct super_block *sb;
 
-	sb = sget(fs_type, ns_test_super, ns_set_super, data);
+	sb = sget(fs_type, ns_test_super, ns_set_super, flags, data);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 
 	if (!sb->s_root) {
 		int err;
-		sb->s_flags = flags;
 		err = fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
 		if (err) {
 			deactivate_locked_super(sb);
@@ -797,7 +800,8 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 		error = -EBUSY;
 		goto error_bdev;
 	}
-	s = sget(fs_type, test_bdev_super, set_bdev_super, bdev);
+	s = sget(fs_type, test_bdev_super, set_bdev_super, flags | MS_NOSEC,
+		 bdev);
 	mutex_unlock(&bdev->bd_fsfreeze_mutex);
 	if (IS_ERR(s))
 		goto error_s;
@@ -822,7 +826,6 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 	} else {
 		char b[BDEVNAME_SIZE];
 
-		s->s_flags = flags | MS_NOSEC;
 		s->s_mode = mode;
 		strlcpy(s->s_id, bdevname(bdev, b), sizeof(s->s_id));
 		sb_set_blocksize(s, block_size(bdev));
@@ -867,13 +870,11 @@ struct dentry *mount_nodev(struct file_system_type *fs_type,
 	int (*fill_super)(struct super_block *, void *, int))
 {
 	int error;
-	struct super_block *s = sget(fs_type, NULL, set_anon_super, NULL);
+	struct super_block *s = sget(fs_type, NULL, set_anon_super, flags, NULL);
 
 	if (IS_ERR(s))
 		return ERR_CAST(s);
 
-	s->s_flags = flags;
-
 	error = fill_super(s, data, flags & MS_SILENT ? 1 : 0);
 	if (error) {
 		deactivate_locked_super(s);
@@ -896,11 +897,10 @@ struct dentry *mount_single(struct file_system_type *fs_type,
 	struct super_block *s;
 	int error;
 
-	s = sget(fs_type, compare_single, set_anon_super, NULL);
+	s = sget(fs_type, compare_single, set_anon_super, flags, NULL);
 	if (IS_ERR(s))
 		return ERR_CAST(s);
 	if (!s->s_root) {
-		s->s_flags = flags;
 		error = fill_super(s, data, flags & MS_SILENT ? 1 : 0);
 		if (error) {
 			deactivate_locked_super(s);
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index e34f0d9..a6674da 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -118,13 +118,12 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 	for (type = KOBJ_NS_TYPE_NONE; type < KOBJ_NS_TYPES; type++)
 		info->ns[type] = kobj_ns_grab_current(type);
 
-	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
+	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, flags, info);
 	if (IS_ERR(sb) || sb->s_fs_info != info)
 		free_sysfs_super_info(info);
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 	if (!sb->s_root) {
-		sb->s_flags = flags;
 		error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
 		if (error) {
 			deactivate_locked_super(sb);
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 529be05..d628df8 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2142,7 +2142,7 @@ static struct dentry *ubifs_mount(struct file_system_type *fs_type, int flags,
 
 	dbg_gen("opened ubi%d_%d", c->vi.ubi_num, c->vi.vol_id);
 
-	sb = sget(fs_type, sb_test, sb_set, c);
+	sb = sget(fs_type, sb_test, sb_set, flags, c);
 	if (IS_ERR(sb)) {
 		err = PTR_ERR(sb);
 		kfree(c);
@@ -2159,7 +2159,6 @@ static struct dentry *ubifs_mount(struct file_system_type *fs_type, int flags,
 			goto out_deact;
 		}
 	} else {
-		sb->s_flags = flags;
 		err = ubifs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
 		if (err)
 			goto out_deact;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b5b9792..602167b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1840,7 +1840,7 @@ int set_anon_super(struct super_block *s, void *data);
 struct super_block *sget(struct file_system_type *type,
 			int (*test)(struct super_block *,void *),
 			int (*set)(struct super_block *,void *),
-			void *data);
+			int flags, void *data);
 extern struct dentry *mount_pseudo(struct file_system_type *, char *,
 	const struct super_operations *ops,
 	const struct dentry_operations *dops,
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 2731d11..916924a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1499,7 +1499,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	opts.new_root = new_root;
 
 	/* Locate an existing or new sb for this hierarchy */
-	sb = sget(fs_type, cgroup_test_super, cgroup_set_super, &opts);
+	sb = sget(fs_type, cgroup_test_super, cgroup_set_super, 0, &opts);
 	if (IS_ERR(sb)) {
 		ret = PTR_ERR(sb);
 		cgroup_drop_root(opts.new_root);


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 2/2] union-mount: Duplicate the i_{, dir_}mutex lock classes and use for upper layer
  2011-07-11 14:02                       ` Peter Zijlstra
  2011-07-11 14:50                         ` [PATCH 1/2] VFS: Pass mount flags to sget() David Howells
@ 2011-07-11 14:50                         ` David Howells
  1 sibling, 0 replies; 81+ messages in thread
From: David Howells @ 2011-07-11 14:50 UTC (permalink / raw)
  To: peterz
  Cc: dhowells, aviro, hch, hramrach, ikent, linux-fsdevel,
	linux-kernel, jmoyer, miklos

Duplicate the i_mutex and i_dir_mutex lock classes and use for unionmount upper
layer superblock instead of the normal lock classes.  This solves some of the
lockdep noise when the VFS tries to hold locks on inodes in both layers at the
same time.  Note these only occur if both layers are of the same filesystem
type.

As far as I can tell, most of the lockdep warnings are false positives since
the inodes being locked are part of different superblocks; however, because
lockdep works on lock *classes*, it can't determine this.

I suspect that giving each superblock its own lock class would overextend
lockdep.
---

 fs/inode.c         |   48 ++++++++++++++++++++++++++++++++++++------------
 fs/namespace.c     |    2 +-
 fs/super.c         |    8 ++++++++
 include/linux/fs.h |    5 +++--
 4 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 43566d1..95d076d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -173,8 +173,14 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	spin_lock_init(&inode->i_lock);
 	lockdep_set_class(&inode->i_lock, &sb->s_type->i_lock_key);
 
+	/* Duplicate the code with separate indices so that when lockdep print
+	 * a warning, the numeric index is seen.
+	 */
 	mutex_init(&inode->i_mutex);
-	lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
+	if (sb->s_lock_class == 0)
+		lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key[0]);
+	else
+		lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key[1]);
 
 	init_rwsem(&inode->i_alloc_sem);
 	lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
@@ -882,18 +888,36 @@ void unlock_new_inode(struct inode *inode)
 {
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	if (S_ISDIR(inode->i_mode)) {
-		struct file_system_type *type = inode->i_sb->s_type;
+		struct super_block *sb = inode->i_sb;
+		struct file_system_type *type = sb->s_type;
 
-		/* Set new key only if filesystem hasn't already changed it */
-		if (!lockdep_match_class(&inode->i_mutex,
-		    &type->i_mutex_key)) {
-			/*
-			 * ensure nobody is actually holding i_mutex
-			 */
-			mutex_destroy(&inode->i_mutex);
-			mutex_init(&inode->i_mutex);
-			lockdep_set_class(&inode->i_mutex,
-					  &type->i_mutex_dir_key);
+		/* Set new key only if filesystem hasn't already changed it
+		 *
+		 * Duplicate the code with separate indices so that when
+		 * lockdep print a warning, the numeric index is seen.
+		 */
+		if (sb->s_lock_class == 0) {
+			if (!lockdep_match_class(&inode->i_mutex,
+						 &type->i_mutex_key[0])) {
+				/*
+				 * ensure nobody is actually holding i_mutex
+				 */
+				mutex_destroy(&inode->i_mutex);
+				mutex_init(&inode->i_mutex);
+				lockdep_set_class(&inode->i_mutex,
+						  &type->i_mutex_dir_key[0]);
+			}
+		} else {
+			if (!lockdep_match_class(&inode->i_mutex,
+						 &type->i_mutex_key[1])) {
+				/*
+				 * ensure nobody is actually holding i_mutex
+				 */
+				mutex_destroy(&inode->i_mutex);
+				mutex_init(&inode->i_mutex);
+				lockdep_set_class(&inode->i_mutex,
+						  &type->i_mutex_dir_key[1]);
+			}
 		}
 	}
 #endif
diff --git a/fs/namespace.c b/fs/namespace.c
index 18958fd..59f0942 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2611,7 +2611,7 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
 		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
-		   MS_STRICTATIME | MS_UNION);
+		   MS_STRICTATIME);
 
 	if (flags & MS_REMOUNT)
 		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
diff --git a/fs/super.c b/fs/super.c
index 95a2ebc..cd60a34 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -74,6 +74,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags)
 		INIT_LIST_HEAD(&s->s_files);
 #endif
 		s->s_flags = flags;
+		s->s_lock_class = (flags & MS_UNION) ? 1 : 0;
 		s->s_bdi = &default_backing_dev_info;
 		INIT_LIST_HEAD(&s->s_instances);
 		INIT_HLIST_BL_HEAD(&s->s_anon);
@@ -346,6 +347,13 @@ retry:
 				deactivate_locked_super(old);
 				goto retry;
 			}
+#ifdef CONFIG_UNION_MOUNT
+			if (unlikely((old->s_flags | flags) & MS_UNION)) {
+				up_write(&old->s_umount);
+				deactivate_locked_super(old);
+				return ERR_PTR(-EINVAL);
+			}
+#endif
 			return old;
 		}
 	}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4bf3903..2fd73a9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1373,6 +1373,7 @@ struct super_block {
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
 	unsigned char		s_dirt;
 	unsigned char		s_blocksize_bits;
+	u8			s_lock_class;	/* Set of lock classes to use */
 	unsigned long		s_blocksize;
 	loff_t			s_maxbytes;	/* Max file size */
 	struct file_system_type	*s_type;
@@ -1842,8 +1843,8 @@ struct file_system_type {
 	struct lock_class_key s_vfs_rename_key;
 
 	struct lock_class_key i_lock_key;
-	struct lock_class_key i_mutex_key;
-	struct lock_class_key i_mutex_dir_key;
+	struct lock_class_key i_mutex_key[2];
+	struct lock_class_key i_mutex_dir_key[2];
 	struct lock_class_key i_alloc_sem_key;
 };
 


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 13:50                         ` Ian Kent
@ 2011-07-11 16:17                           ` Michal Suchanek
  2011-07-11 17:23                             ` Ian Kent
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-07-11 16:17 UTC (permalink / raw)
  To: Ian Kent
  Cc: Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro,
	Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel,
	Jeff Moyer, miklos

On 11 July 2011 15:50, Ian Kent <ikent@redhat.com> wrote:
> On Mon, 2011-07-11 at 15:36 +0200, Michal Suchanek wrote:
>> On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote:
>> >> Peter Zijlstra <peterz@infradead.org> wrote:
>> >>
>>
>> >> > Also, why would you want to have a class per sb-instance? From last
>> >> > talking to David, he said there could only ever be 2 filesystems
>> >> > involved in this, the top and bottom, and it is determined on (union)
>> >> > mount time which is which.
>> >>
>> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs -
>> >> though I think only one lowerfs is accessed at a time.
>> >
>> > Right, however I understood from our earlier discussion that the vfs
>> > would only ever try to lock 2 filesystems at a time, the top and one
>> > lower.
>>
>> This is true from local point of view. However, it is technically
>> possible to use overlayfs as the upper layer of another overlayfs
>> which allows layering multiple readonly "branches" into a single
>> overlay. Since the vfs will lock the "union" and one (or possibly
>> both) of its branches and one of the branches may be itself an union
>> you can get arbitrary depth (which is currently limited by a constant
>> in the code to cut recursion depth and stack usage).
>
> Off topic but can you elaborate on that?
>
> Are you saying the "unioned stack" can consist of more than two file
> systems and can have more than two layers and possibly a mix of multiple
> read-only and read-write file systems?
>

This is how requirements are described in documentation:

> The lower filesystem can be any filesystem supported by Linux and does
> not need to be writable.  The lower filesystem can even be another
> overlayfs.  The upper filesystem will normally be writable and if it
> is it must support the creation of trusted.* extended attributes, and
> must provide valid d_type in readdir responses, at least for symbolic
> links - so NFS is not suitable.

In no place it says that the lower filesystem is required to be
readonly, only that it should not be modified.


This is what the documentation gives as example:

> mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper /overlay

This is how it can be expanded:

mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay
mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay

lower1 and lower2 are readonly branches you want to union; tmpoverlay
is writable so it fulfills the requirement for upper layer of overlay.
Note that the order is backwards, lower1 appears under lower2 which is
mounted earlier if I understand the layering correctly.

Depending on the way overlayfs handles creation of trusted.* extended
attributes on itself this overlay may work or break in interesting
ways.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 16:17                           ` Michal Suchanek
@ 2011-07-11 17:23                             ` Ian Kent
  2011-07-11 18:08                               ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Ian Kent @ 2011-07-11 17:23 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro,
	Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel,
	Jeff Moyer, miklos

On Mon, 2011-07-11 at 18:17 +0200, Michal Suchanek wrote:
> On 11 July 2011 15:50, Ian Kent <ikent@redhat.com> wrote:
> > On Mon, 2011-07-11 at 15:36 +0200, Michal Suchanek wrote:
> >> On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote:
> >> > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote:
> >> >> Peter Zijlstra <peterz@infradead.org> wrote:
> >> >>
> >>
> >> >> > Also, why would you want to have a class per sb-instance? From last
> >> >> > talking to David, he said there could only ever be 2 filesystems
> >> >> > involved in this, the top and bottom, and it is determined on (union)
> >> >> > mount time which is which.
> >> >>
> >> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs -
> >> >> though I think only one lowerfs is accessed at a time.
> >> >
> >> > Right, however I understood from our earlier discussion that the vfs
> >> > would only ever try to lock 2 filesystems at a time, the top and one
> >> > lower.
> >>
> >> This is true from local point of view. However, it is technically
> >> possible to use overlayfs as the upper layer of another overlayfs
> >> which allows layering multiple readonly "branches" into a single
> >> overlay. Since the vfs will lock the "union" and one (or possibly
> >> both) of its branches and one of the branches may be itself an union
> >> you can get arbitrary depth (which is currently limited by a constant
> >> in the code to cut recursion depth and stack usage).
> >
> > Off topic but can you elaborate on that?
> >
> > Are you saying the "unioned stack" can consist of more than two file
> > systems and can have more than two layers and possibly a mix of multiple
> > read-only and read-write file systems?
> >
> 
> This is how requirements are described in documentation:
> 
> > The lower filesystem can be any filesystem supported by Linux and does
> > not need to be writable.  The lower filesystem can even be another
> > overlayfs.  The upper filesystem will normally be writable and if it
> > is it must support the creation of trusted.* extended attributes, and
> > must provide valid d_type in readdir responses, at least for symbolic
> > links - so NFS is not suitable.
> 
> In no place it says that the lower filesystem is required to be
> readonly, only that it should not be modified.
> 
> 
> This is what the documentation gives as example:
> 
> > mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper /overlay
> 
> This is how it can be expanded:
> 
> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay
> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay

OK, I'll have to think about what this means but I suspect that it is
broken. I'll have a look at the overlayfs code and see if there are
globally enforced ordering of stacked file systems. If there is none
then I believe overlayfs is probably open to AB <-> BA deadlock due to
the possibility of locking two file systems in one overlayfs stack in
one order and the same two file systems in the opposite order in
another.

I don't remember seeing any unioning file system that checks and
enforces this type of global ordering, although I think the special case
checks of union mount pretty much cover it, AFAICT. Its been a while
since I looked at the code for any of the unioning file systems so I may
be wrong.

Assuming I am correct though, that then defines restrictions on what
should (or can) be aloud from a lockdep POV.

Ian



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 17:23                             ` Ian Kent
@ 2011-07-11 18:08                               ` Michal Suchanek
  2011-07-12  8:30                                 ` Miklos Szeredi
  2011-07-13 12:02                                   ` David Howells
  0 siblings, 2 replies; 81+ messages in thread
From: Michal Suchanek @ 2011-07-11 18:08 UTC (permalink / raw)
  To: Ian Kent
  Cc: Peter Zijlstra, David Howells, Ric Wheeler, Alexander Viro,
	Christoph Hellwig, Ingo Molnar, linux-fsdevel, linux-kernel,
	Jeff Moyer, miklos

On 11 July 2011 19:23, Ian Kent <ikent@redhat.com> wrote:
> On Mon, 2011-07-11 at 18:17 +0200, Michal Suchanek wrote:
>> On 11 July 2011 15:50, Ian Kent <ikent@redhat.com> wrote:
>> > On Mon, 2011-07-11 at 15:36 +0200, Michal Suchanek wrote:
>> >> On 11 July 2011 14:00, Peter Zijlstra <peterz@infradead.org> wrote:
>> >> > On Mon, 2011-07-11 at 12:01 +0100, David Howells wrote:
>> >> >> Peter Zijlstra <peterz@infradead.org> wrote:
>> >> >>
>> >>
>> >> >> > Also, why would you want to have a class per sb-instance? From last
>> >> >> > talking to David, he said there could only ever be 2 filesystems
>> >> >> > involved in this, the top and bottom, and it is determined on (union)
>> >> >> > mount time which is which.
>> >> >>
>> >> >> There can be more than 2 - one upperfs (the actual union) and many lowerfs -
>> >> >> though I think only one lowerfs is accessed at a time.
>> >> >
>> >> > Right, however I understood from our earlier discussion that the vfs
>> >> > would only ever try to lock 2 filesystems at a time, the top and one
>> >> > lower.
>> >>
>> >> This is true from local point of view. However, it is technically
>> >> possible to use overlayfs as the upper layer of another overlayfs
>> >> which allows layering multiple readonly "branches" into a single
>> >> overlay. Since the vfs will lock the "union" and one (or possibly
>> >> both) of its branches and one of the branches may be itself an union
>> >> you can get arbitrary depth (which is currently limited by a constant
>> >> in the code to cut recursion depth and stack usage).
>> >
>> > Off topic but can you elaborate on that?
>> >
>> > Are you saying the "unioned stack" can consist of more than two file
>> > systems and can have more than two layers and possibly a mix of multiple
>> > read-only and read-write file systems?
>> >
>>
>> This is how requirements are described in documentation:
>>
>> > The lower filesystem can be any filesystem supported by Linux and does
>> > not need to be writable.  The lower filesystem can even be another
>> > overlayfs.  The upper filesystem will normally be writable and if it
>> > is it must support the creation of trusted.* extended attributes, and
>> > must provide valid d_type in readdir responses, at least for symbolic
>> > links - so NFS is not suitable.
>>
>> In no place it says that the lower filesystem is required to be
>> readonly, only that it should not be modified.
>>
>>
>> This is what the documentation gives as example:
>>
>> > mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper /overlay
>>
>> This is how it can be expanded:
>>
>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay
>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay
>
> OK, I'll have to think about what this means but I suspect that it is
> broken. I'll have a look at the overlayfs code and see if there are
> globally enforced ordering of stacked file systems. If there is none
> then I believe overlayfs is probably open to AB <-> BA deadlock due to
> the possibility of locking two file systems in one overlayfs stack in
> one order and the same two file systems in the opposite order in
> another.

I think this is fine as long as the same layer does not appear in two
different unions.

The locking order is likely determined by the structure of the union
and not some system-wide order of filesystems so assuming the readonly
layers are locked as well you will probably get a deadlock with
technically correct mount:

mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay
mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay

mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2
mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2

because now lower1 and lower2 are differently ordered in the two overlays.

System-wide locking order and some optimizations are reasonably
possible only when the mount is actually aware that it has multiple
branches like

mount -t overlayfs overlayfs
-olowerdirs=/lower1:/lower2,upperdir=/upper3 /not-possible-overlay

Note also that there is no guarantee that /lower1 and /lower2 are in
any way distinct or don't have intermingled hardlinks or symlinks.


Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 18:08                               ` Michal Suchanek
@ 2011-07-12  8:30                                 ` Miklos Szeredi
  2011-07-12  9:58                                   ` Michal Suchanek
  2011-07-13 12:02                                   ` David Howells
  1 sibling, 1 reply; 81+ messages in thread
From: Miklos Szeredi @ 2011-07-12  8:30 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

Michal Suchanek <hramrach@centrum.cz> writes:

> The locking order is likely determined by the structure of the union
> and not some system-wide order of filesystems so assuming the readonly
> layers are locked as well you will probably get a deadlock with
> technically correct mount:
>
> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay
> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay
>
> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2
> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2
>
> because now lower1 and lower2 are differently ordered in the two
> overlays.

Overlayfs never locks both upper and lower at the same time, which means
there's no AB-BA locking dependency.  The lock orderings are:

-> /overlay
  -> /lower1
  -> /tmpoverlay
    -> /lower2
    -> /upper
-> /overlay2
  -> /lower2
  -> /tmpoverlay2
    -> /lower1
    -> /upper2

As you can see there's no nesting of lower2 and lower1 into each other.

When you combine two filesystems, a completely new ordering is created
each time, there's no possibility to make an AB-BA nesting.  At least I
cannot see one.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-12  8:30                                 ` Miklos Szeredi
@ 2011-07-12  9:58                                   ` Michal Suchanek
  2011-07-12 11:45                                     ` Miklos Szeredi
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-07-12  9:58 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

On 12 July 2011 10:30, Miklos Szeredi <miklos@szeredi.hu> wrote:
> Michal Suchanek <hramrach@centrum.cz> writes:
>
>> The locking order is likely determined by the structure of the union
>> and not some system-wide order of filesystems so assuming the readonly
>> layers are locked as well you will probably get a deadlock with
>> technically correct mount:
>>
>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay
>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay
>>
>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2
>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2
>>
>> because now lower1 and lower2 are differently ordered in the two
>> overlays.
>
> Overlayfs never locks both upper and lower at the same time, which means
> there's no AB-BA locking dependency.  The lock orderings are:
>
> -> /overlay
>  -> /lower1
>  -> /tmpoverlay
>    -> /lower2
>    -> /upper
> -> /overlay2
>  -> /lower2
>  -> /tmpoverlay2
>    -> /lower1
>    -> /upper2
>
> As you can see there's no nesting of lower2 and lower1 into each other.
>
> When you combine two filesystems, a completely new ordering is created
> each time, there's no possibility to make an AB-BA nesting.  At least I
> cannot see one.

Except you can get in situation where overlay locks lower1 and
tmpoverlay waits for lower2 which is held by overlay2 waiting for
lower1 in tmpoverlay2.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-12  9:58                                   ` Michal Suchanek
@ 2011-07-12 11:45                                     ` Miklos Szeredi
  2011-07-12 18:49                                       ` Michal Suchanek
  0 siblings, 1 reply; 81+ messages in thread
From: Miklos Szeredi @ 2011-07-12 11:45 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

Michal Suchanek <hramrach@centrum.cz> writes:

> On 12 July 2011 10:30, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> Michal Suchanek <hramrach@centrum.cz> writes:
>>
>>> The locking order is likely determined by the structure of the union
>>> and not some system-wide order of filesystems so assuming the readonly
>>> layers are locked as well you will probably get a deadlock with
>>> technically correct mount:
>>>
>>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay
>>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay
>>>
>>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2
>>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2
>>>
>>> because now lower1 and lower2 are differently ordered in the two
>>> overlays.
>>
>> Overlayfs never locks both upper and lower at the same time, which means
>> there's no AB-BA locking dependency.  The lock orderings are:
>>
>> -> /overlay
>>  -> /lower1
>>  -> /tmpoverlay
>>    -> /lower2
>>    -> /upper
>> -> /overlay2
>>  -> /lower2
>>  -> /tmpoverlay2
>>    -> /lower1
>>    -> /upper2
>>
>> As you can see there's no nesting of lower2 and lower1 into each other.
>>
>> When you combine two filesystems, a completely new ordering is created
>> each time, there's no possibility to make an AB-BA nesting.  At least I
>> cannot see one.
>
> Except you can get in situation where overlay locks lower1 and
> tmpoverlay waits for lower2

Note: tmpoverlay lock does *not* nest into lower1 lock, they are both on
the same nesting level.  There's no dependency between the two.

>  which is held by overlay2 waiting for
> lower1 in tmpoverlay2.

No deadlock there.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-12 11:45                                     ` Miklos Szeredi
@ 2011-07-12 18:49                                       ` Michal Suchanek
  2011-07-13  9:49                                         ` Miklos Szeredi
  0 siblings, 1 reply; 81+ messages in thread
From: Michal Suchanek @ 2011-07-12 18:49 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

On 12 July 2011 13:45, Miklos Szeredi <miklos@szeredi.hu> wrote:
> Michal Suchanek <hramrach@centrum.cz> writes:
>
>> On 12 July 2011 10:30, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>> Michal Suchanek <hramrach@centrum.cz> writes:
>>>
>>>> The locking order is likely determined by the structure of the union
>>>> and not some system-wide order of filesystems so assuming the readonly
>>>> layers are locked as well you will probably get a deadlock with
>>>> technically correct mount:
>>>>
>>>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/upper /tmpoverlay
>>>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/tmpoverlay /overlay
>>>>
>>>> mount -t overlayfs overlayfs -olowerdir=/lower1,upperdir=/upper2 /tmpoverlay2
>>>> mount -t overlayfs overlayfs -olowerdir=/lower2,upperdir=/tmpoverlay2 /overlay2
>>>>
>>>> because now lower1 and lower2 are differently ordered in the two
>>>> overlays.
>>>
>>> Overlayfs never locks both upper and lower at the same time, which means
>>> there's no AB-BA locking dependency.  The lock orderings are:
>>>
>>> -> /overlay
>>>  -> /lower1
>>>  -> /tmpoverlay
>>>    -> /lower2
>>>    -> /upper
>>> -> /overlay2
>>>  -> /lower2
>>>  -> /tmpoverlay2
>>>    -> /lower1
>>>    -> /upper2
>>>
>>> As you can see there's no nesting of lower2 and lower1 into each other.
>>>
>>> When you combine two filesystems, a completely new ordering is created
>>> each time, there's no possibility to make an AB-BA nesting.  At least I
>>> cannot see one.
>>
>> Except you can get in situation where overlay locks lower1 and
>> tmpoverlay waits for lower2
>
> Note: tmpoverlay lock does *not* nest into lower1 lock, they are both on
> the same nesting level.  There's no dependency between the two.
>
>>  which is held by overlay2 waiting for
>> lower1 in tmpoverlay2.
>
> No deadlock there.
>

That's nice.

You can still do

mount --bind /lower1 /lower2/lower1
mount --bind /lower2 /lower1/lower2

Which is technically not against usage guidelines, unlike mount --bind
/upper /lower1/upper

If crossing mount boundaries is forbidden try with symlinks or hardlinks.

Thanks

Michal

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-12 18:49                                       ` Michal Suchanek
@ 2011-07-13  9:49                                         ` Miklos Szeredi
  0 siblings, 0 replies; 81+ messages in thread
From: Miklos Szeredi @ 2011-07-13  9:49 UTC (permalink / raw)
  To: Michal Suchanek
  Cc: Ian Kent, Peter Zijlstra, David Howells, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

Michal Suchanek <hramrach@centrum.cz> writes:

> You can still do
>
> mount --bind /lower1 /lower2/lower1
> mount --bind /lower2 /lower1/lower2
>
> Which is technically not against usage guidelines, unlike mount --bind
> /upper /lower1/upper

Overlayfs will take the directory you specify as upperdir and lowerdir
and will clone those mounts internally.

This means that any present or future submounts are ignored and the
mount containting those directories may be moved or umounted and
overlayfs will still function.

None of the bind mounts above will have any effect on an overlayfs that
uses /lower[12] and /upper.

> If crossing mount boundaries is forbidden try with symlinks or hardlinks.

I'm not sure what you mean here.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-11 18:08                               ` Michal Suchanek
@ 2011-07-13 12:02                                   ` David Howells
  2011-07-13 12:02                                   ` David Howells
  1 sibling, 0 replies; 81+ messages in thread
From: David Howells @ 2011-07-13 12:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

Miklos Szeredi <miklos@szeredi.hu> wrote:

> Overlayfs never locks both upper and lower at the same time, which means
> there's no AB-BA locking dependency.  The lock orderings are:

What you're talking about is not analogous to the situation I'm seeing with
unionmount.

You actually have three filesystems in overlayfs.  The interaction between
overlayfs-and-upperfs and overlayfs-and-lowerfs is the equivalent in
unionmount terms to upperfs-and-lowerfs.  This is where the lockdep issue
lies.

David

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
@ 2011-07-13 12:02                                   ` David Howells
  0 siblings, 0 replies; 81+ messages in thread
From: David Howells @ 2011-07-13 12:02 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

Miklos Szeredi <miklos@szeredi.hu> wrote:

> Overlayfs never locks both upper and lower at the same time, which means
> there's no AB-BA locking dependency.  The lock orderings are:

What you're talking about is not analogous to the situation I'm seeing with
unionmount.

You actually have three filesystems in overlayfs.  The interaction between
overlayfs-and-upperfs and overlayfs-and-lowerfs is the equivalent in
unionmount terms to upperfs-and-lowerfs.  This is where the lockdep issue
lies.

David

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-13 12:02                                   ` David Howells
  (?)
@ 2011-07-13 13:20                                   ` Miklos Szeredi
  -1 siblings, 0 replies; 81+ messages in thread
From: Miklos Szeredi @ 2011-07-13 13:20 UTC (permalink / raw)
  To: David Howells
  Cc: Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

David Howells <dhowells@redhat.com> writes:

> Miklos Szeredi <miklos@szeredi.hu> wrote:
>
>> Overlayfs never locks both upper and lower at the same time, which means
>> there's no AB-BA locking dependency.  The lock orderings are:
>
> What you're talking about is not analogous to the situation I'm seeing with
> unionmount.
>
> You actually have three filesystems in overlayfs.  The interaction between
> overlayfs-and-upperfs and overlayfs-and-lowerfs is the equivalent in
> unionmount terms to upperfs-and-lowerfs.  This is where the lockdep issue
> lies.

Normally overlayfs would be immune to this, since lower/upper are
different filesystem types.  But overlayfs-over-overlayfs would see the
same lockdep warnings, right?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
  2011-07-13 12:02                                   ` David Howells
@ 2011-07-14  0:57                                     ` David Howells
  -1 siblings, 0 replies; 81+ messages in thread
From: David Howells @ 2011-07-14  0:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

Miklos Szeredi <miklos@szeredi.hu> wrote:

> Normally overlayfs would be immune to this, since lower/upper are
> different filesystem types.  But overlayfs-over-overlayfs would see the
> same lockdep warnings, right?

Yes.

David

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: Union mount and lockdep design issues
@ 2011-07-14  0:57                                     ` David Howells
  0 siblings, 0 replies; 81+ messages in thread
From: David Howells @ 2011-07-14  0:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: dhowells, Michal Suchanek, Ian Kent, Peter Zijlstra, Ric Wheeler,
	Alexander Viro, Christoph Hellwig, Ingo Molnar, linux-fsdevel,
	linux-kernel, Jeff Moyer

Miklos Szeredi <miklos@szeredi.hu> wrote:

> Normally overlayfs would be immune to this, since lower/upper are
> different filesystem types.  But overlayfs-over-overlayfs would see the
> same lockdep warnings, right?

Yes.

David

^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, other threads:[~2011-07-14  0:58 UTC | newest]

Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-12 15:00 Unionmount status? Michal Suchanek
2011-04-12 20:31 ` Ric Wheeler
2011-04-12 21:36   ` Michal Suchanek
2011-04-13 14:18     ` Jiri Kosina
2011-04-13 15:13       ` Michal Suchanek
2011-04-14  8:38         ` Miklos Szeredi
2011-04-14  9:48           ` Sedat Dilek
2011-04-14  9:58             ` Miklos Szeredi
2011-04-15 11:22           ` Michal Suchanek
2011-04-15 11:31             ` Miklos Szeredi
2011-04-15 11:51               ` Michal Suchanek
2011-04-15 12:29                 ` Miklos Szeredi
2011-04-15 12:34                   ` Michal Suchanek
2011-04-15 12:34                     ` Michal Suchanek
2011-04-15 12:48                     ` Miklos Szeredi
2011-04-15 21:48                   ` Hugh Dickins
2011-04-15 22:18                   ` Andreas Dilger
2011-04-18 13:31                     ` Michal Suchanek
2011-04-18 13:31                       ` Michal Suchanek
2011-04-19 20:04                       ` [PATCH] tmpfs: implement generic xattr support Miklos Szeredi
2011-04-20  2:18                         ` Phillip Lougher
2011-04-20  2:18                           ` Phillip Lougher
2011-04-20 13:43                           ` Miklos Szeredi
2011-04-21  6:59                             ` Michal Suchanek
2011-04-21  9:08                               ` Miklos Szeredi
2011-04-21 10:59                                 ` Michal Suchanek
2011-04-21 14:58                                   ` Jordi Pujol
2011-04-21 15:22                                     ` Michal Suchanek
2011-04-21 15:43                                       ` Michal Suchanek
2011-04-21 17:26                                         ` Miklos Szeredi
2011-04-21 19:17                                           ` Michal Suchanek
2011-04-20 16:00                         ` Serge E. Hallyn
2011-05-12  4:20                         ` Hugh Dickins
2011-05-12  7:52                           ` Michal Suchanek
2011-05-12 12:27                           ` Miklos Szeredi
2011-05-12 14:00                             ` Miklos Szeredi
2011-05-12 16:52                               ` Hugh Dickins
2011-04-18 13:34                     ` Unionmount status? Michal Suchanek
2011-04-18 13:34                       ` Michal Suchanek
2011-04-18 13:37                     ` Michal Suchanek
2011-04-18 13:37                       ` Michal Suchanek
2011-04-13 17:26     ` Ric Wheeler
2011-04-13 18:58       ` Michal Suchanek
2011-04-13 19:11         ` Ric Wheeler
2011-04-13 19:47           ` Michal Suchanek
2011-04-14  4:50             ` Ian Kent
2011-04-14  9:32               ` Michal Suchanek
2011-04-14  9:40                 ` Miklos Szeredi
2011-04-14 13:21                   ` Ric Wheeler
2011-04-14 14:54                     ` Michal Suchanek
2011-04-15 16:31                       ` Ric Wheeler
2011-04-14 19:14             ` David Howells
2011-06-29  9:39               ` Union mount and overlayfs bake off? Ric Wheeler
2011-06-29 11:40                 ` Michal Suchanek
2011-06-29 10:17               ` David Howells
2011-06-30 12:44                 ` Miklos Szeredi
2011-07-10  8:28                 ` Union mount and lockdep design issues Ric Wheeler
2011-07-10 13:48                   ` Peter Zijlstra
2011-07-11  8:35                     ` Michal Suchanek
2011-07-11 11:01                   ` David Howells
2011-07-11 12:00                     ` Peter Zijlstra
2011-07-11 13:36                       ` Michal Suchanek
2011-07-11 13:36                         ` Michal Suchanek
2011-07-11 13:50                         ` Ian Kent
2011-07-11 16:17                           ` Michal Suchanek
2011-07-11 17:23                             ` Ian Kent
2011-07-11 18:08                               ` Michal Suchanek
2011-07-12  8:30                                 ` Miklos Szeredi
2011-07-12  9:58                                   ` Michal Suchanek
2011-07-12 11:45                                     ` Miklos Szeredi
2011-07-12 18:49                                       ` Michal Suchanek
2011-07-13  9:49                                         ` Miklos Szeredi
2011-07-13 12:02                                 ` David Howells
2011-07-13 12:02                                   ` David Howells
2011-07-13 13:20                                   ` Miklos Szeredi
2011-07-14  0:57                                   ` David Howells
2011-07-14  0:57                                     ` David Howells
2011-07-11 13:54                     ` David Howells
2011-07-11 14:02                       ` Peter Zijlstra
2011-07-11 14:50                         ` [PATCH 1/2] VFS: Pass mount flags to sget() David Howells
2011-07-11 14:50                         ` [PATCH 2/2] union-mount: Duplicate the i_{, dir_}mutex lock classes and use for upper layer David Howells

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.