linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* overlayfs: allowing for changes to lowerdir
@ 2017-02-13 21:41 Josh England
  2017-02-14 14:01 ` Amir Goldstein
  2017-02-15  1:29 ` J. R. Okajima
  0 siblings, 2 replies; 15+ messages in thread
From: Josh England @ 2017-02-13 21:41 UTC (permalink / raw)
  To: linux-fsdevel

So here's the use case:  lowerdir is an NFS mounted root filesystem
(shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
for writes to happen.  This works great with the caveat being I cannot
make 'live' changes to the root filesystem, which poses the problem.
Any access to a changed file causes a 'Stale file handle' error.

With some experimenting, I've discovered that remounting the overlay
filesystem (mount -o remount / /)  registers any changes that have
been made to the lower NFS filesystem.  In addition, dumping cache
(via /proc/sys/vm/drop_caches) also makes the stale file handle errors
go away and reads pass through to the lower dir and correctly show
changes.

I'd like to make this use case feasible by allowing changes to the NFS
lowerdir to work more or less transparently.  It seems like if the
overlay did not do any caching at all, all reads would fall through to
either the upperdir ram disk or the NFS lower, which is precisely what
I want.

So, let me pose this somewhat naive question:  Would it be possible to
simply disable any cacheing performed by the overlay to force all
reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
Would this be enough to accomplish my goal of being able to change the
lowerdir of an active overlayfs?

-JE

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-13 21:41 overlayfs: allowing for changes to lowerdir Josh England
@ 2017-02-14 14:01 ` Amir Goldstein
  2017-02-14 17:14   ` Josh England
                     ` (2 more replies)
  2017-02-15  1:29 ` J. R. Okajima
  1 sibling, 3 replies; 15+ messages in thread
From: Amir Goldstein @ 2017-02-14 14:01 UTC (permalink / raw)
  To: Josh England; +Cc: linux-fsdevel, linux-unionfs, Miklos Szeredi

On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote:
> So here's the use case:  lowerdir is an NFS mounted root filesystem
> (shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
> for writes to happen.  This works great with the caveat being I cannot
> make 'live' changes to the root filesystem, which poses the problem.
> Any access to a changed file causes a 'Stale file handle' error.
>
> With some experimenting, I've discovered that remounting the overlay
> filesystem (mount -o remount / /)  registers any changes that have
> been made to the lower NFS filesystem.  In addition, dumping cache
> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors
> go away and reads pass through to the lower dir and correctly show
> changes.
>
> I'd like to make this use case feasible by allowing changes to the NFS
> lowerdir to work more or less transparently.  It seems like if the
> overlay did not do any caching at all, all reads would fall through to
> either the upperdir ram disk or the NFS lower, which is precisely what
> I want.
>
> So, let me pose this somewhat naive question:  Would it be possible to
> simply disable any cacheing performed by the overlay to force all
> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
> Would this be enough to accomplish my goal of being able to change the
> lowerdir of an active overlayfs?
>

There is no need to disable caching. There is already a mechanism
in place in VFS to revalidate inode cache entries.
NFS implements d_revalidate() and overlayfs implements d_revalidate()
by calling into the lower fs d_revalidate().

However overlayfs intentionally errors when lower entry has been modified.
(see: 7c03b5d ovl: allow distributed fs as lower layer)

You can try this (untested) patch to revert this behavior, just to see if it
works for your use case, but it won't change this fact
from Documentation/filesystems/overlayfs.txt:
" Changes to the underlying filesystems while part of a mounted overlay
filesystem are not allowed.  If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock."

Specifically, renaming directories and files in lower that were already
copied up is going to have a weird outcome.

Also, the situation with changing files in lower remote fs could be worse
than changing files on lower local fs, simply because right now, this
use case is not tested (i.e. it results in ESTALE).

I believe that fixing this use case, if at all possible, would require quite
a bit of work, a lot of documentation (about expected behavior) and
even more testing.

Amir.

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index e8ef9d1..6806ef3 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry
*dentry, unsigned int flags)

                if (d->d_flags & DCACHE_OP_REVALIDATE) {
                        ret = d->d_op->d_revalidate(d, flags);
-                       if (ret < 0)
+                       if (ret =< 0)
                                return ret;
-                       if (!ret) {
-                               if (!(flags & LOOKUP_RCU))
-                                       d_invalidate(d);
-                               return -ESTALE;
-                       }
                }
        }
-       return 1;
+       return ret;
 }

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-14 14:01 ` Amir Goldstein
@ 2017-02-14 17:14   ` Josh England
  2017-02-21 23:08   ` Josh England
  2017-03-09 10:37   ` Miklos Szeredi
  2 siblings, 0 replies; 15+ messages in thread
From: Josh England @ 2017-02-14 17:14 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-fsdevel, linux-unionfs, Miklos Szeredi

Amir,

Thanks for the patch and the clarification for which changes to the
underlying filesystem are likely to cause havoc.  I'll play with this
some to see how usable it might be.

-JE


On Tue, Feb 14, 2017 at 6:01 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote:
>> So here's the use case:  lowerdir is an NFS mounted root filesystem
>> (shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
>> for writes to happen.  This works great with the caveat being I cannot
>> make 'live' changes to the root filesystem, which poses the problem.
>> Any access to a changed file causes a 'Stale file handle' error.
>>
>> With some experimenting, I've discovered that remounting the overlay
>> filesystem (mount -o remount / /)  registers any changes that have
>> been made to the lower NFS filesystem.  In addition, dumping cache
>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors
>> go away and reads pass through to the lower dir and correctly show
>> changes.
>>
>> I'd like to make this use case feasible by allowing changes to the NFS
>> lowerdir to work more or less transparently.  It seems like if the
>> overlay did not do any caching at all, all reads would fall through to
>> either the upperdir ram disk or the NFS lower, which is precisely what
>> I want.
>>
>> So, let me pose this somewhat naive question:  Would it be possible to
>> simply disable any cacheing performed by the overlay to force all
>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
>> Would this be enough to accomplish my goal of being able to change the
>> lowerdir of an active overlayfs?
>>
>
> There is no need to disable caching. There is already a mechanism
> in place in VFS to revalidate inode cache entries.
> NFS implements d_revalidate() and overlayfs implements d_revalidate()
> by calling into the lower fs d_revalidate().
>
> However overlayfs intentionally errors when lower entry has been modified.
> (see: 7c03b5d ovl: allow distributed fs as lower layer)
>
> You can try this (untested) patch to revert this behavior, just to see if it
> works for your use case, but it won't change this fact
> from Documentation/filesystems/overlayfs.txt:
> " Changes to the underlying filesystems while part of a mounted overlay
> filesystem are not allowed.  If the underlying filesystem is changed,
> the behavior of the overlay is undefined, though it will not result in
> a crash or deadlock."
>
> Specifically, renaming directories and files in lower that were already
> copied up is going to have a weird outcome.
>
> Also, the situation with changing files in lower remote fs could be worse
> than changing files on lower local fs, simply because right now, this
> use case is not tested (i.e. it results in ESTALE).
>
> I believe that fixing this use case, if at all possible, would require quite
> a bit of work, a lot of documentation (about expected behavior) and
> even more testing.
>
> Amir.
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index e8ef9d1..6806ef3 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry
> *dentry, unsigned int flags)
>
>                 if (d->d_flags & DCACHE_OP_REVALIDATE) {
>                         ret = d->d_op->d_revalidate(d, flags);
> -                       if (ret < 0)
> +                       if (ret =< 0)
>                                 return ret;
> -                       if (!ret) {
> -                               if (!(flags & LOOKUP_RCU))
> -                                       d_invalidate(d);
> -                               return -ESTALE;
> -                       }
>                 }
>         }
> -       return 1;
> +       return ret;
>  }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-13 21:41 overlayfs: allowing for changes to lowerdir Josh England
  2017-02-14 14:01 ` Amir Goldstein
@ 2017-02-15  1:29 ` J. R. Okajima
  1 sibling, 0 replies; 15+ messages in thread
From: J. R. Okajima @ 2017-02-15  1:29 UTC (permalink / raw)
  To: Josh England; +Cc: linux-fsdevel

Josh England:
> So, let me pose this somewhat naive question:  Would it be possible to
> simply disable any cacheing performed by the overlay to force all
> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
> Would this be enough to accomplish my goal of being able to change the
> lowerdir of an active overlayfs?

For your information, aufs allows by-passing aufs, ie. the modification
on the layers or UDBA (user's direct branch access). Internally aufs
sets the inotify to the cached dir inodes and recieves the notification
when the files on the branches (layers) are modified. Recieving the
notification, aufs marks the cached inode obsolete. Next time when the
access happens, aufs_d_revalidate() checks the mark and forces VFS to to
re-lookup.
You can enable/disable this feature anytime by
"mount -o remount,udba=brabra ..."


J. R. Okajima

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-14 14:01 ` Amir Goldstein
  2017-02-14 17:14   ` Josh England
@ 2017-02-21 23:08   ` Josh England
  2017-02-22  9:00     ` Ian Kent
  2017-02-27 10:40     ` Amir Goldstein
  2017-03-09 10:37   ` Miklos Szeredi
  2 siblings, 2 replies; 15+ messages in thread
From: Josh England @ 2017-02-21 23:08 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-fsdevel, linux-unionfs, Miklos Szeredi

Amir,

After playing with it some, this patch seems to provide precisely the
behavior I need for my use case.  Do you think it makes sense to turn
this behavior into a module parameter (eg: allow_revalidate)?

-JE


On Tue, Feb 14, 2017 at 6:01 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote:
>> So here's the use case:  lowerdir is an NFS mounted root filesystem
>> (shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
>> for writes to happen.  This works great with the caveat being I cannot
>> make 'live' changes to the root filesystem, which poses the problem.
>> Any access to a changed file causes a 'Stale file handle' error.
>>
>> With some experimenting, I've discovered that remounting the overlay
>> filesystem (mount -o remount / /)  registers any changes that have
>> been made to the lower NFS filesystem.  In addition, dumping cache
>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors
>> go away and reads pass through to the lower dir and correctly show
>> changes.
>>
>> I'd like to make this use case feasible by allowing changes to the NFS
>> lowerdir to work more or less transparently.  It seems like if the
>> overlay did not do any caching at all, all reads would fall through to
>> either the upperdir ram disk or the NFS lower, which is precisely what
>> I want.
>>
>> So, let me pose this somewhat naive question:  Would it be possible to
>> simply disable any cacheing performed by the overlay to force all
>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
>> Would this be enough to accomplish my goal of being able to change the
>> lowerdir of an active overlayfs?
>>
>
> There is no need to disable caching. There is already a mechanism
> in place in VFS to revalidate inode cache entries.
> NFS implements d_revalidate() and overlayfs implements d_revalidate()
> by calling into the lower fs d_revalidate().
>
> However overlayfs intentionally errors when lower entry has been modified.
> (see: 7c03b5d ovl: allow distributed fs as lower layer)
>
> You can try this (untested) patch to revert this behavior, just to see if it
> works for your use case, but it won't change this fact
> from Documentation/filesystems/overlayfs.txt:
> " Changes to the underlying filesystems while part of a mounted overlay
> filesystem are not allowed.  If the underlying filesystem is changed,
> the behavior of the overlay is undefined, though it will not result in
> a crash or deadlock."
>
> Specifically, renaming directories and files in lower that were already
> copied up is going to have a weird outcome.
>
> Also, the situation with changing files in lower remote fs could be worse
> than changing files on lower local fs, simply because right now, this
> use case is not tested (i.e. it results in ESTALE).
>
> I believe that fixing this use case, if at all possible, would require quite
> a bit of work, a lot of documentation (about expected behavior) and
> even more testing.
>
> Amir.
>
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index e8ef9d1..6806ef3 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry
> *dentry, unsigned int flags)
>
>                 if (d->d_flags & DCACHE_OP_REVALIDATE) {
>                         ret = d->d_op->d_revalidate(d, flags);
> -                       if (ret < 0)
> +                       if (ret =< 0)
>                                 return ret;
> -                       if (!ret) {
> -                               if (!(flags & LOOKUP_RCU))
> -                                       d_invalidate(d);
> -                               return -ESTALE;
> -                       }
>                 }
>         }
> -       return 1;
> +       return ret;
>  }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-21 23:08   ` Josh England
@ 2017-02-22  9:00     ` Ian Kent
  2017-02-27 10:40     ` Amir Goldstein
  1 sibling, 0 replies; 15+ messages in thread
From: Ian Kent @ 2017-02-22  9:00 UTC (permalink / raw)
  To: Josh England, Amir Goldstein; +Cc: linux-fsdevel, linux-unionfs, Miklos Szeredi

On Tue, 2017-02-21 at 15:08 -0800, Josh England wrote:
> Amir,
> 
> After playing with it some, this patch seems to provide precisely the
> behavior I need for my use case.  Do you think it makes sense to turn
> this behavior into a module parameter (eg: allow_revalidate)?

This has been a hot topic for me recently, and not only for the overlayfs use
case.

Any case where the inode of a mount point changes (file or directory) is a
problem with current kernels.

The problem is that, if the invalidated mount point belongs to a file system
that has been propagated to other namespaces (which is mostly always the case)
the invalid dentry will be exposed in the other namespaces where it is not a
mountpoint.

I'm not sure how this can be exploited but apparently it can be by using user
namespaces.

To implement this without exposing these dentrys when they shouldn't be the VFS
would need to retain the invalidated dentrys and allow valid dentrys to be
created (leading to multiple dentrys of the same name within directories), hide
them from external VFS lookups, but be able to find them itself when it needs
them and check if they are a mount point in the current namespace, avoid further
revalidation, etc, etc, and the implications go on.

Not only would retaining these dentrys probably lead to some fairly ugly code,
locating and checking these dentrys would need to be done on very hot execution
paths in the VFS which is a challenge in itself (the current kernel mount data
structures don't lend themselves to this at all).

So yes, on the face of it, it's straight forward to get something fairly simple
that will "appear to work" but would not be anywhere near good enough for
inclusion in the mainline kernel.

> 
> -JE
> 
> 
> On Tue, Feb 14, 2017 at 6:01 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote:
> > > So here's the use case:  lowerdir is an NFS mounted root filesystem
> > > (shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
> > > for writes to happen.  This works great with the caveat being I cannot
> > > make 'live' changes to the root filesystem, which poses the problem.
> > > Any access to a changed file causes a 'Stale file handle' error.
> > > 
> > > With some experimenting, I've discovered that remounting the overlay
> > > filesystem (mount -o remount / /)  registers any changes that have
> > > been made to the lower NFS filesystem.  In addition, dumping cache
> > > (via /proc/sys/vm/drop_caches) also makes the stale file handle errors
> > > go away and reads pass through to the lower dir and correctly show
> > > changes.
> > > 
> > > I'd like to make this use case feasible by allowing changes to the NFS
> > > lowerdir to work more or less transparently.  It seems like if the
> > > overlay did not do any caching at all, all reads would fall through to
> > > either the upperdir ram disk or the NFS lower, which is precisely what
> > > I want.
> > > 
> > > So, let me pose this somewhat naive question:  Would it be possible to
> > > simply disable any cacheing performed by the overlay to force all
> > > reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
> > > Would this be enough to accomplish my goal of being able to change the
> > > lowerdir of an active overlayfs?
> > > 
> > 
> > There is no need to disable caching. There is already a mechanism
> > in place in VFS to revalidate inode cache entries.
> > NFS implements d_revalidate() and overlayfs implements d_revalidate()
> > by calling into the lower fs d_revalidate().
> > 
> > However overlayfs intentionally errors when lower entry has been modified.
> > (see: 7c03b5d ovl: allow distributed fs as lower layer)
> > 
> > You can try this (untested) patch to revert this behavior, just to see if it
> > works for your use case, but it won't change this fact
> > from Documentation/filesystems/overlayfs.txt:
> > " Changes to the underlying filesystems while part of a mounted overlay
> > filesystem are not allowed.  If the underlying filesystem is changed,
> > the behavior of the overlay is undefined, though it will not result in
> > a crash or deadlock."
> > 
> > Specifically, renaming directories and files in lower that were already
> > copied up is going to have a weird outcome.
> > 
> > Also, the situation with changing files in lower remote fs could be worse
> > than changing files on lower local fs, simply because right now, this
> > use case is not tested (i.e. it results in ESTALE).
> > 
> > I believe that fixing this use case, if at all possible, would require quite
> > a bit of work, a lot of documentation (about expected behavior) and
> > even more testing.
> > 
> > Amir.
> > 
> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index e8ef9d1..6806ef3 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
> > @@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry
> > *dentry, unsigned int flags)
> > 
> >                 if (d->d_flags & DCACHE_OP_REVALIDATE) {
> >                         ret = d->d_op->d_revalidate(d, flags);
> > -                       if (ret < 0)
> > +                       if (ret =< 0)
> >                                 return ret;
> > -                       if (!ret) {
> > -                               if (!(flags & LOOKUP_RCU))
> > -                                       d_invalidate(d);
> > -                               return -ESTALE;
> > -                       }
> >                 }
> >         }
> > -       return 1;
> > +       return ret;
> >  }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-21 23:08   ` Josh England
  2017-02-22  9:00     ` Ian Kent
@ 2017-02-27 10:40     ` Amir Goldstein
  2017-02-28 19:08       ` Josh England
  1 sibling, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2017-02-27 10:40 UTC (permalink / raw)
  To: Josh England; +Cc: linux-fsdevel, linux-unionfs, Miklos Szeredi

On Wed, Feb 22, 2017 at 1:08 AM, Josh England <jjengla@gmail.com> wrote:
> Amir,
>
> After playing with it some, this patch seems to provide precisely the
> behavior I need for my use case.  Do you think it makes sense to turn
> this behavior into a module parameter (eg: allow_revalidate)?
>

I don't know, because I don't know the reason that Miklos chose to error
on revalidate of remote lower fs.
But it would be strange to introduce a feature that changes one undefined
behavior (maybe ESTALE) with another undefined behavior.

It may be easier if you can argue for a use case which does have
defined behavior,
for example, lower fs has some directory subtrees that are not
modified via overlayfs
and only modified directry via lower fs.
I think this *may* result in defined behavior over lower remote fs,
but can't tell for sure.
Anyway, you will have to argue why such a setup is useful.


> -JE
>
>
> On Tue, Feb 14, 2017 at 6:01 AM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote:
>>> So here's the use case:  lowerdir is an NFS mounted root filesystem
>>> (shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
>>> for writes to happen.  This works great with the caveat being I cannot
>>> make 'live' changes to the root filesystem, which poses the problem.
>>> Any access to a changed file causes a 'Stale file handle' error.
>>>
>>> With some experimenting, I've discovered that remounting the overlay
>>> filesystem (mount -o remount / /)  registers any changes that have
>>> been made to the lower NFS filesystem.  In addition, dumping cache
>>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors
>>> go away and reads pass through to the lower dir and correctly show
>>> changes.
>>>
>>> I'd like to make this use case feasible by allowing changes to the NFS
>>> lowerdir to work more or less transparently.  It seems like if the
>>> overlay did not do any caching at all, all reads would fall through to
>>> either the upperdir ram disk or the NFS lower, which is precisely what
>>> I want.
>>>
>>> So, let me pose this somewhat naive question:  Would it be possible to
>>> simply disable any cacheing performed by the overlay to force all
>>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
>>> Would this be enough to accomplish my goal of being able to change the
>>> lowerdir of an active overlayfs?
>>>
>>
>> There is no need to disable caching. There is already a mechanism
>> in place in VFS to revalidate inode cache entries.
>> NFS implements d_revalidate() and overlayfs implements d_revalidate()
>> by calling into the lower fs d_revalidate().
>>
>> However overlayfs intentionally errors when lower entry has been modified.
>> (see: 7c03b5d ovl: allow distributed fs as lower layer)
>>
>> You can try this (untested) patch to revert this behavior, just to see if it
>> works for your use case, but it won't change this fact
>> from Documentation/filesystems/overlayfs.txt:
>> " Changes to the underlying filesystems while part of a mounted overlay
>> filesystem are not allowed.  If the underlying filesystem is changed,
>> the behavior of the overlay is undefined, though it will not result in
>> a crash or deadlock."
>>
>> Specifically, renaming directories and files in lower that were already
>> copied up is going to have a weird outcome.
>>
>> Also, the situation with changing files in lower remote fs could be worse
>> than changing files on lower local fs, simply because right now, this
>> use case is not tested (i.e. it results in ESTALE).
>>
>> I believe that fixing this use case, if at all possible, would require quite
>> a bit of work, a lot of documentation (about expected behavior) and
>> even more testing.
>>
>> Amir.
>>
>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>> index e8ef9d1..6806ef3 100644
>> --- a/fs/overlayfs/super.c
>> +++ b/fs/overlayfs/super.c
>> @@ -106,16 +106,11 @@ static int ovl_dentry_revalidate(struct dentry
>> *dentry, unsigned int flags)
>>
>>                 if (d->d_flags & DCACHE_OP_REVALIDATE) {
>>                         ret = d->d_op->d_revalidate(d, flags);
>> -                       if (ret < 0)
>> +                       if (ret =< 0)
>>                                 return ret;
>> -                       if (!ret) {
>> -                               if (!(flags & LOOKUP_RCU))
>> -                                       d_invalidate(d);
>> -                               return -ESTALE;
>> -                       }
>>                 }
>>         }
>> -       return 1;
>> +       return ret;
>>  }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-27 10:40     ` Amir Goldstein
@ 2017-02-28 19:08       ` Josh England
  2017-02-28 19:44         ` Al Viro
  2017-03-01 20:22         ` Colin Walters
  0 siblings, 2 replies; 15+ messages in thread
From: Josh England @ 2017-02-28 19:08 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-fsdevel, linux-unionfs, Miklos Szeredi

On Mon, Feb 27, 2017 at 2:40 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> It may be easier if you can argue for a use case which does have
> defined behavior,
> for example, lower fs has some directory subtrees that are not
> modified via overlayfs
> and only modified directry via lower fs.
> I think this *may* result in defined behavior over lower remote fs,
> but can't tell for sure.
> Anyway, you will have to argue why such a setup is useful.

My use case is actually one that involves very little (but necessary)
writes to the overlayfs, and the vast majority of changes happen
directly to the lower fs.  I maintain cluster management software that
has been used on some of the largest clusters in the world.  The
premise is to network boot and mount a shared NFS root on an
arbitrarily large number of cluster nodes.  It has always been an
annoyance to have to find the little bits and pieces in the OS (think
of a full RHEL/CentOS distro) that require the ability to write
somewhere and so we've always had to play silly tricks to get around
that.  Using a ram overlay as the upper allows these minor writes to
happen transparently and save end users massive amounts of time having
to track down and handle each one individually.

The behavior I've noted so far from this patch seems to be very well
defined and follows the rule of least surprise (at least for me).  I
can change files/directories in the shared root and the changes are
instantly visible to all clients.  If a file/directory has been copied
up on the client that change takes precedence over changes in the
lower.  Deletes work as expected.  I haven't found a downside yet.  In
fact I've bundled a release of my software that configures this mode
of operation as on option.  The given patch is included with that
release, but it would be 1000 times cleaner as a stock kernel module
parameter with no patches required.

-JE

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-28 19:08       ` Josh England
@ 2017-02-28 19:44         ` Al Viro
  2017-03-01 11:15           ` Amir Goldstein
  2017-03-01 18:22           ` Josh England
  2017-03-01 20:22         ` Colin Walters
  1 sibling, 2 replies; 15+ messages in thread
From: Al Viro @ 2017-02-28 19:44 UTC (permalink / raw)
  To: Josh England; +Cc: Amir Goldstein, linux-fsdevel, linux-unionfs, Miklos Szeredi

On Tue, Feb 28, 2017 at 11:08:31AM -0800, Josh England wrote:

> The behavior I've noted so far from this patch seems to be very well
> defined and follows the rule of least surprise (at least for me).  I
> can change files/directories in the shared root and the changes are
> instantly visible to all clients.  If a file/directory has been copied
> up on the client that change takes precedence over changes in the
> lower.  Deletes work as expected.  I haven't found a downside yet.

Try to do something that invalidates a directory in underlying layer
with stuff in overlay on its subdirectories. 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-28 19:44         ` Al Viro
@ 2017-03-01 11:15           ` Amir Goldstein
  2017-03-01 18:22           ` Josh England
  1 sibling, 0 replies; 15+ messages in thread
From: Amir Goldstein @ 2017-03-01 11:15 UTC (permalink / raw)
  To: Al Viro, Miklos Szeredi; +Cc: Josh England, linux-fsdevel, linux-unionfs

On Tue, Feb 28, 2017 at 9:44 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Feb 28, 2017 at 11:08:31AM -0800, Josh England wrote:
>
>> The behavior I've noted so far from this patch seems to be very well
>> defined and follows the rule of least surprise (at least for me).  I
>> can change files/directories in the shared root and the changes are
>> instantly visible to all clients.  If a file/directory has been copied
>> up on the client that change takes precedence over changes in the
>> lower.  Deletes work as expected.  I haven't found a downside yet.
>
> Try to do something that invalidates a directory in underlying layer
> with stuff in overlay on its subdirectories.

That's an interesting test to run, but why does ovl_dentry_revalidate()
need to return 1/-ESTALE instead of ret?
Why not propagate the return values 0 and -ECHILD from NFS?

After all, NFS lower directory inode can be invalidated after lookup
and then other overlayfs ops (e.g. readdir) could fail with -ESTALE
just the same as those ops would fail directly over NFS. No?

What am I missing?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-28 19:44         ` Al Viro
  2017-03-01 11:15           ` Amir Goldstein
@ 2017-03-01 18:22           ` Josh England
  1 sibling, 0 replies; 15+ messages in thread
From: Josh England @ 2017-03-01 18:22 UTC (permalink / raw)
  To: Al Viro; +Cc: Amir Goldstein, linux-fsdevel, linux-unionfs, Miklos Szeredi

Directory /foo/bar
Client (overlay) creates /foo/bar/biz
/foo/bar gets deleted from lower (on NFS server)
All of /foo/bar (including biz) still exists on client because it had
been copied up.

This is actually the behavior I'd expect given the nature of overlayfs.

-JE


On Tue, Feb 28, 2017 at 11:44 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Feb 28, 2017 at 11:08:31AM -0800, Josh England wrote:
>
>> The behavior I've noted so far from this patch seems to be very well
>> defined and follows the rule of least surprise (at least for me).  I
>> can change files/directories in the shared root and the changes are
>> instantly visible to all clients.  If a file/directory has been copied
>> up on the client that change takes precedence over changes in the
>> lower.  Deletes work as expected.  I haven't found a downside yet.
>
> Try to do something that invalidates a directory in underlying layer
> with stuff in overlay on its subdirectories.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-28 19:08       ` Josh England
  2017-02-28 19:44         ` Al Viro
@ 2017-03-01 20:22         ` Colin Walters
  1 sibling, 0 replies; 15+ messages in thread
From: Colin Walters @ 2017-03-01 20:22 UTC (permalink / raw)
  To: Josh England, Amir Goldstein; +Cc: linux-fsdevel, linux-unionfs, Miklos Szeredi



On Tue, Feb 28, 2017, at 02:08 PM, Josh England wrote:
>  It has always been an
> annoyance to have to find the little bits and pieces in the OS (think
> of a full RHEL/CentOS distro) that require the ability to write
> somewhere

As part of driving the [OSTree](https://ostree.readthedocs.io/en/latest/)
model into Fedora/CentOS (well really rpm-ostree, which is part of Atomic Host),
we have a read-only bind mount over /usr, and the only two writable directories are /etc and /var.

In a diskless mode, then you could have /usr as a read-only NFS/cachefiles mount, and use tmpfs
for /etc and /var.  Ideally /etc can be read-only too during operation but that is
definitely one where we hit things like LVM userspace writing there in the background:
https://bugzilla.redhat.com/show_bug.cgi?id=1366584

But if something required writing to /usr at runtime in Fedora/CentOS that should
be considered a bug.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-02-14 14:01 ` Amir Goldstein
  2017-02-14 17:14   ` Josh England
  2017-02-21 23:08   ` Josh England
@ 2017-03-09 10:37   ` Miklos Szeredi
  2017-03-09 11:22     ` Amir Goldstein
  2 siblings, 1 reply; 15+ messages in thread
From: Miklos Szeredi @ 2017-03-09 10:37 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Josh England, linux-fsdevel, linux-unionfs

On Tue, Feb 14, 2017 at 3:01 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote:
>> So here's the use case:  lowerdir is an NFS mounted root filesystem
>> (shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
>> for writes to happen.  This works great with the caveat being I cannot
>> make 'live' changes to the root filesystem, which poses the problem.
>> Any access to a changed file causes a 'Stale file handle' error.
>>
>> With some experimenting, I've discovered that remounting the overlay
>> filesystem (mount -o remount / /)  registers any changes that have
>> been made to the lower NFS filesystem.  In addition, dumping cache
>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors
>> go away and reads pass through to the lower dir and correctly show
>> changes.
>>
>> I'd like to make this use case feasible by allowing changes to the NFS
>> lowerdir to work more or less transparently.  It seems like if the
>> overlay did not do any caching at all, all reads would fall through to
>> either the upperdir ram disk or the NFS lower, which is precisely what
>> I want.
>>
>> So, let me pose this somewhat naive question:  Would it be possible to
>> simply disable any cacheing performed by the overlay to force all
>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
>> Would this be enough to accomplish my goal of being able to change the
>> lowerdir of an active overlayfs?
>>
>
> There is no need to disable caching. There is already a mechanism
> in place in VFS to revalidate inode cache entries.
> NFS implements d_revalidate() and overlayfs implements d_revalidate()
> by calling into the lower fs d_revalidate().
>
> However overlayfs intentionally errors when lower entry has been modified.
> (see: 7c03b5d ovl: allow distributed fs as lower layer)
>
> You can try this (untested) patch to revert this behavior, just to see if it
> works for your use case, but it won't change this fact
> from Documentation/filesystems/overlayfs.txt:
> " Changes to the underlying filesystems while part of a mounted overlay
> filesystem are not allowed.  If the underlying filesystem is changed,
> the behavior of the overlay is undefined, though it will not result in
> a crash or deadlock."

Best way to keep things simple is to only add functionality when
someone actually needs it (and can test it).  This has been the design
policy in overlayfs and it worked wonderfully.

So we could probably fix the undefined behavior in the above case to
some extent.

>
> Specifically, renaming directories and files in lower that were already
> copied up is going to have a weird outcome.
>
> Also, the situation with changing files in lower remote fs could be worse
> than changing files on lower local fs, simply because right now, this
> use case is not tested (i.e. it results in ESTALE).
>
> I believe that fixing this use case, if at all possible, would require quite
> a bit of work, a lot of documentation (about expected behavior) and
> even more testing.

Well, your patch seems to be safe:  if remote fs says something
changed, throw away node and subtree on the overlay level.

We could introduce the same thing for local fs.  Just need to verify
in .d_revalidate() that underlying dentry's parent and name matches
overlay dentry's parent and name.  It's an overhead, and makes no
sense in the case when we know the lower layers won't change, so it
may be best to keep this check optional.

Note, that overlay would still return ESTALE if the change on the
lower layer happens on a dentry already looked up (e.g. cwd, open
file, race of lookup with rename on underlying layer).  Same as NFS.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-03-09 10:37   ` Miklos Szeredi
@ 2017-03-09 11:22     ` Amir Goldstein
  2017-03-09 13:12       ` Miklos Szeredi
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2017-03-09 11:22 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Josh England, linux-fsdevel, linux-unionfs

On Thu, Mar 9, 2017 at 12:37 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Tue, Feb 14, 2017 at 3:01 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote:
>>> So here's the use case:  lowerdir is an NFS mounted root filesystem
>>> (shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
>>> for writes to happen.  This works great with the caveat being I cannot
>>> make 'live' changes to the root filesystem, which poses the problem.
>>> Any access to a changed file causes a 'Stale file handle' error.
>>>
>>> With some experimenting, I've discovered that remounting the overlay
>>> filesystem (mount -o remount / /)  registers any changes that have
>>> been made to the lower NFS filesystem.  In addition, dumping cache
>>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors
>>> go away and reads pass through to the lower dir and correctly show
>>> changes.
>>>
>>> I'd like to make this use case feasible by allowing changes to the NFS
>>> lowerdir to work more or less transparently.  It seems like if the
>>> overlay did not do any caching at all, all reads would fall through to
>>> either the upperdir ram disk or the NFS lower, which is precisely what
>>> I want.
>>>
>>> So, let me pose this somewhat naive question:  Would it be possible to
>>> simply disable any cacheing performed by the overlay to force all
>>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
>>> Would this be enough to accomplish my goal of being able to change the
>>> lowerdir of an active overlayfs?
>>>
>>
>> There is no need to disable caching. There is already a mechanism
>> in place in VFS to revalidate inode cache entries.
>> NFS implements d_revalidate() and overlayfs implements d_revalidate()
>> by calling into the lower fs d_revalidate().
>>
>> However overlayfs intentionally errors when lower entry has been modified.
>> (see: 7c03b5d ovl: allow distributed fs as lower layer)
>>
>> You can try this (untested) patch to revert this behavior, just to see if it
>> works for your use case, but it won't change this fact
>> from Documentation/filesystems/overlayfs.txt:
>> " Changes to the underlying filesystems while part of a mounted overlay
>> filesystem are not allowed.  If the underlying filesystem is changed,
>> the behavior of the overlay is undefined, though it will not result in
>> a crash or deadlock."
>
> Best way to keep things simple is to only add functionality when
> someone actually needs it (and can test it).  This has been the design
> policy in overlayfs and it worked wonderfully.
>
> So we could probably fix the undefined behavior in the above case to
> some extent.
>
>>
>> Specifically, renaming directories and files in lower that were already
>> copied up is going to have a weird outcome.
>>
>> Also, the situation with changing files in lower remote fs could be worse
>> than changing files on lower local fs, simply because right now, this
>> use case is not tested (i.e. it results in ESTALE).
>>
>> I believe that fixing this use case, if at all possible, would require quite
>> a bit of work, a lot of documentation (about expected behavior) and
>> even more testing.
>
> Well, your patch seems to be safe:  if remote fs says something
> changed, throw away node and subtree on the overlay level.
>
> We could introduce the same thing for local fs.  Just need to verify
> in .d_revalidate() that underlying dentry's parent and name matches
> overlay dentry's parent and name.  It's an overhead, and makes no
> sense in the case when we know the lower layers won't change, so it
> may be best to keep this check optional.
>
> Note, that overlay would still return ESTALE if the change on the
> lower layer happens on a dentry already looked up (e.g. cwd, open
> file, race of lookup with rename on underlying layer).  Same as NFS.
>

Naturally.

Could you explain what was the reason for special casing (ret == 0)
in ovl_dentry_revalidate()?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: overlayfs: allowing for changes to lowerdir
  2017-03-09 11:22     ` Amir Goldstein
@ 2017-03-09 13:12       ` Miklos Szeredi
  0 siblings, 0 replies; 15+ messages in thread
From: Miklos Szeredi @ 2017-03-09 13:12 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Josh England, linux-fsdevel, linux-unionfs

On Thu, Mar 9, 2017 at 12:22 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> On Thu, Mar 9, 2017 at 12:37 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Tue, Feb 14, 2017 at 3:01 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>>> On Mon, Feb 13, 2017 at 11:41 PM, Josh England <jjengla@gmail.com> wrote:
>>>> So here's the use case:  lowerdir is an NFS mounted root filesystem
>>>> (shared by a bunch of nodes).  upperdir is a tmpfs RAM disk to allow
>>>> for writes to happen.  This works great with the caveat being I cannot
>>>> make 'live' changes to the root filesystem, which poses the problem.
>>>> Any access to a changed file causes a 'Stale file handle' error.
>>>>
>>>> With some experimenting, I've discovered that remounting the overlay
>>>> filesystem (mount -o remount / /)  registers any changes that have
>>>> been made to the lower NFS filesystem.  In addition, dumping cache
>>>> (via /proc/sys/vm/drop_caches) also makes the stale file handle errors
>>>> go away and reads pass through to the lower dir and correctly show
>>>> changes.
>>>>
>>>> I'd like to make this use case feasible by allowing changes to the NFS
>>>> lowerdir to work more or less transparently.  It seems like if the
>>>> overlay did not do any caching at all, all reads would fall through to
>>>> either the upperdir ram disk or the NFS lower, which is precisely what
>>>> I want.
>>>>
>>>> So, let me pose this somewhat naive question:  Would it be possible to
>>>> simply disable any cacheing performed by the overlay to force all
>>>> reads to go to either the tmpfs upper or the (VFS-cached) NFS lower?
>>>> Would this be enough to accomplish my goal of being able to change the
>>>> lowerdir of an active overlayfs?
>>>>
>>>
>>> There is no need to disable caching. There is already a mechanism
>>> in place in VFS to revalidate inode cache entries.
>>> NFS implements d_revalidate() and overlayfs implements d_revalidate()
>>> by calling into the lower fs d_revalidate().
>>>
>>> However overlayfs intentionally errors when lower entry has been modified.
>>> (see: 7c03b5d ovl: allow distributed fs as lower layer)
>>>
>>> You can try this (untested) patch to revert this behavior, just to see if it
>>> works for your use case, but it won't change this fact
>>> from Documentation/filesystems/overlayfs.txt:
>>> " Changes to the underlying filesystems while part of a mounted overlay
>>> filesystem are not allowed.  If the underlying filesystem is changed,
>>> the behavior of the overlay is undefined, though it will not result in
>>> a crash or deadlock."
>>
>> Best way to keep things simple is to only add functionality when
>> someone actually needs it (and can test it).  This has been the design
>> policy in overlayfs and it worked wonderfully.
>>
>> So we could probably fix the undefined behavior in the above case to
>> some extent.
>>
>>>
>>> Specifically, renaming directories and files in lower that were already
>>> copied up is going to have a weird outcome.
>>>
>>> Also, the situation with changing files in lower remote fs could be worse
>>> than changing files on lower local fs, simply because right now, this
>>> use case is not tested (i.e. it results in ESTALE).
>>>
>>> I believe that fixing this use case, if at all possible, would require quite
>>> a bit of work, a lot of documentation (about expected behavior) and
>>> even more testing.
>>
>> Well, your patch seems to be safe:  if remote fs says something
>> changed, throw away node and subtree on the overlay level.
>>
>> We could introduce the same thing for local fs.  Just need to verify
>> in .d_revalidate() that underlying dentry's parent and name matches
>> overlay dentry's parent and name.  It's an overhead, and makes no
>> sense in the case when we know the lower layers won't change, so it
>> may be best to keep this check optional.
>>
>> Note, that overlay would still return ESTALE if the change on the
>> lower layer happens on a dentry already looked up (e.g. cwd, open
>> file, race of lookup with rename on underlying layer).  Same as NFS.
>>
>
> Naturally.
>
> Could you explain what was the reason for special casing (ret == 0)
> in ovl_dentry_revalidate()?

I think my reasoning was: if lower dentry is invalid, it means lower
was changed; this is against the rules (as per documentation) so
return error.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-03-09 13:12 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-13 21:41 overlayfs: allowing for changes to lowerdir Josh England
2017-02-14 14:01 ` Amir Goldstein
2017-02-14 17:14   ` Josh England
2017-02-21 23:08   ` Josh England
2017-02-22  9:00     ` Ian Kent
2017-02-27 10:40     ` Amir Goldstein
2017-02-28 19:08       ` Josh England
2017-02-28 19:44         ` Al Viro
2017-03-01 11:15           ` Amir Goldstein
2017-03-01 18:22           ` Josh England
2017-03-01 20:22         ` Colin Walters
2017-03-09 10:37   ` Miklos Szeredi
2017-03-09 11:22     ` Amir Goldstein
2017-03-09 13:12       ` Miklos Szeredi
2017-02-15  1:29 ` J. R. Okajima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).