All of lore.kernel.org
 help / color / mirror / Atom feed
* More fun with unmounting ESTALE directories.
@ 2013-02-12  0:38 NeilBrown
  2013-02-14 15:42 ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2013-02-12  0:38 UTC (permalink / raw)
  To: Myklebust, Trond, Alexander Viro; +Cc: NFS

[-- Attachment #1: Type: text/plain, Size: 2223 bytes --]


I've been exploring difficulties with unmounting stale directories and
discovered another bug.

If I:

SERVER:  mkdir /foo/bar  #and make sure it is exported
CLIENT:  mount -o vers=4 server:/foo/bar /mnt
SERVER:  rm -r /foo
CLIENT:  > /mnt/baz # gets an error of course
CLIENT:  ls -l /mnt # error again
CLIENT:  umount /mnt

The result of that last command is:

/mnt was not found in /proc/mounts
/mnt was not found in /proc/mounts

Strange?

cat /proc/mounts

.....
10.0.2.2://foo/bar /mnt\040(deleted) nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.15,minorversion=0,local_lock=none,addr=10.0.2.2 0 0
....

Notice the "\040(deleted)".

NFS has unhashed that directory because it is obviously bad, and d_path()
notices and adds " (deleted)".

Now I might be able to argue that NFS shouldn't be unhashing a directory that
is a mountpoint - it certainly seems strange behaviour.

But I think I can more strongly argue that /proc/mounts shouldn't be showing
the mounted directory, but instead the directory that it is mounted on.
Obviously these both have the same name so it shouldn't matter ... except
that here is a case where it does.

I "fixed" it with

--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -93,7 +93,7 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt)
 {
 	struct mount *r = real_mount(mnt);
 	int err = 0;
-	struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
+	struct path mnt_path = { .dentry = r->mnt_mountpoint, .mnt = &(r->mnt_parent)->mnt };
 	struct super_block *sb = mnt_path.dentry->d_sb;
 
 	if (sb->s_op->show_devname) {

though I suspect that isn't safe and needs some locking.

Probably both should be fixed:  NFS should not invalidate any mounted
directory, and show_vfsmnt() should report the mointpoint, not the mounted
directory.

I can't figure out any way to get NFS to not invalidate the mounted directory.
I think it happens in nfs_lookup_revalidate() when it calls d_drop(), but I
don't know how to tell if a given dentry is a mnt_root for any mountpoint.

Suggestions?  Thoughts?

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-12  0:38 More fun with unmounting ESTALE directories NeilBrown
@ 2013-02-14 15:42 ` Jeff Layton
  2013-02-18  2:25   ` NeilBrown
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2013-02-14 15:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: Myklebust, Trond, Alexander Viro, NFS

On Tue, 12 Feb 2013 11:38:13 +1100
NeilBrown <neilb@suse.de> wrote:

> 
> I've been exploring difficulties with unmounting stale directories and
> discovered another bug.
> 
> If I:
> 
> SERVER:  mkdir /foo/bar  #and make sure it is exported
> CLIENT:  mount -o vers=4 server:/foo/bar /mnt
> SERVER:  rm -r /foo
> CLIENT:  > /mnt/baz # gets an error of course
> CLIENT:  ls -l /mnt # error again
> CLIENT:  umount /mnt
> 
> The result of that last command is:
> 
> /mnt was not found in /proc/mounts
> /mnt was not found in /proc/mounts
> 
> Strange?
> 
> cat /proc/mounts
> 
> .....
> 10.0.2.2://foo/bar /mnt\040(deleted) nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.15,minorversion=0,local_lock=none,addr=10.0.2.2 0 0
> ....
> 
> Notice the "\040(deleted)".
> 
> NFS has unhashed that directory because it is obviously bad, and d_path()
> notices and adds " (deleted)".
> 
> Now I might be able to argue that NFS shouldn't be unhashing a directory that
> is a mountpoint - it certainly seems strange behaviour.
> 
> But I think I can more strongly argue that /proc/mounts shouldn't be showing
> the mounted directory, but instead the directory that it is mounted on.
> Obviously these both have the same name so it shouldn't matter ... except
> that here is a case where it does.
> 
> I "fixed" it with
> 
> --- a/fs/proc_namespace.c
> +++ b/fs/proc_namespace.c
> @@ -93,7 +93,7 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt)
>  {
>  	struct mount *r = real_mount(mnt);
>  	int err = 0;
> -	struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
> +	struct path mnt_path = { .dentry = r->mnt_mountpoint, .mnt = &(r->mnt_parent)->mnt };
>  	struct super_block *sb = mnt_path.dentry->d_sb;
>  
>  	if (sb->s_op->show_devname) {
> 
> though I suspect that isn't safe and needs some locking.
> 
> Probably both should be fixed:  NFS should not invalidate any mounted
> directory, and show_vfsmnt() should report the mointpoint, not the mounted
> directory.
> 
> I can't figure out any way to get NFS to not invalidate the mounted directory.
> I think it happens in nfs_lookup_revalidate() when it calls d_drop(), but I
> don't know how to tell if a given dentry is a mnt_root for any mountpoint.
> 
> Suggestions?  Thoughts?
> 
> Thanks,
> NeilBrown
> 

I've also been looking at some weird ESTALE problems. Here's another
fun one that doesn't involve mountpoints. Assume here that we're
working in the same exported directory on client and server:

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30  # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat ‘.’: Stale NFS file handle

Obviously, "." should not be stale. It got renamed, but the inode still
exists on the server.

If you sniff on the wire, you'll see that the server doesn't ever send
an ESTALE here. What happens is that due to FS_REVAL_DOT being set, we
end up trying to revalidate the dentry that "." refers to. We find that
the parent changed (obviously) and then try to redo the lookup of "a".
At that point we notice that it doesn't exist and turn it into ESTALE.

I don't really understand the point of FS_REVAL_DOT. What does that
actually buy us? I wonder if removing it would also help your testcase?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-14 15:42 ` Jeff Layton
@ 2013-02-18  2:25   ` NeilBrown
  2013-02-18 12:41     ` Jeff Layton
  2013-02-18 18:46     ` Al Viro
  0 siblings, 2 replies; 17+ messages in thread
From: NeilBrown @ 2013-02-18  2:25 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Myklebust, Trond, Alexander Viro, NFS

[-- Attachment #1: Type: text/plain, Size: 7243 bytes --]

On Thu, 14 Feb 2013 10:42:30 -0500 Jeff Layton <jlayton@redhat.com> wrote:

> On Tue, 12 Feb 2013 11:38:13 +1100
> NeilBrown <neilb@suse.de> wrote:
> 
> > 
> > I've been exploring difficulties with unmounting stale directories and
> > discovered another bug.
> > 
> > If I:
> > 
> > SERVER:  mkdir /foo/bar  #and make sure it is exported
> > CLIENT:  mount -o vers=4 server:/foo/bar /mnt
> > SERVER:  rm -r /foo
> > CLIENT:  > /mnt/baz # gets an error of course
> > CLIENT:  ls -l /mnt # error again
> > CLIENT:  umount /mnt
> > 
> > The result of that last command is:
> > 
> > /mnt was not found in /proc/mounts
> > /mnt was not found in /proc/mounts
> > 
> > Strange?
> > 
> > cat /proc/mounts
> > 
> > .....
> > 10.0.2.2://foo/bar /mnt\040(deleted) nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.15,minorversion=0,local_lock=none,addr=10.0.2.2 0 0
> > ....
> > 
> > Notice the "\040(deleted)".
> > 
> > NFS has unhashed that directory because it is obviously bad, and d_path()
> > notices and adds " (deleted)".
> > 
> > Now I might be able to argue that NFS shouldn't be unhashing a directory that
> > is a mountpoint - it certainly seems strange behaviour.
> > 
> > But I think I can more strongly argue that /proc/mounts shouldn't be showing
> > the mounted directory, but instead the directory that it is mounted on.
> > Obviously these both have the same name so it shouldn't matter ... except
> > that here is a case where it does.
> > 
> > I "fixed" it with
> > 
> > --- a/fs/proc_namespace.c
> > +++ b/fs/proc_namespace.c
> > @@ -93,7 +93,7 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt)
> >  {
> >  	struct mount *r = real_mount(mnt);
> >  	int err = 0;
> > -	struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
> > +	struct path mnt_path = { .dentry = r->mnt_mountpoint, .mnt = &(r->mnt_parent)->mnt };
> >  	struct super_block *sb = mnt_path.dentry->d_sb;
> >  
> >  	if (sb->s_op->show_devname) {
> > 
> > though I suspect that isn't safe and needs some locking.
> > 
> > Probably both should be fixed:  NFS should not invalidate any mounted
> > directory, and show_vfsmnt() should report the mointpoint, not the mounted
> > directory.
> > 
> > I can't figure out any way to get NFS to not invalidate the mounted directory.
> > I think it happens in nfs_lookup_revalidate() when it calls d_drop(), but I
> > don't know how to tell if a given dentry is a mnt_root for any mountpoint.
> > 
> > Suggestions?  Thoughts?
> > 
> > Thanks,
> > NeilBrown
> > 
> 
> I've also been looking at some weird ESTALE problems. Here's another
> fun one that doesn't involve mountpoints. Assume here that we're
> working in the same exported directory on client and server:
> 
>     server# mkdir a
>     client# cd a
>     server# mv a a.bak
>     client# sleep 30  # (or whatever the dir attrcache timeout is)
>     client# stat .
>     stat: cannot stat ‘.’: Stale NFS file handle
> 
> Obviously, "." should not be stale. It got renamed, but the inode still
> exists on the server.
> 
> If you sniff on the wire, you'll see that the server doesn't ever send
> an ESTALE here. What happens is that due to FS_REVAL_DOT being set, we
> end up trying to revalidate the dentry that "." refers to. We find that
> the parent changed (obviously) and then try to redo the lookup of "a".
> At that point we notice that it doesn't exist and turn it into ESTALE.
> 
> I don't really understand the point of FS_REVAL_DOT. What does that
> actually buy us? I wonder if removing it would also help your testcase?
> 

I think that is a slightly different issue, but certainly related. 
I have hit your problem before and have the following patch in SLES.  I think
I tried pushing it upstream, but didn't get much in the way of a useful
response.
(patch is space-damaged - don't try to apply with 'patch').

BTW I have another problem, related to this one and which could be fixed by
removing FS_REVAL_DOT.

If you
   mount -o vers=4,noac server:/some/path /mnt
then stop nfsd on the server and
   umount /mnt

it hangs.
Partly it hangs because  'mount' tries to do a 'readlink' on the mountpoint.
I can probably get it to not do that (or use --no-canonicalize).
But then sys_umount hangs because it tries to check with the server that the
thing being unmounted really is still a directory...

I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
last step and returns the mounted-on directory, not the mountpoint that is
mounted there - or at least makes sure not revalidate happens on that final
mounted directory.


I think FS_REVAL_DOT is needed so that if you call stat("."), it will update
attributes from the server if the cache is old.  However it seems to do a
whole lot more than that, including "lookup" calls which it I'm sure is wrong.

NeilBrown


Subject: [PATCH] nfs - handle d_revalidate of 'dot' correctly.

When d_revalidate is called on a dentry because FS_REVAL_DOT is set
it isn't really appropriate to revalidate the name.

If the path was simply ".", then the current-working-directory could
have been renamed on the server and should still be accessible as "."
even if it has a new name.

If the path was "/some/long/path/.", then the final component ("path" in
this case) has already been revalidated and there is no particular
need to do it again.

If we change nd->last_type to refer to "the last component looked at"
rather than just "the last component", then these cases can be
detected by "nd->last_type != LAST_NORM".

Signed-off-by: NeilBrown <neilb@suse.de>

---
 fs/namei.c   |    2 +-
 fs/nfs/dir.c |    9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

--- linux-3.0-SLE11-SP2.orig/fs/namei.c
+++ linux-3.0-SLE11-SP2/fs/namei.c
@@ -1460,6 +1460,7 @@ static int link_path_walk(const char *na
                        }
                }
 
+               nd->last_type = type;
                /* remove trailing slashes? */
                if (!c)
                        goto last_component;
@@ -1486,7 +1487,6 @@ last_component:
                /* Clear LOOKUP_CONTINUE iff it was previously unset */
                nd->flags &= lookup_flags | ~LOOKUP_CONTINUE;
                nd->last = this;
-               nd->last_type = type;
                return 0;
        }
        terminate_walk(nd);
--- linux-3.0-SLE11-SP2.orig/fs/nfs/dir.c
+++ linux-3.0-SLE11-SP2/fs/nfs/dir.c
@@ -1138,6 +1138,15 @@ static int nfs_lookup_revalidate(struct
        if (NFS_STALE(inode))
                goto out_bad;
 
+       if (nd->last_type != LAST_NORM) {
+               /* name not relevant, just inode */
+               error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+               if (error)
+                       goto out_bad;
+               else
+                       goto out_valid;
+       }
+
        error = -ENOMEM;
        fhandle = nfs_alloc_fhandle();
        fattr = nfs_alloc_fattr();

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18  2:25   ` NeilBrown
@ 2013-02-18 12:41     ` Jeff Layton
  2013-02-18 15:36       ` Chuck Lever
  2013-02-18 18:46     ` Al Viro
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2013-02-18 12:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: Myklebust, Trond, Alexander Viro, NFS, steved

[-- Attachment #1: Type: text/plain, Size: 8741 bytes --]

On Mon, 18 Feb 2013 13:25:09 +1100
NeilBrown <neilb@suse.de> wrote:

> On Thu, 14 Feb 2013 10:42:30 -0500 Jeff Layton <jlayton@redhat.com> wrote:
> 
> > On Tue, 12 Feb 2013 11:38:13 +1100
> > NeilBrown <neilb@suse.de> wrote:
> > 
> > > 
> > > I've been exploring difficulties with unmounting stale directories and
> > > discovered another bug.
> > > 
> > > If I:
> > > 
> > > SERVER:  mkdir /foo/bar  #and make sure it is exported
> > > CLIENT:  mount -o vers=4 server:/foo/bar /mnt
> > > SERVER:  rm -r /foo
> > > CLIENT:  > /mnt/baz # gets an error of course
> > > CLIENT:  ls -l /mnt # error again
> > > CLIENT:  umount /mnt
> > > 
> > > The result of that last command is:
> > > 
> > > /mnt was not found in /proc/mounts
> > > /mnt was not found in /proc/mounts
> > > 
> > > Strange?
> > > 
> > > cat /proc/mounts
> > > 
> > > .....
> > > 10.0.2.2://foo/bar /mnt\040(deleted) nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.15,minorversion=0,local_lock=none,addr=10.0.2.2 0 0
> > > ....
> > > 
> > > Notice the "\040(deleted)".
> > > 
> > > NFS has unhashed that directory because it is obviously bad, and d_path()
> > > notices and adds " (deleted)".
> > > 
> > > Now I might be able to argue that NFS shouldn't be unhashing a directory that
> > > is a mountpoint - it certainly seems strange behaviour.
> > > 
> > > But I think I can more strongly argue that /proc/mounts shouldn't be showing
> > > the mounted directory, but instead the directory that it is mounted on.
> > > Obviously these both have the same name so it shouldn't matter ... except
> > > that here is a case where it does.
> > > 
> > > I "fixed" it with
> > > 
> > > --- a/fs/proc_namespace.c
> > > +++ b/fs/proc_namespace.c
> > > @@ -93,7 +93,7 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt)
> > >  {
> > >  	struct mount *r = real_mount(mnt);
> > >  	int err = 0;
> > > -	struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
> > > +	struct path mnt_path = { .dentry = r->mnt_mountpoint, .mnt = &(r->mnt_parent)->mnt };
> > >  	struct super_block *sb = mnt_path.dentry->d_sb;
> > >  
> > >  	if (sb->s_op->show_devname) {
> > > 
> > > though I suspect that isn't safe and needs some locking.
> > > 
> > > Probably both should be fixed:  NFS should not invalidate any mounted
> > > directory, and show_vfsmnt() should report the mointpoint, not the mounted
> > > directory.
> > > 
> > > I can't figure out any way to get NFS to not invalidate the mounted directory.
> > > I think it happens in nfs_lookup_revalidate() when it calls d_drop(), but I
> > > don't know how to tell if a given dentry is a mnt_root for any mountpoint.
> > > 
> > > Suggestions?  Thoughts?
> > > 
> > > Thanks,
> > > NeilBrown
> > > 
> > 
> > I've also been looking at some weird ESTALE problems. Here's another
> > fun one that doesn't involve mountpoints. Assume here that we're
> > working in the same exported directory on client and server:
> > 
> >     server# mkdir a
> >     client# cd a
> >     server# mv a a.bak
> >     client# sleep 30  # (or whatever the dir attrcache timeout is)
> >     client# stat .
> >     stat: cannot stat ‘.’: Stale NFS file handle
> > 
> > Obviously, "." should not be stale. It got renamed, but the inode still
> > exists on the server.
> > 
> > If you sniff on the wire, you'll see that the server doesn't ever send
> > an ESTALE here. What happens is that due to FS_REVAL_DOT being set, we
> > end up trying to revalidate the dentry that "." refers to. We find that
> > the parent changed (obviously) and then try to redo the lookup of "a".
> > At that point we notice that it doesn't exist and turn it into ESTALE.
> > 
> > I don't really understand the point of FS_REVAL_DOT. What does that
> > actually buy us? I wonder if removing it would also help your testcase?
> > 
> 
> I think that is a slightly different issue, but certainly related. 
> I have hit your problem before and have the following patch in SLES.  I think
> I tried pushing it upstream, but didn't get much in the way of a useful
> response.
> (patch is space-damaged - don't try to apply with 'patch').
> 
> BTW I have another problem, related to this one and which could be fixed by
> removing FS_REVAL_DOT.
> 
> If you
>    mount -o vers=4,noac server:/some/path /mnt
> then stop nfsd on the server and
>    umount /mnt
> 
> it hangs.
> Partly it hangs because  'mount' tries to do a 'readlink' on the mountpoint.
> I can probably get it to not do that (or use --no-canonicalize).
> But then sys_umount hangs because it tries to check with the server that the
> thing being unmounted really is still a directory...
> 
> I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
> works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
> last step and returns the mounted-on directory, not the mountpoint that is
> mounted there - or at least makes sure not revalidate happens on that final
> mounted directory.
> 
> 
> I think FS_REVAL_DOT is needed so that if you call stat("."), it will update
> attributes from the server if the cache is old.  However it seems to do a
> whole lot more than that, including "lookup" calls which it I'm sure is wrong.
> 
> 
> 
> Subject: [PATCH] nfs - handle d_revalidate of 'dot' correctly.
> 
> When d_revalidate is called on a dentry because FS_REVAL_DOT is set
> it isn't really appropriate to revalidate the name.
> 
> If the path was simply ".", then the current-working-directory could
> have been renamed on the server and should still be accessible as "."
> even if it has a new name.
> 
> If the path was "/some/long/path/.", then the final component ("path" in
> this case) has already been revalidated and there is no particular
> need to do it again.
> 
> If we change nd->last_type to refer to "the last component looked at"
> rather than just "the last component", then these cases can be
> detected by "nd->last_type != LAST_NORM".
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> ---
>  fs/namei.c   |    2 +-
>  fs/nfs/dir.c |    9 +++++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> --- linux-3.0-SLE11-SP2.orig/fs/namei.c
> +++ linux-3.0-SLE11-SP2/fs/namei.c
> @@ -1460,6 +1460,7 @@ static int link_path_walk(const char *na
>                         }
>                 }
>  
> +               nd->last_type = type;
>                 /* remove trailing slashes? */
>                 if (!c)
>                         goto last_component;
> @@ -1486,7 +1487,6 @@ last_component:
>                 /* Clear LOOKUP_CONTINUE iff it was previously unset */
>                 nd->flags &= lookup_flags | ~LOOKUP_CONTINUE;
>                 nd->last = this;
> -               nd->last_type = type;
>                 return 0;
>         }
>         terminate_walk(nd);
> --- linux-3.0-SLE11-SP2.orig/fs/nfs/dir.c
> +++ linux-3.0-SLE11-SP2/fs/nfs/dir.c
> @@ -1138,6 +1138,15 @@ static int nfs_lookup_revalidate(struct
>         if (NFS_STALE(inode))
>                 goto out_bad;
>  
> +       if (nd->last_type != LAST_NORM) {
> +               /* name not relevant, just inode */
> +               error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> +               if (error)
> +                       goto out_bad;
> +               else
> +                       goto out_valid;
> +       }
> +
>         error = -ENOMEM;
>         fhandle = nfs_alloc_fhandle();
>         fattr = nfs_alloc_fattr();

Ahh thanks -- that is the same problem exactly. I'll have to look over
your patch and see whether and how it could be applied to current
mainline code.

I think the umount thing may be the same problem that steved was
talking about the other day (V4 unmount causes a GETATTR). I hadn't put
the two together, but you're probably right.

LOOKUP_MOUNTPOINT is a very interesting idea and might even be
reasonable in conjunction with removing FS_REVAL_DOT as it would make
the needs of umount more explicit.

As far as FS_REVAL_DOT goes though, in the current mainline code, the
only place that looks at it is complete_walk(), and it just means that
we won't skip doing a d_revalidate on the final component (which will
only have not been done if it's a '.', right?).

If we remove FS_REVAL_DOT altogether, then we can chop off the bottom
half of that function, which has a certain appeal. I guess the question
we have to answer is -- if we remove FS_REVAL_DOT, what (if anything)
will break?


-- 
Jeff Layton <jlayton@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 12:41     ` Jeff Layton
@ 2013-02-18 15:36       ` Chuck Lever
  2013-02-18 21:58         ` J. Bruce Fields
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2013-02-18 15:36 UTC (permalink / raw)
  To: Jeff Layton; +Cc: NeilBrown, Myklebust, Trond, Alexander Viro, NFS, steved


On Feb 18, 2013, at 7:41 AM, Jeff Layton <jlayton@redhat.com> wrote:

> On Mon, 18 Feb 2013 13:25:09 +1100
> NeilBrown <neilb@suse.de> wrote:
> 
>> On Thu, 14 Feb 2013 10:42:30 -0500 Jeff Layton <jlayton@redhat.com> wrote:
>> 
>>> On Tue, 12 Feb 2013 11:38:13 +1100
>>> NeilBrown <neilb@suse.de> wrote:
>>> 
>>>> 
>>>> I've been exploring difficulties with unmounting stale directories and
>>>> discovered another bug.
>>>> 
>>>> If I:
>>>> 
>>>> SERVER:  mkdir /foo/bar  #and make sure it is exported
>>>> CLIENT:  mount -o vers=4 server:/foo/bar /mnt
>>>> SERVER:  rm -r /foo
>>>> CLIENT:  > /mnt/baz # gets an error of course
>>>> CLIENT:  ls -l /mnt # error again
>>>> CLIENT:  umount /mnt
>>>> 
>>>> The result of that last command is:
>>>> 
>>>> /mnt was not found in /proc/mounts
>>>> /mnt was not found in /proc/mounts
>>>> 
>>>> Strange?
>>>> 
>>>> cat /proc/mounts
>>>> 
>>>> .....
>>>> 10.0.2.2://foo/bar /mnt\040(deleted) nfs4 rw,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.2.15,minorversion=0,local_lock=none,addr=10.0.2.2 0 0
>>>> ....
>>>> 
>>>> Notice the "\040(deleted)".
>>>> 
>>>> NFS has unhashed that directory because it is obviously bad, and d_path()
>>>> notices and adds " (deleted)".
>>>> 
>>>> Now I might be able to argue that NFS shouldn't be unhashing a directory that
>>>> is a mountpoint - it certainly seems strange behaviour.
>>>> 
>>>> But I think I can more strongly argue that /proc/mounts shouldn't be showing
>>>> the mounted directory, but instead the directory that it is mounted on.
>>>> Obviously these both have the same name so it shouldn't matter ... except
>>>> that here is a case where it does.
>>>> 
>>>> I "fixed" it with
>>>> 
>>>> --- a/fs/proc_namespace.c
>>>> +++ b/fs/proc_namespace.c
>>>> @@ -93,7 +93,7 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt)
>>>> {
>>>> 	struct mount *r = real_mount(mnt);
>>>> 	int err = 0;
>>>> -	struct path mnt_path = { .dentry = mnt->mnt_root, .mnt = mnt };
>>>> +	struct path mnt_path = { .dentry = r->mnt_mountpoint, .mnt = &(r->mnt_parent)->mnt };
>>>> 	struct super_block *sb = mnt_path.dentry->d_sb;
>>>> 
>>>> 	if (sb->s_op->show_devname) {
>>>> 
>>>> though I suspect that isn't safe and needs some locking.
>>>> 
>>>> Probably both should be fixed:  NFS should not invalidate any mounted
>>>> directory, and show_vfsmnt() should report the mointpoint, not the mounted
>>>> directory.
>>>> 
>>>> I can't figure out any way to get NFS to not invalidate the mounted directory.
>>>> I think it happens in nfs_lookup_revalidate() when it calls d_drop(), but I
>>>> don't know how to tell if a given dentry is a mnt_root for any mountpoint.
>>>> 
>>>> Suggestions?  Thoughts?
>>>> 
>>>> Thanks,
>>>> NeilBrown
>>>> 
>>> 
>>> I've also been looking at some weird ESTALE problems. Here's another
>>> fun one that doesn't involve mountpoints. Assume here that we're
>>> working in the same exported directory on client and server:
>>> 
>>>    server# mkdir a
>>>    client# cd a
>>>    server# mv a a.bak
>>>    client# sleep 30  # (or whatever the dir attrcache timeout is)
>>>    client# stat .
>>>    stat: cannot stat ‘.’: Stale NFS file handle
>>> 
>>> Obviously, "." should not be stale. It got renamed, but the inode still
>>> exists on the server.
>>> 
>>> If you sniff on the wire, you'll see that the server doesn't ever send
>>> an ESTALE here. What happens is that due to FS_REVAL_DOT being set, we
>>> end up trying to revalidate the dentry that "." refers to. We find that
>>> the parent changed (obviously) and then try to redo the lookup of "a".
>>> At that point we notice that it doesn't exist and turn it into ESTALE.
>>> 
>>> I don't really understand the point of FS_REVAL_DOT. What does that
>>> actually buy us? I wonder if removing it would also help your testcase?
>>> 
>> 
>> I think that is a slightly different issue, but certainly related. 
>> I have hit your problem before and have the following patch in SLES.  I think
>> I tried pushing it upstream, but didn't get much in the way of a useful
>> response.
>> (patch is space-damaged - don't try to apply with 'patch').
>> 
>> BTW I have another problem, related to this one and which could be fixed by
>> removing FS_REVAL_DOT.
>> 
>> If you
>>   mount -o vers=4,noac server:/some/path /mnt
>> then stop nfsd on the server and
>>   umount /mnt
>> 
>> it hangs.
>> Partly it hangs because  'mount' tries to do a 'readlink' on the mountpoint.
>> I can probably get it to not do that (or use --no-canonicalize).
>> But then sys_umount hangs because it tries to check with the server that the
>> thing being unmounted really is still a directory...
>> 
>> I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
>> works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
>> last step and returns the mounted-on directory, not the mountpoint that is
>> mounted there - or at least makes sure not revalidate happens on that final
>> mounted directory.
>> 
>> 
>> I think FS_REVAL_DOT is needed so that if you call stat("."), it will update
>> attributes from the server if the cache is old.  However it seems to do a
>> whole lot more than that, including "lookup" calls which it I'm sure is wrong.
>> 
>> 
>> 
>> Subject: [PATCH] nfs - handle d_revalidate of 'dot' correctly.
>> 
>> When d_revalidate is called on a dentry because FS_REVAL_DOT is set
>> it isn't really appropriate to revalidate the name.
>> 
>> If the path was simply ".", then the current-working-directory could
>> have been renamed on the server and should still be accessible as "."
>> even if it has a new name.
>> 
>> If the path was "/some/long/path/.", then the final component ("path" in
>> this case) has already been revalidated and there is no particular
>> need to do it again.
>> 
>> If we change nd->last_type to refer to "the last component looked at"
>> rather than just "the last component", then these cases can be
>> detected by "nd->last_type != LAST_NORM".
>> 
>> Signed-off-by: NeilBrown <neilb@suse.de>
>> 
>> ---
>> fs/namei.c   |    2 +-
>> fs/nfs/dir.c |    9 +++++++++
>> 2 files changed, 10 insertions(+), 1 deletion(-)
>> 
>> --- linux-3.0-SLE11-SP2.orig/fs/namei.c
>> +++ linux-3.0-SLE11-SP2/fs/namei.c
>> @@ -1460,6 +1460,7 @@ static int link_path_walk(const char *na
>>                        }
>>                }
>> 
>> +               nd->last_type = type;
>>                /* remove trailing slashes? */
>>                if (!c)
>>                        goto last_component;
>> @@ -1486,7 +1487,6 @@ last_component:
>>                /* Clear LOOKUP_CONTINUE iff it was previously unset */
>>                nd->flags &= lookup_flags | ~LOOKUP_CONTINUE;
>>                nd->last = this;
>> -               nd->last_type = type;
>>                return 0;
>>        }
>>        terminate_walk(nd);
>> --- linux-3.0-SLE11-SP2.orig/fs/nfs/dir.c
>> +++ linux-3.0-SLE11-SP2/fs/nfs/dir.c
>> @@ -1138,6 +1138,15 @@ static int nfs_lookup_revalidate(struct
>>        if (NFS_STALE(inode))
>>                goto out_bad;
>> 
>> +       if (nd->last_type != LAST_NORM) {
>> +               /* name not relevant, just inode */
>> +               error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
>> +               if (error)
>> +                       goto out_bad;
>> +               else
>> +                       goto out_valid;
>> +       }
>> +
>>        error = -ENOMEM;
>>        fhandle = nfs_alloc_fhandle();
>>        fattr = nfs_alloc_fattr();
> 
> Ahh thanks -- that is the same problem exactly. I'll have to look over
> your patch and see whether and how it could be applied to current
> mainline code.
> 
> I think the umount thing may be the same problem that steved was
> talking about the other day (V4 unmount causes a GETATTR). I hadn't put
> the two together, but you're probably right.
> 
> LOOKUP_MOUNTPOINT is a very interesting idea and might even be
> reasonable in conjunction with removing FS_REVAL_DOT as it would make
> the needs of umount more explicit.
> 
> As far as FS_REVAL_DOT goes though, in the current mainline code, the
> only place that looks at it is complete_walk(), and it just means that
> we won't skip doing a d_revalidate on the final component (which will
> only have not been done if it's a '.', right?).
> 
> If we remove FS_REVAL_DOT altogether, then we can chop off the bottom
> half of that function, which has a certain appeal. I guess the question
> we have to answer is -- if we remove FS_REVAL_DOT, what (if anything)
> will break?

Before removing FS_REVAL_DOT, I recommend some archaeology:  find out what was fixed by adding that for NFS.  I worked on something related years ago and have a vague recollection about this that there is a situation where revalidating dot is important.

Unfortunately I don't see it in a quick browse through the commit log for fs/nfs/dir.c.  It may be in BitKeeper.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18  2:25   ` NeilBrown
  2013-02-18 12:41     ` Jeff Layton
@ 2013-02-18 18:46     ` Al Viro
  2013-02-18 19:46       ` Jeff Layton
  2013-02-18 23:10       ` NeilBrown
  1 sibling, 2 replies; 17+ messages in thread
From: Al Viro @ 2013-02-18 18:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jeff Layton, Myklebust, Trond, NFS, Linus Torvalds

On Mon, Feb 18, 2013 at 01:25:09PM +1100, NeilBrown wrote:

> I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
> works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
> last step and returns the mounted-on directory, not the mountpoint that is
> mounted there - or at least makes sure not revalidate happens on that final
> mounted directory.

I don't think LOOKUP_MOUNTPOINT is a good idea.  For one thing, we have
fairly few places that might want it, all of them in core VFS.  Might as
well provide a separate function for them, a-la path_lookupat() vs.
path_openat().

For another, we need to decide what to do with a really nasty corner case:
	a/b is a mountpoint, with c/d bound on it.
	c/d is a symlink to c/e
	c/e is a mountpoint
What should umount("a/b", 0) do?  There are two possibilities - removing
vfsmount on top of a/b or one on top of c/e...

We have the latter semantics; _that_ is what this GETATTR is about.  It's
a fairly obscure corner case - the question is not even whether to follow
symlinks, it's whether to follow _mounts_ on the last component.  Hell
knows; I'm seriously tempted to change it "don't follow mounts" and see if
anyone complains.  The only case when behaviour would change would be
a symlink mounted somewhere (note that this is _not_ something that can easily
happen; e.g. mount --bind does follow symlinks) and umount(2) given the
path resolving to the mountpoint of that symlink.
 
> I think FS_REVAL_DOT is needed so that if you call stat("."), it will update
> attributes from the server if the cache is old.  However it seems to do a
> whole lot more than that, including "lookup" calls which it I'm sure is wrong.

Far from only that.  FS_REVAL_DOT is a misnomer - it's not just ".".  Take
a look at the places where LOOKUP_JUMPED is set; _that_ is what drives the
damn thing.  Essentially, LOOKUP_JUMPED is "we hadn't arrived here by
lookup in parent"; "." (or "/") obviously qualify, but so does following
a procfs-style symlink, or .., for that matter.  "foo/.", OTOH, does *not*.

It matters only in the end of the pathname, of course.  So we don't need to do
revalidate every time we step on e.g. .. or cross a mountpoint (let alone
when we step on .), as long as we make sure that revalidate isn't missing in
the end of path.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 18:46     ` Al Viro
@ 2013-02-18 19:46       ` Jeff Layton
  2013-02-18 20:15         ` Al Viro
  2013-02-18 23:10       ` NeilBrown
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2013-02-18 19:46 UTC (permalink / raw)
  To: Al Viro; +Cc: NeilBrown, Myklebust, Trond, NFS, Linus Torvalds

On Mon, 18 Feb 2013 18:46:09 +0000
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Mon, Feb 18, 2013 at 01:25:09PM +1100, NeilBrown wrote:
> 
> > I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
> > works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
> > last step and returns the mounted-on directory, not the mountpoint that is
> > mounted there - or at least makes sure not revalidate happens on that final
> > mounted directory.
> 
> I don't think LOOKUP_MOUNTPOINT is a good idea.  For one thing, we have
> fairly few places that might want it, all of them in core VFS.  Might as
> well provide a separate function for them, a-la path_lookupat() vs.
> path_openat().
> 
> For another, we need to decide what to do with a really nasty corner case:
> 	a/b is a mountpoint, with c/d bound on it.
> 	c/d is a symlink to c/e
> 	c/e is a mountpoint
> What should umount("a/b", 0) do?  There are two possibilities - removing
> vfsmount on top of a/b or one on top of c/e...
> 
> We have the latter semantics; _that_ is what this GETATTR is about.  It's
> a fairly obscure corner case - the question is not even whether to follow
> symlinks, it's whether to follow _mounts_ on the last component.  Hell
> knows; I'm seriously tempted to change it "don't follow mounts" and see if
> anyone complains.  The only case when behaviour would change would be
> a symlink mounted somewhere (note that this is _not_ something that can easily
> happen; e.g. mount --bind does follow symlinks) and umount(2) given the
> path resolving to the mountpoint of that symlink.
>  
> > I think FS_REVAL_DOT is needed so that if you call stat("."), it will update
> > attributes from the server if the cache is old.  However it seems to do a
> > whole lot more than that, including "lookup" calls which it I'm sure is wrong.
> 
> Far from only that.  FS_REVAL_DOT is a misnomer - it's not just ".".  Take
> a look at the places where LOOKUP_JUMPED is set; _that_ is what drives the
> damn thing.  Essentially, LOOKUP_JUMPED is "we hadn't arrived here by
> lookup in parent"; "." (or "/") obviously qualify, but so does following
> a procfs-style symlink, or .., for that matter.  "foo/.", OTOH, does *not*.
> 
> It matters only in the end of the pathname, of course.  So we don't need to do
> revalidate every time we step on e.g. .. or cross a mountpoint (let alone
> when we step on .), as long as we make sure that revalidate isn't missing in
> the end of path.

Ok, that helps. In that case, this patch might be a reasonable
forward-port of the one Neil sent earlier today. Note that this doesn't
really do anything for the umount problem, but it does seem to fix the
testcase for the problem I've been looking at.

Thoughts?

---------------[snip]--------------

nfs: handle d_revalidate of 'dot' correctly

When d_revalidate is called on a dentry because FS_REVAL_DOT is set it
isn't really appropriate to revalidate the name.

If the path was simply ".", then the current-working-directory could
have been renamed on the server and should still be accessible as "."
even if it has a new name.

If the path was "/some/long/path/.", then the final component ("path" in
this case) has already been revalidated and there is no particular need
to do it again.

If LOOKUP_JUMPED is set in the flags, then that indicates that we
reached this dentry by some means other than a lookup in a parent
directory. In that case, verifying the dentry by name is pretty
meaningless. We do however want to know whether the inode is still good.

Based-on-Patch-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/dir.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a8bd28c..f159c3c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1076,6 +1076,19 @@ static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
 	if (NFS_STALE(inode))
 		goto out_bad;
 
+	/*
+	 * If we didn't reach this dentry as the result of a lookup in the
+	 * parent dir, then LOOKUP_JUMPED will be set. In that case, however
+	 * the name not really relevant, so just revalidate the inode
+	 */
+	if (flags & LOOKUP_JUMPED) {
+		error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+		if (error)
+			goto out_bad;
+		else
+			goto out_valid;
+	}
+
 	error = -ENOMEM;
 	fhandle = nfs_alloc_fhandle();
 	fattr = nfs_alloc_fattr();
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 19:46       ` Jeff Layton
@ 2013-02-18 20:15         ` Al Viro
  2013-02-18 23:14           ` NeilBrown
  0 siblings, 1 reply; 17+ messages in thread
From: Al Viro @ 2013-02-18 20:15 UTC (permalink / raw)
  To: Jeff Layton; +Cc: NeilBrown, Myklebust, Trond, NFS, Linus Torvalds

On Mon, Feb 18, 2013 at 02:46:55PM -0500, Jeff Layton wrote:

> Ok, that helps. In that case, this patch might be a reasonable
> forward-port of the one Neil sent earlier today. Note that this doesn't
> really do anything for the umount problem, but it does seem to fix the
> testcase for the problem I've been looking at.
> 
> Thoughts?

If we really go for "in this case revalidate should be weaker", we might
as well introduce a separate method for it.

As it is, we have several callers of ->d_revalidate(); this one (in
complete_walk(), only for FS_REVAL_DOT filesystems) and ones in
lookup_dcache() and lookup_fast() (in both cases we have and want the
name to match).  There are only two fs with FS_REVAL_DOT present - nfs
and 9p.  *IF* we want to make ->d_revalidate() on NFS behave differently
in complete_walk() case, it would argue for just splitting the method
in two, replacing FS_REVAL_DOT with "dentry has this method" and probably
taking a good look at what 9p needs in the same case.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 15:36       ` Chuck Lever
@ 2013-02-18 21:58         ` J. Bruce Fields
  2013-02-18 22:05           ` Jeff Layton
  2013-02-18 22:16           ` Chuck Lever
  0 siblings, 2 replies; 17+ messages in thread
From: J. Bruce Fields @ 2013-02-18 21:58 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Jeff Layton, NeilBrown, Myklebust, Trond, Alexander Viro, NFS, steved

On Mon, Feb 18, 2013 at 10:36:52AM -0500, Chuck Lever wrote:
> Before removing FS_REVAL_DOT, I recommend some archaeology:  find out what was fixed by adding that for NFS.  I worked on something related years ago and have a vague recollection about this that there is a situation where revalidating dot is important.
> 
> Unfortunately I don't see it in a quick browse through the commit log for fs/nfs/dir.c.  It may be in BitKeeper.

Everybody should have one of these historical git repos set up--they're
awesome.

(No idea if this is helpful, I just note this seems to be where
FS_REVAL_DOT got added to fs_flags.)

--b.

commit e14720a157f5d8745006f44520dfa3b8ac498328
Author: Trond Myklebust <trond.myklebust@fys.uio.no>
Date:   Tue Aug 19 21:35:45 2003 -0700

    [PATCH] Support dentry revalidation under open(".")
    
    link_path_walk() currently treats the special filenames ".", ".."
    and "/" differently in that it does not call down to the filesystem in
    order to revalidate the cached dentry, but just assumes that it is
    fine.
    
      For most filesystems this is OK, but it the case of the stateless
    NFS, this means that it circumvents path staleness detection, and the
    attribute+data cache revalidation code on such common commands as
    opendir(".").
    
    This change provides a way to do such revalidation for NFS without
    impacting other filesystems.
    
    Note: the failure to revalidate the path here does not result in a
    call to d_invalidate() unlike (all?) other calls to d_revalidate(). It
    only results in an ESTALE error being returned to the caller.

diff --git a/fs/namei.c b/fs/namei.c
index f922236..9e1fcb9 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -572,7 +572,7 @@ int link_path_walk(const char * name, struct nameidata *nd)
 	while (*name=='/')
 		name++;
 	if (!*name)
-		goto return_base;
+		goto return_reval;
 
 	inode = nd->dentry->d_inode;
 	if (current->link_count)
@@ -693,7 +693,7 @@ last_component:
 				inode = nd->dentry->d_inode;
 				/* fallthrough */
 			case 1:
-				goto return_base;
+				goto return_reval;
 		}
 		if (nd->dentry->d_op && nd->dentry->d_op->d_hash) {
 			err = nd->dentry->d_op->d_hash(nd->dentry, &this);
@@ -737,6 +737,20 @@ lookup_parent:
 			nd->last_type = LAST_DOT;
 		else if (this.len == 2 && this.name[1] == '.')
 			nd->last_type = LAST_DOTDOT;
+		else
+			goto return_base;
+return_reval:
+		/*
+		 * We bypassed the ordinary revalidation routines.
+		 * We may need to check the cached dentry for staleness.
+		 */
+		if (nd->dentry && nd->dentry->d_sb &&
+		    (nd->dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
+			err = -ESTALE;
+			/* Note: we do not d_invalidate() */
+			if (!nd->dentry->d_op->d_revalidate(nd->dentry, nd))
+				break;
+		}
 return_base:
 		return 0;
 out_dput:
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 1bae8c1..c8e3191 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1343,7 +1343,7 @@ static struct file_system_type nfs_fs_type = {
 	.name		= "nfs",
 	.get_sb		= nfs_get_sb,
 	.kill_sb	= nfs_kill_super,
-	.fs_flags	= FS_ODD_RENAME,
+	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT,
 };
 
 #ifdef CONFIG_NFS_V4
@@ -1575,7 +1575,7 @@ static struct file_system_type nfs4_fs_type = {
 	.name		= "nfs4",
 	.get_sb		= nfs4_get_sb,
 	.kill_sb	= nfs_kill_super,
-	.fs_flags	= FS_ODD_RENAME,
+	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT,
 };
 
 #define nfs4_zero_state(nfsi) \
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8df205c..a9e02e3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -89,6 +89,7 @@ extern int leases_enable, dir_notify_enable, lease_break_time;
 
 /* public flags for file_system_type */
 #define FS_REQUIRES_DEV 1 
+#define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
 #define FS_ODD_RENAME	32768	/* Temporary stuff; will go away as soon
 				  * as nfs_rename() will be cleaned up
 				  */

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 21:58         ` J. Bruce Fields
@ 2013-02-18 22:05           ` Jeff Layton
  2013-02-18 22:16           ` Chuck Lever
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2013-02-18 22:05 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Chuck Lever, NeilBrown, Myklebust, Trond, Alexander Viro, NFS, steved

On Mon, 18 Feb 2013 16:58:29 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Mon, Feb 18, 2013 at 10:36:52AM -0500, Chuck Lever wrote:
> > Before removing FS_REVAL_DOT, I recommend some archaeology:  find out what was fixed by adding that for NFS.  I worked on something related years ago and have a vague recollection about this that there is a situation where revalidating dot is important.
> > 
> > Unfortunately I don't see it in a quick browse through the commit log for fs/nfs/dir.c.  It may be in BitKeeper.
> 
> Everybody should have one of these historical git repos set up--they're
> awesome.
> 
> (No idea if this is helpful, I just note this seems to be where
> FS_REVAL_DOT got added to fs_flags.)
> 
> --b.
> 
> commit e14720a157f5d8745006f44520dfa3b8ac498328
> Author: Trond Myklebust <trond.myklebust@fys.uio.no>
> Date:   Tue Aug 19 21:35:45 2003 -0700
> 
>     [PATCH] Support dentry revalidation under open(".")
>     
>     link_path_walk() currently treats the special filenames ".", ".."
>     and "/" differently in that it does not call down to the filesystem in
>     order to revalidate the cached dentry, but just assumes that it is
>     fine.
>     
>       For most filesystems this is OK, but it the case of the stateless
>     NFS, this means that it circumvents path staleness detection, and the
>     attribute+data cache revalidation code on such common commands as
>     opendir(".").
>     
>     This change provides a way to do such revalidation for NFS without
>     impacting other filesystems.
>     
>     Note: the failure to revalidate the path here does not result in a
>     call to d_invalidate() unlike (all?) other calls to d_revalidate(). It
>     only results in an ESTALE error being returned to the caller.
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index f922236..9e1fcb9 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -572,7 +572,7 @@ int link_path_walk(const char * name, struct nameidata *nd)
>  	while (*name=='/')
>  		name++;
>  	if (!*name)
> -		goto return_base;
> +		goto return_reval;
>  
>  	inode = nd->dentry->d_inode;
>  	if (current->link_count)
> @@ -693,7 +693,7 @@ last_component:
>  				inode = nd->dentry->d_inode;
>  				/* fallthrough */
>  			case 1:
> -				goto return_base;
> +				goto return_reval;
>  		}
>  		if (nd->dentry->d_op && nd->dentry->d_op->d_hash) {
>  			err = nd->dentry->d_op->d_hash(nd->dentry, &this);
> @@ -737,6 +737,20 @@ lookup_parent:
>  			nd->last_type = LAST_DOT;
>  		else if (this.len == 2 && this.name[1] == '.')
>  			nd->last_type = LAST_DOTDOT;
> +		else
> +			goto return_base;
> +return_reval:
> +		/*
> +		 * We bypassed the ordinary revalidation routines.
> +		 * We may need to check the cached dentry for staleness.
> +		 */
> +		if (nd->dentry && nd->dentry->d_sb &&
> +		    (nd->dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
> +			err = -ESTALE;
> +			/* Note: we do not d_invalidate() */
> +			if (!nd->dentry->d_op->d_revalidate(nd->dentry, nd))
> +				break;
> +		}
>  return_base:
>  		return 0;
>  out_dput:
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 1bae8c1..c8e3191 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1343,7 +1343,7 @@ static struct file_system_type nfs_fs_type = {
>  	.name		= "nfs",
>  	.get_sb		= nfs_get_sb,
>  	.kill_sb	= nfs_kill_super,
> -	.fs_flags	= FS_ODD_RENAME,
> +	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT,
>  };
>  
>  #ifdef CONFIG_NFS_V4
> @@ -1575,7 +1575,7 @@ static struct file_system_type nfs4_fs_type = {
>  	.name		= "nfs4",
>  	.get_sb		= nfs4_get_sb,
>  	.kill_sb	= nfs_kill_super,
> -	.fs_flags	= FS_ODD_RENAME,
> +	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT,
>  };
>  
>  #define nfs4_zero_state(nfsi) \
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 8df205c..a9e02e3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -89,6 +89,7 @@ extern int leases_enable, dir_notify_enable, lease_break_time;
>  
>  /* public flags for file_system_type */
>  #define FS_REQUIRES_DEV 1 
> +#define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
>  #define FS_ODD_RENAME	32768	/* Temporary stuff; will go away as soon
>  				  * as nfs_rename() will be cleaned up
>  				  */


Thanks Bruce, that helps. I think we still end up with those semantics
with the patch I sent earlier, but Al has a good point that we can do
this a little more cleanly without needing this flag.

I'll plan to take another swipe at it tomorrow and see whether we can't
get this fixed. Once we have that, we'll still need to address the
problem Neil originally brought up on this thread with umount(), but
maybe that will be easier once we've fixed this problem...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 21:58         ` J. Bruce Fields
  2013-02-18 22:05           ` Jeff Layton
@ 2013-02-18 22:16           ` Chuck Lever
  1 sibling, 0 replies; 17+ messages in thread
From: Chuck Lever @ 2013-02-18 22:16 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Jeff Layton, NeilBrown, Myklebust, Trond, Alexander Viro, NFS, steved


On Feb 18, 2013, at 4:58 PM, "J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Mon, Feb 18, 2013 at 10:36:52AM -0500, Chuck Lever wrote:
>> Before removing FS_REVAL_DOT, I recommend some archaeology:  find out what was fixed by adding that for NFS.  I worked on something related years ago and have a vague recollection about this that there is a situation where revalidating dot is important.
>> 
>> Unfortunately I don't see it in a quick browse through the commit log for fs/nfs/dir.c.  It may be in BitKeeper.
> 
> Everybody should have one of these historical git repos set up--they're
> awesome.

I forgot I had one of those.  Thanks.

> (No idea if this is helpful, I just note this seems to be where
> FS_REVAL_DOT got added to fs_flags.)

Yes, I believe this is what I was remembering.

> --b.
> 
> commit e14720a157f5d8745006f44520dfa3b8ac498328
> Author: Trond Myklebust <trond.myklebust@fys.uio.no>
> Date:   Tue Aug 19 21:35:45 2003 -0700
> 
>    [PATCH] Support dentry revalidation under open(".")
> 
>    link_path_walk() currently treats the special filenames ".", ".."
>    and "/" differently in that it does not call down to the filesystem in
>    order to revalidate the cached dentry, but just assumes that it is
>    fine.
> 
>      For most filesystems this is OK, but it the case of the stateless
>    NFS, this means that it circumvents path staleness detection, and the
>    attribute+data cache revalidation code on such common commands as
>    opendir(".").
> 
>    This change provides a way to do such revalidation for NFS without
>    impacting other filesystems.
> 
>    Note: the failure to revalidate the path here does not result in a
>    call to d_invalidate() unlike (all?) other calls to d_revalidate(). It
>    only results in an ESTALE error being returned to the caller.
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index f922236..9e1fcb9 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -572,7 +572,7 @@ int link_path_walk(const char * name, struct nameidata *nd)
> 	while (*name=='/')
> 		name++;
> 	if (!*name)
> -		goto return_base;
> +		goto return_reval;
> 
> 	inode = nd->dentry->d_inode;
> 	if (current->link_count)
> @@ -693,7 +693,7 @@ last_component:
> 				inode = nd->dentry->d_inode;
> 				/* fallthrough */
> 			case 1:
> -				goto return_base;
> +				goto return_reval;
> 		}
> 		if (nd->dentry->d_op && nd->dentry->d_op->d_hash) {
> 			err = nd->dentry->d_op->d_hash(nd->dentry, &this);
> @@ -737,6 +737,20 @@ lookup_parent:
> 			nd->last_type = LAST_DOT;
> 		else if (this.len == 2 && this.name[1] == '.')
> 			nd->last_type = LAST_DOTDOT;
> +		else
> +			goto return_base;
> +return_reval:
> +		/*
> +		 * We bypassed the ordinary revalidation routines.
> +		 * We may need to check the cached dentry for staleness.
> +		 */
> +		if (nd->dentry && nd->dentry->d_sb &&
> +		    (nd->dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT)) {
> +			err = -ESTALE;
> +			/* Note: we do not d_invalidate() */
> +			if (!nd->dentry->d_op->d_revalidate(nd->dentry, nd))
> +				break;
> +		}
> return_base:
> 		return 0;
> out_dput:
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 1bae8c1..c8e3191 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1343,7 +1343,7 @@ static struct file_system_type nfs_fs_type = {
> 	.name		= "nfs",
> 	.get_sb		= nfs_get_sb,
> 	.kill_sb	= nfs_kill_super,
> -	.fs_flags	= FS_ODD_RENAME,
> +	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT,
> };
> 
> #ifdef CONFIG_NFS_V4
> @@ -1575,7 +1575,7 @@ static struct file_system_type nfs4_fs_type = {
> 	.name		= "nfs4",
> 	.get_sb		= nfs4_get_sb,
> 	.kill_sb	= nfs_kill_super,
> -	.fs_flags	= FS_ODD_RENAME,
> +	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT,
> };
> 
> #define nfs4_zero_state(nfsi) \
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 8df205c..a9e02e3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -89,6 +89,7 @@ extern int leases_enable, dir_notify_enable, lease_break_time;
> 
> /* public flags for file_system_type */
> #define FS_REQUIRES_DEV 1 
> +#define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
> #define FS_ODD_RENAME	32768	/* Temporary stuff; will go away as soon
> 				  * as nfs_rename() will be cleaned up
> 				  */
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 18:46     ` Al Viro
  2013-02-18 19:46       ` Jeff Layton
@ 2013-02-18 23:10       ` NeilBrown
  2013-02-18 23:17         ` Myklebust, Trond
  2013-02-19 14:27         ` Jeff Layton
  1 sibling, 2 replies; 17+ messages in thread
From: NeilBrown @ 2013-02-18 23:10 UTC (permalink / raw)
  To: Al Viro; +Cc: Jeff Layton, Myklebust, Trond, NFS, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 2941 bytes --]

On Mon, 18 Feb 2013 18:46:09 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Mon, Feb 18, 2013 at 01:25:09PM +1100, NeilBrown wrote:
> 
> > I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
> > works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
> > last step and returns the mounted-on directory, not the mountpoint that is
> > mounted there - or at least makes sure not revalidate happens on that final
> > mounted directory.
> 
> I don't think LOOKUP_MOUNTPOINT is a good idea.  For one thing, we have
> fairly few places that might want it, all of them in core VFS.  Might as
> well provide a separate function for them, a-la path_lookupat() vs.
> path_openat().
> 
> For another, we need to decide what to do with a really nasty corner case:
> 	a/b is a mountpoint, with c/d bound on it.
> 	c/d is a symlink to c/e
> 	c/e is a mountpoint
> What should umount("a/b", 0) do?  There are two possibilities - removing
> vfsmount on top of a/b or one on top of c/e...
> 
> We have the latter semantics; _that_ is what this GETATTR is about.  It's
> a fairly obscure corner case - the question is not even whether to follow
> symlinks, it's whether to follow _mounts_ on the last component.  Hell
> knows; I'm seriously tempted to change it "don't follow mounts" and see if
> anyone complains.  The only case when behaviour would change would be
> a symlink mounted somewhere (note that this is _not_ something that can easily
> happen; e.g. mount --bind does follow symlinks) and umount(2) given the
> path resolving to the mountpoint of that symlink.

Thinking about this some more, I now realise that my LOOKUP_MOUNTPOINT idea
was too simplistic and missed the real point.

The real point is that for unmount, we want to resolve the the path without
any reference to any filesystem at all - the lookup should be handled
entirely by the dcache.
Any mountpoint is pinned in the dcache, and consequently any ancestor of any
mount point also is.  So the dcache will lead us to the dentry that we want.

And the dentry is *all* we want.  It doesn't really matter what the inode is
like, or whether the filesystem thinks that the inode or name still exist.
All we need to do is find a dentry that must be  in the cache, and detach the
mount that is there.

Whether that is achieved by a LOOKUP_ flag or a separate lookup function
doesn't matter much to me.

I suspect we need to allow for passing a symlink to unmount, and the symlink
might not be in cache, so we cannot insist uniformly on only using the dcache.
However if a name is in the cache, and the cached data suggests that it is a
directory, then we should trust that and avoid referring to the filesystem.

umount is really quite unique in this.  All other times we lookup a path we
want to use the thing we found.  With umount, we want to stop using it.

???

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 20:15         ` Al Viro
@ 2013-02-18 23:14           ` NeilBrown
  2013-02-19 12:33             ` Jeff Layton
  0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2013-02-18 23:14 UTC (permalink / raw)
  To: Al Viro; +Cc: Jeff Layton, Myklebust, Trond, NFS, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 1638 bytes --]

On Mon, 18 Feb 2013 20:15:02 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Mon, Feb 18, 2013 at 02:46:55PM -0500, Jeff Layton wrote:
> 
> > Ok, that helps. In that case, this patch might be a reasonable
> > forward-port of the one Neil sent earlier today. Note that this doesn't
> > really do anything for the umount problem, but it does seem to fix the
> > testcase for the problem I've been looking at.
> > 
> > Thoughts?
> 
> If we really go for "in this case revalidate should be weaker", we might
> as well introduce a separate method for it.
> 
> As it is, we have several callers of ->d_revalidate(); this one (in
> complete_walk(), only for FS_REVAL_DOT filesystems) and ones in
> lookup_dcache() and lookup_fast() (in both cases we have and want the
> name to match).  There are only two fs with FS_REVAL_DOT present - nfs
> and 9p.  *IF* we want to make ->d_revalidate() on NFS behave differently
> in complete_walk() case, it would argue for just splitting the method
> in two, replacing FS_REVAL_DOT with "dentry has this method" and probably
> taking a good look at what 9p needs in the same case.

Sounds good to me.

Reminds me that we used to have an i_op->revalidate() method for revalidating
just the inode (not the dentry).  It called nfs_revalidate_inode() for NFS.

We lost it over a decade ago:

commit cc41b90f8a9ad3cd85a39dd4fcc71f965a675b0e
Author: Alexander Viro <viro@math.psu.edu>
Date:   Tue May 21 21:12:46 2002 -0700

    [PATCH] kill ->i_op->revalidate()
    
    kill ->i_op->revalidate()



:-)

(this doesn't help my umount problem though)

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 23:10       ` NeilBrown
@ 2013-02-18 23:17         ` Myklebust, Trond
  2013-02-18 23:31           ` NeilBrown
  2013-02-19 14:27         ` Jeff Layton
  1 sibling, 1 reply; 17+ messages in thread
From: Myklebust, Trond @ 2013-02-18 23:17 UTC (permalink / raw)
  To: NeilBrown; +Cc: Al Viro, Jeff Layton, NFS, Linus Torvalds

On Tue, 2013-02-19 at 10:10 +1100, NeilBrown wrote:
> On Mon, 18 Feb 2013 18:46:09 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> > On Mon, Feb 18, 2013 at 01:25:09PM +1100, NeilBrown wrote:
> > 
> > > I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
> > > works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
> > > last step and returns the mounted-on directory, not the mountpoint that is
> > > mounted there - or at least makes sure not revalidate happens on that final
> > > mounted directory.
> > 
> > I don't think LOOKUP_MOUNTPOINT is a good idea.  For one thing, we have
> > fairly few places that might want it, all of them in core VFS.  Might as
> > well provide a separate function for them, a-la path_lookupat() vs.
> > path_openat().
> > 
> > For another, we need to decide what to do with a really nasty corner case:
> > 	a/b is a mountpoint, with c/d bound on it.
> > 	c/d is a symlink to c/e
> > 	c/e is a mountpoint
> > What should umount("a/b", 0) do?  There are two possibilities - removing
> > vfsmount on top of a/b or one on top of c/e...
> > 
> > We have the latter semantics; _that_ is what this GETATTR is about.  It's
> > a fairly obscure corner case - the question is not even whether to follow
> > symlinks, it's whether to follow _mounts_ on the last component.  Hell
> > knows; I'm seriously tempted to change it "don't follow mounts" and see if
> > anyone complains.  The only case when behaviour would change would be
> > a symlink mounted somewhere (note that this is _not_ something that can easily
> > happen; e.g. mount --bind does follow symlinks) and umount(2) given the
> > path resolving to the mountpoint of that symlink.
> 
> Thinking about this some more, I now realise that my LOOKUP_MOUNTPOINT idea
> was too simplistic and missed the real point.
> 
> The real point is that for unmount, we want to resolve the the path without
> any reference to any filesystem at all - the lookup should be handled
> entirely by the dcache.
> Any mountpoint is pinned in the dcache, and consequently any ancestor of any
> mount point also is.  So the dcache will lead us to the dentry that we want.
> 
> And the dentry is *all* we want.  It doesn't really matter what the inode is
> like, or whether the filesystem thinks that the inode or name still exist.
> All we need to do is find a dentry that must be  in the cache, and detach the
> mount that is there.
> 
> Whether that is achieved by a LOOKUP_ flag or a separate lookup function
> doesn't matter much to me.
> 
> I suspect we need to allow for passing a symlink to unmount, and the symlink
> might not be in cache, so we cannot insist uniformly on only using the dcache.
> However if a name is in the cache, and the cached data suggests that it is a
> directory, then we should trust that and avoid referring to the filesystem.
> 
> umount is really quite unique in this.  All other times we lookup a path we
> want to use the thing we found.  With umount, we want to stop using it.
> 
> ???

Add a umountat() syscall so that you can supply a file descriptor? :-)

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 23:17         ` Myklebust, Trond
@ 2013-02-18 23:31           ` NeilBrown
  0 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2013-02-18 23:31 UTC (permalink / raw)
  To: Myklebust, Trond; +Cc: Al Viro, Jeff Layton, NFS, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 3604 bytes --]

On Mon, 18 Feb 2013 23:17:42 +0000 "Myklebust, Trond"
<Trond.Myklebust@netapp.com> wrote:

> On Tue, 2013-02-19 at 10:10 +1100, NeilBrown wrote:
> > On Mon, 18 Feb 2013 18:46:09 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> > 
> > > On Mon, Feb 18, 2013 at 01:25:09PM +1100, NeilBrown wrote:
> > > 
> > > > I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
> > > > works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
> > > > last step and returns the mounted-on directory, not the mountpoint that is
> > > > mounted there - or at least makes sure not revalidate happens on that final
> > > > mounted directory.
> > > 
> > > I don't think LOOKUP_MOUNTPOINT is a good idea.  For one thing, we have
> > > fairly few places that might want it, all of them in core VFS.  Might as
> > > well provide a separate function for them, a-la path_lookupat() vs.
> > > path_openat().
> > > 
> > > For another, we need to decide what to do with a really nasty corner case:
> > > 	a/b is a mountpoint, with c/d bound on it.
> > > 	c/d is a symlink to c/e
> > > 	c/e is a mountpoint
> > > What should umount("a/b", 0) do?  There are two possibilities - removing
> > > vfsmount on top of a/b or one on top of c/e...
> > > 
> > > We have the latter semantics; _that_ is what this GETATTR is about.  It's
> > > a fairly obscure corner case - the question is not even whether to follow
> > > symlinks, it's whether to follow _mounts_ on the last component.  Hell
> > > knows; I'm seriously tempted to change it "don't follow mounts" and see if
> > > anyone complains.  The only case when behaviour would change would be
> > > a symlink mounted somewhere (note that this is _not_ something that can easily
> > > happen; e.g. mount --bind does follow symlinks) and umount(2) given the
> > > path resolving to the mountpoint of that symlink.
> > 
> > Thinking about this some more, I now realise that my LOOKUP_MOUNTPOINT idea
> > was too simplistic and missed the real point.
> > 
> > The real point is that for unmount, we want to resolve the the path without
> > any reference to any filesystem at all - the lookup should be handled
> > entirely by the dcache.
> > Any mountpoint is pinned in the dcache, and consequently any ancestor of any
> > mount point also is.  So the dcache will lead us to the dentry that we want.
> > 
> > And the dentry is *all* we want.  It doesn't really matter what the inode is
> > like, or whether the filesystem thinks that the inode or name still exist.
> > All we need to do is find a dentry that must be  in the cache, and detach the
> > mount that is there.
> > 
> > Whether that is achieved by a LOOKUP_ flag or a separate lookup function
> > doesn't matter much to me.
> > 
> > I suspect we need to allow for passing a symlink to unmount, and the symlink
> > might not be in cache, so we cannot insist uniformly on only using the dcache.
> > However if a name is in the cache, and the cached data suggests that it is a
> > directory, then we should trust that and avoid referring to the filesystem.
> > 
> > umount is really quite unique in this.  All other times we lookup a path we
> > want to use the thing we found.  With umount, we want to stop using it.
>
> > ???
> 
> Add a umountat() syscall so that you can supply a file descriptor? :-)
> 

If I could get that file descriptor by opening some magic file in /proc which
led immediately to the mount point, then I'd say "yes please!".
Otherwise, I don't think it helps, and so support your ":-)".

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 23:14           ` NeilBrown
@ 2013-02-19 12:33             ` Jeff Layton
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2013-02-19 12:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: Al Viro, Myklebust, Trond, NFS, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 2061 bytes --]

On Tue, 19 Feb 2013 10:14:50 +1100
NeilBrown <neilb@suse.de> wrote:

> On Mon, 18 Feb 2013 20:15:02 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> > On Mon, Feb 18, 2013 at 02:46:55PM -0500, Jeff Layton wrote:
> > 
> > > Ok, that helps. In that case, this patch might be a reasonable
> > > forward-port of the one Neil sent earlier today. Note that this doesn't
> > > really do anything for the umount problem, but it does seem to fix the
> > > testcase for the problem I've been looking at.
> > > 
> > > Thoughts?
> > 
> > If we really go for "in this case revalidate should be weaker", we might
> > as well introduce a separate method for it.
> > 
> > As it is, we have several callers of ->d_revalidate(); this one (in
> > complete_walk(), only for FS_REVAL_DOT filesystems) and ones in
> > lookup_dcache() and lookup_fast() (in both cases we have and want the
> > name to match).  There are only two fs with FS_REVAL_DOT present - nfs
> > and 9p.  *IF* we want to make ->d_revalidate() on NFS behave differently
> > in complete_walk() case, it would argue for just splitting the method
> > in two, replacing FS_REVAL_DOT with "dentry has this method" and probably
> > taking a good look at what 9p needs in the same case.
> 
> Sounds good to me.
> 
> Reminds me that we used to have an i_op->revalidate() method for revalidating
> just the inode (not the dentry).  It called nfs_revalidate_inode() for NFS.
> 
> We lost it over a decade ago:
> 
> commit cc41b90f8a9ad3cd85a39dd4fcc71f965a675b0e
> Author: Alexander Viro <viro@math.psu.edu>
> Date:   Tue May 21 21:12:46 2002 -0700
> 
>     [PATCH] kill ->i_op->revalidate()
>     
>     kill ->i_op->revalidate()
> 
> 
> 
> :-)
> 
> (this doesn't help my umount problem though)
> 

That would work for NFS, but unfortunately I think 9p needs a dentry
operation here if we're not going to break it. Anyone want to suggest a
new name for it? I'm using d_reval_dot for now, but a better name would
be welcome...

-- 
Jeff Layton <jlayton@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: More fun with unmounting ESTALE directories.
  2013-02-18 23:10       ` NeilBrown
  2013-02-18 23:17         ` Myklebust, Trond
@ 2013-02-19 14:27         ` Jeff Layton
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2013-02-19 14:27 UTC (permalink / raw)
  To: NeilBrown; +Cc: Al Viro, Myklebust, Trond, NFS, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 3574 bytes --]

On Tue, 19 Feb 2013 10:10:31 +1100
NeilBrown <neilb@suse.de> wrote:

> On Mon, 18 Feb 2013 18:46:09 +0000 Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> > On Mon, Feb 18, 2013 at 01:25:09PM +1100, NeilBrown wrote:
> > 
> > > I would be really nice if sys_unmount used a LOOKUP_MOUNTPOINT flag that
> > > works a bit like LOOKUP_PARENT and LOOKUP_NOFOLLOW in that it skips the very
> > > last step and returns the mounted-on directory, not the mountpoint that is
> > > mounted there - or at least makes sure not revalidate happens on that final
> > > mounted directory.
> > 
> > I don't think LOOKUP_MOUNTPOINT is a good idea.  For one thing, we have
> > fairly few places that might want it, all of them in core VFS.  Might as
> > well provide a separate function for them, a-la path_lookupat() vs.
> > path_openat().
> > 
> > For another, we need to decide what to do with a really nasty corner case:
> > 	a/b is a mountpoint, with c/d bound on it.
> > 	c/d is a symlink to c/e
> > 	c/e is a mountpoint
> > What should umount("a/b", 0) do?  There are two possibilities - removing
> > vfsmount on top of a/b or one on top of c/e...
> > 
> > We have the latter semantics; _that_ is what this GETATTR is about.  It's
> > a fairly obscure corner case - the question is not even whether to follow
> > symlinks, it's whether to follow _mounts_ on the last component.  Hell
> > knows; I'm seriously tempted to change it "don't follow mounts" and see if
> > anyone complains.  The only case when behaviour would change would be
> > a symlink mounted somewhere (note that this is _not_ something that can easily
> > happen; e.g. mount --bind does follow symlinks) and umount(2) given the
> > path resolving to the mountpoint of that symlink.
> 
> Thinking about this some more, I now realise that my LOOKUP_MOUNTPOINT idea
> was too simplistic and missed the real point.
> 
> The real point is that for unmount, we want to resolve the the path without
> any reference to any filesystem at all - the lookup should be handled
> entirely by the dcache.
> Any mountpoint is pinned in the dcache, and consequently any ancestor of any
> mount point also is.  So the dcache will lead us to the dentry that we want.
> 
> And the dentry is *all* we want.  It doesn't really matter what the inode is
> like, or whether the filesystem thinks that the inode or name still exist.
> All we need to do is find a dentry that must be  in the cache, and detach the
> mount that is there.
> 
> Whether that is achieved by a LOOKUP_ flag or a separate lookup function
> doesn't matter much to me.
> 
> I suspect we need to allow for passing a symlink to unmount, and the symlink
> might not be in cache, so we cannot insist uniformly on only using the dcache.
> However if a name is in the cache, and the cached data suggests that it is a
> directory, then we should trust that and avoid referring to the filesystem.
> 
> umount is really quite unique in this.  All other times we lookup a path we
> want to use the thing we found.  With umount, we want to stop using it.
> 

From an IRC conversation with Al yesterday, which may point you in the
right direction:

12:49 < viro> jlayton: umount() simply shouldn't do full lookup for path
12:50 < viro> it should get the parent
12:50 < viro> _then_ do pure dcache lookup for the last step

...of course that's still tricky if the last component is a symlink
since you'd need to chase it by hand, but that seems like a reasonable
way to start.

-- 
Jeff Layton <jlayton@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-02-19 14:27 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-12  0:38 More fun with unmounting ESTALE directories NeilBrown
2013-02-14 15:42 ` Jeff Layton
2013-02-18  2:25   ` NeilBrown
2013-02-18 12:41     ` Jeff Layton
2013-02-18 15:36       ` Chuck Lever
2013-02-18 21:58         ` J. Bruce Fields
2013-02-18 22:05           ` Jeff Layton
2013-02-18 22:16           ` Chuck Lever
2013-02-18 18:46     ` Al Viro
2013-02-18 19:46       ` Jeff Layton
2013-02-18 20:15         ` Al Viro
2013-02-18 23:14           ` NeilBrown
2013-02-19 12:33             ` Jeff Layton
2013-02-18 23:10       ` NeilBrown
2013-02-18 23:17         ` Myklebust, Trond
2013-02-18 23:31           ` NeilBrown
2013-02-19 14:27         ` Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.