All of lore.kernel.org
 help / color / mirror / Atom feed
* client lookup_revalidate bug
@ 2019-05-29 15:08 Benjamin Coddington
  2019-05-29 16:21 ` Trond Myklebust
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Coddington @ 2019-05-29 15:08 UTC (permalink / raw)
  To: Linux NFS Mailing List

Hey, here's an interesting one, this seems wrong:

[root@fedora27_c2_v5 ~]# mkdir /mnt/one
[root@fedora27_c2_v5 ~]# mkdir /mnt/two
[root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache 
192.168.122.50:/exports /mnt/one
[root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache 
192.168.122.50:/exports /mnt/two
[root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
[root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
[root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
[root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
[root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
[root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
[root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
[root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
   File: /mnt/two/B/foo
   Size: 0         	Blocks: 0          IO Block: 262144 regular empty 
file
Device: 2fh/47d	Inode: 439603      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:nfs_t:s0
Access: 2019-05-28 14:00:18.929663705 -0400
Modify: 2019-05-28 14:00:18.929663705 -0400
Change: 2019-05-28 14:00:58.990124573 -0400
  Birth: -


^^ that lstat should return -ENOENT.

I think we detect a stale directory by comparing the directory's 
change_attr with the dentry's d_time.  But, here's a case where they are 
the same!

Am I wrong about this, or any clever ideas to catch this case?

Ben

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: client lookup_revalidate bug
  2019-05-29 15:08 client lookup_revalidate bug Benjamin Coddington
@ 2019-05-29 16:21 ` Trond Myklebust
  2019-05-29 16:39   ` Benjamin Coddington
  0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2019-05-29 16:21 UTC (permalink / raw)
  To: linux-nfs, bcodding

On Wed, 2019-05-29 at 11:08 -0400, Benjamin Coddington wrote:
> Hey, here's an interesting one, this seems wrong:
> 
> [root@fedora27_c2_v5 ~]# mkdir /mnt/one
> [root@fedora27_c2_v5 ~]# mkdir /mnt/two
> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache 
> 192.168.122.50:/exports /mnt/one
> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache 
> 192.168.122.50:/exports /mnt/two
> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
> [root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> [root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
> [root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> [root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
>    File: /mnt/two/B/foo
>    Size: 0         	Blocks: 0          IO Block: 262144 regular
> empty 
> file
> Device: 2fh/47d	Inode: 439603      Links: 1
> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid:
> (    0/    root)
> Context: system_u:object_r:nfs_t:s0
> Access: 2019-05-28 14:00:18.929663705 -0400
> Modify: 2019-05-28 14:00:18.929663705 -0400
> Change: 2019-05-28 14:00:58.990124573 -0400
>   Birth: -
> 
> 
> ^^ that lstat should return -ENOENT.
> 
> I think we detect a stale directory by comparing the directory's 
> change_attr with the dentry's d_time.  But, here's a case where they
> are 
> the same!
> 
> Am I wrong about this, or any clever ideas to catch this case?
> 

When you are mounting using 'nosharecache' then you are making /mnt/one
and /mnt/two act as if they are different filesystems. The fact that
they are the same on the server, means you are setting up a testcase
where the files+directories are acting like the "changing on the
server" case as far as the client is concerned.

If you want the above to work in a sane fashion, then just don't use
'nosharecache'.

Cheers
   Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: client lookup_revalidate bug
  2019-05-29 16:21 ` Trond Myklebust
@ 2019-05-29 16:39   ` Benjamin Coddington
  2019-05-29 17:02     ` Achilles Gaikwad
  2019-05-29 17:19     ` Trond Myklebust
  0 siblings, 2 replies; 7+ messages in thread
From: Benjamin Coddington @ 2019-05-29 16:39 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On 29 May 2019, at 12:21, Trond Myklebust wrote:

> On Wed, 2019-05-29 at 11:08 -0400, Benjamin Coddington wrote:
>> Hey, here's an interesting one, this seems wrong:
>>
>> [root@fedora27_c2_v5 ~]# mkdir /mnt/one
>> [root@fedora27_c2_v5 ~]# mkdir /mnt/two
>> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
>> 192.168.122.50:/exports /mnt/one
>> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
>> 192.168.122.50:/exports /mnt/two
>> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
>> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
>> [root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
>> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
>> [root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
>> [root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
>> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
>> [root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
>>    File: /mnt/two/B/foo
>>    Size: 0         	Blocks: 0          IO Block: 262144 regular
>> empty
>> file
>> Device: 2fh/47d	Inode: 439603      Links: 1
>> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid:
>> (    0/    root)
>> Context: system_u:object_r:nfs_t:s0
>> Access: 2019-05-28 14:00:18.929663705 -0400
>> Modify: 2019-05-28 14:00:18.929663705 -0400
>> Change: 2019-05-28 14:00:58.990124573 -0400
>>   Birth: -
>>
>>
>> ^^ that lstat should return -ENOENT.
>>
>> I think we detect a stale directory by comparing the directory's
>> change_attr with the dentry's d_time.  But, here's a case where they
>> are
>> the same!
>>
>> Am I wrong about this, or any clever ideas to catch this case?
>>
>
> When you are mounting using 'nosharecache' then you are making /mnt/one
> and /mnt/two act as if they are different filesystems. The fact that
> they are the same on the server, means you are setting up a testcase
> where the files+directories are acting like the "changing on the
> server" case as far as the client is concerned.
>
> If you want the above to work in a sane fashion, then just don't use
> 'nosharecache'.

That was deliberate to avoid having to show two clients in the example..
sorry, I should have been more explicit.

I think the client should be able to detect this case, since it can see an
updated change_attr for that particular directory -- that is
"/mnt/two/B/foo", but maybe it needs to compare the change_attr to its
previous value instead of comparing it to the child's d_time?

The person who reported it has some workload that flips files between
directories on separate clients, and doesn't like it when `mv` reports
"source and destination are the same file".

Ben

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: client lookup_revalidate bug
  2019-05-29 16:39   ` Benjamin Coddington
@ 2019-05-29 17:02     ` Achilles Gaikwad
  2019-05-29 17:19     ` Trond Myklebust
  1 sibling, 0 replies; 7+ messages in thread
From: Achilles Gaikwad @ 2019-05-29 17:02 UTC (permalink / raw)
  To: Benjamin Coddington; +Cc: Trond Myklebust, linux-nfs, Takayuki Nagata

The file doesn't exist, but when specifying the complete pathname,
nfsv4 thinks that the file already exists.

[...]
#    mv /mnt/nfs/B/f /mnt/nfs/A/f
mv: ‘/mnt/nfs/B/f’ and ‘/mnt/nfs/A/f’ are the same file

# ls -l /mnt/nfs/A/f
-rw-r--r--. 1 nfsnobody nfsnobody 0 May 29 01:59 /mnt/nfs/A/f

# ls -l /mnt/nfs/A/
total 0
[...]

Things I've tried:
1. dropping cache
2. using noac

What mv tries to do is mv ‘/mnt/nfs/B/f’  ‘/mnt/nfs/A/f’, it finds
that the file already exists. This is a bug.
Enabling rpcdbug logs for
   # rpcdebug -m nfs -s pagecache  lookupcache  dircache fscache

[47821.556171] NFS: nfs_lookup_revalidate(A/f) is valid

Best,
- Achilles
---
Reproducer by my colleague @Takayuki Nagata

Steps to Reproduce:
1. Mount a NFSv4 volume with noac on node1 and node2.

[root@node1 ~]# mount -o noac server:/srv/export /mnt/nfs
[root@node2 ~]# mount -o noac server:/srv/export /mnt/nfs

2. Create directories and a file.

[root@node1 ~]# mkdir /mnt/nfs/{A,B}; touch /mnt/nfs/A/f

3. Move the file from A to B on the node1.

[root@node1 ~]# mv /mnt/nfs/A/f /mnt/nfs/B/

4. cat the file on the node2, and move it from B to A.

[root@node2 ~]# cat /mnt/nfs/B/f; mv /mnt/nfs/B/f /mnt/nfs/A/

5. Move it again from A to B on the node1.

[root@node1 ~]# mv /mnt/nfs/A/f /mnt/nfs/B/

6. cat the file again on the node2, and move it from B to A again.

[root@node2 ~]# cat /mnt/nfs/B/f; mv /mnt/nfs/B/f /mnt/nfs/A/
mv: `/mnt/nfs/B/f' and `/mnt/nfs/A/f' are the same file

Best,
- Achilles
---


On Wed, May 29, 2019 at 10:09 PM Benjamin Coddington
<bcodding@redhat.com> wrote:
>
> On 29 May 2019, at 12:21, Trond Myklebust wrote:
>
> > On Wed, 2019-05-29 at 11:08 -0400, Benjamin Coddington wrote:
> >> Hey, here's an interesting one, this seems wrong:
> >>
> >> [root@fedora27_c2_v5 ~]# mkdir /mnt/one
> >> [root@fedora27_c2_v5 ~]# mkdir /mnt/two
> >> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
> >> 192.168.122.50:/exports /mnt/one
> >> [root@fedora27_c2_v5 ~]# mount -t nfs -ov4,noac,sec=sys,nosharecache
> >> 192.168.122.50:/exports /mnt/two
> >> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
> >> [root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
> >> [root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
> >> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> >> [root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
> >> [root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
> >> [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> >> [root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
> >>    File: /mnt/two/B/foo
> >>    Size: 0           Blocks: 0          IO Block: 262144 regular
> >> empty
> >> file
> >> Device: 2fh/47d      Inode: 439603      Links: 1
> >> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid:
> >> (    0/    root)
> >> Context: system_u:object_r:nfs_t:s0
> >> Access: 2019-05-28 14:00:18.929663705 -0400
> >> Modify: 2019-05-28 14:00:18.929663705 -0400
> >> Change: 2019-05-28 14:00:58.990124573 -0400
> >>   Birth: -
> >>
> >>
> >> ^^ that lstat should return -ENOENT.
> >>
> >> I think we detect a stale directory by comparing the directory's
> >> change_attr with the dentry's d_time.  But, here's a case where they
> >> are
> >> the same!
> >>
> >> Am I wrong about this, or any clever ideas to catch this case?
> >>
> >
> > When you are mounting using 'nosharecache' then you are making /mnt/one
> > and /mnt/two act as if they are different filesystems. The fact that
> > they are the same on the server, means you are setting up a testcase
> > where the files+directories are acting like the "changing on the
> > server" case as far as the client is concerned.
> >
> > If you want the above to work in a sane fashion, then just don't use
> > 'nosharecache'.
>
> That was deliberate to avoid having to show two clients in the example..
> sorry, I should have been more explicit.
>
> I think the client should be able to detect this case, since it can see an
> updated change_attr for that particular directory -- that is
> "/mnt/two/B/foo", but maybe it needs to compare the change_attr to its
> previous value instead of comparing it to the child's d_time?
>
> The person who reported it has some workload that flips files between
> directories on separate clients, and doesn't like it when `mv` reports
> "source and destination are the same file".
>
> Ben

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: client lookup_revalidate bug
  2019-05-29 16:39   ` Benjamin Coddington
  2019-05-29 17:02     ` Achilles Gaikwad
@ 2019-05-29 17:19     ` Trond Myklebust
  2019-05-30 10:35       ` Benjamin Coddington
  1 sibling, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2019-05-29 17:19 UTC (permalink / raw)
  To: bcodding; +Cc: linux-nfs

On Wed, 2019-05-29 at 12:39 -0400, Benjamin Coddington wrote:
> On 29 May 2019, at 12:21, Trond Myklebust wrote:
> 
> > On Wed, 2019-05-29 at 11:08 -0400, Benjamin Coddington wrote:
> > > Hey, here's an interesting one, this seems wrong:
> > > 
> > > [root@fedora27_c2_v5 ~]# mkdir /mnt/one
> > > [root@fedora27_c2_v5 ~]# mkdir /mnt/two
> > > [root@fedora27_c2_v5 ~]# mount -t nfs
> > > -ov4,noac,sec=sys,nosharecache
> > > 192.168.122.50:/exports /mnt/one
> > > [root@fedora27_c2_v5 ~]# mount -t nfs
> > > -ov4,noac,sec=sys,nosharecache
> > > 192.168.122.50:/exports /mnt/two
> > > [root@fedora27_c2_v5 ~]# mkdir /mnt/one/A
> > > [root@fedora27_c2_v5 ~]# mkdir /mnt/one/B
> > > [root@fedora27_c2_v5 ~]# touch /mnt/one/A/foo
> > > [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> > > [root@fedora27_c2_v5 ~]# mv /mnt/two/A/foo /mnt/two/B/foo
> > > [root@fedora27_c2_v5 ~]# mv /mnt/one/B/foo /mnt/one/A/foo
> > > [root@fedora27_c2_v5 ~]# cat /mnt/two/A/foo
> > > [root@fedora27_c2_v5 ~]# stat /mnt/two/B/foo
> > >    File: /mnt/two/B/foo
> > >    Size: 0         	Blocks: 0          IO Block: 262144
> > > regular
> > > empty
> > > file
> > > Device: 2fh/47d	Inode: 439603      Links: 1
> > > Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid:
> > > (    0/    root)
> > > Context: system_u:object_r:nfs_t:s0
> > > Access: 2019-05-28 14:00:18.929663705 -0400
> > > Modify: 2019-05-28 14:00:18.929663705 -0400
> > > Change: 2019-05-28 14:00:58.990124573 -0400
> > >   Birth: -
> > > 
> > > 
> > > ^^ that lstat should return -ENOENT.
> > > 
> > > I think we detect a stale directory by comparing the directory's
> > > change_attr with the dentry's d_time.  But, here's a case where
> > > they
> > > are
> > > the same!
> > > 
> > > Am I wrong about this, or any clever ideas to catch this case?
> > > 
> > 
> > When you are mounting using 'nosharecache' then you are making
> > /mnt/one
> > and /mnt/two act as if they are different filesystems. The fact
> > that
> > they are the same on the server, means you are setting up a
> > testcase
> > where the files+directories are acting like the "changing on the
> > server" case as far as the client is concerned.
> > 
> > If you want the above to work in a sane fashion, then just don't
> > use
> > 'nosharecache'.
> 
> That was deliberate to avoid having to show two clients in the
> example..
> sorry, I should have been more explicit.
> 
> I think the client should be able to detect this case, since it can
> see an
> updated change_attr for that particular directory -- that is
> "/mnt/two/B/foo", but maybe it needs to compare the change_attr to
> its
> previous value instead of comparing it to the child's d_time?
> 
> The person who reported it has some workload that flips files between
> directories on separate clients, and doesn't like it when `mv`
> reports
> "source and destination are the same file".

Sorry, but that's not the case, because of the abuse of the
nosharecache flag. You are manipulating the file on the server and
expecting an immediate cache invalidation. That would require
information that the client does not have.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: client lookup_revalidate bug
  2019-05-29 17:19     ` Trond Myklebust
@ 2019-05-30 10:35       ` Benjamin Coddington
  2019-05-30 13:44         ` Benjamin Coddington
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Coddington @ 2019-05-30 10:35 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On 29 May 2019, at 13:19, Trond Myklebust wrote:
> On Wed, 2019-05-29 at 12:39 -0400, Benjamin Coddington wrote:
>> On 29 May 2019, at 12:21, Trond Myklebust wrote:
>
> Sorry, but that's not the case, because of the abuse of the
> nosharecache flag. You are manipulating the file on the server and
> expecting an immediate cache invalidation. That would require
> information that the client does not have.

Yes, forget about the nosharecache flag.  Let's just assume we move the file
on the server.  The client does perform a GETATTR and sees the updated
change_attr for the directory, but it matches the dentry's new d_time.

I don't expect immediate cache invalidation; invalidation will never happen
as long as B/'s change_attr isn't bumped again.  On the client, B/foo
dentry's d_time is the same as the change_attr for B/, and even though
there's no file at B/foo on the server, the client will keep the B/foo
dentry until something else bumps B/'s change_attr.

Maybe I need to imagine a scenario where this matters more..

But, I think that if we use the method of comparing d_time with the parent's
change_attr to determine if we need to lookup, this is a case where that's
not reliable.

I'll try to come up with something as I have time to work on it since I am
getting pressure about it.  It looks to me like this behavior has been
around for a long time.  Thanks for the look.

Ben


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: client lookup_revalidate bug
  2019-05-30 10:35       ` Benjamin Coddington
@ 2019-05-30 13:44         ` Benjamin Coddington
  0 siblings, 0 replies; 7+ messages in thread
From: Benjamin Coddington @ 2019-05-30 13:44 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On 30 May 2019, at 6:35, Benjamin Coddington wrote:

> On 29 May 2019, at 13:19, Trond Myklebust wrote:
>> On Wed, 2019-05-29 at 12:39 -0400, Benjamin Coddington wrote:
>>> On 29 May 2019, at 12:21, Trond Myklebust wrote:
>>
>> Sorry, but that's not the case, because of the abuse of the
>> nosharecache flag. You are manipulating the file on the server and
>> expecting an immediate cache invalidation. That would require
>> information that the client does not have.
>
> Yes, forget about the nosharecache flag.  Let's just assume we move 
> the file
> on the server.  The client does perform a GETATTR and sees the updated
> change_attr for the directory, but it matches the dentry's new d_time.
>
> I don't expect immediate cache invalidation; invalidation will never 
> happen
> as long as B/'s change_attr isn't bumped again.  On the client, B/foo
> dentry's d_time is the same as the change_attr for B/, and even though
> there's no file at B/foo on the server, the client will keep the B/foo
> dentry until something else bumps B/'s change_attr.
>
> Maybe I need to imagine a scenario where this matters more..
>
> But, I think that if we use the method of comparing d_time with the 
> parent's
> change_attr to determine if we need to lookup, this is a case where 
> that's
> not reliable.
>
> I'll try to come up with something as I have time to work on it since 
> I am
> getting pressure about it.  It looks to me like this behavior has been
> around for a long time.  Thanks for the look.

Takayuki Nagata points out that this behavior only happens when we have 
a
read delegation, and it seems obvious that we ought to skip lookup
revalidation if so.

Let me see if I can clarify the order of events so anyone that wishes 
can
better argue about what should happen.

Single client, NFS is mounted on /mnt

Client reads /mnt/A/foo, dentry A/foo is hashed.
Client moves /mnt/A/foo to /mnt/B/foo, dentry B/foo is hashed.
Server moves B/foo back to A/foo.
Client reads A/foo, gets a read delegation.
Client looks up B/foo, skips revalidation because it holds the read 
delegation.

Client will always return a positive lookup for both A/foo and B/foo at 
this
point.

Should this happen?  My position is no, it shouldn't.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-05-30 13:44 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-29 15:08 client lookup_revalidate bug Benjamin Coddington
2019-05-29 16:21 ` Trond Myklebust
2019-05-29 16:39   ` Benjamin Coddington
2019-05-29 17:02     ` Achilles Gaikwad
2019-05-29 17:19     ` Trond Myklebust
2019-05-30 10:35       ` Benjamin Coddington
2019-05-30 13:44         ` Benjamin Coddington

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.