* Lookup revalidation for OPEN_CLAIM_FH @ 2020-01-16 13:51 Benjamin Coddington 2020-01-16 14:35 ` Trond Myklebust 0 siblings, 1 reply; 5+ messages in thread From: Benjamin Coddington @ 2020-01-16 13:51 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs Hi Trond, I'd like to fix up lookup revalidation for v4.1+ when the client is using OPEN_CLAIM_FH. The fixes a while back for Stan Hu's case do not seem to improve things for v4.1, and actually make the behavior a bit worse since we no longer pass through nfs_lookup_verify_inode(), which would catch the cases where nlink == 0. Would you accept work to _always_ revalidate the dentry's parent for CLAIM_FH? Alternatively, it seems that CLAIM_NULL would be preferable for this case, though I don't know how the client would know when to decide between them. Here's a simple reproducer for convenience, I think we've already all agreed that the behavior we want is for the second open by `cat` to reflect the results of the move on the server, or at least eventually later opens would revalidate the dentry: #!/bin/bash set -o xtrace vers=4.1 exportfs -ua exportfs -o rw,sec=sys,no_root_squash *:/exports mkdir /mnt/localhost || true rm -f /exports/file{1,2} echo this is file 1 > /exports/file1 echo this is file 2 > /exports/file2 mount -t nfs -ov$vers,sec=sys localhost:/exports /mnt/localhost tail -f /mnt/localhost/file1 & sleep 1 # this is file 1 cat /mnt/localhost/file1 # overwrite the file on the server: mv -f /exports/file2 /exports/file1 # this is file 2 cat /mnt/localhost/file1 killall tail # this is file 2 cat /mnt/localhost/file1 umount /mnt/localhost Switching the $vers variable between v4.0 and v4.1 in this script shows the difference in behavior. Thanks for any advice, Ben ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Lookup revalidation for OPEN_CLAIM_FH 2020-01-16 13:51 Lookup revalidation for OPEN_CLAIM_FH Benjamin Coddington @ 2020-01-16 14:35 ` Trond Myklebust 2020-01-16 15:13 ` Benjamin Coddington 0 siblings, 1 reply; 5+ messages in thread From: Trond Myklebust @ 2020-01-16 14:35 UTC (permalink / raw) To: bcodding; +Cc: linux-nfs On Thu, 2020-01-16 at 08:51 -0500, Benjamin Coddington wrote: > Hi Trond, > > I'd like to fix up lookup revalidation for v4.1+ when the client is > using > OPEN_CLAIM_FH. The fixes a while back for Stan Hu's case do not seem > to > improve things for v4.1, and actually make the behavior a bit worse > since we > no longer pass through nfs_lookup_verify_inode(), which would catch > the > cases where nlink == 0. > > Would you accept work to _always_ revalidate the dentry's parent for > CLAIM_FH? Alternatively, it seems that CLAIM_NULL would be > preferable for > this case, though I don't know how the client would know when to > decide > between them. > > Here's a simple reproducer for convenience, I think we've already all > agreed > that the behavior we want is for the second open by `cat` to reflect > the > results of the move on the server, or at least eventually later opens > would > revalidate the dentry: > > #!/bin/bash > > set -o xtrace > vers=4.1 > > exportfs -ua > exportfs -o rw,sec=sys,no_root_squash *:/exports > > mkdir /mnt/localhost || true > > rm -f /exports/file{1,2} > > echo this is file 1 > /exports/file1 > echo this is file 2 > /exports/file2 > > mount -t nfs -ov$vers,sec=sys localhost:/exports /mnt/localhost > > tail -f /mnt/localhost/file1 & > sleep 1 > > # this is file 1 > cat /mnt/localhost/file1 > > # overwrite the file on the server: > mv -f /exports/file2 /exports/file1 > > # this is file 2 > cat /mnt/localhost/file1 > > killall tail > # this is file 2 > cat /mnt/localhost/file1 > umount /mnt/localhost > > > Switching the $vers variable between v4.0 and v4.1 in this script > shows the > difference in behavior. > If somebody needs stronger lookup cache revalidation, then that's what they have the 'lookupcache=none' mount option for. We have these 'lookupcache' mount options in order to allow users to tailor the caching behaviour (on a per-mount basis) should the default behaviour be insufficiently strict. Since your testcase doesn't use that mount option, I don't see what it is proving other than what we already know about the default lookup caching: namely that it sacrifices some accuracy in the interest of file open performance. By the way, NFSv4.1 will actually handle this situation better than NFSv3, since the stateful open ensures that even if the file that won the race was deleted by the other client, then the server will preserve that file and its contents until our client calls close(). In conclusion: * With a default mount option of 'lookupcache=all', we don't promise 100% accuracy in the face of 3rd party client changes to the namespace. We only promise that we will eventually pick up the changes. If we're failing to do that, then let's look at why, but 'lookupcache=all' is not guaranteed to immediately find namespace changes. * With 'lookupcache=pos', we promise that the client will ignore negative cached dentries, and hence will always find a file that was created using exclusive create on the server. However I'd expect that too to fail your test above. * If it fails with 'lookupcache=none', then I would agree that we have a problem. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Lookup revalidation for OPEN_CLAIM_FH 2020-01-16 14:35 ` Trond Myklebust @ 2020-01-16 15:13 ` Benjamin Coddington 2020-01-16 15:38 ` Trond Myklebust 0 siblings, 1 reply; 5+ messages in thread From: Benjamin Coddington @ 2020-01-16 15:13 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On 16 Jan 2020, at 9:35, Trond Myklebust wrote: > On Thu, 2020-01-16 at 08:51 -0500, Benjamin Coddington wrote: >> Hi Trond, >> >> I'd like to fix up lookup revalidation for v4.1+ when the client is >> using >> OPEN_CLAIM_FH. The fixes a while back for Stan Hu's case do not seem >> to >> improve things for v4.1, and actually make the behavior a bit worse >> since we >> no longer pass through nfs_lookup_verify_inode(), which would catch >> the >> cases where nlink == 0. >> >> Would you accept work to _always_ revalidate the dentry's parent for >> CLAIM_FH? Alternatively, it seems that CLAIM_NULL would be >> preferable for >> this case, though I don't know how the client would know when to >> decide >> between them. >> >> Here's a simple reproducer for convenience, I think we've already all >> agreed >> that the behavior we want is for the second open by `cat` to reflect >> the >> results of the move on the server, or at least eventually later opens >> would >> revalidate the dentry: >> >> #!/bin/bash >> >> set -o xtrace >> vers=4.1 >> >> exportfs -ua >> exportfs -o rw,sec=sys,no_root_squash *:/exports >> >> mkdir /mnt/localhost || true >> >> rm -f /exports/file{1,2} >> >> echo this is file 1 > /exports/file1 >> echo this is file 2 > /exports/file2 >> >> mount -t nfs -ov$vers,sec=sys localhost:/exports /mnt/localhost >> >> tail -f /mnt/localhost/file1 & >> sleep 1 >> >> # this is file 1 >> cat /mnt/localhost/file1 >> >> # overwrite the file on the server: >> mv -f /exports/file2 /exports/file1 >> >> # this is file 2 >> cat /mnt/localhost/file1 >> >> killall tail >> # this is file 2 >> cat /mnt/localhost/file1 >> umount /mnt/localhost >> >> >> Switching the $vers variable between v4.0 and v4.1 in this script >> shows the >> difference in behavior. >> > > If somebody needs stronger lookup cache revalidation, then that's what > they have the 'lookupcache=none' mount option for. We have these > 'lookupcache' mount options in order to allow users to tailor the > caching behaviour (on a per-mount basis) should the default behaviour > be insufficiently strict. > > Since your testcase doesn't use that mount option, I don't see what it > is proving other than what we already know about the default lookup > caching: namely that it sacrifices some accuracy in the interest of > file open performance. Thanks for the look. The testcase only provides a comparison of different behavior between v4.0 and v4.1, which is due to our use of CLAIM_NULL vs CLAIM_FH. Indeed, setting lookupcache=none give real-time updates. With default lookupcache, after the directory attributes time out, the client will likewise get the namespace right. So, this isn't a major problem, but we do have some QE folks that are unhappy about it. Can we improve things a bit for v4.1 without sacrificing performance? I can't think of a reason to not go back to CLAIM_NULL in nfs4_file_open(). Maybe it is a bit more work on the server to have to do one extra lookup per open, but we'll end up with the right file each time. It makes sense to keep CLAIM_FH for recovery, but why keep it for regular opens? Ben ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Lookup revalidation for OPEN_CLAIM_FH 2020-01-16 15:13 ` Benjamin Coddington @ 2020-01-16 15:38 ` Trond Myklebust 2020-01-16 16:21 ` Benjamin Coddington 0 siblings, 1 reply; 5+ messages in thread From: Trond Myklebust @ 2020-01-16 15:38 UTC (permalink / raw) To: bcodding; +Cc: linux-nfs On Thu, 2020-01-16 at 10:13 -0500, Benjamin Coddington wrote: > On 16 Jan 2020, at 9:35, Trond Myklebust wrote: > > > On Thu, 2020-01-16 at 08:51 -0500, Benjamin Coddington wrote: > > > Hi Trond, > > > > > > I'd like to fix up lookup revalidation for v4.1+ when the client > > > is > > > using > > > OPEN_CLAIM_FH. The fixes a while back for Stan Hu's case do not > > > seem > > > to > > > improve things for v4.1, and actually make the behavior a bit > > > worse > > > since we > > > no longer pass through nfs_lookup_verify_inode(), which would > > > catch > > > the > > > cases where nlink == 0. > > > > > > Would you accept work to _always_ revalidate the dentry's parent > > > for > > > CLAIM_FH? Alternatively, it seems that CLAIM_NULL would be > > > preferable for > > > this case, though I don't know how the client would know when to > > > decide > > > between them. > > > > > > Here's a simple reproducer for convenience, I think we've already > > > all > > > agreed > > > that the behavior we want is for the second open by `cat` to > > > reflect > > > the > > > results of the move on the server, or at least eventually later > > > opens > > > would > > > revalidate the dentry: > > > > > > #!/bin/bash > > > > > > set -o xtrace > > > vers=4.1 > > > > > > exportfs -ua > > > exportfs -o rw,sec=sys,no_root_squash *:/exports > > > > > > mkdir /mnt/localhost || true > > > > > > rm -f /exports/file{1,2} > > > > > > echo this is file 1 > /exports/file1 > > > echo this is file 2 > /exports/file2 > > > > > > mount -t nfs -ov$vers,sec=sys localhost:/exports /mnt/localhost > > > > > > tail -f /mnt/localhost/file1 & > > > sleep 1 > > > > > > # this is file 1 > > > cat /mnt/localhost/file1 > > > > > > # overwrite the file on the server: > > > mv -f /exports/file2 /exports/file1 > > > > > > # this is file 2 > > > cat /mnt/localhost/file1 > > > > > > killall tail > > > # this is file 2 > > > cat /mnt/localhost/file1 > > > umount /mnt/localhost > > > > > > > > > Switching the $vers variable between v4.0 and v4.1 in this script > > > shows the > > > difference in behavior. > > > > > > > If somebody needs stronger lookup cache revalidation, then that's > > what > > they have the 'lookupcache=none' mount option for. We have these > > 'lookupcache' mount options in order to allow users to tailor the > > caching behaviour (on a per-mount basis) should the default > > behaviour > > be insufficiently strict. > > > > Since your testcase doesn't use that mount option, I don't see what > > it > > is proving other than what we already know about the default lookup > > caching: namely that it sacrifices some accuracy in the interest of > > file open performance. > > Thanks for the look. The testcase only provides a comparison of > different > behavior between v4.0 and v4.1, which is due to our use of CLAIM_NULL > vs > CLAIM_FH. > > Indeed, setting lookupcache=none give real-time updates. With > default > lookupcache, after the directory attributes time out, the client will > likewise get the namespace right. So, this isn't a major problem, > but we do > have some QE folks that are unhappy about it. > > Can we improve things a bit for v4.1 without sacrificing > performance? I > can't think of a reason to not go back to CLAIM_NULL in > nfs4_file_open(). > Maybe it is a bit more work on the server to have to do one extra > lookup per > open, but we'll end up with the right file each time. > > It makes sense to keep CLAIM_FH for recovery, but why keep it for > regular > opens? > nfs_file_open() is completely the wrong place to perform a lookup. Its purpose in the VFS is to allow the filesystem to set up state *after* we've already looked up the dentry, revalidated it and therefore decided which file to open. The NFSv4.0 behaviour of performing a new lookup is actually the aberration here, and is due to the fact that it does not have an open- by-filehandle operation, so we have no alternative. As I said, if you want stronger semantics, there are lookupcache mount options that allow you to tune things. I therefore see no valid reason to change the existing behaviour, which also matches that of older NFS versions (i.e. v3 and v2). -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Lookup revalidation for OPEN_CLAIM_FH 2020-01-16 15:38 ` Trond Myklebust @ 2020-01-16 16:21 ` Benjamin Coddington 0 siblings, 0 replies; 5+ messages in thread From: Benjamin Coddington @ 2020-01-16 16:21 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On 16 Jan 2020, at 10:38, Trond Myklebust wrote: > On Thu, 2020-01-16 at 10:13 -0500, Benjamin Coddington wrote: >> On 16 Jan 2020, at 9:35, Trond Myklebust wrote: > > nfs_file_open() is completely the wrong place to perform a lookup. Its > purpose in the VFS is to allow the filesystem to set up state *after* > we've already looked up the dentry, revalidated it and therefore > decided which file to open. > The NFSv4.0 behaviour of performing a new lookup is actually the > aberration here, and is due to the fact that it does not have an open- > by-filehandle operation, so we have no alternative. Ok, that makes good sense to me. > As I said, if you want stronger semantics, there are lookupcache mount > options that allow you to tune things. I therefore see no valid reason > to change the existing behaviour, which also matches that of older NFS > versions (i.e. v3 and v2). Thanks for the discussion. Your point above about v4.0 CLAIM_NULL actually doing another lookup after we already did a lookup throws things in sharp relief. Ben ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-01-16 16:21 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-01-16 13:51 Lookup revalidation for OPEN_CLAIM_FH Benjamin Coddington 2020-01-16 14:35 ` Trond Myklebust 2020-01-16 15:13 ` Benjamin Coddington 2020-01-16 15:38 ` Trond Myklebust 2020-01-16 16:21 ` Benjamin Coddington
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).