* Do we really need d_weak_revalidate??? @ 2017-08-11 4:31 NeilBrown 2017-08-11 5:55 ` Trond Myklebust 0 siblings, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-08-11 4:31 UTC (permalink / raw) To: Jeff Layton, Al Viro Cc: Michal Koutny, Linux NFS Mailing List, linux-fsdevel, lkml [-- Attachment #1: Type: text/plain, Size: 2113 bytes --] Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock flag and introduced the d_weak_revalidate dentry operation instead. We duly removed the flag from NFS superblocks and NFSv4 superblocks, and added the new dentry operation to NFS dentries .... but not to NFSv4 dentries. And nobody noticed. Until today. A customer reports a situation where mount(....,MS_REMOUNT,..) on an NFS filesystem hangs because the network has been deconfigured. This makes perfect sense and I suggested a code change to fix the problem. However when a colleague was trying to reproduce the problem to validate the fix, he couldn't. Then nor could I. The problem is trivially reproducible with NFSv3, and not at all with NFSv4. The reason is the missing d_weak_revalidate. We could simply add d_weak_revalidate for NFSv4, but given that it has been missing for 4.5 years, and the only time anyone noticed was when the ommission resulted in a better user experience, I do wonder if we need to. Can we just discard d_weak_revalidate? What purpose does it serve? I couldn't find one. Thanks, NeilBrown For reference, see Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op") To reproduce the problem at home, on a system that uses systemd: 1/ place (or find) a filesystem image in a file on an NFS filesystem. 2/ mount the nfs filesystem with "noac" - choose v3 or v4 3/ loop-mount the filesystem image read-only somewhere 4/ reboot If you choose v4, the reboot will succeed, possibly after a 90second timeout. If you choose v3, the reboot will hang indefinitely in systemd-shutdown while remounting the nfs filesystem read-only. If you don't use "noac" it can still hang, but only if something slows down the reboot enough that attributes have timed out by the time that systemd-shutdown runs. This happens for our customer. If the loop-mounted filesystem is not read-only, you get other problems. We really want systemd to figure out that the loop-mount needs to be unmounted first. I have ideas concerning that, but it is messy. But that isn't the only bug here. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-11 4:31 Do we really need d_weak_revalidate??? NeilBrown @ 2017-08-11 5:55 ` Trond Myklebust 2017-08-11 11:01 ` Jeff Layton 2017-08-13 23:29 ` NeilBrown 0 siblings, 2 replies; 34+ messages in thread From: Trond Myklebust @ 2017-08-11 5:55 UTC (permalink / raw) To: viro, jlayton, neilb; +Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: > Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock > flag and introduced the d_weak_revalidate dentry operation instead. > We duly removed the flag from NFS superblocks and NFSv4 superblocks, > and added the new dentry operation to NFS dentries .... but not to > NFSv4 > dentries. > > And nobody noticed. > > Until today. > > A customer reports a situation where mount(....,MS_REMOUNT,..) on an > NFS > filesystem hangs because the network has been deconfigured. This > makes > perfect sense and I suggested a code change to fix the problem. > However when a colleague was trying to reproduce the problem to > validate > the fix, he couldn't. Then nor could I. > > The problem is trivially reproducible with NFSv3, and not at all with > NFSv4. The reason is the missing d_weak_revalidate. > > We could simply add d_weak_revalidate for NFSv4, but given that it > has been missing for 4.5 years, and the only time anyone noticed was > when the ommission resulted in a better user experience, I do wonder > if > we need to. Can we just discard d_weak_revalidate? What purpose > does > it serve? I couldn't find one. > > Thanks, > NeilBrown > > For reference, see > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a > d_weak_revalidate dentry op") > > > > To reproduce the problem at home, on a system that uses systemd: > 1/ place (or find) a filesystem image in a file on an NFS filesystem. > 2/ mount the nfs filesystem with "noac" - choose v3 or v4 > 3/ loop-mount the filesystem image read-only somewhere > 4/ reboot > > If you choose v4, the reboot will succeed, possibly after a 90second > timeout. > If you choose v3, the reboot will hang indefinitely in systemd- > shutdown while > remounting the nfs filesystem read-only. > > If you don't use "noac" it can still hang, but only if something > slows > down the reboot enough that attributes have timed out by the time > that > systemd-shutdown runs. This happens for our customer. > > If the loop-mounted filesystem is not read-only, you get other > problems. > > We really want systemd to figure out that the loop-mount needs to be > unmounted first. I have ideas concerning that, but it is messy. But > that isn't the only bug here. The main purpose of d_weak_revalidate() was to catch the issues that arise when someone changes the contents of the current working directory or its parent on the server. Since '.' and '..' are treated specially in the lookup code, they would not be revalidated without special treatment. That leads to issues when looking up files as ./<filename> or ../<filename>, since the client won't detect that its dcache is stale until it tries to use the cached dentry+inode. The one thing that has changed since its introduction is, I believe, the ESTALE handling in the VFS layer. That might fix a lot of the dcache lookup bugs that were previously handled by d_weak_revalidate(). I haven't done an audit to figure out if it actually can handle all of them. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-11 5:55 ` Trond Myklebust @ 2017-08-11 11:01 ` Jeff Layton 2017-08-13 23:36 ` NeilBrown 2017-08-13 23:29 ` NeilBrown 1 sibling, 1 reply; 34+ messages in thread From: Jeff Layton @ 2017-08-11 11:01 UTC (permalink / raw) To: Trond Myklebust, viro, neilb Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: > > Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock > > flag and introduced the d_weak_revalidate dentry operation instead. > > We duly removed the flag from NFS superblocks and NFSv4 superblocks, > > and added the new dentry operation to NFS dentries .... but not to > > NFSv4 > > dentries. > > > > And nobody noticed. > > > > Until today. > > > > A customer reports a situation where mount(....,MS_REMOUNT,..) on an > > NFS > > filesystem hangs because the network has been deconfigured. This > > makes > > perfect sense and I suggested a code change to fix the problem. > > However when a colleague was trying to reproduce the problem to > > validate > > the fix, he couldn't. Then nor could I. > > > > The problem is trivially reproducible with NFSv3, and not at all with > > NFSv4. The reason is the missing d_weak_revalidate. > > > > We could simply add d_weak_revalidate for NFSv4, but given that it > > has been missing for 4.5 years, and the only time anyone noticed was > > when the ommission resulted in a better user experience, I do wonder > > if > > we need to. Can we just discard d_weak_revalidate? What purpose > > does > > it serve? I couldn't find one. > > > > Thanks, > > NeilBrown > > > > For reference, see > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a > > d_weak_revalidate dentry op") > > > > > > > > To reproduce the problem at home, on a system that uses systemd: > > 1/ place (or find) a filesystem image in a file on an NFS filesystem. > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4 > > 3/ loop-mount the filesystem image read-only somewhere > > 4/ reboot > > > > If you choose v4, the reboot will succeed, possibly after a 90second > > timeout. > > If you choose v3, the reboot will hang indefinitely in systemd- > > shutdown while > > remounting the nfs filesystem read-only. > > > > If you don't use "noac" it can still hang, but only if something > > slows > > down the reboot enough that attributes have timed out by the time > > that > > systemd-shutdown runs. This happens for our customer. > > > > If the loop-mounted filesystem is not read-only, you get other > > problems. > > > > We really want systemd to figure out that the loop-mount needs to be > > unmounted first. I have ideas concerning that, but it is messy. But > > that isn't the only bug here. > > The main purpose of d_weak_revalidate() was to catch the issues that > arise when someone changes the contents of the current working > directory or its parent on the server. Since '.' and '..' are treated > specially in the lookup code, they would not be revalidated without > special treatment. That leads to issues when looking up files as > ./<filename> or ../<filename>, since the client won't detect that its > dcache is stale until it tries to use the cached dentry+inode. > > The one thing that has changed since its introduction is, I believe, > the ESTALE handling in the VFS layer. That might fix a lot of the > dcache lookup bugs that were previously handled by d_weak_revalidate(). > I haven't done an audit to figure out if it actually can handle all of > them. > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: vfs: allow umount to handle mountpoints without revalidating them Possibly the fact that we no longer try to revalidate during unmount means that this is no longer necessary? The original patch that added d_weak_revalidate had a reproducer in the patch description. Have you verified that that problem is still not reproducible when you remove d_weak_revalidate? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-11 11:01 ` Jeff Layton @ 2017-08-13 23:36 ` NeilBrown 2017-08-14 10:10 ` Jeff Layton 0 siblings, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-08-13 23:36 UTC (permalink / raw) To: Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 4645 bytes --] On Fri, Aug 11 2017, Jeff Layton wrote: > On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >> > Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >> > flag and introduced the d_weak_revalidate dentry operation instead. >> > We duly removed the flag from NFS superblocks and NFSv4 superblocks, >> > and added the new dentry operation to NFS dentries .... but not to >> > NFSv4 >> > dentries. >> > >> > And nobody noticed. >> > >> > Until today. >> > >> > A customer reports a situation where mount(....,MS_REMOUNT,..) on an >> > NFS >> > filesystem hangs because the network has been deconfigured. This >> > makes >> > perfect sense and I suggested a code change to fix the problem. >> > However when a colleague was trying to reproduce the problem to >> > validate >> > the fix, he couldn't. Then nor could I. >> > >> > The problem is trivially reproducible with NFSv3, and not at all with >> > NFSv4. The reason is the missing d_weak_revalidate. >> > >> > We could simply add d_weak_revalidate for NFSv4, but given that it >> > has been missing for 4.5 years, and the only time anyone noticed was >> > when the ommission resulted in a better user experience, I do wonder >> > if >> > we need to. Can we just discard d_weak_revalidate? What purpose >> > does >> > it serve? I couldn't find one. >> > >> > Thanks, >> > NeilBrown >> > >> > For reference, see >> > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >> > d_weak_revalidate dentry op") >> > >> > >> > >> > To reproduce the problem at home, on a system that uses systemd: >> > 1/ place (or find) a filesystem image in a file on an NFS filesystem. >> > 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >> > 3/ loop-mount the filesystem image read-only somewhere >> > 4/ reboot >> > >> > If you choose v4, the reboot will succeed, possibly after a 90second >> > timeout. >> > If you choose v3, the reboot will hang indefinitely in systemd- >> > shutdown while >> > remounting the nfs filesystem read-only. >> > >> > If you don't use "noac" it can still hang, but only if something >> > slows >> > down the reboot enough that attributes have timed out by the time >> > that >> > systemd-shutdown runs. This happens for our customer. >> > >> > If the loop-mounted filesystem is not read-only, you get other >> > problems. >> > >> > We really want systemd to figure out that the loop-mount needs to be >> > unmounted first. I have ideas concerning that, but it is messy. But >> > that isn't the only bug here. >> >> The main purpose of d_weak_revalidate() was to catch the issues that >> arise when someone changes the contents of the current working >> directory or its parent on the server. Since '.' and '..' are treated >> specially in the lookup code, they would not be revalidated without >> special treatment. That leads to issues when looking up files as >> ./<filename> or ../<filename>, since the client won't detect that its >> dcache is stale until it tries to use the cached dentry+inode. >> >> The one thing that has changed since its introduction is, I believe, >> the ESTALE handling in the VFS layer. That might fix a lot of the >> dcache lookup bugs that were previously handled by d_weak_revalidate(). >> I haven't done an audit to figure out if it actually can handle all of >> them. >> > > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: > > vfs: allow umount to handle mountpoints without revalidating them You say in the comment for that commit: but there are cases where we do want to revalidate the root of the fs. Do you happen to remember what those cases are? > > Possibly the fact that we no longer try to revalidate during unmount > means that this is no longer necessary? > > The original patch that added d_weak_revalidate had a reproducer in the > patch description. Have you verified that that problem is still not > reproducible when you remove d_weak_revalidate? I did try the reproducer and it works as expected both with and without d_weak_revalidate. On reflection, the problem it displayed was caused by d_revalidate() being called when the dentry name was irrelevant. We remove that (fixing the problem) and introduce d_weak_revalidate because we thought that minimum functionality was still useful. I'm currently not convinced that even that is needed. If we discarded d_weak_revalidate(), we could get rid of the special handling of umount.... Thanks, NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-13 23:36 ` NeilBrown @ 2017-08-14 10:10 ` Jeff Layton 2017-08-16 2:43 ` NeilBrown 0 siblings, 1 reply; 34+ messages in thread From: Jeff Layton @ 2017-08-14 10:10 UTC (permalink / raw) To: NeilBrown, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: > On Fri, Aug 11 2017, Jeff Layton wrote: > > > On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: > > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: > > > > Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock > > > > flag and introduced the d_weak_revalidate dentry operation instead. > > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks, > > > > and added the new dentry operation to NFS dentries .... but not to > > > > NFSv4 > > > > dentries. > > > > > > > > And nobody noticed. > > > > > > > > Until today. > > > > > > > > A customer reports a situation where mount(....,MS_REMOUNT,..) on an > > > > NFS > > > > filesystem hangs because the network has been deconfigured. This > > > > makes > > > > perfect sense and I suggested a code change to fix the problem. > > > > However when a colleague was trying to reproduce the problem to > > > > validate > > > > the fix, he couldn't. Then nor could I. > > > > > > > > The problem is trivially reproducible with NFSv3, and not at all with > > > > NFSv4. The reason is the missing d_weak_revalidate. > > > > > > > > We could simply add d_weak_revalidate for NFSv4, but given that it > > > > has been missing for 4.5 years, and the only time anyone noticed was > > > > when the ommission resulted in a better user experience, I do wonder > > > > if > > > > we need to. Can we just discard d_weak_revalidate? What purpose > > > > does > > > > it serve? I couldn't find one. > > > > > > > > Thanks, > > > > NeilBrown > > > > > > > > For reference, see > > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a > > > > d_weak_revalidate dentry op") > > > > > > > > > > > > > > > > To reproduce the problem at home, on a system that uses systemd: > > > > 1/ place (or find) a filesystem image in a file on an NFS filesystem. > > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4 > > > > 3/ loop-mount the filesystem image read-only somewhere > > > > 4/ reboot > > > > > > > > If you choose v4, the reboot will succeed, possibly after a 90second > > > > timeout. > > > > If you choose v3, the reboot will hang indefinitely in systemd- > > > > shutdown while > > > > remounting the nfs filesystem read-only. > > > > > > > > If you don't use "noac" it can still hang, but only if something > > > > slows > > > > down the reboot enough that attributes have timed out by the time > > > > that > > > > systemd-shutdown runs. This happens for our customer. > > > > > > > > If the loop-mounted filesystem is not read-only, you get other > > > > problems. > > > > > > > > We really want systemd to figure out that the loop-mount needs to be > > > > unmounted first. I have ideas concerning that, but it is messy. But > > > > that isn't the only bug here. > > > > > > The main purpose of d_weak_revalidate() was to catch the issues that > > > arise when someone changes the contents of the current working > > > directory or its parent on the server. Since '.' and '..' are treated > > > specially in the lookup code, they would not be revalidated without > > > special treatment. That leads to issues when looking up files as > > > ./<filename> or ../<filename>, since the client won't detect that its > > > dcache is stale until it tries to use the cached dentry+inode. > > > > > > The one thing that has changed since its introduction is, I believe, > > > the ESTALE handling in the VFS layer. That might fix a lot of the > > > dcache lookup bugs that were previously handled by d_weak_revalidate(). > > > I haven't done an audit to figure out if it actually can handle all of > > > them. > > > > > > > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: > > > > vfs: allow umount to handle mountpoints without revalidating them > > You say in the comment for that commit: > > but there > are cases where we do want to revalidate the root of the fs. > > Do you happen to remember what those cases are? > Not exactly, but I _think_ I might have been assuming that we needed to ensure that the inode attrs on the root were up to date after the pathwalk. I think that was probably wrong. d_revalidate is really intended to ensure that the dentry in question still points to the same inode. In the case of the root of the mount though, we don't really care about the dentry on the server at all. We're attaching the root of the mount to an inode and don't care of the dentry name changes. If we do need to ensure the inode attrs are updated, we'll just revalidate them at that point. > > > > Possibly the fact that we no longer try to revalidate during unmount > > means that this is no longer necessary? > > > > The original patch that added d_weak_revalidate had a reproducer in the > > patch description. Have you verified that that problem is still not > > reproducible when you remove d_weak_revalidate? > > I did try the reproducer and it works as expected both with and without > d_weak_revalidate. > On reflection, the problem it displayed was caused by d_revalidate() > being called when the dentry name was irrelevant. We remove that > (fixing the problem) and introduce d_weak_revalidate because we thought > that minimum functionality was still useful. I'm currently not > convinced that even that is needed. > > If we discarded d_weak_revalidate(), we could get rid of the special > handling of umount.... I like idea. I say go for it and we can see what (if anything) breaks? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-14 10:10 ` Jeff Layton @ 2017-08-16 2:43 ` NeilBrown 2017-08-16 11:34 ` Jeff Layton 0 siblings, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-08-16 2:43 UTC (permalink / raw) To: Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 6370 bytes --] On Mon, Aug 14 2017, Jeff Layton wrote: > On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: >> On Fri, Aug 11 2017, Jeff Layton wrote: >> >> > On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >> > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >> > > > Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >> > > > flag and introduced the d_weak_revalidate dentry operation instead. >> > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks, >> > > > and added the new dentry operation to NFS dentries .... but not to >> > > > NFSv4 >> > > > dentries. >> > > > >> > > > And nobody noticed. >> > > > >> > > > Until today. >> > > > >> > > > A customer reports a situation where mount(....,MS_REMOUNT,..) on an >> > > > NFS >> > > > filesystem hangs because the network has been deconfigured. This >> > > > makes >> > > > perfect sense and I suggested a code change to fix the problem. >> > > > However when a colleague was trying to reproduce the problem to >> > > > validate >> > > > the fix, he couldn't. Then nor could I. >> > > > >> > > > The problem is trivially reproducible with NFSv3, and not at all with >> > > > NFSv4. The reason is the missing d_weak_revalidate. >> > > > >> > > > We could simply add d_weak_revalidate for NFSv4, but given that it >> > > > has been missing for 4.5 years, and the only time anyone noticed was >> > > > when the ommission resulted in a better user experience, I do wonder >> > > > if >> > > > we need to. Can we just discard d_weak_revalidate? What purpose >> > > > does >> > > > it serve? I couldn't find one. >> > > > >> > > > Thanks, >> > > > NeilBrown >> > > > >> > > > For reference, see >> > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >> > > > d_weak_revalidate dentry op") >> > > > >> > > > >> > > > >> > > > To reproduce the problem at home, on a system that uses systemd: >> > > > 1/ place (or find) a filesystem image in a file on an NFS filesystem. >> > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >> > > > 3/ loop-mount the filesystem image read-only somewhere >> > > > 4/ reboot >> > > > >> > > > If you choose v4, the reboot will succeed, possibly after a 90second >> > > > timeout. >> > > > If you choose v3, the reboot will hang indefinitely in systemd- >> > > > shutdown while >> > > > remounting the nfs filesystem read-only. >> > > > >> > > > If you don't use "noac" it can still hang, but only if something >> > > > slows >> > > > down the reboot enough that attributes have timed out by the time >> > > > that >> > > > systemd-shutdown runs. This happens for our customer. >> > > > >> > > > If the loop-mounted filesystem is not read-only, you get other >> > > > problems. >> > > > >> > > > We really want systemd to figure out that the loop-mount needs to be >> > > > unmounted first. I have ideas concerning that, but it is messy. But >> > > > that isn't the only bug here. >> > > >> > > The main purpose of d_weak_revalidate() was to catch the issues that >> > > arise when someone changes the contents of the current working >> > > directory or its parent on the server. Since '.' and '..' are treated >> > > specially in the lookup code, they would not be revalidated without >> > > special treatment. That leads to issues when looking up files as >> > > ./<filename> or ../<filename>, since the client won't detect that its >> > > dcache is stale until it tries to use the cached dentry+inode. >> > > >> > > The one thing that has changed since its introduction is, I believe, >> > > the ESTALE handling in the VFS layer. That might fix a lot of the >> > > dcache lookup bugs that were previously handled by d_weak_revalidate(). >> > > I haven't done an audit to figure out if it actually can handle all of >> > > them. >> > > >> > >> > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: >> > >> > vfs: allow umount to handle mountpoints without revalidating them >> >> You say in the comment for that commit: >> >> but there >> are cases where we do want to revalidate the root of the fs. >> >> Do you happen to remember what those cases are? >> > > Not exactly, but I _think_ I might have been assuming that we needed to > ensure that the inode attrs on the root were up to date after the > pathwalk. > > I think that was probably wrong. d_revalidate is really intended to > ensure that the dentry in question still points to the same inode. In > the case of the root of the mount though, we don't really care about the > dentry on the server at all. We're attaching the root of the mount to an > inode and don't care of the dentry name changes. If we do need to ensure > the inode attrs are updated, we'll just revalidate them at that point. > >> > >> > Possibly the fact that we no longer try to revalidate during unmount >> > means that this is no longer necessary? >> > >> > The original patch that added d_weak_revalidate had a reproducer in the >> > patch description. Have you verified that that problem is still not >> > reproducible when you remove d_weak_revalidate? >> >> I did try the reproducer and it works as expected both with and without >> d_weak_revalidate. >> On reflection, the problem it displayed was caused by d_revalidate() >> being called when the dentry name was irrelevant. We remove that >> (fixing the problem) and introduce d_weak_revalidate because we thought >> that minimum functionality was still useful. I'm currently not >> convinced that even that is needed. >> >> If we discarded d_weak_revalidate(), we could get rid of the special >> handling of umount.... > > I like idea. I say go for it and we can see what (if anything) breaks? Getting rid of d_weak_revalidate is easy enough - hardly any users. Getting rid of filename_mountpoint() isn't so easy unfortunately. autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck in autofs4_d_manage()? It would be a shame to keep this infrastructure around just so that one part of autofs4 can talk to another part of autofs4. Do you know if the fact that filename_mountpoint() skips ->d_manage is important for sys_umount ?? Thanks, NeilBrown > -- > Jeff Layton <jlayton@redhat.com> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-16 2:43 ` NeilBrown @ 2017-08-16 11:34 ` Jeff Layton 2017-08-16 23:47 ` NeilBrown 2017-08-17 2:20 ` Ian Kent 0 siblings, 2 replies; 34+ messages in thread From: Jeff Layton @ 2017-08-16 11:34 UTC (permalink / raw) To: NeilBrown, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, Ian Kent On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote: > On Mon, Aug 14 2017, Jeff Layton wrote: > > > On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: > > > On Fri, Aug 11 2017, Jeff Layton wrote: > > > > > > > On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: > > > > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: > > > > > > Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock > > > > > > flag and introduced the d_weak_revalidate dentry operation instead. > > > > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks, > > > > > > and added the new dentry operation to NFS dentries .... but not to > > > > > > NFSv4 > > > > > > dentries. > > > > > > > > > > > > And nobody noticed. > > > > > > > > > > > > Until today. > > > > > > > > > > > > A customer reports a situation where mount(....,MS_REMOUNT,..) on an > > > > > > NFS > > > > > > filesystem hangs because the network has been deconfigured. This > > > > > > makes > > > > > > perfect sense and I suggested a code change to fix the problem. > > > > > > However when a colleague was trying to reproduce the problem to > > > > > > validate > > > > > > the fix, he couldn't. Then nor could I. > > > > > > > > > > > > The problem is trivially reproducible with NFSv3, and not at all with > > > > > > NFSv4. The reason is the missing d_weak_revalidate. > > > > > > > > > > > > We could simply add d_weak_revalidate for NFSv4, but given that it > > > > > > has been missing for 4.5 years, and the only time anyone noticed was > > > > > > when the ommission resulted in a better user experience, I do wonder > > > > > > if > > > > > > we need to. Can we just discard d_weak_revalidate? What purpose > > > > > > does > > > > > > it serve? I couldn't find one. > > > > > > > > > > > > Thanks, > > > > > > NeilBrown > > > > > > > > > > > > For reference, see > > > > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a > > > > > > d_weak_revalidate dentry op") > > > > > > > > > > > > > > > > > > > > > > > > To reproduce the problem at home, on a system that uses systemd: > > > > > > 1/ place (or find) a filesystem image in a file on an NFS filesystem. > > > > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4 > > > > > > 3/ loop-mount the filesystem image read-only somewhere > > > > > > 4/ reboot > > > > > > > > > > > > If you choose v4, the reboot will succeed, possibly after a 90second > > > > > > timeout. > > > > > > If you choose v3, the reboot will hang indefinitely in systemd- > > > > > > shutdown while > > > > > > remounting the nfs filesystem read-only. > > > > > > > > > > > > If you don't use "noac" it can still hang, but only if something > > > > > > slows > > > > > > down the reboot enough that attributes have timed out by the time > > > > > > that > > > > > > systemd-shutdown runs. This happens for our customer. > > > > > > > > > > > > If the loop-mounted filesystem is not read-only, you get other > > > > > > problems. > > > > > > > > > > > > We really want systemd to figure out that the loop-mount needs to be > > > > > > unmounted first. I have ideas concerning that, but it is messy. But > > > > > > that isn't the only bug here. > > > > > > > > > > The main purpose of d_weak_revalidate() was to catch the issues that > > > > > arise when someone changes the contents of the current working > > > > > directory or its parent on the server. Since '.' and '..' are treated > > > > > specially in the lookup code, they would not be revalidated without > > > > > special treatment. That leads to issues when looking up files as > > > > > ./<filename> or ../<filename>, since the client won't detect that its > > > > > dcache is stale until it tries to use the cached dentry+inode. > > > > > > > > > > The one thing that has changed since its introduction is, I believe, > > > > > the ESTALE handling in the VFS layer. That might fix a lot of the > > > > > dcache lookup bugs that were previously handled by d_weak_revalidate(). > > > > > I haven't done an audit to figure out if it actually can handle all of > > > > > them. > > > > > > > > > > > > > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: > > > > > > > > vfs: allow umount to handle mountpoints without revalidating them > > > > > > You say in the comment for that commit: > > > > > > but there > > > are cases where we do want to revalidate the root of the fs. > > > > > > Do you happen to remember what those cases are? > > > > > > > Not exactly, but I _think_ I might have been assuming that we needed to > > ensure that the inode attrs on the root were up to date after the > > pathwalk. > > > > I think that was probably wrong. d_revalidate is really intended to > > ensure that the dentry in question still points to the same inode. In > > the case of the root of the mount though, we don't really care about the > > dentry on the server at all. We're attaching the root of the mount to an > > inode and don't care of the dentry name changes. If we do need to ensure > > the inode attrs are updated, we'll just revalidate them at that point. > > > > > > > > > > Possibly the fact that we no longer try to revalidate during unmount > > > > means that this is no longer necessary? > > > > > > > > The original patch that added d_weak_revalidate had a reproducer in the > > > > patch description. Have you verified that that problem is still not > > > > reproducible when you remove d_weak_revalidate? > > > > > > I did try the reproducer and it works as expected both with and without > > > d_weak_revalidate. > > > On reflection, the problem it displayed was caused by d_revalidate() > > > being called when the dentry name was irrelevant. We remove that > > > (fixing the problem) and introduce d_weak_revalidate because we thought > > > that minimum functionality was still useful. I'm currently not > > > convinced that even that is needed. > > > > > > If we discarded d_weak_revalidate(), we could get rid of the special > > > handling of umount.... > > > > I like idea. I say go for it and we can see what (if anything) breaks? > > Getting rid of d_weak_revalidate is easy enough - hardly any users. > > Getting rid of filename_mountpoint() isn't so easy unfortunately. > autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck > in autofs4_d_manage()? It would be a shame to keep this infrastructure > around just so that one part of autofs4 can talk to another part of > autofs4. > > Do you know if the fact that filename_mountpoint() skips ->d_manage is > important for sys_umount ?? > > Thanks, > NeilBrown > (cc'ing David and Ian) I'm less familiar with the automounting machinery, but I imagine you don't really want to go triggering new mounts when your intent is to unmount something. As long as that doesn't happen I'd think we'd be ok here. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-16 11:34 ` Jeff Layton @ 2017-08-16 23:47 ` NeilBrown 2017-08-17 2:20 ` Ian Kent 1 sibling, 0 replies; 34+ messages in thread From: NeilBrown @ 2017-08-16 23:47 UTC (permalink / raw) To: Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, Ian Kent [-- Attachment #1: Type: text/plain, Size: 10210 bytes --] On Wed, Aug 16 2017, Jeff Layton wrote: > On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote: >> On Mon, Aug 14 2017, Jeff Layton wrote: >> >> > On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: >> > > On Fri, Aug 11 2017, Jeff Layton wrote: >> > > >> > > > On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >> > > > > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >> > > > > > Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >> > > > > > flag and introduced the d_weak_revalidate dentry operation instead. >> > > > > > We duly removed the flag from NFS superblocks and NFSv4 superblocks, >> > > > > > and added the new dentry operation to NFS dentries .... but not to >> > > > > > NFSv4 >> > > > > > dentries. >> > > > > > >> > > > > > And nobody noticed. >> > > > > > >> > > > > > Until today. >> > > > > > >> > > > > > A customer reports a situation where mount(....,MS_REMOUNT,..) on an >> > > > > > NFS >> > > > > > filesystem hangs because the network has been deconfigured. This >> > > > > > makes >> > > > > > perfect sense and I suggested a code change to fix the problem. >> > > > > > However when a colleague was trying to reproduce the problem to >> > > > > > validate >> > > > > > the fix, he couldn't. Then nor could I. >> > > > > > >> > > > > > The problem is trivially reproducible with NFSv3, and not at all with >> > > > > > NFSv4. The reason is the missing d_weak_revalidate. >> > > > > > >> > > > > > We could simply add d_weak_revalidate for NFSv4, but given that it >> > > > > > has been missing for 4.5 years, and the only time anyone noticed was >> > > > > > when the ommission resulted in a better user experience, I do wonder >> > > > > > if >> > > > > > we need to. Can we just discard d_weak_revalidate? What purpose >> > > > > > does >> > > > > > it serve? I couldn't find one. >> > > > > > >> > > > > > Thanks, >> > > > > > NeilBrown >> > > > > > >> > > > > > For reference, see >> > > > > > Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >> > > > > > d_weak_revalidate dentry op") >> > > > > > >> > > > > > >> > > > > > >> > > > > > To reproduce the problem at home, on a system that uses systemd: >> > > > > > 1/ place (or find) a filesystem image in a file on an NFS filesystem. >> > > > > > 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >> > > > > > 3/ loop-mount the filesystem image read-only somewhere >> > > > > > 4/ reboot >> > > > > > >> > > > > > If you choose v4, the reboot will succeed, possibly after a 90second >> > > > > > timeout. >> > > > > > If you choose v3, the reboot will hang indefinitely in systemd- >> > > > > > shutdown while >> > > > > > remounting the nfs filesystem read-only. >> > > > > > >> > > > > > If you don't use "noac" it can still hang, but only if something >> > > > > > slows >> > > > > > down the reboot enough that attributes have timed out by the time >> > > > > > that >> > > > > > systemd-shutdown runs. This happens for our customer. >> > > > > > >> > > > > > If the loop-mounted filesystem is not read-only, you get other >> > > > > > problems. >> > > > > > >> > > > > > We really want systemd to figure out that the loop-mount needs to be >> > > > > > unmounted first. I have ideas concerning that, but it is messy. But >> > > > > > that isn't the only bug here. >> > > > > >> > > > > The main purpose of d_weak_revalidate() was to catch the issues that >> > > > > arise when someone changes the contents of the current working >> > > > > directory or its parent on the server. Since '.' and '..' are treated >> > > > > specially in the lookup code, they would not be revalidated without >> > > > > special treatment. That leads to issues when looking up files as >> > > > > ./<filename> or ../<filename>, since the client won't detect that its >> > > > > dcache is stale until it tries to use the cached dentry+inode. >> > > > > >> > > > > The one thing that has changed since its introduction is, I believe, >> > > > > the ESTALE handling in the VFS layer. That might fix a lot of the >> > > > > dcache lookup bugs that were previously handled by d_weak_revalidate(). >> > > > > I haven't done an audit to figure out if it actually can handle all of >> > > > > them. >> > > > > >> > > > >> > > > It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: >> > > > >> > > > vfs: allow umount to handle mountpoints without revalidating them >> > > >> > > You say in the comment for that commit: >> > > >> > > but there >> > > are cases where we do want to revalidate the root of the fs. >> > > >> > > Do you happen to remember what those cases are? >> > > >> > >> > Not exactly, but I _think_ I might have been assuming that we needed to >> > ensure that the inode attrs on the root were up to date after the >> > pathwalk. >> > >> > I think that was probably wrong. d_revalidate is really intended to >> > ensure that the dentry in question still points to the same inode. In >> > the case of the root of the mount though, we don't really care about the >> > dentry on the server at all. We're attaching the root of the mount to an >> > inode and don't care of the dentry name changes. If we do need to ensure >> > the inode attrs are updated, we'll just revalidate them at that point. >> > >> > > > >> > > > Possibly the fact that we no longer try to revalidate during unmount >> > > > means that this is no longer necessary? >> > > > >> > > > The original patch that added d_weak_revalidate had a reproducer in the >> > > > patch description. Have you verified that that problem is still not >> > > > reproducible when you remove d_weak_revalidate? >> > > >> > > I did try the reproducer and it works as expected both with and without >> > > d_weak_revalidate. >> > > On reflection, the problem it displayed was caused by d_revalidate() >> > > being called when the dentry name was irrelevant. We remove that >> > > (fixing the problem) and introduce d_weak_revalidate because we thought >> > > that minimum functionality was still useful. I'm currently not >> > > convinced that even that is needed. >> > > >> > > If we discarded d_weak_revalidate(), we could get rid of the special >> > > handling of umount.... >> > >> > I like idea. I say go for it and we can see what (if anything) breaks? >> >> Getting rid of d_weak_revalidate is easy enough - hardly any users. >> >> Getting rid of filename_mountpoint() isn't so easy unfortunately. >> autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck >> in autofs4_d_manage()? It would be a shame to keep this infrastructure >> around just so that one part of autofs4 can talk to another part of >> autofs4. >> >> Do you know if the fact that filename_mountpoint() skips ->d_manage is >> important for sys_umount ?? >> >> Thanks, >> NeilBrown >> > > (cc'ing David and Ian) > > I'm less familiar with the automounting machinery, but I imagine you > don't really want to go triggering new mounts when your intent is to > unmount something. As long as that doesn't happen I'd think we'd be ok > here. New mounts don't get triggered unless asked for (LOOKUP_AUTOMOUNT) so that isn't a problem. But the d_manage() operation does get called, and it can block. I think the purpose of the interface is to avoid races with expiring automounts. Once the decision has been made to expire the mount, you want to be able to stop anyone getting a new access. d_manage() can do that. umount() probably doesn't need d_manage() to be skipped. autofs *does* for certain ioctls which are used to re-establish ownership of an automount (e.g. when restarting automountd). The owner never gets blocked, but if the new automountd isn't the owner yet, it needs help. The current code causes the lookup to never call d_manage(), which I think is overkill. There might be some other filesystem which uses d_manage() (not actually possible at present) and those shouldn't be ignored. So I came up with this. It isn't exactly elegant, but it should avoid the deadlock that caused autofs4 to start using kern_path_mount(). If we apply this change, and get rid of d_weak_revalidate(), then I think we can get rid of all the "mountpoint" pathname lookup code. Ian/David - you do have any thoughts on this approach? Thanks, NeilBrown diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h index beef981aa54f..478a9aafd6d3 100644 --- a/fs/autofs4/autofs_i.h +++ b/fs/autofs4/autofs_i.h @@ -271,3 +271,4 @@ static inline void autofs4_del_expiring(struct dentry *dentry) } void autofs4_kill_sb(struct super_block *); +int is_autofs_finder(void); diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c index dd9f1bebb5a3..f3e84bfed85a 100644 --- a/fs/autofs4/dev-ioctl.c +++ b/fs/autofs4/dev-ioctl.c @@ -200,6 +200,13 @@ static int autofs_dev_ioctl_protosubver(struct file *fp, return 0; } +static DEFINE_MUTEX(autofs_find); +static struct task_struct *autofs_finder = NULL; +int is_autofs_finder(void) +{ + return autofs_finder == current; +} + /* Find the topmost mount satisfying test() */ static int find_autofs_mount(const char *pathname, struct path *res, @@ -209,7 +216,12 @@ static int find_autofs_mount(const char *pathname, struct path path; int err; - err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0); + mutex_lock(&autofs_find); + autofs_finder = current; + err = kern_path(pathname, 0, &path); + autofs_finder = NULL; + mutex_unlock(&autofs_find); + if (err) return err; err = -ENOENT; diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c index d79ced925861..e93a3112d60c 100644 --- a/fs/autofs4/root.c +++ b/fs/autofs4/root.c @@ -438,7 +438,7 @@ static int autofs4_d_manage(const struct path *path, bool rcu_walk) pr_debug("dentry=%p %pd\n", dentry, dentry); /* The daemon never waits. */ - if (autofs4_oz_mode(sbi)) { + if (autofs4_oz_mode(sbi) || is_autofs_finder()) { if (!path_is_mountpoint(path)) return -EISDIR; return 0; [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-16 11:34 ` Jeff Layton 2017-08-16 23:47 ` NeilBrown @ 2017-08-17 2:20 ` Ian Kent 2017-08-18 5:24 ` NeilBrown 1 sibling, 1 reply; 34+ messages in thread From: Ian Kent @ 2017-08-17 2:20 UTC (permalink / raw) To: Jeff Layton, NeilBrown, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells On 16/08/17 19:34, Jeff Layton wrote: > On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote: >> On Mon, Aug 14 2017, Jeff Layton wrote: >> >>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: >>>> On Fri, Aug 11 2017, Jeff Layton wrote: >>>> >>>>> On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >>>>>>> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >>>>>>> flag and introduced the d_weak_revalidate dentry operation instead. >>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks, >>>>>>> and added the new dentry operation to NFS dentries .... but not to >>>>>>> NFSv4 >>>>>>> dentries. >>>>>>> >>>>>>> And nobody noticed. >>>>>>> >>>>>>> Until today. >>>>>>> >>>>>>> A customer reports a situation where mount(....,MS_REMOUNT,..) on an >>>>>>> NFS >>>>>>> filesystem hangs because the network has been deconfigured. This >>>>>>> makes >>>>>>> perfect sense and I suggested a code change to fix the problem. >>>>>>> However when a colleague was trying to reproduce the problem to >>>>>>> validate >>>>>>> the fix, he couldn't. Then nor could I. >>>>>>> >>>>>>> The problem is trivially reproducible with NFSv3, and not at all with >>>>>>> NFSv4. The reason is the missing d_weak_revalidate. >>>>>>> >>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it >>>>>>> has been missing for 4.5 years, and the only time anyone noticed was >>>>>>> when the ommission resulted in a better user experience, I do wonder >>>>>>> if >>>>>>> we need to. Can we just discard d_weak_revalidate? What purpose >>>>>>> does >>>>>>> it serve? I couldn't find one. >>>>>>> >>>>>>> Thanks, >>>>>>> NeilBrown >>>>>>> >>>>>>> For reference, see >>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >>>>>>> d_weak_revalidate dentry op") >>>>>>> >>>>>>> >>>>>>> >>>>>>> To reproduce the problem at home, on a system that uses systemd: >>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem. >>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >>>>>>> 3/ loop-mount the filesystem image read-only somewhere >>>>>>> 4/ reboot >>>>>>> >>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second >>>>>>> timeout. >>>>>>> If you choose v3, the reboot will hang indefinitely in systemd- >>>>>>> shutdown while >>>>>>> remounting the nfs filesystem read-only. >>>>>>> >>>>>>> If you don't use "noac" it can still hang, but only if something >>>>>>> slows >>>>>>> down the reboot enough that attributes have timed out by the time >>>>>>> that >>>>>>> systemd-shutdown runs. This happens for our customer. >>>>>>> >>>>>>> If the loop-mounted filesystem is not read-only, you get other >>>>>>> problems. >>>>>>> >>>>>>> We really want systemd to figure out that the loop-mount needs to be >>>>>>> unmounted first. I have ideas concerning that, but it is messy. But >>>>>>> that isn't the only bug here. >>>>>> >>>>>> The main purpose of d_weak_revalidate() was to catch the issues that >>>>>> arise when someone changes the contents of the current working >>>>>> directory or its parent on the server. Since '.' and '..' are treated >>>>>> specially in the lookup code, they would not be revalidated without >>>>>> special treatment. That leads to issues when looking up files as >>>>>> ./<filename> or ../<filename>, since the client won't detect that its >>>>>> dcache is stale until it tries to use the cached dentry+inode. >>>>>> >>>>>> The one thing that has changed since its introduction is, I believe, >>>>>> the ESTALE handling in the VFS layer. That might fix a lot of the >>>>>> dcache lookup bugs that were previously handled by d_weak_revalidate(). >>>>>> I haven't done an audit to figure out if it actually can handle all of >>>>>> them. >>>>>> >>>>> >>>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: >>>>> >>>>> vfs: allow umount to handle mountpoints without revalidating them >>>> >>>> You say in the comment for that commit: >>>> >>>> but there >>>> are cases where we do want to revalidate the root of the fs. >>>> >>>> Do you happen to remember what those cases are? >>>> >>> >>> Not exactly, but I _think_ I might have been assuming that we needed to >>> ensure that the inode attrs on the root were up to date after the >>> pathwalk. >>> >>> I think that was probably wrong. d_revalidate is really intended to >>> ensure that the dentry in question still points to the same inode. In >>> the case of the root of the mount though, we don't really care about the >>> dentry on the server at all. We're attaching the root of the mount to an >>> inode and don't care of the dentry name changes. If we do need to ensure >>> the inode attrs are updated, we'll just revalidate them at that point. >>> >>>>> >>>>> Possibly the fact that we no longer try to revalidate during unmount >>>>> means that this is no longer necessary? >>>>> >>>>> The original patch that added d_weak_revalidate had a reproducer in the >>>>> patch description. Have you verified that that problem is still not >>>>> reproducible when you remove d_weak_revalidate? >>>> >>>> I did try the reproducer and it works as expected both with and without >>>> d_weak_revalidate. >>>> On reflection, the problem it displayed was caused by d_revalidate() >>>> being called when the dentry name was irrelevant. We remove that >>>> (fixing the problem) and introduce d_weak_revalidate because we thought >>>> that minimum functionality was still useful. I'm currently not >>>> convinced that even that is needed. >>>> >>>> If we discarded d_weak_revalidate(), we could get rid of the special >>>> handling of umount.... >>> >>> I like idea. I say go for it and we can see what (if anything) breaks? >> >> Getting rid of d_weak_revalidate is easy enough - hardly any users. >> >> Getting rid of filename_mountpoint() isn't so easy unfortunately. >> autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck >> in autofs4_d_manage()? It would be a shame to keep this infrastructure >> around just so that one part of autofs4 can talk to another part of >> autofs4. When this was first implemented autofs didn't use kern_path_mountpoint() (it didn't exist) it used a path lookup on the parent and a separate lookup for the last component. It's used for two operations, first to open a file handle to a (possibly) covered autofs mount, and second to get mounted information about a path without following past a (possibly covered) autofs mount. It's less about not triggering an automount or getting stuck in ->d_manage() and more about resolving paths that are not accessible via normal vfs walks. I never thought about re-validation for either of these cases and altering it to the way it was before filename_mountpoint() shouldn't be a problem. >> >> Do you know if the fact that filename_mountpoint() skips ->d_manage is >> important for sys_umount ?? That's a lot harder to answer, d_manage() is needed by autofs to detect if a path walk needs to block because of some concurrent activity, usually to avoid concurrent expire (indirectly a umount) to mount races. But it isn't the umount it's the autofs expire that sets the flag used to block walks (sends it to the ref-walk path), the (potential) umount comes later as a result of the expire and the flag is cleared after expire completion. I can talk about why ->d_manage() isn't (shouldn't) be needed for other vfs automount users such as NFS etc. if needed. So filename_mountpoint() skiping ->d_manage shouldn't matter to autofs at all. If it goes away then autofs will need to revert to an alternate way (similar to the way it was originally done) of doing the lookup. >> >> Thanks, >> NeilBrown >> > > (cc'ing David and Ian) > > I'm less familiar with the automounting machinery, but I imagine you > don't really want to go triggering new mounts when your intent is to > unmount something. As long as that doesn't happen I'd think we'd be ok > here. Ian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-17 2:20 ` Ian Kent @ 2017-08-18 5:24 ` NeilBrown 2017-08-18 6:47 ` Ian Kent 0 siblings, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-08-18 5:24 UTC (permalink / raw) To: Ian Kent, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells [-- Attachment #1: Type: text/plain, Size: 7398 bytes --] On Thu, Aug 17 2017, Ian Kent wrote: > On 16/08/17 19:34, Jeff Layton wrote: >> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote: >>> On Mon, Aug 14 2017, Jeff Layton wrote: >>> >>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: >>>>> On Fri, Aug 11 2017, Jeff Layton wrote: >>>>> >>>>>> On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >>>>>>>> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead. >>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks, >>>>>>>> and added the new dentry operation to NFS dentries .... but not to >>>>>>>> NFSv4 >>>>>>>> dentries. >>>>>>>> >>>>>>>> And nobody noticed. >>>>>>>> >>>>>>>> Until today. >>>>>>>> >>>>>>>> A customer reports a situation where mount(....,MS_REMOUNT,..) on an >>>>>>>> NFS >>>>>>>> filesystem hangs because the network has been deconfigured. This >>>>>>>> makes >>>>>>>> perfect sense and I suggested a code change to fix the problem. >>>>>>>> However when a colleague was trying to reproduce the problem to >>>>>>>> validate >>>>>>>> the fix, he couldn't. Then nor could I. >>>>>>>> >>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with >>>>>>>> NFSv4. The reason is the missing d_weak_revalidate. >>>>>>>> >>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it >>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was >>>>>>>> when the ommission resulted in a better user experience, I do wonder >>>>>>>> if >>>>>>>> we need to. Can we just discard d_weak_revalidate? What purpose >>>>>>>> does >>>>>>>> it serve? I couldn't find one. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> NeilBrown >>>>>>>> >>>>>>>> For reference, see >>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >>>>>>>> d_weak_revalidate dentry op") >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> To reproduce the problem at home, on a system that uses systemd: >>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem. >>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >>>>>>>> 3/ loop-mount the filesystem image read-only somewhere >>>>>>>> 4/ reboot >>>>>>>> >>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second >>>>>>>> timeout. >>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd- >>>>>>>> shutdown while >>>>>>>> remounting the nfs filesystem read-only. >>>>>>>> >>>>>>>> If you don't use "noac" it can still hang, but only if something >>>>>>>> slows >>>>>>>> down the reboot enough that attributes have timed out by the time >>>>>>>> that >>>>>>>> systemd-shutdown runs. This happens for our customer. >>>>>>>> >>>>>>>> If the loop-mounted filesystem is not read-only, you get other >>>>>>>> problems. >>>>>>>> >>>>>>>> We really want systemd to figure out that the loop-mount needs to be >>>>>>>> unmounted first. I have ideas concerning that, but it is messy. But >>>>>>>> that isn't the only bug here. >>>>>>> >>>>>>> The main purpose of d_weak_revalidate() was to catch the issues that >>>>>>> arise when someone changes the contents of the current working >>>>>>> directory or its parent on the server. Since '.' and '..' are treated >>>>>>> specially in the lookup code, they would not be revalidated without >>>>>>> special treatment. That leads to issues when looking up files as >>>>>>> ./<filename> or ../<filename>, since the client won't detect that its >>>>>>> dcache is stale until it tries to use the cached dentry+inode. >>>>>>> >>>>>>> The one thing that has changed since its introduction is, I believe, >>>>>>> the ESTALE handling in the VFS layer. That might fix a lot of the >>>>>>> dcache lookup bugs that were previously handled by d_weak_revalidate(). >>>>>>> I haven't done an audit to figure out if it actually can handle all of >>>>>>> them. >>>>>>> >>>>>> >>>>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: >>>>>> >>>>>> vfs: allow umount to handle mountpoints without revalidating them >>>>> >>>>> You say in the comment for that commit: >>>>> >>>>> but there >>>>> are cases where we do want to revalidate the root of the fs. >>>>> >>>>> Do you happen to remember what those cases are? >>>>> >>>> >>>> Not exactly, but I _think_ I might have been assuming that we needed to >>>> ensure that the inode attrs on the root were up to date after the >>>> pathwalk. >>>> >>>> I think that was probably wrong. d_revalidate is really intended to >>>> ensure that the dentry in question still points to the same inode. In >>>> the case of the root of the mount though, we don't really care about the >>>> dentry on the server at all. We're attaching the root of the mount to an >>>> inode and don't care of the dentry name changes. If we do need to ensure >>>> the inode attrs are updated, we'll just revalidate them at that point. >>>> >>>>>> >>>>>> Possibly the fact that we no longer try to revalidate during unmount >>>>>> means that this is no longer necessary? >>>>>> >>>>>> The original patch that added d_weak_revalidate had a reproducer in the >>>>>> patch description. Have you verified that that problem is still not >>>>>> reproducible when you remove d_weak_revalidate? >>>>> >>>>> I did try the reproducer and it works as expected both with and without >>>>> d_weak_revalidate. >>>>> On reflection, the problem it displayed was caused by d_revalidate() >>>>> being called when the dentry name was irrelevant. We remove that >>>>> (fixing the problem) and introduce d_weak_revalidate because we thought >>>>> that minimum functionality was still useful. I'm currently not >>>>> convinced that even that is needed. >>>>> >>>>> If we discarded d_weak_revalidate(), we could get rid of the special >>>>> handling of umount.... >>>> >>>> I like idea. I say go for it and we can see what (if anything) breaks? >>> >>> Getting rid of d_weak_revalidate is easy enough - hardly any users. >>> >>> Getting rid of filename_mountpoint() isn't so easy unfortunately. >>> autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck >>> in autofs4_d_manage()? It would be a shame to keep this infrastructure >>> around just so that one part of autofs4 can talk to another part of >>> autofs4. > > When this was first implemented autofs didn't use kern_path_mountpoint() > (it didn't exist) it used a path lookup on the parent and a separate > lookup for the last component. This was before commit 4e44b6852e03 ("Get rid of path_lookup in autofs4"). This used kern_path(). I'm more interested in commit ac8387199656 ("autofs4 - fix device ioctl mount lookup") which replaced the use of kern_path() with kern_path_mountpoint(). > > It's used for two operations, first to open a file handle to a (possibly) > covered autofs mount, and second to get mounted information about a path > without following past a (possibly covered) autofs mount. > > It's less about not triggering an automount or getting stuck in ->d_manage() > and more about resolving paths that are not accessible via normal vfs walks. > > I never thought about re-validation for either of these cases and altering > it to the way it was before filename_mountpoint() shouldn't be a > problem. If it shouldn't be a problem, what justified ac8387199656?? Thanks, NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-18 5:24 ` NeilBrown @ 2017-08-18 6:47 ` Ian Kent 2017-08-18 6:55 ` Ian Kent 2017-08-21 6:23 ` NeilBrown 0 siblings, 2 replies; 34+ messages in thread From: Ian Kent @ 2017-08-18 6:47 UTC (permalink / raw) To: NeilBrown, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells On 18/08/17 13:24, NeilBrown wrote: > On Thu, Aug 17 2017, Ian Kent wrote: > >> On 16/08/17 19:34, Jeff Layton wrote: >>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote: >>>> On Mon, Aug 14 2017, Jeff Layton wrote: >>>> >>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: >>>>>> On Fri, Aug 11 2017, Jeff Layton wrote: >>>>>> >>>>>>> On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >>>>>>>>> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead. >>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks, >>>>>>>>> and added the new dentry operation to NFS dentries .... but not to >>>>>>>>> NFSv4 >>>>>>>>> dentries. >>>>>>>>> >>>>>>>>> And nobody noticed. >>>>>>>>> >>>>>>>>> Until today. >>>>>>>>> >>>>>>>>> A customer reports a situation where mount(....,MS_REMOUNT,..) on an >>>>>>>>> NFS >>>>>>>>> filesystem hangs because the network has been deconfigured. This >>>>>>>>> makes >>>>>>>>> perfect sense and I suggested a code change to fix the problem. >>>>>>>>> However when a colleague was trying to reproduce the problem to >>>>>>>>> validate >>>>>>>>> the fix, he couldn't. Then nor could I. >>>>>>>>> >>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with >>>>>>>>> NFSv4. The reason is the missing d_weak_revalidate. >>>>>>>>> >>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it >>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was >>>>>>>>> when the ommission resulted in a better user experience, I do wonder >>>>>>>>> if >>>>>>>>> we need to. Can we just discard d_weak_revalidate? What purpose >>>>>>>>> does >>>>>>>>> it serve? I couldn't find one. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> NeilBrown >>>>>>>>> >>>>>>>>> For reference, see >>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >>>>>>>>> d_weak_revalidate dentry op") >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> To reproduce the problem at home, on a system that uses systemd: >>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem. >>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere >>>>>>>>> 4/ reboot >>>>>>>>> >>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second >>>>>>>>> timeout. >>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd- >>>>>>>>> shutdown while >>>>>>>>> remounting the nfs filesystem read-only. >>>>>>>>> >>>>>>>>> If you don't use "noac" it can still hang, but only if something >>>>>>>>> slows >>>>>>>>> down the reboot enough that attributes have timed out by the time >>>>>>>>> that >>>>>>>>> systemd-shutdown runs. This happens for our customer. >>>>>>>>> >>>>>>>>> If the loop-mounted filesystem is not read-only, you get other >>>>>>>>> problems. >>>>>>>>> >>>>>>>>> We really want systemd to figure out that the loop-mount needs to be >>>>>>>>> unmounted first. I have ideas concerning that, but it is messy. But >>>>>>>>> that isn't the only bug here. >>>>>>>> >>>>>>>> The main purpose of d_weak_revalidate() was to catch the issues that >>>>>>>> arise when someone changes the contents of the current working >>>>>>>> directory or its parent on the server. Since '.' and '..' are treated >>>>>>>> specially in the lookup code, they would not be revalidated without >>>>>>>> special treatment. That leads to issues when looking up files as >>>>>>>> ./<filename> or ../<filename>, since the client won't detect that its >>>>>>>> dcache is stale until it tries to use the cached dentry+inode. >>>>>>>> >>>>>>>> The one thing that has changed since its introduction is, I believe, >>>>>>>> the ESTALE handling in the VFS layer. That might fix a lot of the >>>>>>>> dcache lookup bugs that were previously handled by d_weak_revalidate(). >>>>>>>> I haven't done an audit to figure out if it actually can handle all of >>>>>>>> them. >>>>>>>> >>>>>>> >>>>>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: >>>>>>> >>>>>>> vfs: allow umount to handle mountpoints without revalidating them >>>>>> >>>>>> You say in the comment for that commit: >>>>>> >>>>>> but there >>>>>> are cases where we do want to revalidate the root of the fs. >>>>>> >>>>>> Do you happen to remember what those cases are? >>>>>> >>>>> >>>>> Not exactly, but I _think_ I might have been assuming that we needed to >>>>> ensure that the inode attrs on the root were up to date after the >>>>> pathwalk. >>>>> >>>>> I think that was probably wrong. d_revalidate is really intended to >>>>> ensure that the dentry in question still points to the same inode. In >>>>> the case of the root of the mount though, we don't really care about the >>>>> dentry on the server at all. We're attaching the root of the mount to an >>>>> inode and don't care of the dentry name changes. If we do need to ensure >>>>> the inode attrs are updated, we'll just revalidate them at that point. >>>>> >>>>>>> >>>>>>> Possibly the fact that we no longer try to revalidate during unmount >>>>>>> means that this is no longer necessary? >>>>>>> >>>>>>> The original patch that added d_weak_revalidate had a reproducer in the >>>>>>> patch description. Have you verified that that problem is still not >>>>>>> reproducible when you remove d_weak_revalidate? >>>>>> >>>>>> I did try the reproducer and it works as expected both with and without >>>>>> d_weak_revalidate. >>>>>> On reflection, the problem it displayed was caused by d_revalidate() >>>>>> being called when the dentry name was irrelevant. We remove that >>>>>> (fixing the problem) and introduce d_weak_revalidate because we thought >>>>>> that minimum functionality was still useful. I'm currently not >>>>>> convinced that even that is needed. >>>>>> >>>>>> If we discarded d_weak_revalidate(), we could get rid of the special >>>>>> handling of umount.... >>>>> >>>>> I like idea. I say go for it and we can see what (if anything) breaks? >>>> >>>> Getting rid of d_weak_revalidate is easy enough - hardly any users. >>>> >>>> Getting rid of filename_mountpoint() isn't so easy unfortunately. >>>> autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck >>>> in autofs4_d_manage()? It would be a shame to keep this infrastructure >>>> around just so that one part of autofs4 can talk to another part of >>>> autofs4. >> >> When this was first implemented autofs didn't use kern_path_mountpoint() >> (it didn't exist) it used a path lookup on the parent and a separate >> lookup for the last component. > > This was before commit 4e44b6852e03 ("Get rid of path_lookup in > autofs4"). This used kern_path(). I have to plead not guilty of this one. IIRC it broke the requirement of "lookup parent then lookup last component" rather it walked the whole path then followed up to find the mount point struct path. Like it says in the description of ac8387199656 the caller might not yet "own" the autofs mount which causes a mount to be triggered during the walk that can't be satisfied because of the deadlock that occurs. > > I'm more interested in commit ac8387199656 ("autofs4 - fix device ioctl > mount lookup") which replaced the use of kern_path() with > kern_path_mountpoint(). Probably should have had a Fixes: 4e44b6852e03 ... > >> >> It's used for two operations, first to open a file handle to a (possibly) >> covered autofs mount, and second to get mounted information about a path >> without following past a (possibly covered) autofs mount. >> >> It's less about not triggering an automount or getting stuck in ->d_manage() >> and more about resolving paths that are not accessible via normal vfs walks. >> >> I never thought about re-validation for either of these cases and altering >> it to the way it was before filename_mountpoint() shouldn't be a >> problem. > > If it shouldn't be a problem, what justified ac8387199656?? > > Thanks, > NeilBrown > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-18 6:47 ` Ian Kent @ 2017-08-18 6:55 ` Ian Kent 2017-08-21 6:23 ` NeilBrown 1 sibling, 0 replies; 34+ messages in thread From: Ian Kent @ 2017-08-18 6:55 UTC (permalink / raw) To: NeilBrown, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells On 18/08/17 14:47, Ian Kent wrote: > On 18/08/17 13:24, NeilBrown wrote: >> On Thu, Aug 17 2017, Ian Kent wrote: >> >>> On 16/08/17 19:34, Jeff Layton wrote: >>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote: >>>>> On Mon, Aug 14 2017, Jeff Layton wrote: >>>>> >>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: >>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote: >>>>>>> >>>>>>>> On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >>>>>>>>>> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead. >>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks, >>>>>>>>>> and added the new dentry operation to NFS dentries .... but not to >>>>>>>>>> NFSv4 >>>>>>>>>> dentries. >>>>>>>>>> >>>>>>>>>> And nobody noticed. >>>>>>>>>> >>>>>>>>>> Until today. >>>>>>>>>> >>>>>>>>>> A customer reports a situation where mount(....,MS_REMOUNT,..) on an >>>>>>>>>> NFS >>>>>>>>>> filesystem hangs because the network has been deconfigured. This >>>>>>>>>> makes >>>>>>>>>> perfect sense and I suggested a code change to fix the problem. >>>>>>>>>> However when a colleague was trying to reproduce the problem to >>>>>>>>>> validate >>>>>>>>>> the fix, he couldn't. Then nor could I. >>>>>>>>>> >>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with >>>>>>>>>> NFSv4. The reason is the missing d_weak_revalidate. >>>>>>>>>> >>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it >>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was >>>>>>>>>> when the ommission resulted in a better user experience, I do wonder >>>>>>>>>> if >>>>>>>>>> we need to. Can we just discard d_weak_revalidate? What purpose >>>>>>>>>> does >>>>>>>>>> it serve? I couldn't find one. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> NeilBrown >>>>>>>>>> >>>>>>>>>> For reference, see >>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >>>>>>>>>> d_weak_revalidate dentry op") >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> To reproduce the problem at home, on a system that uses systemd: >>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem. >>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere >>>>>>>>>> 4/ reboot >>>>>>>>>> >>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second >>>>>>>>>> timeout. >>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd- >>>>>>>>>> shutdown while >>>>>>>>>> remounting the nfs filesystem read-only. >>>>>>>>>> >>>>>>>>>> If you don't use "noac" it can still hang, but only if something >>>>>>>>>> slows >>>>>>>>>> down the reboot enough that attributes have timed out by the time >>>>>>>>>> that >>>>>>>>>> systemd-shutdown runs. This happens for our customer. >>>>>>>>>> >>>>>>>>>> If the loop-mounted filesystem is not read-only, you get other >>>>>>>>>> problems. >>>>>>>>>> >>>>>>>>>> We really want systemd to figure out that the loop-mount needs to be >>>>>>>>>> unmounted first. I have ideas concerning that, but it is messy. But >>>>>>>>>> that isn't the only bug here. >>>>>>>>> >>>>>>>>> The main purpose of d_weak_revalidate() was to catch the issues that >>>>>>>>> arise when someone changes the contents of the current working >>>>>>>>> directory or its parent on the server. Since '.' and '..' are treated >>>>>>>>> specially in the lookup code, they would not be revalidated without >>>>>>>>> special treatment. That leads to issues when looking up files as >>>>>>>>> ./<filename> or ../<filename>, since the client won't detect that its >>>>>>>>> dcache is stale until it tries to use the cached dentry+inode. >>>>>>>>> >>>>>>>>> The one thing that has changed since its introduction is, I believe, >>>>>>>>> the ESTALE handling in the VFS layer. That might fix a lot of the >>>>>>>>> dcache lookup bugs that were previously handled by d_weak_revalidate(). >>>>>>>>> I haven't done an audit to figure out if it actually can handle all of >>>>>>>>> them. >>>>>>>>> >>>>>>>> >>>>>>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: >>>>>>>> >>>>>>>> vfs: allow umount to handle mountpoints without revalidating them >>>>>>> >>>>>>> You say in the comment for that commit: >>>>>>> >>>>>>> but there >>>>>>> are cases where we do want to revalidate the root of the fs. >>>>>>> >>>>>>> Do you happen to remember what those cases are? >>>>>>> >>>>>> >>>>>> Not exactly, but I _think_ I might have been assuming that we needed to >>>>>> ensure that the inode attrs on the root were up to date after the >>>>>> pathwalk. >>>>>> >>>>>> I think that was probably wrong. d_revalidate is really intended to >>>>>> ensure that the dentry in question still points to the same inode. In >>>>>> the case of the root of the mount though, we don't really care about the >>>>>> dentry on the server at all. We're attaching the root of the mount to an >>>>>> inode and don't care of the dentry name changes. If we do need to ensure >>>>>> the inode attrs are updated, we'll just revalidate them at that point. >>>>>> >>>>>>>> >>>>>>>> Possibly the fact that we no longer try to revalidate during unmount >>>>>>>> means that this is no longer necessary? >>>>>>>> >>>>>>>> The original patch that added d_weak_revalidate had a reproducer in the >>>>>>>> patch description. Have you verified that that problem is still not >>>>>>>> reproducible when you remove d_weak_revalidate? >>>>>>> >>>>>>> I did try the reproducer and it works as expected both with and without >>>>>>> d_weak_revalidate. >>>>>>> On reflection, the problem it displayed was caused by d_revalidate() >>>>>>> being called when the dentry name was irrelevant. We remove that >>>>>>> (fixing the problem) and introduce d_weak_revalidate because we thought >>>>>>> that minimum functionality was still useful. I'm currently not >>>>>>> convinced that even that is needed. >>>>>>> >>>>>>> If we discarded d_weak_revalidate(), we could get rid of the special >>>>>>> handling of umount.... >>>>>> >>>>>> I like idea. I say go for it and we can see what (if anything) breaks? >>>>> >>>>> Getting rid of d_weak_revalidate is easy enough - hardly any users. >>>>> >>>>> Getting rid of filename_mountpoint() isn't so easy unfortunately. >>>>> autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck >>>>> in autofs4_d_manage()? It would be a shame to keep this infrastructure >>>>> around just so that one part of autofs4 can talk to another part of >>>>> autofs4. >>> >>> When this was first implemented autofs didn't use kern_path_mountpoint() >>> (it didn't exist) it used a path lookup on the parent and a separate >>> lookup for the last component. >> >> This was before commit 4e44b6852e03 ("Get rid of path_lookup in >> autofs4"). This used kern_path(). > > I have to plead not guilty of this one. > > IIRC it broke the requirement of "lookup parent then lookup last component" > rather it walked the whole path then followed up to find the mount point > struct path. > > Like it says in the description of ac8387199656 the caller might not yet > "own" the autofs mount which causes a mount to be triggered during the > walk that can't be satisfied because of the deadlock that occurs. Also, 4e44b6852e03 fixed another mistake I had made with: - if (nd.path.mnt->mnt_mountpoint != nd.path.mnt->mnt_root) { - if (follow_down(&nd.path.mnt, &nd.path.dentry)) { - struct inode *inode = nd.path.dentry->d_inode; - magic = inode->i_sb->s_magic; - } + if (path.mnt->mnt_mountpoint != path.mnt->mnt_root) { + if (follow_down(&path.mnt, &path.dentry)) + magic = path.mnt->mnt_sb->s_magic; } > >> >> I'm more interested in commit ac8387199656 ("autofs4 - fix device ioctl >> mount lookup") which replaced the use of kern_path() with >> kern_path_mountpoint(). > > Probably should have had a Fixes: 4e44b6852e03 ... > >> >>> >>> It's used for two operations, first to open a file handle to a (possibly) >>> covered autofs mount, and second to get mounted information about a path >>> without following past a (possibly covered) autofs mount. >>> >>> It's less about not triggering an automount or getting stuck in ->d_manage() >>> and more about resolving paths that are not accessible via normal vfs walks. >>> >>> I never thought about re-validation for either of these cases and altering >>> it to the way it was before filename_mountpoint() shouldn't be a >>> problem. >> >> If it shouldn't be a problem, what justified ac8387199656?? >> >> Thanks, >> NeilBrown >> > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-18 6:47 ` Ian Kent 2017-08-18 6:55 ` Ian Kent @ 2017-08-21 6:23 ` NeilBrown 2017-08-21 6:32 ` Ian Kent 1 sibling, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-08-21 6:23 UTC (permalink / raw) To: Ian Kent, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells [-- Attachment #1: Type: text/plain, Size: 8224 bytes --] On Fri, Aug 18 2017, Ian Kent wrote: > On 18/08/17 13:24, NeilBrown wrote: >> On Thu, Aug 17 2017, Ian Kent wrote: >> >>> On 16/08/17 19:34, Jeff Layton wrote: >>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote: >>>>> On Mon, Aug 14 2017, Jeff Layton wrote: >>>>> >>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: >>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote: >>>>>>> >>>>>>>> On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >>>>>>>>>> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead. >>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks, >>>>>>>>>> and added the new dentry operation to NFS dentries .... but not to >>>>>>>>>> NFSv4 >>>>>>>>>> dentries. >>>>>>>>>> >>>>>>>>>> And nobody noticed. >>>>>>>>>> >>>>>>>>>> Until today. >>>>>>>>>> >>>>>>>>>> A customer reports a situation where mount(....,MS_REMOUNT,..) on an >>>>>>>>>> NFS >>>>>>>>>> filesystem hangs because the network has been deconfigured. This >>>>>>>>>> makes >>>>>>>>>> perfect sense and I suggested a code change to fix the problem. >>>>>>>>>> However when a colleague was trying to reproduce the problem to >>>>>>>>>> validate >>>>>>>>>> the fix, he couldn't. Then nor could I. >>>>>>>>>> >>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with >>>>>>>>>> NFSv4. The reason is the missing d_weak_revalidate. >>>>>>>>>> >>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it >>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was >>>>>>>>>> when the ommission resulted in a better user experience, I do wonder >>>>>>>>>> if >>>>>>>>>> we need to. Can we just discard d_weak_revalidate? What purpose >>>>>>>>>> does >>>>>>>>>> it serve? I couldn't find one. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> NeilBrown >>>>>>>>>> >>>>>>>>>> For reference, see >>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >>>>>>>>>> d_weak_revalidate dentry op") >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> To reproduce the problem at home, on a system that uses systemd: >>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem. >>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere >>>>>>>>>> 4/ reboot >>>>>>>>>> >>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second >>>>>>>>>> timeout. >>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd- >>>>>>>>>> shutdown while >>>>>>>>>> remounting the nfs filesystem read-only. >>>>>>>>>> >>>>>>>>>> If you don't use "noac" it can still hang, but only if something >>>>>>>>>> slows >>>>>>>>>> down the reboot enough that attributes have timed out by the time >>>>>>>>>> that >>>>>>>>>> systemd-shutdown runs. This happens for our customer. >>>>>>>>>> >>>>>>>>>> If the loop-mounted filesystem is not read-only, you get other >>>>>>>>>> problems. >>>>>>>>>> >>>>>>>>>> We really want systemd to figure out that the loop-mount needs to be >>>>>>>>>> unmounted first. I have ideas concerning that, but it is messy. But >>>>>>>>>> that isn't the only bug here. >>>>>>>>> >>>>>>>>> The main purpose of d_weak_revalidate() was to catch the issues that >>>>>>>>> arise when someone changes the contents of the current working >>>>>>>>> directory or its parent on the server. Since '.' and '..' are treated >>>>>>>>> specially in the lookup code, they would not be revalidated without >>>>>>>>> special treatment. That leads to issues when looking up files as >>>>>>>>> ./<filename> or ../<filename>, since the client won't detect that its >>>>>>>>> dcache is stale until it tries to use the cached dentry+inode. >>>>>>>>> >>>>>>>>> The one thing that has changed since its introduction is, I believe, >>>>>>>>> the ESTALE handling in the VFS layer. That might fix a lot of the >>>>>>>>> dcache lookup bugs that were previously handled by d_weak_revalidate(). >>>>>>>>> I haven't done an audit to figure out if it actually can handle all of >>>>>>>>> them. >>>>>>>>> >>>>>>>> >>>>>>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: >>>>>>>> >>>>>>>> vfs: allow umount to handle mountpoints without revalidating them >>>>>>> >>>>>>> You say in the comment for that commit: >>>>>>> >>>>>>> but there >>>>>>> are cases where we do want to revalidate the root of the fs. >>>>>>> >>>>>>> Do you happen to remember what those cases are? >>>>>>> >>>>>> >>>>>> Not exactly, but I _think_ I might have been assuming that we needed to >>>>>> ensure that the inode attrs on the root were up to date after the >>>>>> pathwalk. >>>>>> >>>>>> I think that was probably wrong. d_revalidate is really intended to >>>>>> ensure that the dentry in question still points to the same inode. In >>>>>> the case of the root of the mount though, we don't really care about the >>>>>> dentry on the server at all. We're attaching the root of the mount to an >>>>>> inode and don't care of the dentry name changes. If we do need to ensure >>>>>> the inode attrs are updated, we'll just revalidate them at that point. >>>>>> >>>>>>>> >>>>>>>> Possibly the fact that we no longer try to revalidate during unmount >>>>>>>> means that this is no longer necessary? >>>>>>>> >>>>>>>> The original patch that added d_weak_revalidate had a reproducer in the >>>>>>>> patch description. Have you verified that that problem is still not >>>>>>>> reproducible when you remove d_weak_revalidate? >>>>>>> >>>>>>> I did try the reproducer and it works as expected both with and without >>>>>>> d_weak_revalidate. >>>>>>> On reflection, the problem it displayed was caused by d_revalidate() >>>>>>> being called when the dentry name was irrelevant. We remove that >>>>>>> (fixing the problem) and introduce d_weak_revalidate because we thought >>>>>>> that minimum functionality was still useful. I'm currently not >>>>>>> convinced that even that is needed. >>>>>>> >>>>>>> If we discarded d_weak_revalidate(), we could get rid of the special >>>>>>> handling of umount.... >>>>>> >>>>>> I like idea. I say go for it and we can see what (if anything) breaks? >>>>> >>>>> Getting rid of d_weak_revalidate is easy enough - hardly any users. >>>>> >>>>> Getting rid of filename_mountpoint() isn't so easy unfortunately. >>>>> autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck >>>>> in autofs4_d_manage()? It would be a shame to keep this infrastructure >>>>> around just so that one part of autofs4 can talk to another part of >>>>> autofs4. >>> >>> When this was first implemented autofs didn't use kern_path_mountpoint() >>> (it didn't exist) it used a path lookup on the parent and a separate >>> lookup for the last component. >> >> This was before commit 4e44b6852e03 ("Get rid of path_lookup in >> autofs4"). This used kern_path(). > > I have to plead not guilty of this one. > > IIRC it broke the requirement of "lookup parent then lookup last component" > rather it walked the whole path then followed up to find the mount point > struct path. > > Like it says in the description of ac8387199656 the caller might not yet > "own" the autofs mount which causes a mount to be triggered during the > walk that can't be satisfied because of the deadlock that occurs. A mount isn't triggered by kern_path(pathname, 0, &path). That '0' would need to include one of LOOKUP_PARENT | LOOKUP_DIRECTORY | LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT to trigger an automount (otherwise you just get -EISDIR). That is why I assumed that ->d_managed was the problem. > >> >> I'm more interested in commit ac8387199656 ("autofs4 - fix device ioctl >> mount lookup") which replaced the use of kern_path() with >> kern_path_mountpoint(). > > Probably should have had a Fixes: 4e44b6852e03 ... Still a bit confused as to exactly what was fixed... Thanks, NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-21 6:23 ` NeilBrown @ 2017-08-21 6:32 ` Ian Kent 2017-08-21 7:46 ` NeilBrown 2017-08-23 1:06 ` NeilBrown 0 siblings, 2 replies; 34+ messages in thread From: Ian Kent @ 2017-08-21 6:32 UTC (permalink / raw) To: NeilBrown, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells On 21/08/17 14:23, NeilBrown wrote: > On Fri, Aug 18 2017, Ian Kent wrote: > >> On 18/08/17 13:24, NeilBrown wrote: >>> On Thu, Aug 17 2017, Ian Kent wrote: >>> >>>> On 16/08/17 19:34, Jeff Layton wrote: >>>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote: >>>>>> On Mon, Aug 14 2017, Jeff Layton wrote: >>>>>> >>>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: >>>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote: >>>>>>>> >>>>>>>>> On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >>>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >>>>>>>>>>> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >>>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead. >>>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks, >>>>>>>>>>> and added the new dentry operation to NFS dentries .... but not to >>>>>>>>>>> NFSv4 >>>>>>>>>>> dentries. >>>>>>>>>>> >>>>>>>>>>> And nobody noticed. >>>>>>>>>>> >>>>>>>>>>> Until today. >>>>>>>>>>> >>>>>>>>>>> A customer reports a situation where mount(....,MS_REMOUNT,..) on an >>>>>>>>>>> NFS >>>>>>>>>>> filesystem hangs because the network has been deconfigured. This >>>>>>>>>>> makes >>>>>>>>>>> perfect sense and I suggested a code change to fix the problem. >>>>>>>>>>> However when a colleague was trying to reproduce the problem to >>>>>>>>>>> validate >>>>>>>>>>> the fix, he couldn't. Then nor could I. >>>>>>>>>>> >>>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with >>>>>>>>>>> NFSv4. The reason is the missing d_weak_revalidate. >>>>>>>>>>> >>>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it >>>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was >>>>>>>>>>> when the ommission resulted in a better user experience, I do wonder >>>>>>>>>>> if >>>>>>>>>>> we need to. Can we just discard d_weak_revalidate? What purpose >>>>>>>>>>> does >>>>>>>>>>> it serve? I couldn't find one. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> NeilBrown >>>>>>>>>>> >>>>>>>>>>> For reference, see >>>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >>>>>>>>>>> d_weak_revalidate dentry op") >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> To reproduce the problem at home, on a system that uses systemd: >>>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem. >>>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >>>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere >>>>>>>>>>> 4/ reboot >>>>>>>>>>> >>>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second >>>>>>>>>>> timeout. >>>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd- >>>>>>>>>>> shutdown while >>>>>>>>>>> remounting the nfs filesystem read-only. >>>>>>>>>>> >>>>>>>>>>> If you don't use "noac" it can still hang, but only if something >>>>>>>>>>> slows >>>>>>>>>>> down the reboot enough that attributes have timed out by the time >>>>>>>>>>> that >>>>>>>>>>> systemd-shutdown runs. This happens for our customer. >>>>>>>>>>> >>>>>>>>>>> If the loop-mounted filesystem is not read-only, you get other >>>>>>>>>>> problems. >>>>>>>>>>> >>>>>>>>>>> We really want systemd to figure out that the loop-mount needs to be >>>>>>>>>>> unmounted first. I have ideas concerning that, but it is messy. But >>>>>>>>>>> that isn't the only bug here. >>>>>>>>>> >>>>>>>>>> The main purpose of d_weak_revalidate() was to catch the issues that >>>>>>>>>> arise when someone changes the contents of the current working >>>>>>>>>> directory or its parent on the server. Since '.' and '..' are treated >>>>>>>>>> specially in the lookup code, they would not be revalidated without >>>>>>>>>> special treatment. That leads to issues when looking up files as >>>>>>>>>> ./<filename> or ../<filename>, since the client won't detect that its >>>>>>>>>> dcache is stale until it tries to use the cached dentry+inode. >>>>>>>>>> >>>>>>>>>> The one thing that has changed since its introduction is, I believe, >>>>>>>>>> the ESTALE handling in the VFS layer. That might fix a lot of the >>>>>>>>>> dcache lookup bugs that were previously handled by d_weak_revalidate(). >>>>>>>>>> I haven't done an audit to figure out if it actually can handle all of >>>>>>>>>> them. >>>>>>>>>> >>>>>>>>> >>>>>>>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: >>>>>>>>> >>>>>>>>> vfs: allow umount to handle mountpoints without revalidating them >>>>>>>> >>>>>>>> You say in the comment for that commit: >>>>>>>> >>>>>>>> but there >>>>>>>> are cases where we do want to revalidate the root of the fs. >>>>>>>> >>>>>>>> Do you happen to remember what those cases are? >>>>>>>> >>>>>>> >>>>>>> Not exactly, but I _think_ I might have been assuming that we needed to >>>>>>> ensure that the inode attrs on the root were up to date after the >>>>>>> pathwalk. >>>>>>> >>>>>>> I think that was probably wrong. d_revalidate is really intended to >>>>>>> ensure that the dentry in question still points to the same inode. In >>>>>>> the case of the root of the mount though, we don't really care about the >>>>>>> dentry on the server at all. We're attaching the root of the mount to an >>>>>>> inode and don't care of the dentry name changes. If we do need to ensure >>>>>>> the inode attrs are updated, we'll just revalidate them at that point. >>>>>>> >>>>>>>>> >>>>>>>>> Possibly the fact that we no longer try to revalidate during unmount >>>>>>>>> means that this is no longer necessary? >>>>>>>>> >>>>>>>>> The original patch that added d_weak_revalidate had a reproducer in the >>>>>>>>> patch description. Have you verified that that problem is still not >>>>>>>>> reproducible when you remove d_weak_revalidate? >>>>>>>> >>>>>>>> I did try the reproducer and it works as expected both with and without >>>>>>>> d_weak_revalidate. >>>>>>>> On reflection, the problem it displayed was caused by d_revalidate() >>>>>>>> being called when the dentry name was irrelevant. We remove that >>>>>>>> (fixing the problem) and introduce d_weak_revalidate because we thought >>>>>>>> that minimum functionality was still useful. I'm currently not >>>>>>>> convinced that even that is needed. >>>>>>>> >>>>>>>> If we discarded d_weak_revalidate(), we could get rid of the special >>>>>>>> handling of umount.... >>>>>>> >>>>>>> I like idea. I say go for it and we can see what (if anything) breaks? >>>>>> >>>>>> Getting rid of d_weak_revalidate is easy enough - hardly any users. >>>>>> >>>>>> Getting rid of filename_mountpoint() isn't so easy unfortunately. >>>>>> autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck >>>>>> in autofs4_d_manage()? It would be a shame to keep this infrastructure >>>>>> around just so that one part of autofs4 can talk to another part of >>>>>> autofs4. >>>> >>>> When this was first implemented autofs didn't use kern_path_mountpoint() >>>> (it didn't exist) it used a path lookup on the parent and a separate >>>> lookup for the last component. >>> >>> This was before commit 4e44b6852e03 ("Get rid of path_lookup in >>> autofs4"). This used kern_path(). >> >> I have to plead not guilty of this one. >> >> IIRC it broke the requirement of "lookup parent then lookup last component" >> rather it walked the whole path then followed up to find the mount point >> struct path. >> >> Like it says in the description of ac8387199656 the caller might not yet >> "own" the autofs mount which causes a mount to be triggered during the >> walk that can't be satisfied because of the deadlock that occurs. > > A mount isn't triggered by kern_path(pathname, 0, &path). > That '0' would need to include one of > LOOKUP_PARENT | LOOKUP_DIRECTORY | > LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT > > to trigger an automount (otherwise you just get -EISDIR). It's perfectly sensible to think that but there is a case where a a mount is triggered when using kern_path(). The EISDIR return occurs for positive dentrys, negative dentrys will still trigger an automount (which is autofs specific, indirect mount map using nobrowse option, the install default). > > That is why I assumed that ->d_managed was the problem. > >> >>> >>> I'm more interested in commit ac8387199656 ("autofs4 - fix device ioctl >>> mount lookup") which replaced the use of kern_path() with >>> kern_path_mountpoint(). >> >> Probably should have had a Fixes: 4e44b6852e03 ... > > Still a bit confused as to exactly what was fixed... Hopefully also considering the negative dentry case will clear that up. Ian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-21 6:32 ` Ian Kent @ 2017-08-21 7:46 ` NeilBrown 2017-08-23 1:06 ` NeilBrown 1 sibling, 0 replies; 34+ messages in thread From: NeilBrown @ 2017-08-21 7:46 UTC (permalink / raw) To: Ian Kent, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells [-- Attachment #1: Type: text/plain, Size: 9117 bytes --] On Mon, Aug 21 2017, Ian Kent wrote: > On 21/08/17 14:23, NeilBrown wrote: >> On Fri, Aug 18 2017, Ian Kent wrote: >> >>> On 18/08/17 13:24, NeilBrown wrote: >>>> On Thu, Aug 17 2017, Ian Kent wrote: >>>> >>>>> On 16/08/17 19:34, Jeff Layton wrote: >>>>>> On Wed, 2017-08-16 at 12:43 +1000, NeilBrown wrote: >>>>>>> On Mon, Aug 14 2017, Jeff Layton wrote: >>>>>>> >>>>>>>> On Mon, 2017-08-14 at 09:36 +1000, NeilBrown wrote: >>>>>>>>> On Fri, Aug 11 2017, Jeff Layton wrote: >>>>>>>>> >>>>>>>>>> On Fri, 2017-08-11 at 05:55 +0000, Trond Myklebust wrote: >>>>>>>>>>> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >>>>>>>>>>>> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >>>>>>>>>>>> flag and introduced the d_weak_revalidate dentry operation instead. >>>>>>>>>>>> We duly removed the flag from NFS superblocks and NFSv4 superblocks, >>>>>>>>>>>> and added the new dentry operation to NFS dentries .... but not to >>>>>>>>>>>> NFSv4 >>>>>>>>>>>> dentries. >>>>>>>>>>>> >>>>>>>>>>>> And nobody noticed. >>>>>>>>>>>> >>>>>>>>>>>> Until today. >>>>>>>>>>>> >>>>>>>>>>>> A customer reports a situation where mount(....,MS_REMOUNT,..) on an >>>>>>>>>>>> NFS >>>>>>>>>>>> filesystem hangs because the network has been deconfigured. This >>>>>>>>>>>> makes >>>>>>>>>>>> perfect sense and I suggested a code change to fix the problem. >>>>>>>>>>>> However when a colleague was trying to reproduce the problem to >>>>>>>>>>>> validate >>>>>>>>>>>> the fix, he couldn't. Then nor could I. >>>>>>>>>>>> >>>>>>>>>>>> The problem is trivially reproducible with NFSv3, and not at all with >>>>>>>>>>>> NFSv4. The reason is the missing d_weak_revalidate. >>>>>>>>>>>> >>>>>>>>>>>> We could simply add d_weak_revalidate for NFSv4, but given that it >>>>>>>>>>>> has been missing for 4.5 years, and the only time anyone noticed was >>>>>>>>>>>> when the ommission resulted in a better user experience, I do wonder >>>>>>>>>>>> if >>>>>>>>>>>> we need to. Can we just discard d_weak_revalidate? What purpose >>>>>>>>>>>> does >>>>>>>>>>>> it serve? I couldn't find one. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> NeilBrown >>>>>>>>>>>> >>>>>>>>>>>> For reference, see >>>>>>>>>>>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >>>>>>>>>>>> d_weak_revalidate dentry op") >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> To reproduce the problem at home, on a system that uses systemd: >>>>>>>>>>>> 1/ place (or find) a filesystem image in a file on an NFS filesystem. >>>>>>>>>>>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >>>>>>>>>>>> 3/ loop-mount the filesystem image read-only somewhere >>>>>>>>>>>> 4/ reboot >>>>>>>>>>>> >>>>>>>>>>>> If you choose v4, the reboot will succeed, possibly after a 90second >>>>>>>>>>>> timeout. >>>>>>>>>>>> If you choose v3, the reboot will hang indefinitely in systemd- >>>>>>>>>>>> shutdown while >>>>>>>>>>>> remounting the nfs filesystem read-only. >>>>>>>>>>>> >>>>>>>>>>>> If you don't use "noac" it can still hang, but only if something >>>>>>>>>>>> slows >>>>>>>>>>>> down the reboot enough that attributes have timed out by the time >>>>>>>>>>>> that >>>>>>>>>>>> systemd-shutdown runs. This happens for our customer. >>>>>>>>>>>> >>>>>>>>>>>> If the loop-mounted filesystem is not read-only, you get other >>>>>>>>>>>> problems. >>>>>>>>>>>> >>>>>>>>>>>> We really want systemd to figure out that the loop-mount needs to be >>>>>>>>>>>> unmounted first. I have ideas concerning that, but it is messy. But >>>>>>>>>>>> that isn't the only bug here. >>>>>>>>>>> >>>>>>>>>>> The main purpose of d_weak_revalidate() was to catch the issues that >>>>>>>>>>> arise when someone changes the contents of the current working >>>>>>>>>>> directory or its parent on the server. Since '.' and '..' are treated >>>>>>>>>>> specially in the lookup code, they would not be revalidated without >>>>>>>>>>> special treatment. That leads to issues when looking up files as >>>>>>>>>>> ./<filename> or ../<filename>, since the client won't detect that its >>>>>>>>>>> dcache is stale until it tries to use the cached dentry+inode. >>>>>>>>>>> >>>>>>>>>>> The one thing that has changed since its introduction is, I believe, >>>>>>>>>>> the ESTALE handling in the VFS layer. That might fix a lot of the >>>>>>>>>>> dcache lookup bugs that were previously handled by d_weak_revalidate(). >>>>>>>>>>> I haven't done an audit to figure out if it actually can handle all of >>>>>>>>>>> them. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> It may also be related to 8033426e6bdb2690d302872ac1e1fadaec1a5581: >>>>>>>>>> >>>>>>>>>> vfs: allow umount to handle mountpoints without revalidating them >>>>>>>>> >>>>>>>>> You say in the comment for that commit: >>>>>>>>> >>>>>>>>> but there >>>>>>>>> are cases where we do want to revalidate the root of the fs. >>>>>>>>> >>>>>>>>> Do you happen to remember what those cases are? >>>>>>>>> >>>>>>>> >>>>>>>> Not exactly, but I _think_ I might have been assuming that we needed to >>>>>>>> ensure that the inode attrs on the root were up to date after the >>>>>>>> pathwalk. >>>>>>>> >>>>>>>> I think that was probably wrong. d_revalidate is really intended to >>>>>>>> ensure that the dentry in question still points to the same inode. In >>>>>>>> the case of the root of the mount though, we don't really care about the >>>>>>>> dentry on the server at all. We're attaching the root of the mount to an >>>>>>>> inode and don't care of the dentry name changes. If we do need to ensure >>>>>>>> the inode attrs are updated, we'll just revalidate them at that point. >>>>>>>> >>>>>>>>>> >>>>>>>>>> Possibly the fact that we no longer try to revalidate during unmount >>>>>>>>>> means that this is no longer necessary? >>>>>>>>>> >>>>>>>>>> The original patch that added d_weak_revalidate had a reproducer in the >>>>>>>>>> patch description. Have you verified that that problem is still not >>>>>>>>>> reproducible when you remove d_weak_revalidate? >>>>>>>>> >>>>>>>>> I did try the reproducer and it works as expected both with and without >>>>>>>>> d_weak_revalidate. >>>>>>>>> On reflection, the problem it displayed was caused by d_revalidate() >>>>>>>>> being called when the dentry name was irrelevant. We remove that >>>>>>>>> (fixing the problem) and introduce d_weak_revalidate because we thought >>>>>>>>> that minimum functionality was still useful. I'm currently not >>>>>>>>> convinced that even that is needed. >>>>>>>>> >>>>>>>>> If we discarded d_weak_revalidate(), we could get rid of the special >>>>>>>>> handling of umount.... >>>>>>>> >>>>>>>> I like idea. I say go for it and we can see what (if anything) breaks? >>>>>>> >>>>>>> Getting rid of d_weak_revalidate is easy enough - hardly any users. >>>>>>> >>>>>>> Getting rid of filename_mountpoint() isn't so easy unfortunately. >>>>>>> autofs4 uses kern_path_mountpoint() - presumably to avoid getting stuck >>>>>>> in autofs4_d_manage()? It would be a shame to keep this infrastructure >>>>>>> around just so that one part of autofs4 can talk to another part of >>>>>>> autofs4. >>>>> >>>>> When this was first implemented autofs didn't use kern_path_mountpoint() >>>>> (it didn't exist) it used a path lookup on the parent and a separate >>>>> lookup for the last component. >>>> >>>> This was before commit 4e44b6852e03 ("Get rid of path_lookup in >>>> autofs4"). This used kern_path(). >>> >>> I have to plead not guilty of this one. >>> >>> IIRC it broke the requirement of "lookup parent then lookup last component" >>> rather it walked the whole path then followed up to find the mount point >>> struct path. >>> >>> Like it says in the description of ac8387199656 the caller might not yet >>> "own" the autofs mount which causes a mount to be triggered during the >>> walk that can't be satisfied because of the deadlock that occurs. >> >> A mount isn't triggered by kern_path(pathname, 0, &path). >> That '0' would need to include one of >> LOOKUP_PARENT | LOOKUP_DIRECTORY | >> LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT >> >> to trigger an automount (otherwise you just get -EISDIR). > > It's perfectly sensible to think that but there is a case where a > a mount is triggered when using kern_path(). > > The EISDIR return occurs for positive dentrys, negative dentrys > will still trigger an automount (which is autofs specific, > indirect mount map using nobrowse option, the install default). Ahh - light dawns. Thanks :-) NeilBrown > >> >> That is why I assumed that ->d_managed was the problem. >> >>> >>>> >>>> I'm more interested in commit ac8387199656 ("autofs4 - fix device ioctl >>>> mount lookup") which replaced the use of kern_path() with >>>> kern_path_mountpoint(). >>> >>> Probably should have had a Fixes: 4e44b6852e03 ... >> >> Still a bit confused as to exactly what was fixed... > > Hopefully also considering the negative dentry case will clear that up. > > Ian [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-21 6:32 ` Ian Kent 2017-08-21 7:46 ` NeilBrown @ 2017-08-23 1:06 ` NeilBrown 2017-08-23 2:32 ` Ian Kent 1 sibling, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-08-23 1:06 UTC (permalink / raw) To: Ian Kent, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells [-- Attachment #1: Type: text/plain, Size: 4148 bytes --] On Mon, Aug 21 2017, Ian Kent wrote: >> >> A mount isn't triggered by kern_path(pathname, 0, &path). >> That '0' would need to include one of >> LOOKUP_PARENT | LOOKUP_DIRECTORY | >> LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT >> >> to trigger an automount (otherwise you just get -EISDIR). > > It's perfectly sensible to think that but there is a case where a > a mount is triggered when using kern_path(). > > The EISDIR return occurs for positive dentrys, negative dentrys > will still trigger an automount (which is autofs specific, > indirect mount map using nobrowse option, the install default). Ok, I understand this better now. This difference between direct and indirect mounts is slightly awkward. It is visible from user-space, but not elegant to document. When you use O_PATH to open a direct automount that has not already been triggered, the open returns the underlying directory (and fstatfs confirms that it is AUTOFS_SUPER_MAGIC). When you use O_PATH on an indirect automount, it *will* trigger the automount when "nobrowse" is in effect, but it won't when "browse" is in effect. So we cannot just say "O_PATH doesn't trigger automounts", which is essentially what I said in https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94 It might be possible to modify automount so that it was more consistent - i.e. if the point is triggered by a mkdir has been done, just to the mkdir. If it is triggered after a mkdir has been done, do the mount. I guess that might be racy, and in any case is hard to justify. Maybe I should change it to be about "direct automounts", and add a note that indirect automounts aren't so predictable. But back to my original issue of wanting to discard kern_path_mountpoint, what would you think of the following approach - slight revised from before. Thanks, NeilBrown diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h index beef981aa54f..7663ea82e68d 100644 --- a/fs/autofs4/autofs_i.h +++ b/fs/autofs4/autofs_i.h @@ -135,10 +135,13 @@ static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry) /* autofs4_oz_mode(): do we see the man behind the curtain? (The * processes which do manipulations for us in user space sees the raw * filesystem without "magic".) + * A process performing certain ioctls can get temporary oz status. */ +extern struct task_struct *autofs_tmp_oz; static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) { - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp || + autofs_tmp_oz == current; } struct inode *autofs4_get_inode(struct super_block *, umode_t); diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c index dd9f1bebb5a3..d76401669a20 100644 --- a/fs/autofs4/dev-ioctl.c +++ b/fs/autofs4/dev-ioctl.c @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp, return 0; } +struct task_struct *autofs_tmp_oz; +int kern_path_oz(const char *pathname, int flags, struct path *path) +{ + static DEFINE_MUTEX(autofs_oz); + int err; + + mutex_lock(&autofs_oz); + autofs_tmp_oz = current; + err = kern_path(pathname, flags, path); + autofs_tmp_oz = NULL; + mutex_unlock(&autofs_oz); + return err; +} + /* Find the topmost mount satisfying test() */ static int find_autofs_mount(const char *pathname, struct path *res, @@ -209,7 +223,8 @@ static int find_autofs_mount(const char *pathname, struct path path; int err; - err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0); + err = kern_path_oz(pathname, 0, &path); + if (err) return err; err = -ENOENT; @@ -552,8 +567,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp, if (!fp || param->ioctlfd == -1) { if (autofs_type_any(type)) - err = kern_path_mountpoint(AT_FDCWD, - name, &path, LOOKUP_FOLLOW); + err = kern_path_oz(name, LOOKUP_FOLLOW, &path); else err = find_autofs_mount(name, &path, test_by_type, &type); [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-23 1:06 ` NeilBrown @ 2017-08-23 2:32 ` Ian Kent 2017-08-23 2:40 ` Ian Kent 2017-08-24 4:07 ` NeilBrown 0 siblings, 2 replies; 34+ messages in thread From: Ian Kent @ 2017-08-23 2:32 UTC (permalink / raw) To: NeilBrown, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells On 23/08/17 09:06, NeilBrown wrote: > On Mon, Aug 21 2017, Ian Kent wrote: > >>> >>> A mount isn't triggered by kern_path(pathname, 0, &path). >>> That '0' would need to include one of >>> LOOKUP_PARENT | LOOKUP_DIRECTORY | >>> LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT >>> >>> to trigger an automount (otherwise you just get -EISDIR). >> >> It's perfectly sensible to think that but there is a case where a >> a mount is triggered when using kern_path(). >> >> The EISDIR return occurs for positive dentrys, negative dentrys >> will still trigger an automount (which is autofs specific, >> indirect mount map using nobrowse option, the install default). > > Ok, I understand this better now. This difference between direct and > indirect mounts is slightly awkward. It is visible from user-space, but > not elegant to document. > When you use O_PATH to open a direct automount that has not already been > triggered, the open returns the underlying directory (and fstatfs > confirms that it is AUTOFS_SUPER_MAGIC). When you use O_PATH on > an indirect automount, it *will* trigger the automount when "nobrowse" is > in effect, but it won't when "browse" is in effect. That inconsistency has bothered me for quite a while now. It was carried over from the autofs module behavior when automounting support was added to the VFS. What's worse is it prevents the use of the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with statx(). There is some risk in changing that so it does work but it really does need to work to enable userspace to not trigger an automount by using this flag. So that's (hopefully) going to change soonish, see: http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch The result should be that stat family calls don't trigger automounts except for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. > > So we cannot just say "O_PATH doesn't trigger automounts", which is > essentially what I said in > > https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94 > > It might be possible to modify automount so that it was more consistent > - i.e. if the point is triggered by a mkdir has been done, just to the > mkdir. If it is triggered after a mkdir has been done, do the mount. I > guess that might be racy, and in any case is hard to justify. > > Maybe I should change it to be about "direct automounts", and add a note > that indirect automounts aren't so predictable. Right and the semantics should be much more consistent in the near future. I hope (and expect) this semantic change won't cause problems. > > But back to my original issue of wanting to discard > kern_path_mountpoint, what would you think of the following approach - > slight revised from before. > > Thanks, > NeilBrown > > diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h > index beef981aa54f..7663ea82e68d 100644 > --- a/fs/autofs4/autofs_i.h > +++ b/fs/autofs4/autofs_i.h > @@ -135,10 +135,13 @@ static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry) > /* autofs4_oz_mode(): do we see the man behind the curtain? (The > * processes which do manipulations for us in user space sees the raw > * filesystem without "magic".) > + * A process performing certain ioctls can get temporary oz status. > */ > +extern struct task_struct *autofs_tmp_oz; > static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) > { > - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; > + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp || > + autofs_tmp_oz == current; > } > > struct inode *autofs4_get_inode(struct super_block *, umode_t); > diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c > index dd9f1bebb5a3..d76401669a20 100644 > --- a/fs/autofs4/dev-ioctl.c > +++ b/fs/autofs4/dev-ioctl.c > @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp, > return 0; > } > > +struct task_struct *autofs_tmp_oz; > +int kern_path_oz(const char *pathname, int flags, struct path *path) > +{ > + static DEFINE_MUTEX(autofs_oz); > + int err; > + > + mutex_lock(&autofs_oz); > + autofs_tmp_oz = current; > + err = kern_path(pathname, flags, path); > + autofs_tmp_oz = NULL; > + mutex_unlock(&autofs_oz); > + return err; > +} > + It's simple enough but does look like it will attract criticism as being a hack! The kern_path_locked() function is very similar to what was originally done, along with code to look down the mount stack (rather than up the way it does now) to get the mount point. In this case, to be valid the dentry can't be a symlink so that fits kern_path_locked() too. So maybe it is worth going back to the way it was in the beginning and be done with it .... OTOH Al must have had a reason for changing the way it was done that I didn't get. > /* Find the topmost mount satisfying test() */ > static int find_autofs_mount(const char *pathname, > struct path *res, > @@ -209,7 +223,8 @@ static int find_autofs_mount(const char *pathname, > struct path path; > int err; > > - err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0); > + err = kern_path_oz(pathname, 0, &path); > + > if (err) > return err; > err = -ENOENT; > @@ -552,8 +567,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp, > > if (!fp || param->ioctlfd == -1) { > if (autofs_type_any(type)) > - err = kern_path_mountpoint(AT_FDCWD, > - name, &path, LOOKUP_FOLLOW); > + err = kern_path_oz(name, LOOKUP_FOLLOW, &path); > else > err = find_autofs_mount(name, &path, > test_by_type, &type); > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-23 2:32 ` Ian Kent @ 2017-08-23 2:40 ` Ian Kent 2017-08-23 2:54 ` Ian Kent 2017-08-24 3:21 ` NeilBrown 2017-08-24 4:07 ` NeilBrown 1 sibling, 2 replies; 34+ messages in thread From: Ian Kent @ 2017-08-23 2:40 UTC (permalink / raw) To: NeilBrown, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells On 23/08/17 10:32, Ian Kent wrote: > On 23/08/17 09:06, NeilBrown wrote: >> On Mon, Aug 21 2017, Ian Kent wrote: >> >>>> >>>> A mount isn't triggered by kern_path(pathname, 0, &path). >>>> That '0' would need to include one of >>>> LOOKUP_PARENT | LOOKUP_DIRECTORY | >>>> LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT >>>> >>>> to trigger an automount (otherwise you just get -EISDIR). >>> >>> It's perfectly sensible to think that but there is a case where a >>> a mount is triggered when using kern_path(). >>> >>> The EISDIR return occurs for positive dentrys, negative dentrys >>> will still trigger an automount (which is autofs specific, >>> indirect mount map using nobrowse option, the install default). >> >> Ok, I understand this better now. This difference between direct and >> indirect mounts is slightly awkward. It is visible from user-space, but >> not elegant to document. >> When you use O_PATH to open a direct automount that has not already been >> triggered, the open returns the underlying directory (and fstatfs >> confirms that it is AUTOFS_SUPER_MAGIC). When you use O_PATH on >> an indirect automount, it *will* trigger the automount when "nobrowse" is >> in effect, but it won't when "browse" is in effect. > > That inconsistency has bothered me for quite a while now. > > It was carried over from the autofs module behavior when automounting > support was added to the VFS. What's worse is it prevents the use of > the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with > statx(). > > There is some risk in changing that so it does work but it really does > need to work to enable userspace to not trigger an automount by using > this flag. > > So that's (hopefully) going to change soonish, see: > http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch > > The result should be that stat family calls don't trigger automounts except > for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. > >> >> So we cannot just say "O_PATH doesn't trigger automounts", which is >> essentially what I said in >> >> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94 >> >> It might be possible to modify automount so that it was more consistent >> - i.e. if the point is triggered by a mkdir has been done, just to the >> mkdir. If it is triggered after a mkdir has been done, do the mount. I >> guess that might be racy, and in any case is hard to justify. >> >> Maybe I should change it to be about "direct automounts", and add a note >> that indirect automounts aren't so predictable. > > Right and the semantics should be much more consistent in the near future. > I hope (and expect) this semantic change won't cause problems. > >> >> But back to my original issue of wanting to discard >> kern_path_mountpoint, what would you think of the following approach - >> slight revised from before. >> >> Thanks, >> NeilBrown >> >> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h >> index beef981aa54f..7663ea82e68d 100644 >> --- a/fs/autofs4/autofs_i.h >> +++ b/fs/autofs4/autofs_i.h >> @@ -135,10 +135,13 @@ static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry) >> /* autofs4_oz_mode(): do we see the man behind the curtain? (The >> * processes which do manipulations for us in user space sees the raw >> * filesystem without "magic".) >> + * A process performing certain ioctls can get temporary oz status. >> */ >> +extern struct task_struct *autofs_tmp_oz; >> static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) >> { >> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; >> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp || >> + autofs_tmp_oz == current; >> } >> >> struct inode *autofs4_get_inode(struct super_block *, umode_t); >> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c >> index dd9f1bebb5a3..d76401669a20 100644 >> --- a/fs/autofs4/dev-ioctl.c >> +++ b/fs/autofs4/dev-ioctl.c >> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp, >> return 0; >> } >> >> +struct task_struct *autofs_tmp_oz; >> +int kern_path_oz(const char *pathname, int flags, struct path *path) >> +{ >> + static DEFINE_MUTEX(autofs_oz); >> + int err; >> + >> + mutex_lock(&autofs_oz); >> + autofs_tmp_oz = current; >> + err = kern_path(pathname, flags, path); >> + autofs_tmp_oz = NULL; >> + mutex_unlock(&autofs_oz); >> + return err; >> +} >> + > > It's simple enough but does look like it will attract criticism as being > a hack! > > The kern_path_locked() function is very similar to what was originally > done, along with code to look down the mount stack (rather than up the > way it does now) to get the mount point. In this case, to be valid the > dentry can't be a symlink so that fits kern_path_locked() too. Oh wait, that __lookup_hash() tries too hard to resolve the dentry, that won't quite work, and maybe d_lookup() can't be used safely in this context either .... > > So maybe it is worth going back to the way it was in the beginning and > be done with it .... OTOH Al must have had a reason for changing the > way it was done that I didn't get. > >> /* Find the topmost mount satisfying test() */ >> static int find_autofs_mount(const char *pathname, >> struct path *res, >> @@ -209,7 +223,8 @@ static int find_autofs_mount(const char *pathname, >> struct path path; >> int err; >> >> - err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0); >> + err = kern_path_oz(pathname, 0, &path); >> + >> if (err) >> return err; >> err = -ENOENT; >> @@ -552,8 +567,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp, >> >> if (!fp || param->ioctlfd == -1) { >> if (autofs_type_any(type)) >> - err = kern_path_mountpoint(AT_FDCWD, >> - name, &path, LOOKUP_FOLLOW); >> + err = kern_path_oz(name, LOOKUP_FOLLOW, &path); >> else >> err = find_autofs_mount(name, &path, >> test_by_type, &type); >> > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-23 2:40 ` Ian Kent @ 2017-08-23 2:54 ` Ian Kent 2017-08-23 7:51 ` Ian Kent 2017-08-24 3:21 ` NeilBrown 1 sibling, 1 reply; 34+ messages in thread From: Ian Kent @ 2017-08-23 2:54 UTC (permalink / raw) To: NeilBrown, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells On 23/08/17 10:40, Ian Kent wrote: > On 23/08/17 10:32, Ian Kent wrote: >> On 23/08/17 09:06, NeilBrown wrote: >>> On Mon, Aug 21 2017, Ian Kent wrote: >>> >>>>> >>>>> A mount isn't triggered by kern_path(pathname, 0, &path). >>>>> That '0' would need to include one of >>>>> LOOKUP_PARENT | LOOKUP_DIRECTORY | >>>>> LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT >>>>> >>>>> to trigger an automount (otherwise you just get -EISDIR). >>>> >>>> It's perfectly sensible to think that but there is a case where a >>>> a mount is triggered when using kern_path(). >>>> >>>> The EISDIR return occurs for positive dentrys, negative dentrys >>>> will still trigger an automount (which is autofs specific, >>>> indirect mount map using nobrowse option, the install default). >>> >>> Ok, I understand this better now. This difference between direct and >>> indirect mounts is slightly awkward. It is visible from user-space, but >>> not elegant to document. >>> When you use O_PATH to open a direct automount that has not already been >>> triggered, the open returns the underlying directory (and fstatfs >>> confirms that it is AUTOFS_SUPER_MAGIC). When you use O_PATH on >>> an indirect automount, it *will* trigger the automount when "nobrowse" is >>> in effect, but it won't when "browse" is in effect. >> >> That inconsistency has bothered me for quite a while now. >> >> It was carried over from the autofs module behavior when automounting >> support was added to the VFS. What's worse is it prevents the use of >> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with >> statx(). >> >> There is some risk in changing that so it does work but it really does >> need to work to enable userspace to not trigger an automount by using >> this flag. >> >> So that's (hopefully) going to change soonish, see: >> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch >> >> The result should be that stat family calls don't trigger automounts except >> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. >> >>> >>> So we cannot just say "O_PATH doesn't trigger automounts", which is >>> essentially what I said in >>> >>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94 >>> >>> It might be possible to modify automount so that it was more consistent >>> - i.e. if the point is triggered by a mkdir has been done, just to the >>> mkdir. If it is triggered after a mkdir has been done, do the mount. I >>> guess that might be racy, and in any case is hard to justify. >>> >>> Maybe I should change it to be about "direct automounts", and add a note >>> that indirect automounts aren't so predictable. >> >> Right and the semantics should be much more consistent in the near future. >> I hope (and expect) this semantic change won't cause problems. >> >>> >>> But back to my original issue of wanting to discard >>> kern_path_mountpoint, what would you think of the following approach - >>> slight revised from before. >>> >>> Thanks, >>> NeilBrown >>> >>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h >>> index beef981aa54f..7663ea82e68d 100644 >>> --- a/fs/autofs4/autofs_i.h >>> +++ b/fs/autofs4/autofs_i.h >>> @@ -135,10 +135,13 @@ static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry) >>> /* autofs4_oz_mode(): do we see the man behind the curtain? (The >>> * processes which do manipulations for us in user space sees the raw >>> * filesystem without "magic".) >>> + * A process performing certain ioctls can get temporary oz status. >>> */ >>> +extern struct task_struct *autofs_tmp_oz; >>> static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) >>> { >>> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; >>> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp || >>> + autofs_tmp_oz == current; >>> } >>> >>> struct inode *autofs4_get_inode(struct super_block *, umode_t); >>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c >>> index dd9f1bebb5a3..d76401669a20 100644 >>> --- a/fs/autofs4/dev-ioctl.c >>> +++ b/fs/autofs4/dev-ioctl.c >>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp, >>> return 0; >>> } >>> >>> +struct task_struct *autofs_tmp_oz; >>> +int kern_path_oz(const char *pathname, int flags, struct path *path) >>> +{ >>> + static DEFINE_MUTEX(autofs_oz); >>> + int err; >>> + >>> + mutex_lock(&autofs_oz); >>> + autofs_tmp_oz = current; >>> + err = kern_path(pathname, flags, path); >>> + autofs_tmp_oz = NULL; >>> + mutex_unlock(&autofs_oz); >>> + return err; >>> +} >>> + >> >> It's simple enough but does look like it will attract criticism as being >> a hack! >> >> The kern_path_locked() function is very similar to what was originally >> done, along with code to look down the mount stack (rather than up the >> way it does now) to get the mount point. In this case, to be valid the >> dentry can't be a symlink so that fits kern_path_locked() too. > > Oh wait, that __lookup_hash() tries too hard to resolve the dentry, > that won't quite work, and maybe d_lookup() can't be used safely in > this context either .... Umm .. d_lookup() does look ok so maybe path_parentat() + d_lookup() would be ok. > >> >> So maybe it is worth going back to the way it was in the beginning and >> be done with it .... OTOH Al must have had a reason for changing the >> way it was done that I didn't get. >> >>> /* Find the topmost mount satisfying test() */ >>> static int find_autofs_mount(const char *pathname, >>> struct path *res, >>> @@ -209,7 +223,8 @@ static int find_autofs_mount(const char *pathname, >>> struct path path; >>> int err; >>> >>> - err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0); >>> + err = kern_path_oz(pathname, 0, &path); >>> + >>> if (err) >>> return err; >>> err = -ENOENT; >>> @@ -552,8 +567,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp, >>> >>> if (!fp || param->ioctlfd == -1) { >>> if (autofs_type_any(type)) >>> - err = kern_path_mountpoint(AT_FDCWD, >>> - name, &path, LOOKUP_FOLLOW); >>> + err = kern_path_oz(name, LOOKUP_FOLLOW, &path); >>> else >>> err = find_autofs_mount(name, &path, >>> test_by_type, &type); >>> >> > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-23 2:54 ` Ian Kent @ 2017-08-23 7:51 ` Ian Kent 0 siblings, 0 replies; 34+ messages in thread From: Ian Kent @ 2017-08-23 7:51 UTC (permalink / raw) To: NeilBrown, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells On 23/08/17 10:54, Ian Kent wrote: > On 23/08/17 10:40, Ian Kent wrote: >> On 23/08/17 10:32, Ian Kent wrote: >>> On 23/08/17 09:06, NeilBrown wrote: >>>> On Mon, Aug 21 2017, Ian Kent wrote: >>>> >>>>>> >>>>>> A mount isn't triggered by kern_path(pathname, 0, &path). >>>>>> That '0' would need to include one of >>>>>> LOOKUP_PARENT | LOOKUP_DIRECTORY | >>>>>> LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT >>>>>> >>>>>> to trigger an automount (otherwise you just get -EISDIR). >>>>> >>>>> It's perfectly sensible to think that but there is a case where a >>>>> a mount is triggered when using kern_path(). >>>>> >>>>> The EISDIR return occurs for positive dentrys, negative dentrys >>>>> will still trigger an automount (which is autofs specific, >>>>> indirect mount map using nobrowse option, the install default). >>>> >>>> Ok, I understand this better now. This difference between direct and >>>> indirect mounts is slightly awkward. It is visible from user-space, but >>>> not elegant to document. >>>> When you use O_PATH to open a direct automount that has not already been >>>> triggered, the open returns the underlying directory (and fstatfs >>>> confirms that it is AUTOFS_SUPER_MAGIC). When you use O_PATH on >>>> an indirect automount, it *will* trigger the automount when "nobrowse" is >>>> in effect, but it won't when "browse" is in effect. >>> >>> That inconsistency has bothered me for quite a while now. >>> >>> It was carried over from the autofs module behavior when automounting >>> support was added to the VFS. What's worse is it prevents the use of >>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with >>> statx(). >>> >>> There is some risk in changing that so it does work but it really does >>> need to work to enable userspace to not trigger an automount by using >>> this flag. >>> >>> So that's (hopefully) going to change soonish, see: >>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch >>> >>> The result should be that stat family calls don't trigger automounts except >>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. >>> >>>> >>>> So we cannot just say "O_PATH doesn't trigger automounts", which is >>>> essentially what I said in >>>> >>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94 >>>> >>>> It might be possible to modify automount so that it was more consistent >>>> - i.e. if the point is triggered by a mkdir has been done, just to the >>>> mkdir. If it is triggered after a mkdir has been done, do the mount. I >>>> guess that might be racy, and in any case is hard to justify. >>>> >>>> Maybe I should change it to be about "direct automounts", and add a note >>>> that indirect automounts aren't so predictable. >>> >>> Right and the semantics should be much more consistent in the near future. >>> I hope (and expect) this semantic change won't cause problems. >>> >>>> >>>> But back to my original issue of wanting to discard >>>> kern_path_mountpoint, what would you think of the following approach - >>>> slight revised from before. >>>> >>>> Thanks, >>>> NeilBrown >>>> >>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h >>>> index beef981aa54f..7663ea82e68d 100644 >>>> --- a/fs/autofs4/autofs_i.h >>>> +++ b/fs/autofs4/autofs_i.h >>>> @@ -135,10 +135,13 @@ static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry) >>>> /* autofs4_oz_mode(): do we see the man behind the curtain? (The >>>> * processes which do manipulations for us in user space sees the raw >>>> * filesystem without "magic".) >>>> + * A process performing certain ioctls can get temporary oz status. >>>> */ >>>> +extern struct task_struct *autofs_tmp_oz; >>>> static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) >>>> { >>>> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; >>>> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp || >>>> + autofs_tmp_oz == current; >>>> } >>>> >>>> struct inode *autofs4_get_inode(struct super_block *, umode_t); >>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c >>>> index dd9f1bebb5a3..d76401669a20 100644 >>>> --- a/fs/autofs4/dev-ioctl.c >>>> +++ b/fs/autofs4/dev-ioctl.c >>>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp, >>>> return 0; >>>> } >>>> >>>> +struct task_struct *autofs_tmp_oz; >>>> +int kern_path_oz(const char *pathname, int flags, struct path *path) >>>> +{ >>>> + static DEFINE_MUTEX(autofs_oz); >>>> + int err; >>>> + >>>> + mutex_lock(&autofs_oz); >>>> + autofs_tmp_oz = current; >>>> + err = kern_path(pathname, flags, path); >>>> + autofs_tmp_oz = NULL; >>>> + mutex_unlock(&autofs_oz); >>>> + return err; >>>> +} >>>> + >>> >>> It's simple enough but does look like it will attract criticism as being >>> a hack! >>> >>> The kern_path_locked() function is very similar to what was originally >>> done, along with code to look down the mount stack (rather than up the >>> way it does now) to get the mount point. In this case, to be valid the >>> dentry can't be a symlink so that fits kern_path_locked() too. >> >> Oh wait, that __lookup_hash() tries too hard to resolve the dentry, >> that won't quite work, and maybe d_lookup() can't be used safely in >> this context either .... > > Umm .. d_lookup() does look ok so maybe path_parentat() + d_lookup() > would be ok. Double Umm ... with the patch above kern_path() with flags 0 or LOOKUP_FOLLOW should get either EISDIR or ENOENT ... maybe I should think occasionally !! > >> >>> >>> So maybe it is worth going back to the way it was in the beginning and >>> be done with it .... OTOH Al must have had a reason for changing the >>> way it was done that I didn't get. >>> >>>> /* Find the topmost mount satisfying test() */ >>>> static int find_autofs_mount(const char *pathname, >>>> struct path *res, >>>> @@ -209,7 +223,8 @@ static int find_autofs_mount(const char *pathname, >>>> struct path path; >>>> int err; >>>> >>>> - err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0); >>>> + err = kern_path_oz(pathname, 0, &path); >>>> + >>>> if (err) >>>> return err; >>>> err = -ENOENT; >>>> @@ -552,8 +567,7 @@ static int autofs_dev_ioctl_ismountpoint(struct file *fp, >>>> >>>> if (!fp || param->ioctlfd == -1) { >>>> if (autofs_type_any(type)) >>>> - err = kern_path_mountpoint(AT_FDCWD, >>>> - name, &path, LOOKUP_FOLLOW); >>>> + err = kern_path_oz(name, LOOKUP_FOLLOW, &path); >>>> else >>>> err = find_autofs_mount(name, &path, >>>> test_by_type, &type); >>>> >>> >> > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-23 2:40 ` Ian Kent 2017-08-23 2:54 ` Ian Kent @ 2017-08-24 3:21 ` NeilBrown 2017-08-24 4:35 ` Ian Kent 1 sibling, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-08-24 3:21 UTC (permalink / raw) To: Ian Kent, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells [-- Attachment #1: Type: text/plain, Size: 6677 bytes --] On Wed, Aug 23 2017, Ian Kent wrote: > On 23/08/17 10:32, Ian Kent wrote: >> On 23/08/17 09:06, NeilBrown wrote: >>> On Mon, Aug 21 2017, Ian Kent wrote: >>> >>>>> >>>>> A mount isn't triggered by kern_path(pathname, 0, &path). >>>>> That '0' would need to include one of >>>>> LOOKUP_PARENT | LOOKUP_DIRECTORY | >>>>> LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT >>>>> >>>>> to trigger an automount (otherwise you just get -EISDIR). >>>> >>>> It's perfectly sensible to think that but there is a case where a >>>> a mount is triggered when using kern_path(). >>>> >>>> The EISDIR return occurs for positive dentrys, negative dentrys >>>> will still trigger an automount (which is autofs specific, >>>> indirect mount map using nobrowse option, the install default). >>> >>> Ok, I understand this better now. This difference between direct and >>> indirect mounts is slightly awkward. It is visible from user-space, but >>> not elegant to document. >>> When you use O_PATH to open a direct automount that has not already been >>> triggered, the open returns the underlying directory (and fstatfs >>> confirms that it is AUTOFS_SUPER_MAGIC). When you use O_PATH on >>> an indirect automount, it *will* trigger the automount when "nobrowse" is >>> in effect, but it won't when "browse" is in effect. >> >> That inconsistency has bothered me for quite a while now. >> >> It was carried over from the autofs module behavior when automounting >> support was added to the VFS. What's worse is it prevents the use of >> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with >> statx(). >> >> There is some risk in changing that so it does work but it really does >> need to work to enable userspace to not trigger an automount by using >> this flag. >> >> So that's (hopefully) going to change soonish, see: >> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch >> >> The result should be that stat family calls don't trigger automounts except >> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. >> >>> >>> So we cannot just say "O_PATH doesn't trigger automounts", which is >>> essentially what I said in >>> >>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94 >>> >>> It might be possible to modify automount so that it was more consistent >>> - i.e. if the point is triggered by a mkdir has been done, just to the >>> mkdir. If it is triggered after a mkdir has been done, do the mount. I >>> guess that might be racy, and in any case is hard to justify. >>> >>> Maybe I should change it to be about "direct automounts", and add a note >>> that indirect automounts aren't so predictable. >> >> Right and the semantics should be much more consistent in the near future. >> I hope (and expect) this semantic change won't cause problems. >> >>> >>> But back to my original issue of wanting to discard >>> kern_path_mountpoint, what would you think of the following approach - >>> slight revised from before. >>> >>> Thanks, >>> NeilBrown >>> >>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h >>> index beef981aa54f..7663ea82e68d 100644 >>> --- a/fs/autofs4/autofs_i.h >>> +++ b/fs/autofs4/autofs_i.h >>> @@ -135,10 +135,13 @@ static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry) >>> /* autofs4_oz_mode(): do we see the man behind the curtain? (The >>> * processes which do manipulations for us in user space sees the raw >>> * filesystem without "magic".) >>> + * A process performing certain ioctls can get temporary oz status. >>> */ >>> +extern struct task_struct *autofs_tmp_oz; >>> static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) >>> { >>> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; >>> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp || >>> + autofs_tmp_oz == current; >>> } >>> >>> struct inode *autofs4_get_inode(struct super_block *, umode_t); >>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c >>> index dd9f1bebb5a3..d76401669a20 100644 >>> --- a/fs/autofs4/dev-ioctl.c >>> +++ b/fs/autofs4/dev-ioctl.c >>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp, >>> return 0; >>> } >>> >>> +struct task_struct *autofs_tmp_oz; >>> +int kern_path_oz(const char *pathname, int flags, struct path *path) >>> +{ >>> + static DEFINE_MUTEX(autofs_oz); >>> + int err; >>> + >>> + mutex_lock(&autofs_oz); >>> + autofs_tmp_oz = current; >>> + err = kern_path(pathname, flags, path); >>> + autofs_tmp_oz = NULL; >>> + mutex_unlock(&autofs_oz); >>> + return err; >>> +} >>> + >> >> It's simple enough but does look like it will attract criticism as being >> a hack! >> >> The kern_path_locked() function is very similar to what was originally >> done, along with code to look down the mount stack (rather than up the >> way it does now) to get the mount point. In this case, to be valid the >> dentry can't be a symlink so that fits kern_path_locked() too. > > Oh wait, that __lookup_hash() tries too hard to resolve the dentry, > that won't quite work, and maybe d_lookup() can't be used safely in > this context either .... > Why do you think that __look_hash() tries too hard? It does call into the filesystem ->lookup() if the name isn't in the cache, which probably isn't strictly needed, but that isn't harmful and the current code does that. Some the following seems sensible to me (though I haven't tested it). Thanks, NeilBrown diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c index dd9f1bebb5a3..859c198d0163 100644 --- a/fs/autofs4/dev-ioctl.c +++ b/fs/autofs4/dev-ioctl.c @@ -208,12 +208,16 @@ static int find_autofs_mount(const char *pathname, { struct path path; int err; + struct dentry *de, *parent; + + de = kern_path_locked(pathname, &path); + if (IS_ERR(de)) + return PTR_ERR(de); + parent = path.dentry; + path.dentry = de; - err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0); - if (err) - return err; err = -ENOENT; - while (path.dentry == path.mnt->mnt_root) { + do { if (path.dentry->d_sb->s_magic == AUTOFS_SUPER_MAGIC) { if (test(&path, data)) { path_get(&path); @@ -222,10 +226,11 @@ static int find_autofs_mount(const char *pathname, break; } } - if (!follow_up(&path)) - break; - } + } while (follow_down_one(&path)); + path_put(&path); + inode_unlock(d_inode(parent)); + dput(parent); return err; } [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-24 3:21 ` NeilBrown @ 2017-08-24 4:35 ` Ian Kent 0 siblings, 0 replies; 34+ messages in thread From: Ian Kent @ 2017-08-24 4:35 UTC (permalink / raw) To: NeilBrown, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells On 24/08/17 11:21, NeilBrown wrote: > On Wed, Aug 23 2017, Ian Kent wrote: > >> On 23/08/17 10:32, Ian Kent wrote: >>> On 23/08/17 09:06, NeilBrown wrote: >>>> On Mon, Aug 21 2017, Ian Kent wrote: >>>> >>>>>> >>>>>> A mount isn't triggered by kern_path(pathname, 0, &path). >>>>>> That '0' would need to include one of >>>>>> LOOKUP_PARENT | LOOKUP_DIRECTORY | >>>>>> LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT >>>>>> >>>>>> to trigger an automount (otherwise you just get -EISDIR). >>>>> >>>>> It's perfectly sensible to think that but there is a case where a >>>>> a mount is triggered when using kern_path(). >>>>> >>>>> The EISDIR return occurs for positive dentrys, negative dentrys >>>>> will still trigger an automount (which is autofs specific, >>>>> indirect mount map using nobrowse option, the install default). >>>> >>>> Ok, I understand this better now. This difference between direct and >>>> indirect mounts is slightly awkward. It is visible from user-space, but >>>> not elegant to document. >>>> When you use O_PATH to open a direct automount that has not already been >>>> triggered, the open returns the underlying directory (and fstatfs >>>> confirms that it is AUTOFS_SUPER_MAGIC). When you use O_PATH on >>>> an indirect automount, it *will* trigger the automount when "nobrowse" is >>>> in effect, but it won't when "browse" is in effect. >>> >>> That inconsistency has bothered me for quite a while now. >>> >>> It was carried over from the autofs module behavior when automounting >>> support was added to the VFS. What's worse is it prevents the use of >>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with >>> statx(). >>> >>> There is some risk in changing that so it does work but it really does >>> need to work to enable userspace to not trigger an automount by using >>> this flag. >>> >>> So that's (hopefully) going to change soonish, see: >>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch >>> >>> The result should be that stat family calls don't trigger automounts except >>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. >>> >>>> >>>> So we cannot just say "O_PATH doesn't trigger automounts", which is >>>> essentially what I said in >>>> >>>> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=97a45d02e6671482e8b2cdcce3951930bf6bdb94 >>>> >>>> It might be possible to modify automount so that it was more consistent >>>> - i.e. if the point is triggered by a mkdir has been done, just to the >>>> mkdir. If it is triggered after a mkdir has been done, do the mount. I >>>> guess that might be racy, and in any case is hard to justify. >>>> >>>> Maybe I should change it to be about "direct automounts", and add a note >>>> that indirect automounts aren't so predictable. >>> >>> Right and the semantics should be much more consistent in the near future. >>> I hope (and expect) this semantic change won't cause problems. >>> >>>> >>>> But back to my original issue of wanting to discard >>>> kern_path_mountpoint, what would you think of the following approach - >>>> slight revised from before. >>>> >>>> Thanks, >>>> NeilBrown >>>> >>>> diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h >>>> index beef981aa54f..7663ea82e68d 100644 >>>> --- a/fs/autofs4/autofs_i.h >>>> +++ b/fs/autofs4/autofs_i.h >>>> @@ -135,10 +135,13 @@ static inline struct autofs_info *autofs4_dentry_ino(struct dentry *dentry) >>>> /* autofs4_oz_mode(): do we see the man behind the curtain? (The >>>> * processes which do manipulations for us in user space sees the raw >>>> * filesystem without "magic".) >>>> + * A process performing certain ioctls can get temporary oz status. >>>> */ >>>> +extern struct task_struct *autofs_tmp_oz; >>>> static inline int autofs4_oz_mode(struct autofs_sb_info *sbi) >>>> { >>>> - return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp; >>>> + return sbi->catatonic || task_pgrp(current) == sbi->oz_pgrp || >>>> + autofs_tmp_oz == current; >>>> } >>>> >>>> struct inode *autofs4_get_inode(struct super_block *, umode_t); >>>> diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c >>>> index dd9f1bebb5a3..d76401669a20 100644 >>>> --- a/fs/autofs4/dev-ioctl.c >>>> +++ b/fs/autofs4/dev-ioctl.c >>>> @@ -200,6 +200,20 @@ static int autofs_dev_ioctl_protosubver(struct file *fp, >>>> return 0; >>>> } >>>> >>>> +struct task_struct *autofs_tmp_oz; >>>> +int kern_path_oz(const char *pathname, int flags, struct path *path) >>>> +{ >>>> + static DEFINE_MUTEX(autofs_oz); >>>> + int err; >>>> + >>>> + mutex_lock(&autofs_oz); >>>> + autofs_tmp_oz = current; >>>> + err = kern_path(pathname, flags, path); >>>> + autofs_tmp_oz = NULL; >>>> + mutex_unlock(&autofs_oz); >>>> + return err; >>>> +} >>>> + >>> >>> It's simple enough but does look like it will attract criticism as being >>> a hack! >>> >>> The kern_path_locked() function is very similar to what was originally >>> done, along with code to look down the mount stack (rather than up the >>> way it does now) to get the mount point. In this case, to be valid the >>> dentry can't be a symlink so that fits kern_path_locked() too. >> >> Oh wait, that __lookup_hash() tries too hard to resolve the dentry, >> that won't quite work, and maybe d_lookup() can't be used safely in >> this context either .... >> > > Why do you think that __look_hash() tries too hard? > It does call into the filesystem ->lookup() if the name isn't in the > cache, which probably isn't strictly needed, but that isn't harmful and > the current code does that. Only that the execution path can encounter the negative dentry case in follow_automount(). Also, my saying that the dentry can't be a symlink is wrong. For the covered fd case it's reasonable but for the mount point check ioctl the path could be anything, eg. a symlink to an automount dentry, and I want that ioctl to be callable by anyone. > > Some the following seems sensible to me (though I haven't tested it). > > Thanks, > NeilBrown > > > diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c > index dd9f1bebb5a3..859c198d0163 100644 > --- a/fs/autofs4/dev-ioctl.c > +++ b/fs/autofs4/dev-ioctl.c > @@ -208,12 +208,16 @@ static int find_autofs_mount(const char *pathname, > { > struct path path; > int err; > + struct dentry *de, *parent; > + > + de = kern_path_locked(pathname, &path); > + if (IS_ERR(de)) > + return PTR_ERR(de); > + parent = path.dentry; > + path.dentry = de; > > - err = kern_path_mountpoint(AT_FDCWD, pathname, &path, 0); > - if (err) > - return err; > err = -ENOENT; > - while (path.dentry == path.mnt->mnt_root) { > + do { > if (path.dentry->d_sb->s_magic == AUTOFS_SUPER_MAGIC) { > if (test(&path, data)) { > path_get(&path); > @@ -222,10 +226,11 @@ static int find_autofs_mount(const char *pathname, > break; > } > } > - if (!follow_up(&path)) > - break; > - } > + } while (follow_down_one(&path)); > + > path_put(&path); > + inode_unlock(d_inode(parent)); > + dput(parent); > return err; > } > > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-23 2:32 ` Ian Kent 2017-08-23 2:40 ` Ian Kent @ 2017-08-24 4:07 ` NeilBrown 2017-08-24 4:47 ` Ian Kent ` (3 more replies) 1 sibling, 4 replies; 34+ messages in thread From: NeilBrown @ 2017-08-24 4:07 UTC (permalink / raw) To: Ian Kent, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, Michael Kerrisk (man-pages) [-- Attachment #1: Type: text/plain, Size: 2019 bytes --] On Wed, Aug 23 2017, Ian Kent wrote: > > That inconsistency has bothered me for quite a while now. > > It was carried over from the autofs module behavior when automounting > support was added to the VFS. What's worse is it prevents the use of > the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with > statx(). > > There is some risk in changing that so it does work but it really does > need to work to enable userspace to not trigger an automount by using > this flag. > > So that's (hopefully) going to change soonish, see: > http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch > > The result should be that stat family calls don't trigger automounts except > for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. > oooh, yes. That's much better - thanks. We should make sure that change gets into the man pages... First however, we should probably correct the man page! stat.2 says: NOTES On Linux, lstat() will generally not trigger automounter action, whereas stat() will (but see the description of fstatat() AT_NO_AUTOMOUNT fag, above). which is wrong: lstat and stat treat automounts the same. @Michael: do you recall why you inserted that text? The commit message in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp discussion in NOTES") is not very helpful. I propose correcting to NOTES: On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set and will not trigger automounter action for direct automount points, though they may (prior to 4.14) for indirect automount points. The more precise details, that automount action for indirect automount points is not triggered when the 'browse' option is used, is probably not necessary. Ian: if you agree with that text, and Michael doesn't provide alternate evidence, I'll submit a formal patch for the man page.... or should we just wait until the patch actually lands? Thanks, NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-24 4:07 ` NeilBrown @ 2017-08-24 4:47 ` Ian Kent 2017-08-24 4:58 ` Ian Kent ` (2 subsequent siblings) 3 siblings, 0 replies; 34+ messages in thread From: Ian Kent @ 2017-08-24 4:47 UTC (permalink / raw) To: NeilBrown, Jeff Layton, Trond Myklebust, viro Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, Michael Kerrisk (man-pages) On 24/08/17 12:07, NeilBrown wrote: > On Wed, Aug 23 2017, Ian Kent wrote: > >> >> That inconsistency has bothered me for quite a while now. >> >> It was carried over from the autofs module behavior when automounting >> support was added to the VFS. What's worse is it prevents the use of >> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with >> statx(). >> >> There is some risk in changing that so it does work but it really does >> need to work to enable userspace to not trigger an automount by using >> this flag. >> >> So that's (hopefully) going to change soonish, see: >> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch >> >> The result should be that stat family calls don't trigger automounts except >> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. >> > > oooh, yes. That's much better - thanks. > > We should make sure that change gets into the man pages... Yes, I was wondering who to contact for that. > > First however, we should probably correct the man page! > stat.2 says: > > > NOTES > On Linux, lstat() will generally not trigger automounter > action, whereas stat() will (but see the description of > fstatat() AT_NO_AUTOMOUNT fag, above). > > which is wrong: lstat and stat treat automounts the same. > @Michael: do you recall why you inserted that text? The commit message > in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp > discussion in NOTES") is not very helpful. > > I propose correcting to > > NOTES: > On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set > and will not trigger automounter action for direct automount > points, though they may (prior to 4.14) for indirect automount > points. Shouldn't that be "lstat() and stat() act as though AT_NO_AUTOMOUNT is set ..." > > > The more precise details, that automount action for indirect automount > points is not triggered when the 'browse' option is used, is probably > not necessary. > > Ian: if you agree with that text, and Michael doesn't provide alternate > evidence, I'll submit a formal patch for the man page.... or should we > just wait until the patch actually lands? I thought the fstatat() description needed attention too, doubly so with the AT_NO_AUTOMOUNT change. The "The fstatat() system call operates in exactly the same way as stat()" is wrong in the same way as the stat() description was wrong. After the change fstatat() will trigger automounts if the AT_NO_AUTOMOUNT flag is not given which is different from lstat() and stat(). The updated NOTE above probably needs to be referred to in order to clarify what's meant by "in exactly the same way" since that probably refers to the information returned rather than whether an automount will be done. Ian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-24 4:07 ` NeilBrown 2017-08-24 4:47 ` Ian Kent @ 2017-08-24 4:58 ` Ian Kent 2017-08-24 11:03 ` Michael Kerrisk (man-pages) 2017-09-08 15:15 ` Do we really need d_weak_revalidate??? David Howells 3 siblings, 0 replies; 34+ messages in thread From: Ian Kent @ 2017-08-24 4:58 UTC (permalink / raw) To: NeilBrown, viro, David Howells, Andrew Morton Cc: Jeff Layton, Trond Myklebust, linux-kernel, mkoutny, linux-nfs, linux-fsdevel, Michael Kerrisk (man-pages) On 24/08/17 12:07, NeilBrown wrote: > > > The more precise details, that automount action for indirect automount > points is not triggered when the 'browse' option is used, is probably > not necessary. > > Ian: if you agree with that text, and Michael doesn't provide alternate > evidence, I'll submit a formal patch for the man page.... or should we > just wait until the patch actually lands? So far only David commented about using ENOENT rather than EREMOTE. I prefer ENOENT for this case myself and he didn't object when I explained why, David, any concerns? Al has been silent so far so either he hasn't seen it or he's ok with it, Al, any concerns? And I guess if there are no concerns there's a good chance Andrew is ok with it for the next merge window, Andrew? If everyone agrees then we could go ahead immediately so there's a better chance of getting it into released man pages closer to the change being merged. Ian ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-24 4:07 ` NeilBrown 2017-08-24 4:47 ` Ian Kent 2017-08-24 4:58 ` Ian Kent @ 2017-08-24 11:03 ` Michael Kerrisk (man-pages) 2017-08-25 0:05 ` Ian Kent 2017-08-25 5:32 ` [PATCH manpages] stat.2: correct AT_NO_AUTOMOUNT text and general revisions NeilBrown 2017-09-08 15:15 ` Do we really need d_weak_revalidate??? David Howells 3 siblings, 2 replies; 34+ messages in thread From: Michael Kerrisk (man-pages) @ 2017-08-24 11:03 UTC (permalink / raw) To: NeilBrown Cc: Ian Kent, Jeff Layton, Trond Myklebust, viro, linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, H. Peter Anvin Hi Neil, On 24 August 2017 at 06:07, NeilBrown <neilb@suse.com> wrote: > On Wed, Aug 23 2017, Ian Kent wrote: > >> >> That inconsistency has bothered me for quite a while now. >> >> It was carried over from the autofs module behavior when automounting >> support was added to the VFS. What's worse is it prevents the use of >> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with >> statx(). >> >> There is some risk in changing that so it does work but it really does >> need to work to enable userspace to not trigger an automount by using >> this flag. >> >> So that's (hopefully) going to change soonish, see: >> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch >> >> The result should be that stat family calls don't trigger automounts except >> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. >> > > oooh, yes. That's much better - thanks. > > We should make sure that change gets into the man pages... > > First however, we should probably correct the man page! > stat.2 says: > > > NOTES > On Linux, lstat() will generally not trigger automounter > action, whereas stat() will (but see the description of > fstatat() AT_NO_AUTOMOUNT fag, above). > > which is wrong: lstat and stat treat automounts the same. > @Michael: do you recall why you inserted that text? The commit message > in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp > discussion in NOTES") is not very helpful. That commit really was just cosmetic changes. The change that introduced the text was 82d2be3d9d66b7, based on a note from Peter Anvin: [[ > > Additionally, you may want to make a note in the stat/lstat man page tha t on > > Linux, lstat(2) will generally not trigger automounter action, whereas > > stat(2) will. > > I don't understand this last piece. Can you say some more. (I'm not > familiar with automounter details.) An automounter (either an explicit one, like autofs, or an implicit one, such as are used by AFS or NFSv4) is something that triggers a mount when something is touched. However, it's undesirable to automount, say, everyone's home directory just because someone opened up /home in their GUI browser or typed "ls -l /home". The early automounters simply didn't list the contents until you accessed it by name; this is still the case when you can't enumerate a mapping (say, all DNS names under /net). However, this is extremely inconvenient, too. The solution we ended up settling on is to create something that looks like a directory (i.e. reports S_IFDIR in stat()), but behaves somewhat like a symlink. In particular, when it is accessed in a way where a symlink would be dereferenced, the automount triggers and the directory is mounted. However, system calls which do *not* cause a symlink to be dereferenced, like lstat(), also do not cause the automounter to trigger. This means that "ls -l", or a GUI file browser, can see a list of directories without causing each one of them to be automounted. -hpa ]] Cheers, Michael > I propose correcting to > > NOTES: > On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set > and will not trigger automounter action for direct automount > points, though they may (prior to 4.14) for indirect automount > points. > > > The more precise details, that automount action for indirect automount > points is not triggered when the 'browse' option is used, is probably > not necessary. > > Ian: if you agree with that text, and Michael doesn't provide alternate > evidence, I'll submit a formal patch for the man page.... or should we > just wait until the patch actually lands? > > Thanks, > NeilBrown > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-24 11:03 ` Michael Kerrisk (man-pages) @ 2017-08-25 0:05 ` Ian Kent 2017-08-25 5:32 ` [PATCH manpages] stat.2: correct AT_NO_AUTOMOUNT text and general revisions NeilBrown 1 sibling, 0 replies; 34+ messages in thread From: Ian Kent @ 2017-08-25 0:05 UTC (permalink / raw) To: mtk.manpages, NeilBrown Cc: Jeff Layton, Trond Myklebust, viro, linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, H. Peter Anvin On 24/08/17 19:03, Michael Kerrisk (man-pages) wrote: > Hi Neil, > > On 24 August 2017 at 06:07, NeilBrown <neilb@suse.com> wrote: >> On Wed, Aug 23 2017, Ian Kent wrote: >> >>> >>> That inconsistency has bothered me for quite a while now. >>> >>> It was carried over from the autofs module behavior when automounting >>> support was added to the VFS. What's worse is it prevents the use of >>> the AT_NO_AUTOMOUNT flag from working properly with fstatat(2) and with >>> statx(). >>> >>> There is some risk in changing that so it does work but it really does >>> need to work to enable userspace to not trigger an automount by using >>> this flag. >>> >>> So that's (hopefully) going to change soonish, see: >>> http://ozlabs.org/~akpm/mmotm/broken-out/autofs-fix-at_no_automount-not-being-honored.patch >>> >>> The result should be that stat family calls don't trigger automounts except >>> for fstatat(2) and statx() which will require the AT_NO_AUTOMOUNT flag. >>> >> >> oooh, yes. That's much better - thanks. >> >> We should make sure that change gets into the man pages... >> >> First however, we should probably correct the man page! >> stat.2 says: >> >> >> NOTES >> On Linux, lstat() will generally not trigger automounter >> action, whereas stat() will (but see the description of >> fstatat() AT_NO_AUTOMOUNT fag, above). >> >> which is wrong: lstat and stat treat automounts the same. >> @Michael: do you recall why you inserted that text? The commit message >> in commit 1ef5b2805471 ("stat.2: Cosmetic reworking of timestamp >> discussion in NOTES") is not very helpful. > > That commit really was just cosmetic changes. The change that > introduced the text was 82d2be3d9d66b7, based on a note from Peter > Anvin: Indeed, that was correct for autofs v3 but we're at autofs v5 now and a lot has changed over time (the commit is from 2008). All I can do is apologize for not also checking the man pages and trying to keep them up to date. Let's just work on making them accurate now. > > [[ > > > Additionally, you may want to make a note in the stat/lstat man page tha > t on > > > Linux, lstat(2) will generally not trigger automounter action, whereas > > > stat(2) will. > > > > I don't understand this last piece. Can you say some more. (I'm not > > familiar with automounter details.) > > An automounter (either an explicit one, like autofs, or an implicit > one, such as are used by AFS or NFSv4) is something that triggers > a mount when something is touched. > > However, it's undesirable to automount, say, everyone's home > directory just because someone opened up /home in their GUI > browser or typed "ls -l /home". The early automounters simply > didn't list the contents until you accessed it by name; > this is still the case when you can't enumerate a mapping > (say, all DNS names under /net). However, this is extremely > inconvenient, too. > > The solution we ended up settling on is to create something > that looks like a directory (i.e. reports S_IFDIR in stat()), > but behaves somewhat like a symlink. In particular, when it is > accessed in a way where a symlink would be dereferenced, > the automount triggers and the directory is mounted. However, > system calls which do *not* cause a symlink to be dereferenced, > like lstat(), also do not cause the automounter to trigger. > This means that "ls -l", or a GUI file browser, can see a list > of directories without causing each one of them to be automounted. > > -hpa > ]] > > Cheers, > > Michael > >> I propose correcting to >> >> NOTES: >> On Linux, lstat() nor stat() act as though AT_NO_AUTOMOUNT was set >> and will not trigger automounter action for direct automount >> points, though they may (prior to 4.14) for indirect automount >> points. >> >> >> The more precise details, that automount action for indirect automount >> points is not triggered when the 'browse' option is used, is probably >> not necessary. >> >> Ian: if you agree with that text, and Michael doesn't provide alternate >> evidence, I'll submit a formal patch for the man page.... or should we >> just wait until the patch actually lands? >> >> Thanks, >> NeilBrown >> > > > ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH manpages] stat.2: correct AT_NO_AUTOMOUNT text and general revisions. 2017-08-24 11:03 ` Michael Kerrisk (man-pages) 2017-08-25 0:05 ` Ian Kent @ 2017-08-25 5:32 ` NeilBrown 2017-09-14 13:38 ` Michael Kerrisk (man-pages) 1 sibling, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-08-25 5:32 UTC (permalink / raw) To: mtk.manpages Cc: Ian Kent, Jeff Layton, Trond Myklebust, viro, linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, H. Peter Anvin [-- Attachment #1: Type: text/plain, Size: 3545 bytes --] Expand on the relationship between fstatat() and the other three functions, and improve the description of AT_NO_AUTOMOUNT. Specifically, both stat() and lstat() act the same way with respect to automounts, and that behavior matches fstatat with the AT_NO_AUTOMOUNT flag. The text in the NOTES is removed and places with the text for AT_NO_AUTOMOUNT to improve cohesion. New text for a difference to be introduced in 4.14. Cc: Ian Kent <ikent@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> --- Thanks Ian and Michael. I considered your input and read through the whole again, and came up with this which is quite different to what I suggested before. If this patch is applied, the result probably shouldn't be released until the relevant patch actually lands in Linus's tree. NeilBrown man2/stat.2 | 37 ++++++++++++++++++++++++------------- 1 file changed, 24 insertions(+), 13 deletions(-) diff --git a/man2/stat.2 b/man2/stat.2 index d8a9e76b3d9f..c6dddfe0d3a7 100644 --- a/man2/stat.2 +++ b/man2/stat.2 @@ -260,9 +260,12 @@ For further information on the above fields, see .SS fstatat() The .BR fstatat () -system call operates in exactly the same way as +system call is a more general interface for accessing file information +which can still provide exactly the behavior of each of .BR stat (), -except for the differences described here. +.BR lstat (), +and +.BR fstat (). .PP If the pathname given in .I pathname @@ -272,6 +275,8 @@ referred to by the file descriptor (rather than relative to the current working directory of the calling process, as is done by .BR stat () +and +.BR lstat () for a relative pathname). .PP If @@ -284,7 +289,9 @@ then .I pathname is interpreted relative to the current working directory of the calling process (like -.BR stat ()). +.BR stat () +and +.BR lstat ()). .PP If .I pathname @@ -307,7 +314,11 @@ is an empty string, operate on the file referred to by flag). In this case, .I dirfd -can refer to any type of file, not just a directory. +can refer to any type of file, not just a directory, and +the behavior of +.BR fstatat () +is similar to that of +.BR fstat (). If .I dirfd is @@ -324,6 +335,8 @@ Don't automount the terminal ("basename") component of if it is a directory that is an automount point. This allows the caller to gather attributes of an automount point (rather than the location it would mount). +Since Linux 4.14, also don't instantiate a non-existent name in an +on-demand directory such as used for automounter indirect maps. This flag can be used in tools that scan directories to prevent mass-automounting of a directory of automount points. The @@ -333,6 +346,13 @@ This flag is Linux-specific; define .B _GNU_SOURCE .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed to obtain its definition. +Both +.BR stat () +and +.BR lstat () +act as though +.B AT_NO_AUTOMOUNT +was set. .TP .B AT_SYMLINK_NOFOLLOW If @@ -474,15 +494,6 @@ fields may be less portable. The interpretation differs between systems, and possibly on a single system when NFS mounts are involved.) .SH NOTES -On Linux, -.BR lstat () -will generally not trigger automounter action, whereas -.BR stat () -will (but see the description of the -.BR fstatat () -.B AT_NO_AUTOMOUNT -fag, above). -.\" .SS Timestamp fields Older kernels and older standards did not support nanosecond timestamp fields. -- 2.14.0.rc0.dirty [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH manpages] stat.2: correct AT_NO_AUTOMOUNT text and general revisions. 2017-08-25 5:32 ` [PATCH manpages] stat.2: correct AT_NO_AUTOMOUNT text and general revisions NeilBrown @ 2017-09-14 13:38 ` Michael Kerrisk (man-pages) 2017-09-14 22:25 ` NeilBrown 0 siblings, 1 reply; 34+ messages in thread From: Michael Kerrisk (man-pages) @ 2017-09-14 13:38 UTC (permalink / raw) To: NeilBrown Cc: Ian Kent, Jeff Layton, Trond Myklebust, viro, linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, H. Peter Anvin, Michael Kerrisk Hi Neil, On 25 August 2017 at 07:32, NeilBrown <neilb@suse.com> wrote: > > Expand on the relationship between fstatat() and the other > three functions, and improve the description of AT_NO_AUTOMOUNT. > Specifically, both stat() and lstat() act the same way > with respect to automounts, and that behavior matches > fstatat with the AT_NO_AUTOMOUNT flag. > > The text in the NOTES is removed and places with the text for > AT_NO_AUTOMOUNT to improve cohesion. > > New text for a difference to be introduced in 4.14. Has the 4.14 piece gone in? Cheers, Michael > Cc: Ian Kent <ikent@redhat.com> > Signed-off-by: NeilBrown <neilb@suse.com> > --- > > Thanks Ian and Michael. I considered your input and read > through the whole again, and came up with this which is > quite different to what I suggested before. > > If this patch is applied, the result probably shouldn't be released > until the relevant patch actually lands in Linus's tree. > > NeilBrown > > > man2/stat.2 | 37 ++++++++++++++++++++++++------------- > 1 file changed, 24 insertions(+), 13 deletions(-) > > diff --git a/man2/stat.2 b/man2/stat.2 > index d8a9e76b3d9f..c6dddfe0d3a7 100644 > --- a/man2/stat.2 > +++ b/man2/stat.2 > @@ -260,9 +260,12 @@ For further information on the above fields, see > .SS fstatat() > The > .BR fstatat () > -system call operates in exactly the same way as > +system call is a more general interface for accessing file information > +which can still provide exactly the behavior of each of > .BR stat (), > -except for the differences described here. > +.BR lstat (), > +and > +.BR fstat (). > .PP > If the pathname given in > .I pathname > @@ -272,6 +275,8 @@ referred to by the file descriptor > (rather than relative to the current working directory of > the calling process, as is done by > .BR stat () > +and > +.BR lstat () > for a relative pathname). > .PP > If > @@ -284,7 +289,9 @@ then > .I pathname > is interpreted relative to the current working > directory of the calling process (like > -.BR stat ()). > +.BR stat () > +and > +.BR lstat ()). > .PP > If > .I pathname > @@ -307,7 +314,11 @@ is an empty string, operate on the file referred to by > flag). > In this case, > .I dirfd > -can refer to any type of file, not just a directory. > +can refer to any type of file, not just a directory, and > +the behavior of > +.BR fstatat () > +is similar to that of > +.BR fstat (). > If > .I dirfd > is > @@ -324,6 +335,8 @@ Don't automount the terminal ("basename") component of > if it is a directory that is an automount point. > This allows the caller to gather attributes of an automount point > (rather than the location it would mount). > +Since Linux 4.14, also don't instantiate a non-existent name in an > +on-demand directory such as used for automounter indirect maps. > This flag can be used in tools that scan directories > to prevent mass-automounting of a directory of automount points. > The > @@ -333,6 +346,13 @@ This flag is Linux-specific; define > .B _GNU_SOURCE > .\" Before glibc 2.16, defining _ATFILE_SOURCE sufficed > to obtain its definition. > +Both > +.BR stat () > +and > +.BR lstat () > +act as though > +.B AT_NO_AUTOMOUNT > +was set. > .TP > .B AT_SYMLINK_NOFOLLOW > If > @@ -474,15 +494,6 @@ fields may be less portable. > The interpretation differs between systems, > and possibly on a single system when NFS mounts are involved.) > .SH NOTES > -On Linux, > -.BR lstat () > -will generally not trigger automounter action, whereas > -.BR stat () > -will (but see the description of the > -.BR fstatat () > -.B AT_NO_AUTOMOUNT > -fag, above). > -.\" > .SS Timestamp fields > Older kernels and older standards did not support nanosecond timestamp > fields. > -- > 2.14.0.rc0.dirty > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH manpages] stat.2: correct AT_NO_AUTOMOUNT text and general revisions. 2017-09-14 13:38 ` Michael Kerrisk (man-pages) @ 2017-09-14 22:25 ` NeilBrown 2017-09-16 13:11 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-09-14 22:25 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Ian Kent, Jeff Layton, Trond Myklebust, viro, linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, H. Peter Anvin, Michael Kerrisk [-- Attachment #1: Type: text/plain, Size: 723 bytes --] On Thu, Sep 14 2017, Michael Kerrisk (man-pages) wrote: > Hi Neil, > > On 25 August 2017 at 07:32, NeilBrown <neilb@suse.com> wrote: >> >> Expand on the relationship between fstatat() and the other >> three functions, and improve the description of AT_NO_AUTOMOUNT. >> Specifically, both stat() and lstat() act the same way >> with respect to automounts, and that behavior matches >> fstatat with the AT_NO_AUTOMOUNT flag. >> >> The text in the NOTES is removed and places with the text for >> AT_NO_AUTOMOUNT to improve cohesion. >> >> New text for a difference to be introduced in 4.14. > > Has the 4.14 piece gone in? > Yes. Commit: 42f461482178 ("autofs: fix AT_NO_AUTOMOUNT not being honored") Thanks, NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH manpages] stat.2: correct AT_NO_AUTOMOUNT text and general revisions. 2017-09-14 22:25 ` NeilBrown @ 2017-09-16 13:11 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 34+ messages in thread From: Michael Kerrisk (man-pages) @ 2017-09-16 13:11 UTC (permalink / raw) To: NeilBrown Cc: mtk.manpages, Ian Kent, Jeff Layton, Trond Myklebust, viro, linux-kernel, mkoutny, linux-nfs, linux-fsdevel, David Howells, H. Peter Anvin On 09/15/2017 12:25 AM, NeilBrown wrote: > On Thu, Sep 14 2017, Michael Kerrisk (man-pages) wrote: > >> Hi Neil, >> >> On 25 August 2017 at 07:32, NeilBrown <neilb@suse.com> wrote: >>> >>> Expand on the relationship between fstatat() and the other >>> three functions, and improve the description of AT_NO_AUTOMOUNT. >>> Specifically, both stat() and lstat() act the same way >>> with respect to automounts, and that behavior matches >>> fstatat with the AT_NO_AUTOMOUNT flag. >>> >>> The text in the NOTES is removed and places with the text for >>> AT_NO_AUTOMOUNT to improve cohesion. >>> >>> New text for a difference to be introduced in 4.14. >> >> Has the 4.14 piece gone in? >> > > Yes. > > Commit: 42f461482178 ("autofs: fix AT_NO_AUTOMOUNT not being honored") Thanks. I've applied the patch. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-24 4:07 ` NeilBrown ` (2 preceding siblings ...) 2017-08-24 11:03 ` Michael Kerrisk (man-pages) @ 2017-09-08 15:15 ` David Howells 3 siblings, 0 replies; 34+ messages in thread From: David Howells @ 2017-09-08 15:15 UTC (permalink / raw) To: Ian Kent Cc: dhowells, NeilBrown, viro, Andrew Morton, Jeff Layton, Trond Myklebust, linux-kernel, mkoutny, linux-nfs, linux-fsdevel, Michael Kerrisk (man-pages) Ian Kent <ikent@redhat.com> wrote: > So far only David commented about using ENOENT rather than EREMOTE. > > I prefer ENOENT for this case myself and he didn't object when I > explained why, David, any concerns? Not really - it just seems EREMOTE is a better fit since there is something there, we're just not allowed to follow it. This is different to a dangling symlink IMO. David ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-11 5:55 ` Trond Myklebust 2017-08-11 11:01 ` Jeff Layton @ 2017-08-13 23:29 ` NeilBrown 2017-08-24 6:34 ` NeilBrown 1 sibling, 1 reply; 34+ messages in thread From: NeilBrown @ 2017-08-13 23:29 UTC (permalink / raw) To: Trond Myklebust, viro, jlayton Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 4427 bytes --] On Fri, Aug 11 2017, Trond Myklebust wrote: > On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >> flag and introduced the d_weak_revalidate dentry operation instead. >> We duly removed the flag from NFS superblocks and NFSv4 superblocks, >> and added the new dentry operation to NFS dentries .... but not to >> NFSv4 >> dentries. >> >> And nobody noticed. >> >> Until today. >> >> A customer reports a situation where mount(....,MS_REMOUNT,..) on an >> NFS >> filesystem hangs because the network has been deconfigured. This >> makes >> perfect sense and I suggested a code change to fix the problem. >> However when a colleague was trying to reproduce the problem to >> validate >> the fix, he couldn't. Then nor could I. >> >> The problem is trivially reproducible with NFSv3, and not at all with >> NFSv4. The reason is the missing d_weak_revalidate. >> >> We could simply add d_weak_revalidate for NFSv4, but given that it >> has been missing for 4.5 years, and the only time anyone noticed was >> when the ommission resulted in a better user experience, I do wonder >> if >> we need to. Can we just discard d_weak_revalidate? What purpose >> does >> it serve? I couldn't find one. >> >> Thanks, >> NeilBrown >> >> For reference, see >> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >> d_weak_revalidate dentry op") >> >> >> >> To reproduce the problem at home, on a system that uses systemd: >> 1/ place (or find) a filesystem image in a file on an NFS filesystem. >> 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >> 3/ loop-mount the filesystem image read-only somewhere >> 4/ reboot >> >> If you choose v4, the reboot will succeed, possibly after a 90second >> timeout. >> If you choose v3, the reboot will hang indefinitely in systemd- >> shutdown while >> remounting the nfs filesystem read-only. >> >> If you don't use "noac" it can still hang, but only if something >> slows >> down the reboot enough that attributes have timed out by the time >> that >> systemd-shutdown runs. This happens for our customer. >> >> If the loop-mounted filesystem is not read-only, you get other >> problems. >> >> We really want systemd to figure out that the loop-mount needs to be >> unmounted first. I have ideas concerning that, but it is messy. But >> that isn't the only bug here. > > The main purpose of d_weak_revalidate() was to catch the issues that > arise when someone changes the contents of the current working > directory or its parent on the server. Since '.' and '..' are treated > specially in the lookup code, they would not be revalidated without > special treatment. That leads to issues when looking up files as > ./<filename> or ../<filename>, since the client won't detect that its > dcache is stale until it tries to use the cached dentry+inode. I don't think that is quite right. d_weak_revalidate() is only called from complete_walk() if LOOKUP_JUMPED is set. The happens when the final component of a path: - is a mount point - is ".." or if the whole path is "/". I thought "." was treated specially too, but I cannot find that in the code. After a path walk completes, the operation that acts on the path will revalidate the inode one way or another so having an extra early validation seems hard to justify. If the inode has been removed, ESTALE is returned. The slightly earlier return of ESTALE might change some behavior.... All I can think of is that if the directory under a mountpoint gets deleted, the mountpoint is automatically removed.., but that happens in d_invalidate() which isn't called when d_weak_revalidate() is called. > > The one thing that has changed since its introduction is, I believe, > the ESTALE handling in the VFS layer. That might fix a lot of the > dcache lookup bugs that were previously handled by d_weak_revalidate(). > I haven't done an audit to figure out if it actually can handle all of > them. I agree that seems like it might be relevant, but I don't see how it would relate to any of the three cases that d_weak_revalidate affects. Maybe there is some other change that we don't remember. Thanks, NeilBrown > > -- > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Do we really need d_weak_revalidate??? 2017-08-13 23:29 ` NeilBrown @ 2017-08-24 6:34 ` NeilBrown 0 siblings, 0 replies; 34+ messages in thread From: NeilBrown @ 2017-08-24 6:34 UTC (permalink / raw) To: Trond Myklebust, viro, jlayton Cc: linux-kernel, mkoutny, linux-nfs, linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 6289 bytes --] On Mon, Aug 14 2017, NeilBrown wrote: > On Fri, Aug 11 2017, Trond Myklebust wrote: > >> On Fri, 2017-08-11 at 14:31 +1000, NeilBrown wrote: >>> Funny story. 4.5 years ago we discarded the FS_REVAL_DOT superblock >>> flag and introduced the d_weak_revalidate dentry operation instead. >>> We duly removed the flag from NFS superblocks and NFSv4 superblocks, >>> and added the new dentry operation to NFS dentries .... but not to >>> NFSv4 >>> dentries. >>> >>> And nobody noticed. >>> >>> Until today. >>> >>> A customer reports a situation where mount(....,MS_REMOUNT,..) on an >>> NFS >>> filesystem hangs because the network has been deconfigured. This >>> makes >>> perfect sense and I suggested a code change to fix the problem. >>> However when a colleague was trying to reproduce the problem to >>> validate >>> the fix, he couldn't. Then nor could I. >>> >>> The problem is trivially reproducible with NFSv3, and not at all with >>> NFSv4. The reason is the missing d_weak_revalidate. >>> >>> We could simply add d_weak_revalidate for NFSv4, but given that it >>> has been missing for 4.5 years, and the only time anyone noticed was >>> when the ommission resulted in a better user experience, I do wonder >>> if >>> we need to. Can we just discard d_weak_revalidate? What purpose >>> does >>> it serve? I couldn't find one. >>> >>> Thanks, >>> NeilBrown >>> >>> For reference, see >>> Commit: ecf3d1f1aa74 ("vfs: kill FS_REVAL_DOT by adding a >>> d_weak_revalidate dentry op") >>> >>> >>> >>> To reproduce the problem at home, on a system that uses systemd: >>> 1/ place (or find) a filesystem image in a file on an NFS filesystem. >>> 2/ mount the nfs filesystem with "noac" - choose v3 or v4 >>> 3/ loop-mount the filesystem image read-only somewhere >>> 4/ reboot >>> >>> If you choose v4, the reboot will succeed, possibly after a 90second >>> timeout. >>> If you choose v3, the reboot will hang indefinitely in systemd- >>> shutdown while >>> remounting the nfs filesystem read-only. >>> >>> If you don't use "noac" it can still hang, but only if something >>> slows >>> down the reboot enough that attributes have timed out by the time >>> that >>> systemd-shutdown runs. This happens for our customer. >>> >>> If the loop-mounted filesystem is not read-only, you get other >>> problems. >>> >>> We really want systemd to figure out that the loop-mount needs to be >>> unmounted first. I have ideas concerning that, but it is messy. But >>> that isn't the only bug here. >> >> The main purpose of d_weak_revalidate() was to catch the issues that >> arise when someone changes the contents of the current working >> directory or its parent on the server. Since '.' and '..' are treated >> specially in the lookup code, they would not be revalidated without >> special treatment. That leads to issues when looking up files as >> ./<filename> or ../<filename>, since the client won't detect that its >> dcache is stale until it tries to use the cached dentry+inode. > > I don't think that is quite right. > d_weak_revalidate() is only called from complete_walk() if LOOKUP_JUMPED > is set. The happens when the final component of a path: > - is a mount point > - is ".." > or if the whole path is "/". I thought "." was treated specially too, > but I cannot find that in the code. Actually, you were very close to the right answer, and I was missing something important. The issue (or, at least "an" issue) happens when you open "." or ".." or a mount point, or a /proc/*/fd/* symlink. In each case LOOKUP_JUMPED is set. "." doesn't set it, but it doesn't clear it either and it is always set at the start of a path lookup. When you open a file (or directory) on NFS you need to validate the attributes to ensure close-to-open consistency rules are met. When you open any path that ends with a LAST_NORM name, d_revalidate will be passed the LOOKUP_OPEN flag and so nfs_lookup_verify_inode() will force a revalidate with __nfs_revalidate_inode(). When you open something that ends with LOOKUP_JUMPED, the task of forcing the revalidate falls to d_weak_revalidate(). Unfortunately it doesn't actually do that. With NFSv4, there is no d_weak_revalidate(). With NFSv3 there is - but it doesn't know if LOOKUP_JUMPED is set, and doesn't force the revalidate. This means that if you echo * or echo ../* there might be no communication with the server, and you might get stale data. These command *do* work as expected only when the directory being listed is a mountpoint. This is because nfs_opendir() contains: if (filp->f_path.dentry == filp->f_path.mnt->mnt_root) { /* This is a mountpoint, so d_revalidate will never * have been called, so we need to refresh the * inode (for close-open consistency) ourselves. */ __nfs_revalidate_inode(NFS_SERVER(inode), inode); } which I put there some years ago, when things worked differently. There are various ways we could fix this. The simplest would be to change complete_walk() to only call d_weak_revalidate if (nd->flags & LOOKUP_OPEN), and change d_weak_revalidate to call __nfs_revalidate_inode() unconditionally. And to get NFSv4 to call this too. However I would like to take a different approach. I'd like to change nfs_lookup_revalidate to check LOOKUP_JUMPED itself, and to consider only the inode when the flag is set. When we can discard d_weak_revalidate() and call d_revalidate (with LOOKUP_JUMPED set) in complete_walk(). Maybe this is too intrusive on other filesystems that don't differentiate revalidate on open ... nfs is the only filesystem which tests LOOKUP_OPEN in d_revalidate. Or maybe the LAST_JUMPED flag could be passed to ->open (atomic_open doesn't need it) - but that could get messy. It would have to go through vfs_open Either approach will mean that umount can go back to using user_path_at(), as the final dentry will only be revalidated on open, not on other accesses. The LOOKUP_JUMPED flag and d_weak_revalidate() trace their history back to FS_REVAL_DOT, and the issue has always been about handling open() correctly when the path doesn't ends LAST_NORM. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2017-09-16 13:11 UTC | newest] Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-08-11 4:31 Do we really need d_weak_revalidate??? NeilBrown 2017-08-11 5:55 ` Trond Myklebust 2017-08-11 11:01 ` Jeff Layton 2017-08-13 23:36 ` NeilBrown 2017-08-14 10:10 ` Jeff Layton 2017-08-16 2:43 ` NeilBrown 2017-08-16 11:34 ` Jeff Layton 2017-08-16 23:47 ` NeilBrown 2017-08-17 2:20 ` Ian Kent 2017-08-18 5:24 ` NeilBrown 2017-08-18 6:47 ` Ian Kent 2017-08-18 6:55 ` Ian Kent 2017-08-21 6:23 ` NeilBrown 2017-08-21 6:32 ` Ian Kent 2017-08-21 7:46 ` NeilBrown 2017-08-23 1:06 ` NeilBrown 2017-08-23 2:32 ` Ian Kent 2017-08-23 2:40 ` Ian Kent 2017-08-23 2:54 ` Ian Kent 2017-08-23 7:51 ` Ian Kent 2017-08-24 3:21 ` NeilBrown 2017-08-24 4:35 ` Ian Kent 2017-08-24 4:07 ` NeilBrown 2017-08-24 4:47 ` Ian Kent 2017-08-24 4:58 ` Ian Kent 2017-08-24 11:03 ` Michael Kerrisk (man-pages) 2017-08-25 0:05 ` Ian Kent 2017-08-25 5:32 ` [PATCH manpages] stat.2: correct AT_NO_AUTOMOUNT text and general revisions NeilBrown 2017-09-14 13:38 ` Michael Kerrisk (man-pages) 2017-09-14 22:25 ` NeilBrown 2017-09-16 13:11 ` Michael Kerrisk (man-pages) 2017-09-08 15:15 ` Do we really need d_weak_revalidate??? David Howells 2017-08-13 23:29 ` NeilBrown 2017-08-24 6:34 ` NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).