* recent intermittent fsx-related failures

From: Chuck Lever III @ 2021-07-23 20:12 UTC
To: Trond Myklebust; +Cc: Linux NFS Mailing List

Hi-

I noticed recently that generic/075, generic/112, and generic/127 were
failing intermittently on NFSv3 mounts. All three of these tests are
based on fsx.

"git bisect" landed on this commit:

  7b24dacf0840 ("NFS: Another inode revalidation improvement")

After reverting 7b24dacf0840 on v5.14-rc1, I can no longer reproduce
the test failures.

--
Chuck Lever
* Re: recent intermittent fsx-related failures

From: Trond Myklebust @ 2021-07-23 20:24 UTC
To: chuck.lever; +Cc: linux-nfs

On Fri, 2021-07-23 at 20:12 +0000, Chuck Lever III wrote:
> Hi-
>
> I noticed recently that generic/075, generic/112, and generic/127 were
> failing intermittently on NFSv3 mounts. All three of these tests are
> based on fsx.
>
> "git bisect" landed on this commit:
>
>   7b24dacf0840 ("NFS: Another inode revalidation improvement")
>
> After reverting 7b24dacf0840 on v5.14-rc1, I can no longer reproduce
> the test failures.

So you are seeing file metadata updates that end up not changing the
ctime?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: recent intermittent fsx-related failures

From: Chuck Lever III @ 2021-07-23 21:31 UTC
To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On Jul 23, 2021, at 4:24 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> So you are seeing file metadata updates that end up not changing the
> ctime?

I haven't drilled that deeply into it yet.

Fwiw, the test runs against a tmpfs export.

--
Chuck Lever
* Re: recent intermittent fsx-related failures

From: Chuck Lever III @ 2021-08-23 15:21 UTC
To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On Jul 23, 2021, at 5:31 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>> On Jul 23, 2021, at 4:24 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>>
>> So you are seeing file metadata updates that end up not changing the
>> ctime?
>
> I haven't drilled that deeply into it yet.
>
> Fwiw, the test runs against a tmpfs export.

Btw, I'm still seeing this failure on occasion. It had stopped
reproducing for a while, so I set it aside, but it has cropped up again.

--
Chuck Lever
* Re: recent intermittent fsx-related failures

From: Chuck Lever III @ 2021-09-19 23:03 UTC
To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On Jul 23, 2021, at 4:24 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> So you are seeing file metadata updates that end up not changing the
> ctime?

As far as I can tell, a WRITE and two SETATTRs are happening in
sequence to the same file during the same jiffy. The WRITE does not
report pre/post attributes, but the SETATTRs do. The reported pre- and
post- mtime and ctime are all the same value for both SETATTRs, I
believe due to timestamp_truncate().

My theory is that persistent-storage-backed filesystems go slow enough
that this doesn't become a significant problem. But with tmpfs, it can
happen often enough that the client gets confused. And I can make the
problem unreproducible if I enable enough debugging paraphernalia on
the server to slow it down.

I'm not exactly sure how the client becomes confused by this behavior,
but fsx reports a stale size value, or it can hit a bus error. I'm
seeing at least four of the fsx-based xfs tests fail intermittently.

--
Chuck Lever
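[A minimal userspace sketch of the timing hazard described above. It is
illustrative only: CLOCK_REALTIME_COARSE stands in for the kernel's
jiffy-resolution clock behind current_time(), and the WRITE/SETATTR
labels are assumptions, not a trace of the actual RPCs.]

#include <stdio.h>
#include <time.h>

int main(void)
{
	struct timespec t1, t2;

	/* ctime a hypothetical WRITE would be assigned */
	clock_gettime(CLOCK_REALTIME_COARSE, &t1);

	/* ctime a hypothetical SETATTR issued right afterward would get */
	clock_gettime(CLOCK_REALTIME_COARSE, &t2);

	/* On most runs both samples land in one clock tick: identical
	 * ctimes, so nothing in the post-op attributes tells the two
	 * operations apart. */
	printf("t1=%ld.%09ld t2=%ld.%09ld -> %s\n",
	       (long)t1.tv_sec, t1.tv_nsec, (long)t2.tv_sec, t2.tv_nsec,
	       (t1.tv_sec == t2.tv_sec && t1.tv_nsec == t2.tv_nsec) ?
	       "indistinguishable by ctime" : "distinguishable");
	return 0;
}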
* Re: recent intermittent fsx-related failures

From: Trond Myklebust @ 2021-09-19 23:19 UTC
To: chuck.lever; +Cc: linux-nfs

On Sun, 2021-09-19 at 23:03 +0000, Chuck Lever III wrote:
> As far as I can tell, a WRITE and two SETATTRs are happening in
> sequence to the same file during the same jiffy. The WRITE does not
> report pre/post attributes, but the SETATTRs do. The reported pre- and
> post- mtime and ctime are all the same value for both SETATTRs, I
> believe due to timestamp_truncate().
>
> My theory is that persistent-storage-backed filesystems go slow enough
> that this doesn't become a significant problem. But with tmpfs, it can
> happen often enough that the client gets confused. And I can make the
> problem unreproducible if I enable enough debugging paraphernalia on
> the server to slow it down.
>
> I'm not exactly sure how the client becomes confused by this behavior,
> but fsx reports a stale size value, or it can hit a bus error. I'm
> seeing at least four of the fsx-based xfs tests fail intermittently.

It really isn't a client problem then. If the server is failing to
update the timestamps, then you gets what you gets.

This is one of the reasons why we need to move on from NFSv3.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: recent intermittent fsx-related failures

From: Chuck Lever III @ 2021-09-20 20:05 UTC
To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On Sep 19, 2021, at 7:19 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Sun, 2021-09-19 at 23:03 +0000, Chuck Lever III wrote:
>> As far as I can tell, a WRITE and two SETATTRs are happening in
>> sequence to the same file during the same jiffy. The WRITE does not
>> report pre/post attributes, but the SETATTRs do. The reported pre- and
>> post- mtime and ctime are all the same value for both SETATTRs, I
>> believe due to timestamp_truncate().
>>
>> [...]
>
> It really isn't a client problem then. If the server is failing to
> update the timestamps, then you gets what you gets.

I don't think it's as simple as that. The Linux VFS has clamped the
resolution of file timestamps since before the git era began. See
current_time() and its ancestors. The fsx-based tests start failing
only after 7b24dacf0840 ("NFS: Another inode revalidation improvement")
was applied to the client. So until 7b24dacf0840, things worked despite
poor server-side timestamp resolution.

In addition, it's not terribly sensible that the client should ignore
changes that it made itself simply because the ctime on the server
didn't change. m/ctime has been more or less a hint since day one, used
to detect possible changes by _other_ clients. Here, the client is
doing a SETATTR, then throwing away the server-returned attributes and
presenting a stale file size from its own cache to an application. That
smells awfully like a client regression to me.

In any event, as I said above, I'm not exactly sure how the client is
becoming confused, so this is not yet a rigorous root-cause analysis. I
was simply responding to your question about file metadata updates
without a ctime change. Yes, that is happening, but apparently that is
actually a pretty normal situation.

--
Chuck Lever
* Re: recent intermittent fsx-related failures

From: Trond Myklebust @ 2021-09-21 19:00 UTC
To: chuck.lever; +Cc: linux-nfs

On Sun, 2021-09-19 at 23:03 +0000, Chuck Lever III wrote:
> As far as I can tell, a WRITE and two SETATTRs are happening in
> sequence to the same file during the same jiffy. The WRITE does not
> report pre/post attributes, but the SETATTRs do. The reported pre- and
> post- mtime and ctime are all the same value for both SETATTRs, I
> believe due to timestamp_truncate().
>
> [...]
>
> I'm not exactly sure how the client becomes confused by this behavior,
> but fsx reports a stale size value, or it can hit a bus error. I'm
> seeing at least four of the fsx-based xfs tests fail intermittently.

The client no longer relies on post-op attributes in order to update
the metadata after a successful SETATTR. If you look at
nfs_setattr_update_inode() you'll see that it picks the values that
were set directly from the iattr argument.

The post-op attributes are only used to determine the implicit
timestamp updates, and to detect any other updates that may have
happened.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
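[A sketch of the update rule Trond describes, under assumed names; the
structs and helpers below are simplified stand-ins, not the actual
fs/nfs/inode.c definitions. The point is that fields the client itself
set are taken from the request's iattr, while the reply's post-op
attributes only feed implicit-update and change detection.]

struct sk_fattr { long long size; long long ctime_ns; };
struct sk_inode { long long i_size; long long i_ctime_ns; };
struct sk_iattr {
	unsigned int ia_valid;		/* which fields the SETATTR set */
	long long ia_size;
};

#define SK_ATTR_SIZE 0x1

static void sk_setattr_update_inode(struct sk_inode *inode,
				    const struct sk_iattr *attr,
				    const struct sk_fattr *fattr)
{
	/* Trust what we asked the server to set... */
	if (attr->ia_valid & SK_ATTR_SIZE)
		inode->i_size = attr->ia_size;

	/* ...and use the post-op attrs only for implicit updates,
	 * e.g. the ctime bump the SETATTR caused on the server. */
	inode->i_ctime_ns = fattr->ctime_ns;
}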
* Re: recent intermittent fsx-related failures

From: Chuck Lever III @ 2021-09-24 15:30 UTC
To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On Sep 21, 2021, at 3:00 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> The client no longer relies on post-op attributes in order to update
> the metadata after a successful SETATTR. If you look at
> nfs_setattr_update_inode() you'll see that it picks the values that
> were set directly from the iattr argument.
>
> The post-op attributes are only used to determine the implicit
> timestamp updates, and to detect any other updates that may have
> happened.

I've been able to directly and repeatedly observe the size attribute
reverting to a previous value.

The issue stems from the MM driving a background readahead operation at
the same time the application truncates or extends the file. The READ
starts before the size-mutating operation and completes after it.

If the server happens to have done the READ before the size-mutating
operation, the READ result contains the previous size value. When the
READ completes, the client overwrites the more recent size value with
the stale one.

I'm not yet sure how this relates to

  7b24dacf0840 ("NFS: Another inode revalidation improvement")

and maybe it doesn't. "git bisect" with an unreliable reproducer
generates notoriously noisy data.

--
Chuck Lever
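[A toy, single-threaded replay of the interleaving Chuck describes.
Each reply carries the file size the server saw when it executed the
RPC, and applying replies naively in *completion* order lets a READ
that the server executed before the size change clobber the newer
size. The sizes are borrowed from the trace Chuck posts later in the
thread; everything else is illustrative.]

#include <stdio.h>

static long long cached_size = 151896;	/* client's cached file size */

static void apply_reply(const char *op, long long size_in_reply)
{
	cached_size = size_in_reply;	/* naive: trust every reply */
	printf("%-8s applied: cached size now %lld\n", op, cached_size);
}

int main(void)
{
	/* 1. A readahead READ is launched; the server executes it while
	 *    the file is still 151896 bytes, so that is the size its
	 *    post-op attributes carry. */
	long long read_reply_size = 151896;

	/* 2. A SETATTR extends the file; its reply is applied first. */
	apply_reply("SETATTR", 240521);

	/* 3. The READ reply arrives late and reverts the size. */
	apply_reply("READ", read_reply_size);

	/* exits 1: the stale size won */
	return cached_size == 240521 ? 0 : 1;
}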
* Re: recent intermittent fsx-related failures

From: Trond Myklebust @ 2021-09-24 22:09 UTC
To: chuck.lever; +Cc: linux-nfs

On Fri, 2021-09-24 at 15:30 +0000, Chuck Lever III wrote:
> I've been able to directly and repeatedly observe the size attribute
> reverting to a previous value.
>
> The issue stems from the MM driving a background readahead operation at
> the same time the application truncates or extends the file. The READ
> starts before the size-mutating operation and completes after it.
>
> If the server happens to have done the READ before the size-mutating
> operation, the READ result contains the previous size value. When the
> READ completes, the client overwrites the more recent size value with
> the stale one.
>
> I'm not yet sure how this relates to
>
>   7b24dacf0840 ("NFS: Another inode revalidation improvement")
>
> and maybe it doesn't. "git bisect" with an unreliable reproducer
> generates notoriously noisy data.

Hmm... That makes sense. If so, the issue is that the attributes from
the READ end up tricking nfs_inode_finish_partial_attr_update() into
OKing the update because the ctime ends up looking the same, and so the
client tries to opportunistically revalidate the cache that was (for
some reason) already marked as being invalid.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
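[An illustrative stand-in for the check Trond is pointing at, with
assumed names rather than the real nfs_inode_finish_partial_attr_update()
logic: if a reply's ctime equals the cached ctime, conclude nothing
changed on the server and accept the accompanying attributes. With
jiffy-granularity ctimes, a reply the server generated *before* a local
size change can still pass this test, so its stale size is accepted.]

#include <stdbool.h>

struct sk_fattr { long long ctime_ns; long long size; };
struct sk_inode { long long ctime_ns; long long size; };

static bool sk_partial_attr_update_ok(const struct sk_inode *inode,
				      const struct sk_fattr *fattr)
{
	/* Too weak a test when the timestamp granularity is coarser
	 * than the interval between metadata-mutating RPCs. */
	return fattr->ctime_ns == inode->ctime_ns;
}

static void sk_refresh_inode(struct sk_inode *inode,
			     const struct sk_fattr *fattr)
{
	if (sk_partial_attr_update_ok(inode, fattr))
		inode->size = fattr->size;	/* may revert a newer size */
}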
* Re: recent intermittent fsx-related failures

From: Chuck Lever III @ 2021-09-25 17:26 UTC
To: Trond Myklebust; +Cc: Linux NFS Mailing List

> On Sep 24, 2021, at 6:09 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> Hmm... That makes sense. If so, the issue is that the attributes from
> the READ end up tricking nfs_inode_finish_partial_attr_update() into
> OKing the update because the ctime ends up looking the same, and so the
> client tries to opportunistically revalidate the cache that was (for
> some reason) already marked as being invalid.

That agrees with the behavior I'm able to observe.

I sprinkled in extra tracing to get more visibility on what's
happening. The XDR decoders report the file's size as returned by the
server. I also added events in readpages to note when a readahead
starts. Finally, a few new tracepoints show the cache and fattr
validity flags used to make cache update decisions.

An excerpt with annotations follows. Throughout this part of the trace,
the iversion remains at 1752899367783879642.

# The MM starts the readahead operation here
fsx-1387 [006] 391.823097: nfs_aops_readpages: fileid=00:28:2 fhandle=0x36fbbe51 nr_pages=7
fsx-1387 [006] 391.823102: nfs_initiate_read: fileid=00:28:2 fhandle=0x36fbbe51 offset=114688 count=28672

# This is a concurrent truncating SETATTR
fsx-1387 [006] 391.823109: nfs_setattr_enter: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899367783879642
fsx-1387 [006] 391.823109: nfs_writeback_inode_enter: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899367783879642
fsx-1387 [006] 391.823109: nfs_writeback_inode_exit: error=0 (OK) fileid=00:28:2 fhandle=0x36fbbe51 type=8 (REG) version=1752899367783879642 size=151896 cache_validity=0x2000 (DATA_INVAL_DEFER) nfs_flags=0x4 (ACL_LRU_SET)

# This is the new size value carried by the SETATTR reply
fsx-1387 [006] 391.823136: bprint: nfs3_xdr_dec_setattr3res: task:61174@5 size=0x3ab89

# This event captures the client adjusting the inode's size
fsx-1387 [006] 391.823137: nfs_size_truncate: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899367783879642 old size=151896 new size=240521

# Here, refresh_inode_locked is shunted to check_inode_attributes,
# which decides not to update the inode's size again
fsx-1387 [006] 391.823138: bprint: nfs_inode_attrs_cmp: fileid=00:28:2 fhandle=0x36fbbe51 res=0
fsx-1387 [006] 391.823138: nfs_refresh_inode_enter: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899367783879642
fsx-1387 [006] 391.823138: nfs_partial_attr_update: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899367783879642 cache_validity=DATA_INVAL_DEFER fattr_validity=TYPE|MODE|NLINK|OWNER|GROUP|RDEV|SIZE|PRE_SIZE|SPACE_USED|FSID|FILEID|ATIME|MTIME|CTIME|PRE_MTIME|PRE_CTIME|CHANGE|PRE_CHANGE
fsx-1387 [006] 391.823139: nfs_check_attrs: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899367783879642 valid_flags=
fsx-1387 [006] 391.823139: nfs_refresh_inode_exit: error=0 (OK) fileid=00:28:2 fhandle=0x36fbbe51 type=8 (REG) version=1752899367783879642 size=240521 cache_validity=0x2000 (DATA_INVAL_DEFER) nfs_flags=0x4 (ACL_LRU_SET)
fsx-1387 [006] 391.823139: nfs_setattr_exit: error=0 (OK) fileid=00:28:2 fhandle=0x36fbbe51 type=8 (REG) version=1752899367783879642 size=240521 cache_validity=0x2000 (DATA_INVAL_DEFER) nfs_flags=0x4 (ACL_LRU_SET)

# Then the readahead completes. The size carried by that READ result
# is also captured here
kworker/u24:9-193 [005] 391.823090: nfs_readpage_done: fileid=00:28:2 fhandle=0x36fbbe51 offset=49152 count=65536 res=65536 status=65536
kworker/u24:9-193 [005] 391.823142: bprint: nfs3_xdr_dec_read3res: task:61173@5 size=0x25158

# inode_attrs_cmp returns zero also in this case. Prior to 7b24dac,
# this would have been enough to block the following size update.
kworker/u24:9-193 [005] 391.823143: bprint: nfs_inode_attrs_cmp: fileid=00:28:2 fhandle=0x36fbbe51 res=0
kworker/u24:9-193 [005] 391.823143: nfs_refresh_inode_enter: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899367783879642
kworker/u24:9-193 [005] 391.823143: nfs_partial_attr_update: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899367783879642 cache_validity=INVALID_ATIME|DATA_INVAL_DEFER fattr_validity=TYPE|MODE|NLINK|OWNER|GROUP|RDEV|SIZE|SPACE_USED|FSID|FILEID|ATIME|MTIME|CTIME|CHANGE

# refresh_inode_locked decides to invoke nfs_update_inode instead
# of check_inode_attributes, and that reverts the SETATTR's size
# change
kworker/u24:9-193 [005] 391.823144: nfs_size_update: fileid=00:28:2 fhandle=0x36fbbe51 version=1752899367783879642 old size=240521 new size=151896
kworker/u24:9-193 [005] 391.823144: nfs_refresh_inode_exit: error=0 (OK) fileid=00:28:2 fhandle=0x36fbbe51 type=8 (REG) version=1752899367783879642 size=151896 cache_validity=0x2000 (DATA_INVAL_DEFER) nfs_flags=0x4 (ACL_LRU_SET)

I apologize for this taking so long to diagnose. I was only recently
able to derive a reproducer that is reliable enough to make steady
progress. Fwiw, I'm now using:

  # fsx -q -l 262144 -o 65536 -S 191110531 -N 1000000 -R -W fsx_std_nommap

on an NFSv3 over RDMA mount point against a tmpfs share. This is taken
from generic/127, but -N is increased from 100,000 to a million.

--
Chuck Lever
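[For completeness: the usual defense against this class of race is to
stamp every set of received attributes with a generation number sampled
when the RPC is launched, and to discard replies whose stamp predates
the inode's most recent attribute update. The Linux client has a
mechanism along these lines built around
nfs_inc_attr_generation_counter(); the sketch below uses assumed names
and is not the actual implementation.]

#include <stdatomic.h>
#include <stdbool.h>

static atomic_long sk_attr_gencount;

struct sk_inode { long attr_gencount; long long size; };
struct sk_fattr { long gencount; long long size; };

static void sk_fattr_init(struct sk_fattr *fattr)
{
	/* Stamped at RPC launch time, before the request is sent. */
	fattr->gencount = atomic_fetch_add(&sk_attr_gencount, 1) + 1;
}

static bool sk_apply_attrs(struct sk_inode *inode,
			   const struct sk_fattr *fattr)
{
	/* A reply stamped before the cache's last update is stale;
	 * the subtraction tolerates counter wraparound. */
	if (fattr->gencount - inode->attr_gencount < 0)
		return false;
	inode->attr_gencount = fattr->gencount;
	inode->size = fattr->size;
	return true;
}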