From: Trond Myklebust <trondmy@hammerspace.com>
To: "chuck.lever@oracle.com" <chuck.lever@oracle.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: recent intermittent fsx-related failures
Date: Fri, 24 Sep 2021 22:09:44 +0000
Message-ID: <aa2ef2bbb991d693009fb5cf130462a366f5d459.camel@hammerspace.com>
In-Reply-To: <EA26A03F-962E-4561-9A70-C97D19574993@oracle.com>

On Fri, 2021-09-24 at 15:30 +0000, Chuck Lever III wrote:
> 
> 
> > On Sep 21, 2021, at 3:00 PM, Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > 
> > On Sun, 2021-09-19 at 23:03 +0000, Chuck Lever III wrote:
> > > 
> > > 
> > > > On Jul 23, 2021, at 4:24 PM, Trond Myklebust
> > > > <trondmy@hammerspace.com> wrote:
> > > > 
> > > > On Fri, 2021-07-23 at 20:12 +0000, Chuck Lever III wrote:
> > > > > Hi-
> > > > > 
> > > > > I noticed recently that generic/075, generic/112, and
> > > > > generic/127 were failing intermittently on NFSv3 mounts.
> > > > > All three of these tests are based on fsx.
> > > > > 
> > > > > "git bisect" landed on this commit:
> > > > > 
> > > > > 7b24dacf0840 ("NFS: Another inode revalidation improvement")
> > > > > 
> > > > > After reverting 7b24dacf0840 on v5.14-rc1, I can no longer
> > > > > reproduce the test failures.
> > > > > 
> > > > > 
> > > > 
> > > > So you are seeing file metadata updates that end up not
> > > > changing the ctime?
> > > 
> > > As far as I can tell, a WRITE and two SETATTRs are happening in
> > > sequence to the same file during the same jiffy. The WRITE does
> > > not report pre/post attributes, but the SETATTRs do. The reported
> > > pre- and post- mtime and ctime are all the same value for both
> > > SETATTRs, I believe due to timestamp_truncate().
> > > 
> > > My theory is that persistent-storage-backed filesystems seem to
> > > go slow enough that it doesn't become a significant problem. But
> > > with tmpfs, this can happen often enough that the client gets
> > > confused. And I can make the problem unreproducible if I enable
> > > enough debugging paraphernalia on the server to slow it down.
> > > 
> > > I'm not exactly sure how the client becomes confused by this
> > > behavior, but fsx reports a stale size value, or it can hit a
> > > bus error. I'm seeing at least four of the fsx-based xfstests
> > > fail intermittently.
> > > 
> > 
> > The client no longer relies on post-op attributes in order to
> > update the metadata after a successful SETATTR. If you look at
> > nfs_setattr_update_inode() you'll see that it picks the values that
> > were set directly from the iattr argument.
> > 
> > The post-op attributes are only used to determine the implicit
> > timestamp updates, and to detect any other updates that may have
> > happened.
> 
> I've been able to directly and repeatedly observe the size attribute
> reverting to a previous value.
> 
> The issue stems from the MM driving a background readahead operation
> at the same time the application truncates or extends the file. The
> READ starts before the size-mutating operation and completes after
> it.
> 
> If the server happens to have done the READ before the size-mutating
> operation, the READ result contains the previous size value. When
> the READ completes, the client overwrites the more recent size
> value with the stale one.
> 
> I'm not yet sure how this relates to
> 
> 7b24dacf0840 ("NFS: Another inode revalidation improvement")
> 
> and maybe it doesn't. "git bisect" with an unreliable reproducer
> generates notoriously noisy data. 
> 

Hmm... That makes sense. If so, the issue is that the attributes from
the READ end up tricking nfs_inode_finish_partial_attr_update() into
OKing the update because the ctime looks unchanged, and so the client
tries to opportunistically revalidate a cache that was (for some
reason) already marked as invalid.
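
To make that hypothesis concrete, here is a toy Python model of the
sequence discussed above: a READ is answered by the server before a
size-changing SETATTR, but its reply reaches the client afterwards.
This is only a sketch under stated assumptions. Server, Client,
GRANULE_NS and the ctime-only acceptance check are hypothetical
stand-ins for illustration; they are not the actual logic of
nfs_inode_finish_partial_attr_update() or timestamp_truncate().

```python
from dataclasses import dataclass

# Assumption: an arbitrary coarse timestamp granularity, chosen only
# for illustration (4 ms, i.e. one jiffy if HZ were 250).
GRANULE_NS = 4_000_000


def truncate_ts(ns: int) -> int:
    """Round a nanosecond timestamp down to the assumed granularity."""
    return ns - ns % GRANULE_NS


@dataclass(frozen=True)
class Attrs:
    size: int
    ctime: int


class Server:
    """Hypothetical server that stamps ctime at coarse granularity."""

    def __init__(self, size: int, now_ns: int) -> None:
        self.attrs = Attrs(size, truncate_ts(now_ns))

    def read(self) -> Attrs:
        # A READ does not modify the file; the reply carries the
        # attributes as they were when the server handled the READ.
        return self.attrs

    def setattr_size(self, size: int, now_ns: int) -> Attrs:
        self.attrs = Attrs(size, truncate_ts(now_ns))
        return self.attrs


class Client:
    """Hypothetical client with a ctime-only staleness check."""

    def __init__(self, attrs: Attrs) -> None:
        self.cached = attrs

    def apply_setattr(self, new_size: int, post_op: Attrs) -> None:
        # Size is taken from the request, ctime from the post-op attrs.
        self.cached = Attrs(new_size, post_op.ctime)

    def apply_read_reply(self, reply: Attrs) -> None:
        # Toy check: accept the reply whenever its ctime matches the
        # cached ctime. A stale reply passes if ctime did not move.
        if reply.ctime == self.cached.ctime:
            self.cached = reply


def run(setattr_delay_ns: int) -> int:
    """Return the client's cached size after the racy sequence."""
    t0 = 1_000_000_000
    server = Server(size=8192, now_ns=t0)
    client = Client(server.read())

    stale = server.read()  # readahead READ handled by the server first
    post_op = server.setattr_size(4096, now_ns=t0 + setattr_delay_ns)
    client.apply_setattr(4096, post_op)
    client.apply_read_reply(stale)  # READ reply arrives late
    return client.cached.size
```

In this model, run(1_000_000) returns 8192: the SETATTR lands in the
same timestamp granule as the READ, the ctime values compare equal, and
the stale size overwrites the new one. run(10_000_000) returns 4096,
because the ctime has moved and the late reply is rejected, which would
be consistent with slower backing filesystems hiding the problem.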

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com


