linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "bfields@fieldses.org" <bfields@fieldses.org>
Cc: "linux-cachefs@redhat.com" <linux-cachefs@redhat.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"daire@dneg.com" <daire@dneg.com>
Subject: Re: Adventures in NFS re-exporting
Date: Wed, 23 Sep 2020 13:09:01 +0000	[thread overview]
Message-ID: <a47553497db7c9ae9f68cbe703a12a4e4051aef2.camel@hammerspace.com> (raw)
In-Reply-To: <20200923124038.GA4691@fieldses.org>

On Wed, 2020-09-23 at 08:40 -0400, J. Bruce Fields wrote:
> On Tue, Sep 22, 2020 at 01:52:25PM +0000, Trond Myklebust wrote:
> > On Tue, 2020-09-22 at 13:31 +0100, Daire Byrne wrote:
> > > Hi, 
> > > 
> > > I just thought I'd flesh out the other two issues I have found
> > > with
> > > re-exporting that are ultimately responsible for the biggest
> > > performance bottlenecks. And both of them revolve around the
> > > caching
> > > of metadata file lookups in the NFS client.
> > > 
> > > Especially for the case where we are re-exporting a server many
> > > milliseconds away (i.e. on-premise -> cloud), we want to be able
> > > to
> > > control how much the client caches metadata and file data so that
> > > it's many LAN clients all benefit from the re-export server only
> > > having to do the WAN lookups once (within a specified coherency
> > > time).
> > > 
> > > Keeping the file data in the vfs page cache or on disk using
> > > fscache/cachefiles is fairly straightforward, but keeping the
> > > metadata cached is particularly difficult. And without the cached
> > > metadata we introduce long delays before we can serve the already
> > > present and locally cached file data to many waiting clients.
> > > 
> > > ----- On 7 Sep, 2020, at 18:31, Daire Byrne daire@dneg.com wrote:
> > > > 2) If we cache metadata on the re-export server using
> > > > actimeo=3600,nocto we can
> > > > cut the network packets back to the origin server to zero for
> > > > repeated lookups.
> > > > However, if a client of the re-export server walks paths and
> > > > memory
> > > > maps those
> > > > files (i.e. loading an application), the re-export server
> > > > starts
> > > > issuing
> > > > unexpected calls back to the origin server again,
> > > > ignoring/invalidating the
> > > > re-export server's NFS client cache. We worked around this this
> > > > by
> > > > patching an
> > > > inode/iversion validity check in inode.c so that the NFS client
> > > > cache on the
> > > > re-export server is used. I'm not sure about the correctness of
> > > > this patch but
> > > > it works for our corner case.
> > > 
> > > If we use actimeo=3600,nocto (say) to mount a remote software
> > > volume
> > > on the re-export server, we can successfully cache the loading of
> > > applications and walking of paths directly on the re-export
> > > server
> > > such that after a couple of runs, there are practically zero
> > > packets
> > > back to the originating NFS server (great!). But, if we then do
> > > the
> > > same thing on a client which is mounting that re-export server,
> > > the
> > > re-export server now starts issuing lots of calls back to the
> > > originating server and invalidating it's client cache (bad!).
> > > 
> > > I'm not exactly sure why, but the iversion of the inode gets
> > > changed
> > > locally (due to atime modification?) most likely via invocation
> > > of
> > > method inode_inc_iversion_raw. Each time it gets incremented the
> > > following call to validate attributes detects changes causing it
> > > to
> > > be reloaded from the originating server.
> > > 
> > > This patch helps to avoid this when applied to the re-export
> > > server
> > > but there may be other places where this happens too. I accept
> > > that
> > > this patch is probably not the right/general way to do this, but
> > > it
> > > helps to highlight the issue when re-exporting and it works well
> > > for
> > > our use case:
> > > 
> > > --- linux-5.5.0-1.el7.x86_64/fs/nfs/inode.c     2020-01-27
> > > 00:23:03.000000000 +0000
> > > +++ new/fs/nfs/inode.c  2020-02-13 16:32:09.013055074 +0000
> > > @@ -1869,7 +1869,7 @@
> > >  
> > >         /* More cache consistency checks */
> > >         if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
> > > -               if (!inode_eq_iversion_raw(inode, fattr-
> > > > change_attr)) {
> > > +               if (inode_peek_iversion_raw(inode) < fattr-
> > > > change_attr) {
> > >                         /* Could it be a race with writeback? */
> > >                         if (!(have_writers || have_delegation)) {
> > >                                 invalid |= NFS_INO_INVALID_DATA
> > 
> > There is nothing in the base NFSv4, and NFSv4.1 specs that allow
> > you to
> > make assumptions about how the change attribute behaves over time.
> > 
> > The only safe way to do something like the above is if the server
> > supports NFSv4.2 and also advertises support for the
> > 'change_attr_type'
> > attribute. In that case, you can check at mount time for whether or
> > not
> > the change attribute on this filesystem is one of the monotonic
> > types
> > which would allow the above optimisation.
> 
> Looking at https://tools.ietf.org/html/rfc7862#section-12.2.3 .... I
> think that would be anything but NFS4_CHANGE_TYPE_IS_UNDEFINED ?
> 
> The Linux server's ctime is monotonic and will advertise that with
> change_attr_type since 4.19.
> 
> So I think it would be easy to patch the client to check
> change_attr_type and set an NFS_CAP_MONOTONIC_CHANGE flag in
> server->caps, the hard part would be figuring out which optimisations
> are OK.
> 

The ctime is *not* monotonic. It can regress under server reboots and
it can regress if someone deliberately changes the time. We have code
that tries to handle all these issues (see fattr->gencount and nfsi-
>attr_gencount) because we've hit those issues before...

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  reply	other threads:[~2020-09-23 13:09 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 17:31 Adventures in NFS re-exporting Daire Byrne
2020-09-08  9:40 ` Mkrtchyan, Tigran
2020-09-08 11:06   ` Daire Byrne
2020-09-15 17:21 ` J. Bruce Fields
2020-09-15 19:59   ` Trond Myklebust
2020-09-16 16:01     ` Daire Byrne
2020-10-19 16:19       ` Daire Byrne
2020-10-19 17:53         ` [PATCH 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 17:53           ` [PATCH 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 17:53             ` [PATCH 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-19 20:05         ` [PATCH v2 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 20:05           ` [PATCH v2 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 20:05             ` [PATCH v2 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37         ` [PATCH v3 0/3] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-20 18:37           ` [PATCH v3 1/3] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-20 18:37             ` [PATCH v3 2/3] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37               ` [PATCH v3 3/3] NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp trondmy
2020-10-21  9:33         ` Adventures in NFS re-exporting Daire Byrne
2020-11-09 16:02           ` bfields
2020-11-12 13:01             ` Daire Byrne
2020-11-12 13:57               ` bfields
2020-11-12 18:33                 ` Daire Byrne
2020-11-12 20:55                   ` bfields
2020-11-12 23:05                     ` Daire Byrne
2020-11-13 14:50                       ` bfields
2020-11-13 22:26                         ` bfields
2020-11-14 12:57                           ` Daire Byrne
2020-11-16 15:18                             ` bfields
2020-11-16 15:53                             ` bfields
2020-11-16 19:21                               ` Daire Byrne
2020-11-16 15:29                           ` Jeff Layton
2020-11-16 15:56                             ` bfields
2020-11-16 16:03                               ` Jeff Layton
2020-11-16 16:14                                 ` bfields
2020-11-16 16:38                                   ` Jeff Layton
2020-11-16 19:03                                     ` bfields
2020-11-16 20:03                                       ` Jeff Layton
2020-11-17  3:16                                         ` bfields
2020-11-17  3:18                                           ` [PATCH 1/4] nfsd: move fill_{pre,post}_wcc to nfsfh.c J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-17 15:26                                                 ` J. Bruce Fields
2020-11-17 15:34                                                   ` Jeff Layton
2020-11-20 22:38                                                     ` J. Bruce Fields
2020-11-20 22:39                                                       ` [PATCH 1/8] nfsd: only call inode_query_iversion in the I_VERSION case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 2/8] nfsd: simplify nfsd4_change_info J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 3/8] nfsd: minor nfsd4_change_attribute cleanup J. Bruce Fields
2020-11-21  0:34                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 4/8] nfsd4: don't query change attribute in v2/v3 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 5/8] nfs: use change attribute for NFS re-exports J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 6/8] nfsd: move change attribute generation to filesystem J. Bruce Fields
2020-11-21  0:58                                                           ` Jeff Layton
2020-11-21  1:01                                                             ` J. Bruce Fields
2020-11-21 13:00                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 7/8] nfsd: skip some unnecessary stats in the v4 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 8/8] Revert "nfsd4: support change_attr_type attribute" J. Bruce Fields
2020-11-20 22:44                                                       ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-21  1:03                                                         ` Jeff Layton
2020-11-21 21:44                                                           ` Daire Byrne
2020-11-22  0:02                                                             ` bfields
2020-11-22  1:55                                                               ` Daire Byrne
2020-11-22  3:03                                                                 ` bfields
2020-11-23 20:07                                                                   ` Daire Byrne
2020-11-17 15:25                                               ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 3/4] nfs: don't mangle i_version on NFS J. Bruce Fields
2020-11-17 12:27                                               ` Jeff Layton
2020-11-17 14:14                                                 ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 4/4] nfs: support i_version in the NFSv4 case J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-24 20:35               ` Adventures in NFS re-exporting Daire Byrne
2020-11-24 21:15                 ` bfields
2020-11-24 22:15                   ` Frank Filz
2020-11-25 14:47                     ` 'bfields'
2020-11-25 16:25                       ` Frank Filz
2020-11-25 19:03                         ` 'bfields'
2020-11-26  0:04                           ` Frank Filz
2020-11-25 17:14                   ` Daire Byrne
2020-11-25 19:31                     ` bfields
2020-12-03 12:20                     ` Daire Byrne
2020-12-03 18:51                       ` bfields
2020-12-03 20:27                         ` Trond Myklebust
2020-12-03 21:13                           ` bfields
2020-12-03 21:32                             ` Frank Filz
2020-12-03 21:34                             ` Trond Myklebust
2020-12-03 21:45                               ` Frank Filz
2020-12-03 21:57                                 ` Trond Myklebust
2020-12-03 22:04                                   ` bfields
2020-12-03 22:14                                     ` Trond Myklebust
2020-12-03 22:39                                       ` Frank Filz
2020-12-03 22:50                                         ` Trond Myklebust
2020-12-03 23:34                                           ` Frank Filz
2020-12-03 22:44                                       ` bfields
2020-12-03 21:54                               ` bfields
2020-12-03 22:45                               ` bfields
2020-12-03 22:53                                 ` Trond Myklebust
2020-12-03 23:16                                   ` bfields
2020-12-03 23:28                                     ` Frank Filz
2020-12-04  1:02                                     ` Trond Myklebust
2020-12-04  1:41                                       ` bfields
2020-12-04  2:27                                         ` Trond Myklebust
2020-09-17 16:01   ` Daire Byrne
2020-09-17 19:09     ` bfields
2020-09-17 20:23       ` Frank van der Linden
2020-09-17 21:57         ` bfields
2020-09-19 11:08           ` Daire Byrne
2020-09-22 16:43         ` Chuck Lever
2020-09-23 20:25           ` Daire Byrne
2020-09-23 21:01             ` Frank van der Linden
2020-09-26  9:00               ` Daire Byrne
2020-09-28 15:49                 ` Frank van der Linden
2020-09-28 16:08                   ` Chuck Lever
2020-09-28 17:42                     ` Frank van der Linden
2020-09-22 12:31 ` Daire Byrne
2020-09-22 13:52   ` Trond Myklebust
2020-09-23 12:40     ` J. Bruce Fields
2020-09-23 13:09       ` Trond Myklebust [this message]
2020-09-23 17:07         ` bfields
2020-09-30 19:30   ` [Linux-cachefs] " Jeff Layton
2020-10-01  0:09     ` Daire Byrne
2020-10-01 10:36       ` Jeff Layton
2020-10-01 12:38         ` Trond Myklebust
2020-10-01 16:39           ` Jeff Layton
2020-10-05 12:54         ` Daire Byrne
2020-10-13  9:59           ` Daire Byrne
2020-10-01 18:41     ` J. Bruce Fields
2020-10-01 19:24       ` Trond Myklebust
2020-10-01 19:26         ` bfields
2020-10-01 19:29           ` Trond Myklebust
2020-10-01 19:51             ` bfields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a47553497db7c9ae9f68cbe703a12a4e4051aef2.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=bfields@fieldses.org \
    --cc=daire@dneg.com \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).