All of lore.kernel.org
 help / color / mirror / Atom feed
From: "bfields@fieldses.org" <bfields@fieldses.org>
To: Trond Myklebust <trondmy@hammerspace.com>
Cc: "linux-cachefs@redhat.com" <linux-cachefs@redhat.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"daire@dneg.com" <daire@dneg.com>
Subject: Re: Adventures in NFS re-exporting
Date: Wed, 23 Sep 2020 13:07:35 -0400	[thread overview]
Message-ID: <20200923170735.GC4691@fieldses.org> (raw)
In-Reply-To: <a47553497db7c9ae9f68cbe703a12a4e4051aef2.camel@hammerspace.com>

On Wed, Sep 23, 2020 at 01:09:01PM +0000, Trond Myklebust wrote:
> On Wed, 2020-09-23 at 08:40 -0400, J. Bruce Fields wrote:
> > On Tue, Sep 22, 2020 at 01:52:25PM +0000, Trond Myklebust wrote:
> > > On Tue, 2020-09-22 at 13:31 +0100, Daire Byrne wrote:
> > > > Hi, 
> > > > 
> > > > I just thought I'd flesh out the other two issues I have found
> > > > with
> > > > re-exporting that are ultimately responsible for the biggest
> > > > performance bottlenecks. And both of them revolve around the
> > > > caching
> > > > of metadata file lookups in the NFS client.
> > > > 
> > > > Especially for the case where we are re-exporting a server many
> > > > milliseconds away (i.e. on-premise -> cloud), we want to be able
> > > > to
> > > > control how much the client caches metadata and file data so that
> > > > it's many LAN clients all benefit from the re-export server only
> > > > having to do the WAN lookups once (within a specified coherency
> > > > time).
> > > > 
> > > > Keeping the file data in the vfs page cache or on disk using
> > > > fscache/cachefiles is fairly straightforward, but keeping the
> > > > metadata cached is particularly difficult. And without the cached
> > > > metadata we introduce long delays before we can serve the already
> > > > present and locally cached file data to many waiting clients.
> > > > 
> > > > ----- On 7 Sep, 2020, at 18:31, Daire Byrne daire@dneg.com wrote:
> > > > > 2) If we cache metadata on the re-export server using
> > > > > actimeo=3600,nocto we can
> > > > > cut the network packets back to the origin server to zero for
> > > > > repeated lookups.
> > > > > However, if a client of the re-export server walks paths and
> > > > > memory
> > > > > maps those
> > > > > files (i.e. loading an application), the re-export server
> > > > > starts
> > > > > issuing
> > > > > unexpected calls back to the origin server again,
> > > > > ignoring/invalidating the
> > > > > re-export server's NFS client cache. We worked around this this
> > > > > by
> > > > > patching an
> > > > > inode/iversion validity check in inode.c so that the NFS client
> > > > > cache on the
> > > > > re-export server is used. I'm not sure about the correctness of
> > > > > this patch but
> > > > > it works for our corner case.
> > > > 
> > > > If we use actimeo=3600,nocto (say) to mount a remote software
> > > > volume
> > > > on the re-export server, we can successfully cache the loading of
> > > > applications and walking of paths directly on the re-export
> > > > server
> > > > such that after a couple of runs, there are practically zero
> > > > packets
> > > > back to the originating NFS server (great!). But, if we then do
> > > > the
> > > > same thing on a client which is mounting that re-export server,
> > > > the
> > > > re-export server now starts issuing lots of calls back to the
> > > > originating server and invalidating it's client cache (bad!).
> > > > 
> > > > I'm not exactly sure why, but the iversion of the inode gets
> > > > changed
> > > > locally (due to atime modification?) most likely via invocation
> > > > of
> > > > method inode_inc_iversion_raw. Each time it gets incremented the
> > > > following call to validate attributes detects changes causing it
> > > > to
> > > > be reloaded from the originating server.
> > > > 
> > > > This patch helps to avoid this when applied to the re-export
> > > > server
> > > > but there may be other places where this happens too. I accept
> > > > that
> > > > this patch is probably not the right/general way to do this, but
> > > > it
> > > > helps to highlight the issue when re-exporting and it works well
> > > > for
> > > > our use case:
> > > > 
> > > > --- linux-5.5.0-1.el7.x86_64/fs/nfs/inode.c     2020-01-27
> > > > 00:23:03.000000000 +0000
> > > > +++ new/fs/nfs/inode.c  2020-02-13 16:32:09.013055074 +0000
> > > > @@ -1869,7 +1869,7 @@
> > > >  
> > > >         /* More cache consistency checks */
> > > >         if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
> > > > -               if (!inode_eq_iversion_raw(inode, fattr-
> > > > > change_attr)) {
> > > > +               if (inode_peek_iversion_raw(inode) < fattr-
> > > > > change_attr) {
> > > >                         /* Could it be a race with writeback? */
> > > >                         if (!(have_writers || have_delegation)) {
> > > >                                 invalid |= NFS_INO_INVALID_DATA
> > > 
> > > There is nothing in the base NFSv4, and NFSv4.1 specs that allow
> > > you to
> > > make assumptions about how the change attribute behaves over time.
> > > 
> > > The only safe way to do something like the above is if the server
> > > supports NFSv4.2 and also advertises support for the
> > > 'change_attr_type'
> > > attribute. In that case, you can check at mount time for whether or
> > > not
> > > the change attribute on this filesystem is one of the monotonic
> > > types
> > > which would allow the above optimisation.
> > 
> > Looking at https://tools.ietf.org/html/rfc7862#section-12.2.3 .... I
> > think that would be anything but NFS4_CHANGE_TYPE_IS_UNDEFINED ?
> > 
> > The Linux server's ctime is monotonic and will advertise that with
> > change_attr_type since 4.19.
> > 
> > So I think it would be easy to patch the client to check
> > change_attr_type and set an NFS_CAP_MONOTONIC_CHANGE flag in
> > server->caps, the hard part would be figuring out which optimisations
> > are OK.
> > 
> 
> The ctime is *not* monotonic. It can regress under server reboots and
> it can regress if someone deliberately changes the time.

So, anything other than IS_UNDEFINED or IS_TIME_METADATA?

Though the linux server is susceptible to some of that even when it
returns MONTONIC_INCR.  If the admin replaces the filesystem by an older
snapshot, there's not much we can do.  I'm not sure what degree of
gaurantee we need.

--b.

> We have code
> that tries to handle all these issues (see fattr->gencount and nfsi-
> >attr_gencount) because we've hit those issues before...



  reply	other threads:[~2020-09-23 17:07 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 17:31 Adventures in NFS re-exporting Daire Byrne
2020-09-08  9:40 ` Mkrtchyan, Tigran
2020-09-08 11:06   ` Daire Byrne
2020-09-15 17:21 ` J. Bruce Fields
2020-09-15 19:59   ` Trond Myklebust
2020-09-16 16:01     ` Daire Byrne
2020-10-19 16:19       ` Daire Byrne
2020-10-19 17:53         ` [PATCH 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 17:53           ` [PATCH 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 17:53             ` [PATCH 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-19 20:05         ` [PATCH v2 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 20:05           ` [PATCH v2 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 20:05             ` [PATCH v2 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37         ` [PATCH v3 0/3] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-20 18:37           ` [PATCH v3 1/3] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-20 18:37             ` [PATCH v3 2/3] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37               ` [PATCH v3 3/3] NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp trondmy
2020-10-21  9:33         ` Adventures in NFS re-exporting Daire Byrne
2020-11-09 16:02           ` bfields
2020-11-12 13:01             ` Daire Byrne
2020-11-12 13:57               ` bfields
2020-11-12 18:33                 ` Daire Byrne
2020-11-12 20:55                   ` bfields
2020-11-12 23:05                     ` Daire Byrne
2020-11-13 14:50                       ` bfields
2020-11-13 22:26                         ` bfields
2020-11-14 12:57                           ` Daire Byrne
2020-11-16 15:18                             ` bfields
2020-11-16 15:53                             ` bfields
2020-11-16 19:21                               ` Daire Byrne
2020-11-16 15:29                           ` Jeff Layton
2020-11-16 15:56                             ` bfields
2020-11-16 16:03                               ` Jeff Layton
2020-11-16 16:14                                 ` bfields
2020-11-16 16:38                                   ` Jeff Layton
2020-11-16 19:03                                     ` bfields
2020-11-16 20:03                                       ` Jeff Layton
2020-11-17  3:16                                         ` bfields
2020-11-17  3:18                                           ` [PATCH 1/4] nfsd: move fill_{pre,post}_wcc to nfsfh.c J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-17 15:26                                                 ` J. Bruce Fields
2020-11-17 15:34                                                   ` Jeff Layton
2020-11-20 22:38                                                     ` J. Bruce Fields
2020-11-20 22:39                                                       ` [PATCH 1/8] nfsd: only call inode_query_iversion in the I_VERSION case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 2/8] nfsd: simplify nfsd4_change_info J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 3/8] nfsd: minor nfsd4_change_attribute cleanup J. Bruce Fields
2020-11-21  0:34                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 4/8] nfsd4: don't query change attribute in v2/v3 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 5/8] nfs: use change attribute for NFS re-exports J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 6/8] nfsd: move change attribute generation to filesystem J. Bruce Fields
2020-11-21  0:58                                                           ` Jeff Layton
2020-11-21  1:01                                                             ` J. Bruce Fields
2020-11-21 13:00                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 7/8] nfsd: skip some unnecessary stats in the v4 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 8/8] Revert "nfsd4: support change_attr_type attribute" J. Bruce Fields
2020-11-20 22:44                                                       ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-21  1:03                                                         ` Jeff Layton
2020-11-21 21:44                                                           ` Daire Byrne
2020-11-22  0:02                                                             ` bfields
2020-11-22  1:55                                                               ` Daire Byrne
2020-11-22  3:03                                                                 ` bfields
2020-11-23 20:07                                                                   ` Daire Byrne
2020-11-17 15:25                                               ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 3/4] nfs: don't mangle i_version on NFS J. Bruce Fields
2020-11-17 12:27                                               ` Jeff Layton
2020-11-17 14:14                                                 ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 4/4] nfs: support i_version in the NFSv4 case J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-24 20:35               ` Adventures in NFS re-exporting Daire Byrne
2020-11-24 21:15                 ` bfields
2020-11-24 22:15                   ` Frank Filz
2020-11-25 14:47                     ` 'bfields'
2020-11-25 16:25                       ` Frank Filz
2020-11-25 19:03                         ` 'bfields'
2020-11-26  0:04                           ` Frank Filz
2020-11-25 17:14                   ` Daire Byrne
2020-11-25 19:31                     ` bfields
2020-12-03 12:20                     ` Daire Byrne
2020-12-03 18:51                       ` bfields
2020-12-03 20:27                         ` Trond Myklebust
2020-12-03 21:13                           ` bfields
2020-12-03 21:32                             ` Frank Filz
2020-12-03 21:34                             ` Trond Myklebust
2020-12-03 21:45                               ` Frank Filz
2020-12-03 21:57                                 ` Trond Myklebust
2020-12-03 22:04                                   ` bfields
2020-12-03 22:14                                     ` Trond Myklebust
2020-12-03 22:39                                       ` Frank Filz
2020-12-03 22:50                                         ` Trond Myklebust
2020-12-03 23:34                                           ` Frank Filz
2020-12-03 22:44                                       ` bfields
2020-12-03 21:54                               ` bfields
2020-12-03 22:45                               ` bfields
2020-12-03 22:53                                 ` Trond Myklebust
2020-12-03 23:16                                   ` bfields
2020-12-03 23:28                                     ` Frank Filz
2020-12-04  1:02                                     ` Trond Myklebust
2020-12-04  1:41                                       ` bfields
2020-12-04  2:27                                         ` Trond Myklebust
2020-09-17 16:01   ` Daire Byrne
2020-09-17 19:09     ` bfields
2020-09-17 20:23       ` Frank van der Linden
2020-09-17 21:57         ` bfields
2020-09-19 11:08           ` Daire Byrne
2020-09-22 16:43         ` Chuck Lever
2020-09-23 20:25           ` Daire Byrne
2020-09-23 21:01             ` Frank van der Linden
2020-09-26  9:00               ` Daire Byrne
2020-09-28 15:49                 ` Frank van der Linden
2020-09-28 16:08                   ` Chuck Lever
2020-09-28 17:42                     ` Frank van der Linden
2020-09-22 12:31 ` Daire Byrne
2020-09-22 13:52   ` Trond Myklebust
2020-09-23 12:40     ` J. Bruce Fields
2020-09-23 13:09       ` Trond Myklebust
2020-09-23 17:07         ` bfields [this message]
2020-09-30 19:30   ` [Linux-cachefs] " Jeff Layton
2020-10-01  0:09     ` Daire Byrne
2020-10-01 10:36       ` Jeff Layton
2020-10-01 12:38         ` Trond Myklebust
2020-10-01 16:39           ` Jeff Layton
2020-10-05 12:54         ` Daire Byrne
2020-10-13  9:59           ` Daire Byrne
2020-10-01 18:41     ` J. Bruce Fields
2020-10-01 19:24       ` Trond Myklebust
2020-10-01 19:26         ` bfields
2020-10-01 19:29           ` Trond Myklebust
2020-10-01 19:51             ` bfields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200923170735.GC4691@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=daire@dneg.com \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.