All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Trond Myklebust <trondmy@hammerspace.com>,
	"daire@dneg.com" <daire@dneg.com>
Cc: "linux-cachefs@redhat.com" <linux-cachefs@redhat.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [Linux-cachefs] Adventures in NFS re-exporting
Date: Thu, 01 Oct 2020 12:39:41 -0400	[thread overview]
Message-ID: <cab47436d0050570cafaf9c9c4a02ee6202215fd.camel@kernel.org> (raw)
In-Reply-To: <7cdb496a2b77dd62b8e6373c28926f11a4816d49.camel@hammerspace.com>

On Thu, 2020-10-01 at 12:38 +0000, Trond Myklebust wrote:
> On Thu, 2020-10-01 at 06:36 -0400, Jeff Layton wrote:
> > On Thu, 2020-10-01 at 01:09 +0100, Daire Byrne wrote:
> > > ----- On 30 Sep, 2020, at 20:30, Jeff Layton jlayton@kernel.org
> > > wrote:
> > > 
> > > > On Tue, 2020-09-22 at 13:31 +0100, Daire Byrne wrote:
> > > > > Hi,
> > > > > 
> > > > > I just thought I'd flesh out the other two issues I have found
> > > > > with re-exporting
> > > > > that are ultimately responsible for the biggest performance
> > > > > bottlenecks. And
> > > > > both of them revolve around the caching of metadata file
> > > > > lookups in the NFS
> > > > > client.
> > > > > 
> > > > > Especially for the case where we are re-exporting a server many
> > > > > milliseconds
> > > > > away (i.e. on-premise -> cloud), we want to be able to control
> > > > > how much the
> > > > > client caches metadata and file data so that it's many LAN
> > > > > clients all benefit
> > > > > from the re-export server only having to do the WAN lookups
> > > > > once (within a
> > > > > specified coherency time).
> > > > > 
> > > > > Keeping the file data in the vfs page cache or on disk using
> > > > > fscache/cachefiles
> > > > > is fairly straightforward, but keeping the metadata cached is
> > > > > particularly
> > > > > difficult. And without the cached metadata we introduce long
> > > > > delays before we
> > > > > can serve the already present and locally cached file data to
> > > > > many waiting
> > > > > clients.
> > > > > 
> > > > > ----- On 7 Sep, 2020, at 18:31, Daire Byrne daire@dneg.com
> > > > > wrote:
> > > > > > 2) If we cache metadata on the re-export server using
> > > > > > actimeo=3600,nocto we can
> > > > > > cut the network packets back to the origin server to zero for
> > > > > > repeated lookups.
> > > > > > However, if a client of the re-export server walks paths and
> > > > > > memory maps those
> > > > > > files (i.e. loading an application), the re-export server
> > > > > > starts issuing
> > > > > > unexpected calls back to the origin server again,
> > > > > > ignoring/invalidating the
> > > > > > re-export server's NFS client cache. We worked around this
> > > > > > this by patching an
> > > > > > inode/iversion validity check in inode.c so that the NFS
> > > > > > client cache on the
> > > > > > re-export server is used. I'm not sure about the correctness
> > > > > > of this patch but
> > > > > > it works for our corner case.
> > > > > 
> > > > > If we use actimeo=3600,nocto (say) to mount a remote software
> > > > > volume on the
> > > > > re-export server, we can successfully cache the loading of
> > > > > applications and
> > > > > walking of paths directly on the re-export server such that
> > > > > after a couple of
> > > > > runs, there are practically zero packets back to the
> > > > > originating NFS server
> > > > > (great!). But, if we then do the same thing on a client which
> > > > > is mounting that
> > > > > re-export server, the re-export server now starts issuing lots
> > > > > of calls back to
> > > > > the originating server and invalidating it's client cache
> > > > > (bad!).
> > > > > 
> > > > > I'm not exactly sure why, but the iversion of the inode gets
> > > > > changed locally
> > > > > (due to atime modification?) most likely via invocation of
> > > > > method
> > > > > inode_inc_iversion_raw. Each time it gets incremented the
> > > > > following call to
> > > > > validate attributes detects changes causing it to be reloaded
> > > > > from the
> > > > > originating server.
> > > > > 
> > > > 
> > > > I'd expect the change attribute to track what's in actual inode
> > > > on the
> > > > "home" server. The NFS client is supposed to (mostly) keep the
> > > > raw
> > > > change attribute in its i_version field.
> > > > 
> > > > The only place we call inode_inc_iversion_raw is in
> > > > nfs_inode_add_request, which I don't think you'd be hitting
> > > > unless you
> > > > were writing to the file while holding a write delegation.
> > > > 
> > > > What sort of server is hosting the actual data in your setup?
> > > 
> > > We mostly use RHEL7.6 NFS servers with XFS backed filesystems and a
> > > couple of (older) Netapps too. The re-export server is running the
> > > latest mainline kernel(s).
> > > 
> > > As far as I can make out, both these originating (home) server
> > > types exhibit a similar (but not exactly the same) effect on the
> > > Linux NFS client cache when it is being re-exported and accessed by
> > > other clients. I can replicate it when only using a read-only mount
> > > at every hop so I don't think that writes are related.
> > > 
> > > Our RHEL7 NFS servers actually mount XFS with noatime too so any
> > > atime updates that might be causing this client invalidation (which
> > > is what I initially thought) are ultimately a wasted effort.
> > > 
> > 
> > Ok. I suspect there is a bug here somewhere, but with such a
> > complicated
> > setup though it's not clear to me where that bug would be though. You
> > might need to do some packet sniffing and look at what the servers
> > are
> > sending for change attributes.
> > 
> > nfsd4_change_attribute does mix in the ctime, so your hunch about the
> > atime may be correct. atime updates imply a ctime update and that
> > could
> > cause nfsd to continually send a new one, even on files that aren't
> > being changed.
> 
> No. Ordinary atime updates due to read() do not trigger a ctime or
> change attribute update. Only an explicit atime update through, e.g. a
> call to utimensat() will do that.
> 

Oh, interesting. I didn't realize that.

> > It might be interesting to doctor nfsd4_change_attribute() to not mix
> > in
> > the ctime and see whether that improves things. If it does, then we
> > may
> > want to teach nfsd how to avoid doing that for certain types of
> > filesystems.
> 
> NACK. That would cause very incorrect behaviour for the change
> attribute. It is supposed to change in all circumstances where you
> ordinarily see a ctime change.


I wasn't suggesting this as a real fix, just as a way to see whether we
understand the problem correctly. I doubt the reexporting machine would
be bumping the change_attr on its own, and this may tell you whether
it's the "home" server changing it. There are other ways to determine it
too though (packet sniffer, for instance).

-- 
Jeff Layton <jlayton@kernel.org>


  reply	other threads:[~2020-10-01 16:39 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-07 17:31 Adventures in NFS re-exporting Daire Byrne
2020-09-08  9:40 ` Mkrtchyan, Tigran
2020-09-08 11:06   ` Daire Byrne
2020-09-15 17:21 ` J. Bruce Fields
2020-09-15 19:59   ` Trond Myklebust
2020-09-16 16:01     ` Daire Byrne
2020-10-19 16:19       ` Daire Byrne
2020-10-19 17:53         ` [PATCH 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 17:53           ` [PATCH 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 17:53             ` [PATCH 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-19 20:05         ` [PATCH v2 0/2] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-19 20:05           ` [PATCH v2 1/2] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-19 20:05             ` [PATCH v2 2/2] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37         ` [PATCH v3 0/3] Add NFSv3 emulation of the lookupp operation trondmy
2020-10-20 18:37           ` [PATCH v3 1/3] NFSv3: Refactor nfs3_proc_lookup() to split out the dentry trondmy
2020-10-20 18:37             ` [PATCH v3 2/3] NFSv3: Add emulation of the lookupp() operation trondmy
2020-10-20 18:37               ` [PATCH v3 3/3] NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp trondmy
2020-10-21  9:33         ` Adventures in NFS re-exporting Daire Byrne
2020-11-09 16:02           ` bfields
2020-11-12 13:01             ` Daire Byrne
2020-11-12 13:57               ` bfields
2020-11-12 18:33                 ` Daire Byrne
2020-11-12 20:55                   ` bfields
2020-11-12 23:05                     ` Daire Byrne
2020-11-13 14:50                       ` bfields
2020-11-13 22:26                         ` bfields
2020-11-14 12:57                           ` Daire Byrne
2020-11-16 15:18                             ` bfields
2020-11-16 15:53                             ` bfields
2020-11-16 19:21                               ` Daire Byrne
2020-11-16 15:29                           ` Jeff Layton
2020-11-16 15:56                             ` bfields
2020-11-16 16:03                               ` Jeff Layton
2020-11-16 16:14                                 ` bfields
2020-11-16 16:38                                   ` Jeff Layton
2020-11-16 19:03                                     ` bfields
2020-11-16 20:03                                       ` Jeff Layton
2020-11-17  3:16                                         ` bfields
2020-11-17  3:18                                           ` [PATCH 1/4] nfsd: move fill_{pre,post}_wcc to nfsfh.c J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-17 15:26                                                 ` J. Bruce Fields
2020-11-17 15:34                                                   ` Jeff Layton
2020-11-20 22:38                                                     ` J. Bruce Fields
2020-11-20 22:39                                                       ` [PATCH 1/8] nfsd: only call inode_query_iversion in the I_VERSION case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 2/8] nfsd: simplify nfsd4_change_info J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 3/8] nfsd: minor nfsd4_change_attribute cleanup J. Bruce Fields
2020-11-21  0:34                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 4/8] nfsd4: don't query change attribute in v2/v3 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 5/8] nfs: use change attribute for NFS re-exports J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 6/8] nfsd: move change attribute generation to filesystem J. Bruce Fields
2020-11-21  0:58                                                           ` Jeff Layton
2020-11-21  1:01                                                             ` J. Bruce Fields
2020-11-21 13:00                                                           ` Jeff Layton
2020-11-20 22:39                                                         ` [PATCH 7/8] nfsd: skip some unnecessary stats in the v4 case J. Bruce Fields
2020-11-20 22:39                                                         ` [PATCH 8/8] Revert "nfsd4: support change_attr_type attribute" J. Bruce Fields
2020-11-20 22:44                                                       ` [PATCH 2/4] nfsd: pre/post attr is using wrong change attribute J. Bruce Fields
2020-11-21  1:03                                                         ` Jeff Layton
2020-11-21 21:44                                                           ` Daire Byrne
2020-11-22  0:02                                                             ` bfields
2020-11-22  1:55                                                               ` Daire Byrne
2020-11-22  3:03                                                                 ` bfields
2020-11-23 20:07                                                                   ` Daire Byrne
2020-11-17 15:25                                               ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 3/4] nfs: don't mangle i_version on NFS J. Bruce Fields
2020-11-17 12:27                                               ` Jeff Layton
2020-11-17 14:14                                                 ` J. Bruce Fields
2020-11-17  3:18                                             ` [PATCH 4/4] nfs: support i_version in the NFSv4 case J. Bruce Fields
2020-11-17 12:34                                               ` Jeff Layton
2020-11-24 20:35               ` Adventures in NFS re-exporting Daire Byrne
2020-11-24 21:15                 ` bfields
2020-11-24 22:15                   ` Frank Filz
2020-11-25 14:47                     ` 'bfields'
2020-11-25 16:25                       ` Frank Filz
2020-11-25 19:03                         ` 'bfields'
2020-11-26  0:04                           ` Frank Filz
2020-11-25 17:14                   ` Daire Byrne
2020-11-25 19:31                     ` bfields
2020-12-03 12:20                     ` Daire Byrne
2020-12-03 18:51                       ` bfields
2020-12-03 20:27                         ` Trond Myklebust
2020-12-03 21:13                           ` bfields
2020-12-03 21:32                             ` Frank Filz
2020-12-03 21:34                             ` Trond Myklebust
2020-12-03 21:45                               ` Frank Filz
2020-12-03 21:57                                 ` Trond Myklebust
2020-12-03 22:04                                   ` bfields
2020-12-03 22:14                                     ` Trond Myklebust
2020-12-03 22:39                                       ` Frank Filz
2020-12-03 22:50                                         ` Trond Myklebust
2020-12-03 23:34                                           ` Frank Filz
2020-12-03 22:44                                       ` bfields
2020-12-03 21:54                               ` bfields
2020-12-03 22:45                               ` bfields
2020-12-03 22:53                                 ` Trond Myklebust
2020-12-03 23:16                                   ` bfields
2020-12-03 23:28                                     ` Frank Filz
2020-12-04  1:02                                     ` Trond Myklebust
2020-12-04  1:41                                       ` bfields
2020-12-04  2:27                                         ` Trond Myklebust
2020-09-17 16:01   ` Daire Byrne
2020-09-17 19:09     ` bfields
2020-09-17 20:23       ` Frank van der Linden
2020-09-17 21:57         ` bfields
2020-09-19 11:08           ` Daire Byrne
2020-09-22 16:43         ` Chuck Lever
2020-09-23 20:25           ` Daire Byrne
2020-09-23 21:01             ` Frank van der Linden
2020-09-26  9:00               ` Daire Byrne
2020-09-28 15:49                 ` Frank van der Linden
2020-09-28 16:08                   ` Chuck Lever
2020-09-28 17:42                     ` Frank van der Linden
2020-09-22 12:31 ` Daire Byrne
2020-09-22 13:52   ` Trond Myklebust
2020-09-23 12:40     ` J. Bruce Fields
2020-09-23 13:09       ` Trond Myklebust
2020-09-23 17:07         ` bfields
2020-09-30 19:30   ` [Linux-cachefs] " Jeff Layton
2020-10-01  0:09     ` Daire Byrne
2020-10-01 10:36       ` Jeff Layton
2020-10-01 12:38         ` Trond Myklebust
2020-10-01 16:39           ` Jeff Layton [this message]
2020-10-05 12:54         ` Daire Byrne
2020-10-13  9:59           ` Daire Byrne
2020-10-01 18:41     ` J. Bruce Fields
2020-10-01 19:24       ` Trond Myklebust
2020-10-01 19:26         ` bfields
2020-10-01 19:29           ` Trond Myklebust
2020-10-01 19:51             ` bfields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cab47436d0050570cafaf9c9c4a02ee6202215fd.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=daire@dneg.com \
    --cc=linux-cachefs@redhat.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.