linux-nfs.vger.kernel.org archive mirror
From: Jeff Layton <jlayton@kernel.org>
To: bfields <bfields@fieldses.org>
Cc: Daire Byrne <daire@dneg.com>,
	Trond Myklebust <trondmy@hammerspace.com>,
	linux-cachefs <linux-cachefs@redhat.com>,
	linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: Adventures in NFS re-exporting
Date: Mon, 16 Nov 2020 11:38:44 -0500
Message-ID: <db55bab87b6fc9dd1594f8c2e853f07b1165ff5d.camel@kernel.org>
In-Reply-To: <20201116161407.GG898@fieldses.org>

On Mon, 2020-11-16 at 11:14 -0500, bfields wrote:
> On Mon, Nov 16, 2020 at 11:03:00AM -0500, Jeff Layton wrote:
> > On Mon, 2020-11-16 at 10:56 -0500, bfields wrote:
> > > On Mon, Nov 16, 2020 at 10:29:29AM -0500, Jeff Layton wrote:
> > > > On Fri, 2020-11-13 at 17:26 -0500, bfields wrote:
> > > > > On Fri, Nov 13, 2020 at 09:50:50AM -0500, bfields wrote:
> > > > > > On Thu, Nov 12, 2020 at 11:05:57PM +0000, Daire Byrne wrote:
> > > > > > > So, I can't lay claim to identifying the exact optimisation/hack that
> > > > > > > improves the retention of the re-export server's client cache when
> > > > > > > re-exporting an NFSv3 server (which is then read by many clients). We
> > > > > > > were working with an engineer at the time who showed an interest in
> > > > > > > our use case and after we supplied a reproducer he suggested modifying
> > > > > > > the nfs/inode.c
> > > > > > > 
> > > > > > > -		if (!inode_eq_iversion_raw(inode, fattr->change_attr)) {
> > > > > > > +		if (inode_peek_iversion_raw(inode) < fattr->change_attr)
> > > > > > > {
> > > > > > > 
> > > > > > > His reasoning at the time was:
> > > > > > > 
> > > > > > > "Fixes inode invalidation caused by read access. The least important
> > > > > > > bit is ORed with 1 and causes the inode version to differ from the one
> > > > > > > seen on the NFS share. This in turn causes unnecessary re-download
> > > > > > > impacting the performance significantly. This fix makes it only
> > > > > > > re-fetch file content if inode version seen on the server is newer
> > > > > > > than the one on the client."
> > > > > > > 
> > > > > > > But I've always been puzzled by why this only seems to be the case
> > > > > > > when using knfsd to re-export the (NFSv3) client mount. Using multiple
> > > > > > > processes on a standard client mount never causes any similar
> > > > > > > re-validations. And this happens with a completely read-only share
> > > > > > > which is why I started to think it has something to do with atimes as
> > > > > > > that could perhaps still cause a "write" modification even when
> > > > > > > read-only?
> > > > > > 
> > > > > > Ah-hah!  So, it's inode_query_iversion() that's modifying a nfs inode's
> > > > > > i_version.  That's a special thing that only nfsd would do.
> > > > > > 
> > > > > > I think that's totally fixable, we'll just have to think a little about
> > > > > > how....
> > > > > 
> > > > > I wonder if something like this helps?--b.
> > > > > 
> > > > > commit 0add88a9ccc5
> > > > > Author: J. Bruce Fields <bfields@redhat.com>
> > > > > Date:   Fri Nov 13 17:03:04 2020 -0500
> > > > > 
> > > > > >     nfs: don't mangle i_version on NFS
> > > > > > 
> > > > > >     The i_version on NFS is pretty much opaque to the client, so we don't
> > > > > >     want to give the low bit any special interpretation.
> > > > > > 
> > > > > >     Define a new FS_PRIVATE_I_VERSION flag for filesystems that manage the
> > > > > >     i_version on their own.
> > > > > > 
> > > > > >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > > > 
> > > > > diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
> > > > > index 29ec8b09a52d..9b8dd5b713a7 100644
> > > > > --- a/fs/nfs/fs_context.c
> > > > > +++ b/fs/nfs/fs_context.c
> > > > > @@ -1488,7 +1488,8 @@ struct file_system_type nfs_fs_type = {
> > > > >  	.init_fs_context	= nfs_init_fs_context,
> > > > >  	.parameters		= nfs_fs_parameters,
> > > > >  	.kill_sb		= nfs_kill_super,
> > > > > -	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
> > > > > +	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
> > > > > +				  FS_PRIVATE_I_VERSION,
> > > > >  };
> > > > >  MODULE_ALIAS_FS("nfs");
> > > > >  EXPORT_SYMBOL_GPL(nfs_fs_type);
> > > > > @@ -1500,7 +1501,8 @@ struct file_system_type nfs4_fs_type = {
> > > > >  	.init_fs_context	= nfs_init_fs_context,
> > > > >  	.parameters		= nfs_fs_parameters,
> > > > >  	.kill_sb		= nfs_kill_super,
> > > > > -	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
> > > > > +	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
> > > > > +				  FS_PRIVATE_I_VERSION,
> > > > >  };
> > > > >  MODULE_ALIAS_FS("nfs4");
> > > > >  MODULE_ALIAS("nfs4");
> > > > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > > > index 21cc971fd960..c5bb4268228b 100644
> > > > > --- a/include/linux/fs.h
> > > > > +++ b/include/linux/fs.h
> > > > > @@ -2217,6 +2217,7 @@ struct file_system_type {
> > > > >  #define FS_HAS_SUBTYPE		4
> > > > >  #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
> > > > >  #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
> > > > > +#define FS_PRIVATE_I_VERSION	32	/* i_version managed by filesystem */
> > > > >  #define FS_THP_SUPPORT		8192	/* Remove once all fs converted */
> > > > >  #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
> > > > >  	int (*init_fs_context)(struct fs_context *);
> > > > > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > > > > index 2917ef990d43..52c790a847de 100644
> > > > > --- a/include/linux/iversion.h
> > > > > +++ b/include/linux/iversion.h
> > > > > @@ -307,6 +307,8 @@ inode_query_iversion(struct inode *inode)
> > > > >  	u64 cur, old, new;
> > > > >  
> > > > > 
> > > > >  	cur = inode_peek_iversion_raw(inode);
> > > > > +	if (inode->i_sb->s_type->fs_flags & FS_PRIVATE_I_VERSION)
> > > > > +		return cur;
> > > > >  	for (;;) {
> > > > >  		/* If flag is already set, then no need to swap */
> > > > >  		if (cur & I_VERSION_QUERIED) {
> > > > 
> > > > 
> > > > It's probably more correct to just check the already-existing
> > > > SB_I_VERSION flag here
> > > 
> > > So the check would be
> > > 
> > > 	if (!IS_I_VERSION(inode))
> > > 		return cur;
> > > 
> > > ?
> > > 
> > 
> > Yes, that looks about right.
> 
> That doesn't sound right to me.  NFS, for example, has a perfectly good
> i_version that works as a change attribute, so it should set
> SB_I_VERSION.  But it doesn't want the vfs playing games with the low
> bit.
> 
> (In fact, I'm confused now: the improvement Daire was seeing should only
> be possible if the re-export server was seeing SB_I_VERSION set on the
> NFS filesystem it was exporting, but a quick grep doesn't actually show
> me where NFS is setting SB_I_VERSION.  I'm missing something
> obvious....)


Hmm, ok... nfsd4_change_attribute() is called from the nfs4 code and
from the nfs3 code as well. The v4 caller (encode_change) only calls it
when IS_I_VERSION is set, but the v3 callers don't seem to pay attention
to that.

I think the basic issue here is that we're trying to use SB_I_VERSION
for two different things. Its main purpose is to tell the kernel that,
when it updates the file times, it should also (possibly) increment the
i_version counter. (Some of this is documented in
include/linux/iversion.h, fwiw)
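
To make the low-bit behaviour discussed in this thread concrete, here is
a minimal userspace sketch, not kernel code. All model_* names and
struct model_inode are hypothetical; the private_i_version field merely
stands in for the FS_PRIVATE_I_VERSION check in the quoted patch, and
the atomic cmpxchg loop of the real inode_query_iversion() is left out
because the model is single-threaded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bit of the raw counter records "someone has queried this value". */
#define MODEL_QUERIED_SHIFT 1
#define MODEL_QUERIED ((uint64_t)1 << (MODEL_QUERIED_SHIFT - 1))

/* Hypothetical stand-in for the few inode fields that matter here. */
struct model_inode {
	uint64_t i_version;      /* raw value, flag bit included */
	bool private_i_version;  /* models FS_PRIVATE_I_VERSION from the patch */
};

/*
 * Simplified model of a query: for ordinary filesystems the query sets
 * the low bit and returns the counter shifted past it. With the private
 * flag set, the raw value is returned untouched.
 */
uint64_t model_query_iversion(struct model_inode *inode)
{
	uint64_t cur = inode->i_version;

	if (inode->private_i_version)
		return cur;
	inode->i_version = cur | MODEL_QUERIED;
	return cur >> MODEL_QUERIED_SHIFT;
}

/*
 * Models the client-side test from the quoted nfs/inode.c hunk: any
 * difference between the cached raw value and the server's change
 * attribute is treated as a change.
 */
bool model_needs_invalidate(const struct model_inode *inode,
			    uint64_t change_attr)
{
	return inode->i_version != change_attr;
}
```

In this model, an inode caching a server change attribute of 100 ends up
with a raw value of 101 after one query, so the next comparison against
the server's unchanged 100 reports a mismatch. With private_i_version
set, the raw value stays at 100 and the comparison passes.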

nfsd needs a way to tell whether the field should be consulted at all.
For that we probably do need a different flag of some sort. Doing it at
the fstype level seems a bit wrong though -- v2/3 don't have a real
change attribute and it probably shouldn't be trusted when exporting
them.
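
A rough userspace sketch of what such a separate flag might look like
follows. Everything in it is an assumption made for illustration: the
MODEL_SB_CHANGE_ATTR_OK flag, the model_* names and the ctime-based
fallback encoding are hypothetical and are not an existing kernel
interface. The only point is that "the vfs should bump the counter" and
"an exporter may trust the counter" become two independent per-superblock
bits, so a v4 mount and a v2/3 mount of the same fstype can differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models SB_I_VERSION: the vfs bumps i_version when it updates times. */
#define MODEL_SB_I_VERSION      (1u << 0)
/* Hypothetical: an exporter may consult i_version as a change attribute. */
#define MODEL_SB_CHANGE_ATTR_OK (1u << 1)

/* Hypothetical stand-in for a superblock; flags are set per mount. */
struct model_sb {
	unsigned int flags;
};

/* The attributes an exporter could use to build a change attribute. */
struct model_attrs {
	uint64_t i_version;
	uint32_t ctime_sec;
	uint32_t ctime_nsec;
};

/* Should the exporter look at i_version at all for this mount? */
bool model_can_use_iversion(const struct model_sb *sb)
{
	return (sb->flags & MODEL_SB_CHANGE_ATTR_OK) != 0;
}

/*
 * Return i_version when the mount says it is trustworthy; otherwise
 * fall back to a value derived from ctime (an assumed encoding, chosen
 * only to keep the example small).
 */
uint64_t model_export_change_attr(const struct model_sb *sb,
				  const struct model_attrs *a)
{
	if (model_can_use_iversion(sb))
		return a->i_version;
	return ((uint64_t)a->ctime_sec << 32) | a->ctime_nsec;
}
```

Under these assumptions, a v4 client mount would set only
MODEL_SB_CHANGE_ATTR_OK (trust the server's value, keep the vfs away
from it), a local filesystem could set both bits, and a v2/3 mount would
set neither and get the ctime-based value.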

-- 
Jeff Layton <jlayton@kernel.org>


