From: bfields <bfields@fieldses.org>
To: Daire Byrne <daire@dneg.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
linux-cachefs <linux-cachefs@redhat.com>,
linux-nfs <linux-nfs@vger.kernel.org>,
jlayton@kernel.org
Subject: Re: Adventures in NFS re-exporting
Date: Mon, 16 Nov 2020 10:18:50 -0500
Message-ID: <20201116151850.GD898@fieldses.org>
In-Reply-To: <217712894.87456370.1605358643862.JavaMail.zimbra@dneg.com>
Jeff, does something like this look reasonable?
--b.
On Sat, Nov 14, 2020 at 12:57:24PM +0000, Daire Byrne wrote:
> ----- On 13 Nov, 2020, at 22:26, bfields bfields@fieldses.org wrote:
> > On Fri, Nov 13, 2020 at 09:50:50AM -0500, bfields wrote:
> >> Ah-hah! So, it's inode_query_iversion() that's modifying a nfs inode's
> >> i_version. That's a special thing that only nfsd would do.
> >>
> >> I think that's totally fixable, we'll just have to think a little about
> >> how....
> >
> > I wonder if something like this helps? --b.
> >
> > commit 0add88a9ccc5
> > Author: J. Bruce Fields <bfields@redhat.com>
> > Date: Fri Nov 13 17:03:04 2020 -0500
> >
> > nfs: don't mangle i_version on NFS
> >
> > The i_version on NFS is pretty much opaque to the client, so we don't
> > want to give the low bit any special interpretation.
> >
> > Define a new FS_PRIVATE_I_VERSION flag for filesystems that manage the
> > i_version on their own.
> >
> > Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> >
> > diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
> > index 29ec8b09a52d..9b8dd5b713a7 100644
> > --- a/fs/nfs/fs_context.c
> > +++ b/fs/nfs/fs_context.c
> > @@ -1488,7 +1488,8 @@ struct file_system_type nfs_fs_type = {
> > .init_fs_context = nfs_init_fs_context,
> > .parameters = nfs_fs_parameters,
> > .kill_sb = nfs_kill_super,
> > - .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
> > + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
> > + FS_PRIVATE_I_VERSION,
> > };
> > MODULE_ALIAS_FS("nfs");
> > EXPORT_SYMBOL_GPL(nfs_fs_type);
> > @@ -1500,7 +1501,8 @@ struct file_system_type nfs4_fs_type = {
> > .init_fs_context = nfs_init_fs_context,
> > .parameters = nfs_fs_parameters,
> > .kill_sb = nfs_kill_super,
> > - .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
> > + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
> > + FS_PRIVATE_I_VERSION,
> > };
> > MODULE_ALIAS_FS("nfs4");
> > MODULE_ALIAS("nfs4");
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 21cc971fd960..c5bb4268228b 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2217,6 +2217,7 @@ struct file_system_type {
> > #define FS_HAS_SUBTYPE 4
> > #define FS_USERNS_MOUNT 8 /* Can be mounted by userns root */
> > #define FS_DISALLOW_NOTIFY_PERM 16 /* Disable fanotify permission events */
> > +#define FS_PRIVATE_I_VERSION 32 /* i_version managed by filesystem */
> > #define FS_THP_SUPPORT 8192 /* Remove once all fs converted */
> > #define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename()
> > internally. */
> > int (*init_fs_context)(struct fs_context *);
> > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > index 2917ef990d43..52c790a847de 100644
> > --- a/include/linux/iversion.h
> > +++ b/include/linux/iversion.h
> > @@ -307,6 +307,8 @@ inode_query_iversion(struct inode *inode)
> > u64 cur, old, new;
> >
> > cur = inode_peek_iversion_raw(inode);
> > + if (inode->i_sb->s_type->fs_flags & FS_PRIVATE_I_VERSION)
> > + return cur;
> > for (;;) {
> > /* If flag is already set, then no need to swap */
> > if (cur & I_VERSION_QUERIED) {
>
> Yes, I can confirm that this absolutely helps! I replaced our (brute force) iversion patch with this (much nicer) patch and we got the same improvement; nfsd and its clients no longer cause the re-export server's client cache to be constantly re-validated. The re-export server can now serve the same results to many clients from cache. Thanks so much for spending the time to track this down. If merged, future (crazy) NFS re-exporters will benefit from the metadata performance improvement/acceleration!
>
> Now if anyone has any ideas why all the read calls to the originating server are limited to a maximum of 128k (with rsize=1M) when coming via the re-export server's nfsd threads, I see that as the next biggest performance issue. Reading directly on the re-export server with a userspace process issues 1MB reads as expected. It doesn't happen for writes (wsize=1MB all the way through) but I'm not sure if that has more to do with async and write back caching helping to build up the size before commit?
>
> I figure the other remaining items on my (wish) list are probably more in the "won't fix" or "can't fix" category (except maybe the NFSv4.0 input/output errors?).
>
> Daire
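
For readers following along, here is a rough user-space sketch of the behaviour the patch above is meant to avoid. It is only a simplified, single-threaded model, not the kernel code: struct model_inode, model_query_iversion() and the private_i_version field are hypothetical names made up for illustration, and the real inode_query_iversion() uses an atomic compare-and-swap loop. The assumption carried over from iversion.h is that the low bit of the raw value is the "queried" flag and the counter lives in the upper 63 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror iversion.h: low bit is the flag, counter is above it. */
#define I_VERSION_QUERIED_SHIFT 1
#define I_VERSION_QUERIED       1ULL

/* Hypothetical stand-in for the inode fields that matter here. */
struct model_inode {
	uint64_t i_version;      /* raw stored value */
	bool private_i_version;  /* models the proposed FS_PRIVATE_I_VERSION */
};

/*
 * Simplified model of inode_query_iversion() with the patch applied.
 * Without the early return, a query sets the low bit in the stored
 * value and returns the value shifted right by one.
 */
uint64_t model_query_iversion(struct model_inode *inode)
{
	uint64_t cur = inode->i_version;

	/* Filesystem manages i_version itself: hand back the raw value. */
	if (inode->private_i_version)
		return cur;

	/* Otherwise mark the value as queried, modifying what is stored. */
	if (!(cur & I_VERSION_QUERIED)) {
		cur |= I_VERSION_QUERIED;
		inode->i_version = cur;
	}
	return cur >> I_VERSION_QUERIED_SHIFT;
}
```

In this model, an inode holding a server-supplied change attribute of 42 ends up storing 43 after one query when the flag is off, so it no longer matches what the server sent; with the flag on, the stored value stays 42 and is returned unchanged. That mismatch is, as far as I can tell, what kept the re-export server's client cache from being trusted.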