Subject: Re: Adventures in NFS re-exporting
From: Jeff Layton
To: bfields, Daire Byrne
Cc: Trond Myklebust, linux-cachefs,
 linux-nfs
Date: Mon, 16 Nov 2020 10:29:29 -0500
In-Reply-To: <20201113222600.GC1299@fieldses.org>
References: <943482310.31162206.1599499860595.JavaMail.zimbra@dneg.com>
 <279389889.68934777.1603124383614.JavaMail.zimbra@dneg.com>
 <635679406.70384074.1603272832846.JavaMail.zimbra@dneg.com>
 <20201109160256.GB11144@fieldses.org>
 <1744768451.86186596.1605186084252.JavaMail.zimbra@dneg.com>
 <20201112135733.GA9243@fieldses.org>
 <444227972.86442677.1605206025305.JavaMail.zimbra@dneg.com>
 <20201112205524.GI9243@fieldses.org>
 <883314904.86570901.1605222357023.JavaMail.zimbra@dneg.com>
 <20201113145050.GB1299@fieldses.org>
 <20201113222600.GC1299@fieldses.org>
X-Mailing-List: linux-nfs@vger.kernel.org

On Fri, 2020-11-13 at 17:26 -0500, bfields wrote:
> On Fri, Nov 13, 2020 at 09:50:50AM -0500, bfields wrote:
> > On Thu, Nov 12, 2020 at 11:05:57PM +0000, Daire Byrne wrote:
> > > So, I can't lay claim to identifying the exact optimisation/hack that
> > > improves the retention of the re-export server's client cache when
> > > re-exporting an NFSv3 server (which is then read by many clients). We
> > > were working with an engineer at the time who showed an interest in
> > > our use case and after we supplied a reproducer he suggested modifying
> > > the nfs/inode.c
> > >
> > > - if (!inode_eq_iversion_raw(inode, fattr->change_attr)) {
> > > + if (inode_peek_iversion_raw(inode) < fattr->change_attr)
> > > {
> > >
> > > His reasoning at the time was:
> > >
> > > "Fixes inode invalidation caused by read access. The least important
> > > bit is ORed with 1 and causes the inode version to differ from the one
> > > seen on the NFS share. This in turn causes unnecessary re-download
> > > impacting the performance significantly.
> > > This fix makes it only
> > > re-fetch file content if inode version seen on the server is newer
> > > than the one on the client."
> > >
> > > But I've always been puzzled by why this only seems to be the case
> > > when using knfsd to re-export the (NFSv3) client mount. Using multiple
> > > processes on a standard client mount never causes any similar
> > > re-validations. And this happens with a completely read-only share
> > > which is why I started to think it has something to do with atimes as
> > > that could perhaps still cause a "write" modification even when
> > > read-only?
> >
> > Ah-hah! So, it's inode_query_iversion() that's modifying a nfs inode's
> > i_version. That's a special thing that only nfsd would do.
> >
> > I think that's totally fixable, we'll just have to think a little about
> > how....
>
> I wonder if something like this helps?--b.
>
> commit 0add88a9ccc5
> Author: J. Bruce Fields
> Date:   Fri Nov 13 17:03:04 2020 -0500
>
>     nfs: don't mangle i_version on NFS
>
>     The i_version on NFS has pretty much opaque to the client, so we don't
>     want to give the low bit any special interpretation.
>
>     Define a new FS_PRIVATE_I_VERSION flag for filesystems that manage the
>     i_version on their own.
>
>     Signed-off-by: J. Bruce Fields
>
> diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
> index 29ec8b09a52d..9b8dd5b713a7 100644
> --- a/fs/nfs/fs_context.c
> +++ b/fs/nfs/fs_context.c
> @@ -1488,7 +1488,8 @@ struct file_system_type nfs_fs_type = {
>  	.init_fs_context	= nfs_init_fs_context,
>  	.parameters		= nfs_fs_parameters,
>  	.kill_sb		= nfs_kill_super,
> -	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
> +	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
> +				  FS_PRIVATE_I_VERSION,
>  };
>  MODULE_ALIAS_FS("nfs");
>  EXPORT_SYMBOL_GPL(nfs_fs_type);
> @@ -1500,7 +1501,8 @@ struct file_system_type nfs4_fs_type = {
>  	.init_fs_context	= nfs_init_fs_context,
>  	.parameters		= nfs_fs_parameters,
>  	.kill_sb		= nfs_kill_super,
> -	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA,
> +	.fs_flags		= FS_RENAME_DOES_D_MOVE|FS_BINARY_MOUNTDATA|
> +				  FS_PRIVATE_I_VERSION,
>  };
>  MODULE_ALIAS_FS("nfs4");
>  MODULE_ALIAS("nfs4");
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 21cc971fd960..c5bb4268228b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2217,6 +2217,7 @@ struct file_system_type {
>  #define FS_HAS_SUBTYPE		4
>  #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
>  #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
> +#define FS_PRIVATE_I_VERSION	32	/* i_version managed by filesystem */
>  #define FS_THP_SUPPORT		8192	/* Remove once all fs converted */
>  #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
>  	int (*init_fs_context)(struct fs_context *);
> diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> index 2917ef990d43..52c790a847de 100644
> --- a/include/linux/iversion.h
> +++ b/include/linux/iversion.h
> @@ -307,6 +307,8 @@ inode_query_iversion(struct inode *inode)
>  	u64 cur, old, new;
>  
>  	cur = inode_peek_iversion_raw(inode);
> +	if (inode->i_sb->s_type->fs_flags & FS_PRIVATE_I_VERSION)
> +		return cur;
>  	for (;;) {
>  		/* If flag is already set, then no need to swap */
>  		if (cur & I_VERSION_QUERIED) {

It's probably more correct to just check the already-existing
SB_I_VERSION flag here (though in hindsight a fstype flag might have
made more sense).

-- 
Jeff Layton
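
For anyone following the thread, the mechanism discussed above can be sketched in user space. This is not kernel code, only a minimal single-threaded model under stated assumptions: `struct model_inode`, `model_query_iversion()`, `model_cache_valid()` and `model_reexport_read()` are hypothetical stand-ins, and the atomic compare-and-swap loop is omitted. Only the `I_VERSION_QUERIED` low-bit layout (as in include/linux/iversion.h) and the `FS_PRIVATE_I_VERSION` value (from the quoted patch) are taken from real sources. It shows how a query of the kind nfsd issues can set the low bit of the raw i_version so that it no longer equals the server's change attribute, and how the proposed early return avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout as in include/linux/iversion.h: bit 0 is the "queried" flag. */
#define I_VERSION_QUERIED_SHIFT 1
#define I_VERSION_QUERIED       (1ULL << (I_VERSION_QUERIED_SHIFT - 1))
/* Flag value proposed in the quoted patch. */
#define FS_PRIVATE_I_VERSION    32

/* Hypothetical stand-in for struct inode: raw counter plus fs type flags. */
struct model_inode {
	uint64_t i_version; /* raw value; NFS stores the server's change attribute here */
	int fs_flags;
};

/* Models inode_query_iversion() with the patch applied (no atomics). */
static uint64_t model_query_iversion(struct model_inode *inode)
{
	uint64_t cur = inode->i_version;

	/* Patched path: the filesystem owns i_version, so leave it untouched. */
	if (inode->fs_flags & FS_PRIVATE_I_VERSION)
		return cur;
	/* Unpatched path: mark the value as queried by setting the low bit. */
	if (!(cur & I_VERSION_QUERIED))
		inode->i_version = cur | I_VERSION_QUERIED;
	return cur >> I_VERSION_QUERIED_SHIFT;
}

/* Models the raw equality check against fattr->change_attr in fs/nfs/inode.c. */
static int model_cache_valid(const struct model_inode *inode, uint64_t change_attr)
{
	return inode->i_version == change_attr;
}

/*
 * Simulates one re-export read: the inode starts in sync with the server,
 * the query runs, and the result says whether the cache still looks valid.
 */
static int model_reexport_read(uint64_t change_attr, int fs_flags)
{
	struct model_inode inode = { .i_version = change_attr, .fs_flags = fs_flags };

	assert(model_cache_valid(&inode, change_attr));
	(void)model_query_iversion(&inode);
	return model_cache_valid(&inode, change_attr);
}
```

With an even change attribute such as 100, `model_reexport_read(100, 0)` reports the cache as stale after the query, while `model_reexport_read(100, FS_PRIVATE_I_VERSION)` leaves it valid. An odd change attribute already has the low bit set, so the unpatched path does not alter it. Checking SB_I_VERSION instead of a filesystem-type flag would replace the `fs_flags` test; that variant is not modeled here.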