linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jan Kara <jack@suse.cz>
Cc: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization
Date: Thu, 11 May 2017 14:59:43 -0400	[thread overview]
Message-ID: <20170511185942.GD25434@fieldses.org> (raw)
In-Reply-To: <20170405181409.GC28681@fieldses.org>

On Wed, Apr 05, 2017 at 02:14:09PM -0400, J. Bruce Fields wrote:
> On Wed, Apr 05, 2017 at 10:05:51AM +0200, Jan Kara wrote:
> > 1) Keep i_version as is, make clients also check for i_ctime.
> 
> That would be a protocol revision, which we'd definitely rather avoid.
> 
> But can't we accomplish the same by using something like
> 
> 	ctime * (some constant) + i_version
> 
> ?
> 
> >    Pro: No on-disk format changes.
> >    Cons: After a crash, i_version can go backwards (but when file changes
> >    i_version, i_ctime pair should be still different) or not, data can be
> >    old or not.
> 
> This is probably good enough for NFS purposes: typically on an NFS
> filesystem, results of a read in the face of a concurrent write open are
> undefined.  And writers sync before close.
> 
> So after a crash with a dirty inode, we're in a situation where an NFS
> client still needs to resend some writes, sync, and close.  I'm OK with
> things being inconsistent during this window.
> 
> I do expect things to return to normal once that client's has resent its
> writes--hence the worry about actually resuing old values after boot
> (such as if i_version regresses on boot and then increments back to the
> same value after further writes).  Factoring in ctime fixes that.

So for now I'm thinking of just doing something like the following.

Only nfsd needs it for now, but it could be moved to a vfs helper for
statx, or for individual filesystems that want to do something
different.  (The NFSv4 client will want to use the server's change
attribute instead, I think.  And other filesystems might want to try
something more ambitious like Neil's proposal.)

--b.

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 12feac6ee2fd..9636c9a60aba 100644
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index f84fe6bf9aee..14f09f1ef605 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -240,6 +240,16 @@ fh_clear_wcc(struct svc_fh *fhp)
 	fhp->fh_pre_saved = false;
 }
 
+static inline u64 nfsd4_change_attribute(struct inode *inode)
+{
+	u64 chattr;
+
+	chattr = inode->i_ctime.tv_sec << 30;
+	chattr += inode->i_ctime.tv_nsec;
+	chattr += inode->i_version;
+	return chattr;
+}
+
 /*
  * Fill in the pre_op attr for the wcc data
  */
@@ -253,7 +263,7 @@ fill_pre_wcc(struct svc_fh *fhp)
 		fhp->fh_pre_mtime = inode->i_mtime;
 		fhp->fh_pre_ctime = inode->i_ctime;
 		fhp->fh_pre_size  = inode->i_size;
-		fhp->fh_pre_change = inode->i_version;
+		fhp->fh_pre_change = nfsd4_change_attribute(inode);
 		fhp->fh_pre_saved = true;
 	}
 }
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -260,7 +260,7 @@ void fill_post_wcc(struct svc_fh *fhp)
 		printk("nfsd: inode locked twice during operation.\n");
 
 	err = fh_getattr(fhp, &fhp->fh_post_attr);
-	fhp->fh_post_change = d_inode(fhp->fh_dentry)->i_version;
+	fhp->fh_post_change = nfsd4_change_attribute(d_inode(fhp->fh_dentry));
 	if (err) {
 		fhp->fh_post_saved = false;
 		/* Grab the ctime anyway - set_change_info might use it */
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 26780d53a6f9..a09532d4a383 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1973,7 +1973,7 @@ static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode,
 		*p++ = cpu_to_be32(convert_to_wallclock(exp->cd->flush_time));
 		*p++ = 0;
 	} else if (IS_I_VERSION(inode)) {
-		p = xdr_encode_hyper(p, inode->i_version);
+		p = xdr_encode_hyper(p, nfsd4_change_attribute(inode));
 	} else {
 		*p++ = cpu_to_be32(stat->ctime.tv_sec);
 		*p++ = cpu_to_be32(stat->ctime.tv_nsec);

  reply	other threads:[~2017-05-11 18:59 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-21 17:03 [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 01/30] lustre: don't set f_version in ll_readdir Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 02/30] ecryptfs: remove unnecessary i_version bump Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 03/30] ceph: remove the bump of i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 04/30] f2fs: don't bother setting i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 05/30] hpfs: don't bother with the i_version counter Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 06/30] jfs: remove initialization of " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 07/30] nilfs2: remove inode->i_version initialization Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 08/30] orangefs: remove initialization of i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 09/30] reiserfs: remove unneeded i_version bump Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 10/30] ntfs: remove i_version handling Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 11/30] fs: new API for handling i_version Jeff Layton
2017-03-03 22:36   ` J. Bruce Fields
2017-03-04  0:09     ` Jeff Layton
2017-03-03 23:55   ` NeilBrown
2017-03-04  1:58     ` Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 12/30] fat: convert to new i_version API Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 13/30] affs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 14/30] afs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 15/30] btrfs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 16/30] exofs: switch " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 17/30] ext2: convert " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 18/30] ext4: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 19/30] nfs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 20/30] nfsd: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 21/30] ocfs2: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 22/30] ufs: use " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 23/30] xfs: convert to " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 24/30] IMA: switch IMA over " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 25/30] fs: add a "force" parameter to inode_inc_iversion Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 26/30] fs: only set S_VERSION when updating times if it has been queried Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 27/30] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 28/30] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 29/30] fs: track whether the i_version has been queried with an i_state flag Jeff Layton
2017-03-04  0:03   ` NeilBrown
2017-03-04  0:43     ` Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t Jeff Layton
2016-12-22  8:38   ` Amir Goldstein
2016-12-22 13:27     ` Jeff Layton
2017-03-04  0:00   ` NeilBrown
2016-12-22  8:45 ` [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization Christoph Hellwig
2016-12-22 14:42   ` Jeff Layton
2017-03-20 21:43     ` J. Bruce Fields
2017-03-21 13:45       ` Christoph Hellwig
2017-03-21 16:30         ` J. Bruce Fields
2017-03-21 17:23           ` Jeff Layton
2017-03-21 17:37             ` J. Bruce Fields
2017-03-21 17:51               ` J. Bruce Fields
2017-03-21 18:30             ` J. Bruce Fields
2017-03-21 18:46               ` Jeff Layton
2017-03-21 19:13                 ` J. Bruce Fields
2017-03-21 21:54                   ` Jeff Layton
2017-03-29 11:15                 ` Jan Kara
2017-03-29 17:54                   ` Jeff Layton
2017-03-29 23:41                     ` Dave Chinner
2017-03-30 11:24                       ` Jeff Layton
2017-04-04 18:38                       ` J. Bruce Fields
2017-03-30  6:47                     ` Jan Kara
2017-03-30 11:11                       ` Jeff Layton
2017-03-30 16:12                         ` J. Bruce Fields
2017-03-30 18:35                           ` Jeff Layton
2017-03-30 21:11                             ` Boaz Harrosh
2017-04-04 18:31                             ` J. Bruce Fields
2017-04-05  1:43                               ` NeilBrown
2017-04-05  8:05                                 ` Jan Kara
2017-04-05 18:14                                   ` J. Bruce Fields
2017-05-11 18:59                                     ` J. Bruce Fields [this message]
2017-05-11 22:22                                       ` NeilBrown
2017-05-12 16:21                                         ` J. Bruce Fields
2017-10-30 13:21                                           ` Jeff Layton
2017-05-12  8:27                                       ` Jan Kara
2017-05-12 15:56                                         ` J. Bruce Fields
2017-05-12 11:01                                       ` Jeff Layton
2017-05-12 15:57                                         ` J. Bruce Fields
2017-04-06  1:12                                   ` NeilBrown
2017-04-06  7:22                                     ` Jan Kara
2017-04-05 17:26                                 ` J. Bruce Fields
2017-04-01 23:05                           ` Dave Chinner
2017-04-03 14:00                             ` Jan Kara
2017-04-04 12:34                               ` Dave Chinner
2017-04-04 17:53                                 ` J. Bruce Fields
2017-04-05  1:26                                 ` NeilBrown
2017-03-21 21:45             ` Dave Chinner
2017-03-22 19:53               ` Jeff Layton
2017-03-03 23:00 ` J. Bruce Fields
2017-03-04  0:53   ` Jeff Layton
2017-03-08 17:29     ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170511185942.GD25434@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jlayton@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=neil@brown.name \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).