linux-kernel.vger.kernel.org archive mirror
From: NeilBrown <neil@brown.name>
To: "J. Bruce Fields" <bfields@fieldses.org>, Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization
Date: Fri, 12 May 2017 08:22:23 +1000
Message-ID: <87r2zvkp9c.fsf@notabene.neil.brown.name>
In-Reply-To: <20170511185942.GD25434@fieldses.org>


On Thu, May 11 2017, J. Bruce Fields wrote:

> On Wed, Apr 05, 2017 at 02:14:09PM -0400, J. Bruce Fields wrote:
>> On Wed, Apr 05, 2017 at 10:05:51AM +0200, Jan Kara wrote:
>> > 1) Keep i_version as is, make clients also check for i_ctime.
>> 
>> That would be a protocol revision, which we'd definitely rather avoid.
>> 
>> But can't we accomplish the same by using something like
>> 
>> 	ctime * (some constant) + i_version
>> 
>> ?
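To make the arithmetic concrete, take the constant to be 2^30, which is the shift used in the nfsd4_change_attribute() patch quoted further down. Here t_sec and t_nsec are the two halves of i_ctime and v is i_version:

```latex
\[
  \mathrm{change} = t_{\mathrm{sec}} \cdot 2^{30} + t_{\mathrm{nsec}} + v,
  \qquad 0 \le t_{\mathrm{nsec}} < 10^{9} < 2^{30} = 1\,073\,741\,824
\]
```

Because the nanosecond field never reaches 2^30, it cannot carry into the seconds term. So for a fixed v, two different ctimes always give two different values. Once v also varies, distinct (ctime, i_version) pairs can in principle collide; for example (t_sec = 0, v = 2^30) and (t_sec = 1, v = 0) both give 2^30. The mixing therefore narrows the chance of reusing an old value rather than ruling it out.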
>> 
>> >    Pro: No on-disk format changes.
>> >    Cons: After a crash, i_version can go backwards (but when file changes
>> >    i_version, i_ctime pair should be still different) or not, data can be
>> >    old or not.
>> 
>> This is probably good enough for NFS purposes: typically on an NFS
>> filesystem, results of a read in the face of a concurrent write open are
>> undefined.  And writers sync before close.
>> 
>> So after a crash with a dirty inode, we're in a situation where an NFS
>> client still needs to resend some writes, sync, and close.  I'm OK with
>> things being inconsistent during this window.
>> 
>> I do expect things to return to normal once that client has resent its
>> writes--hence the worry about actually reusing old values after boot
>> (such as if i_version regresses on boot and then increments back to the
>> same value after further writes).  Factoring in ctime fixes that.
>
> So for now I'm thinking of just doing something like the following.
>
> Only nfsd needs it for now, but it could be moved to a vfs helper for
> statx, or for individual filesystems that want to do something
> different.  (The NFSv4 client will want to use the server's change
> attribute instead, I think.  And other filesystems might want to try
> something more ambitious like Neil's proposal.)
>
> --b.
>
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index 12feac6ee2fd..9636c9a60aba 100644
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index f84fe6bf9aee..14f09f1ef605 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -240,6 +240,16 @@ fh_clear_wcc(struct svc_fh *fhp)
>  	fhp->fh_pre_saved = false;
>  }
>  
> +static inline u64 nfsd4_change_attribute(struct inode *inode)
> +{
> +	u64 chattr;
> +
> +	chattr = inode->i_ctime.tv_sec << 30;
> +	chattr += inode->i_ctime.tv_nsec;
> +	chattr += inode->i_version;
> +	return chattr;

So if I chmod a file, all clients will need to flush the content from their cache?
Maybe they already do?  Maybe it is a boring corner case?

> +}
> +
>  /*
>   * Fill in the pre_op attr for the wcc data
>   */
> @@ -253,7 +263,7 @@ fill_pre_wcc(struct svc_fh *fhp)
>  		fhp->fh_pre_mtime = inode->i_mtime;
>  		fhp->fh_pre_ctime = inode->i_ctime;
>  		fhp->fh_pre_size  = inode->i_size;
> -		fhp->fh_pre_change = inode->i_version;
> +		fhp->fh_pre_change = nfsd4_change_attribute(inode);
>  		fhp->fh_pre_saved = true;
>  	}
>  }
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -260,7 +260,7 @@ void fill_post_wcc(struct svc_fh *fhp)
>  		printk("nfsd: inode locked twice during operation.\n");
>  
>  	err = fh_getattr(fhp, &fhp->fh_post_attr);
> -	fhp->fh_post_change = d_inode(fhp->fh_dentry)->i_version;
> +	fhp->fh_post_change = nfsd4_change_attribute(d_inode(fhp->fh_dentry));
>  	if (err) {
>  		fhp->fh_post_saved = false;
>  		/* Grab the ctime anyway - set_change_info might use it */
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 26780d53a6f9..a09532d4a383 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1973,7 +1973,7 @@ static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode,
>  		*p++ = cpu_to_be32(convert_to_wallclock(exp->cd->flush_time));
>  		*p++ = 0;
>  	} else if (IS_I_VERSION(inode)) {
> -		p = xdr_encode_hyper(p, inode->i_version);
> +		p = xdr_encode_hyper(p, nfsd4_change_attribute(inode));
>  	} else {
>  		*p++ = cpu_to_be32(stat->ctime.tv_sec);
>  		*p++ = cpu_to_be32(stat->ctime.tv_nsec);

It is *really* confusing to find that fh_post_change is only set in nfs3
code, and only used in nfs4 code.
It is probably time to get a 'version' field in 'struct kstat'.
That would allow this code to get a little cleaner.

(to me, this exercise is just a reminder that the NFSv4 change attribute
is poorly designed ... so it just makes me grumpy).

NeilBrown




Thread overview: 87+ messages
2016-12-21 17:03 [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 01/30] lustre: don't set f_version in ll_readdir Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 02/30] ecryptfs: remove unnecessary i_version bump Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 03/30] ceph: remove the bump of i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 04/30] f2fs: don't bother setting i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 05/30] hpfs: don't bother with the i_version counter Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 06/30] jfs: remove initialization of " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 07/30] nilfs2: remove inode->i_version initialization Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 08/30] orangefs: remove initialization of i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 09/30] reiserfs: remove unneeded i_version bump Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 10/30] ntfs: remove i_version handling Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 11/30] fs: new API for handling i_version Jeff Layton
2017-03-03 22:36   ` J. Bruce Fields
2017-03-04  0:09     ` Jeff Layton
2017-03-03 23:55   ` NeilBrown
2017-03-04  1:58     ` Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 12/30] fat: convert to new i_version API Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 13/30] affs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 14/30] afs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 15/30] btrfs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 16/30] exofs: switch " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 17/30] ext2: convert " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 18/30] ext4: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 19/30] nfs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 20/30] nfsd: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 21/30] ocfs2: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 22/30] ufs: use " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 23/30] xfs: convert to " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 24/30] IMA: switch IMA over " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 25/30] fs: add a "force" parameter to inode_inc_iversion Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 26/30] fs: only set S_VERSION when updating times if it has been queried Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 27/30] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 28/30] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 29/30] fs: track whether the i_version has been queried with an i_state flag Jeff Layton
2017-03-04  0:03   ` NeilBrown
2017-03-04  0:43     ` Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t Jeff Layton
2016-12-22  8:38   ` Amir Goldstein
2016-12-22 13:27     ` Jeff Layton
2017-03-04  0:00   ` NeilBrown
2016-12-22  8:45 ` [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization Christoph Hellwig
2016-12-22 14:42   ` Jeff Layton
2017-03-20 21:43     ` J. Bruce Fields
2017-03-21 13:45       ` Christoph Hellwig
2017-03-21 16:30         ` J. Bruce Fields
2017-03-21 17:23           ` Jeff Layton
2017-03-21 17:37             ` J. Bruce Fields
2017-03-21 17:51               ` J. Bruce Fields
2017-03-21 18:30             ` J. Bruce Fields
2017-03-21 18:46               ` Jeff Layton
2017-03-21 19:13                 ` J. Bruce Fields
2017-03-21 21:54                   ` Jeff Layton
2017-03-29 11:15                 ` Jan Kara
2017-03-29 17:54                   ` Jeff Layton
2017-03-29 23:41                     ` Dave Chinner
2017-03-30 11:24                       ` Jeff Layton
2017-04-04 18:38                       ` J. Bruce Fields
2017-03-30  6:47                     ` Jan Kara
2017-03-30 11:11                       ` Jeff Layton
2017-03-30 16:12                         ` J. Bruce Fields
2017-03-30 18:35                           ` Jeff Layton
2017-03-30 21:11                             ` Boaz Harrosh
2017-04-04 18:31                             ` J. Bruce Fields
2017-04-05  1:43                               ` NeilBrown
2017-04-05  8:05                                 ` Jan Kara
2017-04-05 18:14                                   ` J. Bruce Fields
2017-05-11 18:59                                     ` J. Bruce Fields
2017-05-11 22:22                                       ` NeilBrown [this message]
2017-05-12 16:21                                         ` J. Bruce Fields
2017-10-30 13:21                                           ` Jeff Layton
2017-05-12  8:27                                       ` Jan Kara
2017-05-12 15:56                                         ` J. Bruce Fields
2017-05-12 11:01                                       ` Jeff Layton
2017-05-12 15:57                                         ` J. Bruce Fields
2017-04-06  1:12                                   ` NeilBrown
2017-04-06  7:22                                     ` Jan Kara
2017-04-05 17:26                                 ` J. Bruce Fields
2017-04-01 23:05                           ` Dave Chinner
2017-04-03 14:00                             ` Jan Kara
2017-04-04 12:34                               ` Dave Chinner
2017-04-04 17:53                                 ` J. Bruce Fields
2017-04-05  1:26                                 ` NeilBrown
2017-03-21 21:45             ` Dave Chinner
2017-03-22 19:53               ` Jeff Layton
2017-03-03 23:00 ` J. Bruce Fields
2017-03-04  0:53   ` Jeff Layton
2017-03-08 17:29     ` J. Bruce Fields
