From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jan Kara <jack@suse.cz>
Cc: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization
Date: Fri, 12 May 2017 11:56:01 -0400
Message-ID: <20170512155601.GE7704@fieldses.org>
In-Reply-To: <20170512082754.GB31470@quack2.suse.cz>

On Fri, May 12, 2017 at 10:27:54AM +0200, Jan Kara wrote:
> On Thu 11-05-17 14:59:43, J. Bruce Fields wrote:
> > On Wed, Apr 05, 2017 at 02:14:09PM -0400, J. Bruce Fields wrote:
> > > On Wed, Apr 05, 2017 at 10:05:51AM +0200, Jan Kara wrote:
> > > > 1) Keep i_version as is, make clients also check for i_ctime.
> > >
> > > That would be a protocol revision, which we'd definitely rather avoid.
> > >
> > > But can't we accomplish the same by using something like
> > >
> > > ctime * (some constant) + i_version
> > >
> > > ?
> > >
> > > > Pro: No on-disk format changes.
> > > > Cons: After a crash, i_version may or may not go backwards (but when
> > > > the file changes, the (i_version, i_ctime) pair should still be
> > > > different), and the data may or may not be old.
> > >
> > > This is probably good enough for NFS purposes: typically on an NFS
> > > filesystem, results of a read in the face of a concurrent write open are
> > > undefined. And writers sync before close.
> > >
> > > So after a crash with a dirty inode, we're in a situation where an NFS
> > > client still needs to resend some writes, sync, and close. I'm OK with
> > > things being inconsistent during this window.
> > >
> > > I do expect things to return to normal once that client has resent its
> > > writes--hence the worry about actually reusing old values after boot
> > > (such as if i_version regresses on boot and then increments back to the
> > > same value after further writes). Factoring in ctime fixes that.
> >
> > So for now I'm thinking of just doing something like the following.
> >
> > Only nfsd needs it for now, but it could be moved to a vfs helper for
> > statx, or for individual filesystems that want to do something
> > different. (The NFSv4 client will want to use the server's change
> > attribute instead, I think. And other filesystems might want to try
> > something more ambitious like Neil's proposal.)
> >
> > --b.
> >
> > diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> > index f84fe6bf9aee..14f09f1ef605 100644
> > --- a/fs/nfsd/nfsfh.h
> > +++ b/fs/nfsd/nfsfh.h
> > @@ -240,6 +240,16 @@ fh_clear_wcc(struct svc_fh *fhp)
> >  	fhp->fh_pre_saved = false;
> >  }
> >  
> > +static inline u64 nfsd4_change_attribute(struct inode *inode)
> > +{
> > +	u64 chattr;
> > +
> > +	chattr = inode->i_ctime.tv_sec << 30;
>
> Won't this overflow on 32-bit archs? tv_sec seems to be defined as long?
> Probably you need explicit (u64) cast... Otherwise I'm fine with this.

Whoops, yes.  Or just assign to chattr as a separate step.  I'll fix
that.

--b.

> > +	chattr += inode->i_ctime.tv_nsec;
> > +	chattr += inode->i_version;
> > +	return chattr;
> > +}
> > +
> >  /*
> >   * Fill in the pre_op attr for the wcc data
> >   */
> > @@ -253,7 +263,7 @@ fill_pre_wcc(struct svc_fh *fhp)
> >  		fhp->fh_pre_mtime = inode->i_mtime;
> >  		fhp->fh_pre_ctime = inode->i_ctime;
> >  		fhp->fh_pre_size = inode->i_size;
> > -		fhp->fh_pre_change = inode->i_version;
> > +		fhp->fh_pre_change = nfsd4_change_attribute(inode);
> >  		fhp->fh_pre_saved = true;
> >  	}
> >  }
> > diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> > index 12feac6ee2fd..9636c9a60aba 100644
> > --- a/fs/nfsd/nfs3xdr.c
> > +++ b/fs/nfsd/nfs3xdr.c
> > @@ -260,7 +260,7 @@ void fill_post_wcc(struct svc_fh *fhp)
> >  		printk("nfsd: inode locked twice during operation.\n");
> >  
> >  	err = fh_getattr(fhp, &fhp->fh_post_attr);
> > -	fhp->fh_post_change = d_inode(fhp->fh_dentry)->i_version;
> > +	fhp->fh_post_change = nfsd4_change_attribute(d_inode(fhp->fh_dentry));
> >  	if (err) {
> >  		fhp->fh_post_saved = false;
> >  		/* Grab the ctime anyway - set_change_info might use it */
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 26780d53a6f9..a09532d4a383 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -1973,7 +1973,7 @@ static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode,
> >  		*p++ = cpu_to_be32(convert_to_wallclock(exp->cd->flush_time));
> >  		*p++ = 0;
> >  	} else if (IS_I_VERSION(inode)) {
> > -		p = xdr_encode_hyper(p, inode->i_version);
> > +		p = xdr_encode_hyper(p, nfsd4_change_attribute(inode));
> >  	} else {
> >  		*p++ = cpu_to_be32(stat->ctime.tv_sec);
> >  		*p++ = cpu_to_be32(stat->ctime.tv_nsec);
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
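
For reference, a minimal userspace sketch of the overflow discussed above
and of the widen-before-shift fix. It is not the eventual kernel patch.
The names struct ctime32, shift_narrow() and change_attribute() are
hypothetical stand-ins, and tv_sec is modelled as a 32-bit integer on the
assumption that long is 32 bits wide on the architecture in question.

```c
#include <assert.h>
#include <stdint.h>

/* tv_nsec is always below 10^9, so it fits under bit 30 of the result. */
static_assert(999999999 < (1 << 30), "tv_nsec fits below bit 30");

/*
 * Hypothetical stand-in for an inode ctime on a 32-bit architecture,
 * where tv_sec is a 32-bit long. This is not a kernel type.
 */
struct ctime32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/*
 * Models the shift carried out in 32-bit arithmetic: everything above
 * bit 31 is lost before the result is widened. Unsigned arithmetic is
 * used so that the truncation is well-defined in portable C.
 */
uint64_t shift_narrow(int32_t tv_sec)
{
	return (uint32_t)((uint32_t)tv_sec << 30);
}

/*
 * Widen tv_sec to 64 bits first, then shift. Assigning tv_sec to
 * chattr and shifting in a separate statement has the same effect.
 */
uint64_t change_attribute(struct ctime32 ctime, uint64_t i_version)
{
	uint64_t chattr;

	chattr = (uint64_t)ctime.tv_sec << 30;
	chattr += (uint64_t)ctime.tv_nsec;
	chattr += i_version;
	return chattr;
}
```

With the seconds in the upper bits, an i_version that regresses after a
crash and then climbs back to an earlier number still produces a different
change attribute, provided ctime has moved on in the meantime. With the
narrow shift only the low two bits of tv_sec survive, so that protection
is mostly lost.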