linux-kernel.vger.kernel.org archive mirror
From: Jeff Layton <jlayton@redhat.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-nfs@vger.kernel.org, Ext4 <linux-ext4@vger.kernel.org>,
	Linux Btrfs <linux-btrfs@vger.kernel.org>,
	linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t
Date: Thu, 22 Dec 2016 08:27:41 -0500
Message-ID: <1482413261.3924.17.camel@redhat.com>
In-Reply-To: <CAOQ4uxi_z1DK4tM3DVTtyAM8matJk+KqakUnaequsEXS2En9Xg@mail.gmail.com>

On Thu, 2016-12-22 at 10:38 +0200, Amir Goldstein wrote:
> On Wed, Dec 21, 2016 at 7:03 PM, Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > The spinlock is only used to serialize callers that want to increment
> > the counter. We can achieve the same thing with an atomic64_t and
> > get the i_lock out of this codepath.
> > 
> 
> Cool work! See some nits and suggestions below.
> 
> > 
> > +/*
> > + * We borrow the top bit in the i_version to use as a flag to tell us whether
> > + * it has been queried since we last bumped it. If it has, then we must bump
> > + * it and set the flag. Note that this means that we have to handle wrapping
> > + * manually.
> > + */
> > +#define INODE_I_VERSION_QUERIED                (1ULL<<63)
> > +
> >  /**
> >   * inode_set_iversion - set i_version to a particular value
> >   * @inode: inode to set
> > @@ -1976,7 +1980,7 @@ static inline void inode_dec_link_count(struct inode *inode)
> >  static inline void
> >  inode_set_iversion(struct inode *inode, const u64 new)
> >  {
> > -       inode->i_version = new;
> > +       atomic64_set(&inode->i_version, new);
> >  }
> > 
> 
> Maybe needs an overflow sanity check !(new & INODE_I_VERSION_QUERIED)??
> See API change suggestion below.
> 
> 

Possibly. Note that in some cases (when the i_version can be stored on
disk across a remount), we need to ensure that we set this flag when the
inode is read in from disk. It's always possible that the value was
queried and the machine then crashed, so we always set the flag just in
case.
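
To make the flag handling concrete, here is a rough userspace model of
the behaviour described above. It is only a sketch: it uses C11
<stdatomic.h> instead of the kernel's atomic64_t, and the model_* names,
struct model_inode and I_VERSION_QUERIED are made up for illustration
and are not the identifiers from the patch. The assert() in the
read-from-disk helper is one possible form of the sanity check suggested
above, not something the series does today.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for INODE_I_VERSION_QUERIED (top bit as a flag). */
#define I_VERSION_QUERIED (1ULL << 63)

/* Hypothetical stand-in for the one field of struct inode we care about. */
struct model_inode {
	_Atomic uint64_t i_version;
};

/* Value read from disk: always mark it as queried, just in case. */
static void model_set_iversion_read(struct model_inode *inode, uint64_t disk)
{
	assert(!(disk & I_VERSION_QUERIED));	/* counter must fit in 63 bits */
	atomic_store(&inode->i_version, disk | I_VERSION_QUERIED);
}

/* Query the counter: return the low 63 bits and record that it was seen. */
static uint64_t model_query_iversion(struct model_inode *inode)
{
	uint64_t old = atomic_fetch_or(&inode->i_version, I_VERSION_QUERIED);

	return old & ~I_VERSION_QUERIED;
}

/* Bump the counter only if it was queried since the last bump (or forced). */
static bool model_inc_iversion(struct model_inode *inode, bool force)
{
	uint64_t cur = atomic_load(&inode->i_version);
	uint64_t next;

	do {
		/* If the flag is clear then nobody saw the old value. */
		if (!force && !(cur & I_VERSION_QUERIED))
			return false;

		/* Clear the flag and increment the 63-bit counter. */
		next = (cur & ~I_VERSION_QUERIED) + 1;

		/* Overflowed into the flag bit? Wrap back to 0. */
		if (next == I_VERSION_QUERIED)
			next = 0;
		/* On failure, cur is reloaded and the checks are redone. */
	} while (!atomic_compare_exchange_weak(&inode->i_version, &cur, next));

	return true;
}
```

With this model, the first unforced increment after a read from disk
always takes effect, and later unforced increments are skipped until
somebody queries the value again.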

> > 
> >  /**
> > @@ -2010,16 +2011,26 @@ inode_set_iversion_read(struct inode *inode, const u64 new)
> >  static inline bool
> >  inode_inc_iversion(struct inode *inode, bool force)
> >  {
> > -       bool ret = false;
> > +       u64 cur, old, new;
> > +
> > +       cur = (u64)atomic64_read(&inode->i_version);
> > +       for (;;) {
> > +               /* If flag is clear then we needn't do anything */
> > +               if (!force && !(cur & INODE_I_VERSION_QUERIED))
> > +                       return false;
> > +
> > +               new = (cur & ~INODE_I_VERSION_QUERIED) + 1;
> > +
> > +               /* Did we overflow into flag bit? Reset to 0 if so. */
> > +               if (unlikely(new == INODE_I_VERSION_QUERIED))
> > +                       new = 0;
> > 
> 
> Did you consider changing f_version type and the signature of the new
> i_version API to set/get s64 instead of u64?
> 
> It makes a bit more sense from API users perspective to know that
> the valid range for version is >=0.
> 
> file->f_version is not the only struct member used to store&compare
> i_version. nfs and xfs have other struct members for that, but even
> if all those members are not changed to type s64, the explicit cast
> to (s64) and back to (u64) will serve as a good documentation in
> the code about the valid range of version in the new API.
> 

This API is definitely not set in stone. That said, we have to consider
that there are really three classes of filesystems here:

1) ones that treat i_version as an opaque value: Mostly AFS and NFS,
as they get this value from the server. These both can also use the
entire u64 field, so we need to ensure that we don't monkey with the
flag bit on them.

2) filesystems that just use it internally: These don't set MS_I_VERSION
and mostly use it to detect directory changes that occur during readdir.
i_version is initialized to some value (0 or 1) when the struct inode is
allocated, and is bumped on directory changes.

3) filesystems where the kernel manages it completely: these set
MS_I_VERSION and the kernel handles bumping it on writes. Currently,
this is btrfs, ext4 and xfs. These are persistent across remounts as
well.

So, we have to ensure that this API encompasses all 3 of these use
cases.

> >  /**
> > @@ -2080,7 +2099,7 @@ inode_get_iversion(struct inode *inode)
> >  static inline s64
> >  inode_cmp_iversion(const struct inode *inode, const u64 old)
> >  {
> > -       return (s64)inode->i_version - (s64)old;
> > +       return (s64)(atomic64_read(&inode->i_version) << 1) - (s64)(old << 1);
> >  }
> > 
> 
> IMO, it is better for the API to determine that 'old' is a valid value
> returned from
> inode_get_iversion* and therefore should not have the MSB set.
> Unless the reason you chose to shift those 2 values is because it is cheaper
> than masking INODE_I_VERSION_QUERIED??
> 
> 

No, we need to do that in order to handle wraparound correctly. We want
this check to work something like the time_before/after macros in the
kernel that handle jiffies wraparound.

So, the sign returned here matters, as positive values indicate that the
current one is "newer" than the old one. That's the main reason for the
shift here.

Note that this should be documented here too; I plan to add that in
the next revision.
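
As a quick illustration of why the shift matters, here is a small
userspace sketch that compares the shifted form against simply masking
off the flag. The function names and I_VERSION_QUERIED are hypothetical.
One deliberate difference from the patch: the subtraction is done on
unsigned values and cast afterwards, since plain userspace C does not
get the kernel's wrapping behaviour for signed overflow.

```c
#include <stdint.h>

/* Hypothetical stand-in for INODE_I_VERSION_QUERIED (top bit as a flag). */
#define I_VERSION_QUERIED (1ULL << 63)

/*
 * Shift both values left by one: this drops the flag bit and moves the
 * 63-bit counter to the top of the word, so the difference wraps the
 * same way the counter does. Only the sign of the result is meaningful
 * (the magnitude is twice the real distance).
 */
static int64_t model_cmp_iversion(uint64_t cur, uint64_t old)
{
	return (int64_t)((cur << 1) - (old << 1));
}

/* Masking alternative, for comparison only: it ignores the 63-bit wrap. */
static int64_t model_cmp_masked(uint64_t cur, uint64_t old)
{
	return (int64_t)((cur & ~I_VERSION_QUERIED) -
			 (old & ~I_VERSION_QUERIED));
}
```

Take old = 2^63 - 1 (the largest counter value) and cur = 0 (the value
after one more bump). The shifted comparison returns a positive number,
so cur is seen as newer, much like time_after() with jiffies. The masked
comparison returns a large negative number and gets it backwards.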

Thanks for the comments so far!
-- 
Jeff Layton <jlayton@redhat.com>
