From: Jeff Layton <jlayton@redhat.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-xfs@vger.kernel.org
Subject: [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t
Date: Wed, 21 Dec 2016 12:03:47 -0500
Message-ID: <1482339827-7882-31-git-send-email-jlayton@redhat.com>
In-Reply-To: <1482339827-7882-1-git-send-email-jlayton@redhat.com>

The spinlock is only used to serialize callers that want to increment
the counter. We can achieve the same thing with an atomic64_t and
get the i_lock out of this codepath.

Drop the I_VERS_BUMP flag and instead borrow the most significant bit
of the counter to use as the flag. Since the flag and the counter then
live in the same 64-bit word, both can be updated with a single atomic
operation, and we can stop taking the i_lock in this codepath.

On the query side, if the flag is already set, then we just return the
counter value. Otherwise, we set the flag in our in-memory copy and use
cmpxchg to swap it into place if it hasn't changed. If it has, then we
use the value from the cmpxchg as the new "old" value and try again.
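
A minimal userspace sketch of the query loop described above, assuming
C11 <stdatomic.h> as a stand-in for the kernel's atomic64_t. The names
version_query and QUERIED_FLAG are hypothetical and not part of this
patch:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for INODE_I_VERSION_QUERIED (top bit). */
#define QUERIED_FLAG (UINT64_C(1) << 63)

/*
 * Return the counter with the flag masked off, and make sure the
 * flag is set in the shared word by the time we return.
 */
static uint64_t version_query(_Atomic uint64_t *v)
{
	uint64_t cur = atomic_load(v);

	/* Nothing to do if a previous query already set the flag. */
	while (!(cur & QUERIED_FLAG)) {
		/*
		 * Try to swap in the flagged value. On failure,
		 * atomic_compare_exchange_weak() writes the value it
		 * found into cur, which becomes the new "old" value
		 * for the next pass.
		 */
		if (atomic_compare_exchange_weak(v, &cur, cur | QUERIED_FLAG))
			break;
	}
	return cur & ~QUERIED_FLAG;
}
```

The first call on an unqueried counter sets the flag. Later calls see
the flag already set and return without writing to the shared word.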

When we go to bump the thing, we fetch the value and check the flag bit.
If it's clear and the update isn't being forced, then we don't need to
do anything.

If we do need to update, then we clear the flag in our in-memory copy
and bump the counter (handling any overflow into the flag bit by
resetting the counter to zero). We then do a cmpxchg to swap the updated
value into place if it hasn't changed. If it has changed, then we use
the value we got back from cmpxchg to try again.
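
The bump side can be sketched in userspace the same way, again assuming
C11 <stdatomic.h> rather than the kernel atomics. version_bump and
QUERIED_FLAG are hypothetical names used only for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for INODE_I_VERSION_QUERIED (top bit). */
#define QUERIED_FLAG (UINT64_C(1) << 63)

/*
 * Bump the counter if it has been queried since the last bump, or
 * unconditionally when force is true. Returns true if it was bumped.
 */
static bool version_bump(_Atomic uint64_t *v, bool force)
{
	uint64_t cur = atomic_load(v);
	uint64_t next;

	do {
		/* Flag clear and not forced: nobody saw the old value. */
		if (!force && !(cur & QUERIED_FLAG))
			return false;

		/* Clear the flag in our copy and increment. */
		next = (cur & ~QUERIED_FLAG) + 1;

		/* Overflowed into the flag bit? Wrap back to zero. */
		if (next == QUERIED_FLAG)
			next = 0;

		/* On failure cur is refreshed and we recompute next. */
	} while (!atomic_compare_exchange_weak(v, &cur, next));

	return true;
}
```

Note that the flag check is repeated on every pass, so a racing bump
that clears the flag lets an unforced caller return early without
incrementing a second time.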

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
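inode_cmp_iversion in the diff below shifts both values left by one bit
before subtracting. One reading of that (an assumption, it is not
spelled out in the patch) is that the shift discards the flag bit and
lets the comparison wrap within the 63-bit counter space. A standalone
sketch under that assumption, with the hypothetical name version_cmp:

```c
#include <stdint.h>

/*
 * Compare two version values while ignoring the top (flag) bit.
 * Shifting left by one drops the flag and moves the 63-bit counter
 * into the high bits, so the difference wraps the same way the
 * counter does. The subtraction is done in unsigned arithmetic here
 * to avoid signed overflow; the patch casts each operand to s64 first.
 */
static int64_t version_cmp(uint64_t cur, uint64_t old)
{
	return (int64_t)((cur << 1) - (old << 1));
}
```

The result is twice the counter difference, so only its sign (or
whether it is zero) is meaningful to callers.
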
 include/linux/fs.h | 82 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 48 insertions(+), 34 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 917557faa8e8..401e38d76171 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -621,7 +621,7 @@ struct inode {
 		struct hlist_head	i_dentry;
 		struct rcu_head		i_rcu;
 	};
-	u64			i_version;
+	atomic64_t		i_version;
 	atomic_t		i_count;
 	atomic_t		i_dio_count;
 	atomic_t		i_writecount;
@@ -1909,9 +1909,6 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
  *			wb stat updates to grab mapping->tree_lock.  See
  *			inode_switch_wb_work_fn() for details.
  *
- * I_VERS_BUMP		inode->i_version counter must be bumped on the next
- * 			change. See the inode_*_iversion functions.
- *
  * Q: What is the difference between I_WILL_FREE and I_FREEING?
  */
 #define I_DIRTY_SYNC		(1 << 0)
@@ -1932,7 +1929,6 @@ static inline bool HAS_UNMAPPED_ID(struct inode *inode)
 #define __I_DIRTY_TIME_EXPIRED	12
 #define I_DIRTY_TIME_EXPIRED	(1 << __I_DIRTY_TIME_EXPIRED)
 #define I_WB_SWITCH		(1 << 13)
-#define I_VERS_BUMP		(1 << 14)
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 #define I_DIRTY_ALL (I_DIRTY | I_DIRTY_TIME)
@@ -1965,6 +1961,14 @@ static inline void inode_dec_link_count(struct inode *inode)
 	mark_inode_dirty(inode);
 }
 
+/*
+ * We borrow the top bit in the i_version to use as a flag to tell us whether
+ * it has been queried since we last bumped it. If it has, then we must bump
+ * it and clear the flag. Note that this means that we have to handle wrapping
+ * manually.
+ */
+#define INODE_I_VERSION_QUERIED		(1ULL<<63)
+
 /**
  * inode_set_iversion - set i_version to a particular value
  * @inode: inode to set
@@ -1976,7 +1980,7 @@ static inline void inode_dec_link_count(struct inode *inode)
 static inline void
 inode_set_iversion(struct inode *inode, const u64 new)
 {
-	inode->i_version = new;
+	atomic64_set(&inode->i_version, new);
 }
 
 /**
@@ -1992,10 +1996,7 @@ inode_set_iversion(struct inode *inode, const u64 new)
 static inline void
 inode_set_iversion_read(struct inode *inode, const u64 new)
 {
-	spin_lock(&inode->i_lock);
-	inode_set_iversion(inode, new);
-	inode->i_state |= I_VERS_BUMP;
-	spin_unlock(&inode->i_lock);
+	inode_set_iversion(inode, new | INODE_I_VERSION_QUERIED);
 }
 
 /**
@@ -2010,16 +2011,26 @@ inode_set_iversion_read(struct inode *inode, const u64 new)
 static inline bool
 inode_inc_iversion(struct inode *inode, bool force)
 {
-	bool ret = false;
+	u64 cur, old, new;
+
+	cur = (u64)atomic64_read(&inode->i_version);
+	for (;;) {
+		/* If flag is clear then we needn't do anything */
+		if (!force && !(cur & INODE_I_VERSION_QUERIED))
+			return false;
+
+		new = (cur & ~INODE_I_VERSION_QUERIED) + 1;
+
+		/* Did we overflow into flag bit? Reset to 0 if so. */
+		if (unlikely(new == INODE_I_VERSION_QUERIED))
+			new = 0;
 
-	spin_lock(&inode->i_lock);
-	if (force || (inode->i_state & I_VERS_BUMP)) {
-		inode->i_version++;
-		inode->i_state &= ~I_VERS_BUMP;
-		ret = true;
+		old = atomic64_cmpxchg(&inode->i_version, cur, new);
+		if (likely(old == cur))
+			break;
+		cur = old;
 	}
-	spin_unlock(&inode->i_lock);
-	return ret;
+	return true;
 }
 
 /**
@@ -2027,8 +2038,9 @@ inode_inc_iversion(struct inode *inode, bool force)
  * @inode: inode to be updated
  *
  * Increment the i_version field in the inode. This version is usable
- * when there is some other sort of lock in play that would prevent
- * concurrent increments (typically inode->i_rwsem for write).
+ * when there is some other sort of lock in play (e.g. i_rwsem for write)
+ * that would prevent concurrent incrementors, and is typically used on
+ * directories or other non-regular files.
  */
 static inline void
 inode_inc_iversion_locked(struct inode *inode)
@@ -2047,7 +2059,7 @@ inode_inc_iversion_locked(struct inode *inode)
 static inline u64
 inode_get_iversion_raw(const struct inode *inode)
 {
-	return inode->i_version;
+	return atomic64_read(&inode->i_version) & ~INODE_I_VERSION_QUERIED;
 }
 
 /**
@@ -2060,13 +2072,20 @@ inode_get_iversion_raw(const struct inode *inode)
 static inline u64
 inode_get_iversion(struct inode *inode)
 {
-	u64 ret;
+	u64 cur, old, new;
 
-	spin_lock(&inode->i_lock);
-	inode->i_state |= I_VERS_BUMP;
-	ret = inode->i_version;
-	spin_unlock(&inode->i_lock);
-	return ret;
+	cur = atomic64_read(&inode->i_version);
+	for (;;) {
+		if (cur & INODE_I_VERSION_QUERIED)
+			return (cur & ~INODE_I_VERSION_QUERIED);
+
+		new = (cur | INODE_I_VERSION_QUERIED);
+		old = atomic64_cmpxchg(&inode->i_version, cur, new);
+		if (old == cur)
+			break;
+		cur = old;
+	}
+	return cur;
 }
 
 /**
@@ -2080,7 +2099,7 @@ inode_get_iversion(struct inode *inode)
 static inline s64
 inode_cmp_iversion(const struct inode *inode, const u64 old)
 {
-	return (s64)inode->i_version - (s64)old;
+	return (s64)(atomic64_read(&inode->i_version) << 1) - (s64)(old << 1);
 }
 
 /**
@@ -2093,12 +2112,7 @@ inode_cmp_iversion(const struct inode *inode, const u64 old)
 static inline bool
 inode_iversion_need_inc(struct inode *inode)
 {
-	bool ret;
-
-	spin_lock(&inode->i_lock);
-	ret = inode->i_state & I_VERS_BUMP;
-	spin_unlock(&inode->i_lock);
-	return ret;
+	return atomic64_read(&inode->i_version) & INODE_I_VERSION_QUERIED;
 }
 
 enum file_time_flags {
-- 
2.7.4
