All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, viro@zeniv.linux.org.uk,
	linux-nfs@vger.kernel.org, bfields@fieldses.org, neilb@suse.de,
	jack@suse.de, linux-ext4@vger.kernel.org, tytso@mit.edu,
	adilger.kernel@dilger.ca, linux-xfs@vger.kernel.org,
	darrick.wong@oracle.com, david@fromorbit.com,
	linux-btrfs@vger.kernel.org, clm@fb.com, jbacik@fb.com,
	dsterba@suse.com, linux-integrity@vger.kernel.org,
	zohar@linux.vnet.ibm.com, dmitry.kasatkin@gmail.com,
	linux-afs@lists.infradead.org, dhowells@redhat.com,
	jaltman@auristor.com, krzk@kernel.org
Subject: [PATCH v5 01/19] fs: new API for handling inode->i_version
Date: Tue,  9 Jan 2018 09:10:41 -0500	[thread overview]
Message-ID: <20180109141059.25929-2-jlayton@kernel.org> (raw)
In-Reply-To: <20180109141059.25929-1-jlayton@kernel.org>

From: Jeff Layton <jlayton@redhat.com>

Add a documentation blob that explains what the i_version field is, how
it is expected to work, and how it is currently implemented by various
filesystems.

We already have inode_inc_iversion. Add several other functions for
manipulating and accessing the i_version counter. For now, the
implementation is trivial and basically works the way that all of the
open-coded i_version accesses work today.

Future patches will convert existing users of i_version to use the new
API, and then convert the backend implementation to do things more
efficiently.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/btrfs/file.c          |   1 +
 fs/btrfs/inode.c         |   1 +
 fs/btrfs/ioctl.c         |   1 +
 fs/btrfs/xattr.c         |   1 +
 fs/ext4/inode.c          |   1 +
 fs/ext4/namei.c          |   1 +
 fs/inode.c               |   1 +
 include/linux/fs.h       |  15 ---
 include/linux/iversion.h | 236 +++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 243 insertions(+), 15 deletions(-)
 create mode 100644 include/linux/iversion.h

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index eb1bac7c8553..c95d7b2efefb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -31,6 +31,7 @@
 #include <linux/slab.h>
 #include <linux/btrfs.h>
 #include <linux/uio.h>
+#include <linux/iversion.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3cb5be9..27f008b33fc1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -43,6 +43,7 @@
 #include <linux/posix_acl_xattr.h>
 #include <linux/uio.h>
 #include <linux/magic.h>
+#include <linux/iversion.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2ef8acaac688..aa452c9e2eff 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -43,6 +43,7 @@
 #include <linux/uuid.h>
 #include <linux/btrfs.h>
 #include <linux/uaccess.h>
+#include <linux/iversion.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 2c7e53f9ff1b..5258c1714830 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -23,6 +23,7 @@
 #include <linux/xattr.h>
 #include <linux/security.h>
 #include <linux/posix_acl_xattr.h>
+#include <linux/iversion.h>
 #include "ctree.h"
 #include "btrfs_inode.h"
 #include "transaction.h"
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7df2c5644e59..fa5d8bc52d2d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -39,6 +39,7 @@
 #include <linux/slab.h>
 #include <linux/bitops.h>
 #include <linux/iomap.h>
+#include <linux/iversion.h>
 
 #include "ext4_jbd2.h"
 #include "xattr.h"
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 798b3ac680db..bcf0dff517be 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -34,6 +34,7 @@
 #include <linux/quotaops.h>
 #include <linux/buffer_head.h>
 #include <linux/bio.h>
+#include <linux/iversion.h>
 #include "ext4.h"
 #include "ext4_jbd2.h"
 
diff --git a/fs/inode.c b/fs/inode.c
index 03102d6ef044..19e72f500f71 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -18,6 +18,7 @@
 #include <linux/buffer_head.h> /* for inode_has_buffers */
 #include <linux/ratelimit.h>
 #include <linux/list_lru.h>
+#include <linux/iversion.h>
 #include <trace/events/writeback.h>
 #include "internal.h"
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaabf624..76382c24e9d0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2036,21 +2036,6 @@ static inline void inode_dec_link_count(struct inode *inode)
 	mark_inode_dirty(inode);
 }
 
-/**
- * inode_inc_iversion - increments i_version
- * @inode: inode that need to be updated
- *
- * Every time the inode is modified, the i_version field will be incremented.
- * The filesystem has to be mounted with i_version flag
- */
-
-static inline void inode_inc_iversion(struct inode *inode)
-{
-       spin_lock(&inode->i_lock);
-       inode->i_version++;
-       spin_unlock(&inode->i_lock);
-}
-
 enum file_time_flags {
 	S_ATIME = 1,
 	S_MTIME = 2,
diff --git a/include/linux/iversion.h b/include/linux/iversion.h
new file mode 100644
index 000000000000..d09cc3a08740
--- /dev/null
+++ b/include/linux/iversion.h
@@ -0,0 +1,236 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_IVERSION_H
+#define _LINUX_IVERSION_H
+
+#include <linux/fs.h>
+
+/*
+ * The change attribute (i_version) is mandated by NFSv4 and is mostly for
+ * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
+ * appear different to observers if there was a change to the inode's data or
+ * metadata since it was last queried.
+ *
+ * Observers see the i_version as a 64-bit number that never changes. If it
+ * remains the same since it was last checked, then nothing has changed in the
+ * inode. If it's different then something has changed. Observers cannot infer
+ * anything about the nature or magnitude of the changes from the value, only
+ * that the inode has changed in some fashion.
+ *
+ * Not all filesystems properly implement the i_version counter. Subsystems that
+ * want to use i_version field on an inode should first check whether the
+ * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
+ *
+ * Those that set SB_I_VERSION will automatically have their i_version counter
+ * incremented on writes to normal files. If the SB_I_VERSION is not set, then
+ * the VFS will not touch it on writes, and the filesystem can use it how it
+ * wishes. Note that the filesystem is always responsible for updating the
+ * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
+ * We consider these sorts of filesystems to have a kernel-managed i_version.
+ *
+ * Note that some filesystems (e.g. NFS and AFS) just use the field to store
+ * a server-provided value (for the most part). For that reason, those
+ * filesystems do not set SB_I_VERSION. These filesystems are considered to
+ * have a self-managed i_version.
+ */
+
+/**
+ * inode_set_iversion_raw - set i_version to the specified raw value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for use by
+ * filesystems that self-manage the i_version.
+ *
+ * For example, the NFS client stores its NFSv4 change attribute in this way,
+ * and the AFS client stores the data_version from the server here.
+ */
+static inline void
+inode_set_iversion_raw(struct inode *inode, u64 new)
+{
+	inode->i_version = new;
+}
+
+/**
+ * inode_set_iversion - set i_version to a particular value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for filesystems with
+ * a kernel-managed i_version.
+ *
+ * For now, this just does the same thing as the _raw variant.
+ */
+static inline void
+inode_set_iversion(struct inode *inode, u64 new)
+{
+	inode_set_iversion_raw(inode, new);
+}
+
+/**
+ * inode_set_iversion_queried - set i_version to a particular value and set
+ *                              flag to indicate that it has been viewed
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * When loading in an i_version value from a backing store, we typically don't
+ * know whether it was previously viewed before being stored or not. Thus, we
+ * must assume that it was, to ensure that any changes will result in the
+ * value changing.
+ *
+ * This function will set the inode's i_version, and possibly flag the value
+ * as if it has already been viewed at least once.
+ *
+ * For now, this just does what inode_set_iversion does.
+ */
+static inline void
+inode_set_iversion_queried(struct inode *inode, u64 new)
+{
+	inode_set_iversion(inode, new);
+}
+
+/**
+ * inode_maybe_inc_iversion - increments i_version
+ * @inode: inode with the i_version that should be updated
+ * @force: increment the counter even if it's not necessary
+ *
+ * Every time the inode is modified, the i_version field must be seen to have
+ * changed by any observer.
+ *
+ * In this implementation, we always increment it after taking the i_lock to
+ * ensure that we don't race with other incrementors.
+ *
+ * Returns true if counter was bumped, and false if it wasn't.
+ */
+static inline bool
+inode_maybe_inc_iversion(struct inode *inode, bool force)
+{
+	spin_lock(&inode->i_lock);
+	inode->i_version++;
+	spin_unlock(&inode->i_lock);
+	return true;
+}
+
+/**
+ * inode_inc_iversion - forcibly increment i_version
+ * @inode: inode that needs to be updated
+ *
+ * Forcbily increment the i_version field. This always results in a change to
+ * the observable value.
+ */
+static inline void
+inode_inc_iversion(struct inode *inode)
+{
+	inode_maybe_inc_iversion(inode, true);
+}
+
+/**
+ * inode_iversion_need_inc - is the i_version in need of being incremented?
+ * @inode: inode to check
+ *
+ * Returns whether the inode->i_version counter needs incrementing on the next
+ * change.
+ *
+ * For now, we assume that it always does.
+ */
+static inline bool
+inode_iversion_need_inc(struct inode *inode)
+{
+	return true;
+}
+
+/**
+ * inode_peek_iversion_raw - grab a "raw" iversion value
+ * @inode: inode from which i_version should be read
+ *
+ * Grab a "raw" inode->i_version value and return it. The i_version is not
+ * flagged or converted in any way. This is mostly used to access a self-managed
+ * i_version.
+ *
+ * With those filesystems, we want to treat the i_version as an entirely
+ * opaque value.
+ */
+static inline u64
+inode_peek_iversion_raw(const struct inode *inode)
+{
+	return inode->i_version;
+}
+
+/**
+ * inode_inc_iversion_raw - forcibly increment raw i_version
+ * @inode: inode that needs to be updated
+ *
+ * Forcbily increment the raw i_version field. This always results in a change
+ * to the raw value.
+ *
+ * NFS will use the i_version field to store the value from the server. It
+ * mostly treats it as opaque, but in the case where it holds a write
+ * delegation, it must increment the value itself. This function does that.
+ */
+static inline void
+inode_inc_iversion_raw(struct inode *inode)
+{
+	inode_inc_iversion(inode);
+}
+
+/**
+ * inode_peek_iversion - read i_version without flagging it to be incremented
+ * @inode: inode from which i_version should be read
+ *
+ * Read the inode i_version counter for an inode without registering it as a
+ * query.
+ *
+ * This is typically used by local filesystems that need to store an i_version
+ * on disk. In that situation, it's not necessary to flag it as having been
+ * viewed, as the result won't be used to gauge changes from that point.
+ */
+static inline u64
+inode_peek_iversion(const struct inode *inode)
+{
+	return inode_peek_iversion_raw(inode);
+}
+
+/**
+ * inode_query_iversion - read i_version for later use
+ * @inode: inode from which i_version should be read
+ *
+ * Read the inode i_version counter. This should be used by callers that wish
+ * to store the returned i_version for later comparison. This will guarantee
+ * that a later query of the i_version will result in a different value if
+ * anything has changed.
+ *
+ * This implementation just does a peek.
+ */
+static inline u64
+inode_query_iversion(struct inode *inode)
+{
+	return inode_peek_iversion(inode);
+}
+
+/**
+ * inode_cmp_iversion_raw - check whether the raw i_version counter has changed
+ * @inode: inode to check
+ * @old: old value to check against its i_version
+ *
+ * Compare the current raw i_version counter with a previous one. Returns 0 if
+ * they are the same or non-zero if they are different.
+ */
+static inline s64
+inode_cmp_iversion_raw(const struct inode *inode, u64 old)
+{
+	return (s64)inode_peek_iversion_raw(inode) - (s64)old;
+}
+
+/**
+ * inode_cmp_iversion - check whether the i_version counter has changed
+ * @inode: inode to check
+ * @old: old value to check against its i_version
+ *
+ * Compare an i_version counter with a previous one. Returns 0 if they are
+ * the same or non-zero if they are different.
+ */
+static inline s64
+inode_cmp_iversion(const struct inode *inode, u64 old)
+{
+	return (s64)inode_peek_iversion(inode) - (s64)old;
+}
+#endif
-- 
2.14.3

  reply	other threads:[~2018-01-09 14:11 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-09 14:10 [PATCH v5 00/19] fs: rework and optimize i_version handling in filesystems Jeff Layton
2018-01-09 14:10 ` Jeff Layton [this message]
2018-01-18 21:38   ` [PATCH v5 01/19] fs: new API for handling inode->i_version J. Bruce Fields
2018-01-18 21:38     ` J. Bruce Fields
2018-01-18 22:47     ` Jeff Layton
2018-01-09 14:10 ` [PATCH v5 02/19] fs: don't take the i_lock in inode_inc_iversion Jeff Layton
2018-01-09 15:14   ` Jan Kara
2018-01-18 21:45   ` J. Bruce Fields
2018-01-19 14:36     ` Jeff Layton
2018-01-19 14:43       ` J. Bruce Fields
2018-01-09 14:10 ` [PATCH v5 03/19] fat: convert to new i_version API Jeff Layton
2018-01-09 14:10 ` [PATCH v5 04/19] affs: " Jeff Layton
2018-01-09 14:10 ` [PATCH v5 05/19] afs: " Jeff Layton
2018-01-09 14:10   ` Jeff Layton
2018-01-09 14:10 ` [PATCH v5 06/19] btrfs: " Jeff Layton
2018-01-09 14:10 ` [PATCH v5 07/19] exofs: switch " Jeff Layton
2018-01-09 14:10   ` Jeff Layton
2018-01-09 14:10 ` [PATCH v5 08/19] ext2: convert " Jeff Layton
2018-01-09 14:10 ` [PATCH v5 09/19] ext4: " Jeff Layton
2018-01-09 14:10 ` [PATCH v5 10/19] nfs: " Jeff Layton
2018-01-09 14:10 ` [PATCH v5 11/19] nfsd: " Jeff Layton
2018-01-09 14:10   ` Jeff Layton
2018-01-09 14:10 ` [PATCH v5 12/19] ocfs2: " Jeff Layton
2018-01-09 14:10 ` [PATCH v5 13/19] ufs: use " Jeff Layton
2018-01-09 14:10 ` [PATCH v5 14/19] xfs: convert to " Jeff Layton
2018-01-09 22:46   ` Dave Chinner
2018-01-09 14:10 ` [PATCH v5 15/19] IMA: switch IMA over " Jeff Layton
2018-01-09 14:10 ` [PATCH v5 16/19] fs: only set S_VERSION when updating times if necessary Jeff Layton
2018-01-09 14:10 ` [PATCH v5 17/19] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2018-01-09 22:48   ` Dave Chinner
2018-01-09 14:10 ` [PATCH v5 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2018-01-11 19:30   ` Liu Bo
2018-01-09 14:10 ` [PATCH v5 19/19] fs: handle inode->i_version more efficiently Jeff Layton
2018-01-09 22:55   ` Dave Chinner
2018-01-10 14:12   ` Krzysztof Kozlowski
2018-01-10 14:12     ` Krzysztof Kozlowski
2018-01-11 20:23 ` [PATCH v5 00/19] fs: rework and optimize i_version handling in filesystems Liu Bo
2018-01-11 20:23   ` Liu Bo
2018-01-12 11:49   ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180109141059.25929-2-jlayton@kernel.org \
    --to=jlayton@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=bfields@fieldses.org \
    --cc=clm@fb.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=dmitry.kasatkin@gmail.com \
    --cc=dsterba@suse.com \
    --cc=jack@suse.de \
    --cc=jaltman@auristor.com \
    --cc=jbacik@fb.com \
    --cc=krzk@kernel.org \
    --cc=linux-afs@lists.infradead.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-integrity@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zohar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.