linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/6] vfs: clean up i_version behavior and expose it via statx
@ 2022-09-07 11:33 Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 1/6] iversion: update comments with info about atime updates Jeff Layton
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Jeff Layton @ 2022-09-07 11:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro,
	zohar, xiubli, chuck.lever, lczerner, jack, bfields, brauner,
	fweimer
  Cc: linux-api, linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel,
	linux-ext4, linux-nfs, linux-xfs

v4: drop xfs patch
    revise comment update patch with latest proposed semantics

This is a small revision to the patchset I sent a little over a week
ago [1]. Since then, this has also garnered a LWN article [2], so I
won't go into great detail on the basic premise and rationale.

The biggest change here is that I've dropped the xfs patch. Dave Chinner
stated that they'd need to add a new on-disk field instead of modifying
the behavior of the existing di_changecount field [3]. I'll leave that
to the xfs devs, but this does mean that xfs will have "buggy" behavior
until that's done.

I've also sent a revised manpage patchset separately to make sure that
the semantics are acceptable [4]. That hasn't gotten a lot of comments,
so I'm operating under the assumption that the semantics proposed there
are acceptable to most.

[1]: https://lore.kernel.org/linux-nfs/20220826214703.134870-1-jlayton@kernel.org/
[2]: https://lwn.net/Articles/905931/
[3]: https://lore.kernel.org/linux-nfs/20220830000851.GV3600936@dread.disaster.area/
[4]: https://lore.kernel.org/linux-nfs/20220907111606.18831-1-jlayton@kernel.org/T/#u

Jeff Layton (6):
  iversion: update comments with info about atime updates
  ext4: fix i_version handling in ext4
  ext4: unconditionally enable the i_version counter
  vfs: report an inode version in statx for IS_I_VERSION inodes
  nfs: report the inode version in statx if requested
  ceph: fill in the change attribute in statx requests

 fs/ceph/inode.c           | 14 +++++++++-----
 fs/ext4/inode.c           | 15 +++++----------
 fs/ext4/ioctl.c           |  4 ++++
 fs/ext4/move_extent.c     |  6 ++++++
 fs/ext4/super.c           | 13 ++++---------
 fs/ext4/xattr.c           |  1 +
 fs/nfs/inode.c            |  7 +++++--
 fs/stat.c                 |  7 +++++++
 include/linux/iversion.h  | 10 ++++++++--
 include/linux/stat.h      |  1 +
 include/uapi/linux/stat.h |  3 ++-
 samples/vfs/test-statx.c  |  8 ++++++--
 12 files changed, 58 insertions(+), 31 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 1/6] iversion: update comments with info about atime updates
  2022-09-07 11:33 [PATCH v4 0/6] vfs: clean up i_version behavior and expose it via statx Jeff Layton
@ 2022-09-07 11:33 ` Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 2/6] ext4: fix i_version handling in ext4 Jeff Layton
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2022-09-07 11:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro,
	zohar, xiubli, chuck.lever, lczerner, jack, bfields, brauner,
	fweimer
  Cc: linux-api, linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel,
	linux-ext4, linux-nfs, linux-xfs, Colin Walters

The i_version field in the kernel has had different semantics over
the decades, but we're now proposing to expose it to userland via
statx. This means that we need a clear, consistent definition of
what it means and when it should change.

Update the comments in iversion.h to describe when the i_version
must change.

Cc: Colin Walters <walters@verbum.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <trondmy@hammerspace.com>
Cc: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/linux-xfs/166086932784.5425.17134712694961326033@noble.neil.brown.name/#t
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/iversion.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index 3bfebde5a1a6..0555a3851dbf 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -9,8 +9,14 @@
  * ---------------------------
  * The change attribute (i_version) is mandated by NFSv4 and is mostly for
  * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
- * appear different to observers if there was a change to the inode's data or
- * metadata since it was last queried.
+ * appear larger to observers if there was an explicit change to the inode's
+ * data or metadata since it was last queried.
+ *
+ * An explicit change is one that would ordinarily result in a change to the
+ * inode status change time (aka ctime). i_version must appear to change, even
+ * if the ctime does not (since the whole point is to avoid missing updates due
+ * to timestamp granularity). If POSIX mandates that the ctime must change due
+ * to an operation, then the i_version counter must be incremented as well.
  *
  * Observers see the i_version as a 64-bit number that never decreases. If it
  * remains the same since it was last checked, then nothing has changed in the
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 2/6] ext4: fix i_version handling in ext4
  2022-09-07 11:33 [PATCH v4 0/6] vfs: clean up i_version behavior and expose it via statx Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 1/6] iversion: update comments with info about atime updates Jeff Layton
@ 2022-09-07 11:33 ` Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 3/6] ext4: unconditionally enable the i_version counter Jeff Layton
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2022-09-07 11:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro,
	zohar, xiubli, chuck.lever, lczerner, jack, bfields, brauner,
	fweimer
  Cc: linux-api, linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel,
	linux-ext4, linux-nfs, linux-xfs

ext4 currently updates the i_version counter when the atime is updated
during a read. This is less than ideal as it can cause unnecessary cache
invalidations with NFSv4 and unnecessary remeasurements for IMA.

The increment in ext4_mark_iloc_dirty is also problematic since it can
corrupt the i_version counter for ea_inodes. We aren't bumping the file
times in ext4_mark_iloc_dirty, so changing the i_version there seems
wrong, and is the cause of both problems.

Remove that callsite and add increments to the setattr, setxattr and
ioctl codepaths, at the same times that we update the ctime. The
i_version bump that already happens during timestamp updates should take
care of the rest.

In ext4_move_extents, increment the i_version on both inodes, and also
add in missing ctime updates.

Cc: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ext4/inode.c       | 15 +++++----------
 fs/ext4/ioctl.c       |  8 ++++++++
 fs/ext4/move_extent.c |  8 ++++++++
 fs/ext4/xattr.c       |  2 ++
 4 files changed, 23 insertions(+), 10 deletions(-)

Note that this patch is based on top of this patch from Lukas:

    ext4: don't increase iversion counter for ea_inodes

It won't apply cleanly to mainline without that.

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2a220be34caa..aa37bce4c541 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5342,6 +5342,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 	int error, rc = 0;
 	int orphan = 0;
 	const unsigned int ia_valid = attr->ia_valid;
+	bool inc_ivers = IS_I_VERSION(inode);
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
 		return -EIO;
@@ -5425,8 +5426,8 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 			return -EINVAL;
 		}
 
-		if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
-			inode_inc_iversion(inode);
+		if (attr->ia_size == inode->i_size)
+			inc_ivers = false;
 
 		if (shrink) {
 			if (ext4_should_order_data(inode)) {
@@ -5528,6 +5529,8 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 	}
 
 	if (!error) {
+		if (inc_ivers)
+			inode_inc_iversion(inode);
 		setattr_copy(mnt_userns, inode, attr);
 		mark_inode_dirty(inode);
 	}
@@ -5731,14 +5734,6 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 	}
 	ext4_fc_track_inode(handle, inode);
 
-	/*
-	 * ea_inodes are using i_version for storing reference count, don't
-	 * mess with it
-	 */
-	if (IS_I_VERSION(inode) &&
-	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
-		inode_inc_iversion(inode);
-
 	/* the do_update_inode consumes one bh->b_count */
 	get_bh(iloc->bh);
 
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 3cf3ec4b1c21..60e77ae9342d 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -452,6 +452,8 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	swap_inode_data(inode, inode_bl);
 
 	inode->i_ctime = inode_bl->i_ctime = current_time(inode);
+	if (IS_I_VERSION(inode))
+		inode_inc_iversion(inode);
 
 	inode->i_generation = prandom_u32();
 	inode_bl->i_generation = prandom_u32();
@@ -665,6 +667,8 @@ static int ext4_ioctl_setflags(struct inode *inode,
 	ext4_set_inode_flags(inode, false);
 
 	inode->i_ctime = current_time(inode);
+	if (IS_I_VERSION(inode))
+		inode_inc_iversion(inode);
 
 	err = ext4_mark_iloc_dirty(handle, inode, &iloc);
 flags_err:
@@ -775,6 +779,8 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
 
 	EXT4_I(inode)->i_projid = kprojid;
 	inode->i_ctime = current_time(inode);
+	if (IS_I_VERSION(inode))
+		inode_inc_iversion(inode);
 out_dirty:
 	rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
 	if (!err)
@@ -1257,6 +1263,8 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		err = ext4_reserve_inode_write(handle, inode, &iloc);
 		if (err == 0) {
 			inode->i_ctime = current_time(inode);
+			if (IS_I_VERSION(inode))
+				inode_inc_iversion(inode);
 			inode->i_generation = generation;
 			err = ext4_mark_iloc_dirty(handle, inode, &iloc);
 		}
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 701f1d6a217f..d73ab3153218 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -6,6 +6,7 @@
  */
 
 #include <linux/fs.h>
+#include <linux/iversion.h>
 #include <linux/quotaops.h>
 #include <linux/slab.h>
 #include <linux/sched/mm.h>
@@ -683,6 +684,13 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
 			break;
 		o_start += cur_len;
 		d_start += cur_len;
+
+		orig_inode->i_ctime = current_time(orig_inode);
+		donor_inode->i_ctime = current_time(donor_inode);
+		if (IS_I_VERSION(orig_inode))
+			inode_inc_iversion(orig_inode);
+		if (IS_I_VERSION(donor_inode))
+			inode_inc_iversion(donor_inode);
 	}
 	*moved_len = o_start - orig_blk;
 	if (*moved_len > len)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 533216e80fa2..e975442e4ab2 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2412,6 +2412,8 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
 	if (!error) {
 		ext4_xattr_update_super_block(handle, inode->i_sb);
 		inode->i_ctime = current_time(inode);
+		if (IS_I_VERSION(inode))
+			inode_inc_iversion(inode);
 		if (!value)
 			no_expand = 0;
 		error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 3/6] ext4: unconditionally enable the i_version counter
  2022-09-07 11:33 [PATCH v4 0/6] vfs: clean up i_version behavior and expose it via statx Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 1/6] iversion: update comments with info about atime updates Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 2/6] ext4: fix i_version handling in ext4 Jeff Layton
@ 2022-09-07 11:33 ` Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 4/6] vfs: report an inode version in statx for IS_I_VERSION inodes Jeff Layton
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2022-09-07 11:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro,
	zohar, xiubli, chuck.lever, lczerner, jack, bfields, brauner,
	fweimer
  Cc: linux-api, linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel,
	linux-ext4, linux-nfs, linux-xfs, Benjamin Coddington,
	Christoph Hellwig

The original i_version implementation was pretty expensive, requiring a
log flush on every change. Because of this, it was gated behind a mount
option (implemented via the MS_I_VERSION mountoption flag).

Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
i_version flag much less expensive, so there is no longer a performance
penalty from enabling it. xfs and btrfs already enable it
unconditionally when the on-disk format can support it.

Have ext4 ignore the SB_I_VERSION flag, and just enable it
unconditionally. While we're in here, remove the handling of
Opt_i_version as well since it's due for deprecation anyway.

Ideally, we'd couple this change with a way to disable the i_version
counter (just in case), but the way the iversion mount option was
implemented makes that difficult to do. We'd need to add a new mount
option altogether or do something with tune2fs. That's probably best
left to later patches if it turns out to be needed.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ext4/inode.c       |  2 +-
 fs/ext4/ioctl.c       | 12 ++++--------
 fs/ext4/move_extent.c |  6 ++----
 fs/ext4/super.c       | 13 ++++---------
 fs/ext4/xattr.c       |  3 +--
 5 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index aa37bce4c541..6ef37269e7c0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5342,7 +5342,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 	int error, rc = 0;
 	int orphan = 0;
 	const unsigned int ia_valid = attr->ia_valid;
-	bool inc_ivers = IS_I_VERSION(inode);
+	bool inc_ivers = true;
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
 		return -EIO;
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 60e77ae9342d..ad3a294a88eb 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -452,8 +452,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	swap_inode_data(inode, inode_bl);
 
 	inode->i_ctime = inode_bl->i_ctime = current_time(inode);
-	if (IS_I_VERSION(inode))
-		inode_inc_iversion(inode);
+	inode_inc_iversion(inode);
 
 	inode->i_generation = prandom_u32();
 	inode_bl->i_generation = prandom_u32();
@@ -667,8 +666,7 @@ static int ext4_ioctl_setflags(struct inode *inode,
 	ext4_set_inode_flags(inode, false);
 
 	inode->i_ctime = current_time(inode);
-	if (IS_I_VERSION(inode))
-		inode_inc_iversion(inode);
+	inode_inc_iversion(inode);
 
 	err = ext4_mark_iloc_dirty(handle, inode, &iloc);
 flags_err:
@@ -779,8 +777,7 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
 
 	EXT4_I(inode)->i_projid = kprojid;
 	inode->i_ctime = current_time(inode);
-	if (IS_I_VERSION(inode))
-		inode_inc_iversion(inode);
+	inode_inc_iversion(inode);
 out_dirty:
 	rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
 	if (!err)
@@ -1263,8 +1260,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		err = ext4_reserve_inode_write(handle, inode, &iloc);
 		if (err == 0) {
 			inode->i_ctime = current_time(inode);
-			if (IS_I_VERSION(inode))
-				inode_inc_iversion(inode);
+			inode_inc_iversion(inode);
 			inode->i_generation = generation;
 			err = ext4_mark_iloc_dirty(handle, inode, &iloc);
 		}
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index d73ab3153218..285700b00d38 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -687,10 +687,8 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
 
 		orig_inode->i_ctime = current_time(orig_inode);
 		donor_inode->i_ctime = current_time(donor_inode);
-		if (IS_I_VERSION(orig_inode))
-			inode_inc_iversion(orig_inode);
-		if (IS_I_VERSION(donor_inode))
-			inode_inc_iversion(donor_inode);
+		inode_inc_iversion(orig_inode);
+		inode_inc_iversion(donor_inode);
 	}
 	*moved_len = o_start - orig_blk;
 	if (*moved_len > len)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9a66abcca1a8..e7cf5361245a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1585,7 +1585,7 @@ enum {
 	Opt_inlinecrypt,
 	Opt_usrjquota, Opt_grpjquota, Opt_quota,
 	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
-	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
+	Opt_usrquota, Opt_grpquota, Opt_prjquota,
 	Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
 	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
 	Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
@@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
 	fsparam_flag	("barrier",		Opt_barrier),
 	fsparam_u32	("barrier",		Opt_barrier),
 	fsparam_flag	("nobarrier",		Opt_nobarrier),
-	fsparam_flag	("i_version",		Opt_i_version),
 	fsparam_flag	("dax",			Opt_dax),
 	fsparam_enum	("dax",			Opt_dax_type, ext4_param_dax),
 	fsparam_u32	("stripe",		Opt_stripe),
@@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case Opt_abort:
 		ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
 		return 0;
-	case Opt_i_version:
-		ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
-		ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
-		ctx_set_flags(ctx, SB_I_VERSION);
-		return 0;
 	case Opt_inlinecrypt:
 #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
 		ctx_set_flags(ctx, SB_INLINECRYPT);
@@ -2970,8 +2964,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
 		SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
 	if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
 		SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
-	if (sb->s_flags & SB_I_VERSION)
-		SEQ_OPTS_PUTS("i_version");
 	if (nodefs || sbi->s_stripe)
 		SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
 	if (nodefs || EXT4_MOUNT_DATA_FLAGS &
@@ -4640,6 +4632,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
 		(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
 
+	/* i_version is always enabled now */
+	sb->s_flags |= SB_I_VERSION;
+
 	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
 	    (ext4_has_compat_features(sb) ||
 	     ext4_has_ro_compat_features(sb) ||
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index e975442e4ab2..36d6ba7190b6 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2412,8 +2412,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
 	if (!error) {
 		ext4_xattr_update_super_block(handle, inode->i_sb);
 		inode->i_ctime = current_time(inode);
-		if (IS_I_VERSION(inode))
-			inode_inc_iversion(inode);
+		inode_inc_iversion(inode);
 		if (!value)
 			no_expand = 0;
 		error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 4/6] vfs: report an inode version in statx for IS_I_VERSION inodes
  2022-09-07 11:33 [PATCH v4 0/6] vfs: clean up i_version behavior and expose it via statx Jeff Layton
                   ` (2 preceding siblings ...)
  2022-09-07 11:33 ` [PATCH v4 3/6] ext4: unconditionally enable the i_version counter Jeff Layton
@ 2022-09-07 11:33 ` Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 5/6] nfs: report the inode version in statx if requested Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 6/6] ceph: fill in the change attribute in statx requests Jeff Layton
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2022-09-07 11:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro,
	zohar, xiubli, chuck.lever, lczerner, jack, bfields, brauner,
	fweimer
  Cc: linux-api, linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel,
	linux-ext4, linux-nfs, linux-xfs, Jeff Layton, David Howells,
	Frank Filz

From: Jeff Layton <jlayton@redhat.com>

The NFS server and IMA both rely heavily on the i_version counter, but
it's largely invisible to userland, which makes it difficult to test its
behavior. This value would also be of use to userland NFS servers, and
other applications that want a reliable way to know whether there might
have been an explicit change to an inode since they last checked.

Claim one of the spare fields in struct statx to hold a 64-bit inode
version attribute. This value must change with any explicit, observeable
metadata or data change. Note that atime updates are excluded from this,
unless it is due to an explicit change via utimes or similar mechanism.

When statx requests this attribute on an IS_I_VERSION inode, do an
inode_query_iversion and fill the result in the field. Also, update the
test-statx.c program to display the inode version and the mountid.

Cc: David Howells <dhowells@redhat.com>
Cc: Frank Filz <ffilzlnx@mindspring.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/stat.c                 | 7 +++++++
 include/linux/stat.h      | 1 +
 include/uapi/linux/stat.h | 3 ++-
 samples/vfs/test-statx.c  | 8 ++++++--
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index 9ced8860e0f3..d892909836aa 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -17,6 +17,7 @@
 #include <linux/syscalls.h>
 #include <linux/pagemap.h>
 #include <linux/compat.h>
+#include <linux/iversion.h>
 
 #include <linux/uaccess.h>
 #include <asm/unistd.h>
@@ -118,6 +119,11 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 	stat->attributes_mask |= (STATX_ATTR_AUTOMOUNT |
 				  STATX_ATTR_DAX);
 
+	if ((request_mask & STATX_INO_VERSION) && IS_I_VERSION(inode)) {
+		stat->result_mask |= STATX_INO_VERSION;
+		stat->ino_version = inode_query_iversion(inode);
+	}
+
 	mnt_userns = mnt_user_ns(path->mnt);
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(mnt_userns, path, stat,
@@ -611,6 +617,7 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 	tmp.stx_dev_major = MAJOR(stat->dev);
 	tmp.stx_dev_minor = MINOR(stat->dev);
 	tmp.stx_mnt_id = stat->mnt_id;
+	tmp.stx_ino_version = stat->ino_version;
 
 	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
 }
diff --git a/include/linux/stat.h b/include/linux/stat.h
index 7df06931f25d..9cd77eb7bc1a 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -50,6 +50,7 @@ struct kstat {
 	struct timespec64 btime;			/* File creation time */
 	u64		blocks;
 	u64		mnt_id;
+	u64		ino_version;
 };
 
 #endif
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 1500a0f58041..48d9307d7f31 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -124,7 +124,7 @@ struct statx {
 	__u32	stx_dev_minor;
 	/* 0x90 */
 	__u64	stx_mnt_id;
-	__u64	__spare2;
+	__u64	stx_ino_version; /* Inode change attribute */
 	/* 0xa0 */
 	__u64	__spare3[12];	/* Spare space for future expansion */
 	/* 0x100 */
@@ -152,6 +152,7 @@ struct statx {
 #define STATX_BASIC_STATS	0x000007ffU	/* The stuff in the normal stat struct */
 #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
 #define STATX_MNT_ID		0x00001000U	/* Got stx_mnt_id */
+#define STATX_INO_VERSION	0x00002000U	/* Want/got stx_change_attr */
 
 #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
 
diff --git a/samples/vfs/test-statx.c b/samples/vfs/test-statx.c
index 49c7a46cee07..23e68036fdfb 100644
--- a/samples/vfs/test-statx.c
+++ b/samples/vfs/test-statx.c
@@ -107,6 +107,8 @@ static void dump_statx(struct statx *stx)
 	printf("Device: %-15s", buffer);
 	if (stx->stx_mask & STATX_INO)
 		printf(" Inode: %-11llu", (unsigned long long) stx->stx_ino);
+	if (stx->stx_mask & STATX_MNT_ID)
+		printf(" MountId: %llx", stx->stx_mnt_id);
 	if (stx->stx_mask & STATX_NLINK)
 		printf(" Links: %-5u", stx->stx_nlink);
 	if (stx->stx_mask & STATX_TYPE) {
@@ -145,7 +147,9 @@ static void dump_statx(struct statx *stx)
 	if (stx->stx_mask & STATX_CTIME)
 		print_time("Change: ", &stx->stx_ctime);
 	if (stx->stx_mask & STATX_BTIME)
-		print_time(" Birth: ", &stx->stx_btime);
+		print_time("Birth: ", &stx->stx_btime);
+	if (stx->stx_mask & STATX_INO_VERSION)
+		printf("Inode Version: 0x%llx\n", stx->stx_ino_version);
 
 	if (stx->stx_attributes_mask) {
 		unsigned char bits, mbits;
@@ -218,7 +222,7 @@ int main(int argc, char **argv)
 	struct statx stx;
 	int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW;
 
-	unsigned int mask = STATX_BASIC_STATS | STATX_BTIME;
+	unsigned int mask = STATX_BASIC_STATS | STATX_BTIME | STATX_MNT_ID | STATX_INO_VERSION;
 
 	for (argv++; *argv; argv++) {
 		if (strcmp(*argv, "-F") == 0) {
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 5/6] nfs: report the inode version in statx if requested
  2022-09-07 11:33 [PATCH v4 0/6] vfs: clean up i_version behavior and expose it via statx Jeff Layton
                   ` (3 preceding siblings ...)
  2022-09-07 11:33 ` [PATCH v4 4/6] vfs: report an inode version in statx for IS_I_VERSION inodes Jeff Layton
@ 2022-09-07 11:33 ` Jeff Layton
  2022-09-07 11:33 ` [PATCH v4 6/6] ceph: fill in the change attribute in statx requests Jeff Layton
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2022-09-07 11:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro,
	zohar, xiubli, chuck.lever, lczerner, jack, bfields, brauner,
	fweimer
  Cc: linux-api, linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel,
	linux-ext4, linux-nfs, linux-xfs

Allow NFS to report the i_version in statx. Since the cost to fetch it
is relatively cheap, do it unconditionally and just set the flag if it
looks like it's valid.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfs/inode.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index bea7c005119c..88c732a5c821 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -830,6 +830,8 @@ static u32 nfs_get_valid_attrmask(struct inode *inode)
 		reply_mask |= STATX_UID | STATX_GID;
 	if (!(cache_validity & NFS_INO_INVALID_BLOCKS))
 		reply_mask |= STATX_BLOCKS;
+	if (!(cache_validity & NFS_INO_INVALID_CHANGE))
+		reply_mask |= STATX_INO_VERSION;
 	return reply_mask;
 }
 
@@ -848,7 +850,7 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 
 	request_mask &= STATX_TYPE | STATX_MODE | STATX_NLINK | STATX_UID |
 			STATX_GID | STATX_ATIME | STATX_MTIME | STATX_CTIME |
-			STATX_INO | STATX_SIZE | STATX_BLOCKS;
+			STATX_INO | STATX_SIZE | STATX_BLOCKS | STATX_INO_VERSION;
 
 	if ((query_flags & AT_STATX_DONT_SYNC) && !force_sync) {
 		if (readdirplus_enabled)
@@ -877,7 +879,7 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 	/* Is the user requesting attributes that might need revalidation? */
 	if (!(request_mask & (STATX_MODE|STATX_NLINK|STATX_ATIME|STATX_CTIME|
 					STATX_MTIME|STATX_UID|STATX_GID|
-					STATX_SIZE|STATX_BLOCKS)))
+					STATX_SIZE|STATX_BLOCKS|STATX_INO_VERSION)))
 		goto out_no_revalidate;
 
 	/* Check whether the cached attributes are stale */
@@ -915,6 +917,7 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 
 	generic_fillattr(&init_user_ns, inode, stat);
 	stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
+	stat->ino_version = inode_peek_iversion_raw(inode);
 	if (S_ISDIR(inode->i_mode))
 		stat->blksize = NFS_SERVER(inode)->dtsize;
 out:
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 6/6] ceph: fill in the change attribute in statx requests
  2022-09-07 11:33 [PATCH v4 0/6] vfs: clean up i_version behavior and expose it via statx Jeff Layton
                   ` (4 preceding siblings ...)
  2022-09-07 11:33 ` [PATCH v4 5/6] nfs: report the inode version in statx if requested Jeff Layton
@ 2022-09-07 11:33 ` Jeff Layton
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2022-09-07 11:33 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro,
	zohar, xiubli, chuck.lever, lczerner, jack, bfields, brauner,
	fweimer
  Cc: linux-api, linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel,
	linux-ext4, linux-nfs, linux-xfs

When statx requests the change attribute, request the full gamut of caps
(similarly to how ctime is handled). When the change attribute seems to
be valid, return it in the ino_version field.

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/inode.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 42351d7a0dd6..ccc926a7dcb0 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2415,10 +2415,10 @@ static int statx_to_caps(u32 want, umode_t mode)
 {
 	int mask = 0;
 
-	if (want & (STATX_MODE|STATX_UID|STATX_GID|STATX_CTIME|STATX_BTIME))
+	if (want & (STATX_MODE|STATX_UID|STATX_GID|STATX_CTIME|STATX_BTIME|STATX_INO_VERSION))
 		mask |= CEPH_CAP_AUTH_SHARED;
 
-	if (want & (STATX_NLINK|STATX_CTIME)) {
+	if (want & (STATX_NLINK|STATX_CTIME|STATX_INO_VERSION)) {
 		/*
 		 * The link count for directories depends on inode->i_subdirs,
 		 * and that is only updated when Fs caps are held.
@@ -2429,11 +2429,10 @@ static int statx_to_caps(u32 want, umode_t mode)
 			mask |= CEPH_CAP_LINK_SHARED;
 	}
 
-	if (want & (STATX_ATIME|STATX_MTIME|STATX_CTIME|STATX_SIZE|
-		    STATX_BLOCKS))
+	if (want & (STATX_ATIME|STATX_MTIME|STATX_CTIME|STATX_SIZE|STATX_BLOCKS|STATX_INO_VERSION))
 		mask |= CEPH_CAP_FILE_SHARED;
 
-	if (want & (STATX_CTIME))
+	if (want & (STATX_CTIME|STATX_INO_VERSION))
 		mask |= CEPH_CAP_XATTR_SHARED;
 
 	return mask;
@@ -2475,6 +2474,11 @@ int ceph_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		valid_mask |= STATX_BTIME;
 	}
 
+	if (request_mask & STATX_INO_VERSION) {
+		stat->ino_version = inode_peek_iversion_raw(inode);
+		valid_mask |= STATX_INO_VERSION;
+	}
+
 	if (ceph_snap(inode) == CEPH_NOSNAP)
 		stat->dev = inode->i_sb->s_dev;
 	else
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-09-07 11:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-07 11:33 [PATCH v4 0/6] vfs: clean up i_version behavior and expose it via statx Jeff Layton
2022-09-07 11:33 ` [PATCH v4 1/6] iversion: update comments with info about atime updates Jeff Layton
2022-09-07 11:33 ` [PATCH v4 2/6] ext4: fix i_version handling in ext4 Jeff Layton
2022-09-07 11:33 ` [PATCH v4 3/6] ext4: unconditionally enable the i_version counter Jeff Layton
2022-09-07 11:33 ` [PATCH v4 4/6] vfs: report an inode version in statx for IS_I_VERSION inodes Jeff Layton
2022-09-07 11:33 ` [PATCH v4 5/6] nfs: report the inode version in statx if requested Jeff Layton
2022-09-07 11:33 ` [PATCH v4 6/6] ceph: fill in the change attribute in statx requests Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).