* [PATCH v3 0/6] fs: implement multigrain timestamps
@ 2023-05-03 14:20 Jeff Layton
  2023-05-03 14:20 ` [PATCH v3 1/6] fs: add infrastructure for multigrain inode i_m/ctime Jeff Layton
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Jeff Layton @ 2023-05-03 14:20 UTC
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs

Major changes in v3:
- move the flag to the 31st bit (1L<<30) of tv_nsec instead of bit 0, since
  the upper bits in the tv_nsec field aren't used for timestamps. This means
  we don't need to set s_time_gran to a value higher than 1.

- use an fstype flag instead of a superblock flag

...plus a lot of smaller cleanups and documentation.

The basic idea with multigrain timestamps is to keep track of when an
inode's mtime or ctime has been queried and to force a fine-grained
timestamp the next time the mtime or ctime is updated.
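
As an illustration only, the idea can be modeled in userspace C. This is
a sketch under assumptions, not the kernel code: struct mg_inode,
mg_init(), mg_query(), mg_update() and MG_QUERIED are hypothetical
names, the model ignores concurrent updaters, and it assumes a Linux
libc that provides CLOCK_REALTIME_COARSE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag bit; normalized nanoseconds never reach bit 30. */
#define MG_QUERIED (1L << 30)

/* Hypothetical stand-in for the ctime stored in an inode. */
struct mg_inode {
	time_t ctime_sec;
	atomic_long ctime_nsec;	/* low 30 bits: nsec, bit 30: queried flag */
};

static void mg_init(struct mg_inode *ino)
{
	ino->ctime_sec = 0;
	atomic_init(&ino->ctime_nsec, 0);
}

/* Report the ctime and remember that somebody looked at it. */
static struct timespec mg_query(struct mg_inode *ino)
{
	struct timespec ts;

	ts.tv_sec = ino->ctime_sec;
	ts.tv_nsec = atomic_fetch_or(&ino->ctime_nsec, MG_QUERIED) & ~MG_QUERIED;
	return ts;
}

/* Stamp a change. Returns true if the fine-grained clock was used. */
static bool mg_update(struct mg_inode *ino)
{
	/* Clear the flag and learn whether it was set since the last update. */
	long old = atomic_fetch_and(&ino->ctime_nsec, ~MG_QUERIED);
	bool fine = (old & MG_QUERIED) != 0;
	struct timespec now;

	clock_gettime(fine ? CLOCK_REALTIME : CLOCK_REALTIME_COARSE, &now);
	assert(now.tv_nsec < MG_QUERIED);

	/* A coarse reading may lag an earlier fine one; never go backwards. */
	if (!fine && (now.tv_sec < ino->ctime_sec ||
		      (now.tv_sec == ino->ctime_sec && now.tv_nsec < old)))
		return false;

	ino->ctime_sec = now.tv_sec;
	atomic_store(&ino->ctime_nsec, now.tv_nsec);
	return fine;
}
```

In this model, an update that follows mg_query() reads the fine-grained
clock, and an update with no intervening query falls back to the coarse
one.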

This is a follow-up to the patches I posted last week [1]. The main
change in this set is that it no longer uses the lowest-order bit in the
tv_nsec field, and instead uses one of the higher-order bits (the 31st
bit, 1L<<30, specifically) since they are otherwise unused. This change
makes things much simpler, and we no longer need to twiddle s_time_gran
for it.

Note that with these changes, the statx06 LTP test will intermittently
fail on most filesystems, usually with errors like this:

    statx06.c:138: TFAIL: Birth time > after_time
    statx06.c:138: TFAIL: Modified time > after_time

The test does this:

        SAFE_CLOCK_GETTIME(CLOCK_REALTIME_COARSE, &before_time);
        clock_wait_tick();
        tc->operation();
        clock_wait_tick();
        SAFE_CLOCK_GETTIME(CLOCK_REALTIME_COARSE, &after_time);

Converting the second SAFE_CLOCK_GETTIME to use CLOCK_REALTIME instead
gets things working again.
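
The failure mode can be sketched in plain C as follows. This is an
illustrative sketch, not the LTP source: ts_cmp(), in_window() and
sample() are hypothetical helpers, and the timestamps in the note after
the block are made-up values.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Return <0, 0 or >0 when a is earlier than, equal to or later than b. */
static int ts_cmp(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/* The kind of check the test applies: before <= stamp <= after. */
static bool in_window(const struct timespec *before,
		      const struct timespec *stamp,
		      const struct timespec *after)
{
	return ts_cmp(before, stamp) <= 0 && ts_cmp(stamp, after) <= 0;
}

/* Read one clock, e.g. CLOCK_REALTIME_COARSE or CLOCK_REALTIME. */
static struct timespec sample(clockid_t clk)
{
	struct timespec ts;
	int rc = clock_gettime(clk, &ts);

	assert(rc == 0);
	(void)rc;
	return ts;
}
```

For example, with before_time at 10.000000000 and a fine-grained file
timestamp of 10.000000500, a coarse after_time that still reads
10.000000000 makes in_window() fail, while a fine-grained after_time of
10.000000600 passes.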

For now, I've only converted/tested a few filesystems, focusing on the
most popular ones exported via NFS.  If this approach looks acceptable
though, I'll plan to convert more filesystems to it.

Another option is to enable this unilaterally kernel-wide. I decided
not to do that for now, but it's something we could consider later.

[1]: https://lore.kernel.org/linux-fsdevel/20230424151104.175456-1-jlayton@kernel.org/

Jeff Layton (6):
  fs: add infrastructure for multigrain inode i_m/ctime
  overlayfs: allow it handle multigrain timestamps
  shmem: convert to multigrain timestamps
  xfs: convert to multigrain timestamps
  ext4: convert to multigrain timestamps
  btrfs: convert to multigrain timestamps

 fs/btrfs/delayed-inode.c        |  2 +-
 fs/btrfs/file.c                 | 10 +++---
 fs/btrfs/inode.c                | 25 +++++++-------
 fs/btrfs/ioctl.c                |  6 ++--
 fs/btrfs/reflink.c              |  2 +-
 fs/btrfs/super.c                |  5 +--
 fs/btrfs/transaction.c          |  2 +-
 fs/btrfs/tree-log.c             |  2 +-
 fs/btrfs/volumes.c              |  2 +-
 fs/btrfs/xattr.c                |  4 +--
 fs/ext4/acl.c                   |  2 +-
 fs/ext4/extents.c               | 10 +++---
 fs/ext4/ialloc.c                |  2 +-
 fs/ext4/inline.c                |  4 +--
 fs/ext4/inode.c                 | 24 ++++++++++---
 fs/ext4/ioctl.c                 |  8 ++---
 fs/ext4/namei.c                 | 20 +++++------
 fs/ext4/super.c                 |  4 +--
 fs/ext4/xattr.c                 |  2 +-
 fs/inode.c                      | 52 ++++++++++++++++++++++++++--
 fs/overlayfs/file.c             |  7 ++--
 fs/overlayfs/util.c             |  2 +-
 fs/stat.c                       | 32 +++++++++++++++++
 fs/xfs/libxfs/xfs_inode_buf.c   |  2 +-
 fs/xfs/libxfs/xfs_trans_inode.c |  2 +-
 fs/xfs/xfs_acl.c                |  2 +-
 fs/xfs/xfs_bmap_util.c          |  2 +-
 fs/xfs/xfs_inode.c              |  2 +-
 fs/xfs/xfs_inode_item.c         |  2 +-
 fs/xfs/xfs_iops.c               | 15 ++++++--
 fs/xfs/xfs_super.c              |  2 +-
 include/linux/fs.h              | 61 ++++++++++++++++++++++++++++++++-
 mm/shmem.c                      | 25 +++++++-------
 33 files changed, 255 insertions(+), 89 deletions(-)

-- 
2.40.1



* [PATCH v3 1/6] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-03 14:20 [PATCH v3 0/6] fs: implement multigrain timestamps Jeff Layton
@ 2023-05-03 14:20 ` Jeff Layton
  2023-05-05  0:10   ` Dave Chinner
  2023-05-03 14:20 ` [PATCH v3 2/6] overlayfs: allow it handle multigrain timestamps Jeff Layton
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2023-05-03 14:20 UTC
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs

The VFS always uses coarse-grained timestamp updates for filling out the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot of metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.

Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
lot of exported filesystems don't properly support a change attribute
and are subject to the same problems with timestamp granularity. Other
applications have similar issues (e.g. backup applications).

Switching to always using fine-grained timestamps would improve the
situation, but that becomes rather expensive, as the underlying
filesystem will have to log a lot more metadata updates.

What we need is a way to only use fine-grained timestamps when they are
being actively queried.

The kernel always stores normalized ctime values, so only the first 30
bits of the tv_nsec field are ever used. Whenever the mtime changes, the
ctime must also change.

Use the 31st bit of the tv_nsec field to indicate that something has
queried the inode for the i_mtime or i_ctime. When this flag is set, on
the next timestamp update, the kernel can fetch a fine-grained timestamp
instead of the usual coarse-grained one.
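
The bit arithmetic behind this can be checked with a small standalone
sketch. It is only an illustration: QUERIED_FLAG and the nsec_*()
helpers are hypothetical userspace names, not the kernel's, and the flag
position mirrors the I_CTIME_QUERIED definition in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NSEC_PER_SEC	1000000000L
#define QUERIED_FLAG	(1L << 30)	/* hypothetical name for the flag bit */

/* A normalized tv_nsec is at most 999999999, which fits below bit 30. */
static_assert(NSEC_PER_SEC - 1 < QUERIED_FLAG, "flag would overlap nsec");

/* True if the raw field carries the queried flag. */
static bool nsec_is_queried(long raw)
{
	return (raw & QUERIED_FLAG) != 0;
}

/* The nanosecond value with the flag masked off. */
static long nsec_value(long raw)
{
	return raw & ~QUERIED_FLAG;
}

/* The raw field with the flag set, nanoseconds unchanged. */
static long nsec_mark_queried(long raw)
{
	return raw | QUERIED_FLAG;
}
```

Because the largest normalized value stays below 1L<<30, setting or
clearing the flag never disturbs the nanosecond count.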

This patch adds the infrastructure for this scheme. Filesystems can opt
into it by setting the FS_MULTIGRAIN_TS flag in the fstype.

Later patches will convert individual filesystems over to use it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/inode.c         | 52 ++++++++++++++++++++++++++++++++++++---
 fs/stat.c          | 32 ++++++++++++++++++++++++
 include/linux/fs.h | 61 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 141 insertions(+), 4 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 4558dc2f1355..7f6189961d6a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2030,6 +2030,7 @@ EXPORT_SYMBOL(file_remove_privs);
 static int inode_needs_update_time(struct inode *inode, struct timespec64 *now)
 {
 	int sync_it = 0;
+	struct timespec64 ctime;
 
 	/* First try to exhaust all avenues to not sync */
 	if (IS_NOCMTIME(inode))
@@ -2038,7 +2039,8 @@ static int inode_needs_update_time(struct inode *inode, struct timespec64 *now)
 	if (!timespec64_equal(&inode->i_mtime, now))
 		sync_it = S_MTIME;
 
-	if (!timespec64_equal(&inode->i_ctime, now))
+	ctime = ctime_peek(inode);
+	if (!timespec64_equal(&ctime, now))
 		sync_it |= S_CTIME;
 
 	if (IS_I_VERSION(inode) && inode_iversion_need_inc(inode))
@@ -2062,6 +2064,50 @@ static int __file_update_time(struct file *file, struct timespec64 *now,
 	return ret;
 }
 
+/**
+ * current_ctime - Return FS time (possibly fine-grained)
+ * @inode: inode.
+ *
+ * Return the current time truncated to the time granularity supported by
+ * the fs, as suitable for a ctime/mtime change.
+ *
+ * For a multigrain timestamp, if the ctime is flagged as having been
+ * QUERIED, get a fine-grained timestamp.
+ */
+struct timespec64 current_ctime(struct inode *inode)
+{
+	bool multigrain = is_multigrain_ts(inode);
+	struct timespec64 now;
+	long nsec = 0;
+
+	if (multigrain) {
+		atomic_long_t *pnsec = (atomic_long_t *)&inode->i_ctime.tv_nsec;
+
+		nsec = atomic_long_fetch_andnot(I_CTIME_QUERIED, pnsec);
+	}
+
+	if (nsec & I_CTIME_QUERIED) {
+		ktime_get_real_ts64(&now);
+	} else {
+		ktime_get_coarse_real_ts64(&now);
+
+		if (multigrain) {
+			/*
+			 * If we've recently fetched a fine-grained timestamp
+			 * then the coarse-grained one may be earlier than the
+			 * existing one. Just keep the existing ctime if so.
+			 */
+			struct timespec64 ctime = ctime_peek(inode);
+
+			if (timespec64_compare(&ctime, &now) > 0)
+				now = ctime;
+		}
+	}
+
+	return timestamp_truncate(now, inode);
+}
+EXPORT_SYMBOL(current_ctime);
+
 /**
  * file_update_time - update mtime and ctime time
  * @file: file accessed
@@ -2080,7 +2126,7 @@ int file_update_time(struct file *file)
 {
 	int ret;
 	struct inode *inode = file_inode(file);
-	struct timespec64 now = current_time(inode);
+	struct timespec64 now = current_ctime(inode);
 
 	ret = inode_needs_update_time(inode, &now);
 	if (ret <= 0)
@@ -2109,7 +2155,7 @@ static int file_modified_flags(struct file *file, int flags)
 {
 	int ret;
 	struct inode *inode = file_inode(file);
-	struct timespec64 now = current_time(inode);
+	struct timespec64 now = current_ctime(inode);
 
 	/*
 	 * Clear the security bits if the process is not being run by root.
diff --git a/fs/stat.c b/fs/stat.c
index 7c238da22ef0..11a7e277f53e 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -26,6 +26,38 @@
 #include "internal.h"
 #include "mount.h"
 
+/**
+ * generic_fill_multigrain_cmtime - Fill in the mtime and ctime and flag ctime as QUERIED
+ * @request_mask: STATX_* values requested
+ * @inode: inode from which to grab the c/mtime
+ * @stat: where to store the resulting values
+ *
+ * Given @inode, grab the ctime and mtime out of it and store the result
+ * in @stat. When fetching the value, flag it as queried so the next write
+ * will use a fine-grained timestamp.
+ */
+void generic_fill_multigrain_cmtime(u32 request_mask, struct inode *inode,
+					struct kstat *stat)
+{
+	atomic_long_t *pnsec = (atomic_long_t *)&inode->i_ctime.tv_nsec;
+
+	/* If neither time was requested, then just don't report it */
+	if (!(request_mask & (STATX_CTIME|STATX_MTIME))) {
+		stat->result_mask &= ~(STATX_CTIME|STATX_MTIME);
+		return;
+	}
+
+	stat->mtime = inode->i_mtime;
+	stat->ctime.tv_sec = inode->i_ctime.tv_sec;
+	/*
+	 * Atomically set the QUERIED flag and fetch the new value with
+	 * the flag masked off.
+	 */
+	stat->ctime.tv_nsec = atomic_long_fetch_or(I_CTIME_QUERIED, pnsec) &
+					~I_CTIME_QUERIED;
+}
+EXPORT_SYMBOL(generic_fill_multigrain_cmtime);
+
 /**
  * generic_fillattr - Fill in the basic attributes from the inode struct
  * @idmap:	idmap of the mount the inode was found from
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c85916e9f7db..d12d4a302d9d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1457,7 +1457,8 @@ static inline bool fsuidgid_has_mapping(struct super_block *sb,
 	       kgid_has_mapping(fs_userns, kgid);
 }
 
-extern struct timespec64 current_time(struct inode *inode);
+struct timespec64 current_time(struct inode *inode);
+struct timespec64 current_ctime(struct inode *inode);
 
 /*
  * Snapshotting support.
@@ -2195,6 +2196,7 @@ struct file_system_type {
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
 #define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
+#define FS_MULTIGRAIN_TS	64	/* Filesystem uses multigrain timestamps */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
@@ -2218,6 +2220,61 @@ struct file_system_type {
 
 #define MODULE_ALIAS_FS(NAME) MODULE_ALIAS("fs-" NAME)
 
+/*
+ * Multigrain timestamps
+ *
+ * Conditionally use fine-grained ctime and mtime timestamps when there
+ * are users actively observing them via getattr. The primary use-case
+ * for this is NFS clients that use the ctime to distinguish between
+ * different states of the file, and that are often fooled by multiple
+ * operations that occur in the same coarse-grained timer tick.
+ */
+static inline bool is_multigrain_ts(const struct inode *inode)
+{
+	return inode->i_sb->s_type->fs_flags & FS_MULTIGRAIN_TS;
+}
+
+/*
+ * The kernel always keeps normalized struct timespec64 values in the ctime,
+ * which means that only the first 30 bits of the value are used. Use the
+ * 31st bit of the ctime's tv_nsec field as a flag to indicate that the value
+ * has been queried since it was last updated.
+ */
+#define I_CTIME_QUERIED		(1L<<30)
+
+/**
+ * ctime_nsec_peek - peek at (but don't query) the ctime tv_nsec field
+ * @inode: inode to fetch the ctime from
+ *
+ * Grab the current ctime tv_nsec field from the inode, mask off the
+ * I_CTIME_QUERIED flag and return it. This is mostly intended for use by
+ * internal consumers of the ctime that aren't concerned with ensuring a
+ * fine-grained update on the next change (e.g. when preparing to store
+ * the value in the backing store for later retrieval).
+ */
+static inline long ctime_nsec_peek(const struct inode *inode)
+{
+	return inode->i_ctime.tv_nsec & ~I_CTIME_QUERIED;
+}
+
+/**
+ * ctime_peek - peek at (but don't query) the ctime
+ * @inode: inode to fetch the ctime from
+ *
+ * Grab the current ctime from the inode, sans I_CTIME_QUERIED flag. For
+ * use by internal consumers that don't require a fine-grained update on
+ * the next change.
+ */
+static inline struct timespec64 ctime_peek(const struct inode *inode)
+{
+	struct timespec64 ctime;
+
+	ctime.tv_sec = inode->i_ctime.tv_sec;
+	ctime.tv_nsec = ctime_nsec_peek(inode);
+
+	return ctime;
+}
+
 extern struct dentry *mount_bdev(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data,
 	int (*fill_super)(struct super_block *, void *, int));
@@ -2838,6 +2895,8 @@ extern void page_put_link(void *);
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
 extern void kfree_link(void *);
+void generic_fill_multigrain_cmtime(u32 request_mask, struct inode *inode,
+					struct kstat *stat);
 void generic_fillattr(struct mnt_idmap *, struct inode *, struct kstat *);
 void generic_fill_statx_attr(struct inode *inode, struct kstat *stat);
 extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
-- 
2.40.1



* [PATCH v3 2/6] overlayfs: allow it handle multigrain timestamps
  2023-05-03 14:20 [PATCH v3 0/6] fs: implement multigrain timestamps Jeff Layton
  2023-05-03 14:20 ` [PATCH v3 1/6] fs: add infrastructure for multigrain inode i_m/ctime Jeff Layton
@ 2023-05-03 14:20 ` Jeff Layton
  2023-05-03 14:20 ` [PATCH v3 3/6] shmem: convert to " Jeff Layton
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2023-05-03 14:20 UTC
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs

Ensure that we strip off the I_CTIME_QUERIED bit when copying up.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/overlayfs/file.c | 7 +++++--
 fs/overlayfs/util.c | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 7c04f033aadd..cad715df8c4e 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -222,6 +222,7 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 static void ovl_file_accessed(struct file *file)
 {
 	struct inode *inode, *upperinode;
+	struct timespec64 ctime, uctime;
 
 	if (file->f_flags & O_NOATIME)
 		return;
@@ -232,10 +233,12 @@ static void ovl_file_accessed(struct file *file)
 	if (!upperinode)
 		return;
 
+	ctime = ctime_peek(inode);
+	uctime = ctime_peek(upperinode);
 	if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
-	     !timespec64_equal(&inode->i_ctime, &upperinode->i_ctime))) {
+	     !timespec64_equal(&ctime, &uctime))) {
 		inode->i_mtime = upperinode->i_mtime;
-		inode->i_ctime = upperinode->i_ctime;
+		inode->i_ctime = uctime;
 	}
 
 	touch_atime(&file->f_path);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 923d66d131c1..f4f9d7e189ef 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -1117,6 +1117,6 @@ void ovl_copyattr(struct inode *inode)
 	inode->i_mode = realinode->i_mode;
 	inode->i_atime = realinode->i_atime;
 	inode->i_mtime = realinode->i_mtime;
-	inode->i_ctime = realinode->i_ctime;
+	inode->i_ctime = ctime_peek(realinode);
 	i_size_write(inode, i_size_read(realinode));
 }
-- 
2.40.1



* [PATCH v3 3/6] shmem: convert to multigrain timestamps
  2023-05-03 14:20 [PATCH v3 0/6] fs: implement multigrain timestamps Jeff Layton
  2023-05-03 14:20 ` [PATCH v3 1/6] fs: add infrastructure for multigrain inode i_m/ctime Jeff Layton
  2023-05-03 14:20 ` [PATCH v3 2/6] overlayfs: allow it handle multigrain timestamps Jeff Layton
@ 2023-05-03 14:20 ` Jeff Layton
  2023-05-03 14:20 ` [PATCH v3 4/6] xfs: " Jeff Layton
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2023-05-03 14:20 UTC
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 mm/shmem.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 448f393d8ab2..40c794a7baa8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1039,7 +1039,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 {
 	shmem_undo_range(inode, lstart, lend, false);
-	inode->i_ctime = inode->i_mtime = current_time(inode);
+	inode->i_ctime = inode->i_mtime = current_ctime(inode);
 	inode_inc_iversion(inode);
 }
 EXPORT_SYMBOL_GPL(shmem_truncate_range);
@@ -1066,6 +1066,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
 			STATX_ATTR_IMMUTABLE |
 			STATX_ATTR_NODUMP);
 	generic_fillattr(idmap, inode, stat);
+	generic_fill_multigrain_cmtime(request_mask, inode, stat);
 
 	if (shmem_is_huge(inode, 0, false, NULL, 0))
 		stat->blksize = HPAGE_PMD_SIZE;
@@ -1136,7 +1137,7 @@ static int shmem_setattr(struct mnt_idmap *idmap,
 	if (attr->ia_valid & ATTR_MODE)
 		error = posix_acl_chmod(idmap, dentry, inode->i_mode);
 	if (!error && update_ctime) {
-		inode->i_ctime = current_time(inode);
+		inode->i_ctime = current_ctime(inode);
 		if (update_mtime)
 			inode->i_mtime = inode->i_ctime;
 		inode_inc_iversion(inode);
@@ -2361,7 +2362,7 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
 		inode->i_ino = ino;
 		inode_init_owner(idmap, inode, dir, mode);
 		inode->i_blocks = 0;
-		inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
+		inode->i_atime = inode->i_mtime = inode->i_ctime = current_ctime(inode);
 		inode->i_generation = get_random_u32();
 		info = SHMEM_I(inode);
 		memset(info, 0, (char *)inode - (char *)info);
@@ -2940,7 +2941,7 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
 
 		error = 0;
 		dir->i_size += BOGO_DIRENT_SIZE;
-		dir->i_ctime = dir->i_mtime = current_time(dir);
+		dir->i_ctime = dir->i_mtime = current_ctime(dir);
 		inode_inc_iversion(dir);
 		d_instantiate(dentry, inode);
 		dget(dentry); /* Extra count - pin the dentry in core */
@@ -3016,7 +3017,7 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
 	}
 
 	dir->i_size += BOGO_DIRENT_SIZE;
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
+	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_ctime(inode);
 	inode_inc_iversion(dir);
 	inc_nlink(inode);
 	ihold(inode);	/* New dentry reference */
@@ -3034,7 +3035,7 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 		shmem_free_inode(inode->i_sb);
 
 	dir->i_size -= BOGO_DIRENT_SIZE;
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
+	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_ctime(inode);
 	inode_inc_iversion(dir);
 	drop_nlink(inode);
 	dput(dentry);	/* Undo the count from "create" - this does all the work */
@@ -3124,7 +3125,7 @@ static int shmem_rename2(struct mnt_idmap *idmap,
 	new_dir->i_size += BOGO_DIRENT_SIZE;
 	old_dir->i_ctime = old_dir->i_mtime =
 	new_dir->i_ctime = new_dir->i_mtime =
-	inode->i_ctime = current_time(old_dir);
+	inode->i_ctime = current_ctime(old_dir);
 	inode_inc_iversion(old_dir);
 	inode_inc_iversion(new_dir);
 	return 0;
@@ -3178,7 +3179,7 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
 		folio_put(folio);
 	}
 	dir->i_size += BOGO_DIRENT_SIZE;
-	dir->i_ctime = dir->i_mtime = current_time(dir);
+	dir->i_ctime = dir->i_mtime = current_ctime(dir);
 	inode_inc_iversion(dir);
 	d_instantiate(dentry, inode);
 	dget(dentry);
@@ -3250,7 +3251,7 @@ static int shmem_fileattr_set(struct mnt_idmap *idmap,
 		(fa->flags & SHMEM_FL_USER_MODIFIABLE);
 
 	shmem_set_inode_flags(inode, info->fsflags);
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	inode_inc_iversion(inode);
 	return 0;
 }
@@ -3320,7 +3321,7 @@ static int shmem_xattr_handler_set(const struct xattr_handler *handler,
 	name = xattr_full_name(handler, name);
 	err = simple_xattr_set(&info->xattrs, name, value, size, flags, NULL);
 	if (!err) {
-		inode->i_ctime = current_time(inode);
+		inode->i_ctime = current_ctime(inode);
 		inode_inc_iversion(inode);
 	}
 	return err;
@@ -4052,9 +4053,9 @@ static struct file_system_type shmem_fs_type = {
 #endif
 	.kill_sb	= kill_litter_super,
 #ifdef CONFIG_SHMEM
-	.fs_flags	= FS_USERNS_MOUNT | FS_ALLOW_IDMAP,
+	.fs_flags	= FS_USERNS_MOUNT | FS_ALLOW_IDMAP | FS_MULTIGRAIN_TS,
 #else
-	.fs_flags	= FS_USERNS_MOUNT,
+	.fs_flags	= FS_USERNS_MOUNT | FS_MULTIGRAIN_TS,
 #endif
 };
 
-- 
2.40.1



* [PATCH v3 4/6] xfs: convert to multigrain timestamps
  2023-05-03 14:20 [PATCH v3 0/6] fs: implement multigrain timestamps Jeff Layton
                   ` (2 preceding siblings ...)
  2023-05-03 14:20 ` [PATCH v3 3/6] shmem: convert to " Jeff Layton
@ 2023-05-03 14:20 ` Jeff Layton
  2023-05-03 14:20 ` [PATCH v3 5/6] ext4: " Jeff Layton
  2023-05-03 14:20 ` [PATCH v3 6/6] btrfs: " Jeff Layton
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2023-05-03 14:20 UTC
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs

With this change, also have XFS stop reporting a STATX_CHANGE_COOKIE, so
that nfsd will use the ctime instead.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/xfs/libxfs/xfs_inode_buf.c   |  2 +-
 fs/xfs/libxfs/xfs_trans_inode.c |  2 +-
 fs/xfs/xfs_acl.c                |  2 +-
 fs/xfs/xfs_bmap_util.c          |  2 +-
 fs/xfs/xfs_inode.c              |  2 +-
 fs/xfs/xfs_inode_item.c         |  2 +-
 fs/xfs/xfs_iops.c               | 15 ++++++++++++---
 fs/xfs/xfs_super.c              |  2 +-
 8 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 758aacd8166b..c29e961fac34 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -316,7 +316,7 @@ xfs_inode_to_disk(
 
 	to->di_atime = xfs_inode_to_disk_ts(ip, inode->i_atime);
 	to->di_mtime = xfs_inode_to_disk_ts(ip, inode->i_mtime);
-	to->di_ctime = xfs_inode_to_disk_ts(ip, inode->i_ctime);
+	to->di_ctime = xfs_inode_to_disk_ts(ip, ctime_peek(inode));
 	to->di_nlink = cpu_to_be32(inode->i_nlink);
 	to->di_gen = cpu_to_be32(inode->i_generation);
 	to->di_mode = cpu_to_be16(inode->i_mode);
diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index 8b5547073379..c08be3aa3339 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -63,7 +63,7 @@ xfs_trans_ichgtime(
 	ASSERT(tp);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
-	tv = current_time(inode);
+	tv = current_ctime(inode);
 
 	if (flags & XFS_ICHGTIME_MOD)
 		inode->i_mtime = tv;
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 791db7d9c849..85353e6e9004 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -233,7 +233,7 @@ xfs_acl_set_mode(
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 	inode->i_mode = mode;
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
 	if (xfs_has_wsync(mp))
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index a09dd2606479..e9cb1bfb9574 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1757,7 +1757,7 @@ xfs_swap_extents(
 	 * under it.
 	 */
 	if ((sbp->bs_ctime.tv_sec != VFS_I(ip)->i_ctime.tv_sec) ||
-	    (sbp->bs_ctime.tv_nsec != VFS_I(ip)->i_ctime.tv_nsec) ||
+	    (sbp->bs_ctime.tv_nsec != ctime_nsec_peek(VFS_I(ip))) ||
 	    (sbp->bs_mtime.tv_sec != VFS_I(ip)->i_mtime.tv_sec) ||
 	    (sbp->bs_mtime.tv_nsec != VFS_I(ip)->i_mtime.tv_nsec)) {
 		error = -EBUSY;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 5808abab786c..ac299c1a9838 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -843,7 +843,7 @@ xfs_init_new_inode(
 	ip->i_df.if_nextents = 0;
 	ASSERT(ip->i_nblocks == 0);
 
-	tv = current_time(inode);
+	tv = current_ctime(inode);
 	inode->i_mtime = tv;
 	inode->i_atime = tv;
 	inode->i_ctime = tv;
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index ca2941ab6cbc..018f187387f0 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -381,7 +381,7 @@ xfs_inode_to_log_dinode(
 	memset(to->di_pad3, 0, sizeof(to->di_pad3));
 	to->di_atime = xfs_inode_to_log_dinode_ts(ip, inode->i_atime);
 	to->di_mtime = xfs_inode_to_log_dinode_ts(ip, inode->i_mtime);
-	to->di_ctime = xfs_inode_to_log_dinode_ts(ip, inode->i_ctime);
+	to->di_ctime = xfs_inode_to_log_dinode_ts(ip, ctime_peek(inode));
 	to->di_nlink = inode->i_nlink;
 	to->di_gen = inode->i_generation;
 	to->di_mode = inode->i_mode;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 24718adb3c16..f41155cfbbe2 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -573,8 +573,17 @@ xfs_vn_getattr(
 	stat->gid = vfsgid_into_kgid(vfsgid);
 	stat->ino = ip->i_ino;
 	stat->atime = inode->i_atime;
-	stat->mtime = inode->i_mtime;
-	stat->ctime = inode->i_ctime;
+	generic_fill_multigrain_cmtime(request_mask, inode, stat);
+
+	/*
+	 * XFS's i_version counter doesn't conform to the rules that other
+	 * filesystems live by. In particular, it changes the version on atime
+	 * updates which leads to excess cache invalidations on NFS. Just clear
+	 * the STATX_CHANGE_COOKIE flag so that nfsd (and others) use the
+	 * (multigrain) ctime instead.
+	 */
+	stat->result_mask &= ~STATX_CHANGE_COOKIE;
+
 	stat->blocks = XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks);
 
 	if (xfs_has_v3inodes(mp)) {
@@ -917,7 +926,7 @@ xfs_setattr_size(
 	if (newsize != oldsize &&
 	    !(iattr->ia_valid & (ATTR_CTIME | ATTR_MTIME))) {
 		iattr->ia_ctime = iattr->ia_mtime =
-			current_time(inode);
+			current_ctime(inode);
 		iattr->ia_valid |= ATTR_CTIME | ATTR_MTIME;
 	}
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 4f814f9e12ab..db3943d09532 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1976,7 +1976,7 @@ static struct file_system_type xfs_fs_type = {
 	.init_fs_context	= xfs_init_fs_context,
 	.parameters		= xfs_fs_parameters,
 	.kill_sb		= kill_block_super,
-	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MULTIGRAIN_TS,
 };
 MODULE_ALIAS_FS("xfs");
 
-- 
2.40.1



* [PATCH v3 5/6] ext4: convert to multigrain timestamps
  2023-05-03 14:20 [PATCH v3 0/6] fs: implement multigrain timestamps Jeff Layton
                   ` (3 preceding siblings ...)
  2023-05-03 14:20 ` [PATCH v3 4/6] xfs: " Jeff Layton
@ 2023-05-03 14:20 ` Jeff Layton
  2023-05-03 14:20 ` [PATCH v3 6/6] btrfs: " Jeff Layton
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2023-05-03 14:20 UTC
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ext4/acl.c     |  2 +-
 fs/ext4/extents.c | 10 +++++-----
 fs/ext4/ialloc.c  |  2 +-
 fs/ext4/inline.c  |  4 ++--
 fs/ext4/inode.c   | 24 +++++++++++++++++++-----
 fs/ext4/ioctl.c   |  8 ++++----
 fs/ext4/namei.c   | 20 ++++++++++----------
 fs/ext4/super.c   |  4 ++--
 fs/ext4/xattr.c   |  2 +-
 9 files changed, 45 insertions(+), 31 deletions(-)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 27fcbddfb148..1f9cf0bdbd3f 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -259,7 +259,7 @@ ext4_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
 	error = __ext4_set_acl(handle, inode, type, acl, 0 /* xattr_flags */);
 	if (!error && update_mode) {
 		inode->i_mode = mode;
-		inode->i_ctime = current_time(inode);
+		inode->i_ctime = current_ctime(inode);
 		error = ext4_mark_inode_dirty(handle, inode);
 	}
 out_stop:
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3559ea6b0781..76ac6790869e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4484,7 +4484,7 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
 		map.m_lblk += ret;
 		map.m_len = len = len - ret;
 		epos = (loff_t)map.m_lblk << inode->i_blkbits;
-		inode->i_ctime = current_time(inode);
+		inode->i_ctime = current_ctime(inode);
 		if (new_size) {
 			if (epos > new_size)
 				epos = new_size;
@@ -4618,7 +4618,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 		}
 		/* Now release the pages and zero block aligned part of pages */
 		truncate_pagecache_range(inode, start, end - 1);
-		inode->i_mtime = inode->i_ctime = current_time(inode);
+		inode->i_mtime = inode->i_ctime = current_ctime(inode);
 
 		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
 					     flags);
@@ -4643,7 +4643,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 		goto out_mutex;
 	}
 
-	inode->i_mtime = inode->i_ctime = current_time(inode);
+	inode->i_mtime = inode->i_ctime = current_ctime(inode);
 	if (new_size)
 		ext4_update_inode_size(inode, new_size);
 	ret = ext4_mark_inode_dirty(handle, inode);
@@ -5392,7 +5392,7 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
-	inode->i_mtime = inode->i_ctime = current_time(inode);
+	inode->i_mtime = inode->i_ctime = current_ctime(inode);
 	ret = ext4_mark_inode_dirty(handle, inode);
 	ext4_update_inode_fsync_trans(handle, inode, 1);
 
@@ -5509,7 +5509,7 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 	/* Expand file to avoid data loss if there is error while shifting */
 	inode->i_size += len;
 	EXT4_I(inode)->i_disksize += len;
-	inode->i_mtime = inode->i_ctime = current_time(inode);
+	inode->i_mtime = inode->i_ctime = current_ctime(inode);
 	ret = ext4_mark_inode_dirty(handle, inode);
 	if (ret)
 		goto out_stop;
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 157663031f8c..cf6973286acc 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1248,7 +1248,7 @@ struct inode *__ext4_new_inode(struct mnt_idmap *idmap,
 	inode->i_ino = ino + group * EXT4_INODES_PER_GROUP(sb);
 	/* This is the optimal IO size (for stat), not the fs block size */
 	inode->i_blocks = 0;
-	inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
+	inode->i_mtime = inode->i_atime = inode->i_ctime = current_ctime(inode);
 	ei->i_crtime = inode->i_mtime;
 
 	memset(ei->i_data, 0, sizeof(ei->i_data));
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 1602d74b5eeb..6dbbd1a31fbe 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1054,7 +1054,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 	 * happen is that the times are slightly out of date
 	 * and/or different from the directory change time.
 	 */
-	dir->i_mtime = dir->i_ctime = current_time(dir);
+	dir->i_mtime = dir->i_ctime = current_ctime(dir);
 	ext4_update_dx_flag(dir);
 	inode_inc_iversion(dir);
 	return 1;
@@ -2015,7 +2015,7 @@ int ext4_inline_data_truncate(struct inode *inode, int *has_inline)
 		ext4_orphan_del(handle, inode);
 
 	if (err == 0) {
-		inode->i_mtime = inode->i_ctime = current_time(inode);
+		inode->i_mtime = inode->i_ctime = current_ctime(inode);
 		err = ext4_mark_inode_dirty(handle, inode);
 		if (IS_SYNC(inode))
 			ext4_handle_sync(handle);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index bf0b7dea4900..135fa0bf445c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4201,7 +4201,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
 
-	inode->i_mtime = inode->i_ctime = current_time(inode);
+	inode->i_mtime = inode->i_ctime = current_ctime(inode);
 	ret2 = ext4_mark_inode_dirty(handle, inode);
 	if (unlikely(ret2))
 		ret = ret2;
@@ -4361,7 +4361,7 @@ int ext4_truncate(struct inode *inode)
 	if (inode->i_nlink)
 		ext4_orphan_del(handle, inode);
 
-	inode->i_mtime = inode->i_ctime = current_time(inode);
+	inode->i_mtime = inode->i_ctime = current_ctime(inode);
 	err2 = ext4_mark_inode_dirty(handle, inode);
 	if (unlikely(err2 && !err))
 		err = err2;
@@ -4424,6 +4424,19 @@ static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
 	return 0;
 }
 
+static void ext4_inode_set_ctime(struct inode *inode, struct ext4_inode *raw_inode)
+{
+	struct timespec64 ctime = ctime_peek(inode);
+
+	if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), i_ctime_extra)) {
+		raw_inode->i_ctime = cpu_to_le32(ctime.tv_sec);
+		raw_inode->i_ctime_extra = ext4_encode_extra_time(&ctime);
+	} else {
+		raw_inode->i_ctime = cpu_to_le32(clamp_t(int32_t,
+					ctime.tv_sec, S32_MIN, S32_MAX));
+	}
+}
+
 static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
@@ -4464,7 +4477,7 @@ static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode
 	}
 	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
 
-	EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
+	ext4_inode_set_ctime(inode, raw_inode);
 	EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
 	EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
 	EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
@@ -5172,7 +5185,7 @@ static void __ext4_update_other_inode_time(struct super_block *sb,
 		spin_unlock(&inode->i_lock);
 
 		spin_lock(&ei->i_raw_lock);
-		EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
+		ext4_inode_set_ctime(inode, raw_inode);
 		EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
 		EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
 		ext4_inode_csum_set(inode, raw_inode, ei);
@@ -5568,7 +5581,7 @@ int ext4_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 			 * update c/mtime in shrink case below
 			 */
 			if (!shrink) {
-				inode->i_mtime = current_time(inode);
+				inode->i_mtime = current_ctime(inode);
 				inode->i_ctime = inode->i_mtime;
 			}
 
@@ -5729,6 +5742,7 @@ int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
 				  STATX_ATTR_VERITY);
 
 	generic_fillattr(idmap, inode, stat);
+	generic_fill_multigrain_cmtime(request_mask, inode, stat);
 	return 0;
 }
 
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f9a430152063..4244ea049065 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -449,7 +449,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
 	diff = size - size_bl;
 	swap_inode_data(inode, inode_bl);
 
-	inode->i_ctime = inode_bl->i_ctime = current_time(inode);
+	inode->i_ctime = inode_bl->i_ctime = current_ctime(inode);
 	inode_inc_iversion(inode);
 
 	inode->i_generation = get_random_u32();
@@ -663,7 +663,7 @@ static int ext4_ioctl_setflags(struct inode *inode,
 
 	ext4_set_inode_flags(inode, false);
 
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	inode_inc_iversion(inode);
 
 	err = ext4_mark_iloc_dirty(handle, inode, &iloc);
@@ -774,7 +774,7 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
 	}
 
 	EXT4_I(inode)->i_projid = kprojid;
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	inode_inc_iversion(inode);
 out_dirty:
 	rc = ext4_mark_iloc_dirty(handle, inode, &iloc);
@@ -1257,7 +1257,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		}
 		err = ext4_reserve_inode_write(handle, inode, &iloc);
 		if (err == 0) {
-			inode->i_ctime = current_time(inode);
+			inode->i_ctime = current_ctime(inode);
 			inode_inc_iversion(inode);
 			inode->i_generation = generation;
 			err = ext4_mark_iloc_dirty(handle, inode, &iloc);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a5010b5b8a8c..1615ae8f8026 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2187,7 +2187,7 @@ static int add_dirent_to_buf(handle_t *handle, struct ext4_filename *fname,
 	 * happen is that the times are slightly out of date
 	 * and/or different from the directory change time.
 	 */
-	dir->i_mtime = dir->i_ctime = current_time(dir);
+	dir->i_mtime = dir->i_ctime = current_ctime(dir);
 	ext4_update_dx_flag(dir);
 	inode_inc_iversion(dir);
 	err2 = ext4_mark_inode_dirty(handle, dir);
@@ -3176,7 +3176,7 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)
 	 * recovery. */
 	inode->i_size = 0;
 	ext4_orphan_add(handle, inode);
-	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
+	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_ctime(inode);
 	retval = ext4_mark_inode_dirty(handle, inode);
 	if (retval)
 		goto end_rmdir;
@@ -3250,7 +3250,7 @@ int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
 		retval = ext4_delete_entry(handle, dir, de, bh);
 		if (retval)
 			goto out_handle;
-		dir->i_ctime = dir->i_mtime = current_time(dir);
+		dir->i_ctime = dir->i_mtime = current_ctime(dir);
 		ext4_update_dx_flag(dir);
 		retval = ext4_mark_inode_dirty(handle, dir);
 		if (retval)
@@ -3265,7 +3265,7 @@ int __ext4_unlink(struct inode *dir, const struct qstr *d_name,
 		drop_nlink(inode);
 	if (!inode->i_nlink)
 		ext4_orphan_add(handle, inode);
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	retval = ext4_mark_inode_dirty(handle, inode);
 	if (dentry && !retval)
 		ext4_fc_track_unlink(handle, dentry);
@@ -3442,7 +3442,7 @@ int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry)
 	if (IS_DIRSYNC(dir))
 		ext4_handle_sync(handle);
 
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	ext4_inc_count(inode);
 	ihold(inode);
 
@@ -3621,7 +3621,7 @@ static int ext4_setent(handle_t *handle, struct ext4_renament *ent,
 		ent->de->file_type = file_type;
 	inode_inc_iversion(ent->dir);
 	ent->dir->i_ctime = ent->dir->i_mtime =
-		current_time(ent->dir);
+		current_ctime(ent->dir);
 	retval = ext4_mark_inode_dirty(handle, ent->dir);
 	BUFFER_TRACE(ent->bh, "call ext4_handle_dirty_metadata");
 	if (!ent->inlined) {
@@ -3929,7 +3929,7 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	 * Like most other Unix systems, set the ctime for inodes on a
 	 * rename.
 	 */
-	old.inode->i_ctime = current_time(old.inode);
+	old.inode->i_ctime = current_ctime(old.inode);
 	retval = ext4_mark_inode_dirty(handle, old.inode);
 	if (unlikely(retval))
 		goto end_rename;
@@ -3943,9 +3943,9 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 
 	if (new.inode) {
 		ext4_dec_count(new.inode);
-		new.inode->i_ctime = current_time(new.inode);
+		new.inode->i_ctime = current_ctime(new.inode);
 	}
-	old.dir->i_ctime = old.dir->i_mtime = current_time(old.dir);
+	old.dir->i_ctime = old.dir->i_mtime = current_ctime(old.dir);
 	ext4_update_dx_flag(old.dir);
 	if (old.dir_bh) {
 		retval = ext4_rename_dir_finish(handle, &old, new.dir->i_ino);
@@ -4139,7 +4139,7 @@ static int ext4_cross_rename(struct inode *old_dir, struct dentry *old_dentry,
 	 * Like most other Unix systems, set the ctime for inodes on a
 	 * rename.
 	 */
-	ctime = current_time(old.inode);
+	ctime = current_ctime(old.inode);
 	old.inode->i_ctime = ctime;
 	new.inode->i_ctime = ctime;
 	retval = ext4_mark_inode_dirty(handle, old.inode);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f43e526112ae..cca7726eceff 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -7051,7 +7051,7 @@ static int ext4_quota_off(struct super_block *sb, int type)
 	}
 	EXT4_I(inode)->i_flags &= ~(EXT4_NOATIME_FL | EXT4_IMMUTABLE_FL);
 	inode_set_flags(inode, 0, S_NOATIME | S_IMMUTABLE);
-	inode->i_mtime = inode->i_ctime = current_time(inode);
+	inode->i_mtime = inode->i_ctime = current_ctime(inode);
 	err = ext4_mark_inode_dirty(handle, inode);
 	ext4_journal_stop(handle);
 out_unlock:
@@ -7227,7 +7227,7 @@ static struct file_system_type ext4_fs_type = {
 	.init_fs_context	= ext4_init_fs_context,
 	.parameters		= ext4_param_specs,
 	.kill_sb		= kill_block_super,
-	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MULTIGRAIN_TS,
 };
 MODULE_ALIAS_FS("ext4");
 
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 767454d74cd6..160f203d211e 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2475,7 +2475,7 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index,
 	}
 	if (!error) {
 		ext4_xattr_update_super_block(handle, inode->i_sb);
-		inode->i_ctime = current_time(inode);
+		inode->i_ctime = current_ctime(inode);
 		inode_inc_iversion(inode);
 		if (!value)
 			no_expand = 0;
-- 
2.40.1



* [PATCH v3 6/6] btrfs: convert to multigrain timestamps
  2023-05-03 14:20 [PATCH v3 0/6] fs: implement multigrain timestamps Jeff Layton
                   ` (4 preceding siblings ...)
  2023-05-03 14:20 ` [PATCH v3 5/6] ext4: " Jeff Layton
@ 2023-05-03 14:20 ` Jeff Layton
  5 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2023-05-03 14:20 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/btrfs/delayed-inode.c |  2 +-
 fs/btrfs/file.c          | 10 +++++-----
 fs/btrfs/inode.c         | 25 +++++++++++++------------
 fs/btrfs/ioctl.c         |  6 +++---
 fs/btrfs/reflink.c       |  2 +-
 fs/btrfs/super.c         |  5 +++--
 fs/btrfs/transaction.c   |  2 +-
 fs/btrfs/tree-log.c      |  2 +-
 fs/btrfs/volumes.c       |  2 +-
 fs/btrfs/xattr.c         |  4 ++--
 10 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 6b457b010cbc..8307fd69da43 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1810,7 +1810,7 @@ static void fill_stack_inode_item(struct btrfs_trans_handle *trans,
 	btrfs_set_stack_timespec_sec(&inode_item->ctime,
 				     inode->i_ctime.tv_sec);
 	btrfs_set_stack_timespec_nsec(&inode_item->ctime,
-				      inode->i_ctime.tv_nsec);
+				      ctime_nsec_peek(inode));
 
 	btrfs_set_stack_timespec_sec(&inode_item->otime,
 				     BTRFS_I(inode)->i_otime.tv_sec);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5cc5a1faaef5..3344f64f58dc 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1113,7 +1113,7 @@ static void update_time_for_write(struct inode *inode)
 	if (IS_NOCMTIME(inode))
 		return;
 
-	now = current_time(inode);
+	now = current_ctime(inode);
 	if (!timespec64_equal(&inode->i_mtime, &now))
 		inode->i_mtime = now;
 
@@ -2473,7 +2473,7 @@ int btrfs_replace_file_extents(struct btrfs_inode *inode,
 		inode_inc_iversion(&inode->vfs_inode);
 
 		if (!extent_info || extent_info->update_times) {
-			inode->vfs_inode.i_mtime = current_time(&inode->vfs_inode);
+			inode->vfs_inode.i_mtime = current_ctime(&inode->vfs_inode);
 			inode->vfs_inode.i_ctime = inode->vfs_inode.i_mtime;
 		}
 
@@ -2716,7 +2716,7 @@ static int btrfs_punch_hole(struct file *file, loff_t offset, loff_t len)
 
 	ASSERT(trans != NULL);
 	inode_inc_iversion(inode);
-	inode->i_mtime = current_time(inode);
+	inode->i_mtime = current_ctime(inode);
 	inode->i_ctime = inode->i_mtime;
 	ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
 	updated_inode = true;
@@ -2734,7 +2734,7 @@ static int btrfs_punch_hole(struct file *file, loff_t offset, loff_t len)
 		 * for detecting, at fsync time, if the inode isn't yet in the
 		 * log tree or it's there but not up to date.
 		 */
-		struct timespec64 now = current_time(inode);
+		struct timespec64 now = current_ctime(inode);
 
 		inode_inc_iversion(inode);
 		inode->i_mtime = now;
@@ -2809,7 +2809,7 @@ static int btrfs_fallocate_update_isize(struct inode *inode,
 	if (IS_ERR(trans))
 		return PTR_ERR(trans);
 
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	i_size_write(inode, end);
 	btrfs_inode_safe_disk_i_size_write(BTRFS_I(inode), 0);
 	ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 957e4d76a7b6..889ff97d9595 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4029,7 +4029,7 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
 	btrfs_set_token_timespec_sec(&token, &item->ctime,
 				     inode->i_ctime.tv_sec);
 	btrfs_set_token_timespec_nsec(&token, &item->ctime,
-				      inode->i_ctime.tv_nsec);
+				      ctime_nsec_peek(inode));
 
 	btrfs_set_token_timespec_sec(&token, &item->otime,
 				     BTRFS_I(inode)->i_otime.tv_sec);
@@ -4227,7 +4227,7 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans,
 	btrfs_i_size_write(dir, dir->vfs_inode.i_size - name->len * 2);
 	inode_inc_iversion(&inode->vfs_inode);
 	inode_inc_iversion(&dir->vfs_inode);
-	inode->vfs_inode.i_ctime = current_time(&inode->vfs_inode);
+	inode->vfs_inode.i_ctime = current_ctime(&inode->vfs_inode);
 	dir->vfs_inode.i_mtime = inode->vfs_inode.i_ctime;
 	dir->vfs_inode.i_ctime = inode->vfs_inode.i_ctime;
 	ret = btrfs_update_inode(trans, root, dir);
@@ -4409,7 +4409,7 @@ static int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
 
 	btrfs_i_size_write(dir, dir->vfs_inode.i_size - fname.disk_name.len * 2);
 	inode_inc_iversion(&dir->vfs_inode);
-	dir->vfs_inode.i_mtime = current_time(&dir->vfs_inode);
+	dir->vfs_inode.i_mtime = current_ctime(&dir->vfs_inode);
 	dir->vfs_inode.i_ctime = dir->vfs_inode.i_mtime;
 	ret = btrfs_update_inode_fallback(trans, root, dir);
 	if (ret)
@@ -5052,7 +5052,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 	if (newsize != oldsize) {
 		inode_inc_iversion(inode);
 		if (!(mask & (ATTR_CTIME | ATTR_MTIME))) {
-			inode->i_mtime = current_time(inode);
+			inode->i_mtime = current_ctime(inode);
 			inode->i_ctime = inode->i_mtime;
 		}
 	}
@@ -5693,7 +5693,7 @@ static struct inode *new_simple_dir(struct super_block *s,
 	inode->i_opflags &= ~IOP_XATTR;
 	inode->i_fop = &simple_dir_operations;
 	inode->i_mode = S_IFDIR | S_IRUGO | S_IWUSR | S_IXUGO;
-	inode->i_mtime = current_time(inode);
+	inode->i_mtime = current_ctime(inode);
 	inode->i_atime = inode->i_mtime;
 	inode->i_ctime = inode->i_mtime;
 	BTRFS_I(inode)->i_otime = inode->i_mtime;
@@ -6335,7 +6335,7 @@ int btrfs_create_new_inode(struct btrfs_trans_handle *trans,
 		goto discard;
 	}
 
-	inode->i_mtime = current_time(inode);
+	inode->i_mtime = current_ctime(inode);
 	inode->i_atime = inode->i_mtime;
 	inode->i_ctime = inode->i_mtime;
 	BTRFS_I(inode)->i_otime = inode->i_mtime;
@@ -6503,7 +6503,7 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
 	 * values (the ones it had when the fsync was done).
 	 */
 	if (!test_bit(BTRFS_FS_LOG_RECOVERING, &root->fs_info->flags)) {
-		struct timespec64 now = current_time(&parent_inode->vfs_inode);
+		struct timespec64 now = current_ctime(&parent_inode->vfs_inode);
 
 		parent_inode->vfs_inode.i_mtime = now;
 		parent_inode->vfs_inode.i_ctime = now;
@@ -6647,7 +6647,7 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
 	BTRFS_I(inode)->dir_index = 0ULL;
 	inc_nlink(inode);
 	inode_inc_iversion(inode);
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	ihold(inode);
 	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);
 
@@ -8659,6 +8659,7 @@ static int btrfs_getattr(struct mnt_idmap *idmap,
 				  STATX_ATTR_NODUMP);
 
 	generic_fillattr(idmap, inode, stat);
+	generic_fill_multigrain_cmtime(request_mask, inode, stat);
 	stat->dev = BTRFS_I(inode)->root->anon_dev;
 
 	spin_lock(&BTRFS_I(inode)->lock);
@@ -8682,7 +8683,7 @@ static int btrfs_rename_exchange(struct inode *old_dir,
 	struct btrfs_root *dest = BTRFS_I(new_dir)->root;
 	struct inode *new_inode = new_dentry->d_inode;
 	struct inode *old_inode = old_dentry->d_inode;
-	struct timespec64 ctime = current_time(old_inode);
+	struct timespec64 ctime = current_ctime(old_inode);
 	struct btrfs_rename_ctx old_rename_ctx;
 	struct btrfs_rename_ctx new_rename_ctx;
 	u64 old_ino = btrfs_ino(BTRFS_I(old_inode));
@@ -9082,7 +9083,7 @@ static int btrfs_rename(struct mnt_idmap *idmap,
 	inode_inc_iversion(old_dir);
 	inode_inc_iversion(new_dir);
 	inode_inc_iversion(old_inode);
-	old_dir->i_mtime = current_time(old_dir);
+	old_dir->i_mtime = current_ctime(old_dir);
 	old_dir->i_ctime = old_dir->i_mtime;
 	new_dir->i_mtime = old_dir->i_mtime;
 	new_dir->i_ctime = old_dir->i_mtime;
@@ -9108,7 +9109,7 @@ static int btrfs_rename(struct mnt_idmap *idmap,
 
 	if (new_inode) {
 		inode_inc_iversion(new_inode);
-		new_inode->i_ctime = current_time(new_inode);
+		new_inode->i_ctime = current_ctime(new_inode);
 		if (unlikely(btrfs_ino(BTRFS_I(new_inode)) ==
 			     BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)) {
 			ret = btrfs_unlink_subvol(trans, BTRFS_I(new_dir), new_dentry);
@@ -9648,7 +9649,7 @@ static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
 		*alloc_hint = ins.objectid + ins.offset;
 
 		inode_inc_iversion(inode);
-		inode->i_ctime = current_time(inode);
+		inode->i_ctime = current_ctime(inode);
 		BTRFS_I(inode)->flags |= BTRFS_INODE_PREALLOC;
 		if (!(mode & FALLOC_FL_KEEP_SIZE) &&
 		    (actual_len > inode->i_size) &&
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index ba769a1eb87a..4b862d777fa7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -384,7 +384,7 @@ int btrfs_fileattr_set(struct mnt_idmap *idmap,
 	binode->flags = binode_flags;
 	btrfs_sync_inode_flags_to_i_flags(inode);
 	inode_inc_iversion(inode);
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
 
  out_end_trans:
@@ -591,7 +591,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
 	struct btrfs_root *root = BTRFS_I(dir)->root;
 	struct btrfs_root *new_root;
 	struct btrfs_block_rsv block_rsv;
-	struct timespec64 cur_time = current_time(dir);
+	struct timespec64 cur_time = current_ctime(dir);
 	struct btrfs_new_inode_args new_inode_args = {
 		.dir = dir,
 		.dentry = dentry,
@@ -3918,7 +3918,7 @@ static long _btrfs_ioctl_set_received_subvol(struct file *file,
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_root_item *root_item = &root->root_item;
 	struct btrfs_trans_handle *trans;
-	struct timespec64 ct = current_time(inode);
+	struct timespec64 ct = current_ctime(inode);
 	int ret = 0;
 	int received_uuid_changed;
 
diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
index 0474bbe39da7..59d3ce505098 100644
--- a/fs/btrfs/reflink.c
+++ b/fs/btrfs/reflink.c
@@ -30,7 +30,7 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
 
 	inode_inc_iversion(inode);
 	if (!no_time_update) {
-		inode->i_mtime = current_time(inode);
+		inode->i_mtime = current_ctime(inode);
 		inode->i_ctime = inode->i_mtime;
 	}
 	/*
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 366fb4cde145..dc8dddbc12b9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2139,7 +2139,7 @@ static struct file_system_type btrfs_fs_type = {
 	.name		= "btrfs",
 	.mount		= btrfs_mount,
 	.kill_sb	= btrfs_kill_super,
-	.fs_flags	= FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA,
+	.fs_flags	= FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_MULTIGRAIN_TS,
 };
 
 static struct file_system_type btrfs_root_fs_type = {
@@ -2147,7 +2147,8 @@ static struct file_system_type btrfs_root_fs_type = {
 	.name		= "btrfs",
 	.mount		= btrfs_mount_root,
 	.kill_sb	= btrfs_kill_super,
-	.fs_flags	= FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_ALLOW_IDMAP,
+	.fs_flags	= FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA |
+			  FS_ALLOW_IDMAP | FS_MULTIGRAIN_TS,
 };
 
 MODULE_ALIAS_FS("btrfs");
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b8d5b1fa9a03..277aedfce808 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1838,7 +1838,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 
 	btrfs_i_size_write(BTRFS_I(parent_inode), parent_inode->i_size +
 						  fname.disk_name.len * 2);
-	parent_inode->i_mtime = current_time(parent_inode);
+	parent_inode->i_mtime = current_ctime(parent_inode);
 	parent_inode->i_ctime = parent_inode->i_mtime;
 	ret = btrfs_update_inode_fallback(trans, parent_root, BTRFS_I(parent_inode));
 	if (ret) {
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 200cea6e49e5..1e0e25dafa47 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4177,7 +4177,7 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
 	btrfs_set_token_timespec_sec(&token, &item->ctime,
 				     inode->i_ctime.tv_sec);
 	btrfs_set_token_timespec_nsec(&token, &item->ctime,
-				      inode->i_ctime.tv_nsec);
+				      ctime_nsec_peek(inode));
 
 	/*
 	 * We do not need to set the nbytes field, in fact during a fast fsync
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c6d592870400..d89f1afde366 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1917,7 +1917,7 @@ static void update_dev_time(const char *device_path)
 	if (ret)
 		return;
 
-	now = current_time(d_inode(path.dentry));
+	now = current_ctime(d_inode(path.dentry));
 	inode_update_time(d_inode(path.dentry), &now, S_MTIME | S_CTIME);
 	path_put(&path);
 }
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 0ebeaf4e81f9..30a37333e92a 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -264,7 +264,7 @@ int btrfs_setxattr_trans(struct inode *inode, const char *name,
 		goto out;
 
 	inode_inc_iversion(inode);
-	inode->i_ctime = current_time(inode);
+	inode->i_ctime = current_ctime(inode);
 	ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
 	if (ret)
 		btrfs_abort_transaction(trans, ret);
@@ -407,7 +407,7 @@ static int btrfs_xattr_handler_set_prop(const struct xattr_handler *handler,
 	ret = btrfs_set_prop(trans, inode, name, value, size, flags);
 	if (!ret) {
 		inode_inc_iversion(inode);
-		inode->i_ctime = current_time(inode);
+		inode->i_ctime = current_ctime(inode);
 		ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
 		if (ret)
 			btrfs_abort_transaction(trans, ret);
-- 
2.40.1



* Re: [PATCH v3 1/6] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-03 14:20 ` [PATCH v3 1/6] fs: add infrastructure for multigrain inode i_m/ctime Jeff Layton
@ 2023-05-05  0:10   ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2023-05-05  0:10 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Chuck Lever, Jan Kara, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs, linux-ext4,
	linux-mm, linux-nfs

On Wed, May 03, 2023 at 10:20:32AM -0400, Jeff Layton wrote:
> The VFS always uses coarse-grained timestamp updates for filling out the
> ctime and mtime after a change. This has the benefit of allowing
> filesystems to optimize away a lot of metadata updates, down to around 1
> per jiffy, even when a file is under heavy writes.
> 
> Unfortunately, this has always been an issue when we're exporting via
> NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> lot of exported filesystems don't properly support a change attribute
> and are subject to the same problems with timestamp granularity. Other
> applications have similar issues (e.g. backup applications).
> 
> Switching to always using fine-grained timestamps would improve the
> situation, but that becomes rather expensive, as the underlying
> filesystem will have to log a lot more metadata updates.
> 
> What we need is a way to only use fine-grained timestamps when they are
> being actively queried.
> 
> The kernel always stores normalized ctime values, so only the first 30
> bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> ctime must also change.
> 
> Use the 31st bit of the tv_nsec field to indicate that something has
> queried the inode for the i_mtime or i_ctime. When this flag is set, on
> the next timestamp update, the kernel can fetch a fine-grained timestamp
> instead of the usual coarse-grained one.
> 
> This patch adds the infrastructure for this scheme. Filesystems can opt
> into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> 
> Later patches will convert individual filesystems over to use it.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/inode.c         | 52 ++++++++++++++++++++++++++++++++++++---
>  fs/stat.c          | 32 ++++++++++++++++++++++++
>  include/linux/fs.h | 61 +++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 141 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 4558dc2f1355..7f6189961d6a 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -2030,6 +2030,7 @@ EXPORT_SYMBOL(file_remove_privs);
>  static int inode_needs_update_time(struct inode *inode, struct timespec64 *now)
>  {
>  	int sync_it = 0;
> +	struct timespec64 ctime;
>  
>  	/* First try to exhaust all avenues to not sync */
>  	if (IS_NOCMTIME(inode))
> @@ -2038,7 +2039,8 @@ static int inode_needs_update_time(struct inode *inode, struct timespec64 *now)
>  	if (!timespec64_equal(&inode->i_mtime, now))
>  		sync_it = S_MTIME;
>  
> -	if (!timespec64_equal(&inode->i_ctime, now))
> +	ctime = ctime_peek(inode);
> +	if (!timespec64_equal(&ctime, now))
>  		sync_it |= S_CTIME;
>  
>  	if (IS_I_VERSION(inode) && inode_iversion_need_inc(inode))
> @@ -2062,6 +2064,50 @@ static int __file_update_time(struct file *file, struct timespec64 *now,
>  	return ret;
>  }
>  
> +/**
> + * current_ctime - Return FS time (possibly fine-grained)
> + * @inode: inode.
> + *
> + * Return the current time truncated to the time granularity supported by
> + * the fs, as suitable for a ctime/mtime change.
> + *
> + * For a multigrain timestamp, if the ctime is flagged as having been
> + * QUERIED, get a fine-grained timestamp.
> + */
> +struct timespec64 current_ctime(struct inode *inode)
> +{
> +	bool multigrain = is_multigrain_ts(inode);
> +	struct timespec64 now;
> +	long nsec = 0;
> +
> +	if (multigrain) {
> +		atomic_long_t *pnsec = (atomic_long_t *)&inode->i_ctime.tv_nsec;
> +
> +		nsec = atomic_long_fetch_andnot(I_CTIME_QUERIED, pnsec);
> +	}
> +
> +	if (nsec & I_CTIME_QUERIED) {
> +		ktime_get_real_ts64(&now);
> +	} else {
> +		ktime_get_coarse_real_ts64(&now);
> +
> +		if (multigrain) {
> +			/*
> +			 * If we've recently fetched a fine-grained timestamp
> +			 * then the coarse-grained one may be earlier than the
> +			 * existing one. Just keep the existing ctime if so.
> +			 */
> +			struct timespec64 ctime = ctime_peek(inode);
> +
> +			if (timespec64_compare(&ctime, &now) > 0)
> +				now = ctime;
> +		}
> +	}
> +
> +	return timestamp_truncate(now, inode);
> +}
> +EXPORT_SYMBOL(current_ctime);

I can't help but think this is easier to read/follow when structured
to separate multigrain vs coarse logic completely like so:

struct timespec64 current_ctime(struct inode *inode)
{
	struct timespec64 now, ctime;
	long nsec;

	if (!is_multigrain_ts(inode)) {
		ktime_get_coarse_real_ts64(&now);
		goto out_truncate;
	}

	nsec = atomic_long_fetch_andnot(I_CTIME_QUERIED,
			(atomic_long_t *)&inode->i_ctime.tv_nsec);

	if (nsec & I_CTIME_QUERIED) {
		ktime_get_real_ts64(&now);
		goto out_truncate;
	}

	/*
	 * If we've recently fetched a fine-grained timestamp then
	 * the coarse-grained one may be earlier than the existing
	 * one. Just keep the existing ctime if so.
	 */
	ktime_get_coarse_real_ts64(&now);
	ctime = ctime_peek(inode);
	if (timespec64_compare(&ctime, &now) > 0)
		now = ctime;

out_truncate:
	return timestamp_truncate(now, inode);
}
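
For anyone who wants to poke at the flag handling outside the kernel, the
same control flow can be modelled in userspace. This is only a rough
sketch under assumptions: struct model_inode, CTIME_QUERIED and all of the
model_* helpers are made-up names, C11 atomics stand in for the
atomic_long_* calls, and clock_gettime() with CLOCK_REALTIME_COARSE and
CLOCK_REALTIME stands in for the coarse and fine ktime getters. It is
single-threaded and skips timestamp_truncate().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for I_CTIME_QUERIED: bit 31 of the nsec word. */
#define CTIME_QUERIED (1UL << 31)

/* Hypothetical model of the inode fields involved (not the kernel struct). */
struct model_inode {
	time_t ctime_sec;
	atomic_ulong ctime_nsec;	/* nanoseconds, plus the QUERIED flag bit */
	bool multigrain;
};

/* Read the ctime without touching the QUERIED flag. */
static struct timespec model_ctime_peek(struct model_inode *inode)
{
	struct timespec ts;

	ts.tv_sec = inode->ctime_sec;
	ts.tv_nsec = (long)(atomic_load(&inode->ctime_nsec) & ~CTIME_QUERIED);
	return ts;
}

/* Returns <0, 0 or >0 depending on whether a is before, equal to or after b. */
static int ts_compare(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return a->tv_sec < b->tv_sec ? -1 : 1;
	if (a->tv_nsec != b->tv_nsec)
		return a->tv_nsec < b->tv_nsec ? -1 : 1;
	return 0;
}

/* Same three-way split as the restructured function above. */
static struct timespec model_current_ctime(struct model_inode *inode)
{
	struct timespec now, ctime;
	unsigned long nsec;

	if (!inode->multigrain) {
		clock_gettime(CLOCK_REALTIME_COARSE, &now);
		return now;
	}

	/* Atomically clear the flag and learn whether it had been set. */
	nsec = atomic_fetch_and(&inode->ctime_nsec, ~CTIME_QUERIED);
	if (nsec & CTIME_QUERIED) {
		clock_gettime(CLOCK_REALTIME, &now);
		return now;
	}

	/* Never let a coarse value move the ctime backwards. */
	clock_gettime(CLOCK_REALTIME_COARSE, &now);
	ctime = model_ctime_peek(inode);
	if (ts_compare(&ctime, &now) > 0)
		now = ctime;
	return now;
}

/* Model of a change to the inode: stamp it with the chosen time. */
static void model_update_ctime(struct model_inode *inode)
{
	struct timespec now = model_current_ctime(inode);

	assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);
	inode->ctime_sec = now.tv_sec;
	atomic_store(&inode->ctime_nsec, (unsigned long)now.tv_nsec);
}

/* Model of a stat: report the ctime and mark it as QUERIED. */
static void model_query_ctime(struct model_inode *inode, struct timespec *out)
{
	out->tv_sec = inode->ctime_sec;
	out->tv_nsec = (long)(atomic_fetch_or(&inode->ctime_nsec, CTIME_QUERIED) &
			      ~CTIME_QUERIED);
}
```

Calling model_update_ctime(), then model_query_ctime(), then
model_update_ctime() again on a multigrain model_inode should take the
coarse path for the first update and the fine-grained path for the second.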

> diff --git a/fs/stat.c b/fs/stat.c
> index 7c238da22ef0..11a7e277f53e 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -26,6 +26,38 @@
>  #include "internal.h"
>  #include "mount.h"
>  
> +/**
> + * generic_fill_multigrain_cmtime - Fill in the mtime and ctime and flag ctime as QUERIED
> + * @request_mask: STATX_* values requested
> + * @inode: inode from which to grab the c/mtime
> + * @stat: where to store the resulting values
> + *
> + * Given @inode, grab the ctime and mtime out of it and store the result
> + * in @stat. When fetching the value, flag it as queried so the next write
> + * will use a fine-grained timestamp.
> + */
> +void generic_fill_multigrain_cmtime(u32 request_mask,struct inode *inode,
> +					struct kstat *stat)
> +{
> +	atomic_long_t *pnsec = (atomic_long_t *)&inode->i_ctime.tv_nsec;
> +
> +	/* If neither time was requested, then just don't report it */
> +	if (!(request_mask & (STATX_CTIME|STATX_MTIME))) {
> +		stat->result_mask &= ~(STATX_CTIME|STATX_MTIME);
> +		return;
> +	}
> +
> +	stat->mtime = inode->i_mtime;
> +	stat->ctime.tv_sec = inode->i_ctime.tv_sec;
> +	/*
> +	 * Atomically set the QUERIED flag and fetch the new value with
> +	 * the flag masked off.
> +	 */
> +	stat->ctime.tv_nsec = atomic_long_fetch_or(I_CTIME_QUERIED, pnsec) &
> +					~I_CTIME_QUERIED;
> +}
> +EXPORT_SYMBOL(generic_fill_multigrain_cmtime);

Hmmm - why not just have a generic_fill_cmtime() function that hides
multigrain behaviour from all the statx callers?
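
To illustrate the shape I have in mind, here is a rough userspace model of
a single fill helper that branches on the multigrain property itself, so
callers never need to know about it. Everything here is an assumption for
illustration only: struct model_inode, struct model_stat, CTIME_QUERIED
and model_fill_cmtime() are made-up names, C11 atomics stand in for the
atomic_long_* calls, and the request_mask handling is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for I_CTIME_QUERIED: bit 31 of the nsec word. */
#define CTIME_QUERIED (1UL << 31)

/* Hypothetical model types, not the kernel's struct inode / struct kstat. */
struct model_inode {
	time_t ctime_sec;
	atomic_ulong ctime_nsec;	/* nanoseconds, plus the QUERIED flag bit */
	struct timespec mtime;
	bool multigrain;
};

struct model_stat {
	struct timespec mtime;
	struct timespec ctime;
};

/*
 * One entry point for every caller: multigrain inodes get the QUERIED
 * flag set as a side effect, other inodes are read without modification.
 */
static void model_fill_cmtime(struct model_inode *inode, struct model_stat *stat)
{
	unsigned long nsec;

	assert(inode != NULL && stat != NULL);

	stat->mtime = inode->mtime;
	stat->ctime.tv_sec = inode->ctime_sec;

	if (inode->multigrain)
		nsec = atomic_fetch_or(&inode->ctime_nsec, CTIME_QUERIED);
	else
		nsec = atomic_load(&inode->ctime_nsec);

	/* The flag bit is never reported to the caller. */
	stat->ctime.tv_nsec = (long)(nsec & ~CTIME_QUERIED);
}
```

With something along these lines, a filesystem's getattr would make the
same call whether or not FS_MULTIGRAIN_TS is set.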

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2023-05-05  0:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-03 14:20 [PATCH v3 0/6] fs: implement multigrain timestamps Jeff Layton
2023-05-03 14:20 ` [PATCH v3 1/6] fs: add infrastructure for multigrain inode i_m/ctime Jeff Layton
2023-05-05  0:10   ` Dave Chinner
2023-05-03 14:20 ` [PATCH v3 2/6] overlayfs: allow it handle multigrain timestamps Jeff Layton
2023-05-03 14:20 ` [PATCH v3 3/6] shmem: convert to " Jeff Layton
2023-05-03 14:20 ` [PATCH v3 4/6] xfs: " Jeff Layton
2023-05-03 14:20 ` [PATCH v3 5/6] ext4: " Jeff Layton
2023-05-03 14:20 ` [PATCH v3 6/6] btrfs: " Jeff Layton