linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/9] fs: implement multigrain timestamps
@ 2023-05-18 11:47 Jeff Layton
  2023-05-18 11:47 ` [PATCH v4 1/9] fs: pass the request_mask to generic_fillattr Jeff Layton
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

v4:
- add request_mask argument to generic_fillattr
- Drop current_ctime helper and just code functionality into current_time
- rework i_ctime accessor functions

A few weeks ago, during one of the discussions around i_version, Dave
Chinner wrote this:

"You've missed the part where I suggested lifting the "nfsd sampled
i_version" state into an inode state flag rather than hiding it in
the i_version field. At that point, we could optimise away the
secondary ctime updates just like you are proposing we do with the
i_version updates.  Further, we could also use that state it to
decide whether we need to use high resolution timestamps when
recording ctime updates - if the nfsd has not sampled the
ctime/i_version, we don't need high res timestamps to be recorded
for ctime...."

While I don't think we can practically optimize away ctime updates
like we do with i_version, I do like the idea of using this scheme to
indicate when we need to use a high-res timestamp.

The basic idea here is to use an unused bit in the timespec64.tv_nsec
field to act as a flag to indicate that the value was queried since
the last time we updated it. If that flag is set when we go to update
the timestamp, we'll clear it and grab a fine-grained ktime value for
the update.

The first couple of patches add the necessary infrastructure, and the
last several patches update various filesystems to use it. For now, I'm
focusing on widely-used, exportable filesystems, but this scheme is
probably suitable for most filesystems in the kernel.

Note that this does cause at least one test failure with LTP's statx06
test. I have submitted a patch to fix the issue (by changing how it
fetches the "after" timestamp in that test).

Jeff Layton (9):
  fs: pass the request_mask to generic_fillattr
  fs: add infrastructure for multigrain inode i_m/ctime
  overlayfs: allow it to handle multigrain timestamps
  nfsd: ensure we use ctime_peek to grab the inode->i_ctime
  ksmbd: use ctime_peek to grab the ctime out of the inode
  tmpfs: add support for multigrain timestamps
  xfs: switch to multigrain timestamps
  ext4: convert to multigrain timestamps
  btrfs: convert to multigrain timestamps

 fs/9p/vfs_inode.c             |  4 +--
 fs/9p/vfs_inode_dotl.c        |  4 +--
 fs/afs/inode.c                |  2 +-
 fs/btrfs/delayed-inode.c      |  2 +-
 fs/btrfs/inode.c              |  4 +--
 fs/btrfs/super.c              |  5 +--
 fs/btrfs/tree-log.c           |  2 +-
 fs/ceph/inode.c               |  2 +-
 fs/cifs/inode.c               |  2 +-
 fs/coda/inode.c               |  3 +-
 fs/ecryptfs/inode.c           |  5 +--
 fs/erofs/inode.c              |  2 +-
 fs/exfat/file.c               |  2 +-
 fs/ext2/inode.c               |  2 +-
 fs/ext4/inode.c               | 19 ++++++++--
 fs/ext4/super.c               |  2 +-
 fs/f2fs/file.c                |  2 +-
 fs/fat/file.c                 |  2 +-
 fs/fuse/dir.c                 |  2 +-
 fs/gfs2/inode.c               |  2 +-
 fs/hfsplus/inode.c            |  2 +-
 fs/inode.c                    | 48 +++++++++++++++++++++----
 fs/kernfs/inode.c             |  2 +-
 fs/ksmbd/smb2pdu.c            | 28 +++++++--------
 fs/ksmbd/vfs.c                |  3 +-
 fs/libfs.c                    |  4 +--
 fs/minix/inode.c              |  2 +-
 fs/nfs/inode.c                |  2 +-
 fs/nfs/namespace.c            |  3 +-
 fs/nfsd/nfsfh.c               | 11 ++++--
 fs/ntfs3/file.c               |  2 +-
 fs/ocfs2/file.c               |  2 +-
 fs/orangefs/inode.c           |  2 +-
 fs/overlayfs/file.c           |  7 ++--
 fs/overlayfs/util.c           |  2 +-
 fs/proc/base.c                |  4 +--
 fs/proc/fd.c                  |  2 +-
 fs/proc/generic.c             |  2 +-
 fs/proc/proc_net.c            |  2 +-
 fs/proc/proc_sysctl.c         |  2 +-
 fs/proc/root.c                |  3 +-
 fs/stat.c                     | 59 ++++++++++++++++++++++++------
 fs/sysv/itree.c               |  3 +-
 fs/ubifs/dir.c                |  2 +-
 fs/udf/symlink.c              |  2 +-
 fs/vboxsf/utils.c             |  2 +-
 fs/xfs/libxfs/xfs_inode_buf.c |  2 +-
 fs/xfs/xfs_inode_item.c       |  2 +-
 fs/xfs/xfs_iops.c             |  4 +--
 fs/xfs/xfs_super.c            |  2 +-
 include/linux/fs.h            | 68 +++++++++++++++++++++++++++++++++--
 mm/shmem.c                    |  4 +--
 52 files changed, 260 insertions(+), 95 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v4 1/9] fs: pass the request_mask to generic_fillattr
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
@ 2023-05-18 11:47 ` Jeff Layton
  2023-05-23  9:17   ` Jan Kara
  2023-05-18 11:47 ` [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime Jeff Layton
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

generic_fillattr just fills in the entire stat struct indiscriminately
today, copying data from the inode. There is at least one attribute
(STATX_CHANGE_COOKIE) that can have side effects when it is reported,
and we're looking at adding more with the addition of multigrain
timestamps.

Add a request_mask argument to generic_fillattr and have most callers
just pass in the value that is passed to getattr. Have other callers
(e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of
STATX_CHANGE_COOKIE into generic_fillattr.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/9p/vfs_inode.c      |  4 ++--
 fs/9p/vfs_inode_dotl.c |  4 ++--
 fs/afs/inode.c         |  2 +-
 fs/btrfs/inode.c       |  2 +-
 fs/ceph/inode.c        |  2 +-
 fs/cifs/inode.c        |  2 +-
 fs/coda/inode.c        |  3 ++-
 fs/ecryptfs/inode.c    |  5 +++--
 fs/erofs/inode.c       |  2 +-
 fs/exfat/file.c        |  2 +-
 fs/ext2/inode.c        |  2 +-
 fs/ext4/inode.c        |  2 +-
 fs/f2fs/file.c         |  2 +-
 fs/fat/file.c          |  2 +-
 fs/fuse/dir.c          |  2 +-
 fs/gfs2/inode.c        |  2 +-
 fs/hfsplus/inode.c     |  2 +-
 fs/kernfs/inode.c      |  2 +-
 fs/ksmbd/smb2pdu.c     | 22 +++++++++++-----------
 fs/ksmbd/vfs.c         |  3 ++-
 fs/libfs.c             |  4 ++--
 fs/minix/inode.c       |  2 +-
 fs/nfs/inode.c         |  2 +-
 fs/nfs/namespace.c     |  3 ++-
 fs/ntfs3/file.c        |  2 +-
 fs/ocfs2/file.c        |  2 +-
 fs/orangefs/inode.c    |  2 +-
 fs/proc/base.c         |  4 ++--
 fs/proc/fd.c           |  2 +-
 fs/proc/generic.c      |  2 +-
 fs/proc/proc_net.c     |  2 +-
 fs/proc/proc_sysctl.c  |  2 +-
 fs/proc/root.c         |  3 ++-
 fs/stat.c              | 18 ++++++++++--------
 fs/sysv/itree.c        |  3 ++-
 fs/ubifs/dir.c         |  2 +-
 fs/udf/symlink.c       |  2 +-
 fs/vboxsf/utils.c      |  2 +-
 include/linux/fs.h     |  2 +-
 mm/shmem.c             |  2 +-
 40 files changed, 70 insertions(+), 62 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 36b466e35887..11bd069d038b 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1016,7 +1016,7 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
 	v9ses = v9fs_dentry2v9ses(dentry);
 	if (v9ses->cache & (CACHE_META|CACHE_LOOSE)) {
-		generic_fillattr(&nop_mnt_idmap, inode, stat);
+		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 		return 0;
 	} else if (v9ses->cache & CACHE_WRITEBACK) {
 		if (S_ISREG(inode->i_mode)) {
@@ -1037,7 +1037,7 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 		return PTR_ERR(st);
 
 	v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
-	generic_fillattr(&nop_mnt_idmap, d_inode(dentry), stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
 
 	p9stat_free(st);
 	kfree(st);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 5361cd2d7996..04083fbf0c91 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -451,7 +451,7 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
 	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
 	v9ses = v9fs_dentry2v9ses(dentry);
 	if (v9ses->cache & (CACHE_META|CACHE_LOOSE)) {
-		generic_fillattr(&nop_mnt_idmap, inode, stat);
+		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 		return 0;
 	} else if (v9ses->cache) {
 		if (S_ISREG(inode->i_mode)) {
@@ -476,7 +476,7 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
 		return PTR_ERR(st);
 
 	v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
-	generic_fillattr(&nop_mnt_idmap, d_inode(dentry), stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
 	/* Change block size to what the server returned */
 	stat->blksize = st->st_blksize;
 
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 866bab860a88..54a4a1dd3bba 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -773,7 +773,7 @@ int afs_getattr(struct mnt_idmap *idmap, const struct path *path,
 
 	do {
 		read_seqbegin_or_lock(&vnode->cb_lock, &seq);
-		generic_fillattr(&nop_mnt_idmap, inode, stat);
+		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 		if (test_bit(AFS_VNODE_SILLY_DELETED, &vnode->flags) &&
 		    stat->nlink > 0)
 			stat->nlink -= 1;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 19c707bc8801..2335b5e1cecc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8625,7 +8625,7 @@ static int btrfs_getattr(struct mnt_idmap *idmap,
 				  STATX_ATTR_IMMUTABLE |
 				  STATX_ATTR_NODUMP);
 
-	generic_fillattr(idmap, inode, stat);
+	generic_fillattr(idmap, request_mask, inode, stat);
 	stat->dev = BTRFS_I(inode)->root->anon_dev;
 
 	spin_lock(&BTRFS_I(inode)->lock);
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 8e5f41d45283..2e479dca6845 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2465,7 +2465,7 @@ int ceph_getattr(struct mnt_idmap *idmap, const struct path *path,
 			return err;
 	}
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	stat->ino = ceph_present_inode(inode);
 
 	/*
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 1087ac6104a9..1ba09e39a1de 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -2540,7 +2540,7 @@ int cifs_getattr(struct mnt_idmap *idmap, const struct path *path,
 			return rc;
 	}
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	stat->blksize = cifs_sb->ctx->bsize;
 	stat->ino = CIFS_I(inode)->uniqueid;
 
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index d661e6cf17ac..f0be47be3587 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -256,7 +256,8 @@ int coda_getattr(struct mnt_idmap *idmap, const struct path *path,
 {
 	int err = coda_revalidate_inode(d_inode(path->dentry));
 	if (!err)
-		generic_fillattr(&nop_mnt_idmap, d_inode(path->dentry), stat);
+		generic_fillattr(&nop_mnt_idmap, request_mask,
+				 d_inode(path->dentry), stat);
 	return err;
 }
 
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 83274915ba6d..e4c1b62a2de2 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -982,7 +982,7 @@ static int ecryptfs_getattr_link(struct mnt_idmap *idmap,
 
 	mount_crypt_stat = &ecryptfs_superblock_to_private(
 						dentry->d_sb)->mount_crypt_stat;
-	generic_fillattr(&nop_mnt_idmap, d_inode(dentry), stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
 	if (mount_crypt_stat->flags & ECRYPTFS_GLOBAL_ENCRYPT_FILENAMES) {
 		char *target;
 		size_t targetsiz;
@@ -1011,7 +1011,8 @@ static int ecryptfs_getattr(struct mnt_idmap *idmap,
 	if (!rc) {
 		fsstack_copy_attr_all(d_inode(dentry),
 				      ecryptfs_inode_to_lower(d_inode(dentry)));
-		generic_fillattr(&nop_mnt_idmap, d_inode(dentry), stat);
+		generic_fillattr(&nop_mnt_idmap, request_mask,
+				 d_inode(dentry), stat);
 		stat->blocks = lower_stat.blocks;
 	}
 	return rc;
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index d70b12b81507..b8aeb1251997 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -372,7 +372,7 @@ int erofs_getattr(struct mnt_idmap *idmap, const struct path *path,
 	stat->attributes_mask |= (STATX_ATTR_COMPRESSED |
 				  STATX_ATTR_IMMUTABLE);
 
-	generic_fillattr(idmap, inode, stat);
+	generic_fillattr(idmap, request_mask, inode, stat);
 	return 0;
 }
 
diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index e99183a74611..29b6229fddad 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -232,7 +232,7 @@ int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
 	struct inode *inode = d_backing_inode(path->dentry);
 	struct exfat_inode_info *ei = EXFAT_I(inode);
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	exfat_truncate_atime(&stat->atime);
 	stat->result_mask |= STATX_BTIME;
 	stat->btime.tv_sec = ei->i_crtime.tv_sec;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 26f135e7ffce..eb4d32fcbf17 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1614,7 +1614,7 @@ int ext2_getattr(struct mnt_idmap *idmap, const struct path *path,
 			STATX_ATTR_IMMUTABLE |
 			STATX_ATTR_NODUMP);
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	return 0;
 }
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ce5f21b6c2b3..e0bbcf7a07b5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5539,7 +5539,7 @@ int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
 				  STATX_ATTR_NODUMP |
 				  STATX_ATTR_VERITY);
 
-	generic_fillattr(idmap, inode, stat);
+	generic_fillattr(idmap, request_mask, inode, stat);
 	return 0;
 }
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 5ac53d2627d2..bb16319a9491 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -888,7 +888,7 @@ int f2fs_getattr(struct mnt_idmap *idmap, const struct path *path,
 				  STATX_ATTR_NODUMP |
 				  STATX_ATTR_VERITY);
 
-	generic_fillattr(idmap, inode, stat);
+	generic_fillattr(idmap, request_mask, inode, stat);
 
 	/* we need to show initial sectors used for inline_data/dentries */
 	if ((S_ISREG(inode->i_mode) && f2fs_has_inline_data(inode)) ||
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 795a4fad5c40..650f77422057 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -401,7 +401,7 @@ int fat_getattr(struct mnt_idmap *idmap, const struct path *path,
 	struct inode *inode = d_inode(path->dentry);
 	struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb);
 
-	generic_fillattr(idmap, inode, stat);
+	generic_fillattr(idmap, request_mask, inode, stat);
 	stat->blksize = sbi->cluster_size;
 
 	if (sbi->options.nfs == FAT_NFS_NOSTALE_RO) {
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 35bc174f9ba2..a445d72bd98f 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1224,7 +1224,7 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
 		forget_all_cached_acls(inode);
 		err = fuse_do_getattr(inode, stat, file);
 	} else if (stat) {
-		generic_fillattr(&nop_mnt_idmap, inode, stat);
+		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 		stat->mode = fi->orig_i_mode;
 		stat->ino = fi->orig_ino;
 	}
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 17c994a0c0d0..5b3c62d20db5 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -2071,7 +2071,7 @@ static int gfs2_getattr(struct mnt_idmap *idmap,
 				  STATX_ATTR_IMMUTABLE |
 				  STATX_ATTR_NODUMP);
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 
 	if (gfs2_holder_initialized(&gh))
 		gfs2_glock_dq_uninit(&gh);
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index b21660475ac1..6aefdbd3e7b6 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -298,7 +298,7 @@ int hfsplus_getattr(struct mnt_idmap *idmap, const struct path *path,
 	stat->attributes_mask |= STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE |
 				 STATX_ATTR_NODUMP;
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	return 0;
 }
 
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index b22b74d1a115..b01cc38ed843 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -191,7 +191,7 @@ int kernfs_iop_getattr(struct mnt_idmap *idmap,
 
 	down_read(&root->kernfs_iattr_rwsem);
 	kernfs_refresh_inode(kn, inode);
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	up_read(&root->kernfs_iattr_rwsem);
 
 	return 0;
diff --git a/fs/ksmbd/smb2pdu.c b/fs/ksmbd/smb2pdu.c
index cb93fd231f4e..d39ddb344417 100644
--- a/fs/ksmbd/smb2pdu.c
+++ b/fs/ksmbd/smb2pdu.c
@@ -4342,8 +4342,8 @@ static int get_file_basic_info(struct smb2_query_info_rsp *rsp,
 	}
 
 	basic_info = (struct smb2_file_basic_info *)rsp->Buffer;
-	generic_fillattr(file_mnt_idmap(fp->filp), file_inode(fp->filp),
-			 &stat);
+	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS,
+			 file_inode(fp->filp), &stat);
 	basic_info->CreationTime = cpu_to_le64(fp->create_time);
 	time = ksmbd_UnixTimeToNT(stat.atime);
 	basic_info->LastAccessTime = cpu_to_le64(time);
@@ -4383,7 +4383,7 @@ static void get_file_standard_info(struct smb2_query_info_rsp *rsp,
 	struct kstat stat;
 
 	inode = file_inode(fp->filp);
-	generic_fillattr(file_mnt_idmap(fp->filp), inode, &stat);
+	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS, inode, &stat);
 
 	sinfo = (struct smb2_file_standard_info *)rsp->Buffer;
 	delete_pending = ksmbd_inode_pending_delete(fp);
@@ -4437,7 +4437,7 @@ static int get_file_all_info(struct ksmbd_work *work,
 		return PTR_ERR(filename);
 
 	inode = file_inode(fp->filp);
-	generic_fillattr(file_mnt_idmap(fp->filp), inode, &stat);
+	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS, inode, &stat);
 
 	ksmbd_debug(SMB, "filename = %s\n", filename);
 	delete_pending = ksmbd_inode_pending_delete(fp);
@@ -4514,8 +4514,8 @@ static void get_file_stream_info(struct ksmbd_work *work,
 	int buf_free_len;
 	struct smb2_query_info_req *req = ksmbd_req_buf_next(work);
 
-	generic_fillattr(file_mnt_idmap(fp->filp), file_inode(fp->filp),
-			 &stat);
+	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS,
+			 file_inode(fp->filp), &stat);
 	file_info = (struct smb2_file_stream_info *)rsp->Buffer;
 
 	buf_free_len =
@@ -4605,8 +4605,8 @@ static void get_file_internal_info(struct smb2_query_info_rsp *rsp,
 	struct smb2_file_internal_info *file_info;
 	struct kstat stat;
 
-	generic_fillattr(file_mnt_idmap(fp->filp), file_inode(fp->filp),
-			 &stat);
+	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS,
+			 file_inode(fp->filp), &stat);
 	file_info = (struct smb2_file_internal_info *)rsp->Buffer;
 	file_info->IndexNumber = cpu_to_le64(stat.ino);
 	rsp->OutputBufferLength =
@@ -4631,7 +4631,7 @@ static int get_file_network_open_info(struct smb2_query_info_rsp *rsp,
 	file_info = (struct smb2_file_ntwrk_info *)rsp->Buffer;
 
 	inode = file_inode(fp->filp);
-	generic_fillattr(file_mnt_idmap(fp->filp), inode, &stat);
+	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS, inode, &stat);
 
 	file_info->CreationTime = cpu_to_le64(fp->create_time);
 	time = ksmbd_UnixTimeToNT(stat.atime);
@@ -4692,8 +4692,8 @@ static void get_file_compression_info(struct smb2_query_info_rsp *rsp,
 	struct smb2_file_comp_info *file_info;
 	struct kstat stat;
 
-	generic_fillattr(file_mnt_idmap(fp->filp), file_inode(fp->filp),
-			 &stat);
+	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS,
+			 file_inode(fp->filp), &stat);
 
 	file_info = (struct smb2_file_comp_info *)rsp->Buffer;
 	file_info->CompressedFileSize = cpu_to_le64(stat.blocks << 9);
diff --git a/fs/ksmbd/vfs.c b/fs/ksmbd/vfs.c
index 778c152708e4..47a4fe1e5043 100644
--- a/fs/ksmbd/vfs.c
+++ b/fs/ksmbd/vfs.c
@@ -1597,7 +1597,8 @@ int ksmbd_vfs_fill_dentry_attrs(struct ksmbd_work *work,
 	u64 time;
 	int rc;
 
-	generic_fillattr(idmap, d_inode(dentry), ksmbd_kstat->kstat);
+	generic_fillattr(idmap, STATX_BASIC_STATS, d_inode(dentry),
+			 ksmbd_kstat->kstat);
 
 	time = ksmbd_UnixTimeToNT(ksmbd_kstat->kstat->ctime);
 	ksmbd_kstat->create_time = time;
diff --git a/fs/libfs.c b/fs/libfs.c
index 89cf614a3271..b8f7be638f17 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -33,7 +33,7 @@ int simple_getattr(struct mnt_idmap *idmap, const struct path *path,
 		   unsigned int query_flags)
 {
 	struct inode *inode = d_inode(path->dentry);
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	stat->blocks = inode->i_mapping->nrpages << (PAGE_SHIFT - 9);
 	return 0;
 }
@@ -1315,7 +1315,7 @@ static int empty_dir_getattr(struct mnt_idmap *idmap,
 			     u32 request_mask, unsigned int query_flags)
 {
 	struct inode *inode = d_inode(path->dentry);
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	return 0;
 }
 
diff --git a/fs/minix/inode.c b/fs/minix/inode.c
index e9fbb5303a22..a7f927e3760c 100644
--- a/fs/minix/inode.c
+++ b/fs/minix/inode.c
@@ -660,7 +660,7 @@ int minix_getattr(struct mnt_idmap *idmap, const struct path *path,
 	struct super_block *sb = path->dentry->d_sb;
 	struct inode *inode = d_inode(path->dentry);
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	if (INODE_VERSION(inode) == MINIX_V1)
 		stat->blocks = (BLOCK_SIZE / 512) * V1_minix_blocks(stat->size, sb);
 	else
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index a910b9a638c5..7f490c29ab6b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -912,7 +912,7 @@ int nfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 	/* Only return attributes that were revalidated. */
 	stat->result_mask = nfs_get_valid_attrmask(inode) | request_mask;
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
 	stat->change_cookie = inode_peek_iversion_raw(inode);
 	stat->attributes_mask |= STATX_ATTR_CHANGE_MONOTONIC;
diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index 19d51ebf842c..e7494cdd957e 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -215,7 +215,8 @@ nfs_namespace_getattr(struct mnt_idmap *idmap,
 	if (NFS_FH(d_inode(path->dentry))->size != 0)
 		return nfs_getattr(idmap, path, stat, request_mask,
 				   query_flags);
-	generic_fillattr(&nop_mnt_idmap, d_inode(path->dentry), stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry),
+			 stat);
 	return 0;
 }
 
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index 9a3d55c367d9..43ffd48eb048 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -85,7 +85,7 @@ int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 
 	stat->attributes_mask |= STATX_ATTR_COMPRESSED | STATX_ATTR_ENCRYPTED;
 
-	generic_fillattr(idmap, inode, stat);
+	generic_fillattr(idmap, request_mask, inode, stat);
 
 	stat->result_mask |= STATX_BTIME;
 	stat->btime = ni->i_crtime;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index efb09de4343d..5dc659a53311 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1317,7 +1317,7 @@ int ocfs2_getattr(struct mnt_idmap *idmap, const struct path *path,
 		goto bail;
 	}
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	/*
 	 * If there is inline data in the inode, the inode will normally not
 	 * have data blocks allocated (it may have an external xattr block).
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 9014bbcc8031..a52c30e80f45 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -871,7 +871,7 @@ int orangefs_getattr(struct mnt_idmap *idmap, const struct path *path,
 	ret = orangefs_inode_getattr(inode,
 	    request_mask & STATX_SIZE ? ORANGEFS_GETATTR_SIZE : 0);
 	if (ret == 0) {
-		generic_fillattr(&nop_mnt_idmap, inode, stat);
+		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 
 		/* override block size reported to stat */
 		if (!(request_mask & STATX_SIZE))
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 05452c3b9872..98eab84553f9 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1966,7 +1966,7 @@ int pid_getattr(struct mnt_idmap *idmap, const struct path *path,
 	struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb);
 	struct task_struct *task;
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 
 	stat->uid = GLOBAL_ROOT_UID;
 	stat->gid = GLOBAL_ROOT_GID;
@@ -3899,7 +3899,7 @@ static int proc_task_getattr(struct mnt_idmap *idmap,
 {
 	struct inode *inode = d_inode(path->dentry);
 	struct task_struct *p = get_proc_task(inode);
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 
 	if (p) {
 		stat->nlink += get_nr_threads(p);
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index b3140deebbbf..6276b3938842 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -352,7 +352,7 @@ static int proc_fd_getattr(struct mnt_idmap *idmap,
 	struct inode *inode = d_inode(path->dentry);
 	int rv = 0;
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 
 	/* If it's a directory, put the number of open fds there */
 	if (S_ISDIR(inode->i_mode)) {
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index 42ae38ff6e7e..775ce0bcf08c 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -146,7 +146,7 @@ static int proc_getattr(struct mnt_idmap *idmap,
 		}
 	}
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	return 0;
 }
 
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index a0c0419872e3..75f35f128e63 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -308,7 +308,7 @@ static int proc_tgid_net_getattr(struct mnt_idmap *idmap,
 
 	net = get_proc_task_net(inode);
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 
 	if (net != NULL) {
 		stat->nlink = net->proc_net->nlink;
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 8038833ff5b0..c00b15b0ba81 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -855,7 +855,7 @@ static int proc_sys_getattr(struct mnt_idmap *idmap,
 	if (IS_ERR(head))
 		return PTR_ERR(head);
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	if (table)
 		stat->mode = (stat->mode & S_IFMT) | table->mode;
 
diff --git a/fs/proc/root.c b/fs/proc/root.c
index a86e65a608da..9191248f2dac 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -314,7 +314,8 @@ static int proc_root_getattr(struct mnt_idmap *idmap,
 			     const struct path *path, struct kstat *stat,
 			     u32 request_mask, unsigned int query_flags)
 {
-	generic_fillattr(&nop_mnt_idmap, d_inode(path->dentry), stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry),
+			 stat);
 	stat->nlink = proc_root.nlink + nr_processes();
 	return 0;
 }
diff --git a/fs/stat.c b/fs/stat.c
index 7c238da22ef0..9b513a142a56 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -29,6 +29,7 @@
 /**
  * generic_fillattr - Fill in the basic attributes from the inode struct
  * @idmap:	idmap of the mount the inode was found from
+ * @req_mask	statx request_mask
  * @inode:	Inode to use as the source
  * @stat:	Where to fill in the attributes
  *
@@ -42,8 +43,8 @@
  * uid and gid filds. On non-idmapped mounts or if permission checking is to be
  * performed on the raw inode simply passs @nop_mnt_idmap.
  */
-void generic_fillattr(struct mnt_idmap *idmap, struct inode *inode,
-		      struct kstat *stat)
+void generic_fillattr(struct mnt_idmap *idmap, u32 request_mask,
+		      struct inode *inode, struct kstat *stat)
 {
 	vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
 	vfsgid_t vfsgid = i_gid_into_vfsgid(idmap, inode);
@@ -61,6 +62,12 @@ void generic_fillattr(struct mnt_idmap *idmap, struct inode *inode,
 	stat->ctime = inode->i_ctime;
 	stat->blksize = i_blocksize(inode);
 	stat->blocks = inode->i_blocks;
+
+	if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) {
+		stat->result_mask |= STATX_CHANGE_COOKIE;
+		stat->change_cookie = inode_query_iversion(inode);
+	}
+
 }
 EXPORT_SYMBOL(generic_fillattr);
 
@@ -123,17 +130,12 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 	stat->attributes_mask |= (STATX_ATTR_AUTOMOUNT |
 				  STATX_ATTR_DAX);
 
-	if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) {
-		stat->result_mask |= STATX_CHANGE_COOKIE;
-		stat->change_cookie = inode_query_iversion(inode);
-	}
-
 	idmap = mnt_idmap(path->mnt);
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(idmap, path, stat,
 					    request_mask, query_flags);
 
-	generic_fillattr(idmap, inode, stat);
+	generic_fillattr(idmap, request_mask, inode, stat);
 	return 0;
 }
 EXPORT_SYMBOL(vfs_getattr_nosec);
diff --git a/fs/sysv/itree.c b/fs/sysv/itree.c
index b22764fe669c..d41189cd9ec2 100644
--- a/fs/sysv/itree.c
+++ b/fs/sysv/itree.c
@@ -445,7 +445,8 @@ int sysv_getattr(struct mnt_idmap *idmap, const struct path *path,
 		 struct kstat *stat, u32 request_mask, unsigned int flags)
 {
 	struct super_block *s = path->dentry->d_sb;
-	generic_fillattr(&nop_mnt_idmap, d_inode(path->dentry), stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry),
+			 stat);
 	stat->blocks = (s->s_blocksize / 512) * sysv_nblocks(s, stat->size);
 	stat->blksize = s->s_blocksize;
 	return 0;
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index ef0499edc248..ca044f31c49d 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -1665,7 +1665,7 @@ int ubifs_getattr(struct mnt_idmap *idmap, const struct path *path,
 				STATX_ATTR_ENCRYPTED |
 				STATX_ATTR_IMMUTABLE);
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	stat->blksize = UBIFS_BLOCK_SIZE;
 	stat->size = ui->ui_size;
 
diff --git a/fs/udf/symlink.c b/fs/udf/symlink.c
index a34c8c4e6d21..69f6982aec9a 100644
--- a/fs/udf/symlink.c
+++ b/fs/udf/symlink.c
@@ -153,7 +153,7 @@ static int udf_symlink_getattr(struct mnt_idmap *idmap,
 	struct inode *inode = d_backing_inode(dentry);
 	struct page *page;
 
-	generic_fillattr(&nop_mnt_idmap, inode, stat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
 	page = read_mapping_page(inode->i_mapping, 0, NULL);
 	if (IS_ERR(page))
 		return PTR_ERR(page);
diff --git a/fs/vboxsf/utils.c b/fs/vboxsf/utils.c
index dd0ae1188e87..482f778709d6 100644
--- a/fs/vboxsf/utils.c
+++ b/fs/vboxsf/utils.c
@@ -252,7 +252,7 @@ int vboxsf_getattr(struct mnt_idmap *idmap, const struct path *path,
 	if (err)
 		return err;
 
-	generic_fillattr(&nop_mnt_idmap, d_inode(dentry), kstat);
+	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), kstat);
 	return 0;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a981680856..d5896f90093a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2857,7 +2857,7 @@ extern void page_put_link(void *);
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
 extern void kfree_link(void *);
-void generic_fillattr(struct mnt_idmap *, struct inode *, struct kstat *);
+void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
 void generic_fill_statx_attr(struct inode *inode, struct kstat *stat);
 extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
 extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned int);
diff --git a/mm/shmem.c b/mm/shmem.c
index e40a08c5c6d7..8208d4f85dff 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1073,7 +1073,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
 	stat->attributes_mask |= (STATX_ATTR_APPEND |
 			STATX_ATTR_IMMUTABLE |
 			STATX_ATTR_NODUMP);
-	generic_fillattr(idmap, inode, stat);
+	generic_fillattr(idmap, request_mask, inode, stat);
 
 	if (shmem_is_huge(inode, 0, false, NULL, 0))
 		stat->blksize = HPAGE_PMD_SIZE;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
  2023-05-18 11:47 ` [PATCH v4 1/9] fs: pass the request_mask to generic_fillattr Jeff Layton
@ 2023-05-18 11:47 ` Jeff Layton
  2023-05-23 10:02   ` Jan Kara
  2023-05-18 11:47 ` [PATCH v4 3/9] overlayfs: allow it to handle multigrain timestamps Jeff Layton
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

The VFS always uses coarse-grained timestamp updates for filling out the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.

Unfortunately, this has always been an issue when we're exporting via
NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
lot of exported filesystems don't properly support a change attribute
and are subject to the same problems with timestamp granularity. Other
applications have similar issues (e.g backup applications).

Switching to always using fine-grained timestamps would improve the
situation, but that becomes rather expensive, as the underlying
filesystem will have to log a lot more metadata updates.

What we need is a way to only use fine-grained timestamps when they are
being actively queried.

The kernel always stores normalized ctime values, so only the first 30
bits of the tv_nsec field are ever used. Whenever the mtime changes, the
ctime must also change.

Use the 31st bit of the ctime tv_nsec field to indicate that something
has queried the inode for the i_mtime or i_ctime. When this flag is set,
on the next timestamp update, the kernel can fetch a fine-grained
timestamp instead of the usual coarse-grained one.

This patch adds the infrastructure this scheme. Filesytems can opt
into it by setting the FS_MULTIGRAIN_TS flag in the fstype.

Later patches will convert individual filesystems over to use it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/inode.c         | 48 ++++++++++++++++++++++++++++-----
 fs/stat.c          | 41 ++++++++++++++++++++++++++--
 include/linux/fs.h | 66 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 145 insertions(+), 10 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 577799b7855f..24769e08fbaa 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2029,6 +2029,7 @@ EXPORT_SYMBOL(file_remove_privs);
 static int inode_needs_update_time(struct inode *inode, struct timespec64 *now)
 {
 	int sync_it = 0;
+	struct timespec64 ctime;
 
 	/* First try to exhaust all avenues to not sync */
 	if (IS_NOCMTIME(inode))
@@ -2037,7 +2038,8 @@ static int inode_needs_update_time(struct inode *inode, struct timespec64 *now)
 	if (!timespec64_equal(&inode->i_mtime, now))
 		sync_it = S_MTIME;
 
-	if (!timespec64_equal(&inode->i_ctime, now))
+	ctime = ctime_peek(inode);
+	if (!timespec64_equal(&ctime, now))
 		sync_it |= S_CTIME;
 
 	if (IS_I_VERSION(inode) && inode_iversion_need_inc(inode))
@@ -2431,6 +2433,40 @@ struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode)
 }
 EXPORT_SYMBOL(timestamp_truncate);
 
+/**
+ * current_mg_time - Return FS time (possibly fine-grained)
+ * @inode: inode.
+ *
+ * Return the current time truncated to the time granularity supported by
+ * the fs, as suitable for a ctime/mtime change. If the ctime is flagged
+ * as having been QUERIED, get a fine-grained timestamp.
+ */
+static struct timespec64 current_mg_time(struct inode *inode)
+{
+	struct timespec64 now;
+	atomic_long_t *pnsec = (atomic_long_t *)&inode->i_ctime.tv_nsec;
+	long nsec = atomic_long_fetch_andnot(I_CTIME_QUERIED, pnsec);
+
+	if (nsec & I_CTIME_QUERIED) {
+		ktime_get_real_ts64(&now);
+	} else {
+		struct timespec64 ctime;
+
+		ktime_get_coarse_real_ts64(&now);
+
+		/*
+		 * If we've recently fetched a fine-grained timestamp
+		 * then the coarse-grained one may still be earlier than the
+		 * existing one. Just keep the existing ctime if so.
+		 */
+		ctime = ctime_peek(inode);
+		if (timespec64_compare(&ctime, &now) > 0)
+			now = ctime;
+	}
+
+	return now;
+}
+
 /**
  * current_time - Return FS time
  * @inode: inode.
@@ -2445,12 +2481,10 @@ struct timespec64 current_time(struct inode *inode)
 {
 	struct timespec64 now;
 
-	ktime_get_coarse_real_ts64(&now);
-
-	if (unlikely(!inode->i_sb)) {
-		WARN(1, "current_time() called with uninitialized super_block in the inode");
-		return now;
-	}
+	if (is_multigrain_ts(inode))
+		now = current_mg_time(inode);
+	else
+		ktime_get_coarse_real_ts64(&now);
 
 	return timestamp_truncate(now, inode);
 }
diff --git a/fs/stat.c b/fs/stat.c
index 9b513a142a56..74d8283cc5c6 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -26,6 +26,38 @@
 #include "internal.h"
 #include "mount.h"
 
+/**
+ * fill_multigrain_cmtime - Fill in the mtime and ctime and flag ctime as QUERIED
+ * @request_mask: STATX_* values requested
+ * @inode: inode from which to grab the c/mtime
+ * @stat: where to store the resulting values
+ *
+ * Given @inode, grab the ctime and mtime out if it and store the result
+ * in @stat. When fetching the value, flag it as queried so the next write
+ * will use a fine-grained timestamp.
+ */
+void fill_multigrain_cmtime(u32 request_mask, struct inode *inode,
+			    struct kstat *stat)
+{
+	atomic_long_t *pnsec = (atomic_long_t *)&inode->i_ctime.tv_nsec;
+
+	/* If neither time was requested, then don't report them */
+	if (!(request_mask & (STATX_CTIME|STATX_MTIME))) {
+		stat->result_mask &= ~(STATX_CTIME|STATX_MTIME);
+		return;
+	}
+
+	stat->mtime = inode->i_mtime;
+	stat->ctime.tv_sec = inode->i_ctime.tv_sec;
+	/*
+	 * Atomically set the QUERIED flag and fetch the new value with
+	 * the flag masked off.
+	 */
+	stat->ctime.tv_nsec = atomic_long_fetch_or(I_CTIME_QUERIED, pnsec) &
+					~I_CTIME_QUERIED;
+}
+EXPORT_SYMBOL(fill_multigrain_cmtime);
+
 /**
  * generic_fillattr - Fill in the basic attributes from the inode struct
  * @idmap:	idmap of the mount the inode was found from
@@ -58,11 +90,16 @@ void generic_fillattr(struct mnt_idmap *idmap, u32 request_mask,
 	stat->rdev = inode->i_rdev;
 	stat->size = i_size_read(inode);
 	stat->atime = inode->i_atime;
-	stat->mtime = inode->i_mtime;
-	stat->ctime = inode->i_ctime;
 	stat->blksize = i_blocksize(inode);
 	stat->blocks = inode->i_blocks;
 
+	if (is_multigrain_ts(inode)) {
+		fill_multigrain_cmtime(request_mask, inode, stat);
+	} else {
+		stat->mtime = inode->i_mtime;
+		stat->ctime = inode->i_ctime;
+	}
+
 	if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) {
 		stat->result_mask |= STATX_CHANGE_COOKIE;
 		stat->change_cookie = inode_query_iversion(inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d5896f90093a..1f670cf1edbd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1474,7 +1474,7 @@ static inline bool fsuidgid_has_mapping(struct super_block *sb,
 	       kgid_has_mapping(fs_userns, kgid);
 }
 
-extern struct timespec64 current_time(struct inode *inode);
+struct timespec64 current_time(struct inode *inode);
 
 /*
  * Snapshotting support.
@@ -2212,6 +2212,7 @@ struct file_system_type {
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
 #define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
+#define FS_MULTIGRAIN_TS	64	/* Filesystem uses multigrain timestamps */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
 	const struct fs_parameter_spec *parameters;
@@ -2235,6 +2236,67 @@ struct file_system_type {
 
 #define MODULE_ALIAS_FS(NAME) MODULE_ALIAS("fs-" NAME)
 
+/*
+ * Multigrain timestamps
+ *
+ * Conditionally use fine-grained ctime and mtime timestamps when there
+ * are users actively observing them via getattr. The primary use-case
+ * for this is NFS clients that use the ctime to distinguish between
+ * different states of the file, and that are often fooled by multiple
+ * operations that occur in the same coarse-grained timer tick.
+ */
+static inline bool is_multigrain_ts(const struct inode *inode)
+{
+	return inode->i_sb->s_type->fs_flags & FS_MULTIGRAIN_TS;
+}
+
+/*
+ * The kernel always keeps normalized struct timespec64 values in the ctime,
+ * which means that only the first 30 bits of the value are used. Use the
+ * 31st bit of the ctime's tv_nsec field as a flag to indicate that the value
+ * has been queried since it was last updated.
+ */
+#define I_CTIME_QUERIED		(1L<<30)
+
+/**
+ * ctime_nsec_peek - peek at (but don't query) the ctime tv_nsec field
+ * @inode: inode to fetch the ctime from
+ *
+ * Grab the current ctime tv_nsec field from the inode, mask off the
+ * I_CTIME_QUERIED flag and return it. This is mostly intended for use by
+ * internal consumers of the ctime that aren't concerned with ensuring a
+ * fine-grained update on the next change (e.g. when preparing to store
+ * the value in the backing store for later retrieval).
+ *
+ * This is safe to call regardless of whether the underlying filesystem
+ * is using multigrain timestamps.
+ */
+static inline long ctime_nsec_peek(const struct inode *inode)
+{
+	return inode->i_ctime.tv_nsec &~ I_CTIME_QUERIED;
+}
+
+/**
+ * ctime_peek - peek at (but don't query) the ctime
+ * @inode: inode to fetch the ctime from
+ *
+ * Grab the current ctime from the inode, sans I_CTIME_QUERIED flag. For
+ * use by internal consumers that don't require a fine-grained update on
+ * the next change.
+ *
+ * This is safe to call regardless of whether the underlying filesystem
+ * is using multigrain timestamps.
+ */
+static inline struct timespec64 ctime_peek(const struct inode *inode)
+{
+	struct timespec64 ctime;
+
+	ctime.tv_sec = inode->i_ctime.tv_sec;
+	ctime.tv_nsec = ctime_nsec_peek(inode);
+
+	return ctime;
+}
+
 extern struct dentry *mount_bdev(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data,
 	int (*fill_super)(struct super_block *, void *, int));
@@ -2857,6 +2919,8 @@ extern void page_put_link(void *);
 extern int page_symlink(struct inode *inode, const char *symname, int len);
 extern const struct inode_operations page_symlink_inode_operations;
 extern void kfree_link(void *);
+void fill_multigrain_cmtime(u32 request_mask, struct inode *inode,
+			    struct kstat *stat);
 void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
 void generic_fill_statx_attr(struct inode *inode, struct kstat *stat);
 extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 3/9] overlayfs: allow it to handle multigrain timestamps
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
  2023-05-18 11:47 ` [PATCH v4 1/9] fs: pass the request_mask to generic_fillattr Jeff Layton
  2023-05-18 11:47 ` [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime Jeff Layton
@ 2023-05-18 11:47 ` Jeff Layton
  2023-05-18 11:47 ` [PATCH v4 4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime Jeff Layton
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

Ensure that we strip off the I_CTIME_QUERIED bit when copying up.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/overlayfs/file.c | 7 +++++--
 fs/overlayfs/util.c | 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 7c04f033aadd..cad715df8c4e 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -222,6 +222,7 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
 static void ovl_file_accessed(struct file *file)
 {
 	struct inode *inode, *upperinode;
+	struct timespec64 ctime, uctime;
 
 	if (file->f_flags & O_NOATIME)
 		return;
@@ -232,10 +233,12 @@ static void ovl_file_accessed(struct file *file)
 	if (!upperinode)
 		return;
 
+	ctime = ctime_peek(inode);
+	uctime = ctime_peek(upperinode);
 	if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
-	     !timespec64_equal(&inode->i_ctime, &upperinode->i_ctime))) {
+	     !timespec64_equal(&ctime, &uctime))) {
 		inode->i_mtime = upperinode->i_mtime;
-		inode->i_ctime = upperinode->i_ctime;
+		inode->i_ctime = uctime;
 	}
 
 	touch_atime(&file->f_path);
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 923d66d131c1..f4f9d7e189ef 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -1117,6 +1117,6 @@ void ovl_copyattr(struct inode *inode)
 	inode->i_mode = realinode->i_mode;
 	inode->i_atime = realinode->i_atime;
 	inode->i_mtime = realinode->i_mtime;
-	inode->i_ctime = realinode->i_ctime;
+	inode->i_ctime = ctime_peek(realinode);
 	i_size_write(inode, i_size_read(realinode));
 }
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
                   ` (2 preceding siblings ...)
  2023-05-18 11:47 ` [PATCH v4 3/9] overlayfs: allow it to handle multigrain timestamps Jeff Layton
@ 2023-05-18 11:47 ` Jeff Layton
  2023-05-18 13:43   ` Chuck Lever III
  2023-05-18 11:47 ` [PATCH v4 5/9] ksmbd: use ctime_peek to grab the ctime out of the inode Jeff Layton
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

If getattr fails, then nfsd can end up scraping the time values directly
out of the inode for pre and post-op attrs. This may or may not be the
right thing to do, but for now make it at least use ctime_peek in this
situation to ensure that the QUERIED flag is masked.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfsfh.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index ccd8485fee04..f053cf20dd8a 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -624,9 +624,14 @@ void fh_fill_pre_attrs(struct svc_fh *fhp)
 	inode = d_inode(fhp->fh_dentry);
 	err = fh_getattr(fhp, &stat);
 	if (err) {
-		/* Grab the times from inode anyway */
+		/*
+		 * Grab the times from inode anyway.
+		 *
+		 * FIXME: is this the right thing to do? Or should we just
+		 * 	  not send pre and post-op attrs in this case?
+		 */
 		stat.mtime = inode->i_mtime;
-		stat.ctime = inode->i_ctime;
+		stat.ctime = ctime_peek(inode);
 		stat.size  = inode->i_size;
 		if (v4 && IS_I_VERSION(inode)) {
 			stat.change_cookie = inode_query_iversion(inode);
@@ -662,7 +667,7 @@ void fh_fill_post_attrs(struct svc_fh *fhp)
 	err = fh_getattr(fhp, &fhp->fh_post_attr);
 	if (err) {
 		fhp->fh_post_saved = false;
-		fhp->fh_post_attr.ctime = inode->i_ctime;
+		fhp->fh_post_attr.ctime = ctime_peek(inode);
 		if (v4 && IS_I_VERSION(inode)) {
 			fhp->fh_post_attr.change_cookie = inode_query_iversion(inode);
 			fhp->fh_post_attr.result_mask |= STATX_CHANGE_COOKIE;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 5/9] ksmbd: use ctime_peek to grab the ctime out of the inode
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
                   ` (3 preceding siblings ...)
  2023-05-18 11:47 ` [PATCH v4 4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime Jeff Layton
@ 2023-05-18 11:47 ` Jeff Layton
  2023-05-18 11:47 ` [PATCH v4 6/9] tmpfs: add support for multigrain timestamps Jeff Layton
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

This ensures that the flag is masked off in the result.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ksmbd/smb2pdu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ksmbd/smb2pdu.c b/fs/ksmbd/smb2pdu.c
index d39ddb344417..c33128570448 100644
--- a/fs/ksmbd/smb2pdu.c
+++ b/fs/ksmbd/smb2pdu.c
@@ -4745,7 +4745,7 @@ static int find_file_posix_info(struct smb2_query_info_rsp *rsp,
 	file_info->LastAccessTime = cpu_to_le64(time);
 	time = ksmbd_UnixTimeToNT(inode->i_mtime);
 	file_info->LastWriteTime = cpu_to_le64(time);
-	time = ksmbd_UnixTimeToNT(inode->i_ctime);
+	time = ksmbd_UnixTimeToNT(ctime_peek(inode));
 	file_info->ChangeTime = cpu_to_le64(time);
 	file_info->DosAttributes = fp->f_ci->m_fattr;
 	file_info->Inode = cpu_to_le64(inode->i_ino);
@@ -5386,7 +5386,7 @@ int smb2_close(struct ksmbd_work *work)
 		rsp->LastAccessTime = cpu_to_le64(time);
 		time = ksmbd_UnixTimeToNT(inode->i_mtime);
 		rsp->LastWriteTime = cpu_to_le64(time);
-		time = ksmbd_UnixTimeToNT(inode->i_ctime);
+		time = ksmbd_UnixTimeToNT(ctime_peek(inode));
 		rsp->ChangeTime = cpu_to_le64(time);
 		ksmbd_fd_put(work, fp);
 	} else {
@@ -5605,7 +5605,7 @@ static int set_file_basic_info(struct ksmbd_file *fp,
 	if (file_info->ChangeTime)
 		attrs.ia_ctime = ksmbd_NTtimeToUnix(file_info->ChangeTime);
 	else
-		attrs.ia_ctime = inode->i_ctime;
+		attrs.ia_ctime = ctime_peek(inode);
 
 	if (file_info->LastWriteTime) {
 		attrs.ia_mtime = ksmbd_NTtimeToUnix(file_info->LastWriteTime);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 6/9] tmpfs: add support for multigrain timestamps
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
                   ` (4 preceding siblings ...)
  2023-05-18 11:47 ` [PATCH v4 5/9] ksmbd: use ctime_peek to grab the ctime out of the inode Jeff Layton
@ 2023-05-18 11:47 ` Jeff Layton
  2023-05-18 11:47 ` [PATCH v4 7/9] xfs: switch to " Jeff Layton
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 mm/shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 8208d4f85dff..94ea3086eacc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4085,7 +4085,7 @@ static struct file_system_type shmem_fs_type = {
 #endif
 	.kill_sb	= kill_litter_super,
 #ifdef CONFIG_SHMEM
-	.fs_flags	= FS_USERNS_MOUNT | FS_ALLOW_IDMAP,
+	.fs_flags	= FS_USERNS_MOUNT | FS_ALLOW_IDMAP | FS_MULTIGRAIN_TS,
 #else
 	.fs_flags	= FS_USERNS_MOUNT,
 #endif
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 7/9] xfs: switch to multigrain timestamps
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
                   ` (5 preceding siblings ...)
  2023-05-18 11:47 ` [PATCH v4 6/9] tmpfs: add support for multigrain timestamps Jeff Layton
@ 2023-05-18 11:47 ` Jeff Layton
  2023-05-18 11:47 ` [PATCH v4 8/9] ext4: convert " Jeff Layton
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/xfs/libxfs/xfs_inode_buf.c | 2 +-
 fs/xfs/xfs_inode_item.c       | 2 +-
 fs/xfs/xfs_iops.c             | 4 ++--
 fs/xfs/xfs_super.c            | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 758aacd8166b..c29e961fac34 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -316,7 +316,7 @@ xfs_inode_to_disk(
 
 	to->di_atime = xfs_inode_to_disk_ts(ip, inode->i_atime);
 	to->di_mtime = xfs_inode_to_disk_ts(ip, inode->i_mtime);
-	to->di_ctime = xfs_inode_to_disk_ts(ip, inode->i_ctime);
+	to->di_ctime = xfs_inode_to_disk_ts(ip, ctime_peek(inode));
 	to->di_nlink = cpu_to_be32(inode->i_nlink);
 	to->di_gen = cpu_to_be32(inode->i_generation);
 	to->di_mode = cpu_to_be16(inode->i_mode);
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index ca2941ab6cbc..018f187387f0 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -381,7 +381,7 @@ xfs_inode_to_log_dinode(
 	memset(to->di_pad3, 0, sizeof(to->di_pad3));
 	to->di_atime = xfs_inode_to_log_dinode_ts(ip, inode->i_atime);
 	to->di_mtime = xfs_inode_to_log_dinode_ts(ip, inode->i_mtime);
-	to->di_ctime = xfs_inode_to_log_dinode_ts(ip, inode->i_ctime);
+	to->di_ctime = xfs_inode_to_log_dinode_ts(ip, ctime_peek(inode));
 	to->di_nlink = inode->i_nlink;
 	to->di_gen = inode->i_generation;
 	to->di_mode = inode->i_mode;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 24718adb3c16..f101b543a33f 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -573,10 +573,10 @@ xfs_vn_getattr(
 	stat->gid = vfsgid_into_kgid(vfsgid);
 	stat->ino = ip->i_ino;
 	stat->atime = inode->i_atime;
-	stat->mtime = inode->i_mtime;
-	stat->ctime = inode->i_ctime;
 	stat->blocks = XFS_FSB_TO_BB(mp, ip->i_nblocks + ip->i_delayed_blks);
 
+	fill_multigrain_cmtime(request_mask, inode, stat);
+
 	if (xfs_has_v3inodes(mp)) {
 		if (request_mask & STATX_BTIME) {
 			stat->result_mask |= STATX_BTIME;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 7e706255f165..71c04cec974b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1992,7 +1992,7 @@ static struct file_system_type xfs_fs_type = {
 	.init_fs_context	= xfs_init_fs_context,
 	.parameters		= xfs_fs_parameters,
 	.kill_sb		= kill_block_super,
-	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MULTIGRAIN_TS,
 };
 MODULE_ALIAS_FS("xfs");
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 8/9] ext4: convert to multigrain timestamps
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
                   ` (6 preceding siblings ...)
  2023-05-18 11:47 ` [PATCH v4 7/9] xfs: switch to " Jeff Layton
@ 2023-05-18 11:47 ` Jeff Layton
  2023-05-20 20:29   ` Theodore Ts'o
  2023-05-18 11:47 ` [PATCH v4 9/9] btrfs: " Jeff Layton
  2023-05-22  9:54 ` [PATCH v4 0/9] fs: implement " Christian Brauner
  9 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ext4/inode.c | 17 +++++++++++++++--
 fs/ext4/super.c |  2 +-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e0bbcf7a07b5..37840aeb7ff9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4235,6 +4235,19 @@ static int ext4_inode_blocks_set(struct ext4_inode *raw_inode,
 	return 0;
 }
 
+static void ext4_inode_set_ctime(struct inode *inode, struct ext4_inode *raw_inode)
+{
+	struct timespec64 ctime = ctime_peek(inode);
+
+	if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), i_ctime_extra)) {
+		raw_inode->i_ctime = cpu_to_le32(ctime.tv_sec);
+		raw_inode->i_ctime_extra = ext4_encode_extra_time(&ctime);
+	} else {
+		raw_inode->i_ctime = cpu_to_le32(clamp_t(int32_t,
+					ctime.tv_sec, S32_MIN, S32_MAX));
+	}
+}
+
 static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode)
 {
 	struct ext4_inode_info *ei = EXT4_I(inode);
@@ -4275,7 +4288,7 @@ static int ext4_fill_raw_inode(struct inode *inode, struct ext4_inode *raw_inode
 	}
 	raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
 
-	EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
+	ext4_inode_set_ctime(inode, raw_inode);
 	EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
 	EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
 	EXT4_EINODE_SET_XTIME(i_crtime, ei, raw_inode);
@@ -4983,7 +4996,7 @@ static void __ext4_update_other_inode_time(struct super_block *sb,
 		spin_unlock(&inode->i_lock);
 
 		spin_lock(&ei->i_raw_lock);
-		EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
+		ext4_inode_set_ctime(inode, raw_inode);
 		EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
 		EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
 		ext4_inode_csum_set(inode, raw_inode, ei);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9680fe753e59..4de4977dcb21 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -7258,7 +7258,7 @@ static struct file_system_type ext4_fs_type = {
 	.init_fs_context	= ext4_init_fs_context,
 	.parameters		= ext4_param_specs,
 	.kill_sb		= kill_block_super,
-	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
+	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MULTIGRAIN_TS,
 };
 MODULE_ALIAS_FS("ext4");
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 9/9] btrfs: convert to multigrain timestamps
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
                   ` (7 preceding siblings ...)
  2023-05-18 11:47 ` [PATCH v4 8/9] ext4: convert " Jeff Layton
@ 2023-05-18 11:47 ` Jeff Layton
  2023-05-22  9:56   ` David Sterba
  2023-05-22  9:54 ` [PATCH v4 0/9] fs: implement " Christian Brauner
  9 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 11:47 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever
  Cc: Jan Kara, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/btrfs/delayed-inode.c | 2 +-
 fs/btrfs/inode.c         | 2 +-
 fs/btrfs/super.c         | 5 +++--
 fs/btrfs/tree-log.c      | 2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 6b457b010cbc..8307fd69da43 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1810,7 +1810,7 @@ static void fill_stack_inode_item(struct btrfs_trans_handle *trans,
 	btrfs_set_stack_timespec_sec(&inode_item->ctime,
 				     inode->i_ctime.tv_sec);
 	btrfs_set_stack_timespec_nsec(&inode_item->ctime,
-				      inode->i_ctime.tv_nsec);
+				      ctime_nsec_peek(inode));
 
 	btrfs_set_stack_timespec_sec(&inode_item->otime,
 				     BTRFS_I(inode)->i_otime.tv_sec);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2335b5e1cecc..b27d4dda6024 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3970,7 +3970,7 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
 	btrfs_set_token_timespec_sec(&token, &item->ctime,
 				     inode->i_ctime.tv_sec);
 	btrfs_set_token_timespec_nsec(&token, &item->ctime,
-				      inode->i_ctime.tv_nsec);
+				      ctime_nsec_peek(inode));
 
 	btrfs_set_token_timespec_sec(&token, &item->otime,
 				     BTRFS_I(inode)->i_otime.tv_sec);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index ec18e2210602..fc6abf8b1f42 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2144,7 +2144,7 @@ static struct file_system_type btrfs_fs_type = {
 	.name		= "btrfs",
 	.mount		= btrfs_mount,
 	.kill_sb	= btrfs_kill_super,
-	.fs_flags	= FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA,
+	.fs_flags	= FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_MULTIGRAIN_TS,
 };
 
 static struct file_system_type btrfs_root_fs_type = {
@@ -2152,7 +2152,8 @@ static struct file_system_type btrfs_root_fs_type = {
 	.name		= "btrfs",
 	.mount		= btrfs_mount_root,
 	.kill_sb	= btrfs_kill_super,
-	.fs_flags	= FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_ALLOW_IDMAP,
+	.fs_flags	= FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA |
+			  FS_ALLOW_IDMAP | FS_MULTIGRAIN_TS,
 };
 
 MODULE_ALIAS_FS("btrfs");
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9b212e8c70cc..9a4d1b2ab204 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4150,7 +4150,7 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
 	btrfs_set_token_timespec_sec(&token, &item->ctime,
 				     inode->i_ctime.tv_sec);
 	btrfs_set_token_timespec_nsec(&token, &item->ctime,
-				      inode->i_ctime.tv_nsec);
+				      ctime_nsec_peek(inode));
 
 	/*
 	 * We do not need to set the nbytes field, in fact during a fast fsync
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime
  2023-05-18 11:47 ` [PATCH v4 4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime Jeff Layton
@ 2023-05-18 13:43   ` Chuck Lever III
  2023-05-18 15:31     ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever III @ 2023-05-18 13:43 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Jan Kara, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, open list, Linux-XFS, linux-btrfs, linux-ext4,
	linux-mm, Linux NFS Mailing List, linux-cifs



> On May 18, 2023, at 7:47 AM, Jeff Layton <jlayton@kernel.org> wrote:
> 
> If getattr fails, then nfsd can end up scraping the time values directly
> out of the inode for pre and post-op attrs. This may or may not be the
> right thing to do, but for now make it at least use ctime_peek in this
> situation to ensure that the QUERIED flag is masked.

That code comes from:

commit 39ca1bf624b6b82cc895b0217889eaaf572a7913
Author:     Amir Goldstein <amir73il@gmail.com>
AuthorDate: Wed Jan 3 17:14:35 2018 +0200
Commit:     J. Bruce Fields <bfields@redhat.com>
CommitDate: Thu Feb 8 13:40:17 2018 -0500

    nfsd: store stat times in fill_pre_wcc() instead of inode times

    The time values in stat and inode may differ for overlayfs and stat time
    values are the correct ones to use. This is also consistent with the fact
    that fill_post_wcc() also stores stat time values.

    This means introducing a stat call that could fail, where previously we
    were just copying values out of the inode.  To be conservative about
    changing behavior, we fall back to copying values out of the inode in
    the error case.  It might be better just to clear fh_pre_saved (though
    note the BUG_ON in set_change_info).

    Signed-off-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

I was thinking it might have been added to handle odd corner
cases around re-exporting NFS mounts, but that does not seem
to be the case.

The fh_getattr() can fail for legitimate reasons -- like the
file is in the middle of being deleted or renamed over -- I
would think. This code should really deal with that by not
adding pre-op attrs, since they are optional.


> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/nfsd/nfsfh.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index ccd8485fee04..f053cf20dd8a 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -624,9 +624,14 @@ void fh_fill_pre_attrs(struct svc_fh *fhp)
> inode = d_inode(fhp->fh_dentry);
> err = fh_getattr(fhp, &stat);
> if (err) {
> - /* Grab the times from inode anyway */
> + /*
> + * Grab the times from inode anyway.
> + *
> + * FIXME: is this the right thing to do? Or should we just
> + *  not send pre and post-op attrs in this case?
> + */
> stat.mtime = inode->i_mtime;
> - stat.ctime = inode->i_ctime;
> + stat.ctime = ctime_peek(inode);
> stat.size  = inode->i_size;
> if (v4 && IS_I_VERSION(inode)) {
> stat.change_cookie = inode_query_iversion(inode);
> @@ -662,7 +667,7 @@ void fh_fill_post_attrs(struct svc_fh *fhp)
> err = fh_getattr(fhp, &fhp->fh_post_attr);
> if (err) {
> fhp->fh_post_saved = false;
> - fhp->fh_post_attr.ctime = inode->i_ctime;
> + fhp->fh_post_attr.ctime = ctime_peek(inode);
> if (v4 && IS_I_VERSION(inode)) {
> fhp->fh_post_attr.change_cookie = inode_query_iversion(inode);
> fhp->fh_post_attr.result_mask |= STATX_CHANGE_COOKIE;
> -- 
> 2.40.1
> 

--
Chuck Lever



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime
  2023-05-18 13:43   ` Chuck Lever III
@ 2023-05-18 15:31     ` Jeff Layton
  2023-05-19 10:36       ` Christian Brauner
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-05-18 15:31 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Jan Kara, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, open list, Linux-XFS, linux-btrfs, linux-ext4,
	linux-mm, Linux NFS Mailing List, linux-cifs

On Thu, 2023-05-18 at 13:43 +0000, Chuck Lever III wrote:
> 
> > On May 18, 2023, at 7:47 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > If getattr fails, then nfsd can end up scraping the time values directly
> > out of the inode for pre and post-op attrs. This may or may not be the
> > right thing to do, but for now make it at least use ctime_peek in this
> > situation to ensure that the QUERIED flag is masked.
> 
> That code comes from:
> 
> commit 39ca1bf624b6b82cc895b0217889eaaf572a7913
> Author:     Amir Goldstein <amir73il@gmail.com>
> AuthorDate: Wed Jan 3 17:14:35 2018 +0200
> Commit:     J. Bruce Fields <bfields@redhat.com>
> CommitDate: Thu Feb 8 13:40:17 2018 -0500
> 
>     nfsd: store stat times in fill_pre_wcc() instead of inode times
> 
>     The time values in stat and inode may differ for overlayfs and stat time
>     values are the correct ones to use. This is also consistent with the fact
>     that fill_post_wcc() also stores stat time values.
> 
>     This means introducing a stat call that could fail, where previously we
>     were just copying values out of the inode.  To be conservative about
>     changing behavior, we fall back to copying values out of the inode in
>     the error case.  It might be better just to clear fh_pre_saved (though
>     note the BUG_ON in set_change_info).
> 
>     Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> 
> I was thinking it might have been added to handle odd corner
> cases around re-exporting NFS mounts, but that does not seem
> to be the case.
> 
> The fh_getattr() can fail for legitimate reasons -- like the
> file is in the middle of being deleted or renamed over -- I
> would think. This code should really deal with that by not
> adding pre-op attrs, since they are optional.
> 

That sounds fine to me. I'll plan to drop this patch from the series and
I'll send a separate patch to just remove those branches altogether
(which should DTRT).

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime
  2023-05-18 15:31     ` Jeff Layton
@ 2023-05-19 10:36       ` Christian Brauner
  2023-05-19 11:22         ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Brauner @ 2023-05-19 10:36 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Chuck Lever III, Al Viro, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Jan Kara, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, open list, Linux-XFS, linux-btrfs, linux-ext4,
	linux-mm, Linux NFS Mailing List, linux-cifs

On Thu, May 18, 2023 at 11:31:45AM -0400, Jeff Layton wrote:
> On Thu, 2023-05-18 at 13:43 +0000, Chuck Lever III wrote:
> > 
> > > On May 18, 2023, at 7:47 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > If getattr fails, then nfsd can end up scraping the time values directly
> > > out of the inode for pre and post-op attrs. This may or may not be the
> > > right thing to do, but for now make it at least use ctime_peek in this
> > > situation to ensure that the QUERIED flag is masked.
> > 
> > That code comes from:
> > 
> > commit 39ca1bf624b6b82cc895b0217889eaaf572a7913
> > Author:     Amir Goldstein <amir73il@gmail.com>
> > AuthorDate: Wed Jan 3 17:14:35 2018 +0200
> > Commit:     J. Bruce Fields <bfields@redhat.com>
> > CommitDate: Thu Feb 8 13:40:17 2018 -0500
> > 
> >     nfsd: store stat times in fill_pre_wcc() instead of inode times
> > 
> >     The time values in stat and inode may differ for overlayfs and stat time
> >     values are the correct ones to use. This is also consistent with the fact
> >     that fill_post_wcc() also stores stat time values.
> > 
> >     This means introducing a stat call that could fail, where previously we
> >     were just copying values out of the inode.  To be conservative about
> >     changing behavior, we fall back to copying values out of the inode in
> >     the error case.  It might be better just to clear fh_pre_saved (though
> >     note the BUG_ON in set_change_info).
> > 
> >     Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > 
> > I was thinking it might have been added to handle odd corner
> > cases around re-exporting NFS mounts, but that does not seem
> > to be the case.
> > 
> > The fh_getattr() can fail for legitimate reasons -- like the
> > file is in the middle of being deleted or renamed over -- I
> > would think. This code should really deal with that by not
> > adding pre-op attrs, since they are optional.
> > 
> 
> That sounds fine to me. I'll plan to drop this patch from the series and
> I'll send a separate patch to just remove those branches altogether
> (which should DTRT).

I'll wait with reviewing this until you send the next version then.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime
  2023-05-19 10:36       ` Christian Brauner
@ 2023-05-19 11:22         ` Jeff Layton
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Layton @ 2023-05-19 11:22 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Chuck Lever III, Al Viro, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Jan Kara, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, open list, Linux-XFS, linux-btrfs, linux-ext4,
	linux-mm, Linux NFS Mailing List, linux-cifs

On Fri, 2023-05-19 at 12:36 +0200, Christian Brauner wrote:
> On Thu, May 18, 2023 at 11:31:45AM -0400, Jeff Layton wrote:
> > On Thu, 2023-05-18 at 13:43 +0000, Chuck Lever III wrote:
> > > 
> > > > On May 18, 2023, at 7:47 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > 
> > > > If getattr fails, then nfsd can end up scraping the time values directly
> > > > out of the inode for pre and post-op attrs. This may or may not be the
> > > > right thing to do, but for now make it at least use ctime_peek in this
> > > > situation to ensure that the QUERIED flag is masked.
> > > 
> > > That code comes from:
> > > 
> > > commit 39ca1bf624b6b82cc895b0217889eaaf572a7913
> > > Author:     Amir Goldstein <amir73il@gmail.com>
> > > AuthorDate: Wed Jan 3 17:14:35 2018 +0200
> > > Commit:     J. Bruce Fields <bfields@redhat.com>
> > > CommitDate: Thu Feb 8 13:40:17 2018 -0500
> > > 
> > >     nfsd: store stat times in fill_pre_wcc() instead of inode times
> > > 
> > >     The time values in stat and inode may differ for overlayfs and stat time
> > >     values are the correct ones to use. This is also consistent with the fact
> > >     that fill_post_wcc() also stores stat time values.
> > > 
> > >     This means introducing a stat call that could fail, where previously we
> > >     were just copying values out of the inode.  To be conservative about
> > >     changing behavior, we fall back to copying values out of the inode in
> > >     the error case.  It might be better just to clear fh_pre_saved (though
> > >     note the BUG_ON in set_change_info).
> > > 
> > >     Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > >     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> > > 
> > > I was thinking it might have been added to handle odd corner
> > > cases around re-exporting NFS mounts, but that does not seem
> > > to be the case.
> > > 
> > > The fh_getattr() can fail for legitimate reasons -- like the
> > > file is in the middle of being deleted or renamed over -- I
> > > would think. This code should really deal with that by not
> > > adding pre-op attrs, since they are optional.
> > > 
> > 
> > That sounds fine to me. I'll plan to drop this patch from the series and
> > I'll send a separate patch to just remove those branches altogether
> > (which should DTRT).
> 
> I'll wait with reviewing this until you send the next version then.

I don't have any other big changes queued up. So far, this would just be
the exact same set, without this patch.

FWIW, I'm mostly interested in your review of patches #1 and 2. Is
altering prototype for generic_fillattr, and changing the logic in
current_time the right approach here?

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 8/9] ext4: convert to multigrain timestamps
  2023-05-18 11:47 ` [PATCH v4 8/9] ext4: convert " Jeff Layton
@ 2023-05-20 20:29   ` Theodore Ts'o
  0 siblings, 0 replies; 31+ messages in thread
From: Theodore Ts'o @ 2023-05-20 20:29 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Jan Kara,
	Amir Goldstein, David Howells, Neil Brown, Matthew Wilcox,
	Andreas Dilger, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs, linux-ext4,
	linux-mm, linux-nfs, linux-cifs

Acked-by: Theodore Ts'o <tytso@mit.edu>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 0/9] fs: implement multigrain timestamps
  2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
                   ` (8 preceding siblings ...)
  2023-05-18 11:47 ` [PATCH v4 9/9] btrfs: " Jeff Layton
@ 2023-05-22  9:54 ` Christian Brauner
  9 siblings, 0 replies; 31+ messages in thread
From: Christian Brauner @ 2023-05-22  9:54 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Christian Brauner, Jan Kara, Amir Goldstein, David Howells,
	Neil Brown, Matthew Wilcox, Andreas Dilger, Theodore T'so,
	Chris Mason, Josef Bacik, David Sterba, Namjae Jeon,
	Steve French, Sergey Senozhatsky, Tom Talpey, linux-fsdevel,
	linux-kernel, linux-xfs, linux-btrfs, linux-ext4, linux-mm,
	linux-nfs, linux-cifs, Alexander Viro, Darrick J. Wong,
	Hugh Dickins, Andrew Morton, Dave Chinner, Chuck Lever

On Thu, 18 May 2023 07:47:33 -0400, Jeff Layton wrote:
> v4:
> - add request_mask argument to generic_fillattr
> - Drop current_ctime helper and just code functionality into current_time
> - rework i_ctime accessor functions
> 
> A few weeks ago, during one of the discussions around i_version, Dave
> Chinner wrote this:
> 
> [...]

Let's get this into -next so we can see whether this leads to any
performance or other regressions. It's moved to a vfs.unstable.* branch
for now. If nothing bad happens it'll be upgraded to a vfs.* branch.
Filesystems that prefer to carry their fs specific patch themselves can
request a stable tag for the generic changes.

---

Applied to the vfs.unstable.ctime branch of the vfs/vfs.git tree.
Patches in the vfs.unstable.ctime branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.unstable.ctime

[1/9] fs: pass the request_mask to generic_fillattr
      https://git.kernel.org/vfs/vfs/c/ec1239bab5fd
[2/9] fs: add infrastructure for multigrain inode i_m/ctime
      https://git.kernel.org/vfs/vfs/c/97e9fbb03240
[3/9] overlayfs: allow it to handle multigrain timestamps
      https://git.kernel.org/vfs/vfs/c/ada0fe43f748
[4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime
      https://git.kernel.org/vfs/vfs/c/39493918b700
[5/9] ksmbd: use ctime_peek to grab the ctime out of the inode
      https://git.kernel.org/vfs/vfs/c/35527cdc7840
[6/9] tmpfs: add support for multigrain timestamps
      https://git.kernel.org/vfs/vfs/c/ce1dc211dbde
[7/9] xfs: switch to multigrain timestamps
      https://git.kernel.org/vfs/vfs/c/78bbdfd2fb74
[8/9] ext4: convert to multigrain timestamps
      https://git.kernel.org/vfs/vfs/c/d2100ca52e14
[9/9] btrfs: convert to multigrain timestamps
      https://git.kernel.org/vfs/vfs/c/c725b40cfbd5

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 9/9] btrfs: convert to multigrain timestamps
  2023-05-18 11:47 ` [PATCH v4 9/9] btrfs: " Jeff Layton
@ 2023-05-22  9:56   ` David Sterba
  2023-05-22 10:08     ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: David Sterba @ 2023-05-22  9:56 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Jan Kara,
	Amir Goldstein, David Howells, Neil Brown, Matthew Wilcox,
	Andreas Dilger, Theodore T'so, Chris Mason, Josef Bacik,
	David Sterba, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs,
	linux-ext4, linux-mm, linux-nfs, linux-cifs

On Thu, May 18, 2023 at 07:47:42AM -0400, Jeff Layton wrote:
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Acked-by: David Sterba <dsterba@suse.com>

Please add a brief description to the changelog too, there's a similar
text in the patches adding the infrastructure. Something like "Allow to
optimize lot of metadata updates by encoding the status in the cmtime.
The fine grained time is needed for NFS."

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 9/9] btrfs: convert to multigrain timestamps
  2023-05-22  9:56   ` David Sterba
@ 2023-05-22 10:08     ` Jeff Layton
  2023-05-22 10:53       ` Christian Brauner
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-05-22 10:08 UTC (permalink / raw)
  To: dsterba
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Jan Kara,
	Amir Goldstein, David Howells, Neil Brown, Matthew Wilcox,
	Andreas Dilger, Theodore T'so, Chris Mason, Josef Bacik,
	David Sterba, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs,
	linux-ext4, linux-mm, linux-nfs, linux-cifs

On Mon, 2023-05-22 at 11:56 +0200, David Sterba wrote:
> On Thu, May 18, 2023 at 07:47:42AM -0400, Jeff Layton wrote:
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> Acked-by: David Sterba <dsterba@suse.com>
> 
> Please add a brief description to the changelog too, there's a similar
> text in the patches adding the infrastructure. Something like "Allow to
> optimize lot of metadata updates by encoding the status in the cmtime.
> The fine grained time is needed for NFS."

Sure thing.

Christian, do you want to just alter the changelog with David's
suggestion, or would you rather I resend the series?

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 9/9] btrfs: convert to multigrain timestamps
  2023-05-22 10:08     ` Jeff Layton
@ 2023-05-22 10:53       ` Christian Brauner
  0 siblings, 0 replies; 31+ messages in thread
From: Christian Brauner @ 2023-05-22 10:53 UTC (permalink / raw)
  To: Jeff Layton
  Cc: dsterba, Alexander Viro, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Jan Kara,
	Amir Goldstein, David Howells, Neil Brown, Matthew Wilcox,
	Andreas Dilger, Theodore T'so, Chris Mason, Josef Bacik,
	David Sterba, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs,
	linux-ext4, linux-mm, linux-nfs, linux-cifs

On Mon, May 22, 2023 at 06:08:56AM -0400, Jeff Layton wrote:
> On Mon, 2023-05-22 at 11:56 +0200, David Sterba wrote:
> > On Thu, May 18, 2023 at 07:47:42AM -0400, Jeff Layton wrote:
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > 
> > Acked-by: David Sterba <dsterba@suse.com>
> > 
> > Please add a brief description to the changelog too, there's a similar
> > text in the patches adding the infrastructure. Something like "Allow to
> > optimize lot of metadata updates by encoding the status in the cmtime.
> > The fine grained time is needed for NFS."
> 
> Sure thing.
> 
> Christian, do you want to just alter the changelog with David's
> suggestion, or would you rather I resend the series?

Nah, don't bother resending I'll just add it to the fs specific patches.
I'll end up updating the patch trailers anyway when individual
maintainers add new Acks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 1/9] fs: pass the request_mask to generic_fillattr
  2023-05-18 11:47 ` [PATCH v4 1/9] fs: pass the request_mask to generic_fillattr Jeff Layton
@ 2023-05-23  9:17   ` Jan Kara
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Kara @ 2023-05-23  9:17 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Jan Kara,
	Amir Goldstein, David Howells, Neil Brown, Matthew Wilcox,
	Andreas Dilger, Theodore T'so, Chris Mason, Josef Bacik,
	David Sterba, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs,
	linux-ext4, linux-mm, linux-nfs, linux-cifs

On Thu 18-05-23 07:47:34, Jeff Layton wrote:
> generic_fillattr just fills in the entire stat struct indiscriminately
> today, copying data from the inode. There is at least one attribute
> (STATX_CHANGE_COOKIE) that can have side effects when it is reported,
> and we're looking at adding more with the addition of multigrain
> timestamps.
> 
> Add a request_mask argument to generic_fillattr and have most callers
> just pass in the value that is passed to getattr. Have other callers
> (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of
> STATX_CHANGE_COOKIE into generic_fillattr.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/9p/vfs_inode.c      |  4 ++--
>  fs/9p/vfs_inode_dotl.c |  4 ++--
>  fs/afs/inode.c         |  2 +-
>  fs/btrfs/inode.c       |  2 +-
>  fs/ceph/inode.c        |  2 +-
>  fs/cifs/inode.c        |  2 +-
>  fs/coda/inode.c        |  3 ++-
>  fs/ecryptfs/inode.c    |  5 +++--
>  fs/erofs/inode.c       |  2 +-
>  fs/exfat/file.c        |  2 +-
>  fs/ext2/inode.c        |  2 +-
>  fs/ext4/inode.c        |  2 +-
>  fs/f2fs/file.c         |  2 +-
>  fs/fat/file.c          |  2 +-
>  fs/fuse/dir.c          |  2 +-
>  fs/gfs2/inode.c        |  2 +-
>  fs/hfsplus/inode.c     |  2 +-
>  fs/kernfs/inode.c      |  2 +-
>  fs/ksmbd/smb2pdu.c     | 22 +++++++++++-----------
>  fs/ksmbd/vfs.c         |  3 ++-
>  fs/libfs.c             |  4 ++--
>  fs/minix/inode.c       |  2 +-
>  fs/nfs/inode.c         |  2 +-
>  fs/nfs/namespace.c     |  3 ++-
>  fs/ntfs3/file.c        |  2 +-
>  fs/ocfs2/file.c        |  2 +-
>  fs/orangefs/inode.c    |  2 +-
>  fs/proc/base.c         |  4 ++--
>  fs/proc/fd.c           |  2 +-
>  fs/proc/generic.c      |  2 +-
>  fs/proc/proc_net.c     |  2 +-
>  fs/proc/proc_sysctl.c  |  2 +-
>  fs/proc/root.c         |  3 ++-
>  fs/stat.c              | 18 ++++++++++--------
>  fs/sysv/itree.c        |  3 ++-
>  fs/ubifs/dir.c         |  2 +-
>  fs/udf/symlink.c       |  2 +-
>  fs/vboxsf/utils.c      |  2 +-
>  include/linux/fs.h     |  2 +-
>  mm/shmem.c             |  2 +-
>  40 files changed, 70 insertions(+), 62 deletions(-)
> 
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index 36b466e35887..11bd069d038b 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -1016,7 +1016,7 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
>  	v9ses = v9fs_dentry2v9ses(dentry);
>  	if (v9ses->cache & (CACHE_META|CACHE_LOOSE)) {
> -		generic_fillattr(&nop_mnt_idmap, inode, stat);
> +		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  		return 0;
>  	} else if (v9ses->cache & CACHE_WRITEBACK) {
>  		if (S_ISREG(inode->i_mode)) {
> @@ -1037,7 +1037,7 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  		return PTR_ERR(st);
>  
>  	v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
> -	generic_fillattr(&nop_mnt_idmap, d_inode(dentry), stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
>  
>  	p9stat_free(st);
>  	kfree(st);
> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
> index 5361cd2d7996..04083fbf0c91 100644
> --- a/fs/9p/vfs_inode_dotl.c
> +++ b/fs/9p/vfs_inode_dotl.c
> @@ -451,7 +451,7 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
>  	p9_debug(P9_DEBUG_VFS, "dentry: %p\n", dentry);
>  	v9ses = v9fs_dentry2v9ses(dentry);
>  	if (v9ses->cache & (CACHE_META|CACHE_LOOSE)) {
> -		generic_fillattr(&nop_mnt_idmap, inode, stat);
> +		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  		return 0;
>  	} else if (v9ses->cache) {
>  		if (S_ISREG(inode->i_mode)) {
> @@ -476,7 +476,7 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
>  		return PTR_ERR(st);
>  
>  	v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
> -	generic_fillattr(&nop_mnt_idmap, d_inode(dentry), stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
>  	/* Change block size to what the server returned */
>  	stat->blksize = st->st_blksize;
>  
> diff --git a/fs/afs/inode.c b/fs/afs/inode.c
> index 866bab860a88..54a4a1dd3bba 100644
> --- a/fs/afs/inode.c
> +++ b/fs/afs/inode.c
> @@ -773,7 +773,7 @@ int afs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  
>  	do {
>  		read_seqbegin_or_lock(&vnode->cb_lock, &seq);
> -		generic_fillattr(&nop_mnt_idmap, inode, stat);
> +		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  		if (test_bit(AFS_VNODE_SILLY_DELETED, &vnode->flags) &&
>  		    stat->nlink > 0)
>  			stat->nlink -= 1;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 19c707bc8801..2335b5e1cecc 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8625,7 +8625,7 @@ static int btrfs_getattr(struct mnt_idmap *idmap,
>  				  STATX_ATTR_IMMUTABLE |
>  				  STATX_ATTR_NODUMP);
>  
> -	generic_fillattr(idmap, inode, stat);
> +	generic_fillattr(idmap, request_mask, inode, stat);
>  	stat->dev = BTRFS_I(inode)->root->anon_dev;
>  
>  	spin_lock(&BTRFS_I(inode)->lock);
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 8e5f41d45283..2e479dca6845 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -2465,7 +2465,7 @@ int ceph_getattr(struct mnt_idmap *idmap, const struct path *path,
>  			return err;
>  	}
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	stat->ino = ceph_present_inode(inode);
>  
>  	/*
> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> index 1087ac6104a9..1ba09e39a1de 100644
> --- a/fs/cifs/inode.c
> +++ b/fs/cifs/inode.c
> @@ -2540,7 +2540,7 @@ int cifs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  			return rc;
>  	}
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	stat->blksize = cifs_sb->ctx->bsize;
>  	stat->ino = CIFS_I(inode)->uniqueid;
>  
> diff --git a/fs/coda/inode.c b/fs/coda/inode.c
> index d661e6cf17ac..f0be47be3587 100644
> --- a/fs/coda/inode.c
> +++ b/fs/coda/inode.c
> @@ -256,7 +256,8 @@ int coda_getattr(struct mnt_idmap *idmap, const struct path *path,
>  {
>  	int err = coda_revalidate_inode(d_inode(path->dentry));
>  	if (!err)
> -		generic_fillattr(&nop_mnt_idmap, d_inode(path->dentry), stat);
> +		generic_fillattr(&nop_mnt_idmap, request_mask,
> +				 d_inode(path->dentry), stat);
>  	return err;
>  }
>  
> diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
> index 83274915ba6d..e4c1b62a2de2 100644
> --- a/fs/ecryptfs/inode.c
> +++ b/fs/ecryptfs/inode.c
> @@ -982,7 +982,7 @@ static int ecryptfs_getattr_link(struct mnt_idmap *idmap,
>  
>  	mount_crypt_stat = &ecryptfs_superblock_to_private(
>  						dentry->d_sb)->mount_crypt_stat;
> -	generic_fillattr(&nop_mnt_idmap, d_inode(dentry), stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
>  	if (mount_crypt_stat->flags & ECRYPTFS_GLOBAL_ENCRYPT_FILENAMES) {
>  		char *target;
>  		size_t targetsiz;
> @@ -1011,7 +1011,8 @@ static int ecryptfs_getattr(struct mnt_idmap *idmap,
>  	if (!rc) {
>  		fsstack_copy_attr_all(d_inode(dentry),
>  				      ecryptfs_inode_to_lower(d_inode(dentry)));
> -		generic_fillattr(&nop_mnt_idmap, d_inode(dentry), stat);
> +		generic_fillattr(&nop_mnt_idmap, request_mask,
> +				 d_inode(dentry), stat);
>  		stat->blocks = lower_stat.blocks;
>  	}
>  	return rc;
> diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
> index d70b12b81507..b8aeb1251997 100644
> --- a/fs/erofs/inode.c
> +++ b/fs/erofs/inode.c
> @@ -372,7 +372,7 @@ int erofs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	stat->attributes_mask |= (STATX_ATTR_COMPRESSED |
>  				  STATX_ATTR_IMMUTABLE);
>  
> -	generic_fillattr(idmap, inode, stat);
> +	generic_fillattr(idmap, request_mask, inode, stat);
>  	return 0;
>  }
>  
> diff --git a/fs/exfat/file.c b/fs/exfat/file.c
> index e99183a74611..29b6229fddad 100644
> --- a/fs/exfat/file.c
> +++ b/fs/exfat/file.c
> @@ -232,7 +232,7 @@ int exfat_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	struct inode *inode = d_backing_inode(path->dentry);
>  	struct exfat_inode_info *ei = EXFAT_I(inode);
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	exfat_truncate_atime(&stat->atime);
>  	stat->result_mask |= STATX_BTIME;
>  	stat->btime.tv_sec = ei->i_crtime.tv_sec;
> diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
> index 26f135e7ffce..eb4d32fcbf17 100644
> --- a/fs/ext2/inode.c
> +++ b/fs/ext2/inode.c
> @@ -1614,7 +1614,7 @@ int ext2_getattr(struct mnt_idmap *idmap, const struct path *path,
>  			STATX_ATTR_IMMUTABLE |
>  			STATX_ATTR_NODUMP);
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	return 0;
>  }
>  
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index ce5f21b6c2b3..e0bbcf7a07b5 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5539,7 +5539,7 @@ int ext4_getattr(struct mnt_idmap *idmap, const struct path *path,
>  				  STATX_ATTR_NODUMP |
>  				  STATX_ATTR_VERITY);
>  
> -	generic_fillattr(idmap, inode, stat);
> +	generic_fillattr(idmap, request_mask, inode, stat);
>  	return 0;
>  }
>  
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 5ac53d2627d2..bb16319a9491 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -888,7 +888,7 @@ int f2fs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  				  STATX_ATTR_NODUMP |
>  				  STATX_ATTR_VERITY);
>  
> -	generic_fillattr(idmap, inode, stat);
> +	generic_fillattr(idmap, request_mask, inode, stat);
>  
>  	/* we need to show initial sectors used for inline_data/dentries */
>  	if ((S_ISREG(inode->i_mode) && f2fs_has_inline_data(inode)) ||
> diff --git a/fs/fat/file.c b/fs/fat/file.c
> index 795a4fad5c40..650f77422057 100644
> --- a/fs/fat/file.c
> +++ b/fs/fat/file.c
> @@ -401,7 +401,7 @@ int fat_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	struct inode *inode = d_inode(path->dentry);
>  	struct msdos_sb_info *sbi = MSDOS_SB(inode->i_sb);
>  
> -	generic_fillattr(idmap, inode, stat);
> +	generic_fillattr(idmap, request_mask, inode, stat);
>  	stat->blksize = sbi->cluster_size;
>  
>  	if (sbi->options.nfs == FAT_NFS_NOSTALE_RO) {
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 35bc174f9ba2..a445d72bd98f 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1224,7 +1224,7 @@ static int fuse_update_get_attr(struct inode *inode, struct file *file,
>  		forget_all_cached_acls(inode);
>  		err = fuse_do_getattr(inode, stat, file);
>  	} else if (stat) {
> -		generic_fillattr(&nop_mnt_idmap, inode, stat);
> +		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  		stat->mode = fi->orig_i_mode;
>  		stat->ino = fi->orig_ino;
>  	}
> diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
> index 17c994a0c0d0..5b3c62d20db5 100644
> --- a/fs/gfs2/inode.c
> +++ b/fs/gfs2/inode.c
> @@ -2071,7 +2071,7 @@ static int gfs2_getattr(struct mnt_idmap *idmap,
>  				  STATX_ATTR_IMMUTABLE |
>  				  STATX_ATTR_NODUMP);
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  
>  	if (gfs2_holder_initialized(&gh))
>  		gfs2_glock_dq_uninit(&gh);
> diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
> index b21660475ac1..6aefdbd3e7b6 100644
> --- a/fs/hfsplus/inode.c
> +++ b/fs/hfsplus/inode.c
> @@ -298,7 +298,7 @@ int hfsplus_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	stat->attributes_mask |= STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE |
>  				 STATX_ATTR_NODUMP;
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	return 0;
>  }
>  
> diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
> index b22b74d1a115..b01cc38ed843 100644
> --- a/fs/kernfs/inode.c
> +++ b/fs/kernfs/inode.c
> @@ -191,7 +191,7 @@ int kernfs_iop_getattr(struct mnt_idmap *idmap,
>  
>  	down_read(&root->kernfs_iattr_rwsem);
>  	kernfs_refresh_inode(kn, inode);
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	up_read(&root->kernfs_iattr_rwsem);
>  
>  	return 0;
> diff --git a/fs/ksmbd/smb2pdu.c b/fs/ksmbd/smb2pdu.c
> index cb93fd231f4e..d39ddb344417 100644
> --- a/fs/ksmbd/smb2pdu.c
> +++ b/fs/ksmbd/smb2pdu.c
> @@ -4342,8 +4342,8 @@ static int get_file_basic_info(struct smb2_query_info_rsp *rsp,
>  	}
>  
>  	basic_info = (struct smb2_file_basic_info *)rsp->Buffer;
> -	generic_fillattr(file_mnt_idmap(fp->filp), file_inode(fp->filp),
> -			 &stat);
> +	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS,
> +			 file_inode(fp->filp), &stat);
>  	basic_info->CreationTime = cpu_to_le64(fp->create_time);
>  	time = ksmbd_UnixTimeToNT(stat.atime);
>  	basic_info->LastAccessTime = cpu_to_le64(time);
> @@ -4383,7 +4383,7 @@ static void get_file_standard_info(struct smb2_query_info_rsp *rsp,
>  	struct kstat stat;
>  
>  	inode = file_inode(fp->filp);
> -	generic_fillattr(file_mnt_idmap(fp->filp), inode, &stat);
> +	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS, inode, &stat);
>  
>  	sinfo = (struct smb2_file_standard_info *)rsp->Buffer;
>  	delete_pending = ksmbd_inode_pending_delete(fp);
> @@ -4437,7 +4437,7 @@ static int get_file_all_info(struct ksmbd_work *work,
>  		return PTR_ERR(filename);
>  
>  	inode = file_inode(fp->filp);
> -	generic_fillattr(file_mnt_idmap(fp->filp), inode, &stat);
> +	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS, inode, &stat);
>  
>  	ksmbd_debug(SMB, "filename = %s\n", filename);
>  	delete_pending = ksmbd_inode_pending_delete(fp);
> @@ -4514,8 +4514,8 @@ static void get_file_stream_info(struct ksmbd_work *work,
>  	int buf_free_len;
>  	struct smb2_query_info_req *req = ksmbd_req_buf_next(work);
>  
> -	generic_fillattr(file_mnt_idmap(fp->filp), file_inode(fp->filp),
> -			 &stat);
> +	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS,
> +			 file_inode(fp->filp), &stat);
>  	file_info = (struct smb2_file_stream_info *)rsp->Buffer;
>  
>  	buf_free_len =
> @@ -4605,8 +4605,8 @@ static void get_file_internal_info(struct smb2_query_info_rsp *rsp,
>  	struct smb2_file_internal_info *file_info;
>  	struct kstat stat;
>  
> -	generic_fillattr(file_mnt_idmap(fp->filp), file_inode(fp->filp),
> -			 &stat);
> +	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS,
> +			 file_inode(fp->filp), &stat);
>  	file_info = (struct smb2_file_internal_info *)rsp->Buffer;
>  	file_info->IndexNumber = cpu_to_le64(stat.ino);
>  	rsp->OutputBufferLength =
> @@ -4631,7 +4631,7 @@ static int get_file_network_open_info(struct smb2_query_info_rsp *rsp,
>  	file_info = (struct smb2_file_ntwrk_info *)rsp->Buffer;
>  
>  	inode = file_inode(fp->filp);
> -	generic_fillattr(file_mnt_idmap(fp->filp), inode, &stat);
> +	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS, inode, &stat);
>  
>  	file_info->CreationTime = cpu_to_le64(fp->create_time);
>  	time = ksmbd_UnixTimeToNT(stat.atime);
> @@ -4692,8 +4692,8 @@ static void get_file_compression_info(struct smb2_query_info_rsp *rsp,
>  	struct smb2_file_comp_info *file_info;
>  	struct kstat stat;
>  
> -	generic_fillattr(file_mnt_idmap(fp->filp), file_inode(fp->filp),
> -			 &stat);
> +	generic_fillattr(file_mnt_idmap(fp->filp), STATX_BASIC_STATS,
> +			 file_inode(fp->filp), &stat);
>  
>  	file_info = (struct smb2_file_comp_info *)rsp->Buffer;
>  	file_info->CompressedFileSize = cpu_to_le64(stat.blocks << 9);
> diff --git a/fs/ksmbd/vfs.c b/fs/ksmbd/vfs.c
> index 778c152708e4..47a4fe1e5043 100644
> --- a/fs/ksmbd/vfs.c
> +++ b/fs/ksmbd/vfs.c
> @@ -1597,7 +1597,8 @@ int ksmbd_vfs_fill_dentry_attrs(struct ksmbd_work *work,
>  	u64 time;
>  	int rc;
>  
> -	generic_fillattr(idmap, d_inode(dentry), ksmbd_kstat->kstat);
> +	generic_fillattr(idmap, STATX_BASIC_STATS, d_inode(dentry),
> +			 ksmbd_kstat->kstat);
>  
>  	time = ksmbd_UnixTimeToNT(ksmbd_kstat->kstat->ctime);
>  	ksmbd_kstat->create_time = time;
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 89cf614a3271..b8f7be638f17 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -33,7 +33,7 @@ int simple_getattr(struct mnt_idmap *idmap, const struct path *path,
>  		   unsigned int query_flags)
>  {
>  	struct inode *inode = d_inode(path->dentry);
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	stat->blocks = inode->i_mapping->nrpages << (PAGE_SHIFT - 9);
>  	return 0;
>  }
> @@ -1315,7 +1315,7 @@ static int empty_dir_getattr(struct mnt_idmap *idmap,
>  			     u32 request_mask, unsigned int query_flags)
>  {
>  	struct inode *inode = d_inode(path->dentry);
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	return 0;
>  }
>  
> diff --git a/fs/minix/inode.c b/fs/minix/inode.c
> index e9fbb5303a22..a7f927e3760c 100644
> --- a/fs/minix/inode.c
> +++ b/fs/minix/inode.c
> @@ -660,7 +660,7 @@ int minix_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	struct super_block *sb = path->dentry->d_sb;
>  	struct inode *inode = d_inode(path->dentry);
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	if (INODE_VERSION(inode) == MINIX_V1)
>  		stat->blocks = (BLOCK_SIZE / 512) * V1_minix_blocks(stat->size, sb);
>  	else
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index a910b9a638c5..7f490c29ab6b 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -912,7 +912,7 @@ int nfs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	/* Only return attributes that were revalidated. */
>  	stat->result_mask = nfs_get_valid_attrmask(inode) | request_mask;
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
>  	stat->change_cookie = inode_peek_iversion_raw(inode);
>  	stat->attributes_mask |= STATX_ATTR_CHANGE_MONOTONIC;
> diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
> index 19d51ebf842c..e7494cdd957e 100644
> --- a/fs/nfs/namespace.c
> +++ b/fs/nfs/namespace.c
> @@ -215,7 +215,8 @@ nfs_namespace_getattr(struct mnt_idmap *idmap,
>  	if (NFS_FH(d_inode(path->dentry))->size != 0)
>  		return nfs_getattr(idmap, path, stat, request_mask,
>  				   query_flags);
> -	generic_fillattr(&nop_mnt_idmap, d_inode(path->dentry), stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry),
> +			 stat);
>  	return 0;
>  }
>  
> diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
> index 9a3d55c367d9..43ffd48eb048 100644
> --- a/fs/ntfs3/file.c
> +++ b/fs/ntfs3/file.c
> @@ -85,7 +85,7 @@ int ntfs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  
>  	stat->attributes_mask |= STATX_ATTR_COMPRESSED | STATX_ATTR_ENCRYPTED;
>  
> -	generic_fillattr(idmap, inode, stat);
> +	generic_fillattr(idmap, request_mask, inode, stat);
>  
>  	stat->result_mask |= STATX_BTIME;
>  	stat->btime = ni->i_crtime;
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index efb09de4343d..5dc659a53311 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -1317,7 +1317,7 @@ int ocfs2_getattr(struct mnt_idmap *idmap, const struct path *path,
>  		goto bail;
>  	}
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	/*
>  	 * If there is inline data in the inode, the inode will normally not
>  	 * have data blocks allocated (it may have an external xattr block).
> diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
> index 9014bbcc8031..a52c30e80f45 100644
> --- a/fs/orangefs/inode.c
> +++ b/fs/orangefs/inode.c
> @@ -871,7 +871,7 @@ int orangefs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	ret = orangefs_inode_getattr(inode,
>  	    request_mask & STATX_SIZE ? ORANGEFS_GETATTR_SIZE : 0);
>  	if (ret == 0) {
> -		generic_fillattr(&nop_mnt_idmap, inode, stat);
> +		generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  
>  		/* override block size reported to stat */
>  		if (!(request_mask & STATX_SIZE))
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 05452c3b9872..98eab84553f9 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1966,7 +1966,7 @@ int pid_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	struct proc_fs_info *fs_info = proc_sb_info(inode->i_sb);
>  	struct task_struct *task;
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  
>  	stat->uid = GLOBAL_ROOT_UID;
>  	stat->gid = GLOBAL_ROOT_GID;
> @@ -3899,7 +3899,7 @@ static int proc_task_getattr(struct mnt_idmap *idmap,
>  {
>  	struct inode *inode = d_inode(path->dentry);
>  	struct task_struct *p = get_proc_task(inode);
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  
>  	if (p) {
>  		stat->nlink += get_nr_threads(p);
> diff --git a/fs/proc/fd.c b/fs/proc/fd.c
> index b3140deebbbf..6276b3938842 100644
> --- a/fs/proc/fd.c
> +++ b/fs/proc/fd.c
> @@ -352,7 +352,7 @@ static int proc_fd_getattr(struct mnt_idmap *idmap,
>  	struct inode *inode = d_inode(path->dentry);
>  	int rv = 0;
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  
>  	/* If it's a directory, put the number of open fds there */
>  	if (S_ISDIR(inode->i_mode)) {
> diff --git a/fs/proc/generic.c b/fs/proc/generic.c
> index 42ae38ff6e7e..775ce0bcf08c 100644
> --- a/fs/proc/generic.c
> +++ b/fs/proc/generic.c
> @@ -146,7 +146,7 @@ static int proc_getattr(struct mnt_idmap *idmap,
>  		}
>  	}
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	return 0;
>  }
>  
> diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
> index a0c0419872e3..75f35f128e63 100644
> --- a/fs/proc/proc_net.c
> +++ b/fs/proc/proc_net.c
> @@ -308,7 +308,7 @@ static int proc_tgid_net_getattr(struct mnt_idmap *idmap,
>  
>  	net = get_proc_task_net(inode);
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  
>  	if (net != NULL) {
>  		stat->nlink = net->proc_net->nlink;
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index 8038833ff5b0..c00b15b0ba81 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -855,7 +855,7 @@ static int proc_sys_getattr(struct mnt_idmap *idmap,
>  	if (IS_ERR(head))
>  		return PTR_ERR(head);
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	if (table)
>  		stat->mode = (stat->mode & S_IFMT) | table->mode;
>  
> diff --git a/fs/proc/root.c b/fs/proc/root.c
> index a86e65a608da..9191248f2dac 100644
> --- a/fs/proc/root.c
> +++ b/fs/proc/root.c
> @@ -314,7 +314,8 @@ static int proc_root_getattr(struct mnt_idmap *idmap,
>  			     const struct path *path, struct kstat *stat,
>  			     u32 request_mask, unsigned int query_flags)
>  {
> -	generic_fillattr(&nop_mnt_idmap, d_inode(path->dentry), stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry),
> +			 stat);
>  	stat->nlink = proc_root.nlink + nr_processes();
>  	return 0;
>  }
> diff --git a/fs/stat.c b/fs/stat.c
> index 7c238da22ef0..9b513a142a56 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -29,6 +29,7 @@
>  /**
>   * generic_fillattr - Fill in the basic attributes from the inode struct
>   * @idmap:	idmap of the mount the inode was found from
> + * @req_mask	statx request_mask
>   * @inode:	Inode to use as the source
>   * @stat:	Where to fill in the attributes
>   *
> @@ -42,8 +43,8 @@
>   * uid and gid filds. On non-idmapped mounts or if permission checking is to be
>   * performed on the raw inode simply passs @nop_mnt_idmap.
>   */
> -void generic_fillattr(struct mnt_idmap *idmap, struct inode *inode,
> -		      struct kstat *stat)
> +void generic_fillattr(struct mnt_idmap *idmap, u32 request_mask,
> +		      struct inode *inode, struct kstat *stat)
>  {
>  	vfsuid_t vfsuid = i_uid_into_vfsuid(idmap, inode);
>  	vfsgid_t vfsgid = i_gid_into_vfsgid(idmap, inode);
> @@ -61,6 +62,12 @@ void generic_fillattr(struct mnt_idmap *idmap, struct inode *inode,
>  	stat->ctime = inode->i_ctime;
>  	stat->blksize = i_blocksize(inode);
>  	stat->blocks = inode->i_blocks;
> +
> +	if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) {
> +		stat->result_mask |= STATX_CHANGE_COOKIE;
> +		stat->change_cookie = inode_query_iversion(inode);
> +	}
> +
>  }
>  EXPORT_SYMBOL(generic_fillattr);
>  
> @@ -123,17 +130,12 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
>  	stat->attributes_mask |= (STATX_ATTR_AUTOMOUNT |
>  				  STATX_ATTR_DAX);
>  
> -	if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) {
> -		stat->result_mask |= STATX_CHANGE_COOKIE;
> -		stat->change_cookie = inode_query_iversion(inode);
> -	}
> -
>  	idmap = mnt_idmap(path->mnt);
>  	if (inode->i_op->getattr)
>  		return inode->i_op->getattr(idmap, path, stat,
>  					    request_mask, query_flags);
>  
> -	generic_fillattr(idmap, inode, stat);
> +	generic_fillattr(idmap, request_mask, inode, stat);
>  	return 0;
>  }
>  EXPORT_SYMBOL(vfs_getattr_nosec);
> diff --git a/fs/sysv/itree.c b/fs/sysv/itree.c
> index b22764fe669c..d41189cd9ec2 100644
> --- a/fs/sysv/itree.c
> +++ b/fs/sysv/itree.c
> @@ -445,7 +445,8 @@ int sysv_getattr(struct mnt_idmap *idmap, const struct path *path,
>  		 struct kstat *stat, u32 request_mask, unsigned int flags)
>  {
>  	struct super_block *s = path->dentry->d_sb;
> -	generic_fillattr(&nop_mnt_idmap, d_inode(path->dentry), stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(path->dentry),
> +			 stat);
>  	stat->blocks = (s->s_blocksize / 512) * sysv_nblocks(s, stat->size);
>  	stat->blksize = s->s_blocksize;
>  	return 0;
> diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
> index ef0499edc248..ca044f31c49d 100644
> --- a/fs/ubifs/dir.c
> +++ b/fs/ubifs/dir.c
> @@ -1665,7 +1665,7 @@ int ubifs_getattr(struct mnt_idmap *idmap, const struct path *path,
>  				STATX_ATTR_ENCRYPTED |
>  				STATX_ATTR_IMMUTABLE);
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	stat->blksize = UBIFS_BLOCK_SIZE;
>  	stat->size = ui->ui_size;
>  
> diff --git a/fs/udf/symlink.c b/fs/udf/symlink.c
> index a34c8c4e6d21..69f6982aec9a 100644
> --- a/fs/udf/symlink.c
> +++ b/fs/udf/symlink.c
> @@ -153,7 +153,7 @@ static int udf_symlink_getattr(struct mnt_idmap *idmap,
>  	struct inode *inode = d_backing_inode(dentry);
>  	struct page *page;
>  
> -	generic_fillattr(&nop_mnt_idmap, inode, stat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
>  	page = read_mapping_page(inode->i_mapping, 0, NULL);
>  	if (IS_ERR(page))
>  		return PTR_ERR(page);
> diff --git a/fs/vboxsf/utils.c b/fs/vboxsf/utils.c
> index dd0ae1188e87..482f778709d6 100644
> --- a/fs/vboxsf/utils.c
> +++ b/fs/vboxsf/utils.c
> @@ -252,7 +252,7 @@ int vboxsf_getattr(struct mnt_idmap *idmap, const struct path *path,
>  	if (err)
>  		return err;
>  
> -	generic_fillattr(&nop_mnt_idmap, d_inode(dentry), kstat);
> +	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), kstat);
>  	return 0;
>  }
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 21a981680856..d5896f90093a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2857,7 +2857,7 @@ extern void page_put_link(void *);
>  extern int page_symlink(struct inode *inode, const char *symname, int len);
>  extern const struct inode_operations page_symlink_inode_operations;
>  extern void kfree_link(void *);
> -void generic_fillattr(struct mnt_idmap *, struct inode *, struct kstat *);
> +void generic_fillattr(struct mnt_idmap *, u32, struct inode *, struct kstat *);
>  void generic_fill_statx_attr(struct inode *inode, struct kstat *stat);
>  extern int vfs_getattr_nosec(const struct path *, struct kstat *, u32, unsigned int);
>  extern int vfs_getattr(const struct path *, struct kstat *, u32, unsigned int);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index e40a08c5c6d7..8208d4f85dff 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1073,7 +1073,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
>  	stat->attributes_mask |= (STATX_ATTR_APPEND |
>  			STATX_ATTR_IMMUTABLE |
>  			STATX_ATTR_NODUMP);
> -	generic_fillattr(idmap, inode, stat);
> +	generic_fillattr(idmap, request_mask, inode, stat);
>  
>  	if (shmem_is_huge(inode, 0, false, NULL, 0))
>  		stat->blksize = HPAGE_PMD_SIZE;
> -- 
> 2.40.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-18 11:47 ` [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime Jeff Layton
@ 2023-05-23 10:02   ` Jan Kara
  2023-05-23 10:17     ` Jan Kara
                       ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Jan Kara @ 2023-05-23 10:02 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Jan Kara,
	Amir Goldstein, David Howells, Neil Brown, Matthew Wilcox,
	Andreas Dilger, Theodore T'so, Chris Mason, Josef Bacik,
	David Sterba, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs,
	linux-ext4, linux-mm, linux-nfs, linux-cifs

On Thu 18-05-23 07:47:35, Jeff Layton wrote:
> The VFS always uses coarse-grained timestamp updates for filling out the
> ctime and mtime after a change. This has the benefit of allowing
> filesystems to optimize away a lot metadata updates, down to around 1
> per jiffy, even when a file is under heavy writes.
> 
> Unfortunately, this has always been an issue when we're exporting via
> NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> lot of exported filesystems don't properly support a change attribute
> and are subject to the same problems with timestamp granularity. Other
> applications have similar issues (e.g backup applications).
> 
> Switching to always using fine-grained timestamps would improve the
> situation, but that becomes rather expensive, as the underlying
> filesystem will have to log a lot more metadata updates.
> 
> What we need is a way to only use fine-grained timestamps when they are
> being actively queried.
> 
> The kernel always stores normalized ctime values, so only the first 30
> bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> ctime must also change.
> 
> Use the 31st bit of the ctime tv_nsec field to indicate that something
> has queried the inode for the i_mtime or i_ctime. When this flag is set,
> on the next timestamp update, the kernel can fetch a fine-grained
> timestamp instead of the usual coarse-grained one.
> 
> This patch adds the infrastructure this scheme. Filesytems can opt
> into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> 
> Later patches will convert individual filesystems over to use it.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

So there are two things I dislike about this series because I think they
are fragile:

1) If we have a filesystem supporting multigrain ts and someone
accidentally directly uses the value of inode->i_ctime, he can get bogus
value (with QUERIED flag). This mistake is very easy to do. So I think we
should rename i_ctime to something like __i_ctime and always use accessor
function for it.

2) As I already commented in a previous version of the series, the scheme
with just one flag for both ctime and mtime and flag getting cleared in
current_time() relies on the fact that filesystems always do an equivalent
of:

	inode->i_mtime = inode->i_ctime = current_time();

Otherwise we can do coarse grained update where we should have done a fine
grained one. Filesystems often update timestamps like this but not
universally. Grepping shows some instances where only inode->i_mtime is set
from current_time() e.g. in autofs or bfs. Again a mistake that is rather
easy to make and results in subtle issues. I think this would be also
nicely solved by renaming i_ctime to __i_ctime and using a function to set
ctime. Mtime could then be updated with inode->i_mtime = ctime_peek().

I understand this is quite some churn but a very mechanical one that could
be just done with Coccinelle and a few manual fixups. So IMHO it is worth
the more robust result.

Some more nits below.

> +/**
> + * current_mg_time - Return FS time (possibly fine-grained)
> + * @inode: inode.
> + *
> + * Return the current time truncated to the time granularity supported by
> + * the fs, as suitable for a ctime/mtime change. If the ctime is flagged
> + * as having been QUERIED, get a fine-grained timestamp.
> + */

The comment should also mention that QUERIED flag is cleared from the ctime.

> +static struct timespec64 current_mg_time(struct inode *inode)
> +{
> +	struct timespec64 now;
> +	atomic_long_t *pnsec = (atomic_long_t *)&inode->i_ctime.tv_nsec;
> +	long nsec = atomic_long_fetch_andnot(I_CTIME_QUERIED, pnsec);
> +
> +	if (nsec & I_CTIME_QUERIED) {
> +		ktime_get_real_ts64(&now);
> +	} else {
> +		struct timespec64 ctime;
> +
> +		ktime_get_coarse_real_ts64(&now);
> +
> +		/*
> +		 * If we've recently fetched a fine-grained timestamp
> +		 * then the coarse-grained one may still be earlier than the
> +		 * existing one. Just keep the existing ctime if so.
> +		 */
> +		ctime = ctime_peek(inode);
> +		if (timespec64_compare(&ctime, &now) > 0)
> +			now = ctime;
> +	}
> +
> +	return now;
> +}
> +

...

> +/**
> + * ctime_nsec_peek - peek at (but don't query) the ctime tv_nsec field
> + * @inode: inode to fetch the ctime from
> + *
> + * Grab the current ctime tv_nsec field from the inode, mask off the
> + * I_CTIME_QUERIED flag and return it. This is mostly intended for use by
> + * internal consumers of the ctime that aren't concerned with ensuring a
> + * fine-grained update on the next change (e.g. when preparing to store
> + * the value in the backing store for later retrieval).
> + *
> + * This is safe to call regardless of whether the underlying filesystem
> + * is using multigrain timestamps.
> + */
> +static inline long ctime_nsec_peek(const struct inode *inode)
> +{
> +	return inode->i_ctime.tv_nsec &~ I_CTIME_QUERIED;

This is somewhat unusual spacing. I'd use:

	inode->i_ctime.tv_nsec & ~I_CTIME_QUERIED

> +}
> +
> +/**
> + * ctime_peek - peek at (but don't query) the ctime
> + * @inode: inode to fetch the ctime from
> + *
> + * Grab the current ctime from the inode, sans I_CTIME_QUERIED flag. For
> + * use by internal consumers that don't require a fine-grained update on
> + * the next change.
> + *
> + * This is safe to call regardless of whether the underlying filesystem
> + * is using multigrain timestamps.
> + */
> +static inline struct timespec64 ctime_peek(const struct inode *inode)
> +{
> +	struct timespec64 ctime;
> +
> +	ctime.tv_sec = inode->i_ctime.tv_sec;
> +	ctime.tv_nsec = ctime_nsec_peek(inode);
> +
> +	return ctime;
> +}

Given this is in a header that gets included in a lot of places, maybe we
should call it like inode_ctime_peek() or inode_ctime_get() to reduce
chances of a name clash?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-23 10:02   ` Jan Kara
@ 2023-05-23 10:17     ` Jan Kara
  2023-05-23 10:56       ` Jeff Layton
  2023-06-13 19:47       ` Jeff Layton
  2023-05-23 10:38     ` Christian Brauner
  2023-05-23 10:40     ` Jeff Layton
  2 siblings, 2 replies; 31+ messages in thread
From: Jan Kara @ 2023-05-23 10:17 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Jan Kara,
	Amir Goldstein, David Howells, Neil Brown, Matthew Wilcox,
	Andreas Dilger, Theodore T'so, Chris Mason, Josef Bacik,
	David Sterba, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs,
	linux-ext4, linux-mm, linux-nfs, linux-cifs

On Tue 23-05-23 12:02:40, Jan Kara wrote:
> On Thu 18-05-23 07:47:35, Jeff Layton wrote:
> > The VFS always uses coarse-grained timestamp updates for filling out the
> > ctime and mtime after a change. This has the benefit of allowing
> > filesystems to optimize away a lot metadata updates, down to around 1
> > per jiffy, even when a file is under heavy writes.
> > 
> > Unfortunately, this has always been an issue when we're exporting via
> > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> > lot of exported filesystems don't properly support a change attribute
> > and are subject to the same problems with timestamp granularity. Other
> > applications have similar issues (e.g backup applications).
> > 
> > Switching to always using fine-grained timestamps would improve the
> > situation, but that becomes rather expensive, as the underlying
> > filesystem will have to log a lot more metadata updates.
> > 
> > What we need is a way to only use fine-grained timestamps when they are
> > being actively queried.
> > 
> > The kernel always stores normalized ctime values, so only the first 30
> > bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> > ctime must also change.
> > 
> > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > has queried the inode for the i_mtime or i_ctime. When this flag is set,
> > on the next timestamp update, the kernel can fetch a fine-grained
> > timestamp instead of the usual coarse-grained one.
> > 
> > This patch adds the infrastructure this scheme. Filesytems can opt
> > into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> > 
> > Later patches will convert individual filesystems over to use it.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> So there are two things I dislike about this series because I think they
> are fragile:
> 
> 1) If we have a filesystem supporting multigrain ts and someone
> accidentally directly uses the value of inode->i_ctime, he can get bogus
> value (with QUERIED flag). This mistake is very easy to do. So I think we
> should rename i_ctime to something like __i_ctime and always use accessor
> function for it.
> 
> 2) As I already commented in a previous version of the series, the scheme
> with just one flag for both ctime and mtime and flag getting cleared in
> current_time() relies on the fact that filesystems always do an equivalent
> of:
> 
> 	inode->i_mtime = inode->i_ctime = current_time();
> 
> Otherwise we can do coarse grained update where we should have done a fine
> grained one. Filesystems often update timestamps like this but not
> universally. Grepping shows some instances where only inode->i_mtime is set
> from current_time() e.g. in autofs or bfs. Again a mistake that is rather
> easy to make and results in subtle issues. I think this would be also
> nicely solved by renaming i_ctime to __i_ctime and using a function to set
> ctime. Mtime could then be updated with inode->i_mtime = ctime_peek().
> 
> I understand this is quite some churn but a very mechanical one that could
> be just done with Coccinelle and a few manual fixups. So IMHO it is worth
> the more robust result.

Also as I'm thinking about it your current scheme is slightly racy. Suppose
the filesystem does:

CPU1					CPU2

					statx()
inode->i_ctime = current_time()
  current_mg_time()
    nsec = atomic_long_fetch_andnot(QUERIED, &inode->i_ctime.tv_nsec)
					  nsec = atomic_long_fetch_or(QUERIED, &inode->i_ctime.tv_nsec)
    if (nsec & QUERIED) - not set
      ktime_get_coarse_real_ts64(&now)
    return timestamp_truncate(now, inode);
- QUERIED flag in the inode->i_ctime gets overwritten by the assignment
  => we need not update ctime due to granularity although it was queried

One more reason to use explicit function to update inode->i_ctime ;)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-23 10:02   ` Jan Kara
  2023-05-23 10:17     ` Jan Kara
@ 2023-05-23 10:38     ` Christian Brauner
  2023-05-23 10:40     ` Jeff Layton
  2 siblings, 0 replies; 31+ messages in thread
From: Christian Brauner @ 2023-05-23 10:38 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jeff Layton, Alexander Viro, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs, linux-ext4,
	linux-mm, linux-nfs, linux-cifs

On Tue, May 23, 2023 at 12:02:40PM +0200, Jan Kara wrote:
> On Thu 18-05-23 07:47:35, Jeff Layton wrote:
> > The VFS always uses coarse-grained timestamp updates for filling out the
> > ctime and mtime after a change. This has the benefit of allowing
> > filesystems to optimize away a lot metadata updates, down to around 1
> > per jiffy, even when a file is under heavy writes.
> > 
> > Unfortunately, this has always been an issue when we're exporting via
> > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> > lot of exported filesystems don't properly support a change attribute
> > and are subject to the same problems with timestamp granularity. Other
> > applications have similar issues (e.g backup applications).
> > 
> > Switching to always using fine-grained timestamps would improve the
> > situation, but that becomes rather expensive, as the underlying
> > filesystem will have to log a lot more metadata updates.
> > 
> > What we need is a way to only use fine-grained timestamps when they are
> > being actively queried.
> > 
> > The kernel always stores normalized ctime values, so only the first 30
> > bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> > ctime must also change.
> > 
> > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > has queried the inode for the i_mtime or i_ctime. When this flag is set,
> > on the next timestamp update, the kernel can fetch a fine-grained
> > timestamp instead of the usual coarse-grained one.
> > 
> > This patch adds the infrastructure this scheme. Filesytems can opt
> > into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> > 
> > Later patches will convert individual filesystems over to use it.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> So there are two things I dislike about this series because I think they
> are fragile:
> 
> 1) If we have a filesystem supporting multigrain ts and someone
> accidentally directly uses the value of inode->i_ctime, he can get bogus
> value (with QUERIED flag). This mistake is very easy to do. So I think we
> should rename i_ctime to something like __i_ctime and always use accessor
> function for it.
> 
> 2) As I already commented in a previous version of the series, the scheme
> with just one flag for both ctime and mtime and flag getting cleared in
> current_time() relies on the fact that filesystems always do an equivalent
> of:
> 
> 	inode->i_mtime = inode->i_ctime = current_time();
> 
> Otherwise we can do coarse grained update where we should have done a fine
> grained one. Filesystems often update timestamps like this but not
> universally. Grepping shows some instances where only inode->i_mtime is set
> from current_time() e.g. in autofs or bfs. Again a mistake that is rather
> easy to make and results in subtle issues. I think this would be also
> nicely solved by renaming i_ctime to __i_ctime and using a function to set
> ctime. Mtime could then be updated with inode->i_mtime = ctime_peek().
> 
> I understand this is quite some churn but a very mechanical one that could
> be just done with Coccinelle and a few manual fixups. So IMHO it is worth
> the more robust result.

Yeah, these are all good points.

> 
> Some more nits below.
> 
> > +/**
> > + * current_mg_time - Return FS time (possibly fine-grained)
> > + * @inode: inode.
> > + *
> > + * Return the current time truncated to the time granularity supported by
> > + * the fs, as suitable for a ctime/mtime change. If the ctime is flagged
> > + * as having been QUERIED, get a fine-grained timestamp.
> > + */
> 
> The comment should also mention that QUERIED flag is cleared from the ctime.
> 
> > +static struct timespec64 current_mg_time(struct inode *inode)
> > +{
> > +	struct timespec64 now;
> > +	atomic_long_t *pnsec = (atomic_long_t *)&inode->i_ctime.tv_nsec;
> > +	long nsec = atomic_long_fetch_andnot(I_CTIME_QUERIED, pnsec);
> > +
> > +	if (nsec & I_CTIME_QUERIED) {
> > +		ktime_get_real_ts64(&now);
> > +	} else {
> > +		struct timespec64 ctime;
> > +
> > +		ktime_get_coarse_real_ts64(&now);
> > +
> > +		/*
> > +		 * If we've recently fetched a fine-grained timestamp
> > +		 * then the coarse-grained one may still be earlier than the
> > +		 * existing one. Just keep the existing ctime if so.
> > +		 */
> > +		ctime = ctime_peek(inode);
> > +		if (timespec64_compare(&ctime, &now) > 0)
> > +			now = ctime;
> > +	}
> > +
> > +	return now;
> > +}
> > +
> 
> ...
> 
> > +/**
> > + * ctime_nsec_peek - peek at (but don't query) the ctime tv_nsec field
> > + * @inode: inode to fetch the ctime from
> > + *
> > + * Grab the current ctime tv_nsec field from the inode, mask off the
> > + * I_CTIME_QUERIED flag and return it. This is mostly intended for use by
> > + * internal consumers of the ctime that aren't concerned with ensuring a
> > + * fine-grained update on the next change (e.g. when preparing to store
> > + * the value in the backing store for later retrieval).
> > + *
> > + * This is safe to call regardless of whether the underlying filesystem
> > + * is using multigrain timestamps.
> > + */
> > +static inline long ctime_nsec_peek(const struct inode *inode)
> > +{
> > +	return inode->i_ctime.tv_nsec &~ I_CTIME_QUERIED;
> 
> This is somewhat unusual spacing. I'd use:
> 
> 	inode->i_ctime.tv_nsec & ~I_CTIME_QUERIED
> 
> > +}
> > +
> > +/**
> > + * ctime_peek - peek at (but don't query) the ctime
> > + * @inode: inode to fetch the ctime from
> > + *
> > + * Grab the current ctime from the inode, sans I_CTIME_QUERIED flag. For
> > + * use by internal consumers that don't require a fine-grained update on
> > + * the next change.
> > + *
> > + * This is safe to call regardless of whether the underlying filesystem
> > + * is using multigrain timestamps.
> > + */
> > +static inline struct timespec64 ctime_peek(const struct inode *inode)
> > +{
> > +	struct timespec64 ctime;
> > +
> > +	ctime.tv_sec = inode->i_ctime.tv_sec;
> > +	ctime.tv_nsec = ctime_nsec_peek(inode);
> > +
> > +	return ctime;
> > +}
> 
> Given this is in a header that gets included in a lot of places, maybe we
> should call it like inode_ctime_peek() or inode_ctime_get() to reduce
> chances of a name clash?

I think I mentioned this in an earlier comment. Independent of this
series, it would be kinda nice if we could start moving stuff out of
fs.h so we end up with a finer grained split of fs.h.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-23 10:02   ` Jan Kara
  2023-05-23 10:17     ` Jan Kara
  2023-05-23 10:38     ` Christian Brauner
@ 2023-05-23 10:40     ` Jeff Layton
  2023-05-23 12:46       ` Jan Kara
  2 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-05-23 10:40 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs, linux-ext4,
	linux-mm, linux-nfs, linux-cifs

On Tue, 2023-05-23 at 12:02 +0200, Jan Kara wrote:
> On Thu 18-05-23 07:47:35, Jeff Layton wrote:
> > The VFS always uses coarse-grained timestamp updates for filling out the
> > ctime and mtime after a change. This has the benefit of allowing
> > filesystems to optimize away a lot metadata updates, down to around 1
> > per jiffy, even when a file is under heavy writes.
> > 
> > Unfortunately, this has always been an issue when we're exporting via
> > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> > lot of exported filesystems don't properly support a change attribute
> > and are subject to the same problems with timestamp granularity. Other
> > applications have similar issues (e.g backup applications).
> > 
> > Switching to always using fine-grained timestamps would improve the
> > situation, but that becomes rather expensive, as the underlying
> > filesystem will have to log a lot more metadata updates.
> > 
> > What we need is a way to only use fine-grained timestamps when they are
> > being actively queried.
> > 
> > The kernel always stores normalized ctime values, so only the first 30
> > bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> > ctime must also change.
> > 
> > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > has queried the inode for the i_mtime or i_ctime. When this flag is set,
> > on the next timestamp update, the kernel can fetch a fine-grained
> > timestamp instead of the usual coarse-grained one.
> > 
> > This patch adds the infrastructure this scheme. Filesytems can opt
> > into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> > 
> > Later patches will convert individual filesystems over to use it.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> So there are two things I dislike about this series because I think they
> are fragile:
> 
> 1) If we have a filesystem supporting multigrain ts and someone
> accidentally directly uses the value of inode->i_ctime, he can get bogus
> value (with QUERIED flag). This mistake is very easy to do. So I think we
> should rename i_ctime to something like __i_ctime and always use accessor
> function for it.
> 

We could do this, but it'll be quite invasive. We'd have to change any
place that touches i_ctime (and there are a lot of them), even on
filesystems that are not being converted.

> 2) As I already commented in a previous version of the series, the scheme
> with just one flag for both ctime and mtime and flag getting cleared in
> current_time() relies on the fact that filesystems always do an equivalent
> of:
> 
> 	inode->i_mtime = inode->i_ctime = current_time();
> 
> Otherwise we can do coarse grained update where we should have done a fine
> grained one. Filesystems often update timestamps like this but not
> universally. Grepping shows some instances where only inode->i_mtime is set
> from current_time() e.g. in autofs or bfs. Again a mistake that is rather
> easy to make and results in subtle issues. I think this would be also
> nicely solved by renaming i_ctime to __i_ctime and using a function to set
> ctime. Mtime could then be updated with inode->i_mtime = ctime_peek().
>
> I understand this is quite some churn but a very mechanical one that could
> be just done with Coccinelle and a few manual fixups. So IMHO it is worth
> the more robust result.

AFAICT, under POSIX, you must _always_ set the ctime when you set the
mtime, but the reverse is not true. That's why keeping the flag in the
ctime makes sense. If we're updating the mtime, then we necessarily must
update the ctime.

> Some more nits below.
> 
> > +/**
> > + * current_mg_time - Return FS time (possibly fine-grained)
> > + * @inode: inode.
> > + *
> > + * Return the current time truncated to the time granularity supported by
> > + * the fs, as suitable for a ctime/mtime change. If the ctime is flagged
> > + * as having been QUERIED, get a fine-grained timestamp.
> > + */
> 
> The comment should also mention that QUERIED flag is cleared from the ctime.
> 

Fair point. I can fix that up.

> > +static struct timespec64 current_mg_time(struct inode *inode)
> > +{
> > +	struct timespec64 now;
> > +	atomic_long_t *pnsec = (atomic_long_t *)&inode->i_ctime.tv_nsec;
> > +	long nsec = atomic_long_fetch_andnot(I_CTIME_QUERIED, pnsec);
> > +
> > +	if (nsec & I_CTIME_QUERIED) {
> > +		ktime_get_real_ts64(&now);
> > +	} else {
> > +		struct timespec64 ctime;
> > +
> > +		ktime_get_coarse_real_ts64(&now);
> > +
> > +		/*
> > +		 * If we've recently fetched a fine-grained timestamp
> > +		 * then the coarse-grained one may still be earlier than the
> > +		 * existing one. Just keep the existing ctime if so.
> > +		 */
> > +		ctime = ctime_peek(inode);
> > +		if (timespec64_compare(&ctime, &now) > 0)
> > +			now = ctime;
> > +	}
> > +
> > +	return now;
> > +}
> > +
> 
> ...
> 
> > +/**
> > + * ctime_nsec_peek - peek at (but don't query) the ctime tv_nsec field
> > + * @inode: inode to fetch the ctime from
> > + *
> > + * Grab the current ctime tv_nsec field from the inode, mask off the
> > + * I_CTIME_QUERIED flag and return it. This is mostly intended for use by
> > + * internal consumers of the ctime that aren't concerned with ensuring a
> > + * fine-grained update on the next change (e.g. when preparing to store
> > + * the value in the backing store for later retrieval).
> > + *
> > + * This is safe to call regardless of whether the underlying filesystem
> > + * is using multigrain timestamps.
> > + */
> > +static inline long ctime_nsec_peek(const struct inode *inode)
> > +{
> > +	return inode->i_ctime.tv_nsec &~ I_CTIME_QUERIED;
> 
> This is somewhat unusual spacing. I'd use:
> 
> 	inode->i_ctime.tv_nsec & ~I_CTIME_QUERIED
> 

Yeah, I don't know what happened there. I'll fix that up.

> > +}
> > +
> > +/**
> > + * ctime_peek - peek at (but don't query) the ctime
> > + * @inode: inode to fetch the ctime from
> > + *
> > + * Grab the current ctime from the inode, sans I_CTIME_QUERIED flag. For
> > + * use by internal consumers that don't require a fine-grained update on
> > + * the next change.
> > + *
> > + * This is safe to call regardless of whether the underlying filesystem
> > + * is using multigrain timestamps.
> > + */
> > +static inline struct timespec64 ctime_peek(const struct inode *inode)
> > +{
> > +	struct timespec64 ctime;
> > +
> > +	ctime.tv_sec = inode->i_ctime.tv_sec;
> > +	ctime.tv_nsec = ctime_nsec_peek(inode);
> > +
> > +	return ctime;
> > +}
> 
> Given this is in a header that gets included in a lot of places, maybe we
> should call it like inode_ctime_peek() or inode_ctime_get() to reduce
> chances of a name clash?

I'd be fine with that, but "ctime" sort of implies inode->i_ctime to me.
We don't really use that nomenclature elsewhere.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-23 10:17     ` Jan Kara
@ 2023-05-23 10:56       ` Jeff Layton
  2023-05-23 11:01         ` Christian Brauner
  2023-06-13 19:47       ` Jeff Layton
  1 sibling, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-05-23 10:56 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs, linux-ext4,
	linux-mm, linux-nfs, linux-cifs

On Tue, 2023-05-23 at 12:17 +0200, Jan Kara wrote:
> On Tue 23-05-23 12:02:40, Jan Kara wrote:
> > On Thu 18-05-23 07:47:35, Jeff Layton wrote:
> > > The VFS always uses coarse-grained timestamp updates for filling out the
> > > ctime and mtime after a change. This has the benefit of allowing
> > > filesystems to optimize away a lot metadata updates, down to around 1
> > > per jiffy, even when a file is under heavy writes.
> > > 
> > > Unfortunately, this has always been an issue when we're exporting via
> > > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> > > lot of exported filesystems don't properly support a change attribute
> > > and are subject to the same problems with timestamp granularity. Other
> > > applications have similar issues (e.g backup applications).
> > > 
> > > Switching to always using fine-grained timestamps would improve the
> > > situation, but that becomes rather expensive, as the underlying
> > > filesystem will have to log a lot more metadata updates.
> > > 
> > > What we need is a way to only use fine-grained timestamps when they are
> > > being actively queried.
> > > 
> > > The kernel always stores normalized ctime values, so only the first 30
> > > bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> > > ctime must also change.
> > > 
> > > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > > has queried the inode for the i_mtime or i_ctime. When this flag is set,
> > > on the next timestamp update, the kernel can fetch a fine-grained
> > > timestamp instead of the usual coarse-grained one.
> > > 
> > > This patch adds the infrastructure this scheme. Filesytems can opt
> > > into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> > > 
> > > Later patches will convert individual filesystems over to use it.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > 
> > So there are two things I dislike about this series because I think they
> > are fragile:
> > 
> > 1) If we have a filesystem supporting multigrain ts and someone
> > accidentally directly uses the value of inode->i_ctime, he can get bogus
> > value (with QUERIED flag). This mistake is very easy to do. So I think we
> > should rename i_ctime to something like __i_ctime and always use accessor
> > function for it.
> > 
> > 2) As I already commented in a previous version of the series, the scheme
> > with just one flag for both ctime and mtime and flag getting cleared in
> > current_time() relies on the fact that filesystems always do an equivalent
> > of:
> > 
> > 	inode->i_mtime = inode->i_ctime = current_time();
> > 
> > Otherwise we can do coarse grained update where we should have done a fine
> > grained one. Filesystems often update timestamps like this but not
> > universally. Grepping shows some instances where only inode->i_mtime is set
> > from current_time() e.g. in autofs or bfs. Again a mistake that is rather
> > easy to make and results in subtle issues. I think this would be also
> > nicely solved by renaming i_ctime to __i_ctime and using a function to set
> > ctime. Mtime could then be updated with inode->i_mtime = ctime_peek().
> > 
> > I understand this is quite some churn but a very mechanical one that could
> > be just done with Coccinelle and a few manual fixups. So IMHO it is worth
> > the more robust result.
> 
> Also as I'm thinking about it your current scheme is slightly racy. Suppose
> the filesystem does:
> 
> CPU1					CPU2
> 
> 					statx()
> inode->i_ctime = current_time()
>   current_mg_time()
>     nsec = atomic_long_fetch_andnot(QUERIED, &inode->i_ctime.tv_nsec)
> 					  nsec = atomic_long_fetch_or(QUERIED, &inode->i_ctime.tv_nsec)
>     if (nsec & QUERIED) - not set
>       ktime_get_coarse_real_ts64(&now)
>     return timestamp_truncate(now, inode);
> - QUERIED flag in the inode->i_ctime gets overwritten by the assignment
>   => we need not update ctime due to granularity although it was queried
> 
> One more reason to use explicit function to update inode->i_ctime ;)

When we store the new time in the i_ctime field, the flag gets cleared
because at that point we're storing a new (unseen) time.

However, you're correct: if the i_ctime in your above example starts at
the same value that is currently being returned by
ktime_get_coarse_real_ts64, then we'll lose the flag set in statx.

I think the right fix there would be to not update the ctime at all if
it's a coarse grained time, and the value wouldn't have an apparent
change to an observer. That would leave the flag intact.

That does mean we'd need to move to a function that does clock fetch and
assigns it to i_ctime in one go (like you suggest). Something like:

    inode_update_ctime(inode);

How we do that with atomic operations over two values (the tv_sec and
tv_nsec) is a bit tricky. I'll have to think about it.

Christian, given Jan's concerns do you want to drop this series for now
and let me respin it?

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-23 10:56       ` Jeff Layton
@ 2023-05-23 11:01         ` Christian Brauner
  2023-05-23 11:15           ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Brauner @ 2023-05-23 11:01 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jan Kara, Alexander Viro, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs, linux-ext4,
	linux-mm, linux-nfs, linux-cifs

On Tue, May 23, 2023 at 06:56:11AM -0400, Jeff Layton wrote:
> On Tue, 2023-05-23 at 12:17 +0200, Jan Kara wrote:
> > On Tue 23-05-23 12:02:40, Jan Kara wrote:
> > > On Thu 18-05-23 07:47:35, Jeff Layton wrote:
> > > > The VFS always uses coarse-grained timestamp updates for filling out the
> > > > ctime and mtime after a change. This has the benefit of allowing
> > > > filesystems to optimize away a lot metadata updates, down to around 1
> > > > per jiffy, even when a file is under heavy writes.
> > > > 
> > > > Unfortunately, this has always been an issue when we're exporting via
> > > > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> > > > lot of exported filesystems don't properly support a change attribute
> > > > and are subject to the same problems with timestamp granularity. Other
> > > > applications have similar issues (e.g backup applications).
> > > > 
> > > > Switching to always using fine-grained timestamps would improve the
> > > > situation, but that becomes rather expensive, as the underlying
> > > > filesystem will have to log a lot more metadata updates.
> > > > 
> > > > What we need is a way to only use fine-grained timestamps when they are
> > > > being actively queried.
> > > > 
> > > > The kernel always stores normalized ctime values, so only the first 30
> > > > bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> > > > ctime must also change.
> > > > 
> > > > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > > > has queried the inode for the i_mtime or i_ctime. When this flag is set,
> > > > on the next timestamp update, the kernel can fetch a fine-grained
> > > > timestamp instead of the usual coarse-grained one.
> > > > 
> > > > This patch adds the infrastructure this scheme. Filesytems can opt
> > > > into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> > > > 
> > > > Later patches will convert individual filesystems over to use it.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > 
> > > So there are two things I dislike about this series because I think they
> > > are fragile:
> > > 
> > > 1) If we have a filesystem supporting multigrain ts and someone
> > > accidentally directly uses the value of inode->i_ctime, he can get bogus
> > > value (with QUERIED flag). This mistake is very easy to do. So I think we
> > > should rename i_ctime to something like __i_ctime and always use accessor
> > > function for it.
> > > 
> > > 2) As I already commented in a previous version of the series, the scheme
> > > with just one flag for both ctime and mtime and flag getting cleared in
> > > current_time() relies on the fact that filesystems always do an equivalent
> > > of:
> > > 
> > > 	inode->i_mtime = inode->i_ctime = current_time();
> > > 
> > > Otherwise we can do coarse grained update where we should have done a fine
> > > grained one. Filesystems often update timestamps like this but not
> > > universally. Grepping shows some instances where only inode->i_mtime is set
> > > from current_time() e.g. in autofs or bfs. Again a mistake that is rather
> > > easy to make and results in subtle issues. I think this would be also
> > > nicely solved by renaming i_ctime to __i_ctime and using a function to set
> > > ctime. Mtime could then be updated with inode->i_mtime = ctime_peek().
> > > 
> > > I understand this is quite some churn but a very mechanical one that could
> > > be just done with Coccinelle and a few manual fixups. So IMHO it is worth
> > > the more robust result.
> > 
> > Also as I'm thinking about it your current scheme is slightly racy. Suppose
> > the filesystem does:
> > 
> > CPU1					CPU2
> > 
> > 					statx()
> > inode->i_ctime = current_time()
> >   current_mg_time()
> >     nsec = atomic_long_fetch_andnot(QUERIED, &inode->i_ctime.tv_nsec)
> > 					  nsec = atomic_long_fetch_or(QUERIED, &inode->i_ctime.tv_nsec)
> >     if (nsec & QUERIED) - not set
> >       ktime_get_coarse_real_ts64(&now)
> >     return timestamp_truncate(now, inode);
> > - QUERIED flag in the inode->i_ctime gets overwritten by the assignment
> >   => we need not update ctime due to granularity although it was queried
> > 
> > One more reason to use explicit function to update inode->i_ctime ;)
> 
> When we store the new time in the i_ctime field, the flag gets cleared
> because at that point we're storing a new (unseen) time.
> 
> However, you're correct: if the i_ctime in your above example starts at
> the same value that is currently being returned by
> ktime_get_coarse_real_ts64, then we'll lose the flag set in statx.
> 
> I think the right fix there would be to not update the ctime at all if
> it's a coarse grained time, and the value wouldn't have an apparent
> change to an observer. That would leave the flag intact.
> 
> That does mean we'd need to move to a function that does clock fetch and
> assigns it to i_ctime in one go (like you suggest). Something like:
> 
>     inode_update_ctime(inode);
> 
> How we do that with atomic operations over two values (the tv_sec and
> tv_nsec) is a bit tricky. I'll have to think about it.
> 
> Christian, given Jan's concerns do you want to drop this series for now
> and let me respin it?

I deliberately put it into a vfs.unstable.* branch. I would leave it
there until you send a new one then drop it. If we get lucky the bots
that run on -next will have time to report potential perf issues while
it's not currently causing conflicts.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-23 11:01         ` Christian Brauner
@ 2023-05-23 11:15           ` Jeff Layton
  0 siblings, 0 replies; 31+ messages in thread
From: Jeff Layton @ 2023-05-23 11:15 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Alexander Viro, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs, linux-ext4,
	linux-mm, linux-nfs, linux-cifs

On Tue, 2023-05-23 at 13:01 +0200, Christian Brauner wrote:
> On Tue, May 23, 2023 at 06:56:11AM -0400, Jeff Layton wrote:
> > On Tue, 2023-05-23 at 12:17 +0200, Jan Kara wrote:
> > > On Tue 23-05-23 12:02:40, Jan Kara wrote:
> > > > On Thu 18-05-23 07:47:35, Jeff Layton wrote:
> > > > > The VFS always uses coarse-grained timestamp updates for filling out the
> > > > > ctime and mtime after a change. This has the benefit of allowing
> > > > > filesystems to optimize away a lot metadata updates, down to around 1
> > > > > per jiffy, even when a file is under heavy writes.
> > > > > 
> > > > > Unfortunately, this has always been an issue when we're exporting via
> > > > > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> > > > > lot of exported filesystems don't properly support a change attribute
> > > > > and are subject to the same problems with timestamp granularity. Other
> > > > > applications have similar issues (e.g backup applications).
> > > > > 
> > > > > Switching to always using fine-grained timestamps would improve the
> > > > > situation, but that becomes rather expensive, as the underlying
> > > > > filesystem will have to log a lot more metadata updates.
> > > > > 
> > > > > What we need is a way to only use fine-grained timestamps when they are
> > > > > being actively queried.
> > > > > 
> > > > > The kernel always stores normalized ctime values, so only the first 30
> > > > > bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> > > > > ctime must also change.
> > > > > 
> > > > > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > > > > has queried the inode for the i_mtime or i_ctime. When this flag is set,
> > > > > on the next timestamp update, the kernel can fetch a fine-grained
> > > > > timestamp instead of the usual coarse-grained one.
> > > > > 
> > > > > This patch adds the infrastructure this scheme. Filesytems can opt
> > > > > into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> > > > > 
> > > > > Later patches will convert individual filesystems over to use it.
> > > > > 
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > 
> > > > So there are two things I dislike about this series because I think they
> > > > are fragile:
> > > > 
> > > > 1) If we have a filesystem supporting multigrain ts and someone
> > > > accidentally directly uses the value of inode->i_ctime, he can get bogus
> > > > value (with QUERIED flag). This mistake is very easy to do. So I think we
> > > > should rename i_ctime to something like __i_ctime and always use accessor
> > > > function for it.
> > > > 
> > > > 2) As I already commented in a previous version of the series, the scheme
> > > > with just one flag for both ctime and mtime and flag getting cleared in
> > > > current_time() relies on the fact that filesystems always do an equivalent
> > > > of:
> > > > 
> > > > 	inode->i_mtime = inode->i_ctime = current_time();
> > > > 
> > > > Otherwise we can do coarse grained update where we should have done a fine
> > > > grained one. Filesystems often update timestamps like this but not
> > > > universally. Grepping shows some instances where only inode->i_mtime is set
> > > > from current_time() e.g. in autofs or bfs. Again a mistake that is rather
> > > > easy to make and results in subtle issues. I think this would be also
> > > > nicely solved by renaming i_ctime to __i_ctime and using a function to set
> > > > ctime. Mtime could then be updated with inode->i_mtime = ctime_peek().
> > > > 
> > > > I understand this is quite some churn but a very mechanical one that could
> > > > be just done with Coccinelle and a few manual fixups. So IMHO it is worth
> > > > the more robust result.
> > > 
> > > Also as I'm thinking about it your current scheme is slightly racy. Suppose
> > > the filesystem does:
> > > 
> > > CPU1					CPU2
> > > 
> > > 					statx()
> > > inode->i_ctime = current_time()
> > >   current_mg_time()
> > >     nsec = atomic_long_fetch_andnot(QUERIED, &inode->i_ctime.tv_nsec)
> > > 					  nsec = atomic_long_fetch_or(QUERIED, &inode->i_ctime.tv_nsec)
> > >     if (nsec & QUERIED) - not set
> > >       ktime_get_coarse_real_ts64(&now)
> > >     return timestamp_truncate(now, inode);
> > > - QUERIED flag in the inode->i_ctime gets overwritten by the assignment
> > >   => we need not update ctime due to granularity although it was queried
> > > 
> > > One more reason to use explicit function to update inode->i_ctime ;)
> > 
> > When we store the new time in the i_ctime field, the flag gets cleared
> > because at that point we're storing a new (unseen) time.
> > 
> > However, you're correct: if the i_ctime in your above example starts at
> > the same value that is currently being returned by
> > ktime_get_coarse_real_ts64, then we'll lose the flag set in statx.
> > 
> > I think the right fix there would be to not update the ctime at all if
> > it's a coarse grained time, and the value wouldn't have an apparent
> > change to an observer. That would leave the flag intact.
> > 
> > That does mean we'd need to move to a function that does clock fetch and
> > assigns it to i_ctime in one go (like you suggest). Something like:
> > 
> >     inode_update_ctime(inode);
> > 
> > How we do that with atomic operations over two values (the tv_sec and
> > tv_nsec) is a bit tricky. I'll have to think about it.
> > 
> > Christian, given Jan's concerns do you want to drop this series for now
> > and let me respin it?
> 
> I deliberately put it into a vfs.unstable.* branch. I would leave it
> there until you send a new one then drop it. If we get lucky the bots
> that run on -next will have time to report potential perf issues while
> it's not currently causing conflicts.

Sounds good to me. Thanks!
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-23 10:40     ` Jeff Layton
@ 2023-05-23 12:46       ` Jan Kara
  2023-06-13 13:09         ` Jeff Layton
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Kara @ 2023-05-23 12:46 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jan Kara, Alexander Viro, Christian Brauner, Darrick J. Wong,
	Hugh Dickins, Andrew Morton, Dave Chinner, Chuck Lever,
	Amir Goldstein, David Howells, Neil Brown, Matthew Wilcox,
	Andreas Dilger, Theodore T'so, Chris Mason, Josef Bacik,
	David Sterba, Namjae Jeon, Steve French, Sergey Senozhatsky,
	Tom Talpey, linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs,
	linux-ext4, linux-mm, linux-nfs, linux-cifs

On Tue 23-05-23 06:40:08, Jeff Layton wrote:
> On Tue, 2023-05-23 at 12:02 +0200, Jan Kara wrote:
> > On Thu 18-05-23 07:47:35, Jeff Layton wrote:
> > > The VFS always uses coarse-grained timestamp updates for filling out the
> > > ctime and mtime after a change. This has the benefit of allowing
> > > filesystems to optimize away a lot metadata updates, down to around 1
> > > per jiffy, even when a file is under heavy writes.
> > > 
> > > Unfortunately, this has always been an issue when we're exporting via
> > > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> > > lot of exported filesystems don't properly support a change attribute
> > > and are subject to the same problems with timestamp granularity. Other
> > > applications have similar issues (e.g backup applications).
> > > 
> > > Switching to always using fine-grained timestamps would improve the
> > > situation, but that becomes rather expensive, as the underlying
> > > filesystem will have to log a lot more metadata updates.
> > > 
> > > What we need is a way to only use fine-grained timestamps when they are
> > > being actively queried.
> > > 
> > > The kernel always stores normalized ctime values, so only the first 30
> > > bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> > > ctime must also change.
> > > 
> > > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > > has queried the inode for the i_mtime or i_ctime. When this flag is set,
> > > on the next timestamp update, the kernel can fetch a fine-grained
> > > timestamp instead of the usual coarse-grained one.
> > > 
> > > This patch adds the infrastructure this scheme. Filesytems can opt
> > > into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> > > 
> > > Later patches will convert individual filesystems over to use it.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > 
> > So there are two things I dislike about this series because I think they
> > are fragile:
> > 
> > 1) If we have a filesystem supporting multigrain ts and someone
> > accidentally directly uses the value of inode->i_ctime, he can get bogus
> > value (with QUERIED flag). This mistake is very easy to do. So I think we
> > should rename i_ctime to something like __i_ctime and always use accessor
> > function for it.
> > 
> 
> We could do this, but it'll be quite invasive. We'd have to change any
> place that touches i_ctime (and there are a lot of them), even on
> filesystems that are not being converted.

Yes, that's why I suggested Coccinelle to deal with this.

> > 2) As I already commented in a previous version of the series, the scheme
> > with just one flag for both ctime and mtime and flag getting cleared in
> > current_time() relies on the fact that filesystems always do an equivalent
> > of:
> > 
> > 	inode->i_mtime = inode->i_ctime = current_time();
> > 
> > Otherwise we can do coarse grained update where we should have done a fine
> > grained one. Filesystems often update timestamps like this but not
> > universally. Grepping shows some instances where only inode->i_mtime is set
> > from current_time() e.g. in autofs or bfs. Again a mistake that is rather
> > easy to make and results in subtle issues. I think this would be also
> > nicely solved by renaming i_ctime to __i_ctime and using a function to set
> > ctime. Mtime could then be updated with inode->i_mtime = ctime_peek().
> >
> > I understand this is quite some churn but a very mechanical one that could
> > be just done with Coccinelle and a few manual fixups. So IMHO it is worth
> > the more robust result.
> 
> AFAICT, under POSIX, you must _always_ set the ctime when you set the
> mtime, but the reverse is not true. That's why keeping the flag in the
> ctime makes sense. If we're updating the mtime, then we necessarily must
> update the ctime.

Yes, but technically speaking:

	inode->i_mtime = current_time();
	inode->i_ctime = current_time();

follows these requirements but is broken with your changes in unobvious
way. And if mtime update is hidden in some function or condition, it is not
even that suboptimal coding pattern.

> > > +}
> > > +
> > > +/**
> > > + * ctime_peek - peek at (but don't query) the ctime
> > > + * @inode: inode to fetch the ctime from
> > > + *
> > > + * Grab the current ctime from the inode, sans I_CTIME_QUERIED flag. For
> > > + * use by internal consumers that don't require a fine-grained update on
> > > + * the next change.
> > > + *
> > > + * This is safe to call regardless of whether the underlying filesystem
> > > + * is using multigrain timestamps.
> > > + */
> > > +static inline struct timespec64 ctime_peek(const struct inode *inode)
> > > +{
> > > +	struct timespec64 ctime;
> > > +
> > > +	ctime.tv_sec = inode->i_ctime.tv_sec;
> > > +	ctime.tv_nsec = ctime_nsec_peek(inode);
> > > +
> > > +	return ctime;
> > > +}
> > 
> > Given this is in a header that gets included in a lot of places, maybe we
> > should call it like inode_ctime_peek() or inode_ctime_get() to reduce
> > chances of a name clash?
> 
> I'd be fine with that, but "ctime" sort of implies inode->i_ctime to me.
> We don't really use that nomenclature elsewhere.

Yes, I don't insist here but we do have 'ctime' e.g. in IPC code
(sem_ctime, shm_ctime, msg_ctime), we have ctime in md layer, ctime(3) is
also a glibc function. So it is not *completely* impossible the clash
happens but we can deal with it when it happens.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-23 12:46       ` Jan Kara
@ 2023-06-13 13:09         ` Jeff Layton
  2023-06-14  6:29           ` Christian Brauner
  0 siblings, 1 reply; 31+ messages in thread
From: Jeff Layton @ 2023-06-13 13:09 UTC (permalink / raw)
  To: Jan Kara, Alexander Viro, Christian Brauner
  Cc: Darrick J. Wong, Hugh Dickins, Andrew Morton, Dave Chinner,
	Chuck Lever, Amir Goldstein, David Howells, Neil Brown,
	Matthew Wilcox, Andreas Dilger, Theodore T'so, Chris Mason,
	Josef Bacik, David Sterba, Namjae Jeon, Steve French,
	Sergey Senozhatsky, Tom Talpey, linux-fsdevel, linux-kernel,
	linux-xfs, linux-btrfs, linux-ext4, linux-mm, linux-nfs,
	linux-cifs

On Tue, 2023-05-23 at 14:46 +0200, Jan Kara wrote:
> On Tue 23-05-23 06:40:08, Jeff Layton wrote:
> > On Tue, 2023-05-23 at 12:02 +0200, Jan Kara wrote:
> > > 
> > > So there are two things I dislike about this series because I think they
> > > are fragile:
> > > 
> > > 1) If we have a filesystem supporting multigrain ts and someone
> > > accidentally directly uses the value of inode->i_ctime, he can get bogus
> > > value (with QUERIED flag). This mistake is very easy to do. So I think we
> > > should rename i_ctime to something like __i_ctime and always use accessor
> > > function for it.
> > > 
> > 
> > We could do this, but it'll be quite invasive. We'd have to change any
> > place that touches i_ctime (and there are a lot of them), even on
> > filesystems that are not being converted.
> 
> Yes, that's why I suggested Coccinelle to deal with this.


I've done the work to convert all of the accesses of i_ctime into
accessor functions in the kernel. The current state of it is here:

   
https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/commit/?h=ctime

As expected, it touches a lot of code, all over the place. So far I have
most of the conversion in one giant patch, and I need to split it up
(probably per-subsystem).

What's the best way to feed this change into mainline? Should I try to
get subsystem maintainers to pick these up, or are we better off feeding
this in via a separate branch?

For reference, the diffstat for the big conversion patch is below:

 arch/powerpc/platforms/cell/spufs/inode.c |  2 +-
 arch/s390/hypfs/inode.c                   |  4 +-
 drivers/android/binderfs.c                |  8 ++--
 drivers/infiniband/hw/qib/qib_fs.c        |  4 +-
 drivers/misc/ibmasm/ibmasmfs.c            |  2 +-
 drivers/misc/ibmvmc.c                     |  2 +-
 drivers/usb/core/devio.c                  | 16 ++++----
 drivers/usb/gadget/function/f_fs.c        |  6 +--
 drivers/usb/gadget/legacy/inode.c         |  3 +-
 fs/9p/vfs_inode.c                         |  6 ++-
 fs/9p/vfs_inode_dotl.c                    | 11 +++---
 fs/adfs/inode.c                           |  4 +-
 fs/affs/amigaffs.c                        |  6 +--
 fs/affs/inode.c                           | 17 ++++----
 fs/afs/dynroot.c                          |  2 +-
 fs/afs/inode.c                            |  6 +--
 fs/attr.c                                 |  2 +-
 fs/autofs/inode.c                         |  2 +-
 fs/autofs/root.c                          |  6 +--
 fs/bad_inode.c                            |  3 +-
 fs/befs/linuxvfs.c                        |  2 +-
 fs/bfs/dir.c                              | 16 ++++----
 fs/bfs/inode.c                            |  6 +--
 fs/binfmt_misc.c                          |  3 +-
 fs/btrfs/delayed-inode.c                  | 10 +++--
 fs/btrfs/file.c                           | 24 ++++-------
 fs/btrfs/inode.c                          | 66 ++++++++++++------------
-------
 fs/btrfs/ioctl.c                          |  2 +-
 fs/btrfs/reflink.c                        |  7 ++--
 fs/btrfs/transaction.c                    |  3 +-
 fs/btrfs/tree-log.c                       |  4 +-
 fs/btrfs/xattr.c                          |  4 +-
 fs/ceph/acl.c                             |  2 +-
 fs/ceph/caps.c                            |  2 +-
 fs/ceph/inode.c                           | 17 ++++----
 fs/ceph/snap.c                            |  2 +-
 fs/ceph/xattr.c                           |  2 +-
 fs/coda/coda_linux.c                      |  2 +-
 fs/coda/dir.c                             |  2 +-
 fs/coda/file.c                            |  2 +-
 fs/coda/inode.c                           |  2 +-
 fs/configfs/inode.c                       |  6 +--
 fs/cramfs/inode.c                         |  2 +-
 fs/debugfs/inode.c                        |  2 +-
 fs/devpts/inode.c                         |  6 +--
 fs/ecryptfs/inode.c                       |  2 +-
 fs/efivarfs/file.c                        |  2 +-
 fs/efivarfs/inode.c                       |  2 +-
 fs/efs/inode.c                            |  5 ++-
 fs/erofs/inode.c                          | 16 ++++----
 fs/exfat/file.c                           |  4 +-
 fs/exfat/inode.c                          |  6 +--
 fs/exfat/namei.c                          | 29 +++++++-------
 fs/exfat/super.c                          |  4 +-
 fs/ext2/acl.c                             |  2 +-
 fs/ext2/dir.c                             |  6 +--
 fs/ext2/ialloc.c                          |  2 +-
 fs/ext2/inode.c                           | 11 +++---
 fs/ext2/ioctl.c                           |  4 +-
 fs/ext2/namei.c                           |  8 ++--
 fs/ext2/super.c                           |  2 +-
 fs/ext2/xattr.c                           |  2 +-
 fs/ext4/acl.c                             |  2 +-
 fs/ext4/ext4.h                            | 20 ++++++++++
 fs/ext4/extents.c                         | 12 +++---
 fs/ext4/ialloc.c                          |  2 +-
 fs/ext4/inline.c                          |  4 +-
 fs/ext4/inode.c                           | 16 ++++----
 fs/ext4/ioctl.c                           |  9 +++--
 fs/ext4/namei.c                           | 26 ++++++------
 fs/ext4/super.c                           |  2 +-
 fs/ext4/xattr.c                           |  6 +--
 fs/f2fs/dir.c                             |  8 ++--
 fs/f2fs/f2fs.h                            |  5 ++-
 fs/f2fs/file.c                            | 16 ++++----
 fs/f2fs/inline.c                          |  2 +-
 fs/f2fs/inode.c                           | 10 ++---
 fs/f2fs/namei.c                           | 12 +++---
 fs/f2fs/recovery.c                        |  4 +-
 fs/f2fs/super.c                           |  2 +-
 fs/f2fs/xattr.c                           |  2 +-
 fs/fat/inode.c                            |  8 ++--
 fs/fat/misc.c                             |  7 +++-
 fs/freevxfs/vxfs_inode.c                  |  4 +-
 fs/fuse/control.c                         |  2 +-
 fs/fuse/dir.c                             |  8 ++--
 fs/fuse/inode.c                           | 18 +++++----
 fs/gfs2/acl.c                             |  2 +-
 fs/gfs2/bmap.c                            | 11 +++---
 fs/gfs2/dir.c                             | 15 +++----
 fs/gfs2/file.c                            |  2 +-
 fs/gfs2/glops.c                           |  4 +-
 fs/gfs2/inode.c                           |  8 ++--
 fs/gfs2/super.c                           |  4 +-
 fs/gfs2/xattr.c                           |  8 ++--
 fs/hfs/catalog.c                          |  8 ++--
 fs/hfs/dir.c                              |  2 +-
 fs/hfs/inode.c                            | 13 +++---
 fs/hfs/sysdep.c                           |  2 +-
 fs/hfsplus/catalog.c                      |  8 ++--
 fs/hfsplus/dir.c                          |  6 +--
 fs/hfsplus/inode.c                        | 14 +++----
 fs/hostfs/hostfs_kern.c                   |  5 ++-
 fs/hpfs/dir.c                             |  8 ++--
 fs/hpfs/inode.c                           |  6 +--
 fs/hpfs/namei.c                           | 26 ++++++------
 fs/hpfs/super.c                           |  5 ++-
 fs/hugetlbfs/inode.c                      | 12 +++---
 fs/inode.c                                | 12 ++++--
 fs/isofs/inode.c                          |  4 +-
 fs/isofs/rock.c                           | 16 ++++----
 fs/jffs2/dir.c                            | 19 ++++-----
 fs/jffs2/file.c                           |  3 +-
 fs/jffs2/fs.c                             | 10 ++---
 fs/jffs2/os-linux.h                       |  2 +-
 fs/jfs/acl.c                              |  2 +-
 fs/jfs/inode.c                            |  2 +-
 fs/jfs/ioctl.c                            |  2 +-
 fs/jfs/jfs_imap.c                         |  8 ++--
 fs/jfs/jfs_inode.c                        |  4 +-
 fs/jfs/namei.c                            | 25 ++++++------
 fs/jfs/super.c                            |  2 +-
 fs/jfs/xattr.c                            |  2 +-
 fs/kernfs/inode.c                         |  4 +-
 fs/libfs.c                                | 32 ++++++++-------
 fs/minix/bitmap.c                         |  2 +-
 fs/minix/dir.c                            |  6 +--
 fs/minix/inode.c                          | 11 +++---
 fs/minix/itree_common.c                   |  4 +-
 fs/minix/namei.c                          |  6 +--
 fs/nfs/callback_proc.c                    |  2 +-
 fs/nfs/fscache.h                          |  4 +-
 fs/nfs/inode.c                            | 21 +++++-----
 fs/nfsd/nfsctl.c                          |  2 +-
 fs/nfsd/nfsfh.c                           |  4 +-
 fs/nfsd/vfs.c                             |  2 +-
 fs/nilfs2/dir.c                           |  6 +--
 fs/nilfs2/inode.c                         | 12 +++---
 fs/nilfs2/ioctl.c                         |  2 +-
 fs/nilfs2/namei.c                         |  8 ++--
 fs/nsfs.c                                 |  2 +-
 fs/ntfs/inode.c                           | 15 +++----
 fs/ntfs/mft.c                             |  3 +-
 fs/ntfs3/file.c                           |  6 +--
 fs/ntfs3/frecord.c                        |  4 +-
 fs/ntfs3/inode.c                          | 14 ++++---
 fs/ntfs3/namei.c                          | 10 ++---
 fs/ntfs3/xattr.c                          |  4 +-
 fs/ocfs2/acl.c                            |  6 +--
 fs/ocfs2/alloc.c                          |  6 +--
 fs/ocfs2/aops.c                           |  2 +-
 fs/ocfs2/dir.c                            |  8 ++--
 fs/ocfs2/dlmfs/dlmfs.c                    |  4 +-
 fs/ocfs2/dlmglue.c                        | 10 +++--
 fs/ocfs2/file.c                           | 16 ++++----
 fs/ocfs2/inode.c                          | 14 ++++---
 fs/ocfs2/move_extents.c                   |  6 +--
 fs/ocfs2/namei.c                          | 22 ++++++-----
 fs/ocfs2/refcounttree.c                   | 14 +++----
 fs/ocfs2/xattr.c                          |  6 +--
 fs/omfs/dir.c                             |  4 +-
 fs/omfs/inode.c                           | 10 ++---
 fs/openpromfs/inode.c                     |  4 +-
 fs/orangefs/namei.c                       |  2 +-
 fs/orangefs/orangefs-utils.c              |  6 +--
 fs/overlayfs/file.c                       |  7 +++-
 fs/overlayfs/util.c                       |  2 +-
 fs/pipe.c                                 |  2 +-
 fs/posix_acl.c                            |  2 +-
 fs/proc/base.c                            |  2 +-
 fs/proc/inode.c                           |  2 +-
 fs/proc/proc_sysctl.c                     |  2 +-
 fs/proc/self.c                            |  2 +-
 fs/proc/thread_self.c                     |  2 +-
 fs/pstore/inode.c                         |  4 +-
 fs/qnx4/inode.c                           |  4 +-
 fs/qnx6/inode.c                           |  4 +-
 fs/ramfs/inode.c                          |  6 +--
 fs/reiserfs/inode.c                       | 14 +++----
 fs/reiserfs/ioctl.c                       |  4 +-
 fs/reiserfs/namei.c                       | 21 +++++-----
 fs/reiserfs/stree.c                       |  4 +-
 fs/reiserfs/super.c                       |  2 +-
 fs/reiserfs/xattr.c                       |  5 ++-
 fs/reiserfs/xattr_acl.c                   |  2 +-
 fs/romfs/super.c                          |  4 +-
 fs/smb/client/file.c                      |  4 +-
 fs/smb/client/fscache.h                   |  5 ++-
 fs/smb/client/inode.c                     | 15 ++++---
 fs/smb/client/smb2ops.c                   |  2 +-
 fs/smb/server/smb2pdu.c                   |  8 ++--
 fs/squashfs/inode.c                       |  2 +-
 fs/stack.c                                |  2 +-
 fs/stat.c                                 |  2 +-
 fs/sysv/dir.c                             |  6 +--
 fs/sysv/ialloc.c                          |  2 +-
 fs/sysv/inode.c                           |  6 +--
 fs/sysv/itree.c                           |  4 +-
 fs/sysv/namei.c                           |  6 +--
 fs/tracefs/inode.c                        |  2 +-
 fs/ubifs/debug.c                          |  4 +-
 fs/ubifs/dir.c                            | 39 +++++++++---------
 fs/ubifs/file.c                           | 16 ++++----
 fs/ubifs/ioctl.c                          |  2 +-
 fs/ubifs/journal.c                        |  4 +-
 fs/ubifs/super.c                          |  4 +-
 fs/ubifs/xattr.c                          |  6 +--
 fs/udf/ialloc.c                           |  2 +-
 fs/udf/inode.c                            | 17 ++++----
 fs/udf/namei.c                            | 24 +++++------
 fs/ufs/dir.c                              |  6 +--
 fs/ufs/ialloc.c                           |  2 +-
 fs/ufs/inode.c                            | 23 ++++++-----
 fs/ufs/namei.c                            |  8 ++--
 fs/vboxsf/utils.c                         |  4 +-
 fs/xfs/libxfs/xfs_inode_buf.c             |  4 +-
 fs/xfs/libxfs/xfs_trans_inode.c           |  2 +-
 fs/xfs/xfs_acl.c                          |  2 +-
 fs/xfs/xfs_bmap_util.c                    |  6 ++-
 fs/xfs/xfs_inode.c                        |  2 +-
 fs/xfs/xfs_inode_item.c                   |  2 +-
 fs/xfs/xfs_iops.c                         |  4 +-
 fs/xfs/xfs_itable.c                       |  4 +-
 fs/zonefs/super.c                         |  8 ++--
 include/linux/fs.h                        |  1 +
 include/linux/fs_stack.h                  |  2 +-
 ipc/mqueue.c                              | 20 +++++-----
 kernel/bpf/inode.c                        |  4 +-
 mm/shmem.c                                | 28 +++++++------
 net/sunrpc/rpc_pipe.c                     |  2 +-
 security/apparmor/apparmorfs.c            |  6 +--
 security/apparmor/policy_unpack.c         |  4 +-
 security/inode.c                          |  2 +-
 security/selinux/selinuxfs.c              |  2 +-
 234 files changed, 851 insertions(+), 808 deletions(-)

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-05-23 10:17     ` Jan Kara
  2023-05-23 10:56       ` Jeff Layton
@ 2023-06-13 19:47       ` Jeff Layton
  1 sibling, 0 replies; 31+ messages in thread
From: Jeff Layton @ 2023-06-13 19:47 UTC (permalink / raw)
  To: Jan Kara
  Cc: Alexander Viro, Christian Brauner, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs, linux-ext4,
	linux-mm, linux-nfs, linux-cifs

On Tue, 2023-05-23 at 12:17 +0200, Jan Kara wrote:
> On Tue 23-05-23 12:02:40, Jan Kara wrote:
> > On Thu 18-05-23 07:47:35, Jeff Layton wrote:
> > > The VFS always uses coarse-grained timestamp updates for filling out the
> > > ctime and mtime after a change. This has the benefit of allowing
> > > filesystems to optimize away a lot metadata updates, down to around 1
> > > per jiffy, even when a file is under heavy writes.
> > > 
> > > Unfortunately, this has always been an issue when we're exporting via
> > > NFSv3, which relies on timestamps to validate caches. Even with NFSv4, a
> > > lot of exported filesystems don't properly support a change attribute
> > > and are subject to the same problems with timestamp granularity. Other
> > > applications have similar issues (e.g backup applications).
> > > 
> > > Switching to always using fine-grained timestamps would improve the
> > > situation, but that becomes rather expensive, as the underlying
> > > filesystem will have to log a lot more metadata updates.
> > > 
> > > What we need is a way to only use fine-grained timestamps when they are
> > > being actively queried.
> > > 
> > > The kernel always stores normalized ctime values, so only the first 30
> > > bits of the tv_nsec field are ever used. Whenever the mtime changes, the
> > > ctime must also change.
> > > 
> > > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > > has queried the inode for the i_mtime or i_ctime. When this flag is set,
> > > on the next timestamp update, the kernel can fetch a fine-grained
> > > timestamp instead of the usual coarse-grained one.
> > > 
> > > This patch adds the infrastructure this scheme. Filesytems can opt
> > > into it by setting the FS_MULTIGRAIN_TS flag in the fstype.
> > > 
> > > Later patches will convert individual filesystems over to use it.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > 
> > So there are two things I dislike about this series because I think they
> > are fragile:
> > 
> > 1) If we have a filesystem supporting multigrain ts and someone
> > accidentally directly uses the value of inode->i_ctime, he can get bogus
> > value (with QUERIED flag). This mistake is very easy to do. So I think we
> > should rename i_ctime to something like __i_ctime and always use accessor
> > function for it.
> > 
> > 2) As I already commented in a previous version of the series, the scheme
> > with just one flag for both ctime and mtime and flag getting cleared in
> > current_time() relies on the fact that filesystems always do an equivalent
> > of:
> > 
> > 	inode->i_mtime = inode->i_ctime = current_time();
> > 
> > Otherwise we can do coarse grained update where we should have done a fine
> > grained one. Filesystems often update timestamps like this but not
> > universally. Grepping shows some instances where only inode->i_mtime is set
> > from current_time() e.g. in autofs or bfs. Again a mistake that is rather
> > easy to make and results in subtle issues. I think this would be also
> > nicely solved by renaming i_ctime to __i_ctime and using a function to set
> > ctime. Mtime could then be updated with inode->i_mtime = ctime_peek().
> > 
> > I understand this is quite some churn but a very mechanical one that could
> > be just done with Coccinelle and a few manual fixups. So IMHO it is worth
> > the more robust result.
> 
> Also as I'm thinking about it your current scheme is slightly racy. Suppose
> the filesystem does:
> 
> CPU1					CPU2
> 
> 					statx()
> inode->i_ctime = current_time()
>   current_mg_time()
>     nsec = atomic_long_fetch_andnot(QUERIED, &inode->i_ctime.tv_nsec)
> 					  nsec = atomic_long_fetch_or(QUERIED, &inode->i_ctime.tv_nsec)
>     if (nsec & QUERIED) - not set
>       ktime_get_coarse_real_ts64(&now)
>     return timestamp_truncate(now, inode);
> - QUERIED flag in the inode->i_ctime gets overwritten by the assignment
>   => we need not update ctime due to granularity although it was queried
> 
> One more reason to use explicit function to update inode->i_ctime ;)

Thinking about this some more, I think we can fix the race you pointed
out by just not clearing the queried flag when we fetch the
i_ctime.tv_nsec field when we're updating.

So, instead of atomic_long_fetch_andnot, we'd just want to use an
atomic_long_fetch there, and just let the eventual assignment of
inode->__i_ctime.tv_nsec be what clears the flag.

Any task that goes to update the time during the interim window will
fetch a fine-grained time, but that's what we want anyway.

Since you bring up races though, there are a couple of other things we
should be aware of. Note that both problems exist today too:

1) it's possible for two tasks to race in such a way that the ctime goes
backward. There's no synchronization between tasks doing the updating,
so an older time can overwrite a newer one. I think you'd need a pretty
tight race to observe this though.

2) it's possible to fetch a "torn" timestamp out of the inode.
timespec64 is two words, and we don't order its loads or stores. We
could consider adding a seqcount_t and some barriers and fixing it that
way. I'm not sure it's worth it though, given that we haven't worried
about this in the past.

For now, I propose that we ignore both problems, unless and until we can
prove that they are more than theoretical.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime
  2023-06-13 13:09         ` Jeff Layton
@ 2023-06-14  6:29           ` Christian Brauner
  0 siblings, 0 replies; 31+ messages in thread
From: Christian Brauner @ 2023-06-14  6:29 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jan Kara, Alexander Viro, Darrick J. Wong, Hugh Dickins,
	Andrew Morton, Dave Chinner, Chuck Lever, Amir Goldstein,
	David Howells, Neil Brown, Matthew Wilcox, Andreas Dilger,
	Theodore T'so, Chris Mason, Josef Bacik, David Sterba,
	Namjae Jeon, Steve French, Sergey Senozhatsky, Tom Talpey,
	linux-fsdevel, linux-kernel, linux-xfs, linux-btrfs, linux-ext4,
	linux-mm, linux-nfs, linux-cifs

On Tue, Jun 13, 2023 at 09:09:29AM -0400, Jeff Layton wrote:
> On Tue, 2023-05-23 at 14:46 +0200, Jan Kara wrote:
> > On Tue 23-05-23 06:40:08, Jeff Layton wrote:
> > > On Tue, 2023-05-23 at 12:02 +0200, Jan Kara wrote:
> > > > 
> > > > So there are two things I dislike about this series because I think they
> > > > are fragile:
> > > > 
> > > > 1) If we have a filesystem supporting multigrain ts and someone
> > > > accidentally directly uses the value of inode->i_ctime, he can get bogus
> > > > value (with QUERIED flag). This mistake is very easy to do. So I think we
> > > > should rename i_ctime to something like __i_ctime and always use accessor
> > > > function for it.
> > > > 
> > > 
> > > We could do this, but it'll be quite invasive. We'd have to change any
> > > place that touches i_ctime (and there are a lot of them), even on
> > > filesystems that are not being converted.
> > 
> > Yes, that's why I suggested Coccinelle to deal with this.
> 
> 
> I've done the work to convert all of the accesses of i_ctime into
> accessor functions in the kernel. The current state of it is here:
> 
>    
> https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/commit/?h=ctime
> 
> As expected, it touches a lot of code, all over the place. So far I have
> most of the conversion in one giant patch, and I need to split it up
> (probably per-subsystem).

Yeah, you have time since it'll be v6.6 material.

> 
> What's the best way to feed this change into mainline? Should I try to
> get subsystem maintainers to pick these up, or are we better off feeding
> this in via a separate branch?

I would prefer if we send them all through the vfs tree since trickle
down conversions are otherwise very painful and potentially very slow.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2023-06-14  6:30 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-18 11:47 [PATCH v4 0/9] fs: implement multigrain timestamps Jeff Layton
2023-05-18 11:47 ` [PATCH v4 1/9] fs: pass the request_mask to generic_fillattr Jeff Layton
2023-05-23  9:17   ` Jan Kara
2023-05-18 11:47 ` [PATCH v4 2/9] fs: add infrastructure for multigrain inode i_m/ctime Jeff Layton
2023-05-23 10:02   ` Jan Kara
2023-05-23 10:17     ` Jan Kara
2023-05-23 10:56       ` Jeff Layton
2023-05-23 11:01         ` Christian Brauner
2023-05-23 11:15           ` Jeff Layton
2023-06-13 19:47       ` Jeff Layton
2023-05-23 10:38     ` Christian Brauner
2023-05-23 10:40     ` Jeff Layton
2023-05-23 12:46       ` Jan Kara
2023-06-13 13:09         ` Jeff Layton
2023-06-14  6:29           ` Christian Brauner
2023-05-18 11:47 ` [PATCH v4 3/9] overlayfs: allow it to handle multigrain timestamps Jeff Layton
2023-05-18 11:47 ` [PATCH v4 4/9] nfsd: ensure we use ctime_peek to grab the inode->i_ctime Jeff Layton
2023-05-18 13:43   ` Chuck Lever III
2023-05-18 15:31     ` Jeff Layton
2023-05-19 10:36       ` Christian Brauner
2023-05-19 11:22         ` Jeff Layton
2023-05-18 11:47 ` [PATCH v4 5/9] ksmbd: use ctime_peek to grab the ctime out of the inode Jeff Layton
2023-05-18 11:47 ` [PATCH v4 6/9] tmpfs: add support for multigrain timestamps Jeff Layton
2023-05-18 11:47 ` [PATCH v4 7/9] xfs: switch to " Jeff Layton
2023-05-18 11:47 ` [PATCH v4 8/9] ext4: convert " Jeff Layton
2023-05-20 20:29   ` Theodore Ts'o
2023-05-18 11:47 ` [PATCH v4 9/9] btrfs: " Jeff Layton
2023-05-22  9:56   ` David Sterba
2023-05-22 10:08     ` Jeff Layton
2023-05-22 10:53       ` Christian Brauner
2023-05-22  9:54 ` [PATCH v4 0/9] fs: implement " Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).