linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes
@ 2022-08-24 16:03 Lukas Czerner
  2022-08-24 16:03 ` [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Lukas Czerner @ 2022-08-24 16:03 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, jlayton, jack, linux-fsdevel, ebiggers, david, Christian Brauner

ea_inodes are using i_version for storing part of the reference count so
we really need to leave it alone.

The problem can be reproduced by xfstest ext4/026 when iversion is
enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
inodes in ext4_mark_iloc_dirty().
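
The guard can be illustrated with a small userspace model. This is only a
sketch: struct model_inode, model_mark_dirty() and the MODEL_EA_INODE_FL value
are hypothetical stand-ins for the kernel's inode, ext4_mark_iloc_dirty() and
EXT4_EA_INODE_FL, and the counter is a plain integer rather than the real
atomic i_version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative value only; the kernel defines EXT4_EA_INODE_FL in fs/ext4/ext4.h. */
#define MODEL_EA_INODE_FL 0x200000u

struct model_inode {
	uint32_t i_flags;     /* stands in for EXT4_I(inode)->i_flags */
	uint64_t i_version;   /* change counter, or refcount bits on an ea_inode */
	bool     iversion_on; /* stands in for IS_I_VERSION(inode) */
};

/* Mirrors the guarded increment: skip inodes that reuse i_version as a refcount. */
static void model_mark_dirty(struct model_inode *inode)
{
	assert(inode != NULL);
	if (inode->iversion_on && !(inode->i_flags & MODEL_EA_INODE_FL))
		inode->i_version++;
}
```

Calling model_mark_dirty() on a regular inode bumps the counter, while an
inode carrying the ea_inode flag keeps its stored value untouched.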

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
---
v2, v3, v4: no change

 fs/ext4/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 601214453c3a..2a220be34caa 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5731,7 +5731,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 	}
 	ext4_fc_track_inode(handle, inode);
 
-	if (IS_I_VERSION(inode))
+	/*
+	 * ea_inodes are using i_version for storing reference count, don't
+	 * mess with it
+	 */
+	if (IS_I_VERSION(inode) &&
+	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
 		inode_inc_iversion(inode);
 
 	/* the do_update_inode consumes one bh->b_count */
-- 
2.37.1



* [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-24 16:03 [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes Lukas Czerner
@ 2022-08-24 16:03 ` Lukas Czerner
  2022-08-24 17:31   ` Jan Kara
  2022-08-25 10:06   ` [PATCH v5] " Lukas Czerner
  2022-08-24 16:03 ` [PATCH v4 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
  2022-09-29 14:58 ` [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes Theodore Ts'o
  2 siblings, 2 replies; 13+ messages in thread
From: Lukas Czerner @ 2022-08-24 16:03 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, jlayton, jack, linux-fsdevel, ebiggers, david, Christoph Hellwig

Currently I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE, on the assumption that I_DIRTY_INODE supersedes
I_DIRTY_TIME.  That's true, however ext4 will only update the on-disk
inode in ->dirty_inode(), not on actual writeback. As a result, if the
inode already has I_DIRTY_INODE state by the time we get to
__mark_inode_dirty() with only I_DIRTY_TIME, the time was already filled
into the on-disk inode and will not get updated until the next
I_DIRTY_INODE update, which might never come if we crash or get a power
failure.

The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that the case is properly handled in
writeback_single_inode() as well. Additionally, changes were made in
xfs_fs_dirty_inode() to accommodate I_DIRTY_TIME in flags.
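
A simplified userspace model of the resulting flag handling might look as
follows. It is a sketch under stated assumptions: locking, writeback lists and
memory barriers are left out, the M_* bit values are illustrative, and
struct model_inode, model_dirty_inode_cb() and model_mark_inode_dirty() are
hypothetical names standing in for the inode, the ->dirty_inode callback and
__mark_inode_dirty().

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values; the kernel defines the real ones in include/linux/fs.h. */
#define M_DIRTY_SYNC     (1u << 0)
#define M_DIRTY_DATASYNC (1u << 1)
#define M_DIRTY_TIME     (1u << 11)
#define M_DIRTY_INODE    (M_DIRTY_SYNC | M_DIRTY_DATASYNC)

struct model_inode {
	unsigned int i_state;       /* dirty bits currently recorded */
	unsigned int last_cb_flags; /* flags seen by the last callback */
	int          cb_calls;      /* how often the callback ran */
};

/* Stand-in for the filesystem's ->dirty_inode() callback. */
static void model_dirty_inode_cb(struct model_inode *inode, unsigned int flags)
{
	inode->last_cb_flags = flags;
	inode->cb_calls++;
}

static void model_mark_inode_dirty(struct model_inode *inode, unsigned int flags)
{
	assert(inode != NULL);
	if (flags & M_DIRTY_INODE) {
		/* A pending timestamp update piggybacks on this dirtying. */
		if (inode->i_state & M_DIRTY_TIME) {
			inode->i_state &= ~M_DIRTY_TIME;
			flags |= M_DIRTY_TIME;
		}
		model_dirty_inode_cb(inode, flags & (M_DIRTY_INODE | M_DIRTY_TIME));
		/* I_DIRTY_INODE supersedes I_DIRTY_TIME in the recorded state. */
		flags &= ~M_DIRTY_TIME;
	}
	/* With the fix, I_DIRTY_TIME is recorded even if I_DIRTY_INODE is set. */
	inode->i_state |= flags;
}
```

In this model, marking an already dirty inode with only the time bit leaves
both bits set, and the next inode dirtying hands the time bit to the callback
and clears it from the state.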

Thanks Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Suggested-by: Jan Kara <jack@suse.cz>
---
v2: Reworked according to suggestions from Jan
v3: Update documentation, add comments, change flag to flags in
    xfs_fs_dirty_inode()
v4: Update documentation, simplify condition in xfs_fs_dirty_inode()
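
For reference, the simplified xfs_fs_dirty_inode() condition can be checked
exhaustively in userspace. This is only a sketch: the M_* bit values are
illustrative, and model_xfs_logs_timestamp() and model_count_accepted() are
hypothetical helpers, not kernel functions. The precondition reflects that the
VFS masks flags to I_DIRTY_INODE | I_DIRTY_TIME before the call.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the kernel defines the real ones in include/linux/fs.h. */
#define M_DIRTY_SYNC     (1u << 0)
#define M_DIRTY_DATASYNC (1u << 1)
#define M_DIRTY_TIME     (1u << 11)
#define M_ALL            (M_DIRTY_SYNC | M_DIRTY_DATASYNC | M_DIRTY_TIME)

/* Returns true when the modelled callback would go on to log the timestamps. */
static bool model_xfs_logs_timestamp(unsigned int flags)
{
	assert(!(flags & ~M_ALL)); /* only the masked bits are ever passed */
	if ((flags & ~M_DIRTY_TIME) != M_DIRTY_SYNC || !(flags & M_DIRTY_TIME))
		return false;
	return true;
}

/* Count how many of the eight bit combinations pass the check. */
static int model_count_accepted(void)
{
	const unsigned int bits[] = { M_DIRTY_SYNC, M_DIRTY_DATASYNC, M_DIRTY_TIME };
	int accepted = 0;

	for (unsigned int mask = 0; mask < 8; mask++) {
		unsigned int flags = 0;

		for (int i = 0; i < 3; i++)
			if (mask & (1u << i))
				flags |= bits[i];
		if (model_xfs_logs_timestamp(flags))
			accepted++;
	}
	return accepted;
}
```

Only the combination of I_DIRTY_SYNC together with I_DIRTY_TIME passes in this
model; every other combination returns early.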

 Documentation/filesystems/vfs.rst |  3 +++
 fs/fs-writeback.c                 | 34 ++++++++++++++++++++-----------
 fs/xfs/xfs_super.c                | 10 +++++++--
 include/linux/fs.h                |  9 ++++----
 4 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 6cd6953e175b..b2ef2449aed9 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -274,6 +274,9 @@ or bottom half).
 	This is specifically for the inode itself being marked dirty,
 	not its data.  If the update needs to be persisted by fdatasync(),
 	then I_DIRTY_DATASYNC will be set in the flags argument.
+	I_DIRTY_TIME will be set in the flags in case lazytime is enabled
+	and struct inode has times updated since the last ->dirty_inode
+	call.
 
 ``write_inode``
 	this method is called when the VFS needs to write an inode to
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05221366a16d..638dbf143727 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1718,9 +1718,14 @@ static int writeback_single_inode(struct inode *inode,
 	 */
 	if (!(inode->i_state & I_DIRTY_ALL))
 		inode_cgwb_move_to_attached(inode, wb);
-	else if (!(inode->i_state & I_SYNC_QUEUED) &&
-		 (inode->i_state & I_DIRTY))
-		redirty_tail_locked(inode, wb);
+	else if (!(inode->i_state & I_SYNC_QUEUED)) {
+		if ((inode->i_state & I_DIRTY))
+			redirty_tail_locked(inode, wb);
+		else if (inode->i_state & I_DIRTY_TIME) {
+			inode->dirtied_when = jiffies;
+			inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+		}
+	}
 
 	spin_unlock(&wb->list_lock);
 	inode_sync_complete(inode);
@@ -2369,6 +2374,17 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	trace_writeback_mark_inode_dirty(inode, flags);
 
 	if (flags & I_DIRTY_INODE) {
+
+		/* Inode timestamp update will piggback on this dirtying */
+		if (inode->i_state & I_DIRTY_TIME) {
+			spin_lock(&inode->i_lock);
+			if (inode->i_state & I_DIRTY_TIME) {
+				inode->i_state &= ~I_DIRTY_TIME;
+				flags |= I_DIRTY_TIME;
+			}
+			spin_unlock(&inode->i_lock);
+		}
+
 		/*
 		 * Notify the filesystem about the inode being dirtied, so that
 		 * (if needed) it can update on-disk fields and journal the
@@ -2378,7 +2394,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 		 */
 		trace_writeback_dirty_inode_start(inode, flags);
 		if (sb->s_op->dirty_inode)
-			sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
+			sb->s_op->dirty_inode(inode,
+				flags & (I_DIRTY_INODE | I_DIRTY_TIME));
 		trace_writeback_dirty_inode(inode, flags);
 
 		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@@ -2399,21 +2416,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	 */
 	smp_mb();
 
-	if (((inode->i_state & flags) == flags) ||
-	    (dirtytime && (inode->i_state & I_DIRTY_INODE)))
+	if ((inode->i_state & flags) == flags)
 		return;
 
 	spin_lock(&inode->i_lock);
-	if (dirtytime && (inode->i_state & I_DIRTY_INODE))
-		goto out_unlock_inode;
 	if ((inode->i_state & flags) != flags) {
 		const int was_dirty = inode->i_state & I_DIRTY;
 
 		inode_attach_wb(inode, NULL);
 
-		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
-		if (flags & I_DIRTY_INODE)
-			inode->i_state &= ~I_DIRTY_TIME;
 		inode->i_state |= flags;
 
 		/*
@@ -2486,7 +2497,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 out_unlock:
 	if (wb)
 		spin_unlock(&wb->list_lock);
-out_unlock_inode:
 	spin_unlock(&inode->i_lock);
 }
 EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 9ac59814bbb6..f029c6702dda 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -653,7 +653,7 @@ xfs_fs_destroy_inode(
 static void
 xfs_fs_dirty_inode(
 	struct inode			*inode,
-	int				flag)
+	int				flags)
 {
 	struct xfs_inode		*ip = XFS_I(inode);
 	struct xfs_mount		*mp = ip->i_mount;
@@ -661,7 +661,13 @@ xfs_fs_dirty_inode(
 
 	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
 		return;
-	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+
+	/*
+	 * Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC)
+	 * and has dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can be passed
+	 * in flags possibly together with I_DIRTY_SYNC.
+	 */
+	if ((flags & ~I_DIRTY_TIME) != I_DIRTY_SYNC || !(flags & I_DIRTY_TIME))
 		return;
 
 	if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eced4cc286e..56a4b4b02477 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2371,13 +2371,14 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *			don't have to write inode on fdatasync() when only
  *			e.g. the timestamps have changed.
  * I_DIRTY_PAGES	Inode has dirty pages.  Inode itself may be clean.
- * I_DIRTY_TIME		The inode itself only has dirty timestamps, and the
+ * I_DIRTY_TIME		The inode itself has dirty timestamps, and the
  *			lazytime mount option is enabled.  We keep track of this
  *			separately from I_DIRTY_SYNC in order to implement
  *			lazytime.  This gets cleared if I_DIRTY_INODE
- *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
- *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- *			i_state, but not both.  I_DIRTY_PAGES may still be set.
+ *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
+ *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ *			in place because writeback might already be in progress
+ *			and we don't want to lose the time update
  * I_NEW		Serves as both a mutex and completion notification.
  *			New inodes set I_NEW.  If two processes both create
  *			the same inode, one of them will release its inode and
-- 
2.37.1



* [PATCH v4 3/3] ext4: unconditionally enable the i_version counter
  2022-08-24 16:03 [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes Lukas Czerner
  2022-08-24 16:03 ` [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
@ 2022-08-24 16:03 ` Lukas Czerner
  2022-08-26 16:11   ` Jeff Layton
  2022-09-29 14:58 ` [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes Theodore Ts'o
  2 siblings, 1 reply; 13+ messages in thread
From: Lukas Czerner @ 2022-08-24 16:03 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, jlayton, jack, linux-fsdevel, ebiggers, david,
	Benjamin Coddington, Christoph Hellwig, Darrick J . Wong,
	Christian Brauner

From: Jeff Layton <jlayton@kernel.org>

The original i_version implementation was pretty expensive, requiring a
log flush on every change. Because of this, it was gated behind a mount
option (implemented via the MS_I_VERSION mount option flag).

Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
i_version flag much less expensive, so there is no longer a performance
penalty from enabling it. xfs and btrfs already enable it
unconditionally when the on-disk format can support it.
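
The idea behind the cheaper API can be sketched in userspace roughly as
follows. This is a simplified model based on my understanding of
include/linux/iversion.h, not something spelled out in this thread: the low
bit of the counter records whether anyone has queried it, and an increment is
skipped unless that bit is set or the caller forces it. The M_* constants and
model_* names are hypothetical, and the real code uses atomic operations
instead of plain loads and stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define M_QUERIED   1ull /* low bit: the counter has been read since the last bump */
#define M_INCREMENT 2ull /* the counter proper lives above the flag bit */

struct model_inode {
	uint64_t i_version;
};

/* Bump the counter only when forced or when the current value was queried. */
static bool model_maybe_inc_iversion(struct model_inode *inode, bool force)
{
	uint64_t cur;

	assert(inode != NULL);
	cur = inode->i_version;
	if (!force && !(cur & M_QUERIED))
		return false; /* nobody looked at this value, so skip the bump */
	inode->i_version = (cur & ~M_QUERIED) + M_INCREMENT;
	return true;
}

/* Read the counter and flag it as queried so the next change is recorded. */
static uint64_t model_query_iversion(struct model_inode *inode)
{
	assert(inode != NULL);
	inode->i_version |= M_QUERIED;
	return inode->i_version >> 1;
}
```

In this model, repeated changes between two queries cost at most one
increment, which is why leaving the counter always on is expected to be cheap.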

Have ext4 ignore the SB_I_VERSION flag, and just enable it
unconditionally. While we're in here, remove the handling of
Opt_i_version as well, since we're almost to 5.20 anyway.

Ideally, we'd couple this change with a way to disable the i_version
counter (just in case), but the way the iversion mount option was
implemented makes that difficult to do. We'd need to add a new mount
option altogether or do something with tune2fs. That's probably best
left to later patches if it turns out to be needed.

[ Removed leftover bits of i_version from ext4_apply_options() since it
now can't ever be set in ctx->mask_s_flags -- lczerner ]

Cc: Dave Chinner <david@fromorbit.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
v3: Removed leftover bits of i_version from ext4_apply_options
v4: no change

 fs/ext4/inode.c |  5 ++---
 fs/ext4/super.c | 21 ++++-----------------
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2a220be34caa..c77d40f05763 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 			return -EINVAL;
 		}
 
-		if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
+		if (attr->ia_size != inode->i_size)
 			inode_inc_iversion(inode);
 
 		if (shrink) {
@@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 	 * ea_inodes are using i_version for storing reference count, don't
 	 * mess with it
 	 */
-	if (IS_I_VERSION(inode) &&
-	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+	if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
 		inode_inc_iversion(inode);
 
 	/* the do_update_inode consumes one bh->b_count */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9a66abcca1a8..1c953f6d400e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1585,7 +1585,7 @@ enum {
 	Opt_inlinecrypt,
 	Opt_usrjquota, Opt_grpjquota, Opt_quota,
 	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
-	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
+	Opt_usrquota, Opt_grpquota, Opt_prjquota,
 	Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
 	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
 	Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
@@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
 	fsparam_flag	("barrier",		Opt_barrier),
 	fsparam_u32	("barrier",		Opt_barrier),
 	fsparam_flag	("nobarrier",		Opt_nobarrier),
-	fsparam_flag	("i_version",		Opt_i_version),
 	fsparam_flag	("dax",			Opt_dax),
 	fsparam_enum	("dax",			Opt_dax_type, ext4_param_dax),
 	fsparam_u32	("stripe",		Opt_stripe),
@@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case Opt_abort:
 		ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
 		return 0;
-	case Opt_i_version:
-		ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
-		ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
-		ctx_set_flags(ctx, SB_I_VERSION);
-		return 0;
 	case Opt_inlinecrypt:
 #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
 		ctx_set_flags(ctx, SB_INLINECRYPT);
@@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
 	sb->s_flags &= ~ctx->mask_s_flags;
 	sb->s_flags |= ctx->vals_s_flags;
 
-	/*
-	 * i_version differs from common mount option iversion so we have
-	 * to let vfs know that it was set, otherwise it would get cleared
-	 * on remount
-	 */
-	if (ctx->mask_s_flags & SB_I_VERSION)
-		fc->sb_flags |= SB_I_VERSION;
-
 #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
 	APPLY(s_commit_interval);
 	APPLY(s_stripe);
@@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
 		SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
 	if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
 		SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
-	if (sb->s_flags & SB_I_VERSION)
-		SEQ_OPTS_PUTS("i_version");
 	if (nodefs || sbi->s_stripe)
 		SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
 	if (nodefs || EXT4_MOUNT_DATA_FLAGS &
@@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
 		(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
 
+	/* i_version is always enabled now */
+	sb->s_flags |= SB_I_VERSION;
+
 	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
 	    (ext4_has_compat_features(sb) ||
 	     ext4_has_ro_compat_features(sb) ||
-- 
2.37.1



* Re: [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-24 16:03 ` [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
@ 2022-08-24 17:31   ` Jan Kara
  2022-08-25 10:06   ` [PATCH v5] " Lukas Czerner
  1 sibling, 0 replies; 13+ messages in thread
From: Jan Kara @ 2022-08-24 17:31 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, tytso, jlayton, jack, linux-fsdevel, ebiggers, david,
	Christoph Hellwig

On Wed 24-08-22 18:03:48, Lukas Czerner wrote:
> Currently the I_DIRTY_TIME will never get set if the inode already has
> I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME.  That's
> true, however ext4 will only update the on-disk inode in
> ->dirty_inode(), not on actual writeback. As a result if the inode
> already has I_DIRTY_INODE state by the time we get to
> __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> into on-disk inode and will not get updated until the next I_DIRTY_INODE
> update, which might never come if we crash or get a power failure.
> 
> The problem can be reproduced on ext4 by running xfstest generic/622
> with -o iversion mount option.
> 
> Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> I_DIRTY_INODE. Also make sure that the case is properly handled in
> writeback_single_inode() as well. Additionally changes in
> xfs_fs_dirty_inode() was made to accommodate for I_DIRTY_TIME in flag.
> 
> Thanks Jan Kara for suggestions on how to make this work properly.
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Suggested-by: Jan Kara <jack@suse.cz>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

Just two nits below:

> @@ -2369,6 +2374,17 @@ void __mark_inode_dirty(struct inode *inode, int flags)
>  	trace_writeback_mark_inode_dirty(inode, flags);
>  
>  	if (flags & I_DIRTY_INODE) {
> +

Pointless empty line here.

> +		/* Inode timestamp update will piggback on this dirtying */

Maybe expand this comment to:

		/*
		 * Inode timestamp update will piggyback on this dirtying.
		 * We tell ->dirty_inode callback that timestamps need to
		 * be updated by setting I_DIRTY_TIME in flags.
		 */
> +		if (inode->i_state & I_DIRTY_TIME) {
> +			spin_lock(&inode->i_lock);
> +			if (inode->i_state & I_DIRTY_TIME) {
> +				inode->i_state &= ~I_DIRTY_TIME;
> +				flags |= I_DIRTY_TIME;
> +			}
> +			spin_unlock(&inode->i_lock);
> +		}
> +

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* [PATCH v5] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-24 16:03 ` [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
  2022-08-24 17:31   ` Jan Kara
@ 2022-08-25 10:06   ` Lukas Czerner
  2022-09-29 14:58     ` Theodore Ts'o
  1 sibling, 1 reply; 13+ messages in thread
From: Lukas Czerner @ 2022-08-25 10:06 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, jlayton, jack, linux-fsdevel, ebiggers, david, Christoph Hellwig

Currently I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE, on the assumption that I_DIRTY_INODE supersedes
I_DIRTY_TIME.  That's true, however ext4 will only update the on-disk
inode in ->dirty_inode(), not on actual writeback. As a result, if the
inode already has I_DIRTY_INODE state by the time we get to
__mark_inode_dirty() with only I_DIRTY_TIME, the time was already filled
into the on-disk inode and will not get updated until the next
I_DIRTY_INODE update, which might never come if we crash or get a power
failure.

The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that the case is properly handled in
writeback_single_inode() as well. Additionally, changes were made in
xfs_fs_dirty_inode() to accommodate I_DIRTY_TIME in flags.

Thanks Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
---
v2: Reworked according to suggestions from Jan
v3: Update documentation, add comments, change flag to flags in
    xfs_fs_dirty_inode()
v4: Update documentation, simplify condition in xfs_fs_dirty_inode()
v5: Update comment for condition in __mark_inode_dirty()


 Documentation/filesystems/vfs.rst |  3 +++
 fs/fs-writeback.c                 | 37 +++++++++++++++++++++----------
 fs/xfs/xfs_super.c                | 10 +++++++--
 include/linux/fs.h                |  9 ++++----
 4 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 6cd6953e175b..b2ef2449aed9 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -274,6 +274,9 @@ or bottom half).
 	This is specifically for the inode itself being marked dirty,
 	not its data.  If the update needs to be persisted by fdatasync(),
 	then I_DIRTY_DATASYNC will be set in the flags argument.
+	I_DIRTY_TIME will be set in the flags in case lazytime is enabled
+	and struct inode has times updated since the last ->dirty_inode
+	call.
 
 ``write_inode``
 	this method is called when the VFS needs to write an inode to
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05221366a16d..45860591d51f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1718,9 +1718,14 @@ static int writeback_single_inode(struct inode *inode,
 	 */
 	if (!(inode->i_state & I_DIRTY_ALL))
 		inode_cgwb_move_to_attached(inode, wb);
-	else if (!(inode->i_state & I_SYNC_QUEUED) &&
-		 (inode->i_state & I_DIRTY))
-		redirty_tail_locked(inode, wb);
+	else if (!(inode->i_state & I_SYNC_QUEUED)) {
+		if ((inode->i_state & I_DIRTY))
+			redirty_tail_locked(inode, wb);
+		else if (inode->i_state & I_DIRTY_TIME) {
+			inode->dirtied_when = jiffies;
+			inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+		}
+	}
 
 	spin_unlock(&wb->list_lock);
 	inode_sync_complete(inode);
@@ -2369,6 +2374,20 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	trace_writeback_mark_inode_dirty(inode, flags);
 
 	if (flags & I_DIRTY_INODE) {
+		/*
+		 * Inode timestamp update will piggyback on this dirtying.
+		 * We tell ->dirty_inode callback that timestamps need to
+		 * be updated by setting I_DIRTY_TIME in flags.
+		 */
+		if (inode->i_state & I_DIRTY_TIME) {
+			spin_lock(&inode->i_lock);
+			if (inode->i_state & I_DIRTY_TIME) {
+				inode->i_state &= ~I_DIRTY_TIME;
+				flags |= I_DIRTY_TIME;
+			}
+			spin_unlock(&inode->i_lock);
+		}
+
 		/*
 		 * Notify the filesystem about the inode being dirtied, so that
 		 * (if needed) it can update on-disk fields and journal the
@@ -2378,7 +2397,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 		 */
 		trace_writeback_dirty_inode_start(inode, flags);
 		if (sb->s_op->dirty_inode)
-			sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
+			sb->s_op->dirty_inode(inode,
+				flags & (I_DIRTY_INODE | I_DIRTY_TIME));
 		trace_writeback_dirty_inode(inode, flags);
 
 		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@@ -2399,21 +2419,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	 */
 	smp_mb();
 
-	if (((inode->i_state & flags) == flags) ||
-	    (dirtytime && (inode->i_state & I_DIRTY_INODE)))
+	if ((inode->i_state & flags) == flags)
 		return;
 
 	spin_lock(&inode->i_lock);
-	if (dirtytime && (inode->i_state & I_DIRTY_INODE))
-		goto out_unlock_inode;
 	if ((inode->i_state & flags) != flags) {
 		const int was_dirty = inode->i_state & I_DIRTY;
 
 		inode_attach_wb(inode, NULL);
 
-		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
-		if (flags & I_DIRTY_INODE)
-			inode->i_state &= ~I_DIRTY_TIME;
 		inode->i_state |= flags;
 
 		/*
@@ -2486,7 +2500,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 out_unlock:
 	if (wb)
 		spin_unlock(&wb->list_lock);
-out_unlock_inode:
 	spin_unlock(&inode->i_lock);
 }
 EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 9ac59814bbb6..f029c6702dda 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -653,7 +653,7 @@ xfs_fs_destroy_inode(
 static void
 xfs_fs_dirty_inode(
 	struct inode			*inode,
-	int				flag)
+	int				flags)
 {
 	struct xfs_inode		*ip = XFS_I(inode);
 	struct xfs_mount		*mp = ip->i_mount;
@@ -661,7 +661,13 @@ xfs_fs_dirty_inode(
 
 	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
 		return;
-	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+
+	/*
+	 * Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC)
+	 * and has dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can be passed
+	 * in flags possibly together with I_DIRTY_SYNC.
+	 */
+	if ((flags & ~I_DIRTY_TIME) != I_DIRTY_SYNC || !(flags & I_DIRTY_TIME))
 		return;
 
 	if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eced4cc286e..56a4b4b02477 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2371,13 +2371,14 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *			don't have to write inode on fdatasync() when only
  *			e.g. the timestamps have changed.
  * I_DIRTY_PAGES	Inode has dirty pages.  Inode itself may be clean.
- * I_DIRTY_TIME		The inode itself only has dirty timestamps, and the
+ * I_DIRTY_TIME		The inode itself has dirty timestamps, and the
  *			lazytime mount option is enabled.  We keep track of this
  *			separately from I_DIRTY_SYNC in order to implement
  *			lazytime.  This gets cleared if I_DIRTY_INODE
- *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
- *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- *			i_state, but not both.  I_DIRTY_PAGES may still be set.
+ *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
+ *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ *			in place because writeback might already be in progress
+ *			and we don't want to lose the time update
  * I_NEW		Serves as both a mutex and completion notification.
  *			New inodes set I_NEW.  If two processes both create
  *			the same inode, one of them will release its inode and
-- 
2.37.1



* Re: [PATCH v4 3/3] ext4: unconditionally enable the i_version counter
  2022-08-24 16:03 ` [PATCH v4 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
@ 2022-08-26 16:11   ` Jeff Layton
  2022-08-29  8:17     ` Lukas Czerner
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2022-08-26 16:11 UTC (permalink / raw)
  To: Lukas Czerner, linux-ext4
  Cc: tytso, jack, linux-fsdevel, ebiggers, david, Benjamin Coddington,
	Christoph Hellwig, Darrick J . Wong, Christian Brauner

On Wed, 2022-08-24 at 18:03 +0200, Lukas Czerner wrote:
> From: Jeff Layton <jlayton@kernel.org>
> 
> The original i_version implementation was pretty expensive, requiring a
> log flush on every change. Because of this, it was gated behind a mount
> option (implemented via the MS_I_VERSION mountoption flag).
> 
> Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
> i_version flag much less expensive, so there is no longer a performance
> penalty from enabling it. xfs and btrfs already enable it
> unconditionally when the on-disk format can support it.
> 
> Have ext4 ignore the SB_I_VERSION flag, and just enable it
> unconditionally. While we're in here, remove the handling of
> Opt_i_version as well, since we're almost to 5.20 anyway.
> 
> Ideally, we'd couple this change with a way to disable the i_version
> counter (just in case), but the way the iversion mount option was
> implemented makes that difficult to do. We'd need to add a new mount
> option altogether or do something with tune2fs. That's probably best
> left to later patches if it turns out to be needed.
> 
> [ Removed leftover bits of i_version from ext4_apply_options() since it
> now can't ever be set in ctx->mask_s_flags -- lczerner ]
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> Reviewed-by: Jan Kara <jack@suse.cz>
> ---
> v3: Removed leftover bits of i_version from ext4_apply_options
> v4: no change
> 
>  fs/ext4/inode.c |  5 ++---
>  fs/ext4/super.c | 21 ++++-----------------
>  2 files changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2a220be34caa..c77d40f05763 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
>  			return -EINVAL;
>  		}
>  
> -		if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
> +		if (attr->ia_size != inode->i_size)
>  			inode_inc_iversion(inode);
>  
>  		if (shrink) {
> @@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
>  	 * ea_inodes are using i_version for storing reference count, don't
>  	 * mess with it
>  	 */
> -	if (IS_I_VERSION(inode) &&
> -	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> +	if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
>  		inode_inc_iversion(inode);
>  
>  	/* the do_update_inode consumes one bh->b_count */
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9a66abcca1a8..1c953f6d400e 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1585,7 +1585,7 @@ enum {
>  	Opt_inlinecrypt,
>  	Opt_usrjquota, Opt_grpjquota, Opt_quota,
>  	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> -	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> +	Opt_usrquota, Opt_grpquota, Opt_prjquota,
>  	Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
>  	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
>  	Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
> @@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
>  	fsparam_flag	("barrier",		Opt_barrier),
>  	fsparam_u32	("barrier",		Opt_barrier),
>  	fsparam_flag	("nobarrier",		Opt_nobarrier),
> -	fsparam_flag	("i_version",		Opt_i_version),
>  	fsparam_flag	("dax",			Opt_dax),
>  	fsparam_enum	("dax",			Opt_dax_type, ext4_param_dax),
>  	fsparam_u32	("stripe",		Opt_stripe),
> @@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
>  	case Opt_abort:
>  		ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
>  		return 0;
> -	case Opt_i_version:
> -		ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
> -		ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
> -		ctx_set_flags(ctx, SB_I_VERSION);
> -		return 0;
>  	case Opt_inlinecrypt:
>  #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
>  		ctx_set_flags(ctx, SB_INLINECRYPT);
> @@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
>  	sb->s_flags &= ~ctx->mask_s_flags;
>  	sb->s_flags |= ctx->vals_s_flags;
>  
> -	/*
> -	 * i_version differs from common mount option iversion so we have
> -	 * to let vfs know that it was set, otherwise it would get cleared
> -	 * on remount
> -	 */
> -	if (ctx->mask_s_flags & SB_I_VERSION)
> -		fc->sb_flags |= SB_I_VERSION;
> -
>  #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
>  	APPLY(s_commit_interval);
>  	APPLY(s_stripe);
> @@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
>  		SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
>  	if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
>  		SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
> -	if (sb->s_flags & SB_I_VERSION)
> -		SEQ_OPTS_PUTS("i_version");
>  	if (nodefs || sbi->s_stripe)
>  		SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
>  	if (nodefs || EXT4_MOUNT_DATA_FLAGS &
> @@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
>  	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
>  		(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
>  
> +	/* i_version is always enabled now */
> +	sb->s_flags |= SB_I_VERSION;
> +
>  	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
>  	    (ext4_has_compat_features(sb) ||
>  	     ext4_has_ro_compat_features(sb) ||

Hi Lukas,

I know I had originally asked you to shepherd this patch into mainline,
but I think it may be better to wait on it for now. Since then, we've
found out that ext4 bumps the i_version counter on atime updates. It'd
be best to get that fixed before we turn this on unconditionally, since
it could cause a performance regression in some cases. If that sounds
ok to you, I'll plan to pick this back up for my latest i_version
series.
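
To make the atime concern concrete, here is a minimal userspace sketch.
It is a toy model, not the ext4 code: every name in it (toy_inode,
TOY_CHANGE_ATIME, toy_mark_dirty_always, toy_mark_dirty_skip_atime) is
hypothetical and only illustrates the difference between bumping the
counter on every dirtying event and skipping pure atime updates.

```c
#include <stdint.h>

/* Toy model only: these names are hypothetical, not kernel API. */
enum toy_change {
	TOY_CHANGE_ATIME = 1 << 0,	/* access time only */
	TOY_CHANGE_DATA  = 1 << 1,	/* data or other metadata */
};

struct toy_inode {
	uint64_t version;	/* stands in for i_version */
};

/* Behaviour described above: every dirtying event bumps the counter. */
static void toy_mark_dirty_always(struct toy_inode *inode, unsigned int what)
{
	(void)what;
	inode->version++;
}

/* Assumed goal: a pure atime update leaves the counter alone. */
static void toy_mark_dirty_skip_atime(struct toy_inode *inode, unsigned int what)
{
	if (what & ~(unsigned int)TOY_CHANGE_ATIME)
		inode->version++;
}
```

With the first variant, a read-mostly workload would change the counter
on each atime update, which is the kind of extra churn that could show
up as a regression once the counter is always on.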

Sorry for the back and forth, and thanks again!

Cheers,
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v4 3/3] ext4: unconditionally enable the i_version counter
  2022-08-26 16:11   ` Jeff Layton
@ 2022-08-29  8:17     ` Lukas Czerner
  2022-08-29 10:16       ` Jeff Layton
  0 siblings, 1 reply; 13+ messages in thread
From: Lukas Czerner @ 2022-08-29  8:17 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-ext4, tytso, jack, linux-fsdevel, ebiggers, david,
	Benjamin Coddington, Christoph Hellwig, Darrick J . Wong,
	Christian Brauner

On Fri, Aug 26, 2022 at 12:11:23PM -0400, Jeff Layton wrote:
> On Wed, 2022-08-24 at 18:03 +0200, Lukas Czerner wrote:
> > From: Jeff Layton <jlayton@kernel.org>
> > 
> > The original i_version implementation was pretty expensive, requiring a
> > log flush on every change. Because of this, it was gated behind a mount
> > option (implemented via the MS_I_VERSION mount option flag).
> > 
> > Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
> > i_version flag much less expensive, so there is no longer a performance
> > penalty from enabling it. xfs and btrfs already enable it
> > unconditionally when the on-disk format can support it.
> > 
> > Have ext4 ignore the SB_I_VERSION flag, and just enable it
> > unconditionally. While we're in here, remove the handling of
> > Opt_i_version as well, since we're almost to 5.20 anyway.
> > 
> > Ideally, we'd couple this change with a way to disable the i_version
> > counter (just in case), but the way the iversion mount option was
> > implemented makes that difficult to do. We'd need to add a new mount
> > option altogether or do something with tune2fs. That's probably best
> > left to later patches if it turns out to be needed.
> > 
> > [ Removed leftover bits of i_version from ext4_apply_options() since it
> > now can't ever be set in ctx->mask_s_flags -- lczerner ]
> > 
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Benjamin Coddington <bcodding@redhat.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Darrick J. Wong <djwong@kernel.org>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > ---
> > v3: Removed leftover bits of i_version from ext4_apply_options
> > v4: no change
> > 
> >  fs/ext4/inode.c |  5 ++---
> >  fs/ext4/super.c | 21 ++++-----------------
> >  2 files changed, 6 insertions(+), 20 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 2a220be34caa..c77d40f05763 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> >  			return -EINVAL;
> >  		}
> >  
> > -		if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
> > +		if (attr->ia_size != inode->i_size)
> >  			inode_inc_iversion(inode);
> >  
> >  		if (shrink) {
> > @@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> >  	 * ea_inodes are using i_version for storing reference count, don't
> >  	 * mess with it
> >  	 */
> > -	if (IS_I_VERSION(inode) &&
> > -	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> > +	if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> >  		inode_inc_iversion(inode);
> >  
> >  	/* the do_update_inode consumes one bh->b_count */
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 9a66abcca1a8..1c953f6d400e 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -1585,7 +1585,7 @@ enum {
> >  	Opt_inlinecrypt,
> >  	Opt_usrjquota, Opt_grpjquota, Opt_quota,
> >  	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> > -	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> > +	Opt_usrquota, Opt_grpquota, Opt_prjquota,
> >  	Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
> >  	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
> >  	Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
> > @@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
> >  	fsparam_flag	("barrier",		Opt_barrier),
> >  	fsparam_u32	("barrier",		Opt_barrier),
> >  	fsparam_flag	("nobarrier",		Opt_nobarrier),
> > -	fsparam_flag	("i_version",		Opt_i_version),
> >  	fsparam_flag	("dax",			Opt_dax),
> >  	fsparam_enum	("dax",			Opt_dax_type, ext4_param_dax),
> >  	fsparam_u32	("stripe",		Opt_stripe),
> > @@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
> >  	case Opt_abort:
> >  		ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
> >  		return 0;
> > -	case Opt_i_version:
> > -		ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
> > -		ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
> > -		ctx_set_flags(ctx, SB_I_VERSION);
> > -		return 0;
> >  	case Opt_inlinecrypt:
> >  #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
> >  		ctx_set_flags(ctx, SB_INLINECRYPT);
> > @@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
> >  	sb->s_flags &= ~ctx->mask_s_flags;
> >  	sb->s_flags |= ctx->vals_s_flags;
> >  
> > -	/*
> > -	 * i_version differs from common mount option iversion so we have
> > -	 * to let vfs know that it was set, otherwise it would get cleared
> > -	 * on remount
> > -	 */
> > -	if (ctx->mask_s_flags & SB_I_VERSION)
> > -		fc->sb_flags |= SB_I_VERSION;
> > -
> >  #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
> >  	APPLY(s_commit_interval);
> >  	APPLY(s_stripe);
> > @@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
> >  		SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
> >  	if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
> >  		SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
> > -	if (sb->s_flags & SB_I_VERSION)
> > -		SEQ_OPTS_PUTS("i_version");
> >  	if (nodefs || sbi->s_stripe)
> >  		SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
> >  	if (nodefs || EXT4_MOUNT_DATA_FLAGS &
> > @@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> >  	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
> >  		(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
> >  
> > +	/* i_version is always enabled now */
> > +	sb->s_flags |= SB_I_VERSION;
> > +
> >  	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
> >  	    (ext4_has_compat_features(sb) ||
> >  	     ext4_has_ro_compat_features(sb) ||
> 
> Hi Lukas,
> 
> I know I had originally asked you to shepherd this patch into mainline,
> but I think it may be better to wait on it for now. Since then, we've
> found out that ext4 bumps the i_version counter on atime updates. It'd
> be best to get that fixed before we turn this on unconditionally, since
> it could cause a performance regression in some cases. If that sounds
> ok to you, I'll plan to pick this back up for my latest i_version
> series.
> 
> Sorry for the back and forth, and thanks again!

Hi Jeff,

sure, no problem. I can drop the patch. The rest of the series is still
valid though.

Thanks!
-Lukas

> 
> Cheers,
> -- 
> Jeff Layton <jlayton@kernel.org>
> 



* Re: [PATCH v4 3/3] ext4: unconditionally enable the i_version counter
  2022-08-29  8:17     ` Lukas Czerner
@ 2022-08-29 10:16       ` Jeff Layton
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2022-08-29 10:16 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, tytso, jack, linux-fsdevel, ebiggers, david,
	Benjamin Coddington, Christoph Hellwig, Darrick J . Wong,
	Christian Brauner

On Mon, 2022-08-29 at 10:17 +0200, Lukas Czerner wrote:
> On Fri, Aug 26, 2022 at 12:11:23PM -0400, Jeff Layton wrote:
> > On Wed, 2022-08-24 at 18:03 +0200, Lukas Czerner wrote:
> > > From: Jeff Layton <jlayton@kernel.org>
> > > 
> > > The original i_version implementation was pretty expensive, requiring a
> > > log flush on every change. Because of this, it was gated behind a mount
> > > option (implemented via the MS_I_VERSION mount option flag).
> > > 
> > > Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
> > > i_version flag much less expensive, so there is no longer a performance
> > > penalty from enabling it. xfs and btrfs already enable it
> > > unconditionally when the on-disk format can support it.
> > > 
> > > Have ext4 ignore the SB_I_VERSION flag, and just enable it
> > > unconditionally. While we're in here, remove the handling of
> > > Opt_i_version as well, since we're almost to 5.20 anyway.
> > > 
> > > Ideally, we'd couple this change with a way to disable the i_version
> > > counter (just in case), but the way the iversion mount option was
> > > implemented makes that difficult to do. We'd need to add a new mount
> > > option altogether or do something with tune2fs. That's probably best
> > > left to later patches if it turns out to be needed.
> > > 
> > > [ Removed leftover bits of i_version from ext4_apply_options() since it
> > > now can't ever be set in ctx->mask_s_flags -- lczerner ]
> > > 
> > > Cc: Dave Chinner <david@fromorbit.com>
> > > Cc: Benjamin Coddington <bcodding@redhat.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Darrick J. Wong <djwong@kernel.org>
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > > Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > ---
> > > v3: Removed leftover bits of i_version from ext4_apply_options
> > > v4: no change
> > > 
> > >  fs/ext4/inode.c |  5 ++---
> > >  fs/ext4/super.c | 21 ++++-----------------
> > >  2 files changed, 6 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index 2a220be34caa..c77d40f05763 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
> > >  			return -EINVAL;
> > >  		}
> > >  
> > > -		if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
> > > +		if (attr->ia_size != inode->i_size)
> > >  			inode_inc_iversion(inode);
> > >  
> > >  		if (shrink) {
> > > @@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> > >  	 * ea_inodes are using i_version for storing reference count, don't
> > >  	 * mess with it
> > >  	 */
> > > -	if (IS_I_VERSION(inode) &&
> > > -	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> > > +	if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> > >  		inode_inc_iversion(inode);
> > >  
> > >  	/* the do_update_inode consumes one bh->b_count */
> > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > index 9a66abcca1a8..1c953f6d400e 100644
> > > --- a/fs/ext4/super.c
> > > +++ b/fs/ext4/super.c
> > > @@ -1585,7 +1585,7 @@ enum {
> > >  	Opt_inlinecrypt,
> > >  	Opt_usrjquota, Opt_grpjquota, Opt_quota,
> > >  	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> > > -	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> > > +	Opt_usrquota, Opt_grpquota, Opt_prjquota,
> > >  	Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
> > >  	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
> > >  	Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
> > > @@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
> > >  	fsparam_flag	("barrier",		Opt_barrier),
> > >  	fsparam_u32	("barrier",		Opt_barrier),
> > >  	fsparam_flag	("nobarrier",		Opt_nobarrier),
> > > -	fsparam_flag	("i_version",		Opt_i_version),
> > >  	fsparam_flag	("dax",			Opt_dax),
> > >  	fsparam_enum	("dax",			Opt_dax_type, ext4_param_dax),
> > >  	fsparam_u32	("stripe",		Opt_stripe),
> > > @@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > >  	case Opt_abort:
> > >  		ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
> > >  		return 0;
> > > -	case Opt_i_version:
> > > -		ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
> > > -		ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
> > > -		ctx_set_flags(ctx, SB_I_VERSION);
> > > -		return 0;
> > >  	case Opt_inlinecrypt:
> > >  #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
> > >  		ctx_set_flags(ctx, SB_INLINECRYPT);
> > > @@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
> > >  	sb->s_flags &= ~ctx->mask_s_flags;
> > >  	sb->s_flags |= ctx->vals_s_flags;
> > >  
> > > -	/*
> > > -	 * i_version differs from common mount option iversion so we have
> > > -	 * to let vfs know that it was set, otherwise it would get cleared
> > > -	 * on remount
> > > -	 */
> > > -	if (ctx->mask_s_flags & SB_I_VERSION)
> > > -		fc->sb_flags |= SB_I_VERSION;
> > > -
> > >  #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
> > >  	APPLY(s_commit_interval);
> > >  	APPLY(s_stripe);
> > > @@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
> > >  		SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
> > >  	if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
> > >  		SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
> > > -	if (sb->s_flags & SB_I_VERSION)
> > > -		SEQ_OPTS_PUTS("i_version");
> > >  	if (nodefs || sbi->s_stripe)
> > >  		SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
> > >  	if (nodefs || EXT4_MOUNT_DATA_FLAGS &
> > > @@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> > >  	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
> > >  		(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
> > >  
> > > +	/* i_version is always enabled now */
> > > +	sb->s_flags |= SB_I_VERSION;
> > > +
> > >  	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
> > >  	    (ext4_has_compat_features(sb) ||
> > >  	     ext4_has_ro_compat_features(sb) ||
> > 
> > Hi Lukas,
> > 
> > I know I had originally asked you to shepherd this patch into mainline,
> > but I think it may be better to wait on it for now. Since then, we've
> > found out that ext4 bumps the i_version counter on atime updates. It'd
> > be best to get that fixed before we turn this on unconditionally, since
> > it could cause a performance regression in some cases. If that sounds
> > ok to you, I'll plan to pick this back up for my latest i_version
> > series.
> > 
> > Sorry for the back and forth, and thanks again!
> 
> Hi Jeff,
> 
> sure, no problem. I can drop the patch. The rest of the series is still
> valid though.
> 
> Thanks!
> -Lukas
> 
> 

Yes, the rest is fine (AFAICT)

Thanks!
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes
  2022-08-24 16:03 [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes Lukas Czerner
  2022-08-24 16:03 ` [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
  2022-08-24 16:03 ` [PATCH v4 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
@ 2022-09-29 14:58 ` Theodore Ts'o
  2 siblings, 0 replies; 13+ messages in thread
From: Theodore Ts'o @ 2022-09-29 14:58 UTC (permalink / raw)
  To: linux-ext4, lczerner
  Cc: Theodore Ts'o, jlayton, ebiggers, jack, david, brauner,
	linux-fsdevel

On Wed, 24 Aug 2022 18:03:47 +0200, Lukas Czerner wrote:
> ea_inodes are using i_version for storing part of the reference count so
> we really need to leave it alone.
> 
> The problem can be reproduced by xfstest ext4/026 when iversion is
> enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> inodes in ext4_mark_iloc_dirty().
> 
> [...]
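
The guard described in the quoted text can be sketched in userspace as
follows. This is a simplified model with hypothetical names
(sketch_inode, SKETCH_EA_INODE_FL, sketch_mark_iloc_dirty), assuming
only what the description states: an EA inode reuses the version field
for part of its reference count, so the dirtying path must not
increment it.

```c
#include <stdint.h>

/* Hypothetical stand-in for EXT4_EA_INODE_FL; the value is arbitrary. */
#define SKETCH_EA_INODE_FL 0x1u

struct sketch_inode {
	unsigned int flags;
	uint64_t version;	/* change counter, or refcount bits for an EA inode */
};

/* Bump the counter when dirtying, except for EA inodes. */
static void sketch_mark_iloc_dirty(struct sketch_inode *inode)
{
	/* EA inodes store reference count data here: leave it alone. */
	if (!(inode->flags & SKETCH_EA_INODE_FL))
		inode->version++;
}
```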

Applied, thanks!

[1/3] ext4: don't increase iversion counter for ea_inodes
      commit: 6c7c5ade428cc65b58e4aba1925b5347970f4456
[2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
      commit: 625e1e67b66245b93ccae868cd4a950d257de003
[3/3] ext4: unconditionally enable the i_version counter
      commit: 59772a0cb09a7ec77362653e8be207a464fa04af

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>


* Re: [PATCH v5] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-25 10:06   ` [PATCH v5] " Lukas Czerner
@ 2022-09-29 14:58     ` Theodore Ts'o
  0 siblings, 0 replies; 13+ messages in thread
From: Theodore Ts'o @ 2022-09-29 14:58 UTC (permalink / raw)
  To: linux-ext4, lczerner
  Cc: Theodore Ts'o, jlayton, ebiggers, jack, david, linux-fsdevel, hch

On Thu, 25 Aug 2022 12:06:57 +0200, Lukas Czerner wrote:
> Currently I_DIRTY_TIME will never get set if the inode already has
> I_DIRTY_INODE, on the assumption that the latter supersedes
> I_DIRTY_TIME. That's true, however ext4 will only update the on-disk inode in
> ->dirty_inode(), not on actual writeback. As a result if the inode
> already has I_DIRTY_INODE state by the time we get to
> __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> into on-disk inode and will not get updated until the next I_DIRTY_INODE
> update, which might never come if we crash or get a power failure.
> 
> [...]
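
As a rough illustration of the state handling described in the quoted
text, the sketch below models the dirty state as a plain bitmask. It is
not the __mark_inode_dirty() implementation: the flag values and the
function names (MODEL_DIRTY_INODE, MODEL_DIRTY_TIME,
model_mark_dirty_old, model_mark_dirty_new) are assumptions made for
the example.

```c
/* Hypothetical flag values, not the kernel's. */
#define MODEL_DIRTY_INODE 0x1u
#define MODEL_DIRTY_TIME  0x2u

/*
 * Old behaviour as described: a time-only request is dropped when the
 * inode is already marked dirty.
 */
static unsigned int model_mark_dirty_old(unsigned int state, unsigned int flags)
{
	if (state & MODEL_DIRTY_INODE)
		flags &= ~MODEL_DIRTY_TIME;
	return state | flags;
}

/* Behaviour named in the subject line: the time flag is always recorded. */
static unsigned int model_mark_dirty_new(unsigned int state, unsigned int flags)
{
	return state | flags;
}
```

In the old variant the pending timestamp update leaves no trace in the
state, which matches the description of a time that may never reach
disk if no further I_DIRTY_INODE update follows.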

Applied, thanks!

[1/1] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
      commit: 625e1e67b66245b93ccae868cd4a950d257de003

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>


* Re: [PATCH v4 3/3] ext4: unconditionally enable the i_version counter
  2022-08-12 12:37 ` [PATCH v4 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
  2022-08-12 13:05   ` Christian Brauner
@ 2022-08-16 11:48   ` Jan Kara
  1 sibling, 0 replies; 13+ messages in thread
From: Jan Kara @ 2022-08-16 11:48 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, tytso, jlayton, jack, linux-fsdevel, ebiggers, david,
	Benjamin Coddington, Christoph Hellwig, Darrick J . Wong

On Fri 12-08-22 14:37:27, Lukas Czerner wrote:
> From: Jeff Layton <jlayton@kernel.org>
> 
> The original i_version implementation was pretty expensive, requiring a
> log flush on every change. Because of this, it was gated behind a mount
> option (implemented via the MS_I_VERSION mount option flag).
> 
> Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
> i_version flag much less expensive, so there is no longer a performance
> penalty from enabling it. xfs and btrfs already enable it
> unconditionally when the on-disk format can support it.
> 
> Have ext4 ignore the SB_I_VERSION flag, and just enable it
> unconditionally. While we're in here, remove the handling of
> Opt_i_version as well, since we're almost to 5.20 anyway.
> 
> Ideally, we'd couple this change with a way to disable the i_version
> counter (just in case), but the way the iversion mount option was
> implemented makes that difficult to do. We'd need to add a new mount
> option altogether or do something with tune2fs. That's probably best
> left to later patches if it turns out to be needed.
> 
> [ Removed leftover bits of i_version from ext4_apply_options() since it
> now can't ever be set in ctx->mask_s_flags -- lczerner ]
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> v3: Removed leftover bits of i_version from ext4_apply_options
> v4: no change
> 
>  fs/ext4/inode.c |  5 ++---
>  fs/ext4/super.c | 21 ++++-----------------
>  2 files changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2a220be34caa..c77d40f05763 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
>  			return -EINVAL;
>  		}
>  
> -		if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
> +		if (attr->ia_size != inode->i_size)
>  			inode_inc_iversion(inode);
>  
>  		if (shrink) {
> @@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
>  	 * ea_inodes are using i_version for storing reference count, don't
>  	 * mess with it
>  	 */
> -	if (IS_I_VERSION(inode) &&
> -	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> +	if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
>  		inode_inc_iversion(inode);
>  
>  	/* the do_update_inode consumes one bh->b_count */
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9a66abcca1a8..1c953f6d400e 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1585,7 +1585,7 @@ enum {
>  	Opt_inlinecrypt,
>  	Opt_usrjquota, Opt_grpjquota, Opt_quota,
>  	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> -	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> +	Opt_usrquota, Opt_grpquota, Opt_prjquota,
>  	Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
>  	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
>  	Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
> @@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
>  	fsparam_flag	("barrier",		Opt_barrier),
>  	fsparam_u32	("barrier",		Opt_barrier),
>  	fsparam_flag	("nobarrier",		Opt_nobarrier),
> -	fsparam_flag	("i_version",		Opt_i_version),
>  	fsparam_flag	("dax",			Opt_dax),
>  	fsparam_enum	("dax",			Opt_dax_type, ext4_param_dax),
>  	fsparam_u32	("stripe",		Opt_stripe),
> @@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
>  	case Opt_abort:
>  		ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
>  		return 0;
> -	case Opt_i_version:
> -		ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
> -		ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
> -		ctx_set_flags(ctx, SB_I_VERSION);
> -		return 0;
>  	case Opt_inlinecrypt:
>  #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
>  		ctx_set_flags(ctx, SB_INLINECRYPT);
> @@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
>  	sb->s_flags &= ~ctx->mask_s_flags;
>  	sb->s_flags |= ctx->vals_s_flags;
>  
> -	/*
> -	 * i_version differs from common mount option iversion so we have
> -	 * to let vfs know that it was set, otherwise it would get cleared
> -	 * on remount
> -	 */
> -	if (ctx->mask_s_flags & SB_I_VERSION)
> -		fc->sb_flags |= SB_I_VERSION;
> -
>  #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
>  	APPLY(s_commit_interval);
>  	APPLY(s_stripe);
> @@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
>  		SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
>  	if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
>  		SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
> -	if (sb->s_flags & SB_I_VERSION)
> -		SEQ_OPTS_PUTS("i_version");
>  	if (nodefs || sbi->s_stripe)
>  		SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
>  	if (nodefs || EXT4_MOUNT_DATA_FLAGS &
> @@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
>  	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
>  		(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
>  
> +	/* i_version is always enabled now */
> +	sb->s_flags |= SB_I_VERSION;
> +
>  	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
>  	    (ext4_has_compat_features(sb) ||
>  	     ext4_has_ro_compat_features(sb) ||
> -- 
> 2.37.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v4 3/3] ext4: unconditionally enable the i_version counter
  2022-08-12 12:37 ` [PATCH v4 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
@ 2022-08-12 13:05   ` Christian Brauner
  2022-08-16 11:48   ` Jan Kara
  1 sibling, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2022-08-12 13:05 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, tytso, jlayton, jack, linux-fsdevel, ebiggers, david,
	Benjamin Coddington, Christoph Hellwig, Darrick J . Wong

On Fri, Aug 12, 2022 at 02:37:27PM +0200, Lukas Czerner wrote:
> From: Jeff Layton <jlayton@kernel.org>
> 
> The original i_version implementation was pretty expensive, requiring a
> log flush on every change. Because of this, it was gated behind a mount
> option (implemented via the MS_I_VERSION mount option flag).
> 
> Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
> i_version flag much less expensive, so there is no longer a performance
> penalty from enabling it. xfs and btrfs already enable it
> unconditionally when the on-disk format can support it.
> 
> Have ext4 ignore the SB_I_VERSION flag, and just enable it
> unconditionally. While we're in here, remove the handling of
> Opt_i_version as well, since we're almost to 5.20 anyway.
> 
> Ideally, we'd couple this change with a way to disable the i_version
> counter (just in case), but the way the iversion mount option was
> implemented makes that difficult to do. We'd need to add a new mount
> option altogether or do something with tune2fs. That's probably best
> left to later patches if it turns out to be needed.
> 
> [ Removed leftover bits of i_version from ext4_apply_options() since it
> now can't ever be set in ctx->mask_s_flags -- lczerner ]
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---

Since ext4 seems to ignore unknown mount options in ext4_parse_param(),
removing this seems fine.
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>


* [PATCH v4 3/3] ext4: unconditionally enable the i_version counter
  2022-08-12 12:37 [PATCH v3 " Lukas Czerner
@ 2022-08-12 12:37 ` Lukas Czerner
  2022-08-12 13:05   ` Christian Brauner
  2022-08-16 11:48   ` Jan Kara
  0 siblings, 2 replies; 13+ messages in thread
From: Lukas Czerner @ 2022-08-12 12:37 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, jlayton, jack, linux-fsdevel, ebiggers, david,
	Benjamin Coddington, Christoph Hellwig, Darrick J . Wong

From: Jeff Layton <jlayton@kernel.org>

The original i_version implementation was pretty expensive, requiring a
log flush on every change. Because of this, it was gated behind a mount
option (implemented via the MS_I_VERSION mount option flag).

Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
i_version flag much less expensive, so there is no longer a performance
penalty from enabling it. xfs and btrfs already enable it
unconditionally when the on-disk format can support it.

Have ext4 ignore the SB_I_VERSION flag, and just enable it
unconditionally. While we're in here, remove the handling of
Opt_i_version as well, since we're almost to 5.20 anyway.

Ideally, we'd couple this change with a way to disable the i_version
counter (just in case), but the way the iversion mount option was
implemented makes that difficult to do. We'd need to add a new mount
option altogether or do something with tune2fs. That's probably best
left to later patches if it turns out to be needed.

[ Removed leftover bits of i_version from ext4_apply_options() since it
now can't ever be set in ctx->mask_s_flags -- lczerner ]

Cc: Dave Chinner <david@fromorbit.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
v3: Removed leftover bits of i_version from ext4_apply_options
v4: no change

 fs/ext4/inode.c |  5 ++---
 fs/ext4/super.c | 21 ++++-----------------
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2a220be34caa..c77d40f05763 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5425,7 +5425,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 			return -EINVAL;
 		}
 
-		if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
+		if (attr->ia_size != inode->i_size)
 			inode_inc_iversion(inode);
 
 		if (shrink) {
@@ -5735,8 +5735,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 	 * ea_inodes are using i_version for storing reference count, don't
 	 * mess with it
 	 */
-	if (IS_I_VERSION(inode) &&
-	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+	if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
 		inode_inc_iversion(inode);
 
 	/* the do_update_inode consumes one bh->b_count */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9a66abcca1a8..1c953f6d400e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1585,7 +1585,7 @@ enum {
 	Opt_inlinecrypt,
 	Opt_usrjquota, Opt_grpjquota, Opt_quota,
 	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
-	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
+	Opt_usrquota, Opt_grpquota, Opt_prjquota,
 	Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
 	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
 	Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
@@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
 	fsparam_flag	("barrier",		Opt_barrier),
 	fsparam_u32	("barrier",		Opt_barrier),
 	fsparam_flag	("nobarrier",		Opt_nobarrier),
-	fsparam_flag	("i_version",		Opt_i_version),
 	fsparam_flag	("dax",			Opt_dax),
 	fsparam_enum	("dax",			Opt_dax_type, ext4_param_dax),
 	fsparam_u32	("stripe",		Opt_stripe),
@@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case Opt_abort:
 		ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
 		return 0;
-	case Opt_i_version:
-		ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
-		ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
-		ctx_set_flags(ctx, SB_I_VERSION);
-		return 0;
 	case Opt_inlinecrypt:
 #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
 		ctx_set_flags(ctx, SB_INLINECRYPT);
@@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
 	sb->s_flags &= ~ctx->mask_s_flags;
 	sb->s_flags |= ctx->vals_s_flags;
 
-	/*
-	 * i_version differs from common mount option iversion so we have
-	 * to let vfs know that it was set, otherwise it would get cleared
-	 * on remount
-	 */
-	if (ctx->mask_s_flags & SB_I_VERSION)
-		fc->sb_flags |= SB_I_VERSION;
-
 #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
 	APPLY(s_commit_interval);
 	APPLY(s_stripe);
@@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
 		SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
 	if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
 		SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
-	if (sb->s_flags & SB_I_VERSION)
-		SEQ_OPTS_PUTS("i_version");
 	if (nodefs || sbi->s_stripe)
 		SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
 	if (nodefs || EXT4_MOUNT_DATA_FLAGS &
@@ -4640,6 +4624,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
 		(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
 
+	/* i_version is always enabled now */
+	sb->s_flags |= SB_I_VERSION;
+
 	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
 	    (ext4_has_compat_features(sb) ||
 	     ext4_has_ro_compat_features(sb) ||
-- 
2.37.1



Thread overview: 13+ messages
2022-08-24 16:03 [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes Lukas Czerner
2022-08-24 16:03 ` [PATCH v4 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
2022-08-24 17:31   ` Jan Kara
2022-08-25 10:06   ` [PATCH v5] " Lukas Czerner
2022-09-29 14:58     ` Theodore Ts'o
2022-08-24 16:03 ` [PATCH v4 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
2022-08-26 16:11   ` Jeff Layton
2022-08-29  8:17     ` Lukas Czerner
2022-08-29 10:16       ` Jeff Layton
2022-09-29 14:58 ` [PATCH v4 1/3] ext4: don't increase iversion counter for ea_inodes Theodore Ts'o
  -- strict thread matches above, loose matches on Subject: below --
2022-08-12 12:37 [PATCH v3 " Lukas Czerner
2022-08-12 12:37 ` [PATCH v4 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
2022-08-12 13:05   ` Christian Brauner
2022-08-16 11:48   ` Jan Kara
