* [PATCH v2 1/3] ext4: don't increase iversion counter for ea_inodes
@ 2022-08-03 10:53 Lukas Czerner
  2022-08-03 10:53 ` [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
  2022-08-03 10:53 ` [PATCH v3 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
  0 siblings, 2 replies; 9+ messages in thread
From: Lukas Czerner @ 2022-08-03 10:53 UTC (permalink / raw)
  To: linux-ext4; +Cc: jlayton, tytso, linux-fsdevel, Jan Kara

ea_inodes are using i_version for storing part of the reference count so
we really need to leave it alone.

The problem can be reproduced by xfstest ext4/026 when iversion is
enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
inodes in ext4_mark_iloc_dirty().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
v2: no changes
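
For readers following along, the check added below can be sketched in
user space roughly as follows. This is only an illustrative model: the
struct, the MODEL_EA_INODE_FL value and model_mark_iloc_dirty() are
hypothetical stand-ins, not the ext4 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_EA_INODE_FL; the real value differs. */
#define MODEL_EA_INODE_FL 0x1u

struct model_inode {
	uint32_t flags;            /* stands in for EXT4_I(inode)->i_flags */
	uint64_t i_version;        /* change counter, or refcount bits on an ea_inode */
	bool     iversion_enabled; /* stands in for IS_I_VERSION(inode) */
};

/*
 * Models the patched condition: bump i_version only when iversion is
 * enabled and the inode is not an ea_inode. Returns true if it bumped.
 */
static bool model_mark_iloc_dirty(struct model_inode *inode)
{
	assert(inode != NULL);

	if (inode->iversion_enabled && !(inode->flags & MODEL_EA_INODE_FL)) {
		inode->i_version++;
		return true;
	}
	return false;
}
```

In this model an ea_inode keeps whatever value is stored in i_version,
which is the property the reference count relies on.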

 fs/ext4/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 84c0eb55071d..b76554124224 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5717,7 +5717,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 	}
 	ext4_fc_track_inode(handle, inode);
 
-	if (IS_I_VERSION(inode))
+	/*
+	 * ea_inodes are using i_version for storing reference count, don't
+	 * mess with it
+	 */
+	if (IS_I_VERSION(inode) &&
+	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
 		inode_inc_iversion(inode);
 
 	/* the do_update_inode consumes one bh->b_count */
-- 
2.37.1



* [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-03 10:53 [PATCH v2 1/3] ext4: don't increase iversion counter for ea_inodes Lukas Czerner
@ 2022-08-03 10:53 ` Lukas Czerner
  2022-08-05  8:05   ` Eric Biggers
  2022-08-07 23:08   ` Dave Chinner
  2022-08-03 10:53 ` [PATCH v3 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
  1 sibling, 2 replies; 9+ messages in thread
From: Lukas Czerner @ 2022-08-03 10:53 UTC (permalink / raw)
  To: linux-ext4
  Cc: jlayton, tytso, linux-fsdevel, Dave Chinner, Christoph Hellwig, Jan Kara

Currently I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE, on the assumption that I_DIRTY_INODE supersedes it.
That's true, however ext4 only updates the on-disk inode in
->dirty_inode(), not on actual writeback. As a result, if the inode
already has I_DIRTY_INODE set by the time we get to
__mark_inode_dirty() with only I_DIRTY_TIME, the old time was already
filled into the on-disk inode and the new one will not get written
until the next I_DIRTY_INODE update, which might never come if we
crash or get a power failure.

The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE, and make sure that this case is properly handled in
writeback_single_inode(). Additionally, xfs_fs_dirty_inode() was changed
to accommodate I_DIRTY_TIME in flag.

Thanks Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Suggested-by: Jan Kara <jack@suse.cz>
---
v2: Reworked according to suggestions from Jan
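
As a reading aid, the requeue decision that the writeback_single_inode()
hunk below ends up making can be sketched as a pure function. This is a
user-space model under stated assumptions: the M_* bit values, the enum
and model_requeue_after_writeback() are hypothetical and only mirror the
shape of the kernel condition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for this model; the kernel's differ. */
#define M_DIRTY_SYNC     (1u << 0)
#define M_DIRTY_DATASYNC (1u << 1)
#define M_DIRTY_PAGES    (1u << 2)
#define M_DIRTY_TIME     (1u << 3)
#define M_SYNC_QUEUED    (1u << 4)
#define M_DIRTY          (M_DIRTY_SYNC | M_DIRTY_DATASYNC | M_DIRTY_PAGES)
#define M_DIRTY_ALL      (M_DIRTY | M_DIRTY_TIME)

enum model_list {
	MODEL_ATTACHED,     /* clean: leaves the dirty lists */
	MODEL_B_DIRTY,      /* redirtied: back on the regular dirty list */
	MODEL_B_DIRTY_TIME, /* only a timestamp is pending */
	MODEL_LEFT_ALONE    /* already queued for sync, nothing to do */
};

/* Where the inode goes once writeback of it has finished. */
static enum model_list model_requeue_after_writeback(uint32_t i_state)
{
	/* The model only understands the bits defined above. */
	assert((i_state & ~(M_DIRTY_ALL | M_SYNC_QUEUED)) == 0);

	if (!(i_state & M_DIRTY_ALL))
		return MODEL_ATTACHED;
	if (i_state & M_SYNC_QUEUED)
		return MODEL_LEFT_ALONE;
	if (i_state & M_DIRTY)
		return MODEL_B_DIRTY;
	/* New case: I_DIRTY_TIME was set while the inode was being written. */
	return MODEL_B_DIRTY_TIME;
}
```

The last branch is the one this patch adds; before it, an inode in that
state was not moved to any list at this point.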

 fs/fs-writeback.c  | 34 ++++++++++++++++++++++------------
 fs/xfs/xfs_super.c |  3 ++-
 include/linux/fs.h |  6 +++---
 3 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05221366a16d..638dbf143727 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1718,9 +1718,14 @@ static int writeback_single_inode(struct inode *inode,
 	 */
 	if (!(inode->i_state & I_DIRTY_ALL))
 		inode_cgwb_move_to_attached(inode, wb);
-	else if (!(inode->i_state & I_SYNC_QUEUED) &&
-		 (inode->i_state & I_DIRTY))
-		redirty_tail_locked(inode, wb);
+	else if (!(inode->i_state & I_SYNC_QUEUED)) {
+		if ((inode->i_state & I_DIRTY))
+			redirty_tail_locked(inode, wb);
+		else if (inode->i_state & I_DIRTY_TIME) {
+			inode->dirtied_when = jiffies;
+			inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+		}
+	}
 
 	spin_unlock(&wb->list_lock);
 	inode_sync_complete(inode);
@@ -2369,6 +2374,17 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	trace_writeback_mark_inode_dirty(inode, flags);
 
 	if (flags & I_DIRTY_INODE) {
+
+		/* Inode timestamp update will piggyback on this dirtying */
+		if (inode->i_state & I_DIRTY_TIME) {
+			spin_lock(&inode->i_lock);
+			if (inode->i_state & I_DIRTY_TIME) {
+				inode->i_state &= ~I_DIRTY_TIME;
+				flags |= I_DIRTY_TIME;
+			}
+			spin_unlock(&inode->i_lock);
+		}
+
 		/*
 		 * Notify the filesystem about the inode being dirtied, so that
 		 * (if needed) it can update on-disk fields and journal the
@@ -2378,7 +2394,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 		 */
 		trace_writeback_dirty_inode_start(inode, flags);
 		if (sb->s_op->dirty_inode)
-			sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
+			sb->s_op->dirty_inode(inode,
+				flags & (I_DIRTY_INODE | I_DIRTY_TIME));
 		trace_writeback_dirty_inode(inode, flags);
 
 		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@@ -2399,21 +2416,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	 */
 	smp_mb();
 
-	if (((inode->i_state & flags) == flags) ||
-	    (dirtytime && (inode->i_state & I_DIRTY_INODE)))
+	if ((inode->i_state & flags) == flags)
 		return;
 
 	spin_lock(&inode->i_lock);
-	if (dirtytime && (inode->i_state & I_DIRTY_INODE))
-		goto out_unlock_inode;
 	if ((inode->i_state & flags) != flags) {
 		const int was_dirty = inode->i_state & I_DIRTY;
 
 		inode_attach_wb(inode, NULL);
 
-		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
-		if (flags & I_DIRTY_INODE)
-			inode->i_state &= ~I_DIRTY_TIME;
 		inode->i_state |= flags;
 
 		/*
@@ -2486,7 +2497,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 out_unlock:
 	if (wb)
 		spin_unlock(&wb->list_lock);
-out_unlock_inode:
 	spin_unlock(&inode->i_lock);
 }
 EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index aa977c7ea370..cff05a4771b5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
 
 	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
 		return;
-	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+	if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
+	    !((inode->i_state | flag) & I_DIRTY_TIME))
 		return;
 
 	if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ad5e3520fae..2243797badf2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  *			lazytime mount option is enabled.  We keep track of this
  *			separately from I_DIRTY_SYNC in order to implement
  *			lazytime.  This gets cleared if I_DIRTY_INODE
- *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
- *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- *			i_state, but not both.  I_DIRTY_PAGES may still be set.
+ *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
+ *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ *			in place.
  * I_NEW		Serves as both a mutex and completion notification.
  *			New inodes set I_NEW.  If two processes both create
  *			the same inode, one of them will release its inode and
-- 
2.37.1



* [PATCH v3 3/3] ext4: unconditionally enable the i_version counter
  2022-08-03 10:53 [PATCH v2 1/3] ext4: don't increase iversion counter for ea_inodes Lukas Czerner
  2022-08-03 10:53 ` [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
@ 2022-08-03 10:53 ` Lukas Czerner
  2022-08-03 13:04   ` Jeff Layton
  1 sibling, 1 reply; 9+ messages in thread
From: Lukas Czerner @ 2022-08-03 10:53 UTC (permalink / raw)
  To: linux-ext4
  Cc: jlayton, tytso, linux-fsdevel, Dave Chinner, Benjamin Coddington,
	Christoph Hellwig, Darrick J . Wong

From: Jeff Layton <jlayton@kernel.org>

The original i_version implementation was pretty expensive, requiring a
log flush on every change. Because of this, it was gated behind a mount
option (implemented via the MS_I_VERSION mount option flag).

Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
i_version flag much less expensive, so there is no longer a performance
penalty from enabling it. xfs and btrfs already enable it
unconditionally when the on-disk format can support it.

Have ext4 ignore the SB_I_VERSION flag, and just enable it
unconditionally. While we're in here, remove the handling of
Opt_i_version as well, since we're almost to 5.20 anyway.

Ideally, we'd couple this change with a way to disable the i_version
counter (just in case), but the way the iversion mount option was
implemented makes that difficult to do. We'd need to add a new mount
option altogether or do something with tune2fs. That's probably best
left to later patches if it turns out to be needed.

[ Removed leftover bits of i_version from ext4_apply_options() since it
now can't ever be set in ctx->mask_s_flags -- lczerner ]

Cc: Dave Chinner <david@fromorbit.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
v3: Removed leftover bits of i_version from ext4_apply_options

 fs/ext4/inode.c |  5 ++---
 fs/ext4/super.c | 21 ++++-----------------
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b76554124224..acd00300a697 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5411,7 +5411,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
 			return -EINVAL;
 		}
 
-		if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
+		if (attr->ia_size != inode->i_size)
 			inode_inc_iversion(inode);
 
 		if (shrink) {
@@ -5721,8 +5721,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 	 * ea_inodes are using i_version for storing reference count, don't
 	 * mess with it
 	 */
-	if (IS_I_VERSION(inode) &&
-	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+	if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
 		inode_inc_iversion(inode);
 
 	/* the do_update_inode consumes one bh->b_count */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 845f2f8aee5f..4c3e6021e772 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1585,7 +1585,7 @@ enum {
 	Opt_inlinecrypt,
 	Opt_usrjquota, Opt_grpjquota, Opt_quota,
 	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
-	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
+	Opt_usrquota, Opt_grpquota, Opt_prjquota,
 	Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
 	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
 	Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
@@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
 	fsparam_flag	("barrier",		Opt_barrier),
 	fsparam_u32	("barrier",		Opt_barrier),
 	fsparam_flag	("nobarrier",		Opt_nobarrier),
-	fsparam_flag	("i_version",		Opt_i_version),
 	fsparam_flag	("dax",			Opt_dax),
 	fsparam_enum	("dax",			Opt_dax_type, ext4_param_dax),
 	fsparam_u32	("stripe",		Opt_stripe),
@@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case Opt_abort:
 		ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
 		return 0;
-	case Opt_i_version:
-		ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
-		ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
-		ctx_set_flags(ctx, SB_I_VERSION);
-		return 0;
 	case Opt_inlinecrypt:
 #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
 		ctx_set_flags(ctx, SB_INLINECRYPT);
@@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
 	sb->s_flags &= ~ctx->mask_s_flags;
 	sb->s_flags |= ctx->vals_s_flags;
 
-	/*
-	 * i_version differs from common mount option iversion so we have
-	 * to let vfs know that it was set, otherwise it would get cleared
-	 * on remount
-	 */
-	if (ctx->mask_s_flags & SB_I_VERSION)
-		fc->sb_flags |= SB_I_VERSION;
-
 #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
 	APPLY(s_commit_interval);
 	APPLY(s_stripe);
@@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
 		SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
 	if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
 		SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
-	if (sb->s_flags & SB_I_VERSION)
-		SEQ_OPTS_PUTS("i_version");
 	if (nodefs || sbi->s_stripe)
 		SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
 	if (nodefs || EXT4_MOUNT_DATA_FLAGS &
@@ -4630,6 +4614,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
 		(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
 
+	/* i_version is always enabled now */
+	sb->s_flags |= SB_I_VERSION;
+
 	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
 	    (ext4_has_compat_features(sb) ||
 	     ext4_has_ro_compat_features(sb) ||
-- 
2.37.1



* Re: [PATCH v3 3/3] ext4: unconditionally enable the i_version counter
  2022-08-03 10:53 ` [PATCH v3 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
@ 2022-08-03 13:04   ` Jeff Layton
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2022-08-03 13:04 UTC (permalink / raw)
  To: Lukas Czerner, linux-ext4
  Cc: tytso, linux-fsdevel, Dave Chinner, Benjamin Coddington,
	Christoph Hellwig, Darrick J . Wong

On Wed, 2022-08-03 at 12:53 +0200, Lukas Czerner wrote:
> From: Jeff Layton <jlayton@kernel.org>
> 
> The original i_version implementation was pretty expensive, requiring a
> log flush on every change. Because of this, it was gated behind a mount
> option (implemented via the MS_I_VERSION mount option flag).
> 
> Commit ae5e165d855d (fs: new API for handling inode->i_version) made the
> i_version flag much less expensive, so there is no longer a performance
> penalty from enabling it. xfs and btrfs already enable it
> unconditionally when the on-disk format can support it.
> 
> Have ext4 ignore the SB_I_VERSION flag, and just enable it
> unconditionally. While we're in here, remove the handling of
> Opt_i_version as well, since we're almost to 5.20 anyway.
> 
> Ideally, we'd couple this change with a way to disable the i_version
> counter (just in case), but the way the iversion mount option was
> implemented makes that difficult to do. We'd need to add a new mount
> option altogether or do something with tune2fs. That's probably best
> left to later patches if it turns out to be needed.
> 
> [ Removed leftover bits of i_version from ext4_apply_options() since it
> now can't ever be set in ctx->mask_s_flags -- lczerner ]
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Darrick J. Wong <djwong@kernel.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> ---
> v3: Removed leftover bits of i_version from ext4_apply_options
> 
>  fs/ext4/inode.c |  5 ++---
>  fs/ext4/super.c | 21 ++++-----------------
>  2 files changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index b76554124224..acd00300a697 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5411,7 +5411,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
>  			return -EINVAL;
>  		}
>  
> -		if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
> +		if (attr->ia_size != inode->i_size)
>  			inode_inc_iversion(inode);
>  
>  		if (shrink) {
> @@ -5721,8 +5721,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
>  	 * ea_inodes are using i_version for storing reference count, don't
>  	 * mess with it
>  	 */
> -	if (IS_I_VERSION(inode) &&
> -	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> +	if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
>  		inode_inc_iversion(inode);
>  
>  	/* the do_update_inode consumes one bh->b_count */
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 845f2f8aee5f..4c3e6021e772 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1585,7 +1585,7 @@ enum {
>  	Opt_inlinecrypt,
>  	Opt_usrjquota, Opt_grpjquota, Opt_quota,
>  	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
> -	Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
> +	Opt_usrquota, Opt_grpquota, Opt_prjquota,
>  	Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
>  	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
>  	Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
> @@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
>  	fsparam_flag	("barrier",		Opt_barrier),
>  	fsparam_u32	("barrier",		Opt_barrier),
>  	fsparam_flag	("nobarrier",		Opt_nobarrier),
> -	fsparam_flag	("i_version",		Opt_i_version),
>  	fsparam_flag	("dax",			Opt_dax),
>  	fsparam_enum	("dax",			Opt_dax_type, ext4_param_dax),
>  	fsparam_u32	("stripe",		Opt_stripe),
> @@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
>  	case Opt_abort:
>  		ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
>  		return 0;
> -	case Opt_i_version:
> -		ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
> -		ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
> -		ctx_set_flags(ctx, SB_I_VERSION);
> -		return 0;
>  	case Opt_inlinecrypt:
>  #ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
>  		ctx_set_flags(ctx, SB_INLINECRYPT);
> @@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
>  	sb->s_flags &= ~ctx->mask_s_flags;
>  	sb->s_flags |= ctx->vals_s_flags;
>  
> -	/*
> -	 * i_version differs from common mount option iversion so we have
> -	 * to let vfs know that it was set, otherwise it would get cleared
> -	 * on remount
> -	 */
> -	if (ctx->mask_s_flags & SB_I_VERSION)
> -		fc->sb_flags |= SB_I_VERSION;
> -
>  #define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
>  	APPLY(s_commit_interval);
>  	APPLY(s_stripe);
> @@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
>  		SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
>  	if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
>  		SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
> -	if (sb->s_flags & SB_I_VERSION)
> -		SEQ_OPTS_PUTS("i_version");
>  	if (nodefs || sbi->s_stripe)
>  		SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
>  	if (nodefs || EXT4_MOUNT_DATA_FLAGS &
> @@ -4630,6 +4614,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
>  	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
>  		(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
>  
> +	/* i_version is always enabled now */
> +	sb->s_flags |= SB_I_VERSION;
> +
>  	if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
>  	    (ext4_has_compat_features(sb) ||
>  	     ext4_has_ro_compat_features(sb) ||

Looks good to me. Thanks for picking this up, Lukas!
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-03 10:53 ` [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
@ 2022-08-05  8:05   ` Eric Biggers
  2022-08-05 12:23     ` Lukas Czerner
  2022-08-07 23:08   ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Biggers @ 2022-08-05  8:05 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Dave Chinner,
	Christoph Hellwig, Jan Kara

On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9ad5e3520fae..2243797badf2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>   *			The inode itself only has dirty timestamps, and the
>   *			lazytime mount option is enabled.  We keep track of this
>   *			separately from I_DIRTY_SYNC in order to implement
>   *			lazytime.  This gets cleared if I_DIRTY_INODE
> - *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
> - *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> - *			i_state, but not both.  I_DIRTY_PAGES may still be set.
> + *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> + *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> + *			in place.

I'm still having a hard time understanding the new semantics.  The first
sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
itself only has dirty timestamps", right?

Also, have you checked all the places that I_DIRTY_TIME is used and verified
they do the right thing now?  What about inode_is_dirtytime_only()?

Also what is the precise meaning of the flags argument to ->dirty_inode now?

	sb->s_op->dirty_inode(inode,
			flags & (I_DIRTY_INODE | I_DIRTY_TIME));

Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.

- Eric


* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-05  8:05   ` Eric Biggers
@ 2022-08-05 12:23     ` Lukas Czerner
  2022-08-12 18:20       ` Eric Biggers
  0 siblings, 1 reply; 9+ messages in thread
From: Lukas Czerner @ 2022-08-05 12:23 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Dave Chinner,
	Christoph Hellwig, Jan Kara

On Fri, Aug 05, 2022 at 01:05:45AM -0700, Eric Biggers wrote:
> On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 9ad5e3520fae..2243797badf2 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >   *			The inode itself only has dirty timestamps, and the
> >   *			lazytime mount option is enabled.  We keep track of this
> >   *			separately from I_DIRTY_SYNC in order to implement
> >   *			lazytime.  This gets cleared if I_DIRTY_INODE
> > - *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set.  I.e.
> > - *			either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> > - *			i_state, but not both.  I_DIRTY_PAGES may still be set.
> > + *			(I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> > + *			I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> > + *			in place.
> 
> I'm still having a hard time understanding the new semantics.  The first
> sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
> itself only has dirty timestamps", right?

The problem is that it was always assumed that I_DIRTY_INODE supersedes
I_DIRTY_TIME and so it would get cleared in __mark_inode_dirty() when we
have I_DIRTY_INODE. That's true, we call sb->s_op->dirty_inode(), the
time update gets pushed into the on-disk inode structure, I_DIRTY_TIME
is cleared and the inode gets queued for writeback.

Any subsequent dirtying with I_DIRTY_TIME gets ignored simply because
I_DIRTY_INODE is already set in i_state. But in ext4 this time update
will never get pushed into the on-disk inode and there is no
I_DIRTY_TIME, so once the writeback is done we've lost all those
I_DIRTY_TIME updates in between, even if there was a sync.

Now, we still clear I_DIRTY_TIME when we get I_DIRTY_INODE, but any
subsequent I_DIRTY_TIME-only updates won't be ignored and we set the
flag in i_state. After the writeback is done the inode will be moved to
the b_dirty_time list.
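
The difference between the two behaviours described above can be
sketched with a small user-space model. Everything here is an
assumption made for illustration: the M_* bit values, struct
model_inode and both functions are hypothetical, and locking, list
handling and writeback are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for this model; the kernel's differ. */
#define M_DIRTY_SYNC     (1u << 0)
#define M_DIRTY_DATASYNC (1u << 1)
#define M_DIRTY_TIME     (1u << 3)
#define M_DIRTY_INODE    (M_DIRTY_SYNC | M_DIRTY_DATASYNC)

struct model_inode {
	uint32_t i_state;      /* in-memory dirty flags */
	uint32_t fs_saw_flags; /* flags last passed to the modeled ->dirty_inode() */
};

/* Old behaviour: a timestamp-only dirtying is dropped if I_DIRTY_INODE is set. */
static void model_mark_dirty_old(struct model_inode *inode, uint32_t flags)
{
	assert(inode != NULL);

	if (flags & M_DIRTY_INODE) {
		inode->fs_saw_flags = flags & M_DIRTY_INODE;
		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
		flags &= ~M_DIRTY_TIME;
		inode->i_state &= ~M_DIRTY_TIME;
	} else if ((flags & M_DIRTY_TIME) && (inode->i_state & M_DIRTY_INODE)) {
		return; /* the timestamp update is ignored */
	}
	inode->i_state |= flags;
}

/* New behaviour as described in this thread. */
static void model_mark_dirty_new(struct model_inode *inode, uint32_t flags)
{
	assert(inode != NULL);

	if (flags & M_DIRTY_INODE) {
		/* A pending timestamp piggybacks on this dirtying. */
		if (inode->i_state & M_DIRTY_TIME) {
			inode->i_state &= ~M_DIRTY_TIME;
			flags |= M_DIRTY_TIME;
		}
		inode->fs_saw_flags = flags & (M_DIRTY_INODE | M_DIRTY_TIME);
		/* I_DIRTY_INODE still supersedes I_DIRTY_TIME in i_state. */
		flags &= ~M_DIRTY_TIME;
	}
	inode->i_state |= flags;
}
```

With i_state starting at M_DIRTY_SYNC, a later M_DIRTY_TIME request
leaves i_state unchanged in the old model but records the flag in the
new one.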

So I am not sure how you would like it to be reworded; would simply
removing the 'only' be OK?

> 
> Also, have you checked all the places that I_DIRTY_TIME is used and verified
> they do the right thing now?  What about inode_is_dirtytime_only()?

Yes, that's fine, despite the slightly misleading name ;)

> 
> Also what is the precise meaning of the flags argument to ->dirty_inode now?
> 
> 	sb->s_op->dirty_inode(inode,
> 			flags & (I_DIRTY_INODE | I_DIRTY_TIME));
> 
> Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.

Don't know. It already doesn't mention I_DIRTY_SYNC, which can be there
as well. Additionally it can have I_DIRTY_TIME to inform the fs that we
have a dirty timestamp as well (in case of lazytime).

Perhaps we can add:

If the inode has a dirty timestamp and lazytime is enabled, I_DIRTY_TIME
will be set in the flags.

-Lukas

> 
> - Eric
> 



* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-03 10:53 ` [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
  2022-08-05  8:05   ` Eric Biggers
@ 2022-08-07 23:08   ` Dave Chinner
  2022-08-08 10:26     ` Lukas Czerner
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2022-08-07 23:08 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Christoph Hellwig, Jan Kara

On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> Currently I_DIRTY_TIME will never get set if the inode already has
> I_DIRTY_INODE, on the assumption that I_DIRTY_INODE supersedes it.
> That's true, however ext4 only updates the on-disk inode in
> ->dirty_inode(), not on actual writeback. As a result, if the inode
> already has I_DIRTY_INODE set by the time we get to
> __mark_inode_dirty() with only I_DIRTY_TIME, the old time was already
> filled into the on-disk inode and the new one will not get written
> until the next I_DIRTY_INODE update, which might never come if we
> crash or get a power failure.
> 
> The problem can be reproduced on ext4 by running xfstest generic/622
> with -o iversion mount option.
> 
> Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> I_DIRTY_INODE, and make sure that this case is properly handled in
> writeback_single_inode(). Additionally, xfs_fs_dirty_inode() was changed
> to accommodate I_DIRTY_TIME in flag.
> 
> Thanks Jan Kara for suggestions on how to make this work properly.
> 
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Suggested-by: Jan Kara <jack@suse.cz>
> ---
> v2: Reworked according to suggestions from Jan

....

> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index aa977c7ea370..cff05a4771b5 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
>  
>  	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
>  		return;
> -	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> +	if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
> +	    !((inode->i_state | flag) & I_DIRTY_TIME))
>  		return;

My eyes, they bleed. The dirty time code was already a horrid
abomination, and this makes it worse.

From looking at the code, I cannot work out what the new semantics
for I_DIRTY_TIME and I_DIRTY_SYNC are supposed to be, nor can I work
out what this new condition is supposed to be doing. I
*can't verify it is correct* by reading the code.

Can you please add a comment here explaining the conditions where we
don't have to log a new timestamp update?

Also, if "flag" now contains multiple flags, can you rename it
"flags"?

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-07 23:08   ` Dave Chinner
@ 2022-08-08 10:26     ` Lukas Czerner
  0 siblings, 0 replies; 9+ messages in thread
From: Lukas Czerner @ 2022-08-08 10:26 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Christoph Hellwig, Jan Kara

On Mon, Aug 08, 2022 at 09:08:10AM +1000, Dave Chinner wrote:
> On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> > Currently I_DIRTY_TIME will never get set if the inode already has
> > I_DIRTY_INODE, on the assumption that I_DIRTY_INODE supersedes it.
> > That's true, however ext4 only updates the on-disk inode in
> > ->dirty_inode(), not on actual writeback. As a result, if the inode
> > already has I_DIRTY_INODE set by the time we get to
> > __mark_inode_dirty() with only I_DIRTY_TIME, the old time was already
> > filled into the on-disk inode and the new one will not get written
> > until the next I_DIRTY_INODE update, which might never come if we
> > crash or get a power failure.
> > 
> > The problem can be reproduced on ext4 by running xfstest generic/622
> > with -o iversion mount option.
> > 
> > Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> > I_DIRTY_INODE, and make sure that this case is properly handled in
> > writeback_single_inode(). Additionally, xfs_fs_dirty_inode() was changed
> > to accommodate I_DIRTY_TIME in flag.
> > 
> > Thanks Jan Kara for suggestions on how to make this work properly.
> > 
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Suggested-by: Jan Kara <jack@suse.cz>
> > ---
> > v2: Reworked according to suggestions from Jan
> 
> ....
> 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index aa977c7ea370..cff05a4771b5 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
> >  
> >  	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
> >  		return;
> > -	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> > +	if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
> > +	    !((inode->i_state | flag) & I_DIRTY_TIME))
> >  		return;
> 
> My eyes, they bleed. The dirty time code was already a horrid
> abomination, and this makes it worse.
> 
> From looking at the code, I cannot work out what the new semantics
> for I_DIRTY_TIME and I_DIRTY_SYNC are supposed to be, nor can I work

Hi Dave,

please see the other thread for this patch with Eric Biggers, where I
try to explain and give some suggestions for changing the doc. Does it
make sense to you, or am I missing something?

https://marc.info/?l=linux-ext4&m=165970194205621&w=2

> out what this new condition is supposed to be doing. I
> *can't verify it is correct* by reading the code.

The ->dirty_inode() call site in __mark_inode_dirty() needed to be
changed to clear I_DIRTY_TIME from i_state *before* we call
->dirty_inode(), to avoid a race where we would lose a timestamp update
that comes just a little later, after the ->dirty_inode() call with
I_DIRTY_INODE.

But that would break xfs, so I decided to keep the condition and loosen
the requirement so that I_DIRTY_TIME can also be set in 'flag', not just
in i_state. Hence the abomination.

> 
> Can you please add a comment here explaining the conditions where we
> don't have to log a new timestamp update?

How about something like this?

Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC) and
has a dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can be either already
set in i_state, or passed in flags, possibly together with I_DIRTY_SYNC.
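
To make the proposed comment concrete, here is a rough user-space
sketch of the predicate it describes. The M_* bit values and
model_should_log_timestamp() are hypothetical names for illustration;
the two early returns mirror the conditions in the xfs_fs_dirty_inode()
hunk quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this model; the kernel's differ. */
#define M_DIRTY_SYNC     (1u << 0)
#define M_DIRTY_DATASYNC (1u << 1)
#define M_DIRTY_TIME     (1u << 3)

/* Returns true when the modeled filesystem should log the timestamp. */
static bool model_should_log_timestamp(bool lazytime, uint32_t i_state,
				       uint32_t flags)
{
	/* The model only understands the three flag bits defined above. */
	assert((flags & ~(M_DIRTY_SYNC | M_DIRTY_DATASYNC | M_DIRTY_TIME)) == 0);

	if (!lazytime)
		return false;

	/* Apart from I_DIRTY_TIME, the request must be exactly I_DIRTY_SYNC. */
	if ((flags & ~M_DIRTY_TIME) != M_DIRTY_SYNC)
		return false;

	/* A dirty timestamp must be pending in i_state or passed in flags. */
	if (!((i_state | flags) & M_DIRTY_TIME))
		return false;

	return true;
}
```

In this model a request of only M_DIRTY_TIME is rejected, because
masking it out leaves 0 rather than M_DIRTY_SYNC.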

> 
> Also, if "flag" now contains multiple flags, can you rename it
> "flags"?

Sure, I can do that.

Thanks!
-Lukas

> 
> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
> 



* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
  2022-08-05 12:23     ` Lukas Czerner
@ 2022-08-12 18:20       ` Eric Biggers
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Biggers @ 2022-08-12 18:20 UTC (permalink / raw)
  To: Lukas Czerner
  Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Dave Chinner,
	Christoph Hellwig, Jan Kara

On Fri, Aug 05, 2022 at 02:23:06PM +0200, Lukas Czerner wrote:
> > 
> > Also what is the precise meaning of the flags argument to ->dirty_inode now?
> > 
> > 	sb->s_op->dirty_inode(inode,
> > 			flags & (I_DIRTY_INODE | I_DIRTY_TIME));
> > 
> > Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.
> 
> Don't know. It already doesn't mention I_DIRTY_SYNC, which can be there
> as well.

Well, it didn't really need to because there were only two possibilities:
datasync and not datasync.  This patch changes that.

> Additionally it can have I_DIRTY_TIME to inform the fs that we have a
> dirty timestamp as well (in case of lazytime).

This is introduced by this patch.

- Eric

