* [PATCH v2 1/3] ext4: don't increase iversion counter for ea_inodes
@ 2022-08-03 10:53 Lukas Czerner
2022-08-03 10:53 ` [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
2022-08-03 10:53 ` [PATCH v3 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
0 siblings, 2 replies; 9+ messages in thread
From: Lukas Czerner @ 2022-08-03 10:53 UTC
To: linux-ext4; +Cc: jlayton, tytso, linux-fsdevel, Jan Kara
ea_inodes use i_version to store part of the reference count, so
we really need to leave it alone.
The problem can be reproduced by xfstest ext4/026 when iversion is
enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
inodes in ext4_mark_iloc_dirty().
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
v2: no changes
fs/ext4/inode.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 84c0eb55071d..b76554124224 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5717,7 +5717,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
}
ext4_fc_track_inode(handle, inode);
- if (IS_I_VERSION(inode))
+ /*
+ * ea_inodes are using i_version for storing reference count, don't
+ * mess with it
+ */
+ if (IS_I_VERSION(inode) &&
+ !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
inode_inc_iversion(inode);
/* the do_update_inode consumes one bh->b_count */
--
2.37.1
* [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
2022-08-03 10:53 [PATCH v2 1/3] ext4: don't increase iversion counter for ea_inodes Lukas Czerner
@ 2022-08-03 10:53 ` Lukas Czerner
2022-08-05 8:05 ` Eric Biggers
2022-08-07 23:08 ` Dave Chinner
2022-08-03 10:53 ` [PATCH v3 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
From: Lukas Czerner @ 2022-08-03 10:53 UTC
To: linux-ext4
Cc: jlayton, tytso, linux-fsdevel, Dave Chinner, Christoph Hellwig, Jan Kara
Currently I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE, on the assumption that I_DIRTY_INODE supersedes
I_DIRTY_TIME. That's true; however, ext4 only updates the on-disk inode
in ->dirty_inode(), not on actual writeback. As a result, if the inode
already has I_DIRTY_INODE by the time __mark_inode_dirty() is called
with only I_DIRTY_TIME, the on-disk inode already holds the older
timestamp and will not get updated until the next I_DIRTY_INODE update,
which might never come if we crash or lose power.
The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.
Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that this case is handled properly in
writeback_single_inode(). Additionally, xfs_fs_dirty_inode() is changed
to accommodate I_DIRTY_TIME in flag.
Thanks Jan Kara for suggestions on how to make this work properly.
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Suggested-by: Jan Kara <jack@suse.cz>
---
v2: Reworked according to suggestions from Jan
fs/fs-writeback.c | 34 ++++++++++++++++++++++------------
fs/xfs/xfs_super.c | 3 ++-
include/linux/fs.h | 6 +++---
3 files changed, 27 insertions(+), 16 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05221366a16d..638dbf143727 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1718,9 +1718,14 @@ static int writeback_single_inode(struct inode *inode,
*/
if (!(inode->i_state & I_DIRTY_ALL))
inode_cgwb_move_to_attached(inode, wb);
- else if (!(inode->i_state & I_SYNC_QUEUED) &&
- (inode->i_state & I_DIRTY))
- redirty_tail_locked(inode, wb);
+ else if (!(inode->i_state & I_SYNC_QUEUED)) {
+ if ((inode->i_state & I_DIRTY))
+ redirty_tail_locked(inode, wb);
+ else if (inode->i_state & I_DIRTY_TIME) {
+ inode->dirtied_when = jiffies;
+ inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+ }
+ }
spin_unlock(&wb->list_lock);
inode_sync_complete(inode);
@@ -2369,6 +2374,17 @@ void __mark_inode_dirty(struct inode *inode, int flags)
trace_writeback_mark_inode_dirty(inode, flags);
if (flags & I_DIRTY_INODE) {
+
+ /* Inode timestamp update will piggyback on this dirtying */
+ if (inode->i_state & I_DIRTY_TIME) {
+ spin_lock(&inode->i_lock);
+ if (inode->i_state & I_DIRTY_TIME) {
+ inode->i_state &= ~I_DIRTY_TIME;
+ flags |= I_DIRTY_TIME;
+ }
+ spin_unlock(&inode->i_lock);
+ }
+
/*
* Notify the filesystem about the inode being dirtied, so that
* (if needed) it can update on-disk fields and journal the
@@ -2378,7 +2394,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
trace_writeback_dirty_inode_start(inode, flags);
if (sb->s_op->dirty_inode)
- sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
+ sb->s_op->dirty_inode(inode,
+ flags & (I_DIRTY_INODE | I_DIRTY_TIME));
trace_writeback_dirty_inode(inode, flags);
/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@@ -2399,21 +2416,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
smp_mb();
- if (((inode->i_state & flags) == flags) ||
- (dirtytime && (inode->i_state & I_DIRTY_INODE)))
+ if ((inode->i_state & flags) == flags)
return;
spin_lock(&inode->i_lock);
- if (dirtytime && (inode->i_state & I_DIRTY_INODE))
- goto out_unlock_inode;
if ((inode->i_state & flags) != flags) {
const int was_dirty = inode->i_state & I_DIRTY;
inode_attach_wb(inode, NULL);
- /* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
- if (flags & I_DIRTY_INODE)
- inode->i_state &= ~I_DIRTY_TIME;
inode->i_state |= flags;
/*
@@ -2486,7 +2497,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
out_unlock:
if (wb)
spin_unlock(&wb->list_lock);
-out_unlock_inode:
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index aa977c7ea370..cff05a4771b5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
if (!(inode->i_sb->s_flags & SB_LAZYTIME))
return;
- if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+ if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
+ !((inode->i_state | flag) & I_DIRTY_TIME))
return;
if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ad5e3520fae..2243797badf2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
* lazytime mount option is enabled. We keep track of this
* separately from I_DIRTY_SYNC in order to implement
* lazytime. This gets cleared if I_DIRTY_INODE
- * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
- * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- * i_state, but not both. I_DIRTY_PAGES may still be set.
+ * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
+ * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ * in place.
* I_NEW Serves as both a mutex and completion notification.
* New inodes set I_NEW. If two processes both create
* the same inode, one of them will release its inode and
--
2.37.1
* [PATCH v3 3/3] ext4: unconditionally enable the i_version counter
2022-08-03 10:53 [PATCH v2 1/3] ext4: don't increase iversion counter for ea_inodes Lukas Czerner
2022-08-03 10:53 ` [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
@ 2022-08-03 10:53 ` Lukas Czerner
2022-08-03 13:04 ` Jeff Layton
From: Lukas Czerner @ 2022-08-03 10:53 UTC
To: linux-ext4
Cc: jlayton, tytso, linux-fsdevel, Dave Chinner, Benjamin Coddington,
Christoph Hellwig, Darrick J . Wong
From: Jeff Layton <jlayton@kernel.org>
The original i_version implementation was pretty expensive, requiring a
log flush on every change. Because of this, it was gated behind a mount
option (implemented via the MS_I_VERSION mount flag).
Commit ae5e165d855d (fs: new API for handling inode->i_version) made
i_version much less expensive, so there is no longer a performance
penalty from enabling it. xfs and btrfs already enable it
unconditionally when the on-disk format can support it.
Have ext4 ignore the SB_I_VERSION flag and just enable i_version
unconditionally. While we're in here, remove the handling of
Opt_i_version as well, since it was already deprecated for removal in
5.20 and we're almost there anyway.
Ideally, we'd couple this change with a way to disable the i_version
counter (just in case), but the way the iversion mount option was
implemented makes that difficult to do. We'd need to add a new mount
option altogether or do something with tune2fs. That's probably best
left to later patches if it turns out to be needed.
[ Removed leftover bits of i_version from ext4_apply_options() since it
now can't ever be set in ctx->mask_s_flags -- lczerner ]
Cc: Dave Chinner <david@fromorbit.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
---
v3: Removed leftover bits of i_version from ext4_apply_options
fs/ext4/inode.c | 5 ++---
fs/ext4/super.c | 21 ++++-----------------
2 files changed, 6 insertions(+), 20 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b76554124224..acd00300a697 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5411,7 +5411,7 @@ int ext4_setattr(struct user_namespace *mnt_userns, struct dentry *dentry,
return -EINVAL;
}
- if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size)
+ if (attr->ia_size != inode->i_size)
inode_inc_iversion(inode);
if (shrink) {
@@ -5721,8 +5721,7 @@ int ext4_mark_iloc_dirty(handle_t *handle,
* ea_inodes are using i_version for storing reference count, don't
* mess with it
*/
- if (IS_I_VERSION(inode) &&
- !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+ if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
inode_inc_iversion(inode);
/* the do_update_inode consumes one bh->b_count */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 845f2f8aee5f..4c3e6021e772 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1585,7 +1585,7 @@ enum {
Opt_inlinecrypt,
Opt_usrjquota, Opt_grpjquota, Opt_quota,
Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
- Opt_usrquota, Opt_grpquota, Opt_prjquota, Opt_i_version,
+ Opt_usrquota, Opt_grpquota, Opt_prjquota,
Opt_dax, Opt_dax_always, Opt_dax_inode, Opt_dax_never,
Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_warn_on_error,
Opt_nowarn_on_error, Opt_mblk_io_submit, Opt_debug_want_extra_isize,
@@ -1694,7 +1694,6 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
fsparam_flag ("barrier", Opt_barrier),
fsparam_u32 ("barrier", Opt_barrier),
fsparam_flag ("nobarrier", Opt_nobarrier),
- fsparam_flag ("i_version", Opt_i_version),
fsparam_flag ("dax", Opt_dax),
fsparam_enum ("dax", Opt_dax_type, ext4_param_dax),
fsparam_u32 ("stripe", Opt_stripe),
@@ -2140,11 +2139,6 @@ static int ext4_parse_param(struct fs_context *fc, struct fs_parameter *param)
case Opt_abort:
ctx_set_mount_flag(ctx, EXT4_MF_FS_ABORTED);
return 0;
- case Opt_i_version:
- ext4_msg(NULL, KERN_WARNING, deprecated_msg, param->key, "5.20");
- ext4_msg(NULL, KERN_WARNING, "Use iversion instead\n");
- ctx_set_flags(ctx, SB_I_VERSION);
- return 0;
case Opt_inlinecrypt:
#ifdef CONFIG_FS_ENCRYPTION_INLINE_CRYPT
ctx_set_flags(ctx, SB_INLINECRYPT);
@@ -2814,14 +2808,6 @@ static void ext4_apply_options(struct fs_context *fc, struct super_block *sb)
sb->s_flags &= ~ctx->mask_s_flags;
sb->s_flags |= ctx->vals_s_flags;
- /*
- * i_version differs from common mount option iversion so we have
- * to let vfs know that it was set, otherwise it would get cleared
- * on remount
- */
- if (ctx->mask_s_flags & SB_I_VERSION)
- fc->sb_flags |= SB_I_VERSION;
-
#define APPLY(X) ({ if (ctx->spec & EXT4_SPEC_##X) sbi->X = ctx->X; })
APPLY(s_commit_interval);
APPLY(s_stripe);
@@ -2970,8 +2956,6 @@ static int _ext4_show_options(struct seq_file *seq, struct super_block *sb,
SEQ_OPTS_PRINT("min_batch_time=%u", sbi->s_min_batch_time);
if (nodefs || sbi->s_max_batch_time != EXT4_DEF_MAX_BATCH_TIME)
SEQ_OPTS_PRINT("max_batch_time=%u", sbi->s_max_batch_time);
- if (sb->s_flags & SB_I_VERSION)
- SEQ_OPTS_PUTS("i_version");
if (nodefs || sbi->s_stripe)
SEQ_OPTS_PRINT("stripe=%lu", sbi->s_stripe);
if (nodefs || EXT4_MOUNT_DATA_FLAGS &
@@ -4630,6 +4614,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
(test_opt(sb, POSIX_ACL) ? SB_POSIXACL : 0);
+ /* i_version is always enabled now */
+ sb->s_flags |= SB_I_VERSION;
+
if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV &&
(ext4_has_compat_features(sb) ||
ext4_has_ro_compat_features(sb) ||
--
2.37.1
* Re: [PATCH v3 3/3] ext4: unconditionally enable the i_version counter
2022-08-03 10:53 ` [PATCH v3 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
@ 2022-08-03 13:04 ` Jeff Layton
From: Jeff Layton @ 2022-08-03 13:04 UTC
To: Lukas Czerner, linux-ext4
Cc: tytso, linux-fsdevel, Dave Chinner, Benjamin Coddington,
Christoph Hellwig, Darrick J . Wong
On Wed, 2022-08-03 at 12:53 +0200, Lukas Czerner wrote:
....
Looks good to me. Thanks for picking this up, Lukas!
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
2022-08-03 10:53 ` [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
@ 2022-08-05 8:05 ` Eric Biggers
2022-08-05 12:23 ` Lukas Czerner
2022-08-07 23:08 ` Dave Chinner
From: Eric Biggers @ 2022-08-05 8:05 UTC
To: Lukas Czerner
Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Dave Chinner,
Christoph Hellwig, Jan Kara
On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9ad5e3520fae..2243797badf2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> * The inode itself only has dirty timestamps, and the
> * lazytime mount option is enabled. We keep track of this
> * separately from I_DIRTY_SYNC in order to implement
> * lazytime. This gets cleared if I_DIRTY_INODE
> - * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
> - * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> - * i_state, but not both. I_DIRTY_PAGES may still be set.
> + * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> + * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> + * in place.
I'm still having a hard time understanding the new semantics. The first
sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
itself only has dirty timestamps", right?
Also, have you checked all the places that I_DIRTY_TIME is used and verified
they do the right thing now? What about inode_is_dirtytime_only()?
Also what is the precise meaning of the flags argument to ->dirty_inode now?
sb->s_op->dirty_inode(inode,
flags & (I_DIRTY_INODE | I_DIRTY_TIME));
Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.
- Eric
* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
2022-08-05 8:05 ` Eric Biggers
@ 2022-08-05 12:23 ` Lukas Czerner
2022-08-12 18:20 ` Eric Biggers
From: Lukas Czerner @ 2022-08-05 12:23 UTC
To: Eric Biggers
Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Dave Chinner,
Christoph Hellwig, Jan Kara
On Fri, Aug 05, 2022 at 01:05:45AM -0700, Eric Biggers wrote:
> On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 9ad5e3520fae..2243797badf2 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> > * The inode itself only has dirty timestamps, and the
> > * lazytime mount option is enabled. We keep track of this
> > * separately from I_DIRTY_SYNC in order to implement
> > * lazytime. This gets cleared if I_DIRTY_INODE
> > - * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
> > - * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> > - * i_state, but not both. I_DIRTY_PAGES may still be set.
> > + * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> > + * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> > + * in place.
>
> I'm still having a hard time understanding the new semantics. The first
> sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
> itself only has dirty timestamps", right?
The problem is that it was always assumed that I_DIRTY_INODE supersedes
I_DIRTY_TIME, so I_DIRTY_TIME gets cleared in __mark_inode_dirty() when we
have I_DIRTY_INODE. That's true: we call sb->s_op->dirty_inode(), the time
update gets pushed into the on-disk inode structure, I_DIRTY_TIME is
cleared and the inode gets queued for writeback.
Any subsequent dirtying with I_DIRTY_TIME gets ignored simply because
I_DIRTY_INODE is already set in i_state. But in ext4 this time update
will never get pushed into the on-disk inode, and since there is no
I_DIRTY_TIME, once the writeback is done we've lost all the I_DIRTY_TIME
updates in between, even if there was a sync.
Now, we still clear I_DIRTY_TIME when we get I_DIRTY_INODE, but any
subsequent I_DIRTY_TIME-only updates won't be ignored and we set
I_DIRTY_TIME in i_state. After the writeback is done, the inode will be
moved to the b_dirty_time list.
So I am not sure how you would like it to be reworded; would simply
removing the 'only' be OK?
>
> Also, have you checked all the places that I_DIRTY_TIME is used and verified
> they do the right thing now? What about inode_is_dirtytime_only()?
Yes, that's fine, despite the slightly misleading name ;)
>
> Also what is the precise meaning of the flags argument to ->dirty_inode now?
>
> sb->s_op->dirty_inode(inode,
> flags & (I_DIRTY_INODE | I_DIRTY_TIME));
>
> Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.
I don't know. It already doesn't mention I_DIRTY_SYNC, which can be
there as well. Additionally, it can have I_DIRTY_TIME to inform the fs
that we have a dirty timestamp as well (in the case of lazytime).
Perhaps we can add:
If the inode has a dirty timestamp and lazytime is enabled, I_DIRTY_TIME
will be set in the flags.
-Lukas
>
> - Eric
>
* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
2022-08-03 10:53 ` [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
2022-08-05 8:05 ` Eric Biggers
@ 2022-08-07 23:08 ` Dave Chinner
2022-08-08 10:26 ` Lukas Czerner
From: Dave Chinner @ 2022-08-07 23:08 UTC
To: Lukas Czerner
Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Christoph Hellwig, Jan Kara
On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> Currently I_DIRTY_TIME will never get set if the inode already has
> I_DIRTY_INODE, on the assumption that I_DIRTY_INODE supersedes
> I_DIRTY_TIME. That's true; however, ext4 only updates the on-disk inode
> in ->dirty_inode(), not on actual writeback. As a result, if the inode
> already has I_DIRTY_INODE by the time __mark_inode_dirty() is called
> with only I_DIRTY_TIME, the on-disk inode already holds the older
> timestamp and will not get updated until the next I_DIRTY_INODE update,
> which might never come if we crash or lose power.
>
> The problem can be reproduced on ext4 by running xfstest generic/622
> with -o iversion mount option.
>
> Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> I_DIRTY_INODE. Also make sure that this case is handled properly in
> writeback_single_inode(). Additionally, xfs_fs_dirty_inode() is changed
> to accommodate I_DIRTY_TIME in flag.
>
> Thanks Jan Kara for suggestions on how to make this work properly.
>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Suggested-by: Jan Kara <jack@suse.cz>
> ---
> v2: Reworked according to suggestions from Jan
....
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index aa977c7ea370..cff05a4771b5 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
>
> if (!(inode->i_sb->s_flags & SB_LAZYTIME))
> return;
> - if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> + if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
> + !((inode->i_state | flag) & I_DIRTY_TIME))
> return;
My eyes, they bleed. The dirty time code was already a horrid
abomination, and this makes it worse.
From looking at the code, I cannot work out what the new semantics
for I_DIRTY_TIME and I_DIRTY_SYNC are supposed to be, nor can I work
out what this new condition is supposed to be doing. I
*can't verify it is correct* by reading the code.
Can you please add a comment here explaining the conditions where we
don't have to log a new timestamp update?
Also, if "flag" now contains multiple flags, can you rename it
"flags"?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
2022-08-07 23:08 ` Dave Chinner
@ 2022-08-08 10:26 ` Lukas Czerner
From: Lukas Czerner @ 2022-08-08 10:26 UTC
To: Dave Chinner
Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Christoph Hellwig, Jan Kara
On Mon, Aug 08, 2022 at 09:08:10AM +1000, Dave Chinner wrote:
> On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> > Currently I_DIRTY_TIME will never get set if the inode already has
> > I_DIRTY_INODE, on the assumption that I_DIRTY_INODE supersedes
> > I_DIRTY_TIME. That's true; however, ext4 only updates the on-disk inode
> > in ->dirty_inode(), not on actual writeback. As a result, if the inode
> > already has I_DIRTY_INODE by the time __mark_inode_dirty() is called
> > with only I_DIRTY_TIME, the on-disk inode already holds the older
> > timestamp and will not get updated until the next I_DIRTY_INODE update,
> > which might never come if we crash or lose power.
> >
> > The problem can be reproduced on ext4 by running xfstest generic/622
> > with -o iversion mount option.
> >
> > Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> > I_DIRTY_INODE. Also make sure that this case is handled properly in
> > writeback_single_inode(). Additionally, xfs_fs_dirty_inode() is changed
> > to accommodate I_DIRTY_TIME in flag.
> >
> > Thanks Jan Kara for suggestions on how to make this work properly.
> >
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Suggested-by: Jan Kara <jack@suse.cz>
> > ---
> > v2: Reworked according to suggestions from Jan
>
> ....
>
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index aa977c7ea370..cff05a4771b5 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
> >
> > if (!(inode->i_sb->s_flags & SB_LAZYTIME))
> > return;
> > - if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> > + if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
> > + !((inode->i_state | flag) & I_DIRTY_TIME))
> > return;
>
> My eyes, they bleed. The dirty time code was already a horrid
> abomination, and this makes it worse.
>
> From looking at the code, I cannot work out what the new semantics
> for I_DIRTY_TIME and I_DIRTY_SYNC are supposed to be, nor can I work
Hi Dave,
please see the other thread for this patch with Eric Biggers, where I
try to explain it and suggest a change to the doc. Does it make
sense to you, or am I missing something?
https://marc.info/?l=linux-ext4&m=165970194205621&w=2
> out what this new condition is supposed to be doing. I
> *can't verify it is correct* by reading the code.
__mark_inode_dirty() needed to be changed to clear I_DIRTY_TIME from
i_state *before* we call ->dirty_inode(), to avoid a race where we would
lose a timestamp update that comes just a little later, after the
->dirty_inode() call with I_DIRTY_INODE.
But that would break xfs, so I decided to keep the condition and loosen
the requirement so that I_DIRTY_TIME can also be set in 'flag', not just
in i_state. Hence the abomination.
>
> Can you please add a comment here explaining the conditions where we
> don't have to log a new timestamp update?
How about something like this?
Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC) and
has a dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can either be already
set in i_state, or be passed in flags, possibly together with I_DIRTY_SYNC.
>
> Also, if "flag" now contains multiple flags, can you rename it
> "flags"?
Sure, I can do that.
Thanks!
-Lukas
>
> Cheers,
>
> Dave.
>
> --
> Dave Chinner
> david@fromorbit.com
>
* Re: [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
2022-08-05 12:23 ` Lukas Czerner
@ 2022-08-12 18:20 ` Eric Biggers
From: Eric Biggers @ 2022-08-12 18:20 UTC
To: Lukas Czerner
Cc: linux-ext4, jlayton, tytso, linux-fsdevel, Dave Chinner,
Christoph Hellwig, Jan Kara
On Fri, Aug 05, 2022 at 02:23:06PM +0200, Lukas Czerner wrote:
> >
> > Also what is the precise meaning of the flags argument to ->dirty_inode now?
> >
> > sb->s_op->dirty_inode(inode,
> > flags & (I_DIRTY_INODE | I_DIRTY_TIME));
> >
> > Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.
>
> I don't know. It already doesn't mention I_DIRTY_SYNC, which can be
> there as well.
Well, it didn't really need to because there were only two possibilities:
datasync and not datasync. This patch changes that.
> Additionally, it can have I_DIRTY_TIME to inform the fs
> that we have a dirty timestamp as well (in the case of lazytime).
This is introduced by this patch.
- Eric
Thread overview: 9+ messages
2022-08-03 10:53 [PATCH v2 1/3] ext4: don't increase iversion counter for ea_inodes Lukas Czerner
2022-08-03 10:53 ` [PATCH v2 2/3] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Lukas Czerner
2022-08-05 8:05 ` Eric Biggers
2022-08-05 12:23 ` Lukas Czerner
2022-08-12 18:20 ` Eric Biggers
2022-08-07 23:08 ` Dave Chinner
2022-08-08 10:26 ` Lukas Czerner
2022-08-03 10:53 ` [PATCH v3 3/3] ext4: unconditionally enable the i_version counter Lukas Czerner
2022-08-03 13:04 ` Jeff Layton