[PATCH v3 0/2] ext4, jbd2: journal cycled record transactions between each mount

* [PATCH v3 0/2] ext4, jbd2: journal cycled record transactions between each mount
@ 2023-03-14 14:05 Zhang Yi
  2023-03-14 14:05 ` [PATCH v3 1/2] jbd2: continue to record log " Zhang Yi
  2023-03-14 14:05 ` [PATCH v3 2/2] ext4: add journal cycled recording support Zhang Yi
  0 siblings, 2 replies; 12+ messages in thread
From: Zhang Yi @ 2023-03-14 14:05 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yi.zhang, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

v2->v3:
 - Prevent a warning when mounting an old image with journal_cycle_record
   enabled.
 - Limit this mount option to ext4 images only.
v1->v2:
 - Fix the format type warning.
 - Add more checks of the journal_cycle_record mount option on remount.

Hello!

This patch set is the third version of the journal_cycle_record mount
option. It saves the journal head of a cleanly unmounted file system in
the journal superblock, which lets us record journal transactions
continuously across mounts. This helps us backtrack through the journal
and find the root cause of a file system corruption. Today, analysing a
corrupted file system is difficult because little useful information is
available, especially on real products. The option is useful to some
extent, especially for fuzz testing and for deployment in some
short-running products.

I have finished the corresponding e2fsprogs part and will send it out
separately. Both parts have been run through the test cases below and
also pass xfstests in auto mode.
 - Mount a filesystem with empty journal.
 - Mount a filesystem with journal ended in an unrecovered complete
   transaction.
 - Mount a filesystem with journal ended in an incomplete transaction.
 - Mount a corrupted filesystem with out of bound journal s_head.
 - Mount old filesystem without journal s_head set.
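
As a side note on where s_head lives, here is a small user-space sketch of
the journal superblock layout after patch 1. It is only a model under
stated assumptions: uint32_t stands in for __be32, the struct name
jsb_model and the helper jsb_head_offset() are made up for illustration,
and the fields before s_num_fc_blks are written from my reading of
include/linux/jbd2.h, so please check them against the header. The point
is that s_head takes one word out of s_padding, so s_checksum stays at
0x00FC and the superblock keeps its 1024-byte size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of journal_superblock_t with s_head. */
struct jsb_model {
	uint32_t s_header[3];		/* 0x0000: magic, blocktype, sequence */
	uint32_t s_blocksize;		/* 0x000C */
	uint32_t s_maxlen;
	uint32_t s_first;
	uint32_t s_sequence;		/* 0x0018 */
	uint32_t s_start;
	uint32_t s_errno;		/* 0x0020 */
	uint32_t s_feature_compat;	/* 0x0024 */
	uint32_t s_feature_incompat;
	uint32_t s_feature_ro_compat;
	uint8_t  s_uuid[16];		/* 0x0030 */
	uint32_t s_nr_users;		/* 0x0040 */
	uint32_t s_dynsuper;
	uint32_t s_max_transaction;	/* 0x0048 */
	uint32_t s_max_trans_data;
	uint8_t  s_checksum_type;	/* 0x0050 */
	uint8_t  s_padding2[3];
	uint32_t s_num_fc_blks;		/* 0x0054 */
	uint32_t s_head;		/* 0x0058: new field from patch 1 */
	uint32_t s_padding[40];		/* 0x005C: was s_padding[41] at 0x0058 */
	uint32_t s_checksum;		/* 0x00FC */
	uint8_t  s_users[16 * 48];	/* 0x0100 */
};

/* Compile-time checks: the fields after s_head must not move. */
_Static_assert(offsetof(struct jsb_model, s_padding) == 0x5C, "padding moved");
_Static_assert(offsetof(struct jsb_model, s_checksum) == 0xFC, "checksum moved");
_Static_assert(offsetof(struct jsb_model, s_users) == 0x100, "users moved");

/* Byte offset of the new field inside the model. */
static size_t jsb_head_offset(void)
{
	return offsetof(struct jsb_model, s_head);
}
```

Since the word at 0x0058 was padding before, it is expected to read as
zero on images that never had s_head written.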

Any comments are welcome.

Thanks!
Yi.

v2: https://lore.kernel.org/linux-ext4/20230202142224.3679549-1-yi.zhang@huawei.com/
v1: https://lore.kernel.org/linux-ext4/20230119034600.3431194-3-yi.zhang@huaweicloud.com/

Zhang Yi (2):
  jbd2: continue to record log between each mount
  ext4: add journal cycled recording support

 fs/ext4/ext4.h       |  2 ++
 fs/ext4/super.c      | 18 ++++++++++++++++++
 fs/jbd2/journal.c    | 18 ++++++++++++++++--
 fs/jbd2/recovery.c   | 22 +++++++++++++++++-----
 include/linux/jbd2.h |  9 +++++++--
 5 files changed, 60 insertions(+), 9 deletions(-)

-- 
2.31.1



* [PATCH v3 1/2] jbd2: continue to record log between each mount
  2023-03-14 14:05 [PATCH v3 0/2] ext4, jbd2: journal cycled record transactions between each mount Zhang Yi
@ 2023-03-14 14:05 ` Zhang Yi
  2023-03-15  9:48   ` Jan Kara
  2023-03-14 14:05 ` [PATCH v3 2/2] ext4: add journal cycled recording support Zhang Yi
  1 sibling, 1 reply; 12+ messages in thread
From: Zhang Yi @ 2023-03-14 14:05 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yi.zhang, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

For a newly mounted file system, the journal commit thread always
records new transactions from the start of the journal area, no matter
whether the journal was clean or has just been recovered. As a result,
the logdump code in debugfs cannot dump continuous logs across mounts,
which makes it harder to analyse a corrupted file system image and
locate file system inconsistency bugs.

If we get a corrupted file system on a running product and want to
find out what happened, one effective way, besides looking through the
system log, is to backtrack through the journal. But we may not always
run e2fsck before each mount, and the default fsck -a mode cannot
always detect every inconsistency, so some inconsistencies may be
carried over into the next mount before we detect them. By then, the
transactions in the journal are probably discontinuous and some
relatively new transactions have been overwritten, which makes the
analysis hard. If we could record transactions continuously across
mounts, we could get more useful information from the journal. Like this:

 |Previous mount checkpointed/recovered logs|Current mount logs         |
 |{------}{---}{--------} ... {------}| ... |{======}{========}...000000|

And yes, the journal area is limited and cannot record everything, so
the problematic transaction may still be overwritten even if we do
this, but it is still useful for fuzz tests and short-running products.

This patch saves the head block number in the journal superblock after
flushing the journal or unmounting the file system, so that the next
mount can continue recording new transactions after it. This change is
backward compatible because old kernels do not care about the head
block number of the journal. It is also fine to mount a clean old image
without a valid head block number: we fall back to setting the head to
s_first, just as before. Finally, when mounting an unclean file system,
we can also get the journal head easily after scanning/replaying the
journal, so new transactions are recorded after the recovered
transactions.
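
The head selection described above can be sketched in user space as
follows. This is a simplified model of what the journal_reset() hunk
below does, not the kernel code itself: struct journal_model and
pick_start_block() are hypothetical names, and the boolean cycle_record
stands in for the JBD2_CYCLE_RECORD flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not the kernel's journal_t. */
struct journal_model {
	unsigned long first;	/* first usable log block (s_first) */
	unsigned long last;	/* one past the last usable log block */
	unsigned long head;	/* from s_head or recovery; 0 if never set */
	bool cycle_record;	/* stands in for JBD2_CYCLE_RECORD */
};

/* Return the block at which the next transaction should be written. */
static unsigned long pick_start_block(const struct journal_model *j)
{
	/* Old behaviour, or an old image that never stored a head. */
	if (!j->cycle_record || j->head == 0)
		return j->first;

	/* A head outside [first, last) cannot be trusted: fall back. */
	if (j->head < j->first || j->head >= j->last)
		return j->first;

	/* Continue right after the log left by the previous mount. */
	return j->head;
}
```

In this model, a journal with first = 1 and last = 1024 resumes at block
300 when the saved head is 300, and restarts at block 1 when the head is
0 (old image) or 1024 (out of bounds).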

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/jbd2/journal.c    | 18 ++++++++++++++++--
 fs/jbd2/recovery.c   | 22 +++++++++++++++++-----
 include/linux/jbd2.h |  9 +++++++--
 3 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index e80c781731f8..c57ab466fc18 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1556,8 +1556,21 @@ static int journal_reset(journal_t *journal)
 	journal->j_first = first;
 	journal->j_last = last;
 
-	journal->j_head = journal->j_first;
-	journal->j_tail = journal->j_first;
+	if (journal->j_head != 0 && journal->j_flags & JBD2_CYCLE_RECORD) {
+		/*
+		 * Disable the cycled recording mode if the journal head block
+		 * number is not correct.
+		 */
+		if (journal->j_head < first || journal->j_head >= last) {
+			printk(KERN_WARNING "JBD2: Incorrect Journal head block %lu, "
+			       "disable journal_cycle_record\n",
+			       journal->j_head);
+			journal->j_head = journal->j_first;
+		}
+	} else {
+		journal->j_head = journal->j_first;
+	}
+	journal->j_tail = journal->j_head;
 	journal->j_free = journal->j_last - journal->j_first;
 
 	journal->j_tail_sequence = journal->j_transaction_sequence;
@@ -1729,6 +1742,7 @@ static void jbd2_mark_journal_empty(journal_t *journal, blk_opf_t write_flags)
 
 	sb->s_sequence = cpu_to_be32(journal->j_tail_sequence);
 	sb->s_start    = cpu_to_be32(0);
+	sb->s_head     = cpu_to_be32(journal->j_head);
 	if (jbd2_has_feature_fast_commit(journal)) {
 		/*
 		 * When journal is clean, no need to commit fast commit flag and
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 8286a9ec122f..0184931d47f7 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -29,6 +29,7 @@ struct recovery_info
 {
 	tid_t		start_transaction;
 	tid_t		end_transaction;
+	unsigned long	head_block;
 
 	int		nr_replays;
 	int		nr_revokes;
@@ -301,11 +302,11 @@ int jbd2_journal_recover(journal_t *journal)
 	 * is always zero if, and only if, the journal was cleanly
 	 * unmounted.
 	 */
-
 	if (!sb->s_start) {
-		jbd2_debug(1, "No recovery required, last transaction %d\n",
-			  be32_to_cpu(sb->s_sequence));
+		jbd2_debug(1, "No recovery required, last transaction %d, head block %u\n",
+			  be32_to_cpu(sb->s_sequence), be32_to_cpu(sb->s_head));
 		journal->j_transaction_sequence = be32_to_cpu(sb->s_sequence) + 1;
+		journal->j_head = be32_to_cpu(sb->s_head);
 		return 0;
 	}
 
@@ -324,6 +325,9 @@ int jbd2_journal_recover(journal_t *journal)
 	/* Restart the log at the next transaction ID, thus invalidating
 	 * any existing commit records in the log. */
 	journal->j_transaction_sequence = ++info.end_transaction;
+	journal->j_head = info.head_block;
+	jbd2_debug(1, "JBD2: last transaction %d, head block %lu\n",
+		  journal->j_transaction_sequence, journal->j_head);
 
 	jbd2_journal_clear_revoke(journal);
 	err2 = sync_blockdev(journal->j_fs_dev);
@@ -364,6 +368,7 @@ int jbd2_journal_skip_recovery(journal_t *journal)
 	if (err) {
 		printk(KERN_ERR "JBD2: error %d scanning journal\n", err);
 		++journal->j_transaction_sequence;
+		journal->j_head = journal->j_first;
 	} else {
 #ifdef CONFIG_JBD2_DEBUG
 		int dropped = info.end_transaction - 
@@ -373,6 +378,7 @@ int jbd2_journal_skip_recovery(journal_t *journal)
 			  dropped, (dropped == 1) ? "" : "s");
 #endif
 		journal->j_transaction_sequence = ++info.end_transaction;
+		journal->j_head = info.head_block;
 	}
 
 	journal->j_tail = 0;
@@ -462,7 +468,7 @@ static int do_one_pass(journal_t *journal,
 			struct recovery_info *info, enum passtype pass)
 {
 	unsigned int		first_commit_ID, next_commit_ID;
-	unsigned long		next_log_block;
+	unsigned long		next_log_block, head_block;
 	int			err, success = 0;
 	journal_superblock_t *	sb;
 	journal_header_t *	tmp;
@@ -485,6 +491,7 @@ static int do_one_pass(journal_t *journal,
 	sb = journal->j_superblock;
 	next_commit_ID = be32_to_cpu(sb->s_sequence);
 	next_log_block = be32_to_cpu(sb->s_start);
+	head_block = next_log_block;
 
 	first_commit_ID = next_commit_ID;
 	if (pass == PASS_SCAN)
@@ -809,6 +816,7 @@ static int do_one_pass(journal_t *journal,
 				if (commit_time < last_trans_commit_time)
 					goto ignore_crc_mismatch;
 				info->end_transaction = next_commit_ID;
+				info->head_block = head_block;
 
 				if (!jbd2_has_feature_async_commit(journal)) {
 					journal->j_failed_commit =
@@ -817,8 +825,10 @@ static int do_one_pass(journal_t *journal,
 					break;
 				}
 			}
-			if (pass == PASS_SCAN)
+			if (pass == PASS_SCAN) {
 				last_trans_commit_time = commit_time;
+				head_block = next_log_block;
+			}
 			brelse(bh);
 			next_commit_ID++;
 			continue;
@@ -868,6 +878,8 @@ static int do_one_pass(journal_t *journal,
 	if (pass == PASS_SCAN) {
 		if (!info->end_transaction)
 			info->end_transaction = next_commit_ID;
+		if (!info->head_block)
+			info->head_block = head_block;
 	} else {
 		/* It's really bad news if different passes end up at
 		 * different places (but possible due to IO errors). */
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 5962072a4b19..475f135260c9 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -265,8 +265,10 @@ typedef struct journal_superblock_s
 	__u8	s_padding2[3];
 /* 0x0054 */
 	__be32	s_num_fc_blks;		/* Number of fast commit blocks */
-/* 0x0058 */
-	__u32	s_padding[41];
+	__be32	s_head;			/* blocknr of head of log, only uptodate
+					 * while the filesystem is clean */
+/* 0x005C */
+	__u32	s_padding[40];
 	__be32	s_checksum;		/* crc32c(superblock) */
 
 /* 0x0100 */
@@ -1392,6 +1394,9 @@ JBD2_FEATURE_INCOMPAT_FUNCS(fast_commit,	FAST_COMMIT)
 #define JBD2_ABORT_ON_SYNCDATA_ERR	0x040	/* Abort the journal on file
 						 * data write error in ordered
 						 * mode */
+#define JBD2_CYCLE_RECORD		0x080	/* Journal cycled record log on
+						 * clean and empty filesystem
+						 * logging area */
 #define JBD2_FAST_COMMIT_ONGOING	0x100	/* Fast commit is ongoing */
 #define JBD2_FULL_COMMIT_ONGOING	0x200	/* Full commit is ongoing */
 #define JBD2_JOURNAL_FLUSH_DISCARD	0x0001
-- 
2.31.1



* [PATCH v3 2/2] ext4: add journal cycled recording support
  2023-03-14 14:05 [PATCH v3 0/2] ext4, jbd2: journal cycled record transactions between each mount Zhang Yi
  2023-03-14 14:05 ` [PATCH v3 1/2] jbd2: continue to record log " Zhang Yi
@ 2023-03-14 14:05 ` Zhang Yi
  1 sibling, 0 replies; 12+ messages in thread
From: Zhang Yi @ 2023-03-14 14:05 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yi.zhang, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

Introduce a new mount option, 'journal_cycle_record', that lets jbd2
continue recording new journal transactions from the recovered journal
head or after the transactions checkpointed in the previous mount.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/ext4.h  |  2 ++
 fs/ext4/super.c | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 4eeb02d456a9..1106a4a3c341 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1267,6 +1267,8 @@ struct ext4_inode_info {
 #define EXT4_MOUNT2_MB_OPTIMIZE_SCAN	0x00000080 /* Optimize group
 						    * scanning in mballoc
 						    */
+#define EXT4_MOUNT2_JOURNAL_CYCLE_RECORD	0x00000100 /* Journal cycled record
+							    * log on empty logging area */
 
 #define clear_opt(sb, opt)		EXT4_SB(sb)->s_mount_opt &= \
 						~EXT4_MOUNT_##opt
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 88f7b8a88c76..6e071aeea44a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1591,6 +1591,7 @@ enum {
 	Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
 	Opt_no_prefetch_block_bitmaps, Opt_mb_optimize_scan,
 	Opt_errors, Opt_data, Opt_data_err, Opt_jqfmt, Opt_dax_type,
+	Opt_journal_cycle_record,
 #ifdef CONFIG_EXT4_DEBUG
 	Opt_fc_debug_max_replay, Opt_fc_debug_force
 #endif
@@ -1670,6 +1671,7 @@ static const struct fs_parameter_spec ext4_param_specs[] = {
 	fsparam_flag	("journal_checksum",	Opt_journal_checksum),
 	fsparam_flag	("nojournal_checksum",	Opt_nojournal_checksum),
 	fsparam_flag	("journal_async_commit",Opt_journal_async_commit),
+	fsparam_flag	("journal_cycle_record",Opt_journal_cycle_record),
 	fsparam_flag	("abort",		Opt_abort),
 	fsparam_enum	("data",		Opt_data, ext4_param_data),
 	fsparam_enum	("data_err",		Opt_data_err,
@@ -1826,6 +1828,8 @@ static const struct mount_opts {
 	{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET},
 	{Opt_no_prefetch_block_bitmaps, EXT4_MOUNT_NO_PREFETCH_BLOCK_BITMAPS,
 	 MOPT_SET},
+	{Opt_journal_cycle_record, EXT4_MOUNT2_JOURNAL_CYCLE_RECORD,
+	 MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY},
 #ifdef CONFIG_EXT4_DEBUG
 	{Opt_fc_debug_force, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
 	 MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY},
@@ -2761,6 +2765,13 @@ static int ext4_check_opt_consistency(struct fs_context *fc,
 			    !(sbi->s_mount_opt2 & EXT4_MOUNT2_DAX_INODE))) {
 			goto fail_dax_change_remount;
 		}
+
+		if (ctx_test_mount_opt2(ctx, EXT4_MOUNT2_JOURNAL_CYCLE_RECORD) &&
+		    !test_opt2(sb, JOURNAL_CYCLE_RECORD)) {
+			ext4_msg(NULL, KERN_ERR,
+				 "can't change journal_cycle_record on remount");
+			return -EINVAL;
+		}
 	}
 
 	return ext4_check_quota_consistency(fc, sb);
@@ -5291,6 +5302,11 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 			goto failed_mount3a;
 		}
 
+		if (test_opt2(sb, JOURNAL_CYCLE_RECORD)) {
+			ext4_msg(sb, KERN_ERR, "can't mount with "
+				 "journal_cycle_record, fs mounted w/o journal");
+			goto failed_mount3a;
+		}
 		if (test_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM)) {
 			ext4_msg(sb, KERN_ERR, "can't mount with "
 				 "journal_checksum, fs mounted w/o journal");
@@ -5691,6 +5707,8 @@ static void ext4_init_journal_params(struct super_block *sb, journal_t *journal)
 		journal->j_flags |= JBD2_ABORT_ON_SYNCDATA_ERR;
 	else
 		journal->j_flags &= ~JBD2_ABORT_ON_SYNCDATA_ERR;
+	if (test_opt2(sb, JOURNAL_CYCLE_RECORD))
+		journal->j_flags |= JBD2_CYCLE_RECORD;
 	write_unlock(&journal->j_state_lock);
 }
 
-- 
2.31.1



* Re: [PATCH v3 1/2] jbd2: continue to record log between each mount
  2023-03-14 14:05 ` [PATCH v3 1/2] jbd2: continue to record log " Zhang Yi
@ 2023-03-15  9:48   ` Jan Kara
  2023-03-15 12:37       ` Zhang Yi
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2023-03-15  9:48 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yi.zhang, yukuai3

On Tue 14-03-23 22:05:21, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> For a newly mounted file system, the journal commit thread always
> records new transactions from the start of the journal area, no matter
> whether the journal was clean or has just been recovered. As a result,
> the logdump code in debugfs cannot dump continuous logs across mounts,
> which makes it harder to analyse a corrupted file system image and
> locate file system inconsistency bugs.
> 
> If we get a corrupted file system on a running product and want to
> find out what happened, one effective way, besides looking through the
> system log, is to backtrack through the journal. But we may not always
> run e2fsck before each mount, and the default fsck -a mode cannot
> always detect every inconsistency, so some inconsistencies may be
> carried over into the next mount before we detect them. By then, the
> transactions in the journal are probably discontinuous and some
> relatively new transactions have been overwritten, which makes the
> analysis hard. If we could record transactions continuously across
> mounts, we could get more useful information from the journal. Like this:
> 
>  |Previous mount checkpointed/recovered logs|Current mount logs         |
>  |{------}{---}{--------} ... {------}| ... |{======}{========}...000000|
> 
> And yes, the journal area is limited and cannot record everything, so
> the problematic transaction may still be overwritten even if we do
> this, but it is still useful for fuzz tests and short-running products.
> 
> This patch saves the head block number in the journal superblock after
> flushing the journal or unmounting the file system, so that the next
> mount can continue recording new transactions after it. This change is
> backward compatible because old kernels do not care about the head
> block number of the journal. It is also fine to mount a clean old image
> without a valid head block number: we fall back to setting the head to
> s_first, just as before. Finally, when mounting an unclean file system,
> we can also get the journal head easily after scanning/replaying the
> journal, so new transactions are recorded after the recovered
> transactions.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

I like this implementation! I even think we could perhaps make ext4 always
behave this way to not increase size of the test matrix. Or do you see any
downside to this option?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [Ocfs2-devel] [PATCH v3 1/2] jbd2: continue to record log between each mount
  2023-03-15  9:48   ` Jan Kara
@ 2023-03-15 12:37       ` Zhang Yi
  0 siblings, 0 replies; 12+ messages in thread
From: Zhang Yi via Ocfs2-devel @ 2023-03-15 12:37 UTC (permalink / raw)
  To: Jan Kara, Zhang Yi
  Cc: adilger.kernel, linux-ext4, tytso, ocfs2-devel, yukuai3

On 2023/3/15 17:48, Jan Kara wrote:
> I like this implementation! I even think we could perhaps make ext4 always
> behave this way to not increase size of the test matrix. Or do you see any
> downside to this option?
> 

Thanks for your suggestion. Indeed, I haven't found any side effects of
this option, either in theory or in actual tests on ext4. I added a new
mount option only to be on the safe side, so that users could disable it
if they didn't want it. I would also prefer to make ext4 always behave
this way. :)

I would like to keep the JBD2_CYCLE_RECORD flag (ocfs2 also uses jbd2, and
I don't want to disturb it until it needs this), remove
EXT4_MOUNT2_JOURNAL_CYCLE_RECORD, and always set JBD2_CYCLE_RECORD on ext4
in patch 2 in the next iteration.

Thanks,
Yi.


* Re: [PATCH v3 1/2] jbd2: continue to record log between each mount
  2023-03-15 12:37       ` Zhang Yi
@ 2023-03-15 17:28         ` Jan Kara via Ocfs2-devel
  -1 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2023-03-15 17:28 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Jan Kara, Zhang Yi, linux-ext4, tytso, adilger.kernel, yukuai3,
	ocfs2-devel

On Wed 15-03-23 20:37:32, Zhang Yi wrote:
> On 2023/3/15 17:48, Jan Kara wrote:
> > On Tue 14-03-23 22:05:21, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> For a newly mounted file system, the journal committing thread always
> >> records new transactions from the start of the journal area, no matter
> >> whether the journal was clean or has just been recovered. So the logdump
> >> code in debugfs cannot dump continuous logs across mounts, which makes it
> >> harder to analyse a corrupted file system image and locate file system
> >> inconsistency bugs.
> >>
> >> If we get a corrupted file system in a running product and want to
> >> find out what has happened, besides looking up the system log, one
> >> effective way is to backtrack the journal log. But we may not always run
> >> e2fsck before each mount, and the default fsck -a mode cannot always
> >> find all inconsistencies, so some inconsistencies could be left over
> >> into the next mount until we detect them. In the end, transactions in the
> >> journal may be discontinuous and some relatively new transactions
> >> may have been overwritten, which makes analysis hard. If we could record
> >> transactions continuously between each mount, we could acquire more
> >> useful info from the journal. Like this:
> >>
> >>  |Previous mount checkpointed/recovered logs|Current mount logs         |
> >>  |{------}{---}{--------} ... {------}| ... |{======}{========}...000000|
> >>
> >> And yes, the journal area is limited and cannot record everything; the
> >> problematic transaction may still be overwritten even if we do this, but
> >> this is still useful for fuzz tests and short-running products.
> >>
> >> This patch saves the head blocknr in the superblock after flushing the
> >> journal or unmounting the file system, so that the next mount can continue
> >> to record new transactions behind it. This change is backward compatible
> >> because the old kernel does not care about the head blocknr of the
> >> journal. It is also fine if we mount a clean old image without a valid
> >> head blocknr: we fall back to setting it to s_first just like before.
> >> Finally, when mounting an unclean file system, we can also get
> >> the journal head easily after scanning/replaying the journal, and it will
> >> continue to record new transactions after the recovered transactions.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > 
> > I like this implementation! I even think we could perhaps make ext4 always
> > behave this way so as not to increase the size of the test matrix. Or do
> > you see any downside to this option?
> > 
> 
> Thanks for your suggestion. Indeed, I haven't found any side effects of this
> option, either in theory or in actual tests on ext4. I added the new option
> only to be safe, so that users could disable it if they don't want it. I
> also prefer to make ext4 always behave this way. :)
> 
> I would like to keep the JBD2_CYCLE_RECORD flag (ocfs2 also uses jbd2, and I
> don't want to disturb it until it needs this), remove
> EXT4_MOUNT2_JOURNAL_CYCLE_RECORD, and always set JBD2_CYCLE_RECORD on ext4 in
> patch 2 in the next iteration.

Yes, that makes sense.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v3 1/2] jbd2: continue to record log between each mount
  2023-03-15 17:28         ` [Ocfs2-devel] " Jan Kara via Ocfs2-devel
@ 2023-03-17 11:25           ` Jan Kara via Ocfs2-devel
  -1 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2023-03-17 11:25 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Jan Kara, Zhang Yi, linux-ext4, tytso, adilger.kernel, yukuai3,
	ocfs2-devel

On Wed 15-03-23 18:28:17, Jan Kara wrote:
> On Wed 15-03-23 20:37:32, Zhang Yi wrote:
> > On 2023/3/15 17:48, Jan Kara wrote:
> > > On Tue 14-03-23 22:05:21, Zhang Yi wrote:
> > >> From: Zhang Yi <yi.zhang@huawei.com>
> > >>
> > >> For a newly mounted file system, the journal committing thread always
> > >> records new transactions from the start of the journal area, no matter
> > >> whether the journal was clean or has just been recovered. So the logdump
> > >> code in debugfs cannot dump continuous logs across mounts, which makes it
> > >> harder to analyse a corrupted file system image and locate file system
> > >> inconsistency bugs.
> > >>
> > >> If we get a corrupted file system in a running product and want to
> > >> find out what has happened, besides looking up the system log, one
> > >> effective way is to backtrack the journal log. But we may not always run
> > >> e2fsck before each mount, and the default fsck -a mode cannot always
> > >> find all inconsistencies, so some inconsistencies could be left over
> > >> into the next mount until we detect them. In the end, transactions in the
> > >> journal may be discontinuous and some relatively new transactions
> > >> may have been overwritten, which makes analysis hard. If we could record
> > >> transactions continuously between each mount, we could acquire more
> > >> useful info from the journal. Like this:
> > >>
> > >>  |Previous mount checkpointed/recovered logs|Current mount logs         |
> > >>  |{------}{---}{--------} ... {------}| ... |{======}{========}...000000|
> > >>
> > >> And yes, the journal area is limited and cannot record everything; the
> > >> problematic transaction may still be overwritten even if we do this, but
> > >> this is still useful for fuzz tests and short-running products.
> > >>
> > >> This patch saves the head blocknr in the superblock after flushing the
> > >> journal or unmounting the file system, so that the next mount can continue
> > >> to record new transactions behind it. This change is backward compatible
> > >> because the old kernel does not care about the head blocknr of the
> > >> journal. It is also fine if we mount a clean old image without a valid
> > >> head blocknr: we fall back to setting it to s_first just like before.
> > >> Finally, when mounting an unclean file system, we can also get
> > >> the journal head easily after scanning/replaying the journal, and it will
> > >> continue to record new transactions after the recovered transactions.
> > >>
> > >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> > > 
> > > I like this implementation! I even think we could perhaps make ext4 always
> > > behave this way so as not to increase the size of the test matrix. Or do
> > > you see any downside to this option?
> > > 
> > 
> > Thanks for your suggestion. Indeed, I haven't found any side effects of this
> > option, either in theory or in actual tests on ext4. I added the new option
> > only to be safe, so that users could disable it if they don't want it. I
> > also prefer to make ext4 always behave this way. :)
> > 
> > I would like to keep the JBD2_CYCLE_RECORD flag (ocfs2 also uses jbd2, and I
> > don't want to disturb it until it needs this), remove
> > EXT4_MOUNT2_JOURNAL_CYCLE_RECORD, and always set JBD2_CYCLE_RECORD on ext4 in
> > patch 2 in the next iteration.
> 
> Yes, that makes sense.

FWIW, I spoke with Ted yesterday and he also agrees that we don't need an
ext4 mount option for this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v3 1/2] jbd2: continue to record log between each mount
  2023-03-17 11:25           ` [Ocfs2-devel] " Jan Kara via Ocfs2-devel
@ 2023-03-18  2:25             ` Zhang Yi via Ocfs2-devel
  -1 siblings, 0 replies; 12+ messages in thread
From: Zhang Yi @ 2023-03-18  2:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Zhang Yi, linux-ext4, tytso, adilger.kernel, yukuai3, ocfs2-devel

On 2023/3/17 19:25, Jan Kara wrote:
> On Wed 15-03-23 18:28:17, Jan Kara wrote:
>> On Wed 15-03-23 20:37:32, Zhang Yi wrote:
>>> On 2023/3/15 17:48, Jan Kara wrote:
>>>> On Tue 14-03-23 22:05:21, Zhang Yi wrote:
>>>>> From: Zhang Yi <yi.zhang@huawei.com>
>>>>>
>>>>> For a newly mounted file system, the journal committing thread always
>>>>> records new transactions from the start of the journal area, no matter
>>>>> whether the journal was clean or has just been recovered. So the logdump
>>>>> code in debugfs cannot dump continuous logs across mounts, which makes it
>>>>> harder to analyse a corrupted file system image and locate file system
>>>>> inconsistency bugs.
>>>>>
>>>>> If we get a corrupted file system in a running product and want to
>>>>> find out what has happened, besides looking up the system log, one
>>>>> effective way is to backtrack the journal log. But we may not always run
>>>>> e2fsck before each mount, and the default fsck -a mode cannot always
>>>>> find all inconsistencies, so some inconsistencies could be left over
>>>>> into the next mount until we detect them. In the end, transactions in the
>>>>> journal may be discontinuous and some relatively new transactions
>>>>> may have been overwritten, which makes analysis hard. If we could record
>>>>> transactions continuously between each mount, we could acquire more
>>>>> useful info from the journal. Like this:
>>>>>
>>>>>  |Previous mount checkpointed/recovered logs|Current mount logs         |
>>>>>  |{------}{---}{--------} ... {------}| ... |{======}{========}...000000|
>>>>>
>>>>> And yes, the journal area is limited and cannot record everything; the
>>>>> problematic transaction may still be overwritten even if we do this, but
>>>>> this is still useful for fuzz tests and short-running products.
>>>>>
>>>>> This patch saves the head blocknr in the superblock after flushing the
>>>>> journal or unmounting the file system, so that the next mount can continue
>>>>> to record new transactions behind it. This change is backward compatible
>>>>> because the old kernel does not care about the head blocknr of the
>>>>> journal. It is also fine if we mount a clean old image without a valid
>>>>> head blocknr: we fall back to setting it to s_first just like before.
>>>>> Finally, when mounting an unclean file system, we can also get
>>>>> the journal head easily after scanning/replaying the journal, and it will
>>>>> continue to record new transactions after the recovered transactions.
>>>>>
>>>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>>>
>>>> I like this implementation! I even think we could perhaps make ext4 always
>>>> behave this way so as not to increase the size of the test matrix. Or do
>>>> you see any downside to this option?
>>>>
>>>
>>> Thanks for your suggestion. Indeed, I haven't found any side effects of this
>>> option, either in theory or in actual tests on ext4. I added the new option
>>> only to be safe, so that users could disable it if they don't want it. I
>>> also prefer to make ext4 always behave this way. :)
>>>
>>> I would like to keep the JBD2_CYCLE_RECORD flag (ocfs2 also uses jbd2, and I
>>> don't want to disturb it until it needs this), remove
>>> EXT4_MOUNT2_JOURNAL_CYCLE_RECORD, and always set JBD2_CYCLE_RECORD on ext4 in
>>> patch 2 in the next iteration.
>>
>> Yes, that makes sense.
> 
> FWIW, I spoke with Ted yesterday and he also agrees that we don't need an
> ext4 mount option for this.
> 

Thanks! I've removed this mount option in v4.

Yi.

Thread overview: 12+ messages
2023-03-14 14:05 [PATCH v3 0/2] ext4, jbd2: journal cycled record transactions between each mount Zhang Yi
2023-03-14 14:05 ` [PATCH v3 1/2] jbd2: continue to record log " Zhang Yi
2023-03-15  9:48   ` Jan Kara
2023-03-15 12:37     ` [Ocfs2-devel] " Zhang Yi via Ocfs2-devel
2023-03-15 12:37       ` Zhang Yi
2023-03-15 17:28       ` Jan Kara
2023-03-15 17:28         ` [Ocfs2-devel] " Jan Kara via Ocfs2-devel
2023-03-17 11:25         ` Jan Kara
2023-03-17 11:25           ` [Ocfs2-devel] " Jan Kara via Ocfs2-devel
2023-03-18  2:25           ` Zhang Yi
2023-03-18  2:25             ` [Ocfs2-devel] " Zhang Yi via Ocfs2-devel
2023-03-14 14:05 ` [PATCH v3 2/2] ext4: add journal cycled recording support Zhang Yi
