Re: JBD2: journal transaction 6943 on loop0-8 is corrupt.

From: Manfred Spraul <manfred@colorfullife.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org, 1vier1@web.de
Subject: Re: JBD2: journal transaction 6943 on loop0-8 is corrupt.
Date: Thu, 30 Dec 2021 09:16:13 +0100	[thread overview]
Message-ID: <8fe067d0-6d57-9dd7-2c10-5a2c34037ee1@colorfullife.com> (raw)
In-Reply-To: <Yc0NUYyRhLdtapq+@mit.edu>

[-- Attachment #1: Type: text/plain, Size: 4552 bytes --]

Hi Ted,

On 12/30/21 02:37, Theodore Ts'o wrote:
> On Tue, Dec 28, 2021 at 09:36:22PM +0100, Manfred Spraul wrote:
>> Hi,
>>
>> with simulated power failures, I see a corrupted journal
>>
>> [39056.200845] JBD2: journal transaction 6943 on loop0-8 is corrupt.
>> [39056.200851] EXT4-fs (loop0): error loading journal
> This means that the journal replay found a commit which was *not* the
> last commit, and which contained a CRC error.  If it's the last commit
> (e.g., there is no valid subsequent commit block), then it's possible
> that the journal commit was never completed before the system crashed
> --- e.g., it was an interrupted commit.

It is the last commit, there are no valid subsequent commit blocks.

The current failure model is simple: all blocks up to block <n> are 
written, the blocks starting from <n+1> are discarded.

What I can't rule out, but I think this is not what I see:
The image is small (512 MB) and everything is in memory. Thus I would 
not rule out that the whole journal is filled within less than one second.
The commit header contains h_commit_sec and h_commit_nsec, but from what 
I see, do_one_pass() evaluates only h_commit_sec.


> Your test is aborting the commit at various points in the write I/O
> stream, so it should be simulating an interrupted commit (assuming
> that it's not corrupting any I/O.  So the jbd2 layer should have
> understood it was the last commit in the journal, and been OK with the
> checksum failure.

I think the jbd2 layer understood that it was the last commit - but it 
nevertheless failed the recovery.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/jbd2/recovery.c?h=v5.16-rc7#n809

> |
> 		if  (pass  ==  PASS_SCAN  &&
> 			!jbd2_commit_block_csum_verify(journal,
> 							bh->b_data))  {
> 			chksum_error:
> 				if  (commit_time  <  last_trans_commit_time)
> 					goto  ignore_crc_mismatch;
> 				info->end_transaction  =  next_commit_ID;
>
> 				if  (!jbd2_has_feature_async_commit(journal))  {
> 					journal->j_failed_commit  =
> 						next_commit_ID;
> 					brelse(bh);
> 					break;
> 				}
> 			}
> |

async_commit() is false.

journal->j_failed_commit is set, and thus after journal playback, a 
failure is reported :-(


What I have done:

- I have removed the line journal->j_failed_commit = next_commit_ID, 
then the image is mounted.

- I had added pr_info() lines, and this confirms that it starts with a 
failure of jbd2_block_csum_verify().

dmesg with JBD2 debug output and additional pr_info lines:

 >>>

[  748.591348] next_commit_id increased: 6943.
[  748.593397] fs/jbd2/recovery.c: (do_one_pass, 517): Scanning for 
sequence ID 6943 at 4544/8192
[  748.594322] fs/jbd2/recovery.c: (do_one_pass, 526): JBD2: checking 
block 4544
[  748.595879] fs/jbd2/recovery.c: (do_one_pass, 549): Found magic 5, 
sequence 6943
[  748.596800] fs/jbd2/recovery.c: (do_one_pass, 517): Scanning for 
sequence ID 6943 at 4545/8192
[  748.600073] fs/jbd2/recovery.c: (do_one_pass, 526): JBD2: checking 
block 4545
[  748.601398] fs/jbd2/recovery.c: (do_one_pass, 549): Found magic 1, 
sequence 6943
[  748.603481] fs/jbd2/recovery.c: (do_one_pass, 517): Scanning for 
sequence ID 6943 at 4574/8192
[  748.606238] fs/jbd2/recovery.c: (do_one_pass, 526): JBD2: checking 
block 4574
[  748.607241] fs/jbd2/recovery.c: (do_one_pass, 549): Found magic 2, 
sequence 6943
[  748.608248] commit_csum_verify error: provided 81be34bd.
[  748.610199] direct block verify error.
[  748.612327] chksum_error.
[  748.617616] error line 2.
[  748.622085] error line 3.
[  748.623291] fs/jbd2/recovery.c: (do_one_pass, 517): Scanning for 
sequence ID 6943 at 4575/8192
[  748.624133] fs/jbd2/recovery.c: (do_one_pass, 526): JBD2: checking 
block 4575
[  748.625134] done, info->end_transaction is 6943.
[  748.626966] done with update, info->end_transaction is 6943.
[  748.629521] next_commit_id: initial value 6799.
[  748.630836] fs/jbd2/recovery.c: (do_one_pass, 491): Starting recovery 
pass 1
[  748.631529] fs/jbd2/recovery.c: (do_one_pass, 517): Scanning for 
sequence ID 6799 at 6734/8192
[  748.633346] fs/jbd2/recovery.c: (do_one_pass, 526): JBD2: checking 
block 6734

<<<

What do you think?
Is JBD2 too aggressive in declaring something a journal corruption?

What is definitively correct is that the failure is minor. fsck -f was 
able to correct it.
Thus: What is your understanding:
If a mount command fails due to a journal corruption, should user space 
first try a fsck -f before giving up?


--

     Manfred

[-- Attachment #2: 0001-jbd2-recovery.c-Continue-on-csum-failures-for-commit.patch --]
[-- Type: text/x-patch, Size: 2744 bytes --]

From 759062a21176c46a8fc1fa4d95e20c4450a10b9a Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Thu, 30 Dec 2021 08:22:00 +0100
Subject: [PATCH] jbd2/recovery.c: Continue on csum failures for commit record

Noticed with simulated power failures, i.e. not on real hardware:

The failure model is:
- 512 bytes writes are atomic.
- Larger writes are not atomic.
- Everything is written in order.

Since the JBD2 block size can be larger than the block size of the
physical drive, it may happen that a (JBD2) block starts with the
expected magic/block type==JBD2_COMMIT_BLOCK/sequence number/
commit time, but nevertheless the csum verification fails because
jbd2_commit_block_csum_verify() calculates a checksum over
the complete JBD2 block.

Thus: Just end the scan on a csum failure.

Note: The change is most likely incomplete. There are probably
more situations where the code assumes that an incorrect csum
is always a corruption.

---
 fs/jbd2/recovery.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 8ca3527189f8..f6d59bc204a1 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -709,6 +709,20 @@ static int do_one_pass(journal_t *journal,
 			/*     How to differentiate between interrupted commit
 			 *               and journal corruption ?
 			 *
+			 * Assume: Physical block size 512 bytes,
+			 *         j->j_blocksize=1024
+			 * If the 1st physical block of a commit block is
+			 * written, then the correct magic/block type/
+			 * sequence number/commit time will be there.
+			 * If the 2nd block is not written, then the csum
+			 * verification will fail, because the csum is
+			 * calculated over the whole JBD2 block.
+			 *
+			 * Thus: Only only async_commit, n-th transaction fails
+			 * csum check, (n+1)th transaction passes csum check,
+			 * is a journal corruption. Everything else could be
+			 * just an interrupted write.
+			 *
 			 * {nth transaction}
 			 *        Checksum Verification Failed
 			 *			 |
@@ -717,7 +731,7 @@ static int do_one_pass(journal_t *journal,
 			 * 	async_commit             sync_commit
 			 *     		|                    |
 			 *		| GO TO NEXT    "Journal Corruption"
-			 *		| TRANSACTION
+			 *		| TRANSACTION   or "Interrupted Commit"
 			 *		|
 			 * {(n+1)th transanction}
 			 *		|
@@ -806,8 +820,9 @@ static int do_one_pass(journal_t *journal,
 				info->end_transaction = next_commit_ID;
 
 				if (!jbd2_has_feature_async_commit(journal)) {
-					journal->j_failed_commit =
-						next_commit_ID;
+					/* Interrupted commit or corrupt
+					 * journal. Assume interrupted commit.
+					 */
 					brelse(bh);
 					break;
 				}
-- 
2.33.1


[-- Attachment #3: 0003-DEBUG-patch-add-printk-to-fs-jbd2-recovery.c.patch --]
[-- Type: text/x-patch, Size: 3320 bytes --]

From 693bcf65312e202c15b2f291e1d105b5486cfeb9 Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Thu, 30 Dec 2021 08:03:04 +0100
Subject: [PATCH 3/3] DEBUG patch: add printk to fs/jbd2/recovery.c

debug printouts.

---
 fs/jbd2/journal.c  |  2 +-
 fs/jbd2/recovery.c | 13 +++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 35302bc192eb..58545fa74b9f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -49,7 +49,7 @@
 #include <asm/page.h>
 
 #ifdef CONFIG_JBD2_DEBUG
-ushort jbd2_journal_enable_debug __read_mostly;
+ushort jbd2_journal_enable_debug __read_mostly = 99;
 EXPORT_SYMBOL(jbd2_journal_enable_debug);
 
 module_param_named(jbd2_debug, jbd2_journal_enable_debug, ushort, 0644);
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 8ca3527189f8..b66884786cb9 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -428,6 +428,9 @@ static int jbd2_commit_block_csum_verify(journal_t *j, void *buf)
 	h->h_chksum[0] = 0;
 	calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
 	h->h_chksum[0] = provided;
+if (provided != cpu_to_be32(calculated)) {
+pr_info("commit_csum_verify error: provided %x.\n", provided);
+}
 
 	return provided == cpu_to_be32(calculated);
 }
@@ -478,6 +481,7 @@ static int do_one_pass(journal_t *journal,
 
 	sb = journal->j_superblock;
 	next_commit_ID = be32_to_cpu(sb->s_sequence);
+pr_info("next_commit_id: initial value %u.\n", (unsigned int) next_commit_ID);
 	next_log_block = be32_to_cpu(sb->s_start);
 
 	first_commit_ID = next_commit_ID;
@@ -783,6 +787,7 @@ static int do_one_pass(journal_t *journal,
 					brelse(bh);
 					break;
 				}
+pr_info("crc32_sum %x found_csum %x.\n", crc32_sum, found_chksum);
 
 				/* Neither checksum match nor unused? */
 				if (!((crc32_sum == found_chksum &&
@@ -800,22 +805,28 @@ static int do_one_pass(journal_t *journal,
 			if (pass == PASS_SCAN &&
 			    !jbd2_commit_block_csum_verify(journal,
 							   bh->b_data)) {
+pr_info("direct block verify error.\n");
 			chksum_error:
+pr_info("chksum_error.\n");
 				if (commit_time < last_trans_commit_time)
 					goto ignore_crc_mismatch;
+pr_info("error line 2.\n");
 				info->end_transaction = next_commit_ID;
 
 				if (!jbd2_has_feature_async_commit(journal)) {
+pr_info("error line 3.\n");
 					journal->j_failed_commit =
 						next_commit_ID;
 					brelse(bh);
 					break;
 				}
+pr_info("error line 4.\n");
 			}
 			if (pass == PASS_SCAN)
 				last_trans_commit_time = commit_time;
 			brelse(bh);
 			next_commit_ID++;
+pr_info("next_commit_id increased: %u.\n", (unsigned int) next_commit_ID);
 			continue;
 
 		case JBD2_REVOKE_BLOCK:
@@ -859,10 +870,12 @@ static int do_one_pass(journal_t *journal,
 	 * log.  If the latter happened, then we know that the "current"
 	 * transaction marks the end of the valid log.
 	 */
+pr_info("done, info->end_transaction is %ld.\n", (long)info->end_transaction);
 
 	if (pass == PASS_SCAN) {
 		if (!info->end_transaction)
 			info->end_transaction = next_commit_ID;
+pr_info("done with update, info->end_transaction is %ld.\n", (long)info->end_transaction);
 	} else {
 		/* It's really bad news if different passes end up at
 		 * different places (but possible due to IO errors). */
-- 
2.33.1