[PATCH 0/5] jbd2: fix several checkpoint inconsistent issues

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/5] jbd2: fix several checkpoint inconsistent issues
@ 2023-05-31 11:50 Zhang Yi
  2023-05-31 11:50 ` [PATCH 1/5] jbd2: recheck chechpointing non-dirty buffer Zhang Yi
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Zhang Yi @ 2023-05-31 11:50 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, yi.zhang, yi.zhang, yukuai3, chengzhihao1

From: Zhang Yi <yi.zhang@huawei.com>

Hello,

The first three patches are from [1] and are not changed, appending
another two (it depends on the first three) to fix another three race
issues in the checkpoint procedure which could also lead to inconsistent
results.

[1] https://lore.kernel.org/linux-ext4/20230516020226.2813588-1-yi.zhang@huaweicloud.com/

Thanks,
Yi.

Zhang Yi (4):
  jbd2: recheck chechpointing non-dirty buffer
  jbd2: remove t_checkpoint_io_list
  jbd2: remove released parameter in journal_shrink_one_cp_list()
  jbd2: fix a race when checking checkpoint buffer busy

Zhihao Cheng (1):
  jbd2: Fix wrongly judgement for buffer head removing while doing
    checkpoint

 fs/jbd2/checkpoint.c  | 186 +++++++++++-------------------------------
 fs/jbd2/commit.c      |   3 +-
 fs/jbd2/transaction.c |   4 +-
 include/linux/jbd2.h  |   9 +-
 4 files changed, 55 insertions(+), 147 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/5] jbd2: recheck chechpointing non-dirty buffer
  2023-05-31 11:50 [PATCH 0/5] jbd2: fix several checkpoint inconsistent issues Zhang Yi
@ 2023-05-31 11:50 ` Zhang Yi
  2023-05-31 11:50 ` [PATCH 2/5] jbd2: remove t_checkpoint_io_list Zhang Yi
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Zhang Yi @ 2023-05-31 11:50 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, yi.zhang, yi.zhang, yukuai3, chengzhihao1

From: Zhang Yi <yi.zhang@huawei.com>

There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.

jbd2_log_do_checkpoint() //transaction X
 //buffer A is dirty and not belones to any transaction
 __buffer_relink_io() //move it to the IO list
 __flush_batch()
  write_dirty_buffer()
                             do_get_write_access()
                             clear_buffer_dirty
                             __jbd2_journal_file_buffer()
                             //add buffer A to a new transaction Y
   lock_buffer(bh)
   //doesn't write out
 __jbd2_journal_remove_checkpoint()
 //finish checkpoint except buffer A
 //filesystem corrupt if the new transaction Y isn't fully write out.

Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.

Cc: stable@vger.kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/checkpoint.c | 102 ++++++++++++-------------------------------
 1 file changed, 29 insertions(+), 73 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
 	}
 }
 
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
-	transaction_t *transaction = jh->b_cp_transaction;
-
-	__buffer_unlink_first(jh);
-
-	if (!transaction->t_checkpoint_io_list) {
-		jh->b_cpnext = jh->b_cpprev = jh;
-	} else {
-		jh->b_cpnext = transaction->t_checkpoint_io_list;
-		jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
-		jh->b_cpprev->b_cpnext = jh;
-		jh->b_cpnext->b_cpprev = jh;
-	}
-	transaction->t_checkpoint_io_list = jh;
-}
-
 /*
  * Check a checkpoint buffer could be release or not.
  *
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
 		struct buffer_head *bh = journal->j_chkpt_bhs[i];
 		BUFFER_TRACE(bh, "brelse");
 		__brelse(bh);
+		journal->j_chkpt_bhs[i] = NULL;
 	}
 	*batch_count = 0;
 }
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 		jh = transaction->t_checkpoint_list;
 		bh = jh2bh(jh);
 
+		/*
+		 * The buffer may be writing back, or flushing out in the
+		 * last couple of cycles, or re-adding into a new transaction,
+		 * need to check it again until it's unlocked.
+		 */
 		if (buffer_locked(bh)) {
 			get_bh(bh);
 			spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 		}
 		if (!buffer_dirty(bh)) {
 			BUFFER_TRACE(bh, "remove from checkpoint");
-			if (__jbd2_journal_remove_checkpoint(jh))
-				/* The transaction was released; we're done */
+			/*
+			 * If the transaction was released or the checkpoint
+			 * list was empty, we're done.
+			 */
+			if (__jbd2_journal_remove_checkpoint(jh) ||
+			    !transaction->t_checkpoint_list)
 				goto out;
-			continue;
+		} else {
+			/*
+			 * We are about to write the buffer, it could be
+			 * raced by some other transaction shrink or buffer
+			 * re-log logic once we release the j_list_lock,
+			 * leave it on the checkpoint list and check status
+			 * again to make sure it's clean.
+			 */
+			BUFFER_TRACE(bh, "queue");
+			get_bh(bh);
+			J_ASSERT_BH(bh, !buffer_jwrite(bh));
+			journal->j_chkpt_bhs[batch_count++] = bh;
+			transaction->t_chp_stats.cs_written++;
+			transaction->t_checkpoint_list = jh->b_cpnext;
 		}
-		/*
-		 * Important: we are about to write the buffer, and
-		 * possibly block, while still holding the journal
-		 * lock.  We cannot afford to let the transaction
-		 * logic start messing around with this buffer before
-		 * we write it to disk, as that would break
-		 * recoverability.
-		 */
-		BUFFER_TRACE(bh, "queue");
-		get_bh(bh);
-		J_ASSERT_BH(bh, !buffer_jwrite(bh));
-		journal->j_chkpt_bhs[batch_count++] = bh;
-		__buffer_relink_io(jh);
-		transaction->t_chp_stats.cs_written++;
+
 		if ((batch_count == JBD2_NR_BATCH) ||
-		    need_resched() ||
-		    spin_needbreak(&journal->j_list_lock))
+		    need_resched() || spin_needbreak(&journal->j_list_lock) ||
+		    jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
 			goto unlock_and_flush;
 	}
 
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 			goto restart;
 	}
 
-	/*
-	 * Now we issued all of the transaction's buffers, let's deal
-	 * with the buffers that are out for I/O.
-	 */
-restart2:
-	/* Did somebody clean up the transaction in the meanwhile? */
-	if (journal->j_checkpoint_transactions != transaction ||
-	    transaction->t_tid != this_tid)
-		goto out;
-
-	while (transaction->t_checkpoint_io_list) {
-		jh = transaction->t_checkpoint_io_list;
-		bh = jh2bh(jh);
-		if (buffer_locked(bh)) {
-			get_bh(bh);
-			spin_unlock(&journal->j_list_lock);
-			wait_on_buffer(bh);
-			/* the journal_head may have gone by now */
-			BUFFER_TRACE(bh, "brelse");
-			__brelse(bh);
-			spin_lock(&journal->j_list_lock);
-			goto restart2;
-		}
-
-		/*
-		 * Now in whatever state the buffer currently is, we
-		 * know that it has been written out and so we can
-		 * drop it from the list
-		 */
-		if (__jbd2_journal_remove_checkpoint(jh))
-			break;
-	}
 out:
 	spin_unlock(&journal->j_list_lock);
 	result = jbd2_cleanup_journal_tail(journal);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/5] jbd2: remove t_checkpoint_io_list
  2023-05-31 11:50 [PATCH 0/5] jbd2: fix several checkpoint inconsistent issues Zhang Yi
  2023-05-31 11:50 ` [PATCH 1/5] jbd2: recheck chechpointing non-dirty buffer Zhang Yi
@ 2023-05-31 11:50 ` Zhang Yi
  2023-05-31 11:50 ` [PATCH 3/5] jbd2: remove released parameter in journal_shrink_one_cp_list() Zhang Yi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Zhang Yi @ 2023-05-31 11:50 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, yi.zhang, yi.zhang, yukuai3, chengzhihao1

From: Zhang Yi <yi.zhang@huawei.com>

Since t_checkpoint_io_list was stop using in jbd2_log_do_checkpoint()
now, it's time to remove the whole t_checkpoint_io_list logic.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/checkpoint.c | 42 ++----------------------------------------
 fs/jbd2/commit.c     |  3 +--
 include/linux/jbd2.h |  6 ------
 3 files changed, 3 insertions(+), 48 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 25e3c20eb19f..55d6efdbea64 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -27,7 +27,7 @@
  *
  * Called with j_list_lock held.
  */
-static inline void __buffer_unlink_first(struct journal_head *jh)
+static inline void __buffer_unlink(struct journal_head *jh)
 {
 	transaction_t *transaction = jh->b_cp_transaction;
 
@@ -40,23 +40,6 @@ static inline void __buffer_unlink_first(struct journal_head *jh)
 	}
 }
 
-/*
- * Unlink a buffer from a transaction checkpoint(io) list.
- *
- * Called with j_list_lock held.
- */
-static inline void __buffer_unlink(struct journal_head *jh)
-{
-	transaction_t *transaction = jh->b_cp_transaction;
-
-	__buffer_unlink_first(jh);
-	if (transaction->t_checkpoint_io_list == jh) {
-		transaction->t_checkpoint_io_list = jh->b_cpnext;
-		if (transaction->t_checkpoint_io_list == jh)
-			transaction->t_checkpoint_io_list = NULL;
-	}
-}
-
 /*
  * Check a checkpoint buffer could be release or not.
  *
@@ -503,15 +486,6 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal,
 			break;
 		if (need_resched() || spin_needbreak(&journal->j_list_lock))
 			break;
-		if (released)
-			continue;
-
-		nr_freed += journal_shrink_one_cp_list(transaction->t_checkpoint_io_list,
-						       nr_to_scan, &released);
-		if (*nr_to_scan == 0)
-			break;
-		if (need_resched() || spin_needbreak(&journal->j_list_lock))
-			break;
 	} while (transaction != last_transaction);
 
 	if (transaction != last_transaction) {
@@ -566,17 +540,6 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy)
 		 */
 		if (need_resched())
 			return;
-		if (ret)
-			continue;
-		/*
-		 * It is essential that we are as careful as in the case of
-		 * t_checkpoint_list with removing the buffer from the list as
-		 * we can possibly see not yet submitted buffers on io_list
-		 */
-		ret = journal_clean_one_cp_list(transaction->
-				t_checkpoint_io_list, destroy);
-		if (need_resched())
-			return;
 		/*
 		 * Stop scanning if we couldn't free the transaction. This
 		 * avoids pointless scanning of transactions which still
@@ -661,7 +624,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
 	jbd2_journal_put_journal_head(jh);
 
 	/* Is this transaction empty? */
-	if (transaction->t_checkpoint_list || transaction->t_checkpoint_io_list)
+	if (transaction->t_checkpoint_list)
 		return 0;
 
 	/*
@@ -753,7 +716,6 @@ void __jbd2_journal_drop_transaction(journal_t *journal, transaction_t *transact
 	J_ASSERT(transaction->t_forget == NULL);
 	J_ASSERT(transaction->t_shadow_list == NULL);
 	J_ASSERT(transaction->t_checkpoint_list == NULL);
-	J_ASSERT(transaction->t_checkpoint_io_list == NULL);
 	J_ASSERT(atomic_read(&transaction->t_updates) == 0);
 	J_ASSERT(journal->j_committing_transaction != transaction);
 	J_ASSERT(journal->j_running_transaction != transaction);
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index b33155dd7001..1073259902a6 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -1141,8 +1141,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	spin_lock(&journal->j_list_lock);
 	commit_transaction->t_state = T_FINISHED;
 	/* Check if the transaction can be dropped now that we are finished */
-	if (commit_transaction->t_checkpoint_list == NULL &&
-	    commit_transaction->t_checkpoint_io_list == NULL) {
+	if (commit_transaction->t_checkpoint_list == NULL) {
 		__jbd2_journal_drop_transaction(journal, commit_transaction);
 		jbd2_journal_free_transaction(commit_transaction);
 	}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index f619bae1dcc5..91a2cf4bc575 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -622,12 +622,6 @@ struct transaction_s
 	 */
 	struct journal_head	*t_checkpoint_list;
 
-	/*
-	 * Doubly-linked circular list of all buffers submitted for IO while
-	 * checkpointing. [j_list_lock]
-	 */
-	struct journal_head	*t_checkpoint_io_list;
-
 	/*
 	 * Doubly-linked circular list of metadata buffers being
 	 * shadowed by log IO.  The IO buffers on the iobuf list and
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/5] jbd2: remove released parameter in journal_shrink_one_cp_list()
  2023-05-31 11:50 [PATCH 0/5] jbd2: fix several checkpoint inconsistent issues Zhang Yi
  2023-05-31 11:50 ` [PATCH 1/5] jbd2: recheck chechpointing non-dirty buffer Zhang Yi
  2023-05-31 11:50 ` [PATCH 2/5] jbd2: remove t_checkpoint_io_list Zhang Yi
@ 2023-05-31 11:50 ` Zhang Yi
  2023-05-31 11:50 ` [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint Zhang Yi
  2023-05-31 11:51 ` [PATCH 5/5] jbd2: fix a race when checking checkpoint buffer busy Zhang Yi
  4 siblings, 0 replies; 11+ messages in thread
From: Zhang Yi @ 2023-05-31 11:50 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, yi.zhang, yi.zhang, yukuai3, chengzhihao1

From: Zhang Yi <yi.zhang@huawei.com>

After t_checkpoint_io_list is gone, the 'released' parameter in
journal_shrink_one_cp_list() becomes useless, just remove it.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/checkpoint.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 55d6efdbea64..3f52560912a9 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -391,15 +391,13 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
  * journal_shrink_one_cp_list
  *
  * Find 'nr_to_scan' written-back checkpoint buffers in the given list
- * and try to release them. If the whole transaction is released, set
- * the 'released' parameter. Return the number of released checkpointed
+ * and try to release them. Return the number of released checkpointed
  * buffers.
  *
  * Called with j_list_lock held.
  */
 static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
-						unsigned long *nr_to_scan,
-						bool *released)
+						unsigned long *nr_to_scan)
 {
 	struct journal_head *last_jh;
 	struct journal_head *next_jh = jh;
@@ -420,10 +418,8 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
 
 		nr_freed++;
 		ret = __jbd2_journal_remove_checkpoint(jh);
-		if (ret) {
-			*released = true;
+		if (ret)
 			break;
-		}
 
 		if (need_resched())
 			break;
@@ -445,7 +441,6 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal,
 						  unsigned long *nr_to_scan)
 {
 	transaction_t *transaction, *last_transaction, *next_transaction;
-	bool released;
 	tid_t first_tid = 0, last_tid = 0, next_tid = 0;
 	tid_t tid = 0;
 	unsigned long nr_freed = 0;
@@ -478,10 +473,9 @@ unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal,
 		transaction = next_transaction;
 		next_transaction = transaction->t_cpnext;
 		tid = transaction->t_tid;
-		released = false;
 
 		nr_freed += journal_shrink_one_cp_list(transaction->t_checkpoint_list,
-						       nr_to_scan, &released);
+						       nr_to_scan);
 		if (*nr_to_scan == 0)
 			break;
 		if (need_resched() || spin_needbreak(&journal->j_list_lock))
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
  2023-05-31 11:50 [PATCH 0/5] jbd2: fix several checkpoint inconsistent issues Zhang Yi
                   ` (2 preceding siblings ...)
  2023-05-31 11:50 ` [PATCH 3/5] jbd2: remove released parameter in journal_shrink_one_cp_list() Zhang Yi
@ 2023-05-31 11:50 ` Zhang Yi
  2023-06-01  9:41   ` Jan Kara
  2023-05-31 11:51 ` [PATCH 5/5] jbd2: fix a race when checking checkpoint buffer busy Zhang Yi
  4 siblings, 1 reply; 11+ messages in thread
From: Zhang Yi @ 2023-05-31 11:50 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, yi.zhang, yi.zhang, yukuai3, chengzhihao1

From: Zhihao Cheng <chengzhihao1@huawei.com>

Following process,

jbd2_journal_commit_transaction
// there are several dirty buffer heads in transaction->t_checkpoint_list
          P1                   wb_workfn
jbd2_log_do_checkpoint
 if (buffer_locked(bh)) // false
                            __block_write_full_page
                             trylock_buffer(bh)
                             test_clear_buffer_dirty(bh)
 if (!buffer_dirty(bh))
  __jbd2_journal_remove_checkpoint(jh)
   if (buffer_write_io_error(bh)) // false
                             >> bh IO error occurs <<
 jbd2_cleanup_journal_tail
  __jbd2_update_log_tail
   jbd2_write_superblock
   // The bh won't be replayed in next mount.
, which could corrupt the ext4 image, fetch a reproducer in [Link].

Since writeback process clears buffer dirty after locking buffer head,
we can fix it by checking buffer dirty firstly and then checking buffer
locked, the buffer head can be removed if it is neither dirty nor locked.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/jbd2/checkpoint.c | 48 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 3f52560912a9..620f3d345f3d 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -204,20 +204,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 		jh = transaction->t_checkpoint_list;
 		bh = jh2bh(jh);
 
-		/*
-		 * The buffer may be writing back, or flushing out in the
-		 * last couple of cycles, or re-adding into a new transaction,
-		 * need to check it again until it's unlocked.
-		 */
-		if (buffer_locked(bh)) {
-			get_bh(bh);
-			spin_unlock(&journal->j_list_lock);
-			wait_on_buffer(bh);
-			/* the journal_head may have gone by now */
-			BUFFER_TRACE(bh, "brelse");
-			__brelse(bh);
-			goto retry;
-		}
 		if (jh->b_transaction != NULL) {
 			transaction_t *t = jh->b_transaction;
 			tid_t tid = t->t_tid;
@@ -252,16 +238,7 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 			spin_lock(&journal->j_list_lock);
 			goto restart;
 		}
-		if (!buffer_dirty(bh)) {
-			BUFFER_TRACE(bh, "remove from checkpoint");
-			/*
-			 * If the transaction was released or the checkpoint
-			 * list was empty, we're done.
-			 */
-			if (__jbd2_journal_remove_checkpoint(jh) ||
-			    !transaction->t_checkpoint_list)
-				goto out;
-		} else {
+		if (buffer_dirty(bh)) {
 			/*
 			 * We are about to write the buffer, it could be
 			 * raced by some other transaction shrink or buffer
@@ -275,6 +252,29 @@ int jbd2_log_do_checkpoint(journal_t *journal)
 			journal->j_chkpt_bhs[batch_count++] = bh;
 			transaction->t_chp_stats.cs_written++;
 			transaction->t_checkpoint_list = jh->b_cpnext;
+		} else if (buffer_locked(bh)) {
+			/*
+			 * The buffer may be writing back, or flushing out
+			 * in the last couple of cycles, or re-adding into
+			 * a new transaction, need to check it again until
+			 * it's unlocked.
+			 */
+			get_bh(bh);
+			spin_unlock(&journal->j_list_lock);
+			wait_on_buffer(bh);
+			/* the journal_head may have gone by now */
+			BUFFER_TRACE(bh, "brelse");
+			__brelse(bh);
+			goto retry;
+		} else {
+			BUFFER_TRACE(bh, "remove from checkpoint");
+			/*
+			 * If the transaction was released or the checkpoint
+			 * list was empty, we're done.
+			 */
+			if (__jbd2_journal_remove_checkpoint(jh) ||
+			    !transaction->t_checkpoint_list)
+				goto out;
 		}
 
 		if ((batch_count == JBD2_NR_BATCH) ||
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/5] jbd2: fix a race when checking checkpoint buffer busy
  2023-05-31 11:50 [PATCH 0/5] jbd2: fix several checkpoint inconsistent issues Zhang Yi
                   ` (3 preceding siblings ...)
  2023-05-31 11:50 ` [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint Zhang Yi
@ 2023-05-31 11:51 ` Zhang Yi
  4 siblings, 0 replies; 11+ messages in thread
From: Zhang Yi @ 2023-05-31 11:51 UTC (permalink / raw)
  To: linux-ext4
  Cc: tytso, adilger.kernel, jack, yi.zhang, yi.zhang, yukuai3, chengzhihao1

From: Zhang Yi <yi.zhang@huawei.com>

Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem, so fix them by introduce a new helper to check the busy
state atomically.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/jbd2/checkpoint.c  | 8 ++++----
 fs/jbd2/transaction.c | 4 ++--
 include/linux/jbd2.h  | 3 +++
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 620f3d345f3d..2dde5fd1f0dd 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -45,11 +45,11 @@ static inline void __buffer_unlink(struct journal_head *jh)
  *
  * Requires j_list_lock
  */
-static inline bool __cp_buffer_busy(struct journal_head *jh)
+static inline bool cp_buffer_busy(struct journal_head *jh)
 {
 	struct buffer_head *bh = jh2bh(jh);
 
-	return (jh->b_transaction || buffer_locked(bh) || buffer_dirty(bh));
+	return (jh->b_transaction || __cp_buffer_busy(bh));
 }
 
 /*
@@ -369,7 +369,7 @@ static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
 		jh = next_jh;
 		next_jh = jh->b_cpnext;
 
-		if (!destroy && __cp_buffer_busy(jh))
+		if (!destroy && cp_buffer_busy(jh))
 			return 0;
 
 		if (__jbd2_journal_remove_checkpoint(jh))
@@ -413,7 +413,7 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
 		next_jh = jh->b_cpnext;
 
 		(*nr_to_scan)--;
-		if (__cp_buffer_busy(jh))
+		if (cp_buffer_busy(jh))
 			continue;
 
 		nr_freed++;
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..04863787c93e 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,7 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
 		 * Otherwise, if the buffer has been written to disk,
 		 * it is safe to remove the checkpoint and drop it.
 		 */
-		if (!buffer_dirty(bh)) {
+		if (!__cp_buffer_busy(bh)) {
 			__jbd2_journal_remove_checkpoint(jh);
 			spin_unlock(&journal->j_list_lock);
 			goto drop;
@@ -2112,7 +2112,7 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
 
 	jh = bh2jh(bh);
 
-	if (buffer_locked(bh) || buffer_dirty(bh))
+	if (__cp_buffer_busy(bh))
 		goto out;
 
 	if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 91a2cf4bc575..b17d1efab787 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1440,6 +1440,9 @@ void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block);
 extern void jbd2_journal_commit_transaction(journal_t *);
 
 /* Checkpoint list management */
+#define __cp_buffer_busy(bh) \
+	((bh)->b_state & ((1ul << BH_Dirty) | (1ul << BH_Lock)))
+
 void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
 unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
 int __jbd2_journal_remove_checkpoint(struct journal_head *);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
  2023-05-31 11:50 ` [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint Zhang Yi
@ 2023-06-01  9:41   ` Jan Kara
  2023-06-01 13:44     ` Zhihao Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2023-06-01  9:41 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, tytso, adilger.kernel, jack, yi.zhang, yukuai3, chengzhihao1

On Wed 31-05-23 19:50:59, Zhang Yi wrote:
> From: Zhihao Cheng <chengzhihao1@huawei.com>
> 
> Following process,
> 
> jbd2_journal_commit_transaction
> // there are several dirty buffer heads in transaction->t_checkpoint_list
>           P1                   wb_workfn
> jbd2_log_do_checkpoint
>  if (buffer_locked(bh)) // false
>                             __block_write_full_page
>                              trylock_buffer(bh)
>                              test_clear_buffer_dirty(bh)
>  if (!buffer_dirty(bh))
>   __jbd2_journal_remove_checkpoint(jh)
>    if (buffer_write_io_error(bh)) // false
>                              >> bh IO error occurs <<
>  jbd2_cleanup_journal_tail
>   __jbd2_update_log_tail
>    jbd2_write_superblock
>    // The bh won't be replayed in next mount.
> , which could corrupt the ext4 image, fetch a reproducer in [Link].
> 
> Since writeback process clears buffer dirty after locking buffer head,
> we can fix it by checking buffer dirty firstly and then checking buffer
> locked, the buffer head can be removed if it is neither dirty nor locked.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

OK, the analysis is correct but I'm afraid the fix won't be that easy.  The
reordering of tests you did below doesn't really help because CPU or the
compiler are free to order the loads (and stores) in whatever way they
wish. You'd have to use memory barriers when reading and modifying bh flags
(although the modification side is implicitely handled by the bitlock
code) to make this work reliably. But that is IMHO too subtle for this
code.

What we should be doing to avoid these races is to lock the bh. So
something like:

	if (jh->b_transaction != NULL) {
		do stuff
	}
	if (!trylock_buffer(bh)) {
		buffer_locked() branch
	}
	... Now we have the buffer locked and can safely check for dirtyness

And we need to do a similar treatment for journal_clean_one_cp_list() and
journal_shrink_one_cp_list().

BTW, I think we could merge journal_clean_one_cp_list() and
journal_shrink_one_cp_list() into a single common function. I think we can
drop the nr_to_scan argument and just always cleanup the whole checkpoint
list and return the number of freed buffers. That way we have one less
function to deal with checkpoint list cleaning.

Thinking about it some more maybe we can have a function like:

int jbd2_try_remove_checkpoint(struct journal_head *jh)
{
	struct buffer_head *bh = jh2bh(jh);

	if (!trylock_buffer(bh) || buffer_dirty(bh))
		return -EBUSY;
	/*
	 * Buffer is clean and the IO has finished (we hold the buffer lock) so
	 * the checkpoint is done. We can safely remove the buffer from this
	 * transaction.
	 */
	unlock_buffer(bh);
	return __jbd2_journal_remove_checkpoint(jh);
}

and that can be used with a bit of care in the checkpointing functions as
well as in jbd2_journal_forget(), __journal_try_to_free_buffer(),
journal_unmap_buffer().

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
  2023-06-01  9:41   ` Jan Kara
@ 2023-06-01 13:44     ` Zhihao Cheng
  2023-06-01 14:20       ` Zhang Yi
  0 siblings, 1 reply; 11+ messages in thread
From: Zhihao Cheng @ 2023-06-01 13:44 UTC (permalink / raw)
  To: Jan Kara, Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, yi.zhang, yukuai3

在 2023/6/1 17:41, Jan Kara 写道:

Hi, Jan
> On Wed 31-05-23 19:50:59, Zhang Yi wrote:
>> From: Zhihao Cheng <chengzhihao1@huawei.com>
>>
>> Following process,
>>
>> jbd2_journal_commit_transaction
>> // there are several dirty buffer heads in transaction->t_checkpoint_list
>>            P1                   wb_workfn
>> jbd2_log_do_checkpoint
>>   if (buffer_locked(bh)) // false
>>                              __block_write_full_page
>>                               trylock_buffer(bh)
>>                               test_clear_buffer_dirty(bh)
>>   if (!buffer_dirty(bh))
>>    __jbd2_journal_remove_checkpoint(jh)
>>     if (buffer_write_io_error(bh)) // false
>>                               >> bh IO error occurs <<
>>   jbd2_cleanup_journal_tail
>>    __jbd2_update_log_tail
>>     jbd2_write_superblock
>>     // The bh won't be replayed in next mount.
>> , which could corrupt the ext4 image, fetch a reproducer in [Link].
>>
>> Since writeback process clears buffer dirty after locking buffer head,
>> we can fix it by checking buffer dirty firstly and then checking buffer
>> locked, the buffer head can be removed if it is neither dirty nor locked.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
>> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
>> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> OK, the analysis is correct but I'm afraid the fix won't be that easy.  The
> reordering of tests you did below doesn't really help because CPU or the
> compiler are free to order the loads (and stores) in whatever way they
> wish. You'd have to use memory barriers when reading and modifying bh flags
> (although the modification side is implicitely handled by the bitlock
> code) to make this work reliably. But that is IMHO too subtle for this
> code.
> 

Do you mean there might be a sequence like following:

jbd2_log_do_checkpoint
  if (buffer_dirty(bh))
  else if (buffer_locked(bh))
  else
    __jbd2_journal_remove_checkpoint(jh)

CPU re-arranges the order of getting buffer state.
reg_1 = buffer_locked(bh)  // false
                            lock_buffer(bh)
                            clear_buffer(bh)
reg_2 = buffer_dirty(bh)   // false

Then, jbd2_log_do_checkpoint() could become:
if (reg_2)
else if (reg_1)
else
   __jbd2_journal_remove_checkpoint(jh)  // enter !

Am I understanding right?

> What we should be doing to avoid these races is to lock the bh. So
> something like:
> 
> 	if (jh->b_transaction != NULL) {
> 		do stuff
> 	}
> 	if (!trylock_buffer(bh)) {
> 		buffer_locked() branch
> 	}
> 	... Now we have the buffer locked and can safely check for dirtyness
> 
> And we need to do a similar treatment for journal_clean_one_cp_list() and
> journal_shrink_one_cp_list().
> 
> BTW, I think we could merge journal_clean_one_cp_list() and
> journal_shrink_one_cp_list() into a single common function. I think we can
> drop the nr_to_scan argument and just always cleanup the whole checkpoint
> list and return the number of freed buffers. That way we have one less
> function to deal with checkpoint list cleaning.
> 
> Thinking about it some more maybe we can have a function like:
> 
> int jbd2_try_remove_checkpoint(struct journal_head *jh)
> {
> 	struct buffer_head *bh = jh2bh(jh);
> 
> 	if (!trylock_buffer(bh) || buffer_dirty(bh))
> 		return -EBUSY;
> 	/*
> 	 * Buffer is clean and the IO has finished (we hold the buffer lock) so
> 	 * the checkpoint is done. We can safely remove the buffer from this
> 	 * transaction.
> 	 */
> 	unlock_buffer(bh);
> 	return __jbd2_journal_remove_checkpoint(jh);
> }
> 
> and that can be used with a bit of care in the checkpointing functions as
> well as in jbd2_journal_forget(), __journal_try_to_free_buffer(),
> journal_unmap_buffer().
> 
> 								Honza
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
  2023-06-01 13:44     ` Zhihao Cheng
@ 2023-06-01 14:20       ` Zhang Yi
  2023-06-01 16:31         ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Zhang Yi @ 2023-06-01 14:20 UTC (permalink / raw)
  To: Zhihao Cheng, Jan Kara
  Cc: linux-ext4, tytso, adilger.kernel, yi.zhang, yukuai3

On 2023/6/1 21:44, Zhihao Cheng wrote:
> 在 2023/6/1 17:41, Jan Kara 写道:
> 
> Hi, Jan
>> On Wed 31-05-23 19:50:59, Zhang Yi wrote:
>>> From: Zhihao Cheng <chengzhihao1@huawei.com>
>>>
>>> Following process,
>>>
>>> jbd2_journal_commit_transaction
>>> // there are several dirty buffer heads in transaction->t_checkpoint_list
>>>            P1                   wb_workfn
>>> jbd2_log_do_checkpoint
>>>   if (buffer_locked(bh)) // false
>>>                              __block_write_full_page
>>>                               trylock_buffer(bh)
>>>                               test_clear_buffer_dirty(bh)
>>>   if (!buffer_dirty(bh))
>>>    __jbd2_journal_remove_checkpoint(jh)
>>>     if (buffer_write_io_error(bh)) // false
>>>                               >> bh IO error occurs <<
>>>   jbd2_cleanup_journal_tail
>>>    __jbd2_update_log_tail
>>>     jbd2_write_superblock
>>>     // The bh won't be replayed in next mount.
>>> , which could corrupt the ext4 image, fetch a reproducer in [Link].
>>>
>>> Since writeback process clears buffer dirty after locking buffer head,
>>> we can fix it by checking buffer dirty firstly and then checking buffer
>>> locked, the buffer head can be removed if it is neither dirty nor locked.
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
>>> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
>>> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>
>> OK, the analysis is correct but I'm afraid the fix won't be that easy.  The
>> reordering of tests you did below doesn't really help because CPU or the
>> compiler are free to order the loads (and stores) in whatever way they
>> wish. You'd have to use memory barriers when reading and modifying bh flags
>> (although the modification side is implicitely handled by the bitlock
>> code) to make this work reliably. But that is IMHO too subtle for this
>> code.
>>
> 

I write two litmus-test files in tools/memory-model to check the memory module
of these two cases as jbd2_log_do_checkpoint() and __cp_buffer_busy() does.

**1. test-ww-rr.litmus**  //simulate our changes in jbd2_log_do_checkpoint()

  C WW-RR

  (*
   * Result: Never
   *
   * This test shows that write-write ordering
   * (in P0()) is visible to external process P2().
   *
   * bit 0: lock
   * bit 1: dirty
   *)

  {
          x=2;
  }

  P0(int *x)
  {
          int r1;

          r1 = READ_ONCE(*x);
          r1 = r1 | 1;  //lock
          WRITE_ONCE(*x, r1);

          r1 = READ_ONCE(*x);
          r1 = r1 & 253;  //&~2, clear dirty
          WRITE_ONCE(*x, r1);
  }

  P1(int *x)
  {
          int r0;
          int r1;

          r0 = READ_ONCE(*x);
          r1 = READ_ONCE(*x);
  }

  exists (1:r1=2 /\ 1:r0=1)

The test results are:

  $ herd7 -conf linux-kernel.cfg litmus-tests/test-ww-rr.litmus
  Test WW-RR Allowed
  States 6
  1:r0=1; 1:r1=1;
  1:r0=2; 1:r1=1;
  1:r0=2; 1:r1=2;
  1:r0=2; 1:r1=3;
  1:r0=3; 1:r1=1;
  1:r0=3; 1:r1=3;
  No
  Witnesses
  Positive: 0 Negative: 6
  Condition exists (1:r1=2 /\ 1:r0=1)
  Observation WW-RR Never 0 6
  Time WW-RR 0.02
  Hash=d91ecee2379f8ac878b8d06f17967874

**2. test-ww-r.litmus**  //simulate our changes in __cp_buffer_busy()

  C WW-R

  (*
   * Result: Never
   *
   * This test shows that write-write ordering
   * (in P0()) is visible to external process P2().
   *
   * bit 0: lock
   * bit 1: dirty
   *)

  {
          x=2;
  }

  P0(int *x)
  {
          int r1;

          r1 = READ_ONCE(*x);
          r1 = r1 | 1;  //lock
          WRITE_ONCE(*x, r1);

          r1 = READ_ONCE(*x);
          r1 = r1 & 253;  //&~2, clear dirty
          WRITE_ONCE(*x, r1);
  }

  P1(int *x)
  {
          int r0;

          r0 = READ_ONCE(*x);
  }

  exists (1:r0=0)

The test results are:

  $ herd7 -conf linux-kernel.cfg litmus-tests/test-ww-r.litmus
  Test WW-R Allowed
  States 3
  1:r0=1;
  1:r0=2;
  1:r0=3;
  No
  Witnesses
  Positive: 0 Negative: 3
  Condition exists (1:r0=0)
  Observation WW-R Never 0 3
  Time WW-R 0.01
  Hash=d76bb39c367dc0cbd0c363a87c6c9eb7

So it looks like the out-of-order situation cannot happen, am I miss something?

Thanks,
Yi.

> Do you mean there might be a sequence like following:
> 
> jbd2_log_do_checkpoint
>  if (buffer_dirty(bh))
>  else if (buffer_locked(bh))
>  else
>    __jbd2_journal_remove_checkpoint(jh)
> 
> CPU re-arranges the order of getting buffer state.
> reg_1 = buffer_locked(bh)  // false
>                            lock_buffer(bh)
>                            clear_buffer(bh)
> reg_2 = buffer_dirty(bh)   // false
> 
> Then, jbd2_log_do_checkpoint() could become:
> if (reg_2)
> else if (reg_1)
> else
>   __jbd2_journal_remove_checkpoint(jh)  // enter !
> 
> Am I understanding right?
> 
>> What we should be doing to avoid these races is to lock the bh. So
>> something like:
>>
>>     if (jh->b_transaction != NULL) {
>>         do stuff
>>     }
>>     if (!trylock_buffer(bh)) {
>>         buffer_locked() branch
>>     }
>>     ... Now we have the buffer locked and can safely check for dirtyness
>>
>> And we need to do a similar treatment for journal_clean_one_cp_list() and
>> journal_shrink_one_cp_list().
>>
>> BTW, I think we could merge journal_clean_one_cp_list() and
>> journal_shrink_one_cp_list() into a single common function. I think we can
>> drop the nr_to_scan argument and just always cleanup the whole checkpoint
>> list and return the number of freed buffers. That way we have one less
>> function to deal with checkpoint list cleaning.
>>
>> Thinking about it some more maybe we can have a function like:
>>
>> int jbd2_try_remove_checkpoint(struct journal_head *jh)
>> {
>>     struct buffer_head *bh = jh2bh(jh);
>>
>>     if (!trylock_buffer(bh) || buffer_dirty(bh))
>>         return -EBUSY;
>>     /*
>>      * Buffer is clean and the IO has finished (we hold the buffer lock) so
>>      * the checkpoint is done. We can safely remove the buffer from this
>>      * transaction.
>>      */
>>     unlock_buffer(bh);
>>     return __jbd2_journal_remove_checkpoint(jh);
>> }
>>
>> and that can be used with a bit of care in the checkpointing functions as
>> well as in jbd2_journal_forget(), __journal_try_to_free_buffer(),
>> journal_unmap_buffer().
>>
>>                                 Honza
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
  2023-06-01 14:20       ` Zhang Yi
@ 2023-06-01 16:31         ` Jan Kara
  2023-06-02  1:52           ` Zhihao Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2023-06-01 16:31 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Zhihao Cheng, Jan Kara, linux-ext4, tytso, adilger.kernel,
	yi.zhang, yukuai3

On Thu 01-06-23 22:20:38, Zhang Yi wrote:
> On 2023/6/1 21:44, Zhihao Cheng wrote:
> > 在 2023/6/1 17:41, Jan Kara 写道:
> > 
> > Hi, Jan
> >> On Wed 31-05-23 19:50:59, Zhang Yi wrote:
> >>> From: Zhihao Cheng <chengzhihao1@huawei.com>
> >>>
> >>> Following process,
> >>>
> >>> jbd2_journal_commit_transaction
> >>> // there are several dirty buffer heads in transaction->t_checkpoint_list
> >>>            P1                   wb_workfn
> >>> jbd2_log_do_checkpoint
> >>>   if (buffer_locked(bh)) // false
> >>>                              __block_write_full_page
> >>>                               trylock_buffer(bh)
> >>>                               test_clear_buffer_dirty(bh)
> >>>   if (!buffer_dirty(bh))
> >>>    __jbd2_journal_remove_checkpoint(jh)
> >>>     if (buffer_write_io_error(bh)) // false
> >>>                               >> bh IO error occurs <<
> >>>   jbd2_cleanup_journal_tail
> >>>    __jbd2_update_log_tail
> >>>     jbd2_write_superblock
> >>>     // The bh won't be replayed in next mount.
> >>> , which could corrupt the ext4 image, fetch a reproducer in [Link].
> >>>
> >>> Since writeback process clears buffer dirty after locking buffer head,
> >>> we can fix it by checking buffer dirty firstly and then checking buffer
> >>> locked, the buffer head can be removed if it is neither dirty nor locked.
> >>>
> >>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
> >>> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
> >>> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> >>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> OK, the analysis is correct but I'm afraid the fix won't be that easy.  The
> >> reordering of tests you did below doesn't really help because CPU or the
> >> compiler are free to order the loads (and stores) in whatever way they
> >> wish. You'd have to use memory barriers when reading and modifying bh flags
> >> (although the modification side is implicitely handled by the bitlock
> >> code) to make this work reliably. But that is IMHO too subtle for this
> >> code.
> >>
> > 
> 
> I write two litmus-test files in tools/memory-model to check the memory module
> of these two cases as jbd2_log_do_checkpoint() and __cp_buffer_busy() does.

<snip litmus tests>

> So it looks like the out-of-order situation cannot happen, am I miss something?

After thinking about it a bit, indeed CPU cannot reorder the two loads
because they are from the same location in memory. Thanks for correcting me
on this. I'm not sure whether a C compiler still could not reorder the
tests - technically I suspect the C standard does not forbid this although
it would have to be really evil compiler to do this.

But still I think with the helper function I've suggested things are much
more obviously correct.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
  2023-06-01 16:31         ` Jan Kara
@ 2023-06-02  1:52           ` Zhihao Cheng
  0 siblings, 0 replies; 11+ messages in thread
From: Zhihao Cheng @ 2023-06-02  1:52 UTC (permalink / raw)
  To: Jan Kara, Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, yi.zhang, yukuai3

在 2023/6/2 0:31, Jan Kara 写道:
> On Thu 01-06-23 22:20:38, Zhang Yi wrote:
>> On 2023/6/1 21:44, Zhihao Cheng wrote:
>>> 在 2023/6/1 17:41, Jan Kara 写道:
>>>
>>> Hi, Jan
>>>> On Wed 31-05-23 19:50:59, Zhang Yi wrote:
>>>>> From: Zhihao Cheng <chengzhihao1@huawei.com>
>>>>>
>>>>> Following process,
>>>>>
>>>>> jbd2_journal_commit_transaction
>>>>> // there are several dirty buffer heads in transaction->t_checkpoint_list
>>>>>             P1                   wb_workfn
>>>>> jbd2_log_do_checkpoint
>>>>>    if (buffer_locked(bh)) // false
>>>>>                               __block_write_full_page
>>>>>                                trylock_buffer(bh)
>>>>>                                test_clear_buffer_dirty(bh)
>>>>>    if (!buffer_dirty(bh))
>>>>>     __jbd2_journal_remove_checkpoint(jh)
>>>>>      if (buffer_write_io_error(bh)) // false
>>>>>                                >> bh IO error occurs <<
>>>>>    jbd2_cleanup_journal_tail
>>>>>     __jbd2_update_log_tail
>>>>>      jbd2_write_superblock
>>>>>      // The bh won't be replayed in next mount.
>>>>> , which could corrupt the ext4 image, fetch a reproducer in [Link].
>>>>>
>>>>> Since writeback process clears buffer dirty after locking buffer head,
>>>>> we can fix it by checking buffer dirty firstly and then checking buffer
>>>>> locked, the buffer head can be removed if it is neither dirty nor locked.
>>>>>
>>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
>>>>> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
>>>>> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
>>>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>>>>
>>>> OK, the analysis is correct but I'm afraid the fix won't be that easy.  The
>>>> reordering of tests you did below doesn't really help because CPU or the
>>>> compiler are free to order the loads (and stores) in whatever way they
>>>> wish. You'd have to use memory barriers when reading and modifying bh flags
>>>> (although the modification side is implicitely handled by the bitlock
>>>> code) to make this work reliably. But that is IMHO too subtle for this
>>>> code.
>>>>
>>>
>>
>> I write two litmus-test files in tools/memory-model to check the memory module
>> of these two cases as jbd2_log_do_checkpoint() and __cp_buffer_busy() does.
> 
> <snip litmus tests>
> 
>> So it looks like the out-of-order situation cannot happen, am I miss something?
> 
> After thinking about it a bit, indeed CPU cannot reorder the two loads
> because they are from the same location in memory. Thanks for correcting me
> on this. I'm not sure whether a C compiler still could not reorder the
> tests - technically I suspect the C standard does not forbid this although
> it would have to be really evil compiler to do this.
> 
> But still I think with the helper function I've suggested things are much
> more obviously correct.

Thanks for suggestions, we will modify it in next iteration.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-06-02  1:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-31 11:50 [PATCH 0/5] jbd2: fix several checkpoint inconsistent issues Zhang Yi
2023-05-31 11:50 ` [PATCH 1/5] jbd2: recheck chechpointing non-dirty buffer Zhang Yi
2023-05-31 11:50 ` [PATCH 2/5] jbd2: remove t_checkpoint_io_list Zhang Yi
2023-05-31 11:50 ` [PATCH 3/5] jbd2: remove released parameter in journal_shrink_one_cp_list() Zhang Yi
2023-05-31 11:50 ` [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint Zhang Yi
2023-06-01  9:41   ` Jan Kara
2023-06-01 13:44     ` Zhihao Cheng
2023-06-01 14:20       ` Zhang Yi
2023-06-01 16:31         ` Jan Kara
2023-06-02  1:52           ` Zhihao Cheng
2023-05-31 11:51 ` [PATCH 5/5] jbd2: fix a race when checking checkpoint buffer busy Zhang Yi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).