All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: tytso@mit.edu, jack@suse.com, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, yi.zhang@huawei.com,
	libaokun1@huawei.com, zhanchengbin1@huawei.com
Subject: Re: [PATCH v2] jbd2: Fix data missing when reusing bh which is ready to be checkpointed
Date: Fri, 6 Jan 2023 15:22:55 +0100	[thread overview]
Message-ID: <20230106142255.fqnzgw5tqr77mdzj@quack3> (raw)
In-Reply-To: <20230106115603.2624644-1-chengzhihao1@huawei.com>

On Fri 06-01-23 19:56:03, Zhihao Cheng wrote:
> Following process will make data lost and could lead to a filesystem
> corrupted problem:
> 
> 1. jh(bh) is inserted into T1->t_checkpoint_list, bh is dirty, and
>    jh->b_transaction = NULL
> 2. T1 is added into journal->j_checkpoint_transactions.
> 3. Get bh prepare to write while doing checkpoing:
>            PA				    PB
>    do_get_write_access             jbd2_log_do_checkpoint
>     spin_lock(&jh->b_state_lock)
>      if (buffer_dirty(bh))
>       clear_buffer_dirty(bh)   // clear buffer dirty
>        set_buffer_jbddirty(bh)
> 				    transaction =
> 				    journal->j_checkpoint_transactions
> 				    jh = transaction->t_checkpoint_list
> 				    if (!buffer_dirty(bh))
> 		                      __jbd2_journal_remove_checkpoint(jh)
> 				      // bh won't be flushed
> 		                    jbd2_cleanup_journal_tail
>     __jbd2_journal_file_buffer(jh, transaction, BJ_Reserved)
> 4. Aborting journal/Power-cut before writing latest bh on journal area.
> 
> In this way we get a corrupted filesystem with bh's data lost.
> 
> Fix it by moving the clearing of buffer_dirty bit just before the call
> to __jbd2_journal_file_buffer(), both bit clearing and jh->b_transaction
> assignment are under journal->j_list_lock locked, so that
> jbd2_log_do_checkpoint() will wait until jh's new transaction fininshed
> even bh is currently not dirty. And journal_shrink_one_cp_list() won't
> remove jh from checkpoint list if the buffer head is reused in
> do_get_write_access().
> 
> Cc: <stable@kernel.org>
> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com>
> Suggested-by: Jan Kara <jack@suse.cz>

Thanks for the patch! It looks good, some suggestions for making it a bit
more tidy below:

> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 6a404ac1c178..06a5e7961ef2 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1010,36 +1010,37 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
>  	 * ie. locked but not dirty) or tune2fs (which may actually have
>  	 * the buffer dirtied, ugh.)  */
>  
> -	if (buffer_dirty(bh)) {
> +	if (buffer_dirty(bh) && jh->b_transaction) {
>  		/*
>  		 * First question: is this buffer already part of the current
>  		 * transaction or the existing committing transaction?
>  		 */
> -		if (jh->b_transaction) {
> -			J_ASSERT_JH(jh,
> -				jh->b_transaction == transaction ||
> -				jh->b_transaction ==
> -					journal->j_committing_transaction);
> -			if (jh->b_next_transaction)
> -				J_ASSERT_JH(jh, jh->b_next_transaction ==
> -							transaction);
> -			warn_dirty_buffer(bh);
> -		}
> +		J_ASSERT_JH(jh, jh->b_transaction == transaction ||
> +			jh->b_transaction == journal->j_committing_transaction);
> +		if (jh->b_next_transaction)
> +			J_ASSERT_JH(jh, jh->b_next_transaction == transaction);
> +		warn_dirty_buffer(bh);
>  		/*
> -		 * In any case we need to clean the dirty flag and we must
> -		 * do it under the buffer lock to be sure we don't race
> -		 * with running write-out.
> +		 * We need to clean the dirty flag and we must do it under the
> +		 * buffer lock to be sure we don't race with running write-out.
>  		 */
>  		JBUFFER_TRACE(jh, "Journalling dirty buffer");
>  		clear_buffer_dirty(bh);
> +		/*
> +		 * Setting jbddirty after clearing buffer dirty is necessary.
> +		 * Function jbd2_journal_restart() could keep buffer on
> +		 * BJ_Reserved list until the transaction committing, then the
> +		 * buffer won't be dirtied by jbd2_journal_refile_buffer()
> +		 * after committing, the buffer couldn't fall on disk even
> +		 * last checkpoint finished, which may corrupt filesystem.
> +		 */
>  		set_buffer_jbddirty(bh);
>  	}

So I think the sequence:

	if (buffer_dirty(bh)) {
		warn_dirty_buffer(bh);
		JBUFFER_TRACE(jh, "Journalling dirty buffer");
		clear_buffer_dirty(bh);
		set_buffer_jbddirty(bh);
	}

can be moved into the branch

  	if (jh->b_transaction == transaction ||
	    jh->b_next_transaction == transaction) {

below. That way you can drop the assertions as well because they happen
later in do_get_write_access() again anyway.

Also I don't quite understand the new comment you have added. Do you mean
we need to not only clear BH_Dirty bit but also set BH_JBDdirty as dirtying
(through jbd2_journal_dirty_metadata()) does not have to follow after
do_get_write_access()?

Otherwise the patch looks good.

								Honza
>  
> -	unlock_buffer(bh);
> -
>  	error = -EROFS;
>  	if (is_handle_aborted(handle)) {
>  		spin_unlock(&jh->b_state_lock);
> +		unlock_buffer(bh);
>  		goto out;
>  	}
>  	error = 0;
> @@ -1049,8 +1050,10 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
>  	 * b_next_transaction points to it
>  	 */
>  	if (jh->b_transaction == transaction ||
> -	    jh->b_next_transaction == transaction)
> +	    jh->b_next_transaction == transaction) {
> +		unlock_buffer(bh);
>  		goto done;
> +	}
>  
>  	/*
>  	 * this is the first time this transaction is touching this buffer,
> @@ -1074,10 +1077,24 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
>  		 */
>  		smp_wmb();
>  		spin_lock(&journal->j_list_lock);
> +		if (test_clear_buffer_dirty(bh)) {
> +			/*
> +			 * Execute buffer dirty clearing and jh->b_transaction
> +			 * assignment under journal->j_list_lock locked to
> +			 * prevent bh being removed from checkpoint list if
> +			 * the buffer is in an intermediate state (not dirty
> +			 * and jh->b_transaction is NULL).
> +			 */
> +			JBUFFER_TRACE(jh, "Journalling dirty buffer");
> +			set_buffer_jbddirty(bh);
> +		}
>  		__jbd2_journal_file_buffer(jh, transaction, BJ_Reserved);
>  		spin_unlock(&journal->j_list_lock);
> +		unlock_buffer(bh);
>  		goto done;
>  	}
> +	unlock_buffer(bh);
> +
>  	/*
>  	 * If there is already a copy-out version of this buffer, then we don't
>  	 * need to make another one
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2023-01-06 14:23 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06 11:56 [PATCH v2] jbd2: Fix data missing when reusing bh which is ready to be checkpointed Zhihao Cheng
2023-01-06 14:22 ` Jan Kara [this message]
2023-01-07  9:16   ` Zhihao Cheng
2023-01-09 11:20     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230106142255.fqnzgw5tqr77mdzj@quack3 \
    --to=jack@suse.cz \
    --cc=chengzhihao1@huawei.com \
    --cc=jack@suse.com \
    --cc=libaokun1@huawei.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=yi.zhang@huawei.com \
    --cc=zhanchengbin1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.