From: Jan Kara <jack@suse.cz>
To: Zhang Yi <yi.zhang@huawei.com>
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
yukuai3@huawei.com
Subject: Re: [RFC PATCH v2 2/7] jbd2: ensure abort the journal if detect IO error when writing original buffer back
Date: Wed, 21 Apr 2021 15:20:14 +0200 [thread overview]
Message-ID: <20210421132014.GR8706@quack2.suse.cz> (raw)
In-Reply-To: <20210414134737.2366971-3-yi.zhang@huawei.com>
On Wed 14-04-21 21:47:32, Zhang Yi wrote:
> Althourh we merged <c044f3d8360> ("jbd2: abort journal if free a async
^^ Although ^^ We usually reference commits without <>.
> write error metadata buffer"), there is a race between
> jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the
> jbd2_log_do_checkpoint() may still missing to detect the buffer write
^^^ fail
> io error flag and will end up leading to filesystem inconsistency.
^^^^^^^^^^^^^^^^^^^ which may lead
> jbd2_journal_try_to_free_buffers() ext4_put_super()
> jbd2_journal_destroy()
> __jbd2_journal_remove_checkpoint()
> detect buffer write error jbd2_log_do_checkpoint()
> jbd2_cleanup_journal_tail()
> <--- lead to inconsistency
> jbd2_journal_abort()
>
> Fix this issue by introduce a new mark j_checkpoint_io_error and set it
^^^ introducing
> in __jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer
> has write_io_error flag. Then jbd2_journal_destroy() will detect this
^ which has
> mark and abort the journal to prevent updating log tail.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Thanks! In principle the patch looks good. Some spelling corrections and a
few suggestions for improvement below.
> ---
> fs/jbd2/checkpoint.c | 32 +++++++++++++++++++-------------
> fs/jbd2/journal.c | 9 +++++++++
> include/linux/jbd2.h | 7 +++++++
> 3 files changed, 35 insertions(+), 13 deletions(-)
>
> diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> index bf5511d19ac5..ff3d38ad3a1e 100644
> --- a/fs/jbd2/checkpoint.c
> +++ b/fs/jbd2/checkpoint.c
> @@ -221,14 +221,15 @@ int jbd2_log_do_checkpoint(journal_t *journal)
> result = jbd2_cleanup_journal_tail(journal);
> trace_jbd2_checkpoint(journal, result);
> jbd_debug(1, "cleanup_journal_tail returned %d\n", result);
> - if (result <= 0)
> + if (result == 0)
> return result;
> + if (result < 0)
> + goto error;
>
> /*
> * OK, we need to start writing disk blocks. Take one transaction
> * and write it.
> */
> - result = 0;
> spin_lock(&journal->j_list_lock);
> if (!journal->j_checkpoint_transactions)
> goto out;
> @@ -295,8 +296,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
> goto restart;
> }
> if (!buffer_dirty(bh)) {
> - if (unlikely(buffer_write_io_error(bh)) && !result)
> - result = -EIO;
> BUFFER_TRACE(bh, "remove from checkpoint");
> if (__jbd2_journal_remove_checkpoint(jh))
> /* The transaction was released; we're done */
> @@ -356,9 +355,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
> spin_lock(&journal->j_list_lock);
> goto restart2;
> }
> - if (unlikely(buffer_write_io_error(bh)) && !result)
> - result = -EIO;
> -
> /*
> * Now in whatever state the buffer currently is, we
> * know that it has been written out and so we can
> @@ -369,12 +365,13 @@ int jbd2_log_do_checkpoint(journal_t *journal)
> }
> out:
> spin_unlock(&journal->j_list_lock);
> - if (result < 0)
> + result = jbd2_cleanup_journal_tail(journal);
> + if (result >= 0)
> + return 0;
> +error:
> + if (!is_journal_aborted(journal))
> jbd2_journal_abort(journal, result);
> - else
> - result = jbd2_cleanup_journal_tail(journal);
> -
> - return (result < 0) ? result : 0;
> + return result;
> }
>
> /*
> @@ -400,7 +397,7 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
> tid_t first_tid;
> unsigned long blocknr;
>
> - if (is_journal_aborted(journal))
> + if (is_journal_aborted(journal) || journal->j_checkpoint_io_error)
> return -EIO;
I think it would be cleaner to have in jbd2_journal_update_sb_log_tail()
a code handling j_checkpoint_io_error like:
if (is_journal_aborted(journal))
return -EIO;
if (journal->j_checkpoint_io_error) {
jbd2_journal_abort(...);
return -EIO;
}
So we are sure we never update log tail once we hit checkpointing error.
That way we can also drop this special case from
jbd2_cleanup_journal_tail() as well as from jbd2_log_do_checkpoint().
>
> if (!jbd2_journal_get_log_tail(journal, &first_tid, &blocknr))
> @@ -564,6 +561,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
> struct transaction_chp_stats_s *stats;
> transaction_t *transaction;
> journal_t *journal;
> + struct buffer_head *bh = jh2bh(jh);
>
> JBUFFER_TRACE(jh, "entry");
>
> @@ -575,6 +573,14 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
> journal = transaction->t_journal;
>
> JBUFFER_TRACE(jh, "removing from transaction");
> + /*
> + * If the buffer has been failed to write out to disk, it may probably
> + * lead to filesystem inconsistency after remove it from the log, so
> + * mark the journal checkpoint write back IO error.
> + */
I'd rephrase this comment like:
If we have failed to write the buffer out to disk, the filesystem may
become inconsistent. We cannot abort the journal here since we hold
j_list_lock and we have to careful about races with jbd2_journal_destroy().
So mark the writeback IO error in the journal here and we abort the journal
later from a better context.
> + if (buffer_write_io_error(bh))
> + journal->j_checkpoint_io_error = true;
> +
I agree that using j_flags for this is not great because they require
j_state_lock which we cannot easily take in
__jbd2_journal_remove_checkpoint(). But I think it may be better to create
j_atomic_flags in journal_t, manipulate these with atomic bitops
(set_bit(), test_bit()) and have just a flag J_CHECKPOINT_IO_ERROR there.
That way we can use these flags for other purposes in the future as well
and we won't create new KCSAN warnings...
> __buffer_unlink(jh);
> jh->b_cp_transaction = NULL;
> jbd2_journal_put_journal_head(jh);
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index 2dc944442802..6dbeab8b9feb 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -1995,6 +1995,15 @@ int jbd2_journal_destroy(journal_t *journal)
> J_ASSERT(journal->j_checkpoint_transactions == NULL);
> spin_unlock(&journal->j_list_lock);
>
> + /*
> + * OK, all checkpoint transactions have been checked, now check the
> + * write out io error flag and abort the journal if some buffer failed
> + * to write back to the original location, otherwise the filesystem
> + * may probably becomes inconsistency.
^^^^^^^^^^^^^^^^^^^^^^^^^^ may become inconsistent.
> + */
> + if (!is_journal_aborted(journal) && journal->j_checkpoint_io_error)
> + jbd2_journal_abort(journal, -EIO);
> +
> if (journal->j_sb_buffer) {
> if (!is_journal_aborted(journal)) {
> mutex_lock_io(&journal->j_checkpoint_mutex);
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 99d3cd051ac3..53ca70f80ad8 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -841,6 +841,13 @@ struct journal_s
> */
> transaction_t *j_checkpoint_transactions;
>
> + /**
> + * @j_checkpoint_io_error:
> + *
> + * Detect io error while writing back original buffer to disk.
> + */
> + bool j_checkpoint_io_error;
> +
> /**
> * @j_wait_transaction_locked:
> *
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2021-04-21 13:21 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-14 13:47 [RFC PATCH v2 0/7] ext4, jbd2: fix 3 issues about bdev_try_to_free_page() Zhang Yi
2021-04-14 13:47 ` [RFC PATCH v2 1/7] jbd2: remove the out label in __jbd2_journal_remove_checkpoint() Zhang Yi
2021-04-21 10:01 ` Jan Kara
2021-04-14 13:47 ` [RFC PATCH v2 2/7] jbd2: ensure abort the journal if detect IO error when writing original buffer back Zhang Yi
2021-04-21 13:20 ` Jan Kara [this message]
2021-04-14 13:47 ` [RFC PATCH v2 3/7] jbd2: don't abort the journal when freeing buffers Zhang Yi
2021-04-21 13:23 ` Jan Kara
2021-04-14 13:47 ` [RFC PATCH v2 4/7] jbd2: do not free buffers in jbd2_journal_try_to_free_buffers() Zhang Yi
2021-04-15 14:46 ` Christoph Hellwig
2021-04-14 13:47 ` [RFC PATCH v2 5/7] ext4: use RCU to protect accessing superblock in blkdev_releasepage() Zhang Yi
2021-04-15 14:48 ` Christoph Hellwig
2021-04-14 13:47 ` [RFC PATCH v2 6/7] fs: introduce a usage count into the superblock Zhang Yi
2021-04-15 14:40 ` Christoph Hellwig
2021-04-16 8:00 ` Zhang Yi
2021-04-14 13:47 ` [RFC PATCH v2 7/7] ext4: fix race between blkdev_releasepage() and ext4_put_super() Zhang Yi
2021-04-15 14:52 ` Christoph Hellwig
2021-04-16 8:00 ` Zhang Yi
2021-04-20 13:08 ` Christoph Hellwig
2021-04-21 13:46 ` Jan Kara
2021-04-21 16:57 ` Theodore Ts'o
2021-04-22 9:04 ` Jan Kara
2021-04-23 11:39 ` Zhang Yi
2021-04-23 16:06 ` Jan Kara
2021-04-23 14:40 ` Theodore Ts'o
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210421132014.GR8706@quack2.suse.cz \
--to=jack@suse.cz \
--cc=adilger.kernel@dilger.ca \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=yi.zhang@huawei.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).