From: Josef Bacik <josef@toxicpanda.com>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion
Date: Mon, 29 Jul 2019 09:38:55 -0400 [thread overview]
Message-ID: <20190729133854.lkaiv7fvw4lf3cun@macbook-pro-91.dhcp.thefacebook.com> (raw)
In-Reply-To: <20190725102704.11404-1-fdmanana@kernel.org>
On Thu, Jul 25, 2019 at 11:27:04AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When one transaction is finishing its commit, it is possible for another
> transaction to start and enter its initial commit phase as well. If the
> first ends up getting aborted, we have a small time window where the second
> transaction commit does not notice that the previous transaction aborted
> and ends up committing, writing a superblock that points to btrees that
> reference extent buffers (nodes and leafs) that were not persisted to disk.
> The consequence is that after mounting the filesystem again, we will be
> unable to load some btree nodes/leafs, either because the content on disk
> is either garbage (or just zeroes) or corresponds to the old content of a
> previouly COWed or deleted node/leaf, resulting in the well known error
> messages "parent transid verify failed on ...".
> The following sequence diagram illustrates how this can happen.
>
> CPU 1 CPU 2
>
> <at transaction N>
>
> btrfs_commit_transaction()
> (...)
> --> sets transaction state to
> TRANS_STATE_UNBLOCKED
> --> sets fs_info->running_transaction
> to NULL
>
> (...)
> btrfs_start_transaction()
> start_transaction()
> wait_current_trans()
> --> returns immediately
> because
> fs_info->running_transaction
> is NULL
> join_transaction()
> --> creates transaction N + 1
> --> sets
> fs_info->running_transaction
> to transaction N + 1
> --> adds transaction N + 1 to
> the fs_info->trans_list list
> --> returns transaction handle
> pointing to the new
> transaction N + 1
> (...)
>
> btrfs_sync_file()
> btrfs_start_transaction()
> --> returns handle to
> transaction N + 1
> (...)
>
> btrfs_write_and_wait_transaction()
> --> writeback of some extent
> buffer fails, returns an
> error
> btrfs_handle_fs_error()
> --> sets BTRFS_FS_STATE_ERROR in
> fs_info->fs_state
> --> jumps to label "scrub_continue"
> cleanup_transaction()
> btrfs_abort_transaction(N)
> --> sets BTRFS_FS_STATE_TRANS_ABORTED
> flag in fs_info->fs_state
> --> sets aborted field in the
> transaction and transaction
> handle structures, for
> transaction N only
> --> removes transaction from the
> list fs_info->trans_list
> btrfs_commit_transaction(N + 1)
> --> transaction N + 1 was not
> aborted, so it proceeds
> (...)
> --> sets the transaction's state
> to TRANS_STATE_COMMIT_START
> --> does not find the previous
> transaction (N) in the
> fs_info->trans_list, so it
> doesn't know that transaction
> was aborted, and the commit
> of transaction N + 1 proceeds
> (...)
> --> sets transaction N + 1 state
> to TRANS_STATE_UNBLOCKED
> btrfs_write_and_wait_transaction()
> --> succeeds writing all extent
> buffers created in the
> transaction N + 1
> write_all_supers()
> --> succeeds
> --> we now have a superblock on
> disk that points to trees
> that refer to at least one
> extent buffer that was
> never persisted
>
> So fix this by updating the transaction commit path to check if the flag
> BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
> the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
> transaction in the fs_info->trans_list. If the flag is set, just fail the
> transaction commit with -EROFS, as we do in other places. The exact error
> code for the previous transaction abort was already logged and reported.
>
> Fixes: 49b25e0540904b ("btrfs: enhance transaction abort infrastructure")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/transaction.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 3f6811cdf803..7b8bd9046229 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -2019,6 +2019,17 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
> }
> } else {
> spin_unlock(&fs_info->trans_lock);
> + /*
> + * The previous transaction was aborted and was already removed
> + * from the list of transactions at fs_info->trans_list. So we
> + * abort to prevent writing a new superblock that reflects a
> + * corrupt state (pointing to trees with unwritten nodes/leafs).
> + */
> + if (test_bit(BTRFS_FS_STATE_TRANS_ABORTED,
> + &fs_info->fs_state)) {
> + ret = -EROFS;
> + goto cleanup_transaction;
> + }
> }
>
> extwriter_counter_dec(cur_trans, trans->type);
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
prev parent reply other threads:[~2019-07-29 13:39 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-07-25 10:27 [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion fdmanana
2019-07-26 14:19 ` David Sterba
2019-07-29 13:38 ` Josef Bacik [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190729133854.lkaiv7fvw4lf3cun@macbook-pro-91.dhcp.thefacebook.com \
--to=josef@toxicpanda.com \
--cc=fdmanana@kernel.org \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).