linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Sterba <dsterba@suse.cz>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion
Date: Fri, 26 Jul 2019 16:19:23 +0200	[thread overview]
Message-ID: <20190726141922.GC2868@twin.jikos.cz> (raw)
In-Reply-To: <20190725102704.11404-1-fdmanana@kernel.org>

On Thu, Jul 25, 2019 at 11:27:04AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When one transaction is finishing its commit, it is possible for another
> transaction to start and enter its initial commit phase as well. If the
> first ends up getting aborted, we have a small time window where the second
> transaction commit does not notice that the previous transaction aborted
> and ends up committing, writing a superblock that points to btrees that
> reference extent buffers (nodes and leafs) that were not persisted to disk.
> The consequence is that after mounting the filesystem again, we will be
> unable to load some btree nodes/leafs, either because the content on disk
> is either garbage (or just zeroes) or corresponds to the old content of a
> previouly COWed or deleted node/leaf, resulting in the well known error
> messages "parent transid verify failed on ...".
> The following sequence diagram illustrates how this can happen.
> 
>         CPU 1                                           CPU 2
> 
>  <at transaction N>
> 
>  btrfs_commit_transaction()
>    (...)
>    --> sets transaction state to
>        TRANS_STATE_UNBLOCKED
>    --> sets fs_info->running_transaction
>        to NULL
> 
>                                                     (...)
>                                                     btrfs_start_transaction()
>                                                       start_transaction()
>                                                         wait_current_trans()
>                                                           --> returns immediately
>                                                               because
>                                                               fs_info->running_transaction
>                                                               is NULL
>                                                         join_transaction()
>                                                           --> creates transaction N + 1
>                                                           --> sets
>                                                               fs_info->running_transaction
>                                                               to transaction N + 1
>                                                           --> adds transaction N + 1 to
>                                                               the fs_info->trans_list list
>                                                         --> returns transaction handle
>                                                             pointing to the new
>                                                             transaction N + 1
>                                                     (...)
> 
>                                                     btrfs_sync_file()
>                                                       btrfs_start_transaction()
>                                                         --> returns handle to
>                                                             transaction N + 1
>                                                       (...)
> 
>    btrfs_write_and_wait_transaction()
>      --> writeback of some extent
>          buffer fails, returns an
> 	 error
>    btrfs_handle_fs_error()
>      --> sets BTRFS_FS_STATE_ERROR in
>          fs_info->fs_state
>    --> jumps to label "scrub_continue"
>    cleanup_transaction()
>      btrfs_abort_transaction(N)
>        --> sets BTRFS_FS_STATE_TRANS_ABORTED
>            flag in fs_info->fs_state
>        --> sets aborted field in the
>            transaction and transaction
> 	   handle structures, for
>            transaction N only
>      --> removes transaction from the
>          list fs_info->trans_list
>                                                       btrfs_commit_transaction(N + 1)
>                                                         --> transaction N + 1 was not
> 							    aborted, so it proceeds
>                                                         (...)
>                                                         --> sets the transaction's state
>                                                             to TRANS_STATE_COMMIT_START
>                                                         --> does not find the previous
>                                                             transaction (N) in the
>                                                             fs_info->trans_list, so it
>                                                             doesn't know that transaction
>                                                             was aborted, and the commit
>                                                             of transaction N + 1 proceeds
>                                                         (...)
>                                                         --> sets transaction N + 1 state
>                                                             to TRANS_STATE_UNBLOCKED
>                                                         btrfs_write_and_wait_transaction()
>                                                           --> succeeds writing all extent
>                                                               buffers created in the
>                                                               transaction N + 1
>                                                         write_all_supers()
>                                                            --> succeeds
>                                                            --> we now have a superblock on
>                                                                disk that points to trees
>                                                                that refer to at least one
>                                                                extent buffer that was
>                                                                never persisted
> 
> So fix this by updating the transaction commit path to check if the flag
> BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
> the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
> transaction in the fs_info->trans_list. If the flag is set, just fail the
> transaction commit with -EROFS, as we do in other places. The exact error
> code for the previous transaction abort was already logged and reported.
> 
> Fixes: 49b25e0540904b ("btrfs: enhance transaction abort infrastructure")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: David Sterba <dsterba@suse.com>

Queued for 5.3, thanks.

  reply	other threads:[~2019-07-26 14:18 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-25 10:27 [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion fdmanana
2019-07-26 14:19 ` David Sterba [this message]
2019-07-29 13:38 ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190726141922.GC2868@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).