* [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion
@ 2019-07-25 10:27 fdmanana
2019-07-26 14:19 ` David Sterba
2019-07-29 13:38 ` Josef Bacik
0 siblings, 2 replies; 3+ messages in thread
From: fdmanana @ 2019-07-25 10:27 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
When one transaction is finishing its commit, it is possible for another
transaction to start and enter its initial commit phase as well. If the
first ends up getting aborted, we have a small time window where the second
transaction commit does not notice that the previous transaction aborted
and ends up committing, writing a superblock that points to btrees that
reference extent buffers (nodes and leafs) that were not persisted to disk.
The consequence is that after mounting the filesystem again, we will be
unable to load some btree nodes/leafs, either because the content on disk
is either garbage (or just zeroes) or corresponds to the old content of a
previouly COWed or deleted node/leaf, resulting in the well known error
messages "parent transid verify failed on ...".
The following sequence diagram illustrates how this can happen.
CPU 1 CPU 2
<at transaction N>
btrfs_commit_transaction()
(...)
--> sets transaction state to
TRANS_STATE_UNBLOCKED
--> sets fs_info->running_transaction
to NULL
(...)
btrfs_start_transaction()
start_transaction()
wait_current_trans()
--> returns immediately
because
fs_info->running_transaction
is NULL
join_transaction()
--> creates transaction N + 1
--> sets
fs_info->running_transaction
to transaction N + 1
--> adds transaction N + 1 to
the fs_info->trans_list list
--> returns transaction handle
pointing to the new
transaction N + 1
(...)
btrfs_sync_file()
btrfs_start_transaction()
--> returns handle to
transaction N + 1
(...)
btrfs_write_and_wait_transaction()
--> writeback of some extent
buffer fails, returns an
error
btrfs_handle_fs_error()
--> sets BTRFS_FS_STATE_ERROR in
fs_info->fs_state
--> jumps to label "scrub_continue"
cleanup_transaction()
btrfs_abort_transaction(N)
--> sets BTRFS_FS_STATE_TRANS_ABORTED
flag in fs_info->fs_state
--> sets aborted field in the
transaction and transaction
handle structures, for
transaction N only
--> removes transaction from the
list fs_info->trans_list
btrfs_commit_transaction(N + 1)
--> transaction N + 1 was not
aborted, so it proceeds
(...)
--> sets the transaction's state
to TRANS_STATE_COMMIT_START
--> does not find the previous
transaction (N) in the
fs_info->trans_list, so it
doesn't know that transaction
was aborted, and the commit
of transaction N + 1 proceeds
(...)
--> sets transaction N + 1 state
to TRANS_STATE_UNBLOCKED
btrfs_write_and_wait_transaction()
--> succeeds writing all extent
buffers created in the
transaction N + 1
write_all_supers()
--> succeeds
--> we now have a superblock on
disk that points to trees
that refer to at least one
extent buffer that was
never persisted
So fix this by updating the transaction commit path to check if the flag
BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
transaction in the fs_info->trans_list. If the flag is set, just fail the
transaction commit with -EROFS, as we do in other places. The exact error
code for the previous transaction abort was already logged and reported.
Fixes: 49b25e0540904b ("btrfs: enhance transaction abort infrastructure")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/transaction.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3f6811cdf803..7b8bd9046229 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2019,6 +2019,17 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
}
} else {
spin_unlock(&fs_info->trans_lock);
+ /*
+ * The previous transaction was aborted and was already removed
+ * from the list of transactions at fs_info->trans_list. So we
+ * abort to prevent writing a new superblock that reflects a
+ * corrupt state (pointing to trees with unwritten nodes/leafs).
+ */
+ if (test_bit(BTRFS_FS_STATE_TRANS_ABORTED,
+ &fs_info->fs_state)) {
+ ret = -EROFS;
+ goto cleanup_transaction;
+ }
}
extwriter_counter_dec(cur_trans, trans->type);
--
2.11.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion
2019-07-25 10:27 [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion fdmanana
@ 2019-07-26 14:19 ` David Sterba
2019-07-29 13:38 ` Josef Bacik
1 sibling, 0 replies; 3+ messages in thread
From: David Sterba @ 2019-07-26 14:19 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Thu, Jul 25, 2019 at 11:27:04AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When one transaction is finishing its commit, it is possible for another
> transaction to start and enter its initial commit phase as well. If the
> first ends up getting aborted, we have a small time window where the second
> transaction commit does not notice that the previous transaction aborted
> and ends up committing, writing a superblock that points to btrees that
> reference extent buffers (nodes and leafs) that were not persisted to disk.
> The consequence is that after mounting the filesystem again, we will be
> unable to load some btree nodes/leafs, either because the content on disk
> is either garbage (or just zeroes) or corresponds to the old content of a
> previouly COWed or deleted node/leaf, resulting in the well known error
> messages "parent transid verify failed on ...".
> The following sequence diagram illustrates how this can happen.
>
> CPU 1 CPU 2
>
> <at transaction N>
>
> btrfs_commit_transaction()
> (...)
> --> sets transaction state to
> TRANS_STATE_UNBLOCKED
> --> sets fs_info->running_transaction
> to NULL
>
> (...)
> btrfs_start_transaction()
> start_transaction()
> wait_current_trans()
> --> returns immediately
> because
> fs_info->running_transaction
> is NULL
> join_transaction()
> --> creates transaction N + 1
> --> sets
> fs_info->running_transaction
> to transaction N + 1
> --> adds transaction N + 1 to
> the fs_info->trans_list list
> --> returns transaction handle
> pointing to the new
> transaction N + 1
> (...)
>
> btrfs_sync_file()
> btrfs_start_transaction()
> --> returns handle to
> transaction N + 1
> (...)
>
> btrfs_write_and_wait_transaction()
> --> writeback of some extent
> buffer fails, returns an
> error
> btrfs_handle_fs_error()
> --> sets BTRFS_FS_STATE_ERROR in
> fs_info->fs_state
> --> jumps to label "scrub_continue"
> cleanup_transaction()
> btrfs_abort_transaction(N)
> --> sets BTRFS_FS_STATE_TRANS_ABORTED
> flag in fs_info->fs_state
> --> sets aborted field in the
> transaction and transaction
> handle structures, for
> transaction N only
> --> removes transaction from the
> list fs_info->trans_list
> btrfs_commit_transaction(N + 1)
> --> transaction N + 1 was not
> aborted, so it proceeds
> (...)
> --> sets the transaction's state
> to TRANS_STATE_COMMIT_START
> --> does not find the previous
> transaction (N) in the
> fs_info->trans_list, so it
> doesn't know that transaction
> was aborted, and the commit
> of transaction N + 1 proceeds
> (...)
> --> sets transaction N + 1 state
> to TRANS_STATE_UNBLOCKED
> btrfs_write_and_wait_transaction()
> --> succeeds writing all extent
> buffers created in the
> transaction N + 1
> write_all_supers()
> --> succeeds
> --> we now have a superblock on
> disk that points to trees
> that refer to at least one
> extent buffer that was
> never persisted
>
> So fix this by updating the transaction commit path to check if the flag
> BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
> the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
> transaction in the fs_info->trans_list. If the flag is set, just fail the
> transaction commit with -EROFS, as we do in other places. The exact error
> code for the previous transaction abort was already logged and reported.
>
> Fixes: 49b25e0540904b ("btrfs: enhance transaction abort infrastructure")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Queued for 5.3, thanks.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion
2019-07-25 10:27 [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion fdmanana
2019-07-26 14:19 ` David Sterba
@ 2019-07-29 13:38 ` Josef Bacik
1 sibling, 0 replies; 3+ messages in thread
From: Josef Bacik @ 2019-07-29 13:38 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Thu, Jul 25, 2019 at 11:27:04AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When one transaction is finishing its commit, it is possible for another
> transaction to start and enter its initial commit phase as well. If the
> first ends up getting aborted, we have a small time window where the second
> transaction commit does not notice that the previous transaction aborted
> and ends up committing, writing a superblock that points to btrees that
> reference extent buffers (nodes and leafs) that were not persisted to disk.
> The consequence is that after mounting the filesystem again, we will be
> unable to load some btree nodes/leafs, either because the content on disk
> is either garbage (or just zeroes) or corresponds to the old content of a
> previouly COWed or deleted node/leaf, resulting in the well known error
> messages "parent transid verify failed on ...".
> The following sequence diagram illustrates how this can happen.
>
> CPU 1 CPU 2
>
> <at transaction N>
>
> btrfs_commit_transaction()
> (...)
> --> sets transaction state to
> TRANS_STATE_UNBLOCKED
> --> sets fs_info->running_transaction
> to NULL
>
> (...)
> btrfs_start_transaction()
> start_transaction()
> wait_current_trans()
> --> returns immediately
> because
> fs_info->running_transaction
> is NULL
> join_transaction()
> --> creates transaction N + 1
> --> sets
> fs_info->running_transaction
> to transaction N + 1
> --> adds transaction N + 1 to
> the fs_info->trans_list list
> --> returns transaction handle
> pointing to the new
> transaction N + 1
> (...)
>
> btrfs_sync_file()
> btrfs_start_transaction()
> --> returns handle to
> transaction N + 1
> (...)
>
> btrfs_write_and_wait_transaction()
> --> writeback of some extent
> buffer fails, returns an
> error
> btrfs_handle_fs_error()
> --> sets BTRFS_FS_STATE_ERROR in
> fs_info->fs_state
> --> jumps to label "scrub_continue"
> cleanup_transaction()
> btrfs_abort_transaction(N)
> --> sets BTRFS_FS_STATE_TRANS_ABORTED
> flag in fs_info->fs_state
> --> sets aborted field in the
> transaction and transaction
> handle structures, for
> transaction N only
> --> removes transaction from the
> list fs_info->trans_list
> btrfs_commit_transaction(N + 1)
> --> transaction N + 1 was not
> aborted, so it proceeds
> (...)
> --> sets the transaction's state
> to TRANS_STATE_COMMIT_START
> --> does not find the previous
> transaction (N) in the
> fs_info->trans_list, so it
> doesn't know that transaction
> was aborted, and the commit
> of transaction N + 1 proceeds
> (...)
> --> sets transaction N + 1 state
> to TRANS_STATE_UNBLOCKED
> btrfs_write_and_wait_transaction()
> --> succeeds writing all extent
> buffers created in the
> transaction N + 1
> write_all_supers()
> --> succeeds
> --> we now have a superblock on
> disk that points to trees
> that refer to at least one
> extent buffer that was
> never persisted
>
> So fix this by updating the transaction commit path to check if the flag
> BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
> the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
> transaction in the fs_info->trans_list. If the flag is set, just fail the
> transaction commit with -EROFS, as we do in other places. The exact error
> code for the previous transaction abort was already logged and reported.
>
> Fixes: 49b25e0540904b ("btrfs: enhance transaction abort infrastructure")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> fs/btrfs/transaction.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 3f6811cdf803..7b8bd9046229 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -2019,6 +2019,17 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
> }
> } else {
> spin_unlock(&fs_info->trans_lock);
> + /*
> + * The previous transaction was aborted and was already removed
> + * from the list of transactions at fs_info->trans_list. So we
> + * abort to prevent writing a new superblock that reflects a
> + * corrupt state (pointing to trees with unwritten nodes/leafs).
> + */
> + if (test_bit(BTRFS_FS_STATE_TRANS_ABORTED,
> + &fs_info->fs_state)) {
> + ret = -EROFS;
> + goto cleanup_transaction;
> + }
> }
>
> extwriter_counter_dec(cur_trans, trans->type);
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2019-07-29 13:39 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-25 10:27 [PATCH] Btrfs: fix race leading to fs corruption after transaction abortion fdmanana
2019-07-26 14:19 ` David Sterba
2019-07-29 13:38 ` Josef Bacik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).