* [PATCH] Btrfs: send, fix race with transaction commits that create snapshots
@ 2018-12-10 17:55 fdmanana
2018-12-11 10:19 ` [PATCH v2] " fdmanana
0 siblings, 1 reply; 3+ messages in thread
From: fdmanana @ 2018-12-10 17:55 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
If we create a snapshot of a snapshot currently being used by a send
operation, we can end up with send failing unexpectedly (returning
-ENOENT error to user space for example). The following diagram shows
how this happens.
CPU 1 CPU2 CPU3
btrfs_ioctl_send()
(...)
create_snapshot()
-> creates snapshot of a
root used by the send
task
btrfs_commit_transaction()
create_pending_snapshot()
__get_inode_info()
btrfs_search_slot()
btrfs_search_slot_get_root()
down_read commit_root_sem
get reference on eb of the
commit root
-> eb with bytenr == X
up_read commit_root_sem
btrfs_cow_block(root node)
btrfs_free_tree_block()
-> creates delayed ref to
free the extent
btrfs_run_delayed_refs()
-> runs the delayed ref,
adds extent to
fs_info->pinned_extents
btrfs_finish_extent_commit()
unpin_extent_range()
-> marks extent as free
in the free space cache
transaction commit finishes
btrfs_start_transaction()
(...)
btrfs_cow_block()
btrfs_alloc_tree_block()
btrfs_reserve_extent()
-> allocates extent at
bytenr == X
btrfs_init_new_buffer(bytenr X)
btrfs_find_create_tree_block()
alloc_extent_buffer(bytenr X)
find_extent_buffer(bytenr X)
-> returns existing eb,
which the send task got
(...)
-> modifies content of the
eb with bytenr == X
-> uses an eb that now
belongs to some other
tree and no more matches
the commit root of the
snapshot, resuts will be
unpredictable
The consequences of this race can be various, and can lead to searches in
the commit root performed by the send task failing unexpectedly (unable to
find inode items, returning -ENOENT to user space, for example) or not
failing because an inode item with the same number was added to the tree
that reused the metadata extent, in which case send can behave incorrectly
in the worst case or just fail later for some reason.
Fix this by performing a copy of the commit root's extent buffer when doing
a search in the context of a send operation.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
fs/btrfs/ctree.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 539901fb5165..b3ab47cca384 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2587,11 +2587,12 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
/* The commit roots are read only so we always do read locks */
if (p->need_commit_sem)
down_read(&fs_info->commit_root_sem);
- b = root->commit_root;
- extent_buffer_get(b);
- level = btrfs_header_level(b);
+ b = btrfs_clone_extent_buffer(root->commit_root);
if (p->need_commit_sem)
up_read(&fs_info->commit_root_sem);
+ if (!b)
+ return ERR_PTR(-ENOMEM);
+ level = btrfs_header_level(b);
/*
* Ensure that all callers have set skip_locking when
* p->search_commit_root = 1.
@@ -2717,6 +2718,10 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
again:
prev_cmp = -1;
b = btrfs_search_slot_get_root(root, p, write_lock_level);
+ if (IS_ERR(b)) {
+ ret = PTR_ERR(b);
+ goto done;
+ }
while (b) {
level = btrfs_header_level(b);
--
2.11.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH v2] Btrfs: send, fix race with transaction commits that create snapshots
2018-12-10 17:55 [PATCH] Btrfs: send, fix race with transaction commits that create snapshots fdmanana
@ 2018-12-11 10:19 ` fdmanana
2018-12-12 13:37 ` David Sterba
0 siblings, 1 reply; 3+ messages in thread
From: fdmanana @ 2018-12-11 10:19 UTC (permalink / raw)
To: linux-btrfs
From: Filipe Manana <fdmanana@suse.com>
If we create a snapshot of a snapshot currently being used by a send
operation, we can end up with send failing unexpectedly (returning
-ENOENT error to user space for example). The following diagram shows
how this happens.
CPU 1 CPU2 CPU3
btrfs_ioctl_send()
(...)
create_snapshot()
-> creates snapshot of a
root used by the send
task
btrfs_commit_transaction()
create_pending_snapshot()
__get_inode_info()
btrfs_search_slot()
btrfs_search_slot_get_root()
down_read commit_root_sem
get reference on eb of the
commit root
-> eb with bytenr == X
up_read commit_root_sem
btrfs_cow_block(root node)
btrfs_free_tree_block()
-> creates delayed ref to
free the extent
btrfs_run_delayed_refs()
-> runs the delayed ref,
adds extent to
fs_info->pinned_extents
btrfs_finish_extent_commit()
unpin_extent_range()
-> marks extent as free
in the free space cache
transaction commit finishes
btrfs_start_transaction()
(...)
btrfs_cow_block()
btrfs_alloc_tree_block()
btrfs_reserve_extent()
-> allocates extent at
bytenr == X
btrfs_init_new_buffer(bytenr X)
btrfs_find_create_tree_block()
alloc_extent_buffer(bytenr X)
find_extent_buffer(bytenr X)
-> returns existing eb,
which the send task got
(...)
-> modifies content of the
eb with bytenr == X
-> uses an eb that now
belongs to some other
tree and no more matches
the commit root of the
snapshot, resuts will be
unpredictable
The consequences of this race can be various, and can lead to searches in
the commit root performed by the send task failing unexpectedly (unable to
find inode items, returning -ENOENT to user space, for example) or not
failing because an inode item with the same number was added to the tree
that reused the metadata extent, in which case send can behave incorrectly
in the worst case or just fail later for some reason.
Fix this by performing a copy of the commit root's extent buffer when doing
a search in the context of a send operation.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
V2: Updated comment and made it clone the extent buffer really only in the
send context, and not for every search on the commit root (all searches
other from the ones done by send are well behaved hold the commit_root_sem).
fs/btrfs/ctree.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 539901fb5165..99e7645ad94e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2584,14 +2584,27 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
root_lock = BTRFS_READ_LOCK;
if (p->search_commit_root) {
- /* The commit roots are read only so we always do read locks */
- if (p->need_commit_sem)
+ /*
+ * The commit roots are read only so we always do read locks,
+ * and we always must hold the commit_root_sem when doing
+ * searches on them, the only exception is send where we don't
+ * want to block transaction commits for a long time, so
+ * we need to clone the commit root in order to avoid races
+ * with transaction commits that create a snapshot of one of
+ * the roots used by a send operation.
+ */
+ if (p->need_commit_sem) {
down_read(&fs_info->commit_root_sem);
- b = root->commit_root;
- extent_buffer_get(b);
- level = btrfs_header_level(b);
- if (p->need_commit_sem)
+ b = btrfs_clone_extent_buffer(root->commit_root);
up_read(&fs_info->commit_root_sem);
+ if (!b)
+ return ERR_PTR(-ENOMEM);
+
+ } else {
+ b = root->commit_root;
+ extent_buffer_get(b);
+ }
+ level = btrfs_header_level(b);
/*
* Ensure that all callers have set skip_locking when
* p->search_commit_root = 1.
@@ -2717,6 +2730,10 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
again:
prev_cmp = -1;
b = btrfs_search_slot_get_root(root, p, write_lock_level);
+ if (IS_ERR(b)) {
+ ret = PTR_ERR(b);
+ goto done;
+ }
while (b) {
level = btrfs_header_level(b);
--
2.11.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2] Btrfs: send, fix race with transaction commits that create snapshots
2018-12-11 10:19 ` [PATCH v2] " fdmanana
@ 2018-12-12 13:37 ` David Sterba
0 siblings, 0 replies; 3+ messages in thread
From: David Sterba @ 2018-12-12 13:37 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs
On Tue, Dec 11, 2018 at 10:19:45AM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> If we create a snapshot of a snapshot currently being used by a send
> operation, we can end up with send failing unexpectedly (returning
> -ENOENT error to user space for example). The following diagram shows
> how this happens.
>
> CPU 1 CPU2 CPU3
>
> btrfs_ioctl_send()
> (...)
> create_snapshot()
> -> creates snapshot of a
> root used by the send
> task
> btrfs_commit_transaction()
> create_pending_snapshot()
> __get_inode_info()
> btrfs_search_slot()
> btrfs_search_slot_get_root()
> down_read commit_root_sem
>
> get reference on eb of the
> commit root
> -> eb with bytenr == X
>
> up_read commit_root_sem
>
> btrfs_cow_block(root node)
> btrfs_free_tree_block()
> -> creates delayed ref to
> free the extent
>
> btrfs_run_delayed_refs()
> -> runs the delayed ref,
> adds extent to
> fs_info->pinned_extents
>
> btrfs_finish_extent_commit()
> unpin_extent_range()
> -> marks extent as free
> in the free space cache
>
> transaction commit finishes
>
> btrfs_start_transaction()
> (...)
> btrfs_cow_block()
> btrfs_alloc_tree_block()
> btrfs_reserve_extent()
> -> allocates extent at
> bytenr == X
> btrfs_init_new_buffer(bytenr X)
> btrfs_find_create_tree_block()
> alloc_extent_buffer(bytenr X)
> find_extent_buffer(bytenr X)
> -> returns existing eb,
> which the send task got
>
> (...)
> -> modifies content of the
> eb with bytenr == X
>
> -> uses an eb that now
> belongs to some other
> tree and no more matches
> the commit root of the
> snapshot, resuts will be
> unpredictable
>
> The consequences of this race can be various, and can lead to searches in
> the commit root performed by the send task failing unexpectedly (unable to
> find inode items, returning -ENOENT to user space, for example) or not
> failing because an inode item with the same number was added to the tree
> that reused the metadata extent, in which case send can behave incorrectly
> in the worst case or just fail later for some reason.
>
> Fix this by performing a copy of the commit root's extent buffer when doing
> a search in the context of a send operation.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
Nice catch, patch added to misc-next, thanks.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2018-12-12 13:37 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-10 17:55 [PATCH] Btrfs: send, fix race with transaction commits that create snapshots fdmanana
2018-12-11 10:19 ` [PATCH v2] " fdmanana
2018-12-12 13:37 ` David Sterba
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.