linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8][v3] relocation error handling fixes
@ 2020-03-13 15:44 Josef Bacik
  2020-03-13 15:44 ` [PATCH 1/8] btrfs: drop block from cache on error in relocation Josef Bacik
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 15:44 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v2->v3:
- Updated the free reloc control to use the rbtree postorder thing.
- Updated the changelog for patch 6.
- Fixed the error goto in patch 5.
- Added the reviewed-bys.

v1->v2:
- Incorporated the various feedback, tweaked some things, adjusted commit
  messages, added some more comments.  Hopefully I got everything.
- Added "btrfs: do not init a reloc root if we aren't relocating", this is to
  oddress the weird handling of DEAD_ROOT and the use-after-free that Qu had
  originally attempted to fix.
- Reworked "btrfs: splice rc->reloc_roots onto reloc roots in recover" to simply
  handle cleaning up any remaining bigs on the reloc_control, since errors can
  mean we'll have some things left pending on the reloc_control.

------------------------------ Original email ---------------------------------
My root ref patches have uncovered weird failures in some of our xfstests,
particularly those that do balance while having errors.

I ran relocation through my eio-stress bpf script and loads of things fell out,
these are the fixes required to make the stress test run to completion.

Dave this is just based on my master, I assume it'll apply cleanly to what you
have, but if not let me know which branch you want me to rebase onto to get it
to work right.

Most of these are straightforward, the only tricky/subtle one is 7/7, and I've
added a lot of explanation there around my reasoning.  6/7 is also a little bit
more complicated as it changes the rules slightly for reference holding for
roots.  Before we just sort of hoped and prayed we go the right reference
dropped when we dropped root->reloc_root.  Now we hold one ref for the list of
reloc roots and one ref for root->reloc_root, so it's more clear when we need to
be dropping references.  Everything else is relatively straightforward.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/8] btrfs: drop block from cache on error in relocation
  2020-03-13 15:44 [PATCH 0/8][v3] relocation error handling fixes Josef Bacik
@ 2020-03-13 15:44 ` Josef Bacik
  2020-03-13 15:44 ` [PATCH 2/8] btrfs: do not init a reloc root if we aren't relocating Josef Bacik
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 15:44 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

If we have an error while building the backref tree in relocation we'll
process all the pending edges and then free the node.  However if we
integrated some edges into the cache we'll lose our link to those edges
by simply freeing this node, which means we'll leak memory and
references to any roots that we've found.

Instead we need to use remove_backref_node(), which walks through all of
the edges that are still linked to this node and free's them up and
drops any root references we may be holding.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 4fb7e3cc2aca..507361e99316 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1244,7 +1244,7 @@ struct backref_node *build_backref_tree(struct reloc_control *rc,
 			free_backref_node(cache, lower);
 		}
 
-		free_backref_node(cache, node);
+		remove_backref_node(cache, node);
 		return ERR_PTR(err);
 	}
 	ASSERT(!node || !node->detached);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/8] btrfs: do not init a reloc root if we aren't relocating
  2020-03-13 15:44 [PATCH 0/8][v3] relocation error handling fixes Josef Bacik
  2020-03-13 15:44 ` [PATCH 1/8] btrfs: drop block from cache on error in relocation Josef Bacik
@ 2020-03-13 15:44 ` Josef Bacik
  2020-03-13 15:44 ` [PATCH 3/8] btrfs: unset reloc control if we fail to recover Josef Bacik
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 15:44 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

We previously were checking if the root had a dead root before accessing
root->reloc_root in order to avoid a UAF type bug.  However this
scenario happens after we've unset the reloc control, so we would have
been saved if we'd simply checked for fs_info->reloc_control.  At this
point during relocation we no longer need to be creating new reloc
roots, so simply move this check above the reloc_root checks to avoid
any future races and confusion.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 507361e99316..2141519a9dd0 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1527,6 +1527,10 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
 	int clear_rsv = 0;
 	int ret;
 
+	if (!rc || !rc->create_reloc_tree ||
+	    root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+		return 0;
+
 	/*
 	 * The subvolume has reloc tree but the swap is finished, no need to
 	 * create/update the dead reloc tree
@@ -1540,10 +1544,6 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
 		return 0;
 	}
 
-	if (!rc || !rc->create_reloc_tree ||
-	    root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
-		return 0;
-
 	if (!trans->reloc_reserved) {
 		rsv = trans->block_rsv;
 		trans->block_rsv = rc->block_rsv;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/8] btrfs: unset reloc control if we fail to recover
  2020-03-13 15:44 [PATCH 0/8][v3] relocation error handling fixes Josef Bacik
  2020-03-13 15:44 ` [PATCH 1/8] btrfs: drop block from cache on error in relocation Josef Bacik
  2020-03-13 15:44 ` [PATCH 2/8] btrfs: do not init a reloc root if we aren't relocating Josef Bacik
@ 2020-03-13 15:44 ` Josef Bacik
  2020-03-13 15:44 ` [PATCH 4/8] btrfs: free the reloc_control in a consistent way Josef Bacik
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 15:44 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

If we fail to load an fs root, or fail to start a transaction we can
bail without unsetting the reloc control, which leads to problems later
when we free the reloc control but still have it attached to the file
system.

In the normal path we'll end up calling unset_reloc_control() twice, but
all it does is set fs_info->reloc_control = NULL, and we can only have
one balance at a time so it's not racey.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 2141519a9dd0..c496f8ed8c7e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4657,9 +4657,8 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 
 	trans = btrfs_join_transaction(rc->extent_root);
 	if (IS_ERR(trans)) {
-		unset_reloc_control(rc);
 		err = PTR_ERR(trans);
-		goto out_free;
+		goto out_unset;
 	}
 
 	rc->merge_reloc_tree = 1;
@@ -4679,7 +4678,7 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 		if (IS_ERR(fs_root)) {
 			err = PTR_ERR(fs_root);
 			list_add_tail(&reloc_root->root_list, &reloc_roots);
-			goto out_free;
+			goto out_unset;
 		}
 
 		err = __add_reloc_root(reloc_root);
@@ -4690,7 +4689,7 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 
 	err = btrfs_commit_transaction(trans);
 	if (err)
-		goto out_free;
+		goto out_unset;
 
 	merge_reloc_roots(rc);
 
@@ -4706,6 +4705,8 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 	ret = clean_dirty_subvols(rc);
 	if (ret < 0 && !err)
 		err = ret;
+out_unset:
+	unset_reloc_control(rc);
 out_free:
 	kfree(rc);
 out:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/8] btrfs: free the reloc_control in a consistent way
  2020-03-13 15:44 [PATCH 0/8][v3] relocation error handling fixes Josef Bacik
                   ` (2 preceding siblings ...)
  2020-03-13 15:44 ` [PATCH 3/8] btrfs: unset reloc control if we fail to recover Josef Bacik
@ 2020-03-13 15:44 ` Josef Bacik
  2020-03-13 17:18   ` David Sterba
  2020-03-13 15:44 ` [PATCH 5/8] btrfs: run clean_dirty_subvols if we fail to start a trans Josef Bacik
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 15:44 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

If we have an error while processing the reloc roots we could leak roots
that were added to rc->reloc_roots before we hit the error.  We could
have also not removed the reloct tree mapping from our rb_tree, so clean
up any remaining nodes in the reloc root rb_tree.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index c496f8ed8c7e..721d049ff2b5 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4387,6 +4387,20 @@ static struct reloc_control *alloc_reloc_control(struct btrfs_fs_info *fs_info)
 	return rc;
 }
 
+static void free_reloc_control(struct reloc_control *rc)
+{
+	struct mapping_node *node, *tmp;
+
+	free_reloc_roots(&rc->reloc_roots);
+	rbtree_postorder_for_each_entry_safe(node, tmp,
+					     &rc->reloc_root_tree.rb_root,
+					     rb_node) {
+		rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
+		kfree(node);
+	}
+	kfree(rc);
+}
+
 /*
  * Print the block group being relocated
  */
@@ -4531,7 +4545,7 @@ int btrfs_relocate_block_group(struct btrfs_fs_info *fs_info, u64 group_start)
 		btrfs_dec_block_group_ro(rc->block_group);
 	iput(rc->data_inode);
 	btrfs_put_block_group(rc->block_group);
-	kfree(rc);
+	free_reloc_control(rc);
 	return err;
 }
 
@@ -4708,7 +4722,7 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 out_unset:
 	unset_reloc_control(rc);
 out_free:
-	kfree(rc);
+	free_reloc_control(rc);
 out:
 	if (!list_empty(&reloc_roots))
 		free_reloc_roots(&reloc_roots);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/8] btrfs: run clean_dirty_subvols if we fail to start a trans
  2020-03-13 15:44 [PATCH 0/8][v3] relocation error handling fixes Josef Bacik
                   ` (3 preceding siblings ...)
  2020-03-13 15:44 ` [PATCH 4/8] btrfs: free the reloc_control in a consistent way Josef Bacik
@ 2020-03-13 15:44 ` Josef Bacik
  2020-03-13 15:44 ` [PATCH 6/8] btrfs: clear BTRFS_ROOT_DEAD_RELOC_TREE before dropping the reloc root Josef Bacik
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 15:44 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

If we do merge_reloc_roots() we could insert a few roots onto the dirty
subvol roots list, where we hold a ref on them.  If we fail to start the
transaction we need to run clean_dirty_subvols() in order to cleanup the
refs.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 721d049ff2b5..9c8a4a4b2bde 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4279,10 +4279,10 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 		goto out_free;
 	}
 	btrfs_commit_transaction(trans);
+out_free:
 	ret = clean_dirty_subvols(rc);
 	if (ret < 0 && !err)
 		err = ret;
-out_free:
 	btrfs_free_block_rsv(fs_info, rc->block_rsv);
 	btrfs_free_path(path);
 	return err;
@@ -4712,16 +4712,15 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 	trans = btrfs_join_transaction(rc->extent_root);
 	if (IS_ERR(trans)) {
 		err = PTR_ERR(trans);
-		goto out_free;
+		goto out_clean;
 	}
 	err = btrfs_commit_transaction(trans);
-
+out_clean:
 	ret = clean_dirty_subvols(rc);
 	if (ret < 0 && !err)
 		err = ret;
 out_unset:
 	unset_reloc_control(rc);
-out_free:
 	free_reloc_control(rc);
 out:
 	if (!list_empty(&reloc_roots))
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/8] btrfs: clear BTRFS_ROOT_DEAD_RELOC_TREE before dropping the reloc root
  2020-03-13 15:44 [PATCH 0/8][v3] relocation error handling fixes Josef Bacik
                   ` (4 preceding siblings ...)
  2020-03-13 15:44 ` [PATCH 5/8] btrfs: run clean_dirty_subvols if we fail to start a trans Josef Bacik
@ 2020-03-13 15:44 ` Josef Bacik
  2020-03-13 15:44 ` [PATCH 7/8] btrfs: hold a ref on the root->reloc_root Josef Bacik
  2020-03-13 15:44 ` [PATCH 8/8] btrfs: remove a BUG_ON() from merge_reloc_roots() Josef Bacik
  7 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 15:44 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Qu put this in place in order to avoid a use after free in
init_reloc_root.  However a previous patch in this series makes the use
after free impossible because at this point we no longer have a
reloc_control set on the fs_info.

So move this to be coupled with clearing the root->reloc_root so we're
consistent with all other operations of the reloc root.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 9c8a4a4b2bde..5bbed0ea26e1 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2291,18 +2291,19 @@ static int clean_dirty_subvols(struct reloc_control *rc)
 
 			list_del_init(&root->reloc_dirty_list);
 			root->reloc_root = NULL;
-			if (reloc_root) {
-
-				ret2 = btrfs_drop_snapshot(reloc_root, NULL, 0, 1);
-				if (ret2 < 0 && !ret)
-					ret = ret2;
-			}
 			/*
 			 * Need barrier to ensure clear_bit() only happens after
 			 * root->reloc_root = NULL. Pairs with have_reloc_root.
 			 */
 			smp_wmb();
 			clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
+
+			if (reloc_root) {
+
+				ret2 = btrfs_drop_snapshot(reloc_root, NULL, 0, 1);
+				if (ret2 < 0 && !ret)
+					ret = ret2;
+			}
 			btrfs_put_root(root);
 		} else {
 			/* Orphan reloc tree, just clean it up */
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/8] btrfs: hold a ref on the root->reloc_root
  2020-03-13 15:44 [PATCH 0/8][v3] relocation error handling fixes Josef Bacik
                   ` (5 preceding siblings ...)
  2020-03-13 15:44 ` [PATCH 6/8] btrfs: clear BTRFS_ROOT_DEAD_RELOC_TREE before dropping the reloc root Josef Bacik
@ 2020-03-13 15:44 ` Josef Bacik
  2020-03-13 15:44 ` [PATCH 8/8] btrfs: remove a BUG_ON() from merge_reloc_roots() Josef Bacik
  7 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 15:44 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We previously were relying on root->reloc_root to be cleaned up by the
drop snapshot, or the error handling.  However if btrfs_drop_snapshot()
failed it wouldn't drop the ref for the root.  Also we sort of depend on
the right thing to happen with moving reloc roots between lists and the
fs root they belong to, which makes it hard to figure out who owns the
reference.

Fix this by explicitly holding a reference on the reloc root for
roo->reloc_root.  This means that we hold two references on reloc roots,
one for whichever reloc_roots list it's attached to, and the
root->reloc_root we're on.

This makes it easier to reason out who owns a reference on the root, and
when it needs to be dropped.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 59 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 5bbed0ea26e1..7eae49834f3e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1384,6 +1384,7 @@ static void __del_reloc_root(struct btrfs_root *root)
 	struct rb_node *rb_node;
 	struct mapping_node *node = NULL;
 	struct reloc_control *rc = fs_info->reloc_ctl;
+	bool put_ref = false;
 
 	if (rc && root->node) {
 		spin_lock(&rc->reloc_root_tree.lock);
@@ -1399,9 +1400,22 @@ static void __del_reloc_root(struct btrfs_root *root)
 		BUG_ON((struct btrfs_root *)node->data != root);
 	}
 
+	/*
+	 * We only put the reloc root here if it's on the list.  There's a lot
+	 * of places where the pattern is to splice the rc->reloc_roots, process
+	 * the reloc roots, and then add the reloc root back onto
+	 * rc->reloc_roots.  If we call __del_reloc_root while it's off of the
+	 * list we don't want the reference being dropped, because the guy
+	 * messing with the list is in charge of the reference.
+	 */
 	spin_lock(&fs_info->trans_lock);
-	list_del_init(&root->root_list);
+	if (!list_empty(&root->root_list)) {
+		put_ref = true;
+		list_del_init(&root->root_list);
+	}
 	spin_unlock(&fs_info->trans_lock);
+	if (put_ref)
+		btrfs_put_root(root);
 	kfree(node);
 }
 
@@ -1516,6 +1530,9 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,
 /*
  * create reloc tree for a given fs tree. reloc tree is just a
  * snapshot of the fs tree with special root objectid.
+ *
+ * The reloc_root comes out of here with two references, one for
+ * root->reloc_root, and another for being on the rc->reloc_roots list.
  */
 int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root)
@@ -1555,7 +1572,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
 
 	ret = __add_reloc_root(reloc_root);
 	BUG_ON(ret < 0);
-	root->reloc_root = reloc_root;
+	root->reloc_root = btrfs_grab_root(reloc_root);
 	return 0;
 }
 
@@ -1576,6 +1593,13 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
 	reloc_root = root->reloc_root;
 	root_item = &reloc_root->root_item;
 
+	/*
+	 * We are probably ok here, but __del_reloc_root() will drop its ref of
+	 * the root.  We have the ref fro root->reloc_root, but just in case
+	 * hold it while we update the reloc root.
+	 */
+	btrfs_grab_root(reloc_root);
+
 	/* root->reloc_root will stay until current relocation finished */
 	if (fs_info->reloc_ctl->merge_reloc_tree &&
 	    btrfs_root_refs(root_item) == 0) {
@@ -1597,7 +1621,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
 	ret = btrfs_update_root(trans, fs_info->tree_root,
 				&reloc_root->root_key, root_item);
 	BUG_ON(ret);
-
+	btrfs_put_root(reloc_root);
 out:
 	return 0;
 }
@@ -2297,19 +2321,28 @@ static int clean_dirty_subvols(struct reloc_control *rc)
 			 */
 			smp_wmb();
 			clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
-
 			if (reloc_root) {
-
+				/*
+				 * btrfs_drop_snapshot drops our ref we hold for
+				 * ->reloc_root.  If it fails however we must
+				 * drop the ref ourselves.
+				 */
 				ret2 = btrfs_drop_snapshot(reloc_root, NULL, 0, 1);
-				if (ret2 < 0 && !ret)
-					ret = ret2;
+				if (ret2 < 0) {
+					btrfs_put_root(reloc_root);
+					if (!ret)
+						ret = ret2;
+				}
 			}
 			btrfs_put_root(root);
 		} else {
 			/* Orphan reloc tree, just clean it up */
 			ret2 = btrfs_drop_snapshot(root, NULL, 0, 1);
-			if (ret2 < 0 && !ret)
-				ret = ret2;
+			if (ret2 < 0) {
+				btrfs_put_root(root);
+				if (!ret)
+					ret = ret2;
+			}
 		}
 	}
 	return ret;
@@ -4698,7 +4731,7 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 
 		err = __add_reloc_root(reloc_root);
 		BUG_ON(err < 0); /* -ENOMEM or logic error */
-		fs_root->reloc_root = reloc_root;
+		fs_root->reloc_root = btrfs_grab_root(reloc_root);
 		btrfs_put_root(fs_root);
 	}
 
@@ -4886,6 +4919,10 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
 /*
  * called after snapshot is created. migrate block reservation
  * and create reloc root for the newly created snapshot
+ *
+ * This is similar to btrfs_init_reloc_root(), we come out of here with two
+ * references held on the reloc_root, one for root->reloc_root and one for
+ * rc->reloc_roots.
  */
 int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
 			       struct btrfs_pending_snapshot *pending)
@@ -4918,7 +4955,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
 
 	ret = __add_reloc_root(reloc_root);
 	BUG_ON(ret < 0);
-	new_root->reloc_root = reloc_root;
+	new_root->reloc_root = btrfs_grab_root(reloc_root);
 
 	if (rc->create_reloc_tree)
 		ret = clone_backref_node(trans, rc, root, reloc_root);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 8/8] btrfs: remove a BUG_ON() from merge_reloc_roots()
  2020-03-13 15:44 [PATCH 0/8][v3] relocation error handling fixes Josef Bacik
                   ` (6 preceding siblings ...)
  2020-03-13 15:44 ` [PATCH 7/8] btrfs: hold a ref on the root->reloc_root Josef Bacik
@ 2020-03-13 15:44 ` Josef Bacik
  7 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 15:44 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This was pretty subtle, we default to reloc roots having 0 root refs, so
if we crash in the middle of the relocation they can just be deleted.
If we successfully complete the relocation operations we'll set our root
refs to 1 in prepare_to_merge() and then go on to merge_reloc_roots().

At prepare_to_merge() time if any of the reloc roots have a 0 reference
still, we will remove that reloc root from our reloc root rb tree, and
then clean it up later.

However this only happens if we successfully start a transaction.  If
we've aborted previously we will skip this step completely, and only
have reloc roots with a reference count of 0, but were never properly
removed from the reloc control's rb tree.

This isn't a problem per-se, our references are held by the list the
reloc roots are on, and by the original root the reloc root belongs to.
If we end up in this situation all the reloc roots will be added to the
dirty_reloc_list, and then properly dropped at that point.  The reloc
control will be free'd and the rb tree is no longer used.

There were two options when fixing this, one was to remove the BUG_ON(),
the other was to make prepare_to_merge() handle the case where we
couldn't start a trans handle.

IMO this is the cleaner solution.  I started with handling the error in
prepare_to_merge(), but it turned out super ugly.  And in the end this
BUG_ON() simply doesn't matter, the cleanup was happening properly, we
were just panicing because this BUG_ON() only matters in the success
case.  So I've opted to just remove it and add a comment where it was.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 7eae49834f3e..d35f075014c2 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2653,7 +2653,21 @@ void merge_reloc_roots(struct reloc_control *rc)
 			free_reloc_roots(&reloc_roots);
 	}
 
-	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));
+	/*
+	 * We used to have
+	 *
+	 * BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));
+	 *
+	 * here, but it's wrong.  If we fail to start the transaction in
+	 * prepare_to_merge() we will have only 0 ref reloc roots, none of which
+	 * have actually been removed from the reloc_root_tree rb tree.  This is
+	 * fine because we're bailing here, and we hold a reference on the root
+	 * for the list that holds it, so these roots will be cleaned up when we
+	 * do the reloc_dirty_list afterwards.  Meanwhile the root->reloc_root
+	 * will be cleaned up on unmount.
+	 *
+	 * The remaining nodes will be cleaned up by free_reloc_control.
+	 */
 }
 
 static void free_block_list(struct rb_root *blocks)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/8] btrfs: free the reloc_control in a consistent way
  2020-03-13 15:44 ` [PATCH 4/8] btrfs: free the reloc_control in a consistent way Josef Bacik
@ 2020-03-13 17:18   ` David Sterba
  2020-03-13 17:38     ` David Sterba
  0 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2020-03-13 17:18 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Mar 13, 2020 at 11:44:44AM -0400, Josef Bacik wrote:
> If we have an error while processing the reloc roots we could leak roots
> that were added to rc->reloc_roots before we hit the error.  We could
> have also not removed the reloct tree mapping from our rb_tree, so clean
> up any remaining nodes in the reloc root rb_tree.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/relocation.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index c496f8ed8c7e..721d049ff2b5 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4387,6 +4387,20 @@ static struct reloc_control *alloc_reloc_control(struct btrfs_fs_info *fs_info)
>  	return rc;
>  }
>  
> +static void free_reloc_control(struct reloc_control *rc)
> +{
> +	struct mapping_node *node, *tmp;
> +
> +	free_reloc_roots(&rc->reloc_roots);
> +	rbtree_postorder_for_each_entry_safe(node, tmp,
> +					     &rc->reloc_root_tree.rb_root,
> +					     rb_node) {
> +		rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);

The rb_erase is not needed here, the postorder traversal just goes over
all nodes and allows to free the containing structures together with the
rb_node. Dangling pointers are not an issue.

> +		kfree(node);
> +	}
> +	kfree(rc);
> +}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/8] btrfs: free the reloc_control in a consistent way
  2020-03-13 17:18   ` David Sterba
@ 2020-03-13 17:38     ` David Sterba
  2020-03-13 17:39       ` Josef Bacik
  0 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2020-03-13 17:38 UTC (permalink / raw)
  To: David Sterba; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Fri, Mar 13, 2020 at 06:18:51PM +0100, David Sterba wrote:
> On Fri, Mar 13, 2020 at 11:44:44AM -0400, Josef Bacik wrote:
> > If we have an error while processing the reloc roots we could leak roots
> > that were added to rc->reloc_roots before we hit the error.  We could
> > have also not removed the reloct tree mapping from our rb_tree, so clean
> > up any remaining nodes in the reloc root rb_tree.
> > 
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/relocation.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> > index c496f8ed8c7e..721d049ff2b5 100644
> > --- a/fs/btrfs/relocation.c
> > +++ b/fs/btrfs/relocation.c
> > @@ -4387,6 +4387,20 @@ static struct reloc_control *alloc_reloc_control(struct btrfs_fs_info *fs_info)
> >  	return rc;
> >  }
> >  
> > +static void free_reloc_control(struct reloc_control *rc)
> > +{
> > +	struct mapping_node *node, *tmp;
> > +
> > +	free_reloc_roots(&rc->reloc_roots);
> > +	rbtree_postorder_for_each_entry_safe(node, tmp,
> > +					     &rc->reloc_root_tree.rb_root,
> > +					     rb_node) {
> > +		rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
> 
> The rb_erase is not needed here, the postorder traversal just goes over
> all nodes and allows to free the containing structures together with the
> rb_node. Dangling pointers are not an issue.

I had not seen your reply when I replied to the v2 patch but if you
think the rb_erase is needed, I don't see why.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/8] btrfs: free the reloc_control in a consistent way
  2020-03-13 17:38     ` David Sterba
@ 2020-03-13 17:39       ` Josef Bacik
  0 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2020-03-13 17:39 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On 3/13/20 1:38 PM, David Sterba wrote:
> On Fri, Mar 13, 2020 at 06:18:51PM +0100, David Sterba wrote:
>> On Fri, Mar 13, 2020 at 11:44:44AM -0400, Josef Bacik wrote:
>>> If we have an error while processing the reloc roots we could leak roots
>>> that were added to rc->reloc_roots before we hit the error.  We could
>>> have also not removed the reloct tree mapping from our rb_tree, so clean
>>> up any remaining nodes in the reloc root rb_tree.
>>>
>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>> ---
>>>   fs/btrfs/relocation.c | 18 ++++++++++++++++--
>>>   1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>> index c496f8ed8c7e..721d049ff2b5 100644
>>> --- a/fs/btrfs/relocation.c
>>> +++ b/fs/btrfs/relocation.c
>>> @@ -4387,6 +4387,20 @@ static struct reloc_control *alloc_reloc_control(struct btrfs_fs_info *fs_info)
>>>   	return rc;
>>>   }
>>>   
>>> +static void free_reloc_control(struct reloc_control *rc)
>>> +{
>>> +	struct mapping_node *node, *tmp;
>>> +
>>> +	free_reloc_roots(&rc->reloc_roots);
>>> +	rbtree_postorder_for_each_entry_safe(node, tmp,
>>> +					     &rc->reloc_root_tree.rb_root,
>>> +					     rb_node) {
>>> +		rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
>>
>> The rb_erase is not needed here, the postorder traversal just goes over
>> all nodes and allows to free the containing structures together with the
>> rb_node. Dangling pointers are not an issue.
> 
> I had not seen your reply when I replied to the v2 patch but if you
> think the rb_erase is needed, I don't see why.
> 

Because I looked at it and thought it was needed and was confused and had to go 
look when you replied when you said it wasn't.  So it's needed for clarity sake ;).

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/8] btrfs: unset reloc control if we fail to recover
  2020-03-04 16:18 ` [PATCH 3/8] btrfs: unset reloc control if we fail to recover Josef Bacik
@ 2020-03-05 11:38   ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2020-03-05 11:38 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team


[-- Attachment #1.1: Type: text/plain, Size: 1940 bytes --]



On 2020/3/5 上午12:18, Josef Bacik wrote:
> If we fail to load an fs root, or fail to start a transaction we can
> bail without unsetting the reloc control, which leads to problems later
> when we free the reloc control but still have it attached to the file
> system.
> 
> In the normal path we'll end up calling unset_reloc_control() twice, but
> all it does is set fs_info->reloc_control = NULL, and we can only have
> one balance at a time so it's not racey.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>  fs/btrfs/relocation.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 2141519a9dd0..c496f8ed8c7e 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4657,9 +4657,8 @@ int btrfs_recover_relocation(struct btrfs_root *root)
>  
>  	trans = btrfs_join_transaction(rc->extent_root);
>  	if (IS_ERR(trans)) {
> -		unset_reloc_control(rc);
>  		err = PTR_ERR(trans);
> -		goto out_free;
> +		goto out_unset;
>  	}
>  
>  	rc->merge_reloc_tree = 1;
> @@ -4679,7 +4678,7 @@ int btrfs_recover_relocation(struct btrfs_root *root)
>  		if (IS_ERR(fs_root)) {
>  			err = PTR_ERR(fs_root);
>  			list_add_tail(&reloc_root->root_list, &reloc_roots);
> -			goto out_free;
> +			goto out_unset;
>  		}
>  
>  		err = __add_reloc_root(reloc_root);
> @@ -4690,7 +4689,7 @@ int btrfs_recover_relocation(struct btrfs_root *root)
>  
>  	err = btrfs_commit_transaction(trans);
>  	if (err)
> -		goto out_free;
> +		goto out_unset;
>  
>  	merge_reloc_roots(rc);
>  
> @@ -4706,6 +4705,8 @@ int btrfs_recover_relocation(struct btrfs_root *root)
>  	ret = clean_dirty_subvols(rc);
>  	if (ret < 0 && !err)
>  		err = ret;
> +out_unset:
> +	unset_reloc_control(rc);
>  out_free:
>  	kfree(rc);
>  out:
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 3/8] btrfs: unset reloc control if we fail to recover
  2020-03-04 16:18 [PATCH 0/8][v2] relocation error handling fixes Josef Bacik
@ 2020-03-04 16:18 ` Josef Bacik
  2020-03-05 11:38   ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2020-03-04 16:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

If we fail to load an fs root, or fail to start a transaction we can
bail without unsetting the reloc control, which leads to problems later
when we free the reloc control but still have it attached to the file
system.

In the normal path we'll end up calling unset_reloc_control() twice, but
all it does is set fs_info->reloc_control = NULL, and we can only have
one balance at a time so it's not racey.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 2141519a9dd0..c496f8ed8c7e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4657,9 +4657,8 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 
 	trans = btrfs_join_transaction(rc->extent_root);
 	if (IS_ERR(trans)) {
-		unset_reloc_control(rc);
 		err = PTR_ERR(trans);
-		goto out_free;
+		goto out_unset;
 	}
 
 	rc->merge_reloc_tree = 1;
@@ -4679,7 +4678,7 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 		if (IS_ERR(fs_root)) {
 			err = PTR_ERR(fs_root);
 			list_add_tail(&reloc_root->root_list, &reloc_roots);
-			goto out_free;
+			goto out_unset;
 		}
 
 		err = __add_reloc_root(reloc_root);
@@ -4690,7 +4689,7 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 
 	err = btrfs_commit_transaction(trans);
 	if (err)
-		goto out_free;
+		goto out_unset;
 
 	merge_reloc_roots(rc);
 
@@ -4706,6 +4705,8 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 	ret = clean_dirty_subvols(rc);
 	if (ret < 0 && !err)
 		err = ret;
+out_unset:
+	unset_reloc_control(rc);
 out_free:
 	kfree(rc);
 out:
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-03-13 17:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13 15:44 [PATCH 0/8][v3] relocation error handling fixes Josef Bacik
2020-03-13 15:44 ` [PATCH 1/8] btrfs: drop block from cache on error in relocation Josef Bacik
2020-03-13 15:44 ` [PATCH 2/8] btrfs: do not init a reloc root if we aren't relocating Josef Bacik
2020-03-13 15:44 ` [PATCH 3/8] btrfs: unset reloc control if we fail to recover Josef Bacik
2020-03-13 15:44 ` [PATCH 4/8] btrfs: free the reloc_control in a consistent way Josef Bacik
2020-03-13 17:18   ` David Sterba
2020-03-13 17:38     ` David Sterba
2020-03-13 17:39       ` Josef Bacik
2020-03-13 15:44 ` [PATCH 5/8] btrfs: run clean_dirty_subvols if we fail to start a trans Josef Bacik
2020-03-13 15:44 ` [PATCH 6/8] btrfs: clear BTRFS_ROOT_DEAD_RELOC_TREE before dropping the reloc root Josef Bacik
2020-03-13 15:44 ` [PATCH 7/8] btrfs: hold a ref on the root->reloc_root Josef Bacik
2020-03-13 15:44 ` [PATCH 8/8] btrfs: remove a BUG_ON() from merge_reloc_roots() Josef Bacik
  -- strict thread matches above, loose matches on Subject: below --
2020-03-04 16:18 [PATCH 0/8][v2] relocation error handling fixes Josef Bacik
2020-03-04 16:18 ` [PATCH 3/8] btrfs: unset reloc control if we fail to recover Josef Bacik
2020-03-05 11:38   ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).