All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 00/38] Cleanup error handling in relocation
@ 2020-12-16 16:26 Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 01/38] btrfs: convert BUG_ON()'s in relocate_tree_block Josef Bacik
                   ` (38 more replies)
  0 siblings, 39 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v6->v7:
- Broke up the series into 3 series, 1 for cosmetic things, 1 for all the major
  issues (including those reported on v6 of this set), and this new set which is
  solely the error handling related patches for relocation.  It's still a lot of
  patches, sorry about that.

v5->v6:
- Reworked "btrfs: handle errors from select_reloc_root()" because Zygo reported
  hitting an ASSERT(ret != -ENOENT) during his testing.  This was because I
  changed select_reloc_root() to return -ENOENT if we happened to race with
  somebody else who failed to init the reloc root, however we had an ASSERT() to
  check for this because it indicated corruption.  I modified that patch to move
  the ASSERT() to where the problem actually is, so select_reloc_root() can
  return whatever error and it'll pass it along.  I also removed Qu's
  reviewed-by for the patch because of the change.

v4->v5:
- Dropped "btrfs: fix error handling in commit_fs_roots" as it was merged.
- Fixed an ASSERT() that happened during relocation recovery that Zygo reported,
  I moved the error condition out of another condition which broke recovery if
  we had deleted subvols pending with relocation.

v3->v4:
- Squashed the __add_reloc_root error handling patches in
  btrfs_recover_relocation as they were small and in the same function.
- Squashed the record_root_in_trans failure handling patches for
  select_reloc_root as they were small and in the same function.
- Added a new patch to address an existing error handling problem with subvol
  creation.
- Fixed up the various cases that Qu noticed where I got things wrong, cleaning
  up a leaked root extent ref, a leaked inode item, and where I accidentally
  stopped dealing with errors from btrfs_drop_subtree.
- Reworked a bunch of the ASSERT()'s to do ASSERT(0) in their respective if
  statements.
- Added reviewed-bys.

v2->v3:
- A lot of extra patches fixing various things that I encountered while
  debugging the corruption problem that was uncovered by these patches.
- Fixed the panic that Zygo was seeing and other issues.
- Fixed up the comments from Nikolay and Filipe.

A slight note, the first set of patches could probably be taken now, and in fact

  btrfs: fix error handling in commit_fs_roots

Was sent earlier this week and is very important and needs to be reviewed and
merged ASAP.  The following are safe and could be merged outside of the rest of
this series

  btrfs: allow error injection for btrfs_search_slot and btrfs_cow_block
  btrfs: fix lockdep splat in btrfs_recover_relocation
  btrfs: keep track of the root owner for relocation reads
  btrfs: noinline btrfs_should_cancel_balance
  btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
  btrfs: pass down the tree block level through ref-verify
  btrfs: make sure owner is set in ref-verify
  btrfs: don't clear ret in btrfs_start_dirty_block_groups

The rest obviously are all around the actual error handling.

v1->v2:
- fixed a bug where I accidentally dropped reading flags in relocate_block_group
  when I dropped the extra checks that we handle in the tree checker.

--- Original message ---
Hello,

Relocation is the last place that is not able to handle errors at all, which
results in all sorts of lovely panics if you encounter corruptions or IO errors.
I'm going to start cleaning up relocation, but before I move code around I want
the error handling to be somewhat sane, so I'm not changing behavior and error
handling at the same time.

These patches are purely about error handling, there is no behavior changing
other than returning errors up the chain properly.  There is a lot of room for
follow up cleanups, which will happen next.  However I wanted to get this series
done today and out so we could get it merged ASAP, and then the follow up
cleanups can happen later as they are less important and less critical.

The only exception to the above is the patch to add the error injection sites
for btrfs_cow_block and btrfs_search_slot, and a lockdep fix that I discovered
while running my tests, those are the first two patches in the series.

I tested this with my error injection stress test, where I keep track of all
stack traces that have been tested and only inject errors when we have a new
stack trace, which means I should have covered all of the various error
conditions.  With this patchset I'm no longer panicing while stressing the error
conditions.  Thanks,

Josef

Josef Bacik (38):
  btrfs: convert BUG_ON()'s in relocate_tree_block
  btrfs: return an error from btrfs_record_root_in_trans
  btrfs: handle errors from select_reloc_root()
  btrfs: convert BUG_ON()'s in select_reloc_root() to proper errors
  btrfs: check record_root_in_trans related failures in
    select_reloc_root
  btrfs: do proper error handling in record_reloc_root_in_trans
  btrfs: handle btrfs_record_root_in_trans failure in
    btrfs_rename_exchange
  btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename
  btrfs: handle btrfs_record_root_in_trans failure in
    btrfs_delete_subvolume
  btrfs: handle btrfs_record_root_in_trans failure in
    btrfs_recover_log_trees
  btrfs: handle btrfs_record_root_in_trans failure in create_subvol
  btrfs: btrfs: handle btrfs_record_root_in_trans failure in
    relocate_tree_block
  btrfs: handle btrfs_record_root_in_trans failure in start_transaction
  btrfs: handle record_root_in_trans failure in qgroup_account_snapshot
  btrfs: handle record_root_in_trans failure in
    btrfs_record_root_in_trans
  btrfs: handle record_root_in_trans failure in create_pending_snapshot
  btrfs: do not panic in __add_reloc_root
  btrfs: have proper error handling in btrfs_init_reloc_root
  btrfs: do proper error handling in create_reloc_root
  btrfs: validate ->reloc_root after recording root in trans
  btrfs: handle btrfs_update_reloc_root failure in commit_fs_roots
  btrfs: change insert_dirty_subvol to return errors
  btrfs: handle btrfs_update_reloc_root failure in insert_dirty_subvol
  btrfs: handle btrfs_update_reloc_root failure in prepare_to_merge
  btrfs: do proper error handling in btrfs_update_reloc_root
  btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s
  btrfs: handle btrfs_cow_block errors in replace_path
  btrfs: handle btrfs_search_slot failure in replace_path
  btrfs: handle errors in reference count manipulation in replace_path
  btrfs: handle extent reference errors in do_relocation
  btrfs: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly
  btrfs: remove the extent item sanity checks in relocate_block_group
  btrfs: do proper error handling in create_reloc_inode
  btrfs: handle __add_reloc_root failures in btrfs_recover_relocation
  btrfs: cleanup error handling in prepare_to_merge
  btrfs: handle extent corruption with select_one_root properly
  btrfs: do proper error handling in merge_reloc_roots
  btrfs: check return value of btrfs_commit_transaction in relocation

 fs/btrfs/inode.c        |  20 +-
 fs/btrfs/ioctl.c        |   7 +-
 fs/btrfs/relocation.c   | 414 +++++++++++++++++++++++++++++++---------
 fs/btrfs/transaction.c  |  37 ++--
 fs/btrfs/tree-checker.c |   5 +
 fs/btrfs/tree-log.c     |   8 +-
 6 files changed, 382 insertions(+), 109 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v7 01/38] btrfs: convert BUG_ON()'s in relocate_tree_block
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 02/38] btrfs: return an error from btrfs_record_root_in_trans Josef Bacik
                   ` (37 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

We have a couple of BUG_ON()'s in relocate_tree_block() that can be
tripped if we have file system corruption.  Convert these to ASSERT()'s
so developers still get yelled at when they break the backref code, but
error out nicely for users so the whole box doesn't go down.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 8cb9a7b364d8..08ffaa93b78f 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2454,8 +2454,28 @@ static int relocate_tree_block(struct btrfs_trans_handle *trans,
 
 	if (root) {
 		if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
-			BUG_ON(node->new_bytenr);
-			BUG_ON(!list_empty(&node->list));
+			/*
+			 * This block was the root block of a root, and this is
+			 * the first time we're processing the block and thus it
+			 * should not have had the ->new_bytenr modified and
+			 * should have not been included on the changed list.
+			 *
+			 * However in the case of corruption we could have
+			 * multiple refs pointing to the same block improperly,
+			 * and thus we would trip over these checks.  ASSERT()
+			 * for the developer case, because it could indicate a
+			 * bug in the backref code, however error out for a
+			 * normal user in the case of corruption.
+			 */
+			ASSERT(node->new_bytenr == 0);
+			ASSERT(list_empty(&node->list));
+			if (node->new_bytenr || !list_empty(&node->list)) {
+				btrfs_err(root->fs_info,
+				  "bytenr %llu has improper references to it",
+					  node->bytenr);
+				ret = -EUCLEAN;
+				goto out;
+			}
 			btrfs_record_root_in_trans(trans, root);
 			root = root->reloc_root;
 			node->new_bytenr = root->node->start;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 02/38] btrfs: return an error from btrfs_record_root_in_trans
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 01/38] btrfs: convert BUG_ON()'s in relocate_tree_block Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2021-02-26 19:03   ` David Sterba
  2020-12-16 16:26 ` [PATCH v7 03/38] btrfs: handle errors from select_reloc_root() Josef Bacik
                   ` (36 subsequent siblings)
  38 siblings, 1 reply; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo, Johannes Thumshirn

We can create a reloc root when we record the root in the trans, which
can fail for all sorts of different reasons.  Propagate this error up
the chain of callers.  Future patches will fix the callers of
btrfs_record_root_in_trans() to handle the error.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f51f9e39bcee..eba48578159e 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -400,6 +400,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans,
 			       int force)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
+	int ret = 0;
 
 	if ((test_bit(BTRFS_ROOT_SHAREABLE, &root->state) &&
 	    root->last_trans < trans->transid) || force) {
@@ -448,11 +449,11 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans,
 		 * lock.  smp_wmb() makes sure that all the writes above are
 		 * done before we pop in the zero below
 		 */
-		btrfs_init_reloc_root(trans, root);
+		ret = btrfs_init_reloc_root(trans, root);
 		smp_mb__before_atomic();
 		clear_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state);
 	}
-	return 0;
+	return ret;
 }
 
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 03/38] btrfs: handle errors from select_reloc_root()
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 01/38] btrfs: convert BUG_ON()'s in relocate_tree_block Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 02/38] btrfs: return an error from btrfs_record_root_in_trans Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2021-02-26 18:30   ` David Sterba
  2020-12-16 16:26 ` [PATCH v7 04/38] btrfs: convert BUG_ON()'s in select_reloc_root() to proper errors Josef Bacik
                   ` (35 subsequent siblings)
  38 siblings, 1 reply; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Currently select_reloc_root() doesn't return an error, but followup
patches will make it possible for it to return an error.  We do have
proper error recovery in do_relocation however, so handle the
possibility of select_reloc_root() having an error properly instead of
BUG_ON(!root).  I've also adjusted select_reloc_root() to return
ERR_PTR(-ENOENT) if we don't find a root, instead of NULL, to make the
error case easier to deal with.  I've replaced the BUG_ON(!root) with an
ASSERT(0) for this case as it indicates we messed up the backref walking
code, but it could also indicate corruption.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 08ffaa93b78f..741068580455 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2024,8 +2024,14 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
 		if (!next || next->level <= node->level)
 			break;
 	}
-	if (!root)
-		return NULL;
+	if (!root) {
+		/*
+		 * This can happen if there's fs corruption or if there's a bug
+		 * in the backref lookup code.
+		 */
+		ASSERT(0);
+		return ERR_PTR(-ENOENT);
+	}
 
 	next = node;
 	/* setup backref node path for btrfs_reloc_cow_block */
@@ -2196,7 +2202,10 @@ static int do_relocation(struct btrfs_trans_handle *trans,
 
 		upper = edge->node[UPPER];
 		root = select_reloc_root(trans, rc, upper, edges);
-		BUG_ON(!root);
+		if (IS_ERR(root)) {
+			ret = PTR_ERR(root);
+			goto next;
+		}
 
 		if (upper->eb && !upper->locked) {
 			if (!lowest) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 04/38] btrfs: convert BUG_ON()'s in select_reloc_root() to proper errors
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (2 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 03/38] btrfs: handle errors from select_reloc_root() Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 05/38] btrfs: check record_root_in_trans related failures in select_reloc_root Josef Bacik
                   ` (34 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We have several BUG_ON()'s in select_reloc_root() that can be tripped if
you have extent tree corruption.  Convert these to ASSERT()'s, because
if we hit it during testing it really is bad, or could indicate a
problem with the backref walking code.

However if users hit these problems it generally indicates corruption,
I've hit a few machines in the fleet that trip over these with clearly
corrupted extent trees, so be nice and spit out an error message and
return an error instead of bringing the whole box down.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 47 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 741068580455..2a0f3c0dbbc0 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1994,8 +1994,33 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
 		cond_resched();
 		next = walk_up_backref(next, edges, &index);
 		root = next->root;
-		BUG_ON(!root);
-		BUG_ON(!test_bit(BTRFS_ROOT_SHAREABLE, &root->state));
+
+		/*
+		 * If there is no root, then our references for this block are
+		 * incomplete, as we should be able to walk all the way up to a
+		 * block that is owned by a root.
+		 *
+		 * This path is only for SHAREABLE roots, so if we come upon a
+		 * non-SHAREABLE root then we have backrefs that resolve
+		 * improperly.
+		 *
+		 * Both of these cases indicate file system corruption, or a bug
+		 * in the backref walking code.
+		 */
+		if (!root) {
+			ASSERT(0);
+			btrfs_err(trans->fs_info,
+		"bytenr %llu doesn't have a backref path ending in a root",
+				  node->bytenr);
+			return ERR_PTR(-EUCLEAN);
+		}
+		if (!test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
+			ASSERT(0);
+			btrfs_err(trans->fs_info,
+"bytenr %llu has multiple refs with one ending in a non shareable root",
+				  node->bytenr);
+			return ERR_PTR(-EUCLEAN);
+		}
 
 		if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) {
 			record_reloc_root_in_trans(trans, root);
@@ -2006,8 +2031,22 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
 		root = root->reloc_root;
 
 		if (next->new_bytenr != root->node->start) {
-			BUG_ON(next->new_bytenr);
-			BUG_ON(!list_empty(&next->list));
+			/*
+			 * We just created the reloc root, so we shouldn't have
+			 * ->new_bytenr set and this shouldn't be in the changed
+			 *  list.  If it is then we have multiple roots pointing
+			 *  at the same bytenr which indicates corruption, or
+			 *  we've made a mistake in the backref walking code.
+			 */
+			ASSERT(next->new_bytenr == 0);
+			ASSERT(list_empty(&next->list));
+			if (next->new_bytenr || !list_empty(&next->list)) {
+				btrfs_err(trans->fs_info,
+"bytenr %llu possibly has multiple roots pointing at the same bytenr %llu",
+					  node->bytenr, next->bytenr);
+				return ERR_PTR(-EUCLEAN);
+			}
+
 			next->new_bytenr = root->node->start;
 			btrfs_put_root(next->root);
 			next->root = btrfs_grab_root(root);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 05/38] btrfs: check record_root_in_trans related failures in select_reloc_root
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (3 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 04/38] btrfs: convert BUG_ON()'s in select_reloc_root() to proper errors Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 06/38] btrfs: do proper error handling in record_reloc_root_in_trans Josef Bacik
                   ` (33 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

We will record the fs root or the reloc root in the trans in
select_reloc_root.  These will actually return errors in the following
patches, so check their return value here and return it up the stack.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 2a0f3c0dbbc0..1c36b10fdd02 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1988,6 +1988,7 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
 	struct btrfs_backref_node *next;
 	struct btrfs_root *root;
 	int index = 0;
+	int ret;
 
 	next = node;
 	while (1) {
@@ -2023,11 +2024,15 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
 		}
 
 		if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) {
-			record_reloc_root_in_trans(trans, root);
+			ret = record_reloc_root_in_trans(trans, root);
+			if (ret)
+				return ERR_PTR(ret);
 			break;
 		}
 
-		btrfs_record_root_in_trans(trans, root);
+		ret = btrfs_record_root_in_trans(trans, root);
+		if (ret)
+			return ERR_PTR(ret);
 		root = root->reloc_root;
 
 		if (next->new_bytenr != root->node->start) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 06/38] btrfs: do proper error handling in record_reloc_root_in_trans
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (4 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 05/38] btrfs: check record_root_in_trans related failures in select_reloc_root Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 07/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename_exchange Josef Bacik
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Generally speaking this shouldn't ever fail, the corresponding fs root
for the reloc root will already be in memory, so we won't get -ENOMEM
here.

However if there is no corresponding root for the reloc root then we
could get -ENOMEM when we try to allocate it or we could get -ENOENT
when we look it up and see that it doesn't exist.

Convert these BUG_ON()'s into ASSERT()'s + proper error handling for the
case of corruption.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 1c36b10fdd02..0eea27d9e3cf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1971,8 +1971,27 @@ static int record_reloc_root_in_trans(struct btrfs_trans_handle *trans,
 		return 0;
 
 	root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset, false);
-	BUG_ON(IS_ERR(root));
-	BUG_ON(root->reloc_root != reloc_root);
+
+	/*
+	 * This should succeed, since we can't have a reloc root without having
+	 * already looked up the actual root and created the reloc root for this
+	 * root.
+	 *
+	 * However if there's some sort of corruption where we have a ref to a
+	 * reloc root without a corresponding root this could return -ENOENT.
+	 */
+	if (IS_ERR(root)) {
+		ASSERT(0);
+		return PTR_ERR(root);
+	}
+	if (root->reloc_root != reloc_root) {
+		ASSERT(0);
+		btrfs_err(fs_info,
+			  "root %llu has two reloc roots associated with it",
+			  reloc_root->root_key.offset);
+		btrfs_put_root(root);
+		return -EUCLEAN;
+	}
 	ret = btrfs_record_root_in_trans(trans, root);
 	btrfs_put_root(root);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 07/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename_exchange
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (5 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 06/38] btrfs: do proper error handling in record_reloc_root_in_trans Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 08/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename Josef Bacik
                   ` (31 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in btrfs_rename_exchange.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 070716650df8..2f8bb8405ac6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8897,8 +8897,11 @@ static int btrfs_rename_exchange(struct inode *old_dir,
 		goto out_notrans;
 	}
 
-	if (dest != root)
-		btrfs_record_root_in_trans(trans, dest);
+	if (dest != root) {
+		ret = btrfs_record_root_in_trans(trans, dest);
+		if (ret)
+			goto out_fail;
+	}
 
 	/*
 	 * We need to find a free sequence number both in the source and
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 08/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (6 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 07/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename_exchange Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 09/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_delete_subvolume Josef Bacik
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in btrfs_rename.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2f8bb8405ac6..bcbae8b460c0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9205,8 +9205,11 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 		goto out_notrans;
 	}
 
-	if (dest != root)
-		btrfs_record_root_in_trans(trans, dest);
+	if (dest != root) {
+		ret = btrfs_record_root_in_trans(trans, dest);
+		if (ret)
+			goto out_fail;
+	}
 
 	ret = btrfs_set_inode_index(BTRFS_I(new_dir), &index);
 	if (ret)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 09/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_delete_subvolume
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (7 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 08/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 10/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_recover_log_trees Josef Bacik
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in btrfs_delete_subvolume.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/inode.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bcbae8b460c0..7a8d89e1b73f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4175,7 +4175,11 @@ int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
 		goto out_end_trans;
 	}
 
-	btrfs_record_root_in_trans(trans, dest);
+	ret = btrfs_record_root_in_trans(trans, dest);
+	if (ret) {
+		btrfs_abort_transaction(trans, ret);
+		goto out_end_trans;
+	}
 
 	memset(&dest->root_item.drop_progress, 0,
 		sizeof(dest->root_item.drop_progress));
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 10/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_recover_log_trees
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (8 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 09/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_delete_subvolume Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 11/38] btrfs: handle btrfs_record_root_in_trans failure in create_subvol Josef Bacik
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in btrfs_recover_log_trees.

This appears tricky, however we have a reference count on the
destination root, so if this fails we need to continue on in the loop to
make sure the properly cleanup is done.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/tree-log.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 254c2ee43aae..77adeb3c988d 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -6286,8 +6286,12 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
 		}
 
 		wc.replay_dest->log_root = log;
-		btrfs_record_root_in_trans(trans, wc.replay_dest);
-		ret = walk_log_tree(trans, log, &wc);
+		ret = btrfs_record_root_in_trans(trans, wc.replay_dest);
+		if (ret)
+			btrfs_handle_fs_error(fs_info, ret,
+				"Couldn't record the root in the transaction.");
+		else
+			ret = walk_log_tree(trans, log, &wc);
 
 		if (!ret && wc.stage == LOG_WALK_REPLAY_ALL) {
 			ret = fixup_inode_link_counts(trans, wc.replay_dest,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 11/38] btrfs: handle btrfs_record_root_in_trans failure in create_subvol
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (9 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 10/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_recover_log_trees Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 12/38] btrfs: btrfs: handle btrfs_record_root_in_trans failure in relocate_tree_block Josef Bacik
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in create_subvol.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ioctl.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index af8d01659562..2d0782f7e0e0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -714,7 +714,12 @@ static noinline int create_subvol(struct inode *dir,
 	/* Freeing will be done in btrfs_put_root() of new_root */
 	anon_dev = 0;
 
-	btrfs_record_root_in_trans(trans, new_root);
+	ret = btrfs_record_root_in_trans(trans, new_root);
+	if (ret) {
+		btrfs_put_root(new_root);
+		btrfs_abort_transaction(trans, ret);
+		goto fail;
+	}
 
 	ret = btrfs_create_subvol_root(trans, new_root, root, new_dirid);
 	if (!ret) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 12/38] btrfs: btrfs: handle btrfs_record_root_in_trans failure in relocate_tree_block
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (10 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 11/38] btrfs: handle btrfs_record_root_in_trans failure in create_subvol Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 13/38] btrfs: handle btrfs_record_root_in_trans failure in start_transaction Josef Bacik
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in relocate_tree_block.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0eea27d9e3cf..7e3305aca6ac 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2548,7 +2548,9 @@ static int relocate_tree_block(struct btrfs_trans_handle *trans,
 				ret = -EUCLEAN;
 				goto out;
 			}
-			btrfs_record_root_in_trans(trans, root);
+			ret = btrfs_record_root_in_trans(trans, root);
+			if (ret)
+				goto out;
 			root = root->reloc_root;
 			node->new_bytenr = root->node->start;
 			btrfs_put_root(node->root);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 13/38] btrfs: handle btrfs_record_root_in_trans failure in start_transaction
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (11 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 12/38] btrfs: btrfs: handle btrfs_record_root_in_trans failure in relocate_tree_block Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 14/38] btrfs: handle record_root_in_trans failure in qgroup_account_snapshot Josef Bacik
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

btrfs_record_root_in_trans will return errors in the future, so handle
the error properly in start_transaction.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index eba48578159e..307a73abe86d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -734,7 +734,11 @@ start_transaction(struct btrfs_root *root, unsigned int num_items,
 	 * Thus it need to be called after current->journal_info initialized,
 	 * or we can deadlock.
 	 */
-	btrfs_record_root_in_trans(h, root);
+	ret = btrfs_record_root_in_trans(h, root);
+	if (ret) {
+		btrfs_end_transaction(h);
+		return ERR_PTR(ret);
+	}
 
 	return h;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 14/38] btrfs: handle record_root_in_trans failure in qgroup_account_snapshot
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (12 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 13/38] btrfs: handle btrfs_record_root_in_trans failure in start_transaction Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 15/38] btrfs: handle record_root_in_trans failure in btrfs_record_root_in_trans Josef Bacik
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

record_root_in_trans can fail currently, so handle this failure
properly.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 307a73abe86d..5b3008ec4e13 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1436,7 +1436,9 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 	 * recorded root will never be updated again, causing an outdated root
 	 * item.
 	 */
-	record_root_in_trans(trans, src, 1);
+	ret = record_root_in_trans(trans, src, 1);
+	if (ret)
+		return ret;
 
 	/*
 	 * We are going to commit transaction, see btrfs_commit_transaction()
@@ -1488,7 +1490,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 	 * insert_dir_item()
 	 */
 	if (!ret)
-		record_root_in_trans(trans, parent, 1);
+		ret = record_root_in_trans(trans, parent, 1);
 	return ret;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 15/38] btrfs: handle record_root_in_trans failure in btrfs_record_root_in_trans
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (13 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 14/38] btrfs: handle record_root_in_trans failure in qgroup_account_snapshot Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 16/38] btrfs: handle record_root_in_trans failure in create_pending_snapshot Josef Bacik
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

record_root_in_trans can fail currently, handle this failure properly.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 5b3008ec4e13..4efdb87df27d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -480,6 +480,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans,
 			       struct btrfs_root *root)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
+	int ret;
 
 	if (!test_bit(BTRFS_ROOT_SHAREABLE, &root->state))
 		return 0;
@@ -494,10 +495,10 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans,
 		return 0;
 
 	mutex_lock(&fs_info->reloc_mutex);
-	record_root_in_trans(trans, root, 0);
+	ret = record_root_in_trans(trans, root, 0);
 	mutex_unlock(&fs_info->reloc_mutex);
 
-	return 0;
+	return ret;
 }
 
 static inline int is_transaction_blocked(struct btrfs_transaction *trans)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 16/38] btrfs: handle record_root_in_trans failure in create_pending_snapshot
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (14 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 15/38] btrfs: handle record_root_in_trans failure in btrfs_record_root_in_trans Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 17/38] btrfs: do not panic in __add_reloc_root Josef Bacik
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

record_root_in_trans can currently fail, so handle this failure
properly.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4efdb87df27d..0663675354a2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1568,8 +1568,9 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	dentry = pending->dentry;
 	parent_inode = pending->dir;
 	parent_root = BTRFS_I(parent_inode)->root;
-	record_root_in_trans(trans, parent_root, 0);
-
+	ret = record_root_in_trans(trans, parent_root, 0);
+	if (ret)
+		goto fail;
 	cur_time = current_time(parent_inode);
 
 	/*
@@ -1605,7 +1606,11 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 		goto fail;
 	}
 
-	record_root_in_trans(trans, root, 0);
+	ret = record_root_in_trans(trans, root, 0);
+	if (ret) {
+		btrfs_abort_transaction(trans, ret);
+		goto fail;
+	}
 	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
 	memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
 	btrfs_check_and_init_root_item(new_root_item);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 17/38] btrfs: do not panic in __add_reloc_root
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (15 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 16/38] btrfs: handle record_root_in_trans failure in create_pending_snapshot Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 18/38] btrfs: have proper error handling in btrfs_init_reloc_root Josef Bacik
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

If we have a duplicate entry for a reloc root then we could have fs
corruption that resulted in a double allocation.  This shouldn't happen
generally so leave an ASSERT() for this case, but return an error
instead of panicing in the normal user case.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 7e3305aca6ac..410e779af1ed 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -638,9 +638,11 @@ static int __must_check __add_reloc_root(struct btrfs_root *root)
 				   node->bytenr, &node->rb_node);
 	spin_unlock(&rc->reloc_root_tree.lock);
 	if (rb_node) {
-		btrfs_panic(fs_info, -EEXIST,
+		ASSERT(0);
+		btrfs_err(fs_info,
 			    "Duplicate root found for start=%llu while inserting into relocation tree",
 			    node->bytenr);
+		return -EEXIST;
 	}
 
 	list_add_tail(&root->root_list, &rc->reloc_roots);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 18/38] btrfs: have proper error handling in btrfs_init_reloc_root
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (16 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 17/38] btrfs: do not panic in __add_reloc_root Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 19/38] btrfs: do proper error handling in create_reloc_root Josef Bacik
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

create_reloc_root will return errors in the future, and __add_reloc_root
can return -ENOMEM or -EEXIST, so handle these errors properly.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 410e779af1ed..126655b199df 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -858,9 +858,14 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
 	reloc_root = create_reloc_root(trans, root, root->root_key.objectid);
 	if (clear_rsv)
 		trans->block_rsv = rsv;
+	if (IS_ERR(reloc_root))
+		return PTR_ERR(reloc_root);
 
 	ret = __add_reloc_root(reloc_root);
-	BUG_ON(ret < 0);
+	if (ret) {
+		btrfs_put_root(reloc_root);
+		return ret;
+	}
 	root->reloc_root = btrfs_grab_root(reloc_root);
 	return 0;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 19/38] btrfs: do proper error handling in create_reloc_root
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (17 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 18/38] btrfs: have proper error handling in btrfs_init_reloc_root Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 20/38] btrfs: validate ->reloc_root after recording root in trans Josef Bacik
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We do memory allocations here, read blocks from disk, all sorts of
operations that could easily fail at any given point.  Instead of
panicing the box, simply return the error back up the chain, all callers
at this point have proper error handling.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 126655b199df..d5740acaaddf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -735,10 +735,12 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,
 	struct extent_buffer *eb;
 	struct btrfs_root_item *root_item;
 	struct btrfs_key root_key;
-	int ret;
+	int ret = 0;
+	bool must_abort = false;
 
 	root_item = kmalloc(sizeof(*root_item), GFP_NOFS);
-	BUG_ON(!root_item);
+	if (!root_item)
+		return ERR_PTR(-ENOMEM);
 
 	root_key.objectid = BTRFS_TREE_RELOC_OBJECTID;
 	root_key.type = BTRFS_ROOT_ITEM_KEY;
@@ -750,7 +752,9 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,
 		/* called by btrfs_init_reloc_root */
 		ret = btrfs_copy_root(trans, root, root->commit_root, &eb,
 				      BTRFS_TREE_RELOC_OBJECTID);
-		BUG_ON(ret);
+		if (ret)
+			goto fail;
+
 		/*
 		 * Set the last_snapshot field to the generation of the commit
 		 * root - like this ctree.c:btrfs_block_can_be_shared() behaves
@@ -771,9 +775,16 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,
 		 */
 		ret = btrfs_copy_root(trans, root, root->node, &eb,
 				      BTRFS_TREE_RELOC_OBJECTID);
-		BUG_ON(ret);
+		if (ret)
+			goto fail;
 	}
 
+	/*
+	 * We have changed references at this point, we must abort the
+	 * transaction if anything fails.
+	 */
+	must_abort = true;
+
 	memcpy(root_item, &root->root_item, sizeof(*root_item));
 	btrfs_set_root_bytenr(root_item, eb->start);
 	btrfs_set_root_level(root_item, btrfs_header_level(eb));
@@ -791,14 +802,25 @@ static struct btrfs_root *create_reloc_root(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_insert_root(trans, fs_info->tree_root,
 				&root_key, root_item);
-	BUG_ON(ret);
+	if (ret)
+		goto fail;
+
 	kfree(root_item);
 
 	reloc_root = btrfs_read_tree_root(fs_info->tree_root, &root_key);
-	BUG_ON(IS_ERR(reloc_root));
+	if (IS_ERR(reloc_root)) {
+		ret = PTR_ERR(reloc_root);
+		goto abort;
+	}
 	set_bit(BTRFS_ROOT_SHAREABLE, &reloc_root->state);
 	reloc_root->last_trans = trans->transid;
 	return reloc_root;
+fail:
+	kfree(root_item);
+abort:
+	if (must_abort)
+		btrfs_abort_transaction(trans, ret);
+	return ERR_PTR(ret);
 }
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 20/38] btrfs: validate ->reloc_root after recording root in trans
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (18 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 19/38] btrfs: do proper error handling in create_reloc_root Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 21/38] btrfs: handle btrfs_update_reloc_root failure in commit_fs_roots Josef Bacik
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Zygo Blaxell

If we fail to setup a ->reloc_root in a different thread that path will
error out, however it still leaves root->reloc_root NULL but would still
appear set up in the transaction.  Subsequent calls to
btrfs_record_root_in_transaction would succeed without attempting to
create the reloc root, as the transid has already been update.  Handle
this case by making sure we have a root->reloc_root set after a
btrfs_record_root_in_transaction call so we don't end up deref'ing a
NULL pointer.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d5740acaaddf..918fee55bc30 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2083,6 +2083,13 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
 			return ERR_PTR(ret);
 		root = root->reloc_root;
 
+		/*
+		 * We could have raced with another thread which failed, so
+		 * ->reloc_root may not be set, return -ENOENT in this case.
+		 */
+		if (!root)
+			return ERR_PTR(-ENOENT);
+
 		if (next->new_bytenr != root->node->start) {
 			/*
 			 * We just created the reloc root, so we shouldn't have
@@ -2580,6 +2587,14 @@ static int relocate_tree_block(struct btrfs_trans_handle *trans,
 			ret = btrfs_record_root_in_trans(trans, root);
 			if (ret)
 				goto out;
+			/*
+			 * Another thread could have failed, need to check if we
+			 * have ->reloc_root actually set.
+			 */
+			if (!root->reloc_root) {
+				ret = -ENOENT;
+				goto out;
+			}
 			root = root->reloc_root;
 			node->new_bytenr = root->node->start;
 			btrfs_put_root(node->root);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 21/38] btrfs: handle btrfs_update_reloc_root failure in commit_fs_roots
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (19 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 20/38] btrfs: validate ->reloc_root after recording root in trans Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 22/38] btrfs: change insert_dirty_subvol to return errors Josef Bacik
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

btrfs_update_reloc_root will will return errors in the future, so handle
the error properly in commit_fs_roots.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/transaction.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 0663675354a2..1e1587290955 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1344,7 +1344,9 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
 			spin_unlock(&fs_info->fs_roots_radix_lock);
 
 			btrfs_free_log(trans, root);
-			btrfs_update_reloc_root(trans, root);
+			ret2 = btrfs_update_reloc_root(trans, root);
+			if (ret2)
+				return ret2;
 
 			/* see comments in should_cow_block() */
 			clear_bit(BTRFS_ROOT_FORCE_COW, &root->state);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 22/38] btrfs: change insert_dirty_subvol to return errors
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (20 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 21/38] btrfs: handle btrfs_update_reloc_root failure in commit_fs_roots Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 23/38] btrfs: handle btrfs_update_reloc_root failure in insert_dirty_subvol Josef Bacik
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This will be able to return errors in the future, so change it to return
an error and handle the error appropriately.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 918fee55bc30..13d5fd74e745 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1566,9 +1566,9 @@ static int find_next_key(struct btrfs_path *path, int level,
 /*
  * Insert current subvolume into reloc_control::dirty_subvol_roots
  */
-static void insert_dirty_subvol(struct btrfs_trans_handle *trans,
-				struct reloc_control *rc,
-				struct btrfs_root *root)
+static int insert_dirty_subvol(struct btrfs_trans_handle *trans,
+			       struct reloc_control *rc,
+			       struct btrfs_root *root)
 {
 	struct btrfs_root *reloc_root = root->reloc_root;
 	struct btrfs_root_item *reloc_root_item;
@@ -1588,6 +1588,7 @@ static void insert_dirty_subvol(struct btrfs_trans_handle *trans,
 		btrfs_grab_root(root);
 		list_add_tail(&root->reloc_dirty_list, &rc->dirty_subvol_roots);
 	}
+	return 0;
 }
 
 static int clean_dirty_subvols(struct reloc_control *rc)
@@ -1789,8 +1790,11 @@ static noinline_for_stack int merge_reloc_root(struct reloc_control *rc,
 out:
 	btrfs_free_path(path);
 
-	if (ret == 0)
-		insert_dirty_subvol(trans, rc, root);
+	if (ret == 0) {
+		ret = insert_dirty_subvol(trans, rc, root);
+		if (ret)
+			btrfs_abort_transaction(trans, ret);
+	}
 
 	if (trans)
 		btrfs_end_transaction_throttle(trans);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 23/38] btrfs: handle btrfs_update_reloc_root failure in insert_dirty_subvol
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (21 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 22/38] btrfs: change insert_dirty_subvol to return errors Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 24/38] btrfs: handle btrfs_update_reloc_root failure in prepare_to_merge Josef Bacik
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

btrfs_update_reloc_root will will return errors in the future, so handle
the error properly in insert_dirty_subvol.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 13d5fd74e745..dde383477f5f 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1572,6 +1572,7 @@ static int insert_dirty_subvol(struct btrfs_trans_handle *trans,
 {
 	struct btrfs_root *reloc_root = root->reloc_root;
 	struct btrfs_root_item *reloc_root_item;
+	int ret;
 
 	/* @root must be a subvolume tree root with a valid reloc tree */
 	ASSERT(root->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID);
@@ -1582,7 +1583,9 @@ static int insert_dirty_subvol(struct btrfs_trans_handle *trans,
 		sizeof(reloc_root_item->drop_progress));
 	btrfs_set_root_drop_level(reloc_root_item, 0);
 	btrfs_set_root_refs(reloc_root_item, 0);
-	btrfs_update_reloc_root(trans, root);
+	ret = btrfs_update_reloc_root(trans, root);
+	if (ret)
+		return ret;
 
 	if (list_empty(&root->reloc_dirty_list)) {
 		btrfs_grab_root(root);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 24/38] btrfs: handle btrfs_update_reloc_root failure in prepare_to_merge
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (22 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 23/38] btrfs: handle btrfs_update_reloc_root failure in insert_dirty_subvol Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 25/38] btrfs: do proper error handling in btrfs_update_reloc_root Josef Bacik
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

btrfs_update_reloc_root will will return errors in the future, so handle
an error properly in prepare_to_merge.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index dde383477f5f..3a207548edde 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1870,10 +1870,21 @@ int prepare_to_merge(struct reloc_control *rc, int err)
 		 */
 		if (!err)
 			btrfs_set_root_refs(&reloc_root->root_item, 1);
-		btrfs_update_reloc_root(trans, root);
+		ret = btrfs_update_reloc_root(trans, root);
 
+		/*
+		 * Even if we have an error we need this reloc root back on our
+		 * list so we can clean up properly.
+		 */
 		list_add(&reloc_root->root_list, &reloc_roots);
 		btrfs_put_root(root);
+
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			if (!err)
+				err = ret;
+			break;
+		}
 	}
 
 	list_splice(&reloc_roots, &rc->reloc_roots);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 25/38] btrfs: do proper error handling in btrfs_update_reloc_root
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (23 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 24/38] btrfs: handle btrfs_update_reloc_root failure in prepare_to_merge Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 26/38] btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s Josef Bacik
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

We call btrfs_update_root in btrfs_update_reloc_root, which can fail for
all sorts of reasons, including IO errors.  Instead of panicing the box
lets return the error, now that all callers properly handle those
errors.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 3a207548edde..73c8cc502b07 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -904,7 +904,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
 	int ret;
 
 	if (!have_reloc_root(root))
-		goto out;
+		return 0;
 
 	reloc_root = root->reloc_root;
 	root_item = &reloc_root->root_item;
@@ -937,10 +937,8 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_update_root(trans, fs_info->tree_root,
 				&reloc_root->root_key, root_item);
-	BUG_ON(ret);
 	btrfs_put_root(reloc_root);
-out:
-	return 0;
+	return ret;
 }
 
 /*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 26/38] btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (24 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 25/38] btrfs: do proper error handling in btrfs_update_reloc_root Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 27/38] btrfs: handle btrfs_cow_block errors in replace_path Josef Bacik
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

A few BUG_ON()'s in replace_path are purely to keep us from making
logical mistakes, so replace them with ASSERT()'s.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 73c8cc502b07..574ae3f43e33 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1212,8 +1212,8 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 	int ret;
 	int slot;
 
-	BUG_ON(src->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID);
-	BUG_ON(dest->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID);
+	ASSERT(src->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID);
+	ASSERT(dest->root_key.objectid != BTRFS_TREE_RELOC_OBJECTID);
 
 	last_snapshot = btrfs_root_last_snapshot(&src->root_item);
 again:
@@ -1244,7 +1244,7 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 	parent = eb;
 	while (1) {
 		level = btrfs_header_level(parent);
-		BUG_ON(level < lowest_level);
+		ASSERT(level >= lowest_level);
 
 		ret = btrfs_bin_search(parent, &key, &slot);
 		if (ret < 0)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 27/38] btrfs: handle btrfs_cow_block errors in replace_path
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (25 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 26/38] btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 28/38] btrfs: handle btrfs_search_slot failure " Josef Bacik
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

If we error out cow'ing the root node when doing a replace_path then we
simply unlock and free the buffer and return the error.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 574ae3f43e33..d43603ae5570 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1232,7 +1232,11 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 	if (cow) {
 		ret = btrfs_cow_block(trans, dest, eb, NULL, 0, &eb,
 				      BTRFS_NESTING_COW);
-		BUG_ON(ret);
+		if (ret) {
+			btrfs_tree_unlock(eb);
+			free_extent_buffer(eb);
+			return ret;
+		}
 	}
 
 	if (next_key) {
@@ -1292,7 +1296,11 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 				ret = btrfs_cow_block(trans, dest, eb, parent,
 						      slot, &eb,
 						      BTRFS_NESTING_COW);
-				BUG_ON(ret);
+				if (ret) {
+					btrfs_tree_unlock(eb);
+					free_extent_buffer(eb);
+					break;
+				}
 			}
 
 			btrfs_tree_unlock(parent);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 28/38] btrfs: handle btrfs_search_slot failure in replace_path
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (26 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 27/38] btrfs: handle btrfs_cow_block errors in replace_path Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 29/38] btrfs: handle errors in reference count manipulation " Josef Bacik
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

This can fail for any number of reasons, why bring the whole box down
with it?

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d43603ae5570..8101cae374cf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1324,7 +1324,8 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 		path->lowest_level = level;
 		ret = btrfs_search_slot(trans, src, &key, path, 0, 1);
 		path->lowest_level = 0;
-		BUG_ON(ret);
+		if (ret)
+			break;
 
 		/*
 		 * Info qgroup to trace both subtrees.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 29/38] btrfs: handle errors in reference count manipulation in replace_path
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (27 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 28/38] btrfs: handle btrfs_search_slot failure " Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 30/38] btrfs: handle extent reference errors in do_relocation Josef Bacik
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

If any of the reference count manipulation stuff fails in replace_path
we need to abort the transaction, as we've modified the blocks already.
We can simply break at this point and everything will be cleaned up.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 8101cae374cf..e60353980942 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1365,27 +1365,39 @@ int replace_path(struct btrfs_trans_handle *trans, struct reloc_control *rc,
 		ref.skip_qgroup = true;
 		btrfs_init_tree_ref(&ref, level - 1, src->root_key.objectid);
 		ret = btrfs_inc_extent_ref(trans, &ref);
-		BUG_ON(ret);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			break;
+		}
 		btrfs_init_generic_ref(&ref, BTRFS_ADD_DELAYED_REF, new_bytenr,
 				       blocksize, 0);
 		ref.skip_qgroup = true;
 		btrfs_init_tree_ref(&ref, level - 1, dest->root_key.objectid);
 		ret = btrfs_inc_extent_ref(trans, &ref);
-		BUG_ON(ret);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			break;
+		}
 
 		btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, new_bytenr,
 				       blocksize, path->nodes[level]->start);
 		btrfs_init_tree_ref(&ref, level - 1, src->root_key.objectid);
 		ref.skip_qgroup = true;
 		ret = btrfs_free_extent(trans, &ref);
-		BUG_ON(ret);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			break;
+		}
 
 		btrfs_init_generic_ref(&ref, BTRFS_DROP_DELAYED_REF, old_bytenr,
 				       blocksize, 0);
 		btrfs_init_tree_ref(&ref, level - 1, dest->root_key.objectid);
 		ref.skip_qgroup = true;
 		ret = btrfs_free_extent(trans, &ref);
-		BUG_ON(ret);
+		if (ret) {
+			btrfs_abort_transaction(trans, ret);
+			break;
+		}
 
 		btrfs_unlock_up_safe(path, 0);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 30/38] btrfs: handle extent reference errors in do_relocation
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (28 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 29/38] btrfs: handle errors in reference count manipulation " Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 31/38] btrfs: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly Josef Bacik
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We can already deal with errors appropriately from do_relocation, simply
handle any errors that come from changing the refs at this point
cleanly.  We have to abort the transaction if we fail here as we've
modified metadata at this point.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index e60353980942..67df6ae6d13f 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2434,10 +2434,11 @@ static int do_relocation(struct btrfs_trans_handle *trans,
 			btrfs_init_tree_ref(&ref, node->level,
 					    btrfs_header_owner(upper->eb));
 			ret = btrfs_inc_extent_ref(trans, &ref);
-			BUG_ON(ret);
-
-			ret = btrfs_drop_subtree(trans, root, eb, upper->eb);
-			BUG_ON(ret);
+			if (!ret)
+				ret = btrfs_drop_subtree(trans, root, eb,
+							 upper->eb);
+			if (ret)
+				btrfs_abort_transaction(trans, ret);
 		}
 next:
 		if (!upper->pending)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 31/38] btrfs: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (29 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 30/38] btrfs: handle extent reference errors in do_relocation Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 32/38] btrfs: remove the extent item sanity checks in relocate_block_group Josef Bacik
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

We need to validate that a data extent item does not have the
FULL_BACKREF flag set on it's flags.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/tree-checker.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 028e733e42f3..39714aeb9b36 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -1283,6 +1283,11 @@ static int check_extent_item(struct extent_buffer *leaf,
 				   key->offset, fs_info->sectorsize);
 			return -EUCLEAN;
 		}
+		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) {
+			extent_err(leaf, slot,
+			"invalid extent flag, data has full backref set");
+			return -EUCLEAN;
+		}
 	}
 	ptr = (unsigned long)(struct btrfs_extent_item *)(ei + 1);
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 32/38] btrfs: remove the extent item sanity checks in relocate_block_group
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (30 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 31/38] btrfs: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 33/38] btrfs: do proper error handling in create_reloc_inode Josef Bacik
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

These checks are all taken care of for us by the tree checker code.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 29 +----------------------------
 1 file changed, 1 insertion(+), 28 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 67df6ae6d13f..93446ec5bcd9 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3371,20 +3371,6 @@ static void unset_reloc_control(struct reloc_control *rc)
 	mutex_unlock(&fs_info->reloc_mutex);
 }
 
-static int check_extent_flags(u64 flags)
-{
-	if ((flags & BTRFS_EXTENT_FLAG_DATA) &&
-	    (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK))
-		return 1;
-	if (!(flags & BTRFS_EXTENT_FLAG_DATA) &&
-	    !(flags & BTRFS_EXTENT_FLAG_TREE_BLOCK))
-		return 1;
-	if ((flags & BTRFS_EXTENT_FLAG_DATA) &&
-	    (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF))
-		return 1;
-	return 0;
-}
-
 static noinline_for_stack
 int prepare_to_relocate(struct reloc_control *rc)
 {
@@ -3436,7 +3422,6 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 	struct btrfs_path *path;
 	struct btrfs_extent_item *ei;
 	u64 flags;
-	u32 item_size;
 	int ret;
 	int err = 0;
 	int progress = 0;
@@ -3485,19 +3470,7 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 
 		ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
 				    struct btrfs_extent_item);
-		item_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
-		if (item_size >= sizeof(*ei)) {
-			flags = btrfs_extent_flags(path->nodes[0], ei);
-			ret = check_extent_flags(flags);
-			BUG_ON(ret);
-		} else if (unlikely(item_size == sizeof(struct btrfs_extent_item_v0))) {
-			err = -EINVAL;
-			btrfs_print_v0_err(trans->fs_info);
-			btrfs_abort_transaction(trans, err);
-			break;
-		} else {
-			BUG();
-		}
+		flags = btrfs_extent_flags(path->nodes[0], ei);
 
 		if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
 			ret = add_tree_block(rc, &key, path, &blocks);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 33/38] btrfs: do proper error handling in create_reloc_inode
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (31 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 32/38] btrfs: remove the extent item sanity checks in relocate_block_group Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 34/38] btrfs: handle __add_reloc_root failures in btrfs_recover_relocation Josef Bacik
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We already handle some errors in this function, and the callers do the
correct error handling, so clean up the rest of the function to do the
appropriate error handling.

There's a little extra work that needs to be done here, as we create the
inode item before we create the orphan item.  We could potentially add
the orphan item, but if we failed to create the inode item we would have
to abort the transaction.  Instead add a helper to delete the inode item
we created in the case that we're unable to look up the inode (this
would likely be caused by an ENOMEM), which if it succeeds means we can
avoid a transaction abort in this particular error case.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 93446ec5bcd9..26b85ab4b295 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3609,6 +3609,35 @@ static int __insert_orphan_inode(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static void delete_orphan_inode(struct btrfs_trans_handle *trans,
+				struct btrfs_root *root, u64 objectid)
+{
+	struct btrfs_path *path;
+	struct btrfs_key key;
+	int ret = 0;
+
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	key.objectid = objectid;
+	key.type = BTRFS_INODE_ITEM_KEY;
+	key.offset = 0;
+	ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
+	if (ret) {
+		if (ret > 0)
+			ret = -ENOENT;
+		goto out;
+	}
+	ret = btrfs_del_item(trans, root, path);
+out:
+	if (ret)
+		btrfs_abort_transaction(trans, ret);
+	btrfs_free_path(path);
+}
+
 /*
  * helper to create inode for data relocation.
  * the inode is in data relocation tree and its link count is 0
@@ -3635,10 +3664,16 @@ struct inode *create_reloc_inode(struct btrfs_fs_info *fs_info,
 		goto out;
 
 	err = __insert_orphan_inode(trans, root, objectid);
-	BUG_ON(err);
+	if (err)
+		goto out;
 
 	inode = btrfs_iget(fs_info->sb, objectid, root);
-	BUG_ON(IS_ERR(inode));
+	if (IS_ERR(inode)) {
+		delete_orphan_inode(trans, root, objectid);
+		err = PTR_ERR(inode);
+		inode = NULL;
+		goto out;
+	}
 	BTRFS_I(inode)->index_cnt = group->start;
 
 	err = btrfs_orphan_add(trans, BTRFS_I(inode));
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 34/38] btrfs: handle __add_reloc_root failures in btrfs_recover_relocation
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (32 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 33/38] btrfs: do proper error handling in create_reloc_inode Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 35/38] btrfs: cleanup error handling in prepare_to_merge Josef Bacik
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

We can already handle errors appropriately from this function, deal with
an error coming from __add_reloc_root appropriately.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 26b85ab4b295..39d5cb9360a2 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4015,7 +4015,12 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 		}
 
 		err = __add_reloc_root(reloc_root);
-		BUG_ON(err < 0); /* -ENOMEM or logic error */
+		if (err) {
+			list_add_tail(&reloc_root->root_list, &reloc_roots);
+			btrfs_put_root(fs_root);
+			btrfs_end_transaction(trans);
+			goto out_unset;
+		}
 		fs_root->reloc_root = btrfs_grab_root(reloc_root);
 		btrfs_put_root(fs_root);
 	}
@@ -4230,7 +4235,10 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
 		return PTR_ERR(reloc_root);
 
 	ret = __add_reloc_root(reloc_root);
-	BUG_ON(ret < 0);
+	if (ret) {
+		btrfs_put_root(reloc_root);
+		return ret;
+	}
 	new_root->reloc_root = btrfs_grab_root(reloc_root);
 
 	if (rc->create_reloc_tree)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 35/38] btrfs: cleanup error handling in prepare_to_merge
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (33 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 34/38] btrfs: handle __add_reloc_root failures in btrfs_recover_relocation Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 36/38] btrfs: handle extent corruption with select_one_root properly Josef Bacik
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

This probably can't happen even with a corrupt file system, because we
would have failed much earlier on than here.  However there's no reason
we can't just check and bail out as appropriate, so do that and convert
the correctness BUG_ON() to an ASSERT().

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 39d5cb9360a2..56f1fce7c746 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1880,8 +1880,14 @@ int prepare_to_merge(struct reloc_control *rc, int err)
 
 		root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset,
 				false);
-		BUG_ON(IS_ERR(root));
-		BUG_ON(root->reloc_root != reloc_root);
+		if (IS_ERR(root)) {
+			list_add(&reloc_root->root_list, &reloc_roots);
+			btrfs_abort_transaction(trans, (int)PTR_ERR(root));
+			if (!err)
+				err = PTR_ERR(root);
+			break;
+		}
+		ASSERT(root->reloc_root == reloc_root);
 
 		/*
 		 * set reference count to 1, so btrfs_recover_relocation
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 36/38] btrfs: handle extent corruption with select_one_root properly
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (34 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 35/38] btrfs: cleanup error handling in prepare_to_merge Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 37/38] btrfs: do proper error handling in merge_reloc_roots Josef Bacik
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

In corruption cases we could have paths from a block up to no root at
all, and thus we'll BUG_ON(!root) in select_one_root.  Handle this by
adding an ASSERT() for developers, and returning an error for normal
users.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 56f1fce7c746..3f71fbb5ea18 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2209,7 +2209,17 @@ struct btrfs_root *select_one_root(struct btrfs_backref_node *node)
 		cond_resched();
 		next = walk_up_backref(next, edges, &index);
 		root = next->root;
-		BUG_ON(!root);
+
+		/*
+		 * This can occur if we have incomplete extent refs leading all
+		 * the way up a particular path, in this case return -EUCLEAN.
+		 * However leave as an ASSERT() for developers, because it could
+		 * indicate a bug in the backref code.
+		 */
+		if (!root) {
+			ASSERT(0);
+			return ERR_PTR(-EUCLEAN);
+		}
 
 		/* No other choice for non-shareable tree */
 		if (!test_bit(BTRFS_ROOT_SHAREABLE, &root->state))
@@ -2599,8 +2609,12 @@ static int relocate_tree_block(struct btrfs_trans_handle *trans,
 
 	BUG_ON(node->processed);
 	root = select_one_root(node);
-	if (root == ERR_PTR(-ENOENT)) {
-		update_processed_blocks(rc, node);
+	if (IS_ERR(root)) {
+		ret = PTR_ERR(root);
+		if (ret == -ENOENT) {
+			ret = 0;
+			update_processed_blocks(rc, node);
+		}
 		goto out;
 	}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 37/38] btrfs: do proper error handling in merge_reloc_roots
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (35 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 36/38] btrfs: handle extent corruption with select_one_root properly Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 16:26 ` [PATCH v7 38/38] btrfs: check return value of btrfs_commit_transaction in relocation Josef Bacik
  2020-12-16 19:56 ` [PATCH v7 00/38] Cleanup error handling " Zygo Blaxell
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We have a BUG_ON() if we get an error back from btrfs_get_fs_root().
This honestly should never fail, as at this point we have a solid
coordination of fs root to reloc root, and these roots will all be in
memory.  But in the name of killing BUG_ON()'s remove these and handle
the error condition properly, ASSERT()'ing for developers.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 3f71fbb5ea18..44743d1fe414 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1960,8 +1960,29 @@ void merge_reloc_roots(struct reloc_control *rc)
 		root = btrfs_get_fs_root(fs_info, reloc_root->root_key.offset,
 					 false);
 		if (btrfs_root_refs(&reloc_root->root_item) > 0) {
-			BUG_ON(IS_ERR(root));
-			BUG_ON(root->reloc_root != reloc_root);
+			if (IS_ERR(root)) {
+				/*
+				 * For recovery we read the fs roots on mount,
+				 * and if we didn't find the root then we marked
+				 * the reloc root as a garbage root.  For normal
+				 * relocation obviously the root should exist in
+				 * memory.  However there's no reason we can't
+				 * handle the error properly here just in case.
+				 */
+				ASSERT(0);
+				ret = PTR_ERR(root);
+				goto out;
+			}
+			if (root->reloc_root != reloc_root) {
+				/*
+				 * This is actually impossible without something
+				 * going really wrong (like weird race condition
+				 * or cosmic rays).
+				 */
+				ASSERT(0);
+				ret = -EINVAL;
+				goto out;
+			}
 			ret = merge_reloc_root(rc, root);
 			btrfs_put_root(root);
 			if (ret) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 38/38] btrfs: check return value of btrfs_commit_transaction in relocation
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (36 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 37/38] btrfs: do proper error handling in merge_reloc_roots Josef Bacik
@ 2020-12-16 16:26 ` Josef Bacik
  2020-12-16 19:56 ` [PATCH v7 00/38] Cleanup error handling " Zygo Blaxell
  38 siblings, 0 replies; 45+ messages in thread
From: Josef Bacik @ 2020-12-16 16:26 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

There's a few places where we don't check the return value of
btrfs_commit_transaction in relocation.c.  Thankfully all these places
have straightforward error handling, so simply change all of the sites
at once.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 44743d1fe414..26ecc55e9f59 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1915,7 +1915,7 @@ int prepare_to_merge(struct reloc_control *rc, int err)
 	list_splice(&reloc_roots, &rc->reloc_roots);
 
 	if (!err)
-		btrfs_commit_transaction(trans);
+		err = btrfs_commit_transaction(trans);
 	else
 		btrfs_end_transaction(trans);
 	return err;
@@ -3450,8 +3450,7 @@ int prepare_to_relocate(struct reloc_control *rc)
 		 */
 		return PTR_ERR(trans);
 	}
-	btrfs_commit_transaction(trans);
-	return 0;
+	return btrfs_commit_transaction(trans);
 }
 
 static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
@@ -3610,7 +3609,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 		err = PTR_ERR(trans);
 		goto out_free;
 	}
-	btrfs_commit_transaction(trans);
+	ret = btrfs_commit_transaction(trans);
+	if (ret && !err)
+		err = ret;
 out_free:
 	ret = clean_dirty_subvols(rc);
 	if (ret < 0 && !err)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 00/38] Cleanup error handling in relocation
  2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
                   ` (37 preceding siblings ...)
  2020-12-16 16:26 ` [PATCH v7 38/38] btrfs: check return value of btrfs_commit_transaction in relocation Josef Bacik
@ 2020-12-16 19:56 ` Zygo Blaxell
  2020-12-18 16:25   ` Zygo Blaxell
  38 siblings, 1 reply; 45+ messages in thread
From: Zygo Blaxell @ 2020-12-16 19:56 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Dec 16, 2020 at 11:26:16AM -0500, Josef Bacik wrote:
> v6->v7:
> - Broke up the series into 3 series, 1 for cosmetic things, 1 for all the major
>   issues (including those reported on v6 of this set), and this new set which is
>   solely the error handling related patches for relocation.  It's still a lot of
>   patches, sorry about that.

So far it lockdepped, but it is still running:

	[Wed Dec 16 13:30:45 2020] irq event stamp: 5875656

	[Wed Dec 16 13:30:45 2020] ======================================================
	[Wed Dec 16 13:30:45 2020] hardirqs last  enabled at (5875655): [<ffffffff98a82e37>] _raw_spin_unlock_irqrestore+0x47/0x50
	[Wed Dec 16 13:30:45 2020] WARNING: possible circular locking dependency detected
	[Wed Dec 16 13:30:45 2020] hardirqs last disabled at (5875656): [<ffffffff98a8368d>] _raw_spin_lock_irqsave+0x7d/0xa0
	[Wed Dec 16 13:30:45 2020] softirqs last  enabled at (5874382): [<ffffffff98e0044f>] __do_softirq+0x44f/0x5a7
	[Wed Dec 16 13:30:45 2020] 5.10.0-39fbe74d1bbc-josef+ #1 Tainted: G        W        
	[Wed Dec 16 13:30:45 2020] softirqs last disabled at (5874369): [<ffffffff98c01122>] asm_call_irq_on_stack+0x12/0x20
	[Wed Dec 16 13:30:45 2020] ------------------------------------------------------
	[Wed Dec 16 13:30:45 2020] btrfs/6366 is trying to acquire lock:
	[Wed Dec 16 13:30:45 2020] ffff88816f146a28 (btrfs-treloc-03){+.+.}-{3:3}, at: __btrfs_tree_lock+0x29/0x190
	[Wed Dec 16 13:30:45 2020]
				   but task is already holding lock:
	[Wed Dec 16 13:30:45 2020] ffff88810167c598 (btrfs-tree-02){++++}-{3:3}, at: __btrfs_tree_lock+0x29/0x190
	[Wed Dec 16 13:30:45 2020]
				   which lock already depends on the new lock.

	[Wed Dec 16 13:30:45 2020]
				   the existing dependency chain (in reverse order) is:
	[Wed Dec 16 13:30:45 2020]
				   -> #1 (btrfs-tree-02){++++}-{3:3}:
	[Wed Dec 16 13:30:45 2020]        down_write_nested+0xa6/0x2d0
	[Wed Dec 16 13:30:45 2020]        __btrfs_tree_lock+0x29/0x190
	[Wed Dec 16 13:30:45 2020]        btrfs_tree_lock+0x10/0x20
	[Wed Dec 16 13:30:45 2020]        btrfs_search_slot+0x474/0x1090
	[Wed Dec 16 13:30:45 2020]        do_relocation+0x2a0/0xc20
	[Wed Dec 16 13:30:45 2020]        relocate_tree_blocks+0x733/0xb90
	[Wed Dec 16 13:30:45 2020]        relocate_block_group+0x2eb/0x780
	[Wed Dec 16 13:30:45 2020]        btrfs_relocate_block_group+0x26e/0x4c0
	[Wed Dec 16 13:30:45 2020]        btrfs_relocate_chunk+0x52/0x120
	[Wed Dec 16 13:30:45 2020]        btrfs_balance+0xe2e/0x1900
	[Wed Dec 16 13:30:45 2020]        btrfs_ioctl_balance+0x1f9/0x460
	[Wed Dec 16 13:30:45 2020]        btrfs_ioctl+0xe9c/0x4360
	[Wed Dec 16 13:30:45 2020]        __x64_sys_ioctl+0xc3/0x100
	[Wed Dec 16 13:30:45 2020]        do_syscall_64+0x37/0x80
	[Wed Dec 16 13:30:45 2020]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
	[Wed Dec 16 13:30:45 2020]
				   -> #0 (btrfs-treloc-03){+.+.}-{3:3}:
	[Wed Dec 16 13:30:45 2020]        __lock_acquire+0x1ebb/0x2880
	[Wed Dec 16 13:30:45 2020]        lock_acquire+0x192/0x550
	[Wed Dec 16 13:30:45 2020]        down_write_nested+0xa6/0x2d0
	[Wed Dec 16 13:30:45 2020]        __btrfs_tree_lock+0x29/0x190
	[Wed Dec 16 13:30:45 2020]        btrfs_lock_root_node+0x5b/0x190
	[Wed Dec 16 13:30:45 2020]        btrfs_search_slot+0xc8d/0x1090
	[Wed Dec 16 13:30:45 2020]        replace_path.isra.36+0x8a2/0xee0
	[Wed Dec 16 13:30:45 2020]        merge_reloc_root+0x58c/0xc10
	[Wed Dec 16 13:30:45 2020]        merge_reloc_roots+0x1e6/0x4a0
	[Wed Dec 16 13:30:45 2020]        relocate_block_group+0x3d2/0x780
	[Wed Dec 16 13:30:45 2020]        btrfs_relocate_block_group+0x26e/0x4c0
	[Wed Dec 16 13:30:45 2020]        btrfs_relocate_chunk+0x52/0x120
	[Wed Dec 16 13:30:45 2020]        btrfs_balance+0xe2e/0x1900
	[Wed Dec 16 13:30:45 2020]        btrfs_ioctl_balance+0x1f9/0x460
	[Wed Dec 16 13:30:45 2020]        btrfs_ioctl+0xe9c/0x4360
	[Wed Dec 16 13:30:45 2020]        __x64_sys_ioctl+0xc3/0x100
	[Wed Dec 16 13:30:45 2020]        do_syscall_64+0x37/0x80
	[Wed Dec 16 13:30:45 2020]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
	[Wed Dec 16 13:30:45 2020]
				   other info that might help us debug this:

	[Wed Dec 16 13:30:45 2020]  Possible unsafe locking scenario:

	[Wed Dec 16 13:30:45 2020]        CPU0                    CPU1
	[Wed Dec 16 13:30:45 2020]        ----                    ----
	[Wed Dec 16 13:30:45 2020]   lock(btrfs-tree-02);
	[Wed Dec 16 13:30:45 2020]                                lock(btrfs-treloc-03);
	[Wed Dec 16 13:30:45 2020]                                lock(btrfs-tree-02);
	[Wed Dec 16 13:30:45 2020]   lock(btrfs-treloc-03);
	[Wed Dec 16 13:30:45 2020]
				    *** DEADLOCK ***

	[Wed Dec 16 13:30:45 2020] 5 locks held by btrfs/6366:
	[Wed Dec 16 13:30:45 2020]  #0: ffff888128318498 (sb_writers#13){.+.+}-{0:0}, at: btrfs_ioctl_balance+0x71/0x460
	[Wed Dec 16 13:30:45 2020]  #1: ffff88810d2f22c8 (&fs_info->delete_unused_bgs_mutex){+.+.}-{3:3}, at: btrfs_balance+0xa54/0x1900
	[Wed Dec 16 13:30:45 2020]  #2: ffff88810d2f08f0 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x266/0x4c0
	[Wed Dec 16 13:30:45 2020]  #3: ffff8881283186b8 (sb_internal#2){.+.+}-{0:0}, at: btrfs_start_transaction+0x1e/0x20
	[Wed Dec 16 13:30:45 2020]  #4: ffff88810167c598 (btrfs-tree-02){++++}-{3:3}, at: __btrfs_tree_lock+0x29/0x190
	[Wed Dec 16 13:30:45 2020]
				   stack backtrace:
	[Wed Dec 16 13:30:45 2020] CPU: 0 PID: 6366 Comm: btrfs Tainted: G        W         5.10.0-39fbe74d1bbc-josef+ #1
	[Wed Dec 16 13:30:45 2020] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
	[Wed Dec 16 13:30:45 2020] Call Trace:
	[Wed Dec 16 13:30:45 2020]  dump_stack+0xbc/0xf9
	[Wed Dec 16 13:30:45 2020]  print_circular_bug.isra.42.cold.67+0x146/0x14b
	[Wed Dec 16 13:30:45 2020]  check_noncircular+0x219/0x250
	[Wed Dec 16 13:30:45 2020]  ? print_circular_bug.isra.42+0x230/0x230
	[Wed Dec 16 13:30:45 2020]  ? pvclock_clocksource_read+0xeb/0x190
	[Wed Dec 16 13:30:45 2020]  ? __kasan_check_write+0x14/0x20
	[Wed Dec 16 13:30:45 2020]  ? lockdep_lock+0xaa/0x160
	[Wed Dec 16 13:30:45 2020]  __lock_acquire+0x1ebb/0x2880
	[Wed Dec 16 13:30:45 2020]  ? register_lock_class+0x8f0/0x8f0
	[Wed Dec 16 13:30:45 2020]  ? rcu_read_lock_sched_held+0xa1/0xd0
	[Wed Dec 16 13:30:45 2020]  ? rcu_read_lock_bh_held+0xb0/0xb0
	[Wed Dec 16 13:30:45 2020]  lock_acquire+0x192/0x550
	[Wed Dec 16 13:30:45 2020]  ? __btrfs_tree_lock+0x29/0x190
	[Wed Dec 16 13:30:45 2020]  ? check_flags+0x30/0x30
	[Wed Dec 16 13:30:45 2020]  ? ___might_sleep+0x10f/0x1e0
	[Wed Dec 16 13:30:46 2020]  ? __might_sleep+0x71/0xe0
	[Wed Dec 16 13:30:46 2020]  down_write_nested+0xa6/0x2d0
	[Wed Dec 16 13:30:46 2020]  ? __btrfs_tree_lock+0x29/0x190
	[Wed Dec 16 13:30:46 2020]  ? btrfs_root_node+0x23/0x200
	[Wed Dec 16 13:30:46 2020]  ? _down_write_nest_lock+0x2c0/0x2c0
	[Wed Dec 16 13:30:46 2020]  ? rcu_read_lock_sched_held+0xd0/0xd0
	[Wed Dec 16 13:30:46 2020]  __btrfs_tree_lock+0x29/0x190
	[Wed Dec 16 13:30:46 2020]  btrfs_lock_root_node+0x5b/0x190
	[Wed Dec 16 13:30:46 2020]  btrfs_search_slot+0xc8d/0x1090
	[Wed Dec 16 13:30:46 2020]  ? __kasan_check_read+0x11/0x20
	[Wed Dec 16 13:30:46 2020]  ? check_flags.part.50+0x6c/0x1e0
	[Wed Dec 16 13:30:46 2020]  ? split_leaf+0xa20/0xa20
	[Wed Dec 16 13:30:46 2020]  ? _raw_spin_unlock+0x22/0x30
	[Wed Dec 16 13:30:46 2020]  ? release_extent_buffer+0x225/0x280
	[Wed Dec 16 13:30:46 2020]  ? __kasan_check_write+0x14/0x20
	[Wed Dec 16 13:30:46 2020]  ? free_extent_buffer.part.53+0x90/0x140
	[Wed Dec 16 13:30:46 2020]  ? free_extent_buffer+0x13/0x20
	[Wed Dec 16 13:30:46 2020]  replace_path.isra.36+0x8a2/0xee0
	[Wed Dec 16 13:30:46 2020]  ? describe_relocation.isra.30+0xf0/0xf0
	[Wed Dec 16 13:30:46 2020]  ? check_setget_bounds+0x2a/0x60
	[Wed Dec 16 13:30:46 2020]  ? btrfs_get_64+0x1e8/0x200
	[Wed Dec 16 13:30:46 2020]  ? btrfs_get_token_64+0x350/0x350
	[Wed Dec 16 13:30:46 2020]  ? memcpy+0x4d/0x60
	[Wed Dec 16 13:30:46 2020]  merge_reloc_root+0x58c/0xc10
	[Wed Dec 16 13:30:46 2020]  ? prepare_to_merge+0x660/0x660
	[Wed Dec 16 13:30:46 2020]  ? btrfs_lookup_fs_root+0x113/0x180
	[Wed Dec 16 13:30:46 2020]  ? btrfs_end_super_write+0x3c0/0x3c0
	[Wed Dec 16 13:30:46 2020]  ? mutex_lock_io_nested+0xc20/0xc20
	[Wed Dec 16 13:30:46 2020]  ? btrfs_get_root_ref+0x24a/0x470
	[Wed Dec 16 13:30:46 2020]  ? __mutex_unlock_slowpath+0xb2/0x400
	[Wed Dec 16 13:30:46 2020]  ? btrfs_find_highest_objectid+0x1b0/0x1b0
	[Wed Dec 16 13:30:46 2020]  ? btrfs_init_reloc_root+0x370/0x370
	[Wed Dec 16 13:30:46 2020]  ? __kasan_check_write+0x14/0x20
	[Wed Dec 16 13:30:46 2020]  merge_reloc_roots+0x1e6/0x4a0
	[Wed Dec 16 13:30:46 2020]  ? merge_reloc_root+0xc10/0xc10
	[Wed Dec 16 13:30:46 2020]  ? btrfs_block_rsv_release+0x3a7/0x620
	[Wed Dec 16 13:30:46 2020]  relocate_block_group+0x3d2/0x780
	[Wed Dec 16 13:30:46 2020]  ? btrfs_defrag_leaves+0x250/0x650
	[Wed Dec 16 13:30:46 2020]  ? merge_reloc_roots+0x4a0/0x4a0
	[Wed Dec 16 13:30:46 2020]  btrfs_relocate_block_group+0x26e/0x4c0
	[Wed Dec 16 13:30:46 2020]  btrfs_relocate_chunk+0x52/0x120
	[Wed Dec 16 13:30:46 2020]  btrfs_balance+0xe2e/0x1900
	[Wed Dec 16 13:30:46 2020]  ? pvclock_clocksource_read+0xeb/0x190
	[Wed Dec 16 13:30:46 2020]  ? btrfs_relocate_chunk+0x120/0x120
	[Wed Dec 16 13:30:46 2020]  ? lock_acquired+0x7f/0x5e0
	[Wed Dec 16 13:30:46 2020]  ? do_raw_spin_lock+0x1e0/0x1e0
	[Wed Dec 16 13:30:46 2020]  ? do_raw_spin_unlock+0xa8/0x140
	[Wed Dec 16 13:30:46 2020]  btrfs_ioctl_balance+0x1f9/0x460
	[Wed Dec 16 13:30:46 2020]  btrfs_ioctl+0xe9c/0x4360
	[Wed Dec 16 13:30:46 2020]  ? __kasan_check_read+0x11/0x20
	[Wed Dec 16 13:30:46 2020]  ? check_chain_key+0x1f4/0x2f0
	[Wed Dec 16 13:30:46 2020]  ? __asan_loadN+0xf/0x20
	[Wed Dec 16 13:30:46 2020]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[Wed Dec 16 13:30:46 2020]  ? kvm_sched_clock_read+0x18/0x30
	[Wed Dec 16 13:30:46 2020]  ? check_chain_key+0x1f4/0x2f0
	[Wed Dec 16 13:30:46 2020]  ? lock_downgrade+0x3f0/0x3f0
	[Wed Dec 16 13:30:46 2020]  ? handle_mm_fault+0xad6/0x2150
	[Wed Dec 16 13:30:46 2020]  ? do_vfs_ioctl+0xfc/0x9d0
	[Wed Dec 16 13:30:46 2020]  ? ioctl_file_clone+0xe0/0xe0
	[Wed Dec 16 13:30:46 2020]  ? check_flags.part.50+0x6c/0x1e0
	[Wed Dec 16 13:30:46 2020]  ? check_flags.part.50+0x6c/0x1e0
	[Wed Dec 16 13:30:46 2020]  ? check_flags+0x26/0x30
	[Wed Dec 16 13:30:46 2020]  ? lock_is_held_type+0xc3/0xf0
	[Wed Dec 16 13:30:46 2020]  ? syscall_enter_from_user_mode+0x1b/0x60
	[Wed Dec 16 13:30:46 2020]  ? do_syscall_64+0x13/0x80
	[Wed Dec 16 13:30:46 2020]  ? rcu_read_lock_sched_held+0xa1/0xd0
	[Wed Dec 16 13:30:46 2020]  ? __kasan_check_read+0x11/0x20
	[Wed Dec 16 13:30:46 2020]  ? __fget_light+0xae/0x110
	[Wed Dec 16 13:30:46 2020]  __x64_sys_ioctl+0xc3/0x100
	[Wed Dec 16 13:30:46 2020]  do_syscall_64+0x37/0x80
	[Wed Dec 16 13:30:46 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
	[Wed Dec 16 13:30:46 2020] RIP: 0033:0x7f7c98b07427
	[Wed Dec 16 13:30:46 2020] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
	[Wed Dec 16 13:30:46 2020] RSP: 002b:00007fff0cb87938 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
	[Wed Dec 16 13:30:46 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7c98b07427
	[Wed Dec 16 13:30:46 2020] RDX: 00007fff0cb87948 RSI: 00000000c4009420 RDI: 0000000000000003
	[Wed Dec 16 13:30:46 2020] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
	[Wed Dec 16 13:30:46 2020] R10: fffffffffffff59d R11: 0000000000000202 R12: 0000000000000002
	[Wed Dec 16 13:30:46 2020] R13: 00007fff0cb89a3f R14: 0000557c21f8c119 R15: 0000000000000000
	[Wed Dec 16 13:31:03 2020] BTRFS info (device dm-0): found 2 extents, loops 2, stage: update data pointers
	[Wed Dec 16 13:31:59 2020] BTRFS info (device dm-0): relocating block group 981598208000 flags data

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 00/38] Cleanup error handling in relocation
  2020-12-16 19:56 ` [PATCH v7 00/38] Cleanup error handling " Zygo Blaxell
@ 2020-12-18 16:25   ` Zygo Blaxell
  0 siblings, 0 replies; 45+ messages in thread
From: Zygo Blaxell @ 2020-12-18 16:25 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Dec 16, 2020 at 02:56:03PM -0500, Zygo Blaxell wrote:
> On Wed, Dec 16, 2020 at 11:26:16AM -0500, Josef Bacik wrote:
> > v6->v7:
> > - Broke up the series into 3 series, 1 for cosmetic things, 1 for all the major
> >   issues (including those reported on v6 of this set), and this new set which is
> >   solely the error handling related patches for relocation.  It's still a lot of
> >   patches, sorry about that.
> 
> So far it lockdepped, but it is still running:
> 
> 	[Wed Dec 16 13:30:45 2020] irq event stamp: 5875656

...and now it's dead, looks like tree mod log strikes again:

	[145504.989768][ T3280] BTRFS info (device dm-0): found 13271 extents, loops 2, stage: update data pointers
	[145622.222898][ T3280] avg_delayed_ref_runtime = 743782, time = 772386615466, count = 1038457
	[145664.364729][ T4659] ------------[ cut here ]------------
	[145664.373144][ T4659] kernel BUG at fs/btrfs/ctree.c:1208!
	[145664.377059][ T4659] invalid opcode: 0000 [#1] SMP KASAN PTI
	[145664.379909][ T4659] CPU: 1 PID: 4659 Comm: crawl_258 Tainted: G        W         5.10.0-39fbe74d1bbc-josef+ #1
	[145664.383114][ T4659] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
	[145664.386201][ T4659] RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0
	[145664.388238][ T4659] Code: 05 48 8d 74 10 65 ba 19 00 00 00 e8 49 e7 06 00 e9 a7 fd ff ff 4c 8d 7b 2c 4c 89 ff e8 f8 3e c9 ff 48 63 43 2c e9 a2 fe ff ff <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
	[145664.394705][ T4659] RSP: 0018:ffffc90001c870e8 EFLAGS: 00010297
	[145664.396777][ T4659] RAX: 0000000000000000 RBX: ffff888004946f00 RCX: ffffffff978792f6
	[145664.399482][ T4659] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff888004946f2c
	[145664.402117][ T4659] RBP: ffffc90001c87138 R08: 1ffff1100da2481c R09: ffffed100da2481c
	[145664.404763][ T4659] R10: ffff88806d1240d8 R11: ffffed100da2481b R12: 000000000000000b
	[145664.407401][ T4659] R13: ffff888062123000 R14: ffff88800f6a7a00 R15: ffff888004946f2c
	[145664.410047][ T4659] FS:  00007f0c92e1f700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000
	[145664.413001][ T4659] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[145664.415177][ T4659] CR2: 00007fc867df6100 CR3: 00000001038e0004 CR4: 0000000000170ee0
	[145664.417822][ T4659] Call Trace:
	[145664.418935][ T4659]  btrfs_search_old_slot+0x265/0x10d0
	[145664.420747][ T4659]  ? lock_acquired+0xbb/0x5e0
	[145664.422335][ T4659]  ? btrfs_search_slot+0x1090/0x1090
	[145664.424102][ T4659]  ? free_extent_buffer.part.53+0xd7/0x140
	[145664.426077][ T4659]  ? free_extent_buffer+0x13/0x20
	[145664.427810][ T4659]  resolve_indirect_refs+0x3e9/0xfc0
	[145664.429610][ T4659]  ? down_write_nested+0x2d0/0x2d0
	[145664.431310][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.432998][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.434654][ T4659]  ? add_prelim_ref.part.12+0x150/0x150
	[145664.436511][ T4659]  ? lock_downgrade+0x3f0/0x3f0
	[145664.438140][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.439818][ T4659]  ? lock_acquired+0xbb/0x5e0
	[145664.441384][ T4659]  ? free_extent_buffer.part.53+0xb1/0x140
	[145664.443416][ T4659]  ? do_raw_spin_unlock+0xa8/0x140
	[145664.445231][ T4659]  ? rb_insert_color+0x310/0x360
	[145664.446928][ T4659]  ? prelim_ref_insert+0x12d/0x430
	[145664.448694][ T4659]  find_parent_nodes+0x5c3/0x1830
	[145664.450439][ T4659]  ? resolve_indirect_refs+0xfc0/0xfc0
	[145664.452355][ T4659]  ? iterate_inodes_from_logical+0x129/0x170
	[145664.454282][ T4659]  ? btrfs_ioctl+0x237e/0x4360
	[145664.457750][ T4659]  ? __x64_sys_ioctl+0xc3/0x100
	[145664.459969][ T4659]  ? do_syscall_64+0x37/0x80
	[145664.462279][ T4659]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
	[145664.465078][ T4659]  ? resolve_indirect_refs+0xf90/0xfc0
	[145664.467560][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.471069][ T4659]  ? kasan_unpoison_shadow+0x35/0x50
	[145664.472861][ T4659]  ? trace_hardirqs_on+0x55/0x120
	[145664.474528][ T4659]  btrfs_find_all_roots_safe+0x142/0x1e0
	[145664.476661][ T4659]  ? find_parent_nodes+0x1830/0x1830
	[145664.478378][ T4659]  ? btrfs_inode_flags_to_xflags+0x50/0x50
	[145664.480274][ T4659]  iterate_extent_inodes+0x20e/0x580
	[145664.481995][ T4659]  ? tree_backref_for_extent+0x230/0x230
	[145664.483514][ T4659]  ? release_extent_buffer+0x225/0x280
	[145664.485246][ T4659]  ? read_extent_buffer+0xdd/0x110
	[145664.486916][ T4659]  ? lock_downgrade+0x3f0/0x3f0
	[145664.490569][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.492257][ T4659]  ? lock_acquired+0xbb/0x5e0
	[145664.493876][ T4659]  ? free_extent_buffer.part.53+0xb1/0x140
	[145664.495842][ T4659]  ? do_raw_spin_unlock+0xa8/0x140
	[145664.497540][ T4659]  ? _raw_spin_unlock+0x22/0x30
	[145664.499181][ T4659]  ? release_extent_buffer+0x225/0x280
	[145664.501022][ T4659]  iterate_inodes_from_logical+0x129/0x170
	[145664.502926][ T4659]  ? iterate_inodes_from_logical+0x129/0x170
	[145664.505126][ T4659]  ? btrfs_inode_flags_to_xflags+0x50/0x50
	[145664.507147][ T4659]  ? iterate_extent_inodes+0x580/0x580
	[145664.509104][ T4659]  ? __vmalloc_node+0x92/0xb0
	[145664.510806][ T4659]  ? init_data_container+0x34/0xb0
	[145664.512612][ T4659]  ? init_data_container+0x34/0xb0
	[145664.514452][ T4659]  ? kvmalloc_node+0x60/0x80
	[145664.516057][ T4659]  btrfs_ioctl_logical_to_ino+0x139/0x1e0
	[145664.518043][ T4659]  btrfs_ioctl+0x237e/0x4360
	[145664.519622][ T4659]  ? __kasan_check_write+0x14/0x20
	[145664.521403][ T4659]  ? mmput+0x3b/0x220
	[145664.522780][ T4659]  ? btrfs_ioctl_get_supported_features+0x30/0x30
	[145664.525000][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.526713][ T4659]  ? lock_release+0xc8/0x640
	[145664.528308][ T4659]  ? __might_fault+0x64/0xd0
	[145664.529894][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.531564][ T4659]  ? lock_downgrade+0x3f0/0x3f0
	[145664.533187][ T4659]  ? check_flags+0x30/0x30
	[145664.534701][ T4659]  ? check_flags+0x30/0x30
	[145664.536227][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.537952][ T4659]  ? lock_release+0xc8/0x640
	[145664.539497][ T4659]  ? do_vfs_ioctl+0xfc/0x9d0
	[145664.541020][ T4659]  ? __fget_files+0x151/0x250
	[145664.542641][ T4659]  ? ioctl_file_clone+0xe0/0xe0
	[145664.544304][ T4659]  ? lock_downgrade+0x3f0/0x3f0
	[145664.545917][ T4659]  ? check_flags+0x30/0x30
	[145664.547160][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.548782][ T4659]  ? lock_release+0xc8/0x640
	[145664.550404][ T4659]  ? __task_pid_nr_ns+0xd3/0x250
	[145664.553313][ T4659]  ? __kasan_check_read+0x11/0x20
	[145664.555669][ T4659]  ? __fget_files+0x170/0x250
	[145664.557893][ T4659]  ? __fget_light+0xf2/0x110
	[145664.560061][ T4659]  __x64_sys_ioctl+0xc3/0x100
	[145664.562178][ T4659]  do_syscall_64+0x37/0x80
	[145664.563689][ T4659]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
	[145664.565964][ T4659] RIP: 0033:0x7f0c94f19427
	[145664.567420][ T4659] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
	[145664.574088][ T4659] RSP: 002b:00007f0c92e1cca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[145664.577078][ T4659] RAX: ffffffffffffffda RBX: 00007f0c92e1cee0 RCX: 00007f0c94f19427
	[145664.579999][ T4659] RDX: 00007f0c92e1cee8 RSI: 00000000c038943b RDI: 0000000000000004
	[145664.582957][ T4659] RBP: 0000000001000000 R08: 0000000000000000 R09: 00007f0c92e1d0c0
	[145664.585592][ T4659] R10: 000055b378eb9c40 R11: 0000000000000246 R12: 0000000000000004
	[145664.588153][ T4659] R13: 00007f0c92e1cee8 R14: 00007f0c92e1cff0 R15: 00007f0c92e1cee0
	[145664.591569][ T4659] Modules linked in:
	[145664.592962][ T4659] ---[ end trace 21a31c4983711212 ]---
	[145664.594870][ T4659] RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0
	[145664.597141][ T4659] Code: 05 48 8d 74 10 65 ba 19 00 00 00 e8 49 e7 06 00 e9 a7 fd ff ff 4c 8d 7b 2c 4c 89 ff e8 f8 3e c9 ff 48 63 43 2c e9 a2 fe ff ff <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
	[145664.604814][ T4659] RSP: 0018:ffffc90001c870e8 EFLAGS: 00010297
	[145664.606882][ T4659] RAX: 0000000000000000 RBX: ffff888004946f00 RCX: ffffffff978792f6
	[145664.611246][ T4659] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff888004946f2c
	[145664.613859][ T4659] RBP: ffffc90001c87138 R08: 1ffff1100da2481c R09: ffffed100da2481c
	[145664.617160][ T4659] R10: ffff88806d1240d8 R11: ffffed100da2481b R12: 000000000000000b
	[145664.619988][ T4659] R13: ffff888062123000 R14: ffff88800f6a7a00 R15: ffff888004946f2c
	[145664.622597][ T4659] FS:  00007f0c92e1f700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000
	[145664.625590][ T4659] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[145664.627890][ T4659] CR2: 00007fc867df6100 CR3: 00000001038e0004 CR4: 0000000000170ee0
	[145664.630639][ T4659] note: crawl_258[4659] exited with preempt_count 1


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 03/38] btrfs: handle errors from select_reloc_root()
  2020-12-16 16:26 ` [PATCH v7 03/38] btrfs: handle errors from select_reloc_root() Josef Bacik
@ 2021-02-26 18:30   ` David Sterba
  2021-03-11 18:10     ` Josef Bacik
  0 siblings, 1 reply; 45+ messages in thread
From: David Sterba @ 2021-02-26 18:30 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Dec 16, 2020 at 11:26:19AM -0500, Josef Bacik wrote:
> Currently select_reloc_root() doesn't return an error, but followup
> patches will make it possible for it to return an error.  We do have
> proper error recovery in do_relocation however, so handle the
> possibility of select_reloc_root() having an error properly instead of
> BUG_ON(!root).  I've also adjusted select_reloc_root() to return
> ERR_PTR(-ENOENT) if we don't find a root, instead of NULL, to make the
> error case easier to deal with.  I've replaced the BUG_ON(!root) with an
> ASSERT(0) for this case as it indicates we messed up the backref walking
> code, but it could also indicate corruption.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/relocation.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 08ffaa93b78f..741068580455 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -2024,8 +2024,14 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
>  		if (!next || next->level <= node->level)
>  			break;
>  	}
> -	if (!root)
> -		return NULL;
> +	if (!root) {
> +		/*
> +		 * This can happen if there's fs corruption or if there's a bug
> +		 * in the backref lookup code.
> +		 */
> +		ASSERT(0);

You've added these assert(0) to several patches and I think it's wrong.
The asserts are supposed to verify invariants, you can hardly say that
we're expecting 0 to be true, so the construct serves as "please crash
here", which is no better than BUG().  It's been spreading, there are
like 25 now.

> +		return ERR_PTR(-ENOENT);

The caller that does expect ENOENT because that would be a logical error
should do the assert.

> +	}
>  
>  	next = node;
>  	/* setup backref node path for btrfs_reloc_cow_block */
> @@ -2196,7 +2202,10 @@ static int do_relocation(struct btrfs_trans_handle *trans,
>  
>  		upper = edge->node[UPPER];
>  		root = select_reloc_root(trans, rc, upper, edges);
> -		BUG_ON(!root);
> +		if (IS_ERR(root)) {
> +			ret = PTR_ERR(root);
> +			goto next;
> +		}
>  
>  		if (upper->eb && !upper->locked) {
>  			if (!lowest) {
> -- 
> 2.26.2

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 02/38] btrfs: return an error from btrfs_record_root_in_trans
  2020-12-16 16:26 ` [PATCH v7 02/38] btrfs: return an error from btrfs_record_root_in_trans Josef Bacik
@ 2021-02-26 19:03   ` David Sterba
  0 siblings, 0 replies; 45+ messages in thread
From: David Sterba @ 2021-02-26 19:03 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, Qu Wenruo, Johannes Thumshirn

On Wed, Dec 16, 2020 at 11:26:18AM -0500, Josef Bacik wrote:
> We can create a reloc root when we record the root in the trans, which
> can fail for all sorts of different reasons.  Propagate this error up
> the chain of callers.  Future patches will fix the callers of
> btrfs_record_root_in_trans() to handle the error.
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/transaction.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index f51f9e39bcee..eba48578159e 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -400,6 +400,7 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans,
>  			       int force)
>  {
>  	struct btrfs_fs_info *fs_info = root->fs_info;
> +	int ret = 0;
>  
>  	if ((test_bit(BTRFS_ROOT_SHAREABLE, &root->state) &&
>  	    root->last_trans < trans->transid) || force) {
> @@ -448,11 +449,11 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans,
>  		 * lock.  smp_wmb() makes sure that all the writes above are
>  		 * done before we pop in the zero below
>  		 */
> -		btrfs_init_reloc_root(trans, root);
> +		ret = btrfs_init_reloc_root(trans, root);

This is patch 2 from the series and got me curious if it's ok to add the
error value check here, because that would mean that the whole callgraph
from btrfs_init_reloc_root is also error handling clean (ie. no
BUG_ONs).

And it's not until patch 19.

btrfs_init_reloc_root
  create_reloc_root
    kmalloc + BUG_ON
    btrfs_copy_root + BUG_ON, twice
    btrfs_insert_root + BUG_ON
    btrfs_read_tree_root + BUG_ON
  __add_reloc_root
    ...

All the patches in between add handling the record_root_in_trans errors,
which is fine as end result, but the proper error handling needs to be
pushed upwards from all leaf functions. That way it's cleaner and an
understandable pattern as we can review one step in the callgraph,
assuming that the calless are OK.

After reading the whole patchset it looks like the end result is more
or less what it should be but it's not a sequence of reviewable steps
patch-to-patch.

So it's patch ordering and maybe some context fixups, where the leaf
functions are BUG_ON-free and then all individual callers are updated.

Roughly in that order:

- __add_reloc_root
- create_reloc_root
- btrfs_init_reloc_root
- record_root_in_trans
- select_reloc_root

>  		smp_mb__before_atomic();
>  		clear_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state);
>  	}
> -	return 0;
> +	return ret;
>  }
>  
>  
> -- 
> 2.26.2

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 03/38] btrfs: handle errors from select_reloc_root()
  2021-02-26 18:30   ` David Sterba
@ 2021-03-11 18:10     ` Josef Bacik
  2021-04-09 19:24       ` David Sterba
  0 siblings, 1 reply; 45+ messages in thread
From: Josef Bacik @ 2021-03-11 18:10 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On 2/26/21 1:30 PM, David Sterba wrote:
> On Wed, Dec 16, 2020 at 11:26:19AM -0500, Josef Bacik wrote:
>> Currently select_reloc_root() doesn't return an error, but followup
>> patches will make it possible for it to return an error.  We do have
>> proper error recovery in do_relocation however, so handle the
>> possibility of select_reloc_root() having an error properly instead of
>> BUG_ON(!root).  I've also adjusted select_reloc_root() to return
>> ERR_PTR(-ENOENT) if we don't find a root, instead of NULL, to make the
>> error case easier to deal with.  I've replaced the BUG_ON(!root) with an
>> ASSERT(0) for this case as it indicates we messed up the backref walking
>> code, but it could also indicate corruption.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>   fs/btrfs/relocation.c | 15 ++++++++++++---
>>   1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 08ffaa93b78f..741068580455 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -2024,8 +2024,14 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
>>   		if (!next || next->level <= node->level)
>>   			break;
>>   	}
>> -	if (!root)
>> -		return NULL;
>> +	if (!root) {
>> +		/*
>> +		 * This can happen if there's fs corruption or if there's a bug
>> +		 * in the backref lookup code.
>> +		 */
>> +		ASSERT(0);
> 
> You've added these assert(0) to several patches and I think it's wrong.
> The asserts are supposed to verify invariants, you can hardly say that
> we're expecting 0 to be true, so the construct serves as "please crash
> here", which is no better than BUG().  It's been spreading, there are
> like 25 now.

They are much better than a BUG_ON(), as they won't trip in production, they'll 
only trip for developers.  And we need them in many of these cases, and this 
example you've called out is a clear example of where we absolutely want to 
differentiate, because we have 3 different failure modes that will return 
-EUCLEAN.  If I add a ASSERT(ret != -EUCLEAN) to all of the callers then when 
the developer hits a logic bug they'll have to go and manually add in their own 
assert to figure out which error condition failed.  Instead I've added 
explanations in the comments for each assert and then have an assert for every 
error condition so that it's clear where things went wrong.

So these ASSERT()'s are staying.  I could be convinced to change them to match 
their error condition, so in the above case instead I would be fine changing it to

if (!root) {
	ASSERT(root);
...

If that's more acceptable let me know, I'll change them to match their error 
condition.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 03/38] btrfs: handle errors from select_reloc_root()
  2021-03-11 18:10     ` Josef Bacik
@ 2021-04-09 19:24       ` David Sterba
  0 siblings, 0 replies; 45+ messages in thread
From: David Sterba @ 2021-04-09 19:24 UTC (permalink / raw)
  To: Josef Bacik; +Cc: dsterba, linux-btrfs, kernel-team

On Thu, Mar 11, 2021 at 01:10:03PM -0500, Josef Bacik wrote:
> On 2/26/21 1:30 PM, David Sterba wrote:
> > On Wed, Dec 16, 2020 at 11:26:19AM -0500, Josef Bacik wrote:
> >> Currently select_reloc_root() doesn't return an error, but followup
> >> patches will make it possible for it to return an error.  We do have
> >> proper error recovery in do_relocation however, so handle the
> >> possibility of select_reloc_root() having an error properly instead of
> >> BUG_ON(!root).  I've also adjusted select_reloc_root() to return
> >> ERR_PTR(-ENOENT) if we don't find a root, instead of NULL, to make the
> >> error case easier to deal with.  I've replaced the BUG_ON(!root) with an
> >> ASSERT(0) for this case as it indicates we messed up the backref walking
> >> code, but it could also indicate corruption.
> >>
> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> >> ---
> >>   fs/btrfs/relocation.c | 15 ++++++++++++---
> >>   1 file changed, 12 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> >> index 08ffaa93b78f..741068580455 100644
> >> --- a/fs/btrfs/relocation.c
> >> +++ b/fs/btrfs/relocation.c
> >> @@ -2024,8 +2024,14 @@ struct btrfs_root *select_reloc_root(struct btrfs_trans_handle *trans,
> >>   		if (!next || next->level <= node->level)
> >>   			break;
> >>   	}
> >> -	if (!root)
> >> -		return NULL;
> >> +	if (!root) {
> >> +		/*
> >> +		 * This can happen if there's fs corruption or if there's a bug
> >> +		 * in the backref lookup code.
> >> +		 */
> >> +		ASSERT(0);
> > 
> > You've added these assert(0) to several patches and I think it's wrong.
> > The asserts are supposed to verify invariants, you can hardly say that
> > we're expecting 0 to be true, so the construct serves as "please crash
> > here", which is no better than BUG().  It's been spreading, there are
> > like 25 now.
> 
> They are much better than a BUG_ON(), as they won't trip in production, they'll 
> only trip for developers.

I'm not suggesting to use BUG_ON, only in rare cases (ideally). For
developers what should blow up are conditions that validate the
invariants, either in advance or after some action.

> And we need them in many of these cases, and this 
> example you've called out is a clear example of where we absolutely want to 
> differentiate, because we have 3 different failure modes that will return 
> -EUCLEAN.  If I add a ASSERT(ret != -EUCLEAN) to all of the callers then when 
> the developer hits a logic bug they'll have to go and manually add in their own 
> assert to figure out which error condition failed.

The idea is to add more conditions to differentiate if it's corrupted fs
or if it's a development bug. But for testing I'd rather see a way we
can exercise the corruption path, eg. fuzzing or crafted images, up to
the point where EUCLEAN is encountered by some transaction abort or
normal error.

> Instead I've added 
> explanations in the comments for each assert and then have an assert for every 
> error condition so that it's clear where things went wrong.

There can be exceptional cases where distinguishing can't be done easily
or at all so the comments are fine but I'd rather not encourage the
ASSERT(0) pattern instead of thinking hard if it's really the right way
to handle the errors. It too much resembles the BUG_ON() anti-pattern.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2021-04-09 19:26 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-16 16:26 [PATCH v7 00/38] Cleanup error handling in relocation Josef Bacik
2020-12-16 16:26 ` [PATCH v7 01/38] btrfs: convert BUG_ON()'s in relocate_tree_block Josef Bacik
2020-12-16 16:26 ` [PATCH v7 02/38] btrfs: return an error from btrfs_record_root_in_trans Josef Bacik
2021-02-26 19:03   ` David Sterba
2020-12-16 16:26 ` [PATCH v7 03/38] btrfs: handle errors from select_reloc_root() Josef Bacik
2021-02-26 18:30   ` David Sterba
2021-03-11 18:10     ` Josef Bacik
2021-04-09 19:24       ` David Sterba
2020-12-16 16:26 ` [PATCH v7 04/38] btrfs: convert BUG_ON()'s in select_reloc_root() to proper errors Josef Bacik
2020-12-16 16:26 ` [PATCH v7 05/38] btrfs: check record_root_in_trans related failures in select_reloc_root Josef Bacik
2020-12-16 16:26 ` [PATCH v7 06/38] btrfs: do proper error handling in record_reloc_root_in_trans Josef Bacik
2020-12-16 16:26 ` [PATCH v7 07/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename_exchange Josef Bacik
2020-12-16 16:26 ` [PATCH v7 08/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename Josef Bacik
2020-12-16 16:26 ` [PATCH v7 09/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_delete_subvolume Josef Bacik
2020-12-16 16:26 ` [PATCH v7 10/38] btrfs: handle btrfs_record_root_in_trans failure in btrfs_recover_log_trees Josef Bacik
2020-12-16 16:26 ` [PATCH v7 11/38] btrfs: handle btrfs_record_root_in_trans failure in create_subvol Josef Bacik
2020-12-16 16:26 ` [PATCH v7 12/38] btrfs: btrfs: handle btrfs_record_root_in_trans failure in relocate_tree_block Josef Bacik
2020-12-16 16:26 ` [PATCH v7 13/38] btrfs: handle btrfs_record_root_in_trans failure in start_transaction Josef Bacik
2020-12-16 16:26 ` [PATCH v7 14/38] btrfs: handle record_root_in_trans failure in qgroup_account_snapshot Josef Bacik
2020-12-16 16:26 ` [PATCH v7 15/38] btrfs: handle record_root_in_trans failure in btrfs_record_root_in_trans Josef Bacik
2020-12-16 16:26 ` [PATCH v7 16/38] btrfs: handle record_root_in_trans failure in create_pending_snapshot Josef Bacik
2020-12-16 16:26 ` [PATCH v7 17/38] btrfs: do not panic in __add_reloc_root Josef Bacik
2020-12-16 16:26 ` [PATCH v7 18/38] btrfs: have proper error handling in btrfs_init_reloc_root Josef Bacik
2020-12-16 16:26 ` [PATCH v7 19/38] btrfs: do proper error handling in create_reloc_root Josef Bacik
2020-12-16 16:26 ` [PATCH v7 20/38] btrfs: validate ->reloc_root after recording root in trans Josef Bacik
2020-12-16 16:26 ` [PATCH v7 21/38] btrfs: handle btrfs_update_reloc_root failure in commit_fs_roots Josef Bacik
2020-12-16 16:26 ` [PATCH v7 22/38] btrfs: change insert_dirty_subvol to return errors Josef Bacik
2020-12-16 16:26 ` [PATCH v7 23/38] btrfs: handle btrfs_update_reloc_root failure in insert_dirty_subvol Josef Bacik
2020-12-16 16:26 ` [PATCH v7 24/38] btrfs: handle btrfs_update_reloc_root failure in prepare_to_merge Josef Bacik
2020-12-16 16:26 ` [PATCH v7 25/38] btrfs: do proper error handling in btrfs_update_reloc_root Josef Bacik
2020-12-16 16:26 ` [PATCH v7 26/38] btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s Josef Bacik
2020-12-16 16:26 ` [PATCH v7 27/38] btrfs: handle btrfs_cow_block errors in replace_path Josef Bacik
2020-12-16 16:26 ` [PATCH v7 28/38] btrfs: handle btrfs_search_slot failure " Josef Bacik
2020-12-16 16:26 ` [PATCH v7 29/38] btrfs: handle errors in reference count manipulation " Josef Bacik
2020-12-16 16:26 ` [PATCH v7 30/38] btrfs: handle extent reference errors in do_relocation Josef Bacik
2020-12-16 16:26 ` [PATCH v7 31/38] btrfs: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly Josef Bacik
2020-12-16 16:26 ` [PATCH v7 32/38] btrfs: remove the extent item sanity checks in relocate_block_group Josef Bacik
2020-12-16 16:26 ` [PATCH v7 33/38] btrfs: do proper error handling in create_reloc_inode Josef Bacik
2020-12-16 16:26 ` [PATCH v7 34/38] btrfs: handle __add_reloc_root failures in btrfs_recover_relocation Josef Bacik
2020-12-16 16:26 ` [PATCH v7 35/38] btrfs: cleanup error handling in prepare_to_merge Josef Bacik
2020-12-16 16:26 ` [PATCH v7 36/38] btrfs: handle extent corruption with select_one_root properly Josef Bacik
2020-12-16 16:26 ` [PATCH v7 37/38] btrfs: do proper error handling in merge_reloc_roots Josef Bacik
2020-12-16 16:26 ` [PATCH v7 38/38] btrfs: check return value of btrfs_commit_transaction in relocation Josef Bacik
2020-12-16 19:56 ` [PATCH v7 00/38] Cleanup error handling " Zygo Blaxell
2020-12-18 16:25   ` Zygo Blaxell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.