All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations
@ 2023-06-07 19:24 fdmanana
  2023-06-07 19:24 ` [PATCH 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer fdmanana
                   ` (13 more replies)
  0 siblings, 14 replies; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

This mostly helps avoid some unnecessary enomem failures when logging
tree mod log operations and replace some BUG_ON()'s when dealing with
such failures. There's also 2 bug fixes (the first two patches) and
some cleanups. More details on the changelogs.

Filipe Manana (13):
  btrfs: add missing error handling when logging operation while COWing extent buffer
  btrfs: fix extent buffer leak after failure tree mod log failure at split_node()
  btrfs: avoid tree mod log ENOMEM failures when we don't need to log
  btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()
  btrfs: do not BUG_ON() on tree mod log failure at balance_level()
  btrfs: rename enospc label to out at balance_level()
  btrfs: avoid unnecessarily setting the fs to RO and error state at balance_level()
  btrfs: abort transaction at balance_level() when left child is missing
  btrfs: abort transaction at update_ref_for_cow() when ref count is zero
  btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
  btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()
  btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
  btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr()

 fs/btrfs/ctree.c        | 204 ++++++++++++++++++++++++++++------------
 fs/btrfs/ctree.h        |   4 +-
 fs/btrfs/tree-mod-log.c | 148 ++++++++++++++++++++++-------
 3 files changed, 262 insertions(+), 94 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  9:25   ` Qu Wenruo
  2023-06-07 19:24 ` [PATCH 02/13] btrfs: fix extent buffer leak after failure tree mod log failure at split_node() fdmanana
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When COWing an extent buffer that is not the root node, we need to log in
the tree mod log that we replaced a pointer in the parent node, otherwise
a tree mod log user doing a search on the b+tree can return incorrect
results (that miss something). We are doing the call to
btrfs_tree_mod_log_insert_key() but we totally ignore its return value.

So fix this by adding the missing error handling, resulting in a
transaction abort and freeing the COWed extent buffer.

Fixes: f230475e62f7 ("Btrfs: put all block modifications into the tree mod log")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 385524224037..7f7f13965fe9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -595,8 +595,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 		add_root_to_dirty_list(root);
 	} else {
 		WARN_ON(trans->transid != btrfs_header_generation(parent));
-		btrfs_tree_mod_log_insert_key(parent, parent_slot,
-					      BTRFS_MOD_LOG_KEY_REPLACE);
+		ret = btrfs_tree_mod_log_insert_key(parent, parent_slot,
+						    BTRFS_MOD_LOG_KEY_REPLACE);
+		if (ret) {
+			btrfs_tree_unlock(cow);
+			free_extent_buffer(cow);
+			btrfs_abort_transaction(trans, ret);
+			return ret;
+		}
 		btrfs_set_node_blockptr(parent, parent_slot,
 					cow->start);
 		btrfs_set_node_ptr_generation(parent, parent_slot,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 02/13] btrfs: fix extent buffer leak after failure tree mod log failure at split_node()
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
  2023-06-07 19:24 ` [PATCH 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  8:40   ` Qu Wenruo
  2023-06-07 19:24 ` [PATCH 03/13] btrfs: avoid tree mod log ENOMEM failures when we don't need to log fdmanana
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At split_node(), if we fail to log the tree mod log copy operation, we
return without unlocking the split extent buffer we just allocated and
without decrementing the reference we own on it. Fix this by unlocking
it and decrementing the ref count before returning.

Fixes: 5de865eebb83 ("Btrfs: fix tree mod logging")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 7f7f13965fe9..8496535828de 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3053,6 +3053,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_tree_mod_log_eb_copy(split, c, 0, mid, c_nritems - mid);
 	if (ret) {
+		btrfs_tree_unlock(split);
+		free_extent_buffer(split);
 		btrfs_abort_transaction(trans, ret);
 		return ret;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 03/13] btrfs: avoid tree mod log ENOMEM failures when we don't need to log
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
  2023-06-07 19:24 ` [PATCH 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer fdmanana
  2023-06-07 19:24 ` [PATCH 02/13] btrfs: fix extent buffer leak after failure tree mod log failure at split_node() fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-07 19:24 ` [PATCH 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block() fdmanana
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When logging tree mod log operations we start by checking, in a lockless
manner, if we need to log - if we don't, we just return and do nothing,
otherwise we will allocate one or more tree mod log operations and then
check again if we need to log. This second check will take the tree mod
log lock in write mode if we need to log, otherwise it will do nothing
and we just free the allocated memory and return success.

We can improve on this by not returning an error in case the memory
allocations fail, unless the second check tells us that we actually need
to log. That is, if we fail to allocate memory and the second check tells
use that we don't need to log, we can just return success and avoid
returning -ENOMEM to the caller. Currently tree mod log failures are
dealt with either a BUG_ON() or a transaction abort, as tree mod log
operations are logged in code paths that modify a b+tree.

So just avoid failing with -ENOMEM if we fail to allocate a tree mod log
operation unless we actually need to log the operations, that is, if
tree_mod_dont_log() returns true.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-mod-log.c | 148 +++++++++++++++++++++++++++++++---------
 1 file changed, 114 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
index 07c086f9e35e..3df6153d5d5a 100644
--- a/fs/btrfs/tree-mod-log.c
+++ b/fs/btrfs/tree-mod-log.c
@@ -226,21 +226,32 @@ int btrfs_tree_mod_log_insert_key(struct extent_buffer *eb, int slot,
 				  enum btrfs_mod_log_op op)
 {
 	struct tree_mod_elem *tm;
-	int ret;
+	int ret = 0;
 
 	if (!tree_mod_need_log(eb->fs_info, eb))
 		return 0;
 
 	tm = alloc_tree_mod_elem(eb, slot, op);
 	if (!tm)
-		return -ENOMEM;
+		ret = -ENOMEM;
 
 	if (tree_mod_dont_log(eb->fs_info, eb)) {
 		kfree(tm);
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
 		return 0;
+	} else if (ret != 0) {
+		/*
+		 * We previously failed to allocate memory and we need to log,
+		 * so we have to fail.
+		 */
+		goto out_unlock;
 	}
 
 	ret = tree_mod_log_insert(eb->fs_info, tm);
+out_unlock:
 	write_unlock(&eb->fs_info->tree_mod_log_lock);
 	if (ret)
 		kfree(tm);
@@ -282,14 +293,16 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
 		return 0;
 
 	tm_list = kcalloc(nr_items, sizeof(struct tree_mod_elem *), GFP_NOFS);
-	if (!tm_list)
-		return -ENOMEM;
+	if (!tm_list) {
+		ret = -ENOMEM;
+		goto lock;
+	}
 
 	tm = tree_mod_log_alloc_move(eb, dst_slot, src_slot, nr_items);
 	if (IS_ERR(tm)) {
 		ret = PTR_ERR(tm);
 		tm = NULL;
-		goto free_tms;
+		goto lock;
 	}
 
 	for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
@@ -297,14 +310,28 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
 				BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING);
 		if (!tm_list[i]) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 	}
 
-	if (tree_mod_dont_log(eb->fs_info, eb))
+lock:
+	if (tree_mod_dont_log(eb->fs_info, eb)) {
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
+		ret = 0;
 		goto free_tms;
+	}
 	locked = true;
 
+	/*
+	 * We previously failed to allocate memory and we need to log, so we
+	 * have to fail.
+	 */
+	if (ret != 0)
+		goto free_tms;
+
 	/*
 	 * When we override something during the move, we log these removals.
 	 * This can only happen when we move towards the beginning of the
@@ -325,10 +352,12 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
 	return 0;
 
 free_tms:
-	for (i = 0; i < nr_items; i++) {
-		if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
-			rb_erase(&tm_list[i]->node, &eb->fs_info->tree_mod_log);
-		kfree(tm_list[i]);
+	if (tm_list) {
+		for (i = 0; i < nr_items; i++) {
+			if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+				rb_erase(&tm_list[i]->node, &eb->fs_info->tree_mod_log);
+			kfree(tm_list[i]);
+		}
 	}
 	if (locked)
 		write_unlock(&eb->fs_info->tree_mod_log_lock);
@@ -378,14 +407,14 @@ int btrfs_tree_mod_log_insert_root(struct extent_buffer *old_root,
 				  GFP_NOFS);
 		if (!tm_list) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 		for (i = 0; i < nritems; i++) {
 			tm_list[i] = alloc_tree_mod_elem(old_root, i,
 			    BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING);
 			if (!tm_list[i]) {
 				ret = -ENOMEM;
-				goto free_tms;
+				goto lock;
 			}
 		}
 	}
@@ -393,7 +422,7 @@ int btrfs_tree_mod_log_insert_root(struct extent_buffer *old_root,
 	tm = kzalloc(sizeof(*tm), GFP_NOFS);
 	if (!tm) {
 		ret = -ENOMEM;
-		goto free_tms;
+		goto lock;
 	}
 
 	tm->logical = new_root->start;
@@ -402,14 +431,28 @@ int btrfs_tree_mod_log_insert_root(struct extent_buffer *old_root,
 	tm->generation = btrfs_header_generation(old_root);
 	tm->op = BTRFS_MOD_LOG_ROOT_REPLACE;
 
-	if (tree_mod_dont_log(fs_info, NULL))
+lock:
+	if (tree_mod_dont_log(fs_info, NULL)) {
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
+		ret = 0;
 		goto free_tms;
+	} else if (ret != 0) {
+		/*
+		 * We previously failed to allocate memory and we need to log,
+		 * so we have to fail.
+		 */
+		goto out_unlock;
+	}
 
 	if (tm_list)
 		ret = tree_mod_log_free_eb(fs_info, tm_list, nritems);
 	if (!ret)
 		ret = tree_mod_log_insert(fs_info, tm);
 
+out_unlock:
 	write_unlock(&fs_info->tree_mod_log_lock);
 	if (ret)
 		goto free_tms;
@@ -501,7 +544,8 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 	struct btrfs_fs_info *fs_info = dst->fs_info;
 	int ret = 0;
 	struct tree_mod_elem **tm_list = NULL;
-	struct tree_mod_elem **tm_list_add, **tm_list_rem;
+	struct tree_mod_elem **tm_list_add = NULL;
+	struct tree_mod_elem **tm_list_rem = NULL;
 	int i;
 	bool locked = false;
 	struct tree_mod_elem *dst_move_tm = NULL;
@@ -517,8 +561,10 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 
 	tm_list = kcalloc(nr_items * 2, sizeof(struct tree_mod_elem *),
 			  GFP_NOFS);
-	if (!tm_list)
-		return -ENOMEM;
+	if (!tm_list) {
+		ret = -ENOMEM;
+		goto lock;
+	}
 
 	if (dst_move_nr_items) {
 		dst_move_tm = tree_mod_log_alloc_move(dst, dst_offset + nr_items,
@@ -526,7 +572,7 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 		if (IS_ERR(dst_move_tm)) {
 			ret = PTR_ERR(dst_move_tm);
 			dst_move_tm = NULL;
-			goto free_tms;
+			goto lock;
 		}
 	}
 	if (src_move_nr_items) {
@@ -536,7 +582,7 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 		if (IS_ERR(src_move_tm)) {
 			ret = PTR_ERR(src_move_tm);
 			src_move_tm = NULL;
-			goto free_tms;
+			goto lock;
 		}
 	}
 
@@ -547,21 +593,35 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 						     BTRFS_MOD_LOG_KEY_REMOVE);
 		if (!tm_list_rem[i]) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 
 		tm_list_add[i] = alloc_tree_mod_elem(dst, i + dst_offset,
 						     BTRFS_MOD_LOG_KEY_ADD);
 		if (!tm_list_add[i]) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 	}
 
-	if (tree_mod_dont_log(fs_info, NULL))
+lock:
+	if (tree_mod_dont_log(fs_info, NULL)) {
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
+		ret = 0;
 		goto free_tms;
+	}
 	locked = true;
 
+	/*
+	 * We previously failed to allocate memory and we need to log, so we
+	 * have to fail.
+	 */
+	if (ret != 0)
+		goto free_tms;
+
 	if (dst_move_tm) {
 		ret = tree_mod_log_insert(fs_info, dst_move_tm);
 		if (ret)
@@ -593,10 +653,12 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 	if (src_move_tm && !RB_EMPTY_NODE(&src_move_tm->node))
 		rb_erase(&src_move_tm->node, &fs_info->tree_mod_log);
 	kfree(src_move_tm);
-	for (i = 0; i < nr_items * 2; i++) {
-		if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
-			rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
-		kfree(tm_list[i]);
+	if (tm_list) {
+		for (i = 0; i < nr_items * 2; i++) {
+			if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+				rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
+			kfree(tm_list[i]);
+		}
 	}
 	if (locked)
 		write_unlock(&fs_info->tree_mod_log_lock);
@@ -617,22 +679,38 @@ int btrfs_tree_mod_log_free_eb(struct extent_buffer *eb)
 
 	nritems = btrfs_header_nritems(eb);
 	tm_list = kcalloc(nritems, sizeof(struct tree_mod_elem *), GFP_NOFS);
-	if (!tm_list)
-		return -ENOMEM;
+	if (!tm_list) {
+		ret = -ENOMEM;
+		goto lock;
+	}
 
 	for (i = 0; i < nritems; i++) {
 		tm_list[i] = alloc_tree_mod_elem(eb, i,
 				    BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING);
 		if (!tm_list[i]) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 	}
 
-	if (tree_mod_dont_log(eb->fs_info, eb))
+lock:
+	if (tree_mod_dont_log(eb->fs_info, eb)) {
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
+		ret = 0;
 		goto free_tms;
+	} else if (ret != 0) {
+		/*
+		 * We previously failed to allocate memory and we need to log,
+		 * so we have to fail.
+		 */
+		goto out_unlock;
+	}
 
 	ret = tree_mod_log_free_eb(eb->fs_info, tm_list, nritems);
+out_unlock:
 	write_unlock(&eb->fs_info->tree_mod_log_lock);
 	if (ret)
 		goto free_tms;
@@ -641,9 +719,11 @@ int btrfs_tree_mod_log_free_eb(struct extent_buffer *eb)
 	return 0;
 
 free_tms:
-	for (i = 0; i < nritems; i++)
-		kfree(tm_list[i]);
-	kfree(tm_list);
+	if (tm_list) {
+		for (i = 0; i < nritems; i++)
+			kfree(tm_list[i]);
+		kfree(tm_list);
+	}
 
 	return ret;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (2 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 03/13] btrfs: avoid tree mod log ENOMEM failures when we don't need to log fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  8:48   ` Qu Wenruo
  2023-06-07 19:24 ` [PATCH 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level() fdmanana
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At __btrfs_cow_block(), instead of doing a BUG_ON() in case we fail to
record a tree mod log root insertion operation, do a transaction abort
instead. There's really no need for the BUG_ON(), we can properly
release all resources in this context and turn the filesystem to RO mode
and in an error state instead.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 8496535828de..d6c29564ce49 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -584,9 +584,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 		    btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
 			parent_start = buf->start;
 
-		atomic_inc(&cow->refs);
 		ret = btrfs_tree_mod_log_insert_root(root->node, cow, true);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_tree_unlock(cow);
+			free_extent_buffer(cow);
+			btrfs_abort_transaction(trans, ret);
+			return ret;
+		}
+		atomic_inc(&cow->refs);
 		rcu_assign_pointer(root->node, cow);
 
 		btrfs_free_tree_block(trans, btrfs_root_id(root), buf,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level()
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (3 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block() fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  8:52   ` Qu Wenruo
  2023-06-07 19:24 ` [PATCH 06/13] btrfs: rename enospc label to out " fdmanana
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At balance_level(), instead of doing a BUG_ON() in case we fail to record
tree mod log operations, do a transaction abort and return the error to
the callers. There's really no need for the BUG_ON() as we can release
all resources in this context, and we have to abort because other future
tree searches that use the tree mod log (btrfs_search_old_slot()) may get
inconsistent results if other operations modify the tree after that
failure and before the tree mod log based search.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index d6c29564ce49..d60b28c6bd1b 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1054,7 +1054,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		}
 
 		ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_tree_unlock(child);
+			free_extent_buffer(child);
+			btrfs_abort_transaction(trans, ret);
+			goto enospc;
+		}
 		rcu_assign_pointer(root->node, child);
 
 		add_root_to_dirty_list(root);
@@ -1142,7 +1147,10 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 			btrfs_node_key(right, &right_key, 0);
 			ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
 					BTRFS_MOD_LOG_KEY_REPLACE);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_abort_transaction(trans, ret);
+				goto enospc;
+			}
 			btrfs_set_node_key(parent, &right_key, pslot + 1);
 			btrfs_mark_buffer_dirty(parent);
 		}
@@ -1188,7 +1196,10 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		btrfs_node_key(mid, &mid_key, 0);
 		ret = btrfs_tree_mod_log_insert_key(parent, pslot,
 						    BTRFS_MOD_LOG_KEY_REPLACE);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_abort_transaction(trans, ret);
+			goto enospc;
+		}
 		btrfs_set_node_key(parent, &mid_key, pslot);
 		btrfs_mark_buffer_dirty(parent);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 06/13] btrfs: rename enospc label to out at balance_level()
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (4 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level() fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  8:53   ` Qu Wenruo
  2023-06-08  9:21   ` Anand Jain
  2023-06-07 19:24 ` [PATCH 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state " fdmanana
                   ` (7 subsequent siblings)
  13 siblings, 2 replies; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At balance_level() we have this 'enospc' label where we jump to in case
we get an error at several places. However that error is certainly not
-ENOSPC in call cases, it can be -EIO or -ENOMEM when reading a child
extent buffer for example, or -ENOMEM when trying to record tree mod log
operations. So to make this less confusing, rename the label to 'out'.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index d60b28c6bd1b..e98f9e205e25 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1041,7 +1041,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (IS_ERR(child)) {
 			ret = PTR_ERR(child);
 			btrfs_handle_fs_error(fs_info, ret, NULL);
-			goto enospc;
+			goto out;
 		}
 
 		btrfs_tree_lock(child);
@@ -1050,7 +1050,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (ret) {
 			btrfs_tree_unlock(child);
 			free_extent_buffer(child);
-			goto enospc;
+			goto out;
 		}
 
 		ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
@@ -1058,7 +1058,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 			btrfs_tree_unlock(child);
 			free_extent_buffer(child);
 			btrfs_abort_transaction(trans, ret);
-			goto enospc;
+			goto out;
 		}
 		rcu_assign_pointer(root->node, child);
 
@@ -1087,7 +1087,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (IS_ERR(left)) {
 			ret = PTR_ERR(left);
 			left = NULL;
-			goto enospc;
+			goto out;
 		}
 
 		__btrfs_tree_lock(left, BTRFS_NESTING_LEFT);
@@ -1096,7 +1096,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 				       BTRFS_NESTING_LEFT_COW);
 		if (wret) {
 			ret = wret;
-			goto enospc;
+			goto out;
 		}
 	}
 
@@ -1105,7 +1105,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (IS_ERR(right)) {
 			ret = PTR_ERR(right);
 			right = NULL;
-			goto enospc;
+			goto out;
 		}
 
 		__btrfs_tree_lock(right, BTRFS_NESTING_RIGHT);
@@ -1114,7 +1114,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 				       BTRFS_NESTING_RIGHT_COW);
 		if (wret) {
 			ret = wret;
-			goto enospc;
+			goto out;
 		}
 	}
 
@@ -1149,7 +1149,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 					BTRFS_MOD_LOG_KEY_REPLACE);
 			if (ret < 0) {
 				btrfs_abort_transaction(trans, ret);
-				goto enospc;
+				goto out;
 			}
 			btrfs_set_node_key(parent, &right_key, pslot + 1);
 			btrfs_mark_buffer_dirty(parent);
@@ -1168,12 +1168,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (!left) {
 			ret = -EROFS;
 			btrfs_handle_fs_error(fs_info, ret, NULL);
-			goto enospc;
+			goto out;
 		}
 		wret = balance_node_right(trans, mid, left);
 		if (wret < 0) {
 			ret = wret;
-			goto enospc;
+			goto out;
 		}
 		if (wret == 1) {
 			wret = push_node_left(trans, left, mid, 1);
@@ -1198,7 +1198,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 						    BTRFS_MOD_LOG_KEY_REPLACE);
 		if (ret < 0) {
 			btrfs_abort_transaction(trans, ret);
-			goto enospc;
+			goto out;
 		}
 		btrfs_set_node_key(parent, &mid_key, pslot);
 		btrfs_mark_buffer_dirty(parent);
@@ -1225,7 +1225,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 	if (orig_ptr !=
 	    btrfs_node_blockptr(path->nodes[level], path->slots[level]))
 		BUG();
-enospc:
+out:
 	if (right) {
 		btrfs_tree_unlock(right);
 		free_extent_buffer(right);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state at balance_level()
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (5 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 06/13] btrfs: rename enospc label to out " fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-07 19:24 ` [PATCH 08/13] btrfs: abort transaction at balance_level() when left child is missing fdmanana
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At balance_level(), when trying to promote a child node to a root node, if
we fail to read the child we call btrfs_handle_fs_error(), which turns the
fs to RO mode and sets it to error state as well, causing any ongoing
transaction to abort. This however is not necessary because at that point
we have not made any change yet at balance_level(), so any error reading
the child node does not leaves us in any inconsistent state. Therefore we
can just return the error to the caller.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e98f9e205e25..4dcdcf25c3fe 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1040,7 +1040,6 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		child = btrfs_read_node_slot(mid, 0);
 		if (IS_ERR(child)) {
 			ret = PTR_ERR(child);
-			btrfs_handle_fs_error(fs_info, ret, NULL);
 			goto out;
 		}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 08/13] btrfs: abort transaction at balance_level() when left child is missing
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (6 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state " fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  8:57   ` Qu Wenruo
  2023-06-07 19:24 ` [PATCH 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero fdmanana
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At balance_level() we are calling btrfs_handle_fs_error() when the middle
child only has 1 item and the left child is missing, however we can simply
use btrfs_abort_transaction(), which achieves the same purposes: to turn
the fs to error state, abort the current transaction and turn the fs to
RO mode. Besides that, btrfs_abort_transaction() also prints a stack trace
which makes it more useful.

Also, as this is an highly unexpected case and it's about a b+tree
inconsistency, change the error code from -EROFS to -EUCLEAN and tag the
if branch as 'unlikely'.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4dcdcf25c3fe..e2943047b01d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1164,9 +1164,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		 * otherwise we would have pulled some pointers from the
 		 * right
 		 */
-		if (!left) {
-			ret = -EROFS;
-			btrfs_handle_fs_error(fs_info, ret, NULL);
+		if (unlikely(!left)) {
+			ret = -EUCLEAN;
+			btrfs_abort_transaction(trans, ret);
 			goto out;
 		}
 		wret = balance_node_right(trans, mid, left);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (7 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 08/13] btrfs: abort transaction at balance_level() when left child is missing fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  8:58   ` Qu Wenruo
  2023-06-07 19:24 ` [PATCH 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert() fdmanana
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At update_ref_for_cow() we are calling btrfs_handle_fs_error() if we find
that the extent buffer has an unexpected ref count of zero, however we can
simply use btrfs_abort_transaction(), which achieves the same purposes: to
turn the fs to error state, abort the current transaction and turn the fs
to RO mode as well. Besides that, btrfs_abort_transaction() also prints a
stack trace which makes it more useful.

Also, as this is a very unexpected situation, indicating a serious
corruption/inconsistency, tag the if branch as 'unlikely' and set the
error code to -EUCLEAN instead of -EROFS.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e2943047b01d..2971e7d70d04 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -421,9 +421,9 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
 					       &refs, &flags);
 		if (ret)
 			return ret;
-		if (refs == 0) {
-			ret = -EROFS;
-			btrfs_handle_fs_error(fs_info, ret, NULL);
+		if (unlikely(refs == 0)) {
+			ret = -EUCLEAN;
+			btrfs_abort_transaction(trans, ret);
 			return ret;
 		}
 	} else {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (8 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  9:02   ` Qu Wenruo
  2023-06-07 19:24 ` [PATCH 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root() fdmanana
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At push_nodes_for_insert(), instead of doing a BUG_ON() in case we fail to
record tree mod log operations, do a transaction abort and return the
error to the caller. There's really no need for the BUG_ON() as we can
release all resources in this context, and we have to abort because other
future tree searches that use the tree mod log (btrfs_search_old_slot())
may get inconsistent results if other operations modify the tree after
that failure and before the tree mod log based search.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2971e7d70d04..e3c949fa136f 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1300,7 +1300,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
 			btrfs_node_key(mid, &disk_key, 0);
 			ret = btrfs_tree_mod_log_insert_key(parent, pslot,
 					BTRFS_MOD_LOG_KEY_REPLACE);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_tree_unlock(left);
+				free_extent_buffer(left);
+				btrfs_abort_transaction(trans, ret);
+				return ret;
+			}
 			btrfs_set_node_key(parent, &disk_key, pslot);
 			btrfs_mark_buffer_dirty(parent);
 			if (btrfs_header_nritems(left) > orig_slot) {
@@ -1355,7 +1360,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
 			btrfs_node_key(right, &disk_key, 0);
 			ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
 					BTRFS_MOD_LOG_KEY_REPLACE);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_tree_unlock(right);
+				free_extent_buffer(right);
+				btrfs_abort_transaction(trans, ret);
+				return ret;
+			}
 			btrfs_set_node_key(parent, &disk_key, pslot + 1);
 			btrfs_mark_buffer_dirty(parent);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (9 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert() fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  9:11   ` Qu Wenruo
  2023-06-07 19:24 ` [PATCH 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr() fdmanana
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At insert_new_root(), instead of doing a BUG_ON() in case we fail to
record the tree mod log operation, just return the error to the callers
after releasing the allocated tree block. At this point we haven't made
any changes to the b+tree, so we haven't left it in an inconsistent state
and therefore have no need to abort the transaction. All we need to do is
to unlock and free the extent buffer we just allocated with the purpose
of making it the new root.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e3c949fa136f..6e59343034d6 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2956,7 +2956,12 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
 
 	old = root->node;
 	ret = btrfs_tree_mod_log_insert_root(root->node, c, false);
-	BUG_ON(ret < 0);
+	if (ret < 0) {
+		btrfs_free_tree_block(trans, btrfs_root_id(root), c, 0, 1);
+		btrfs_tree_unlock(c);
+		free_extent_buffer(c);
+		return ret;
+	}
 	rcu_assign_pointer(root->node, c);
 
 	/* the super has an extra ref to root->node */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (10 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root() fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  9:16   ` Qu Wenruo
  2023-06-07 19:24 ` [PATCH 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr() fdmanana
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At insert_ptr(), instead of doing a BUG_ON() in case we fail to record
tree mod log operations, do a transaction abort and return the error to
the callers. There's really no need for the BUG_ON() as we can release all
resources in the context of all callers, and we have to abort because other
future tree searches that use the tree mod log (btrfs_search_old_slot())
may get inconsistent results if other operations modify the tree after
that failure and before the tree mod log based search.

This implies making insert_ptr() return an int instead of void, and making
all callers check for returned errors.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 68 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 6e59343034d6..f1fa89ae1184 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2982,10 +2982,10 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
  * slot and level indicate where you want the key to go, and
  * blocknr is the block the key points to.
  */
-static void insert_ptr(struct btrfs_trans_handle *trans,
-		       struct btrfs_path *path,
-		       struct btrfs_disk_key *key, u64 bytenr,
-		       int slot, int level)
+static int insert_ptr(struct btrfs_trans_handle *trans,
+		      struct btrfs_path *path,
+		      struct btrfs_disk_key *key, u64 bytenr,
+		      int slot, int level)
 {
 	struct extent_buffer *lower;
 	int nritems;
@@ -3001,7 +3001,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
 		if (level) {
 			ret = btrfs_tree_mod_log_insert_move(lower, slot + 1,
 					slot, nritems - slot);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_abort_transaction(trans, ret);
+				return ret;
+			}
 		}
 		memmove_extent_buffer(lower,
 			      btrfs_node_key_ptr_offset(lower, slot + 1),
@@ -3011,7 +3014,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
 	if (level) {
 		ret = btrfs_tree_mod_log_insert_key(lower, slot,
 						    BTRFS_MOD_LOG_KEY_ADD);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_abort_transaction(trans, ret);
+			return ret;
+		}
 	}
 	btrfs_set_node_key(lower, key, slot);
 	btrfs_set_node_blockptr(lower, slot, bytenr);
@@ -3019,6 +3025,8 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
 	btrfs_set_node_ptr_generation(lower, slot, trans->transid);
 	btrfs_set_header_nritems(lower, nritems + 1);
 	btrfs_mark_buffer_dirty(lower);
+
+	return 0;
 }
 
 /*
@@ -3098,8 +3106,13 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(c);
 	btrfs_mark_buffer_dirty(split);
 
-	insert_ptr(trans, path, &disk_key, split->start,
-		   path->slots[level + 1] + 1, level + 1);
+	ret = insert_ptr(trans, path, &disk_key, split->start,
+			 path->slots[level + 1] + 1, level + 1);
+	if (ret < 0) {
+		btrfs_tree_unlock(split);
+		free_extent_buffer(split);
+		return ret;
+	}
 
 	if (path->slots[level] >= mid) {
 		path->slots[level] -= mid;
@@ -3576,16 +3589,17 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
  * split the path's leaf in two, making sure there is at least data_size
  * available for the resulting leaf level of the path.
  */
-static noinline void copy_for_split(struct btrfs_trans_handle *trans,
-				    struct btrfs_path *path,
-				    struct extent_buffer *l,
-				    struct extent_buffer *right,
-				    int slot, int mid, int nritems)
+static noinline int copy_for_split(struct btrfs_trans_handle *trans,
+				   struct btrfs_path *path,
+				   struct extent_buffer *l,
+				   struct extent_buffer *right,
+				   int slot, int mid, int nritems)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int data_copy_size;
 	int rt_data_off;
 	int i;
+	int ret;
 	struct btrfs_disk_key disk_key;
 	struct btrfs_map_token token;
 
@@ -3610,7 +3624,9 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
 
 	btrfs_set_header_nritems(l, mid);
 	btrfs_item_key(right, &disk_key, 0);
-	insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
+	ret = insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
+	if (ret < 0)
+		return ret;
 
 	btrfs_mark_buffer_dirty(right);
 	btrfs_mark_buffer_dirty(l);
@@ -3628,6 +3644,8 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
 	}
 
 	BUG_ON(path->slots[0] < 0);
+
+	return 0;
 }
 
 /*
@@ -3826,8 +3844,13 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
 	if (split == 0) {
 		if (mid <= slot) {
 			btrfs_set_header_nritems(right, 0);
-			insert_ptr(trans, path, &disk_key,
-				   right->start, path->slots[1] + 1, 1);
+			ret = insert_ptr(trans, path, &disk_key,
+					 right->start, path->slots[1] + 1, 1);
+			if (ret < 0) {
+				btrfs_tree_unlock(right);
+				free_extent_buffer(right);
+				return ret;
+			}
 			btrfs_tree_unlock(path->nodes[0]);
 			free_extent_buffer(path->nodes[0]);
 			path->nodes[0] = right;
@@ -3835,8 +3858,13 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
 			path->slots[1] += 1;
 		} else {
 			btrfs_set_header_nritems(right, 0);
-			insert_ptr(trans, path, &disk_key,
-				   right->start, path->slots[1], 1);
+			ret = insert_ptr(trans, path, &disk_key,
+					 right->start, path->slots[1], 1);
+			if (ret < 0) {
+				btrfs_tree_unlock(right);
+				free_extent_buffer(right);
+				return ret;
+			}
 			btrfs_tree_unlock(path->nodes[0]);
 			free_extent_buffer(path->nodes[0]);
 			path->nodes[0] = right;
@@ -3852,7 +3880,9 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
 		return ret;
 	}
 
-	copy_for_split(trans, path, l, right, slot, mid, nritems);
+	ret = copy_for_split(trans, path, l, right, slot, mid, nritems);
+	if (ret < 0)
+		return ret;
 
 	if (split == 2) {
 		BUG_ON(num_doubles != 0);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr()
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (11 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr() fdmanana
@ 2023-06-07 19:24 ` fdmanana
  2023-06-08  9:19   ` Qu Wenruo
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
  13 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-07 19:24 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At btrfs_del_ptr(), instead of doing a BUG_ON() in case we fail to record
tree mod log operations, do a transaction abort and return the error to
the callers. There's really no need for the BUG_ON() as we can release all
resources in the context of all callers, and we have to abort because other
future tree searches that use the tree mod log (btrfs_search_old_slot())
may get inconsistent results if other operations modify the tree after
that failure and before the tree mod log based search.

This implies btrfs_del_ptr() return an int instead of void, and making all
callers check for returned errors.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 46 +++++++++++++++++++++++++++++++++-------------
 fs/btrfs/ctree.h |  4 ++--
 2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index f1fa89ae1184..7220c8736218 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1135,7 +1135,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (btrfs_header_nritems(right) == 0) {
 			btrfs_clear_buffer_dirty(trans, right);
 			btrfs_tree_unlock(right);
-			btrfs_del_ptr(root, path, level + 1, pslot + 1);
+			ret = btrfs_del_ptr(trans, root, path, level + 1, pslot + 1);
+			if (ret < 0)
+				goto out;
 			root_sub_used(root, right->len);
 			btrfs_free_tree_block(trans, btrfs_root_id(root), right,
 					      0, 1);
@@ -1184,7 +1186,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 	if (btrfs_header_nritems(mid) == 0) {
 		btrfs_clear_buffer_dirty(trans, mid);
 		btrfs_tree_unlock(mid);
-		btrfs_del_ptr(root, path, level + 1, pslot);
+		ret = btrfs_del_ptr(trans, root, path, level + 1, pslot);
+		if (ret < 0)
+			goto out;
 		root_sub_used(root, mid->len);
 		btrfs_free_tree_block(trans, btrfs_root_id(root), mid, 0, 1);
 		free_extent_buffer_stale(mid);
@@ -4429,8 +4433,8 @@ int btrfs_duplicate_item(struct btrfs_trans_handle *trans,
  *
  * This is exported for use inside btrfs-progs, don't un-export it.
  */
-void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
-		   int slot)
+int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+		  struct btrfs_path *path, int level, int slot)
 {
 	struct extent_buffer *parent = path->nodes[level];
 	u32 nritems;
@@ -4441,7 +4445,10 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
 		if (level) {
 			ret = btrfs_tree_mod_log_insert_move(parent, slot,
 					slot + 1, nritems - slot - 1);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_abort_transaction(trans, ret);
+				return ret;
+			}
 		}
 		memmove_extent_buffer(parent,
 			      btrfs_node_key_ptr_offset(parent, slot),
@@ -4451,7 +4458,10 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
 	} else if (level) {
 		ret = btrfs_tree_mod_log_insert_key(parent, slot,
 						    BTRFS_MOD_LOG_KEY_REMOVE);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_abort_transaction(trans, ret);
+			return ret;
+		}
 	}
 
 	nritems--;
@@ -4467,6 +4477,7 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
 		fixup_low_keys(path, &disk_key, level + 1);
 	}
 	btrfs_mark_buffer_dirty(parent);
+	return 0;
 }
 
 /*
@@ -4479,13 +4490,17 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
  * The path must have already been setup for deleting the leaf, including
  * all the proper balancing.  path->nodes[1] must be locked.
  */
-static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans,
-				    struct btrfs_root *root,
-				    struct btrfs_path *path,
-				    struct extent_buffer *leaf)
+static noinline int btrfs_del_leaf(struct btrfs_trans_handle *trans,
+				   struct btrfs_root *root,
+				   struct btrfs_path *path,
+				   struct extent_buffer *leaf)
 {
+	int ret;
+
 	WARN_ON(btrfs_header_generation(leaf) != trans->transid);
-	btrfs_del_ptr(root, path, 1, path->slots[1]);
+	ret = btrfs_del_ptr(trans, root, path, 1, path->slots[1]);
+	if (ret < 0)
+		return ret;
 
 	/*
 	 * btrfs_free_extent is expensive, we want to make sure we
@@ -4498,6 +4513,7 @@ static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans,
 	atomic_inc(&leaf->refs);
 	btrfs_free_tree_block(trans, btrfs_root_id(root), leaf, 0, 1);
 	free_extent_buffer_stale(leaf);
+	return 0;
 }
 /*
  * delete the item at the leaf level in path.  If that empties
@@ -4547,7 +4563,9 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 			btrfs_set_header_level(leaf, 0);
 		} else {
 			btrfs_clear_buffer_dirty(trans, leaf);
-			btrfs_del_leaf(trans, root, path, leaf);
+			ret = btrfs_del_leaf(trans, root, path, leaf);
+			if (ret < 0)
+				return ret;
 		}
 	} else {
 		int used = leaf_space_used(leaf, 0, nritems);
@@ -4608,7 +4626,9 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 
 			if (btrfs_header_nritems(leaf) == 0) {
 				path->slots[1] = slot;
-				btrfs_del_leaf(trans, root, path, leaf);
+				ret = btrfs_del_leaf(trans, root, path, leaf);
+				if (ret < 0)
+					return ret;
 				free_extent_buffer(leaf);
 				ret = 0;
 			} else {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 5af61480dde6..f2d2b313bde5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -541,8 +541,8 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
 		      struct extent_buffer **cow_ret, u64 new_root_objectid);
 int btrfs_block_can_be_shared(struct btrfs_root *root,
 			      struct extent_buffer *buf);
-void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
-		   int slot);
+int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+		  struct btrfs_path *path, int level, int slot);
 void btrfs_extend_item(struct btrfs_path *path, u32 data_size);
 void btrfs_truncate_item(struct btrfs_path *path, u32 new_size, int from_end);
 int btrfs_split_item(struct btrfs_trans_handle *trans,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/13] btrfs: fix extent buffer leak after failure tree mod log failure at split_node()
  2023-06-07 19:24 ` [PATCH 02/13] btrfs: fix extent buffer leak after failure tree mod log failure at split_node() fdmanana
@ 2023-06-08  8:40   ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  8:40 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At split_node(), if we fail to log the tree mod log copy operation, we
> return without unlocking the split extent buffer we just allocated and
> without decrementing the reference we own on it. Fix this by unlocking
> it and decrementing the ref count before returning.
>
> Fixes: 5de865eebb83 ("Btrfs: fix tree mod logging")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/ctree.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 7f7f13965fe9..8496535828de 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -3053,6 +3053,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
>
>   	ret = btrfs_tree_mod_log_eb_copy(split, c, 0, mid, c_nritems - mid);
>   	if (ret) {
> +		btrfs_tree_unlock(split);
> +		free_extent_buffer(split);
>   		btrfs_abort_transaction(trans, ret);
>   		return ret;
>   	}

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()
  2023-06-07 19:24 ` [PATCH 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block() fdmanana
@ 2023-06-08  8:48   ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  8:48 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At __btrfs_cow_block(), instead of doing a BUG_ON() in case we fail to
> record a tree mod log root insertion operation, do a transaction abort
> instead. There's really no need for the BUG_ON(), we can properly
> release all resources in this context and turn the filesystem to RO mode
> and in an error state instead.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 8496535828de..d6c29564ce49 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -584,9 +584,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>   		    btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
>   			parent_start = buf->start;
>
> -		atomic_inc(&cow->refs);
>   		ret = btrfs_tree_mod_log_insert_root(root->node, cow, true);
> -		BUG_ON(ret < 0);
> +		if (ret < 0) {
> +			btrfs_tree_unlock(cow);
> +			free_extent_buffer(cow);
> +			btrfs_abort_transaction(trans, ret);
> +			return ret;
> +		}
> +		atomic_inc(&cow->refs);
>   		rcu_assign_pointer(root->node, cow);
>
>   		btrfs_free_tree_block(trans, btrfs_root_id(root), buf,

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level()
  2023-06-07 19:24 ` [PATCH 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level() fdmanana
@ 2023-06-08  8:52   ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  8:52 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At balance_level(), instead of doing a BUG_ON() in case we fail to record
> tree mod log operations, do a transaction abort and return the error to
> the callers. There's really no need for the BUG_ON() as we can release
> all resources in this context, and we have to abort because other future
> tree searches that use the tree mod log (btrfs_search_old_slot()) may get
> inconsistent results if other operations modify the tree after that
> failure and before the tree mod log based search.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/ctree.c | 17 ++++++++++++++---
>   1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index d6c29564ce49..d60b28c6bd1b 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1054,7 +1054,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		}
>
>   		ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
> -		BUG_ON(ret < 0);
> +		if (ret < 0) {
> +			btrfs_tree_unlock(child);
> +			free_extent_buffer(child);
> +			btrfs_abort_transaction(trans, ret);
> +			goto enospc;
> +		}
>   		rcu_assign_pointer(root->node, child);
>
>   		add_root_to_dirty_list(root);
> @@ -1142,7 +1147,10 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   			btrfs_node_key(right, &right_key, 0);
>   			ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
>   					BTRFS_MOD_LOG_KEY_REPLACE);
> -			BUG_ON(ret < 0);
> +			if (ret < 0) {
> +				btrfs_abort_transaction(trans, ret);
> +				goto enospc;
> +			}
>   			btrfs_set_node_key(parent, &right_key, pslot + 1);
>   			btrfs_mark_buffer_dirty(parent);
>   		}
> @@ -1188,7 +1196,10 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		btrfs_node_key(mid, &mid_key, 0);
>   		ret = btrfs_tree_mod_log_insert_key(parent, pslot,
>   						    BTRFS_MOD_LOG_KEY_REPLACE);
> -		BUG_ON(ret < 0);
> +		if (ret < 0) {
> +			btrfs_abort_transaction(trans, ret);
> +			goto enospc;
> +		}
>   		btrfs_set_node_key(parent, &mid_key, pslot);
>   		btrfs_mark_buffer_dirty(parent);
>   	}

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 06/13] btrfs: rename enospc label to out at balance_level()
  2023-06-07 19:24 ` [PATCH 06/13] btrfs: rename enospc label to out " fdmanana
@ 2023-06-08  8:53   ` Qu Wenruo
  2023-06-08  9:21   ` Anand Jain
  1 sibling, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  8:53 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At balance_level() we have this 'enospc' label where we jump to in case
> we get an error at several places. However that error is certainly not
> -ENOSPC in call cases, it can be -EIO or -ENOMEM when reading a child
> extent buffer for example, or -ENOMEM when trying to record tree mod log
> operations. So to make this less confusing, rename the label to 'out'.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/ctree.c | 24 ++++++++++++------------
>   1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index d60b28c6bd1b..e98f9e205e25 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1041,7 +1041,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		if (IS_ERR(child)) {
>   			ret = PTR_ERR(child);
>   			btrfs_handle_fs_error(fs_info, ret, NULL);
> -			goto enospc;
> +			goto out;
>   		}
>
>   		btrfs_tree_lock(child);
> @@ -1050,7 +1050,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		if (ret) {
>   			btrfs_tree_unlock(child);
>   			free_extent_buffer(child);
> -			goto enospc;
> +			goto out;
>   		}
>
>   		ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
> @@ -1058,7 +1058,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   			btrfs_tree_unlock(child);
>   			free_extent_buffer(child);
>   			btrfs_abort_transaction(trans, ret);
> -			goto enospc;
> +			goto out;
>   		}
>   		rcu_assign_pointer(root->node, child);
>
> @@ -1087,7 +1087,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		if (IS_ERR(left)) {
>   			ret = PTR_ERR(left);
>   			left = NULL;
> -			goto enospc;
> +			goto out;
>   		}
>
>   		__btrfs_tree_lock(left, BTRFS_NESTING_LEFT);
> @@ -1096,7 +1096,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   				       BTRFS_NESTING_LEFT_COW);
>   		if (wret) {
>   			ret = wret;
> -			goto enospc;
> +			goto out;
>   		}
>   	}
>
> @@ -1105,7 +1105,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		if (IS_ERR(right)) {
>   			ret = PTR_ERR(right);
>   			right = NULL;
> -			goto enospc;
> +			goto out;
>   		}
>
>   		__btrfs_tree_lock(right, BTRFS_NESTING_RIGHT);
> @@ -1114,7 +1114,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   				       BTRFS_NESTING_RIGHT_COW);
>   		if (wret) {
>   			ret = wret;
> -			goto enospc;
> +			goto out;
>   		}
>   	}
>
> @@ -1149,7 +1149,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   					BTRFS_MOD_LOG_KEY_REPLACE);
>   			if (ret < 0) {
>   				btrfs_abort_transaction(trans, ret);
> -				goto enospc;
> +				goto out;
>   			}
>   			btrfs_set_node_key(parent, &right_key, pslot + 1);
>   			btrfs_mark_buffer_dirty(parent);
> @@ -1168,12 +1168,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		if (!left) {
>   			ret = -EROFS;
>   			btrfs_handle_fs_error(fs_info, ret, NULL);
> -			goto enospc;
> +			goto out;
>   		}
>   		wret = balance_node_right(trans, mid, left);
>   		if (wret < 0) {
>   			ret = wret;
> -			goto enospc;
> +			goto out;
>   		}
>   		if (wret == 1) {
>   			wret = push_node_left(trans, left, mid, 1);
> @@ -1198,7 +1198,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   						    BTRFS_MOD_LOG_KEY_REPLACE);
>   		if (ret < 0) {
>   			btrfs_abort_transaction(trans, ret);
> -			goto enospc;
> +			goto out;
>   		}
>   		btrfs_set_node_key(parent, &mid_key, pslot);
>   		btrfs_mark_buffer_dirty(parent);
> @@ -1225,7 +1225,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   	if (orig_ptr !=
>   	    btrfs_node_blockptr(path->nodes[level], path->slots[level]))
>   		BUG();
> -enospc:
> +out:
>   	if (right) {
>   		btrfs_tree_unlock(right);
>   		free_extent_buffer(right);

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 08/13] btrfs: abort transaction at balance_level() when left child is missing
  2023-06-07 19:24 ` [PATCH 08/13] btrfs: abort transaction at balance_level() when left child is missing fdmanana
@ 2023-06-08  8:57   ` Qu Wenruo
  2023-06-08  9:47     ` Filipe Manana
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  8:57 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At balance_level() we are calling btrfs_handle_fs_error() when the middle
> child only has 1 item and the left child is missing, however we can simply
> use btrfs_abort_transaction(), which achieves the same purposes: to turn
> the fs to error state, abort the current transaction and turn the fs to
> RO mode. Besides that, btrfs_abort_transaction() also prints a stack trace
> which makes it more useful.
>
> Also, as this is an highly unexpected case and it's about a b+tree
> inconsistency, change the error code from -EROFS to -EUCLEAN and tag the
> if branch as 'unlikely'.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/ctree.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4dcdcf25c3fe..e2943047b01d 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1164,9 +1164,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		 * otherwise we would have pulled some pointers from the
>   		 * right
>   		 */
> -		if (!left) {
> -			ret = -EROFS;
> -			btrfs_handle_fs_error(fs_info, ret, NULL);
> +		if (unlikely(!left)) {
> +			ret = -EUCLEAN;

I'd prefer to have an message every time we return -EUCLEAN.

Otherwise looks good to me.

Thanks,
Qu
> +			btrfs_abort_transaction(trans, ret);
>   			goto out;
>   		}
>   		wret = balance_node_right(trans, mid, left);

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero
  2023-06-07 19:24 ` [PATCH 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero fdmanana
@ 2023-06-08  8:58   ` Qu Wenruo
  2023-06-08  9:47     ` Filipe Manana
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  8:58 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At update_ref_for_cow() we are calling btrfs_handle_fs_error() if we find
> that the extent buffer has an unexpected ref count of zero, however we can
> simply use btrfs_abort_transaction(), which achieves the same purposes: to
> turn the fs to error state, abort the current transaction and turn the fs
> to RO mode as well. Besides that, btrfs_abort_transaction() also prints a
> stack trace which makes it more useful.
>
> Also, as this is a very unexpected situation, indicating a serious
> corruption/inconsistency, tag the if branch as 'unlikely' and set the
> error code to -EUCLEAN instead of -EROFS.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/ctree.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index e2943047b01d..2971e7d70d04 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -421,9 +421,9 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
>   					       &refs, &flags);
>   		if (ret)
>   			return ret;
> -		if (refs == 0) {
> -			ret = -EROFS;
> -			btrfs_handle_fs_error(fs_info, ret, NULL);
> +		if (unlikely(refs == 0)) {
> +			ret = -EUCLEAN;

The same as previous patch, just one extra error message explaining the
reason for EUCLEAN would be better.

Thanks,
Qu
> +			btrfs_abort_transaction(trans, ret);
>   			return ret;
>   		}
>   	} else {

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
  2023-06-07 19:24 ` [PATCH 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert() fdmanana
@ 2023-06-08  9:02   ` Qu Wenruo
  2023-06-08  9:46     ` Filipe Manana
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  9:02 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At push_nodes_for_insert(), instead of doing a BUG_ON() in case we fail to
> record tree mod log operations, do a transaction abort and return the
> error to the caller. There's really no need for the BUG_ON() as we can
> release all resources in this context, and we have to abort because other
> future tree searches that use the tree mod log (btrfs_search_old_slot())
> may get inconsistent results if other operations modify the tree after
> that failure and before the tree mod log based search.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/ctree.c | 14 ++++++++++++--
>   1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 2971e7d70d04..e3c949fa136f 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1300,7 +1300,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
>   			btrfs_node_key(mid, &disk_key, 0);
>   			ret = btrfs_tree_mod_log_insert_key(parent, pslot,
>   					BTRFS_MOD_LOG_KEY_REPLACE);
> -			BUG_ON(ret < 0);
> +			if (ret < 0) {
> +				btrfs_tree_unlock(left);
> +				free_extent_buffer(left);

I'm not sure if we only need to unlock and free @left.

As just lines below, we have a two branches which one unlock and free @mid.

Thanks,
Qu

> +				btrfs_abort_transaction(trans, ret);
> +				return ret;
> +			}
>   			btrfs_set_node_key(parent, &disk_key, pslot);
>   			btrfs_mark_buffer_dirty(parent);
>   			if (btrfs_header_nritems(left) > orig_slot) {
> @@ -1355,7 +1360,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
>   			btrfs_node_key(right, &disk_key, 0);
>   			ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
>   					BTRFS_MOD_LOG_KEY_REPLACE);
> -			BUG_ON(ret < 0);
> +			if (ret < 0) {
> +				btrfs_tree_unlock(right);
> +				free_extent_buffer(right);
> +				btrfs_abort_transaction(trans, ret);
> +				return ret;
> +			}
>   			btrfs_set_node_key(parent, &disk_key, pslot + 1);
>   			btrfs_mark_buffer_dirty(parent);
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()
  2023-06-07 19:24 ` [PATCH 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root() fdmanana
@ 2023-06-08  9:11   ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  9:11 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At insert_new_root(), instead of doing a BUG_ON() in case we fail to
> record the tree mod log operation, just return the error to the callers
> after releasing the allocated tree block. At this point we haven't made
> any changes to the b+tree, so we haven't left it in an inconsistent state
> and therefore have no need to abort the transaction. All we need to do is
> to unlock and free the extent buffer we just allocated with the purpose
> of making it the new root.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index e3c949fa136f..6e59343034d6 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2956,7 +2956,12 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
>
>   	old = root->node;
>   	ret = btrfs_tree_mod_log_insert_root(root->node, c, false);
> -	BUG_ON(ret < 0);
> +	if (ret < 0) {
> +		btrfs_free_tree_block(trans, btrfs_root_id(root), c, 0, 1);
> +		btrfs_tree_unlock(c);
> +		free_extent_buffer(c);
> +		return ret;
> +	}
>   	rcu_assign_pointer(root->node, c);
>
>   	/* the super has an extra ref to root->node */

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
  2023-06-07 19:24 ` [PATCH 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr() fdmanana
@ 2023-06-08  9:16   ` Qu Wenruo
  2023-06-08  9:43     ` Filipe Manana
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  9:16 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At insert_ptr(), instead of doing a BUG_ON() in case we fail to record
> tree mod log operations, do a transaction abort and return the error to
> the callers. There's really no need for the BUG_ON() as we can release all
> resources in the context of all callers, and we have to abort because other
> future tree searches that use the tree mod log (btrfs_search_old_slot())
> may get inconsistent results if other operations modify the tree after
> that failure and before the tree mod log based search.
>
> This implies making insert_ptr() return an int instead of void, and making
> all callers check for returned errors.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/ctree.c | 68 ++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 49 insertions(+), 19 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 6e59343034d6..f1fa89ae1184 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2982,10 +2982,10 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
>    * slot and level indicate where you want the key to go, and
>    * blocknr is the block the key points to.
>    */
> -static void insert_ptr(struct btrfs_trans_handle *trans,
> -		       struct btrfs_path *path,
> -		       struct btrfs_disk_key *key, u64 bytenr,
> -		       int slot, int level)
> +static int insert_ptr(struct btrfs_trans_handle *trans,
> +		      struct btrfs_path *path,
> +		      struct btrfs_disk_key *key, u64 bytenr,
> +		      int slot, int level)
>   {
>   	struct extent_buffer *lower;
>   	int nritems;
> @@ -3001,7 +3001,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
>   		if (level) {
>   			ret = btrfs_tree_mod_log_insert_move(lower, slot + 1,
>   					slot, nritems - slot);
> -			BUG_ON(ret < 0);
> +			if (ret < 0) {
> +				btrfs_abort_transaction(trans, ret);
> +				return ret;
> +			}
>   		}
>   		memmove_extent_buffer(lower,
>   			      btrfs_node_key_ptr_offset(lower, slot + 1),
> @@ -3011,7 +3014,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
>   	if (level) {
>   		ret = btrfs_tree_mod_log_insert_key(lower, slot,
>   						    BTRFS_MOD_LOG_KEY_ADD);
> -		BUG_ON(ret < 0);
> +		if (ret < 0) {
> +			btrfs_abort_transaction(trans, ret);
> +			return ret;
> +		}
>   	}
>   	btrfs_set_node_key(lower, key, slot);
>   	btrfs_set_node_blockptr(lower, slot, bytenr);
> @@ -3019,6 +3025,8 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
>   	btrfs_set_node_ptr_generation(lower, slot, trans->transid);
>   	btrfs_set_header_nritems(lower, nritems + 1);
>   	btrfs_mark_buffer_dirty(lower);
> +
> +	return 0;
>   }
>
>   /*
> @@ -3098,8 +3106,13 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
>   	btrfs_mark_buffer_dirty(c);
>   	btrfs_mark_buffer_dirty(split);
>
> -	insert_ptr(trans, path, &disk_key, split->start,
> -		   path->slots[level + 1] + 1, level + 1);
> +	ret = insert_ptr(trans, path, &disk_key, split->start,
> +			 path->slots[level + 1] + 1, level + 1);
> +	if (ret < 0) {
> +		btrfs_tree_unlock(split);
> +		free_extent_buffer(split);
> +		return ret;
> +	}
>
>   	if (path->slots[level] >= mid) {
>   		path->slots[level] -= mid;
> @@ -3576,16 +3589,17 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
>    * split the path's leaf in two, making sure there is at least data_size
>    * available for the resulting leaf level of the path.
>    */
> -static noinline void copy_for_split(struct btrfs_trans_handle *trans,
> -				    struct btrfs_path *path,
> -				    struct extent_buffer *l,
> -				    struct extent_buffer *right,
> -				    int slot, int mid, int nritems)
> +static noinline int copy_for_split(struct btrfs_trans_handle *trans,
> +				   struct btrfs_path *path,
> +				   struct extent_buffer *l,
> +				   struct extent_buffer *right,
> +				   int slot, int mid, int nritems)
>   {
>   	struct btrfs_fs_info *fs_info = trans->fs_info;
>   	int data_copy_size;
>   	int rt_data_off;
>   	int i;
> +	int ret;
>   	struct btrfs_disk_key disk_key;
>   	struct btrfs_map_token token;
>
> @@ -3610,7 +3624,9 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
>
>   	btrfs_set_header_nritems(l, mid);
>   	btrfs_item_key(right, &disk_key, 0);
> -	insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
> +	ret = insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
> +	if (ret < 0)
> +		return ret;
>
>   	btrfs_mark_buffer_dirty(right);
>   	btrfs_mark_buffer_dirty(l);
> @@ -3628,6 +3644,8 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
>   	}
>
>   	BUG_ON(path->slots[0] < 0);
> +
> +	return 0;
>   }
>
>   /*
> @@ -3826,8 +3844,13 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
>   	if (split == 0) {
>   		if (mid <= slot) {
>   			btrfs_set_header_nritems(right, 0);
> -			insert_ptr(trans, path, &disk_key,
> -				   right->start, path->slots[1] + 1, 1);
> +			ret = insert_ptr(trans, path, &disk_key,
> +					 right->start, path->slots[1] + 1, 1);
> +			if (ret < 0) {
> +				btrfs_tree_unlock(right);
> +				free_extent_buffer(right);
> +				return ret;
> +			}
>   			btrfs_tree_unlock(path->nodes[0]);
>   			free_extent_buffer(path->nodes[0]);
>   			path->nodes[0] = right;
> @@ -3835,8 +3858,13 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
>   			path->slots[1] += 1;
>   		} else {
>   			btrfs_set_header_nritems(right, 0);
> -			insert_ptr(trans, path, &disk_key,
> -				   right->start, path->slots[1], 1);
> +			ret = insert_ptr(trans, path, &disk_key,
> +					 right->start, path->slots[1], 1);
> +			if (ret < 0) {
> +				btrfs_tree_unlock(right);
> +				free_extent_buffer(right);
> +				return ret;
> +			}
>   			btrfs_tree_unlock(path->nodes[0]);
>   			free_extent_buffer(path->nodes[0]);
>   			path->nodes[0] = right;
> @@ -3852,7 +3880,9 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
>   		return ret;
>   	}
>
> -	copy_for_split(trans, path, l, right, slot, mid, nritems);
> +	ret = copy_for_split(trans, path, l, right, slot, mid, nritems);
> +	if (ret < 0)
> +		return ret;

Shouldn't we also call btrfs_free_tree_block() for @right for error?

Or because we have already aborted the trans, there is no longer the
need to add delayed ref for @right?

Thanks,
Qu
>
>   	if (split == 2) {
>   		BUG_ON(num_doubles != 0);

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr()
  2023-06-07 19:24 ` [PATCH 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr() fdmanana
@ 2023-06-08  9:19   ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  9:19 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At btrfs_del_ptr(), instead of doing a BUG_ON() in case we fail to record
> tree mod log operations, do a transaction abort and return the error to
> the callers. There's really no need for the BUG_ON() as we can release all
> resources in the context of all callers, and we have to abort because other
> future tree searches that use the tree mod log (btrfs_search_old_slot())
> may get inconsistent results if other operations modify the tree after
> that failure and before the tree mod log based search.
>
> This implies btrfs_del_ptr() return an int instead of void, and making all
> callers check for returned errors.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 46 +++++++++++++++++++++++++++++++++-------------
>   fs/btrfs/ctree.h |  4 ++--
>   2 files changed, 35 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index f1fa89ae1184..7220c8736218 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1135,7 +1135,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		if (btrfs_header_nritems(right) == 0) {
>   			btrfs_clear_buffer_dirty(trans, right);
>   			btrfs_tree_unlock(right);
> -			btrfs_del_ptr(root, path, level + 1, pslot + 1);
> +			ret = btrfs_del_ptr(trans, root, path, level + 1, pslot + 1);
> +			if (ret < 0)
> +				goto out;
>   			root_sub_used(root, right->len);
>   			btrfs_free_tree_block(trans, btrfs_root_id(root), right,
>   					      0, 1);
> @@ -1184,7 +1186,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   	if (btrfs_header_nritems(mid) == 0) {
>   		btrfs_clear_buffer_dirty(trans, mid);
>   		btrfs_tree_unlock(mid);
> -		btrfs_del_ptr(root, path, level + 1, pslot);
> +		ret = btrfs_del_ptr(trans, root, path, level + 1, pslot);
> +		if (ret < 0)
> +			goto out;
>   		root_sub_used(root, mid->len);
>   		btrfs_free_tree_block(trans, btrfs_root_id(root), mid, 0, 1);
>   		free_extent_buffer_stale(mid);
> @@ -4429,8 +4433,8 @@ int btrfs_duplicate_item(struct btrfs_trans_handle *trans,
>    *
>    * This is exported for use inside btrfs-progs, don't un-export it.
>    */
> -void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
> -		   int slot)
> +int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
> +		  struct btrfs_path *path, int level, int slot)
>   {
>   	struct extent_buffer *parent = path->nodes[level];
>   	u32 nritems;
> @@ -4441,7 +4445,10 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
>   		if (level) {
>   			ret = btrfs_tree_mod_log_insert_move(parent, slot,
>   					slot + 1, nritems - slot - 1);
> -			BUG_ON(ret < 0);
> +			if (ret < 0) {
> +				btrfs_abort_transaction(trans, ret);
> +				return ret;
> +			}
>   		}
>   		memmove_extent_buffer(parent,
>   			      btrfs_node_key_ptr_offset(parent, slot),
> @@ -4451,7 +4458,10 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
>   	} else if (level) {
>   		ret = btrfs_tree_mod_log_insert_key(parent, slot,
>   						    BTRFS_MOD_LOG_KEY_REMOVE);
> -		BUG_ON(ret < 0);
> +		if (ret < 0) {
> +			btrfs_abort_transaction(trans, ret);
> +			return ret;
> +		}
>   	}
>
>   	nritems--;
> @@ -4467,6 +4477,7 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
>   		fixup_low_keys(path, &disk_key, level + 1);
>   	}
>   	btrfs_mark_buffer_dirty(parent);
> +	return 0;
>   }
>
>   /*
> @@ -4479,13 +4490,17 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
>    * The path must have already been setup for deleting the leaf, including
>    * all the proper balancing.  path->nodes[1] must be locked.
>    */
> -static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans,
> -				    struct btrfs_root *root,
> -				    struct btrfs_path *path,
> -				    struct extent_buffer *leaf)
> +static noinline int btrfs_del_leaf(struct btrfs_trans_handle *trans,
> +				   struct btrfs_root *root,
> +				   struct btrfs_path *path,
> +				   struct extent_buffer *leaf)
>   {
> +	int ret;
> +
>   	WARN_ON(btrfs_header_generation(leaf) != trans->transid);
> -	btrfs_del_ptr(root, path, 1, path->slots[1]);
> +	ret = btrfs_del_ptr(trans, root, path, 1, path->slots[1]);
> +	if (ret < 0)
> +		return ret;
>
>   	/*
>   	 * btrfs_free_extent is expensive, we want to make sure we
> @@ -4498,6 +4513,7 @@ static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans,
>   	atomic_inc(&leaf->refs);
>   	btrfs_free_tree_block(trans, btrfs_root_id(root), leaf, 0, 1);
>   	free_extent_buffer_stale(leaf);
> +	return 0;
>   }
>   /*
>    * delete the item at the leaf level in path.  If that empties
> @@ -4547,7 +4563,9 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>   			btrfs_set_header_level(leaf, 0);
>   		} else {
>   			btrfs_clear_buffer_dirty(trans, leaf);
> -			btrfs_del_leaf(trans, root, path, leaf);
> +			ret = btrfs_del_leaf(trans, root, path, leaf);
> +			if (ret < 0)
> +				return ret;
>   		}
>   	} else {
>   		int used = leaf_space_used(leaf, 0, nritems);
> @@ -4608,7 +4626,9 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>
>   			if (btrfs_header_nritems(leaf) == 0) {
>   				path->slots[1] = slot;
> -				btrfs_del_leaf(trans, root, path, leaf);
> +				ret = btrfs_del_leaf(trans, root, path, leaf);
> +				if (ret < 0)
> +					return ret;
>   				free_extent_buffer(leaf);
>   				ret = 0;
>   			} else {
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 5af61480dde6..f2d2b313bde5 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -541,8 +541,8 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
>   		      struct extent_buffer **cow_ret, u64 new_root_objectid);
>   int btrfs_block_can_be_shared(struct btrfs_root *root,
>   			      struct extent_buffer *buf);
> -void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
> -		   int slot);
> +int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
> +		  struct btrfs_path *path, int level, int slot);
>   void btrfs_extend_item(struct btrfs_path *path, u32 data_size);
>   void btrfs_truncate_item(struct btrfs_path *path, u32 new_size, int from_end);
>   int btrfs_split_item(struct btrfs_trans_handle *trans,

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 06/13] btrfs: rename enospc label to out at balance_level()
  2023-06-07 19:24 ` [PATCH 06/13] btrfs: rename enospc label to out " fdmanana
  2023-06-08  8:53   ` Qu Wenruo
@ 2023-06-08  9:21   ` Anand Jain
  1 sibling, 0 replies; 54+ messages in thread
From: Anand Jain @ 2023-06-08  9:21 UTC (permalink / raw)
  To: fdmanana, linux-btrfs

LGTM

Reviewed-by: Anand Jain <anand.jain@oracle.com>


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer
  2023-06-07 19:24 ` [PATCH 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer fdmanana
@ 2023-06-08  9:25   ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08  9:25 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When COWing an extent buffer that is not the root node, we need to log in
> the tree mod log that we replaced a pointer in the parent node, otherwise
> a tree mod log user doing a search on the b+tree can return incorrect
> results (that miss something). We are doing the call to
> btrfs_tree_mod_log_insert_key() but we totally ignore its return value.
>
> So fix this by adding the missing error handling, resulting in a
> transaction abort and freeing the COWed extent buffer.
>
> Fixes: f230475e62f7 ("Btrfs: put all block modifications into the tree mod log")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/ctree.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 385524224037..7f7f13965fe9 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -595,8 +595,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>   		add_root_to_dirty_list(root);
>   	} else {
>   		WARN_ON(trans->transid != btrfs_header_generation(parent));
> -		btrfs_tree_mod_log_insert_key(parent, parent_slot,
> -					      BTRFS_MOD_LOG_KEY_REPLACE);
> +		ret = btrfs_tree_mod_log_insert_key(parent, parent_slot,
> +						    BTRFS_MOD_LOG_KEY_REPLACE);
> +		if (ret) {
> +			btrfs_tree_unlock(cow);
> +			free_extent_buffer(cow);
> +			btrfs_abort_transaction(trans, ret);
> +			return ret;
> +		}
>   		btrfs_set_node_blockptr(parent, parent_slot,
>   					cow->start);
>   		btrfs_set_node_ptr_generation(parent, parent_slot,

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
  2023-06-08  9:16   ` Qu Wenruo
@ 2023-06-08  9:43     ` Filipe Manana
  0 siblings, 0 replies; 54+ messages in thread
From: Filipe Manana @ 2023-06-08  9:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jun 8, 2023 at 10:16 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > At insert_ptr(), instead of doing a BUG_ON() in case we fail to record
> > tree mod log operations, do a transaction abort and return the error to
> > the callers. There's really no need for the BUG_ON() as we can release all
> > resources in the context of all callers, and we have to abort because other
> > future tree searches that use the tree mod log (btrfs_search_old_slot())
> > may get inconsistent results if other operations modify the tree after
> > that failure and before the tree mod log based search.
> >
> > This implies making insert_ptr() return an int instead of void, and making
> > all callers check for returned errors.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >   fs/btrfs/ctree.c | 68 ++++++++++++++++++++++++++++++++++--------------
> >   1 file changed, 49 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 6e59343034d6..f1fa89ae1184 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -2982,10 +2982,10 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
> >    * slot and level indicate where you want the key to go, and
> >    * blocknr is the block the key points to.
> >    */
> > -static void insert_ptr(struct btrfs_trans_handle *trans,
> > -                    struct btrfs_path *path,
> > -                    struct btrfs_disk_key *key, u64 bytenr,
> > -                    int slot, int level)
> > +static int insert_ptr(struct btrfs_trans_handle *trans,
> > +                   struct btrfs_path *path,
> > +                   struct btrfs_disk_key *key, u64 bytenr,
> > +                   int slot, int level)
> >   {
> >       struct extent_buffer *lower;
> >       int nritems;
> > @@ -3001,7 +3001,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
> >               if (level) {
> >                       ret = btrfs_tree_mod_log_insert_move(lower, slot + 1,
> >                                       slot, nritems - slot);
> > -                     BUG_ON(ret < 0);
> > +                     if (ret < 0) {
> > +                             btrfs_abort_transaction(trans, ret);
> > +                             return ret;
> > +                     }
> >               }
> >               memmove_extent_buffer(lower,
> >                             btrfs_node_key_ptr_offset(lower, slot + 1),
> > @@ -3011,7 +3014,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
> >       if (level) {
> >               ret = btrfs_tree_mod_log_insert_key(lower, slot,
> >                                                   BTRFS_MOD_LOG_KEY_ADD);
> > -             BUG_ON(ret < 0);
> > +             if (ret < 0) {
> > +                     btrfs_abort_transaction(trans, ret);
> > +                     return ret;
> > +             }
> >       }
> >       btrfs_set_node_key(lower, key, slot);
> >       btrfs_set_node_blockptr(lower, slot, bytenr);
> > @@ -3019,6 +3025,8 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
> >       btrfs_set_node_ptr_generation(lower, slot, trans->transid);
> >       btrfs_set_header_nritems(lower, nritems + 1);
> >       btrfs_mark_buffer_dirty(lower);
> > +
> > +     return 0;
> >   }
> >
> >   /*
> > @@ -3098,8 +3106,13 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
> >       btrfs_mark_buffer_dirty(c);
> >       btrfs_mark_buffer_dirty(split);
> >
> > -     insert_ptr(trans, path, &disk_key, split->start,
> > -                path->slots[level + 1] + 1, level + 1);
> > +     ret = insert_ptr(trans, path, &disk_key, split->start,
> > +                      path->slots[level + 1] + 1, level + 1);
> > +     if (ret < 0) {
> > +             btrfs_tree_unlock(split);
> > +             free_extent_buffer(split);
> > +             return ret;
> > +     }
> >
> >       if (path->slots[level] >= mid) {
> >               path->slots[level] -= mid;
> > @@ -3576,16 +3589,17 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
> >    * split the path's leaf in two, making sure there is at least data_size
> >    * available for the resulting leaf level of the path.
> >    */
> > -static noinline void copy_for_split(struct btrfs_trans_handle *trans,
> > -                                 struct btrfs_path *path,
> > -                                 struct extent_buffer *l,
> > -                                 struct extent_buffer *right,
> > -                                 int slot, int mid, int nritems)
> > +static noinline int copy_for_split(struct btrfs_trans_handle *trans,
> > +                                struct btrfs_path *path,
> > +                                struct extent_buffer *l,
> > +                                struct extent_buffer *right,
> > +                                int slot, int mid, int nritems)
> >   {
> >       struct btrfs_fs_info *fs_info = trans->fs_info;
> >       int data_copy_size;
> >       int rt_data_off;
> >       int i;
> > +     int ret;
> >       struct btrfs_disk_key disk_key;
> >       struct btrfs_map_token token;
> >
> > @@ -3610,7 +3624,9 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
> >
> >       btrfs_set_header_nritems(l, mid);
> >       btrfs_item_key(right, &disk_key, 0);
> > -     insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
> > +     ret = insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
> > +     if (ret < 0)
> > +             return ret;
> >
> >       btrfs_mark_buffer_dirty(right);
> >       btrfs_mark_buffer_dirty(l);
> > @@ -3628,6 +3644,8 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
> >       }
> >
> >       BUG_ON(path->slots[0] < 0);
> > +
> > +     return 0;
> >   }
> >
> >   /*
> > @@ -3826,8 +3844,13 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
> >       if (split == 0) {
> >               if (mid <= slot) {
> >                       btrfs_set_header_nritems(right, 0);
> > -                     insert_ptr(trans, path, &disk_key,
> > -                                right->start, path->slots[1] + 1, 1);
> > +                     ret = insert_ptr(trans, path, &disk_key,
> > +                                      right->start, path->slots[1] + 1, 1);
> > +                     if (ret < 0) {
> > +                             btrfs_tree_unlock(right);
> > +                             free_extent_buffer(right);
> > +                             return ret;
> > +                     }
> >                       btrfs_tree_unlock(path->nodes[0]);
> >                       free_extent_buffer(path->nodes[0]);
> >                       path->nodes[0] = right;
> > @@ -3835,8 +3858,13 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
> >                       path->slots[1] += 1;
> >               } else {
> >                       btrfs_set_header_nritems(right, 0);
> > -                     insert_ptr(trans, path, &disk_key,
> > -                                right->start, path->slots[1], 1);
> > +                     ret = insert_ptr(trans, path, &disk_key,
> > +                                      right->start, path->slots[1], 1);
> > +                     if (ret < 0) {
> > +                             btrfs_tree_unlock(right);
> > +                             free_extent_buffer(right);
> > +                             return ret;
> > +                     }
> >                       btrfs_tree_unlock(path->nodes[0]);
> >                       free_extent_buffer(path->nodes[0]);
> >                       path->nodes[0] = right;
> > @@ -3852,7 +3880,9 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
> >               return ret;
> >       }
> >
> > -     copy_for_split(trans, path, l, right, slot, mid, nritems);
> > +     ret = copy_for_split(trans, path, l, right, slot, mid, nritems);
> > +     if (ret < 0)
> > +             return ret;
>
> Shouldn't we also call btrfs_free_tree_block() for @right for error?
>
> Or because we have already aborted the trans, there is no longer the
> need to add delayed ref for @right?

Yes, we don't need to free tree blocks we allocated in the current
transaction because the transaction abort takes care of doing the
cleanup.

However this is missing the unlock and dropping the ref count.
I'll add that in v2, thanks.

>
> Thanks,
> Qu
> >
> >       if (split == 2) {
> >               BUG_ON(num_doubles != 0);

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
  2023-06-08  9:02   ` Qu Wenruo
@ 2023-06-08  9:46     ` Filipe Manana
  2023-06-08 10:19       ` Qu Wenruo
  0 siblings, 1 reply; 54+ messages in thread
From: Filipe Manana @ 2023-06-08  9:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jun 8, 2023 at 10:02 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > At push_nodes_for_insert(), instead of doing a BUG_ON() in case we fail to
> > record tree mod log operations, do a transaction abort and return the
> > error to the caller. There's really no need for the BUG_ON() as we can
> > release all resources in this context, and we have to abort because other
> > future tree searches that use the tree mod log (btrfs_search_old_slot())
> > may get inconsistent results if other operations modify the tree after
> > that failure and before the tree mod log based search.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >   fs/btrfs/ctree.c | 14 ++++++++++++--
> >   1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 2971e7d70d04..e3c949fa136f 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -1300,7 +1300,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
> >                       btrfs_node_key(mid, &disk_key, 0);
> >                       ret = btrfs_tree_mod_log_insert_key(parent, pslot,
> >                                       BTRFS_MOD_LOG_KEY_REPLACE);
> > -                     BUG_ON(ret < 0);
> > +                     if (ret < 0) {
> > +                             btrfs_tree_unlock(left);
> > +                             free_extent_buffer(left);
>
> I'm not sure if we only need to unlock and free @left.
>
> As just lines below, we have a two branches which one unlock and free @mid.

mid is part of the path, so we shouldn't touch it at this point.
It's released by the caller doing a btrfs_release_path() or btrfs_free_path().

Those branches unlock and free mid because they are replacing it in the path.

Thanks.

>
> Thanks,
> Qu
>
> > +                             btrfs_abort_transaction(trans, ret);
> > +                             return ret;
> > +                     }
> >                       btrfs_set_node_key(parent, &disk_key, pslot);
> >                       btrfs_mark_buffer_dirty(parent);
> >                       if (btrfs_header_nritems(left) > orig_slot) {
> > @@ -1355,7 +1360,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
> >                       btrfs_node_key(right, &disk_key, 0);
> >                       ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
> >                                       BTRFS_MOD_LOG_KEY_REPLACE);
> > -                     BUG_ON(ret < 0);
> > +                     if (ret < 0) {
> > +                             btrfs_tree_unlock(right);
> > +                             free_extent_buffer(right);
> > +                             btrfs_abort_transaction(trans, ret);
> > +                             return ret;
> > +                     }
> >                       btrfs_set_node_key(parent, &disk_key, pslot + 1);
> >                       btrfs_mark_buffer_dirty(parent);
> >

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero
  2023-06-08  8:58   ` Qu Wenruo
@ 2023-06-08  9:47     ` Filipe Manana
  0 siblings, 0 replies; 54+ messages in thread
From: Filipe Manana @ 2023-06-08  9:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jun 8, 2023 at 9:58 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > At update_ref_for_cow() we are calling btrfs_handle_fs_error() if we find
> > that the extent buffer has an unexpected ref count of zero, however we can
> > simply use btrfs_abort_transaction(), which achieves the same purposes: to
> > turn the fs to error state, abort the current transaction and turn the fs
> > to RO mode as well. Besides that, btrfs_abort_transaction() also prints a
> > stack trace which makes it more useful.
> >
> > Also, as this is a very unexpected situation, indicating a serious
> > corruption/inconsistency, tag the if branch as 'unlikely' and set the
> > error code to -EUCLEAN instead of -EROFS.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >   fs/btrfs/ctree.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index e2943047b01d..2971e7d70d04 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -421,9 +421,9 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
> >                                              &refs, &flags);
> >               if (ret)
> >                       return ret;
> > -             if (refs == 0) {
> > -                     ret = -EROFS;
> > -                     btrfs_handle_fs_error(fs_info, ret, NULL);
> > +             if (unlikely(refs == 0)) {
> > +                     ret = -EUCLEAN;
>
> The same as previous patch, just one extra error message explaining the
> reason for EUCLEAN would be better.

Sure... I didn't see a strong need for that because the transaction
abort's stack
trace will be obvious. But I can add it in v2.

Thanks.

>
> Thanks,
> Qu
> > +                     btrfs_abort_transaction(trans, ret);
> >                       return ret;
> >               }
> >       } else {

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 08/13] btrfs: abort transaction at balance_level() when left child is missing
  2023-06-08  8:57   ` Qu Wenruo
@ 2023-06-08  9:47     ` Filipe Manana
  2023-06-08 12:26       ` David Sterba
  0 siblings, 1 reply; 54+ messages in thread
From: Filipe Manana @ 2023-06-08  9:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jun 8, 2023 at 9:58 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > At balance_level() we are calling btrfs_handle_fs_error() when the middle
> > child only has 1 item and the left child is missing, however we can simply
> > use btrfs_abort_transaction(), which achieves the same purposes: to turn
> > the fs to error state, abort the current transaction and turn the fs to
> > RO mode. Besides that, btrfs_abort_transaction() also prints a stack trace
> > which makes it more useful.
> >
> > Also, as this is an highly unexpected case and it's about a b+tree
> > inconsistency, change the error code from -EROFS to -EUCLEAN and tag the
> > if branch as 'unlikely'.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >   fs/btrfs/ctree.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 4dcdcf25c3fe..e2943047b01d 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -1164,9 +1164,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
> >                * otherwise we would have pulled some pointers from the
> >                * right
> >                */
> > -             if (!left) {
> > -                     ret = -EROFS;
> > -                     btrfs_handle_fs_error(fs_info, ret, NULL);
> > +             if (unlikely(!left)) {
> > +                     ret = -EUCLEAN;
>
> I'd prefer to have an message every time we return -EUCLEAN.

Sure... I didn't see a strong need for that because the transaction
abort's stack
trace will be obvious. But I can add it in v2.

Thanks.

>
> Otherwise looks good to me.
>
> Thanks,
> Qu
> > +                     btrfs_abort_transaction(trans, ret);
> >                       goto out;
> >               }
> >               wret = balance_node_right(trans, mid, left);

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
  2023-06-08  9:46     ` Filipe Manana
@ 2023-06-08 10:19       ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08 10:19 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs



On 2023/6/8 17:46, Filipe Manana wrote:
> On Thu, Jun 8, 2023 at 10:02 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2023/6/8 03:24, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> At push_nodes_for_insert(), instead of doing a BUG_ON() in case we fail to
>>> record tree mod log operations, do a transaction abort and return the
>>> error to the caller. There's really no need for the BUG_ON() as we can
>>> release all resources in this context, and we have to abort because other
>>> future tree searches that use the tree mod log (btrfs_search_old_slot())
>>> may get inconsistent results if other operations modify the tree after
>>> that failure and before the tree mod log based search.
>>>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>> ---
>>>    fs/btrfs/ctree.c | 14 ++++++++++++--
>>>    1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>> index 2971e7d70d04..e3c949fa136f 100644
>>> --- a/fs/btrfs/ctree.c
>>> +++ b/fs/btrfs/ctree.c
>>> @@ -1300,7 +1300,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
>>>                        btrfs_node_key(mid, &disk_key, 0);
>>>                        ret = btrfs_tree_mod_log_insert_key(parent, pslot,
>>>                                        BTRFS_MOD_LOG_KEY_REPLACE);
>>> -                     BUG_ON(ret < 0);
>>> +                     if (ret < 0) {
>>> +                             btrfs_tree_unlock(left);
>>> +                             free_extent_buffer(left);
>>
>> I'm not sure if we only need to unlock and free @left.
>>
>> As just lines below, we have a two branches which one unlock and free @mid.
>
> mid is part of the path, so we shouldn't touch it at this point.
> It's released by the caller doing a btrfs_release_path() or btrfs_free_path().
>
> Those branches unlock and free mid because they are replacing it in the path.

Thanks for the detailed reason.

Then this looks good to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>
> Thanks.
>
>>
>> Thanks,
>> Qu
>>
>>> +                             btrfs_abort_transaction(trans, ret);
>>> +                             return ret;
>>> +                     }
>>>                        btrfs_set_node_key(parent, &disk_key, pslot);
>>>                        btrfs_mark_buffer_dirty(parent);
>>>                        if (btrfs_header_nritems(left) > orig_slot) {
>>> @@ -1355,7 +1360,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
>>>                        btrfs_node_key(right, &disk_key, 0);
>>>                        ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
>>>                                        BTRFS_MOD_LOG_KEY_REPLACE);
>>> -                     BUG_ON(ret < 0);
>>> +                     if (ret < 0) {
>>> +                             btrfs_tree_unlock(right);
>>> +                             free_extent_buffer(right);
>>> +                             btrfs_abort_transaction(trans, ret);
>>> +                             return ret;
>>> +                     }
>>>                        btrfs_set_node_key(parent, &disk_key, pslot + 1);
>>>                        btrfs_mark_buffer_dirty(parent);
>>>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations
  2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                   ` (12 preceding siblings ...)
  2023-06-07 19:24 ` [PATCH 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr() fdmanana
@ 2023-06-08 10:27 ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer fdmanana
                     ` (14 more replies)
  13 siblings, 15 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

This mostly helps avoid some unnecessary enomem failures when logging
tree mod log operations and replace some BUG_ON()'s when dealing with
such failures. There's also 2 bug fixes (the first two patches) and
some cleanups. More details on the changelogs.

V2: Add explicit error messages in patches 8/13 and 9/13.
    Add missing unlock and ref count drop of 'right' extent buffer to patch 12/13.
    Add missing extent buffer ref count drops for right and mid extent buffers in
    error paths of balance_level() to patch 13/13.
    Fix subject of patch 2/13 (removed duplicated word).
    Added Reviewed-by tags where appropriate.

Filipe Manana (13):
  btrfs: add missing error handling when logging operation while COWing extent buffer
  btrfs: fix extent buffer leak after tree mod log failure at split_node()
  btrfs: avoid tree mod log ENOMEM failures when we don't need to log
  btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()
  btrfs: do not BUG_ON() on tree mod log failure at balance_level()
  btrfs: rename enospc label to out at balance_level()
  btrfs: avoid unnecessarily setting the fs to RO and error state at balance_level()
  btrfs: abort transaction at balance_level() when left child is missing
  btrfs: abort transaction at update_ref_for_cow() when ref count is zero
  btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
  btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()
  btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
  btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr()

 fs/btrfs/ctree.c        | 221 +++++++++++++++++++++++++++++-----------
 fs/btrfs/ctree.h        |   4 +-
 fs/btrfs/tree-mod-log.c | 148 ++++++++++++++++++++-------
 3 files changed, 279 insertions(+), 94 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 02/13] btrfs: fix extent buffer leak after tree mod log failure at split_node() fdmanana
                     ` (13 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When COWing an extent buffer that is not the root node, we need to log in
the tree mod log that we replaced a pointer in the parent node, otherwise
a tree mod log user doing a search on the b+tree can return incorrect
results (that miss something). We are doing the call to
btrfs_tree_mod_log_insert_key() but we totally ignore its return value.

So fix this by adding the missing error handling, resulting in a
transaction abort and freeing the COWed extent buffer.

Fixes: f230475e62f7 ("Btrfs: put all block modifications into the tree mod log")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 385524224037..7f7f13965fe9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -595,8 +595,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 		add_root_to_dirty_list(root);
 	} else {
 		WARN_ON(trans->transid != btrfs_header_generation(parent));
-		btrfs_tree_mod_log_insert_key(parent, parent_slot,
-					      BTRFS_MOD_LOG_KEY_REPLACE);
+		ret = btrfs_tree_mod_log_insert_key(parent, parent_slot,
+						    BTRFS_MOD_LOG_KEY_REPLACE);
+		if (ret) {
+			btrfs_tree_unlock(cow);
+			free_extent_buffer(cow);
+			btrfs_abort_transaction(trans, ret);
+			return ret;
+		}
 		btrfs_set_node_blockptr(parent, parent_slot,
 					cow->start);
 		btrfs_set_node_ptr_generation(parent, parent_slot,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 02/13] btrfs: fix extent buffer leak after tree mod log failure at split_node()
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
  2023-06-08 10:27   ` [PATCH v2 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 03/13] btrfs: avoid tree mod log ENOMEM failures when we don't need to log fdmanana
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At split_node(), if we fail to log the tree mod log copy operation, we
return without unlocking the split extent buffer we just allocated and
without decrementing the reference we own on it. Fix this by unlocking
it and decrementing the ref count before returning.

Fixes: 5de865eebb83 ("Btrfs: fix tree mod logging")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 7f7f13965fe9..8496535828de 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3053,6 +3053,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_tree_mod_log_eb_copy(split, c, 0, mid, c_nritems - mid);
 	if (ret) {
+		btrfs_tree_unlock(split);
+		free_extent_buffer(split);
 		btrfs_abort_transaction(trans, ret);
 		return ret;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 03/13] btrfs: avoid tree mod log ENOMEM failures when we don't need to log
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
  2023-06-08 10:27   ` [PATCH v2 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer fdmanana
  2023-06-08 10:27   ` [PATCH v2 02/13] btrfs: fix extent buffer leak after tree mod log failure at split_node() fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block() fdmanana
                     ` (11 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

When logging tree mod log operations we start by checking, in a lockless
manner, if we need to log - if we don't, we just return and do nothing,
otherwise we will allocate one or more tree mod log operations and then
check again if we need to log. This second check will take the tree mod
log lock in write mode if we need to log, otherwise it will do nothing
and we just free the allocated memory and return success.

We can improve on this by not returning an error in case the memory
allocations fail, unless the second check tells us that we actually need
to log. That is, if we fail to allocate memory and the second check tells
use that we don't need to log, we can just return success and avoid
returning -ENOMEM to the caller. Currently tree mod log failures are
dealt with either a BUG_ON() or a transaction abort, as tree mod log
operations are logged in code paths that modify a b+tree.

So just avoid failing with -ENOMEM if we fail to allocate a tree mod log
operation unless we actually need to log the operations, that is, if
tree_mod_dont_log() returns true.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/tree-mod-log.c | 148 +++++++++++++++++++++++++++++++---------
 1 file changed, 114 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
index 07c086f9e35e..3df6153d5d5a 100644
--- a/fs/btrfs/tree-mod-log.c
+++ b/fs/btrfs/tree-mod-log.c
@@ -226,21 +226,32 @@ int btrfs_tree_mod_log_insert_key(struct extent_buffer *eb, int slot,
 				  enum btrfs_mod_log_op op)
 {
 	struct tree_mod_elem *tm;
-	int ret;
+	int ret = 0;
 
 	if (!tree_mod_need_log(eb->fs_info, eb))
 		return 0;
 
 	tm = alloc_tree_mod_elem(eb, slot, op);
 	if (!tm)
-		return -ENOMEM;
+		ret = -ENOMEM;
 
 	if (tree_mod_dont_log(eb->fs_info, eb)) {
 		kfree(tm);
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
 		return 0;
+	} else if (ret != 0) {
+		/*
+		 * We previously failed to allocate memory and we need to log,
+		 * so we have to fail.
+		 */
+		goto out_unlock;
 	}
 
 	ret = tree_mod_log_insert(eb->fs_info, tm);
+out_unlock:
 	write_unlock(&eb->fs_info->tree_mod_log_lock);
 	if (ret)
 		kfree(tm);
@@ -282,14 +293,16 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
 		return 0;
 
 	tm_list = kcalloc(nr_items, sizeof(struct tree_mod_elem *), GFP_NOFS);
-	if (!tm_list)
-		return -ENOMEM;
+	if (!tm_list) {
+		ret = -ENOMEM;
+		goto lock;
+	}
 
 	tm = tree_mod_log_alloc_move(eb, dst_slot, src_slot, nr_items);
 	if (IS_ERR(tm)) {
 		ret = PTR_ERR(tm);
 		tm = NULL;
-		goto free_tms;
+		goto lock;
 	}
 
 	for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
@@ -297,14 +310,28 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
 				BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING);
 		if (!tm_list[i]) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 	}
 
-	if (tree_mod_dont_log(eb->fs_info, eb))
+lock:
+	if (tree_mod_dont_log(eb->fs_info, eb)) {
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
+		ret = 0;
 		goto free_tms;
+	}
 	locked = true;
 
+	/*
+	 * We previously failed to allocate memory and we need to log, so we
+	 * have to fail.
+	 */
+	if (ret != 0)
+		goto free_tms;
+
 	/*
 	 * When we override something during the move, we log these removals.
 	 * This can only happen when we move towards the beginning of the
@@ -325,10 +352,12 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
 	return 0;
 
 free_tms:
-	for (i = 0; i < nr_items; i++) {
-		if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
-			rb_erase(&tm_list[i]->node, &eb->fs_info->tree_mod_log);
-		kfree(tm_list[i]);
+	if (tm_list) {
+		for (i = 0; i < nr_items; i++) {
+			if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+				rb_erase(&tm_list[i]->node, &eb->fs_info->tree_mod_log);
+			kfree(tm_list[i]);
+		}
 	}
 	if (locked)
 		write_unlock(&eb->fs_info->tree_mod_log_lock);
@@ -378,14 +407,14 @@ int btrfs_tree_mod_log_insert_root(struct extent_buffer *old_root,
 				  GFP_NOFS);
 		if (!tm_list) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 		for (i = 0; i < nritems; i++) {
 			tm_list[i] = alloc_tree_mod_elem(old_root, i,
 			    BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING);
 			if (!tm_list[i]) {
 				ret = -ENOMEM;
-				goto free_tms;
+				goto lock;
 			}
 		}
 	}
@@ -393,7 +422,7 @@ int btrfs_tree_mod_log_insert_root(struct extent_buffer *old_root,
 	tm = kzalloc(sizeof(*tm), GFP_NOFS);
 	if (!tm) {
 		ret = -ENOMEM;
-		goto free_tms;
+		goto lock;
 	}
 
 	tm->logical = new_root->start;
@@ -402,14 +431,28 @@ int btrfs_tree_mod_log_insert_root(struct extent_buffer *old_root,
 	tm->generation = btrfs_header_generation(old_root);
 	tm->op = BTRFS_MOD_LOG_ROOT_REPLACE;
 
-	if (tree_mod_dont_log(fs_info, NULL))
+lock:
+	if (tree_mod_dont_log(fs_info, NULL)) {
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
+		ret = 0;
 		goto free_tms;
+	} else if (ret != 0) {
+		/*
+		 * We previously failed to allocate memory and we need to log,
+		 * so we have to fail.
+		 */
+		goto out_unlock;
+	}
 
 	if (tm_list)
 		ret = tree_mod_log_free_eb(fs_info, tm_list, nritems);
 	if (!ret)
 		ret = tree_mod_log_insert(fs_info, tm);
 
+out_unlock:
 	write_unlock(&fs_info->tree_mod_log_lock);
 	if (ret)
 		goto free_tms;
@@ -501,7 +544,8 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 	struct btrfs_fs_info *fs_info = dst->fs_info;
 	int ret = 0;
 	struct tree_mod_elem **tm_list = NULL;
-	struct tree_mod_elem **tm_list_add, **tm_list_rem;
+	struct tree_mod_elem **tm_list_add = NULL;
+	struct tree_mod_elem **tm_list_rem = NULL;
 	int i;
 	bool locked = false;
 	struct tree_mod_elem *dst_move_tm = NULL;
@@ -517,8 +561,10 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 
 	tm_list = kcalloc(nr_items * 2, sizeof(struct tree_mod_elem *),
 			  GFP_NOFS);
-	if (!tm_list)
-		return -ENOMEM;
+	if (!tm_list) {
+		ret = -ENOMEM;
+		goto lock;
+	}
 
 	if (dst_move_nr_items) {
 		dst_move_tm = tree_mod_log_alloc_move(dst, dst_offset + nr_items,
@@ -526,7 +572,7 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 		if (IS_ERR(dst_move_tm)) {
 			ret = PTR_ERR(dst_move_tm);
 			dst_move_tm = NULL;
-			goto free_tms;
+			goto lock;
 		}
 	}
 	if (src_move_nr_items) {
@@ -536,7 +582,7 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 		if (IS_ERR(src_move_tm)) {
 			ret = PTR_ERR(src_move_tm);
 			src_move_tm = NULL;
-			goto free_tms;
+			goto lock;
 		}
 	}
 
@@ -547,21 +593,35 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 						     BTRFS_MOD_LOG_KEY_REMOVE);
 		if (!tm_list_rem[i]) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 
 		tm_list_add[i] = alloc_tree_mod_elem(dst, i + dst_offset,
 						     BTRFS_MOD_LOG_KEY_ADD);
 		if (!tm_list_add[i]) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 	}
 
-	if (tree_mod_dont_log(fs_info, NULL))
+lock:
+	if (tree_mod_dont_log(fs_info, NULL)) {
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
+		ret = 0;
 		goto free_tms;
+	}
 	locked = true;
 
+	/*
+	 * We previously failed to allocate memory and we need to log, so we
+	 * have to fail.
+	 */
+	if (ret != 0)
+		goto free_tms;
+
 	if (dst_move_tm) {
 		ret = tree_mod_log_insert(fs_info, dst_move_tm);
 		if (ret)
@@ -593,10 +653,12 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
 	if (src_move_tm && !RB_EMPTY_NODE(&src_move_tm->node))
 		rb_erase(&src_move_tm->node, &fs_info->tree_mod_log);
 	kfree(src_move_tm);
-	for (i = 0; i < nr_items * 2; i++) {
-		if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
-			rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
-		kfree(tm_list[i]);
+	if (tm_list) {
+		for (i = 0; i < nr_items * 2; i++) {
+			if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
+				rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
+			kfree(tm_list[i]);
+		}
 	}
 	if (locked)
 		write_unlock(&fs_info->tree_mod_log_lock);
@@ -617,22 +679,38 @@ int btrfs_tree_mod_log_free_eb(struct extent_buffer *eb)
 
 	nritems = btrfs_header_nritems(eb);
 	tm_list = kcalloc(nritems, sizeof(struct tree_mod_elem *), GFP_NOFS);
-	if (!tm_list)
-		return -ENOMEM;
+	if (!tm_list) {
+		ret = -ENOMEM;
+		goto lock;
+	}
 
 	for (i = 0; i < nritems; i++) {
 		tm_list[i] = alloc_tree_mod_elem(eb, i,
 				    BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING);
 		if (!tm_list[i]) {
 			ret = -ENOMEM;
-			goto free_tms;
+			goto lock;
 		}
 	}
 
-	if (tree_mod_dont_log(eb->fs_info, eb))
+lock:
+	if (tree_mod_dont_log(eb->fs_info, eb)) {
+		/*
+		 * Don't error if we failed to allocate memory because we don't
+		 * need to log.
+		 */
+		ret = 0;
 		goto free_tms;
+	} else if (ret != 0) {
+		/*
+		 * We previously failed to allocate memory and we need to log,
+		 * so we have to fail.
+		 */
+		goto out_unlock;
+	}
 
 	ret = tree_mod_log_free_eb(eb->fs_info, tm_list, nritems);
+out_unlock:
 	write_unlock(&eb->fs_info->tree_mod_log_lock);
 	if (ret)
 		goto free_tms;
@@ -641,9 +719,11 @@ int btrfs_tree_mod_log_free_eb(struct extent_buffer *eb)
 	return 0;
 
 free_tms:
-	for (i = 0; i < nritems; i++)
-		kfree(tm_list[i]);
-	kfree(tm_list);
+	if (tm_list) {
+		for (i = 0; i < nritems; i++)
+			kfree(tm_list[i]);
+		kfree(tm_list);
+	}
 
 	return ret;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (2 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 03/13] btrfs: avoid tree mod log ENOMEM failures when we don't need to log fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level() fdmanana
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At __btrfs_cow_block(), instead of doing a BUG_ON() in case we fail to
record a tree mod log root insertion operation, do a transaction abort
instead. There's really no need for the BUG_ON(), we can properly
release all resources in this context and turn the filesystem to RO mode
and in an error state instead.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 8496535828de..d6c29564ce49 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -584,9 +584,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 		    btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
 			parent_start = buf->start;
 
-		atomic_inc(&cow->refs);
 		ret = btrfs_tree_mod_log_insert_root(root->node, cow, true);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_tree_unlock(cow);
+			free_extent_buffer(cow);
+			btrfs_abort_transaction(trans, ret);
+			return ret;
+		}
+		atomic_inc(&cow->refs);
 		rcu_assign_pointer(root->node, cow);
 
 		btrfs_free_tree_block(trans, btrfs_root_id(root), buf,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level()
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (3 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block() fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 06/13] btrfs: rename enospc label to out " fdmanana
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At balance_level(), instead of doing a BUG_ON() in case we fail to record
tree mod log operations, do a transaction abort and return the error to
the callers. There's really no need for the BUG_ON() as we can release
all resources in this context, and we have to abort because other future
tree searches that use the tree mod log (btrfs_search_old_slot()) may get
inconsistent results if other operations modify the tree after that
failure and before the tree mod log based search.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index d6c29564ce49..d60b28c6bd1b 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1054,7 +1054,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		}
 
 		ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_tree_unlock(child);
+			free_extent_buffer(child);
+			btrfs_abort_transaction(trans, ret);
+			goto enospc;
+		}
 		rcu_assign_pointer(root->node, child);
 
 		add_root_to_dirty_list(root);
@@ -1142,7 +1147,10 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 			btrfs_node_key(right, &right_key, 0);
 			ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
 					BTRFS_MOD_LOG_KEY_REPLACE);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_abort_transaction(trans, ret);
+				goto enospc;
+			}
 			btrfs_set_node_key(parent, &right_key, pslot + 1);
 			btrfs_mark_buffer_dirty(parent);
 		}
@@ -1188,7 +1196,10 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		btrfs_node_key(mid, &mid_key, 0);
 		ret = btrfs_tree_mod_log_insert_key(parent, pslot,
 						    BTRFS_MOD_LOG_KEY_REPLACE);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_abort_transaction(trans, ret);
+			goto enospc;
+		}
 		btrfs_set_node_key(parent, &mid_key, pslot);
 		btrfs_mark_buffer_dirty(parent);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 06/13] btrfs: rename enospc label to out at balance_level()
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (4 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level() fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state " fdmanana
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At balance_level() we have this 'enospc' label where we jump to in case
we get an error at several places. However that error is certainly not
-ENOSPC in call cases, it can be -EIO or -ENOMEM when reading a child
extent buffer for example, or -ENOMEM when trying to record tree mod log
operations. So to make this less confusing, rename the label to 'out'.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index d60b28c6bd1b..e98f9e205e25 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1041,7 +1041,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (IS_ERR(child)) {
 			ret = PTR_ERR(child);
 			btrfs_handle_fs_error(fs_info, ret, NULL);
-			goto enospc;
+			goto out;
 		}
 
 		btrfs_tree_lock(child);
@@ -1050,7 +1050,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (ret) {
 			btrfs_tree_unlock(child);
 			free_extent_buffer(child);
-			goto enospc;
+			goto out;
 		}
 
 		ret = btrfs_tree_mod_log_insert_root(root->node, child, true);
@@ -1058,7 +1058,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 			btrfs_tree_unlock(child);
 			free_extent_buffer(child);
 			btrfs_abort_transaction(trans, ret);
-			goto enospc;
+			goto out;
 		}
 		rcu_assign_pointer(root->node, child);
 
@@ -1087,7 +1087,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (IS_ERR(left)) {
 			ret = PTR_ERR(left);
 			left = NULL;
-			goto enospc;
+			goto out;
 		}
 
 		__btrfs_tree_lock(left, BTRFS_NESTING_LEFT);
@@ -1096,7 +1096,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 				       BTRFS_NESTING_LEFT_COW);
 		if (wret) {
 			ret = wret;
-			goto enospc;
+			goto out;
 		}
 	}
 
@@ -1105,7 +1105,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (IS_ERR(right)) {
 			ret = PTR_ERR(right);
 			right = NULL;
-			goto enospc;
+			goto out;
 		}
 
 		__btrfs_tree_lock(right, BTRFS_NESTING_RIGHT);
@@ -1114,7 +1114,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 				       BTRFS_NESTING_RIGHT_COW);
 		if (wret) {
 			ret = wret;
-			goto enospc;
+			goto out;
 		}
 	}
 
@@ -1149,7 +1149,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 					BTRFS_MOD_LOG_KEY_REPLACE);
 			if (ret < 0) {
 				btrfs_abort_transaction(trans, ret);
-				goto enospc;
+				goto out;
 			}
 			btrfs_set_node_key(parent, &right_key, pslot + 1);
 			btrfs_mark_buffer_dirty(parent);
@@ -1168,12 +1168,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (!left) {
 			ret = -EROFS;
 			btrfs_handle_fs_error(fs_info, ret, NULL);
-			goto enospc;
+			goto out;
 		}
 		wret = balance_node_right(trans, mid, left);
 		if (wret < 0) {
 			ret = wret;
-			goto enospc;
+			goto out;
 		}
 		if (wret == 1) {
 			wret = push_node_left(trans, left, mid, 1);
@@ -1198,7 +1198,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 						    BTRFS_MOD_LOG_KEY_REPLACE);
 		if (ret < 0) {
 			btrfs_abort_transaction(trans, ret);
-			goto enospc;
+			goto out;
 		}
 		btrfs_set_node_key(parent, &mid_key, pslot);
 		btrfs_mark_buffer_dirty(parent);
@@ -1225,7 +1225,7 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 	if (orig_ptr !=
 	    btrfs_node_blockptr(path->nodes[level], path->slots[level]))
 		BUG();
-enospc:
+out:
 	if (right) {
 		btrfs_tree_unlock(right);
 		free_extent_buffer(right);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state at balance_level()
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (5 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 06/13] btrfs: rename enospc label to out " fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:51     ` Qu Wenruo
  2023-06-08 10:27   ` [PATCH v2 08/13] btrfs: abort transaction at balance_level() when left child is missing fdmanana
                     ` (7 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At balance_level(), when trying to promote a child node to a root node, if
we fail to read the child we call btrfs_handle_fs_error(), which turns the
fs to RO mode and sets it to error state as well, causing any ongoing
transaction to abort. This however is not necessary because at that point
we have not made any change yet at balance_level(), so any error reading
the child node does not leaves us in any inconsistent state. Therefore we
can just return the error to the caller.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index e98f9e205e25..4dcdcf25c3fe 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1040,7 +1040,6 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		child = btrfs_read_node_slot(mid, 0);
 		if (IS_ERR(child)) {
 			ret = PTR_ERR(child);
-			btrfs_handle_fs_error(fs_info, ret, NULL);
 			goto out;
 		}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 08/13] btrfs: abort transaction at balance_level() when left child is missing
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (6 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state " fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:37     ` Qu Wenruo
  2023-06-08 10:52     ` Qu Wenruo
  2023-06-08 10:27   ` [PATCH v2 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero fdmanana
                     ` (6 subsequent siblings)
  14 siblings, 2 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At balance_level() we are calling btrfs_handle_fs_error() when the middle
child only has 1 item and the left child is missing, however we can simply
use btrfs_abort_transaction(), which achieves the same purposes: to turn
the fs to error state, abort the current transaction and turn the fs to
RO mode. Besides that, btrfs_abort_transaction() also prints a stack trace
which makes it more useful.

Also, as this is a highly unexpected case and it's about a b+tree
inconsistency, change the error code from -EROFS to -EUCLEAN, tag the if
branch as 'unlikely' and log an explicit error message.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4dcdcf25c3fe..00eea2925d1d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1164,9 +1164,13 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		 * otherwise we would have pulled some pointers from the
 		 * right
 		 */
-		if (!left) {
-			ret = -EROFS;
-			btrfs_handle_fs_error(fs_info, ret, NULL);
+		if (unlikely(!left)) {
+			btrfs_crit(fs_info,
+"missing left child when middle child only has 1 item, parent bytenr %llu level %d mid bytenr %llu root %llu",
+				   parent->start, btrfs_header_level(parent),
+				   mid->start, btrfs_root_id(root));
+			ret = -EUCLEAN;
+			btrfs_abort_transaction(trans, ret);
 			goto out;
 		}
 		wret = balance_node_right(trans, mid, left);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (7 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 08/13] btrfs: abort transaction at balance_level() when left child is missing fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:52     ` Qu Wenruo
  2023-06-08 10:27   ` [PATCH v2 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert() fdmanana
                     ` (5 subsequent siblings)
  14 siblings, 1 reply; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At update_ref_for_cow() we are calling btrfs_handle_fs_error() if we find
that the extent buffer has an unexpected ref count of zero, however we can
simply use btrfs_abort_transaction(), which achieves the same purposes: to
turn the fs to error state, abort the current transaction and turn the fs
to RO mode as well. Besides that, btrfs_abort_transaction() also prints a
stack trace which makes it more useful.

Also, as this is a very unexpected situation, indicating a serious
corruption/inconsistency, tag the if branch as 'unlikely', set the error
code to -EUCLEAN instead of -EROFS, and log an explicit message.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 00eea2925d1d..0449b3d819a9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -421,9 +421,13 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
 					       &refs, &flags);
 		if (ret)
 			return ret;
-		if (refs == 0) {
-			ret = -EROFS;
-			btrfs_handle_fs_error(fs_info, ret, NULL);
+		if (unlikely(refs == 0)) {
+			btrfs_crit(fs_info,
+				   "found 0 references for tree block at bytenr %llu level %d root %llu",
+				   buf->start, btrfs_header_level(buf),
+				   btrfs_root_id(root));
+			ret = -EUCLEAN;
+			btrfs_abort_transaction(trans, ret);
 			return ret;
 		}
 	} else {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (8 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root() fdmanana
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At push_nodes_for_insert(), instead of doing a BUG_ON() in case we fail to
record tree mod log operations, do a transaction abort and return the
error to the caller. There's really no need for the BUG_ON() as we can
release all resources in this context, and we have to abort because other
future tree searches that use the tree mod log (btrfs_search_old_slot())
may get inconsistent results if other operations modify the tree after
that failure and before the tree mod log based search.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 0449b3d819a9..620ed3a3e51e 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1308,7 +1308,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
 			btrfs_node_key(mid, &disk_key, 0);
 			ret = btrfs_tree_mod_log_insert_key(parent, pslot,
 					BTRFS_MOD_LOG_KEY_REPLACE);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_tree_unlock(left);
+				free_extent_buffer(left);
+				btrfs_abort_transaction(trans, ret);
+				return ret;
+			}
 			btrfs_set_node_key(parent, &disk_key, pslot);
 			btrfs_mark_buffer_dirty(parent);
 			if (btrfs_header_nritems(left) > orig_slot) {
@@ -1363,7 +1368,12 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
 			btrfs_node_key(right, &disk_key, 0);
 			ret = btrfs_tree_mod_log_insert_key(parent, pslot + 1,
 					BTRFS_MOD_LOG_KEY_REPLACE);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_tree_unlock(right);
+				free_extent_buffer(right);
+				btrfs_abort_transaction(trans, ret);
+				return ret;
+			}
 			btrfs_set_node_key(parent, &disk_key, pslot + 1);
 			btrfs_mark_buffer_dirty(parent);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (9 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert() fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr() fdmanana
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At insert_new_root(), instead of doing a BUG_ON() in case we fail to
record the tree mod log operation, just return the error to the callers
after releasing the allocated tree block. At this point we haven't made
any changes to the b+tree, so we haven't left it in an inconsistent state
and therefore have no need to abort the transaction. All we need to do is
to unlock and free the extent buffer we just allocated with the purpose
of making it the new root.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 620ed3a3e51e..056b174c4b33 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2964,7 +2964,12 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
 
 	old = root->node;
 	ret = btrfs_tree_mod_log_insert_root(root->node, c, false);
-	BUG_ON(ret < 0);
+	if (ret < 0) {
+		btrfs_free_tree_block(trans, btrfs_root_id(root), c, 0, 1);
+		btrfs_tree_unlock(c);
+		free_extent_buffer(c);
+		return ret;
+	}
 	rcu_assign_pointer(root->node, c);
 
 	/* the super has an extra ref to root->node */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (10 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root() fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-08 10:27   ` [PATCH v2 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr() fdmanana
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At insert_ptr(), instead of doing a BUG_ON() in case we fail to record
tree mod log operations, do a transaction abort and return the error to
the callers. There's really no need for the BUG_ON() as we can release all
resources in the context of all callers, and we have to abort because other
future tree searches that use the tree mod log (btrfs_search_old_slot())
may get inconsistent results if other operations modify the tree after
that failure and before the tree mod log based search.

This implies making insert_ptr() return an int instead of void, and making
all callers check for returned errors.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 71 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 056b174c4b33..0188cf6e30bf 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2990,10 +2990,10 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
  * slot and level indicate where you want the key to go, and
  * blocknr is the block the key points to.
  */
-static void insert_ptr(struct btrfs_trans_handle *trans,
-		       struct btrfs_path *path,
-		       struct btrfs_disk_key *key, u64 bytenr,
-		       int slot, int level)
+static int insert_ptr(struct btrfs_trans_handle *trans,
+		      struct btrfs_path *path,
+		      struct btrfs_disk_key *key, u64 bytenr,
+		      int slot, int level)
 {
 	struct extent_buffer *lower;
 	int nritems;
@@ -3009,7 +3009,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
 		if (level) {
 			ret = btrfs_tree_mod_log_insert_move(lower, slot + 1,
 					slot, nritems - slot);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_abort_transaction(trans, ret);
+				return ret;
+			}
 		}
 		memmove_extent_buffer(lower,
 			      btrfs_node_key_ptr_offset(lower, slot + 1),
@@ -3019,7 +3022,10 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
 	if (level) {
 		ret = btrfs_tree_mod_log_insert_key(lower, slot,
 						    BTRFS_MOD_LOG_KEY_ADD);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_abort_transaction(trans, ret);
+			return ret;
+		}
 	}
 	btrfs_set_node_key(lower, key, slot);
 	btrfs_set_node_blockptr(lower, slot, bytenr);
@@ -3027,6 +3033,8 @@ static void insert_ptr(struct btrfs_trans_handle *trans,
 	btrfs_set_node_ptr_generation(lower, slot, trans->transid);
 	btrfs_set_header_nritems(lower, nritems + 1);
 	btrfs_mark_buffer_dirty(lower);
+
+	return 0;
 }
 
 /*
@@ -3106,8 +3114,13 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	btrfs_mark_buffer_dirty(c);
 	btrfs_mark_buffer_dirty(split);
 
-	insert_ptr(trans, path, &disk_key, split->start,
-		   path->slots[level + 1] + 1, level + 1);
+	ret = insert_ptr(trans, path, &disk_key, split->start,
+			 path->slots[level + 1] + 1, level + 1);
+	if (ret < 0) {
+		btrfs_tree_unlock(split);
+		free_extent_buffer(split);
+		return ret;
+	}
 
 	if (path->slots[level] >= mid) {
 		path->slots[level] -= mid;
@@ -3584,16 +3597,17 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
  * split the path's leaf in two, making sure there is at least data_size
  * available for the resulting leaf level of the path.
  */
-static noinline void copy_for_split(struct btrfs_trans_handle *trans,
-				    struct btrfs_path *path,
-				    struct extent_buffer *l,
-				    struct extent_buffer *right,
-				    int slot, int mid, int nritems)
+static noinline int copy_for_split(struct btrfs_trans_handle *trans,
+				   struct btrfs_path *path,
+				   struct extent_buffer *l,
+				   struct extent_buffer *right,
+				   int slot, int mid, int nritems)
 {
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int data_copy_size;
 	int rt_data_off;
 	int i;
+	int ret;
 	struct btrfs_disk_key disk_key;
 	struct btrfs_map_token token;
 
@@ -3618,7 +3632,9 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
 
 	btrfs_set_header_nritems(l, mid);
 	btrfs_item_key(right, &disk_key, 0);
-	insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
+	ret = insert_ptr(trans, path, &disk_key, right->start, path->slots[1] + 1, 1);
+	if (ret < 0)
+		return ret;
 
 	btrfs_mark_buffer_dirty(right);
 	btrfs_mark_buffer_dirty(l);
@@ -3636,6 +3652,8 @@ static noinline void copy_for_split(struct btrfs_trans_handle *trans,
 	}
 
 	BUG_ON(path->slots[0] < 0);
+
+	return 0;
 }
 
 /*
@@ -3834,8 +3852,13 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
 	if (split == 0) {
 		if (mid <= slot) {
 			btrfs_set_header_nritems(right, 0);
-			insert_ptr(trans, path, &disk_key,
-				   right->start, path->slots[1] + 1, 1);
+			ret = insert_ptr(trans, path, &disk_key,
+					 right->start, path->slots[1] + 1, 1);
+			if (ret < 0) {
+				btrfs_tree_unlock(right);
+				free_extent_buffer(right);
+				return ret;
+			}
 			btrfs_tree_unlock(path->nodes[0]);
 			free_extent_buffer(path->nodes[0]);
 			path->nodes[0] = right;
@@ -3843,8 +3866,13 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
 			path->slots[1] += 1;
 		} else {
 			btrfs_set_header_nritems(right, 0);
-			insert_ptr(trans, path, &disk_key,
-				   right->start, path->slots[1], 1);
+			ret = insert_ptr(trans, path, &disk_key,
+					 right->start, path->slots[1], 1);
+			if (ret < 0) {
+				btrfs_tree_unlock(right);
+				free_extent_buffer(right);
+				return ret;
+			}
 			btrfs_tree_unlock(path->nodes[0]);
 			free_extent_buffer(path->nodes[0]);
 			path->nodes[0] = right;
@@ -3860,7 +3888,12 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
 		return ret;
 	}
 
-	copy_for_split(trans, path, l, right, slot, mid, nritems);
+	ret = copy_for_split(trans, path, l, right, slot, mid, nritems);
+	if (ret < 0) {
+		btrfs_tree_unlock(right);
+		free_extent_buffer(right);
+		return ret;
+	}
 
 	if (split == 2) {
 		BUG_ON(num_doubles != 0);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr()
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (11 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr() fdmanana
@ 2023-06-08 10:27   ` fdmanana
  2023-06-09 16:51   ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations David Sterba
  2023-06-09 17:20   ` David Sterba
  14 siblings, 0 replies; 54+ messages in thread
From: fdmanana @ 2023-06-08 10:27 UTC (permalink / raw)
  To: linux-btrfs

From: Filipe Manana <fdmanana@suse.com>

At btrfs_del_ptr(), instead of doing a BUG_ON() in case we fail to record
tree mod log operations, do a transaction abort and return the error to
the callers. There's really no need for the BUG_ON() as we can release all
resources in the context of all callers, and we have to abort because other
future tree searches that use the tree mod log (btrfs_search_old_slot())
may get inconsistent results if other operations modify the tree after
that failure and before the tree mod log based search.

This implies btrfs_del_ptr() return an int instead of void, and making all
callers check for returned errors.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 52 ++++++++++++++++++++++++++++++++++++------------
 fs/btrfs/ctree.h |  4 ++--
 2 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 0188cf6e30bf..29c5fa252eb1 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1139,7 +1139,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		if (btrfs_header_nritems(right) == 0) {
 			btrfs_clear_buffer_dirty(trans, right);
 			btrfs_tree_unlock(right);
-			btrfs_del_ptr(root, path, level + 1, pslot + 1);
+			ret = btrfs_del_ptr(trans, root, path, level + 1, pslot + 1);
+			if (ret < 0) {
+				free_extent_buffer_stale(right);
+				right = NULL;
+				goto out;
+			}
 			root_sub_used(root, right->len);
 			btrfs_free_tree_block(trans, btrfs_root_id(root), right,
 					      0, 1);
@@ -1192,7 +1197,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 	if (btrfs_header_nritems(mid) == 0) {
 		btrfs_clear_buffer_dirty(trans, mid);
 		btrfs_tree_unlock(mid);
-		btrfs_del_ptr(root, path, level + 1, pslot);
+		ret = btrfs_del_ptr(trans, root, path, level + 1, pslot);
+		if (ret < 0) {
+			free_extent_buffer_stale(mid);
+			mid = NULL;
+			goto out;
+		}
 		root_sub_used(root, mid->len);
 		btrfs_free_tree_block(trans, btrfs_root_id(root), mid, 0, 1);
 		free_extent_buffer_stale(mid);
@@ -4440,8 +4450,8 @@ int btrfs_duplicate_item(struct btrfs_trans_handle *trans,
  *
  * This is exported for use inside btrfs-progs, don't un-export it.
  */
-void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
-		   int slot)
+int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+		  struct btrfs_path *path, int level, int slot)
 {
 	struct extent_buffer *parent = path->nodes[level];
 	u32 nritems;
@@ -4452,7 +4462,10 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
 		if (level) {
 			ret = btrfs_tree_mod_log_insert_move(parent, slot,
 					slot + 1, nritems - slot - 1);
-			BUG_ON(ret < 0);
+			if (ret < 0) {
+				btrfs_abort_transaction(trans, ret);
+				return ret;
+			}
 		}
 		memmove_extent_buffer(parent,
 			      btrfs_node_key_ptr_offset(parent, slot),
@@ -4462,7 +4475,10 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
 	} else if (level) {
 		ret = btrfs_tree_mod_log_insert_key(parent, slot,
 						    BTRFS_MOD_LOG_KEY_REMOVE);
-		BUG_ON(ret < 0);
+		if (ret < 0) {
+			btrfs_abort_transaction(trans, ret);
+			return ret;
+		}
 	}
 
 	nritems--;
@@ -4478,6 +4494,7 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
 		fixup_low_keys(path, &disk_key, level + 1);
 	}
 	btrfs_mark_buffer_dirty(parent);
+	return 0;
 }
 
 /*
@@ -4490,13 +4507,17 @@ void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
  * The path must have already been setup for deleting the leaf, including
  * all the proper balancing.  path->nodes[1] must be locked.
  */
-static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans,
-				    struct btrfs_root *root,
-				    struct btrfs_path *path,
-				    struct extent_buffer *leaf)
+static noinline int btrfs_del_leaf(struct btrfs_trans_handle *trans,
+				   struct btrfs_root *root,
+				   struct btrfs_path *path,
+				   struct extent_buffer *leaf)
 {
+	int ret;
+
 	WARN_ON(btrfs_header_generation(leaf) != trans->transid);
-	btrfs_del_ptr(root, path, 1, path->slots[1]);
+	ret = btrfs_del_ptr(trans, root, path, 1, path->slots[1]);
+	if (ret < 0)
+		return ret;
 
 	/*
 	 * btrfs_free_extent is expensive, we want to make sure we
@@ -4509,6 +4530,7 @@ static noinline void btrfs_del_leaf(struct btrfs_trans_handle *trans,
 	atomic_inc(&leaf->refs);
 	btrfs_free_tree_block(trans, btrfs_root_id(root), leaf, 0, 1);
 	free_extent_buffer_stale(leaf);
+	return 0;
 }
 /*
  * delete the item at the leaf level in path.  If that empties
@@ -4558,7 +4580,9 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 			btrfs_set_header_level(leaf, 0);
 		} else {
 			btrfs_clear_buffer_dirty(trans, leaf);
-			btrfs_del_leaf(trans, root, path, leaf);
+			ret = btrfs_del_leaf(trans, root, path, leaf);
+			if (ret < 0)
+				return ret;
 		}
 	} else {
 		int used = leaf_space_used(leaf, 0, nritems);
@@ -4619,7 +4643,9 @@ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 
 			if (btrfs_header_nritems(leaf) == 0) {
 				path->slots[1] = slot;
-				btrfs_del_leaf(trans, root, path, leaf);
+				ret = btrfs_del_leaf(trans, root, path, leaf);
+				if (ret < 0)
+					return ret;
 				free_extent_buffer(leaf);
 				ret = 0;
 			} else {
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 5af61480dde6..f2d2b313bde5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -541,8 +541,8 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
 		      struct extent_buffer **cow_ret, u64 new_root_objectid);
 int btrfs_block_can_be_shared(struct btrfs_root *root,
 			      struct extent_buffer *buf);
-void btrfs_del_ptr(struct btrfs_root *root, struct btrfs_path *path, int level,
-		   int slot);
+int btrfs_del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+		  struct btrfs_path *path, int level, int slot);
 void btrfs_extend_item(struct btrfs_path *path, u32 data_size);
 void btrfs_truncate_item(struct btrfs_path *path, u32 new_size, int from_end);
 int btrfs_split_item(struct btrfs_trans_handle *trans,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 08/13] btrfs: abort transaction at balance_level() when left child is missing
  2023-06-08 10:27   ` [PATCH v2 08/13] btrfs: abort transaction at balance_level() when left child is missing fdmanana
@ 2023-06-08 10:37     ` Qu Wenruo
  2023-06-08 10:52     ` Qu Wenruo
  1 sibling, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08 10:37 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 18:27, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At balance_level() we are calling btrfs_handle_fs_error() when the middle
> child only has 1 item and the left child is missing, however we can simply
> use btrfs_abort_transaction(), which achieves the same purposes: to turn
> the fs to error state, abort the current transaction and turn the fs to
> RO mode. Besides that, btrfs_abort_transaction() also prints a stack trace
> which makes it more useful.
>
> Also, as this is a highly unexpected case and it's about a b+tree
> inconsistency, change the error code from -EROFS to -EUCLEAN, tag the if
> branch as 'unlikely' and log an explicit error message.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4dcdcf25c3fe..00eea2925d1d 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1164,9 +1164,13 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		 * otherwise we would have pulled some pointers from the
>   		 * right
>   		 */
> -		if (!left) {
> -			ret = -EROFS;
> -			btrfs_handle_fs_error(fs_info, ret, NULL);
> +		if (unlikely(!left)) {
> +			btrfs_crit(fs_info,
> +"missing left child when middle child only has 1 item, parent bytenr %llu level %d mid bytenr %llu root %llu",
> +				   parent->start, btrfs_header_level(parent),
> +				   mid->start, btrfs_root_id(root));
> +			ret = -EUCLEAN;
> +			btrfs_abort_transaction(trans, ret);
>   			goto out;
>   		}
>   		wret = balance_node_right(trans, mid, left);

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state at balance_level()
  2023-06-08 10:27   ` [PATCH v2 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state " fdmanana
@ 2023-06-08 10:51     ` Qu Wenruo
  2023-06-08 11:00       ` Filipe Manana
  0 siblings, 1 reply; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08 10:51 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 18:27, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At balance_level(), when trying to promote a child node to a root node, if
> we fail to read the child we call btrfs_handle_fs_error(), which turns the
> fs to RO mode and sets it to error state as well, causing any ongoing
> transaction to abort. This however is not necessary because at that point
> we have not made any change yet at balance_level(), so any error reading
> the child node does not leaves us in any inconsistent state. Therefore we
> can just return the error to the caller.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Although I'd like to add some comments on the error handling.

The catch here is, we can only hit the branch because @mid is already
the highest tree block of the path.
Thus the path has no CoWed tree block in it at all.

If the condition is not met, we will return an error while some CoWed
tree blocks are still in the path.
In that case, a simple btrfs_release_path() will only reduce the refs
and unlock, but not remove the delayed refs.

Thus this is more like an exception, other locations can not follow the
practice here.

Thanks,
Qu

> ---
>   fs/btrfs/ctree.c | 1 -
>   1 file changed, 1 deletion(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index e98f9e205e25..4dcdcf25c3fe 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1040,7 +1040,6 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		child = btrfs_read_node_slot(mid, 0);
>   		if (IS_ERR(child)) {
>   			ret = PTR_ERR(child);
> -			btrfs_handle_fs_error(fs_info, ret, NULL);
>   			goto out;
>   		}
>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero
  2023-06-08 10:27   ` [PATCH v2 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero fdmanana
@ 2023-06-08 10:52     ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08 10:52 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 18:27, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At update_ref_for_cow() we are calling btrfs_handle_fs_error() if we find
> that the extent buffer has an unexpected ref count of zero, however we can
> simply use btrfs_abort_transaction(), which achieves the same purposes: to
> turn the fs to error state, abort the current transaction and turn the fs
> to RO mode as well. Besides that, btrfs_abort_transaction() also prints a
> stack trace which makes it more useful.
>
> Also, as this is a very unexpected situation, indicating a serious
> corruption/inconsistency, tag the if branch as 'unlikely', set the error
> code to -EUCLEAN instead of -EROFS, and log an explicit message.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 00eea2925d1d..0449b3d819a9 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -421,9 +421,13 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
>   					       &refs, &flags);
>   		if (ret)
>   			return ret;
> -		if (refs == 0) {
> -			ret = -EROFS;
> -			btrfs_handle_fs_error(fs_info, ret, NULL);
> +		if (unlikely(refs == 0)) {
> +			btrfs_crit(fs_info,
> +				   "found 0 references for tree block at bytenr %llu level %d root %llu",
> +				   buf->start, btrfs_header_level(buf),
> +				   btrfs_root_id(root));
> +			ret = -EUCLEAN;
> +			btrfs_abort_transaction(trans, ret);
>   			return ret;
>   		}
>   	} else {

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 08/13] btrfs: abort transaction at balance_level() when left child is missing
  2023-06-08 10:27   ` [PATCH v2 08/13] btrfs: abort transaction at balance_level() when left child is missing fdmanana
  2023-06-08 10:37     ` Qu Wenruo
@ 2023-06-08 10:52     ` Qu Wenruo
  1 sibling, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08 10:52 UTC (permalink / raw)
  To: fdmanana, linux-btrfs



On 2023/6/8 18:27, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> At balance_level() we are calling btrfs_handle_fs_error() when the middle
> child only has 1 item and the left child is missing, however we can simply
> use btrfs_abort_transaction(), which achieves the same purposes: to turn
> the fs to error state, abort the current transaction and turn the fs to
> RO mode. Besides that, btrfs_abort_transaction() also prints a stack trace
> which makes it more useful.
>
> Also, as this is a highly unexpected case and it's about a b+tree
> inconsistency, change the error code from -EROFS to -EUCLEAN, tag the if
> branch as 'unlikely' and log an explicit error message.
>
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4dcdcf25c3fe..00eea2925d1d 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1164,9 +1164,13 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		 * otherwise we would have pulled some pointers from the
>   		 * right
>   		 */
> -		if (!left) {
> -			ret = -EROFS;
> -			btrfs_handle_fs_error(fs_info, ret, NULL);
> +		if (unlikely(!left)) {
> +			btrfs_crit(fs_info,
> +"missing left child when middle child only has 1 item, parent bytenr %llu level %d mid bytenr %llu root %llu",
> +				   parent->start, btrfs_header_level(parent),
> +				   mid->start, btrfs_root_id(root));
> +			ret = -EUCLEAN;
> +			btrfs_abort_transaction(trans, ret);
>   			goto out;
>   		}
>   		wret = balance_node_right(trans, mid, left);

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state at balance_level()
  2023-06-08 10:51     ` Qu Wenruo
@ 2023-06-08 11:00       ` Filipe Manana
  2023-06-08 11:04         ` Qu Wenruo
  0 siblings, 1 reply; 54+ messages in thread
From: Filipe Manana @ 2023-06-08 11:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jun 8, 2023 at 11:51 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/6/8 18:27, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > At balance_level(), when trying to promote a child node to a root node, if
> > we fail to read the child we call btrfs_handle_fs_error(), which turns the
> > fs to RO mode and sets it to error state as well, causing any ongoing
> > transaction to abort. This however is not necessary because at that point
> > we have not made any change yet at balance_level(), so any error reading
> > the child node does not leaves us in any inconsistent state. Therefore we
> > can just return the error to the caller.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
>
> Although I'd like to add some comments on the error handling.
>
> The catch here is, we can only hit the branch because @mid is already
> the highest tree block of the path.
> Thus the path has no CoWed tree block in it at all.

Even if it's not the highest level, there's no problem at all.
COWing blocks without doing anything else doesn't leave a tree in an
inconsistent state,
as long as each parent points to the new (COWed) child.

>
> If the condition is not met, we will return an error while some CoWed
> tree blocks are still in the path.

As said before, the COWed blocks are fine, the tree is consistent as
long as each
parent points to the new blocks.

> In that case, a simple btrfs_release_path() will only reduce the refs
> and unlock, but not remove the delayed refs.

btrfs_release_path() is never responsible for adding delayed blocks.
That happens during COW, when we call btrfs_free_tree_block().

>
> Thus this is more like an exception, other locations can not follow the
> practice here.
>
> Thanks,
> Qu
>
> > ---
> >   fs/btrfs/ctree.c | 1 -
> >   1 file changed, 1 deletion(-)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index e98f9e205e25..4dcdcf25c3fe 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -1040,7 +1040,6 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
> >               child = btrfs_read_node_slot(mid, 0);
> >               if (IS_ERR(child)) {
> >                       ret = PTR_ERR(child);
> > -                     btrfs_handle_fs_error(fs_info, ret, NULL);
> >                       goto out;
> >               }
> >

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state at balance_level()
  2023-06-08 11:00       ` Filipe Manana
@ 2023-06-08 11:04         ` Qu Wenruo
  0 siblings, 0 replies; 54+ messages in thread
From: Qu Wenruo @ 2023-06-08 11:04 UTC (permalink / raw)
  To: Filipe Manana; +Cc: linux-btrfs



On 2023/6/8 19:00, Filipe Manana wrote:
> On Thu, Jun 8, 2023 at 11:51 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2023/6/8 18:27, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> At balance_level(), when trying to promote a child node to a root node, if
>>> we fail to read the child we call btrfs_handle_fs_error(), which turns the
>>> fs to RO mode and sets it to error state as well, causing any ongoing
>>> transaction to abort. This however is not necessary because at that point
>>> we have not made any change yet at balance_level(), so any error reading
>>> the child node does not leaves us in any inconsistent state. Therefore we
>>> can just return the error to the caller.
>>>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>
>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>
>> Although I'd like to add some comments on the error handling.
>>
>> The catch here is, we can only hit the branch because @mid is already
>> the highest tree block of the path.
>> Thus the path has no CoWed tree block in it at all.
>
> Even if it's not the highest level, there's no problem at all.
> COWing blocks without doing anything else doesn't leave a tree in an
> inconsistent state,
> as long as each parent points to the new (COWed) child.

Oh, you're right, I forgot that the newly COWed tree blocks should
always be accessible from the root node.

There is no problem at all from the very beginning.

Thanks,
Qu
>
>>
>> If the condition is not met, we will return an error while some CoWed
>> tree blocks are still in the path.
>
> As said before, the COWed blocks are fine, the tree is consistent as
> long as each
> parent points to the new blocks.
>
>> In that case, a simple btrfs_release_path() will only reduce the refs
>> and unlock, but not remove the delayed refs.
>
> btrfs_release_path() is never responsible for adding delayed blocks.
> That happens during COW, when we call btrfs_free_tree_block().
>
>>
>> Thus this is more like an exception, other locations can not follow the
>> practice here.
>>
>> Thanks,
>> Qu
>>
>>> ---
>>>    fs/btrfs/ctree.c | 1 -
>>>    1 file changed, 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>> index e98f9e205e25..4dcdcf25c3fe 100644
>>> --- a/fs/btrfs/ctree.c
>>> +++ b/fs/btrfs/ctree.c
>>> @@ -1040,7 +1040,6 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>>>                child = btrfs_read_node_slot(mid, 0);
>>>                if (IS_ERR(child)) {
>>>                        ret = PTR_ERR(child);
>>> -                     btrfs_handle_fs_error(fs_info, ret, NULL);
>>>                        goto out;
>>>                }
>>>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 08/13] btrfs: abort transaction at balance_level() when left child is missing
  2023-06-08  9:47     ` Filipe Manana
@ 2023-06-08 12:26       ` David Sterba
  0 siblings, 0 replies; 54+ messages in thread
From: David Sterba @ 2023-06-08 12:26 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Qu Wenruo, linux-btrfs

On Thu, Jun 08, 2023 at 10:47:49AM +0100, Filipe Manana wrote:
> On Thu, Jun 8, 2023 at 9:58 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >
> >
> >
> > On 2023/6/8 03:24, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > At balance_level() we are calling btrfs_handle_fs_error() when the middle
> > > child only has 1 item and the left child is missing, however we can simply
> > > use btrfs_abort_transaction(), which achieves the same purposes: to turn
> > > the fs to error state, abort the current transaction and turn the fs to
> > > RO mode. Besides that, btrfs_abort_transaction() also prints a stack trace
> > > which makes it more useful.
> > >
> > > Also, as this is an highly unexpected case and it's about a b+tree
> > > inconsistency, change the error code from -EROFS to -EUCLEAN and tag the
> > > if branch as 'unlikely'.
> > >
> > > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > > ---
> > >   fs/btrfs/ctree.c | 6 +++---
> > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > > index 4dcdcf25c3fe..e2943047b01d 100644
> > > --- a/fs/btrfs/ctree.c
> > > +++ b/fs/btrfs/ctree.c
> > > @@ -1164,9 +1164,9 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
> > >                * otherwise we would have pulled some pointers from the
> > >                * right
> > >                */
> > > -             if (!left) {
> > > -                     ret = -EROFS;
> > > -                     btrfs_handle_fs_error(fs_info, ret, NULL);
> > > +             if (unlikely(!left)) {
> > > +                     ret = -EUCLEAN;
> >
> > I'd prefer to have an message every time we return -EUCLEAN.
> 
> Sure... I didn't see a strong need for that because the transaction
> abort's stack
> trace will be obvious. But I can add it in v2.

The error messages are a nice to have description of what happened, it
would be logged and may help to recognize the problem without reading
the stack trace or source. Also the message itself can print the values
for the failed conditions so this can give some clue as well.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (12 preceding siblings ...)
  2023-06-08 10:27   ` [PATCH v2 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr() fdmanana
@ 2023-06-09 16:51   ` David Sterba
  2023-06-09 17:20   ` David Sterba
  14 siblings, 0 replies; 54+ messages in thread
From: David Sterba @ 2023-06-09 16:51 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Thu, Jun 08, 2023 at 11:27:36AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> This mostly helps avoid some unnecessary enomem failures when logging
> tree mod log operations and replace some BUG_ON()'s when dealing with
> such failures. There's also 2 bug fixes (the first two patches) and
> some cleanups. More details on the changelogs.
> 
> V2: Add explicit error messages in patches 8/13 and 9/13.
>     Add missing unlock and ref count drop of 'right' extent buffer to patch 12/13.
>     Add missing extent buffer ref count drops for right and mid extent buffers in
>     error paths of balance_level() to patch 13/13.
>     Fix subject of patch 2/13 (removed duplicated word).
>     Added Reviewed-by tags where appropriate.
> 
> Filipe Manana (13):
>   btrfs: add missing error handling when logging operation while COWing extent buffer
>   btrfs: fix extent buffer leak after tree mod log failure at split_node()
>   btrfs: avoid tree mod log ENOMEM failures when we don't need to log
>   btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()
>   btrfs: do not BUG_ON() on tree mod log failure at balance_level()
>   btrfs: rename enospc label to out at balance_level()
>   btrfs: avoid unnecessarily setting the fs to RO and error state at balance_level()
>   btrfs: abort transaction at balance_level() when left child is missing
>   btrfs: abort transaction at update_ref_for_cow() when ref count is zero
>   btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert()
>   btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()
>   btrfs: do not BUG_ON() on tree mod log failures at insert_ptr()
>   btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr()

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations
  2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
                     ` (13 preceding siblings ...)
  2023-06-09 16:51   ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations David Sterba
@ 2023-06-09 17:20   ` David Sterba
  14 siblings, 0 replies; 54+ messages in thread
From: David Sterba @ 2023-06-09 17:20 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

On Thu, Jun 08, 2023 at 11:27:36AM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> This mostly helps avoid some unnecessary enomem failures when logging
> tree mod log operations and replace some BUG_ON()'s when dealing with
> such failures. There's also 2 bug fixes (the first two patches) and
> some cleanups. More details on the changelogs.

The net effect of this patchset (+ the split_item fix) is -12 BUG_ONs,
that's great. Lost of them have been there for a long time. There are
still a few more remaining in ctree.c, some of them look like assertions
but verifying the conditions besides assertions would be good too. That
can be decided case by case.

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2023-06-09 17:26 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-07 19:24 [PATCH 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
2023-06-07 19:24 ` [PATCH 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer fdmanana
2023-06-08  9:25   ` Qu Wenruo
2023-06-07 19:24 ` [PATCH 02/13] btrfs: fix extent buffer leak after failure tree mod log failure at split_node() fdmanana
2023-06-08  8:40   ` Qu Wenruo
2023-06-07 19:24 ` [PATCH 03/13] btrfs: avoid tree mod log ENOMEM failures when we don't need to log fdmanana
2023-06-07 19:24 ` [PATCH 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block() fdmanana
2023-06-08  8:48   ` Qu Wenruo
2023-06-07 19:24 ` [PATCH 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level() fdmanana
2023-06-08  8:52   ` Qu Wenruo
2023-06-07 19:24 ` [PATCH 06/13] btrfs: rename enospc label to out " fdmanana
2023-06-08  8:53   ` Qu Wenruo
2023-06-08  9:21   ` Anand Jain
2023-06-07 19:24 ` [PATCH 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state " fdmanana
2023-06-07 19:24 ` [PATCH 08/13] btrfs: abort transaction at balance_level() when left child is missing fdmanana
2023-06-08  8:57   ` Qu Wenruo
2023-06-08  9:47     ` Filipe Manana
2023-06-08 12:26       ` David Sterba
2023-06-07 19:24 ` [PATCH 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero fdmanana
2023-06-08  8:58   ` Qu Wenruo
2023-06-08  9:47     ` Filipe Manana
2023-06-07 19:24 ` [PATCH 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert() fdmanana
2023-06-08  9:02   ` Qu Wenruo
2023-06-08  9:46     ` Filipe Manana
2023-06-08 10:19       ` Qu Wenruo
2023-06-07 19:24 ` [PATCH 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root() fdmanana
2023-06-08  9:11   ` Qu Wenruo
2023-06-07 19:24 ` [PATCH 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr() fdmanana
2023-06-08  9:16   ` Qu Wenruo
2023-06-08  9:43     ` Filipe Manana
2023-06-07 19:24 ` [PATCH 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr() fdmanana
2023-06-08  9:19   ` Qu Wenruo
2023-06-08 10:27 ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations fdmanana
2023-06-08 10:27   ` [PATCH v2 01/13] btrfs: add missing error handling when logging operation while COWing extent buffer fdmanana
2023-06-08 10:27   ` [PATCH v2 02/13] btrfs: fix extent buffer leak after tree mod log failure at split_node() fdmanana
2023-06-08 10:27   ` [PATCH v2 03/13] btrfs: avoid tree mod log ENOMEM failures when we don't need to log fdmanana
2023-06-08 10:27   ` [PATCH v2 04/13] btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block() fdmanana
2023-06-08 10:27   ` [PATCH v2 05/13] btrfs: do not BUG_ON() on tree mod log failure at balance_level() fdmanana
2023-06-08 10:27   ` [PATCH v2 06/13] btrfs: rename enospc label to out " fdmanana
2023-06-08 10:27   ` [PATCH v2 07/13] btrfs: avoid unnecessarily setting the fs to RO and error state " fdmanana
2023-06-08 10:51     ` Qu Wenruo
2023-06-08 11:00       ` Filipe Manana
2023-06-08 11:04         ` Qu Wenruo
2023-06-08 10:27   ` [PATCH v2 08/13] btrfs: abort transaction at balance_level() when left child is missing fdmanana
2023-06-08 10:37     ` Qu Wenruo
2023-06-08 10:52     ` Qu Wenruo
2023-06-08 10:27   ` [PATCH v2 09/13] btrfs: abort transaction at update_ref_for_cow() when ref count is zero fdmanana
2023-06-08 10:52     ` Qu Wenruo
2023-06-08 10:27   ` [PATCH v2 10/13] btrfs: do not BUG_ON() on tree mod log failures at push_nodes_for_insert() fdmanana
2023-06-08 10:27   ` [PATCH v2 11/13] btrfs: do not BUG_ON() on tree mod log failure at insert_new_root() fdmanana
2023-06-08 10:27   ` [PATCH v2 12/13] btrfs: do not BUG_ON() on tree mod log failures at insert_ptr() fdmanana
2023-06-08 10:27   ` [PATCH v2 13/13] btrfs: do not BUG_ON() on tree mod log failures at btrfs_del_ptr() fdmanana
2023-06-09 16:51   ` [PATCH v2 00/13] btrfs: some fixes and updates around handling errors for tree mod log operations David Sterba
2023-06-09 17:20   ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.