linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] Serious fixes for different error paths
@ 2020-12-16 16:22 Josef Bacik
  2020-12-16 16:22 ` [PATCH 01/13] btrfs: don't get an EINTR during drop_snapshot for reloc Josef Bacik
                   ` (13 more replies)
  0 siblings, 14 replies; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

A lot of these were in previous versions of the relocation error handling
patches.  I added a few since the last go around.  All of these do not rely on
the error handling patches, and some of them are quite important otherwise we
get corruption if we get errors in certain spots.  There's also a few lockdep
fixes and such.  These really need to go in ASAP, regardless of when the
relocation error handling patches are merged.  They're mostly small and self
contained, the only "big" one being the one that tracks the root owner for
relocation reads, which is to resolve the remaining class of lockdep errors we
get because of an improper lockdep class set on the extent buffer.  Thanks,

Josef

Josef Bacik (13):
  btrfs: don't get an EINTR during drop_snapshot for reloc
  btrfs: initialize test inodes location
  btrfs: fix reloc root leak with 0 ref reloc roots on recovery
  btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
  btrfs: do not WARN_ON() if we can't find the reloc root
  btrfs: add ASSERT()'s for deleting backref cache nodes
  btrfs: do not double free backref nodes on error
  btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
  btrfs: modify the new_root highest_objectid under a ref count
  btrfs: fix lockdep splat in btrfs_recover_relocation
  btrfs: keep track of the root owner for relocation reads
  btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
  btrfs: don't clear ret in btrfs_start_dirty_block_groups

 fs/btrfs/backref.c           | 11 ++------
 fs/btrfs/backref.h           |  9 ++++---
 fs/btrfs/block-group.c       |  6 ++++-
 fs/btrfs/ctree.c             |  5 ++--
 fs/btrfs/extent-tree.c       |  5 +++-
 fs/btrfs/ioctl.c             | 10 ++++---
 fs/btrfs/relocation.c        | 51 +++++++++++++++++++++++++++++++-----
 fs/btrfs/tests/btrfs-tests.c |  7 ++++-
 fs/btrfs/tests/inode-tests.c |  9 -------
 fs/btrfs/volumes.c           |  2 ++
 10 files changed, 79 insertions(+), 36 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/13] btrfs: don't get an EINTR during drop_snapshot for reloc
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2021-01-11 21:56   ` David Sterba
  2020-12-16 16:22 ` [PATCH 02/13] btrfs: initialize test inodes location Josef Bacik
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

This was partially fixed by f3e3d9cc35252, however it missed a spot when
we restart a trans handle because we need to end the transaction.  The
fix is the same, simply use btrfs_join_transaction() instead of
btrfs_start_transaction() when deleting reloc roots.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d79b8369e6aa..08c664d04824 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5549,7 +5549,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 				goto out_free;
 			}
 
-			trans = btrfs_start_transaction(tree_root, 0);
+			if (for_reloc)
+				trans = btrfs_join_transaction(tree_root);
+			else
+				trans = btrfs_start_transaction(tree_root, 0);
 			if (IS_ERR(trans)) {
 				err = PTR_ERR(trans);
 				goto out_free;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 02/13] btrfs: initialize test inodes location
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
  2020-12-16 16:22 ` [PATCH 01/13] btrfs: don't get an EINTR during drop_snapshot for reloc Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2020-12-18 10:30   ` Nikolay Borisov
  2020-12-16 16:22 ` [PATCH 03/13] btrfs: fix reloc root leak with 0 ref reloc roots on recovery Josef Bacik
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While testing other things I was noticing that sometimes my VM would
fail to load the btrfs module because the self test failed like this

BTRFS: selftest: fs/btrfs/tests/inode-tests.c:963 miscount, wanted 1, got 0

This turned out to be because sometimes the btrfs ino would be the btree
inode number, and thus we'd skip calling the set extent delalloc bit
helper, and thus not adjust ->outstanding_extents.  Fix this by making
sure we init test inodes with a valid inode number so that we don't get
random failures during self tests.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/tests/btrfs-tests.c | 7 ++++++-
 fs/btrfs/tests/inode-tests.c | 9 ---------
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index 8ca334d554af..0fede1514a3e 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -55,8 +55,13 @@ struct inode *btrfs_new_test_inode(void)
 	struct inode *inode;
 
 	inode = new_inode(test_mnt->mnt_sb);
-	if (inode)
+	if (inode) {
+		inode->i_mode = S_IFREG;
+		BTRFS_I(inode)->location.type = BTRFS_INODE_ITEM_KEY;
+		BTRFS_I(inode)->location.objectid = BTRFS_FIRST_FREE_OBJECTID;
+		BTRFS_I(inode)->location.offset = 0;
 		inode_init_owner(inode, NULL, S_IFREG);
+	}
 
 	return inode;
 }
diff --git a/fs/btrfs/tests/inode-tests.c b/fs/btrfs/tests/inode-tests.c
index 04022069761d..c9874b12d337 100644
--- a/fs/btrfs/tests/inode-tests.c
+++ b/fs/btrfs/tests/inode-tests.c
@@ -232,11 +232,6 @@ static noinline int test_btrfs_get_extent(u32 sectorsize, u32 nodesize)
 		return ret;
 	}
 
-	inode->i_mode = S_IFREG;
-	BTRFS_I(inode)->location.type = BTRFS_INODE_ITEM_KEY;
-	BTRFS_I(inode)->location.objectid = BTRFS_FIRST_FREE_OBJECTID;
-	BTRFS_I(inode)->location.offset = 0;
-
 	fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize);
 	if (!fs_info) {
 		test_std_err(TEST_ALLOC_FS_INFO);
@@ -835,10 +830,6 @@ static int test_hole_first(u32 sectorsize, u32 nodesize)
 		return ret;
 	}
 
-	BTRFS_I(inode)->location.type = BTRFS_INODE_ITEM_KEY;
-	BTRFS_I(inode)->location.objectid = BTRFS_FIRST_FREE_OBJECTID;
-	BTRFS_I(inode)->location.offset = 0;
-
 	fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize);
 	if (!fs_info) {
 		test_std_err(TEST_ALLOC_FS_INFO);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 03/13] btrfs: fix reloc root leak with 0 ref reloc roots on recovery
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
  2020-12-16 16:22 ` [PATCH 01/13] btrfs: don't get an EINTR during drop_snapshot for reloc Josef Bacik
  2020-12-16 16:22 ` [PATCH 02/13] btrfs: initialize test inodes location Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2020-12-16 16:22 ` [PATCH 04/13] btrfs: splice remaining dirty_bg's onto the transaction dirty bg list Josef Bacik
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When recovering a relocation, if we run into a reloc root that has 0
refs we simply add it to the reloc_control->reloc_roots list, and then
clean it up later.  The problem with this is __del_reloc_root() doesn't
do anything if the root isn't in the radix tree, which in this case it
won't be because we never call __add_reloc_root() on the reloc_root.

This exit condition simply isn't correct really.  During normal
operation we can remove ourselves from the rb tree and then we're meant
to clean up later at merge_reloc_roots() time, and this happens
correctly.  During recovery we're depending on free_reloc_roots() to
drop our references, but we're short-circuiting.

Fix this by continuing to check if we're on the list and dropping
ourselves from the reloc_control root list and dropping our reference
appropriately.  Change the corresponding BUG_ON() to an ASSERT() that
does the correct thing if we aren't in the rb tree.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index f9500262106e..073666fb7b57 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -668,9 +668,7 @@ static void __del_reloc_root(struct btrfs_root *root)
 			RB_CLEAR_NODE(&node->rb_node);
 		}
 		spin_unlock(&rc->reloc_root_tree.lock);
-		if (!node)
-			return;
-		BUG_ON((struct btrfs_root *)node->data != root);
+		ASSERT(!node || (struct btrfs_root *)node->data == root);
 	}
 
 	/*
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 04/13] btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (2 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 03/13] btrfs: fix reloc root leak with 0 ref reloc roots on recovery Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2021-01-11 22:09   ` David Sterba
  2020-12-16 16:22 ` [PATCH 05/13] btrfs: do not WARN_ON() if we can't find the reloc root Josef Bacik
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

While doing error injection testing with my relocation patches I hit the
following ASSERT()

assertion failed: list_empty(&block_group->dirty_list), in fs/btrfs/block-group.c:3356
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.h:3357!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 PID: 24351 Comm: umount Tainted: G        W         5.10.0-rc3+ #193
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
RIP: 0010:assertfail.constprop.0+0x18/0x1a
RSP: 0018:ffffa09b019c7e00 EFLAGS: 00010282
RAX: 0000000000000056 RBX: ffff8f6492c18000 RCX: 0000000000000000
RDX: ffff8f64fbc27c60 RSI: ffff8f64fbc19050 RDI: ffff8f64fbc19050
RBP: ffff8f6483bbdc00 R08: 0000000000000000 R09: 0000000000000000
R10: ffffa09b019c7c38 R11: ffffffff85d70928 R12: ffff8f6492c18100
R13: ffff8f6492c18148 R14: ffff8f6483bbdd70 R15: dead000000000100
FS:  00007fbfda4cdc40(0000) GS:ffff8f64fbc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbfda666fd0 CR3: 000000013cf66002 CR4: 0000000000370ef0
Call Trace:
 btrfs_free_block_groups.cold+0x55/0x55
 close_ctree+0x2c5/0x306
 ? fsnotify_destroy_marks+0x14/0x100
 generic_shutdown_super+0x6c/0x100
 kill_anon_super+0x14/0x30
 btrfs_kill_super+0x12/0x20
 deactivate_locked_super+0x36/0xa0
 cleanup_mnt+0x12d/0x190
 task_work_run+0x5c/0xa0
 exit_to_user_mode_prepare+0x1b1/0x1d0
 syscall_exit_to_user_mode+0x54/0x280
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happened because I injected an error in btrfs_cow_block() while
running the dirty block groups.  When we run the dirty block groups, we
splice the list onto a local list to process.  However if an error
occurs, we only cleanup the transactions dirty block group list, not any
pending block groups we have on our locally spliced list.  Fix this by
splicing the list back onto the transactions dirty block group list, so
any remaining block groups are cleaned up.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 52f2198d44c9..69f8a306d70d 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2684,6 +2684,9 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 		}
 		spin_unlock(&cur_trans->dirty_bgs_lock);
 	} else if (ret < 0) {
+		spin_lock(&cur_trans->dirty_bgs_lock);
+		list_splice_init(&dirty, &cur_trans->dirty_bgs);
+		spin_unlock(&cur_trans->dirty_bgs_lock);
 		btrfs_cleanup_dirty_bgs(cur_trans, fs_info);
 	}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 05/13] btrfs: do not WARN_ON() if we can't find the reloc root
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (3 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 04/13] btrfs: splice remaining dirty_bg's onto the transaction dirty bg list Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2021-01-11 22:14   ` David Sterba
  2020-12-16 16:22 ` [PATCH 06/13] btrfs: add ASSERT()'s for deleting backref cache nodes Josef Bacik
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Zygo Blaxell

Any number of things could have gone wrong, like ENOMEM or EIO, so don't
WARN_ON() if we're unable to find the reloc root in the backref code.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/backref.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 02d7d7b2563b..f0877d2883f9 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -2624,7 +2624,7 @@ static int handle_direct_tree_backref(struct btrfs_backref_cache *cache,
 		/* Only reloc backref cache cares about a specific root */
 		if (cache->is_reloc) {
 			root = find_reloc_root(cache->fs_info, cur->bytenr);
-			if (WARN_ON(!root))
+			if (!root)
 				return -ENOENT;
 			cur->root = root;
 		} else {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/13] btrfs: add ASSERT()'s for deleting backref cache nodes
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (4 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 05/13] btrfs: do not WARN_ON() if we can't find the reloc root Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2021-01-11 22:18   ` David Sterba
  2020-12-16 16:22 ` [PATCH 07/13] btrfs: do not double free backref nodes on error Josef Bacik
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

A weird KASAN problem that Zygo reported could have been easily caught
if we checked for basic things in our backref freeing code.  We have two
methods of freeing a backref node

- btrfs_backref_free_node: this just is kfree() essentially.
- btrfs_backref_drop_node: this actually unlinks the node and cleans up
  everything and then calls btrfs_backref_free_node().

We should mostly be using btrfs_backref_drop_node(), to make sure the
node is properly unlinked from the backref cache, and only use
btrfs_backref_free_node() when we know the node isn't actually linked to
the backref cache.  We made a mistake here and thus got the KASAN splat.
Make this style of issue easier to find by adding some ASSERT()'s to
btrfs_backref_free_node() and adjusting our deletion stuff to properly
init the list so we can rely on list_empty() checks working properly.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/backref.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/backref.h b/fs/btrfs/backref.h
index ff705cc564a9..17abde7f794c 100644
--- a/fs/btrfs/backref.h
+++ b/fs/btrfs/backref.h
@@ -296,6 +296,9 @@ static inline void btrfs_backref_free_node(struct btrfs_backref_cache *cache,
 					   struct btrfs_backref_node *node)
 {
 	if (node) {
+		ASSERT(list_empty(&node->list));
+		ASSERT(list_empty(&node->lower));
+		ASSERT(node->eb == NULL);
 		cache->nr_nodes--;
 		btrfs_put_root(node->root);
 		kfree(node);
@@ -340,11 +343,11 @@ static inline void btrfs_backref_drop_node_buffer(
 static inline void btrfs_backref_drop_node(struct btrfs_backref_cache *tree,
 					   struct btrfs_backref_node *node)
 {
-	BUG_ON(!list_empty(&node->upper));
+	ASSERT(list_empty(&node->upper));
 
 	btrfs_backref_drop_node_buffer(node);
-	list_del(&node->list);
-	list_del(&node->lower);
+	list_del_init(&node->list);
+	list_del_init(&node->lower);
 	if (!RB_EMPTY_NODE(&node->rb_node))
 		rb_erase(&node->rb_node, &tree->rb_root);
 	btrfs_backref_free_node(tree, node);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/13] btrfs: do not double free backref nodes on error
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (5 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 06/13] btrfs: add ASSERT()'s for deleting backref cache nodes Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2020-12-16 16:22 ` [PATCH 08/13] btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root Josef Bacik
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Zygo reported the following KASAN splat

BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
Read of size 8 at addr ffff888112402950 by task btrfs/28836

CPU: 0 PID: 28836 Comm: btrfs Tainted: G        W         5.10.0-e35f27394290-for-next+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Call Trace:
 dump_stack+0xbc/0xf9
 ? btrfs_backref_cleanup_node+0x18a/0x420
 print_address_description.constprop.8+0x21/0x210
 ? record_print_text.cold.34+0x11/0x11
 ? btrfs_backref_cleanup_node+0x18a/0x420
 ? btrfs_backref_cleanup_node+0x18a/0x420
 kasan_report.cold.10+0x20/0x37
 ? btrfs_backref_cleanup_node+0x18a/0x420
 __asan_load8+0x69/0x90
 btrfs_backref_cleanup_node+0x18a/0x420
 btrfs_backref_release_cache+0x83/0x1b0
 relocate_block_group+0x394/0x780
 ? merge_reloc_roots+0x4a0/0x4a0
 btrfs_relocate_block_group+0x26e/0x4c0
 btrfs_relocate_chunk+0x52/0x120
 btrfs_balance+0xe2e/0x1900
 ? check_flags.part.50+0x6c/0x1e0
 ? btrfs_relocate_chunk+0x120/0x120
 ? kmem_cache_alloc_trace+0xa06/0xcb0
 ? _copy_from_user+0x83/0xc0
 btrfs_ioctl_balance+0x3a7/0x460
 btrfs_ioctl+0x24c8/0x4360
 ? __kasan_check_read+0x11/0x20
 ? check_chain_key+0x1f4/0x2f0
 ? __asan_loadN+0xf/0x20
 ? btrfs_ioctl_get_supported_features+0x30/0x30
 ? kvm_sched_clock_read+0x18/0x30
 ? check_chain_key+0x1f4/0x2f0
 ? lock_downgrade+0x3f0/0x3f0
 ? handle_mm_fault+0xad6/0x2150
 ? do_vfs_ioctl+0xfc/0x9d0
 ? ioctl_file_clone+0xe0/0xe0
 ? check_flags.part.50+0x6c/0x1e0
 ? check_flags.part.50+0x6c/0x1e0
 ? check_flags+0x26/0x30
 ? lock_is_held_type+0xc3/0xf0
 ? syscall_enter_from_user_mode+0x1b/0x60
 ? do_syscall_64+0x13/0x80
 ? rcu_read_lock_sched_held+0xa1/0xd0
 ? __kasan_check_read+0x11/0x20
 ? __fget_light+0xae/0x110
 __x64_sys_ioctl+0xc3/0x100
 do_syscall_64+0x37/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4c4bdfe427

Allocated by task 28836:
 kasan_save_stack+0x21/0x50
 __kasan_kmalloc.constprop.18+0xbe/0xd0
 kasan_kmalloc+0x9/0x10
 kmem_cache_alloc_trace+0x410/0xcb0
 btrfs_backref_alloc_node+0x46/0xf0
 btrfs_backref_add_tree_node+0x60d/0x11d0
 build_backref_tree+0xc5/0x700
 relocate_tree_blocks+0x2be/0xb90
 relocate_block_group+0x2eb/0x780
 btrfs_relocate_block_group+0x26e/0x4c0
 btrfs_relocate_chunk+0x52/0x120
 btrfs_balance+0xe2e/0x1900
 btrfs_ioctl_balance+0x3a7/0x460
 btrfs_ioctl+0x24c8/0x4360
 __x64_sys_ioctl+0xc3/0x100
 do_syscall_64+0x37/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 28836:
 kasan_save_stack+0x21/0x50
 kasan_set_track+0x20/0x30
 kasan_set_free_info+0x1f/0x30
 __kasan_slab_free+0xf3/0x140
 kasan_slab_free+0xe/0x10
 kfree+0xde/0x200
 btrfs_backref_error_cleanup+0x452/0x530
 build_backref_tree+0x1a5/0x700
 relocate_tree_blocks+0x2be/0xb90
 relocate_block_group+0x2eb/0x780
 btrfs_relocate_block_group+0x26e/0x4c0
 btrfs_relocate_chunk+0x52/0x120
 btrfs_balance+0xe2e/0x1900
 btrfs_ioctl_balance+0x3a7/0x460
 btrfs_ioctl+0x24c8/0x4360
 __x64_sys_ioctl+0xc3/0x100
 do_syscall_64+0x37/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This occurred because we free'd our backref node in
btrfs_backref_error_cleanup(), but then tried to free it again in
btrfs_backref_release_cache().  This is because
btrfs_backref_release_cache() will cycle through all of the
cache->leaves nodes and free them up.  However
btrfs_backref_error_cleanup() free'd the backref node with
btrfs_backref_free_node(), which simply kfree()'d the backref node
without unlinking it from the cache.  Change this to a
btrfs_backref_drop_node(), which does the appropriate cleanup and
removes the node from the cache->leaves list, so when we go to free the
remaining cache we don't trip over items we've already dropped.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/backref.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index f0877d2883f9..3af38b09be43 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3117,7 +3117,7 @@ void btrfs_backref_error_cleanup(struct btrfs_backref_cache *cache,
 		list_del_init(&lower->list);
 		if (lower == node)
 			node = NULL;
-		btrfs_backref_free_node(cache, lower);
+		btrfs_backref_drop_node(cache, lower);
 	}
 
 	btrfs_backref_cleanup_node(cache, node);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/13] btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (6 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 07/13] btrfs: do not double free backref nodes on error Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2021-01-11 22:20   ` David Sterba
  2020-12-16 16:22 ` [PATCH 09/13] btrfs: modify the new_root highest_objectid under a ref count Josef Bacik
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

I hit a pretty ugly corruption when doing some error injection, it turns
out this is because we do not abort the transaction if we fail to adjust
the reference counts in btrfs_copy_root().  This makes us miss updating
the references for the new root that has already been allocated and
inserted, which is the problem.  Fix this by aborting the transaction if
we fail to do the reference modification.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 56e132d825a2..95d9bae764ab 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -221,9 +221,10 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
 		ret = btrfs_inc_ref(trans, root, cow, 1);
 	else
 		ret = btrfs_inc_ref(trans, root, cow, 0);
-
-	if (ret)
+	if (ret) {
+		btrfs_abort_transaction(trans, ret);
 		return ret;
+	}
 
 	btrfs_mark_buffer_dirty(cow);
 	*cow_ret = cow;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/13] btrfs: modify the new_root highest_objectid under a ref count
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (7 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 08/13] btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2020-12-18 11:18   ` Nikolay Borisov
  2020-12-16 16:22 ` [PATCH 10/13] btrfs: fix lockdep splat in btrfs_recover_relocation Josef Bacik
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Qu pointed out a bug in one of my error handling patches, which made me
notice that we modify the new_root->highest_objectid _after_ we've
dropped the ref to the new_root.  This could lead to a possible UAF, fix
this by modifying the ->highest_objectid before we drop our reference to
the new_root.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ioctl.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index dde49a791f3e..af8d01659562 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -717,6 +717,12 @@ static noinline int create_subvol(struct inode *dir,
 	btrfs_record_root_in_trans(trans, new_root);
 
 	ret = btrfs_create_subvol_root(trans, new_root, root, new_dirid);
+	if (!ret) {
+		mutex_lock(&new_root->objectid_mutex);
+		new_root->highest_objectid = new_dirid;
+		mutex_unlock(&new_root->objectid_mutex);
+	}
+
 	btrfs_put_root(new_root);
 	if (ret) {
 		/* We potentially lose an unused inode item here */
@@ -724,10 +730,6 @@ static noinline int create_subvol(struct inode *dir,
 		goto fail;
 	}
 
-	mutex_lock(&new_root->objectid_mutex);
-	new_root->highest_objectid = new_dirid;
-	mutex_unlock(&new_root->objectid_mutex);
-
 	/*
 	 * insert the directory item
 	 */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/13] btrfs: fix lockdep splat in btrfs_recover_relocation
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (8 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 09/13] btrfs: modify the new_root highest_objectid under a ref count Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2020-12-16 16:22 ` [PATCH 11/13] btrfs: keep track of the root owner for relocation reads Josef Bacik
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo, Johannes Thumshirn

While testing the error paths of relocation I hit the following lockdep
splat

======================================================
WARNING: possible circular locking dependency detected
5.10.0-rc6+ #217 Not tainted
------------------------------------------------------
mount/779 is trying to acquire lock:
ffffa0e676945418 (&fs_info->balance_mutex){+.+.}-{3:3}, at: btrfs_recover_balance+0x2f0/0x340

but task is already holding lock:
ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (btrfs-root-00){++++}-{3:3}:
       down_read_nested+0x43/0x130
       __btrfs_tree_read_lock+0x27/0x100
       btrfs_read_lock_root_node+0x31/0x40
       btrfs_search_slot+0x462/0x8f0
       btrfs_update_root+0x55/0x2b0
       btrfs_drop_snapshot+0x398/0x750
       clean_dirty_subvols+0xdf/0x120
       btrfs_recover_relocation+0x534/0x5a0
       btrfs_start_pre_rw_mount+0xcb/0x170
       open_ctree+0x151f/0x1726
       btrfs_mount_root.cold+0x12/0xea
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       vfs_kern_mount.part.0+0x71/0xb0
       btrfs_mount+0x10d/0x380
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       path_mount+0x433/0xc10
       __x64_sys_mount+0xe3/0x120
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (sb_internal#2){.+.+}-{0:0}:
       start_transaction+0x444/0x700
       insert_balance_item.isra.0+0x37/0x320
       btrfs_balance+0x354/0xf40
       btrfs_ioctl_balance+0x2cf/0x380
       __x64_sys_ioctl+0x83/0xb0
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&fs_info->balance_mutex){+.+.}-{3:3}:
       __lock_acquire+0x1120/0x1e10
       lock_acquire+0x116/0x370
       __mutex_lock+0x7e/0x7b0
       btrfs_recover_balance+0x2f0/0x340
       open_ctree+0x1095/0x1726
       btrfs_mount_root.cold+0x12/0xea
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       vfs_kern_mount.part.0+0x71/0xb0
       btrfs_mount+0x10d/0x380
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       path_mount+0x433/0xc10
       __x64_sys_mount+0xe3/0x120
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

Chain exists of:
  &fs_info->balance_mutex --> sb_internal#2 --> btrfs-root-00

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(btrfs-root-00);
                               lock(sb_internal#2);
                               lock(btrfs-root-00);
  lock(&fs_info->balance_mutex);

 *** DEADLOCK ***

2 locks held by mount/779:
 #0: ffffa0e60dc040e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380
 #1: ffffa0e60ee31da8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x27/0x100

stack backtrace:
CPU: 0 PID: 779 Comm: mount Not tainted 5.10.0-rc6+ #217
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb0
 check_noncircular+0xcf/0xf0
 ? trace_call_bpf+0x139/0x260
 __lock_acquire+0x1120/0x1e10
 lock_acquire+0x116/0x370
 ? btrfs_recover_balance+0x2f0/0x340
 __mutex_lock+0x7e/0x7b0
 ? btrfs_recover_balance+0x2f0/0x340
 ? btrfs_recover_balance+0x2f0/0x340
 ? rcu_read_lock_sched_held+0x3f/0x80
 ? kmem_cache_alloc_trace+0x2c4/0x2f0
 ? btrfs_get_64+0x5e/0x100
 btrfs_recover_balance+0x2f0/0x340
 open_ctree+0x1095/0x1726
 btrfs_mount_root.cold+0x12/0xea
 ? rcu_read_lock_sched_held+0x3f/0x80
 legacy_get_tree+0x30/0x50
 vfs_get_tree+0x28/0xc0
 vfs_kern_mount.part.0+0x71/0xb0
 btrfs_mount+0x10d/0x380
 ? __kmalloc_track_caller+0x2f2/0x320
 legacy_get_tree+0x30/0x50
 vfs_get_tree+0x28/0xc0
 ? capable+0x3a/0x60
 path_mount+0x433/0xc10
 __x64_sys_mount+0xe3/0x120
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is thankfully straightforward to fix, simply release the path
before we setup the reloc_ctl.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7930e1c78c45..49ba941f0314 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4318,6 +4318,8 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
 		btrfs_warn(fs_info,
 	"balance: cannot set exclusive op status, resume manually");
 
+	btrfs_release_path(path);
+
 	mutex_lock(&fs_info->balance_mutex);
 	BUG_ON(fs_info->balance_ctl);
 	spin_lock(&fs_info->balance_lock);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 11/13] btrfs: keep track of the root owner for relocation reads
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (9 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 10/13] btrfs: fix lockdep splat in btrfs_recover_relocation Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2021-01-11 22:23   ` David Sterba
  2020-12-16 16:22 ` [PATCH 12/13] btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node Josef Bacik
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While testing the error paths in relocation, I hit the following lockdep
splat

======================================================
WARNING: possible circular locking dependency detected
5.10.0-rc3+ #206 Not tainted
------------------------------------------------------
btrfs-balance/1571 is trying to acquire lock:
ffff8cdbcc8f77d0 (&head_ref->mutex){+.+.}-{3:3}, at: btrfs_lookup_extent_info+0x156/0x3b0

but task is already holding lock:
ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (btrfs-tree-00){++++}-{3:3}:
       down_write_nested+0x43/0x80
       __btrfs_tree_lock+0x27/0x100
       btrfs_search_slot+0x248/0x890
       relocate_tree_blocks+0x490/0x650
       relocate_block_group+0x1ba/0x5d0
       kretprobe_trampoline+0x0/0x50

-> #1 (btrfs-csum-01){++++}-{3:3}:
       down_read_nested+0x43/0x130
       __btrfs_tree_read_lock+0x27/0x100
       btrfs_read_lock_root_node+0x31/0x40
       btrfs_search_slot+0x5ab/0x890
       btrfs_del_csums+0x10b/0x3c0
       __btrfs_free_extent+0x49d/0x8e0
       __btrfs_run_delayed_refs+0x283/0x11f0
       btrfs_run_delayed_refs+0x86/0x220
       btrfs_start_dirty_block_groups+0x2ba/0x520
       kretprobe_trampoline+0x0/0x50

-> #0 (&head_ref->mutex){+.+.}-{3:3}:
       __lock_acquire+0x1167/0x2150
       lock_acquire+0x116/0x3e0
       __mutex_lock+0x7e/0x7b0
       btrfs_lookup_extent_info+0x156/0x3b0
       walk_down_proc+0x1c3/0x280
       walk_down_tree+0x64/0xe0
       btrfs_drop_subtree+0x182/0x260
       do_relocation+0x52e/0x660
       relocate_tree_blocks+0x2ae/0x650
       relocate_block_group+0x1ba/0x5d0
       kretprobe_trampoline+0x0/0x50

other info that might help us debug this:

Chain exists of:
  &head_ref->mutex --> btrfs-csum-01 --> btrfs-tree-00

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(btrfs-tree-00);
                               lock(btrfs-csum-01);
                               lock(btrfs-tree-00);
  lock(&head_ref->mutex);

 *** DEADLOCK ***

5 locks held by btrfs-balance/1571:
 #0: ffff8cdb89749ff8 (&fs_info->delete_unused_bgs_mutex){+.+.}-{3:3}, at: btrfs_balance+0x563/0xf40
 #1: ffff8cdb89748838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x156/0x300
 #2: ffff8cdbc2c16650 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x413/0x5c0
 #3: ffff8cdbc135f538 (btrfs-treloc-01){+.+.}-{3:3}, at: __btrfs_tree_lock+0x27/0x100
 #4: ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100

stack backtrace:
CPU: 1 PID: 1571 Comm: btrfs-balance Not tainted 5.10.0-rc3+ #206
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb0
 check_noncircular+0xcf/0xf0
 ? trace_call_bpf+0x139/0x260
 __lock_acquire+0x1167/0x2150
 lock_acquire+0x116/0x3e0
 ? btrfs_lookup_extent_info+0x156/0x3b0
 __mutex_lock+0x7e/0x7b0
 ? btrfs_lookup_extent_info+0x156/0x3b0
 ? btrfs_lookup_extent_info+0x156/0x3b0
 ? release_extent_buffer+0x124/0x170
 ? _raw_spin_unlock+0x1f/0x30
 ? release_extent_buffer+0x124/0x170
 btrfs_lookup_extent_info+0x156/0x3b0
 walk_down_proc+0x1c3/0x280
 walk_down_tree+0x64/0xe0
 btrfs_drop_subtree+0x182/0x260
 do_relocation+0x52e/0x660
 relocate_tree_blocks+0x2ae/0x650
 ? add_tree_block+0x149/0x1b0
 relocate_block_group+0x1ba/0x5d0
 elfcorehdr_read+0x40/0x40
 ? elfcorehdr_read+0x40/0x40
 ? btrfs_balance+0x796/0xf40
 ? __kthread_parkme+0x66/0x90
 ? btrfs_balance+0xf40/0xf40
 ? balance_kthread+0x37/0x50
 ? kthread+0x137/0x150
 ? __kthread_bind_mask+0x60/0x60
 ? ret_from_fork+0x1f/0x30

As you can see this is bogus, we never take another tree's lock under
the csum lock.  This happens because sometimes we have to read tree
blocks from disk without knowing which root they belong to during
relocation.  We defaulted to an owner of 0, which translates to an fs
tree.  This is fine as all fs trees have the same class, but obviously
isn't fine if the block belongs to a cow only tree.

Thankfully cow only trees only have their owners root as a reference to
them, and since we already look up the extent information during
relocation, go ahead and check and see if this block might belong to a
cow only tree, and if so save the owner in the struct tree_block.  This
allows us to read_tree_block with the proper owner, which gets rid of
this lockdep splat.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 47 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 073666fb7b57..4dd9f220904a 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -98,6 +98,7 @@ struct tree_block {
 		u64 bytenr;
 	}; /* Use rb_simple_node for search/insert */
 	struct btrfs_key key;
+	u64 owner;
 	unsigned int level:8;
 	unsigned int key_ready:1;
 };
@@ -2391,8 +2392,8 @@ static int get_tree_block_key(struct btrfs_fs_info *fs_info,
 {
 	struct extent_buffer *eb;
 
-	eb = read_tree_block(fs_info, block->bytenr, 0, block->key.offset,
-			     block->level, NULL);
+	eb = read_tree_block(fs_info, block->bytenr, block->owner,
+			     block->key.offset, block->level, NULL);
 	if (IS_ERR(eb)) {
 		return PTR_ERR(eb);
 	} else if (!extent_buffer_uptodate(eb)) {
@@ -2491,7 +2492,8 @@ int relocate_tree_blocks(struct btrfs_trans_handle *trans,
 	/* Kick in readahead for tree blocks with missing keys */
 	rbtree_postorder_for_each_entry_safe(block, next, blocks, rb_node) {
 		if (!block->key_ready)
-			btrfs_readahead_tree_block(fs_info, block->bytenr, 0, 0,
+			btrfs_readahead_tree_block(fs_info, block->bytenr,
+						   block->owner, 0,
 						   block->level);
 	}
 
@@ -2799,21 +2801,59 @@ static int add_tree_block(struct reloc_control *rc,
 	u32 item_size;
 	int level = -1;
 	u64 generation;
+	u64 owner = 0;
 
 	eb =  path->nodes[0];
 	item_size = btrfs_item_size_nr(eb, path->slots[0]);
 
 	if (extent_key->type == BTRFS_METADATA_ITEM_KEY ||
 	    item_size >= sizeof(*ei) + sizeof(*bi)) {
+		unsigned long ptr = 0, end;
+
 		ei = btrfs_item_ptr(eb, path->slots[0],
 				struct btrfs_extent_item);
+		end = (unsigned long)ei + item_size;
 		if (extent_key->type == BTRFS_EXTENT_ITEM_KEY) {
 			bi = (struct btrfs_tree_block_info *)(ei + 1);
 			level = btrfs_tree_block_level(eb, bi);
+			ptr = (unsigned long)(bi + 1);
 		} else {
 			level = (int)extent_key->offset;
+			ptr = (unsigned long)(ei + 1);
 		}
 		generation = btrfs_extent_generation(eb, ei);
+
+		/*
+		 * We're reading random blocks without knowing their owner ahead
+		 * of time.  This is ok most of the time, as all reloc roots and
+		 * fs roots have the same lock type.  However normal trees do
+		 * not, and the only way to know ahead of time is to read the
+		 * inline ref offset.  We know it's an fs root if
+		 *
+		 * 1. There's more than one ref.
+		 * 2. There's a SHARED_DATA_REF_KEY set.
+		 * 3. FULL_BACKREF is set on the flags.
+		 *
+		 * Otherwise it's safe to assume that the ref offset == the
+		 * owner of this block, so we can use that when calling
+		 * read_tree_block.
+		 */
+		if (btrfs_extent_refs(eb, ei) == 1 &&
+		    !(btrfs_extent_flags(eb, ei) &
+		      BTRFS_BLOCK_FLAG_FULL_BACKREF) &&
+		    ptr < end) {
+			struct btrfs_extent_inline_ref *iref;
+			int type;
+
+			iref = (struct btrfs_extent_inline_ref *)ptr;
+			type = btrfs_get_extent_inline_ref_type(eb, iref,
+							BTRFS_REF_TYPE_BLOCK);
+			if (type == BTRFS_REF_TYPE_INVALID)
+				return -EINVAL;
+			if (type == BTRFS_TREE_BLOCK_REF_KEY)
+				owner = btrfs_extent_inline_ref_offset(eb,
+								       iref);
+		}
 	} else if (unlikely(item_size == sizeof(struct btrfs_extent_item_v0))) {
 		btrfs_print_v0_err(eb->fs_info);
 		btrfs_handle_fs_error(eb->fs_info, -EINVAL, NULL);
@@ -2835,6 +2875,7 @@ static int add_tree_block(struct reloc_control *rc,
 	block->key.offset = generation;
 	block->level = level;
 	block->key_ready = 0;
+	block->owner = owner;
 
 	rb_node = rb_simple_insert(blocks, block->bytenr, &block->rb_node);
 	if (rb_node)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 12/13] btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (10 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 11/13] btrfs: keep track of the root owner for relocation reads Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2020-12-16 16:22 ` [PATCH 13/13] btrfs: don't clear ret in btrfs_start_dirty_block_groups Josef Bacik
  2021-01-08 16:44 ` [PATCH 00/13] Serious fixes for different error paths David Sterba
  13 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo

Zygo reported the following panic when testing my error handling patches
for relocation

------------[ cut here ]------------
kernel BUG at fs/btrfs/backref.c:2545!
invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 3 PID: 8472 Comm: btrfs Tainted: G        W 14
Hardware name: QEMU Standard PC (i440FX + PIIX,

Call Trace:
 btrfs_backref_error_cleanup+0x4df/0x530
 build_backref_tree+0x1a5/0x700
 ? _raw_spin_unlock+0x22/0x30
 ? release_extent_buffer+0x225/0x280
 ? free_extent_buffer.part.52+0xd7/0x140
 relocate_tree_blocks+0x2a6/0xb60
 ? kasan_unpoison_shadow+0x35/0x50
 ? do_relocation+0xc10/0xc10
 ? kasan_kmalloc+0x9/0x10
 ? kmem_cache_alloc_trace+0x6a3/0xcb0
 ? free_extent_buffer.part.52+0xd7/0x140
 ? rb_insert_color+0x342/0x360
 ? add_tree_block.isra.36+0x236/0x2b0
 relocate_block_group+0x2eb/0x780
 ? merge_reloc_roots+0x470/0x470
 btrfs_relocate_block_group+0x26e/0x4c0
 btrfs_relocate_chunk+0x52/0x120
 btrfs_balance+0xe2e/0x18f0
 ? pvclock_clocksource_read+0xeb/0x190
 ? btrfs_relocate_chunk+0x120/0x120
 ? lock_contended+0x620/0x6e0
 ? do_raw_spin_lock+0x1e0/0x1e0
 ? do_raw_spin_unlock+0xa8/0x140
 btrfs_ioctl_balance+0x1f9/0x460
 btrfs_ioctl+0x24c8/0x4380
 ? __kasan_check_read+0x11/0x20
 ? check_chain_key+0x1f4/0x2f0
 ? __asan_loadN+0xf/0x20
 ? btrfs_ioctl_get_supported_features+0x30/0x30
 ? kvm_sched_clock_read+0x18/0x30
 ? check_chain_key+0x1f4/0x2f0
 ? lock_downgrade+0x3f0/0x3f0
 ? handle_mm_fault+0xad6/0x2150
 ? do_vfs_ioctl+0xfc/0x9d0
 ? ioctl_file_clone+0xe0/0xe0
 ? check_flags.part.50+0x6c/0x1e0
 ? check_flags.part.50+0x6c/0x1e0
 ? check_flags+0x26/0x30
 ? lock_is_held_type+0xc3/0xf0
 ? syscall_enter_from_user_mode+0x1b/0x60
 ? do_syscall_64+0x13/0x80
 ? rcu_read_lock_sched_held+0xa1/0xd0
 ? __kasan_check_read+0x11/0x20
 ? __fget_light+0xae/0x110
 __x64_sys_ioctl+0xc3/0x100
 do_syscall_64+0x37/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This occurs because of this check

if (RB_EMPTY_NODE(&upper->rb_node))
	BUG_ON(!list_empty(&node->upper));

As we are dropping the backref node, if we discover that our upper node
in the edge we just cleaned up isn't linked into the cache that we are
now done with this node, thus the BUG_ON().

However this is an erroneous assumption, as we will look up all the
references for a node first, and then process the pending edges.  All of
the 'upper' nodes in our pending edges won't be in the cache's rb_tree
yet, because they haven't been processed.  We could very well have many
edges still left to cleanup on this node.

The fact is we simply do not need this check, we can just process all of
the edges only for this node, because below this check we do the
following

if (list_empty(&upper->lower)) {
	list_add_tail(&upper->lower, &cache->leaves);
	upper->lowest = 1;
}

If the upper node truly isn't used yet, then we add it to the
cache->leaves list to be cleaned up later.  If it is still used then the
last child node that has it linked into its node will add it to the
leaves list and then it will be cleaned up.

Fix this problem by dropping this logic altogether.  With this fix I no
longer see the panic when testing with error injection in the backref
code.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/backref.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 3af38b09be43..7ac59a568595 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -2541,13 +2541,6 @@ void btrfs_backref_cleanup_node(struct btrfs_backref_cache *cache,
 		list_del(&edge->list[UPPER]);
 		btrfs_backref_free_edge(cache, edge);
 
-		if (RB_EMPTY_NODE(&upper->rb_node)) {
-			BUG_ON(!list_empty(&node->upper));
-			btrfs_backref_drop_node(cache, node);
-			node = upper;
-			node->lowest = 1;
-			continue;
-		}
 		/*
 		 * Add the node to leaf node list if no other child block
 		 * cached.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 13/13] btrfs: don't clear ret in btrfs_start_dirty_block_groups
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (11 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 12/13] btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node Josef Bacik
@ 2020-12-16 16:22 ` Josef Bacik
  2021-01-08 16:44 ` [PATCH 00/13] Serious fixes for different error paths David Sterba
  13 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2020-12-16 16:22 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Qu Wenruo, Johannes Thumshirn

If we fail to update a block group item in the loop we'll break, however
we'll do btrfs_run_delayed_refs and lose our error value in ret, and
thus not clean up properly.  Fix this by only running the delayed refs
if there was no failure.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 69f8a306d70d..5cfa52b1a3b8 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2669,7 +2669,8 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 	 * Go through delayed refs for all the stuff we've just kicked off
 	 * and then loop back (just once)
 	 */
-	ret = btrfs_run_delayed_refs(trans, 0);
+	if (!ret)
+		ret = btrfs_run_delayed_refs(trans, 0);
 	if (!ret && loops == 0) {
 		loops++;
 		spin_lock(&cur_trans->dirty_bgs_lock);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 02/13] btrfs: initialize test inodes location
  2020-12-16 16:22 ` [PATCH 02/13] btrfs: initialize test inodes location Josef Bacik
@ 2020-12-18 10:30   ` Nikolay Borisov
  2020-12-21 16:58     ` David Sterba
  0 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2020-12-18 10:30 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 16.12.20 г. 18:22 ч., Josef Bacik wrote:
> While testing other things I was noticing that sometimes my VM would
> fail to load the btrfs module because the self test failed like this
> 
> BTRFS: selftest: fs/btrfs/tests/inode-tests.c:963 miscount, wanted 1, got 0
> 
> This turned out to be because sometimes the btrfs ino would be the btree
> inode number, and thus we'd skip calling the set extent delalloc bit
> helper, and thus not adjust ->outstanding_extents.  Fix this by making
> sure we init test inodes with a valid inode number so that we don't get
> random failures during self tests.

This warrants slightly more explanation why this initialization is
required, namely that newly allocated indoes are initialized by passing
a set callback, since we are acquiring inodes for tests we need to
simulate this behavior by, effectively, open coding
btrfs_init_locked_inode.

<snip>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 09/13] btrfs: modify the new_root highest_objectid under a ref count
  2020-12-16 16:22 ` [PATCH 09/13] btrfs: modify the new_root highest_objectid under a ref count Josef Bacik
@ 2020-12-18 11:18   ` Nikolay Borisov
  0 siblings, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2020-12-18 11:18 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 16.12.20 г. 18:22 ч., Josef Bacik wrote:
> Qu pointed out a bug in one of my error handling patches, which made me
> notice that we modify the new_root->highest_objectid _after_ we've
> dropped the ref to the new_root.  This could lead to a possible UAF, fix
> this by modifying the ->highest_objectid before we drop our reference to
> the new_root.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

This patch must be dropped as per
https://github.com/kdave/btrfs-devel/commit/6044ce89a75a07a80f28f4bfeddeefc1b772366a

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 02/13] btrfs: initialize test inodes location
  2020-12-18 10:30   ` Nikolay Borisov
@ 2020-12-21 16:58     ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2020-12-21 16:58 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Fri, Dec 18, 2020 at 12:30:13PM +0200, Nikolay Borisov wrote:
> 
> 
> On 16.12.20 г. 18:22 ч., Josef Bacik wrote:
> > While testing other things I was noticing that sometimes my VM would
> > fail to load the btrfs module because the self test failed like this
> > 
> > BTRFS: selftest: fs/btrfs/tests/inode-tests.c:963 miscount, wanted 1, got 0
> > 
> > This turned out to be because sometimes the btrfs ino would be the btree
> > inode number, and thus we'd skip calling the set extent delalloc bit
> > helper, and thus not adjust ->outstanding_extents.  Fix this by making
> > sure we init test inodes with a valid inode number so that we don't get
> > random failures during self tests.
> 
> This warrants slightly more explanation why this initialization is
> required, namely that newly allocated indoes are initialized by passing
> a set callback, since we are acquiring inodes for tests we need to
> simulate this behavior by, effectively, open coding
> btrfs_init_locked_inode.

Makes sense, thanks. I've updated the changelog, the patch was sent
independently and has been added to misc-next already.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/13] Serious fixes for different error paths
  2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
                   ` (12 preceding siblings ...)
  2020-12-16 16:22 ` [PATCH 13/13] btrfs: don't clear ret in btrfs_start_dirty_block_groups Josef Bacik
@ 2021-01-08 16:44 ` David Sterba
  2021-01-14 18:20   ` David Sterba
  13 siblings, 1 reply; 26+ messages in thread
From: David Sterba @ 2021-01-08 16:44 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Dec 16, 2020 at 11:22:04AM -0500, Josef Bacik wrote:
> Hello,
> 
> A lot of these were in previous versions of the relocation error handling
> patches.  I added a few since the last go around.  All of these do not rely on
> the error handling patches, and some of them are quite important otherwise we
> get corruption if we get errors in certain spots.  There's also a few lockdep
> fixes and such.  These really need to go in ASAP, regardless of when the
> relocation error handling patches are merged.  They're mostly small and self
> contained, the only "big" one being the one that tracks the root owner for
> relocation reads, which is to resolve the remaining class of lockdep errors we
> get because of an improper lockdep class set on the extent buffer.  Thanks,
> 
> Josef
> 
> Josef Bacik (13):
>   btrfs: don't get an EINTR during drop_snapshot for reloc
>   btrfs: initialize test inodes location
>   btrfs: fix reloc root leak with 0 ref reloc roots on recovery
>   btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
>   btrfs: do not WARN_ON() if we can't find the reloc root
>   btrfs: add ASSERT()'s for deleting backref cache nodes
>   btrfs: do not double free backref nodes on error
>   btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
>   btrfs: modify the new_root highest_objectid under a ref count
>   btrfs: fix lockdep splat in btrfs_recover_relocation
>   btrfs: keep track of the root owner for relocation reads
>   btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
>   btrfs: don't clear ret in btrfs_start_dirty_block_groups

Patch 2 has been merged, patch 9 dropped due to the inode number
cleanups in "btrfs: make btrfs_root::free_objectid hold the next
available objectid". Most of them are reviewed, I'll add the series to
for-next and continue merging to misc-next and then to -rc eventually.
Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 01/13] btrfs: don't get an EINTR during drop_snapshot for reloc
  2020-12-16 16:22 ` [PATCH 01/13] btrfs: don't get an EINTR during drop_snapshot for reloc Josef Bacik
@ 2021-01-11 21:56   ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2021-01-11 21:56 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Dec 16, 2020 at 11:22:05AM -0500, Josef Bacik wrote:
> This was partially fixed by f3e3d9cc35252, however it missed a spot when

Please use full commit reference like f3e3d9cc3525 ("btrfs: avoid
possible signal interruption of btrfs_drop_snapshot() on relocation
tree")

> we restart a trans handle because we need to end the transaction.  The
> fix is the same, simply use btrfs_join_transaction() instead of
> btrfs_start_transaction() when deleting reloc roots.
> 

Added Fixes: f3e3d9cc3525 ("btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on relocation tree")

> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/extent-tree.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index d79b8369e6aa..08c664d04824 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5549,7 +5549,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>  				goto out_free;
>  			}
>  
> -			trans = btrfs_start_transaction(tree_root, 0);

Added same comment as in f3e3d9cc3525

> +			if (for_reloc)
> +				trans = btrfs_join_transaction(tree_root);
> +			else
> +				trans = btrfs_start_transaction(tree_root, 0);
>  			if (IS_ERR(trans)) {
>  				err = PTR_ERR(trans);
>  				goto out_free;
> -- 
> 2.26.2

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 04/13] btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
  2020-12-16 16:22 ` [PATCH 04/13] btrfs: splice remaining dirty_bg's onto the transaction dirty bg list Josef Bacik
@ 2021-01-11 22:09   ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2021-01-11 22:09 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, Qu Wenruo

On Wed, Dec 16, 2020 at 11:22:08AM -0500, Josef Bacik wrote:
> While doing error injection testing with my relocation patches I hit the
> following ASSERT()
> 
> assertion failed: list_empty(&block_group->dirty_list), in fs/btrfs/block-group.c:3356
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/ctree.h:3357!
> invalid opcode: 0000 [#1] SMP NOPTI
> CPU: 0 PID: 24351 Comm: umount Tainted: G        W         5.10.0-rc3+ #193
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> RIP: 0010:assertfail.constprop.0+0x18/0x1a
> RSP: 0018:ffffa09b019c7e00 EFLAGS: 00010282
> RAX: 0000000000000056 RBX: ffff8f6492c18000 RCX: 0000000000000000
> RDX: ffff8f64fbc27c60 RSI: ffff8f64fbc19050 RDI: ffff8f64fbc19050
> RBP: ffff8f6483bbdc00 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffa09b019c7c38 R11: ffffffff85d70928 R12: ffff8f6492c18100
> R13: ffff8f6492c18148 R14: ffff8f6483bbdd70 R15: dead000000000100
> FS:  00007fbfda4cdc40(0000) GS:ffff8f64fbc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fbfda666fd0 CR3: 000000013cf66002 CR4: 0000000000370ef0
> Call Trace:
>  btrfs_free_block_groups.cold+0x55/0x55
>  close_ctree+0x2c5/0x306
>  ? fsnotify_destroy_marks+0x14/0x100
>  generic_shutdown_super+0x6c/0x100
>  kill_anon_super+0x14/0x30
>  btrfs_kill_super+0x12/0x20
>  deactivate_locked_super+0x36/0xa0
>  cleanup_mnt+0x12d/0x190
>  task_work_run+0x5c/0xa0
>  exit_to_user_mode_prepare+0x1b1/0x1d0
>  syscall_exit_to_user_mode+0x54/0x280
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> This happened because I injected an error in btrfs_cow_block() while
> running the dirty block groups.  When we run the dirty block groups, we
> splice the list onto a local list to process.  However if an error
> occurs, we only cleanup the transactions dirty block group list, not any
> pending block groups we have on our locally spliced list.  Fix this by
> splicing the list back onto the transactions dirty block group list, so
> any remaining block groups are cleaned up.
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/block-group.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 52f2198d44c9..69f8a306d70d 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -2684,6 +2684,9 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
>  		}
>  		spin_unlock(&cur_trans->dirty_bgs_lock);
>  	} else if (ret < 0) {
> +		spin_lock(&cur_trans->dirty_bgs_lock);
> +		list_splice_init(&dirty, &cur_trans->dirty_bgs);
> +		spin_unlock(&cur_trans->dirty_bgs_lock);
>  		btrfs_cleanup_dirty_bgs(cur_trans, fs_info);
>  	}

There seem to be another error path that should un-splice the remaining
block groups:

2554         spin_lock(&cur_trans->dirty_bgs_lock);
2555         if (list_empty(&cur_trans->dirty_bgs)) {
2556                 spin_unlock(&cur_trans->dirty_bgs_lock);
2557                 return 0;
2558         }
2559         list_splice_init(&cur_trans->dirty_bgs, &dirty);
2560         spin_unlock(&cur_trans->dirty_bgs_lock);
2561
2562 again:
2563         /* Make sure all the block groups on our dirty list actually exist */
2564         btrfs_create_pending_block_groups(trans);
2565
2566         if (!path) {
2567                 path = btrfs_alloc_path();
2568                 if (!path)
2569                         return -ENOMEM;
2570         }

Initially the splice happens on line 2559. First error is on the !path
condition on line 2568, so that would need to be un-spliced.

On the 2nd iteration (looop == 1) we can't hit the error as path was
initialized before and it's only reused.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 05/13] btrfs: do not WARN_ON() if we can't find the reloc root
  2020-12-16 16:22 ` [PATCH 05/13] btrfs: do not WARN_ON() if we can't find the reloc root Josef Bacik
@ 2021-01-11 22:14   ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2021-01-11 22:14 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, Zygo Blaxell

On Wed, Dec 16, 2020 at 11:22:09AM -0500, Josef Bacik wrote:
> Any number of things could have gone wrong, like ENOMEM or EIO, so don't
> WARN_ON() if we're unable to find the reloc root in the backref code.

Where does the ENOMEM or EIO happen? The return value from
find_reloc_root is just a pointer and besides the rbtree lookup, there's
nothing else.

Removing the warning makes sense, or at least replacing with a warning
if that would bring some value, but otherwise it's handled and can stay
silent.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 06/13] btrfs: add ASSERT()'s for deleting backref cache nodes
  2020-12-16 16:22 ` [PATCH 06/13] btrfs: add ASSERT()'s for deleting backref cache nodes Josef Bacik
@ 2021-01-11 22:18   ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2021-01-11 22:18 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

Regarding the subject, you can write plain 'assert', no need to spell
it exactly as it's in the code.

On Wed, Dec 16, 2020 at 11:22:10AM -0500, Josef Bacik wrote:
> A weird KASAN problem that Zygo reported

Please add the relevant part of the report to the changelog and/or link
to the report itself.

> could have been easily caught
> if we checked for basic things in our backref freeing code.  We have two
> methods of freeing a backref node
> 
> - btrfs_backref_free_node: this just is kfree() essentially.
> - btrfs_backref_drop_node: this actually unlinks the node and cleans up
>   everything and then calls btrfs_backref_free_node().
> 
> We should mostly be using btrfs_backref_drop_node(), to make sure the
> node is properly unlinked from the backref cache, and only use
> btrfs_backref_free_node() when we know the node isn't actually linked to
> the backref cache.  We made a mistake here and thus got the KASAN splat.
> Make this style of issue easier to find by adding some ASSERT()'s to
> btrfs_backref_free_node() and adjusting our deletion stuff to properly
> init the list so we can rely on list_empty() checks working properly.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 08/13] btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
  2020-12-16 16:22 ` [PATCH 08/13] btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root Josef Bacik
@ 2021-01-11 22:20   ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2021-01-11 22:20 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Dec 16, 2020 at 11:22:12AM -0500, Josef Bacik wrote:
> I hit a pretty ugly corruption when doing some error injection, it turns

Maybe it's useful to have the stack trace even for injected errors, it
gives some perspective about the context and call chain.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/13] btrfs: keep track of the root owner for relocation reads
  2020-12-16 16:22 ` [PATCH 11/13] btrfs: keep track of the root owner for relocation reads Josef Bacik
@ 2021-01-11 22:23   ` David Sterba
  2021-01-14 18:03     ` David Sterba
  0 siblings, 1 reply; 26+ messages in thread
From: David Sterba @ 2021-01-11 22:23 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Dec 16, 2020 at 11:22:15AM -0500, Josef Bacik wrote:
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -98,6 +98,7 @@ struct tree_block {
>  		u64 bytenr;
>  	}; /* Use rb_simple_node for search/insert */
>  	struct btrfs_key key;
> +	u64 owner;
>  	unsigned int level:8;
>  	unsigned int key_ready:1;

This would probably lead to bad packing, key is 17 bytes and placing
u64 after that adds 7 bytes for proper alignment. The bitfield members
following the key are aligned to a byte so it would work if owner is
before key.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 11/13] btrfs: keep track of the root owner for relocation reads
  2021-01-11 22:23   ` David Sterba
@ 2021-01-14 18:03     ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2021-01-14 18:03 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team

On Mon, Jan 11, 2021 at 11:23:15PM +0100, David Sterba wrote:
> On Wed, Dec 16, 2020 at 11:22:15AM -0500, Josef Bacik wrote:
> > --- a/fs/btrfs/relocation.c
> > +++ b/fs/btrfs/relocation.c
> > @@ -98,6 +98,7 @@ struct tree_block {
> >  		u64 bytenr;
> >  	}; /* Use rb_simple_node for search/insert */
> >  	struct btrfs_key key;
> > +	u64 owner;
> >  	unsigned int level:8;
> >  	unsigned int key_ready:1;
> 
> This would probably lead to bad packing, key is 17 bytes and placing
> u64 after that adds 7 bytes for proper alignment. The bitfield members
> following the key are aligned to a byte so it would work if owner is
> before key.

Easy fix and size 64 is also more cache friendly.

@@ -12256,22 +12256,18 @@ struct tree_block {
                struct rb_node     rb_node __attribute__((__aligned__(8))); /*     0    24 */
                u64                bytenr;               /*    24     8 */
        } __attribute__((__aligned__(8))) __attribute__((__aligned__(8)));               /*     0    32 */
-       struct btrfs_key           key;                  /*    32    17 */
+       u64                        owner;                /*    32     8 */
+       struct btrfs_key           key;                  /*    40    17 */

-       /* XXX 7 bytes hole, try to pack */
+       /* Bitfield combined with next fields */

-       u64                        owner;                /*    56     8 */
-       /* --- cacheline 1 boundary (64 bytes) --- */
-       unsigned int               level:8;              /*    64: 0  4 */
-       unsigned int               key_ready:1;          /*    64: 8  4 */
+       unsigned int               level:8;              /*    56: 8  4 */
+       unsigned int               key_ready:1;          /*    56:16  4 */

-       /* size: 72, cachelines: 2, members: 5 */
-       /* sum members: 57, holes: 1, sum holes: 7 */
-       /* sum bitfield members: 9 bits (1 bytes) */
+       /* size: 64, cachelines: 1, members: 5 */
        /* padding: 4 */
-       /* bit_padding: 23 bits */
+       /* bit_padding: 15 bits */
        /* forced alignments: 1 */
-       /* last cacheline: 8 bytes */
 } __attribute__((__aligned__(8)));

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/13] Serious fixes for different error paths
  2021-01-08 16:44 ` [PATCH 00/13] Serious fixes for different error paths David Sterba
@ 2021-01-14 18:20   ` David Sterba
  0 siblings, 0 replies; 26+ messages in thread
From: David Sterba @ 2021-01-14 18:20 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team

On Fri, Jan 08, 2021 at 05:44:35PM +0100, David Sterba wrote:
> On Wed, Dec 16, 2020 at 11:22:04AM -0500, Josef Bacik wrote:
> > Hello,
> > 
> > A lot of these were in previous versions of the relocation error handling
> > patches.  I added a few since the last go around.  All of these do not rely on
> > the error handling patches, and some of them are quite important otherwise we
> > get corruption if we get errors in certain spots.  There's also a few lockdep
> > fixes and such.  These really need to go in ASAP, regardless of when the
> > relocation error handling patches are merged.  They're mostly small and self
> > contained, the only "big" one being the one that tracks the root owner for
> > relocation reads, which is to resolve the remaining class of lockdep errors we
> > get because of an improper lockdep class set on the extent buffer.  Thanks,
> > 
> > Josef
> > 
> > Josef Bacik (13):
> >   btrfs: don't get an EINTR during drop_snapshot for reloc
> >   btrfs: initialize test inodes location
> >   btrfs: fix reloc root leak with 0 ref reloc roots on recovery
> >   btrfs: splice remaining dirty_bg's onto the transaction dirty bg list
> >   btrfs: do not WARN_ON() if we can't find the reloc root
> >   btrfs: add ASSERT()'s for deleting backref cache nodes
> >   btrfs: do not double free backref nodes on error
> >   btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root
> >   btrfs: modify the new_root highest_objectid under a ref count
> >   btrfs: fix lockdep splat in btrfs_recover_relocation
> >   btrfs: keep track of the root owner for relocation reads
> >   btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node
> >   btrfs: don't clear ret in btrfs_start_dirty_block_groups
> 
> Patch 2 has been merged, patch 9 dropped due to the inode number
> cleanups in "btrfs: make btrfs_root::free_objectid hold the next
> available objectid". Most of them are reviewed, I'll add the series to
> for-next and continue merging to misc-next and then to -rc eventually.
> Thanks.

1, 7, 10-13 added to misc-next.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-01-14 18:23 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-16 16:22 [PATCH 00/13] Serious fixes for different error paths Josef Bacik
2020-12-16 16:22 ` [PATCH 01/13] btrfs: don't get an EINTR during drop_snapshot for reloc Josef Bacik
2021-01-11 21:56   ` David Sterba
2020-12-16 16:22 ` [PATCH 02/13] btrfs: initialize test inodes location Josef Bacik
2020-12-18 10:30   ` Nikolay Borisov
2020-12-21 16:58     ` David Sterba
2020-12-16 16:22 ` [PATCH 03/13] btrfs: fix reloc root leak with 0 ref reloc roots on recovery Josef Bacik
2020-12-16 16:22 ` [PATCH 04/13] btrfs: splice remaining dirty_bg's onto the transaction dirty bg list Josef Bacik
2021-01-11 22:09   ` David Sterba
2020-12-16 16:22 ` [PATCH 05/13] btrfs: do not WARN_ON() if we can't find the reloc root Josef Bacik
2021-01-11 22:14   ` David Sterba
2020-12-16 16:22 ` [PATCH 06/13] btrfs: add ASSERT()'s for deleting backref cache nodes Josef Bacik
2021-01-11 22:18   ` David Sterba
2020-12-16 16:22 ` [PATCH 07/13] btrfs: do not double free backref nodes on error Josef Bacik
2020-12-16 16:22 ` [PATCH 08/13] btrfs: abort the transaction if we fail to inc ref in btrfs_copy_root Josef Bacik
2021-01-11 22:20   ` David Sterba
2020-12-16 16:22 ` [PATCH 09/13] btrfs: modify the new_root highest_objectid under a ref count Josef Bacik
2020-12-18 11:18   ` Nikolay Borisov
2020-12-16 16:22 ` [PATCH 10/13] btrfs: fix lockdep splat in btrfs_recover_relocation Josef Bacik
2020-12-16 16:22 ` [PATCH 11/13] btrfs: keep track of the root owner for relocation reads Josef Bacik
2021-01-11 22:23   ` David Sterba
2021-01-14 18:03     ` David Sterba
2020-12-16 16:22 ` [PATCH 12/13] btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node Josef Bacik
2020-12-16 16:22 ` [PATCH 13/13] btrfs: don't clear ret in btrfs_start_dirty_block_groups Josef Bacik
2021-01-08 16:44 ` [PATCH 00/13] Serious fixes for different error paths David Sterba
2021-01-14 18:20   ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).