All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH v4 05/53] btrfs: keep track of the root owner for relocation reads
Date: Thu,  3 Dec 2020 13:22:11 -0500	[thread overview]
Message-ID: <d96c58dec2b3d3564d2c176814a5b04498db0f71.1607019557.git.josef@toxicpanda.com> (raw)
In-Reply-To: <cover.1607019557.git.josef@toxicpanda.com>

While testing the error paths in relocation, I hit the following lockdep
splat

======================================================
WARNING: possible circular locking dependency detected
5.10.0-rc3+ #206 Not tainted
------------------------------------------------------
btrfs-balance/1571 is trying to acquire lock:
ffff8cdbcc8f77d0 (&head_ref->mutex){+.+.}-{3:3}, at: btrfs_lookup_extent_info+0x156/0x3b0

but task is already holding lock:
ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (btrfs-tree-00){++++}-{3:3}:
       down_write_nested+0x43/0x80
       __btrfs_tree_lock+0x27/0x100
       btrfs_search_slot+0x248/0x890
       relocate_tree_blocks+0x490/0x650
       relocate_block_group+0x1ba/0x5d0
       kretprobe_trampoline+0x0/0x50

-> #1 (btrfs-csum-01){++++}-{3:3}:
       down_read_nested+0x43/0x130
       __btrfs_tree_read_lock+0x27/0x100
       btrfs_read_lock_root_node+0x31/0x40
       btrfs_search_slot+0x5ab/0x890
       btrfs_del_csums+0x10b/0x3c0
       __btrfs_free_extent+0x49d/0x8e0
       __btrfs_run_delayed_refs+0x283/0x11f0
       btrfs_run_delayed_refs+0x86/0x220
       btrfs_start_dirty_block_groups+0x2ba/0x520
       kretprobe_trampoline+0x0/0x50

-> #0 (&head_ref->mutex){+.+.}-{3:3}:
       __lock_acquire+0x1167/0x2150
       lock_acquire+0x116/0x3e0
       __mutex_lock+0x7e/0x7b0
       btrfs_lookup_extent_info+0x156/0x3b0
       walk_down_proc+0x1c3/0x280
       walk_down_tree+0x64/0xe0
       btrfs_drop_subtree+0x182/0x260
       do_relocation+0x52e/0x660
       relocate_tree_blocks+0x2ae/0x650
       relocate_block_group+0x1ba/0x5d0
       kretprobe_trampoline+0x0/0x50

other info that might help us debug this:

Chain exists of:
  &head_ref->mutex --> btrfs-csum-01 --> btrfs-tree-00

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(btrfs-tree-00);
                               lock(btrfs-csum-01);
                               lock(btrfs-tree-00);
  lock(&head_ref->mutex);

 *** DEADLOCK ***

5 locks held by btrfs-balance/1571:
 #0: ffff8cdb89749ff8 (&fs_info->delete_unused_bgs_mutex){+.+.}-{3:3}, at: btrfs_balance+0x563/0xf40
 #1: ffff8cdb89748838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x156/0x300
 #2: ffff8cdbc2c16650 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x413/0x5c0
 #3: ffff8cdbc135f538 (btrfs-treloc-01){+.+.}-{3:3}, at: __btrfs_tree_lock+0x27/0x100
 #4: ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100

stack backtrace:
CPU: 1 PID: 1571 Comm: btrfs-balance Not tainted 5.10.0-rc3+ #206
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb0
 check_noncircular+0xcf/0xf0
 ? trace_call_bpf+0x139/0x260
 __lock_acquire+0x1167/0x2150
 lock_acquire+0x116/0x3e0
 ? btrfs_lookup_extent_info+0x156/0x3b0
 __mutex_lock+0x7e/0x7b0
 ? btrfs_lookup_extent_info+0x156/0x3b0
 ? btrfs_lookup_extent_info+0x156/0x3b0
 ? release_extent_buffer+0x124/0x170
 ? _raw_spin_unlock+0x1f/0x30
 ? release_extent_buffer+0x124/0x170
 btrfs_lookup_extent_info+0x156/0x3b0
 walk_down_proc+0x1c3/0x280
 walk_down_tree+0x64/0xe0
 btrfs_drop_subtree+0x182/0x260
 do_relocation+0x52e/0x660
 relocate_tree_blocks+0x2ae/0x650
 ? add_tree_block+0x149/0x1b0
 relocate_block_group+0x1ba/0x5d0
 elfcorehdr_read+0x40/0x40
 ? elfcorehdr_read+0x40/0x40
 ? btrfs_balance+0x796/0xf40
 ? __kthread_parkme+0x66/0x90
 ? btrfs_balance+0xf40/0xf40
 ? balance_kthread+0x37/0x50
 ? kthread+0x137/0x150
 ? __kthread_bind_mask+0x60/0x60
 ? ret_from_fork+0x1f/0x30

As you can see this is bogus, we never take another tree's lock under
the csum lock.  This happens because sometimes we have to read tree
blocks from disk without knowing which root they belong to during
relocation.  We defaulted to an owner of 0, which translates to an fs
tree.  This is fine as all fs trees have the same class, but obviously
isn't fine if the block belongs to a cow only tree.

Thankfully cow only trees only have their owners root as a reference to
them, and since we already look up the extent information during
relocation, go ahead and check and see if this block might belong to a
cow only tree, and if so save the owner in the struct tree_block.  This
allows us to read_tree_block with the proper owner, which gets rid of
this lockdep splat.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 47 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 19b7db8b2117..2b30e39e922a 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -98,6 +98,7 @@ struct tree_block {
 		u64 bytenr;
 	}; /* Use rb_simple_node for search/insert */
 	struct btrfs_key key;
+	u64 owner;
 	unsigned int level:8;
 	unsigned int key_ready:1;
 };
@@ -2393,8 +2394,8 @@ static int get_tree_block_key(struct btrfs_fs_info *fs_info,
 {
 	struct extent_buffer *eb;
 
-	eb = read_tree_block(fs_info, block->bytenr, 0, block->key.offset,
-			     block->level, NULL);
+	eb = read_tree_block(fs_info, block->bytenr, block->owner,
+			     block->key.offset, block->level, NULL);
 	if (IS_ERR(eb)) {
 		return PTR_ERR(eb);
 	} else if (!extent_buffer_uptodate(eb)) {
@@ -2493,7 +2494,8 @@ int relocate_tree_blocks(struct btrfs_trans_handle *trans,
 	/* Kick in readahead for tree blocks with missing keys */
 	rbtree_postorder_for_each_entry_safe(block, next, blocks, rb_node) {
 		if (!block->key_ready)
-			btrfs_readahead_tree_block(fs_info, block->bytenr, 0, 0,
+			btrfs_readahead_tree_block(fs_info, block->bytenr,
+						   block->owner, 0,
 						   block->level);
 	}
 
@@ -2801,21 +2803,59 @@ static int add_tree_block(struct reloc_control *rc,
 	u32 item_size;
 	int level = -1;
 	u64 generation;
+	u64 owner = 0;
 
 	eb =  path->nodes[0];
 	item_size = btrfs_item_size_nr(eb, path->slots[0]);
 
 	if (extent_key->type == BTRFS_METADATA_ITEM_KEY ||
 	    item_size >= sizeof(*ei) + sizeof(*bi)) {
+		unsigned long ptr = 0, end;
+
 		ei = btrfs_item_ptr(eb, path->slots[0],
 				struct btrfs_extent_item);
+		end = (unsigned long)ei + item_size;
 		if (extent_key->type == BTRFS_EXTENT_ITEM_KEY) {
 			bi = (struct btrfs_tree_block_info *)(ei + 1);
 			level = btrfs_tree_block_level(eb, bi);
+			ptr = (unsigned long)(bi + 1);
 		} else {
 			level = (int)extent_key->offset;
+			ptr = (unsigned long)(ei + 1);
 		}
 		generation = btrfs_extent_generation(eb, ei);
+
+		/*
+		 * We're reading random blocks without knowing their owner ahead
+		 * of time.  This is ok most of the time, as all reloc roots and
+		 * fs roots have the same lock type.  However normal trees do
+		 * not, and the only way to know ahead of time is to read the
+		 * inline ref offset.  We know it's an fs root if
+		 *
+		 * 1. There's more than one ref.
+		 * 2. There's a SHARED_DATA_REF_KEY set.
+		 * 3. FULL_BACKREF is set on the flags.
+		 *
+		 * Otherwise it's safe to assume that the ref offset == the
+		 * owner of this block, so we can use that when calling
+		 * read_tree_block.
+		 */
+		if (btrfs_extent_refs(eb, ei) == 1 &&
+		    !(btrfs_extent_flags(eb, ei) &
+		      BTRFS_BLOCK_FLAG_FULL_BACKREF) &&
+		    ptr < end) {
+			struct btrfs_extent_inline_ref *iref;
+			int type;
+
+			iref = (struct btrfs_extent_inline_ref *)ptr;
+			type = btrfs_get_extent_inline_ref_type(eb, iref,
+							BTRFS_REF_TYPE_BLOCK);
+			if (type == BTRFS_REF_TYPE_INVALID)
+				return -EINVAL;
+			if (type == BTRFS_TREE_BLOCK_REF_KEY)
+				owner = btrfs_extent_inline_ref_offset(eb,
+								       iref);
+		}
 	} else if (unlikely(item_size == sizeof(struct btrfs_extent_item_v0))) {
 		btrfs_print_v0_err(eb->fs_info);
 		btrfs_handle_fs_error(eb->fs_info, -EINVAL, NULL);
@@ -2837,6 +2877,7 @@ static int add_tree_block(struct reloc_control *rc,
 	block->key.offset = generation;
 	block->level = level;
 	block->key_ready = 0;
+	block->owner = owner;
 
 	rb_node = rb_simple_insert(blocks, block->bytenr, &block->rb_node);
 	if (rb_node)
-- 
2.26.2


  parent reply	other threads:[~2020-12-03 18:24 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 18:22 [PATCH v4 00/53] Cleanup error handling in relocation Josef Bacik
2020-12-03 18:22 ` [PATCH v4 01/53] btrfs: fix error handling in commit_fs_roots Josef Bacik
2020-12-03 18:22 ` [PATCH v4 02/53] btrfs: allow error injection for btrfs_search_slot and btrfs_cow_block Josef Bacik
2020-12-03 18:22 ` [PATCH v4 03/53] btrfs: modify the new_root highest_objectid under a ref count Josef Bacik
2020-12-04  8:01   ` Qu Wenruo
2020-12-07  8:35     ` Nikolay Borisov
2020-12-03 18:22 ` [PATCH v4 04/53] btrfs: fix lockdep splat in btrfs_recover_relocation Josef Bacik
2020-12-03 18:22 ` Josef Bacik [this message]
2020-12-04  8:03   ` [PATCH v4 05/53] btrfs: keep track of the root owner for relocation reads Qu Wenruo
2020-12-03 18:22 ` [PATCH v4 06/53] btrfs: noinline btrfs_should_cancel_balance Josef Bacik
2020-12-03 18:22 ` [PATCH v4 07/53] btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node Josef Bacik
2020-12-03 18:22 ` [PATCH v4 08/53] btrfs: pass down the tree block level through ref-verify Josef Bacik
2020-12-07  7:53   ` Johannes Thumshirn
2020-12-03 18:22 ` [PATCH v4 09/53] btrfs: make sure owner is set in ref-verify Josef Bacik
2020-12-03 18:22 ` [PATCH v4 10/53] btrfs: don't clear ret in btrfs_start_dirty_block_groups Josef Bacik
2020-12-03 18:22 ` [PATCH v4 11/53] btrfs: convert some BUG_ON()'s to ASSERT()'s in do_relocation Josef Bacik
2020-12-03 18:22 ` [PATCH v4 12/53] btrfs: convert BUG_ON()'s in relocate_tree_block Josef Bacik
2020-12-03 18:22 ` [PATCH v4 13/53] btrfs: return an error from btrfs_record_root_in_trans Josef Bacik
2020-12-03 18:22 ` [PATCH v4 14/53] btrfs: handle errors from select_reloc_root() Josef Bacik
2020-12-03 18:22 ` [PATCH v4 15/53] btrfs: convert BUG_ON()'s in select_reloc_root() to proper errors Josef Bacik
2020-12-04  8:04   ` Qu Wenruo
2020-12-03 18:22 ` [PATCH v4 16/53] btrfs: check record_root_in_trans related failures in select_reloc_root Josef Bacik
2020-12-03 18:22 ` [PATCH v4 17/53] btrfs: do proper error handling in record_reloc_root_in_trans Josef Bacik
2020-12-04  8:05   ` Qu Wenruo
2020-12-03 18:22 ` [PATCH v4 18/53] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename_exchange Josef Bacik
2020-12-03 18:22 ` [PATCH v4 19/53] btrfs: handle btrfs_record_root_in_trans failure in btrfs_rename Josef Bacik
2020-12-03 18:22 ` [PATCH v4 20/53] btrfs: handle btrfs_record_root_in_trans failure in btrfs_delete_subvolume Josef Bacik
2020-12-03 18:22 ` [PATCH v4 21/53] btrfs: handle btrfs_record_root_in_trans failure in btrfs_recover_log_trees Josef Bacik
2020-12-03 18:22 ` [PATCH v4 22/53] btrfs: handle btrfs_record_root_in_trans failure in create_subvol Josef Bacik
2020-12-03 18:22 ` [PATCH v4 23/53] btrfs: btrfs: handle btrfs_record_root_in_trans failure in relocate_tree_block Josef Bacik
2020-12-03 18:22 ` [PATCH v4 24/53] btrfs: handle btrfs_record_root_in_trans failure in start_transaction Josef Bacik
2020-12-03 18:22 ` [PATCH v4 25/53] btrfs: handle record_root_in_trans failure in qgroup_account_snapshot Josef Bacik
2020-12-03 18:22 ` [PATCH v4 26/53] btrfs: handle record_root_in_trans failure in btrfs_record_root_in_trans Josef Bacik
2020-12-03 18:22 ` [PATCH v4 27/53] btrfs: handle record_root_in_trans failure in create_pending_snapshot Josef Bacik
2020-12-03 18:22 ` [PATCH v4 28/53] btrfs: do not panic in __add_reloc_root Josef Bacik
2020-12-03 18:22 ` [PATCH v4 29/53] btrfs: have proper error handling in btrfs_init_reloc_root Josef Bacik
2020-12-03 18:22 ` [PATCH v4 30/53] btrfs: do proper error handling in create_reloc_root Josef Bacik
2020-12-03 18:22 ` [PATCH v4 31/53] btrfs: validate ->reloc_root after recording root in trans Josef Bacik
2020-12-03 18:22 ` [PATCH v4 32/53] btrfs: handle btrfs_update_reloc_root failure in commit_fs_roots Josef Bacik
2020-12-03 18:22 ` [PATCH v4 33/53] btrfs: change insert_dirty_subvol to return errors Josef Bacik
2020-12-03 18:22 ` [PATCH v4 34/53] btrfs: handle btrfs_update_reloc_root failure in insert_dirty_subvol Josef Bacik
2020-12-03 18:22 ` [PATCH v4 35/53] btrfs: handle btrfs_update_reloc_root failure in prepare_to_merge Josef Bacik
2020-12-03 18:22 ` [PATCH v4 36/53] btrfs: do proper error handling in btrfs_update_reloc_root Josef Bacik
2020-12-03 18:22 ` [PATCH v4 37/53] btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s Josef Bacik
2020-12-03 18:22 ` [PATCH v4 38/53] btrfs: handle btrfs_cow_block errors in replace_path Josef Bacik
2020-12-03 18:22 ` [PATCH v4 39/53] btrfs: handle btrfs_search_slot failure " Josef Bacik
2020-12-03 18:22 ` [PATCH v4 40/53] btrfs: handle errors in reference count manipulation " Josef Bacik
2020-12-03 18:22 ` [PATCH v4 41/53] btrfs: handle extent reference errors in do_relocation Josef Bacik
2020-12-03 18:22 ` [PATCH v4 42/53] btrfs: check for BTRFS_BLOCK_FLAG_FULL_BACKREF being set improperly Josef Bacik
2020-12-03 18:22 ` [PATCH v4 43/53] btrfs: remove the extent item sanity checks in relocate_block_group Josef Bacik
2020-12-03 18:22 ` [PATCH v4 44/53] btrfs: do proper error handling in create_reloc_inode Josef Bacik
2020-12-03 18:22 ` [PATCH v4 45/53] btrfs: handle __add_reloc_root failures in btrfs_recover_relocation Josef Bacik
2020-12-03 18:22 ` [PATCH v4 46/53] btrfs: cleanup error handling in prepare_to_merge Josef Bacik
2020-12-03 18:22 ` [PATCH v4 47/53] btrfs: handle extent corruption with select_one_root properly Josef Bacik
2020-12-03 18:22 ` [PATCH v4 48/53] btrfs: do proper error handling in merge_reloc_roots Josef Bacik
2020-12-06 22:10   ` Zygo Blaxell
2020-12-07  1:11     ` Qu Wenruo
2020-12-08  2:39       ` Zygo Blaxell
2020-12-03 18:22 ` [PATCH v4 49/53] btrfs: check return value of btrfs_commit_transaction in relocation Josef Bacik
2020-12-03 18:22 ` [PATCH v4 50/53] btrfs: do not WARN_ON() if we can't find the reloc root Josef Bacik
2020-12-03 18:22 ` [PATCH v4 51/53] btrfs: print the actual offset in btrfs_root_name Josef Bacik
2020-12-03 18:22 ` [PATCH v4 52/53] btrfs: fix reloc root leak with 0 ref reloc roots on recovery Josef Bacik
2020-12-03 18:22 ` [PATCH v4 53/53] btrfs: splice remaining dirty_bg's onto the transaction dirty bg list Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d96c58dec2b3d3564d2c176814a5b04498db0f71.1607019557.git.josef@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.