linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 14/14] btrfs: set the lockdep class for ebs on creation
Date: Fri, 30 Oct 2020 17:03:06 -0400	[thread overview]
Message-ID: <684352674d9bc1db4373af4b94cbe56667f90503.1604091530.git.josef@toxicpanda.com> (raw)
In-Reply-To: <cover.1604091530.git.josef@toxicpanda.com>

Both Filipe and Fedora QA recently hit the following lockdep splat

WARNING: possible recursive locking detected
5.10.0-0.rc1.20201028gited8780e3f2ec.57.fc34.x86_64 #1 Not tainted
--------------------------------------------
rsync/2610 is trying to acquire lock:
ffff89617ed48f20 (&eb->lock){++++}-{2:2}, at: btrfs_tree_read_lock_atomic+0x34/0x140

but task is already holding lock:
ffff8961757b1130 (&eb->lock){++++}-{2:2}, at: btrfs_tree_read_lock_atomic+0x34/0x140

other info that might help us debug this:
 Possible unsafe locking scenario:
       CPU0
       ----
  lock(&eb->lock);
  lock(&eb->lock);

 *** DEADLOCK ***
 May be due to missing lock nesting notation
2 locks held by rsync/2610:
 #0: ffff896107212b90 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: walk_component+0x10c/0x190
 #1: ffff8961757b1130 (&eb->lock){++++}-{2:2}, at: btrfs_tree_read_lock_atomic+0x34/0x140

stack backtrace:
CPU: 1 PID: 2610 Comm: rsync Not tainted 5.10.0-0.rc1.20201028gited8780e3f2ec.57.fc34.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack+0x8b/0xb0
 __lock_acquire.cold+0x12d/0x2a4
 ? kvm_sched_clock_read+0x14/0x30
 ? sched_clock+0x5/0x10
 lock_acquire+0xc8/0x400
 ? btrfs_tree_read_lock_atomic+0x34/0x140
 ? read_block_for_search.isra.0+0xdd/0x320
 _raw_read_lock+0x3d/0xa0
 ? btrfs_tree_read_lock_atomic+0x34/0x140
 btrfs_tree_read_lock_atomic+0x34/0x140
 btrfs_search_slot+0x616/0x9a0
 btrfs_lookup_dir_item+0x6c/0xb0
 btrfs_lookup_dentry+0xa8/0x520
 ? lockdep_init_map_waits+0x4c/0x210
 btrfs_lookup+0xe/0x30
 __lookup_slow+0x10f/0x1e0
 walk_component+0x11b/0x190
 path_lookupat+0x72/0x1c0
 filename_lookup+0x97/0x180
 ? strncpy_from_user+0x96/0x1e0
 ? getname_flags.part.0+0x45/0x1a0
 vfs_statx+0x64/0x100
 ? lockdep_hardirqs_on_prepare+0xff/0x180
 ? _raw_spin_unlock_irqrestore+0x41/0x50
 __do_sys_newlstat+0x26/0x40
 ? lockdep_hardirqs_on_prepare+0xff/0x180
 ? syscall_enter_from_user_mode+0x27/0x80
 ? syscall_enter_from_user_mode+0x27/0x80
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

I have also seen a report of lockdep complaining about the lock class
that was looked up being the same as the lock class on the lock we were
using, but I can't find the report.

These are problems that occur because we do not have the lockdep class
set on the extent buffer until _after_ we read the eb in properly.  This
is problematic for concurrent readers, because we will create the extent
buffer, lock it, and then attempt to read the extent buffer.

If a second thread comes in and tries to do a search down the same path
they'll get the above lockdep splat because the class isn't set properly
on the extent buffer.

There was a good reason for this, we generally didn't know the real
owner of the eb until we read it, specifically in refcount'ed roots.

However now all refcount'ed roots have the same class name, so we no
longer need to worry about this.  For non-refcount'ed tree's we know
which root we're on based on the parent.

Fix this by setting the lockdep class on the eb at creation time instead
of read time.  This will fix the splat and the weirdness where the class
changes in the middle of locking the block.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/disk-io.c     | 3 ---
 fs/btrfs/extent-tree.c | 8 +++++---
 fs/btrfs/extent_io.c   | 1 +
 fs/btrfs/volumes.c     | 1 -
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 989412501a92..d8ce8bbb3a45 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -578,9 +578,6 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio, u64 phy_offset,
 		goto err;
 	}
 
-	btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb),
-				       eb, found_level);
-
 	csum_tree_block(eb, result);
 
 	if (memcmp_extent_buffer(eb, result, 0, csum_size)) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a2c611a83057..1ddd8f4e9564 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4679,6 +4679,11 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 		return ERR_PTR(-EUCLEAN);
 	}
 
+	/*
+	 * This needs to stay, because we could allocate a free'd block from an
+	 * old tree into a new tree, so we need to make sure this new block is
+	 * set to the appropriate level and owner.
+	 */
 	btrfs_set_buffer_lockdep_class(owner, buf, level);
 	__btrfs_tree_lock(buf, nest);
 	btrfs_clean_tree_block(buf);
@@ -5069,9 +5074,6 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
 						    level - 1);
 		if (IS_ERR(next))
 			return PTR_ERR(next);
-
-		btrfs_set_buffer_lockdep_class(root->root_key.objectid, next,
-					       level - 1);
 		reada = 1;
 	}
 	btrfs_tree_lock(next);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0af8333ccca1..4e758d670fc1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5188,6 +5188,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
 	eb = __alloc_extent_buffer(fs_info, start, len);
 	if (!eb)
 		return ERR_PTR(-ENOMEM);
+	btrfs_set_buffer_lockdep_class(owner_root, eb, level);
 
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++, index++) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9ef1a51379e9..ad244b44a3a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6914,7 +6914,6 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
 	if (IS_ERR(sb))
 		return PTR_ERR(sb);
 	set_extent_buffer_uptodate(sb);
-	btrfs_set_buffer_lockdep_class(root->root_key.objectid, sb, 0);
 	/*
 	 * The sb extent buffer is artificial and just used to read the system array.
 	 * set_extent_buffer_uptodate() call does not properly mark all it's
-- 
2.26.2


  parent reply	other threads:[~2020-10-30 21:03 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-30 21:02 [PATCH 00/14] Set the lockdep class on eb's at allocation time Josef Bacik
2020-10-30 21:02 ` [PATCH 01/14] btrfs: remove lockdep classes for the fs tree Josef Bacik
2020-10-30 21:02 ` [PATCH 02/14] btrfs: cleanup extent buffer readahead Josef Bacik
2020-10-30 21:02 ` [PATCH 03/14] btrfs: use btrfs_read_node_slot in btrfs_realloc_node Josef Bacik
2020-10-30 21:02 ` [PATCH 04/14] btrfs: use btrfs_read_node_slot in walk_down_reloc_tree Josef Bacik
2020-10-30 21:02 ` [PATCH 05/14] btrfs: use btrfs_read_node_slot in do_relocation Josef Bacik
2020-10-30 21:02 ` [PATCH 06/14] btrfs: use btrfs_read_node_slot in replace_path Josef Bacik
2020-10-30 21:02 ` [PATCH 07/14] btrfs: use btrfs_read_node_slot in walk_down_tree Josef Bacik
2020-10-30 21:03 ` [PATCH 08/14] btrfs: use btrfs_read_node_slot in qgroup_trace_extent_swap Josef Bacik
2020-10-30 21:03 ` [PATCH 09/14] btrfs: use btrfs_read_node_slot in qgroup_trace_new_subtree_blocks Josef Bacik
2020-10-30 21:03 ` [PATCH 10/14] btrfs: use btrfs_read_node_slot in btrfs_qgroup_trace_subtree Josef Bacik
2020-10-30 21:03 ` [PATCH 11/14] btrfs: pass root owner to read_tree_block Josef Bacik
2020-10-30 21:03 ` [PATCH 12/14] btrfs: pass the root owner and level around for reada Josef Bacik
2020-10-30 21:03 ` [PATCH 13/14] btrfs: pass the owner_root and level to alloc_extent_buffer Josef Bacik
2020-10-30 21:03 ` Josef Bacik [this message]
2020-11-05 15:45 [PATCH 00/14][REBASED] Set the lockdep class on eb's at allocation time Josef Bacik
2020-11-05 15:45 ` [PATCH 14/14] btrfs: set the lockdep class for ebs on creation Josef Bacik
2020-11-06 11:58   ` Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=684352674d9bc1db4373af4b94cbe56667f90503.1604091530.git.josef@toxicpanda.com \
    --to=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).