From: Filipe Manana <fdmanana@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>, kernel-team@fb.com
Subject: Re: [PATCH 14/14] btrfs: set the lockdep class for ebs on creation
Date: Fri, 6 Nov 2020 11:58:34 +0000 [thread overview]
Message-ID: <CAL3q7H7jrF3Acu0DHfShMWE_WmwAcwF5b7-ZOj4vRPZ61_pf-Q@mail.gmail.com> (raw)
In-Reply-To: <1cee2922a32c305056a9559ccf7aede49777beae.1604591048.git.josef@toxicpanda.com>
On Thu, Nov 5, 2020 at 3:47 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> Both Filipe and Fedora QA recently hit the following lockdep splat
>
> WARNING: possible recursive locking detected
> 5.10.0-0.rc1.20201028gited8780e3f2ec.57.fc34.x86_64 #1 Not tainted
> --------------------------------------------
> rsync/2610 is trying to acquire lock:
> ffff89617ed48f20 (&eb->lock){++++}-{2:2}, at: btrfs_tree_read_lock_atomic+0x34/0x140
>
> but task is already holding lock:
> ffff8961757b1130 (&eb->lock){++++}-{2:2}, at: btrfs_tree_read_lock_atomic+0x34/0x140
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
> CPU0
> ----
> lock(&eb->lock);
> lock(&eb->lock);
>
> *** DEADLOCK ***
> May be due to missing lock nesting notation
> 2 locks held by rsync/2610:
> #0: ffff896107212b90 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: walk_component+0x10c/0x190
> #1: ffff8961757b1130 (&eb->lock){++++}-{2:2}, at: btrfs_tree_read_lock_atomic+0x34/0x140
>
> stack backtrace:
> CPU: 1 PID: 2610 Comm: rsync Not tainted 5.10.0-0.rc1.20201028gited8780e3f2ec.57.fc34.x86_64 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> Call Trace:
> dump_stack+0x8b/0xb0
> __lock_acquire.cold+0x12d/0x2a4
> ? kvm_sched_clock_read+0x14/0x30
> ? sched_clock+0x5/0x10
> lock_acquire+0xc8/0x400
> ? btrfs_tree_read_lock_atomic+0x34/0x140
> ? read_block_for_search.isra.0+0xdd/0x320
> _raw_read_lock+0x3d/0xa0
> ? btrfs_tree_read_lock_atomic+0x34/0x140
> btrfs_tree_read_lock_atomic+0x34/0x140
> btrfs_search_slot+0x616/0x9a0
> btrfs_lookup_dir_item+0x6c/0xb0
> btrfs_lookup_dentry+0xa8/0x520
> ? lockdep_init_map_waits+0x4c/0x210
> btrfs_lookup+0xe/0x30
> __lookup_slow+0x10f/0x1e0
> walk_component+0x11b/0x190
> path_lookupat+0x72/0x1c0
> filename_lookup+0x97/0x180
> ? strncpy_from_user+0x96/0x1e0
> ? getname_flags.part.0+0x45/0x1a0
> vfs_statx+0x64/0x100
> ? lockdep_hardirqs_on_prepare+0xff/0x180
> ? _raw_spin_unlock_irqrestore+0x41/0x50
> __do_sys_newlstat+0x26/0x40
> ? lockdep_hardirqs_on_prepare+0xff/0x180
> ? syscall_enter_from_user_mode+0x27/0x80
> ? syscall_enter_from_user_mode+0x27/0x80
> do_syscall_64+0x33/0x40
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> I have also seen a report of lockdep complaining about the lock class
> that was looked up being the same as the lock class on the lock we were
> using, but I can't find the report.
>
> These are problems that occur because we do not have the lockdep class
> set on the extent buffer until _after_ we read the eb in properly. This
> is problematic for concurrent readers, because we will create the extent
> buffer, lock it, and then attempt to read the extent buffer.
>
> If a second thread comes in and tries to do a search down the same path
> they'll get the above lockdep splat because the class isn't set properly
> on the extent buffer.
>
> There was a good reason for this, we generally didn't know the real
> owner of the eb until we read it, specifically in refcount'ed roots.
>
> However now all refcount'ed roots have the same class name, so we no
> longer need to worry about this. For non-refcount'ed tree's we know
> which root we're on based on the parent.
>
> Fix this by setting the lockdep class on the eb at creation time instead
> of read time. This will fix the splat and the weirdness where the class
> changes in the middle of locking the block.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
I couldn't get anymore the lockdep splat I reported before (after
applying the whole patchset of course), it used to happen very often
with btrfs/033.
Looks good, thanks.
> ---
> fs/btrfs/disk-io.c | 3 ---
> fs/btrfs/extent-tree.c | 8 +++++---
> fs/btrfs/extent_io.c | 1 +
> fs/btrfs/volumes.c | 1 -
> 4 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index f14398b5d933..a90839426cfa 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -577,9 +577,6 @@ int btrfs_validate_metadata_buffer(struct btrfs_io_bio *io_bio, u64 phy_offset,
> goto err;
> }
>
> - btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb),
> - eb, found_level);
> -
> csum_tree_block(eb, result);
>
> if (memcmp_extent_buffer(eb, result, 0, csum_size)) {
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 14b6e19f6151..517c2558f973 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4629,6 +4629,11 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
> return ERR_PTR(-EUCLEAN);
> }
>
> + /*
> + * This needs to stay, because we could allocate a free'd block from an
> + * old tree into a new tree, so we need to make sure this new block is
> + * set to the appropriate level and owner.
> + */
> btrfs_set_buffer_lockdep_class(owner, buf, level);
> __btrfs_tree_lock(buf, nest);
> btrfs_clean_tree_block(buf);
> @@ -5018,9 +5023,6 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
> level - 1);
> if (IS_ERR(next))
> return PTR_ERR(next);
> -
> - btrfs_set_buffer_lockdep_class(root->root_key.objectid, next,
> - level - 1);
> reada = 1;
> }
> btrfs_tree_lock(next);
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index a883350d5e7f..3c61981b2c7b 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5196,6 +5196,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
> eb = __alloc_extent_buffer(fs_info, start, len);
> if (!eb)
> return ERR_PTR(-ENOMEM);
> + btrfs_set_buffer_lockdep_class(owner_root, eb, level);
>
> num_pages = num_extent_pages(eb);
> for (i = 0; i < num_pages; i++, index++) {
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0ca2e96a9cda..4830c40fc400 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6902,7 +6902,6 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info)
> if (IS_ERR(sb))
> return PTR_ERR(sb);
> set_extent_buffer_uptodate(sb);
> - btrfs_set_buffer_lockdep_class(root->root_key.objectid, sb, 0);
> /*
> * The sb extent buffer is artificial and just used to read the system array.
> * set_extent_buffer_uptodate() call does not properly mark all it's
> --
> 2.26.2
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
next prev parent reply other threads:[~2020-11-06 11:58 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-05 15:45 [PATCH 00/14][REBASED] Set the lockdep class on eb's at allocation time Josef Bacik
2020-11-05 15:45 ` [PATCH 01/14] btrfs: remove lockdep classes for the fs tree Josef Bacik
2020-11-06 11:50 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 02/14] btrfs: cleanup extent buffer readahead Josef Bacik
2020-11-06 11:51 ` Filipe Manana
2020-11-09 15:09 ` David Sterba
2020-11-05 15:45 ` [PATCH 03/14] btrfs: use btrfs_read_node_slot in btrfs_realloc_node Josef Bacik
2020-11-06 11:51 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 04/14] btrfs: use btrfs_read_node_slot in walk_down_reloc_tree Josef Bacik
2020-11-06 11:52 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 05/14] btrfs: use btrfs_read_node_slot in do_relocation Josef Bacik
2020-11-06 11:52 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 06/14] btrfs: use btrfs_read_node_slot in replace_path Josef Bacik
2020-11-06 11:53 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 07/14] btrfs: use btrfs_read_node_slot in walk_down_tree Josef Bacik
2020-11-06 11:53 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 08/14] btrfs: use btrfs_read_node_slot in qgroup_trace_extent_swap Josef Bacik
2020-11-06 11:53 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 09/14] btrfs: use btrfs_read_node_slot in qgroup_trace_new_subtree_blocks Josef Bacik
2020-11-06 11:54 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 10/14] btrfs: use btrfs_read_node_slot in btrfs_qgroup_trace_subtree Josef Bacik
2020-11-06 11:55 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 11/14] btrfs: pass root owner to read_tree_block Josef Bacik
2020-11-06 11:55 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 12/14] btrfs: pass the root owner and level around for reada Josef Bacik
2020-11-06 11:56 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 13/14] btrfs: pass the owner_root and level to alloc_extent_buffer Josef Bacik
2020-11-06 11:54 ` Filipe Manana
2020-11-05 15:45 ` [PATCH 14/14] btrfs: set the lockdep class for ebs on creation Josef Bacik
2020-11-06 11:58 ` Filipe Manana [this message]
2020-11-09 16:50 ` [PATCH 00/14][REBASED] Set the lockdep class on eb's at allocation time David Sterba
-- strict thread matches above, loose matches on Subject: below --
2020-10-30 21:02 [PATCH 00/14] " Josef Bacik
2020-10-30 21:03 ` [PATCH 14/14] btrfs: set the lockdep class for ebs on creation Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAL3q7H7jrF3Acu0DHfShMWE_WmwAcwF5b7-ZOj4vRPZ61_pf-Q@mail.gmail.com \
--to=fdmanana@gmail.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).