* [PATCH 0/3] Lockdep fixes for misc-next
@ 2020-10-19 20:02 Josef Bacik
2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Josef Bacik @ 2020-10-19 20:02 UTC (permalink / raw)
To: linux-btrfs, kernel-team
Hello,
Here's a few lockdep fixes for misc-next+my rwsem patch. Nothing too crazy, but
the last one is a little wonkey because qgroups does a backref resolution while
adding delayed refs. Generally we do the right thing with searching the commit
roots and skipping the locking, with the exception of looking up fs roots if we
have to resolve indirect refs. This obviously uses the normal lookup and
locking stuff, which is problematic in the new world order. For now I'm fixing
it with a special helper for backref lookups that either finds the root in
cache, or generates a temporary root that's not inserted into the fs roots radix
tree and is only used to do the backref resolution. Thanks,
Josef
Josef Bacik (3):
btrfs: drop the path before adding qgroup items when enabling qgroups
btrfs: protect the fs_info->caching_block_groups differently
btrfs: add a helper to read the tree_root commit root for backref
lookup
fs/btrfs/backref.c | 14 +++++++-
fs/btrfs/block-group.c | 12 +++----
fs/btrfs/disk-io.c | 79 +++++++++++++++++++++++++++++++-----------
fs/btrfs/disk-io.h | 3 ++
fs/btrfs/extent-tree.c | 2 ++
fs/btrfs/qgroup.c | 16 +++++++++
6 files changed, 98 insertions(+), 28 deletions(-)
--
2.26.2
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups
2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
@ 2020-10-19 20:02 ` Josef Bacik
2020-10-20 9:31 ` Filipe Manana
2020-10-21 15:19 ` David Sterba
2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
` (2 subsequent siblings)
3 siblings, 2 replies; 13+ messages in thread
From: Josef Bacik @ 2020-10-19 20:02 UTC (permalink / raw)
To: linux-btrfs, kernel-team
When enabling qgroups we walk the tree_root and then add a qgroup item
for every root that we have. This creates a lock dependency on the
tree_root and qgroup_root, which results in the following lockdep splat
======================================================
WARNING: possible circular locking dependency detected
5.9.0-default+ #1299 Not tainted
------------------------------------------------------
btrfs/24552 is trying to acquire lock:
ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
but task is already holding lock:
ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (btrfs-root-00){++++}-{3:3}:
__lock_acquire+0x3fb/0x730
lock_acquire.part.0+0x6a/0x130
down_read_nested+0x46/0x130
__btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
__btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
btrfs_search_slot+0xc3/0x9f0 [btrfs]
btrfs_insert_item+0x6e/0x140 [btrfs]
btrfs_create_tree+0x1cb/0x240 [btrfs]
btrfs_quota_enable+0xcd/0x790 [btrfs]
btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
__x64_sys_ioctl+0x83/0xa0
do_syscall_64+0x2d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (btrfs-quota-00){++++}-{3:3}:
check_prev_add+0x91/0xc30
validate_chain+0x491/0x750
__lock_acquire+0x3fb/0x730
lock_acquire.part.0+0x6a/0x130
down_read_nested+0x46/0x130
__btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
__btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
btrfs_search_slot+0xc3/0x9f0 [btrfs]
btrfs_insert_empty_items+0x58/0xa0 [btrfs]
add_qgroup_item.part.0+0x72/0x210 [btrfs]
btrfs_quota_enable+0x3bb/0x790 [btrfs]
btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
__x64_sys_ioctl+0x83/0xa0
do_syscall_64+0x2d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(btrfs-root-00);
lock(btrfs-quota-00);
lock(btrfs-root-00);
lock(btrfs-quota-00);
*** DEADLOCK ***
5 locks held by btrfs/24552:
#0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0
#1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs]
#2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs]
#3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs]
#4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
stack backtrace:
CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
Call Trace:
dump_stack+0x77/0x97
check_noncircular+0xf3/0x110
check_prev_add+0x91/0xc30
validate_chain+0x491/0x750
__lock_acquire+0x3fb/0x730
lock_acquire.part.0+0x6a/0x130
? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
? lock_acquire+0xc4/0x140
? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
down_read_nested+0x46/0x130
? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
__btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
? btrfs_root_node+0xd9/0x200 [btrfs]
__btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
btrfs_search_slot+0xc3/0x9f0 [btrfs]
btrfs_insert_empty_items+0x58/0xa0 [btrfs]
add_qgroup_item.part.0+0x72/0x210 [btrfs]
btrfs_quota_enable+0x3bb/0x790 [btrfs]
btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
__x64_sys_ioctl+0x83/0xa0
do_syscall_64+0x2d/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by dropping the path whenever we find a root item, add the
qgroup item, and then re-lookup the root item we found and continue
processing roots.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/qgroup.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 580899bdb991..573a555b51d8 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1026,6 +1026,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
btrfs_item_key_to_cpu(leaf, &found_key, slot);
if (found_key.type == BTRFS_ROOT_REF_KEY) {
+ btrfs_release_path(path);
+
ret = add_qgroup_item(trans, quota_root,
found_key.offset);
if (ret) {
@@ -1044,6 +1046,20 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
btrfs_abort_transaction(trans, ret);
goto out_free_path;
}
+ ret = btrfs_search_slot_for_read(tree_root, &found_key,
+ path, 1, 0);
+ if (ret < 0) {
+ btrfs_abort_transaction(trans, ret);
+ goto out_free_path;
+ }
+ if (ret > 0) {
+ /*
+ * Shouldn't happen, but in case it does we
+ * don't need to do the btrfs_next_item, just
+ * continue.
+ */
+ continue;
+ }
}
ret = btrfs_next_item(tree_root, path);
if (ret < 0) {
--
2.26.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
@ 2020-10-19 20:02 ` Josef Bacik
2020-10-20 10:10 ` Filipe Manana
2020-10-21 15:51 ` David Sterba
2020-10-19 20:02 ` [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup Josef Bacik
2020-10-21 16:13 ` [PATCH 0/3] Lockdep fixes for misc-next David Sterba
3 siblings, 2 replies; 13+ messages in thread
From: Josef Bacik @ 2020-10-19 20:02 UTC (permalink / raw)
To: linux-btrfs, kernel-team
I got the following lockdep splat
======================================================
WARNING: possible circular locking dependency detected
5.9.0+ #101 Not tainted
------------------------------------------------------
btrfs-cleaner/3445 is trying to acquire lock:
ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
but task is already holding lock:
ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
down_write+0x3d/0x70
btrfs_cache_block_group+0x2d5/0x510
find_free_extent+0xb6e/0x12f0
btrfs_reserve_extent+0xb3/0x1b0
btrfs_alloc_tree_block+0xb1/0x330
alloc_tree_block_no_bg_flush+0x4f/0x60
__btrfs_cow_block+0x11d/0x580
btrfs_cow_block+0x10c/0x220
commit_cowonly_roots+0x47/0x2e0
btrfs_commit_transaction+0x595/0xbd0
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x36/0xa0
cleanup_mnt+0x12d/0x190
task_work_run+0x5c/0xa0
exit_to_user_mode_prepare+0x1df/0x200
syscall_exit_to_user_mode+0x54/0x280
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (&space_info->groups_sem){++++}-{3:3}:
down_read+0x40/0x130
find_free_extent+0x2ed/0x12f0
btrfs_reserve_extent+0xb3/0x1b0
btrfs_alloc_tree_block+0xb1/0x330
alloc_tree_block_no_bg_flush+0x4f/0x60
__btrfs_cow_block+0x11d/0x580
btrfs_cow_block+0x10c/0x220
commit_cowonly_roots+0x47/0x2e0
btrfs_commit_transaction+0x595/0xbd0
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x36/0xa0
cleanup_mnt+0x12d/0x190
task_work_run+0x5c/0xa0
exit_to_user_mode_prepare+0x1df/0x200
syscall_exit_to_user_mode+0x54/0x280
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (btrfs-root-00){++++}-{3:3}:
__lock_acquire+0x1167/0x2150
lock_acquire+0xb9/0x3d0
down_read_nested+0x43/0x130
__btrfs_tree_read_lock+0x32/0x170
__btrfs_read_lock_root_node+0x3a/0x50
btrfs_search_slot+0x614/0x9d0
btrfs_find_root+0x35/0x1b0
btrfs_read_tree_root+0x61/0x120
btrfs_get_root_ref+0x14b/0x600
find_parent_nodes+0x3e6/0x1b30
btrfs_find_all_roots_safe+0xb4/0x130
btrfs_find_all_roots+0x60/0x80
btrfs_qgroup_trace_extent_post+0x27/0x40
btrfs_add_delayed_data_ref+0x3fd/0x460
btrfs_free_extent+0x42/0x100
__btrfs_mod_ref+0x1d7/0x2f0
walk_up_proc+0x11c/0x400
walk_up_tree+0xf0/0x180
btrfs_drop_snapshot+0x1c7/0x780
btrfs_clean_one_deleted_snapshot+0xfb/0x110
cleaner_kthread+0xd4/0x140
kthread+0x13a/0x150
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Chain exists of:
btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_info->commit_root_sem);
lock(&space_info->groups_sem);
lock(&fs_info->commit_root_sem);
lock(btrfs-root-00);
*** DEADLOCK ***
3 locks held by btrfs-cleaner/3445:
#0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
#1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
#2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
stack backtrace:
CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack+0x8b/0xb0
check_noncircular+0xcf/0xf0
__lock_acquire+0x1167/0x2150
? __bfs+0x42/0x210
lock_acquire+0xb9/0x3d0
? __btrfs_tree_read_lock+0x32/0x170
down_read_nested+0x43/0x130
? __btrfs_tree_read_lock+0x32/0x170
__btrfs_tree_read_lock+0x32/0x170
__btrfs_read_lock_root_node+0x3a/0x50
btrfs_search_slot+0x614/0x9d0
? find_held_lock+0x2b/0x80
btrfs_find_root+0x35/0x1b0
? do_raw_spin_unlock+0x4b/0xa0
btrfs_read_tree_root+0x61/0x120
btrfs_get_root_ref+0x14b/0x600
find_parent_nodes+0x3e6/0x1b30
btrfs_find_all_roots_safe+0xb4/0x130
btrfs_find_all_roots+0x60/0x80
btrfs_qgroup_trace_extent_post+0x27/0x40
btrfs_add_delayed_data_ref+0x3fd/0x460
btrfs_free_extent+0x42/0x100
__btrfs_mod_ref+0x1d7/0x2f0
walk_up_proc+0x11c/0x400
walk_up_tree+0xf0/0x180
btrfs_drop_snapshot+0x1c7/0x780
? btrfs_clean_one_deleted_snapshot+0x73/0x110
btrfs_clean_one_deleted_snapshot+0xfb/0x110
cleaner_kthread+0xd4/0x140
? btrfs_alloc_root+0x50/0x50
kthread+0x13a/0x150
? kthread_create_worker_on_cpu+0x40/0x40
ret_from_fork+0x1f/0x30
while testing another lockdep fix. This happens because we're using the
commit_root_sem to protect fs_info->caching_block_groups, which creates
a dependency on the groups_sem -> commit_root_sem, which is problematic
because we will allocate blocks while holding tree roots. Fix this by
making the list itself protected by the fs_info->block_group_cache_lock.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/block-group.c | 12 ++++++------
fs/btrfs/extent-tree.c | 2 ++
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 3ba6f3839d39..a8240913e6fc 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
return 0;
}
- down_write(&fs_info->commit_root_sem);
+ spin_lock(&fs_info->block_group_cache_lock);
refcount_inc(&caching_ctl->count);
list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
- up_write(&fs_info->commit_root_sem);
+ spin_unlock(&fs_info->block_group_cache_lock);
btrfs_get_block_group(cache);
@@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
if (block_group->cached == BTRFS_CACHE_STARTED)
btrfs_wait_block_group_cache_done(block_group);
if (block_group->has_caching_ctl) {
- down_write(&fs_info->commit_root_sem);
+ spin_lock(&fs_info->block_group_cache_lock);
if (!caching_ctl) {
struct btrfs_caching_control *ctl;
@@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
}
if (caching_ctl)
list_del_init(&caching_ctl->list);
- up_write(&fs_info->commit_root_sem);
+ spin_unlock(&fs_info->block_group_cache_lock);
if (caching_ctl) {
/* Once for the caching bgs list and once for us. */
btrfs_put_caching_control(caching_ctl);
@@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
struct btrfs_caching_control *caching_ctl;
struct rb_node *n;
- down_write(&info->commit_root_sem);
+ spin_lock(&info->block_group_cache_lock);
while (!list_empty(&info->caching_block_groups)) {
caching_ctl = list_entry(info->caching_block_groups.next,
struct btrfs_caching_control, list);
list_del(&caching_ctl->list);
btrfs_put_caching_control(caching_ctl);
}
- up_write(&info->commit_root_sem);
+ spin_unlock(&info->block_group_cache_lock);
spin_lock(&info->unused_bgs_lock);
while (!list_empty(&info->unused_bgs)) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5fd60b13f4f8..ff2b5c132a70 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
down_write(&fs_info->commit_root_sem);
+ spin_lock(&fs_info->block_group_cache_lock);
list_for_each_entry_safe(caching_ctl, next,
&fs_info->caching_block_groups, list) {
cache = caching_ctl->block_group;
@@ -2749,6 +2750,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
cache->last_byte_to_unpin = caching_ctl->progress;
}
}
+ spin_unlock(&fs_info->block_group_cache_lock);
up_write(&fs_info->commit_root_sem);
--
2.26.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup
2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
@ 2020-10-19 20:02 ` Josef Bacik
2020-10-20 11:06 ` Filipe Manana
2020-10-21 16:13 ` [PATCH 0/3] Lockdep fixes for misc-next David Sterba
3 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2020-10-19 20:02 UTC (permalink / raw)
To: linux-btrfs, kernel-team
I got the following lockdep splat with my rwsem patches on btrfs/104
======================================================
WARNING: possible circular locking dependency detected
5.9.0+ #102 Not tainted
------------------------------------------------------
btrfs-cleaner/903 is trying to acquire lock:
ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
but task is already holding lock:
ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
down_read+0x40/0x130
caching_thread+0x53/0x5a0
btrfs_work_helper+0xfa/0x520
process_one_work+0x238/0x540
worker_thread+0x55/0x3c0
kthread+0x13a/0x150
ret_from_fork+0x1f/0x30
-> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
__mutex_lock+0x7e/0x7b0
btrfs_cache_block_group+0x1e0/0x510
find_free_extent+0xb6e/0x12f0
btrfs_reserve_extent+0xb3/0x1b0
btrfs_alloc_tree_block+0xb1/0x330
alloc_tree_block_no_bg_flush+0x4f/0x60
__btrfs_cow_block+0x11d/0x580
btrfs_cow_block+0x10c/0x220
commit_cowonly_roots+0x47/0x2e0
btrfs_commit_transaction+0x595/0xbd0
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x36/0xa0
cleanup_mnt+0x12d/0x190
task_work_run+0x5c/0xa0
exit_to_user_mode_prepare+0x1df/0x200
syscall_exit_to_user_mode+0x54/0x280
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (&space_info->groups_sem){++++}-{3:3}:
down_read+0x40/0x130
find_free_extent+0x2ed/0x12f0
btrfs_reserve_extent+0xb3/0x1b0
btrfs_alloc_tree_block+0xb1/0x330
alloc_tree_block_no_bg_flush+0x4f/0x60
__btrfs_cow_block+0x11d/0x580
btrfs_cow_block+0x10c/0x220
commit_cowonly_roots+0x47/0x2e0
btrfs_commit_transaction+0x595/0xbd0
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20
deactivate_locked_super+0x36/0xa0
cleanup_mnt+0x12d/0x190
task_work_run+0x5c/0xa0
exit_to_user_mode_prepare+0x1df/0x200
syscall_exit_to_user_mode+0x54/0x280
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (btrfs-root-00){++++}-{3:3}:
__lock_acquire+0x1167/0x2150
lock_acquire+0xb9/0x3d0
down_read_nested+0x43/0x130
__btrfs_tree_read_lock+0x32/0x170
__btrfs_read_lock_root_node+0x3a/0x50
btrfs_search_slot+0x614/0x9d0
btrfs_find_root+0x35/0x1b0
btrfs_read_tree_root+0x61/0x120
btrfs_get_root_ref+0x14b/0x600
find_parent_nodes+0x3e6/0x1b30
btrfs_find_all_roots_safe+0xb4/0x130
btrfs_find_all_roots+0x60/0x80
btrfs_qgroup_trace_extent_post+0x27/0x40
btrfs_add_delayed_data_ref+0x3fd/0x460
btrfs_free_extent+0x42/0x100
__btrfs_mod_ref+0x1d7/0x2f0
walk_up_proc+0x11c/0x400
walk_up_tree+0xf0/0x180
btrfs_drop_snapshot+0x1c7/0x780
btrfs_clean_one_deleted_snapshot+0xfb/0x110
cleaner_kthread+0xd4/0x140
kthread+0x13a/0x150
ret_from_fork+0x1f/0x30
other info that might help us debug this:
Chain exists of:
btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_info->commit_root_sem);
lock(&caching_ctl->mutex);
lock(&fs_info->commit_root_sem);
lock(btrfs-root-00);
*** DEADLOCK ***
3 locks held by btrfs-cleaner/903:
#0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
#1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
#2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
stack backtrace:
CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
dump_stack+0x8b/0xb0
check_noncircular+0xcf/0xf0
__lock_acquire+0x1167/0x2150
? __bfs+0x42/0x210
lock_acquire+0xb9/0x3d0
? __btrfs_tree_read_lock+0x32/0x170
down_read_nested+0x43/0x130
? __btrfs_tree_read_lock+0x32/0x170
__btrfs_tree_read_lock+0x32/0x170
__btrfs_read_lock_root_node+0x3a/0x50
btrfs_search_slot+0x614/0x9d0
? find_held_lock+0x2b/0x80
btrfs_find_root+0x35/0x1b0
? do_raw_spin_unlock+0x4b/0xa0
btrfs_read_tree_root+0x61/0x120
btrfs_get_root_ref+0x14b/0x600
find_parent_nodes+0x3e6/0x1b30
btrfs_find_all_roots_safe+0xb4/0x130
btrfs_find_all_roots+0x60/0x80
btrfs_qgroup_trace_extent_post+0x27/0x40
btrfs_add_delayed_data_ref+0x3fd/0x460
btrfs_free_extent+0x42/0x100
__btrfs_mod_ref+0x1d7/0x2f0
walk_up_proc+0x11c/0x400
walk_up_tree+0xf0/0x180
btrfs_drop_snapshot+0x1c7/0x780
? btrfs_clean_one_deleted_snapshot+0x73/0x110
btrfs_clean_one_deleted_snapshot+0xfb/0x110
cleaner_kthread+0xd4/0x140
? btrfs_alloc_root+0x50/0x50
kthread+0x13a/0x150
? kthread_create_worker_on_cpu+0x40/0x40
ret_from_fork+0x1f/0x30
BTRFS info (device sdb): disk space caching is enabled
BTRFS info (device sdb): has skinny extents
This happens because qgroups does a backref lookup when we create a
delayed ref. From here it may have to look up a root from an indirect
ref, which does a normal lookup on the tree_root, which takes the read
lock on the tree_root nodes.
To fix this we need to add a variant for looking up roots that searches
the commit root of the tree_root. Then when we do the backref search
using the commit root we are sure to not take any locks on the tree_root
nodes. This gets rid of the lockdep splat when running btrfs/104.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/backref.c | 14 +++++++-
fs/btrfs/disk-io.c | 79 ++++++++++++++++++++++++++++++++++------------
fs/btrfs/disk-io.h | 3 ++
3 files changed, 74 insertions(+), 22 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index b3268f4ea5f3..cacba965c535 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -544,7 +544,19 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info,
int level = ref->level;
struct btrfs_key search_key = ref->key_for_search;
- root = btrfs_get_fs_root(fs_info, ref->root_id, false);
+ /*
+ * If we're search_commit_root we could possibly be holding locks on
+ * other tree nodes. This happens when qgroups does backref walks when
+ * adding new delayed refs. To deal with this we need to look in cache
+ * for the root, and if we don't find it then we need to search the
+ * tree_root's commit root, thus the btrfs_get_fs_root_commit_root usage
+ * here.
+ */
+ if (path->search_commit_root)
+ root = btrfs_get_fs_root_commit_root(fs_info, path,
+ ref->root_id);
+ else
+ root = btrfs_get_fs_root(fs_info, ref->root_id, false);
if (IS_ERR(root)) {
ret = PTR_ERR(root);
goto out_free;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 81e7b1880b5b..3972f16b333d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1281,32 +1281,26 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans,
return 0;
}
-struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
- struct btrfs_key *key)
+static struct btrfs_root *read_tree_root_path(struct btrfs_root *tree_root,
+ struct btrfs_path *path,
+ struct btrfs_key *key)
{
struct btrfs_root *root;
struct btrfs_fs_info *fs_info = tree_root->fs_info;
- struct btrfs_path *path;
u64 generation;
int ret;
int level;
- path = btrfs_alloc_path();
- if (!path)
- return ERR_PTR(-ENOMEM);
-
root = btrfs_alloc_root(fs_info, key->objectid, GFP_NOFS);
- if (!root) {
- ret = -ENOMEM;
- goto alloc_fail;
- }
+ if (!root)
+ return ERR_PTR(-ENOMEM);
ret = btrfs_find_root(tree_root, key, path,
&root->root_item, &root->root_key);
if (ret) {
if (ret > 0)
ret = -ENOENT;
- goto find_fail;
+ goto fail;
}
generation = btrfs_root_generation(&root->root_item);
@@ -1317,21 +1311,30 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
if (IS_ERR(root->node)) {
ret = PTR_ERR(root->node);
root->node = NULL;
- goto find_fail;
+ goto fail;
} else if (!btrfs_buffer_uptodate(root->node, generation, 0)) {
ret = -EIO;
- goto find_fail;
+ goto fail;
}
root->commit_root = btrfs_root_node(root);
-out:
- btrfs_free_path(path);
return root;
-
-find_fail:
+fail:
btrfs_put_root(root);
-alloc_fail:
- root = ERR_PTR(ret);
- goto out;
+ return ERR_PTR(ret);
+}
+
+struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
+ struct btrfs_key *key)
+{
+ struct btrfs_root *root;
+ struct btrfs_path *path;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return ERR_PTR(-ENOMEM);
+ root = read_tree_root_path(tree_root, path, key);
+ btrfs_free_path(path);
+ return root;
}
/*
@@ -1621,6 +1624,40 @@ struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
}
+/* btrfs_get_fs_root_commit_root - return a root for the given objectid
+ * @fs_info - the fs_info.
+ * @objectid = the objectid we need to lookup.
+ *
+ * This is exclusively used for backref walking, and exists specifically because
+ * of how qgroups does lookups. Qgroups will do a backref lookup at delayed ref
+ * creation time, which means we may have to read the tree_root in order to look
+ * up a fs root that is not in memory. If the root is not in memory we will
+ * read the tree root commit root and look up the fs root from there. This is a
+ * temporary root, it will not be inserted into the radix tree as it doesn't
+ * have the most uptodate information, it'll simply be discarded once the
+ * backref code is finished using the root.
+ */
+struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
+ struct btrfs_path *path,
+ u64 objectid)
+{
+ struct btrfs_root *root;
+ struct btrfs_key key;
+
+ ASSERT(path->search_commit_root && path->skip_locking);
+
+ root = btrfs_lookup_fs_root(fs_info, objectid);
+ if (root)
+ return root;
+
+ key.objectid = objectid;
+ key.type = BTRFS_ROOT_ITEM_KEY;
+ key.offset = (u64)-1;
+ root = read_tree_root_path(fs_info->tree_root, path, &key);
+ btrfs_release_path(path);
+ return root;
+}
+
/*
* called by the kthread helper functions to finally call the bio end_io
* functions. This is where read checksum verification actually happens
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index fee69ced58b4..182540bdcea0 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -69,6 +69,9 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
u64 objectid, bool check_ref);
struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
u64 objectid, dev_t anon_dev);
+struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
+ struct btrfs_path *path,
+ u64 objectid);
void btrfs_free_fs_info(struct btrfs_fs_info *fs_info);
int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info);
--
2.26.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups
2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
@ 2020-10-20 9:31 ` Filipe Manana
2020-10-21 15:19 ` David Sterba
1 sibling, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2020-10-20 9:31 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Tue, Oct 20, 2020 at 8:37 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> When enabling qgroups we walk the tree_root and then add a qgroup item
> for every root that we have. This creates a lock dependency on the
> tree_root and qgroup_root, which results in the following lockdep splat
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0-default+ #1299 Not tainted
> ------------------------------------------------------
> btrfs/24552 is trying to acquire lock:
> ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>
> but task is already holding lock:
> ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (btrfs-root-00){++++}-{3:3}:
> __lock_acquire+0x3fb/0x730
> lock_acquire.part.0+0x6a/0x130
> down_read_nested+0x46/0x130
> __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
> __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
> btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
> btrfs_search_slot+0xc3/0x9f0 [btrfs]
> btrfs_insert_item+0x6e/0x140 [btrfs]
> btrfs_create_tree+0x1cb/0x240 [btrfs]
> btrfs_quota_enable+0xcd/0x790 [btrfs]
> btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
> __x64_sys_ioctl+0x83/0xa0
> do_syscall_64+0x2d/0x70
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #0 (btrfs-quota-00){++++}-{3:3}:
> check_prev_add+0x91/0xc30
> validate_chain+0x491/0x750
> __lock_acquire+0x3fb/0x730
> lock_acquire.part.0+0x6a/0x130
> down_read_nested+0x46/0x130
> __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
> __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
> btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
> btrfs_search_slot+0xc3/0x9f0 [btrfs]
> btrfs_insert_empty_items+0x58/0xa0 [btrfs]
> add_qgroup_item.part.0+0x72/0x210 [btrfs]
> btrfs_quota_enable+0x3bb/0x790 [btrfs]
> btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
> __x64_sys_ioctl+0x83/0xa0
> do_syscall_64+0x2d/0x70
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(btrfs-root-00);
> lock(btrfs-quota-00);
> lock(btrfs-root-00);
> lock(btrfs-quota-00);
>
> *** DEADLOCK ***
>
> 5 locks held by btrfs/24552:
> #0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0
> #1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs]
> #2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs]
> #3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs]
> #4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>
> stack backtrace:
> CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> Call Trace:
> dump_stack+0x77/0x97
> check_noncircular+0xf3/0x110
> check_prev_add+0x91/0xc30
> validate_chain+0x491/0x750
> __lock_acquire+0x3fb/0x730
> lock_acquire.part.0+0x6a/0x130
> ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
> ? lock_acquire+0xc4/0x140
> ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
> down_read_nested+0x46/0x130
> ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
> __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
> ? btrfs_root_node+0xd9/0x200 [btrfs]
> __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
> btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
> btrfs_search_slot+0xc3/0x9f0 [btrfs]
> btrfs_insert_empty_items+0x58/0xa0 [btrfs]
> add_qgroup_item.part.0+0x72/0x210 [btrfs]
> btrfs_quota_enable+0x3bb/0x790 [btrfs]
> btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
> __x64_sys_ioctl+0x83/0xa0
> do_syscall_64+0x2d/0x70
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fix this by dropping the path whenever we find a root item, add the
> qgroup item, and then re-lookup the root item we found and continue
> processing roots.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Looks good, thanks.
> ---
> fs/btrfs/qgroup.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 580899bdb991..573a555b51d8 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1026,6 +1026,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
> btrfs_item_key_to_cpu(leaf, &found_key, slot);
>
> if (found_key.type == BTRFS_ROOT_REF_KEY) {
> + btrfs_release_path(path);
> +
> ret = add_qgroup_item(trans, quota_root,
> found_key.offset);
> if (ret) {
> @@ -1044,6 +1046,20 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
> btrfs_abort_transaction(trans, ret);
> goto out_free_path;
> }
> + ret = btrfs_search_slot_for_read(tree_root, &found_key,
> + path, 1, 0);
> + if (ret < 0) {
> + btrfs_abort_transaction(trans, ret);
> + goto out_free_path;
> + }
> + if (ret > 0) {
> + /*
> + * Shouldn't happen, but in case it does we
> + * don't need to do the btrfs_next_item, just
> + * continue.
> + */
> + continue;
> + }
> }
> ret = btrfs_next_item(tree_root, path);
> if (ret < 0) {
> --
> 2.26.2
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
@ 2020-10-20 10:10 ` Filipe Manana
2020-10-21 14:05 ` Josef Bacik
2020-10-21 15:51 ` David Sterba
1 sibling, 1 reply; 13+ messages in thread
From: Filipe Manana @ 2020-10-20 10:10 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Tue, Oct 20, 2020 at 8:36 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> I got the following lockdep splat
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0+ #101 Not tainted
> ------------------------------------------------------
> btrfs-cleaner/3445 is trying to acquire lock:
> ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
>
> but task is already holding lock:
> ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
> down_write+0x3d/0x70
> btrfs_cache_block_group+0x2d5/0x510
> find_free_extent+0xb6e/0x12f0
> btrfs_reserve_extent+0xb3/0x1b0
> btrfs_alloc_tree_block+0xb1/0x330
> alloc_tree_block_no_bg_flush+0x4f/0x60
> __btrfs_cow_block+0x11d/0x580
> btrfs_cow_block+0x10c/0x220
> commit_cowonly_roots+0x47/0x2e0
> btrfs_commit_transaction+0x595/0xbd0
> sync_filesystem+0x74/0x90
> generic_shutdown_super+0x22/0x100
> kill_anon_super+0x14/0x30
> btrfs_kill_super+0x12/0x20
> deactivate_locked_super+0x36/0xa0
> cleanup_mnt+0x12d/0x190
> task_work_run+0x5c/0xa0
> exit_to_user_mode_prepare+0x1df/0x200
> syscall_exit_to_user_mode+0x54/0x280
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #1 (&space_info->groups_sem){++++}-{3:3}:
> down_read+0x40/0x130
> find_free_extent+0x2ed/0x12f0
> btrfs_reserve_extent+0xb3/0x1b0
> btrfs_alloc_tree_block+0xb1/0x330
> alloc_tree_block_no_bg_flush+0x4f/0x60
> __btrfs_cow_block+0x11d/0x580
> btrfs_cow_block+0x10c/0x220
> commit_cowonly_roots+0x47/0x2e0
> btrfs_commit_transaction+0x595/0xbd0
> sync_filesystem+0x74/0x90
> generic_shutdown_super+0x22/0x100
> kill_anon_super+0x14/0x30
> btrfs_kill_super+0x12/0x20
> deactivate_locked_super+0x36/0xa0
> cleanup_mnt+0x12d/0x190
> task_work_run+0x5c/0xa0
> exit_to_user_mode_prepare+0x1df/0x200
> syscall_exit_to_user_mode+0x54/0x280
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #0 (btrfs-root-00){++++}-{3:3}:
> __lock_acquire+0x1167/0x2150
> lock_acquire+0xb9/0x3d0
> down_read_nested+0x43/0x130
> __btrfs_tree_read_lock+0x32/0x170
> __btrfs_read_lock_root_node+0x3a/0x50
> btrfs_search_slot+0x614/0x9d0
> btrfs_find_root+0x35/0x1b0
> btrfs_read_tree_root+0x61/0x120
> btrfs_get_root_ref+0x14b/0x600
> find_parent_nodes+0x3e6/0x1b30
> btrfs_find_all_roots_safe+0xb4/0x130
> btrfs_find_all_roots+0x60/0x80
> btrfs_qgroup_trace_extent_post+0x27/0x40
> btrfs_add_delayed_data_ref+0x3fd/0x460
> btrfs_free_extent+0x42/0x100
> __btrfs_mod_ref+0x1d7/0x2f0
> walk_up_proc+0x11c/0x400
> walk_up_tree+0xf0/0x180
> btrfs_drop_snapshot+0x1c7/0x780
> btrfs_clean_one_deleted_snapshot+0xfb/0x110
> cleaner_kthread+0xd4/0x140
> kthread+0x13a/0x150
> ret_from_fork+0x1f/0x30
>
> other info that might help us debug this:
>
> Chain exists of:
> btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&fs_info->commit_root_sem);
> lock(&space_info->groups_sem);
> lock(&fs_info->commit_root_sem);
> lock(btrfs-root-00);
>
> *** DEADLOCK ***
>
> 3 locks held by btrfs-cleaner/3445:
> #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
> #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
> #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> stack backtrace:
> CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
> dump_stack+0x8b/0xb0
> check_noncircular+0xcf/0xf0
> __lock_acquire+0x1167/0x2150
> ? __bfs+0x42/0x210
> lock_acquire+0xb9/0x3d0
> ? __btrfs_tree_read_lock+0x32/0x170
> down_read_nested+0x43/0x130
> ? __btrfs_tree_read_lock+0x32/0x170
> __btrfs_tree_read_lock+0x32/0x170
> __btrfs_read_lock_root_node+0x3a/0x50
> btrfs_search_slot+0x614/0x9d0
> ? find_held_lock+0x2b/0x80
> btrfs_find_root+0x35/0x1b0
> ? do_raw_spin_unlock+0x4b/0xa0
> btrfs_read_tree_root+0x61/0x120
> btrfs_get_root_ref+0x14b/0x600
> find_parent_nodes+0x3e6/0x1b30
> btrfs_find_all_roots_safe+0xb4/0x130
> btrfs_find_all_roots+0x60/0x80
> btrfs_qgroup_trace_extent_post+0x27/0x40
> btrfs_add_delayed_data_ref+0x3fd/0x460
> btrfs_free_extent+0x42/0x100
> __btrfs_mod_ref+0x1d7/0x2f0
> walk_up_proc+0x11c/0x400
> walk_up_tree+0xf0/0x180
> btrfs_drop_snapshot+0x1c7/0x780
> ? btrfs_clean_one_deleted_snapshot+0x73/0x110
> btrfs_clean_one_deleted_snapshot+0xfb/0x110
> cleaner_kthread+0xd4/0x140
> ? btrfs_alloc_root+0x50/0x50
> kthread+0x13a/0x150
> ? kthread_create_worker_on_cpu+0x40/0x40
> ret_from_fork+0x1f/0x30
>
> while testing another lockdep fix. This happens because we're using the
> commit_root_sem to protect fs_info->caching_block_groups, which creates
> a dependency on the groups_sem -> commit_root_sem, which is problematic
> because we will allocate blocks while holding tree roots. Fix this by
> making the list itself protected by the fs_info->block_group_cache_lock.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> fs/btrfs/block-group.c | 12 ++++++------
> fs/btrfs/extent-tree.c | 2 ++
> 2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 3ba6f3839d39..a8240913e6fc 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
> return 0;
> }
>
> - down_write(&fs_info->commit_root_sem);
> + spin_lock(&fs_info->block_group_cache_lock);
> refcount_inc(&caching_ctl->count);
> list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
> - up_write(&fs_info->commit_root_sem);
> + spin_unlock(&fs_info->block_group_cache_lock);
>
> btrfs_get_block_group(cache);
>
> @@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> if (block_group->cached == BTRFS_CACHE_STARTED)
> btrfs_wait_block_group_cache_done(block_group);
> if (block_group->has_caching_ctl) {
> - down_write(&fs_info->commit_root_sem);
> + spin_lock(&fs_info->block_group_cache_lock);
> if (!caching_ctl) {
> struct btrfs_caching_control *ctl;
>
> @@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> }
> if (caching_ctl)
> list_del_init(&caching_ctl->list);
> - up_write(&fs_info->commit_root_sem);
> + spin_unlock(&fs_info->block_group_cache_lock);
> if (caching_ctl) {
> /* Once for the caching bgs list and once for us. */
> btrfs_put_caching_control(caching_ctl);
> @@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
> struct btrfs_caching_control *caching_ctl;
> struct rb_node *n;
>
> - down_write(&info->commit_root_sem);
> + spin_lock(&info->block_group_cache_lock);
> while (!list_empty(&info->caching_block_groups)) {
> caching_ctl = list_entry(info->caching_block_groups.next,
> struct btrfs_caching_control, list);
> list_del(&caching_ctl->list);
> btrfs_put_caching_control(caching_ctl);
> }
> - up_write(&info->commit_root_sem);
> + spin_unlock(&info->block_group_cache_lock);
>
> spin_lock(&info->unused_bgs_lock);
> while (!list_empty(&info->unused_bgs)) {
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 5fd60b13f4f8..ff2b5c132a70 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
>
> down_write(&fs_info->commit_root_sem);
>
> + spin_lock(&fs_info->block_group_cache_lock);
Why are we still doing the down_write on the commit_root_sem? It
doesn't seem necessary anymore (only for switching the commit roots
now).
Thanks.
> list_for_each_entry_safe(caching_ctl, next,
> &fs_info->caching_block_groups, list) {
> cache = caching_ctl->block_group;
> @@ -2749,6 +2750,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
> cache->last_byte_to_unpin = caching_ctl->progress;
> }
> }
> + spin_unlock(&fs_info->block_group_cache_lock);
>
> up_write(&fs_info->commit_root_sem);
>
> --
> 2.26.2
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup
2020-10-19 20:02 ` [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup Josef Bacik
@ 2020-10-20 11:06 ` Filipe Manana
0 siblings, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2020-10-20 11:06 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Tue, Oct 20, 2020 at 8:37 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> I got the following lockdep splat with my rwsem patches on btrfs/104
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0+ #102 Not tainted
> ------------------------------------------------------
> btrfs-cleaner/903 is trying to acquire lock:
> ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
>
> but task is already holding lock:
> ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
> down_read+0x40/0x130
> caching_thread+0x53/0x5a0
> btrfs_work_helper+0xfa/0x520
> process_one_work+0x238/0x540
> worker_thread+0x55/0x3c0
> kthread+0x13a/0x150
> ret_from_fork+0x1f/0x30
>
> -> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
> __mutex_lock+0x7e/0x7b0
> btrfs_cache_block_group+0x1e0/0x510
> find_free_extent+0xb6e/0x12f0
> btrfs_reserve_extent+0xb3/0x1b0
> btrfs_alloc_tree_block+0xb1/0x330
> alloc_tree_block_no_bg_flush+0x4f/0x60
> __btrfs_cow_block+0x11d/0x580
> btrfs_cow_block+0x10c/0x220
> commit_cowonly_roots+0x47/0x2e0
> btrfs_commit_transaction+0x595/0xbd0
> sync_filesystem+0x74/0x90
> generic_shutdown_super+0x22/0x100
> kill_anon_super+0x14/0x30
> btrfs_kill_super+0x12/0x20
> deactivate_locked_super+0x36/0xa0
> cleanup_mnt+0x12d/0x190
> task_work_run+0x5c/0xa0
> exit_to_user_mode_prepare+0x1df/0x200
> syscall_exit_to_user_mode+0x54/0x280
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #1 (&space_info->groups_sem){++++}-{3:3}:
> down_read+0x40/0x130
> find_free_extent+0x2ed/0x12f0
> btrfs_reserve_extent+0xb3/0x1b0
> btrfs_alloc_tree_block+0xb1/0x330
> alloc_tree_block_no_bg_flush+0x4f/0x60
> __btrfs_cow_block+0x11d/0x580
> btrfs_cow_block+0x10c/0x220
> commit_cowonly_roots+0x47/0x2e0
> btrfs_commit_transaction+0x595/0xbd0
> sync_filesystem+0x74/0x90
> generic_shutdown_super+0x22/0x100
> kill_anon_super+0x14/0x30
> btrfs_kill_super+0x12/0x20
> deactivate_locked_super+0x36/0xa0
> cleanup_mnt+0x12d/0x190
> task_work_run+0x5c/0xa0
> exit_to_user_mode_prepare+0x1df/0x200
> syscall_exit_to_user_mode+0x54/0x280
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #0 (btrfs-root-00){++++}-{3:3}:
> __lock_acquire+0x1167/0x2150
> lock_acquire+0xb9/0x3d0
> down_read_nested+0x43/0x130
> __btrfs_tree_read_lock+0x32/0x170
> __btrfs_read_lock_root_node+0x3a/0x50
> btrfs_search_slot+0x614/0x9d0
> btrfs_find_root+0x35/0x1b0
> btrfs_read_tree_root+0x61/0x120
> btrfs_get_root_ref+0x14b/0x600
> find_parent_nodes+0x3e6/0x1b30
> btrfs_find_all_roots_safe+0xb4/0x130
> btrfs_find_all_roots+0x60/0x80
> btrfs_qgroup_trace_extent_post+0x27/0x40
> btrfs_add_delayed_data_ref+0x3fd/0x460
> btrfs_free_extent+0x42/0x100
> __btrfs_mod_ref+0x1d7/0x2f0
> walk_up_proc+0x11c/0x400
> walk_up_tree+0xf0/0x180
> btrfs_drop_snapshot+0x1c7/0x780
> btrfs_clean_one_deleted_snapshot+0xfb/0x110
> cleaner_kthread+0xd4/0x140
> kthread+0x13a/0x150
> ret_from_fork+0x1f/0x30
>
> other info that might help us debug this:
>
> Chain exists of:
> btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&fs_info->commit_root_sem);
> lock(&caching_ctl->mutex);
> lock(&fs_info->commit_root_sem);
> lock(btrfs-root-00);
>
> *** DEADLOCK ***
>
> 3 locks held by btrfs-cleaner/903:
> #0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
> #1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
> #2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> stack backtrace:
> CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
> dump_stack+0x8b/0xb0
> check_noncircular+0xcf/0xf0
> __lock_acquire+0x1167/0x2150
> ? __bfs+0x42/0x210
> lock_acquire+0xb9/0x3d0
> ? __btrfs_tree_read_lock+0x32/0x170
> down_read_nested+0x43/0x130
> ? __btrfs_tree_read_lock+0x32/0x170
> __btrfs_tree_read_lock+0x32/0x170
> __btrfs_read_lock_root_node+0x3a/0x50
> btrfs_search_slot+0x614/0x9d0
> ? find_held_lock+0x2b/0x80
> btrfs_find_root+0x35/0x1b0
> ? do_raw_spin_unlock+0x4b/0xa0
> btrfs_read_tree_root+0x61/0x120
> btrfs_get_root_ref+0x14b/0x600
> find_parent_nodes+0x3e6/0x1b30
> btrfs_find_all_roots_safe+0xb4/0x130
> btrfs_find_all_roots+0x60/0x80
> btrfs_qgroup_trace_extent_post+0x27/0x40
> btrfs_add_delayed_data_ref+0x3fd/0x460
> btrfs_free_extent+0x42/0x100
> __btrfs_mod_ref+0x1d7/0x2f0
> walk_up_proc+0x11c/0x400
> walk_up_tree+0xf0/0x180
> btrfs_drop_snapshot+0x1c7/0x780
> ? btrfs_clean_one_deleted_snapshot+0x73/0x110
> btrfs_clean_one_deleted_snapshot+0xfb/0x110
> cleaner_kthread+0xd4/0x140
> ? btrfs_alloc_root+0x50/0x50
> kthread+0x13a/0x150
> ? kthread_create_worker_on_cpu+0x40/0x40
> ret_from_fork+0x1f/0x30
> BTRFS info (device sdb): disk space caching is enabled
> BTRFS info (device sdb): has skinny extents
>
> This happens because qgroups does a backref lookup when we create a
> delayed ref. From here it may have to look up a root from an indirect
> ref, which does a normal lookup on the tree_root, which takes the read
> lock on the tree_root nodes.
>
> To fix this we need to add a variant for looking up roots that searches
> the commit root of the tree_root. Then when we do the backref search
> using the commit root we are sure to not take any locks on the tree_root
> nodes. This gets rid of the lockdep splat when running btrfs/104.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Looks good, only the first comment line at the top of the new function
btrfs_get_fs_root_commit_root() doesn't follow the preferred style.
Other than that, it's fine as far as I can see.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Thanks.
> ---
> fs/btrfs/backref.c | 14 +++++++-
> fs/btrfs/disk-io.c | 79 ++++++++++++++++++++++++++++++++++------------
> fs/btrfs/disk-io.h | 3 ++
> 3 files changed, 74 insertions(+), 22 deletions(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index b3268f4ea5f3..cacba965c535 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -544,7 +544,19 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info,
> int level = ref->level;
> struct btrfs_key search_key = ref->key_for_search;
>
> - root = btrfs_get_fs_root(fs_info, ref->root_id, false);
> + /*
> + * If we're search_commit_root we could possibly be holding locks on
> + * other tree nodes. This happens when qgroups does backref walks when
> + * adding new delayed refs. To deal with this we need to look in cache
> + * for the root, and if we don't find it then we need to search the
> + * tree_root's commit root, thus the btrfs_get_fs_root_commit_root usage
> + * here.
> + */
> + if (path->search_commit_root)
> + root = btrfs_get_fs_root_commit_root(fs_info, path,
> + ref->root_id);
> + else
> + root = btrfs_get_fs_root(fs_info, ref->root_id, false);
> if (IS_ERR(root)) {
> ret = PTR_ERR(root);
> goto out_free;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 81e7b1880b5b..3972f16b333d 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1281,32 +1281,26 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans,
> return 0;
> }
>
> -struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
> - struct btrfs_key *key)
> +static struct btrfs_root *read_tree_root_path(struct btrfs_root *tree_root,
> + struct btrfs_path *path,
> + struct btrfs_key *key)
> {
> struct btrfs_root *root;
> struct btrfs_fs_info *fs_info = tree_root->fs_info;
> - struct btrfs_path *path;
> u64 generation;
> int ret;
> int level;
>
> - path = btrfs_alloc_path();
> - if (!path)
> - return ERR_PTR(-ENOMEM);
> -
> root = btrfs_alloc_root(fs_info, key->objectid, GFP_NOFS);
> - if (!root) {
> - ret = -ENOMEM;
> - goto alloc_fail;
> - }
> + if (!root)
> + return ERR_PTR(-ENOMEM);
>
> ret = btrfs_find_root(tree_root, key, path,
> &root->root_item, &root->root_key);
> if (ret) {
> if (ret > 0)
> ret = -ENOENT;
> - goto find_fail;
> + goto fail;
> }
>
> generation = btrfs_root_generation(&root->root_item);
> @@ -1317,21 +1311,30 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
> if (IS_ERR(root->node)) {
> ret = PTR_ERR(root->node);
> root->node = NULL;
> - goto find_fail;
> + goto fail;
> } else if (!btrfs_buffer_uptodate(root->node, generation, 0)) {
> ret = -EIO;
> - goto find_fail;
> + goto fail;
> }
> root->commit_root = btrfs_root_node(root);
> -out:
> - btrfs_free_path(path);
> return root;
> -
> -find_fail:
> +fail:
> btrfs_put_root(root);
> -alloc_fail:
> - root = ERR_PTR(ret);
> - goto out;
> + return ERR_PTR(ret);
> +}
> +
> +struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
> + struct btrfs_key *key)
> +{
> + struct btrfs_root *root;
> + struct btrfs_path *path;
> +
> + path = btrfs_alloc_path();
> + if (!path)
> + return ERR_PTR(-ENOMEM);
> + root = read_tree_root_path(tree_root, path, key);
> + btrfs_free_path(path);
> + return root;
> }
>
> /*
> @@ -1621,6 +1624,40 @@ struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
> return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
> }
>
> +/* btrfs_get_fs_root_commit_root - return a root for the given objectid
> + * @fs_info - the fs_info.
> + * @objectid = the objectid we need to lookup.
> + *
> + * This is exclusively used for backref walking, and exists specifically because
> + * of how qgroups does lookups. Qgroups will do a backref lookup at delayed ref
> + * creation time, which means we may have to read the tree_root in order to look
> + * up a fs root that is not in memory. If the root is not in memory we will
> + * read the tree root commit root and look up the fs root from there. This is a
> + * temporary root, it will not be inserted into the radix tree as it doesn't
> + * have the most uptodate information, it'll simply be discarded once the
> + * backref code is finished using the root.
> + */
> +struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
> + struct btrfs_path *path,
> + u64 objectid)
> +{
> + struct btrfs_root *root;
> + struct btrfs_key key;
> +
> + ASSERT(path->search_commit_root && path->skip_locking);
> +
> + root = btrfs_lookup_fs_root(fs_info, objectid);
> + if (root)
> + return root;
> +
> + key.objectid = objectid;
> + key.type = BTRFS_ROOT_ITEM_KEY;
> + key.offset = (u64)-1;
> + root = read_tree_root_path(fs_info->tree_root, path, &key);
> + btrfs_release_path(path);
> + return root;
> +}
> +
> /*
> * called by the kthread helper functions to finally call the bio end_io
> * functions. This is where read checksum verification actually happens
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index fee69ced58b4..182540bdcea0 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -69,6 +69,9 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
> u64 objectid, bool check_ref);
> struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
> u64 objectid, dev_t anon_dev);
> +struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
> + struct btrfs_path *path,
> + u64 objectid);
>
> void btrfs_free_fs_info(struct btrfs_fs_info *fs_info);
> int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info);
> --
> 2.26.2
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
2020-10-20 10:10 ` Filipe Manana
@ 2020-10-21 14:05 ` Josef Bacik
2020-10-21 16:36 ` Filipe Manana
0 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2020-10-21 14:05 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs, kernel-team
On 10/20/20 6:10 AM, Filipe Manana wrote:
> On Tue, Oct 20, 2020 at 8:36 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> I got the following lockdep splat
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.9.0+ #101 Not tainted
>> ------------------------------------------------------
>> btrfs-cleaner/3445 is trying to acquire lock:
>> ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
>>
>> but task is already holding lock:
>> ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
>> down_write+0x3d/0x70
>> btrfs_cache_block_group+0x2d5/0x510
>> find_free_extent+0xb6e/0x12f0
>> btrfs_reserve_extent+0xb3/0x1b0
>> btrfs_alloc_tree_block+0xb1/0x330
>> alloc_tree_block_no_bg_flush+0x4f/0x60
>> __btrfs_cow_block+0x11d/0x580
>> btrfs_cow_block+0x10c/0x220
>> commit_cowonly_roots+0x47/0x2e0
>> btrfs_commit_transaction+0x595/0xbd0
>> sync_filesystem+0x74/0x90
>> generic_shutdown_super+0x22/0x100
>> kill_anon_super+0x14/0x30
>> btrfs_kill_super+0x12/0x20
>> deactivate_locked_super+0x36/0xa0
>> cleanup_mnt+0x12d/0x190
>> task_work_run+0x5c/0xa0
>> exit_to_user_mode_prepare+0x1df/0x200
>> syscall_exit_to_user_mode+0x54/0x280
>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> -> #1 (&space_info->groups_sem){++++}-{3:3}:
>> down_read+0x40/0x130
>> find_free_extent+0x2ed/0x12f0
>> btrfs_reserve_extent+0xb3/0x1b0
>> btrfs_alloc_tree_block+0xb1/0x330
>> alloc_tree_block_no_bg_flush+0x4f/0x60
>> __btrfs_cow_block+0x11d/0x580
>> btrfs_cow_block+0x10c/0x220
>> commit_cowonly_roots+0x47/0x2e0
>> btrfs_commit_transaction+0x595/0xbd0
>> sync_filesystem+0x74/0x90
>> generic_shutdown_super+0x22/0x100
>> kill_anon_super+0x14/0x30
>> btrfs_kill_super+0x12/0x20
>> deactivate_locked_super+0x36/0xa0
>> cleanup_mnt+0x12d/0x190
>> task_work_run+0x5c/0xa0
>> exit_to_user_mode_prepare+0x1df/0x200
>> syscall_exit_to_user_mode+0x54/0x280
>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> -> #0 (btrfs-root-00){++++}-{3:3}:
>> __lock_acquire+0x1167/0x2150
>> lock_acquire+0xb9/0x3d0
>> down_read_nested+0x43/0x130
>> __btrfs_tree_read_lock+0x32/0x170
>> __btrfs_read_lock_root_node+0x3a/0x50
>> btrfs_search_slot+0x614/0x9d0
>> btrfs_find_root+0x35/0x1b0
>> btrfs_read_tree_root+0x61/0x120
>> btrfs_get_root_ref+0x14b/0x600
>> find_parent_nodes+0x3e6/0x1b30
>> btrfs_find_all_roots_safe+0xb4/0x130
>> btrfs_find_all_roots+0x60/0x80
>> btrfs_qgroup_trace_extent_post+0x27/0x40
>> btrfs_add_delayed_data_ref+0x3fd/0x460
>> btrfs_free_extent+0x42/0x100
>> __btrfs_mod_ref+0x1d7/0x2f0
>> walk_up_proc+0x11c/0x400
>> walk_up_tree+0xf0/0x180
>> btrfs_drop_snapshot+0x1c7/0x780
>> btrfs_clean_one_deleted_snapshot+0xfb/0x110
>> cleaner_kthread+0xd4/0x140
>> kthread+0x13a/0x150
>> ret_from_fork+0x1f/0x30
>>
>> other info that might help us debug this:
>>
>> Chain exists of:
>> btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
>>
>> Possible unsafe locking scenario:
>>
>> CPU0 CPU1
>> ---- ----
>> lock(&fs_info->commit_root_sem);
>> lock(&space_info->groups_sem);
>> lock(&fs_info->commit_root_sem);
>> lock(btrfs-root-00);
>>
>> *** DEADLOCK ***
>>
>> 3 locks held by btrfs-cleaner/3445:
>> #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
>> #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
>> #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>>
>> stack backtrace:
>> CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
>> Call Trace:
>> dump_stack+0x8b/0xb0
>> check_noncircular+0xcf/0xf0
>> __lock_acquire+0x1167/0x2150
>> ? __bfs+0x42/0x210
>> lock_acquire+0xb9/0x3d0
>> ? __btrfs_tree_read_lock+0x32/0x170
>> down_read_nested+0x43/0x130
>> ? __btrfs_tree_read_lock+0x32/0x170
>> __btrfs_tree_read_lock+0x32/0x170
>> __btrfs_read_lock_root_node+0x3a/0x50
>> btrfs_search_slot+0x614/0x9d0
>> ? find_held_lock+0x2b/0x80
>> btrfs_find_root+0x35/0x1b0
>> ? do_raw_spin_unlock+0x4b/0xa0
>> btrfs_read_tree_root+0x61/0x120
>> btrfs_get_root_ref+0x14b/0x600
>> find_parent_nodes+0x3e6/0x1b30
>> btrfs_find_all_roots_safe+0xb4/0x130
>> btrfs_find_all_roots+0x60/0x80
>> btrfs_qgroup_trace_extent_post+0x27/0x40
>> btrfs_add_delayed_data_ref+0x3fd/0x460
>> btrfs_free_extent+0x42/0x100
>> __btrfs_mod_ref+0x1d7/0x2f0
>> walk_up_proc+0x11c/0x400
>> walk_up_tree+0xf0/0x180
>> btrfs_drop_snapshot+0x1c7/0x780
>> ? btrfs_clean_one_deleted_snapshot+0x73/0x110
>> btrfs_clean_one_deleted_snapshot+0xfb/0x110
>> cleaner_kthread+0xd4/0x140
>> ? btrfs_alloc_root+0x50/0x50
>> kthread+0x13a/0x150
>> ? kthread_create_worker_on_cpu+0x40/0x40
>> ret_from_fork+0x1f/0x30
>>
>> while testing another lockdep fix. This happens because we're using the
>> commit_root_sem to protect fs_info->caching_block_groups, which creates
>> a dependency on the groups_sem -> commit_root_sem, which is problematic
>> because we will allocate blocks while holding tree roots. Fix this by
>> making the list itself protected by the fs_info->block_group_cache_lock.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>> fs/btrfs/block-group.c | 12 ++++++------
>> fs/btrfs/extent-tree.c | 2 ++
>> 2 files changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>> index 3ba6f3839d39..a8240913e6fc 100644
>> --- a/fs/btrfs/block-group.c
>> +++ b/fs/btrfs/block-group.c
>> @@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>> return 0;
>> }
>>
>> - down_write(&fs_info->commit_root_sem);
>> + spin_lock(&fs_info->block_group_cache_lock);
>> refcount_inc(&caching_ctl->count);
>> list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
>> - up_write(&fs_info->commit_root_sem);
>> + spin_unlock(&fs_info->block_group_cache_lock);
>>
>> btrfs_get_block_group(cache);
>>
>> @@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> if (block_group->cached == BTRFS_CACHE_STARTED)
>> btrfs_wait_block_group_cache_done(block_group);
>> if (block_group->has_caching_ctl) {
>> - down_write(&fs_info->commit_root_sem);
>> + spin_lock(&fs_info->block_group_cache_lock);
>> if (!caching_ctl) {
>> struct btrfs_caching_control *ctl;
>>
>> @@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>> }
>> if (caching_ctl)
>> list_del_init(&caching_ctl->list);
>> - up_write(&fs_info->commit_root_sem);
>> + spin_unlock(&fs_info->block_group_cache_lock);
>> if (caching_ctl) {
>> /* Once for the caching bgs list and once for us. */
>> btrfs_put_caching_control(caching_ctl);
>> @@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>> struct btrfs_caching_control *caching_ctl;
>> struct rb_node *n;
>>
>> - down_write(&info->commit_root_sem);
>> + spin_lock(&info->block_group_cache_lock);
>> while (!list_empty(&info->caching_block_groups)) {
>> caching_ctl = list_entry(info->caching_block_groups.next,
>> struct btrfs_caching_control, list);
>> list_del(&caching_ctl->list);
>> btrfs_put_caching_control(caching_ctl);
>> }
>> - up_write(&info->commit_root_sem);
>> + spin_unlock(&info->block_group_cache_lock);
>>
>> spin_lock(&info->unused_bgs_lock);
>> while (!list_empty(&info->unused_bgs)) {
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 5fd60b13f4f8..ff2b5c132a70 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
>>
>> down_write(&fs_info->commit_root_sem);
>>
>> + spin_lock(&fs_info->block_group_cache_lock);
>
> Why are we still doing the down_write on the commit_root_sem? It
> doesn't seem necessary anymore (only for switching the commit roots
> now).
Because the commit_root_sem is doing something else here, it's protecting the
->last_byte_to_unpin field. We have to coordinate the end of the transaction
with the caching threads, but the list itself just needs to be protected with a
spin_lock. Thanks,
Josef
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups
2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
2020-10-20 9:31 ` Filipe Manana
@ 2020-10-21 15:19 ` David Sterba
1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2020-10-21 15:19 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Mon, Oct 19, 2020 at 04:02:29PM -0400, Josef Bacik wrote:
> When enabling qgroups we walk the tree_root and then add a qgroup item
> for every root that we have. This creates a lock dependency on the
> tree_root and qgroup_root, which results in the following lockdep splat
This should mention that it's with the rwsem and that we've seen it on
btrfs/022 and btrfs/017.
That it's with rwsem is in the cover letter but we need it in the
patches too.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
2020-10-20 10:10 ` Filipe Manana
@ 2020-10-21 15:51 ` David Sterba
1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2020-10-21 15:51 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Mon, Oct 19, 2020 at 04:02:30PM -0400, Josef Bacik wrote:
> while testing another lockdep fix. This happens because we're using the
> commit_root_sem to protect fs_info->caching_block_groups, which creates
> a dependency on the groups_sem -> commit_root_sem, which is problematic
> because we will allocate blocks while holding tree roots. Fix this by
> making the list itself protected by the fs_info->block_group_cache_lock.
That's lacking explanation why this is supposed to be correct. Replacing
semaphore by a spin lock has implications and given that he commit root
sem does not protect a single structure or is sometimes used for context
exclusion, this requires more in the changelog and updated perhaps
comments for the locks.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/3] Lockdep fixes for misc-next
2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
` (2 preceding siblings ...)
2020-10-19 20:02 ` [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup Josef Bacik
@ 2020-10-21 16:13 ` David Sterba
3 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2020-10-21 16:13 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Mon, Oct 19, 2020 at 04:02:28PM -0400, Josef Bacik wrote:
> Hello,
>
> Here's a few lockdep fixes for misc-next+my rwsem patch. Nothing too crazy, but
> the last one is a little wonkey because qgroups does a backref resolution while
> adding delayed refs. Generally we do the right thing with searching the commit
> roots and skipping the locking, with the exception of looking up fs roots if we
> have to resolve indirect refs. This obviously uses the normal lookup and
> locking stuff, which is problematic in the new world order. For now I'm fixing
> it with a special helper for backref lookups that either finds the root in
> cache, or generates a temporary root that's not inserted into the fs roots radix
> tree and is only used to do the backref resolution. Thanks,
>
> Josef
>
> Josef Bacik (3):
> btrfs: drop the path before adding qgroup items when enabling qgroups
> btrfs: protect the fs_info->caching_block_groups differently
> btrfs: add a helper to read the tree_root commit root for backref
> lookup
I'll add 1 and 3 to misc-next for now.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
2020-10-21 14:05 ` Josef Bacik
@ 2020-10-21 16:36 ` Filipe Manana
2020-10-21 18:44 ` Josef Bacik
0 siblings, 1 reply; 13+ messages in thread
From: Filipe Manana @ 2020-10-21 16:36 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Wed, Oct 21, 2020 at 3:05 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 10/20/20 6:10 AM, Filipe Manana wrote:
> > On Tue, Oct 20, 2020 at 8:36 AM Josef Bacik <josef@toxicpanda.com> wrote:
> >>
> >> I got the following lockdep splat
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 5.9.0+ #101 Not tainted
> >> ------------------------------------------------------
> >> btrfs-cleaner/3445 is trying to acquire lock:
> >> ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
> >>
> >> but task is already holding lock:
> >> ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
> >>
> >> which lock already depends on the new lock.
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
> >> down_write+0x3d/0x70
> >> btrfs_cache_block_group+0x2d5/0x510
> >> find_free_extent+0xb6e/0x12f0
> >> btrfs_reserve_extent+0xb3/0x1b0
> >> btrfs_alloc_tree_block+0xb1/0x330
> >> alloc_tree_block_no_bg_flush+0x4f/0x60
> >> __btrfs_cow_block+0x11d/0x580
> >> btrfs_cow_block+0x10c/0x220
> >> commit_cowonly_roots+0x47/0x2e0
> >> btrfs_commit_transaction+0x595/0xbd0
> >> sync_filesystem+0x74/0x90
> >> generic_shutdown_super+0x22/0x100
> >> kill_anon_super+0x14/0x30
> >> btrfs_kill_super+0x12/0x20
> >> deactivate_locked_super+0x36/0xa0
> >> cleanup_mnt+0x12d/0x190
> >> task_work_run+0x5c/0xa0
> >> exit_to_user_mode_prepare+0x1df/0x200
> >> syscall_exit_to_user_mode+0x54/0x280
> >> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>
> >> -> #1 (&space_info->groups_sem){++++}-{3:3}:
> >> down_read+0x40/0x130
> >> find_free_extent+0x2ed/0x12f0
> >> btrfs_reserve_extent+0xb3/0x1b0
> >> btrfs_alloc_tree_block+0xb1/0x330
> >> alloc_tree_block_no_bg_flush+0x4f/0x60
> >> __btrfs_cow_block+0x11d/0x580
> >> btrfs_cow_block+0x10c/0x220
> >> commit_cowonly_roots+0x47/0x2e0
> >> btrfs_commit_transaction+0x595/0xbd0
> >> sync_filesystem+0x74/0x90
> >> generic_shutdown_super+0x22/0x100
> >> kill_anon_super+0x14/0x30
> >> btrfs_kill_super+0x12/0x20
> >> deactivate_locked_super+0x36/0xa0
> >> cleanup_mnt+0x12d/0x190
> >> task_work_run+0x5c/0xa0
> >> exit_to_user_mode_prepare+0x1df/0x200
> >> syscall_exit_to_user_mode+0x54/0x280
> >> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>
> >> -> #0 (btrfs-root-00){++++}-{3:3}:
> >> __lock_acquire+0x1167/0x2150
> >> lock_acquire+0xb9/0x3d0
> >> down_read_nested+0x43/0x130
> >> __btrfs_tree_read_lock+0x32/0x170
> >> __btrfs_read_lock_root_node+0x3a/0x50
> >> btrfs_search_slot+0x614/0x9d0
> >> btrfs_find_root+0x35/0x1b0
> >> btrfs_read_tree_root+0x61/0x120
> >> btrfs_get_root_ref+0x14b/0x600
> >> find_parent_nodes+0x3e6/0x1b30
> >> btrfs_find_all_roots_safe+0xb4/0x130
> >> btrfs_find_all_roots+0x60/0x80
> >> btrfs_qgroup_trace_extent_post+0x27/0x40
> >> btrfs_add_delayed_data_ref+0x3fd/0x460
> >> btrfs_free_extent+0x42/0x100
> >> __btrfs_mod_ref+0x1d7/0x2f0
> >> walk_up_proc+0x11c/0x400
> >> walk_up_tree+0xf0/0x180
> >> btrfs_drop_snapshot+0x1c7/0x780
> >> btrfs_clean_one_deleted_snapshot+0xfb/0x110
> >> cleaner_kthread+0xd4/0x140
> >> kthread+0x13a/0x150
> >> ret_from_fork+0x1f/0x30
> >>
> >> other info that might help us debug this:
> >>
> >> Chain exists of:
> >> btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
> >>
> >> Possible unsafe locking scenario:
> >>
> >> CPU0 CPU1
> >> ---- ----
> >> lock(&fs_info->commit_root_sem);
> >> lock(&space_info->groups_sem);
> >> lock(&fs_info->commit_root_sem);
> >> lock(btrfs-root-00);
> >>
> >> *** DEADLOCK ***
> >>
> >> 3 locks held by btrfs-cleaner/3445:
> >> #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
> >> #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
> >> #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
> >>
> >> stack backtrace:
> >> CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
> >> Call Trace:
> >> dump_stack+0x8b/0xb0
> >> check_noncircular+0xcf/0xf0
> >> __lock_acquire+0x1167/0x2150
> >> ? __bfs+0x42/0x210
> >> lock_acquire+0xb9/0x3d0
> >> ? __btrfs_tree_read_lock+0x32/0x170
> >> down_read_nested+0x43/0x130
> >> ? __btrfs_tree_read_lock+0x32/0x170
> >> __btrfs_tree_read_lock+0x32/0x170
> >> __btrfs_read_lock_root_node+0x3a/0x50
> >> btrfs_search_slot+0x614/0x9d0
> >> ? find_held_lock+0x2b/0x80
> >> btrfs_find_root+0x35/0x1b0
> >> ? do_raw_spin_unlock+0x4b/0xa0
> >> btrfs_read_tree_root+0x61/0x120
> >> btrfs_get_root_ref+0x14b/0x600
> >> find_parent_nodes+0x3e6/0x1b30
> >> btrfs_find_all_roots_safe+0xb4/0x130
> >> btrfs_find_all_roots+0x60/0x80
> >> btrfs_qgroup_trace_extent_post+0x27/0x40
> >> btrfs_add_delayed_data_ref+0x3fd/0x460
> >> btrfs_free_extent+0x42/0x100
> >> __btrfs_mod_ref+0x1d7/0x2f0
> >> walk_up_proc+0x11c/0x400
> >> walk_up_tree+0xf0/0x180
> >> btrfs_drop_snapshot+0x1c7/0x780
> >> ? btrfs_clean_one_deleted_snapshot+0x73/0x110
> >> btrfs_clean_one_deleted_snapshot+0xfb/0x110
> >> cleaner_kthread+0xd4/0x140
> >> ? btrfs_alloc_root+0x50/0x50
> >> kthread+0x13a/0x150
> >> ? kthread_create_worker_on_cpu+0x40/0x40
> >> ret_from_fork+0x1f/0x30
> >>
> >> while testing another lockdep fix. This happens because we're using the
> >> commit_root_sem to protect fs_info->caching_block_groups, which creates
> >> a dependency on the groups_sem -> commit_root_sem, which is problematic
> >> because we will allocate blocks while holding tree roots. Fix this by
> >> making the list itself protected by the fs_info->block_group_cache_lock.
> >>
> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> >> ---
> >> fs/btrfs/block-group.c | 12 ++++++------
> >> fs/btrfs/extent-tree.c | 2 ++
> >> 2 files changed, 8 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> >> index 3ba6f3839d39..a8240913e6fc 100644
> >> --- a/fs/btrfs/block-group.c
> >> +++ b/fs/btrfs/block-group.c
> >> @@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
> >> return 0;
> >> }
> >>
> >> - down_write(&fs_info->commit_root_sem);
> >> + spin_lock(&fs_info->block_group_cache_lock);
> >> refcount_inc(&caching_ctl->count);
> >> list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
> >> - up_write(&fs_info->commit_root_sem);
> >> + spin_unlock(&fs_info->block_group_cache_lock);
> >>
> >> btrfs_get_block_group(cache);
> >>
> >> @@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> >> if (block_group->cached == BTRFS_CACHE_STARTED)
> >> btrfs_wait_block_group_cache_done(block_group);
> >> if (block_group->has_caching_ctl) {
> >> - down_write(&fs_info->commit_root_sem);
> >> + spin_lock(&fs_info->block_group_cache_lock);
> >> if (!caching_ctl) {
> >> struct btrfs_caching_control *ctl;
> >>
> >> @@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> >> }
> >> if (caching_ctl)
> >> list_del_init(&caching_ctl->list);
> >> - up_write(&fs_info->commit_root_sem);
> >> + spin_unlock(&fs_info->block_group_cache_lock);
> >> if (caching_ctl) {
> >> /* Once for the caching bgs list and once for us. */
> >> btrfs_put_caching_control(caching_ctl);
> >> @@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
> >> struct btrfs_caching_control *caching_ctl;
> >> struct rb_node *n;
> >>
> >> - down_write(&info->commit_root_sem);
> >> + spin_lock(&info->block_group_cache_lock);
> >> while (!list_empty(&info->caching_block_groups)) {
> >> caching_ctl = list_entry(info->caching_block_groups.next,
> >> struct btrfs_caching_control, list);
> >> list_del(&caching_ctl->list);
> >> btrfs_put_caching_control(caching_ctl);
> >> }
> >> - up_write(&info->commit_root_sem);
> >> + spin_unlock(&info->block_group_cache_lock);
> >>
> >> spin_lock(&info->unused_bgs_lock);
> >> while (!list_empty(&info->unused_bgs)) {
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index 5fd60b13f4f8..ff2b5c132a70 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
> >>
> >> down_write(&fs_info->commit_root_sem);
> >>
> >> + spin_lock(&fs_info->block_group_cache_lock);
> >
> > Why are we still doing the down_write on the commit_root_sem? It
> > doesn't seem necessary anymore (only for switching the commit roots
> > now).
>
> Because the commit_root_sem is doing something else here, it's protecting the
> ->last_byte_to_unpin field. We have to coordinate the end of the transaction
> with the caching threads, but the list itself just needs to be protected with a
> spin_lock. Thanks,
It's not clear what protects last_byte_to_unpin.
We have several places where we read or write to it without locking
commit_root_sem.
unpin_extent_range() reads it, called from finish_extent_commit(), but
doesn't take the lock - even though it's very unlikely to have races
in this case.
read_on_block_group() also reads it without commit_root_sem, however
this is only called during mount time, so it should be safe.
btrfs_cache_block_group() sets it without holding commit_root_sem, but
while holding the block group's spinlock.
It seems more intuitive to use the block group's spinlock to protect it.
And having a comment on top of the declaration of 'last_byte_to_unpin'
mentioning which lock protects it would be nice to have, or a comment
on top of the lock mentioning what it protects (or both), even if that
isn't strictly related to this patch.
We're highly inconsistent writing to and reading from
block_group->last_byte_to_unpin.
Why can't we just always use the block group's spinlock?
Anyway, it's not like there's something wrong that this patch is
introducing, so:
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Thanks.
>
> Josef
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
2020-10-21 16:36 ` Filipe Manana
@ 2020-10-21 18:44 ` Josef Bacik
0 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2020-10-21 18:44 UTC (permalink / raw)
To: fdmanana; +Cc: linux-btrfs, kernel-team
On 10/21/20 12:36 PM, Filipe Manana wrote:
> On Wed, Oct 21, 2020 at 3:05 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> On 10/20/20 6:10 AM, Filipe Manana wrote:
>>> On Tue, Oct 20, 2020 at 8:36 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>
>>>> I got the following lockdep splat
>>>>
>>>> ======================================================
>>>> WARNING: possible circular locking dependency detected
>>>> 5.9.0+ #101 Not tainted
>>>> ------------------------------------------------------
>>>> btrfs-cleaner/3445 is trying to acquire lock:
>>>> ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
>>>>
>>>> but task is already holding lock:
>>>> ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>>>>
>>>> which lock already depends on the new lock.
>>>>
>>>> the existing dependency chain (in reverse order) is:
>>>>
>>>> -> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
>>>> down_write+0x3d/0x70
>>>> btrfs_cache_block_group+0x2d5/0x510
>>>> find_free_extent+0xb6e/0x12f0
>>>> btrfs_reserve_extent+0xb3/0x1b0
>>>> btrfs_alloc_tree_block+0xb1/0x330
>>>> alloc_tree_block_no_bg_flush+0x4f/0x60
>>>> __btrfs_cow_block+0x11d/0x580
>>>> btrfs_cow_block+0x10c/0x220
>>>> commit_cowonly_roots+0x47/0x2e0
>>>> btrfs_commit_transaction+0x595/0xbd0
>>>> sync_filesystem+0x74/0x90
>>>> generic_shutdown_super+0x22/0x100
>>>> kill_anon_super+0x14/0x30
>>>> btrfs_kill_super+0x12/0x20
>>>> deactivate_locked_super+0x36/0xa0
>>>> cleanup_mnt+0x12d/0x190
>>>> task_work_run+0x5c/0xa0
>>>> exit_to_user_mode_prepare+0x1df/0x200
>>>> syscall_exit_to_user_mode+0x54/0x280
>>>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>
>>>> -> #1 (&space_info->groups_sem){++++}-{3:3}:
>>>> down_read+0x40/0x130
>>>> find_free_extent+0x2ed/0x12f0
>>>> btrfs_reserve_extent+0xb3/0x1b0
>>>> btrfs_alloc_tree_block+0xb1/0x330
>>>> alloc_tree_block_no_bg_flush+0x4f/0x60
>>>> __btrfs_cow_block+0x11d/0x580
>>>> btrfs_cow_block+0x10c/0x220
>>>> commit_cowonly_roots+0x47/0x2e0
>>>> btrfs_commit_transaction+0x595/0xbd0
>>>> sync_filesystem+0x74/0x90
>>>> generic_shutdown_super+0x22/0x100
>>>> kill_anon_super+0x14/0x30
>>>> btrfs_kill_super+0x12/0x20
>>>> deactivate_locked_super+0x36/0xa0
>>>> cleanup_mnt+0x12d/0x190
>>>> task_work_run+0x5c/0xa0
>>>> exit_to_user_mode_prepare+0x1df/0x200
>>>> syscall_exit_to_user_mode+0x54/0x280
>>>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>
>>>> -> #0 (btrfs-root-00){++++}-{3:3}:
>>>> __lock_acquire+0x1167/0x2150
>>>> lock_acquire+0xb9/0x3d0
>>>> down_read_nested+0x43/0x130
>>>> __btrfs_tree_read_lock+0x32/0x170
>>>> __btrfs_read_lock_root_node+0x3a/0x50
>>>> btrfs_search_slot+0x614/0x9d0
>>>> btrfs_find_root+0x35/0x1b0
>>>> btrfs_read_tree_root+0x61/0x120
>>>> btrfs_get_root_ref+0x14b/0x600
>>>> find_parent_nodes+0x3e6/0x1b30
>>>> btrfs_find_all_roots_safe+0xb4/0x130
>>>> btrfs_find_all_roots+0x60/0x80
>>>> btrfs_qgroup_trace_extent_post+0x27/0x40
>>>> btrfs_add_delayed_data_ref+0x3fd/0x460
>>>> btrfs_free_extent+0x42/0x100
>>>> __btrfs_mod_ref+0x1d7/0x2f0
>>>> walk_up_proc+0x11c/0x400
>>>> walk_up_tree+0xf0/0x180
>>>> btrfs_drop_snapshot+0x1c7/0x780
>>>> btrfs_clean_one_deleted_snapshot+0xfb/0x110
>>>> cleaner_kthread+0xd4/0x140
>>>> kthread+0x13a/0x150
>>>> ret_from_fork+0x1f/0x30
>>>>
>>>> other info that might help us debug this:
>>>>
>>>> Chain exists of:
>>>> btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
>>>>
>>>> Possible unsafe locking scenario:
>>>>
>>>> CPU0 CPU1
>>>> ---- ----
>>>> lock(&fs_info->commit_root_sem);
>>>> lock(&space_info->groups_sem);
>>>> lock(&fs_info->commit_root_sem);
>>>> lock(btrfs-root-00);
>>>>
>>>> *** DEADLOCK ***
>>>>
>>>> 3 locks held by btrfs-cleaner/3445:
>>>> #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
>>>> #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
>>>> #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>>>>
>>>> stack backtrace:
>>>> CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
>>>> Call Trace:
>>>> dump_stack+0x8b/0xb0
>>>> check_noncircular+0xcf/0xf0
>>>> __lock_acquire+0x1167/0x2150
>>>> ? __bfs+0x42/0x210
>>>> lock_acquire+0xb9/0x3d0
>>>> ? __btrfs_tree_read_lock+0x32/0x170
>>>> down_read_nested+0x43/0x130
>>>> ? __btrfs_tree_read_lock+0x32/0x170
>>>> __btrfs_tree_read_lock+0x32/0x170
>>>> __btrfs_read_lock_root_node+0x3a/0x50
>>>> btrfs_search_slot+0x614/0x9d0
>>>> ? find_held_lock+0x2b/0x80
>>>> btrfs_find_root+0x35/0x1b0
>>>> ? do_raw_spin_unlock+0x4b/0xa0
>>>> btrfs_read_tree_root+0x61/0x120
>>>> btrfs_get_root_ref+0x14b/0x600
>>>> find_parent_nodes+0x3e6/0x1b30
>>>> btrfs_find_all_roots_safe+0xb4/0x130
>>>> btrfs_find_all_roots+0x60/0x80
>>>> btrfs_qgroup_trace_extent_post+0x27/0x40
>>>> btrfs_add_delayed_data_ref+0x3fd/0x460
>>>> btrfs_free_extent+0x42/0x100
>>>> __btrfs_mod_ref+0x1d7/0x2f0
>>>> walk_up_proc+0x11c/0x400
>>>> walk_up_tree+0xf0/0x180
>>>> btrfs_drop_snapshot+0x1c7/0x780
>>>> ? btrfs_clean_one_deleted_snapshot+0x73/0x110
>>>> btrfs_clean_one_deleted_snapshot+0xfb/0x110
>>>> cleaner_kthread+0xd4/0x140
>>>> ? btrfs_alloc_root+0x50/0x50
>>>> kthread+0x13a/0x150
>>>> ? kthread_create_worker_on_cpu+0x40/0x40
>>>> ret_from_fork+0x1f/0x30
>>>>
>>>> while testing another lockdep fix. This happens because we're using the
>>>> commit_root_sem to protect fs_info->caching_block_groups, which creates
>>>> a dependency on the groups_sem -> commit_root_sem, which is problematic
>>>> because we will allocate blocks while holding tree roots. Fix this by
>>>> making the list itself protected by the fs_info->block_group_cache_lock.
>>>>
>>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>>> ---
>>>> fs/btrfs/block-group.c | 12 ++++++------
>>>> fs/btrfs/extent-tree.c | 2 ++
>>>> 2 files changed, 8 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>>>> index 3ba6f3839d39..a8240913e6fc 100644
>>>> --- a/fs/btrfs/block-group.c
>>>> +++ b/fs/btrfs/block-group.c
>>>> @@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>>>> return 0;
>>>> }
>>>>
>>>> - down_write(&fs_info->commit_root_sem);
>>>> + spin_lock(&fs_info->block_group_cache_lock);
>>>> refcount_inc(&caching_ctl->count);
>>>> list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
>>>> - up_write(&fs_info->commit_root_sem);
>>>> + spin_unlock(&fs_info->block_group_cache_lock);
>>>>
>>>> btrfs_get_block_group(cache);
>>>>
>>>> @@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>>> if (block_group->cached == BTRFS_CACHE_STARTED)
>>>> btrfs_wait_block_group_cache_done(block_group);
>>>> if (block_group->has_caching_ctl) {
>>>> - down_write(&fs_info->commit_root_sem);
>>>> + spin_lock(&fs_info->block_group_cache_lock);
>>>> if (!caching_ctl) {
>>>> struct btrfs_caching_control *ctl;
>>>>
>>>> @@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>>> }
>>>> if (caching_ctl)
>>>> list_del_init(&caching_ctl->list);
>>>> - up_write(&fs_info->commit_root_sem);
>>>> + spin_unlock(&fs_info->block_group_cache_lock);
>>>> if (caching_ctl) {
>>>> /* Once for the caching bgs list and once for us. */
>>>> btrfs_put_caching_control(caching_ctl);
>>>> @@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>>>> struct btrfs_caching_control *caching_ctl;
>>>> struct rb_node *n;
>>>>
>>>> - down_write(&info->commit_root_sem);
>>>> + spin_lock(&info->block_group_cache_lock);
>>>> while (!list_empty(&info->caching_block_groups)) {
>>>> caching_ctl = list_entry(info->caching_block_groups.next,
>>>> struct btrfs_caching_control, list);
>>>> list_del(&caching_ctl->list);
>>>> btrfs_put_caching_control(caching_ctl);
>>>> }
>>>> - up_write(&info->commit_root_sem);
>>>> + spin_unlock(&info->block_group_cache_lock);
>>>>
>>>> spin_lock(&info->unused_bgs_lock);
>>>> while (!list_empty(&info->unused_bgs)) {
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 5fd60b13f4f8..ff2b5c132a70 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
>>>>
>>>> down_write(&fs_info->commit_root_sem);
>>>>
>>>> + spin_lock(&fs_info->block_group_cache_lock);
>>>
>>> Why are we still doing the down_write on the commit_root_sem? It
>>> doesn't seem necessary anymore (only for switching the commit roots
>>> now).
>>
>> Because the commit_root_sem is doing something else here, it's protecting the
>> ->last_byte_to_unpin field. We have to coordinate the end of the transaction
>> with the caching threads, but the list itself just needs to be protected with a
>> spin_lock. Thanks,
>
> It's not clear what protects last_byte_to_unpin.
> We have several places where we read or write to it without locking
> commit_root_sem.
>
> unpin_extent_range() reads it, called from finish_extent_commit(), but
> doesn't take the lock - even though it's very unlikely to have races
> in this case.
> read_on_block_group() also reads it without commit_root_sem, however
> this is only called during mount time, so it should be safe.
> btrfs_cache_block_group() sets it without holding commit_root_sem, but
> while holding the block group's spinlock.
>
> It seems more intuitive to use the block group's spinlock to protect it.
> And having a comment on top of the declaration of 'last_byte_to_unpin'
> mentioning which lock protects it would be nice to have, or a comment
> on top of the lock mentioning what it protects (or both), even if that
> isn't strictly related to this patch.
>
> We're highly inconsistent writing to and reading from
> block_group->last_byte_to_unpin.
last_byte_to_unpin needs to sync up with the swapping of the commit roots,
otherwise we end up with leaked space. Currently we run finish_extent_commit()
in the transactions context, which means last_byte_to_unpin will never change
while we're running, because the next transaction will have to wait for us to
finish our work. It's only synchronized with the caching threads, which is
controlled by the commit_root_sem, after that we are free to read it and trust
that it won't change.
However in documenting this I realized there's a problem, we actually do the
prepare first, and then switch the commit roots, so there's a window in which we
could make progress and have a stale unpin setting, which will result in leaked
space. I'll fix that up and resend. Thanks,
Josef
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2020-10-21 18:44 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
2020-10-20 9:31 ` Filipe Manana
2020-10-21 15:19 ` David Sterba
2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
2020-10-20 10:10 ` Filipe Manana
2020-10-21 14:05 ` Josef Bacik
2020-10-21 16:36 ` Filipe Manana
2020-10-21 18:44 ` Josef Bacik
2020-10-21 15:51 ` David Sterba
2020-10-19 20:02 ` [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup Josef Bacik
2020-10-20 11:06 ` Filipe Manana
2020-10-21 16:13 ` [PATCH 0/3] Lockdep fixes for misc-next David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).