* [PATCH 0/3] Lockdep fixes for misc-next
@ 2020-10-19 20:02 Josef Bacik
  2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Josef Bacik @ 2020-10-19 20:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

Here are a few lockdep fixes for misc-next plus my rwsem patch.  Nothing too
crazy, but the last one is a little wonky because qgroups does a backref
resolution while adding delayed refs.  Generally we do the right thing by
searching the commit roots and skipping the locking, with the exception of
looking up fs roots when we have to resolve indirect refs.  That lookup uses
the normal lookup and locking paths, which is problematic in the new world
order.  For now I'm fixing it with a special helper for backref lookups that
either finds the root in cache, or generates a temporary root that is not
inserted into the fs roots radix tree and is only used to do the backref
resolution (a condensed sketch of that helper follows the diffstat below).
Thanks,

Josef

Josef Bacik (3):
  btrfs: drop the path before adding qgroup items when enabling qgroups
  btrfs: protect the fs_info->caching_block_groups differently
  btrfs: add a helper to read the tree_root commit root for backref
    lookup

 fs/btrfs/backref.c     | 14 +++++++-
 fs/btrfs/block-group.c | 12 +++----
 fs/btrfs/disk-io.c     | 79 +++++++++++++++++++++++++++++++-----------
 fs/btrfs/disk-io.h     |  3 ++
 fs/btrfs/extent-tree.c |  2 ++
 fs/btrfs/qgroup.c      | 16 +++++++++
 6 files changed, 98 insertions(+), 28 deletions(-)
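
For orientation, the helper added in patch 3 takes roughly the following
shape.  This is condensed from the patch itself, with comments added here and
error handling trimmed, so treat it as a sketch rather than the exact code:

struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
						 struct btrfs_path *path,
						 u64 objectid)
{
	struct btrfs_root *root;
	struct btrfs_key key;

	/* Only meant for commit root, no-locking backref searches. */
	ASSERT(path->search_commit_root && path->skip_locking);

	/* If the root is already in memory, just use it. */
	root = btrfs_lookup_fs_root(fs_info, objectid);
	if (root)
		return root;

	/*
	 * Otherwise read a temporary copy via the tree_root's commit root.
	 * It is never inserted into the fs roots radix tree and is dropped
	 * once the backref walk is done with it.
	 */
	key.objectid = objectid;
	key.type = BTRFS_ROOT_ITEM_KEY;
	key.offset = (u64)-1;
	root = read_tree_root_path(fs_info->tree_root, path, &key);
	btrfs_release_path(path);
	return root;
}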

-- 
2.26.2


* [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups
  2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
@ 2020-10-19 20:02 ` Josef Bacik
  2020-10-20  9:31   ` Filipe Manana
  2020-10-21 15:19   ` David Sterba
  2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Josef Bacik @ 2020-10-19 20:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When enabling qgroups we walk the tree_root and then add a qgroup item
for every root that we have.  This creates a lock dependency on the
tree_root and qgroup_root, which results in the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.9.0-default+ #1299 Not tainted
------------------------------------------------------
btrfs/24552 is trying to acquire lock:
ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

but task is already holding lock:
ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (btrfs-root-00){++++}-{3:3}:
       __lock_acquire+0x3fb/0x730
       lock_acquire.part.0+0x6a/0x130
       down_read_nested+0x46/0x130
       __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
       __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
       btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
       btrfs_search_slot+0xc3/0x9f0 [btrfs]
       btrfs_insert_item+0x6e/0x140 [btrfs]
       btrfs_create_tree+0x1cb/0x240 [btrfs]
       btrfs_quota_enable+0xcd/0x790 [btrfs]
       btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
       __x64_sys_ioctl+0x83/0xa0
       do_syscall_64+0x2d/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (btrfs-quota-00){++++}-{3:3}:
       check_prev_add+0x91/0xc30
       validate_chain+0x491/0x750
       __lock_acquire+0x3fb/0x730
       lock_acquire.part.0+0x6a/0x130
       down_read_nested+0x46/0x130
       __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
       __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
       btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
       btrfs_search_slot+0xc3/0x9f0 [btrfs]
       btrfs_insert_empty_items+0x58/0xa0 [btrfs]
       add_qgroup_item.part.0+0x72/0x210 [btrfs]
       btrfs_quota_enable+0x3bb/0x790 [btrfs]
       btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
       __x64_sys_ioctl+0x83/0xa0
       do_syscall_64+0x2d/0x70
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(btrfs-root-00);
                               lock(btrfs-quota-00);
                               lock(btrfs-root-00);
  lock(btrfs-quota-00);

 *** DEADLOCK ***

5 locks held by btrfs/24552:
 #0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0
 #1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs]
 #2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs]
 #3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs]
 #4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]

stack backtrace:
CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
Call Trace:
 dump_stack+0x77/0x97
 check_noncircular+0xf3/0x110
 check_prev_add+0x91/0xc30
 validate_chain+0x491/0x750
 __lock_acquire+0x3fb/0x730
 lock_acquire.part.0+0x6a/0x130
 ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
 ? lock_acquire+0xc4/0x140
 ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
 down_read_nested+0x46/0x130
 ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
 __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
 ? btrfs_root_node+0xd9/0x200 [btrfs]
 __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
 btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
 btrfs_search_slot+0xc3/0x9f0 [btrfs]
 btrfs_insert_empty_items+0x58/0xa0 [btrfs]
 add_qgroup_item.part.0+0x72/0x210 [btrfs]
 btrfs_quota_enable+0x3bb/0x790 [btrfs]
 btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
 __x64_sys_ioctl+0x83/0xa0
 do_syscall_64+0x2d/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by dropping the path whenever we find a root item, adding the
qgroup item, and then re-searching for the root item we found so we can
continue processing roots.
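
For readability, here is a condensed sketch of the relevant part of
btrfs_quota_enable() with this change applied.  The surrounding loop and the
in-memory qgroup setup are elided, so this is a sketch, not the exact code:

		btrfs_item_key_to_cpu(leaf, &found_key, slot);

		if (found_key.type == BTRFS_ROOT_REF_KEY) {
			/* Drop the tree_root path (and its locks) first. */
			btrfs_release_path(path);

			ret = add_qgroup_item(trans, quota_root,
					      found_key.offset);
			if (ret) {
				btrfs_abort_transaction(trans, ret);
				goto out_free_path;
			}

			/* ... in-memory qgroup setup elided ... */

			/* Re-find the root item we were at and keep walking. */
			ret = btrfs_search_slot_for_read(tree_root, &found_key,
							 path, 1, 0);
			if (ret < 0) {
				btrfs_abort_transaction(trans, ret);
				goto out_free_path;
			}
			/*
			 * ret > 0 means we are already positioned past
			 * found_key, so skip btrfs_next_item() below.
			 */
			if (ret > 0)
				continue;
		}
		ret = btrfs_next_item(tree_root, path);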

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/qgroup.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 580899bdb991..573a555b51d8 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1026,6 +1026,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 		btrfs_item_key_to_cpu(leaf, &found_key, slot);
 
 		if (found_key.type == BTRFS_ROOT_REF_KEY) {
+			btrfs_release_path(path);
+
 			ret = add_qgroup_item(trans, quota_root,
 					      found_key.offset);
 			if (ret) {
@@ -1044,6 +1046,20 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
 				btrfs_abort_transaction(trans, ret);
 				goto out_free_path;
 			}
+			ret = btrfs_search_slot_for_read(tree_root, &found_key,
+							 path, 1, 0);
+			if (ret < 0) {
+				btrfs_abort_transaction(trans, ret);
+				goto out_free_path;
+			}
+			if (ret > 0) {
+				/*
+				 * Shouldn't happen, but in case it does we
+				 * don't need to do the btrfs_next_item, just
+				 * continue.
+				 */
+				continue;
+			}
 		}
 		ret = btrfs_next_item(tree_root, path);
 		if (ret < 0) {
-- 
2.26.2


* [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
  2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
  2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
@ 2020-10-19 20:02 ` Josef Bacik
  2020-10-20 10:10   ` Filipe Manana
  2020-10-21 15:51   ` David Sterba
  2020-10-19 20:02 ` [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup Josef Bacik
  2020-10-21 16:13 ` [PATCH 0/3] Lockdep fixes for misc-next David Sterba
  3 siblings, 2 replies; 13+ messages in thread
From: Josef Bacik @ 2020-10-19 20:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

I got the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.9.0+ #101 Not tainted
------------------------------------------------------
btrfs-cleaner/3445 is trying to acquire lock:
ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170

but task is already holding lock:
ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
       down_write+0x3d/0x70
       btrfs_cache_block_group+0x2d5/0x510
       find_free_extent+0xb6e/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&space_info->groups_sem){++++}-{3:3}:
       down_read+0x40/0x130
       find_free_extent+0x2ed/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (btrfs-root-00){++++}-{3:3}:
       __lock_acquire+0x1167/0x2150
       lock_acquire+0xb9/0x3d0
       down_read_nested+0x43/0x130
       __btrfs_tree_read_lock+0x32/0x170
       __btrfs_read_lock_root_node+0x3a/0x50
       btrfs_search_slot+0x614/0x9d0
       btrfs_find_root+0x35/0x1b0
       btrfs_read_tree_root+0x61/0x120
       btrfs_get_root_ref+0x14b/0x600
       find_parent_nodes+0x3e6/0x1b30
       btrfs_find_all_roots_safe+0xb4/0x130
       btrfs_find_all_roots+0x60/0x80
       btrfs_qgroup_trace_extent_post+0x27/0x40
       btrfs_add_delayed_data_ref+0x3fd/0x460
       btrfs_free_extent+0x42/0x100
       __btrfs_mod_ref+0x1d7/0x2f0
       walk_up_proc+0x11c/0x400
       walk_up_tree+0xf0/0x180
       btrfs_drop_snapshot+0x1c7/0x780
       btrfs_clean_one_deleted_snapshot+0xfb/0x110
       cleaner_kthread+0xd4/0x140
       kthread+0x13a/0x150
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

Chain exists of:
  btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->commit_root_sem);
                               lock(&space_info->groups_sem);
                               lock(&fs_info->commit_root_sem);
  lock(btrfs-root-00);

 *** DEADLOCK ***

3 locks held by btrfs-cleaner/3445:
 #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
 #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
 #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

stack backtrace:
CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb0
 check_noncircular+0xcf/0xf0
 __lock_acquire+0x1167/0x2150
 ? __bfs+0x42/0x210
 lock_acquire+0xb9/0x3d0
 ? __btrfs_tree_read_lock+0x32/0x170
 down_read_nested+0x43/0x130
 ? __btrfs_tree_read_lock+0x32/0x170
 __btrfs_tree_read_lock+0x32/0x170
 __btrfs_read_lock_root_node+0x3a/0x50
 btrfs_search_slot+0x614/0x9d0
 ? find_held_lock+0x2b/0x80
 btrfs_find_root+0x35/0x1b0
 ? do_raw_spin_unlock+0x4b/0xa0
 btrfs_read_tree_root+0x61/0x120
 btrfs_get_root_ref+0x14b/0x600
 find_parent_nodes+0x3e6/0x1b30
 btrfs_find_all_roots_safe+0xb4/0x130
 btrfs_find_all_roots+0x60/0x80
 btrfs_qgroup_trace_extent_post+0x27/0x40
 btrfs_add_delayed_data_ref+0x3fd/0x460
 btrfs_free_extent+0x42/0x100
 __btrfs_mod_ref+0x1d7/0x2f0
 walk_up_proc+0x11c/0x400
 walk_up_tree+0xf0/0x180
 btrfs_drop_snapshot+0x1c7/0x780
 ? btrfs_clean_one_deleted_snapshot+0x73/0x110
 btrfs_clean_one_deleted_snapshot+0xfb/0x110
 cleaner_kthread+0xd4/0x140
 ? btrfs_alloc_root+0x50/0x50
 kthread+0x13a/0x150
 ? kthread_create_worker_on_cpu+0x40/0x40
 ret_from_fork+0x1f/0x30

while testing another lockdep fix.  This happens because we're using the
commit_root_sem to protect fs_info->caching_block_groups, which creates a
groups_sem -> commit_root_sem dependency.  That is problematic because we
allocate blocks while holding tree root locks.  Fix this by protecting the
list with fs_info->block_group_cache_lock instead.
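
Condensed from the diff below, and only as a sketch of the resulting shape:
the list manipulation now sits in a short spinlock section, while the
down_write of the commit_root_sem in btrfs_prepare_extent_commit() is left in
place and the new spinlock nests inside it (whether that sem is still needed
there is discussed further down the thread):

	/* btrfs_cache_block_group(): queueing the caching control. */
	spin_lock(&fs_info->block_group_cache_lock);
	refcount_inc(&caching_ctl->count);
	list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
	spin_unlock(&fs_info->block_group_cache_lock);

	/*
	 * btrfs_prepare_extent_commit(): the sem is kept, the spinlock only
	 * covers the walk of the caching_block_groups list.
	 */
	down_write(&fs_info->commit_root_sem);
	spin_lock(&fs_info->block_group_cache_lock);
	list_for_each_entry_safe(caching_ctl, next,
				 &fs_info->caching_block_groups, list) {
		/* ... update cache->last_byte_to_unpin ... */
	}
	spin_unlock(&fs_info->block_group_cache_lock);
	up_write(&fs_info->commit_root_sem);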

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 12 ++++++------
 fs/btrfs/extent-tree.c |  2 ++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 3ba6f3839d39..a8240913e6fc 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
 		return 0;
 	}
 
-	down_write(&fs_info->commit_root_sem);
+	spin_lock(&fs_info->block_group_cache_lock);
 	refcount_inc(&caching_ctl->count);
 	list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
-	up_write(&fs_info->commit_root_sem);
+	spin_unlock(&fs_info->block_group_cache_lock);
 
 	btrfs_get_block_group(cache);
 
@@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 	if (block_group->cached == BTRFS_CACHE_STARTED)
 		btrfs_wait_block_group_cache_done(block_group);
 	if (block_group->has_caching_ctl) {
-		down_write(&fs_info->commit_root_sem);
+		spin_lock(&fs_info->block_group_cache_lock);
 		if (!caching_ctl) {
 			struct btrfs_caching_control *ctl;
 
@@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 		}
 		if (caching_ctl)
 			list_del_init(&caching_ctl->list);
-		up_write(&fs_info->commit_root_sem);
+		spin_unlock(&fs_info->block_group_cache_lock);
 		if (caching_ctl) {
 			/* Once for the caching bgs list and once for us. */
 			btrfs_put_caching_control(caching_ctl);
@@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 	struct btrfs_caching_control *caching_ctl;
 	struct rb_node *n;
 
-	down_write(&info->commit_root_sem);
+	spin_lock(&info->block_group_cache_lock);
 	while (!list_empty(&info->caching_block_groups)) {
 		caching_ctl = list_entry(info->caching_block_groups.next,
 					 struct btrfs_caching_control, list);
 		list_del(&caching_ctl->list);
 		btrfs_put_caching_control(caching_ctl);
 	}
-	up_write(&info->commit_root_sem);
+	spin_unlock(&info->block_group_cache_lock);
 
 	spin_lock(&info->unused_bgs_lock);
 	while (!list_empty(&info->unused_bgs)) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5fd60b13f4f8..ff2b5c132a70 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
 
 	down_write(&fs_info->commit_root_sem);
 
+	spin_lock(&fs_info->block_group_cache_lock);
 	list_for_each_entry_safe(caching_ctl, next,
 				 &fs_info->caching_block_groups, list) {
 		cache = caching_ctl->block_group;
@@ -2749,6 +2750,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
 			cache->last_byte_to_unpin = caching_ctl->progress;
 		}
 	}
+	spin_unlock(&fs_info->block_group_cache_lock);
 
 	up_write(&fs_info->commit_root_sem);
 
-- 
2.26.2


* [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup
  2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
  2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
  2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
@ 2020-10-19 20:02 ` Josef Bacik
  2020-10-20 11:06   ` Filipe Manana
  2020-10-21 16:13 ` [PATCH 0/3] Lockdep fixes for misc-next David Sterba
  3 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2020-10-19 20:02 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

I got the following lockdep splat with my rwsem patches on btrfs/104

======================================================
WARNING: possible circular locking dependency detected
5.9.0+ #102 Not tainted
------------------------------------------------------
btrfs-cleaner/903 is trying to acquire lock:
ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170

but task is already holding lock:
ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
       down_read+0x40/0x130
       caching_thread+0x53/0x5a0
       btrfs_work_helper+0xfa/0x520
       process_one_work+0x238/0x540
       worker_thread+0x55/0x3c0
       kthread+0x13a/0x150
       ret_from_fork+0x1f/0x30

-> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
       __mutex_lock+0x7e/0x7b0
       btrfs_cache_block_group+0x1e0/0x510
       find_free_extent+0xb6e/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&space_info->groups_sem){++++}-{3:3}:
       down_read+0x40/0x130
       find_free_extent+0x2ed/0x12f0
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb1/0x330
       alloc_tree_block_no_bg_flush+0x4f/0x60
       __btrfs_cow_block+0x11d/0x580
       btrfs_cow_block+0x10c/0x220
       commit_cowonly_roots+0x47/0x2e0
       btrfs_commit_transaction+0x595/0xbd0
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x36/0xa0
       cleanup_mnt+0x12d/0x190
       task_work_run+0x5c/0xa0
       exit_to_user_mode_prepare+0x1df/0x200
       syscall_exit_to_user_mode+0x54/0x280
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (btrfs-root-00){++++}-{3:3}:
       __lock_acquire+0x1167/0x2150
       lock_acquire+0xb9/0x3d0
       down_read_nested+0x43/0x130
       __btrfs_tree_read_lock+0x32/0x170
       __btrfs_read_lock_root_node+0x3a/0x50
       btrfs_search_slot+0x614/0x9d0
       btrfs_find_root+0x35/0x1b0
       btrfs_read_tree_root+0x61/0x120
       btrfs_get_root_ref+0x14b/0x600
       find_parent_nodes+0x3e6/0x1b30
       btrfs_find_all_roots_safe+0xb4/0x130
       btrfs_find_all_roots+0x60/0x80
       btrfs_qgroup_trace_extent_post+0x27/0x40
       btrfs_add_delayed_data_ref+0x3fd/0x460
       btrfs_free_extent+0x42/0x100
       __btrfs_mod_ref+0x1d7/0x2f0
       walk_up_proc+0x11c/0x400
       walk_up_tree+0xf0/0x180
       btrfs_drop_snapshot+0x1c7/0x780
       btrfs_clean_one_deleted_snapshot+0xfb/0x110
       cleaner_kthread+0xd4/0x140
       kthread+0x13a/0x150
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

Chain exists of:
  btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->commit_root_sem);
                               lock(&caching_ctl->mutex);
                               lock(&fs_info->commit_root_sem);
  lock(btrfs-root-00);

 *** DEADLOCK ***

3 locks held by btrfs-cleaner/903:
 #0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
 #1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
 #2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80

stack backtrace:
CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb0
 check_noncircular+0xcf/0xf0
 __lock_acquire+0x1167/0x2150
 ? __bfs+0x42/0x210
 lock_acquire+0xb9/0x3d0
 ? __btrfs_tree_read_lock+0x32/0x170
 down_read_nested+0x43/0x130
 ? __btrfs_tree_read_lock+0x32/0x170
 __btrfs_tree_read_lock+0x32/0x170
 __btrfs_read_lock_root_node+0x3a/0x50
 btrfs_search_slot+0x614/0x9d0
 ? find_held_lock+0x2b/0x80
 btrfs_find_root+0x35/0x1b0
 ? do_raw_spin_unlock+0x4b/0xa0
 btrfs_read_tree_root+0x61/0x120
 btrfs_get_root_ref+0x14b/0x600
 find_parent_nodes+0x3e6/0x1b30
 btrfs_find_all_roots_safe+0xb4/0x130
 btrfs_find_all_roots+0x60/0x80
 btrfs_qgroup_trace_extent_post+0x27/0x40
 btrfs_add_delayed_data_ref+0x3fd/0x460
 btrfs_free_extent+0x42/0x100
 __btrfs_mod_ref+0x1d7/0x2f0
 walk_up_proc+0x11c/0x400
 walk_up_tree+0xf0/0x180
 btrfs_drop_snapshot+0x1c7/0x780
 ? btrfs_clean_one_deleted_snapshot+0x73/0x110
 btrfs_clean_one_deleted_snapshot+0xfb/0x110
 cleaner_kthread+0xd4/0x140
 ? btrfs_alloc_root+0x50/0x50
 kthread+0x13a/0x150
 ? kthread_create_worker_on_cpu+0x40/0x40
 ret_from_fork+0x1f/0x30
BTRFS info (device sdb): disk space caching is enabled
BTRFS info (device sdb): has skinny extents

This happens because qgroups does a backref lookup when we create a
delayed ref.  From there it may have to look up a root for an indirect
ref, which goes through the normal lookup path on the tree_root and
takes read locks on the tree_root nodes.

To fix this we need to add a variant for looking up roots that searches
the commit root of the tree_root.  Then when we do the backref search
using the commit root we are sure to not take any locks on the tree_root
nodes.  This gets rid of the lockdep splat when running btrfs/104.
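
Condensed from the diff below, and only as a sketch: the backref code picks
the new helper when the walk is already a commit root search, and the helper
itself asserts that mode, so the regular locked lookup path is untouched for
every other caller:

	/* resolve_indirect_ref(): choose the lookup variant. */
	if (path->search_commit_root)
		root = btrfs_get_fs_root_commit_root(fs_info, path,
						     ref->root_id);
	else
		root = btrfs_get_fs_root(fs_info, ref->root_id, false);

	/* btrfs_get_fs_root_commit_root() starts with: */
	ASSERT(path->search_commit_root && path->skip_locking);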

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/backref.c | 14 +++++++-
 fs/btrfs/disk-io.c | 79 ++++++++++++++++++++++++++++++++++------------
 fs/btrfs/disk-io.h |  3 ++
 3 files changed, 74 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index b3268f4ea5f3..cacba965c535 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -544,7 +544,19 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info,
 	int level = ref->level;
 	struct btrfs_key search_key = ref->key_for_search;
 
-	root = btrfs_get_fs_root(fs_info, ref->root_id, false);
+	/*
+	 * If we're search_commit_root we could possibly be holding locks on
+	 * other tree nodes.  This happens when qgroups does backref walks when
+	 * adding new delayed refs.  To deal with this we need to look in cache
+	 * for the root, and if we don't find it then we need to search the
+	 * tree_root's commit root, thus the btrfs_get_fs_root_commit_root usage
+	 * here.
+	 */
+	if (path->search_commit_root)
+		root = btrfs_get_fs_root_commit_root(fs_info, path,
+						     ref->root_id);
+	else
+		root = btrfs_get_fs_root(fs_info, ref->root_id, false);
 	if (IS_ERR(root)) {
 		ret = PTR_ERR(root);
 		goto out_free;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 81e7b1880b5b..3972f16b333d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1281,32 +1281,26 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
-struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
-					struct btrfs_key *key)
+static struct btrfs_root *read_tree_root_path(struct btrfs_root *tree_root,
+					      struct btrfs_path *path,
+					      struct btrfs_key *key)
 {
 	struct btrfs_root *root;
 	struct btrfs_fs_info *fs_info = tree_root->fs_info;
-	struct btrfs_path *path;
 	u64 generation;
 	int ret;
 	int level;
 
-	path = btrfs_alloc_path();
-	if (!path)
-		return ERR_PTR(-ENOMEM);
-
 	root = btrfs_alloc_root(fs_info, key->objectid, GFP_NOFS);
-	if (!root) {
-		ret = -ENOMEM;
-		goto alloc_fail;
-	}
+	if (!root)
+		return ERR_PTR(-ENOMEM);
 
 	ret = btrfs_find_root(tree_root, key, path,
 			      &root->root_item, &root->root_key);
 	if (ret) {
 		if (ret > 0)
 			ret = -ENOENT;
-		goto find_fail;
+		goto fail;
 	}
 
 	generation = btrfs_root_generation(&root->root_item);
@@ -1317,21 +1311,30 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
 	if (IS_ERR(root->node)) {
 		ret = PTR_ERR(root->node);
 		root->node = NULL;
-		goto find_fail;
+		goto fail;
 	} else if (!btrfs_buffer_uptodate(root->node, generation, 0)) {
 		ret = -EIO;
-		goto find_fail;
+		goto fail;
 	}
 	root->commit_root = btrfs_root_node(root);
-out:
-	btrfs_free_path(path);
 	return root;
-
-find_fail:
+fail:
 	btrfs_put_root(root);
-alloc_fail:
-	root = ERR_PTR(ret);
-	goto out;
+	return ERR_PTR(ret);
+}
+
+struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
+					struct btrfs_key *key)
+{
+	struct btrfs_root *root;
+	struct btrfs_path *path;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return ERR_PTR(-ENOMEM);
+	root = read_tree_root_path(tree_root, path, key);
+	btrfs_free_path(path);
+	return root;
 }
 
 /*
@@ -1621,6 +1624,40 @@ struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
 	return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
 }
 
+/* btrfs_get_fs_root_commit_root - return a root for the given objectid
+ * @fs_info - the fs_info.
+ * @objectid = the objectid we need to lookup.
+ *
+ * This is exclusively used for backref walking, and exists specifically because
+ * of how qgroups does lookups.  Qgroups will do a backref lookup at delayed ref
+ * creation time, which means we may have to read the tree_root in order to look
+ * up a fs root that is not in memory.  If the root is not in memory we will
+ * read the tree root commit root and look up the fs root from there.  This is a
+ * temporary root, it will not be inserted into the radix tree as it doesn't
+ * have the most uptodate information, it'll simply be discarded once the
+ * backref code is finished using the root.
+ */
+struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
+						 struct btrfs_path *path,
+						 u64 objectid)
+{
+	struct btrfs_root *root;
+	struct btrfs_key key;
+
+	ASSERT(path->search_commit_root && path->skip_locking);
+
+	root = btrfs_lookup_fs_root(fs_info, objectid);
+	if (root)
+		return root;
+
+	key.objectid = objectid;
+	key.type = BTRFS_ROOT_ITEM_KEY;
+	key.offset = (u64)-1;
+	root = read_tree_root_path(fs_info->tree_root, path, &key);
+	btrfs_release_path(path);
+	return root;
+}
+
 /*
  * called by the kthread helper functions to finally call the bio end_io
  * functions.  This is where read checksum verification actually happens
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index fee69ced58b4..182540bdcea0 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -69,6 +69,9 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
 				     u64 objectid, bool check_ref);
 struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
 					 u64 objectid, dev_t anon_dev);
+struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
+						 struct btrfs_path *path,
+						 u64 objectid);
 
 void btrfs_free_fs_info(struct btrfs_fs_info *fs_info);
 int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info);
-- 
2.26.2


* Re: [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups
  2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
@ 2020-10-20  9:31   ` Filipe Manana
  2020-10-21 15:19   ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2020-10-20  9:31 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Oct 20, 2020 at 8:37 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> When enabling qgroups we walk the tree_root and then add a qgroup item
> for every root that we have.  This creates a lock dependency on the
> tree_root and qgroup_root, which results in the following lockdep splat
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0-default+ #1299 Not tainted
> ------------------------------------------------------
> btrfs/24552 is trying to acquire lock:
> ffff9142dfc5f630 (btrfs-quota-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>
> but task is already holding lock:
> ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (btrfs-root-00){++++}-{3:3}:
>        __lock_acquire+0x3fb/0x730
>        lock_acquire.part.0+0x6a/0x130
>        down_read_nested+0x46/0x130
>        __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>        __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
>        btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
>        btrfs_search_slot+0xc3/0x9f0 [btrfs]
>        btrfs_insert_item+0x6e/0x140 [btrfs]
>        btrfs_create_tree+0x1cb/0x240 [btrfs]
>        btrfs_quota_enable+0xcd/0x790 [btrfs]
>        btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
>        __x64_sys_ioctl+0x83/0xa0
>        do_syscall_64+0x2d/0x70
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #0 (btrfs-quota-00){++++}-{3:3}:
>        check_prev_add+0x91/0xc30
>        validate_chain+0x491/0x750
>        __lock_acquire+0x3fb/0x730
>        lock_acquire.part.0+0x6a/0x130
>        down_read_nested+0x46/0x130
>        __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>        __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
>        btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
>        btrfs_search_slot+0xc3/0x9f0 [btrfs]
>        btrfs_insert_empty_items+0x58/0xa0 [btrfs]
>        add_qgroup_item.part.0+0x72/0x210 [btrfs]
>        btrfs_quota_enable+0x3bb/0x790 [btrfs]
>        btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
>        __x64_sys_ioctl+0x83/0xa0
>        do_syscall_64+0x2d/0x70
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> other info that might help us debug this:
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(btrfs-root-00);
>                                lock(btrfs-quota-00);
>                                lock(btrfs-root-00);
>   lock(btrfs-quota-00);
>
>  *** DEADLOCK ***
>
> 5 locks held by btrfs/24552:
>  #0: ffff9142df431478 (sb_writers#10){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0xa0
>  #1: ffff9142f9b10cc0 (&fs_info->subvol_sem){++++}-{3:3}, at: btrfs_ioctl_quota_ctl+0x7b/0xe0 [btrfs]
>  #2: ffff9142f9b11a08 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_enable+0x3b/0x790 [btrfs]
>  #3: ffff9142df431698 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x406/0x510 [btrfs]
>  #4: ffff9142dfc5d0b0 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>
> stack backtrace:
> CPU: 1 PID: 24552 Comm: btrfs Not tainted 5.9.0-default+ #1299
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
> Call Trace:
>  dump_stack+0x77/0x97
>  check_noncircular+0xf3/0x110
>  check_prev_add+0x91/0xc30
>  validate_chain+0x491/0x750
>  __lock_acquire+0x3fb/0x730
>  lock_acquire.part.0+0x6a/0x130
>  ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>  ? lock_acquire+0xc4/0x140
>  ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>  down_read_nested+0x46/0x130
>  ? __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>  __btrfs_tree_read_lock+0x35/0x1c0 [btrfs]
>  ? btrfs_root_node+0xd9/0x200 [btrfs]
>  __btrfs_read_lock_root_node+0x3a/0x50 [btrfs]
>  btrfs_search_slot_get_root+0x11d/0x290 [btrfs]
>  btrfs_search_slot+0xc3/0x9f0 [btrfs]
>  btrfs_insert_empty_items+0x58/0xa0 [btrfs]
>  add_qgroup_item.part.0+0x72/0x210 [btrfs]
>  btrfs_quota_enable+0x3bb/0x790 [btrfs]
>  btrfs_ioctl_quota_ctl+0xc9/0xe0 [btrfs]
>  __x64_sys_ioctl+0x83/0xa0
>  do_syscall_64+0x2d/0x70
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fix this by dropping the path whenever we find a root item, adding the
> qgroup item, and then re-searching for the root item we found so we can
> continue processing roots.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Looks good, thanks.

> ---
>  fs/btrfs/qgroup.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 580899bdb991..573a555b51d8 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1026,6 +1026,8 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
>                 btrfs_item_key_to_cpu(leaf, &found_key, slot);
>
>                 if (found_key.type == BTRFS_ROOT_REF_KEY) {
> +                       btrfs_release_path(path);
> +
>                         ret = add_qgroup_item(trans, quota_root,
>                                               found_key.offset);
>                         if (ret) {
> @@ -1044,6 +1046,20 @@ int btrfs_quota_enable(struct btrfs_fs_info *fs_info)
>                                 btrfs_abort_transaction(trans, ret);
>                                 goto out_free_path;
>                         }
> +                       ret = btrfs_search_slot_for_read(tree_root, &found_key,
> +                                                        path, 1, 0);
> +                       if (ret < 0) {
> +                               btrfs_abort_transaction(trans, ret);
> +                               goto out_free_path;
> +                       }
> +                       if (ret > 0) {
> +                               /*
> +                                * Shouldn't happen, but in case it does we
> +                                * don't need to do the btrfs_next_item, just
> +                                * continue.
> +                                */
> +                               continue;
> +                       }
>                 }
>                 ret = btrfs_next_item(tree_root, path);
>                 if (ret < 0) {
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
  2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
@ 2020-10-20 10:10   ` Filipe Manana
  2020-10-21 14:05     ` Josef Bacik
  2020-10-21 15:51   ` David Sterba
  1 sibling, 1 reply; 13+ messages in thread
From: Filipe Manana @ 2020-10-20 10:10 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Oct 20, 2020 at 8:36 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> I got the following lockdep splat
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0+ #101 Not tainted
> ------------------------------------------------------
> btrfs-cleaner/3445 is trying to acquire lock:
> ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
>
> but task is already holding lock:
> ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
>        down_write+0x3d/0x70
>        btrfs_cache_block_group+0x2d5/0x510
>        find_free_extent+0xb6e/0x12f0
>        btrfs_reserve_extent+0xb3/0x1b0
>        btrfs_alloc_tree_block+0xb1/0x330
>        alloc_tree_block_no_bg_flush+0x4f/0x60
>        __btrfs_cow_block+0x11d/0x580
>        btrfs_cow_block+0x10c/0x220
>        commit_cowonly_roots+0x47/0x2e0
>        btrfs_commit_transaction+0x595/0xbd0
>        sync_filesystem+0x74/0x90
>        generic_shutdown_super+0x22/0x100
>        kill_anon_super+0x14/0x30
>        btrfs_kill_super+0x12/0x20
>        deactivate_locked_super+0x36/0xa0
>        cleanup_mnt+0x12d/0x190
>        task_work_run+0x5c/0xa0
>        exit_to_user_mode_prepare+0x1df/0x200
>        syscall_exit_to_user_mode+0x54/0x280
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #1 (&space_info->groups_sem){++++}-{3:3}:
>        down_read+0x40/0x130
>        find_free_extent+0x2ed/0x12f0
>        btrfs_reserve_extent+0xb3/0x1b0
>        btrfs_alloc_tree_block+0xb1/0x330
>        alloc_tree_block_no_bg_flush+0x4f/0x60
>        __btrfs_cow_block+0x11d/0x580
>        btrfs_cow_block+0x10c/0x220
>        commit_cowonly_roots+0x47/0x2e0
>        btrfs_commit_transaction+0x595/0xbd0
>        sync_filesystem+0x74/0x90
>        generic_shutdown_super+0x22/0x100
>        kill_anon_super+0x14/0x30
>        btrfs_kill_super+0x12/0x20
>        deactivate_locked_super+0x36/0xa0
>        cleanup_mnt+0x12d/0x190
>        task_work_run+0x5c/0xa0
>        exit_to_user_mode_prepare+0x1df/0x200
>        syscall_exit_to_user_mode+0x54/0x280
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #0 (btrfs-root-00){++++}-{3:3}:
>        __lock_acquire+0x1167/0x2150
>        lock_acquire+0xb9/0x3d0
>        down_read_nested+0x43/0x130
>        __btrfs_tree_read_lock+0x32/0x170
>        __btrfs_read_lock_root_node+0x3a/0x50
>        btrfs_search_slot+0x614/0x9d0
>        btrfs_find_root+0x35/0x1b0
>        btrfs_read_tree_root+0x61/0x120
>        btrfs_get_root_ref+0x14b/0x600
>        find_parent_nodes+0x3e6/0x1b30
>        btrfs_find_all_roots_safe+0xb4/0x130
>        btrfs_find_all_roots+0x60/0x80
>        btrfs_qgroup_trace_extent_post+0x27/0x40
>        btrfs_add_delayed_data_ref+0x3fd/0x460
>        btrfs_free_extent+0x42/0x100
>        __btrfs_mod_ref+0x1d7/0x2f0
>        walk_up_proc+0x11c/0x400
>        walk_up_tree+0xf0/0x180
>        btrfs_drop_snapshot+0x1c7/0x780
>        btrfs_clean_one_deleted_snapshot+0xfb/0x110
>        cleaner_kthread+0xd4/0x140
>        kthread+0x13a/0x150
>        ret_from_fork+0x1f/0x30
>
> other info that might help us debug this:
>
> Chain exists of:
>   btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&fs_info->commit_root_sem);
>                                lock(&space_info->groups_sem);
>                                lock(&fs_info->commit_root_sem);
>   lock(btrfs-root-00);
>
>  *** DEADLOCK ***
>
> 3 locks held by btrfs-cleaner/3445:
>  #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
>  #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
>  #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> stack backtrace:
> CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>  dump_stack+0x8b/0xb0
>  check_noncircular+0xcf/0xf0
>  __lock_acquire+0x1167/0x2150
>  ? __bfs+0x42/0x210
>  lock_acquire+0xb9/0x3d0
>  ? __btrfs_tree_read_lock+0x32/0x170
>  down_read_nested+0x43/0x130
>  ? __btrfs_tree_read_lock+0x32/0x170
>  __btrfs_tree_read_lock+0x32/0x170
>  __btrfs_read_lock_root_node+0x3a/0x50
>  btrfs_search_slot+0x614/0x9d0
>  ? find_held_lock+0x2b/0x80
>  btrfs_find_root+0x35/0x1b0
>  ? do_raw_spin_unlock+0x4b/0xa0
>  btrfs_read_tree_root+0x61/0x120
>  btrfs_get_root_ref+0x14b/0x600
>  find_parent_nodes+0x3e6/0x1b30
>  btrfs_find_all_roots_safe+0xb4/0x130
>  btrfs_find_all_roots+0x60/0x80
>  btrfs_qgroup_trace_extent_post+0x27/0x40
>  btrfs_add_delayed_data_ref+0x3fd/0x460
>  btrfs_free_extent+0x42/0x100
>  __btrfs_mod_ref+0x1d7/0x2f0
>  walk_up_proc+0x11c/0x400
>  walk_up_tree+0xf0/0x180
>  btrfs_drop_snapshot+0x1c7/0x780
>  ? btrfs_clean_one_deleted_snapshot+0x73/0x110
>  btrfs_clean_one_deleted_snapshot+0xfb/0x110
>  cleaner_kthread+0xd4/0x140
>  ? btrfs_alloc_root+0x50/0x50
>  kthread+0x13a/0x150
>  ? kthread_create_worker_on_cpu+0x40/0x40
>  ret_from_fork+0x1f/0x30
>
> while testing another lockdep fix.  This happens because we're using the
> commit_root_sem to protect fs_info->caching_block_groups, which creates a
> groups_sem -> commit_root_sem dependency.  That is problematic because we
> allocate blocks while holding tree root locks.  Fix this by protecting the
> list with fs_info->block_group_cache_lock instead.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/block-group.c | 12 ++++++------
>  fs/btrfs/extent-tree.c |  2 ++
>  2 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 3ba6f3839d39..a8240913e6fc 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>                 return 0;
>         }
>
> -       down_write(&fs_info->commit_root_sem);
> +       spin_lock(&fs_info->block_group_cache_lock);
>         refcount_inc(&caching_ctl->count);
>         list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
> -       up_write(&fs_info->commit_root_sem);
> +       spin_unlock(&fs_info->block_group_cache_lock);
>
>         btrfs_get_block_group(cache);
>
> @@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>         if (block_group->cached == BTRFS_CACHE_STARTED)
>                 btrfs_wait_block_group_cache_done(block_group);
>         if (block_group->has_caching_ctl) {
> -               down_write(&fs_info->commit_root_sem);
> +               spin_lock(&fs_info->block_group_cache_lock);
>                 if (!caching_ctl) {
>                         struct btrfs_caching_control *ctl;
>
> @@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>                 }
>                 if (caching_ctl)
>                         list_del_init(&caching_ctl->list);
> -               up_write(&fs_info->commit_root_sem);
> +               spin_unlock(&fs_info->block_group_cache_lock);
>                 if (caching_ctl) {
>                         /* Once for the caching bgs list and once for us. */
>                         btrfs_put_caching_control(caching_ctl);
> @@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>         struct btrfs_caching_control *caching_ctl;
>         struct rb_node *n;
>
> -       down_write(&info->commit_root_sem);
> +       spin_lock(&info->block_group_cache_lock);
>         while (!list_empty(&info->caching_block_groups)) {
>                 caching_ctl = list_entry(info->caching_block_groups.next,
>                                          struct btrfs_caching_control, list);
>                 list_del(&caching_ctl->list);
>                 btrfs_put_caching_control(caching_ctl);
>         }
> -       up_write(&info->commit_root_sem);
> +       spin_unlock(&info->block_group_cache_lock);
>
>         spin_lock(&info->unused_bgs_lock);
>         while (!list_empty(&info->unused_bgs)) {
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 5fd60b13f4f8..ff2b5c132a70 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
>
>         down_write(&fs_info->commit_root_sem);
>
> +       spin_lock(&fs_info->block_group_cache_lock);

Why are we still doing the down_write on the commit_root_sem? It
doesn't seem necessary anymore (only for switching the commit roots
now).

Thanks.

>         list_for_each_entry_safe(caching_ctl, next,
>                                  &fs_info->caching_block_groups, list) {
>                 cache = caching_ctl->block_group;
> @@ -2749,6 +2750,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
>                         cache->last_byte_to_unpin = caching_ctl->progress;
>                 }
>         }
> +       spin_unlock(&fs_info->block_group_cache_lock);
>
>         up_write(&fs_info->commit_root_sem);
>
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

* Re: [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup
  2020-10-19 20:02 ` [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup Josef Bacik
@ 2020-10-20 11:06   ` Filipe Manana
  0 siblings, 0 replies; 13+ messages in thread
From: Filipe Manana @ 2020-10-20 11:06 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Oct 20, 2020 at 8:37 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> I got the following lockdep splat with my rwsem patches on btrfs/104
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0+ #102 Not tainted
> ------------------------------------------------------
> btrfs-cleaner/903 is trying to acquire lock:
> ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
>
> but task is already holding lock:
> ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
>        down_read+0x40/0x130
>        caching_thread+0x53/0x5a0
>        btrfs_work_helper+0xfa/0x520
>        process_one_work+0x238/0x540
>        worker_thread+0x55/0x3c0
>        kthread+0x13a/0x150
>        ret_from_fork+0x1f/0x30
>
> -> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
>        __mutex_lock+0x7e/0x7b0
>        btrfs_cache_block_group+0x1e0/0x510
>        find_free_extent+0xb6e/0x12f0
>        btrfs_reserve_extent+0xb3/0x1b0
>        btrfs_alloc_tree_block+0xb1/0x330
>        alloc_tree_block_no_bg_flush+0x4f/0x60
>        __btrfs_cow_block+0x11d/0x580
>        btrfs_cow_block+0x10c/0x220
>        commit_cowonly_roots+0x47/0x2e0
>        btrfs_commit_transaction+0x595/0xbd0
>        sync_filesystem+0x74/0x90
>        generic_shutdown_super+0x22/0x100
>        kill_anon_super+0x14/0x30
>        btrfs_kill_super+0x12/0x20
>        deactivate_locked_super+0x36/0xa0
>        cleanup_mnt+0x12d/0x190
>        task_work_run+0x5c/0xa0
>        exit_to_user_mode_prepare+0x1df/0x200
>        syscall_exit_to_user_mode+0x54/0x280
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #1 (&space_info->groups_sem){++++}-{3:3}:
>        down_read+0x40/0x130
>        find_free_extent+0x2ed/0x12f0
>        btrfs_reserve_extent+0xb3/0x1b0
>        btrfs_alloc_tree_block+0xb1/0x330
>        alloc_tree_block_no_bg_flush+0x4f/0x60
>        __btrfs_cow_block+0x11d/0x580
>        btrfs_cow_block+0x10c/0x220
>        commit_cowonly_roots+0x47/0x2e0
>        btrfs_commit_transaction+0x595/0xbd0
>        sync_filesystem+0x74/0x90
>        generic_shutdown_super+0x22/0x100
>        kill_anon_super+0x14/0x30
>        btrfs_kill_super+0x12/0x20
>        deactivate_locked_super+0x36/0xa0
>        cleanup_mnt+0x12d/0x190
>        task_work_run+0x5c/0xa0
>        exit_to_user_mode_prepare+0x1df/0x200
>        syscall_exit_to_user_mode+0x54/0x280
>        entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #0 (btrfs-root-00){++++}-{3:3}:
>        __lock_acquire+0x1167/0x2150
>        lock_acquire+0xb9/0x3d0
>        down_read_nested+0x43/0x130
>        __btrfs_tree_read_lock+0x32/0x170
>        __btrfs_read_lock_root_node+0x3a/0x50
>        btrfs_search_slot+0x614/0x9d0
>        btrfs_find_root+0x35/0x1b0
>        btrfs_read_tree_root+0x61/0x120
>        btrfs_get_root_ref+0x14b/0x600
>        find_parent_nodes+0x3e6/0x1b30
>        btrfs_find_all_roots_safe+0xb4/0x130
>        btrfs_find_all_roots+0x60/0x80
>        btrfs_qgroup_trace_extent_post+0x27/0x40
>        btrfs_add_delayed_data_ref+0x3fd/0x460
>        btrfs_free_extent+0x42/0x100
>        __btrfs_mod_ref+0x1d7/0x2f0
>        walk_up_proc+0x11c/0x400
>        walk_up_tree+0xf0/0x180
>        btrfs_drop_snapshot+0x1c7/0x780
>        btrfs_clean_one_deleted_snapshot+0xfb/0x110
>        cleaner_kthread+0xd4/0x140
>        kthread+0x13a/0x150
>        ret_from_fork+0x1f/0x30
>
> other info that might help us debug this:
>
> Chain exists of:
>   btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&fs_info->commit_root_sem);
>                                lock(&caching_ctl->mutex);
>                                lock(&fs_info->commit_root_sem);
>   lock(btrfs-root-00);
>
>  *** DEADLOCK ***
>
> 3 locks held by btrfs-cleaner/903:
>  #0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
>  #1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
>  #2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>
> stack backtrace:
> CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>  dump_stack+0x8b/0xb0
>  check_noncircular+0xcf/0xf0
>  __lock_acquire+0x1167/0x2150
>  ? __bfs+0x42/0x210
>  lock_acquire+0xb9/0x3d0
>  ? __btrfs_tree_read_lock+0x32/0x170
>  down_read_nested+0x43/0x130
>  ? __btrfs_tree_read_lock+0x32/0x170
>  __btrfs_tree_read_lock+0x32/0x170
>  __btrfs_read_lock_root_node+0x3a/0x50
>  btrfs_search_slot+0x614/0x9d0
>  ? find_held_lock+0x2b/0x80
>  btrfs_find_root+0x35/0x1b0
>  ? do_raw_spin_unlock+0x4b/0xa0
>  btrfs_read_tree_root+0x61/0x120
>  btrfs_get_root_ref+0x14b/0x600
>  find_parent_nodes+0x3e6/0x1b30
>  btrfs_find_all_roots_safe+0xb4/0x130
>  btrfs_find_all_roots+0x60/0x80
>  btrfs_qgroup_trace_extent_post+0x27/0x40
>  btrfs_add_delayed_data_ref+0x3fd/0x460
>  btrfs_free_extent+0x42/0x100
>  __btrfs_mod_ref+0x1d7/0x2f0
>  walk_up_proc+0x11c/0x400
>  walk_up_tree+0xf0/0x180
>  btrfs_drop_snapshot+0x1c7/0x780
>  ? btrfs_clean_one_deleted_snapshot+0x73/0x110
>  btrfs_clean_one_deleted_snapshot+0xfb/0x110
>  cleaner_kthread+0xd4/0x140
>  ? btrfs_alloc_root+0x50/0x50
>  kthread+0x13a/0x150
>  ? kthread_create_worker_on_cpu+0x40/0x40
>  ret_from_fork+0x1f/0x30
> BTRFS info (device sdb): disk space caching is enabled
> BTRFS info (device sdb): has skinny extents
>
> This happens because qgroups does a backref lookup when we create a
> delayed ref.  From there it may have to look up a root for an indirect
> ref, which goes through the normal lookup path on the tree_root and
> takes read locks on the tree_root nodes.
>
> To fix this we need to add a variant for looking up roots that searches
> the commit root of the tree_root.  Then when we do the backref search
> using the commit root we are sure to not take any locks on the tree_root
> nodes.  This gets rid of the lockdep splat when running btrfs/104.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Looks good, only the first comment line at the top of the new function
btrfs_get_fs_root_commit_root() doesn't follow the preferred style.

Other than that, it's fine as far as I can see.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> ---
>  fs/btrfs/backref.c | 14 +++++++-
>  fs/btrfs/disk-io.c | 79 ++++++++++++++++++++++++++++++++++------------
>  fs/btrfs/disk-io.h |  3 ++
>  3 files changed, 74 insertions(+), 22 deletions(-)
>
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index b3268f4ea5f3..cacba965c535 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -544,7 +544,19 @@ static int resolve_indirect_ref(struct btrfs_fs_info *fs_info,
>         int level = ref->level;
>         struct btrfs_key search_key = ref->key_for_search;
>
> -       root = btrfs_get_fs_root(fs_info, ref->root_id, false);
> +       /*
> +        * If we're search_commit_root we could possibly be holding locks on
> +        * other tree nodes.  This happens when qgroups does backref walks when
> +        * adding new delayed refs.  To deal with this we need to look in cache
> +        * for the root, and if we don't find it then we need to search the
> +        * tree_root's commit root, thus the btrfs_get_fs_root_commit_root usage
> +        * here.
> +        */
> +       if (path->search_commit_root)
> +               root = btrfs_get_fs_root_commit_root(fs_info, path,
> +                                                    ref->root_id);
> +       else
> +               root = btrfs_get_fs_root(fs_info, ref->root_id, false);
>         if (IS_ERR(root)) {
>                 ret = PTR_ERR(root);
>                 goto out_free;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 81e7b1880b5b..3972f16b333d 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1281,32 +1281,26 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans,
>         return 0;
>  }
>
> -struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
> -                                       struct btrfs_key *key)
> +static struct btrfs_root *read_tree_root_path(struct btrfs_root *tree_root,
> +                                             struct btrfs_path *path,
> +                                             struct btrfs_key *key)
>  {
>         struct btrfs_root *root;
>         struct btrfs_fs_info *fs_info = tree_root->fs_info;
> -       struct btrfs_path *path;
>         u64 generation;
>         int ret;
>         int level;
>
> -       path = btrfs_alloc_path();
> -       if (!path)
> -               return ERR_PTR(-ENOMEM);
> -
>         root = btrfs_alloc_root(fs_info, key->objectid, GFP_NOFS);
> -       if (!root) {
> -               ret = -ENOMEM;
> -               goto alloc_fail;
> -       }
> +       if (!root)
> +               return ERR_PTR(-ENOMEM);
>
>         ret = btrfs_find_root(tree_root, key, path,
>                               &root->root_item, &root->root_key);
>         if (ret) {
>                 if (ret > 0)
>                         ret = -ENOENT;
> -               goto find_fail;
> +               goto fail;
>         }
>
>         generation = btrfs_root_generation(&root->root_item);
> @@ -1317,21 +1311,30 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
>         if (IS_ERR(root->node)) {
>                 ret = PTR_ERR(root->node);
>                 root->node = NULL;
> -               goto find_fail;
> +               goto fail;
>         } else if (!btrfs_buffer_uptodate(root->node, generation, 0)) {
>                 ret = -EIO;
> -               goto find_fail;
> +               goto fail;
>         }
>         root->commit_root = btrfs_root_node(root);
> -out:
> -       btrfs_free_path(path);
>         return root;
> -
> -find_fail:
> +fail:
>         btrfs_put_root(root);
> -alloc_fail:
> -       root = ERR_PTR(ret);
> -       goto out;
> +       return ERR_PTR(ret);
> +}
> +
> +struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
> +                                       struct btrfs_key *key)
> +{
> +       struct btrfs_root *root;
> +       struct btrfs_path *path;
> +
> +       path = btrfs_alloc_path();
> +       if (!path)
> +               return ERR_PTR(-ENOMEM);
> +       root = read_tree_root_path(tree_root, path, key);
> +       btrfs_free_path(path);
> +       return root;
>  }
>
>  /*
> @@ -1621,6 +1624,40 @@ struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
>         return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
>  }
>
> +/* btrfs_get_fs_root_commit_root - return a root for the given objectid
> + * @fs_info - the fs_info.
> + * @objectid - the objectid we need to look up.
> + *
> + * This is exclusively used for backref walking, and exists specifically because
> + * of how qgroups does lookups.  Qgroups will do a backref lookup at delayed ref
> + * creation time, which means we may have to read the tree_root in order to look
> + * up a fs root that is not in memory.  If the root is not in memory we will
> + * read the tree root commit root and look up the fs root from there.  This is a
> + * temporary root, it will not be inserted into the radix tree as it doesn't
> + * have the most uptodate information, it'll simply be discarded once the
> + * backref code is finished using the root.
> + */
> +struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
> +                                                struct btrfs_path *path,
> +                                                u64 objectid)
> +{
> +       struct btrfs_root *root;
> +       struct btrfs_key key;
> +
> +       ASSERT(path->search_commit_root && path->skip_locking);
> +
> +       root = btrfs_lookup_fs_root(fs_info, objectid);
> +       if (root)
> +               return root;
> +
> +       key.objectid = objectid;
> +       key.type = BTRFS_ROOT_ITEM_KEY;
> +       key.offset = (u64)-1;
> +       root = read_tree_root_path(fs_info->tree_root, path, &key);
> +       btrfs_release_path(path);
> +       return root;
> +}
> +
>  /*
>   * called by the kthread helper functions to finally call the bio end_io
>   * functions.  This is where read checksum verification actually happens
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index fee69ced58b4..182540bdcea0 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -69,6 +69,9 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
>                                      u64 objectid, bool check_ref);
>  struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
>                                          u64 objectid, dev_t anon_dev);
> +struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
> +                                                struct btrfs_path *path,
> +                                                u64 objectid);
>
>  void btrfs_free_fs_info(struct btrfs_fs_info *fs_info);
>  int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info);
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
  2020-10-20 10:10   ` Filipe Manana
@ 2020-10-21 14:05     ` Josef Bacik
  2020-10-21 16:36       ` Filipe Manana
  0 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2020-10-21 14:05 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, kernel-team

On 10/20/20 6:10 AM, Filipe Manana wrote:
> On Tue, Oct 20, 2020 at 8:36 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> I got the following lockdep splat
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.9.0+ #101 Not tainted
>> ------------------------------------------------------
>> btrfs-cleaner/3445 is trying to acquire lock:
>> ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
>>
>> but task is already holding lock:
>> ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
>>         down_write+0x3d/0x70
>>         btrfs_cache_block_group+0x2d5/0x510
>>         find_free_extent+0xb6e/0x12f0
>>         btrfs_reserve_extent+0xb3/0x1b0
>>         btrfs_alloc_tree_block+0xb1/0x330
>>         alloc_tree_block_no_bg_flush+0x4f/0x60
>>         __btrfs_cow_block+0x11d/0x580
>>         btrfs_cow_block+0x10c/0x220
>>         commit_cowonly_roots+0x47/0x2e0
>>         btrfs_commit_transaction+0x595/0xbd0
>>         sync_filesystem+0x74/0x90
>>         generic_shutdown_super+0x22/0x100
>>         kill_anon_super+0x14/0x30
>>         btrfs_kill_super+0x12/0x20
>>         deactivate_locked_super+0x36/0xa0
>>         cleanup_mnt+0x12d/0x190
>>         task_work_run+0x5c/0xa0
>>         exit_to_user_mode_prepare+0x1df/0x200
>>         syscall_exit_to_user_mode+0x54/0x280
>>         entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> -> #1 (&space_info->groups_sem){++++}-{3:3}:
>>         down_read+0x40/0x130
>>         find_free_extent+0x2ed/0x12f0
>>         btrfs_reserve_extent+0xb3/0x1b0
>>         btrfs_alloc_tree_block+0xb1/0x330
>>         alloc_tree_block_no_bg_flush+0x4f/0x60
>>         __btrfs_cow_block+0x11d/0x580
>>         btrfs_cow_block+0x10c/0x220
>>         commit_cowonly_roots+0x47/0x2e0
>>         btrfs_commit_transaction+0x595/0xbd0
>>         sync_filesystem+0x74/0x90
>>         generic_shutdown_super+0x22/0x100
>>         kill_anon_super+0x14/0x30
>>         btrfs_kill_super+0x12/0x20
>>         deactivate_locked_super+0x36/0xa0
>>         cleanup_mnt+0x12d/0x190
>>         task_work_run+0x5c/0xa0
>>         exit_to_user_mode_prepare+0x1df/0x200
>>         syscall_exit_to_user_mode+0x54/0x280
>>         entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> -> #0 (btrfs-root-00){++++}-{3:3}:
>>         __lock_acquire+0x1167/0x2150
>>         lock_acquire+0xb9/0x3d0
>>         down_read_nested+0x43/0x130
>>         __btrfs_tree_read_lock+0x32/0x170
>>         __btrfs_read_lock_root_node+0x3a/0x50
>>         btrfs_search_slot+0x614/0x9d0
>>         btrfs_find_root+0x35/0x1b0
>>         btrfs_read_tree_root+0x61/0x120
>>         btrfs_get_root_ref+0x14b/0x600
>>         find_parent_nodes+0x3e6/0x1b30
>>         btrfs_find_all_roots_safe+0xb4/0x130
>>         btrfs_find_all_roots+0x60/0x80
>>         btrfs_qgroup_trace_extent_post+0x27/0x40
>>         btrfs_add_delayed_data_ref+0x3fd/0x460
>>         btrfs_free_extent+0x42/0x100
>>         __btrfs_mod_ref+0x1d7/0x2f0
>>         walk_up_proc+0x11c/0x400
>>         walk_up_tree+0xf0/0x180
>>         btrfs_drop_snapshot+0x1c7/0x780
>>         btrfs_clean_one_deleted_snapshot+0xfb/0x110
>>         cleaner_kthread+0xd4/0x140
>>         kthread+0x13a/0x150
>>         ret_from_fork+0x1f/0x30
>>
>> other info that might help us debug this:
>>
>> Chain exists of:
>>    btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
>>
>>   Possible unsafe locking scenario:
>>
>>         CPU0                    CPU1
>>         ----                    ----
>>    lock(&fs_info->commit_root_sem);
>>                                 lock(&space_info->groups_sem);
>>                                 lock(&fs_info->commit_root_sem);
>>    lock(btrfs-root-00);
>>
>>   *** DEADLOCK ***
>>
>> 3 locks held by btrfs-cleaner/3445:
>>   #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
>>   #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
>>   #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>>
>> stack backtrace:
>> CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
>> Call Trace:
>>   dump_stack+0x8b/0xb0
>>   check_noncircular+0xcf/0xf0
>>   __lock_acquire+0x1167/0x2150
>>   ? __bfs+0x42/0x210
>>   lock_acquire+0xb9/0x3d0
>>   ? __btrfs_tree_read_lock+0x32/0x170
>>   down_read_nested+0x43/0x130
>>   ? __btrfs_tree_read_lock+0x32/0x170
>>   __btrfs_tree_read_lock+0x32/0x170
>>   __btrfs_read_lock_root_node+0x3a/0x50
>>   btrfs_search_slot+0x614/0x9d0
>>   ? find_held_lock+0x2b/0x80
>>   btrfs_find_root+0x35/0x1b0
>>   ? do_raw_spin_unlock+0x4b/0xa0
>>   btrfs_read_tree_root+0x61/0x120
>>   btrfs_get_root_ref+0x14b/0x600
>>   find_parent_nodes+0x3e6/0x1b30
>>   btrfs_find_all_roots_safe+0xb4/0x130
>>   btrfs_find_all_roots+0x60/0x80
>>   btrfs_qgroup_trace_extent_post+0x27/0x40
>>   btrfs_add_delayed_data_ref+0x3fd/0x460
>>   btrfs_free_extent+0x42/0x100
>>   __btrfs_mod_ref+0x1d7/0x2f0
>>   walk_up_proc+0x11c/0x400
>>   walk_up_tree+0xf0/0x180
>>   btrfs_drop_snapshot+0x1c7/0x780
>>   ? btrfs_clean_one_deleted_snapshot+0x73/0x110
>>   btrfs_clean_one_deleted_snapshot+0xfb/0x110
>>   cleaner_kthread+0xd4/0x140
>>   ? btrfs_alloc_root+0x50/0x50
>>   kthread+0x13a/0x150
>>   ? kthread_create_worker_on_cpu+0x40/0x40
>>   ret_from_fork+0x1f/0x30
>>
>> while testing another lockdep fix.  This happens because we're using the
>> commit_root_sem to protect fs_info->caching_block_groups, which creates
>> a dependency on the groups_sem -> commit_root_sem, which is problematic
>> because we will allocate blocks while holding tree roots.  Fix this by
>> making the list itself protected by the fs_info->block_group_cache_lock.
>>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>   fs/btrfs/block-group.c | 12 ++++++------
>>   fs/btrfs/extent-tree.c |  2 ++
>>   2 files changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>> index 3ba6f3839d39..a8240913e6fc 100644
>> --- a/fs/btrfs/block-group.c
>> +++ b/fs/btrfs/block-group.c
>> @@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>>                  return 0;
>>          }
>>
>> -       down_write(&fs_info->commit_root_sem);
>> +       spin_lock(&fs_info->block_group_cache_lock);
>>          refcount_inc(&caching_ctl->count);
>>          list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
>> -       up_write(&fs_info->commit_root_sem);
>> +       spin_unlock(&fs_info->block_group_cache_lock);
>>
>>          btrfs_get_block_group(cache);
>>
>> @@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>          if (block_group->cached == BTRFS_CACHE_STARTED)
>>                  btrfs_wait_block_group_cache_done(block_group);
>>          if (block_group->has_caching_ctl) {
>> -               down_write(&fs_info->commit_root_sem);
>> +               spin_lock(&fs_info->block_group_cache_lock);
>>                  if (!caching_ctl) {
>>                          struct btrfs_caching_control *ctl;
>>
>> @@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>                  }
>>                  if (caching_ctl)
>>                          list_del_init(&caching_ctl->list);
>> -               up_write(&fs_info->commit_root_sem);
>> +               spin_unlock(&fs_info->block_group_cache_lock);
>>                  if (caching_ctl) {
>>                          /* Once for the caching bgs list and once for us. */
>>                          btrfs_put_caching_control(caching_ctl);
>> @@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>>          struct btrfs_caching_control *caching_ctl;
>>          struct rb_node *n;
>>
>> -       down_write(&info->commit_root_sem);
>> +       spin_lock(&info->block_group_cache_lock);
>>          while (!list_empty(&info->caching_block_groups)) {
>>                  caching_ctl = list_entry(info->caching_block_groups.next,
>>                                           struct btrfs_caching_control, list);
>>                  list_del(&caching_ctl->list);
>>                  btrfs_put_caching_control(caching_ctl);
>>          }
>> -       up_write(&info->commit_root_sem);
>> +       spin_unlock(&info->block_group_cache_lock);
>>
>>          spin_lock(&info->unused_bgs_lock);
>>          while (!list_empty(&info->unused_bgs)) {
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 5fd60b13f4f8..ff2b5c132a70 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
>>
>>          down_write(&fs_info->commit_root_sem);
>>
>> +       spin_lock(&fs_info->block_group_cache_lock);
> 
> Why are we still doing the down_write on the commit_root_sem? It
> doesn't seem necessary anymore (only for switching the commit roots
> now).

Because the commit_root_sem is doing something else here: it's protecting the
->last_byte_to_unpin field.  We have to coordinate the end of the transaction
with the caching threads, but the list itself just needs to be protected with a
spin_lock.
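
Roughly, the prepare side ends up looking like this with the patch (a
hand-waved sketch, only the two locking calls are literally what the patch
adds, the loop body is paraphrased from memory):

    down_write(&fs_info->commit_root_sem);

    spin_lock(&fs_info->block_group_cache_lock);
    list_for_each_entry_safe(caching_ctl, next,
                             &fs_info->caching_block_groups, list) {
            cache = caching_ctl->block_group;
            if (btrfs_block_group_done(cache)) {
                    /* Caching finished, unpin everything for this bg. */
                    cache->last_byte_to_unpin = (u64)-1;
                    list_del_init(&caching_ctl->list);
                    btrfs_put_caching_control(caching_ctl);
            } else {
                    /* Only unpin up to where caching has gotten so far. */
                    cache->last_byte_to_unpin = caching_ctl->progress;
            }
    }
    spin_unlock(&fs_info->block_group_cache_lock);

    up_write(&fs_info->commit_root_sem);

The list manipulation is all the spin lock has to cover; the rwsem is what
keeps ->last_byte_to_unpin stable for the caching threads across the commit.
Thanks,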

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups
  2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
  2020-10-20  9:31   ` Filipe Manana
@ 2020-10-21 15:19   ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2020-10-21 15:19 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Oct 19, 2020 at 04:02:29PM -0400, Josef Bacik wrote:
> When enabling qgroups we walk the tree_root and then add a qgroup item
> for every root that we have.  This creates a lock dependency on the
> tree_root and qgroup_root, which results in the following lockdep splat

This should mention that it's with the rwsem and that we've seen it on
btrfs/022 and btrfs/017.

The fact that it's with the rwsem is in the cover letter, but we need it in
the patches too.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
  2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
  2020-10-20 10:10   ` Filipe Manana
@ 2020-10-21 15:51   ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: David Sterba @ 2020-10-21 15:51 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Oct 19, 2020 at 04:02:30PM -0400, Josef Bacik wrote:
> while testing another lockdep fix.  This happens because we're using the
> commit_root_sem to protect fs_info->caching_block_groups, which creates
> a dependency on the groups_sem -> commit_root_sem, which is problematic
> because we will allocate blocks while holding tree roots.  Fix this by
> making the list itself protected by the fs_info->block_group_cache_lock.

That's lacking an explanation of why this is supposed to be correct. Replacing
a semaphore with a spin lock has implications, and given that the commit root
sem does not protect a single structure and is sometimes used for context
exclusion, this requires more in the changelog and perhaps updated comments
for the locks.
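
Even just a comment on the lock declaration would help, something like (a
rough sketch, not a demand for this exact wording):

    /*
     * Protects the block group cache rbtree (block_group_cache_tree) and,
     * with this patch, the caching_block_groups list as well.
     */
    spinlock_t block_group_cache_lock;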

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/3] Lockdep fixes for misc-next
  2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
                   ` (2 preceding siblings ...)
  2020-10-19 20:02 ` [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup Josef Bacik
@ 2020-10-21 16:13 ` David Sterba
  3 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2020-10-21 16:13 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Mon, Oct 19, 2020 at 04:02:28PM -0400, Josef Bacik wrote:
> Hello,
> 
> Here's a few lockdep fixes for misc-next+my rwsem patch.  Nothing too crazy, but
> the last one is a little wonkey because qgroups does a backref resolution while
> adding delayed refs.  Generally we do the right thing with searching the commit
> roots and skipping the locking, with the exception of looking up fs roots if we
> have to resolve indirect refs.  This obviously uses the normal lookup and
> locking stuff, which is problematic in the new world order.  For now I'm fixing
> it with a special helper for backref lookups that either finds the root in
> cache, or generates a temporary root that's not inserted into the fs roots radix
> tree and is only used to do the backref resolution.  Thanks,
> 
> Josef
> 
> Josef Bacik (3):
>   btrfs: drop the path before adding qgroup items when enabling qgroups
>   btrfs: protect the fs_info->caching_block_groups differently
>   btrfs: add a helper to read the tree_root commit root for backref
>     lookup

I'll add 1 and 3 to misc-next for now.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
  2020-10-21 14:05     ` Josef Bacik
@ 2020-10-21 16:36       ` Filipe Manana
  2020-10-21 18:44         ` Josef Bacik
  0 siblings, 1 reply; 13+ messages in thread
From: Filipe Manana @ 2020-10-21 16:36 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Oct 21, 2020 at 3:05 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On 10/20/20 6:10 AM, Filipe Manana wrote:
> > On Tue, Oct 20, 2020 at 8:36 AM Josef Bacik <josef@toxicpanda.com> wrote:
> >>
> >> I got the following lockdep splat
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 5.9.0+ #101 Not tainted
> >> ------------------------------------------------------
> >> btrfs-cleaner/3445 is trying to acquire lock:
> >> ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
> >>
> >> but task is already holding lock:
> >> ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
> >>
> >> which lock already depends on the new lock.
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
> >>         down_write+0x3d/0x70
> >>         btrfs_cache_block_group+0x2d5/0x510
> >>         find_free_extent+0xb6e/0x12f0
> >>         btrfs_reserve_extent+0xb3/0x1b0
> >>         btrfs_alloc_tree_block+0xb1/0x330
> >>         alloc_tree_block_no_bg_flush+0x4f/0x60
> >>         __btrfs_cow_block+0x11d/0x580
> >>         btrfs_cow_block+0x10c/0x220
> >>         commit_cowonly_roots+0x47/0x2e0
> >>         btrfs_commit_transaction+0x595/0xbd0
> >>         sync_filesystem+0x74/0x90
> >>         generic_shutdown_super+0x22/0x100
> >>         kill_anon_super+0x14/0x30
> >>         btrfs_kill_super+0x12/0x20
> >>         deactivate_locked_super+0x36/0xa0
> >>         cleanup_mnt+0x12d/0x190
> >>         task_work_run+0x5c/0xa0
> >>         exit_to_user_mode_prepare+0x1df/0x200
> >>         syscall_exit_to_user_mode+0x54/0x280
> >>         entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>
> >> -> #1 (&space_info->groups_sem){++++}-{3:3}:
> >>         down_read+0x40/0x130
> >>         find_free_extent+0x2ed/0x12f0
> >>         btrfs_reserve_extent+0xb3/0x1b0
> >>         btrfs_alloc_tree_block+0xb1/0x330
> >>         alloc_tree_block_no_bg_flush+0x4f/0x60
> >>         __btrfs_cow_block+0x11d/0x580
> >>         btrfs_cow_block+0x10c/0x220
> >>         commit_cowonly_roots+0x47/0x2e0
> >>         btrfs_commit_transaction+0x595/0xbd0
> >>         sync_filesystem+0x74/0x90
> >>         generic_shutdown_super+0x22/0x100
> >>         kill_anon_super+0x14/0x30
> >>         btrfs_kill_super+0x12/0x20
> >>         deactivate_locked_super+0x36/0xa0
> >>         cleanup_mnt+0x12d/0x190
> >>         task_work_run+0x5c/0xa0
> >>         exit_to_user_mode_prepare+0x1df/0x200
> >>         syscall_exit_to_user_mode+0x54/0x280
> >>         entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>
> >> -> #0 (btrfs-root-00){++++}-{3:3}:
> >>         __lock_acquire+0x1167/0x2150
> >>         lock_acquire+0xb9/0x3d0
> >>         down_read_nested+0x43/0x130
> >>         __btrfs_tree_read_lock+0x32/0x170
> >>         __btrfs_read_lock_root_node+0x3a/0x50
> >>         btrfs_search_slot+0x614/0x9d0
> >>         btrfs_find_root+0x35/0x1b0
> >>         btrfs_read_tree_root+0x61/0x120
> >>         btrfs_get_root_ref+0x14b/0x600
> >>         find_parent_nodes+0x3e6/0x1b30
> >>         btrfs_find_all_roots_safe+0xb4/0x130
> >>         btrfs_find_all_roots+0x60/0x80
> >>         btrfs_qgroup_trace_extent_post+0x27/0x40
> >>         btrfs_add_delayed_data_ref+0x3fd/0x460
> >>         btrfs_free_extent+0x42/0x100
> >>         __btrfs_mod_ref+0x1d7/0x2f0
> >>         walk_up_proc+0x11c/0x400
> >>         walk_up_tree+0xf0/0x180
> >>         btrfs_drop_snapshot+0x1c7/0x780
> >>         btrfs_clean_one_deleted_snapshot+0xfb/0x110
> >>         cleaner_kthread+0xd4/0x140
> >>         kthread+0x13a/0x150
> >>         ret_from_fork+0x1f/0x30
> >>
> >> other info that might help us debug this:
> >>
> >> Chain exists of:
> >>    btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
> >>
> >>   Possible unsafe locking scenario:
> >>
> >>         CPU0                    CPU1
> >>         ----                    ----
> >>    lock(&fs_info->commit_root_sem);
> >>                                 lock(&space_info->groups_sem);
> >>                                 lock(&fs_info->commit_root_sem);
> >>    lock(btrfs-root-00);
> >>
> >>   *** DEADLOCK ***
> >>
> >> 3 locks held by btrfs-cleaner/3445:
> >>   #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
> >>   #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
> >>   #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
> >>
> >> stack backtrace:
> >> CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
> >> Call Trace:
> >>   dump_stack+0x8b/0xb0
> >>   check_noncircular+0xcf/0xf0
> >>   __lock_acquire+0x1167/0x2150
> >>   ? __bfs+0x42/0x210
> >>   lock_acquire+0xb9/0x3d0
> >>   ? __btrfs_tree_read_lock+0x32/0x170
> >>   down_read_nested+0x43/0x130
> >>   ? __btrfs_tree_read_lock+0x32/0x170
> >>   __btrfs_tree_read_lock+0x32/0x170
> >>   __btrfs_read_lock_root_node+0x3a/0x50
> >>   btrfs_search_slot+0x614/0x9d0
> >>   ? find_held_lock+0x2b/0x80
> >>   btrfs_find_root+0x35/0x1b0
> >>   ? do_raw_spin_unlock+0x4b/0xa0
> >>   btrfs_read_tree_root+0x61/0x120
> >>   btrfs_get_root_ref+0x14b/0x600
> >>   find_parent_nodes+0x3e6/0x1b30
> >>   btrfs_find_all_roots_safe+0xb4/0x130
> >>   btrfs_find_all_roots+0x60/0x80
> >>   btrfs_qgroup_trace_extent_post+0x27/0x40
> >>   btrfs_add_delayed_data_ref+0x3fd/0x460
> >>   btrfs_free_extent+0x42/0x100
> >>   __btrfs_mod_ref+0x1d7/0x2f0
> >>   walk_up_proc+0x11c/0x400
> >>   walk_up_tree+0xf0/0x180
> >>   btrfs_drop_snapshot+0x1c7/0x780
> >>   ? btrfs_clean_one_deleted_snapshot+0x73/0x110
> >>   btrfs_clean_one_deleted_snapshot+0xfb/0x110
> >>   cleaner_kthread+0xd4/0x140
> >>   ? btrfs_alloc_root+0x50/0x50
> >>   kthread+0x13a/0x150
> >>   ? kthread_create_worker_on_cpu+0x40/0x40
> >>   ret_from_fork+0x1f/0x30
> >>
> >> while testing another lockdep fix.  This happens because we're using the
> >> commit_root_sem to protect fs_info->caching_block_groups, which creates
> >> a dependency on the groups_sem -> commit_root_sem, which is problematic
> >> because we will allocate blocks while holding tree roots.  Fix this by
> >> making the list itself protected by the fs_info->block_group_cache_lock.
> >>
> >> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> >> ---
> >>   fs/btrfs/block-group.c | 12 ++++++------
> >>   fs/btrfs/extent-tree.c |  2 ++
> >>   2 files changed, 8 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> >> index 3ba6f3839d39..a8240913e6fc 100644
> >> --- a/fs/btrfs/block-group.c
> >> +++ b/fs/btrfs/block-group.c
> >> @@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
> >>                  return 0;
> >>          }
> >>
> >> -       down_write(&fs_info->commit_root_sem);
> >> +       spin_lock(&fs_info->block_group_cache_lock);
> >>          refcount_inc(&caching_ctl->count);
> >>          list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
> >> -       up_write(&fs_info->commit_root_sem);
> >> +       spin_unlock(&fs_info->block_group_cache_lock);
> >>
> >>          btrfs_get_block_group(cache);
> >>
> >> @@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> >>          if (block_group->cached == BTRFS_CACHE_STARTED)
> >>                  btrfs_wait_block_group_cache_done(block_group);
> >>          if (block_group->has_caching_ctl) {
> >> -               down_write(&fs_info->commit_root_sem);
> >> +               spin_lock(&fs_info->block_group_cache_lock);
> >>                  if (!caching_ctl) {
> >>                          struct btrfs_caching_control *ctl;
> >>
> >> @@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> >>                  }
> >>                  if (caching_ctl)
> >>                          list_del_init(&caching_ctl->list);
> >> -               up_write(&fs_info->commit_root_sem);
> >> +               spin_unlock(&fs_info->block_group_cache_lock);
> >>                  if (caching_ctl) {
> >>                          /* Once for the caching bgs list and once for us. */
> >>                          btrfs_put_caching_control(caching_ctl);
> >> @@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
> >>          struct btrfs_caching_control *caching_ctl;
> >>          struct rb_node *n;
> >>
> >> -       down_write(&info->commit_root_sem);
> >> +       spin_lock(&info->block_group_cache_lock);
> >>          while (!list_empty(&info->caching_block_groups)) {
> >>                  caching_ctl = list_entry(info->caching_block_groups.next,
> >>                                           struct btrfs_caching_control, list);
> >>                  list_del(&caching_ctl->list);
> >>                  btrfs_put_caching_control(caching_ctl);
> >>          }
> >> -       up_write(&info->commit_root_sem);
> >> +       spin_unlock(&info->block_group_cache_lock);
> >>
> >>          spin_lock(&info->unused_bgs_lock);
> >>          while (!list_empty(&info->unused_bgs)) {
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index 5fd60b13f4f8..ff2b5c132a70 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
> >>
> >>          down_write(&fs_info->commit_root_sem);
> >>
> >> +       spin_lock(&fs_info->block_group_cache_lock);
> >
> > Why are we still doing the down_write on the commit_root_sem? It
> > doesn't seem necessary anymore (only for switching the commit roots
> > now).
>
> Because the commit_root_sem is doing something else here, it's protecting the
> ->last_byte_to_unpin field.  We have to coordinate the end of the transaction
> with the caching threads, but the list itself just needs to be protected with a
> spin_lock.  Thanks,

It's not clear what protects last_byte_to_unpin.
We have several places where we read or write to it without locking
commit_root_sem.

unpin_extent_range() reads it, called from finish_extent_commit(), but
doesn't take the lock - even though it's very unlikely to have races
in this case.
read_one_block_group() also reads it without commit_root_sem, however
this is only called during mount time, so it should be safe.
btrfs_cache_block_group() sets it without holding commit_root_sem, but
while holding the block group's spinlock.
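
So today the pattern is roughly the following (a sketch of the call sites
above, not the exact code):

    /* unpin_extent_range(), called from finish_extent_commit(), no lock: */
    if (start < cache->last_byte_to_unpin)
            len = min(len, cache->last_byte_to_unpin - start);

    /* btrfs_cache_block_group(), under the block group's spinlock: */
    spin_lock(&cache->lock);
    cache->last_byte_to_unpin = (u64)-1;
    spin_unlock(&cache->lock);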

It seems more intuitive to use the block group's spinlock to protect it.
And having a comment on top of the declaration of 'last_byte_to_unpin'
mentioning which lock protects it would be nice to have, or a comment
on top of the lock mentioning what it protects (or both), even if that
isn't strictly related to this patch.

We're highly inconsistent writing to and reading from
block_group->last_byte_to_unpin.

Why can't we just always use the block group's spinlock?

Anyway, it's not like this patch introduces anything wrong, so:

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.


>
> Josef



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently
  2020-10-21 16:36       ` Filipe Manana
@ 2020-10-21 18:44         ` Josef Bacik
  0 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2020-10-21 18:44 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs, kernel-team

On 10/21/20 12:36 PM, Filipe Manana wrote:
> On Wed, Oct 21, 2020 at 3:05 PM Josef Bacik <josef@toxicpanda.com> wrote:
>>
>> On 10/20/20 6:10 AM, Filipe Manana wrote:
>>> On Tue, Oct 20, 2020 at 8:36 AM Josef Bacik <josef@toxicpanda.com> wrote:
>>>>
>>>> I got the following lockdep splat
>>>>
>>>> ======================================================
>>>> WARNING: possible circular locking dependency detected
>>>> 5.9.0+ #101 Not tainted
>>>> ------------------------------------------------------
>>>> btrfs-cleaner/3445 is trying to acquire lock:
>>>> ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
>>>>
>>>> but task is already holding lock:
>>>> ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>>>>
>>>> which lock already depends on the new lock.
>>>>
>>>> the existing dependency chain (in reverse order) is:
>>>>
>>>> -> #2 (&fs_info->commit_root_sem){++++}-{3:3}:
>>>>          down_write+0x3d/0x70
>>>>          btrfs_cache_block_group+0x2d5/0x510
>>>>          find_free_extent+0xb6e/0x12f0
>>>>          btrfs_reserve_extent+0xb3/0x1b0
>>>>          btrfs_alloc_tree_block+0xb1/0x330
>>>>          alloc_tree_block_no_bg_flush+0x4f/0x60
>>>>          __btrfs_cow_block+0x11d/0x580
>>>>          btrfs_cow_block+0x10c/0x220
>>>>          commit_cowonly_roots+0x47/0x2e0
>>>>          btrfs_commit_transaction+0x595/0xbd0
>>>>          sync_filesystem+0x74/0x90
>>>>          generic_shutdown_super+0x22/0x100
>>>>          kill_anon_super+0x14/0x30
>>>>          btrfs_kill_super+0x12/0x20
>>>>          deactivate_locked_super+0x36/0xa0
>>>>          cleanup_mnt+0x12d/0x190
>>>>          task_work_run+0x5c/0xa0
>>>>          exit_to_user_mode_prepare+0x1df/0x200
>>>>          syscall_exit_to_user_mode+0x54/0x280
>>>>          entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>
>>>> -> #1 (&space_info->groups_sem){++++}-{3:3}:
>>>>          down_read+0x40/0x130
>>>>          find_free_extent+0x2ed/0x12f0
>>>>          btrfs_reserve_extent+0xb3/0x1b0
>>>>          btrfs_alloc_tree_block+0xb1/0x330
>>>>          alloc_tree_block_no_bg_flush+0x4f/0x60
>>>>          __btrfs_cow_block+0x11d/0x580
>>>>          btrfs_cow_block+0x10c/0x220
>>>>          commit_cowonly_roots+0x47/0x2e0
>>>>          btrfs_commit_transaction+0x595/0xbd0
>>>>          sync_filesystem+0x74/0x90
>>>>          generic_shutdown_super+0x22/0x100
>>>>          kill_anon_super+0x14/0x30
>>>>          btrfs_kill_super+0x12/0x20
>>>>          deactivate_locked_super+0x36/0xa0
>>>>          cleanup_mnt+0x12d/0x190
>>>>          task_work_run+0x5c/0xa0
>>>>          exit_to_user_mode_prepare+0x1df/0x200
>>>>          syscall_exit_to_user_mode+0x54/0x280
>>>>          entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>
>>>> -> #0 (btrfs-root-00){++++}-{3:3}:
>>>>          __lock_acquire+0x1167/0x2150
>>>>          lock_acquire+0xb9/0x3d0
>>>>          down_read_nested+0x43/0x130
>>>>          __btrfs_tree_read_lock+0x32/0x170
>>>>          __btrfs_read_lock_root_node+0x3a/0x50
>>>>          btrfs_search_slot+0x614/0x9d0
>>>>          btrfs_find_root+0x35/0x1b0
>>>>          btrfs_read_tree_root+0x61/0x120
>>>>          btrfs_get_root_ref+0x14b/0x600
>>>>          find_parent_nodes+0x3e6/0x1b30
>>>>          btrfs_find_all_roots_safe+0xb4/0x130
>>>>          btrfs_find_all_roots+0x60/0x80
>>>>          btrfs_qgroup_trace_extent_post+0x27/0x40
>>>>          btrfs_add_delayed_data_ref+0x3fd/0x460
>>>>          btrfs_free_extent+0x42/0x100
>>>>          __btrfs_mod_ref+0x1d7/0x2f0
>>>>          walk_up_proc+0x11c/0x400
>>>>          walk_up_tree+0xf0/0x180
>>>>          btrfs_drop_snapshot+0x1c7/0x780
>>>>          btrfs_clean_one_deleted_snapshot+0xfb/0x110
>>>>          cleaner_kthread+0xd4/0x140
>>>>          kthread+0x13a/0x150
>>>>          ret_from_fork+0x1f/0x30
>>>>
>>>> other info that might help us debug this:
>>>>
>>>> Chain exists of:
>>>>     btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem
>>>>
>>>>    Possible unsafe locking scenario:
>>>>
>>>>          CPU0                    CPU1
>>>>          ----                    ----
>>>>     lock(&fs_info->commit_root_sem);
>>>>                                  lock(&space_info->groups_sem);
>>>>                                  lock(&fs_info->commit_root_sem);
>>>>     lock(btrfs-root-00);
>>>>
>>>>    *** DEADLOCK ***
>>>>
>>>> 3 locks held by btrfs-cleaner/3445:
>>>>    #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
>>>>    #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
>>>>    #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
>>>>
>>>> stack backtrace:
>>>> CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
>>>> Call Trace:
>>>>    dump_stack+0x8b/0xb0
>>>>    check_noncircular+0xcf/0xf0
>>>>    __lock_acquire+0x1167/0x2150
>>>>    ? __bfs+0x42/0x210
>>>>    lock_acquire+0xb9/0x3d0
>>>>    ? __btrfs_tree_read_lock+0x32/0x170
>>>>    down_read_nested+0x43/0x130
>>>>    ? __btrfs_tree_read_lock+0x32/0x170
>>>>    __btrfs_tree_read_lock+0x32/0x170
>>>>    __btrfs_read_lock_root_node+0x3a/0x50
>>>>    btrfs_search_slot+0x614/0x9d0
>>>>    ? find_held_lock+0x2b/0x80
>>>>    btrfs_find_root+0x35/0x1b0
>>>>    ? do_raw_spin_unlock+0x4b/0xa0
>>>>    btrfs_read_tree_root+0x61/0x120
>>>>    btrfs_get_root_ref+0x14b/0x600
>>>>    find_parent_nodes+0x3e6/0x1b30
>>>>    btrfs_find_all_roots_safe+0xb4/0x130
>>>>    btrfs_find_all_roots+0x60/0x80
>>>>    btrfs_qgroup_trace_extent_post+0x27/0x40
>>>>    btrfs_add_delayed_data_ref+0x3fd/0x460
>>>>    btrfs_free_extent+0x42/0x100
>>>>    __btrfs_mod_ref+0x1d7/0x2f0
>>>>    walk_up_proc+0x11c/0x400
>>>>    walk_up_tree+0xf0/0x180
>>>>    btrfs_drop_snapshot+0x1c7/0x780
>>>>    ? btrfs_clean_one_deleted_snapshot+0x73/0x110
>>>>    btrfs_clean_one_deleted_snapshot+0xfb/0x110
>>>>    cleaner_kthread+0xd4/0x140
>>>>    ? btrfs_alloc_root+0x50/0x50
>>>>    kthread+0x13a/0x150
>>>>    ? kthread_create_worker_on_cpu+0x40/0x40
>>>>    ret_from_fork+0x1f/0x30
>>>>
>>>> while testing another lockdep fix.  This happens because we're using the
>>>> commit_root_sem to protect fs_info->caching_block_groups, which creates
>>>> a dependency on the groups_sem -> commit_root_sem, which is problematic
>>>> because we will allocate blocks while holding tree roots.  Fix this by
>>>> making the list itself protected by the fs_info->block_group_cache_lock.
>>>>
>>>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>>>> ---
>>>>    fs/btrfs/block-group.c | 12 ++++++------
>>>>    fs/btrfs/extent-tree.c |  2 ++
>>>>    2 files changed, 8 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
>>>> index 3ba6f3839d39..a8240913e6fc 100644
>>>> --- a/fs/btrfs/block-group.c
>>>> +++ b/fs/btrfs/block-group.c
>>>> @@ -796,10 +796,10 @@ int btrfs_cache_block_group(struct btrfs_block_group *cache, int load_cache_only
>>>>                   return 0;
>>>>           }
>>>>
>>>> -       down_write(&fs_info->commit_root_sem);
>>>> +       spin_lock(&fs_info->block_group_cache_lock);
>>>>           refcount_inc(&caching_ctl->count);
>>>>           list_add_tail(&caching_ctl->list, &fs_info->caching_block_groups);
>>>> -       up_write(&fs_info->commit_root_sem);
>>>> +       spin_unlock(&fs_info->block_group_cache_lock);
>>>>
>>>>           btrfs_get_block_group(cache);
>>>>
>>>> @@ -1043,7 +1043,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>>>           if (block_group->cached == BTRFS_CACHE_STARTED)
>>>>                   btrfs_wait_block_group_cache_done(block_group);
>>>>           if (block_group->has_caching_ctl) {
>>>> -               down_write(&fs_info->commit_root_sem);
>>>> +               spin_lock(&fs_info->block_group_cache_lock);
>>>>                   if (!caching_ctl) {
>>>>                           struct btrfs_caching_control *ctl;
>>>>
>>>> @@ -1057,7 +1057,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
>>>>                   }
>>>>                   if (caching_ctl)
>>>>                           list_del_init(&caching_ctl->list);
>>>> -               up_write(&fs_info->commit_root_sem);
>>>> +               spin_unlock(&fs_info->block_group_cache_lock);
>>>>                   if (caching_ctl) {
>>>>                           /* Once for the caching bgs list and once for us. */
>>>>                           btrfs_put_caching_control(caching_ctl);
>>>> @@ -3307,14 +3307,14 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
>>>>           struct btrfs_caching_control *caching_ctl;
>>>>           struct rb_node *n;
>>>>
>>>> -       down_write(&info->commit_root_sem);
>>>> +       spin_lock(&info->block_group_cache_lock);
>>>>           while (!list_empty(&info->caching_block_groups)) {
>>>>                   caching_ctl = list_entry(info->caching_block_groups.next,
>>>>                                            struct btrfs_caching_control, list);
>>>>                   list_del(&caching_ctl->list);
>>>>                   btrfs_put_caching_control(caching_ctl);
>>>>           }
>>>> -       up_write(&info->commit_root_sem);
>>>> +       spin_unlock(&info->block_group_cache_lock);
>>>>
>>>>           spin_lock(&info->unused_bgs_lock);
>>>>           while (!list_empty(&info->unused_bgs)) {
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 5fd60b13f4f8..ff2b5c132a70 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -2738,6 +2738,7 @@ void btrfs_prepare_extent_commit(struct btrfs_fs_info *fs_info)
>>>>
>>>>           down_write(&fs_info->commit_root_sem);
>>>>
>>>> +       spin_lock(&fs_info->block_group_cache_lock);
>>>
>>> Why are we still doing the down_write on the commit_root_sem? It
>>> doesn't seem necessary anymore (only for switching the commit roots
>>> now).
>>
>> Because the commit_root_sem is doing something else here, it's protecting the
>> ->last_byte_to_unpin field.  We have to coordinate the end of the transaction
>> with the caching threads, but the list itself just needs to be protected with a
>> spin_lock.  Thanks,
> 
> It's not clear what protects last_byte_to_unpin.
> We have several places where we read or write to it without locking
> commit_root_sem.
> 
> unpin_extent_range() reads it, called from finish_extent_commit(), but
> doesn't take the lock - even though it's very unlikely to have races
> in this case.
> read_on_block_group() also reads it without commit_root_sem, however
> this is only called during mount time, so it should be safe.
> btrfs_cache_block_group() sets it without holding commit_root_sem, but
> while holding the block group's spinlock.
> 
> It seems more intuitive to use the block group's spinlock to protect it.
> And having a comment on top of the declaration of 'last_byte_to_unpin'
> mentioning which lock protects it would be nice to have, or a comment
> on top of the lock mentioning what it protects (or both), even if that
> isn't strictly related to this patch.
> 
> We're highly inconsistent writing to and reading from
> block_group->last_byte_to_unpin.

last_byte_to_unpin needs to sync up with the swapping of the commit roots,
otherwise we end up with leaked space.  Currently we run finish_extent_commit()
in the transaction's context, which means last_byte_to_unpin will never change
while we're running, because the next transaction has to wait for us to finish
our work.  It only needs to be synchronized with the caching threads, which is
what the commit_root_sem controls; after that we are free to read it and trust
that it won't change.

However, in documenting this I realized there's a problem: we actually do the
prepare first and then switch the commit roots, so there's a window in which we
could make progress and end up with a stale unpin setting, which will result in
leaked space.  I'll fix that up and resend.
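
To make the ordering concrete, this is roughly what the commit path does today
(a call sequence sketch, not the actual code):

    btrfs_commit_transaction()
        ...
        btrfs_prepare_extent_commit()
            /* snapshots the caching progress into ->last_byte_to_unpin */
        ...
        /* window: caching threads can still advance past that snapshot */
        switch_commit_roots()
        ...
        btrfs_finish_extent_commit()
            unpin_extent_range()
                /* only adds back free space below the now stale
                 * ->last_byte_to_unpin
                 */

Thanks,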

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread

Thread overview: 13+ messages
2020-10-19 20:02 [PATCH 0/3] Lockdep fixes for misc-next Josef Bacik
2020-10-19 20:02 ` [PATCH 1/3] btrfs: drop the path before adding qgroup items when enabling qgroups Josef Bacik
2020-10-20  9:31   ` Filipe Manana
2020-10-21 15:19   ` David Sterba
2020-10-19 20:02 ` [PATCH 2/3] btrfs: protect the fs_info->caching_block_groups differently Josef Bacik
2020-10-20 10:10   ` Filipe Manana
2020-10-21 14:05     ` Josef Bacik
2020-10-21 16:36       ` Filipe Manana
2020-10-21 18:44         ` Josef Bacik
2020-10-21 15:51   ` David Sterba
2020-10-19 20:02 ` [PATCH 3/3] btrfs: add a helper to read the tree_root commit root for backref lookup Josef Bacik
2020-10-20 11:06   ` Filipe Manana
2020-10-21 16:13 ` [PATCH 0/3] Lockdep fixes for misc-next David Sterba
