* [PATCH 0/4][v2] Lockdep fixes
@ 2020-09-01 21:40 Josef Bacik
  2020-09-01 21:40 ` [PATCH 1/4] btrfs: fix lockdep splat in add_missing_dev Josef Bacik
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Josef Bacik @ 2020-09-01 21:40 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v1->v2:
- Included the add_missing_dev patch in the series.
- Added a patch to kill the rcu protection for fs_info->space_info.
- Fixed the raid sysfs init stuff to be completely out of link_block_group, as
  it causes a lockdep splat with the rwsem conversion.

Hello,

These are the last two lockdep splats I'm able to see in my testing.  We have
like 4 variations of the same lockdep splat that's addressed by 

btrfs: do not create raid sysfs entries under chunk_mutex

Basically this particular dependency pulls in the kernfs_mutex under the
chunk_mutex, and so we have like 4 issues in github with slightly different
splats, but they are all fixed by that fix.  With these two patches (and the one I
sent the other day for add_missing_dev) I haven't hit any lockdep splats in 6
runs of xfstests on 3 different VMs in the last 12 hours.  That means it should
take Dave at least 2 runs before he hits a new one.  Thanks,

Josef Bacik (4):
  btrfs: fix lockdep splat in add_missing_dev
  btrfs: init sysfs for devices outside of the chunk_mutex
  btrfs: kill the rcu protection for fs_info->space_info
  btrfs: do not create raid sysfs entries under any locks

 fs/btrfs/block-group.c | 47 ++++++++++++++++++++++++------------------
 fs/btrfs/ioctl.c       | 10 ++-------
 fs/btrfs/space-info.c  | 14 ++++---------
 fs/btrfs/super.c       |  5 +----
 fs/btrfs/sysfs.c       | 25 ++++++++++++++++++++--
 fs/btrfs/volumes.c     | 17 ++++++++++++---
 6 files changed, 71 insertions(+), 47 deletions(-)

-- 
2.26.2



* [PATCH 1/4] btrfs: fix lockdep splat in add_missing_dev
  2020-09-01 21:40 [PATCH 0/4][v2] Lockdep fixes Josef Bacik
@ 2020-09-01 21:40 ` Josef Bacik
  2020-09-02  6:23   ` Anand Jain
  2020-09-03 11:17   ` David Sterba
  2020-09-01 21:40 ` [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex Josef Bacik
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: Josef Bacik @ 2020-09-01 21:40 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Nikolay reported a lockdep splat that I could reproduce with btrfs/187.

======================================================
WARNING: possible circular locking dependency detected
5.9.0-rc2+ #1 Tainted: G        W
------------------------------------------------------
kswapd0/100 is trying to acquire lock:
ffff9e8ef38b6268 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330

but task is already holding lock:
ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (fs_reclaim){+.+.}-{0:0}:
       fs_reclaim_acquire+0x65/0x80
       slab_pre_alloc_hook.constprop.0+0x20/0x200
       kmem_cache_alloc_trace+0x3a/0x1a0
       btrfs_alloc_device+0x43/0x210
       add_missing_dev+0x20/0x90
       read_one_chunk+0x301/0x430
       btrfs_read_sys_array+0x17b/0x1b0
       open_ctree+0xa62/0x1896
       btrfs_mount_root.cold+0x12/0xea
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       vfs_kern_mount.part.0+0x71/0xb0
       btrfs_mount+0x10d/0x379
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       path_mount+0x434/0xc00
       __x64_sys_mount+0xe3/0x120
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7e/0x7e0
       btrfs_chunk_alloc+0x125/0x3a0
       find_free_extent+0xdf6/0x1210
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb0/0x310
       alloc_tree_block_no_bg_flush+0x4a/0x60
       __btrfs_cow_block+0x11a/0x530
       btrfs_cow_block+0x104/0x220
       btrfs_search_slot+0x52e/0x9d0
       btrfs_lookup_inode+0x2a/0x8f
       __btrfs_update_delayed_inode+0x80/0x240
       btrfs_commit_inode_delayed_inode+0x119/0x120
       btrfs_evict_inode+0x357/0x500
       evict+0xcf/0x1f0
       vfs_rmdir.part.0+0x149/0x160
       do_rmdir+0x136/0x1a0
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&delayed_node->mutex){+.+.}-{3:3}:
       __lock_acquire+0x1184/0x1fa0
       lock_acquire+0xa4/0x3d0
       __mutex_lock+0x7e/0x7e0
       __btrfs_release_delayed_node.part.0+0x3f/0x330
       btrfs_evict_inode+0x24c/0x500
       evict+0xcf/0x1f0
       dispose_list+0x48/0x70
       prune_icache_sb+0x44/0x50
       super_cache_scan+0x161/0x1e0
       do_shrink_slab+0x178/0x3c0
       shrink_slab+0x17c/0x290
       shrink_node+0x2b2/0x6d0
       balance_pgdat+0x30a/0x670
       kswapd+0x213/0x4c0
       kthread+0x138/0x160
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

Chain exists of:
  &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(&fs_info->chunk_mutex);
                               lock(fs_reclaim);
  lock(&delayed_node->mutex);

 *** DEADLOCK ***

3 locks held by kswapd0/100:
 #0: ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
 #1: ffffffffa9d65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
 #2: ffff9e8e9da260e0 (&type->s_umount_key#48){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0

stack backtrace:
CPU: 1 PID: 100 Comm: kswapd0 Tainted: G        W         5.9.0-rc2+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x92/0xc8
 check_noncircular+0x12d/0x150
 __lock_acquire+0x1184/0x1fa0
 lock_acquire+0xa4/0x3d0
 ? __btrfs_release_delayed_node.part.0+0x3f/0x330
 __mutex_lock+0x7e/0x7e0
 ? __btrfs_release_delayed_node.part.0+0x3f/0x330
 ? __btrfs_release_delayed_node.part.0+0x3f/0x330
 ? lock_acquire+0xa4/0x3d0
 ? btrfs_evict_inode+0x11e/0x500
 ? find_held_lock+0x2b/0x80
 __btrfs_release_delayed_node.part.0+0x3f/0x330
 btrfs_evict_inode+0x24c/0x500
 evict+0xcf/0x1f0
 dispose_list+0x48/0x70
 prune_icache_sb+0x44/0x50
 super_cache_scan+0x161/0x1e0
 do_shrink_slab+0x178/0x3c0
 shrink_slab+0x17c/0x290
 shrink_node+0x2b2/0x6d0
 balance_pgdat+0x30a/0x670
 kswapd+0x213/0x4c0
 ? _raw_spin_unlock_irqrestore+0x46/0x60
 ? add_wait_queue_exclusive+0x70/0x70
 ? balance_pgdat+0x670/0x670
 kthread+0x138/0x160
 ? kthread_create_worker_on_cpu+0x40/0x40
 ret_from_fork+0x1f/0x30

This is because we are holding the chunk_mutex when we call
btrfs_alloc_device, which does a GFP_KERNEL allocation.  We don't want
to switch that to a GFP_NOFS allocation because this is the only place
where it matters.  So instead use memalloc_nofs_save() around the
allocation in order to avoid the lockdep splat.
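
For reference, the general shape of the workaround is just a scoped NOFS
region around the existing GFP_KERNEL allocation (a sketch with made-up
names; the real change is in the diff below):

	/* memalloc_nofs_save()/restore() come from <linux/sched/mm.h> */
	struct foo *obj;
	unsigned int nofs_flag;

	mutex_lock(&reclaim_ordered_mutex);
	/*
	 * Everything between save and restore is implicitly GFP_NOFS, so
	 * the allocation cannot recurse into filesystem reclaim while the
	 * mutex is held.
	 */
	nofs_flag = memalloc_nofs_save();
	obj = kmalloc(sizeof(*obj), GFP_KERNEL);
	memalloc_nofs_restore(nofs_flag);
	mutex_unlock(&reclaim_ordered_mutex);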

References: https://github.com/btrfs/fstests/issues/6
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3f8bd1af29eb..d6bbbe1986bb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/sched.h>
+#include <linux/sched/mm.h>
 #include <linux/bio.h>
 #include <linux/slab.h>
 #include <linux/blkdev.h>
@@ -6480,8 +6481,17 @@ static struct btrfs_device *add_missing_dev(struct btrfs_fs_devices *fs_devices,
 					    u64 devid, u8 *dev_uuid)
 {
 	struct btrfs_device *device;
+	unsigned int nofs_flag;
 
+	/*
+	 * We call this under the chunk_mutex, so we want to use NOFS for this
+	 * allocation, however we don't want to change btrfs_alloc_device() to
+	 * always do NOFS because we use it in a lot of other GFP_KERNEL safe
+	 * places.
+	 */
+	nofs_flag = memalloc_nofs_save();
 	device = btrfs_alloc_device(NULL, &devid, dev_uuid);
+	memalloc_nofs_restore(nofs_flag);
 	if (IS_ERR(device))
 		return device;
 
-- 
2.26.2



* [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex
  2020-09-01 21:40 [PATCH 0/4][v2] Lockdep fixes Josef Bacik
  2020-09-01 21:40 ` [PATCH 1/4] btrfs: fix lockdep splat in add_missing_dev Josef Bacik
@ 2020-09-01 21:40 ` Josef Bacik
  2020-09-02  6:21   ` Anand Jain
  2020-09-03 11:18   ` David Sterba
  2020-09-01 21:40 ` [PATCH 3/4] btrfs: kill the rcu protection for fs_info->space_info Josef Bacik
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: Josef Bacik @ 2020-09-01 21:40 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While running btrfs/187 I hit the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.9.0-rc3+ #4 Not tainted
------------------------------------------------------
kswapd0/100 is trying to acquire lock:
ffff96ecc22ef4a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330

but task is already holding lock:
ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (fs_reclaim){+.+.}-{0:0}:
       fs_reclaim_acquire+0x65/0x80
       slab_pre_alloc_hook.constprop.0+0x20/0x200
       kmem_cache_alloc+0x37/0x270
       alloc_inode+0x82/0xb0
       iget_locked+0x10d/0x2c0
       kernfs_get_inode+0x1b/0x130
       kernfs_get_tree+0x136/0x240
       sysfs_get_tree+0x16/0x40
       vfs_get_tree+0x28/0xc0
       path_mount+0x434/0xc00
       __x64_sys_mount+0xe3/0x120
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #2 (kernfs_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7e/0x7e0
       kernfs_add_one+0x23/0x150
       kernfs_create_link+0x63/0xa0
       sysfs_do_create_link_sd+0x5e/0xd0
       btrfs_sysfs_add_devices_dir+0x81/0x130
       btrfs_init_new_device+0x67f/0x1250
       btrfs_ioctl+0x1ef/0x2e20
       __x64_sys_ioctl+0x83/0xb0
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7e/0x7e0
       btrfs_chunk_alloc+0x125/0x3a0
       find_free_extent+0xdf6/0x1210
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb0/0x310
       alloc_tree_block_no_bg_flush+0x4a/0x60
       __btrfs_cow_block+0x11a/0x530
       btrfs_cow_block+0x104/0x220
       btrfs_search_slot+0x52e/0x9d0
       btrfs_insert_empty_items+0x64/0xb0
       btrfs_insert_delayed_items+0x90/0x4f0
       btrfs_commit_inode_delayed_items+0x93/0x140
       btrfs_log_inode+0x5de/0x2020
       btrfs_log_inode_parent+0x429/0xc90
       btrfs_log_new_name+0x95/0x9b
       btrfs_rename2+0xbb9/0x1800
       vfs_rename+0x64f/0x9f0
       do_renameat2+0x320/0x4e0
       __x64_sys_rename+0x1f/0x30
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&delayed_node->mutex){+.+.}-{3:3}:
       __lock_acquire+0x119c/0x1fc0
       lock_acquire+0xa7/0x3d0
       __mutex_lock+0x7e/0x7e0
       __btrfs_release_delayed_node.part.0+0x3f/0x330
       btrfs_evict_inode+0x24c/0x500
       evict+0xcf/0x1f0
       dispose_list+0x48/0x70
       prune_icache_sb+0x44/0x50
       super_cache_scan+0x161/0x1e0
       do_shrink_slab+0x178/0x3c0
       shrink_slab+0x17c/0x290
       shrink_node+0x2b2/0x6d0
       balance_pgdat+0x30a/0x670
       kswapd+0x213/0x4c0
       kthread+0x138/0x160
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

Chain exists of:
  &delayed_node->mutex --> kernfs_mutex --> fs_reclaim

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(kernfs_mutex);
                               lock(fs_reclaim);
  lock(&delayed_node->mutex);

 *** DEADLOCK ***

3 locks held by kswapd0/100:
 #0: ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
 #1: ffffffff8dd65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
 #2: ffff96ed2ade30e0 (&type->s_umount_key#36){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0

stack backtrace:
CPU: 0 PID: 100 Comm: kswapd0 Not tainted 5.9.0-rc3+ #4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb8
 check_noncircular+0x12d/0x150
 __lock_acquire+0x119c/0x1fc0
 lock_acquire+0xa7/0x3d0
 ? __btrfs_release_delayed_node.part.0+0x3f/0x330
 __mutex_lock+0x7e/0x7e0
 ? __btrfs_release_delayed_node.part.0+0x3f/0x330
 ? __btrfs_release_delayed_node.part.0+0x3f/0x330
 ? lock_acquire+0xa7/0x3d0
 ? find_held_lock+0x2b/0x80
 __btrfs_release_delayed_node.part.0+0x3f/0x330
 btrfs_evict_inode+0x24c/0x500
 evict+0xcf/0x1f0
 dispose_list+0x48/0x70
 prune_icache_sb+0x44/0x50
 super_cache_scan+0x161/0x1e0
 do_shrink_slab+0x178/0x3c0
 shrink_slab+0x17c/0x290
 shrink_node+0x2b2/0x6d0
 balance_pgdat+0x30a/0x670
 kswapd+0x213/0x4c0
 ? _raw_spin_unlock_irqrestore+0x41/0x50
 ? add_wait_queue_exclusive+0x70/0x70
 ? balance_pgdat+0x670/0x670
 kthread+0x138/0x160
 ? kthread_create_worker_on_cpu+0x40/0x40
 ret_from_fork+0x1f/0x30

This happens because we are holding the chunk_mutex at the time of
adding a new device.  However we only need to hold the
device_list_mutex, as we're going to iterate over the fs_devices
devices.  Move the sysfs init stuff outside of the chunk_mutex to get
rid of this lockdep splat.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d6bbbe1986bb..77b7da42c651 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2599,9 +2599,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	btrfs_set_super_num_devices(fs_info->super_copy,
 				    orig_super_num_devices + 1);
 
-	/* add sysfs device entry */
-	btrfs_sysfs_add_devices_dir(fs_devices, device);
-
 	/*
 	 * we've got more storage, clear any full flags on the space
 	 * infos
@@ -2609,6 +2606,10 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	btrfs_clear_space_info_full(fs_info);
 
 	mutex_unlock(&fs_info->chunk_mutex);
+
+	/* add sysfs device entry */
+	btrfs_sysfs_add_devices_dir(fs_devices, device);
+
 	mutex_unlock(&fs_devices->device_list_mutex);
 
 	if (seeding_dev) {
-- 
2.26.2



* [PATCH 3/4] btrfs: kill the rcu protection for fs_info->space_info
  2020-09-01 21:40 [PATCH 0/4][v2] Lockdep fixes Josef Bacik
  2020-09-01 21:40 ` [PATCH 1/4] btrfs: fix lockdep splat in add_missing_dev Josef Bacik
  2020-09-01 21:40 ` [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex Josef Bacik
@ 2020-09-01 21:40 ` Josef Bacik
  2020-09-02  8:04   ` Nikolay Borisov
  2020-09-02 10:32   ` David Sterba
  2020-09-01 21:40 ` [PATCH 4/4] btrfs: do not create raid sysfs entries under any locks Josef Bacik
  2020-09-04 14:20 ` [PATCH 0/4][v2] Lockdep fixes David Sterba
  4 siblings, 2 replies; 18+ messages in thread
From: Josef Bacik @ 2020-09-01 21:40 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We have this list wrapped in RCU, but it's really not needed.  We create
all the space_infos on mount, and we destroy them on unmount.  The list
never changes in between, and we're protected from messing with it by the
normal mount/umount paths, so kill the RCU protection around it.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 16 ++--------------
 fs/btrfs/ioctl.c       | 10 ++--------
 fs/btrfs/space-info.c  | 14 ++++----------
 fs/btrfs/super.c       |  5 +----
 4 files changed, 9 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 01e8ba1da1d3..a3b27204371c 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2031,8 +2031,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 		btrfs_release_path(path);
 	}
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(space_info, &info->space_info, list) {
+	list_for_each_entry(space_info, &info->space_info, list) {
 		if (!(btrfs_get_alloc_profile(info, space_info->flags) &
 		      (BTRFS_BLOCK_GROUP_RAID10 |
 		       BTRFS_BLOCK_GROUP_RAID1_MASK |
@@ -2052,7 +2051,6 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 				list)
 			inc_block_group_ro(cache, 1);
 	}
-	rcu_read_unlock();
 
 	btrfs_init_global_block_rsv(info);
 	ret = check_chunk_block_group_mappings(info);
@@ -3007,12 +3005,10 @@ static void force_metadata_allocation(struct btrfs_fs_info *info)
 	struct list_head *head = &info->space_info;
 	struct btrfs_space_info *found;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(found, head, list) {
+	list_for_each_entry(found, head, list) {
 		if (found->flags & BTRFS_BLOCK_GROUP_METADATA)
 			found->force_alloc = CHUNK_ALLOC_FORCE;
 	}
-	rcu_read_unlock();
 }
 
 static int should_alloc_chunk(struct btrfs_fs_info *fs_info,
@@ -3343,14 +3339,6 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 	}
 	spin_unlock(&info->block_group_cache_lock);
 
-	/*
-	 * Now that all the block groups are freed, go through and free all the
-	 * space_info structs.  This is only called during the final stages of
-	 * unmount, and so we know nobody is using them.  We call
-	 * synchronize_rcu() once before we start, just to be on the safe side.
-	 */
-	synchronize_rcu();
-
 	btrfs_release_global_block_rsv(info);
 
 	while (!list_empty(&info->space_info)) {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a5f466d23139..3a7a34cf6c22 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3461,15 +3461,12 @@ static long btrfs_ioctl_space_info(struct btrfs_fs_info *fs_info,
 		struct btrfs_space_info *tmp;
 
 		info = NULL;
-		rcu_read_lock();
-		list_for_each_entry_rcu(tmp, &fs_info->space_info,
-					list) {
+		list_for_each_entry(tmp, &fs_info->space_info, list) {
 			if (tmp->flags == types[i]) {
 				info = tmp;
 				break;
 			}
 		}
-		rcu_read_unlock();
 
 		if (!info)
 			continue;
@@ -3517,15 +3514,12 @@ static long btrfs_ioctl_space_info(struct btrfs_fs_info *fs_info,
 			break;
 
 		info = NULL;
-		rcu_read_lock();
-		list_for_each_entry_rcu(tmp, &fs_info->space_info,
-					list) {
+		list_for_each_entry(tmp, &fs_info->space_info, list) {
 			if (tmp->flags == types[i]) {
 				info = tmp;
 				break;
 			}
 		}
-		rcu_read_unlock();
 
 		if (!info)
 			continue;
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index b733718f45d3..837615702c94 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -175,10 +175,8 @@ void btrfs_clear_space_info_full(struct btrfs_fs_info *info)
 	struct list_head *head = &info->space_info;
 	struct btrfs_space_info *found;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(found, head, list)
+	list_for_each_entry(found, head, list)
 		found->full = 0;
-	rcu_read_unlock();
 }
 
 static int create_space_info(struct btrfs_fs_info *info, u64 flags)
@@ -213,7 +211,7 @@ static int create_space_info(struct btrfs_fs_info *info, u64 flags)
 	if (ret)
 		return ret;
 
-	list_add_rcu(&space_info->list, &info->space_info);
+	list_add(&space_info->list, &info->space_info);
 	if (flags & BTRFS_BLOCK_GROUP_DATA)
 		info->data_sinfo = space_info;
 
@@ -290,14 +288,10 @@ struct btrfs_space_info *btrfs_find_space_info(struct btrfs_fs_info *info,
 
 	flags &= BTRFS_BLOCK_GROUP_TYPE_MASK;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(found, head, list) {
-		if (found->flags & flags) {
-			rcu_read_unlock();
+	list_for_each_entry(found, head, list) {
+		if (found->flags & flags)
 			return found;
-		}
 	}
-	rcu_read_unlock();
 	return NULL;
 }
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3ebe7240c63d..8840a4fa81eb 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2164,8 +2164,7 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	u64 thresh = 0;
 	int mixed = 0;
 
-	rcu_read_lock();
-	list_for_each_entry_rcu(found, &fs_info->space_info, list) {
+	list_for_each_entry(found, &fs_info->space_info, list) {
 		if (found->flags & BTRFS_BLOCK_GROUP_DATA) {
 			int i;
 
@@ -2194,8 +2193,6 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 		total_used += found->disk_used;
 	}
 
-	rcu_read_unlock();
-
 	buf->f_blocks = div_u64(btrfs_super_total_bytes(disk_super), factor);
 	buf->f_blocks >>= bits;
 	buf->f_bfree = buf->f_blocks - (div_u64(total_used, factor) >> bits);
-- 
2.26.2



* [PATCH 4/4] btrfs: do not create raid sysfs entries under any locks
  2020-09-01 21:40 [PATCH 0/4][v2] Lockdep fixes Josef Bacik
                   ` (2 preceding siblings ...)
  2020-09-01 21:40 ` [PATCH 3/4] btrfs: kill the rcu protection for fs_info->space_info Josef Bacik
@ 2020-09-01 21:40 ` Josef Bacik
  2020-09-08 12:40   ` David Sterba
  2020-09-04 14:20 ` [PATCH 0/4][v2] Lockdep fixes David Sterba
  4 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2020-09-01 21:40 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

While running xfstests btrfs/177 I got the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.9.0-rc3+ #5 Not tainted
------------------------------------------------------
kswapd0/100 is trying to acquire lock:
ffff97066aa56760 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330

but task is already holding lock:
ffffffff9fd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (fs_reclaim){+.+.}-{0:0}:
       fs_reclaim_acquire+0x65/0x80
       slab_pre_alloc_hook.constprop.0+0x20/0x200
       kmem_cache_alloc+0x37/0x270
       alloc_inode+0x82/0xb0
       iget_locked+0x10d/0x2c0
       kernfs_get_inode+0x1b/0x130
       kernfs_get_tree+0x136/0x240
       sysfs_get_tree+0x16/0x40
       vfs_get_tree+0x28/0xc0
       path_mount+0x434/0xc00
       __x64_sys_mount+0xe3/0x120
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #2 (kernfs_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7e/0x7e0
       kernfs_add_one+0x23/0x150
       kernfs_create_dir_ns+0x7a/0xb0
       sysfs_create_dir_ns+0x60/0xb0
       kobject_add_internal+0xc0/0x2c0
       kobject_add+0x6e/0x90
       btrfs_sysfs_add_block_group_type+0x102/0x160
       btrfs_make_block_group+0x167/0x230
       btrfs_alloc_chunk+0x54f/0xb80
       btrfs_chunk_alloc+0x18e/0x3a0
       find_free_extent+0xdf6/0x1210
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb0/0x310
       alloc_tree_block_no_bg_flush+0x4a/0x60
       __btrfs_cow_block+0x11a/0x530
       btrfs_cow_block+0x104/0x220
       btrfs_search_slot+0x52e/0x9d0
       btrfs_insert_empty_items+0x64/0xb0
       btrfs_new_inode+0x225/0x730
       btrfs_create+0xab/0x1f0
       lookup_open.isra.0+0x52d/0x690
       path_openat+0x2a7/0x9e0
       do_filp_open+0x75/0x100
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7e/0x7e0
       btrfs_chunk_alloc+0x125/0x3a0
       find_free_extent+0xdf6/0x1210
       btrfs_reserve_extent+0xb3/0x1b0
       btrfs_alloc_tree_block+0xb0/0x310
       alloc_tree_block_no_bg_flush+0x4a/0x60
       __btrfs_cow_block+0x11a/0x530
       btrfs_cow_block+0x104/0x220
       btrfs_search_slot+0x52e/0x9d0
       btrfs_lookup_inode+0x2a/0x8f
       __btrfs_update_delayed_inode+0x80/0x240
       btrfs_commit_inode_delayed_inode+0x119/0x120
       btrfs_evict_inode+0x357/0x500
       evict+0xcf/0x1f0
       do_unlinkat+0x1a9/0x2b0
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&delayed_node->mutex){+.+.}-{3:3}:
       __lock_acquire+0x119c/0x1fc0
       lock_acquire+0xa7/0x3d0
       __mutex_lock+0x7e/0x7e0
       __btrfs_release_delayed_node.part.0+0x3f/0x330
       btrfs_evict_inode+0x24c/0x500
       evict+0xcf/0x1f0
       dispose_list+0x48/0x70
       prune_icache_sb+0x44/0x50
       super_cache_scan+0x161/0x1e0
       do_shrink_slab+0x178/0x3c0
       shrink_slab+0x17c/0x290
       shrink_node+0x2b2/0x6d0
       balance_pgdat+0x30a/0x670
       kswapd+0x213/0x4c0
       kthread+0x138/0x160
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

Chain exists of:
  &delayed_node->mutex --> kernfs_mutex --> fs_reclaim

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(fs_reclaim);
                               lock(kernfs_mutex);
                               lock(fs_reclaim);
  lock(&delayed_node->mutex);

 *** DEADLOCK ***

3 locks held by kswapd0/100:
 #0: ffffffff9fd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
 #1: ffffffff9fd65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
 #2: ffff9706629780e0 (&type->s_umount_key#36){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0

stack backtrace:
CPU: 1 PID: 100 Comm: kswapd0 Not tainted 5.9.0-rc3+ #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack+0x8b/0xb8
 check_noncircular+0x12d/0x150
 __lock_acquire+0x119c/0x1fc0
 lock_acquire+0xa7/0x3d0
 ? __btrfs_release_delayed_node.part.0+0x3f/0x330
 __mutex_lock+0x7e/0x7e0
 ? __btrfs_release_delayed_node.part.0+0x3f/0x330
 ? __btrfs_release_delayed_node.part.0+0x3f/0x330
 ? lock_acquire+0xa7/0x3d0
 ? find_held_lock+0x2b/0x80
 __btrfs_release_delayed_node.part.0+0x3f/0x330
 btrfs_evict_inode+0x24c/0x500
 evict+0xcf/0x1f0
 dispose_list+0x48/0x70
 prune_icache_sb+0x44/0x50
 super_cache_scan+0x161/0x1e0
 do_shrink_slab+0x178/0x3c0
 shrink_slab+0x17c/0x290
 shrink_node+0x2b2/0x6d0
 balance_pgdat+0x30a/0x670
 kswapd+0x213/0x4c0
 ? _raw_spin_unlock_irqrestore+0x41/0x50
 ? add_wait_queue_exclusive+0x70/0x70
 ? balance_pgdat+0x670/0x670
 kthread+0x138/0x160
 ? kthread_create_worker_on_cpu+0x40/0x40
 ret_from_fork+0x1f/0x30

This happens because when we link in a block group with a new raid index
type we'll create the corresponding sysfs entries for it.  This is
problematic because while restriping we're holding the chunk_mutex, and
while mounting we're holding the tree locks.

Fixing this isn't pretty: we move the call to the sysfs stuff into
btrfs_create_pending_block_groups(), where we're not holding any locks.
This creates a slight race where other threads could see that there's no
sysfs kobj for that raid type and race to create the sysfs dir.  Fix this
by wrapping the creation in space_info->lock, so we only get one caller
of kobject_add() for the new directory.  We don't worry about the lock on
cleanup as it only gets deleted on unmount.

On mount it's more straightforward: we already loop through the
space_infos, so just check every raid index in each space_info and add
the sysfs entries for the corresponding block groups.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/block-group.c | 31 +++++++++++++++++++++++++------
 fs/btrfs/sysfs.c       | 25 +++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index a3b27204371c..2dbdf6ef568e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1766,16 +1766,10 @@ static void link_block_group(struct btrfs_block_group *cache)
 {
 	struct btrfs_space_info *space_info = cache->space_info;
 	int index = btrfs_bg_flags_to_raid_index(cache->flags);
-	bool first = false;
 
 	down_write(&space_info->groups_sem);
-	if (list_empty(&space_info->block_groups[index]))
-		first = true;
 	list_add_tail(&cache->list, &space_info->block_groups[index]);
 	up_write(&space_info->groups_sem);
-
-	if (first)
-		btrfs_sysfs_add_block_group_type(cache);
 }
 
 static struct btrfs_block_group *btrfs_create_block_group_cache(
@@ -2032,6 +2026,17 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	}
 
 	list_for_each_entry(space_info, &info->space_info, list) {
+		int i;
+
+		for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
+			if (list_empty(&space_info->block_groups[i]))
+				continue;
+			cache = list_first_entry(&space_info->block_groups[i],
+						 struct btrfs_block_group,
+						 list);
+			btrfs_sysfs_add_block_group_type(cache);
+		}
+
 		if (!(btrfs_get_alloc_profile(info, space_info->flags) &
 		      (BTRFS_BLOCK_GROUP_RAID10 |
 		       BTRFS_BLOCK_GROUP_RAID1_MASK |
@@ -2091,12 +2096,16 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 		return;
 
 	while (!list_empty(&trans->new_bgs)) {
+		int index;
+
 		block_group = list_first_entry(&trans->new_bgs,
 					       struct btrfs_block_group,
 					       bg_list);
 		if (ret)
 			goto next;
 
+		index = btrfs_bg_flags_to_raid_index(block_group->flags);
+
 		ret = insert_block_group_item(trans, block_group);
 		if (ret)
 			btrfs_abort_transaction(trans, ret);
@@ -2105,6 +2114,16 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 		if (ret)
 			btrfs_abort_transaction(trans, ret);
 		add_block_group_free_space(trans, block_group);
+
+		/*
+		 * If we restriped we may have added a new raid type, so now add
+		 * the sysfs entries when it is safe to do so.  We don't have to
+		 * worry about locking here as it's handled in
+		 * btrfs_sysfs_add_block_group_type.
+		 */
+		if (block_group->space_info->block_group_kobjs[index] == NULL)
+			btrfs_sysfs_add_block_group_type(block_group);
+
 		/* Already aborted the transaction if it failed. */
 next:
 		btrfs_delayed_refs_rsv_release(fs_info, 1);
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 190e59152be5..89005f51bab8 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -1077,17 +1077,38 @@ void btrfs_sysfs_add_block_group_type(struct btrfs_block_group *cache)
 
 	rkobj->flags = cache->flags;
 	kobject_init(&rkobj->kobj, &btrfs_raid_ktype);
+
+	/*
+	 * We call this either on mount, or if we've created a block group for a
+	 * new index type while running (i.e. when restriping).  The running
+	 * case is tricky because we could race with other threads, so we need
+	 * to have this check to make sure we didn't already init the kobject.
+	 *
+	 * We don't have to protect on the free side because it only happens on
+	 * unmount.
+	 */
+	spin_lock(&space_info->lock);
+	if (space_info->block_group_kobjs[index]) {
+		spin_unlock(&space_info->lock);
+		kobject_put(&rkobj->kobj);
+		return;
+	} else {
+		space_info->block_group_kobjs[index] = &rkobj->kobj;
+	}
+	spin_unlock(&space_info->lock);
+
 	ret = kobject_add(&rkobj->kobj, &space_info->kobj, "%s",
 			  btrfs_bg_type_to_raid_name(rkobj->flags));
 	memalloc_nofs_restore(nofs_flag);
 	if (ret) {
+		spin_lock(&space_info->lock);
+		space_info->block_group_kobjs[index] = NULL;
+		spin_unlock(&space_info->lock);
 		kobject_put(&rkobj->kobj);
 		btrfs_warn(fs_info,
 			"failed to add kobject for block cache, ignoring");
 		return;
 	}
-
-	space_info->block_group_kobjs[index] = &rkobj->kobj;
 }
 
 /*
-- 
2.26.2



* Re: [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex
  2020-09-01 21:40 ` [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex Josef Bacik
@ 2020-09-02  6:21   ` Anand Jain
  2020-09-02 17:45     ` David Sterba
  2020-09-03 11:18   ` David Sterba
  1 sibling, 1 reply; 18+ messages in thread
From: Anand Jain @ 2020-09-02  6:21 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 2/9/20 5:40 am, Josef Bacik wrote:
> While running btrfs/187 I hit the following lockdep splat
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0-rc3+ #4 Not tainted
> ------------------------------------------------------
> kswapd0/100 is trying to acquire lock:
> ffff96ecc22ef4a0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330
> 
> but task is already holding lock:
> ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #3 (fs_reclaim){+.+.}-{0:0}:
>         fs_reclaim_acquire+0x65/0x80
>         slab_pre_alloc_hook.constprop.0+0x20/0x200
>         kmem_cache_alloc+0x37/0x270
>         alloc_inode+0x82/0xb0
>         iget_locked+0x10d/0x2c0
>         kernfs_get_inode+0x1b/0x130
>         kernfs_get_tree+0x136/0x240
>         sysfs_get_tree+0x16/0x40
>         vfs_get_tree+0x28/0xc0
>         path_mount+0x434/0xc00
>         __x64_sys_mount+0xe3/0x120
>         do_syscall_64+0x33/0x40
>         entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> -> #2 (kernfs_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7e/0x7e0
>         kernfs_add_one+0x23/0x150
>         kernfs_create_link+0x63/0xa0
>         sysfs_do_create_link_sd+0x5e/0xd0
>         btrfs_sysfs_add_devices_dir+0x81/0x130
>         btrfs_init_new_device+0x67f/0x1250
>         btrfs_ioctl+0x1ef/0x2e20
>         __x64_sys_ioctl+0x83/0xb0
>         do_syscall_64+0x33/0x40
>         entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7e/0x7e0
>         btrfs_chunk_alloc+0x125/0x3a0
>         find_free_extent+0xdf6/0x1210
>         btrfs_reserve_extent+0xb3/0x1b0
>         btrfs_alloc_tree_block+0xb0/0x310
>         alloc_tree_block_no_bg_flush+0x4a/0x60
>         __btrfs_cow_block+0x11a/0x530
>         btrfs_cow_block+0x104/0x220
>         btrfs_search_slot+0x52e/0x9d0
>         btrfs_insert_empty_items+0x64/0xb0
>         btrfs_insert_delayed_items+0x90/0x4f0
>         btrfs_commit_inode_delayed_items+0x93/0x140
>         btrfs_log_inode+0x5de/0x2020
>         btrfs_log_inode_parent+0x429/0xc90
>         btrfs_log_new_name+0x95/0x9b
>         btrfs_rename2+0xbb9/0x1800
>         vfs_rename+0x64f/0x9f0
>         do_renameat2+0x320/0x4e0
>         __x64_sys_rename+0x1f/0x30
>         do_syscall_64+0x33/0x40
>         entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
>         __lock_acquire+0x119c/0x1fc0
>         lock_acquire+0xa7/0x3d0
>         __mutex_lock+0x7e/0x7e0
>         __btrfs_release_delayed_node.part.0+0x3f/0x330
>         btrfs_evict_inode+0x24c/0x500
>         evict+0xcf/0x1f0
>         dispose_list+0x48/0x70
>         prune_icache_sb+0x44/0x50
>         super_cache_scan+0x161/0x1e0
>         do_shrink_slab+0x178/0x3c0
>         shrink_slab+0x17c/0x290
>         shrink_node+0x2b2/0x6d0
>         balance_pgdat+0x30a/0x670
>         kswapd+0x213/0x4c0
>         kthread+0x138/0x160
>         ret_from_fork+0x1f/0x30
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    &delayed_node->mutex --> kernfs_mutex --> fs_reclaim
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(fs_reclaim);
>                                 lock(kernfs_mutex);
>                                 lock(fs_reclaim);
>    lock(&delayed_node->mutex);
> 
>   *** DEADLOCK ***
> 
> 3 locks held by kswapd0/100:
>   #0: ffffffff8dd74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
>   #1: ffffffff8dd65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
>   #2: ffff96ed2ade30e0 (&type->s_umount_key#36){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0
> 
> stack backtrace:
> CPU: 0 PID: 100 Comm: kswapd0 Not tainted 5.9.0-rc3+ #4
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>   dump_stack+0x8b/0xb8
>   check_noncircular+0x12d/0x150
>   __lock_acquire+0x119c/0x1fc0
>   lock_acquire+0xa7/0x3d0
>   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
>   __mutex_lock+0x7e/0x7e0
>   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
>   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
>   ? lock_acquire+0xa7/0x3d0
>   ? find_held_lock+0x2b/0x80
>   __btrfs_release_delayed_node.part.0+0x3f/0x330
>   btrfs_evict_inode+0x24c/0x500
>   evict+0xcf/0x1f0
>   dispose_list+0x48/0x70
>   prune_icache_sb+0x44/0x50
>   super_cache_scan+0x161/0x1e0
>   do_shrink_slab+0x178/0x3c0
>   shrink_slab+0x17c/0x290
>   shrink_node+0x2b2/0x6d0
>   balance_pgdat+0x30a/0x670
>   kswapd+0x213/0x4c0
>   ? _raw_spin_unlock_irqrestore+0x41/0x50
>   ? add_wait_queue_exclusive+0x70/0x70
>   ? balance_pgdat+0x670/0x670
>   kthread+0x138/0x160
>   ? kthread_create_worker_on_cpu+0x40/0x40
>   ret_from_fork+0x1f/0x30
> 
> This happens because we are holding the chunk_mutex at the time of
> adding in a new device.  However we only need to hold the
> device_list_mutex, as we're going to iterate over the fs_devices
> devices.  Move the sysfs init stuff outside of the chunk_mutex to get
> rid of this lockdep splat.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
 >
> ---
>   fs/btrfs/volumes.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d6bbbe1986bb..77b7da42c651 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2599,9 +2599,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   	btrfs_set_super_num_devices(fs_info->super_copy,
>   				    orig_super_num_devices + 1);
>   
> -	/* add sysfs device entry */
> -	btrfs_sysfs_add_devices_dir(fs_devices, device);
> -
>   	/*
>   	 * we've got more storage, clear any full flags on the space
>   	 * infos
> @@ -2609,6 +2606,10 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   	btrfs_clear_space_info_full(fs_info);
>   
>   	mutex_unlock(&fs_info->chunk_mutex);
> +
> +	/* add sysfs device entry */
> +	btrfs_sysfs_add_devices_dir(fs_devices, device);
> +
>   	mutex_unlock(&fs_devices->device_list_mutex);
>   
>   	if (seeding_dev) {
> 

Strange we should get this splat when btrfs_sysfs_add_devices_dir()
already has implicit GFP_NOFS allocation scope right? What did I
miss?

Thanks, Anand







^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/4] btrfs: fix lockdep splat in add_missing_dev
  2020-09-01 21:40 ` [PATCH 1/4] btrfs: fix lockdep splat in add_missing_dev Josef Bacik
@ 2020-09-02  6:23   ` Anand Jain
  2020-09-03 11:17   ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: Anand Jain @ 2020-09-02  6:23 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 2/9/20 5:40 am, Josef Bacik wrote:
> Nikolay reported a lockdep splat that I could reproduce with btrfs/187.
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.9.0-rc2+ #1 Tainted: G        W
> ------------------------------------------------------
> kswapd0/100 is trying to acquire lock:
> ffff9e8ef38b6268 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330
> 
> but task is already holding lock:
> ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (fs_reclaim){+.+.}-{0:0}:
>         fs_reclaim_acquire+0x65/0x80
>         slab_pre_alloc_hook.constprop.0+0x20/0x200
>         kmem_cache_alloc_trace+0x3a/0x1a0
>         btrfs_alloc_device+0x43/0x210
>         add_missing_dev+0x20/0x90
>         read_one_chunk+0x301/0x430
>         btrfs_read_sys_array+0x17b/0x1b0
>         open_ctree+0xa62/0x1896
>         btrfs_mount_root.cold+0x12/0xea
>         legacy_get_tree+0x30/0x50
>         vfs_get_tree+0x28/0xc0
>         vfs_kern_mount.part.0+0x71/0xb0
>         btrfs_mount+0x10d/0x379
>         legacy_get_tree+0x30/0x50
>         vfs_get_tree+0x28/0xc0
>         path_mount+0x434/0xc00
>         __x64_sys_mount+0xe3/0x120
>         do_syscall_64+0x33/0x40
>         entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7e/0x7e0
>         btrfs_chunk_alloc+0x125/0x3a0
>         find_free_extent+0xdf6/0x1210
>         btrfs_reserve_extent+0xb3/0x1b0
>         btrfs_alloc_tree_block+0xb0/0x310
>         alloc_tree_block_no_bg_flush+0x4a/0x60
>         __btrfs_cow_block+0x11a/0x530
>         btrfs_cow_block+0x104/0x220
>         btrfs_search_slot+0x52e/0x9d0
>         btrfs_lookup_inode+0x2a/0x8f
>         __btrfs_update_delayed_inode+0x80/0x240
>         btrfs_commit_inode_delayed_inode+0x119/0x120
>         btrfs_evict_inode+0x357/0x500
>         evict+0xcf/0x1f0
>         vfs_rmdir.part.0+0x149/0x160
>         do_rmdir+0x136/0x1a0
>         do_syscall_64+0x33/0x40
>         entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
>         __lock_acquire+0x1184/0x1fa0
>         lock_acquire+0xa4/0x3d0
>         __mutex_lock+0x7e/0x7e0
>         __btrfs_release_delayed_node.part.0+0x3f/0x330
>         btrfs_evict_inode+0x24c/0x500
>         evict+0xcf/0x1f0
>         dispose_list+0x48/0x70
>         prune_icache_sb+0x44/0x50
>         super_cache_scan+0x161/0x1e0
>         do_shrink_slab+0x178/0x3c0
>         shrink_slab+0x17c/0x290
>         shrink_node+0x2b2/0x6d0
>         balance_pgdat+0x30a/0x670
>         kswapd+0x213/0x4c0
>         kthread+0x138/0x160
>         ret_from_fork+0x1f/0x30
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(fs_reclaim);
>                                 lock(&fs_info->chunk_mutex);
>                                 lock(fs_reclaim);
>    lock(&delayed_node->mutex);
> 
>   *** DEADLOCK ***
> 
> 3 locks held by kswapd0/100:
>   #0: ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
>   #1: ffffffffa9d65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
>   #2: ffff9e8e9da260e0 (&type->s_umount_key#48){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0
> 
> stack backtrace:
> CPU: 1 PID: 100 Comm: kswapd0 Tainted: G        W         5.9.0-rc2+ #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>   dump_stack+0x92/0xc8
>   check_noncircular+0x12d/0x150
>   __lock_acquire+0x1184/0x1fa0
>   lock_acquire+0xa4/0x3d0
>   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
>   __mutex_lock+0x7e/0x7e0
>   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
>   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
>   ? lock_acquire+0xa4/0x3d0
>   ? btrfs_evict_inode+0x11e/0x500
>   ? find_held_lock+0x2b/0x80
>   __btrfs_release_delayed_node.part.0+0x3f/0x330
>   btrfs_evict_inode+0x24c/0x500
>   evict+0xcf/0x1f0
>   dispose_list+0x48/0x70
>   prune_icache_sb+0x44/0x50
>   super_cache_scan+0x161/0x1e0
>   do_shrink_slab+0x178/0x3c0
>   shrink_slab+0x17c/0x290
>   shrink_node+0x2b2/0x6d0
>   balance_pgdat+0x30a/0x670
>   kswapd+0x213/0x4c0
>   ? _raw_spin_unlock_irqrestore+0x46/0x60
>   ? add_wait_queue_exclusive+0x70/0x70
>   ? balance_pgdat+0x670/0x670
>   kthread+0x138/0x160
>   ? kthread_create_worker_on_cpu+0x40/0x40
>   ret_from_fork+0x1f/0x30
> 
> This is because we are holding the chunk_mutex when we call
> btrfs_alloc_device, which does a GFP_KERNEL allocation.  We don't want
> to switch that to a GFP_NOFS lock because this is the only place where
> it matters.  So instead use memalloc_nofs_save() around the allocation
> in order to avoid the lockdep splat.
> 
> References: https://github.com/btrfs/fstests/issues/6
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/volumes.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 3f8bd1af29eb..d6bbbe1986bb 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4,6 +4,7 @@
>    */
>   
>   #include <linux/sched.h>
> +#include <linux/sched/mm.h>
>   #include <linux/bio.h>
>   #include <linux/slab.h>
>   #include <linux/blkdev.h>
> @@ -6480,8 +6481,17 @@ static struct btrfs_device *add_missing_dev(struct btrfs_fs_devices *fs_devices,
>   					    u64 devid, u8 *dev_uuid)
>   {
>   	struct btrfs_device *device;
> +	unsigned int nofs_flag;
>   
> +	/*
> +	 * We call this under the chunk_mutex, so we want to use NOFS for this
> +	 * allocation, however we don't want to change btrfs_alloc_device() to
> +	 * always do NOFS because we use it in a lot of other GFP_KERNEL safe
> +	 * places.
> +	 */
> +	nofs_flag = memalloc_nofs_save();
>   	device = btrfs_alloc_device(NULL, &devid, dev_uuid);
> +	memalloc_nofs_restore(nofs_flag);
>   	if (IS_ERR(device))
>   		return device;
>   
> 

looks good.
Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand



* Re: [PATCH 3/4] btrfs: kill the rcu protection for fs_info->space_info
  2020-09-01 21:40 ` [PATCH 3/4] btrfs: kill the rcu protection for fs_info->space_info Josef Bacik
@ 2020-09-02  8:04   ` Nikolay Borisov
  2020-09-02 10:32   ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: Nikolay Borisov @ 2020-09-02  8:04 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 2.09.20 г. 0:40 ч., Josef Bacik wrote:
> We have this thing wrapped in an rcu lock, but it's really not needed.
> We create all the space_info's on mount, and we destroy them on unmount.
> The list never changes and we're protected from messing with it by the
> normal mount/umount path, so kill the RCU stuff around it.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


* Re: [PATCH 3/4] btrfs: kill the rcu protection for fs_info->space_info
  2020-09-01 21:40 ` [PATCH 3/4] btrfs: kill the rcu protection for fs_info->space_info Josef Bacik
  2020-09-02  8:04   ` Nikolay Borisov
@ 2020-09-02 10:32   ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: David Sterba @ 2020-09-02 10:32 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Sep 01, 2020 at 05:40:37PM -0400, Josef Bacik wrote:
> We have this thing wrapped in an rcu lock, but it's really not needed.
> We create all the space_info's on mount, and we destroy them on unmount.
> The list never changes and we're protected from messing with it by the
> normal mount/umount path, so kill the RCU stuff around it.

What happens when balance filter converts one profile to a new one?
That's neither mount nor umount and it modifies the space infos, so
the RCU is there to keep the consistent view in case statfs/ioctl
happens, no?


* Re: [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex
  2020-09-02  6:21   ` Anand Jain
@ 2020-09-02 17:45     ` David Sterba
  2020-09-03 11:41       ` Anand Jain
  2020-09-03 11:42       ` Anand Jain
  0 siblings, 2 replies; 18+ messages in thread
From: David Sterba @ 2020-09-02 17:45 UTC (permalink / raw)
  To: Anand Jain; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Wed, Sep 02, 2020 at 02:21:31PM +0800, Anand Jain wrote:
> > @@ -2609,6 +2606,10 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> >   	btrfs_clear_space_info_full(fs_info);
> >   
> >   	mutex_unlock(&fs_info->chunk_mutex);
> > +
> > +	/* add sysfs device entry */
> > +	btrfs_sysfs_add_devices_dir(fs_devices, device);
> > +
> >   	mutex_unlock(&fs_devices->device_list_mutex);
> >   
> >   	if (seeding_dev) {
> > 
> 
> Strange we should get this splat when btrfs_sysfs_add_devices_dir()
> already has implicit GFP_NOFS allocation scope right? What did I
> miss?

The problem is the sysfs' kernfs_mutex, all sysfs allocations are
GFP_KERNEL thus it can grab fs_reclaim at some point.
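
Schematically, the two problematic edges are recorded by two different
tasks, which is why a NOFS scope around the btrfs call alone can't help (a
rough sketch pieced together from the chains in the splat above, not
literal kernel code):

	/* sysfs mount path, some other task: kernfs_mutex -> fs_reclaim */
	mutex_lock(&kernfs_mutex);
	inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL); /* may reclaim */
	mutex_unlock(&kernfs_mutex);

	/* btrfs_init_new_device(), before this patch: chunk_mutex -> kernfs_mutex */
	mutex_lock(&fs_info->chunk_mutex);
	nofs_flag = memalloc_nofs_save();	/* even with a NOFS scope here... */
	btrfs_sysfs_add_devices_dir(fs_devices, device); /* ...we still take kernfs_mutex */
	memalloc_nofs_restore(nofs_flag);
	mutex_unlock(&fs_info->chunk_mutex);

Reclaim itself ends up in the delayed node mutex (the kswapd stack in the
splat), and the delayed node path takes the chunk_mutex, so lockdep sees a
cycle even though none of the btrfs-side allocations can enter fs reclaim.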


* Re: [PATCH 1/4] btrfs: fix lockdep splat in add_missing_dev
  2020-09-01 21:40 ` [PATCH 1/4] btrfs: fix lockdep splat in add_missing_dev Josef Bacik
  2020-09-02  6:23   ` Anand Jain
@ 2020-09-03 11:17   ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: David Sterba @ 2020-09-03 11:17 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Sep 01, 2020 at 05:40:35PM -0400, Josef Bacik wrote:
> 
> References: https://github.com/btrfs/fstests/issues/6
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to misc-next, thanks.


* Re: [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex
  2020-09-01 21:40 ` [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex Josef Bacik
  2020-09-02  6:21   ` Anand Jain
@ 2020-09-03 11:18   ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: David Sterba @ 2020-09-03 11:18 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Sep 01, 2020 at 05:40:36PM -0400, Josef Bacik wrote:
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to misc-next, thanks.


* Re: [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex
  2020-09-02 17:45     ` David Sterba
@ 2020-09-03 11:41       ` Anand Jain
  2020-09-03 11:42       ` Anand Jain
  1 sibling, 0 replies; 18+ messages in thread
From: Anand Jain @ 2020-09-03 11:41 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team

On 3/9/20 1:45 am, David Sterba wrote:
> On Wed, Sep 02, 2020 at 02:21:31PM +0800, Anand Jain wrote:
>>> @@ -2609,6 +2606,10 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>>    	btrfs_clear_space_info_full(fs_info);
>>>    
>>>    	mutex_unlock(&fs_info->chunk_mutex);
>>> +
>>> +	/* add sysfs device entry */
>>> +	btrfs_sysfs_add_devices_dir(fs_devices, device);
>>> +
>>>    	mutex_unlock(&fs_devices->device_list_mutex);
>>>    
>>>    	if (seeding_dev) {
>>>
>>
>> Strange we should get this splat when btrfs_sysfs_add_devices_dir()
>> already has implicit GFP_NOFS allocation scope right? What did I
>> miss?
> 
> The problem is the sysfs' kernfs_mutex, all sysfs allocations are
> GFP_KERNEL thus it can grab fs_reclaim at some point.
> 

Oh. Thanks.
Anand


* Re: [PATCH 2/4] btrfs: init sysfs for devices outside of the chunk_mutex
  2020-09-02 17:45     ` David Sterba
  2020-09-03 11:41       ` Anand Jain
@ 2020-09-03 11:42       ` Anand Jain
  1 sibling, 0 replies; 18+ messages in thread
From: Anand Jain @ 2020-09-03 11:42 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team

On 3/9/20 1:45 am, David Sterba wrote:
> On Wed, Sep 02, 2020 at 02:21:31PM +0800, Anand Jain wrote:
>>> @@ -2609,6 +2606,10 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>>    	btrfs_clear_space_info_full(fs_info);
>>>    
>>>    	mutex_unlock(&fs_info->chunk_mutex);
>>> +
>>> +	/* add sysfs device entry */
>>> +	btrfs_sysfs_add_devices_dir(fs_devices, device);
>>> +
>>>    	mutex_unlock(&fs_devices->device_list_mutex);
>>>    
>>>    	if (seeding_dev) {
>>>
>>
>> Strange we should get this splat when btrfs_sysfs_add_devices_dir()
>> already has implicit GFP_NOFS allocation scope right? What did I
>> miss?
> 
> The problem is the sysfs' kernfs_mutex, all sysfs allocations are
> GFP_KERNEL thus it can grab fs_reclaim at some point.
> 

Oh. Thanks.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Anand


* Re: [PATCH 0/4][v2] Lockdep fixes
  2020-09-01 21:40 [PATCH 0/4][v2] Lockdep fixes Josef Bacik
                   ` (3 preceding siblings ...)
  2020-09-01 21:40 ` [PATCH 4/4] btrfs: do not create raid sysfs entries under any locks Josef Bacik
@ 2020-09-04 14:20 ` David Sterba
  2020-09-07 13:05   ` David Sterba
  4 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2020-09-04 14:20 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Sep 01, 2020 at 05:40:34PM -0400, Josef Bacik wrote:
> v1->v2:
> - Included the add_missing_dev patch in the series.
> - Added a patch to kill the rcu protection for fs_info->space_info.
> - Fixed the raid sysfs init stuff to be completely out of link_block_group, as
>   it causes a lockdep splat with the rwsem conversion.
> 
> Hello,
> 
> These are the last two lockdep splats I'm able to see in my testing.  We have
> like 4 variations of the same lockdep splat that's addressed by 
> 
> btrfs: do not create raid sysfs entries under chunk_mutex
> 
> Basically this particular dependency pulls in the kernfs_mutex under the
> chunk_mutex, and so we have like 4 issues in github with slightly different
> splats, but are all fixed by that fix.  With these two patches (and the one I
> sent the other day for add_missing_dev) I haven't hit any lockdep splats in 6
> runs of xfstests on 3 different VMs in the last 12 hours.  That means it should
> take Dave at least 2 runs before he hits a new one.  Thanks,
> 
> Josef Bacik (4):
>   btrfs: fix lockdep splat in add_missing_dev
>   btrfs: init sysfs for devices outside of the chunk_mutex

So the two patches were in misc-next and I started to see lockdep
complaining in btrfs/011, which is very early in the test list and this
makes it impossible to notice further lockdep reports.

I'll move the patches to a topic branch and will retest misc-next to see
what's the actual list of lockdep things to fix, so we have a clean
state before the eb rwsem switch.


* Re: [PATCH 0/4][v2] Lockdep fixes
  2020-09-04 14:20 ` [PATCH 0/4][v2] Lockdep fixes David Sterba
@ 2020-09-07 13:05   ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2020-09-07 13:05 UTC (permalink / raw)
  To: dsterba, Josef Bacik, linux-btrfs, kernel-team

On Fri, Sep 04, 2020 at 04:20:00PM +0200, David Sterba wrote:
> On Tue, Sep 01, 2020 at 05:40:34PM -0400, Josef Bacik wrote:
> > v1->v2:
> > - Included the add_missing_dev patch in the series.
> > - Added a patch to kill the rcu protection for fs_info->space_info.
> > - Fixed the raid sysfs init stuff to be completely out of link_block_group, as
> >   it causes a lockdep splat with the rwsem conversion.
> > 
> > Hello,
> > 
> > These are the last two lockdep splats I'm able to see in my testing.  We have
> > like 4 variations of the same lockdep splat that's addressed by 
> > 
> > btrfs: do not create raid sysfs entries under chunk_mutex
> > 
> > Basically this particular dependency pulls in the kernfs_mutex under the
> > chunk_mutex, and so we have like 4 issues in github with slightly different
> > splats, but are all fixed by that fix.  With these two patches (and the one I
> > sent the other day for add_missing_dev) I haven't hit any lockdep splats in 6
> > runs of xfstests on 3 different VMs in the last 12 hours.  That means it should
> > take Dave at least 2 runs before he hits a new one.  Thanks,
> > 
> > Josef Bacik (4):
> >   btrfs: fix lockdep splat in add_missing_dev
> >   btrfs: init sysfs for devices outside of the chunk_mutex
> 
> So the two patches were in misc-next and I started to see lockdep
> complaining in btrfs/011, which is very early in the test list and this
> makes it impossible to notice further lockdep reports.

So it was really because of the rwsem switch in for-next. Misc-next
passes without lockdep warnings up to btrfs/124, which is the
kernfs/fs_reclaim case and there's a patch for that.

> I'll move the patches to a topic branch and will retest misc-next to see
> what's the actual list of lockdep things to fix, so we have a clean
> state before the eb rwsem switch.

The patches are in misc-next and I'll add more once I test them.


* Re: [PATCH 4/4] btrfs: do not create raid sysfs entries under any locks
  2020-09-01 21:40 ` [PATCH 4/4] btrfs: do not create raid sysfs entries under any locks Josef Bacik
@ 2020-09-08 12:40   ` David Sterba
  2020-09-08 12:52     ` Josef Bacik
  0 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2020-09-08 12:40 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Sep 01, 2020 at 05:40:38PM -0400, Josef Bacik wrote:
>  	}
>  
>  	list_for_each_entry(space_info, &info->space_info, list) {
> +		int i;
> +
> +		for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
> +			if (list_empty(&space_info->block_groups[i]))
> +				continue;
> +			cache = list_first_entry(&space_info->block_groups[i],
> +						 struct btrfs_block_group,
> +						 list);
> +			btrfs_sysfs_add_block_group_type(cache);
> +		}
> +

I had the previous version of the patch pass fstests with no lockdep
warnings, and then realized it's not the v2 that depends on the 3rd patch
removing RCU from this list traversal.

btrfs_sysfs_add_block_group_type does not seem to conflict with RCU in
this loop, so now I'm undecided if v1 is ok or if we really need v2, since
patch 3 removing RCU seems suspicious.


* Re: [PATCH 4/4] btrfs: do not create raid sysfs entries under any locks
  2020-09-08 12:40   ` David Sterba
@ 2020-09-08 12:52     ` Josef Bacik
  0 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2020-09-08 12:52 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On 9/8/20 8:40 AM, David Sterba wrote:
> On Tue, Sep 01, 2020 at 05:40:38PM -0400, Josef Bacik wrote:
>>   	}
>>   
>>   	list_for_each_entry(space_info, &info->space_info, list) {
>> +		int i;
>> +
>> +		for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
>> +			if (list_empty(&space_info->block_groups[i]))
>> +				continue;
>> +			cache = list_first_entry(&space_info->block_groups[i],
>> +						 struct btrfs_block_group,
>> +						 list);
>> +			btrfs_sysfs_add_block_group_type(cache);
>> +		}
>> +
> 
> I had the previous version of the patch pass fstests, with no lockdep
> warnings and then realized it's not the v2 that depends on the 3rd patch
> removing RCU from this list traversal.
> 
> btrfs_sysfs_add_block_group_type does not seem to be conflicting RCU in
> this loop so now I'm undecided if v1 is ok or if we really need v2, sice
> the patch 3 removing RCU seems suspicious.
> 

It conflicts with RCU here because we could sleep; that's why I had to remove it.

Alternatively we could just loop a second time through the space_infos outside
of the RCU to do the btrfs_sysfs_add_block_group_type(), and I can drop the RCU
removal patch altogether.  It's up to you, but we still need v2 because the
problem still exists without it.  Thanks,

Josef
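
For illustration, the second pass Josef mentions might look roughly like
the following in btrfs_read_block_groups() (a hypothetical sketch that
keeps the RCU walk for the existing checks and relies on the space_info
list being stable during mount; kobject_add() can sleep, so it has to run
outside of rcu_read_lock()):

	rcu_read_lock();
	list_for_each_entry_rcu(space_info, &info->space_info, list) {
		/* existing profile checks and inc_block_group_ro() calls */
	}
	rcu_read_unlock();

	/* Second, non-RCU pass where sleeping in kobject_add() is fine. */
	list_for_each_entry(space_info, &info->space_info, list) {
		int i;

		for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
			if (list_empty(&space_info->block_groups[i]))
				continue;
			cache = list_first_entry(&space_info->block_groups[i],
						 struct btrfs_block_group,
						 list);
			btrfs_sysfs_add_block_group_type(cache);
		}
	}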

