* [PATCH V5 0/2] btrfs: device_list_mutex fix lockdep warn and cleanup
@ 2021-08-31  1:21 Anand Jain
  2021-08-31  1:21 ` [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs Anand Jain
  2021-08-31  1:21 ` [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent Anand Jain
  0 siblings, 2 replies; 12+ messages in thread
From: Anand Jain @ 2021-08-31  1:21 UTC (permalink / raw)
  To: linux-btrfs, dsterba; +Cc: l

There are no new patches here. These patches were previously sent either
independently or as part of another set on the mailing list. As these two
patches are related, they are now brought together in this set.

These two patches were at V2 and V4 respectively at their last reroll.
Here both are marked V5 to avoid any confusion, but there is no code
change.

Patch 1/2 is an older patch from the mailing list, now part of this set
because the new cleanup patch 2/2 depends on it.

Patch 2/2 was previously sent as part of the patchset (btrfs_show_devname
related fixes) and is now in this set as it depends on 1/2.

Anand Jain (2):
  btrfs: fix lockdep warning while mounting sprout fs
  btrfs: consolidate device_list_mutex in prepare_sprout to its parent

 fs/btrfs/volumes.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs
  2021-08-31  1:21 [PATCH V5 0/2] btrfs: device_list_mutex fix lockdep warn and cleanup Anand Jain
@ 2021-08-31  1:21 ` Anand Jain
  2021-08-31  8:18   ` Nikolay Borisov
                     ` (3 more replies)
  2021-08-31  1:21 ` [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent Anand Jain
  1 sibling, 4 replies; 12+ messages in thread
From: Anand Jain @ 2021-08-31  1:21 UTC (permalink / raw)
  To: linux-btrfs, dsterba; +Cc: l

The following test case reproduces the lockdep warning.

 Test case:

 $ mkfs.btrfs -f <dev1>
 $ btrfstune -S 1 <dev1>
 $ mount <dev1> <mnt>
 $ btrfs device add <dev2> <mnt> -f
 $ umount <mnt>
 $ mount <dev2> <mnt>
 $ umount <mnt>

The warning claims a possible ABBA deadlock between the threads initiated by
[#1] btrfs device add and [#0] the mount.

======================================================
[ 540.743122] WARNING: possible circular locking dependency detected
[ 540.743129] 5.11.0-rc7+ #5 Not tainted
[ 540.743135] ------------------------------------------------------
[ 540.743142] mount/2515 is trying to acquire lock:
[ 540.743149] ffffa0c5544c2ce0 (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: clone_fs_devices+0x6d/0x210 [btrfs]
[ 540.743458] but task is already holding lock:
[ 540.743461] ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
[ 540.743541] which lock already depends on the new lock.
[ 540.743543] the existing dependency chain (in reverse order) is:

[ 540.743546] -> #1 (btrfs-chunk-00){++++}-{4:4}:
[ 540.743566] down_read_nested+0x48/0x2b0
[ 540.743585] __btrfs_tree_read_lock+0x32/0x200 [btrfs]
[ 540.743650] btrfs_read_lock_root_node+0x70/0x200 [btrfs]
[ 540.743733] btrfs_search_slot+0x6c6/0xe00 [btrfs]
[ 540.743785] btrfs_update_device+0x83/0x260 [btrfs]
[ 540.743849] btrfs_finish_chunk_alloc+0x13f/0x660 [btrfs] <--- device_list_mutex
[ 540.743911] btrfs_create_pending_block_groups+0x18d/0x3f0 [btrfs]
[ 540.743982] btrfs_commit_transaction+0x86/0x1260 [btrfs]
[ 540.744037] btrfs_init_new_device+0x1600/0x1dd0 [btrfs]
[ 540.744101] btrfs_ioctl+0x1c77/0x24c0 [btrfs]
[ 540.744166] __x64_sys_ioctl+0xe4/0x140
[ 540.744170] do_syscall_64+0x4b/0x80
[ 540.744174] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 540.744180] -> #0 (&fs_devs->device_list_mutex){+.+.}-{4:4}:
[ 540.744184] __lock_acquire+0x155f/0x2360
[ 540.744188] lock_acquire+0x10b/0x5c0
[ 540.744190] __mutex_lock+0xb1/0xf80
[ 540.744193] mutex_lock_nested+0x27/0x30
[ 540.744196] clone_fs_devices+0x6d/0x210 [btrfs]
[ 540.744270] btrfs_read_chunk_tree+0x3c7/0xbb0 [btrfs]
[ 540.744336] open_ctree+0xf6e/0x2074 [btrfs]
[ 540.744406] btrfs_mount_root.cold.72+0x16/0x127 [btrfs]
[ 540.744472] legacy_get_tree+0x38/0x90
[ 540.744475] vfs_get_tree+0x30/0x140
[ 540.744478] fc_mount+0x16/0x60
[ 540.744482] vfs_kern_mount+0x91/0x100
[ 540.744484] btrfs_mount+0x1e6/0x670 [btrfs]
[ 540.744536] legacy_get_tree+0x38/0x90
[ 540.744537] vfs_get_tree+0x30/0x140
[ 540.744539] path_mount+0x8d8/0x1070
[ 540.744541] do_mount+0x8d/0xc0
[ 540.744543] __x64_sys_mount+0x125/0x160
[ 540.744545] do_syscall_64+0x4b/0x80
[ 540.744547] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 540.744551] other info that might help us debug this:
[ 540.744552] Possible unsafe locking scenario:

[ 540.744553] CPU0 				CPU1
[ 540.744554] ---- 				----
[ 540.744555] lock(btrfs-chunk-00);
[ 540.744557] 					lock(&fs_devs->device_list_mutex);
[ 540.744560] 					lock(btrfs-chunk-00);
[ 540.744562] lock(&fs_devs->device_list_mutex);
[ 540.744564]
 *** DEADLOCK ***

[ 540.744565] 3 locks held by mount/2515:
[ 540.744567] #0: ffffa0c56bf7a0e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.isra.16+0xdf/0x450
[ 540.744574] #1: ffffffffc05a9628 (uuid_mutex){+.+.}-{4:4}, at: btrfs_read_chunk_tree+0x63/0xbb0 [btrfs]
[ 540.744640] #2: ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
[ 540.744708]
 stack backtrace:
[ 540.744712] CPU: 2 PID: 2515 Comm: mount Not tainted 5.11.0-rc7+ #5

But the device_list_mutex in clone_fs_devices() is redundant, as explained
below. Two threads, [1] and [2] (below), can lead to clone_fs_devices().

[1]
open_ctree <== mount sprout fs
 btrfs_read_chunk_tree()
  mutex_lock(&uuid_mutex) <== global lock
  read_one_dev()
   open_seed_devices()
    clone_fs_devices() <== seed fs_devices
     mutex_lock(&orig->device_list_mutex) <== seed fs_devices

[2]
btrfs_init_new_device() <== sprouting
 mutex_lock(&uuid_mutex); <== global lock
 btrfs_prepare_sprout()
   lockdep_assert_held(&uuid_mutex)
   clone_fs_devices(seed_fs_device) <== seed fs_devices

Both of these threads hold uuid_mutex, which is sufficient to prevent the
seed device(s) from being freed while we are trying to clone them for
sprouting [2] or mounting a sprout [1] (as above). A mounted seed device
cannot be freed/written/replaced because it is read-only. An unmounted
seed device can be freed by btrfs_free_stale_devices(), but that requires
uuid_mutex. So this patch removes the unnecessary device_list_mutex in
clone_fs_devices() and adds a lockdep_assert_held(&uuid_mutex) there.

Reported-by: Su Yue <l@damenly.su>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v5: Vet test case in the changelog.
v2: Remove Martin's Reported-by and Tested-by.
    Add Su's Reported-by.
    Add lockdep_assert_held check.
    Update the changelog, make it relevant to the current misc-next

 fs/btrfs/volumes.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index abdc392f6ef9..fa9fe47b5b68 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -558,6 +558,8 @@ static int btrfs_free_stale_devices(const char *path,
 	struct btrfs_device *device, *tmp_device;
 	int ret = 0;
 
+	lockdep_assert_held(&uuid_mutex);
+
 	if (path)
 		ret = -ENOENT;
 
@@ -988,11 +990,12 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 	struct btrfs_device *orig_dev;
 	int ret = 0;
 
+	lockdep_assert_held(&uuid_mutex);
+
 	fs_devices = alloc_fs_devices(orig->fsid, NULL);
 	if (IS_ERR(fs_devices))
 		return fs_devices;
 
-	mutex_lock(&orig->device_list_mutex);
 	fs_devices->total_devices = orig->total_devices;
 
 	list_for_each_entry(orig_dev, &orig->devices, dev_list) {
@@ -1024,10 +1027,8 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 		device->fs_devices = fs_devices;
 		fs_devices->num_devices++;
 	}
-	mutex_unlock(&orig->device_list_mutex);
 	return fs_devices;
 error:
-	mutex_unlock(&orig->device_list_mutex);
 	free_fs_devices(fs_devices);
 	return ERR_PTR(ret);
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent
  2021-08-31  1:21 [PATCH V5 0/2] btrfs: device_list_mutex fix lockdep warn and cleanup Anand Jain
  2021-08-31  1:21 ` [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs Anand Jain
@ 2021-08-31  1:21 ` Anand Jain
  2021-08-31 13:03   ` Nikolay Borisov
  2021-09-17 15:37   ` David Sterba
  1 sibling, 2 replies; 12+ messages in thread
From: Anand Jain @ 2021-08-31  1:21 UTC (permalink / raw)
  To: linux-btrfs, dsterba; +Cc: l

btrfs_prepare_sprout() moves seed devices into its own struct fs_devices,
so that its parent function btrfs_init_new_device() can add the new sprout
device to fs_info->fs_devices.

Both btrfs_prepare_sprout() and btrfs_init_new_device() need the
device_list_mutex. But they hold it sequentially, which creates a small
window for a race. Close this window by holding the device_list_mutex
across both btrfs_init_new_device() and btrfs_prepare_sprout().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
RFC because IMO the cleanup of device_list_mutex makes sense even though
there isn't another thread that could potentially race as of now.

Depends on
 [PATCH v2] btrfs: fix lockdep warning while mounting sprout fs
which removed the device_list_mutex from clone_fs_devices(); otherwise
this patch would cause a double lock of that mutex.

v2: fix the missing mutex_unlock in the error return
v3: -
v4: -
v5: - (Except for the change in below SO comments)

 fs/btrfs/volumes.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fa9fe47b5b68..53ead67b625c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2369,6 +2369,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	u64 super_flags;
 
 	lockdep_assert_held(&uuid_mutex);
+	lockdep_assert_held(&fs_devices->device_list_mutex);
+
 	if (!fs_devices->seeding)
 		return -EINVAL;
 
@@ -2400,7 +2402,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	INIT_LIST_HEAD(&seed_devices->alloc_list);
 	mutex_init(&seed_devices->device_list_mutex);
 
-	mutex_lock(&fs_devices->device_list_mutex);
 	list_splice_init_rcu(&fs_devices->devices, &seed_devices->devices,
 			      synchronize_rcu);
 	list_for_each_entry(device, &seed_devices->devices, dev_list)
@@ -2416,7 +2417,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	generate_random_uuid(fs_devices->fsid);
 	memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
-	mutex_unlock(&fs_devices->device_list_mutex);
 
 	super_flags = btrfs_super_flags(disk_super) &
 		      ~BTRFS_SUPER_FLAG_SEEDING;
@@ -2591,10 +2591,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	device->dev_stats_valid = 1;
 	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
 
+	mutex_lock(&fs_devices->device_list_mutex);
 	if (seeding_dev) {
 		btrfs_clear_sb_rdonly(sb);
 		ret = btrfs_prepare_sprout(fs_info);
 		if (ret) {
+			mutex_unlock(&fs_devices->device_list_mutex);
 			btrfs_abort_transaction(trans, ret);
 			goto error_trans;
 		}
@@ -2604,7 +2606,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	device->fs_devices = fs_devices;
 
-	mutex_lock(&fs_devices->device_list_mutex);
 	mutex_lock(&fs_info->chunk_mutex);
 	list_add_rcu(&device->dev_list, &fs_devices->devices);
 	list_add(&device->dev_alloc_list, &fs_devices->alloc_list);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs
  2021-08-31  1:21 ` [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs Anand Jain
@ 2021-08-31  8:18   ` Nikolay Borisov
  2021-09-02 23:51     ` Anand Jain
  2021-08-31 12:37   ` Nikolay Borisov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2021-08-31  8:18 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs, dsterba; +Cc: l



On 31.08.21 г. 4:21, Anand Jain wrote:
> Following test case reproduces lockdep warning.
> 
>  Test case:
> 
>  $ mkfs.btrfs -f <dev1>
>  $ btrfstune -S 1 <dev1>
>  $ mount <dev1> <mnt>
>  $ btrfs device add <dev2> <mnt> -f
>  $ umount <mnt>
>  $ mount <dev2> <mnt>
>  $ umount <mnt>
> 
> The warning claims a possible ABBA deadlock between the threads initiated by
> [#1] btrfs device add and [#0] the mount.
> 

Send this as an xfstest

<snip>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs
  2021-08-31  1:21 ` [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs Anand Jain
  2021-08-31  8:18   ` Nikolay Borisov
@ 2021-08-31 12:37   ` Nikolay Borisov
  2021-09-01  0:49   ` Su Yue
  2021-09-02 15:28   ` David Sterba
  3 siblings, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2021-08-31 12:37 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs, dsterba; +Cc: l



On 31.08.21 г. 4:21, Anand Jain wrote:
> Following test case reproduces lockdep warning.
> 
>  Test case:
> 
>  $ mkfs.btrfs -f <dev1>
>  $ btrfstune -S 1 <dev1>
>  $ mount <dev1> <mnt>
>  $ btrfs device add <dev2> <mnt> -f
>  $ umount <mnt>
>  $ mount <dev2> <mnt>
>  $ umount <mnt>
> 
> The warning claims a possible ABBA deadlock between the threads initiated by
> [#1] btrfs device add and [#0] the mount.
> 
> ======================================================
> [ 540.743122] WARNING: possible circular locking dependency detected
> [ 540.743129] 5.11.0-rc7+ #5 Not tainted
> [ 540.743135] ------------------------------------------------------
> [ 540.743142] mount/2515 is trying to acquire lock:
> [ 540.743149] ffffa0c5544c2ce0 (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: clone_fs_devices+0x6d/0x210 [btrfs]
> [ 540.743458] but task is already holding lock:
> [ 540.743461] ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
> [ 540.743541] which lock already depends on the new lock.
> [ 540.743543] the existing dependency chain (in reverse order) is:
> 
> [ 540.743546] -> #1 (btrfs-chunk-00){++++}-{4:4}:
> [ 540.743566] down_read_nested+0x48/0x2b0
> [ 540.743585] __btrfs_tree_read_lock+0x32/0x200 [btrfs]
> [ 540.743650] btrfs_read_lock_root_node+0x70/0x200 [btrfs]
> [ 540.743733] btrfs_search_slot+0x6c6/0xe00 [btrfs]
> [ 540.743785] btrfs_update_device+0x83/0x260 [btrfs]
> [ 540.743849] btrfs_finish_chunk_alloc+0x13f/0x660 [btrfs] <--- device_list_mutex
> [ 540.743911] btrfs_create_pending_block_groups+0x18d/0x3f0 [btrfs]
> [ 540.743982] btrfs_commit_transaction+0x86/0x1260 [btrfs]
> [ 540.744037] btrfs_init_new_device+0x1600/0x1dd0 [btrfs]
> [ 540.744101] btrfs_ioctl+0x1c77/0x24c0 [btrfs]
> [ 540.744166] __x64_sys_ioctl+0xe4/0x140
> [ 540.744170] do_syscall_64+0x4b/0x80
> [ 540.744174] entry_SYSCALL_64_after_hwframe+0x44/0xa9

In this call chain we lock device_list_mutex followed by locking the
chunk root's extent buffer.
> 
> [ 540.744180] -> #0 (&fs_devs->device_list_mutex){+.+.}-{4:4}:
> [ 540.744184] __lock_acquire+0x155f/0x2360
> [ 540.744188] lock_acquire+0x10b/0x5c0
> [ 540.744190] __mutex_lock+0xb1/0xf80
> [ 540.744193] mutex_lock_nested+0x27/0x30
> [ 540.744196] clone_fs_devices+0x6d/0x210 [btrfs]
> [ 540.744270] btrfs_read_chunk_tree+0x3c7/0xbb0 [btrfs]
> [ 540.744336] open_ctree+0xf6e/0x2074 [btrfs]
> [ 540.744406] btrfs_mount_root.cold.72+0x16/0x127 [btrfs]
> [ 540.744472] legacy_get_tree+0x38/0x90
> [ 540.744475] vfs_get_tree+0x30/0x140
> [ 540.744478] fc_mount+0x16/0x60
> [ 540.744482] vfs_kern_mount+0x91/0x100
> [ 540.744484] btrfs_mount+0x1e6/0x670 [btrfs]
> [ 540.744536] legacy_get_tree+0x38/0x90
> [ 540.744537] vfs_get_tree+0x30/0x140
> [ 540.744539] path_mount+0x8d8/0x1070
> [ 540.744541] do_mount+0x8d/0xc0
> [ 540.744543] __x64_sys_mount+0x125/0x160
> [ 540.744545] do_syscall_64+0x4b/0x80
> [ 540.744547] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Whereas in this call chain we first lock the chunk_root in
btrfs_read_chunk_tree and then acquire device_list_mutex in
clone_fs_devices. And this inversion of lock order could potentially
lead to a deadlock.
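
For readers less used to lockdep reports, here is a minimal userspace
analogue of the inversion being flagged: two mutexes taken in opposite
order by two threads. The lock names below only mirror the kernel locks
involved; this is an illustration, not btrfs code.

  /* abba.c - written so that it can deadlock; build with: gcc abba.c -lpthread */
  #include <pthread.h>
  #include <stdio.h>
  #include <unistd.h>

  /* stand-ins for fs_devs->device_list_mutex and btrfs-chunk-00 */
  static pthread_mutex_t device_list = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t chunk_root = PTHREAD_MUTEX_INITIALIZER;

  /* analogue of the device add path (#1): device_list, then chunk_root */
  static void *device_add(void *arg)
  {
          pthread_mutex_lock(&device_list);
          usleep(1000);                       /* widen the race window */
          pthread_mutex_lock(&chunk_root);
          pthread_mutex_unlock(&chunk_root);
          pthread_mutex_unlock(&device_list);
          return NULL;
  }

  /* analogue of the mount path (#0): chunk_root, then device_list (inverted) */
  static void *mount_sprout(void *arg)
  {
          pthread_mutex_lock(&chunk_root);
          usleep(1000);
          pthread_mutex_lock(&device_list);
          pthread_mutex_unlock(&device_list);
          pthread_mutex_unlock(&chunk_root);
          return NULL;
  }

  int main(void)
  {
          pthread_t t1, t2;

          pthread_create(&t1, NULL, device_add, NULL);
          pthread_create(&t2, NULL, mount_sprout, NULL);
          pthread_join(t1, NULL);   /* never returns if the two threads deadlock */
          pthread_join(t2, NULL);
          printf("no deadlock on this run\n");
          return 0;
  }

Whether this program actually hangs depends on timing, which is exactly
why lockdep flags the lock ordering instead of waiting for the deadlock
to happen.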

> 
> [ 540.744551] other info that might help us debug this:
> [ 540.744552] Possible unsafe locking scenario:
> 
> [ 540.744553] CPU0 				CPU1
> [ 540.744554] ---- 				----
> [ 540.744555] lock(btrfs-chunk-00);
> [ 540.744557] 					lock(&fs_devs->device_list_mutex);
> [ 540.744560] 					lock(btrfs-chunk-00);
> [ 540.744562] lock(&fs_devs->device_list_mutex);
> [ 540.744564]
>  *** DEADLOCK ***
> 
> [ 540.744565] 3 locks held by mount/2515:
> [ 540.744567] #0: ffffa0c56bf7a0e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.isra.16+0xdf/0x450
> [ 540.744574] #1: ffffffffc05a9628 (uuid_mutex){+.+.}-{4:4}, at: btrfs_read_chunk_tree+0x63/0xbb0 [btrfs]
> [ 540.744640] #2: ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
> [ 540.744708]
>  stack backtrace:
> [ 540.744712] CPU: 2 PID: 2515 Comm: mount Not tainted 5.11.0-rc7+ #5
> 
> But the device_list_mutex in clone_fs_devices() is redundant, as explained
> below.
> Two threads [1]  and [2] (below) could lead to clone_fs_device().
> 
> [1]
> open_ctree <== mount sprout fs
>  btrfs_read_chunk_tree()
>   mutex_lock(&uuid_mutex) <== global lock
>   read_one_dev()
>    open_seed_devices()
>     clone_fs_devices() <== seed fs_devices
>      mutex_lock(&orig->device_list_mutex) <== seed fs_devices
> 
> [2]
> btrfs_init_new_device() <== sprouting
>  mutex_lock(&uuid_mutex); <== global lock
>  btrfs_prepare_sprout()
>    lockdep_assert_held(&uuid_mutex)
>    clone_fs_devices(seed_fs_device) <== seed fs_devices
> 
> Both of these threads hold uuid_mutex which is sufficient to protect
> getting the seed device(s) freed while we are trying to clone it for
> sprouting [2] or mounting a sprout [1] (as above). A mounted seed
> device can not free/write/replace because it is read-only. An unmounted
> seed device can free by btrfs_free_stale_devices(), but it needs uuid_mutex.
> So this patch removes the unnecessary device_list_mutex in clone_fs_devices().
> And adds a lockdep_assert_held(&uuid_mutex) in clone_fs_devices().

The reason we hold the device_list_mutex is not only due to freeing
but also due to possible racing mounts of filesystems seeded by the same
device. I.e. in btrfs_prepare_sprout we essentially "empty" the seed
following the call to clone_fs_devices by splicing its devices onto the
newly allocated 'seed_devices'.

But since those processes are synchronized by uuid_mutex (i.e.
open_seed_devices) I'd say this is fine.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> 
> Reported-by: Su Yue <l@damenly.su>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> v5: Vet test case in the changelog.
> v2: Remove Martin's Reported-by and Tested-by.
>     Add Su's Reported-by.
>     Add lockdep_assert_held check.
>     Update the changelog, make it relevant to the current misc-next
> 
>  fs/btrfs/volumes.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index abdc392f6ef9..fa9fe47b5b68 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -558,6 +558,8 @@ static int btrfs_free_stale_devices(const char *path,
>  	struct btrfs_device *device, *tmp_device;
>  	int ret = 0;
>  
> +	lockdep_assert_held(&uuid_mutex);
> +
>  	if (path)
>  		ret = -ENOENT;
>  
> @@ -988,11 +990,12 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>  	struct btrfs_device *orig_dev;
>  	int ret = 0;
>  
> +	lockdep_assert_held(&uuid_mutex);
> +
>  	fs_devices = alloc_fs_devices(orig->fsid, NULL);
>  	if (IS_ERR(fs_devices))
>  		return fs_devices;
>  
> -	mutex_lock(&orig->device_list_mutex);
>  	fs_devices->total_devices = orig->total_devices;
>  
>  	list_for_each_entry(orig_dev, &orig->devices, dev_list) {
> @@ -1024,10 +1027,8 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>  		device->fs_devices = fs_devices;
>  		fs_devices->num_devices++;
>  	}
> -	mutex_unlock(&orig->device_list_mutex);
>  	return fs_devices;
>  error:
> -	mutex_unlock(&orig->device_list_mutex);
>  	free_fs_devices(fs_devices);
>  	return ERR_PTR(ret);
>  }
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent
  2021-08-31  1:21 ` [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent Anand Jain
@ 2021-08-31 13:03   ` Nikolay Borisov
  2021-09-03  3:08     ` Anand Jain
  2021-09-17 15:37   ` David Sterba
  1 sibling, 1 reply; 12+ messages in thread
From: Nikolay Borisov @ 2021-08-31 13:03 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs, dsterba; +Cc: l



On 31.08.21 г. 4:21, Anand Jain wrote:
> btrfs_prepare_sprout() moves seed devices into its own struct fs_devices,
> so that its parent function btrfs_init_new_device() can add the new sprout
> device to fs_info->fs_devices.
> 
> Both btrfs_prepare_sprout() and btrfs_init_new_device() needs
> device_list_mutex. But they are holding it sequentially, thus creates a
> small window to an opportunity to race. Close this opportunity and hold
> device_list_mutex common to both btrfs_init_new_device() and
> btrfs_prepare_sprout().
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

That's a moot point; what's important is that btrfs_prepare_sprout
leaves the fs_devices in a consistent state, and btrfs_init_new_device
also takes the lock when it's about to modify the devices list, so
that's fine as well. While the patch itself won't do any harm, I think
it's irrelevant.

<snip>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs
  2021-08-31  1:21 ` [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs Anand Jain
  2021-08-31  8:18   ` Nikolay Borisov
  2021-08-31 12:37   ` Nikolay Borisov
@ 2021-09-01  0:49   ` Su Yue
  2021-09-02 15:28   ` David Sterba
  3 siblings, 0 replies; 12+ messages in thread
From: Su Yue @ 2021-09-01  0:49 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba


On Tue 31 Aug 2021 at 09:21, Anand Jain <anand.jain@oracle.com> 
wrote:

> Following test case reproduces lockdep warning.
>
>  Test case:
>
>  $ mkfs.btrfs -f <dev1>
>  $ btrfstune -S 1 <dev1>
>  $ mount <dev1> <mnt>
>  $ btrfs device add <dev2> <mnt> -f
>  $ umount <mnt>
>  $ mount <dev2> <mnt>
>  $ umount <mnt>
>
> The warning claims a possible ABBA deadlock between the threads 
> initiated by
> [#1] btrfs device add and [#0] the mount.
>
> ======================================================
> [ 540.743122] WARNING: possible circular locking dependency 
> detected
> [ 540.743129] 5.11.0-rc7+ #5 Not tainted
> [ 540.743135] 
> ------------------------------------------------------
> [ 540.743142] mount/2515 is trying to acquire lock:
> [ 540.743149] ffffa0c5544c2ce0 
> (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: 
> clone_fs_devices+0x6d/0x210 [btrfs]
> [ 540.743458] but task is already holding lock:
> [ 540.743461] ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: 
> __btrfs_tree_read_lock+0x32/0x200 [btrfs]
> [ 540.743541] which lock already depends on the new lock.
> [ 540.743543] the existing dependency chain (in reverse order) 
> is:
>
> [ 540.743546] -> #1 (btrfs-chunk-00){++++}-{4:4}:
> [ 540.743566] down_read_nested+0x48/0x2b0
> [ 540.743585] __btrfs_tree_read_lock+0x32/0x200 [btrfs]
> [ 540.743650] btrfs_read_lock_root_node+0x70/0x200 [btrfs]
> [ 540.743733] btrfs_search_slot+0x6c6/0xe00 [btrfs]
> [ 540.743785] btrfs_update_device+0x83/0x260 [btrfs]
> [ 540.743849] btrfs_finish_chunk_alloc+0x13f/0x660 [btrfs] <--- 
> device_list_mutex
> [ 540.743911] btrfs_create_pending_block_groups+0x18d/0x3f0 
> [btrfs]
> [ 540.743982] btrfs_commit_transaction+0x86/0x1260 [btrfs]
> [ 540.744037] btrfs_init_new_device+0x1600/0x1dd0 [btrfs]
> [ 540.744101] btrfs_ioctl+0x1c77/0x24c0 [btrfs]
> [ 540.744166] __x64_sys_ioctl+0xe4/0x140
> [ 540.744170] do_syscall_64+0x4b/0x80
> [ 540.744174] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [ 540.744180] -> #0 (&fs_devs->device_list_mutex){+.+.}-{4:4}:
> [ 540.744184] __lock_acquire+0x155f/0x2360
> [ 540.744188] lock_acquire+0x10b/0x5c0
> [ 540.744190] __mutex_lock+0xb1/0xf80
> [ 540.744193] mutex_lock_nested+0x27/0x30
> [ 540.744196] clone_fs_devices+0x6d/0x210 [btrfs]
> [ 540.744270] btrfs_read_chunk_tree+0x3c7/0xbb0 [btrfs]
> [ 540.744336] open_ctree+0xf6e/0x2074 [btrfs]
> [ 540.744406] btrfs_mount_root.cold.72+0x16/0x127 [btrfs]
> [ 540.744472] legacy_get_tree+0x38/0x90
> [ 540.744475] vfs_get_tree+0x30/0x140
> [ 540.744478] fc_mount+0x16/0x60
> [ 540.744482] vfs_kern_mount+0x91/0x100
> [ 540.744484] btrfs_mount+0x1e6/0x670 [btrfs]
> [ 540.744536] legacy_get_tree+0x38/0x90
> [ 540.744537] vfs_get_tree+0x30/0x140
> [ 540.744539] path_mount+0x8d8/0x1070
> [ 540.744541] do_mount+0x8d/0xc0
> [ 540.744543] __x64_sys_mount+0x125/0x160
> [ 540.744545] do_syscall_64+0x4b/0x80
> [ 540.744547] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [ 540.744551] other info that might help us debug this:
> [ 540.744552] Possible unsafe locking scenario:
>
> [ 540.744553] CPU0 				CPU1
> [ 540.744554] ---- 				----
> [ 540.744555] lock(btrfs-chunk-00);
> [ 540.744557] 
> lock(&fs_devs->device_list_mutex);
> [ 540.744560] 					lock(btrfs-chunk-00);
> [ 540.744562] lock(&fs_devs->device_list_mutex);
> [ 540.744564]
>  *** DEADLOCK ***
>
> [ 540.744565] 3 locks held by mount/2515:
> [ 540.744567] #0: ffffa0c56bf7a0e0 
> (&type->s_umount_key#42/1){+.+.}-{4:4}, at: 
> alloc_super.isra.16+0xdf/0x450
> [ 540.744574] #1: ffffffffc05a9628 (uuid_mutex){+.+.}-{4:4}, at: 
> btrfs_read_chunk_tree+0x63/0xbb0 [btrfs]
> [ 540.744640] #2: ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, 
> at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
> [ 540.744708]
>  stack backtrace:
> [ 540.744712] CPU: 2 PID: 2515 Comm: mount Not tainted 
> 5.11.0-rc7+ #5
>
> But the device_list_mutex in clone_fs_devices() is redundant, as 
> explained
> below.
> Two threads [1]  and [2] (below) could lead to 
> clone_fs_device().
>
> [1]
> open_ctree <== mount sprout fs
>  btrfs_read_chunk_tree()
>   mutex_lock(&uuid_mutex) <== global lock
>   read_one_dev()
>    open_seed_devices()
>     clone_fs_devices() <== seed fs_devices
>      mutex_lock(&orig->device_list_mutex) <== seed fs_devices
>
> [2]
> btrfs_init_new_device() <== sprouting
>  mutex_lock(&uuid_mutex); <== global lock
>  btrfs_prepare_sprout()
>    lockdep_assert_held(&uuid_mutex)
>    clone_fs_devices(seed_fs_device) <== seed fs_devices
>
> Both of these threads hold uuid_mutex which is sufficient to 
> protect
> getting the seed device(s) freed while we are trying to clone it 
> for
> sprouting [2] or mounting a sprout [1] (as above). A mounted 
> seed
> device can not free/write/replace because it is read-only. An 
> unmounted
> seed device can free by btrfs_free_stale_devices(), but it needs 
> uuid_mutex.
> So this patch removes the unnecessary device_list_mutex in 
> clone_fs_devices().
> And adds a lockdep_assert_held(&uuid_mutex) in 
> clone_fs_devices().
>
> Reported-by: Su Yue <l@damenly.su>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>

Tested-by: Su Yue <l@damenly.su>

--
Su
> ---
> v5: Vet test case in the changelog.
> v2: Remove Martin's Reported-by and Tested-by.
>     Add Su's Reported-by.
>     Add lockdep_assert_held check.
>     Update the changelog, make it relevant to the current 
>     misc-next
>
>  fs/btrfs/volumes.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index abdc392f6ef9..fa9fe47b5b68 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -558,6 +558,8 @@ static int btrfs_free_stale_devices(const 
> char *path,
>  	struct btrfs_device *device, *tmp_device;
>  	int ret = 0;
>
> +	lockdep_assert_held(&uuid_mutex);
> +
>  	if (path)
>  		ret = -ENOENT;
>
> @@ -988,11 +990,12 @@ static struct btrfs_fs_devices 
> *clone_fs_devices(struct btrfs_fs_devices *orig)
>  	struct btrfs_device *orig_dev;
>  	int ret = 0;
>
> +	lockdep_assert_held(&uuid_mutex);
> +
>  	fs_devices = alloc_fs_devices(orig->fsid, NULL);
>  	if (IS_ERR(fs_devices))
>  		return fs_devices;
>
> -	mutex_lock(&orig->device_list_mutex);
>  	fs_devices->total_devices = orig->total_devices;
>
>  	list_for_each_entry(orig_dev, &orig->devices, dev_list) {
> @@ -1024,10 +1027,8 @@ static struct btrfs_fs_devices 
> *clone_fs_devices(struct btrfs_fs_devices *orig)
>  		device->fs_devices = fs_devices;
>  		fs_devices->num_devices++;
>  	}
> -	mutex_unlock(&orig->device_list_mutex);
>  	return fs_devices;
>  error:
> -	mutex_unlock(&orig->device_list_mutex);
>  	free_fs_devices(fs_devices);
>  	return ERR_PTR(ret);
>  }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs
  2021-08-31  1:21 ` [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs Anand Jain
                     ` (2 preceding siblings ...)
  2021-09-01  0:49   ` Su Yue
@ 2021-09-02 15:28   ` David Sterba
  3 siblings, 0 replies; 12+ messages in thread
From: David Sterba @ 2021-09-02 15:28 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba, l

On Tue, Aug 31, 2021 at 09:21:28AM +0800, Anand Jain wrote:
> Following test case reproduces lockdep warning.
> 
>  Test case:
> 
>  $ mkfs.btrfs -f <dev1>
>  $ btrfstune -S 1 <dev1>
>  $ mount <dev1> <mnt>
>  $ btrfs device add <dev2> <mnt> -f
>  $ umount <mnt>
>  $ mount <dev2> <mnt>
>  $ umount <mnt>
> 
> The warning claims a possible ABBA deadlock between the threads initiated by
> [#1] btrfs device add and [#0] the mount.
> 
> ======================================================
> [ 540.743122] WARNING: possible circular locking dependency detected
> [ 540.743129] 5.11.0-rc7+ #5 Not tainted
> [ 540.743135] ------------------------------------------------------
> [ 540.743142] mount/2515 is trying to acquire lock:
> [ 540.743149] ffffa0c5544c2ce0 (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: clone_fs_devices+0x6d/0x210 [btrfs]
> [ 540.743458] but task is already holding lock:
> [ 540.743461] ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
> [ 540.743541] which lock already depends on the new lock.
> [ 540.743543] the existing dependency chain (in reverse order) is:
> 
> [ 540.743546] -> #1 (btrfs-chunk-00){++++}-{4:4}:
> [ 540.743566] down_read_nested+0x48/0x2b0
> [ 540.743585] __btrfs_tree_read_lock+0x32/0x200 [btrfs]
> [ 540.743650] btrfs_read_lock_root_node+0x70/0x200 [btrfs]
> [ 540.743733] btrfs_search_slot+0x6c6/0xe00 [btrfs]
> [ 540.743785] btrfs_update_device+0x83/0x260 [btrfs]
> [ 540.743849] btrfs_finish_chunk_alloc+0x13f/0x660 [btrfs] <--- device_list_mutex
> [ 540.743911] btrfs_create_pending_block_groups+0x18d/0x3f0 [btrfs]
> [ 540.743982] btrfs_commit_transaction+0x86/0x1260 [btrfs]
> [ 540.744037] btrfs_init_new_device+0x1600/0x1dd0 [btrfs]
> [ 540.744101] btrfs_ioctl+0x1c77/0x24c0 [btrfs]
> [ 540.744166] __x64_sys_ioctl+0xe4/0x140
> [ 540.744170] do_syscall_64+0x4b/0x80
> [ 540.744174] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> [ 540.744180] -> #0 (&fs_devs->device_list_mutex){+.+.}-{4:4}:
> [ 540.744184] __lock_acquire+0x155f/0x2360
> [ 540.744188] lock_acquire+0x10b/0x5c0
> [ 540.744190] __mutex_lock+0xb1/0xf80
> [ 540.744193] mutex_lock_nested+0x27/0x30
> [ 540.744196] clone_fs_devices+0x6d/0x210 [btrfs]
> [ 540.744270] btrfs_read_chunk_tree+0x3c7/0xbb0 [btrfs]
> [ 540.744336] open_ctree+0xf6e/0x2074 [btrfs]
> [ 540.744406] btrfs_mount_root.cold.72+0x16/0x127 [btrfs]
> [ 540.744472] legacy_get_tree+0x38/0x90
> [ 540.744475] vfs_get_tree+0x30/0x140
> [ 540.744478] fc_mount+0x16/0x60
> [ 540.744482] vfs_kern_mount+0x91/0x100
> [ 540.744484] btrfs_mount+0x1e6/0x670 [btrfs]
> [ 540.744536] legacy_get_tree+0x38/0x90
> [ 540.744537] vfs_get_tree+0x30/0x140
> [ 540.744539] path_mount+0x8d8/0x1070
> [ 540.744541] do_mount+0x8d/0xc0
> [ 540.744543] __x64_sys_mount+0x125/0x160
> [ 540.744545] do_syscall_64+0x4b/0x80
> [ 540.744547] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> [ 540.744551] other info that might help us debug this:
> [ 540.744552] Possible unsafe locking scenario:
> 
> [ 540.744553] CPU0 				CPU1
> [ 540.744554] ---- 				----
> [ 540.744555] lock(btrfs-chunk-00);
> [ 540.744557] 					lock(&fs_devs->device_list_mutex);
> [ 540.744560] 					lock(btrfs-chunk-00);
> [ 540.744562] lock(&fs_devs->device_list_mutex);
> [ 540.744564]
>  *** DEADLOCK ***
> 
> [ 540.744565] 3 locks held by mount/2515:
> [ 540.744567] #0: ffffa0c56bf7a0e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.isra.16+0xdf/0x450
> [ 540.744574] #1: ffffffffc05a9628 (uuid_mutex){+.+.}-{4:4}, at: btrfs_read_chunk_tree+0x63/0xbb0 [btrfs]
> [ 540.744640] #2: ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
> [ 540.744708]
>  stack backtrace:
> [ 540.744712] CPU: 2 PID: 2515 Comm: mount Not tainted 5.11.0-rc7+ #5
> 
> But the device_list_mutex in clone_fs_devices() is redundant, as explained
> below.
> Two threads [1]  and [2] (below) could lead to clone_fs_device().
> 
> [1]
> open_ctree <== mount sprout fs
>  btrfs_read_chunk_tree()
>   mutex_lock(&uuid_mutex) <== global lock
>   read_one_dev()
>    open_seed_devices()
>     clone_fs_devices() <== seed fs_devices
>      mutex_lock(&orig->device_list_mutex) <== seed fs_devices
> 
> [2]
> btrfs_init_new_device() <== sprouting
>  mutex_lock(&uuid_mutex); <== global lock
>  btrfs_prepare_sprout()
>    lockdep_assert_held(&uuid_mutex)
>    clone_fs_devices(seed_fs_device) <== seed fs_devices
> 
> Both of these threads hold uuid_mutex which is sufficient to protect
> getting the seed device(s) freed while we are trying to clone it for
> sprouting [2] or mounting a sprout [1] (as above). A mounted seed
> device can not free/write/replace because it is read-only. An unmounted
> seed device can free by btrfs_free_stale_devices(), but it needs uuid_mutex.
> So this patch removes the unnecessary device_list_mutex in clone_fs_devices().
> And adds a lockdep_assert_held(&uuid_mutex) in clone_fs_devices().
> 
> Reported-by: Su Yue <l@damenly.su>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

For the record, this patch has been added to misc-next.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs
  2021-08-31  8:18   ` Nikolay Borisov
@ 2021-09-02 23:51     ` Anand Jain
  0 siblings, 0 replies; 12+ messages in thread
From: Anand Jain @ 2021-09-02 23:51 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: l, linux-btrfs, dsterba

On 31/08/2021 16:18, Nikolay Borisov wrote:
> 
> 
> On 31.08.21 г. 4:21, Anand Jain wrote:
>> Following test case reproduces lockdep warning.
>>
>>   Test case:
>>
>>   $ mkfs.btrfs -f <dev1>
>>   $ btrfstune -S 1 <dev1>
>>   $ mount <dev1> <mnt>
>>   $ btrfs device add <dev2> <mnt> -f
>>   $ umount <mnt>
>>   $ mount <dev2> <mnt>
>>   $ umount <mnt>
>>
>> The warning claims a possible ABBA deadlock between the threads initiated by
>> [#1] btrfs device add and [#0] the mount.
>>
> 
> Send this as an xfstest

The above steps are the most common in the seed group of test cases.
The first test case (btrfs/161) in the -g seed group already reproduces
the lockdep warning.

Any idea how I can reset the lockdep warning (without a reboot) so that
it will report the same lockdep warning stack again if it is triggered
again?


Thanks, Anand


> 
> <snip>
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent
  2021-08-31 13:03   ` Nikolay Borisov
@ 2021-09-03  3:08     ` Anand Jain
  0 siblings, 0 replies; 12+ messages in thread
From: Anand Jain @ 2021-09-03  3:08 UTC (permalink / raw)
  To: Nikolay Borisov, dsterba; +Cc: l, linux-btrfs

On 31/08/2021 21:03, Nikolay Borisov wrote:
> 
> 
> On 31.08.21 г. 4:21, Anand Jain wrote:
>> btrfs_prepare_sprout() moves seed devices into its own struct fs_devices,
>> so that its parent function btrfs_init_new_device() can add the new sprout
>> device to fs_info->fs_devices.
>>
>> Both btrfs_prepare_sprout() and btrfs_init_new_device() needs
>> device_list_mutex. But they are holding it sequentially, thus creates a
>> small window to an opportunity to race. Close this opportunity and hold
>> device_list_mutex common to both btrfs_init_new_device() and
>> btrfs_prepare_sprout().
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> 
> That's a moot point, what's important is that btrfs_prepare_sprout
> leaves the fs_devices in a consistent state and btrfs_init_new_device
> also takes the lock when it's about to modify the devices list, so
> that's fine as well. While the patch itself won't do any harm I think
> it's irrelevant.


  This patch is for the cleanup of the device_list_mutex.

Thanks, Anand

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent
  2021-08-31  1:21 ` [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent Anand Jain
  2021-08-31 13:03   ` Nikolay Borisov
@ 2021-09-17 15:37   ` David Sterba
  2021-09-18  0:10     ` Anand Jain
  1 sibling, 1 reply; 12+ messages in thread
From: David Sterba @ 2021-09-17 15:37 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, dsterba, l

On Tue, Aug 31, 2021 at 09:21:29AM +0800, Anand Jain wrote:
> btrfs_prepare_sprout() moves seed devices into its own struct fs_devices,
> so that its parent function btrfs_init_new_device() can add the new sprout
> device to fs_info->fs_devices.
> 
> Both btrfs_prepare_sprout() and btrfs_init_new_device() needs
> device_list_mutex. But they are holding it sequentially, thus creates a
> small window to an opportunity to race. Close this opportunity and hold
> device_list_mutex common to both btrfs_init_new_device() and
> btrfs_prepare_sprout().

I don't see what exactly would go wrong with the separate device list
locking, but I see at least one potential problem with the new code.

> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> RFC because IMO the cleanup of device_list_mutex makes sense even though
> there isn't another thread that could race potentially race as of now.
> 
> Depends on
>  [PATCH v2] btrfs: fix lockdep warning while mounting sprout fs
> which removed the device_list_mutex from clone_fs_devices() otherwise
> this patch will cause a double mutex error.
> 
> v2: fix the missing mutex_unlock in the error return
> v3: -
> v4: -
> v5: - (Except for the change in below SO comments)
> 
>  fs/btrfs/volumes.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index fa9fe47b5b68..53ead67b625c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2369,6 +2369,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>  	u64 super_flags;
>  
>  	lockdep_assert_held(&uuid_mutex);
> +	lockdep_assert_held(&fs_devices->device_list_mutex);
> +
>  	if (!fs_devices->seeding)
>  		return -EINVAL;
>  
> @@ -2400,7 +2402,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>  	INIT_LIST_HEAD(&seed_devices->alloc_list);
>  	mutex_init(&seed_devices->device_list_mutex);

A few lines before this one there are alloc_fs_devices and
clone_fs_devices, both allocating memory. This would happen under a big
lock, as device_list_mutex also protects superblock writes. This is a
pattern to avoid.

A rough idea would be to split btrfs_prepare_sprout into parts where the
allocations are not done under the lock and the locked part. It could be
partially inlined to btrfs_init_new_device.

>  
> -	mutex_lock(&fs_devices->device_list_mutex);
>  	list_splice_init_rcu(&fs_devices->devices, &seed_devices->devices,
>  			      synchronize_rcu);
>  	list_for_each_entry(device, &seed_devices->devices, dev_list)
> @@ -2416,7 +2417,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>  	generate_random_uuid(fs_devices->fsid);
>  	memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
>  	memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
> -	mutex_unlock(&fs_devices->device_list_mutex);
>  
>  	super_flags = btrfs_super_flags(disk_super) &
>  		      ~BTRFS_SUPER_FLAG_SEEDING;
> @@ -2591,10 +2591,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  	device->dev_stats_valid = 1;
>  	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>  
> +	mutex_lock(&fs_devices->device_list_mutex);
>  	if (seeding_dev) {
>  		btrfs_clear_sb_rdonly(sb);
>  		ret = btrfs_prepare_sprout(fs_info);
>  		if (ret) {
> +			mutex_unlock(&fs_devices->device_list_mutex);
>  			btrfs_abort_transaction(trans, ret);
>  			goto error_trans;
>  		}
> @@ -2604,7 +2606,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  
>  	device->fs_devices = fs_devices;
>  
> -	mutex_lock(&fs_devices->device_list_mutex);
>  	mutex_lock(&fs_info->chunk_mutex);
>  	list_add_rcu(&device->dev_list, &fs_devices->devices);
>  	list_add(&device->dev_alloc_list, &fs_devices->alloc_list);
> -- 
> 2.31.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent
  2021-09-17 15:37   ` David Sterba
@ 2021-09-18  0:10     ` Anand Jain
  0 siblings, 0 replies; 12+ messages in thread
From: Anand Jain @ 2021-09-18  0:10 UTC (permalink / raw)
  To: dsterba, linux-btrfs, dsterba, l



On 17/09/2021 23:37, David Sterba wrote:
> On Tue, Aug 31, 2021 at 09:21:29AM +0800, Anand Jain wrote:
>> btrfs_prepare_sprout() moves seed devices into its own struct fs_devices,
>> so that its parent function btrfs_init_new_device() can add the new sprout
>> device to fs_info->fs_devices.
>>
>> Both btrfs_prepare_sprout() and btrfs_init_new_device() needs
>> device_list_mutex. But they are holding it sequentially, thus creates a
>> small window to an opportunity to race. Close this opportunity and hold
>> device_list_mutex common to both btrfs_init_new_device() and
>> btrfs_prepare_sprout().
> 
> I don't se what exactly would go wrong with the separate device list
> locking, but I see at least one potential problem with the new code.
> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> RFC because IMO the cleanup of device_list_mutex makes sense even though
>> there isn't another thread that could race potentially race as of now.
>>
>> Depends on
>>   [PATCH v2] btrfs: fix lockdep warning while mounting sprout fs
>> which removed the device_list_mutex from clone_fs_devices() otherwise
>> this patch will cause a double mutex error.
>>
>> v2: fix the missing mutex_unlock in the error return
>> v3: -
>> v4: -
>> v5: - (Except for the change in below SO comments)
>>
>>   fs/btrfs/volumes.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index fa9fe47b5b68..53ead67b625c 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2369,6 +2369,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>>   	u64 super_flags;
>>   
>>   	lockdep_assert_held(&uuid_mutex);
>> +	lockdep_assert_held(&fs_devices->device_list_mutex);
>> +
>>   	if (!fs_devices->seeding)
>>   		return -EINVAL;
>>   
>> @@ -2400,7 +2402,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>>   	INIT_LIST_HEAD(&seed_devices->alloc_list);


>>   	mutex_init(&seed_devices->device_list_mutex);

  BTW mutex_init here will go, as the sprout's private
  fs_devices::device_list_mutex is unused. It is a pending cleanup.

> A few lines before this one there's alloc_fs_devices and
> clone_fs_devices, both allocating memory. This would happen under a big
> lock as device_list_mutex also protects superblock write. This is a
> pattern to avoid.

  Oh. That's right. Thx. One way is to flag NOFS alloc.

> A rough idea would be to split btrfs_prepare_sprout into parts where the
> allocations are not done under the lock and the locked part. It could be
> partially inlined to btrfs_init_new_device.

  I think you mean something like this...

  btrfs_init_new_device()
  <snip>
    if seeding_dev
       alloc_prepare_sprout
    mutex_lock(&fs_devices->device_list_mutex);
    if seeding_dev
       finish_prepare_sprout
    <snip>
    mutex_unlock(&fs_devices->device_list_mutex);

  I am trying.
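
  For illustration, a rough C outline of that split (alloc_prepare_sprout()
  and finish_prepare_sprout() are just the placeholder names from the sketch
  above, not functions in the tree; error handling and most of the field
  copying are abbreviated):

  /* Allocating half: called under uuid_mutex, before device_list_mutex. */
  static struct btrfs_fs_devices *alloc_prepare_sprout(struct btrfs_fs_info *fs_info)
  {
          struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
          struct btrfs_fs_devices *seed_devices;

          lockdep_assert_held(&uuid_mutex);

          if (!fs_devices->seeding)
                  return ERR_PTR(-EINVAL);

          /* Everything that allocates memory stays outside the lock. */
          seed_devices = alloc_fs_devices(NULL, NULL);
          if (IS_ERR(seed_devices))
                  return seed_devices;

          /* clone_fs_devices(fs_devices), INIT_LIST_HEAD()s, mutex_init(), ... */

          return seed_devices;
  }

  /* Locked half: only list/field manipulation, device_list_mutex held. */
  static void finish_prepare_sprout(struct btrfs_fs_info *fs_info,
                                    struct btrfs_fs_devices *seed_devices)
  {
          struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
          struct btrfs_super_block *disk_super = fs_info->super_copy;
          struct btrfs_device *device;

          lockdep_assert_held(&fs_devices->device_list_mutex);

          list_splice_init_rcu(&fs_devices->devices, &seed_devices->devices,
                               synchronize_rcu);
          list_for_each_entry(device, &seed_devices->devices, dev_list)
                  device->fs_devices = seed_devices;

          /* reset the sprout's counters, link seed_devices to it, ... */

          generate_random_uuid(fs_devices->fsid);
          memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
          memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
  }

  btrfs_init_new_device() would then take device_list_mutex only around
  finish_prepare_sprout() and the list_add_rcu() that follows, as in the
  sketch above.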

Thanks, Anand

>>   
>> -	mutex_lock(&fs_devices->device_list_mutex);
>>   	list_splice_init_rcu(&fs_devices->devices, &seed_devices->devices,
>>   			      synchronize_rcu);
>>   	list_for_each_entry(device, &seed_devices->devices, dev_list)
>> @@ -2416,7 +2417,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>>   	generate_random_uuid(fs_devices->fsid);
>>   	memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
>>   	memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
>> -	mutex_unlock(&fs_devices->device_list_mutex);
>>   
>>   	super_flags = btrfs_super_flags(disk_super) &
>>   		      ~BTRFS_SUPER_FLAG_SEEDING;
>> @@ -2591,10 +2591,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>   	device->dev_stats_valid = 1;
>>   	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>>   
>> +	mutex_lock(&fs_devices->device_list_mutex);
>>   	if (seeding_dev) {
>>   		btrfs_clear_sb_rdonly(sb);
>>   		ret = btrfs_prepare_sprout(fs_info);
>>   		if (ret) {
>> +			mutex_unlock(&fs_devices->device_list_mutex);
>>   			btrfs_abort_transaction(trans, ret);
>>   			goto error_trans;
>>   		}
>> @@ -2604,7 +2606,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>   
>>   	device->fs_devices = fs_devices;
>>   
>> -	mutex_lock(&fs_devices->device_list_mutex);
>>   	mutex_lock(&fs_info->chunk_mutex);
>>   	list_add_rcu(&device->dev_list, &fs_devices->devices);
>>   	list_add(&device->dev_alloc_list, &fs_devices->alloc_list);
>> -- 
>> 2.31.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

Thread overview: 12+ messages
2021-08-31  1:21 [PATCH V5 0/2] btrfs: device_list_mutex fix lockdep warn and cleanup Anand Jain
2021-08-31  1:21 ` [PATCH V5 1/2] btrfs: fix lockdep warning while mounting sprout fs Anand Jain
2021-08-31  8:18   ` Nikolay Borisov
2021-09-02 23:51     ` Anand Jain
2021-08-31 12:37   ` Nikolay Borisov
2021-09-01  0:49   ` Su Yue
2021-09-02 15:28   ` David Sterba
2021-08-31  1:21 ` [PATCH RFC V5 2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent Anand Jain
2021-08-31 13:03   ` Nikolay Borisov
2021-09-03  3:08     ` Anand Jain
2021-09-17 15:37   ` David Sterba
2021-09-18  0:10     ` Anand Jain
