* [RFC] btrfs: volumes: Drop chunk_mutex lock/unlock on btrfs_read_chunk_tree
@ 2020-07-14 17:21 Marcos Paulo de Souza
2020-07-14 17:43 ` Nikolay Borisov
2020-07-15 13:14 ` David Sterba
0 siblings, 2 replies; 4+ messages in thread
From: Marcos Paulo de Souza @ 2020-07-14 17:21 UTC (permalink / raw)
To: dsterba, linux-btrfs; +Cc: Marcos Paulo de Souza, Nokolay Borisov
From: Marcos Paulo de Souza <mpdesouza@suse.com>
There is a lockdep report when running xfstests btrfs/161 (seed + sprout
devices) related to chunk_mutex:
======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc4-default #504 Not tainted
------------------------------------------------------
mount/454 is trying to acquire lock:
ffff8881133058e8 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x340
but task is already holding lock:
ffff888112010988 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0xcc/0x600
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
__mutex_lock+0x139/0x13e0
btrfs_init_new_device+0x1ed1/0x3c50
btrfs_ioctl+0x19b4/0x8130
ksys_ioctl+0xa9/0xf0
__x64_sys_ioctl+0x6f/0xb0
do_syscall_64+0x50/0xd0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
__lock_acquire+0x2df6/0x4da0
lock_acquire+0x176/0x820
__mutex_lock+0x139/0x13e0
clone_fs_devices+0x4d/0x340
read_one_dev+0xa5c/0x13e0
btrfs_read_chunk_tree+0x480/0x600
open_ctree+0x244b/0x450d
btrfs_mount_root.cold.80+0x10/0x129
legacy_get_tree+0xff/0x1f0
vfs_get_tree+0x83/0x250
fc_mount+0xf/0x90
vfs_kern_mount.part.47+0x5c/0x80
btrfs_mount+0x215/0xbcf
legacy_get_tree+0xff/0x1f0
vfs_get_tree+0x83/0x250
do_mount+0x106d/0x16f0
__x64_sys_mount+0x162/0x1b0
do_syscall_64+0x50/0xd0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_info->chunk_mutex);
lock(&fs_devs->device_list_mutex);
lock(&fs_info->chunk_mutex);
lock(&fs_devs->device_list_mutex);
*** DEADLOCK ***
3 locks held by mount/454:
#0: ffff8881119340e8 (&type->s_umount_key#41/1){+.+.}-{3:3}, at: alloc_super.isra.21+0x183/0x990
#1: ffffffff83b260d0 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0xb6/0x600
#2: ffff888112010988 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0xcc/0x600
stack backtrace:
CPU: 3 PID: 454 Comm: mount Not tainted 5.8.0-rc4-default #504
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
dump_stack+0x9d/0xe0
check_noncircular+0x351/0x410
? print_circular_bug.isra.41+0x360/0x360
? mark_lock+0x12d/0x14d0
? rcu_read_unlock+0x40/0x40
? sched_clock+0x5/0x10
? sched_clock_cpu+0x18/0x170
? sched_clock_cpu+0x18/0x170
__lock_acquire+0x2df6/0x4da0
? lockdep_hardirqs_on_prepare+0x540/0x540
? _raw_spin_unlock_irqrestore+0x3e/0x50
? trace_hardirqs_on+0x20/0x170
? stack_depot_save+0x260/0x470
lock_acquire+0x176/0x820
? clone_fs_devices+0x4d/0x340
? btrfs_mount_root.cold.80+0x10/0x129
? rcu_read_unlock+0x40/0x40
? vfs_kern_mount.part.47+0x5c/0x80
? btrfs_mount+0x215/0xbcf
? legacy_get_tree+0xff/0x1f0
? vfs_get_tree+0x83/0x250
? do_mount+0x106d/0x16f0
? __x64_sys_mount+0x162/0x1b0
? do_syscall_64+0x50/0xd0
__mutex_lock+0x139/0x13e0
? clone_fs_devices+0x4d/0x340
? clone_fs_devices+0x4d/0x340
? mark_held_locks+0xb7/0x120
? mutex_lock_io_nested+0x12a0/0x12a0
? rcu_read_lock_sched_held+0xa1/0xd0
? lockdep_init_map_waits+0x29f/0x810
? lockdep_init_map_waits+0x29f/0x810
? debug_mutex_init+0x31/0x60
? clone_fs_devices+0x4d/0x340
clone_fs_devices+0x4d/0x340
? read_extent_buffer+0x15b/0x2a0
read_one_dev+0xa5c/0x13e0
? split_leaf+0xef0/0xef0
? read_one_chunk+0xb20/0xb20
? btrfs_get_token_32+0x730/0x730
? memcpy+0x38/0x60
? read_extent_buffer+0x15b/0x2a0
btrfs_read_chunk_tree+0x480/0x600
? btrfs_check_rw_degradable+0x340/0x340
? btrfs_root_node+0x2d/0x240
? memcpy+0x38/0x60
? read_extent_buffer+0x15b/0x2a0
? btrfs_root_node+0x2d/0x240
open_ctree+0x244b/0x450d
? close_ctree+0x61c/0x61c
? sget+0x328/0x400
btrfs_mount_root.cold.80+0x10/0x129
? parse_rescue_options+0x260/0x260
? rcu_read_lock_sched_held+0xa1/0xd0
? rcu_read_lock_bh_held+0xb0/0xb0
? kfree+0x2e2/0x330
? parse_rescue_options+0x260/0x260
legacy_get_tree+0xff/0x1f0
vfs_get_tree+0x83/0x250
fc_mount+0xf/0x90
vfs_kern_mount.part.47+0x5c/0x80
btrfs_mount+0x215/0xbcf
? check_object+0xb3/0x2c0
? btrfs_get_subvol_name_from_objectid+0x7f0/0x7f0
? ___slab_alloc+0x4e5/0x570
? vfs_parse_fs_string+0xc0/0x100
? rcu_read_lock_sched_held+0xa1/0xd0
? rcu_read_lock_bh_held+0xb0/0xb0
? kfree+0x2e2/0x330
? btrfs_get_subvol_name_from_objectid+0x7f0/0x7f0
? legacy_get_tree+0xff/0x1f0
legacy_get_tree+0xff/0x1f0
vfs_get_tree+0x83/0x250
do_mount+0x106d/0x16f0
? lock_downgrade+0x720/0x720
? copy_mount_string+0x20/0x20
? _copy_from_user+0xbe/0x100
? memdup_user+0x47/0x70
__x64_sys_mount+0x162/0x1b0
do_syscall_64+0x50/0xd0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f82dd1583ca
Code: Bad RIP value.
RSP: 002b:00007ffcf6935fc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000562f10678500 RCX: 00007f82dd1583ca
RDX: 0000562f1067f8e0 RSI: 0000562f1067b3f0 RDI: 0000562f106786e0
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000562f106786e0
R13: 0000562f1067f8e0 R14: 0000000000000000 R15: 00007f82dd6798a4
In volumes.c there are comments stating that chunk_mutex is used to
protect add/remove chunks, trim or add/remove devices. Since
btrfs_read_chunk_tree is only called on mount, and trim and chunk alloc
cannot happen at mount time, it's safe to remove this lock/unlock.
Also, btrfs_ioctl_{add|rm}_dev calls set BTRFS_FS_EXCL_OP, which should
be safe.
Dropping the mutex lock/unlock solves the lockdep warning.
Suggested-by: Nokolay Borisov <nborisov@suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
fs/btrfs/volumes.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c7a3d4d730a3..94cbdadd9d26 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7033,7 +7033,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
* otherwise we don't need it.
*/
mutex_lock(&uuid_mutex);
- mutex_lock(&fs_info->chunk_mutex);
/*
* Read all device items, and then all the chunk items. All
@@ -7100,7 +7099,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
}
ret = 0;
error:
- mutex_unlock(&fs_info->chunk_mutex);
mutex_unlock(&uuid_mutex);
btrfs_free_path(path);
--
2.27.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC] btrfs: volumes: Drop chunk_mutex lock/unlock on btrfs_read_chunk_tree
2020-07-14 17:21 [RFC] btrfs: volumes: Drop chunk_mutex lock/unlock on btrfs_read_chunk_tree Marcos Paulo de Souza
@ 2020-07-14 17:43 ` Nikolay Borisov
2020-07-14 18:05 ` Nikolay Borisov
2020-07-15 13:14 ` David Sterba
1 sibling, 1 reply; 4+ messages in thread
From: Nikolay Borisov @ 2020-07-14 17:43 UTC (permalink / raw)
To: Marcos Paulo de Souza, dsterba, linux-btrfs; +Cc: Marcos Paulo de Souza
On 14.07.20 г. 20:21 ч., Marcos Paulo de Souza wrote:
> From: Marcos Paulo de Souza <mpdesouza@suse.com>
>
> There is a lockdep report when running xfstests btrfs/161 (seed + sprout
> devices) related to chunk_mutex:
>
For whatever reason this submission seems garbled. In any case my
proposal was to replace the chunk mutex with FS_EXCL_OP in
btrfs_read_chunk_tree which you haven't done. You simply remove the
chunk mutex.
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.8.0-rc4-default #504 Not tainted
> ------------------------------------------------------
> mount/454 is trying to acquire lock:
> ffff8881133058e8 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x340
>
> but task is already holding lock:
> ffff888112010988 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0xcc/0x600
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
> __mutex_lock+0x139/0x13e0
> btrfs_init_new_device+0x1ed1/0x3c50
> btrfs_ioctl+0x19b4/0x8130
> ksys_ioctl+0xa9/0xf0
> __x64_sys_ioctl+0x6f/0xb0
> do_syscall_64+0x50/0xd0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
> __lock_acquire+0x2df6/0x4da0
> lock_acquire+0x176/0x820
> __mutex_lock+0x139/0x13e0
> clone_fs_devices+0x4d/0x340
> read_one_dev+0xa5c/0x13e0
> btrfs_read_chunk_tree+0x480/0x600
> open_ctree+0x244b/0x450d
> btrfs_mount_root.cold.80+0x10/0x129
> legacy_get_tree+0xff/0x1f0
> vfs_get_tree+0x83/0x250
> fc_mount+0xf/0x90
> vfs_kern_mount.part.47+0x5c/0x80
> btrfs_mount+0x215/0xbcf
> legacy_get_tree+0xff/0x1f0
> vfs_get_tree+0x83/0x250
> do_mount+0x106d/0x16f0
> __x64_sys_mount+0x162/0x1b0
> do_syscall_64+0x50/0xd0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(&fs_info->chunk_mutex);
> lock(&fs_devs->device_list_mutex);
> lock(&fs_info->chunk_mutex);
> lock(&fs_devs->device_list_mutex);
>
> *** DEADLOCK ***
>
> 3 locks held by mount/454:
> #0: ffff8881119340e8 (&type->s_umount_key#41/1){+.+.}-{3:3}, at: alloc_super.isra.21+0x183/0x990
> #1: ffffffff83b260d0 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0xb6/0x600
> #2: ffff888112010988 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0xcc/0x600
>
> stack backtrace:
> CPU: 3 PID: 454 Comm: mount Not tainted 5.8.0-rc4-default #504
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> Call Trace:
> dump_stack+0x9d/0xe0
> check_noncircular+0x351/0x410
> ? print_circular_bug.isra.41+0x360/0x360
> ? mark_lock+0x12d/0x14d0
> ? rcu_read_unlock+0x40/0x40
> ? sched_clock+0x5/0x10
> ? sched_clock_cpu+0x18/0x170
> ? sched_clock_cpu+0x18/0x170
> __lock_acquire+0x2df6/0x4da0
> ? lockdep_hardirqs_on_prepare+0x540/0x540
> ? _raw_spin_unlock_irqrestore+0x3e/0x50
> ? trace_hardirqs_on+0x20/0x170
> ? stack_depot_save+0x260/0x470
> lock_acquire+0x176/0x820
> ? clone_fs_devices+0x4d/0x340
> ? btrfs_mount_root.cold.80+0x10/0x129
> ? rcu_read_unlock+0x40/0x40
> ? vfs_kern_mount.part.47+0x5c/0x80
> ? btrfs_mount+0x215/0xbcf
> ? legacy_get_tree+0xff/0x1f0
> ? vfs_get_tree+0x83/0x250
> ? do_mount+0x106d/0x16f0
> ? __x64_sys_mount+0x162/0x1b0
> ? do_syscall_64+0x50/0xd0
> __mutex_lock+0x139/0x13e0
> ? clone_fs_devices+0x4d/0x340
> ? clone_fs_devices+0x4d/0x340
> ? mark_held_locks+0xb7/0x120
> ? mutex_lock_io_nested+0x12a0/0x12a0
> ? rcu_read_lock_sched_held+0xa1/0xd0
> ? lockdep_init_map_waits+0x29f/0x810
> ? lockdep_init_map_waits+0x29f/0x810
> ? debug_mutex_init+0x31/0x60
> ? clone_fs_devices+0x4d/0x340
> clone_fs_devices+0x4d/0x340
> ? read_extent_buffer+0x15b/0x2a0
> read_one_dev+0xa5c/0x13e0
> ? split_leaf+0xef0/0xef0
> ? read_one_chunk+0xb20/0xb20
> ? btrfs_get_token_32+0x730/0x730
> ? memcpy+0x38/0x60
> ? read_extent_buffer+0x15b/0x2a0
> btrfs_read_chunk_tree+0x480/0x600
> ? btrfs_check_rw_degradable+0x340/0x340
> ? btrfs_root_node+0x2d/0x240
> ? memcpy+0x38/0x60
> ? read_extent_buffer+0x15b/0x2a0
> ? btrfs_root_node+0x2d/0x240
> open_ctree+0x244b/0x450d
> ? close_ctree+0x61c/0x61c
> ? sget+0x328/0x400
> btrfs_mount_root.cold.80+0x10/0x129
> ? parse_rescue_options+0x260/0x260
> ? rcu_read_lock_sched_held+0xa1/0xd0
> ? rcu_read_lock_bh_held+0xb0/0xb0
> ? kfree+0x2e2/0x330
> ? parse_rescue_options+0x260/0x260
> legacy_get_tree+0xff/0x1f0
> vfs_get_tree+0x83/0x250
> fc_mount+0xf/0x90
> vfs_kern_mount.part.47+0x5c/0x80
> btrfs_mount+0x215/0xbcf
> ? check_object+0xb3/0x2c0
> ? btrfs_get_subvol_name_from_objectid+0x7f0/0x7f0
> ? ___slab_alloc+0x4e5/0x570
> ? vfs_parse_fs_string+0xc0/0x100
> ? rcu_read_lock_sched_held+0xa1/0xd0
> ? rcu_read_lock_bh_held+0xb0/0xb0
> ? kfree+0x2e2/0x330
> ? btrfs_get_subvol_name_from_objectid+0x7f0/0x7f0
> ? legacy_get_tree+0xff/0x1f0
> legacy_get_tree+0xff/0x1f0
> vfs_get_tree+0x83/0x250
> do_mount+0x106d/0x16f0
> ? lock_downgrade+0x720/0x720
> ? copy_mount_string+0x20/0x20
> ? _copy_from_user+0xbe/0x100
> ? memdup_user+0x47/0x70
> __x64_sys_mount+0x162/0x1b0
> do_syscall_64+0x50/0xd0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7f82dd1583ca
> Code: Bad RIP value.
> RSP: 002b:00007ffcf6935fc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
> RAX: ffffffffffffffda RBX: 0000562f10678500 RCX: 00007f82dd1583ca
> RDX: 0000562f1067f8e0 RSI: 0000562f1067b3f0 RDI: 0000562f106786e0
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000562f106786e0
> R13: 0000562f1067f8e0 R14: 0000000000000000 R15: 00007f82dd6798a4
>
> In volumes.c there are comments stating that chunk_mutex is used to
> protect add/remove chunks, trim or add/remove devices. Since
> btrfs_read_chunk_tree is only called on mount, and trim and chunk alloc
> cannot happen at mount time, it's safe to remove this lock/unlock.
>
> Also, btrfs_ioctl_{add|rm}_dev calls set BTRFS_FS_EXCL_OP, which should
> be safe.
>
> Dropping the mutex lock/unlock solves the lockdep warning.
>
> Suggested-by: Nokolay Borisov <nborisov@suse.com>
> Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
> ---
> fs/btrfs/volumes.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index c7a3d4d730a3..94cbdadd9d26 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7033,7 +7033,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
> * otherwise we don't need it.
> */
> mutex_lock(&uuid_mutex);
> - mutex_lock(&fs_info->chunk_mutex);
>
> /*
> * Read all device items, and then all the chunk items. All
> @@ -7100,7 +7099,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
> }
> ret = 0;
> error:
> - mutex_unlock(&fs_info->chunk_mutex);
> mutex_unlock(&uuid_mutex);
>
> btrfs_free_path(path);
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] btrfs: volumes: Drop chunk_mutex lock/unlock on btrfs_read_chunk_tree
2020-07-14 17:43 ` Nikolay Borisov
@ 2020-07-14 18:05 ` Nikolay Borisov
0 siblings, 0 replies; 4+ messages in thread
From: Nikolay Borisov @ 2020-07-14 18:05 UTC (permalink / raw)
To: Marcos Paulo de Souza, dsterba, linux-btrfs; +Cc: Marcos Paulo de Souza
On 14.07.20 г. 20:43 ч., Nikolay Borisov wrote:
>
>
> On 14.07.20 г. 20:21 ч., Marcos Paulo de Souza wrote:
>> From: Marcos Paulo de Souza <mpdesouza@suse.com>
>>
>> There is a lockdep report when running xfstests btrfs/161 (seed + sprout
>> devices) related to chunk_mutex:
>>
>
> For whatever reason this submission seems garbled. In any case my
> proposal was to replace the chunk mutex with FS_EXCL_OP in
> btrfs_read_chunk_tree which you haven't done. You simply remove the
> chunk mutex.
>
Actually the mutex could possibly be removed because in order to
add/remove a device we need a well-formed mount i.e the mount process
must have been completed. Given that chunk tree read happens as part of
this I'd assume it's safe to drop it altogether.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC] btrfs: volumes: Drop chunk_mutex lock/unlock on btrfs_read_chunk_tree
2020-07-14 17:21 [RFC] btrfs: volumes: Drop chunk_mutex lock/unlock on btrfs_read_chunk_tree Marcos Paulo de Souza
2020-07-14 17:43 ` Nikolay Borisov
@ 2020-07-15 13:14 ` David Sterba
1 sibling, 0 replies; 4+ messages in thread
From: David Sterba @ 2020-07-15 13:14 UTC (permalink / raw)
To: Marcos Paulo de Souza
Cc: dsterba, linux-btrfs, Marcos Paulo de Souza, Nokolay Borisov
On Tue, Jul 14, 2020 at 02:21:53PM -0300, Marcos Paulo de Souza wrote:
> From: Marcos Paulo de Souza <mpdesouza@suse.com>
> In volumes.c there are comments stating that chunk_mutex is used to
> protect add/remove chunks, trim or add/remove devices. Since
> btrfs_read_chunk_tree is only called on mount, and trim and chunk alloc
> cannot happen at mount time, it's safe to remove this lock/unlock.
>
> Also, btrfs_ioctl_{add|rm}_dev calls set BTRFS_FS_EXCL_OP, which should
> be safe.
I've analyzed it in
https://lore.kernel.org/linux-btrfs/20200512232827.GI18421@twin.jikos.cz/
and an ioctl gets called during mount with the seeding device. I don't
see it mentioned in your changelog.
The BTRFS_FS_EXCL_OP do not affect that because there's only one the
device add at this time.
> Dropping the mutex lock/unlock solves the lockdep warning.
Yes, but the question is if this is also correct :) Right now I don't
see how the excl_op flag is relevant here.
The patch proposed in
https://lore.kernel.org/linux-btrfs/20200513194659.34493-1-anand.jain@oracle.com/
has some issues, I tried to fix them back then but for some reason I
don't remember the patch did not work as expected.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-07-15 13:14 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-14 17:21 [RFC] btrfs: volumes: Drop chunk_mutex lock/unlock on btrfs_read_chunk_tree Marcos Paulo de Souza
2020-07-14 17:43 ` Nikolay Borisov
2020-07-14 18:05 ` Nikolay Borisov
2020-07-15 13:14 ` David Sterba
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.