* [PATCH 0/3] Fix a few lockdep splats
@ 2020-07-17 19:12 Josef Bacik
2020-07-17 19:12 ` [PATCH 1/3] btrfs: fix lockdep splat in open_fs_devices Josef Bacik
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Josef Bacik @ 2020-07-17 19:12 UTC (permalink / raw)
To: linux-btrfs, kernel-team
We've had a longstanding lockdep splat in open_fs_devices that we've been
ignoring for some time, since it's not actually a problem. However this has
been masking all the other lockdep issues that we have, since lockdep takes its
ball and goes home as soon as you hit one problem.
This fixes all the issues I found while running xfstests. I would like to do
something more sane around fs_devices in the future, but for right now let's just
get these lockdep warnings gone so we can focus on more important things, as well
as get notifications on any new lockdep issues that show up. Thanks,
Josef
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] btrfs: fix lockdep splat in open_fs_devices
2020-07-17 19:12 [PATCH 0/3] Fix a few lockdep splats Josef Bacik
@ 2020-07-17 19:12 ` Josef Bacik
2020-07-22 12:57 ` David Sterba
2020-07-17 19:12 ` [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree Josef Bacik
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2020-07-17 19:12 UTC (permalink / raw)
To: linux-btrfs, kernel-team
There's a long-standing lockdep splat because we open our bdevs under
the ->device_list_mutex at mount time, which acquires the bd_mutex.
Usually this goes unnoticed, but if you use loopback devices at all,
the bd_mutex suddenly comes with a whole host of other dependencies,
which results in the splat when you mount a btrfs file system.
======================================================
WARNING: possible circular locking dependency detected
5.8.0-0.rc3.1.fc33.x86_64+debug #1 Not tainted
------------------------------------------------------
systemd-journal/509 is trying to acquire lock:
ffff970831f84db0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs]
but task is already holding lock:
ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #6 (sb_pagefaults){.+.+}-{0:0}:
__sb_start_write+0x13e/0x220
btrfs_page_mkwrite+0x59/0x560 [btrfs]
do_page_mkwrite+0x4f/0x130
do_wp_page+0x3b0/0x4f0
handle_mm_fault+0xf47/0x1850
do_user_addr_fault+0x1fc/0x4b0
exc_page_fault+0x88/0x300
asm_exc_page_fault+0x1e/0x30
-> #5 (&mm->mmap_lock#2){++++}-{3:3}:
__might_fault+0x60/0x80
_copy_from_user+0x20/0xb0
get_sg_io_hdr+0x9a/0xb0
scsi_cmd_ioctl+0x1ea/0x2f0
cdrom_ioctl+0x3c/0x12b4
sr_block_ioctl+0xa4/0xd0
block_ioctl+0x3f/0x50
ksys_ioctl+0x82/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #4 (&cd->lock){+.+.}-{3:3}:
__mutex_lock+0x7b/0x820
sr_block_open+0xa2/0x180
__blkdev_get+0xdd/0x550
blkdev_get+0x38/0x150
do_dentry_open+0x16b/0x3e0
path_openat+0x3c9/0xa00
do_filp_open+0x75/0x100
do_sys_openat2+0x8a/0x140
__x64_sys_openat+0x46/0x70
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #3 (&bdev->bd_mutex){+.+.}-{3:3}:
__mutex_lock+0x7b/0x820
__blkdev_get+0x6a/0x550
blkdev_get+0x85/0x150
blkdev_get_by_path+0x2c/0x70
btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
open_fs_devices+0x88/0x240 [btrfs]
btrfs_open_devices+0x92/0xa0 [btrfs]
btrfs_mount_root+0x250/0x490 [btrfs]
legacy_get_tree+0x30/0x50
vfs_get_tree+0x28/0xc0
vfs_kern_mount.part.0+0x71/0xb0
btrfs_mount+0x119/0x380 [btrfs]
legacy_get_tree+0x30/0x50
vfs_get_tree+0x28/0xc0
do_mount+0x8c6/0xca0
__x64_sys_mount+0x8e/0xd0
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
__mutex_lock+0x7b/0x820
btrfs_run_dev_stats+0x36/0x420 [btrfs]
commit_cowonly_roots+0x91/0x2d0 [btrfs]
btrfs_commit_transaction+0x4e6/0x9f0 [btrfs]
btrfs_sync_file+0x38a/0x480 [btrfs]
__x64_sys_fdatasync+0x47/0x80
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
__mutex_lock+0x7b/0x820
btrfs_commit_transaction+0x48e/0x9f0 [btrfs]
btrfs_sync_file+0x38a/0x480 [btrfs]
__x64_sys_fdatasync+0x47/0x80
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}:
__lock_acquire+0x1241/0x20c0
lock_acquire+0xb0/0x400
__mutex_lock+0x7b/0x820
btrfs_record_root_in_trans+0x44/0x70 [btrfs]
start_transaction+0xd2/0x500 [btrfs]
btrfs_dirty_inode+0x44/0xd0 [btrfs]
file_update_time+0xc6/0x120
btrfs_page_mkwrite+0xda/0x560 [btrfs]
do_page_mkwrite+0x4f/0x130
do_wp_page+0x3b0/0x4f0
handle_mm_fault+0xf47/0x1850
do_user_addr_fault+0x1fc/0x4b0
exc_page_fault+0x88/0x300
asm_exc_page_fault+0x1e/0x30
other info that might help us debug this:
Chain exists of:
&fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_pagefaults);
lock(&mm->mmap_lock#2);
lock(sb_pagefaults);
lock(&fs_info->reloc_mutex);
*** DEADLOCK ***
3 locks held by systemd-journal/509:
#0: ffff97083bdec8b8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x12e/0x4b0
#1: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs]
#2: ffff97083144d6a8 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3f8/0x500 [btrfs]
stack backtrace:
CPU: 0 PID: 509 Comm: systemd-journal Not tainted 5.8.0-0.rc3.1.fc33.x86_64+debug #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
dump_stack+0x92/0xc8
check_noncircular+0x134/0x150
__lock_acquire+0x1241/0x20c0
lock_acquire+0xb0/0x400
? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
? lock_acquire+0xb0/0x400
? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
__mutex_lock+0x7b/0x820
? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
? kvm_sched_clock_read+0x14/0x30
? sched_clock+0x5/0x10
? sched_clock_cpu+0xc/0xb0
btrfs_record_root_in_trans+0x44/0x70 [btrfs]
start_transaction+0xd2/0x500 [btrfs]
btrfs_dirty_inode+0x44/0xd0 [btrfs]
file_update_time+0xc6/0x120
btrfs_page_mkwrite+0xda/0x560 [btrfs]
? sched_clock+0x5/0x10
do_page_mkwrite+0x4f/0x130
do_wp_page+0x3b0/0x4f0
handle_mm_fault+0xf47/0x1850
do_user_addr_fault+0x1fc/0x4b0
exc_page_fault+0x88/0x300
? asm_exc_page_fault+0x8/0x30
asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7fa3972fdbfe
Code: Bad RIP value.
Fix this by not holding the ->device_list_mutex at this point. The
device_list_mutex exists to protect us from modifying the device list
while the file system is running.
However, the list can also be modified by doing a scan on a device. That
action is specifically protected by the uuid_mutex, which we are holding
here. We cannot race with opening at this point because we have the
->s_umount lock held during the mount. Not holding the
->device_list_mutex here is perfectly safe as we're not going to change
the devices at this point.
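The resulting locking rule, uuid_mutex alone covering the open path, can be
sketched as a userspace miniature (pthread mutexes standing in for the kernel
locks; all names below are illustrative stand-ins, not the real btrfs API):

```c
#include <assert.h>
#include <pthread.h>

/* The device list is only touched at mount time under uuid_mutex, so
 * the open path takes no inner device_list_mutex at all. */
static pthread_mutex_t uuid_mutex = PTHREAD_MUTEX_INITIALIZER;

struct fs_devices {
	int opened;	/* reference count of opens */
};

/* Caller must hold uuid_mutex; mirrors the shape of the patched
 * btrfs_open_devices() without the removed device_list_mutex pair. */
static int open_devices(struct fs_devices *fs_devices)
{
	if (fs_devices->opened) {
		fs_devices->opened++;
		return 0;
	}
	/* stand-in for sorting the list and calling open_fs_devices() */
	fs_devices->opened = 1;
	return 0;
}

int mount_open(struct fs_devices *fs_devices)
{
	int ret;

	pthread_mutex_lock(&uuid_mutex);
	ret = open_devices(fs_devices);
	pthread_mutex_unlock(&uuid_mutex);
	return ret;
}
```

Since bd_mutex is then never acquired while device_list_mutex is held, the #3
link in the dependency chain above disappears.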
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/volumes.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ce01e44f8134..20295441251a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -258,6 +258,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
* may be used to exclude some operations from running concurrently without any
* modifications to the list (see write_all_supers)
*
+ * Is not required at mount and close times, because our device list is
+ * protected by the uuid_mutex at that point.
+ *
* balance_mutex
* -------------
* protects balance structures (status, state) and context accessed from
@@ -602,6 +605,11 @@ static int btrfs_free_stale_devices(const char *path,
return ret;
}
+/*
+ * This is only used on mount, and we are protected from competing things
+ * messing with our fs_devices by the uuid_mutex, thus we do not need the
+ * fs_devices->device_list_mutex here.
+ */
static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
struct btrfs_device *device, fmode_t flags,
void *holder)
@@ -1230,7 +1238,6 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
lockdep_assert_held(&uuid_mutex);
- mutex_lock(&fs_devices->device_list_mutex);
if (fs_devices->opened) {
fs_devices->opened++;
ret = 0;
@@ -1238,7 +1245,6 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
list_sort(NULL, &fs_devices->devices, devid_cmp);
ret = open_fs_devices(fs_devices, flags, holder);
}
- mutex_unlock(&fs_devices->device_list_mutex);
return ret;
}
--
2.24.1
* [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree
2020-07-17 19:12 [PATCH 0/3] Fix a few lockdep splats Josef Bacik
2020-07-17 19:12 ` [PATCH 1/3] btrfs: fix lockdep splat in open_fs_devices Josef Bacik
@ 2020-07-17 19:12 ` Josef Bacik
2020-07-20 7:23 ` Anand Jain
` (2 more replies)
2020-07-17 19:12 ` [PATCH 3/3] btrfs: fix lockdep splat from btrfs_dump_space_info Josef Bacik
` (2 subsequent siblings)
4 siblings, 3 replies; 10+ messages in thread
From: Josef Bacik @ 2020-07-17 19:12 UTC (permalink / raw)
To: linux-btrfs, kernel-team
We are currently getting this lockdep splat
======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc5+ #20 Tainted: G E
------------------------------------------------------
mount/678048 is trying to acquire lock:
ffff9b769f15b6e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170 [btrfs]
but task is already holding lock:
ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
__mutex_lock+0x8b/0x8f0
btrfs_init_new_device+0x2d2/0x1240 [btrfs]
btrfs_ioctl+0x1de/0x2d20 [btrfs]
ksys_ioctl+0x87/0xc0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
__lock_acquire+0x1240/0x2460
lock_acquire+0xab/0x360
__mutex_lock+0x8b/0x8f0
clone_fs_devices+0x4d/0x170 [btrfs]
btrfs_read_chunk_tree+0x330/0x800 [btrfs]
open_ctree+0xb7c/0x18ce [btrfs]
btrfs_mount_root.cold+0x13/0xfa [btrfs]
legacy_get_tree+0x30/0x50
vfs_get_tree+0x28/0xc0
fc_mount+0xe/0x40
vfs_kern_mount.part.0+0x71/0x90
btrfs_mount+0x13b/0x3e0 [btrfs]
legacy_get_tree+0x30/0x50
vfs_get_tree+0x28/0xc0
do_mount+0x7de/0xb30
__x64_sys_mount+0x8e/0xd0
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fs_info->chunk_mutex);
lock(&fs_devs->device_list_mutex);
lock(&fs_info->chunk_mutex);
lock(&fs_devs->device_list_mutex);
*** DEADLOCK ***
3 locks held by mount/678048:
#0: ffff9b75ff5fb0e0 (&type->s_umount_key#63/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380
#1: ffffffffc0c2fbc8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x54/0x800 [btrfs]
#2: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs]
stack backtrace:
CPU: 2 PID: 678048 Comm: mount Tainted: G E 5.8.0-rc5+ #20
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
Call Trace:
dump_stack+0x96/0xd0
check_noncircular+0x162/0x180
__lock_acquire+0x1240/0x2460
? asm_sysvec_apic_timer_interrupt+0x12/0x20
lock_acquire+0xab/0x360
? clone_fs_devices+0x4d/0x170 [btrfs]
__mutex_lock+0x8b/0x8f0
? clone_fs_devices+0x4d/0x170 [btrfs]
? rcu_read_lock_sched_held+0x52/0x60
? cpumask_next+0x16/0x20
? module_assert_mutex_or_preempt+0x14/0x40
? __module_address+0x28/0xf0
? clone_fs_devices+0x4d/0x170 [btrfs]
? static_obj+0x4f/0x60
? lockdep_init_map_waits+0x43/0x200
? clone_fs_devices+0x4d/0x170 [btrfs]
clone_fs_devices+0x4d/0x170 [btrfs]
btrfs_read_chunk_tree+0x330/0x800 [btrfs]
open_ctree+0xb7c/0x18ce [btrfs]
? super_setup_bdi_name+0x79/0xd0
btrfs_mount_root.cold+0x13/0xfa [btrfs]
? vfs_parse_fs_string+0x84/0xb0
? rcu_read_lock_sched_held+0x52/0x60
? kfree+0x2b5/0x310
legacy_get_tree+0x30/0x50
vfs_get_tree+0x28/0xc0
fc_mount+0xe/0x40
vfs_kern_mount.part.0+0x71/0x90
btrfs_mount+0x13b/0x3e0 [btrfs]
? cred_has_capability+0x7c/0x120
? rcu_read_lock_sched_held+0x52/0x60
? legacy_get_tree+0x30/0x50
legacy_get_tree+0x30/0x50
vfs_get_tree+0x28/0xc0
do_mount+0x7de/0xb30
? memdup_user+0x4e/0x90
__x64_sys_mount+0x8e/0xd0
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is because btrfs_read_chunk_tree() can come upon DEV_EXTENTs and
then read the device, which takes the device_list_mutex. The
device_list_mutex needs to be taken before the chunk_mutex, so this is a
problem. We only really need the chunk_mutex around adding the chunk,
so move the mutex to just around read_one_chunk().
An argument could be made that we don't even need the chunk_mutex here
as it's during mount, and we are protected by various other locks.
However we already have special rules for ->device_list_mutex, and I'd
rather not have another special case for ->chunk_mutex.
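The shape of the fix, shrinking the chunk_mutex critical section to the single
call that needs it, can be sketched as a userspace miniature (a pthread mutex
stands in for chunk_mutex, and read_one_dev()/read_one_chunk() are invented
stand-ins for the real helpers):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t chunk_mutex = PTHREAD_MUTEX_INITIALIZER;

enum item_type { DEV_ITEM, CHUNK_ITEM };

static int chunks_added;

/* In the real code this path can end up taking device_list_mutex, so
 * it must not run with chunk_mutex held. */
static int read_one_dev(void)   { return 0; }
static int read_one_chunk(void) { chunks_added++; return 0; }

int read_chunk_tree(const enum item_type *items, int n)
{
	int i, ret = 0;

	for (i = 0; i < n; i++) {
		if (items[i] == DEV_ITEM) {
			ret = read_one_dev();
		} else {
			/* lock scope narrowed to just the chunk insertion */
			pthread_mutex_lock(&chunk_mutex);
			ret = read_one_chunk();
			pthread_mutex_unlock(&chunk_mutex);
		}
		if (ret)
			break;
	}
	return ret;
}
```

With the narrowed scope, device_list_mutex is never nested inside chunk_mutex
during the tree walk, which removes the inverted ordering lockdep complained
about.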
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/volumes.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 20295441251a..adc7bc2a3094 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7053,7 +7053,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
* otherwise we don't need it.
*/
mutex_lock(&uuid_mutex);
- mutex_lock(&fs_info->chunk_mutex);
/*
* Read all device items, and then all the chunk items. All
@@ -7103,7 +7102,9 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
} else if (found_key.type == BTRFS_CHUNK_ITEM_KEY) {
struct btrfs_chunk *chunk;
chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
+ mutex_lock(&fs_info->chunk_mutex);
ret = read_one_chunk(&found_key, leaf, chunk);
+ mutex_unlock(&fs_info->chunk_mutex);
if (ret)
goto error;
}
@@ -7133,7 +7134,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
}
ret = 0;
error:
- mutex_unlock(&fs_info->chunk_mutex);
mutex_unlock(&uuid_mutex);
btrfs_free_path(path);
--
2.24.1
* [PATCH 3/3] btrfs: fix lockdep splat from btrfs_dump_space_info
2020-07-17 19:12 [PATCH 0/3] Fix a few lockdep splats Josef Bacik
2020-07-17 19:12 ` [PATCH 1/3] btrfs: fix lockdep splat in open_fs_devices Josef Bacik
2020-07-17 19:12 ` [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree Josef Bacik
@ 2020-07-17 19:12 ` Josef Bacik
2020-07-21 10:00 ` [PATCH 0/3] Fix a few lockdep splats David Sterba
2020-07-22 14:02 ` David Sterba
4 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2020-07-17 19:12 UTC (permalink / raw)
To: linux-btrfs, kernel-team
When running with -o enospc_debug you can get the following splat if one
of the btrfs_dump_space_info() calls trips
======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc5+ #20 Tainted: G OE
------------------------------------------------------
dd/563090 is trying to acquire lock:
ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs]
but task is already holding lock:
ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&cache->lock){+.+.}-{2:2}:
_raw_spin_lock+0x25/0x30
btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs]
find_free_extent+0x7ef/0x13b0 [btrfs]
btrfs_reserve_extent+0x9b/0x180 [btrfs]
btrfs_alloc_tree_block+0xc1/0x340 [btrfs]
alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
__btrfs_cow_block+0x122/0x530 [btrfs]
btrfs_cow_block+0x106/0x210 [btrfs]
commit_cowonly_roots+0x55/0x300 [btrfs]
btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
sync_filesystem+0x74/0x90
generic_shutdown_super+0x22/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x36/0x70
cleanup_mnt+0x104/0x160
task_work_run+0x5f/0x90
__prepare_exit_to_usermode+0x1bd/0x1c0
do_syscall_64+0x5e/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #2 (&space_info->lock){+.+.}-{2:2}:
_raw_spin_lock+0x25/0x30
btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs]
btrfs_inode_rsv_release+0x4f/0x170 [btrfs]
btrfs_clear_delalloc_extent+0x155/0x480 [btrfs]
clear_state_bit+0x81/0x1a0 [btrfs]
__clear_extent_bit+0x25c/0x5d0 [btrfs]
clear_extent_bit+0x15/0x20 [btrfs]
btrfs_invalidatepage+0x2b7/0x3c0 [btrfs]
truncate_cleanup_page+0x47/0xe0
truncate_inode_pages_range+0x238/0x840
truncate_pagecache+0x44/0x60
btrfs_setattr+0x202/0x5e0 [btrfs]
notify_change+0x33b/0x490
do_truncate+0x76/0xd0
path_openat+0x687/0xa10
do_filp_open+0x91/0x100
do_sys_openat2+0x215/0x2d0
do_sys_open+0x44/0x80
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #1 (&tree->lock#2){+.+.}-{2:2}:
_raw_spin_lock+0x25/0x30
find_first_extent_bit+0x32/0x150 [btrfs]
write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs]
__btrfs_write_out_cache+0x172/0x480 [btrfs]
btrfs_write_out_cache+0x7a/0xf0 [btrfs]
btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs]
commit_cowonly_roots+0x245/0x300 [btrfs]
btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
close_ctree+0xf9/0x2f5 [btrfs]
generic_shutdown_super+0x6c/0x100
kill_anon_super+0x14/0x30
btrfs_kill_super+0x12/0x20 [btrfs]
deactivate_locked_super+0x36/0x70
cleanup_mnt+0x104/0x160
task_work_run+0x5f/0x90
__prepare_exit_to_usermode+0x1bd/0x1c0
do_syscall_64+0x5e/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
-> #0 (&ctl->tree_lock){+.+.}-{2:2}:
__lock_acquire+0x1240/0x2460
lock_acquire+0xab/0x360
_raw_spin_lock+0x25/0x30
btrfs_dump_free_space+0x2b/0xa0 [btrfs]
btrfs_dump_space_info+0xf4/0x120 [btrfs]
btrfs_reserve_extent+0x176/0x180 [btrfs]
__btrfs_prealloc_file_range+0x145/0x550 [btrfs]
cache_save_setup+0x28d/0x3b0 [btrfs]
btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
btrfs_commit_transaction+0xcc/0xac0 [btrfs]
btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
btrfs_file_write_iter+0x3cf/0x610 [btrfs]
new_sync_write+0x11e/0x1b0
vfs_write+0x1c9/0x200
ksys_write+0x68/0xe0
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
other info that might help us debug this:
Chain exists of:
&ctl->tree_lock --> &space_info->lock --> &cache->lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&cache->lock);
lock(&space_info->lock);
lock(&cache->lock);
lock(&ctl->tree_lock);
*** DEADLOCK ***
6 locks held by dd/563090:
#0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200
#1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs]
#2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs]
#3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs]
#4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs]
#5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]
stack backtrace:
CPU: 3 PID: 563090 Comm: dd Tainted: G OE 5.8.0-rc5+ #20
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
Call Trace:
dump_stack+0x96/0xd0
check_noncircular+0x162/0x180
__lock_acquire+0x1240/0x2460
? wake_up_klogd.part.0+0x30/0x40
lock_acquire+0xab/0x360
? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
_raw_spin_lock+0x25/0x30
? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
btrfs_dump_free_space+0x2b/0xa0 [btrfs]
btrfs_dump_space_info+0xf4/0x120 [btrfs]
btrfs_reserve_extent+0x176/0x180 [btrfs]
__btrfs_prealloc_file_range+0x145/0x550 [btrfs]
? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs]
cache_save_setup+0x28d/0x3b0 [btrfs]
btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
btrfs_commit_transaction+0xcc/0xac0 [btrfs]
? start_transaction+0xe0/0x5b0 [btrfs]
btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
? ktime_get_coarse_real_ts64+0xa8/0xd0
? trace_hardirqs_on+0x1c/0xe0
btrfs_file_write_iter+0x3cf/0x610 [btrfs]
new_sync_write+0x11e/0x1b0
vfs_write+0x1c9/0x200
ksys_write+0x68/0xe0
do_syscall_64+0x52/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This is because we're holding the block_group->lock while trying to dump
the free space cache. However we don't need to hold this lock while
dumping; we only need it to read the values for the printk, so move the
free space cache dumping outside of the block group lock.
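The pattern is the usual one of snapshotting the fields under the lock and
doing the heavyweight work after dropping it; a userspace sketch (a pthread
mutex standing in for the kernel spinlock, with invented helper names):

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

struct block_group {
	pthread_mutex_t lock;	/* spinlock in the kernel */
	unsigned long long start, length, used;
};

static int dumped;	/* stand-in for the free-space dump side effect */

/* Would take ctl->tree_lock in the real code, so it must run without
 * the block group lock held. */
static void dump_free_space(struct block_group *bg)
{
	(void)bg;
	dumped++;
}

void dump_space_info(struct block_group *bg)
{
	unsigned long long start, length, used;

	pthread_mutex_lock(&bg->lock);
	start = bg->start;	/* snapshot the values for the printk */
	length = bg->length;
	used = bg->used;
	pthread_mutex_unlock(&bg->lock);

	printf("block group %llu has %llu bytes, %llu used\n",
	       start, length, used);
	dump_free_space(bg);	/* now outside bg->lock: no ordering issue */
}
```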
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
fs/btrfs/space-info.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c7bd3fdd7792..475968ccbd1d 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -468,8 +468,8 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s",
cache->start, cache->length, cache->used, cache->pinned,
cache->reserved, cache->ro ? "[readonly]" : "");
- btrfs_dump_free_space(cache, bytes);
spin_unlock(&cache->lock);
+ btrfs_dump_free_space(cache, bytes);
}
if (++index < BTRFS_NR_RAID_TYPES)
goto again;
--
2.24.1
* Re: [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree
2020-07-17 19:12 ` [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree Josef Bacik
@ 2020-07-20 7:23 ` Anand Jain
2020-07-22 13:36 ` David Sterba
2020-07-22 13:47 ` David Sterba
2 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2020-07-20 7:23 UTC (permalink / raw)
To: Josef Bacik, linux-btrfs, kernel-team
On 18/7/20 3:12 am, Josef Bacik wrote:
> We are currently getting this lockdep splat
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Thanks, Anand
* Re: [PATCH 0/3] Fix a few lockdep splats
2020-07-17 19:12 [PATCH 0/3] Fix a few lockdep splats Josef Bacik
` (2 preceding siblings ...)
2020-07-17 19:12 ` [PATCH 3/3] btrfs: fix lockdep splat from btrfs_dump_space_info Josef Bacik
@ 2020-07-21 10:00 ` David Sterba
2020-07-22 14:02 ` David Sterba
4 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2020-07-21 10:00 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Fri, Jul 17, 2020 at 03:12:26PM -0400, Josef Bacik wrote:
> We've had a longstanding lockdep splat in open_fs_devices that we've been
> ignoring for some time, since it's not actually a problem. However this has
> been masking all the other lockdep issues that we have, since lockdep takes its
> ball and goes home as soon as you hit one problem.
That lockdep does not report anything past the first warning makes it
hard to test; I got an unrelated splat with btrfs/078, so I can't
actually verify that the fstests run is clean with your patchset.
* Re: [PATCH 1/3] btrfs: fix lockdep splat in open_fs_devices
2020-07-17 19:12 ` [PATCH 1/3] btrfs: fix lockdep splat in open_fs_devices Josef Bacik
@ 2020-07-22 12:57 ` David Sterba
0 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2020-07-22 12:57 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Fri, Jul 17, 2020 at 03:12:27PM -0400, Josef Bacik wrote:
> Fix this by not holding the ->device_list_mutex at this point. The
> device_list_mutex exists to protect us from modifying the device list
> while the file system is running.
>
> However it can also be modified by doing a scan on a device. But this
> action is specifically protected by the uuid_mutex, which we are holding
> here. We cannot race with opening at this point because we have the
> ->s_mount lock held during the mount. Not having the
> ->device_list_mutex here is perfectly safe as we're not going to change
> the devices at this point.
Agreed, the uuid_mutex is sufficient here since commit 81ffd56b574 ("btrfs:
fix mount and ioctl device scan ioctl race"), which excludes the critical
parts of mount and scan.
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
> fs/btrfs/volumes.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ce01e44f8134..20295441251a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -258,6 +258,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
> * may be used to exclude some operations from running concurrently without any
> * modifications to the list (see write_all_supers)
> *
> + * Is not required at mount and close times, because our device list is
> + * protected by the uuid_mutex at that point.
This is correct, however there's one comment a few lines above about
uuid_mutex:
"does not protect: manipulation of the fs_devices::devices list!"
so I'll update it to mean 'not in general, but there are exceptions like
the mount context'.
> + *
> * balance_mutex
> * -------------
> * protects balance structures (status, state) and context accessed from
> @@ -602,6 +605,11 @@ static int btrfs_free_stale_devices(const char *path,
> return ret;
> }
>
> +/*
> + * This is only used on mount, and we are protected from competing things
> + * messing with our fs_devices by the uuid_mutex, thus we do not need the
> + * fs_devices->device_list_mutex here.
> + */
> static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
> struct btrfs_device *device, fmode_t flags,
> void *holder)
> @@ -1230,7 +1238,6 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
>
> lockdep_assert_held(&uuid_mutex);
>
> - mutex_lock(&fs_devices->device_list_mutex);
I'll leave a comment here as the device list is clearly modified
(list_sort).
* Re: [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree
2020-07-17 19:12 ` [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree Josef Bacik
2020-07-20 7:23 ` Anand Jain
@ 2020-07-22 13:36 ` David Sterba
2020-07-22 13:47 ` David Sterba
2 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2020-07-22 13:36 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Fri, Jul 17, 2020 at 03:12:28PM -0400, Josef Bacik wrote:
> We are currently getting this lockdep splat
So this is the btrfs/161 report.
* Re: [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree
2020-07-17 19:12 ` [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree Josef Bacik
2020-07-20 7:23 ` Anand Jain
2020-07-22 13:36 ` David Sterba
@ 2020-07-22 13:47 ` David Sterba
2 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2020-07-22 13:47 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Fri, Jul 17, 2020 at 03:12:28PM -0400, Josef Bacik wrote:
> An argument could be made that we don't even need the chunk_mutex here
> as it's during mount, and we are protected by various other locks.
> However we already have special rules for ->device_list_mutex, and I'd
> rather not have another special case for ->chunk_mutex.
Yeah, we can keep the lock there, e.g. for the case when some helper
asserts the lock is held.
* Re: [PATCH 0/3] Fix a few lockdep splats
2020-07-17 19:12 [PATCH 0/3] Fix a few lockdep splats Josef Bacik
` (3 preceding siblings ...)
2020-07-21 10:00 ` [PATCH 0/3] Fix a few lockdep splats David Sterba
@ 2020-07-22 14:02 ` David Sterba
4 siblings, 0 replies; 10+ messages in thread
From: David Sterba @ 2020-07-22 14:02 UTC (permalink / raw)
To: Josef Bacik; +Cc: linux-btrfs, kernel-team
On Fri, Jul 17, 2020 at 03:12:26PM -0400, Josef Bacik wrote:
> We've had a longstanding lockdep splat in open_fs_devices that we've been
> ignoring for some time, since it's not actually a problem. However this has
> been masking all the other lockdep issues that we have, since lockdep takes its
> ball and goes home as soon as you hit one problem.
Added to misc-next, thanks. There should be like 4 lockdep fixes, let's
see which tests with higher numbers were obscured due to the lockdep
first-only behaviour.