* [PATCH 0/3] Fix a few lockdep splats
From: Josef Bacik @ 2020-07-17 19:12 UTC
  To: linux-btrfs, kernel-team

We've had a longstanding lockdep splat in open_fs_devices that we've been
ignoring for some time, since it's not actually a problem.  However, this has
been masking all the other lockdep issues that we have, since lockdep takes its
ball and goes home as soon as you hit one problem.

This series fixes all the issues I found while running xfstests.  I would like
to do something saner around fs_devices in the future, but for right now let's
just get these lockdep splats gone so we can focus on more important things, as
well as get notified of any new lockdep issues that show up.  Thanks,

Josef



* [PATCH 1/3] btrfs: fix lockdep splat in open_fs_devices
From: Josef Bacik @ 2020-07-17 19:12 UTC
  To: linux-btrfs, kernel-team

There has long been a lockdep splat because we open our bdevs under
the ->device_list_mutex at mount time, which acquires the bd_mutex.
Usually this goes unnoticed, but if you use loopback devices at all, the
bd_mutex suddenly comes with a whole host of other dependencies, which
results in the splat when you mount a btrfs file system.

======================================================
WARNING: possible circular locking dependency detected
5.8.0-0.rc3.1.fc33.x86_64+debug #1 Not tainted
------------------------------------------------------
systemd-journal/509 is trying to acquire lock:
ffff970831f84db0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs]

but task is already holding lock:
ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

 -> #6 (sb_pagefaults){.+.+}-{0:0}:
       __sb_start_write+0x13e/0x220
       btrfs_page_mkwrite+0x59/0x560 [btrfs]
       do_page_mkwrite+0x4f/0x130
       do_wp_page+0x3b0/0x4f0
       handle_mm_fault+0xf47/0x1850
       do_user_addr_fault+0x1fc/0x4b0
       exc_page_fault+0x88/0x300
       asm_exc_page_fault+0x1e/0x30

 -> #5 (&mm->mmap_lock#2){++++}-{3:3}:
       __might_fault+0x60/0x80
       _copy_from_user+0x20/0xb0
       get_sg_io_hdr+0x9a/0xb0
       scsi_cmd_ioctl+0x1ea/0x2f0
       cdrom_ioctl+0x3c/0x12b4
       sr_block_ioctl+0xa4/0xd0
       block_ioctl+0x3f/0x50
       ksys_ioctl+0x82/0xc0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #4 (&cd->lock){+.+.}-{3:3}:
       __mutex_lock+0x7b/0x820
       sr_block_open+0xa2/0x180
       __blkdev_get+0xdd/0x550
       blkdev_get+0x38/0x150
       do_dentry_open+0x16b/0x3e0
       path_openat+0x3c9/0xa00
       do_filp_open+0x75/0x100
       do_sys_openat2+0x8a/0x140
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #3 (&bdev->bd_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7b/0x820
       __blkdev_get+0x6a/0x550
       blkdev_get+0x85/0x150
       blkdev_get_by_path+0x2c/0x70
       btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
       open_fs_devices+0x88/0x240 [btrfs]
       btrfs_open_devices+0x92/0xa0 [btrfs]
       btrfs_mount_root+0x250/0x490 [btrfs]
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       vfs_kern_mount.part.0+0x71/0xb0
       btrfs_mount+0x119/0x380 [btrfs]
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       do_mount+0x8c6/0xca0
       __x64_sys_mount+0x8e/0xd0
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7b/0x820
       btrfs_run_dev_stats+0x36/0x420 [btrfs]
       commit_cowonly_roots+0x91/0x2d0 [btrfs]
       btrfs_commit_transaction+0x4e6/0x9f0 [btrfs]
       btrfs_sync_file+0x38a/0x480 [btrfs]
       __x64_sys_fdatasync+0x47/0x80
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7b/0x820
       btrfs_commit_transaction+0x48e/0x9f0 [btrfs]
       btrfs_sync_file+0x38a/0x480 [btrfs]
       __x64_sys_fdatasync+0x47/0x80
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}:
       __lock_acquire+0x1241/0x20c0
       lock_acquire+0xb0/0x400
       __mutex_lock+0x7b/0x820
       btrfs_record_root_in_trans+0x44/0x70 [btrfs]
       start_transaction+0xd2/0x500 [btrfs]
       btrfs_dirty_inode+0x44/0xd0 [btrfs]
       file_update_time+0xc6/0x120
       btrfs_page_mkwrite+0xda/0x560 [btrfs]
       do_page_mkwrite+0x4f/0x130
       do_wp_page+0x3b0/0x4f0
       handle_mm_fault+0xf47/0x1850
       do_user_addr_fault+0x1fc/0x4b0
       exc_page_fault+0x88/0x300
       asm_exc_page_fault+0x1e/0x30

other info that might help us debug this:

Chain exists of:
  &fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults

Possible unsafe locking scenario:

     CPU0                    CPU1
     ----                    ----
 lock(sb_pagefaults);
                             lock(&mm->mmap_lock#2);
                             lock(sb_pagefaults);
 lock(&fs_info->reloc_mutex);

 *** DEADLOCK ***

3 locks held by systemd-journal/509:
 #0: ffff97083bdec8b8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x12e/0x4b0
 #1: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs]
 #2: ffff97083144d6a8 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3f8/0x500 [btrfs]

stack backtrace:
CPU: 0 PID: 509 Comm: systemd-journal Not tainted 5.8.0-0.rc3.1.fc33.x86_64+debug #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack+0x92/0xc8
 check_noncircular+0x134/0x150
 __lock_acquire+0x1241/0x20c0
 lock_acquire+0xb0/0x400
 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 ? lock_acquire+0xb0/0x400
 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 __mutex_lock+0x7b/0x820
 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 ? kvm_sched_clock_read+0x14/0x30
 ? sched_clock+0x5/0x10
 ? sched_clock_cpu+0xc/0xb0
 btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 start_transaction+0xd2/0x500 [btrfs]
 btrfs_dirty_inode+0x44/0xd0 [btrfs]
 file_update_time+0xc6/0x120
 btrfs_page_mkwrite+0xda/0x560 [btrfs]
 ? sched_clock+0x5/0x10
 do_page_mkwrite+0x4f/0x130
 do_wp_page+0x3b0/0x4f0
 handle_mm_fault+0xf47/0x1850
 do_user_addr_fault+0x1fc/0x4b0
 exc_page_fault+0x88/0x300
 ? asm_exc_page_fault+0x8/0x30
 asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7fa3972fdbfe
Code: Bad RIP value.

Fix this by not holding the ->device_list_mutex at this point.  The
device_list_mutex exists to protect us from modifying the device list
while the file system is running.

However, the list can also be modified by a device scan.  That action is
specifically protected by the uuid_mutex, which we are holding here.  We
also cannot race with another open at this point, because we hold the
->s_umount lock during the mount.  Not taking the ->device_list_mutex
here is perfectly safe, as we're not going to change the devices at this
point.
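
As an aside, the inversion in the report can be modeled in plain
userspace C.  The sketch below is illustrative only: pthread mutexes
stand in for two of the kernel locks, the intermediate #1-#5 steps of
the chain are elided, and none of the names come from the kernel
sources.

/*
 * Minimal userspace model of the AB-BA inversion lockdep reports above.
 * Illustrative only: "device_list" and "bdev" are stand-ins for
 * fs_devices->device_list_mutex and bdev->bd_mutex; the intermediate
 * locks in the chain are elided.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t device_list = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bdev = PTHREAD_MUTEX_INITIALIZER;

/* Mirrors mount: device_list taken first, bdev nested inside. */
static void *mount_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&device_list);
	pthread_mutex_lock(&bdev);	/* __blkdev_get() in the splat */
	pthread_mutex_unlock(&bdev);
	pthread_mutex_unlock(&device_list);
	return NULL;
}

/* Mirrors the ioctl/page-fault side: the chain runs the other way. */
static void *other_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&bdev);
	pthread_mutex_lock(&device_list);
	pthread_mutex_unlock(&device_list);
	pthread_mutex_unlock(&bdev);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, mount_path, NULL);
	pthread_create(&b, NULL, other_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("got lucky, no deadlock this run\n");
	return 0;
}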

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ce01e44f8134..20295441251a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -258,6 +258,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
  * may be used to exclude some operations from running concurrently without any
  * modifications to the list (see write_all_supers)
  *
+ * Is not required at mount and close times, because our device list is
+ * protected by the uuid_mutex at that point.
+ *
  * balance_mutex
  * -------------
  * protects balance structures (status, state) and context accessed from
@@ -602,6 +605,11 @@ static int btrfs_free_stale_devices(const char *path,
 	return ret;
 }
 
+/*
+ * This is only used on mount, and we are protected from competing things
+ * messing with our fs_devices by the uuid_mutex, thus we do not need the
+ * fs_devices->device_list_mutex here.
+ */
 static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 			struct btrfs_device *device, fmode_t flags,
 			void *holder)
@@ -1230,7 +1238,6 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 
 	lockdep_assert_held(&uuid_mutex);
 
-	mutex_lock(&fs_devices->device_list_mutex);
 	if (fs_devices->opened) {
 		fs_devices->opened++;
 		ret = 0;
@@ -1238,7 +1245,6 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		list_sort(NULL, &fs_devices->devices, devid_cmp);
 		ret = open_fs_devices(fs_devices, flags, holder);
 	}
-	mutex_unlock(&fs_devices->device_list_mutex);
 
 	return ret;
 }
-- 
2.24.1



* [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree
From: Josef Bacik @ 2020-07-17 19:12 UTC
  To: linux-btrfs, kernel-team

We are currently getting this lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc5+ #20 Tainted: G            E
------------------------------------------------------
mount/678048 is trying to acquire lock:
ffff9b769f15b6e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170 [btrfs]

but task is already holding lock:
ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
       __mutex_lock+0x8b/0x8f0
       btrfs_init_new_device+0x2d2/0x1240 [btrfs]
       btrfs_ioctl+0x1de/0x2d20 [btrfs]
       ksys_ioctl+0x87/0xc0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
       __lock_acquire+0x1240/0x2460
       lock_acquire+0xab/0x360
       __mutex_lock+0x8b/0x8f0
       clone_fs_devices+0x4d/0x170 [btrfs]
       btrfs_read_chunk_tree+0x330/0x800 [btrfs]
       open_ctree+0xb7c/0x18ce [btrfs]
       btrfs_mount_root.cold+0x13/0xfa [btrfs]
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       fc_mount+0xe/0x40
       vfs_kern_mount.part.0+0x71/0x90
       btrfs_mount+0x13b/0x3e0 [btrfs]
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       do_mount+0x7de/0xb30
       __x64_sys_mount+0x8e/0xd0
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->chunk_mutex);
                               lock(&fs_devs->device_list_mutex);
                               lock(&fs_info->chunk_mutex);
  lock(&fs_devs->device_list_mutex);

 *** DEADLOCK ***

3 locks held by mount/678048:
 #0: ffff9b75ff5fb0e0 (&type->s_umount_key#63/1){+.+.}-{3:3}, at: alloc_super+0xb5/0x380
 #1: ffffffffc0c2fbc8 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x54/0x800 [btrfs]
 #2: ffff9b76abdb08d0 (&fs_info->chunk_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x6a/0x800 [btrfs]

stack backtrace:
CPU: 2 PID: 678048 Comm: mount Tainted: G            E     5.8.0-rc5+ #20
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
Call Trace:
 dump_stack+0x96/0xd0
 check_noncircular+0x162/0x180
 __lock_acquire+0x1240/0x2460
 ? asm_sysvec_apic_timer_interrupt+0x12/0x20
 lock_acquire+0xab/0x360
 ? clone_fs_devices+0x4d/0x170 [btrfs]
 __mutex_lock+0x8b/0x8f0
 ? clone_fs_devices+0x4d/0x170 [btrfs]
 ? rcu_read_lock_sched_held+0x52/0x60
 ? cpumask_next+0x16/0x20
 ? module_assert_mutex_or_preempt+0x14/0x40
 ? __module_address+0x28/0xf0
 ? clone_fs_devices+0x4d/0x170 [btrfs]
 ? static_obj+0x4f/0x60
 ? lockdep_init_map_waits+0x43/0x200
 ? clone_fs_devices+0x4d/0x170 [btrfs]
 clone_fs_devices+0x4d/0x170 [btrfs]
 btrfs_read_chunk_tree+0x330/0x800 [btrfs]
 open_ctree+0xb7c/0x18ce [btrfs]
 ? super_setup_bdi_name+0x79/0xd0
 btrfs_mount_root.cold+0x13/0xfa [btrfs]
 ? vfs_parse_fs_string+0x84/0xb0
 ? rcu_read_lock_sched_held+0x52/0x60
 ? kfree+0x2b5/0x310
 legacy_get_tree+0x30/0x50
 vfs_get_tree+0x28/0xc0
 fc_mount+0xe/0x40
 vfs_kern_mount.part.0+0x71/0x90
 btrfs_mount+0x13b/0x3e0 [btrfs]
 ? cred_has_capability+0x7c/0x120
 ? rcu_read_lock_sched_held+0x52/0x60
 ? legacy_get_tree+0x30/0x50
 legacy_get_tree+0x30/0x50
 vfs_get_tree+0x28/0xc0
 do_mount+0x7de/0xb30
 ? memdup_user+0x4e/0x90
 __x64_sys_mount+0x8e/0xd0
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is because btrfs_read_chunk_tree() can come upon DEV_EXTENT items
and then read the device, which takes the device_list_mutex.  The
device_list_mutex needs to be taken before the chunk_mutex, so this is a
problem.  We only really need the chunk_mutex around adding the chunk,
so move the locking to just around read_one_chunk().

An argument could be made that we don't even need the chunk_mutex here
as it's during mount, and we are protected by various other locks.
However, we already have special rules for ->device_list_mutex, and I'd
rather not have another special case for ->chunk_mutex.
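
The shape of this fix, shrinking a coarse critical section so that
per-item work which takes other locks no longer nests inside it, is a
general pattern.  Here is a userspace sketch with assumed names
(walk_items_*, handle_one); nothing in it is the actual btrfs code:

/*
 * Sketch of the pattern used by this patch, with hypothetical names:
 * instead of holding a mutex across a whole tree walk, take it only
 * around the one step that needs it, so whatever handle_one() locks
 * no longer nests inside chunk_lock.
 */
#include <pthread.h>

static pthread_mutex_t chunk_lock = PTHREAD_MUTEX_INITIALIZER;

static void handle_one(int item)
{
	(void)item;	/* per-item work; may take other locks */
}

/* Before: everything the walk does happens under chunk_lock. */
static void walk_items_coarse(int nitems)
{
	pthread_mutex_lock(&chunk_lock);
	for (int i = 0; i < nitems; i++)
		handle_one(i);
	pthread_mutex_unlock(&chunk_lock);
}

/*
 * After: only the step that actually needs the lock takes it,
 * mirroring how the patch wraps just read_one_chunk().
 */
static void walk_items_narrow(int nitems)
{
	for (int i = 0; i < nitems; i++) {
		pthread_mutex_lock(&chunk_lock);
		handle_one(i);
		pthread_mutex_unlock(&chunk_lock);
	}
}

int main(void)
{
	walk_items_coarse(4);
	walk_items_narrow(4);
	return 0;
}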

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 20295441251a..adc7bc2a3094 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7053,7 +7053,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 	 * otherwise we don't need it.
 	 */
 	mutex_lock(&uuid_mutex);
-	mutex_lock(&fs_info->chunk_mutex);
 
 	/*
 	 * Read all device items, and then all the chunk items. All
@@ -7103,7 +7102,9 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 		} else if (found_key.type == BTRFS_CHUNK_ITEM_KEY) {
 			struct btrfs_chunk *chunk;
 			chunk = btrfs_item_ptr(leaf, slot, struct btrfs_chunk);
+			mutex_lock(&fs_info->chunk_mutex);
 			ret = read_one_chunk(&found_key, leaf, chunk);
+			mutex_unlock(&fs_info->chunk_mutex);
 			if (ret)
 				goto error;
 		}
@@ -7133,7 +7134,6 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 	}
 	ret = 0;
 error:
-	mutex_unlock(&fs_info->chunk_mutex);
 	mutex_unlock(&uuid_mutex);
 
 	btrfs_free_path(path);
-- 
2.24.1



* [PATCH 3/3] btrfs: fix lockdep splat from btrfs_dump_space_info
From: Josef Bacik @ 2020-07-17 19:12 UTC
  To: linux-btrfs, kernel-team

When running with -o enospc_debug you can get the following splat if one
of the btrfs_dump_space_info() calls trips:

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc5+ #20 Tainted: G           OE
------------------------------------------------------
dd/563090 is trying to acquire lock:
ffff9e7dbf4f1e18 (&ctl->tree_lock){+.+.}-{2:2}, at: btrfs_dump_free_space+0x2b/0xa0 [btrfs]

but task is already holding lock:
ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&cache->lock){+.+.}-{2:2}:
       _raw_spin_lock+0x25/0x30
       btrfs_add_reserved_bytes+0x3c/0x3c0 [btrfs]
       find_free_extent+0x7ef/0x13b0 [btrfs]
       btrfs_reserve_extent+0x9b/0x180 [btrfs]
       btrfs_alloc_tree_block+0xc1/0x340 [btrfs]
       alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
       __btrfs_cow_block+0x122/0x530 [btrfs]
       btrfs_cow_block+0x106/0x210 [btrfs]
       commit_cowonly_roots+0x55/0x300 [btrfs]
       btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
       sync_filesystem+0x74/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20 [btrfs]
       deactivate_locked_super+0x36/0x70
       cleanup_mnt+0x104/0x160
       task_work_run+0x5f/0x90
       __prepare_exit_to_usermode+0x1bd/0x1c0
       do_syscall_64+0x5e/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #2 (&space_info->lock){+.+.}-{2:2}:
       _raw_spin_lock+0x25/0x30
       btrfs_block_rsv_release+0x1a6/0x3f0 [btrfs]
       btrfs_inode_rsv_release+0x4f/0x170 [btrfs]
       btrfs_clear_delalloc_extent+0x155/0x480 [btrfs]
       clear_state_bit+0x81/0x1a0 [btrfs]
       __clear_extent_bit+0x25c/0x5d0 [btrfs]
       clear_extent_bit+0x15/0x20 [btrfs]
       btrfs_invalidatepage+0x2b7/0x3c0 [btrfs]
       truncate_cleanup_page+0x47/0xe0
       truncate_inode_pages_range+0x238/0x840
       truncate_pagecache+0x44/0x60
       btrfs_setattr+0x202/0x5e0 [btrfs]
       notify_change+0x33b/0x490
       do_truncate+0x76/0xd0
       path_openat+0x687/0xa10
       do_filp_open+0x91/0x100
       do_sys_openat2+0x215/0x2d0
       do_sys_open+0x44/0x80
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&tree->lock#2){+.+.}-{2:2}:
       _raw_spin_lock+0x25/0x30
       find_first_extent_bit+0x32/0x150 [btrfs]
       write_pinned_extent_entries.isra.0+0xc5/0x100 [btrfs]
       __btrfs_write_out_cache+0x172/0x480 [btrfs]
       btrfs_write_out_cache+0x7a/0xf0 [btrfs]
       btrfs_write_dirty_block_groups+0x286/0x3b0 [btrfs]
       commit_cowonly_roots+0x245/0x300 [btrfs]
       btrfs_commit_transaction+0x4ed/0xac0 [btrfs]
       close_ctree+0xf9/0x2f5 [btrfs]
       generic_shutdown_super+0x6c/0x100
       kill_anon_super+0x14/0x30
       btrfs_kill_super+0x12/0x20 [btrfs]
       deactivate_locked_super+0x36/0x70
       cleanup_mnt+0x104/0x160
       task_work_run+0x5f/0x90
       __prepare_exit_to_usermode+0x1bd/0x1c0
       do_syscall_64+0x5e/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&ctl->tree_lock){+.+.}-{2:2}:
       __lock_acquire+0x1240/0x2460
       lock_acquire+0xab/0x360
       _raw_spin_lock+0x25/0x30
       btrfs_dump_free_space+0x2b/0xa0 [btrfs]
       btrfs_dump_space_info+0xf4/0x120 [btrfs]
       btrfs_reserve_extent+0x176/0x180 [btrfs]
       __btrfs_prealloc_file_range+0x145/0x550 [btrfs]
       cache_save_setup+0x28d/0x3b0 [btrfs]
       btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
       btrfs_commit_transaction+0xcc/0xac0 [btrfs]
       btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
       btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
       btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
       btrfs_file_write_iter+0x3cf/0x610 [btrfs]
       new_sync_write+0x11e/0x1b0
       vfs_write+0x1c9/0x200
       ksys_write+0x68/0xe0
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

Chain exists of:
  &ctl->tree_lock --> &space_info->lock --> &cache->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&cache->lock);
                               lock(&space_info->lock);
                               lock(&cache->lock);
  lock(&ctl->tree_lock);

 *** DEADLOCK ***

6 locks held by dd/563090:
 #0: ffff9e7e21d18448 (sb_writers#14){.+.+}-{0:0}, at: vfs_write+0x195/0x200
 #1: ffff9e7dd0410ed8 (&sb->s_type->i_mutex_key#19){++++}-{3:3}, at: btrfs_file_write_iter+0x86/0x610 [btrfs]
 #2: ffff9e7e21d18638 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40b/0x5b0 [btrfs]
 #3: ffff9e7e1f05d688 (&cur_trans->cache_write_mutex){+.+.}-{3:3}, at: btrfs_start_dirty_block_groups+0x158/0x4f0 [btrfs]
 #4: ffff9e7e2284ddb8 (&space_info->groups_sem){++++}-{3:3}, at: btrfs_dump_space_info+0x69/0x120 [btrfs]
 #5: ffff9e7e2284d428 (&cache->lock){+.+.}-{2:2}, at: btrfs_dump_space_info+0xaa/0x120 [btrfs]

stack backtrace:
CPU: 3 PID: 563090 Comm: dd Tainted: G           OE     5.8.0-rc5+ #20
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
Call Trace:
 dump_stack+0x96/0xd0
 check_noncircular+0x162/0x180
 __lock_acquire+0x1240/0x2460
 ? wake_up_klogd.part.0+0x30/0x40
 lock_acquire+0xab/0x360
 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
 _raw_spin_lock+0x25/0x30
 ? btrfs_dump_free_space+0x2b/0xa0 [btrfs]
 btrfs_dump_free_space+0x2b/0xa0 [btrfs]
 btrfs_dump_space_info+0xf4/0x120 [btrfs]
 btrfs_reserve_extent+0x176/0x180 [btrfs]
 __btrfs_prealloc_file_range+0x145/0x550 [btrfs]
 ? btrfs_qgroup_reserve_data+0x1d/0x60 [btrfs]
 cache_save_setup+0x28d/0x3b0 [btrfs]
 btrfs_start_dirty_block_groups+0x1fc/0x4f0 [btrfs]
 btrfs_commit_transaction+0xcc/0xac0 [btrfs]
 ? start_transaction+0xe0/0x5b0 [btrfs]
 btrfs_alloc_data_chunk_ondemand+0x162/0x4c0 [btrfs]
 btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
 btrfs_buffered_write.isra.0+0x19b/0x740 [btrfs]
 ? ktime_get_coarse_real_ts64+0xa8/0xd0
 ? trace_hardirqs_on+0x1c/0xe0
 btrfs_file_write_iter+0x3cf/0x610 [btrfs]
 new_sync_write+0x11e/0x1b0
 vfs_write+0x1c9/0x200
 ksys_write+0x68/0xe0
 do_syscall_64+0x52/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This is because we're holding the block_group->lock while trying to dump
the free space cache.  We don't actually need the lock for the dump
itself, only for reading the values for the printk, so move the free
space cache dumping outside of the block group lock.
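
The resulting shape, printing the cheap values while the lock is held
and only then calling into the dump path that takes its own lock, looks
roughly like the userspace sketch below (hypothetical struct and
function names standing in for the btrfs types):

/*
 * Userspace sketch of the fix, with hypothetical names: print the
 * counters while cache->lock is held, but call the free-space dump
 * (which takes its own tree-style lock) only after unlocking, so the
 * two locks never nest.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdint.h>

struct cache_stats {
	pthread_mutex_t lock;
	uint64_t start, length, used, pinned, reserved;
};

static void dump_free_space(struct cache_stats *cache)
{
	(void)cache;	/* would take its own tree lock internally */
}

static void dump_space_info(struct cache_stats *cache)
{
	pthread_mutex_lock(&cache->lock);
	printf("block group %ju has %ju bytes, %ju used %ju pinned %ju reserved\n",
	       (uintmax_t)cache->start, (uintmax_t)cache->length,
	       (uintmax_t)cache->used, (uintmax_t)cache->pinned,
	       (uintmax_t)cache->reserved);
	pthread_mutex_unlock(&cache->lock);
	dump_free_space(cache);	/* now outside cache->lock */
}

int main(void)
{
	struct cache_stats c = { .lock = PTHREAD_MUTEX_INITIALIZER };

	dump_space_info(&c);
	return 0;
}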

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/space-info.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index c7bd3fdd7792..475968ccbd1d 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -468,8 +468,8 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
 			"block group %llu has %llu bytes, %llu used %llu pinned %llu reserved %s",
 			cache->start, cache->length, cache->used, cache->pinned,
 			cache->reserved, cache->ro ? "[readonly]" : "");
-		btrfs_dump_free_space(cache, bytes);
 		spin_unlock(&cache->lock);
+		btrfs_dump_free_space(cache, bytes);
 	}
 	if (++index < BTRFS_NR_RAID_TYPES)
 		goto again;
-- 
2.24.1



* Re: [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree
From: Anand Jain @ 2020-07-20  7:23 UTC
  To: Josef Bacik, linux-btrfs, kernel-team

On 18/7/20 3:12 am, Josef Bacik wrote:
> We are currently getting this lockdep splat
[...]
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

  Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand


* Re: [PATCH 0/3] Fix a few lockdep splats
From: David Sterba @ 2020-07-21 10:00 UTC
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Jul 17, 2020 at 03:12:26PM -0400, Josef Bacik wrote:
> We've had a longstanding lockdep splat in open_fs_devices that we've been
> ignoring for some time, since it's not actually a problem.  However, this has
> been masking all the other lockdep issues that we have, since lockdep takes its
> ball and goes home as soon as you hit one problem.

That lockdep does not report anything past the first warning makes it
hard to test; I got an unrelated splat with btrfs/078, so I can't
actually verify that the fstests run is clean with your patchset.


* Re: [PATCH 1/3] btrfs: fix lockdep splat in open_fs_devices
From: David Sterba @ 2020-07-22 12:57 UTC
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Jul 17, 2020 at 03:12:27PM -0400, Josef Bacik wrote:
> Fix this by not holding the ->device_list_mutex at this point.  The
> device_list_mutex exists to protect us from modifying the device list
> while the file system is running.
> 
> However, the list can also be modified by a device scan.  That action is
> specifically protected by the uuid_mutex, which we are holding here.  We
> also cannot race with another open at this point, because we hold the
> ->s_umount lock during the mount.  Not taking the ->device_list_mutex
> here is perfectly safe, as we're not going to change the devices at this
> point.

Agreed, the uuid_mutex is sufficient here, since 81ffd56b574 ("btrfs:
fix mount and ioctl device scan ioctl race") that excludes the critical
parts of mount and scan.

> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/volumes.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ce01e44f8134..20295441251a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -258,6 +258,9 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>   * may be used to exclude some operations from running concurrently without any
>   * modifications to the list (see write_all_supers)
>   *
> + * Is not required at mount and close times, because our device list is
> + * protected by the uuid_mutex at that point.

This is correct, however there's one comment a few lines above about the
uuid_mutex:

  "does not protect: manipulation of the fs_devices::devices list!"

so I'll update it to say 'not in general, but there are exceptions like
the mount context'.

> + *
>   * balance_mutex
>   * -------------
>   * protects balance structures (status, state) and context accessed from
> @@ -602,6 +605,11 @@ static int btrfs_free_stale_devices(const char *path,
>  	return ret;
>  }
>  
> +/*
> + * This is only used on mount, and we are protected from competing things
> + * messing with our fs_devices by the uuid_mutex, thus we do not need the
> + * fs_devices->device_list_mutex here.
> + */
>  static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>  			struct btrfs_device *device, fmode_t flags,
>  			void *holder)
> @@ -1230,7 +1238,6 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
>  
>  	lockdep_assert_held(&uuid_mutex);
>  
> -	mutex_lock(&fs_devices->device_list_mutex);

I'll leave a comment here as the device list is clearly modified
(list_sort).

>  	if (fs_devices->opened) {
>  		fs_devices->opened++;
>  		ret = 0;
> @@ -1238,7 +1245,6 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
>  		list_sort(NULL, &fs_devices->devices, devid_cmp);
>  		ret = open_fs_devices(fs_devices, flags, holder);
>  	}
> -	mutex_unlock(&fs_devices->device_list_mutex);
>  
>  	return ret;
>  }


* Re: [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree
From: David Sterba @ 2020-07-22 13:36 UTC
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Jul 17, 2020 at 03:12:28PM -0400, Josef Bacik wrote:
> We are currently getting this lockdep splat

So this is the btrfs/161 report.


* Re: [PATCH 2/3] btrfs: move the chunk_mutex in btrfs_read_chunk_tree
From: David Sterba @ 2020-07-22 13:47 UTC
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Jul 17, 2020 at 03:12:28PM -0400, Josef Bacik wrote:
> An argument could be made that we don't even need the chunk_mutex here
> as it's during mount, and we are protected by various other locks.
> However, we already have special rules for ->device_list_mutex, and I'd
> rather not have another special case for ->chunk_mutex.

Yeah, we can keep the lock there, e.g. for the case when some helper
asserts that the lock is held.
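
For illustration, such a helper would look roughly like the following
kernel-style fragment; the function is hypothetical and not actual btrfs
code, only the lockdep_assert_held() usage mirrors what volumes.c
already does (see btrfs_open_devices above):

/*
 * Hypothetical helper: keeping chunk_mutex held in btrfs_read_chunk_tree()
 * lets an assertion like this one keep holding.  Not actual btrfs code.
 */
static int add_one_chunk(struct btrfs_fs_info *fs_info)
{
	lockdep_assert_held(&fs_info->chunk_mutex);
	/* ... insert the chunk mapping ... */
	return 0;
}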


* Re: [PATCH 0/3] Fix a few lockdep splats
From: David Sterba @ 2020-07-22 14:02 UTC
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Jul 17, 2020 at 03:12:26PM -0400, Josef Bacik wrote:
> We've had a longstanding lockdep splat in open_fs_devices that we've been
> ignoring for some time, since it's not actually a problem.  However, this has
> been masking all the other lockdep issues that we have, since lockdep takes its
> ball and goes home as soon as you hit one problem.

Added to misc-next, thanks.  There should be about 4 lockdep fixes;
let's see which higher-numbered tests were obscured by lockdep's
first-report-only behaviour.

