All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Lockdep fixes
@ 2020-08-20 15:18 Josef Bacik
  2020-08-20 15:18 ` [PATCH 1/8] btrfs: drop path before adding new uuid tree entry Josef Bacik
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Josef Bacik @ 2020-08-20 15:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

While reworking the btrfs locking to use a rwsem I discovered a few more lockdep
issues because of the new locking.  These are stand alone fixes that are safe
enough to go into 5.9, so I've broken them out from the overall locking rework.
I addressed the issues that Filipe pointed out (I think I did anyway), and run
the whole series through xfstests and saw no more lockdep messages.
 
fs/btrfs/ctree.c       |   6 +-
 fs/btrfs/dev-replace.c |   6 +-
 fs/btrfs/extent-tree.c |   2 +-
 fs/btrfs/extent_io.c   |   8 +--
 fs/btrfs/extent_io.h   |   6 +-
 fs/btrfs/ioctl.c       |  27 ++++++---
 fs/btrfs/scrub.c       | 122 +++++++++++++++++++++++------------------
 fs/btrfs/volumes.c     |  23 ++++----
 fs/btrfs/volumes.h     |   3 +
 9 files changed, 121 insertions(+), 82 deletions(-)

Thanks,

Josef

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/8] btrfs: drop path before adding new uuid tree entry
  2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
@ 2020-08-20 15:18 ` Josef Bacik
  2020-08-20 15:18 ` [PATCH 2/8] btrfs: fix potential deadlock in the search ioctl Josef Bacik
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2020-08-20 15:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

With the conversion to the rwsemaphore I got the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc7-00167-g0d7ba0c5b375-dirty #925 Not tainted
------------------------------------------------------
btrfs-uuid/7955 is trying to acquire lock:
ffff88bfbafec0f8 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

but task is already holding lock:
ffff88bfbafef2a8 (btrfs-uuid-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (btrfs-uuid-00){++++}-{3:3}:
       down_read_nested+0x3e/0x140
       __btrfs_tree_read_lock+0x39/0x180
       __btrfs_read_lock_root_node+0x3a/0x50
       btrfs_search_slot+0x4bd/0x990
       btrfs_uuid_tree_add+0x89/0x2d0
       btrfs_uuid_scan_kthread+0x330/0x390
       kthread+0x133/0x150
       ret_from_fork+0x1f/0x30

-> #0 (btrfs-root-00){++++}-{3:3}:
       __lock_acquire+0x1272/0x2310
       lock_acquire+0x9e/0x360
       down_read_nested+0x3e/0x140
       __btrfs_tree_read_lock+0x39/0x180
       __btrfs_read_lock_root_node+0x3a/0x50
       btrfs_search_slot+0x4bd/0x990
       btrfs_find_root+0x45/0x1b0
       btrfs_read_tree_root+0x61/0x100
       btrfs_get_root_ref.part.50+0x143/0x630
       btrfs_uuid_tree_iterate+0x207/0x314
       btrfs_uuid_rescan_kthread+0x12/0x50
       kthread+0x133/0x150
       ret_from_fork+0x1f/0x30

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(btrfs-uuid-00);
                               lock(btrfs-root-00);
                               lock(btrfs-uuid-00);
  lock(btrfs-root-00);

 *** DEADLOCK ***

1 lock held by btrfs-uuid/7955:
 #0: ffff88bfbafef2a8 (btrfs-uuid-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

stack backtrace:
CPU: 73 PID: 7955 Comm: btrfs-uuid Kdump: loaded Not tainted 5.8.0-rc7-00167-g0d7ba0c5b375-dirty #925
Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
Call Trace:
 dump_stack+0x78/0xa0
 check_noncircular+0x165/0x180
 __lock_acquire+0x1272/0x2310
 lock_acquire+0x9e/0x360
 ? __btrfs_tree_read_lock+0x39/0x180
 ? btrfs_root_node+0x1c/0x1d0
 down_read_nested+0x3e/0x140
 ? __btrfs_tree_read_lock+0x39/0x180
 __btrfs_tree_read_lock+0x39/0x180
 __btrfs_read_lock_root_node+0x3a/0x50
 btrfs_search_slot+0x4bd/0x990
 btrfs_find_root+0x45/0x1b0
 btrfs_read_tree_root+0x61/0x100
 btrfs_get_root_ref.part.50+0x143/0x630
 btrfs_uuid_tree_iterate+0x207/0x314
 ? btree_readpage+0x20/0x20
 btrfs_uuid_rescan_kthread+0x12/0x50
 kthread+0x133/0x150
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x1f/0x30

This problem exists because we have two different rescan threads,
btrfs_uuid_scan_kthread which creates the uuid tree, and
btrfs_uuid_tree_iterate that goes through and updates or deletes any out
of date roots.  The problem is they both do things in different order.
btrfs_uuid_scan_kthread() reads the tree_root, and then inserts entries
into the uuid_root.  btrfs_uuid_tree_iterate() scans the uuid_root, but
then does a btrfs_get_fs_root() which can read from the tree_root.

It's actually easy enough to not be holding the path in
btrfs_uuid_scan_kthread() when we add a uuid entry, as we already drop
it further down and re-start the search when we loop.  So simply move
the path release before we add our entry to the uuid tree.

This also fixes a problem where we're holding a path open after we do
btrfs_end_transaction(), which has it's own problems.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1d5026fdfc75..756ac903a19f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4465,6 +4465,7 @@ int btrfs_uuid_scan_kthread(void *data)
 			goto skip;
 		}
 update_tree:
+		btrfs_release_path(path);
 		if (!btrfs_is_empty_uuid(root_item.uuid)) {
 			ret = btrfs_uuid_tree_add(trans, root_item.uuid,
 						  BTRFS_UUID_KEY_SUBVOL,
@@ -4489,6 +4490,7 @@ int btrfs_uuid_scan_kthread(void *data)
 		}
 
 skip:
+		btrfs_release_path(path);
 		if (trans) {
 			ret = btrfs_end_transaction(trans);
 			trans = NULL;
@@ -4496,7 +4498,6 @@ int btrfs_uuid_scan_kthread(void *data)
 				break;
 		}
 
-		btrfs_release_path(path);
 		if (key.offset < (u64)-1) {
 			key.offset++;
 		} else if (key.type < BTRFS_ROOT_ITEM_KEY) {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/8] btrfs: fix potential deadlock in the search ioctl
  2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
  2020-08-20 15:18 ` [PATCH 1/8] btrfs: drop path before adding new uuid tree entry Josef Bacik
@ 2020-08-20 15:18 ` Josef Bacik
  2020-08-20 15:18 ` [PATCH 3/8] btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing Josef Bacik
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2020-08-20 15:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When I converted the locking over to using a rwsem I got the following
lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc7-00165-g04ec4da5f45f-dirty #922 Not tainted
------------------------------------------------------
compsize/11122 is trying to acquire lock:
ffff889fabca8768 (&mm->mmap_lock#2){++++}-{3:3}, at: __might_fault+0x3e/0x90

but task is already holding lock:
ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (btrfs-fs-00){++++}-{3:3}:
       down_write_nested+0x3b/0x70
       __btrfs_tree_lock+0x24/0x120
       btrfs_search_slot+0x756/0x990
       btrfs_lookup_inode+0x3a/0xb4
       __btrfs_update_delayed_inode+0x93/0x270
       btrfs_async_run_delayed_root+0x168/0x230
       btrfs_work_helper+0xd4/0x570
       process_one_work+0x2ad/0x5f0
       worker_thread+0x3a/0x3d0
       kthread+0x133/0x150
       ret_from_fork+0x1f/0x30

-> #1 (&delayed_node->mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       btrfs_delayed_update_inode+0x50/0x440
       btrfs_update_inode+0x8a/0xf0
       btrfs_dirty_inode+0x5b/0xd0
       touch_atime+0xa1/0xd0
       btrfs_file_mmap+0x3f/0x60
       mmap_region+0x3a4/0x640
       do_mmap+0x376/0x580
       vm_mmap_pgoff+0xd5/0x120
       ksys_mmap_pgoff+0x193/0x230
       do_syscall_64+0x50/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&mm->mmap_lock#2){++++}-{3:3}:
       __lock_acquire+0x1272/0x2310
       lock_acquire+0x9e/0x360
       __might_fault+0x68/0x90
       _copy_to_user+0x1e/0x80
       copy_to_sk.isra.32+0x121/0x300
       search_ioctl+0x106/0x200
       btrfs_ioctl_tree_search_v2+0x7b/0xf0
       btrfs_ioctl+0x106f/0x30a0
       ksys_ioctl+0x83/0xc0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x50/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

Chain exists of:
  &mm->mmap_lock#2 --> &delayed_node->mutex --> btrfs-fs-00

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(btrfs-fs-00);
                               lock(&delayed_node->mutex);
                               lock(btrfs-fs-00);
  lock(&mm->mmap_lock#2);

 *** DEADLOCK ***

1 lock held by compsize/11122:
 #0: ffff889fe720fe40 (btrfs-fs-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180

stack backtrace:
CPU: 17 PID: 11122 Comm: compsize Kdump: loaded Not tainted 5.8.0-rc7-00165-g04ec4da5f45f-dirty #922
Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
Call Trace:
 dump_stack+0x78/0xa0
 check_noncircular+0x165/0x180
 __lock_acquire+0x1272/0x2310
 lock_acquire+0x9e/0x360
 ? __might_fault+0x3e/0x90
 ? find_held_lock+0x72/0x90
 __might_fault+0x68/0x90
 ? __might_fault+0x3e/0x90
 _copy_to_user+0x1e/0x80
 copy_to_sk.isra.32+0x121/0x300
 ? btrfs_search_forward+0x2a6/0x360
 search_ioctl+0x106/0x200
 btrfs_ioctl_tree_search_v2+0x7b/0xf0
 btrfs_ioctl+0x106f/0x30a0
 ? __do_sys_newfstat+0x5a/0x70
 ? ksys_ioctl+0x83/0xc0
 ksys_ioctl+0x83/0xc0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x50/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The problem is we're doing a copy_to_user() while holding tree locks,
which can deadlock if we have to do a page fault for the copy_to_user().
This exists even without my locking changes, so it needs to be fixed.
Rework the search ioctl to do the pre-fault and then
copy_to_user_nofault for the copying.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c |  8 ++++----
 fs/btrfs/extent_io.h |  6 +++---
 fs/btrfs/ioctl.c     | 27 ++++++++++++++++++++-------
 3 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8181a75e25cf..0320ddb0133e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5655,9 +5655,9 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	}
 }
 
-int read_extent_buffer_to_user(const struct extent_buffer *eb,
-			       void __user *dstv,
-			       unsigned long start, unsigned long len)
+int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
+				       void __user *dstv,
+				       unsigned long start, unsigned long len)
 {
 	size_t cur;
 	size_t offset;
@@ -5677,7 +5677,7 @@ int read_extent_buffer_to_user(const struct extent_buffer *eb,
 
 		cur = min(len, (PAGE_SIZE - offset));
 		kaddr = page_address(page);
-		if (copy_to_user(dst, kaddr + offset, cur)) {
+		if (copy_to_user_nofault(dst, kaddr + offset, cur)) {
 			ret = -EFAULT;
 			break;
 		}
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 00a88f2eb5ab..30794ae58498 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -241,9 +241,9 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 void read_extent_buffer(const struct extent_buffer *eb, void *dst,
 			unsigned long start,
 			unsigned long len);
-int read_extent_buffer_to_user(const struct extent_buffer *eb,
-			       void __user *dst, unsigned long start,
-			       unsigned long len);
+int read_extent_buffer_to_user_nofault(const struct extent_buffer *eb,
+				       void __user *dst, unsigned long start,
+				       unsigned long len);
 void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *src);
 void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
 		const void *src);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e45520d3cf51..18713900a32b 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2086,9 +2086,14 @@ static noinline int copy_to_sk(struct btrfs_path *path,
 		sh.len = item_len;
 		sh.transid = found_transid;
 
-		/* copy search result header */
-		if (copy_to_user(ubuf + *sk_offset, &sh, sizeof(sh))) {
-			ret = -EFAULT;
+		/*
+		 * copy search result header, if we fault loop again so we can
+		 * fault in the pages and -EFAULT there if there's a problem,
+		 * otherwise we'll fault and then copy the buffer in properly
+		 * this next time through
+		 */
+		if (copy_to_user_nofault(ubuf + *sk_offset, &sh, sizeof(sh))) {
+			ret = 0;
 			goto out;
 		}
 
@@ -2096,10 +2101,14 @@ static noinline int copy_to_sk(struct btrfs_path *path,
 
 		if (item_len) {
 			char __user *up = ubuf + *sk_offset;
-			/* copy the item */
-			if (read_extent_buffer_to_user(leaf, up,
-						       item_off, item_len)) {
-				ret = -EFAULT;
+			/* copy the item, same behavior as above, but reset the
+			 * sk_offset so we copy the full thing again.
+			*/
+			if (read_extent_buffer_to_user_nofault(leaf, up,
+							       item_off,
+							       item_len)) {
+				ret = 0;
+				*sk_offset -= sizeof(sh);
 				goto out;
 			}
 
@@ -2184,6 +2193,10 @@ static noinline int search_ioctl(struct inode *inode,
 	key.offset = sk->min_offset;
 
 	while (1) {
+		ret = fault_in_pages_writeable(ubuf, *buf_size - sk_offset);
+		if (ret)
+			break;
+
 		ret = btrfs_search_forward(root, &key, path, sk->min_transid);
 		if (ret != 0) {
 			if (ret > 0)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/8] btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing
  2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
  2020-08-20 15:18 ` [PATCH 1/8] btrfs: drop path before adding new uuid tree entry Josef Bacik
  2020-08-20 15:18 ` [PATCH 2/8] btrfs: fix potential deadlock in the search ioctl Josef Bacik
@ 2020-08-20 15:18 ` Josef Bacik
  2020-08-21 12:56   ` Nikolay Borisov
  2020-08-20 15:18 ` [PATCH 4/8] btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks Josef Bacik
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2020-08-20 15:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We need to move the closing of the src_device out of all the device
replace locking, but we definitely want to zero out the superblock
before we commit the last time to make sure the device is properly
removed.  Handle this by pushing btrfs_scratch_superblocks into
btrfs_dev_replace_finishing, and then later on we'll move the src_device
closing and freeing stuff where we need it to be.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/dev-replace.c |  3 +++
 fs/btrfs/volumes.c     | 12 +++---------
 fs/btrfs/volumes.h     |  3 +++
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 769ac3098880..cd240714a7af 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -745,6 +745,9 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	/* replace the sysfs entry */
 	btrfs_sysfs_remove_devices_dir(fs_info->fs_devices, src_device);
 	btrfs_sysfs_update_devid(tgt_device);
+	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &src_device->dev_state))
+		btrfs_scratch_superblocks(fs_info, src_device->bdev,
+					  src_device->name->str);
 	btrfs_rm_dev_replace_free_srcdev(src_device);
 
 	/* write back the superblocks */
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 756ac903a19f..33ecfcb472a8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1998,9 +1998,9 @@ static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
 	return num_devices;
 }
 
-static void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
-				      struct block_device *bdev,
-				      const char *device_path)
+void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
+			       struct block_device *bdev,
+			       const char *device_path)
 {
 	struct btrfs_super_block *disk_super;
 	int copy_num;
@@ -2223,12 +2223,6 @@ void btrfs_rm_dev_replace_free_srcdev(struct btrfs_device *srcdev)
 	struct btrfs_fs_info *fs_info = srcdev->fs_info;
 	struct btrfs_fs_devices *fs_devices = srcdev->fs_devices;
 
-	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &srcdev->dev_state)) {
-		/* zero out the old super if it is writable */
-		btrfs_scratch_superblocks(fs_info, srcdev->bdev,
-					  srcdev->name->str);
-	}
-
 	btrfs_close_bdev(srcdev);
 	synchronize_rcu();
 	btrfs_free_device(srcdev);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5eea93916fbf..302c9234f7d0 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -573,6 +573,9 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 					struct btrfs_device *failing_dev);
+void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
+			       struct block_device *bdev,
+			       const char *device_path);
 
 int btrfs_bg_type_to_factor(u64 flags);
 const char *btrfs_bg_type_to_raid_name(u64 flags);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/8] btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks
  2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
                   ` (2 preceding siblings ...)
  2020-08-20 15:18 ` [PATCH 3/8] btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing Josef Bacik
@ 2020-08-20 15:18 ` Josef Bacik
  2020-08-20 15:18 ` [PATCH 5/8] btrfs: do not hold device_list_mutex when closing devices Josef Bacik
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2020-08-20 15:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When closing and freeing the source device we could end up doing our
final blkdev_put() on the bdev, which will grab the bd_mutex.  As such
we want to be holding as few locks as possible, so move this call
outside of the dev_replace->lock_finishing_cancel_unmount lock.  Since
we're modifying the fs_devices we need to make sure we're holding the
uuid_mutex here, so take that as well.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/dev-replace.c | 3 ++-
 fs/btrfs/volumes.c     | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index cd240714a7af..580e60fe07d0 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -748,7 +748,6 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &src_device->dev_state))
 		btrfs_scratch_superblocks(fs_info, src_device->bdev,
 					  src_device->name->str);
-	btrfs_rm_dev_replace_free_srcdev(src_device);
 
 	/* write back the superblocks */
 	trans = btrfs_start_transaction(root, 0);
@@ -757,6 +756,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 
 	mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
 
+	btrfs_rm_dev_replace_free_srcdev(src_device);
+
 	return 0;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 33ecfcb472a8..4247ac0ebb71 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2223,6 +2223,8 @@ void btrfs_rm_dev_replace_free_srcdev(struct btrfs_device *srcdev)
 	struct btrfs_fs_info *fs_info = srcdev->fs_info;
 	struct btrfs_fs_devices *fs_devices = srcdev->fs_devices;
 
+	mutex_lock(&uuid_mutex);
+
 	btrfs_close_bdev(srcdev);
 	synchronize_rcu();
 	btrfs_free_device(srcdev);
@@ -2251,6 +2253,7 @@ void btrfs_rm_dev_replace_free_srcdev(struct btrfs_device *srcdev)
 		close_fs_devices(fs_devices);
 		free_fs_devices(fs_devices);
 	}
+	mutex_unlock(&uuid_mutex);
 }
 
 void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/8] btrfs: do not hold device_list_mutex when closing devices
  2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
                   ` (3 preceding siblings ...)
  2020-08-20 15:18 ` [PATCH 4/8] btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks Josef Bacik
@ 2020-08-20 15:18 ` Josef Bacik
  2020-08-20 15:18 ` [PATCH 6/8] btrfs: allocate scrub workqueues outside of locks Josef Bacik
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2020-08-20 15:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc7-00169-g87212851a027-dirty #929 Not tainted
------------------------------------------------------
fsstress/8739 is trying to acquire lock:
ffff88bfd0eb0c90 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x43/0x70

but task is already holding lock:
ffff88bfbd16e538 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x6a/0x4a0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #10 (sb_pagefaults){.+.+}-{0:0}:
       __sb_start_write+0x129/0x210
       btrfs_page_mkwrite+0x6a/0x4a0
       do_page_mkwrite+0x4d/0xc0
       handle_mm_fault+0x103c/0x1730
       exc_page_fault+0x340/0x660
       asm_exc_page_fault+0x1e/0x30

-> #9 (&mm->mmap_lock#2){++++}-{3:3}:
       __might_fault+0x68/0x90
       _copy_to_user+0x1e/0x80
       perf_read+0x141/0x2c0
       vfs_read+0xad/0x1b0
       ksys_read+0x5f/0xe0
       do_syscall_64+0x50/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #8 (&cpuctx_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       perf_event_init_cpu+0x88/0x150
       perf_event_init+0x1db/0x20b
       start_kernel+0x3ae/0x53c
       secondary_startup_64+0xa4/0xb0

-> #7 (pmus_lock){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       perf_event_init_cpu+0x4f/0x150
       cpuhp_invoke_callback+0xb1/0x900
       _cpu_up.constprop.26+0x9f/0x130
       cpu_up+0x7b/0xc0
       bringup_nonboot_cpus+0x4f/0x60
       smp_init+0x26/0x71
       kernel_init_freeable+0x110/0x258
       kernel_init+0xa/0x103
       ret_from_fork+0x1f/0x30

-> #6 (cpu_hotplug_lock){++++}-{0:0}:
       cpus_read_lock+0x39/0xb0
       kmem_cache_create_usercopy+0x28/0x230
       kmem_cache_create+0x12/0x20
       bioset_init+0x15e/0x2b0
       init_bio+0xa3/0xaa
       do_one_initcall+0x5a/0x2e0
       kernel_init_freeable+0x1f4/0x258
       kernel_init+0xa/0x103
       ret_from_fork+0x1f/0x30

-> #5 (bio_slab_lock){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       bioset_init+0xbc/0x2b0
       __blk_alloc_queue+0x6f/0x2d0
       blk_mq_init_queue_data+0x1b/0x70
       loop_add+0x110/0x290 [loop]
       fq_codel_tcf_block+0x12/0x20 [sch_fq_codel]
       do_one_initcall+0x5a/0x2e0
       do_init_module+0x5a/0x220
       load_module+0x2459/0x26e0
       __do_sys_finit_module+0xba/0xe0
       do_syscall_64+0x50/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #4 (loop_ctl_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       lo_open+0x18/0x50 [loop]
       __blkdev_get+0xec/0x570
       blkdev_get+0xe8/0x150
       do_dentry_open+0x167/0x410
       path_openat+0x7c9/0xa80
       do_filp_open+0x93/0x100
       do_sys_openat2+0x22a/0x2e0
       do_sys_open+0x4b/0x80
       do_syscall_64+0x50/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #3 (&bdev->bd_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       blkdev_put+0x1d/0x120
       close_fs_devices.part.31+0x84/0x130
       btrfs_close_devices+0x44/0xb0
       close_ctree+0x296/0x2b2
       generic_shutdown_super+0x69/0x100
       kill_anon_super+0xe/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x29/0x60
       cleanup_mnt+0xb8/0x140
       task_work_run+0x6d/0xb0
       __prepare_exit_to_usermode+0x1cc/0x1e0
       do_syscall_64+0x5c/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       btrfs_run_dev_stats+0x49/0x480
       commit_cowonly_roots+0xb5/0x2a0
       btrfs_commit_transaction+0x516/0xa60
       sync_filesystem+0x6b/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0xe/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x29/0x60
       cleanup_mnt+0xb8/0x140
       task_work_run+0x6d/0xb0
       __prepare_exit_to_usermode+0x1cc/0x1e0
       do_syscall_64+0x5c/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       btrfs_commit_transaction+0x4bb/0xa60
       sync_filesystem+0x6b/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0xe/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x29/0x60
       cleanup_mnt+0xb8/0x140
       task_work_run+0x6d/0xb0
       __prepare_exit_to_usermode+0x1cc/0x1e0
       do_syscall_64+0x5c/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}:
       __lock_acquire+0x1272/0x2310
       lock_acquire+0x9e/0x360
       __mutex_lock+0x9f/0x930
       btrfs_record_root_in_trans+0x43/0x70
       start_transaction+0xd1/0x5d0
       btrfs_dirty_inode+0x42/0xd0
       file_update_time+0xc8/0x110
       btrfs_page_mkwrite+0x10c/0x4a0
       do_page_mkwrite+0x4d/0xc0
       handle_mm_fault+0x103c/0x1730
       exc_page_fault+0x340/0x660
       asm_exc_page_fault+0x1e/0x30

other info that might help us debug this:

Chain exists of:
  &fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sb_pagefaults);
                               lock(&mm->mmap_lock#2);
                               lock(sb_pagefaults);
  lock(&fs_info->reloc_mutex);

 *** DEADLOCK ***

3 locks held by fsstress/8739:
 #0: ffff88bee66eeb68 (&mm->mmap_lock#2){++++}-{3:3}, at: exc_page_fault+0x173/0x660
 #1: ffff88bfbd16e538 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x6a/0x4a0
 #2: ffff88bfbd16e630 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3da/0x5d0

stack backtrace:
CPU: 17 PID: 8739 Comm: fsstress Kdump: loaded Not tainted 5.8.0-rc7-00169-g87212851a027-dirty #929
Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
Call Trace:
 dump_stack+0x78/0xa0
 check_noncircular+0x165/0x180
 __lock_acquire+0x1272/0x2310
 ? btrfs_get_alloc_profile+0x150/0x210
 lock_acquire+0x9e/0x360
 ? btrfs_record_root_in_trans+0x43/0x70
 __mutex_lock+0x9f/0x930
 ? btrfs_record_root_in_trans+0x43/0x70
 ? lock_acquire+0x9e/0x360
 ? join_transaction+0x5d/0x450
 ? find_held_lock+0x2d/0x90
 ? btrfs_record_root_in_trans+0x43/0x70
 ? join_transaction+0x3d5/0x450
 ? btrfs_record_root_in_trans+0x43/0x70
 btrfs_record_root_in_trans+0x43/0x70
 start_transaction+0xd1/0x5d0
 btrfs_dirty_inode+0x42/0xd0
 file_update_time+0xc8/0x110
 btrfs_page_mkwrite+0x10c/0x4a0
 ? handle_mm_fault+0x5e/0x1730
 do_page_mkwrite+0x4d/0xc0
 ? __do_fault+0x32/0x150
 handle_mm_fault+0x103c/0x1730
 exc_page_fault+0x340/0x660
 ? asm_exc_page_fault+0x8/0x30
 asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7faa6c9969c4

Was seen in testing.  The fix is similar to that of

  btrfs: open device without device_list_mutex

where we're holding the device_list_mutex and then grab the bd_mutex,
which pulls in a bunch of dependencies under the bd_mutex.  We only ever
call btrfs_close_devices() on mount failure or unmount, so we're save to
not have the device_list_mutex here.  We're already holding the
uuid_mutex which keeps us safe from any external modification of the
fs_devices.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4247ac0ebb71..7e9aceadf5a8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1152,14 +1152,13 @@ static int close_fs_devices(struct btrfs_fs_devices *fs_devices)
 {
 	struct btrfs_device *device, *tmp;
 
+	lockdep_assert_held(&uuid_mutex);
+
 	if (--fs_devices->opened > 0)
 		return 0;
 
-	mutex_lock(&fs_devices->device_list_mutex);
-	list_for_each_entry_safe(device, tmp, &fs_devices->devices, dev_list) {
+	list_for_each_entry_safe(device, tmp, &fs_devices->devices, dev_list)
 		btrfs_close_one_device(device);
-	}
-	mutex_unlock(&fs_devices->device_list_mutex);
 
 	WARN_ON(fs_devices->open_devices);
 	WARN_ON(fs_devices->rw_devices);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/8] btrfs: allocate scrub workqueues outside of locks
  2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
                   ` (4 preceding siblings ...)
  2020-08-20 15:18 ` [PATCH 5/8] btrfs: do not hold device_list_mutex when closing devices Josef Bacik
@ 2020-08-20 15:18 ` Josef Bacik
  2020-08-20 15:18 ` [PATCH 7/8] btrfs: set the correct lockdep class for new nodes Josef Bacik
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2020-08-20 15:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

I got the following lockdep splat while testing

======================================================
WARNING: possible circular locking dependency detected
5.8.0-rc7-00172-g021118712e59 #932 Not tainted
------------------------------------------------------
btrfs/229626 is trying to acquire lock:
ffffffff828513f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0x378/0x450

but task is already holding lock:
ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #7 (&fs_info->scrub_lock){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       btrfs_scrub_dev+0x11c/0x630
       btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
       btrfs_ioctl+0x2799/0x30a0
       ksys_ioctl+0x83/0xc0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x50/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #6 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       btrfs_run_dev_stats+0x49/0x480
       commit_cowonly_roots+0xb5/0x2a0
       btrfs_commit_transaction+0x516/0xa60
       sync_filesystem+0x6b/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0xe/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x29/0x60
       cleanup_mnt+0xb8/0x140
       task_work_run+0x6d/0xb0
       __prepare_exit_to_usermode+0x1cc/0x1e0
       do_syscall_64+0x5c/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #5 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       btrfs_commit_transaction+0x4bb/0xa60
       sync_filesystem+0x6b/0x90
       generic_shutdown_super+0x22/0x100
       kill_anon_super+0xe/0x30
       btrfs_kill_super+0x12/0x20
       deactivate_locked_super+0x29/0x60
       cleanup_mnt+0xb8/0x140
       task_work_run+0x6d/0xb0
       __prepare_exit_to_usermode+0x1cc/0x1e0
       do_syscall_64+0x5c/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #4 (&fs_info->reloc_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       btrfs_record_root_in_trans+0x43/0x70
       start_transaction+0xd1/0x5d0
       btrfs_dirty_inode+0x42/0xd0
       touch_atime+0xa1/0xd0
       btrfs_file_mmap+0x3f/0x60
       mmap_region+0x3a4/0x640
       do_mmap+0x376/0x580
       vm_mmap_pgoff+0xd5/0x120
       ksys_mmap_pgoff+0x193/0x230
       do_syscall_64+0x50/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #3 (&mm->mmap_lock#2){++++}-{3:3}:
       __might_fault+0x68/0x90
       _copy_to_user+0x1e/0x80
       perf_read+0x141/0x2c0
       vfs_read+0xad/0x1b0
       ksys_read+0x5f/0xe0
       do_syscall_64+0x50/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #2 (&cpuctx_mutex){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       perf_event_init_cpu+0x88/0x150
       perf_event_init+0x1db/0x20b
       start_kernel+0x3ae/0x53c
       secondary_startup_64+0xa4/0xb0

-> #1 (pmus_lock){+.+.}-{3:3}:
       __mutex_lock+0x9f/0x930
       perf_event_init_cpu+0x4f/0x150
       cpuhp_invoke_callback+0xb1/0x900
       _cpu_up.constprop.26+0x9f/0x130
       cpu_up+0x7b/0xc0
       bringup_nonboot_cpus+0x4f/0x60
       smp_init+0x26/0x71
       kernel_init_freeable+0x110/0x258
       kernel_init+0xa/0x103
       ret_from_fork+0x1f/0x30

-> #0 (cpu_hotplug_lock){++++}-{0:0}:
       __lock_acquire+0x1272/0x2310
       lock_acquire+0x9e/0x360
       cpus_read_lock+0x39/0xb0
       alloc_workqueue+0x378/0x450
       __btrfs_alloc_workqueue+0x15d/0x200
       btrfs_alloc_workqueue+0x51/0x160
       scrub_workers_get+0x5a/0x170
       btrfs_scrub_dev+0x18c/0x630
       btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
       btrfs_ioctl+0x2799/0x30a0
       ksys_ioctl+0x83/0xc0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x50/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

Chain exists of:
  cpu_hotplug_lock --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&fs_info->scrub_lock);
                               lock(&fs_devs->device_list_mutex);
                               lock(&fs_info->scrub_lock);
  lock(cpu_hotplug_lock);

 *** DEADLOCK ***

2 locks held by btrfs/229626:
 #0: ffff88bfe8bb86e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_scrub_dev+0xbd/0x630
 #1: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630

stack backtrace:
CPU: 15 PID: 229626 Comm: btrfs Kdump: loaded Not tainted 5.8.0-rc7-00172-g021118712e59 #932
Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
Call Trace:
 dump_stack+0x78/0xa0
 check_noncircular+0x165/0x180
 __lock_acquire+0x1272/0x2310
 lock_acquire+0x9e/0x360
 ? alloc_workqueue+0x378/0x450
 cpus_read_lock+0x39/0xb0
 ? alloc_workqueue+0x378/0x450
 alloc_workqueue+0x378/0x450
 ? rcu_read_lock_sched_held+0x52/0x80
 __btrfs_alloc_workqueue+0x15d/0x200
 btrfs_alloc_workqueue+0x51/0x160
 scrub_workers_get+0x5a/0x170
 btrfs_scrub_dev+0x18c/0x630
 ? start_transaction+0xd1/0x5d0
 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
 btrfs_ioctl+0x2799/0x30a0
 ? do_sigaction+0x102/0x250
 ? lockdep_hardirqs_on_prepare+0xca/0x160
 ? _raw_spin_unlock_irq+0x24/0x30
 ? trace_hardirqs_on+0x1c/0xe0
 ? _raw_spin_unlock_irq+0x24/0x30
 ? do_sigaction+0x102/0x250
 ? ksys_ioctl+0x83/0xc0
 ksys_ioctl+0x83/0xc0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x50/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens because we're allocating the scrub workqueues under the
scrub and device list mutex, which brings in a whole host of other
dependencies.  Fix this by moving the actual allocation outside of the
scrub lock, and then only take the lock once we're ready to actually
assign them to the fs_info.  We'll now have to cleanup the workqueues in
a few more places, so I've added a helper to do the refcount dance to
safely free the workqueues.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/scrub.c | 122 +++++++++++++++++++++++++++--------------------
 1 file changed, 70 insertions(+), 52 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 5a6cb9db512e..7f7098e3110e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3716,50 +3716,84 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 	return 0;
 }
 
+static void scrub_workers_put(struct btrfs_fs_info *fs_info)
+{
+	if (refcount_dec_and_mutex_lock(&fs_info->scrub_workers_refcnt,
+					&fs_info->scrub_lock)) {
+		struct btrfs_workqueue *scrub_workers = NULL;
+		struct btrfs_workqueue *scrub_wr_comp = NULL;
+		struct btrfs_workqueue *scrub_parity = NULL;
+
+		scrub_workers = fs_info->scrub_workers;
+		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
+		scrub_parity = fs_info->scrub_parity_workers;
+
+		fs_info->scrub_workers = NULL;
+		fs_info->scrub_wr_completion_workers = NULL;
+		fs_info->scrub_parity_workers = NULL;
+		mutex_unlock(&fs_info->scrub_lock);
+
+		btrfs_destroy_workqueue(scrub_workers);
+		btrfs_destroy_workqueue(scrub_wr_comp);
+		btrfs_destroy_workqueue(scrub_parity);
+	}
+}
+
 /*
  * get a reference count on fs_info->scrub_workers. start worker if necessary
  */
 static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 						int is_dev_replace)
 {
+	struct btrfs_workqueue *scrub_workers = NULL;
+	struct btrfs_workqueue *scrub_wr_comp = NULL;
+	struct btrfs_workqueue *scrub_parity = NULL;
 	unsigned int flags = WQ_FREEZABLE | WQ_UNBOUND;
 	int max_active = fs_info->thread_pool_size;
+	int ret = -ENOMEM;
 
-	lockdep_assert_held(&fs_info->scrub_lock);
+	if (refcount_inc_not_zero(&fs_info->scrub_workers_refcnt))
+		return 0;
 
-	if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) {
-		ASSERT(fs_info->scrub_workers == NULL);
-		fs_info->scrub_workers = btrfs_alloc_workqueue(fs_info, "scrub",
-				flags, is_dev_replace ? 1 : max_active, 4);
-		if (!fs_info->scrub_workers)
-			goto fail_scrub_workers;
-
-		ASSERT(fs_info->scrub_wr_completion_workers == NULL);
-		fs_info->scrub_wr_completion_workers =
-			btrfs_alloc_workqueue(fs_info, "scrubwrc", flags,
-					      max_active, 2);
-		if (!fs_info->scrub_wr_completion_workers)
-			goto fail_scrub_wr_completion_workers;
+	scrub_workers = btrfs_alloc_workqueue(fs_info, "scrub", flags,
+					      is_dev_replace ? 1 : max_active,
+					      4);
+	if (!scrub_workers)
+		goto fail_scrub_workers;
 
-		ASSERT(fs_info->scrub_parity_workers == NULL);
-		fs_info->scrub_parity_workers =
-			btrfs_alloc_workqueue(fs_info, "scrubparity", flags,
+	scrub_wr_comp = btrfs_alloc_workqueue(fs_info, "scrubwrc", flags,
 					      max_active, 2);
-		if (!fs_info->scrub_parity_workers)
-			goto fail_scrub_parity_workers;
+	if (!scrub_wr_comp)
+		goto fail_scrub_wr_completion_workers;
 
+	scrub_parity = btrfs_alloc_workqueue(fs_info, "scrubparity", flags,
+					     max_active, 2);
+	if (!scrub_parity)
+		goto fail_scrub_parity_workers;
+
+	mutex_lock(&fs_info->scrub_lock);
+	if (refcount_read(&fs_info->scrub_workers_refcnt) == 0) {
+		ASSERT(fs_info->scrub_workers == NULL &&
+		       fs_info->scrub_wr_completion_workers == NULL &&
+		       fs_info->scrub_parity_workers == NULL);
+		fs_info->scrub_workers = scrub_workers;
+		fs_info->scrub_wr_completion_workers = scrub_wr_comp;
+		fs_info->scrub_parity_workers = scrub_parity;
 		refcount_set(&fs_info->scrub_workers_refcnt, 1);
-	} else {
-		refcount_inc(&fs_info->scrub_workers_refcnt);
+		mutex_unlock(&fs_info->scrub_lock);
+		return 0;
 	}
-	return 0;
+	refcount_inc(&fs_info->scrub_workers_refcnt);
+	mutex_unlock(&fs_info->scrub_lock);
 
+	ret = 0;
+	btrfs_destroy_workqueue(scrub_parity);
 fail_scrub_parity_workers:
-	btrfs_destroy_workqueue(fs_info->scrub_wr_completion_workers);
+	btrfs_destroy_workqueue(scrub_wr_comp);
 fail_scrub_wr_completion_workers:
-	btrfs_destroy_workqueue(fs_info->scrub_workers);
+	btrfs_destroy_workqueue(scrub_workers);
 fail_scrub_workers:
-	return -ENOMEM;
+	return ret;
 }
 
 int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
@@ -3770,9 +3804,6 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	int ret;
 	struct btrfs_device *dev;
 	unsigned int nofs_flag;
-	struct btrfs_workqueue *scrub_workers = NULL;
-	struct btrfs_workqueue *scrub_wr_comp = NULL;
-	struct btrfs_workqueue *scrub_parity = NULL;
 
 	if (btrfs_fs_closing(fs_info))
 		return -EAGAIN;
@@ -3819,13 +3850,17 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	if (IS_ERR(sctx))
 		return PTR_ERR(sctx);
 
+	ret = scrub_workers_get(fs_info, is_dev_replace);
+	if (ret)
+		goto out_free_ctx;
+
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	dev = btrfs_find_device(fs_info->fs_devices, devid, NULL, NULL, true);
 	if (!dev || (test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) &&
 		     !is_dev_replace)) {
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 		ret = -ENODEV;
-		goto out_free_ctx;
+		goto out;
 	}
 
 	if (!is_dev_replace && !readonly &&
@@ -3834,7 +3869,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		btrfs_err_in_rcu(fs_info, "scrub: device %s is not writable",
 				rcu_str_deref(dev->name));
 		ret = -EROFS;
-		goto out_free_ctx;
+		goto out;
 	}
 
 	mutex_lock(&fs_info->scrub_lock);
@@ -3843,7 +3878,7 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		mutex_unlock(&fs_info->scrub_lock);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 		ret = -EIO;
-		goto out_free_ctx;
+		goto out;
 	}
 
 	down_read(&fs_info->dev_replace.rwsem);
@@ -3854,17 +3889,10 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		mutex_unlock(&fs_info->scrub_lock);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 		ret = -EINPROGRESS;
-		goto out_free_ctx;
+		goto out;
 	}
 	up_read(&fs_info->dev_replace.rwsem);
 
-	ret = scrub_workers_get(fs_info, is_dev_replace);
-	if (ret) {
-		mutex_unlock(&fs_info->scrub_lock);
-		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
-		goto out_free_ctx;
-	}
-
 	sctx->readonly = readonly;
 	dev->scrub_ctx = sctx;
 	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
@@ -3917,24 +3945,14 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 
 	mutex_lock(&fs_info->scrub_lock);
 	dev->scrub_ctx = NULL;
-	if (refcount_dec_and_test(&fs_info->scrub_workers_refcnt)) {
-		scrub_workers = fs_info->scrub_workers;
-		scrub_wr_comp = fs_info->scrub_wr_completion_workers;
-		scrub_parity = fs_info->scrub_parity_workers;
-
-		fs_info->scrub_workers = NULL;
-		fs_info->scrub_wr_completion_workers = NULL;
-		fs_info->scrub_parity_workers = NULL;
-	}
 	mutex_unlock(&fs_info->scrub_lock);
 
-	btrfs_destroy_workqueue(scrub_workers);
-	btrfs_destroy_workqueue(scrub_wr_comp);
-	btrfs_destroy_workqueue(scrub_parity);
+	scrub_workers_put(fs_info);
 	scrub_put_ctx(sctx);
 
 	return ret;
-
+out:
+	scrub_workers_put(fs_info);
 out_free_ctx:
 	scrub_free_ctx(sctx);
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 7/8] btrfs: set the correct lockdep class for new nodes
  2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
                   ` (5 preceding siblings ...)
  2020-08-20 15:18 ` [PATCH 6/8] btrfs: allocate scrub workqueues outside of locks Josef Bacik
@ 2020-08-20 15:18 ` Josef Bacik
  2020-08-21 13:36   ` Nikolay Borisov
  2020-08-20 15:18 ` [PATCH 8/8] btrfs: set the lockdep class for log tree extent buffers Josef Bacik
  2020-08-26 10:50 ` [PATCH 0/8] Lockdep fixes David Sterba
  8 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2020-08-20 15:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When flipping over to the rw_semaphore I noticed I'd get a lockdep splat
in replace_path(), which is weird because we're swapping the reloc root
with the actual target root.  Turns out this is because we're using the
root->root_key.objectid as the root id for the newly allocated tree
block when setting the lockdep class, however we need to be using the
actual owner of this new block, which is saved in owner.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 73973e6e8ba6..6c2373f65be2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4522,7 +4522,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 		return ERR_PTR(-EUCLEAN);
 	}
 
-	btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
+	btrfs_set_buffer_lockdep_class(owner, buf, level);
 	btrfs_tree_lock(buf);
 	btrfs_clean_tree_block(buf);
 	clear_bit(EXTENT_BUFFER_STALE, &buf->bflags);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 8/8] btrfs: set the lockdep class for log tree extent buffers
  2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
                   ` (6 preceding siblings ...)
  2020-08-20 15:18 ` [PATCH 7/8] btrfs: set the correct lockdep class for new nodes Josef Bacik
@ 2020-08-20 15:18 ` Josef Bacik
  2020-08-21 13:43   ` Nikolay Borisov
  2020-08-26 10:50 ` [PATCH 0/8] Lockdep fixes David Sterba
  8 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2020-08-20 15:18 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

These are special extent buffers that get rewound in order to lookup
the state of the tree at a specific point in time.  As such they do not
go through the normal initialization paths that set their lockdep class,
so handle them appropriately when they are created and before they are
locked.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 09a0dc690a17..e0fd8a34efcc 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1297,6 +1297,8 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
 	btrfs_tree_read_unlock_blocking(eb);
 	free_extent_buffer(eb);
 
+	btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb_rewin),
+				       eb_rewin, btrfs_header_level(eb_rewin));
 	btrfs_tree_read_lock(eb_rewin);
 	__tree_mod_log_rewind(fs_info, eb_rewin, time_seq, tm);
 	WARN_ON(btrfs_header_nritems(eb_rewin) >
@@ -1370,7 +1372,6 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
 
 	if (!eb)
 		return NULL;
-	btrfs_tree_read_lock(eb);
 	if (old_root) {
 		btrfs_set_header_bytenr(eb, eb->start);
 		btrfs_set_header_backref_rev(eb, BTRFS_MIXED_BACKREF_REV);
@@ -1378,6 +1379,9 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
 		btrfs_set_header_level(eb, old_root->level);
 		btrfs_set_header_generation(eb, old_generation);
 	}
+	btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb), eb,
+				       btrfs_header_level(eb));
+	btrfs_tree_read_lock(eb);
 	if (tm)
 		__tree_mod_log_rewind(fs_info, eb, time_seq, tm);
 	else
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/8] btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing
  2020-08-20 15:18 ` [PATCH 3/8] btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing Josef Bacik
@ 2020-08-21 12:56   ` Nikolay Borisov
  0 siblings, 0 replies; 13+ messages in thread
From: Nikolay Borisov @ 2020-08-21 12:56 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 20.08.20 г. 18:18 ч., Josef Bacik wrote:
> We need to move the closing of the src_device out of all the device
> replace locking, but we definitely want to zero out the superblock
> before we commit the last time to make sure the device is properly
> removed.  Handle this by pushing btrfs_scratch_superblocks into
> btrfs_dev_replace_finishing, and then later on we'll move the src_device
> closing and freeing stuff where we need it to be.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 7/8] btrfs: set the correct lockdep class for new nodes
  2020-08-20 15:18 ` [PATCH 7/8] btrfs: set the correct lockdep class for new nodes Josef Bacik
@ 2020-08-21 13:36   ` Nikolay Borisov
  0 siblings, 0 replies; 13+ messages in thread
From: Nikolay Borisov @ 2020-08-21 13:36 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 20.08.20 г. 18:18 ч., Josef Bacik wrote:
> When flipping over to the rw_semaphore I noticed I'd get a lockdep splat
> in replace_path(), which is weird because we're swapping the reloc root
> with the actual target root.  Turns out this is because we're using the
> root->root_key.objectid as the root id for the newly allocated tree
> block when setting the lockdep class, however we need to be using the
> actual owner of this new block, which is saved in owner.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

You can be slightly more specific by mentioning that the affected path
is through btrfs_copy_root as all other callers of
btrfs_alloc_tree_block (which calls init_new_buffer) have root_objectid
== root->root_key.objectid

Otherwise the change is good.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/extent-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 73973e6e8ba6..6c2373f65be2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4522,7 +4522,7 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  		return ERR_PTR(-EUCLEAN);
>  	}
>  
> -	btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
> +	btrfs_set_buffer_lockdep_class(owner, buf, level);
>  	btrfs_tree_lock(buf);
>  	btrfs_clean_tree_block(buf);
>  	clear_bit(EXTENT_BUFFER_STALE, &buf->bflags);
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 8/8] btrfs: set the lockdep class for log tree extent buffers
  2020-08-20 15:18 ` [PATCH 8/8] btrfs: set the lockdep class for log tree extent buffers Josef Bacik
@ 2020-08-21 13:43   ` Nikolay Borisov
  0 siblings, 0 replies; 13+ messages in thread
From: Nikolay Borisov @ 2020-08-21 13:43 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 20.08.20 г. 18:18 ч., Josef Bacik wrote:
> These are special extent buffers that get rewound in order to lookup
> the state of the tree at a specific point in time.  As such they do not
> go through the normal initialization paths that set their lockdep class,
> so handle them appropriately when they are created and before they are
> locked.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Setting the lockdep class is fine, however in get_old_root your also
reduce the scope of the eb's read lock. That's fine since in the if
(old_root) branch we only set some values to the eb, which is guaranteed
to be a clone/private copy i.e no other references.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/ctree.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 09a0dc690a17..e0fd8a34efcc 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1297,6 +1297,8 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
>  	btrfs_tree_read_unlock_blocking(eb);
>  	free_extent_buffer(eb);
>  
> +	btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb_rewin),
> +				       eb_rewin, btrfs_header_level(eb_rewin));
>  	btrfs_tree_read_lock(eb_rewin);
>  	__tree_mod_log_rewind(fs_info, eb_rewin, time_seq, tm);
>  	WARN_ON(btrfs_header_nritems(eb_rewin) >
> @@ -1370,7 +1372,6 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
>  
>  	if (!eb)
>  		return NULL;
> -	btrfs_tree_read_lock(eb);
>  	if (old_root) {
>  		btrfs_set_header_bytenr(eb, eb->start);
>  		btrfs_set_header_backref_rev(eb, BTRFS_MIXED_BACKREF_REV);
> @@ -1378,6 +1379,9 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
>  		btrfs_set_header_level(eb, old_root->level);
>  		btrfs_set_header_generation(eb, old_generation);
>  	}
> +	btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb), eb,
> +				       btrfs_header_level(eb));
> +	btrfs_tree_read_lock(eb);
>  	if (tm)
>  		__tree_mod_log_rewind(fs_info, eb, time_seq, tm);
>  	else
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/8] Lockdep fixes
  2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
                   ` (7 preceding siblings ...)
  2020-08-20 15:18 ` [PATCH 8/8] btrfs: set the lockdep class for log tree extent buffers Josef Bacik
@ 2020-08-26 10:50 ` David Sterba
  8 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2020-08-26 10:50 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Thu, Aug 20, 2020 at 11:18:23AM -0400, Josef Bacik wrote:
> Hello,
> 
> While reworking the btrfs locking to use a rwsem I discovered a few more lockdep
> issues because of the new locking.  These are stand alone fixes that are safe
> enough to go into 5.9, so I've broken them out from the overall locking rework.
> I addressed the issues that Filipe pointed out (I think I did anyway), and run
> the whole series through xfstests and saw no more lockdep messages.
>  
> fs/btrfs/ctree.c       |   6 +-
>  fs/btrfs/dev-replace.c |   6 +-
>  fs/btrfs/extent-tree.c |   2 +-
>  fs/btrfs/extent_io.c   |   8 +--
>  fs/btrfs/extent_io.h   |   6 +-
>  fs/btrfs/ioctl.c       |  27 ++++++---
>  fs/btrfs/scrub.c       | 122 +++++++++++++++++++++++------------------
>  fs/btrfs/volumes.c     |  23 ++++----
>  fs/btrfs/volumes.h     |   3 +
>  9 files changed, 121 insertions(+), 82 deletions(-)

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-08-26 10:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-20 15:18 [PATCH 0/8] Lockdep fixes Josef Bacik
2020-08-20 15:18 ` [PATCH 1/8] btrfs: drop path before adding new uuid tree entry Josef Bacik
2020-08-20 15:18 ` [PATCH 2/8] btrfs: fix potential deadlock in the search ioctl Josef Bacik
2020-08-20 15:18 ` [PATCH 3/8] btrfs: move btrfs_scratch_superblocks into btrfs_dev_replace_finishing Josef Bacik
2020-08-21 12:56   ` Nikolay Borisov
2020-08-20 15:18 ` [PATCH 4/8] btrfs: move btrfs_rm_dev_replace_free_srcdev outside of all locks Josef Bacik
2020-08-20 15:18 ` [PATCH 5/8] btrfs: do not hold device_list_mutex when closing devices Josef Bacik
2020-08-20 15:18 ` [PATCH 6/8] btrfs: allocate scrub workqueues outside of locks Josef Bacik
2020-08-20 15:18 ` [PATCH 7/8] btrfs: set the correct lockdep class for new nodes Josef Bacik
2020-08-21 13:36   ` Nikolay Borisov
2020-08-20 15:18 ` [PATCH 8/8] btrfs: set the lockdep class for log tree extent buffers Josef Bacik
2020-08-21 13:43   ` Nikolay Borisov
2020-08-26 10:50 ` [PATCH 0/8] Lockdep fixes David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.