* [PATCH v2 0/7]
@ 2021-07-27 21:01 Josef Bacik
  2021-07-27 21:01 ` [PATCH v2 1/7] btrfs: do not call close_fs_devices in btrfs_rm_device Josef Bacik
                   ` (7 more replies)
  0 siblings, 8 replies; 39+ messages in thread
From: Josef Bacik @ 2021-07-27 21:01 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v1->v2:
- Rework the first patch as it was wrong because we need it for seed devices.
- Fix another lockdep splat I uncovered while testing against seed devices to
  make sure I hadn't broken anything.

--- Original email ---

Hello,

Commit 87579e9b7d8d ("loop: use worker per cgroup instead of kworker")
switched loopback devices over to workqueues, and the workqueues bring
lockdep annotations with them.  This uncovered a cascade of lockdep
warnings because of how we mess with the block_device under the sb
writers lock while doing the device removal.

The first patch seems innocuous, but one of the helpers has a
lockdep_assert_held(&uuid_mutex), which is why it comes first.  The code
in question should never be called, which is why it can go, but I'm
removing it specifically to clear up confusion about the role of the
uuid_mutex here.

The next 4 patches are to resolve the lockdep messages as they occur.  There are
several issues and I address them one at a time until we're no longer getting
lockdep warnings.

The final patch doesn't necessarily have to go in right away; it's just
a cleanup, as I noticed we have a lot of duplicated code between the v1
and v2 device removal handling.  Thanks,

Josef

Josef Bacik (7):
  btrfs: do not call close_fs_devices in btrfs_rm_device
  btrfs: do not take the uuid_mutex in btrfs_rm_device
  btrfs: do not read super look for a device path
  btrfs: update the bdev time directly when closing
  btrfs: delay blkdev_put until after the device remove
  btrfs: unify common code for the v1 and v2 versions of device remove
  btrfs: do not take the device_list_mutex in clone_fs_devices

 fs/btrfs/ioctl.c   |  92 +++++++++++++++--------------------
 fs/btrfs/volumes.c | 118 ++++++++++++++++++++++-----------------------
 fs/btrfs/volumes.h |   3 +-
 3 files changed, 101 insertions(+), 112 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 1/7] btrfs: do not call close_fs_devices in btrfs_rm_device
  2021-07-27 21:01 [PATCH v2 0/7] Josef Bacik
@ 2021-07-27 21:01 ` Josef Bacik
  2021-09-01  8:13   ` Anand Jain
  2021-07-27 21:01 ` [PATCH v2 2/7] btrfs: do not take the uuid_mutex " Josef Bacik
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Josef Bacik @ 2021-07-27 21:01 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

There's a subtle case where, if we're removing the seed device from a
file system, we need to free its private copy of the fs_devices.
However, we do not need to call close_fs_devices(), because at this
point there are no devices left to close; we've closed the last one.
The only thing close_fs_devices() does in that case is decrement
->opened, which should be 1.  We want to avoid calling
close_fs_devices() here because it has a
lockdep_assert_held(&uuid_mutex), and we are going to stop holding the
uuid_mutex in this path.

So add an assert for the ->opened counter and simply decrement it like
we should, and then clean up like normal.  Also add a comment explaining
what we're doing here as I initially removed this code erroneously.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 86846d6e58d0..5217b93172b4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2200,9 +2200,17 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	synchronize_rcu();
 	btrfs_free_device(device);
 
+	/*
+	 * This can happen if cur_devices is the private seed devices list.  We
+	 * cannot call close_fs_devices() here because it expects the uuid_mutex
+	 * to be held, but in fact we don't need that for the private
+	 * seed_devices; we can simply decrement cur_devices->opened and then
+	 * remove it from our list and free the fs_devices.
+	 */
 	if (cur_devices->open_devices == 0) {
+		ASSERT(cur_devices->opened == 1);
 		list_del_init(&cur_devices->seed_list);
-		close_fs_devices(cur_devices);
+		cur_devices->opened--;
 		free_fs_devices(cur_devices);
 	}
 
-- 
2.26.3


* [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-07-27 21:01 [PATCH v2 0/7] Josef Bacik
  2021-07-27 21:01 ` [PATCH v2 1/7] btrfs: do not call close_fs_devices in btrfs_rm_device Josef Bacik
@ 2021-07-27 21:01 ` Josef Bacik
  2021-09-01 12:01   ` Anand Jain
                     ` (4 more replies)
  2021-07-27 21:01 ` [PATCH v2 3/7] btrfs: do not read super look for a device path Josef Bacik
                   ` (5 subsequent siblings)
  7 siblings, 5 replies; 39+ messages in thread
From: Josef Bacik @ 2021-07-27 21:01 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We got the following lockdep splat while running xfstests (specifically
btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
converted loop to using workqueues, which comes with lockdep
annotations that don't exist with kworkers.  The lockdep splat is as
follows

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2-custom+ #34 Not tainted
------------------------------------------------------
losetup/156417 is trying to acquire lock:
ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600

but task is already holding lock:
ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0xba/0x7c0
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x28/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x163/0x3a0
       path_openat+0x74d/0xa40
       do_filp_open+0x9c/0x140
       do_sys_openat2+0xb1/0x170
       __x64_sys_openat+0x54/0x90
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #4 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0xba/0x7c0
       blkdev_get_by_dev.part.0+0xd1/0x3c0
       blkdev_get_by_path+0xc0/0xd0
       btrfs_scan_one_device+0x52/0x1f0 [btrfs]
       btrfs_control_ioctl+0xac/0x170 [btrfs]
       __x64_sys_ioctl+0x83/0xb0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (uuid_mutex){+.+.}-{3:3}:
       __mutex_lock+0xba/0x7c0
       btrfs_rm_device+0x48/0x6a0 [btrfs]
       btrfs_ioctl+0x2d1c/0x3110 [btrfs]
       __x64_sys_ioctl+0x83/0xb0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#11){.+.+}-{0:0}:
       lo_write_bvec+0x112/0x290 [loop]
       loop_process_work+0x25f/0xcb0 [loop]
       process_one_work+0x28f/0x5d0
       worker_thread+0x55/0x3c0
       kthread+0x140/0x170
       ret_from_fork+0x22/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x266/0x5d0
       worker_thread+0x55/0x3c0
       kthread+0x140/0x170
       ret_from_fork+0x22/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x1130/0x1dc0
       lock_acquire+0xf5/0x320
       flush_workqueue+0xae/0x600
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x650 [loop]
       lo_ioctl+0x29d/0x780 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x83/0xb0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:
Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***
1 lock held by losetup/156417:
 #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

stack backtrace:
CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0x10a/0x120
 __lock_acquire+0x1130/0x1dc0
 lock_acquire+0xf5/0x320
 ? flush_workqueue+0x84/0x600
 flush_workqueue+0xae/0x600
 ? flush_workqueue+0x84/0x600
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x650 [loop]
 lo_ioctl+0x29d/0x780 [loop]
 ? __lock_acquire+0x3a0/0x1dc0
 ? update_dl_rq_load_avg+0x152/0x360
 ? lock_is_held_type+0xa5/0x120
 ? find_held_lock.constprop.0+0x2b/0x80
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x83/0xb0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f645884de6b

Usually the uuid_mutex exists to protect the fs_devices that map
together all of the devices that match a specific uuid.  In rm_device
we're messing with the uuid of a device, so it makes sense to protect
that here.

However, doing so pulls in a whole host of lockdep dependencies, as we
call mnt_want_write() on the sb before we grab the uuid_mutex.  Thus the
dependency chain under the uuid_mutex ends up being added under the
normal sb write dependency chain, which causes problems with loop
devices.

However, we don't need the uuid_mutex here.  If btrfs_scan_one_device()
is called before we scratch the super block, it will find the existing
fs_devices and return -EBUSY, because that fs_devices is open.  If it is
called after the scratch happens, the device will no longer appear to be
a valid btrfs file system.

We do not need to worry about other operations modifying the fs_devices
here, because we're protected by the exclusive operations locking.

So drop the uuid_mutex here in order to fix the lockdep splat.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5217b93172b4..0e7372f637eb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2082,8 +2082,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	u64 num_devices;
 	int ret = 0;
 
-	mutex_lock(&uuid_mutex);
-
 	num_devices = btrfs_num_devices(fs_info);
 
 	ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
@@ -2127,11 +2125,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 		mutex_unlock(&fs_info->chunk_mutex);
 	}
 
-	mutex_unlock(&uuid_mutex);
 	ret = btrfs_shrink_device(device, 0);
 	if (!ret)
 		btrfs_reada_remove_dev(device);
-	mutex_lock(&uuid_mutex);
 	if (ret)
 		goto error_undo;
 
@@ -2215,7 +2211,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	}
 
 out:
-	mutex_unlock(&uuid_mutex);
 	return ret;
 
 error_undo:
-- 
2.26.3


* [PATCH v2 3/7] btrfs: do not read super look for a device path
  2021-07-27 21:01 [PATCH v2 0/7] Josef Bacik
  2021-07-27 21:01 ` [PATCH v2 1/7] btrfs: do not call close_fs_devices in btrfs_rm_device Josef Bacik
  2021-07-27 21:01 ` [PATCH v2 2/7] btrfs: do not take the uuid_mutex " Josef Bacik
@ 2021-07-27 21:01 ` Josef Bacik
  2021-08-25  2:00   ` Anand Jain
  2021-07-27 21:01 ` [PATCH v2 4/7] btrfs: update the bdev time directly when closing Josef Bacik
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Josef Bacik @ 2021-07-27 21:01 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

For device removal and replace we call btrfs_find_device_by_devspec,
which, if given a device path and nothing else, calls
btrfs_find_device_by_path.  That helper opens the block device, reads
the super block, and then looks up our device based on it.

However, this is completely unnecessary, because the path is already
stored in the devices on our fs_devices.  If we're given a path, all we
need to do is look through the fs_devices on our file system and use the
device if we find it; reading the super block is just silly.

This fixes the case where we end up with our sb write "lock" getting the
dependency of the block device ->open_mutex, which resulted in the
following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #405 Not tainted
------------------------------------------------------
losetup/11576 is trying to acquire lock:
ffff9bbe8cded938 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_get_by_dev.part.0+0x56/0x3c0
       blkdev_get_by_path+0x98/0xa0
       btrfs_get_bdev_and_sb+0x1b/0xb0
       btrfs_find_device_by_devspec+0x12b/0x1c0
       btrfs_rm_device+0x127/0x610
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***

1 lock held by losetup/11576:
 #0: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 0 PID: 11576 Comm: losetup Not tainted 5.14.0-rc2+ #405
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0xcf/0xf0
 ? stack_trace_save+0x3b/0x50
 __lock_acquire+0x10ea/0x1d90
 lock_acquire+0xb5/0x2b0
 ? flush_workqueue+0x67/0x5e0
 ? lockdep_init_map_type+0x47/0x220
 flush_workqueue+0x91/0x5e0
 ? flush_workqueue+0x67/0x5e0
 ? verify_cpu+0xf0/0x100
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x660 [loop]
 ? blkdev_ioctl+0x8d/0x2a0
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x80/0xb0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f31b02404cb

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 61 +++++++++++++++++-----------------------------
 1 file changed, 23 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0e7372f637eb..bf2449cdb2ab 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2313,37 +2313,22 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev)
 	btrfs_free_device(tgtdev);
 }
 
-static struct btrfs_device *btrfs_find_device_by_path(
-		struct btrfs_fs_info *fs_info, const char *device_path)
+static struct btrfs_device *find_device_by_path(
+					struct btrfs_fs_devices *fs_devices,
+					const char *path)
 {
-	int ret = 0;
-	struct btrfs_super_block *disk_super;
-	u64 devid;
-	u8 *dev_uuid;
-	struct block_device *bdev;
 	struct btrfs_device *device;
+	bool missing = !strcmp(path, "missing");
 
-	ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
-				    fs_info->bdev_holder, 0, &bdev, &disk_super);
-	if (ret)
-		return ERR_PTR(ret);
-
-	devid = btrfs_stack_device_id(&disk_super->dev_item);
-	dev_uuid = disk_super->dev_item.uuid;
-	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
-		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
-					   disk_super->metadata_uuid);
-	else
-		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
-					   disk_super->fsid);
-
-	btrfs_release_disk_super(disk_super);
-	if (!device)
-		device = ERR_PTR(-ENOENT);
-	blkdev_put(bdev, FMODE_READ);
-	return device;
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (missing && test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
+					&device->dev_state) && !device->bdev)
+			return device;
+		if (!missing && device_path_matched(path, device))
+			return device;
+	}
+	return NULL;
 }
-
 /*
  * Lookup a device given by device id, or the path if the id is 0.
  */
@@ -2351,6 +2336,7 @@ struct btrfs_device *btrfs_find_device_by_devspec(
 		struct btrfs_fs_info *fs_info, u64 devid,
 		const char *device_path)
 {
+	struct btrfs_fs_devices *seed_devs;
 	struct btrfs_device *device;
 
 	if (devid) {
@@ -2364,18 +2350,17 @@ struct btrfs_device *btrfs_find_device_by_devspec(
 	if (!device_path || !device_path[0])
 		return ERR_PTR(-EINVAL);
 
-	if (strcmp(device_path, "missing") == 0) {
-		/* Find first missing device */
-		list_for_each_entry(device, &fs_info->fs_devices->devices,
-				    dev_list) {
-			if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
-				     &device->dev_state) && !device->bdev)
-				return device;
-		}
-		return ERR_PTR(-ENOENT);
-	}
+	device = find_device_by_path(fs_info->fs_devices, device_path);
+	if (device)
+		return device;
 
-	return btrfs_find_device_by_path(fs_info, device_path);
+	list_for_each_entry(seed_devs, &fs_info->fs_devices->seed_list,
+			    seed_list) {
+		device = find_device_by_path(seed_devs, device_path);
+		if (device)
+			return device;
+	}
+	return ERR_PTR(-ENOENT);
 }
 
 /*
-- 
2.26.3


* [PATCH v2 4/7] btrfs: update the bdev time directly when closing
  2021-07-27 21:01 [PATCH v2 0/7] Josef Bacik
                   ` (2 preceding siblings ...)
  2021-07-27 21:01 ` [PATCH v2 3/7] btrfs: do not read super look for a device path Josef Bacik
@ 2021-07-27 21:01 ` Josef Bacik
  2021-08-25  0:35   ` Anand Jain
  2021-09-02 12:16   ` David Sterba
  2021-07-27 21:01 ` [PATCH v2 5/7] btrfs: delay blkdev_put until after the device remove Josef Bacik
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 39+ messages in thread
From: Josef Bacik @ 2021-07-27 21:01 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We update the ctime/mtime of a block device when we remove it so that
blkid knows the device changed.  However, we currently do this by
re-opening the block device and calling filp_update_time().  That would
be more correct if block device inodes implemented
inode->i_op->update_time, but they do not.  Instead, call
generic_update_time() on the bd_inode, which avoids the blkdev_open path
and gets rid of the following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #406 Not tainted
------------------------------------------------------
losetup/11596 is trying to acquire lock:
ffff939640d2f538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_get_by_dev.part.0+0x56/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       file_open_name+0xc7/0x170
       filp_open+0x2c/0x50
       btrfs_scratch_superblocks.part.0+0x10f/0x170
       btrfs_rm_device.cold+0xe8/0xed
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***

1 lock held by losetup/11596:
 #0: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 1 PID: 11596 Comm: losetup Not tainted 5.14.0-rc2+ #406
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0xcf/0xf0
 ? stack_trace_save+0x3b/0x50
 __lock_acquire+0x10ea/0x1d90
 lock_acquire+0xb5/0x2b0
 ? flush_workqueue+0x67/0x5e0
 ? lockdep_init_map_type+0x47/0x220
 flush_workqueue+0x91/0x5e0
 ? flush_workqueue+0x67/0x5e0
 ? verify_cpu+0xf0/0x100
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x660 [loop]
 ? blkdev_ioctl+0x8d/0x2a0
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x80/0xb0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bf2449cdb2ab..3ab6c78e6eb2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1882,15 +1882,17 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
  * Function to update ctime/mtime for a given device path.
  * Mainly used for ctime/mtime based probe like libblkid.
  */
-static void update_dev_time(const char *path_name)
+static void update_dev_time(struct block_device *bdev)
 {
-	struct file *filp;
+	struct inode *inode = bdev->bd_inode;
+	struct timespec64 now;
 
-	filp = filp_open(path_name, O_RDWR, 0);
-	if (IS_ERR(filp))
+	/* Shouldn't happen but just in case. */
+	if (!inode)
 		return;
-	file_update_time(filp);
-	filp_close(filp, NULL);
+
+	now = current_time(inode);
+	generic_update_time(inode, &now, S_MTIME|S_CTIME);
 }
 
 static int btrfs_rm_dev_item(struct btrfs_device *device)
@@ -2070,7 +2072,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 	btrfs_kobject_uevent(bdev, KOBJ_CHANGE);
 
 	/* Update ctime/mtime for device path for libblkid */
-	update_dev_time(device_path);
+	update_dev_time(bdev);
 }
 
 int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
@@ -2711,7 +2713,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	btrfs_forget_devices(device_path);
 
 	/* Update ctime/mtime for blkid or udev */
-	update_dev_time(device_path);
+	update_dev_time(bdev);
 
 	return ret;
 
-- 
2.26.3


* [PATCH v2 5/7] btrfs: delay blkdev_put until after the device remove
  2021-07-27 21:01 [PATCH v2 0/7] Josef Bacik
                   ` (3 preceding siblings ...)
  2021-07-27 21:01 ` [PATCH v2 4/7] btrfs: update the bdev time directly when closing Josef Bacik
@ 2021-07-27 21:01 ` Josef Bacik
  2021-08-25  1:00   ` Anand Jain
  2021-09-02 12:16   ` David Sterba
  2021-07-27 21:01 ` [PATCH v2 6/7] btrfs: unify common code for the v1 and v2 versions of " Josef Bacik
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 39+ messages in thread
From: Josef Bacik @ 2021-07-27 21:01 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When removing a device we call blkdev_put() on it once we've removed
it, and because we hold an EXCL open, blkdev_put() needs to take the
block device's ->open_mutex to clean up.  Unfortunately, during device
remove we are holding the sb writers lock, which results in the
following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #407 Not tainted
------------------------------------------------------
losetup/11595 is trying to acquire lock:
ffff973ac35dd138 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_put+0x3a/0x220
       btrfs_rm_device.cold+0x62/0xe5
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***

1 lock held by losetup/11595:
 #0: ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 0 PID: 11595 Comm: losetup Not tainted 5.14.0-rc2+ #407
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0xcf/0xf0
 ? stack_trace_save+0x3b/0x50
 __lock_acquire+0x10ea/0x1d90
 lock_acquire+0xb5/0x2b0
 ? flush_workqueue+0x67/0x5e0
 ? lockdep_init_map_type+0x47/0x220
 flush_workqueue+0x91/0x5e0
 ? flush_workqueue+0x67/0x5e0
 ? verify_cpu+0xf0/0x100
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x660 [loop]
 ? blkdev_ioctl+0x8d/0x2a0
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x80/0xb0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc21255d4cb

So instead save the bdev and do the put once we've dropped the sb
writers lock in order to avoid the lockdep recursion.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ioctl.c   | 17 ++++++++++++++---
 fs/btrfs/volumes.c | 19 +++++++++++++++----
 fs/btrfs/volumes.h |  3 ++-
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0ba98e08a029..fabbfdfa56f5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3205,6 +3205,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_ioctl_vol_args_v2 *vol_args;
+	struct block_device *bdev = NULL;
+	fmode_t mode;
 	int ret;
 	bool cancel = false;
 
@@ -3237,9 +3239,11 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	/* Exclusive operation is now claimed */
 
 	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
-		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid);
+		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid, &bdev,
+				      &mode);
 	else
-		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
+		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
+				      &mode);
 
 	btrfs_exclop_finish(fs_info);
 
@@ -3255,6 +3259,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	kfree(vol_args);
 err_drop:
 	mnt_drop_write_file(file);
+	if (bdev)
+		blkdev_put(bdev, mode);
 	return ret;
 }
 
@@ -3263,6 +3269,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_ioctl_vol_args *vol_args;
+	struct block_device *bdev = NULL;
+	fmode_t mode;
 	int ret;
 	bool cancel;
 
@@ -3284,7 +3292,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
 					   cancel);
 	if (ret == 0) {
-		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
+		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
+				      &mode);
 		if (!ret)
 			btrfs_info(fs_info, "disk deleted %s", vol_args->name);
 		btrfs_exclop_finish(fs_info);
@@ -3294,6 +3303,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 out_drop_write:
 	mnt_drop_write_file(file);
 
+	if (bdev)
+		blkdev_put(bdev, mode);
 	return ret;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3ab6c78e6eb2..f622e93a6ff1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2076,7 +2076,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 }
 
 int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
-		    u64 devid)
+		    u64 devid, struct block_device **bdev, fmode_t *mode)
 {
 	struct btrfs_device *device;
 	struct btrfs_fs_devices *cur_devices;
@@ -2186,15 +2186,26 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	mutex_unlock(&fs_devices->device_list_mutex);
 
 	/*
-	 * at this point, the device is zero sized and detached from
+	 * At this point, the device is zero sized and detached from
 	 * the devices list.  All that's left is to zero out the old
 	 * supers and free the device.
+	 *
+	 * We cannot call btrfs_close_bdev() here because we're holding the sb
+	 * write lock, and blkdev_put() will pull in the ->open_mutex on the
+	 * block device and its dependencies.  Instead just flush the device
+	 * and let the caller do the final blkdev_put.
 	 */
-	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
+	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
 		btrfs_scratch_superblocks(fs_info, device->bdev,
 					  device->name->str);
+		if (device->bdev) {
+			sync_blockdev(device->bdev);
+			invalidate_bdev(device->bdev);
+		}
+	}
 
-	btrfs_close_bdev(device);
+	*bdev = device->bdev;
+	*mode = device->mode;
 	synchronize_rcu();
 	btrfs_free_device(device);
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 70c749eee3ad..cc70e54cb901 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -472,7 +472,8 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 					const u8 *uuid);
 void btrfs_free_device(struct btrfs_device *device);
 int btrfs_rm_device(struct btrfs_fs_info *fs_info,
-		    const char *device_path, u64 devid);
+		    const char *device_path, u64 devid,
+		    struct block_device **bdev, fmode_t *mode);
 void __exit btrfs_cleanup_fs_uuids(void);
 int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len);
 int btrfs_grow_device(struct btrfs_trans_handle *trans,
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 6/7] btrfs: unify common code for the v1 and v2 versions of device remove
  2021-07-27 21:01 [PATCH v2 0/7] Josef Bacik
                   ` (4 preceding siblings ...)
  2021-07-27 21:01 ` [PATCH v2 5/7] btrfs: delay blkdev_put until after the device remove Josef Bacik
@ 2021-07-27 21:01 ` Josef Bacik
  2021-08-25  1:19   ` Anand Jain
  2021-09-01 14:05   ` Nikolay Borisov
  2021-07-27 21:01 ` [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices Josef Bacik
  2021-09-17 15:06 ` [PATCH v2 0/7] David Sterba
  7 siblings, 2 replies; 39+ messages in thread
From: Josef Bacik @ 2021-07-27 21:01 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

The v1 and v2 device remove ioctls share a lot of common code; v2 simply
also allows specifying a devid.  Abstract the common code into a helper
used by both the v1 and v2 interfaces to save us some lines of code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ioctl.c | 99 +++++++++++++++++++-----------------------------
 1 file changed, 38 insertions(+), 61 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fabbfdfa56f5..e3a7e8544609 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3200,15 +3200,14 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
 	return ret;
 }
 
-static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
+static long btrfs_do_device_removal(struct file *file, const char *path,
+				    u64 devid, bool cancel)
 {
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_ioctl_vol_args_v2 *vol_args;
 	struct block_device *bdev = NULL;
 	fmode_t mode;
 	int ret;
-	bool cancel = false;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -3217,11 +3216,37 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	if (ret)
 		return ret;
 
-	vol_args = memdup_user(arg, sizeof(*vol_args));
-	if (IS_ERR(vol_args)) {
-		ret = PTR_ERR(vol_args);
-		goto err_drop;
+	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
+					   cancel);
+	if (ret)
+		goto out;
+
+	/* Exclusive operation is now claimed */
+	ret = btrfs_rm_device(fs_info, path, devid, &bdev, &mode);
+	btrfs_exclop_finish(fs_info);
+
+	if (!ret) {
+		if (path)
+			btrfs_info(fs_info, "device deleted: %s", path);
+		else
+			btrfs_info(fs_info, "device deleted: id %llu", devid);
 	}
+out:
+	mnt_drop_write_file(file);
+	if (bdev)
+		blkdev_put(bdev, mode);
+	return ret;
+}
+
+static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
+{
+	struct btrfs_ioctl_vol_args_v2 *vol_args;
+	int ret = 0;
+	bool cancel = false;
+
+	vol_args = memdup_user(arg, sizeof(*vol_args));
+	if (IS_ERR(vol_args))
+		return PTR_ERR(vol_args);
 
 	if (vol_args->flags & ~BTRFS_DEVICE_REMOVE_ARGS_MASK) {
 		ret = -EOPNOTSUPP;
@@ -3232,79 +3257,31 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	    strcmp("cancel", vol_args->name) == 0)
 		cancel = true;
 
-	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
-					   cancel);
-	if (ret)
-		goto out;
-	/* Exclusive operation is now claimed */
-
 	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
-		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid, &bdev,
-				      &mode);
+		ret = btrfs_do_device_removal(file, NULL, vol_args->devid,
+					      cancel);
 	else
-		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
-				      &mode);
-
-	btrfs_exclop_finish(fs_info);
-
-	if (!ret) {
-		if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
-			btrfs_info(fs_info, "device deleted: id %llu",
-					vol_args->devid);
-		else
-			btrfs_info(fs_info, "device deleted: %s",
-					vol_args->name);
-	}
+		ret = btrfs_do_device_removal(file, vol_args->name, 0, cancel);
 out:
 	kfree(vol_args);
-err_drop:
-	mnt_drop_write_file(file);
-	if (bdev)
-		blkdev_put(bdev, mode);
 	return ret;
 }
 
 static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 {
-	struct inode *inode = file_inode(file);
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_ioctl_vol_args *vol_args;
-	struct block_device *bdev = NULL;
-	fmode_t mode;
 	int ret;
 	bool cancel;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	ret = mnt_want_write_file(file);
-	if (ret)
-		return ret;
-
 	vol_args = memdup_user(arg, sizeof(*vol_args));
-	if (IS_ERR(vol_args)) {
-		ret = PTR_ERR(vol_args);
-		goto out_drop_write;
-	}
+	if (IS_ERR(vol_args))
+		return PTR_ERR(vol_args);
 	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
 	cancel = (strcmp("cancel", vol_args->name) == 0);
 
-	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
-					   cancel);
-	if (ret == 0) {
-		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
-				      &mode);
-		if (!ret)
-			btrfs_info(fs_info, "disk deleted %s", vol_args->name);
-		btrfs_exclop_finish(fs_info);
-	}
+	ret = btrfs_do_device_removal(file, vol_args->name, 0, cancel);
 
 	kfree(vol_args);
-out_drop_write:
-	mnt_drop_write_file(file);
-
-	if (bdev)
-		blkdev_put(bdev, mode);
 	return ret;
 }
 
-- 
2.26.3



* [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices
  2021-07-27 21:01 [PATCH v2 0/7] Josef Bacik
                   ` (5 preceding siblings ...)
  2021-07-27 21:01 ` [PATCH v2 6/7] btrfs: unify common code for the v1 and v2 versions of " Josef Bacik
@ 2021-07-27 21:01 ` Josef Bacik
  2021-08-24 22:08   ` Anand Jain
                     ` (2 more replies)
  2021-09-17 15:06 ` [PATCH v2 0/7] David Sterba
  7 siblings, 3 replies; 39+ messages in thread
From: Josef Bacik @ 2021-07-27 21:01 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

I got the following lockdep splat while testing seed devices

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #409 Not tainted
------------------------------------------------------
mount/34004 is trying to acquire lock:
ffff9eaac48188e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170

but task is already holding lock:
ffff9eaac766d438 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x100

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (btrfs-chunk-00){++++}-{3:3}:
       down_read_nested+0x46/0x60
       __btrfs_tree_read_lock+0x24/0x100
       btrfs_read_lock_root_node+0x31/0x40
       btrfs_search_slot+0x480/0x930
       btrfs_update_device+0x63/0x180
       btrfs_chunk_alloc_add_chunk_item+0xdc/0x3a0
       btrfs_chunk_alloc+0x281/0x540
       find_free_extent+0x10ca/0x1790
       btrfs_reserve_extent+0xbf/0x1d0
       btrfs_alloc_tree_block+0xb1/0x320
       __btrfs_cow_block+0x136/0x5f0
       btrfs_cow_block+0x107/0x210
       btrfs_search_slot+0x56a/0x930
       btrfs_truncate_inode_items+0x187/0xef0
       btrfs_truncate_free_space_cache+0x11c/0x210
       delete_block_group_cache+0x6f/0xb0
       btrfs_relocate_block_group+0xf8/0x350
       btrfs_relocate_chunk+0x38/0x120
       btrfs_balance+0x79b/0xf00
       btrfs_ioctl_balance+0x327/0x400
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       btrfs_init_new_device+0x6d6/0x1540
       btrfs_ioctl+0x1b12/0x2d30
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       __mutex_lock+0x7d/0x750
       clone_fs_devices+0x4d/0x170
       btrfs_read_chunk_tree+0x32f/0x800
       open_ctree+0xae3/0x16f0
       btrfs_mount_root.cold+0x12/0xea
       legacy_get_tree+0x2d/0x50
       vfs_get_tree+0x25/0xc0
       vfs_kern_mount.part.0+0x71/0xb0
       btrfs_mount+0x10d/0x380
       legacy_get_tree+0x2d/0x50
       vfs_get_tree+0x25/0xc0
       path_mount+0x433/0xb60
       __x64_sys_mount+0xe3/0x120
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  &fs_devs->device_list_mutex --> &fs_info->chunk_mutex --> btrfs-chunk-00

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(btrfs-chunk-00);
                               lock(&fs_info->chunk_mutex);
                               lock(btrfs-chunk-00);
  lock(&fs_devs->device_list_mutex);

 *** DEADLOCK ***

3 locks held by mount/34004:
 #0: ffff9eaad75c00e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0
 #1: ffffffffbd2dcf08 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x800
 #2: ffff9eaac766d438 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x100

stack backtrace:
CPU: 0 PID: 34004 Comm: mount Not tainted 5.14.0-rc2+ #409
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0xcf/0xf0
 __lock_acquire+0x10ea/0x1d90
 lock_acquire+0xb5/0x2b0
 ? clone_fs_devices+0x4d/0x170
 ? lock_is_held_type+0xa5/0x120
 __mutex_lock+0x7d/0x750
 ? clone_fs_devices+0x4d/0x170
 ? clone_fs_devices+0x4d/0x170
 ? lockdep_init_map_type+0x47/0x220
 ? debug_mutex_init+0x33/0x40
 clone_fs_devices+0x4d/0x170
 ? lock_is_held_type+0xa5/0x120
 btrfs_read_chunk_tree+0x32f/0x800
 ? find_held_lock+0x2b/0x80
 open_ctree+0xae3/0x16f0
 btrfs_mount_root.cold+0x12/0xea
 ? rcu_read_lock_sched_held+0x3f/0x80
 ? kfree+0x1f6/0x410
 legacy_get_tree+0x2d/0x50
 vfs_get_tree+0x25/0xc0
 vfs_kern_mount.part.0+0x71/0xb0
 btrfs_mount+0x10d/0x380
 ? kfree+0x1f6/0x410
 legacy_get_tree+0x2d/0x50
 vfs_get_tree+0x25/0xc0
 path_mount+0x433/0xb60
 __x64_sys_mount+0xe3/0x120
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f6cbcd9788e

This happens because we take the ->device_list_mutex in this path while
holding tree locks in the chunk root.  However we do not need the lock
here: we are already holding the uuid_mutex, and all other uses of the
->device_list_mutex in this path have already been removed for the same
reason.  Remove the ->device_list_mutex locking here and add a lockdep
assert for the uuid_mutex, which fixes the problem.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f622e93a6ff1..bdfcc35335c3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1000,11 +1000,12 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 	struct btrfs_device *orig_dev;
 	int ret = 0;
 
+	lockdep_assert_held(&uuid_mutex);
+
 	fs_devices = alloc_fs_devices(orig->fsid, NULL);
 	if (IS_ERR(fs_devices))
 		return fs_devices;
 
-	mutex_lock(&orig->device_list_mutex);
 	fs_devices->total_devices = orig->total_devices;
 
 	list_for_each_entry(orig_dev, &orig->devices, dev_list) {
@@ -1036,10 +1037,8 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 		device->fs_devices = fs_devices;
 		fs_devices->num_devices++;
 	}
-	mutex_unlock(&orig->device_list_mutex);
 	return fs_devices;
 error:
-	mutex_unlock(&orig->device_list_mutex);
 	free_fs_devices(fs_devices);
 	return ERR_PTR(ret);
 }
-- 
2.26.3



* Re: [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices
  2021-07-27 21:01 ` [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices Josef Bacik
@ 2021-08-24 22:08   ` Anand Jain
  2021-09-01 13:35   ` Nikolay Borisov
  2021-09-02 12:59   ` David Sterba
  2 siblings, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-08-24 22:08 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, kernel-team, Su Yue



On 28/07/2021 05:01, Josef Bacik wrote:
> I got the following lockdep splat while testing seed devices
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #409 Not tainted
> ------------------------------------------------------
> mount/34004 is trying to acquire lock:
> ffff9eaac48188e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170
> 
> but task is already holding lock:
> ffff9eaac766d438 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x100
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (btrfs-chunk-00){++++}-{3:3}:
>         down_read_nested+0x46/0x60
>         __btrfs_tree_read_lock+0x24/0x100
>         btrfs_read_lock_root_node+0x31/0x40
>         btrfs_search_slot+0x480/0x930
>         btrfs_update_device+0x63/0x180
>         btrfs_chunk_alloc_add_chunk_item+0xdc/0x3a0
>         btrfs_chunk_alloc+0x281/0x540
>         find_free_extent+0x10ca/0x1790
>         btrfs_reserve_extent+0xbf/0x1d0
>         btrfs_alloc_tree_block+0xb1/0x320
>         __btrfs_cow_block+0x136/0x5f0
>         btrfs_cow_block+0x107/0x210
>         btrfs_search_slot+0x56a/0x930
>         btrfs_truncate_inode_items+0x187/0xef0
>         btrfs_truncate_free_space_cache+0x11c/0x210
>         delete_block_group_cache+0x6f/0xb0
>         btrfs_relocate_block_group+0xf8/0x350
>         btrfs_relocate_chunk+0x38/0x120
>         btrfs_balance+0x79b/0xf00
>         btrfs_ioctl_balance+0x327/0x400
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         btrfs_init_new_device+0x6d6/0x1540
>         btrfs_ioctl+0x1b12/0x2d30
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
>         __lock_acquire+0x10ea/0x1d90
>         lock_acquire+0xb5/0x2b0
>         __mutex_lock+0x7d/0x750
>         clone_fs_devices+0x4d/0x170
>         btrfs_read_chunk_tree+0x32f/0x800
>         open_ctree+0xae3/0x16f0
>         btrfs_mount_root.cold+0x12/0xea
>         legacy_get_tree+0x2d/0x50
>         vfs_get_tree+0x25/0xc0
>         vfs_kern_mount.part.0+0x71/0xb0
>         btrfs_mount+0x10d/0x380
>         legacy_get_tree+0x2d/0x50
>         vfs_get_tree+0x25/0xc0
>         path_mount+0x433/0xb60
>         __x64_sys_mount+0xe3/0x120
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    &fs_devs->device_list_mutex --> &fs_info->chunk_mutex --> btrfs-chunk-00
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(btrfs-chunk-00);
>                                 lock(&fs_info->chunk_mutex);
>                                 lock(btrfs-chunk-00);
>    lock(&fs_devs->device_list_mutex);
> 
>   *** DEADLOCK ***
> 
> 3 locks held by mount/34004:
>   #0: ffff9eaad75c00e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0
>   #1: ffffffffbd2dcf08 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x800
>   #2: ffff9eaac766d438 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x100
> 
> stack backtrace:
> CPU: 0 PID: 34004 Comm: mount Not tainted 5.14.0-rc2+ #409
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>   dump_stack_lvl+0x57/0x72
>   check_noncircular+0xcf/0xf0
>   __lock_acquire+0x10ea/0x1d90
>   lock_acquire+0xb5/0x2b0
>   ? clone_fs_devices+0x4d/0x170
>   ? lock_is_held_type+0xa5/0x120
>   __mutex_lock+0x7d/0x750
>   ? clone_fs_devices+0x4d/0x170
>   ? clone_fs_devices+0x4d/0x170
>   ? lockdep_init_map_type+0x47/0x220
>   ? debug_mutex_init+0x33/0x40
>   clone_fs_devices+0x4d/0x170
>   ? lock_is_held_type+0xa5/0x120
>   btrfs_read_chunk_tree+0x32f/0x800
>   ? find_held_lock+0x2b/0x80
>   open_ctree+0xae3/0x16f0
>   btrfs_mount_root.cold+0x12/0xea
>   ? rcu_read_lock_sched_held+0x3f/0x80
>   ? kfree+0x1f6/0x410
>   legacy_get_tree+0x2d/0x50
>   vfs_get_tree+0x25/0xc0
>   vfs_kern_mount.part.0+0x71/0xb0
>   btrfs_mount+0x10d/0x380
>   ? kfree+0x1f6/0x410
>   legacy_get_tree+0x2d/0x50
>   vfs_get_tree+0x25/0xc0
>   path_mount+0x433/0xb60
>   __x64_sys_mount+0xe3/0x120
>   do_syscall_64+0x38/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f6cbcd9788e
> 
> It is because we take the ->device_list_mutex in this path while holding
> onto the tree locks in the chunk root.  However we do not need the lock
> here, because we're already holding onto the uuid_mutex, and in fact
> have removed all other uses of the ->device_list_mutex in this path
> because of this.  Remove the ->device_list_mutex locking here, add an
> assert for the uuid_mutex and the problem is fixed.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/volumes.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f622e93a6ff1..bdfcc35335c3 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1000,11 +1000,12 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>   	struct btrfs_device *orig_dev;
>   	int ret = 0;
>   
> +	lockdep_assert_held(&uuid_mutex);
> +
>   	fs_devices = alloc_fs_devices(orig->fsid, NULL);
>   	if (IS_ERR(fs_devices))
>   		return fs_devices;
>   
> -	mutex_lock(&orig->device_list_mutex);
>   	fs_devices->total_devices = orig->total_devices;
>   
>   	list_for_each_entry(orig_dev, &orig->devices, dev_list) {
> @@ -1036,10 +1037,8 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>   		device->fs_devices = fs_devices;
>   		fs_devices->num_devices++;
>   	}
> -	mutex_unlock(&orig->device_list_mutex);
>   	return fs_devices;
>   error:
> -	mutex_unlock(&orig->device_list_mutex);
>   	free_fs_devices(fs_devices);
>   	return ERR_PTR(ret);
>   }
> 


  This fix is the same as in [1]

  [1]
 
https://patchwork.kernel.org/project/linux-btrfs/patch/23a8830f3be500995e74b45f18862e67c0634c3d.1614793362.git.anand.jain@oracle.com/

* Re: [PATCH v2 4/7] btrfs: update the bdev time directly when closing
  2021-07-27 21:01 ` [PATCH v2 4/7] btrfs: update the bdev time directly when closing Josef Bacik
@ 2021-08-25  0:35   ` Anand Jain
  2021-09-02 12:16   ` David Sterba
  1 sibling, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-08-25  0:35 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 28/07/2021 05:01, Josef Bacik wrote:
> We update the ctime/mtime of a block device when we remove it so that
> blkid knows the device changed.  However we do this by re-opening the
> block device and calling filp_update_time.  This is more correct because
> it'll call the inode->i_op->update_time if it exists, but the block dev
> inodes do not do this.  Instead call generic_update_time() on the
> bd_inode in order to avoid the blkdev_open path and get rid of the
> following lockdep splat
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #406 Not tainted
> ------------------------------------------------------
> losetup/11596 is trying to acquire lock:
> ffff939640d2f538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
> 
> but task is already holding lock:
> ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         lo_open+0x28/0x60 [loop]
>         blkdev_get_whole+0x25/0xf0
>         blkdev_get_by_dev.part.0+0x168/0x3c0
>         blkdev_open+0xd2/0xe0
>         do_dentry_open+0x161/0x390
>         path_openat+0x3cc/0xa20
>         do_filp_open+0x96/0x120
>         do_sys_openat2+0x7b/0x130
>         __x64_sys_openat+0x46/0x70
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (&disk->open_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         blkdev_get_by_dev.part.0+0x56/0x3c0
>         blkdev_open+0xd2/0xe0
>         do_dentry_open+0x161/0x390
>         path_openat+0x3cc/0xa20
>         do_filp_open+0x96/0x120
>         file_open_name+0xc7/0x170
>         filp_open+0x2c/0x50
>         btrfs_scratch_superblocks.part.0+0x10f/0x170
>         btrfs_rm_device.cold+0xe8/0xed
>         btrfs_ioctl+0x2a31/0x2e70
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#12){.+.+}-{0:0}:
>         lo_write_bvec+0xc2/0x240 [loop]
>         loop_process_work+0x238/0xd00 [loop]
>         process_one_work+0x26b/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>         process_one_work+0x245/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>         __lock_acquire+0x10ea/0x1d90
>         lock_acquire+0xb5/0x2b0
>         flush_workqueue+0x91/0x5e0
>         drain_workqueue+0xa0/0x110
>         destroy_workqueue+0x36/0x250
>         __loop_clr_fd+0x9a/0x660 [loop]
>         block_ioctl+0x3f/0x50
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&lo->lo_mutex);
>                                 lock(&disk->open_mutex);
>                                 lock(&lo->lo_mutex);
>    lock((wq_completion)loop0);
> 
>   *** DEADLOCK ***
> 
> 1 lock held by losetup/11596:
>   #0: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> stack backtrace:
> CPU: 1 PID: 11596 Comm: losetup Not tainted 5.14.0-rc2+ #406
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>   dump_stack_lvl+0x57/0x72
>   check_noncircular+0xcf/0xf0
>   ? stack_trace_save+0x3b/0x50
>   __lock_acquire+0x10ea/0x1d90
>   lock_acquire+0xb5/0x2b0
>   ? flush_workqueue+0x67/0x5e0
>   ? lockdep_init_map_type+0x47/0x220
>   flush_workqueue+0x91/0x5e0
>   ? flush_workqueue+0x67/0x5e0
>   ? verify_cpu+0xf0/0x100
>   drain_workqueue+0xa0/0x110
>   destroy_workqueue+0x36/0x250
>   __loop_clr_fd+0x9a/0x660 [loop]
>   ? blkdev_ioctl+0x8d/0x2a0
>   block_ioctl+0x3f/0x50
>   __x64_sys_ioctl+0x80/0xb0
>   do_syscall_64+0x38/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/volumes.c | 18 ++++++++++--------
>   1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index bf2449cdb2ab..3ab6c78e6eb2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1882,15 +1882,17 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
>    * Function to update ctime/mtime for a given device path.
>    * Mainly used for ctime/mtime based probe like libblkid.
>    */
> -static void update_dev_time(const char *path_name)
> +static void update_dev_time(struct block_device *bdev)
>   {
> -	struct file *filp;
> +	struct inode *inode = bdev->bd_inode;
> +	struct timespec64 now;
>   
> -	filp = filp_open(path_name, O_RDWR, 0);
> -	if (IS_ERR(filp))
> +	/* Shouldn't happen but just in case. */
> +	if (!inode)
>   		return;
> -	file_update_time(filp);
> -	filp_close(filp, NULL);
> +
> +	now = current_time(inode);
> +	generic_update_time(inode, &now, S_MTIME|S_CTIME);


  Oh. We could use that.

>   }
>   
>   static int btrfs_rm_dev_item(struct btrfs_device *device)
> @@ -2070,7 +2072,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
>   	btrfs_kobject_uevent(bdev, KOBJ_CHANGE);
>   
>   	/* Update ctime/mtime for device path for libblkid */
> -	update_dev_time(device_path);
> +	update_dev_time(bdev);
>   }
>   
>   int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> @@ -2711,7 +2713,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>   	btrfs_forget_devices(device_path);
>   
>   	/* Update ctime/mtime for blkid or udev */
> -	update_dev_time(device_path);
> +	update_dev_time(bdev);
>   
>   	return ret;
>   
> 

  Reviewed-by: Anand Jain <anand.jain@oracle.com>



* Re: [PATCH v2 5/7] btrfs: delay blkdev_put until after the device remove
  2021-07-27 21:01 ` [PATCH v2 5/7] btrfs: delay blkdev_put until after the device remove Josef Bacik
@ 2021-08-25  1:00   ` Anand Jain
  2021-09-02 12:16   ` David Sterba
  1 sibling, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-08-25  1:00 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 28/07/2021 05:01, Josef Bacik wrote:
> When removing the device we call blkdev_put() on the device once we've
> removed it, and because we have an EXCL open we need to take the
> ->open_mutex on the block device to clean it up.  Unfortunately during
> device remove we are holding the sb writers lock, which results in the
> following lockdep splat
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #407 Not tainted
> ------------------------------------------------------
> losetup/11595 is trying to acquire lock:
> ffff973ac35dd138 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
> 
> but task is already holding lock:
> ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         lo_open+0x28/0x60 [loop]
>         blkdev_get_whole+0x25/0xf0
>         blkdev_get_by_dev.part.0+0x168/0x3c0
>         blkdev_open+0xd2/0xe0
>         do_dentry_open+0x161/0x390
>         path_openat+0x3cc/0xa20
>         do_filp_open+0x96/0x120
>         do_sys_openat2+0x7b/0x130
>         __x64_sys_openat+0x46/0x70
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (&disk->open_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         blkdev_put+0x3a/0x220
>         btrfs_rm_device.cold+0x62/0xe5
>         btrfs_ioctl+0x2a31/0x2e70
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#12){.+.+}-{0:0}:
>         lo_write_bvec+0xc2/0x240 [loop]
>         loop_process_work+0x238/0xd00 [loop]
>         process_one_work+0x26b/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>         process_one_work+0x245/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>         __lock_acquire+0x10ea/0x1d90
>         lock_acquire+0xb5/0x2b0
>         flush_workqueue+0x91/0x5e0
>         drain_workqueue+0xa0/0x110
>         destroy_workqueue+0x36/0x250
>         __loop_clr_fd+0x9a/0x660 [loop]
>         block_ioctl+0x3f/0x50
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&lo->lo_mutex);
>                                 lock(&disk->open_mutex);
>                                 lock(&lo->lo_mutex);
>    lock((wq_completion)loop0);
> 
>   *** DEADLOCK ***
> 
> 1 lock held by losetup/11595:
>   #0: ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> stack backtrace:
> CPU: 0 PID: 11595 Comm: losetup Not tainted 5.14.0-rc2+ #407
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>   dump_stack_lvl+0x57/0x72
>   check_noncircular+0xcf/0xf0
>   ? stack_trace_save+0x3b/0x50
>   __lock_acquire+0x10ea/0x1d90
>   lock_acquire+0xb5/0x2b0
>   ? flush_workqueue+0x67/0x5e0
>   ? lockdep_init_map_type+0x47/0x220
>   flush_workqueue+0x91/0x5e0
>   ? flush_workqueue+0x67/0x5e0
>   ? verify_cpu+0xf0/0x100
>   drain_workqueue+0xa0/0x110
>   destroy_workqueue+0x36/0x250
>   __loop_clr_fd+0x9a/0x660 [loop]
>   ? blkdev_ioctl+0x8d/0x2a0
>   block_ioctl+0x3f/0x50
>   __x64_sys_ioctl+0x80/0xb0
>   do_syscall_64+0x38/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fc21255d4cb
> 
> So instead save the bdev and do the put once we've dropped the sb
> writers lock in order to avoid the lockdep recursion.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/ioctl.c   | 17 ++++++++++++++---
>   fs/btrfs/volumes.c | 19 +++++++++++++++----
>   fs/btrfs/volumes.h |  3 ++-
>   3 files changed, 31 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 0ba98e08a029..fabbfdfa56f5 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3205,6 +3205,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   	struct inode *inode = file_inode(file);
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	struct btrfs_ioctl_vol_args_v2 *vol_args;
> +	struct block_device *bdev = NULL;
> +	fmode_t mode;
>   	int ret;
>   	bool cancel = false;
>   
> @@ -3237,9 +3239,11 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   	/* Exclusive operation is now claimed */
>   
>   	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
> -		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid);
> +		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid, &bdev,
> +				      &mode);
>   	else
> -		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
> +		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
> +				      &mode);
>   
>   	btrfs_exclop_finish(fs_info);
>   
> @@ -3255,6 +3259,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   	kfree(vol_args);
>   err_drop:
>   	mnt_drop_write_file(file);
> +	if (bdev)
> +		blkdev_put(bdev, mode);
>   	return ret;
>   }
>   
> @@ -3263,6 +3269,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
>   	struct inode *inode = file_inode(file);
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	struct btrfs_ioctl_vol_args *vol_args;
> +	struct block_device *bdev = NULL;
> +	fmode_t mode;
>   	int ret;
>   	bool cancel;
>   
> @@ -3284,7 +3292,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
>   	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
>   					   cancel);
>   	if (ret == 0) {
> -		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
> +		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
> +				      &mode);
>   		if (!ret)
>   			btrfs_info(fs_info, "disk deleted %s", vol_args->name);
>   		btrfs_exclop_finish(fs_info);
> @@ -3294,6 +3303,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
>   out_drop_write:
>   	mnt_drop_write_file(file);
>   
> +	if (bdev)
> +		blkdev_put(bdev, mode);
>   	return ret;
>   }
>   
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 3ab6c78e6eb2..f622e93a6ff1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2076,7 +2076,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
>   }
>   
>   int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> -		    u64 devid)
> +		    u64 devid, struct block_device **bdev, fmode_t *mode)
>   {
>   	struct btrfs_device *device;
>   	struct btrfs_fs_devices *cur_devices;
> @@ -2186,15 +2186,26 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   	mutex_unlock(&fs_devices->device_list_mutex);
>   
>   	/*
> -	 * at this point, the device is zero sized and detached from
> +	 * At this point, the device is zero sized and detached from
>   	 * the devices list.  All that's left is to zero out the old
>   	 * supers and free the device.
> +	 *
> +	 * We cannot call btrfs_close_bdev() here because we're holding the sb
> +	 * write lock, and blkdev_put() will pull in the ->open_mutex on the
> +	 * block device and its dependencies.  Instead just flush the device
> +	 * and let the caller do the final blkdev_put.
>   	 */
> -	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
> +	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>   		btrfs_scratch_superblocks(fs_info, device->bdev,
>   					  device->name->str);
> +		if (device->bdev) {
> +			sync_blockdev(device->bdev);
> +			invalidate_bdev(device->bdev);
> +		}
> +	}
>   
> -	btrfs_close_bdev(device);
> +	*bdev = device->bdev;
> +	*mode = device->mode;
>   	synchronize_rcu();
>   	btrfs_free_device(device);
>   
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 70c749eee3ad..cc70e54cb901 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -472,7 +472,8 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
>   					const u8 *uuid);
>   void btrfs_free_device(struct btrfs_device *device);
>   int btrfs_rm_device(struct btrfs_fs_info *fs_info,
> -		    const char *device_path, u64 devid);
> +		    const char *device_path, u64 devid,
> +		    struct block_device **bdev, fmode_t *mode);
>   void __exit btrfs_cleanup_fs_uuids(void);
>   int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len);
>   int btrfs_grow_device(struct btrfs_trans_handle *trans,
> 


LGTM

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 6/7] btrfs: unify common code for the v1 and v2 versions of device remove
  2021-07-27 21:01 ` [PATCH v2 6/7] btrfs: unify common code for the v1 and v2 versions of " Josef Bacik
@ 2021-08-25  1:19   ` Anand Jain
  2021-09-01 14:05   ` Nikolay Borisov
  1 sibling, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-08-25  1:19 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 28/07/2021 05:01, Josef Bacik wrote:
> These things share a lot of common code, v2 simply allows you to specify
> devid.  Abstract out this common code and use the helper by both the v1
> and v2 interfaces to save us some lines of code.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/ioctl.c | 99 +++++++++++++++++++-----------------------------
>   1 file changed, 38 insertions(+), 61 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index fabbfdfa56f5..e3a7e8544609 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3200,15 +3200,14 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
>   	return ret;
>   }
>   
> -static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
> +static long btrfs_do_device_removal(struct file *file, const char *path,
> +				    u64 devid, bool cancel)
>   {
>   	struct inode *inode = file_inode(file);
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -	struct btrfs_ioctl_vol_args_v2 *vol_args;
>   	struct block_device *bdev = NULL;
>   	fmode_t mode;
>   	int ret;
> -	bool cancel = false;
>   
>   	if (!capable(CAP_SYS_ADMIN))
>   		return -EPERM;
> @@ -3217,11 +3216,37 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   	if (ret)
>   		return ret;
>   
> -	vol_args = memdup_user(arg, sizeof(*vol_args));
> -	if (IS_ERR(vol_args)) {
> -		ret = PTR_ERR(vol_args);
> -		goto err_drop;
> +	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
> +					   cancel);
> +	if (ret)
> +		goto out;
> +
> +	/* Exclusive operation is now claimed */
> +	ret = btrfs_rm_device(fs_info, path, devid, &bdev, &mode);
> +	btrfs_exclop_finish(fs_info);
> +
> +	if (!ret) {
> +		if (path)
> +			btrfs_info(fs_info, "device deleted: %s", path);
> +		else
> +			btrfs_info(fs_info, "device deleted: id %llu", devid);
>   	}
> +out:
> +	mnt_drop_write_file(file);
> +	if (bdev)
> +		blkdev_put(bdev, mode);
> +	return ret;
> +}
> +
> +static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
> +{
> +	struct btrfs_ioctl_vol_args_v2 *vol_args;
> +	int ret = 0;
> +	bool cancel = false;
> +
> +	vol_args = memdup_user(arg, sizeof(*vol_args));
> +	if (IS_ERR(vol_args))
> +		return PTR_ERR(vol_args);
>   
>   	if (vol_args->flags & ~BTRFS_DEVICE_REMOVE_ARGS_MASK) {
>   		ret = -EOPNOTSUPP;
> @@ -3232,79 +3257,31 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   	    strcmp("cancel", vol_args->name) == 0)
>   		cancel = true;
>   
> -	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
> -					   cancel);
> -	if (ret)
> -		goto out;
> -	/* Exclusive operation is now claimed */
> -
>   	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
> -		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid, &bdev,
> -				      &mode);
> +		ret = btrfs_do_device_removal(file, NULL, vol_args->devid,
> +					      cancel);
>   	else
> -		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
> -				      &mode);
> -
> -	btrfs_exclop_finish(fs_info);
> -
> -	if (!ret) {
> -		if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
> -			btrfs_info(fs_info, "device deleted: id %llu",
> -					vol_args->devid);
> -		else
> -			btrfs_info(fs_info, "device deleted: %s",
> -					vol_args->name);
> -	}
> +		ret = btrfs_do_device_removal(file, vol_args->name, 0, cancel);
>   out:
>   	kfree(vol_args);
> -err_drop:
> -	mnt_drop_write_file(file);
> -	if (bdev)
> -		blkdev_put(bdev, mode);
>   	return ret;
>   }
>   
>   static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
>   {
> -	struct inode *inode = file_inode(file);
> -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	struct btrfs_ioctl_vol_args *vol_args;
> -	struct block_device *bdev = NULL;
> -	fmode_t mode;
>   	int ret;
>   	bool cancel;
>   
> -	if (!capable(CAP_SYS_ADMIN))
> -		return -EPERM;
> -
> -	ret = mnt_want_write_file(file);
> -	if (ret)
> -		return ret;
> -
>   	vol_args = memdup_user(arg, sizeof(*vol_args));
> -	if (IS_ERR(vol_args)) {
> -		ret = PTR_ERR(vol_args);
> -		goto out_drop_write;
> -	}
> +	if (IS_ERR(vol_args))
> +		return PTR_ERR(vol_args);
>   	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
>   	cancel = (strcmp("cancel", vol_args->name) == 0);
>   
> -	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
> -					   cancel);
> -	if (ret == 0) {
> -		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
> -				      &mode);
> -		if (!ret)
> -			btrfs_info(fs_info, "disk deleted %s", vol_args->name);
> -		btrfs_exclop_finish(fs_info);
> -	}
> +	ret = btrfs_do_device_removal(file, vol_args->name, 0, cancel);
>   
>   	kfree(vol_args);
> -out_drop_write:
> -	mnt_drop_write_file(file);
> -
> -	if (bdev)
> -		blkdev_put(bdev, mode);
>   	return ret;
>   }
>   
> 


Looks much better now.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks.


* Re: [PATCH v2 3/7] btrfs: do not read super look for a device path
  2021-07-27 21:01 ` [PATCH v2 3/7] btrfs: do not read super look for a device path Josef Bacik
@ 2021-08-25  2:00   ` Anand Jain
  2021-09-27 15:32     ` Josef Bacik
  0 siblings, 1 reply; 39+ messages in thread
From: Anand Jain @ 2021-08-25  2:00 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On 28/07/2021 05:01, Josef Bacik wrote:
> For device removal and replace we call btrfs_find_device_by_devspec,
> which if we give it a device path and nothing else will call
> btrfs_find_device_by_path, which opens the block device and reads the
> super block and then looks up our device based on that.
> 
> However this is completely unnecessary because we have the path stored
> in our device on our fsdevices.  All we need to do if we're given a path
> is look through the fs_devices on our file system and use that device if
> we find it, reading the super block is just silly.

The device path stored in our fs_devices can differ from the path the
user provides for the same device (for example, with dm or lvm).

btrfs-progs sanitizes the device path, but other callers (for example,
an ioctl test case) might not, and then the path lookup would fail.

Also, btrfs dev scan <path> can update the device path at any time,
even after the filesystem is mounted. Fixing that broke subsequent
subvolume mounts (if I remember correctly).

> This fixes the case where we end up with our sb write "lock" getting the
> dependency of the block device ->open_mutex, which resulted in the
> following lockdep splat

Can we do:

btrfs_exclop_start()
  ::
find device part (read sb)
  ::
mnt_want_write_file()?


Thanks, Anand


> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #405 Not tainted
> ------------------------------------------------------
> losetup/11576 is trying to acquire lock:
> ffff9bbe8cded938 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
> 
> but task is already holding lock:
> ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         lo_open+0x28/0x60 [loop]
>         blkdev_get_whole+0x25/0xf0
>         blkdev_get_by_dev.part.0+0x168/0x3c0
>         blkdev_open+0xd2/0xe0
>         do_dentry_open+0x161/0x390
>         path_openat+0x3cc/0xa20
>         do_filp_open+0x96/0x120
>         do_sys_openat2+0x7b/0x130
>         __x64_sys_openat+0x46/0x70
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (&disk->open_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         blkdev_get_by_dev.part.0+0x56/0x3c0
>         blkdev_get_by_path+0x98/0xa0
>         btrfs_get_bdev_and_sb+0x1b/0xb0
>         btrfs_find_device_by_devspec+0x12b/0x1c0
>         btrfs_rm_device+0x127/0x610
>         btrfs_ioctl+0x2a31/0x2e70
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#12){.+.+}-{0:0}:
>         lo_write_bvec+0xc2/0x240 [loop]
>         loop_process_work+0x238/0xd00 [loop]
>         process_one_work+0x26b/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>         process_one_work+0x245/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>         __lock_acquire+0x10ea/0x1d90
>         lock_acquire+0xb5/0x2b0
>         flush_workqueue+0x91/0x5e0
>         drain_workqueue+0xa0/0x110
>         destroy_workqueue+0x36/0x250
>         __loop_clr_fd+0x9a/0x660 [loop]
>         block_ioctl+0x3f/0x50
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&lo->lo_mutex);
>                                 lock(&disk->open_mutex);
>                                 lock(&lo->lo_mutex);
>    lock((wq_completion)loop0);
> 
>   *** DEADLOCK ***
> 
> 1 lock held by losetup/11576:
>   #0: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> stack backtrace:
> CPU: 0 PID: 11576 Comm: losetup Not tainted 5.14.0-rc2+ #405
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>   dump_stack_lvl+0x57/0x72
>   check_noncircular+0xcf/0xf0
>   ? stack_trace_save+0x3b/0x50
>   __lock_acquire+0x10ea/0x1d90
>   lock_acquire+0xb5/0x2b0
>   ? flush_workqueue+0x67/0x5e0
>   ? lockdep_init_map_type+0x47/0x220
>   flush_workqueue+0x91/0x5e0
>   ? flush_workqueue+0x67/0x5e0
>   ? verify_cpu+0xf0/0x100
>   drain_workqueue+0xa0/0x110
>   destroy_workqueue+0x36/0x250
>   __loop_clr_fd+0x9a/0x660 [loop]
>   ? blkdev_ioctl+0x8d/0x2a0
>   block_ioctl+0x3f/0x50
>   __x64_sys_ioctl+0x80/0xb0
>   do_syscall_64+0x38/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f31b02404cb
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/volumes.c | 61 +++++++++++++++++-----------------------------
>   1 file changed, 23 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0e7372f637eb..bf2449cdb2ab 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2313,37 +2313,22 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev)
>   	btrfs_free_device(tgtdev);
>   }
>   
> -static struct btrfs_device *btrfs_find_device_by_path(
> -		struct btrfs_fs_info *fs_info, const char *device_path)
> +static struct btrfs_device *find_device_by_path(
> +					struct btrfs_fs_devices *fs_devices,
> +					const char *path)
>   {
> -	int ret = 0;
> -	struct btrfs_super_block *disk_super;
> -	u64 devid;
> -	u8 *dev_uuid;
> -	struct block_device *bdev;
>   	struct btrfs_device *device;
> +	bool missing = !strcmp(path, "missing");
>   
> -	ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
> -				    fs_info->bdev_holder, 0, &bdev, &disk_super);
> -	if (ret)
> -		return ERR_PTR(ret);
> -
> -	devid = btrfs_stack_device_id(&disk_super->dev_item);
> -	dev_uuid = disk_super->dev_item.uuid;
> -	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
> -		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
> -					   disk_super->metadata_uuid);
> -	else
> -		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
> -					   disk_super->fsid);
> -
> -	btrfs_release_disk_super(disk_super);
> -	if (!device)
> -		device = ERR_PTR(-ENOENT);
> -	blkdev_put(bdev, FMODE_READ);
> -	return device;
> +	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +		if (missing && test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> +					&device->dev_state) && !device->bdev)
> +			return device;
> +		if (!missing && device_path_matched(path, device))
> +			return device;
> +	}
> +	return NULL;
>   }
> -
>   /*
>    * Lookup a device given by device id, or the path if the id is 0.
>    */
> @@ -2351,6 +2336,7 @@ struct btrfs_device *btrfs_find_device_by_devspec(
>   		struct btrfs_fs_info *fs_info, u64 devid,
>   		const char *device_path)
>   {
> +	struct btrfs_fs_devices *seed_devs;
>   	struct btrfs_device *device;
>   
>   	if (devid) {
> @@ -2364,18 +2350,17 @@ struct btrfs_device *btrfs_find_device_by_devspec(
>   	if (!device_path || !device_path[0])
>   		return ERR_PTR(-EINVAL);
>   
> -	if (strcmp(device_path, "missing") == 0) {
> -		/* Find first missing device */
> -		list_for_each_entry(device, &fs_info->fs_devices->devices,
> -				    dev_list) {
> -			if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> -				     &device->dev_state) && !device->bdev)
> -				return device;
> -		}
> -		return ERR_PTR(-ENOENT);
> -	}
> +	device = find_device_by_path(fs_info->fs_devices, device_path);
> +	if (device)
> +		return device;
>   
> -	return btrfs_find_device_by_path(fs_info, device_path);
> +	list_for_each_entry(seed_devs, &fs_info->fs_devices->seed_list,
> +			    seed_list) {
> +		device = find_device_by_path(seed_devs, device_path);
> +		if (device)
> +			return device;
> +	}
> +	return ERR_PTR(-ENOENT);
>   }
>   
>   /*
> 



* Re: [PATCH v2 1/7] btrfs: do not call close_fs_devices in btrfs_rm_device
  2021-07-27 21:01 ` [PATCH v2 1/7] btrfs: do not call close_fs_devices in btrfs_rm_device Josef Bacik
@ 2021-09-01  8:13   ` Anand Jain
  0 siblings, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-09-01  8:13 UTC (permalink / raw)
  To: Josef Bacik; +Cc: kernel-team, linux-btrfs

On 28/07/2021 05:01, Josef Bacik wrote:
> There's a subtle case where if we're removing the seed device from a
> file system we need to free its private copy of the fs_devices.  However
> we do not need to call close_fs_devices(), because at this point there
> are no devices left to close as we've closed the last one.  The only
> thing that close_fs_devices() does is decrement ->opened, which should
> be 1.  We want to avoid calling close_fs_devices() here because it has a
> lockdep_assert_held(&uuid_mutex), and we are going to stop holding the
> uuid_mutex in this path.
> 
> So add an assert for the ->opened counter and simply decrement it like
> we should, and then clean up like normal.  Also add a comment explaining
> what we're doing here as I initially removed this code erroneously.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/volumes.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 86846d6e58d0..5217b93172b4 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2200,9 +2200,17 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   	synchronize_rcu();
>   	btrfs_free_device(device);
>   
> +	/*
> +	 * This can happen if cur_devices is the private seed devices list.  We
> +	 * cannot call close_fs_devices() here because it expects the uuid_mutex
> +	 * to be held, but in fact we don't need that for the private
> +	 * seed_devices, we can simply decrement cur_devices->opened and then
> +	 * remove it from our list and free the fs_devices.
> +	 */

>   	if (cur_devices->open_devices == 0) {

  We should in fact use cur_devices->num_devices == 0 here.
  Sent a patch [1] to fix it.

[1]
https://patchwork.kernel.org/project/linux-btrfs/patch/d9c89b1740a876b3851fcf358f22809aa7f1ad2a.1630478246.git.anand.jain@oracle.com/


> +		ASSERT(cur_devices->opened == 1);

We don't need the ASSERT(); free_fs_devices() already warns about this:

         WARN_ON(fs_devices->opened);

>   		list_del_init(&cur_devices->seed_list);
> -		close_fs_devices(cur_devices);
> +		cur_devices->opened--;
>   		free_fs_devices(cur_devices);
>   	}


With the above two fixed.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand




* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-07-27 21:01 ` [PATCH v2 2/7] btrfs: do not take the uuid_mutex " Josef Bacik
@ 2021-09-01 12:01   ` Anand Jain
  2021-09-01 17:08     ` David Sterba
  2021-09-01 17:10     ` Josef Bacik
  2021-09-02 12:58   ` David Sterba
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 39+ messages in thread
From: Anand Jain @ 2021-09-01 12:01 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 28/07/2021 05:01, Josef Bacik wrote:
> We got the following lockdep splat while running xfstests (specifically
> btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
> by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
> converted loop to using workqueues, which comes with lockdep
> annotations that don't exist with kworkers.  The lockdep splat is as
> follows
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2-custom+ #34 Not tainted
> ------------------------------------------------------
> losetup/156417 is trying to acquire lock:
> ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600
> 
> but task is already holding lock:
> ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
>         __mutex_lock+0xba/0x7c0
>         lo_open+0x28/0x60 [loop]
>         blkdev_get_whole+0x28/0xf0
>         blkdev_get_by_dev.part.0+0x168/0x3c0
>         blkdev_open+0xd2/0xe0
>         do_dentry_open+0x163/0x3a0
>         path_openat+0x74d/0xa40
>         do_filp_open+0x9c/0x140
>         do_sys_openat2+0xb1/0x170
>         __x64_sys_openat+0x54/0x90
>         do_syscall_64+0x3b/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #4 (&disk->open_mutex){+.+.}-{3:3}:
>         __mutex_lock+0xba/0x7c0
>         blkdev_get_by_dev.part.0+0xd1/0x3c0
>         blkdev_get_by_path+0xc0/0xd0
>         btrfs_scan_one_device+0x52/0x1f0 [btrfs]
>         btrfs_control_ioctl+0xac/0x170 [btrfs]
>         __x64_sys_ioctl+0x83/0xb0
>         do_syscall_64+0x3b/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (uuid_mutex){+.+.}-{3:3}:
>         __mutex_lock+0xba/0x7c0
>         btrfs_rm_device+0x48/0x6a0 [btrfs]
>         btrfs_ioctl+0x2d1c/0x3110 [btrfs]
>         __x64_sys_ioctl+0x83/0xb0
>         do_syscall_64+0x3b/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#11){.+.+}-{0:0}:
>         lo_write_bvec+0x112/0x290 [loop]
>         loop_process_work+0x25f/0xcb0 [loop]
>         process_one_work+0x28f/0x5d0
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x170
>         ret_from_fork+0x22/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>         process_one_work+0x266/0x5d0
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x170
>         ret_from_fork+0x22/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>         __lock_acquire+0x1130/0x1dc0
>         lock_acquire+0xf5/0x320
>         flush_workqueue+0xae/0x600
>         drain_workqueue+0xa0/0x110
>         destroy_workqueue+0x36/0x250
>         __loop_clr_fd+0x9a/0x650 [loop]
>         lo_ioctl+0x29d/0x780 [loop]
>         block_ioctl+0x3f/0x50
>         __x64_sys_ioctl+0x83/0xb0
>         do_syscall_64+0x3b/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> Chain exists of:
>    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
>   Possible unsafe locking scenario:
>         CPU0                    CPU1
>         ----                    ----
>    lock(&lo->lo_mutex);
>                                 lock(&disk->open_mutex);
>                                 lock(&lo->lo_mutex);
>    lock((wq_completion)loop0);
> 
>   *** DEADLOCK ***
> 1 lock held by losetup/156417:
>   #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> 
> stack backtrace:
> CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> Call Trace:
>   dump_stack_lvl+0x57/0x72
>   check_noncircular+0x10a/0x120
>   __lock_acquire+0x1130/0x1dc0
>   lock_acquire+0xf5/0x320
>   ? flush_workqueue+0x84/0x600
>   flush_workqueue+0xae/0x600
>   ? flush_workqueue+0x84/0x600
>   drain_workqueue+0xa0/0x110
>   destroy_workqueue+0x36/0x250
>   __loop_clr_fd+0x9a/0x650 [loop]
>   lo_ioctl+0x29d/0x780 [loop]
>   ? __lock_acquire+0x3a0/0x1dc0
>   ? update_dl_rq_load_avg+0x152/0x360
>   ? lock_is_held_type+0xa5/0x120
>   ? find_held_lock.constprop.0+0x2b/0x80
>   block_ioctl+0x3f/0x50
>   __x64_sys_ioctl+0x83/0xb0
>   do_syscall_64+0x3b/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f645884de6b
> 
> Usually the uuid_mutex exists to protect the fs_devices that map
> together all of the devices that match a specific uuid.  In rm_device
> we're messing with the uuid of a device, so it makes sense to protect
> that here.
> 
> However in doing that it pulls in a whole host of lockdep dependencies,
> as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus
> we end up with the dependency chain under the uuid_mutex being added
> under the normal sb write dependency chain, which causes problems with
> loop devices.
> 
> We don't need the uuid mutex here however.  If we call
> btrfs_scan_one_device() before we scratch the super block we will find
> the fs_devices and not find the device itself and return EBUSY because
> the fs_devices is open.  If we call it after the scratch happens it will
> not appear to be a valid btrfs file system.
> 
> We do not need to worry about other fs_devices modifying operations here
> because we're protected by the exclusive operations locking.
> 
> So drop the uuid_mutex here in order to fix the lockdep splat.


I think uuid_mutex should stay. Here is why.

  While thread A takes %device at line 816 and dereferences it at line
  880, thread B can completely remove and free that %device.
  As of now these threads are mutually exclusive via the uuid_mutex.

Thread A

btrfs_control_ioctl()
   mutex_lock(&uuid_mutex);
     btrfs_scan_one_device()
       device_list_add()
       {
  815                 mutex_lock(&fs_devices->device_list_mutex);

  816                 device = btrfs_find_device(fs_devices, devid,
  817                                 disk_super->dev_item.uuid, NULL);

  880         } else if (!device->name || strcmp(device->name->str, path)) {

  933                         if (device->bdev->bd_dev != path_dev) {

  982         mutex_unlock(&fs_devices->device_list_mutex);
        }


Thread B

btrfs_rm_device()

2069         mutex_lock(&uuid_mutex);  <-- proposed to remove

2150         mutex_lock(&fs_devices->device_list_mutex);

2172         mutex_unlock(&fs_devices->device_list_mutex);

2180                 btrfs_scratch_superblocks(fs_info, device->bdev,
2181                                           device->name->str);

2183         btrfs_close_bdev(device);
2184         synchronize_rcu();
2185         btrfs_free_device(device);

2194         mutex_unlock(&uuid_mutex);  <-- proposed to remove


Well, I don't have a better option to fix this issue as of now.
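For what it's worth, the window can be modeled in user space (a Python threading toy, not kernel code; all names are illustrative): thread A looks a device up and then dereferences it, thread B removes and frees it, and holding one mutex across both of A's steps is what keeps A from touching a freed device.

```python
import threading

class Device:
    def __init__(self, name):
        self.name = name
        self.freed = False

lock = threading.Lock()          # stand-in for uuid_mutex
devices = {42: Device("sda")}    # stand-in for the fs_devices list

def scan_one_device(devid):
    """Thread A: find the device, then dereference it (device_list_add)."""
    with lock:
        dev = devices.get(devid)
        if dev is None:
            return None
        # The deref is safe only because the lock is still held here;
        # rm_device cannot free the object between lookup and use.
        assert not dev.freed
        return dev.name

def rm_device(devid):
    """Thread B: remove and free the device (btrfs_rm_device)."""
    with lock:
        dev = devices.pop(devid, None)
        if dev is not None:
            dev.freed = True     # stand-in for btrfs_free_device()

a = threading.Thread(target=scan_one_device, args=(42,))
b = threading.Thread(target=rm_device, args=(42,))
a.start(); b.start(); a.join(); b.join()
assert 42 not in devices
```

Whichever thread wins the lock, A either sees a live device or no device at all; dropping the shared lock is what would reopen the lookup-then-free window.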


> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/volumes.c | 5 -----
>   1 file changed, 5 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 5217b93172b4..0e7372f637eb 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2082,8 +2082,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   	u64 num_devices;
>   	int ret = 0;
>   
> -	mutex_lock(&uuid_mutex);
> -
>   	num_devices = btrfs_num_devices(fs_info);
>   
>   	ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
> @@ -2127,11 +2125,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   		mutex_unlock(&fs_info->chunk_mutex);
>   	}
>   
> -	mutex_unlock(&uuid_mutex);
>   	ret = btrfs_shrink_device(device, 0);
>   	if (!ret)
>   		btrfs_reada_remove_dev(device);
> -	mutex_lock(&uuid_mutex);
>   	if (ret)
>   		goto error_undo;
>   
> @@ -2215,7 +2211,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   	}
>   
>   out:
> -	mutex_unlock(&uuid_mutex);
>   	return ret;
>   
>   error_undo:
> 



* Re: [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices
  2021-07-27 21:01 ` [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices Josef Bacik
  2021-08-24 22:08   ` Anand Jain
@ 2021-09-01 13:35   ` Nikolay Borisov
  2021-09-02 12:59   ` David Sterba
  2 siblings, 0 replies; 39+ messages in thread
From: Nikolay Borisov @ 2021-09-01 13:35 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 28.07.21 г. 0:01, Josef Bacik wrote:
> I got the following lockdep splat while testing seed devices
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #409 Not tainted
> ------------------------------------------------------
> mount/34004 is trying to acquire lock:
> ffff9eaac48188e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170
> 
> but task is already holding lock:
> ffff9eaac766d438 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x100
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (btrfs-chunk-00){++++}-{3:3}:
>        down_read_nested+0x46/0x60
>        __btrfs_tree_read_lock+0x24/0x100
>        btrfs_read_lock_root_node+0x31/0x40
>        btrfs_search_slot+0x480/0x930
>        btrfs_update_device+0x63/0x180
>        btrfs_chunk_alloc_add_chunk_item+0xdc/0x3a0
>        btrfs_chunk_alloc+0x281/0x540
>        find_free_extent+0x10ca/0x1790
>        btrfs_reserve_extent+0xbf/0x1d0
>        btrfs_alloc_tree_block+0xb1/0x320
>        __btrfs_cow_block+0x136/0x5f0
>        btrfs_cow_block+0x107/0x210
>        btrfs_search_slot+0x56a/0x930
>        btrfs_truncate_inode_items+0x187/0xef0
>        btrfs_truncate_free_space_cache+0x11c/0x210
>        delete_block_group_cache+0x6f/0xb0
>        btrfs_relocate_block_group+0xf8/0x350
>        btrfs_relocate_chunk+0x38/0x120
>        btrfs_balance+0x79b/0xf00
>        btrfs_ioctl_balance+0x327/0x400
>        __x64_sys_ioctl+0x80/0xb0
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x7d/0x750
>        btrfs_init_new_device+0x6d6/0x1540
>        btrfs_ioctl+0x1b12/0x2d30
>        __x64_sys_ioctl+0x80/0xb0
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
>        __lock_acquire+0x10ea/0x1d90
>        lock_acquire+0xb5/0x2b0
>        __mutex_lock+0x7d/0x750
>        clone_fs_devices+0x4d/0x170
>        btrfs_read_chunk_tree+0x32f/0x800
>        open_ctree+0xae3/0x16f0
>        btrfs_mount_root.cold+0x12/0xea
>        legacy_get_tree+0x2d/0x50
>        vfs_get_tree+0x25/0xc0
>        vfs_kern_mount.part.0+0x71/0xb0
>        btrfs_mount+0x10d/0x380
>        legacy_get_tree+0x2d/0x50
>        vfs_get_tree+0x25/0xc0
>        path_mount+0x433/0xb60
>        __x64_sys_mount+0xe3/0x120
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   &fs_devs->device_list_mutex --> &fs_info->chunk_mutex --> btrfs-chunk-00
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(btrfs-chunk-00);
>                                lock(&fs_info->chunk_mutex);
>                                lock(btrfs-chunk-00);
>   lock(&fs_devs->device_list_mutex);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by mount/34004:
>  #0: ffff9eaad75c00e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0
>  #1: ffffffffbd2dcf08 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x800
>  #2: ffff9eaac766d438 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x100
> 
> stack backtrace:
> CPU: 0 PID: 34004 Comm: mount Not tainted 5.14.0-rc2+ #409
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>  dump_stack_lvl+0x57/0x72
>  check_noncircular+0xcf/0xf0
>  __lock_acquire+0x10ea/0x1d90
>  lock_acquire+0xb5/0x2b0
>  ? clone_fs_devices+0x4d/0x170
>  ? lock_is_held_type+0xa5/0x120
>  __mutex_lock+0x7d/0x750
>  ? clone_fs_devices+0x4d/0x170
>  ? clone_fs_devices+0x4d/0x170
>  ? lockdep_init_map_type+0x47/0x220
>  ? debug_mutex_init+0x33/0x40
>  clone_fs_devices+0x4d/0x170
>  ? lock_is_held_type+0xa5/0x120
>  btrfs_read_chunk_tree+0x32f/0x800
>  ? find_held_lock+0x2b/0x80
>  open_ctree+0xae3/0x16f0
>  btrfs_mount_root.cold+0x12/0xea
>  ? rcu_read_lock_sched_held+0x3f/0x80
>  ? kfree+0x1f6/0x410
>  legacy_get_tree+0x2d/0x50
>  vfs_get_tree+0x25/0xc0
>  vfs_kern_mount.part.0+0x71/0xb0
>  btrfs_mount+0x10d/0x380
>  ? kfree+0x1f6/0x410
>  legacy_get_tree+0x2d/0x50
>  vfs_get_tree+0x25/0xc0
>  path_mount+0x433/0xb60
>  __x64_sys_mount+0xe3/0x120
>  do_syscall_64+0x38/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f6cbcd9788e
> 
> This is because we take the ->device_list_mutex in this path while
> holding onto the tree locks in the chunk root.  However, we do not need
> the lock here: we are already holding the uuid_mutex, and in fact have
> removed all other uses of the ->device_list_mutex in this path because of
> that.  Remove the ->device_list_mutex locking here, add an assert for the
> uuid_mutex, and the problem is fixed.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>


That's essentially the same as Anand's patch, but it is nevertheless
correct. So

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


* Re: [PATCH v2 6/7] btrfs: unify common code for the v1 and v2 versions of device remove
  2021-07-27 21:01 ` [PATCH v2 6/7] btrfs: unify common code for the v1 and v2 versions of " Josef Bacik
  2021-08-25  1:19   ` Anand Jain
@ 2021-09-01 14:05   ` Nikolay Borisov
  1 sibling, 0 replies; 39+ messages in thread
From: Nikolay Borisov @ 2021-09-01 14:05 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team



On 28.07.21 г. 0:01, Josef Bacik wrote:
> These interfaces share a lot of common code; v2 simply allows you to
> specify a devid.  Abstract out this common code and use the helper from
> both the v1 and v2 interfaces to save us some lines of code.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-01 12:01   ` Anand Jain
@ 2021-09-01 17:08     ` David Sterba
  2021-09-01 17:10     ` Josef Bacik
  1 sibling, 0 replies; 39+ messages in thread
From: David Sterba @ 2021-09-01 17:08 UTC (permalink / raw)
  To: Anand Jain; +Cc: Josef Bacik, linux-btrfs, kernel-team

On Wed, Sep 01, 2021 at 08:01:24PM +0800, Anand Jain wrote:
> On 28/07/2021 05:01, Josef Bacik wrote:
> > We got the following lockdep splat while running xfstests (specifically
> > btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
> > by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
> > converted loop to using workqueues, which comes with lockdep
> > annotations that don't exist with kworkers.  The lockdep splat is as
> > follows
> > 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 5.14.0-rc2-custom+ #34 Not tainted
> > ------------------------------------------------------
> > losetup/156417 is trying to acquire lock:
> > ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600
> > 
> > but task is already holding lock:
> > ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> > 
> > which lock already depends on the new lock.
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
> >         __mutex_lock+0xba/0x7c0
> >         lo_open+0x28/0x60 [loop]
> >         blkdev_get_whole+0x28/0xf0
> >         blkdev_get_by_dev.part.0+0x168/0x3c0
> >         blkdev_open+0xd2/0xe0
> >         do_dentry_open+0x163/0x3a0
> >         path_openat+0x74d/0xa40
> >         do_filp_open+0x9c/0x140
> >         do_sys_openat2+0xb1/0x170
> >         __x64_sys_openat+0x54/0x90
> >         do_syscall_64+0x3b/0x90
> >         entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > -> #4 (&disk->open_mutex){+.+.}-{3:3}:
> >         __mutex_lock+0xba/0x7c0
> >         blkdev_get_by_dev.part.0+0xd1/0x3c0
> >         blkdev_get_by_path+0xc0/0xd0
> >         btrfs_scan_one_device+0x52/0x1f0 [btrfs]
> >         btrfs_control_ioctl+0xac/0x170 [btrfs]
> >         __x64_sys_ioctl+0x83/0xb0
> >         do_syscall_64+0x3b/0x90
> >         entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > -> #3 (uuid_mutex){+.+.}-{3:3}:
> >         __mutex_lock+0xba/0x7c0
> >         btrfs_rm_device+0x48/0x6a0 [btrfs]
> >         btrfs_ioctl+0x2d1c/0x3110 [btrfs]
> >         __x64_sys_ioctl+0x83/0xb0
> >         do_syscall_64+0x3b/0x90
> >         entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > -> #2 (sb_writers#11){.+.+}-{0:0}:
> >         lo_write_bvec+0x112/0x290 [loop]
> >         loop_process_work+0x25f/0xcb0 [loop]
> >         process_one_work+0x28f/0x5d0
> >         worker_thread+0x55/0x3c0
> >         kthread+0x140/0x170
> >         ret_from_fork+0x22/0x30
> > 
> > -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
> >         process_one_work+0x266/0x5d0
> >         worker_thread+0x55/0x3c0
> >         kthread+0x140/0x170
> >         ret_from_fork+0x22/0x30
> > 
> > -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
> >         __lock_acquire+0x1130/0x1dc0
> >         lock_acquire+0xf5/0x320
> >         flush_workqueue+0xae/0x600
> >         drain_workqueue+0xa0/0x110
> >         destroy_workqueue+0x36/0x250
> >         __loop_clr_fd+0x9a/0x650 [loop]
> >         lo_ioctl+0x29d/0x780 [loop]
> >         block_ioctl+0x3f/0x50
> >         __x64_sys_ioctl+0x83/0xb0
> >         do_syscall_64+0x3b/0x90
> >         entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > other info that might help us debug this:
> > Chain exists of:
> >    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> >   Possible unsafe locking scenario:
> >         CPU0                    CPU1
> >         ----                    ----
> >    lock(&lo->lo_mutex);
> >                                 lock(&disk->open_mutex);
> >                                 lock(&lo->lo_mutex);
> >    lock((wq_completion)loop0);
> > 
> >   *** DEADLOCK ***
> > 1 lock held by losetup/156417:
> >   #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> > 
> > stack backtrace:
> > CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > Call Trace:
> >   dump_stack_lvl+0x57/0x72
> >   check_noncircular+0x10a/0x120
> >   __lock_acquire+0x1130/0x1dc0
> >   lock_acquire+0xf5/0x320
> >   ? flush_workqueue+0x84/0x600
> >   flush_workqueue+0xae/0x600
> >   ? flush_workqueue+0x84/0x600
> >   drain_workqueue+0xa0/0x110
> >   destroy_workqueue+0x36/0x250
> >   __loop_clr_fd+0x9a/0x650 [loop]
> >   lo_ioctl+0x29d/0x780 [loop]
> >   ? __lock_acquire+0x3a0/0x1dc0
> >   ? update_dl_rq_load_avg+0x152/0x360
> >   ? lock_is_held_type+0xa5/0x120
> >   ? find_held_lock.constprop.0+0x2b/0x80
> >   block_ioctl+0x3f/0x50
> >   __x64_sys_ioctl+0x83/0xb0
> >   do_syscall_64+0x3b/0x90
> >   entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7f645884de6b
> > 
> > Usually the uuid_mutex exists to protect the fs_devices that map
> > together all of the devices that match a specific uuid.  In rm_device
> > we're messing with the uuid of a device, so it makes sense to protect
> > that here.
> > 
> > However in doing that it pulls in a whole host of lockdep dependencies,
> > as we call mnt_want_write() on the sb before we grab the uuid_mutex, thus
> > we end up with the dependency chain under the uuid_mutex being added
> > under the normal sb write dependency chain, which causes problems with
> > loop devices.
> > 
> > We don't need the uuid mutex here however.  If we call
> > btrfs_scan_one_device() before we scratch the super block we will find
> > the fs_devices and not find the device itself and return EBUSY because
> > the fs_devices is open.  If we call it after the scratch happens it will
> > not appear to be a valid btrfs file system.
> > 
> > We do not need to worry about other fs_devices modifying operations here
> > because we're protected by the exclusive operations locking.
> > 
> > So drop the uuid_mutex here in order to fix the lockdep splat.
> 
> I think uuid_mutex should stay. Here is why.
> 
>   While thread A takes %device at line 816 and deref at line 880.
>   Thread B can completely remove and free that %device.
>   As of now these threads are mutual exclusive using uuid_mutex.
> 
> Thread A
> 
> btrfs_control_ioctl()
>    mutex_lock(&uuid_mutex);
>      btrfs_scan_one_device()
>        device_list_add()
>        {
>   815                 mutex_lock(&fs_devices->device_list_mutex);
> 
>   816                 device = btrfs_find_device(fs_devices, devid,
>   817                                 disk_super->dev_item.uuid, NULL);
> 
>   880         } else if (!device->name || strcmp(device->name->str, path)) {
> 
>   933                         if (device->bdev->bd_dev != path_dev) {
> 
>   982         mutex_unlock(&fs_devices->device_list_mutex);
>         }
> 
> 
> Thread B
> 
> btrfs_rm_device()
> 
> 2069         mutex_lock(&uuid_mutex);  <-- proposed to remove
> 
> 2150         mutex_lock(&fs_devices->device_list_mutex);
> 
> 2172         mutex_unlock(&fs_devices->device_list_mutex);
> 
> 2180                 btrfs_scratch_superblocks(fs_info, device->bdev,
> 2181                                           device->name->str);
> 
> 2183         btrfs_close_bdev(device);
> 2184         synchronize_rcu();
> 2185         btrfs_free_device(device);
> 
> 2194         mutex_unlock(&uuid_mutex);  <-- proposed to remove

Yeah, I think this is the reason the uuid_mutex exists at all: to
serialize scanning (mounted or unmounted) with device list operations on
mounted filesystems (e.g. removing).

> Well, I don't have a better option to fix this issue as of now.

Me neither. In general, removing a lock allows sections to compete for
the same resources, and given that we've had some weird mount/scan
interactions triggered by syzkaller, I'm reluctant to just drop the
uuid_mutex.

The reasoning of this patch concerns mounted filesystems AFAICS, not
scanning triggered by the control ioctl.


* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-01 12:01   ` Anand Jain
  2021-09-01 17:08     ` David Sterba
@ 2021-09-01 17:10     ` Josef Bacik
  2021-09-01 19:49       ` Anand Jain
  1 sibling, 1 reply; 39+ messages in thread
From: Josef Bacik @ 2021-09-01 17:10 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs, kernel-team

On 9/1/21 8:01 AM, Anand Jain wrote:
> On 28/07/2021 05:01, Josef Bacik wrote:
>> We got the following lockdep splat while running xfstests (specifically
>> btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
>> by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
>> converted loop to using workqueues, which comes with lockdep
>> annotations that don't exist with kworkers.  The lockdep splat is as
>> follows
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.14.0-rc2-custom+ #34 Not tainted
>> ------------------------------------------------------
>> losetup/156417 is trying to acquire lock:
>> ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: 
>> flush_workqueue+0x84/0x600
>>
>> but task is already holding lock:
>> ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: 
>> __loop_clr_fd+0x41/0x650 [loop]
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
>>         __mutex_lock+0xba/0x7c0
>>         lo_open+0x28/0x60 [loop]
>>         blkdev_get_whole+0x28/0xf0
>>         blkdev_get_by_dev.part.0+0x168/0x3c0
>>         blkdev_open+0xd2/0xe0
>>         do_dentry_open+0x163/0x3a0
>>         path_openat+0x74d/0xa40
>>         do_filp_open+0x9c/0x140
>>         do_sys_openat2+0xb1/0x170
>>         __x64_sys_openat+0x54/0x90
>>         do_syscall_64+0x3b/0x90
>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> -> #4 (&disk->open_mutex){+.+.}-{3:3}:
>>         __mutex_lock+0xba/0x7c0
>>         blkdev_get_by_dev.part.0+0xd1/0x3c0
>>         blkdev_get_by_path+0xc0/0xd0
>>         btrfs_scan_one_device+0x52/0x1f0 [btrfs]
>>         btrfs_control_ioctl+0xac/0x170 [btrfs]
>>         __x64_sys_ioctl+0x83/0xb0
>>         do_syscall_64+0x3b/0x90
>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> -> #3 (uuid_mutex){+.+.}-{3:3}:
>>         __mutex_lock+0xba/0x7c0
>>         btrfs_rm_device+0x48/0x6a0 [btrfs]
>>         btrfs_ioctl+0x2d1c/0x3110 [btrfs]
>>         __x64_sys_ioctl+0x83/0xb0
>>         do_syscall_64+0x3b/0x90
>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> -> #2 (sb_writers#11){.+.+}-{0:0}:
>>         lo_write_bvec+0x112/0x290 [loop]
>>         loop_process_work+0x25f/0xcb0 [loop]
>>         process_one_work+0x28f/0x5d0
>>         worker_thread+0x55/0x3c0
>>         kthread+0x140/0x170
>>         ret_from_fork+0x22/0x30
>>
>> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>>         process_one_work+0x266/0x5d0
>>         worker_thread+0x55/0x3c0
>>         kthread+0x140/0x170
>>         ret_from_fork+0x22/0x30
>>
>> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>>         __lock_acquire+0x1130/0x1dc0
>>         lock_acquire+0xf5/0x320
>>         flush_workqueue+0xae/0x600
>>         drain_workqueue+0xa0/0x110
>>         destroy_workqueue+0x36/0x250
>>         __loop_clr_fd+0x9a/0x650 [loop]
>>         lo_ioctl+0x29d/0x780 [loop]
>>         block_ioctl+0x3f/0x50
>>         __x64_sys_ioctl+0x83/0xb0
>>         do_syscall_64+0x3b/0x90
>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> other info that might help us debug this:
>> Chain exists of:
>>    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
>>   Possible unsafe locking scenario:
>>         CPU0                    CPU1
>>         ----                    ----
>>    lock(&lo->lo_mutex);
>>                                 lock(&disk->open_mutex);
>>                                 lock(&lo->lo_mutex);
>>    lock((wq_completion)loop0);
>>
>>   *** DEADLOCK ***
>> 1 lock held by losetup/156417:
>>   #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: 
>> __loop_clr_fd+0x41/0x650 [loop]
>>
>> stack backtrace:
>> CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>> Call Trace:
>>   dump_stack_lvl+0x57/0x72
>>   check_noncircular+0x10a/0x120
>>   __lock_acquire+0x1130/0x1dc0
>>   lock_acquire+0xf5/0x320
>>   ? flush_workqueue+0x84/0x600
>>   flush_workqueue+0xae/0x600
>>   ? flush_workqueue+0x84/0x600
>>   drain_workqueue+0xa0/0x110
>>   destroy_workqueue+0x36/0x250
>>   __loop_clr_fd+0x9a/0x650 [loop]
>>   lo_ioctl+0x29d/0x780 [loop]
>>   ? __lock_acquire+0x3a0/0x1dc0
>>   ? update_dl_rq_load_avg+0x152/0x360
>>   ? lock_is_held_type+0xa5/0x120
>>   ? find_held_lock.constprop.0+0x2b/0x80
>>   block_ioctl+0x3f/0x50
>>   __x64_sys_ioctl+0x83/0xb0
>>   do_syscall_64+0x3b/0x90
>>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>> RIP: 0033:0x7f645884de6b
>>
>> Usually the uuid_mutex exists to protect the fs_devices that map
>> together all of the devices that match a specific uuid.  In rm_device
>> we're messing with the uuid of a device, so it makes sense to protect
>> that here.
>>
>> However in doing that it pulls in a whole host of lockdep dependencies,
>> as we call mnt_want_write() on the sb before we grab the uuid_mutex, thus
>> we end up with the dependency chain under the uuid_mutex being added
>> under the normal sb write dependency chain, which causes problems with
>> loop devices.
>>
>> We don't need the uuid mutex here however.  If we call
>> btrfs_scan_one_device() before we scratch the super block we will find
>> the fs_devices and not find the device itself and return EBUSY because
>> the fs_devices is open.  If we call it after the scratch happens it will
>> not appear to be a valid btrfs file system.
>>
>> We do not need to worry about other fs_devices modifying operations here
>> because we're protected by the exclusive operations locking.
>>
>> So drop the uuid_mutex here in order to fix the lockdep splat.
> 
> 
> I think uuid_mutex should stay. Here is why.
> 
>   While thread A takes %device at line 816 and deref at line 880.
>   Thread B can completely remove and free that %device.
>   As of now these threads are mutual exclusive using uuid_mutex.
> 
> Thread A
> 
> btrfs_control_ioctl()
>    mutex_lock(&uuid_mutex);
>      btrfs_scan_one_device()
>        device_list_add()
>        {
>   815                 mutex_lock(&fs_devices->device_list_mutex);
> 
>   816                 device = btrfs_find_device(fs_devices, devid,
>   817                                 disk_super->dev_item.uuid, NULL);
> 
>   880         } else if (!device->name || strcmp(device->name->str, 
> path)) {
> 
>   933                         if (device->bdev->bd_dev != path_dev) {
> 
>   982         mutex_unlock(&fs_devices->device_list_mutex);
>         }
> 
> 
> Thread B
> 
> btrfs_rm_device()
> 
> 2069         mutex_lock(&uuid_mutex);  <-- proposed to remove
> 
> 2150         mutex_lock(&fs_devices->device_list_mutex);
> 
> 2172         mutex_unlock(&fs_devices->device_list_mutex);
> 
> 2180                 btrfs_scratch_superblocks(fs_info, device->bdev,
> 2181                                           device->name->str);
> 
> 2183         btrfs_close_bdev(device);
> 2184         synchronize_rcu();
> 2185         btrfs_free_device(device);
> 
> 2194         mutex_unlock(&uuid_mutex);  <-- proposed to remove
> 
> 

This is fine; we're protected by the fs_devices->device_list_mutex here.
We'll remove our device from the list before dropping the
device_list_mutex, so we won't be able to find the old device if we're
removing it.  Thanks,

Josef


* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-01 17:10     ` Josef Bacik
@ 2021-09-01 19:49       ` Anand Jain
  0 siblings, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-09-01 19:49 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: David Sterba



On 02/09/2021 01:10, Josef Bacik wrote:
> On 9/1/21 8:01 AM, Anand Jain wrote:
>> On 28/07/2021 05:01, Josef Bacik wrote:
>>> We got the following lockdep splat while running xfstests (specifically
>>> btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
>>> by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
>>> converted loop to using workqueues, which comes with lockdep
>>> annotations that don't exist with kworkers.  The lockdep splat is as
>>> follows
>>>
>>> ======================================================
>>> WARNING: possible circular locking dependency detected
>>> 5.14.0-rc2-custom+ #34 Not tainted
>>> ------------------------------------------------------
>>> losetup/156417 is trying to acquire lock:
>>> ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: 
>>> flush_workqueue+0x84/0x600
>>>
>>> but task is already holding lock:
>>> ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: 
>>> __loop_clr_fd+0x41/0x650 [loop]
>>>
>>> which lock already depends on the new lock.
>>>
>>> the existing dependency chain (in reverse order) is:
>>>
>>> -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
>>>         __mutex_lock+0xba/0x7c0
>>>         lo_open+0x28/0x60 [loop]
>>>         blkdev_get_whole+0x28/0xf0
>>>         blkdev_get_by_dev.part.0+0x168/0x3c0
>>>         blkdev_open+0xd2/0xe0
>>>         do_dentry_open+0x163/0x3a0
>>>         path_openat+0x74d/0xa40
>>>         do_filp_open+0x9c/0x140
>>>         do_sys_openat2+0xb1/0x170
>>>         __x64_sys_openat+0x54/0x90
>>>         do_syscall_64+0x3b/0x90
>>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>
>>> -> #4 (&disk->open_mutex){+.+.}-{3:3}:
>>>         __mutex_lock+0xba/0x7c0
>>>         blkdev_get_by_dev.part.0+0xd1/0x3c0
>>>         blkdev_get_by_path+0xc0/0xd0
>>>         btrfs_scan_one_device+0x52/0x1f0 [btrfs]
>>>         btrfs_control_ioctl+0xac/0x170 [btrfs]
>>>         __x64_sys_ioctl+0x83/0xb0
>>>         do_syscall_64+0x3b/0x90
>>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>
>>> -> #3 (uuid_mutex){+.+.}-{3:3}:
>>>         __mutex_lock+0xba/0x7c0
>>>         btrfs_rm_device+0x48/0x6a0 [btrfs]
>>>         btrfs_ioctl+0x2d1c/0x3110 [btrfs]
>>>         __x64_sys_ioctl+0x83/0xb0
>>>         do_syscall_64+0x3b/0x90
>>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>
>>> -> #2 (sb_writers#11){.+.+}-{0:0}:
>>>         lo_write_bvec+0x112/0x290 [loop]
>>>         loop_process_work+0x25f/0xcb0 [loop]
>>>         process_one_work+0x28f/0x5d0
>>>         worker_thread+0x55/0x3c0
>>>         kthread+0x140/0x170
>>>         ret_from_fork+0x22/0x30
>>>
>>> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>>>         process_one_work+0x266/0x5d0
>>>         worker_thread+0x55/0x3c0
>>>         kthread+0x140/0x170
>>>         ret_from_fork+0x22/0x30
>>>
>>> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>>>         __lock_acquire+0x1130/0x1dc0
>>>         lock_acquire+0xf5/0x320
>>>         flush_workqueue+0xae/0x600
>>>         drain_workqueue+0xa0/0x110
>>>         destroy_workqueue+0x36/0x250
>>>         __loop_clr_fd+0x9a/0x650 [loop]
>>>         lo_ioctl+0x29d/0x780 [loop]
>>>         block_ioctl+0x3f/0x50
>>>         __x64_sys_ioctl+0x83/0xb0
>>>         do_syscall_64+0x3b/0x90
>>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>
>>> other info that might help us debug this:
>>> Chain exists of:
>>>    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
>>>   Possible unsafe locking scenario:
>>>         CPU0                    CPU1
>>>         ----                    ----
>>>    lock(&lo->lo_mutex);
>>>                                 lock(&disk->open_mutex);
>>>                                 lock(&lo->lo_mutex);
>>>    lock((wq_completion)loop0);
>>>
>>>   *** DEADLOCK ***
>>> 1 lock held by losetup/156417:
>>>   #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: 
>>> __loop_clr_fd+0x41/0x650 [loop]
>>>
>>> stack backtrace:
>>> CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 
>>> 02/06/2015
>>> Call Trace:
>>>   dump_stack_lvl+0x57/0x72
>>>   check_noncircular+0x10a/0x120
>>>   __lock_acquire+0x1130/0x1dc0
>>>   lock_acquire+0xf5/0x320
>>>   ? flush_workqueue+0x84/0x600
>>>   flush_workqueue+0xae/0x600
>>>   ? flush_workqueue+0x84/0x600
>>>   drain_workqueue+0xa0/0x110
>>>   destroy_workqueue+0x36/0x250
>>>   __loop_clr_fd+0x9a/0x650 [loop]
>>>   lo_ioctl+0x29d/0x780 [loop]
>>>   ? __lock_acquire+0x3a0/0x1dc0
>>>   ? update_dl_rq_load_avg+0x152/0x360
>>>   ? lock_is_held_type+0xa5/0x120
>>>   ? find_held_lock.constprop.0+0x2b/0x80
>>>   block_ioctl+0x3f/0x50
>>>   __x64_sys_ioctl+0x83/0xb0
>>>   do_syscall_64+0x3b/0x90
>>>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>>> RIP: 0033:0x7f645884de6b
>>>
>>> Usually the uuid_mutex exists to protect the fs_devices that map
>>> together all of the devices that match a specific uuid.  In rm_device
>>> we're messing with the uuid of a device, so it makes sense to protect
>>> that here.
>>>
>>> However in doing that it pulls in a whole host of lockdep dependencies,
>>> as we call mnt_want_write() on the sb before we grab the uuid_mutex, thus
>>> we end up with the dependency chain under the uuid_mutex being added
>>> under the normal sb write dependency chain, which causes problems with
>>> loop devices.
>>>
>>> We don't need the uuid mutex here however.  If we call
>>> btrfs_scan_one_device() before we scratch the super block we will find
>>> the fs_devices and not find the device itself and return EBUSY because
>>> the fs_devices is open.  If we call it after the scratch happens it will
>>> not appear to be a valid btrfs file system.
>>>
>>> We do not need to worry about other fs_devices modifying operations here
>>> because we're protected by the exclusive operations locking.
>>>
>>> So drop the uuid_mutex here in order to fix the lockdep splat.
>>
>>
>> I think uuid_mutex should stay. Here is why.
>>
>>   While thread A takes %device at line 816 and deref at line 880.
>>   Thread B can completely remove and free that %device.
>>   As of now these threads are mutual exclusive using uuid_mutex.
>>
>> Thread A
>>
>> btrfs_control_ioctl()
>>    mutex_lock(&uuid_mutex);
>>      btrfs_scan_one_device()
>>        device_list_add()
>>        {
>>   815                 mutex_lock(&fs_devices->device_list_mutex);
>>
>>   816                 device = btrfs_find_device(fs_devices, devid,
>>   817                                 disk_super->dev_item.uuid, NULL);
>>
>>   880         } else if (!device->name || strcmp(device->name->str, 
>> path)) {
>>
>>   933                         if (device->bdev->bd_dev != path_dev) {
>>
>>   982         mutex_unlock(&fs_devices->device_list_mutex);
>>         }
>>
>>
>> Thread B
>>
>> btrfs_rm_device()
>>
>> 2069         mutex_lock(&uuid_mutex);  <-- proposed to remove
>>
>> 2150         mutex_lock(&fs_devices->device_list_mutex);
    2151         list_del_rcu(&device->dev_list);   <----
>>
>> 2172         mutex_unlock(&fs_devices->device_list_mutex);
>>
>> 2180                 btrfs_scratch_superblocks(fs_info, device->bdev,
>> 2181                                           device->name->str);
>>
>> 2183         btrfs_close_bdev(device);
>> 2184         synchronize_rcu();
>> 2185         btrfs_free_device(device);
>>
>> 2194         mutex_unlock(&uuid_mutex);  <-- proposed to remove
>>
>>
> 
> This is fine, we're protected by the fs_devices->device_list_mutex here. 

>   We'll remove our device from the list before dropping the 
> device_list_mutex,

You are right. I missed that point.
The changes look good to me.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks.

> so we won't be able to find the old device if we're 
> removing it.  Thanks,

> 
> Josef


* Re: [PATCH v2 4/7] btrfs: update the bdev time directly when closing
  2021-07-27 21:01 ` [PATCH v2 4/7] btrfs: update the bdev time directly when closing Josef Bacik
  2021-08-25  0:35   ` Anand Jain
@ 2021-09-02 12:16   ` David Sterba
  1 sibling, 0 replies; 39+ messages in thread
From: David Sterba @ 2021-09-02 12:16 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jul 27, 2021 at 05:01:16PM -0400, Josef Bacik wrote:
> We update the ctime/mtime of a block device when we remove it so that
> blkid knows the device changed.  However, we do this by re-opening the
> block device and calling filp_update_time().  That is the more correct
> approach, because it calls inode->i_op->update_time() if it exists, but
> block device inodes do not implement it.  Instead, call
> generic_update_time() on the bd_inode to avoid the blkdev_open path and
> get rid of the following lockdep splat:
> following lockdep splat
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #406 Not tainted
> ------------------------------------------------------
> losetup/11596 is trying to acquire lock:
> ffff939640d2f538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
> 
> but task is already holding lock:
> ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x7d/0x750
>        lo_open+0x28/0x60 [loop]
>        blkdev_get_whole+0x25/0xf0
>        blkdev_get_by_dev.part.0+0x168/0x3c0
>        blkdev_open+0xd2/0xe0
>        do_dentry_open+0x161/0x390
>        path_openat+0x3cc/0xa20
>        do_filp_open+0x96/0x120
>        do_sys_openat2+0x7b/0x130
>        __x64_sys_openat+0x46/0x70
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (&disk->open_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x7d/0x750
>        blkdev_get_by_dev.part.0+0x56/0x3c0
>        blkdev_open+0xd2/0xe0
>        do_dentry_open+0x161/0x390
>        path_openat+0x3cc/0xa20
>        do_filp_open+0x96/0x120
>        file_open_name+0xc7/0x170
>        filp_open+0x2c/0x50
>        btrfs_scratch_superblocks.part.0+0x10f/0x170
>        btrfs_rm_device.cold+0xe8/0xed
>        btrfs_ioctl+0x2a31/0x2e70
>        __x64_sys_ioctl+0x80/0xb0
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#12){.+.+}-{0:0}:
>        lo_write_bvec+0xc2/0x240 [loop]
>        loop_process_work+0x238/0xd00 [loop]
>        process_one_work+0x26b/0x560
>        worker_thread+0x55/0x3c0
>        kthread+0x140/0x160
>        ret_from_fork+0x1f/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>        process_one_work+0x245/0x560
>        worker_thread+0x55/0x3c0
>        kthread+0x140/0x160
>        ret_from_fork+0x1f/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>        __lock_acquire+0x10ea/0x1d90
>        lock_acquire+0xb5/0x2b0
>        flush_workqueue+0x91/0x5e0
>        drain_workqueue+0xa0/0x110
>        destroy_workqueue+0x36/0x250
>        __loop_clr_fd+0x9a/0x660 [loop]
>        block_ioctl+0x3f/0x50
>        __x64_sys_ioctl+0x80/0xb0
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&lo->lo_mutex);
>                                lock(&disk->open_mutex);
>                                lock(&lo->lo_mutex);
>   lock((wq_completion)loop0);
> 
>  *** DEADLOCK ***
> 
> 1 lock held by losetup/11596:
>  #0: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> stack backtrace:
> CPU: 1 PID: 11596 Comm: losetup Not tainted 5.14.0-rc2+ #406
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>  dump_stack_lvl+0x57/0x72
>  check_noncircular+0xcf/0xf0
>  ? stack_trace_save+0x3b/0x50
>  __lock_acquire+0x10ea/0x1d90
>  lock_acquire+0xb5/0x2b0
>  ? flush_workqueue+0x67/0x5e0
>  ? lockdep_init_map_type+0x47/0x220
>  flush_workqueue+0x91/0x5e0
>  ? flush_workqueue+0x67/0x5e0
>  ? verify_cpu+0xf0/0x100
>  drain_workqueue+0xa0/0x110
>  destroy_workqueue+0x36/0x250
>  __loop_clr_fd+0x9a/0x660 [loop]
>  ? blkdev_ioctl+0x8d/0x2a0
>  block_ioctl+0x3f/0x50
>  __x64_sys_ioctl+0x80/0xb0
>  do_syscall_64+0x38/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 5/7] btrfs: delay blkdev_put until after the device remove
  2021-07-27 21:01 ` [PATCH v2 5/7] btrfs: delay blkdev_put until after the device remove Josef Bacik
  2021-08-25  1:00   ` Anand Jain
@ 2021-09-02 12:16   ` David Sterba
  1 sibling, 0 replies; 39+ messages in thread
From: David Sterba @ 2021-09-02 12:16 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jul 27, 2021 at 05:01:17PM -0400, Josef Bacik wrote:
> When removing the device we call blkdev_put() on the device once we've
> removed it, and because we have an EXCL open we need to take the
> ->open_mutex on the block device to clean it up.  Unfortunately during
> device remove we are holding the sb writers lock, which results in the
> following lockdep splat
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #407 Not tainted
> ------------------------------------------------------
> losetup/11595 is trying to acquire lock:
> ffff973ac35dd138 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
> 
> but task is already holding lock:
> ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x7d/0x750
>        lo_open+0x28/0x60 [loop]
>        blkdev_get_whole+0x25/0xf0
>        blkdev_get_by_dev.part.0+0x168/0x3c0
>        blkdev_open+0xd2/0xe0
>        do_dentry_open+0x161/0x390
>        path_openat+0x3cc/0xa20
>        do_filp_open+0x96/0x120
>        do_sys_openat2+0x7b/0x130
>        __x64_sys_openat+0x46/0x70
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (&disk->open_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x7d/0x750
>        blkdev_put+0x3a/0x220
>        btrfs_rm_device.cold+0x62/0xe5
>        btrfs_ioctl+0x2a31/0x2e70
>        __x64_sys_ioctl+0x80/0xb0
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#12){.+.+}-{0:0}:
>        lo_write_bvec+0xc2/0x240 [loop]
>        loop_process_work+0x238/0xd00 [loop]
>        process_one_work+0x26b/0x560
>        worker_thread+0x55/0x3c0
>        kthread+0x140/0x160
>        ret_from_fork+0x1f/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>        process_one_work+0x245/0x560
>        worker_thread+0x55/0x3c0
>        kthread+0x140/0x160
>        ret_from_fork+0x1f/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>        __lock_acquire+0x10ea/0x1d90
>        lock_acquire+0xb5/0x2b0
>        flush_workqueue+0x91/0x5e0
>        drain_workqueue+0xa0/0x110
>        destroy_workqueue+0x36/0x250
>        __loop_clr_fd+0x9a/0x660 [loop]
>        block_ioctl+0x3f/0x50
>        __x64_sys_ioctl+0x80/0xb0
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&lo->lo_mutex);
>                                lock(&disk->open_mutex);
>                                lock(&lo->lo_mutex);
>   lock((wq_completion)loop0);
> 
>  *** DEADLOCK ***
> 
> 1 lock held by losetup/11595:
>  #0: ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> stack backtrace:
> CPU: 0 PID: 11595 Comm: losetup Not tainted 5.14.0-rc2+ #407
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>  dump_stack_lvl+0x57/0x72
>  check_noncircular+0xcf/0xf0
>  ? stack_trace_save+0x3b/0x50
>  __lock_acquire+0x10ea/0x1d90
>  lock_acquire+0xb5/0x2b0
>  ? flush_workqueue+0x67/0x5e0
>  ? lockdep_init_map_type+0x47/0x220
>  flush_workqueue+0x91/0x5e0
>  ? flush_workqueue+0x67/0x5e0
>  ? verify_cpu+0xf0/0x100
>  drain_workqueue+0xa0/0x110
>  destroy_workqueue+0x36/0x250
>  __loop_clr_fd+0x9a/0x660 [loop]
>  ? blkdev_ioctl+0x8d/0x2a0
>  block_ioctl+0x3f/0x50
>  __x64_sys_ioctl+0x80/0xb0
>  do_syscall_64+0x38/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fc21255d4cb
> 
> So instead save the bdev and do the put once we've dropped the sb
> writers lock in order to avoid the lockdep recursion.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-07-27 21:01 ` [PATCH v2 2/7] btrfs: do not take the uuid_mutex " Josef Bacik
  2021-09-01 12:01   ` Anand Jain
@ 2021-09-02 12:58   ` David Sterba
  2021-09-02 14:10     ` Josef Bacik
  2021-09-20  7:45   ` Anand Jain
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 39+ messages in thread
From: David Sterba @ 2021-09-02 12:58 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jul 27, 2021 at 05:01:14PM -0400, Josef Bacik wrote:
> We got the following lockdep splat while running xfstests (specifically
> btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
> by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
> converted loop to using workqueues, which comes with lockdep
> annotations that don't exist with kworkers.  The lockdep splat is as
> follows
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2-custom+ #34 Not tainted
> ------------------------------------------------------
> losetup/156417 is trying to acquire lock:
> ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600
> 
> but task is already holding lock:
> ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
>        __mutex_lock+0xba/0x7c0
>        lo_open+0x28/0x60 [loop]
>        blkdev_get_whole+0x28/0xf0
>        blkdev_get_by_dev.part.0+0x168/0x3c0
>        blkdev_open+0xd2/0xe0
>        do_dentry_open+0x163/0x3a0
>        path_openat+0x74d/0xa40
>        do_filp_open+0x9c/0x140
>        do_sys_openat2+0xb1/0x170
>        __x64_sys_openat+0x54/0x90
>        do_syscall_64+0x3b/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #4 (&disk->open_mutex){+.+.}-{3:3}:
>        __mutex_lock+0xba/0x7c0
>        blkdev_get_by_dev.part.0+0xd1/0x3c0
>        blkdev_get_by_path+0xc0/0xd0
>        btrfs_scan_one_device+0x52/0x1f0 [btrfs]
>        btrfs_control_ioctl+0xac/0x170 [btrfs]
>        __x64_sys_ioctl+0x83/0xb0
>        do_syscall_64+0x3b/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (uuid_mutex){+.+.}-{3:3}:
>        __mutex_lock+0xba/0x7c0
>        btrfs_rm_device+0x48/0x6a0 [btrfs]
>        btrfs_ioctl+0x2d1c/0x3110 [btrfs]
>        __x64_sys_ioctl+0x83/0xb0
>        do_syscall_64+0x3b/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#11){.+.+}-{0:0}:
>        lo_write_bvec+0x112/0x290 [loop]
>        loop_process_work+0x25f/0xcb0 [loop]
>        process_one_work+0x28f/0x5d0
>        worker_thread+0x55/0x3c0
>        kthread+0x140/0x170
>        ret_from_fork+0x22/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>        process_one_work+0x266/0x5d0
>        worker_thread+0x55/0x3c0
>        kthread+0x140/0x170
>        ret_from_fork+0x22/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>        __lock_acquire+0x1130/0x1dc0
>        lock_acquire+0xf5/0x320
>        flush_workqueue+0xae/0x600
>        drain_workqueue+0xa0/0x110
>        destroy_workqueue+0x36/0x250
>        __loop_clr_fd+0x9a/0x650 [loop]
>        lo_ioctl+0x29d/0x780 [loop]
>        block_ioctl+0x3f/0x50
>        __x64_sys_ioctl+0x83/0xb0
>        do_syscall_64+0x3b/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> Chain exists of:
>   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
>  Possible unsafe locking scenario:
>        CPU0                    CPU1
>        ----                    ----
>   lock(&lo->lo_mutex);
>                                lock(&disk->open_mutex);
>                                lock(&lo->lo_mutex);
>   lock((wq_completion)loop0);
> 
>  *** DEADLOCK ***
> 1 lock held by losetup/156417:
>  #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> 
> stack backtrace:
> CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> Call Trace:
>  dump_stack_lvl+0x57/0x72
>  check_noncircular+0x10a/0x120
>  __lock_acquire+0x1130/0x1dc0
>  lock_acquire+0xf5/0x320
>  ? flush_workqueue+0x84/0x600
>  flush_workqueue+0xae/0x600
>  ? flush_workqueue+0x84/0x600
>  drain_workqueue+0xa0/0x110
>  destroy_workqueue+0x36/0x250
>  __loop_clr_fd+0x9a/0x650 [loop]
>  lo_ioctl+0x29d/0x780 [loop]
>  ? __lock_acquire+0x3a0/0x1dc0
>  ? update_dl_rq_load_avg+0x152/0x360
>  ? lock_is_held_type+0xa5/0x120
>  ? find_held_lock.constprop.0+0x2b/0x80
>  block_ioctl+0x3f/0x50
>  __x64_sys_ioctl+0x83/0xb0
>  do_syscall_64+0x3b/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f645884de6b
> 
> Usually the uuid_mutex exists to protect the fs_devices that map
> together all of the devices that match a specific uuid.  In rm_device
> we're messing with the uuid of a device, so it makes sense to protect
> that here.
> 
> However in doing that it pulls in a whole host of lockdep dependencies,
> as we call mnt_want_write() on the sb before we grab the uuid_mutex, thus
> we end up with the dependency chain under the uuid_mutex being added
> under the normal sb write dependency chain, which causes problems with
> loop devices.
> 
> We don't need the uuid_mutex here, however.  If we call
> btrfs_scan_one_device() before we scratch the super block, we will find
> the fs_devices but not the device itself, and return EBUSY because the
> fs_devices is open.  If we call it after the scratch happens, the device
> will no longer appear to be a valid btrfs file system.

This is a bit hand wavy, but it is the critical part of the correctness
proof, and IMO it does not explain enough. The important piece happens in
device_list_add, the fs_devices lookup and the EBUSY return, but all of
that is currently excluded completely by the uuid_mutex from running in
parallel with any part of rm_device.

This means that the state of the device is seen as complete by each task
(scan, rm device). Without the uuid_mutex the scanning can find the
signature, then try to look up the device in the list, while in parallel
the rm device changes the signature or manipulates the list. But not
everything is covered by the device_list_mutex, so there are combinations
of both tasks with some in-progress state.  Also factor in the RCU
protection.

From a high level it is what you say about ordering scan/scratch, but
otherwise I'm not convinced that the change is not subtly breaking
something.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices
  2021-07-27 21:01 ` [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices Josef Bacik
  2021-08-24 22:08   ` Anand Jain
  2021-09-01 13:35   ` Nikolay Borisov
@ 2021-09-02 12:59   ` David Sterba
  2 siblings, 0 replies; 39+ messages in thread
From: David Sterba @ 2021-09-02 12:59 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jul 27, 2021 at 05:01:19PM -0400, Josef Bacik wrote:
> I got the following lockdep splat while testing seed devices
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #409 Not tainted
> ------------------------------------------------------
> mount/34004 is trying to acquire lock:
> ffff9eaac48188e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: clone_fs_devices+0x4d/0x170
> 
> but task is already holding lock:
> ffff9eaac766d438 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x100
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (btrfs-chunk-00){++++}-{3:3}:
>        down_read_nested+0x46/0x60
>        __btrfs_tree_read_lock+0x24/0x100
>        btrfs_read_lock_root_node+0x31/0x40
>        btrfs_search_slot+0x480/0x930
>        btrfs_update_device+0x63/0x180
>        btrfs_chunk_alloc_add_chunk_item+0xdc/0x3a0
>        btrfs_chunk_alloc+0x281/0x540
>        find_free_extent+0x10ca/0x1790
>        btrfs_reserve_extent+0xbf/0x1d0
>        btrfs_alloc_tree_block+0xb1/0x320
>        __btrfs_cow_block+0x136/0x5f0
>        btrfs_cow_block+0x107/0x210
>        btrfs_search_slot+0x56a/0x930
>        btrfs_truncate_inode_items+0x187/0xef0
>        btrfs_truncate_free_space_cache+0x11c/0x210
>        delete_block_group_cache+0x6f/0xb0
>        btrfs_relocate_block_group+0xf8/0x350
>        btrfs_relocate_chunk+0x38/0x120
>        btrfs_balance+0x79b/0xf00
>        btrfs_ioctl_balance+0x327/0x400
>        __x64_sys_ioctl+0x80/0xb0
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
>        __mutex_lock+0x7d/0x750
>        btrfs_init_new_device+0x6d6/0x1540
>        btrfs_ioctl+0x1b12/0x2d30
>        __x64_sys_ioctl+0x80/0xb0
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #0 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
>        __lock_acquire+0x10ea/0x1d90
>        lock_acquire+0xb5/0x2b0
>        __mutex_lock+0x7d/0x750
>        clone_fs_devices+0x4d/0x170
>        btrfs_read_chunk_tree+0x32f/0x800
>        open_ctree+0xae3/0x16f0
>        btrfs_mount_root.cold+0x12/0xea
>        legacy_get_tree+0x2d/0x50
>        vfs_get_tree+0x25/0xc0
>        vfs_kern_mount.part.0+0x71/0xb0
>        btrfs_mount+0x10d/0x380
>        legacy_get_tree+0x2d/0x50
>        vfs_get_tree+0x25/0xc0
>        path_mount+0x433/0xb60
>        __x64_sys_mount+0xe3/0x120
>        do_syscall_64+0x38/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>   &fs_devs->device_list_mutex --> &fs_info->chunk_mutex --> btrfs-chunk-00
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(btrfs-chunk-00);
>                                lock(&fs_info->chunk_mutex);
>                                lock(btrfs-chunk-00);
>   lock(&fs_devs->device_list_mutex);
> 
>  *** DEADLOCK ***
> 
> 3 locks held by mount/34004:
>  #0: ffff9eaad75c00e0 (&type->s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0
>  #1: ffffffffbd2dcf08 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x800
>  #2: ffff9eaac766d438 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x100
> 
> stack backtrace:
> CPU: 0 PID: 34004 Comm: mount Not tainted 5.14.0-rc2+ #409
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>  dump_stack_lvl+0x57/0x72
>  check_noncircular+0xcf/0xf0
>  __lock_acquire+0x10ea/0x1d90
>  lock_acquire+0xb5/0x2b0
>  ? clone_fs_devices+0x4d/0x170
>  ? lock_is_held_type+0xa5/0x120
>  __mutex_lock+0x7d/0x750
>  ? clone_fs_devices+0x4d/0x170
>  ? clone_fs_devices+0x4d/0x170
>  ? lockdep_init_map_type+0x47/0x220
>  ? debug_mutex_init+0x33/0x40
>  clone_fs_devices+0x4d/0x170
>  ? lock_is_held_type+0xa5/0x120
>  btrfs_read_chunk_tree+0x32f/0x800
>  ? find_held_lock+0x2b/0x80
>  open_ctree+0xae3/0x16f0
>  btrfs_mount_root.cold+0x12/0xea
>  ? rcu_read_lock_sched_held+0x3f/0x80
>  ? kfree+0x1f6/0x410
>  legacy_get_tree+0x2d/0x50
>  vfs_get_tree+0x25/0xc0
>  vfs_kern_mount.part.0+0x71/0xb0
>  btrfs_mount+0x10d/0x380
>  ? kfree+0x1f6/0x410
>  legacy_get_tree+0x2d/0x50
>  vfs_get_tree+0x25/0xc0
>  path_mount+0x433/0xb60
>  __x64_sys_mount+0xe3/0x120
>  do_syscall_64+0x38/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f6cbcd9788e
> 
> It is because we take the ->device_list_mutex in this path while holding
> onto the tree locks in the chunk root.  However we do not need the lock
> here, because we're already holding onto the uuid_mutex, and in fact
> have removed all other uses of the ->device_list_mutex in this path
> because of this.  Remove the ->device_list_mutex locking here, add an
> assert for the uuid_mutex and the problem is fixed.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

I'll pick Anand's version, it adds one more lock annotation and has a
bit more verbose explanation.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-02 12:58   ` David Sterba
@ 2021-09-02 14:10     ` Josef Bacik
  2021-09-17 14:33       ` David Sterba
  0 siblings, 1 reply; 39+ messages in thread
From: Josef Bacik @ 2021-09-02 14:10 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On 9/2/21 8:58 AM, David Sterba wrote:
> On Tue, Jul 27, 2021 at 05:01:14PM -0400, Josef Bacik wrote:
>> We got the following lockdep splat while running xfstests (specifically
>> btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
>> by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
>> converted loop to using workqueues, which comes with lockdep
>> annotations that don't exist with kworkers.  The lockdep splat is as
>> follows
>>
>> [ lockdep splat trimmed, identical to the one quoted in the original mail above ]
>>
>> Usually the uuid_mutex exists to protect the fs_devices that map
>> together all of the devices that match a specific uuid.  In rm_device
>> we're messing with the uuid of a device, so it makes sense to protect
>> that here.
>>
>> However in doing that it pulls in a whole host of lockdep dependencies,
>> as we call mnt_want_write() on the sb before we grab the uuid_mutex, thus
>> we end up with the dependency chain under the uuid_mutex being added
>> under the normal sb write dependency chain, which causes problems with
>> loop devices.
>>
>> We don't need the uuid_mutex here, however.  If we call
>> btrfs_scan_one_device() before we scratch the super block, we will find
>> the fs_devices but not the device itself, and return EBUSY because the
>> fs_devices is open.  If we call it after the scratch happens, the device
>> will no longer appear to be a valid btrfs file system.
> 
> This is a bit hand wavy, but it is the critical part of the correctness
> proof, and IMO it does not explain enough. The important piece happens in
> device_list_add, the fs_devices lookup and the EBUSY return, but all of
> that is currently excluded completely by the uuid_mutex from running in
> parallel with any part of rm_device.
> 
> This means that the state of the device is seen as complete by each task
> (scan, rm device). Without the uuid_mutex the scanning can find the
> signature, then try to look up the device in the list, while in parallel
> the rm device changes the signature or manipulates the list. But not
> everything is covered by the device_list_mutex, so there are combinations
> of both tasks with some in-progress state.  Also factor in the RCU
> protection.
> 
> From a high level it is what you say about ordering scan/scratch, but
> otherwise I'm not convinced that the change is not subtly breaking
> something.
> 

Yeah, this is far from ideal; we really need to rework our entire device 
lifetime handling and locking.  However, this isn't going to break 
anything.  We are worried about rm and scan racing with each other. 
Before this change we zero the device out under the uuid_mutex, so when 
scan does run it can go through the whole device scan path without rm 
messing with us.

We aren't worried if the scratch happens first, because the result is we 
don't think this is a btrfs device and we bail out.

The only case we are concerned with is when we scratch _after_ scan is 
able to read a seemingly valid super block, so let's consider this case.

Scan will call device_list_add() with the device we're removing.  We'll 
call find_fsid_with_metadata_uuid() and get our fs_devices for this 
UUID.  At this point we lock the fs_devices->device_list_mutex.  This is 
what protects us in this case, but we have two cases here.

1. We haven't reached the device removal part of the rm yet.  We find 
our device, its name matches our path, we go down and set total_devices 
to the super block's number of devices, which doesn't affect anything 
because we haven't done the remove yet.

2. We are past the device removal part, which is protected by the 
device_list_mutex.  Scan doesn't find the device, it goes down and does the

if (fs_devices->opened)
	return -EBUSY;

check and we bail out.

Nothing about this situation is ideal, but the lockdep splat is real, 
and the fix is safe, though admittedly a bit scary looking.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-02 14:10     ` Josef Bacik
@ 2021-09-17 14:33       ` David Sterba
  0 siblings, 0 replies; 39+ messages in thread
From: David Sterba @ 2021-09-17 14:33 UTC (permalink / raw)
  To: Josef Bacik; +Cc: dsterba, linux-btrfs, kernel-team

On Thu, Sep 02, 2021 at 10:10:04AM -0400, Josef Bacik wrote:
> > This is a bit hand wavy, but it is the critical part of the correctness
> > proof, and IMO it does not explain enough. The important piece happens in
> > device_list_add, the fs_devices lookup and the EBUSY return, but all of
> > that is currently excluded completely by the uuid_mutex from running in
> > parallel with any part of rm_device.
> > 
> > This means that the state of the device is seen as complete by each task
> > (scan, rm device). Without the uuid_mutex the scanning can find the
> > signature, then try to look up the device in the list, while in parallel
> > the rm device changes the signature or manipulates the list. But not
> > everything is covered by the device_list_mutex, so there are combinations
> > of both tasks with some in-progress state.  Also factor in the RCU
> > protection.
> > 
> > From a high level it is what you say about ordering scan/scratch, but
> > otherwise I'm not convinced that the change is not subtly breaking
> > something.
> > 
> 
> Yeah, this is far from ideal; we really need to rework our entire device 
> lifetime handling and locking.  However, this isn't going to break 
> anything.  We are worried about rm and scan racing with each other. 
> Before this change we zero the device out under the uuid_mutex, so when 
> scan does run it can go through the whole device scan path without rm 
> messing with us.
> 
> We aren't worried if the scratch happens first, because the result is we 
> don't think this is a btrfs device and we bail out.
> 
> The only case we are concerned with is when we scratch _after_ scan is 
> able to read a seemingly valid super block, so let's consider this case.
> 
> Scan will call device_list_add() with the device we're removing.  We'll 
> call find_fsid_with_metadata_uuid() and get our fs_devices for this 
> UUID.  At this point we lock the fs_devices->device_list_mutex.  This is 
> what protects us in this case, but we have two cases here.
> 
> 1. We haven't reached the device removal part of the rm yet.  We find 
> our device, its name matches our path, we go down and set total_devices 
> to the super block's number of devices, which doesn't affect anything 
> because we haven't done the remove yet.
> 
> 2. We are past the device removal part, which is protected by the 
> device_list_mutex.  Scan doesn't find the device, it goes down and does the
> 
> if (fs_devices->opened)
> 	return -EBUSY;
> 
> check and we bail out.
> 
> Nothing about this situation is ideal, but the lockdep splat is real, 
> and the fix is safe, though admittedly a bit scary looking.  Thanks,

Thanks, after reading the code a few more times I tend to agree; I've
added this additional explanation to the changelog.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/7]
  2021-07-27 21:01 [PATCH v2 0/7] Josef Bacik
                   ` (6 preceding siblings ...)
  2021-07-27 21:01 ` [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices Josef Bacik
@ 2021-09-17 15:06 ` David Sterba
  7 siblings, 0 replies; 39+ messages in thread
From: David Sterba @ 2021-09-17 15:06 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jul 27, 2021 at 05:01:12PM -0400, Josef Bacik wrote:
> v1->v2:
> - Rework the first patch as it was wrong because we need it for seed devices.
> - Fix another lockdep splat I uncovered while testing against seed devices to
>   make sure I hadn't broken anything.
> 
> --- Original email ---
> 
> Hello,
> 
> The commit 87579e9b7d8d ("loop: use worker per cgroup instead of kworker")
> enabled the use of workqueues for loopback devices, which brought with it
> lockdep annotations for the workqueues for loopback devices.  This uncovered a
> cascade of lockdep warnings because of how we mess with the block_device while
> under the sb writers lock while doing the device removal.
> 
> The first patch seems innocuous but we have a lockdep_assert_held(&uuid_mutex)
> in one of the helpers, which is why I have it first.  The code should never be
> called which is why it is removed, but I'm removing it specifically to remove
> confusion about the role of the uuid_mutex here.
> 
> The next 4 patches are to resolve the lockdep messages as they occur.  There are
> several issues and I address them one at a time until we're no longer getting
> lockdep warnings.
> 
> The final patch doesn't necessarily have to go in right away, it's just a
> cleanup as I noticed we have a lot of duplicated code between the v1 and v2
> device removal handling.  Thanks,
> 
> Josef
> 
> Josef Bacik (7):
>   btrfs: do not call close_fs_devices in btrfs_rm_device
>   btrfs: do not take the uuid_mutex in btrfs_rm_device
>   btrfs: do not read super look for a device path
>   btrfs: update the bdev time directly when closing
>   btrfs: delay blkdev_put until after the device remove
>   btrfs: unify common code for the v1 and v2 versions of device remove
>   btrfs: do not take the device_list_mutex in clone_fs_devices

I've merged what looked ok and did not have comments. Remaining patches
are 1, 3 and 7. Please have a look and resend, thanks.

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-07-27 21:01 ` [PATCH v2 2/7] btrfs: do not take the uuid_mutex " Josef Bacik
  2021-09-01 12:01   ` Anand Jain
  2021-09-02 12:58   ` David Sterba
@ 2021-09-20  7:45   ` Anand Jain
  2021-09-20  8:26     ` David Sterba
  2021-09-21 11:59   ` Filipe Manana
  2021-09-23  3:58   ` [PATCH] btrfs: drop lockdep assert in close_fs_devices() Anand Jain
  4 siblings, 1 reply; 39+ messages in thread
From: Anand Jain @ 2021-09-20  7:45 UTC (permalink / raw)
  To: Josef Bacik, David Sterba; +Cc: linux-btrfs, kernel-team


This patch is causing btrfs/225 to fail [here].

------
static void close_fs_devices(struct btrfs_fs_devices *fs_devices)
{
         struct btrfs_device *device, *tmp;

         lockdep_assert_held(&uuid_mutex);  <--- here
-------

as this patch removed mutex_lock(&uuid_mutex) in btrfs_rm_device().


Commit 425c6ed6486f ("btrfs: do not hold device_list_mutex when closing
devices") added lockdep_assert_held(&uuid_mutex) in close_fs_devices().


But mutex_lock(&uuid_mutex) in btrfs_rm_device() is not essential, as we
discussed/proved earlier.

Removing lockdep_assert_held(&uuid_mutex) in close_fs_devices() is the
better choice.

Thanks, Anand

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-20  7:45   ` Anand Jain
@ 2021-09-20  8:26     ` David Sterba
  2021-09-20  9:41       ` Anand Jain
  0 siblings, 1 reply; 39+ messages in thread
From: David Sterba @ 2021-09-20  8:26 UTC (permalink / raw)
  To: Anand Jain; +Cc: Josef Bacik, David Sterba, linux-btrfs, kernel-team

On Mon, Sep 20, 2021 at 03:45:14PM +0800, Anand Jain wrote:
> 
> This patch is causing btrfs/225 to fail [here].
> 
> ------
> static void close_fs_devices(struct btrfs_fs_devices *fs_devices)
> {
>          struct btrfs_device *device, *tmp;
> 
>          lockdep_assert_held(&uuid_mutex);  <--- here
> -------
> 
> as this patch removed mutex_lock(&uuid_mutex) in btrfs_rm_device().
> 
> 
> commit 425c6ed6486f (btrfs: do not hold device_list_mutex when closing
> devices) added lockdep_assert_held(&uuid_mutex) in close_fs_devices().
> 
> 
> But mutex_lock(&uuid_mutex) in btrfs_rm_device() is not essential as we
> discussed/proved earlier.
> 
> Remove lockdep_assert_held(&uuid_mutex) in close_fs_devices() is a 
> better choice.

This is the other patch that's still not in misc-next. I merged the
branch partially and in a different order so that causes the lockdep
warning. I can remove the patch "btrfs: do not take the uuid_mutex in
btrfs_rm_device" from misc-next for now and merge the whole series in
the order as sent but there were comments so I'm waiting for an update.

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-20  8:26     ` David Sterba
@ 2021-09-20  9:41       ` Anand Jain
  2021-09-23  4:33         ` Anand Jain
  0 siblings, 1 reply; 39+ messages in thread
From: Anand Jain @ 2021-09-20  9:41 UTC (permalink / raw)
  To: dsterba, Josef Bacik, David Sterba, linux-btrfs, kernel-team



On 20/09/2021 16:26, David Sterba wrote:
> On Mon, Sep 20, 2021 at 03:45:14PM +0800, Anand Jain wrote:
>>
>> This patch is causing btrfs/225 to fail [here].
>>
>> ------
>> static void close_fs_devices(struct btrfs_fs_devices *fs_devices)
>> {
>>           struct btrfs_device *device, *tmp;
>>
>>           lockdep_assert_held(&uuid_mutex);  <--- here
>> -------
>>
>> as this patch removed mutex_lock(&uuid_mutex) in btrfs_rm_device().
>>
>>
>> commit 425c6ed6486f (btrfs: do not hold device_list_mutex when closing
>> devices) added lockdep_assert_held(&uuid_mutex) in close_fs_devices().
>>
>>
>> But mutex_lock(&uuid_mutex) in btrfs_rm_device() is not essential as we
>> discussed/proved earlier.
>>
>> Remove lockdep_assert_held(&uuid_mutex) in close_fs_devices() is a
>> better choice.
> 
> This is the other patch that's still not in misc-next. I merged the
> branch partially and in a different order so that causes the lockdep
> warning. I can remove the patch "btrfs: do not take the uuid_mutex in
> btrfs_rm_device" from misc-next for now and merge the whole series in
> the order as sent but there were comments so I'm waiting for an update.

Ha ha. I think you are confused; even I was. The problem assert is in
close_fs_devices(), not clone_fs_devices() (as in 7/7). They are
similarly named.

A variant of 7/7 is already merged:
c124706900c2 ("btrfs: fix lockdep warning while mounting sprout fs")


* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-07-27 21:01 ` [PATCH v2 2/7] btrfs: do not take the uuid_mutex " Josef Bacik
                     ` (2 preceding siblings ...)
  2021-09-20  7:45   ` Anand Jain
@ 2021-09-21 11:59   ` Filipe Manana
  2021-09-21 12:17     ` Filipe Manana
  2021-09-23  3:58   ` [PATCH] btrfs: drop lockdep assert in close_fs_devices() Anand Jain
  4 siblings, 1 reply; 39+ messages in thread
From: Filipe Manana @ 2021-09-21 11:59 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Jul 27, 2021 at 10:05 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> We got the following lockdep splat while running xfstests (specifically
> btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
> by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
> converted loop to using workqueues, which comes with lockdep
> annotations that don't exist with kworkers.  The lockdep splat is as
> follows
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2-custom+ #34 Not tainted
> ------------------------------------------------------
> losetup/156417 is trying to acquire lock:
> ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600
>
> but task is already holding lock:
> ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
>        __mutex_lock+0xba/0x7c0
>        lo_open+0x28/0x60 [loop]
>        blkdev_get_whole+0x28/0xf0
>        blkdev_get_by_dev.part.0+0x168/0x3c0
>        blkdev_open+0xd2/0xe0
>        do_dentry_open+0x163/0x3a0
>        path_openat+0x74d/0xa40
>        do_filp_open+0x9c/0x140
>        do_sys_openat2+0xb1/0x170
>        __x64_sys_openat+0x54/0x90
>        do_syscall_64+0x3b/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> -> #4 (&disk->open_mutex){+.+.}-{3:3}:
>        __mutex_lock+0xba/0x7c0
>        blkdev_get_by_dev.part.0+0xd1/0x3c0
>        blkdev_get_by_path+0xc0/0xd0
>        btrfs_scan_one_device+0x52/0x1f0 [btrfs]
>        btrfs_control_ioctl+0xac/0x170 [btrfs]
>        __x64_sys_ioctl+0x83/0xb0
>        do_syscall_64+0x3b/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> -> #3 (uuid_mutex){+.+.}-{3:3}:
>        __mutex_lock+0xba/0x7c0
>        btrfs_rm_device+0x48/0x6a0 [btrfs]
>        btrfs_ioctl+0x2d1c/0x3110 [btrfs]
>        __x64_sys_ioctl+0x83/0xb0
>        do_syscall_64+0x3b/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> -> #2 (sb_writers#11){.+.+}-{0:0}:
>        lo_write_bvec+0x112/0x290 [loop]
>        loop_process_work+0x25f/0xcb0 [loop]
>        process_one_work+0x28f/0x5d0
>        worker_thread+0x55/0x3c0
>        kthread+0x140/0x170
>        ret_from_fork+0x22/0x30
>
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>        process_one_work+0x266/0x5d0
>        worker_thread+0x55/0x3c0
>        kthread+0x140/0x170
>        ret_from_fork+0x22/0x30
>
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>        __lock_acquire+0x1130/0x1dc0
>        lock_acquire+0xf5/0x320
>        flush_workqueue+0xae/0x600
>        drain_workqueue+0xa0/0x110
>        destroy_workqueue+0x36/0x250
>        __loop_clr_fd+0x9a/0x650 [loop]
>        lo_ioctl+0x29d/0x780 [loop]
>        block_ioctl+0x3f/0x50
>        __x64_sys_ioctl+0x83/0xb0
>        do_syscall_64+0x3b/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> other info that might help us debug this:
> Chain exists of:
>   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
>  Possible unsafe locking scenario:
>        CPU0                    CPU1
>        ----                    ----
>   lock(&lo->lo_mutex);
>                                lock(&disk->open_mutex);
>                                lock(&lo->lo_mutex);
>   lock((wq_completion)loop0);
>
>  *** DEADLOCK ***
> 1 lock held by losetup/156417:
>  #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
>
> stack backtrace:
> CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> Call Trace:
>  dump_stack_lvl+0x57/0x72
>  check_noncircular+0x10a/0x120
>  __lock_acquire+0x1130/0x1dc0
>  lock_acquire+0xf5/0x320
>  ? flush_workqueue+0x84/0x600
>  flush_workqueue+0xae/0x600
>  ? flush_workqueue+0x84/0x600
>  drain_workqueue+0xa0/0x110
>  destroy_workqueue+0x36/0x250
>  __loop_clr_fd+0x9a/0x650 [loop]
>  lo_ioctl+0x29d/0x780 [loop]
>  ? __lock_acquire+0x3a0/0x1dc0
>  ? update_dl_rq_load_avg+0x152/0x360
>  ? lock_is_held_type+0xa5/0x120
>  ? find_held_lock.constprop.0+0x2b/0x80
>  block_ioctl+0x3f/0x50
>  __x64_sys_ioctl+0x83/0xb0
>  do_syscall_64+0x3b/0x90
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f645884de6b
>
> Usually the uuid_mutex exists to protect the fs_devices that map
> together all of the devices that match a specific uuid.  In rm_device
> we're messing with the uuid of a device, so it makes sense to protect
> that here.
>
> However in doing that it pulls in a whole host of lockdep dependencies,
> as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus
> we end up with the dependency chain under the uuid_mutex being added
> under the normal sb write dependency chain, which causes problems with
> loop devices.
>
> We don't need the uuid mutex here however.  If we call
> btrfs_scan_one_device() before we scratch the super block we will find
> the fs_devices and not find the device itself and return EBUSY because
> the fs_devices is open.  If we call it after the scratch happens it will
> not appear to be a valid btrfs file system.
>
> We do not need to worry about other fs_devices modifying operations here
> because we're protected by the exclusive operations locking.
>
> So drop the uuid_mutex here in order to fix the lockdep splat.
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/volumes.c | 5 -----
>  1 file changed, 5 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 5217b93172b4..0e7372f637eb 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2082,8 +2082,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>         u64 num_devices;
>         int ret = 0;
>
> -       mutex_lock(&uuid_mutex);
> -
>         num_devices = btrfs_num_devices(fs_info);
>
>         ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
> @@ -2127,11 +2125,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>                 mutex_unlock(&fs_info->chunk_mutex);
>         }
>
> -       mutex_unlock(&uuid_mutex);
>         ret = btrfs_shrink_device(device, 0);
>         if (!ret)
>                 btrfs_reada_remove_dev(device);
> -       mutex_lock(&uuid_mutex);

On misc-next, this is now triggering a warning due to a lockdep
assertion failure:

[ 5343.002752] ------------[ cut here ]------------
[ 5343.002756] WARNING: CPU: 3 PID: 797246 at fs/btrfs/volumes.c:1165
close_fs_devices+0x200/0x220 [btrfs]
[ 5343.002813] Modules linked in: dm_dust btrfs dm_flakey dm_mod
blake2b_generic xor raid6_pq libcrc32c bochs drm_vram_helper
intel_rapl_msr intel_rapl_common drm_ttm_helper crct10dif_pclmul ttm
ghash_clmulni_intel aesni_intel drm_kms_helper crypto_simd ppdev
cryptd joy>
[ 5343.002876] CPU: 3 PID: 797246 Comm: btrfs Not tainted
5.15.0-rc2-btrfs-next-99 #1
[ 5343.002879] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 5343.002883] RIP: 0010:close_fs_devices+0x200/0x220 [btrfs]
[ 5343.002912] Code: 8b 43 78 48 85 c0 0f 85 89 fe ff ff e9 7e fe ff
ff be ff ff ff ff 48 c7 c7 10 6f bd c0 e8 58 70 7d c9 85 c0 0f 85 20
fe ff ff <0f> 0b e9 19 fe ff ff 0f 0b e9 63 ff ff ff 0f 0b e9 67 ff ff
ff 66
[ 5343.002914] RSP: 0018:ffffb32608fe7d38 EFLAGS: 00010246
[ 5343.002917] RAX: 0000000000000000 RBX: ffff948d78f6b538 RCX: 0000000000000001
[ 5343.002918] RDX: 0000000000000000 RSI: ffffffff8aabac29 RDI: ffffffff8ab2a43e
[ 5343.002920] RBP: ffff948d78f6b400 R08: ffff948d4fcecd38 R09: 0000000000000000
[ 5343.002921] R10: 0000000000000000 R11: 0000000000000000 R12: ffff948d4fcecc78
[ 5343.002922] R13: ffff948d401bc000 R14: ffff948d78f6b400 R15: ffff948d4fcecc00
[ 5343.002924] FS:  00007fe1259208c0(0000) GS:ffff94906d400000(0000)
knlGS:0000000000000000
[ 5343.002926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5343.002927] CR2: 00007fe125a953d5 CR3: 00000001017ca005 CR4: 0000000000370ee0
[ 5343.002930] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5343.002932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5343.002933] Call Trace:
[ 5343.002938]  btrfs_rm_device.cold+0x147/0x1c0 [btrfs]
[ 5343.002981]  btrfs_ioctl+0x2dc2/0x3460 [btrfs]
[ 5343.003021]  ? __do_sys_newstat+0x48/0x70
[ 5343.003028]  ? lock_is_held_type+0xe8/0x140
[ 5343.003034]  ? __x64_sys_ioctl+0x83/0xb0
[ 5343.003037]  __x64_sys_ioctl+0x83/0xb0
[ 5343.003042]  do_syscall_64+0x3b/0xc0
[ 5343.003045]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 5343.003048] RIP: 0033:0x7fe125a17d87
[ 5343.003051] Code: 00 00 00 48 8b 05 09 91 0c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 90 0c 00 f7 d8 64 89
01 48
[ 5343.003053] RSP: 002b:00007ffdbfbd11c8 EFLAGS: 00000206 ORIG_RAX:
0000000000000010
[ 5343.003056] RAX: ffffffffffffffda RBX: 00007ffdbfbd33b0 RCX: 00007fe125a17d87
[ 5343.003057] RDX: 00007ffdbfbd21e0 RSI: 000000005000943a RDI: 0000000000000003
[ 5343.003059] RBP: 0000000000000000 R08: 0000000000000000 R09: 006264732f766564
[ 5343.003060] R10: fffffffffffffebb R11: 0000000000000206 R12: 0000000000000003
[ 5343.003061] R13: 00007ffdbfbd33b0 R14: 0000000000000000 R15: 00007ffdbfbd33b8
[ 5343.003077] irq event stamp: 202039
[ 5343.003079] hardirqs last  enabled at (202045):
[<ffffffff8992d2a0>] __up_console_sem+0x60/0x70
[ 5343.003082] hardirqs last disabled at (202050):
[<ffffffff8992d285>] __up_console_sem+0x45/0x70
[ 5343.003083] softirqs last  enabled at (196012):
[<ffffffff898a0f2b>] irq_exit_rcu+0xeb/0x130
[ 5343.003086] softirqs last disabled at (195973):
[<ffffffff898a0f2b>] irq_exit_rcu+0xeb/0x130
[ 5343.003090] ---[ end trace 7b957e10a906f920 ]---

Happens all the time on btrfs/164 for example.
Maybe some other patch is missing?


>         if (ret)
>                 goto error_undo;
>
> @@ -2215,7 +2211,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>         }
>
>  out:
> -       mutex_unlock(&uuid_mutex);
>         return ret;
>
>  error_undo:
> --
> 2.26.3
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-21 11:59   ` Filipe Manana
@ 2021-09-21 12:17     ` Filipe Manana
  2021-09-22 15:33       ` Filipe Manana
  0 siblings, 1 reply; 39+ messages in thread
From: Filipe Manana @ 2021-09-21 12:17 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Sep 21, 2021 at 12:59 PM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Tue, Jul 27, 2021 at 10:05 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > We got the following lockdep splat while running xfstests (specifically
> > btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
> > by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
> > converted loop to using workqueues, which comes with lockdep
> > annotations that don't exist with kworkers.  The lockdep splat is as
> > follows
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 5.14.0-rc2-custom+ #34 Not tainted
> > ------------------------------------------------------
> > losetup/156417 is trying to acquire lock:
> > ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600
> >
> > but task is already holding lock:
> > ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
> >        __mutex_lock+0xba/0x7c0
> >        lo_open+0x28/0x60 [loop]
> >        blkdev_get_whole+0x28/0xf0
> >        blkdev_get_by_dev.part.0+0x168/0x3c0
> >        blkdev_open+0xd2/0xe0
> >        do_dentry_open+0x163/0x3a0
> >        path_openat+0x74d/0xa40
> >        do_filp_open+0x9c/0x140
> >        do_sys_openat2+0xb1/0x170
> >        __x64_sys_openat+0x54/0x90
> >        do_syscall_64+0x3b/0x90
> >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > -> #4 (&disk->open_mutex){+.+.}-{3:3}:
> >        __mutex_lock+0xba/0x7c0
> >        blkdev_get_by_dev.part.0+0xd1/0x3c0
> >        blkdev_get_by_path+0xc0/0xd0
> >        btrfs_scan_one_device+0x52/0x1f0 [btrfs]
> >        btrfs_control_ioctl+0xac/0x170 [btrfs]
> >        __x64_sys_ioctl+0x83/0xb0
> >        do_syscall_64+0x3b/0x90
> >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > -> #3 (uuid_mutex){+.+.}-{3:3}:
> >        __mutex_lock+0xba/0x7c0
> >        btrfs_rm_device+0x48/0x6a0 [btrfs]
> >        btrfs_ioctl+0x2d1c/0x3110 [btrfs]
> >        __x64_sys_ioctl+0x83/0xb0
> >        do_syscall_64+0x3b/0x90
> >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > -> #2 (sb_writers#11){.+.+}-{0:0}:
> >        lo_write_bvec+0x112/0x290 [loop]
> >        loop_process_work+0x25f/0xcb0 [loop]
> >        process_one_work+0x28f/0x5d0
> >        worker_thread+0x55/0x3c0
> >        kthread+0x140/0x170
> >        ret_from_fork+0x22/0x30
> >
> > -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
> >        process_one_work+0x266/0x5d0
> >        worker_thread+0x55/0x3c0
> >        kthread+0x140/0x170
> >        ret_from_fork+0x22/0x30
> >
> > -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
> >        __lock_acquire+0x1130/0x1dc0
> >        lock_acquire+0xf5/0x320
> >        flush_workqueue+0xae/0x600
> >        drain_workqueue+0xa0/0x110
> >        destroy_workqueue+0x36/0x250
> >        __loop_clr_fd+0x9a/0x650 [loop]
> >        lo_ioctl+0x29d/0x780 [loop]
> >        block_ioctl+0x3f/0x50
> >        __x64_sys_ioctl+0x83/0xb0
> >        do_syscall_64+0x3b/0x90
> >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > other info that might help us debug this:
> > Chain exists of:
> >   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> >  Possible unsafe locking scenario:
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&lo->lo_mutex);
> >                                lock(&disk->open_mutex);
> >                                lock(&lo->lo_mutex);
> >   lock((wq_completion)loop0);
> >
> >  *** DEADLOCK ***
> > 1 lock held by losetup/156417:
> >  #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> >
> > stack backtrace:
> > CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > Call Trace:
> >  dump_stack_lvl+0x57/0x72
> >  check_noncircular+0x10a/0x120
> >  __lock_acquire+0x1130/0x1dc0
> >  lock_acquire+0xf5/0x320
> >  ? flush_workqueue+0x84/0x600
> >  flush_workqueue+0xae/0x600
> >  ? flush_workqueue+0x84/0x600
> >  drain_workqueue+0xa0/0x110
> >  destroy_workqueue+0x36/0x250
> >  __loop_clr_fd+0x9a/0x650 [loop]
> >  lo_ioctl+0x29d/0x780 [loop]
> >  ? __lock_acquire+0x3a0/0x1dc0
> >  ? update_dl_rq_load_avg+0x152/0x360
> >  ? lock_is_held_type+0xa5/0x120
> >  ? find_held_lock.constprop.0+0x2b/0x80
> >  block_ioctl+0x3f/0x50
> >  __x64_sys_ioctl+0x83/0xb0
> >  do_syscall_64+0x3b/0x90
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7f645884de6b
> >
> > Usually the uuid_mutex exists to protect the fs_devices that map
> > together all of the devices that match a specific uuid.  In rm_device
> > we're messing with the uuid of a device, so it makes sense to protect
> > that here.
> >
> > However in doing that it pulls in a whole host of lockdep dependencies,
> > as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus
> > we end up with the dependency chain under the uuid_mutex being added
> > under the normal sb write dependency chain, which causes problems with
> > loop devices.
> >
> > We don't need the uuid mutex here however.  If we call
> > btrfs_scan_one_device() before we scratch the super block we will find
> > the fs_devices and not find the device itself and return EBUSY because
> > the fs_devices is open.  If we call it after the scratch happens it will
> > not appear to be a valid btrfs file system.
> >
> > We do not need to worry about other fs_devices modifying operations here
> > because we're protected by the exclusive operations locking.
> >
> > So drop the uuid_mutex here in order to fix the lockdep splat.
> >
> > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > ---
> >  fs/btrfs/volumes.c | 5 -----
> >  1 file changed, 5 deletions(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 5217b93172b4..0e7372f637eb 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2082,8 +2082,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> >         u64 num_devices;
> >         int ret = 0;
> >
> > -       mutex_lock(&uuid_mutex);
> > -
> >         num_devices = btrfs_num_devices(fs_info);
> >
> >         ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
> > @@ -2127,11 +2125,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> >                 mutex_unlock(&fs_info->chunk_mutex);
> >         }
> >
> > -       mutex_unlock(&uuid_mutex);
> >         ret = btrfs_shrink_device(device, 0);
> >         if (!ret)
> >                 btrfs_reada_remove_dev(device);
> > -       mutex_lock(&uuid_mutex);
>
> On misc-next, this is now triggering a warning due to a lockdep
> assertion failure:
>
> [ 5343.002752] ------------[ cut here ]------------
> [ 5343.002756] WARNING: CPU: 3 PID: 797246 at fs/btrfs/volumes.c:1165
> close_fs_devices+0x200/0x220 [btrfs]
> [ 5343.002813] Modules linked in: dm_dust btrfs dm_flakey dm_mod
> blake2b_generic xor raid6_pq libcrc32c bochs drm_vram_helper
> intel_rapl_msr intel_rapl_common drm_ttm_helper crct10dif_pclmul ttm
> ghash_clmulni_intel aesni_intel drm_kms_helper crypto_simd ppdev
> cryptd joy>
> [ 5343.002876] CPU: 3 PID: 797246 Comm: btrfs Not tainted
> 5.15.0-rc2-btrfs-next-99 #1
> [ 5343.002879] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [ 5343.002883] RIP: 0010:close_fs_devices+0x200/0x220 [btrfs]
> [ 5343.002912] Code: 8b 43 78 48 85 c0 0f 85 89 fe ff ff e9 7e fe ff
> ff be ff ff ff ff 48 c7 c7 10 6f bd c0 e8 58 70 7d c9 85 c0 0f 85 20
> fe ff ff <0f> 0b e9 19 fe ff ff 0f 0b e9 63 ff ff ff 0f 0b e9 67 ff ff
> ff 66
> [ 5343.002914] RSP: 0018:ffffb32608fe7d38 EFLAGS: 00010246
> [ 5343.002917] RAX: 0000000000000000 RBX: ffff948d78f6b538 RCX: 0000000000000001
> [ 5343.002918] RDX: 0000000000000000 RSI: ffffffff8aabac29 RDI: ffffffff8ab2a43e
> [ 5343.002920] RBP: ffff948d78f6b400 R08: ffff948d4fcecd38 R09: 0000000000000000
> [ 5343.002921] R10: 0000000000000000 R11: 0000000000000000 R12: ffff948d4fcecc78
> [ 5343.002922] R13: ffff948d401bc000 R14: ffff948d78f6b400 R15: ffff948d4fcecc00
> [ 5343.002924] FS:  00007fe1259208c0(0000) GS:ffff94906d400000(0000)
> knlGS:0000000000000000
> [ 5343.002926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5343.002927] CR2: 00007fe125a953d5 CR3: 00000001017ca005 CR4: 0000000000370ee0
> [ 5343.002930] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 5343.002932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 5343.002933] Call Trace:
> [ 5343.002938]  btrfs_rm_device.cold+0x147/0x1c0 [btrfs]
> [ 5343.002981]  btrfs_ioctl+0x2dc2/0x3460 [btrfs]
> [ 5343.003021]  ? __do_sys_newstat+0x48/0x70
> [ 5343.003028]  ? lock_is_held_type+0xe8/0x140
> [ 5343.003034]  ? __x64_sys_ioctl+0x83/0xb0
> [ 5343.003037]  __x64_sys_ioctl+0x83/0xb0
> [ 5343.003042]  do_syscall_64+0x3b/0xc0
> [ 5343.003045]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 5343.003048] RIP: 0033:0x7fe125a17d87
> [ 5343.003051] Code: 00 00 00 48 8b 05 09 91 0c 00 64 c7 00 26 00 00
> 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 90 0c 00 f7 d8 64 89
> 01 48
> [ 5343.003053] RSP: 002b:00007ffdbfbd11c8 EFLAGS: 00000206 ORIG_RAX:
> 0000000000000010
> [ 5343.003056] RAX: ffffffffffffffda RBX: 00007ffdbfbd33b0 RCX: 00007fe125a17d87
> [ 5343.003057] RDX: 00007ffdbfbd21e0 RSI: 000000005000943a RDI: 0000000000000003
> [ 5343.003059] RBP: 0000000000000000 R08: 0000000000000000 R09: 006264732f766564
> [ 5343.003060] R10: fffffffffffffebb R11: 0000000000000206 R12: 0000000000000003
> [ 5343.003061] R13: 00007ffdbfbd33b0 R14: 0000000000000000 R15: 00007ffdbfbd33b8
> [ 5343.003077] irq event stamp: 202039
> [ 5343.003079] hardirqs last  enabled at (202045):
> [<ffffffff8992d2a0>] __up_console_sem+0x60/0x70
> [ 5343.003082] hardirqs last disabled at (202050):
> [<ffffffff8992d285>] __up_console_sem+0x45/0x70
> [ 5343.003083] softirqs last  enabled at (196012):
> [<ffffffff898a0f2b>] irq_exit_rcu+0xeb/0x130
> [ 5343.003086] softirqs last disabled at (195973):
> [<ffffffff898a0f2b>] irq_exit_rcu+0xeb/0x130
> [ 5343.003090] ---[ end trace 7b957e10a906f920 ]---
>
> Happens all the time on btrfs/164 for example.
> Maybe some other patch is missing?

Also, this patch alone does not completely fix that lockdep issue with
lo_mutex and disk->open_mutex, at least not on current misc-next.
btrfs/199 triggers this:

[ 6285.539713] run fstests btrfs/199 at 2021-09-21 13:08:09
[ 6286.090226] BTRFS info (device sda): flagging fs with big metadata feature
[ 6286.090233] BTRFS info (device sda): disk space caching is enabled
[ 6286.090236] BTRFS info (device sda): has skinny extents
[ 6286.268451] loop: module loaded
[ 6286.515848] BTRFS: device fsid b59e1692-d742-4826-bb86-11b14cd1d0b0
devid 1 transid 5 /dev/sdb scanned by mkfs.btrfs (838579)
[ 6286.566724] BTRFS info (device sdb): flagging fs with big metadata feature
[ 6286.566732] BTRFS info (device sdb): disk space caching is enabled
[ 6286.566735] BTRFS info (device sdb): has skinny extents
[ 6286.575156] BTRFS info (device sdb): checking UUID tree
[ 6286.773181] loop0: detected capacity change from 0 to 20971520
[ 6286.817351] BTRFS: device fsid d416e8f8-f18e-41c8-8038-932a871c0763
devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (831305)
[ 6286.837095] BTRFS info (device loop0): flagging fs with big metadata feature
[ 6286.837101] BTRFS info (device loop0): disabling disk space caching
[ 6286.837103] BTRFS info (device loop0): setting nodatasum
[ 6286.837105] BTRFS info (device loop0): turning on sync discard
[ 6286.837107] BTRFS info (device loop0): has skinny extents
[ 6286.847904] BTRFS info (device loop0): enabling ssd optimizations
[ 6286.848767] BTRFS info (device loop0): cleaning free space cache v1
[ 6286.870143] BTRFS info (device loop0): checking UUID tree

[ 6323.701494] ======================================================
[ 6323.702261] WARNING: possible circular locking dependency detected
[ 6323.703033] 5.15.0-rc2-btrfs-next-99 #1 Tainted: G        W
[ 6323.703818] ------------------------------------------------------
[ 6323.704591] losetup/838700 is trying to acquire lock:
[ 6323.705225] ffff948d4bb35948 ((wq_completion)loop0){+.+.}-{0:0},
at: flush_workqueue+0x8b/0x5b0
[ 6323.706316]
               but task is already holding lock:
[ 6323.707047] ffff948d7c093ca0 (&lo->lo_mutex){+.+.}-{3:3}, at:
__loop_clr_fd+0x5a/0x680 [loop]
[ 6323.708198]
               which lock already depends on the new lock.

[ 6323.709664]
               the existing dependency chain (in reverse order) is:
[ 6323.711007]
               -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
[ 6323.712103]        __mutex_lock+0x92/0x900
[ 6323.712851]        lo_open+0x28/0x60 [loop]
[ 6323.713612]        blkdev_get_whole+0x28/0x90
[ 6323.714405]        blkdev_get_by_dev.part.0+0x142/0x320
[ 6323.715348]        blkdev_open+0x5e/0xa0
[ 6323.716057]        do_dentry_open+0x163/0x390
[ 6323.716841]        path_openat+0x3f0/0xa80
[ 6323.717585]        do_filp_open+0xa9/0x150
[ 6323.718326]        do_sys_openat2+0x97/0x160
[ 6323.719099]        __x64_sys_openat+0x54/0x90
[ 6323.719896]        do_syscall_64+0x3b/0xc0
[ 6323.720640]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 6323.721652]
               -> #3 (&disk->open_mutex){+.+.}-{3:3}:
[ 6323.722791]        __mutex_lock+0x92/0x900
[ 6323.723530]        blkdev_get_by_dev.part.0+0x56/0x320
[ 6323.724468]        blkdev_get_by_path+0xb8/0xd0
[ 6323.725291]        btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
[ 6323.726344]        btrfs_find_device_by_devspec+0x154/0x1e0 [btrfs]
[ 6323.727519]        btrfs_rm_device+0x14d/0x770 [btrfs]
[ 6323.728253]        btrfs_ioctl+0x2dc2/0x3460 [btrfs]
[ 6323.728911]        __x64_sys_ioctl+0x83/0xb0
[ 6323.729439]        do_syscall_64+0x3b/0xc0
[ 6323.729943]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 6323.730625]
               -> #2 (sb_writers#14){.+.+}-{0:0}:
[ 6323.731367]        lo_write_bvec+0xea/0x2a0 [loop]
[ 6323.731964]        loop_process_work+0x257/0xdb0 [loop]
[ 6323.732606]        process_one_work+0x24c/0x5b0
[ 6323.733176]        worker_thread+0x55/0x3c0
[ 6323.733692]        kthread+0x155/0x180
[ 6323.734157]        ret_from_fork+0x22/0x30
[ 6323.734662]
               -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
[ 6323.735619]        process_one_work+0x223/0x5b0
[ 6323.736181]        worker_thread+0x55/0x3c0
[ 6323.736708]        kthread+0x155/0x180
[ 6323.737168]        ret_from_fork+0x22/0x30
[ 6323.737671]
               -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
[ 6323.738464]        __lock_acquire+0x130e/0x2210
[ 6323.739033]        lock_acquire+0xd7/0x310
[ 6323.739539]        flush_workqueue+0xb5/0x5b0
[ 6323.740084]        drain_workqueue+0xa0/0x110
[ 6323.740621]        destroy_workqueue+0x36/0x280
[ 6323.741191]        __loop_clr_fd+0xb4/0x680 [loop]
[ 6323.741785]        block_ioctl+0x48/0x50
[ 6323.742272]        __x64_sys_ioctl+0x83/0xb0
[ 6323.742800]        do_syscall_64+0x3b/0xc0
[ 6323.743307]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 6323.743995]
               other info that might help us debug this:

[ 6323.744979] Chain exists of:
                 (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

[ 6323.746338]  Possible unsafe locking scenario:

[ 6323.747073]        CPU0                    CPU1
[ 6323.747628]        ----                    ----
[ 6323.748190]   lock(&lo->lo_mutex);
[ 6323.748612]                                lock(&disk->open_mutex);
[ 6323.749386]                                lock(&lo->lo_mutex);
[ 6323.750201]   lock((wq_completion)loop0);
[ 6323.750696]
                *** DEADLOCK ***

[ 6323.751415] 1 lock held by losetup/838700:
[ 6323.751925]  #0: ffff948d7c093ca0 (&lo->lo_mutex){+.+.}-{3:3}, at:
__loop_clr_fd+0x5a/0x680 [loop]
[ 6323.753025]
               stack backtrace:
[ 6323.753556] CPU: 7 PID: 838700 Comm: losetup Tainted: G        W
     5.15.0-rc2-btrfs-next-99 #1
[ 6323.754659] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 6323.756066] Call Trace:
[ 6323.756375]  dump_stack_lvl+0x57/0x72
[ 6323.756842]  check_noncircular+0xf3/0x110
[ 6323.757341]  ? stack_trace_save+0x4b/0x70
[ 6323.757837]  __lock_acquire+0x130e/0x2210
[ 6323.758335]  lock_acquire+0xd7/0x310
[ 6323.758769]  ? flush_workqueue+0x8b/0x5b0
[ 6323.759258]  ? lockdep_init_map_type+0x51/0x260
[ 6323.759822]  ? lockdep_init_map_type+0x51/0x260
[ 6323.760382]  flush_workqueue+0xb5/0x5b0
[ 6323.760867]  ? flush_workqueue+0x8b/0x5b0
[ 6323.761367]  ? __mutex_unlock_slowpath+0x45/0x280
[ 6323.761948]  drain_workqueue+0xa0/0x110
[ 6323.762426]  destroy_workqueue+0x36/0x280
[ 6323.762924]  __loop_clr_fd+0xb4/0x680 [loop]
[ 6323.763465]  ? blkdev_ioctl+0xb5/0x320
[ 6323.763935]  block_ioctl+0x48/0x50
[ 6323.764356]  __x64_sys_ioctl+0x83/0xb0
[ 6323.764828]  do_syscall_64+0x3b/0xc0
[ 6323.765269]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 6323.765887] RIP: 0033:0x7fb0fe20dd87

>
>
> >         if (ret)
> >                 goto error_undo;
> >
> > @@ -2215,7 +2211,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> >         }
> >
> >  out:
> > -       mutex_unlock(&uuid_mutex);
> >         return ret;
> >
> >  error_undo:
> > --
> > 2.26.3
> >
>
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
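The splat above is lockdep's check_noncircular() at work: it records a
"held while acquiring" edge for every lock acquisition and refuses the one
that would close a cycle. A minimal userspace sketch of that idea (plain DFS
over hypothetical lock names taken from the splat; this is illustrative only,
not the kernel's actual algorithm or data structures):

```c
#include <stdio.h>

/* Toy model of lockdep's dependency graph.  Lock names mirror the chain in
 * the splat above; the cycle check is a plain DFS, not the real kernel code. */
#define NLOCKS 3
static const char *lock_name[NLOCKS] = {
	"(wq_completion)loop0", "&disk->open_mutex", "&lo->lo_mutex"
};
static int edge[NLOCKS][NLOCKS];	/* edge[a][b]: a held while taking b */

static int reaches(int from, int to)
{
	if (from == to)
		return 1;
	for (int n = 0; n < NLOCKS; n++)
		if (edge[from][n] && reaches(n, to))
			return 1;
	return 0;
}

/* Record "held -> next"; return 1 and warn if that edge would close a cycle. */
static int lock_acquire_edge(int held, int next)
{
	if (reaches(next, held)) {
		printf("possible circular locking dependency: %s -> %s\n",
		       lock_name[held], lock_name[next]);
		return 1;
	}
	edge[held][next] = 1;
	return 0;
}

/* Replays the chain from the splat; returns 1 when the cycle is detected. */
static int demo(void)
{
	lock_acquire_edge(0, 1);	/* loop0 wq -> &disk->open_mutex  */
	lock_acquire_edge(1, 2);	/* open_mutex -> &lo->lo_mutex    */
	/* __loop_clr_fd(): destroy the loop0 workqueue with lo_mutex held */
	return lock_acquire_edge(2, 0);
}
```

Here the final lock_acquire_edge(2, 0) stands for __loop_clr_fd() flushing
the loop0 workqueue while holding lo_mutex, which is exactly the edge the
report rejects.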

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-21 12:17     ` Filipe Manana
@ 2021-09-22 15:33       ` Filipe Manana
  2021-09-23  4:15         ` Anand Jain
  0 siblings, 1 reply; 39+ messages in thread
From: Filipe Manana @ 2021-09-22 15:33 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Tue, Sep 21, 2021 at 1:17 PM Filipe Manana <fdmanana@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 12:59 PM Filipe Manana <fdmanana@gmail.com> wrote:
> >
> > On Tue, Jul 27, 2021 at 10:05 PM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > We got the following lockdep splat while running xfstests (specifically
> > > btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
> > > by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
> > > converted loop to using workqueues, which comes with lockdep
> > > annotations that don't exist with kworkers.  The lockdep splat is as
> > > follows
> > >
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 5.14.0-rc2-custom+ #34 Not tainted
> > > ------------------------------------------------------
> > > losetup/156417 is trying to acquire lock:
> > > ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600
> > >
> > > but task is already holding lock:
> > > ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> > >
> > > which lock already depends on the new lock.
> > >
> > > the existing dependency chain (in reverse order) is:
> > >
> > > -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
> > >        __mutex_lock+0xba/0x7c0
> > >        lo_open+0x28/0x60 [loop]
> > >        blkdev_get_whole+0x28/0xf0
> > >        blkdev_get_by_dev.part.0+0x168/0x3c0
> > >        blkdev_open+0xd2/0xe0
> > >        do_dentry_open+0x163/0x3a0
> > >        path_openat+0x74d/0xa40
> > >        do_filp_open+0x9c/0x140
> > >        do_sys_openat2+0xb1/0x170
> > >        __x64_sys_openat+0x54/0x90
> > >        do_syscall_64+0x3b/0x90
> > >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > -> #4 (&disk->open_mutex){+.+.}-{3:3}:
> > >        __mutex_lock+0xba/0x7c0
> > >        blkdev_get_by_dev.part.0+0xd1/0x3c0
> > >        blkdev_get_by_path+0xc0/0xd0
> > >        btrfs_scan_one_device+0x52/0x1f0 [btrfs]
> > >        btrfs_control_ioctl+0xac/0x170 [btrfs]
> > >        __x64_sys_ioctl+0x83/0xb0
> > >        do_syscall_64+0x3b/0x90
> > >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > -> #3 (uuid_mutex){+.+.}-{3:3}:
> > >        __mutex_lock+0xba/0x7c0
> > >        btrfs_rm_device+0x48/0x6a0 [btrfs]
> > >        btrfs_ioctl+0x2d1c/0x3110 [btrfs]
> > >        __x64_sys_ioctl+0x83/0xb0
> > >        do_syscall_64+0x3b/0x90
> > >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > -> #2 (sb_writers#11){.+.+}-{0:0}:
> > >        lo_write_bvec+0x112/0x290 [loop]
> > >        loop_process_work+0x25f/0xcb0 [loop]
> > >        process_one_work+0x28f/0x5d0
> > >        worker_thread+0x55/0x3c0
> > >        kthread+0x140/0x170
> > >        ret_from_fork+0x22/0x30
> > >
> > > -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
> > >        process_one_work+0x266/0x5d0
> > >        worker_thread+0x55/0x3c0
> > >        kthread+0x140/0x170
> > >        ret_from_fork+0x22/0x30
> > >
> > > -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
> > >        __lock_acquire+0x1130/0x1dc0
> > >        lock_acquire+0xf5/0x320
> > >        flush_workqueue+0xae/0x600
> > >        drain_workqueue+0xa0/0x110
> > >        destroy_workqueue+0x36/0x250
> > >        __loop_clr_fd+0x9a/0x650 [loop]
> > >        lo_ioctl+0x29d/0x780 [loop]
> > >        block_ioctl+0x3f/0x50
> > >        __x64_sys_ioctl+0x83/0xb0
> > >        do_syscall_64+0x3b/0x90
> > >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > other info that might help us debug this:
> > > Chain exists of:
> > >   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> > >  Possible unsafe locking scenario:
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&lo->lo_mutex);
> > >                                lock(&disk->open_mutex);
> > >                                lock(&lo->lo_mutex);
> > >   lock((wq_completion)loop0);
> > >
> > >  *** DEADLOCK ***
> > > 1 lock held by losetup/156417:
> > >  #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> > >
> > > stack backtrace:
> > > CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > > Call Trace:
> > >  dump_stack_lvl+0x57/0x72
> > >  check_noncircular+0x10a/0x120
> > >  __lock_acquire+0x1130/0x1dc0
> > >  lock_acquire+0xf5/0x320
> > >  ? flush_workqueue+0x84/0x600
> > >  flush_workqueue+0xae/0x600
> > >  ? flush_workqueue+0x84/0x600
> > >  drain_workqueue+0xa0/0x110
> > >  destroy_workqueue+0x36/0x250
> > >  __loop_clr_fd+0x9a/0x650 [loop]
> > >  lo_ioctl+0x29d/0x780 [loop]
> > >  ? __lock_acquire+0x3a0/0x1dc0
> > >  ? update_dl_rq_load_avg+0x152/0x360
> > >  ? lock_is_held_type+0xa5/0x120
> > >  ? find_held_lock.constprop.0+0x2b/0x80
> > >  block_ioctl+0x3f/0x50
> > >  __x64_sys_ioctl+0x83/0xb0
> > >  do_syscall_64+0x3b/0x90
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > RIP: 0033:0x7f645884de6b
> > >
> > > Usually the uuid_mutex exists to protect the fs_devices that map
> > > together all of the devices that match a specific uuid.  In rm_device
> > > we're messing with the uuid of a device, so it makes sense to protect
> > > that here.
> > >
> > > However in doing that it pulls in a whole host of lockdep dependencies,
> > > as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus
> > > we end up with the dependency chain under the uuid_mutex being added
> > > under the normal sb write dependency chain, which causes problems with
> > > loop devices.
> > >
> > > We don't need the uuid mutex here however.  If we call
> > > btrfs_scan_one_device() before we scratch the super block we will find
> > > the fs_devices and not find the device itself and return EBUSY because
> > > the fs_devices is open.  If we call it after the scratch happens it will
> > > not appear to be a valid btrfs file system.
> > >
> > > We do not need to worry about other fs_devices modifying operations here
> > > because we're protected by the exclusive operations locking.
> > >
> > > So drop the uuid_mutex here in order to fix the lockdep splat.
> > >
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > > ---
> > >  fs/btrfs/volumes.c | 5 -----
> > >  1 file changed, 5 deletions(-)
> > >
> > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > > index 5217b93172b4..0e7372f637eb 100644
> > > --- a/fs/btrfs/volumes.c
> > > +++ b/fs/btrfs/volumes.c
> > > @@ -2082,8 +2082,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> > >         u64 num_devices;
> > >         int ret = 0;
> > >
> > > -       mutex_lock(&uuid_mutex);
> > > -
> > >         num_devices = btrfs_num_devices(fs_info);
> > >
> > >         ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
> > > @@ -2127,11 +2125,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> > >                 mutex_unlock(&fs_info->chunk_mutex);
> > >         }
> > >
> > > -       mutex_unlock(&uuid_mutex);
> > >         ret = btrfs_shrink_device(device, 0);
> > >         if (!ret)
> > >                 btrfs_reada_remove_dev(device);
> > > -       mutex_lock(&uuid_mutex);
> >
> > On misc-next, this is now triggering a warning due to a lockdep
> > assertion failure:
> >
> > [ 5343.002752] ------------[ cut here ]------------
> > [ 5343.002756] WARNING: CPU: 3 PID: 797246 at fs/btrfs/volumes.c:1165
> > close_fs_devices+0x200/0x220 [btrfs]
> > [ 5343.002813] Modules linked in: dm_dust btrfs dm_flakey dm_mod
> > blake2b_generic xor raid6_pq libcrc32c bochs drm_vram_helper
> > intel_rapl_msr intel_rapl_common drm_ttm_helper crct10dif_pclmul ttm
> > ghash_clmulni_intel aesni_intel drm_kms_helper crypto_simd ppdev
> > cryptd joy>
> > [ 5343.002876] CPU: 3 PID: 797246 Comm: btrfs Not tainted
> > 5.15.0-rc2-btrfs-next-99 #1
> > [ 5343.002879] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > [ 5343.002883] RIP: 0010:close_fs_devices+0x200/0x220 [btrfs]
> > [ 5343.002912] Code: 8b 43 78 48 85 c0 0f 85 89 fe ff ff e9 7e fe ff
> > ff be ff ff ff ff 48 c7 c7 10 6f bd c0 e8 58 70 7d c9 85 c0 0f 85 20
> > fe ff ff <0f> 0b e9 19 fe ff ff 0f 0b e9 63 ff ff ff 0f 0b e9 67 ff ff
> > ff 66
> > [ 5343.002914] RSP: 0018:ffffb32608fe7d38 EFLAGS: 00010246
> > [ 5343.002917] RAX: 0000000000000000 RBX: ffff948d78f6b538 RCX: 0000000000000001
> > [ 5343.002918] RDX: 0000000000000000 RSI: ffffffff8aabac29 RDI: ffffffff8ab2a43e
> > [ 5343.002920] RBP: ffff948d78f6b400 R08: ffff948d4fcecd38 R09: 0000000000000000
> > [ 5343.002921] R10: 0000000000000000 R11: 0000000000000000 R12: ffff948d4fcecc78
> > [ 5343.002922] R13: ffff948d401bc000 R14: ffff948d78f6b400 R15: ffff948d4fcecc00
> > [ 5343.002924] FS:  00007fe1259208c0(0000) GS:ffff94906d400000(0000)
> > knlGS:0000000000000000
> > [ 5343.002926] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 5343.002927] CR2: 00007fe125a953d5 CR3: 00000001017ca005 CR4: 0000000000370ee0
> > [ 5343.002930] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 5343.002932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 5343.002933] Call Trace:
> > [ 5343.002938]  btrfs_rm_device.cold+0x147/0x1c0 [btrfs]
> > [ 5343.002981]  btrfs_ioctl+0x2dc2/0x3460 [btrfs]
> > [ 5343.003021]  ? __do_sys_newstat+0x48/0x70
> > [ 5343.003028]  ? lock_is_held_type+0xe8/0x140
> > [ 5343.003034]  ? __x64_sys_ioctl+0x83/0xb0
> > [ 5343.003037]  __x64_sys_ioctl+0x83/0xb0
> > [ 5343.003042]  do_syscall_64+0x3b/0xc0
> > [ 5343.003045]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 5343.003048] RIP: 0033:0x7fe125a17d87
> > [ 5343.003051] Code: 00 00 00 48 8b 05 09 91 0c 00 64 c7 00 26 00 00
> > 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00
> > 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 90 0c 00 f7 d8 64 89
> > 01 48
> > [ 5343.003053] RSP: 002b:00007ffdbfbd11c8 EFLAGS: 00000206 ORIG_RAX:
> > 0000000000000010
> > [ 5343.003056] RAX: ffffffffffffffda RBX: 00007ffdbfbd33b0 RCX: 00007fe125a17d87
> > [ 5343.003057] RDX: 00007ffdbfbd21e0 RSI: 000000005000943a RDI: 0000000000000003
> > [ 5343.003059] RBP: 0000000000000000 R08: 0000000000000000 R09: 006264732f766564
> > [ 5343.003060] R10: fffffffffffffebb R11: 0000000000000206 R12: 0000000000000003
> > [ 5343.003061] R13: 00007ffdbfbd33b0 R14: 0000000000000000 R15: 00007ffdbfbd33b8
> > [ 5343.003077] irq event stamp: 202039
> > [ 5343.003079] hardirqs last  enabled at (202045):
> > [<ffffffff8992d2a0>] __up_console_sem+0x60/0x70
> > [ 5343.003082] hardirqs last disabled at (202050):
> > [<ffffffff8992d285>] __up_console_sem+0x45/0x70
> > [ 5343.003083] softirqs last  enabled at (196012):
> > [<ffffffff898a0f2b>] irq_exit_rcu+0xeb/0x130
> > [ 5343.003086] softirqs last disabled at (195973):
> > [<ffffffff898a0f2b>] irq_exit_rcu+0xeb/0x130
> > [ 5343.003090] ---[ end trace 7b957e10a906f920 ]---
> >
> > Happens all the time on btrfs/164 for example.
> > Maybe some other patch is missing?
>
> Also, this patch alone does not (completely at least) fix that lockdep
> issue with lo_mutex and disk->open_mutex, at least not on current
> misc-next.
> btrfs/199 triggers this:
>
> [ 6285.539713] run fstests btrfs/199 at 2021-09-21 13:08:09
> [ 6286.090226] BTRFS info (device sda): flagging fs with big metadata feature
> [ 6286.090233] BTRFS info (device sda): disk space caching is enabled
> [ 6286.090236] BTRFS info (device sda): has skinny extents
> [ 6286.268451] loop: module loaded
> [ 6286.515848] BTRFS: device fsid b59e1692-d742-4826-bb86-11b14cd1d0b0
> devid 1 transid 5 /dev/sdb scanned by mkfs.btrfs (838579)
> [ 6286.566724] BTRFS info (device sdb): flagging fs with big metadata feature
> [ 6286.566732] BTRFS info (device sdb): disk space caching is enabled
> [ 6286.566735] BTRFS info (device sdb): has skinny extents
> [ 6286.575156] BTRFS info (device sdb): checking UUID tree
> [ 6286.773181] loop0: detected capacity change from 0 to 20971520
> [ 6286.817351] BTRFS: device fsid d416e8f8-f18e-41c8-8038-932a871c0763
> devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (831305)
> [ 6286.837095] BTRFS info (device loop0): flagging fs with big metadata feature
> [ 6286.837101] BTRFS info (device loop0): disabling disk space caching
> [ 6286.837103] BTRFS info (device loop0): setting nodatasum
> [ 6286.837105] BTRFS info (device loop0): turning on sync discard
> [ 6286.837107] BTRFS info (device loop0): has skinny extents
> [ 6286.847904] BTRFS info (device loop0): enabling ssd optimizations
> [ 6286.848767] BTRFS info (device loop0): cleaning free space cache v1
> [ 6286.870143] BTRFS info (device loop0): checking UUID tree
>
> [ 6323.701494] ======================================================
> [ 6323.702261] WARNING: possible circular locking dependency detected
> [ 6323.703033] 5.15.0-rc2-btrfs-next-99 #1 Tainted: G        W
> [ 6323.703818] ------------------------------------------------------
> [ 6323.704591] losetup/838700 is trying to acquire lock:
> [ 6323.705225] ffff948d4bb35948 ((wq_completion)loop0){+.+.}-{0:0},
> at: flush_workqueue+0x8b/0x5b0
> [ 6323.706316]
>                but task is already holding lock:
> [ 6323.707047] ffff948d7c093ca0 (&lo->lo_mutex){+.+.}-{3:3}, at:
> __loop_clr_fd+0x5a/0x680 [loop]
> [ 6323.708198]
>                which lock already depends on the new lock.
>
> [ 6323.709664]
>                the existing dependency chain (in reverse order) is:
> [ 6323.711007]
>                -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
> [ 6323.712103]        __mutex_lock+0x92/0x900
> [ 6323.712851]        lo_open+0x28/0x60 [loop]
> [ 6323.713612]        blkdev_get_whole+0x28/0x90
> [ 6323.714405]        blkdev_get_by_dev.part.0+0x142/0x320
> [ 6323.715348]        blkdev_open+0x5e/0xa0
> [ 6323.716057]        do_dentry_open+0x163/0x390
> [ 6323.716841]        path_openat+0x3f0/0xa80
> [ 6323.717585]        do_filp_open+0xa9/0x150
> [ 6323.718326]        do_sys_openat2+0x97/0x160
> [ 6323.719099]        __x64_sys_openat+0x54/0x90
> [ 6323.719896]        do_syscall_64+0x3b/0xc0
> [ 6323.720640]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 6323.721652]
>                -> #3 (&disk->open_mutex){+.+.}-{3:3}:
> [ 6323.722791]        __mutex_lock+0x92/0x900
> [ 6323.723530]        blkdev_get_by_dev.part.0+0x56/0x320
> [ 6323.724468]        blkdev_get_by_path+0xb8/0xd0
> [ 6323.725291]        btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
> [ 6323.726344]        btrfs_find_device_by_devspec+0x154/0x1e0 [btrfs]
> [ 6323.727519]        btrfs_rm_device+0x14d/0x770 [btrfs]
> [ 6323.728253]        btrfs_ioctl+0x2dc2/0x3460 [btrfs]
> [ 6323.728911]        __x64_sys_ioctl+0x83/0xb0
> [ 6323.729439]        do_syscall_64+0x3b/0xc0
> [ 6323.729943]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 6323.730625]
>                -> #2 (sb_writers#14){.+.+}-{0:0}:
> [ 6323.731367]        lo_write_bvec+0xea/0x2a0 [loop]
> [ 6323.731964]        loop_process_work+0x257/0xdb0 [loop]
> [ 6323.732606]        process_one_work+0x24c/0x5b0
> [ 6323.733176]        worker_thread+0x55/0x3c0
> [ 6323.733692]        kthread+0x155/0x180
> [ 6323.734157]        ret_from_fork+0x22/0x30
> [ 6323.734662]
>                -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
> [ 6323.735619]        process_one_work+0x223/0x5b0
> [ 6323.736181]        worker_thread+0x55/0x3c0
> [ 6323.736708]        kthread+0x155/0x180
> [ 6323.737168]        ret_from_fork+0x22/0x30
> [ 6323.737671]
>                -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
> [ 6323.738464]        __lock_acquire+0x130e/0x2210
> [ 6323.739033]        lock_acquire+0xd7/0x310
> [ 6323.739539]        flush_workqueue+0xb5/0x5b0
> [ 6323.740084]        drain_workqueue+0xa0/0x110
> [ 6323.740621]        destroy_workqueue+0x36/0x280
> [ 6323.741191]        __loop_clr_fd+0xb4/0x680 [loop]
> [ 6323.741785]        block_ioctl+0x48/0x50
> [ 6323.742272]        __x64_sys_ioctl+0x83/0xb0
> [ 6323.742800]        do_syscall_64+0x3b/0xc0
> [ 6323.743307]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 6323.743995]
>                other info that might help us debug this:
>
> [ 6323.744979] Chain exists of:
>                  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
>
> [ 6323.746338]  Possible unsafe locking scenario:
>
> [ 6323.747073]        CPU0                    CPU1
> [ 6323.747628]        ----                    ----
> [ 6323.748190]   lock(&lo->lo_mutex);
> [ 6323.748612]                                lock(&disk->open_mutex);
> [ 6323.749386]                                lock(&lo->lo_mutex);
> [ 6323.750201]   lock((wq_completion)loop0);
> [ 6323.750696]
>                 *** DEADLOCK ***
>
> [ 6323.751415] 1 lock held by losetup/838700:
> [ 6323.751925]  #0: ffff948d7c093ca0 (&lo->lo_mutex){+.+.}-{3:3}, at:
> __loop_clr_fd+0x5a/0x680 [loop]
> [ 6323.753025]
>                stack backtrace:
> [ 6323.753556] CPU: 7 PID: 838700 Comm: losetup Tainted: G        W
>      5.15.0-rc2-btrfs-next-99 #1
> [ 6323.754659] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [ 6323.756066] Call Trace:
> [ 6323.756375]  dump_stack_lvl+0x57/0x72
> [ 6323.756842]  check_noncircular+0xf3/0x110
> [ 6323.757341]  ? stack_trace_save+0x4b/0x70
> [ 6323.757837]  __lock_acquire+0x130e/0x2210
> [ 6323.758335]  lock_acquire+0xd7/0x310
> [ 6323.758769]  ? flush_workqueue+0x8b/0x5b0
> [ 6323.759258]  ? lockdep_init_map_type+0x51/0x260
> [ 6323.759822]  ? lockdep_init_map_type+0x51/0x260
> [ 6323.760382]  flush_workqueue+0xb5/0x5b0
> [ 6323.760867]  ? flush_workqueue+0x8b/0x5b0
> [ 6323.761367]  ? __mutex_unlock_slowpath+0x45/0x280
> [ 6323.761948]  drain_workqueue+0xa0/0x110
> [ 6323.762426]  destroy_workqueue+0x36/0x280
> [ 6323.762924]  __loop_clr_fd+0xb4/0x680 [loop]
> [ 6323.763465]  ? blkdev_ioctl+0xb5/0x320
> [ 6323.763935]  block_ioctl+0x48/0x50
> [ 6323.764356]  __x64_sys_ioctl+0x83/0xb0
> [ 6323.764828]  do_syscall_64+0x3b/0xc0
> [ 6323.765269]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 6323.765887] RIP: 0033:0x7fb0fe20dd87

generic/648, on latest misc-next (that has this patch integrated),
also triggers the same type of lockdep warning involving the same two
locks:

[19738.081729] ======================================================
[19738.082620] WARNING: possible circular locking dependency detected
[19738.083511] 5.15.0-rc2-btrfs-next-99 #1 Not tainted
[19738.084234] ------------------------------------------------------
[19738.085149] umount/508378 is trying to acquire lock:
[19738.085884] ffff97a34c161d48 ((wq_completion)loop0){+.+.}-{0:0},
at: flush_workqueue+0x8b/0x5b0
[19738.087180]
               but task is already holding lock:
[19738.088048] ffff97a31f64d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at:
__loop_clr_fd+0x5a/0x680 [loop]
[19738.089274]
               which lock already depends on the new lock.

[19738.090287]
               the existing dependency chain (in reverse order) is:
[19738.091216]
               -> #8 (&lo->lo_mutex){+.+.}-{3:3}:
[19738.091959]        __mutex_lock+0x92/0x900
[19738.092473]        lo_open+0x28/0x60 [loop]
[19738.093018]        blkdev_get_whole+0x28/0x90
[19738.093650]        blkdev_get_by_dev.part.0+0x142/0x320
[19738.094298]        blkdev_open+0x5e/0xa0
[19738.094790]        do_dentry_open+0x163/0x390
[19738.095425]        path_openat+0x3f0/0xa80
[19738.096041]        do_filp_open+0xa9/0x150
[19738.096657]        do_sys_openat2+0x97/0x160
[19738.097299]        __x64_sys_openat+0x54/0x90
[19738.097914]        do_syscall_64+0x3b/0xc0
[19738.098433]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[19738.099243]
               -> #7 (&disk->open_mutex){+.+.}-{3:3}:
[19738.100259]        __mutex_lock+0x92/0x900
[19738.100865]        blkdev_get_by_dev.part.0+0x56/0x320
[19738.101530]        swsusp_check+0x19/0x150
[19738.102046]        software_resume.part.0+0xb8/0x150
[19738.102678]        resume_store+0xaf/0xd0
[19738.103181]        kernfs_fop_write_iter+0x140/0x1e0
[19738.103799]        new_sync_write+0x122/0x1b0
[19738.104341]        vfs_write+0x29e/0x3d0
[19738.104831]        ksys_write+0x68/0xe0
[19738.105309]        do_syscall_64+0x3b/0xc0
[19738.105823]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[19738.106524]
               -> #6 (system_transition_mutex/1){+.+.}-{3:3}:
[19738.107393]        __mutex_lock+0x92/0x900
[19738.107911]        software_resume.part.0+0x18/0x150
[19738.108537]        resume_store+0xaf/0xd0
[19738.109057]        kernfs_fop_write_iter+0x140/0x1e0
[19738.109675]        new_sync_write+0x122/0x1b0
[19738.110218]        vfs_write+0x29e/0x3d0
[19738.110711]        ksys_write+0x68/0xe0
[19738.111190]        do_syscall_64+0x3b/0xc0
[19738.111699]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[19738.112388]
               -> #5 (&of->mutex){+.+.}-{3:3}:
[19738.113089]        __mutex_lock+0x92/0x900
[19738.113600]        kernfs_seq_start+0x2a/0xb0
[19738.114141]        seq_read_iter+0x101/0x4d0
[19738.114679]        new_sync_read+0x11b/0x1a0
[19738.115212]        vfs_read+0x128/0x1c0
[19738.115691]        ksys_read+0x68/0xe0
[19738.116159]        do_syscall_64+0x3b/0xc0
[19738.116670]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[19738.117382]
               -> #4 (&p->lock){+.+.}-{3:3}:
[19738.118062]        __mutex_lock+0x92/0x900
[19738.118580]        seq_read_iter+0x51/0x4d0
[19738.119102]        proc_reg_read_iter+0x48/0x80
[19738.119651]        generic_file_splice_read+0x102/0x1b0
[19738.120301]        splice_file_to_pipe+0xbc/0xd0
[19738.120879]        do_sendfile+0x14e/0x5a0
[19738.121389]        do_syscall_64+0x3b/0xc0
[19738.121901]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[19738.122597]
               -> #3 (&pipe->mutex/1){+.+.}-{3:3}:
[19738.123339]        __mutex_lock+0x92/0x900
[19738.123850]        iter_file_splice_write+0x98/0x440
[19738.124475]        do_splice+0x36b/0x880
[19738.124981]        __do_splice+0xde/0x160
[19738.125483]        __x64_sys_splice+0x92/0x110
[19738.126037]        do_syscall_64+0x3b/0xc0
[19738.126553]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[19738.127245]
               -> #2 (sb_writers#14){.+.+}-{0:0}:
[19738.127978]        lo_write_bvec+0xea/0x2a0 [loop]
[19738.128576]        loop_process_work+0x257/0xdb0 [loop]
[19738.129224]        process_one_work+0x24c/0x5b0
[19738.129789]        worker_thread+0x55/0x3c0
[19738.130311]        kthread+0x155/0x180
[19738.130783]        ret_from_fork+0x22/0x30
[19738.131296]
               -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
[19738.132262]        process_one_work+0x223/0x5b0
[19738.132827]        worker_thread+0x55/0x3c0
[19738.133365]        kthread+0x155/0x180
[19738.133834]        ret_from_fork+0x22/0x30
[19738.134350]
               -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
[19738.135153]        __lock_acquire+0x130e/0x2210
[19738.135715]        lock_acquire+0xd7/0x310
[19738.136224]        flush_workqueue+0xb5/0x5b0
[19738.136766]        drain_workqueue+0xa0/0x110
[19738.137308]        destroy_workqueue+0x36/0x280
[19738.137870]        __loop_clr_fd+0xb4/0x680 [loop]
[19738.138473]        blkdev_put+0xc7/0x220
[19738.138964]        close_fs_devices+0x95/0x220 [btrfs]
[19738.139685]        btrfs_close_devices+0x48/0x160 [btrfs]
[19738.140379]        generic_shutdown_super+0x74/0x110
[19738.141011]        kill_anon_super+0x14/0x30
[19738.141542]        btrfs_kill_super+0x12/0x20 [btrfs]
[19738.142189]        deactivate_locked_super+0x31/0xa0
[19738.142812]        cleanup_mnt+0x147/0x1c0
[19738.143322]        task_work_run+0x5c/0xa0
[19738.143831]        exit_to_user_mode_prepare+0x20c/0x210
[19738.144487]        syscall_exit_to_user_mode+0x27/0x60
[19738.145125]        do_syscall_64+0x48/0xc0
[19738.145636]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[19738.146466]
               other info that might help us debug this:

[19738.147602] Chain exists of:
                 (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

[19738.149221]  Possible unsafe locking scenario:

[19738.149952]        CPU0                    CPU1
[19738.150520]        ----                    ----
[19738.151082]   lock(&lo->lo_mutex);
[19738.151508]                                lock(&disk->open_mutex);
[19738.152276]                                lock(&lo->lo_mutex);
[19738.153010]   lock((wq_completion)loop0);
[19738.153510]
                *** DEADLOCK ***

[19738.154241] 4 locks held by umount/508378:
[19738.154756]  #0: ffff97a30dd9c0e8
(&type->s_umount_key#62){++++}-{3:3}, at: deactivate_super+0x2c/0x40
[19738.155900]  #1: ffffffffc0ac5f10 (uuid_mutex){+.+.}-{3:3}, at:
btrfs_close_devices+0x40/0x160 [btrfs]
[19738.157094]  #2: ffff97a31bc6d928 (&disk->open_mutex){+.+.}-{3:3},
at: blkdev_put+0x3a/0x220
[19738.158137]  #3: ffff97a31f64d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at:
__loop_clr_fd+0x5a/0x680 [loop]
[19738.159244]
               stack backtrace:
[19738.159784] CPU: 2 PID: 508378 Comm: umount Not tainted
5.15.0-rc2-btrfs-next-99 #1
[19738.160723] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[19738.162132] Call Trace:
[19738.162448]  dump_stack_lvl+0x57/0x72
[19738.162908]  check_noncircular+0xf3/0x110
[19738.163411]  __lock_acquire+0x130e/0x2210
[19738.163912]  lock_acquire+0xd7/0x310
[19738.164358]  ? flush_workqueue+0x8b/0x5b0
[19738.164859]  ? lockdep_init_map_type+0x51/0x260
[19738.165437]  ? lockdep_init_map_type+0x51/0x260
[19738.165999]  flush_workqueue+0xb5/0x5b0
[19738.166481]  ? flush_workqueue+0x8b/0x5b0
[19738.166990]  ? __mutex_unlock_slowpath+0x45/0x280
[19738.167574]  drain_workqueue+0xa0/0x110
[19738.168052]  destroy_workqueue+0x36/0x280
[19738.168551]  __loop_clr_fd+0xb4/0x680 [loop]
[19738.169084]  blkdev_put+0xc7/0x220
[19738.169510]  close_fs_devices+0x95/0x220 [btrfs]
[19738.170109]  btrfs_close_devices+0x48/0x160 [btrfs]
[19738.170745]  generic_shutdown_super+0x74/0x110
[19738.171300]  kill_anon_super+0x14/0x30
[19738.171760]  btrfs_kill_super+0x12/0x20 [btrfs]
[19738.172342]  deactivate_locked_super+0x31/0xa0
[19738.172880]  cleanup_mnt+0x147/0x1c0
[19738.173343]  task_work_run+0x5c/0xa0
[19738.173781]  exit_to_user_mode_prepare+0x20c/0x210
[19738.174381]  syscall_exit_to_user_mode+0x27/0x60
[19738.174957]  do_syscall_64+0x48/0xc0
[19738.175407]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[19738.176037] RIP: 0033:0x7f4d7104fee7
[19738.176487] Code: ff 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f
44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 ff 0b 00 f7 d8 64 89
01 48
[19738.178787] RSP: 002b:00007ffeca2fd758 EFLAGS: 00000246 ORIG_RAX:
00000000000000a6
[19738.179722] RAX: 0000000000000000 RBX: 00007f4d71175264 RCX: 00007f4d7104fee7
[19738.180601] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005615eb38bdd0
[19738.181496] RBP: 00005615eb38bba0 R08: 0000000000000000 R09: 00007ffeca2fc4d0
[19738.182376] R10: 00005615eb38bdf0 R11: 0000000000000246 R12: 0000000000000000
[19738.183249] R13: 00005615eb38bdd0 R14: 00005615eb38bcb0 R15: 0000000000000000

>
> >
> >
> > >         if (ret)
> > >                 goto error_undo;
> > >
> > > @@ -2215,7 +2211,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> > >         }
> > >
> > >  out:
> > > -       mutex_unlock(&uuid_mutex);
> > >         return ret;
> > >
> > >  error_undo:
> > > --
> > > 2.26.3
> > >
> >
> >
> > --
> > Filipe David Manana,
> >
> > “Whether you think you can, or you think you can't — you're right.”
>
>
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] btrfs: drop lockdep assert in close_fs_devices()
  2021-07-27 21:01 ` [PATCH v2 2/7] btrfs: do not take the uuid_mutex " Josef Bacik
                     ` (3 preceding siblings ...)
  2021-09-21 11:59   ` Filipe Manana
@ 2021-09-23  3:58   ` Anand Jain
  2021-09-23  4:04     ` Anand Jain
  4 siblings, 1 reply; 39+ messages in thread
From: Anand Jain @ 2021-09-23  3:58 UTC (permalink / raw)
  To: linux-btrfs; +Cc: josef, dsterba, fdmanana

btrfs/225 and btrfs/164 report warnings due to a lockdep assertion failure:

[ 5343.002752] ------------[ cut here ]------------
[ 5343.002756] WARNING: CPU: 3 PID: 797246 at fs/btrfs/volumes.c:1165
close_fs_devices+0x200/0x220 [btrfs]

[ 5343.002933] Call Trace:
[ 5343.002938]  btrfs_rm_device.cold+0x147/0x1c0 [btrfs]
[ 5343.002981]  btrfs_ioctl+0x2dc2/0x3460 [btrfs]
[ 5343.003021]  ? __do_sys_newstat+0x48/0x70
[ 5343.003028]  ? lock_is_held_type+0xe8/0x140
[ 5343.003034]  ? __x64_sys_ioctl+0x83/0xb0
[ 5343.003037]  __x64_sys_ioctl+0x83/0xb0
[ 5343.003042]  do_syscall_64+0x3b/0xc0
[ 5343.003045]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 5343.003048] RIP: 0033:0x7fe125a17d87

The patch [1] removed the uuid_mutex locking in btrfs_rm_device(). So now
the uuid_mutex is no longer held anywhere in the call chain leading to
close_fs_devices(), which has lockdep_assert_held(&uuid_mutex).

 [1]  [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device

The lockdep_assert_held(uuid_mutex) in close_fs_devices() was added by the
commit 425c6ed6486f (btrfs: do not hold device_list_mutex when closing
devices) as it found that device_list_mutex in close_fs_devices() was
redundant.

In the current code the lockdep_assert_held(&uuid_mutex) in close_fs_devices()
is incorrect, so remove it.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
David,
  Please feel free to either roll this into the patch "[PATCH v2 2/7] btrfs: do not
  take the uuid_mutex in btrfs_rm_device" or merge it as an independent patch.
 
 fs/btrfs/volumes.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9fea27b9f9be..ac4a9f349932 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1162,8 +1162,6 @@ static void close_fs_devices(struct btrfs_fs_devices *fs_devices)
 {
 	struct btrfs_device *device, *tmp;
 
-	lockdep_assert_held(&uuid_mutex);
-
 	if (--fs_devices->opened > 0)
 		return;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] btrfs: drop lockdep assert in close_fs_devices()
  2021-09-23  3:58   ` [PATCH] btrfs: drop lockdep assert in close_fs_devices() Anand Jain
@ 2021-09-23  4:04     ` Anand Jain
  0 siblings, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-09-23  4:04 UTC (permalink / raw)
  To: linux-btrfs; +Cc: josef, dsterba, fdmanana


Ignore this patch.

  The patch 1/7 in this series is not yet integrated. It will fix the issue.

  [PATCH v2 1/7] btrfs: do not call close_fs_devices in btrfs_rm_device

Thanks, Anand


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-22 15:33       ` Filipe Manana
@ 2021-09-23  4:15         ` Anand Jain
  0 siblings, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-09-23  4:15 UTC (permalink / raw)
  To: fdmanana, Josef Bacik; +Cc: linux-btrfs, kernel-team




> generic/648, on latest misc-next (that has this patch integrated),
> also triggers the same type of lockdep warning involving the same two
> locks:


  This lockdep warning is fixed by the yet-to-be-merged patch:

   [PATCH v2 3/7] btrfs: do not read super look for a device path


Thanks, Anand


> 
> [19738.081729] ======================================================
> [19738.082620] WARNING: possible circular locking dependency detected
> [19738.083511] 5.15.0-rc2-btrfs-next-99 #1 Not tainted
> [19738.084234] ------------------------------------------------------
> [19738.085149] umount/508378 is trying to acquire lock:
> [19738.085884] ffff97a34c161d48 ((wq_completion)loop0){+.+.}-{0:0},
> at: flush_workqueue+0x8b/0x5b0
> [19738.087180]
>                 but task is already holding lock:
> [19738.088048] ffff97a31f64d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at:
> __loop_clr_fd+0x5a/0x680 [loop]
> [19738.089274]
>                 which lock already depends on the new lock.
> 
> [19738.090287]
>                 the existing dependency chain (in reverse order) is:
> [19738.091216]
>                 -> #8 (&lo->lo_mutex){+.+.}-{3:3}:
> [19738.091959]        __mutex_lock+0x92/0x900
> [19738.092473]        lo_open+0x28/0x60 [loop]
> [19738.093018]        blkdev_get_whole+0x28/0x90
> [19738.093650]        blkdev_get_by_dev.part.0+0x142/0x320
> [19738.094298]        blkdev_open+0x5e/0xa0
> [19738.094790]        do_dentry_open+0x163/0x390
> [19738.095425]        path_openat+0x3f0/0xa80
> [19738.096041]        do_filp_open+0xa9/0x150
> [19738.096657]        do_sys_openat2+0x97/0x160
> [19738.097299]        __x64_sys_openat+0x54/0x90
> [19738.097914]        do_syscall_64+0x3b/0xc0
> [19738.098433]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [19738.099243]
>                 -> #7 (&disk->open_mutex){+.+.}-{3:3}:
> [19738.100259]        __mutex_lock+0x92/0x900
> [19738.100865]        blkdev_get_by_dev.part.0+0x56/0x320
> [19738.101530]        swsusp_check+0x19/0x150
> [19738.102046]        software_resume.part.0+0xb8/0x150
> [19738.102678]        resume_store+0xaf/0xd0
> [19738.103181]        kernfs_fop_write_iter+0x140/0x1e0
> [19738.103799]        new_sync_write+0x122/0x1b0
> [19738.104341]        vfs_write+0x29e/0x3d0
> [19738.104831]        ksys_write+0x68/0xe0
> [19738.105309]        do_syscall_64+0x3b/0xc0
> [19738.105823]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [19738.106524]
>                 -> #6 (system_transition_mutex/1){+.+.}-{3:3}:
> [19738.107393]        __mutex_lock+0x92/0x900
> [19738.107911]        software_resume.part.0+0x18/0x150
> [19738.108537]        resume_store+0xaf/0xd0
> [19738.109057]        kernfs_fop_write_iter+0x140/0x1e0
> [19738.109675]        new_sync_write+0x122/0x1b0
> [19738.110218]        vfs_write+0x29e/0x3d0
> [19738.110711]        ksys_write+0x68/0xe0
> [19738.111190]        do_syscall_64+0x3b/0xc0
> [19738.111699]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [19738.112388]
>                 -> #5 (&of->mutex){+.+.}-{3:3}:
> [19738.113089]        __mutex_lock+0x92/0x900
> [19738.113600]        kernfs_seq_start+0x2a/0xb0
> [19738.114141]        seq_read_iter+0x101/0x4d0
> [19738.114679]        new_sync_read+0x11b/0x1a0
> [19738.115212]        vfs_read+0x128/0x1c0
> [19738.115691]        ksys_read+0x68/0xe0
> [19738.116159]        do_syscall_64+0x3b/0xc0
> [19738.116670]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [19738.117382]
>                 -> #4 (&p->lock){+.+.}-{3:3}:
> [19738.118062]        __mutex_lock+0x92/0x900
> [19738.118580]        seq_read_iter+0x51/0x4d0
> [19738.119102]        proc_reg_read_iter+0x48/0x80
> [19738.119651]        generic_file_splice_read+0x102/0x1b0
> [19738.120301]        splice_file_to_pipe+0xbc/0xd0
> [19738.120879]        do_sendfile+0x14e/0x5a0
> [19738.121389]        do_syscall_64+0x3b/0xc0
> [19738.121901]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [19738.122597]
>                 -> #3 (&pipe->mutex/1){+.+.}-{3:3}:
> [19738.123339]        __mutex_lock+0x92/0x900
> [19738.123850]        iter_file_splice_write+0x98/0x440
> [19738.124475]        do_splice+0x36b/0x880
> [19738.124981]        __do_splice+0xde/0x160
> [19738.125483]        __x64_sys_splice+0x92/0x110
> [19738.126037]        do_syscall_64+0x3b/0xc0
> [19738.126553]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [19738.127245]
>                 -> #2 (sb_writers#14){.+.+}-{0:0}:
> [19738.127978]        lo_write_bvec+0xea/0x2a0 [loop]
> [19738.128576]        loop_process_work+0x257/0xdb0 [loop]
> [19738.129224]        process_one_work+0x24c/0x5b0
> [19738.129789]        worker_thread+0x55/0x3c0
> [19738.130311]        kthread+0x155/0x180
> [19738.130783]        ret_from_fork+0x22/0x30
> [19738.131296]
>                 -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
> [19738.132262]        process_one_work+0x223/0x5b0
> [19738.132827]        worker_thread+0x55/0x3c0
> [19738.133365]        kthread+0x155/0x180
> [19738.133834]        ret_from_fork+0x22/0x30
> [19738.134350]
>                 -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
> [19738.135153]        __lock_acquire+0x130e/0x2210
> [19738.135715]        lock_acquire+0xd7/0x310
> [19738.136224]        flush_workqueue+0xb5/0x5b0
> [19738.136766]        drain_workqueue+0xa0/0x110
> [19738.137308]        destroy_workqueue+0x36/0x280
> [19738.137870]        __loop_clr_fd+0xb4/0x680 [loop]
> [19738.138473]        blkdev_put+0xc7/0x220
> [19738.138964]        close_fs_devices+0x95/0x220 [btrfs]
> [19738.139685]        btrfs_close_devices+0x48/0x160 [btrfs]
> [19738.140379]        generic_shutdown_super+0x74/0x110
> [19738.141011]        kill_anon_super+0x14/0x30
> [19738.141542]        btrfs_kill_super+0x12/0x20 [btrfs]
> [19738.142189]        deactivate_locked_super+0x31/0xa0
> [19738.142812]        cleanup_mnt+0x147/0x1c0
> [19738.143322]        task_work_run+0x5c/0xa0
> [19738.143831]        exit_to_user_mode_prepare+0x20c/0x210
> [19738.144487]        syscall_exit_to_user_mode+0x27/0x60
> [19738.145125]        do_syscall_64+0x48/0xc0
> [19738.145636]        entry_SYSCALL_64_after_hwframe+0x44/0xae
> [19738.146466]
>                 other info that might help us debug this:
> 
> [19738.147602] Chain exists of:
>                   (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> 
> [19738.149221]  Possible unsafe locking scenario:
> 
> [19738.149952]        CPU0                    CPU1
> [19738.150520]        ----                    ----
> [19738.151082]   lock(&lo->lo_mutex);
> [19738.151508]                                lock(&disk->open_mutex);
> [19738.152276]                                lock(&lo->lo_mutex);
> [19738.153010]   lock((wq_completion)loop0);
> [19738.153510]
>                  *** DEADLOCK ***
> 
> [19738.154241] 4 locks held by umount/508378:
> [19738.154756]  #0: ffff97a30dd9c0e8
> (&type->s_umount_key#62){++++}-{3:3}, at: deactivate_super+0x2c/0x40
> [19738.155900]  #1: ffffffffc0ac5f10 (uuid_mutex){+.+.}-{3:3}, at:
> btrfs_close_devices+0x40/0x160 [btrfs]
> [19738.157094]  #2: ffff97a31bc6d928 (&disk->open_mutex){+.+.}-{3:3},
> at: blkdev_put+0x3a/0x220
> [19738.158137]  #3: ffff97a31f64d4a0 (&lo->lo_mutex){+.+.}-{3:3}, at:
> __loop_clr_fd+0x5a/0x680 [loop]
> [19738.159244]
>                 stack backtrace:
> [19738.159784] CPU: 2 PID: 508378 Comm: umount Not tainted
> 5.15.0-rc2-btrfs-next-99 #1
> [19738.160723] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [19738.162132] Call Trace:
> [19738.162448]  dump_stack_lvl+0x57/0x72
> [19738.162908]  check_noncircular+0xf3/0x110
> [19738.163411]  __lock_acquire+0x130e/0x2210
> [19738.163912]  lock_acquire+0xd7/0x310
> [19738.164358]  ? flush_workqueue+0x8b/0x5b0
> [19738.164859]  ? lockdep_init_map_type+0x51/0x260
> [19738.165437]  ? lockdep_init_map_type+0x51/0x260
> [19738.165999]  flush_workqueue+0xb5/0x5b0
> [19738.166481]  ? flush_workqueue+0x8b/0x5b0
> [19738.166990]  ? __mutex_unlock_slowpath+0x45/0x280
> [19738.167574]  drain_workqueue+0xa0/0x110
> [19738.168052]  destroy_workqueue+0x36/0x280
> [19738.168551]  __loop_clr_fd+0xb4/0x680 [loop]
> [19738.169084]  blkdev_put+0xc7/0x220
> [19738.169510]  close_fs_devices+0x95/0x220 [btrfs]
> [19738.170109]  btrfs_close_devices+0x48/0x160 [btrfs]
> [19738.170745]  generic_shutdown_super+0x74/0x110
> [19738.171300]  kill_anon_super+0x14/0x30
> [19738.171760]  btrfs_kill_super+0x12/0x20 [btrfs]
> [19738.172342]  deactivate_locked_super+0x31/0xa0
> [19738.172880]  cleanup_mnt+0x147/0x1c0
> [19738.173343]  task_work_run+0x5c/0xa0
> [19738.173781]  exit_to_user_mode_prepare+0x20c/0x210
> [19738.174381]  syscall_exit_to_user_mode+0x27/0x60
> [19738.174957]  do_syscall_64+0x48/0xc0
> [19738.175407]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [19738.176037] RIP: 0033:0x7f4d7104fee7
> [19738.176487] Code: ff 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f
> 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 ff 0b 00 f7 d8 64 89
> 01 48
> [19738.178787] RSP: 002b:00007ffeca2fd758 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000a6
> [19738.179722] RAX: 0000000000000000 RBX: 00007f4d71175264 RCX: 00007f4d7104fee7
> [19738.180601] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005615eb38bdd0
> [19738.181496] RBP: 00005615eb38bba0 R08: 0000000000000000 R09: 00007ffeca2fc4d0
> [19738.182376] R10: 00005615eb38bdf0 R11: 0000000000000246 R12: 0000000000000000
> [19738.183249] R13: 00005615eb38bdd0 R14: 00005615eb38bcb0 R15: 0000000000000000

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 2/7] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-09-20  9:41       ` Anand Jain
@ 2021-09-23  4:33         ` Anand Jain
  0 siblings, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-09-23  4:33 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik, David Sterba, dsterba, kernel-team



On 20/09/2021 17:41, Anand Jain wrote:
> 
> 
> On 20/09/2021 16:26, David Sterba wrote:
>> On Mon, Sep 20, 2021 at 03:45:14PM +0800, Anand Jain wrote:
>>>
>>> This patch is causing btrfs/225 to fail [here].
>>>
>>> ------
>>> static void close_fs_devices(struct btrfs_fs_devices *fs_devices)
>>> {
>>>           struct btrfs_device *device, *tmp;
>>>
>>>           lockdep_assert_held(&uuid_mutex);  <--- here
>>> -------
>>>
>>> as this patch removed mutex_lock(&uuid_mutex) in btrfs_rm_device().
>>>
>>>
>>> commit 425c6ed6486f (btrfs: do not hold device_list_mutex when closing
>>> devices) added lockdep_assert_held(&uuid_mutex) in close_fs_devices().
>>>
>>>
>>> But mutex_lock(&uuid_mutex) in btrfs_rm_device() is not essential as we
>>> discussed/proved earlier.
>>>
>>> Remove lockdep_assert_held(&uuid_mutex) in close_fs_devices() is a
>>> better choice.
>>
>> This is the other patch that's still not in misc-next. I merged the
>> branch partially and in a different order so that causes the lockdep
>> warning. I can remove the patch "btrfs: do not take the uuid_mutex in
>> btrfs_rm_device" from misc-next for now and merge the whole series in
>> the order as sent but there were comments so I'm waiting for an update.
> 
> Ha ha. I think you are confused, even I was. The problem assert is at 
> close_fs_devices() not clone_fs_devices() (as in 7/7). They are 
> similarly named.
> 
> A variant of 7/7 is already merged.
> c124706900c2 btrfs: fix lockdep warning while mounting sprout fs

  Oops, it is patch 1/7 that is not merged; that is what David meant.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/7] btrfs: do not read super look for a device path
  2021-08-25  2:00   ` Anand Jain
@ 2021-09-27 15:32     ` Josef Bacik
  2021-09-28 11:50       ` Anand Jain
  0 siblings, 1 reply; 39+ messages in thread
From: Josef Bacik @ 2021-09-27 15:32 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs, kernel-team, David Sterba

On 8/24/21 10:00 PM, Anand Jain wrote:
> On 28/07/2021 05:01, Josef Bacik wrote:
>> For device removal and replace we call btrfs_find_device_by_devspec,
>> which if we give it a device path and nothing else will call
>> btrfs_find_device_by_path, which opens the block device and reads the
>> super block and then looks up our device based on that.
>>
>> However this is completely unnecessary because we have the path stored
>> in our device on our fsdevices.  All we need to do if we're given a path
>> is look through the fs_devices on our file system and use that device if
>> we find it, reading the super block is just silly.
> 
> The device path as stored in our fs_devices can differ from the path
> provided by the user for the same device (for example, dm, lvm).
> 
> btrfs-progs sanitizes the device path, but others (for example, an ioctl
> test case) might not, and then the path lookup would fail.
> 
> Also, btrfs dev scan <path> can update the device path at any time, even
> after the filesystem is mounted. Fixing that broke the subsequent
> subvolume mounts (if I remember correctly).
> 

This is a good point, that's kind of a big deal from a UX perspective.

>> This fixes the case where we end up with our sb write "lock" getting the
>> dependency of the block device ->open_mutex, which resulted in the
>> following lockdep splat
> 
> Can we do..
> 
> btrfs_exclop_start()
>   ::
> find device part (read sb)
>   ::
> mnt_want_write_file()?
> 
> 

I looked into this, but we'd have to re-order exclop_start to above the 
mnt_want_write_file() part everywhere to be consistent, and this is mostly OK 
except for balance.  For balance, the exclop is tied to the lifetime of the 
balance ctl, which can outlive the task running the balance because the balance 
can be paused.

Could we get around this?  Sure, but in my head exclop == lock.  This means we 
have something akin to

exclop_start
mnt_want_write_file()

pause balance
mnt_drop_write()

resume balance

exclop_start magic stuff in balance to resume without doing the exclop
mnt_want_write_file()
<do balance>
exclop_finish
mnt_drop_write()

If we're OK with this then we can definitely do that.

The other option is simply to make userspace do the superblock read and use the 
devid thing for us.  Then we just eat the UX problem for older tools where you 
want to do btrfs rm device /dev/mapper/whatever and we have the pathname as 
/dev/dm-#.

Both options are unattractive in their own way.  I think the first option is 
only annoying to us, and maintains the UX expectations.  But I want more than me 
to make this decision, so if you and Dave are OK with that I'll go with 
re-ordering exclop+mnt_want_write_file(), and then put the device lookup between 
the two of them for device removal.  Thanks,

Josef


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 3/7] btrfs: do not read super look for a device path
  2021-09-27 15:32     ` Josef Bacik
@ 2021-09-28 11:50       ` Anand Jain
  0 siblings, 0 replies; 39+ messages in thread
From: Anand Jain @ 2021-09-28 11:50 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, David Sterba

On 27/09/2021 23:32, Josef Bacik wrote:
> On 8/24/21 10:00 PM, Anand Jain wrote:
>> On 28/07/2021 05:01, Josef Bacik wrote:
>>> For device removal and replace we call btrfs_find_device_by_devspec,
>>> which if we give it a device path and nothing else will call
>>> btrfs_find_device_by_path, which opens the block device and reads the
>>> super block and then looks up our device based on that.
>>>
>>> However this is completely unnecessary because we have the path stored
>>> in our device on our fsdevices.  All we need to do if we're given a path
>>> is look through the fs_devices on our file system and use that device if
>>> we find it, reading the super block is just silly.
>>
>> The device path as stored in our fs_devices can differ from the path
>> provided by the user for the same device (for example, dm, lvm).
>>
>> btrfs-progs sanitize the device path but, others (for example, an ioctl
>> test case) might not. And the path lookup would fail.
>>
>> Also, btrfs dev scan <path> can update the device path anytime, even
>> after it is mounted. Fixing that failed the subsequent subvolume mounts
>> (if I remember correctly).
>>
> 
> This is a good point, that's kind of a big deal from a UX perspective.
> 
>>> This fixes the case where we end up with our sb write "lock" getting the
>>> dependency of the block device ->open_mutex, which resulted in the
>>> following lockdep splat
>>
>> Can we do..
>>
>> btrfs_exclop_start()
>>   ::
>> find device part (read sb)
>>   ::
>> mnt_want_write_file()?
>>
>>
> 


> I looked into this, but we'd have to re-order the exclop_start to above 
> the mnt_want_write_file() part everywhere to be consistent, and this is 
> mostly OK except for balance.  Balance the exclop is tied to the 
> lifetime of the balance ctl, which can exist past the task running 
> balance because we could pause the balance.
> 
> Could we get around this?  Sure, but in my head exclop == lock.  This 
> means we have something akin to
> 
> exclop_start
> mnt_want_write_file()
> 
> pause balance
> mnt_drop_write()
> 
> resume balance
> 
> exclop_start magic stuff in balance to resume without doing the exclop
> mnt_want_write_file()
> <do balance>
> exclop_finish
> mnt_drop_write()
> 
> If we're OK with this then we can definitely do that.

This is getting complex. IMO.

> The other option is simply to make userspace do the superblock read and 
> use the devid thing for us.  Then we just eat the UX problem for older 
> tools where you want to do btrfs rm device /dev/mapper/whatever and we 
> have the pathname as /dev/dm-#.


> Both options are unattractive in their own way. 

I agree.

> I think the first 
> option is only annoying to us, and maintains the UX expectations.  But I 
> want more than me to make this decision, so if you and Dave are OK with 
> that I'll go with re-ordering exclop+mnt_want_write_file(), and then put 
> the device lookup between the two of them for device removal.  Thanks,


There is a 3rd option.

Here, the root of the problem is that we read the superblock after 
mnt_want_write_file(), i.e. while we hold write access on the mount.

So to avoid this, can we read the sb -> devid before mnt_want_write_file(), 
and use the devid later on?

But when we read the sb we hold neither mnt_want_write() nor 
exclop_start..().

If the devid we read is stale, btrfs_rm_device() will still verify it at a 
later stage.

I experimented with this option; the diff is here [1]. This change needs a 
lot of cleanup, and I did not copy the same logic to btrfs_ioctl_rm_dev() 
or try to merge the two. This diff passed the -g volume test cases with no 
new regressions, but I wasn't able to reproduce the original issue for 
which we wrote this patch.



[1]

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9eb0c1eb568e..e9c6bd05abf9 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3164,15 +3164,13 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
        struct block_device *bdev = NULL;
        fmode_t mode;
        int ret;
+       bool cancel_or_missing = false;
        bool cancel = false;
+       u64 devid;

        if (!capable(CAP_SYS_ADMIN))
                return -EPERM;

-       ret = mnt_want_write_file(file);
-       if (ret)
-               return ret;
-
        vol_args = memdup_user(arg, sizeof(*vol_args));
        if (IS_ERR(vol_args)) {
                ret = PTR_ERR(vol_args);
@@ -3184,9 +3182,32 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
                goto out;
        }
        vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
-       if (!(vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) &&
-           strcmp("cancel", vol_args->name) == 0)
-               cancel = true;
+       if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
+               devid = vol_args->devid;
+       else {
+               if (strcmp("cancel", vol_args->name) == 0) {
+                       cancel_or_missing = true;
+                       cancel = true;
+               } else if (!strcmp("missing", vol_args->name))
+                       cancel_or_missing = true;
+               else {
+                       struct block_device *bdev;
+                       struct btrfs_super_block *disk_super;
+
+                       ret = btrfs_get_bdev_and_sb(vol_args->name, FMODE_READ,
+                                                   fs_info->bdev_holder, 0,
+                                                   &bdev, &disk_super);
+                       if (ret)
+                               goto out;
+                       devid = btrfs_stack_device_id(&disk_super->dev_item);
+                       btrfs_release_disk_super(disk_super);
+                       blkdev_put(bdev, FMODE_READ);
+               }
+       }
+
+       ret = mnt_want_write_file(file);
+       if (ret)
+               goto out;

        ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
                                           cancel);
@@ -3194,10 +3215,10 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
                goto out;
        /* Exclusive operation is now claimed */

-       if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
-               ret = btrfs_rm_device(fs_info, NULL, vol_args->devid, &bdev, &mode);
-       else
+       if (cancel_or_missing)
                ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev, &mode);
+       else
+               ret = btrfs_rm_device(fs_info, NULL, devid, &bdev, &mode);

        btrfs_exclop_finish(fs_info);

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6ade80bae3a5..85ae7294cea2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -493,7 +493,7 @@ static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
 }

-static int
+int
 btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
                      int flush, struct block_device **bdev,
                      struct btrfs_super_block **disk_super)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c7ac43d8a7e8..fa1d1faa70d4 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -502,6 +502,9 @@ void btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
 void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices);
 void btrfs_assign_next_active_device(struct btrfs_device *device,
                                     struct btrfs_device *this_dev);
+int btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
+                         int flush, struct block_device **bdev,
+                         struct btrfs_super_block **disk_super);
 struct btrfs_device *btrfs_find_device_by_devspec(struct btrfs_fs_info *fs_info,
                                                  u64 devid,
                                                  const char *devpath);



Thanks, Anand



^ permalink raw reply related	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2021-09-28 11:50 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-27 21:01 [PATCH v2 0/7] Josef Bacik
2021-07-27 21:01 ` [PATCH v2 1/7] btrfs: do not call close_fs_devices in btrfs_rm_device Josef Bacik
2021-09-01  8:13   ` Anand Jain
2021-07-27 21:01 ` [PATCH v2 2/7] btrfs: do not take the uuid_mutex " Josef Bacik
2021-09-01 12:01   ` Anand Jain
2021-09-01 17:08     ` David Sterba
2021-09-01 17:10     ` Josef Bacik
2021-09-01 19:49       ` Anand Jain
2021-09-02 12:58   ` David Sterba
2021-09-02 14:10     ` Josef Bacik
2021-09-17 14:33       ` David Sterba
2021-09-20  7:45   ` Anand Jain
2021-09-20  8:26     ` David Sterba
2021-09-20  9:41       ` Anand Jain
2021-09-23  4:33         ` Anand Jain
2021-09-21 11:59   ` Filipe Manana
2021-09-21 12:17     ` Filipe Manana
2021-09-22 15:33       ` Filipe Manana
2021-09-23  4:15         ` Anand Jain
2021-09-23  3:58   ` [PATCH] btrfs: drop lockdep assert in close_fs_devices() Anand Jain
2021-09-23  4:04     ` Anand Jain
2021-07-27 21:01 ` [PATCH v2 3/7] btrfs: do not read super look for a device path Josef Bacik
2021-08-25  2:00   ` Anand Jain
2021-09-27 15:32     ` Josef Bacik
2021-09-28 11:50       ` Anand Jain
2021-07-27 21:01 ` [PATCH v2 4/7] btrfs: update the bdev time directly when closing Josef Bacik
2021-08-25  0:35   ` Anand Jain
2021-09-02 12:16   ` David Sterba
2021-07-27 21:01 ` [PATCH v2 5/7] btrfs: delay blkdev_put until after the device remove Josef Bacik
2021-08-25  1:00   ` Anand Jain
2021-09-02 12:16   ` David Sterba
2021-07-27 21:01 ` [PATCH v2 6/7] btrfs: unify common code for the v1 and v2 versions of " Josef Bacik
2021-08-25  1:19   ` Anand Jain
2021-09-01 14:05   ` Nikolay Borisov
2021-07-27 21:01 ` [PATCH v2 7/7] btrfs: do not take the device_list_mutex in clone_fs_devices Josef Bacik
2021-08-24 22:08   ` Anand Jain
2021-09-01 13:35   ` Nikolay Borisov
2021-09-02 12:59   ` David Sterba
2021-09-17 15:06 ` [PATCH v2 0/7] David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.