All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Fix lockdep issues around device removal
@ 2021-07-27 19:47 Josef Bacik
  2021-07-27 19:47 ` [PATCH 1/6] btrfs: do not check for ->num_devices == 0 in rm_device Josef Bacik
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Josef Bacik @ 2021-07-27 19:47 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

The commit 87579e9b7d8d ("loop: use worker per cgroup instead of kworker")
enabled the use of workqueues for loopback devices, which brought with it
lockdep annotations for the workqueues for loopback devices.  This uncovered a
cascade of lockdep warnings because of how we mess with the block_device while
under the sb writers lock while doing the device removal.

The first patch seems innocuous but we have a lockdep_assert_held(&uuid_mutex)
in one of the helpers, which is why I have it first.  The code should never be
called which is why it is removed, but I'm removing it specifically to remove
confusion about the role of the uuid_mutex here.

The next 4 patches are to resolve the lockdep messages as they occur.  There are
several issues and I address them one at a time until we're no longer getting
lockdep warnings.

The final patch doesn't necessarily have to go in right away, it's just a
cleanup as I noticed we have a lot of duplicated code between the v1 and v2
device removal handling.  Thanks,

Josef

Josef Bacik (6):
  btrfs: do not check for ->num_devices == 0 in rm_device
  btrfs: do not take the uuid_mutex in btrfs_rm_device
  btrfs: do not read super look for a device path
  btrfs: update the bdev time directly when closing
  btrfs: delay blkdev_put until after the device remove
  btrfs: unify common code for the v1 and v2 versions of device remove

 fs/btrfs/ioctl.c   |  92 +++++++++++++++++--------------------
 fs/btrfs/volumes.c | 110 ++++++++++++++++++++-------------------------
 fs/btrfs/volumes.h |   3 +-
 3 files changed, 90 insertions(+), 115 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/6] btrfs: do not check for ->num_devices == 0 in rm_device
  2021-07-27 19:47 [PATCH 0/6] Fix lockdep issues around device removal Josef Bacik
@ 2021-07-27 19:47 ` Josef Bacik
  2021-07-27 22:35   ` Anand Jain
  2021-07-27 19:47 ` [PATCH 2/6] btrfs: do not take the uuid_mutex in btrfs_rm_device Josef Bacik
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2021-07-27 19:47 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We're specifically prohibited from removing the last device in a file
system, so this check is meaningless and the code can be deleted.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 86846d6e58d0..373be4e54f28 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2199,13 +2199,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	btrfs_close_bdev(device);
 	synchronize_rcu();
 	btrfs_free_device(device);
-
-	if (cur_devices->open_devices == 0) {
-		list_del_init(&cur_devices->seed_list);
-		close_fs_devices(cur_devices);
-		free_fs_devices(cur_devices);
-	}
-
 out:
 	mutex_unlock(&uuid_mutex);
 	return ret;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/6] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-07-27 19:47 [PATCH 0/6] Fix lockdep issues around device removal Josef Bacik
  2021-07-27 19:47 ` [PATCH 1/6] btrfs: do not check for ->num_devices == 0 in rm_device Josef Bacik
@ 2021-07-27 19:47 ` Josef Bacik
  2021-07-28  0:15   ` Anand Jain
  2021-07-27 19:47 ` [PATCH 3/6] btrfs: do not read super look for a device path Josef Bacik
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2021-07-27 19:47 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We got the following lockdep splat while running xfstests (specifically
btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
converted loop to using workqueues, which comes with lockdep
annotations that don't exist with kworkers.  The lockdep splat is as
follows

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2-custom+ #34 Not tainted
------------------------------------------------------
losetup/156417 is trying to acquire lock:
ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600

but task is already holding lock:
ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0xba/0x7c0
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x28/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x163/0x3a0
       path_openat+0x74d/0xa40
       do_filp_open+0x9c/0x140
       do_sys_openat2+0xb1/0x170
       __x64_sys_openat+0x54/0x90
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #4 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0xba/0x7c0
       blkdev_get_by_dev.part.0+0xd1/0x3c0
       blkdev_get_by_path+0xc0/0xd0
       btrfs_scan_one_device+0x52/0x1f0 [btrfs]
       btrfs_control_ioctl+0xac/0x170 [btrfs]
       __x64_sys_ioctl+0x83/0xb0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (uuid_mutex){+.+.}-{3:3}:
       __mutex_lock+0xba/0x7c0
       btrfs_rm_device+0x48/0x6a0 [btrfs]
       btrfs_ioctl+0x2d1c/0x3110 [btrfs]
       __x64_sys_ioctl+0x83/0xb0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#11){.+.+}-{0:0}:
       lo_write_bvec+0x112/0x290 [loop]
       loop_process_work+0x25f/0xcb0 [loop]
       process_one_work+0x28f/0x5d0
       worker_thread+0x55/0x3c0
       kthread+0x140/0x170
       ret_from_fork+0x22/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x266/0x5d0
       worker_thread+0x55/0x3c0
       kthread+0x140/0x170
       ret_from_fork+0x22/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x1130/0x1dc0
       lock_acquire+0xf5/0x320
       flush_workqueue+0xae/0x600
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x650 [loop]
       lo_ioctl+0x29d/0x780 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x83/0xb0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:
Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***
1 lock held by losetup/156417:
 #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]

stack backtrace:
CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0x10a/0x120
 __lock_acquire+0x1130/0x1dc0
 lock_acquire+0xf5/0x320
 ? flush_workqueue+0x84/0x600
 flush_workqueue+0xae/0x600
 ? flush_workqueue+0x84/0x600
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x650 [loop]
 lo_ioctl+0x29d/0x780 [loop]
 ? __lock_acquire+0x3a0/0x1dc0
 ? update_dl_rq_load_avg+0x152/0x360
 ? lock_is_held_type+0xa5/0x120
 ? find_held_lock.constprop.0+0x2b/0x80
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x83/0xb0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f645884de6b

Usually the uuid_mutex exists to protect the fs_devices that map
together all of the devices that match a specific uuid.  In rm_device
we're messing with the uuid of a device, so it makes sense to protect
that here.

However in doing that it pulls in a whole host of lockdep dependencies,
as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus
we end up with the dependency chain under the uuid_mutex being added
under the normal sb write dependency chain, which causes problems with
loop devices.

We don't need the uuid mutex here however.  If we call
btrfs_scan_one_device() before we scratch the super block we will find
the fs_devices and not find the device itself and return EBUSY because
the fs_devices is open.  If we call it after the scratch happens it will
not appear to be a valid btrfs file system.

We do not need to worry about other fs_devices modifying operations here
because we're protected by the exclusive operations locking.

So drop the uuid_mutex here in order to fix the lockdep splat.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 373be4e54f28..62cb7d39435c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2082,8 +2082,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	u64 num_devices;
 	int ret = 0;
 
-	mutex_lock(&uuid_mutex);
-
 	num_devices = btrfs_num_devices(fs_info);
 
 	ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
@@ -2127,11 +2125,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 		mutex_unlock(&fs_info->chunk_mutex);
 	}
 
-	mutex_unlock(&uuid_mutex);
 	ret = btrfs_shrink_device(device, 0);
 	if (!ret)
 		btrfs_reada_remove_dev(device);
-	mutex_lock(&uuid_mutex);
 	if (ret)
 		goto error_undo;
 
@@ -2200,7 +2196,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	synchronize_rcu();
 	btrfs_free_device(device);
 out:
-	mutex_unlock(&uuid_mutex);
 	return ret;
 
 error_undo:
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/6] btrfs: do not read super look for a device path
  2021-07-27 19:47 [PATCH 0/6] Fix lockdep issues around device removal Josef Bacik
  2021-07-27 19:47 ` [PATCH 1/6] btrfs: do not check for ->num_devices == 0 in rm_device Josef Bacik
  2021-07-27 19:47 ` [PATCH 2/6] btrfs: do not take the uuid_mutex in btrfs_rm_device Josef Bacik
@ 2021-07-27 19:47 ` Josef Bacik
  2021-07-27 19:47 ` [PATCH 4/6] btrfs: update the bdev time directly when closing Josef Bacik
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2021-07-27 19:47 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

For device removal and replace we call btrfs_find_device_by_devspec,
which if we give it a device path and nothing else will call
btrfs_find_device_by_path, which opens the block device and reads the
super block and then looks up our device based on that.

However this is completely unnecessary because we have the path stored
in our device on our fsdevices.  All we need to do if we're given a path
is look through the fs_devices on our file system and use that device if
we find it, reading the super block is just silly.

This fixes the case where we end up with our sb write "lock" getting the
dependency of the block device ->open_mutex, which resulted in the
following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #405 Not tainted
------------------------------------------------------
losetup/11576 is trying to acquire lock:
ffff9bbe8cded938 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_get_by_dev.part.0+0x56/0x3c0
       blkdev_get_by_path+0x98/0xa0
       btrfs_get_bdev_and_sb+0x1b/0xb0
       btrfs_find_device_by_devspec+0x12b/0x1c0
       btrfs_rm_device+0x127/0x610
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***

1 lock held by losetup/11576:
 #0: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 0 PID: 11576 Comm: losetup Not tainted 5.14.0-rc2+ #405
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0xcf/0xf0
 ? stack_trace_save+0x3b/0x50
 __lock_acquire+0x10ea/0x1d90
 lock_acquire+0xb5/0x2b0
 ? flush_workqueue+0x67/0x5e0
 ? lockdep_init_map_type+0x47/0x220
 flush_workqueue+0x91/0x5e0
 ? flush_workqueue+0x67/0x5e0
 ? verify_cpu+0xf0/0x100
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x660 [loop]
 ? blkdev_ioctl+0x8d/0x2a0
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x80/0xb0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f31b02404cb

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 61 +++++++++++++++++-----------------------------
 1 file changed, 23 insertions(+), 38 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 62cb7d39435c..cf4b9978963e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2298,37 +2298,22 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev)
 	btrfs_free_device(tgtdev);
 }
 
-static struct btrfs_device *btrfs_find_device_by_path(
-		struct btrfs_fs_info *fs_info, const char *device_path)
+static struct btrfs_device *find_device_by_path(
+					struct btrfs_fs_devices *fs_devices,
+					const char *path)
 {
-	int ret = 0;
-	struct btrfs_super_block *disk_super;
-	u64 devid;
-	u8 *dev_uuid;
-	struct block_device *bdev;
 	struct btrfs_device *device;
+	bool missing = !strcmp(path, "missing");
 
-	ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
-				    fs_info->bdev_holder, 0, &bdev, &disk_super);
-	if (ret)
-		return ERR_PTR(ret);
-
-	devid = btrfs_stack_device_id(&disk_super->dev_item);
-	dev_uuid = disk_super->dev_item.uuid;
-	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
-		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
-					   disk_super->metadata_uuid);
-	else
-		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
-					   disk_super->fsid);
-
-	btrfs_release_disk_super(disk_super);
-	if (!device)
-		device = ERR_PTR(-ENOENT);
-	blkdev_put(bdev, FMODE_READ);
-	return device;
+	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+		if (missing && test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
+					&device->dev_state) && !device->bdev)
+			return device;
+		if (!missing && device_path_matched(path, device))
+			return device;
+	}
+	return NULL;
 }
-
 /*
  * Lookup a device given by device id, or the path if the id is 0.
  */
@@ -2336,6 +2321,7 @@ struct btrfs_device *btrfs_find_device_by_devspec(
 		struct btrfs_fs_info *fs_info, u64 devid,
 		const char *device_path)
 {
+	struct btrfs_fs_devices *seed_devs;
 	struct btrfs_device *device;
 
 	if (devid) {
@@ -2349,18 +2335,17 @@ struct btrfs_device *btrfs_find_device_by_devspec(
 	if (!device_path || !device_path[0])
 		return ERR_PTR(-EINVAL);
 
-	if (strcmp(device_path, "missing") == 0) {
-		/* Find first missing device */
-		list_for_each_entry(device, &fs_info->fs_devices->devices,
-				    dev_list) {
-			if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
-				     &device->dev_state) && !device->bdev)
-				return device;
-		}
-		return ERR_PTR(-ENOENT);
-	}
+	device = find_device_by_path(fs_info->fs_devices, device_path);
+	if (device)
+		return device;
 
-	return btrfs_find_device_by_path(fs_info, device_path);
+	list_for_each_entry(seed_devs, &fs_info->fs_devices->seed_list,
+			    seed_list) {
+		device = find_device_by_path(seed_devs, device_path);
+		if (device)
+			return device;
+	}
+	return ERR_PTR(-ENOENT);
 }
 
 /*
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/6] btrfs: update the bdev time directly when closing
  2021-07-27 19:47 [PATCH 0/6] Fix lockdep issues around device removal Josef Bacik
                   ` (2 preceding siblings ...)
  2021-07-27 19:47 ` [PATCH 3/6] btrfs: do not read super look for a device path Josef Bacik
@ 2021-07-27 19:47 ` Josef Bacik
  2021-07-27 19:47 ` [PATCH 5/6] btrfs: delay blkdev_put until after the device remove Josef Bacik
  2021-07-27 19:47 ` [PATCH 6/6] btrfs: unify common code for the v1 and v2 versions of " Josef Bacik
  5 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2021-07-27 19:47 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We update the ctime/mtime of a block device when we remove it so that
blkid knows the device changed.  However we do this by re-opening the
block device and calling filp_update_time.  This is more correct because
it'll call the inode->i_op->update_time if it exists, but the block dev
inodes do not do this.  Instead call generic_update_time() on the
bd_inode in order to avoid the blkdev_open path and get rid of the
following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #406 Not tainted
------------------------------------------------------
losetup/11596 is trying to acquire lock:
ffff939640d2f538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_get_by_dev.part.0+0x56/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       file_open_name+0xc7/0x170
       filp_open+0x2c/0x50
       btrfs_scratch_superblocks.part.0+0x10f/0x170
       btrfs_rm_device.cold+0xe8/0xed
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***

1 lock held by losetup/11596:
 #0: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 1 PID: 11596 Comm: losetup Not tainted 5.14.0-rc2+ #406
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0xcf/0xf0
 ? stack_trace_save+0x3b/0x50
 __lock_acquire+0x10ea/0x1d90
 lock_acquire+0xb5/0x2b0
 ? flush_workqueue+0x67/0x5e0
 ? lockdep_init_map_type+0x47/0x220
 flush_workqueue+0x91/0x5e0
 ? flush_workqueue+0x67/0x5e0
 ? verify_cpu+0xf0/0x100
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x660 [loop]
 ? blkdev_ioctl+0x8d/0x2a0
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x80/0xb0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cf4b9978963e..27df367db7d8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1882,15 +1882,17 @@ static int btrfs_add_dev_item(struct btrfs_trans_handle *trans,
  * Function to update ctime/mtime for a given device path.
  * Mainly used for ctime/mtime based probe like libblkid.
  */
-static void update_dev_time(const char *path_name)
+static void update_dev_time(struct block_device *bdev)
 {
-	struct file *filp;
+	struct inode *inode = bdev->bd_inode;
+	struct timespec64 now;
 
-	filp = filp_open(path_name, O_RDWR, 0);
-	if (IS_ERR(filp))
+	/* Shouldn't happen but just in case. */
+	if (!inode)
 		return;
-	file_update_time(filp);
-	filp_close(filp, NULL);
+
+	now = current_time(inode);
+	generic_update_time(inode, &now, S_MTIME|S_CTIME);
 }
 
 static int btrfs_rm_dev_item(struct btrfs_device *device)
@@ -2070,7 +2072,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 	btrfs_kobject_uevent(bdev, KOBJ_CHANGE);
 
 	/* Update ctime/mtime for device path for libblkid */
-	update_dev_time(device_path);
+	update_dev_time(bdev);
 }
 
 int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
@@ -2696,7 +2698,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	btrfs_forget_devices(device_path);
 
 	/* Update ctime/mtime for blkid or udev */
-	update_dev_time(device_path);
+	update_dev_time(bdev);
 
 	return ret;
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/6] btrfs: delay blkdev_put until after the device remove
  2021-07-27 19:47 [PATCH 0/6] Fix lockdep issues around device removal Josef Bacik
                   ` (3 preceding siblings ...)
  2021-07-27 19:47 ` [PATCH 4/6] btrfs: update the bdev time directly when closing Josef Bacik
@ 2021-07-27 19:47 ` Josef Bacik
  2021-07-27 19:47 ` [PATCH 6/6] btrfs: unify common code for the v1 and v2 versions of " Josef Bacik
  5 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2021-07-27 19:47 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

When removing the device we call blkdev_put() on the device once we've
removed it, and because we have an EXCL open we need to take the
->open_mutex on the block device to clean it up.  Unfortunately during
device remove we are holding the sb writers lock, which results in the
following lockdep splat

======================================================
WARNING: possible circular locking dependency detected
5.14.0-rc2+ #407 Not tainted
------------------------------------------------------
losetup/11595 is trying to acquire lock:
ffff973ac35dd138 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0

but task is already holding lock:
ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (&lo->lo_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       lo_open+0x28/0x60 [loop]
       blkdev_get_whole+0x25/0xf0
       blkdev_get_by_dev.part.0+0x168/0x3c0
       blkdev_open+0xd2/0xe0
       do_dentry_open+0x161/0x390
       path_openat+0x3cc/0xa20
       do_filp_open+0x96/0x120
       do_sys_openat2+0x7b/0x130
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&disk->open_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7d/0x750
       blkdev_put+0x3a/0x220
       btrfs_rm_device.cold+0x62/0xe5
       btrfs_ioctl+0x2a31/0x2e70
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#12){.+.+}-{0:0}:
       lo_write_bvec+0xc2/0x240 [loop]
       loop_process_work+0x238/0xd00 [loop]
       process_one_work+0x26b/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
       process_one_work+0x245/0x560
       worker_thread+0x55/0x3c0
       kthread+0x140/0x160
       ret_from_fork+0x1f/0x30

-> #0 ((wq_completion)loop0){+.+.}-{0:0}:
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       flush_workqueue+0x91/0x5e0
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&lo->lo_mutex);
                               lock(&disk->open_mutex);
                               lock(&lo->lo_mutex);
  lock((wq_completion)loop0);

 *** DEADLOCK ***

1 lock held by losetup/11595:
 #0: ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]

stack backtrace:
CPU: 0 PID: 11595 Comm: losetup Not tainted 5.14.0-rc2+ #407
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Call Trace:
 dump_stack_lvl+0x57/0x72
 check_noncircular+0xcf/0xf0
 ? stack_trace_save+0x3b/0x50
 __lock_acquire+0x10ea/0x1d90
 lock_acquire+0xb5/0x2b0
 ? flush_workqueue+0x67/0x5e0
 ? lockdep_init_map_type+0x47/0x220
 flush_workqueue+0x91/0x5e0
 ? flush_workqueue+0x67/0x5e0
 ? verify_cpu+0xf0/0x100
 drain_workqueue+0xa0/0x110
 destroy_workqueue+0x36/0x250
 __loop_clr_fd+0x9a/0x660 [loop]
 ? blkdev_ioctl+0x8d/0x2a0
 block_ioctl+0x3f/0x50
 __x64_sys_ioctl+0x80/0xb0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc21255d4cb

So instead save the bdev and do the put once we've dropped the sb
writers lock in order to avoid the lockdep recursion.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ioctl.c   | 17 ++++++++++++++---
 fs/btrfs/volumes.c | 19 +++++++++++++++----
 fs/btrfs/volumes.h |  3 ++-
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0ba98e08a029..fabbfdfa56f5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3205,6 +3205,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_ioctl_vol_args_v2 *vol_args;
+	struct block_device *bdev = NULL;
+	fmode_t mode;
 	int ret;
 	bool cancel = false;
 
@@ -3237,9 +3239,11 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	/* Exclusive operation is now claimed */
 
 	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
-		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid);
+		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid, &bdev,
+				      &mode);
 	else
-		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
+		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
+				      &mode);
 
 	btrfs_exclop_finish(fs_info);
 
@@ -3255,6 +3259,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	kfree(vol_args);
 err_drop:
 	mnt_drop_write_file(file);
+	if (bdev)
+		blkdev_put(bdev, mode);
 	return ret;
 }
 
@@ -3263,6 +3269,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_ioctl_vol_args *vol_args;
+	struct block_device *bdev = NULL;
+	fmode_t mode;
 	int ret;
 	bool cancel;
 
@@ -3284,7 +3292,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
 					   cancel);
 	if (ret == 0) {
-		ret = btrfs_rm_device(fs_info, vol_args->name, 0);
+		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
+				      &mode);
 		if (!ret)
 			btrfs_info(fs_info, "disk deleted %s", vol_args->name);
 		btrfs_exclop_finish(fs_info);
@@ -3294,6 +3303,8 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 out_drop_write:
 	mnt_drop_write_file(file);
 
+	if (bdev)
+		blkdev_put(bdev, mode);
 	return ret;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 27df367db7d8..1162e5c5b917 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2076,7 +2076,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 }
 
 int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
-		    u64 devid)
+		    u64 devid, struct block_device **bdev, fmode_t *mode)
 {
 	struct btrfs_device *device;
 	struct btrfs_fs_devices *cur_devices;
@@ -2186,15 +2186,26 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
 	mutex_unlock(&fs_devices->device_list_mutex);
 
 	/*
-	 * at this point, the device is zero sized and detached from
+	 * At this point, the device is zero sized and detached from
 	 * the devices list.  All that's left is to zero out the old
 	 * supers and free the device.
+	 *
+	 * We cannot call btrfs_close_bdev() here because we're holding the sb
+	 * write lock, and blkdev_put() will pull in the ->open_mutex on the
+	 * block device and it's dependencies.  Instead just flush the device
+	 * and let the caller do the final blkdev_put.
 	 */
-	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
+	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
 		btrfs_scratch_superblocks(fs_info, device->bdev,
 					  device->name->str);
+		if (device->bdev) {
+			sync_blockdev(device->bdev);
+			invalidate_bdev(device->bdev);
+		}
+	}
 
-	btrfs_close_bdev(device);
+	*bdev = device->bdev;
+	*mode = device->mode;
 	synchronize_rcu();
 	btrfs_free_device(device);
 out:
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 70c749eee3ad..cc70e54cb901 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -472,7 +472,8 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 					const u8 *uuid);
 void btrfs_free_device(struct btrfs_device *device);
 int btrfs_rm_device(struct btrfs_fs_info *fs_info,
-		    const char *device_path, u64 devid);
+		    const char *device_path, u64 devid,
+		    struct block_device **bdev, fmode_t *mode);
 void __exit btrfs_cleanup_fs_uuids(void);
 int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len);
 int btrfs_grow_device(struct btrfs_trans_handle *trans,
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/6] btrfs: unify common code for the v1 and v2 versions of device remove
  2021-07-27 19:47 [PATCH 0/6] Fix lockdep issues around device removal Josef Bacik
                   ` (4 preceding siblings ...)
  2021-07-27 19:47 ` [PATCH 5/6] btrfs: delay blkdev_put until after the device remove Josef Bacik
@ 2021-07-27 19:47 ` Josef Bacik
  5 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2021-07-27 19:47 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

These things share a lot of common code, v2 simply allows you to specify
devid.  Abstract out this common code and use the helper by both the v1
and v2 interfaces to save us some lines of code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ioctl.c | 99 +++++++++++++++++++-----------------------------
 1 file changed, 38 insertions(+), 61 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fabbfdfa56f5..e3a7e8544609 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3200,15 +3200,14 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
 	return ret;
 }
 
-static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
+static long btrfs_do_device_removal(struct file *file, const char *path,
+				    u64 devid, bool cancel)
 {
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_ioctl_vol_args_v2 *vol_args;
 	struct block_device *bdev = NULL;
 	fmode_t mode;
 	int ret;
-	bool cancel = false;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -3217,11 +3216,37 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	if (ret)
 		return ret;
 
-	vol_args = memdup_user(arg, sizeof(*vol_args));
-	if (IS_ERR(vol_args)) {
-		ret = PTR_ERR(vol_args);
-		goto err_drop;
+	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
+					   cancel);
+	if (ret)
+		goto out;
+
+	/* Exclusive operation is now claimed */
+	ret = btrfs_rm_device(fs_info, path, devid, &bdev, &mode);
+	btrfs_exclop_finish(fs_info);
+
+	if (!ret) {
+		if (path)
+			btrfs_info(fs_info, "device deleted: %s", path);
+		else
+			btrfs_info(fs_info, "device deleted: id %llu", devid);
 	}
+out:
+	mnt_drop_write_file(file);
+	if (bdev)
+		blkdev_put(bdev, mode);
+	return ret;
+}
+
+static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
+{
+	struct btrfs_ioctl_vol_args_v2 *vol_args;
+	int ret = 0;
+	bool cancel = false;
+
+	vol_args = memdup_user(arg, sizeof(*vol_args));
+	if (IS_ERR(vol_args))
+		return PTR_ERR(vol_args);
 
 	if (vol_args->flags & ~BTRFS_DEVICE_REMOVE_ARGS_MASK) {
 		ret = -EOPNOTSUPP;
@@ -3232,79 +3257,31 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	    strcmp("cancel", vol_args->name) == 0)
 		cancel = true;
 
-	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
-					   cancel);
-	if (ret)
-		goto out;
-	/* Exclusive operation is now claimed */
-
 	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
-		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid, &bdev,
-				      &mode);
+		ret = btrfs_do_device_removal(file, NULL, vol_args->devid,
+					      cancel);
 	else
-		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
-				      &mode);
-
-	btrfs_exclop_finish(fs_info);
-
-	if (!ret) {
-		if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
-			btrfs_info(fs_info, "device deleted: id %llu",
-					vol_args->devid);
-		else
-			btrfs_info(fs_info, "device deleted: %s",
-					vol_args->name);
-	}
+		ret = btrfs_do_device_removal(file, vol_args->name, 0, cancel);
 out:
 	kfree(vol_args);
-err_drop:
-	mnt_drop_write_file(file);
-	if (bdev)
-		blkdev_put(bdev, mode);
 	return ret;
 }
 
 static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 {
-	struct inode *inode = file_inode(file);
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_ioctl_vol_args *vol_args;
-	struct block_device *bdev = NULL;
-	fmode_t mode;
 	int ret;
 	bool cancel;
 
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	ret = mnt_want_write_file(file);
-	if (ret)
-		return ret;
-
 	vol_args = memdup_user(arg, sizeof(*vol_args));
-	if (IS_ERR(vol_args)) {
-		ret = PTR_ERR(vol_args);
-		goto out_drop_write;
-	}
+	if (IS_ERR(vol_args))
+		return PTR_ERR(vol_args);
 	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
 	cancel = (strcmp("cancel", vol_args->name) == 0);
 
-	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
-					   cancel);
-	if (ret == 0) {
-		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev,
-				      &mode);
-		if (!ret)
-			btrfs_info(fs_info, "disk deleted %s", vol_args->name);
-		btrfs_exclop_finish(fs_info);
-	}
+	ret = btrfs_do_device_removal(file, vol_args->name, 0, cancel);
 
 	kfree(vol_args);
-out_drop_write:
-	mnt_drop_write_file(file);
-
-	if (bdev)
-		blkdev_put(bdev, mode);
 	return ret;
 }
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/6] btrfs: do not check for ->num_devices == 0 in rm_device
  2021-07-27 19:47 ` [PATCH 1/6] btrfs: do not check for ->num_devices == 0 in rm_device Josef Bacik
@ 2021-07-27 22:35   ` Anand Jain
  2021-07-27 22:42     ` Josef Bacik
  0 siblings, 1 reply; 10+ messages in thread
From: Anand Jain @ 2021-07-27 22:35 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 28/07/2021 03:47, Josef Bacik wrote:
> We're specifically prohibited from removing the last device in a file
> system, so this check is meaningless and the code can be deleted.

Josef, That's not true when removing a seed device from the sprouted 
filesystem.

> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/volumes.c | 7 -------
>   1 file changed, 7 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 86846d6e58d0..373be4e54f28 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2199,13 +2199,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   	btrfs_close_bdev(device);
>   	synchronize_rcu();
>   	btrfs_free_device(device);
> -
> -	if (cur_devices->open_devices == 0) {
> -		list_del_init(&cur_devices->seed_list);
> -		close_fs_devices(cur_devices);
> -		free_fs_devices(cur_devices);
> -	}
> -

  Here the cur_devices will point to the seed devices, and generally, it 
is the last opened devices in its fs_devices structure.

  Comments about it is a few lines above from this.

2157 /*
2158 * In normal cases the cur_devices == fs_devices. But in case
2159 * of deleting a seed device, the cur_devices should point to
2160 * its own fs_devices listed under the fs_devices->seed.
2161 */

Thanks, Anand

>   out:
>   	mutex_unlock(&uuid_mutex);
>   	return ret;
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/6] btrfs: do not check for ->num_devices == 0 in rm_device
  2021-07-27 22:35   ` Anand Jain
@ 2021-07-27 22:42     ` Josef Bacik
  0 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2021-07-27 22:42 UTC (permalink / raw)
  To: Anand Jain, linux-btrfs, kernel-team

On 7/27/21 6:35 PM, Anand Jain wrote:
> On 28/07/2021 03:47, Josef Bacik wrote:
>> We're specifically prohibited from removing the last device in a file
>> system, so this check is meaningless and the code can be deleted.
> 
> Josef, That's not true when removing a seed device from the sprouted 
> filesystem.
> 

Yeah I'm an idiot and noticed this immediately after I hit send, you'll 
see I fixed it in v2 that I sent like an hour later.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/6] btrfs: do not take the uuid_mutex in btrfs_rm_device
  2021-07-27 19:47 ` [PATCH 2/6] btrfs: do not take the uuid_mutex in btrfs_rm_device Josef Bacik
@ 2021-07-28  0:15   ` Anand Jain
  0 siblings, 0 replies; 10+ messages in thread
From: Anand Jain @ 2021-07-28  0:15 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team

On 28/07/2021 03:47, Josef Bacik wrote:
> We got the following lockdep splat while running xfstests (specifically
> btrfs/003 and btrfs/020 in a row) with the new rc.  This was uncovered
> by 87579e9b7d8d ("loop: use worker per cgroup instead of kworker") which
> converted loop to using workqueues, which comes with lockdep
> annotations that don't exist with kworkers.  The lockdep splat is as
> follows
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2-custom+ #34 Not tainted
> ------------------------------------------------------
> losetup/156417 is trying to acquire lock:
> ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600
> 
> but task is already holding lock:
> ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #5 (&lo->lo_mutex){+.+.}-{3:3}:
>         __mutex_lock+0xba/0x7c0
>         lo_open+0x28/0x60 [loop]
>         blkdev_get_whole+0x28/0xf0
>         blkdev_get_by_dev.part.0+0x168/0x3c0
>         blkdev_open+0xd2/0xe0
>         do_dentry_open+0x163/0x3a0
>         path_openat+0x74d/0xa40
>         do_filp_open+0x9c/0x140
>         do_sys_openat2+0xb1/0x170
>         __x64_sys_openat+0x54/0x90
>         do_syscall_64+0x3b/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #4 (&disk->open_mutex){+.+.}-{3:3}:
>         __mutex_lock+0xba/0x7c0
>         blkdev_get_by_dev.part.0+0xd1/0x3c0
>         blkdev_get_by_path+0xc0/0xd0
>         btrfs_scan_one_device+0x52/0x1f0 [btrfs]
>         btrfs_control_ioctl+0xac/0x170 [btrfs]
>         __x64_sys_ioctl+0x83/0xb0
>         do_syscall_64+0x3b/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (uuid_mutex){+.+.}-{3:3}:
>         __mutex_lock+0xba/0x7c0
>         btrfs_rm_device+0x48/0x6a0 [btrfs]
>         btrfs_ioctl+0x2d1c/0x3110 [btrfs]
>         __x64_sys_ioctl+0x83/0xb0
>         do_syscall_64+0x3b/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#11){.+.+}-{0:0}:
>         lo_write_bvec+0x112/0x290 [loop]
>         loop_process_work+0x25f/0xcb0 [loop]
>         process_one_work+0x28f/0x5d0
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x170
>         ret_from_fork+0x22/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>         process_one_work+0x266/0x5d0
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x170
>         ret_from_fork+0x22/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>         __lock_acquire+0x1130/0x1dc0
>         lock_acquire+0xf5/0x320
>         flush_workqueue+0xae/0x600
>         drain_workqueue+0xa0/0x110
>         destroy_workqueue+0x36/0x250
>         __loop_clr_fd+0x9a/0x650 [loop]
>         lo_ioctl+0x29d/0x780 [loop]
>         block_ioctl+0x3f/0x50
>         __x64_sys_ioctl+0x83/0xb0
>         do_syscall_64+0x3b/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> Chain exists of:
>    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
>   Possible unsafe locking scenario:
>         CPU0                    CPU1
>         ----                    ----
>    lock(&lo->lo_mutex);
>                                 lock(&disk->open_mutex);
>                                 lock(&lo->lo_mutex);
>    lock((wq_completion)loop0);
> 
>   *** DEADLOCK ***
> 1 lock held by losetup/156417:
>   #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop]
> 
> stack backtrace:
> CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ #34
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> Call Trace:
>   dump_stack_lvl+0x57/0x72
>   check_noncircular+0x10a/0x120
>   __lock_acquire+0x1130/0x1dc0
>   lock_acquire+0xf5/0x320
>   ? flush_workqueue+0x84/0x600
>   flush_workqueue+0xae/0x600
>   ? flush_workqueue+0x84/0x600
>   drain_workqueue+0xa0/0x110
>   destroy_workqueue+0x36/0x250
>   __loop_clr_fd+0x9a/0x650 [loop]
>   lo_ioctl+0x29d/0x780 [loop]
>   ? __lock_acquire+0x3a0/0x1dc0
>   ? update_dl_rq_load_avg+0x152/0x360
>   ? lock_is_held_type+0xa5/0x120
>   ? find_held_lock.constprop.0+0x2b/0x80
>   block_ioctl+0x3f/0x50
>   __x64_sys_ioctl+0x83/0xb0
>   do_syscall_64+0x3b/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f645884de6b
> 
> Usually the uuid_mutex exists to protect the fs_devices that map
> together all of the devices that match a specific uuid.  In rm_device
> we're messing with the uuid of a device, so it makes sense to protect
> that here.
> 
> However in doing that it pulls in a whole host of lockdep dependencies,
> as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus
> we end up with the dependency chain under the uuid_mutex being added
> under the normal sb write dependency chain, which causes problems with
> loop devices.
> 
> We don't need the uuid mutex here however.  If we call
> btrfs_scan_one_device() before we scratch the super block we will find
> the fs_devices and not find the device itself and return EBUSY because
> the fs_devices is open.  If we call it after the scratch happens it will
> not appear to be a valid btrfs file system.
> 
> We do not need to worry about other fs_devices modifying operations here
> because we're protected by the exclusive operations locking.
> 
> So drop the uuid_mutex here in order to fix the lockdep splat.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/volumes.c | 5 -----
>   1 file changed, 5 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 373be4e54f28..62cb7d39435c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2082,8 +2082,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   	u64 num_devices;
>   	int ret = 0;
>   
> -	mutex_lock(&uuid_mutex);
> -
>   	num_devices = btrfs_num_devices(fs_info);
>   
>   	ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
> @@ -2127,11 +2125,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   		mutex_unlock(&fs_info->chunk_mutex);
>   	}
>   
> -	mutex_unlock(&uuid_mutex);
>   	ret = btrfs_shrink_device(device, 0);
>   	if (!ret)
>   		btrfs_reada_remove_dev(device);
> -	mutex_lock(&uuid_mutex);
>   	if (ret)
>   		goto error_undo;
>   
> @@ -2200,7 +2196,6 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   	synchronize_rcu();
>   	btrfs_free_device(device);
>   out:
> -	mutex_unlock(&uuid_mutex);
>   	return ret;
>   
>   error_undo:
> 

  btrfs_rm_device happens on a mounted device, so device scan --forget
  will skip this device. For the rest of the ioctl based device maneuver,
  we have ...exclop_start(). Yeah, I see no reason to keep uuid_mutex
  here.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-07-28  0:15 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-27 19:47 [PATCH 0/6] Fix lockdep issues around device removal Josef Bacik
2021-07-27 19:47 ` [PATCH 1/6] btrfs: do not check for ->num_devices == 0 in rm_device Josef Bacik
2021-07-27 22:35   ` Anand Jain
2021-07-27 22:42     ` Josef Bacik
2021-07-27 19:47 ` [PATCH 2/6] btrfs: do not take the uuid_mutex in btrfs_rm_device Josef Bacik
2021-07-28  0:15   ` Anand Jain
2021-07-27 19:47 ` [PATCH 3/6] btrfs: do not read super look for a device path Josef Bacik
2021-07-27 19:47 ` [PATCH 4/6] btrfs: update the bdev time directly when closing Josef Bacik
2021-07-27 19:47 ` [PATCH 5/6] btrfs: delay blkdev_put until after the device remove Josef Bacik
2021-07-27 19:47 ` [PATCH 6/6] btrfs: unify common code for the v1 and v2 versions of " Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.