From: Anand Jain <anand.jain@oracle.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2 3/7] btrfs: do not read super look for a device path
Date: Wed, 25 Aug 2021 10:00:16 +0800
Message-ID: <41d2c028-6af5-ab2b-91fa-1090d4258ba9@oracle.com>
In-Reply-To: <26639cd9f337a84b432b6627cd7c17b3d6d51e34.1627419595.git.josef@toxicpanda.com>

On 28/07/2021 05:01, Josef Bacik wrote:
> For device removal and replace we call btrfs_find_device_by_devspec,
> which if we give it a device path and nothing else will call
> btrfs_find_device_by_path, which opens the block device and reads the
> super block and then looks up our device based on that.
> 
> However this is completely unnecessary because we have the path stored
> in our device on our fsdevices.  All we need to do if we're given a path
> is look through the fs_devices on our file system and use that device if
> we find it, reading the super block is just silly.

The device path stored in our fs_devices can differ from the path the
user provides for the same device (for example, with dm or lvm).

btrfs-progs sanitizes the device path, but other callers (for example,
an ioctl test case) might not, and then the path lookup would fail.
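
To illustrate the concern: a plain string comparison treats two names
for one device as different, whereas comparing device numbers does not.
The sketch below is user-space C, not kernel code. same_device_node()
is a hypothetical helper, and it accepts character devices as well as
block devices only so that it can be tried on nodes such as /dev/null.

```c
#include <sys/stat.h>

/* True if the stat result describes a block or character device node. */
static int is_device_node(const struct stat *st)
{
	return S_ISBLK(st->st_mode) || S_ISCHR(st->st_mode);
}

/*
 * Hypothetical helper: compare two paths by the device they name rather
 * than by their spelling. stat() follows symlinks, so an alias and the
 * node it points to end up with the same st_rdev.
 *
 * Returns 1 if both paths name the same device node, 0 if they name
 * different device nodes, and -1 if either path cannot be stat()ed or
 * is not a device node.
 */
static int same_device_node(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	if (!is_device_node(&sa) || !is_device_node(&sb))
		return -1;

	/* Same kind of node (block vs. character) and same major:minor. */
	return !!S_ISBLK(sa.st_mode) == !!S_ISBLK(sb.st_mode) &&
	       sa.st_rdev == sb.st_rdev;
}
```

On a typical udev setup, a /dev/mapper/<name> symlink and the dm node
it points to would compare equal here, although strcmp() on the two
strings would not.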

Also, btrfs dev scan <path> can update the stored device path at any
time, even after the filesystem is mounted. Fixing that caused the
subsequent subvolume mounts to fail (if I remember correctly).

> This fixes the case where we end up with our sb write "lock" getting the
> dependency of the block device ->open_mutex, which resulted in the
> following lockdep splat

Can we reorder this instead, so that the super block is read before we
take write access?

btrfs_exclop_start()
  ::
find device part (read sb)
  ::
mnt_want_write_file()
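
Why the ordering matters can be shown with a toy dependency tracker,
loosely modelled on what lockdep records: each time a lock is taken, an
edge is added from every lock already held. This is a user-space sketch
under assumptions, not kernel code. The enum names, the tracker, and
the two order_*() functions are hypothetical; the exclusive operation
is modelled as a lock purely for illustration; and its position in
order_current() is an assumption, since the trace only shows
sb_writers being held when ->open_mutex is taken.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after those in the lockdep report. */
enum lock_class { EXCLOP, SB_WRITERS, OPEN_MUTEX, NR_CLASSES };

struct tracker {
	bool held[NR_CLASSES];
	/* edge[a][b] is set when b was acquired while a was held. */
	bool edge[NR_CLASSES][NR_CLASSES];
};

static void acquire(struct tracker *t, enum lock_class c)
{
	/* Record a dependency from every currently held lock to c. */
	for (int h = 0; h < NR_CLASSES; h++)
		if (t->held[h])
			t->edge[h][c] = true;
	t->held[c] = true;
}

static void release(struct tracker *t, enum lock_class c)
{
	t->held[c] = false;
}

/* Ordering implied by the trace: write access is held during the open. */
static void order_current(struct tracker *t)
{
	memset(t, 0, sizeof(*t));
	acquire(t, SB_WRITERS);   /* mnt_want_write_file() */
	acquire(t, EXCLOP);       /* btrfs_exclop_start() */
	acquire(t, OPEN_MUTEX);   /* find device: open bdev, read sb */
	release(t, OPEN_MUTEX);
	release(t, EXCLOP);
	release(t, SB_WRITERS);
}

/* Ordering asked about above: find the device before taking write access. */
static void order_proposed(struct tracker *t)
{
	memset(t, 0, sizeof(*t));
	acquire(t, EXCLOP);       /* btrfs_exclop_start() */
	acquire(t, OPEN_MUTEX);   /* find device: open bdev, read sb */
	release(t, OPEN_MUTEX);
	acquire(t, SB_WRITERS);   /* mnt_want_write_file() */
	release(t, SB_WRITERS);
	release(t, EXCLOP);
}
```

After order_current(), edge[SB_WRITERS][OPEN_MUTEX] is set, which
corresponds to link #3 in the chain quoted below. After
order_proposed(), that edge is absent, because the open has already
finished by the time write access is taken.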


Thanks, Anand


> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #405 Not tainted
> ------------------------------------------------------
> losetup/11576 is trying to acquire lock:
> ffff9bbe8cded938 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
> 
> but task is already holding lock:
> ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         lo_open+0x28/0x60 [loop]
>         blkdev_get_whole+0x25/0xf0
>         blkdev_get_by_dev.part.0+0x168/0x3c0
>         blkdev_open+0xd2/0xe0
>         do_dentry_open+0x161/0x390
>         path_openat+0x3cc/0xa20
>         do_filp_open+0x96/0x120
>         do_sys_openat2+0x7b/0x130
>         __x64_sys_openat+0x46/0x70
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (&disk->open_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         blkdev_get_by_dev.part.0+0x56/0x3c0
>         blkdev_get_by_path+0x98/0xa0
>         btrfs_get_bdev_and_sb+0x1b/0xb0
>         btrfs_find_device_by_devspec+0x12b/0x1c0
>         btrfs_rm_device+0x127/0x610
>         btrfs_ioctl+0x2a31/0x2e70
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#12){.+.+}-{0:0}:
>         lo_write_bvec+0xc2/0x240 [loop]
>         loop_process_work+0x238/0xd00 [loop]
>         process_one_work+0x26b/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>         process_one_work+0x245/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>         __lock_acquire+0x10ea/0x1d90
>         lock_acquire+0xb5/0x2b0
>         flush_workqueue+0x91/0x5e0
>         drain_workqueue+0xa0/0x110
>         destroy_workqueue+0x36/0x250
>         __loop_clr_fd+0x9a/0x660 [loop]
>         block_ioctl+0x3f/0x50
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&lo->lo_mutex);
>                                 lock(&disk->open_mutex);
>                                 lock(&lo->lo_mutex);
>    lock((wq_completion)loop0);
> 
>   *** DEADLOCK ***
> 
> 1 lock held by losetup/11576:
>   #0: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> stack backtrace:
> CPU: 0 PID: 11576 Comm: losetup Not tainted 5.14.0-rc2+ #405
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>   dump_stack_lvl+0x57/0x72
>   check_noncircular+0xcf/0xf0
>   ? stack_trace_save+0x3b/0x50
>   __lock_acquire+0x10ea/0x1d90
>   lock_acquire+0xb5/0x2b0
>   ? flush_workqueue+0x67/0x5e0
>   ? lockdep_init_map_type+0x47/0x220
>   flush_workqueue+0x91/0x5e0
>   ? flush_workqueue+0x67/0x5e0
>   ? verify_cpu+0xf0/0x100
>   drain_workqueue+0xa0/0x110
>   destroy_workqueue+0x36/0x250
>   __loop_clr_fd+0x9a/0x660 [loop]
>   ? blkdev_ioctl+0x8d/0x2a0
>   block_ioctl+0x3f/0x50
>   __x64_sys_ioctl+0x80/0xb0
>   do_syscall_64+0x38/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f31b02404cb
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/volumes.c | 61 +++++++++++++++++-----------------------------
>   1 file changed, 23 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0e7372f637eb..bf2449cdb2ab 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2313,37 +2313,22 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev)
>   	btrfs_free_device(tgtdev);
>   }
>   
> -static struct btrfs_device *btrfs_find_device_by_path(
> -		struct btrfs_fs_info *fs_info, const char *device_path)
> +static struct btrfs_device *find_device_by_path(
> +					struct btrfs_fs_devices *fs_devices,
> +					const char *path)
>   {
> -	int ret = 0;
> -	struct btrfs_super_block *disk_super;
> -	u64 devid;
> -	u8 *dev_uuid;
> -	struct block_device *bdev;
>   	struct btrfs_device *device;
> +	bool missing = !strcmp(path, "missing");
>   
> -	ret = btrfs_get_bdev_and_sb(device_path, FMODE_READ,
> -				    fs_info->bdev_holder, 0, &bdev, &disk_super);
> -	if (ret)
> -		return ERR_PTR(ret);
> -
> -	devid = btrfs_stack_device_id(&disk_super->dev_item);
> -	dev_uuid = disk_super->dev_item.uuid;
> -	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
> -		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
> -					   disk_super->metadata_uuid);
> -	else
> -		device = btrfs_find_device(fs_info->fs_devices, devid, dev_uuid,
> -					   disk_super->fsid);
> -
> -	btrfs_release_disk_super(disk_super);
> -	if (!device)
> -		device = ERR_PTR(-ENOENT);
> -	blkdev_put(bdev, FMODE_READ);
> -	return device;
> +	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +		if (missing && test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> +					&device->dev_state) && !device->bdev)
> +			return device;
> +		if (!missing && device_path_matched(path, device))
> +			return device;
> +	}
> +	return NULL;
>   }
> -
>   /*
>    * Lookup a device given by device id, or the path if the id is 0.
>    */
> @@ -2351,6 +2336,7 @@ struct btrfs_device *btrfs_find_device_by_devspec(
>   		struct btrfs_fs_info *fs_info, u64 devid,
>   		const char *device_path)
>   {
> +	struct btrfs_fs_devices *seed_devs;
>   	struct btrfs_device *device;
>   
>   	if (devid) {
> @@ -2364,18 +2350,17 @@ struct btrfs_device *btrfs_find_device_by_devspec(
>   	if (!device_path || !device_path[0])
>   		return ERR_PTR(-EINVAL);
>   
> -	if (strcmp(device_path, "missing") == 0) {
> -		/* Find first missing device */
> -		list_for_each_entry(device, &fs_info->fs_devices->devices,
> -				    dev_list) {
> -			if (test_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
> -				     &device->dev_state) && !device->bdev)
> -				return device;
> -		}
> -		return ERR_PTR(-ENOENT);
> -	}
> +	device = find_device_by_path(fs_info->fs_devices, device_path);
> +	if (device)
> +		return device;
>   
> -	return btrfs_find_device_by_path(fs_info, device_path);
> +	list_for_each_entry(seed_devs, &fs_info->fs_devices->seed_list,
> +			    seed_list) {
> +		device = find_device_by_path(seed_devs, device_path);
> +		if (device)
> +			return device;
> +	}
> +	return ERR_PTR(-ENOENT);
>   }
>   
>   /*
> 

