All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: Josef Bacik <josef@toxicpanda.com>,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v4 6/6] btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls
Date: Wed, 6 Oct 2021 16:54:45 +0800	[thread overview]
Message-ID: <e81e4690-377d-40f1-8488-21530ee6c57d@oracle.com> (raw)
In-Reply-To: <78f4669e469232db2d8675fd7b4fd06028f2eef5.1633464631.git.josef@toxicpanda.com>

On 06/10/2021 04:12, Josef Bacik wrote:
> For device removal and replace we call btrfs_find_device_by_devspec,
> which if we give it a device path and nothing else will call
> btrfs_get_dev_args_from_path, which opens the block device and reads the
> super block and then looks up our device based on that.
> 
> However at this point we're holding the sb write "lock", so reading the
> block device pulls in the dependency of ->open_mutex, which produces the
> following lockdep splat
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc2+ #405 Not tainted
> ------------------------------------------------------
> losetup/11576 is trying to acquire lock:
> ffff9bbe8cded938 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
> 
> but task is already holding lock:
> ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         lo_open+0x28/0x60 [loop]
>         blkdev_get_whole+0x25/0xf0
>         blkdev_get_by_dev.part.0+0x168/0x3c0
>         blkdev_open+0xd2/0xe0
>         do_dentry_open+0x161/0x390
>         path_openat+0x3cc/0xa20
>         do_filp_open+0x96/0x120
>         do_sys_openat2+0x7b/0x130
>         __x64_sys_openat+0x46/0x70
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #3 (&disk->open_mutex){+.+.}-{3:3}:
>         __mutex_lock+0x7d/0x750
>         blkdev_get_by_dev.part.0+0x56/0x3c0
>         blkdev_get_by_path+0x98/0xa0
>         btrfs_get_bdev_and_sb+0x1b/0xb0
>         btrfs_find_device_by_devspec+0x12b/0x1c0
>         btrfs_rm_device+0x127/0x610
>         btrfs_ioctl+0x2a31/0x2e70
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> -> #2 (sb_writers#12){.+.+}-{0:0}:
>         lo_write_bvec+0xc2/0x240 [loop]
>         loop_process_work+0x238/0xd00 [loop]
>         process_one_work+0x26b/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
>         process_one_work+0x245/0x560
>         worker_thread+0x55/0x3c0
>         kthread+0x140/0x160
>         ret_from_fork+0x1f/0x30
> 
> -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
>         __lock_acquire+0x10ea/0x1d90
>         lock_acquire+0xb5/0x2b0
>         flush_workqueue+0x91/0x5e0
>         drain_workqueue+0xa0/0x110
>         destroy_workqueue+0x36/0x250
>         __loop_clr_fd+0x9a/0x660 [loop]
>         block_ioctl+0x3f/0x50
>         __x64_sys_ioctl+0x80/0xb0
>         do_syscall_64+0x38/0x90
>         entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&lo->lo_mutex);
>                                 lock(&disk->open_mutex);
>                                 lock(&lo->lo_mutex);
>    lock((wq_completion)loop0);
> 
>   *** DEADLOCK ***
> 
> 1 lock held by losetup/11576:
>   #0: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
> 
> stack backtrace:
> CPU: 0 PID: 11576 Comm: losetup Not tainted 5.14.0-rc2+ #405
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
> Call Trace:
>   dump_stack_lvl+0x57/0x72
>   check_noncircular+0xcf/0xf0
>   ? stack_trace_save+0x3b/0x50
>   __lock_acquire+0x10ea/0x1d90
>   lock_acquire+0xb5/0x2b0
>   ? flush_workqueue+0x67/0x5e0
>   ? lockdep_init_map_type+0x47/0x220
>   flush_workqueue+0x91/0x5e0
>   ? flush_workqueue+0x67/0x5e0
>   ? verify_cpu+0xf0/0x100
>   drain_workqueue+0xa0/0x110
>   destroy_workqueue+0x36/0x250
>   __loop_clr_fd+0x9a/0x660 [loop]
>   ? blkdev_ioctl+0x8d/0x2a0
>   block_ioctl+0x3f/0x50
>   __x64_sys_ioctl+0x80/0xb0
>   do_syscall_64+0x38/0x90
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f31b02404cb
> 
> Instead what we want to do is populate our device lookup args before we
> grab any locks, and then pass these args into btrfs_rm_device().  From
> there we can find the device and do the appropriate removal.
> 
> Suggested-by: Anand Jain <anand.jain@oracle.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   fs/btrfs/ioctl.c   | 67 +++++++++++++++++++++++++++-------------------
>   fs/btrfs/volumes.c | 15 +++++------
>   fs/btrfs/volumes.h |  2 +-
>   3 files changed, 48 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index b8508af4e539..c9d3f375df83 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3160,6 +3160,7 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
>   
>   static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   {
> +	BTRFS_DEV_LOOKUP_ARGS(args);
>   	struct inode *inode = file_inode(file);
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	struct btrfs_ioctl_vol_args_v2 *vol_args;
> @@ -3171,35 +3172,39 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   	if (!capable(CAP_SYS_ADMIN))
>   		return -EPERM;
>   
> -	ret = mnt_want_write_file(file);
> -	if (ret)
> -		return ret;
> -
>   	vol_args = memdup_user(arg, sizeof(*vol_args));
>   	if (IS_ERR(vol_args)) {

>   		ret = PTR_ERR(vol_args);
> -		goto err_drop;
> +		goto out;

  	return  PTR_ERR(vol_args);

is fine here.

>   	}
>   
>   	if (vol_args->flags & ~BTRFS_DEVICE_REMOVE_ARGS_MASK) {
>   		ret = -EOPNOTSUPP;
>   		goto out;

At the label out, we call, btrfs_put_dev_args_from_path(&args).
However, the args.fsid and args.uuid might not have initialized
to NULL or a valid kmem address.

>   	}
> +
>   	vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
> -	if (!(vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) &&
> -	    strcmp("cancel", vol_args->name) == 0)
> +	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) {
> +		args.devid = vol_args->devid;
> +	} else if (!strcmp("cancel", vol_args->name)) {
>   		cancel = true;
> +	} else {
> +		ret = btrfs_get_dev_args_from_path(fs_info, &args, vol_args->name);
> +		if (ret)
> +			goto out;


Same as above.

> +	}
> +
> +	ret = mnt_want_write_file(file);
> +	if (ret)
> +		goto out;
>   
>   	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
>   					   cancel);
>   	if (ret)
> -		goto out;
> -	/* Exclusive operation is now claimed */
> +		goto err_drop;
>   
> -	if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
> -		ret = btrfs_rm_device(fs_info, NULL, vol_args->devid, &bdev, &mode);
> -	else
> -		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev, &mode);
> +	/* Exclusive operation is now claimed */
> +	ret = btrfs_rm_device(fs_info, &args, &bdev, &mode);
>   
>   	btrfs_exclop_finish(fs_info);
>   
> @@ -3211,17 +3216,19 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   			btrfs_info(fs_info, "device deleted: %s",
>   					vol_args->name);
>   	}
> -out:
> -	kfree(vol_args);
>   err_drop:
>   	mnt_drop_write_file(file);
>   	if (bdev)
>   		blkdev_put(bdev, mode);
> +out:
> +	btrfs_put_dev_args_from_path(&args);
> +	kfree(vol_args);
>   	return ret;
>   }
>   
>   static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
>   {
> +	BTRFS_DEV_LOOKUP_ARGS(args);
>   	struct inode *inode = file_inode(file);
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	struct btrfs_ioctl_vol_args *vol_args;
> @@ -3233,32 +3240,38 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
>   	if (!capable(CAP_SYS_ADMIN))
>   		return -EPERM;
>   
> -	ret = mnt_want_write_file(file);
> -	if (ret)
> -		return ret;
> -
>   	vol_args = memdup_user(arg, sizeof(*vol_args));
> -	if (IS_ERR(vol_args)) {
> -		ret = PTR_ERR(vol_args);
> -		goto out_drop_write;
> -	}
> +	if (IS_ERR(vol_args))
> +		return PTR_ERR(vol_args);
> +
>   	vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
> -	cancel = (strcmp("cancel", vol_args->name) == 0);
> +	if (!strcmp("cancel", vol_args->name)) {
> +		cancel = true;
> +	} else {
> +		ret = btrfs_get_dev_args_from_path(fs_info, &args, vol_args->name);
> +		if (ret)
> +			goto out;


Here too.

Thanks, Anand

> +	}
> +
> +	ret = mnt_want_write_file(file);
> +	if (ret)
> +		goto out;
>   
>   	ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
>   					   cancel);
>   	if (ret == 0) {
> -		ret = btrfs_rm_device(fs_info, vol_args->name, 0, &bdev, &mode);
> +		ret = btrfs_rm_device(fs_info, &args, &bdev, &mode);
>   		if (!ret)
>   			btrfs_info(fs_info, "disk deleted %s", vol_args->name);
>   		btrfs_exclop_finish(fs_info);
>   	}
>   
> -	kfree(vol_args);
> -out_drop_write:
>   	mnt_drop_write_file(file);
>   	if (bdev)
>   		blkdev_put(bdev, mode);
> +out:
> +	btrfs_put_dev_args_from_path(&args);
> +	kfree(vol_args);
>   	return ret;
>   }
>   
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e490414e8987..3262e75fbb1c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2076,8 +2076,9 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
>   	update_dev_time(bdev);
>   }
>   
> -int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
> -		    u64 devid, struct block_device **bdev, fmode_t *mode)
> +int btrfs_rm_device(struct btrfs_fs_info *fs_info,
> +		    struct btrfs_dev_lookup_args *args,
> +		    struct block_device **bdev, fmode_t *mode)
>   {
>   	struct btrfs_device *device;
>   	struct btrfs_fs_devices *cur_devices;
> @@ -2096,14 +2097,12 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
>   	if (ret)
>   		goto out;
>   
> -	device = btrfs_find_device_by_devspec(fs_info, devid, device_path);
> -
> -	if (IS_ERR(device)) {
> -		if (PTR_ERR(device) == -ENOENT &&
> -		    device_path && strcmp(device_path, "missing") == 0)
> +	device = btrfs_find_device(fs_info->fs_devices, args);
> +	if (!device) {
> +		if (args->missing)
>   			ret = BTRFS_ERROR_DEV_MISSING_NOT_FOUND;
>   		else
> -			ret = PTR_ERR(device);
> +			ret = -ENOENT;
>   		goto out;
>   	}
>   
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 3fe5ac683f98..223c390ec057 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -527,7 +527,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
>   void btrfs_put_dev_args_from_path(struct btrfs_dev_lookup_args *args);
>   void btrfs_free_device(struct btrfs_device *device);
>   int btrfs_rm_device(struct btrfs_fs_info *fs_info,
> -		    const char *device_path, u64 devid,
> +		    struct btrfs_dev_lookup_args *args,
>   		    struct block_device **bdev, fmode_t *mode);
>   void __exit btrfs_cleanup_fs_uuids(void);
>   int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len);
> 


  reply	other threads:[~2021-10-06  8:55 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-05 20:12 [PATCH v4 0/6] Fix lockdep issues around device removal Josef Bacik
2021-10-05 20:12 ` [PATCH v4 1/6] btrfs: use num_device to check for the last surviving seed device Josef Bacik
2021-10-05 20:12 ` [PATCH v4 2/6] btrfs: add comments for device counts in struct btrfs_fs_devices Josef Bacik
2021-10-05 20:12 ` [PATCH v4 3/6] btrfs: do not call close_fs_devices in btrfs_rm_device Josef Bacik
2021-10-06  6:40   ` Nikolay Borisov
2021-10-13 17:47     ` David Sterba
2021-10-06  8:55   ` Anand Jain
2021-10-05 20:12 ` [PATCH v4 4/6] btrfs: handle device lookup with btrfs_dev_lookup_args Josef Bacik
2021-10-06  7:34   ` Nikolay Borisov
2021-10-13 18:23     ` David Sterba
2021-10-06  8:58   ` Anand Jain
2021-10-05 20:12 ` [PATCH v4 5/6] btrfs: add a btrfs_get_dev_args_from_path helper Josef Bacik
2021-10-06  8:24   ` Nikolay Borisov
2021-10-06  8:58   ` Anand Jain
2021-10-05 20:12 ` [PATCH v4 6/6] btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls Josef Bacik
2021-10-06  8:54   ` Anand Jain [this message]
2021-10-06  9:05     ` Nikolay Borisov
2021-10-19  3:42       ` Anand Jain
2021-10-18 15:22 ` [PATCH v4 0/6] Fix lockdep issues around device removal David Sterba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e81e4690-377d-40f1-8488-21530ee6c57d@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.