All of lore.kernel.org
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@kernel.org>
To: Naohiro Aota <Naohiro.Aota@wdc.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"david@fromorbit.com" <david@fromorbit.com>
Subject: Re: [PATCH v2 2/4] btrfs: mark device addition as mnt_want_write_file
Date: Thu, 17 Mar 2022 10:50:16 +0000	[thread overview]
Message-ID: <YjMSaLIhKNcKUuHM@debian9.Home> (raw)
In-Reply-To: <20220317073628.slh4iyexpen7lmjh@naota-xeon>

On Thu, Mar 17, 2022 at 07:36:29AM +0000, Naohiro Aota wrote:
> On Wed, Mar 16, 2022 at 04:06:26PM +0000, Filipe Manana wrote:
> > On Wed, Mar 16, 2022 at 10:22:38PM +0900, Naohiro Aota wrote:
> > > btrfs_init_new_device() calls btrfs_relocate_sys_chunk() which incurs
> > > file-system internal writing. That writing can cause a deadlock with
> > > FS freezing like as described in like as described in commit
> > > 26559780b953 ("btrfs: zoned: mark relocation as writing").
> > > 
> > > Mark the device addition as mnt_want_write_file. This is also consistent
> > > with the removing device ioctl counterpart.
> > > 
> > > Cc: stable@vger.kernel.org # 4.9+
> > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> > > ---
> > >  fs/btrfs/ioctl.c | 11 +++++++++--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > > index 60c907b14547..a6982a1fde65 100644
> > > --- a/fs/btrfs/ioctl.c
> > > +++ b/fs/btrfs/ioctl.c
> > > @@ -3474,8 +3474,10 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
> > >  	return ret;
> > >  }
> > >  
> > > -static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
> > > +static long btrfs_ioctl_add_dev(struct file *file, void __user *arg)
> > >  {
> > > +	struct inode *inode = file_inode(file);
> > > +	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> > >  	struct btrfs_ioctl_vol_args *vol_args;
> > >  	bool restore_op = false;
> > >  	int ret;
> > > @@ -3488,6 +3490,10 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
> > >  		return -EINVAL;
> > >  	}
> > >  
> > > +	ret = mnt_want_write_file(file);
> > > +	if (ret)
> > > +		return ret;
> > 
> > So, this now breaks all test cases that exercise device seeding, and I clearly
> > forgot about seeding when I asked about why not use mnt_want_write_file()
> > instead of a bare call to sb_start_write():
> 
> Ah, yes, I also confirmed they fail.
> 
> > 
> > $ ./check btrfs/161 btrfs/162 btrfs/163 btrfs/164 btrfs/248
> ><snip>
> > Ran: btrfs/161 btrfs/162 btrfs/163 btrfs/164 btrfs/248
> > Failures: btrfs/161 btrfs/162 btrfs/163 btrfs/164 btrfs/248
> > Failed 5 of 5 tests
> > 
> > So device seeding introduces a special case. If we mount a seeding
> > filesystem, it's RO, so the mnt_want_write_file() fails.
> 
> Yeah, so we are in a mixed state here. It's RO with a seeding
> device. Or, it must be RW otherwise (checked in
> btrfs_init_new_device()).
> 
> > Something like this deals with it and it makes the tests pass:
> > 
> ><snip>
> > 
> > We are also changing the semantics as we no longer allow for adding a device
> > to a RO filesystem. So the lack of a mnt_want_write_file() was intentional
> > to deal with the seeding filesystem case. But calling mnt_want_write_file()
> > if we are not seeding, changes the semantics - I'm not sure if anyone relies
> > on the ability to add a device to a fs mounted RO, I'm not seeing if it's an
> > useful use case.
> 
> Adding a device to RO FS anyway returns -EROFS from
> btrfs_init_new_device(). So, there is no change.
> 
> > So either we do that special casing like in that diff, or we always do the
> > sb_start_write() / sb_end_write() - in any case please add a comment explaining
> > why we do it like that, why we can't use mnt_want_write_file().
> 
> The conditional using of sb_start_write() or mnt_want_write_file()
> seems a bit dirty. And, I just thought, marking the FS "writing" when
> it's read-only also seems odd.
> 
> I'm now thinking we should have sb_start_write() around here where the
> FS is surely RW.
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 393fc7db99d3..50e02dc4e2b2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2731,6 +2731,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  
>  	mutex_unlock(&fs_devices->device_list_mutex);
>  
> +	sb_start_write(fs_info->sb);
> +
>  	if (seeding_dev) {
>  		mutex_lock(&fs_info->chunk_mutex);
>  		ret = init_first_rw_device(trans);
> @@ -2786,6 +2788,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  		ret = btrfs_commit_transaction(trans);
>  	}
>  
> +	sb_end_write(fs_info->sb);
> +
>  	/*
>  	 * Now that we have written a new super block to this device, check all
>  	 * other fs_devices list if device_path alienates any other scanned
> @@ -2801,6 +2805,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  	return ret;
>  
>  error_sysfs:
> +	sb_end_write(fs_info->sb);
> +
>  	btrfs_sysfs_remove_device(device);
>  	mutex_lock(&fs_info->fs_devices->device_list_mutex);
>  	mutex_lock(&fs_info->chunk_mutex);

Why not just reduce the scope to surround the btrfs_relocate_sys_chunks() call?
It's simpler, and I don't see why all the other code needs to be surround by
sb_start_write() and sb_end_write().

Actually, relocating system chunks does not create ordered extents - that only
happens for data block groups. So we could could get away with all this, and
have the relocation code do the assertion only if we are relocating a data
block group - so no need to touch anything in the device add path.

Thanks.

> 
> > Thanks.
> > 
> > 
> > > +
> > >  	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_DEV_ADD)) {
> > >  		if (!btrfs_exclop_start_try_lock(fs_info, BTRFS_EXCLOP_DEV_ADD))
> > >  			return BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS;
> > > @@ -3520,6 +3526,7 @@ static long btrfs_ioctl_add_dev(struct btrfs_fs_info *fs_info, void __user *arg)
> > >  		btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED);
> > >  	else
> > >  		btrfs_exclop_finish(fs_info);
> > > +	mnt_drop_write_file(file);
> > >  	return ret;
> > >  }
> > >  
> > > @@ -5443,7 +5450,7 @@ long btrfs_ioctl(struct file *file, unsigned int
> > >  	case BTRFS_IOC_RESIZE:
> > >  		return btrfs_ioctl_resize(file, argp);
> > >  	case BTRFS_IOC_ADD_DEV:
> > > -		return btrfs_ioctl_add_dev(fs_info, argp);
> > > +		return btrfs_ioctl_add_dev(file, argp);
> > >  	case BTRFS_IOC_RM_DEV:
> > >  		return btrfs_ioctl_rm_dev(file, argp);
> > >  	case BTRFS_IOC_RM_DEV_V2:
> > > -- 
> > > 2.35.1
> > > 

  reply	other threads:[~2022-03-17 10:50 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-16 13:22 [PATCH v2 0/4] protect relocation with sb_start_write Naohiro Aota
2022-03-16 13:22 ` [PATCH v2 1/4] btrfs: mark resumed async balance as writing Naohiro Aota
2022-03-16 15:57   ` Filipe Manana
2022-03-17  7:11     ` Naohiro Aota
2022-03-16 13:22 ` [PATCH v2 2/4] btrfs: mark device addition as mnt_want_write_file Naohiro Aota
2022-03-16 16:06   ` Filipe Manana
2022-03-17  7:36     ` Naohiro Aota
2022-03-17 10:50       ` Filipe Manana [this message]
2022-03-22  4:30         ` Naohiro Aota
2022-03-22 13:11           ` Filipe Manana
2022-03-23  2:26             ` Naohiro Aota
2022-03-18  7:56   ` [btrfs] cd452af388: xfstests.btrfs.218.fail kernel test robot
2022-03-18  7:56     ` kernel test robot
2022-03-16 13:22 ` [PATCH v2 3/4] fs: add check functions for sb_start_{write,pagefault,intwrite} Naohiro Aota
2022-03-17  8:34   ` Dave Chinner
2022-03-17 11:13     ` Naohiro Aota
2022-03-16 13:22 ` [PATCH v2 4/4] btrfs: assert that relocation is protected with sb_start_write() Naohiro Aota

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjMSaLIhKNcKUuHM@debian9.Home \
    --to=fdmanana@kernel.org \
    --cc=Johannes.Thumshirn@wdc.com \
    --cc=Naohiro.Aota@wdc.com \
    --cc=david@fromorbit.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.