Linux-BTRFS Archive on lore.kernel.org
From: Josef Bacik <josef@toxicpanda.com>
To: Naohiro Aota <naohiro.aota@wdc.com>
Cc: linux-btrfs@vger.kernel.org, "David Sterba" <dsterba@suse.com>,
	"Chris Mason" <clm@fb.com>, "Josef Bacik" <josef@toxicpanda.com>,
	"Qu Wenruo" <wqu@suse.com>, "Nikolay Borisov" <nborisov@suse.com>,
	linux-kernel@vger.kernel.org, "Hannes Reinecke" <hare@suse.com>,
	linux-fsdevel@vger.kernel.org,
	"Damien Le Moal" <damien.lemoal@wdc.com>,
	"Matias Bjørling" <mb@lightnvm.io>,
	"Johannes Thumshirn" <jthumshirn@suse.de>,
	"Bart Van Assche" <bvanassche@acm.org>
Subject: Re: [PATCH 18/19] btrfs: support dev-replace in HMZONED mode
Date: Thu, 13 Jun 2019 10:33:26 -0400
Message-ID: <20190613143325.bxcbsx5y44upgqle@MacBook-Pro-91.local>
In-Reply-To: <20190607131025.31996-19-naohiro.aota@wdc.com>

On Fri, Jun 07, 2019 at 10:10:24PM +0900, Naohiro Aota wrote:
> Currently, dev-replace copies all the device extents on the source device
> to the target device, and it also clones new incoming write I/Os from
> users to the source device into the target device.
> 
> Cloning incoming I/Os can break the sequential write rule on the target
> device. When a write is mapped to the middle of a block group, that I/O
> is directed to the middle of a zone on the target device, which breaks
> the sequential write rule.
> 
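A minimal userspace model makes the sequential write rule concrete. It assumes a single sequential-write-required zone that accepts a write only at its current write pointer. struct zone_model, zone_write() and zone_reset() are hypothetical names for illustration, not btrfs or block-layer APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write-required zone.
 * Offsets are in bytes, relative to the start of the zone. */
struct zone_model {
	uint64_t size;  /* zone capacity */
	uint64_t wp;    /* write pointer: the only offset a write may start at */
};

/* Accept a write only if it starts exactly at the write pointer and
 * fits in the remaining space; advance the write pointer on success. */
static int zone_write(struct zone_model *z, uint64_t offset, uint64_t len)
{
	assert(z->wp <= z->size);
	if (offset != z->wp)
		return -EIO;     /* out-of-order write: sequential rule broken */
	if (len > z->size - z->wp)
		return -ENOSPC;  /* write would run past the end of the zone */
	z->wp += len;
	return 0;
}

/* A zone reset rewinds the write pointer to the start of the zone. */
static void zone_reset(struct zone_model *z)
{
	z->wp = 0;
}
```

With a freshly reset target zone (wp == 0), a cloned user write aimed at offset 64 KiB of the block group is rejected, while the same zone accepts data written in order from offset 0.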
> However, the cloning function cannot simply be disabled, since incoming
> I/Os targeting already-copied device extents must be cloned so that the
> I/O is also executed on the target device.
> 
> We cannot use dev_replace->cursor_{left,right} to determine whether a bio
> is going to a not-yet-copied region. Since there is a time gap between
> finishing btrfs_scrub_dev() and rewriting the mapping tree in
> btrfs_dev_replace_finishing(), we can have a newly allocated device extent
> which is neither cloned (by handle_ops_on_dev_replace) nor copied (by the
> dev-replace process).
> 
> So the point is to copy only already-existing device extents. This patch
> introduces mark_block_group_to_copy() to mark existing block groups as
> targets of copying. Then, handle_ops_on_dev_replace() and dev-replace can
> check the flag to do their job.
> 
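The to_copy handshake described above can be sketched as a simplified model. This is not the patch's code: struct bg_model and its helpers are hypothetical. The sketch also assumes something the message does not spell out, namely that the copy process clears the flag once it has finished a block group, so only block groups that are not yet copied skip cloning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block group, reduced to what this
 * sketch needs. */
struct bg_model {
	uint64_t start;   /* logical start */
	uint64_t length;
	bool to_copy;     /* existed when dev-replace started, not yet copied */
};

/* Step 1 (cf. mark_block_group_to_copy()): flag the first n block groups,
 * i.e. the ones that exist on the source device when dev-replace starts. */
static void mark_to_copy(struct bg_model *bgs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		bgs[i].to_copy = true;
}

/* Step 2 (assumed): the copy process clears the flag once the block
 * group has been written to the target. */
static void finish_copy(struct bg_model *bg)
{
	assert(bg->to_copy);
	bg->to_copy = false;
}

/* Step 3: should a user write hitting this block group on the source
 * device be cloned to the target? Flagged block groups will be copied
 * later, so they are skipped; anything else (already copied, or
 * allocated after dev-replace started) is cloned. */
static bool should_clone_write(const struct bg_model *bg)
{
	return !bg->to_copy;
}
```

In this model, block groups allocated after dev-replace starts are never flagged, so writes to them are always cloned. That is the case the cursor-based check misses.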
> This patch also handles empty regions between used extents. Since
> dev-replace is smart enough to copy only used extents on the source
> device, we have to fill the gaps to honor the sequential write rule on
> the target device.
> 
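The gap filling can be modelled as a walk over the used extents of one zone that tracks where the target's write pointer would be. find_gaps() and struct range are hypothetical names. The sketch assumes the extents are sorted by start offset and do not overlap. It only reports the gaps; how the patch actually fills them is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A byte range, relative to the start of a zone. */
struct range {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical helper: store in out[] every gap that must be written
 * (e.g. with zeroes) before a used extent so that the target zone is
 * written strictly sequentially. Returns the number of gaps found. */
static size_t find_gaps(const struct range *used, size_t n,
			struct range *out)
{
	uint64_t cursor = 0;  /* where the target's write pointer would be */
	size_t ngaps = 0;

	for (size_t i = 0; i < n; i++) {
		assert(used[i].start >= cursor);  /* sorted, non-overlapping */
		if (used[i].start > cursor) {
			out[ngaps].start = cursor;
			out[ngaps].len = used[i].start - cursor;
			ngaps++;
		}
		/* Copying the used extent advances the write pointer. */
		cursor = used[i].start + used[i].len;
	}
	return ngaps;
}
```

The caller must size out[] for at least n entries, since each used extent can be preceded by at most one gap.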
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>  fs/btrfs/ctree.h       |   1 +
>  fs/btrfs/dev-replace.c |  96 +++++++++++++++++++++++
>  fs/btrfs/extent-tree.c |  32 +++++++-
>  fs/btrfs/scrub.c       | 169 +++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.c     |  27 ++++++-
>  5 files changed, 319 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index dad8ea5c3b99..a0be2b96117a 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -639,6 +639,7 @@ struct btrfs_block_group_cache {
>  	unsigned int has_caching_ctl:1;
>  	unsigned int removed:1;
>  	unsigned int wp_broken:1;
> +	unsigned int to_copy:1;
>  
>  	int disk_cache_state;
>  
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index fbe5ea2a04ed..5011b5ce0e75 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -263,6 +263,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>  	device->dev_stats_valid = 1;
>  	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>  	device->fs_devices = fs_info->fs_devices;
> +	if (bdev_is_zoned(bdev)) {
> +		ret = btrfs_get_dev_zonetypes(device);
> +		if (ret) {
> +			mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> +			goto error;
> +		}
> +	}
>  	list_add(&device->dev_list, &fs_info->fs_devices->devices);
>  	fs_info->fs_devices->num_devices++;
>  	fs_info->fs_devices->open_devices++;
> @@ -396,6 +403,88 @@ static char* btrfs_dev_name(struct btrfs_device *device)
>  		return rcu_str_deref(device->name);
>  }
>  
> +static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info,
> +				    struct btrfs_device *src_dev)
> +{
> +	struct btrfs_path *path;
> +	struct btrfs_key key;
> +	struct btrfs_key found_key;
> +	struct btrfs_root *root = fs_info->dev_root;
> +	struct btrfs_dev_extent *dev_extent = NULL;
> +	struct btrfs_block_group_cache *cache;
> +	struct extent_buffer *l;
> +	int slot;
> +	int ret;
> +	u64 chunk_offset, length;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	path->reada = READA_FORWARD;
> +	path->search_commit_root = 1;
> +	path->skip_locking = 1;
> +
> +	key.objectid = src_dev->devid;
> +	key.offset = 0ull;
> +	key.type = BTRFS_DEV_EXTENT_KEY;
> +
> +	while (1) {
> +		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +		if (ret < 0)
> +			break;
> +		if (ret > 0) {
> +			if (path->slots[0] >=
> +			    btrfs_header_nritems(path->nodes[0])) {
> +				ret = btrfs_next_leaf(root, path);
> +				if (ret < 0)
> +					break;
> +				if (ret > 0) {
> +					ret = 0;
> +					break;
> +				}
> +			} else {
> +				ret = 0;
> +			}
> +		}
> +
> +		l = path->nodes[0];
> +		slot = path->slots[0];
> +
> +		btrfs_item_key_to_cpu(l, &found_key, slot);
> +
> +		if (found_key.objectid != src_dev->devid)
> +			break;
> +
> +		if (found_key.type != BTRFS_DEV_EXTENT_KEY)
> +			break;
> +
> +		if (found_key.offset < key.offset)
> +			break;
> +
> +		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
> +		length = btrfs_dev_extent_length(l, dev_extent);
> +
> +		chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
> +
> +		cache = btrfs_lookup_block_group(fs_info, chunk_offset);
> +		if (!cache)
> +			goto skip;
> +
> +		cache->to_copy = 1;
> +
> +		btrfs_put_block_group(cache);
> +
> +skip:
> +		key.offset = found_key.offset + length;
> +		btrfs_release_path(path);
> +	}
> +
> +	btrfs_free_path(path);
> +
> +	return ret;
> +}
> +
>  static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  		const char *tgtdev_name, u64 srcdevid, const char *srcdev_name,
>  		int read_src)
> @@ -439,6 +528,13 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  	}
>  
>  	need_unlock = true;
> +
> +	mutex_lock(&fs_info->chunk_mutex);
> +	ret = mark_block_group_to_copy(fs_info, src_device);
> +	mutex_unlock(&fs_info->chunk_mutex);
> +	if (ret)
> +		return ret;
> +
>  	down_write(&dev_replace->rwsem);
>  	switch (dev_replace->replace_state) {
>  	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index ff4d55d6ef04..268365dd9a5d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -29,6 +29,7 @@
>  #include "qgroup.h"
>  #include "ref-verify.h"
>  #include "rcu-string.h"
> +#include "dev-replace.h"
>  
>  #undef SCRAMBLE_DELAYED_REFS
>  
> @@ -2022,7 +2023,31 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>  			if (btrfs_dev_is_sequential(stripe->dev,
>  						    stripe->physical) &&
>  			    stripe->length == stripe->dev->zone_size) {
> -				ret = blkdev_reset_zones(stripe->dev->bdev,
> +				struct btrfs_device *dev = stripe->dev;
> +
> +				ret = blkdev_reset_zones(dev->bdev,
> +							 stripe->physical >>
> +								 SECTOR_SHIFT,
> +							 stripe->length >>
> +								 SECTOR_SHIFT,
> +							 GFP_NOFS);
> +				if (!ret)
> +					discarded_bytes += stripe->length;
> +				else
> +					break;
> +				set_bit(stripe->physical >>
> +					dev->zone_size_shift,
> +					dev->empty_zones);
> +
> +				if (!btrfs_dev_replace_is_ongoing(
> +					    &fs_info->dev_replace) ||
> +				    stripe->dev != fs_info->dev_replace.srcdev)
> +					continue;
> +
> +				/* send to target as well */
> +				dev = fs_info->dev_replace.tgtdev;
> +
> +				ret = blkdev_reset_zones(dev->bdev,

This is unrelated to dev replace, isn't it?  Please make this its own patch,
and its own helper while you are at it.  Thanks,

Josef
