From: Naohiro Aota <Naohiro.Aota@wdc.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
"David Sterba" <dsterba@suse.com>, "Chris Mason" <clm@fb.com>,
"Qu Wenruo" <wqu@suse.com>, "Nikolay Borisov" <nborisov@suse.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Hannes Reinecke" <hare@suse.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"Damien Le Moal" <Damien.LeMoal@wdc.com>,
"Matias Bjørling" <mb@lightnvm.io>,
"Johannes Thumshirn" <jthumshirn@suse.de>,
"Bart Van Assche" <bvanassche@acm.org>
Subject: Re: [PATCH 18/19] btrfs: support dev-replace in HMZONED mode
Date: Tue, 18 Jun 2019 09:14:55 +0000
Message-ID: <SN6PR04MB5231A13B01B410DA158530968CEA0@SN6PR04MB5231.namprd04.prod.outlook.com>
In-Reply-To: 20190613143325.bxcbsx5y44upgqle@MacBook-Pro-91.local
On 2019/06/13 23:33, Josef Bacik wrote:
> On Fri, Jun 07, 2019 at 10:10:24PM +0900, Naohiro Aota wrote:
>> Currently, dev-replace copies all the device extents on the source device to
>> the target device, and it also clones new incoming write I/Os from users to
>> the source device into the target device.
>>
>> Cloning incoming I/Os can break the sequential write rule on the target
>> device. When a write is mapped to the middle of a block group, that I/O is
>> directed to the middle of a zone on the target device, which breaks the
>> sequential write rule.
>>
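To make the rule above concrete, here is a minimal userspace sketch of one sequential-write-required zone. It is only a model: `struct zone_model` and `zone_model_write()` are hypothetical names for illustration, not kernel or btrfs APIs. A write is accepted only when it starts exactly at the zone's write pointer, which is why a cloned write landing in the middle of a zone would be rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write-required zone. */
struct zone_model {
	uint64_t start;	/* first byte of the zone */
	uint64_t len;	/* zone size in bytes */
	uint64_t wp;	/* write pointer: next byte that may be written */
};

/*
 * Accept a write only if it starts exactly at the write pointer and
 * still fits in the zone. On success the write pointer advances.
 */
static bool zone_model_write(struct zone_model *z, uint64_t pos, uint64_t bytes)
{
	/* The write pointer must always stay inside the zone. */
	assert(z->wp >= z->start && z->wp <= z->start + z->len);

	if (pos != z->wp)
		return false;	/* not sequential: rejected */
	if (bytes > z->start + z->len - z->wp)
		return false;	/* would cross the end of the zone */

	z->wp += bytes;
	return true;
}
```

In this model, a write at offset 4096 into an empty zone fails, while the same write succeeds once the first 4096 bytes have been written.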
>> However, the cloning function cannot be simply disabled since incoming I/Os
>> targeting already copied device extents must be cloned so that the I/O is
>> executed on the target device.
>>
>> We cannot use dev_replace->cursor_{left,right} to determine whether a bio
>> is going to a not-yet-copied region. Since there is a time gap between
>> finishing btrfs_scrub_dev() and rewriting the mapping tree in
>> btrfs_dev_replace_finishing(), we can have a newly allocated device extent
>> which is neither cloned (by handle_ops_on_dev_replace) nor copied (by the
>> dev-replace process).
>>
>> So the point is to copy only already existing device extents. This patch
>> introduces mark_block_group_to_copy() to mark existing block groups as
>> targets of copying. Then, handle_ops_on_dev_replace() and dev-replace can
>> check the flag to do their job.
>>
>> This patch also handles empty regions between used extents. Since
>> dev-replace is smart enough to copy only used extents on the source device,
>> we have to fill the gaps to honor the sequential write rule on the target
>> device.
>>
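The gap filling can be illustrated with a short sketch. It is a simplified model, not the scrub code from the patch: `struct used_extent` and `copy_with_gap_fill()` are hypothetical names, and extents are assumed to be sorted by start offset and non-overlapping. Before each used extent is copied, the region between the target write pointer and the extent start has to be written (for example with zeroes) so that the copy itself lands at the write pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one used extent inside a zone. */
struct used_extent {
	uint64_t start;	/* byte offset of the extent */
	uint64_t len;	/* length in bytes */
};

/*
 * Walk the used extents in order and advance the modelled target write
 * pointer *wp. Returns the number of padding bytes that had to be written
 * to close the gaps in front of the extents.
 */
static uint64_t copy_with_gap_fill(const struct used_extent *ext, size_t n,
				   uint64_t *wp)
{
	uint64_t filled = 0;

	for (size_t i = 0; i < n; i++) {
		/* Extents must not start behind the write pointer. */
		assert(ext[i].start >= *wp);

		filled += ext[i].start - *wp;		/* fill the gap */
		*wp = ext[i].start + ext[i].len;	/* then copy the extent */
	}
	return filled;
}
```

For extents at offsets 0, 16384 and 32768 with lengths 4096, 8192 and 4096, the model fills 12288 bytes before the second extent and 8192 bytes before the third.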
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> ---
>> fs/btrfs/ctree.h | 1 +
>> fs/btrfs/dev-replace.c | 96 +++++++++++++++++++++++
>> fs/btrfs/extent-tree.c | 32 +++++++-
>> fs/btrfs/scrub.c | 169 +++++++++++++++++++++++++++++++++++++++++
>> fs/btrfs/volumes.c | 27 ++++++-
>> 5 files changed, 319 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index dad8ea5c3b99..a0be2b96117a 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -639,6 +639,7 @@ struct btrfs_block_group_cache {
>> unsigned int has_caching_ctl:1;
>> unsigned int removed:1;
>> unsigned int wp_broken:1;
>> + unsigned int to_copy:1;
>>
>> int disk_cache_state;
>>
>> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
>> index fbe5ea2a04ed..5011b5ce0e75 100644
>> --- a/fs/btrfs/dev-replace.c
>> +++ b/fs/btrfs/dev-replace.c
>> @@ -263,6 +263,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>> device->dev_stats_valid = 1;
>> set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>> device->fs_devices = fs_info->fs_devices;
>> + if (bdev_is_zoned(bdev)) {
>> + ret = btrfs_get_dev_zonetypes(device);
>> + if (ret) {
>> + mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>> + goto error;
>> + }
>> + }
>> list_add(&device->dev_list, &fs_info->fs_devices->devices);
>> fs_info->fs_devices->num_devices++;
>> fs_info->fs_devices->open_devices++;
>> @@ -396,6 +403,88 @@ static char* btrfs_dev_name(struct btrfs_device *device)
>> return rcu_str_deref(device->name);
>> }
>>
>> +static int mark_block_group_to_copy(struct btrfs_fs_info *fs_info,
>> + struct btrfs_device *src_dev)
>> +{
>> + struct btrfs_path *path;
>> + struct btrfs_key key;
>> + struct btrfs_key found_key;
>> + struct btrfs_root *root = fs_info->dev_root;
>> + struct btrfs_dev_extent *dev_extent = NULL;
>> + struct btrfs_block_group_cache *cache;
>> + struct extent_buffer *l;
>> + int slot;
>> + int ret;
>> + u64 chunk_offset, length;
>> +
>> + path = btrfs_alloc_path();
>> + if (!path)
>> + return -ENOMEM;
>> +
>> + path->reada = READA_FORWARD;
>> + path->search_commit_root = 1;
>> + path->skip_locking = 1;
>> +
>> + key.objectid = src_dev->devid;
>> + key.offset = 0ull;
>> + key.type = BTRFS_DEV_EXTENT_KEY;
>> +
>> + while (1) {
>> + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>> + if (ret < 0)
>> + break;
>> + if (ret > 0) {
>> + if (path->slots[0] >=
>> + btrfs_header_nritems(path->nodes[0])) {
>> + ret = btrfs_next_leaf(root, path);
>> + if (ret < 0)
>> + break;
>> + if (ret > 0) {
>> + ret = 0;
>> + break;
>> + }
>> + } else {
>> + ret = 0;
>> + }
>> + }
>> +
>> + l = path->nodes[0];
>> + slot = path->slots[0];
>> +
>> + btrfs_item_key_to_cpu(l, &found_key, slot);
>> +
>> + if (found_key.objectid != src_dev->devid)
>> + break;
>> +
>> + if (found_key.type != BTRFS_DEV_EXTENT_KEY)
>> + break;
>> +
>> + if (found_key.offset < key.offset)
>> + break;
>> +
>> + dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
>> + length = btrfs_dev_extent_length(l, dev_extent);
>> +
>> + chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);
>> +
>> + cache = btrfs_lookup_block_group(fs_info, chunk_offset);
>> + if (!cache)
>> + goto skip;
>> +
>> + cache->to_copy = 1;
>> +
>> + btrfs_put_block_group(cache);
>> +
>> +skip:
>> + key.offset = found_key.offset + length;
>> + btrfs_release_path(path);
>> + }
>> +
>> + btrfs_free_path(path);
>> +
>> + return ret;
>> +}
>> +
>> static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>> const char *tgtdev_name, u64 srcdevid, const char *srcdev_name,
>> int read_src)
>> @@ -439,6 +528,13 @@ static int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>> }
>>
>> need_unlock = true;
>> +
>> + mutex_lock(&fs_info->chunk_mutex);
>> + ret = mark_block_group_to_copy(fs_info, src_device);
>> + mutex_unlock(&fs_info->chunk_mutex);
>> + if (ret)
>> + return ret;
>> +
>> down_write(&dev_replace->rwsem);
>> switch (dev_replace->replace_state) {
>> case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index ff4d55d6ef04..268365dd9a5d 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -29,6 +29,7 @@
>> #include "qgroup.h"
>> #include "ref-verify.h"
>> #include "rcu-string.h"
>> +#include "dev-replace.h"
>>
>> #undef SCRAMBLE_DELAYED_REFS
>>
>> @@ -2022,7 +2023,31 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>> if (btrfs_dev_is_sequential(stripe->dev,
>> stripe->physical) &&
>> stripe->length == stripe->dev->zone_size) {
>> - ret = blkdev_reset_zones(stripe->dev->bdev,
>> + struct btrfs_device *dev = stripe->dev;
>> +
>> + ret = blkdev_reset_zones(dev->bdev,
>> + stripe->physical >>
>> + SECTOR_SHIFT,
>> + stripe->length >>
>> + SECTOR_SHIFT,
>> + GFP_NOFS);
>> + if (!ret)
>> + discarded_bytes += stripe->length;
>> + else
>> + break;
>> + set_bit(stripe->physical >>
>> + dev->zone_size_shift,
>> + dev->empty_zones);
>> +
>> + if (!btrfs_dev_replace_is_ongoing(
>> + &fs_info->dev_replace) ||
>> + stripe->dev != fs_info->dev_replace.srcdev)
>> + continue;
>> +
>> + /* send to target as well */
>> + dev = fs_info->dev_replace.tgtdev;
>> +
>> + ret = blkdev_reset_zones(dev->bdev,
>
> This is unrelated to dev replace, isn't it? Please make this its own patch, and
> its own helper while you are at it. Thanks,
>
> Josef
>
Actually, patch 0015 introduced the zone reset here, and this patch extends
that code to also reset the corresponding zone on the target device while
dev-replace is ongoing. The diff just makes that hard to see here.
I'll add the reset helper in the next version.
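As a rough illustration of the shape such a helper could take, here is a userspace model rather than kernel code. All names (`struct dev_model`, `struct replace_model`, `model_reset_one()`, `model_reset_zone()`) are hypothetical, and the real helper would call blkdev_reset_zones() and update dev->empty_zones instead of touching arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_ZONES 8

/* Hypothetical model of a zoned device. */
struct dev_model {
	uint64_t zone_size;		/* bytes per zone */
	uint64_t wp[MODEL_NR_ZONES];	/* write pointer offset within each zone */
	bool empty[MODEL_NR_ZONES];	/* mirrors the empty_zones bitmap idea */
};

/* Hypothetical model of the dev-replace state. */
struct replace_model {
	bool ongoing;
	struct dev_model *srcdev;
	struct dev_model *tgtdev;
};

/* Reset the single zone starting at @physical on @dev. */
static int model_reset_one(struct dev_model *dev, uint64_t physical)
{
	assert(dev->zone_size > 0);

	uint64_t zone = physical / dev->zone_size;

	if (physical % dev->zone_size || zone >= MODEL_NR_ZONES)
		return -1;	/* not zone aligned, or out of range */

	dev->wp[zone] = 0;
	dev->empty[zone] = true;
	return 0;
}

/*
 * Reset a zone on @dev and, if @dev is the source of an ongoing replace,
 * reset the same zone on the target as well.
 */
static int model_reset_zone(const struct replace_model *r,
			    struct dev_model *dev, uint64_t physical)
{
	int ret = model_reset_one(dev, physical);

	if (ret)
		return ret;
	if (!r->ongoing || dev != r->srcdev)
		return 0;

	return model_reset_one(r->tgtdev, physical);
}
```

Keeping the "also reset the target" decision inside one helper would let the discard path stay a single call per stripe.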
Thanks,