linux-fsdevel.vger.kernel.org archive mirror
From: Naohiro Aota <naohiro.aota@wdc.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>,
	Chris Mason <clm@fb.com>, Nikolay Borisov <nborisov@suse.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Hannes Reinecke <hare@suse.com>,
	Anand Jain <anand.jain@oracle.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 23/28] btrfs: support dev-replace in HMZONED mode
Date: Wed, 18 Dec 2019 15:00:33 +0900
Message-ID: <20191218060033.ubfidtuhvzdbkk3o@naota.dhcp.fujisawa.hgst.com>
In-Reply-To: <2157b1bb-a64b-eed3-0451-09a8480d0db2@toxicpanda.com>

On Tue, Dec 17, 2019 at 04:05:25PM -0500, Josef Bacik wrote:
>On 12/12/19 11:09 PM, Naohiro Aota wrote:
>>We have two types of I/O during the device-replace process. One is an I/O to
>>"copy" (by the scrub functions) all the device extents on the source device
>>to the destination device.  The other is an I/O to "clone" (by
>>handle_ops_on_dev_replace()) new incoming write I/Os from users to the
>>source device into the target device.
>>
>>Cloning incoming I/Os can break the sequential write rule in the target
>>device. When a write is mapped in the middle of a block group, that I/O is
>>directed to the middle of a zone of the target device, which breaks the
>>sequential write rule.
>>
>>However, the cloning function cannot be simply disabled since incoming I/Os
>>targeting already copied device extents must be cloned so that the I/O is
>>executed on the target device.
>>
>>We cannot use dev_replace->cursor_{left,right} to determine whether a bio is
>>going to a not yet copied region.  Since there is a time gap between finishing
>>btrfs_scrub_dev() and rewriting the mapping tree in
>>btrfs_dev_replace_finishing(), we can have a newly allocated device extent
>>which is neither cloned nor copied.
>>
>>So the point is to copy only already existing device extents. This patch
>>introduces mark_block_group_to_copy() to mark existing block group as a
>>target of copying. Then, handle_ops_on_dev_replace() and dev-replace can
>>check the flag to do their job.
>>
>>The device-replace process in HMZONED mode must copy or clone all the extents
>>in the source device exactly once.  So, we need to ensure that allocations
>>started just before the dev-replace process have their corresponding
>>extent information in the B-trees. finish_extent_writes_for_hmzoned()
>>implements that functionality, which basically is the removed code in the
>>commit 042528f8d840 ("Btrfs: fix block group remaining RO forever after
>>error during device replace").
>>
>>This patch also handles empty regions between used extents. Since
>>dev-replace is smart enough to copy only used extents on the source device,
>>we have to fill the gaps to honor the sequential write rule in the target
>>device.
>>
>>Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>
>Can you split up the copying part and the cloning part into different 
>patches, this is a bear to review.  Also I don't quite understand the 
>zeroout behavior. It _looks_ like for cloning you are doing a zeroout 
>for the gap between the last wp position and the current cloned bio, 
>which makes sense, but doesn't this gap exist because copying is 
>ongoing?  Can you copy into a zero'ed out position?  Or am I missing 
>something here?  Thanks,
>
>Josef

OK, I will split this in the next version (though it is mostly the
"copying" part).

Let me clarify first that I am using "copying" for copying existing
extents to the new device and "cloning" for cloning a new incoming BIO
to the new device.

The zeroout is for "copying", which uses the scrub code to copy
existing extents on the source device to the destination device.
Since copying (scrub) only scans live extents, there can be a gap
between two live extents. So, we need to fill that gap with zeroout
to keep the write stream sequential.
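
As a rough illustration of the gap filling described above (a userspace
sketch, not the patch's actual code), the copy pass can be thought of as
building a write plan per zone: whenever the next live extent starts
ahead of the write pointer, a zeroout covering the gap is emitted first.
The names struct live_extent, struct planned_write and
plan_sequential_copy() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one live extent inside a zone, as byte offsets. */
struct live_extent {
	uint64_t start;
	uint64_t len;
};

enum write_kind { WRITE_ZEROOUT, WRITE_COPY };

/* Hypothetical: one write to issue to the target device. */
struct planned_write {
	enum write_kind kind;
	uint64_t start;
	uint64_t len;
};

/*
 * Build a strictly sequential write plan for one zone.
 *
 * extents must be sorted by start and must not overlap; wp is the
 * zone's current write pointer. Returns the number of writes stored
 * in out, or -1 if out is too small or an extent starts behind the
 * write pointer (which would violate the sequential write rule).
 */
static int plan_sequential_copy(uint64_t wp,
				const struct live_extent *extents, size_t nr,
				struct planned_write *out, size_t out_cap)
{
	size_t n = 0;

	assert(out != NULL || out_cap == 0);

	for (size_t i = 0; i < nr; i++) {
		if (extents[i].start < wp)
			return -1;

		/* Gap between the write pointer and the next live extent. */
		if (extents[i].start > wp) {
			if (n == out_cap)
				return -1;
			out[n].kind = WRITE_ZEROOUT;
			out[n].start = wp;
			out[n].len = extents[i].start - wp;
			n++;
			wp = extents[i].start;
		}

		/* The live extent itself now lands exactly at the write pointer. */
		if (n == out_cap)
			return -1;
		out[n].kind = WRITE_COPY;
		out[n].start = wp;
		out[n].len = extents[i].len;
		n++;
		wp += extents[i].len;
	}
	return (int)n;
}
```

With live extents at [4096, 12288) and [16384, 20480) and a write
pointer at 0, the plan is zeroout, copy, zeroout, copy, and every write
starts where the previous one ended.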

And "cloning" is only done for new block groups or already fully
copied block groups. So there are no gaps for them, because the
allocator and the IO locks ensure sequential allocation and
submission.
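
The cloning rule above can be sketched as a small predicate. This is
only an illustration under assumptions: struct bg_replace_state and
should_clone_write() are hypothetical names, and the two booleans stand
in for whatever flag the patch actually keeps on the block group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block-group state during dev-replace. */
struct bg_replace_state {
	bool to_copy;	/* existed when dev-replace started; the copy pass owns it */
	bool copied;	/* the copy pass has finished this block group */
};

/*
 * Decide whether an incoming write to this block group should also be
 * cloned to the target device.
 */
static bool should_clone_write(const struct bg_replace_state *bg)
{
	assert(bg != NULL);

	/* New block group: it will never be copied, so it must be cloned. */
	if (!bg->to_copy)
		return true;

	/* Existing block group: clone only once the copy has completed. */
	return bg->copied;
}
```

A block group that is marked for copying but not yet copied is left to
the copy pass, so a cloned write never lands ahead of the target
zone's write pointer.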

Thanks,
