linux-btrfs.vger.kernel.org archive mirror
From: Naohiro Aota <naota@elisp.net>
To: dsterba@suse.cz
Cc: David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, Chris Mason <clm@fb.com>,
	Josef Bacik <jbacik@fb.com>,
	linux-kernel@vger.kernel.org, Hannes Reinecke <hare@suse.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Bart Van Assche <bart.vanassche@wdc.com>,
	Matias Bjorling <mb@lightnvm.io>
Subject: Re: [RFC PATCH 00/17] btrfs zoned block device support
Date: Tue, 28 Aug 2018 19:33:33 +0900
Message-ID: <20180828103333.uuywsztisyirwgir@zazie>
In-Reply-To: <20180813184251.GC24025@twin.jikos.cz>

Thank you for your review!

On Mon, Aug 13, 2018 at 08:42:52PM +0200, David Sterba wrote:
> On Fri, Aug 10, 2018 at 03:04:33AM +0900, Naohiro Aota wrote:
> > This series adds zoned block device support to btrfs.
> 
> Yay, thanks!
> 
> As this is an RFC, I'll give you some. The code looks ok for what it claims
> to do, I'll skip style and unimportant implementation details for now as
> there are bigger questions.
> 
> The zoned devices bring some constraints, so not all filesystem features
> can be expected to work; this rules out any form of in-place
> updates like NODATACOW.
> 
> Then there's a list of 'how will a zoned device work with feature X?'

Here is the current HMZONED status list, based on https://btrfs.wiki.kernel.org/index.php/Status :

Performance
Trim       | OK
Autodefrag | OK
Defrag     | OK
fallocate  | Disabled. Cannot reserve a region in sequential zones
direct IO  | Disabled. Falls back to buffered IO

Compression | OK

Reliability
Auto-repair    | Not working. Needs to rewrite the corrupted extent
Scrub          | Not working. Needs to rewrite the corrupted extent
Scrub + RAID56 | Not working (RAID56)
nodatacow      | Should be disabled (noticed it is not disabled yet)
Device replace | Disabled for now (need to handle write pointer issues; WIP patch)
Degraded mount | OK

Block group profile
Single   | OK
DUP      | OK
RAID0    | OK
RAID1    | OK
RAID10   | OK
RAID56   | Disabled for now. Need to avoid partial parity writes.
Mixed BG | OK

Administration | OK

Misc
Free space tree | Disabled. Not necessary for the sequential allocator
no-holes        | OK
skinny-metadata | OK
extended-refs   | OK

> You disable fallocate and DIO. I haven't looked closer at the fallocate
> case, but DIO could work in the sense that open() will open the file but
> any write will fall back to buffered writes. This is implemented so it
> would need to be wired together.

Actually, it's working like that. When check_direct_IO() returns
-EINVAL, btrfs_direct_IO() still returns 0. As a result, the callers
fall back to buffered IO.

I will reword the commit subject and log to reflect the actual
behavior. Also, I will relax the condition so that only direct write
IOs are disabled.
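
A minimal sketch of the fallback described above, written as a userspace
C model rather than the kernel code. Only check_direct_IO() and
btrfs_direct_IO() are names from this thread; the model_* functions, the
struct and its fields are hypothetical stand-ins, and the real kernel
signatures differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request descriptor; not a kernel structure. */
struct io_req {
	size_t offset;
	size_t len;
	bool hmzoned;	/* filesystem is mounted in HMZONED mode */
};

/* Models check_direct_IO(): reject requests direct IO cannot serve. */
static int model_check_direct_io(const struct io_req *req)
{
	return req->hmzoned ? -EINVAL : 0;
}

/* Models btrfs_direct_IO(): returns bytes done directly, 0 on rejection. */
static long model_direct_io(const struct io_req *req)
{
	if (model_check_direct_io(req))
		return 0;	/* -EINVAL is swallowed, not returned */
	return (long)req->len;
}

/* Stand-in for the buffered path; assumed to always complete. */
static long model_buffered_io(const struct io_req *req)
{
	return (long)req->len;
}

/* Caller: whatever direct IO left undone goes through buffered IO. */
static long model_file_io(const struct io_req *req, bool *used_buffered)
{
	long done = model_direct_io(req);

	*used_buffered = false;
	if (done < 0)
		return done;
	if ((size_t)done < req->len) {
		struct io_req rest = *req;

		rest.offset += (size_t)done;
		rest.len -= (size_t)done;
		*used_buffered = true;
		done += model_buffered_io(&rest);
	}
	return done;
}
```

In this model a request on an HMZONED filesystem still completes in
full, but through the buffered path; the caller never sees -EINVAL.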

> Mixed device types are not allowed, and I tend to agree with that,
> though this could work in principle.  Just that the chunk allocator
> would have to be aware of the device types and tweaked to allocate from
> the same group. The btrfs code is not ready for that in terms of the
> allocator capabilities and configuration options.

Yes, it will work if the allocator is improved to take device type,
zone type and zone size into account.
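
One possible shape for such a check, as a hedged sketch only: every
name below (the enums, struct alloc_region, regions_compatible(),
count_compatible()) is hypothetical, and the rule "all three properties
must match" is an assumption for illustration, not something the
patches implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a device and of a zone on it. */
enum dev_model { DEV_REGULAR, DEV_HOST_AWARE, DEV_HOST_MANAGED };
enum zone_type { ZONE_NONE, ZONE_CONVENTIONAL, ZONE_SEQUENTIAL };

/* Hypothetical candidate region for one stripe of a chunk. */
struct alloc_region {
	enum dev_model model;
	enum zone_type ztype;
	uint64_t zone_size;	/* bytes; 0 for non-zoned devices */
};

/* Assumed rule: stripes of one chunk must agree on all three properties. */
static bool regions_compatible(const struct alloc_region *a,
			       const struct alloc_region *b)
{
	return a->model == b->model &&
	       a->ztype == b->ztype &&
	       a->zone_size == b->zone_size;
}

/* Count the candidates that could share a chunk with the reference. */
static size_t count_compatible(const struct alloc_region *ref,
			       const struct alloc_region *cand, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (regions_compatible(ref, &cand[i]))
			count++;
	return count;
}
```

A chunk allocator built on this would only stripe a chunk across the
regions that count_compatible() accepts for a given reference region.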

> Device replace is disabled, but the changelog suggests there's a way to
> make it work, so it's a matter of implementation. And this should be
> implemented at the time of merge.

I have a WIP patch to support device replace, but it fails after
the replace completes because of a write pointer mismatch. I'm debugging
the code, so the next version may enable the feature.

> RAID5/6 + zoned support is highly desired and lack of it could be
> considered a NAK for the whole series. The drive sizes are expected to
> be several terabytes, that sounds too risky to lack the redundancy
> options (RAID1 is not sufficient here).
> 
> The changelog does not explain why this does not or cannot work, so I
> cannot reason about that or possibly suggest workarounds or solutions.
> But I think it should work in principle.
> 
> As this is the first post and an RFC I don't expect that everything is
> implemented, but at least the known missing points should be documented.
> You've implemented lots of the low-level zoned support and extent
> allocation, so even if the raid56 might be difficult, it should be the
> smaller part.

I was leaving RAID56 for the future, since I'm not yet used to the raid56
code, and its write path (raid56_parity_write) seems to be separate
from the others' (submit_stripe_bio).

I quickly checked whether RAID5 works with the current HMZONED patches. Even
with a simple sequential workload using dd, it caused IO failures because
partial parity writes introduced overwrite IOs, which violate the
sequential write rule. From a quick glance at the raid56 code, I'm
currently not sure how we can avoid partial parity writes while
dispatching the necessary IOs on transaction commit.
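
The failure mode can be illustrated with a small userspace model. This
is a sketch under stated assumptions, not the raid56 code: struct
seq_zone, zone_write() and raid5_partial_write() are hypothetical, a
stripe element is one block, and each device is reduced to a single
sequential zone that accepts writes only at its write pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write-required zone. */
struct seq_zone {
	uint64_t start;	/* first block of the zone */
	uint64_t len;	/* zone length in blocks */
	uint64_t wp;	/* write pointer: next writable block */
};

/* A write is accepted only if it begins exactly at the write pointer. */
static int zone_write(struct seq_zone *z, uint64_t pos, uint64_t nr)
{
	if (pos != z->wp)
		return -EIO;	/* overwrite or gap: rule violated */
	if (nr > z->start + z->len - z->wp)
		return -ENOSPC;
	z->wp += nr;
	return 0;
}

/*
 * Write one data block of a stripe, then the recomputed parity block
 * for that stripe. Parity always lives at the same offset, so a second
 * partial write to the same stripe targets an already written block.
 */
static int raid5_partial_write(struct seq_zone *data,
			       struct seq_zone *parity, uint64_t stripe)
{
	int ret = zone_write(data, data->start + stripe, 1);

	if (ret)
		return ret;
	return zone_write(parity, parity->start + stripe, 1);
}
```

With two data zones and one parity zone, the first partial write to
stripe 0 succeeds and advances the parity write pointer; the second
partial write to the same stripe (through the other data zone) has to
rewrite the parity block behind the write pointer, and the model
rejects it with -EIO.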

Regards,
Naohiro
