From: David Sterba <dsterba@suse.cz>
To: David Sterba <dsterba@suse.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v2 0/6] RAID1 with 3- and 4- copies
Date: Tue, 25 Jun 2019 19:47:13 +0200 [thread overview]
Message-ID: <20190625174712.GV8917@twin.jikos.cz> (raw)
In-Reply-To: <cover.1559917235.git.dsterba@suse.com>
On Mon, Jun 10, 2019 at 02:29:40PM +0200, David Sterba wrote:
> Hi,
>
> this patchset brings the RAID1 with 3 and 4 copies as a separate
> feature as outlined in V1
> (https://lore.kernel.org/linux-btrfs/cover.1531503452.git.dsterba@suse.com/).
>
> This should help a bit in the raid56 situation, where the write hole
> hurts most for metadata, without a block group profile that offers 2
> device loss resistance.
>
> I've gathered some feedback from knowledgeable people on IRC and the
> following setup is considered good enough (certainly better than what we
> have now):
>
> - data: RAID6
> - metadata: RAID1C3
>
> The RAID1C3 vs RAID6 have different characteristics in terms of space
> consumption and repair.
>
>
> Space consumption
> ~~~~~~~~~~~~~~~~~
>
> * RAID6 multiplies metadata space consumption by N/(N-2), so with more
> devices the parity overhead ratio is small
>
> * RAID1C3 will always consume 67% of metadata chunks for redundancy
>
> The overall size of metadata is typically in the range of gigabytes to
> hundreds of gigabytes (depending on the use case); a rough estimate is
> 1%-10% of the filesystem size. With larger filesystems the percentage
> is usually smaller.
>
> So, for the 3-copy raid1 the cost of redundancy is better expressed as
> the absolute number of gigabytes "wasted" on redundancy than as the
> ratio, which does look scary compared to raid6.
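The comparison above can be put in numbers; a minimal shell sketch (the device counts are arbitrary examples) of the redundancy overhead as a share of raw space:

```shell
# RAID6 dedicates 2 of n stripes to parity, so its overhead shrinks as
# devices are added; RAID1C3 keeps 2 redundant copies out of 3 no
# matter how many devices there are, i.e. about 67%.
for n in 4 6 10 20; do
    raid6=$(( 200 / n ))          # integer percent: 2/n * 100
    echo "n=$n  raid6=${raid6}%  raid1c3=67%"
done
```

For a 10-device array the RAID6 overhead is 20% of raw space versus a constant 67% for RAID1C3, which is why the absolute gigabyte figure matters more here than the ratio.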
>
>
> Repair
> ~~~~~~
>
> RAID6 needs to access all available devices to calculate P and Q,
> whether 1 or 2 devices are missing.
>
> RAID1C3 can utilize the independence of each copy and the way RAID1
> works in btrfs. In the scenario with 1 missing device, one of the 2
> remaining correct copies is read and written to the replacement device.
>
> Given how the 2-copy RAID1 works on btrfs, the block groups could be
> spread over several devices so the load during repair would be spread as
> well.
>
> Additionally, device replace works sequentially and in big chunks so on
> a lightly used system the read pattern is seek-friendly.
>
>
> Compatibility
> ~~~~~~~~~~~~~
>
> The new block group types cost an incompatibility bit, so old kernels
> will refuse to mount a filesystem with the RAID1C3 feature, i.e. one
> with any chunk of the new type.
>
> To upgrade existing filesystems, use the balance filters, e.g. to
> convert from RAID6:
>
> $ btrfs balance start -mconvert=raid1c3 /path
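A slightly fuller sketch of the same upgrade, assuming a filesystem mounted at /path with enough devices for both target profiles (raid1c3 and raid6 each need at least 3):

```shell
# Convert metadata to 3 copies and data to raid6 in one balance run,
# then check the resulting block group profiles.
btrfs balance start -mconvert=raid1c3 -dconvert=raid6 /path
btrfs filesystem df /path
```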
>
>
> Merge target
> ~~~~~~~~~~~~
>
> I'd like to push that to misc-next for wider testing and merge it for
> 5.3, unless something bad pops up. Given that the code changes are
> small, just new types with their constraints, and the rest is done by
> the generic code, I'm not expecting problems that can't be fixed before
> the final release.
>
>
> Testing so far
> ~~~~~~~~~~~~~~
>
> * mkfs with the profiles
> * fstests (no specific tests, only check that it does not break)
> * profile conversions between single/raid1/raid5/raid1c3/raid6/raid1c4
> with added devices where needed
> * scrub
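The mkfs and conversion testing can be reproduced without spare hardware; a sketch using loop devices (the file names, sizes, and /mnt mount point are arbitrary; needs root and a btrfs-progs with raid1c3 support):

```shell
# Back three loop devices with sparse files and create a filesystem
# with 3-copy data and metadata on them.
for i in 1 2 3; do truncate -s 2G /tmp/img$i; done
DEVS=$(for i in 1 2 3; do losetup --show -f /tmp/img$i; done)

mkfs.btrfs -f -m raid1c3 -d raid1c3 $DEVS
mount $(echo $DEVS | awk '{print $1}') /mnt
btrfs filesystem usage /mnt     # profiles should report RAID1C3
umount /mnt
```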
>
> TODO:
>
> * 1 missing device followed by repair
> * 2 missing devices followed by repair
Unfortunately neither of the two cases works as expected and I don't have
time to fix it before the 5.3 deadline. As the 3-copy RAID1 is supposed
to be a replacement for raid6, I consider the lack of repair capability
a show stopper, so the main part of the patchset is postponed.
The test I did was something like this:
- create fs with 3 devices, raid1c3
- fill with some data
- unmount
- wipe 2nd device
- mount degraded
- replace missing
- remount read-write, write data to verify that it works
- unmount
- mount as usual <-- here it fails and the device is still reported missing
The same happens for the 2 missing devices.
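For reference, the sequence above written down as a script; a sketch against loop devices (the device names and the devid of the wiped device are illustrative; the last step is the one that currently fails):

```shell
mkfs.btrfs -f -m raid1c3 -d raid1c3 /dev/loop0 /dev/loop1 /dev/loop2
mount /dev/loop0 /mnt
cp -a /usr/share/doc /mnt            # fill with some data
umount /mnt

wipefs -a /dev/loop1                 # wipe the 2nd device
mount -o degraded /dev/loop0 /mnt
btrfs replace start -B 2 /dev/loop3 /mnt   # replace missing devid 2
mount -o remount,rw /mnt
touch /mnt/write-test                # verify that writes work
umount /mnt

mount /dev/loop0 /mnt   # <-- fails, the device is still reported missing
```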
Thread overview: 14+ messages
2019-06-10 12:29 [PATCH v2 0/6] RAID1 with 3- and 4- copies David Sterba
2019-06-10 12:29 ` [PATCH v2 1/6] btrfs: add mask for all RAID1 types David Sterba
2019-06-10 12:29 ` [PATCH v2 2/6] btrfs: use mask for RAID56 profiles David Sterba
2019-06-10 12:29 ` [PATCH v2 3/6] btrfs: document BTRFS_MAX_MIRRORS David Sterba
2019-06-10 12:29 ` [PATCH v2 4/6] btrfs: add support for 3-copy replication (raid1c3) David Sterba
2019-06-10 12:29 ` [PATCH v2 5/6] btrfs: add support for 4-copy replication (raid1c4) David Sterba
2019-06-10 12:29 ` [PATCH v2 6/6] btrfs: add incompat for raid1 with 3, 4 copies David Sterba
2019-06-10 12:29 ` [PATCH] btrfs-progs: add support for raid1c3 and raid1c4 David Sterba
2019-06-10 12:42 ` [PATCH v2 0/6] RAID1 with 3- and 4- copies Hugo Mills
2019-06-10 14:02 ` David Sterba
2019-06-10 14:48 ` Hugo Mills
2019-06-11 9:53 ` David Sterba
2019-06-11 12:03 ` David Sterba
2019-06-25 17:47 ` David Sterba [this message]