* Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
@ 2017-11-02 2:45 Dave
2017-11-02 7:29 ` ronnie sahlberg
2017-11-02 22:06 ` waxhead
0 siblings, 2 replies; 6+ messages in thread
From: Dave @ 2017-11-02 2:45 UTC (permalink / raw)
To: Linux fs Btrfs
Has this been discussed here? Has anything changed since it was written?
Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS
and MDADM (Dec 2014) – Ronny Egners Blog
http://blog.ronnyegner-consulting.de/2014/12/10/parity-based-redundancy-raid56triple-parity-and-beyond-on-btrfs-and-mdadm-dec-2014/comment-page-1/
TL;DR: There are patches to extend the Linux kernel to support up to 6
parity disks, but BTRFS does not want them because they do not fit its
“business case”, and MDADM would want them, but somebody needs to
develop patches for the MDADM component. The kernel raid
implementation is ready and usable. If someone volunteers to do this
kind of work, I would support them with equipment and with myself as a
test resource.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
2017-11-02 2:45 Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog Dave
@ 2017-11-02 7:29 ` ronnie sahlberg
2017-11-02 11:21 ` Austin S. Hemmelgarn
2017-11-02 22:06 ` waxhead
1 sibling, 1 reply; 6+ messages in thread
From: ronnie sahlberg @ 2017-11-02 7:29 UTC (permalink / raw)
To: Dave; +Cc: Linux fs Btrfs
I think it is just a matter of lack of resources.
The very few paid developers working on btrfs probably do not have
parity raid as a priority.
(And honestly, parity raid is probably much better implemented below
the filesystem in any case, i.e. in say the md driver or the array
itself).
Also, until at least about a year ago, RAID56 was known to be
completely broken in btrfs and would destroy all your data.
Not a question of if, but when.
So, considering the state of parity raid in btrfs it is understandable
if the few resources available would not work on Andrea's 6 parity
raid code.
I don't follow the parity raid code in btrfs closely; it might be
fixed by now, or it might still be pathologically broken. I don't know.
I assume it is still deadly to use btrfs raid5/6.
That said, that the MDADM folks did not pick up on Andrea's work is a tragedy.
While it is really just Reed-Solomon coding, his breakthrough was that
he found a 6-parity Reed-Solomon encoding where the first two parities
were identical to the RAID5/6 parities.
I.e. you could add a third parity to a normal RAID6 and thus create a
3-parity system without having to recompute the first and second
parity.
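The property described above can be sketched with toy GF(2^8) arithmetic. This is an illustrative sketch only, not Andrea's actual code: P uses generator 1 (plain XOR, the RAID5 parity), Q uses generator 2 (the standard RAID6 Q), and the third parity here uses generator 4 purely as a placeholder. Part of Andrea's actual contribution was finding generators that stay invertible for realistic disk counts beyond two parities, which naive power-of-two generators do not.

```python
# Classic RAID6-style parity over GF(2^8) with the standard
# polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d). The point of the
# sketch: adding a third parity strip never touches P or Q.

def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo 0x11d."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

def parity(data: list[bytes], gen: int) -> bytes:
    """One parity strip: XOR of gen^i * data[i] over all data strips."""
    out = bytearray(len(data[0]))
    coef = 1
    for strip in data:
        for j, byte in enumerate(strip):
            out[j] ^= gf_mul(coef, byte)
        coef = gf_mul(coef, gen)
    return bytes(out)

data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
p = parity(data, 1)   # generator 1 -> plain XOR, i.e. the RAID5 parity
q = parity(data, 2)   # generator 2 -> the usual RAID6 Q parity
r = parity(data, 4)   # hypothetical third parity; p and q are untouched
```

Because each parity strip depends only on the data strips and its own generator, upgrading a RAID6 array to triple parity only requires computing and writing the new strip.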
On Thu, Nov 2, 2017 at 12:45 PM, Dave <davestechshop@gmail.com> wrote:
> Has this been discussed here? Has anything changed since it was written?
>
> Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS
> and MDADM (Dec 2014) – Ronny Egners Blog
> http://blog.ronnyegner-consulting.de/2014/12/10/parity-based-redundancy-raid56triple-parity-and-beyond-on-btrfs-and-mdadm-dec-2014/comment-page-1/
>
> TL;DR: There are patches to extend the linux kernel to support up to 6
> parity disks but BTRFS does not want them because it does not fit
> their “business case” and MDADM would want them but somebody needs to
> develop patches for the MDADM component. The kernel raid
> implementation is ready and usable. If someone volunteers to do this
> kind of work I would support with equipment and myself as a test
> resource.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
2017-11-02 7:29 ` ronnie sahlberg
@ 2017-11-02 11:21 ` Austin S. Hemmelgarn
0 siblings, 0 replies; 6+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-02 11:21 UTC (permalink / raw)
To: ronnie sahlberg, Dave; +Cc: Linux fs Btrfs
On 2017-11-02 03:29, ronnie sahlberg wrote:
> I think it is just a matter of lack of resources.
> The very few paid developers working on btrfs probably do not have
> parity raid as a priority.
> (And honestly, parity raid is probably much better implemented below
> the filesystem in any case, i.e. in say the md driver or the array
> itself).
More specifically, they likely don't have incentive to work on
higher-order parity raid. The common case for most of the paid
developers' companies is running large-scale storage using BTRFS as a
back-end. In such a situation, it makes more sense to use smaller BTRFS
volumes in raid5 or raid6 mode (or even raid1 or raid10), and handle
striping at the cluster level instead of the node level.
>
> Also, until at least about a year ago, RAID56 was known to be
> completely broken in btrfs and would destroy all your data.
> Not a question of if, but when.
>
> So, considering the state of parity raid in btrfs it is understandable
> if the few resources available would not work on Andrea's 6 parity
> raid code.
> I don't follow the parity raid code in btrfs closely, it might be
> fixed by now or it might still be pathologically broken. I don't know.
> I assume it is still deadly to use btrfs raid5/6.
AFAIK, it's better than it was, but it's still broken.
>
> That said, that the MDADM folks did not pick up on Andrea's work is a tragedy.
> While it is really just Reed-Solomon coding, his breakthrough was that
> he found a 6-parity Reed-Solomon encoding where the first two parities
> were identical to the RAID5/6 parities.
> I.e. you could add a third parity to a normal RAID6 and thus create a
> 3-parity system without having to recompute the first and second
> parity.
>
>
>
>
> On Thu, Nov 2, 2017 at 12:45 PM, Dave <davestechshop@gmail.com> wrote:
>> Has this been discussed here? Has anything changed since it was written?
>>
>> Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS
>> and MDADM (Dec 2014) – Ronny Egners Blog
>> http://blog.ronnyegner-consulting.de/2014/12/10/parity-based-redundancy-raid56triple-parity-and-beyond-on-btrfs-and-mdadm-dec-2014/comment-page-1/
>>
>> TL;DR: There are patches to extend the linux kernel to support up to 6
>> parity disks but BTRFS does not want them because it does not fit
>> their “business case” and MDADM would want them but somebody needs to
>> develop patches for the MDADM component. The kernel raid
>> implementation is ready and usable. If someone volunteers to do this
>> kind of work I would support with equipment and myself as a test
>> resource.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
2017-11-02 2:45 Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog Dave
2017-11-02 7:29 ` ronnie sahlberg
@ 2017-11-02 22:06 ` waxhead
2017-11-04 1:09 ` Chris Murphy
2017-11-05 6:52 ` Duncan
1 sibling, 2 replies; 6+ messages in thread
From: waxhead @ 2017-11-02 22:06 UTC (permalink / raw)
To: Dave, Linux fs Btrfs
Dave wrote:
> Has this been discussed here? Has anything changed since it was written?
>
I have (more or less) been following the mailing list since this feature
was suggested. I have been drooling over it ever since, but not much has
happened.
> Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS
> and MDADM (Dec 2014) – Ronny Egners Blog
> http://blog.ronnyegner-consulting.de/2014/12/10/parity-based-redundancy-raid56triple-parity-and-beyond-on-btrfs-and-mdadm-dec-2014/comment-page-1/
>
> TL;DR: There are patches to extend the linux kernel to support up to 6
> parity disks but BTRFS does not want them because it does not fit
> their “business case” and MDADM would want them but somebody needs to
> develop patches for the MDADM component. The kernel raid
> implementation is ready and usable. If someone volunteers to do this
> kind of work I would support with equipment and myself as a test
> resource.
> --
I am just a list "stalker" and no BTRFS developer, but as others have
indirectly said already, it is not so much that BTRFS doesn't want the
patches as that BTRFS doesn't want to / can't focus on this right now
due to other priorities.
There were some updates to raid5/6 in kernel 4.12 that should fix (or at
least improve) scrub/auto-repair. The write hole still exists.
That being said, there might be configurations where btrfs raid5/6 might
be of some use. I think I read somewhere that you can set data to
raid5/6 and METADATA to raid1 or 10, and you would risk losing some data
(but not the filesystem) in the event of a system crash / power failure.
This sounds tempting, since in theory it would not make btrfs raid 5/6
significantly less reliable than other RAID implementations, which will
corrupt your data if a disk happens to spit out bad bits without
complaining (one possible exception that might catch this is md raid6,
which I use). That being said, there is no way I would personally use
btrfs raid 5/6, even with metadata raid1/10, without properly tested
backups on standby at this point.
Anyway - I would worry more about getting raid5/6 to work properly
before even thinking about multi-parity at all :)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
2017-11-02 22:06 ` waxhead
@ 2017-11-04 1:09 ` Chris Murphy
2017-11-05 6:52 ` Duncan
1 sibling, 0 replies; 6+ messages in thread
From: Chris Murphy @ 2017-11-04 1:09 UTC (permalink / raw)
To: waxhead; +Cc: Dave, Linux fs Btrfs
For what it's worth, cryptsetup 2 now offers a UI for setting up both
dm-verity and dm-integrity.
https://www.kernel.org/pub/linux/utils/cryptsetup/v2.0/v2.0.0-rc0-ReleaseNotes
While more complicated than Btrfs, it's possible to first create an
integrity device on each drive, and then add those integrity block
devices to mdadm or lvm as physical devices to create the raid1/10/5/6
array. You could do it the other way around, but if you do it as
described, a sector read that fails checksum matching causes a read
error to be handed off to the md driver, which then reconstructs the
data from parity. If you instead make a single integrity volume out of
the whole array, your filesystem just gets a read error whenever
there's a checksum mismatch; reconstruction isn't possible, but at
least you're warned.
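A minimal sketch of that layering, with placeholder partition names (/dev/sdb1 through /dev/sde1) and no tuning; this is an untested outline, not a recipe, and integritysetup's 'format' wipes the device.

```shell
# 1. Put a standalone dm-integrity layer on each member partition.
#    integritysetup ships with cryptsetup 2.x.
for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
    integritysetup format "$dev"
    integritysetup open "$dev" "int-$(basename "$dev")"
done

# 2. Build the md array on top of the integrity devices, so that a
#    checksum mismatch surfaces to md as a read error on one member
#    and md can reconstruct the sector from parity.
mdadm --create /dev/md0 --level=6 --raid-devices=4 \
    /dev/mapper/int-sdb1 /dev/mapper/int-sdc1 \
    /dev/mapper/int-sdd1 /dev/mapper/int-sde1

# 3. Any filesystem on top; btrfs would also work here.
mkfs.ext4 /dev/md0
```

The ordering is the whole point: integrity below md turns silent corruption into a correctable read error; integrity above md only detects it.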
---
Chris Murphy
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
2017-11-02 22:06 ` waxhead
2017-11-04 1:09 ` Chris Murphy
@ 2017-11-05 6:52 ` Duncan
1 sibling, 0 replies; 6+ messages in thread
From: Duncan @ 2017-11-05 6:52 UTC (permalink / raw)
To: linux-btrfs
waxhead posted on Thu, 02 Nov 2017 23:06:41 +0100 as excerpted:
> Dave wrote:
>>
>> TL;DR: There are patches to extend the linux kernel to support up to 6
>> parity disks but BTRFS does not want them because it does not fit their
>> “business case” and MDADM would want them but somebody needs to develop
>> patches for the MDADM component. The kernel raid implementation is
>> ready and usable. If someone volunteers to do this kind of work I would
>> support with equipment and myself as a test resource.
>> --
> I am just a list "stalker" and no BTRFS developer, but as others have
> indirectly said already, it is not so much that BTRFS doesn't want the
> patches as that BTRFS doesn't want to / can't focus on this right now
> due to other priorities.
Indeed.
There's a meme that USAF pilots call situations in which they're
seriously outnumbered by the enemy "target-rich environments."
Using that analogy here, btrfs is a "development-opportunity-rich
environment".
IOW, the basic btrfs design is quite flexible and there's all sorts of
ideas as to what sort of features it'd be nice to have at some point, but
there's way more good feature ideas than there are qualified devs to work
on them, and getting up to speed on btrfs takes long enough even for
experienced kernel/fs devs that it's not the sort of thing where just any
dev can simply pick up a project from the list and have it ready for
mainlining in six months...
Meanwhile, btrfs history is a wash/rinse/repeat list of features that
took rather longer, sometimes /years/ and multiple rewrites longer, to
implement, debug and reasonably stabilize. Quotas/qgroups and the
existing raid56 parity-raid are both prime examples, as the devs have
been working on both features for years, and while both appear to be
/somewhat/ stabilized in terms of egregious bugs, big caveats remain on
both: primarily performance on quotas, and, on raid56, the parity write
hole undermining the normal checksummed data and metadata integrity and
thus the greater reliability people would otherwise choose it for.
Given that status and history, realistic estimates on when particular
features may be available as reasonably stable really extend to years for
features under current development, perhaps the 3-5 year timeframe for
those queued up for development "soon", and very likely the 10 years out
timeframe for anything beyond that.
But the thing is, anything beyond five years out in Linux development by
definition is in practice beyond the reasonably predictable -- just look
back at where Linux was 5 or 10 years ago and the unexpected twists and
turns it has taken since then that have played havoc with predictions
from that period, and project that forward 5 to 10 years, and I imagine
you'll agree. (Tho the history of btrfs itself is in that time frame,
but I'm not saying long term projects can't be started with a hope that
they'll be reasonably successful 5-10 years out, just that the picture
you're trying to project out that far is likely to look wildly different
than the picture when you actually get there. Certainly I don't think
many expected btrfs to take this long, tho others cautioned the
projections were wildly optimistic and 7-10 years to approach stability
wasn't unreasonable.)
The point being, if it's not on the "current" or "queued-to-next" lists,
in practice it's almost certainly 5+ years out, and that's beyond
reasonable predictability range, so it's "bluesky", aka "it'd be nice to
have... someday", range.
And honestly there's quite a lot of ideas in that "bluesky" range, and
just because triple-parity-plus is one of them doesn't mean the devs have
rejected it, just that there's this thing called reality that they're up
against.
I know, because my personal wish-list item, N-way-mirroring, has been on
the "right after raid56 mode, since it'll be re-using some of that code"
queue since before the kernel 3.6 era, with raid56 expected to be
introduced for 3.6 when I first looked at btrfs seriously, and N-way-
mirroring assumed to be introduced perhaps 2-3 kernel cycles later.
Of course I was soon disabused of that notion, but even so, N-way-
mirroring has been "3-5 years out" for more than 3-5 years now, and it's
on the "soon" list, so anything /not/ on that "soon" list... well, better
point your time machine at least 10 years out...
But the one thing that can change that is if there's at least one
*really* interested kernel dev (or sponsor willing to pay sufficiently to
create one, or more if necessary) willing to learn btrfs internals and
take on a particular feature as their major personal task for the multi-
year time-period scope necessary, even if it means coping with the
project possibly getting back-burnered for a year or more in the
process. I believe I've seen one such "from left field" feature merged
in the years since I started following the list with 3.5-ish (tho
unfortunately I don't recall what it was at the moment), and a couple
others that haven't yet
been merged, but they have proof-of-concept code and have been approved
for soon/next, tho they're backburnered for the moment due to
dependencies and merge-queue scheduling issues. The hot-spare patch set
is in that last category, tho a few patches that had been in that set
were recently dusted off, cleaned up and merged as they turned out to be
useful in their own right. That of course is a good thing, since it
makes the remaining patch set smaller and simpler, and less likely to
conflict with other current or queued/soon projects, as it moves forward
in that queue.
> There were some updates to raid5/6 in kernel 4.12 that should fix (or at
> least improve) scrub/auto-repair. The write hole still exists.
>
> That being said, there might be configurations where btrfs raid5/6 might
> be of some use. I think I read somewhere that you can set data to
> raid5/6 and METADATA to raid1 or 10, and you would risk losing some data
> (but not the filesystem) in the event of a system crash / power failure.
>
> This sounds tempting, since in theory it would not make btrfs raid 5/6
> significantly less reliable than other RAID implementations, which will
> corrupt your data if a disk happens to spit out bad bits without
> complaining (one possible exception that might catch this is md raid6,
> which I use). That being said, there is no way I would personally use
> btrfs raid 5/6, even with metadata raid1/10, without properly tested
> backups on standby at this point.
Indeed. Unfortunately, the infamous parity write hole is rather the
antithesis of btrfs' checksummed-integrity feature, and until it's fixed,
the reasons one would choose btrfs in general rather conflict with using
raid56 mode in particular. There's no immediate or easy fix. There /is/
a possible mid-term fix, journaling writes, but that's likely to
absolutely kill write speed, making it impractical for most usage, thus
making the use-case small enough it's arguably not worth the trouble.
But the real fix is unfortunately a near-full rewrite of the current
raid56 mode, using what we've learned from the current implementation to
hopefully create a better one not affected by the write hole (yes,
there are ways around it), which likely puts it 3-5 years out, at least.
I'd put it on the 10-year list, but there does seem to be quite an
interest from current devs, thus upgrading it to the queued list.
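The write hole itself is easy to see in a toy model. This is an assumed, simplified two-data-strip RAID5 stripe, not btrfs code: a crash lands between the data write and the parity write, and a later disk failure turns that stale parity into silent corruption.

```python
# Toy RAID5 write-hole demonstration: one stripe of two data strips
# plus a parity strip, where parity = d0 XOR d1.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"old0", b"old1"
parity = xor(d0, d1)             # consistent stripe: parity = d0 ^ d1

d0 = b"new0"                     # crash: d0 rewritten, parity NOT updated

# Later the disk holding d1 dies; reconstruction uses the stale parity:
reconstructed = xor(parity, d0)
assert reconstructed != b"old1"  # d1 cannot be recovered: silent corruption
```

Checksums can detect that the reconstructed strip is wrong, which is exactly why the write hole undermines the reliability people choose btrfs for: the filesystem knows the data is bad but has no redundant copy left to repair it from.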
Unfortunately, if that's the case, then it may well delay other projects,
including the N-way-mirroring I have a personal interest in and that as I
said has been on that 3-5 year list for longer than that now, even
further.
So I'm 50 now; /maybe/ I'll be able to use btrfs N-way-mirroring from the
nursing home, when I'm 70 or 80... if technology hasn't made btrfs as we
know it obsolete by then...
> Anyway - I would worry more about getting raid5/6 to work properly
> before even thinking about multi-parity at all :)
For sure. Even the "soon" N-way-mirroring, which was waiting for raid56
mode, continues to wait...
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-11-05 6:52 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-02 2:45 Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog Dave
2017-11-02 7:29 ` ronnie sahlberg
2017-11-02 11:21 ` Austin S. Hemmelgarn
2017-11-02 22:06 ` waxhead
2017-11-04 1:09 ` Chris Murphy
2017-11-05 6:52 ` Duncan