linux-btrfs.vger.kernel.org archive mirror
* Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
@ 2017-11-02  2:45 Dave
  2017-11-02  7:29 ` ronnie sahlberg
  2017-11-02 22:06 ` waxhead
  0 siblings, 2 replies; 6+ messages in thread
From: Dave @ 2017-11-02  2:45 UTC (permalink / raw)
  To: Linux fs Btrfs

Has this been discussed here? Has anything changed since it was written?

Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS
and MDADM (Dec 2014) – Ronny Egners Blog
http://blog.ronnyegner-consulting.de/2014/12/10/parity-based-redundancy-raid56triple-parity-and-beyond-on-btrfs-and-mdadm-dec-2014/comment-page-1/

TL;DR: There are patches to extend the linux kernel to support up to 6
parity disks but BTRFS does not want them because it does not fit
their “business case” and MDADM would want them but somebody needs to
develop patches for the MDADM component. The kernel raid
implementation is ready and usable. If someone volunteers to do this
kind of work I would support with equipment and myself as a test
resource.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
  2017-11-02  2:45 Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog Dave
@ 2017-11-02  7:29 ` ronnie sahlberg
  2017-11-02 11:21   ` Austin S. Hemmelgarn
  2017-11-02 22:06 ` waxhead
  1 sibling, 1 reply; 6+ messages in thread
From: ronnie sahlberg @ 2017-11-02  7:29 UTC (permalink / raw)
  To: Dave; +Cc: Linux fs Btrfs

I think it is just a matter of a lack of resources.
The very few paid developers working on btrfs probably don't have
parity raid as a priority.
(And honestly, parity raid is probably much better implemented below
the filesystem in any case, i.e. in, say, the md driver or the array
itself.)

Also, until at least about a year ago, RAID56 was known to be
completely broken in btrfs and would destroy all your data.
Not a question of if, but when.

So, considering the state of parity raid in btrfs it is understandable
if the few resources available would not work on Andrea's 6 parity
raid code.
I don't follow the parity raid code in btrfs closely, it might be
fixed by now or it might still be pathologically broken. I don't know.
I assume it is still deadly to use btrfs raid5/6.


That said, it is a tragedy that the MDADM folks did not pick up on Andrea's work.
While it is really just Reed-Solomon coding, his breakthrough was that
he found a 6-parity Reed-Solomon encoding where the first two parities
were identical to the RAID5/6 parities.
I.e. you could add a third parity to a normal RAID6 and thus create a
3-parity system without having to recompute the first and second
parity.
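To make the "nested parities" idea concrete, here is a small Python sketch. The P and Q formulas follow the well-known Linux raid6 math (GF(2^8), polynomial 0x11d, generator 2); the third parity R and its weights are purely illustrative, since Andrea's actual 6-parity construction uses a Cauchy matrix to guarantee recoverability:

```python
# Toy illustration of the property described above: P and Q are exactly
# the RAID5/RAID6 parities, and a third parity R is computed with its
# own independent weights, so adding R never requires recomputing P or
# Q.  The weight choice "4^i" is for illustration only.

def gf_mul(a, b, poly=0x11D):
    """Carry-less multiply in GF(2^8), reduced by the raid6 polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(g, e):
    """g raised to the e-th power in GF(2^8)."""
    x = 1
    for _ in range(e):
        x = gf_mul(x, g)
    return x

def parities(data):
    """Return (P, Q, R) for one byte position across the data disks."""
    p = q = r = 0
    for i, d in enumerate(data):
        p ^= d                        # RAID5 P: plain XOR
        q ^= gf_mul(d, gf_pow(2, i))  # RAID6 Q: weights 2^i
        r ^= gf_mul(d, gf_pow(4, i))  # extra parity: illustrative weights 4^i
    return p, q, r
```

Because R uses its own weights, a third parity could be added to an existing RAID6 by computing R alone, leaving the on-disk P and Q untouched.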




On Thu, Nov 2, 2017 at 12:45 PM, Dave <davestechshop@gmail.com> wrote:
> Has this been discussed here? Has anything changed since it was written?
>
> Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS
> and MDADM (Dec 2014) – Ronny Egners Blog
> http://blog.ronnyegner-consulting.de/2014/12/10/parity-based-redundancy-raid56triple-parity-and-beyond-on-btrfs-and-mdadm-dec-2014/comment-page-1/
>
> TL;DR: There are patches to extend the linux kernel to support up to 6
> parity disks but BTRFS does not want them because it does not fit
> their “business case” and MDADM would want them but somebody needs to
> develop patches for the MDADM component. The kernel raid
> implementation is ready and usable. If someone volunteers to do this
> kind of work I would support with equipment and myself as a test
> resource.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
  2017-11-02  7:29 ` ronnie sahlberg
@ 2017-11-02 11:21   ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 6+ messages in thread
From: Austin S. Hemmelgarn @ 2017-11-02 11:21 UTC (permalink / raw)
  To: ronnie sahlberg, Dave; +Cc: Linux fs Btrfs

On 2017-11-02 03:29, ronnie sahlberg wrote:
> I think it is just a matter of a lack of resources.
> The very few paid developers working on btrfs probably don't have
> parity raid as a priority.
> (And honestly, parity raid is probably much better implemented below
> the filesystem in any case, i.e. in, say, the md driver or the array
> itself.)
More specifically, they likely don't have an incentive to work on
higher-order parity raid.  The common case for most of the paid
developers' companies is running large-scale storage using BTRFS as a
back-end.  In such a situation, it makes more sense to use smaller BTRFS
volumes in raid5 or raid6 mode (or even raid1 or raid10), and handle
striping at the cluster level instead of the node level.
> 
> Also, until at least about a year ago, RAID56 was known to be
> completely broken in btrfs and would destroy all your data.
> Not a question of if, but when.
> 
> So, considering the state of parity raid in btrfs it is understandable
> if the few resources available would not work on Andrea's 6 parity
> raid code.
> I don't follow the parity raid code in btrfs closely, it might be
> fixed by now or it might still be pathologically broken. I don't know.
> I assume it is still deadly to use btrfs raid5/6.
AFAIK, it's better than it was, but it's still broken.
>
> That said, it is a tragedy that the MDADM folks did not pick up on Andrea's work.
> While it is really just Reed-Solomon coding, his breakthrough was that
> he found a 6-parity Reed-Solomon encoding where the first two parities
> were identical to the RAID5/6 parities.
> I.e. you could add a third parity to a normal RAID6 and thus create a
> 3-parity system without having to recompute the first and second
> parity.



* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
  2017-11-02  2:45 Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog Dave
  2017-11-02  7:29 ` ronnie sahlberg
@ 2017-11-02 22:06 ` waxhead
  2017-11-04  1:09   ` Chris Murphy
  2017-11-05  6:52   ` Duncan
  1 sibling, 2 replies; 6+ messages in thread
From: waxhead @ 2017-11-02 22:06 UTC (permalink / raw)
  To: Dave, Linux fs Btrfs

Dave wrote:
> Has this been discussed here? Has anything changed since it was written?
>
I have (more or less) been following the mailing list since this feature 
was suggested. I have been drooling over it since, but not much has 
happened.

> Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS
> and MDADM (Dec 2014) – Ronny Egners Blog
> http://blog.ronnyegner-consulting.de/2014/12/10/parity-based-redundancy-raid56triple-parity-and-beyond-on-btrfs-and-mdadm-dec-2014/comment-page-1/
>
> TL;DR: There are patches to extend the linux kernel to support up to 6
> parity disks but BTRFS does not want them because it does not fit
> their “business case” and MDADM would want them but somebody needs to
> develop patches for the MDADM component. The kernel raid
> implementation is ready and usable. If someone volunteers to do this
> kind of work I would support with equipment and myself as a test
> resource.
> --
I am just a list "stalker" and no BTRFS developer, but as others have 
indirectly said already, it is not so much that the BTRFS developers 
don't want the patches as that they can't focus on this right now due 
to other priorities.

There were some updates to raid5/6 in kernel 4.12 that should fix (or at 
least improve) scrub/auto-repair. The write hole still exists.

That being said, there might be configurations where btrfs raid5/6 could 
be of some use. I think I read somewhere that you can set data to 
raid5/6 and metadata to raid1 or raid10, and you would then risk losing 
some data (but not the filesystem) in the event of a system crash / 
power failure.

This sounds tempting, since in theory it would not make btrfs raid5/6 
significantly less reliable than other RAIDs, which will corrupt your 
data if a disk happens to spit out bad bits without complaining (one 
possible exception that might catch this is md raid6, which I use). That 
being said, there is no way I would personally use btrfs raid5/6, even 
with metadata raid1/10, without properly tested backups on standby at 
this point.
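For reference, that mixed-profile layout is a one-liner at mkfs time, or a balance on an existing filesystem (device names and mount point here are hypothetical placeholders, and mkfs is destructive):

```shell
# Three-disk filesystem: data striped as raid5, metadata mirrored as raid1.
mkfs.btrfs -d raid5 -m raid1 /dev/sdb /dev/sdc /dev/sdd

# Or convert an existing multi-device filesystem in place:
btrfs balance start -dconvert=raid5 -mconvert=raid1 /mnt/pool

# Check how each chunk type ended up allocated:
btrfs filesystem df /mnt/pool
```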

Anyway - I would worry more about getting raid5/6 to work properly 
before even thinking about multi-parity at all :)



* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
  2017-11-02 22:06 ` waxhead
@ 2017-11-04  1:09   ` Chris Murphy
  2017-11-05  6:52   ` Duncan
  1 sibling, 0 replies; 6+ messages in thread
From: Chris Murphy @ 2017-11-04  1:09 UTC (permalink / raw)
  To: waxhead; +Cc: Dave, Linux fs Btrfs

For what it's worth, cryptsetup 2 now offers a UI for setting up both
dm-verity and dm-integrity.
https://www.kernel.org/pub/linux/utils/cryptsetup/v2.0/v2.0.0-rc0-ReleaseNotes

While more complicated than Btrfs, it's possible to first create an
integrity device on each drive, and then add those integrity block
devices to mdadm or LVM as physical devices to create the
raid1/10/5/6 array. You could do it the other way around, but if you
do it as described, a sector read that fails checksum matching will
cause a read error to be handed to the md driver, which then does
reconstruction from parity. If you instead make a single integrity
volume on top of the array, your file system just gets a read error
whenever there's a checksum mismatch; reconstruction isn't possible,
but at least you're warned.
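A sketch of that layering with integritysetup from cryptsetup 2.x (device names are hypothetical, and formatting is destructive to the named disks):

```shell
# Put dm-integrity under each member first...
integritysetup format /dev/sdb
integritysetup format /dev/sdc
integritysetup format /dev/sdd
integritysetup open /dev/sdb int-sdb
integritysetup open /dev/sdc int-sdc
integritysetup open /dev/sdd int-sdd

# ...then build the md array on top.  A checksum failure now surfaces
# as a plain read error on one member, which md repairs from parity.
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/mapper/int-sdb /dev/mapper/int-sdc /dev/mapper/int-sdd
```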

---
Chris Murphy


* Re: Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog
  2017-11-02 22:06 ` waxhead
  2017-11-04  1:09   ` Chris Murphy
@ 2017-11-05  6:52   ` Duncan
  1 sibling, 0 replies; 6+ messages in thread
From: Duncan @ 2017-11-05  6:52 UTC (permalink / raw)
  To: linux-btrfs

waxhead posted on Thu, 02 Nov 2017 23:06:41 +0100 as excerpted:

> Dave wrote:
>>
>> TL;DR: There are patches to extend the linux kernel to support up to 6
>> parity disks but BTRFS does not want them because it does not fit their
>> “business case” and MDADM would want them but somebody needs to develop
>> patches for the MDADM component. The kernel raid implementation is
>> ready and usable. If someone volunteers to do this kind of work I would
>> support with equipment and myself as a test resource.
>> --
> I am just a list "stalker" and no BTRFS developer, but as others have
> indirectly said already. It is not so much that BTRFS don't want the
> patches as it is that BTRFS do not want to / can't focus on this right
> now due to other priorities.

Indeed.

There's a meme that USAF pilots call situations in which they're 
seriously outnumbered by the enemy "target-rich environments."

Using that analogy here, btrfs is a "development-opportunity-rich 
environment".

IOW, the basic btrfs design is quite flexible and there's all sorts of 
ideas as to what sort of features it'd be nice to have at some point, but 
there's way more good feature ideas than there are qualified devs to work 
on them, and getting up to speed on btrfs takes long enough even for 
experienced kernel/fs devs that it's not the sort of thing where just any 
dev can simply pick up a project from the list and have it ready for 
mainlining in six months...

Meanwhile, btrfs history is a wash/rinse/repeat list of features that 
took rather longer, sometimes /years/ and multiple rewrites longer, to 
implement, debug and reasonably stabilize.  Quotas/qgroups and the 
existing raid56 parity-raid are both prime examples, as the devs have 
been working on both features for years and while they both appear to be 
/somewhat/ stabilized in terms of egregious bugs, there remain big 
caveats on both, primarily performance on quotas, and the parity-write-
hole undermining the normal checksummed data and metadata integrity and 
thus the greater reliability people would otherwise choose it for, on 
raid56.

Given that status and history, realistic estimates on when particular 
features may be available as reasonably stable really extend to years for 
features under current development, perhaps the 3-5 year timeframe for 
those queued up for development "soon", and very likely the 10 years out 
timeframe for anything beyond that.

But the thing is, anything beyond five years out in Linux development by 
definition is in practice beyond the reasonably predictable -- just look 
back at where Linux was 5 or 10 years ago and the unexpected twists and 
turns it has taken since then that have played havoc with predictions 
from that period, and project that forward 5 to 10 years, and I imagine 
you'll agree.  (Tho the history of btrfs itself is in that time frame, 
but I'm not saying long term projects can't be started with a hope that 
they'll be reasonably successful 5-10 years out, just that the picture 
you're trying to project out that far is likely to look wildly different 
than the picture when you actually get there.  Certainly I don't think 
many expected btrfs to take this long, tho others cautioned the 
projections were wildly optimistic and 7-10 years to approach stability 
wasn't unreasonable.)

The point being, if it's not on the "current" or "queued-to-next" lists, 
in practice it's almost certainly 5+ years out, and that's beyond 
reasonable predictability range, so it's "bluesky", aka "it'd be nice to 
have... someday", range.

And honestly there's quite a lot of ideas in that "bluesky" range, and 
just because triple-parity-plus is one of them doesn't mean the devs have 
rejected it, just that there's this thing called reality that they're up 
against.

I know, because my personal wish-list item, N-way-mirroring, has been on 
the "right after raid56 mode, since it'll be re-using some of that code" 
queue since before the kernel 3.6 era, with raid56 expected to be 
introduced for 3.6 when I first looked at btrfs seriously, and N-way-
mirroring assumed to be introduced perhaps 2-3 kernel cycles later.

Of course I was soon disabused of that notion, but even so, N-way-
mirroring has been "3-5 years out" for more than 3-5 years now, and it's 
on the "soon" list, so anything /not/ on that "soon" list... well, better 
point your time machine at least 10 years out...

But the one thing that can change that is if there's at least one 
*really* interested kernel dev (or sponsor willing to pay sufficiently to 
create one, or more if necessary) willing to learn btrfs internals and 
take on a particular feature as their major personal task for the multi-
year time-period scope necessary, even if it means coping with the 
project possibly getting back-burnered for a year or more in the 
process.  I believe I've seen one such "from left field" feature merged 
in the years since I started following the list with 3.5-ish (tho 
unfortunately IDR what it was ATM), and a couple others that haven't yet 
been merged, but they have proof-of-concept code and have been approved 
for soon/next, tho they're backburnered for the moment due to 
dependencies and merge-queue scheduling issues.  The hot-spare patch set 
is in that last category, tho a few patches that had been in that set 
were recently dusted off, cleaned up and merged as they turned out to be 
useful in their own right.  That of course is a good thing, since it 
makes the remaining patch set smaller and simpler, and less likely to 
conflict with other current or queued/soon projects, as it moves forward 
in that queue.

> There were some updates to raid5/6 in kernel 4.12 that should fix (or
> at least improve) scrub/auto-repair. The write hole still exists.
> 
> That being said, there might be configurations where btrfs raid5/6
> could be of some use. I think I read somewhere that you can set data
> to raid5/6 and metadata to raid1 or raid10, and you would then risk
> losing some data (but not the filesystem) in the event of a system
> crash / power failure.
> 
> This sounds tempting, since in theory it would not make btrfs raid5/6
> significantly less reliable than other RAIDs, which will corrupt your
> data if a disk happens to spit out bad bits without complaining (one
> possible exception that might catch this is md raid6, which I use).
> That being said, there is no way I would personally use btrfs raid5/6,
> even with metadata raid1/10, without properly tested backups on
> standby at this point.

Indeed.  Unfortunately, the infamous parity-write-hole is rather the 
antithesis of btrfs's checksummed-integrity feature, and until it's fixed, 
the reasons one would choose btrfs in general rather conflict with using 
raid56 mode in particular.  There's no immediate or easy fix.  There /is/ 
a possible mid-term fix, journaling writes, but that's likely to 
absolutely kill write speed, making it impractical for most usage, thus 
making the use-case small enough it's arguably not worth the trouble.  
But the real fix is unfortunately a near-full rewrite of the current 
raid56 mode, using what we've learned from the current implementation to 
hopefully create a better one not affected by the write hole (yes, 
there are ways around it), which likely puts it 3-5 years out, at least.  
I'd put it on the 10-year list, but there does seem to be quite a lot of 
interest from current devs, thus upgrading it to the queued list.

Unfortunately, if that's the case, then it may well delay other projects, 
including the N-way-mirroring I have a personal interest in and that as I 
said has been on that 3-5 year list for longer than that now, even 
further.

So I'm 50 now; /maybe/ I'll be able to use btrfs N-way-mirroring from the 
nursing home, when I'm 70 or 80... if technology hasn't made btrfs as we 
know it obsolete by then...

> Anyway - I would worry more about getting raid5/6 to work properly
> before even thinking about multi-parity at all :)

For sure.  Even the "soon" N-way-mirroring, which was waiting for raid56 
mode, continues to wait...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



end of thread, other threads:[~2017-11-05  6:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-02  2:45 Parity-based redundancy (RAID5/6/triple parity and beyond) on BTRFS and MDADM (Dec 2014) – Ronny Egners Blog Dave
2017-11-02  7:29 ` ronnie sahlberg
2017-11-02 11:21   ` Austin S. Hemmelgarn
2017-11-02 22:06 ` waxhead
2017-11-04  1:09   ` Chris Murphy
2017-11-05  6:52   ` Duncan
