* Re: Linux raid-like idea
2020-09-11 20:14 ` Brian Allen Vanderburg II
@ 2020-09-12 6:09 ` Song Liu
2020-09-12 14:40 ` Adam Goryachev
2020-09-12 16:19 ` antlists
2 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2020-09-12 6:09 UTC (permalink / raw)
To: Brian Allen Vanderburg II; +Cc: antlists, linux-raid
On Fri, Sep 11, 2020 at 1:15 PM Brian Allen Vanderburg II
<brianvanderburg2@aim.com> wrote:
>
>
> On 9/11/20 3:16 PM, antlists wrote:
> >> Yes it is a bit like raid-4 since the data and parity disks are
> >> separated. In fact the idea could be better called a parity backed
> >> collection of independently accessed disks. While you would not get the
> >> advantage/performance increase of reads/writes going across multiple
> >> disks, the idea is primarily targeted to read-heavy applications, so in
> >> a typical use, read performance should be no worse than reading directly
> >> from a single un-raided disk, except in case of a disk failure where the
> >> parity is being used to calculate a block read on a missing disk.
> >> Writes would have more overhead since they would also have to
> >> calculate/update parity.
> >
> > Ummm...
> >
> > So let me word this differently. You're looking at pairing disks up,
> > with a filesystem on each pair (data/parity), and then using mergefs
> > on top. Compared with simple raid, that looks like a lose-lose
> > scenario to me.
> >
> > A raid-1 will read faster than a single disk, because it optimises
> > which disk to read from, and it will write faster too because your
> > typical parity calculation for a two-disk scenario is a no-op, which
> > might not optimise out.
>
>
> Not exactly. You can do a data + parity, but you could also do a data +
> data + parity or a data + data + data + parity. Or, with more than one
> parity disk, data + data + data + data + parity + parity, etc.
>
> Best viewed in a fixed-width font, and probably makes more sense read
> from the bottom up:
>
>
>                        /data
>                          |
>                     / mergerfs \
>                    /     |      \
>   /pool1         /pool2         /pool3   (or /home or /usr/local, etc)
>     |               |               |
>   The filesystem built upon the /dev/frX devices can be used however
>   the user wants.
>     |               |               |
>   -------------------------------------------
>     |               |               |
>   ext4 (etc)      ext4 (etc)      (ext4/etc; could in theory even have
>                                    multiple partitions, then filesystems)
>     |               |               |
>   Each exposed block device /dev/frX can have a filesystem/partition
>   table placed on it, which is placed onto the single mapped disk.
>   Any damage/issues on one data disk would not affect the other data
>   disks at all. However, since the collection of data disks also has
>   parity for them, damage to a data disk can be restored from the
>   parity and the other data disks. If, during restore, something
>   prevents the restore, then only the bad data disks have an issue;
>   the other data disks would still be fully accessible, and any
>   filesystem on them still intact, since the entire filesystem from
>   anything on /dev/fr0 would be only on /dev/sda1, and so on.
>     |               |               |
>   -------------------------------------------
>     |               |               |
>   /dev/fr0        /dev/fr1        /dev/fr2
>     |               |               |
>   Individual data disks are passed through as fully exposed block
>   devices, minus any overhead for information/data structures for
>   the 'raid'. A block X on /dev/fr0 maps to block X + offset on
>   /dev/sda1, and so on.
>     |               |               |
>   Raid/parity backed disk layer (data: /dev/sda1=/dev/fr0,
>   /dev/sdb1=/dev/fr1, /dev/sdc1=/dev/fr2; parity: /dev/sdd1)
>     |               |               |
>   ---------------------------------------------------------
>     |               |               |               |
>   /dev/sda1       /dev/sdb1       /dev/sdc1       /dev/sdd1 (parity)
>
>
>
> So basically, at the raid (or parity-backed) layer, multiple disks, not
> just a single disk, can be backed by the parity disk (ideally with
> support for more than one parity disk as well). The only difference is
> that, instead of joining the disks as one block device /dev/md0, each
> data disk gets its own block device and so has its own filesystem(s) on
> it, independently of the other disks. A single data disk can be removed
> entirely, taken to a different system, and still be read (you would need
> to do losetup with an offset to get to the start of the
> filesystem/partition table, though), and the other data disks would
> still be readable on the original system. So the total loss of a data
> disk would not affect the other data disks' files. In this example,
> /data could be missing some files if /pool1 (/dev/sda1) died, but the
> files on /pool2 would still be entirely accessible, as would any
> filesystem on /dev/sdc1. There is no performance advantage to such a
> setup. The advantage is that, should something really bad happen and it
> become impossible to restore some data disk(s), the other disk(s) are
> still accessible.
>
> Read from /dev/fr0 = read from /dev/sda1 (adjusted for any overhead/headers)
> Read from /dev/fr1 = read from /dev/sdb1 (adjusted for any overhead/headers)
> Read from /dev/fr2 = read from /dev/sdc1 (adjusted for any overhead/headers)
> Write to /dev/fr0 = write to /dev/sda1 (adjusted for any
> overhead/headers) and parity /dev/sdd1
> Write to /dev/fr1 = write to /dev/sdb1 (adjusted for any
> overhead/headers) and parity /dev/sdd1
> Write to /dev/fr2 = write to /dev/sdc1 (adjusted for any
> overhead/headers) and parity /dev/sdd1
>
> Read from /dev/fr0 (/dev/sda1 missing) = read from parity and other
> disks, recalculate original block
> During rebuild, /dev/sdd dies as well (unable to rebuild from parity now
> since /dev/sda and /dev/sdd are missing)
> Lost: /dev/sda1
> Still present: /dev/sdb1 -- some files from the pool will be missing
> since /pool1 is missing, but the files on /pool2 are still present in
> their entirety
> Still present: /pool3 (or /home or /usr/local, etc., whatever
> /dev/fr2 was used for)
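A minimal sketch of that block-layer behaviour in Python (illustrative only:
the 'fr' naming, the fixed header offset, and the single parity disk are
assumptions taken from the description above, not an existing md personality):

# Toy model of the proposed parity-backed set of independent disks.
# Each /dev/frX maps 1:1 onto one data disk (plus a header offset);
# the parity disk holds the XOR of the same block on every data disk.

BLOCK = 4096
HEADER_BLOCKS = 2048          # assumed per-disk metadata reservation

class ParityBackedSet:
    def __init__(self, data_devs, parity_dev):
        self.data = data_devs          # list of file-like block devices
        self.parity = parity_dev

    def _seek(self, dev, block):
        dev.seek((block + HEADER_BLOCKS) * BLOCK)

    def read(self, disk_no, block):
        """Read from /dev/fr<disk_no>: a plain read from one data disk."""
        self._seek(self.data[disk_no], block)
        return self.data[disk_no].read(BLOCK)

    def degraded_read(self, missing, block):
        """Rebuild a block of a missing disk from parity + the others."""
        self._seek(self.parity, block)
        acc = bytearray(self.parity.read(BLOCK))
        for i, dev in enumerate(self.data):
            if i == missing:
                continue
            self._seek(dev, block)
            for j, b in enumerate(dev.read(BLOCK)):
                acc[j] ^= b
        return bytes(acc)

    def write(self, disk_no, block, buf):
        """Write to /dev/fr<disk_no>: update the data disk and parity,
        where parity_new = parity_old XOR data_old XOR data_new."""
        old = self.read(disk_no, block)
        self._seek(self.parity, block)
        par = bytearray(self.parity.read(BLOCK))
        for j in range(BLOCK):
            par[j] ^= old[j] ^ buf[j]
        self._seek(self.data[disk_no], block)
        self.data[disk_no].write(buf)
        self._seek(self.parity, block)
        self.parity.write(bytes(par))

Losing any one device (data or parity) leaves every other data disk's
filesystem readable as-is; degraded_read() is only needed for the missing one.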
>
> >>
> >>> Personally, I'm looking at something like raid-61 as a project. That
> >>> would let you survive four disk failures ...
> >>
> >> Interesting. I'll check that out more later, but from what I've seen
> >> so far there is a lot of overhead (10 1TB disks would only give 3TB of
> >> data: 2x 5-disk mirrored arrays, then raid6 on each, leaving 3 disks'
> >> worth of data). My current solution, since it's basically just storing
> >> bulk data, is mergerfs and snapraid, and from the snapraid
> >> documentation, 10 1TB disks would provide 6TB if using 4 for parity.
> >> However, its parity calculations seem to be more complex as well.
> >
> > Actually no. Don't forget that, as far as linux is concerned, raid-10
> > and raid-1+0 are two *completely* *different* things. You can raid-10
> > three disks, but you need four for raid-1+0.
> >
> > You've mis-calculated raid-6+1 - that gives you 6TB for 10 disks (two
> > 3TB arrays). I think I would probably get more with raid-61, but every
> > time I think about it my brain goes "whoa!!!", and I'll need to start
> > concentrating on it to work out exactly what's going on.
>
> That's right, I get the various combinations confused. So does raid61
> allow losing 4 disks in any order and still recovering? Or would some
> combination of just 3 lost disks be bad? Interesting nonetheless, and
> I'll have to look into it. Obviously it's not intended as a replacement
> for backing up important data but, for me anyway, just a way to minimize
> loss of trivial bulk data/files.
>
> It would be nice if the raid modules supported methods that could
> tolerate more disks lost, in any order, without losing data. The
> snapraid source states that it uses a Cauchy matrix algorithm which in
> theory could lose up to 6 disks, in any order, when using 6 parity
> disks, and still be able to restore the data. I'm not familiar with the
> math behind it, so I can't speak to the accuracy of that claim.
>
> >> This is actually the main purpose of the idea. Because the data in a
> >> traditional raid5/6 is mapped from multiple disks to a single logical
> >> block device, the structures of any filesystems and their files are
> >> scattered across all the disks, so losing one more disk than the
> >> parity can absorb makes the entire filesystem(s) and all files
> >> virtually unrecoverable.
> >
> > But raid 5/6 give you much more usable space than a mirror. What I'm
> > having trouble getting to grips with in your idea is how is it an
> > improvement on a mirror? It looks to me like you're proposing a 2-disk
> > raid-4 as the underlying storage medium, with mergefs on top. Which is
> > effectively giving you a poorly-performing mirror. A crappy raid-1+0,
> > basically.
>
> I do apologize; it seems I'm having a little difficulty clearly
> explaining the idea. Hopefully the chart above explains it better than I
> have been. Imagine raid 5 or 6, but with no striping (so the parity goes
> on its own disks), and with each data disk passed through as its own
> block device. You lose any performance benefits of striping the
> data/parity, but the data stored on any data disk is only on that data
> disk, and the same for the others, so losing all parity and a data disk
> would not lose the data on the other data disks.
>
> >>
> >> By keeping each data disk separate and exposed as its own block device
> >> with some parity backup, each disk contains an entire filesystem(s) of
> >> its own to be used however a user decides. The loss of one of the
> >> disks during a rebuild would no longer cause full data loss, only loss
> >> of the filesystem(s) on that disk. The data on the other disks would
> >> still be intact and readable, although, depending on the user's usage,
> >> files may be missing if they used a union/merge filesystem on top of
> >> them. A rebuild would still have the same issues: it would have to read
> >> all the remaining disks to rebuild the lost disk. I'm not really sure
> >> of any way around that, since parity would essentially be calculated as
> >> the xor of the same block on all the data disks.
> >>
> > And as I understand your setup, you also suffer from the same problem
> > as raid-10 - lose one disk and you're fine, lose two and it's russian
> > roulette whether you can recover your data. raid-6 is *any* two and
> > you're fine, raid-61 would be *any* four and you're fine.
>
> Not exactly. Since the data disks are passed through as individual
> block devices instead of 'joined' into a single block device, if you
> lose one disk (assuming only one disk of parity) then you are fine. If
> you lose two, then you've only lost the data on the lost data disk. The
> other data disks would still have their intact filesystems on them.
> Depending on how they are used, some files may be missing, e.g. a
> mergerfs between two mount points would be missing any files on the lost
> mount point, but the other files would still be accessible.
>
>
> It may or may not (leaning more to probably not) have any use. I'm
> hoping that from the above the idea is at least better understood. I do
> apologize if it's still not clear.
>
IIUC...
If all the disks are of the same size, this looks like raid-4 with a HUGE
chunk size. Say a raid-4 with 4 disks, where each drive is 2TiB and the
chunk size is also 2TiB. The array is 6TiB: LBA 0-2TiB goes to disk 0,
LBA 2-4TiB goes to disk 1, and LBA 4-6TiB goes to disk 2; disk 3 is all
parity.
Then we create 3x 2TiB partitions on the array (assuming the partition
table is free...). Then we can create 3x file systems on these partitions.
Is this the same as, or similar to, the idea?
Thanks,
Song
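For what it's worth, the mapping in that analogy is just integer division by
the (huge) chunk size. A small illustrative sketch, assuming 512-byte sectors
and ignoring the partition-table/metadata space mentioned above:

# raid-4 analogy with N data disks, one parity disk, and a chunk the
# size of a whole 2TiB drive: LBA 0-2TiB -> disk 0, 2-4TiB -> disk 1, ...

TIB = 1 << 40
SECTOR = 512
DRIVE_SECTORS = 2 * TIB // SECTOR      # 2TiB drives, as in the example

def map_lba(array_lba, n_data_disks=3):
    disk = array_lba // DRIVE_SECTORS
    if disk >= n_data_disks:
        raise ValueError("LBA beyond end of array")
    return disk, array_lba % DRIVE_SECTORS

# An LBA 3TiB into the 6TiB array lands 1TiB into disk 1:
print(map_lba(3 * TIB // SECTOR))      # (1, 2147483648)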
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-11 20:14 ` Brian Allen Vanderburg II
2020-09-12 6:09 ` Song Liu
@ 2020-09-12 14:40 ` Adam Goryachev
2020-09-12 16:19 ` antlists
2 siblings, 0 replies; 26+ messages in thread
From: Adam Goryachev @ 2020-09-12 14:40 UTC (permalink / raw)
To: Brian Allen Vanderburg II, antlists, linux-raid
On 12 September 2020 6:14:51 am AEST, Brian Allen Vanderburg II <brianvanderburg2@aim.com> wrote:
>
> [...]
>
Possibly silly question: if you lost 3 data disks but still had your parity disk, how do you recover all your data? It doesn't sound possible to store enough data on one disk to recover three....
BTW, my preferred method is raid6 on machine a, raid6 on machine b, and then drbd to join them together. You can lose a maximum of all disks on one machine and two on the other, or any 2 disks on both machines (total of 4). Basically raid 61 but split between machines.
Regards
Adam
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-11 20:14 ` Brian Allen Vanderburg II
2020-09-12 6:09 ` Song Liu
2020-09-12 14:40 ` Adam Goryachev
@ 2020-09-12 16:19 ` antlists
2020-09-12 17:28 ` John Stoffel
2020-09-14 17:19 ` Phillip Susi
2 siblings, 2 replies; 26+ messages in thread
From: antlists @ 2020-09-12 16:19 UTC (permalink / raw)
To: Brian Allen Vanderburg II, linux-raid
On 11/09/2020 21:14, Brian Allen Vanderburg II wrote:
> That's right, I get the various combinations confused. So does raid61
> allow losing 4 disks in any order and still recovering? Or would
> some combination of just 3 lost disks be bad?
> Interesting nonetheless, and I'll have to look into it. Obviously it's
> not intended as a replacement for backing up important data but, for
> me anyway, just a way to minimize loss of trivial bulk data/files.
Yup. Raid 6 has two parity disks, and that's mirrored to give four
parity disks. So as an *absolute* *minimum*, raid-61 could lose four
disks with no data loss.
Throw in the guarantee that, with a mirror, you can lose an entire
mirror with no data loss, and that means - with luck and a following wind -
you could lose half your disks, PLUS the two parities in the remaining
disks, and still recover your data. So with a raid-6+1, if I had twelve
disks, I could lose EIGHT disks and still have a *chance* of recovering
my array. I'm not quite sure what difference raid-61 would make.
(That says to me, if I have a raid-61, I need as a minimum a complete
set of data disks. That also says to me, if I've splatted an 8+2 raid-61
across 11 disks, I only need 7 for a full recovery despite needing a
minimum of 8, so something isn't quite right here... I suspect the 7
would be enough but I did say my mind goes Whooaaa!!!!)
>
> It would be nice if the raid modules had support for methods that could
> support a total of more disks in any order lost without losing data.
> Snapraid source states that it uses some Cauchy Matrix algorithm which
> in theory could lose up to 6 disks if using 6 parity disks, in any
> order, and still be able to restore the data. I'm not familiar with the
> math behind it so can't speak to the accuracy of that claim.
That's easy, it's just whether it's worth it. Look at the maths behind
raid-6. The "one parity disk" methods, 4 or 5, just use XOR. But that
only works once; a second XOR parity disk adds no new redundancy and is
worthless. I'm guessing raid-6 uses that Cauchy method you talk about -
certainly it can generate as many parity disks as you like ... so that
claim is good, even if raid-6 doesn't use that particular technique.
If someone wants to, mod'ing raid-6 to use 3 parity disks shouldn't be
that hard ...
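To illustrate why a second plain XOR disk is worthless while the raid-6 Q
parity is not: Q weights the byte from disk i by a distinct power of a
generator in GF(2^8), so P and Q are linearly independent and together can
recover any two lost data disks. A small sketch of the usual construction
(0x11d field polynomial, generator 2); illustrative, not the md code itself:

def gf_mul(a, b, poly=0x11d):
    # Multiply in GF(2^8), reducing by the field polynomial.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def syndromes(data_bytes):
    # P is plain XOR; a "second XOR disk" would always equal P.
    # Q = sum over i of g^i * D_i, with g = 2.
    p = q = 0
    for i, d in enumerate(data_bytes):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

print(syndromes([0x12, 0x34, 0x56]))   # (112, 63)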
But going back to your original idea, I've been thinking about it. And
it struck me - you NEED to regenerate parity EVERY TIME you write data
to disk! Otherwise, writing one file on one disk instantly trashes your
ability to recover all the other files in the same position on the other
disks. WHOOPS! But if you think it's a good idea, by all means try and
do it.
The other thing I'd suggest here is to try and make it more like raid-5
than raid-4. You have X disks, let's say 5, numbered 0, 1, 2, 3, 4. As
part of formatting each disk ready for raid, you create a file containing
every block where LBA mod 5 equals the disk number. So as you recalculate
your parities, that's where they go.
Cheers,
Wol
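A tiny illustration of that rotation rule: the parity block for a given LBA
lives on disk (LBA mod N), so the parity write load is spread across all the
disks instead of hammering one of them (hypothetical layout, following the
suggestion above):

def parity_disk(lba, n_disks=5):
    # Disk holding the parity block for this LBA; every other disk
    # holds its own data at that LBA.
    return lba % n_disks

for lba in range(10):
    print(lba, "-> parity on disk", parity_disk(lba))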
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-12 16:19 ` antlists
@ 2020-09-12 17:28 ` John Stoffel
2020-09-12 18:41 ` antlists
2020-09-14 17:19 ` Phillip Susi
1 sibling, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-12 17:28 UTC (permalink / raw)
To: antlists; +Cc: Brian Allen Vanderburg II, linux-raid
>>>>> "antlists" == antlists <antlists@youngman.org.uk> writes:
antlists> On 11/09/2020 21:14, Brian Allen Vanderburg II wrote:
>> That's right, I get the various combinations confused. So does raid61
>> allow losing 4 disks in any order and still recovering? Or would
>> some combination of just 3 lost disks be bad?
>> Interesting nonetheless, and I'll have to look into it. Obviously it's
>> not intended as a replacement for backing up important data but, for
>> me anyway, just a way to minimize loss of trivial bulk data/files.
antlists> Yup. Raid 6 has two parity disks, and that's mirrored to give four
antlists> parity disks. So as an *absolute* *minimum*, raid-61 could lose four
antlists> disks with no data loss.
antlists> Throw in the guarantee that, with a mirror, you can lose an entire
antlists> mirror with no data-loss, that means - with luck and a following wind -
antlists> you could lose half your disks, PLUS the two parities in the remaining
antlists> disks, and still recover your data. So with a raid-6+1, if I had twelve
antlists> disks, I could lose EIGHT disks and still have a *chance* of recovering
antlists> my array. I'm not quite sure what difference raid-61 would make.
Of course your useful storage is 12/2 - 2 = 4 disks, so only 33%
usable space. Not very good. At that point, I'd just go with RAID 1
pairs striped together with RAID 0 (RAID 1+0) for only a 50% loss
of space. Now if I lose the wrong four disks I'm screwed, as opposed
to before where I can lose *any* four disks.
The problem with RAID6 is that random small IO writes have terrible
performance. RAID1+0 gives you much better performance. Ars Technica
had a great article on this earlier in the year that did actual disk
testing.
I think (and haven't done the math) that erasure coding gives you
better protection as you scale. And I've even thought about glusterfs
or cephfs for home storage using a bunch of small single-board
computers, each talking to one disk for storage. But... it's hard to
justify.
These days, I think it's better for your main storage to be a three-way
mirror for the important stuff, performance-wise, and RAID6 with a hot
spare for your large streaming collections like videos and such.
But even then... it's hard to justify since it costs a lot.
antlists> (That says to me, if I have a raid-61, I need as a minimum a complete
antlists> set of data disks. That also says to me, if I've splatted an 8+2 raid-61
antlists> across 11 disks, I only need 7 for a full recovery despite needing a
antlists> minimum of 8, so something isn't quite right here... I suspect the 7
antlists> would be enough but I did say my mind goes Whooaaa!!!!)
>> It would be nice if the raid modules had support for methods that could
>> support a total of more disks in any order lost without losing data.
>> Snapraid source states that it uses some Cauchy Matrix algorithm which
>> in theory could lose up to 6 disks if using 6 parity disks, in any
>> order, and still be able to restore the data. I'm not familiar with the
>> math behind it so can't speak to the accuracy of that claim.
antlists> That's easy, it's just whether it's worth it. Look at the
antlists> maths behind raid-6. The "one parity disk" methods, 4 or 5,
antlists> just use XOR. But that only works once, a second XOR parity
antlists> disk adds no new redundancy and is worthless. I'm guessing
antlists> raid-6 uses that Cauchy method you talk about - certainly it
antlists> can generate as many parity disks as you like ... so that
antlists> claim is good, even if raid-6 doesn't use that particular
antlists> technique.
antlists> If someone wants to, mod'ing raid-6 to use 3 parity disks
antlists> shouldn't be that hard ...
It's not, but there are diminishing returns, because you now have to do
the RMW cycle across even more disks, which is slow.
antlists> But going back to your original idea, I've been thinking
antlists> about it. And it struck me - you NEED to regenerate parity
antlists> EVERY TIME you write data to disk! Otherwise, writing one
antlists> file on one disk instantly trashes your ability to recover
antlists> all the other files in the same position on the other
antlists> disks. WHOOPS! But if you think it's a good idea, by all
antlists> means try and do it.
Correct. When you compute parity, you do it across blocks. And the
parity calculation is effectively free these days. The cost comes
from the (on disks at least) rotational latency to read the entire
stripe across all the disks, modify one to N bytes in that stripe,
then rewrite the stripe back to all the disks. That's a lot of IO.
With RAID1, you just make two writes, one to each disk. Done. Even
with a three-way mirror, it's simpler.
Now RAID6 works better if you are replacing the entire stripe; then
you can cut your IOs in half, but you still need to write chunks
to different disks.
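To put rough numbers on that, counting only disk I/Os and ignoring caching
(a back-of-the-envelope sketch following the description above, not measured
data):

def small_write_ios(level, n_disks=6, mirrors=2):
    # One small (sub-stripe) write.
    if level == "raid6":
        # Read the data chunks of the stripe, recompute P and Q, then
        # write the changed chunk plus both parities.
        return {"reads": n_disks - 2, "writes": 1 + 2}
    if level == "raid1":
        # Just write each mirror copy.
        return {"reads": 0, "writes": mirrors}
    raise ValueError(level)

print(small_write_ios("raid6"))   # {'reads': 4, 'writes': 3}
print(small_write_ios("raid1"))   # {'reads': 0, 'writes': 2}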
This is why big vendors have log-based filesystems (NetApp, EMC,
Isilon, etc) with battery-backed RAM caches, so they can A) tell the
client the writes are done, B) collect changes into bigger chunks,
and C) write them in linear fashion down to the disk.
Log-based filesystems are great for this. Until they get fragmented.
SSDs help in that they really don't have a seek cost at all, so you
can handle fragmentation better. BUT! SSDs are generally written to
assuming 512-byte blocks, while the underlying SSDs now generally use
4k blocks on the NAND flash, so there's another layer of fragmentation
and wear levelling and other stuff happening outside your control
there as well.
antlists> The other thing I'd suggest here, is try and make it more
antlists> like raid-5 than raid-4. You have X disks, let's say 5. So
antlists> one disk each is numbered 0, 1, 2, 3, 4. As part of
antlists> formatting the disk ready for raid, you create a file
antlists> containing every block where LBA mod 5 equals disk
antlists> number. So as you recalculate your parities, that's where
antlists> they go.
RAID4 suffers from the parity disk becoming a super hot spot, since it
needs to get written to no matter what. No one uses it.
Until we can get back to cost-effective SSDs using SLC NAND, RAID is
here to stay. And so is mirroring, since it does help protect from
a lot of issues, both permanent and temporary.
I had one of my 4tb disks fall out of my main VG, but I didn't lose
any data; I just checked the disk and added it back in. I've got a
new 4tb disk on order along with a drive cage so I can balance things
better.
But it's almost to the point where it's cheaper to buy a pair of 8tb
drives to replace the 4x4tb drives I'm using now. But I probably
won't.
I could write for hours here... it's a tough problem space to work
through.
John
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-12 17:28 ` John Stoffel
@ 2020-09-12 18:41 ` antlists
2020-09-13 12:50 ` John Stoffel
0 siblings, 1 reply; 26+ messages in thread
From: antlists @ 2020-09-12 18:41 UTC (permalink / raw)
To: John Stoffel; +Cc: linux-raid
On 12/09/2020 18:28, John Stoffel wrote:
> I had one of my 4tb disks fall out of my main VG, but I didn't lose
> any data; I just checked the disk and added it back in. I've got a
> new 4tb disk on order along with a drive cage so I can balance things
> better.
>
> But it's almost to the point where it's cheaper to buy a pair of 8tb
> drives to replace the 4x4tb drives I'm using now. But I probably
> won't.
>
You should have bought an 8TB to replace the 4 ... one more failure :-(
and you would have your 2x8 (and raid-0 the remaining 4s to provide your
3rd mirror).
> I could write for hours here... it's a tough problem space to work
> through.
Made worse if, like me, you're more into logical completeness than "will
it finish in finite time" :-)
Cheers,
Wol
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-12 18:41 ` antlists
@ 2020-09-13 12:50 ` John Stoffel
2020-09-13 16:01 ` Wols Lists
0 siblings, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-13 12:50 UTC (permalink / raw)
To: antlists; +Cc: John Stoffel, linux-raid
>>>>> "antlists" == antlists <antlists@youngman.org.uk> writes:
antlists> On 12/09/2020 18:28, John Stoffel wrote:
>> I had one of my 4tb disks fall out of my main VG, but I didn't lose
>> any data; I just checked the disk and added it back in. I've got a
>> new 4tb disk on order along with a drive cage so I can balance things
>> better.
>>
>> But it's almost to the point where it's cheaper to buy a pair of 8tb
>> drives to replace the 4x4tb drives I'm using now. But I probably
>> won't.
>>
antlists> You should have bought an 8TB to replace the 4 ... one more
antlists> failure :-( and you would have your 2x8 (and raid-0 the
antlists> remaining 4s to provide your 3rd mirror).
I know, I really need to buy another drive, but my main system is
full, so I *also* need to either get a new case, or one of those 5 x
3.5" into 3 x 5.25" bay cages to make some room. Decisions... decisions...
>> I could write for hours here... it's a tough problem space to work
>> through.
antlists> Made worse if, like me, you're more into logical
antlists> completeness than "will it finish in finite time" :-)
For sure!
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-13 12:50 ` John Stoffel
@ 2020-09-13 16:01 ` Wols Lists
2020-09-13 23:49 ` Brian Allen Vanderburg II
2020-09-15 2:09 ` John Stoffel
0 siblings, 2 replies; 26+ messages in thread
From: Wols Lists @ 2020-09-13 16:01 UTC (permalink / raw)
To: John Stoffel; +Cc: linux-raid
On 13/09/20 13:50, John Stoffel wrote:
> I know, I really need to buy another drive, but my main system is
> full, so I *also* need to either get a new case, or one of those 5 x
> 3.5" into 3 x 5.25" bay cages to make some room. Decisions... decisions...
I know I keep on saying it, but I really think I'm close to getting my
new main system (and hence my development system) sorted, and I think I
need to buy one of those cages too.
If you did get those two 8TB drives, you could still have your 8TB 3-way
mirror without needing any more bays/sata-ports.
My problem, of course, is if I'm playing with raid layouts I need as
many disks as I can cram in :-) I'm counting 6 tucked away in my drawer,
which means I'll almost certainly need to add an add-in 4-way sata card,
and as those drives are a mixture of 500GB and 1TB, I'll probably split
the 1TBs into 2x500GB and ignore md complaining that I have multiple
components on the same physical disk ...
Cheers,
Wol
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-13 16:01 ` Wols Lists
@ 2020-09-13 23:49 ` Brian Allen Vanderburg II
2020-09-15 2:12 ` John Stoffel
2020-09-15 2:09 ` John Stoffel
1 sibling, 1 reply; 26+ messages in thread
From: Brian Allen Vanderburg II @ 2020-09-13 23:49 UTC (permalink / raw)
To: Wols Lists, John Stoffel; +Cc: linux-raid
OT, but I've got one of those 3x5.25 to 5x3.5 hot-swap bays in my main
system and I love it. I'm using it with an LSI 9207-8i, as my
motherboard only supports a few SATA connectors with several already
used, so I needed something to provide more ports for future expansion
of my main system's storage.
For more drives, you can use one of those external drive shelf boxes. I
currently have the HP M6710 I got off eBay with all caddies for about
$100, which can house 24 2.5" hard drives in a 2U chassis, and I've used
an LSI 9201-16e to access it (both HBAs flashed to 20.00.07 or something
like that). I've already tested it and it works great, though it's a bit
loud on the fans when powering on. My understanding is also that if you
have more than one of these shelves you can daisy-chain them via their
ports, SAS card -> Shelf 1 -> Shelf 2, etc., even cycling back to the
SAS card for multi-path support (which is at this time over my head). My
plan for it is to put it in my network closet, once I get it cleaned out
and the cabling run better, to provide whole-house NAS storage. I think
there is also an M6720 model for 24 3.5" drives in a 4U chassis. There is
also a NetApp shelf I was looking at, but from reading it looks like it
uses a QSFP connector on its IOM, and the cables that convert from
SFF-8088 were quite expensive.
On 9/13/20 12:01 PM, Wols Lists wrote:
> On 13/09/20 13:50, John Stoffel wrote:
>> I know, I really need to buy another drive, but my main system is
>> full, so I *also* need to either get a new case, or one of those 5 x
>> 3.5" into 3 x 5.25" bay cages to make some room. Decisions... decisions...
> I know I keep on saying it, but I really think I'm close to getting my
> new main system (and hence my development system) sorted, and I think I
> need to buy one of those cages too.
>
> If you did get those two 8TB drives, you could still have your 8TB 3-way
> mirror without needing any more bays/sata-ports.
>
> My problem, of course, is if I'm playing with raid layouts I need as
> many disks as I can cram in :-) I'm counting 6 tucked away in my drawer,
> which means I'll almost certainly need to add an add-in 4-way sata card,
> and as those drives are a mixture of 500GB and 1TB, I'll probably split
> the 1TBs into 2x500GB and ignore md complaining that I have multiple
> components on the same physical disk ...
>
> Cheers,
> Wol
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-13 23:49 ` Brian Allen Vanderburg II
@ 2020-09-15 2:12 ` John Stoffel
[not found] ` <43ce60a7-64d1-51bc-f29c-7a6388ad91d5@grumpydevil.homelinux.org>
0 siblings, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-15 2:12 UTC (permalink / raw)
To: Brian Allen Vanderburg II; +Cc: Wols Lists, John Stoffel, linux-raid
>>>>> "Brian" == Brian Allen Vanderburg <brianvanderburg2@aim.com> writes:
Brian> OT, but I've got one of those 3x5.25 to 5x3.5 hot swap bays in
Brian> my main system and I love it. I'm using it with an LSI 9207-8i
Brian> as my motherboard only supports a few SATA connectors with
Brian> several already used, so needed something to provide more ports
Brian> for future expansion for my main system's storage.
Very much like what I'm doing with my LSI board providing most of my
data storage, with boot disks (mirrored) on the MB SATA ports. Makes
for a simpler setup.
Brian> For more drives, you can use one of those external drive shelf
Brian> boxes. I currently have the HP M6710 I got off eBay with all
Brian> caddies for about $100, which can house 24 2.5 hard drives in a
Brian> 2U chassis and I've used an LSI 9201-16e to access it (both
Brian> HBAs flashed to 20.00.07 or something like that). I've already
Brian> tested it and it works great, though a bit loud on the fans
Brian> when powering on. My understanding is also if you have more
Brian> than one of these shelves you can daisy chain them via their
Brian> ports SAS card -> Shelf 1 -> Shelf 2, etc, even cycling back to
Brian> the SAS card for multi-path support (which is at the time over
Brian> my head). My plan for it is to put in my network closet once I
Brian> get it cleaned out and cabling ran better to provide
Brian> whole-house NAS storage. I think there is also an M6720 model
Brian> for 24 3.5 drives in a 4U chassis. There is also NetApp shelf
Brian> I was looking at but from reading looks like it uses a QSFP
Brian> connector on it's IOM, and the cables that converted from
Brian> SFF-8088 were quite expensive.
This is a nice idea, just not sure I want to go with 2.5" drives since
they're expensive per TB of storage. I just want one of those old
style monster cases with 8 x 5.25" bays so I can fill it with 3.5"
bays. Or there was a review on Phoronix.com about a 4U chassis that
looked pretty good, esp with USB3 front ports.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-13 16:01 ` Wols Lists
2020-09-13 23:49 ` Brian Allen Vanderburg II
@ 2020-09-15 2:09 ` John Stoffel
2020-09-15 11:14 ` Roger Heflin
1 sibling, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-15 2:09 UTC (permalink / raw)
To: Wols Lists; +Cc: John Stoffel, linux-raid
>>>>> "Wols" == Wols Lists <antlists@youngman.org.uk> writes:
Wols> On 13/09/20 13:50, John Stoffel wrote:
>> I know, I really need to buy another drive, but my main system is
>> full, so I *also* need to either get a new case, or one of those 5 x
>> 3.5" into 3 x 5.25" bay cages to make some room. Decisions... decisions...
Wols> I know I keep on saying it, but I really think I'm close to getting my
Wols> new main system (and hence my development system) sorted, and I think I
Wols> need to buy one of those cages too.
I've been looking at them for a while now, but hesitating
because... not sure why. I'm using a CoolerMaster case with five
5.25" bays, plus a 3.5" bay external, and another three or four
internal 3.5" bays. Works great. Nice and plain and not flashing
lights or other bling. And not too loud either. Which is good.
But I've used crappy drive cages before, crappy hot swap ones. Not
good. And I think it's time I just went with a 4U rack mount with a
bunch of hot swap bays, if I could only find one that wasn't an arm
and a leg.
Wols> If you did get those two 8TB drives, you could still have your
Wols> 8TB 3-way mirror without needing any more bays/sata-ports.
Very true.
Wols> My problem, of course, is if I'm playing with raid layouts I
Wols> need as many disks as I can cram in :-) I'm counting 6 tucked
Wols> away in my drawer, which means I'll almost certainly need to add
Wols> an add-in 4-way sata card, and as those drives are a mixture of
Wols> 500GB and 1TB, I'll probably split the 1TBs into 2x500GB and
Wols> ignore md complaining that I have multiple components on the
Wols> same physical disk ...
It's not a bad plan for testing, but using a setup like that isn't
good for actual performance numbers since you'll have too much
contention for IOPS.
Dammit, I just gotta pull the trigger. :-)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-15 2:09 ` John Stoffel
@ 2020-09-15 11:14 ` Roger Heflin
2020-09-15 18:07 ` John Stoffel
0 siblings, 1 reply; 26+ messages in thread
From: Roger Heflin @ 2020-09-15 11:14 UTC (permalink / raw)
To: John Stoffel; +Cc: Wols Lists, Linux RAID
> I've been looking at them for a while now, but hesitating
> because... not sure why. I'm using a CoolerMaster case with five
> 5.25" bays, plus a 3.5" bay external, and another three or four
> internal 3.5" bays. Works great. Nice and plain and not flashing
> lights or other bling. And not too loud either. Which is good.
>
> But I've used crappy drive cages before, crappy hot swap ones. Not
> good. And I think it's time I just went with a 4U rack mount with a
> bunch of hot swap bays, if I could only find one that wasn't an arm
> and a leg.
>
I have had good luck with the ICY DOCK brand hot-swap cages. I have 4
different 4-bay-in-3-bay ones spanning 6+ years and they all seem to
just work. And each newer version seemed to have an improved design
over the prior ones (plugs easier to get to, and such).
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-15 11:14 ` Roger Heflin
@ 2020-09-15 18:07 ` John Stoffel
2020-09-15 19:34 ` Ram Ramesh
0 siblings, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-15 18:07 UTC (permalink / raw)
To: Roger Heflin; +Cc: John Stoffel, Wols Lists, Linux RAID
>>>>> "Roger" == Roger Heflin <rogerheflin@gmail.com> writes:
>> I've been looking at them for a while now, but hesitating
>> because... not sure why. I'm using a CoolerMaster case with five
>> 5.25" bays, plus a 3.5" bay external, and another three or four
>> internal 3.5" bays. Works great. Nice and plain and not flashing
>> lights or other bling. And not too loud either. Which is good.
>>
>> But I've used crappy drive cages before, crappy hot swap ones. Not
>> good. And I think it's time I just went with a 4U rack mount with a
>> bunch of hot swap bays, if I could only find one that wasn't an arm
>> and a leg.
>>
Roger> I have had good luck with the ICY DOCK brand hot-swap cages. I
Roger> have 4 different 4-bay-in-3-bay ones spanning 6+ years and they
Roger> all seem to just work. And each newer version seemed to have an
Roger> improved design over the prior ones (plugs easier to get to, and
Roger> such).
Thanks for the recommendation! I'll be looking at these for
sure. Just wish my case could hold two of them. It would be nice if
they made a 2.5 x 5.25" to 4 x 3.5" disk carrier, so I could stuff two
of them into my 5 exposed 5.25" bays. *grin*
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-15 18:07 ` John Stoffel
@ 2020-09-15 19:34 ` Ram Ramesh
0 siblings, 0 replies; 26+ messages in thread
From: Ram Ramesh @ 2020-09-15 19:34 UTC (permalink / raw)
To: John Stoffel, Roger Heflin; +Cc: Wols Lists, Linux RAID
On 9/15/20 1:07 PM, John Stoffel wrote:
>>>>>> "Roger" == Roger Heflin <rogerheflin@gmail.com> writes:
>>> I've been looking at them for a while now, but hesitating
>>> because... not sure why. I'm using a CoolerMaster case with five
>>> 5.25" bays, plus a 3.5" bay external, and another three or four
>>> internal 3.5" bays. Works great. Nice and plain and not flashing
>>> lights or other bling. And not too loud either. Which is good.
>>>
>>> But I've used crappy drive cages before, crappy hot swap ones. Not
>>> good. And I think it's time I just went with a 4U rack mount with a
>>> bunch of hot swap bays, if I could only find one that wasn't an arm
>>> and a leg.
>>>
> Roger> I have had good luck with the ICY DOCK brand hot-swap cages. I
> Roger> have 4 different 4-bay-in-3-bay ones spanning 6+ years and they
> Roger> all seem to just work. And each newer version seemed to have an
> Roger> improved design over the prior ones (plugs easier to get to, and
> Roger> such).
>
> Thanks for the recommendation! I'll be looking at these for
> sure. Just wish my case could hold two of them. It would be nice if
> they made a 2.5 x 5.25" to 4 x 3.5" disk carrier, so I could stuff two
> of them into my 5 exposed 5.25" bays. *grin*
John,
Drive cages come in a variety of sizes. You have 1 to 1, 2 to 3, 3 to 4
and 4 to 5. Mix and match to fill all 5 bays with the best density of
3.5 inch bays. Here is one example, and I am sure you can find many.
https://www.newegg.com/p/pl?d=hot+swap+bay&N=100007599%20600551589&name=SSD+%2F+HDD+Accessories&Order=4
I have three cages, two iStar and one Icy Dock. My Icy Dock lost one bay.
The others are holding up a bit better. So, YMMV. However, expect them to
have noisy fans. You may want to change to quieter/more reliable ones.
Ramesh
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-12 16:19 ` antlists
2020-09-12 17:28 ` John Stoffel
@ 2020-09-14 17:19 ` Phillip Susi
2020-09-14 17:26 ` Wols Lists
1 sibling, 1 reply; 26+ messages in thread
From: Phillip Susi @ 2020-09-14 17:19 UTC (permalink / raw)
To: antlists; +Cc: Brian Allen Vanderburg II, linux-raid
antlists writes:
> Yup. Raid 6 has two parity disks, and that's mirrored to give four
> parity disks. So as an *absolute* *minimum*, raid-61 could lose four
> disks with no data loss.
Don't you mean 5 disks?
At best, 4 lost disks paired off in each raid1 means the raid6 sees two
failures. One more disk failing isn't enough to take out another mirror,
so the raid6 keeps ticking.
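A quick way to sanity-check that figure, assuming "raid-61" here means a
raid-6 striped over raid-1 mirror pairs (a hypothetical sketch of that
layout, not an existing md level):

from itertools import combinations

def survives(failed, n_pairs=4):
    # Pair p is disks 2p and 2p+1; the raid6 over the pairs survives
    # as long as at most two whole pairs have lost BOTH members.
    dead_pairs = sum(1 for p in range(n_pairs)
                     if 2 * p in failed and 2 * p + 1 in failed)
    return dead_pairs <= 2

# 8 disks: every possible set of 5 failures still leaves the array up,
# but some sets of 6 (three whole pairs) do not.
print(all(survives(set(c)) for c in combinations(range(8), 5)))   # True
print(all(survives(set(c)) for c in combinations(range(8), 6)))   # False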
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Linux raid-like idea
2020-09-14 17:19 ` Phillip Susi
@ 2020-09-14 17:26 ` Wols Lists
0 siblings, 0 replies; 26+ messages in thread
From: Wols Lists @ 2020-09-14 17:26 UTC (permalink / raw)
To: Phillip Susi; +Cc: Brian Allen Vanderburg II, linux-raid
On 14/09/20 18:19, Phillip Susi wrote:
>
> antlists writes:
>
>> Yup. Raid 6 has two parity disks, and that's mirrored to give four
>> parity disks. So as an *absolute* *minimum*, raid-61 could lose four
>> disks with no data loss.
>
> Don't you mean 5 disks?
>
> At best 4 lost disks paired off in each raid1 means the raid6 sees two
> failures. One more disk failing isn't enough to take out another mirror
> so the raid6 keeps ticking.
>
Well caught !!!
Cheers,
Wol
^ permalink raw reply [flat|nested] 26+ messages in thread