linux-raid.vger.kernel.org archive mirror
* Linux raid-like idea
       [not found] <1cf0d18c-2f63-6bca-9884-9544b0e7c54e.ref@aim.com>
@ 2020-08-24 17:23 ` Brian Allen Vanderburg II
  2020-08-28 15:31   ` antlists
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Allen Vanderburg II @ 2020-08-24 17:23 UTC (permalink / raw)
  To: linux-raid

I'm not a systems developer, so I can't even begin such a project myself,
but I have a small idea for a RAID solution that may be beneficial, and
then again, maybe not.

It seems that RAID is sometimes advised against, especially for larger
disks, due to issues with rebuild times.  If another disk fails during a
rebuild, it could mean the loss of the entire array of data, depending
on how many parity disks exist.  Of course RAID is not in itself an
alternative to a backup of critical data, but to minimize the chance of
total data loss of an array, other solutions (UnRAID, etc.) exist.  One
I've used a little bit is mergerfs/SnapRAID.  Mergerfs takes two or more
complete file systems and presents them as a single file system,
distributing files across them, with the advantage that a lost data
drive does not lose the entire array, since each disk is its own
complete filesystem.  Only the files on the lost disk would be missing.
SnapRAID can then be run periodically to create parity data to restore
from if a data disk is lost.

This got me to thinking: why can't we do something like this at the
driver level, with real-time parity protection? In SnapRAID, the parity
must be built manually via the command, and a lost disk stays down until
a restore command is manually run. In a real RAID array, the parity is
calculated in real time, and a block from a missing disk can still be
read based on the parity information and the other disks.  It's just
that, since the disks are combined into one logical disk, a completely
lost data disk with no available parity essentially loses all data in
the array.

So the idea is a RAID system, maybe something like mdadm, but one that
presents each data disk as its own block device.  /dev/sda1 might be
presented as /dev/fr0 (fr = fakeRAID), /dev/sdb1 as /dev/fr1, and so on,
with /dev/sdd1 and /dev/sde1 as parity disks.  A read/write from
/dev/fr0 would always map to /dev/sda1, offset by a small fixed-size
header that records the associations.  This fixed-size header would also
allow, if the drive were removed and inserted into a different system, a
loopback mount with an offset to access the contents.
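As a rough sketch of the mapping (Python, untested; the 8-sector header
size is just a made-up placeholder, not a real format):

HEADER_SECTORS = 8   # assumed fixed-size metadata header at the start
SECTOR = 512

def fr_to_physical(logical_sector):
    # A sector of the exposed /dev/fr0 maps 1-to-1 onto /dev/sda1,
    # simply shifted past the header.
    return logical_sector + HEADER_SECTORS

# On another machine, that same offset is all a loopback mount needs,
# e.g. losetup --offset (HEADER_SECTORS * SECTOR), to reach the
# filesystem behind the header.
print(fr_to_physical(0))   # -> 8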

The scope of the idea stops there: just providing parity protection to
the data disks while presenting each data disk as its own block device.
If desired, multiple sets could be created, each with their own data and
parity disks.  It should also support adding and removing data and
parity disks.  Ideally, the data disks could be of different sizes, as
long as the parity disks were the largest.

On top of this, the user then uses the exposed block devices as they see
fit.  If the data disks are related, they could use something like
mergerfs on top of the mount points of the file systems on
/dev/fr0,1,2,etc.  If the disks are not related, then /dev/fr0,1,2,etc.
could be used independently.  They could be partitioned and have more
than one file system on them.  In theory a RAID array could even be
built on top of them, but that defeats the purpose of each data disk
containing its own complete file system, and would reintroduce the issue
that a lost data disk loses the entire array.


Just an idea I wanted to put out there to see if there were any
merit/interest in it.


Thanks,


Brian Allen Vanderburg II



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-08-24 17:23 ` Linux raid-like idea Brian Allen Vanderburg II
@ 2020-08-28 15:31   ` antlists
  2020-09-05 21:47     ` Brian Allen Vanderburg II
  0 siblings, 1 reply; 26+ messages in thread
From: antlists @ 2020-08-28 15:31 UTC (permalink / raw)
  To: Brian Allen Vanderburg II, linux-raid

On 24/08/2020 18:23, Brian Allen Vanderburg II wrote:
> Just an idea I wanted to put out there to see if there were any
> merit/interest in it.

I hate to say it, but your data/parity pair sounds exactly like a 
two-disk raid-1 mirror. Yes, parity != mirror, but in practice I think 
it's a distinction without a difference.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-08-28 15:31   ` antlists
@ 2020-09-05 21:47     ` Brian Allen Vanderburg II
  2020-09-05 22:42       ` Wols Lists
  2020-09-15 11:32       ` Nix
  0 siblings, 2 replies; 26+ messages in thread
From: Brian Allen Vanderburg II @ 2020-09-05 21:47 UTC (permalink / raw)
  To: antlists, linux-raid

The idea is actually to be able to use more than two disks, like raid 5
or raid 6, except with the parity on its own disks instead of
distributed across all disks, and the data kept on its own disks as
well.  I've used SnapRAID a bit, and while making some changes to my own
setup I started wondering why something similar couldn't be done at the
block device level, keeping one of the advantages of SnapRAID-like
systems (if any data disk is lost beyond recovery, only the data on that
disk is lost, since each of the other data disks still holds its own
complete filesystem) while updating the parity data in real time.


So for instance

/dev/sda - may be data disk 1, say 1TB

/dev/sdb - may be data disk 2, 2TB

/dev/sdc - may be data disk 3, 2TB

/dev/sdd - may be parity disk 1 (maybe a raid-5-like setup), 2TB

/dev/sde - may be parity disk 2 (maybe a raid-6-like setup), 2TB


The parity disks must be at least as large as the largest data disk.  If
a given block is not present on a data disk (because that disk is
smaller than the other data disks), it is treated as all zeroes.  So the
parity for position 1.5TB would use zeroes for /dev/sda and whatever the
blocks are on /dev/sdb and /dev/sdc.
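A sketch of that parity rule in Python (untested; the block size and the
in-memory representation of a disk are just for illustration):

BLOCK = 4096

def parity_block(data_disks, blkno):
    # XOR of block `blkno` across every data disk; a block past the end
    # of a smaller disk counts as all zeroes, which is why only the
    # parity disks need to be as large as the largest data disk.
    out = bytearray(BLOCK)
    for disk in data_disks:              # disk: list of BLOCK-sized bytes
        blk = disk[blkno] if blkno < len(disk) else bytes(BLOCK)
        out = bytearray(a ^ b for a, b in zip(out, blk))
    return bytes(out)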

Normal raid 5/6 would only expose a single logical block device,
/dev/md0, and the data and parity would be distributed across the disks.
If a data disk is lost with no parity available, it's not possible to
recover any data, since the blocks are scattered across all disks.  What
good is a file that is missing one out of every three blocks?  Even
identifying the files would be virtually impossible, since the
structures of any filesystem and everything else on /dev/md0 are also
distributed.


My idea is basically this: instead of exposing a single logical block
device that is the 'joined' array, each data disk would be exposed as
its own logical block device.  /dev/sda1 might be exposed as /dev/fr1
(well, some better name), /dev/sdb1 as /dev/fr2, /dev/sdc1 as /dev/fr3;
the parity disks would not be exposed as logical block devices at all.
The blocks would essentially be a 1-to-1 identity mapping between
/dev/fr1 and /dev/sda1 and so on, except for a small header on
/dev/sda1, so block 0 on /dev/fr1 might actually be block 8 on
/dev/sda1.  If any single disk were ever removed from the array, the
full data on it could still be accessed via losetup with an offset, and
any file systems built on it could be read independently of any of the
other data disks.

The difference from traditional raid is that, if every disk somehow got
damaged beyond recovery except /dev/sda, it would still be possible to
recover whatever data was on that disk, since it was exposed to the
system as its own block device with an entire filesystem on it.  The
same goes for /dev/sdb and /dev/sdc.  Any write to any of the data block
devices would automatically also update parity.  Any read from a data
block device whose disk has failed would be recomputed from the
available parity and the other disks in real time, albeit with degraded
performance.  The file systems created on the exposed logical block
devices could be used however the user sees fit, perhaps related, such
as a union/merge pool file system, or unrelated, such as /home on the
/dev/fr1 filesystem and /usr/local on /dev/fr2.  There would be no
read/write performance increase, since reads from a single logical block
device map to the same physical device.  But there would be the typical
redundancy of raid, and if another disk fails during a recovery/rebuild,
preventing recovery of the data, only the data on the lost data disks is
gone.
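A degraded read would look something like this (Python sketch, untested,
single-parity case only):

def degraded_read(blkno, surviving_data, parity):
    # The failed member's block is the XOR of the parity block with the
    # same block from every surviving data disk.
    out = bytearray(parity[blkno])
    for disk in surviving_data:
        blk = disk[blkno] if blkno < len(disk) else bytes(len(out))
        out = bytearray(a ^ b for a, b in zip(out, blk))
    return bytes(out)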


Thanks,


Brian Vanderburg II


On 8/28/20 11:31 AM, antlists wrote:
> On 24/08/2020 18:23, Brian Allen Vanderburg II wrote:
>> Just an idea I wanted to put out there to see if there were any
>> merit/interest in it.
>
> I hate to say it, but your data/parity pair sounds exactly like a
> two-disk raid-1 mirror. Yes, parity != mirror, but in practice I think
> it's a distinction without a difference.
>
> Cheers,
> Wol


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-05 21:47     ` Brian Allen Vanderburg II
@ 2020-09-05 22:42       ` Wols Lists
  2020-09-11 15:14         ` Brian Allen Vanderburg II
  2020-09-15 11:32       ` Nix
  1 sibling, 1 reply; 26+ messages in thread
From: Wols Lists @ 2020-09-05 22:42 UTC (permalink / raw)
  To: Brian Allen Vanderburg II, linux-raid

On 05/09/20 22:47, Brian Allen Vanderburg II wrote:
> The idea is actually to be able to use more than two disks, like raid 5
> or raid 6, except with parity on their own disks instead of distributed
> across disks, and data kept own their own disks as well.  I've used
> SnapRaid a bit and was just making some changes to my own setup when I
> got the idea as to why something similar can't be done in block device
> level, but keeping one of the advantages of SnapRaid-like systems which
> is if any data disk is lost beyond recovery, then only the data on that
> data disk is lost due to the fact that the data on the other data disks
> are still their own complete filesystem, and providing real-time updates
> to the parity data.

I doubt I understand what you're getting at, but this is sounding a bit
like raid-4, if you have data disk(s) and a separate parity disk. People
don't use raid 4 because it has a nasty performance hit.

Personally, I'm looking at something like raid-61 as a project. That
would let you survive four disk failures ...

Also, one of the biggest problems when a disk fails and you have to
replace it is that, at present, with nearly all raid levels even if you
have lots of disks, rebuilding a failed disk is pretty much guaranteed
to hammer just one or two surviving disks, pushing them into failure if
they're at all dodgy. I'm also looking at finding some randomisation
algorithm that will smear the blocks out across all the disks, so that
rebuilding one disk spreads the load evenly across all disks.
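Just to illustrate the sort of thing I mean (a toy Python sketch, not a
worked-out algorithm; the hash-based shuffle is only one naive way this
could be done):

import hashlib

def stripe_members(stripe, ndisks, width):
    # Deterministically shuffle the disks per stripe, then take the
    # first `width` of them.  Every disk then carries a roughly equal
    # share of every other disk's data, so a rebuild reads a little from
    # everyone instead of hammering one or two survivors.
    order = sorted(range(ndisks),
                   key=lambda d: hashlib.sha256(
                       f"{stripe}:{d}".encode()).digest())
    return order[:width]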

At the end of the day, if you think what you're doing is a good idea,
scratch that itch, bounce stuff off here (and the kernel newbies list if
you're not a kernel programmer yet), and see how it goes. Personally, I
don't think it'll fly, but I'm sure people here would say the same about
some of my pet ideas too. Give it a go!

Cheers,
Wol

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-05 22:42       ` Wols Lists
@ 2020-09-11 15:14         ` Brian Allen Vanderburg II
  2020-09-11 19:16           ` antlists
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Allen Vanderburg II @ 2020-09-11 15:14 UTC (permalink / raw)
  To: Wols Lists, linux-raid


On 9/5/20 6:42 PM, Wols Lists wrote:
> I doubt I understand what you're getting at, but this is sounding a bit
> like raid-4, if you have data disk(s) and a separate parity disk. People
> don't use raid 4 because it has a nasty performance hit.

Yes, it is a bit like raid-4, since the data and parity disks are
separated.  In fact the idea could better be called a parity-backed
collection of independently accessed disks.  While you would not get the
performance increase of reads/writes going across multiple disks, the
idea is primarily targeted at read-heavy applications, so in typical
use, read performance should be no worse than reading directly from a
single un-raided disk, except in the case of a disk failure, where the
parity is being used to calculate a block read from a missing disk.
Writes would have more overhead since they would also have to
calculate/update parity.
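For a single-parity setup, the write overhead is the usual
read-modify-write (Python sketch, untested):

def updated_parity(old_parity, old_data, new_data):
    # new parity = old parity XOR old data XOR new data, so a small
    # write costs an extra read of the old data plus a read/write of the
    # parity block, but never touches the other data disks.
    return bytes(p ^ o ^ n
                 for p, o, n in zip(old_parity, old_data, new_data))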

> Personally, I'm looking at something like raid-61 as a project. That
> would let you survive four disk failures ...

Interesting.  I'll check that out more later, but from what I've seen so
far there is a lot of overhead: 10 1TB disks would only give 3TB of data
(2x 5-disk mirrored arrays, then raid6 on each, leaving 3 disks' worth
of data).  My current solution, since it's basically just storing bulk
data, is mergerfs and SnapRAID, and from the SnapRAID documentation, 10
1TB disks would provide 6TB if using 4 for parity.  However, its parity
calculations seem to be more complex as well.

> Also, one of the biggest problems when a disk fails and you have to
> replace it is that, at present, with nearly all raid levels even if you
> have lots of disks, rebuilding a failed disk is pretty much guaranteed
> to hammer just one or two surviving disks, pushing them into failure if
> they're at all dodgy. I'm also looking at finding some randomisation
> algorithm that will smear the blocks out across all the disks, so that
> rebuilding one disk spreads the load evenly across all disks.

This is actually the main purpose of the idea.  Because the data in a
traditional raid5/6 is mapped from multiple disks to a single logical
block device, with the structures of any file systems and their files
scattered across all the disks, losing one disk more than the number of
parity disks available makes the entire filesystem(s) and all files
virtually unrecoverable.

By keeping each data disk separate and exposed as its own block device,
with some parity backup, each disk contains its own entire filesystem(s)
to be used however the user decides.  The loss of one of the disks
during a rebuild would no longer cause full data loss, only the loss of
the filesystem(s) on that disk.  The data on the other disks would still
be intact and readable, although, depending on the user's usage, files
may be missing if they used a union/merge filesystem on top of them.  A
rebuild would still have the same issues: it would have to read all the
remaining disks to rebuild the lost disk.  I'm not really sure of any
way around that, since the parity would essentially be calculated as the
XOR of the same block on all the data disks.
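So a rebuild is essentially just this loop (Python sketch, untested),
which is why it has to touch every surviving disk for every block:

def rebuild_lost_disk(nblocks, surviving_data, parity, write_block):
    for blkno in range(nblocks):
        blk = bytearray(parity[blkno])
        for disk in surviving_data:
            src = disk[blkno] if blkno < len(disk) else bytes(len(blk))
            blk = bytearray(a ^ b for a, b in zip(blk, src))
        # write the reconstructed block to the replacement disk
        write_block(blkno, bytes(blk))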

>
> At the end of the day, if you think what you're doing is a good idea,
> scratch that itch, bounce stuff off here (and the kernel newbies list if
> you're not a kernel programmer yet), and see how it goes. Personally, I
> don't think it'll fly, but I'm sure people here would say the same about
> some of my pet ideas too. Give it a go!
>
> Cheers,
> Wol


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-11 15:14         ` Brian Allen Vanderburg II
@ 2020-09-11 19:16           ` antlists
  2020-09-11 20:14             ` Brian Allen Vanderburg II
  0 siblings, 1 reply; 26+ messages in thread
From: antlists @ 2020-09-11 19:16 UTC (permalink / raw)
  To: Brian Allen Vanderburg II, linux-raid

On 11/09/2020 16:14, Brian Allen Vanderburg II wrote:
> 
> On 9/5/20 6:42 PM, Wols Lists wrote:
>> I doubt I understand what you're getting at, but this is sounding a bit
>> like raid-4, if you have data disk(s) and a separate parity disk. People
>> don't use raid 4 because it has a nasty performance hit.
> 
> Yes it is a bit like raid-4 since the data and parity disks are
> separated.  In fact the idea could be better called a parity backed
> collection of independently accessed disks. While you would not get the
> advantage/performance increase of reads/writes going across multiple
> disks, the idea is primarily targeted to read-heavy applications, so in
> a typical use, read performance should be no worse than reading directly
> from a single un-raided disk, except in case of a disk failure where the
> parity is being used to calculated a block read on a missing disk.
> Writes would have more overhead since they would also have to
> calculate/update parity.

Ummm...

So let me word this differently. You're looking at pairing disks up, 
with a filesystem on each pair (data/parity), and then using mergerfs on 
top. Compared with simple raid, that looks like a lose-lose scenario to me.

A raid-1 will read faster than a single disk, because it optimises which 
disk to read from, and it will write faster too because your typical 
parity calculation for a two-disk scenario is a no-op, which might not 
optimise out.
> 
>> Personally, I'm looking at something like raid-61 as a project. That
>> would let you survive four disk failures ...
> 
> Interesting.  I'll check that out more later, but from what it seems so
> far there is a lot of overhead (10 1TB disks would only be 3TB of data
> (2x 5 disk arrays mirrors, then raid6 on each leaving 3 disks-worth of
> data).  My currently solution since I'ts basically just storing bulk
> data, is mergerfs and snapraid, and from the documents of snapraid, 10
> 1TB disks would provide 6TB if using 4 for parity.  However it's parity
> calculations seem to be more complex as well.

Actually no. Don't forget that, as far as linux is concerned, raid-10 
and raid-1+0 are two *completely* *different* things. You can raid-10 
three disks, but you need four for raid-1+0.

You've mis-calculated raid-6+1 - that gives you 6TB for 10 disks (two 
3TB arrays). I think I would probably get more with raid-61, but every 
time I think about it my brain goes "whoa!!!", and I'll need to start 
concentrating on it to work out exactly what's going on.
> 
>> Also, one of the biggest problems when a disk fails and you have to
>> replace it is that, at present, with nearly all raid levels even if you
>> have lots of disks, rebuilding a failed disk is pretty much guaranteed
>> to hammer just one or two surviving disks, pushing them into failure if
>> they're at all dodgy. I'm also looking at finding some randomisation
>> algorithm that will smear the blocks out across all the disks, so that
>> rebuilding one disk spreads the load evenly across all disks.
> 
> This is actually the main purpose of the idea.  Due to the data on the
> disks in a traditional raid5/6 being mapped from multiple disks to a
> single logical block device, and so the structures of any file systems
> and their files scattered across all the disks, losing one more than the
> number of available lost disks would make the entire filesystem(s) and
> all files virtually unrecoverable.

But raid 5/6 give you much more usable space than a mirror. What I'm 
having trouble getting to grips with in your idea is how is it an 
improvement on a mirror? It looks to me like you're proposing a 2-disk 
raid-4 as the underlying storage medium, with mergerfs on top. Which is 
effectively giving you a poorly-performing mirror. A crappy raid-1+0, 
basically.
> 
> By keeping each data disk separate and exposed as it's own block device
> with some parity backup, each disk contains an entire filesystem(s) on
> it's own to be used however a user decides.  The loss of one of the
> disks during a rebuild would not cause full data loss anymore but only
> of the filesystem(s) on that disk.  The data on the other disks would
> still be intact and readable, although depending on the user's usage,
> may be missing files if they used a union/merge filesystem on top of
> them.  A rebuild would still have the same issues, would have to read
> all the remaining disks to rebuild the lost disk.  I'm not really sure
> of any way around that since parity would essentially be calculated as
> the xor of the same block on all the data disks.
> 
And as I understand your setup, you also suffer from the same problem as 
raid-10 - lose one disk and you're fine, lose two and it's russian 
roulette whether you can recover your data. raid-6 is *any* two and 
you're fine, raid-61 would be *any* four and you're fine.
>>
>> At the end of the day, if you think what you're doing is a good idea,
>> scratch that itch, bounce stuff off here (and the kernel newbies list if
>> you're not a kernel programmer yet), and see how it goes. Personally, I
>> don't think it'll fly, but I'm sure people here would say the same about
>> some of my pet ideas too. Give it a go!
>>
Cheers,
Wol

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-11 19:16           ` antlists
@ 2020-09-11 20:14             ` Brian Allen Vanderburg II
  2020-09-12  6:09               ` Song Liu
                                 ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Brian Allen Vanderburg II @ 2020-09-11 20:14 UTC (permalink / raw)
  To: antlists, linux-raid


On 9/11/20 3:16 PM, antlists wrote:
>> Yes it is a bit like raid-4 since the data and parity disks are
>> separated.  In fact the idea could be better called a parity backed
>> collection of independently accessed disks. While you would not get the
>> advantage/performance increase of reads/writes going across multiple
>> disks, the idea is primarily targeted to read-heavy applications, so in
>> a typical use, read performance should be no worse than reading directly
>> from a single un-raided disk, except in case of a disk failure where the
>> parity is being used to calculated a block read on a missing disk.
>> Writes would have more overhead since they would also have to
>> calculate/update parity.
>
> Ummm...
>
> So let me word this differently. You're looking at pairing disks up,
> with a filesystem on each pair (data/parity), and then using mergefs
> on top. Compared with simple raid, that looks like a lose-lose
> scenario to me.
>
> A raid-1 will read faster than a single disk, because it optimises
> which disk to read from, and it will write faster too because your
> typical parity calculation for a two-disk scenario is a no-op, which
> might not optimise out.


Not exactly.  You can do data + parity, but you could also do data +
data + parity, or data + data + data + parity.  Or, with more than one
parity disk, data + data + data + data + parity + parity, etc.

Best viewed in a fixed-width font, and it probably makes more sense read
from the bottom up:


       /data
         |
    / mergerfs  \
   /             \
/pool1         /pool2         /pool3 (or /home or /usr/local, etc)
   |             |             |
The filesystem built upon the /dev/frX devices can be used however the
user wants.
   |             |             |
----------------------------------------
   |             |             |
ext4 (etc)     ext4(etc)    (ext4/etc, could in theory even have
multiple partitions then filesystems)
   |             |             |
Each exposed block device /dev/frX can have a filesystem/partition table
placed on it, which lands on that single mapped disk.  Any damage/issues
on one data disk would not affect the other data disks at all.  However,
since the collection of data disks also has parity covering them, damage
to a data disk can be restored from the parity and the other data disks.
If something prevents that restore, then only the bad data disk has an
issue; the other data disks would still be fully accessible, and any
filesystem on them still intact, since the entire filesystem for
anything on /dev/fr0 lives only on /dev/sda1, and so on.
   |             |             |
----------------------------------------
   |             |             |
/dev/fr0      /dev/fr1      /dev/fr2
   |             |             |
Individual data disks are passed through as fully exposed block devices,
minus any overhead for information/data structures for the 'raid'.
A block X on /dev/fr0 maps to block X + offset on /dev/sda1 and so on
   |             |             |
Raid/parity backed disk layer (data: /dev/sda1=/dev/fr0,
/dev/sdb1=/dev/fr1, /dev/sdc1=/dev/fr2, parity: /dev/sdd1)
   |             |             |
-----------------------------------------------------
   |             |             |                 |
/dev/sda1    /dev/sdb1     /dev/sdc1      /dev/sdd1 (parity)



So basically, at the raid (or parity-backed) layer, multiple disks, not
just a single disk, can be backed by the parity disk (ideally with
support for more than one parity disk as well).  The only difference is
that, instead of joining the disks into one block device /dev/md0, each
data disk gets its own block device and so has its own filesystem(s) on
it, independently of the other disks.  A single data disk can be removed
entirely, taken to a different system, and still be read (you would need
to do losetup with an offset to get to the start of the
filesystem/partition table), and the other data disks would still be
readable on the original system.  So the total loss of a data disk would
not affect the other data disks' files.  In this example, /data could be
missing some files if /pool1 (/dev/sda1) died, but the files on /pool2
would still be entirely accessible, as would any filesystem on
/dev/sdc1.  There is no performance advantage to such a setup.  The
advantage is that should something really bad happen and it becomes
impossible to restore some data disk(s), the other disk(s) are still
accessible.

Read from /dev/fr0 = read from /dev/sda1 (adjusted for any overhead/headers)
Read from /dev/fr1 = read from /dev/sdb1 (adjusted for any overhead/headers)
Read from /dev/fr2 = read from /dev/sdc1 (adjusted for any overhead/headers)
Write to /dev/fr0 = write to /dev/sda1 (adjusted for any overhead/headers)
                    and parity /dev/sdd1
Write to /dev/fr1 = write to /dev/sdb1 (adjusted for any overhead/headers)
                    and parity /dev/sdd1
Write to /dev/fr2 = write to /dev/sdc1 (adjusted for any overhead/headers)
                    and parity /dev/sdd1

Read from /dev/fr0 (/dev/sda1 missing) = read from parity and the other
disks, recalculating the original block.
During rebuild, /dev/sdd dies as well (unable to rebuild from parity
now, since /dev/sda and /dev/sdd are both missing):
    Lost: /dev/sda1
    Still present: /dev/sdb1 -- some files from the pool will be missing,
since /pool1 is missing, but the files on /pool2 are still present in
their entirety
    Still present: /pool3 (or /home or /usr/local, etc., whatever
/dev/fr2 was used for)
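Putting the pieces together, the behaviour of one exposed member could
be modelled like this (Python sketch, untested; a failed disk is
represented as None, and everything is single-parity and in-memory
purely for illustration):

class FrDevice:
    def __init__(self, me, data_disks, parity):
        self.me, self.disks, self.parity = me, data_disks, parity

    def read(self, blkno):
        mine = self.disks[self.me]
        if mine is not None:                 # healthy: plain 1-to-1 read
            return mine[blkno]
        out = bytearray(self.parity[blkno])  # degraded: rebuild the block
        for i, disk in enumerate(self.disks):
            if i != self.me and disk is not None:
                out = bytearray(a ^ b for a, b in zip(out, disk[blkno]))
        return bytes(out)

    def write(self, blkno, new):
        old = self.disks[self.me][blkno]
        self.parity[blkno] = bytes(p ^ o ^ n for p, o, n in
                                   zip(self.parity[blkno], old, new))
        self.disks[self.me][blkno] = new     # data and parity stay in step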

>>
>>> Personally, I'm looking at something like raid-61 as a project. That
>>> would let you survive four disk failures ...
>>
>> Interesting.  I'll check that out more later, but from what it seems so
>> far there is a lot of overhead (10 1TB disks would only be 3TB of data
>> (2x 5 disk arrays mirrors, then raid6 on each leaving 3 disks-worth of
>> data).  My currently solution since I'ts basically just storing bulk
>> data, is mergerfs and snapraid, and from the documents of snapraid, 10
>> 1TB disks would provide 6TB if using 4 for parity.  However it's parity
>> calculations seem to be more complex as well.
>
> Actually no. Don't forget that, as far as linux is concerned, raid-10
> and raid-1+0 are two *completely* *different* things. You can raid-10
> three disks, but you need four for raid-1+0.
>
> You've mis-calculated raid-6+1 - that gives you 6TB for 10 disks (two
> 3TB arrays). I think I would probably get more with raid-61, but every
> time I think about it my brain goes "whoa!!!", and I'll need to start
> concentrating on it to work out exactly what's going on.

That's right, I get the various combinations confused.  So does raid61
allow losing 4 disks in any order and still recovering?  Or would some
combination of just 3 lost disks be fatal?  Interesting nonetheless, and
I'll have to look into it.  Obviously it's not intended as a replacement
for backing up important data, but, for me anyway, it's just a way to
minimize loss of trivial bulk data/files.

It would be nice if the raid modules supported methods that could
tolerate more lost disks, in any order, without losing data.  The
SnapRAID source states that it uses a Cauchy matrix algorithm which, in
theory, could lose up to 6 disks in any order, if using 6 parity disks,
and still be able to restore the data.  I'm not familiar with the math
behind it, so I can't speak to the accuracy of that claim.
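I won't claim this is SnapRAID's exact construction, but the general
shape of a Cauchy-matrix parity looks roughly like this (Python sketch,
untested; assumes nparity + ndata <= 256).  The point is that every
square submatrix of a Cauchy matrix over GF(2^8) is invertible, so any
combination of up to nparity lost disks can in principle be solved for:

# Minimal GF(2^8) tables (polynomial 0x11d).
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x = x << 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    return EXP[255 - LOG[a]]

def cauchy_matrix(nparity, ndata):
    # C[i][j] = 1 / (x_i + y_j) with disjoint sets x = {0..nparity-1}
    # and y = {nparity..nparity+ndata-1}; addition in GF(2^8) is XOR.
    return [[gf_inv(i ^ (nparity + j)) for j in range(ndata)]
            for i in range(nparity)]

def parity_blocks(data_blocks, nparity):
    # Each parity block is a different GF(2^8) linear combination of the
    # same-numbered block from every data disk.  Recovery solves the
    # linear system formed by the surviving rows (not shown here).
    size = len(data_blocks[0])
    out = []
    for row in cauchy_matrix(nparity, len(data_blocks)):
        p = bytearray(size)
        for coeff, block in zip(row, data_blocks):
            for k in range(size):
                p[k] ^= gf_mul(coeff, block[k])
        out.append(bytes(p))
    return out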

>> This is actually the main purpose of the idea.  Due to the data on the
>> disks in a traditional raid5/6 being mapped from multiple disks to a
>> single logical block device, and so the structures of any file systems
>> and their files scattered across all the disks, losing one more than the
>> number of available lost disks would make the entire filesystem(s) and
>> all files virtually unrecoverable.
>
> But raid 5/6 give you much more usable space than a mirror. What I'm
> having trouble getting to grips with in your idea is how is it an
> improvement on a mirror? It looks to me like you're proposing a 2-disk
> raid-4 as the underlying storage medium, with mergefs on top. Which is
> effectively giving you a poorly-performing mirror. A crappy raid-1+0,
> basically.

I do apologize, it seems I'm having a little difficulty clearly
explaining the idea.  Hopefully the chart above explains it better than
I have been.  Imagine raid 5 or 6, but with no striping (so the parity
goes on its own disks), and with each data disk passed through as its
own block device.  You lose any performance benefits of the striping of
data/parity, but the data stored on any data disk is only on that data
disk, and the same for the others, so losing all parity and a data disk
would not lose the data on the other data disks.

>>
>> By keeping each data disk separate and exposed as it's own block device
>> with some parity backup, each disk contains an entire filesystem(s) on
>> it's own to be used however a user decides.  The loss of one of the
>> disks during a rebuild would not cause full data loss anymore but only
>> of the filesystem(s) on that disk.  The data on the other disks would
>> still be intact and readable, although depending on the user's usage,
>> may be missing files if they used a union/merge filesystem on top of
>> them.  A rebuild would still have the same issues, would have to read
>> all the remaining disks to rebuild the lost disk.  I'm not really sure
>> of any way around that since parity would essentially be calculated as
>> the xor of the same block on all the data disks.
>>
> And as I understand your setup, you also suffer from the same problem
> as raid-10 - lose one disk and you're fine, lose two and it's russian
> roulette whether you can recover your data. raid-6 is *any* two and
> you're fine, raid-61 would be *any* four and you're fine.

Not exactly.  Since the data disks are passed through as individual
block devices instead of 'joined' into a single block device, if you
lose one disk (assuming only one disk of parity) then you are fine.  If
you lose two, then you've only lost the data on the lost data disk.  The
other data disks would still have their intact filesystems on them.
Depending on how they are used, some files may be missing; e.g. a
mergerfs between two mount points would be missing any files on the lost
mount point, but the other files would still be accessible.


It may or may not (leaning more toward probably not) have any use.  I'm
hoping that from the above at least the idea is better understood.  I do
apologize if it's still not clear.


Thanks,


Brian Vanderburg II




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-11 20:14             ` Brian Allen Vanderburg II
@ 2020-09-12  6:09               ` Song Liu
  2020-09-12 14:40               ` Adam Goryachev
  2020-09-12 16:19               ` antlists
  2 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2020-09-12  6:09 UTC (permalink / raw)
  To: Brian Allen Vanderburg II; +Cc: antlists, linux-raid

On Fri, Sep 11, 2020 at 1:15 PM Brian Allen Vanderburg II
<brianvanderburg2@aim.com> wrote:
> [...]
>
> So basically, at the raid (or parity-backed) layer, multiple disks, not
> just a single disk, can be backed by the parity disk (ideally with
> support for more than one parity disk as well).  The only difference is
> that, instead of joining the disks into one block device /dev/md0, each
> data disk gets its own block device and so has its own filesystem(s) on
> it, independently of the other disks.
>
> [...]

IIUC...

If all the disks are of the same size, this looks like raid-4 with HUGE chunk
size. Say a raid-4 with 4 disks, the drive is 2TiB, and the chunk size is also
2TiB. The array is 6TiB. LBA 0-2TiB goes to disk 0, LBA 2-4TiB goes to
disk 1, and LBA 4-6TiB goes to disk 2. disk 3 is all parities.

Then we create 3x 2TiB partitions on the array (assuming the partition table
is free...). Then we can create 3x file systems on these partitions.

Is this same/similar to the idea?
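If that's the idea, the mapping is just a linear concatenation (Python
sketch, untested; sizes are illustrative):

def lba_to_member(lba, chunk_sectors, ndata):
    # With the chunk size equal to a whole drive, the array is simply
    # the data drives laid end to end, so each member holds one
    # contiguous LBA range and one partition/filesystem can sit
    # entirely on it.
    disk = lba // chunk_sectors
    assert disk < ndata, "LBA beyond the data area"
    return disk, lba % chunk_sectors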

Thanks,
Song

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-11 20:14             ` Brian Allen Vanderburg II
  2020-09-12  6:09               ` Song Liu
@ 2020-09-12 14:40               ` Adam Goryachev
  2020-09-12 16:19               ` antlists
  2 siblings, 0 replies; 26+ messages in thread
From: Adam Goryachev @ 2020-09-12 14:40 UTC (permalink / raw)
  To: Brian Allen Vanderburg II, antlists, linux-raid



On 12 September 2020 6:14:51 am AEST, Brian Allen Vanderburg II <brianvanderburg2@aim.com> wrote:
>[...]
>
>Not exactly.  Since the data disks are passed through as individual
>block devices instead of 'joined' into a single block device, if you
>lose one disk (assuming only one disk of parity) then you are fine.  If
>you lose two, then you've only lost the data on the lost data disk.  The
>other data disks would still have their intact filesystems on them.
>
>[...]

Possibly silly question, if you lost 3 data disks, but still had your parity disk, how do you recover all your data? Doesn't sound possible to store enough data on one disk to recover three....

BTW, my preferred method is raid6 on machine A, raid6 on machine B, and then drbd to join them together. You can lose a maximum of all disks on one machine and two on the other, or any 2 disks on both machines (a total of 4). Basically raid 61, but split between machines.

Regards
Adam


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-11 20:14             ` Brian Allen Vanderburg II
  2020-09-12  6:09               ` Song Liu
  2020-09-12 14:40               ` Adam Goryachev
@ 2020-09-12 16:19               ` antlists
  2020-09-12 17:28                 ` John Stoffel
  2020-09-14 17:19                 ` Phillip Susi
  2 siblings, 2 replies; 26+ messages in thread
From: antlists @ 2020-09-12 16:19 UTC (permalink / raw)
  To: Brian Allen Vanderburg II, linux-raid

On 11/09/2020 21:14, Brian Allen Vanderburg II wrote:
> That's right, I get the various combinations confused.  So does raid61
> allow for losing 4 disks in any order and still recovering? or would
> some order of disks make it where just 3 disks lost and be bad?
> Iinteresting non-the-less and I'll have to look into it.  Obviously it's
> not intended to as a replacement for backing up important data, but, for
> me any way, just away to minimize loss of any trivial bulk data/files.

Yup. Raid 6 has two parity disks, and that's mirrored to give four 
parity disks. So as an *absolute* *minimum*, raid-61 could lose four 
disks with no data loss.

Throw in the guarantee that, with a mirror, you can lose an entire 
mirror with no data-loss, that means - with luck and a following wind - 
you could lose half your disks, PLUS the two parities in the remaining 
disks, and still recover your data. So with a raid-6+1, if I had twelve 
disks, I could lose EIGHT disks and still have a *chance* of recovering 
my array. I'm not quite sure what difference raid-61 would make.

(That says to me, if I have a raid-61, I need as a minimum a complete 
set of data disks. That also says to me, if I've splatted an 8+2 raid-61 
across 11 disks, I only need 7 for a full recovery despite needing a 
minimum of 8, so something isn't quite right here... I suspect the 7 
would be enough but I did say my mind goes Whooaaa!!!!)
> 
> It would be nice if the raid modules had support for methods that could
> support a total of more disks in any order lost without losing data.
> Snapraid source states that it uses some Cauchy Matrix algorithm which
> in theory could lose up to 6 disks if using 6 parity disks, in any
> order, and still be able to restore the data.  I'm not familiar with the
> math behind it so can't speak to the accuracy of that claim.

That's easy, it's just whether it's worth it. Look at the maths behind 
raid-6. The "one parity disk" methods, 4 or 5, just use XOR. But that 
only works once, a second XOR parity disk adds no new redundancy and is 
worthless. I'm guessing raid-6 uses that Cauchy method you talk about - 
certainly it can generate as many parity disks as you like ... so that 
claim is good, even if raid-6 doesn't use that particular technique.

If someone wants to, mod'ing raid-6 to use 3 parity disks shouldn't be 
that hard ...
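
As a rough sketch of that maths (purely illustrative; the GF(256) tables
assume the usual RAID-6 polynomial 0x11d and all names are made up): a
single XOR parity P can only rebuild one missing block, while a second
parity Q weighted by per-disk coefficients g^i is independent of P, so
any two lost data blocks can be solved for.

# Sketch of RAID-6-style dual parity over GF(256), polynomial 0x11d.
# P is plain XOR; Q weights each data byte by g^i (g = 2), which is what
# makes the second parity independent of the first.

EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d          # reduce modulo the RAID-6 polynomial
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def parities(data):
    """data[i] is one byte from data disk i; returns (P, Q)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)     # coefficient g^i for disk i
    return p, q

def recover_two(data, x_idx, y_idx, p, q):
    """Rebuild the bytes of two lost data disks from P and Q
    (the 'lost' entries in data are simply skipped for the demo)."""
    pxy = qxy = 0
    for i, d in enumerate(data):
        if i in (x_idx, y_idx):
            continue
        pxy ^= d
        qxy ^= gf_mul(EXP[i], d)
    a = p ^ pxy                    # = D_x ^ D_y
    b = q ^ qxy                    # = g^x*D_x ^ g^y*D_y
    gx, gy = EXP[x_idx], EXP[y_idx]
    dx = gf_div(b ^ gf_mul(gy, a), gx ^ gy)
    return dx, a ^ dx

data = [0x11, 0x22, 0x33, 0x44, 0x55]      # one byte per data disk
p, q = parities(data)
print(recover_two(data, 1, 3, p, q))       # -> (34, 68) == (0x22, 0x44)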


But going back to your original idea, I've been thinking about it. And 
it struck me - you NEED to regenerate parity EVERY TIME you write data 
to disk! Otherwise, writing one file on one disk instantly trashes your 
ability to recover all the other files in the same position on the other 
disks. WHOOPS! But if you think it's a good idea, by all means try and 
do it.

The other thing I'd suggest here is to try and make it more like raid-5 
than raid-4. You have X disks, let's say 5, numbered 0, 1, 2, 3, 4. As 
part of formatting each disk ready for raid, you create a file 
containing every block whose LBA mod 5 equals the disk number. So as 
you recalculate your parities, that's where they go.
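
Something like this toy sketch, perhaps (all names invented; the
reserved per-disk parity file is modelled as a plain in-memory list,
and everything else is glossed over):

# Hypothetical sketch: keep one XOR parity block per stripe up to date
# on every write, and rotate the parity raid-5 style so that the parity
# for stripe number s lives on disk (s mod NDISKS).

NDISKS = 5

def parity_home(stripe_no, ndisks=NDISKS):
    # the parity for this stripe lives on disk (stripe_no mod ndisks)
    return stripe_no % ndisks

def write_block(data, parity, disk_no, stripe_no, new_block):
    """data[d][s] is the block at stripe s on data disk d;
    parity[d][s] is that disk's reserved parity area."""
    pd = parity_home(stripe_no)
    old_block = data[disk_no][stripe_no]
    old_parity = parity[pd][stripe_no]
    # read-modify-write: new parity = old parity ^ old data ^ new data
    parity[pd][stripe_no] = bytes(
        p ^ o ^ n for p, o, n in zip(old_parity, old_block, new_block))
    data[disk_no][stripe_no] = new_block   # the data disk stays a normal fs

# toy state: 5 disks, 8 stripes, 4-byte blocks, all zeroed
data   = [[bytes(4) for _ in range(8)] for _ in range(NDISKS)]
parity = [[bytes(4) for _ in range(8)] for _ in range(NDISKS)]
write_block(data, parity, disk_no=2, stripe_no=3, new_block=b"\x5a\x00\x00\x00")
print(parity[parity_home(3)][3])           # b'Z\x00\x00\x00'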

Cheers,
Wol

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-12 16:19               ` antlists
@ 2020-09-12 17:28                 ` John Stoffel
  2020-09-12 18:41                   ` antlists
  2020-09-14 17:19                 ` Phillip Susi
  1 sibling, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-12 17:28 UTC (permalink / raw)
  To: antlists; +Cc: Brian Allen Vanderburg II, linux-raid

>>>>> "antlists" == antlists  <antlists@youngman.org.uk> writes:

antlists> On 11/09/2020 21:14, Brian Allen Vanderburg II wrote:
>> That's right, I get the various combinations confused.  So does raid61
>> allow for losing 4 disks in any order and still recovering? or would
>> some order of disks make it so that losing just 3 disks would be bad?
>> Interesting nonetheless and I'll have to look into it.  Obviously it's
>> not intended as a replacement for backing up important data, but, for
>> me anyway, just a way to minimize loss of any trivial bulk data/files.

antlists> Yup. Raid 6 has two parity disks, and that's mirrored to give four 
antlists> parity disks. So as an *absolute* *minimum*, raid-61 could lose four 
antlists> disks with no data loss.

antlists> Throw in the guarantee that, with a mirror, you can lose an entire 
antlists> mirror with no data-loss, that means - with luck and a following wind - 
antlists> you could lose half your disks, PLUS the two parities in the remaining 
antlists> disks, and still recover your data. So with a raid-6+1, if I had twelve 
antlists> disks, I could lose EIGHT disks and still have a *chance* of recovering 
antlists> my array. I'm not quite sure what difference raid-61 would make.

Of course your useful storage is 12/2 - 2 = 4 disks, so only 33%
usable space.  Not very good.  At that point, I'd just go with RAID 1
pairs striped together with RAID 0 (RAID 1+0) for only a 50% loss
of space.  Now if I lose the wrong four disks, I'm screwed, as opposed
to before where I can lose *any* four disks.
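
Roughly, assuming twelve equal disks and treating raid-61 as a mirror
of two raid-6 halves:

# Usable disks out of n equal members, assuming "raid61" means a mirror
# of two raid-6 halves (so each half gives n/2 - 2 disks of data).
def usable(n, layout):
    return {"raid10": n // 2,
            "raid6":  n - 2,
            "raid61": n // 2 - 2}[layout]

n = 12
for layout in ("raid10", "raid6", "raid61"):
    u = usable(n, layout)
    print(layout, u, "disks,", f"{u / n:.0%}")   # raid61: 4 disks, 33%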

The problem with RAID6 is that random small IO writes have terrible
performance.  RAID1+0 gives you much better performance.  Arstechnica
had a great article on this earlier in the year that did actual disk testing.

I think (and haven't done the math) that erasure coding gives you
better protection as you scale.  And I've even thought about glusterfs
or cephfs for home storage using a bunch of small single board
computers each talking to one disk for storage.  But... it's hard to
justify.

These days, I think it's better for your main storage to be a
three-way mirror for the important stuff, performance-wise.  And RAID6
with a hot spare for your large streaming collections like videos and
such.  But even then... it's hard to justify since it costs a lot.

antlists> (That says to me, if I have a raid-61, I need as a minimum a complete 
antlists> set of data disks. That also says to me, if I've splatted an 8+2 raid-61 
antlists> across 11 disks, I only need 7 for a full recovery despite needing a 
antlists> minimum of 8, so something isn't quite right here... I suspect the 7 
antlists> would be enough but I did say my mind goes Whooaaa!!!!)


>> It would be nice if the raid modules had support for methods that could
>> support a total of more disks in any order lost without losing data.
>> Snapraid source states that it uses some Cauchy Matrix algorithm which
>> in theory could lose up to 6 disks if using 6 parity disks, in any
>> order, and still be able to restore the data.  I'm not familiar with the
>> math behind it so can't speak to the accuracy of that claim.

antlists> That's easy, it's just whether it's worth it. Look at the
antlists> maths behind raid-6. The "one parity disk" methods, 4 or 5,
antlists> just use XOR. But that only works once, a second XOR parity
antlists> disk adds no new redundancy and is worthless. I'm guessing
antlists> raid-6 uses that Cauchy method you talk about - certainly it
antlists> can generate as many parity disks as you like ... so that
antlists> claim is good, even if raid-6 doesn't use that particular
antlists> technique.

antlists> If someone wants to, mod'ing raid-6 to use 3 parity disks
antlists> shouldn't be that hard ...

It's not, but there are diminishing returns because you now have to do
the RMW cycle across even more disks, which is slow.

antlists> But going back to your original idea, I've been thinking
antlists> about it. And it struck me - you NEED to regenerate parity
antlists> EVERY TIME you write data to disk! Otherwise, writing one
antlists> file on one disk instantly trashes your ability to recover
antlists> all the other files in the same position on the other
antlists> disks. WHOOPS! But if you think it's a good idea, by all
antlists> means try and do it.

Correct.  When you compute parity, you do it across blocks.  And the
parity calculation is effectively free these days.  The cost comes
from the (on disks at least) rotational latency to read the entire
stripe across all the disks, modify one to N bytes in that stripe,
then re-write the stripe back to all the disks.  That's a lot of IO.

With RAID1, you just make two writes, one to each disk.  Done.  Even
with a three way mirror, it's simpler.

Now RAID6 works better if you are replacing the entire stripe, since
then you can cut your IOs roughly in half, but you still need to write
chunks to different disks.
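
Back-of-the-envelope, assuming the classic accounting (read old data
plus old parity, write new data plus new parity for a small update;
plain writes for a full stripe):

# Rough IO counts for one chunk update vs a full-stripe write, assuming
# the classic read-modify-write scheme.

def small_write_ios(parity_chunks):
    reads = 1 + parity_chunks          # old data chunk + old parity chunk(s)
    writes = 1 + parity_chunks         # new data chunk + new parity chunk(s)
    return reads + writes

def full_stripe_ios(total_disks):
    return total_disks                 # one write per member, zero reads

print(small_write_ios(1))      # RAID-5: 4 IOs to update a single chunk
print(small_write_ios(2))      # RAID-6: 6 IOs to update a single chunk
print(full_stripe_ios(10))     # 8+2 RAID-6: 10 writes to replace a whole stripe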

This is why big vendors have log-based filesystems (Netapp, EMC,
Isilon, etc) with battery-backed RAM caches, so they can A) tell the
client the writes are done, B) collect large changes into bigger
chunks, and C) write them in linear fashion down to the disk.

Log-based filesystems are great for this.  Until they get fragmented.
SSDs help in that they really don't have a seek cost at all, so you
can handle fragmentation better.  BUT!  SSDs are generally written to
assuming 512 byte sectors, while the underlying NAND flash now
generally uses 4k (or larger) pages, so there's another layer of
fragmentation and wear levelling and other stuff happening outside
your control there as well.

antlists> The other thing I'd suggest here, is try and make it more
antlists> like raid-5 than raid-4. You have X disks, let's say 5. So
antlists> one disk each is numbered 0, 1, 2, 3, 4. As part of
antlists> formatting the disk ready for raid, you create a file
antlists> containing every block where LBA mod 5 equals disk
antlists> number. So as you recalculate your parities, that's where
antlists> they go.

RAID4 suffers from the parity disk becoming a super hot spot, since it
needs to get written to no matter what.  No one uses it.

Until we can get back to cost-effective SSDs using SLC NAND, RAID is
here to stay.  And so is mirroring, since it does help protect from a
lot of issues, both permanent and temporary.

I had one of my 4tb disks fall out of my main VG, but I didn't lose
any data, I just checked the disk and added it back in.  I've got a
new 4tb disk on order along with a drive cage so I can balance things
better.

But it's almost to the point where it's cheaper to buy a pair of 8tb
drives to replace the 4x4tb drives I'm using now.  But I probably
won't.

I could  write for hours here... it's a tough problem space to work
through.

John

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-12 17:28                 ` John Stoffel
@ 2020-09-12 18:41                   ` antlists
  2020-09-13 12:50                     ` John Stoffel
  0 siblings, 1 reply; 26+ messages in thread
From: antlists @ 2020-09-12 18:41 UTC (permalink / raw)
  To: John Stoffel; +Cc: linux-raid

On 12/09/2020 18:28, John Stoffel wrote:
> I had one of my 4tb disks fall out of my main VG, but I didn't lose
> any data, I just checked the disk and added it back in.  I've got a
> new 4tb disk on order along with a drive cage so I can balance things
> better.
> 
> But it's almost to the point where it's cheaper to buy a pair of 8tb
> drives to replace the 4x4tb drives I'm using now.  But I probably
> won't.
> 
You should have bought an 8TB to replace the 4 ... one more failure :-( 
and you would have your 2x8 (and raid-0 the remaining 4s to provide your 
3rd mirror).

> I could  write for hours here... it's a tough problem space to work
> through.

Made worse if, like me, you're more into logical completeness than "will 
it finish in finite time" :-)

Cheers,
Wol

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-12 18:41                   ` antlists
@ 2020-09-13 12:50                     ` John Stoffel
  2020-09-13 16:01                       ` Wols Lists
  0 siblings, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-13 12:50 UTC (permalink / raw)
  To: antlists; +Cc: John Stoffel, linux-raid

>>>>> "antlists" == antlists  <antlists@youngman.org.uk> writes:

antlists> On 12/09/2020 18:28, John Stoffel wrote:
>> I had one of my 4tb disks fall out of my main VG, but I didn't lose
>> any data, I just checked the disk and added it back in.  I've got a
>> new 4tb disk on order along with a drive cage so I can balance things
>> better.
>> 
>> But it's almost to the point where it's cheaper to buy a pair of 8tb
>> drives to replace the 4x4tb drives I'm using now.  But I probably
>> won't.
>> 

antlists> You should have bought an 8TB to replace the 4 ... one more
antlists> failure :-( and you would have your 2x8 (and raid-0 the
antlists> remaining 4s to provide your 3rd mirror).

I know, I really need to buy another drive, but my main system is
full, so I *also* need to either get a new case, or one of those 5 x
3.5" into 3 x 5.25" bay cages to make some room.  Decisions... decisions...

>> I could  write for hours here... it's a tough problem space to work
>> through.

antlists> Made worse if, like me, you're more into logical
antlists> completeness than "will it finish in finite time" :-)

For sure!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-13 12:50                     ` John Stoffel
@ 2020-09-13 16:01                       ` Wols Lists
  2020-09-13 23:49                         ` Brian Allen Vanderburg II
  2020-09-15  2:09                         ` John Stoffel
  0 siblings, 2 replies; 26+ messages in thread
From: Wols Lists @ 2020-09-13 16:01 UTC (permalink / raw)
  To: John Stoffel; +Cc: linux-raid

On 13/09/20 13:50, John Stoffel wrote:
> I know, I really need to buy another drive, but my main system is
> full, so I *also* need to either get a new case, or one of those 5 x
> 3.5" into 3 x 5.25" bay cages to make some room.  Decisions... decisions...

I know I keep on saying it, but I really think I'm close to getting my
new main system (and hence my development system) sorted, and I think I
need to buy one of those cages too.

If you did get those two 8TB drives, you could still have your 8TB 3-way
mirror without needing any more bays/sata-ports.

My problem, of course, is if I'm playing with raid layouts I need as
many disks as I can cram in :-) I'm counting 6 tucked away in my drawer,
which means I'll almost certainly need to add an add-in 4-way sata card,
and as those drives are a mixture of 500GB and 1TB, I'll probably split
the 1TBs into 2x500GB and ignore md complaining that I have multiple
components on the same physical disk ...

Cheers,
Wol

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-13 16:01                       ` Wols Lists
@ 2020-09-13 23:49                         ` Brian Allen Vanderburg II
  2020-09-15  2:12                           ` John Stoffel
  2020-09-15  2:09                         ` John Stoffel
  1 sibling, 1 reply; 26+ messages in thread
From: Brian Allen Vanderburg II @ 2020-09-13 23:49 UTC (permalink / raw)
  To: Wols Lists, John Stoffel; +Cc: linux-raid

OT, but I've got one of those 3x5.25 to 5x3.5 hot swap bays in my main
system and I love it.  I'm using it with an LSI 9207-8i, as my
motherboard only supports a few SATA connectors with several already
used, so I needed something to provide more ports for future expansion
of my main system's storage.

For more drives, you can use one of those external drive shelf boxes.  I
currently have an HP M6710 I got off eBay with all caddies for about
$100, which can house 24 2.5" hard drives in a 2U chassis, and I've used
an LSI 9201-16e to access it (both HBAs flashed to 20.00.07 or something
like that).  I've already tested it and it works great, though a bit
loud on the fans when powering on.  My understanding is also that if you
have more than one of these shelves you can daisy chain them via their
ports (SAS card -> Shelf 1 -> Shelf 2, etc.), even cycling back to the
SAS card for multi-path support (which is at this time over my head).
My plan is to put it in my network closet once I get it cleaned out and
the cabling run better, to provide whole-house NAS storage.  I think
there is also an M6720 model for 24 3.5" drives in a 4U chassis.  There
is also a NetApp shelf I was looking at, but from reading it looks like
it uses a QSFP connector on its IOM, and the cables that convert from
SFF-8088 are quite expensive.


On 9/13/20 12:01 PM, Wols Lists wrote:
> On 13/09/20 13:50, John Stoffel wrote:
>> I know, I really need to buy another drive, but my main system is
>> full, so I *also* need to either get a new case, or one of those 5 x
>> 3.5" into 3 x 5.25" bay cages to make some room.  Decisions... decisions...
> I know I keep on saying it, but I really think I'm close to getting my
> new main system (and hence my development system) sorted, and I think I
> need to buy one of those cages too.
>
> If you did get those two 8TB drives, you could still have your 8TB 3-way
> mirror without needing any more bays/sata-ports.
>
> My problem, of course, is if I'm playing with raid layouts I need as
> many disks as I can cram in :-) I'm counting 6 tucked away in my drawer,
> which means I'll almost certainly need to add an add-in 4-way sata card,
> and as those drives are a mixture of 500GB and 1TB, I'll probably split
> the 1TBs into 2x500GB and ignore md complaining that I have multiple
> components on the same physical disk ...
>
> Cheers,
> Wol


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-12 16:19               ` antlists
  2020-09-12 17:28                 ` John Stoffel
@ 2020-09-14 17:19                 ` Phillip Susi
  2020-09-14 17:26                   ` Wols Lists
  1 sibling, 1 reply; 26+ messages in thread
From: Phillip Susi @ 2020-09-14 17:19 UTC (permalink / raw)
  To: antlists; +Cc: Brian Allen Vanderburg II, linux-raid


antlists writes:

> Yup. Raid 6 has two parity disks, and that's mirrored to give four 
> parity disks. So as an *absolute* *minimum*, raid-61 could lose four 
> disks with no data loss.

Don't you mean 5 disks?

At best 4 lost disks paired off in each raid1 means the raid6 sees two
failures.  One more disk failing isn't enough to take out another mirror
so the raid6 keeps ticking.
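
A quick brute-force check of that, assuming ten two-disk mirror pairs
acting as the 8+2 members of a raid6 (twenty disks in all; this is only
a sketch of the argument):

# Brute-force check: with ten mirror pairs feeding an 8+2 raid6, any 5
# disk failures are survivable; some patterns of 6 kill the array.
from itertools import combinations

PAIRS = 10                       # 8 data + 2 parity members, each a 2-disk mirror
DISKS = [(m, side) for m in range(PAIRS) for side in (0, 1)]

def survives(failed):
    dead_pairs = sum(1 for m in range(PAIRS)
                     if (m, 0) in failed and (m, 1) in failed)
    return dead_pairs <= 2       # the raid6 layer tolerates two failed members

def worst_case_ok(n_failures):
    return all(survives(set(combo))
               for combo in combinations(DISKS, n_failures))

print(worst_case_ok(5))   # True  -> any five failures leave the data intact
print(worst_case_ok(6))   # False -> some six-failure patterns lose the array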

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-14 17:19                 ` Phillip Susi
@ 2020-09-14 17:26                   ` Wols Lists
  0 siblings, 0 replies; 26+ messages in thread
From: Wols Lists @ 2020-09-14 17:26 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Brian Allen Vanderburg II, linux-raid

On 14/09/20 18:19, Phillip Susi wrote:
> 
> antlists writes:
> 
>> Yup. Raid 6 has two parity disks, and that's mirrored to give four 
>> parity disks. So as an *absolute* *minimum*, raid-61 could lose four 
>> disks with no data loss.
> 
> Don't you mean 5 disks?
> 
> At best 4 lost disks paired off in each raid1 means the raid6 sees two
> failures.  One more disk failing isn't enough to take out another mirror
> so the raid6 keeps ticking.
> 
Well caught !!!

Cheers,
Wol

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-13 16:01                       ` Wols Lists
  2020-09-13 23:49                         ` Brian Allen Vanderburg II
@ 2020-09-15  2:09                         ` John Stoffel
  2020-09-15 11:14                           ` Roger Heflin
  1 sibling, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-15  2:09 UTC (permalink / raw)
  To: Wols Lists; +Cc: John Stoffel, linux-raid

>>>>> "Wols" == Wols Lists <antlists@youngman.org.uk> writes:

Wols> On 13/09/20 13:50, John Stoffel wrote:
>> I know, I really need to buy another drive, but my main system is
>> full, so I *also* need to either get a new case, or one of those 5 x
>> 3.5" into 3 x 5.25" bay cages to make some room.  Decisions... decisions...

Wols> I know I keep on saying it, but I really think I'm close to getting my
Wols> new main system (and hence my development system) sorted, and I think I
Wols> need to buy one of those cages too.

I've been looking at them for a while now, but hesitating
because... not sure why.  I'm using a CoolerMaster case with five
5.25" bays, plus a 3.5" bay external, and another three or four
internal 3.5" bays.  Works great.  Nice and plain and not flashing
lights or other bling.  And not too loud either.  Which is good.

But I've used crappy drive cages before, crappy hot swap ones.  Not
good.  And I think it's time I just went with a 4U rack mount with a
bunch of hot swap bays, if I could only find one that didn't cost an
arm and a leg.

Wols> If you did get those two 8TB drives, you could still have your
Wols> 8TB 3-way mirror without needing any more bays/sata-ports.

Very true.  

Wols> My problem, of course, is if I'm playing with raid layouts I
Wols> need as many disks as I can cram in :-) I'm counting 6 tucked
Wols> away in my drawer, which means I'll almost certainly need to add
Wols> an add-in 4-way sata card, and as those drives are a mixture of
Wols> 500GB and 1TB, I'll probably split the 1TBs into 2x500GB and
Wols> ignore md complaining that I have multiple components on the
Wols> same physical disk ...

It's not a bad plan for testing, but using a setup like that isn't
good for actual performance numbers since you'll have too much
contention for IOPS.

Dammit, I just gotta pull the trigger.  :-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-13 23:49                         ` Brian Allen Vanderburg II
@ 2020-09-15  2:12                           ` John Stoffel
       [not found]                             ` <43ce60a7-64d1-51bc-f29c-7a6388ad91d5@grumpydevil.homelinux.org>
  0 siblings, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-15  2:12 UTC (permalink / raw)
  To: Brian Allen Vanderburg II; +Cc: Wols Lists, John Stoffel, linux-raid

>>>>> "Brian" == Brian Allen Vanderburg <brianvanderburg2@aim.com> writes:

Brian> OT, but I've got one of those 3x5.25 to 5x3.5 hot swap bays in
Brian> my main system and I love it.  I'm using it with an LSI 9207-8i
Brian> as my motherboard only supports a few SATA connectors with
Brian> several already used, so needed something to provide more ports
Brian> for future expansion for my main system's storage.

Very much like what I'm doing with my LSI board providing most of my
data storage, with boot disks (mirrored) on the MB SATA ports.  Makes
for a simpler setup.

Brian> For more drives, you can use one of those external drive shelf
Brian> boxes.  I currently have the HP M6710 I got off eBay with all
Brian> caddies for about $100, which can house 24 2.5 hard drives in a
Brian> 2U chassis and I've used an LSI 9201-16e to access it (both
Brian> HBAs flashed to 20.00.07 or something like that).  I've already
Brian> tested it and it works great, though a bit loud on the fans
Brian> when powering on.  My understanding is also if you have more
Brian> than one of these shelves you can daisy chain them via their
Brian> ports SAS card -> Shelf 1 -> Shelf 2, etc, even cycling back to
Brian> the SAS card for multi-path support (which is at the time over
Brian> my head).  My plan for it is to put in my network closet once I
Brian> get it cleaned out and cabling ran better to provide
Brian> whole-house NAS storage.  I think there is also an M6720 model
Brian> for 24 3.5 drives in a 4U chassis.  There is also NetApp shelf
Brian> I was looking at but from reading looks like it uses a QSFP
Brian> connector on it's IOM, and the cables that converted from
Brian> SFF-8088 were quite expensive.

This is a nice idea, just not sure I want to go with 2.5" drives since
they're expensive per TB of storage.  I just want one of those
old-style monster cases with 8 x 5.25" bays so I can fill it with 3.5"
bays.  Or there was a review on Phoronix.com about a 4U chassis that
looked pretty good, especially with USB3 front ports.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-15  2:09                         ` John Stoffel
@ 2020-09-15 11:14                           ` Roger Heflin
  2020-09-15 18:07                             ` John Stoffel
  0 siblings, 1 reply; 26+ messages in thread
From: Roger Heflin @ 2020-09-15 11:14 UTC (permalink / raw)
  To: John Stoffel; +Cc: Wols Lists, Linux RAID

> I've been looking at them for a while now, but hesitating
> because... not sure why.  I'm using a CoolerMaster case with five
> 5.25" bays, plus a 3.5" bay external, and another three or four
> internal 3.5" bays.  Works great.  Nice and plain and not flashing
> lights or other bling.  And not too loud either.  Which is good.
>
> But I've used crappy drive cages before, crappy hot swap ones.  Not
> good.  And I think it's time I just went with a 4U rack mount with a
> bunch of hot swap bays, if I could only find one that wasn't an arm
> and a leg.
>

I have had good luck with the ICY DOCK brand hot swap cages.  I have 4
different 4-bay/3-bay ones spanning 6+ years and they all seem to just
work.  And each newer version seemed to have an improved design over the
prior ones (plugs easier to get to, and such).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-05 21:47     ` Brian Allen Vanderburg II
  2020-09-05 22:42       ` Wols Lists
@ 2020-09-15 11:32       ` Nix
  2020-09-15 18:10         ` John Stoffel
  1 sibling, 1 reply; 26+ messages in thread
From: Nix @ 2020-09-15 11:32 UTC (permalink / raw)
  To: Brian Allen Vanderburg II; +Cc: antlists, linux-raid

On 5 Sep 2020, Brian Allen Vanderburg, II verbalised:

> The idea is actually to be able to use more than two disks, like raid 5
> or raid 6, except with parity on its own disks instead of distributed
> across disks, and data kept on their own disks as well.  I've used
> SnapRaid a bit and was just making some changes to my own setup when I
> got the idea as to why something similar can't be done at the block
> device level, while keeping one of the advantages of SnapRaid-like
> systems, which is that if any data disk is lost beyond recovery, then
> only the data on that data disk is lost, because the data on the other
> data disks is still its own complete filesystem, and providing
> real-time updates to the parity data.
>
>
> So for instance
>
> /dev/sda - may be data disk 1, say 1TB
>
> /dev/sdb - may be data disk 2, 2TB
>
> /dev/sdc - may be data disk 3, 2TB
>
> /dev/sdd - may be parity disk 1 (maybe a raid-5-like setup), 2TB
>
> /dev/sde - may be parity disk 2 (maybe a raid-6-like setup), 2TB

Why use something as crude as parity? There's *lots* of space there. You
could store full-blown Reed-Solomon stuff in there in much less space
than parity would require with far more likelihood of repairing even
very large errors. A separate device-mapper target would seem to be
perfect for this: like dm-integrity, only with a separate set of
"error-correcting disks" rather than expanding every sector like
dm-integrity does.
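
Something along these lines, perhaps (a sketch only, leaning on the
third-party reedsolo package rather than a real device-mapper target;
the exact decode() return value varies between reedsolo versions, and
the sizes here are just for illustration):

# Sketch only: Reed-Solomon coding of one stripe of data, with the
# parity bytes imagined as living on separate "error-correcting disks".
# Uses the third-party reedsolo package (pip install reedsolo).
from reedsolo import RSCodec

ECC_BYTES = 16                       # redundancy budget for the stripe
rsc = RSCodec(ECC_BYTES)

stripe = bytes(range(64))            # one stripe's worth of data
encoded = rsc.encode(stripe)         # data followed by ECC_BYTES of parity
data_part, ecc_part = encoded[:64], encoded[64:]

# Simulate corrupting a couple of bytes, then repairing from the parity.
damaged = bytearray(encoded)
damaged[3] ^= 0xFF
damaged[40] ^= 0xFF
result = rsc.decode(bytes(damaged))
# Newer reedsolo versions return a tuple; older ones return the bytearray.
repaired = result[0] if isinstance(result, tuple) else result
assert bytes(repaired) == stripe
# (The decoder can also be told which positions were erased, which is
# closer to the lost-disk case.)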

-- 
NULL && (void)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-15 11:14                           ` Roger Heflin
@ 2020-09-15 18:07                             ` John Stoffel
  2020-09-15 19:34                               ` Ram Ramesh
  0 siblings, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-15 18:07 UTC (permalink / raw)
  To: Roger Heflin; +Cc: John Stoffel, Wols Lists, Linux RAID

>>>>> "Roger" == Roger Heflin <rogerheflin@gmail.com> writes:

>> I've been looking at them for a while now, but hesitating
>> because... not sure why.  I'm using a CoolerMaster case with five
>> 5.25" bays, plus a 3.5" bay external, and another three or four
>> internal 3.5" bays.  Works great.  Nice and plain and not flashing
>> lights or other bling.  And not too loud either.  Which is good.
>> 
>> But I've used crappy drive cages before, crappy hot swap ones.  Not
>> good.  And I think it's time I just went with a 4U rack mount with a
>> bunch of hot swap bays, if I could only find one that wasn't an arm
>> and a leg.
>> 

Roger> I have had good luck with the ICY DOCK brand hot swap cages.  I
Roger> have 4 different 4-bay/3-bay ones spanning 6+ years and they all
Roger> seem to just work.  And each newer version seemed to have an
Roger> improved design over the prior ones (plugs easier to get to, and such).

Thanks for the recommendation!  I'll be looking at these for
sure. Just wish my case could hold two of them.  It would be nice if
they made a 2.5 x 5.25" to 4 x 3.5" disk carrier, so I could stuff two
of them into my 5 exposed 5.25" bays.  *grin*

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-15 11:32       ` Nix
@ 2020-09-15 18:10         ` John Stoffel
  0 siblings, 0 replies; 26+ messages in thread
From: John Stoffel @ 2020-09-15 18:10 UTC (permalink / raw)
  To: Nix; +Cc: Brian Allen Vanderburg II, antlists, linux-raid

>>>>> "Nix" == Nix  <nix@esperi.org.uk> writes:

Nix> On 5 Sep 2020, Brian Allen Vanderburg, II verbalised:
>> The idea is actually to be able to use more than two disks, like raid 5
>> or raid 6, except with parity on its own disks instead of distributed
>> across disks, and data kept on their own disks as well.  I've used
>> SnapRaid a bit and was just making some changes to my own setup when I
>> got the idea as to why something similar can't be done at the block
>> device level, while keeping one of the advantages of SnapRaid-like
>> systems, which is that if any data disk is lost beyond recovery, then
>> only the data on that data disk is lost, because the data on the other
>> data disks is still its own complete filesystem, and providing
>> real-time updates to the parity data.
>> 
>> 
>> So for instance
>> 
>> /dev/sda - may be data disk 1, say 1TB
>> 
>> /dev/sdb - may be data disk 2, 2TB
>> 
>> /dev/sdc - may be data disk 3, 2TB
>> 
>> /dev/sdd - may be parity disk 1 (maybe a raid-5-like setup), 2TB
>> 
>> /dev/sde - may be parity disk 2 (maybe a raid-6-like setup), 2TB

Nix> Why use something as crude as parity? There's *lots* of space
Nix> there. You could store full-blown Reed-Solomon stuff in there in
Nix> much less space than parity would require with far more
Nix> likelihood of repairing even very large errors. A separate
Nix> device-mapper target would seem to be perfect for this: like
Nix> dm-integrity, only with a separate set of "error-correcting
Nix> disks" rather than expanding every sector like dm-integrity does.

The problem with parity only disks is that they become hotspots and
drag down performance.  You need/want to stripe parity/checksums/error
correction data across all disks equally so as to get the best
performance.

There are papers on why no one uses RAID4 because of this.

The big trend now seems to be erasure coding, where the parity is
striped across the entire cluster, with data stored at varying levels
of protection, some mirrored, some striped.
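
The space argument is easy to see with the usual k-data-plus-m-parity
accounting (the figures below are illustrative):

# Raw bytes stored per usable byte, assuming k data chunks + m parity
# chunks per stripe.
def raw_per_usable(k, m):
    return (k + m) / k

print(raw_per_usable(1, 2))    # 3-way mirror: 3.0x
print(raw_per_usable(4, 2))    # 4+2 erasure code: 1.5x, survives any 2 losses
print(raw_per_usable(8, 3))    # 8+3 erasure code: ~1.38x, survives any 3 losses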

John

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
       [not found]                             ` <43ce60a7-64d1-51bc-f29c-7a6388ad91d5@grumpydevil.homelinux.org>
@ 2020-09-15 18:12                               ` John Stoffel
  2020-09-15 19:52                                 ` Rudy Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: John Stoffel @ 2020-09-15 18:12 UTC (permalink / raw)
  To: Rudy Zijlstra
  Cc: John Stoffel, Brian Allen Vanderburg II, Wols Lists, linux-raid

>>>>> "Rudy" == Rudy Zijlstra <rudy@grumpydevil.homelinux.org> writes:

Rudy> Op 15-09-2020 om 04:12 schreef John Stoffel:
Brian> For more drives, you can use one of those external drive shelf
Brian> boxes.  I currently have the HP M6710 I got off eBay with all
Brian> caddies for about $100, which can house 24 2.5 hard drives in a
Brian> 2U chassis and I've used an LSI 9201-16e to access it (both
Brian> HBAs flashed to 20.00.07 or something like that).  I've already
Brian> tested it and it works great, though a bit loud on the fans
Brian> when powering on.  My understanding is also if you have more
Brian> than one of these shelves you can daisy chain them via their
Brian> ports SAS card -> Shelf 1 -> Shelf 2, etc, even cycling back to
Brian> the SAS card for multi-path support (which is at the time over
Brian> my head).  My plan for it is to put in my network closet once I
Brian> get it cleaned out and cabling ran better to provide
Brian> whole-house NAS storage.  I think there is also an M6720 model
Brian> for 24 3.5 drives in a 4U chassis.  There is also NetApp shelf
Brian> I was looking at but from reading looks like it uses a QSFP
Brian> connector on it's IOM, and the cables that converted from
Brian> SFF-8088 were quite expensive.
    
Rudy> I'd take a look at HP D2600

Looks like it would be too loud for a home office, with those small
fans.  And probably overkill for my needs.  But thank you for pointing
this out!  

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-15 18:07                             ` John Stoffel
@ 2020-09-15 19:34                               ` Ram Ramesh
  0 siblings, 0 replies; 26+ messages in thread
From: Ram Ramesh @ 2020-09-15 19:34 UTC (permalink / raw)
  To: John Stoffel, Roger Heflin; +Cc: Wols Lists, Linux RAID

On 9/15/20 1:07 PM, John Stoffel wrote:
>>>>>> "Roger" == Roger Heflin <rogerheflin@gmail.com> writes:
>>> I've been looking at them for a while now, but hesitating
>>> because... not sure why.  I'm using a CoolerMaster case with five
>>> 5.25" bays, plus a 3.5" bay external, and another three or four
>>> internal 3.5" bays.  Works great.  Nice and plain and not flashing
>>> lights or other bling.  And not too loud either.  Which is good.
>>>
>>> But I've used crappy drive cages before, crappy hot swap ones.  Not
>>> good.  And I think it's time I just went with a 4U rack mount with a
>>> bunch of hot swap bays, if I could only find one that wasn't an arm
>>> and a leg.
>>>
> Roger> I have had good luck with the ICY DOCK brand hot swap cages.  I
> Roger> have 4 different 4-bay/3-bay ones spanning 6+ years and they all
> Roger> seem to just work.  And each newer version seemed to have an
> Roger> improved design over the prior ones (plugs easier to get to, and such).
>
> Thanks for the recommendation!  I'll be looking at these for
> sure. Just wish my case could hold two of them.  It would be nice if
> they made a 2.5 x 5.25" to 4 x 3.5" disk carrier, so I could stuff two
> of them into my 5 exposed 5.25" bays.  *grin*
John,

   Drive cages come in a variety of sizes. You have 1 to 1, 2 to 3, 3 to
4 and 4 to 5. Mix and match to fill all 5 bays with the best density of
3.5 inch bays.  Here is one example and I am sure you can find many.

https://www.newegg.com/p/pl?d=hot+swap+bay&N=100007599%20600551589&name=SSD+%2F+HDD+Accessories&Order=4

I have three cages, two iStar and one Icy Dock. My Icy Dock lost one
bay.  The others are holding up a bit better. So, YMMV. However, expect
them to have noisy fans. You may want to change to quieter/more reliable
ones.


Regards
Ramesh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Linux raid-like idea
  2020-09-15 18:12                               ` John Stoffel
@ 2020-09-15 19:52                                 ` Rudy Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Rudy Zijlstra @ 2020-09-15 19:52 UTC (permalink / raw)
  To: John Stoffel; +Cc: Brian Allen Vanderburg II, Wols Lists, linux-raid


> Brian> for 24 3.5 drives in a 4U chassis.  There is also NetApp shelf
> Brian> I was looking at but from reading looks like it uses a QSFP
> Brian> connector on it's IOM, and the cables that converted from
> Brian> SFF-8088 were quite expensive.
>      
> Rudy> I'd take a look at HP D2600
>
> Looks like it would be too loud for a home office, with those small
> fans.  And probably overkill for my needs.  But thank you for pointing
> this out!
I've got mine in the cellar, and it's quieter than the one it replaces.
I do not hear it... but then, I have a noisy server running there :)

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-09-15 19:53 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1cf0d18c-2f63-6bca-9884-9544b0e7c54e.ref@aim.com>
2020-08-24 17:23 ` Linux raid-like idea Brian Allen Vanderburg II
2020-08-28 15:31   ` antlists
2020-09-05 21:47     ` Brian Allen Vanderburg II
2020-09-05 22:42       ` Wols Lists
2020-09-11 15:14         ` Brian Allen Vanderburg II
2020-09-11 19:16           ` antlists
2020-09-11 20:14             ` Brian Allen Vanderburg II
2020-09-12  6:09               ` Song Liu
2020-09-12 14:40               ` Adam Goryachev
2020-09-12 16:19               ` antlists
2020-09-12 17:28                 ` John Stoffel
2020-09-12 18:41                   ` antlists
2020-09-13 12:50                     ` John Stoffel
2020-09-13 16:01                       ` Wols Lists
2020-09-13 23:49                         ` Brian Allen Vanderburg II
2020-09-15  2:12                           ` John Stoffel
     [not found]                             ` <43ce60a7-64d1-51bc-f29c-7a6388ad91d5@grumpydevil.homelinux.org>
2020-09-15 18:12                               ` John Stoffel
2020-09-15 19:52                                 ` Rudy Zijlstra
2020-09-15  2:09                         ` John Stoffel
2020-09-15 11:14                           ` Roger Heflin
2020-09-15 18:07                             ` John Stoffel
2020-09-15 19:34                               ` Ram Ramesh
2020-09-14 17:19                 ` Phillip Susi
2020-09-14 17:26                   ` Wols Lists
2020-09-15 11:32       ` Nix
2020-09-15 18:10         ` John Stoffel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).