linux-raid.vger.kernel.org archive mirror
From: Brian Allen Vanderburg II <brianvanderburg2@aim.com>
To: linux-raid@vger.kernel.org
Subject: Linux raid-like idea
Date: Mon, 24 Aug 2020 13:23:11 -0400	[thread overview]
Message-ID: <1cf0d18c-2f63-6bca-9884-9544b0e7c54e@aim.com> (raw)
In-Reply-To: <1cf0d18c-2f63-6bca-9884-9544b0e7c54e.ref@aim.com>

I'm not a systems developer, so I couldn't begin to implement such an
idea myself, but I have a small idea for a RAID-like solution that may
be beneficial, and then again, maybe not.

It seems that RAID is sometimes advised against, especially for larger
disks, due to long rebuild times.  If another disk fails during a
rebuild, it could mean the loss of the entire array, depending on how
many parity disks exist.  Of course RAID is not in itself an
alternative to a backup of critical data, but other solutions
(UnRAID, etc.) exist to minimize the chance of total data loss of an
array.  One I've used a little bit is mergerfs/SnapRAID.  Mergerfs
takes two or more complete file systems and presents them as a single
file system, distributing the files across them, with the advantage
that a lost data drive does not lose the entire array, since each disk
is its own complete filesystem.  Only the files on the lost disk would
be missing.  SnapRAID can then be run periodically to create parity
data to restore from if a data disk is lost.

This got me to thinking: why can't we do something like this at the
driver level, with real-time parity protection?  In SnapRAID, the
parity must be built manually via the command, and a lost disk stays
down until a restore command is manually run.  In a real RAID array,
the parity is calculated in real time, and a block from a missing disk
can still be read from the parity information and the other disks.
It's just that, since the disks are combined into one logical disk, a
completely lost data disk with no available parity essentially loses
all data in the array.
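The reconstruction described above is ordinary XOR parity.  A minimal
sketch (plain Python, a single parity disk, one block per disk; the
four-byte blocks are just illustrative) of how a missing disk's block
falls out of the parity and the surviving disks:

```python
def xor_blocks(*blocks):
    # XOR equal-length blocks together, byte by byte.
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

# Three data disks and one parity disk, one block each.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# If d1's disk dies, its block is the parity XORed with the survivors.
recovered = xor_blocks(parity, d0, d2)
assert recovered == d1
```

With more than one parity disk a real implementation would use
Reed-Solomon-style codes (as md's RAID 6 does) rather than plain XOR,
but the read-through-reconstruction idea is the same.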

So the idea is a RAID system, maybe something like mdadm, but one that
presents each data disk as its own block device.  /dev/sda1 may be
presented as /dev/fr0 (fr = fakeRAID), /dev/sdb1 as /dev/fr1, and so
on, with /dev/sdd1 and /dev/sde1 as parity disks.  A read/write from
/dev/fr0 would always map to /dev/sda1, offset past a small fixed-size
header recording the array associations.  This fixed-size header would
also allow, if the drive were removed and inserted into a different
system, a loopback mount with an offset to access the contents.
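A minimal sketch of the offset translation, assuming a hypothetical
fixed 4 KiB header (the actual size and layout would be up to the
implementation):

```python
# Hypothetical layout: a fixed 4 KiB metadata header at the start of
# each member partition, with the exposed device mapped past it.
HEADER_SIZE = 4096

def member_offset(logical_offset):
    """Translate an offset on /dev/fr0 to an offset on /dev/sda1."""
    return HEADER_SIZE + logical_offset

def exposed_size(partition_size):
    """Size of the exposed device: the member partition minus the header."""
    return partition_size - HEADER_SIZE
```

On a foreign system, the filesystem inside the member could then be
reached with a loop mount at the matching offset, e.g.
`mount -o loop,offset=4096 /dev/sda1 /mnt`.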

The scope of the idea stops there: just providing parity protection to
the data disks while presenting each data disk as its own block
device.  If desired, multiple sets could be created, each with its own
data and parity disks.  It should also support adding and removing
data and parity disks.  Ideally, the data disks could be of different
sizes, as long as the parity disks were the largest.
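A sketch of why that size constraint would hold: if reads past a
smaller disk's end are treated as zeros when computing parity, disks
of mixed sizes work, but the parity area must cover the largest one
(this is also how SnapRAID handles unequal disks):

```python
def parity_of(disks):
    # XOR parity over data disks of different sizes, treating bytes
    # past a smaller disk's end as zero.  The parity must therefore
    # be at least as large as the largest data disk.
    out = bytearray(max(len(d) for d in disks))
    for d in disks:
        for i, b in enumerate(d):
            out[i] ^= b
    return bytes(out)

def rebuild(parity, survivors, lost_size):
    # Recover a lost disk's contents from the parity and the
    # surviving disks, truncated back to the lost disk's size.
    out = bytearray(parity)
    for d in survivors:
        for i, b in enumerate(d):
            out[i] ^= b
    return bytes(out[:lost_size])
```

For example, with a 2-byte and a 4-byte disk, either one can be
rebuilt from the parity plus the other.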

On top of this, the user then uses the exposed block devices as they
see fit.  If the data disks are related, they could use something like
mergerfs on top of the mount points of the file systems in
/dev/fr0,1,2, etc.  If the disks are not related, then /dev/fr0,1,2,
etc. could be used independently.  They could be partitioned and hold
more than one file system.  In theory a RAID array could even be built
on top of them, but this defeats the purpose of each data disk
containing its own complete file system, and would reintroduce the
issue that a lost data disk loses the entire array.


Just an idea I wanted to put out there to see if there is any merit or
interest in it.


Thanks,


Brian Allen Vanderburg II




Thread overview: 26+ messages
     [not found] <1cf0d18c-2f63-6bca-9884-9544b0e7c54e.ref@aim.com>
2020-08-24 17:23 ` Brian Allen Vanderburg II [this message]
2020-08-28 15:31   ` Linux raid-like idea antlists
2020-09-05 21:47     ` Brian Allen Vanderburg II
2020-09-05 22:42       ` Wols Lists
2020-09-11 15:14         ` Brian Allen Vanderburg II
2020-09-11 19:16           ` antlists
2020-09-11 20:14             ` Brian Allen Vanderburg II
2020-09-12  6:09               ` Song Liu
2020-09-12 14:40               ` Adam Goryachev
2020-09-12 16:19               ` antlists
2020-09-12 17:28                 ` John Stoffel
2020-09-12 18:41                   ` antlists
2020-09-13 12:50                     ` John Stoffel
2020-09-13 16:01                       ` Wols Lists
2020-09-13 23:49                         ` Brian Allen Vanderburg II
2020-09-15  2:12                           ` John Stoffel
     [not found]                             ` <43ce60a7-64d1-51bc-f29c-7a6388ad91d5@grumpydevil.homelinux.org>
2020-09-15 18:12                               ` John Stoffel
2020-09-15 19:52                                 ` Rudy Zijlstra
2020-09-15  2:09                         ` John Stoffel
2020-09-15 11:14                           ` Roger Heflin
2020-09-15 18:07                             ` John Stoffel
2020-09-15 19:34                               ` Ram Ramesh
2020-09-14 17:19                 ` Phillip Susi
2020-09-14 17:26                   ` Wols Lists
2020-09-15 11:32       ` Nix
2020-09-15 18:10         ` John Stoffel
