linux-raid.vger.kernel.org archive mirror
From: Brian Allen Vanderburg II <brianvanderburg2@aim.com>
To: antlists <antlists@youngman.org.uk>, linux-raid@vger.kernel.org
Subject: Re: Linux raid-like idea
Date: Sat, 5 Sep 2020 17:47:50 -0400	[thread overview]
Message-ID: <ef3719a9-ae53-516e-29ee-36d1cdf91ef1@aim.com> (raw)
In-Reply-To: <e3cb1bbe-65eb-5b75-8e99-afba72156b6e@youngman.org.uk>

The idea is actually to be able to use more than two disks, as in raid 5
or raid 6, except with parity kept on its own dedicated disks instead of
distributed across all disks, and with each data disk holding its own
data as well.  I've used SnapRaid a bit, and while making some changes
to my own setup I started wondering why something similar couldn't be
done at the block device level, while keeping one of the advantages of
SnapRaid-like systems: if any data disk is lost beyond recovery, only
the data on that disk is lost, because each of the other data disks
still holds its own complete filesystem.  Unlike SnapRaid, the parity
data would be updated in real time.


So for instance

/dev/sda - may be data disk 1, say 1TB

/dev/sdb - may be data disk 2, 2TB

/dev/sdc - may be data disk 3, 2TB

/dev/sdd - may be parity disk 1 (maybe a raid-5-like setup), 2TB

/dev/sde - may be parity disk 2 (maybe a raid-6-like setup), 2TB


The parity disks must be at least as large as the largest data disk. If
a given block is not present on a data disk (because that disk is
smaller than the other data disks) it is treated as all zeroes. So the
parity for position 1.5TB would use zeroes for /dev/sda and whatever the
blocks are on /dev/sdb and /dev/sdc.
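
To illustrate the zero-padding, here's a minimal sketch of the
raid-5-like case (XOR-only parity and the 4KiB block size are my
assumptions, not part of the proposal):

```python
# Sketch: raid-5-style XOR parity across data disks of unequal size.
# A block past the end of a smaller disk contributes all zeroes.
BLOCK_SIZE = 4096  # assumed block size

def parity_block(data_disks, block_no):
    """XOR the block at block_no from each data disk; missing blocks count as zeroes."""
    parity = bytearray(BLOCK_SIZE)
    for disk in data_disks:          # each disk modeled as a list of blocks
        if block_no < len(disk):     # smaller disks simply run out of blocks
            block = disk[block_no]
            for i in range(BLOCK_SIZE):
                parity[i] ^= block[i]
    return bytes(parity)

# sda is the "smaller" disk: it has no block 1, so only sdb and sdc
# contribute to the parity at that position.
sda = [bytes([1]) * BLOCK_SIZE]
sdb = [bytes([2]) * BLOCK_SIZE, bytes([4]) * BLOCK_SIZE]
sdc = [bytes([3]) * BLOCK_SIZE, bytes([5]) * BLOCK_SIZE]
print(parity_block([sda, sdb, sdc], 1)[0])  # 4 ^ 5 = 1
```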

In normal raid 5/6, this would expose only a single logical block device
/dev/md0, and the data and parity would be distributed across the disks.
If a data disk is lost and no parity remains to rebuild it, it's not
possible to recover any data, since the blocks are scattered across all
disks.  What good is any file if it is missing every third block?  Even
identifying the files would be virtually impossible, since the
filesystem structures and everything else on /dev/md0 are also
distributed.


My idea is basically that, instead of exposing a single logical block
device for the 'joined' array, each data disk would be exposed as its
own logical block device: /dev/sda1 might be exposed as /dev/fr1 (well,
some better name), /dev/sdb1 as /dev/fr2, and /dev/sdc1 as /dev/fr3;
the parity disks would not be exposed as logical block devices at all.
The blocks would essentially map 1-1 between /dev/fr1 and /dev/sda1 and
so on, except for a small header on /dev/sda, so block 0 on /dev/fr1
might actually be block 8 on /dev/sda.  If any single disk were ever
removed from the array, the full data on it could still be accessed via
losetup with an offset, and any filesystem that was built on it could
be read independently of any of the other data disks.
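
A sketch of that 1-1 mapping with a header offset (the 8-block header
is just the example figure above; a real implementation would read the
header size from on-disk metadata, and the block size here is assumed):

```python
BLOCK_SIZE = 4096   # assumed block size
HEADER_BLOCKS = 8   # per the example: block 0 on /dev/fr1 is block 8 on /dev/sda

def physical_block(logical_block):
    """Map a /dev/fr1 block number to the underlying /dev/sda block number."""
    return logical_block + HEADER_BLOCKS

# The same arithmetic gives the byte offset needed to read a removed disk
# directly with losetup, e.g.: losetup -o OFFSET /dev/loopN /dev/sda1
losetup_offset = HEADER_BLOCKS * BLOCK_SIZE
print(physical_block(0), losetup_offset)
```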

The difference from traditional raid is that if every disk somehow got
damaged beyond recovery except for /dev/sda, it would still be possible
to recover whatever data was on that disk, since it was exposed to the
system as its own block device with an entire filesystem on it.  The
same goes for /dev/sdb and /dev/sdc.  Any write to one of the data
block devices would automatically update the parity as well.  Any read
from a failed data block device would be recomputed from the available
parity in real time, with degraded performance.  The filesystems
created on the exposed logical block devices could be used however the
user sees fit: they could be related, such as a union/merge pool
filesystem, or unrelated, such as /home on the /dev/fr1 filesystem and
/usr/local on /dev/fr2.  There would be no read/write performance
increase, since reads from a single logical block device map to the
same physical device.  But there would be the typical redundancy of
raid, and if another disk were to fail during a recovery/rebuild,
preventing recovery of the data, only the data on the lost data disks
would be gone.
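
The degraded read path for the raid-5-like single-parity case could be
sketched like this (XOR-only parity is my assumption; a raid-6-like
second parity would need Reed-Solomon syndromes instead):

```python
BLOCK_SIZE = 4096  # assumed block size

def xor_blocks(blocks):
    """XOR a list of equal-sized blocks together."""
    out = bytearray(BLOCK_SIZE)
    for b in blocks:
        for i in range(BLOCK_SIZE):
            out[i] ^= b[i]
    return bytes(out)

def degraded_read(surviving_blocks, parity):
    """Reconstruct a failed disk's block: XOR of the parity and all
    surviving data blocks at the same position (blocks past the end of a
    smaller disk are all zeroes, so they can simply be omitted)."""
    return xor_blocks(surviving_blocks + [parity])

# Example: parity built from three data blocks; lose one, recover it.
d1 = bytes([1]) * BLOCK_SIZE
d2 = bytes([2]) * BLOCK_SIZE
d3 = bytes([3]) * BLOCK_SIZE
p = xor_blocks([d1, d2, d3])
assert degraded_read([d1, d3], p) == d2  # d2's disk failed; read recomputed
```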


Thanks,


Brian Vanderburg II


On 8/28/20 11:31 AM, antlists wrote:
> On 24/08/2020 18:23, Brian Allen Vanderburg II wrote:
>> Just an idea I wanted to put out there to see if there were any
>> merit/interest in it.
>
> I hate to say it, but your data/parity pair sounds exactly like a
> two-disk raid-1 mirror. Yes, parity != mirror, but in practice I think
> it's a distinction without a difference.
>
> Cheers,
> Wol

