linux-kernel.vger.kernel.org archive mirror
* RAID4 with no striping mode request
@ 2023-02-13  3:07 Kyle Sanderson
  2023-02-13 19:40 ` John Stoffel
  0 siblings, 1 reply; 11+ messages in thread
From: Kyle Sanderson @ 2023-02-13  3:07 UTC (permalink / raw)
  To: device-mapper development, linux-raid; +Cc: Song Liu, Linux-Kernel

hi DM and Linux-RAID,

There have been multiple proprietary solutions (some nearly 20 years
old now) with a number of (userspace) bugs that are becoming untenable
for me as an end user. Basically how they work is a closed MD module
(typically administered through DM) that uses RAID4 for a dedicated
parity disk across multiple other disks.

As there is no striping, the maximum size of any protected data disk
is the size of the parity disk (so a set of 4 TB + 8 TB + 12 TB + 16 TB
disks can be protected by a single dedicated 16 TB parity disk). When
a block is written on any data disk, the corresponding parity block is
read from the parity disk and updated based on the old and new data
(so only the written disk and the parity disk need to be spun up).
Additionally, if enough disks are already spun up, the parity can be
recalculated from all of the spinning disks, resulting in a single
write to the parity disk (without a read of the old parity, doubling
throughput). Finally, any of the data disks can be moved around within
the array without impacting parity, as the layout has not changed. I
don't necessarily need all of these features; what is important is the
ability to remove a disk and still access the data that was on it by
spinning up every other disk until the rebuild is complete.
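
For illustration, a minimal sketch of the two parity-update paths
described above (Python, a toy byte-level model used purely for
explanation -- not MD's actual code path or API):

def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def rmw_parity(old_data, new_data, old_parity):
    # Read-modify-write path: only the written disk and the parity disk
    # need to spin up.  new_parity = old_parity XOR old_data XOR new_data.
    return xor_blocks(old_parity, xor_blocks(old_data, new_data))

def full_parity(data_blocks):
    # If every data disk is already spinning, parity is just the XOR of
    # all data blocks: one write to the parity disk, no parity read.
    parity = bytes(len(data_blocks[0]))
    for blk in data_blocks:
        parity = xor_blocks(parity, blk)
    return parity

# Both paths agree:
disks = [bytes([1, 2]), bytes([4, 8]), bytes([16, 32])]
p = full_parity(disks)
assert rmw_parity(disks[0], bytes([3, 3]), p) == full_parity([bytes([3, 3])] + disks[1:])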

The benefit of this is that the data disks can all be zoned, and you
can have a fast parity disk and still maintain excellent performance
in the array (limited only by the speed of the disk in question plus
the parity disk). Additionally, should 2 disks fail, you have either
lost the parity disk and one data disk, or 2 data disks, with the
parity disk and the remaining data disks untouched.

I was reading through the DM and MD code and it looks like everything
may already be there to do this, just needs (significant) stubs to be
added to support this mode (or new code). Snapraid is a friendly (and
respectable) implementation of this. Unraid and Synology SHR compete
in this space, as well as other NAS and enterprise SAN providers.

Kyle.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RAID4 with no striping mode request
  2023-02-13  3:07 RAID4 with no striping mode request Kyle Sanderson
@ 2023-02-13 19:40 ` John Stoffel
  2023-02-13 21:11   ` Kyle Sanderson
  0 siblings, 1 reply; 11+ messages in thread
From: John Stoffel @ 2023-02-13 19:40 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: device-mapper development, linux-raid, Song Liu, Linux-Kernel

>>>>> "Kyle" == Kyle Sanderson <kyle.leet@gmail.com> writes:

> hi DM and Linux-RAID,
> There have been multiple proprietary solutions (some nearly 20 years
> old now) with a number of (userspace) bugs that are becoming untenable
> for me as an end user. Basically how they work is a closed MD module
> (typically administered through DM) that uses RAID4 for a dedicated
> parity disk across multiple other disks.

You need to explain what you want in *much* better detail.  Give simple
concrete examples.  From the sound of it, you want RAID6 but with
RAID4 dedicated Parity so you can spin down some of the data disks in
the array?  But if need be, spin up idle disks to recover data if you
lose an active disk?  

Really hard to understand what exactly you're looking for here.


> As there is no striping, the maximum size of any protected data disk
> is the size of the parity disk (so a set of 4 TB + 8 TB + 12 TB + 16 TB
> disks can be protected by a single dedicated 16 TB parity disk). When
> a block is written on any data disk, the corresponding parity block is
> read from the parity disk and updated based on the old and new data
> (so only the written disk and the parity disk need to be spun up).
> Additionally, if enough disks are already spun up, the parity can be
> recalculated from all of the spinning disks, resulting in a single
> write to the parity disk (without a read of the old parity, doubling
> throughput). Finally, any of the data disks can be moved around within
> the array without impacting parity, as the layout has not changed. I
> don't necessarily need all of these features; what is important is the
> ability to remove a disk and still access the data that was on it by
> spinning up every other disk until the rebuild is complete.

> The benefit of this is that the data disks can all be zoned, and you
> can have a fast parity disk and still maintain excellent performance
> in the array (limited only by the speed of the disk in question plus
> the parity disk). Additionally, should 2 disks fail, you have either
> lost the parity disk and one data disk, or 2 data disks, with the
> parity disk and the remaining data disks untouched.

> I was reading through the DM and MD code and it looks like everything
> may already be there to do this, just needs (significant) stubs to be
> added to support this mode (or new code). Snapraid is a friendly (and
> respectable) implementation of this. Unraid and Synology SHR compete
> in this space, as well as other NAS and enterprise SAN providers.

> Kyle.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RAID4 with no striping mode request
  2023-02-13 19:40 ` John Stoffel
@ 2023-02-13 21:11   ` Kyle Sanderson
  2023-02-14 18:37     ` Song Liu
       [not found]     ` <CAM23Vxr8LkkcVDFfW1=qEYGgo7JG1qx62eWSV4WOw4_MnD+TZA@mail.gmail.com>
  0 siblings, 2 replies; 11+ messages in thread
From: Kyle Sanderson @ 2023-02-13 21:11 UTC (permalink / raw)
  To: John Stoffel
  Cc: device-mapper development, linux-raid, Song Liu, Linux-Kernel

> On Mon, Feb 13, 2023 at 11:40 AM John Stoffel <john@stoffel.org> wrote:
>
> >>>>> "Kyle" == Kyle Sanderson <kyle.leet@gmail.com> writes:
>
> > hi DM and Linux-RAID,
> > There have been multiple proprietary solutions (some nearly 20 years
> > old now) with a number of (userspace) bugs that are becoming untenable
> > for me as an end user. Basically how they work is a closed MD module
> > (typically administered through DM) that uses RAID4 for a dedicated
> > parity disk across multiple other disks.
>
> You need to explain what you want in *much* better detail.  Give simple
> concrete examples.  From the sound of it, you want RAID6 but with
> RAID4 dedicated Parity so you can spin down some of the data disks in
> the array?  But if need be, spin up idle disks to recover data if you
> lose an active disk?

No, just a single dedicated parity disk - there's no striping on any
of the data disks. The result is that you can lose 8 data disks and
the parity disk from an array of 10 and still access the last
remaining disk, because each disk holds a complete copy of its own
data. The implementations do this by still exposing each individual
disk (/dev/md*), each formatted (+ encrypted) independently, and
updating the parity information on the dedicated disk whenever one is
written to. That way, when you add a fully zeroed new disk to the
array (parity disk is 16T, new disk is 4T), parity is preserved. Bytes
beyond the 4T boundary simply don't include that disk in the parity
calculation.
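
To make the mixed-size behaviour concrete, a toy sketch (Python again;
a byte-scale stand-in for whole disks, purely illustrative):

def parity_of(data_disks, parity_size):
    # A disk shorter than the parity disk is treated as all-zero past its
    # end, so a freshly zeroed smaller disk joins without touching parity.
    parity = bytearray(parity_size)
    for disk in data_disks:
        for i, b in enumerate(disk):
            parity[i] ^= b
    return bytes(parity)

# "16-unit" parity protecting 4-, 8- and 12-unit data disks:
disks = [bytes(range(4)), bytes(range(8)), bytes(range(12))]
assert parity_of(disks, 16) == parity_of(disks + [bytes(4)], 16)  # zeroed disk added, parity unchanged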

> Really hard to understand what exactly you're looking for here.

This might help https://www.snapraid.it/compare . There's at least
hundreds of thousands of these systems out there (based on public
sales from a single vendor), if not well into the millions.

Kyle.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RAID4 with no striping mode request
  2023-02-13 21:11   ` Kyle Sanderson
@ 2023-02-14 18:37     ` Song Liu
       [not found]     ` <CAM23Vxr8LkkcVDFfW1=qEYGgo7JG1qx62eWSV4WOw4_MnD+TZA@mail.gmail.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Song Liu @ 2023-02-14 18:37 UTC (permalink / raw)
  To: Kyle Sanderson
  Cc: John Stoffel, device-mapper development, linux-raid, Linux-Kernel

Hi Kyle,

On Mon, Feb 13, 2023 at 1:12 PM Kyle Sanderson <kyle.leet@gmail.com> wrote:
>
[...]

> >
> > > The benefit of this can be the data disks are all zoned, and you can
> > > have a fast parity disk and still maintain excellent performance in
> > > the array (limited only by the speed of the disk in question +
> > > parity). Additionally, should 2 disks fail, you've either lost the
> > > parity and data disk, or 2 data disks with the parity and other disks
> > > not lost.

I think I understand the high level idea here. But I think we need a lot more
details on how to implement this, and what the system would look like.
Also, I don't quite follow why the data disks can be zoned devices and
still maintain excellent performance.

> > > I was reading through the DM and MD code and it looks like everything
> > > may already be there to do this, just needs (significant) stubs to be
> > > added to support this mode (or new code). Snapraid is a friendly (and
> > > respectable) implementation of this. Unraid and Synology SHR compete
> > > in this space, as well as other NAS and enterprise SAN providers.

Assuming we figure out all the details, I will be happy to review
patches to the MD code. But I won't be able to develop this feature myself.

Thanks,
Song

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dm-devel] RAID4 with no striping mode request
       [not found]     ` <CAM23Vxr8LkkcVDFfW1=qEYGgo7JG1qx62eWSV4WOw4_MnD+TZA@mail.gmail.com>
@ 2023-02-14 22:28       ` Roger Heflin
  2023-02-15  7:23         ` Kyle Sanderson
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Roger Heflin @ 2023-02-14 22:28 UTC (permalink / raw)
  To: Heinz Mauelshagen
  Cc: Kyle Sanderson, linux-raid, Song Liu, device-mapper development,
	John Stoffel, Linux-Kernel

On Tue, Feb 14, 2023 at 3:27 PM Heinz Mauelshagen <heinzm@redhat.com> wrote:
>

>
>
> ...which is RAID1 plus a parity disk, which seems superfluous as you achieve (N-1)
> resilience against single device failures already without the latter.
>
> What would you need such parity disk for?
>
> Heinz
>

I thought that at first too, but threw that idea out as it did not
make much sense.

What he appears to want is 8 linear non-striped data disks + a parity disk.

Such that you can lose any one data disk and parity can rebuild that
disk.  And if you lose several data disks, you still have intact
non-striped data on the remaining disks.
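
(For what it's worth, a minimal sketch of that rebuild path -- Python,
toy model, not a real MD interface: the lost block is parity XOR all
surviving data blocks, which is why every other disk plus parity has
to spin up during a rebuild.)

def rebuild_block(parity_block, surviving_blocks):
    out = bytearray(parity_block)
    for blk in surviving_blocks:        # blocks may be shorter than parity
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)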

It would almost seem that you would need to put a separate filesystem
on each data disk/section (or have a filesystem that is redundant
enough to survive), otherwise losing an entire data disk would leave
the filesystem in a mess.

So N filesystems + a parity disk for the data on the N separate
filesystems.   And each write needs to read the old data from the disk
being written to and the old parity, recalculate the new parity, and
write out both the data and the new parity.

If the parity disk were an SSD it would be fast enough, but I would
expect it to get used up/burned out from the parity being re-written
for every write on every disk, unless you bought an expensive
high-write SSD.

The only advantage of the setup is that if you lose too many disks you
still have some data.

It is not clear to me that it would be any cheaper, if parity needs to
be a normal SSD (since SSDs are about 4x the price/GB and high-write
ones are even more), than a classic bunch of mirrors, or even, say, a
4-disk raid6 where you can lose any 2 and still have data.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dm-devel] RAID4 with no striping mode request
  2023-02-14 22:28       ` [dm-devel] " Roger Heflin
@ 2023-02-15  7:23         ` Kyle Sanderson
  2023-02-15  9:40         ` Wols Lists
       [not found]         ` <CAM23VxpzY6qYsdTYxe01FT7AJvEbODf8X_vq8ALL35TfyrB8xQ@mail.gmail.com>
  2 siblings, 0 replies; 11+ messages in thread
From: Kyle Sanderson @ 2023-02-15  7:23 UTC (permalink / raw)
  To: Roger Heflin
  Cc: Heinz Mauelshagen, linux-raid, Song Liu,
	device-mapper development, John Stoffel, Linux-Kernel

> On Tue, Feb 14, 2023 at 2:28 PM Roger Heflin <rogerheflin@gmail.com> wrote:
>
> Such that you can lose any one data disk and parity can rebuild that
> disk.  And if you lose several data disks, you still have intact
> non-striped data on the remaining disks.
>
> It would almost seem that you would need to put a separate filesystem
> on each data disk/section (or have a filesystem that is redundant
> enough to survive), otherwise losing an entire data disk would leave
> the filesystem in a mess.

Exactly, each disk operates completely independently (so an XFS
partition per disk on each md device). So I have 4 disks presently, 3
are data, and one is dedicated parity. I can scale these disks up or
down freely, changing the physical data disk sizes, and still have
them all protected by the single parity disk by removing and adding
them to the array.

> On Tue, Feb 14, 2023 at 6:23 PM Heinz Mauelshagen <heinzm@redhat.com> wrote:
>
> as any of the currently implemented 'parity' algorithms (block xor/P-/Q-Syndrome) provided by DM/MD RAID
> have to have at least two data blocks to calculate:  are you, apart from the filesystem thoughts you bring up, thinking
> about running those on e.g. pairs of disks of mentioned even numbered set of 8?

Users of these appliances today gain "parity" by adding the second
disk (note it must be equal to or larger than the largest data disk in
the array), and can then scale by adding disks individually (so 3, 4, 5, 6...).

Hopefully it's starting to make more sense now.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dm-devel] RAID4 with no striping mode request
  2023-02-14 22:28       ` [dm-devel] " Roger Heflin
  2023-02-15  7:23         ` Kyle Sanderson
@ 2023-02-15  9:40         ` Wols Lists
       [not found]         ` <CAM23VxpzY6qYsdTYxe01FT7AJvEbODf8X_vq8ALL35TfyrB8xQ@mail.gmail.com>
  2 siblings, 0 replies; 11+ messages in thread
From: Wols Lists @ 2023-02-15  9:40 UTC (permalink / raw)
  To: Roger Heflin, Heinz Mauelshagen
  Cc: Kyle Sanderson, linux-raid, device-mapper development, Linux-Kernel

On 14/02/2023 22:28, Roger Heflin wrote:
> On Tue, Feb 14, 2023 at 3:27 PM Heinz Mauelshagen <heinzm@redhat.com> wrote:
>>
> 
>>
>>
>> ...which is RAID1 plus a parity disk, which seems superfluous as you achieve (N-1)
>> resilience against single device failures already without the latter.
>>
>> What would you need such parity disk for?
>>
>> Heinz
>>
> 
> I thought that at first too, but threw that idea out as it did not
> make much sense.
> 
> What he appears to want is 8 linear non-striped data disks + a parity disk.
> 
> Such that you can lose any one data disk and parity can rebuild that
> disk.  And if you lose several data disks, you still have intact
> non-striped data on the remaining disks.

But all your lost disks are lost (until you rebuild from parity). The
lost disks are still lost, but you won't lose any more, unless
lightning strikes twice.
> 
> It would almost seem that you would need to put a separate filesystem
> on each data disk/section (or have a filesystem that is redundant
> enough to survive), otherwise losing an entire data disk would leave
> the filesystem in a mess.
> 
> So N filesystems + a parity disk for the data on the N separate
> filesystems.   And each write needs to read the old data from the disk
> being written to and the old parity, recalculate the new parity, and
> write out both the data and the new parity.
> 
> If the parity disk were an SSD it would be fast enough, but I would
> expect it to get used up/burned out from the parity being re-written
> for every write on every disk, unless you bought an expensive
> high-write SSD.

I think even cheap SSDs are okay now ...
> 
> The only advantage of the setup is that if you lose too many disks you
> still have some data.
> 
> It is not clear to me that it would be any cheaper, if parity needs to
> be a normal SSD (since SSDs are about 4x the price/GB and high-write
> ones are even more), than a classic bunch of mirrors, or even, say, a
> 4-disk raid6 where you can lose any 2 and still have data.

The only (claimed) advantage of the setup is that you can mix and match 
disk sizes. Personally, I'd just raid-0 the smaller disks to get a whole 
bunch of volumes roughly equal to the largest disk, raid-5 or -6 those 
together, and put LVM on the top.

Probably split two disks out to mirror as my / partition away from the 
main /home raid/lvm.

This scheme is just too hare-brained imho.

(Oh, and if one drive fails and the others carry on writing, you run the 
serious risk of screwing up parity and losing your lost disk, anyway. 
It's just not robust in the face of glitches ...)

Cheers,
Wol


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dm-devel] RAID4 with no striping mode request
       [not found]         ` <CAM23VxpzY6qYsdTYxe01FT7AJvEbODf8X_vq8ALL35TfyrB8xQ@mail.gmail.com>
@ 2023-02-15 11:44           ` Roger Heflin
  2023-02-15 14:53             ` Wols Lists
  2023-02-16  0:01             ` Kyle Sanderson
  0 siblings, 2 replies; 11+ messages in thread
From: Roger Heflin @ 2023-02-15 11:44 UTC (permalink / raw)
  To: Heinz Mauelshagen
  Cc: Kyle Sanderson, linux-raid, Song Liu, device-mapper development,
	John Stoffel, Linux-Kernel

I think he wants the parity across the data blocks on the
separate filesystems (some sort of parity across fs[1-8]/block0 to
parity/block0).

It is not clear to me how this setup would be enough better than
current setups.    Given that one could have 8 spin + 1 SSD or 12
spin for the same price.    And two 6-disk raid6's would have the same
usable space, and be pretty safe (can lose any 2 of the 6 and lose no
data).  And given the separate-filesystems requirement, that would
require some software above the filesystems to manage spreading the
data across multiple filesystems.   The risk of another disk going
bad (while one was failed) and losing a disk's worth of data would
push me to use the 6-disk raid6.

WOL: current SSDs are rated for around 1000-2000 full-drive writes.
So a 1TB disk can sustain 1000-2000TB of total writes.  And filesystem
blocks would get re-written more often than data blocks.
 How well it would work would depend on how often the data is deleted
and re-written.   If the disks are some sort of long-term storage then
the SSD is not going to get used up.   And I am not sure if the rated
"used up" really means anything unless you are using a STUPID enterprise
controller that proactively disables/kills the SSD when it says the
rated writes have happened.   I have a 500GB SSD in a mirror that
"FAILED" according to SMART 2 years ago and so far is still fully
functional, and it is "GOOD" again because the counters used to
determine total writes seem to have rolled over.
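
Rough arithmetic behind that, as a sketch (Python; the write rate here
is an assumption picked for illustration, not a measurement of any
real drive or workload):

capacity_tb = 1.0                # 1 TB parity SSD, as above
rated_cycles = 1500              # assuming the ~1000-2000 full-drive-write rating
endurance_tb = capacity_tb * rated_cycles            # ~1500 TB of total writes
write_rate_mb_s = 50             # hypothetical sustained parity write rate
years = endurance_tb * 1e6 / write_rate_mb_s / (3600 * 24 * 365)
print(f"~{endurance_tb:.0f} TB endurance, ~{years:.1f} years at {write_rate_mb_s} MB/s")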

On Tue, Feb 14, 2023 at 8:23 PM Heinz Mauelshagen <heinzm@redhat.com> wrote:
>
> Roger,
>
> as any of the currently implemented 'parity' algorithms (block xor/P-/Q-Syndrome) provided by DM/MD RAID
> have to have at least two data blocks to calculate:  are you, apart from the filesystem thoughts you bring up, thinking
> about running those on e.g. pairs of disks of mentioned even numbered set of 8?
>
> Heinz
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dm-devel] RAID4 with no striping mode request
  2023-02-15 11:44           ` Roger Heflin
@ 2023-02-15 14:53             ` Wols Lists
  2023-02-15 15:22               ` Roger Heflin
  2023-02-16  0:01             ` Kyle Sanderson
  1 sibling, 1 reply; 11+ messages in thread
From: Wols Lists @ 2023-02-15 14:53 UTC (permalink / raw)
  To: Roger Heflin, Heinz Mauelshagen
  Cc: Kyle Sanderson, linux-raid, Song Liu, device-mapper development,
	John Stoffel, Linux-Kernel

On 15/02/2023 11:44, Roger Heflin wrote:
> WOL: current SSDs are rated for around 1000-2000 full-drive writes.
> So a 1TB disk can sustain 1000-2000TB of total writes.  And filesystem
> blocks would get re-written more often than data blocks.
>   How well it would work would depend on how often the data is deleted
> and re-written.

When did that guy do that study of SSDs? Basically hammered them to 
death 24/7? I think it took about three years of continuous write/erase 
cycles to destroy them.

Given that most drives are obsolete long before they've had three years 
of writes ... the conclusion was that - for the same write load - 
"modern" (as they were several years ago) SSDs would probably outlast 
mechanical drives for the same workload.

(Cheap SD cards, on the other hand ...)

Cheers,
Wol

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dm-devel] RAID4 with no striping mode request
  2023-02-15 14:53             ` Wols Lists
@ 2023-02-15 15:22               ` Roger Heflin
  0 siblings, 0 replies; 11+ messages in thread
From: Roger Heflin @ 2023-02-15 15:22 UTC (permalink / raw)
  To: Wols Lists
  Cc: Heinz Mauelshagen, Kyle Sanderson, linux-raid, Song Liu,
	device-mapper development, John Stoffel, Linux-Kernel

The SMART on the disk marks the disk as FAILED when you hit the
manufacturer's posted limit (1000 or 2000 writes average).    I am
sure using a "FAILED" disk would make a lot of people nervous.

The conclusion that you can write as fast as you can and it will take 3
years to wear out would be specific to that specific brand/version
with a given set of chips in it, and may or may not hold for other
vendors/chips/versions, and so may have quite a bit of variation in
it.  I think I remember seeing that, but I don't remember what the
average write rate was.  The one I just found says 200TB of writes on
a 240GB drive, so about 8000 erases per cell was the lowest failure
rate, with some drives making it 3-5x higher.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dm-devel] RAID4 with no striping mode request
  2023-02-15 11:44           ` Roger Heflin
  2023-02-15 14:53             ` Wols Lists
@ 2023-02-16  0:01             ` Kyle Sanderson
  1 sibling, 0 replies; 11+ messages in thread
From: Kyle Sanderson @ 2023-02-16  0:01 UTC (permalink / raw)
  To: Roger Heflin
  Cc: Heinz Mauelshagen, linux-raid, Song Liu,
	device-mapper development, John Stoffel, Linux-Kernel

> On Wed, Feb 15, 2023 at 3:44 AM Roger Heflin <rogerheflin@gmail.com> wrote:
>
> I think he is wanting the parity across the data blocks on the
> separate filesystems (some sort of parity across fs[1-8]/block0 to
> parity/block0).

Correct.

> On Wed, Feb 15, 2023 at 3:44 AM Roger Heflin <rogerheflin@gmail.com> wrote:
> It is not clear to me how this setup would be enough better than
> current setups.    Given that one could have 8 spin + 1 SSD or 12
> spin for the same price.    And two 6-disk raid6's would have the same
> usable space, and be pretty safe (can lose any 2 of the 6 and lose no
> data).

They're not the same price though. Remember these disks are mixed
sizes and various ages, exposing their entire capacity
(4d+8d+12d+12p gives you 24T of usable storage), all protected by the
single parity disk.
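
(A trivial Python check of that arithmetic -- sizes in TB, with the
parity disk required to be at least as large as the largest data disk:)

data_tb = [4, 8, 12]
parity_tb = 12
assert parity_tb >= max(data_tb)
print(sum(data_tb), "TB usable behind one parity disk")   # -> 24 TB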

Yes, higher levels of RAID will always be better. However, that's not
how these millions of appliances are developed by a number of
manufacturers and sold at your local retailer. The proposal (and ask
for help) that I've raised is to have an open-source alternative to these
proprietary MD implementations, as opposed to being trapped with buggy
MD drivers on firmware that's glitchy and breaks other aspects of the
kernel.

> On Wed, Feb 15, 2023 at 3:44 AM Roger Heflin <rogerheflin@gmail.com> wrote:
> And given the separate-filesystems requirement, that would
> require some software above the filesystems to manage spreading the
> data across multiple filesystems.   The risk of another disk going
> bad (while one was failed) and losing a disk's worth of data would
> push me to use the 6-disk raid6.

This is long solved by a number of FUSE filesystems, as well as
overlayfs (which would be nice if it could gradually spool data down
into layers, but that's another ball of wax).

Hopefully that makes sense. The only thing coming close to this is
bcachefs, but that's still looking like a multi-year road (while the
above has been deployed in homes since the early 2000s).

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-02-16  0:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-13  3:07 RAID4 with no striping mode request Kyle Sanderson
2023-02-13 19:40 ` John Stoffel
2023-02-13 21:11   ` Kyle Sanderson
2023-02-14 18:37     ` Song Liu
     [not found]     ` <CAM23Vxr8LkkcVDFfW1=qEYGgo7JG1qx62eWSV4WOw4_MnD+TZA@mail.gmail.com>
2023-02-14 22:28       ` [dm-devel] " Roger Heflin
2023-02-15  7:23         ` Kyle Sanderson
2023-02-15  9:40         ` Wols Lists
     [not found]         ` <CAM23VxpzY6qYsdTYxe01FT7AJvEbODf8X_vq8ALL35TfyrB8xQ@mail.gmail.com>
2023-02-15 11:44           ` Roger Heflin
2023-02-15 14:53             ` Wols Lists
2023-02-15 15:22               ` Roger Heflin
2023-02-16  0:01             ` Kyle Sanderson
