* the true behavior of mdadm's raid-1 with regard to vertical parity and silent error detection/scrubbing- confirmation or feature request
From: Brett L. Trotter @ 2010-08-18  3:56 UTC
  To: linux-raid

I've googled endlessly about the internal nature of an md RAID-1.

Over the years, while verifying the two halves of a split Linux
RAID-1, I've found several single-bit flips on traditional platter
disks in files that had previously lived on the array. This seems to
imply that what I'm looking for doesn't exist- namely, a vertical
parity within each disk at the md level, even a single crc32 every so
often, so that if a bit flips on drive 1 of a mirror, drive 2's copy
replaces it instead of drive 1's bit-flipped copy replacing drive 2's
good copy. From what I can gather, it's a 50/50 shot whether your good
copy gets mangled in the event of a silent bit flip.

So- is there any built-in parity that helps mdadm decide which copy to
use when the copies disagree on a RAID-1 mirror during a resync?

If not, is there a reason why not, beyond the extra space overhead and
the read/compute/write overhead?

This issue interests me more as I look into SSDs, where flash blocks
wear out.


I'd choose a higher RAID level if I could, but this is only a very
small Atom 330 box with only mildly important data. I think I'm
ultimately looking for something like what ZFS has, but ZFS under
RHEL/CentOS will probably never happen in any meaningful,
production-worthy way due to licensing, the demise of Sun, and the
Oracle taint on everything that was Sun's.

I'd love any information anyone has on the subject.

-Brett

* Re: the true behavior of mdadm's raid-1 with regard to vertical parity and silent error detection/scrubbing- confirmation or feature request
From: Michael Tokarev @ 2010-08-18  7:45 UTC
  To: Brett L. Trotter; +Cc: linux-raid

18.08.2010 07:56, Brett L. Trotter wrote:
[]
> So- is there any built-in parity that helps mdadm decide which copy to
> use when the copies disagree on a RAID-1 mirror during a resync?

There is none.

> If not, is there a reason why not, beyond the extra space overhead and
> the read/compute/write overhead?

Well, this sounds very much like the old discussion about bad-block
marking in md, in filesystems, or in any other layer like this. But
nowadays - hopefully, anyway - all drives are capable of doing this
internally by remapping bad blocks. If a drive can no longer remap a
new bad block, it's time to throw it away or RMA it, rather than
trying to "cure" it in upper layers.

The same goes for parity. All modern drives, at least in theory, have
ways to ensure they either return exactly what was written or report
an error: ECC codes, checksums, parity, whatnot - mechanisms meant to
detect errors and sometimes correct simple ones like bit flips.

I understand you've seen real cases where such detection does not
work for some reason. Well, bad-block remapping didn't always work in
the past either... ;)

It shouldn't be very difficult to implement checksumming and/or
simple ECC codes in md, storing the parity information in extra
blocks, either at the end of the underlying device or in every, say,
64th block - so as not to shrink the sector size to something like
511 bytes :). The overhead shouldn't be large either, especially
together with implementing bad-block remapping. But to me the
question is whether there's a real need or demand for doing so.
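
Just to illustrate the arithmetic of the every-64th-block idea - the
constants and mappings below are hypothetical, nothing of the sort
exists in md:

/* Hypothetical layout: every GROUPth on-disk block is reserved for
 * checksums of the preceding GROUP-1 data blocks, so data keeps its
 * natural 4096-byte block size.  Space overhead is 1/64, about 1.6%. */
#include <stdint.h>

#define GROUP 64   /* 63 data blocks + 1 checksum block per group */

/* Where a logical data block lands once checksum blocks are interleaved. */
static uint64_t data_to_disk(uint64_t logical)
{
    return logical + logical / (GROUP - 1);
}

/* The on-disk checksum block covering a given logical data block. */
static uint64_t checksum_block_for(uint64_t logical)
{
    return (logical / (GROUP - 1)) * GROUP + (GROUP - 1);
}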

On the other hand, following this line of reasoning one might argue
that the whole md subsystem has been made obsolete by hardware RAID
controllers...  :)

> This issue interests me more as I look into SSDs and having flash blocks
> wear out.

And it is even more important for SSDs to have such a feature, and as
far as I understand, this is what they actually have. I might be
wrong, but...

/mjt

* Re: the true behavior of mdadm's raid-1 with regard to vertical parity and silent error detection/scrubbing- confirmation or feature request
From: Mikael Abrahamsson @ 2010-08-18  8:00 UTC
  To: Michael Tokarev; +Cc: Brett L. Trotter, linux-raid

On Wed, 18 Aug 2010, Michael Tokarev wrote:

> But to me the question is whether there's a real need or demand for doing so.

ZFS does it, and people who are paranoid about bit rot really want it.
It gives additional protection against memory errors and the like,
i.e. outside the drive, while the bits are in transit from the drive
through cables, controllers, drivers, and the block subsystem. Of
course it's not perfect, but it adds some protection.

Whether the cost/benefit analysis holds up I don't know, because I
don't know the complexity. Having a 64k stripe in md actually occupy
68k on disk and store a checksum might make sense, but one checksum
per stripe doesn't give great granularity. Perhaps those extra 4k
could hold one checksum per 4k block within the 64k stripe, so that an
error can be reported at fairly fine granularity, and if parity
information is available it can be read and the problem corrected.
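
Purely as illustration, one hypothetical on-disk layout for that idea
(not any existing md format):

/* Hypothetical: a 64k stripe split into sixteen 4k chunks, with a 4k
 * trailer holding one 256-byte checksum record per chunk, so an error
 * can be pinned to a single 4k chunk rather than the whole stripe. */
#include <stdint.h>

#define STRIPE_SIZE  (64 * 1024)
#define CHUNK_SIZE   (4 * 1024)
#define CHUNKS       (STRIPE_SIZE / CHUNK_SIZE)   /* 16 */

struct chunk_csum {
    uint32_t crc;       /* CRC32 of one 4k chunk */
    uint32_t pad[63];   /* 256 bytes per record: room for stronger codes */
};

struct stripe_trailer {
    struct chunk_csum csum[CHUNKS];   /* 16 * 256 bytes = 4k exactly */
};

_Static_assert(sizeof(struct stripe_trailer) == 4096,
               "trailer must fill one 4k block");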

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

* Re: the true behavior of mdadm's raid-1 with regard to vertical parity and silent error detection/scrubbing- confirmation or feature request
From: Bill Davidsen @ 2010-09-04 18:38 UTC
  To: Brett L. Trotter; +Cc: linux-raid, Neil Brown

Brett L. Trotter wrote:
> I've googled endlessly about the internal nature of an md RAID-1.
>
> Over the years, while verifying the two halves of a split Linux
> RAID-1, I've found several single-bit flips on traditional platter
> disks in files that had previously lived on the array. This seems to
> imply that what I'm looking for doesn't exist- namely, a vertical
> parity within each disk at the md level, even a single crc32 every so
> often, so that if a bit flips on drive 1 of a mirror, drive 2's copy
> replaces it instead of drive 1's bit-flipped copy replacing drive 2's
> good copy. From what I can gather, it's a 50/50 shot whether your good
> copy gets mangled in the event of a silent bit flip.
>
> So- is there any built-in parity that helps mdadm decide which copy to
> use when the copies disagree on a RAID-1 mirror during a resync?
>
> If not, is there a reason why not, beyond the extra space overhead and
> the read/compute/write overhead?
>
> This issue interests me more as I look into SSDs, where flash blocks
> wear out.
>
>
> I'd choose a higher RAID level if I could, but this is only a very
> small Atom 330 box with only mildly important data. I think I'm
> ultimately looking for something like what ZFS has, but ZFS under
> RHEL/CentOS will probably never happen in any meaningful,
> production-worthy way due to licensing, the demise of Sun, and the
> Oracle taint on everything that was Sun's.
>
> I'd love any information anyone has on the subject.
>   

I don't think you are going to love this: as far as I can tell, no
better recovery is done at higher RAID levels either, when the failure
is silent rather than a drive failing outright. When a 'check' is run
and an error is found, Neil seems to believe it is not worth the
overhead of identifying which data is most likely wrong, so the block
is simply rewritten to make the mismatch go away, rather than
attempting to identify the most likely bad sector and fix that.
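
For reference, this is how a scrub is driven today: writing "check" to
the array's sync_action file compares the copies and bumps
mismatch_cnt, while "repair" rewrites mismatches blindly as described
above. A trivial sketch, assuming an array named md0:

/* Kick off an md scrub via sysfs.  "check" only counts mismatches
 * (see /sys/block/md0/md/mismatch_cnt afterwards); "repair" rewrites
 * them with no attempt to judge which copy was correct. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/block/md0/md/sync_action", "w");
    if (!f) {
        perror("sync_action");
        return 1;
    }
    fputs("check\n", f);   /* or "repair" */
    return fclose(f) ? 1 : 0;
}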

With an N>2-copy RAID-1, no check is made (unless the change is very
recent) to see whether N-1 copies agree on a value, and with RAID-6
the obvious check to find the data most likely to be wrong isn't done
either. This has been discussed to death; I don't see any changes
coming.
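
Purely as an illustration, the N-1-copies-agree check would amount to
majority voting over the copies, something like the sketch below -
md's resync does nothing of the sort:

/* Hypothetical majority vote across n mirror copies of one block:
 * return the index of a copy that a strict majority agrees with,
 * or -1 if there is no majority. */
#include <string.h>

#define BLOCK_SIZE 4096

static int majority_copy(unsigned char copies[][BLOCK_SIZE], int n)
{
    for (int i = 0; i < n; i++) {
        int votes = 0;
        for (int j = 0; j < n; j++)
            if (memcmp(copies[i], copies[j], BLOCK_SIZE) == 0)
                votes++;
        if (votes * 2 > n)   /* strict majority, counting itself */
            return i;
    }
    return -1;
}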

-- 
Bill Davidsen <davidsen@tmr.com>
  "We can't solve today's problems by using the same thinking we
   used in creating them." - Einstein


* Re: the true behavior of mdadm's raid-1 with regard to vertical parity and silent error detection/scrubbing- confirmation or feature request
From: Mikael Abrahamsson @ 2010-09-04 18:56 UTC
  To: Bill Davidsen; +Cc: Brett L. Trotter, linux-raid, Neil Brown

On Sat, 4 Sep 2010, Bill Davidsen wrote:

> This has been discussed to death; I don't see any changes coming.

True. Even if someone really wants this (for instance, 64k on disk
plus 4k of ECC data for a total of 68k per stripe), unless they are
willing to put money on the table for someone to write it, or someone
volunteers to do it, I don't see this coming either.

And if someone is willing to actually write the code (or have it
written), a discussion should first be had about whether such a change
would be accepted.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

