From: Phil Turmel <philip@turmel.org>
To: Wols Lists <antlists@youngman.org.uk>, Nix <nix@esperi.org.uk>,
	NeilBrown <neilb@suse.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: RFC - Raid error detection and auto-recovery (was Fault tolerance with badblocks)
Date: Tue, 16 May 2017 11:31:16 -0400
Message-ID: <1bccccd1-1ff8-1cb3-492e-42468a3c8a8f@turmel.org>
In-Reply-To: <591B125A.1000307@youngman.org.uk>

On 05/16/2017 10:53 AM, Wols Lists wrote:

> I'll give a car example. I'm talking about a car in a ditch. You're
> talking about a motorway pile-up AND YOU'RE ASSUMING I CAN'T TELL THE
> DIFFERENCE. That's why I'm getting so frustrated!

You clearly cannot.

> Please LOOK AT THE MATHS of my scenario.

It's not a math problem.  I'm quite familiar with the math, as a matter
of fact.  Galois fields are exceedingly cool for a math geek like me.

> First thing we do is read the entire stripe.

A substantial performance degradation, right out of the gate...

> IF the integrity check passes, we return the data. If it fails and our
> raid can't reconstruct (two-disk mirror, raid-4, raid-5) we return an error.

Where we currently return the data and let the upper layer decide its
value.  An error here is a regression in my book.

> Second - we now have a stripe that fails integrity, so we pass it
> through Peter's equation. If it returns "one block is corrupt and here's
> the correct version" we return the correct version. If it returns "can't
> solve the equation - too many unknowns" we return a read error.

Changing the data returned from what was written is another regression
in my book, since the drive not returning a read error is a far more
significant indication that the data is correct than a mismatch saying
it's wrong.
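
For anyone following the thread without the math to hand, here is a
minimal sketch of the "locate one bad data block" step being proposed.
It assumes the usual RAID-6 P/Q construction over GF(2^8) (polynomial
0x11d, generator 2), which is presumably what "Peter's equation" refers
to. It is not MD's code; the function names are made up for
illustration, and it only handles a bad data block, not a bad P or Q.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r


def syndromes(data):
    """Compute (P, Q) for a list of equal-length data blocks."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, blk in enumerate(data):
        coeff = gf_pow(2, i)          # g^i for data disk i
        for j, byte in enumerate(blk):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate(data, p, q):
    """Return None if the stripe is consistent, the index z of the single
    data block that would explain the mismatch, or -1 if no single data
    block explains it."""
    cp, cq = syndromes(data)
    sp = bytes(a ^ b for a, b in zip(p, cp))   # P mismatch
    sq = bytes(a ^ b for a, b in zip(q, cq))   # Q mismatch
    if not any(sp) and not any(sq):
        return None
    if any(sp):
        for z in range(len(data)):
            coeff = gf_pow(2, z)
            if all(gf_mul(coeff, a) == b for a, b in zip(sp, sq)):
                return z
    return -1


def repair(data, p, z):
    """Rebuild data block z from P and the other data blocks."""
    cp, _ = syndromes(data)
    return bytes(d ^ a ^ b for d, a, b in zip(data[z], p, cp))


# Hypothetical three-data-disk stripe with block 1 altered after the fact.
good = [b"abcd", b"efgh", b"ijkl"]
P, Q = syndromes(good)
bad = list(good)
bad[1] = b"eXgh"
z = locate(bad, P, Q)
fixed = repair(bad, P, z)
```

Note what the sketch does and does not establish: it finds the one data
block whose change would make P and Q agree again. It cannot say whether
that block or the parity is the stale side.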

> We *have* to assume that if the stripe passes the integrity check that
> it's correct - but we could have had an error that fools the integrity
> check! We just assume it's highly unlikely.

If the data blocks are successfully read from their drives, we *have* to
assume they're correct.  There are so many zeroes between the decimal
point and the first significant digit of that error probability that a
physical explanation elsewhere is a virtual certainty.

> What is the probability that Peter's equation screws up? We *KNOW* that
> if only one block is corrupt, that it will ALWAYS SUCCESSFULLY correct
> it. And from reading the paper, it seems to me that if *more than one*
> block is corrupt, it will detect it with over 99.9% accuracy.

No.  We don't.  We have a highly reliable drive saying the data is
correct versus a *system* of reads and writes spread over multiple
physical systems and spread over time that has a constellation of
failure modes, any one of which could have created the situation at hand.

Software flaws galore, particularly incomplete stripe writes.  Power
problems truncating stripe writes.  System memory bit flips.  PCIe
uncaught transmission errors.  Controller buffer memory bit flips.  SATA
or SAS transmission errors.

All of the above are rare.  But not anywhere near as rare as an
undetected sector read error.  MD cannot safely fix this automatically,
and shouldn't.  And with the performance hit, it is actively stupid.
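
To make the incomplete stripe write case concrete, here is a small
sketch under the same assumptions as the standard RAID-6 P/Q math
(GF(2^8), polynomial 0x11d, generator 2). The block contents and names
are invented for illustration. A new data block reaches its drive, but
P and Q are never updated. The single-block locator then blames the
freshly written block and "corrects" it back to the old contents.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by 0x11d."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r


def gf_pow2(n):
    """Return 2**n in GF(2^8)."""
    r = 1
    for _ in range(n):
        r = gf_mul(r, 2)
    return r


def pq(data):
    """Compute (P, Q) for a list of equal-length data blocks."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for i, blk in enumerate(data):
        coeff = gf_pow2(i)
        for j, byte in enumerate(blk):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stripe as it was before the write.
old = [b"AAAA", b"BBBB", b"CCCC"]
p_old, q_old = pq(old)

# Torn write: the new data block 1 landed, the P and Q updates did not.
on_disk = list(old)
on_disk[1] = b"bbbb"

# Every drive reads back without error, but the stripe no longer checks.
cp, cq = pq(on_disk)
sp, sq = xor(p_old, cp), xor(q_old, cq)

# Which single data block would explain the mismatch?
suspects = [
    z for z in range(len(on_disk))
    if all(gf_mul(gf_pow2(z), a) == b for a, b in zip(sp, sq))
]

# "Repairing" the suspect yields the data from before the write.
repaired = xor(on_disk[suspects[0]], sp)
```

The math is internally consistent here and still hands back something
other than what was last written to that block, with no read error from
any drive to suggest which side is stale.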

And I'm done arguing.

Phil
