From: Phil Turmel <philip@turmel.org>
To: Nix <nix@esperi.org.uk>
Cc: Wols Lists <antlists@youngman.org.uk>,
	"Ravi (Tom) Hale" <ravi@hale.ee>,
	linux-raid@vger.kernel.org
Subject: Re: Fault tolerance with badblocks
Date: Mon, 8 May 2017 16:56:24 -0400	[thread overview]
Message-ID: <e2196f02-2b94-8afb-06a0-9695d441c890@turmel.org> (raw)
In-Reply-To: <87vapb6s9h.fsf@esperi.org.uk>

On 05/08/2017 03:52 PM, Nix wrote:
> On 8 May 2017, Phil Turmel verbalised:
> 
>> On 05/08/2017 10:50 AM, Nix wrote:

> And... then what do you do? On RAID-6, it appears the answer is "live
> with a high probability of inevitable corruption".

No, you investigate the quality of your data and the integrity of the
rest of the system, as something *other* than a drive problem caused the
mismatch.  (Swap is a known exception, though: on RAID1/10 it can leave
harmless mismatches.)
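
As a starting point for that investigation, md exposes the result of the last check or repair pass in sysfs. The sketch below assumes an array named md0 and reads the standard /sys/block/<array>/md/mismatch_cnt attribute. The helper names mismatch_count and needs_investigation are hypothetical, and the "non-zero means look beyond the drives" rule simply encodes the reasoning above; it is not a kernel policy.

```python
from pathlib import Path


def mismatch_count(array: str = "md0", sysfs_root: str = "/sys/block") -> int:
    """Return mismatch_cnt for an md array, as left by the most recent check/repair.

    The kernel reports this value in sectors. Because most RAID levels compare
    whole pages, it can overstate the number of distinct inconsistent spots.
    """
    path = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(path.read_text().strip())


def needs_investigation(array: str = "md0", sysfs_root: str = "/sys/block") -> bool:
    # Assumption taken from the argument above: a non-zero count on a clean
    # array points at something other than the drives (RAM, controller,
    # cabling, software), apart from the swap-on-RAID1/10 exception.
    return mismatch_count(array, sysfs_root) > 0
```

On a live system, needs_investigation("md0") only reads sysfs, so it does not need root. The sysfs_root parameter exists only so the function can be pointed at a test directory.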

> That's not very good.
> (AIUI, if a check scrub finds a URE, it'll rewrite it, and when in the
> common case the drive spares it out and the write succeeds, this will
> not be reported as a mismatch: is this right?)

This is also wrong, because you are assuming sparing-out is the common
case.  A read error does not automatically trigger relocation.  It
triggers *verification* of the next *write*.  In young drives,
successful rewrite in place is the common case.  As the drive ages,
rewrites will begin relocating because there really is a new problem at
that spot, not simple thermal/magnetic decay.

But keep in mind that the firmware of the drive will start verification
of a sector only if it gets a *read* error.  Such sectors get marked as
"pending" relocations until they are written again.  If that write
verifies correct, the "pending" status simply goes away.  Ordinary
writes to presumed-ok sectors are *not* verified.  (There'd be a huge
difference between read and write speeds on rotating media if they were.)
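
The pending-sector behaviour described above can be summarised as a small state model. This is a toy written for illustration under the assumptions stated in this message. ToyDrive and its medium_ok / verify_ok flags are hypothetical, and the model does not represent any vendor's actual firmware.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Toy model of pending vs. reallocated sectors (illustrative assumption only)."""

    pending: set = field(default_factory=set)      # LBAs that failed a read
    reallocated: set = field(default_factory=set)  # LBAs remapped to spares

    def read(self, lba: int, medium_ok: bool) -> bool:
        # A failed read does not relocate anything. It only flags the
        # sector so that the *next write* to it gets verified.
        if not medium_ok:
            self.pending.add(lba)
        return medium_ok

    def write(self, lba: int, verify_ok: bool = True) -> None:
        if lba not in self.pending:
            # Ordinary writes to presumed-good sectors are not verified,
            # so verify_ok is ignored here.
            return
        self.pending.discard(lba)
        if not verify_ok:
            # The rewrite in place did not verify: a real defect, so remap it.
            self.reallocated.add(lba)
        # Otherwise the rewrite verified and "pending" simply goes away.
```

In a young drive, most write(lba, verify_ok=True) calls on pending sectors just clear them. As the drive ages, more of those writes fail verification and end up in reallocated.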

{ Drive self tests might do some pre-emptive rewriting of marginal
sectors -- it's not something drive manufacturers are documenting.  But
a drive self-test cannot fix an unreadable sector -- it doesn't know
what to write there. }

>> This is actually counterproductive.  Rewriting everything may refresh
>> the magnetism on weakening sectors, but will also prevent the drive from
>> *finding* weakening sectors that really do need relocation.
> 
> If a sector weakens purely because of neighbouring writes or temperature
> or a vibrating housing or something (i.e. not because of actual damage),
> so that a rewrite will strengthen it and relocation was never necessary,
> surely you've just saved a pointless bit of sector sparing? (I don't
> know: I'm not sure what the relative frequency of these things is. Read
> and write errors in general are so rare that it's quite possible I'm
> worrying about nothing at all. I do know I forgot to scrub my old
> hardware RAID array for about three years and nothing bad happened...)

Drives whose data the application *reads* pretty often need little, if
any, scrubbing -- the application itself will expose problem
sectors.  Hobbyists and home media servers can go months with specific
files unread, so developing problems can hit in clusters.  Regular
scrubbing will catch these problems before they take your array down.
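
For the home-server case, a scrub can be started through md's sysfs interface. This is the same as running `echo check > /sys/block/md0/md/sync_action`. During a check, md reads every block, and it rewrites any unreadable sector from redundancy. That rewrite is what gives the drive the write it needs to clear or relocate a pending sector. The sketch below assumes an array named md0. start_check is a hypothetical helper name, and scheduling it with cron or a systemd timer is left out.

```python
from pathlib import Path


def start_check(array: str = "md0", sysfs_root: str = "/sys/block") -> None:
    """Start a 'check' scrub of an md array (requires root on a real system).

    A check counts mismatches rather than correcting them. Raises RuntimeError
    if the array is not idle, rather than interrupting a resync or rebuild.
    """
    action = Path(sysfs_root) / array / "md" / "sync_action"
    current = action.read_text().strip()
    if current != "idle":
        raise RuntimeError(f"{array} is busy: sync_action={current!r}")
    # Equivalent to: echo check > /sys/block/<array>/md/sync_action
    action.write_text("check\n")
```

When the pass finishes and sync_action reads "idle" again, mismatch_cnt in the same directory holds the result.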

And you can't compare hardware array behavior to MD -- they have their
own algorithms to take care of attached disks without OS intervention.

Phil
