From: Phil Turmel
Subject: Re: Fault tolerance with badblocks
Date: Mon, 8 May 2017 16:56:24 -0400
In-Reply-To: <87vapb6s9h.fsf@esperi.org.uk>
To: Nix
Cc: Wols Lists, "Ravi (Tom) Hale", linux-raid@vger.kernel.org

On 05/08/2017 03:52 PM, Nix wrote:
> On 8 May 2017, Phil Turmel verbalised:
>
>> On 05/08/2017 10:50 AM, Nix wrote:
>
> And... then what do you do? On RAID-6, it appears the answer is "live
> with a high probability of inevitable corruption".

No, you investigate the quality of your data and the integrity of the
rest of the system, as something *other* than a drive problem caused
the mismatch. (Swap is a known exception, though.)

> That's not very good.
>
> (AIUI, if a check scrub finds a URE, it'll rewrite it, and when in the
> common case the drive spares it out and the write succeeds, this will
> not be reported as a mismatch: is this right?)

This is also wrong, because you are assuming sparing-out is the common
case. A read error does not automatically trigger relocation. It
triggers *verification* of the next *write*. In young drives,
successful rewrite in place is the common case. As the drive ages,
rewrites will begin relocating because there really is a new problem at
that spot, not simple thermal/magnetic decay.

But keep in mind that the drive's firmware will start verification of a
sector only if it gets a *read* error. Such sectors get marked as
"pending" relocations until they are written again. If that write
verifies correct, the "pending" status simply goes away. Ordinary
writes to presumed-ok sectors are *not* verified. (There'd be a huge
difference between read and write speeds on rotating media if they
were.)

{ Drive self-tests might do some pre-emptive rewriting of marginal
sectors -- it's not something drive manufacturers document. But a drive
self-test cannot fix an unreadable sector -- it doesn't know what to
write there. }

>> This is actually counterproductive. Rewriting everything may refresh
>> the magnetism on weakening sectors, but will also prevent the drive
>> from *finding* weakening sectors that really do need relocation.
>
> If a sector weakens purely because of neighbouring writes or
> temperature or a vibrating housing or something (i.e. not because of
> actual damage), so that a rewrite will strengthen it and relocation
> was never necessary, surely you've just saved a pointless bit of
> sector sparing? (I don't know: I'm not sure what the relative
> frequency of these things is. Read and write errors in general are so
> rare that it's quite possible I'm worrying about nothing at all. I do
> know I forgot to scrub my old hardware RAID array for about three
> years and nothing bad happened...)

Drives in applications that get *read* pretty often don't need much, if
any, scrubbing -- the application itself will expose problem sectors.
Hobbyists and home media servers can go months with specific files
unread, so developing problems can hit in clusters. Regular scrubbing
will catch these problems before they take your array down.
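(For reference, a scrub on md is just a sysfs write -- something along
these lines, with md0 and sda standing in for your own array and
drives:

  # start a check scrub: read errors get rewritten, mismatches only counted
  echo check > /sys/block/md0/md/sync_action

  # after the scrub finishes, unexplained mismatches show up here
  cat /sys/block/md0/md/mismatch_cnt

  # per-drive count of unreadable sectors still waiting for a rewrite
  smartctl -A /dev/sda | grep -i pending

Most distros ship a cron job or timer that issues the "check" write on
a schedule, so you rarely have to type it by hand.)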
And you can't compare hardware array behavior to MD -- hardware
controllers have their own algorithms to take care of attached disks
without OS intervention.

Phil