From: David Brown <david.brown@hesbynett.no>
To: Reindl Harald <h.reindl@thelounge.net>,
	Adam Goryachev <mailinglists@websitemanagers.com.au>,
	Jeff Allison <jeff.allison@allygray.2y.net>
Cc: linux-raid@vger.kernel.org
Subject: Re: proactive disk replacement
Date: Tue, 21 Mar 2017 15:15:52 +0100
Message-ID: <58D13598.50403@hesbynett.no>
In-Reply-To: <09f4c794-8b17-05f5-10b7-6a3fa515bfa9@thelounge.net>

On 21/03/17 14:24, Reindl Harald wrote:
> 
> 
> Am 21.03.2017 um 14:13 schrieb David Brown:
>> On 21/03/17 12:03, Reindl Harald wrote:
>>>
>>> Am 21.03.2017 um 11:54 schrieb Adam Goryachev:
>> <snip>
>>>
>>>> In addition, you claim that a drive larger than 2TB is almost certainly
>>>> going to suffer from a URE during recovery, yet this is exactly the
>>>> situation you will be in when trying to recover a RAID10 with member
>>>> devices 2TB or larger. A single URE on the surviving portion of the
>>>> RAID1 will cause you to lose the entire RAID10 array. On the other
>>>> hand,
>>>> 3 URE's on the three remaining members of the RAID6 will not cause more
>>>> than a hiccup (as long as no more than one URE on the same stripe,
>>>> which
>>>> I would argue is ... exceptionally unlikely).
>>>
>>> given that your disks are the same age, errors on another disk
>>> become more likely once one has failed, and recovery of a RAID6
>>> takes *many hours* of heavy IO on *all disks*, compared with a much
>>> faster restore of RAID1/10 - guess in which case a URE is more likely
>>>
>>> additionally, why should the whole array fail just because a single block
>>> gets lost? there is no parity which needs to be calculated, you just lost a
>>> single block somewhere - RAID1/10 are way easier in their implementation
>>
>> If you have RAID1, and you have a URE, then the data can be recovered
>> from the other half of that RAID1 pair.  If you have had a disk failure
>> (a manual removal for replacement, or a real failure), and you get a URE
>> on the other half of that pair, then you lose data.
>>
>> With RAID6, you need an additional failure (either another full disk
>> failure or an URE in the /same/ stripe) to lose data.  RAID6 has higher
>> redundancy than two-way RAID1 - of this there is /no/ doubt
> 
> yes, but with RAID5/RAID6 *all disks* are involved in the rebuild, with
> a 10 disk RAID10 only one disk needs to be read and the data written to
> the new one - all other disks are not involved in the resync at all

True...

> 
> for most arrays the disks have a similar age and usage pattern, so when
> the first one fails it becomes likely that it doesn't take too long for
> another one to fail, and so load and recovery time matter

False.  There is no reason to suspect that - certainly not to within the
hours or days it takes to rebuild your array.  Disk failure patterns show
a peak within the first month or so (failures due to manufacturing or
handling), then a very low failure rate for a few years, then a gradually
increasing rate after that.  There is no very significant correlation
between drive failures within the same system, nor is there a very
significant correlation between usage and failures.  It might seem
reasonable to suspect that a drive is more likely to fail during a
rebuild since the disk is being heavily used, but that does not appear
to be the case in practice.  You will /spot/ more errors at that point -
simply because you don't see errors in parts of the disk that are not
read - but the rebuilding does not cause them.

And even if it /were/ true, the key point is whether an error actually
causes data loss.  A read error on the surviving disk during a two-way
RAID1 rebuild means lost data.  A read error during a RAID6 rebuild with
one disk missing means you have to read an extra sector from another
disk and correct the mistake.
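
To make the difference concrete, here is a minimal sketch (not md's
actual logic, just a toy model).  The function names and the idea of
listing UREs by sector or stripe number are my own assumptions for
illustration.  It assumes a two-way RAID1 pair with one half missing,
and a RAID6 with exactly one member missing, so one level of redundancy
remains per stripe:

```python
from collections import Counter


def raid1_lost_sectors(survivor_ures):
    """Degraded two-way RAID1: there is no second copy left,
    so every URE on the surviving disk is lost data."""
    return sorted(set(survivor_ures))


def raid6_lost_stripes(ures_per_disk):
    """RAID6 with one member missing: each stripe can still absorb
    one more bad block.  A stripe is lost only when two or more of
    the surviving disks have a URE in that same stripe."""
    hits = Counter()
    for stripes in ures_per_disk:
        for stripe in set(stripes):  # count each disk once per stripe
            hits[stripe] += 1
    return sorted(stripe for stripe, n in hits.items() if n >= 2)


# One URE on the surviving mirror: that sector is gone.
print(raid1_lost_sectors([1200]))              # [1200]

# Three UREs on three surviving RAID6 members, all in different stripes:
# every one of them can be reconstructed.
print(raid6_lost_stripes([[3], [10], [42]]))   # []

# Two surviving members hit the same stripe: that stripe is lost.
print(raid6_lost_stripes([[3], [3], [42]]))    # [3]
```

The model ignores everything about timing and load; it only shows which
errors are recoverable once they occur.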


