From: Gandalf Corvotempesta
Subject: Re: proactive disk replacement
Date: Wed, 22 Mar 2017 14:53:19 +0100
To: Phil Turmel
Cc: David Brown, Wols Lists, Reindl Harald, Adam Goryachev, Jeff Allison, linux-raid@vger.kernel.org

2017-03-21 17:49 GMT+01:00 Phil Turmel:
> The correlation is effectively immaterial in a non-degraded raid5 and
> singly-degraded raid6 because recovery will succeed as long as any two
> errors are in different 4k block/sector locations. And for non-degraded
> raid6, all three UREs must occur in the same block/sector to lose
> data. Some participants in this discussion need to read the statistical
> description of this stuff here:
>
> http://marc.info/?l=linux-raid&m=139050322510249&w=2
>
> As long as you are 'check' scrubbing every so often (I scrub weekly),
> the odds of catastrophe on raid6 are the odds of something *else* taking
> out the machine or controller, not the odds of simultaneous drive
> failures.

This is true, but disk failures happen much more often than multiple UREs
on the same stripe. I think that with RAID6 it is much easier to lose data
to multiple disk failures.

Last year I lost a server to 4 (of 6) disk failures in less than an hour
during a rebuild. The first failure was detected in the middle of the
night: a disconnection/reconnection of a single disk. The reconnection
triggered a resync. During the resync another disk failed. RAID6 recovered
even from this double failure, but at about 60% of the rebuild the third
disk failed, bringing the whole array down. I was woken up by our
monitoring system and, looking at the server, there was also a fourth disk
down :) 4 disks down in less than an hour. All of the disks were
enterprise drives: SAS 15K, not desktop drives.
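For anyone following the URE arithmetic Phil describes above, here is a
rough back-of-envelope sketch in Python. The numbers are my own
assumptions (a spec-sheet URE rate of 1 per 1e14 bits read and a 4 TB
drive), not figures from the linked post, and it treats UREs as
independent, which is exactly the simplification being debated:

# Rough sketch only; URE rate and drive size are assumptions, not measurements.
URE_PER_BIT = 1e-14          # assumed vendor spec: 1 unrecoverable read error per 1e14 bits
SECTOR_BITS = 4096 * 8       # one 4k block/sector
DRIVE_BYTES = 4e12           # assumed 4 TB drive
SECTORS = DRIVE_BYTES / 4096

# Probability of a URE while reading one given 4k sector on one drive
p_sector = URE_PER_BIT * SECTOR_BITS

# Non-degraded raid5 / singly-degraded raid6: data loss needs two UREs at
# the SAME 4k location on two specific drives (UREs assumed independent).
p_double = p_sector ** 2

# Non-degraded raid6: data loss needs three UREs at the same 4k location.
p_triple = p_sector ** 3

print(f"P(URE in a given 4k sector, one drive)    ~ {p_sector:.1e}")
print(f"P(2 drives hit the same sector)           ~ {p_double:.1e}")
print(f"P(3 drives hit the same sector)           ~ {p_triple:.1e}")
# Crude union bound over every 4k sector of the drive:
print(f"P(2-drive overlap anywhere on the drive)  ~ {SECTORS * p_double:.1e}")
print(f"P(3-drive overlap anywhere on the drive)  ~ {SECTORS * p_triple:.1e}")

With those assumptions the 3-drive overlap needed to hurt a non-degraded
raid6 comes out around 1e-20 even summed over every sector of the drive,
which is Phil's point: something else will take out the box first.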
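And for anyone who wants to automate the weekly 'check' scrub: md exposes
it through sysfs, so a cron job can trigger it. A minimal sketch ("md0" is
just an example array name, and it needs root):

# Minimal sketch: start a 'check' scrub on one md array by writing to sysfs.
# "md0" is only an example name; this must run as root (e.g. from weekly cron).
ARRAY = "md0"

with open(f"/sys/block/{ARRAY}/md/sync_action", "w") as f:
    f.write("check\n")

# Progress can then be followed in /proc/mdstat or
# /sys/block/md0/md/sync_completed.

If I remember correctly, Debian-based systems already ship a checkarray
cron job with the mdadm package that does essentially this, so you may not
need anything custom.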