* proactive disk replacement
@ 2017-03-20 12:47 Jeff Allison
  2017-03-20 13:25 ` Reindl Harald
  ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread

From: Jeff Allison @ 2017-03-20 12:47 UTC (permalink / raw)
To: linux-raid

Hi all, I've had a poke around but have yet to find anything definitive.

I have a RAID 5 array of 4 disks amounting to approx 5.5TB. These disks are getting a bit long in the tooth, so before I run into problems I've bought 4 new disks to replace them.

I have a backup, so if it all goes west I'm covered. So I'm looking for suggestions.

My current plan is just to replace the 2TB drives with the new 3TB drives and move on. I'd like to do it online without having to trash the array and start again, so does anyone have a game plan for doing that?

Or is a 9TB RAID 5 array the wrong thing to be doing, and should I be doing something else - 6TB RAID 10 or something? I'm open to suggestions.

Cheers
Jeff

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: proactive disk replacement
  2017-03-20 12:47 proactive disk replacement Jeff Allison
@ 2017-03-20 13:25 ` Reindl Harald
  2017-03-20 14:59 ` Adam Goryachev
  2017-03-22 14:51 ` John Stoffel
  2 siblings, 0 replies; 34+ messages in thread

From: Reindl Harald @ 2017-03-20 13:25 UTC (permalink / raw)
To: Jeff Allison, linux-raid

Am 20.03.2017 um 13:47 schrieb Jeff Allison:
> Hi all, I've had a poke around but have yet to find anything definitive.
>
> I have a RAID 5 array of 4 disks amounting to approx 5.5TB. These disks are getting a bit long in the tooth, so before I run into problems I've bought 4 new disks to replace them.
>
> I have a backup, so if it all goes west I'm covered. So I'm looking for suggestions.
>
> My current plan is just to replace the 2TB drives with the new 3TB drives and move on. I'd like to do it online without having to trash the array and start again, so does anyone have a game plan for doing that?
>
> Or is a 9TB RAID 5 array the wrong thing to be doing, and should I be doing something else - 6TB RAID 10 or something? I'm open to suggestions.

You just manually fail them and replace them the same way as if they had died unexpectedly - I've done that multiple times. On machines without hot-swap bays I just power off, replace a disk, then clone the partition table and add the partitions the same way as I do when one dies (partitions, in case you didn't use the whole drives for the array).

http://bencane.com/2011/07/06/mdadm-manually-fail-a-drive/
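A sketch of that fail-and-replace cycle with mdadm and sfdisk. The array name and the sdb/sde device names are placeholders for illustration, and the script only prints the commands rather than running them - check every line against your own setup before executing anything.

```shell
#!/bin/sh
# Hypothetical devices: sdb holds the old array member (sdb1),
# sde is the new disk. Commands are printed, not executed.
MD=/dev/md0
OLD_DISK=/dev/sdb
NEW_DISK=/dev/sde

cmds="mdadm $MD --fail ${OLD_DISK}1
mdadm $MD --remove ${OLD_DISK}1
sfdisk -d $OLD_DISK > /tmp/pt.dump    # dump the old partition table
sfdisk $NEW_DISK < /tmp/pt.dump       # clone it onto the new disk
mdadm $MD --add ${NEW_DISK}1"
printf '%s\n' "$cmds"
```

Note that between the --remove and the resync triggered by --add, the array runs degraded with no redundancy - that window is the main argument against this method raised later in the thread.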
* Re: proactive disk replacement
  2017-03-20 12:47 proactive disk replacement Jeff Allison
  2017-03-20 13:25 ` Reindl Harald
@ 2017-03-20 14:59 ` Adam Goryachev
  2017-03-20 15:04 ` Reindl Harald
  2017-03-21  2:33 ` Jeff Allison
  2017-03-22 14:51 ` John Stoffel
  2 siblings, 2 replies; 34+ messages in thread

From: Adam Goryachev @ 2017-03-20 14:59 UTC (permalink / raw)
To: Jeff Allison, linux-raid

On 20/3/17 23:47, Jeff Allison wrote:
> Hi all, I've had a poke around but have yet to find anything definitive.
>
> I have a RAID 5 array of 4 disks amounting to approx 5.5TB. These disks are getting a bit long in the tooth, so before I run into problems I've bought 4 new disks to replace them.
>
> I have a backup, so if it all goes west I'm covered. So I'm looking for suggestions.
>
> My current plan is just to replace the 2TB drives with the new 3TB drives and move on. I'd like to do it online without having to trash the array and start again, so does anyone have a game plan for doing that?

Yes: do not fail a disk and then replace it; use the newer --replace method (it keeps redundancy in the array).
Even better would be to add a disk and convert to RAID6, then swap in a second disk (using replace), and so on, then remove the last old disk, grow the array to fill the 3TB, and reduce the number of disks in the raid. This way, you end up with RAID6...

> Or is a 9TB RAID 5 array the wrong thing to be doing, and should I be doing something else - 6TB RAID 10 or something? I'm open to suggestions.

I'd feel safer with RAID6, but it depends on your requirements. RAID10 is also a nice option, but, it depends...

Regards,
Adam
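For reference, a sketch of what the replace method looks like with mdadm (the --replace/--with syntax needs mdadm 3.3 or newer). Device names are placeholders, and the script prints the commands instead of executing them:

```shell
#!/bin/sh
# Hypothetical names: /dev/sdb1 is the member being retired,
# /dev/sde1 a partition on one of the new 3TB drives.
# Printed, not executed.
MD=/dev/md0
OLD=/dev/sdb1
NEW=/dev/sde1

cmds="mdadm $MD --add $NEW
mdadm $MD --replace $OLD --with $NEW
mdadm $MD --remove $OLD"
printf '%s\n' "$cmds"
```

The old member keeps serving the array until the copy onto the new one finishes, at which point md marks it faulty and it can be removed - so full redundancy is preserved the whole time, unlike fail-then-add.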
* Re: proactive disk replacement
  2017-03-20 14:59 ` Adam Goryachev
@ 2017-03-20 15:04 ` Reindl Harald
  2017-03-20 15:23 ` Adam Goryachev
  1 sibling, 1 reply; 34+ messages in thread

From: Reindl Harald @ 2017-03-20 15:04 UTC (permalink / raw)
To: Adam Goryachev, Jeff Allison, linux-raid

Am 20.03.2017 um 15:59 schrieb Adam Goryachev:
> On 20/3/17 23:47, Jeff Allison wrote:
>> My current plan is just to replace the 2TB drives with the new 3TB drives and move on. I'd like to do it online without having to trash the array and start again, so does anyone have a game plan for doing that?
> Yes: do not fail a disk and then replace it; use the newer --replace method (it keeps redundancy in the array)

How is it supposed to keep redundancy when you have to remove a disk anyway - unless you have enough slots to at least temporarily add an additional one?
* Re: proactive disk replacement
  2017-03-20 15:04 ` Reindl Harald
@ 2017-03-20 15:23 ` Adam Goryachev
  2017-03-20 16:19 ` Wols Lists
  0 siblings, 1 reply; 34+ messages in thread

From: Adam Goryachev @ 2017-03-20 15:23 UTC (permalink / raw)
To: Reindl Harald, Jeff Allison, linux-raid

On 21/3/17 02:04, Reindl Harald wrote:
> Am 20.03.2017 um 15:59 schrieb Adam Goryachev:
>> Yes: do not fail a disk and then replace it; use the newer --replace method (it keeps redundancy in the array)
>
> How is it supposed to keep redundancy when you have to remove a disk anyway - unless you have enough slots to at least temporarily add an additional one?

Yes, assuming you can (at least temporarily) add an additional disk, then you will not lose redundancy by using the replace instead of the fail/add method.

Regards,
Adam
* Re: proactive disk replacement
  2017-03-20 15:23 ` Adam Goryachev
@ 2017-03-20 16:19 ` Wols Lists
  0 siblings, 0 replies; 34+ messages in thread

From: Wols Lists @ 2017-03-20 16:19 UTC (permalink / raw)
To: Adam Goryachev, Reindl Harald, Jeff Allison, linux-raid

On 20/03/17 15:23, Adam Goryachev wrote:
> On 21/3/17 02:04, Reindl Harald wrote:
>> How is it supposed to keep redundancy when you have to remove a disk anyway - unless you have enough slots to at least temporarily add an additional one?
> Yes, assuming you can (at least temporarily) add an additional disk, then you will not lose redundancy by using the replace instead of the fail/add method.

Take a look at the raid wiki, especially this page:

https://raid.wiki.kernel.org/index.php/Replacing_a_failed_drive

Okay, it's my work (unless people have come in since and edited it), but I make a point of asking "the people who should know" to check my work if I'm at all unsure. So this will have been looked over for mistakes by various people on the list who either write the code or provide advice and support.

And yes, as you can see from that page, I'd say add a new disk then --replace it into the array. And upgrading the array to RAID6 is a good idea, but with Adam's way I think you need two extra temporary drive slots.

What I think you can do is this: on the new drives, make the underlying partition the full 3TB. You can then replace all four drives. So long as 2*3TB >= 3*2TB (don't laugh - it might not be!!!) you should be able to reduce the number of drives to three, then add the fourth back to give RAID6.

The other thing is, if you've got the space for Adam's method, you could always temporarily create a 4TB drive by combining 2*2TB in a RAID0 - probably best striped rather than linear.

Cheers,
Wol
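A sketch of the shrink-then-convert sequence described above, once all four members sit on full-3TB partitions. Everything here is illustrative: the array and member names are placeholders, the target size is deliberately left as "..." because it must be computed (and the filesystem shrunk below it) first, and each reshape needs a backup file and many hours. Printed, not executed:

```shell
#!/bin/sh
# WARNING (in the real commands): shrinking an array destroys data
# unless the filesystem has been shrunk below the new size first.
MD=/dev/md0
SPARE=/dev/sde1   # the member that gets freed up and re-added

cmds="mdadm --grow $MD --size=max          # use the full 3TB on each member
mdadm --grow $MD --array-size=...          # shrink to the 3-device capacity
mdadm --grow $MD --raid-devices=3 --backup-file=/root/md.bak
mdadm $MD --add $SPARE
mdadm --grow $MD --level=6 --raid-devices=4 --backup-file=/root/md.bak"
printf '%s\n' "$cmds"
```

The end state is a 4-device RAID6 of 3TB members: the same 6TB usable as the old 4x2TB RAID5, but with double parity.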
* Re: proactive disk replacement
  2017-03-20 14:59 ` Adam Goryachev
  2017-03-20 15:04 ` Reindl Harald
@ 2017-03-21  2:33 ` Jeff Allison
  2017-03-21  9:54 ` Reindl Harald
  1 sibling, 1 reply; 34+ messages in thread

From: Jeff Allison @ 2017-03-21 2:33 UTC (permalink / raw)
To: Adam Goryachev; +Cc: linux-raid

I don't have a spare SATA slot. I do however have a spare USB carrier - is that fast enough to be used temporarily?

On 21 March 2017 at 01:59, Adam Goryachev <mailinglists@websitemanagers.com.au> wrote:
> Yes: do not fail a disk and then replace it; use the newer --replace method (it keeps redundancy in the array).
> Even better would be to add a disk and convert to RAID6, then swap in a second disk (using replace), and so on, then remove the last old disk, grow the array to fill the 3TB, and reduce the number of disks in the raid. This way, you end up with RAID6...
> I'd feel safer with RAID6, but it depends on your requirements. RAID10 is also a nice option, but, it depends...
* Re: proactive disk replacement
  2017-03-21  2:33 ` Jeff Allison
@ 2017-03-21  9:54 ` Reindl Harald
  2017-03-21 10:54 ` Adam Goryachev
  2017-03-21 13:02 ` David Brown
  0 siblings, 2 replies; 34+ messages in thread

From: Reindl Harald @ 2017-03-21 9:54 UTC (permalink / raw)
To: Jeff Allison, Adam Goryachev; +Cc: linux-raid

Am 21.03.2017 um 03:33 schrieb Jeff Allison:
> I don't have a spare SATA slot. I do however have a spare USB carrier - is that fast enough to be used temporarily?

USB3 yes; USB2 forget it, because the speed of the array depends on the slowest disk in the set.

And about RAID5/RAID6 versus RAID10: both RAID5 and RAID6 suffer from the same problems - during a rebuild you have a lot of random-IO load on all remaining disks, which leads to bad performance and makes it more likely that another disk fails before the rebuild is finished. RAID6 produces even more random IO because of the double parity, and if you hit an Unrecoverable Read Error on RAID5 you are dead; RAID6 is not much better here, and a URE becomes more likely with larger disks.

RAID10: little to zero performance impact during a rebuild and no random IO caused by the rebuild - it's just "read one disk from start to end and write the data linearly to another disk", so the only head movement on your disks is the normal workload on the array.

With disks 2TB or larger the conclusion is "do not use RAID5/6 anymore, and if you do, be prepared not to survive a rebuild caused by a failed disk".
* Re: proactive disk replacement
  2017-03-21  9:54 ` Reindl Harald
@ 2017-03-21 10:54 ` Adam Goryachev
  2017-03-21 11:03 ` Reindl Harald
  2017-03-21 11:55 ` Gandalf Corvotempesta
  1 sibling, 2 replies; 34+ messages in thread

From: Adam Goryachev @ 2017-03-21 10:54 UTC (permalink / raw)
To: Reindl Harald, Jeff Allison; +Cc: linux-raid

On 21/3/17 20:54, Reindl Harald wrote:
> USB3 yes; USB2 forget it, because the speed of the array depends on the slowest disk in the set.
>
> And about RAID5/RAID6 versus RAID10: both RAID5 and RAID6 suffer from the same problems - during a rebuild you have a lot of random-IO load on all remaining disks, which leads to bad performance and makes it more likely that another disk fails before the rebuild is finished. RAID6 produces even more random IO because of the double parity, and if you hit an Unrecoverable Read Error on RAID5 you are dead; RAID6 is not much better here, and a URE becomes more likely with larger disks.
>
> RAID10: little to zero performance impact during a rebuild and no random IO caused by the rebuild - it's just "read one disk from start to end and write the data linearly to another disk", so the only head movement on your disks is the normal workload on the array.
>
> With disks 2TB or larger the conclusion is "do not use RAID5/6 anymore, and if you do, be prepared not to survive a rebuild caused by a failed disk".

I can't say I'm an expert in this, but in actual fact I disagree with both your arguments against RAID6...

You say recovery on a RAID10 is a simple linear read from one drive (the surviving member of the RAID1 portion) and a linear write on the other (the replaced drive). You also declare that there is no random IO with normal workload + recovery. I think you have forgotten that the "normal workload" is probably random IO, and certainly once it is combined with the recovery IO the result will be random IO.

In addition, you claim that a drive larger than 2TB is almost certain to suffer a URE during recovery, yet this is exactly the situation you will be in when trying to recover a RAID10 with member devices of 2TB or larger. A single URE on the surviving half of the RAID1 pair will cost you the entire RAID10 array. On the other hand, 3 UREs on the three remaining members of the RAID6 will not cause more than a hiccup (as long as no two of them land on the same stripe, which I would argue is exceptionally unlikely).

In addition, with a 4 disk RAID6 you have a 100% chance of surviving a 2 drive failure without data loss, yet with a 4 disk RAID10 you have only a 2-in-3 chance of surviving a 2 drive failure (the array is lost if the second failure hits the first drive's mirror).

Sure, there are other things to consider (performance, cost, etc) but on the reliability point RAID6 seems the far better option.

Regards,
Adam
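The URE numbers the two sides keep arguing about can be put on a rough footing. Assuming the commonly quoted consumer-drive spec of one unrecoverable read error per 1e14 bits read (an assumption - check the datasheet; many drives are rated 1e15), the chance of at least one URE while reading a single 2TB disk end to end works out to roughly 15%:

```shell
#!/bin/sh
# Back-of-envelope URE probability for one full 2TB read, assuming
# a 1-in-1e14-bits error rate (an assumed spec, not a measurement).
p_pct=$(awk 'BEGIN {
  bits = 2.0e12 * 8                       # bits in 2 TB read end to end
  p    = 1 - exp(bits * log(1 - 1e-14))   # P(at least one URE)
  printf "%.1f", p * 100
}')
echo "P(>=1 URE reading one 2TB disk) ~= ${p_pct}%"
```

That is high enough to take seriously, but nowhere near the certainty implied by "you won't survive a rebuild" - and, as Adam notes, it applies just as much to the single surviving mirror of a RAID10 pair as to the members of a degraded RAID6.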
* Re: proactive disk replacement
  2017-03-21 10:54 ` Adam Goryachev
@ 2017-03-21 11:03 ` Reindl Harald
  2017-03-21 11:34 ` Andreas Klauer
  ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread

From: Reindl Harald @ 2017-03-21 11:03 UTC (permalink / raw)
To: Adam Goryachev, Jeff Allison; +Cc: linux-raid

Am 21.03.2017 um 11:54 schrieb Adam Goryachev:
> I can't say I'm an expert in this, but in actual fact I disagree with both your arguments against RAID6...
>
> You say recovery on a RAID10 is a simple linear read from one drive (the surviving member of the RAID1 portion) and a linear write on the other (the replaced drive). You also declare that there is no random IO with normal workload + recovery. I think you have forgotten that the "normal workload" is probably random IO, and certainly once it is combined with the recovery IO the result will be random IO.

But the point is that with RAID5/6 the recovery itself is *heavy random IO*, and that gets *combined* with the random IO of the normal workload - which means *heavy load on the disks*.

> In addition, you claim that a drive larger than 2TB is almost certain to suffer a URE during recovery, yet this is exactly the situation you will be in when trying to recover a RAID10 with member devices of 2TB or larger. A single URE on the surviving half of the RAID1 pair will cost you the entire RAID10 array. On the other hand, 3 UREs on the three remaining members of the RAID6 will not cause more than a hiccup (as long as no two of them land on the same stripe, which I would argue is exceptionally unlikely).

Given that your disks are all the same age, errors on another disk become more likely once one has failed, and the heavy disk IO during recovery of a RAID6 takes *many hours* with heavy IO on *all disks*, compared with a much faster restore on RAID1/10 - guess in which case a URE is more likely.

Additionally, why should the whole array fail just because a single block got lost? There is no parity that needs to be calculated; you just lost a single block somewhere - RAID1/10 are way simpler in their implementation.

> In addition, with a 4 disk RAID6 you have a 100% chance of surviving a 2 drive failure without data loss, yet with a 4 disk RAID10 you have only a 2-in-3 chance of surviving a 2 drive failure.

Yeah, and you *need that* when it takes many hours or a few days until your 8TB RAID6 is resynced, while the whole time *all disks* are under heavy stress.

> Sure, there are other things to consider (performance, cost, etc) but on the reliability point RAID6 seems the far better option.

*No* - it takes twice as long to recalculate from parity and stresses the remaining disks twice as hard as RAID5, so you could pretty soon end up having lost both of the disks you can afford to lose while many hours of recovery time still remain.

Here you go: http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/
* Re: proactive disk replacement
  2017-03-21 11:03 ` Reindl Harald
@ 2017-03-21 11:34 ` Andreas Klauer
  2017-03-21 12:03 ` Reindl Harald
  1 sibling, 1 reply; 34+ messages in thread

From: Andreas Klauer @ 2017-03-21 11:34 UTC (permalink / raw)
To: Reindl Harald; +Cc: Adam Goryachev, Jeff Allison, linux-raid

On Tue, Mar 21, 2017 at 12:03:51PM +0100, Reindl Harald wrote:
> But the point is that with RAID5/6 the recovery itself is *heavy random IO*, and that gets *combined* with the random IO of the normal workload - which means *heavy load on the disks*.

Where do you get that random I/O idea from? A rebuild is linear. Or what do you mean by random I/O in this context (RAID rebuilds)? What kind of random things do you think the RAID is doing?

If you see read errors during a rebuild, the most common cause is that the rebuild also happens to be the first read test since forever. (Happens to be the case for people who don't do any disk monitoring.)

> Here you go: http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/

This is just wrong.

Regards
Andreas Klauer
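The monitoring gap Andreas describes is what regular scrubbing closes: md can be told to read-test the whole array on a schedule, so latent UREs are found (and rewritten from redundancy) long before a rebuild depends on those sectors. A sketch against the standard md sysfs interface, with md0 as a placeholder array name; printed rather than executed, since a check reads every sector and takes hours:

```shell
#!/bin/sh
# Start a full read/parity scrub of the array, then inspect the
# mismatch counter once it completes. Printed, not executed.
MD=md0
cmds="echo check > /sys/block/$MD/md/sync_action
cat /sys/block/$MD/md/mismatch_cnt   # read after the check finishes"
printf '%s\n' "$cmds"
```

Many distributions already ship something like this as a monthly cron job (e.g. the checkarray script in Debian's mdadm package).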
* Re: proactive disk replacement
  2017-03-21 11:34 ` Andreas Klauer
@ 2017-03-21 12:03 ` Reindl Harald
  2017-03-21 12:41 ` Andreas Klauer
  0 siblings, 1 reply; 34+ messages in thread

From: Reindl Harald @ 2017-03-21 12:03 UTC (permalink / raw)
To: Andreas Klauer; +Cc: Adam Goryachev, Jeff Allison, linux-raid

Am 21.03.2017 um 12:34 schrieb Andreas Klauer:
> Where do you get that random I/O idea from? A rebuild is linear. Or what do you mean by random I/O in this context (RAID rebuilds)? What kind of random things do you think the RAID is doing?

The IO of a RAID5/6 rebuild is hardly linear, because the information (data + parity) is spread all over the disks, while in the case of RAID1/10 it really is linear.

> If you see read errors during a rebuild, the most common cause is that the rebuild also happens to be the first read test since forever. (Happens to be the case for people who don't do any disk monitoring.)
>
>> Here you go: http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/
>
> This is just wrong.
* Re: proactive disk replacement
  2017-03-21 12:03 ` Reindl Harald
@ 2017-03-21 12:41 ` Andreas Klauer
  2017-03-22  4:16 ` NeilBrown
  0 siblings, 1 reply; 34+ messages in thread

From: Andreas Klauer @ 2017-03-21 12:41 UTC (permalink / raw)
To: Reindl Harald; +Cc: Adam Goryachev, Jeff Allison, linux-raid

On Tue, Mar 21, 2017 at 01:03:22PM +0100, Reindl Harald wrote:
> The IO of a RAID5/6 rebuild is hardly linear, because the information (data + parity) is spread all over the disks.

It's not "randomly" spread all over. The blocks are always where they belong.

https://en.wikipedia.org/wiki/Standard_RAID_levels#/media/File:RAID_6.svg

It's AAAA, BBBB, CCCC, DDDD. Not DBCA, BADC, ADBC, ...

There is no random I/O involved here; at worst it will decide not to read a parity block because it's not needed, but that does not cause huge/random jumps for the HDD read heads.

> while in the case of RAID1/10 it really is linear

Actually RAID 10 has the most interesting layout choices... to this day mdadm is unable to grow/convert some of these. In a RAID 10 rebuild the HDD might have to jump from the end to the start.

Of course, if you consider metadata updates (progress has to be recorded somewhere?) then ALL rebuilds, regardless of RAID level, are random I/O in a way. But such is the fate of a HDD; it's their bread and butter. Any server that does anything other than "idle" does random I/O 24/7.

If there were no other I/O (because the RAID is live during the rebuild) and no metadata updates (or external metadata), you could totally do RAID0/1/5/6 rebuilds with tape drives. That's how random it is. RAID10 might need a rewind in between.

Regards
Andreas Klauer
* Re: proactive disk replacement
  2017-03-21 12:41 ` Andreas Klauer
@ 2017-03-22  4:16 ` NeilBrown
  0 siblings, 0 replies; 34+ messages in thread

From: NeilBrown @ 2017-03-22 4:16 UTC (permalink / raw)
To: Andreas Klauer, Reindl Harald; +Cc: Adam Goryachev, Jeff Allison, linux-raid

On Tue, Mar 21 2017, Andreas Klauer wrote:
> It's not "randomly" spread all over. The blocks are always where they belong.
>
> https://en.wikipedia.org/wiki/Standard_RAID_levels#/media/File:RAID_6.svg
>
> It's AAAA, BBBB, CCCC, DDDD. Not DBCA, BADC, ADBC, ...
>
> There is no random I/O involved here; at worst it will decide not to read a parity block because it's not needed, but that does not cause huge/random jumps for the HDD read heads.

RAID5 resync (after an unclean shutdown) does read the parity. It reads all devices in parallel and checks parity. Normally all the parity is correct, so it doesn't write at all. Occasionally there might be incorrect parity, in which case the head will seek back and write the correct parity.

RAID5 recovery (when a device has been removed and a new device added) reads all the *other* devices in parallel, calculates the missing block (parity or data) and writes it out to the replacement device.

All reads and writes are sequential.

NeilBrown
* Re: proactive disk replacement 2017-03-21 11:03 ` Reindl Harald 2017-03-21 11:34 ` Andreas Klauer @ 2017-03-21 11:56 ` Adam Goryachev 2017-03-21 12:10 ` Reindl Harald 2017-03-21 13:13 ` David Brown 2 siblings, 1 reply; 34+ messages in thread From: Adam Goryachev @ 2017-03-21 11:56 UTC (permalink / raw) To: Reindl Harald, Jeff Allison; +Cc: linux-raid Sorry, but I'm just seeing scaremongering and things that don't compute. Possibly I'm just not seeing it, but I don't see your advise being given by a majority of "experts" either on this list or elsewhere. I'll try to refrain from responding beyond this one, and return to lurking and hopefully learning more. Also, please note that the quoting / attribution seems to be wrong (inverted). On 21/3/17 22:03, Reindl Harald wrote: > > Am 21.03.2017 um 11:54 schrieb Adam Goryachev: >> On 21/3/17 20:54, Reindl Harald wrote: >>> and about RAID5/RAID6 versus RAID10: both RAID5 and RAID6 suffer from >>> the same problems - due rebuild you have a lot of random-IO load on >>> all remaining disks which leads in bad performance and make it more >>> likely that before the rebuild is finished another disk fails, RAID6 >>> produces even more random IO because of the double parity and if you >>> have a Unrecoverable-Read-Error on RAID5 you are dead, RAID6 is not >>> much better here and the probability of a URE becomes more likely with >>> larger disks >>> >>> RAID10: less to zero performance impact due rebuild and no random-IO >>> caused by the rebuild, it's just "read a disk from start to end and >>> write the data on another disk linear" while the only head moves on >>> your disks is the normal workload on the array >>> >>> with disks 2 TB or larger you can make the conclusion "do not use >>> RAID5/6 anymore and when you do be prepared that you won't survive a >>> rebuild caused by a failed disk" >>> >> I can't say I'm an expert in this, but in actual fact, I disagree with >> both your arguments against RAID6... 
>> You say recovery on a RAID10 is a simple linear read from one drive (the >> surviving member of the RAID1 portion) and a linear write on the other >> (the replaced drive). You also declare that there is no random IO with >> normal work load + recovery. I think you have forgotten that the "normal >> workload" is probably random IO, but certainly once combined with the >> recovery IO then it will be random IO. > > but the point is that with RAID5/6 the recovery itself is *heavy > random IO* and that gets *combined* with the random IO of the normal > workload and that means *heavy load on the disks* random IO is the same as random IO, regardless of the "cause" of making the IO random. In most systems, you won't be running anywhere near the IO limits, so allowing your recovery some portion of IO is not an issue. > >> In addition, you claim that a drive larger than 2TB is almost certainly >> going to suffer from a URE during recovery, yet this is exactly the >> situation you will be in when trying to recover a RAID10 with member >> devices 2TB or larger. A single URE on the surviving portion of the >> RAID1 will cause you to lose the entire RAID10 array. On the other hand, >> 3 UREs on the three remaining members of the RAID6 will not cause more >> than a hiccup (as long as no more than one URE on the same stripe, which >> I would argue is ... exceptionally unlikely). > > given that your disks have the same age, errors on another disk > become more likely when one has failed, and the heavy disk IO during recovery > of a RAID6 takes *many hours* with heavy IO on *all > disks*, compared with a far faster restore of RAID1/10 - guess in which > case a URE is more likely > UREs are based on the amount of data read, and that isn't cumulative; every block read starts again with the same chance. If the lottery odds are 100:1, buying 100 tickets doesn't mean you will win at least once. 
So reading 200,000,000 blocks also doesn't ensure you will see a URE (equally, you just might be lucky and win the lottery more than once, and get more than one URE). In any case, if you only have a single source of data, then you are more likely to lose it (this is one of the reasons for RAID and backups). So RAID6, which stores your data in more than one location (during a drive failure event), is better. BTW, just because you say that you will suffer a URE under heavy load doesn't make it true. The load factor doesn't change the frequency of a URE (even though it sounds possible). > additionally why should the whole array fail just because a single > block gets lost? there is no parity which needs to be calculated, you > just lost a single block somewhere - RAID1/10 are way easier in their > implementation Equally, in the worst case, multiple UREs on the same stripe in RAID6 only lose a single stripe (ok, a stripe is bigger than a block, but still much less likely to occur anyway). > >> In addition, with a 4 disk RAID6 you have a 100% chance of surviving a 2 >> drive failure without data loss, yet with 4 disk RAID10 you have a 50% >> chance of surviving a 2 drive failure. > > yeah and you *need that* when it takes many hours or a few days until > your 8 TB RAID6 is resynced while the whole time *all disks* are under > heavy stress Why are all disks under heavy stress? Again, you don't operate (under normal conditions) at a heavy stress level, you need room to grow, and also peak load is going to be higher but for short duration. Normal activity might be 50% of maximum, degraded performance together with recovery might push that to 80%, but disks (decent ones) are not going to have a problem doing simple read/write activity, that is what they are designed for, right? 
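Adam's per-block argument can be made concrete with a quick calculation. This is an illustrative sketch only: it assumes the common datasheet-style URE rate of one error per 1e14 bits read and treats every bit as an independent trial, which real drives don't strictly obey.

```python
import math

# Illustrative only: a datasheet-style URE rate of 1 error per 1e14 bits
# read, applied independently per bit (a simplification of real drives).
def p_at_least_one_ure(bytes_read, ure_rate_per_bit=1e-14):
    bits = bytes_read * 8
    # P(no URE) = (1 - rate)^bits, computed via log1p to avoid underflow
    return 1 - math.exp(bits * math.log1p(-ure_rate_per_bit))

for tb in (2, 6, 12):
    p = p_at_least_one_ure(tb * 10**12)
    print(f"reading {tb} TB -> P(at least one URE) ~ {p:.1%}")
```

Under these assumed numbers the chance of at least one URE comes out around 15% for a 2 TB read and around 62% for 12 TB: substantial, but a probability, not the near-certainty sometimes claimed, which is exactly the point being made above.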
> >> Sure, there are other things to consider (performance, cost, etc) but on >> a reliability point, RAID6 seems to be the far better option > > *no* - it takes twice as long to recalculate from parity and stresses > the remaining disks twice as hard as RAID5, and so you pretty soon end > up losing both of the disks you can lose before the array goes down, > while you still have many hours of recovery time remaining > > here you go: > http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/ That was written in 2010; 2019 is only 2 years away (unless you meant 2029 and it was a typo), and I don't see evidence of that being true nor becoming true in such a short time. We don't see many (any?) people trying to recover their RAID6 arrays with double URE failures. You say it takes twice as long to recalculate from parity for RAID6 compared to RAID5, but with CPU performance, this is still faster than the drive speed (unless you have NVMe or some SSDs, but then I assume the whole URE issue is different there anyway). Also, why do you think it stresses the disks twice as hard as RAID5? To recover a RAID5 you need a full read of all surviving drives, that's 100% read. To recover a RAID6 you need a full read of all remaining drives minus one, so that is less than 100% read. So why are you "stressing the remaining disks twice as hard"? Also, why does a URE equal losing a disk? All you do is read that block from another member in the array, and fix the URE at the same time. If anything, you might suggest triple mirror RAID (what is that called? RAID110?) If I were to believe you, then that is the only sensible option: with triple mirror, when you lose any one drive, you may recover by simply reading from the surviving members, and you are no worse off under any scenario. Even after losing any two drives you are still protected, and potentially you can lose up to 4 drives without data loss (assuming a minimum of 6 drives). However, cost is a factor here. 
Finally, other than RAID110 (really, what is this called?) do you have any other sensible suggestions? RAID10 just doesn't seem to be it, and zfs doesn't seem to be mainstream enough either, same with btrfs and other FS's which can do various checksum/redundant data storage. PS, In case you are wondering, I am still running 8 drive RAID5 in real life workloads, and don't have any problems with data loss (albeit, I do use DRBD to replicate the data between two systems with RAID5 each, so you can call that RAID51 perhaps, but the point remains, I've never (yet) lost an entire RAID5 array due to multiple drive failure or URE's). ^ permalink raw reply [flat|nested] 34+ messages in thread
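The "chance of surviving a 2 drive failure" figures traded back and forth above can be checked by brute-force enumeration. A minimal sketch, assuming the 4-disk RAID10 is laid out as two mirrored pairs (md's near-2 layout reduces to this for 4 disks): the enumeration gives 4 survivals out of 6 equally likely two-disk failures, i.e. about 67% rather than the 50% quoted, though the qualitative comparison stands since a 4-disk RAID6 survives all 6 cases.

```python
from itertools import combinations

# 4-disk RAID10 modelled as two mirrored pairs: (0,1) and (2,3).
# The array survives a two-disk failure unless both failed disks
# belong to the same pair; a 4-disk RAID6 survives ANY two failures.
PAIRS = [(0, 1), (2, 3)]

def raid10_survives(failed):
    return not any(set(pair) <= set(failed) for pair in PAIRS)

outcomes = list(combinations(range(4), 2))   # all 6 two-disk failure sets
survived = [f for f in outcomes if raid10_survives(f)]
print(f"RAID10 survives {len(survived)} of {len(outcomes)} cases")  # 4 of 6
```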
* Re: proactive disk replacement 2017-03-21 11:56 ` Adam Goryachev @ 2017-03-21 12:10 ` Reindl Harald 0 siblings, 0 replies; 34+ messages in thread From: Reindl Harald @ 2017-03-21 12:10 UTC (permalink / raw) To: Adam Goryachev, Jeff Allison; +Cc: linux-raid Am 21.03.2017 um 12:56 schrieb Adam Goryachev: > Sorry, but I'm just seeing scaremongering and things that don't compute. > Possibly I'm just not seeing it, but I don't see your advice being given > by a majority of "experts" either on this list or elsewhere. I'll try to > refrain from responding beyond this one, and return to lurking and > hopefully learning more. > > Also, please note that the quoting / attribution seems to be wrong > (inverted). only in your mail client > On 21/3/17 22:03, Reindl Harald wrote: >> Am 21.03.2017 um 11:54 schrieb Adam Goryachev: >> but the point is that with RAID5/6 the recovery itself is *heavy >> random IO* and that gets *combined* with the random IO of the normal >> workload and that means *heavy load on the disks* > random IO is the same as random IO, regardless of the "cause" of making > the IO random no - it's a matter of *how much* random IO you have - when the rebuild process needs to seek for parity and remaining data blocks, and hence produces heavy head movement the whole time, this is added to the IO of the normal workload in case of a RAID1/10 rebuild the rebuild process itself is just a linear read and the only head movement on the disks is the normal workload on the array > In most systems, you won't be running anywhere near the IO limits, so > allowing your recovery some portion of IO is not an issue IO limits don't matter here when we talk about IOPS and the drive head moves around heavily all the time, because parity and data blocks for the restore are spread all over the disk *and* the requested workload data is also somewhere else in case of a RAID1/10 rebuild you have linear IO all the time, only occasionally interrupted by the workload on the array - that's a 
completely different stress level for a disk compared with seeking for hours and days for the parity and data needed to restore the failed disk ^ permalink raw reply [flat|nested] 34+ messages in thread
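The linear-versus-seeky distinction argued here can be put into rough numbers. A back-of-envelope sketch with assumed figures (150 MB/s sequential throughput, 8 ms per seek, 512 KiB chunks; none of these values come from the thread) comparing a pure sequential rebuild against a worst case where every chunk read costs a seek:

```python
# Back-of-envelope with assumed round numbers (not measurements):
# a 2 TB member rebuilt as one sequential pass, versus a rebuild whose
# every chunk read costs a head seek because other IO intervenes.
DISK_BYTES = 2 * 10**12
SEQ_BPS = 150 * 10**6      # assumed sequential throughput, 150 MB/s
SEEK_S = 0.008             # assumed average seek + rotational latency
CHUNK = 512 * 1024         # assumed chunk size

transfer_s = DISK_BYTES / SEQ_BPS
seq_hours = transfer_s / 3600
seeky_hours = (transfer_s + (DISK_BYTES / CHUNK) * SEEK_S) / 3600

print(f"pure sequential rebuild: ~{seq_hours:.1f} h")
print(f"one seek per chunk:      ~{seeky_hours:.1f} h")
```

Under these assumptions the seek-dominated rebuild takes roughly three times as long (~12 h versus ~4 h), which illustrates both positions: seeks do stretch the rebuild window considerably, but the drives are still doing ordinary reads the whole time.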
* Re: proactive disk replacement 2017-03-21 11:03 ` Reindl Harald 2017-03-21 11:34 ` Andreas Klauer 2017-03-21 11:56 ` Adam Goryachev @ 2017-03-21 13:13 ` David Brown 2017-03-21 13:24 ` Reindl Harald 2 siblings, 1 reply; 34+ messages in thread From: David Brown @ 2017-03-21 13:13 UTC (permalink / raw) To: Reindl Harald, Adam Goryachev, Jeff Allison; +Cc: linux-raid On 21/03/17 12:03, Reindl Harald wrote: > > > Am 21.03.2017 um 11:54 schrieb Adam Goryachev: <snip> > >> In addition, you claim that a drive larger than 2TB is almost certainly >> going to suffer from a URE during recovery, yet this is exactly the >> situation you will be in when trying to recover a RAID10 with member >> devices 2TB or larger. A single URE on the surviving portion of the >> RAID1 will cause you to lose the entire RAID10 array. On the other hand, >> 3 URE's on the three remaining members of the RAID6 will not cause more >> than a hiccup (as long as no more than one URE on the same stripe, which >> I would argue is ... exceptionally unlikely). > > given that when your disks have the same age errors on another disk > become more likely when one failed and the heavy disk IO due recovery of > a RAID6 with takes *many hours* where you have heavy IO on *all disks* > compared with a way faster restore of RAID1/10 guess in which case a URE > is more likely > > additionally why should the whole array fail just because a single block > get lost? the is no parity which needs to be calculated, you just lost a > single block somewhere - RAID1/10 are way easier in their implementation If you have RAID1, and you have an URE, then the data can be recovered from the other half of that RAID1 pair. If you have had a disk failure (a manual fail for replacement, or a real failure), and you get an URE on the other half of that pair, then you lose data. With RAID6, you need an additional failure (either another full disk failure or an URE in the /same/ stripe) to lose data. 
RAID6 has higher redundancy than two-way RAID1 - of this there is /no/ doubt. > >> In addition, with a 4 disk RAID6 you have a 100% chance of surviving a 2 >> drive failure without data loss, yet with 4 disk RAID10 you have a 50% >> chance of surviving a 2 drive failure. > > yeah and you *need that* when it takes many hours ot a few days until > your 8 TB RAID6 is resynced while the whole time *all disks* are under > heavy stress > >> Sure, there are other things to consider (performance, cost, etc) but on >> a reliability point, RAID6 seems to be the far better option > > *no* - it takes twice as long to recalculate from parity and stresses > the remaining disks twice as hard as RAID5 and so you pretty soon end > with lost both of the disk you can lose without the array goes down > while you still have many hours remaining recovery time For RAID5 and RAID6, you read the same data - the full data stripe. For RAID5, you calculate and write a single parity block, while for RAID6 you calculate and write an additional parity block. The disk reads are the same in both cases, but you write out twice as many blocks. You do not stress the disks noticeably harder with RAID6 than with RAID5. > > here you go: http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/ This is an article heavily based on a Sun engineer trying to promote his own alternative using scaremongering. It is, however, correct in suggesting that RAID6 is more reliable than RAID5. And triple-parity raid (or additional layered RAID) is more reliable than RAID6. Nowhere does it suggest that RAID1 is more reliable than RAID6. It all boils down to the redundancy level. Two-drive RAID1 pairs have a single drive redundancy. RAID5 has a single drive redundancy. RAID6 has two drive redundancy - thus it is more reliable and will tolerate more failures before losing data. 
If this is not enough, and you don't have triple parity RAID (it is not yet implemented in md - one day, perhaps), you can use more mirrors on RAID1 or use layers such as a RAID5 array built on RAID1 pairs. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: proactive disk replacement 2017-03-21 13:13 ` David Brown @ 2017-03-21 13:24 ` Reindl Harald 2017-03-21 14:15 ` David Brown 0 siblings, 1 reply; 34+ messages in thread From: Reindl Harald @ 2017-03-21 13:24 UTC (permalink / raw) To: David Brown, Adam Goryachev, Jeff Allison; +Cc: linux-raid Am 21.03.2017 um 14:13 schrieb David Brown: > On 21/03/17 12:03, Reindl Harald wrote: >> >> Am 21.03.2017 um 11:54 schrieb Adam Goryachev: > <snip> >> >>> In addition, you claim that a drive larger than 2TB is almost certainly >>> going to suffer from a URE during recovery, yet this is exactly the >>> situation you will be in when trying to recover a RAID10 with member >>> devices 2TB or larger. A single URE on the surviving portion of the >>> RAID1 will cause you to lose the entire RAID10 array. On the other hand, >>> 3 URE's on the three remaining members of the RAID6 will not cause more >>> than a hiccup (as long as no more than one URE on the same stripe, which >>> I would argue is ... exceptionally unlikely). >> >> given that when your disks have the same age errors on another disk >> become more likely when one failed and the heavy disk IO due recovery of >> a RAID6 with takes *many hours* where you have heavy IO on *all disks* >> compared with a way faster restore of RAID1/10 guess in which case a URE >> is more likely >> >> additionally why should the whole array fail just because a single block >> get lost? the is no parity which needs to be calculated, you just lost a >> single block somewhere - RAID1/10 are way easier in their implementation > > If you have RAID1, and you have an URE, then the data can be recovered > from the other have of that RAID1 pair. If you have had a disk failure > (manual for replacement, or a real failure), and you get an URE on the > other half of that pair, then you lose data. > > With RAID6, you need an additional failure (either another full disk > failure or an URE in the /same/ stripe) to lose data. 
RAID6 has higher > redundancy than two-way RAID1 - of this there is /no/ doubt yes, but with RAID5/RAID6 *all disks* are involved in the rebuild; with a 10 disk RAID10 only one disk needs to be read and the data written to the new one - all other disks are not involved in the resync at all for most arrays the disks have a similar age and usage pattern, so when the first one fails it becomes likely that it won't take too long before another one does, and so load and recovery time matter ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: proactive disk replacement 2017-03-21 13:24 ` Reindl Harald @ 2017-03-21 14:15 ` David Brown 2017-03-21 15:25 ` Wols Lists 0 siblings, 1 reply; 34+ messages in thread From: David Brown @ 2017-03-21 14:15 UTC (permalink / raw) To: Reindl Harald, Adam Goryachev, Jeff Allison; +Cc: linux-raid On 21/03/17 14:24, Reindl Harald wrote: > > > Am 21.03.2017 um 14:13 schrieb David Brown: >> On 21/03/17 12:03, Reindl Harald wrote: >>> >>> Am 21.03.2017 um 11:54 schrieb Adam Goryachev: >> <snip> >>> >>>> In addition, you claim that a drive larger than 2TB is almost certainly >>>> going to suffer from a URE during recovery, yet this is exactly the >>>> situation you will be in when trying to recover a RAID10 with member >>>> devices 2TB or larger. A single URE on the surviving portion of the >>>> RAID1 will cause you to lose the entire RAID10 array. On the other >>>> hand, >>>> 3 URE's on the three remaining members of the RAID6 will not cause more >>>> than a hiccup (as long as no more than one URE on the same stripe, >>>> which >>>> I would argue is ... exceptionally unlikely). >>> >>> given that when your disks have the same age errors on another disk >>> become more likely when one failed and the heavy disk IO due recovery of >>> a RAID6 with takes *many hours* where you have heavy IO on *all disks* >>> compared with a way faster restore of RAID1/10 guess in which case a URE >>> is more likely >>> >>> additionally why should the whole array fail just because a single block >>> get lost? the is no parity which needs to be calculated, you just lost a >>> single block somewhere - RAID1/10 are way easier in their implementation >> >> If you have RAID1, and you have an URE, then the data can be recovered >> from the other have of that RAID1 pair. If you have had a disk failure >> (manual for replacement, or a real failure), and you get an URE on the >> other half of that pair, then you lose data. 
>> >> With RAID6, you need an additional failure (either another full disk >> failure or an URE in the /same/ stripe) to lose data. RAID6 has higher >> redundancy than two-way RAID1 - of this there is /no/ doubt > > yes, but with RAID5/RAID6 *all disks* are involved in the rebuild, with > a 10 disk RAID10 only one disk needs to be read and the data written to > the new one - all other disks are not involved in the resync at all True... > > for most arrays the disks have a similar age and usage pattern, so when > the first one fails it becomes likely that it don't take too long for > another one and so load and recovery time matters False. There is no reason to suspect that - certainly not to within the hours or day it takes to rebuild your array. Disk failure patterns show a peak within the first month or so (failures due to manufacturing or handling), then a very low error rate for a few years, then a gradually increasing rate after that. There is not a very significant correlation between drive failures within the same system, nor is there a very significant correlation between usage and failures. It might seem reasonable to suspect that a drive is more likely to fail during a rebuild since the disk is being heavily used, but that does not appear to be the case in practice. You will /spot/ more errors at that point - simply because you don't see errors in parts of the disk that are not read - but the rebuilding does not cause them. And even if it /were/ true, the key point is whether an error causes data loss. An error during reading for a RAID1 rebuild means lost data. An error during reading for a RAID6 rebuild means you have to read an extra sector from another disk and correct the mistake. 
* Re: proactive disk replacement 2017-03-21 14:15 ` David Brown @ 2017-03-21 15:25 ` Wols Lists 2017-03-21 15:41 ` David Brown 0 siblings, 1 reply; 34+ messages in thread From: Wols Lists @ 2017-03-21 15:25 UTC (permalink / raw) To: David Brown, Reindl Harald, Adam Goryachev, Jeff Allison; +Cc: linux-raid On 21/03/17 14:15, David Brown wrote: >> for most arrays the disks have a similar age and usage pattern, so when >> > the first one fails it becomes likely that it don't take too long for >> > another one and so load and recovery time matters > False. There is no reason to suspect that - certainly not to within the > hours or day it takes to rebuild your array. Disk failure pattern shows > a peak within the first month or so (failures due to manufacturing or > handling), then a very low error rate for a few years, then a gradually > increasing rate after that. There is not a very significant correlation > between drive failures within the same system, nor is there a very > significant correlation between usage and failures. Except your argument and the claim don't match. You're right - disk failures follow the pattern you describe. BUT. If the array was created from completely new disks, then the usage patterns will be very similar, therefore there will be a statistical correlation between failures as compared to the population as a whole. (Bit like a false DNA match is much higher in an inbred town, than in a cosmopolitan city of immigrants.) EVEN WORSE. The probability of all the drives coming off the same batch, and sharing the same systematic defects, is much much higher. One only has to look at the Seagate 3TB Barracuda mess to see a perfect example. In other words, IFF your array is built of a bunch of identical drives all bought at the same time, the risk of multiple failure is significantly higher. How significant that is I don't know, but it is a very valid reason for replacing your drives at semi-random intervals. 
(Completely off topic :-) but a real-world demonstrable example is couples' initials. "Like chooses like" and if you compare a couple's first initials against what you would expect from a random sample, there is a VERY significant spike in couples that share the same initial.) To put it bluntly, if your array consists of disks with near-identical characteristics (including manufacturing batch), then your chances of random multiple failure are noticeably increased. Is it worth worrying about? If you can do something about it, of course! Cheers, Wol ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: proactive disk replacement 2017-03-21 15:25 ` Wols Lists @ 2017-03-21 15:41 ` David Brown 2017-03-21 16:49 ` Phil Turmel 0 siblings, 1 reply; 34+ messages in thread From: David Brown @ 2017-03-21 15:41 UTC (permalink / raw) To: Wols Lists, Reindl Harald, Adam Goryachev, Jeff Allison; +Cc: linux-raid On 21/03/17 16:25, Wols Lists wrote: > On 21/03/17 14:15, David Brown wrote: >>> for most arrays the disks have a similar age and usage pattern, so when >>>> the first one fails it becomes likely that it don't take too long for >>>> another one and so load and recovery time matters > >> False. There is no reason to suspect that - certainly not to within the >> hours or day it takes to rebuild your array. Disk failure pattern shows >> a peak within the first month or so (failures due to manufacturing or >> handling), then a very low error rate for a few years, then a gradually >> increasing rate after that. There is not a very significant correlation >> between drive failures within the same system, nor is there a very >> significant correlation between usage and failures. > > Except your argument and the claim don't match. You're right - disk > failures follow the pattern you describe. BUT. > > If the array was created from completely new disks, then the usage > patterns will be very similar, therefore there will be a statistical > correlation between failures as compared to the population as a whole. > (Bit like a false DNA match is much higher in an inbred town, than in a > cosmopolitan city of immigrants.) > > EVEN WORSE. The probability of all the drives coming off the same batch, > and sharing the same systematic defects, is much much higher. One only > has to look at the Seagate 3TB Barracuda mess to see a perfect example. > > In other words, IFF your array is built of a bunch of identical drives > all bought at the same time, the risk of multiple failure is > significantly higher. 
How significant that is I don't know, but it is a > very valid reason for replacing your drives at semi-random intervals. > There /is/ a bit of correlation for early-fail drives coming from the same batch. But there is little correlation for normal lifetime drives. If you roll three dice and sum them, the expected sum will follow a nice Bell curve distribution. If you pick another three dice and roll them, they will follow the same distribution for the expected sum. But there is no correlation between the sums. Similarly, maybe you figure out that there is a 10% chance of the drive dying in the first month, 10% chance of it dying in the next three years, then 30% for the fourth year, 40% for the fifth year, and 10% spread out over the following years. Multiple drives of the same type bought at the same time, and run in the same conditions (usage patterns, heat, humidity, etc.) will have the same expected lifetime curves. But if one drive fails in its fourth year, that does not affect the probability of a second drive also failing in the same year - it is basically independent. Now, there will be a little bit of correlation, especially if there are factors that may significantly affect reliability (such as someone bumping the server). But you are still extremely unlikely to find that after one drive dies, a second drive dies on the same day or so (during the rebuild) - it is possible, but it is very bad luck. There is no statistical basis for thinking that when one drive dies, it is likely that another one will die too. Of course, some types of failures can affect several drives - a motherboard failure, power supply problem, or similar event could kill all your disks at the same time. RAID does not avoid the need for backups! Also early death failures can be correlated with a bad production batch - mixing different batches helps reduce the risk of total failure. 
Similarly, mixing different disk types reduces the risk of total failures due to systematic errors such as firmware bugs. > (Completely off topic :-) but a real-world demonstrable example is > couples' initials. "Like chooses like" and if you compare a couple's > first initials against what you would expect from a random sample, there > is a VERY significant spike in couples that share the same initial.) > > To put it bluntly, if your array consists of disks with near-identical > characteristics (including manufacturing batch), then your chances of > random multiple failure are noticeably increased. Is it worth worrying > about? If you can do something about it, of course! > ^ permalink raw reply [flat|nested] 34+ messages in thread
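David's independence argument can be illustrated numerically. A toy model, assuming a 5% annual failure rate per drive, a 24-hour rebuild window, and fully independent failures (the idealisation being defended above; a bad batch or a shared environmental event would violate it):

```python
# Toy model: assumed 5% annual failure rate (AFR) per drive, an assumed
# 24-hour rebuild window, and fully independent failures.
AFR = 0.05
WINDOW_H = 24
HOURS_PER_YEAR = 365 * 24

p_one = 1 - (1 - AFR) ** (WINDOW_H / HOURS_PER_YEAR)  # one given drive
n_remaining = 3                                       # 4-disk array, 1 lost
p_any = 1 - (1 - p_one) ** n_remaining                # any survivor fails

print(f"P(one given drive fails during rebuild) ~ {p_one:.4%}")
print(f"P(any of {n_remaining} survivors fails) ~ {p_any:.4%}")
```

With these assumed numbers the chance of losing a second (independent) drive inside the rebuild window is on the order of 0.04%, which is why the "another drive will die during the rebuild" scenario, while possible, counts as very bad luck rather than the expected outcome.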
* Re: proactive disk replacement 2017-03-21 15:41 ` David Brown @ 2017-03-21 16:49 ` Phil Turmel 2017-03-22 13:53 ` Gandalf Corvotempesta 0 siblings, 1 reply; 34+ messages in thread From: Phil Turmel @ 2017-03-21 16:49 UTC (permalink / raw) To: David Brown, Wols Lists, Reindl Harald, Adam Goryachev, Jeff Allison Cc: linux-raid On 03/21/2017 11:41 AM, David Brown wrote: > There /is/ a bit of correlation for early-fail drives coming from > the same batch. But there is little correlation for normal lifetime > drives. > > If you roll three dice and sum them, the expected sum will follow a > nice Bell curve distribution. If you pick another three dice and > roll them, they will follow the same distribution for the expected > sum. But there is no correlation between the sums. Let me add to this: The correlation is effectively immaterial in a non-degraded raid5 and singly-degraded raid6 because recovery will succeed as long as any two errors are in different 4k block/sector locations. And for non-degraded raid6, all three UREs must occur in the same block/sector to lose data. Some participants in this discussion need to read the statistical description of this stuff here: http://marc.info/?l=linux-raid&m=139050322510249&w=2 As long as you are 'check' scrubbing every so often (I scrub weekly), the odds of catastrophe on raid6 are the odds of something *else* taking out the machine or controller, not the odds of simultaneous drive failures. Phil ^ permalink raw reply [flat|nested] 34+ messages in thread
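Phil's point about same-stripe UREs can be roughly quantified. A sketch with assumed numbers (2 TB members, 4 KiB blocks, URE rate of 1e-14 per bit; none from the thread) for a singly-degraded 4-disk RAID6, where data loss requires UREs on two of the three remaining drives at the same block position; the union bound below overestimates, and it is still vanishingly small:

```python
import math

# Assumed figures (not from the thread): 2 TB members, 4 KiB blocks,
# URE rate of 1e-14 per bit. Singly-degraded 4-disk RAID6: data is lost
# only if two of the three remaining drives hit a URE at the SAME
# 4 KiB position. Union bound over all blocks and drive pairs.
BLOCK = 4096
DISK_BYTES = 2 * 10**12
URE_PER_BIT = 1e-14

n_blocks = DISK_BYTES // BLOCK
p_block = 1 - math.exp(BLOCK * 8 * math.log1p(-URE_PER_BIT))
n_pairs = 3                                  # C(3,2) pairs of drives
p_collide = n_blocks * n_pairs * p_block**2  # union-bound upper limit

print(f"per-block URE probability    ~ {p_block:.2e}")
print(f"P(colliding UREs in rebuild) <= ~{p_collide:.2e}")
```

Under these assumptions the bound comes out around 1e-10, supporting the argument that for a scrubbed RAID6 the dominant risk is whole-machine or controller failure, not coincident UREs.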
* Re: proactive disk replacement 2017-03-21 16:49 ` Phil Turmel @ 2017-03-22 13:53 ` Gandalf Corvotempesta 2017-03-22 14:12 ` David Brown 2017-03-22 14:32 ` Phil Turmel 0 siblings, 2 replies; 34+ messages in thread From: Gandalf Corvotempesta @ 2017-03-22 13:53 UTC (permalink / raw) To: Phil Turmel Cc: David Brown, Wols Lists, Reindl Harald, Adam Goryachev, Jeff Allison, linux-raid 2017-03-21 17:49 GMT+01:00 Phil Turmel <philip@turmel.org>: > The correlation is effectively immaterial in a non-degraded raid5 and > singly-degraded raid6 because recovery will succeed as long as any two > errors are in different 4k block/sector locations. And for non-degraded > raid6, all three UREs must occur in the same block/sector to lose > data. Some participants in this discussion need to read the statistical > description of this stuff here: > > http://marc.info/?l=linux-raid&m=139050322510249&w=2 > > As long as you are 'check' scrubbing every so often (I scrub weekly), > the odds of catastrophe on raid6 are the odds of something *else* taking > out the machine or controller, not the odds of simultaneous drive > failures. This is true, but disk failures happen much more often than multiple UREs on the same stripe. I think that with RAID6 it is much easier to lose data due to multiple disk failures. Last year I lost a server due to 4 (of 6) disk failures in less than an hour during a rebuild. The first failure was detected in the middle of the night. It was a disconnection/reconnection of a single disk. The reconnection triggered a resync. During the resync another disk failed. RAID6 recovered even from this double failure, but at about 60% of the rebuild, the third disk failed, bringing the whole raid down. I was woken up by our monitoring system and, looking at the server, there was also a fourth disk down :) 4 disks down in less than an hour. All disks were enterprise: SAS 15K, not desktop drives. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: proactive disk replacement 2017-03-22 13:53 ` Gandalf Corvotempesta @ 2017-03-22 14:12 ` David Brown 2017-03-22 14:32 ` Phil Turmel 1 sibling, 0 replies; 34+ messages in thread From: David Brown @ 2017-03-22 14:12 UTC (permalink / raw) To: Gandalf Corvotempesta, Phil Turmel Cc: Wols Lists, Reindl Harald, Adam Goryachev, Jeff Allison, linux-raid On 22/03/17 14:53, Gandalf Corvotempesta wrote: > 2017-03-21 17:49 GMT+01:00 Phil Turmel <philip@turmel.org>: >> The correlation is effectively immaterial in a non-degraded raid5 and >> singly-degraded raid6 because recovery will succeed as long as any two >> errors are in different 4k block/sector locations. And for non-degraded >> raid6, all three UREs must occur in the same block/sector to lose >> data. Some participants in this discussion need to read the statistical >> description of this stuff here: >> >> http://marc.info/?l=linux-raid&m=139050322510249&w=2 >> >> As long as you are 'check' scrubbing every so often (I scrub weekly), >> the odds of catastrophe on raid6 are the odds of something *else* taking >> out the machine or controller, not the odds of simultaneous drive >> failures. > > This is true but disk failures happens much more than multiple UREs on > the same stripe. > I think that in a RAID6 is much easier to loose data due to multiple > disk failures. Certainly multiple disk failures are an easy way to lose data in /any/ storage system (or at least, to lose data since the last backup). The issue here is whether it is more or less likely to be a problem in RAID6 than other raid arrangements. And the answer is that complete disk failures are not more likely during a RAID6 rebuild than during other raid rebuilds, and a RAID6 will tolerate more failures than RAID1 or RAID5. Of course, multiple disk failures /do/ occur. There can be a common cause of failure. I have had a few raid systems die completely over the years. The causes I can remember include: 1. 
The SAS controller card died - and I didn't have a replacement. The data on the disks is probably still fine. 2. The whole computer died in some unknown way. The data on the disks was fine - I put them in another cabinet and re-assembled the md array. 3. A hardware raid card died. The data may have been on the disks, but the hardware raid was in a proprietary format. 4. I knocked a disk cabinet off its shelf. This led to multiple simultaneous drive failures. Based on these, my policy is: 1. Stick to SATA drives that are easily available, easily replaced, and easily read from any system. 2. Avoid hardware raid - use md raid and/or btrfs raid. 3. Do a lot of backups - on independent systems, and with off-site copies. Raid does not prevent loss from fire or theft, or a UPS going bananas, or a user deleting the wrong file. 4. Mount your equipment securely, and turn round slowly! > > Last years i've lose a server due to 4 (of 6) disks failures in less > than an hours during a rebuild. > > The first failure was detected in the middle of the night. It was a > disconnection/reconnaction of a single disks. > The riconnection triggered a resync. During the resync another disk > failed. RAID6 recovered even from this double failure > but at about 60% of rebuild, the third disk failed bringing the whole raid down. > > I was waked up by our monitoring system and looking at the server, > there was also a fourth disk down :) > > 4 disks down in less than a hour. All disk was enterprise: SAS 15K, > not desktop drives. > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: proactive disk replacement 2017-03-22 13:53 ` Gandalf Corvotempesta 2017-03-22 14:12 ` David Brown @ 2017-03-22 14:32 ` Phil Turmel 1 sibling, 0 replies; 34+ messages in thread From: Phil Turmel @ 2017-03-22 14:32 UTC (permalink / raw) To: Gandalf Corvotempesta Cc: David Brown, Wols Lists, Reindl Harald, Adam Goryachev, Jeff Allison, linux-raid On 03/22/2017 09:53 AM, Gandalf Corvotempesta wrote: > Last years i've lose a server due to 4 (of 6) disks failures in less > than an hours during a rebuild. > > The first failure was detected in the middle of the night. It was a > disconnection/reconnaction of a single disks. > The riconnection triggered a resync. During the resync another disk > failed. RAID6 recovered even from this double failure > but at about 60% of rebuild, the third disk failed bringing the whole raid down. > > I was waked up by our monitoring system and looking at the server, > there was also a fourth disk down :) > > 4 disks down in less than a hour. All disk was enterprise: SAS 15K, > not desktop drives. You should win a prize, Gandalf. In the several years I've participated on this mailing list, you are the first to describe such a catastrophe where the drives really were at fault, instead of timeout mismatch, power supplies, cables, or controllers. All four disks had permanent "FAILED" smartctl status after this, yes? Phil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: proactive disk replacement 2017-03-21 10:54 ` Adam Goryachev 2017-03-21 11:03 ` Reindl Harald @ 2017-03-21 11:55 ` Gandalf Corvotempesta 1 sibling, 0 replies; 34+ messages in thread From: Gandalf Corvotempesta @ 2017-03-21 11:55 UTC (permalink / raw) To: Adam Goryachev; +Cc: Reindl Harald, Jeff Allison, linux-raid 2017-03-21 11:54 GMT+01:00 Adam Goryachev <mailinglists@websitemanagers.com.au>: > I can't say I'm an expert in this, but in actual fact, I disagree with both > your arguments against RAID6... > You say recovery on a RAID10 is a simple linear read from one drive (the > surviving member of the RAID1 portion) and a linear write on the other (the > replaced drive). You also declare that there is no random IO with normal > work load + recovery. I think you have forgotten that the "normal workload" > is probably random IO, but certainly once combined with the recovery IO then > it will be random IO. > > In addition, you claim that a drive larger than 2TB is almost certainly > going to suffer from a URE during recovery, yet this is exactly the > situation you will be in when trying to recover a RAID10 with member devices > 2TB or larger. A single URE on the surviving portion of the RAID1 will cause > you to lose the entire RAID10 array. On the other hand, 3 URE's on the three > remaining members of the RAID6 will not cause more than a hiccup (as long as > no more than one URE on the same stripe, which I would argue is ... > exceptionally unlikely). > > In addition, with a 4 disk RAID6 you have a 100% chance of surviving a 2 > drive failure without data loss, yet with 4 disk RAID10 you have a 50% > chance of surviving a 2 drive failure. > > Sure, there are other things to consider (performance, cost, etc) but on a > reliability point, RAID6 seems to be the far better option. Totally agree ^ permalink raw reply [flat|nested] 34+ messages in thread
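[Editor's note] Adam's survival figures are easy to check by brute-force enumeration; a small sketch, assuming the 4-disk RAID10 is laid out as two 2-way mirrored pairs:

```python
from itertools import combinations

# Model a 4-disk RAID10 as two mirrored pairs, (0,1) and (2,3); the
# array dies only when both members of the same pair fail.
def raid10_survives(failed):
    return set(failed) not in ({0, 1}, {2, 3})

two_disk_failures = list(combinations(range(4), 2))   # 6 possibilities
survived = sum(raid10_survives(f) for f in two_disk_failures)
print(f"RAID10 survives {survived} of {len(two_disk_failures)} two-disk failures")
# -> RAID10 survives 4 of 6 two-disk failures
```

Enumeration actually gives 4 of 6 (about 67%) rather than the 50% quoted above, but the qualitative point stands: a 4-disk RAID6 survives any two-disk failure, RAID10 only some of them.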
* Re: proactive disk replacement 2017-03-21 9:54 ` Reindl Harald 2017-03-21 10:54 ` Adam Goryachev @ 2017-03-21 13:02 ` David Brown 2017-03-21 13:26 ` Gandalf Corvotempesta ` (2 more replies) 1 sibling, 3 replies; 34+ messages in thread From: David Brown @ 2017-03-21 13:02 UTC (permalink / raw) To: Reindl Harald, Jeff Allison, Adam Goryachev; +Cc: linux-raid On 21/03/17 10:54, Reindl Harald wrote: > > > Am 21.03.2017 um 03:33 schrieb Jeff Allison: >> I don't have a spare SATA slot I do however have a spare USB carrier, >> is that fast enough to be used temporarily? > > USB3 yes, USB2 don't make fun because the speed of the array depends on > the slowest disk in the spindle When you are turning your RAID5 into RAID6, you can use a non-standard layout with the external drive being the second parity. That way you don't need to re-write the data on the existing drives, and the access to the external drive will all be writes of the Q parity - the system will not read from that drive unless it has to recover from a two drive failure. This will reduce stress on all the disks, and make the limited USB2 bandwidth less of an issue. If you have to use two USB carriers for the whole process, try to make sure they are connected to separate root hubs so that they don't share the bandwidth. This is not always just a matter of using two USB ports - sometimes two adjacent USB ports on a PC share an internal hub. 
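[Editor's note] The non-standard layout David describes corresponds to mdadm's `--layout=preserve` option for a RAID5-to-RAID6 grow; a sketch, assuming the array is /dev/md0 and the USB-attached disk appears as /dev/sdg1 (verify your own device names first):

```shell
# Add the USB disk, then grow to RAID6 while keeping the existing
# RAID5 parity layout: the new disk simply carries the Q parity, so
# the old members are not restriped.
mdadm /dev/md0 --add /dev/sdg1
mdadm --grow /dev/md0 --level=6 --raid-devices=5 --layout=preserve

# Later, once a permanent internal disk has replaced the USB one,
# restripe into the standard rotating RAID6 layout.
mdadm --grow /dev/md0 --layout=normalise --backup-file=/root/md0.backup
```

See the GROW MODE and LAYOUT sections of the mdadm(8) man page for the caveats around layout changes and backup files.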
> > and about RAID5/RAID6 versus RAID10: both RAID5 and RAID6 suffer from > the same problems - due rebuild you have a lot of random-IO load on all > remaining disks which leads in bad performance and make it more likely > that before the rebuild is finished another disk fails, RAID6 produces > even more random IO because of the double parity and if you have a > Unrecoverable-Read-Error on RAID5 you are dead, RAID6 is not much better > here and the probability of a URE becomes more likely with larger disks Rebuilds are done using streamed linear access - the only random access is the mix of rebuild transfers with normal usage of the array. This applies to RAID5 and RAID6 as well as RAID1 or RAID10. With RAID5 or two-disk RAID1, if you get an URE on a read then you can recover the data without loss. This is the case for normal (non-degraded) use, or if you are using "replace" to duplicate an existing disk before replacement. If you have failed a drive (manually, or due to a serious disk failure), then any single URE means lost data in that stripe. With RAID6 (or three-disk RAID1), you can tolerate /two/ URE's on the same stripe. If you have failed a disk for replacement, you can tolerate one URE. Note that to cause failure in non-degraded RAID5 (or degraded RAID6), your two URE's need to be on the same stripe in order to cause data loss. The chances of getting an URE somewhere on the disk are roughly proportional to the size of the disk - but the chance of getting an URE on the same stripe as another URE on another disk are basically independent of the disk size, and it is extraordinarily small. > > RAID10: less to zero performance impact due rebuild and no random-IO > caused by the rebuild, it's just "read a disk from start to end and > write the data on another disk linear" while the only head moves on your > disks is the normal workload on the array RAID1 (and RAID0) rebuilds are a little more efficient than RAID5 or RAID6 rebuilds - but not hugely so. 
Depending on factors such as IO structures, cpu speed and loading, number of disks in the array, concurrent access to other data, etc., they can be something like 25% to 50% faster. They do not involve noticeably more or less linear access than a RAID5/RAID6 rebuild, but they avoid heavy access to disks other than those in the RAID1 pair being rebuilt. > > with disks 2 TB or larger you can make the conclusion "do not use > RAID5/6 anymore and when you do be prepared that you won't survive a > rebuild caused by a failed disk" No, you cannot. Your conclusion here is based on several totally incorrect assumptions:

1. You think that RAID5/RAID6 recovery is more stressful, because the parity is "all over the place". This is wrong.

2. You think that random IO has a higher chance of getting an URE than linear IO. This is wrong.

3. You think that getting an URE on one disk, then getting an URE on a second disk, counts as a double failure that will break a single-parity redundancy (RAID5, RAID1, RAID6 in degraded mode). This is wrong - it is only a problem if the two UREs are in the same stripe, which is quite literally a one in a million chance.

There are certainly good reasons to prefer RAID10 systems to RAID5/RAID6 - for some types of loads, it can be significantly faster, and even though the rebuild time is not as much faster as you think, it is still faster. Linux supports a range of different RAID types for good reason - it is not a "one size fits all" problem. But you should learn the differences and make your choices and recommendations based on facts, rather than articles written by people trying to sell their own "solutions". mvh., David > >> On 21 March 2017 at 01:59, Adam Goryachev >> <mailinglists@websitemanagers.com.au> wrote: >>> >>> >>> On 20/3/17 23:47, Jeff Allison wrote: >>>> >>>> Hi all I’ve had a poke around but am yet to find something definitive. >>>> >>>> I have a raid 5 array of 4 disks amounting to approx 5.5tb.
Now this >>>> disks >>>> are getting a bit long in the tooth so before I get into problems I’ve >>>> bought 4 new disks to replace them. >>>> >>>> I have a backup so if it all goes west I’m covered. So I’m looking for >>>> suggestions. >>>> >>>> My current plan is just to replace the 2tb drives with the new 3tb >>>> drives >>>> and move on, I’d like to do it on line with out having to trash the >>>> array >>>> and start again, so does anyone have a game plan for doing that. >>> >>> Yes, do not fail a disk and then replace it, use the newer replace >>> method >>> (it keeps redundancy in the array). >>> Even better would be to add a disk, and convert to RAID6, then add a >>> second >>> disk (using replace), and so on, then remove the last disk, grow the >>> array >>> to fill the 3TB, and then reduce the number of disks in the raid. >>> This way, you end up with RAID6... >>>> >>>> Or is a 9tb raid 5 array the wrong thing to be doing and should I be >>>> doing >>>> something else 6tb raid 10 or something I’m open to suggestions. >>> >>> I'd feel safer with RAID6, but it depends on your requirements. >>> RAID10 is >>> also a nice option, but, it depends... > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 34+ messages in thread
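[Editor's note] The 'replace' method Adam recommends in the quoted text maps onto mdadm's `--replace` operation (mdadm 3.3+ with a reasonably recent kernel); a sketch with placeholder device names:

```shell
# Add one new 3TB disk as a spare, then mirror the old member onto it.
# Unlike fail-and-rebuild, the array keeps full redundancy throughout.
mdadm /dev/md0 --add /dev/sdf1
mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sdf1

# When the copy completes, the old disk is marked faulty; remove it.
mdadm /dev/md0 --remove /dev/sdb1

# Repeat for each old disk, then claim the extra capacity.
mdadm --grow /dev/md0 --size=max
```

The filesystem on top still needs its own grow step afterwards (e.g. resize2fs or xfs_growfs, depending on what is in use).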
* Re: proactive disk replacement 2017-03-21 13:02 ` David Brown @ 2017-03-21 13:26 ` Gandalf Corvotempesta 2017-03-21 14:26 ` David Brown 2017-03-21 15:29 ` Wols Lists 2017-03-21 16:55 ` Phil Turmel 2 siblings, 1 reply; 34+ messages in thread From: Gandalf Corvotempesta @ 2017-03-21 13:26 UTC (permalink / raw) To: David Brown; +Cc: Reindl Harald, Jeff Allison, Adam Goryachev, linux-raid 2017-03-21 14:02 GMT+01:00 David Brown <david.brown@hesbynett.no>: > Note that to cause failure in non-degraded RAID5 (or degraded RAID6), > your two URE's need to be on the same stripe in order to cause data > loss. The chances of getting an URE somewhere on the disk are roughly > proportional to the size of the disk - but the chance of getting an URE > on the same stripe as another URE on another disk are basically > independent of the disk size, and it is extraordinarily small. Little bit OT: is this the same even for HW RAID controllers like LSI MegaRAID, or do they tend to fail the rebuild on multiple UREs even in different stripes? > No, you cannot. Your conclusion here is based on several totally > incorrect assumptions: > > 1. You think that RAID5/RAID6 recovery is more stressful, because the > parity is "all over the place". This is wrong. > > 2. You think that random IO has higher chance of getting an URE than > linear IO. This is wrong. Totally agree. > 3. You think that getting an URE on one disk, then getting an URE on a > second disk, counts as a double failure that will break an single-parity > redundancy (RAID5, RAID1, RAID6 in degraded mode). This is wrong - it > is only a problem if the two UREs are in the same stripe, which is quite > literally a one in a million chance. I'm not sure about this. The posted paper is talking about "standard" raid made with hw raid controllers, and I'm not sure whether they are able to finish a rebuild in case of a double URE even when the UREs come from different stripes. I think they fail the whole rebuild.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: proactive disk replacement 2017-03-21 13:26 ` Gandalf Corvotempesta @ 2017-03-21 14:26 ` David Brown 2017-03-21 15:31 ` Wols Lists 0 siblings, 1 reply; 34+ messages in thread From: David Brown @ 2017-03-21 14:26 UTC (permalink / raw) To: Gandalf Corvotempesta Cc: Reindl Harald, Jeff Allison, Adam Goryachev, linux-raid On 21/03/17 14:26, Gandalf Corvotempesta wrote: > 2017-03-21 14:02 GMT+01:00 David Brown <david.brown@hesbynett.no>: >> Note that to cause failure in non-degraded RAID5 (or degraded RAID6), >> your two URE's need to be on the same stripe in order to cause data >> loss. The chances of getting an URE somewhere on the disk are roughly >> proportional to the size of the disk - but the chance of getting an URE >> on the same stripe as another URE on another disk are basically >> independent of the disk size, and it is extraordinarily small. > > Little bit OT: > is this the same even for HW RAID Controllers like LSI Megaraid > or they tend to fail the rebuild in case of multiple URE even in > different stripes? It should be true, for decent HW RAID setups. One possible problem is the famous re-read timeouts - if you use a consumer hard drive with long re-read timeouts, and have not (or cannot) configure it to have a short timeout, then a hardware RAID controller might consider a drive to be completely dead while the drive is simply spending 30 seconds re-trying its read. If the raid controller drops the drive, then it is like an URE in /all/ stripes at once! > >> No, you cannot. Your conclusion here is based on several totally >> incorrect assumptions: >> >> 1. You think that RAID5/RAID6 recovery is more stressful, because the >> parity is "all over the place". This is wrong. >> >> 2. You think that random IO has higher chance of getting an URE than >> linear IO. This is wrong. > > Totally agree. > >> 3. 
You think that getting an URE on one disk, then getting an URE on a >> second disk, counts as a double failure that will break an single-parity >> redundancy (RAID5, RAID1, RAID6 in degraded mode). This is wrong - it >> is only a problem if the two UREs are in the same stripe, which is quite >> literally a one in a million chance. > > I'm not sure about this. > The posted paper is talking about "standard" raid made with hw raid controllers > and I'm not sure if they are able to finish a rebuild in case of double URE even > if coming from different stripes. > > I think they fail the whole rebuild. > I cannot imagine why that would be the case. Suppose you have seven drive RAID6, with data blocks ABCDE and parities PQ. To make it simpler, assume that on this particular stripe, the order is ABCDEPQ. If drive 5 has failed and you are rebuilding, the RAID system will read in ABCD-P-. It will not read from drive 5 (since you are rebuilding it), and it will not bother reading drive 7 because it doesn't need the Q parity (it /might/ read it in as part of a streamed read). It calculates E from ABCD and P, and writes it out. If, for example, drive 3 gets an URE at this point then it will read the Q parity and calculate C and E from ABD P and Q. It will write out E to the rebuild drive, and also C to the drive with the URE - the drive will handle sector relocation as needed. The result is that the stripe ABCDEPQ is correct on the disk. The drive with the URE will not be dropped from the array. Then it moves on to the next stripe, and repeats the process. An URE here is independent of an URE in the previous stripe, and errors can again be corrected. It is possible that if there are a large number of UREs from a drive, that the RAID system will consider the whole drive bad and drop it. But other than that, UREs will be treated independently. ^ permalink raw reply [flat|nested] 34+ messages in thread
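[Editor's note] David's ABCDEPQ walkthrough can be made concrete with a toy model; the sketch below does RAID6's P/Q algebra in GF(2^8) on a one-byte-per-disk "stripe" and recovers two erased data blocks. This illustrates the mathematics only, not md's actual implementation.

```python
# Toy RAID6 parity math in GF(2^8), reducing polynomial 0x11d.

def gmul(a, b):                      # multiply in GF(2^8)
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xff
        if carry:
            a ^= 0x1d
        b >>= 1
    return p

def ginv(a):                         # inverse via a^(2^8 - 2) = a^254
    r, n = 1, 254
    while n:
        if n & 1:
            r = gmul(r, a)
        a = gmul(a, a)
        n >>= 1
    return r

data = [0x41, 0x42, 0x43, 0x44, 0x45]       # data blocks A B C D E
g = [1]                                      # powers of the generator 2
for _ in range(4):
    g.append(gmul(g[-1], 2))

P = 0
Q = 0
for i, d in enumerate(data):
    P ^= d                                   # P parity: plain XOR
    Q ^= gmul(g[i], d)                       # Q parity: weighted XOR

# Pretend disks 2 (C) and 4 (E) are unreadable; recover both from P and Q.
i, j = 2, 4
Pxy, Qxy = P, Q
for k, d in enumerate(data):
    if k not in (i, j):
        Pxy ^= d                             # XOR of the two lost blocks
        Qxy ^= gmul(g[k], d)                 # weighted XOR of the lost blocks

# Solve  di ^ dj = Pxy  and  g^i*di ^ g^j*dj = Qxy  for di, dj.
dj = gmul(Qxy ^ gmul(g[i], Pxy), ginv(g[i] ^ g[j]))
di = Pxy ^ dj
print(hex(di), hex(dj))                      # -> 0x43 0x45
```

This is the standard two-erasure recovery from H. Peter Anvin's "The mathematics of RAID-6" paper, which the kernel's raid6 code follows.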
* Re: proactive disk replacement 2017-03-21 14:26 ` David Brown @ 2017-03-21 15:31 ` Wols Lists 2017-03-21 17:00 ` Phil Turmel 0 siblings, 1 reply; 34+ messages in thread From: Wols Lists @ 2017-03-21 15:31 UTC (permalink / raw) To: David Brown, Gandalf Corvotempesta Cc: Reindl Harald, Jeff Allison, Adam Goryachev, linux-raid On 21/03/17 14:26, David Brown wrote: > It is possible that if there are a large number of UREs from a drive, > that the RAID system will consider the whole drive bad and drop it. But > other than that, UREs will be treated independently. Doesn't mdadm have a setting that does exactly that? Too many UREs and the drive gets dropped? I'm sure I've come across that interfering with rebuilds. Cheers, Wol ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: proactive disk replacement 2017-03-21 15:31 ` Wols Lists @ 2017-03-21 17:00 ` Phil Turmel 0 siblings, 0 replies; 34+ messages in thread From: Phil Turmel @ 2017-03-21 17:00 UTC (permalink / raw) To: Wols Lists, David Brown, Gandalf Corvotempesta Cc: Reindl Harald, Jeff Allison, Adam Goryachev, linux-raid On 03/21/2017 11:31 AM, Wols Lists wrote: > On 21/03/17 14:26, David Brown wrote: >> It is possible that if there are a large number of UREs from a >> drive, that the RAID system will consider the whole drive bad and >> drop it. But other than that, UREs will be treated independently. > > Doesn't mdadm have a setting that does exactly that? Too many UREs > and the drive gets dropped? I'm sure I've come across that > interfering with rebuilds. Yes. MD maintains a per-member-device counter of read errors and drops the device when the counter reaches 20 (twenty). The counter is decremented by 10 (ten) once an hour. A short burst of less than 20 read errors will be tolerated, as long as they don't continue at more than 10/hour. Last I checked, this behavior is hard-coded. Phil ^ permalink raw reply [flat|nested] 34+ messages in thread
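[Editor's note] The threshold Phil describes corresponds to md's corrected-read-error limit. On kernels that expose it (the attribute may be absent on older ones), the value can at least be inspected, and on those kernels tuned, via sysfs:

```shell
# Default is 20: read errors beyond this (net of the hourly decay)
# cause md to drop the member device.
cat /sys/block/md0/md/max_read_errors
```

The hourly decrement itself is not tunable this way; only the trip level is.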
* Re: proactive disk replacement 2017-03-21 13:02 ` David Brown 2017-03-21 13:26 ` Gandalf Corvotempesta @ 2017-03-21 15:29 ` Wols Lists 2017-03-21 16:55 ` Phil Turmel 2 siblings, 0 replies; 34+ messages in thread From: Wols Lists @ 2017-03-21 15:29 UTC (permalink / raw) To: David Brown, Reindl Harald, Jeff Allison, Adam Goryachev; +Cc: linux-raid On 21/03/17 13:02, David Brown wrote: > If you have to use two USB carriers for the whole process, try to make > sure they are connected to separate root hubs so that they don't share > the bandwidth. This is not always just a matter of using two USB ports > - sometimes two adjacent USB ports on a PC share an internal hub. Having built a bunch of desktop pcs from parts, I'd say adjacent ports almost certainly share an internal hub. Typically, a single mobo header will run a wire to a double slot at the front, or a double slot at the back. So plugging one in at the front, and one at the back, will get round this unless it's actually just one hub in the ?northbridge. Cheers, Wol ^ permalink raw reply [flat|nested] 34+ messages in thread
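[Editor's note] Whether two ports share a hub can usually be checked directly rather than guessed from the case layout; `lsusb -t` (from the usbutils package) prints the bus topology:

```shell
# Each "Bus NN" line is a root hub; devices indented under the same
# "Hub" entry share its bandwidth. Plug in both USB-SATA carriers,
# then confirm they hang off different root hubs (or at least
# different hubs).
lsusb -t
```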
* Re: proactive disk replacement 2017-03-21 13:02 ` David Brown 2017-03-21 13:26 ` Gandalf Corvotempesta 2017-03-21 15:29 ` Wols Lists @ 2017-03-21 16:55 ` Phil Turmel 2 siblings, 0 replies; 34+ messages in thread From: Phil Turmel @ 2017-03-21 16:55 UTC (permalink / raw) To: David Brown, Reindl Harald, Jeff Allison, Adam Goryachev; +Cc: linux-raid On 03/21/2017 09:02 AM, David Brown wrote: > With RAID6 (or three-disk RAID1), you can tolerate /two/ URE's on > the same stripe. If you have failed a disk for replacement, you can > tolerate one URE. One nit to pick here: The UREs have to be in the same 4k block/sector, not just in the same stripe. The stripe cache and all parity calculations are done on strips of 4k blocks, not whole N*chunk stripes. That makes the odds even larger. Phil ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: proactive disk replacement 2017-03-20 12:47 proactive disk replacement Jeff Allison 2017-03-20 13:25 ` Reindl Harald 2017-03-20 14:59 ` Adam Goryachev @ 2017-03-22 14:51 ` John Stoffel 2 siblings, 0 replies; 34+ messages in thread From: John Stoffel @ 2017-03-22 14:51 UTC (permalink / raw) To: Jeff Allison; +Cc: linux-raid Jeff> Hi all I’ve had a poke around but am yet to find something Jeff> definitive. I have a raid 5 array of 4 disks amounting to Jeff> approx 5.5tb. Now this disks are getting a bit long in the tooth Jeff> so before I get into problems I’ve bought 4 new disks to replace Jeff> them. Can I suggest that you buy another disk and convert into a RAID6 setup for even more resiliency? Esp with that much data (great that you have backups!) the peace of mind of an extra disk is well worth the cost in my mind. Personally, I just go with RAID1 mirrors on large disks like this for my home system. I don't have *that* much stuff... though my disks too are getting long in tooth. Jeff> I have a backup so if it all goes west I’m covered. So I’m Jeff> looking for suggestions. Jeff> My current plan is just to replace the 2tb drives with the new Jeff> 3tb drives and move on, I’d like to do it on line with out Jeff> having to trash the array and start again, so does anyone have a Jeff> game plan for doing that. You don't say how your system is set up, whether or not you have LVM on top of the MD RAID5 array or not. If so, you could simply do:

1. Build a new RAID6 array with five disks (buying another one like I suggest above).
2. Add this into your VG with the 4x2tb disks.
3. pvmove all your data off the old PV: pvmove -b <old-raid5-PV>

And once it's done, you can then remove that PV from the VG and pull the old disks from the system. Or turn them into a scratch space until they die... Jeff> Or is a 9tb raid 5 array the wrong thing to be doing and should Jeff> I be doing something else 6tb raid 10 or something I’m open to Jeff> suggestions.
Depends on how good your backups are and how critical it is that this data stay online. John ^ permalink raw reply [flat|nested] 34+ messages in thread
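[Editor's note] John's three steps, as a sketch with placeholder names (myvg for the volume group, /dev/md0 for the old RAID5 PV, /dev/sd[f-j]1 for the five new partitions):

```shell
# 1. Build the new five-disk RAID6.
mdadm --create /dev/md1 --level=6 --raid-devices=5 /dev/sd[fghij]1

# 2. Turn it into a PV and add it to the existing VG.
pvcreate /dev/md1
vgextend myvg /dev/md1

# 3. Migrate all allocated extents off the old PV, then drop it.
#    (Add -b to background the move; if you do, wait for it to
#    finish before running vgreduce.)
pvmove /dev/md0
vgreduce myvg /dev/md0
pvremove /dev/md0

# Finally, stop and dismantle the old array.
mdadm --stop /dev/md0
```

pvmove works online, so the filesystems on the VG stay mounted throughout.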
end of thread, other threads:[~2017-03-22 14:51 UTC | newest] Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-03-20 12:47 proactive disk replacement Jeff Allison 2017-03-20 13:25 ` Reindl Harald 2017-03-20 14:59 ` Adam Goryachev 2017-03-20 15:04 ` Reindl Harald 2017-03-20 15:23 ` Adam Goryachev 2017-03-20 16:19 ` Wols Lists 2017-03-21 2:33 ` Jeff Allison 2017-03-21 9:54 ` Reindl Harald 2017-03-21 10:54 ` Adam Goryachev 2017-03-21 11:03 ` Reindl Harald 2017-03-21 11:34 ` Andreas Klauer 2017-03-21 12:03 ` Reindl Harald 2017-03-21 12:41 ` Andreas Klauer 2017-03-22 4:16 ` NeilBrown 2017-03-21 11:56 ` Adam Goryachev 2017-03-21 12:10 ` Reindl Harald 2017-03-21 13:13 ` David Brown 2017-03-21 13:24 ` Reindl Harald 2017-03-21 14:15 ` David Brown 2017-03-21 15:25 ` Wols Lists 2017-03-21 15:41 ` David Brown 2017-03-21 16:49 ` Phil Turmel 2017-03-22 13:53 ` Gandalf Corvotempesta 2017-03-22 14:12 ` David Brown 2017-03-22 14:32 ` Phil Turmel 2017-03-21 11:55 ` Gandalf Corvotempesta 2017-03-21 13:02 ` David Brown 2017-03-21 13:26 ` Gandalf Corvotempesta 2017-03-21 14:26 ` David Brown 2017-03-21 15:31 ` Wols Lists 2017-03-21 17:00 ` Phil Turmel 2017-03-21 15:29 ` Wols Lists 2017-03-21 16:55 ` Phil Turmel 2017-03-22 14:51 ` John Stoffel