* [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust [not found] <17025a94-1999-4619-b23d-7460946c2f85@zmail15.collab.prod.int.phx2.redhat.com> @ 2012-07-18 11:01 ` Jaromir Capik 2012-07-18 11:13 ` Mathias Burén ` (3 more replies) 0 siblings, 4 replies; 36+ messages in thread From: Jaromir Capik @ 2012-07-18 11:01 UTC (permalink / raw) To: linux-raid Hello. I'd like to ask you to implement the following ... The current RAID1 solution is not robust enough to protect the data against random data corruptions. Such corruptions usually happen when an unreadable sector is found by the drive's electronics and when the drive's trying to reallocate the sector to the spare area. There's no guarantee that the reallocated data will always match the original stored data since the drive sometimes can't read the data correctly even with several retries. That unfortunately completely masks the issue, because the sector can be read by the OS without problems even if it doesn't contain correct data. Would it be possible to implement chunk checksums to avoid such data corruptions? If a corrupted chunk is encountered, it would be taken from the second drive and immediately synced back. This would have a small performance and capacity impact (1 sector per chunk to minimize performance impact caused by unaligned granularity = 0.78% of the capacity with 64k chunks). Please, let me know if you find my request reasonable or not. Thanks in advance. Regards, Jaromir. -- Jaromir Capik Red Hat Czech, s.r.o. Software Engineer / BaseOS Email: jcapik@redhat.com Web: www.cz.redhat.com Red Hat Czech s.r.o., Purkynova 99/71, 612 45, Brno, Czech Republic IC: 27690016 ^ permalink raw reply [flat|nested] 36+ messages in thread
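As an aside on the arithmetic in the request above: the 0.78% figure follows from dedicating one 512-byte sector per chunk to a checksum. A minimal sketch of that calculation (plain Python; the interleaved layout itself is only the proposal above, not an existing md feature):

  SECTOR = 512  # bytes

  def checksum_overhead(chunk_bytes):
      """Fraction of raw capacity spent on one checksum sector per chunk."""
      chunk_sectors = chunk_bytes // SECTOR
      return 1.0 / (chunk_sectors + 1)

  for kib in (16, 64, 256, 1024):
      print(f"{kib:5d} KiB chunks -> {100 * checksum_overhead(kib * 1024):.2f}% overhead")
  # 64 KiB chunks come out at ~0.78%, the figure quoted in the request above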
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 11:01 ` [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust Jaromir Capik @ 2012-07-18 11:13 ` Mathias Burén 2012-07-18 12:42 ` Jaromir Capik 2012-07-18 11:15 ` NeilBrown ` (2 subsequent siblings) 3 siblings, 1 reply; 36+ messages in thread From: Mathias Burén @ 2012-07-18 11:13 UTC (permalink / raw) To: Jaromir Capik; +Cc: linux-raid On 18 July 2012 12:01, Jaromir Capik <jcapik@redhat.com> wrote: > Hello. > > I'd like to ask you to implement the following ... > > The current RAID1 solution is not robust enough to protect the data > against random data corruptions. Such corruptions usually happen > when an unreadable sector is found by the drive's electronics > and when the drive's trying to reallocate the sector to the spare area. > There's no guarantee that the reallocated data will always match > the original stored data since the drive sometimes can't read the data > correctly even with several retries. That unfortunately completely masks > the issue, because the sector can be read by the OS without problems > even if it doesn't contain correct data. Would it be possible > to implement chunk checksums to avoid such data corruptions? > If a corrupted chunk is encountered, it would be taken from the second > drive and immediately synced back. This would have a small performance > and capacity impact (1 sector per chunk to minimize performance impact > caused by unaligned granularity = 0.78% of the capacity with 64k chunks). > > Please, let me know if you find my request reasonable or not. > > Thanks in advance. > > Regards, > Jaromir. > > -- > Jaromir Capik > Red Hat Czech, s.r.o. > Software Engineer / BaseOS > > Email: jcapik@redhat.com > Web: www.cz.redhat.com > Red Hat Czech s.r.o., Purkynova 99/71, 612 45, Brno, Czech Republic > IC: 27690016 > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html That would be a disk format change... Why not use btrfs or zfs? Mathias ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 11:13 ` Mathias Burén @ 2012-07-18 12:42 ` Jaromir Capik 0 siblings, 0 replies; 36+ messages in thread From: Jaromir Capik @ 2012-07-18 12:42 UTC (permalink / raw) To: Mathias Burén; +Cc: linux-raid > On 18 July 2012 12:01, Jaromir Capik <jcapik@redhat.com> wrote: > > Hello. > > > > I'd like to ask you to implement the following ... > > > > The current RAID1 solution is not robust enough to protect the data > > against random data corruptions. Such corruptions usually happen > > when an unreadable sector is found by the drive's electronics > > and when the drive's trying to reallocate the sector to the spare > > area. > > There's no guarantee that the reallocated data will always match > > the original stored data since the drive sometimes can't read the > > data > > correctly even with several retries. That unfortunately completely > > masks > > the issue, because the sector can be read by the OS without > > problems > > even if it doesn't contain correct data. Would it be possible > > to implement chunk checksums to avoid such data corruptions? > > If a corrupted chunk is encountered, it would be taken from the > > second > > drive and immediately synced back. This would have a small > > performance > > and capacity impact (1 sector per chunk to minimize performance > > impact > > caused by unaligned granularity = 0.78% of the capacity with 64k > > chunks). > > > > Please, let me know if you find my request reasonable or not. > > > > Thanks in advance. > > > > Regards, > > Jaromir. > > > > -- > > Jaromir Capik > > Red Hat Czech, s.r.o. > > Software Engineer / BaseOS > > > > Email: jcapik@redhat.com > > Web: www.cz.redhat.com > > Red Hat Czech s.r.o., Purkynova 99/71, 612 45, Brno, Czech Republic > > IC: 27690016 > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe > > linux-raid" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > That would be a disk format change... > Hello Mathias. Yes. That would be a disk format change ... but optional only! Chunks would be interleaved with checksums, therefore the small capacity loss. > Why not use btrfs or zfs? I know btrfs implements that, but AFAIK it still lacks the transparent encryption. Am I wrong? In that case I would have to create one large regular file holding the LUKS data and modify the initramdisk to handle that. I could give ZFS a try ... > > Mathias > Thanks, Jaromir. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 11:01 ` [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust Jaromir Capik 2012-07-18 11:13 ` Mathias Burén @ 2012-07-18 11:15 ` NeilBrown 2012-07-18 13:04 ` Jaromir Capik 2012-07-18 11:49 ` keld 2012-07-18 16:28 ` Asdo 3 siblings, 1 reply; 36+ messages in thread From: NeilBrown @ 2012-07-18 11:15 UTC (permalink / raw) To: Jaromir Capik; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 1917 bytes --] On Wed, 18 Jul 2012 07:01:48 -0400 (EDT) Jaromir Capik <jcapik@redhat.com> wrote: > Hello. > > I'd like to ask you to implement the following ... > > The current RAID1 solution is not robust enough to protect the data > against random data corruptions. Such corruptions usually happen > when an unreadable sector is found by the drive's electronics > and when the drive's trying to reallocate the sector to the spare area. > There's no guarantee that the reallocated data will always match > the original stored data since the drive sometimes can't read the data > correctly even with several retries. That unfortunately completely masks > the issue, because the sector can be read by the OS without problems > even if it doesn't contain correct data. If a drive ever lets you read incorrect data rather than giving you an error indication, then the drive is broken by design. Don't use drives that do that. > Would it be possible > to implement chunk checksums to avoid such data corruptions? No. NeilBrown > If a corrupted chunk is encountered, it would be taken from the second > drive and immediately synced back. This would have a small performance > and capacity impact (1 sector per chunk to minimize performance impact > caused by unaligned granularity = 0.78% of the capacity with 64k chunks). > > Please, let me know if you find my request reasonable or not. > > Thanks in advance. > > Regards, > Jaromir. > > -- > Jaromir Capik > Red Hat Czech, s.r.o. > Software Engineer / BaseOS > > Email: jcapik@redhat.com > Web: www.cz.redhat.com > Red Hat Czech s.r.o., Purkynova 99/71, 612 45, Brno, Czech Republic > IC: 27690016 > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 11:15 ` NeilBrown @ 2012-07-18 13:04 ` Jaromir Capik 2012-07-19 3:48 ` Stan Hoeppner 0 siblings, 1 reply; 36+ messages in thread From: Jaromir Capik @ 2012-07-18 13:04 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid Hello Neil. > > If a drive ever lets you read incorrect data rather than giving you > an error > indication, then the drive is broken by design. Don't use drives > that do > that. Unfortunately many drives do that. This happens transparently during the drive's idle surface checks, when there's no read request from the OS. I don't know how it is done in server harddrives, but I experienced data corruptions related to bad sector reallocations in case of several different desktop drives. I got no read errors at all even when the SMART attributes were showing reallocations. Somebody could ask, why people want to implement RAID on top of cheap desktop harddrives. It's surprisingly because of their price. Chunk checksums would give people a cheap and safe/robust solution. > > > Would it be possible > > to implement chunk checksums to avoid such data corruptions? > > No. I respect your decision. Thank you for your time. Jaromir. > > NeilBrown > > > > If a corrupted chunk is encountered, it would be taken from the > > second > > drive and immediately synced back. This would have a small > > performance > > and capacity impact (1 sector per chunk to minimize performance > > impact > > caused by unaligned granularity = 0.78% of the capacity with 64k > > chunks). > > > > Please, let me know if you find my request reasonable or not. > > > > Thanks in advance. > > > > Regards, > > Jaromir. > > > > -- > > Jaromir Capik > > Red Hat Czech, s.r.o. > > Software Engineer / BaseOS > > > > Email: jcapik@redhat.com > > Web: www.cz.redhat.com > > Red Hat Czech s.r.o., Purkynova 99/71, 612 45, Brno, Czech Republic > > IC: 27690016 > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe > > linux-raid" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 13:04 ` Jaromir Capik @ 2012-07-19 3:48 ` Stan Hoeppner 2012-07-20 12:53 ` Jaromir Capik 0 siblings, 1 reply; 36+ messages in thread From: Stan Hoeppner @ 2012-07-19 3:48 UTC (permalink / raw) To: Jaromir Capik; +Cc: NeilBrown, linux-raid On 7/18/2012 8:04 AM, Jaromir Capik wrote: > Unfortunately many drives do that. This happens transparently > during the drive's idle surface checks, Please list the SATA drives you have verified that perform firmware self initiated surface scans when idle, and transparently (to the OS) relocate bad sectors during this process. Then list the drives that have relocated sectors during such a process for which they could not read all the data, causing the silent data corruption you describe. > I experienced data corruptions related to bad sector reallocations > in case of several different desktop drives. Please name the drives, make/model/manufacturer, the drive count of the array, and the array type used when these silent corruptions occurred. For one user to experience silent corruption once is extremely rare. To experience it multiple times within a human lifetime is statistically impossible, unless you manage very large disk farms with high cap drives. If your multiple silent corruptions relate strictly to RAID1 pairs, it would seem the problem is not with the drives, but lay somewhere else. Unless you're using some el cheapo 3rd rate Asian sourced white label drives nobody ever heard of. One such company flooded the market with such drives in the mid 90s. I've not heard of anything similar since, but that doesn't mean such drives aren't in the wild. -- Stan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-19 3:48 ` Stan Hoeppner @ 2012-07-20 12:53 ` Jaromir Capik 2012-07-20 18:24 ` Roberto Spadim 2012-07-21 3:58 ` Stan Hoeppner 0 siblings, 2 replies; 36+ messages in thread From: Jaromir Capik @ 2012-07-20 12:53 UTC (permalink / raw) To: stan; +Cc: NeilBrown, linux-raid > > Unfortunately many drives do that. This happens transparently > > during the drive's idle surface checks, > > Please list the SATA drives you have verified that perform firmware > self > initiated surface scans when idle, and transparently (to the OS) > relocate bad sectors during this process. > > Then list the drives that have relocated sectors during such a > process > for which they could not read all the data, causing the silent data > corruption you describe. I can't say I "have verified" that, since that doesn't happen every day and in such cases I'm trying to focus on saving my data. I accept it's my fault that I didn't have the energy to experiment with the failing drives more before returning them for warranty replacement. I just know that I had corrupted data on the clones while there were no I/O errors in any logs during the cloning. I experienced that mainly on systems without RAID (= with a single drive). One of my drives became unbootable due to MBR data corruption. There had been no intentional writes to that sector for a long time. I was able to read it with dd, I was able to clean it with zeroes with dd and I was able to create a new partition table with fdisk. All of these operations worked without problems and the number of reallocated sectors didn't increase when I was writing to that sector. I used to periodically check the SMART attributes by calling smartctl instead of retrieving emails from smartd, and I remember there were no reallocated sectors shortly before it happened. But they were present after the incident. That doesn't prove such behavior, but it seems to me that it's exactly what happened. I experienced data corruptions with the following drives: Seagate Barracuda 7200.7 series (120GB, 200GB, 250GB). Seagate U6 series (40GB). All of them were IDE drives. A Western Digital (320GB) ... a SATA one, I don't remember the exact type. And now I'm playing with a recently failed WDC WD2500AAJS-60M0A1 that was a member of a RAID1. In the last case I put the failing drive into a different computer and assembled two independent arrays in degraded mode, since it had got out of sync / kicked the healthy drive out of the RAID1 for an unknown reason. I then mounted partitions from the failing drive via sshfs and did a directory diff to find modifications made in the meantime and copy all the recently modified files from the failing (but more recent) drive to the healthy one. I found one patch file that had total binary mess inside on the failing drive, but that mess was still perfectly readable. And even if it was not caused by the drive itself, it's a data corruption that would hopefully be prevented by chunk checksums. > For one user to experience silent corruption once is extremely rare. > To > experience it multiple times within a human lifetime is statistically > impossible, unless you manage very large disk farms with high cap > drives. > > If your multiple silent corruptions relate strictly to RAID1 pairs, > it > would seem the problem is not with the drives, but lay somewhere > else. I admit that the problem could lie elsewhere ... but that doesn't change anything about the fact that the data became corrupted without me noticing it.
I don't feel good about what happened, because I trusted this solution a bit too much. Sorry if I seem too anxious. Regards, Jaromir. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-20 12:53 ` Jaromir Capik @ 2012-07-20 18:24 ` Roberto Spadim 2012-07-20 18:30 ` Roberto Spadim 2012-07-20 20:07 ` Jaromir Capik 2012-07-21 3:58 ` Stan Hoeppner 1 sibling, 2 replies; 36+ messages in thread From: Roberto Spadim @ 2012-07-20 18:24 UTC (permalink / raw) To: Jaromir Capik; +Cc: stan, NeilBrown, linux-raid IMO I think Jaromir is probably right about silent disk 'losts', it's not normal to lost data but it's possible (electronic problems, radioactive problems or another problem not related maybe lost of disk magnetic properties) since we are at block device layer (md) i don't know if we could/should implement a recovery algorithm or just a badblock report algorithm (checksum) i know it's not a 'normal' situation and it's not a property of raid1 implementations, but could be nice to implement it raid1 extended?! we have many mirrors (more than 2) that's not a normal implementation but it works really nice and help a lot in parallel work load maybe for a 'fast' solution you could use raid5 or raid6? while we discuss if this could/should/will not be implemented?! i think raid5/6 have checksums and others tools to get this type of problem while you can use your normal filesystem (ext3? ext4? reiser? xfs?) or direct the block device (a oracle database for example or mysql innodb) 2012/7/20 Jaromir Capik <jcapik@redhat.com> > > > > Unfortunately many drives do that. This happens transparently > > > during the drive's idle surface checks, > > > > Please list the SATA drives you have verified that perform firmware > > self > > initiated surface scans when idle, and transparently (to the OS) > > relocate bad sectors during this process. > > > > Then list the drives that have relocated sectors during such a > > process > > for which they could not read all the data, causing the silent data > > corruption you describe. > > I can't say I "have verified" that, since that doesn't happen everyday > and in such cases I'm trying to focus on saving my data. I accept > It's my fault that I had no mental power to play with the failing > drives more prior to returning them for warranty replacement. > I just know that I had corrupted data on the clones whilst there were > no I/O errors in any logs during the cloning. I experienced that > mainly on systems without RAID (=with single drive). One of my drives > became unbootable due to a MBR data corruption. There were no intentional > writes to that sector for a long time. I was able to read it by dd, > I was able to clean it with zeroes by dd and I was able to create > a new partition table with fdisk. All of these operations worked > without problems and the number of reallocated sectors didn't increase > when I was writing to that sector. I used to periodically check > the SMART attributes by calling smartctl instead of retrieving emails > from smartd and I remember there were no reallocated sectors shortly > before it happened. But they were present after the incident. > That doesn't verify such behavior, but I seems to me that it's exactly > what happened. > > I experienced data corruptions with the following drives: > Seagate Barracuda 7200.7 series (120GB, 200GB, 250GB). > Seagate U6 series (40GB). All of them were IDE drives. > Western Digital (320GB) ... SATA one, don't remember exact type. > And now I'm playing with recently failed WDC WD2500AAJS-60M0A1, > that was as member of RAID1. 
> > In the last case I put the failing drive to a different computer > and assembled two independent arrays in degraded mode since it got > out of sync / kicked the healthy drive out of the RAID1 for unknown > reason. I then mounted partitions from the failing drive via sshfs > and did a directory diff to find modification made in the meantime > and copy all the recently modified files from the failing (but more > recent) drive to the healthy one. I found one patch file, that had > a total binary mess inside on the failing drive, but that mess was > still perfectly readable. And even if it was not caused by the drive > itself, it's a data corruption that would be hopefully prevented > with chunk checksums. > > > For one user to experience silent corruption once is extremely rare. > > To > > experience it multiple times within a human lifetime is statistically > > impossible, unless you manage very large disk farms with high cap > > drives. > > > > If your multiple silent corruptions relate strictly to RAID1 pairs, > > it > > would seem the problem is not with the drives, but lay somewhere > > else. > > I admit, that the problem could lie elsewhere ... but that doesn't > change anything on the fact, that the data became corrupted without > me noticing that. I don't feel well when I see what happened because > I trusted this solution a bit too much. Sorry if I look too anxious. > > Regards, > Jaromir. > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Roberto Spadim Spadim Technology / SPAEmpresarial ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-20 18:24 ` Roberto Spadim @ 2012-07-20 18:30 ` Roberto Spadim 0 siblings, 0 replies; 36+ messages in thread From: Roberto Spadim @ 2012-07-20 18:30 UTC (permalink / raw) To: Jaromir Capik; +Cc: stan, NeilBrown, linux-raid Just some examples... searching on google (silent data loss), several people report silent loss, and the NEC study is useful to support Jaromir's report... http://www.necam.com/Docs/?id=54157ff5-5de8-4966-a99d-341cf2cb27d3 page 3) Silent data corruption Introduction There are certain types of storage errors that go completely unreported and undetected in other storage systems which result in corrupt data being provided to applications with no warning, logging, error messages or notification of any kind. Though the problem is frequently identified as a silent read failure, the root cause can be that the write failed, thus we refer to this class of errors as “silent data corruption.” These errors are difficult to detect and diagnose, yet what’s worse is they are actually fairly common in systems without an extended data integrity feature. -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-20 18:24 ` Roberto Spadim 2012-07-20 18:30 ` Roberto Spadim @ 2012-07-20 20:07 ` Jaromir Capik 2012-07-20 20:21 ` Roberto Spadim 1 sibling, 1 reply; 36+ messages in thread From: Jaromir Capik @ 2012-07-20 20:07 UTC (permalink / raw) To: Roberto Spadim; +Cc: stan, NeilBrown, linux-raid > it's not normal to lost data but it's possible (electronic problems, > radioactive problems or another problem not related maybe lost of disk > magnetic properties) It might be also caused by a bug in the SATA controller driver. And nobody can be sure that there will be no new issues in case of future chipsets and their very first driver versions. > since we are at block device layer (md) i don't know if we > could/should implement a recovery algorithm or just a badblock report > algorithm (checksum) Direct recovery would be better since it doesn't cost much and lowers the possibility of data loss due to the second drive's failure. > maybe for a 'fast' solution you could use raid5 or raid6? while we > discuss if this could/should/will not be implemented?! > i think raid5/6 have checksums and others tools to get this type of > problem while you can use your normal filesystem (ext3? ext4? reiser? > xfs?) or direct the block device (a oracle database for example or > mysql innodb) RAID5/6 would need more drives than I actually have, right? There's not enough space for 3 drives in those small and cheap mini-ITX based home routers/servers I started building 3 years ago. Moreover that would mean a need for better cooling and a higher power consumption and that's something I'm exactly trying to avoid in this particular case. I slowly started to accept the idea, that I'll have to migrate my systems from mdraid to btrfs if there's no solution soon :( I don't like it much, but there's apparently nothing else I can do about that. > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > Thanks a lot for your answers and have a nice day. Regards, Jaromir. -- Jaromir Capik Red Hat Czech, s.r.o. Software Engineer / BaseOS Email: jcapik@redhat.com Web: www.cz.redhat.com Red Hat Czech s.r.o., Purkynova 99/71, 612 45, Brno, Czech Republic IC: 27690016 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-20 20:07 ` Jaromir Capik @ 2012-07-20 20:21 ` Roberto Spadim 2012-07-20 20:44 ` Jaromir Capik 0 siblings, 1 reply; 36+ messages in thread From: Roberto Spadim @ 2012-07-20 20:21 UTC (permalink / raw) To: Jaromir Capik; +Cc: stan, NeilBrown, linux-raid yeah for a 'fast' solution moving from one file system to another that works with theses checks can help you, while we check if this is usefull or not IMHO, if we implement this, we should implement outside any today raid levels, this should be done between device and filesystem, in others words: we should implement this to work like: DISKS - (OUR NEW SILENT ERROR SECURITY SYSTEM LEVEL, 1 PER DEVICE) - TODAY RAID LEVELS - FILESYSTEMS or DISKS - RAIDS LEVELS - (OUR NEW SILENT ERROR SECURITY SYSTEM LEVEL, 1 PER DEVICE) - FILESYSTEM or DISK - (OUR NEW SILENT ERROR SECURITY SYSTEM LEVEL, 1 PER DEVICE) - FILESYSTEM using this, we "could give more security" to usb pendrives for example, and any block device (network block device, DRBD, or anyother block device in linux) 2012/7/20 Jaromir Capik <jcapik@redhat.com>: >> it's not normal to lost data but it's possible (electronic problems, >> radioactive problems or another problem not related maybe lost of disk >> magnetic properties) > > It might be also caused by a bug in the SATA controller driver. > And nobody can be sure that there will be no new issues in case > of future chipsets and their very first driver versions. > > >> since we are at block device layer (md) i don't know if we >> could/should implement a recovery algorithm or just a badblock report >> algorithm (checksum) > > Direct recovery would be better since it doesn't cost much and lowers > the possibility of data loss due to the second drive's failure. > > >> maybe for a 'fast' solution you could use raid5 or raid6? while we >> discuss if this could/should/will not be implemented?! >> i think raid5/6 have checksums and others tools to get this type of >> problem while you can use your normal filesystem (ext3? ext4? reiser? >> xfs?) or direct the block device (a oracle database for example or >> mysql innodb) > > RAID5/6 would need more drives than I actually have, right? There's > not enough space for 3 drives in those small and cheap mini-ITX based home > routers/servers I started building 3 years ago. Moreover that would mean > a need for better cooling and a higher power consumption and that's > something I'm exactly trying to avoid in this particular case. > > I slowly started to accept the idea, that I'll have to migrate my > systems from mdraid to btrfs if there's no solution soon :( I don't > like it much, but there's apparently nothing else I can do about that. > >> -- >> Roberto Spadim >> Spadim Technology / SPAEmpresarial >> > > Thanks a lot for your answers and have a nice day. > > Regards, > Jaromir. > > -- > Jaromir Capik > Red Hat Czech, s.r.o. > Software Engineer / BaseOS > > Email: jcapik@redhat.com > Web: www.cz.redhat.com > Red Hat Czech s.r.o., Purkynova 99/71, 612 45, Brno, Czech Republic > IC: 27690016 > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Roberto Spadim Spadim Technology / SPAEmpresarial ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-20 20:21 ` Roberto Spadim @ 2012-07-20 20:44 ` Jaromir Capik 2012-07-20 20:59 ` Roberto Spadim 0 siblings, 1 reply; 36+ messages in thread From: Jaromir Capik @ 2012-07-20 20:44 UTC (permalink / raw) To: Roberto Spadim; +Cc: stan, NeilBrown, linux-raid > yeah for a 'fast' solution moving from one file system to another > that > works with theses checks can help you, while we check if this is > usefull or not > > IMHO, if we implement this, we should implement outside any today > raid > levels, this should be done between device and filesystem, in others > words: > > we should implement this to work like: > DISKS - (OUR NEW SILENT ERROR SECURITY SYSTEM LEVEL, 1 PER DEVICE) - > TODAY RAID LEVELS - FILESYSTEMS > > or > > DISKS - RAIDS LEVELS - (OUR NEW SILENT ERROR SECURITY SYSTEM LEVEL, 1 > PER DEVICE) - FILESYSTEM > > or > > DISK - (OUR NEW SILENT ERROR SECURITY SYSTEM LEVEL, 1 PER DEVICE) - > FILESYSTEM > > > using this, we "could give more security" to usb pendrives for > example, and any block device (network block device, DRBD, or > anyother > block device in linux) Well ... it looks more modular, easier and could have more usecases. You're probably right at this point. Dracut maintainers would kill us both, but that's a different story. I'm only missing that possibility of immediate resyncing of the data when a corruption is detected. That's probably the only thing, that would be nice to have directly in the RAID layer (and could/should be also optional). J. ^ permalink raw reply [flat|nested] 36+ messages in thread
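To make the idea concrete, the immediate resync Jaromir describes could look roughly like the following, reduced to ordinary Python file I/O purely for illustration: two files opened read-write stand in for the mirror halves, and a dict stands in for the interleaved checksum sectors. None of this is existing md or dm code, just a sketch of the read-verify-repair path being discussed.

  import hashlib

  CHUNK = 64 * 1024   # chunk size used as the example throughout this thread

  def read_with_repair(legs, checksums, chunk_no):
      """Read one chunk, verify it, and sync the good copy over any bad leg."""
      offset = chunk_no * CHUNK
      expected = checksums[chunk_no]   # stand-in for the on-disk checksum sector
      bad = []
      for leg in legs:                 # files opened with mode "r+b"
          leg.seek(offset)
          data = leg.read(CHUNK)
          if hashlib.sha256(data).digest() == expected:
              for other in bad:        # immediate resync of the corrupted copy
                  other.seek(offset)
                  other.write(data)
                  other.flush()
              return data
          bad.append(leg)              # checksum mismatch: try the next mirror
      raise IOError(f"chunk {chunk_no}: no mirror matches its checksum")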
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-20 20:44 ` Jaromir Capik @ 2012-07-20 20:59 ` Roberto Spadim 0 siblings, 0 replies; 36+ messages in thread From: Roberto Spadim @ 2012-07-20 20:59 UTC (permalink / raw) To: Jaromir Capik; +Cc: stan, NeilBrown, linux-raid sorry about the many posts, guys (that's not spam) well... we are discussing ideas... IMO, that's one more layer (ok, some developers like it, some don't); this implements the kind of safety layer some well-known hard drives already have, like ECC and checksums (maybe we will talk about intelligent SSD reallocation algorithms too...) since we are close to emulating a hard disk controller (with more tools), we could report block errors too, like a hard disk does the error correction could be done by the raid levels that implement block correction (resync or maybe bad block reallocation)... just ideas... it's hard to implement and must be well tested... like any new code 2012/7/20 Jaromir Capik <jcapik@redhat.com>: >> yeah for a 'fast' solution moving from one file system to another >> that >> works with theses checks can help you, while we check if this is >> usefull or not >> >> IMHO, if we implement this, we should implement outside any today >> raid >> levels, this should be done between device and filesystem, in others >> words: >> >> we should implement this to work like: >> DISKS - (OUR NEW SILENT ERROR SECURITY SYSTEM LEVEL, 1 PER DEVICE) - >> TODAY RAID LEVELS - FILESYSTEMS >> >> or >> >> DISKS - RAIDS LEVELS - (OUR NEW SILENT ERROR SECURITY SYSTEM LEVEL, 1 >> PER DEVICE) - FILESYSTEM >> >> or >> >> DISK - (OUR NEW SILENT ERROR SECURITY SYSTEM LEVEL, 1 PER DEVICE) - >> FILESYSTEM >> >> >> using this, we "could give more security" to usb pendrives for >> example, and any block device (network block device, DRBD, or >> anyother >> block device in linux) > > Well ... it looks more modular, easier and could have more usecases. > You're probably right at this point. Dracut maintainers would kill > us both, but that's a different story. > I'm only missing that possibility of immediate resyncing of the data > when a corruption is detected. That's probably the only thing, that > would be nice to have directly in the RAID layer (and could/should > be also optional). > > J. > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-20 12:53 ` Jaromir Capik 2012-07-20 18:24 ` Roberto Spadim @ 2012-07-21 3:58 ` Stan Hoeppner 1 sibling, 0 replies; 36+ messages in thread From: Stan Hoeppner @ 2012-07-21 3:58 UTC (permalink / raw) To: Linux RAID On 7/20/2012 7:53 AM, Jaromir Capik wrote: > I admit, that the problem could lie elsewhere ... but that doesn't > change anything on the fact, that the data became corrupted without > me noticing that. The key here I think is "without me noticing that". Drives normally cry out in the night, spitting errors to logs, when they encounter problems. You may not receive an immediate error in your application, especially when the drive is a RAID member and the data can be shipped regardless of the drive error. If you never check your logs, or simply don't see these disk errors, how will you know there's a problem? Likewise, if the checksumming you request is implemented in md/RAID1, and your application never sees a problem when a drive heads South, and you never check your logs and thus don't see the checksum errors... How is this new checksumming any better than the current situation? The drive is still failing and you're still unaware of it. -- Stan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 11:01 ` [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust Jaromir Capik 2012-07-18 11:13 ` Mathias Burén 2012-07-18 11:15 ` NeilBrown @ 2012-07-18 11:49 ` keld 2012-07-18 13:08 ` Jaromir Capik 2012-07-18 16:28 ` Asdo 3 siblings, 1 reply; 36+ messages in thread From: keld @ 2012-07-18 11:49 UTC (permalink / raw) To: Jaromir Capik; +Cc: linux-raid On Wed, Jul 18, 2012 at 07:01:48AM -0400, Jaromir Capik wrote: > Hello. > > I'd like to ask you to implement the following ... > > The current RAID1 solution is not robust enough to protect the data > against random data corruptions. Such corruptions usually happen > when an unreadable sector is found by the drive's electronics > and when the drive's trying to reallocate the sector to the spare area. > There's no guarantee that the reallocated data will always match > the original stored data since the drive sometimes can't read the data > correctly even with several retries. That unfortunately completely masks > the issue, because the sector can be read by the OS without problems > even if it doesn't contain correct data. Would it be possible > to implement chunk checksums to avoid such data corruptions? > If a corrupted chunk is encountered, it would be taken from the second > drive and immediately synced back. This would have a small performance > and capacity impact (1 sector per chunk to minimize performance impact > caused by unaligned granularity = 0.78% of the capacity with 64k chunks). > > Please, let me know if you find my request reasonable or not. I believe alternative to that is implemented via the Linux RAID MD badblock feature. best regards keld ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 11:49 ` keld @ 2012-07-18 13:08 ` Jaromir Capik 2012-07-18 16:08 ` Roberto Spadim 2012-07-18 21:02 ` keld 0 siblings, 2 replies; 36+ messages in thread From: Jaromir Capik @ 2012-07-18 13:08 UTC (permalink / raw) To: keld; +Cc: linux-raid > > Hello. > > > > I'd like to ask you to implement the following ... > > > > The current RAID1 solution is not robust enough to protect the data > > against random data corruptions. Such corruptions usually happen > > when an unreadable sector is found by the drive's electronics > > and when the drive's trying to reallocate the sector to the spare > > area. > > There's no guarantee that the reallocated data will always match > > the original stored data since the drive sometimes can't read the > > data > > correctly even with several retries. That unfortunately completely > > masks > > the issue, because the sector can be read by the OS without > > problems > > even if it doesn't contain correct data. Would it be possible > > to implement chunk checksums to avoid such data corruptions? > > If a corrupted chunk is encountered, it would be taken from the > > second > > drive and immediately synced back. This would have a small > > performance > > and capacity impact (1 sector per chunk to minimize performance > > impact > > caused by unaligned granularity = 0.78% of the capacity with 64k > > chunks). > > > > Please, let me know if you find my request reasonable or not. > > I believe alternative to that is implemented via the Linux RAID MD > badblock feature. Hello keld ... I couldn't find any info about that feature. Could you please give me more info about that? Thanks in advance. Regards, Jaromir. > > best regards > keld > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 13:08 ` Jaromir Capik @ 2012-07-18 16:08 ` Roberto Spadim 2012-07-20 10:35 ` Jaromir Capik 2012-07-18 21:02 ` keld 1 sibling, 1 reply; 36+ messages in thread From: Roberto Spadim @ 2012-07-18 16:08 UTC (permalink / raw) To: Jaromir Capik; +Cc: keld, linux-raid yeah, i think this data corruption could/should be implemented as badblocks... do you have a disk that read blocks with wrong data like you told? if yes, could you check if it have bad blocks? (via some software, since i don´t know if linux kernel will report it as badblock on dmesg or something else) some disk manufacturers have MSDOS compatible program to check disk badblocks and others features... check if it´s a real bad block, or a disk problem (controller/data comunication) 2012/7/18 Jaromir Capik <jcapik@redhat.com> > > > > Hello. > > > > > > I'd like to ask you to implement the following ... > > > > > > The current RAID1 solution is not robust enough to protect the data > > > against random data corruptions. Such corruptions usually happen > > > when an unreadable sector is found by the drive's electronics > > > and when the drive's trying to reallocate the sector to the spare > > > area. > > > There's no guarantee that the reallocated data will always match > > > the original stored data since the drive sometimes can't read the > > > data > > > correctly even with several retries. That unfortunately completely > > > masks > > > the issue, because the sector can be read by the OS without > > > problems > > > even if it doesn't contain correct data. Would it be possible > > > to implement chunk checksums to avoid such data corruptions? > > > If a corrupted chunk is encountered, it would be taken from the > > > second > > > drive and immediately synced back. This would have a small > > > performance > > > and capacity impact (1 sector per chunk to minimize performance > > > impact > > > caused by unaligned granularity = 0.78% of the capacity with 64k > > > chunks). > > > > > > Please, let me know if you find my request reasonable or not. > > > > I believe alternative to that is implemented via the Linux RAID MD > > badblock feature. > > Hello keld ... > > I couldn't find any info about that feature. Could you please give > me more info about that? > > Thanks in advance. > > Regards, > Jaromir. > > > > > best regards > > keld > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 16:08 ` Roberto Spadim @ 2012-07-20 10:35 ` Jaromir Capik 0 siblings, 0 replies; 36+ messages in thread From: Jaromir Capik @ 2012-07-20 10:35 UTC (permalink / raw) To: Roberto Spadim; +Cc: keld, linux-raid > yeah, i think this data corruption could/should be implemented as > badblocks... > do you have a disk that read blocks with wrong data like you told? All of them were replaced during the warranty period ... but it seems I have a new candidate. I'll use it for my tests. I'll write specific data there and then keep reading it back with sufficient idle intervals in between until I get either a read error or corrupted data without read errors. > if yes, could you check if it have bad blocks? (via some software, > since i don´t know if linux kernel will report it as badblock on > dmesg or something else) I always check S.M.A.R.T. attributes, and all of the drives reported reallocated and pending sectors, while in some cases there were no uncorrectable sectors reported. I remember that one of the drives stopped booting because of MBR corruption, but the sector was readable with dd without problems. I could also clean it and create a new partition table with fdisk (but the SMART attributes didn't change with the new write operation). That really looks like a reallocation was done prior to my checks, even though reallocations should happen only during writes, and I'm sure there was absolutely no need to write to the MBR. I suspect that some drive firmwares do the reallocation transparently while idle. Especially Seagate drives with capacities around 200GB can be heard doing their own surface checks when they're idle. Maybe that's the intention of the manufacturers. I could imagine they don't want people to claim drive replacements and thus they're trying to cover the issues up. I also believe that the SMART attributes might be intentionally misreported by the firmware. The drive's electronics might be transparently doing a lot of internal stuff, dependent on the particular drive's internal design, that can't be easily mapped to any of the SMART attributes and thus isn't reported at all. You know, nobody can make the manufacturers follow the rules ... moreover, there might be a design/firmware bug or something else preventing the drive from working correctly in some cases. I can imagine many different scenarios since I was a hardware designer for almost 10 years, and writing firmware for a conceptually wrong hardware design might be the worst nightmare you could ever imagine. And low-price device designs often cut corners and are full of workarounds. Anyway ... I believe that relying on hardware that is unreliable by nature might be considered a conceptual issue of the current MD-RAID layer. > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > Regards, Jaromir. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
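The write-then-reread test sketched above is easy to script. A rough version (Python 3.9+; the device path is a placeholder and the script destroys whatever it is pointed at; a real test would also need O_DIRECT or a page-cache drop so the re-reads actually hit the platters rather than RAM):

  import random, time

  DEV = "/dev/sdX"          # placeholder for the suspect drive
  BLOCK = 64 * 1024
  COUNT = 1024              # number of test blocks to lay down
  INTERVAL = 6 * 3600       # idle time between verification passes, in seconds

  def pattern(i):           # deterministic, reproducible content for block i
      random.seed(i)
      return random.randbytes(BLOCK)

  with open(DEV, "wb") as f:            # write the known data once
      for i in range(COUNT):
          f.write(pattern(i))

  while True:
      time.sleep(INTERVAL)              # let the drive idle and do its own housekeeping
      with open(DEV, "rb") as f:
          for i in range(COUNT):
              if f.read(BLOCK) != pattern(i):
                  print(f"block {i}: silent mismatch, no read error reported")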
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 13:08 ` Jaromir Capik 2012-07-18 16:08 ` Roberto Spadim @ 2012-07-18 21:02 ` keld 1 sibling, 0 replies; 36+ messages in thread From: keld @ 2012-07-18 21:02 UTC (permalink / raw) To: Jaromir Capik; +Cc: linux-raid On Wed, Jul 18, 2012 at 09:08:51AM -0400, Jaromir Capik wrote: > > > Hello. > > > > > > I'd like to ask you to implement the following ... > > > > > > The current RAID1 solution is not robust enough to protect the data > > > against random data corruptions. Such corruptions usually happen > > > when an unreadable sector is found by the drive's electronics > > > and when the drive's trying to reallocate the sector to the spare > > > area. > > > There's no guarantee that the reallocated data will always match > > > the original stored data since the drive sometimes can't read the > > > data > > > correctly even with several retries. That unfortunately completely > > > masks > > > the issue, because the sector can be read by the OS without > > > problems > > > even if it doesn't contain correct data. Would it be possible > > > to implement chunk checksums to avoid such data corruptions? > > > If a corrupted chunk is encountered, it would be taken from the > > > second > > > drive and immediately synced back. This would have a small > > > performance > > > and capacity impact (1 sector per chunk to minimize performance > > > impact > > > caused by unaligned granularity = 0.78% of the capacity with 64k > > > chunks). > > > > > > Please, let me know if you find my request reasonable or not. > > > > I believe alternative to that is implemented via the Linux RAID MD > > badblock feature. > > Hello keld ... > > I couldn't find any info about that feature. Could you please give > me more info about that? I do believe it is already implemented. I do not have the documentation. But have a look in newer mdadm documentation or in the kernel sources, or in he archives for this email list. If you do find useful info, then please tell it here, and we can put the info on our wiki. The wiki is now open again for modifications. If you find out how to use it we could also add info on that to the wiki. best regards keld ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 11:01 ` [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust Jaromir Capik ` (2 preceding siblings ...) 2012-07-18 11:49 ` keld @ 2012-07-18 16:28 ` Asdo 2012-07-20 11:07 ` Jaromir Capik 3 siblings, 1 reply; 36+ messages in thread From: Asdo @ 2012-07-18 16:28 UTC (permalink / raw) To: Jaromir Capik; +Cc: linux-raid On 07/18/12 13:01, Jaromir Capik wrote: > Hello. > > I'd like to ask you to implement the following ... > > The current RAID1 solution is not robust enough to protect the data > against random data corruptions. Such corruptions usually happen > when an unreadable sector is found by the drive's electronics > and when the drive's trying to reallocate the sector to the spare area. > There's no guarantee that the reallocated data will always match > the original stored data since the drive sometimes can't read the data > correctly even with several retries. That unfortunately completely masks > the issue, because the sector can be read by the OS without problems > even if it doesn't contain correct data. Would it be possible > to implement chunk checksums to avoid such data corruptions? > If a corrupted chunk is encountered, it would be taken from the second > drive and immediately synced back. This would have a small performance > and capacity impact (1 sector per chunk to minimize performance impact > caused by unaligned granularity = 0.78% of the capacity with 64k chunks). > > Please, let me know if you find my request reasonable or not. > > Thanks in advance. > > Regards, > Jaromir. > This is a very invasive change that you ask, conceptually, man-hours-wise, performance-wise, ondisk-format wise and space-wise. Also it really should stay at another layer, preferably below the RAID (btrfs and zfs do this above it, though). This should probably be a DM/LVM project. Drives do this already; they have checksums (google for Reed-Solomon). If the checksums are not long enough you should use different drives. But in my life I have never seen a "silent data corruption" like the one you describe. Also, statistically speaking, if one disk checksum returns a false positive the drive is very likely dying, because it takes very many bit flips to bypass the Reed-Solomon check, so other sectors on the same drive have almost certainly given read errors and you should have replaced the drive long ago. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-18 16:28 ` Asdo @ 2012-07-20 11:07 ` Jaromir Capik 2012-07-20 11:14 ` Oliver Schinagl 2012-07-20 11:28 ` Jaromir Capik 0 siblings, 2 replies; 36+ messages in thread From: Jaromir Capik @ 2012-07-20 11:07 UTC (permalink / raw) To: Asdo; +Cc: linux-raid > This is a very invasive change that you ask, conceptually, > man-hours-wise, performance-wise, ondisk-format wise, space-wise Yes ... I'm aware of the possibly high number of man-hours. If we talk about space ... 0.78% is not so invasive, is it? On-disk format ... interleaving chunks with checksum sectors doesn't seem to me like complicated math ... chunk_starting_sector = chunk_number * (chunk_size_in_sectors + 1) ... of course this is relative to the chunk area offset. > also it really should stay at another layer, preferably below the > RAID but how would you implement that if the lower level is known to be unreliable? > (btrfs and zfs do this above though). Btrfs and zfs have their own RAID layers, so there's no need for an underlying MD-RAID. But I haven't studied how exactly it's done there. > This should probably be a > DM/LVM > project. LVM? How would you implement that in LVM? You would create two big PVs with two big logical volumes protected by checksums? The mdraid layer would be built on top of these, right? That could possibly work too if LVM returns read errors for blocks with incorrect checksums. I'm not fully against that idea. > > Drives do this already, they have checksums (google for > reed-solomon). > If the checksums are not long enough you should use different drives. > But in my life I never saw a "silent data corruption" like the one > you say. I believe I've mentioned my experience with such nasty HDD behaviour in my previous email. I also don't like it, but it apparently happens and we can't rely on the hardware functioning properly, especially when it's unreliable by nature. Regards, Jaromir. ^ permalink raw reply [flat|nested] 36+ messages in thread
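Spelled out, the interleaving arithmetic from the message above (purely illustrative of the proposed layout, not an existing md format; offsets are relative to the start of the chunk area, with 64 KiB chunks and 512-byte sectors assumed):

  SECTORS_PER_CHUNK = 128   # 64 KiB chunks / 512-byte sectors

  def chunk_start_sector(chunk_no):
      # each data chunk is followed by its single checksum sector
      return chunk_no * (SECTORS_PER_CHUNK + 1)

  def checksum_sector(chunk_no):
      return chunk_start_sector(chunk_no) + SECTORS_PER_CHUNK

  def logical_to_member(logical_sector):
      # map a sector of the exported array onto the member device
      chunk_no, within = divmod(logical_sector, SECTORS_PER_CHUNK)
      return chunk_start_sector(chunk_no) + within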
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-20 11:07 ` Jaromir Capik @ 2012-07-20 11:14 ` Oliver Schinagl 2012-07-20 11:28 ` Jaromir Capik 1 sibling, 0 replies; 36+ messages in thread From: Oliver Schinagl @ 2012-07-20 11:14 UTC (permalink / raw) To: Jaromir Capik; +Cc: Asdo, linux-raid On 20-07-12 13:07, Jaromir Capik wrote: >> This is a very invasive change that you ask, conceptually, >> man-hours-wise, performance-wise, ondisk-format wise, space-wise > Yes ... I'm aware of possibly high number of man-hours. > If we talk about space ... 0.78% is not so invasive, is it? > On-disk format ... interleaving chunks with checksum sectors doesn't > seem to me a complicated math ... > > chunk_starting_sector = chunk_number * (chunk_size_in_sectors + 1) > > ... of course this is relative to the chunk area offset. > >> also it really should stay at another layer, preferably below the >> RAID > but how would you like to implement that if the lower level is known > to be unreliable enough? > >> (btrfs and zfs do this above though). > Btrfs and zfs has it's own RAID layer, so there's no need for > underlying MD-RAID. But I haven't studied how exactly it's done > there. > >> This should probably be a >> DM/LVM >> project. > LVM ? How do you want to implement that in LVM? You would create > two big PVs with two big logical partitions protected by checksums? > The mdraid layer would be built on top of these, right? > That could possibly work too if LVM returns read errors for blocks > with incorrect checksums. I'm not fully against that idea. > >> Drives do this already, they have checksums (google for >> reed-solomon). >> If the checksums are not long enough you should use different drives. >> But in my life I never saw a "silent data corruption" like the one >> you say. > I believe I've mentioned my experience with such nasty HDD behaviour > in my previous email. I also don't like that, but it apparently > happens and we can't rely on the proper hardware functioning > especially when it's unreliable by nature. Actually, I've had quite some dataloss due to a hardrive/controller/cabling not working properly (no clue what caused it) but raid5 never complained. To this date, I do not know what happened and why my data was corrupt. > Regards, > Jaromir. > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-20 11:07 ` Jaromir Capik 2012-07-20 11:14 ` Oliver Schinagl @ 2012-07-20 11:28 ` Jaromir Capik 1 sibling, 0 replies; 36+ messages in thread From: Jaromir Capik @ 2012-07-20 11:28 UTC (permalink / raw) To: Asdo; +Cc: linux-raid > Btrfs and zfs have their own RAID layers, so there's no need for > an underlying MD-RAID. But I haven't studied how exactly it's done > there. I just read some ZFS docs; look at the following paragraph ... --- Mirrored Vdev’s (RAID1) This is akin to RAID1. If you mirror a pair of Vdev’s (each Vdev is usually a single hard drive) it is just like RAID1, except you get the added bonus of automatic checksumming. This prevents silent data corruption that is usually undetectable by most hardware RAID cards. --- As you can see, that's not just my imagination ... It seems I'm not the only one who has encountered silent data corruption. And it doesn't matter what the root cause of such corruptions is. They simply appear from time to time, and checksums seem to prevent them from being silently ignored. > > This should probably be a > > DM/LVM > > project. > > LVM? How would you implement that in LVM? You would create > two big PVs with two big logical volumes protected by checksums? > The mdraid layer would be built on top of these, right? > That could possibly work too if LVM returns read errors for blocks > with incorrect checksums. I'm not fully against that idea. It just occurred to me that this wouldn't allow us to resync the correct data immediately back to the drive where the corruption appeared. So ... I still believe that the RAID layer is the best place for this feature. Regards, Jaromir. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust [not found] <1082734092.338339.1342995087426.JavaMail.root@redhat.com> @ 2012-07-23 4:29 ` Stan Hoeppner 2012-07-23 9:34 ` Jaromir Capik 0 siblings, 1 reply; 36+ messages in thread From: Stan Hoeppner @ 2012-07-23 4:29 UTC (permalink / raw) To: Jaromir Capik, Linux RAID Please keep discussion on list. This is probably an MUA issue. Happens to me on occasion when I hit "reply to list" instead of "reply to all". vger doesn't provide a List-Post: header so "reply to list" doesn't work and you end up replying to the sender. On 7/22/2012 5:11 PM, Jaromir Capik wrote: >>> I admit, that the problem could lie elsewhere ... but that doesn't >>> change anything on the fact, that the data became corrupted without >>> me noticing that. >> >> The key here I think is "without me noticing that". Drives normally >> cry >> out in the night, spitting errors to logs, when they encounter >> problems. >> You may not receive an immediate error in your application, >> especially >> when the drive is a RAID member and the data can be shipped >> regardless >> of the drive error. If you never check your logs, or simply don't >> see >> these disk errors, how will you know there's a problem? > > Hello Stan. > > I used to periodically check logs as well as S.M.A.R.T. attributes. > And I believe I've already mentioned two of the cases and how > I finally discovered the issues. Moreover I switched from manual > checking to receiving emails from monitoring daemons. And even > if you receive such email, it usually takes some time to replace > the failing drive. That time window might be fatal for your data > if junk is read from one of the drives and when it's followed > by a write. Such write would destroy the second correct copy ... > >> >> Likewise, if the checksumming you request is implemented in md/RAID1, >> and your application never sees a problem when a drive heads South, >> and >> you never check your logs and thus don't see the checksum errors... > > You wouldn't have to ... because the corrupted chunks would be > immediately resynced with good data and you'll REALLY get some errors > in the logs if the harddrive or controller or it's driver doesn't > produce them for whatever reason. > >> >> How is this new checksumming any better than the current situation? >> The >> drive is still failing and you're still unaware of it. > > Do you believe, that other reasons of silent data corruptions simply > do not exist? Try to imagine a case, when the correct data aren't > written at all to one of the drives due to a bug in the drive's firmware > or due to a bug in the controller design or due to a bug in the > controller driver or due to other reasons. Such bug could be tiggered > by anything ... it could be a delay in the read operation when the > sector is not well readable or any race condition, etc. Especially > new devices and their very first versions are expected to be buggy. > Checksuming would prevent them all and would make the whole > I/O really bulletproof. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-23 4:29 ` Stan Hoeppner @ 2012-07-23 9:34 ` Jaromir Capik 2012-07-23 10:53 ` Stan Hoeppner 2012-07-23 17:03 ` Piergiorgio Sartor 0 siblings, 2 replies; 36+ messages in thread From: Jaromir Capik @ 2012-07-23 9:34 UTC (permalink / raw) To: stan; +Cc: Linux RAID Hello Stan. I received your reply without having the Linux RAID list in Cc and thus I was unsure if you wanna discuss that privately or not. I always choose reply to all unless I really want to remove some of the recipients :] Cheers, Jaromir. > > Please keep discussion on list. This is probably an MUA issue. > Happens > to me on occasion when I hit "reply to list" instead of "reply to > all". > vger doesn't provide a List-Post: header so "reply to list" doesn't > work and you end up replying to the sender. > > On 7/22/2012 5:11 PM, Jaromir Capik wrote: > >>> I admit, that the problem could lie elsewhere ... but that > >>> doesn't > >>> change anything on the fact, that the data became corrupted > >>> without > >>> me noticing that. > >> > >> The key here I think is "without me noticing that". Drives > >> normally > >> cry > >> out in the night, spitting errors to logs, when they encounter > >> problems. > >> You may not receive an immediate error in your application, > >> especially > >> when the drive is a RAID member and the data can be shipped > >> regardless > >> of the drive error. If you never check your logs, or simply don't > >> see > >> these disk errors, how will you know there's a problem? > > > > Hello Stan. > > > > I used to periodically check logs as well as S.M.A.R.T. attributes. > > And I believe I've already mentioned two of the cases and how > > I finally discovered the issues. Moreover I switched from manual > > checking to receiving emails from monitoring daemons. And even > > if you receive such email, it usually takes some time to replace > > the failing drive. That time window might be fatal for your data > > if junk is read from one of the drives and when it's followed > > by a write. Such write would destroy the second correct copy ... > > > >> > >> Likewise, if the checksumming you request is implemented in > >> md/RAID1, > >> and your application never sees a problem when a drive heads > >> South, > >> and > >> you never check your logs and thus don't see the checksum > >> errors... > > > > You wouldn't have to ... because the corrupted chunks would be > > immediately resynced with good data and you'll REALLY get some > > errors > > in the logs if the harddrive or controller or it's driver doesn't > > produce them for whatever reason. > > > >> > >> How is this new checksumming any better than the current > >> situation? > >> The > >> drive is still failing and you're still unaware of it. > > > > Do you believe, that other reasons of silent data corruptions > > simply > > do not exist? Try to imagine a case, when the correct data aren't > > written at all to one of the drives due to a bug in the drive's > > firmware > > or due to a bug in the controller design or due to a bug in the > > controller driver or due to other reasons. Such bug could be > > tiggered > > by anything ... it could be a delay in the read operation when the > > sector is not well readable or any race condition, etc. Especially > > new devices and their very first versions are expected to be buggy. > > Checksuming would prevent them all and would make the whole > > I/O really bulletproof. 
> > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-23 9:34 ` Jaromir Capik @ 2012-07-23 10:53 ` Stan Hoeppner 2012-07-23 17:03 ` Piergiorgio Sartor 1 sibling, 0 replies; 36+ messages in thread From: Stan Hoeppner @ 2012-07-23 10:53 UTC (permalink / raw) To: Jaromir Capik; +Cc: Linux RAID On 7/23/2012 4:34 AM, Jaromir Capik wrote: > Hello Stan. > > I received your reply without having the Linux RAID list in Cc > and thus I was unsure if you wanna discuss that privately or not. > I always choose reply to all unless I really want to remove > some of the recipients :] When you saw the same message also arrive via the list, that wasn't a clue. ;) -- Stan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-23 9:34 ` Jaromir Capik 2012-07-23 10:53 ` Stan Hoeppner @ 2012-07-23 17:03 ` Piergiorgio Sartor 2012-07-23 18:24 ` Roberto Spadim 1 sibling, 1 reply; 36+ messages in thread From: Piergiorgio Sartor @ 2012-07-23 17:03 UTC (permalink / raw) To: Jaromir Capik; +Cc: stan, Linux RAID Hi all, actually, what you would like to do is already possible, albeit it will kill the performance of a rotating, mechanical HDD. With an SSD it might work better. If you take an HDD and partition it, let's say with 100 partitions (GPT will be required), then you can build a RAID-6 using these 100 partitions, giving a redundancy of 2%. Taking two, or more, of such configured RAID-6 arrays, it will be possible to build a RAID-1 (or something else) with them. If a check of this RAID-1 returns mismatches, it will be possible to check the individual devices and find out which one is not OK. With RAID-6 (per device), and a bit of luck, it will be possible to fix it directly. Of course, a lot of variables are tunable here: for example the number of partitions, the chunk size, or even the fact that with X partitions it could be possible to build more than one RAID-6, increasing the effective redundancy. All with the performance price I mentioned at the beginning. bye, -- piergiorgio ^ permalink raw reply [flat|nested] 36+ messages in thread
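To put rough numbers on this layout: the sketch below is nothing more than the arithmetic behind the 2% figure, for a hypothetical disk split into N equal GPT partitions assembled into a single RAID-6. It is not a measured or recommended configuration, and it ignores the seek-thrashing cost Piergiorgio warns about.

    # Back-of-the-envelope numbers for one disk split into N equal partitions
    # assembled as a single RAID-6 (illustration only, hypothetical sizes).
    def per_disk_raid6(disk_bytes, n_parts):
        part = disk_bytes // n_parts
        return {
            "partitions": n_parts,
            "parity_overhead_pct": 100.0 * 2 / n_parts,   # RAID-6 = 2 parity members
            "usable_bytes": (n_parts - 2) * part,
            "rebuildable_bad_partitions": 2,              # per disk, before data loss
        }

    for n in (10, 50, 100):
        print(per_disk_raid6(2 * 10**12, n))              # e.g. a 2 TB drive

With 100 partitions the parity overhead is 2%, and up to two damaged partition-sized regions per disk can be rebuilt locally before the outer RAID-1 between whole disks is even consulted.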
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-23 17:03 ` Piergiorgio Sartor @ 2012-07-23 18:24 ` Roberto Spadim 2012-07-23 21:31 ` Drew 2012-07-24 15:09 ` Jaromir Capik 0 siblings, 2 replies; 36+ messages in thread From: Roberto Spadim @ 2012-07-23 18:24 UTC (permalink / raw) To: Piergiorgio Sartor; +Cc: Jaromir Capik, stan, Linux RAID Yeah, I think this too, but IMO Jaromir described a specific scenario; let's get back to it first and check a generic scenario afterwards. He is using small computers (I don't know if they're ARM or x86) with space for only 2 disks (I told him to use RAID5 or RAID6 because of the checksums, but he doesn't have space for >=3 disks in the computer case; maybe if we could run RAID5 with 2 disks it could help ... or 1 disk ... just a silly idea, but it could help ...). I don't know the real scenario, but I don't think he would use 100 partitions, maybe 4 or 5 partitions, and performance would be a secondary concern; security is the priority here. In the implementation of this new layer (maybe like LINEAR, MULTIPATH or another non-RAID level) we could focus on security first and performance afterwards. Just some ideas ... 2012/7/23 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> > > Hi all, > > actually, what you would like to do is already > possible, albeit it will kill the performance > of a rotating, mechanical, HDD. > With SSD might work better. > > If you take an HDD and partition it, let's say > with 100 partitions (GPT will be required), > then you can build a RAID-6 using this 100 > partitions, having a redundancy of 2%. > Taking two, or more, of such configured RAID-6, > it will be possible to build (with them) a > RAID-1 (or else). > > If a check of this RAID-1 returns mismatches, > it will be possible to check the single devices > and find out which is not OK. > With RAID-6 (per device), and a bit of luck, it > will be possible to fix it directly. > > Of course a lot of variables are tunable here. > For example the number of partitions, the chunk > size, or even the fact that with X partitions > it could be possible to build more than one RAID-6, > increasing the effective redundancy. > > All with the performance price I mentioned at the > beginning. > > bye, > > -- > > piergiorgio > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-23 18:24 ` Roberto Spadim @ 2012-07-23 21:31 ` Drew 2012-07-23 21:42 ` Roberto Spadim ` (2 more replies) 2012-07-24 15:09 ` Jaromir Capik 1 sibling, 3 replies; 36+ messages in thread From: Drew @ 2012-07-23 21:31 UTC (permalink / raw) To: Linux RAID Been mulling this problem over and I keep getting hung up on one problem with ECC on a two-disk RAID1 setup. In the event of silent corruption of one disk, which one is the good copy? It works fine if the ECC code is identical across both mirrors. Just checksum both chunks and discard the incorrect one. It also works fine if the ECC codes are corrupted but the data chunks are identical. Discard the bad checksum. What if the corruption goes across several sectors and both data & ECC chunks are corrupted? Now you're back to square one. -- Drew "Nothing in life is to be feared. It is only to be understood." --Marie Curie ^ permalink raw reply [flat|nested] 36+ messages in thread
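Drew's three cases (and the unresolvable fourth) can be written out as a small decision function. It assumes the hypothetical per-chunk checksum layout discussed in this thread, with each mirror handing back its data chunk together with the checksum it has stored; this is only a sketch of the arbitration policy, not anything md implements.

    # Arbitration between two mirror copies, given (data, stored_checksum)
    # from each.  Purely illustrative; "A"/"B" name the mirrors to rewrite.
    import zlib

    def pick_copy(a, b):
        ok_a = zlib.crc32(a[0]) == a[1]
        ok_b = zlib.crc32(b[0]) == b[1]
        if ok_a and ok_b:
            return a[0], []          # both self-consistent (they could still differ)
        if ok_a:
            return a[0], ["B"]       # B failed its check: resync data+checksum from A
        if ok_b:
            return b[0], ["A"]
        if a[0] == b[0]:
            return a[0], ["A", "B"]  # data agrees, only the stored checksums are bad
        return None                  # Drew's square one: report a read error upward

The last branch is the honest answer to the question above: with only two copies and no majority, the layer can only refuse the read, which is still an improvement over silently returning one of the two candidates.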
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-23 21:31 ` Drew @ 2012-07-23 21:42 ` Roberto Spadim 2012-07-24 4:42 ` Stan Hoeppner 2012-07-27 6:06 ` Adam Goryachev 2 siblings, 0 replies; 36+ messages in thread From: Roberto Spadim @ 2012-07-23 21:42 UTC (permalink / raw) To: Drew; +Cc: Linux RAID That's the point ... in a few words ... 2012/7/23 Drew <drew.kay@gmail.com>: > Been mulling this problem over and I keep getting hung up on one > problem with ECC on a two disk RAID1 setup. > > In the event of silent corruption of one disk, which one is the good copy? > > It works fine if the ECC code is identical across both mirrors. Just > checksum both chunks and discard the incorrect one. Nice, we can recover the data =) > > It also works fine if the ECC codes are corrupted but the data chunks > are identical. Discard the bad checksum. Nice, we can recover the data here too =) > > What if the corruption goes across several sectors and both data & ECC > chuncks are corrupted? Now you're back to square one. Report a bad block to the upper layer (file system, md raid, LVM or any other process). The same should happen with a hard disk with known corrupted data, but in this case we know it's wrong and we can report it! =) That's the nice part! Very different from "silent data corruption", where no alert, warning or error is reported; that's the bad part today ... OK ... you will point to the NEC report? They have this 'software security' in firmware (I think we could make something similar). > -- > Drew > > "Nothing in life is to be feared. It is only to be understood." > --Marie Curie > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-23 21:31 ` Drew 2012-07-23 21:42 ` Roberto Spadim @ 2012-07-24 4:42 ` Stan Hoeppner 2012-07-24 12:51 ` Roberto Spadim 2012-07-27 6:06 ` Adam Goryachev 2 siblings, 1 reply; 36+ messages in thread From: Stan Hoeppner @ 2012-07-24 4:42 UTC (permalink / raw) To: Drew; +Cc: Linux RAID On 7/23/2012 4:31 PM, Drew wrote: > What if the corruption goes across several sectors and both data & ECC > chuncks are corrupted? What if the 'silent' corruption spans 10 million sectors (~5GB)? -- Stan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-24 4:42 ` Stan Hoeppner @ 2012-07-24 12:51 ` Roberto Spadim 0 siblings, 0 replies; 36+ messages in thread From: Roberto Spadim @ 2012-07-24 12:51 UTC (permalink / raw) To: stan; +Cc: Drew, Linux RAID 10 million bad blocks (~5 GB of lost information): note that we are still talking about one device (it can be a disk, a partition, a raid1, a raid0, nbd, drbd, or anything else). 2012/7/24 Stan Hoeppner <stan@hardwarefreak.com>: > On 7/23/2012 4:31 PM, Drew wrote: > >> What if the corruption goes across several sectors and both data & ECC >> chuncks are corrupted? > > What if the 'silent' corruption spans 10 million sectors (~5GB)? > > -- > Stan > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Roberto Spadim Spadim Technology / SPAEmpresarial ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-23 21:31 ` Drew 2012-07-23 21:42 ` Roberto Spadim 2012-07-24 4:42 ` Stan Hoeppner @ 2012-07-27 6:06 ` Adam Goryachev 2012-07-27 13:42 ` Roberto Spadim 2 siblings, 1 reply; 36+ messages in thread From: Adam Goryachev @ 2012-07-27 6:06 UTC (permalink / raw) To: Linux RAID On 24/07/12 07:31, Drew wrote: > Been mulling this problem over and I keep getting hung up on one > problem with ECC on a two disk RAID1 setup. > > In the event of silent corruption of one disk, which one is the good > copy? > > It works fine if the ECC code is identical across both mirrors. Just > checksum both chunks and discard the incorrect one. > > It also works fine if the ECC codes are corrupted but the data > chunks are identical. Discard the bad checksum. > > What if the corruption goes across several sectors and both data & > ECC chuncks are corrupted? Now you're back to square one. I know I'm a bit late to this discussion, and I know very little about the code level/etc... however, I thought the whole point of the checksum is to determine that the data + checksum do not match, therefore the data is wrong and should be discarded. You would re-write the data and checksum from another source (ie, the other drive in RAID1, or other drives in RAID5/6 etc...). ie, it should be treated the same as a bad block / non-readable sector (or lots of unreadable sectors....) Regards, Adam -- Adam Goryachev Website Managers www.websitemanagers.com.au ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-27 6:06 ` Adam Goryachev @ 2012-07-27 13:42 ` Roberto Spadim 0 siblings, 0 replies; 36+ messages in thread From: Roberto Spadim @ 2012-07-27 13:42 UTC (permalink / raw) To: Adam Goryachev; +Cc: Linux RAID IMO the first idea was to put this only in md_raid1; the second idea was a new md device (maybe md_security or md_redundancy or md_conformity or some other beautiful name ...). In this case the device would do a checksum and report a 'badblock' (maybe the right word would be 'badchecksum'). That's the option I agree with, since we could do it on top of any device; it doesn't matter if it's a raid1 or raid4 or raidXYZ. Just to define the words: badchecksum -> we can read the data, but we know that it doesn't match the checksum (or the checksum doesn't match the data); badblock -> we can't read at all, because the 'physical block' is reported as bad. For mirror layers we could do more than just know that we have a badchecksum. In the case of all mirrors reporting a badchecksum, we could read the data anyway (ignoring the badchecksum information), vote for the value with the most repeats, and resync the data from this new 'primary information'. For example: /dev/md0 -> disks: /dev/sda /dev/sdb /dev/sdc; original data: block="ABCDEF", checksum=5; from /dev/sda: block="ABCDEH", checksum=5 (badchecksum); from /dev/sdb: block="ABCDEG", checksum=5 (badchecksum); from /dev/sdc: block="ABCDEG", checksum=5 (badchecksum). In this case we could elect "ABCDEG" (2 repeats) as the 'new data', recalculate the checksum and sync the data to all devices (note that we could also get 1 repeat on each device and then couldn't elect a new primary information source ...). Well, this idea could be both good and bad ... for the application level it's bad, since we would have committed a silent data corruption ourselves, but for a recovery tool it could be good, since we corrected the checksum ... maybe this could be a tool of the new device level (CHECK and REPAIR, like mdadm does today with echo "check" > /sys/block/md0/md/sync_action or echo "repair" > /sys/block/md0/md/sync_action). I don't like the idea of putting the 'recovery' inside md_raid1; I prefer a badblock per device (it doesn't matter whether it's a badblock or a badchecksum), without doing any 'silent recovery' of information at the raid level. For checksum correction or data correction, maybe leave this problem to an external tool; just as hard disks have badblocks tools, we could have a badblock tool too. Going back to our new device: a data corruption (silent or not) is a data corruption, and in either case (checksum corruption or data corruption) we have a bad device and we should report a badblock for that read operation. The best we could do when we have a badchecksum is to reread many times and recalculate the checksum; if the good matches exceed some X% (maybe 80%), we could send a write to the device (to ensure that the disk writes the good value back) and do one more read. If that matches on a single read, that's nice: we have done a 'silent' repair with data that is good with ~80% probability. This could be an option of the new device ("silent recover"). I think that's all the interesting things we could do =) Maybe in the future we could even do a relocation, like SSDs do: 
mark the badchecksum block as a badblock (inside a badblock list) and sync the data from the current badblock to a new, never-used block (we could allocate 1% of the device as never-used blocks). This could be good for data security, but the administrator should read the logs to make sure the system doesn't keep running with badblocks .... Those are the ideas for the 'new' security device level that I can imagine ... thanks guys :) 2012/7/27 Adam Goryachev <mailinglists@websitemanagers.com.au>: > On 24/07/12 07:31, Drew wrote: >> Been mulling this problem over and I keep getting hung up on one >> problem with ECC on a two disk RAID1 setup. >> >> In the event of silent corruption of one disk, which one is the good >> copy? >> >> It works fine if the ECC code is identical across both mirrors. Just >> checksum both chunks and discard the incorrect one. >> >> It also works fine if the ECC codes are corrupted but the data >> chunks are identical. Discard the bad checksum. >> >> What if the corruption goes across several sectors and both data & >> ECC chuncks are corrupted? Now you're back to square one. > > I know I'm a bit late to this discussion, and I know very little about > the code level/etc... however, I thought the whole point of the checksum > is to determine that the data + checksum do not match, therefore the > data is wrong and should be discarded. You would re-write the data and > checksum from another source (ie, the other drive in RAID1, or other > drives in RAID5/6 etc...). > > ie, it should be treated the same as a bad block / non-readable sector > (or lots of unreadable sectors....) > > Regards, > Adam > > > -- > Adam Goryachev > Website Managers > www.websitemanagers.com.au > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
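Roberto's election step, under the same hypothetical checksum layer, boils down to a majority vote with ties reported upward rather than guessed. The sketch below is illustrative only, using his /dev/sda, /dev/sdb, /dev/sdc example; it is not proposed kernel code.

    # Majority vote among mirror copies that all failed their checksum.
    # Illustrative policy sketch; returns None when no safe winner exists.
    from collections import Counter

    def elect_block(copies, quorum=2):
        tally = Counter(copies)
        (block, votes), *rest = tally.most_common()
        runner_up = rest[0][1] if rest else 0
        if votes >= quorum and votes > runner_up:
            return block        # rewrite this block (and a fresh checksum) everywhere
        return None             # 1 repeat each, or a tie: report a badblock upward

    print(elect_block([b"ABCDEH", b"ABCDEG", b"ABCDEG"]))  # b'ABCDEG' wins with 2 votes
    print(elect_block([b"ABCDEH", b"ABCDEG", b"ABCDEF"]))  # None: no majority to trust

Whether such a repair should ever be automatic is exactly the disagreement in this sub-thread: it trades one detectable error for a small chance of quietly blessing the wrong data.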
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust 2012-07-23 18:24 ` Roberto Spadim 2012-07-23 21:31 ` Drew @ 2012-07-24 15:09 ` Jaromir Capik 1 sibling, 0 replies; 36+ messages in thread From: Jaromir Capik @ 2012-07-24 15:09 UTC (permalink / raw) To: Roberto Spadim; +Cc: stan, Linux RAID, Piergiorgio Sartor > yeah, i think this too, but IMO Jamiro exposed a specific scenario, > let´s get back to it and after check a generic scenario, > he is using small computers (i don´t know if it´s ARM or X86) with > space to only 2 disks (i told him to use raid5 or raid6 because the > checksums but he don´t have space for >=3 disks in computer case, > maybe if we could run raid5 with 2 disks could help... or 1 disk... > just a idiot idea, but this could help...) I believe, that Piergiorgio meant something else. It was about creation of a high number of small partitions on two physical drives and then build a RAID6 array on top of them. But that's really a bit overkill :] > i don´t know the real scenario, i think he will not use it in > 100partitions, maybe 4 or 5 partitions, and performance to be a > second > option, security is priority here > in the implementation of this new layer (maybe like LINEAR, MULTIPATH > or another not raid level) we could focus on security and after > performace > > just some ideas... > > 2012/7/23 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> > > > > Hi all, > > > > actually, what you would like to do is already > > possible, albeit it will kill the performance > > of a rotating, mechanical, HDD. > > With SSD might work better. > > > > If you take an HDD and partition it, let's say > > with 100 partitions (GPT will be required), > > then you can build a RAID-6 using this 100 > > partitions, having a redundancy of 2%. > > Taking two, or more, of such configured RAID-6, > > it will be possible to build (with them) a > > RAID-1 (or else). > > > > If a check of this RAID-1 returns mismatches, > > it will be possible to check the single devices > > and find out which is not OK. > > With RAID-6 (per device), and a bit of luck, it > > will be possible to fix it directly. > > > > Of course a lot of variables are tunable here. > > For example the number of partitions, the chunk > > size, or even the fact that with X partitions > > it could be possible to build more than one RAID-6, > > increasing the effective redundancy. > > > > All with the performance price I mentioned at the > > beginning. > > > > bye, > > > > -- > > > > piergiorgio > > -- > > To unsubscribe from this list: send the line "unsubscribe > > linux-raid" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <1897705147.341625.1342995720661.JavaMail.root@redhat.com>]
* Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust [not found] <1897705147.341625.1342995720661.JavaMail.root@redhat.com> @ 2012-07-23 4:30 ` Stan Hoeppner 0 siblings, 0 replies; 36+ messages in thread From: Stan Hoeppner @ 2012-07-23 4:30 UTC (permalink / raw) To: Jaromir Capik, Linux RAID Same issue likely. On 7/22/2012 5:22 PM, Jaromir Capik wrote: >>> Likewise, if the checksumming you request is implemented in >>> md/RAID1, > > Btw. I like what Roberto proposed ... this could be a completely > independent layer having its own device file. MD RAID1 would > then be built on top of such safe device files. The only thing > to be implemented directly in the RAID1 would be the immediate > resyncing in case of a read error reported by such a > safety layer. And this immediate resyncing could/should be > optional ... > > Jaromir. > ^ permalink raw reply [flat|nested] 36+ messages in thread