* Inactive arrays
From: Daniel Sanabria @ 2016-08-02 7:36 UTC
To: linux-raid

Hi All,

I have a box that I believe was not powered down correctly, and after transporting it to a different location it no longer boots, stopping at the BIOS check "Verifying DMI Pool Data".

The box has 6 drives, and after instructing the BIOS to boot from the first drive I managed to boot the OS (Fedora 23) after commenting out 2 /etc/fstab entries; output for "uname -a; cat /etc/fstab" follows:

[root@lamachine ~]# uname -a; cat /etc/fstab
Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

#
# /etc/fstab
# Created by anaconda on Tue Mar 24 19:31:21 2015
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/vg_bigblackbox-LogVol_root    /                   ext4  defaults  1 1
UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot               ext4  defaults  1 2
/dev/mapper/vg_bigblackbox-LogVol_opt     /opt                ext4  defaults  1 2
/dev/mapper/vg_bigblackbox-LogVol_tmp     /tmp                ext4  defaults  1 2
/dev/mapper/vg_bigblackbox-LogVol_var     /var                ext4  defaults  1 2
UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap                swap  defaults  0 0
/dev/md2                                  /home               ext4  defaults  1 2
#/dev/vg_media/lv_media                   /mnt/media          ext4  defaults  1 2
#/dev/vg_virt_dir/lv_virt_dir1            /mnt/guest_images/  ext4  defaults  1 2
[root@lamachine ~]#

When checking mdstat I can see that 2 of the arrays are showing up as inactive, but I am not sure how to safely activate these, so I am looking for some knowledgeable advice on how to proceed here.

Thanks in advance,

Daniel

Below are some more relevant outputs:

[root@lamachine ~]# cat /proc/mdstat
Personalities : [raid10] [raid6] [raid5] [raid4] [raid0]
md127 : active raid0 sda5[0] sdc5[2] sdb5[1]
      94367232 blocks super 1.2 512k chunks

md2 : active raid5 sda3[0] sdc2[2] sdb2[1]
      511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md128 : inactive sdf1[3](S)
      2147352576 blocks super 1.2

md129 : inactive sdf2[2](S)
      524156928 blocks super 1.2

md126 : active raid10 sda2[0] sdc1[1]
      30719936 blocks 2 near-copies [2/2] [UU]

unused devices: <none>
[root@lamachine ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md2 level=raid5 num-devices=3 UUID=2cff15d1:e411447b:fd5d4721:03e44022
ARRAY /dev/md126 level=raid10 num-devices=2 UUID=9af006ca:8845bbd3:bfe78010:bc810f04
ARRAY /dev/md127 level=raid0 num-devices=3 UUID=acd5374f:72628c93:6a906c4b:5f675ce5
ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
ARRAY /dev/md129 metadata=1.2 name=lamachine:129 UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
[root@lamachine ~]# mdadm --detail /dev/md1*
/dev/md126:
        Version : 0.90
  Creation Time : Thu Dec 3 22:12:12 2009
     Raid Level : raid10
     Array Size : 30719936 (29.30 GiB 31.46 GB)
  Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126
    Persistence : Superblock is persistent

    Update Time : Tue Aug 2 07:46:39 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 64K

           UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
         Events : 0.264152

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync set-A   /dev/sda2
       1       8       33        1      active sync set-B   /dev/sdc1
/dev/md127:
        Version : 1.2
  Creation Time : Tue Jul 26 19:00:28 2011
     Raid Level : raid0
     Array Size : 94367232 (90.00 GiB 96.63 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Jul 26 19:00:28 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : reading.homeunix.com:3
           UUID : acd5374f:72628c93:6a906c4b:5f675ce5
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       8       21        1      active sync   /dev/sdb5
       2       8       37        2      active sync   /dev/sdc5
/dev/md128:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 1
    Persistence : Superblock is persistent

          State : inactive

           Name : lamachine:128  (local to host lamachine)
           UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
         Events : 4154

    Number   Major   Minor   RaidDevice

       -       8       81        -        /dev/sdf1
/dev/md129:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 1
    Persistence : Superblock is persistent

          State : inactive

           Name : lamachine:129  (local to host lamachine)
           UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
         Events : 0

    Number   Major   Minor   RaidDevice

       -       8       82        -        /dev/sdf2
[root@lamachine ~]# mdadm --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Mon Feb 11 07:54:36 2013
     Raid Level : raid5
     Array Size : 511999872 (488.28 GiB 524.29 GB)
  Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Mon Aug 1 20:24:23 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
         Events : 0.611

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
[root@lamachine ~]#
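For anyone hitting the same symptom: a non-destructive first step is to read the md superblocks straight off the member devices before attempting any assembly. A minimal sketch, assuming the inactive members really are /dev/sdf1 and /dev/sdf2 as shown above:

  # Read-only: dump the md superblock of each member the kernel grabbed
  mdadm --examine /dev/sdf1 /dev/sdf2
  # Read-only: list every array the attached disks describe, for comparison
  # with the ARRAY lines in /etc/mdadm.conf
  mdadm --examine --scan

Nothing here writes to the disks; --examine only reads metadata, so it is safe to run before deciding how (or whether) to assemble anything.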
* Re: Inactive arrays
From: Wols Lists @ 2016-08-02 10:17 UTC
To: Daniel Sanabria, linux-raid

Just a quick first response. I see md128 and md129 are both down, and are both listed as one drive, raid0. Bit odd, that ...

What version of mdadm are you using? One of them had a bug (3.2.3 era?) that would split an array in two. Is it possible that you should have one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup...

I notice also that md126 is raid10 across two drives. That's odd, too.

How much do you know about what the setup should be, and why it was set up that way?

Download lsdrv by Phil Turmel (it requires python 2.7; if your machine is python3, a quick fix to the shebang at the start should get it to work). Post the output from that here.

Cheers,
Wol

On 02/08/16 08:36, Daniel Sanabria wrote:
> [snip]
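The checks Wol is asking for boil down to something like the following (a sketch; "./lsdrv" is whatever path the script was saved to):

  # Which mdadm is installed (the array-splitting bug was around the 3.2.3 era)
  mdadm --version
  # Run the lsdrv script under python 2 explicitly if the system default is python 3
  python2 ./lsdrv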
* Re: Inactive arrays
From: Daniel Sanabria @ 2016-08-02 10:45 UTC
To: Wols Lists; +Cc: linux-raid

Thanks very much for the response, Wol.

It looks like the PSU is dead (the server automatically powers off a few seconds after power-on).

I'm planning to order a PSU replacement to resume troubleshooting, so please bear with me; maybe the PSU was degraded and couldn't power some of the drives?

Cheers,

Daniel

On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote:
> [snip]
* Re: Inactive arrays
From: Daniel Sanabria @ 2016-08-03 19:18 UTC
To: Wols Lists; +Cc: linux-raid

Ok,

Unfortunately the PSU replacement didn't help and I could be facing a failed motherboard/CPU :( My question now is: is it possible to restore the arrays in a new machine?

Daniel

On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> [snip]
* Re: Inactive arrays
From: Wols Lists @ 2016-08-03 21:31 UTC
To: Daniel Sanabria; +Cc: linux-raid

On 03/08/16 20:18, Daniel Sanabria wrote:
> Ok,
>
> Unfortunately the PSU replacement didn't help and I could be facing a
> failed motherboard/cpu :( My question now is, is it possible to
> restore the arrays in a new machine?

Your comment about the system starting up and shutting down almost immediately implies to me a faulty on/off switch. It could be that simple. Or of course, it could be worse. Like so many things nowadays, on/off is controlled by software, and if the switch is stuck ... You might find all it needs is a new case. Or you might not.

But yes, if it's all linux software raid, then sticking the disks into a new machine should work fine - the new machine should just come straight up unless you're running an optimised kernel. (That doesn't imply the arrays will :-( you might need a bit of fixing for them, but if the machine boots successfully you're most of the way there.)

Cheers,
Wol
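In practice, the "bit of fixing" for md arrays on replacement hardware usually just means letting mdadm rediscover them from the on-disk superblocks. A rough sketch, assuming the drives themselves are healthy and only the host has changed:

  # Read-only: list every array described by the superblocks on the attached disks
  mdadm --examine --scan
  # Try to assemble whatever is not already running, using those superblocks
  mdadm --assemble --scan
  # If an mdadm.conf is carried over from the old machine, compare it with the
  # --examine --scan output before relying on it at boot

The assemble step refuses to start anything whose members look inconsistent, so it is a reasonable first attempt before more invasive options.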
* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-11 18:48 UTC
To: Wols Lists; +Cc: linux-raid

Ok, the system is up and running after the MB was replaced; however, the arrays remain inactive.

mdadm version is:
mdadm - v3.3.4 - 3rd August 2015

Here's the output from Phil's lsdrv:

[root@lamachine ~]# ./lsdrv
PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 06)
├scsi 0:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASZ0505379}
│└sda 465.76g [8:0] Partitioned (dos)
│ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync {9af006ca-8845-bbd3-bfe7-8010bc810f04}
│ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk {9af006ca:8845bbd3:bfe78010:bc810f04}
│ │ │ PV LVM2_member 28.03g used, 1.26g free {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
│ │ └VG vg_bigblackbox 29.29g 1.26g free {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR}
│ │  ├dm-2 7.81g [253:2] LV LogVol_opt ext4 {b08d7f5e-f15f-4241-804e-edccecab6003}
│ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt
│ │  ├dm-0 9.77g [253:0] LV LogVol_root ext4 {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3}
│ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ /
│ │  ├dm-3 1.95g [253:3] LV LogVol_tmp ext4 {f6b46363-170b-4038-83bd-2c5f9f6a1973}
│ │  │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp
│ │  └dm-1 8.50g [253:1] LV LogVol_var ext4 {ab165c61-3d62-4c55-8639-6c2c2bf4b021}
│ │   └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var
│ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync {2cff15d1-e411-447b-fd5d-472103e44022}
│ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk {2cff15d1:e411447b:fd5d4721:03e44022}
│ │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
│ │ └Mounted as /dev/md2 @ /home
│ ├sda3 1.00k [8:3] Partitioned (dos)
│ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
│ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
│ │ │ PV LVM2_member 86.00g used, 3.99g free {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
│ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB}
│ │  ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos)
│ │  ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos)
│ │  ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos)
│ │  ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos)
│ │  ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos)
│ │  ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos)
│ │  └dm-4 30.00g [253:4] LV win7 Partitioned (dos)
│ └sda6 3.39g [8:6] Empty/Unknown
├scsi 1:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASY7694185}
│└sdb 465.76g [8:16] Partitioned (dos)
│ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync {2cff15d1-e411-447b-fd5d-472103e44022}
│ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk {2cff15d1:e411447b:fd5d4721:03e44022}
│ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
│ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a}
│ ├sdb4 1.00k [8:20] Partitioned (dos)
│ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
│ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
│ │ PV LVM2_member 86.00g used, 3.99g free {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
│ └sdb6 3.39g [8:22] Empty/Unknown
├scsi 2:x:x:x [Empty]
├scsi 3:x:x:x [Empty]
├scsi 4:x:x:x [Empty]
└scsi 5:x:x:x [Empty]
PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe SATA 6Gb/s Controller (rev 11)
├scsi 6:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NCWT13RF}
│└sdc 2.73t [8:32] Partitioned (PMBR)
├scsi 7:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NPRDD6D7}
│└sdd 2.73t [8:48] Partitioned (gpt)
│ ├sdd1 2.00t [8:49] MD (none/) spare 'lamachine:128' {f2372cb9-d381-6fd6-ce86-d826882ec82e}
│ │└md128 0.00k [9:128] MD v1.2 () inactive, None (None) None {f2372cb9:d3816fd6:ce86d826:882ec82e}
│ │ Empty/Unknown
│ └sdd2 500.00g [8:50] MD (none/) spare 'lamachine:129' {895dae98-d1a4-96de-4f59-0b8bcb8ac12a}
│  └md129 0.00k [9:129] MD v1.2 () inactive, None (None) None {895dae98:d1a496de:4f590b8b:cb8ac12a}
│   Empty/Unknown
├scsi 8:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4N1294906}
│└sde 2.73t [8:64] Partitioned (PMBR)
├scsi 9:0:0:0 ATA WDC WD5000AAKS-0 {WD-WMAWF0085724}
│└sdf 465.76g [8:80] Partitioned (dos)
│ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557}
│ │└Mounted as /dev/sdf1 @ /boot
│ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync {9af006ca-8845-bbd3-bfe7-8010bc810f04}
│ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk {9af006ca:8845bbd3:bfe78010:bc810f04}
│ │ PV LVM2_member 28.03g used, 1.26g free {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH}
│ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync {2cff15d1-e411-447b-fd5d-472103e44022}
│ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk {2cff15d1:e411447b:fd5d4721:03e44022}
│ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311}
│ ├sdf4 1.00k [8:84] Partitioned (dos)
│ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5}
│ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None (None) None {acd5374f:72628c93:6a906c4b:5f675ce5}
│ │ PV LVM2_member 86.00g used, 3.99g free {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox}
│ └sdf6 3.39g [8:86] Empty/Unknown
├scsi 10:x:x:x [Empty]
├scsi 11:x:x:x [Empty]
└scsi 12:x:x:x [Empty]
PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation C602 chipset 4-Port SATA Storage Control Unit (rev 06)
└scsi 14:x:x:x [Empty]
[root@lamachine ~]#

Thanks in advance for any recommendations on what steps to take in order to bring these arrays back online.

Regards,

Daniel

On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> [snip]
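Worth noting from the lsdrv output above: md128 and md129 each see only one member (sdd1 and sdd2, flagged as spares), and the other two 3 TB drives (sdc, sde) show up as "Partitioned (PMBR)" with nothing usable underneath. A non-destructive next step, assuming nothing is writing to those drives, would be along these lines:

  # Release the devices held by the half-assembled, inactive arrays
  mdadm --stop /dev/md128 /dev/md129
  # Read-only look at what md metadata survives on the suspect drives/partitions
  mdadm --examine /dev/sdc /dev/sde /dev/sdd1 /dev/sdd2
  # And see what the kernel reported about them at boot
  dmesg | grep -i -e 'md:' -e raid

Stopping an array that never started should not touch the data on its members, and --examine only reads.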
* Re: Inactive arrays
From: Daniel Sanabria @ 2016-09-11 20:06 UTC
To: Wols Lists; +Cc: linux-raid

However, I'm noticing that the details with this new MB are somewhat different:

[root@lamachine ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md2 level=raid5 num-devices=3 UUID=2cff15d1:e411447b:fd5d4721:03e44022
ARRAY /dev/md126 level=raid10 num-devices=2 UUID=9af006ca:8845bbd3:bfe78010:bc810f04
ARRAY /dev/md127 level=raid0 num-devices=3 UUID=acd5374f:72628c93:6a906c4b:5f675ce5
ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 UUID=f2372cb9:d3816fd6:ce86d826:882ec82e
ARRAY /dev/md129 metadata=1.2 name=lamachine:129 UUID=895dae98:d1a496de:4f590b8b:cb8ac12a
[root@lamachine ~]# mdadm --detail /dev/md1*
/dev/md126:
        Version : 0.90
  Creation Time : Thu Dec 3 22:12:12 2009
     Raid Level : raid10
     Array Size : 30719936 (29.30 GiB 31.46 GB)
  Used Dev Size : 30719936 (29.30 GiB 31.46 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 126
    Persistence : Superblock is persistent

    Update Time : Tue Jan 12 04:03:41 2016
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 64K

           UUID : 9af006ca:8845bbd3:bfe78010:bc810f04
         Events : 0.264152

    Number   Major   Minor   RaidDevice State
       0       8       82        0      active sync set-A   /dev/sdf2
       1       8        1        1      active sync set-B   /dev/sda1
/dev/md127:
        Version : 1.2
  Creation Time : Tue Jul 26 19:00:28 2011
     Raid Level : raid0
     Array Size : 94367232 (90.00 GiB 96.63 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Tue Jul 26 19:00:28 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : reading.homeunix.com:3
           UUID : acd5374f:72628c93:6a906c4b:5f675ce5
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       85        0      active sync   /dev/sdf5
       1       8       21        1      active sync   /dev/sdb5
       2       8        5        2      active sync   /dev/sda5
/dev/md128:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 1
    Persistence : Superblock is persistent

          State : inactive

           Name : lamachine:128  (local to host lamachine)
           UUID : f2372cb9:d3816fd6:ce86d826:882ec82e
         Events : 4154

    Number   Major   Minor   RaidDevice

       -       8       49        -        /dev/sdd1
/dev/md129:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 1
    Persistence : Superblock is persistent

          State : inactive

           Name : lamachine:129  (local to host lamachine)
           UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a
         Events : 0

    Number   Major   Minor   RaidDevice

       -       8       50        -        /dev/sdd2
[root@lamachine ~]# mdadm --detail /dev/md2*
/dev/md2:
        Version : 0.90
  Creation Time : Mon Feb 11 07:54:36 2013
     Raid Level : raid5
     Array Size : 511999872 (488.28 GiB 524.29 GB)
  Used Dev Size : 255999936 (244.14 GiB 262.14 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Tue Jan 12 02:31:50 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine)
         Events : 0.611

    Number   Major   Minor   RaidDevice State
       0       8       83        0      active sync   /dev/sdf3
       1       8       18        1      active sync   /dev/sdb2
       2       8        2        2      active sync   /dev/sda2
[root@lamachine ~]# cat /proc/mdstat
Personalities : [raid10] [raid0] [raid6] [raid5] [raid4]
md2 : active raid5 sda2[2] sdf3[0] sdb2[1]
      511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md127 : active raid0 sda5[2] sdf5[0] sdb5[1]
      94367232 blocks super 1.2 512k chunks

md129 : inactive sdd2[2](S)
      524156928 blocks super 1.2

md128 : inactive sdd1[3](S)
      2147352576 blocks super 1.2

md126 : active raid10 sdf2[0] sda1[1]
      30719936 blocks 2 near-copies [2/2] [UU]

unused devices: <none>
[root@lamachine ~]#

On 11 September 2016 at 19:48, Daniel Sanabria <sanabria.d@gmail.com> wrote:
> [snip]
{acd5374f-7262-8c93-6a90-6c4b5f675ce5} > │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None > (None) None {acd5374f:72628c93:6a906c4b:5f675ce5} > │ │ PV LVM2_member 86.00g used, 3.99g free > {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox} > │ └sdb6 3.39g [8:22] Empty/Unknown > ├scsi 2:x:x:x [Empty] > ├scsi 3:x:x:x [Empty] > ├scsi 4:x:x:x [Empty] > └scsi 5:x:x:x [Empty] > PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd. > 88SE9230 PCIe SATA 6Gb/s Controller (rev 11) > ├scsi 6:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NCWT13RF} > │└sdc 2.73t [8:32] Partitioned (PMBR) > ├scsi 7:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NPRDD6D7} > │└sdd 2.73t [8:48] Partitioned (gpt) > │ ├sdd1 2.00t [8:49] MD (none/) spare 'lamachine:128' > {f2372cb9-d381-6fd6-ce86-d826882ec82e} > │ │└md128 0.00k [9:128] MD v1.2 () inactive, None (None) None > {f2372cb9:d3816fd6:ce86d826:882ec82e} > │ │ Empty/Unknown > │ └sdd2 500.00g [8:50] MD (none/) spare 'lamachine:129' > {895dae98-d1a4-96de-4f59-0b8bcb8ac12a} > │ └md129 0.00k [9:129] MD v1.2 () inactive, None (None) None > {895dae98:d1a496de:4f590b8b:cb8ac12a} > │ Empty/Unknown > ├scsi 8:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4N1294906} > │└sde 2.73t [8:64] Partitioned (PMBR) > ├scsi 9:0:0:0 ATA WDC WD5000AAKS-0 {WD-WMAWF0085724} > │└sdf 465.76g [8:80] Partitioned (dos) > │ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557} > │ │└Mounted as /dev/sdf1 @ /boot > │ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync > {9af006ca-8845-bbd3-bfe7-8010bc810f04} > │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk > {9af006ca:8845bbd3:bfe78010:bc810f04} > │ │ PV LVM2_member 28.03g used, 1.26g free > {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH} > │ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync > {2cff15d1-e411-447b-fd5d-472103e44022} > │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk > {2cff15d1:e411447b:fd5d4721:03e44022} > │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311} > │ ├sdf4 1.00k [8:84] Partitioned (dos) > │ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync > 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5} > │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None > (None) None {acd5374f:72628c93:6a906c4b:5f675ce5} > │ │ PV LVM2_member 86.00g used, 3.99g free > {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox} > │ └sdf6 3.39g [8:86] Empty/Unknown > ├scsi 10:x:x:x [Empty] > ├scsi 11:x:x:x [Empty] > └scsi 12:x:x:x [Empty] > PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation > C602 chipset 4-Port SATA Storage Control Unit (rev 06) > └scsi 14:x:x:x [Empty] > [root@lamachine ~]# > > Thanks in advance for any recommendations on what steps to take in > order to bring these arrays back online. > > Regards, > > Daniel > > > On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote: >> Thanks very much for the response Wol. >> >> It looks like the PSU is dead (server automatically powers off a few >> seconds after power on). >> >> I'm planning to order a PSU replacement to resume troubleshooting so >> please bear with me; maybe the PSU was degraded and couldn't power >> some of drives? >> >> Cheers, >> >> Daniel >> >> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote: >>> Just a quick first response. I see md128 and md129 are both down, and >>> are both listed as one drive, raid0. Bit odd, that ... >>> >>> What version of mdadm are you using? One of them had a bug (3.2.3 era?) >>> that would split an array in two. 
Is it possible that you should have >>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup... >>> >>> I notice also that md126 is raid10 across two drives. That's odd, too. >>> >>> How much do you know about what the setup should be, and why it was set >>> up that way? >>> >>> Download lspci by Phil Turmel (it requires python2.7, if your machine is >>> python3 a quick fix to the shebang at the start should get it to work). >>> Post the output from that here. >>> >>> Cheers, >>> Wol >>> >>> On 02/08/16 08:36, Daniel Sanabria wrote: >>>> Hi All, >>>> >>>> I have a box that I believe was not powered down correctly and after >>>> transporting it to a different location it doesn't boot anymore >>>> stopping at BIOS check "Verifying DMI Pool Data". >>>> >>>> The box have 6 drives and after instructing the BIOS to boot from the >>>> first drive I managed to boot the OS (Fedora 23) after commenting out >>>> 2 /etc/fstab entries , output for "uname -a; cat /etc/fstab" follows: >>>> >>>> [root@lamachine ~]# uname -a; cat /etc/fstab >>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC >>>> 2016 x86_64 x86_64 x86_64 GNU/Linux >>>> >>>> # >>>> # /etc/fstab >>>> # Created by anaconda on Tue Mar 24 19:31:21 2015 >>>> # >>>> # Accessible filesystems, by reference, are maintained under '/dev/disk' >>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info >>>> # >>>> /dev/mapper/vg_bigblackbox-LogVol_root / ext4 >>>> defaults 1 1 >>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot ext4 >>>> defaults 1 2 >>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt ext4 >>>> defaults 1 2 >>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp ext4 >>>> defaults 1 2 >>>> /dev/mapper/vg_bigblackbox-LogVol_var /var ext4 >>>> defaults 1 2 >>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap swap >>>> defaults 0 0 >>>> /dev/md2 /home ext4 defaults 1 2 >>>> #/dev/vg_media/lv_media /mnt/media ext4 defaults 1 2 >>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2 >>>> [root@lamachine ~]# >>>> >>>> When checking mdstat I can see that 2 of the arrays are showing up as >>>> inactive, but not sure how to safely activate these so looking for >>>> some knowledgeable advice on how to proceed here. 
>>>> >>>> Thanks in advance, >>>> >>>> Daniel >>>> >>>> Below some more relevant outputs: >>>> >>>> [root@lamachine ~]# cat /proc/mdstat >>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0] >>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1] >>>> 94367232 blocks super 1.2 512k chunks >>>> >>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1] >>>> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] >>>> >>>> md128 : inactive sdf1[3](S) >>>> 2147352576 blocks super 1.2 >>>> >>>> md129 : inactive sdf2[2](S) >>>> 524156928 blocks super 1.2 >>>> >>>> md126 : active raid10 sda2[0] sdc1[1] >>>> 30719936 blocks 2 near-copies [2/2] [UU] >>>> >>>> unused devices: <none> >>>> [root@lamachine ~]# cat /etc/mdadm.conf >>>> # mdadm.conf written out by anaconda >>>> MAILADDR root >>>> AUTO +imsm +1.x -all >>>> ARRAY /dev/md2 level=raid5 num-devices=3 >>>> UUID=2cff15d1:e411447b:fd5d4721:03e44022 >>>> ARRAY /dev/md126 level=raid10 num-devices=2 >>>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04 >>>> ARRAY /dev/md127 level=raid0 num-devices=3 >>>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5 >>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 >>>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e >>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129 >>>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a >>>> [root@lamachine ~]# mdadm --detail /dev/md1* >>>> /dev/md126: >>>> Version : 0.90 >>>> Creation Time : Thu Dec 3 22:12:12 2009 >>>> Raid Level : raid10 >>>> Array Size : 30719936 (29.30 GiB 31.46 GB) >>>> Used Dev Size : 30719936 (29.30 GiB 31.46 GB) >>>> Raid Devices : 2 >>>> Total Devices : 2 >>>> Preferred Minor : 126 >>>> Persistence : Superblock is persistent >>>> >>>> Update Time : Tue Aug 2 07:46:39 2016 >>>> State : clean >>>> Active Devices : 2 >>>> Working Devices : 2 >>>> Failed Devices : 0 >>>> Spare Devices : 0 >>>> >>>> Layout : near=2 >>>> Chunk Size : 64K >>>> >>>> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 >>>> Events : 0.264152 >>>> >>>> Number Major Minor RaidDevice State >>>> 0 8 2 0 active sync set-A /dev/sda2 >>>> 1 8 33 1 active sync set-B /dev/sdc1 >>>> /dev/md127: >>>> Version : 1.2 >>>> Creation Time : Tue Jul 26 19:00:28 2011 >>>> Raid Level : raid0 >>>> Array Size : 94367232 (90.00 GiB 96.63 GB) >>>> Raid Devices : 3 >>>> Total Devices : 3 >>>> Persistence : Superblock is persistent >>>> >>>> Update Time : Tue Jul 26 19:00:28 2011 >>>> State : clean >>>> Active Devices : 3 >>>> Working Devices : 3 >>>> Failed Devices : 0 >>>> Spare Devices : 0 >>>> >>>> Chunk Size : 512K >>>> >>>> Name : reading.homeunix.com:3 >>>> UUID : acd5374f:72628c93:6a906c4b:5f675ce5 >>>> Events : 0 >>>> >>>> Number Major Minor RaidDevice State >>>> 0 8 5 0 active sync /dev/sda5 >>>> 1 8 21 1 active sync /dev/sdb5 >>>> 2 8 37 2 active sync /dev/sdc5 >>>> /dev/md128: >>>> Version : 1.2 >>>> Raid Level : raid0 >>>> Total Devices : 1 >>>> Persistence : Superblock is persistent >>>> >>>> State : inactive >>>> >>>> Name : lamachine:128 (local to host lamachine) >>>> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e >>>> Events : 4154 >>>> >>>> Number Major Minor RaidDevice >>>> >>>> - 8 81 - /dev/sdf1 >>>> /dev/md129: >>>> Version : 1.2 >>>> Raid Level : raid0 >>>> Total Devices : 1 >>>> Persistence : Superblock is persistent >>>> >>>> State : inactive >>>> >>>> Name : lamachine:129 (local to host lamachine) >>>> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a >>>> Events : 0 >>>> >>>> Number Major Minor RaidDevice >>>> >>>> - 8 82 - /dev/sdf2 >>>> [root@lamachine ~]# mdadm --detail /dev/md2 >>>> /dev/md2: >>>> Version : 
0.90 >>>> Creation Time : Mon Feb 11 07:54:36 2013 >>>> Raid Level : raid5 >>>> Array Size : 511999872 (488.28 GiB 524.29 GB) >>>> Used Dev Size : 255999936 (244.14 GiB 262.14 GB) >>>> Raid Devices : 3 >>>> Total Devices : 3 >>>> Preferred Minor : 2 >>>> Persistence : Superblock is persistent >>>> >>>> Update Time : Mon Aug 1 20:24:23 2016 >>>> State : clean >>>> Active Devices : 3 >>>> Working Devices : 3 >>>> Failed Devices : 0 >>>> Spare Devices : 0 >>>> >>>> Layout : left-symmetric >>>> Chunk Size : 64K >>>> >>>> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) >>>> Events : 0.611 >>>> >>>> Number Major Minor RaidDevice State >>>> 0 8 3 0 active sync /dev/sda3 >>>> 1 8 18 1 active sync /dev/sdb2 >>>> 2 8 34 2 active sync /dev/sdc2 >>>> [root@lamachine ~]# >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>> ^ permalink raw reply [flat|nested] 37+ messages in thread
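A minimal sketch of fetching and running Phil Turmel's lsdrv, the storage-topology script being requested above; the GitHub location and the shebang tweak are assumptions rather than details confirmed in the thread, and nothing here writes to the disks:

  git clone https://github.com/pturmel/lsdrv     # assumed upstream location of the script
  cd lsdrv
  head -1 lsdrv                                  # see which interpreter the shebang names
  sed -i '1s|.*|#!/usr/bin/env python2|' lsdrv   # only needed if plain "python" resolves to python3
  ./lsdrv                                        # read-only report of controllers, disks, md arrays and LVM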
* Re: Inactive arrays 2016-09-11 20:06 ` Daniel Sanabria @ 2016-09-12 19:41 ` Daniel Sanabria 2016-09-12 21:13 ` Daniel Sanabria 0 siblings, 1 reply; 37+ messages in thread From: Daniel Sanabria @ 2016-09-12 19:41 UTC (permalink / raw) To: Wols Lists; +Cc: linux-raid ok, I just adjusted system time so that I can start tracking logs. what I'm noticing however is that fdisk -l is not giving me the expect partitions (I was expecting at least 2 partitions in every 2.7 disk similar to what I have in sdd): [root@lamachine lamachine_220315]# fdisk -l /dev/{sdc,sdd,sde} Disk /dev/sdc: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: dos Disk identifier: 0x00000000 Device Boot Start End Sectors Size Id Type /dev/sdc1 1 4294967295 4294967295 2T ee GPT Partition 1 does not start on physical sector boundary. Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: gpt Disk identifier: D3233810-F552-4126-8281-7F71A4938DF9 Device Start End Sectors Size Type /dev/sdd1 2048 4294969343 4294967296 2T Linux RAID /dev/sdd2 4294969344 5343545343 1048576000 500G Linux filesystem Disk /dev/sde: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 4096 bytes I/O size (minimum/optimal): 4096 bytes / 4096 bytes Disklabel type: dos Disk identifier: 0x00000000 Device Boot Start End Sectors Size Id Type /dev/sde1 1 4294967295 4294967295 2T ee GPT Partition 1 does not start on physical sector boundary. [root@lamachine lamachine_220315]# what could've happened here? any ideas why the partition tables ended up like that? From previous information I have an idea of what the md128 and md129 are supposed to looks like (also noticed that the device names changed): # md128 and md129 details From an old command output /dev/md128: Version : 1.2 Creation Time : Fri Oct 24 15:24:38 2014 Raid Level : raid5 Array Size : 4294705152 (4095.75 GiB 4397.78 GB) Used Dev Size : 2147352576 (2047.88 GiB 2198.89 GB) Raid Devices : 3 Total Devices : 3 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Sun Mar 22 06:20:08 2015 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 512K Name : lamachine:128 (local to host lamachine) UUID : f2372cb9:d3816fd6:ce86d826:882ec82e Events : 4041 Number Major Minor RaidDevice State 0 8 49 0 active sync /dev/sdd1 1 8 65 1 active sync /dev/sde1 3 8 81 2 active sync /dev/sdf1 /dev/md129: Version : 1.2 Creation Time : Mon Nov 10 16:28:11 2014 Raid Level : raid0 Array Size : 1572470784 (1499.63 GiB 1610.21 GB) Raid Devices : 3 Total Devices : 3 Persistence : Superblock is persistent Update Time : Mon Nov 10 16:28:11 2014 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Chunk Size : 512K Name : lamachine:129 (local to host lamachine) UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a Events : 0 Number Major Minor RaidDevice State 0 8 50 0 active sync /dev/sdd2 1 8 66 1 active sync /dev/sde2 2 8 82 2 active sync /dev/sdf2 Is there any way to recover the contents of these two arrays ? 
:( On 11 September 2016 at 21:06, Daniel Sanabria <sanabria.d@gmail.com> wrote: > However I'm noticing that the details with this new MB are somewhat different: > > [root@lamachine ~]# cat /etc/mdadm.conf > # mdadm.conf written out by anaconda > MAILADDR root > AUTO +imsm +1.x -all > ARRAY /dev/md2 level=raid5 num-devices=3 > UUID=2cff15d1:e411447b:fd5d4721:03e44022 > ARRAY /dev/md126 level=raid10 num-devices=2 > UUID=9af006ca:8845bbd3:bfe78010:bc810f04 > ARRAY /dev/md127 level=raid0 num-devices=3 > UUID=acd5374f:72628c93:6a906c4b:5f675ce5 > ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 > UUID=f2372cb9:d3816fd6:ce86d826:882ec82e > ARRAY /dev/md129 metadata=1.2 name=lamachine:129 > UUID=895dae98:d1a496de:4f590b8b:cb8ac12a > [root@lamachine ~]# mdadm --detail /dev/md1* > /dev/md126: > Version : 0.90 > Creation Time : Thu Dec 3 22:12:12 2009 > Raid Level : raid10 > Array Size : 30719936 (29.30 GiB 31.46 GB) > Used Dev Size : 30719936 (29.30 GiB 31.46 GB) > Raid Devices : 2 > Total Devices : 2 > Preferred Minor : 126 > Persistence : Superblock is persistent > > Update Time : Tue Jan 12 04:03:41 2016 > State : clean > Active Devices : 2 > Working Devices : 2 > Failed Devices : 0 > Spare Devices : 0 > > Layout : near=2 > Chunk Size : 64K > > UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 > Events : 0.264152 > > Number Major Minor RaidDevice State > 0 8 82 0 active sync set-A /dev/sdf2 > 1 8 1 1 active sync set-B /dev/sda1 > /dev/md127: > Version : 1.2 > Creation Time : Tue Jul 26 19:00:28 2011 > Raid Level : raid0 > Array Size : 94367232 (90.00 GiB 96.63 GB) > Raid Devices : 3 > Total Devices : 3 > Persistence : Superblock is persistent > > Update Time : Tue Jul 26 19:00:28 2011 > State : clean > Active Devices : 3 > Working Devices : 3 > Failed Devices : 0 > Spare Devices : 0 > > Chunk Size : 512K > > Name : reading.homeunix.com:3 > UUID : acd5374f:72628c93:6a906c4b:5f675ce5 > Events : 0 > > Number Major Minor RaidDevice State > 0 8 85 0 active sync /dev/sdf5 > 1 8 21 1 active sync /dev/sdb5 > 2 8 5 2 active sync /dev/sda5 > /dev/md128: > Version : 1.2 > Raid Level : raid0 > Total Devices : 1 > Persistence : Superblock is persistent > > State : inactive > > Name : lamachine:128 (local to host lamachine) > UUID : f2372cb9:d3816fd6:ce86d826:882ec82e > Events : 4154 > > Number Major Minor RaidDevice > > - 8 49 - /dev/sdd1 > /dev/md129: > Version : 1.2 > Raid Level : raid0 > Total Devices : 1 > Persistence : Superblock is persistent > > State : inactive > > Name : lamachine:129 (local to host lamachine) > UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a > Events : 0 > > Number Major Minor RaidDevice > > - 8 50 - /dev/sdd2 > [root@lamachine ~]# mdadm --detail /dev/md2* > /dev/md2: > Version : 0.90 > Creation Time : Mon Feb 11 07:54:36 2013 > Raid Level : raid5 > Array Size : 511999872 (488.28 GiB 524.29 GB) > Used Dev Size : 255999936 (244.14 GiB 262.14 GB) > Raid Devices : 3 > Total Devices : 3 > Preferred Minor : 2 > Persistence : Superblock is persistent > > Update Time : Tue Jan 12 02:31:50 2016 > State : clean > Active Devices : 3 > Working Devices : 3 > Failed Devices : 0 > Spare Devices : 0 > > Layout : left-symmetric > Chunk Size : 64K > > UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) > Events : 0.611 > > Number Major Minor RaidDevice State > 0 8 83 0 active sync /dev/sdf3 > 1 8 18 1 active sync /dev/sdb2 > 2 8 2 2 active sync /dev/sda2 > [root@lamachine ~]# cat /proc/mdstat > Personalities : [raid10] [raid0] [raid6] [raid5] [raid4] > md2 : active raid5 sda2[2] 
sdf3[0] sdb2[1] > 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] > > md127 : active raid0 sda5[2] sdf5[0] sdb5[1] > 94367232 blocks super 1.2 512k chunks > > md129 : inactive sdd2[2](S) > 524156928 blocks super 1.2 > > md128 : inactive sdd1[3](S) > 2147352576 blocks super 1.2 > > md126 : active raid10 sdf2[0] sda1[1] > 30719936 blocks 2 near-copies [2/2] [UU] > > unused devices: <none> > [root@lamachine ~]# > > On 11 September 2016 at 19:48, Daniel Sanabria <sanabria.d@gmail.com> wrote: >> ok, system up and running after MB was replaced however the arrays >> remain inactive. >> >> mdadm version is: >> mdadm - v3.3.4 - 3rd August 2015 >> >> Here's the output from Phil's lsdrv: >> >> [root@lamachine ~]# ./lsdrv >> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series >> chipset 6-Port SATA AHCI Controller (rev 06) >> ├scsi 0:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASZ0505379} >> │└sda 465.76g [8:0] Partitioned (dos) >> │ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync >> {9af006ca-8845-bbd3-bfe7-8010bc810f04} >> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk >> {9af006ca:8845bbd3:bfe78010:bc810f04} >> │ │ │ PV LVM2_member 28.03g used, 1.26g free >> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH} >> │ │ └VG vg_bigblackbox 29.29g 1.26g free >> {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR} >> │ │ ├dm-2 7.81g [253:2] LV LogVol_opt ext4 >> {b08d7f5e-f15f-4241-804e-edccecab6003} >> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt >> │ │ ├dm-0 9.77g [253:0] LV LogVol_root ext4 >> {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3} >> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ / >> │ │ ├dm-3 1.95g [253:3] LV LogVol_tmp ext4 >> {f6b46363-170b-4038-83bd-2c5f9f6a1973} >> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp >> │ │ └dm-1 8.50g [253:1] LV LogVol_var ext4 >> {ab165c61-3d62-4c55-8639-6c2c2bf4b021} >> │ │ └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var >> │ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync >> {2cff15d1-e411-447b-fd5d-472103e44022} >> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk >> {2cff15d1:e411447b:fd5d4721:03e44022} >> │ │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311} >> │ │ └Mounted as /dev/md2 @ /home >> │ ├sda3 1.00k [8:3] Partitioned (dos) >> │ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync >> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5} >> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None >> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5} >> │ │ │ PV LVM2_member 86.00g used, 3.99g free >> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox} >> │ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB} >> │ │ ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos) >> │ │ ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos) >> │ │ ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos) >> │ │ ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos) >> │ │ ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos) >> │ │ ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos) >> │ │ └dm-4 30.00g [253:4] LV win7 Partitioned (dos) >> │ └sda6 3.39g [8:6] Empty/Unknown >> ├scsi 1:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASY7694185} >> │└sdb 465.76g [8:16] Partitioned (dos) >> │ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync >> {2cff15d1-e411-447b-fd5d-472103e44022} >> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk >> {2cff15d1:e411447b:fd5d4721:03e44022} >> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311} >> │ ├sdb3 7.81g [8:19] swap 
{9194f492-881a-4fc3-ac09-ca4e1cc2985a} >> │ ├sdb4 1.00k [8:20] Partitioned (dos) >> │ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync >> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5} >> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None >> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5} >> │ │ PV LVM2_member 86.00g used, 3.99g free >> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox} >> │ └sdb6 3.39g [8:22] Empty/Unknown >> ├scsi 2:x:x:x [Empty] >> ├scsi 3:x:x:x [Empty] >> ├scsi 4:x:x:x [Empty] >> └scsi 5:x:x:x [Empty] >> PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd. >> 88SE9230 PCIe SATA 6Gb/s Controller (rev 11) >> ├scsi 6:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NCWT13RF} >> │└sdc 2.73t [8:32] Partitioned (PMBR) >> ├scsi 7:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NPRDD6D7} >> │└sdd 2.73t [8:48] Partitioned (gpt) >> │ ├sdd1 2.00t [8:49] MD (none/) spare 'lamachine:128' >> {f2372cb9-d381-6fd6-ce86-d826882ec82e} >> │ │└md128 0.00k [9:128] MD v1.2 () inactive, None (None) None >> {f2372cb9:d3816fd6:ce86d826:882ec82e} >> │ │ Empty/Unknown >> │ └sdd2 500.00g [8:50] MD (none/) spare 'lamachine:129' >> {895dae98-d1a4-96de-4f59-0b8bcb8ac12a} >> │ └md129 0.00k [9:129] MD v1.2 () inactive, None (None) None >> {895dae98:d1a496de:4f590b8b:cb8ac12a} >> │ Empty/Unknown >> ├scsi 8:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4N1294906} >> │└sde 2.73t [8:64] Partitioned (PMBR) >> ├scsi 9:0:0:0 ATA WDC WD5000AAKS-0 {WD-WMAWF0085724} >> │└sdf 465.76g [8:80] Partitioned (dos) >> │ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557} >> │ │└Mounted as /dev/sdf1 @ /boot >> │ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync >> {9af006ca-8845-bbd3-bfe7-8010bc810f04} >> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk >> {9af006ca:8845bbd3:bfe78010:bc810f04} >> │ │ PV LVM2_member 28.03g used, 1.26g free >> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH} >> │ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync >> {2cff15d1-e411-447b-fd5d-472103e44022} >> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk >> {2cff15d1:e411447b:fd5d4721:03e44022} >> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311} >> │ ├sdf4 1.00k [8:84] Partitioned (dos) >> │ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync >> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5} >> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None >> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5} >> │ │ PV LVM2_member 86.00g used, 3.99g free >> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox} >> │ └sdf6 3.39g [8:86] Empty/Unknown >> ├scsi 10:x:x:x [Empty] >> ├scsi 11:x:x:x [Empty] >> └scsi 12:x:x:x [Empty] >> PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation >> C602 chipset 4-Port SATA Storage Control Unit (rev 06) >> └scsi 14:x:x:x [Empty] >> [root@lamachine ~]# >> >> Thanks in advance for any recommendations on what steps to take in >> order to bring these arrays back online. >> >> Regards, >> >> Daniel >> >> >> On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote: >>> Thanks very much for the response Wol. >>> >>> It looks like the PSU is dead (server automatically powers off a few >>> seconds after power on). >>> >>> I'm planning to order a PSU replacement to resume troubleshooting so >>> please bear with me; maybe the PSU was degraded and couldn't power >>> some of drives? 
>>> >>> Cheers, >>> >>> Daniel >>> >>> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote: >>>> Just a quick first response. I see md128 and md129 are both down, and >>>> are both listed as one drive, raid0. Bit odd, that ... >>>> >>>> What version of mdadm are you using? One of them had a bug (3.2.3 era?) >>>> that would split an array in two. Is it possible that you should have >>>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup... >>>> >>>> I notice also that md126 is raid10 across two drives. That's odd, too. >>>> >>>> How much do you know about what the setup should be, and why it was set >>>> up that way? >>>> >>>> Download lspci by Phil Turmel (it requires python2.7, if your machine is >>>> python3 a quick fix to the shebang at the start should get it to work). >>>> Post the output from that here. >>>> >>>> Cheers, >>>> Wol >>>> >>>> On 02/08/16 08:36, Daniel Sanabria wrote: >>>>> Hi All, >>>>> >>>>> I have a box that I believe was not powered down correctly and after >>>>> transporting it to a different location it doesn't boot anymore >>>>> stopping at BIOS check "Verifying DMI Pool Data". >>>>> >>>>> The box have 6 drives and after instructing the BIOS to boot from the >>>>> first drive I managed to boot the OS (Fedora 23) after commenting out >>>>> 2 /etc/fstab entries , output for "uname -a; cat /etc/fstab" follows: >>>>> >>>>> [root@lamachine ~]# uname -a; cat /etc/fstab >>>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC >>>>> 2016 x86_64 x86_64 x86_64 GNU/Linux >>>>> >>>>> # >>>>> # /etc/fstab >>>>> # Created by anaconda on Tue Mar 24 19:31:21 2015 >>>>> # >>>>> # Accessible filesystems, by reference, are maintained under '/dev/disk' >>>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info >>>>> # >>>>> /dev/mapper/vg_bigblackbox-LogVol_root / ext4 >>>>> defaults 1 1 >>>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot ext4 >>>>> defaults 1 2 >>>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt ext4 >>>>> defaults 1 2 >>>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp ext4 >>>>> defaults 1 2 >>>>> /dev/mapper/vg_bigblackbox-LogVol_var /var ext4 >>>>> defaults 1 2 >>>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap swap >>>>> defaults 0 0 >>>>> /dev/md2 /home ext4 defaults 1 2 >>>>> #/dev/vg_media/lv_media /mnt/media ext4 defaults 1 2 >>>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2 >>>>> [root@lamachine ~]# >>>>> >>>>> When checking mdstat I can see that 2 of the arrays are showing up as >>>>> inactive, but not sure how to safely activate these so looking for >>>>> some knowledgeable advice on how to proceed here. 
>>>>> >>>>> Thanks in advance, >>>>> >>>>> Daniel >>>>> >>>>> Below some more relevant outputs: >>>>> >>>>> [root@lamachine ~]# cat /proc/mdstat >>>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0] >>>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1] >>>>> 94367232 blocks super 1.2 512k chunks >>>>> >>>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1] >>>>> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] >>>>> >>>>> md128 : inactive sdf1[3](S) >>>>> 2147352576 blocks super 1.2 >>>>> >>>>> md129 : inactive sdf2[2](S) >>>>> 524156928 blocks super 1.2 >>>>> >>>>> md126 : active raid10 sda2[0] sdc1[1] >>>>> 30719936 blocks 2 near-copies [2/2] [UU] >>>>> >>>>> unused devices: <none> >>>>> [root@lamachine ~]# cat /etc/mdadm.conf >>>>> # mdadm.conf written out by anaconda >>>>> MAILADDR root >>>>> AUTO +imsm +1.x -all >>>>> ARRAY /dev/md2 level=raid5 num-devices=3 >>>>> UUID=2cff15d1:e411447b:fd5d4721:03e44022 >>>>> ARRAY /dev/md126 level=raid10 num-devices=2 >>>>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04 >>>>> ARRAY /dev/md127 level=raid0 num-devices=3 >>>>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5 >>>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 >>>>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e >>>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129 >>>>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a >>>>> [root@lamachine ~]# mdadm --detail /dev/md1* >>>>> /dev/md126: >>>>> Version : 0.90 >>>>> Creation Time : Thu Dec 3 22:12:12 2009 >>>>> Raid Level : raid10 >>>>> Array Size : 30719936 (29.30 GiB 31.46 GB) >>>>> Used Dev Size : 30719936 (29.30 GiB 31.46 GB) >>>>> Raid Devices : 2 >>>>> Total Devices : 2 >>>>> Preferred Minor : 126 >>>>> Persistence : Superblock is persistent >>>>> >>>>> Update Time : Tue Aug 2 07:46:39 2016 >>>>> State : clean >>>>> Active Devices : 2 >>>>> Working Devices : 2 >>>>> Failed Devices : 0 >>>>> Spare Devices : 0 >>>>> >>>>> Layout : near=2 >>>>> Chunk Size : 64K >>>>> >>>>> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 >>>>> Events : 0.264152 >>>>> >>>>> Number Major Minor RaidDevice State >>>>> 0 8 2 0 active sync set-A /dev/sda2 >>>>> 1 8 33 1 active sync set-B /dev/sdc1 >>>>> /dev/md127: >>>>> Version : 1.2 >>>>> Creation Time : Tue Jul 26 19:00:28 2011 >>>>> Raid Level : raid0 >>>>> Array Size : 94367232 (90.00 GiB 96.63 GB) >>>>> Raid Devices : 3 >>>>> Total Devices : 3 >>>>> Persistence : Superblock is persistent >>>>> >>>>> Update Time : Tue Jul 26 19:00:28 2011 >>>>> State : clean >>>>> Active Devices : 3 >>>>> Working Devices : 3 >>>>> Failed Devices : 0 >>>>> Spare Devices : 0 >>>>> >>>>> Chunk Size : 512K >>>>> >>>>> Name : reading.homeunix.com:3 >>>>> UUID : acd5374f:72628c93:6a906c4b:5f675ce5 >>>>> Events : 0 >>>>> >>>>> Number Major Minor RaidDevice State >>>>> 0 8 5 0 active sync /dev/sda5 >>>>> 1 8 21 1 active sync /dev/sdb5 >>>>> 2 8 37 2 active sync /dev/sdc5 >>>>> /dev/md128: >>>>> Version : 1.2 >>>>> Raid Level : raid0 >>>>> Total Devices : 1 >>>>> Persistence : Superblock is persistent >>>>> >>>>> State : inactive >>>>> >>>>> Name : lamachine:128 (local to host lamachine) >>>>> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e >>>>> Events : 4154 >>>>> >>>>> Number Major Minor RaidDevice >>>>> >>>>> - 8 81 - /dev/sdf1 >>>>> /dev/md129: >>>>> Version : 1.2 >>>>> Raid Level : raid0 >>>>> Total Devices : 1 >>>>> Persistence : Superblock is persistent >>>>> >>>>> State : inactive >>>>> >>>>> Name : lamachine:129 (local to host lamachine) >>>>> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a >>>>> Events : 0 >>>>> >>>>> Number Major Minor 
RaidDevice >>>>> >>>>> - 8 82 - /dev/sdf2 >>>>> [root@lamachine ~]# mdadm --detail /dev/md2 >>>>> /dev/md2: >>>>> Version : 0.90 >>>>> Creation Time : Mon Feb 11 07:54:36 2013 >>>>> Raid Level : raid5 >>>>> Array Size : 511999872 (488.28 GiB 524.29 GB) >>>>> Used Dev Size : 255999936 (244.14 GiB 262.14 GB) >>>>> Raid Devices : 3 >>>>> Total Devices : 3 >>>>> Preferred Minor : 2 >>>>> Persistence : Superblock is persistent >>>>> >>>>> Update Time : Mon Aug 1 20:24:23 2016 >>>>> State : clean >>>>> Active Devices : 3 >>>>> Working Devices : 3 >>>>> Failed Devices : 0 >>>>> Spare Devices : 0 >>>>> >>>>> Layout : left-symmetric >>>>> Chunk Size : 64K >>>>> >>>>> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) >>>>> Events : 0.611 >>>>> >>>>> Number Major Minor RaidDevice State >>>>> 0 8 3 0 active sync /dev/sda3 >>>>> 1 8 18 1 active sync /dev/sdb2 >>>>> 2 8 34 2 active sync /dev/sdc2 >>>>> [root@lamachine ~]# >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>> ^ permalink raw reply [flat|nested] 37+ messages in thread
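Before attempting any assembly, the surviving md metadata can be inspected without touching the data. A sketch using the device names reported above (sdd still has its two partitions; sdc and sde no longer show a usable table) — every command here is read-only:

  mdadm --examine /dev/sdd1 /dev/sdd2     # dump the 1.2 superblocks on the members that still exist
  mdadm --examine /dev/sdc /dev/sde       # see whether any metadata is visible on the bare disks
  mdadm --examine --scan -v               # summary of every array component mdadm can currently find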
* Re: Inactive arrays 2016-09-12 19:41 ` Daniel Sanabria @ 2016-09-12 21:13 ` Daniel Sanabria 2016-09-12 21:37 ` Chris Murphy 2016-09-12 21:39 ` Wols Lists 0 siblings, 2 replies; 37+ messages in thread From: Daniel Sanabria @ 2016-09-12 21:13 UTC (permalink / raw) To: Wols Lists; +Cc: linux-raid apologies for the verbosity just adding some more info which is now making me lose hope. Using parted -l instead of fdisk gives me this: [root@lamachine ~]# parted -l Model: ATA WDC WD5000AAKS-0 (scsi) Disk /dev/sda: 500GB Sector size (logical/physical): 512B/512B Partition Table: msdos Disk Flags: Number Start End Size Type File system Flags 1 32.3kB 31.5GB 31.5GB primary raid 2 31.5GB 294GB 262GB primary ext4 raid 3 294GB 500GB 207GB extended 5 294GB 326GB 32.2GB logical 6 336GB 339GB 3644MB logical raid Model: ATA WDC WD5000AAKS-0 (scsi) Disk /dev/sdb: 500GB Sector size (logical/physical): 512B/512B Partition Table: msdos Disk Flags: Number Start End Size Type File system Flags 2 210MB 262GB 262GB primary raid 3 262GB 271GB 8389MB primary linux-swap(v1) 4 271GB 500GB 229GB extended 5 271GB 303GB 32.2GB logical 6 313GB 317GB 3644MB logical raid Error: Invalid argument during seek for read on /dev/sdc Retry/Ignore/Cancel? R Error: Invalid argument during seek for read on /dev/sdc Retry/Ignore/Cancel? I Error: The backup GPT table is corrupt, but the primary appears OK, so that will be used. OK/Cancel? O Model: ATA WDC WD30EZRX-00D (scsi) Disk /dev/sdc: 3001GB Sector size (logical/physical): 512B/4096B Partition Table: unknown Disk Flags: Model: ATA WDC WD30EZRX-00D (scsi) Disk /dev/sdd: 3001GB Sector size (logical/physical): 512B/4096B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags 1 1049kB 2199GB 2199GB raid 2 2199GB 2736GB 537GB Error: Invalid argument during seek for read on /dev/sde Retry/Ignore/Cancel? C Model: ATA WDC WD30EZRX-00D (scsi) Disk /dev/sde: 3001GB Sector size (logical/physical): 512B/4096B Partition Table: unknown Disk Flags: Model: ATA WDC WD5000AAKS-0 (scsi) Disk /dev/sdf: 500GB Sector size (logical/physical): 512B/512B Partition Table: msdos Disk Flags: Number Start End Size Type File system Flags 1 1049kB 210MB 209MB primary ext4 boot 2 210MB 31.7GB 31.5GB primary raid 3 31.7GB 294GB 262GB primary ext4 raid 4 294GB 500GB 206GB extended 5 294GB 326GB 32.2GB logical 6 336GB 340GB 3644MB logical raid Model: Linux Software RAID Array (md) Disk /dev/md2: 524GB Sector size (logical/physical): 512B/512B Partition Table: loop Disk Flags: Number Start End Size File system Flags 1 0.00B 524GB 524GB ext4 Error: /dev/md126: unrecognised disk label Model: Linux Software RAID Array (md) Disk /dev/md126: 31.5GB Sector size (logical/physical): 512B/512B Partition Table: unknown Disk Flags: Error: /dev/md127: unrecognised disk label Model: Linux Software RAID Array (md) Disk /dev/md127: 96.6GB Sector size (logical/physical): 512B/512B Partition Table: unknown Disk Flags: On 12 September 2016 at 20:41, Daniel Sanabria <sanabria.d@gmail.com> wrote: > ok, I just adjusted system time so that I can start tracking logs. 
> > what I'm noticing however is that fdisk -l is not giving me the expect > partitions (I was expecting at least 2 partitions in every 2.7 disk > similar to what I have in sdd): > > [root@lamachine lamachine_220315]# fdisk -l /dev/{sdc,sdd,sde} > Disk /dev/sdc: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors > Units: sectors of 1 * 512 = 512 bytes > Sector size (logical/physical): 512 bytes / 4096 bytes > I/O size (minimum/optimal): 4096 bytes / 4096 bytes > Disklabel type: dos > Disk identifier: 0x00000000 > > Device Boot Start End Sectors Size Id Type > /dev/sdc1 1 4294967295 4294967295 2T ee GPT > > Partition 1 does not start on physical sector boundary. > Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors > Units: sectors of 1 * 512 = 512 bytes > Sector size (logical/physical): 512 bytes / 4096 bytes > I/O size (minimum/optimal): 4096 bytes / 4096 bytes > Disklabel type: gpt > Disk identifier: D3233810-F552-4126-8281-7F71A4938DF9 > > Device Start End Sectors Size Type > /dev/sdd1 2048 4294969343 4294967296 2T Linux RAID > /dev/sdd2 4294969344 5343545343 1048576000 500G Linux filesystem > Disk /dev/sde: 2.7 TiB, 3000591900160 bytes, 5860531055 sectors > Units: sectors of 1 * 512 = 512 bytes > Sector size (logical/physical): 512 bytes / 4096 bytes > I/O size (minimum/optimal): 4096 bytes / 4096 bytes > Disklabel type: dos > Disk identifier: 0x00000000 > > Device Boot Start End Sectors Size Id Type > /dev/sde1 1 4294967295 4294967295 2T ee GPT > > Partition 1 does not start on physical sector boundary. > [root@lamachine lamachine_220315]# > > what could've happened here? any ideas why the partition tables ended > up like that? > > From previous information I have an idea of what the md128 and md129 > are supposed to looks like (also noticed that the device names > changed): > > # md128 and md129 details From an old command output > /dev/md128: > Version : 1.2 > Creation Time : Fri Oct 24 15:24:38 2014 > Raid Level : raid5 > Array Size : 4294705152 (4095.75 GiB 4397.78 GB) > Used Dev Size : 2147352576 (2047.88 GiB 2198.89 GB) > Raid Devices : 3 > Total Devices : 3 > Persistence : Superblock is persistent > > Intent Bitmap : Internal > > Update Time : Sun Mar 22 06:20:08 2015 > State : clean > Active Devices : 3 > Working Devices : 3 > Failed Devices : 0 > Spare Devices : 0 > > Layout : left-symmetric > Chunk Size : 512K > > Name : lamachine:128 (local to host lamachine) > UUID : f2372cb9:d3816fd6:ce86d826:882ec82e > Events : 4041 > > Number Major Minor RaidDevice State > 0 8 49 0 active sync /dev/sdd1 > 1 8 65 1 active sync /dev/sde1 > 3 8 81 2 active sync /dev/sdf1 > /dev/md129: > Version : 1.2 > Creation Time : Mon Nov 10 16:28:11 2014 > Raid Level : raid0 > Array Size : 1572470784 (1499.63 GiB 1610.21 GB) > Raid Devices : 3 > Total Devices : 3 > Persistence : Superblock is persistent > > Update Time : Mon Nov 10 16:28:11 2014 > State : clean > Active Devices : 3 > Working Devices : 3 > Failed Devices : 0 > Spare Devices : 0 > > Chunk Size : 512K > > Name : lamachine:129 (local to host lamachine) > UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a > Events : 0 > > Number Major Minor RaidDevice State > 0 8 50 0 active sync /dev/sdd2 > 1 8 66 1 active sync /dev/sde2 > 2 8 82 2 active sync /dev/sdf2 > > Is there any way to recover the contents of these two arrays ? 
:( > > On 11 September 2016 at 21:06, Daniel Sanabria <sanabria.d@gmail.com> wrote: >> However I'm noticing that the details with this new MB are somewhat different: >> >> [root@lamachine ~]# cat /etc/mdadm.conf >> # mdadm.conf written out by anaconda >> MAILADDR root >> AUTO +imsm +1.x -all >> ARRAY /dev/md2 level=raid5 num-devices=3 >> UUID=2cff15d1:e411447b:fd5d4721:03e44022 >> ARRAY /dev/md126 level=raid10 num-devices=2 >> UUID=9af006ca:8845bbd3:bfe78010:bc810f04 >> ARRAY /dev/md127 level=raid0 num-devices=3 >> UUID=acd5374f:72628c93:6a906c4b:5f675ce5 >> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 >> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e >> ARRAY /dev/md129 metadata=1.2 name=lamachine:129 >> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a >> [root@lamachine ~]# mdadm --detail /dev/md1* >> /dev/md126: >> Version : 0.90 >> Creation Time : Thu Dec 3 22:12:12 2009 >> Raid Level : raid10 >> Array Size : 30719936 (29.30 GiB 31.46 GB) >> Used Dev Size : 30719936 (29.30 GiB 31.46 GB) >> Raid Devices : 2 >> Total Devices : 2 >> Preferred Minor : 126 >> Persistence : Superblock is persistent >> >> Update Time : Tue Jan 12 04:03:41 2016 >> State : clean >> Active Devices : 2 >> Working Devices : 2 >> Failed Devices : 0 >> Spare Devices : 0 >> >> Layout : near=2 >> Chunk Size : 64K >> >> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 >> Events : 0.264152 >> >> Number Major Minor RaidDevice State >> 0 8 82 0 active sync set-A /dev/sdf2 >> 1 8 1 1 active sync set-B /dev/sda1 >> /dev/md127: >> Version : 1.2 >> Creation Time : Tue Jul 26 19:00:28 2011 >> Raid Level : raid0 >> Array Size : 94367232 (90.00 GiB 96.63 GB) >> Raid Devices : 3 >> Total Devices : 3 >> Persistence : Superblock is persistent >> >> Update Time : Tue Jul 26 19:00:28 2011 >> State : clean >> Active Devices : 3 >> Working Devices : 3 >> Failed Devices : 0 >> Spare Devices : 0 >> >> Chunk Size : 512K >> >> Name : reading.homeunix.com:3 >> UUID : acd5374f:72628c93:6a906c4b:5f675ce5 >> Events : 0 >> >> Number Major Minor RaidDevice State >> 0 8 85 0 active sync /dev/sdf5 >> 1 8 21 1 active sync /dev/sdb5 >> 2 8 5 2 active sync /dev/sda5 >> /dev/md128: >> Version : 1.2 >> Raid Level : raid0 >> Total Devices : 1 >> Persistence : Superblock is persistent >> >> State : inactive >> >> Name : lamachine:128 (local to host lamachine) >> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e >> Events : 4154 >> >> Number Major Minor RaidDevice >> >> - 8 49 - /dev/sdd1 >> /dev/md129: >> Version : 1.2 >> Raid Level : raid0 >> Total Devices : 1 >> Persistence : Superblock is persistent >> >> State : inactive >> >> Name : lamachine:129 (local to host lamachine) >> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a >> Events : 0 >> >> Number Major Minor RaidDevice >> >> - 8 50 - /dev/sdd2 >> [root@lamachine ~]# mdadm --detail /dev/md2* >> /dev/md2: >> Version : 0.90 >> Creation Time : Mon Feb 11 07:54:36 2013 >> Raid Level : raid5 >> Array Size : 511999872 (488.28 GiB 524.29 GB) >> Used Dev Size : 255999936 (244.14 GiB 262.14 GB) >> Raid Devices : 3 >> Total Devices : 3 >> Preferred Minor : 2 >> Persistence : Superblock is persistent >> >> Update Time : Tue Jan 12 02:31:50 2016 >> State : clean >> Active Devices : 3 >> Working Devices : 3 >> Failed Devices : 0 >> Spare Devices : 0 >> >> Layout : left-symmetric >> Chunk Size : 64K >> >> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) >> Events : 0.611 >> >> Number Major Minor RaidDevice State >> 0 8 83 0 active sync /dev/sdf3 >> 1 8 18 1 active sync /dev/sdb2 >> 2 8 2 2 active sync /dev/sda2 
>> [root@lamachine ~]# cat /proc/mdstat >> Personalities : [raid10] [raid0] [raid6] [raid5] [raid4] >> md2 : active raid5 sda2[2] sdf3[0] sdb2[1] >> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] >> >> md127 : active raid0 sda5[2] sdf5[0] sdb5[1] >> 94367232 blocks super 1.2 512k chunks >> >> md129 : inactive sdd2[2](S) >> 524156928 blocks super 1.2 >> >> md128 : inactive sdd1[3](S) >> 2147352576 blocks super 1.2 >> >> md126 : active raid10 sdf2[0] sda1[1] >> 30719936 blocks 2 near-copies [2/2] [UU] >> >> unused devices: <none> >> [root@lamachine ~]# >> >> On 11 September 2016 at 19:48, Daniel Sanabria <sanabria.d@gmail.com> wrote: >>> ok, system up and running after MB was replaced however the arrays >>> remain inactive. >>> >>> mdadm version is: >>> mdadm - v3.3.4 - 3rd August 2015 >>> >>> Here's the output from Phil's lsdrv: >>> >>> [root@lamachine ~]# ./lsdrv >>> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation C600/X79 series >>> chipset 6-Port SATA AHCI Controller (rev 06) >>> ├scsi 0:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASZ0505379} >>> │└sda 465.76g [8:0] Partitioned (dos) >>> │ ├sda1 29.30g [8:1] MD raid10,near2 (1/2) (w/ sdf2) in_sync >>> {9af006ca-8845-bbd3-bfe7-8010bc810f04} >>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk >>> {9af006ca:8845bbd3:bfe78010:bc810f04} >>> │ │ │ PV LVM2_member 28.03g used, 1.26g free >>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH} >>> │ │ └VG vg_bigblackbox 29.29g 1.26g free >>> {VWfuwI-5v2q-w8qf-FEbc-BdGW-3mKX-pZd7hR} >>> │ │ ├dm-2 7.81g [253:2] LV LogVol_opt ext4 >>> {b08d7f5e-f15f-4241-804e-edccecab6003} >>> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_opt @ /opt >>> │ │ ├dm-0 9.77g [253:0] LV LogVol_root ext4 >>> {4dabd6b0-b1a3-464d-8ed7-0aab93fab6c3} >>> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_root @ / >>> │ │ ├dm-3 1.95g [253:3] LV LogVol_tmp ext4 >>> {f6b46363-170b-4038-83bd-2c5f9f6a1973} >>> │ │ │└Mounted as /dev/mapper/vg_bigblackbox-LogVol_tmp @ /tmp >>> │ │ └dm-1 8.50g [253:1] LV LogVol_var ext4 >>> {ab165c61-3d62-4c55-8639-6c2c2bf4b021} >>> │ │ └Mounted as /dev/mapper/vg_bigblackbox-LogVol_var @ /var >>> │ ├sda2 244.14g [8:2] MD raid5 (2/3) (w/ sdb2,sdf3) in_sync >>> {2cff15d1-e411-447b-fd5d-472103e44022} >>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk >>> {2cff15d1:e411447b:fd5d4721:03e44022} >>> │ │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311} >>> │ │ └Mounted as /dev/md2 @ /home >>> │ ├sda3 1.00k [8:3] Partitioned (dos) >>> │ ├sda5 30.00g [8:5] MD raid0 (2/3) (w/ sdb5,sdf5) in_sync >>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5} >>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None >>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5} >>> │ │ │ PV LVM2_member 86.00g used, 3.99g free >>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox} >>> │ │ └VG libvirt_lvm 89.99g 3.99g free {t8GQck-f2Eu-iD2V-fnJQ-kBm6-QyKw-dR31PB} >>> │ │ ├dm-6 8.00g [253:6] LV builder2 Partitioned (dos) >>> │ │ ├dm-7 8.00g [253:7] LV builder3 Partitioned (dos) >>> │ │ ├dm-9 8.00g [253:9] LV builder5.3 Partitioned (dos) >>> │ │ ├dm-8 8.00g [253:8] LV builder5.6 Partitioned (dos) >>> │ │ ├dm-5 8.00g [253:5] LV centos_updt Partitioned (dos) >>> │ │ ├dm-10 16.00g [253:10] LV f22lvm Partitioned (dos) >>> │ │ └dm-4 30.00g [253:4] LV win7 Partitioned (dos) >>> │ └sda6 3.39g [8:6] Empty/Unknown >>> ├scsi 1:0:0:0 ATA WDC WD5000AAKS-0 {WD-WCASY7694185} >>> │└sdb 465.76g [8:16] Partitioned (dos) >>> │ ├sdb2 244.14g [8:18] MD raid5 (1/3) (w/ sda2,sdf3) in_sync >>> 
{2cff15d1-e411-447b-fd5d-472103e44022} >>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk >>> {2cff15d1:e411447b:fd5d4721:03e44022} >>> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311} >>> │ ├sdb3 7.81g [8:19] swap {9194f492-881a-4fc3-ac09-ca4e1cc2985a} >>> │ ├sdb4 1.00k [8:20] Partitioned (dos) >>> │ ├sdb5 30.00g [8:21] MD raid0 (1/3) (w/ sda5,sdf5) in_sync >>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5} >>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None >>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5} >>> │ │ PV LVM2_member 86.00g used, 3.99g free >>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox} >>> │ └sdb6 3.39g [8:22] Empty/Unknown >>> ├scsi 2:x:x:x [Empty] >>> ├scsi 3:x:x:x [Empty] >>> ├scsi 4:x:x:x [Empty] >>> └scsi 5:x:x:x [Empty] >>> PCI [ahci] 0a:00.0 SATA controller: Marvell Technology Group Ltd. >>> 88SE9230 PCIe SATA 6Gb/s Controller (rev 11) >>> ├scsi 6:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NCWT13RF} >>> │└sdc 2.73t [8:32] Partitioned (PMBR) >>> ├scsi 7:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4NPRDD6D7} >>> │└sdd 2.73t [8:48] Partitioned (gpt) >>> │ ├sdd1 2.00t [8:49] MD (none/) spare 'lamachine:128' >>> {f2372cb9-d381-6fd6-ce86-d826882ec82e} >>> │ │└md128 0.00k [9:128] MD v1.2 () inactive, None (None) None >>> {f2372cb9:d3816fd6:ce86d826:882ec82e} >>> │ │ Empty/Unknown >>> │ └sdd2 500.00g [8:50] MD (none/) spare 'lamachine:129' >>> {895dae98-d1a4-96de-4f59-0b8bcb8ac12a} >>> │ └md129 0.00k [9:129] MD v1.2 () inactive, None (None) None >>> {895dae98:d1a496de:4f590b8b:cb8ac12a} >>> │ Empty/Unknown >>> ├scsi 8:0:0:0 ATA WDC WD30EZRX-00D {WD-WCC4N1294906} >>> │└sde 2.73t [8:64] Partitioned (PMBR) >>> ├scsi 9:0:0:0 ATA WDC WD5000AAKS-0 {WD-WMAWF0085724} >>> │└sdf 465.76g [8:80] Partitioned (dos) >>> │ ├sdf1 199.00m [8:81] ext4 {4e51f903-37ca-4479-9197-fac7b2280557} >>> │ │└Mounted as /dev/sdf1 @ /boot >>> │ ├sdf2 29.30g [8:82] MD raid10,near2 (0/2) (w/ sda1) in_sync >>> {9af006ca-8845-bbd3-bfe7-8010bc810f04} >>> │ │└md126 29.30g [9:126] MD v0.90 raid10,near2 (2) clean, 64k Chunk >>> {9af006ca:8845bbd3:bfe78010:bc810f04} >>> │ │ PV LVM2_member 28.03g used, 1.26g free >>> {cE4ePh-RWO8-Wgdy-YPOY-ehyC-KI6u-io1cyH} >>> │ ├sdf3 244.14g [8:83] MD raid5 (0/3) (w/ sda2,sdb2) in_sync >>> {2cff15d1-e411-447b-fd5d-472103e44022} >>> │ │└md2 488.28g [9:2] MD v0.90 raid5 (3) clean, 64k Chunk >>> {2cff15d1:e411447b:fd5d4721:03e44022} >>> │ │ ext4 {e9c1c787-496f-4e8f-b62e-35d5b1ff8311} >>> │ ├sdf4 1.00k [8:84] Partitioned (dos) >>> │ ├sdf5 30.00g [8:85] MD raid0 (0/3) (w/ sda5,sdb5) in_sync >>> 'reading.homeunix.com:3' {acd5374f-7262-8c93-6a90-6c4b5f675ce5} >>> │ │└md127 90.00g [9:127] MD v1.2 raid0 (3) clean, 512k Chunk, None >>> (None) None {acd5374f:72628c93:6a906c4b:5f675ce5} >>> │ │ PV LVM2_member 86.00g used, 3.99g free >>> {VmsWRd-8qHt-bauf-lvAn-FC97-KyH5-gk89ox} >>> │ └sdf6 3.39g [8:86] Empty/Unknown >>> ├scsi 10:x:x:x [Empty] >>> ├scsi 11:x:x:x [Empty] >>> └scsi 12:x:x:x [Empty] >>> PCI [isci] 05:00.0 Serial Attached SCSI controller: Intel Corporation >>> C602 chipset 4-Port SATA Storage Control Unit (rev 06) >>> └scsi 14:x:x:x [Empty] >>> [root@lamachine ~]# >>> >>> Thanks in advance for any recommendations on what steps to take in >>> order to bring these arrays back online. >>> >>> Regards, >>> >>> Daniel >>> >>> >>> On 2 August 2016 at 11:45, Daniel Sanabria <sanabria.d@gmail.com> wrote: >>>> Thanks very much for the response Wol. >>>> >>>> It looks like the PSU is dead (server automatically powers off a few >>>> seconds after power on). 
>>>> >>>> I'm planning to order a PSU replacement to resume troubleshooting so >>>> please bear with me; maybe the PSU was degraded and couldn't power >>>> some of drives? >>>> >>>> Cheers, >>>> >>>> Daniel >>>> >>>> On 2 August 2016 at 11:17, Wols Lists <antlists@youngman.org.uk> wrote: >>>>> Just a quick first response. I see md128 and md129 are both down, and >>>>> are both listed as one drive, raid0. Bit odd, that ... >>>>> >>>>> What version of mdadm are you using? One of them had a bug (3.2.3 era?) >>>>> that would split an array in two. Is it possible that you should have >>>>> one raid0 array with sdf1 and sdf2? But that's a bit of a weird setup... >>>>> >>>>> I notice also that md126 is raid10 across two drives. That's odd, too. >>>>> >>>>> How much do you know about what the setup should be, and why it was set >>>>> up that way? >>>>> >>>>> Download lspci by Phil Turmel (it requires python2.7, if your machine is >>>>> python3 a quick fix to the shebang at the start should get it to work). >>>>> Post the output from that here. >>>>> >>>>> Cheers, >>>>> Wol >>>>> >>>>> On 02/08/16 08:36, Daniel Sanabria wrote: >>>>>> Hi All, >>>>>> >>>>>> I have a box that I believe was not powered down correctly and after >>>>>> transporting it to a different location it doesn't boot anymore >>>>>> stopping at BIOS check "Verifying DMI Pool Data". >>>>>> >>>>>> The box have 6 drives and after instructing the BIOS to boot from the >>>>>> first drive I managed to boot the OS (Fedora 23) after commenting out >>>>>> 2 /etc/fstab entries , output for "uname -a; cat /etc/fstab" follows: >>>>>> >>>>>> [root@lamachine ~]# uname -a; cat /etc/fstab >>>>>> Linux lamachine 4.3.3-303.fc23.x86_64 #1 SMP Tue Jan 19 18:31:55 UTC >>>>>> 2016 x86_64 x86_64 x86_64 GNU/Linux >>>>>> >>>>>> # >>>>>> # /etc/fstab >>>>>> # Created by anaconda on Tue Mar 24 19:31:21 2015 >>>>>> # >>>>>> # Accessible filesystems, by reference, are maintained under '/dev/disk' >>>>>> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info >>>>>> # >>>>>> /dev/mapper/vg_bigblackbox-LogVol_root / ext4 >>>>>> defaults 1 1 >>>>>> UUID=4e51f903-37ca-4479-9197-fac7b2280557 /boot ext4 >>>>>> defaults 1 2 >>>>>> /dev/mapper/vg_bigblackbox-LogVol_opt /opt ext4 >>>>>> defaults 1 2 >>>>>> /dev/mapper/vg_bigblackbox-LogVol_tmp /tmp ext4 >>>>>> defaults 1 2 >>>>>> /dev/mapper/vg_bigblackbox-LogVol_var /var ext4 >>>>>> defaults 1 2 >>>>>> UUID=9194f492-881a-4fc3-ac09-ca4e1cc2985a swap swap >>>>>> defaults 0 0 >>>>>> /dev/md2 /home ext4 defaults 1 2 >>>>>> #/dev/vg_media/lv_media /mnt/media ext4 defaults 1 2 >>>>>> #/dev/vg_virt_dir/lv_virt_dir1 /mnt/guest_images/ ext4 defaults 1 2 >>>>>> [root@lamachine ~]# >>>>>> >>>>>> When checking mdstat I can see that 2 of the arrays are showing up as >>>>>> inactive, but not sure how to safely activate these so looking for >>>>>> some knowledgeable advice on how to proceed here. 
>>>>>> >>>>>> Thanks in advance, >>>>>> >>>>>> Daniel >>>>>> >>>>>> Below some more relevant outputs: >>>>>> >>>>>> [root@lamachine ~]# cat /proc/mdstat >>>>>> Personalities : [raid10] [raid6] [raid5] [raid4] [raid0] >>>>>> md127 : active raid0 sda5[0] sdc5[2] sdb5[1] >>>>>> 94367232 blocks super 1.2 512k chunks >>>>>> >>>>>> md2 : active raid5 sda3[0] sdc2[2] sdb2[1] >>>>>> 511999872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU] >>>>>> >>>>>> md128 : inactive sdf1[3](S) >>>>>> 2147352576 blocks super 1.2 >>>>>> >>>>>> md129 : inactive sdf2[2](S) >>>>>> 524156928 blocks super 1.2 >>>>>> >>>>>> md126 : active raid10 sda2[0] sdc1[1] >>>>>> 30719936 blocks 2 near-copies [2/2] [UU] >>>>>> >>>>>> unused devices: <none> >>>>>> [root@lamachine ~]# cat /etc/mdadm.conf >>>>>> # mdadm.conf written out by anaconda >>>>>> MAILADDR root >>>>>> AUTO +imsm +1.x -all >>>>>> ARRAY /dev/md2 level=raid5 num-devices=3 >>>>>> UUID=2cff15d1:e411447b:fd5d4721:03e44022 >>>>>> ARRAY /dev/md126 level=raid10 num-devices=2 >>>>>> UUID=9af006ca:8845bbd3:bfe78010:bc810f04 >>>>>> ARRAY /dev/md127 level=raid0 num-devices=3 >>>>>> UUID=acd5374f:72628c93:6a906c4b:5f675ce5 >>>>>> ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 >>>>>> UUID=f2372cb9:d3816fd6:ce86d826:882ec82e >>>>>> ARRAY /dev/md129 metadata=1.2 name=lamachine:129 >>>>>> UUID=895dae98:d1a496de:4f590b8b:cb8ac12a >>>>>> [root@lamachine ~]# mdadm --detail /dev/md1* >>>>>> /dev/md126: >>>>>> Version : 0.90 >>>>>> Creation Time : Thu Dec 3 22:12:12 2009 >>>>>> Raid Level : raid10 >>>>>> Array Size : 30719936 (29.30 GiB 31.46 GB) >>>>>> Used Dev Size : 30719936 (29.30 GiB 31.46 GB) >>>>>> Raid Devices : 2 >>>>>> Total Devices : 2 >>>>>> Preferred Minor : 126 >>>>>> Persistence : Superblock is persistent >>>>>> >>>>>> Update Time : Tue Aug 2 07:46:39 2016 >>>>>> State : clean >>>>>> Active Devices : 2 >>>>>> Working Devices : 2 >>>>>> Failed Devices : 0 >>>>>> Spare Devices : 0 >>>>>> >>>>>> Layout : near=2 >>>>>> Chunk Size : 64K >>>>>> >>>>>> UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 >>>>>> Events : 0.264152 >>>>>> >>>>>> Number Major Minor RaidDevice State >>>>>> 0 8 2 0 active sync set-A /dev/sda2 >>>>>> 1 8 33 1 active sync set-B /dev/sdc1 >>>>>> /dev/md127: >>>>>> Version : 1.2 >>>>>> Creation Time : Tue Jul 26 19:00:28 2011 >>>>>> Raid Level : raid0 >>>>>> Array Size : 94367232 (90.00 GiB 96.63 GB) >>>>>> Raid Devices : 3 >>>>>> Total Devices : 3 >>>>>> Persistence : Superblock is persistent >>>>>> >>>>>> Update Time : Tue Jul 26 19:00:28 2011 >>>>>> State : clean >>>>>> Active Devices : 3 >>>>>> Working Devices : 3 >>>>>> Failed Devices : 0 >>>>>> Spare Devices : 0 >>>>>> >>>>>> Chunk Size : 512K >>>>>> >>>>>> Name : reading.homeunix.com:3 >>>>>> UUID : acd5374f:72628c93:6a906c4b:5f675ce5 >>>>>> Events : 0 >>>>>> >>>>>> Number Major Minor RaidDevice State >>>>>> 0 8 5 0 active sync /dev/sda5 >>>>>> 1 8 21 1 active sync /dev/sdb5 >>>>>> 2 8 37 2 active sync /dev/sdc5 >>>>>> /dev/md128: >>>>>> Version : 1.2 >>>>>> Raid Level : raid0 >>>>>> Total Devices : 1 >>>>>> Persistence : Superblock is persistent >>>>>> >>>>>> State : inactive >>>>>> >>>>>> Name : lamachine:128 (local to host lamachine) >>>>>> UUID : f2372cb9:d3816fd6:ce86d826:882ec82e >>>>>> Events : 4154 >>>>>> >>>>>> Number Major Minor RaidDevice >>>>>> >>>>>> - 8 81 - /dev/sdf1 >>>>>> /dev/md129: >>>>>> Version : 1.2 >>>>>> Raid Level : raid0 >>>>>> Total Devices : 1 >>>>>> Persistence : Superblock is persistent >>>>>> >>>>>> State : inactive >>>>>> >>>>>> Name : lamachine:129 (local 
to host lamachine) >>>>>> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a >>>>>> Events : 0 >>>>>> >>>>>> Number Major Minor RaidDevice >>>>>> >>>>>> - 8 82 - /dev/sdf2 >>>>>> [root@lamachine ~]# mdadm --detail /dev/md2 >>>>>> /dev/md2: >>>>>> Version : 0.90 >>>>>> Creation Time : Mon Feb 11 07:54:36 2013 >>>>>> Raid Level : raid5 >>>>>> Array Size : 511999872 (488.28 GiB 524.29 GB) >>>>>> Used Dev Size : 255999936 (244.14 GiB 262.14 GB) >>>>>> Raid Devices : 3 >>>>>> Total Devices : 3 >>>>>> Preferred Minor : 2 >>>>>> Persistence : Superblock is persistent >>>>>> >>>>>> Update Time : Mon Aug 1 20:24:23 2016 >>>>>> State : clean >>>>>> Active Devices : 3 >>>>>> Working Devices : 3 >>>>>> Failed Devices : 0 >>>>>> Spare Devices : 0 >>>>>> >>>>>> Layout : left-symmetric >>>>>> Chunk Size : 64K >>>>>> >>>>>> UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) >>>>>> Events : 0.611 >>>>>> >>>>>> Number Major Minor RaidDevice State >>>>>> 0 8 3 0 active sync /dev/sda3 >>>>>> 1 8 18 1 active sync /dev/sdb2 >>>>>> 2 8 34 2 active sync /dev/sdc2 >>>>>> [root@lamachine ~]# >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>>> >>>>> ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-12 21:13 ` Daniel Sanabria @ 2016-09-12 21:37 ` Chris Murphy 2016-09-13 6:51 ` Daniel Sanabria 2016-09-12 21:39 ` Wols Lists 1 sibling, 1 reply; 37+ messages in thread From: Chris Murphy @ 2016-09-12 21:37 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Wols Lists, Linux-RAID On Mon, Sep 12, 2016 at 3:13 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote: > Error: Invalid argument during seek for read on /dev/sdc > Retry/Ignore/Cancel? R > Error: Invalid argument during seek for read on /dev/sdc > Retry/Ignore/Cancel? I > Error: The backup GPT table is corrupt, but the primary appears OK, so > that will be used. > OK/Cancel? O > Model: ATA WDC WD30EZRX-00D (scsi) > Disk /dev/sdc: 3001GB > Sector size (logical/physical): 512B/4096B > Partition Table: unknown > Disk Flags: What version of parted? This looks like a problem due to the error, followed by "primary appears OK" but instead of using that like it says, it reports the Partition Table as unknown. Not expected. > Error: Invalid argument during seek for read on /dev/sde > Retry/Ignore/Cancel? C > Model: ATA WDC WD30EZRX-00D (scsi) > Disk /dev/sde: 3001GB > Sector size (logical/physical): 512B/4096B > Partition Table: unknown > Disk Flags: And the same thing with a second drive? What are the chances? Can you post a complete dmesg somewhere? Are these two drives on the same controller? smartctl -x for each drive? -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
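A sketch of gathering exactly what is being asked for; note that smartctl takes a single device per invocation, hence the loop (the file names under /root are just an assumed convention):

  dmesg > /root/dmesg.txt
  for d in sdc sdd sde; do
      smartctl -x /dev/$d > /root/smart-$d.txt
      readlink -f /sys/block/$d             # shows the PCI path, i.e. which controller the disk sits behind
  done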
* Re: Inactive arrays 2016-09-12 21:37 ` Chris Murphy @ 2016-09-13 6:51 ` Daniel Sanabria 2016-09-13 15:04 ` Chris Murphy 0 siblings, 1 reply; 37+ messages in thread From: Daniel Sanabria @ 2016-09-13 6:51 UTC (permalink / raw) To: Chris Murphy; +Cc: Wols Lists, Linux-RAID > What version of parted? parted (GNU parted) 3.2 > Are these two drives on the same > controller? yes afaict ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-13 6:51 ` Daniel Sanabria @ 2016-09-13 15:04 ` Chris Murphy 0 siblings, 0 replies; 37+ messages in thread From: Chris Murphy @ 2016-09-13 15:04 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID On Tue, Sep 13, 2016 at 12:51 AM, Daniel Sanabria <sanabria.d@gmail.com> wrote: >> What version of parted? > > parted (GNU parted) 3.2 That should reliably fix the GPT, assuming it's appropriate to fix it - which isn't necessarily a good assumption. The backup GPT is located in about the same place as the mdadm v2 metadata. So one can step on the other, depending on how the array is created (on whole devices or on partitions). The problem with unwinding this is if any of these structures becomes stale without having the signature wiped, it becomes important to find out for certain which one is valid and which one is stale. > >> Are these two drives on the same >> controller? > > yes afaict I'd check all the cables (the connections in particular) drive to controller and that the controller is seated. Seems sorta unlikely you'd have the same kind of problem on two drives at the same time that's a hardware problem unless they share something the other drives don't; or if some event has modified both drives the same way. Thing is, invalid argument during seek for read? Kinda sounds like a hardware problem. -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
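A sketch of read-only checks for working out which on-disk structure is current (sdc shown; the same applies to sde). The offsets follow from the standard layouts: the primary GPT header sits at LBA 1, its backup at the last LBA, and an mdadm v1.2 superblock starts 4 KiB into the device:

  hexdump -C -s 512 -n 96 /dev/sdc      # LBA 1: "EFI PART" appears here if the primary GPT header is intact
  dd if=/dev/sdc bs=512 count=1 skip=$(( $(blockdev --getsz /dev/sdc) - 1 )) 2>/dev/null | hexdump -C | head   # backup GPT header at the last LBA
  hexdump -C -s 4096 -n 64 /dev/sdc     # md 1.2 magic a92b4efc (bytes fc 4e 2b a9) if the bare disk is an array member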
* Re: Inactive arrays 2016-09-12 21:13 ` Daniel Sanabria 2016-09-12 21:37 ` Chris Murphy @ 2016-09-12 21:39 ` Wols Lists 2016-09-13 6:56 ` Daniel Sanabria 1 sibling, 1 reply; 37+ messages in thread From: Wols Lists @ 2016-09-12 21:39 UTC (permalink / raw) To: Daniel Sanabria; +Cc: linux-raid On 12/09/16 22:13, Daniel Sanabria wrote: > apologies for the verbosity just adding some more info which is now > making me lose hope. Using parted -l instead of fdisk gives me this: Hmmm... I'd wait and let some of the experts in disk recovery chime in before it gets that far. The fact that fdisk found the partitions on sdc and sde is a hopeful sign. And providing something hasn't scribbled all over the disk, several of them know their way around partition tables and can recreate them if they're not totally scrambled. I think using fdisk on the drives was a mistake, but seeing as -l doesn't write anything it was a harmless mistake. iirc fdisk can't handle gpt disks, or stuff over 2TB, so that's where that problem lay. Try using gdisk instead of parted, that might behave just that little bit differently, and so long as it doesn't write anything, it won't do any harm. What does smartctl report on sdc and sde (I think you want smartctl -x, the extended "display everything" command)? And looking at the lsdrv stuff, were you using LVM? That's got a load of references to LV. The snag is, I'm now getting a bit out of my depth - I tend to "first respond" and ask for the information that I know the experts will want. If Phil or someone else now takes a look at this thread it'll contain all the information they need, but I think we need to wait for them to chime in now. So hopefully they'll see this tonight and be on hand soon... I'll come back if I think of anything new ... Cheers, Wol ^ permalink raw reply [flat|nested] 37+ messages in thread
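For the gdisk and LVM questions above, a sketch of invocations that stay read-only (gdisk -l only prints what it finds and reports the state of the MBR, main GPT and backup GPT; the LVM commands just report what the running system currently sees):

  gdisk -l /dev/sdc
  gdisk -l /dev/sde
  pvs -o +pv_uuid          # which block devices LVM recognises as physical volumes
  vgs
  lvs -a -o +devices       # logical volumes and the devices they sit on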
* Re: Inactive arrays 2016-09-12 21:39 ` Wols Lists @ 2016-09-13 6:56 ` Daniel Sanabria 2016-09-13 7:02 ` Adam Goryachev 2016-09-13 15:20 ` Chris Murphy 0 siblings, 2 replies; 37+ messages in thread From: Daniel Sanabria @ 2016-09-13 6:56 UTC (permalink / raw) To: Wols Lists; +Cc: Linux-RAID > What does smartctl report on sdc and sde (I think you want smartctl -x, > the extended "display everything" command)? [root@lamachine ~]# smartctl -x /dev/{sdc,sdd,sde} smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build) Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org ERROR: smartctl takes ONE device name as the final command-line argument. You have provided 3 device names: /dev/sdc /dev/sdd /dev/sde Use smartctl -h to get a usage summary [root@lamachine ~]# smartctl -x /dev/sdc smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build) Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Western Digital Green Device Model: WDC WD30EZRX-00D8PB0 Serial Number: WD-WCC4NCWT13RF LU WWN Device Id: 5 0014ee 25fc9e460 Firmware Version: 80.00A80 User Capacity: 3,000,591,900,160 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2 (minor revision not indicated) SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Tue Sep 13 07:53:18 2016 BST SMART support is: Available - device has SMART capability. SMART support is: Enabled AAM feature is: Unavailable APM feature is: Unavailable Rd look-ahead is: Enabled Write cache is: Enabled ATA Security is: Disabled, NOT FROZEN [SEC1] Wt Cache Reorder: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (38940) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 391) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x7035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported. 
SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0 3 Spin_Up_Time POS--K 177 177 021 - 6116 4 Start_Stop_Count -O--CK 100 100 000 - 41 5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0 7 Seek_Error_Rate -OSR-K 200 200 000 - 0 9 Power_On_Hours -O--CK 085 085 000 - 11128 10 Spin_Retry_Count -O--CK 100 253 000 - 0 11 Calibration_Retry_Count -O--CK 100 253 000 - 0 12 Power_Cycle_Count -O--CK 100 100 000 - 41 192 Power-Off_Retract_Count -O--CK 200 200 000 - 20 193 Load_Cycle_Count -O--CK 142 142 000 - 175500 194 Temperature_Celsius -O---K 123 114 000 - 27 196 Reallocated_Event_Count -O--CK 200 200 000 - 0 197 Current_Pending_Sector -O--CK 200 200 000 - 0 198 Offline_Uncorrectable ----CK 200 200 000 - 0 199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0 200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0 ||||||_ K auto-keep |||||__ C event count ||||___ R error rate |||____ S speed/performance ||_____ O updated online |______ P prefailure warning General Purpose Log Directory Version 1 SMART Log Directory Version 1 [multi-sector log support] Address Access R/W Size Description 0x00 GPL,SL R/O 1 Log Directory 0x01 SL R/O 1 Summary SMART error log 0x02 SL R/O 5 Comprehensive SMART error log 0x03 GPL R/O 6 Ext. Comprehensive SMART error log 0x06 SL R/O 1 SMART self-test log 0x07 GPL R/O 1 Extended self-test log 0x09 SL R/W 1 Selective self-test log 0x10 GPL R/O 1 SATA NCQ Queued Error log 0x11 GPL R/O 1 SATA Phy Event Counters log 0x80-0x9f GPL,SL R/W 16 Host vendor specific log 0xa0-0xa7 GPL,SL VS 16 Device vendor specific log 0xa8-0xb7 GPL,SL VS 1 Device vendor specific log 0xbd GPL,SL VS 1 Device vendor specific log 0xc0 GPL,SL VS 1 Device vendor specific log 0xc1 GPL VS 93 Device vendor specific log 0xe0 GPL,SL R/W 1 SCT Command/Status 0xe1 GPL,SL R/W 1 SCT Data Transfer SMART Extended Comprehensive Error Log Version: 1 (6 sectors) No Errors Logged SMART Extended Self-test Log Version: 1 (1 sectors) No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. SCT Status Version: 3 SCT Version (vendor specific): 258 (0x0102) SCT Support Level: 1 Device State: Active (0) Current Temperature: 27 Celsius Power Cycle Min/Max Temperature: 23/27 Celsius Lifetime Min/Max Temperature: 15/36 Celsius Under/Over Temperature Limit Count: 0/0 Vendor specific: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 SCT Temperature History Version: 2 Temperature Sampling Period: 1 minute Temperature Logging Interval: 1 minute Min/Max recommended Temperature: 0/60 Celsius Min/Max Temperature Limit: -41/85 Celsius Temperature History Size (Index): 478 (376) Index Estimated Time Temperature Celsius 377 2016-09-12 23:56 ? - 378 2016-09-12 23:57 23 **** 379 2016-09-12 23:58 24 ***** ... ..( 4 skipped). .. ***** 384 2016-09-13 00:03 24 ***** 385 2016-09-13 00:04 25 ****** 386 2016-09-13 00:05 25 ****** 387 2016-09-13 00:06 25 ****** 388 2016-09-13 00:07 26 ******* ... ..( 2 skipped). .. ******* 391 2016-09-13 00:10 26 ******* 392 2016-09-13 00:11 27 ******** ... 
..( 7 skipped). .. ******** 400 2016-09-13 00:19 27 ******** 401 2016-09-13 00:20 29 ********** 402 2016-09-13 00:21 28 ********* ... ..( 4 skipped). .. ********* 407 2016-09-13 00:26 28 ********* 408 2016-09-13 00:27 29 ********** ... ..( 4 skipped). .. ********** 413 2016-09-13 00:32 29 ********** 414 2016-09-13 00:33 30 *********** ... ..( 5 skipped). .. *********** 420 2016-09-13 00:39 30 *********** 421 2016-09-13 00:40 31 ************ ... ..( 23 skipped). .. ************ 445 2016-09-13 01:04 31 ************ 446 2016-09-13 01:05 32 ************* ... ..( 24 skipped). .. ************* 471 2016-09-13 01:30 32 ************* 472 2016-09-13 01:31 ? - 473 2016-09-13 01:32 33 ************** 474 2016-09-13 01:33 32 ************* ... ..( 9 skipped). .. ************* 6 2016-09-13 01:43 32 ************* 7 2016-09-13 01:44 33 ************** ... ..( 20 skipped). .. ************** 28 2016-09-13 02:05 33 ************** 29 2016-09-13 02:06 32 ************* 30 2016-09-13 02:07 32 ************* 31 2016-09-13 02:08 32 ************* 32 2016-09-13 02:09 33 ************** 33 2016-09-13 02:10 32 ************* ... ..( 62 skipped). .. ************* 96 2016-09-13 03:13 32 ************* 97 2016-09-13 03:14 ? - 98 2016-09-13 03:15 24 ***** ... ..( 4 skipped). .. ***** 103 2016-09-13 03:20 24 ***** 104 2016-09-13 03:21 25 ****** 105 2016-09-13 03:22 25 ****** 106 2016-09-13 03:23 25 ****** 107 2016-09-13 03:24 26 ******* ... ..( 2 skipped). .. ******* 110 2016-09-13 03:27 26 ******* 111 2016-09-13 03:28 27 ******** ... ..( 6 skipped). .. ******** 118 2016-09-13 03:35 27 ******** 119 2016-09-13 03:36 28 ********* ... ..( 11 skipped). .. ********* 131 2016-09-13 03:48 28 ********* 132 2016-09-13 03:49 29 ********** ... ..( 8 skipped). .. ********** 141 2016-09-13 03:58 29 ********** 142 2016-09-13 03:59 30 *********** ... ..( 11 skipped). .. *********** 154 2016-09-13 04:11 30 *********** 155 2016-09-13 04:12 31 ************ ... ..( 42 skipped). .. ************ 198 2016-09-13 04:55 31 ************ 199 2016-09-13 04:56 ? - 200 2016-09-13 04:57 22 *** 201 2016-09-13 04:58 22 *** 202 2016-09-13 04:59 23 **** 203 2016-09-13 05:00 23 **** 204 2016-09-13 05:01 24 ***** 205 2016-09-13 05:02 24 ***** 206 2016-09-13 05:03 25 ****** ... ..( 3 skipped). .. ****** 210 2016-09-13 05:07 25 ****** 211 2016-09-13 05:08 26 ******* 212 2016-09-13 05:09 27 ******** ... ..( 7 skipped). .. ******** 220 2016-09-13 05:17 27 ******** 221 2016-09-13 05:18 28 ********* ... ..( 13 skipped). .. ********* 235 2016-09-13 05:32 28 ********* 236 2016-09-13 05:33 29 ********** ... ..( 10 skipped). .. ********** 247 2016-09-13 05:44 29 ********** 248 2016-09-13 05:45 30 *********** ... ..( 16 skipped). .. *********** 265 2016-09-13 06:02 30 *********** 266 2016-09-13 06:03 31 ************ ... ..( 96 skipped). .. ************ 363 2016-09-13 07:40 31 ************ 364 2016-09-13 07:41 ? - 365 2016-09-13 07:42 31 ************ 366 2016-09-13 07:43 31 ************ 367 2016-09-13 07:44 30 *********** 368 2016-09-13 07:45 30 *********** 369 2016-09-13 07:46 30 *********** 370 2016-09-13 07:47 31 ************ ... ..( 5 skipped). .. 
************ 376 2016-09-13 07:53 31 ************ SCT Error Recovery Control command not supported Device Statistics (GP/SMART Log 0x04) not supported SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x0001 2 0 Command failed due to ICRC error 0x0002 2 0 R_ERR response for data FIS 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0005 2 0 R_ERR response for non-data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS 0x0008 2 0 Device-to-host non-data FIS retries 0x0009 2 2 Transition from drive PhyRdy to drive PhyNRdy 0x000a 2 2 Device-to-host register FISes sent due to a COMRESET 0x000b 2 0 CRC errors within host-to-device FIS 0x000f 2 0 R_ERR response for host-to-device data FIS, CRC 0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC 0x8000 4 1280 Vendor specific [root@lamachine ~]# smartctl -x /dev/sdd smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build) Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Western Digital Green Device Model: WDC WD30EZRX-00D8PB0 Serial Number: WD-WCC4NPRDD6D7 LU WWN Device Id: 5 0014ee 25fca27b1 Firmware Version: 80.00A80 User Capacity: 3,000,592,982,016 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2 (minor revision not indicated) SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Tue Sep 13 07:53:20 2016 BST SMART support is: Available - device has SMART capability. SMART support is: Enabled AAM feature is: Unavailable APM feature is: Unavailable Rd look-ahead is: Enabled Write cache is: Enabled ATA Security is: Disabled, NOT FROZEN [SEC1] Wt Cache Reorder: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (39060) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 392) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x7035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported. 
SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0 3 Spin_Up_Time POS--K 176 176 021 - 6166 4 Start_Stop_Count -O--CK 100 100 000 - 41 5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0 7 Seek_Error_Rate -OSR-K 200 200 000 - 0 9 Power_On_Hours -O--CK 085 085 000 - 11129 10 Spin_Retry_Count -O--CK 100 253 000 - 0 11 Calibration_Retry_Count -O--CK 100 253 000 - 0 12 Power_Cycle_Count -O--CK 100 100 000 - 41 192 Power-Off_Retract_Count -O--CK 200 200 000 - 27 193 Load_Cycle_Count -O--CK 137 137 000 - 191240 194 Temperature_Celsius -O---K 123 114 000 - 27 196 Reallocated_Event_Count -O--CK 200 200 000 - 0 197 Current_Pending_Sector -O--CK 200 200 000 - 0 198 Offline_Uncorrectable ----CK 200 200 000 - 0 199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0 200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0 ||||||_ K auto-keep |||||__ C event count ||||___ R error rate |||____ S speed/performance ||_____ O updated online |______ P prefailure warning General Purpose Log Directory Version 1 SMART Log Directory Version 1 [multi-sector log support] Address Access R/W Size Description 0x00 GPL,SL R/O 1 Log Directory 0x01 SL R/O 1 Summary SMART error log 0x02 SL R/O 5 Comprehensive SMART error log 0x03 GPL R/O 6 Ext. Comprehensive SMART error log 0x06 SL R/O 1 SMART self-test log 0x07 GPL R/O 1 Extended self-test log 0x09 SL R/W 1 Selective self-test log 0x10 GPL R/O 1 SATA NCQ Queued Error log 0x11 GPL R/O 1 SATA Phy Event Counters log 0x80-0x9f GPL,SL R/W 16 Host vendor specific log 0xa0-0xa7 GPL,SL VS 16 Device vendor specific log 0xa8-0xb7 GPL,SL VS 1 Device vendor specific log 0xbd GPL,SL VS 1 Device vendor specific log 0xc0 GPL,SL VS 1 Device vendor specific log 0xc1 GPL VS 93 Device vendor specific log 0xe0 GPL,SL R/W 1 SCT Command/Status 0xe1 GPL,SL R/W 1 SCT Data Transfer SMART Extended Comprehensive Error Log Version: 1 (6 sectors) No Errors Logged SMART Extended Self-test Log Version: 1 (1 sectors) No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. SCT Status Version: 3 SCT Version (vendor specific): 258 (0x0102) SCT Support Level: 1 Device State: Active (0) Current Temperature: 27 Celsius Power Cycle Min/Max Temperature: 23/27 Celsius Lifetime Min/Max Temperature: 15/36 Celsius Under/Over Temperature Limit Count: 0/0 Vendor specific: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 SCT Temperature History Version: 2 Temperature Sampling Period: 1 minute Temperature Logging Interval: 1 minute Min/Max recommended Temperature: 0/60 Celsius Min/Max Temperature Limit: -41/85 Celsius Temperature History Size (Index): 478 (401) Index Estimated Time Temperature Celsius 402 2016-09-12 23:56 ? - 403 2016-09-12 23:57 23 **** 404 2016-09-12 23:58 23 **** 405 2016-09-12 23:59 24 ***** ... ..( 3 skipped). .. ***** 409 2016-09-13 00:03 24 ***** 410 2016-09-13 00:04 25 ****** 411 2016-09-13 00:05 25 ****** 412 2016-09-13 00:06 25 ****** 413 2016-09-13 00:07 26 ******* ... ..( 2 skipped). .. 
******* 416 2016-09-13 00:10 26 ******* 417 2016-09-13 00:11 27 ******** ... ..( 7 skipped). .. ******** 425 2016-09-13 00:19 27 ******** 426 2016-09-13 00:20 28 ********* 427 2016-09-13 00:21 29 ********** ... ..( 10 skipped). .. ********** 438 2016-09-13 00:32 29 ********** 439 2016-09-13 00:33 30 *********** ... ..( 4 skipped). .. *********** 444 2016-09-13 00:38 30 *********** 445 2016-09-13 00:39 31 ************ ... ..( 23 skipped). .. ************ 469 2016-09-13 01:03 31 ************ 470 2016-09-13 01:04 32 ************* ... ..( 26 skipped). .. ************* 19 2016-09-13 01:31 32 ************* 20 2016-09-13 01:32 ? - 21 2016-09-13 01:33 33 ************** 22 2016-09-13 01:34 32 ************* ... ..( 11 skipped). .. ************* 34 2016-09-13 01:46 32 ************* 35 2016-09-13 01:47 33 ************** ... ..( 11 skipped). .. ************** 47 2016-09-13 01:59 33 ************** 48 2016-09-13 02:00 32 ************* ... ..( 73 skipped). .. ************* 122 2016-09-13 03:14 32 ************* 123 2016-09-13 03:15 ? - 124 2016-09-13 03:16 23 **** 125 2016-09-13 03:17 24 ***** ... ..( 3 skipped). .. ***** 129 2016-09-13 03:21 24 ***** 130 2016-09-13 03:22 25 ****** 131 2016-09-13 03:23 25 ****** 132 2016-09-13 03:24 25 ****** 133 2016-09-13 03:25 26 ******* 134 2016-09-13 03:26 26 ******* 135 2016-09-13 03:27 26 ******* 136 2016-09-13 03:28 27 ******** ... ..( 7 skipped). .. ******** 144 2016-09-13 03:36 27 ******** 145 2016-09-13 03:37 28 ********* ... ..( 11 skipped). .. ********* 157 2016-09-13 03:49 28 ********* 158 2016-09-13 03:50 29 ********** ... ..( 9 skipped). .. ********** 168 2016-09-13 04:00 29 ********** 169 2016-09-13 04:01 30 *********** ... ..( 12 skipped). .. *********** 182 2016-09-13 04:14 30 *********** 183 2016-09-13 04:15 31 ************ ... ..( 40 skipped). .. ************ 224 2016-09-13 04:56 31 ************ 225 2016-09-13 04:57 ? - 226 2016-09-13 04:58 22 *** 227 2016-09-13 04:59 22 *** 228 2016-09-13 05:00 23 **** 229 2016-09-13 05:01 23 **** 230 2016-09-13 05:02 24 ***** 231 2016-09-13 05:03 24 ***** 232 2016-09-13 05:04 25 ****** ... ..( 3 skipped). .. ****** 236 2016-09-13 05:08 25 ****** 237 2016-09-13 05:09 26 ******* 238 2016-09-13 05:10 27 ******** ... ..( 6 skipped). .. ******** 245 2016-09-13 05:17 27 ******** 246 2016-09-13 05:18 28 ********* ... ..( 13 skipped). .. ********* 260 2016-09-13 05:32 28 ********* 261 2016-09-13 05:33 29 ********** ... ..( 11 skipped). .. ********** 273 2016-09-13 05:45 29 ********** 274 2016-09-13 05:46 30 *********** ... ..( 20 skipped). .. *********** 295 2016-09-13 06:07 30 *********** 296 2016-09-13 06:08 31 ************ ... ..( 92 skipped). .. ************ 389 2016-09-13 07:41 31 ************ 390 2016-09-13 07:42 ? - 391 2016-09-13 07:43 31 ************ ... ..( 9 skipped). .. 
************ 401 2016-09-13 07:53 31 ************ SCT Error Recovery Control command not supported Device Statistics (GP/SMART Log 0x04) not supported SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x0001 2 0 Command failed due to ICRC error 0x0002 2 0 R_ERR response for data FIS 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0005 2 0 R_ERR response for non-data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS 0x0008 2 0 Device-to-host non-data FIS retries 0x0009 2 3 Transition from drive PhyRdy to drive PhyNRdy 0x000a 2 3 Device-to-host register FISes sent due to a COMRESET 0x000b 2 0 CRC errors within host-to-device FIS 0x000f 2 0 R_ERR response for host-to-device data FIS, CRC 0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC 0x8000 4 1283 Vendor specific [root@lamachine ~]# smartctl -x /dev/sde smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build) Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Western Digital Green Device Model: WDC WD30EZRX-00D8PB0 Serial Number: WD-WCC4N1294906 LU WWN Device Id: 5 0014ee 25f968120 Firmware Version: 80.00A80 User Capacity: 3,000,591,900,160 bytes [3.00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2 (minor revision not indicated) SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Tue Sep 13 07:53:23 2016 BST SMART support is: Available - device has SMART capability. SMART support is: Enabled AAM feature is: Unavailable APM feature is: Unavailable Rd look-ahead is: Enabled Write cache is: Enabled ATA Security is: Disabled, NOT FROZEN [SEC1] Wt Cache Reorder: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: (43200) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 433) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x7035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported. 
SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE 1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0 3 Spin_Up_Time POS--K 175 175 021 - 6208 4 Start_Stop_Count -O--CK 100 100 000 - 40 5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0 7 Seek_Error_Rate -OSR-K 200 200 000 - 0 9 Power_On_Hours -O--CK 085 085 000 - 11141 10 Spin_Retry_Count -O--CK 100 253 000 - 0 11 Calibration_Retry_Count -O--CK 100 253 000 - 0 12 Power_Cycle_Count -O--CK 100 100 000 - 40 192 Power-Off_Retract_Count -O--CK 200 200 000 - 27 193 Load_Cycle_Count -O--CK 143 143 000 - 172837 194 Temperature_Celsius -O---K 123 113 000 - 27 196 Reallocated_Event_Count -O--CK 200 200 000 - 0 197 Current_Pending_Sector -O--CK 200 200 000 - 0 198 Offline_Uncorrectable ----CK 200 200 000 - 0 199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0 200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0 ||||||_ K auto-keep |||||__ C event count ||||___ R error rate |||____ S speed/performance ||_____ O updated online |______ P prefailure warning General Purpose Log Directory Version 1 SMART Log Directory Version 1 [multi-sector log support] Address Access R/W Size Description 0x00 GPL,SL R/O 1 Log Directory 0x01 SL R/O 1 Summary SMART error log 0x02 SL R/O 5 Comprehensive SMART error log 0x03 GPL R/O 6 Ext. Comprehensive SMART error log 0x06 SL R/O 1 SMART self-test log 0x07 GPL R/O 1 Extended self-test log 0x09 SL R/W 1 Selective self-test log 0x10 GPL R/O 1 SATA NCQ Queued Error log 0x11 GPL R/O 1 SATA Phy Event Counters log 0x80-0x9f GPL,SL R/W 16 Host vendor specific log 0xa0-0xa7 GPL,SL VS 16 Device vendor specific log 0xa8-0xb7 GPL,SL VS 1 Device vendor specific log 0xbd GPL,SL VS 1 Device vendor specific log 0xc0 GPL,SL VS 1 Device vendor specific log 0xc1 GPL VS 93 Device vendor specific log 0xe0 GPL,SL R/W 1 SCT Command/Status 0xe1 GPL,SL R/W 1 SCT Data Transfer SMART Extended Comprehensive Error Log Version: 1 (6 sectors) No Errors Logged SMART Extended Self-test Log Version: 1 (1 sectors) No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay. SCT Status Version: 3 SCT Version (vendor specific): 258 (0x0102) SCT Support Level: 1 Device State: Active (0) Current Temperature: 27 Celsius Power Cycle Min/Max Temperature: 23/27 Celsius Lifetime Min/Max Temperature: 16/37 Celsius Under/Over Temperature Limit Count: 0/0 Vendor specific: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 SCT Temperature History Version: 2 Temperature Sampling Period: 1 minute Temperature Logging Interval: 1 minute Min/Max recommended Temperature: 0/60 Celsius Min/Max Temperature Limit: -41/85 Celsius Temperature History Size (Index): 478 (145) Index Estimated Time Temperature Celsius 146 2016-09-12 23:56 ? - 147 2016-09-12 23:57 23 **** 148 2016-09-12 23:58 23 **** 149 2016-09-12 23:59 24 ***** ... ..( 3 skipped). .. 
***** 153 2016-09-13 00:03 24 ***** 154 2016-09-13 00:04 25 ****** 155 2016-09-13 00:05 25 ****** 156 2016-09-13 00:06 25 ****** 157 2016-09-13 00:07 26 ******* 158 2016-09-13 00:08 26 ******* 159 2016-09-13 00:09 26 ******* 160 2016-09-13 00:10 27 ******** ... ..( 8 skipped). .. ******** 169 2016-09-13 00:19 27 ******** 170 2016-09-13 00:20 28 ********* ... ..( 6 skipped). .. ********* 177 2016-09-13 00:27 28 ********* 178 2016-09-13 00:28 29 ********** ... ..( 5 skipped). .. ********** 184 2016-09-13 00:34 29 ********** 185 2016-09-13 00:35 30 *********** ... ..( 5 skipped). .. *********** 191 2016-09-13 00:41 30 *********** 192 2016-09-13 00:42 31 ************ ... ..( 25 skipped). .. ************ 218 2016-09-13 01:08 31 ************ 219 2016-09-13 01:09 32 ************* ... ..( 20 skipped). .. ************* 240 2016-09-13 01:30 32 ************* 241 2016-09-13 01:31 ? - 242 2016-09-13 01:32 33 ************** 243 2016-09-13 01:33 32 ************* ... ..( 35 skipped). .. ************* 279 2016-09-13 02:09 32 ************* 280 2016-09-13 02:10 33 ************** ... ..( 63 skipped). .. ************** 344 2016-09-13 03:14 33 ************** 345 2016-09-13 03:15 ? - 346 2016-09-13 03:16 23 **** 347 2016-09-13 03:17 24 ***** ... ..( 3 skipped). .. ***** 351 2016-09-13 03:21 24 ***** 352 2016-09-13 03:22 25 ****** 353 2016-09-13 03:23 25 ****** 354 2016-09-13 03:24 25 ****** 355 2016-09-13 03:25 26 ******* 356 2016-09-13 03:26 26 ******* 357 2016-09-13 03:27 26 ******* 358 2016-09-13 03:28 27 ******** ... ..( 5 skipped). .. ******** 364 2016-09-13 03:34 27 ******** 365 2016-09-13 03:35 28 ********* ... ..( 9 skipped). .. ********* 375 2016-09-13 03:45 28 ********* 376 2016-09-13 03:46 29 ********** ... ..( 5 skipped). .. ********** 382 2016-09-13 03:52 29 ********** 383 2016-09-13 03:53 30 *********** ... ..( 7 skipped). .. *********** 391 2016-09-13 04:01 30 *********** 392 2016-09-13 04:02 31 ************ ... ..( 28 skipped). .. ************ 421 2016-09-13 04:31 31 ************ 422 2016-09-13 04:32 32 ************* ... ..( 23 skipped). .. ************* 446 2016-09-13 04:56 32 ************* 447 2016-09-13 04:57 ? - 448 2016-09-13 04:58 22 *** 449 2016-09-13 04:59 22 *** 450 2016-09-13 05:00 23 **** 451 2016-09-13 05:01 23 **** 452 2016-09-13 05:02 24 ***** 453 2016-09-13 05:03 24 ***** 454 2016-09-13 05:04 25 ****** ... ..( 3 skipped). .. ****** 458 2016-09-13 05:08 25 ****** 459 2016-09-13 05:09 26 ******* 460 2016-09-13 05:10 27 ******** 461 2016-09-13 05:11 28 ********* ... ..( 12 skipped). .. ********* 474 2016-09-13 05:24 28 ********* 475 2016-09-13 05:25 29 ********** ... ..( 6 skipped). .. ********** 4 2016-09-13 05:32 29 ********** 5 2016-09-13 05:33 30 *********** ... ..( 8 skipped). .. *********** 14 2016-09-13 05:42 30 *********** 15 2016-09-13 05:43 31 ************ ... ..( 41 skipped). .. ************ 57 2016-09-13 06:25 31 ************ 58 2016-09-13 06:26 32 ************* ... ..( 74 skipped). .. ************* 133 2016-09-13 07:41 32 ************* 134 2016-09-13 07:42 ? - 135 2016-09-13 07:43 32 ************* ... ..( 9 skipped). .. 
************* 145 2016-09-13 07:53 32 ************* SCT Error Recovery Control command not supported Device Statistics (GP/SMART Log 0x04) not supported SATA Phy Event Counters (GP Log 0x11) ID Size Value Description 0x0001 2 0 Command failed due to ICRC error 0x0002 2 0 R_ERR response for data FIS 0x0003 2 0 R_ERR response for device-to-host data FIS 0x0004 2 0 R_ERR response for host-to-device data FIS 0x0005 2 0 R_ERR response for non-data FIS 0x0006 2 0 R_ERR response for device-to-host non-data FIS 0x0007 2 0 R_ERR response for host-to-device non-data FIS 0x0008 2 0 Device-to-host non-data FIS retries 0x0009 2 3 Transition from drive PhyRdy to drive PhyNRdy 0x000a 2 3 Device-to-host register FISes sent due to a COMRESET 0x000b 2 0 CRC errors within host-to-device FIS 0x000f 2 0 R_ERR response for host-to-device data FIS, CRC 0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC 0x8000 4 1285 Vendor specific [root@lamachine ~]# > And looking at the lsdrv stuff, were you using LVM? That's got a load of > references to LV. yes I was using LVM on top of raid Thanks for the help so far guys! ^ permalink raw reply [flat|nested] 37+ messages in thread
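(For reference, since the missing volumes sit on LVM on top of md: once the inactive arrays are assembled again, the logical volumes can be looked for without writing anything. A minimal sketch, assuming only the stock LVM2 tools; the VG name is a placeholder, not taken from this thread.)

pvscan                  # look for LVM physical volumes, including ones on md devices
vgscan                  # rescan for volume groups
lvs -a -o +devices      # list LVs and which PV/md device backs each of them
vgchange -ay <vgname>   # activate a VG only once its PV is present and healthy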
* Re: Inactive arrays 2016-09-13 6:56 ` Daniel Sanabria @ 2016-09-13 7:02 ` Adam Goryachev 2016-09-13 15:20 ` Chris Murphy 1 sibling, 0 replies; 37+ messages in thread From: Adam Goryachev @ 2016-09-13 7:02 UTC (permalink / raw) To: Daniel Sanabria, Wols Lists; +Cc: Linux-RAID On 13/09/16 16:56, Daniel Sanabria wrote: >> What does smartctl report on sdc and sde (I think you want smartctl -x, >> the extended "display everything" command)? > [root@lamachine ~]# smartctl -x /dev/{sdc,sdd,sde} > smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build) > Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org > > ERROR: smartctl takes ONE device name as the final command-line argument. > You have provided 3 device names: > /dev/sdc > /dev/sdd > /dev/sde > > Use smartctl -h to get a usage summary > > [root@lamachine ~]# smartctl -x /dev/sdc > smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build) > Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org > > === START OF INFORMATION SECTION === > Model Family: Western Digital Green Make sure you have read about linux raid and SCT/ERC and fixed the timeout issue for these drives. Do that immediately, before you bother trying to recover anything, or do anything else (or else recovery will probably fail, things will get worse, the sky will fall in...) > Device Model: WDC WD30EZRX-00D8PB0 > Serial Number: WD-WCC4NCWT13RF > LU WWN Device Id: 5 0014ee 25fc9e460 > Firmware Version: 80.00A80 > User Capacity: 3,000,591,900,160 bytes [3.00 TB] > Sector Sizes: 512 bytes logical, 4096 bytes physical > Rotation Rate: 5400 rpm > Device is: In smartctl database [for details use: -P show] > ATA Version is: ACS-2 (minor revision not indicated) > SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) > Local Time is: Tue Sep 13 07:53:18 2016 BST > SMART support is: Available - device has SMART capability. > SMART support is: Enabled > AAM feature is: Unavailable > APM feature is: Unavailable > Rd look-ahead is: Enabled > Write cache is: Enabled > ATA Security is: Disabled, NOT FROZEN [SEC1] > Wt Cache Reorder: Enabled > > 193 Load_Cycle_Count -O--CK 142 142 000 - 175500 I recall there is some tool or setting for the drive that will stop it from "parking" every 30 seconds, you should read up on that and see if you can prevent this. This will slow the drive down every time it needs to "restart" the drive to read/write after a short period of inactivity. Once you fix those two issues, (which might be related to the cause of your problem), then someone else with more detailed knowledge can advise on the best next step. Regards, Adam -- -- Adam Goryachev Website Managers www.websitemanagers.com.au ^ permalink raw reply [flat|nested] 37+ messages in thread
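(A minimal sketch of the two checks described above, assuming smartmontools plus the third-party idle3-tools package for the WD head-parking timer; /dev/sdc is just an example device.)

smartctl -l scterc /dev/sdc          # does the drive support SCT ERC, and what is it set to?
smartctl -l scterc,70,70 /dev/sdc    # try to set 7s read/write recovery limits; drives without
                                     # SCT ERC (as reported for these Greens) will refuse this
idle3ctl -g /dev/sdc                 # read the WD idle3 (8-second head parking) timer
idle3ctl -d /dev/sdc                 # disable it; only takes effect after the drive is power-cycled

Whether idle3ctl works on this exact model is not confirmed in the thread; WD's own wdidle3 utility is the vendor alternative.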
* Re: Inactive arrays 2016-09-13 6:56 ` Daniel Sanabria 2016-09-13 7:02 ` Adam Goryachev @ 2016-09-13 15:20 ` Chris Murphy 2016-09-13 19:43 ` Daniel Sanabria 1 sibling, 1 reply; 37+ messages in thread From: Chris Murphy @ 2016-09-13 15:20 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Wols Lists, Linux-RAID On Tue, Sep 13, 2016 at 12:56 AM, Daniel Sanabria <sanabria.d@gmail.com> wrote: > > [root@lamachine ~]# smartctl -x /dev/sdc > smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build) > Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org > > === START OF INFORMATION SECTION === > Model Family: Western Digital Green > Device Model: WDC WD30EZRX-00D8PB0 > Serial Number: WD-WCC4NCWT13RF > SCT Error Recovery Control command not supported This is a problem. What do you get for cat /sys/block/sdc/device/timeout > > [root@lamachine ~]# smartctl -x /dev/sdd > smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build) > Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org > > === START OF INFORMATION SECTION === > Model Family: Western Digital Green > Device Model: WDC WD30EZRX-00D8PB0 > Serial Number: WD-WCC4NPRDD6D7 > > SCT Error Recovery Control command not supported Same for sdd. > [root@lamachine ~]# smartctl -x /dev/sde > smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.3.3-303.fc23.x86_64] (local build) > Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org > > === START OF INFORMATION SECTION === > Model Family: Western Digital Green > Device Model: WDC WD30EZRX-00D8PB0 > Serial Number: WD-WCC4N1294906 > SCT Error Recovery Control command not supported And sde. But otherwise no other issues. Note that WDC Greens are explicitly disqualified for use in RAID of any level by the manufacturer. We can complain til the cows come home but the reality is the manufacturer does not stand behind this product at all in RAID configurations. Anyone specifically familiar with WDC Greens, and if the lack of SCT ERC can be worked around in the usual way by increasing the SCSI command timer value? Or is there also something else? I vaguely recall something about drive spin down that can also cause issues, does that need mitigation? If no one chimes in, this information is in the archives, just search for 'WDC green' and you'll get an shittonne of results. OK so the next thing I want to see is why you're getting these messages from parted when you check sdc and sde for partition maps. At the time you do this, what do you see in kernel messages? Maybe best to just stick the entire dmesg for the current boot up somewhere like fpaste.org or equivalent. -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
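(One way to correlate the parted complaints with kernel messages, as asked above; a sketch only, and the device name is an example.)

dmesg -w                        # terminal 1: follow kernel messages as they arrive
parted /dev/sdc unit s print    # terminal 2: re-run the probe that produced the error
dmesg | tail -n 50              # or just grab the most recent messages afterwards
dmesg | fpaste                  # Fedora's paste helper, if installed, to share the full log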
* Re: Inactive arrays 2016-09-13 15:20 ` Chris Murphy @ 2016-09-13 19:43 ` Daniel Sanabria 2016-09-13 19:52 ` Chris Murphy 0 siblings, 1 reply; 37+ messages in thread From: Daniel Sanabria @ 2016-09-13 19:43 UTC (permalink / raw) To: Chris Murphy; +Cc: Wols Lists, Linux-RAID > This is a problem. What do you get for > > cat /sys/block/sdc/device/timeout [root@lamachine ~]# cat /sys/block/sdc/device/timeout 30 [root@lamachine ~]# cat /sys/block/sdd/device/timeout 30 [root@lamachine ~]# cat /sys/block/sde/device/timeout 30 [root@lamachine ~]# > Anyone specifically familiar with WDC Greens, and if the lack of SCT > ERC can be worked around in the usual way by increasing the SCSI > command timer value? Or is there also something else? I vaguely recall > something about drive spin down that can also cause issues, does that > need mitigation? If no one chimes in, this information is in the > archives, just search for 'WDC green' and you'll get an shittonne of > results. In another thread I found Phil Turmel recommending to change the timeout value like this: for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done Is that what you guys are talking about when mentioning the SCT/ERC issues? > OK so the next thing I want to see is why you're getting these > messages from parted when you check sdc and sde for partition maps. At > the time you do this, what do you see in kernel messages? Maybe best > to just stick the entire dmesg for the current boot up somewhere like > fpaste.org or equivalent. https://paste.fedoraproject.org/427719/37952531/ ^ permalink raw reply [flat|nested] 37+ messages in thread
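(The loop quoted above only lasts until the next reboot. A persistent variant, as a sketch: drop a udev rule like the following into /etc/udev/rules.d/ - the file name and the 180-second value are illustrative - then reboot or re-trigger udev.)

# /etc/udev/rules.d/60-raid-timeout.rules (example path)
# Raise the SCSI command timer on whole disks (sda..sdz) so that slow in-drive
# error recovery on desktop drives does not end in link resets.
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{device/timeout}="180"

# apply without rebooting:
udevadm control --reload
udevadm trigger --subsystem-match=block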
* Re: Inactive arrays 2016-09-13 19:43 ` Daniel Sanabria @ 2016-09-13 19:52 ` Chris Murphy 2016-09-13 20:04 ` Daniel Sanabria 0 siblings, 1 reply; 37+ messages in thread From: Chris Murphy @ 2016-09-13 19:52 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID On Tue, Sep 13, 2016 at 1:43 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote: >> This is a problem. What do you get for >> >> cat /sys/block/sdc/device/timeout > > [root@lamachine ~]# cat /sys/block/sdc/device/timeout > 30 > [root@lamachine ~]# cat /sys/block/sdd/device/timeout > 30 > [root@lamachine ~]# cat /sys/block/sde/device/timeout > 30 > [root@lamachine ~]# Common and often fatal misconfiguration. Since the drives don't support SCT ERC, the command timer needs to be changed to something higher. Without the benefit of historical kernel messages, it's unclear if there have been any link resets that'd indicate improper correction for bad sectors on the drives. > >> Anyone specifically familiar with WDC Greens, and if the lack of SCT >> ERC can be worked around in the usual way by increasing the SCSI >> command timer value? Or is there also something else? I vaguely recall >> something about drive spin down that can also cause issues, does that >> need mitigation? If no one chimes in, this information is in the >> archives, just search for 'WDC green' and you'll get an shittonne of >> results. > > In another thread I found Phil Turmel recommending to change the > timeout value like this: > > for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done > > Is that what you guys are talking about when mentioning the SCT/ERC issues? Yes. You should do that. > >> OK so the next thing I want to see is why you're getting these >> messages from parted when you check sdc and sde for partition maps. At >> the time you do this, what do you see in kernel messages? Maybe best >> to just stick the entire dmesg for the current boot up somewhere like >> fpaste.org or equivalent. > > https://paste.fedoraproject.org/427719/37952531/ Yeah that looks like a recent boot; if that's a boot where you'd run parted and got those errors on read, then I don't have a good explanation why you're getting parted errors that don't have matching kernel messages, i.e. something from libata about the drive not liking the command or not properly reading from the drive, etc. What do you get for gdisk -l <dev> for each of these drives? -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
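(If the systemd journal is persistent - i.e. /var/log/journal exists - the historical kernel messages Chris mentions can still be checked for link resets from earlier boots; the grep pattern below is only an example.)

journalctl --list-boots        # which earlier boots the journal still holds
journalctl -k -b -1 | grep -iE 'ata[0-9]+.*(reset|timed out|failed command)'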
* Re: Inactive arrays 2016-09-13 19:52 ` Chris Murphy @ 2016-09-13 20:04 ` Daniel Sanabria 2016-09-13 20:13 ` Chris Murphy 2016-09-14 4:33 ` Chris Murphy 0 siblings, 2 replies; 37+ messages in thread From: Daniel Sanabria @ 2016-09-13 20:04 UTC (permalink / raw) To: Chris Murphy; +Cc: Wols Lists, Linux-RAID > Yeah that looks like a recent boot; if that's a boot where you'd run > parted and got those errors on read, then I don't have a good > explanation why you're getting parted errors that don't have matching > kernel messages, i.e. something from libata about the drive not liking > the command or not properly reading from the drive, etc. let me see if I can find something > What do you get for gdisk -l <dev> for each of these drives? [root@lamachine ~]# gdisk -l /dev/sdc GPT fdisk (gdisk) version 1.0.1 Warning! Disk size is smaller than the main header indicates! Loading secondary header from the last sector of the disk! You should use 'v' to verify disk integrity, and perhaps options on the experts' menu to repair the disk. Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Warning! One or more CRCs don't match. You should repair the disk! Partition table scan: MBR: protective BSD: not present APM: not present GPT: damaged **************************************************************************** Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk verification and recovery are STRONGLY recommended. **************************************************************************** Disk /dev/sdc: 5860531055 sectors, 2.7 TiB Logical sector size: 512 bytes Disk identifier (GUID): 6DB70F4E-D8ED-4290-AA2E-4E81D8324992 Partition table holds up to 128 entries First usable sector is 2048, last usable sector is 5860533134 Partitions will be aligned on 2048-sector boundaries Total free space is 516987791 sectors (246.5 GiB) Number Start (sector) End (sector) Size Code Name 1 2048 4294969343 2.0 TiB FD00 2 4294969344 5343545343 500.0 GiB 8300 [root@lamachine ~]# gdisk -l /dev/sdd GPT fdisk (gdisk) version 1.0.1 Partition table scan: MBR: protective BSD: not present APM: not present GPT: present Found valid GPT with protective MBR; using GPT. Disk /dev/sdd: 5860533168 sectors, 2.7 TiB Logical sector size: 512 bytes Disk identifier (GUID): D3233810-F552-4126-8281-7F71A4938DF9 Partition table holds up to 128 entries First usable sector is 2048, last usable sector is 5860533134 Partitions will be aligned on 2048-sector boundaries Total free space is 516987791 sectors (246.5 GiB) Number Start (sector) End (sector) Size Code Name 1 2048 4294969343 2.0 TiB FD00 2 4294969344 5343545343 500.0 GiB 8300 [root@lamachine ~]# gdisk -l /dev/sde GPT fdisk (gdisk) version 1.0.1 Warning! Disk size is smaller than the main header indicates! Loading secondary header from the last sector of the disk! You should use 'v' to verify disk integrity, and perhaps options on the experts' menu to repair the disk. Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Warning! One or more CRCs don't match. You should repair the disk! Partition table scan: MBR: protective BSD: not present APM: not present GPT: damaged **************************************************************************** Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk verification and recovery are STRONGLY recommended. 
**************************************************************************** Disk /dev/sde: 5860531055 sectors, 2.7 TiB Logical sector size: 512 bytes Disk identifier (GUID): B64DAA7C-E1D8-4E8A-A5C8-76001DAE6B30 Partition table holds up to 128 entries First usable sector is 2048, last usable sector is 5860533134 Partitions will be aligned on 2048-sector boundaries Total free space is 516987791 sectors (246.5 GiB) Number Start (sector) End (sector) Size Code Name 1 2048 4294969343 2.0 TiB FD00 2 4294969344 5343545343 500.0 GiB 8300 [root@lamachine ~]# ^ permalink raw reply [flat|nested] 37+ messages in thread
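(Before anything is repaired, the GPT damage reported on sdc and sde can be enumerated read-only; a sketch assuming the gptfdisk package, which provides both gdisk and sgdisk.)

sgdisk -v /dev/sdc    # verify-only pass: reports header/CRC problems, writes nothing
sgdisk -v /dev/sde

Actually rewriting the backup headers is better left until it is clear why gdisk thinks these disks are smaller than their main headers indicate, since repairing a GPT against a misreported disk size could make matters worse.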
* Re: Inactive arrays 2016-09-13 20:04 ` Daniel Sanabria @ 2016-09-13 20:13 ` Chris Murphy 2016-09-13 20:29 ` Daniel Sanabria 2016-09-13 20:36 ` Daniel Sanabria 2016-09-14 4:33 ` Chris Murphy 1 sibling, 2 replies; 37+ messages in thread From: Chris Murphy @ 2016-09-13 20:13 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID An invalid backup GPT suggests it was stepped on by something that was used on the whole block device. The backup GPT is at the end of the drive. And if you were to use mdadm create on the entire drive rather than a partition, you'd step on that GPT and also incorrectly recreate the array. Have you told us the entire story about how you got into this situation? Have you use 'mdadm create' trying to fix this? If you haven't, don't do it. I see a lot of conflicting information. For example: > /dev/md129: > Version : 1.2 > Creation Time : Mon Nov 10 16:28:11 2014 > Raid Level : raid0 > Array Size : 1572470784 (1499.63 GiB 1610.21 GB) > Raid Devices : 3 > Total Devices : 3 > Persistence : Superblock is persistent > > Update Time : Mon Nov 10 16:28:11 2014 > State : clean > Active Devices : 3 > Working Devices : 3 > Failed Devices : 0 > Spare Devices : 0 > > Chunk Size : 512K > > Name : lamachine:129 (local to host lamachine) > UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a > Events : 0 > > Number Major Minor RaidDevice State > 0 8 50 0 active sync /dev/sdd2 > 1 8 66 1 active sync /dev/sde2 > 2 8 82 2 active sync /dev/sdf >> /dev/md129: >> Version : 1.2 >> Raid Level : raid0 >> Total Devices : 1 >> Persistence : Superblock is persistent >> >> State : inactive >> >> Name : lamachine:129 (local to host lamachine) >> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a >> Events : 0 >> >> Number Major Minor RaidDevice >> >> - 8 50 - /dev/sdd2 The same md device, one raid0 one raid5. The same sdd2, one in the raid0, and it's also in the raid5. Which is true? It sounds to me like you've tried recovery and did something wrong; or about as bad is you've had these drives in more than one software raid setup, and you didn't zero out old superblocks first. If you leave old signatures intact you end up with this sort of ambiguity, which signature is correct. So now you have to figure out which one is correct and which one is wrong... Maybe start out with 'mdadm -D' on everything... literally everything, every whole drive (i.e. /dev/sdd, /dev/sdc, all of them) and also everyone of their partitions; and see if it's possible to sort out this mess. Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
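(One read-only way to see every signature still present on each device, which bears directly on the stale-superblock question above: wipefs with no options only lists what it finds, it does not erase anything.)

wipefs /dev/sdc /dev/sdd /dev/sde    # signatures on the whole disks
wipefs /dev/sdd1 /dev/sdd2           # and on the partitions of interest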
* Re: Inactive arrays 2016-09-13 20:13 ` Chris Murphy @ 2016-09-13 20:29 ` Daniel Sanabria 0 siblings, 0 replies; 37+ messages in thread From: Daniel Sanabria @ 2016-09-13 20:29 UTC (permalink / raw) To: Chris Murphy; +Cc: Wols Lists, Linux-RAID Thanks for the help Chris,
> Have you told us the entire story about how you got into > this situation? I think I have, but I can see how it can be confusing since I have provided some information that wasn't requested - including old records from when the arrays were working (more on that below). Basically the system was moved, which meant it was offline for a few days; on the first boot after the move I ended up with md128 and md129 inactive.
> Have you use 'mdadm create' trying to fix this? If you > haven't, don't do it. I haven't
> I see a lot of conflicting information. For example: > >> /dev/md129: >> Version : 1.2 >> Creation Time : Mon Nov 10 16:28:11 2014 >> Raid Level : raid0 >> Array Size : 1572470784 (1499.63 GiB 1610.21 GB) >> Raid Devices : 3 >> Total Devices : 3 >> Persistence : Superblock is persistent >> >> Update Time : Mon Nov 10 16:28:11 2014 >> State : clean >> Active Devices : 3 >> Working Devices : 3 >> Failed Devices : 0 >> Spare Devices : 0 >> >> Chunk Size : 512K >> >> Name : lamachine:129 (local to host lamachine) >> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a >> Events : 0 >> >> Number Major Minor RaidDevice State >> 0 8 50 0 active sync /dev/sdd2 >> 1 8 66 1 active sync /dev/sde2 >> 2 8 82 2 active sync /dev/sdf > > > >>> /dev/md129: >>> Version : 1.2 >>> Raid Level : raid0 >>> Total Devices : 1 >>> Persistence : Superblock is persistent >>> >>> State : inactive >>> >>> Name : lamachine:129 (local to host lamachine) >>> UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a >>> Events : 0 >>> >>> Number Major Minor RaidDevice >>> >>> - 8 50 - /dev/sdd2 > > > The same md device, one raid0 one raid5. The same sdd2, one in the > raid0, and it's also in the raid5. Which is true? So the first record for /dev/md129 is from the time the array was working OK and the second is the current status; I think both records show Raid Level: raid0.
> It sounds to me like > you've tried recovery and did something wrong; or about as bad is > you've had these drives in more than one software raid setup, and you > didn't zero out old superblocks first. The only thing that comes to mind is that at first the system wasn't coming up, so I tried to boot from individual drives while trying to locate the boot device.
> Maybe start out with 'mdadm -D' on everything... literally everything, > every whole drive (i.e. /dev/sdd, /dev/sdc, all of them) and also > everyone of their partitions; and see if it's possible to sort out > this mess. Will run on devices "a to f"
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-13 20:13 ` Chris Murphy 2016-09-13 20:29 ` Daniel Sanabria @ 2016-09-13 20:36 ` Daniel Sanabria 2016-09-13 21:10 ` Chris Murphy 2016-09-13 21:26 ` Wols Lists 1 sibling, 2 replies; 37+ messages in thread From: Daniel Sanabria @ 2016-09-13 20:36 UTC (permalink / raw) To: Chris Murphy; +Cc: Wols Lists, Linux-RAID > Maybe start out with 'mdadm -D' on everything... literally everything, > every whole drive (i.e. /dev/sdd, /dev/sdc, all of them) and also > everyone of their partitions; and see if it's possible to sort out > this mess. [root@lamachine ~]# mdadm -D /dev/md* /dev/md126: Version : 0.90 Creation Time : Thu Dec 3 22:12:12 2009 Raid Level : raid10 Array Size : 30719936 (29.30 GiB 31.46 GB) Used Dev Size : 30719936 (29.30 GiB 31.46 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 126 Persistence : Superblock is persistent Update Time : Tue Sep 13 21:33:13 2016 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Layout : near=2 Chunk Size : 64K UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 Events : 0.264152 Number Major Minor RaidDevice State 0 8 82 0 active sync set-A /dev/sdf2 1 8 1 1 active sync set-B /dev/sda1 /dev/md127: Version : 1.2 Creation Time : Tue Jul 26 19:00:28 2011 Raid Level : raid0 Array Size : 94367232 (90.00 GiB 96.63 GB) Raid Devices : 3 Total Devices : 3 Persistence : Superblock is persistent Update Time : Tue Jul 26 19:00:28 2011 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Chunk Size : 512K Name : reading.homeunix.com:3 UUID : acd5374f:72628c93:6a906c4b:5f675ce5 Events : 0 Number Major Minor RaidDevice State 0 8 85 0 active sync /dev/sdf5 1 8 21 1 active sync /dev/sdb5 2 8 5 2 active sync /dev/sda5 /dev/md128: Version : 1.2 Raid Level : raid0 Total Devices : 1 Persistence : Superblock is persistent State : inactive Name : lamachine:128 (local to host lamachine) UUID : f2372cb9:d3816fd6:ce86d826:882ec82e Events : 4154 Number Major Minor RaidDevice - 8 49 - /dev/sdd1 /dev/md129: Version : 1.2 Raid Level : raid0 Total Devices : 1 Persistence : Superblock is persistent State : inactive Name : lamachine:129 (local to host lamachine) UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a Events : 0 Number Major Minor RaidDevice - 8 50 - /dev/sdd2 /dev/md2: Version : 0.90 Creation Time : Mon Feb 11 07:54:36 2013 Raid Level : raid5 Array Size : 511999872 (488.28 GiB 524.29 GB) Used Dev Size : 255999936 (244.14 GiB 262.14 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Tue Sep 13 20:29:02 2016 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 64K UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) Events : 0.611 Number Major Minor RaidDevice State 0 8 83 0 active sync /dev/sdf3 1 8 18 1 active sync /dev/sdb2 2 8 2 2 active sync /dev/sda2 [root@lamachine ~]# mdadm -D /dev/sd* mdadm: /dev/sda does not appear to be an md device mdadm: /dev/sda1 does not appear to be an md device mdadm: /dev/sda2 does not appear to be an md device mdadm: /dev/sda3 does not appear to be an md device mdadm: /dev/sda5 does not appear to be an md device mdadm: /dev/sda6 does not appear to be an md device mdadm: /dev/sdb does not appear to be an md device mdadm: /dev/sdb2 does not appear to be an md device mdadm: /dev/sdb3 does not appear to be an md device mdadm: /dev/sdb4 does not appear to be an md device mdadm: /dev/sdb5 does not appear 
to be an md device mdadm: /dev/sdb6 does not appear to be an md device mdadm: /dev/sdc does not appear to be an md device mdadm: /dev/sdd does not appear to be an md device mdadm: /dev/sdd1 does not appear to be an md device mdadm: /dev/sdd2 does not appear to be an md device mdadm: /dev/sde does not appear to be an md device mdadm: /dev/sdf does not appear to be an md device mdadm: /dev/sdf1 does not appear to be an md device mdadm: /dev/sdf2 does not appear to be an md device mdadm: /dev/sdf3 does not appear to be an md device mdadm: /dev/sdf4 does not appear to be an md device mdadm: /dev/sdf5 does not appear to be an md device mdadm: /dev/sdf6 does not appear to be an md device [root@lamachine ~]# ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-13 20:36 ` Daniel Sanabria @ 2016-09-13 21:10 ` Chris Murphy 2016-09-13 21:46 ` Daniel Sanabria 2016-09-13 21:26 ` Wols Lists 1 sibling, 1 reply; 37+ messages in thread From: Chris Murphy @ 2016-09-13 21:10 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID On Tue, Sep 13, 2016 at 2:36 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote: >> Maybe start out with 'mdadm -D' on everything... literally everything, >> every whole drive (i.e. /dev/sdd, /dev/sdc, all of them) Sorry, mdadm -E on the individual drives and their partitions. I'm curious what metadata it finds on each if any. -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
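(A compact way to gather that in one go - a sketch; the globs assume the six drives really are sda through sdf as earlier in the thread.)

for d in /dev/sd[a-f] /dev/sd[a-f][0-9]*; do
    echo "===== $d"
    mdadm --examine "$d"
done > mdadm-examine.txt 2>&1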
* Re: Inactive arrays 2016-09-13 21:10 ` Chris Murphy @ 2016-09-13 21:46 ` Daniel Sanabria 0 siblings, 0 replies; 37+ messages in thread From: Daniel Sanabria @ 2016-09-13 21:46 UTC (permalink / raw) To: Chris Murphy; +Cc: Wols Lists, Linux-RAID > Sorry, mdadm -E on the individual drives and their partitions. I'm > curious what metadata it finds on each if any. [root@lamachine ~]# mdadm -E /dev/sd* /dev/sda: MBR Magic : aa55 Partition[0] : 61440000 sectors at 63 (type fd) Partition[1] : 512000000 sectors at 61440063 (type fd) Partition[2] : 403328002 sectors at 573440063 (type 05) /dev/sda1: Magic : a92b4efc Version : 0.90.00 UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 Creation Time : Thu Dec 3 22:12:12 2009 Raid Level : raid10 Used Dev Size : 30719936 (29.30 GiB 31.46 GB) Array Size : 30719936 (29.30 GiB 31.46 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 126 Update Time : Tue Sep 13 22:43:06 2016 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : ed981d35 - correct Events : 264152 Layout : near=2 Chunk Size : 64K Number Major Minor RaidDevice State this 1 8 1 1 active sync /dev/sda1 0 0 8 82 0 active sync /dev/sdf2 1 1 8 1 1 active sync /dev/sda1 /dev/sda2: Magic : a92b4efc Version : 0.90.00 UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) Creation Time : Mon Feb 11 07:54:36 2013 Raid Level : raid5 Used Dev Size : 255999936 (244.14 GiB 262.14 GB) Array Size : 511999872 (488.28 GiB 524.29 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 2 Update Time : Tue Sep 13 20:29:02 2016 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Checksum : 73b16d69 - correct Events : 611 Layout : left-symmetric Chunk Size : 64K Number Major Minor RaidDevice State this 2 8 2 2 active sync /dev/sda2 0 0 8 83 0 active sync /dev/sdf3 1 1 8 18 1 active sync /dev/sdb2 2 2 8 2 2 active sync /dev/sda2 /dev/sda3: MBR Magic : aa55 Partition[0] : 62910589 sectors at 63 (type 83) Partition[1] : 7116795 sectors at 82445692 (type 05) /dev/sda5: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : acd5374f:72628c93:6a906c4b:5f675ce5 Name : reading.homeunix.com:3 Creation Time : Tue Jul 26 19:00:28 2011 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 62908541 (30.00 GiB 32.21 GB) Data Offset : 2048 sectors Super Offset : 8 sectors Unused Space : before=1968 sectors, after=0 sectors State : clean Device UUID : a0efc1b3:94cc6eb8:deea76ca:772b2d2d Update Time : Tue Jul 26 19:00:28 2011 Checksum : 9eba9119 - correct Events : 0 Chunk Size : 512K Device Role : Active device 2 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) mdadm: No md superblock detected on /dev/sda6. 
/dev/sdb: MBR Magic : aa55 Partition[1] : 512000000 sectors at 409663 (type fd) Partition[2] : 16384000 sectors at 512409663 (type 82) Partition[3] : 447974402 sectors at 528793663 (type 05) /dev/sdb2: Magic : a92b4efc Version : 0.90.00 UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) Creation Time : Mon Feb 11 07:54:36 2013 Raid Level : raid5 Used Dev Size : 255999936 (244.14 GiB 262.14 GB) Array Size : 511999872 (488.28 GiB 524.29 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 2 Update Time : Tue Sep 13 20:29:02 2016 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Checksum : 73b16d77 - correct Events : 611 Layout : left-symmetric Chunk Size : 64K Number Major Minor RaidDevice State this 1 8 18 1 active sync /dev/sdb2 0 0 8 83 0 active sync /dev/sdf3 1 1 8 18 1 active sync /dev/sdb2 2 2 8 2 2 active sync /dev/sda2 mdadm: No md superblock detected on /dev/sdb3. /dev/sdb4: MBR Magic : aa55 Partition[0] : 62912354 sectors at 63 (type 83) Partition[1] : 7116795 sectors at 82447457 (type 05) /dev/sdb5: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : acd5374f:72628c93:6a906c4b:5f675ce5 Name : reading.homeunix.com:3 Creation Time : Tue Jul 26 19:00:28 2011 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 62910306 (30.00 GiB 32.21 GB) Data Offset : 2048 sectors Super Offset : 8 sectors Unused Space : before=1968 sectors, after=0 sectors State : clean Device UUID : 152d0202:64efb3e7:f23658c3:82a239a1 Update Time : Tue Jul 26 19:00:28 2011 Checksum : 892dbb61 - correct Events : 0 Chunk Size : 512K Device Role : Active device 1 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) mdadm: No md superblock detected on /dev/sdb6. /dev/sdc: MBR Magic : aa55 Partition[0] : 4294967295 sectors at 1 (type ee) /dev/sdd: MBR Magic : aa55 Partition[0] : 4294967295 sectors at 1 (type ee) /dev/sdd1: Magic : a92b4efc Version : 1.2 Feature Map : 0x1 Array UUID : f2372cb9:d3816fd6:ce86d826:882ec82e Name : lamachine:128 (local to host lamachine) Creation Time : Fri Oct 24 15:24:38 2014 Raid Level : raid5 Raid Devices : 3 Avail Dev Size : 4294705152 (2047.88 GiB 2198.89 GB) Array Size : 4294705152 (4095.75 GiB 4397.78 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : clean Device UUID : 1f652d4f:92fccd8e:b439abf2:76b881e1 Internal Bitmap : 8 sectors from superblock Update Time : Thu Feb 4 18:55:34 2016 Bad Block Log : 512 entries available at offset 72 sectors Checksum : ed602b13 - correct Events : 4154 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 2 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) /dev/sdd2: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a Name : lamachine:129 (local to host lamachine) Creation Time : Mon Nov 10 16:28:11 2014 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 1048313856 (499.88 GiB 536.74 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : clean Device UUID : 562dd382:5ccc00aa:449ea7e4:d8b266c2 Update Time : Mon Nov 10 16:28:11 2014 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 937158c1 - correct Events : 0 Chunk Size : 512K Device Role : Active device 2 Array State : AAA ('A' == active, '.' 
== missing, 'R' == replacing) /dev/sde: MBR Magic : aa55 Partition[0] : 4294967295 sectors at 1 (type ee) /dev/sdf: MBR Magic : aa55 Partition[0] : 407552 sectors at 2048 (type 83) Partition[1] : 61440000 sectors at 409663 (type fd) Partition[2] : 512000000 sectors at 61849663 (type fd) Partition[3] : 402918402 sectors at 573849663 (type 05) mdadm: No md superblock detected on /dev/sdf1. /dev/sdf2: Magic : a92b4efc Version : 0.90.00 UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 Creation Time : Thu Dec 3 22:12:12 2009 Raid Level : raid10 Used Dev Size : 30719936 (29.30 GiB 31.46 GB) Array Size : 30719936 (29.30 GiB 31.46 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 126 Update Time : Tue Sep 13 22:43:06 2016 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : ed981d84 - correct Events : 264152 Layout : near=2 Chunk Size : 64K Number Major Minor RaidDevice State this 0 8 82 0 active sync /dev/sdf2 0 0 8 82 0 active sync /dev/sdf2 1 1 8 1 1 active sync /dev/sda1 /dev/sdf3: Magic : a92b4efc Version : 0.90.00 UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) Creation Time : Mon Feb 11 07:54:36 2013 Raid Level : raid5 Used Dev Size : 255999936 (244.14 GiB 262.14 GB) Array Size : 511999872 (488.28 GiB 524.29 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 2 Update Time : Tue Sep 13 20:29:02 2016 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Checksum : 73b16db6 - correct Events : 611 Layout : left-symmetric Chunk Size : 64K Number Major Minor RaidDevice State this 0 8 83 0 active sync /dev/sdf3 0 0 8 83 0 active sync /dev/sdf3 1 1 8 18 1 active sync /dev/sdb2 2 2 8 2 2 active sync /dev/sda2 /dev/sdf4: MBR Magic : aa55 Partition[0] : 62918679 sectors at 63 (type 83) Partition[1] : 7116795 sectors at 82453782 (type 05) /dev/sdf5: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : acd5374f:72628c93:6a906c4b:5f675ce5 Name : reading.homeunix.com:3 Creation Time : Tue Jul 26 19:00:28 2011 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 62916631 (30.00 GiB 32.21 GB) Data Offset : 2048 sectors Super Offset : 8 sectors Unused Space : before=1968 sectors, after=0 sectors State : clean Device UUID : 5778cd64:0bbba183:ef3270a8:41f83aca Update Time : Tue Jul 26 19:00:28 2011 Checksum : 96003cba - correct Events : 0 Chunk Size : 512K Device Role : Active device 0 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) mdadm: No md superblock detected on /dev/sdf6. [root@lamachine ~]# ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-13 20:36 ` Daniel Sanabria 2016-09-13 21:10 ` Chris Murphy @ 2016-09-13 21:26 ` Wols Lists 1 sibling, 0 replies; 37+ messages in thread From: Wols Lists @ 2016-09-13 21:26 UTC (permalink / raw) To: Daniel Sanabria, Chris Murphy; +Cc: Linux-RAID On 13/09/16 21:36, Daniel Sanabria wrote: > [root@lamachine ~]# mdadm -D /dev/sd* > mdadm: /dev/sda does not appear to be an md device > mdadm: /dev/sda1 does not appear to be an md device > mdadm: /dev/sda2 does not appear to be an md device > mdadm: /dev/sda3 does not appear to be an md device > mdadm: /dev/sda5 does not appear to be an md device > mdadm: /dev/sda6 does not appear to be an md device > mdadm: /dev/sdb does not appear to be an md device I think that it's been pointed out, but this should be "mdadm -E". "mdadm -E" gets passed physical devices eg /dev/sda, "mdadm -D" gets passed raid devices eg /dev/md127. Confusing, I know ... Cheers, Wol ^ permalink raw reply [flat|nested] 37+ messages in thread
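For reference, the two calls look like this in practice (device names are only examples taken from this thread): mdadm --examine (-E) reads the md superblock stored on a member device or partition, while mdadm --detail (-D) reports on an assembled array node.

  mdadm -E /dev/sda1     # examine: dump the md superblock written on a component partition
  mdadm -D /dev/md127    # detail: describe the assembled array built from such components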
* Re: Inactive arrays 2016-09-13 20:04 ` Daniel Sanabria 2016-09-13 20:13 ` Chris Murphy @ 2016-09-14 4:33 ` Chris Murphy 2016-09-14 10:36 ` Daniel Sanabria 1 sibling, 1 reply; 37+ messages in thread From: Chris Murphy @ 2016-09-14 4:33 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID On Tue, Sep 13, 2016 at 2:04 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote: > [root@lamachine ~]# gdisk -l /dev/sdc > GPT fdisk (gdisk) version 1.0.1 > > Warning! Disk size is smaller than the main header indicates! This is true... > Disk /dev/sdc: 5860531055 sectors, 2.7 TiB > First usable sector is 2048, last usable sector is 5860533134 The last usable sector LBA is bigger than the total number of LBAs. So either there's a bug in whatever partitioned this or maybe the partition map was copied from one disk to another somehow? Hard to say, but it happened twice as sde has the exact same problem. > [root@lamachine ~]# gdisk -l /dev/sde > GPT fdisk (gdisk) version 1.0.1 > > Warning! Disk size is smaller than the main header indicates! > Disk /dev/sde: 5860531055 sectors, 2.7 TiB > First usable sector is 2048, last usable sector is 5860533134 Pretty weird. Any ideas how that happened? My guess is sdd was partitioned first, and its partition was copied to sdc and sde, and the tool blindly did not recompute the last usable sector LBA, it used the value from sdd. Anyway... sdd1 is 2TB sdd2 is 500MB And it looks like sdc and sde, if we believe the backup GPT, have the same exact partition scheme. sdd1 has mdadm v1.2 metadata indicating it's a raid5 with two other members, logically that means sdc1 and sde1 are the missing members for md128. sdd2 has mdadm v1.2 metadata indicating it's a raid0 with two other members, logically that means sdc2 and sde2 are the missing members for md129. This is consistent with the metadata that's been found on sdd1 and sdd2. So now the question is really how to go about fixing sdc and sde partition tables, so that their partitions appear? Weirdly enough the safest way to fix it is to replace the PMBR with a conventional MBR with two primary partitions with start and end LBAs just as gdisk shows them for sdc, sdd and sde. Why? Well, by spec, even if you don't remove the GPT signatures, if an MBR is present and is not a protective MBR, then it is supposed to be honored over the GPT. That 'd let you keep the GPT untouched, and only alter the MBR which right now doesn't contain any valuable information anyway. The trick though is you need to use an old version of fdisk that won't check for the GPT first; OR you can use wipefs to wipe the GPT signatures only, and then use fdisk to create a new MBR (msdos disk label it's sometimes called). Hint with wipefs. First, use it with -n to see what it will do. And then when you're ready to act, replace -n with -b which will create a backup file for what it's wiping. The signature is a small amount of data that's easily replaced and not unique to the table being wiped so it's still possible to use the GPT later should it be necessary. i.e. everything I'm describing is reversible. wipefs -a -n /dev/sdc wipefs -a -n /dev/sde So what do you get for that? When you get ready to use fdisk to create the new partitions, since both use metadata 1.2, use 0xda as the partition type code (although 0x83 is OK also, 0xfd technically only applies to 0.9 metadata for kernel autodetect). 
Chances are you could just use gdisk to verify and fix the primary and backup GPTs on sdc and sde, it looks like there's nothing in the vicinity that'll get stepped on by doing this. But the above is one part being strict about being able to reverse each step, and 1 part ass covering. When it's all done and working with the new MBR you can either leave it alone, or you can run gdisk on it and it will immediately convert it (in memory) and you can commit it to disk with the w command to go back to GPT - which I personally prefer just because there are two copies of everything, and each copy is separately checksummed. -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
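A sketch of the wipefs-then-fdisk route described above, assuming /dev/sdc (and identically /dev/sde); the partition start and end sectors would be copied from the gdisk listings elsewhere in the thread, not invented:

  wipefs -a -n /dev/sdc          # no-act first: list the signatures wipefs would erase
  wipefs -a --backup /dev/sdc    # then erase them, keeping backup files in $HOME
  fdisk /dev/sdc                 # create an MBR (msdos label) with two primary partitions,
                                 # set their type to 'da' (Non-FS data) with the 't' command,
                                 # and write with 'w'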
* Re: Inactive arrays 2016-09-14 4:33 ` Chris Murphy @ 2016-09-14 10:36 ` Daniel Sanabria 2016-09-14 14:32 ` Chris Murphy 0 siblings, 1 reply; 37+ messages in thread From: Daniel Sanabria @ 2016-09-14 10:36 UTC (permalink / raw) To: Chris Murphy; +Cc: Wols Lists, Linux-RAID > Pretty weird. Any ideas how that happened? My guess is sdd was > partitioned first, and its partition was copied to sdc and sde, and > the tool blindly did not recompute the last usable sector LBA, it used > the value from sdd. I have no solid idea but my money is on a human screwing up. While trying to boot the server after the move I have no clear record of what actions were taken, however I think it was during this time when the disks were probably messed. > sdd1 is 2TB > sdd2 is 500MB > > And it looks like sdc and sde, if we believe the backup GPT, have the > same exact partition scheme. yes from the original build that was the idea > wipefs -a -n /dev/sdc > wipefs -a -n /dev/sde > > So what do you get for that? [root@lamachine ~]# wipefs -a -n /dev/sdc /dev/sdc: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa /dev/sdc: calling ioctl to re-read partition table: Success [root@lamachine ~]# wipefs -a -n /dev/sde /dev/sde: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa /dev/sde: calling ioctl to re-read partition table: Success it didn't give me any indication that it was in no-act mode but I decided to carry on with the backup flag and got this: [root@lamachine ~]# wipefs -a -b /dev/sdc wipefs: invalid option -- 'b' Usage: wipefs [options] <device> Wipe signatures from a device. Options: -a, --all wipe all magic strings (BE CAREFUL!) -b, --backup create a signature backup in $HOME -f, --force force erasure -h, --help show this help text -n, --no-act do everything except the actual write() call -o, --offset <num> offset to erase, in bytes -p, --parsable print out in parsable instead of printable format -q, --quiet suppress output messages -t, --types <list> limit the set of filesystem, RAIDs or partition tables -V, --version output version information and exit For more details see wipefs(8). [root@lamachine ~]# wipefs -V wipefs from util-linux 2.27.1 [root@lamachine ~]# wipefs -a --backup /dev/sdc /dev/sdc: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa /dev/sdc: calling ioctl to re-read partition table: Success [root@lamachine ~]# wipefs -a --backup /dev/sde /dev/sde: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa /dev/sde: calling ioctl to re-read partition table: Success [root@lamachine ~]# before proceeding with manually creating both partitions could you confirm the above is kind of expected? > Chances are you could just use gdisk to verify and fix the primary and > backup GPTs on sdc and sde Will the fix be offered as part of the verify command in gdisk (command: v)? > When it's all done and working with the new MBR you can either leave > it alone, or you can run gdisk on it and it will immediately convert > it (in memory) and you can commit it to disk with the w command to go > back to GPT so just running the w command after running the verify command while on the same session ? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-14 10:36 ` Daniel Sanabria @ 2016-09-14 14:32 ` Chris Murphy 2016-09-14 14:57 ` Daniel Sanabria 0 siblings, 1 reply; 37+ messages in thread From: Chris Murphy @ 2016-09-14 14:32 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID On Wed, Sep 14, 2016 at 4:36 AM, Daniel Sanabria <sanabria.d@gmail.com> wrote > it didn't give me any indication that it was in no-act mode but I > decided to carry on with the backup flag and got this: > > [root@lamachine ~]# wipefs -a -b /dev/sdc > > wipefs: invalid option -- 'b' Huh, must be a bug. Works for me with util-linux-2.28.1-1.fc24.x86_64 and I also get files in the user home. -rw-------. 1 root root 2 Sep 13 22:13 wipefs-sdb-0x000001fe.bak -rw-------. 1 root root 8 Sep 13 22:13 wipefs-sdb-0x00000200.bak -rw-------. 1 root root 8 Sep 13 22:13 wipefs-sdb-0x3ba7ffe00.bak Those are backups for the PMBR, primary GPT and backup GPT. > before proceeding with manually creating both partitions could you > confirm the above is kind of expected? I expected that it would find the GPT's also and erase their signature and write out backup files. Oh well. >> Chances are you could just use gdisk to verify and fix the primary and >> backup GPTs on sdc and sde > > Will the fix be offered as part of the verify command in gdisk (command: v)? It actually does the fix in memory as soon as you run the command, and v just elaborates on any problems that don't make sense like overlapping partitions etc. So yes you can just run gdisk, go to expert menu with x, then verify with v, and print the table with p, and post all of that and we'll confirm before you use w to write out the fixed table. > >> When it's all done and working with the new MBR you can either leave >> it alone, or you can run gdisk on it and it will immediately convert >> it (in memory) and you can commit it to disk with the w command to go >> back to GPT > > so just running the w command after running the verify command while > on the same session ? Yes but let's just skip the fdisk stuff. There is already a primary and backup GPT, and you even have a backup of the backup with the good on on sdd that confirms all of the numbers. So, I think it's safe to just move foward with the repair using gdisk. But post the verify, and print command output first, before writing out the change. -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
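For completeness, wipefs(8) documents how such a backup can be written back if a step ever needs undoing; the byte offset where the signature belongs is encoded in the filename (the PMBR backup from the listing above is used as the example):

  dd if=~/wipefs-sdb-0x000001fe.bak of=/dev/sdb seek=$((0x000001fe)) bs=1 conv=notrunc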
* Re: Inactive arrays 2016-09-14 14:32 ` Chris Murphy @ 2016-09-14 14:57 ` Daniel Sanabria 2016-09-14 15:15 ` Chris Murphy 0 siblings, 1 reply; 37+ messages in thread From: Daniel Sanabria @ 2016-09-14 14:57 UTC (permalink / raw) To: Chris Murphy; +Cc: Wols Lists, Linux-RAID > So yes you can just run gdisk, go to expert menu with x, then verify > with v, and print the table with p, and post all of that and we'll > confirm before you use w to write out the fixed table. choices now: [root@lamachine ~]# gdisk /dev/sdc GPT fdisk (gdisk) version 1.0.1 Warning! Disk size is smaller than the main header indicates! Loading secondary header from the last sector of the disk! You should use 'v' to verify disk integrity, and perhaps options on the experts' menu to repair the disk. Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Warning! One or more CRCs don't match. You should repair the disk! Partition table scan: MBR: not present BSD: not present APM: not present GPT: damaged Found invalid MBR and corrupt GPT. What do you want to do? (Using the GPT MAY permit recovery of GPT data.) 1 - Use current GPT 2 - Create blank GPT Your answer: On 14 September 2016 at 15:32, Chris Murphy <lists@colorremedies.com> wrote: > On Wed, Sep 14, 2016 at 4:36 AM, Daniel Sanabria <sanabria.d@gmail.com> wrote > >> it didn't give me any indication that it was in no-act mode but I >> decided to carry on with the backup flag and got this: >> >> [root@lamachine ~]# wipefs -a -b /dev/sdc >> >> wipefs: invalid option -- 'b' > > Huh, must be a bug. Works for me with util-linux-2.28.1-1.fc24.x86_64 > and I also get files in the user home. > > -rw-------. 1 root root 2 Sep 13 22:13 wipefs-sdb-0x000001fe.bak > -rw-------. 1 root root 8 Sep 13 22:13 wipefs-sdb-0x00000200.bak > -rw-------. 1 root root 8 Sep 13 22:13 wipefs-sdb-0x3ba7ffe00.bak > > Those are backups for the PMBR, primary GPT and backup GPT. > > > >> before proceeding with manually creating both partitions could you >> confirm the above is kind of expected? > > I expected that it would find the GPT's also and erase their signature > and write out backup files. Oh well. > > > >>> Chances are you could just use gdisk to verify and fix the primary and >>> backup GPTs on sdc and sde >> >> Will the fix be offered as part of the verify command in gdisk (command: v)? > > It actually does the fix in memory as soon as you run the command, and > v just elaborates on any problems that don't make sense like > overlapping partitions etc. > > So yes you can just run gdisk, go to expert menu with x, then verify > with v, and print the table with p, and post all of that and we'll > confirm before you use w to write out the fixed table. > > > >> >>> When it's all done and working with the new MBR you can either leave >>> it alone, or you can run gdisk on it and it will immediately convert >>> it (in memory) and you can commit it to disk with the w command to go >>> back to GPT >> >> so just running the w command after running the verify command while >> on the same session ? > > Yes but let's just skip the fdisk stuff. There is already a primary > and backup GPT, and you even have a backup of the backup with the > good on on sdd that confirms all of the numbers. So, I think it's safe > to just move foward with the repair using gdisk. But post the verify, > and print command output first, before writing out the change. > > > -- > Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-14 14:57 ` Daniel Sanabria @ 2016-09-14 15:15 ` Chris Murphy 2016-09-14 15:47 ` Daniel Sanabria 0 siblings, 1 reply; 37+ messages in thread From: Chris Murphy @ 2016-09-14 15:15 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID On Wed, Sep 14, 2016 at 8:57 AM, Daniel Sanabria <sanabria.d@gmail.com> wrote: >> So yes you can just run gdisk, go to expert menu with x, then verify >> with v, and print the table with p, and post all of that and we'll >> confirm before you use w to write out the fixed table. > > choices now: > > [root@lamachine ~]# gdisk /dev/sdc > > GPT fdisk (gdisk) version 1.0.1 > > Warning! Disk size is smaller than the main header indicates! Loading > secondary header from the last sector of the disk! You should use 'v' to > verify disk integrity, and perhaps options on the experts' menu to repair > the disk. > > Caution: invalid backup GPT header, but valid main header; regenerating > backup header from main header. > > > Warning! One or more CRCs don't match. You should repair the disk! > > Partition table scan: > > MBR: not present > BSD: not present > APM: not present > GPT: damaged > > > Found invalid MBR and corrupt GPT. What do you want to do? (Using the > GPT MAY permit recovery of GPT data.) > 1 - Use current GPT > 2 - Create blank GPT > > > Your answer: 1. Use current. Then x, v, p and post the output. -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-14 15:15 ` Chris Murphy @ 2016-09-14 15:47 ` Daniel Sanabria 2016-09-14 16:10 ` Chris Murphy 0 siblings, 1 reply; 37+ messages in thread From: Daniel Sanabria @ 2016-09-14 15:47 UTC (permalink / raw) To: Chris Murphy; +Cc: Wols Lists, Linux-RAID [root@lamachine ~]# gdisk /dev/sdc GPT fdisk (gdisk) version 1.0.1 Warning! Disk size is smaller than the main header indicates! Loading secondary header from the last sector of the disk! You should use 'v' to verify disk integrity, and perhaps options on the experts' menu to repair the disk. Caution: invalid backup GPT header, but valid main header; regenerating backup header from main header. Warning! One or more CRCs don't match. You should repair the disk! Partition table scan: MBR: not present BSD: not present APM: not present GPT: damaged Found invalid MBR and corrupt GPT. What do you want to do? (Using the GPT MAY permit recovery of GPT data.) 1 - Use current GPT 2 - Create blank GPT Your answer: 1 Command (? for help): x Expert command (? for help): v Caution: The CRC for the backup partition table is invalid. This table may be corrupt. This program will automatically create a new backup partition table when you save your partitions. Problem: The secondary header's self-pointer indicates that it doesn't reside at the end of the disk. If you've added a disk to a RAID array, use the 'e' option on the experts' menu to adjust the secondary header's and partition table's locations. Problem: Disk is too small to hold all the data! (Disk size is 5860531055 sectors, needs to be 5860533168 sectors.) The 'e' option on the experts' menu may fix this problem. Problem: GPT claims the disk is larger than it is! (Claimed last usable sector is 5860533134, but backup header is at 5860533167 and disk size is 5860531055 sectors. The 'e' option on the experts' menu will probably fix this problem Identified 4 problems! Expert command (? for help): p Disk /dev/sdc: 5860531055 sectors, 2.7 TiB Logical sector size: 512 bytes Disk identifier (GUID): 6DB70F4E-D8ED-4290-AA2E-4E81D8324992 Partition table holds up to 128 entries First usable sector is 2048, last usable sector is 5860533134 Partitions will be aligned on 2048-sector boundaries Total free space is 516987791 sectors (246.5 GiB) Number Start (sector) End (sector) Size Code Name 1 2048 4294969343 2.0 TiB FD00 2 4294969344 5343545343 500.0 GiB 8300 Expert command (? for help): ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-14 15:47 ` Daniel Sanabria @ 2016-09-14 16:10 ` Chris Murphy 2016-09-14 16:13 ` Chris Murphy 0 siblings, 1 reply; 37+ messages in thread From: Chris Murphy @ 2016-09-14 16:10 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Chris Murphy, Wols Lists, Linux-RAID On Wed, Sep 14, 2016 at 9:47 AM, Daniel Sanabria <sanabria.d@gmail.com> wrote: > [root@lamachine ~]# gdisk /dev/sdc > > GPT fdisk (gdisk) version 1.0.1 > > Warning! Disk size is smaller than the main header indicates! Loading > secondary header from the last sector of the disk! You should use 'v' to > verify disk integrity, and perhaps options on the experts' menu to repair > the disk. > > Caution: invalid backup GPT header, but valid main header; regenerating > backup header from main header. > > Warning! One or more CRCs don't match. You should repair the disk! > > Partition table scan: > MBR: not present > BSD: not present > APM: not present > GPT: damaged > > Found invalid MBR and corrupt GPT. What do you want to do? (Using the > GPT MAY permit recovery of GPT data.) > 1 - Use current GPT > 2 - Create blank GPT > > Your answer: 1 > > Command (? for help): x > > Expert command (? for help): v > > Caution: The CRC for the backup partition table is invalid. This table may > be corrupt. This program will automatically create a new backup partition > table when you save your partitions. > > Problem: The secondary header's self-pointer indicates that it doesn't reside > at the end of the disk. If you've added a disk to a RAID array, use the 'e' > option on the experts' menu to adjust the secondary header's and partition > table's locations. > > Problem: Disk is too small to hold all the data! > (Disk size is 5860531055 sectors, needs to be 5860533168 sectors.) > > The 'e' option on the experts' menu may fix this problem. > > Problem: GPT claims the disk is larger than it is! (Claimed last usable > sector is 5860533134, but backup header is at > 5860533167 and disk size is 5860531055 sectors. > The 'e' option on the experts' menu will probably fix this problem > > Identified 4 problems! > > Expert command (? for help): p > Disk /dev/sdc: 5860531055 sectors, 2.7 TiB > Logical sector size: 512 bytes > Disk identifier (GUID): 6DB70F4E-D8ED-4290-AA2E-4E81D8324992 > Partition table holds up to 128 entries > First usable sector is 2048, last usable sector is 5860533134 > Partitions will be aligned on 2048-sector boundaries > Total free space is 516987791 sectors (246.5 GiB) > > Number Start (sector) End (sector) Size Code Name > 1 2048 4294969343 2.0 TiB FD00 > 2 4294969344 5343545343 500.0 GiB 8300 > > Expert command (? for help): Use the e command. That should fix the 3 main problems, and then a new CRC is automatically computed for the two headers and two tables at write time. -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
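So the rest of the session, still in the experts' menu on /dev/sdc (and then repeated for /dev/sde), would be roughly:

  Expert command (? for help): e    # relocate the backup GPT structures to the actual end of the disk
  Expert command (? for help): v    # verify again; the size-related problems should now be gone
  Expert command (? for help): w    # write the corrected primary and backup GPT and exit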
* Re: Inactive arrays 2016-09-14 16:10 ` Chris Murphy @ 2016-09-14 16:13 ` Chris Murphy 2016-09-14 18:16 ` Daniel Sanabria 0 siblings, 1 reply; 37+ messages in thread From: Chris Murphy @ 2016-09-14 16:13 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Linux-RAID Low priority but you could make the type codes consistent for both partitions. It doesn't matter if they're 8300 or FD00 on GPT disks, there's nothing I know that actually uses this information. FD00 is maybe slightly better only in that it'll flag a human to expect that there's mdadm metadata on this partition rather than a file system. Chris ^ permalink raw reply [flat|nested] 37+ messages in thread
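Changing a type code is a quick gdisk operation; for example, to flip partition 2 on /dev/sdc to FD00 (the values here simply mirror the table shown earlier):

  gdisk /dev/sdc
  Command (? for help): t
  Partition number (1-2): 2
  Hex code or GUID (L to show codes, Enter = 8300): fd00
  Command (? for help): w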
* Re: Inactive arrays 2016-09-14 16:13 ` Chris Murphy @ 2016-09-14 18:16 ` Daniel Sanabria 2016-09-14 18:37 ` Chris Murphy 2016-09-14 18:42 ` Wols Lists 0 siblings, 2 replies; 37+ messages in thread From: Daniel Sanabria @ 2016-09-14 18:16 UTC (permalink / raw) To: Chris Murphy; +Cc: Linux-RAID BRAVO!!!!!! Thanks a million Chris! After following your advice on recovering the MBR and the GPT the arrays re-assembled automatically and all data is there. I already changed the type to make it consistent (FD00 on both partitions) and working on setting up the timeouts to 180 at boot time. Other than replacing the green drives with something more suitable (any suggestions are welcome), what else would you suggest to change to make the setup a bit more consistent and upgrade proof (i.e. having different metadata versions doesn't look right to me)? I'd also like to thank Wol and Adam for their help and for keeping the thread alive. Thanks again and again, This is the current status: [root@lamachine ~]# mdadm -D /dev/md* /dev/md126: Version : 0.90 Creation Time : Thu Dec 3 22:12:12 2009 Raid Level : raid10 Array Size : 30719936 (29.30 GiB 31.46 GB) Used Dev Size : 30719936 (29.30 GiB 31.46 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 126 Persistence : Superblock is persistent Update Time : Wed Sep 14 19:02:55 2016 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Layout : near=2 Chunk Size : 64K UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 Events : 0.264152 Number Major Minor RaidDevice State 0 8 82 0 active sync set-A /dev/sdf2 1 8 1 1 active sync set-B /dev/sda1 /dev/md127: Version : 1.2 Creation Time : Tue Jul 26 19:00:28 2011 Raid Level : raid0 Array Size : 94367232 (90.00 GiB 96.63 GB) Raid Devices : 3 Total Devices : 3 Persistence : Superblock is persistent Update Time : Tue Jul 26 19:00:28 2011 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Chunk Size : 512K Name : reading.homeunix.com:3 UUID : acd5374f:72628c93:6a906c4b:5f675ce5 Events : 0 Number Major Minor RaidDevice State 0 8 85 0 active sync /dev/sdf5 1 8 21 1 active sync /dev/sdb5 2 8 5 2 active sync /dev/sda5 /dev/md128: Version : 1.2 Creation Time : Fri Oct 24 15:24:38 2014 Raid Level : raid5 Array Size : 4294705152 (4095.75 GiB 4397.78 GB) Used Dev Size : 2147352576 (2047.88 GiB 2198.89 GB) Raid Devices : 3 Total Devices : 3 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Wed Sep 14 18:46:47 2016 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 512K Name : lamachine:128 (local to host lamachine) UUID : f2372cb9:d3816fd6:ce86d826:882ec82e Events : 4154 Number Major Minor RaidDevice State 0 8 65 0 active sync /dev/sde1 1 8 33 1 active sync /dev/sdc1 3 8 49 2 active sync /dev/sdd1 /dev/md129: Version : 1.2 Creation Time : Mon Nov 10 16:28:11 2014 Raid Level : raid0 Array Size : 1572470784 (1499.63 GiB 1610.21 GB) Raid Devices : 3 Total Devices : 3 Persistence : Superblock is persistent Update Time : Mon Nov 10 16:28:11 2014 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Chunk Size : 512K Name : lamachine:129 (local to host lamachine) UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a Events : 0 Number Major Minor RaidDevice State 0 8 66 0 active sync /dev/sde2 1 8 34 1 active sync /dev/sdc2 2 8 50 2 active sync /dev/sdd2 /dev/md2: Version : 0.90 Creation Time : Mon Feb 11 07:54:36 2013 Raid Level : raid5 
Array Size : 511999872 (488.28 GiB 524.29 GB) Used Dev Size : 255999936 (244.14 GiB 262.14 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 2 Persistence : Superblock is persistent Update Time : Wed Sep 14 18:48:51 2016 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Layout : left-symmetric Chunk Size : 64K UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) Events : 0.611 Number Major Minor RaidDevice State 0 8 83 0 active sync /dev/sdf3 1 8 18 1 active sync /dev/sdb2 2 8 2 2 active sync /dev/sda2 [root@lamachine ~]# [root@lamachine ~]# mdadm -E /dev/sd* /dev/sda: MBR Magic : aa55 Partition[0] : 61440000 sectors at 63 (type fd) Partition[1] : 512000000 sectors at 61440063 (type fd) Partition[2] : 403328002 sectors at 573440063 (type 05) /dev/sda1: Magic : a92b4efc Version : 0.90.00 UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 Creation Time : Thu Dec 3 22:12:12 2009 Raid Level : raid10 Used Dev Size : 30719936 (29.30 GiB 31.46 GB) Array Size : 30719936 (29.30 GiB 31.46 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 126 Update Time : Wed Sep 14 19:07:26 2016 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : ed993c29 - correct Events : 264152 Layout : near=2 Chunk Size : 64K Number Major Minor RaidDevice State this 1 8 1 1 active sync /dev/sda1 0 0 8 82 0 active sync /dev/sdf2 1 1 8 1 1 active sync /dev/sda1 /dev/sda2: Magic : a92b4efc Version : 0.90.00 UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) Creation Time : Mon Feb 11 07:54:36 2013 Raid Level : raid5 Used Dev Size : 255999936 (244.14 GiB 262.14 GB) Array Size : 511999872 (488.28 GiB 524.29 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 2 Update Time : Wed Sep 14 18:48:51 2016 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Checksum : 73b2a76e - correct Events : 611 Layout : left-symmetric Chunk Size : 64K Number Major Minor RaidDevice State this 2 8 2 2 active sync /dev/sda2 0 0 8 83 0 active sync /dev/sdf3 1 1 8 18 1 active sync /dev/sdb2 2 2 8 2 2 active sync /dev/sda2 /dev/sda3: MBR Magic : aa55 Partition[0] : 62910589 sectors at 63 (type 83) Partition[1] : 7116795 sectors at 82445692 (type 05) /dev/sda5: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : acd5374f:72628c93:6a906c4b:5f675ce5 Name : reading.homeunix.com:3 Creation Time : Tue Jul 26 19:00:28 2011 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 62908541 (30.00 GiB 32.21 GB) Data Offset : 2048 sectors Super Offset : 8 sectors Unused Space : before=1968 sectors, after=0 sectors State : clean Device UUID : a0efc1b3:94cc6eb8:deea76ca:772b2d2d Update Time : Tue Jul 26 19:00:28 2011 Checksum : 9eba9119 - correct Events : 0 Chunk Size : 512K Device Role : Active device 2 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) mdadm: No md superblock detected on /dev/sda6. 
/dev/sdb: MBR Magic : aa55 Partition[1] : 512000000 sectors at 409663 (type fd) Partition[2] : 16384000 sectors at 512409663 (type 82) Partition[3] : 447974402 sectors at 528793663 (type 05) /dev/sdb2: Magic : a92b4efc Version : 0.90.00 UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) Creation Time : Mon Feb 11 07:54:36 2013 Raid Level : raid5 Used Dev Size : 255999936 (244.14 GiB 262.14 GB) Array Size : 511999872 (488.28 GiB 524.29 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 2 Update Time : Wed Sep 14 18:48:51 2016 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Checksum : 73b2a77c - correct Events : 611 Layout : left-symmetric Chunk Size : 64K Number Major Minor RaidDevice State this 1 8 18 1 active sync /dev/sdb2 0 0 8 83 0 active sync /dev/sdf3 1 1 8 18 1 active sync /dev/sdb2 2 2 8 2 2 active sync /dev/sda2 mdadm: No md superblock detected on /dev/sdb3. /dev/sdb4: MBR Magic : aa55 Partition[0] : 62912354 sectors at 63 (type 83) Partition[1] : 7116795 sectors at 82447457 (type 05) /dev/sdb5: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : acd5374f:72628c93:6a906c4b:5f675ce5 Name : reading.homeunix.com:3 Creation Time : Tue Jul 26 19:00:28 2011 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 62910306 (30.00 GiB 32.21 GB) Data Offset : 2048 sectors Super Offset : 8 sectors Unused Space : before=1968 sectors, after=0 sectors State : clean Device UUID : 152d0202:64efb3e7:f23658c3:82a239a1 Update Time : Tue Jul 26 19:00:28 2011 Checksum : 892dbb61 - correct Events : 0 Chunk Size : 512K Device Role : Active device 1 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) mdadm: No md superblock detected on /dev/sdb6. /dev/sdc: MBR Magic : aa55 Partition[0] : 4294967295 sectors at 1 (type ee) /dev/sdc1: Magic : a92b4efc Version : 1.2 Feature Map : 0x1 Array UUID : f2372cb9:d3816fd6:ce86d826:882ec82e Name : lamachine:128 (local to host lamachine) Creation Time : Fri Oct 24 15:24:38 2014 Raid Level : raid5 Raid Devices : 3 Avail Dev Size : 4294705152 (2047.88 GiB 2198.89 GB) Array Size : 4294705152 (4095.75 GiB 4397.78 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : clean Device UUID : 8b1bac5c:6c2cb5a4:bff59099:986b26cd Internal Bitmap : 8 sectors from superblock Update Time : Wed Sep 14 18:46:47 2016 Bad Block Log : 512 entries available at offset 72 sectors Checksum : a4766ee5 - correct Events : 4154 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 1 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) /dev/sdc2: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a Name : lamachine:129 (local to host lamachine) Creation Time : Mon Nov 10 16:28:11 2014 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 1048313856 (499.88 GiB 536.74 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : clean Device UUID : 8bc41f51:9af76d31:36349135:2d004cb3 Update Time : Mon Nov 10 16:28:11 2014 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 2af9fe79 - correct Events : 0 Chunk Size : 512K Device Role : Active device 1 Array State : AAA ('A' == active, '.' 
== missing, 'R' == replacing) /dev/sdd: MBR Magic : aa55 Partition[0] : 4294967295 sectors at 1 (type ee) /dev/sdd1: Magic : a92b4efc Version : 1.2 Feature Map : 0x1 Array UUID : f2372cb9:d3816fd6:ce86d826:882ec82e Name : lamachine:128 (local to host lamachine) Creation Time : Fri Oct 24 15:24:38 2014 Raid Level : raid5 Raid Devices : 3 Avail Dev Size : 4294705152 (2047.88 GiB 2198.89 GB) Array Size : 4294705152 (4095.75 GiB 4397.78 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : clean Device UUID : 1f652d4f:92fccd8e:b439abf2:76b881e1 Internal Bitmap : 8 sectors from superblock Update Time : Wed Sep 14 18:46:47 2016 Bad Block Log : 512 entries available at offset 72 sectors Checksum : ee861974 - correct Events : 4154 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 2 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) /dev/sdd2: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a Name : lamachine:129 (local to host lamachine) Creation Time : Mon Nov 10 16:28:11 2014 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 1048313856 (499.88 GiB 536.74 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : clean Device UUID : 562dd382:5ccc00aa:449ea7e4:d8b266c2 Update Time : Mon Nov 10 16:28:11 2014 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 937158c1 - correct Events : 0 Chunk Size : 512K Device Role : Active device 2 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) /dev/sde: MBR Magic : aa55 Partition[0] : 4294967295 sectors at 1 (type ee) /dev/sde1: Magic : a92b4efc Version : 1.2 Feature Map : 0x1 Array UUID : f2372cb9:d3816fd6:ce86d826:882ec82e Name : lamachine:128 (local to host lamachine) Creation Time : Fri Oct 24 15:24:38 2014 Raid Level : raid5 Raid Devices : 3 Avail Dev Size : 4294705152 (2047.88 GiB 2198.89 GB) Array Size : 4294705152 (4095.75 GiB 4397.78 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : clean Device UUID : 95334f42:beba0d90:8a0854f4:7dfdbd31 Internal Bitmap : 8 sectors from superblock Update Time : Wed Sep 14 18:46:47 2016 Bad Block Log : 512 entries available at offset 72 sectors Checksum : 34ccb9f0 - correct Events : 4154 Layout : left-symmetric Chunk Size : 512K Device Role : Active device 0 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) /dev/sde2: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : 895dae98:d1a496de:4f590b8b:cb8ac12a Name : lamachine:129 (local to host lamachine) Creation Time : Mon Nov 10 16:28:11 2014 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 1048313856 (499.88 GiB 536.74 GB) Data Offset : 262144 sectors Super Offset : 8 sectors Unused Space : before=262056 sectors, after=0 sectors State : clean Device UUID : fd6e6f59:89dad658:e361db17:7c15a63f Update Time : Mon Nov 10 16:28:11 2014 Bad Block Log : 512 entries available at offset 72 sectors Checksum : c956ced4 - correct Events : 0 Chunk Size : 512K Device Role : Active device 0 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) /dev/sdf: MBR Magic : aa55 Partition[0] : 407552 sectors at 2048 (type 83) Partition[1] : 61440000 sectors at 409663 (type fd) Partition[2] : 512000000 sectors at 61849663 (type fd) Partition[3] : 402918402 sectors at 573849663 (type 05) mdadm: No md superblock detected on /dev/sdf1. 
/dev/sdf2: Magic : a92b4efc Version : 0.90.00 UUID : 9af006ca:8845bbd3:bfe78010:bc810f04 Creation Time : Thu Dec 3 22:12:12 2009 Raid Level : raid10 Used Dev Size : 30719936 (29.30 GiB 31.46 GB) Array Size : 30719936 (29.30 GiB 31.46 GB) Raid Devices : 2 Total Devices : 2 Preferred Minor : 126 Update Time : Wed Sep 14 19:07:26 2016 State : clean Active Devices : 2 Working Devices : 2 Failed Devices : 0 Spare Devices : 0 Checksum : ed993c78 - correct Events : 264152 Layout : near=2 Chunk Size : 64K Number Major Minor RaidDevice State this 0 8 82 0 active sync /dev/sdf2 0 0 8 82 0 active sync /dev/sdf2 1 1 8 1 1 active sync /dev/sda1 /dev/sdf3: Magic : a92b4efc Version : 0.90.00 UUID : 2cff15d1:e411447b:fd5d4721:03e44022 (local to host lamachine) Creation Time : Mon Feb 11 07:54:36 2013 Raid Level : raid5 Used Dev Size : 255999936 (244.14 GiB 262.14 GB) Array Size : 511999872 (488.28 GiB 524.29 GB) Raid Devices : 3 Total Devices : 3 Preferred Minor : 2 Update Time : Wed Sep 14 18:48:51 2016 State : clean Active Devices : 3 Working Devices : 3 Failed Devices : 0 Spare Devices : 0 Checksum : 73b2a7bb - correct Events : 611 Layout : left-symmetric Chunk Size : 64K Number Major Minor RaidDevice State this 0 8 83 0 active sync /dev/sdf3 0 0 8 83 0 active sync /dev/sdf3 1 1 8 18 1 active sync /dev/sdb2 2 2 8 2 2 active sync /dev/sda2 /dev/sdf4: MBR Magic : aa55 Partition[0] : 62918679 sectors at 63 (type 83) Partition[1] : 7116795 sectors at 82453782 (type 05) /dev/sdf5: Magic : a92b4efc Version : 1.2 Feature Map : 0x0 Array UUID : acd5374f:72628c93:6a906c4b:5f675ce5 Name : reading.homeunix.com:3 Creation Time : Tue Jul 26 19:00:28 2011 Raid Level : raid0 Raid Devices : 3 Avail Dev Size : 62916631 (30.00 GiB 32.21 GB) Data Offset : 2048 sectors Super Offset : 8 sectors Unused Space : before=1968 sectors, after=0 sectors State : clean Device UUID : 5778cd64:0bbba183:ef3270a8:41f83aca Update Time : Tue Jul 26 19:00:28 2011 Checksum : 96003cba - correct Events : 0 Chunk Size : 512K Device Role : Active device 0 Array State : AAA ('A' == active, '.' == missing, 'R' == replacing) mdadm: No md superblock detected on /dev/sdf6. [root@lamachine ~]# cat /etc/mdadm.conf # mdadm.conf written out by anaconda MAILADDR root AUTO +imsm +1.x -all ARRAY /dev/md2 level=raid5 num-devices=3 UUID=2cff15d1:e411447b:fd5d4721:03e44022 ARRAY /dev/md126 level=raid10 num-devices=2 UUID=9af006ca:8845bbd3:bfe78010:bc810f04 ARRAY /dev/md127 level=raid0 num-devices=3 UUID=acd5374f:72628c93:6a906c4b:5f675ce5 ARRAY /dev/md128 metadata=1.2 spares=1 name=lamachine:128 UUID=f2372cb9:d3816fd6:ce86d826:882ec82e ARRAY /dev/md129 metadata=1.2 name=lamachine:129 UUID=895dae98:d1a496de:4f590b8b:cb8ac12a [root@lamachine ~]# On 14 September 2016 at 17:13, Chris Murphy <lists@colorremedies.com> wrote: > Low priority but you could make the type codes consistent for both > partitions. It doesn't matter if they're 8300 or FD00 on GPT disks, > there's nothing I know that actually uses this information. FD00 is > maybe slightly better only in that it'll flag a human to expect that > there's mdadm metadata on this partition rather than a file system. > > > Chris ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Inactive arrays 2016-09-14 18:16 ` Daniel Sanabria @ 2016-09-14 18:37 ` Chris Murphy 2016-09-14 18:42 ` Wols Lists 1 sibling, 0 replies; 37+ messages in thread From: Chris Murphy @ 2016-09-14 18:37 UTC (permalink / raw) To: Daniel Sanabria; +Cc: Chris Murphy, Linux-RAID On Wed, Sep 14, 2016 at 12:16 PM, Daniel Sanabria <sanabria.d@gmail.com> wrote: > BRAVO!!!!!! > > Thanks a million Chris! After following your advice on recovering the > MBR and the GPT the arrays re-assembled automatically and all data is > there. > > I already changed the type to make it consistent (FD00 on both > partitions) and working on setting up the timeouts to 180 at boot > time. Other than replacing the green drives with something more > suitable (any suggestions are welcome), what else would you suggest to > change to make the setup a bit more consistent and upgrade proof (i.e. > having different metadata versions doesn't look right to me)? Like I mentioned there's something about Greens spinning down that you might look at. I'm not sure if delays in spinning back up is a contributing factor to anything? I'd kinda expect that if the kernel/libata send a command to the drive, and one spins up slow, the kernel is just going to wait up to whatever the command timer is set to. So if you set that to 180 seconds, it should be fine because no drive takes 3 minute to spin up. But... I dunno if there's some other vector for these drives to cause confusion. Umm, yeah I don't think you need to worry too much about the metadata. 0.9 is deprecated, uses kernel autodetect rather than initrd based detection like metadata 1.x, and can be more complex to troubleshoot. But so long as it's working I honestly wouldn't mess with it. If you do want to simplify it just make sure you have current backups because changes are a RIPE time for mistakes that end up in user data loss. I would pretty much just assume the user will break something, you have a not too complex layout compared to others I've seen, but there are some opportunities to make simple mistakes that will just blow shit up and then you're screwed. So I'd say it's easier to just plot a future when you're going to buy a bunch of new drives and do a complete migration, rather than change the existing setup metadata just for the sake of changing it. And one thing to incorporate in the planning stage is LVM RAID. You could take all of your drives into one big pool, and create LV's like you are individual RAIDs, and each LV can have its own RAID level. In many ways it's easier because you're already using LVM on top of RAID on top of partitioning. Instead you can create basically one partition, add them all to LVM, and then manage the LV and raid level at the same time. The main issue here is, familiarity with all the tools. If you're more comfortable with mdadm, then use that. If you can get over the hurdle that is lvm tools (it's like emacs for storage, its metric piles of flags, documentation, features, and as yet doesn't have all the same features as mdadm still for the raid stuff). But it'll do scrubs, and device replacements, all the basic stuff is there. Monitoring for drive failures is a little different, I don't think it has a way to email you like mdadm does in case of drive failures/ejections. So you'll have to look at that also. Note that on the backend LVM raid uses the md kernel driver just like mdadm does, it's just the user space tools and on disk metadata that differ. -- Chris Murphy ^ permalink raw reply [flat|nested] 37+ messages in thread
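A rough sketch of what such an LVM-managed pool could look like; the volume group name, LV names and sizes below are purely illustrative, not a migration plan for this particular machine:

  pvcreate /dev/sdc1 /dev/sdd1 /dev/sde1             # one whole-disk partition per drive
  vgcreate pool /dev/sdc1 /dev/sdd1 /dev/sde1
  lvcreate --type raid5 -i 2 -L 2T -n media pool     # raid5 LV: 2 data stripes + parity across 3 PVs
  lvcreate --type raid1 -m 1 -L 100G -n vm_dir pool  # a mirrored LV carved from the same pool
  lvs -a -o name,segtype,devices pool                # show the raid type and backing PVs of each LV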
* Re: Inactive arrays 2016-09-14 18:16 ` Daniel Sanabria 2016-09-14 18:37 ` Chris Murphy @ 2016-09-14 18:42 ` Wols Lists 2016-09-15 9:21 ` Brad Campbell 1 sibling, 1 reply; 37+ messages in thread From: Wols Lists @ 2016-09-14 18:42 UTC (permalink / raw) To: Daniel Sanabria, Chris Murphy; +Cc: Linux-RAID On 14/09/16 19:16, Daniel Sanabria wrote: > Other than replacing the green drives with something more > suitable (any suggestions are welcome) WD Reds or Seagate NAS. I don't think they make them any more, but Seagate Constellations are fine too. My Toshiba 2TB 2.5" laptop drive would be fine. The tl;dr version of the problem with Greens (and any other desktop drive for that matter), if you haven't read it up yet, is that when the kernel requests a read from a dodgy drive, it just sits there, *unresponsive*, until the read succeeds or the drive times out. And the drive will time out in its own good time. If the kernel times out *before* the drive, and by default the kernel does so after 7 secs, while the drive can take two minutes or more, then the kernel will recreate the missing block and try to write it. The drive is unresponsive, the write times out, and the kernel assumes the drive is dead and kicks it from the array. That's why you need to increase the kernel timeout, because you can't reduce the drive timeout, and which is why a flaky hard drive will cause system response to fall off horrendously. Cheers, Wol ^ permalink raw reply [flat|nested] 37+ messages in thread
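The timeout change is per-device and does not survive a reboot, so it is normally scripted at boot; a minimal sketch, with the drive selection and the ERC value as assumptions (scterc only works on drives that still accept it):

  # raise the kernel's SCSI command timer well above any drive-internal recovery time
  for d in /sys/block/sd?/device/timeout; do echo 180 > "$d"; done
  # or, where supported, cap the drive's own error recovery at 7 seconds instead
  smartctl -l scterc,70,70 /dev/sdX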
* Re: Inactive arrays 2016-09-14 18:42 ` Wols Lists @ 2016-09-15 9:21 ` Brad Campbell 0 siblings, 0 replies; 37+ messages in thread From: Brad Campbell @ 2016-09-15 9:21 UTC (permalink / raw) To: Wols Lists, Daniel Sanabria, Chris Murphy; +Cc: Linux-RAID On 15/09/16 02:42, Wols Lists wrote: > The tl;dr version of the problem with Greens (and any other desktop > drive for that matter), if you haven't read it up yet, is that when the > kernel requests a read from a dodgy drive, it just sits there, > *unresponsive*, until the read succeeds or the drive times out. And the > drive will time out in its own good time. Yep. I've had great results using Greens but the trick is either TLER or adjusted timeout, *and* disable head parking & spindown. I'm lucky in that all my Greens are old enough they still have TLER. These are the oldest ones with power on hours taken from SMART: /dev/sdq - WDC WD20EARS-60MVWB0 - 5 years 157 days /dev/sdi - WDC WD20EARS-60MVWB0 - 5 years 157 days /dev/sdl - WDC WD20EARS-60MVWB0 - 5 years 157 days /dev/sdm - WDC WD20EARS-60MVWB0 - 5 years 157 days /dev/sdp - WDC WD20EARS-60MVWB0 - 5 years 157 days /dev/sdn - WDC WD20EARS-60MVWB0 - 5 years 157 days /dev/sdg - WDC WD20EARS-60MVWB0 - 5 years 158 days I have one that is only 3 years 242 days (replacement for an early failure), and anything newer than that is a Red. I started with 10 Greens and 7 are still humming along nicely. Like any commodity drive, get your timeouts right and keep them spinning. Regards, Brad. ^ permalink raw reply [flat|nested] 37+ messages in thread
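For the "disable head parking & spindown" part, the usual tools are hdparm and idle3ctl (from the idle3-tools package); a sketch only, and note that WD drives need a power cycle before an idle3 change takes effect:

  hdparm -S 0 /dev/sdX     # turn off the standby (spindown) timer
  idle3ctl -g /dev/sdX     # read the current idle3 head-parking timer
  idle3ctl -d /dev/sdX     # disable it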
Thread overview: 37+ messages
2016-08-02 7:36 Inactive arrays Daniel Sanabria
2016-08-02 10:17 ` Wols Lists
2016-08-02 10:45 ` Daniel Sanabria
2016-08-03 19:18 ` Daniel Sanabria
2016-08-03 21:31 ` Wols Lists
2016-09-11 18:48 ` Daniel Sanabria
2016-09-11 20:06 ` Daniel Sanabria
2016-09-12 19:41 ` Daniel Sanabria
2016-09-12 21:13 ` Daniel Sanabria
2016-09-12 21:37 ` Chris Murphy
2016-09-13 6:51 ` Daniel Sanabria
2016-09-13 15:04 ` Chris Murphy
2016-09-12 21:39 ` Wols Lists
2016-09-13 6:56 ` Daniel Sanabria
2016-09-13 7:02 ` Adam Goryachev
2016-09-13 15:20 ` Chris Murphy
2016-09-13 19:43 ` Daniel Sanabria
2016-09-13 19:52 ` Chris Murphy
2016-09-13 20:04 ` Daniel Sanabria
2016-09-13 20:13 ` Chris Murphy
2016-09-13 20:29 ` Daniel Sanabria
2016-09-13 20:36 ` Daniel Sanabria
2016-09-13 21:10 ` Chris Murphy
2016-09-13 21:46 ` Daniel Sanabria
2016-09-13 21:26 ` Wols Lists
2016-09-14 4:33 ` Chris Murphy
2016-09-14 10:36 ` Daniel Sanabria
2016-09-14 14:32 ` Chris Murphy
2016-09-14 14:57 ` Daniel Sanabria
2016-09-14 15:15 ` Chris Murphy
2016-09-14 15:47 ` Daniel Sanabria
2016-09-14 16:10 ` Chris Murphy
2016-09-14 16:13 ` Chris Murphy
2016-09-14 18:16 ` Daniel Sanabria
2016-09-14 18:37 ` Chris Murphy
2016-09-14 18:42 ` Wols Lists
2016-09-15 9:21 ` Brad Campbell