* Cannot add replacement hard drive to mdadm RAID5 array
@ 2021-03-12 22:55 Devon Beets
  2021-03-15 14:38 ` Artur Paszkiewicz
  0 siblings, 1 reply; 7+ messages in thread

From: Devon Beets @ 2021-03-12 22:55 UTC (permalink / raw)
To: linux-raid; +Cc: Glenn Wikle

Hello,

My colleague and I are trying to replace a failed hard drive in a four-drive RAID5 array (/dev/sda through /dev/sdd). The failed drive is sdb. We physically removed it and installed a new drive with identical specifications, but we did not first mark the failed drive with mdadm --fail.

Upon booting with the new /dev/sdb installed, instead of the usual two md entries (/dev/md127, the IMSM container, and /dev/md126, the actual array) there are now three: md125 through md127. md127 is the IMSM container holding sda, sdc, and sdd. md125 is an unwanted new container holding only sdb. md126 is the actual array and contains only sda, sdc, and sdd. We tried using --stop and --remove to get rid of md125, then adding sdb to md127 and reassembling to see whether it joins md126. It does not.

Below is the output of several commands for additional diagnostic information. Please let me know if you need more.

Note: this output was captured after a fresh reboot, before any of the commands we tried as fixes. The system returns to this state after every reboot we have tried so far.
uname -a
Linux aerospace-pr3d-app 4.4.0-194-generic #226-Ubuntu SMP Wed Oct 21 10:19:36 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

sudo mdadm --version
mdadm - v4.1-126-gbdbe7f8 - 2021-03-09

sudo smartctl -H -i -l scterc /dev/sda
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000NX0253
Serial Number:    W461SCHM
LU WWN Device Id: 5 000c50 0b426d2d0
Firmware Version: SN05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 11 15:07:30 2021 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)

sudo smartctl -H -i -l scterc /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000NX0253
Serial Number:    W462MZ0R
LU WWN Device Id: 5 000c50 0c569b51c
Firmware Version: SN05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 11 15:09:34 2021 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)

sudo smartctl -H -i -l scterc /dev/sdc
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000NX0253
Serial Number:    W461NLPM
LU WWN Device Id: 5 000c50 0b426f335
Firmware Version: SN05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 11 15:14:38 2021 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)

sudo smartctl -H -i -l scterc /dev/sdd
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000NX0253
Serial Number:    W461NHAB
LU WWN Device Id: 5 000c50 0b426f8a4
Firmware Version: SN05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 11 15:16:24 2021 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)

sudo mdadm --examine /dev/sda
/dev/sda:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.3.00
    Orig Family : 154b243e
         Family : 154b243e
     Generation : 000003aa
  Creation Time : Unknown
     Attributes : All supported
           UUID : 72360627:bb745f4c:aedafaab:e25d3123
       Checksum : 21ae5a2a correct
    MPB Sectors : 2
          Disks : 4
   RAID Devices : 1

  Disk00 Serial : W461SCHM
          State : active
             Id : 00000000
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

[Data]:
       Subarray : 0
           UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
     RAID Level : 5 <-- 5
        Members : 4 <-- 4
          Slots : [U_UU] <-- [U_UU]
    Failed disk : 1
      This Slot : 0
    Sector Size : 512
     Array Size : 11135008768 (5.19 TiB 5.70 TB)
   Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
  Sector Offset : 0
    Num Stripes : 28997420
     Chunk Size : 64 KiB <-- 64 KiB
       Reserved : 0
  Migrate State : repair
      Map State : degraded <-- degraded
     Checkpoint : 462393 (512)
    Dirty State : dirty
     RWH Policy : off
      Volume ID : 1

  Disk01 Serial : W461S13X:0
          State : active
             Id : ffffffff
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk02 Serial : W461NLPM
          State : active
             Id : 00000002
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk03 Serial : W461NHAB
          State : active
             Id : 00000003
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

sudo mdadm --examine /dev/sdb
/dev/sdb:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.0.00
    Orig Family : 00000000
         Family : e5cd8601
     Generation : 00000001
  Creation Time : Unknown
     Attributes : All supported
           UUID : 00000000:00000000:00000000:00000000
       Checksum : cb9b0c02 correct
    MPB Sectors : 1
          Disks : 1
   RAID Devices : 0

  Disk00 Serial : W462MZ0R
          State : spare
             Id : 04000000
    Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)

    Disk Serial : W462MZ0R
          State : spare
             Id : 04000000
    Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)

sudo mdadm --examine /dev/sdc
/dev/sdc:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.3.00
    Orig Family : 154b243e
         Family : 154b243e
     Generation : 000003aa
  Creation Time : Unknown
     Attributes : All supported
           UUID : 72360627:bb745f4c:aedafaab:e25d3123
       Checksum : 21ae5a2a correct
    MPB Sectors : 2
          Disks : 4
   RAID Devices : 1

  Disk02 Serial : W461NLPM
          State : active
             Id : 00000002
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

[Data]:
       Subarray : 0
           UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
     RAID Level : 5 <-- 5
        Members : 4 <-- 4
          Slots : [U_UU] <-- [U_UU]
    Failed disk : 1
      This Slot : 2
    Sector Size : 512
     Array Size : 11135008768 (5.19 TiB 5.70 TB)
   Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
  Sector Offset : 0
    Num Stripes : 28997420
     Chunk Size : 64 KiB <-- 64 KiB
       Reserved : 0
  Migrate State : repair
      Map State : degraded <-- degraded
     Checkpoint : 462393 (512)
    Dirty State : dirty
     RWH Policy : off
      Volume ID : 1

  Disk00 Serial : W461SCHM
          State : active
             Id : 00000000
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk01 Serial : W461S13X:0
          State : active
             Id : ffffffff
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk03 Serial : W461NHAB
          State : active
             Id : 00000003
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

sudo mdadm --examine /dev/sdd
/dev/sdd:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.3.00
    Orig Family : 154b243e
         Family : 154b243e
     Generation : 000003aa
  Creation Time : Unknown
     Attributes : All supported
           UUID : 72360627:bb745f4c:aedafaab:e25d3123
       Checksum : 21ae5a2a correct
    MPB Sectors : 2
          Disks : 4
   RAID Devices : 1

  Disk03 Serial : W461NHAB
          State : active
             Id : 00000003
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

[Data]:
       Subarray : 0
           UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
     RAID Level : 5 <-- 5
        Members : 4 <-- 4
          Slots : [U_UU] <-- [U_UU]
    Failed disk : 1
      This Slot : 3
    Sector Size : 512
     Array Size : 11135008768 (5.19 TiB 5.70 TB)
   Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
  Sector Offset : 0
    Num Stripes : 28997420
     Chunk Size : 64 KiB <-- 64 KiB
       Reserved : 0
  Migrate State : repair
      Map State : degraded <-- degraded
     Checkpoint : 462393 (512)
    Dirty State : dirty
     RWH Policy : off
      Volume ID : 1

  Disk00 Serial : W461SCHM
          State : active
             Id : 00000000
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk01 Serial : W461S13X:0
          State : active
             Id : ffffffff
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk02 Serial : W461NLPM
          State : active
             Id : 00000002
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

sudo mdadm --detail /dev/md125
/dev/md125:
           Version : imsm
        Raid Level : container
     Total Devices : 1
   Working Devices : 1

     Member Arrays :

    Number   Major   Minor   RaidDevice
       -       8       16        -        /dev/sdb

sudo mdadm --detail /dev/md126
/dev/md126:
         Container : /dev/md/imsm0, member 0
        Raid Level : raid5
     Used Dev Size : 1855835904 (1769.86 GiB 1900.38 GB)
      Raid Devices : 4
     Total Devices : 3

             State : active, FAILED, Not Started
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-asymmetric
        Chunk Size : 64K

Consistency Policy : unknown

              UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
    Number   Major   Minor   RaidDevice   State
       -       0       0         0        removed
       -       0       0         1        removed
       -       0       0         2        removed
       -       0       0         3        removed

       -       8       0         0        sync     /dev/sda
       -       8       32        2        sync     /dev/sdc
       -       8       48        3        sync     /dev/sdd

sudo mdadm --detail /dev/md127
/dev/md127:
           Version : imsm
        Raid Level : container
     Total Devices : 3
   Working Devices : 3

              UUID : 72360627:bb745f4c:aedafaab:e25d3123
     Member Arrays : /dev/md126

    Number   Major   Minor   RaidDevice
       -       8       0         -        /dev/sda
       -       8       32        -        /dev/sdc
       -       8       48        -        /dev/sdd

lsdrv
**Warning** The following utility(ies) failed to execute:
    sginfo
Some information may be missing.

PCI [nvme] 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
└nvme nvme0 Samsung SSD 970 EVO Plus 500GB {S4P2NF0M501223D}
 └nvme0n1 465.76g [259:0] Empty/Unknown
  ├nvme0n1p1 512.00m [259:1] Empty/Unknown
  │└Mounted as /dev/nvme0n1p1 @ /boot/efi
  ├nvme0n1p2 732.00m [259:2] Empty/Unknown
  │└Mounted as /dev/nvme0n1p2 @ /boot
  └nvme0n1p3 464.54g [259:3] Empty/Unknown
   ├dm-0 463.59g [252:0] Empty/Unknown
   │└Mounted as /dev/mapper/customer--pr3d--app--vg-root @ /
   └dm-1 980.00m [252:1] Empty/Unknown
PCI [ahci] 00:11.5 SATA controller: Intel Corporation C620 Series Chipset Family SSATA Controller [AHCI mode] (rev 09)
└scsi 0:x:x:x [Empty]
PCI [ahci] 00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 09)
├scsi 2:0:0:0 ATA ST2000NX0253
│└sda 1.82t [8:0] Empty/Unknown
│ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
│ │      Empty/Unknown
│ └md127 0.00k [9:127] MD vexternal:imsm () inactive, None (None) None {None}
│        Empty/Unknown
├scsi 3:0:0:0 ATA ST2000NX0253
│└sdb 1.82t [8:16] Empty/Unknown
│ └md125 0.00k [9:125] MD vexternal:imsm () inactive, None (None) None {None}
│        Empty/Unknown
├scsi 4:0:0:0 ATA ST2000NX0253
│└sdc 1.82t [8:32] Empty/Unknown
│ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
│ │      Empty/Unknown
├scsi 5:0:0:0 ATA ST2000NX0253
│└sdd 1.82t [8:48] Empty/Unknown
│ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
│ │      Empty/Unknown
└scsi 6:0:0:0 Slimtype DVD A DS8ACSH
 └sr0 1.00g [11:0] Empty/Unknown
Other Block Devices
├loop0 0.00k [7:0] Empty/Unknown
├loop1 0.00k [7:1] Empty/Unknown
├loop2 0.00k [7:2] Empty/Unknown
├loop3 0.00k [7:3] Empty/Unknown
├loop4 0.00k [7:4] Empty/Unknown
├loop5 0.00k [7:5] Empty/Unknown
├loop6 0.00k [7:6] Empty/Unknown
└loop7 0.00k [7:7] Empty/Unknown

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md125 : inactive sdb[0](S)
      1105 blocks super external:imsm

md126 : inactive sda[2] sdc[1] sdd[0]
      5567507712 blocks super external:/md127/0

md127 : inactive sdc[2](S) sdd[1](S) sda[0](S)
      9459 blocks super external:imsm

unused devices: <none>

Thanks for the help!
Devon Beets

^ permalink raw reply	[flat|nested] 7+ messages in thread
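[Editorial note: the /proc/mdstat output above can be checked mechanically. The sketch below is illustrative only, not part of the original thread; it copies the three mdstat device lines from this message into a variable and uses awk to list each array's member disks, making it easy to see that sdb sits alone in the stray md125 container instead of joining md127.]

```shell
#!/bin/sh
# Relevant device lines copied verbatim from the /proc/mdstat output above.
mdstat='md125 : inactive sdb[0](S)
md126 : inactive sda[2] sdc[1] sdd[0]
md127 : inactive sdc[2](S) sdd[1](S) sda[0](S)'

# members <mdX>: print the bare member disk names of one md device,
# stripping the [slot] and (S) suffixes mdstat appends.
members() {
    printf '%s\n' "$mdstat" | awk -v md="$1" '$1 == md {
        out = ""
        for (i = 4; i <= NF; i++) {
            sub(/\[.*/, "", $i)
            out = out (out == "" ? "" : " ") $i
        }
        print out
    }'
}

members md125   # prints: sdb
members md127   # prints: sdc sdd sda  (the real container, missing sdb)
```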
* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-03-12 22:55 Cannot add replacement hard drive to mdadm RAID5 array Devon Beets
@ 2021-03-15 14:38 ` Artur Paszkiewicz
  2021-03-16  8:15   ` Tkaczyk, Mariusz
  0 siblings, 1 reply; 7+ messages in thread

From: Artur Paszkiewicz @ 2021-03-15 14:38 UTC (permalink / raw)
To: Devon Beets, linux-raid; +Cc: Glenn Wikle

On 3/12/21 11:55 PM, Devon Beets wrote:
> Hello,
>
> My colleague and I have been trying to replace a failed hard drive in a four-drive RAID5 array (/dev/sda to /dev/sdd). The failed drive is sdb. We have physically removed the hard drive and replaced it with a new drive that has identical specifications. We did not first use mdadm to set the failed hard drive with --fail.
>
> Upon booting the system with the new /dev/sdb drive installed, we see that instead of the usual two md entries (/dev/md127 which is an IMSM container and /dev/md126 which is the actual array) there are now three entries: md125 to md127. md127 is the IMSM container for sda, sdc, and sdd. md125 is a new container for sdb that we do not want. md126 is the actual array and only contains sda, sdc, and sdd. We tried to use --stop and --remove to get rid of md125, then add sdb to md127, and assemble to see if it adds to md126. It does not.
>
> [...]
>
> cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
> md125 : inactive sdb[0](S)
>       1105 blocks super external:imsm
>
> md126 : inactive sda[2] sdc[1] sdd[0]
>       5567507712 blocks super external:/md127/0
>
> md127 : inactive sdc[2](S) sdd[1](S) sda[0](S)
>       9459 blocks super external:imsm
>
> unused devices: <none>
>
> Thanks for the help!
> Devon Beets

Hi Devon,

The array is in a dirty, degraded state. It does not start automatically because there is a risk of silent data corruption, i.e. the RAID write hole. You can force it to start with:

# mdadm -R --force /dev/md126

You will need mdadm built with this commit for it to work:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=7b99edab2834d5d08ef774b4cff784caaa1a186f

It may be a good idea to copy the array contents with dd before you fsck or mount the filesystem, in case the recovery goes wrong.

Then stop the second container and add the new drive to the array:

# mdadm -S /dev/md125
# mdadm -a /dev/md127 /dev/sdb

Rebuild should begin at this point.

Regards,
Artur

^ permalink raw reply	[flat|nested] 7+ messages in thread
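[Editorial note: the recovery sequence above can be collected into one script. The sketch below is not from the thread; it assumes, as in this case, that md126 is the dirty degraded IMSM volume, md125 the stray container that grabbed the replacement disk sdb, and md127 the real container. It defaults to a dry run that only prints the commands; to execute for real, remove the $RUN prefixes and run as root, only after backing up the array contents with dd as advised above.]

```shell
#!/bin/sh
# Dry-run sketch of the recovery steps from this reply.
# RUN defaults to "echo", so each mdadm command is printed, not executed.
RUN="${RUN:-echo}"

# 1. Force-start the dirty degraded array (requires an mdadm build that
#    includes commit 7b99edab, per the link above):
$RUN mdadm -R --force /dev/md126

# 2. Stop the unwanted container holding only the replacement disk:
$RUN mdadm -S /dev/md125

# 3. Add the replacement disk to the real IMSM container; the rebuild of
#    the member RAID5 volume should then begin automatically:
$RUN mdadm -a /dev/md127 /dev/sdb
```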
* Re: Cannot add replacement hard drive to mdadm RAID5 array 2021-03-15 14:38 ` Artur Paszkiewicz @ 2021-03-16 8:15 ` Tkaczyk, Mariusz 2021-04-06 20:05 ` Devon Beets 0 siblings, 1 reply; 7+ messages in thread From: Tkaczyk, Mariusz @ 2021-03-16 8:15 UTC (permalink / raw) To: Artur Paszkiewicz, Devon Beets, linux-raid; +Cc: Glenn Wikle On 15.03.2021 15:38, Artur Paszkiewicz wrote: > On 3/12/21 11:55 PM, Devon Beets wrote: >> Hello, >> >> My colleague and I have been trying to replace a failed hard drive in a four-drive RAID5 array (/dev/sda to /dev/sdd). The failed drive is sdb. We have physically removed the hard drive and replaced it with a new drive that has identical specifications. We did not first use mdadm to set the failed hard drive with --fail. >> >> Upon booting the system with the new /dev/sdb drive installed, we see that instead of the usual two md entries (/dev/md127 which is an IMSM container and /dev/md126 which is the actual array) there are now three entries: md125 to md127. md127 is the IMSM container for sda, sdc, and sdd. md125 is a new container for sdb that we do not want. md126 is the actual array and only contains sda, sdc, and sdd. We tried to use --stop and --remove to get rid of md125, then add sdb to md127, and assemble to see if it adds to md126. It does not. >> >> Below is the output of some commands for additional diagnostic information. Please let me know if you need more. >> >> Note: The output of these commands is after a fresh reboot, without/before all the other commands we tried to fix it. It gets reset to this state after every reboot we tried so far. 
>> >> uname -a >> Linux aerospace-pr3d-app 4.4.0-194-generic #226-Ubuntu SMP Wed Oct 21 10:19:36 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux >> >> sudo mdadm --version >> mdadm - v4.1-126-gbdbe7f8 - 2021-03-09 >> >> sudo smartctl -H -i -l scterc /dev/sda >> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build) >> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org >> >> === START OF INFORMATION SECTION === >> Device Model: ST2000NX0253 >> Serial Number: W461SCHM >> LU WWN Device Id: 5 000c50 0b426d2d0 >> Firmware Version: SN05 >> User Capacity: 2,000,398,934,016 bytes [2.00 TB] >> Sector Sizes: 512 bytes logical, 4096 bytes physical >> Rotation Rate: 7200 rpm >> Form Factor: 2.5 inches >> Device is: Not in smartctl database [for details use: -P showall] >> ATA Version is: ACS-3 (minor revision not indicated) >> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) >> Local Time is: Thu Mar 11 15:07:30 2021 MST >> SMART support is: Available - device has SMART capability. 
>> SMART support is: Enabled >> >> === START OF READ SMART DATA SECTION === >> SMART overall-health self-assessment test result: PASSED >> >> SCT Error Recovery Control: >> Read: 100 (10.0 seconds) >> Write: 100 (10.0 seconds) >> >> >> sudo smartctl -H -i -l scterc /dev/sdb >> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build) >> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org >> >> === START OF INFORMATION SECTION === >> Device Model: ST2000NX0253 >> Serial Number: W462MZ0R >> LU WWN Device Id: 5 000c50 0c569b51c >> Firmware Version: SN05 >> User Capacity: 2,000,398,934,016 bytes [2.00 TB] >> Sector Sizes: 512 bytes logical, 4096 bytes physical >> Rotation Rate: 7200 rpm >> Form Factor: 2.5 inches >> Device is: Not in smartctl database [for details use: -P showall] >> ATA Version is: ACS-3 (minor revision not indicated) >> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) >> Local Time is: Thu Mar 11 15:09:34 2021 MST >> SMART support is: Available - device has SMART capability. 
>> SMART support is: Enabled >> >> === START OF READ SMART DATA SECTION === >> SMART overall-health self-assessment test result: PASSED >> >> SCT Error Recovery Control: >> Read: 100 (10.0 seconds) >> Write: 100 (10.0 seconds) >> >> sudo smartctl -H -i -l scterc /dev/sdc >> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build) >> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org >> >> === START OF INFORMATION SECTION === >> Device Model: ST2000NX0253 >> Serial Number: W461NLPM >> LU WWN Device Id: 5 000c50 0b426f335 >> Firmware Version: SN05 >> User Capacity: 2,000,398,934,016 bytes [2.00 TB] >> Sector Sizes: 512 bytes logical, 4096 bytes physical >> Rotation Rate: 7200 rpm >> Form Factor: 2.5 inches >> Device is: Not in smartctl database [for details use: -P showall] >> ATA Version is: ACS-3 (minor revision not indicated) >> SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s) >> Local Time is: Thu Mar 11 15:14:38 2021 MST >> SMART support is: Available - device has SMART capability. 
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>            Read: 100 (10.0 seconds)
>>           Write: 100 (10.0 seconds)
>>
>> sudo smartctl -H -i -l scterc /dev/sdd
>> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST2000NX0253
>> Serial Number:    W461NHAB
>> LU WWN Device Id: 5 000c50 0b426f8a4
>> Firmware Version: SN05
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      2.5 inches
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-3 (minor revision not indicated)
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Mar 11 15:16:24 2021 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>            Read: 100 (10.0 seconds)
>>           Write: 100 (10.0 seconds)
>>
>> sudo mdadm --examine /dev/sda
>> /dev/sda:
>>           Magic : Intel Raid ISM Cfg Sig.
>>         Version : 1.3.00
>>     Orig Family : 154b243e
>>          Family : 154b243e
>>      Generation : 000003aa
>>   Creation Time : Unknown
>>      Attributes : All supported
>>            UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>        Checksum : 21ae5a2a correct
>>     MPB Sectors : 2
>>           Disks : 4
>>    RAID Devices : 1
>>
>>   Disk00 Serial : W461SCHM
>>           State : active
>>              Id : 00000000
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> [Data]:
>>        Subarray : 0
>>            UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>      RAID Level : 5 <-- 5
>>         Members : 4 <-- 4
>>           Slots : [U_UU] <-- [U_UU]
>>     Failed disk : 1
>>       This Slot : 0
>>     Sector Size : 512
>>      Array Size : 11135008768 (5.19 TiB 5.70 TB)
>>    Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
>>   Sector Offset : 0
>>     Num Stripes : 28997420
>>      Chunk Size : 64 KiB <-- 64 KiB
>>        Reserved : 0
>>   Migrate State : repair
>>       Map State : degraded <-- degraded
>>      Checkpoint : 462393 (512)
>>     Dirty State : dirty
>>      RWH Policy : off
>>       Volume ID : 1
>>
>>   Disk01 Serial : W461S13X:0
>>           State : active
>>              Id : ffffffff
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>   Disk02 Serial : W461NLPM
>>           State : active
>>              Id : 00000002
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>   Disk03 Serial : W461NHAB
>>           State : active
>>              Id : 00000003
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> sudo mdadm --examine /dev/sdb
>> /dev/sdb:
>>           Magic : Intel Raid ISM Cfg Sig.
>>         Version : 1.0.00
>>     Orig Family : 00000000
>>          Family : e5cd8601
>>      Generation : 00000001
>>   Creation Time : Unknown
>>      Attributes : All supported
>>            UUID : 00000000:00000000:00000000:00000000
>>        Checksum : cb9b0c02 correct
>>     MPB Sectors : 1
>>           Disks : 1
>>    RAID Devices : 0
>>
>>   Disk00 Serial : W462MZ0R
>>           State : spare
>>              Id : 04000000
>>     Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)
>>
>>     Disk Serial : W462MZ0R
>>           State : spare
>>              Id : 04000000
>>     Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)
>>
>> sudo mdadm --examine /dev/sdc
>> /dev/sdc:
>>           Magic : Intel Raid ISM Cfg Sig.
>>         Version : 1.3.00
>>     Orig Family : 154b243e
>>          Family : 154b243e
>>      Generation : 000003aa
>>   Creation Time : Unknown
>>      Attributes : All supported
>>            UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>        Checksum : 21ae5a2a correct
>>     MPB Sectors : 2
>>           Disks : 4
>>    RAID Devices : 1
>>
>>   Disk02 Serial : W461NLPM
>>           State : active
>>              Id : 00000002
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> [Data]:
>>        Subarray : 0
>>            UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>      RAID Level : 5 <-- 5
>>         Members : 4 <-- 4
>>           Slots : [U_UU] <-- [U_UU]
>>     Failed disk : 1
>>       This Slot : 2
>>     Sector Size : 512
>>      Array Size : 11135008768 (5.19 TiB 5.70 TB)
>>    Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
>>   Sector Offset : 0
>>     Num Stripes : 28997420
>>      Chunk Size : 64 KiB <-- 64 KiB
>>        Reserved : 0
>>   Migrate State : repair
>>       Map State : degraded <-- degraded
>>      Checkpoint : 462393 (512)
>>     Dirty State : dirty
>>      RWH Policy : off
>>       Volume ID : 1
>>
>>   Disk00 Serial : W461SCHM
>>           State : active
>>              Id : 00000000
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>   Disk01 Serial : W461S13X:0
>>           State : active
>>              Id : ffffffff
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>   Disk03 Serial : W461NHAB
>>           State : active
>>              Id : 00000003
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> sudo mdadm --examine /dev/sdd
>> /dev/sdd:
>>           Magic : Intel Raid ISM Cfg Sig.
>>         Version : 1.3.00
>>     Orig Family : 154b243e
>>          Family : 154b243e
>>      Generation : 000003aa
>>   Creation Time : Unknown
>>      Attributes : All supported
>>            UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>        Checksum : 21ae5a2a correct
>>     MPB Sectors : 2
>>           Disks : 4
>>    RAID Devices : 1
>>
>>   Disk03 Serial : W461NHAB
>>           State : active
>>              Id : 00000003
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> [Data]:
>>        Subarray : 0
>>            UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>      RAID Level : 5 <-- 5
>>         Members : 4 <-- 4
>>           Slots : [U_UU] <-- [U_UU]
>>     Failed disk : 1
>>       This Slot : 3
>>     Sector Size : 512
>>      Array Size : 11135008768 (5.19 TiB 5.70 TB)
>>    Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
>>   Sector Offset : 0
>>     Num Stripes : 28997420
>>      Chunk Size : 64 KiB <-- 64 KiB
>>        Reserved : 0
>>   Migrate State : repair
>>       Map State : degraded <-- degraded
>>      Checkpoint : 462393 (512)
>>     Dirty State : dirty
>>      RWH Policy : off
>>       Volume ID : 1
>>
>>   Disk00 Serial : W461SCHM
>>           State : active
>>              Id : 00000000
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>   Disk01 Serial : W461S13X:0
>>           State : active
>>              Id : ffffffff
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>   Disk02 Serial : W461NLPM
>>           State : active
>>              Id : 00000002
>>     Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> sudo mdadm --detail /dev/md125
>> /dev/md125:
>>         Version : imsm
>>      Raid Level : container
>>   Total Devices : 1
>>
>> Working Devices : 1
>>
>>   Member Arrays :
>>
>>     Number   Major   Minor   RaidDevice
>>
>>        -       8       16        -        /dev/sdb
>>
>> sudo mdadm --detail /dev/md126
>> /dev/md126:
>>       Container : /dev/md/imsm0, member 0
>>      Raid Level : raid5
>>   Used Dev Size : 1855835904 (1769.86 GiB 1900.38 GB)
>>    Raid Devices : 4
>>   Total Devices : 3
>>
>>           State : active, FAILED, Not Started
>>  Active Devices : 3
>> Working Devices : 3
>>  Failed Devices : 0
>>   Spare Devices : 0
>>
>>          Layout : left-asymmetric
>>      Chunk Size : 64K
>>
>> Consistency Policy : unknown
>>
>>            UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>     Number   Major   Minor   RaidDevice   State
>>        -       0       0        0         removed
>>        -       0       0        1         removed
>>        -       0       0        2         removed
>>        -       0       0        3         removed
>>
>>        -       8       0        0         sync   /dev/sda
>>        -       8       32       2         sync   /dev/sdc
>>        -       8       48       3         sync   /dev/sdd
>>
>> sudo mdadm --detail /dev/md127
>> /dev/md127:
>>         Version : imsm
>>      Raid Level : container
>>   Total Devices : 3
>>
>> Working Devices : 3
>>
>>            UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>   Member Arrays : /dev/md126
>>
>>     Number   Major   Minor   RaidDevice
>>
>>        -       8       0        -        /dev/sda
>>        -       8       32       -        /dev/sdc
>>        -       8       48       -        /dev/sdd
>>
>> lsdrv
>> **Warning** The following utility(ies) failed to execute:
>>   sginfo
>> Some information may be missing.
>>
>> PCI [nvme] 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
>> └nvme nvme0 Samsung SSD 970 EVO Plus 500GB {S4P2NF0M501223D}
>>  └nvme0n1 465.76g [259:0] Empty/Unknown
>>   ├nvme0n1p1 512.00m [259:1] Empty/Unknown
>>   │└Mounted as /dev/nvme0n1p1 @ /boot/efi
>>   ├nvme0n1p2 732.00m [259:2] Empty/Unknown
>>   │└Mounted as /dev/nvme0n1p2 @ /boot
>>   └nvme0n1p3 464.54g [259:3] Empty/Unknown
>>    ├dm-0 463.59g [252:0] Empty/Unknown
>>    │└Mounted as /dev/mapper/customer--pr3d--app--vg-root @ /
>>    └dm-1 980.00m [252:1] Empty/Unknown
>> PCI [ahci] 00:11.5 SATA controller: Intel Corporation C620 Series Chipset Family SSATA Controller [AHCI mode] (rev 09)
>> └scsi 0:x:x:x [Empty]
>> PCI [ahci] 00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 09)
>> ├scsi 2:0:0:0 ATA ST2000NX0253
>> │└sda 1.82t [8:0] Empty/Unknown
>> │ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
>> │ │      Empty/Unknown
>> │ └md127 0.00k [9:127] MD vexternal:imsm () inactive, None (None) None {None}
>> │        Empty/Unknown
>> ├scsi 3:0:0:0 ATA ST2000NX0253
>> │└sdb 1.82t [8:16] Empty/Unknown
>> │ └md125 0.00k [9:125] MD vexternal:imsm () inactive, None (None) None {None}
>> │        Empty/Unknown
>> ├scsi 4:0:0:0 ATA ST2000NX0253
>> │└sdc 1.82t [8:32] Empty/Unknown
>> │ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
>> │ │      Empty/Unknown
>> ├scsi 5:0:0:0 ATA ST2000NX0253
>> │└sdd 1.82t [8:48] Empty/Unknown
>> │ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
>> │ │      Empty/Unknown
>> └scsi 6:0:0:0 Slimtype DVD A DS8ACSH
>>  └sr0 1.00g [11:0] Empty/Unknown
>> Other Block Devices
>> ├loop0 0.00k [7:0] Empty/Unknown
>> ├loop1 0.00k [7:1] Empty/Unknown
>> ├loop2 0.00k [7:2] Empty/Unknown
>> ├loop3 0.00k [7:3] Empty/Unknown
>> ├loop4 0.00k [7:4] Empty/Unknown
>> ├loop5 0.00k [7:5] Empty/Unknown
>> ├loop6 0.00k [7:6] Empty/Unknown
>> └loop7 0.00k [7:7] Empty/Unknown
>>
>> cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
>> md125 : inactive sdb[0](S)
>>       1105 blocks super external:imsm
>>
>> md126 : inactive sda[2] sdc[1] sdd[0]
>>       5567507712 blocks super external:/md127/0
>>
>> md127 : inactive sdc[2](S) sdd[1](S) sda[0](S)
>>       9459 blocks super external:imsm
>>
>> unused devices: <none>
>>
>> Thanks for the help!
>> Devon Beets

> Hi Devon,
>
> The array is in dirty degraded state. It does not start automatically because
> there is a risk of silent data corruption, i.e. RAID write hole. You can force
> it to start with:
>
> # mdadm -R --force /dev/md126
>
> You will need mdadm built with this commit for it to work:
>
> https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=7b99edab2834d5d08ef774b4cff784caaa1a186f
>
> It may be a good idea to copy the array contents with dd before you fsck or
> mount the filesystem in case the recovery goes wrong.
>
> Then stop the second container and add the new drive to the array:
>
> # mdadm -S /dev/md125
> # mdadm -a /dev/md127 /dev/sdb
>
> Rebuild should begin at this point.
>
> Regards,
> Artur

Hello,

also see:
https://lore.kernel.org/linux-raid/ac370d79-95e8-d0a1-0991-fb12b128818c@linux.intel.com/T/#t

Mariusz

^ permalink raw reply	[flat|nested] 7+ messages in thread
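Artur's "copy the array contents with dd" precaution deserves a concrete shape. Below is a minimal sketch of the idiom; the real invocation would read `/dev/md126` and write an image on separate storage, but here two throwaway temp files stand in for the device and the image so the sketch is self-contained. The path names in the comment are illustrative only.

```shell
#!/bin/sh
# On the real system the backup step would look roughly like:
#   dd if=/dev/md126 of=/backup/md126.img bs=1M conv=noerror,sync status=progress
# conv=noerror,sync keeps dd going past unreadable sectors, padding them
# with zeros, so one bad block does not abort the whole image.
# Demo below uses temp files as stand-ins for the device and the image:
src=$(mktemp); img=$(mktemp)
printf 'example-array-contents' > "$src"
dd if="$src" of="$img" bs=1M conv=noerror,sync status=none
head -c 22 "$img"   # the original bytes survive at the front of the image
rm -f "$src" "$img"
```

Note that `conv=sync` pads the output up to a whole block, so the image can be larger than the source; that is harmless for a raw safety copy.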
* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-03-16  8:15     ` Tkaczyk, Mariusz
@ 2021-04-06 20:05       ` Devon Beets
  2021-04-07 10:25         ` Artur Paszkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Devon Beets @ 2021-04-06 20:05 UTC (permalink / raw)
To: Tkaczyk, Mariusz, Artur Paszkiewicz, linux-raid; +Cc: Glenn Wikle

Hello Artur,

I followed your recommended steps, using the --force flag to run my /dev/md126 array, in addition to stopping the extra container /dev/md125 and adding the new disk /dev/sdb to the parent container /dev/md127. It still does not work.

Output of mdstat prior to running the commands:

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md125 : inactive sdb[0](S)
      1105 blocks super external:imsm

md126 : inactive sda[2] sdc[1] sdd[0]
      5567507712 blocks super external:/md127/0

md127 : inactive sdc[2](S) sdd[1](S) sda[0](S)
      9459 blocks super external:imsm

unused devices: <none>

Output of the recommended commands for adding the new disk to the RAID5 array:

sudo mdadm -R --force /dev/md126
mdadm: array /dev/md/Data now has 3 devices (0 new)

sudo mdadm -S /dev/md125
mdadm: stopped /dev/md125

sudo mdadm -a /dev/md127 /dev/sdb
mdadm: added /dev/sdb

Output of mdstat after running the commands. It shows that both md126 and md127 are inactive, and no RAID resync is happening:

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md126 : inactive sda[2] sdc[1] sdd[0]
      5567507712 blocks super external:/md127/0

md127 : inactive sdb[3](S) sdc[2](S) sdd[1](S) sda[0](S)
      10564 blocks super external:imsm

unused devices: <none>

Output of examination of the new disk /dev/sdb. It still shows no UUID and a spare state, even after being added to the md127 parent container:

sudo mdadm -E /dev/sdb
/dev/sdb:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.0.00
    Orig Family : 00000000
         Family : e4cd8601
     Generation : 00000001
  Creation Time : Unknown
     Attributes : All supported
           UUID : 00000000:00000000:00000000:00000000
       Checksum : c99b0c02 correct
    MPB Sectors : 1
          Disks : 1
   RAID Devices : 0

  Disk00 Serial : W462MZ0R
          State : spare
             Id : 03000000
    Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)

    Disk Serial : W462MZ0R
          State : spare
             Id : 03000000
    Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)

Output of details of parent container md127, which does show that the new disk /dev/sdb is in the container after being added:

sudo mdadm -D /dev/md127
/dev/md127:
        Version : imsm
     Raid Level : container
  Total Devices : 4

Working Devices : 4

           UUID : 72360627:bb745f4c:aedafaab:e25d3123
  Member Arrays : /dev/md126

    Number   Major   Minor   RaidDevice

       -       8       0        -        /dev/sda
       -       8       16       -        /dev/sdb
       -       8       32       -        /dev/sdc
       -       8       48       -        /dev/sdd

Output of details on the real RAID array md126, which does not show the new disk sdb, only the other three devices sda, sdc, and sdd. The state shows active, FAILED, Not Started, and four devices are listed as removed:

sudo mdadm -D /dev/md126
/dev/md126:
      Container : /dev/md/imsm0, member 0
     Raid Level : raid5
  Used Dev Size : 1855835904 (1769.86 GiB 1900.38 GB)
   Raid Devices : 4
  Total Devices : 3

          State : active, FAILED, Not Started
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-asymmetric
     Chunk Size : 64K

Consistency Policy : unknown

           UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
    Number   Major   Minor   RaidDevice   State
       -       0       0        0         removed
       -       0       0        1         removed
       -       0       0        2         removed
       -       0       0        3         removed

       -       8       0        0         sync   /dev/sda
       -       8       32       2         sync   /dev/sdc
       -       8       48       3         sync   /dev/sdd

Here is an example of the details of a healthy RAID array on another system. I want to recreate this result if possible.
sudo mdadm -D /dev/md126
/dev/md126:
      Container : /dev/md/imsm0, member 0
     Raid Level : raid5
     Array Size : 5567516672 (5309.60 GiB 5701.14 GB)
  Used Dev Size : 1855838976 (1769.87 GiB 1900.38 GB)
   Raid Devices : 4
  Total Devices : 4

          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-asymmetric
     Chunk Size : 64K

Consistency Policy : resync

           UUID : 4541a34d:4a84d4dc:3d1b0695:96f58987
    Number   Major   Minor   RaidDevice   State
       3       8       0        0         active sync   /dev/sda
       2       8       16       1         active sync   /dev/sdb
       1       8       32       2         active sync   /dev/sdc
       0       8       48       3         active sync   /dev/sdd

Please let me know if there is any additional information I can provide to help troubleshooting.

Thanks!
Devon Beets

From: Tkaczyk, Mariusz <mariusz.tkaczyk@linux.intel.com>
Sent: Tuesday, March 16, 2021 2:15 AM
To: Artur Paszkiewicz <artur.paszkiewicz@intel.com>; Devon Beets <devon@sigmalabsinc.com>; linux-raid@vger.kernel.org <linux-raid@vger.kernel.org>
Cc: Glenn Wikle <gwikle@sigmalabsinc.com>
Subject: Re: Cannot add replacement hard drive to mdadm RAID5 array

On 15.03.2021 15:38, Artur Paszkiewicz wrote:
> On 3/12/21 11:55 PM, Devon Beets wrote:
>> [original report and diagnostics trimmed; quoted in full earlier in this thread]
>
> Hi Devon,
>
> The array is in dirty degraded state. It does not start automatically because
> there is a risk of silent data corruption, i.e. RAID write hole. You can force
> it to start with:
>
> # mdadm -R --force /dev/md126
>
> You will need mdadm built with this commit for it to work:
>
> https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=7b99edab2834d5d08ef774b4cff784caaa1a186f
>
> It may be a good idea to copy the array contents with dd before you fsck or
> mount the filesystem in case the recovery goes wrong.
>
> Then stop the second container and add the new drive to the array:
>
> # mdadm -S /dev/md125
> # mdadm -a /dev/md127 /dev/sdb
>
> Rebuild should begin at this point.
>
> Regards,
> Artur

Hello,

also see:
https://lore.kernel.org/linux-raid/ac370d79-95e8-d0a1-0991-fb12b128818c@linux.intel.com/T/#t

Mariusz

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-04-06 20:05       ` Devon Beets
@ 2021-04-07 10:25         ` Artur Paszkiewicz
  2021-04-22 17:56           ` Devon Beets
  0 siblings, 1 reply; 7+ messages in thread
From: Artur Paszkiewicz @ 2021-04-07 10:25 UTC (permalink / raw)
To: Devon Beets, Tkaczyk, Mariusz, linux-raid; +Cc: Glenn Wikle

On 06.04.2021 22:05, Devon Beets wrote:
> Output of the recommended commands for adding the new disk to the RAID5 array:
>
> sudo mdadm -R --force /dev/md126
> mdadm: array /dev/md/Data now has 3 devices (0 new)
>
> sudo mdadm -S /dev/md125
> mdadm: stopped /dev/md125
>
> sudo mdadm -a /dev/md127 /dev/sdb
> mdadm: added /dev/sdb
>
> Output of mdstat after running the commands. Shows that both md126 and md127 are inactive, and there is no RAID resync happening.
>
> cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
> md126 : inactive sda[2] sdc[1] sdd[0]
>       5567507712 blocks super external:/md127/0
>
> md127 : inactive sdb[3](S) sdc[2](S) sdd[1](S) sda[0](S)
>       10564 blocks super external:imsm
>
> unused devices: <none>

It looks like mdadm still does not handle this case correctly. Please do this
before the "mdadm -R --force /dev/md126":

printf "%llu\n" -1 > /sys/block/md126/md/resync_start

Regards,
Artur

^ permalink raw reply	[flat|nested] 7+ messages in thread
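A note for readers running Artur's command from a non-root shell: the `>` redirection is performed by the user's shell before `sudo` ever runs, so `sudo printf ... > /sys/...` fails with a permission error. The usual workaround is to let `tee` do the write as root. This is a sketch, not part of the original advice; the sysfs path is the one from this thread.

```shell
#!/bin/sh
# %llu renders -1 as the largest unsigned 64-bit integer; the value md
# exposes as "none" once written, the same number Devon later set by hand.
printf "%llu\n" -1
# As a non-root user, pipe through `sudo tee` instead of redirecting:
#   printf "%llu\n" -1 | sudo tee /sys/block/md126/md/resync_start
```

Running the first command prints 18446744073709551615, i.e. 2^64 - 1.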
* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-04-07 10:25         ` Artur Paszkiewicz
@ 2021-04-22 17:56           ` Devon Beets
  2021-04-23  7:26             ` Artur Paszkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Devon Beets @ 2021-04-22 17:56 UTC (permalink / raw)
To: Artur Paszkiewicz, Tkaczyk, Mariusz, linux-raid; +Cc: Glenn Wikle

Hello Artur,

Your answer worked! I just wanted to follow up so that anyone else who has a
similar issue can arrive at the solution. I am also reporting what I did,
since it is not exactly what you suggested.

I tried to follow your extra step, but it returned an "invalid number" error.
I ran it with sudo since my user is not root in this case:

printf "%llu\n" -1 > sudo /sys/block/md126/md/resync_start
-bash: printf: /sys/block/md126/md/resync_start: invalid number

So we assumed that you simply wanted us to set the resync_start file value to
the number 18446744073709551615. I did it by hand using a text editor. After
doing so, the value of the file changed to "none".

sudo cat /sys/block/md126/md/resync_start
none

After that, I proceeded to reconstruct the array, but I changed the order of
the commands. Not sure if that mattered.

sudo mdadm -S /dev/md125
mdadm: stopped /dev/md125

sudo mdadm -a /dev/md127 /dev/sdb
mdadm: added /dev/sdb

sudo mdadm -R --force /dev/md126
mdadm: Started /dev/md/Data with 3 devices (0 new)

Even though the last command only reported "3 devices (0 new)", it
successfully added the new /dev/sdb drive as a spare, started the array
resync, and it is recovering now, as reported by cat /proc/mdstat.

Thank you so much for the assistance!
Devon Beets

^ permalink raw reply	[flat|nested] 7+ messages in thread
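[Editor's note: once the rebuild starts, progress is typically watched with `watch -n 5 cat /proc/mdstat` or `sudo mdadm --detail /dev/md126`. For scripting, the recovery percentage can be pulled out of the status line; the sample line below is hypothetical, not output from this thread — read /proc/mdstat on a real system.]

```shell
# Extract the recovery percentage from an mdstat-style status line.
# The sample line is illustrative; substitute: grep recovery /proc/mdstat
line='[==>..................]  recovery = 12.5% (232000000/1855835904) finish=142.9min speed=189456K/sec'
echo "$line" | grep -o '[0-9.]*%'
# prints: 12.5%
```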
* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-04-22 17:56           ` Devon Beets
@ 2021-04-23  7:26             ` Artur Paszkiewicz
  0 siblings, 0 replies; 7+ messages in thread
From: Artur Paszkiewicz @ 2021-04-23  7:26 UTC (permalink / raw)
To: Devon Beets, Tkaczyk, Mariusz, linux-raid; +Cc: Glenn Wikle

On 22.04.2021 19:56, Devon Beets wrote:
> I ran with sudo since my user is not root in this case:
>
> printf "%llu\n" -1 > sudo /sys/block/md126/md/resync_start
> -bash: printf: /sys/block/md126/md/resync_start: invalid number
>
> So, we assumed that you simply wanted us to edit the resync_start file value
> to the number 18446744073709551615. I did it by hand using a text editor.
> After doing so, the value of the file changed to none.

That's right, sorry, I forgot about sudo. It's a bit tricky to use it with
redirections. Something like this should work:

sudo sh -c 'printf "%llu\n" -1 > /sys/block/md126/md/resync_start'

> After that, I proceeded to reconstruct the array. But I changed the order of
> the commands. Not sure if that mattered.
>
> sudo mdadm -S /dev/md125
> mdadm: stopped /dev/md125
>
> sudo mdadm -a /dev/md127 /dev/sdb
> mdadm: added /dev/sdb
>
> sudo mdadm -R --force /dev/md126
> mdadm: Started /dev/md/Data with 3 devices (0 new)
>
> Even though it only reported 3 devices (0 new) during the last command's
> output, it successfully added the new /dev/sdb drive as a spare, started the
> array resync, and is recovering now as reported by cat /proc/mdstat.
>
> Thank you so much for the assistance!

No problem, I'm glad you got it working.

Regards,
Artur

^ permalink raw reply	[flat|nested] 7+ messages in thread
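[Editor's note: the pitfall Artur describes is worth spelling out. In `sudo cmd > file`, the redirection is performed by the invoking, unprivileged shell before sudo ever runs, so the write fails on a root-only file. Wrapping the whole command in `sh -c`, or piping through `sudo tee`, moves the open/write into the privileged process. A sketch, demonstrated on a temp file so it runs without root; the sysfs path appears only in comments.]

```shell
# Two equivalent privileged-write patterns (shown as comments):
#   sudo sh -c 'printf "%llu\n" -1 > /sys/block/md126/md/resync_start'
#   printf "%llu\n" -1 | sudo tee /sys/block/md126/md/resync_start
# Demonstration of the sh -c pattern on an ordinary temp file:
tmp=$(mktemp)
sh -c "echo 18446744073709551615 > \"$tmp\""   # redirection runs inside the child shell
cat "$tmp"
# prints: 18446744073709551615
rm -f "$tmp"
```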
end of thread, other threads:[~2021-04-23  7:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-12 22:55 Cannot add replacement hard drive to mdadm RAID5 array Devon Beets
2021-03-15 14:38 ` Artur Paszkiewicz
2021-03-16  8:15   ` Tkaczyk, Mariusz
2021-04-06 20:05     ` Devon Beets
2021-04-07 10:25       ` Artur Paszkiewicz
2021-04-22 17:56         ` Devon Beets
2021-04-23  7:26           ` Artur Paszkiewicz