* Cannot add replacement hard drive to mdadm RAID5 array
@ 2021-03-12 22:55 Devon Beets
  2021-03-15 14:38 ` Artur Paszkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Devon Beets @ 2021-03-12 22:55 UTC (permalink / raw)
  To: linux-raid; +Cc: Glenn Wikle

Hello,

My colleague and I have been trying to replace a failed hard drive in a four-drive RAID5 array (/dev/sda through /dev/sdd); the failed drive is sdb. We physically removed it and installed a new drive with identical specifications. We did not first mark the failed drive with mdadm --fail before pulling it.

Upon booting the system with the new /dev/sdb installed, instead of the usual two md entries (/dev/md127, the IMSM container, and /dev/md126, the actual array) we now see three: md125 through md127. md127 is the IMSM container holding sda, sdc, and sdd. md125 is a new, unwanted container holding only sdb. md126 is the actual array and contains only sda, sdc, and sdd. We tried using --stop and --remove to get rid of md125, then adding sdb to md127 and reassembling to see whether it would join md126. It does not.
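Concretely, this is approximately the sequence we tried (reconstructed from memory, not a transcript; wrapped in a dry-run helper so it only prints the commands unless DRY_RUN=0):

```shell
# Reconstruction of what we tried (device names as on our box).
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

run mdadm --stop /dev/md125           # tear down the unwanted sdb-only container
run mdadm --add /dev/md127 /dev/sdb   # add the new disk to the real container
run mdadm --assemble --scan           # reassemble, hoping md126 picks up sdb
```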

Below is the output of some commands for additional diagnostic information. Please let me know if you need more. 

Note: this output was captured after a fresh reboot, before any of the repair commands we tried. The system returns to this state after every reboot so far.

uname -a
Linux aerospace-pr3d-app 4.4.0-194-generic #226-Ubuntu SMP Wed Oct 21 10:19:36 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

sudo mdadm --version
mdadm - v4.1-126-gbdbe7f8 - 2021-03-09

sudo smartctl -H -i -l scterc /dev/sda
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000NX0253
Serial Number:    W461SCHM
LU WWN Device Id: 5 000c50 0b426d2d0
Firmware Version: SN05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 11 15:07:30 2021 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)
 

sudo smartctl -H -i -l scterc /dev/sdb
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000NX0253
Serial Number:    W462MZ0R
LU WWN Device Id: 5 000c50 0c569b51c
Firmware Version: SN05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 11 15:09:34 2021 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)
 
sudo smartctl -H -i -l scterc /dev/sdc
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000NX0253
Serial Number:    W461NLPM
LU WWN Device Id: 5 000c50 0b426f335
Firmware Version: SN05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 11 15:14:38 2021 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)

sudo smartctl -H -i -l scterc /dev/sdd
smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST2000NX0253
Serial Number:    W461NHAB
LU WWN Device Id: 5 000c50 0b426f8a4
Firmware Version: SN05
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Mar 11 15:16:24 2021 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:    100 (10.0 seconds)
          Write:    100 (10.0 seconds)
 
sudo mdadm --examine /dev/sda
/dev/sda:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.3.00
    Orig Family : 154b243e
         Family : 154b243e
     Generation : 000003aa
  Creation Time : Unknown
     Attributes : All supported
           UUID : 72360627:bb745f4c:aedafaab:e25d3123
       Checksum : 21ae5a2a correct
    MPB Sectors : 2
          Disks : 4
   RAID Devices : 1

  Disk00 Serial : W461SCHM
          State : active
             Id : 00000000
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

[Data]:
       Subarray : 0
           UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
     RAID Level : 5 <-- 5
        Members : 4 <-- 4
          Slots : [U_UU] <-- [U_UU]
    Failed disk : 1
      This Slot : 0
    Sector Size : 512
     Array Size : 11135008768 (5.19 TiB 5.70 TB)
   Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
  Sector Offset : 0
    Num Stripes : 28997420
     Chunk Size : 64 KiB <-- 64 KiB
       Reserved : 0
  Migrate State : repair
      Map State : degraded <-- degraded
     Checkpoint : 462393 (512)
    Dirty State : dirty
     RWH Policy : off
      Volume ID : 1

  Disk01 Serial : W461S13X:0
          State : active
             Id : ffffffff
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk02 Serial : W461NLPM
          State : active
             Id : 00000002
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk03 Serial : W461NHAB
          State : active
             Id : 00000003
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

sudo mdadm --examine /dev/sdb
/dev/sdb:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.0.00
    Orig Family : 00000000
         Family : e5cd8601
     Generation : 00000001
  Creation Time : Unknown
     Attributes : All supported
           UUID : 00000000:00000000:00000000:00000000
       Checksum : cb9b0c02 correct
    MPB Sectors : 1
          Disks : 1
   RAID Devices : 0

  Disk00 Serial : W462MZ0R
          State : spare
             Id : 04000000
    Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)

    Disk Serial : W462MZ0R
          State : spare
             Id : 04000000
    Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)

sudo mdadm --examine /dev/sdc
/dev/sdc:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.3.00
    Orig Family : 154b243e
         Family : 154b243e
     Generation : 000003aa
  Creation Time : Unknown
     Attributes : All supported
           UUID : 72360627:bb745f4c:aedafaab:e25d3123
       Checksum : 21ae5a2a correct
    MPB Sectors : 2
          Disks : 4
   RAID Devices : 1

  Disk02 Serial : W461NLPM
          State : active
             Id : 00000002
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

[Data]:
       Subarray : 0
           UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
     RAID Level : 5 <-- 5
        Members : 4 <-- 4
          Slots : [U_UU] <-- [U_UU]
    Failed disk : 1
      This Slot : 2
    Sector Size : 512
     Array Size : 11135008768 (5.19 TiB 5.70 TB)
   Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
  Sector Offset : 0
    Num Stripes : 28997420
     Chunk Size : 64 KiB <-- 64 KiB
       Reserved : 0
  Migrate State : repair
      Map State : degraded <-- degraded
     Checkpoint : 462393 (512)
    Dirty State : dirty
     RWH Policy : off
      Volume ID : 1

  Disk00 Serial : W461SCHM
          State : active
             Id : 00000000
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk01 Serial : W461S13X:0
          State : active
             Id : ffffffff
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk03 Serial : W461NHAB
          State : active
             Id : 00000003
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

sudo mdadm --examine /dev/sdd
/dev/sdd:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.3.00
    Orig Family : 154b243e
         Family : 154b243e
     Generation : 000003aa
  Creation Time : Unknown
     Attributes : All supported
           UUID : 72360627:bb745f4c:aedafaab:e25d3123
       Checksum : 21ae5a2a correct
    MPB Sectors : 2
          Disks : 4
   RAID Devices : 1

  Disk03 Serial : W461NHAB
          State : active
             Id : 00000003
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

[Data]:
       Subarray : 0
           UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
     RAID Level : 5 <-- 5
        Members : 4 <-- 4
          Slots : [U_UU] <-- [U_UU]
    Failed disk : 1
      This Slot : 3
    Sector Size : 512
     Array Size : 11135008768 (5.19 TiB 5.70 TB)
   Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
  Sector Offset : 0
    Num Stripes : 28997420
     Chunk Size : 64 KiB <-- 64 KiB
       Reserved : 0
  Migrate State : repair
      Map State : degraded <-- degraded
     Checkpoint : 462393 (512)
    Dirty State : dirty
     RWH Policy : off
      Volume ID : 1

  Disk00 Serial : W461SCHM
          State : active
             Id : 00000000
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk01 Serial : W461S13X:0
          State : active
             Id : ffffffff
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

  Disk02 Serial : W461NLPM
          State : active
             Id : 00000002
    Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)

sudo mdadm --detail /dev/md125
/dev/md125:
           Version : imsm
        Raid Level : container
     Total Devices : 1

   Working Devices : 1

     Member Arrays :

    Number   Major   Minor   RaidDevice

       -       8       16        -        /dev/sdb

sudo mdadm --detail /dev/md126
/dev/md126:
         Container : /dev/md/imsm0, member 0
        Raid Level : raid5
     Used Dev Size : 1855835904 (1769.86 GiB 1900.38 GB)
      Raid Devices : 4
     Total Devices : 3

             State : active, FAILED, Not Started
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-asymmetric
        Chunk Size : 64K

Consistency Policy : unknown


              UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed
       -       0        0        3      removed

       -       8        0        0      sync   /dev/sda
       -       8       32        2      sync   /dev/sdc
       -       8       48        3      sync   /dev/sdd

sudo mdadm --detail /dev/md127
/dev/md127:
           Version : imsm
        Raid Level : container
     Total Devices : 3

   Working Devices : 3


              UUID : 72360627:bb745f4c:aedafaab:e25d3123
     Member Arrays : /dev/md126

    Number   Major   Minor   RaidDevice

       -       8        0        -        /dev/sda
       -       8       32        -        /dev/sdc
       -       8       48        -        /dev/sdd
 
lsdrv
**Warning** The following utility(ies) failed to execute:
  sginfo
Some information may be missing.

PCI [nvme] 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
└nvme nvme0 Samsung SSD 970 EVO Plus 500GB           {S4P2NF0M501223D}
 └nvme0n1 465.76g [259:0] Empty/Unknown
  ├nvme0n1p1 512.00m [259:1] Empty/Unknown
  │└Mounted as /dev/nvme0n1p1 @ /boot/efi
  ├nvme0n1p2 732.00m [259:2] Empty/Unknown
  │└Mounted as /dev/nvme0n1p2 @ /boot
  └nvme0n1p3 464.54g [259:3] Empty/Unknown
   ├dm-0 463.59g [252:0] Empty/Unknown
   │└Mounted as /dev/mapper/customer--pr3d--app--vg-root @ /
   └dm-1 980.00m [252:1] Empty/Unknown
PCI [ahci] 00:11.5 SATA controller: Intel Corporation C620 Series Chipset Family SSATA Controller [AHCI mode] (rev 09)
└scsi 0:x:x:x [Empty]
PCI [ahci] 00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 09)
├scsi 2:0:0:0 ATA      ST2000NX0253
│└sda 1.82t [8:0] Empty/Unknown
│ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
│ │                    Empty/Unknown
│ └md127 0.00k [9:127] MD vexternal:imsm  () inactive, None (None) None {None}
│                      Empty/Unknown
├scsi 3:0:0:0 ATA      ST2000NX0253
│└sdb 1.82t [8:16] Empty/Unknown
│ └md125 0.00k [9:125] MD vexternal:imsm  () inactive, None (None) None {None}
│                      Empty/Unknown
├scsi 4:0:0:0 ATA      ST2000NX0253
│└sdc 1.82t [8:32] Empty/Unknown
│ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
│ │                    Empty/Unknown
├scsi 5:0:0:0 ATA      ST2000NX0253
│└sdd 1.82t [8:48] Empty/Unknown
│ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
│ │                    Empty/Unknown
└scsi 6:0:0:0 Slimtype DVD A  DS8ACSH
 └sr0 1.00g [11:0] Empty/Unknown
Other Block Devices
├loop0 0.00k [7:0] Empty/Unknown
├loop1 0.00k [7:1] Empty/Unknown
├loop2 0.00k [7:2] Empty/Unknown
├loop3 0.00k [7:3] Empty/Unknown
├loop4 0.00k [7:4] Empty/Unknown
├loop5 0.00k [7:5] Empty/Unknown
├loop6 0.00k [7:6] Empty/Unknown
└loop7 0.00k [7:7] Empty/Unknown

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md125 : inactive sdb[0](S)
      1105 blocks super external:imsm

md126 : inactive sda[2] sdc[1] sdd[0]
      5567507712 blocks super external:/md127/0

md127 : inactive sdc[2](S) sdd[1](S) sda[0](S)
      9459 blocks super external:imsm

unused devices: <none>


Thanks for the help!
Devon Beets


* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-03-12 22:55 Cannot add replacement hard drive to mdadm RAID5 array Devon Beets
@ 2021-03-15 14:38 ` Artur Paszkiewicz
  2021-03-16  8:15   ` Tkaczyk, Mariusz
  0 siblings, 1 reply; 7+ messages in thread
From: Artur Paszkiewicz @ 2021-03-15 14:38 UTC (permalink / raw)
  To: Devon Beets, linux-raid; +Cc: Glenn Wikle

On 3/12/21 11:55 PM, Devon Beets wrote:
> Hello,
> 
> My colleague and I have been trying to replace a failed hard drive in a four-drive RAID5 array (/dev/sda to /dev/sdd). The failed drive is sdb. We have physically removed the hard drive and replaced it with a new drive that has identical specifications. We did not first use mdadm to set the failed hard drive with --fail.
> 
> Upon booting the system with the new /dev/sdb drive installed, we see that instead of the usual two md entries (/dev/md127 which is an IMSM container and /dev/md126 which is the actual array) there are now three entries: md125 to md127. md127 is the IMSM container for sda, sdc, and sdd. md125 is a new container for sdb that we do not want. md126 is the actual array and only contains sda, sdc, and sdd. We tried to use --stop and --remove to get rid of md125, then add sdb to md127, and assemble to see if it adds to md126. It does not.
> 
> [...]

Hi Devon,

The array is in a dirty, degraded state. It does not start automatically because
there is a risk of silent data corruption (the RAID5 write hole). You can force
it to start with:

# mdadm -R --force /dev/md126

You will need mdadm built with this commit for it to work:

https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=7b99edab2834d5d08ef774b4cff784caaa1a186f

It may be a good idea to copy the array contents with dd before you fsck or
mount the filesystem in case the recovery goes wrong.

Then stop the second container and add the new drive to the original container:

# mdadm -S /dev/md125
# mdadm -a /dev/md127 /dev/sdb

Rebuild should begin at this point.
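Once it does, progress shows up in /proc/mdstat as a recovery line. A minimal sketch for pulling out the percentage, run here on an embedded sample line (the sample values are made up; in practice read /proc/mdstat directly):

```shell
# Embedded sample in /proc/mdstat's recovery format; the numbers are illustrative
mdstat='md126 : active raid5 sdb[4] sdd[3] sdc[2] sda[0]
      [==>..................]  recovery = 12.5% (232043520/1855835904) finish=180.0min speed=150000K/sec'
# Extract the recovery percentage from the progress line
pct=$(printf '%s\n' "$mdstat" | sed -n 's/.*recovery = *\([0-9.]*\)%.*/\1/p')
echo "$pct"
```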

Regards,
Artur




* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-03-15 14:38 ` Artur Paszkiewicz
@ 2021-03-16  8:15   ` Tkaczyk, Mariusz
  2021-04-06 20:05     ` Devon Beets
  0 siblings, 1 reply; 7+ messages in thread
From: Tkaczyk, Mariusz @ 2021-03-16  8:15 UTC (permalink / raw)
  To: Artur Paszkiewicz, Devon Beets, linux-raid; +Cc: Glenn Wikle

On 15.03.2021 15:38, Artur Paszkiewicz wrote:
> On 3/12/21 11:55 PM, Devon Beets wrote:
>> Hello,
>>
>> My colleague and I have been trying to replace a failed hard drive in a four-drive RAID5 array (/dev/sda to /dev/sdd). The failed drive is sdb. We have physically removed the hard drive and replaced it with a new drive that has identical specifications. We did not first use mdadm to set the failed hard drive with --fail.
>>
>> Upon booting the system with the new /dev/sdb drive installed, we see that instead of the usual two md entries (/dev/md127 which is an IMSM container and /dev/md126 which is the actual array) there are now three entries: md125 to md127. md127 is the IMSM container for sda, sdc, and sdd. md125 is a new container for sdb that we do not want. md126 is the actual array and only contains sda, sdc, and sdd. We tried to use --stop and --remove to get rid of md125, then add sdb to md127, and assemble to see if it adds to md126. It does not.
>>
>> Below is the output of some commands for additional diagnostic information. Please let me know if you need more.
>>
>> Note: The output of these commands is after a fresh reboot, without/before all the other commands we tried to fix it. It gets reset to this state after every reboot we tried so far.
>>
>> uname -a
>> Linux aerospace-pr3d-app 4.4.0-194-generic #226-Ubuntu SMP Wed Oct 21 10:19:36 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>>
>> sudo mdadm --version
>> mdadm - v4.1-126-gbdbe7f8 - 2021-03-09
>>
>> sudo smartctl -H -i -l scterc /dev/sda
>> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST2000NX0253
>> Serial Number:    W461SCHM
>> LU WWN Device Id: 5 000c50 0b426d2d0
>> Firmware Version: SN05
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      2.5 inches
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-3 (minor revision not indicated)
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Mar 11 15:07:30 2021 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>             Read:    100 (10.0 seconds)
>>            Write:    100 (10.0 seconds)
>>   
>>
>> sudo smartctl -H -i -l scterc /dev/sdb
>> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST2000NX0253
>> Serial Number:    W462MZ0R
>> LU WWN Device Id: 5 000c50 0c569b51c
>> Firmware Version: SN05
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      2.5 inches
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-3 (minor revision not indicated)
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Mar 11 15:09:34 2021 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>             Read:    100 (10.0 seconds)
>>            Write:    100 (10.0 seconds)
>>   
>> sudo smartctl -H -i -l scterc /dev/sdc
>> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST2000NX0253
>> Serial Number:    W461NLPM
>> LU WWN Device Id: 5 000c50 0b426f335
>> Firmware Version: SN05
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      2.5 inches
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-3 (minor revision not indicated)
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Mar 11 15:14:38 2021 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>             Read:    100 (10.0 seconds)
>>            Write:    100 (10.0 seconds)
>>
>> sudo smartctl -H -i -l scterc /dev/sdd
>> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST2000NX0253
>> Serial Number:    W461NHAB
>> LU WWN Device Id: 5 000c50 0b426f8a4
>> Firmware Version: SN05
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      2.5 inches
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-3 (minor revision not indicated)
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Mar 11 15:16:24 2021 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>             Read:    100 (10.0 seconds)
>>            Write:    100 (10.0 seconds)
>>   
>> sudo mdadm --examine /dev/sda
>> /dev/sda:
>>            Magic : Intel Raid ISM Cfg Sig.
>>          Version : 1.3.00
>>      Orig Family : 154b243e
>>           Family : 154b243e
>>       Generation : 000003aa
>>    Creation Time : Unknown
>>       Attributes : All supported
>>             UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>         Checksum : 21ae5a2a correct
>>      MPB Sectors : 2
>>            Disks : 4
>>     RAID Devices : 1
>>
>>    Disk00 Serial : W461SCHM
>>            State : active
>>               Id : 00000000
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> [Data]:
>>         Subarray : 0
>>             UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>       RAID Level : 5 <-- 5
>>          Members : 4 <-- 4
>>            Slots : [U_UU] <-- [U_UU]
>>      Failed disk : 1
>>        This Slot : 0
>>      Sector Size : 512
>>       Array Size : 11135008768 (5.19 TiB 5.70 TB)
>>     Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
>>    Sector Offset : 0
>>      Num Stripes : 28997420
>>       Chunk Size : 64 KiB <-- 64 KiB
>>         Reserved : 0
>>    Migrate State : repair
>>        Map State : degraded <-- degraded
>>       Checkpoint : 462393 (512)
>>      Dirty State : dirty
>>       RWH Policy : off
>>        Volume ID : 1
>>
>>    Disk01 Serial : W461S13X:0
>>            State : active
>>               Id : ffffffff
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk02 Serial : W461NLPM
>>            State : active
>>               Id : 00000002
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk03 Serial : W461NHAB
>>            State : active
>>               Id : 00000003
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> sudo mdadm --examine /dev/sdb
>> /dev/sdb:
>>            Magic : Intel Raid ISM Cfg Sig.
>>          Version : 1.0.00
>>      Orig Family : 00000000
>>           Family : e5cd8601
>>       Generation : 00000001
>>    Creation Time : Unknown
>>       Attributes : All supported
>>             UUID : 00000000:00000000:00000000:00000000
>>         Checksum : cb9b0c02 correct
>>      MPB Sectors : 1
>>            Disks : 1
>>     RAID Devices : 0
>>
>>    Disk00 Serial : W462MZ0R
>>            State : spare
>>               Id : 04000000
>>      Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)
>>
>>      Disk Serial : W462MZ0R
>>            State : spare
>>               Id : 04000000
>>      Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)
>>
>> sudo mdadm --examine /dev/sdc
>> /dev/sdc:
>>            Magic : Intel Raid ISM Cfg Sig.
>>          Version : 1.3.00
>>      Orig Family : 154b243e
>>           Family : 154b243e
>>       Generation : 000003aa
>>    Creation Time : Unknown
>>       Attributes : All supported
>>             UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>         Checksum : 21ae5a2a correct
>>      MPB Sectors : 2
>>            Disks : 4
>>     RAID Devices : 1
>>
>>    Disk02 Serial : W461NLPM
>>            State : active
>>               Id : 00000002
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> [Data]:
>>         Subarray : 0
>>             UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>       RAID Level : 5 <-- 5
>>          Members : 4 <-- 4
>>            Slots : [U_UU] <-- [U_UU]
>>      Failed disk : 1
>>        This Slot : 2
>>      Sector Size : 512
>>       Array Size : 11135008768 (5.19 TiB 5.70 TB)
>>     Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
>>    Sector Offset : 0
>>      Num Stripes : 28997420
>>       Chunk Size : 64 KiB <-- 64 KiB
>>         Reserved : 0
>>    Migrate State : repair
>>        Map State : degraded <-- degraded
>>       Checkpoint : 462393 (512)
>>      Dirty State : dirty
>>       RWH Policy : off
>>        Volume ID : 1
>>
>>    Disk00 Serial : W461SCHM
>>            State : active
>>               Id : 00000000
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk01 Serial : W461S13X:0
>>            State : active
>>               Id : ffffffff
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk03 Serial : W461NHAB
>>            State : active
>>               Id : 00000003
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> sudo mdadm --examine /dev/sdd
>> /dev/sdd:
>>            Magic : Intel Raid ISM Cfg Sig.
>>          Version : 1.3.00
>>      Orig Family : 154b243e
>>           Family : 154b243e
>>       Generation : 000003aa
>>    Creation Time : Unknown
>>       Attributes : All supported
>>             UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>         Checksum : 21ae5a2a correct
>>      MPB Sectors : 2
>>            Disks : 4
>>     RAID Devices : 1
>>
>>    Disk03 Serial : W461NHAB
>>            State : active
>>               Id : 00000003
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> [Data]:
>>         Subarray : 0
>>             UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>       RAID Level : 5 <-- 5
>>          Members : 4 <-- 4
>>            Slots : [U_UU] <-- [U_UU]
>>      Failed disk : 1
>>        This Slot : 3
>>      Sector Size : 512
>>       Array Size : 11135008768 (5.19 TiB 5.70 TB)
>>     Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
>>    Sector Offset : 0
>>      Num Stripes : 28997420
>>       Chunk Size : 64 KiB <-- 64 KiB
>>         Reserved : 0
>>    Migrate State : repair
>>        Map State : degraded <-- degraded
>>       Checkpoint : 462393 (512)
>>      Dirty State : dirty
>>       RWH Policy : off
>>        Volume ID : 1
>>
>>    Disk00 Serial : W461SCHM
>>            State : active
>>               Id : 00000000
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk01 Serial : W461S13X:0
>>            State : active
>>               Id : ffffffff
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk02 Serial : W461NLPM
>>            State : active
>>               Id : 00000002
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>> sudo mdadm --detail /dev/md125
>> /dev/md125:
>>             Version : imsm
>>          Raid Level : container
>>       Total Devices : 1
>>
>>     Working Devices : 1
>>
>>       Member Arrays :
>>
>>      Number   Major   Minor   RaidDevice
>>
>>         -       8       16        -        /dev/sdb
>>
>> sudo mdadm --detail /dev/md126
>> /dev/md126:
>>           Container : /dev/md/imsm0, member 0
>>          Raid Level : raid5
>>       Used Dev Size : 1855835904 (1769.86 GiB 1900.38 GB)
>>        Raid Devices : 4
>>       Total Devices : 3
>>
>>               State : active, FAILED, Not Started
>>      Active Devices : 3
>>     Working Devices : 3
>>      Failed Devices : 0
>>       Spare Devices : 0
>>
>>              Layout : left-asymmetric
>>          Chunk Size : 64K
>>
>> Consistency Policy : unknown
>>
>>
>>                UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>      Number   Major   Minor   RaidDevice State
>>         -       0        0        0      removed
>>         -       0        0        1      removed
>>         -       0        0        2      removed
>>         -       0        0        3      removed
>>
>>         -       8        0        0      sync   /dev/sda
>>         -       8       32        2      sync   /dev/sdc
>>         -       8       48        3      sync   /dev/sdd
>>
>> sudo mdadm --detail /dev/md127
>> /dev/md127:
>>             Version : imsm
>>          Raid Level : container
>>       Total Devices : 3
>>
>>     Working Devices : 3
>>
>>
>>                UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>       Member Arrays : /dev/md126
>>
>>      Number   Major   Minor   RaidDevice
>>
>>         -       8        0        -        /dev/sda
>>         -       8       32        -        /dev/sdc
>>         -       8       48        -        /dev/sdd
>>   
>> lsdrv
>> **Warning** The following utility(ies) failed to execute:
>>    sginfo
>> Some information may be missing.
>>
>> PCI [nvme] 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
>> └nvme nvme0 Samsung SSD 970 EVO Plus 500GB           {S4P2NF0M501223D}
>>   └nvme0n1 465.76g [259:0] Empty/Unknown
>>    ├nvme0n1p1 512.00m [259:1] Empty/Unknown
>>    │└Mounted as /dev/nvme0n1p1 @ /boot/efi
>>    ├nvme0n1p2 732.00m [259:2] Empty/Unknown
>>    │└Mounted as /dev/nvme0n1p2 @ /boot
>>    └nvme0n1p3 464.54g [259:3] Empty/Unknown
>>     ├dm-0 463.59g [252:0] Empty/Unknown
>>     │└Mounted as /dev/mapper/customer--pr3d--app--vg-root @ /
>>     └dm-1 980.00m [252:1] Empty/Unknown
>> PCI [ahci] 00:11.5 SATA controller: Intel Corporation C620 Series Chipset Family SSATA Controller [AHCI mode] (rev 09)
>> └scsi 0:x:x:x [Empty]
>> PCI [ahci] 00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 09)
>> ├scsi 2:0:0:0 ATA      ST2000NX0253
>> │└sda 1.82t [8:0] Empty/Unknown
>> │ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
>> │ │                    Empty/Unknown
>> │ └md127 0.00k [9:127] MD vexternal:imsm  () inactive, None (None) None {None}
>> │                      Empty/Unknown
>> ├scsi 3:0:0:0 ATA      ST2000NX0253
>> │└sdb 1.82t [8:16] Empty/Unknown
>> │ └md125 0.00k [9:125] MD vexternal:imsm  () inactive, None (None) None {None}
>> │                      Empty/Unknown
>> ├scsi 4:0:0:0 ATA      ST2000NX0253
>> │└sdc 1.82t [8:32] Empty/Unknown
>> │ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
>> │ │                    Empty/Unknown
>> ├scsi 5:0:0:0 ATA      ST2000NX0253
>> │└sdd 1.82t [8:48] Empty/Unknown
>> │ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
>> │ │                    Empty/Unknown
>> └scsi 6:0:0:0 Slimtype DVD A  DS8ACSH
>>   └sr0 1.00g [11:0] Empty/Unknown
>> Other Block Devices
>> ├loop0 0.00k [7:0] Empty/Unknown
>> ├loop1 0.00k [7:1] Empty/Unknown
>> ├loop2 0.00k [7:2] Empty/Unknown
>> ├loop3 0.00k [7:3] Empty/Unknown
>> ├loop4 0.00k [7:4] Empty/Unknown
>> ├loop5 0.00k [7:5] Empty/Unknown
>> ├loop6 0.00k [7:6] Empty/Unknown
>> └loop7 0.00k [7:7] Empty/Unknown
>>
>> cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
>> md125 : inactive sdb[0](S)
>>        1105 blocks super external:imsm
>>
>> md126 : inactive sda[2] sdc[1] sdd[0]
>>        5567507712 blocks super external:/md127/0
>>
>> md127 : inactive sdc[2](S) sdd[1](S) sda[0](S)
>>        9459 blocks super external:imsm
>>
>> unused devices: <none>
>>
>>
>> Thanks for the help!
>> Devon Beets
> 
> Hi Devon,
> 
> The array is in dirty degraded state. It does not start automatically because
> there is a risk of silent data corruption, i.e. RAID write hole. You can force
> it to start with:
> 
> # mdadm -R --force /dev/md126
> 
> You will need mdadm built with this commit for it to work:
> 
> https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=7b99edab2834d5d08ef774b4cff784caaa1a186f
> 
> It may be a good idea to copy the array contents with dd before you fsck or
> mount the filesystem in case the recovery goes wrong.
> 
> Then stop the second container and add the new drive to the array:
> 
> # mdadm -S /dev/md125
> # mdadm -a /dev/md127 /dev/sdb
> 
> Rebuild should begin at this point.
> 
> Regards,
> Artur
> 
>

Hello,
Also see:
https://lore.kernel.org/linux-raid/ac370d79-95e8-d0a1-0991-fb12b128818c@linux.intel.com/T/#t

Mariusz


* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-03-16  8:15   ` Tkaczyk, Mariusz
@ 2021-04-06 20:05     ` Devon Beets
  2021-04-07 10:25       ` Artur Paszkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Devon Beets @ 2021-04-06 20:05 UTC (permalink / raw)
  To: Tkaczyk, Mariusz, Artur Paszkiewicz, linux-raid; +Cc: Glenn Wikle

Hello Artur,

I followed your recommended steps: forcing a run of my /dev/md126 array with --force, stopping the extra container /dev/md125, and adding the new disk /dev/sdb to the parent container /dev/md127. Unfortunately, it still does not work.

Output of mdstat prior to running the commands:

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md125 : inactive sdb[0](S)
      1105 blocks super external:imsm

md126 : inactive sda[2] sdc[1] sdd[0]
      5567507712 blocks super external:/md127/0

md127 : inactive sdc[2](S) sdd[1](S) sda[0](S)
      9459 blocks super external:imsm

unused devices: <none>

Output of the recommended commands for adding the new disk to the RAID5 array:

sudo mdadm -R --force /dev/md126
mdadm: array /dev/md/Data now has 3 devices (0 new)

sudo mdadm -S /dev/md125
mdadm: stopped /dev/md125

sudo mdadm -a /dev/md127 /dev/sdb
mdadm: added /dev/sdb

Output of mdstat after running the commands, showing that both md126 and md127 are still inactive and that no RAID resync is happening:

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md126 : inactive sda[2] sdc[1] sdd[0]
      5567507712 blocks super external:/md127/0

md127 : inactive sdb[3](S) sdc[2](S) sdd[1](S) sda[0](S)
      10564 blocks super external:imsm

unused devices: <none>
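For what it's worth, the inactive state is easy to check for in a script; here is a small sketch run against an embedded copy of the output above, so it touches no devices (in practice you would read /proc/mdstat itself):

```shell
# Embedded sample of the /proc/mdstat output above
mdstat='md126 : inactive sda[2] sdc[1] sdd[0]
      5567507712 blocks super external:/md127/0

md127 : inactive sdb[3](S) sdc[2](S) sdd[1](S) sda[0](S)
      10564 blocks super external:imsm'
# Print the name of every array whose status field reads "inactive"
inactive=$(printf '%s\n' "$mdstat" | awk '$3 == "inactive" {print $1}')
echo "$inactive"
```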

Output of examining the new disk /dev/sdb, which still shows a zeroed UUID and a spare state, even after being added to the md127 parent container:

sudo mdadm -E /dev/sdb
/dev/sdb:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.0.00
    Orig Family : 00000000
         Family : e4cd8601
     Generation : 00000001
  Creation Time : Unknown
     Attributes : All supported
           UUID : 00000000:00000000:00000000:00000000
       Checksum : c99b0c02 correct
    MPB Sectors : 1
          Disks : 1
   RAID Devices : 0

  Disk00 Serial : W462MZ0R
          State : spare
             Id : 03000000
    Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)

    Disk Serial : W462MZ0R
          State : spare
             Id : 03000000
    Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)
	
Output of details of the parent container md127, which does show the new disk /dev/sdb after being added:
	
sudo mdadm -D /dev/md127
/dev/md127:
           Version : imsm
        Raid Level : container
     Total Devices : 4

   Working Devices : 4


              UUID : 72360627:bb745f4c:aedafaab:e25d3123
     Member Arrays : /dev/md126

    Number   Major   Minor   RaidDevice

       -       8        0        -        /dev/sda
       -       8       16        -        /dev/sdb
       -       8       32        -        /dev/sdc
       -       8       48        -        /dev/sdd
	   
	   
Output of details on the actual RAID array md126, which still does not show the new disk sdb, only the other three devices sda, sdc, and sdd. The state is active, FAILED, Not Started, and all four device slots are listed as removed:
	   
sudo mdadm -D /dev/md126
/dev/md126:
         Container : /dev/md/imsm0, member 0
        Raid Level : raid5
     Used Dev Size : 1855835904 (1769.86 GiB 1900.38 GB)
      Raid Devices : 4
     Total Devices : 3

             State : active, FAILED, Not Started
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-asymmetric
        Chunk Size : 64K

Consistency Policy : unknown


              UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed
       -       0        0        3      removed

       -       8        0        0      sync   /dev/sda
       -       8       32        2      sync   /dev/sdc
       -       8       48        3      sync   /dev/sdd
	   
For comparison, here are the details of a healthy RAID array on another system; this is the state I would like to recreate:

sudo mdadm -D /dev/md126
/dev/md126:
         Container : /dev/md/imsm0, member 0
        Raid Level : raid5
        Array Size : 5567516672 (5309.60 GiB 5701.14 GB)
     Used Dev Size : 1855838976 (1769.87 GiB 1900.38 GB)
      Raid Devices : 4
     Total Devices : 4

             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-asymmetric
        Chunk Size : 64K

Consistency Policy : resync

              UUID : 4541a34d:4a84d4dc:3d1b0695:96f58987
    Number   Major   Minor   RaidDevice State
       3       8        0        0      active sync   /dev/sda
       2       8       16        1      active sync   /dev/sdb
       1       8       32        2      active sync   /dev/sdc
       0       8       48        3      active sync   /dev/sdd

Please let me know if there is any additional information I can provide to help troubleshooting.

Thanks!

Devon Beets

From: Tkaczyk, Mariusz <mariusz.tkaczyk@linux.intel.com>
Sent: Tuesday, March 16, 2021 2:15 AM
To: Artur Paszkiewicz <artur.paszkiewicz@intel.com>; Devon Beets <devon@sigmalabsinc.com>; linux-raid@vger.kernel.org <linux-raid@vger.kernel.org>
Cc: Glenn Wikle <gwikle@sigmalabsinc.com>
Subject: Re: Cannot add replacement hard drive to mdadm RAID5 array 
 
On 15.03.2021 15:38, Artur Paszkiewicz wrote:
> On 3/12/21 11:55 PM, Devon Beets wrote:
>> Hello,
>>
>> My colleague and I have been trying to replace a failed hard drive in a four-drive RAID5 array (/dev/sda to /dev/sdd). The failed drive is sdb. We have physically removed the hard drive and replaced it with a new drive that has identical specifications. We did not first use mdadm to set the failed hard drive with --fail.
>>
>> Upon booting the system with the new /dev/sdb drive installed, we see that instead of the usual two md entries (/dev/md127 which is an IMSM container and /dev/md126 which is the actual array) there are now three entries: md125 to md127. md127 is the IMSM container for sda, sdc, and sdd. md125 is a new container for sdb that we do not want. md126 is the actual array and only contains sda, sdc, and sdd. We tried to use --stop and --remove to get rid of md125, then add sdb to md127, and assemble to see if it adds to md126. It does not.
>>
>> Below is the output of some commands for additional diagnostic information. Please let me know if you need more.
>>
>> Note: The output of these commands is after a fresh reboot, without/before all the other commands we tried to fix it. It gets reset to this state after every reboot we tried so far.
>>
>> uname -a
>> Linux aerospace-pr3d-app 4.4.0-194-generic #226-Ubuntu SMP Wed Oct 21 10:19:36 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>>
>> sudo mdadm --version
>> mdadm - v4.1-126-gbdbe7f8 - 2021-03-09
>>
>> sudo smartctl -H -i -l scterc /dev/sda
>> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST2000NX0253
>> Serial Number:    W461SCHM
>> LU WWN Device Id: 5 000c50 0b426d2d0
>> Firmware Version: SN05
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      2.5 inches
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-3 (minor revision not indicated)
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Mar 11 15:07:30 2021 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>             Read:    100 (10.0 seconds)
>>            Write:    100 (10.0 seconds)
>>   
>>
>> sudo smartctl -H -i -l scterc /dev/sdb
>> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST2000NX0253
>> Serial Number:    W462MZ0R
>> LU WWN Device Id: 5 000c50 0c569b51c
>> Firmware Version: SN05
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      2.5 inches
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-3 (minor revision not indicated)
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Mar 11 15:09:34 2021 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>             Read:    100 (10.0 seconds)
>>            Write:    100 (10.0 seconds)
>>   
>> sudo smartctl -H -i -l scterc /dev/sdc
>> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST2000NX0253
>> Serial Number:    W461NLPM
>> LU WWN Device Id: 5 000c50 0b426f335
>> Firmware Version: SN05
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      2.5 inches
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-3 (minor revision not indicated)
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Mar 11 15:14:38 2021 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>             Read:    100 (10.0 seconds)
>>            Write:    100 (10.0 seconds)
>>
>> sudo smartctl -H -i -l scterc /dev/sdd
>> smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-194-generic] (local build)
>> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     ST2000NX0253
>> Serial Number:    W461NHAB
>> LU WWN Device Id: 5 000c50 0b426f8a4
>> Firmware Version: SN05
>> User Capacity:    2,000,398,934,016 bytes [2.00 TB]
>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>> Rotation Rate:    7200 rpm
>> Form Factor:      2.5 inches
>> Device is:        Not in smartctl database [for details use: -P showall]
>> ATA Version is:   ACS-3 (minor revision not indicated)
>> SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Mar 11 15:16:24 2021 MST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> SCT Error Recovery Control:
>>             Read:    100 (10.0 seconds)
>>            Write:    100 (10.0 seconds)
>>   
>> sudo mdadm --examine /dev/sda
>> /dev/sda:
>>            Magic : Intel Raid ISM Cfg Sig.
>>          Version : 1.3.00
>>      Orig Family : 154b243e
>>           Family : 154b243e
>>       Generation : 000003aa
>>    Creation Time : Unknown
>>       Attributes : All supported
>>             UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>         Checksum : 21ae5a2a correct
>>      MPB Sectors : 2
>>            Disks : 4
>>     RAID Devices : 1
>>
>>    Disk00 Serial : W461SCHM
>>            State : active
>>               Id : 00000000
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> [Data]:
>>         Subarray : 0
>>             UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>       RAID Level : 5 <-- 5
>>          Members : 4 <-- 4
>>            Slots : [U_UU] <-- [U_UU]
>>      Failed disk : 1
>>        This Slot : 0
>>      Sector Size : 512
>>       Array Size : 11135008768 (5.19 TiB 5.70 TB)
>>     Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
>>    Sector Offset : 0
>>      Num Stripes : 28997420
>>       Chunk Size : 64 KiB <-- 64 KiB
>>         Reserved : 0
>>    Migrate State : repair
>>        Map State : degraded <-- degraded
>>       Checkpoint : 462393 (512)
>>      Dirty State : dirty
>>       RWH Policy : off
>>        Volume ID : 1
>>
>>    Disk01 Serial : W461S13X:0
>>            State : active
>>               Id : ffffffff
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk02 Serial : W461NLPM
>>            State : active
>>               Id : 00000002
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk03 Serial : W461NHAB
>>            State : active
>>               Id : 00000003
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> sudo mdadm --examine /dev/sdb
>> /dev/sdb:
>>            Magic : Intel Raid ISM Cfg Sig.
>>          Version : 1.0.00
>>      Orig Family : 00000000
>>           Family : e5cd8601
>>       Generation : 00000001
>>    Creation Time : Unknown
>>       Attributes : All supported
>>             UUID : 00000000:00000000:00000000:00000000
>>         Checksum : cb9b0c02 correct
>>      MPB Sectors : 1
>>            Disks : 1
>>     RAID Devices : 0
>>
>>    Disk00 Serial : W462MZ0R
>>            State : spare
>>               Id : 04000000
>>      Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)
>>
>>      Disk Serial : W462MZ0R
>>            State : spare
>>               Id : 04000000
>>      Usable Size : 3907026958 (1863.02 GiB 2000.40 GB)
>>
>> sudo mdadm --examine /dev/sdc
>> /dev/sdc:
>>            Magic : Intel Raid ISM Cfg Sig.
>>          Version : 1.3.00
>>      Orig Family : 154b243e
>>           Family : 154b243e
>>       Generation : 000003aa
>>    Creation Time : Unknown
>>       Attributes : All supported
>>             UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>         Checksum : 21ae5a2a correct
>>      MPB Sectors : 2
>>            Disks : 4
>>     RAID Devices : 1
>>
>>    Disk02 Serial : W461NLPM
>>            State : active
>>               Id : 00000002
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> [Data]:
>>         Subarray : 0
>>             UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>       RAID Level : 5 <-- 5
>>          Members : 4 <-- 4
>>            Slots : [U_UU] <-- [U_UU]
>>      Failed disk : 1
>>        This Slot : 2
>>      Sector Size : 512
>>       Array Size : 11135008768 (5.19 TiB 5.70 TB)
>>     Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
>>    Sector Offset : 0
>>      Num Stripes : 28997420
>>       Chunk Size : 64 KiB <-- 64 KiB
>>         Reserved : 0
>>    Migrate State : repair
>>        Map State : degraded <-- degraded
>>       Checkpoint : 462393 (512)
>>      Dirty State : dirty
>>       RWH Policy : off
>>        Volume ID : 1
>>
>>    Disk00 Serial : W461SCHM
>>            State : active
>>               Id : 00000000
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk01 Serial : W461S13X:0
>>            State : active
>>               Id : ffffffff
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk03 Serial : W461NHAB
>>            State : active
>>               Id : 00000003
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> sudo mdadm --examine /dev/sdd
>> /dev/sdd:
>>            Magic : Intel Raid ISM Cfg Sig.
>>          Version : 1.3.00
>>      Orig Family : 154b243e
>>           Family : 154b243e
>>       Generation : 000003aa
>>    Creation Time : Unknown
>>       Attributes : All supported
>>             UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>         Checksum : 21ae5a2a correct
>>      MPB Sectors : 2
>>            Disks : 4
>>     RAID Devices : 1
>>
>>    Disk03 Serial : W461NHAB
>>            State : active
>>               Id : 00000003
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>> [Data]:
>>         Subarray : 0
>>             UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>       RAID Level : 5 <-- 5
>>          Members : 4 <-- 4
>>            Slots : [U_UU] <-- [U_UU]
>>      Failed disk : 1
>>        This Slot : 3
>>      Sector Size : 512
>>       Array Size : 11135008768 (5.19 TiB 5.70 TB)
>>     Per Dev Size : 3711671808 (1769.86 GiB 1900.38 GB)
>>    Sector Offset : 0
>>      Num Stripes : 28997420
>>       Chunk Size : 64 KiB <-- 64 KiB
>>         Reserved : 0
>>    Migrate State : repair
>>        Map State : degraded <-- degraded
>>       Checkpoint : 462393 (512)
>>      Dirty State : dirty
>>       RWH Policy : off
>>        Volume ID : 1
>>
>>    Disk00 Serial : W461SCHM
>>            State : active
>>               Id : 00000000
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk01 Serial : W461S13X:0
>>            State : active
>>               Id : ffffffff
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>>
>>    Disk02 Serial : W461NLPM
>>            State : active
>>               Id : 00000002
>>      Usable Size : 3907018766 (1863.01 GiB 2000.39 GB)
>> sudo mdadm --detail /dev/md125
>> /dev/md125:
>>             Version : imsm
>>          Raid Level : container
>>       Total Devices : 1
>>
>>     Working Devices : 1
>>
>>       Member Arrays :
>>
>>      Number   Major   Minor   RaidDevice
>>
>>         -       8       16        -        /dev/sdb
>>
>> sudo mdadm --detail /dev/md126
>> /dev/md126:
>>           Container : /dev/md/imsm0, member 0
>>          Raid Level : raid5
>>       Used Dev Size : 1855835904 (1769.86 GiB 1900.38 GB)
>>        Raid Devices : 4
>>       Total Devices : 3
>>
>>               State : active, FAILED, Not Started
>>      Active Devices : 3
>>     Working Devices : 3
>>      Failed Devices : 0
>>       Spare Devices : 0
>>
>>              Layout : left-asymmetric
>>          Chunk Size : 64K
>>
>> Consistency Policy : unknown
>>
>>
>>                UUID : 764aa814:831953a1:06cf2a07:1ca42b2e
>>      Number   Major   Minor   RaidDevice State
>>         -       0        0        0      removed
>>         -       0        0        1      removed
>>         -       0        0        2      removed
>>         -       0        0        3      removed
>>
>>         -       8        0        0      sync   /dev/sda
>>         -       8       32        2      sync   /dev/sdc
>>         -       8       48        3      sync   /dev/sdd
>>
>> sudo mdadm --detail /dev/md127
>> /dev/md127:
>>             Version : imsm
>>          Raid Level : container
>>       Total Devices : 3
>>
>>     Working Devices : 3
>>
>>
>>                UUID : 72360627:bb745f4c:aedafaab:e25d3123
>>       Member Arrays : /dev/md126
>>
>>      Number   Major   Minor   RaidDevice
>>
>>         -       8        0        -        /dev/sda
>>         -       8       32        -        /dev/sdc
>>         -       8       48        -        /dev/sdd
>>   
>> lsdrv
>> **Warning** The following utility(ies) failed to execute:
>>    sginfo
>> Some information may be missing.
>>
>> PCI [nvme] 04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
>> └nvme nvme0 Samsung SSD 970 EVO Plus 500GB           {S4P2NF0M501223D}
>>   └nvme0n1 465.76g [259:0] Empty/Unknown
>>    ├nvme0n1p1 512.00m [259:1] Empty/Unknown
>>    │└Mounted as /dev/nvme0n1p1 @ /boot/efi
>>    ├nvme0n1p2 732.00m [259:2] Empty/Unknown
>>    │└Mounted as /dev/nvme0n1p2 @ /boot
>>    └nvme0n1p3 464.54g [259:3] Empty/Unknown
>>     ├dm-0 463.59g [252:0] Empty/Unknown
>>     │└Mounted as /dev/mapper/customer--pr3d--app--vg-root @ /
>>     └dm-1 980.00m [252:1] Empty/Unknown
>> PCI [ahci] 00:11.5 SATA controller: Intel Corporation C620 Series Chipset Family SSATA Controller [AHCI mode] (rev 09)
>> └scsi 0:x:x:x [Empty]
>> PCI [ahci] 00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 09)
>> ├scsi 2:0:0:0 ATA      ST2000NX0253
>> │└sda 1.82t [8:0] Empty/Unknown
>> │ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
>> │ │                    Empty/Unknown
>> │ └md127 0.00k [9:127] MD vexternal:imsm  () inactive, None (None) None {None}
>> │                      Empty/Unknown
>> ├scsi 3:0:0:0 ATA      ST2000NX0253
>> │└sdb 1.82t [8:16] Empty/Unknown
>> │ └md125 0.00k [9:125] MD vexternal:imsm  () inactive, None (None) None {None}
>> │                      Empty/Unknown
>> ├scsi 4:0:0:0 ATA      ST2000NX0253
>> │└sdc 1.82t [8:32] Empty/Unknown
>> │ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
>> │ │                    Empty/Unknown
>> ├scsi 5:0:0:0 ATA      ST2000NX0253
>> │└sdd 1.82t [8:48] Empty/Unknown
>> │ ├md126 0.00k [9:126] MD vexternal:/md127/0 raid5 (4) inactive, 64k Chunk, None (None) None {None}
>> │ │                    Empty/Unknown
>> └scsi 6:0:0:0 Slimtype DVD A  DS8ACSH
>>   └sr0 1.00g [11:0] Empty/Unknown
>> Other Block Devices
>> ├loop0 0.00k [7:0] Empty/Unknown
>> ├loop1 0.00k [7:1] Empty/Unknown
>> ├loop2 0.00k [7:2] Empty/Unknown
>> ├loop3 0.00k [7:3] Empty/Unknown
>> ├loop4 0.00k [7:4] Empty/Unknown
>> ├loop5 0.00k [7:5] Empty/Unknown
>> ├loop6 0.00k [7:6] Empty/Unknown
>> └loop7 0.00k [7:7] Empty/Unknown
>>
>> cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
>> md125 : inactive sdb[0](S)
>>        1105 blocks super external:imsm
>>
>> md126 : inactive sda[2] sdc[1] sdd[0]
>>        5567507712 blocks super external:/md127/0
>>
>> md127 : inactive sdc[2](S) sdd[1](S) sda[0](S)
>>        9459 blocks super external:imsm
>>
>> unused devices: <none>
>>
>>
>> Thanks for the help!
>> Devon Beets
> 
> Hi Devon,
> 
> The array is in dirty degraded state. It does not start automatically because
> there is a risk of silent data corruption, i.e. RAID write hole. You can force
> it to start with:
> 
> # mdadm -R --force /dev/md126
> 
> You will need mdadm built with this commit for it to work:
> 
> https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=7b99edab2834d5d08ef774b4cff784caaa1a186f
> 
> It may be a good idea to copy the array contents with dd before you fsck or
> mount the filesystem in case the recovery goes wrong.
> 
> Then stop the second container and add the new drive to the array:
> 
> # mdadm -S /dev/md125
> # mdadm -a /dev/md127 /dev/sdb
> 
> Rebuild should begin at this point.
> 
> Regards,
> Artur
> 
>

Hello,
also see:
https://lore.kernel.org/linux-raid/ac370d79-95e8-d0a1-0991-fb12b128818c@linux.intel.com/T/#t

Mariusz

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-04-06 20:05     ` Devon Beets
@ 2021-04-07 10:25       ` Artur Paszkiewicz
  2021-04-22 17:56         ` Devon Beets
  0 siblings, 1 reply; 7+ messages in thread
From: Artur Paszkiewicz @ 2021-04-07 10:25 UTC (permalink / raw)
  To: Devon Beets, Tkaczyk, Mariusz, linux-raid; +Cc: Glenn Wikle

On 06.04.2021 22:05, Devon Beets wrote:
> Output of the recommended commands for adding the new disk to the RAID5 array:
> 
> sudo mdadm -R --force /dev/md126
> mdadm: array /dev/md/Data now has 3 devices (0 new)
> 
> sudo mdadm -S /dev/md125
> mdadm: stopped /dev/md125
> 
> sudo mdadm -a /dev/md127 /dev/sdb
> mdadm: added /dev/sdb
> 
> Output of mdstat after running the commands. Shows that both md126 and md127 are inactive, and there is no RAID resync happening.
> 
> cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
> md126 : inactive sda[2] sdc[1] sdd[0]
>       5567507712 blocks super external:/md127/0
> 
> md127 : inactive sdb[3](S) sdc[2](S) sdd[1](S) sda[0](S)
>       10564 blocks super external:imsm
> 
> unused devices: <none>

It looks like mdadm still does not handle this case correctly. Please do this
before the "mdadm -R --force /dev/md126":

printf "%llu\n" -1 > /sys/block/md126/md/resync_start
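
For reference, %llu converts the -1 into the maximum unsigned 64-bit
value; as far as I know md treats that all-ones value as "resync not in
progress", so the stale checkpoint no longer blocks the start. The
conversion itself is plain shell:

```shell
# printf's %llu reinterprets -1 as an unsigned 64-bit integer,
# producing the all-ones sentinel value that md's resync_start
# reports back as "none":
printf "%llu\n" -1
```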

Regards,
Artur


* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-04-07 10:25       ` Artur Paszkiewicz
@ 2021-04-22 17:56         ` Devon Beets
  2021-04-23  7:26           ` Artur Paszkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Devon Beets @ 2021-04-22 17:56 UTC (permalink / raw)
  To: Artur Paszkiewicz, Tkaczyk, Mariusz, linux-raid; +Cc: Glenn Wikle

Hello Artur,

Your answer worked! I just wanted to follow up so that anyone else with a similar issue can arrive at the solution. I am also reporting exactly what I did, since it differs slightly from what you suggested.

I tried to follow your extra step, but it failed with an "invalid number" error.

I ran it with sudo, since my user is not root on this machine:

printf "%llu\n" -1 > sudo /sys/block/md126/md/resync_start
-bash: printf: /sys/block/md126/md/resync_start: invalid number

So, we assumed you simply wanted us to set the resync_start value to the number 18446744073709551615, and I edited the file by hand with a text editor. After doing so, the file read back as "none".

sudo cat /sys/block/md126/md/resync_start
none
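
In hindsight, the error message makes sense: with sudo placed after the
">", the shell redirected into a literal file named "sudo" and handed
the sysfs path to printf as a second argument, which printf could not
parse as a number. A quick illustration (no root needed, this only
exercises printf; the sysfs path is never opened):

```shell
# printf reuses the format string for each extra argument, so the
# sysfs path is also run through %llu; bash prints a warning and
# falls back to 0 for the unparseable argument:
printf "%llu\n" -1 /sys/block/md126/md/resync_start || true
```

So the redirection itself needs to run as root, not just printf.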

After that, I proceeded to reconstruct the array, though I changed the order of the commands; I am not sure whether that mattered.

sudo mdadm -S /dev/md125
mdadm: stopped /dev/md125

sudo mdadm -a /dev/md127 /dev/sdb
mdadm: added /dev/sdb

sudo mdadm -R --force /dev/md126
mdadm: Started /dev/md/Data with 3 devices (0 new)

Even though the last command reported only 3 devices (0 new), it successfully added the new /dev/sdb drive as a spare and started the array resync; the array is now recovering, as reported by cat /proc/mdstat.

Thank you so much for the assistance!

Devon Beets




* Re: Cannot add replacement hard drive to mdadm RAID5 array
  2021-04-22 17:56         ` Devon Beets
@ 2021-04-23  7:26           ` Artur Paszkiewicz
  0 siblings, 0 replies; 7+ messages in thread
From: Artur Paszkiewicz @ 2021-04-23  7:26 UTC (permalink / raw)
  To: Devon Beets, Tkaczyk, Mariusz, linux-raid; +Cc: Glenn Wikle

On 22.04.2021 19:56, Devon Beets wrote:
> I ran with sudo since my user is not root in this case:
> 
> printf "%llu\n" -1 > sudo /sys/block/md126/md/resync_start
> -bash: printf: /sys/block/md126/md/resync_start: invalid number
> 
> So, we assumed that you simply wanted us to edit the resync_start file value to the number 18446744073709551615. I did it by hand using a text editor. After doing so, the value of the file changed to none.

That's right, sorry I forgot about sudo. It's a bit tricky to use it with
redirections. Something like this should work:

sudo sh -c 'printf "%llu\n" -1 > /sys/block/md126/md/resync_start'
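
Or, if you prefer to avoid the subshell, tee can perform the privileged
write while printf stays unprivileged. A small helper along these lines
(the function name is just for illustration):

```shell
# tee runs under sudo and does the actual write; everything before
# the pipe runs as the normal user:
write_sysfs() {   # usage: write_sysfs <value> <sysfs-file>
    printf "%llu\n" "$1" | sudo tee "$2" > /dev/null
}
# e.g.: write_sysfs -1 /sys/block/md126/md/resync_start
```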

> After that, I proceeded to reconstruct the array. But I changed the order of the commands. Not sure if that mattered.
> 
> sudo mdadm -S /dev/md125
> mdadm: stopped /dev/md125
> 
> sudo mdadm -a /dev/md127 /dev/sdb
> mdadm: added /dev/sdb
> 
> sudo mdadm -R --force /dev/md126
> mdadm: Started /dev/md/Data with 3 devices (0 new)
> 
> Even though it only reported 3 devices (0 new) during the last command's output, it successfully added the new /dev/sdb drive as a spare, started the array resync, and is recovering now as reported by cat /proc/mdstat.
> 
> Thank you so much for the assistance!

No problem, I'm glad you got it working.

Regards,
Artur


end of thread, other threads:[~2021-04-23  7:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-12 22:55 Cannot add replacement hard drive to mdadm RAID5 array Devon Beets
2021-03-15 14:38 ` Artur Paszkiewicz
2021-03-16  8:15   ` Tkaczyk, Mariusz
2021-04-06 20:05     ` Devon Beets
2021-04-07 10:25       ` Artur Paszkiewicz
2021-04-22 17:56         ` Devon Beets
2021-04-23  7:26           ` Artur Paszkiewicz
