* Unable to reactivate a RAID10 mdadm device
@ 2013-02-12  7:16 Arun Khan
  2013-02-12  8:32 ` Adam Goryachev
  2013-02-12  8:42 ` Dave Cundiff
  0 siblings, 2 replies; 11+ messages in thread
From: Arun Khan @ 2013-02-12  7:16 UTC (permalink / raw)
  To: Linux MDADM Raid

Recovery OS -- System Rescue CD v2.8.0

Production OS -- Debian Squeeze (6), stock 2.6.32 kernel, using mdadm RAID

/dev/md0 is a RAID10 array -- members /dev/sdb1, /dev/sdc1,
/dev/sdd1, /dev/sde1, all with partition id=fd

HDD /dev/sdb went bad; I replaced it with another disk, partitioned
to the same size (id=fd), using System Rescue CD v2.8.0.

System Rescue CD recognizes the md device, but it comes up as 'inactive'.

I searched for possible solutions and tried several things, including
zeroing the superblock and adding the partitions back to the array.

I am still unable to bring /dev/md0 back with all 4 partitions active.

I have included below the entire transcript of the commands I have
tried in order to recover /dev/md0.

I have data on /dev/md0 that I need. I do have backups of critical
files (but not all).

I would prefer to solve the problem rather than recreate /dev/md0 from scratch.

Any help in solving this problem would be highly appreciated.

TIA,
-- Arun Khan

--------------- transcript of mdadm activity with System Rescue CD
v2.8.0 ---------------

# mdadm -V
mdadm - v3.1.4 - 31st August 2010

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : inactive sdd1[2] sde1[3]
      312574512 blocks super 1.0

# mdadm -S /dev/md0
mdadm: stopped /dev/md0

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
unused devices: <none>


# mdadm -v -v -A /dev/md0 -R /dev/sd[bcde]1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: added /dev/sdb1 to /dev/md0 as 0
mdadm: added /dev/sdc1 to /dev/md0 as 1
mdadm: added /dev/sde1 to /dev/md0 as 3
mdadm: added /dev/sdd1 to /dev/md0 as 2
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
mdadm: Not enough devices to start the array.

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : inactive sdd1[2] sde1[3]
      312574512 blocks super 1.0

unused devices: <none>

from /var/log/messages
Feb 12 06:13:59 sysresccd kernel: [ 7593.339015] md: md0 stopped.
Feb 12 06:13:59 sysresccd kernel: [ 7593.374016] md: bind<sdb1>
Feb 12 06:13:59 sysresccd kernel: [ 7593.374417] md: bind<sdc1>
Feb 12 06:13:59 sysresccd kernel: [ 7593.374604] md: bind<sde1>
Feb 12 06:13:59 sysresccd kernel: [ 7593.374869] md: bind<sdd1>
Feb 12 06:13:59 sysresccd kernel: [ 7593.374899] md: kicking non-fresh
sdc1 from array!
Feb 12 06:13:59 sysresccd kernel: [ 7593.374903] md: unbind<sdc1>
Feb 12 06:13:59 sysresccd kernel: [ 7593.379016] md: export_rdev(sdc1)
Feb 12 06:13:59 sysresccd kernel: [ 7593.379041] md: kicking non-fresh
sdb1 from array!
Feb 12 06:13:59 sysresccd kernel: [ 7593.379044] md: unbind<sdb1>
Feb 12 06:13:59 sysresccd kernel: [ 7593.386010] md: export_rdev(sdb1)
Feb 12 06:13:59 sysresccd kernel: [ 7593.387382] md/raid10:md0: not
enough operational mirrors.
Feb 12 06:13:59 sysresccd kernel: [ 7593.387410] md: pers->run() failed ...

# mdadm -D /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Fri Apr 29 04:27:04 2011
     Raid Level : raid10
  Used Dev Size : 156287232 (149.05 GiB 160.04 GB)
   Raid Devices : 4
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Feb 11 13:43:52 2013
          State : active, FAILED, Not Started
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 32K

           Name : brahmaputra:0
           UUID : f2d4e898:2e026f85:244a7e9c:908e1af7
         Events : 783527

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1


# mdadm -E /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : f2d4e898:2e026f85:244a7e9c:908e1af7
           Name : brahmaputra:0
  Creation Time : Fri Apr 29 04:27:04 2011
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 312574512 (149.05 GiB 160.04 GB)
     Array Size : 625148928 (298.09 GiB 320.08 GB)
  Used Dev Size : 312574464 (149.05 GiB 160.04 GB)
   Super Offset : 312574640 sectors
          State : clean
    Device UUID : 2fbc103e:ca40e0c2:b8e4d64f:0fbc7b94

Internal Bitmap : -8 sectors from superblock
    Update Time : Mon Feb 11 13:43:52 2013
       Checksum : 2e8e9fad - correct
         Events : 0

         Layout : near=2
     Chunk Size : 32K

   Device Role : spare
   Array State : ..AA ('A' == active, '.' == missing)


#  mdadm -E /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : f2d4e898:2e026f85:244a7e9c:908e1af7
           Name : brahmaputra:0
  Creation Time : Fri Apr 29 04:27:04 2011
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 312574512 (149.05 GiB 160.04 GB)
     Array Size : 625148928 (298.09 GiB 320.08 GB)
  Used Dev Size : 312574464 (149.05 GiB 160.04 GB)
   Super Offset : 312574640 sectors
          State : clean
    Device UUID : e27c187b:9004cb93:5bb05639:164822cd

Internal Bitmap : -8 sectors from superblock
    Update Time : Mon Feb 11 13:43:52 2013
       Checksum : 5ea77bd0 - correct
         Events : 0

         Layout : near=2
     Chunk Size : 32K

   Device Role : spare
   Array State : ..AA ('A' == active, '.' == missing)

#  mdadm -E /dev/sde1
/dev/sde1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : f2d4e898:2e026f85:244a7e9c:908e1af7
           Name : brahmaputra:0
  Creation Time : Fri Apr 29 04:27:04 2011
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 312574512 (149.05 GiB 160.04 GB)
     Array Size : 625148928 (298.09 GiB 320.08 GB)
  Used Dev Size : 312574464 (149.05 GiB 160.04 GB)
   Super Offset : 312574640 sectors
          State : clean
    Device UUID : 78fc82bc:2eb18f07:56c98922:7639269e

Internal Bitmap : -8 sectors from superblock
    Update Time : Mon Feb 11 13:43:52 2013
       Checksum : ce19a703 - correct
         Events : 783527

         Layout : near=2
     Chunk Size : 32K

   Device Role : Active device 3
   Array State : ..AA ('A' == active, '.' == missing)

#  mdadm -E /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x1
     Array UUID : f2d4e898:2e026f85:244a7e9c:908e1af7
           Name : brahmaputra:0
  Creation Time : Fri Apr 29 04:27:04 2011
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 312574512 (149.05 GiB 160.04 GB)
     Array Size : 625148928 (298.09 GiB 320.08 GB)
  Used Dev Size : 312574464 (149.05 GiB 160.04 GB)
   Super Offset : 312574640 sectors
          State : clean
    Device UUID : 1dce2f63:fef488cb:1a362c57:7ed908ac

Internal Bitmap : -8 sectors from superblock
    Update Time : Mon Feb 11 13:43:52 2013
       Checksum : 7b44c944 - correct
         Events : 783527

         Layout : near=2
     Chunk Size : 32K

   Device Role : Active device 2
   Array State : ..AA ('A' == active, '.' == missing)

# mdadm --zero-superblock /dev/sdb1

# mdadm -E /dev/sdb1
mdadm: No md superblock detected on /dev/sdb1.

# mdadm /dev/md0 --add /dev/sdb1
mdadm: add new device failed for /dev/sdb1 as 4: Invalid argument

from /var/log/messages
Feb 12 06:52:26 sysresccd kernel: [ 9900.298880] md0: ADD_NEW_DISK not supported


* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-12  7:16 Unable to reactivate a RAID10 mdadm device Arun Khan
@ 2013-02-12  8:32 ` Adam Goryachev
  2013-02-12 11:30   ` Arun Khan
  2013-02-12  8:42 ` Dave Cundiff
  1 sibling, 1 reply; 11+ messages in thread
From: Adam Goryachev @ 2013-02-12  8:32 UTC (permalink / raw)
  To: Arun Khan; +Cc: Linux MDADM Raid

On 12/02/13 18:16, Arun Khan wrote:
> Recovery OS -- System Rescue CD v2.8.0
>
> Production OS -- Debian Squeeze (6), stock 2.6.32 kernel, using mdadm RAID
>
> [...]
>
> # mdadm -v -v -A /dev/md0 -R /dev/sd[bcde]1
> mdadm: looking for devices for /dev/md0
> [...]
> mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
> mdadm: Not enough devices to start the array.

Within the last month or thereabouts, I saw a patch go past the list
which indicated that this might happen. Perhaps you need either a newer
or an older version of md (Linux kernel) or mdadm.

The issue was an array failing to start even though it had enough
members to start, while a member was marked failed, or something like
that....

You didn't mention the kernel version (or I've missed it), but that is
my suggestion, certainly before you try anything destructive....
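
Before you experiment further, it might also be worth capturing the
current metadata from every member somewhere off the array, so you have
a record to compare against later. A minimal, read-only sketch (the
output path is only an example):

  # record the superblock details of each member partition
  mdadm -E /dev/sd[bcde]1 > /root/md0-examine.txt
  # note the kernel and tool versions alongside it
  uname -r >> /root/md0-examine.txt
  mdadm -V >> /root/md0-examine.txt 2>&1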

Or someone else may have a more sensible solution for you (hopefully).

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au



* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-12  7:16 Unable to reactivate a RAID10 mdadm device Arun Khan
  2013-02-12  8:32 ` Adam Goryachev
@ 2013-02-12  8:42 ` Dave Cundiff
  2013-02-12 11:25   ` Arun Khan
  1 sibling, 1 reply; 11+ messages in thread
From: Dave Cundiff @ 2013-02-12  8:42 UTC (permalink / raw)
  To: Arun Khan; +Cc: Linux MDADM Raid

On Tue, Feb 12, 2013 at 2:16 AM, Arun Khan <knura9@gmail.com> wrote:
> Recovery OS -- System Rescue CD v2.8.0
>
> Production OS -- Debian Squeeze (6), stock 2.6.32 kernel, using mdadm RAID
>
> /dev/md0 is a RAID10 array -- members /dev/sdb1, /dev/sdc1,
> /dev/sdd1, /dev/sde1, all with partition id=fd
>
> [...]
>
> # mdadm -E /dev/sdc1
> [...]
>          Events : 0
> [...]
>    Device Role : spare
>    Array State : ..AA ('A' == active, '.' == missing)
>
> # mdadm -E /dev/sdb1
> [...]
>          Events : 0
> [...]
>    Device Role : spare
>    Array State : ..AA ('A' == active, '.' == missing)
>
> [...]
>
> # mdadm /dev/md0 --add /dev/sdb1
> mdadm: add new device failed for /dev/sdb1 as 4: Invalid argument
>
> from /var/log/messages
> Feb 12 06:52:26 sysresccd kernel: [ 9900.298880] md0: ADD_NEW_DISK not supported

According to what I can see your array refuses to start because the
mirrored pair sdb1 and sdc1 are both out of sync. Both are showing an
event count of zero. That generally means the disk has never attempted
to sync. The device role of "spare" on both of these disks reinforces
that fact.
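
A quick way to see that side by side, reading the metadata only (nothing
is written), is something along these lines:

  # compare event counts and roles across all four members
  mdadm -E /dev/sd[bcde]1 | grep -E '/dev/sd|Events|Device Role'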

Are you sure sdc1 was active, sync'd, and in the array before sdb1 failed?
Could you have accidentally cleared the superblock on sdc1 as well?


--
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com


* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-12  8:42 ` Dave Cundiff
@ 2013-02-12 11:25   ` Arun Khan
  2013-02-12 21:28     ` Dave Cundiff
  0 siblings, 1 reply; 11+ messages in thread
From: Arun Khan @ 2013-02-12 11:25 UTC (permalink / raw)
  To: Linux MDADM Raid

On Tue, Feb 12, 2013 at 2:12 PM, Dave Cundiff  wrote:
>
> According to what I can see your array refuses to start because the
> mirrored pair sdb1 and sdc1 are both out of sync. Both are showing an
> event count of zero. That generally means the disk has never attempted
> to sync. The device role of "spare" on both of these disks reinforces
> that fact.

On Monday evening, when I swapped in a good HDD (for the /dev/sdb1
member), the other three devices were working fine.

In fact, /dev/md0 was mounted on /mnt/md0 from a System Rescue CD live
session.

> Are you sure sdc1 was active, sync'd, and in the array before sdb1 failed?
> Could you have accidentally cleared the superblock on sdc1 as well?

I used 'watch cat /proc/mdstat' to follow the rebuild, and the progress
bar reached the 100% completion mark.
When I broke out of that session and then ran 'cat /proc/mdstat',
I noticed that not only had /dev/sdb1 not been added, but /dev/sdc1 was
no longer part of the array either. With two failed devices, /dev/md0
was still working, mounted on /mnt/md0.

I did not clear the superblock on /dev/sdc1.

Tuesday morning, after a reboot, /dev/md0 remains inactive.

Please let me know if you can think of anything that will get me out
of this mess.

I will try to use a distro with a newer mdadm and kernel.

Thanks.
-- Arun Khan


* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-12  8:32 ` Adam Goryachev
@ 2013-02-12 11:30   ` Arun Khan
  2013-02-12 11:34     ` Arun Khan
  0 siblings, 1 reply; 11+ messages in thread
From: Arun Khan @ 2013-02-12 11:30 UTC (permalink / raw)
  To: Linux MDADM Raid

On Tue, Feb 12, 2013 at 2:02 PM, Adam Goryachev wrote:
>
> Within the last month or thereabouts, I saw a patch go past the list
> which indicated that this might happen. Perhaps you need either a newer
> or an older version of md (Linux kernel) or mdadm.
>
> The issue was an array failing to start even though it had enough
> members to start, while a member was marked failed, or something like
> that....

Thanks. I will try a live distro with a newer mdadm and kernel.

> You didn't mention the kernel version (or I've missed it), but that is
> my suggestion, certainly before you try anything destructive....

It was mentioned right at the beginning of my OP: stock Debian Squeeze
kernel 2.6.32-5.

>
> Or someone else may have a more sensible solution for you (hopefully).

I hope so. I can start afresh from the backups of the critical files,
but I stand to lose at least a day's worth of work :(

Thanks,
--  Arun Khan


* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-12 11:30   ` Arun Khan
@ 2013-02-12 11:34     ` Arun Khan
  2013-02-12 12:05       ` Adam Goryachev
  0 siblings, 1 reply; 11+ messages in thread
From: Arun Khan @ 2013-02-12 11:34 UTC (permalink / raw)
  To: Linux MDADM Raid

On Tue, Feb 12, 2013 at 5:00 PM, Arun Khan <knura9@gmail.com> wrote:
> On Tue, Feb 12, 2013 at 2:02 PM, Adam Goryachev wrote:
>>
>> You didn't mention the kernel version (or I've missed it), but that is
>> my suggestion, certainly before you try anything destructive....
>
> It was mentioned right at the beginning of my OP: stock Debian Squeeze
> kernel 2.6.32-5.
>

and I missed this one -
The System Rescue CD has kernel 3.2.19 and mdadm - v3.1.4 - 31st August 2010.

I am using System Rescue CD to recover /dev/md0.

-- Arun Khan


* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-12 11:34     ` Arun Khan
@ 2013-02-12 12:05       ` Adam Goryachev
  0 siblings, 0 replies; 11+ messages in thread
From: Adam Goryachev @ 2013-02-12 12:05 UTC (permalink / raw)
  To: Arun Khan; +Cc: Linux MDADM Raid

On 12/02/13 22:34, Arun Khan wrote:
> On Tue, Feb 12, 2013 at 5:00 PM, Arun Khan <knura9@gmail.com> wrote:
>> On Tue, Feb 12, 2013 at 2:02 PM, Adam Goryachev wrote:
>>> You didn't mention the kernel version (or I've missed it), but that is
>>> my suggestion, certainly before you try anything destructive....
>> It was mentioned right at the beginning of my OP: stock Debian Squeeze
>> kernel 2.6.32-5.
>>
> and I missed this one -
> The System Rescue CD has kernel 3.2.19 and mdadm - v3.1.4 - 31st August 2010.
>
> I am using System Rescue CD to recover /dev/md0.
Sorry, I missed those then....

Why don't you try using a Debian stable rescue CD? It should have a
similar version of mdadm and kernel to what you are using on the machine.
I think the issue I mentioned was only in the kernel for a limited
period of time, and I'm sure 2.6.x was not affected... I'd suggest you
check the list archives if you need to know exactly which versions are
affected.

Hope this helps...

Regards,
Adam


-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au



* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-12 11:25   ` Arun Khan
@ 2013-02-12 21:28     ` Dave Cundiff
  2013-02-23 14:59       ` Arun Khan
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Cundiff @ 2013-02-12 21:28 UTC (permalink / raw)
  To: Arun Khan; +Cc: Linux MDADM Raid

On Tue, Feb 12, 2013 at 6:25 AM, Arun Khan <knura9@gmail.com> wrote:
> I used 'watch cat /proc/mdstat' to follow the rebuild, and the progress
> bar reached the 100% completion mark.
> When I broke out of that session and then ran 'cat /proc/mdstat',
> I noticed that not only had /dev/sdb1 not been added, but /dev/sdc1 was
> no longer part of the array either. With two failed devices, /dev/md0
> was still working, mounted on /mnt/md0.
>

Have you tried adding the --force option to assemble? I would leave
out sdb since it's an empty drive.

If that brings it online, you can try a read-only fsck with -n to check
the consistency of your data.
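
Roughly something like this -- device names taken from your earlier
output, so double-check them first; this is only a sketch:

  mdadm --stop /dev/md0
  mdadm --assemble --force --verbose /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1
  # if the array comes up, check the filesystem without modifying anything
  fsck -n /dev/md0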


-- 
Dave Cundiff
System Administrator
A2Hosting, Inc
http://www.a2hosting.com


* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-12 21:28     ` Dave Cundiff
@ 2013-02-23 14:59       ` Arun Khan
  2013-02-24 14:27         ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 11+ messages in thread
From: Arun Khan @ 2013-02-23 14:59 UTC (permalink / raw)
  To: Linux MDADM Raid

On Wed, Feb 13, 2013 at 2:58 AM, Dave Cundiff <syshackmin@gmail.com> wrote:
> On Tue, Feb 12, 2013 at 6:25 AM, Arun Khan <knura9@gmail.com> wrote:
>> I used 'watch cat /proc/mdstat' to follow the rebuild, and the progress
>> bar reached the 100% completion mark.
>> When I broke out of that session and then ran 'cat /proc/mdstat',
>> I noticed that not only had /dev/sdb1 not been added, but /dev/sdc1 was
>> no longer part of the array either. With two failed devices, /dev/md0
>> was still working, mounted on /mnt/md0.
>>
>
> Have you tried adding the --force option to assemble? I would leave
> out sdb since it's an empty drive.
>
> If that brings it online, you can try a read-only fsck with -n to check
> the consistency of your data.

Yes, I did use --force to assemble, but no joy.

Fortunately, I have been able to recover the data by sheer luck!

I figured I was already in a hole, so there was no harm in reconnecting
the 'failed' disk before RMA'ing it.
The disk (/dev/sdb) was recognized by the BIOS, and the OS (Debian) did
not report any DRDY errors on the device.

The Events count for the RAID partition (/dev/sdb1) on this device was
greater than zero, so now I had three devices with Events > 0.

So far so good ...

mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdd1 /dev/sde1

did initialize /dev/md0 as active!

mdadm --add --force /dev/md0 /dev/sdc1
added /dev/sdc1 back into the array, and I got a fully functional array.

Then I failed and removed /dev/sdb1 (the original failed disk) from the
array; /dev/md0 was still functional with 3 disks.

I connected the new hard disk, partitioned /dev/sdb1 to match size and
partition id (fd),

mdadm --add /dev/md0 /dev/sdb1 gave a fully functional /dev/md0

It was a shot in the dark and it worked!

Do not RMA a failing disk in a hurry; it might still save your day.

Thanks to all for your help.

-- Arun Khan


* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-23 14:59       ` Arun Khan
@ 2013-02-24 14:27         ` Roy Sigurd Karlsbakk
  2013-02-25  6:22           ` Arun Khan
  0 siblings, 1 reply; 11+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-02-24 14:27 UTC (permalink / raw)
  To: Arun Khan; +Cc: Linux MDADM Raid

> I connected the new hard disk, partitioned /dev/sdb1 to match size and
> partition id (fd),

Out of interest - why do you use partitions if you use whole drives for the RAID?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign etymology. In most cases, adequate and relevant synonyms exist in Norwegian.
--


* Re: Unable to reactivate a RAID10 mdadm device
  2013-02-24 14:27         ` Roy Sigurd Karlsbakk
@ 2013-02-25  6:22           ` Arun Khan
  0 siblings, 0 replies; 11+ messages in thread
From: Arun Khan @ 2013-02-25  6:22 UTC (permalink / raw)
  To: Linux MDADM Raid

On Sun, Feb 24, 2013 at 7:57 PM, Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
>> I connected the new hard disk, partitioned /dev/sdb1 to match size and
>> partition id (fd),
>
> Out of interest - why do you use partitions if you use whole drives for the RAID?
>

The system has a total of 5 disk devices.

1. /dev/sda is a 4GB CF card with a minimal Debian Squeeze (amd64).

The other four are, in order (SATA ports 0, 1, 2, 3):

2. 250GB is /dev/sdb
3. 160GB is /dev/sdc
4. 160GB is /dev/sdd
5. 160GB is /dev/sde

/dev/sdb1 (part id=fd) is slightly larger than /dev/sd[cde]1; the
remainder of /dev/sdb is used for /tmp, /var, /var/tmp, /opt, etc.

Partitions vs. whole disk -- the wiki page
<https://raid.wiki.kernel.org/index.php/Partition_Types>
suggests both are OK.

My understanding is that the partition id must be fd for autodetect. To
play it safe, I have created partitions with part id fd.
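
For completeness, setting that type on a replacement disk is just the
usual fdisk steps, roughly (the single letters are fdisk menu commands):

  fdisk /dev/sdb     # n = new partition, t = set type to fd, w = write and exit
  fdisk -l /dev/sdb  # verify the partition now shows "Linux raid autodetect"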

-- Arun Khan


end of thread

Thread overview: 11+ messages
2013-02-12  7:16 Unable to reactivate a RAID10 mdadm device Arun Khan
2013-02-12  8:32 ` Adam Goryachev
2013-02-12 11:30   ` Arun Khan
2013-02-12 11:34     ` Arun Khan
2013-02-12 12:05       ` Adam Goryachev
2013-02-12  8:42 ` Dave Cundiff
2013-02-12 11:25   ` Arun Khan
2013-02-12 21:28     ` Dave Cundiff
2013-02-23 14:59       ` Arun Khan
2013-02-24 14:27         ` Roy Sigurd Karlsbakk
2013-02-25  6:22           ` Arun Khan
