* RAID 5 re-add of removed drive? (failed drive replacement)
@ 2009-06-02 10:09 Alex R
From: Alex R @ 2009-06-02 10:09 UTC (permalink / raw)
  To: linux-raid


I have a serious RAID problem here. Please have a look at this. Any help
would be greatly appreciated!

As always, problems surface only during critical tasks like enlarging or
restoring. I tried to replace a drive in my 7-disc 6 TB RAID5 array as
explained here:
http://michael-prokop.at/blog/2006/09/09/raid5-online-resizing-with-linux/

After removing a drive and rebuilding onto the new one, another disc in the
array failed. I still have all the data redundantly available (the old
drive is untouched), but the RAID superblocks are now in a state where it's
impossible to access the data. Is it possible to rearrange the drives to
force the kernel to assemble a valid array?

Here is the story:

// my normal boot log showing RAID devices

Jun  1 22:37:45 localhost klogd: md: md0 stopped.
Jun  1 22:37:45 localhost klogd: md: bind<sdl1>
Jun  1 22:37:45 localhost klogd: md: bind<sdh1>
Jun  1 22:37:45 localhost klogd: md: bind<sdj1>
Jun  1 22:37:45 localhost klogd: md: bind<sdk1>
Jun  1 22:37:45 localhost klogd: md: bind<sdg1>
Jun  1 22:37:45 localhost klogd: md: bind<sda1>
Jun  1 22:37:45 localhost klogd: md: bind<sdi1>
Jun  1 22:37:45 localhost klogd: xor: automatically using best checksumming function: generic_sse
Jun  1 22:37:45 localhost klogd:    generic_sse:  5144.000 MB/sec
Jun  1 22:37:45 localhost klogd: xor: using function: generic_sse (5144.000 MB/sec)
Jun  1 22:37:45 localhost klogd: async_tx: api initialized (async)
Jun  1 22:37:45 localhost klogd: raid6: int64x1   1539 MB/s
Jun  1 22:37:45 localhost klogd: raid6: int64x2   1558 MB/s
Jun  1 22:37:45 localhost klogd: raid6: int64x4   1968 MB/s
Jun  1 22:37:45 localhost klogd: raid6: int64x8   1554 MB/s
Jun  1 22:37:45 localhost klogd: raid6: sse2x1    2441 MB/s
Jun  1 22:37:45 localhost klogd: raid6: sse2x2    3250 MB/s
Jun  1 22:37:45 localhost klogd: raid6: sse2x4    3460 MB/s
Jun  1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460 MB/s)
Jun  1 22:37:45 localhost klogd: md: raid6 personality registered for level 6
Jun  1 22:37:45 localhost klogd: md: raid5 personality registered for level 5
Jun  1 22:37:45 localhost klogd: md: raid4 personality registered for level 4
Jun  1 22:37:45 localhost klogd: raid5: device sdi1 operational as raid disk 0
Jun  1 22:37:45 localhost klogd: raid5: device sda1 operational as raid disk 6
Jun  1 22:37:45 localhost klogd: raid5: device sdg1 operational as raid disk 5
Jun  1 22:37:45 localhost klogd: raid5: device sdk1 operational as raid disk 4
Jun  1 22:37:45 localhost klogd: raid5: device sdj1 operational as raid disk 3
Jun  1 22:37:45 localhost klogd: raid5: device sdh1 operational as raid disk 2
Jun  1 22:37:45 localhost klogd: raid5: device sdl1 operational as raid disk 1
Jun  1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
Jun  1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active with 7 out of 7 devices, algorithm 2
Jun  1 22:37:45 localhost klogd: RAID5 conf printout:
Jun  1 22:37:45 localhost klogd:  --- rd:7 wd:7
Jun  1 22:37:45 localhost klogd:  disk 0, o:1, dev:sdi1
Jun  1 22:37:45 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  1 22:37:45 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  1 22:37:45 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  1 22:37:45 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  1 22:37:45 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  1 22:37:45 localhost klogd:  disk 6, o:1, dev:sda1
Jun  1 22:37:45 localhost klogd: md0: detected capacity change from 0 to 6001213046784
Jun  1 22:37:45 localhost klogd:  md0: unknown partition table

// now a new spare drive is added

[root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1

Jun  1 22:42:00 localhost klogd: md: bind<sdb1>
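
// A quick sanity check at this point, before failing anything, would be
// to confirm that the new device really registered as a spare:

[root@localhost ~]# mdadm --detail /dev/md0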

// and here goes the drive replacement

[root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1

Jun  1 22:44:10 localhost klogd: raid5: Disk failure on sdi1, disabling device.
Jun  1 22:44:10 localhost klogd: raid5: Operation continuing on 6 devices.
Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
Jun  1 22:44:10 localhost klogd:  disk 0, o:0, dev:sdi1
Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
Jun  1 22:44:10 localhost klogd:  disk 0, o:1, dev:sdb1
Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
Jun  1 22:44:10 localhost klogd: md: recovery of RAID array md0
Jun  1 22:44:10 localhost klogd: md: unbind<sdi1>
Jun  1 22:44:10 localhost klogd: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Jun  1 22:44:10 localhost klogd: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Jun  1 22:44:10 localhost klogd: md: using 128k window, over a total of 976759936 blocks.
Jun  1 22:44:10 localhost klogd: md: export_rdev(sdi1)
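
// For reference: later mdadm releases (3.3 and newer, with kernel 3.2+)
// offer a hot replace that keeps the old member in the array until the
// copy to the new one completes; that feature is not available on this
// system, but it avoids exactly the failure window hit below:

[root@localhost ~]# mdadm /dev/md0 --replace /dev/sdi1 --with /dev/sdb1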

[root@localhost ~]# more /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1]
      5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
      [=====>...............]  recovery = 27.5% (269352320/976759936) finish=276.2min speed=42686K/sec

// surface error on a RAID drive during recovery:

Jun  2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x0
Jun  2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
Jun  2 03:59:49 localhost klogd: ata1.00: cmd 60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
Jun  2 03:59:49 localhost klogd:          res 41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
Jun  2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
Jun  2 03:59:49 localhost klogd: ata1.00: error: { UNC }
Jun  2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
Jun  2 03:59:49 localhost klogd: ata1: EH complete
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun  2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc SErr 0x0 action 0x0
Jun  2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
Jun  2 03:59:49 localhost klogd: ata1.00: cmd 60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
Jun  2 03:59:49 localhost klogd:          res 41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
Jun  2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
Jun  2 03:59:49 localhost klogd: ata1.00: error: { UNC }
Jun  2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
Jun  2 03:59:49 localhost klogd: ata1: EH complete
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
...
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269136 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269144 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269152 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269160 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269168 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269176 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269184 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269192 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269200 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269208 on sda1).
Jun  2 03:59:49 localhost klogd: ata1: EH complete
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun  2 03:59:49 localhost klogd: RAID5 conf printout:
Jun  2 03:59:49 localhost klogd:  --- rd:7 wd:5
Jun  2 03:59:49 localhost klogd:  disk 0, o:1, dev:sdb1
Jun  2 03:59:49 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  2 03:59:49 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  2 03:59:49 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  2 03:59:49 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  2 03:59:49 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  2 03:59:49 localhost klogd:  disk 6, o:0, dev:sda1
Jun  2 03:59:49 localhost klogd: RAID5 conf printout:
Jun  2 03:59:49 localhost klogd:  --- rd:7 wd:5
Jun  2 03:59:49 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  2 03:59:49 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  2 03:59:49 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  2 03:59:50 localhost klogd:  disk 6, o:0, dev:sda1
Jun  2 03:59:50 localhost klogd: RAID5 conf printout:
Jun  2 03:59:50 localhost klogd:  --- rd:7 wd:5
Jun  2 03:59:50 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  2 03:59:50 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  2 03:59:50 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  2 03:59:50 localhost klogd:  disk 6, o:0, dev:sda1
Jun  2 03:59:50 localhost klogd: RAID5 conf printout:
Jun  2 03:59:50 localhost klogd:  --- rd:7 wd:5
Jun  2 03:59:50 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  2 03:59:50 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  2 03:59:50 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently unreadable (pending) sectors
Jun  2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline uncorrectable sectors
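
// Before any further md operations it would be safest to clone the
// failing sda onto a spare disk of at least equal size with GNU ddrescue
// (the target device below is only a placeholder), so that later recovery
// attempts run against a copy instead of the dying media:

[root@localhost ~]# ddrescue -f /dev/sda /dev/sdX /root/sda.map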

// md0 is now down. But hey, still got the old drive, so just add it again:

[root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1

Jun  2 09:11:49 localhost klogd: md: bind<sdi1>

// it's just added as a SPARE! HELP!!! reboot always helps..
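
// (Why a spare: --remove discarded sdi1's slot assignment, and a plain
// --add enrolls the disc as a fresh spare rather than as disc 0. The
// related problem shows up in the event counters of the -E output below:
// sda1's superblock is frozen at 2599984 while the rest are at 2599992.)

[root@localhost ~]# mdadm -E /dev/sd[bagkjhli]1 | grep Events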

[root@localhost ~]# reboot
[root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 15401f4b:391c2538:89022bfa:d48f439f
  Creation Time : Sun Nov  2 13:21:54 2008
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 0

    Update Time : Mon Jun  1 22:44:10 2009
          State : clean
 Active Devices : 6
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 22d364f3 - correct
         Events : 2599984

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8        1        6      active sync   /dev/sda1

   0     0       0        0        0      removed
   1     1       8      177        1      active sync   /dev/sdl1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8      161        4      active sync   /dev/sdk1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       8        1        6      active sync   /dev/sda1
   7     7       8       17        7      spare   /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 15401f4b:391c2538:89022bfa:d48f439f
  Creation Time : Sun Nov  2 13:21:54 2008
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
   Raid Devices : 7
  Total Devices : 8
Preferred Minor : 0

    Update Time : Tue Jun  2 09:11:49 2009
          State : clean
 Active Devices : 5
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 2
       Checksum : 22d3f8dd - correct
         Events : 2599992

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     8       8       17        8      spare   /dev/sdb1

   0     0       0        0        0      removed
   1     1       8      177        1      active sync   /dev/sdl1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8      161        4      active sync   /dev/sdk1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       8      129        7      spare   /dev/sdi1
   8     8       8       17        8      spare   /dev/sdb1
/dev/sdg1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 15401f4b:391c2538:89022bfa:d48f439f
  Creation Time : Sun Nov  2 13:21:54 2008
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
   Raid Devices : 7
  Total Devices : 8
Preferred Minor : 0

    Update Time : Tue Jun  2 09:11:49 2009
          State : clean
 Active Devices : 5
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 2
       Checksum : 22d3f92d - correct
         Events : 2599992

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       97        5      active sync   /dev/sdg1

   0     0       0        0        0      removed
   1     1       8      177        1      active sync   /dev/sdl1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8      161        4      active sync   /dev/sdk1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       8      129        7      spare   /dev/sdi1
   8     8       8       17        8      spare   /dev/sdb1
/dev/sdh1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 15401f4b:391c2538:89022bfa:d48f439f
  Creation Time : Sun Nov  2 13:21:54 2008
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
   Raid Devices : 7
  Total Devices : 8
Preferred Minor : 0

    Update Time : Tue Jun  2 09:11:49 2009
          State : clean
 Active Devices : 5
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 2
       Checksum : 22d3f937 - correct
         Events : 2599992

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8      113        2      active sync   /dev/sdh1

   0     0       0        0        0      removed
   1     1       8      177        1      active sync   /dev/sdl1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8      161        4      active sync   /dev/sdk1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       8      129        7      spare   /dev/sdi1
   8     8       8       17        8      spare   /dev/sdb1
/dev/sdi1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 15401f4b:391c2538:89022bfa:d48f439f
  Creation Time : Sun Nov  2 13:21:54 2008
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
   Raid Devices : 7
  Total Devices : 8
Preferred Minor : 0

    Update Time : Tue Jun  2 09:11:49 2009
          State : clean
 Active Devices : 5
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 2
       Checksum : 22d3f94b - correct
         Events : 2599992

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     7       8      129        7      spare   /dev/sdi1

   0     0       0        0        0      removed
   1     1       8      177        1      active sync   /dev/sdl1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8      161        4      active sync   /dev/sdk1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       8      129        7      spare   /dev/sdi1
   8     8       8       17        8      spare   /dev/sdb1
/dev/sdj1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 15401f4b:391c2538:89022bfa:d48f439f
  Creation Time : Sun Nov  2 13:21:54 2008
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
   Raid Devices : 7
  Total Devices : 8
Preferred Minor : 0

    Update Time : Tue Jun  2 09:11:49 2009
          State : clean
 Active Devices : 5
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 2
       Checksum : 22d3f959 - correct
         Events : 2599992

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8      145        3      active sync   /dev/sdj1

   0     0       0        0        0      removed
   1     1       8      177        1      active sync   /dev/sdl1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8      161        4      active sync   /dev/sdk1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       8      129        7      spare   /dev/sdi1
   8     8       8       17        8      spare   /dev/sdb1
/dev/sdk1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 15401f4b:391c2538:89022bfa:d48f439f
  Creation Time : Sun Nov  2 13:21:54 2008
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
   Raid Devices : 7
  Total Devices : 8
Preferred Minor : 0

    Update Time : Tue Jun  2 09:11:49 2009
          State : clean
 Active Devices : 5
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 2
       Checksum : 22d3f96b - correct
         Events : 2599992

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8      161        4      active sync   /dev/sdk1

   0     0       0        0        0      removed
   1     1       8      177        1      active sync   /dev/sdl1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8      161        4      active sync   /dev/sdk1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       8      129        7      spare   /dev/sdi1
   8     8       8       17        8      spare   /dev/sdb1
/dev/sdl1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 15401f4b:391c2538:89022bfa:d48f439f
  Creation Time : Sun Nov  2 13:21:54 2008
     Raid Level : raid5
  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
   Raid Devices : 7
  Total Devices : 8
Preferred Minor : 0

    Update Time : Tue Jun  2 09:11:49 2009
          State : clean
 Active Devices : 5
Working Devices : 7
 Failed Devices : 1
  Spare Devices : 2
       Checksum : 22d3f975 - correct
         Events : 2599992

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8      177        1      active sync   /dev/sdl1

   0     0       0        0        0      removed
   1     1       8      177        1      active sync   /dev/sdl1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8      145        3      active sync   /dev/sdj1
   4     4       8      161        4      active sync   /dev/sdk1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       8      129        7      spare   /dev/sdi1
   8     8       8       17        8      spare   /dev/sdb1

the old RAID configuration was:

disc 0: sdi1 <- is now disc 7 and SPARE
disc 1: sdl1
disc 2: sdh1
disc 3: sdj1
disc 4: sdk1
disc 5: sdg1
disc 6: sda1 <- is now faulty removed

[root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start the array.
[root@localhost log]# cat /proc/mdstat 
Personalities : 
md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S) sdk1[4](S) sdj1[3](S) sdh1[2](S)
      8790840960 blocks
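
// A commonly suggested next step (worth double-checking against current
// advice on this list before running it) is to stop the inactive array
// and assemble only the six original members with --force, leaving both
// spares out, so that mdadm can bump sda1's stale event count and start
// the array degraded:

[root@localhost log]# mdadm --stop /dev/md0
[root@localhost log]# mdadm --assemble --force /dev/md0 \
      /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 /dev/sda1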


On large arrays this may happen a lot: a bad drive is first discovered
during maintenance operations, when it's already too late. Maybe an option
to add a replacement drive in a fail-safe way would be a good addition to
md's services.
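
One mitigation that already exists is a write-intent bitmap: with one in
place, a member that is removed and then re-added should only need to
resync the regions that changed, instead of coming back as a bare spare
facing a full rebuild. For example:

[root@localhost log]# mdadm --grow --bitmap=internal /dev/md0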

Please tell me if you see any solution to the problems below.

1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as it
was before the restore attempt?

2. Is it possible to reassign /dev/sda1 as disc 6 and back up the still-
readable data on the RAID?

3. I guess more than 90% of the data was written to /dev/sdb1 in the
restore attempt. Is it possible to use /dev/sdb1 as disc 7 to access the
RAID?
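
If nothing else works, the last-resort route that tends to come up on this
list is to recreate the superblocks in place with --assume-clean, using
exactly the geometry in the -E output above (level 5, 7 devices, 64K
chunk, left-symmetric, 0.90 metadata) and the original device order. This
sketch rewrites metadata only, but any mistake in the order or parameters
scrambles the data, so the result must be checked read-only (e.g. with a
filesystem check in no-change mode) before trusting it:

[root@localhost log]# mdadm --stop /dev/md0
[root@localhost log]# mdadm --create /dev/md0 --assume-clean --level=5 \
      --raid-devices=7 --chunk=64 --layout=left-symmetric --metadata=0.90 \
      /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 /dev/sda1
[root@localhost log]# fsck -n /dev/md0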

Thank you for looking at the problem.
Alexander
-- 
View this message in context: http://www.nabble.com/RAID-5-re-add-of-removed-drive--%28failed-drive-replacement%29-tp23828899p23828899.html
Sent from the linux-raid mailing list archive at Nabble.com.



* Re: RAID 5 re-add of removed drive? (failed drive replacement)
@ 2009-06-02 10:18 ` Sujit Karataparambil
From: Sujit Karataparambil @ 2009-06-02 10:18 UTC (permalink / raw)
  To: Alex R; +Cc: linux-raid

http://www.cyberciti.biz/faq/howto-rebuilding-a-raid-array-after-a-disk-fails/


On Tue, Jun 2, 2009 at 3:39 PM, Alex R <Alexander.Rietsch@hispeed.ch> wrote:
> [Alex R's original message quoted in full; snipped]



-- 
-- Sujit K M


* Re: RAID 5 re-add of removed drive? (failed drive replacement)
@ 2009-06-02 10:45   ` Alexander Rietsch
From: Alexander Rietsch @ 2009-06-02 10:45 UTC (permalink / raw)
  To: Sujit Karataparambil; +Cc: linux-raid

Thank you for answering my mail, but please actually read it instead of
posting a link that contains no more information than the RAID FAQ or the
mdadm man page already do. Here is the short version of my problem:

>> disc 0: sdi1 <- is now disc 7 and SPARE
>> disc 1: sdl1
>> disc 2: sdh1
>> disc 3: sdj1
>> disc 4: sdk1
>> disc 5: sdg1
>> disc 6: sda1 <- is now faulty removed
sdb1 <- unfinished replacement drive, now SPARE

Of the original 7 drives, 2 are disabled. Please tell me how to
- re-add sdi1 as disc 0 (mdadm --re-add just adds it as a spare)
- enable sda1 as disc 6 (mdadm --assemble --force --scan refuses to
accept it)
- use the new drive sdb1 as disc 7 (mdadm --assemble --force --scan just
adds it as a spare)

original post:

After removing a drive and restoring to the new one, another disc in  
the array failed. Now I still have all the data redundantly available  
(the old drive is still there), but the RAID header is now in a state  
where it's impossible to access the data. Is it possible to rearrange  
the drives to force the kernel to a valid array?

Here is the story:

// my normal boot log showing RAID devices

Jun  1 22:37:45 localhost klogd: md: md0 stopped.
Jun  1 22:37:45 localhost klogd: md: bind<sdl1>
Jun  1 22:37:45 localhost klogd: md: bind<sdh1>
Jun  1 22:37:45 localhost klogd: md: bind<sdj1>
Jun  1 22:37:45 localhost klogd: md: bind<sdk1>
Jun  1 22:37:45 localhost klogd: md: bind<sdg1>
Jun  1 22:37:45 localhost klogd: md: bind<sda1>
Jun  1 22:37:45 localhost klogd: md: bind<sdi1>
Jun  1 22:37:45 localhost klogd: xor: automatically using best  
checksumming function: generic_sse
Jun  1 22:37:45 localhost klogd:    generic_sse:  5144.000 MB/sec
Jun  1 22:37:45 localhost klogd: xor: using function: generic_sse  
(5144.000 MB/sec)
Jun  1 22:37:45 localhost klogd: async_tx: api initialized (async)
Jun  1 22:37:45 localhost klogd: raid6: int64x1   1539 MB/s
Jun  1 22:37:45 localhost klogd: raid6: int64x2   1558 MB/s
Jun  1 22:37:45 localhost klogd: raid6: int64x4   1968 MB/s
Jun  1 22:37:45 localhost klogd: raid6: int64x8   1554 MB/s
Jun  1 22:37:45 localhost klogd: raid6: sse2x1    2441 MB/s
Jun  1 22:37:45 localhost klogd: raid6: sse2x2    3250 MB/s
Jun  1 22:37:45 localhost klogd: raid6: sse2x4    3460 MB/s
Jun  1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460  
MB/s)
Jun  1 22:37:45 localhost klogd: md: raid6 personality registered for  
level 6
Jun  1 22:37:45 localhost klogd: md: raid5 personality registered for  
level 5
Jun  1 22:37:45 localhost klogd: md: raid4 personality registered for  
level 4
Jun  1 22:37:45 localhost klogd: raid5: device sdi1 operational as  
raid disk 0
Jun  1 22:37:45 localhost klogd: raid5: device sda1 operational as  
raid disk 6
Jun  1 22:37:45 localhost klogd: raid5: device sdg1 operational as  
raid disk 5
Jun  1 22:37:45 localhost klogd: raid5: device sdk1 operational as  
raid disk 4
Jun  1 22:37:45 localhost klogd: raid5: device sdj1 operational as  
raid disk 3
Jun  1 22:37:45 localhost klogd: raid5: device sdh1 operational as  
raid disk 2
Jun  1 22:37:45 localhost klogd: raid5: device sdl1 operational as  
raid disk 1
Jun  1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
Jun  1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active  
with 7 out of 7 devices, algorithm 2
Jun  1 22:37:45 localhost klogd: RAID5 conf printout:
Jun  1 22:37:45 localhost klogd:  --- rd:7 wd:7
Jun  1 22:37:45 localhost klogd:  disk 0, o:1, dev:sdi1
Jun  1 22:37:45 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  1 22:37:45 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  1 22:37:45 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  1 22:37:45 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  1 22:37:45 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  1 22:37:45 localhost klogd:  disk 6, o:1, dev:sda1
Jun  1 22:37:45 localhost klogd: md0: detected capacity change from 0  
to 6001213046784
Jun  1 22:37:45 localhost klogd:  md0: unknown partition table

// now a new spare drive is added

[root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1

Jun  1 22:42:00 localhost klogd: md: bind<sdb1>

// and here goes the drive replacement

[root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1

Jun  1 22:44:10 localhost klogd: raid5: Disk failure on sdi1,  
disabling device.
Jun  1 22:44:10 localhost klogd: raid5: Operation continuing on 6  
devices.
Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
Jun  1 22:44:10 localhost klogd:  disk 0, o:0, dev:sdi1
Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
Jun  1 22:44:10 localhost klogd:  disk 0, o:1, dev:sdb1
Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
Jun  1 22:44:10 localhost klogd: md: recovery of RAID array md0
Jun  1 22:44:10 localhost klogd: md: unbind<sdi1>
Jun  1 22:44:10 localhost klogd: md: minimum _guaranteed_  speed: 1000  
KB/sec/disk.
Jun  1 22:44:10 localhost klogd: md: using maximum available idle IO  
bandwidth (but not more than 200000 KB/sec) for recovery.
Jun  1 22:44:10 localhost klogd: md: using 128k window, over a total  
of 976759936 blocks.
Jun  1 22:44:10 localhost klogd: md: export_rdev(sdi1)

[root@localhost ~]# more /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2]  
sdl1[1]
       5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
       [=====>...............]  recovery = 27.5% (269352320/976759936)  
finish=276.2min speed=42686K/sec

// surface error on RAID drive while recovery:

Jun  2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x0
Jun  2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
Jun  2 03:59:49 localhost klogd: ata1.00: cmd 60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
Jun  2 03:59:49 localhost klogd:          res 41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
Jun  2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
Jun  2 03:59:49 localhost klogd: ata1.00: error: { UNC }
Jun  2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
Jun  2 03:59:49 localhost klogd: ata1: EH complete
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun  2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc SErr 0x0 action 0x0
Jun  2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
Jun  2 03:59:49 localhost klogd: ata1.00: cmd 60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
Jun  2 03:59:49 localhost klogd:          res 41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
Jun  2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
Jun  2 03:59:49 localhost klogd: ata1.00: error: { UNC }
Jun  2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
Jun  2 03:59:49 localhost klogd: ata1: EH complete
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
...
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269136 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269144 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269152 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269160 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269168 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269176 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269184 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269192 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269200 on sda1).
Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable (sector 1807269208 on sda1).
Jun  2 03:59:49 localhost klogd: ata1: EH complete
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun  2 03:59:49 localhost klogd: RAID5 conf printout:
Jun  2 03:59:49 localhost klogd:  --- rd:7 wd:5
Jun  2 03:59:49 localhost klogd:  disk 0, o:1, dev:sdb1
Jun  2 03:59:49 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  2 03:59:49 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  2 03:59:49 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  2 03:59:49 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  2 03:59:49 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  2 03:59:49 localhost klogd:  disk 6, o:0, dev:sda1
Jun  2 03:59:49 localhost klogd: RAID5 conf printout:
Jun  2 03:59:49 localhost klogd:  --- rd:7 wd:5
Jun  2 03:59:49 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  2 03:59:49 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  2 03:59:49 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  2 03:59:50 localhost klogd:  disk 6, o:0, dev:sda1
Jun  2 03:59:50 localhost klogd: RAID5 conf printout:
Jun  2 03:59:50 localhost klogd:  --- rd:7 wd:5
Jun  2 03:59:50 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  2 03:59:50 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  2 03:59:50 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  2 03:59:50 localhost klogd:  disk 6, o:0, dev:sda1
Jun  2 03:59:50 localhost klogd: RAID5 conf printout:
Jun  2 03:59:50 localhost klogd:  --- rd:7 wd:5
Jun  2 03:59:50 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  2 03:59:50 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  2 03:59:50 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently unreadable (pending) sectors
Jun  2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline uncorrectable sectors

// md0 is now down. But hey, still got the old drive, so just add it again:

[root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1

Jun  2 09:11:49 localhost klogd: md: bind<sdi1>

// it's just added as a SPARE! HELP!!! reboot always helps..
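
// (a plain --add of a disc that was kicked from the array just makes it a
// spare; --re-add can usually only put it straight back into its old slot if
// nothing was written in between, or if a write-intent bitmap, e.g. one set
// up earlier with "mdadm --grow /dev/md0 --bitmap=internal", can bring it
// up to date)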

[root@localhost ~]# reboot
[root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
/dev/sda1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 15401f4b:391c2538:89022bfa:d48f439f
   Creation Time : Sun Nov  2 13:21:54 2008
      Raid Level : raid5
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
    Raid Devices : 7
   Total Devices : 7
Preferred Minor : 0

     Update Time : Mon Jun  1 22:44:10 2009
           State : clean
  Active Devices : 6
Working Devices : 7
  Failed Devices : 0
   Spare Devices : 1
        Checksum : 22d364f3 - correct
          Events : 2599984

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     6       8        1        6      active sync   /dev/sda1

    0     0       0        0        0      removed
    1     1       8      177        1      active sync   /dev/sdl1
    2     2       8      113        2      active sync   /dev/sdh1
    3     3       8      145        3      active sync   /dev/sdj1
    4     4       8      161        4      active sync   /dev/sdk1
    5     5       8       97        5      active sync   /dev/sdg1
    6     6       8        1        6      active sync   /dev/sda1
    7     7       8       17        7      spare   /dev/sdb1
/dev/sdb1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 15401f4b:391c2538:89022bfa:d48f439f
   Creation Time : Sun Nov  2 13:21:54 2008
      Raid Level : raid5
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 0

     Update Time : Tue Jun  2 09:11:49 2009
           State : clean
  Active Devices : 5
Working Devices : 7
  Failed Devices : 1
   Spare Devices : 2
        Checksum : 22d3f8dd - correct
          Events : 2599992

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     8       8       17        8      spare   /dev/sdb1

    0     0       0        0        0      removed
    1     1       8      177        1      active sync   /dev/sdl1
    2     2       8      113        2      active sync   /dev/sdh1
    3     3       8      145        3      active sync   /dev/sdj1
    4     4       8      161        4      active sync   /dev/sdk1
    5     5       8       97        5      active sync   /dev/sdg1
    6     6       0        0        6      faulty removed
    7     7       8      129        7      spare   /dev/sdi1
    8     8       8       17        8      spare   /dev/sdb1
/dev/sdg1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 15401f4b:391c2538:89022bfa:d48f439f
   Creation Time : Sun Nov  2 13:21:54 2008
      Raid Level : raid5
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 0

     Update Time : Tue Jun  2 09:11:49 2009
           State : clean
  Active Devices : 5
Working Devices : 7
  Failed Devices : 1
   Spare Devices : 2
        Checksum : 22d3f92d - correct
          Events : 2599992

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     5       8       97        5      active sync   /dev/sdg1

    0     0       0        0        0      removed
    1     1       8      177        1      active sync   /dev/sdl1
    2     2       8      113        2      active sync   /dev/sdh1
    3     3       8      145        3      active sync   /dev/sdj1
    4     4       8      161        4      active sync   /dev/sdk1
    5     5       8       97        5      active sync   /dev/sdg1
    6     6       0        0        6      faulty removed
    7     7       8      129        7      spare   /dev/sdi1
    8     8       8       17        8      spare   /dev/sdb1
/dev/sdh1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 15401f4b:391c2538:89022bfa:d48f439f
   Creation Time : Sun Nov  2 13:21:54 2008
      Raid Level : raid5
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 0

     Update Time : Tue Jun  2 09:11:49 2009
           State : clean
  Active Devices : 5
Working Devices : 7
  Failed Devices : 1
   Spare Devices : 2
        Checksum : 22d3f937 - correct
          Events : 2599992

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     2       8      113        2      active sync   /dev/sdh1

    0     0       0        0        0      removed
    1     1       8      177        1      active sync   /dev/sdl1
    2     2       8      113        2      active sync   /dev/sdh1
    3     3       8      145        3      active sync   /dev/sdj1
    4     4       8      161        4      active sync   /dev/sdk1
    5     5       8       97        5      active sync   /dev/sdg1
    6     6       0        0        6      faulty removed
    7     7       8      129        7      spare   /dev/sdi1
    8     8       8       17        8      spare   /dev/sdb1
/dev/sdi1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 15401f4b:391c2538:89022bfa:d48f439f
   Creation Time : Sun Nov  2 13:21:54 2008
      Raid Level : raid5
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 0

     Update Time : Tue Jun  2 09:11:49 2009
           State : clean
  Active Devices : 5
Working Devices : 7
  Failed Devices : 1
   Spare Devices : 2
        Checksum : 22d3f94b - correct
          Events : 2599992

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     7       8      129        7      spare   /dev/sdi1

    0     0       0        0        0      removed
    1     1       8      177        1      active sync   /dev/sdl1
    2     2       8      113        2      active sync   /dev/sdh1
    3     3       8      145        3      active sync   /dev/sdj1
    4     4       8      161        4      active sync   /dev/sdk1
    5     5       8       97        5      active sync   /dev/sdg1
    6     6       0        0        6      faulty removed
    7     7       8      129        7      spare   /dev/sdi1
    8     8       8       17        8      spare   /dev/sdb1
/dev/sdj1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 15401f4b:391c2538:89022bfa:d48f439f
   Creation Time : Sun Nov  2 13:21:54 2008
      Raid Level : raid5
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 0

     Update Time : Tue Jun  2 09:11:49 2009
           State : clean
  Active Devices : 5
Working Devices : 7
  Failed Devices : 1
   Spare Devices : 2
        Checksum : 22d3f959 - correct
          Events : 2599992

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     3       8      145        3      active sync   /dev/sdj1

    0     0       0        0        0      removed
    1     1       8      177        1      active sync   /dev/sdl1
    2     2       8      113        2      active sync   /dev/sdh1
    3     3       8      145        3      active sync   /dev/sdj1
    4     4       8      161        4      active sync   /dev/sdk1
    5     5       8       97        5      active sync   /dev/sdg1
    6     6       0        0        6      faulty removed
    7     7       8      129        7      spare   /dev/sdi1
    8     8       8       17        8      spare   /dev/sdb1
/dev/sdk1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 15401f4b:391c2538:89022bfa:d48f439f
   Creation Time : Sun Nov  2 13:21:54 2008
      Raid Level : raid5
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 0

     Update Time : Tue Jun  2 09:11:49 2009
           State : clean
  Active Devices : 5
Working Devices : 7
  Failed Devices : 1
   Spare Devices : 2
        Checksum : 22d3f96b - correct
          Events : 2599992

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     4       8      161        4      active sync   /dev/sdk1

    0     0       0        0        0      removed
    1     1       8      177        1      active sync   /dev/sdl1
    2     2       8      113        2      active sync   /dev/sdh1
    3     3       8      145        3      active sync   /dev/sdj1
    4     4       8      161        4      active sync   /dev/sdk1
    5     5       8       97        5      active sync   /dev/sdg1
    6     6       0        0        6      faulty removed
    7     7       8      129        7      spare   /dev/sdi1
    8     8       8       17        8      spare   /dev/sdb1
/dev/sdl1:
           Magic : a92b4efc
         Version : 0.90.00
            UUID : 15401f4b:391c2538:89022bfa:d48f439f
   Creation Time : Sun Nov  2 13:21:54 2008
      Raid Level : raid5
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
    Raid Devices : 7
   Total Devices : 8
Preferred Minor : 0

     Update Time : Tue Jun  2 09:11:49 2009
           State : clean
  Active Devices : 5
Working Devices : 7
  Failed Devices : 1
   Spare Devices : 2
        Checksum : 22d3f975 - correct
          Events : 2599992

          Layout : left-symmetric
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     1       8      177        1      active sync   /dev/sdl1

    0     0       0        0        0      removed
    1     1       8      177        1      active sync   /dev/sdl1
    2     2       8      113        2      active sync   /dev/sdh1
    3     3       8      145        3      active sync   /dev/sdj1
    4     4       8      161        4      active sync   /dev/sdk1
    5     5       8       97        5      active sync   /dev/sdg1
    6     6       0        0        6      faulty removed
    7     7       8      129        7      spare   /dev/sdi1
    8     8       8       17        8      spare   /dev/sdb1

The old RAID configuration was:

disc 0: sdi1 <- is now disc 7 and SPARE
disc 1: sdl1
disc 2: sdh1
disc 3: sdj1
disc 4: sdk1
disc 5: sdg1
disc 6: sda1 <- is now faulty removed

[root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start the array.
[root@localhost log]# cat /proc/mdstat
Personalities :
md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S) sdk1[4](S) sdj1[3](S) sdh1[2](S)
      8790840960 blocks
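
// why "5 drives and 2 spares": per the -E dumps above, sda1's superblock is
// stale (events 2599984 vs 2599992 on the others, and it still records
// itself as active sync), apparently too far behind for --force to pull back
// in, while sdi1 and sdb1 now record themselves only as spares; a 7-disc
// raid5 needs at least 6 current members. A quick read-only way to compare
// just the counters side by side:

mdadm -E /dev/sd[bagkjhli]1 | egrep 'Update Time|Events|^this'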


On large arrays this may happen a lot: a bad drive is only discovered
during maintenance operations, when it is already too late. Maybe an option
to add a redundant drive in a fail-safe way would be a good feature to add
to md.
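
(For reference: md later gained exactly this as "hot replace". With
mdadm >= 3.3 on a 3.2+ kernel the rough sketch is
"mdadm /dev/md0 --replace /dev/sdi1 --with /dev/sdb1", which rebuilds onto
the spare while the old disc stays active in the array; it was not yet
available at the time of this thread.)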

Please tell me if you see any solution to the problems below.

1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as it
was before the restore attempt?

2. Is it possible to reassign /dev/sda1 as disc 6 and back up the still
readable data on the RAID?

3. I guess more than 90% of the data was written to /dev/sdb1 in the
restore attempt. Is it possible to use /dev/sdb1 as disc 7 to access the
RAID?
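
// a hedged sketch for questions 1 and 2, strictly a last resort and only
// after imaging every disc: recreate the superblocks in place with the exact
// parameters and disc order from the -E dumps above (0.90 metadata, level 5,
// 7 devices, 64K chunk, left-symmetric), leaving the unfinished sdb1 out and
// one slot "missing". With 0.90 metadata and --assume-clean this rewrites
// only the superblocks, but any wrong parameter or order scrambles the data:

mdadm --create /dev/md0 --assume-clean --metadata=0.90 --level=5 \
      --raid-devices=7 --chunk=64 --layout=left-symmetric \
      /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 missing

// then fsck, mount read-only and copy the data off. Note that sdi1 stopped
// updating when it was removed, so anything written after 22:44 may be
// inconsistent on its stripes; putting /dev/sda1 back as disc 6 with
// "missing" for disc 0 instead (question 2) trades that staleness for sda1's
// bad sectors.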

Thank you for looking at the problem
Alexander



* Re: RAID 5 re-add of removed drive? (failed drive replacement)
  2009-06-02 10:18 ` Sujit Karataparambil
  2009-06-02 10:45   ` Alexander Rietsch
@ 2009-06-02 10:52   ` Sujit Karataparambil
  2009-06-02 10:55     ` Sujit Karataparambil
  1 sibling, 1 reply; 12+ messages in thread
From: Sujit Karataparambil @ 2009-06-02 10:52 UTC (permalink / raw)
  To: Alex R; +Cc: linux-raid

Kindly read the document carefully and thoroughly.

raidhotadd /dev/mdX /dev/sdb

It says

Q. I have a two-disk mirrored array. Suppose one of the disks in the
mirrored RAID array fails; then I will replace that disk with a new one
(I have hot-swappable SCSI drives). Now the question is how I rebuild the
RAID array after a disk fails.

A. A redundant array of inexpensive disks (redundant array of independent
disks) is a system which uses multiple hard drives to share or replicate
data among the drives. You can use both IDE and SCSI disks for mirroring.

If you are not using hot-swappable drives then you need to shut down the
server. Once the hard disk has been replaced in the system, you can use
raidhotadd to add disks to RAID-1, -4 and -5 arrays while they are active.

Assuming that the new SCSI disk is /dev/sdb, type the following command:

# raidhotadd /dev/mdX /dev/sdb
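
(Note: raidhotadd is from the old raidtools package; on an mdadm-managed
array like this one the rough equivalent is "mdadm /dev/mdX --add
/dev/sdb1", which is exactly what was already tried above, so a plain
hot-add alone will not bring this array back.)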


On Tue, Jun 2, 2009 at 4:15 PM, Alexander Rietsch
<Alexander.Rietsch@hispeed.ch> wrote:
> Thank you for answering my mail. But instead of posting a link which
> contains no more information than is already in the RAID FAQ or the mdadm
> man page, please actually read it; here is the short version of my problem:
>
>>> disc 0: sdi1 <- is now disc 7 and SPARE
>>> disc 1: sdl1
>>> disc 2: sdh1
>>> disc 3: sdj1
>>> disc 4: sdk1
>>> disc 5: sdg1
>>> disc 6: sda1 <- is now faulty removed
>
> sdb1 <- unfinished replacement drive, now SPARE
>
> Of the original 7 drives, 2 are disabled. Please tell me how to
> - re-add sdi1 as disc 0 (mdadm --re-add just adds it as spare)
> - enable sda1 as disc 6 (mdadm --assemble --force --scan refuses to
> accept it)
> - use the new drive sdb1 as disc 7 (mdadm --assemble --force --scan
> just adds it as spare)
>
> original post:
>
> [...]



-- 
-- Sujit K M



On Tue, Jun 2, 2009 at 3:48 PM, Sujit Karataparambil <sjt.kar@gmail.com> wrote:
> http://www.cyberciti.biz/faq/howto-rebuilding-a-raid-array-after-a-disk-fails/
>
>
> On Tue, Jun 2, 2009 at 3:39 PM, Alex R <Alexander.Rietsch@hispeed.ch> wrote:
>> [...]
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f94b - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     7       8      129        7      spare   /dev/sdi1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>> /dev/sdj1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f959 - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     3       8      145        3      active sync   /dev/sdj1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>> /dev/sdk1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f96b - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     4       8      161        4      active sync   /dev/sdk1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>> /dev/sdl1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f975 - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     1       8      177        1      active sync   /dev/sdl1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>>
>> the old RAID configuration was:
>>
>> disc 0: sdi1 <- is now disc 7 and SPARE
>> disc 1: sdl1
>> disc 2: sdh1
>> disc 3: sdj1
>> disc 4: sdk1
>> disc 5: sdg1
>> disc 6: sda1 <- is now faulty removed
>>
>> [root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
>> mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start
>> the array.
>> [root@localhost log]# cat /proc/mdstat
>> Personalities :
>> md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
>> sdk1[4](S) sdj1[3](S) sdh1[2](S)
>>      8790840960 blocks
>>
>>
>> On large arrays this may happen a lot: a bad drive is often first
>> discovered during maintenance operations, when it's too late. Maybe an
>> option to add a redundant drive in a fail-safe way would be a good idea
>> to add to md.
>>
>> Please tell me if you see any solution to the problems below.
>>
>> 1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as
>> it was before the restore attempt?
>>
>> 2. Is it possible to reassign /dev/sda1 as disc 6 and backup the still
>> readable data on the RAID?
>>
>> 3. I guess more than 90% of the data was written to /dev/sdb1 in the restore
>> attempt. Is it possible to use /dev/sdb1 as disc 7 to access the RAID?
>>
>> Thank you for looking at the problem
>> Alexander
>>
>
>
>
> --
> -- Sujit K M
>



-- 
-- Sujit K M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RAID 5 re-add of removed drive? (failed drive replacement)
  2009-06-02 10:52   ` Sujit Karataparambil
@ 2009-06-02 10:55     ` Sujit Karataparambil
  0 siblings, 0 replies; 12+ messages in thread
From: Sujit Karataparambil @ 2009-06-02 10:55 UTC (permalink / raw)
  To: Alex R; +Cc: linux-raid

http://www.tldp.org/HOWTO/Software-RAID-HOWTO-3.html

This is the RAID documentation, which I found rather insufficient.

On Tue, Jun 2, 2009 at 4:22 PM, Sujit Karataparambil <sjt.kar@gmail.com> wrote:
> Kindly read the document carefully and thoroughly.
>
> raidhotadd /dev/mdX /dev/sdb
>
> It says
>
> Q. I have a two-disk mirrored array. Suppose one of the disks in my
> mirrored RAID array fails; I will then replace that disk with a new one
> (I have hot-swapping SCSI drives). Now the question is how to rebuild the
> RAID array after a disk fails.
>
> A. A redundant array of inexpensive disks (redundant array of
> independent disks) is a system which uses multiple hard drives to
> share or replicate data among the drives. You can use both IDE and
> SCSI disks for mirroring.
>
> If you are not using hot-swapping drives, then you need to shut down the
> server. Once the hard disk has been replaced in the system, you can use
> raidhotadd to add disks to RAID-1, -4 and -5 arrays while they
> are active.
>
> Assuming that the new SCSI disk is /dev/sdb, type the following command:
>
> # raidhotadd /dev/mdX /dev/sdb
>
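> // (Editor's note: raidhotadd comes from the long-obsolete raidtools
> // package; the mdadm equivalent of the hot-add above is:
>
> mdadm /dev/mdX --add /dev/sdb
>
> // though, as this thread shows, --add alone only makes a stale member
> // a spare again.)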
>
> On Tue, Jun 2, 2009 at 4:15 PM, Alexander Rietsch
> <Alexander.Rietsch@hispeed.ch> wrote:
>> Thank you for answering my mail. But instead of posting a link that
>> contains no more information than is already in the RAID FAQ or the mdadm
>> man page, please actually read it. Here is the short version of my problem:
>>
>>>> disc 0: sdi1 <- is now disc 7 and SPARE
>>>> disc 1: sdl1
>>>> disc 2: sdh1
>>>> disc 3: sdj1
>>>> disc 4: sdk1
>>>> disc 5: sdg1
>>>> disc 6: sda1 <- is now faulty removed
>>
>> sdb1 <- not finished replacement drive, now SPARE
>>
>> Of the original 7 drives, 2 are disabled. Please tell me how to:
>> - re-add sdi1 as disc 0 (mdadm --re-add just adds it as a spare)
>> - enable sda1 as disc 6 (mdadm --assemble --force --scan refuses to
>> accept it)
>> - use the new drive sdb1 as disc 7 (mdadm --assemble --force --scan
>> just adds it as a spare)
>>
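>> // (Editor's note, not from the original mail: for the first two items
>> // above, the usual last-resort answer on this list is to stop the array
>> // and re-create it in place with --assume-clean, listing the members in
>> // the exact original order and with the parameters reported by mdadm -E
>> // below (0.90 superblock, 64K chunk, left-symmetric). This rewrites only
>> // the superblocks, but a wrong device order, or sdX names shuffled by a
>> // reboot, scrambles the data, so verify the order first and mount
>> // read-only before writing anything. A sketch:
>>
>> mdadm --stop /dev/md0
>> mdadm --create /dev/md0 --assume-clean --metadata=0.90 --level=5 \
>>       --chunk=64 --layout=left-symmetric --raid-devices=7 \
>>       /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 /dev/sda1
>> mount -o ro /dev/md0 /mnt
>>
>> // Given sda's pending sectors, imaging it onto a healthy disk first,
>> // e.g. with GNU ddrescue, and assembling with the copy would be safer.)
>>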
>> original post:
>>
>> After removing a drive and restoring to the new one, another disc in the
>> array failed. Now I still have all the data redundantly available (the old
>> drive is still there), but the RAID header is now in a state where it's
>> impossible to access the data. Is it possible to rearrange the drives to
>> force the kernel to a valid array?
>>
>> Here is the story:
>>
>> // my normal boot log showing RAID devices
>>
>> Jun  1 22:37:45 localhost klogd: md: md0 stopped.
>> Jun  1 22:37:45 localhost klogd: md: bind<sdl1>
>> Jun  1 22:37:45 localhost klogd: md: bind<sdh1>
>> Jun  1 22:37:45 localhost klogd: md: bind<sdj1>
>> Jun  1 22:37:45 localhost klogd: md: bind<sdk1>
>> Jun  1 22:37:45 localhost klogd: md: bind<sdg1>
>> Jun  1 22:37:45 localhost klogd: md: bind<sda1>
>> Jun  1 22:37:45 localhost klogd: md: bind<sdi1>
>> Jun  1 22:37:45 localhost klogd: xor: automatically using best checksumming
>> function: generic_sse
>> Jun  1 22:37:45 localhost klogd:    generic_sse:  5144.000 MB/sec
>> Jun  1 22:37:45 localhost klogd: xor: using function: generic_sse (5144.000
>> MB/sec)
>> Jun  1 22:37:45 localhost klogd: async_tx: api initialized (async)
>> Jun  1 22:37:45 localhost klogd: raid6: int64x1   1539 MB/s
>> Jun  1 22:37:45 localhost klogd: raid6: int64x2   1558 MB/s
>> Jun  1 22:37:45 localhost klogd: raid6: int64x4   1968 MB/s
>> Jun  1 22:37:45 localhost klogd: raid6: int64x8   1554 MB/s
>> Jun  1 22:37:45 localhost klogd: raid6: sse2x1    2441 MB/s
>> Jun  1 22:37:45 localhost klogd: raid6: sse2x2    3250 MB/s
>> Jun  1 22:37:45 localhost klogd: raid6: sse2x4    3460 MB/s
>> Jun  1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460 MB/s)
>> Jun  1 22:37:45 localhost klogd: md: raid6 personality registered for level
>> 6
>> Jun  1 22:37:45 localhost klogd: md: raid5 personality registered for level
>> 5
>> Jun  1 22:37:45 localhost klogd: md: raid4 personality registered for level
>> 4
>> Jun  1 22:37:45 localhost klogd: raid5: device sdi1 operational as raid disk
>> 0
>> Jun  1 22:37:45 localhost klogd: raid5: device sda1 operational as raid disk
>> 6
>> Jun  1 22:37:45 localhost klogd: raid5: device sdg1 operational as raid disk
>> 5
>> Jun  1 22:37:45 localhost klogd: raid5: device sdk1 operational as raid disk
>> 4
>> Jun  1 22:37:45 localhost klogd: raid5: device sdj1 operational as raid disk
>> 3
>> Jun  1 22:37:45 localhost klogd: raid5: device sdh1 operational as raid disk
>> 2
>> Jun  1 22:37:45 localhost klogd: raid5: device sdl1 operational as raid disk
>> 1
>> Jun  1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
>> Jun  1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active with 7
>> out of 7 devices, algorithm 2
>> Jun  1 22:37:45 localhost klogd: RAID5 conf printout:
>> Jun  1 22:37:45 localhost klogd:  --- rd:7 wd:7
>> Jun  1 22:37:45 localhost klogd:  disk 0, o:1, dev:sdi1
>> Jun  1 22:37:45 localhost klogd:  disk 1, o:1, dev:sdl1
>> Jun  1 22:37:45 localhost klogd:  disk 2, o:1, dev:sdh1
>> Jun  1 22:37:45 localhost klogd:  disk 3, o:1, dev:sdj1
>> Jun  1 22:37:45 localhost klogd:  disk 4, o:1, dev:sdk1
>> Jun  1 22:37:45 localhost klogd:  disk 5, o:1, dev:sdg1
>> Jun  1 22:37:45 localhost klogd:  disk 6, o:1, dev:sda1
>> Jun  1 22:37:45 localhost klogd: md0: detected capacity change from 0 to
>> 6001213046784
>> Jun  1 22:37:45 localhost klogd:  md0: unknown partition table
>>
>> // now a new spare drive is added
>>
>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
>>
>> Jun  1 22:42:00 localhost klogd: md: bind<sdb1>
>>
>> // and here goes the drive replacement
>>
>> [root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1
>>
>> Jun  1 22:44:10 localhost klogd: raid5: Disk failure on sdi1, disabling
>> device.
>> Jun  1 22:44:10 localhost klogd: raid5: Operation continuing on 6 devices.
>> Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
>> Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
>> Jun  1 22:44:10 localhost klogd:  disk 0, o:0, dev:sdi1
>> Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
>> Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
>> Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
>> Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
>> Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
>> Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
>> Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
>> Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
>> Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
>> Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
>> Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
>> Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
>> Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
>> Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
>> Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
>> Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
>> Jun  1 22:44:10 localhost klogd:  disk 0, o:1, dev:sdb1
>> Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
>> Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
>> Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
>> Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
>> Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
>> Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
>> Jun  1 22:44:10 localhost klogd: md: recovery of RAID array md0
>> Jun  1 22:44:10 localhost klogd: md: unbind<sdi1>
>> Jun  1 22:44:10 localhost klogd: md: minimum _guaranteed_  speed: 1000
>> KB/sec/disk.
>> Jun  1 22:44:10 localhost klogd: md: using maximum available idle IO
>> bandwidth (but not more than 200000 KB/sec) for recovery.
>> Jun  1 22:44:10 localhost klogd: md: using 128k window, over a total of
>> 976759936 blocks.
>> Jun  1 22:44:10 localhost klogd: md: export_rdev(sdi1)
>>
>> [root@localhost ~]# more /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1]
>>      5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
>>      [=====>...............]  recovery = 27.5% (269352320/976759936)
>> finish=276.2min speed=42686K/sec
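>>
>> // (Editor's note: recovery reads every sector of all six remaining
>> // members, which is exactly when latent bad sectors like sda's surface.
>> // Scheduling a periodic scrub via the md sysfs interface, e.g.
>>
>> echo check > /sys/block/md0/md/sync_action
>>
>> // before any planned replacement would surface such errors earlier.)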
>>
>> // surface error on RAID drive while recovery:
>>
>> Jun  2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff
>> SErr 0x0 action 0x0
>> Jun  2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>> Jun  2 03:59:49 localhost klogd: ata1.00: cmd
>> 60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
>> Jun  2 03:59:49 localhost klogd:          res
>> 41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
>> Jun  2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>> Jun  2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>> Jun  2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>> Jun  2 03:59:49 localhost klogd: ata1: EH complete
>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>> hardware sectors: (1.50 TB/1.36 TiB)
>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>> read cache: enabled, doesn't support DPO or FUA
>> Jun  2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc
>> SErr 0x0 action 0x0
>> Jun  2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>> Jun  2 03:59:49 localhost klogd: ata1.00: cmd
>> 60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
>> Jun  2 03:59:49 localhost klogd:          res
>> 41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
>> Jun  2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>> Jun  2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>> Jun  2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>> Jun  2 03:59:49 localhost klogd: ata1: EH complete
>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>> hardware sectors: (1.50 TB/1.36 TiB)
>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>> read cache: enabled, doesn't support DPO or FUA
>> ...
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269136 on sda1).
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269144 on sda1).
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269152 on sda1).
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269160 on sda1).
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269168 on sda1).
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269176 on sda1).
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269184 on sda1).
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269192 on sda1).
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269200 on sda1).
>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269208 on sda1).
>> Jun  2 03:59:49 localhost klogd: ata1: EH complete
>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>> hardware sectors: (1.50 TB/1.36 TiB)
>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>> read cache: enabled, doesn't support DPO or FUA
>> Jun  2 03:59:49 localhost klogd: RAID5 conf printout:
>> Jun  2 03:59:49 localhost klogd:  --- rd:7 wd:5
>> Jun  2 03:59:49 localhost klogd:  disk 0, o:1, dev:sdb1
>> Jun  2 03:59:49 localhost klogd:  disk 1, o:1, dev:sdl1
>> Jun  2 03:59:49 localhost klogd:  disk 2, o:1, dev:sdh1
>> Jun  2 03:59:49 localhost klogd:  disk 3, o:1, dev:sdj1
>> Jun  2 03:59:49 localhost klogd:  disk 4, o:1, dev:sdk1
>> Jun  2 03:59:49 localhost klogd:  disk 5, o:1, dev:sdg1
>> Jun  2 03:59:49 localhost klogd:  disk 6, o:0, dev:sda1
>> Jun  2 03:59:49 localhost klogd: RAID5 conf printout:
>> Jun  2 03:59:49 localhost klogd:  --- rd:7 wd:5
>> Jun  2 03:59:49 localhost klogd:  disk 1, o:1, dev:sdl1
>> Jun  2 03:59:49 localhost klogd:  disk 2, o:1, dev:sdh1
>> Jun  2 03:59:49 localhost klogd:  disk 3, o:1, dev:sdj1
>> Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
>> Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
>> Jun  2 03:59:50 localhost klogd:  disk 6, o:0, dev:sda1
>> Jun  2 03:59:50 localhost klogd: RAID5 conf printout:
>> Jun  2 03:59:50 localhost klogd:  --- rd:7 wd:5
>> Jun  2 03:59:50 localhost klogd:  disk 1, o:1, dev:sdl1
>> Jun  2 03:59:50 localhost klogd:  disk 2, o:1, dev:sdh1
>> Jun  2 03:59:50 localhost klogd:  disk 3, o:1, dev:sdj1
>> Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
>> Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
>> Jun  2 03:59:50 localhost klogd:  disk 6, o:0, dev:sda1
>> Jun  2 03:59:50 localhost klogd: RAID5 conf printout:
>> Jun  2 03:59:50 localhost klogd:  --- rd:7 wd:5
>> Jun  2 03:59:50 localhost klogd:  disk 1, o:1, dev:sdl1
>> Jun  2 03:59:50 localhost klogd:  disk 2, o:1, dev:sdh1
>> Jun  2 03:59:50 localhost klogd:  disk 3, o:1, dev:sdj1
>> Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
>> Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
>> Jun  2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently
>> unreadable (pending) sectors
>> Jun  2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline
>> uncorrectable sectors
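>>
>> // (Editor's note: smartd's warnings can be confirmed with smartctl from
>> // smartmontools:
>>
>> smartctl -A /dev/sda
>>
>> // watching attributes 197 Current_Pending_Sector and 198
>> // Offline_Uncorrectable.)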
>>
>> // md0 is now down. But hey, still got the old drive, so just add it again:
>>
>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1
>>
>> Jun  2 09:11:49 localhost klogd: md: bind<sdi1>
>>
>> // it's just added as a SPARE! HELP!!! reboot always helps..
>>
>> [root@localhost ~]# reboot
>> [root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
>> /dev/sda1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 7
>> Preferred Minor : 0
>>
>>    Update Time : Mon Jun  1 22:44:10 2009
>>          State : clean
>>  Active Devices : 6
>> Working Devices : 7
>>  Failed Devices : 0
>>  Spare Devices : 1
>>       Checksum : 22d364f3 - correct
>>         Events : 2599984
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     6       8        1        6      active sync   /dev/sda1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       8        1        6      active sync   /dev/sda1
>>   7     7       8       17        7      spare   /dev/sdb1
>> /dev/sdb1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f8dd - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     8       8       17        8      spare   /dev/sdb1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>> /dev/sdg1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f92d - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     5       8       97        5      active sync   /dev/sdg1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>> /dev/sdh1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f937 - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     2       8      113        2      active sync   /dev/sdh1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>> /dev/sdi1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f94b - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     7       8      129        7      spare   /dev/sdi1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>> /dev/sdj1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f959 - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     3       8      145        3      active sync   /dev/sdj1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>> /dev/sdk1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f96b - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     4       8      161        4      active sync   /dev/sdk1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>> /dev/sdl1:
>>          Magic : a92b4efc
>>        Version : 0.90.00
>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>  Creation Time : Sun Nov  2 13:21:54 2008
>>     Raid Level : raid5
>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>   Raid Devices : 7
>>  Total Devices : 8
>> Preferred Minor : 0
>>
>>    Update Time : Tue Jun  2 09:11:49 2009
>>          State : clean
>>  Active Devices : 5
>> Working Devices : 7
>>  Failed Devices : 1
>>  Spare Devices : 2
>>       Checksum : 22d3f975 - correct
>>         Events : 2599992
>>
>>         Layout : left-symmetric
>>     Chunk Size : 64K
>>
>>      Number   Major   Minor   RaidDevice State
>> this     1       8      177        1      active sync   /dev/sdl1
>>
>>   0     0       0        0        0      removed
>>   1     1       8      177        1      active sync   /dev/sdl1
>>   2     2       8      113        2      active sync   /dev/sdh1
>>   3     3       8      145        3      active sync   /dev/sdj1
>>   4     4       8      161        4      active sync   /dev/sdk1
>>   5     5       8       97        5      active sync   /dev/sdg1
>>   6     6       0        0        6      faulty removed
>>   7     7       8      129        7      spare   /dev/sdi1
>>   8     8       8       17        8      spare   /dev/sdb1
>>
>> the old RAID configuration was:
>>
>> disc 0: sdi1 <- is now disc 7 and SPARE
>> disc 1: sdl1
>> disc 2: sdh1
>> disc 3: sdj1
>> disc 4: sdk1
>> disc 5: sdg1
>> disc 6: sda1 <- is now faulty removed
>>
>> [root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
>> mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start
>> the array.
>> [root@localhost log]# cat /proc/mdstat
>> Personalities :
>> md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
>> sdk1[4](S) sdj1[3](S) sdh1[2](S)
>>      8790840960 blocks
>>
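>> // (Editor's note: the forced assembly skips sda1 because its superblock
>> // stopped at event count 2599984, while the other members reached
>> // 2599992 and record it as "faulty removed"; mdadm trusts the newer
>> // superblocks. The counters can be compared directly with:
>>
>> mdadm -E /dev/sd[abghijkl]1 | egrep '^/dev|Events'
>>
>> // so getting sda1 back in as disc 6 needs the re-create route sketched
>> // above.)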
>>
>> On large arrays this may happen a lot: a bad drive is often first
>> discovered during maintenance operations, when it's too late. Maybe an
>> option to add a redundant drive in a fail-safe way would be a good idea
>> to add to md.
>>
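>> // (Editor's note: this wish was later fulfilled: mdadm 3.3 with a
>> // sufficiently new kernel supports hot replacement, where the outgoing
>> // member stays active while the replacement is cloned from it:
>>
>> mdadm /dev/md0 --add /dev/sdb1
>> mdadm /dev/md0 --replace /dev/sdi1 --with /dev/sdb1
>>
>> // which avoids exactly the degraded-rebuild window hit here.)
>>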
>> Please tell me if you see any solution to the problems below.
>>
>> 1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as
>> it was before the restore attempt?
>>
>> 2. Is it possible to reassign /dev/sda1 as disc 6 and backup the still
>> readable data on the RAID?
>>
>> 3. I guess more than 90% of the data was written to /dev/sdb1 in the restore
>> attempt. Is it possible to use /dev/sdb1 as disc 7 to access the RAID?
>>
>> Thank you for looking at the problem
>> Alexander
>>
>>
>
>
>
> --
> -- Sujit K M
>
>
>
> On Tue, Jun 2, 2009 at 3:48 PM, Sujit Karataparambil <sjt.kar@gmail.com> wrote:
>> http://www.cyberciti.biz/faq/howto-rebuilding-a-raid-array-after-a-disk-fails/
>>
>>
>> On Tue, Jun 2, 2009 at 3:39 PM, Alex R <Alexander.Rietsch@hispeed.ch> wrote:
>>>
>>> I have a serious RAID problem here. Please have a look at this. Any help
>>> would be greatly appreciated!
>>>
>>> As always, most problems occur only during critical tasks like
>>> enlarging/restoring. I tried to replace a drive in my 7disc 6T RAID5 array
>>> as explained here:
>>> http://michael-prokop.at/blog/2006/09/09/raid5-online-resizing-with-linux/
>>>
>>> After removing a drive and restoring to the new one, another disc in the
>>> array failed. Now I still have all the data redundantly available (the old
>>> drive is still there), but the RAID header is now in a state where it's
>>> impossible to access the data. Is it possible to rearrange the drives to
>>> force the kernel to a valid array?
>>>
>>> Here is the story:
>>>
>>> // my normal boot log showing RAID devices
>>>
>>> Jun  1 22:37:45 localhost klogd: md: md0 stopped.
>>> Jun  1 22:37:45 localhost klogd: md: bind<sdl1>
>>> Jun  1 22:37:45 localhost klogd: md: bind<sdh1>
>>> Jun  1 22:37:45 localhost klogd: md: bind<sdj1>
>>> Jun  1 22:37:45 localhost klogd: md: bind<sdk1>
>>> Jun  1 22:37:45 localhost klogd: md: bind<sdg1>
>>> Jun  1 22:37:45 localhost klogd: md: bind<sda1>
>>> Jun  1 22:37:45 localhost klogd: md: bind<sdi1>
>>> Jun  1 22:37:45 localhost klogd: xor: automatically using best checksumming
>>> function: generic_sse
>>> Jun  1 22:37:45 localhost klogd:    generic_sse:  5144.000 MB/sec
>>> Jun  1 22:37:45 localhost klogd: xor: using function: generic_sse (5144.000
>>> MB/sec)
>>> Jun  1 22:37:45 localhost klogd: async_tx: api initialized (async)
>>> Jun  1 22:37:45 localhost klogd: raid6: int64x1   1539 MB/s
>>> Jun  1 22:37:45 localhost klogd: raid6: int64x2   1558 MB/s
>>> Jun  1 22:37:45 localhost klogd: raid6: int64x4   1968 MB/s
>>> Jun  1 22:37:45 localhost klogd: raid6: int64x8   1554 MB/s
>>> Jun  1 22:37:45 localhost klogd: raid6: sse2x1    2441 MB/s
>>> Jun  1 22:37:45 localhost klogd: raid6: sse2x2    3250 MB/s
>>> Jun  1 22:37:45 localhost klogd: raid6: sse2x4    3460 MB/s
>>> Jun  1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460 MB/s)
>>> Jun  1 22:37:45 localhost klogd: md: raid6 personality registered for level
>>> 6
>>> Jun  1 22:37:45 localhost klogd: md: raid5 personality registered for level
>>> 5
>>> Jun  1 22:37:45 localhost klogd: md: raid4 personality registered for level
>>> 4
>>> Jun  1 22:37:45 localhost klogd: raid5: device sdi1 operational as raid disk
>>> 0
>>> Jun  1 22:37:45 localhost klogd: raid5: device sda1 operational as raid disk
>>> 6
>>> Jun  1 22:37:45 localhost klogd: raid5: device sdg1 operational as raid disk
>>> 5
>>> Jun  1 22:37:45 localhost klogd: raid5: device sdk1 operational as raid disk
>>> 4
>>> Jun  1 22:37:45 localhost klogd: raid5: device sdj1 operational as raid disk
>>> 3
>>> Jun  1 22:37:45 localhost klogd: raid5: device sdh1 operational as raid disk
>>> 2
>>> Jun  1 22:37:45 localhost klogd: raid5: device sdl1 operational as raid disk
>>> 1
>>> Jun  1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
>>> Jun  1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active with 7
>>> out of 7 devices, algorithm 2
>>> Jun  1 22:37:45 localhost klogd: RAID5 conf printout:
>>> Jun  1 22:37:45 localhost klogd:  --- rd:7 wd:7
>>> Jun  1 22:37:45 localhost klogd:  disk 0, o:1, dev:sdi1
>>> Jun  1 22:37:45 localhost klogd:  disk 1, o:1, dev:sdl1
>>> Jun  1 22:37:45 localhost klogd:  disk 2, o:1, dev:sdh1
>>> Jun  1 22:37:45 localhost klogd:  disk 3, o:1, dev:sdj1
>>> Jun  1 22:37:45 localhost klogd:  disk 4, o:1, dev:sdk1
>>> Jun  1 22:37:45 localhost klogd:  disk 5, o:1, dev:sdg1
>>> Jun  1 22:37:45 localhost klogd:  disk 6, o:1, dev:sda1
>>> Jun  1 22:37:45 localhost klogd: md0: detected capacity change from 0 to
>>> 6001213046784
>>> Jun  1 22:37:45 localhost klogd:  md0: unknown partition table
>>>
>>> // now a new spare drive is added
>>>
>>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
>>>
>>> Jun  1 22:42:00 localhost klogd: md: bind<sdb1>
>>>
>>> // and here goes the drive replacement
>>>
>>> [root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1
>>>
>>> Jun  1 22:44:10 localhost klogd: raid5: Disk failure on sdi1, disabling
>>> device.
>>> Jun  1 22:44:10 localhost klogd: raid5: Operation continuing on 6 devices.
>>> Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
>>> Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
>>> Jun  1 22:44:10 localhost klogd:  disk 0, o:0, dev:sdi1
>>> Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
>>> Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
>>> Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
>>> Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
>>> Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
>>> Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
>>> Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
>>> Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
>>> Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
>>> Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
>>> Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
>>> Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
>>> Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
>>> Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
>>> Jun  1 22:44:10 localhost klogd: RAID5 conf printout:
>>> Jun  1 22:44:10 localhost klogd:  --- rd:7 wd:6
>>> Jun  1 22:44:10 localhost klogd:  disk 0, o:1, dev:sdb1
>>> Jun  1 22:44:10 localhost klogd:  disk 1, o:1, dev:sdl1
>>> Jun  1 22:44:10 localhost klogd:  disk 2, o:1, dev:sdh1
>>> Jun  1 22:44:10 localhost klogd:  disk 3, o:1, dev:sdj1
>>> Jun  1 22:44:10 localhost klogd:  disk 4, o:1, dev:sdk1
>>> Jun  1 22:44:10 localhost klogd:  disk 5, o:1, dev:sdg1
>>> Jun  1 22:44:10 localhost klogd:  disk 6, o:1, dev:sda1
>>> Jun  1 22:44:10 localhost klogd: md: recovery of RAID array md0
>>> Jun  1 22:44:10 localhost klogd: md: unbind<sdi1>
>>> Jun  1 22:44:10 localhost klogd: md: minimum _guaranteed_  speed: 1000
>>> KB/sec/disk.
>>> Jun  1 22:44:10 localhost klogd: md: using maximum available idle IO
>>> bandwidth (but not more than 200000 KB/sec) for recovery.
>>> Jun  1 22:44:10 localhost klogd: md: using 128k window, over a total of
>>> 976759936 blocks.
>>> Jun  1 22:44:10 localhost klogd: md: export_rdev(sdi1)
>>>
>>> [root@localhost ~]# more /proc/mdstat
>>> Personalities : [raid6] [raid5] [raid4]
>>> md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1]
>>>      5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
>>>      [=====>...............]  recovery = 27.5% (269352320/976759936)
>>> finish=276.2min speed=42686K/sec
>>>
>>> // surface error on RAID drive while recovery:
>>>
>>> Jun  2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff
>>> SErr 0x0 action 0x0
>>> Jun  2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>>> Jun  2 03:59:49 localhost klogd: ata1.00: cmd
>>> 60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
>>> Jun  2 03:59:49 localhost klogd:          res
>>> 41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
>>> Jun  2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>>> Jun  2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>>> Jun  2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>>> Jun  2 03:59:49 localhost klogd: ata1: EH complete
>>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>>> hardware sectors: (1.50 TB/1.36 TiB)
>>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>>> read cache: enabled, doesn't support DPO or FUA
>>> Jun  2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc
>>> SErr 0x0 action 0x0
>>> Jun  2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>>> Jun  2 03:59:49 localhost klogd: ata1.00: cmd
>>> 60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
>>> Jun  2 03:59:49 localhost klogd:          res
>>> 41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
>>> Jun  2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>>> Jun  2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>>> Jun  2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>>> Jun  2 03:59:49 localhost klogd: ata1: EH complete
>>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>>> hardware sectors: (1.50 TB/1.36 TiB)
>>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>>> read cache: enabled, doesn't support DPO or FUA
>>> ...
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269136 on sda1).
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269144 on sda1).
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269152 on sda1).
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269160 on sda1).
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269168 on sda1).
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269176 on sda1).
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269184 on sda1).
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269192 on sda1).
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269200 on sda1).
>>> Jun  2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269208 on sda1).
>>> Jun  2 03:59:49 localhost klogd: ata1: EH complete
>>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>>> hardware sectors: (1.50 TB/1.36 TiB)
>>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>>> Jun  2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>>> read cache: enabled, doesn't support DPO or FUA
>>> Jun  2 03:59:49 localhost klogd: RAID5 conf printout:
>>> Jun  2 03:59:49 localhost klogd:  --- rd:7 wd:5
>>> Jun  2 03:59:49 localhost klogd:  disk 0, o:1, dev:sdb1
>>> Jun  2 03:59:49 localhost klogd:  disk 1, o:1, dev:sdl1
>>> Jun  2 03:59:49 localhost klogd:  disk 2, o:1, dev:sdh1
>>> Jun  2 03:59:49 localhost klogd:  disk 3, o:1, dev:sdj1
>>> Jun  2 03:59:49 localhost klogd:  disk 4, o:1, dev:sdk1
>>> Jun  2 03:59:49 localhost klogd:  disk 5, o:1, dev:sdg1
>>> Jun  2 03:59:49 localhost klogd:  disk 6, o:0, dev:sda1
>>> Jun  2 03:59:49 localhost klogd: RAID5 conf printout:
>>> Jun  2 03:59:49 localhost klogd:  --- rd:7 wd:5
>>> Jun  2 03:59:49 localhost klogd:  disk 1, o:1, dev:sdl1
>>> Jun  2 03:59:49 localhost klogd:  disk 2, o:1, dev:sdh1
>>> Jun  2 03:59:49 localhost klogd:  disk 3, o:1, dev:sdj1
>>> Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
>>> Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
>>> Jun  2 03:59:50 localhost klogd:  disk 6, o:0, dev:sda1
>>> Jun  2 03:59:50 localhost klogd: RAID5 conf printout:
>>> Jun  2 03:59:50 localhost klogd:  --- rd:7 wd:5
>>> Jun  2 03:59:50 localhost klogd:  disk 1, o:1, dev:sdl1
>>> Jun  2 03:59:50 localhost klogd:  disk 2, o:1, dev:sdh1
>>> Jun  2 03:59:50 localhost klogd:  disk 3, o:1, dev:sdj1
>>> Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
>>> Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
>>> Jun  2 03:59:50 localhost klogd:  disk 6, o:0, dev:sda1
>>> Jun  2 03:59:50 localhost klogd: RAID5 conf printout:
>>> Jun  2 03:59:50 localhost klogd:  --- rd:7 wd:5
>>> Jun  2 03:59:50 localhost klogd:  disk 1, o:1, dev:sdl1
>>> Jun  2 03:59:50 localhost klogd:  disk 2, o:1, dev:sdh1
>>> Jun  2 03:59:50 localhost klogd:  disk 3, o:1, dev:sdj1
>>> Jun  2 03:59:50 localhost klogd:  disk 4, o:1, dev:sdk1
>>> Jun  2 03:59:50 localhost klogd:  disk 5, o:1, dev:sdg1
>>> Jun  2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently
>>> unreadable (pending) sectors
>>> Jun  2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline
>>> uncorrectable sectors
>>>
>>> // md0 is now down. But hey, still got the old drive, so just add it again:
>>>
>>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1
>>>
>>> Jun  2 09:11:49 localhost klogd: md: bind<sdi1>
>>>
>>> // it's just added as a SPARE! HELP!!! reboot always helps..
>>>
>>> [root@localhost ~]# reboot
>>> [root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
>>> /dev/sda1:
>>>          Magic : a92b4efc
>>>        Version : 0.90.00
>>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>>  Creation Time : Sun Nov  2 13:21:54 2008
>>>     Raid Level : raid5
>>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>>   Raid Devices : 7
>>>  Total Devices : 7
>>> Preferred Minor : 0
>>>
>>>    Update Time : Mon Jun  1 22:44:10 2009
>>>          State : clean
>>>  Active Devices : 6
>>> Working Devices : 7
>>>  Failed Devices : 0
>>>  Spare Devices : 1
>>>       Checksum : 22d364f3 - correct
>>>         Events : 2599984
>>>
>>>         Layout : left-symmetric
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     6       8        1        6      active sync   /dev/sda1
>>>
>>>   0     0       0        0        0      removed
>>>   1     1       8      177        1      active sync   /dev/sdl1
>>>   2     2       8      113        2      active sync   /dev/sdh1
>>>   3     3       8      145        3      active sync   /dev/sdj1
>>>   4     4       8      161        4      active sync   /dev/sdk1
>>>   5     5       8       97        5      active sync   /dev/sdg1
>>>   6     6       8        1        6      active sync   /dev/sda1
>>>   7     7       8       17        7      spare   /dev/sdb1
>>> /dev/sdb1:
>>>          Magic : a92b4efc
>>>        Version : 0.90.00
>>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>>  Creation Time : Sun Nov  2 13:21:54 2008
>>>     Raid Level : raid5
>>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>>   Raid Devices : 7
>>>  Total Devices : 8
>>> Preferred Minor : 0
>>>
>>>    Update Time : Tue Jun  2 09:11:49 2009
>>>          State : clean
>>>  Active Devices : 5
>>> Working Devices : 7
>>>  Failed Devices : 1
>>>  Spare Devices : 2
>>>       Checksum : 22d3f8dd - correct
>>>         Events : 2599992
>>>
>>>         Layout : left-symmetric
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     8       8       17        8      spare   /dev/sdb1
>>>
>>>   0     0       0        0        0      removed
>>>   1     1       8      177        1      active sync   /dev/sdl1
>>>   2     2       8      113        2      active sync   /dev/sdh1
>>>   3     3       8      145        3      active sync   /dev/sdj1
>>>   4     4       8      161        4      active sync   /dev/sdk1
>>>   5     5       8       97        5      active sync   /dev/sdg1
>>>   6     6       0        0        6      faulty removed
>>>   7     7       8      129        7      spare   /dev/sdi1
>>>   8     8       8       17        8      spare   /dev/sdb1
>>> /dev/sdg1:
>>>          Magic : a92b4efc
>>>        Version : 0.90.00
>>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>>  Creation Time : Sun Nov  2 13:21:54 2008
>>>     Raid Level : raid5
>>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>>   Raid Devices : 7
>>>  Total Devices : 8
>>> Preferred Minor : 0
>>>
>>>    Update Time : Tue Jun  2 09:11:49 2009
>>>          State : clean
>>>  Active Devices : 5
>>> Working Devices : 7
>>>  Failed Devices : 1
>>>  Spare Devices : 2
>>>       Checksum : 22d3f92d - correct
>>>         Events : 2599992
>>>
>>>         Layout : left-symmetric
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     5       8       97        5      active sync   /dev/sdg1
>>>
>>>   0     0       0        0        0      removed
>>>   1     1       8      177        1      active sync   /dev/sdl1
>>>   2     2       8      113        2      active sync   /dev/sdh1
>>>   3     3       8      145        3      active sync   /dev/sdj1
>>>   4     4       8      161        4      active sync   /dev/sdk1
>>>   5     5       8       97        5      active sync   /dev/sdg1
>>>   6     6       0        0        6      faulty removed
>>>   7     7       8      129        7      spare   /dev/sdi1
>>>   8     8       8       17        8      spare   /dev/sdb1
>>> /dev/sdh1:
>>>          Magic : a92b4efc
>>>        Version : 0.90.00
>>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>>  Creation Time : Sun Nov  2 13:21:54 2008
>>>     Raid Level : raid5
>>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>>   Raid Devices : 7
>>>  Total Devices : 8
>>> Preferred Minor : 0
>>>
>>>    Update Time : Tue Jun  2 09:11:49 2009
>>>          State : clean
>>>  Active Devices : 5
>>> Working Devices : 7
>>>  Failed Devices : 1
>>>  Spare Devices : 2
>>>       Checksum : 22d3f937 - correct
>>>         Events : 2599992
>>>
>>>         Layout : left-symmetric
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     2       8      113        2      active sync   /dev/sdh1
>>>
>>>   0     0       0        0        0      removed
>>>   1     1       8      177        1      active sync   /dev/sdl1
>>>   2     2       8      113        2      active sync   /dev/sdh1
>>>   3     3       8      145        3      active sync   /dev/sdj1
>>>   4     4       8      161        4      active sync   /dev/sdk1
>>>   5     5       8       97        5      active sync   /dev/sdg1
>>>   6     6       0        0        6      faulty removed
>>>   7     7       8      129        7      spare   /dev/sdi1
>>>   8     8       8       17        8      spare   /dev/sdb1
>>> /dev/sdi1:
>>>          Magic : a92b4efc
>>>        Version : 0.90.00
>>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>>  Creation Time : Sun Nov  2 13:21:54 2008
>>>     Raid Level : raid5
>>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>>   Raid Devices : 7
>>>  Total Devices : 8
>>> Preferred Minor : 0
>>>
>>>    Update Time : Tue Jun  2 09:11:49 2009
>>>          State : clean
>>>  Active Devices : 5
>>> Working Devices : 7
>>>  Failed Devices : 1
>>>  Spare Devices : 2
>>>       Checksum : 22d3f94b - correct
>>>         Events : 2599992
>>>
>>>         Layout : left-symmetric
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     7       8      129        7      spare   /dev/sdi1
>>>
>>>   0     0       0        0        0      removed
>>>   1     1       8      177        1      active sync   /dev/sdl1
>>>   2     2       8      113        2      active sync   /dev/sdh1
>>>   3     3       8      145        3      active sync   /dev/sdj1
>>>   4     4       8      161        4      active sync   /dev/sdk1
>>>   5     5       8       97        5      active sync   /dev/sdg1
>>>   6     6       0        0        6      faulty removed
>>>   7     7       8      129        7      spare   /dev/sdi1
>>>   8     8       8       17        8      spare   /dev/sdb1
>>> /dev/sdj1:
>>>          Magic : a92b4efc
>>>        Version : 0.90.00
>>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>>  Creation Time : Sun Nov  2 13:21:54 2008
>>>     Raid Level : raid5
>>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>>   Raid Devices : 7
>>>  Total Devices : 8
>>> Preferred Minor : 0
>>>
>>>    Update Time : Tue Jun  2 09:11:49 2009
>>>          State : clean
>>>  Active Devices : 5
>>> Working Devices : 7
>>>  Failed Devices : 1
>>>  Spare Devices : 2
>>>       Checksum : 22d3f959 - correct
>>>         Events : 2599992
>>>
>>>         Layout : left-symmetric
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     3       8      145        3      active sync   /dev/sdj1
>>>
>>>   0     0       0        0        0      removed
>>>   1     1       8      177        1      active sync   /dev/sdl1
>>>   2     2       8      113        2      active sync   /dev/sdh1
>>>   3     3       8      145        3      active sync   /dev/sdj1
>>>   4     4       8      161        4      active sync   /dev/sdk1
>>>   5     5       8       97        5      active sync   /dev/sdg1
>>>   6     6       0        0        6      faulty removed
>>>   7     7       8      129        7      spare   /dev/sdi1
>>>   8     8       8       17        8      spare   /dev/sdb1
>>> /dev/sdk1:
>>>          Magic : a92b4efc
>>>        Version : 0.90.00
>>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>>  Creation Time : Sun Nov  2 13:21:54 2008
>>>     Raid Level : raid5
>>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>>   Raid Devices : 7
>>>  Total Devices : 8
>>> Preferred Minor : 0
>>>
>>>    Update Time : Tue Jun  2 09:11:49 2009
>>>          State : clean
>>>  Active Devices : 5
>>> Working Devices : 7
>>>  Failed Devices : 1
>>>  Spare Devices : 2
>>>       Checksum : 22d3f96b - correct
>>>         Events : 2599992
>>>
>>>         Layout : left-symmetric
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     4       8      161        4      active sync   /dev/sdk1
>>>
>>>   0     0       0        0        0      removed
>>>   1     1       8      177        1      active sync   /dev/sdl1
>>>   2     2       8      113        2      active sync   /dev/sdh1
>>>   3     3       8      145        3      active sync   /dev/sdj1
>>>   4     4       8      161        4      active sync   /dev/sdk1
>>>   5     5       8       97        5      active sync   /dev/sdg1
>>>   6     6       0        0        6      faulty removed
>>>   7     7       8      129        7      spare   /dev/sdi1
>>>   8     8       8       17        8      spare   /dev/sdb1
>>> /dev/sdl1:
>>>          Magic : a92b4efc
>>>        Version : 0.90.00
>>>           UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>>  Creation Time : Sun Nov  2 13:21:54 2008
>>>     Raid Level : raid5
>>>  Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>>     Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>>   Raid Devices : 7
>>>  Total Devices : 8
>>> Preferred Minor : 0
>>>
>>>    Update Time : Tue Jun  2 09:11:49 2009
>>>          State : clean
>>>  Active Devices : 5
>>> Working Devices : 7
>>>  Failed Devices : 1
>>>  Spare Devices : 2
>>>       Checksum : 22d3f975 - correct
>>>         Events : 2599992
>>>
>>>         Layout : left-symmetric
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     1       8      177        1      active sync   /dev/sdl1
>>>
>>>   0     0       0        0        0      removed
>>>   1     1       8      177        1      active sync   /dev/sdl1
>>>   2     2       8      113        2      active sync   /dev/sdh1
>>>   3     3       8      145        3      active sync   /dev/sdj1
>>>   4     4       8      161        4      active sync   /dev/sdk1
>>>   5     5       8       97        5      active sync   /dev/sdg1
>>>   6     6       0        0        6      faulty removed
>>>   7     7       8      129        7      spare   /dev/sdi1
>>>   8     8       8       17        8      spare   /dev/sdb1
>>>
>>> the old RAID configuration was:
>>>
>>> disc 0: sdi1 <- is now disc 7 and SPARE
>>> disc 1: sdl1
>>> disc 2: sdh1
>>> disc 3: sdj1
>>> disc 4: sdk1
>>> disc 5: sdg1
>>> disc 6: sda1 <- is now faulty removed
>>>
>>> [root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
>>> mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start
>>> the array.
>>> [root@localhost log]# cat /proc/mdstat
>>> Personalities :
>>> md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
>>> sdk1[4](S) sdj1[3](S) sdh1[2](S)
>>>      8790840960 blocks
>>>
>>>
>>> On large arrays this may happen a lot: a bad drive is only discovered
>>> during maintenance operations, when it is already too late. Maybe a
>>> fail-safe way to add a redundant drive would be a good addition to the
>>> md services.
>>>
>>> Please tell me if you see any solution to the problems below.
>>>
>>> 1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as
>>> it was before the restore attempt?
>>>
>>> 2. Is it possible to reassign /dev/sda1 as disc 6 and back up the still
>>> readable data on the RAID?
>>>
>>> 3. I guess more than 90% of the data was written to /dev/sdb1 in the
>>> restore attempt. Is it possible to use /dev/sdb1 as disc 7 to access the
>>> RAID?
>>>
>>> Thank you for looking at the problem
>>> Alexander
>>> --
>>> View this message in context: http://www.nabble.com/RAID-5-re-add-of-removed-drive--%28failed-drive-replacement%29-tp23828899p23828899.html
>>> Sent from the linux-raid mailing list archive at Nabble.com.
>>>



-- 
-- Sujit K M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RAID 5 re-add of removed drive? (failed drive replacement)
  2009-06-02 10:09 RAID 5 re-add of removed drive? (failed drive replacement) Alex R
  2009-06-02 10:18 ` Sujit Karataparambil
@ 2009-06-02 11:17 ` Robin Hill
  2009-06-02 12:00   ` Alexander Rietsch
  1 sibling, 1 reply; 12+ messages in thread
From: Robin Hill @ 2009-06-02 11:17 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

On Tue Jun 02, 2009 at 03:09:11AM -0700, Alex R wrote:

> 
> I have a serious RAID problem here. Please have a look at this. Any help
> would be greatly appreciated!
> 
> As always, most problems occur only during critical tasks like
> enlarging/restoring. I tried to replace a drive in my 7disc 6T RAID5 array
> as explained here:
> http://michael-prokop.at/blog/2006/09/09/raid5-online-resizing-with-linux/
> 
> After removing a drive and restoring to the new one, another disc in the
> array failed. Now I still have all the data redundantly available (the old
> drive is still there), but the RAID header is now in a state where it's
> impossible to access the data. Is it possible to rearrange the drives to
> force the kernel to a valid array?
> 
<-- SNIP details -->

AFAIK, the only solution at this stage is to recreate the array.

You need to use the "--assume-clean" flag (or replace one of the drives
with "missing"), along with _exactly_ the same parameters & drive order
as when you originally created the array (you should be able to get most
of this from mdadm -D).  This will rewrite the RAID metadata, but leave
the filesystem untouched.
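
For example, to pull most of those parameters off one of the surviving
superblocks (just a sketch, using the mdadm -E output already posted):

    mdadm -E /dev/sdl1 | egrep 'Raid Level|Raid Devices|Chunk Size|Layout'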

HTH,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RAID 5 re-add of removed drive? (failed drive replacement)
@ 2009-06-02 12:00   ` Alexander Rietsch
  2009-06-02 13:10     ` Robin Hill
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Rietsch @ 2009-06-02 12:00 UTC (permalink / raw)
  To: linux-raid

>
> AFAIK, the only solution at this stage is to recreate the array.
>
> You need to use the "--assume-clean" flag (or replace one of the  
> drives
> with "missing"), along with _exactly_ the same parameters & drive  
> order
> as when you originally created the array (you should be able to get  
> most
> of this from mdadm -D).  This will rewrite the RAID metadata, but  
> leave
> the filesystem untouched.

A glimpse of hope. Thank you! Didn't know about this --assume-clean  
flag. So just to double-check:

The array to create would be:
disc 0: sdi1 <- is now disc 7 and SPARE due to failed replacement  
operation
disc 1: sdl1
disc 2: sdh1
disc 3: sdj1
disc 4: sdk1
disc 5: sdg1
disc 6: sda1 <- is now faulty removed

So I just create an incomplete array without sda1, in the same order,
which would be:

mdadm --create /dev/md0 --assume-clean --level=5 --chunk=64
--raid-devices=7 /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1

I'm not sure about the drive order in the mdadm command: is it correct
to assume <drive 0> <drive 1> <drive 2> in order, or is it reversed,
like <drive 2> <drive 1> <drive 0>?
I also hope the command doesn't trigger any recovery actions or
filesystem changes...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RAID 5 re-add of removed drive? (failed drive replacement)
  2009-06-02 12:00   ` Alexander Rietsch
@ 2009-06-02 13:10     ` Robin Hill
  2009-06-02 14:24       ` Alexander Rietsch
  0 siblings, 1 reply; 12+ messages in thread
From: Robin Hill @ 2009-06-02 13:10 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2380 bytes --]

On Tue Jun 02, 2009 at 02:00:15PM +0200, Alexander Rietsch wrote:

>>
>> AFAIK, the only solution at this stage is to recreate the array.
>>
>> You need to use the "--assume-clean" flag (or replace one of the drives
>> with "missing"), along with _exactly_ the same parameters & drive order
>> as when you originally created the array (you should be able to get most
>> of this from mdadm -D).  This will rewrite the RAID metadata, but leave
>> the filesystem untouched.
>
> A glimpse of hope. Thank you! Didn't know about this --assume-clean flag. 
> So just to double-check:
>
> The array to create would be:
> disc 0: sdi1 <- is now disc 7 and SPARE due to failed replacement operation
> disc 1: sdl1
> disc 2: sdh1
> disc 3: sdj1
> disc 4: sdk1
> disc 5: sdg1
> disc 6: sda1 <- is now faulty removed
>
> So I just create an incomplete array without sda1 in the same order which 
> would be:
>
> mdadm --create /dev/md0 --assume-clean --level=5 --chunk=64 
> --raid-devices=7 /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 
> /dev/sdg1
>
Almost - you'll also need to specify "missing" for disc 6 (and the
--assume-clean isn't actually needed in this case, as the array can't do
any reconstruction with a missing drive), so:

    mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=7
    /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 missing

> I'm not sure about the drive order in the mdadm command: is it correct to
> assume <drive 0> <drive 1> <drive 2> in order, or is it reversed, like
> <drive 2> <drive 1> <drive 0>?
> I also hope the command doesn't trigger any recovery actions or filesystem
> changes...

This should be safe, yes - the numbers are also given in the output from
"mdadm -D /dev/md0" or "mdadm -E /dev/sdl1".  The array creation doesn't
trigger any changes at all to the filesystem (though mounting it might,
even in read-only mode) so is perfectly safe to do.  You can also try
"fsck -n" on the filesystem before mounting to verify that the array
order is correct - this may fail on filesystems with unflushed journal
data though.
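
Something along these lines should be safe (a sketch, assuming the
filesystem sits directly on /dev/md0 and /mnt exists):

    fsck -n /dev/md0            # read-only check, writes nothing
    mount -o ro /dev/md0 /mnt   # only once the order looks right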

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RAID 5 re-add of removed drive? (failed drive replacement)
  2009-06-02 13:10     ` Robin Hill
@ 2009-06-02 14:24       ` Alexander Rietsch
  2009-06-08  9:19         ` David Greaves
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Rietsch @ 2009-06-02 14:24 UTC (permalink / raw)
  To: Robin Hill; +Cc: linux-raid

On 02.06.2009, at 15:10, Robin Hill wrote:

> Almost - you'll also need to specify "missing" for disc 6 (and the
> --assume-clean isn't actually needed in this case, as the array  
> can't do
> any reconstruction with a missing drive), so:
>
>    mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=7
>    /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 missing

Yes, that's it! The RAID is alive! Mr. Robin Hill, you're a HERO!

With this trick, it's possible to recover a RAID that was confused by a
data error during disk replacement. I'll note this somewhere.

Here's the log of the creation command for completeness:

[root@localhost ~]# mdadm --create /dev/md0 --assume-clean --level=5
--chunk=64 --raid-devices=7 --spare-devices=0 /dev/sdi1 /dev/sdl1
/dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 missing
mdadm: /dev/sdi1 appears to be part of a raid array:
     level=raid5 devices=7 ctime=Sun Nov  2 13:21:54 2008
mdadm: /dev/sdl1 appears to be part of a raid array:
     level=raid5 devices=7 ctime=Sun Nov  2 13:21:54 2008
mdadm: /dev/sdh1 appears to be part of a raid array:
     level=raid5 devices=7 ctime=Sun Nov  2 13:21:54 2008
mdadm: /dev/sdj1 appears to be part of a raid array:
     level=raid5 devices=7 ctime=Sun Nov  2 13:21:54 2008
mdadm: /dev/sdk1 appears to be part of a raid array:
     level=raid5 devices=7 ctime=Sun Nov  2 13:21:54 2008
mdadm: /dev/sdg1 appears to be part of a raid array:
     level=raid5 devices=7 ctime=Sun Nov  2 13:21:54 2008
mdadm: largest drive (/dev/sdg1) exceeds size (976759936K) by more  
than 1%
Continue creating array? y
mdadm: array /dev/md/0 started.

Jun  2 15:34:47 localhost klogd: md: bind<sdi1>
Jun  2 15:34:47 localhost klogd: md: bind<sdl1>
Jun  2 15:34:47 localhost klogd: md: bind<sdh1>
Jun  2 15:34:47 localhost klogd: md: bind<sdj1>
Jun  2 15:34:47 localhost klogd: md: bind<sdk1>
Jun  2 15:34:47 localhost klogd: md: bind<sdg1>
Jun  2 15:34:47 localhost klogd: md: raid6 personality registered for  
level 6
Jun  2 15:34:47 localhost klogd: md: raid5 personality registered for  
level 5
Jun  2 15:34:47 localhost klogd: md: raid4 personality registered for  
level 4
Jun  2 15:34:47 localhost klogd: raid5: device sdg1 operational as  
raid disk 5
Jun  2 15:34:47 localhost klogd: raid5: device sdk1 operational as  
raid disk 4
Jun  2 15:34:47 localhost klogd: raid5: device sdj1 operational as  
raid disk 3
Jun  2 15:34:47 localhost klogd: raid5: device sdh1 operational as  
raid disk 2
Jun  2 15:34:47 localhost klogd: raid5: device sdl1 operational as  
raid disk 1
Jun  2 15:34:47 localhost klogd: raid5: device sdi1 operational as  
raid disk 0
Jun  2 15:34:47 localhost klogd: raid5: allocated 7434kB for md0
Jun  2 15:34:47 localhost klogd: raid5: raid level 5 set md0 active  
with 6 out of 7 devices, algorithm 2
Jun  2 15:34:47 localhost klogd: RAID5 conf printout:
Jun  2 15:34:47 localhost klogd:  --- rd:7 wd:6
Jun  2 15:34:47 localhost klogd:  disk 0, o:1, dev:sdi1
Jun  2 15:34:47 localhost klogd:  disk 1, o:1, dev:sdl1
Jun  2 15:34:47 localhost klogd:  disk 2, o:1, dev:sdh1
Jun  2 15:34:47 localhost klogd:  disk 3, o:1, dev:sdj1
Jun  2 15:34:47 localhost klogd:  disk 4, o:1, dev:sdk1
Jun  2 15:34:47 localhost klogd:  disk 5, o:1, dev:sdg1
Jun  2 15:34:47 localhost klogd: md0: detected capacity change from 0  
to 6001213046784
Jun  2 15:34:47 localhost klogd:  md0: unknown partition table

[root@localhost ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1] sdi1[0]
       5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [UUUUUU_]
unused devices: <none>
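
The array is of course still degraded ([7/6]), so my plan is to add a
blank drive back in and let it rebuild - presumably something like:

    mdadm --add /dev/md0 /dev/sdb1    # sdb1 being the empty spare
    watch cat /proc/mdstat            # and keep an eye on the resync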

Again, thanks a lot for your help. Very appreciated.

Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RAID 5 re-add of removed drive? (failed drive replacement)
  2009-06-02 14:24       ` Alexander Rietsch
@ 2009-06-08  9:19         ` David Greaves
  0 siblings, 0 replies; 12+ messages in thread
From: David Greaves @ 2009-06-08  9:19 UTC (permalink / raw)
  To: Alexander Rietsch; +Cc: Robin Hill, linux-raid, Sujit Karataparambil

Alexander Rietsch wrote:
> On 02.06.2009, at 15:10, Robin Hill wrote:
> 
>> Almost - you'll also need to specify "missing" for disc 6 (and the
>> --assume-clean isn't actually needed in this case, as the array can't do
>> any reconstruction with a missing drive), so:
>>
>>    mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=7
>>    /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 missing
> 
> Yes, that's it! The RAID is alive! Mr. Robin Hill, you're a HERO!
> 
> With this trick, it's possible to recover a RAID which was confused by a
> data error during disk-replacement. I'll note this somewhere.

Maybe:
  http://linux-raid.osdl.org/

:)

I've not had time to update it recently.

Sujit Karataparambil wrote:
> http://www.tldp.org/HOWTO/Software-RAID-HOWTO-3.html
>
> This is the RAID documentation, which I found rather insufficient.

I spent considerable time trying to get that resolved, but sadly they were
of the opinion that it is better for tldp to provide misleading docs than
no docs or a link to better ones. I updated it and moved it to the link above.

Sujit Karataparambil wrote:
> Kindly read the document carefully and thoroughly.
>
> raidhotadd /dev/mdX /dev/sdb
NB: this is very old, unsupported software and it may not be wise to suggest it.
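
(For anyone finding this later: the mdadm equivalent would be something
like "mdadm --add /dev/mdX /dev/sdb".)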

David

-- 
"Don't worry, you'll be fine; I saw it work in a cartoon once..."

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RAID 5 re-add of removed drive? (failed drive replacement)
  2009-06-02 14:20 Jon Hardcastle
@ 2009-06-02 17:13 ` Alexander Rietsch
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Rietsch @ 2009-06-02 17:13 UTC (permalink / raw)
  To: Jon; +Cc: linux-raid


On 02.06.2009, at 16:20, Jon Hardcastle wrote:

>> As always, most problems occur only during critical tasks
>> like
>> enlarging/restoring.
>
> ....
>
> Is this a compelling case for regular
>
> echo check >> /sys/block/mdX/md/sync_action
>
> and/or
>
> echo repair >> /sys/block/mdX/md/sync_action
>
> ??

Yes, indeed! Another lesson learned.
This might have prevented numerous cases in the past where disc surface
errors appeared while enlarging or resizing a raid array. But all those
serious incidents were handled perfectly and flawlessly by the software
raid module. Congrats to all kernel developers here. The raid module is
far more flexible, fault-tolerant and stable than any hardware raid
solution I've seen so far.

Regards,
Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RAID 5 re-add of removed drive? (failed drive replacement)
@ 2009-06-02 14:20 Jon Hardcastle
  2009-06-02 17:13 ` Alexander Rietsch
  0 siblings, 1 reply; 12+ messages in thread
From: Jon Hardcastle @ 2009-06-02 14:20 UTC (permalink / raw)
  To: linux-raid, Alex R





--- On Tue, 2/6/09, Alex R <Alexander.Rietsch@hispeed.ch> wrote:

> From: Alex R <Alexander.Rietsch@hispeed.ch>
> Subject: RAID 5 re-add of removed drive? (failed drive replacement)
> To: linux-raid@vger.kernel.org
> Date: Tuesday, 2 June, 2009, 11:09 AM
> 
> I have a serious RAID problem here. Please have a look at
> this. Any help
> would be greatly appreciated!
> 
> As always, most problems occur only during critical tasks
> like
> enlarging/restoring. 

....

> http://vger.kernel.org/majordomo-info.html
> 

Is this a compelling case for regular

echo check >> /sys/block/mdX/md/sync_action

and/or

echo repair >> /sys/block/mdX/md/sync_action

??

I have a cron job that does this weekly.

(meant more for people who find this thread in 300 years' time)
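
For reference, a crontab entry along these lines would do it (a sketch;
md0 and the schedule are only examples):

    # scrub md0 every Sunday at 04:00
    0 4 * * 0 echo check > /sys/block/md0/md/sync_action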

-----------------------
N: Jon Hardcastle
E: Jon@eHardcastle.com
'Do not worry about tomorrow, for tomorrow will bring worries of its own.'

Please sponsor me for the London to Brighton 2009.
Just Giving: http://www.justgiving.com/jonathanhardcastle
-----------------------


      

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-06-08  9:19 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-02 10:09 RAID 5 re-add of removed drive? (failed drive replacement) Alex R
2009-06-02 10:18 ` Sujit Karataparambil
2009-06-02 10:45   ` Alexander Rietsch
2009-06-02 10:52   ` Sujit Karataparambil
2009-06-02 10:55     ` Sujit Karataparambil
2009-06-02 11:17 ` Robin Hill
2009-06-02 12:00   ` Alexander Rietsch
2009-06-02 13:10     ` Robin Hill
2009-06-02 14:24       ` Alexander Rietsch
2009-06-08  9:19         ` David Greaves
2009-06-02 14:20 Jon Hardcastle
2009-06-02 17:13 ` Alexander Rietsch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.