* RAID 5 re-add of removed drive? (failed drive replacement)
@ 2009-06-02 10:09 Alex R
2009-06-02 10:18 ` Sujit Karataparambil
2009-06-02 11:17 ` Robin Hill
0 siblings, 2 replies; 12+ messages in thread
From: Alex R @ 2009-06-02 10:09 UTC (permalink / raw)
To: linux-raid
I have a serious RAID problem here. Please have a look at this. Any help
would be greatly appreciated!
As always, most problems occur only during critical tasks like
enlarging/restoring. I tried to replace a drive in my 7disc 6T RAID5 array
as explained here:
http://michael-prokop.at/blog/2006/09/09/raid5-online-resizing-with-linux/
After removing a drive and restoring to the new one, another disc in the
array failed. Now I still have all the data redundantly available (the old
drive is still there), but the RAID header is now in a state where it's
impossible to access the data. Is it possible to rearrange the drives to
force the kernel to a valid array?
Here is the story:
// my normal boot log showing RAID devices
Jun 1 22:37:45 localhost klogd: md: md0 stopped.
Jun 1 22:37:45 localhost klogd: md: bind<sdl1>
Jun 1 22:37:45 localhost klogd: md: bind<sdh1>
Jun 1 22:37:45 localhost klogd: md: bind<sdj1>
Jun 1 22:37:45 localhost klogd: md: bind<sdk1>
Jun 1 22:37:45 localhost klogd: md: bind<sdg1>
Jun 1 22:37:45 localhost klogd: md: bind<sda1>
Jun 1 22:37:45 localhost klogd: md: bind<sdi1>
Jun 1 22:37:45 localhost klogd: xor: automatically using best checksumming
function: generic_sse
Jun 1 22:37:45 localhost klogd: generic_sse: 5144.000 MB/sec
Jun 1 22:37:45 localhost klogd: xor: using function: generic_sse (5144.000
MB/sec)
Jun 1 22:37:45 localhost klogd: async_tx: api initialized (async)
Jun 1 22:37:45 localhost klogd: raid6: int64x1 1539 MB/s
Jun 1 22:37:45 localhost klogd: raid6: int64x2 1558 MB/s
Jun 1 22:37:45 localhost klogd: raid6: int64x4 1968 MB/s
Jun 1 22:37:45 localhost klogd: raid6: int64x8 1554 MB/s
Jun 1 22:37:45 localhost klogd: raid6: sse2x1 2441 MB/s
Jun 1 22:37:45 localhost klogd: raid6: sse2x2 3250 MB/s
Jun 1 22:37:45 localhost klogd: raid6: sse2x4 3460 MB/s
Jun 1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460 MB/s)
Jun 1 22:37:45 localhost klogd: md: raid6 personality registered for level
6
Jun 1 22:37:45 localhost klogd: md: raid5 personality registered for level
5
Jun 1 22:37:45 localhost klogd: md: raid4 personality registered for level
4
Jun 1 22:37:45 localhost klogd: raid5: device sdi1 operational as raid disk
0
Jun 1 22:37:45 localhost klogd: raid5: device sda1 operational as raid disk
6
Jun 1 22:37:45 localhost klogd: raid5: device sdg1 operational as raid disk
5
Jun 1 22:37:45 localhost klogd: raid5: device sdk1 operational as raid disk
4
Jun 1 22:37:45 localhost klogd: raid5: device sdj1 operational as raid disk
3
Jun 1 22:37:45 localhost klogd: raid5: device sdh1 operational as raid disk
2
Jun 1 22:37:45 localhost klogd: raid5: device sdl1 operational as raid disk
1
Jun 1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
Jun 1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active with 7
out of 7 devices, algorithm 2
Jun 1 22:37:45 localhost klogd: RAID5 conf printout:
Jun 1 22:37:45 localhost klogd: --- rd:7 wd:7
Jun 1 22:37:45 localhost klogd: disk 0, o:1, dev:sdi1
Jun 1 22:37:45 localhost klogd: disk 1, o:1, dev:sdl1
Jun 1 22:37:45 localhost klogd: disk 2, o:1, dev:sdh1
Jun 1 22:37:45 localhost klogd: disk 3, o:1, dev:sdj1
Jun 1 22:37:45 localhost klogd: disk 4, o:1, dev:sdk1
Jun 1 22:37:45 localhost klogd: disk 5, o:1, dev:sdg1
Jun 1 22:37:45 localhost klogd: disk 6, o:1, dev:sda1
Jun 1 22:37:45 localhost klogd: md0: detected capacity change from 0 to
6001213046784
Jun 1 22:37:45 localhost klogd: md0: unknown partition table
// now a new spare drive is added
[root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
Jun 1 22:42:00 localhost klogd: md: bind<sdb1>
// and here goes the drive replacement
[root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1
Jun 1 22:44:10 localhost klogd: raid5: Disk failure on sdi1, disabling
device.
Jun 1 22:44:10 localhost klogd: raid5: Operation continuing on 6 devices.
Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
Jun 1 22:44:10 localhost klogd: disk 0, o:0, dev:sdi1
Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
Jun 1 22:44:10 localhost klogd: disk 0, o:1, dev:sdb1
Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
Jun 1 22:44:10 localhost klogd: md: recovery of RAID array md0
Jun 1 22:44:10 localhost klogd: md: unbind<sdi1>
Jun 1 22:44:10 localhost klogd: md: minimum _guaranteed_ speed: 1000
KB/sec/disk.
Jun 1 22:44:10 localhost klogd: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for recovery.
Jun 1 22:44:10 localhost klogd: md: using 128k window, over a total of
976759936 blocks.
Jun 1 22:44:10 localhost klogd: md: export_rdev(sdi1)
[root@localhost ~]# more /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1]
5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
[=====>...............] recovery = 27.5% (269352320/976759936)
finish=276.2min speed=42686K/sec
// surface error on RAID drive while recovery:
Jun 2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff
SErr 0x0 action 0x0
Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
Jun 2 03:59:49 localhost klogd: ata1.00: cmd
60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
Jun 2 03:59:49 localhost klogd: res
41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
Jun 2 03:59:49 localhost klogd: ata1: EH complete
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
hardware sectors: (1.50 TB/1.36 TiB)
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
Jun 2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc
SErr 0x0 action 0x0
Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
Jun 2 03:59:49 localhost klogd: ata1.00: cmd
60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
Jun 2 03:59:49 localhost klogd: res
41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
Jun 2 03:59:49 localhost klogd: ata1: EH complete
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
hardware sectors: (1.50 TB/1.36 TiB)
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
...
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269136 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269144 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269152 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269160 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269168 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269176 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269184 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269192 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269200 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269208 on sda1).
Jun 2 03:59:49 localhost klogd: ata1: EH complete
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
hardware sectors: (1.50 TB/1.36 TiB)
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
Jun 2 03:59:49 localhost klogd: disk 0, o:1, dev:sdb1
Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
Jun 2 03:59:49 localhost klogd: disk 4, o:1, dev:sdk1
Jun 2 03:59:49 localhost klogd: disk 5, o:1, dev:sdg1
Jun 2 03:59:49 localhost klogd: disk 6, o:0, dev:sda1
Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently
unreadable (pending) sectors
Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline
uncorrectable sectors
// md0 is now down. But hey, still got the old drive, so just add it again:
[root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1
Jun 2 09:11:49 localhost klogd: md: bind<sdi1>
// it's just added as a SPARE! HELP!!! reboot always helps..
[root@localhost ~]# reboot
[root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
/dev/sda1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Mon Jun 1 22:44:10 2009
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : 22d364f3 - correct
Events : 2599984
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 6 8 1 6 active sync /dev/sda1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 1 6 active sync /dev/sda1
7 7 8 17 7 spare /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f8dd - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 8 8 17 8 spare /dev/sdb1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdg1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f92d - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 5 8 97 5 active sync /dev/sdg1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdh1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f937 - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 113 2 active sync /dev/sdh1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdi1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f94b - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 7 8 129 7 spare /dev/sdi1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdj1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f959 - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 145 3 active sync /dev/sdj1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdk1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f96b - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 161 4 active sync /dev/sdk1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdl1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f975 - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 177 1 active sync /dev/sdl1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
the old RAID configuration was:
disc 0: sdi1 <- is now disc 7 and SPARE
disc 1: sdl1
disc 2: sdh1
disc 3: sdj1
disc 4: sdk1
disc 5: sdg1
disc 6: sda1 <- is now faulty removed
[root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start
the array.
[root@localhost log]# cat /proc/mdstat
Personalities :
md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
sdk1[4](S) sdj1[3](S) sdh1[2](S)
8790840960 blocks
On large arrays this may happen a lot: A bad drive is first discovered
during maintenance operations when it's too late. Maybe an option to add a
redundant drive in a fail-save way would be a good idea to add to md
sevices.
Please tell me if you see any solution to the problems below.
1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as is
was before the restore attempt?
2. Is it possible to reassign /dev/sda1 as disc 6 and backup the still
readable data on the RAID?
3. I guess more then 90% of data was written to /dev/sdb1 in the restore
attempt. Is it possble to use /dev/sdb1 as disc 7 to access the RAID?
Thank you for looking at the problem
Alexander
--
View this message in context: http://www.nabble.com/RAID-5-re-add-of-removed-drive--%28failed-drive-replacement%29-tp23828899p23828899.html
Sent from the linux-raid mailing list archive at Nabble.com.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
2009-06-02 10:09 RAID 5 re-add of removed drive? (failed drive replacement) Alex R
@ 2009-06-02 10:18 ` Sujit Karataparambil
2009-06-02 10:45 ` Alexander Rietsch
2009-06-02 10:52 ` Sujit Karataparambil
2009-06-02 11:17 ` Robin Hill
1 sibling, 2 replies; 12+ messages in thread
From: Sujit Karataparambil @ 2009-06-02 10:18 UTC (permalink / raw)
To: Alex R; +Cc: linux-raid
http://www.cyberciti.biz/faq/howto-rebuilding-a-raid-array-after-a-disk-fails/
On Tue, Jun 2, 2009 at 3:39 PM, Alex R <Alexander.Rietsch@hispeed.ch> wrote:
>
> I have a serious RAID problem here. Please have a look at this. Any help
> would be greatly appreciated!
>
> As always, most problems occur only during critical tasks like
> enlarging/restoring. I tried to replace a drive in my 7disc 6T RAID5 array
> as explained here:
> http://michael-prokop.at/blog/2006/09/09/raid5-online-resizing-with-linux/
>
> After removing a drive and restoring to the new one, another disc in the
> array failed. Now I still have all the data redundantly available (the old
> drive is still there), but the RAID header is now in a state where it's
> impossible to access the data. Is it possible to rearrange the drives to
> force the kernel to a valid array?
>
> Here is the story:
>
> // my normal boot log showing RAID devices
>
> Jun 1 22:37:45 localhost klogd: md: md0 stopped.
> Jun 1 22:37:45 localhost klogd: md: bind<sdl1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdh1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdj1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdk1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdg1>
> Jun 1 22:37:45 localhost klogd: md: bind<sda1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdi1>
> Jun 1 22:37:45 localhost klogd: xor: automatically using best checksumming
> function: generic_sse
> Jun 1 22:37:45 localhost klogd: generic_sse: 5144.000 MB/sec
> Jun 1 22:37:45 localhost klogd: xor: using function: generic_sse (5144.000
> MB/sec)
> Jun 1 22:37:45 localhost klogd: async_tx: api initialized (async)
> Jun 1 22:37:45 localhost klogd: raid6: int64x1 1539 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: int64x2 1558 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: int64x4 1968 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: int64x8 1554 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: sse2x1 2441 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: sse2x2 3250 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: sse2x4 3460 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460 MB/s)
> Jun 1 22:37:45 localhost klogd: md: raid6 personality registered for level
> 6
> Jun 1 22:37:45 localhost klogd: md: raid5 personality registered for level
> 5
> Jun 1 22:37:45 localhost klogd: md: raid4 personality registered for level
> 4
> Jun 1 22:37:45 localhost klogd: raid5: device sdi1 operational as raid disk
> 0
> Jun 1 22:37:45 localhost klogd: raid5: device sda1 operational as raid disk
> 6
> Jun 1 22:37:45 localhost klogd: raid5: device sdg1 operational as raid disk
> 5
> Jun 1 22:37:45 localhost klogd: raid5: device sdk1 operational as raid disk
> 4
> Jun 1 22:37:45 localhost klogd: raid5: device sdj1 operational as raid disk
> 3
> Jun 1 22:37:45 localhost klogd: raid5: device sdh1 operational as raid disk
> 2
> Jun 1 22:37:45 localhost klogd: raid5: device sdl1 operational as raid disk
> 1
> Jun 1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
> Jun 1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active with 7
> out of 7 devices, algorithm 2
> Jun 1 22:37:45 localhost klogd: RAID5 conf printout:
> Jun 1 22:37:45 localhost klogd: --- rd:7 wd:7
> Jun 1 22:37:45 localhost klogd: disk 0, o:1, dev:sdi1
> Jun 1 22:37:45 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 1 22:37:45 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 1 22:37:45 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 1 22:37:45 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 1 22:37:45 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 1 22:37:45 localhost klogd: disk 6, o:1, dev:sda1
> Jun 1 22:37:45 localhost klogd: md0: detected capacity change from 0 to
> 6001213046784
> Jun 1 22:37:45 localhost klogd: md0: unknown partition table
>
> // now a new spare drive is added
>
> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
>
> Jun 1 22:42:00 localhost klogd: md: bind<sdb1>
>
> // and here goes the drive replacement
>
> [root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1
>
> Jun 1 22:44:10 localhost klogd: raid5: Disk failure on sdi1, disabling
> device.
> Jun 1 22:44:10 localhost klogd: raid5: Operation continuing on 6 devices.
> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
> Jun 1 22:44:10 localhost klogd: disk 0, o:0, dev:sdi1
> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
> Jun 1 22:44:10 localhost klogd: disk 0, o:1, dev:sdb1
> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
> Jun 1 22:44:10 localhost klogd: md: recovery of RAID array md0
> Jun 1 22:44:10 localhost klogd: md: unbind<sdi1>
> Jun 1 22:44:10 localhost klogd: md: minimum _guaranteed_ speed: 1000
> KB/sec/disk.
> Jun 1 22:44:10 localhost klogd: md: using maximum available idle IO
> bandwidth (but not more than 200000 KB/sec) for recovery.
> Jun 1 22:44:10 localhost klogd: md: using 128k window, over a total of
> 976759936 blocks.
> Jun 1 22:44:10 localhost klogd: md: export_rdev(sdi1)
>
> [root@localhost ~]# more /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1]
> 5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
> [=====>...............] recovery = 27.5% (269352320/976759936)
> finish=276.2min speed=42686K/sec
>
> // surface error on RAID drive while recovery:
>
> Jun 2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff
> SErr 0x0 action 0x0
> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
> 60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
> Jun 2 03:59:49 localhost klogd: res
> 41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
> Jun 2 03:59:49 localhost klogd: ata1: EH complete
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
> hardware sectors: (1.50 TB/1.36 TiB)
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
> Jun 2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc
> SErr 0x0 action 0x0
> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
> 60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
> Jun 2 03:59:49 localhost klogd: res
> 41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
> Jun 2 03:59:49 localhost klogd: ata1: EH complete
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
> hardware sectors: (1.50 TB/1.36 TiB)
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
> ...
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269136 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269144 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269152 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269160 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269168 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269176 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269184 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269192 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269200 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269208 on sda1).
> Jun 2 03:59:49 localhost klogd: ata1: EH complete
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
> hardware sectors: (1.50 TB/1.36 TiB)
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
> Jun 2 03:59:49 localhost klogd: disk 0, o:1, dev:sdb1
> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 2 03:59:49 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 2 03:59:49 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 2 03:59:49 localhost klogd: disk 6, o:0, dev:sda1
> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently
> unreadable (pending) sectors
> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline
> uncorrectable sectors
>
> // md0 is now down. But hey, still got the old drive, so just add it again:
>
> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1
>
> Jun 2 09:11:49 localhost klogd: md: bind<sdi1>
>
> // it's just added as a SPARE! HELP!!! reboot always helps..
>
> [root@localhost ~]# reboot
> [root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
> /dev/sda1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 7
> Preferred Minor : 0
>
> Update Time : Mon Jun 1 22:44:10 2009
> State : clean
> Active Devices : 6
> Working Devices : 7
> Failed Devices : 0
> Spare Devices : 1
> Checksum : 22d364f3 - correct
> Events : 2599984
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 6 8 1 6 active sync /dev/sda1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 8 1 6 active sync /dev/sda1
> 7 7 8 17 7 spare /dev/sdb1
> /dev/sdb1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f8dd - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 8 8 17 8 spare /dev/sdb1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdg1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f92d - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 5 8 97 5 active sync /dev/sdg1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdh1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f937 - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 2 8 113 2 active sync /dev/sdh1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdi1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f94b - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 7 8 129 7 spare /dev/sdi1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdj1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f959 - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 3 8 145 3 active sync /dev/sdj1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdk1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f96b - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 4 8 161 4 active sync /dev/sdk1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdl1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f975 - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 1 8 177 1 active sync /dev/sdl1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
>
> the old RAID configuration was:
>
> disc 0: sdi1 <- is now disc 7 and SPARE
> disc 1: sdl1
> disc 2: sdh1
> disc 3: sdj1
> disc 4: sdk1
> disc 5: sdg1
> disc 6: sda1 <- is now faulty removed
>
> [root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
> mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start
> the array.
> [root@localhost log]# cat /proc/mdstat
> Personalities :
> md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
> sdk1[4](S) sdj1[3](S) sdh1[2](S)
> 8790840960 blocks
>
>
> On large arrays this may happen a lot: A bad drive is first discovered
> during maintenance operations when it's too late. Maybe an option to add a
> redundant drive in a fail-save way would be a good idea to add to md
> sevices.
>
> Please tell me if you see any solution to the problems below.
>
> 1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as is
> was before the restore attempt?
>
> 2. Is it possible to reassign /dev/sda1 as disc 6 and backup the still
> readable data on the RAID?
>
> 3. I guess more then 90% of data was written to /dev/sdb1 in the restore
> attempt. Is it possble to use /dev/sdb1 as disc 7 to access the RAID?
>
> Thank you for looking at the problem
> Alexander
> --
> View this message in context: http://www.nabble.com/RAID-5-re-add-of-removed-drive--%28failed-drive-replacement%29-tp23828899p23828899.html
> Sent from the linux-raid mailing list archive at Nabble.com.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
-- Sujit K M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
2009-06-02 10:18 ` Sujit Karataparambil
@ 2009-06-02 10:45 ` Alexander Rietsch
2009-06-02 10:52 ` Sujit Karataparambil
1 sibling, 0 replies; 12+ messages in thread
From: Alexander Rietsch @ 2009-06-02 10:45 UTC (permalink / raw)
To: Sujit Karataparambil; +Cc: linux-raid
Thank you for answering my mail. But to actually read it instead of
posting a link which contains no more information as already in the
RAID FAQ or mdadm man page, here is the short version of my problem:
>> disc 0: sdi1 <- is now disc 7 and SPARE
>> disc 1: sdl1
>> disc 2: sdh1
>> disc 3: sdj1
>> disc 4: sdk1
>> disc 5: sdg1
>> disc 6: sda1 <- is now faulty removed
sdb1 <- not finished replacement drive, now SPARE
of the original 7 drives, 2 are disabled. Please tell me how to
- re-add sdi1 as disc 0 (mdadm --re-add just adds it as spare)
- how to enable sda1 as disc6 (mdadm --assemble --force --scan refuses
to acceppt it)
- how to use the new drive sdb1 as disc7 (mdadm --assemble --force --
scan just adds it as spare)
original post:
After removing a drive and restoring to the new one, another disc in
the array failed. Now I still have all the data redundantly available
(the old drive is still there), but the RAID header is now in a state
where it's impossible to access the data. Is it possible to rearrange
the drives to force the kernel to a valid array?
Here is the story:
// my normal boot log showing RAID devices
Jun 1 22:37:45 localhost klogd: md: md0 stopped.
Jun 1 22:37:45 localhost klogd: md: bind<sdl1>
Jun 1 22:37:45 localhost klogd: md: bind<sdh1>
Jun 1 22:37:45 localhost klogd: md: bind<sdj1>
Jun 1 22:37:45 localhost klogd: md: bind<sdk1>
Jun 1 22:37:45 localhost klogd: md: bind<sdg1>
Jun 1 22:37:45 localhost klogd: md: bind<sda1>
Jun 1 22:37:45 localhost klogd: md: bind<sdi1>
Jun 1 22:37:45 localhost klogd: xor: automatically using best
checksumming function: generic_sse
Jun 1 22:37:45 localhost klogd: generic_sse: 5144.000 MB/sec
Jun 1 22:37:45 localhost klogd: xor: using function: generic_sse
(5144.000 MB/sec)
Jun 1 22:37:45 localhost klogd: async_tx: api initialized (async)
Jun 1 22:37:45 localhost klogd: raid6: int64x1 1539 MB/s
Jun 1 22:37:45 localhost klogd: raid6: int64x2 1558 MB/s
Jun 1 22:37:45 localhost klogd: raid6: int64x4 1968 MB/s
Jun 1 22:37:45 localhost klogd: raid6: int64x8 1554 MB/s
Jun 1 22:37:45 localhost klogd: raid6: sse2x1 2441 MB/s
Jun 1 22:37:45 localhost klogd: raid6: sse2x2 3250 MB/s
Jun 1 22:37:45 localhost klogd: raid6: sse2x4 3460 MB/s
Jun 1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460
MB/s)
Jun 1 22:37:45 localhost klogd: md: raid6 personality registered for
level 6
Jun 1 22:37:45 localhost klogd: md: raid5 personality registered for
level 5
Jun 1 22:37:45 localhost klogd: md: raid4 personality registered for
level 4
Jun 1 22:37:45 localhost klogd: raid5: device sdi1 operational as
raid disk 0
Jun 1 22:37:45 localhost klogd: raid5: device sda1 operational as
raid disk 6
Jun 1 22:37:45 localhost klogd: raid5: device sdg1 operational as
raid disk 5
Jun 1 22:37:45 localhost klogd: raid5: device sdk1 operational as
raid disk 4
Jun 1 22:37:45 localhost klogd: raid5: device sdj1 operational as
raid disk 3
Jun 1 22:37:45 localhost klogd: raid5: device sdh1 operational as
raid disk 2
Jun 1 22:37:45 localhost klogd: raid5: device sdl1 operational as
raid disk 1
Jun 1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
Jun 1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active
with 7 out of 7 devices, algorithm 2
Jun 1 22:37:45 localhost klogd: RAID5 conf printout:
Jun 1 22:37:45 localhost klogd: --- rd:7 wd:7
Jun 1 22:37:45 localhost klogd: disk 0, o:1, dev:sdi1
Jun 1 22:37:45 localhost klogd: disk 1, o:1, dev:sdl1
Jun 1 22:37:45 localhost klogd: disk 2, o:1, dev:sdh1
Jun 1 22:37:45 localhost klogd: disk 3, o:1, dev:sdj1
Jun 1 22:37:45 localhost klogd: disk 4, o:1, dev:sdk1
Jun 1 22:37:45 localhost klogd: disk 5, o:1, dev:sdg1
Jun 1 22:37:45 localhost klogd: disk 6, o:1, dev:sda1
Jun 1 22:37:45 localhost klogd: md0: detected capacity change from 0
to 6001213046784
Jun 1 22:37:45 localhost klogd: md0: unknown partition table
// now a new spare drive is added
[root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
Jun 1 22:42:00 localhost klogd: md: bind<sdb1>
// and here goes the drive replacement
[root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1
Jun 1 22:44:10 localhost klogd: raid5: Disk failure on sdi1,
disabling device.
Jun 1 22:44:10 localhost klogd: raid5: Operation continuing on 6
devices.
Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
Jun 1 22:44:10 localhost klogd: disk 0, o:0, dev:sdi1
Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
Jun 1 22:44:10 localhost klogd: disk 0, o:1, dev:sdb1
Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
Jun 1 22:44:10 localhost klogd: md: recovery of RAID array md0
Jun 1 22:44:10 localhost klogd: md: unbind<sdi1>
Jun 1 22:44:10 localhost klogd: md: minimum _guaranteed_ speed: 1000
KB/sec/disk.
Jun 1 22:44:10 localhost klogd: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for recovery.
Jun 1 22:44:10 localhost klogd: md: using 128k window, over a total
of 976759936 blocks.
Jun 1 22:44:10 localhost klogd: md: export_rdev(sdi1)
[root@localhost ~]# more /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2]
sdl1[1]
5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
[=====>...............] recovery = 27.5% (269352320/976759936)
finish=276.2min speed=42686K/sec
// surface error on RAID drive while recovery:
Jun 2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct
0xffff SErr 0x0 action 0x0
Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
Jun 2 03:59:49 localhost klogd: ata1.00: cmd
60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
Jun 2 03:59:49 localhost klogd: res 41/40:08:3f:bd:b8/8c:
00:6b:00:00/00 Emask 0x409 (media error) <F>
Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
Jun 2 03:59:49 localhost klogd: ata1: EH complete
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
hardware sectors: (1.50 TB/1.36 TiB)
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Jun 2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct
0x3ffc SErr 0x0 action 0x0
Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
Jun 2 03:59:49 localhost klogd: ata1.00: cmd
60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
Jun 2 03:59:49 localhost klogd: res
41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
Jun 2 03:59:49 localhost klogd: ata1: EH complete
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
hardware sectors: (1.50 TB/1.36 TiB)
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
...
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269136 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269144 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269152 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269160 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269168 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269176 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269184 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269192 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269200 on sda1).
Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
(sector 1807269208 on sda1).
Jun 2 03:59:49 localhost klogd: ata1: EH complete
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
hardware sectors: (1.50 TB/1.36 TiB)
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache:
enabled, read cache: enabled, doesn't support DPO or FUA
Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
Jun 2 03:59:49 localhost klogd: disk 0, o:1, dev:sdb1
Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
Jun 2 03:59:49 localhost klogd: disk 4, o:1, dev:sdk1
Jun 2 03:59:49 localhost klogd: disk 5, o:1, dev:sdg1
Jun 2 03:59:49 localhost klogd: disk 6, o:0, dev:sda1
Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently
unreadable (pending) sectors
Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline
uncorrectable sectors
// md0 is now down. But hey, still got the old drive, so just add it
again:
[root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1
Jun 2 09:11:49 localhost klogd: md: bind<sdi1>
// it's just added as a SPARE! HELP!!! reboot always helps..
[root@localhost ~]# reboot
[root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
/dev/sda1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 0
Update Time : Mon Jun 1 22:44:10 2009
State : clean
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : 22d364f3 - correct
Events : 2599984
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 6 8 1 6 active sync /dev/sda1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 8 1 6 active sync /dev/sda1
7 7 8 17 7 spare /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f8dd - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 8 8 17 8 spare /dev/sdb1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdg1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f92d - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 5 8 97 5 active sync /dev/sdg1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdh1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f937 - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 113 2 active sync /dev/sdh1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdi1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f94b - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 7 8 129 7 spare /dev/sdi1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdj1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f959 - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 145 3 active sync /dev/sdj1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdk1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f96b - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 161 4 active sync /dev/sdk1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
/dev/sdl1:
Magic : a92b4efc
Version : 0.90.00
UUID : 15401f4b:391c2538:89022bfa:d48f439f
Creation Time : Sun Nov 2 13:21:54 2008
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
Raid Devices : 7
Total Devices : 8
Preferred Minor : 0
Update Time : Tue Jun 2 09:11:49 2009
State : clean
Active Devices : 5
Working Devices : 7
Failed Devices : 1
Spare Devices : 2
Checksum : 22d3f975 - correct
Events : 2599992
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 177 1 active sync /dev/sdl1
0 0 0 0 0 removed
1 1 8 177 1 active sync /dev/sdl1
2 2 8 113 2 active sync /dev/sdh1
3 3 8 145 3 active sync /dev/sdj1
4 4 8 161 4 active sync /dev/sdk1
5 5 8 97 5 active sync /dev/sdg1
6 6 0 0 6 faulty removed
7 7 8 129 7 spare /dev/sdi1
8 8 8 17 8 spare /dev/sdb1
the old RAID configuration was:
disc 0: sdi1 <- is now disc 7 and SPARE
disc 1: sdl1
disc 2: sdh1
disc 3: sdj1
disc 4: sdk1
disc 5: sdg1
disc 6: sda1 <- is now faulty removed
[root@localhost log]# mdadm --assemble --force /dev/md0 /dev/
sd[ilhjkgab]1
mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to
start the array.
[root@localhost log]# cat /proc/mdstat
Personalities :
md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
sdk1[4](S) sdj1[3](S) sdh1[2](S)
8790840960 blocks
On large arrays this may happen a lot: A bad drive is first discovered
during maintenance operations when it's too late. Maybe an option to
add a redundant drive in a fail-save way would be a good idea to add
to md sevices.
Please tell me if you see any solution to the problems below.
1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID
as is was before the restore attempt?
2. Is it possible to reassign /dev/sda1 as disc 6 and backup the still
readable data on the RAID?
3. I guess more then 90% of data was written to /dev/sdb1 in the
restore attempt. Is it possble to use /dev/sdb1 as disc 7 to access
the RAID?
Thank you for looking at the problem
Alexander
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
2009-06-02 10:18 ` Sujit Karataparambil
2009-06-02 10:45 ` Alexander Rietsch
@ 2009-06-02 10:52 ` Sujit Karataparambil
2009-06-02 10:55 ` Sujit Karataparambil
1 sibling, 1 reply; 12+ messages in thread
From: Sujit Karataparambil @ 2009-06-02 10:52 UTC (permalink / raw)
To: Alex R; +Cc: linux-raid
Kindly Read the document correctly and throughly.
raidhotadd /dev/mdX /dev/sdb
It says
Q. I have two disk-mirrored array, suppose if one of my disk in
mirrored RAID array fails, then I will replace that disk with new one
(I have hot swapping SCSI drives). Now question is how I rebuild a
RAID array after a disk fails.
A. A redundant array of inexpensive disks, (redundant array of
independent disks) is a system, which uses multiple hard drives to
share or replicate data among the drives. You can use both IDE and
SCSI disk for mirroring.
If you are not using hot swapping drives then you need to shutdown
server. Once hard disk has been replaced to system, you need to use
used raidhotadd to add disks from RAID-1, -4 and -5 arrays, while they
are active.
Assuming that new SCSI disk is /dev/sdb, type the following command:#
raidhotadd /dev/mdX /dev/sdb
On Tue, Jun 2, 2009 at 4:15 PM, Alexander Rietsch
<Alexander.Rietsch@hispeed.ch> wrote:
> Thank you for answering my mail. But to actually read it instead of posting
> a link which contains no more information as already in the RAID FAQ or
> mdadm man page, here is the short version of my problem:
>
>>> disc 0: sdi1 <- is now disc 7 and SPARE
>>> disc 1: sdl1
>>> disc 2: sdh1
>>> disc 3: sdj1
>>> disc 4: sdk1
>>> disc 5: sdg1
>>> disc 6: sda1 <- is now faulty removed
>
> sdb1 <- not finished replacement drive, now SPARE
>
> of the original 7 drives, 2 are disabled. Please tell me how to
> - re-add sdi1 as disc 0 (mdadm --re-add just adds it as spare)
> - how to enable sda1 as disc6 (mdadm --assemble --force --scan refuses to
> acceppt it)
> - how to use the new drive sdb1 as disc7 (mdadm --assemble --force --scan
> just adds it as spare)
>
> original post:
>
> After removing a drive and restoring to the new one, another disc in the
> array failed. Now I still have all the data redundantly available (the old
> drive is still there), but the RAID header is now in a state where it's
> impossible to access the data. Is it possible to rearrange the drives to
> force the kernel to a valid array?
>
> Here is the story:
>
> // my normal boot log showing RAID devices
>
> Jun 1 22:37:45 localhost klogd: md: md0 stopped.
> Jun 1 22:37:45 localhost klogd: md: bind<sdl1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdh1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdj1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdk1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdg1>
> Jun 1 22:37:45 localhost klogd: md: bind<sda1>
> Jun 1 22:37:45 localhost klogd: md: bind<sdi1>
> Jun 1 22:37:45 localhost klogd: xor: automatically using best checksumming
> function: generic_sse
> Jun 1 22:37:45 localhost klogd: generic_sse: 5144.000 MB/sec
> Jun 1 22:37:45 localhost klogd: xor: using function: generic_sse (5144.000
> MB/sec)
> Jun 1 22:37:45 localhost klogd: async_tx: api initialized (async)
> Jun 1 22:37:45 localhost klogd: raid6: int64x1 1539 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: int64x2 1558 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: int64x4 1968 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: int64x8 1554 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: sse2x1 2441 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: sse2x2 3250 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: sse2x4 3460 MB/s
> Jun 1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460 MB/s)
> Jun 1 22:37:45 localhost klogd: md: raid6 personality registered for level
> 6
> Jun 1 22:37:45 localhost klogd: md: raid5 personality registered for level
> 5
> Jun 1 22:37:45 localhost klogd: md: raid4 personality registered for level
> 4
> Jun 1 22:37:45 localhost klogd: raid5: device sdi1 operational as raid disk
> 0
> Jun 1 22:37:45 localhost klogd: raid5: device sda1 operational as raid disk
> 6
> Jun 1 22:37:45 localhost klogd: raid5: device sdg1 operational as raid disk
> 5
> Jun 1 22:37:45 localhost klogd: raid5: device sdk1 operational as raid disk
> 4
> Jun 1 22:37:45 localhost klogd: raid5: device sdj1 operational as raid disk
> 3
> Jun 1 22:37:45 localhost klogd: raid5: device sdh1 operational as raid disk
> 2
> Jun 1 22:37:45 localhost klogd: raid5: device sdl1 operational as raid disk
> 1
> Jun 1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
> Jun 1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active with 7
> out of 7 devices, algorithm 2
> Jun 1 22:37:45 localhost klogd: RAID5 conf printout:
> Jun 1 22:37:45 localhost klogd: --- rd:7 wd:7
> Jun 1 22:37:45 localhost klogd: disk 0, o:1, dev:sdi1
> Jun 1 22:37:45 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 1 22:37:45 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 1 22:37:45 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 1 22:37:45 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 1 22:37:45 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 1 22:37:45 localhost klogd: disk 6, o:1, dev:sda1
> Jun 1 22:37:45 localhost klogd: md0: detected capacity change from 0 to
> 6001213046784
> Jun 1 22:37:45 localhost klogd: md0: unknown partition table
>
> // now a new spare drive is added
>
> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
>
> Jun 1 22:42:00 localhost klogd: md: bind<sdb1>
>
> // and here goes the drive replacement
>
> [root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1
>
> Jun 1 22:44:10 localhost klogd: raid5: Disk failure on sdi1, disabling
> device.
> Jun 1 22:44:10 localhost klogd: raid5: Operation continuing on 6 devices.
> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
> Jun 1 22:44:10 localhost klogd: disk 0, o:0, dev:sdi1
> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
> Jun 1 22:44:10 localhost klogd: disk 0, o:1, dev:sdb1
> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
> Jun 1 22:44:10 localhost klogd: md: recovery of RAID array md0
> Jun 1 22:44:10 localhost klogd: md: unbind<sdi1>
> Jun 1 22:44:10 localhost klogd: md: minimum _guaranteed_ speed: 1000
> KB/sec/disk.
> Jun 1 22:44:10 localhost klogd: md: using maximum available idle IO
> bandwidth (but not more than 200000 KB/sec) for recovery.
> Jun 1 22:44:10 localhost klogd: md: using 128k window, over a total of
> 976759936 blocks.
> Jun 1 22:44:10 localhost klogd: md: export_rdev(sdi1)
>
> [root@localhost ~]# more /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1]
> 5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
> [=====>...............] recovery = 27.5% (269352320/976759936)
> finish=276.2min speed=42686K/sec
>
> // surface error on RAID drive while recovery:
>
> Jun 2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff
> SErr 0x0 action 0x0
> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
> 60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
> Jun 2 03:59:49 localhost klogd: res
> 41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
> Jun 2 03:59:49 localhost klogd: ata1: EH complete
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
> hardware sectors: (1.50 TB/1.36 TiB)
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
> Jun 2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc
> SErr 0x0 action 0x0
> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
> 60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
> Jun 2 03:59:49 localhost klogd: res
> 41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
> Jun 2 03:59:49 localhost klogd: ata1: EH complete
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
> hardware sectors: (1.50 TB/1.36 TiB)
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
> ...
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269136 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269144 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269152 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269160 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269168 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269176 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269184 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269192 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269200 on sda1).
> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
> (sector 1807269208 on sda1).
> Jun 2 03:59:49 localhost klogd: ata1: EH complete
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
> hardware sectors: (1.50 TB/1.36 TiB)
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
> read cache: enabled, doesn't support DPO or FUA
> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
> Jun 2 03:59:49 localhost klogd: disk 0, o:1, dev:sdb1
> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 2 03:59:49 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 2 03:59:49 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 2 03:59:49 localhost klogd: disk 6, o:0, dev:sda1
> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently
> unreadable (pending) sectors
> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline
> uncorrectable sectors
>
> // md0 is now down. But hey, still got the old drive, so just add it again:
>
> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1
>
> Jun 2 09:11:49 localhost klogd: md: bind<sdi1>
>
> // it's just added as a SPARE! HELP!!! reboot always helps..
>
> [root@localhost ~]# reboot
> [root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
> /dev/sda1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 7
> Preferred Minor : 0
>
> Update Time : Mon Jun 1 22:44:10 2009
> State : clean
> Active Devices : 6
> Working Devices : 7
> Failed Devices : 0
> Spare Devices : 1
> Checksum : 22d364f3 - correct
> Events : 2599984
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 6 8 1 6 active sync /dev/sda1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 8 1 6 active sync /dev/sda1
> 7 7 8 17 7 spare /dev/sdb1
> /dev/sdb1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f8dd - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 8 8 17 8 spare /dev/sdb1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdg1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f92d - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 5 8 97 5 active sync /dev/sdg1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdh1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f937 - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 2 8 113 2 active sync /dev/sdh1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdi1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f94b - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 7 8 129 7 spare /dev/sdi1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdj1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f959 - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 3 8 145 3 active sync /dev/sdj1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdk1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f96b - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 4 8 161 4 active sync /dev/sdk1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
> /dev/sdl1:
> Magic : a92b4efc
> Version : 0.90.00
> UUID : 15401f4b:391c2538:89022bfa:d48f439f
> Creation Time : Sun Nov 2 13:21:54 2008
> Raid Level : raid5
> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
> Raid Devices : 7
> Total Devices : 8
> Preferred Minor : 0
>
> Update Time : Tue Jun 2 09:11:49 2009
> State : clean
> Active Devices : 5
> Working Devices : 7
> Failed Devices : 1
> Spare Devices : 2
> Checksum : 22d3f975 - correct
> Events : 2599992
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> Number Major Minor RaidDevice State
> this 1 8 177 1 active sync /dev/sdl1
>
> 0 0 0 0 0 removed
> 1 1 8 177 1 active sync /dev/sdl1
> 2 2 8 113 2 active sync /dev/sdh1
> 3 3 8 145 3 active sync /dev/sdj1
> 4 4 8 161 4 active sync /dev/sdk1
> 5 5 8 97 5 active sync /dev/sdg1
> 6 6 0 0 6 faulty removed
> 7 7 8 129 7 spare /dev/sdi1
> 8 8 8 17 8 spare /dev/sdb1
>
> the old RAID configuration was:
>
> disc 0: sdi1 <- is now disc 7 and SPARE
> disc 1: sdl1
> disc 2: sdh1
> disc 3: sdj1
> disc 4: sdk1
> disc 5: sdg1
> disc 6: sda1 <- is now faulty removed
>
> [root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
> mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start
> the array.
> [root@localhost log]# cat /proc/mdstat
> Personalities :
> md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
> sdk1[4](S) sdj1[3](S) sdh1[2](S)
> 8790840960 blocks
>
>
> On large arrays this may happen a lot: A bad drive is first discovered
> during maintenance operations when it's too late. Maybe an option to add a
> redundant drive in a fail-save way would be a good idea to add to md
> sevices.
>
> Please tell me if you see any solution to the problems below.
>
> 1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as is
> was before the restore attempt?
>
> 2. Is it possible to reassign /dev/sda1 as disc 6 and backup the still
> readable data on the RAID?
>
> 3. I guess more then 90% of data was written to /dev/sdb1 in the restore
> attempt. Is it possble to use /dev/sdb1 as disc 7 to access the RAID?
>
> Thank you for looking at the problem
> Alexander
>
>
--
-- Sujit K M
On Tue, Jun 2, 2009 at 3:48 PM, Sujit Karataparambil <sjt.kar@gmail.com> wrote:
> http://www.cyberciti.biz/faq/howto-rebuilding-a-raid-array-after-a-disk-fails/
>
>
> On Tue, Jun 2, 2009 at 3:39 PM, Alex R <Alexander.Rietsch@hispeed.ch> wrote:
>>
>> I have a serious RAID problem here. Please have a look at this. Any help
>> would be greatly appreciated!
>>
>> As always, most problems occur only during critical tasks like
>> enlarging/restoring. I tried to replace a drive in my 7disc 6T RAID5 array
>> as explained here:
>> http://michael-prokop.at/blog/2006/09/09/raid5-online-resizing-with-linux/
>>
>> After removing a drive and restoring to the new one, another disc in the
>> array failed. Now I still have all the data redundantly available (the old
>> drive is still there), but the RAID header is now in a state where it's
>> impossible to access the data. Is it possible to rearrange the drives to
>> force the kernel to a valid array?
>>
>> Here is the story:
>>
>> // my normal boot log showing RAID devices
>>
>> Jun 1 22:37:45 localhost klogd: md: md0 stopped.
>> Jun 1 22:37:45 localhost klogd: md: bind<sdl1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdh1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdj1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdk1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdg1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sda1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdi1>
>> Jun 1 22:37:45 localhost klogd: xor: automatically using best checksumming
>> function: generic_sse
>> Jun 1 22:37:45 localhost klogd: generic_sse: 5144.000 MB/sec
>> Jun 1 22:37:45 localhost klogd: xor: using function: generic_sse (5144.000
>> MB/sec)
>> Jun 1 22:37:45 localhost klogd: async_tx: api initialized (async)
>> Jun 1 22:37:45 localhost klogd: raid6: int64x1 1539 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: int64x2 1558 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: int64x4 1968 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: int64x8 1554 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: sse2x1 2441 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: sse2x2 3250 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: sse2x4 3460 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460 MB/s)
>> Jun 1 22:37:45 localhost klogd: md: raid6 personality registered for level
>> 6
>> Jun 1 22:37:45 localhost klogd: md: raid5 personality registered for level
>> 5
>> Jun 1 22:37:45 localhost klogd: md: raid4 personality registered for level
>> 4
>> Jun 1 22:37:45 localhost klogd: raid5: device sdi1 operational as raid disk
>> 0
>> Jun 1 22:37:45 localhost klogd: raid5: device sda1 operational as raid disk
>> 6
>> Jun 1 22:37:45 localhost klogd: raid5: device sdg1 operational as raid disk
>> 5
>> Jun 1 22:37:45 localhost klogd: raid5: device sdk1 operational as raid disk
>> 4
>> Jun 1 22:37:45 localhost klogd: raid5: device sdj1 operational as raid disk
>> 3
>> Jun 1 22:37:45 localhost klogd: raid5: device sdh1 operational as raid disk
>> 2
>> Jun 1 22:37:45 localhost klogd: raid5: device sdl1 operational as raid disk
>> 1
>> Jun 1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
>> Jun 1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active with 7
>> out of 7 devices, algorithm 2
>> Jun 1 22:37:45 localhost klogd: RAID5 conf printout:
>> Jun 1 22:37:45 localhost klogd: --- rd:7 wd:7
>> Jun 1 22:37:45 localhost klogd: disk 0, o:1, dev:sdi1
>> Jun 1 22:37:45 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 1 22:37:45 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 1 22:37:45 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 1 22:37:45 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 1 22:37:45 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 1 22:37:45 localhost klogd: disk 6, o:1, dev:sda1
>> Jun 1 22:37:45 localhost klogd: md0: detected capacity change from 0 to
>> 6001213046784
>> Jun 1 22:37:45 localhost klogd: md0: unknown partition table
>>
>> // now a new spare drive is added
>>
>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
>>
>> Jun 1 22:42:00 localhost klogd: md: bind<sdb1>
>>
>> // and here goes the drive replacement
>>
>> [root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1
>>
>> Jun 1 22:44:10 localhost klogd: raid5: Disk failure on sdi1, disabling
>> device.
>> Jun 1 22:44:10 localhost klogd: raid5: Operation continuing on 6 devices.
>> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
>> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
>> Jun 1 22:44:10 localhost klogd: disk 0, o:0, dev:sdi1
>> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
>> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
>> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
>> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
>> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
>> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
>> Jun 1 22:44:10 localhost klogd: disk 0, o:1, dev:sdb1
>> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
>> Jun 1 22:44:10 localhost klogd: md: recovery of RAID array md0
>> Jun 1 22:44:10 localhost klogd: md: unbind<sdi1>
>> Jun 1 22:44:10 localhost klogd: md: minimum _guaranteed_ speed: 1000
>> KB/sec/disk.
>> Jun 1 22:44:10 localhost klogd: md: using maximum available idle IO
>> bandwidth (but not more than 200000 KB/sec) for recovery.
>> Jun 1 22:44:10 localhost klogd: md: using 128k window, over a total of
>> 976759936 blocks.
>> Jun 1 22:44:10 localhost klogd: md: export_rdev(sdi1)
>>
>> [root@localhost ~]# more /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1]
>> 5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
>> [=====>...............] recovery = 27.5% (269352320/976759936)
>> finish=276.2min speed=42686K/sec
>>
>> // surface error on RAID drive while recovery:
>>
>> Jun 2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff
>> SErr 0x0 action 0x0
>> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
>> 60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
>> Jun 2 03:59:49 localhost klogd: res
>> 41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
>> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>> Jun 2 03:59:49 localhost klogd: ata1: EH complete
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>> hardware sectors: (1.50 TB/1.36 TiB)
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>> read cache: enabled, doesn't support DPO or FUA
>> Jun 2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc
>> SErr 0x0 action 0x0
>> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
>> 60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
>> Jun 2 03:59:49 localhost klogd: res
>> 41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
>> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>> Jun 2 03:59:49 localhost klogd: ata1: EH complete
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>> hardware sectors: (1.50 TB/1.36 TiB)
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>> read cache: enabled, doesn't support DPO or FUA
>> ...
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269136 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269144 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269152 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269160 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269168 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269176 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269184 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269192 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269200 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269208 on sda1).
>> Jun 2 03:59:49 localhost klogd: ata1: EH complete
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>> hardware sectors: (1.50 TB/1.36 TiB)
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>> read cache: enabled, doesn't support DPO or FUA
>> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
>> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
>> Jun 2 03:59:49 localhost klogd: disk 0, o:1, dev:sdb1
>> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 2 03:59:49 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 2 03:59:49 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 2 03:59:49 localhost klogd: disk 6, o:0, dev:sda1
>> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
>> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
>> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
>> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
>> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
>> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
>> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
>> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
>> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently
>> unreadable (pending) sectors
>> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline
>> uncorrectable sectors
>>
>> // md0 is now down. But hey, still got the old drive, so just add it again:
>>
>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1
>>
>> Jun 2 09:11:49 localhost klogd: md: bind<sdi1>
>>
>> // it's just added as a SPARE! HELP!!! reboot always helps..
>>
>> [root@localhost ~]# reboot
>> [root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
>> /dev/sda1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 7
>> Preferred Minor : 0
>>
>> Update Time : Mon Jun 1 22:44:10 2009
>> State : clean
>> Active Devices : 6
>> Working Devices : 7
>> Failed Devices : 0
>> Spare Devices : 1
>> Checksum : 22d364f3 - correct
>> Events : 2599984
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 6 8 1 6 active sync /dev/sda1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 8 1 6 active sync /dev/sda1
>> 7 7 8 17 7 spare /dev/sdb1
>> /dev/sdb1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f8dd - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 8 8 17 8 spare /dev/sdb1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdg1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f92d - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 5 8 97 5 active sync /dev/sdg1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdh1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f937 - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 2 8 113 2 active sync /dev/sdh1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdi1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f94b - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 7 8 129 7 spare /dev/sdi1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdj1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f959 - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 3 8 145 3 active sync /dev/sdj1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdk1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f96b - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 4 8 161 4 active sync /dev/sdk1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdl1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f975 - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 1 8 177 1 active sync /dev/sdl1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>>
>> the old RAID configuration was:
>>
>> disc 0: sdi1 <- is now disc 7 and SPARE
>> disc 1: sdl1
>> disc 2: sdh1
>> disc 3: sdj1
>> disc 4: sdk1
>> disc 5: sdg1
>> disc 6: sda1 <- is now faulty removed
>>
>> [root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
>> mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start
>> the array.
>> [root@localhost log]# cat /proc/mdstat
>> Personalities :
>> md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
>> sdk1[4](S) sdj1[3](S) sdh1[2](S)
>> 8790840960 blocks
>>
>>
>> On large arrays this may happen a lot: A bad drive is first discovered
>> during maintenance operations when it's too late. Maybe an option to add a
>> redundant drive in a fail-save way would be a good idea to add to md
>> sevices.
>>
>> Please tell me if you see any solution to the problems below.
>>
>> 1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as is
>> was before the restore attempt?
>>
>> 2. Is it possible to reassign /dev/sda1 as disc 6 and backup the still
>> readable data on the RAID?
>>
>> 3. I guess more then 90% of data was written to /dev/sdb1 in the restore
>> attempt. Is it possble to use /dev/sdb1 as disc 7 to access the RAID?
>>
>> Thank you for looking at the problem
>> Alexander
>> --
>> View this message in context: http://www.nabble.com/RAID-5-re-add-of-removed-drive--%28failed-drive-replacement%29-tp23828899p23828899.html
>> Sent from the linux-raid mailing list archive at Nabble.com.
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> -- Sujit K M
>
--
-- Sujit K M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
2009-06-02 10:52 ` Sujit Karataparambil
@ 2009-06-02 10:55 ` Sujit Karataparambil
0 siblings, 0 replies; 12+ messages in thread
From: Sujit Karataparambil @ 2009-06-02 10:55 UTC (permalink / raw)
To: Alex R; +Cc: linux-raid
http://www.tldp.org/HOWTO/Software-RAID-HOWTO-3.html
This is the RAID Documentation which I found very less suffiecient.
On Tue, Jun 2, 2009 at 4:22 PM, Sujit Karataparambil <sjt.kar@gmail.com> wrote:
> Kindly Read the document correctly and throughly.
>
> raidhotadd /dev/mdX /dev/sdb
>
> It says
>
> Q. I have two disk-mirrored array, suppose if one of my disk in
> mirrored RAID array fails, then I will replace that disk with new one
> (I have hot swapping SCSI drives). Now question is how I rebuild a
> RAID array after a disk fails.
>
> A. A redundant array of inexpensive disks, (redundant array of
> independent disks) is a system, which uses multiple hard drives to
> share or replicate data among the drives. You can use both IDE and
> SCSI disk for mirroring.
>
> If you are not using hot swapping drives then you need to shutdown
> server. Once hard disk has been replaced to system, you need to use
> used raidhotadd to add disks from RAID-1, -4 and -5 arrays, while they
> are active.
>
> Assuming that new SCSI disk is /dev/sdb, type the following command:#
> raidhotadd /dev/mdX /dev/sdb
>
>
> On Tue, Jun 2, 2009 at 4:15 PM, Alexander Rietsch
> <Alexander.Rietsch@hispeed.ch> wrote:
>> Thank you for answering my mail. But to actually read it instead of posting
>> a link which contains no more information as already in the RAID FAQ or
>> mdadm man page, here is the short version of my problem:
>>
>>>> disc 0: sdi1 <- is now disc 7 and SPARE
>>>> disc 1: sdl1
>>>> disc 2: sdh1
>>>> disc 3: sdj1
>>>> disc 4: sdk1
>>>> disc 5: sdg1
>>>> disc 6: sda1 <- is now faulty removed
>>
>> sdb1 <- not finished replacement drive, now SPARE
>>
>> of the original 7 drives, 2 are disabled. Please tell me how to
>> - re-add sdi1 as disc 0 (mdadm --re-add just adds it as spare)
>> - how to enable sda1 as disc6 (mdadm --assemble --force --scan refuses to
>> acceppt it)
>> - how to use the new drive sdb1 as disc7 (mdadm --assemble --force --scan
>> just adds it as spare)
>>
>> original post:
>>
>> After removing a drive and restoring to the new one, another disc in the
>> array failed. Now I still have all the data redundantly available (the old
>> drive is still there), but the RAID header is now in a state where it's
>> impossible to access the data. Is it possible to rearrange the drives to
>> force the kernel to a valid array?
>>
>> Here is the story:
>>
>> // my normal boot log showing RAID devices
>>
>> Jun 1 22:37:45 localhost klogd: md: md0 stopped.
>> Jun 1 22:37:45 localhost klogd: md: bind<sdl1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdh1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdj1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdk1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdg1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sda1>
>> Jun 1 22:37:45 localhost klogd: md: bind<sdi1>
>> Jun 1 22:37:45 localhost klogd: xor: automatically using best checksumming
>> function: generic_sse
>> Jun 1 22:37:45 localhost klogd: generic_sse: 5144.000 MB/sec
>> Jun 1 22:37:45 localhost klogd: xor: using function: generic_sse (5144.000
>> MB/sec)
>> Jun 1 22:37:45 localhost klogd: async_tx: api initialized (async)
>> Jun 1 22:37:45 localhost klogd: raid6: int64x1 1539 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: int64x2 1558 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: int64x4 1968 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: int64x8 1554 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: sse2x1 2441 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: sse2x2 3250 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: sse2x4 3460 MB/s
>> Jun 1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460 MB/s)
>> Jun 1 22:37:45 localhost klogd: md: raid6 personality registered for level
>> 6
>> Jun 1 22:37:45 localhost klogd: md: raid5 personality registered for level
>> 5
>> Jun 1 22:37:45 localhost klogd: md: raid4 personality registered for level
>> 4
>> Jun 1 22:37:45 localhost klogd: raid5: device sdi1 operational as raid disk
>> 0
>> Jun 1 22:37:45 localhost klogd: raid5: device sda1 operational as raid disk
>> 6
>> Jun 1 22:37:45 localhost klogd: raid5: device sdg1 operational as raid disk
>> 5
>> Jun 1 22:37:45 localhost klogd: raid5: device sdk1 operational as raid disk
>> 4
>> Jun 1 22:37:45 localhost klogd: raid5: device sdj1 operational as raid disk
>> 3
>> Jun 1 22:37:45 localhost klogd: raid5: device sdh1 operational as raid disk
>> 2
>> Jun 1 22:37:45 localhost klogd: raid5: device sdl1 operational as raid disk
>> 1
>> Jun 1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
>> Jun 1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active with 7
>> out of 7 devices, algorithm 2
>> Jun 1 22:37:45 localhost klogd: RAID5 conf printout:
>> Jun 1 22:37:45 localhost klogd: --- rd:7 wd:7
>> Jun 1 22:37:45 localhost klogd: disk 0, o:1, dev:sdi1
>> Jun 1 22:37:45 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 1 22:37:45 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 1 22:37:45 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 1 22:37:45 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 1 22:37:45 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 1 22:37:45 localhost klogd: disk 6, o:1, dev:sda1
>> Jun 1 22:37:45 localhost klogd: md0: detected capacity change from 0 to
>> 6001213046784
>> Jun 1 22:37:45 localhost klogd: md0: unknown partition table
>>
>> // now a new spare drive is added
>>
>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
>>
>> Jun 1 22:42:00 localhost klogd: md: bind<sdb1>
>>
>> // and here goes the drive replacement
>>
>> [root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1
>>
>> Jun 1 22:44:10 localhost klogd: raid5: Disk failure on sdi1, disabling
>> device.
>> Jun 1 22:44:10 localhost klogd: raid5: Operation continuing on 6 devices.
>> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
>> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
>> Jun 1 22:44:10 localhost klogd: disk 0, o:0, dev:sdi1
>> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
>> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
>> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
>> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
>> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
>> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
>> Jun 1 22:44:10 localhost klogd: disk 0, o:1, dev:sdb1
>> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
>> Jun 1 22:44:10 localhost klogd: md: recovery of RAID array md0
>> Jun 1 22:44:10 localhost klogd: md: unbind<sdi1>
>> Jun 1 22:44:10 localhost klogd: md: minimum _guaranteed_ speed: 1000
>> KB/sec/disk.
>> Jun 1 22:44:10 localhost klogd: md: using maximum available idle IO
>> bandwidth (but not more than 200000 KB/sec) for recovery.
>> Jun 1 22:44:10 localhost klogd: md: using 128k window, over a total of
>> 976759936 blocks.
>> Jun 1 22:44:10 localhost klogd: md: export_rdev(sdi1)
>>
>> [root@localhost ~]# more /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1]
>> 5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
>> [=====>...............] recovery = 27.5% (269352320/976759936)
>> finish=276.2min speed=42686K/sec
>>
>> // surface error on RAID drive while recovery:
>>
>> Jun 2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff
>> SErr 0x0 action 0x0
>> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
>> 60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
>> Jun 2 03:59:49 localhost klogd: res
>> 41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
>> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>> Jun 2 03:59:49 localhost klogd: ata1: EH complete
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>> hardware sectors: (1.50 TB/1.36 TiB)
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>> read cache: enabled, doesn't support DPO or FUA
>> Jun 2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc
>> SErr 0x0 action 0x0
>> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
>> 60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
>> Jun 2 03:59:49 localhost klogd: res
>> 41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
>> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>> Jun 2 03:59:49 localhost klogd: ata1: EH complete
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>> hardware sectors: (1.50 TB/1.36 TiB)
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>> read cache: enabled, doesn't support DPO or FUA
>> ...
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269136 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269144 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269152 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269160 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269168 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269176 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269184 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269192 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269200 on sda1).
>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>> (sector 1807269208 on sda1).
>> Jun 2 03:59:49 localhost klogd: ata1: EH complete
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>> hardware sectors: (1.50 TB/1.36 TiB)
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>> read cache: enabled, doesn't support DPO or FUA
>> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
>> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
>> Jun 2 03:59:49 localhost klogd: disk 0, o:1, dev:sdb1
>> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 2 03:59:49 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 2 03:59:49 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 2 03:59:49 localhost klogd: disk 6, o:0, dev:sda1
>> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
>> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
>> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
>> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
>> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
>> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
>> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
>> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
>> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
>> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
>> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
>> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
>> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
>> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently
>> unreadable (pending) sectors
>> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline
>> uncorrectable sectors
>>
>> // md0 is now down. But hey, still got the old drive, so just add it again:
>>
>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1
>>
>> Jun 2 09:11:49 localhost klogd: md: bind<sdi1>
>>
>> // it's just added as a SPARE! HELP!!! reboot always helps..
>>
>> [root@localhost ~]# reboot
>> [root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
>> /dev/sda1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 7
>> Preferred Minor : 0
>>
>> Update Time : Mon Jun 1 22:44:10 2009
>> State : clean
>> Active Devices : 6
>> Working Devices : 7
>> Failed Devices : 0
>> Spare Devices : 1
>> Checksum : 22d364f3 - correct
>> Events : 2599984
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 6 8 1 6 active sync /dev/sda1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 8 1 6 active sync /dev/sda1
>> 7 7 8 17 7 spare /dev/sdb1
>> /dev/sdb1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f8dd - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 8 8 17 8 spare /dev/sdb1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdg1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f92d - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 5 8 97 5 active sync /dev/sdg1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdh1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f937 - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 2 8 113 2 active sync /dev/sdh1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdi1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f94b - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 7 8 129 7 spare /dev/sdi1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdj1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f959 - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 3 8 145 3 active sync /dev/sdj1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdk1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f96b - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 4 8 161 4 active sync /dev/sdk1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>> /dev/sdl1:
>> Magic : a92b4efc
>> Version : 0.90.00
>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>> Creation Time : Sun Nov 2 13:21:54 2008
>> Raid Level : raid5
>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>> Raid Devices : 7
>> Total Devices : 8
>> Preferred Minor : 0
>>
>> Update Time : Tue Jun 2 09:11:49 2009
>> State : clean
>> Active Devices : 5
>> Working Devices : 7
>> Failed Devices : 1
>> Spare Devices : 2
>> Checksum : 22d3f975 - correct
>> Events : 2599992
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> Number Major Minor RaidDevice State
>> this 1 8 177 1 active sync /dev/sdl1
>>
>> 0 0 0 0 0 removed
>> 1 1 8 177 1 active sync /dev/sdl1
>> 2 2 8 113 2 active sync /dev/sdh1
>> 3 3 8 145 3 active sync /dev/sdj1
>> 4 4 8 161 4 active sync /dev/sdk1
>> 5 5 8 97 5 active sync /dev/sdg1
>> 6 6 0 0 6 faulty removed
>> 7 7 8 129 7 spare /dev/sdi1
>> 8 8 8 17 8 spare /dev/sdb1
>>
>> the old RAID configuration was:
>>
>> disc 0: sdi1 <- is now disc 7 and SPARE
>> disc 1: sdl1
>> disc 2: sdh1
>> disc 3: sdj1
>> disc 4: sdk1
>> disc 5: sdg1
>> disc 6: sda1 <- is now faulty removed
>>
>> [root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
>> mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start
>> the array.
>> [root@localhost log]# cat /proc/mdstat
>> Personalities :
>> md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
>> sdk1[4](S) sdj1[3](S) sdh1[2](S)
>> 8790840960 blocks
>>
>>
>> On large arrays this may happen a lot: A bad drive is first discovered
>> during maintenance operations when it's too late. Maybe an option to add a
>> redundant drive in a fail-save way would be a good idea to add to md
>> sevices.
>>
>> Please tell me if you see any solution to the problems below.
>>
>> 1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as is
>> was before the restore attempt?
>>
>> 2. Is it possible to reassign /dev/sda1 as disc 6 and backup the still
>> readable data on the RAID?
>>
>> 3. I guess more then 90% of data was written to /dev/sdb1 in the restore
>> attempt. Is it possble to use /dev/sdb1 as disc 7 to access the RAID?
>>
>> Thank you for looking at the problem
>> Alexander
>>
>>
>
>
>
> --
> -- Sujit K M
>
>
>
> On Tue, Jun 2, 2009 at 3:48 PM, Sujit Karataparambil <sjt.kar@gmail.com> wrote:
>> http://www.cyberciti.biz/faq/howto-rebuilding-a-raid-array-after-a-disk-fails/
>>
>>
>> On Tue, Jun 2, 2009 at 3:39 PM, Alex R <Alexander.Rietsch@hispeed.ch> wrote:
>>>
>>> I have a serious RAID problem here. Please have a look at this. Any help
>>> would be greatly appreciated!
>>>
>>> As always, most problems occur only during critical tasks like
>>> enlarging/restoring. I tried to replace a drive in my 7disc 6T RAID5 array
>>> as explained here:
>>> http://michael-prokop.at/blog/2006/09/09/raid5-online-resizing-with-linux/
>>>
>>> After removing a drive and restoring to the new one, another disc in the
>>> array failed. Now I still have all the data redundantly available (the old
>>> drive is still there), but the RAID header is now in a state where it's
>>> impossible to access the data. Is it possible to rearrange the drives to
>>> force the kernel to a valid array?
>>>
>>> Here is the story:
>>>
>>> // my normal boot log showing RAID devices
>>>
>>> Jun 1 22:37:45 localhost klogd: md: md0 stopped.
>>> Jun 1 22:37:45 localhost klogd: md: bind<sdl1>
>>> Jun 1 22:37:45 localhost klogd: md: bind<sdh1>
>>> Jun 1 22:37:45 localhost klogd: md: bind<sdj1>
>>> Jun 1 22:37:45 localhost klogd: md: bind<sdk1>
>>> Jun 1 22:37:45 localhost klogd: md: bind<sdg1>
>>> Jun 1 22:37:45 localhost klogd: md: bind<sda1>
>>> Jun 1 22:37:45 localhost klogd: md: bind<sdi1>
>>> Jun 1 22:37:45 localhost klogd: xor: automatically using best checksumming
>>> function: generic_sse
>>> Jun 1 22:37:45 localhost klogd: generic_sse: 5144.000 MB/sec
>>> Jun 1 22:37:45 localhost klogd: xor: using function: generic_sse (5144.000
>>> MB/sec)
>>> Jun 1 22:37:45 localhost klogd: async_tx: api initialized (async)
>>> Jun 1 22:37:45 localhost klogd: raid6: int64x1 1539 MB/s
>>> Jun 1 22:37:45 localhost klogd: raid6: int64x2 1558 MB/s
>>> Jun 1 22:37:45 localhost klogd: raid6: int64x4 1968 MB/s
>>> Jun 1 22:37:45 localhost klogd: raid6: int64x8 1554 MB/s
>>> Jun 1 22:37:45 localhost klogd: raid6: sse2x1 2441 MB/s
>>> Jun 1 22:37:45 localhost klogd: raid6: sse2x2 3250 MB/s
>>> Jun 1 22:37:45 localhost klogd: raid6: sse2x4 3460 MB/s
>>> Jun 1 22:37:45 localhost klogd: raid6: using algorithm sse2x4 (3460 MB/s)
>>> Jun 1 22:37:45 localhost klogd: md: raid6 personality registered for level
>>> 6
>>> Jun 1 22:37:45 localhost klogd: md: raid5 personality registered for level
>>> 5
>>> Jun 1 22:37:45 localhost klogd: md: raid4 personality registered for level
>>> 4
>>> Jun 1 22:37:45 localhost klogd: raid5: device sdi1 operational as raid disk
>>> 0
>>> Jun 1 22:37:45 localhost klogd: raid5: device sda1 operational as raid disk
>>> 6
>>> Jun 1 22:37:45 localhost klogd: raid5: device sdg1 operational as raid disk
>>> 5
>>> Jun 1 22:37:45 localhost klogd: raid5: device sdk1 operational as raid disk
>>> 4
>>> Jun 1 22:37:45 localhost klogd: raid5: device sdj1 operational as raid disk
>>> 3
>>> Jun 1 22:37:45 localhost klogd: raid5: device sdh1 operational as raid disk
>>> 2
>>> Jun 1 22:37:45 localhost klogd: raid5: device sdl1 operational as raid disk
>>> 1
>>> Jun 1 22:37:45 localhost klogd: raid5: allocated 7434kB for md0
>>> Jun 1 22:37:45 localhost klogd: raid5: raid level 5 set md0 active with 7
>>> out of 7 devices, algorithm 2
>>> Jun 1 22:37:45 localhost klogd: RAID5 conf printout:
>>> Jun 1 22:37:45 localhost klogd: --- rd:7 wd:7
>>> Jun 1 22:37:45 localhost klogd: disk 0, o:1, dev:sdi1
>>> Jun 1 22:37:45 localhost klogd: disk 1, o:1, dev:sdl1
>>> Jun 1 22:37:45 localhost klogd: disk 2, o:1, dev:sdh1
>>> Jun 1 22:37:45 localhost klogd: disk 3, o:1, dev:sdj1
>>> Jun 1 22:37:45 localhost klogd: disk 4, o:1, dev:sdk1
>>> Jun 1 22:37:45 localhost klogd: disk 5, o:1, dev:sdg1
>>> Jun 1 22:37:45 localhost klogd: disk 6, o:1, dev:sda1
>>> Jun 1 22:37:45 localhost klogd: md0: detected capacity change from 0 to
>>> 6001213046784
>>> Jun 1 22:37:45 localhost klogd: md0: unknown partition table
>>>
>>> // now a new spare drive is added
>>>
>>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
>>>
>>> Jun 1 22:42:00 localhost klogd: md: bind<sdb1>
>>>
>>> // and here goes the drive replacement
>>>
>>> [root@localhost ~]# mdadm /dev/md0 --fail /dev/sdi1 --remove /dev/sdi1
>>>
>>> Jun 1 22:44:10 localhost klogd: raid5: Disk failure on sdi1, disabling
>>> device.
>>> Jun 1 22:44:10 localhost klogd: raid5: Operation continuing on 6 devices.
>>> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
>>> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
>>> Jun 1 22:44:10 localhost klogd: disk 0, o:0, dev:sdi1
>>> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
>>> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
>>> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
>>> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
>>> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
>>> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
>>> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
>>> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
>>> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
>>> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
>>> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
>>> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
>>> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
>>> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
>>> Jun 1 22:44:10 localhost klogd: RAID5 conf printout:
>>> Jun 1 22:44:10 localhost klogd: --- rd:7 wd:6
>>> Jun 1 22:44:10 localhost klogd: disk 0, o:1, dev:sdb1
>>> Jun 1 22:44:10 localhost klogd: disk 1, o:1, dev:sdl1
>>> Jun 1 22:44:10 localhost klogd: disk 2, o:1, dev:sdh1
>>> Jun 1 22:44:10 localhost klogd: disk 3, o:1, dev:sdj1
>>> Jun 1 22:44:10 localhost klogd: disk 4, o:1, dev:sdk1
>>> Jun 1 22:44:10 localhost klogd: disk 5, o:1, dev:sdg1
>>> Jun 1 22:44:10 localhost klogd: disk 6, o:1, dev:sda1
>>> Jun 1 22:44:10 localhost klogd: md: recovery of RAID array md0
>>> Jun 1 22:44:10 localhost klogd: md: unbind<sdi1>
>>> Jun 1 22:44:10 localhost klogd: md: minimum _guaranteed_ speed: 1000
>>> KB/sec/disk.
>>> Jun 1 22:44:10 localhost klogd: md: using maximum available idle IO
>>> bandwidth (but not more than 200000 KB/sec) for recovery.
>>> Jun 1 22:44:10 localhost klogd: md: using 128k window, over a total of
>>> 976759936 blocks.
>>> Jun 1 22:44:10 localhost klogd: md: export_rdev(sdi1)
>>>
>>> [root@localhost ~]# more /proc/mdstat
>>> Personalities : [raid6] [raid5] [raid4]
>>> md0 : active raid5 sdb1[7] sda1[6] sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1]
>>> 5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [_UUUUUU]
>>> [=====>...............] recovery = 27.5% (269352320/976759936)
>>> finish=276.2min speed=42686K/sec
>>>
>>> // surface error on RAID drive while recovery:
>>>
>>> Jun 2 03:58:59 localhost klogd: ata1.00: exception Emask 0x0 SAct 0xffff
>>> SErr 0x0 action 0x0
>>> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>>> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
>>> 60/08:58:3f:bd:b8/00:00:6b:00:00/40 tag 11 ncq 4096 in
>>> Jun 2 03:59:49 localhost klogd: res
>>> 41/40:08:3f:bd:b8/8c:00:6b:00:00/00 Emask 0x409 (media error) <F>
>>> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>>> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>>> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>>> Jun 2 03:59:49 localhost klogd: ata1: EH complete
>>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>>> hardware sectors: (1.50 TB/1.36 TiB)
>>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>>> read cache: enabled, doesn't support DPO or FUA
>>> Jun 2 03:59:49 localhost klogd: ata1.00: exception Emask 0x0 SAct 0x3ffc
>>> SErr 0x0 action 0x0
>>> Jun 2 03:59:49 localhost klogd: ata1.00: irq_stat 0x40000008
>>> Jun 2 03:59:49 localhost klogd: ata1.00: cmd
>>> 60/08:20:3f:bd:b8/00:00:6b:00:00/40 tag 4 ncq 4096 in
>>> Jun 2 03:59:49 localhost klogd: res
>>> 41/40:08:3f:bd:b8/28:00:6b:00:00/00 Emask 0x409 (media error) <F>
>>> Jun 2 03:59:49 localhost klogd: ata1.00: status: { DRDY ERR }
>>> Jun 2 03:59:49 localhost klogd: ata1.00: error: { UNC }
>>> Jun 2 03:59:49 localhost klogd: ata1.00: configured for UDMA/133
>>> Jun 2 03:59:49 localhost klogd: ata1: EH complete
>>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>>> hardware sectors: (1.50 TB/1.36 TiB)
>>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>>> read cache: enabled, doesn't support DPO or FUA
>>> ...
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269136 on sda1).
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269144 on sda1).
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269152 on sda1).
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269160 on sda1).
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269168 on sda1).
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269176 on sda1).
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269184 on sda1).
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269192 on sda1).
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269200 on sda1).
>>> Jun 2 03:59:49 localhost klogd: raid5:md0: read error not correctable
>>> (sector 1807269208 on sda1).
>>> Jun 2 03:59:49 localhost klogd: ata1: EH complete
>>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] 2930277168 512-byte
>>> hardware sectors: (1.50 TB/1.36 TiB)
>>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write Protect is off
>>> Jun 2 03:59:49 localhost klogd: sd 0:0:0:0: [sda] Write cache: enabled,
>>> read cache: enabled, doesn't support DPO or FUA
>>> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
>>> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
>>> Jun 2 03:59:49 localhost klogd: disk 0, o:1, dev:sdb1
>>> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
>>> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
>>> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
>>> Jun 2 03:59:49 localhost klogd: disk 4, o:1, dev:sdk1
>>> Jun 2 03:59:49 localhost klogd: disk 5, o:1, dev:sdg1
>>> Jun 2 03:59:49 localhost klogd: disk 6, o:0, dev:sda1
>>> Jun 2 03:59:49 localhost klogd: RAID5 conf printout:
>>> Jun 2 03:59:49 localhost klogd: --- rd:7 wd:5
>>> Jun 2 03:59:49 localhost klogd: disk 1, o:1, dev:sdl1
>>> Jun 2 03:59:49 localhost klogd: disk 2, o:1, dev:sdh1
>>> Jun 2 03:59:49 localhost klogd: disk 3, o:1, dev:sdj1
>>> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
>>> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
>>> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
>>> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
>>> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
>>> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
>>> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
>>> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
>>> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
>>> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
>>> Jun 2 03:59:50 localhost klogd: disk 6, o:0, dev:sda1
>>> Jun 2 03:59:50 localhost klogd: RAID5 conf printout:
>>> Jun 2 03:59:50 localhost klogd: --- rd:7 wd:5
>>> Jun 2 03:59:50 localhost klogd: disk 1, o:1, dev:sdl1
>>> Jun 2 03:59:50 localhost klogd: disk 2, o:1, dev:sdh1
>>> Jun 2 03:59:50 localhost klogd: disk 3, o:1, dev:sdj1
>>> Jun 2 03:59:50 localhost klogd: disk 4, o:1, dev:sdk1
>>> Jun 2 03:59:50 localhost klogd: disk 5, o:1, dev:sdg1
>>> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Currently
>>> unreadable (pending) sectors
>>> Jun 2 04:26:17 localhost smartd[2502]: Device: /dev/sda, 34 Offline
>>> uncorrectable sectors
>>>
>>> // md0 is now down. But hey, still got the old drive, so just add it again:
>>>
>>> [root@localhost ~]# mdadm /dev/md0 --add /dev/sdi1
>>>
>>> Jun 2 09:11:49 localhost klogd: md: bind<sdi1>
>>>
>>> // it's just added as a SPARE! HELP!!! reboot always helps..
>>>
>>> [root@localhost ~]# reboot
>>> [root@localhost log]# mdadm -E /dev/sd[bagkjhli]1
>>> /dev/sda1:
>>> Magic : a92b4efc
>>> Version : 0.90.00
>>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>> Creation Time : Sun Nov 2 13:21:54 2008
>>> Raid Level : raid5
>>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>> Raid Devices : 7
>>> Total Devices : 7
>>> Preferred Minor : 0
>>>
>>> Update Time : Mon Jun 1 22:44:10 2009
>>> State : clean
>>> Active Devices : 6
>>> Working Devices : 7
>>> Failed Devices : 0
>>> Spare Devices : 1
>>> Checksum : 22d364f3 - correct
>>> Events : 2599984
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> Number Major Minor RaidDevice State
>>> this 6 8 1 6 active sync /dev/sda1
>>>
>>> 0 0 0 0 0 removed
>>> 1 1 8 177 1 active sync /dev/sdl1
>>> 2 2 8 113 2 active sync /dev/sdh1
>>> 3 3 8 145 3 active sync /dev/sdj1
>>> 4 4 8 161 4 active sync /dev/sdk1
>>> 5 5 8 97 5 active sync /dev/sdg1
>>> 6 6 8 1 6 active sync /dev/sda1
>>> 7 7 8 17 7 spare /dev/sdb1
>>> /dev/sdb1:
>>> Magic : a92b4efc
>>> Version : 0.90.00
>>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>> Creation Time : Sun Nov 2 13:21:54 2008
>>> Raid Level : raid5
>>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>> Raid Devices : 7
>>> Total Devices : 8
>>> Preferred Minor : 0
>>>
>>> Update Time : Tue Jun 2 09:11:49 2009
>>> State : clean
>>> Active Devices : 5
>>> Working Devices : 7
>>> Failed Devices : 1
>>> Spare Devices : 2
>>> Checksum : 22d3f8dd - correct
>>> Events : 2599992
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> Number Major Minor RaidDevice State
>>> this 8 8 17 8 spare /dev/sdb1
>>>
>>> 0 0 0 0 0 removed
>>> 1 1 8 177 1 active sync /dev/sdl1
>>> 2 2 8 113 2 active sync /dev/sdh1
>>> 3 3 8 145 3 active sync /dev/sdj1
>>> 4 4 8 161 4 active sync /dev/sdk1
>>> 5 5 8 97 5 active sync /dev/sdg1
>>> 6 6 0 0 6 faulty removed
>>> 7 7 8 129 7 spare /dev/sdi1
>>> 8 8 8 17 8 spare /dev/sdb1
>>> /dev/sdg1:
>>> Magic : a92b4efc
>>> Version : 0.90.00
>>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>> Creation Time : Sun Nov 2 13:21:54 2008
>>> Raid Level : raid5
>>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>> Raid Devices : 7
>>> Total Devices : 8
>>> Preferred Minor : 0
>>>
>>> Update Time : Tue Jun 2 09:11:49 2009
>>> State : clean
>>> Active Devices : 5
>>> Working Devices : 7
>>> Failed Devices : 1
>>> Spare Devices : 2
>>> Checksum : 22d3f92d - correct
>>> Events : 2599992
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> Number Major Minor RaidDevice State
>>> this 5 8 97 5 active sync /dev/sdg1
>>>
>>> 0 0 0 0 0 removed
>>> 1 1 8 177 1 active sync /dev/sdl1
>>> 2 2 8 113 2 active sync /dev/sdh1
>>> 3 3 8 145 3 active sync /dev/sdj1
>>> 4 4 8 161 4 active sync /dev/sdk1
>>> 5 5 8 97 5 active sync /dev/sdg1
>>> 6 6 0 0 6 faulty removed
>>> 7 7 8 129 7 spare /dev/sdi1
>>> 8 8 8 17 8 spare /dev/sdb1
>>> /dev/sdh1:
>>> Magic : a92b4efc
>>> Version : 0.90.00
>>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>> Creation Time : Sun Nov 2 13:21:54 2008
>>> Raid Level : raid5
>>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>> Raid Devices : 7
>>> Total Devices : 8
>>> Preferred Minor : 0
>>>
>>> Update Time : Tue Jun 2 09:11:49 2009
>>> State : clean
>>> Active Devices : 5
>>> Working Devices : 7
>>> Failed Devices : 1
>>> Spare Devices : 2
>>> Checksum : 22d3f937 - correct
>>> Events : 2599992
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> Number Major Minor RaidDevice State
>>> this 2 8 113 2 active sync /dev/sdh1
>>>
>>> 0 0 0 0 0 removed
>>> 1 1 8 177 1 active sync /dev/sdl1
>>> 2 2 8 113 2 active sync /dev/sdh1
>>> 3 3 8 145 3 active sync /dev/sdj1
>>> 4 4 8 161 4 active sync /dev/sdk1
>>> 5 5 8 97 5 active sync /dev/sdg1
>>> 6 6 0 0 6 faulty removed
>>> 7 7 8 129 7 spare /dev/sdi1
>>> 8 8 8 17 8 spare /dev/sdb1
>>> /dev/sdi1:
>>> Magic : a92b4efc
>>> Version : 0.90.00
>>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>> Creation Time : Sun Nov 2 13:21:54 2008
>>> Raid Level : raid5
>>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>> Raid Devices : 7
>>> Total Devices : 8
>>> Preferred Minor : 0
>>>
>>> Update Time : Tue Jun 2 09:11:49 2009
>>> State : clean
>>> Active Devices : 5
>>> Working Devices : 7
>>> Failed Devices : 1
>>> Spare Devices : 2
>>> Checksum : 22d3f94b - correct
>>> Events : 2599992
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> Number Major Minor RaidDevice State
>>> this 7 8 129 7 spare /dev/sdi1
>>>
>>> 0 0 0 0 0 removed
>>> 1 1 8 177 1 active sync /dev/sdl1
>>> 2 2 8 113 2 active sync /dev/sdh1
>>> 3 3 8 145 3 active sync /dev/sdj1
>>> 4 4 8 161 4 active sync /dev/sdk1
>>> 5 5 8 97 5 active sync /dev/sdg1
>>> 6 6 0 0 6 faulty removed
>>> 7 7 8 129 7 spare /dev/sdi1
>>> 8 8 8 17 8 spare /dev/sdb1
>>> /dev/sdj1:
>>> Magic : a92b4efc
>>> Version : 0.90.00
>>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>> Creation Time : Sun Nov 2 13:21:54 2008
>>> Raid Level : raid5
>>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>> Raid Devices : 7
>>> Total Devices : 8
>>> Preferred Minor : 0
>>>
>>> Update Time : Tue Jun 2 09:11:49 2009
>>> State : clean
>>> Active Devices : 5
>>> Working Devices : 7
>>> Failed Devices : 1
>>> Spare Devices : 2
>>> Checksum : 22d3f959 - correct
>>> Events : 2599992
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> Number Major Minor RaidDevice State
>>> this 3 8 145 3 active sync /dev/sdj1
>>>
>>> 0 0 0 0 0 removed
>>> 1 1 8 177 1 active sync /dev/sdl1
>>> 2 2 8 113 2 active sync /dev/sdh1
>>> 3 3 8 145 3 active sync /dev/sdj1
>>> 4 4 8 161 4 active sync /dev/sdk1
>>> 5 5 8 97 5 active sync /dev/sdg1
>>> 6 6 0 0 6 faulty removed
>>> 7 7 8 129 7 spare /dev/sdi1
>>> 8 8 8 17 8 spare /dev/sdb1
>>> /dev/sdk1:
>>> Magic : a92b4efc
>>> Version : 0.90.00
>>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>> Creation Time : Sun Nov 2 13:21:54 2008
>>> Raid Level : raid5
>>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>> Raid Devices : 7
>>> Total Devices : 8
>>> Preferred Minor : 0
>>>
>>> Update Time : Tue Jun 2 09:11:49 2009
>>> State : clean
>>> Active Devices : 5
>>> Working Devices : 7
>>> Failed Devices : 1
>>> Spare Devices : 2
>>> Checksum : 22d3f96b - correct
>>> Events : 2599992
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> Number Major Minor RaidDevice State
>>> this 4 8 161 4 active sync /dev/sdk1
>>>
>>> 0 0 0 0 0 removed
>>> 1 1 8 177 1 active sync /dev/sdl1
>>> 2 2 8 113 2 active sync /dev/sdh1
>>> 3 3 8 145 3 active sync /dev/sdj1
>>> 4 4 8 161 4 active sync /dev/sdk1
>>> 5 5 8 97 5 active sync /dev/sdg1
>>> 6 6 0 0 6 faulty removed
>>> 7 7 8 129 7 spare /dev/sdi1
>>> 8 8 8 17 8 spare /dev/sdb1
>>> /dev/sdl1:
>>> Magic : a92b4efc
>>> Version : 0.90.00
>>> UUID : 15401f4b:391c2538:89022bfa:d48f439f
>>> Creation Time : Sun Nov 2 13:21:54 2008
>>> Raid Level : raid5
>>> Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>> Array Size : 5860559616 (5589.07 GiB 6001.21 GB)
>>> Raid Devices : 7
>>> Total Devices : 8
>>> Preferred Minor : 0
>>>
>>> Update Time : Tue Jun 2 09:11:49 2009
>>> State : clean
>>> Active Devices : 5
>>> Working Devices : 7
>>> Failed Devices : 1
>>> Spare Devices : 2
>>> Checksum : 22d3f975 - correct
>>> Events : 2599992
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> Number Major Minor RaidDevice State
>>> this 1 8 177 1 active sync /dev/sdl1
>>>
>>> 0 0 0 0 0 removed
>>> 1 1 8 177 1 active sync /dev/sdl1
>>> 2 2 8 113 2 active sync /dev/sdh1
>>> 3 3 8 145 3 active sync /dev/sdj1
>>> 4 4 8 161 4 active sync /dev/sdk1
>>> 5 5 8 97 5 active sync /dev/sdg1
>>> 6 6 0 0 6 faulty removed
>>> 7 7 8 129 7 spare /dev/sdi1
>>> 8 8 8 17 8 spare /dev/sdb1
>>>
>>> the old RAID configuration was:
>>>
>>> disc 0: sdi1 <- is now disc 7 and SPARE
>>> disc 1: sdl1
>>> disc 2: sdh1
>>> disc 3: sdj1
>>> disc 4: sdk1
>>> disc 5: sdg1
>>> disc 6: sda1 <- is now faulty removed
>>>
>>> [root@localhost log]# mdadm --assemble --force /dev/md0 /dev/sd[ilhjkgab]1
>>> mdadm: /dev/md/0 assembled from 5 drives and 2 spares - not enough to start
>>> the array.
>>> [root@localhost log]# cat /proc/mdstat
>>> Personalities :
>>> md0 : inactive sdl1[1](S) sdb1[8](S) sdi1[7](S) sda1[6](S) sdg1[5](S)
>>> sdk1[4](S) sdj1[3](S) sdh1[2](S)
>>> 8790840960 blocks
>>>
>>>
>>> On large arrays this may happen a lot: A bad drive is first discovered
>>> during maintenance operations when it's too late. Maybe an option to add a
>>> redundant drive in a fail-save way would be a good idea to add to md
>>> sevices.
>>>
>>> Please tell me if you see any solution to the problems below.
>>>
>>> 1. Is it possible to reassign /dev/sdi1 as disc 0 and access the RAID as is
>>> was before the restore attempt?
>>>
>>> 2. Is it possible to reassign /dev/sda1 as disc 6 and backup the still
>>> readable data on the RAID?
>>>
>>> 3. I guess more then 90% of data was written to /dev/sdb1 in the restore
>>> attempt. Is it possble to use /dev/sdb1 as disc 7 to access the RAID?
>>>
>>> Thank you for looking at the problem
>>> Alexander
>>> --
>>> View this message in context: http://www.nabble.com/RAID-5-re-add-of-removed-drive--%28failed-drive-replacement%29-tp23828899p23828899.html
>>> Sent from the linux-raid mailing list archive at Nabble.com.
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>>
>> --
>> -- Sujit K M
>>
>
>
>
> --
> -- Sujit K M
>
--
-- Sujit K M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
2009-06-02 10:09 RAID 5 re-add of removed drive? (failed drive replacement) Alex R
2009-06-02 10:18 ` Sujit Karataparambil
@ 2009-06-02 11:17 ` Robin Hill
2009-06-02 12:00 ` Alexander Rietsch
1 sibling, 1 reply; 12+ messages in thread
From: Robin Hill @ 2009-06-02 11:17 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]
On Tue Jun 02, 2009 at 03:09:11AM -0700, Alex R wrote:
>
> I have a serious RAID problem here. Please have a look at this. Any help
> would be greatly appreciated!
>
> As always, most problems occur only during critical tasks like
> enlarging/restoring. I tried to replace a drive in my 7disc 6T RAID5 array
> as explained here:
> http://michael-prokop.at/blog/2006/09/09/raid5-online-resizing-with-linux/
>
> After removing a drive and restoring to the new one, another disc in the
> array failed. Now I still have all the data redundantly available (the old
> drive is still there), but the RAID header is now in a state where it's
> impossible to access the data. Is it possible to rearrange the drives to
> force the kernel to a valid array?
>
<-- SNIP details -->
AFAIK, the only solution at this stage is to recreate the array.
You need to use the "--assume-clean" flag (or replace one of the drives
with "missing"), along with _exactly_ the same parameters & drive order
as when you originally created the array (you should be able to get most
of this from mdadm -D). This will rewrite the RAID metadata, but leave
the filesystem untouched.
HTH,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
@ 2009-06-02 12:00 ` Alexander Rietsch
2009-06-02 13:10 ` Robin Hill
0 siblings, 1 reply; 12+ messages in thread
From: Alexander Rietsch @ 2009-06-02 12:00 UTC (permalink / raw)
To: linux-raid
>
> AFAIK, the only solution at this stage is to recreate the array.
>
> You need to use the "--assume-clean" flag (or replace one of the
> drives
> with "missing"), along with _exactly_ the same parameters & drive
> order
> as when you originally created the array (you should be able to get
> most
> of this from mdadm -D). This will rewrite the RAID metadata, but
> leave
> the filesystem untouched.
A glimpse of hope. Thank you! Didn't know about this --assume-clean
flag. So just to double-check:
The array to create would be:
disc 0: sdi1 <- is now disc 7 and SPARE due to failed replacement
operation
disc 1: sdl1
disc 2: sdh1
disc 3: sdj1
disc 4: sdk1
disc 5: sdg1
disc 6: sda1 <- is now faulty removed
So I just create an incomplete array without sda1 in the same order
which would be:
mdadm --create /dev/md0 --assume-clean --level=5 --chunk=64 --raid-
devices=7 /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1
I'm not sure about the drive oder in the mdadm command: is it correct
to assume <drive 0> <drive 1> <drive 2> in order or is it mirrored
like <drive 2> <drive 1> <drive 0> ?
I also hope the command doesn't trigger any recovery actions or
filesystem changes..
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
2009-06-02 12:00 ` Alexander Rietsch
@ 2009-06-02 13:10 ` Robin Hill
2009-06-02 14:24 ` Alexander Rietsch
0 siblings, 1 reply; 12+ messages in thread
From: Robin Hill @ 2009-06-02 13:10 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 2380 bytes --]
On Tue Jun 02, 2009 at 02:00:15PM +0200, Alexander Rietsch wrote:
>>
>> AFAIK, the only solution at this stage is to recreate the array.
>>
>> You need to use the "--assume-clean" flag (or replace one of the drives
>> with "missing"), along with _exactly_ the same parameters & drive order
>> as when you originally created the array (you should be able to get most
>> of this from mdadm -D). This will rewrite the RAID metadata, but leave
>> the filesystem untouched.
>
> A glimpse of hope. Thank you! Didn't know about this --assume-clean flag.
> So just to double-check:
>
> The array to create would be:
> disc 0: sdi1 <- is now disc 7 and SPARE due to failed replacement operation
> disc 1: sdl1
> disc 2: sdh1
> disc 3: sdj1
> disc 4: sdk1
> disc 5: sdg1
> disc 6: sda1 <- is now faulty removed
>
> So I just create an incomplete array without sda1 in the same order which
> would be:
>
> mdadm --create /dev/md0 --assume-clean --level=5 --chunk=64
> --raid-devices=7 /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1
> /dev/sdg1
>
Almost - you'll also need to specify "missing" for disc 6 (and the
--assume-clean isn't actually needed in this case, as the array can't do
any reconstruction with a missing drive), so:
mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=7
/dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 missing
> I'm not sure about the drive oder in the mdadm command: is it correct to
> assume <drive 0> <drive 1> <drive 2> in order or is it mirrored like <drive
> 2> <drive 1> <drive 0> ?
> I also hope the command doesn't trigger any recovery actions or filesystem
> changes..
This should be safe, yes - the numbers are also given in the output from
"mdadm -D /dev/md0" or "mdadm -E /dev/sdl1". The array creation doesn't
trigger any changes at all to the filesystem (though mounting it might,
even in read-only mode) so is perfectly safe to do. You can also try
"fsck -n" on the filesystem before mounting to verify that the array
order is correct - this may fail on filesystems with unflushed journal
data though.
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
2009-06-02 13:10 ` Robin Hill
@ 2009-06-02 14:24 ` Alexander Rietsch
2009-06-08 9:19 ` David Greaves
0 siblings, 1 reply; 12+ messages in thread
From: Alexander Rietsch @ 2009-06-02 14:24 UTC (permalink / raw)
To: Robin Hill; +Cc: linux-raid
On 02.06.2009, at 15:10, Robin Hill wrote:
> Almost - you'll also need to specify "missing" for disc 6 (and the
> --assume-clean isn't actually needed in this case, as the array
> can't do
> any reconstruction with a missing drive), so:
>
> mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=7
> /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 missing
Yes, that's it! The RAID is alive! Mr. Robin Hill, you're a HERO!
With this trick, it's possible to recover a RAID which was confused by
a data error during disk-replacement. I'll note this somewhere.
Here's the log of the creation command for completeness:
[root@localhost ~]# mdadm --create /dev/md0 --assume-clean --level=5 --
chunk=64 --raid-devices=7 --spare-devices=0 /dev/sdi1 /dev/sdl1 /dev/
sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 missing
mdadm: /dev/sdi1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Sun Nov 2 13:21:54 2008
mdadm: /dev/sdl1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Sun Nov 2 13:21:54 2008
mdadm: /dev/sdh1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Sun Nov 2 13:21:54 2008
mdadm: /dev/sdj1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Sun Nov 2 13:21:54 2008
mdadm: /dev/sdk1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Sun Nov 2 13:21:54 2008
mdadm: /dev/sdg1 appears to be part of a raid array:
level=raid5 devices=7 ctime=Sun Nov 2 13:21:54 2008
mdadm: largest drive (/dev/sdg1) exceeds size (976759936K) by more
than 1%
Continue creating array? y
mdadm: array /dev/md/0 started.
Jun 2 15:34:47 localhost klogd: md: bind<sdi1>
Jun 2 15:34:47 localhost klogd: md: bind<sdl1>
Jun 2 15:34:47 localhost klogd: md: bind<sdh1>
Jun 2 15:34:47 localhost klogd: md: bind<sdj1>
Jun 2 15:34:47 localhost klogd: md: bind<sdk1>
Jun 2 15:34:47 localhost klogd: md: bind<sdg1>
Jun 2 15:34:47 localhost klogd: md: raid6 personality registered for
level 6
Jun 2 15:34:47 localhost klogd: md: raid5 personality registered for
level 5
Jun 2 15:34:47 localhost klogd: md: raid4 personality registered for
level 4
Jun 2 15:34:47 localhost klogd: raid5: device sdg1 operational as
raid disk 5
Jun 2 15:34:47 localhost klogd: raid5: device sdk1 operational as
raid disk 4
Jun 2 15:34:47 localhost klogd: raid5: device sdj1 operational as
raid disk 3
Jun 2 15:34:47 localhost klogd: raid5: device sdh1 operational as
raid disk 2
Jun 2 15:34:47 localhost klogd: raid5: device sdl1 operational as
raid disk 1
Jun 2 15:34:47 localhost klogd: raid5: device sdi1 operational as
raid disk 0
Jun 2 15:34:47 localhost klogd: raid5: allocated 7434kB for md0
Jun 2 15:34:47 localhost klogd: raid5: raid level 5 set md0 active
with 6 out of 7 devices, algorithm 2
Jun 2 15:34:47 localhost klogd: RAID5 conf printout:
Jun 2 15:34:47 localhost klogd: --- rd:7 wd:6
Jun 2 15:34:47 localhost klogd: disk 0, o:1, dev:sdi1
Jun 2 15:34:47 localhost klogd: disk 1, o:1, dev:sdl1
Jun 2 15:34:47 localhost klogd: disk 2, o:1, dev:sdh1
Jun 2 15:34:47 localhost klogd: disk 3, o:1, dev:sdj1
Jun 2 15:34:47 localhost klogd: disk 4, o:1, dev:sdk1
Jun 2 15:34:47 localhost klogd: disk 5, o:1, dev:sdg1
Jun 2 15:34:47 localhost klogd: md0: detected capacity change from 0
to 6001213046784
Jun 2 15:34:47 localhost klogd: md0: unknown partition table
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdg1[5] sdk1[4] sdj1[3] sdh1[2] sdl1[1] sdi1[0]
5860559616 blocks level 5, 64k chunk, algorithm 2 [7/6] [UUUUUU_]
unused devices: <none>
Again, thanks a lot for your help. Very appreciated.
Alexander
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
2009-06-02 14:24 ` Alexander Rietsch
@ 2009-06-08 9:19 ` David Greaves
0 siblings, 0 replies; 12+ messages in thread
From: David Greaves @ 2009-06-08 9:19 UTC (permalink / raw)
To: Alexander Rietsch; +Cc: Robin Hill, linux-raid, Sujit Karataparambil
Alexander Rietsch wrote:
> On 02.06.2009, at 15:10, Robin Hill wrote:
>
>> Almost - you'll also need to specify "missing" for disc 6 (and the
>> --assume-clean isn't actually needed in this case, as the array can't do
>> any reconstruction with a missing drive), so:
>>
>> mdadm --create /dev/md0 --level=5 --chunk=64 --raid-devices=7
>> /dev/sdi1 /dev/sdl1 /dev/sdh1 /dev/sdj1 /dev/sdk1 /dev/sdg1 missing
>
> Yes, that's it! The RAID is alive! Mr. Robin Hill, you're a HERO!
>
> With this trick, it's possible to recover a RAID which was confused by a
> data error during disk-replacement. I'll note this somewhere.
Maybe:
http://linux-raid.osdl.org/
:)
I've not had time to update it recently.
Sujit Karataparambil wrote:
> http://www.tldp.org/HOWTO/Software-RAID-HOWTO-3.html
>
> This is the RAID Documentation which I found very less suffiecient.
I spent some considerable time trying to get that resolved but sadly they were
of the opinion that it is better for tldp to provide misleading docs rather than
no docs or a link to better docs. I updated it and moved it to the link above.
Sujit Karataparambil wrote:
> Kindly Read the document correctly and throughly.
>
> raidhotadd /dev/mdX /dev/sdb
nb this is very very old unsupported software and it may not be wise to suggest it.
David
--
"Don't worry, you'll be fine; I saw it work in a cartoon once..."
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
2009-06-02 14:20 Jon Hardcastle
@ 2009-06-02 17:13 ` Alexander Rietsch
0 siblings, 0 replies; 12+ messages in thread
From: Alexander Rietsch @ 2009-06-02 17:13 UTC (permalink / raw)
To: Jon; +Cc: linux-raid
On 02.06.2009, at 16:20, Jon Hardcastle wrote:
>> As always, most problems occur only during critical tasks
>> like
>> enlarging/restoring.
>
> ....
>
> Is this a compelling case for regular
>
> echo check >> /sys/block/mdX/md/sync_action
>
> and/or
>
> echo repair >> /sys/block/mdX/md/sync_action
>
> ??
Yes, indeed! Another lesson learned.
This might have prevented numerous cases in the past when disc surface
errors occurred while enlarging and resizing a raid array. But all
those serious incidents where handled perfectly and flawlessly by the
software raid module. Congrats to all kernel developers here. The raid
module is far more flexible, error prone and stable than any hardware
raid solution I've seen so far.
Regards,
Alexander
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: RAID 5 re-add of removed drive? (failed drive replacement)
@ 2009-06-02 14:20 Jon Hardcastle
2009-06-02 17:13 ` Alexander Rietsch
0 siblings, 1 reply; 12+ messages in thread
From: Jon Hardcastle @ 2009-06-02 14:20 UTC (permalink / raw)
To: linux-raid, Alex R
--- On Tue, 2/6/09, Alex R <Alexander.Rietsch@hispeed.ch> wrote:
> From: Alex R <Alexander.Rietsch@hispeed.ch>
> Subject: RAID 5 re-add of removed drive? (failed drive replacement)
> To: linux-raid@vger.kernel.org
> Date: Tuesday, 2 June, 2009, 11:09 AM
>
> I have a serious RAID problem here. Please have a look at
> this. Any help
> would be greatly appreciated!
>
> As always, most problems occur only during critical tasks
> like
> enlarging/restoring.
....
> http://vger.kernel.org/majordomo-info.html
>
Is this a compelling case for regular
echo check >> /sys/block/mdX/md/sync_action
and/or
echo repair >> /sys/block/mdX/md/sync_action
??
I have cron job that does this weekly.
(meant more for people that find this thread in 300yrs time)
-----------------------
N: Jon Hardcastle
E: Jon@eHardcastle.com
'Do not worry about tomorrow, for tomorrow will bring worries of its own.'
Please sponsor me for the London to Brighton 2009.
Just Giving: http://www.justgiving.com/jonathanhardcastle
-----------------------
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2009-06-08 9:19 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-02 10:09 RAID 5 re-add of removed drive? (failed drive replacement) Alex R
2009-06-02 10:18 ` Sujit Karataparambil
2009-06-02 10:45 ` Alexander Rietsch
2009-06-02 10:52 ` Sujit Karataparambil
2009-06-02 10:55 ` Sujit Karataparambil
2009-06-02 11:17 ` Robin Hill
2009-06-02 12:00 ` Alexander Rietsch
2009-06-02 13:10 ` Robin Hill
2009-06-02 14:24 ` Alexander Rietsch
2009-06-08 9:19 ` David Greaves
2009-06-02 14:20 Jon Hardcastle
2009-06-02 17:13 ` Alexander Rietsch
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.