* Likely forced assembly with wrong disk during raid5 grow. Recoverable?
       [not found] <AANLkTikhOAXQ6JAG1fK3x9V3icki8cjn0_ggyQwkGmnt@mail.gmail.com>
@ 2011-02-20  3:23 ` Claude Nobs
  2011-02-20  5:25   ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: Claude Nobs @ 2011-02-20  3:23 UTC (permalink / raw)
  To: linux-raid

Hi All,

I was wondering if someone might be willing to say whether this array is
recoverable.

I had a clean, running RAID 5 array using 4 block devices (two of those
were 2-disk raid0 md devices). Last night I decided it was safe
to grow the array by one disk. But then a) a disk failed, b) a power
loss occurred, c) I probably swapped out the wrong disk and forced
assembly, resulting in an inconsistent state. Here is a complete set
of the actions taken :

> bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
> mdadm: Need to backup 768K of critical section..
> mdadm: ... critical section passed.
> bernstein@server:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md1 : active raid0 sdg1[1] sdf1[0]
>       976770944 blocks super 1.2 64k chunks
>
> md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
>
> md0 : active raid0 sdh1[0] sdb1[1]
>       976770944 blocks super 1.2 64k chunks
>
> unused devices: <none>


Now I thought /dev/sdg1 had failed. Unfortunately I have no log for this
one, just my memory of seeing the mdstat line above change to this :

>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]

Some 10 minutes later a power loss occurred; thanks to a UPS the
server shut down cleanly, as with 'shutdown -h now'. Then I exchanged
/dev/sdg1, rebooted, and in a lapse of judgement forced assembly:

> bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
> mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
> mdadm: Failed to restore critical section for reshape, sorry.
>
> bernstein@server:~$ sudo mdadm --detail /dev/md2
> /dev/md2:
>         Version : 01.02
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
>    Raid Devices : 5
>   Total Devices : 3
> Preferred Minor : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Feb 19 22:32:04 2011
>           State : active, degraded, Not Started
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>   Delta Devices : 1, (4->5)
>
>            Name : master:public
>            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>          Events : 133609
>
>     Number   Major   Minor   RaidDevice State
>        0       8       33        0      active sync   /dev/sdc1
>        1       0        0        1      removed
>        2       0        0        2      removed
>        4       9        0        3      active sync   /dev/block/9:0
>        5       8        1        4      active sync   /dev/sda1

So I reattached the old disk, got /dev/md1 back, and did the
investigation I should have done before :

> bernstein@server:~$ sudo mdadm --examine /dev/sdd1
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
>
>   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:23:09 2011
>        Checksum : fd0c1794 - correct
>          Events : 133567
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 1 (0, 1, failed, 2, 3, 4)
>    Array State : uUuuu 1 failed
> bernstein@server:~$ sudo mdadm --examine /dev/sda1
> /dev/sda1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
>
>   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:32:04 2011
>        Checksum : 12c832c6 - correct
>          Events : 133609
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 5 (0, failed, failed, failed, 3, 4)
>    Array State : u__uU 3 failed
> bernstein@server:~$ sudo mdadm --examine /dev/sdc1
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
>
>   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:32:04 2011
>        Checksum : 8aa7d094 - correct
>          Events : 133609
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 0 (0, failed, failed, failed, 3, 4)
>    Array State : U__uu 3 failed
> bernstein@server:~$ sudo mdadm --examine /dev/md0
> /dev/md0:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
>
>   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:32:04 2011
>        Checksum : 1bbf913b - correct
>          Events : 133609
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 4 (0, failed, failed, failed, 3, 4)
>    Array State : u__Uu 3 failed
> bernstein@server:~$ sudo mdadm --examine /dev/md1
> /dev/md1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
>
>   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:30:29 2011
>        Checksum : 6c591e90 - correct
>          Events : 133603
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 3 (0, failed, failed, 2, 3, 4)
>    Array State : u_Uuu 2 failed

So obviously it was not /dev/sdd1 that failed. However (due to that silly
forced assembly?!) the Reshape pos'n field of md0 and sd[ac]1 differs from
md1 by a few bytes, resulting in an inconsistent state...
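One way to see the inconsistency at a glance is to group the superblocks by their (Events, Reshape pos'n) pairs. An illustrative sketch (not an mdadm tool, just a comparison aid) using the values from the --examine outputs above:

```python
from collections import defaultdict

# (Events, Reshape pos'n) copied from the --examine outputs in this mail.
devices = {
    'sdd1': (133567, 489510400),
    'sda1': (133609, 502815488),
    'sdc1': (133609, 502815488),
    'md0':  (133609, 502815488),
    'md1':  (133603, 502809856),
}

# Devices whose superblocks still agree end up in the same group.
groups = defaultdict(list)
for dev, state in devices.items():
    groups[state].append(dev)

for (events, pos), members in sorted(groups.items(), reverse=True):
    print(events, pos, sorted(members))
```

This shows three distinct states: the majority (sda1, sdc1, md0) at events 133609, md1 slightly behind at 133603, and sdd1 far behind at 133567.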

> bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
>
> mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
> bernstein@server:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
>       4883823704 blocks super 1.2
>
> md1 : active raid0 sdf1[0] sdg1[1]
>       976770944 blocks super 1.2 64k chunks
>
> md0 : active raid0 sdb1[1] sdh1[0]
>       976770944 blocks super 1.2 64k chunks
>
> unused devices: <none>

I do have a backup, but since recovery from it would take a few days, I'd
like to know whether there is a way to recover the array or whether it's
completely lost.

Any suggestions gratefully received,

claude
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Likely forced assembly with wrong disk during raid5 grow. Recoverable?
  2011-02-20  3:23 ` Likely forced assembly with wrong disk during raid5 grow. Recoverable? Claude Nobs
@ 2011-02-20  5:25   ` NeilBrown
  2011-02-20 14:44     ` Claude Nobs
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2011-02-20  5:25 UTC (permalink / raw)
  To: Claude Nobs; +Cc: linux-raid

On Sun, 20 Feb 2011 04:23:09 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:

> Hi All,
> 
> I was wondering if someone might be willing to say whether this array is
> recoverable.
> 

Probably is.  But don't do anything yet - any further action taken before you
have read all of the following email will probably cause more harm than good.

> I had a clean, running RAID 5 array using 4 block devices (two of those
> were 2-disk raid0 md devices). Last night I decided it was safe
> to grow the array by one disk. But then a) a disk failed, b) a power
> loss occurred, c) I probably swapped out the wrong disk and forced
> assembly, resulting in an inconsistent state. Here is a complete set
> of the actions taken :

Providing this level of information is excellent!


> 
> > bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
> > mdadm: Need to backup 768K of critical section..
> > mdadm: ... critical section passed.
> > bernstein@server:~$ cat /proc/mdstat
> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> > md1 : active raid0 sdg1[1] sdf1[0]
> >       976770944 blocks super 1.2 64k chunks
> >
> > md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> >       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
> >
> > md0 : active raid0 sdh1[0] sdb1[1]
> >       976770944 blocks super 1.2 64k chunks
> >
> > unused devices: <none>

All looks good so far.
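As a sanity check, the finish estimate in that reshape line follows directly from the remaining blocks and the reported speed. A small illustrative sketch (assuming mdstat's K figures are kibibytes, as usual):

```python
# Figures from "reshape = 1.6% (16423164/976760640) finish=902.2min speed=17739K/sec"
done_k = 16423164
total_k = 976760640
speed_k_per_s = 17739

remaining_k = total_k - done_k
eta_min = remaining_k / speed_k_per_s / 60
print(f"finish={eta_min:.1f}min")  # close to mdstat's finish=902.2min
```

The estimate is only instantaneous, which is why mdstat's figure drifts as the reshape speed varies.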

> 
> 
> Now I thought /dev/sdg1 had failed. Unfortunately I have no log for this
> one, just my memory of seeing the mdstat line above change to this :
> 
> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]
> 

Unfortunately it is not possible to know which drive is missing from the
above info.  The [numbers] in brackets don't correspond exactly to the
positions in the array that you might think they do.  The mdstat listing above
has numbers 0,1,3,4,5.

They are the 'Number' column in the --detail output below.  This is /dev/md1
- I can tell from the --examine outputs, but it is a bit confusing.  Newer
versions of mdadm make this a little less confusing.  If you look for the
pattern of U and u in the 'Array State' line, the U is 'this device' and
each 'u' is some other device.

So /dev/md1 had a failure, which could well have been sdg1.
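That pattern convention can be decoded mechanically. An illustrative sketch (not mdadm's own code, just the reading described above):

```python
def decode_array_state(pattern):
    """Decode an mdadm 1.x 'Array State' string such as 'u__uU'.

    'U' marks the device whose superblock this is, 'u' another working
    device, '_' a failed/missing slot.  Illustrative only; mdadm reads
    these flags from superblock fields, not from the printed text.
    """
    this_device = pattern.index('U') if 'U' in pattern else None
    failed = [i for i, c in enumerate(pattern) if c == '_']
    working = [i for i, c in enumerate(pattern) if c in 'Uu']
    return this_device, failed, working

# /dev/sda1 below reports 'u__uU': it is slot 4 and sees slots 1 and 2 failed.
print(decode_array_state('u__uU'))  # (4, [1, 2], [0, 3, 4])
```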


> Some 10 minutes later a power loss occurred; thanks to a UPS the server
> shut down cleanly, as with 'shutdown -h now'. Then I exchanged /dev/sdg1,
> rebooted, and in a lapse of judgement forced assembly:

Perfect timing :-)

> 
> > bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
> > mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
> > mdadm: Failed to restore critical section for reshape, sorry.

This isn't actually a 'forced assembly' as you seem to think.  There is no
'-f' or '--force'.  It didn't cause any harm.

> >
> > bernstein@server:~$ sudo mdadm --detail /dev/md2
> > /dev/md2:
> >         Version : 01.02
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
> >    Raid Devices : 5
> >   Total Devices : 3
> > Preferred Minor : 3
> >     Persistence : Superblock is persistent
> >
> >     Update Time : Sat Feb 19 22:32:04 2011
> >           State : active, degraded, Not Started
                                        ^^^^^^^^^^^^

mdadm has put the devices together as best it can, but has not started the
array because it didn't have enough devices.  This is good.


> >  Active Devices : 3
> > Working Devices : 3
> >  Failed Devices : 0
> >   Spare Devices : 0
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >   Delta Devices : 1, (4->5)
> >
> >            Name : master:public
> >            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >          Events : 133609
> >
> >     Number   Major   Minor   RaidDevice State
> >        0       8       33        0      active sync   /dev/sdc1
> >        1       0        0        1      removed
> >        2       0        0        2      removed
> >        4       9        0        3      active sync   /dev/block/9:0
> >        5       8        1        4      active sync   /dev/sda1

So you now have 2 devices missing.  As long as we can find the devices,
  mdadm --assemble --force
should be able to put them together for you.  But let's see what we have...

> 
> So I reattached the old disk, got /dev/md1 back, and did the
> investigation I should have done before :
> 
> > bernstein@server:~$ sudo mdadm --examine /dev/sdd1
> > /dev/sdd1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
> >
> >   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:23:09 2011
> >        Checksum : fd0c1794 - correct
> >          Events : 133567
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 1 (0, 1, failed, 2, 3, 4)
> >    Array State : uUuuu 1 failed

This device thinks all is well.  The "1 failed" is misleading.  The
   uUuuu
pattern says that all the devices are thought to be working.
Note for later reference:
         Events: 133567
 Reshape pos'n : 489510400


> > bernstein@server:~$ sudo mdadm --examine /dev/sda1
> > /dev/sda1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
> >
> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:32:04 2011
> >        Checksum : 12c832c6 - correct
> >          Events : 133609
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 5 (0, failed, failed, failed, 3, 4)
> >    Array State : u__uU 3 failed

This device thinks devices 1 and 2 have failed (the '_'s).
So 'sdd1' above, and md1.
        Events : 133609 - this has advanced a bit from sdd1
 Reshape Pos'n : 502815488 - this has advanced quite a lot.


> > bernstein@server:~$ sudo mdadm --examine /dev/sdc1
> > /dev/sdc1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
> >
> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:32:04 2011
> >        Checksum : 8aa7d094 - correct
> >          Events : 133609
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 0 (0, failed, failed, failed, 3, 4)
> >    Array State : U__uu 3 failed

 Reshape pos'n, Events, and Array State are identical to sda1.
So these two are in agreement.


> > bernstein@server:~$ sudo mdadm --examine /dev/md0
> > /dev/md0:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
> >
> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:32:04 2011
> >        Checksum : 1bbf913b - correct
> >          Events : 133609
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 4 (0, failed, failed, failed, 3, 4)
> >    Array State : u__Uu 3 failed

Again, exactly the same as sda1 and sdc1.

> > bernstein@server:~$ sudo mdadm --examine /dev/md1
> > /dev/md1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
> >
> >   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:30:29 2011
> >        Checksum : 6c591e90 - correct
> >          Events : 133603
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 3 (0, failed, failed, 2, 3, 4)
> >    Array State : u_Uuu 2 failed

And here is md1.  It thinks device 2 - sdd1 - has failed.
        Events : 133603 - slightly behind the 3 good devices, but well after sdd1.
 Reshape Pos'n : 502809856 - just a little before the 3 good devices too.

> 
> So obviously it was not /dev/sdd1 that failed. However (due to that silly
> forced assembly?!) the Reshape pos'n field of md0 and sd[ac]1 differs from
> md1 by a few bytes, resulting in an inconsistent state...

The way I read it is:

  sdd1 failed first - shortly after Sat Feb 19 22:23:09 2011 - the update
                      time on sdd1
The reshape continued until some time between Sat Feb 19 22:30:29 2011
and Sat Feb 19 22:32:04 2011, when md1 had a failure.
The reshape couldn't continue at that point, so it stopped.

So the data on sdd1 is old (there has been about 8 minutes of reshape since
then) and cannot be used.
The data on md1 is very close to the rest.  The data that was in the process
of being relocated lives in two locations on the 'good' drives, both the new
and the old.  It only lives in the 'old' location on md1.

So what we need to do is re-assemble the array, but telling it that the
reshape has only gone as far as md1 thinks it has.  This will make sure it
repeats that last part of the reshape.

mdadm -Af should do that BUT IT DOESN'T.  Assuming I have thought through
this properly (and I should go through it again with more care), mdadm won't
do the right thing for you.  I need to get it to handle 'reshape' specially
when doing a --force assemble.

> 
> > bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
> >
> > mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
> > bernstein@server:~$ cat /proc/mdstat
> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> > md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
> >       4883823704 blocks super 1.2
> >
> > md1 : active raid0 sdf1[0] sdg1[1]
> >       976770944 blocks super 1.2 64k chunks
> >
> > md0 : active raid0 sdb1[1] sdh1[0]
> >       976770944 blocks super 1.2 64k chunks
> >
> > unused devices: <none>
> 
> I do have a backup, but since recovery from it would take a few days, I'd
> like to know whether there is a way to recover the array or whether it's
> completely lost.
> 
> Any suggestions gratefully received,

The fact that you have a backup is excellent.  You might need it, but I hope
not.

I would like to provide you with a modified version of mdadm which you can
then use to --force assemble the array.  It should be able to get you access
to all your data.
The array will be degraded and will finish reshape in that state.  Then you
will need to add sdd1 back in (Assuming you are confident that it works) and
it will be rebuilt.

Just to go through some of the numbers...

Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
So old stripes have 192K, new stripes have 256K.

The 'good' disks think the reshape has reached 502815488K, which is
1964123 new stripes (2618830.66 old stripes).
sdd1 thinks the reshape has only reached 489510400K, which is 1912150
new stripes (2549533.33 old stripes).

So of the 51973 stripes that have been reshaped since the last metadata
update on sdd1, some will have been done on sdd1, but some not, and we don't
really know how many.  But it is perfectly safe to repeat those stripes
as all writes to that region will have been suspended (and you probably
weren't writing anyway).
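Those stripe counts can be checked directly from the Reshape pos'n values in the --examine outputs. A quick arithmetic sketch, assuming 64K chunks and 3 -> 4 data disks as described above:

```python
chunk_k = 64
old_data_disks, new_data_disks = 3, 4      # raid5 4->5 devices: n-1 data disks
old_stripe_k = chunk_k * old_data_disks    # 192K per old stripe
new_stripe_k = chunk_k * new_data_disks    # 256K per new stripe

good_pos_k = 502815488   # Reshape pos'n on sda1/sdc1/md0
sdd1_pos_k = 489510400   # Reshape pos'n on sdd1

good_stripes = good_pos_k // new_stripe_k   # 1964123 new stripes
sdd1_stripes = sdd1_pos_k // new_stripe_k   # 1912150 new stripes
print(good_stripes - sdd1_stripes)          # stripes reshaped since sdd1's update
```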

So I need to change the loop in Assemble.c which calls ->update_super
with "force-one" to also make sure the reshape_position in the 'chosen'
superblock match the oldest 'forced' superblock.
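In outline, that change amounts to winding the chosen superblock's reshape position back to the oldest one among the force-included devices, so the tail of the reshape is repeated. A hedged Python sketch of the idea only (the real change is in mdadm's C code in Assemble.c; the names below are illustrative):

```python
def clamp_reshape_position(chosen, forced):
    """Sketch of the --force assembly fix described above: make the
    'chosen' superblock's reshape_position match the oldest (smallest)
    position among the superblocks being force-included, so the last
    part of the reshape is safely redone.  Illustrative pseudologic;
    not mdadm's actual implementation.
    """
    oldest = min(sb['reshape_position'] for sb in forced)
    if oldest < chosen['reshape_position']:
        chosen['reshape_position'] = oldest
    return chosen

# With md1 forced in at 502809856K, the good disks' 502815488K is wound back:
chosen = {'reshape_position': 502815488}
print(clamp_reshape_position(chosen, [{'reshape_position': 502809856}]))
```

Repeating those stripes is safe for the reason given above: the data in that region exists in both old and new layouts on the good drives.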

So if you are able to wait a day, I'll try to write a patch first thing
tomorrow and send it to you.

Thanks for the excellent problem report.

NeilBrown



* Re: Likely forced assembly with wrong disk during raid5 grow. Recoverable?
  2011-02-20  5:25   ` NeilBrown
@ 2011-02-20 14:44     ` Claude Nobs
  2011-02-20 14:47       ` Mathias Burén
  2011-02-21  0:53       ` NeilBrown
  0 siblings, 2 replies; 9+ messages in thread
From: Claude Nobs @ 2011-02-20 14:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Sun, Feb 20, 2011 at 06:25, NeilBrown <neilb@suse.de> wrote:
> On Sun, 20 Feb 2011 04:23:09 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:
>
>> Hi All,
>>
>> I was wondering if someone might be willing to say whether this array is
>> recoverable.
>>
>
> Probably is.  But don't do anything yet - any further action taken before you
> have read all of the following email will probably cause more harm than good.
>
>> I had a clean, running RAID 5 array using 4 block devices (two of those
>> were 2-disk raid0 md devices). Last night I decided it was safe
>> to grow the array by one disk. But then a) a disk failed, b) a power
>> loss occurred, c) I probably swapped out the wrong disk and forced
>> assembly, resulting in an inconsistent state. Here is a complete set
>> of the actions taken :
>
> Providing this level of information is excellent!
>
>
>>
>> > bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
>> > mdadm: Need to backup 768K of critical section..
>> > mdadm: ... critical section passed.
>> > bernstein@server:~$ cat /proc/mdstat
>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>> > md1 : active raid0 sdg1[1] sdf1[0]
>> >       976770944 blocks super 1.2 64k chunks
>> >
>> > md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>> >       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
>> >
>> > md0 : active raid0 sdh1[0] sdb1[1]
>> >       976770944 blocks super 1.2 64k chunks
>> >
>> > unused devices: <none>
>
> All looks good so far.
>
>>
>>
>> Now I thought /dev/sdg1 had failed. Unfortunately I have no log for this
>> one, just my memory of seeing the mdstat line above change to this :
>>
>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]
>>
>
> Unfortunately it is not possible to know which drive is missing from the
> above info.  The [numbers] in brackets don't correspond exactly to the
> positions in the array that you might think they do.  The mdstat listing above
> has numbers 0,1,3,4,5.
>
> They are the 'Number' column in the --detail output below.  This is /dev/md1
> - I can tell from the --examine outputs, but it is a bit confusing.  Newer
> versions of mdadm make this a little less confusing.  If you look for the
> pattern of U and u in the 'Array State' line, the U is 'this device' and
> each 'u' is some other device.

Actually this is running a stock Ubuntu 10.10 server kernel. But as
it is from my memory it could very well have been :

       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [U_UUU]

>
> So /dev/md1 had a failure, which could well have been sdg1.
>
>
>> Some 10 minutes later a power loss occurred; thanks to a UPS the server
>> shut down cleanly, as with 'shutdown -h now'. Then I exchanged /dev/sdg1,
>> rebooted, and in a lapse of judgement forced assembly:
>
> Perfect timing :-)
>
>>
>> > bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
>> > mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
>> > mdadm: Failed to restore critical section for reshape, sorry.
>
> This isn't actually a 'forced assembly' as you seem to think.  There is no
> '-f' or '--force'.  It didn't cause any harm.

Phew... at last some luck! That "Failed to restore critical section
for reshape, sorry" really scared the hell out of me.
But then again it got me paying attention and stopped me from making things worse... :-)

>
>> >
>> > bernstein@server:~$ sudo mdadm --detail /dev/md2
>> > /dev/md2:
>> >         Version : 01.02
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
>> >    Raid Devices : 5
>> >   Total Devices : 3
>> > Preferred Minor : 3
>> >     Persistence : Superblock is persistent
>> >
>> >     Update Time : Sat Feb 19 22:32:04 2011
>> >           State : active, degraded, Not Started
>                                        ^^^^^^^^^^^^
>
> mdadm has put the devices together as best it can, but has not started the
> array because it didn't have enough devices.  This is good.
>
>
>> >  Active Devices : 3
>> > Working Devices : 3
>> >  Failed Devices : 0
>> >   Spare Devices : 0
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >   Delta Devices : 1, (4->5)
>> >
>> >            Name : master:public
>> >            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >          Events : 133609
>> >
>> >     Number   Major   Minor   RaidDevice State
>> >        0       8       33        0      active sync   /dev/sdc1
>> >        1       0        0        1      removed
>> >        2       0        0        2      removed
>> >        4       9        0        3      active sync   /dev/block/9:0
>> >        5       8        1        4      active sync   /dev/sda1
>
> So you now have 2 devices missing.  As long as we can find the devices,
>  mdadm --assemble --force
> should be able to put them together for you.  But let's see what we have...
>
>>
>> So I reattached the old disk, got /dev/md1 back, and did the
>> investigation I should have done before :
>>
>> > bernstein@server:~$ sudo mdadm --examine /dev/sdd1
>> > /dev/sdd1:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
>> >
>> >   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:23:09 2011
>> >        Checksum : fd0c1794 - correct
>> >          Events : 133567
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 1 (0, 1, failed, 2, 3, 4)
>> >    Array State : uUuuu 1 failed
>
> This device thinks all is well.  The "1 failed" is misleading.  The
>   uUuuu
> pattern says that all the devices are thought to be working.
> Note for later reference:
>         Events: 133567
>  Reshape pos'n : 489510400
>
>
>> > bernstein@server:~$ sudo mdadm --examine /dev/sda1
>> > /dev/sda1:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
>> >
>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:32:04 2011
>> >        Checksum : 12c832c6 - correct
>> >          Events : 133609
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 5 (0, failed, failed, failed, 3, 4)
>> >    Array State : u__uU 3 failed
>
> This device thinks devices 1 and 2 have failed (the '_'s).
> So 'sdd1' above, and md1.
>        Events : 133609 - this has advanced a bit from sdd1
>  Reshape Pos'n : 502815488 - this has advanced quite a lot.
>
>
>> > bernstein@server:~$ sudo mdadm --examine /dev/sdc1
>> > /dev/sdc1:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
>> >
>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:32:04 2011
>> >        Checksum : 8aa7d094 - correct
>> >          Events : 133609
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 0 (0, failed, failed, failed, 3, 4)
>> >    Array State : U__uu 3 failed
>
>  Reshape pos'n, Events, and Array State are identical to sda1.
> So these two are in agreement.
>
>
>> > bernstein@server:~$ sudo mdadm --examine /dev/md0
>> > /dev/md0:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
>> >
>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:32:04 2011
>> >        Checksum : 1bbf913b - correct
>> >          Events : 133609
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 4 (0, failed, failed, failed, 3, 4)
>> >    Array State : u__Uu 3 failed
>
> again, exactly the same as sda1 and sdc1.
>
>> > bernstein@server:~$ sudo mdadm --examine /dev/md1
>> > /dev/md1:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
>> >
>> >   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:30:29 2011
>> >        Checksum : 6c591e90 - correct
>> >          Events : 133603
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 3 (0, failed, failed, 2, 3, 4)
>> >    Array State : u_Uuu 2 failed
>
> And here is md1.  Thinks device 2 - sdd1 - has failed.
>        Events : 133603 - slightly behind the 3 good devices, but well after
>                                                  sdd1
>  Reshape Pos'n : 502809856 - just a little before the 3 good devices too.
>
>>
>> so obviously not /dev/sdd1 failed. however (due to that silly forced
>> assembly?!) the reshape pos'n field of md0, sd[ac]1 differs from md1 a
>> few bytes, resulting in an inconsistent state...
>
> The way I read it is:
>
>  sdd1 failed first - shortly after Sat Feb 19 22:23:09 2011 - the update time on sdd1
> reshape continued until some time between Sat Feb 19 22:30:29 2011
> and Sat Feb 19 22:32:04 2011 when md1 had a failure.
> The reshape couldn't continue now, so it stopped.
>
> So the data on sdd1 is well out of date (there has been about 8 minutes of
> reshape since then) and cannot be used.
> The data on md1 is very close to the rest.  The data that was in the process
> of being relocated lives in two locations on the 'good' drives, both the new
> and the old.  It only lives in the 'old' location on md1.
>
> So what we need to do is re-assemble the array, but telling it that the
> reshape has only gone as far as md1 thinks it has.  This will make sure it
> repeats that last part of the reshape.
>
> mdadm -Af should do that BUT IT DOESN'T.  Assuming I have thought through
> this properly (and I should go through it again with more care), mdadm won't
> do the right thing for you.  I need to get it to handle 'reshape' specially
> when doing a --force assemble.

exactly what i was thinking of doing, glad i waited and asked.

>
>>
>> > bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
>> >
>> > mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
>> > bernstein@server:~$ cat /proc/mdstat
>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>> > md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
>> >       4883823704 blocks super 1.2
>> >
>> > md1 : active raid0 sdf1[0] sdg1[1]
>> >       976770944 blocks super 1.2 64k chunks
>> >
>> > md0 : active raid0 sdb1[1] sdh1[0]
>> >       976770944 blocks super 1.2 64k chunks
>> >
>> > unused devices: <none>
>>
>> i do have a backup but since recovery from it takes a few days, i'd
>> like to know if there is a way to recover the array or if it's
>> completely lost.
>>
>> Any suggestions gratefully received,
>
> The fact that you have a backup is excellent.  You might need it, but I hope
> not.
>
> I would like to provide you with a modified version of mdadm which you can
> then use to --force assemble the array.  It should be able to get you access
> to all your data.
> The array will be degraded and will finish reshape in that state.  Then you
> will need to add sdd1 back in (Assuming you are confident that it works) and
> it will be rebuilt.
>
> Just to go through some of the numbers...
>
> Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
> So old stripes have 192K, new stripes have 256K.
>
> The 'good' disks think reshape has reached 502815488K which is
> 1964123 new stripes. (2618830.66 old stripes)
> md1 thinks reshape has only reached 489510400K which is 1912150
> new stripes (2549533.33 old stripes).

i think you mixed up sdd1 with md1 here? (the numbers above for md1
are for sdd1. md1 would be :  reshape has reached 502809856K which
would be 1964101 new stripes. so the difference between the good disks
and md1 would be 22 stripes.)
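Claude's correction is easy to verify; the stripe arithmetic is plain division over the reshape positions quoted above (no mdadm involved):

```python
# Re-check the stripe arithmetic for the 4->5 grow (3 -> 4 data disks).
CHUNK_K = 64
OLD_STRIPE_K = 3 * CHUNK_K   # 192K of data per old stripe
NEW_STRIPE_K = 4 * CHUNK_K   # 256K of data per new stripe

good_pos_k = 502815488   # Reshape pos'n on sda1, sdc1 and md0
md1_pos_k  = 502809856   # Reshape pos'n on md1
sdd1_pos_k = 489510400   # Reshape pos'n on sdd1

print(good_pos_k // NEW_STRIPE_K)                 # 1964123 new stripes
print(md1_pos_k // NEW_STRIPE_K)                  # 1964101 new stripes
print((good_pos_k - md1_pos_k) // NEW_STRIPE_K)   # md1 is 22 stripes behind
print((good_pos_k - sdd1_pos_k) // NEW_STRIPE_K)  # sdd1 is 51973 stripes behind
```

22 new stripes at 256K each is about 5.5M of data, which matches the figure Claude mentions further down.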

>
> So of the 51973 stripes that have been reshaped since the last metadata
> update on sdd1, some will have been done on sdd1, but some not, and we don't
> really know how many.  But it is perfectly safe to repeat those stripes
> as all writes to that region will have been suspended (and you probably
> weren't writing anyway).

jep there was nothing writing to the array. so now i am a little
confused, if you meant sdd1 (which failed first is 51973 stripes
behind) this would imply that at least so many stripes of data are
kept of the old (3 data disks) configuration as well as the new one?
if continuing from there is possible then the array would no longer be
degraded right? so i think you meant md1 (22 stripes behind), as
keeping 5.5M of data from the old and new config seems more
reasonable. however this is just a guess :-)

>
> So I need to change the loop in Assemble.c which calls ->update_super
> with "force-one" to also make sure the reshape_position in the 'chosen'
> superblock match the oldest 'forced' superblock.

uh... ah... probably, i have zero knowledge of kernel code :-)
i guess it should take into account that the oldest superblock (sdd1
in this case) may already be out of the section where the data (in the
old config) still exists? but i guess you already thought of that...
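Neil's proposed Assemble.c change boils down to: when force-assembling mid-reshape, roll the chosen superblock's reshape position back to the oldest forced superblock's value, so the tail of the reshape is repeated rather than skipped. A hypothetical Python model of that selection (the real logic is C inside mdadm; this only illustrates the idea):

```python
# Model of the force-assemble fix: when member superblocks disagree about
# how far the reshape got, restart from the *minimum* position among the
# members being assembled, so the last stripes are safely redone.
def choose_reshape_position(superblocks):  # illustrative only, not mdadm code
    # superblocks: list of (name, events, reshape_position) for the members
    # being force-assembled (members too stale to re-include are excluded).
    return min(pos for _name, _events, pos in superblocks)

members = [
    ("sdc1", 133609, 502815488),
    ("md0",  133609, 502815488),
    ("sda1", 133609, 502815488),
    ("md1",  133603, 502809856),  # slightly behind: reshape restarts here
]
print(choose_reshape_position(members))  # 502809856
```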

>
> So if you are able to wait a day, I'll try to write a patch first thing
> tomorrow and send it to you.

sure, that would be awesome! that boils down to compiling the patched
kernel doesn't it? this will probably take a few days as the system is
quite slow and i'd have to get up to speed with kernel compiling. but
shouldn't be a problem. would i have to patch the ubuntu kernel (based
on 2.6.35.4) or the latest 2.6.38-rc from kernel.org?

>
> Thanks for the excellent problem report.
>
> NeilBrown

Well i thank you for providing such an elaborate and friendly answer!
this is actually my first mailing list post and considering how many
questions get ignored (don't know about this list though) i just hoped
someone would at least answer with a one liner... i never expected
this. so thanks again.

Claude
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-20 14:44     ` Claude Nobs
@ 2011-02-20 14:47       ` Mathias Burén
  2011-02-21  0:53       ` NeilBrown
  1 sibling, 0 replies; 9+ messages in thread
From: Mathias Burén @ 2011-02-20 14:47 UTC (permalink / raw)
  To: Claude Nobs; +Cc: NeilBrown, linux-raid

On 20 February 2011 14:44, Claude Nobs <claudenobs@blunet.cc> wrote:
> On Sun, Feb 20, 2011 at 06:25, NeilBrown <neilb@suse.de> wrote:
>> On Sun, 20 Feb 2011 04:23:09 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:
>>
>>> Hi All,
>>>
>>> I was wondering if someone might be willing to share if this array is
>>> recoverable.
>>>
>>
>> Probably is.  But don't do anything yet - any further action until you have
>> read all of the following email, will probably cause more harm than good.
>>
>>> I had a clean, running raid5 using 4 block devices (two of those were
>>> 2 disk raid0 md devices) in RAID 5. Last night I decided it was safe
>>> to grow the array by one disk. But then a) a disk failed, b) a power
>>> loss occured, c) i probably switched the wrong disk and forced
>>> assembly, resulting in an inconsistent state. Here is a complete set
>>> of actions taken :
>>
>> Providing this level of information is excellent!
>>
>>
>>>
>>> > bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
>>> > mdadm: Need to backup 768K of critical section..
>>> > mdadm: ... critical section passed.
>>> > bernstein@server:~$ cat /proc/mdstat
>>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>>> > md1 : active raid0 sdg1[1] sdf1[0]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
>>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>>> >       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
>>> >
>>> > md0 : active raid0 sdh1[0] sdb1[1]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > unused devices: <none>
>>
>> All looks good so-far.
>>
>>>
>>>
>>> now i thought /dev/sdg1 failed. unfortunately i have no log for this
>>> one, just my memory of seeing this changed to the one above :
>>>
>>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]
>>>
>>
>> Unfortunately it is not possible to know which drive is missing from the
>> above info.  The [numbers] in brackets don't exactly correspond to the
>> positions in the array that you might think they do.  The mdstat listing above
>> has numbers 0,1,3,4,5.
>>
>> They are the 'Number' column in the --detail output below.  This is /dev/md1
>> - I can tell from the --examine outputs, but it is a bit confusing.  Newer
>> versions of mdadm make this a little less confusing.  If you look for
>> patterns of U and u  in the 'Array State' line, the U is 'this device', the
>> 'u' is some other devices.
>
> Actually this is running a stock Ubuntu 10.10 server kernel. But as
> it is from my memory it could very well have been :
>
>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [U_UUU]
>
>>
>> So /dev/md1 had a failure, so it could well have been sdg1.
>>
>>
>>> some 10 minutes later a power loss occurred, thanks to an ups the
>>> server shut down as with 'shutdown -h now'. now i exchanged /dev/sdg1,
>>> rebooted and in a lapse of judgement forced assembly:
>>
>> Perfect timing :-)
>>
>>>
>>> > bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
>>> > mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
>>> > mdadm: Failed to restore critical section for reshape, sorry.
>>
>> This isn't actually a 'forced assembly' as you seem to think.  There is no
>> '-f' or '--force'.  It didn't cause any harm.
>
> phew... at last some luck! that "Failed to restore critical section
> for reshape, sorry" really scared the hell out of me.
> But then again it got me paying attention and stop making things worse... :-)
>
>>
>>> >
>>> > bernstein@server:~$ sudo mdadm --detail /dev/md2
>>> > /dev/md2:
>>> >         Version : 01.02
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
>>> >    Raid Devices : 5
>>> >   Total Devices : 3
>>> > Preferred Minor : 3
>>> >     Persistence : Superblock is persistent
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >           State : active, degraded, Not Started
>>                                        ^^^^^^^^^^^^
>>
>> mdadm has put the devices together as best it can, but has not started the
>> array because it didn't have enough devices.  This is good.
>>
>>
>>> >  Active Devices : 3
>>> > Working Devices : 3
>>> >  Failed Devices : 0
>>> >   Spare Devices : 0
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >   Delta Devices : 1, (4->5)
>>> >
>>> >            Name : master:public
>>> >            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >          Events : 133609
>>> >
>>> >     Number   Major   Minor   RaidDevice State
>>> >        0       8       33        0      active sync   /dev/sdc1
>>> >        1       0        0        1      removed
>>> >        2       0        0        2      removed
>>> >        4       9        0        3      active sync   /dev/block/9:0
>>> >        5       8        1        4      active sync   /dev/sda1
>>
>> So you now have 2 devices missing.  As long as we can find the devices,
>>  mdadm --assemble --force
>> should be able to put them together for you.  But let's see what we have...
>>
>>>
>>> so i reattached the old disk, got /dev/md1 back and did the
>>> investigation i should have done before :
>>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/sdd1
>>> > /dev/sdd1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
>>> >
>>> >   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:23:09 2011
>>> >        Checksum : fd0c1794 - correct
>>> >          Events : 133567
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 1 (0, 1, failed, 2, 3, 4)
>>> >    Array State : uUuuu 1 failed
>>
>> This device thinks all is well.  The "1 failed" is misleading.  The
>>   uUuuu
>> pattern says that all the devices are thought to be working.
>> Note for later reference:
>>         Events: 133567
>>  Reshape pos'n : 489510400
>>
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/sda1
>>> > /dev/sda1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
>>> >
>>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >        Checksum : 12c832c6 - correct
>>> >          Events : 133609
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 5 (0, failed, failed, failed, 3, 4)
>>> >    Array State : u__uU 3 failed
>>
>> This device thinks devices 1 and 2 have failed (the '_'s).
>> So 'sdd1' above, and md1.
>>        Events : 133609 - this has advanced a bit from sdd1
>>  Reshape Pos'n : 502815488 - this has advanced quite a lot.
>>
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/sdc1
>>> > /dev/sdc1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
>>> >
>>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >        Checksum : 8aa7d094 - correct
>>> >          Events : 133609
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 0 (0, failed, failed, failed, 3, 4)
>>> >    Array State : U__uu 3 failed
>>
>>  Reshape pos'n, Events, and Array State are identical to sda1.
>> So these two are in agreement.
>>
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/md0
>>> > /dev/md0:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
>>> >
>>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >        Checksum : 1bbf913b - correct
>>> >          Events : 133609
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 4 (0, failed, failed, failed, 3, 4)
>>> >    Array State : u__Uu 3 failed
>>
>> again, exactly the same as sda1 and sdc1.
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/md1
>>> > /dev/md1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
>>> >
>>> >   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:30:29 2011
>>> >        Checksum : 6c591e90 - correct
>>> >          Events : 133603
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 3 (0, failed, failed, 2, 3, 4)
>>> >    Array State : u_Uuu 2 failed
>>
>> And here is md1.  Thinks device 2 - sdd1 - has failed.
>>        Events : 133603 - slightly behind the 3 good devices, but well after
>>                                                  sdd1
>>  Reshape Pos'n : 502809856 - just a little before the 3 good devices too.
>>
>>>
>>> so obviously not /dev/sdd1 failed. however (due to that silly forced
>>> assembly?!) the reshape pos'n field of md0, sd[ac]1 differs from md1 a
>>> few bytes, resulting in an inconsistent state...
>>
>> The way I read it is:
>>
>>  sdd1 failed first - shortly after Sat Feb 19 22:23:09 2011 - the update time on sdd1
>> reshape continued until some time between Sat Feb 19 22:30:29 2011
>> and Sat Feb 19 22:32:04 2011 when md1 had a failure.
>> The reshape couldn't continue now, so it stopped.
>>
>> So the data on sdd1 is well out of date (there has been about 8 minutes of
>> reshape since then) and cannot be used.
>> The data on md1 is very close to the rest.  The data that was in the process
>> of being relocated lives in two locations on the 'good' drives, both the new
>> and the old.  It only lives in the 'old' location on md1.
>>
>> So what we need to do is re-assemble the array, but telling it that the
>> reshape has only gone as far as md1 thinks it has.  This will make sure it
>> repeats that last part of the reshape.
>>
>> mdadm -Af should do that BUT IT DOESN'T.  Assuming I have thought through
>> this properly (and I should go through it again with more care), mdadm won't
>> do the right thing for you.  I need to get it to handle 'reshape' specially
>> when doing a --force assemble.
>
> exactly what i was thinking of doing, glad i waited and asked.
>
>>
>>>
>>> > bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
>>> >
>>> > mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
>>> > bernstein@server:~$ cat /proc/mdstat
>>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>>> > md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
>>> >       4883823704 blocks super 1.2
>>> >
>>> > md1 : active raid0 sdf1[0] sdg1[1]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > md0 : active raid0 sdb1[1] sdh1[0]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > unused devices: <none>
>>>
>>> i do have a backup but since recovery from it takes a few days, i'd
>>> like to know if there is a way to recover the array or if it's
>>> completely lost.
>>>
>>> Any suggestions gratefully received,
>>
>> The fact that you have a backup is excellent.  You might need it, but I hope
>> not.
>>
>> I would like to provide you with a modified version of mdadm which you can
>> then use to --force assemble the array.  It should be able to get you access
>> to all your data.
>> The array will be degraded and will finish reshape in that state.  Then you
>> will need to add sdd1 back in (Assuming you are confident that it works) and
>> it will be rebuilt.
>>
>> Just to go through some of the numbers...
>>
>> Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
>> So old stripes have 192K, new stripes have 256K.
>>
>> The 'good' disks think reshape has reached 502815488K which is
>> 1964123 new stripes. (2618830.66 old stripes)
>> md1 thinks reshape has only reached 489510400K which is 1912150
>> new stripes (2549533.33 old stripes).
>
> i think you mixed up sdd1 with md1 here? (the numbers above for md1
> are for sdd1. md1 would be :  reshape has reached 502809856K which
> would be 1964101 new stripes. so the difference between the good disks
> and md1 would be 22 stripes.)
>
>>
>> So of the 51973 stripes that have been reshaped since the last metadata
>> update on sdd1, some will have been done on sdd1, but some not, and we don't
>> really know how many.  But it is perfectly safe to repeat those stripes
>> as all writes to that region will have been suspended (and you probably
>> weren't writing anyway).
>
> jep there was nothing writing to the array. so now i am a little
> confused, if you meant sdd1 (which failed first is 51973 stripes
> behind) this would imply that at least so many stripes of data are
> kept of the old (3 data disks) configuration as well as the new one?
> if continuing from there is possible then the array would no longer be
> degraded right? so i think you meant md1 (22 stripes behind), as
> keeping 5.5M of data from the old and new config seems more
> reasonable. however this is just a guess :-)
>
>>
>> So I need to change the loop in Assemble.c which calls ->update_super
>> with "force-one" to also make sure the reshape_position in the 'chosen'
>> superblock match the oldest 'forced' superblock.
>
> uh... ah... probably, i have zero knowledge of kernel code :-)
> i guess it should take into account that the oldest superblock (sdd1
> in this case) may already be out of the section where the data (in the
> old config) still exists? but i guess you already thought of that...
>
>>
>> So if you are able to wait a day, I'll try to write a patch first thing
>> tomorrow and send it to you.
>
> sure, that would be awesome! that boils down to compiling the patched
> kernel doesn't it? this will probably take a few days as the system is
> quite slow and i'd have to get up to speed with kernel compiling. but
> shouldn't be a problem. would i have to patch the ubuntu kernel (based
> on 2.6.35.4) or the latest 2.6.38-rc from kernel.org?
>
>>
>> Thanks for the excellent problem report.
>>
>> NeilBrown
>
> Well i thank you for providing such an elaborate and friendly answer!
> this is actually my first mailing list post and considering how many
> questions get ignored (don't know about this list though) i just hoped
> someone would at least answer with a one liner... i never expected
> this. so thanks again.
>
> Claude
>

Just a quick FYI, you can find (new, and unreleased) Ubuntu kernels
here: http://kernel.ubuntu.com/~kernel-ppa/mainline/

// Mathias

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-20 14:44     ` Claude Nobs
  2011-02-20 14:47       ` Mathias Burén
@ 2011-02-21  0:53       ` NeilBrown
  2011-02-21  1:03         ` NeilBrown
  2011-02-23  0:56         ` Claude Nobs
  1 sibling, 2 replies; 9+ messages in thread
From: NeilBrown @ 2011-02-21  0:53 UTC (permalink / raw)
  To: Claude Nobs; +Cc: linux-raid

On Sun, 20 Feb 2011 15:44:35 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:

> > They are the 'Number' column in the --detail output below.  This is /dev/md1
> > - I can tell from the --examine outputs, but it is a bit confusing.  Newer
> > versions of mdadm make this a little less confusing.  If you look for
> > patterns of U and u  in the 'Array State' line, the U is 'this device', the
> > 'u' is some other devices.
> 
> Actually this is running a stock Ubuntu 10.10 server kernel. But as
> it is from my memory it could very well have been :
> 
>        2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [U_UUU]
> 

I'm quite sure it would have been '[U_UUU]' as you say.

When I say "Newer versions" I mean of mdadm, not the kernel.

What does
   mdadm -V

show?  Version 3.0 or later gives less confusing output for "mdadm --examine"
on 1.x metadata.

> > Just to go through some of the numbers...
> >
> > Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
> > So old stripes have 192K, new stripes have 256K.
> >
> > The 'good' disks think reshape has reached 502815488K which is
> > 1964123 new stripes. (2618830.66 old stripes)
> > md1 thinks reshape has only reached 489510400K which is 1912150
> > new stripes (2549533.33 old stripes).
> 
> i think you mixed up sdd1 with md1 here? (the numbers above for md1
> are for sdd1. md1 would be :  reshape has reached 502809856K which
> would be 1964101 new stripes. so the difference between the good disks
> and md1 would be 22 stripes.)

Yes, I got them mixed up.  But the net result is the same - the 'new' stripe
numbers haven't got close to overwriting the 'old' stripe numbers.
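That claim can be sanity-checked with arithmetic: during a grow, the new layout writes stripe s at per-disk chunk offset s, and the old-layout data previously stored at that offset belongs to a linear position far below the current reshape position, i.e. it has already been relocated. A sketch assuming the figures quoted in this thread:

```python
# Sanity-check that the reshape's write front (new layout) has not caught
# up with any un-relocated data in the old layout.
CHUNK_K = 64
OLD_DATA_DISKS, NEW_DATA_DISKS = 3, 4

pos_k = 502815488                                 # reshape pos'n, good disks
new_front_chunks = pos_k // (NEW_DATA_DISKS * CHUNK_K)  # per-disk chunk offset
# The old-layout data stored at that same per-disk offset corresponds to
# roughly this linear array position:
old_data_at_front_k = new_front_chunks * OLD_DATA_DISKS * CHUNK_K
print(old_data_at_front_k)          # 377111616
# Everything below pos_k was already relocated, so the write front is only
# overwriting data that exists in the new layout too:
print(old_data_at_front_k < pos_k)  # True
```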

> 
> >
> > So of the 51973 stripes that have been reshaped since the last metadata
> > update on sdd1, some will have been done on sdd1, but some not, and we don't
> > really know how many.  But it is perfectly safe to repeat those stripes
> > as all writes to that region will have been suspended (and you probably
> > weren't writing anyway).
> 
> jep there was nothing writing to the array. so now i am a little
> confused, if you meant sdd1 (which failed first is 51973 stripes
> behind) this would imply that at least so many stripes of data are
> kept of the old (3 data disks) configuration as well as the new one?
> if continuing from there is possible then the array would no longer be
> degraded right? so i think you meant md1 (22 stripes behind), as
> keeping 5.5M of data from the old and new config seems more
> reasonable. however this is just a guess :-)

Yes, it probably is possible to re-assemble the array to include sdd1 and not
have a degraded array, and still have all your data safe - providing you are
sure that nothing at all changed on the array (e.g. maybe it was unmounted?).

I'm not sure I'd recommend it though....  I cannot see anything that would go
wrong, but it is somewhat unknown territory.
Up to you...

If you:

% git clone git://neil.brown.name/mdadm master
% cd mdadm
% make
% sudo bash
# ./mdadm -S /dev/md2
# ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1

It should restart your array - degraded - and repeat the last stages of
reshape just in case.

Alternately, before you run 'make' you could edit Assemble.c, find:
	while (force && !enough(content->array.level, content->array.raid_disks,
				content->array.layout, 1,
				avail, okcnt)) {

around line 818, and change the '1,' to '0,', then run make, mdadm -S, and
then
# ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1 /dev/sdd1

it should assemble the array non-degraded and repeat all of the reshape since
sdd1 fell out of the array.
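If editing Assemble.c by hand feels error-prone, the same one-character change can be scripted. A sketch, which assumes the file still contains the exact line quoted above (shown here on the line itself rather than on a checkout):

```shell
# Flip the fourth argument of enough() from 1 to 0, so that forced
# assembly accepts the full (non-degraded) member set.
# In the real checkout you would run the sed with -i against Assemble.c.
echo 'content->array.layout, 1,' \
  | sed 's/content->array\.layout, 1,/content->array.layout, 0,/'
# -> content->array.layout, 0,
```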

As you have a backup, this is probably safe because even if it goes bad you
can restore from backups - not that I expect it to go bad but ....

> >
> > Thanks for the excellent problem report.
> >
> > NeilBrown
> 
> Well, I thank you for providing such an elaborate and friendly answer!
> This is actually my first mailing list post, and considering how many
> questions get ignored (I don't know about this list though) I just
> hoped someone would at least answer with a one-liner... I never
> expected this. So thanks again.

All part of the service... :-)

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-21  0:53       ` NeilBrown
@ 2011-02-21  1:03         ` NeilBrown
  2011-02-23  0:56         ` Claude Nobs
  1 sibling, 0 replies; 9+ messages in thread
From: NeilBrown @ 2011-02-21  1:03 UTC (permalink / raw)
  To: Claude Nobs; +Cc: linux-raid

On Mon, 21 Feb 2011 11:53:03 +1100 NeilBrown <neilb@suse.de> wrote:

> % git clone git://neil.brown.name/mdadm master

No, that's wrong.  It's just

    git clone git://neil.brown.name/mdadm

NeilBrown


> % cd mdadm
> % make
> % sudo bash
> # ./mdadm -S /dev/md2
> # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1


* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-21  0:53       ` NeilBrown
  2011-02-21  1:03         ` NeilBrown
@ 2011-02-23  0:56         ` Claude Nobs
  2011-02-23  1:53           ` NeilBrown
  1 sibling, 1 reply; 9+ messages in thread
From: Claude Nobs @ 2011-02-23  0:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Mon, Feb 21, 2011 at 01:53, NeilBrown <neilb@suse.de> wrote:
>
> When I say "Newer versions" I mean of mdadm, not the kernel.
>
> What does
>   mdadm -V
>
> show?  Version 3.0 or later gives less confusing output for "mdadm --examine"
> on 1.x metadata.

mdadm - v2.6.7.1 - 15th October 2008
So yes, the Ubuntu mdadm seems to be a very old version indeed.

> Yes, it probably is possible to re-assemble the array to include sdd1 and not
> have a degraded array, and still have all your data safe - providing you are
> sure that nothing at all changed on the array (e.g. maybe it was unmounted?).
>
> I'm not sure I'd recommend it though....  I cannot see anything that would go
> wrong, but it is somewhat unknown territory.
> Up to you...
>
> If you:
>
> % git clone git://neil.brown.name/mdadm master
> % cd mdadm
> % make
> % sudo bash
> # ./mdadm -S /dev/md2
> # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1
>
> It should restart your array - degraded - and repeat the last stages of
> reshape just in case.
>
> Alternately, before you run 'make' you could edit Assemble.c, find:
>        while (force && !enough(content->array.level, content->array.raid_disks,
>                                content->array.layout, 1,
>                                avail, okcnt)) {
>
> around line 818, and change the '1,' to '0,', then run make, mdadm -S, and
> then
> # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1 /dev/sdd1
>
> it should assemble the array non-degraded and repeat all of the reshape since
> sdd1 fell out of the array.
>
> As you have a backup, this is probably safe because even if it goes
> bad you can restore from backups - not that I expect it to go bad
> but ....

I tried to recreate the scenario so I could test both versions first,
but I just could not recreate this step (resp. its result: different
reshape pos'ns on the last 3+1 drives):

bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0
/dev/sda1 /dev/sdc1 /dev/sdd1
mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
mdadm: Failed to restore critical section for reshape, sorry.

which I think led to the inconsistent state. All I got was:

$ sudo mdadm --create /dev/md4 --level raid5 --metadata=1.2
--raid-devices=4 /dev/sde[5678]
$ sudo mkfs.ext4 /dev/md4
$ sudo mdadm --add /dev/md4 /dev/sde9
$ sudo mdadm --grow --raid-devices 5 /dev/md4
$ sudo mdadm /dev/md4 --fail /dev/sde9
$ sudo umount /dev/md4 && sudo mdadm -S /dev/md4
$ sudo reboot
$ sudo mdadm -S /dev/md4
$ sudo mdadm --assemble --run /dev/md4 /dev/sde[6789]
mdadm: failed to RUN_ARRAY /dev/md4: Input/output error
mdadm: Not enough devices to start the array.
$ sudo mdadm --examine /dev/sde[56789]
/dev/sde5:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:56 2011
    Array Slot : 0 (0, 1, 2, failed, failed, failed)
   Array State : Uuu__ 3 failed
/dev/sde6:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:56 2011
    Array Slot : 1 (0, 1, 2, failed, failed, failed)
   Array State : uUu__ 3 failed
/dev/sde7:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:56 2011
    Array Slot : 2 (0, 1, 2, failed, failed, failed)
   Array State : uuU__ 3 failed
/dev/sde8:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:15 2011
    Array Slot : 4 (0, 1, 2, failed, 3, failed)
   Array State : uuuU_ 2 failed
/dev/sde9:
  Reshape pos'n : 54016 (52.76 MiB 55.31 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:11 2011
    Array Slot : 5 (0, 1, 2, failed, 3, 4)
   Array State : uuuuU 1 failed

which got instantly and correctly reshaped by the freshly compiled
version. Without any more real testing, I chose the safer way and went
ahead on the real array:

bernstein@server:~/mdadm$ sudo ./mdadm -Afvv /dev/md2 /dev/sda1
/dev/md0 /dev/md1 /dev/sdc1
mdadm: looking for devices for /dev/md2
mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 4.
mdadm: /dev/md0 is identified as a member of /dev/md2, slot 3.
mdadm: /dev/md1 is identified as a member of /dev/md2, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 0.
mdadm: forcing event count in /dev/md1(2) from 133603 upto 133609
mdadm: Cannot open /dev/sdc1: Device or resource busy
bernstein@server:~/mdadm$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md2 : active raid5 md1[3] md0[4] sda1[5] sdc1[0]
      2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [U_UUU]
      [==>..................]  reshape = 12.8% (125839952/976760640)
finish=825.1min speed=17186K/sec

md1 : active raid0 sdg1[1] sdf1[0]
      976770944 blocks super 1.2 64k chunks

md0 : active raid0 sdh1[0] sdb1[1]
      976770944 blocks super 1.2 64k chunks

unused devices: <none>
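As a side note, the finish estimate in /proc/mdstat is just the remaining work divided by the current rate, so the 825-minute figure above checks out:

```shell
DONE_K=125839952      # reshape position from /proc/mdstat (1K units)
TOTAL_K=976760640     # per-device total (1K units)
RATE_K=17186          # current speed in K/sec
echo "$(( (TOTAL_K - DONE_K) / RATE_K / 60 )) min"   # -> 825 min
```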

Reshape is in progress and looking good to complete overnight,
although I am a little scared about the "mdadm: forcing event count in
/dev/md1(2) from 133603 upto 133609" and the "device busy" lines. Is
this the way it's supposed to be? I assumed that when it's repeating
all the reshape it would be something like: forcing event count in
/dev/sda1, md0, sdc1 from 133609 downto 133603...

This is not strictly a raid/mdadm question, but do you know a simple
way to check everything went OK? I think that an e2fsck (ext4 fs) and
checksumming some random files located behind the interruption point
should verify all went well. Plus, just to be sure, I'd like to check
files located at the interruption point. Is the offset to the
interruption point into the md device simply the reshape pos'n (e.g.
502815488K)?

> All part of the service... :-)

Well then, great service!
Thanks a lot.

Claude


* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-23  0:56         ` Claude Nobs
@ 2011-02-23  1:53           ` NeilBrown
  2011-02-24  4:06             ` Claude Nobs
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2011-02-23  1:53 UTC (permalink / raw)
  To: Claude Nobs; +Cc: linux-raid

On Wed, 23 Feb 2011 01:56:13 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:

> bernstein@server:~/mdadm$ sudo ./mdadm -Afvv /dev/md2 /dev/sda1
> /dev/md0 /dev/md1 /dev/sdc1
> mdadm: looking for devices for /dev/md2
> mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 4.
> mdadm: /dev/md0 is identified as a member of /dev/md2, slot 3.
> mdadm: /dev/md1 is identified as a member of /dev/md2, slot 2.
> mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 0.
> mdadm: forcing event count in /dev/md1(2) from 133603 upto 133609

This is normal - mdadm is just letting you know that it is including in the 
array a device that looks a bit old - we expected this.

> mdadm: Cannot open /dev/sdc1: Device or resource busy

This is odd.  I cannot explain this at all.  When this message is printed,
mdadm should give up and not continue.  Yet it seems that it did continue,
because the array is started and is reshaping.

> bernstein@server:~/mdadm$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md2 : active raid5 md1[3] md0[4] sda1[5] sdc1[0]
>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [U_UUU]
>       [==>..................]  reshape = 12.8% (125839952/976760640)
> finish=825.1min speed=17186K/sec

This looks OK.  125839952 corresponds to a "Reshape pos'n" of
503359808, which is slightly after where we would expect the restart to
begin - exactly as it should be.
There won't be any info in the logs to tell us exactly where it started,
which is a shame, but it probably started at the right place.

> 
> This is not strictly a raid/mdadm question, but do you know a simple
> way to check everything went OK? I think that an e2fsck (ext4 fs) and
> checksumming some random files located behind the interruption point
> should verify all went well. Plus, just to be sure, I'd like to check
> files located at the interruption point. Is the offset to the
> interruption point into the md device simply the reshape pos'n (e.g.
> 502815488K)?

No - just the things you suggest.
The Reshape pos'n is the address in the array where reshape was up to.
You could try using 'debugfs' to have a look at the context of those blocks.
Remember to divide this number by 4 to get an ext4fs block number (assuming
4K blocks).

Use:   testb BLOCKNUMBER COUNT

to see if the blocks were even allocated.
Then
       icheck BLOCKNUM
on a few of the blocks to see what inode was using them.
Then
       ncheck INODE
to find a path to that inode number.
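Putting those steps together, a non-interactive sketch (debugfs's -R flag runs a single request; the 502815488K position and the 4K filesystem block size are taken as assumptions from the thread):

```shell
POS_K=502815488                # Reshape pos'n in 1K units
BLK=$((POS_K / 4))             # ext4 block number, assuming 4K blocks
echo "$BLK"                    # -> 125703872
# Then, read-only via debugfs:
#   debugfs -R "testb $BLK 16" /dev/md2   # were the blocks allocated?
#   debugfs -R "icheck $BLK"   /dev/md2   # block -> inode
#   debugfs -R "ncheck INODE"  /dev/md2   # inode -> path
```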


Feel free to report your results - particularly if you find anything helpful.

NeilBrown



* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-23  1:53           ` NeilBrown
@ 2011-02-24  4:06             ` Claude Nobs
  0 siblings, 0 replies; 9+ messages in thread
From: Claude Nobs @ 2011-02-24  4:06 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Feb 23, 2011 at 02:53, NeilBrown <neilb@suse.de> wrote:
> No - just the things you suggest.
> The Reshape pos'n is the address in the array where reshape was up to.
> You could try using 'debugfs' to have a look at the context of those blocks.
> Remember to divide this number by 4 to get an ext4fs block number (assuming
> 4K blocks).
>
> Use:   testb BLOCKNUMBER COUNT
>
> to see if the blocks were even allocated.
> Then
>       icheck BLOCKNUM
> on a few of the blocks to see what inode was using them.
> Then
>       ncheck INODE
> to find a path to that inode number.
>
>
> Feel free to report your results - particularly if you find anything helpful.

So... the reshape went through fine. /dev/md1 failed once more, but
doing the same thing over seemed to work fine. I then instantly went
on to resync the array. This, however, did not go so well... it failed
twice at the exact same point (/dev/md1 failing again). Looking at
dmesg I got repeated:

[66289.326235] ata2.00: exception Emask 0x0 SAct 0x1fe1ff SErr 0x0 action 0x0
[66289.326247] ata2.00: irq_stat 0x40000008
[66289.326257] ata2.00: failed command: READ FPDMA QUEUED
[66289.326273] ata2.00: cmd 60/20:a0:20:64:5c/00:00:07:00:00/40 tag 20
ncq 16384 in
[66289.326276]          res 41/40:00:36:64:5c/00:00:07:00:00/40 Emask
0x409 (media error) <F>
[66289.326284] ata2.00: status: { DRDY ERR }
[66289.326290] ata2.00: error: { UNC }
[66289.334377] ata2.00: configured for UDMA/133
[66289.334478] sd 2:0:0:0: [sdf] Unhandled sense code
[66289.334486] sd 2:0:0:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[66289.334499] sd 2:0:0:0: [sdf] Sense Key : Medium Error [current] [descriptor]
[66289.334515] Descriptor sense data with sense descriptors (in hex):
[66289.334522]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[66289.334552]         07 5c 64 36
[66289.334566] sd 2:0:0:0: [sdf] Add. Sense: Unrecovered read error -
auto reallocate failed
[66289.334582] sd 2:0:0:0: [sdf] CDB: Read(10): 28 00 07 5c 64 20 00 00 20 00
[66289.334611] end_request: I/O error, dev sdf, sector 123495478

and smartctl data confirmed a dying /dev/sdf (part of /dev/md1) :

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail
Always       -       10
197 Current_Pending_Sector  0x0032   200   200   000    Old_age
Always       -       2

Did some further digging and copied (dd) the whole /dev/md1 to another
disk (/dev/sdd1), unearthing a total of 5 unrecoverable 4K blocks. If
only I had gone with the less secure non-degraded option you gave me.
:-)
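For the record, a typical rescue copy of that kind looks like the commented command below (a sketch; conv=noerror,sync keeps dd going past read errors and pads each unreadable block with zeroes, which is where the 5 lost 4K blocks come from). Demonstrated here on a scratch file rather than the real devices:

```shell
# Real invocation from this situation (overwrites the target device!):
#   dd if=/dev/md1 of=/dev/sdd1 bs=4K conv=noerror,sync
# Same flags, demonstrated harmlessly on a temp file:
printf 'hello raid' > /tmp/gr_dd_src
dd if=/tmp/gr_dd_src of=/tmp/gr_dd_dst bs=4K conv=noerror,sync 2>/dev/null
head -c 10 /tmp/gr_dd_dst    # -> hello raid (rest of the 4K block is zero-padded)
```

GNU ddrescue, with its retry passes and log file, is usually the better tool for a dying device, but plain dd as above is what was used here.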
However, assembly with the copied disk fails:

bernstein@server:~$ sudo mdadm/mdadm -Avv /dev/md2 /dev/sda1 /dev/md0
/dev/sdd1 /dev/sdc1

mdadm: looking for devices for /dev/md2
mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 4.
mdadm: /dev/md0 is identified as a member of /dev/md2, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md2, slot 2.

mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 0.
mdadm: no uptodate device for slot 1 of /dev/md2
mdadm: failed to add /dev/sdd1 to /dev/md2: Invalid argument
mdadm: added /dev/md0 to /dev/md2 as 3
mdadm: added /dev/sda1 to /dev/md2 as 4
mdadm: added /dev/sdc1 to /dev/md2 as 0

mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.

and dmesg shows :

[22728.265365] md: md2 stopped.
[22728.271142] md: sdd1 does not have a valid v1.2 superblock, not importing!
[22728.271167] md: md_import_device returned -22
[22728.271524] md: bind<md0>
[22728.271854] md: bind<sda1>
[22728.272135] md: bind<sdc1>
[22728.295812] md: sdd1 does not have a valid v1.2 superblock, not importing!
[22728.295838] md: md_import_device returned -22

but mdadm --examine /dev/md1 /dev/sdd1 outputs exactly the same
superblock information for both devices (and apart from device uuid,
checksum, array slot and array state it is identical to sdc1 & sda1):

/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0

     Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
           Name : master:public
  Creation Time : Sat Jan 22 00:15:43 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
     Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed

    Update Time : Wed Feb 23 19:34:36 2011
       Checksum : 2132964 - correct
         Events : 137715


         Layout : left-symmetric
     Chunk Size : 64K

    Array Slot : 3 (0, 1, failed, 2, 3, 4)
   Array State : uuUuu 1 failed

Does it fail because the device sizes of /dev/sdd1 & /dev/md1 differ
(normally reflected in the superblock):
/dev/sdd1:

 Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
/dev/md1:

 Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)

Or any other idea why it complains about an invalid superblock?
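One way to read the numbers above (a guess, not confirmed anywhere in this thread): with v1.2 metadata the data area has to fit between the Data Offset and the end of the device, and on the slightly smaller clone it no longer does:

```shell
AVAIL=1953521392               # Avail Dev Size of /dev/sdd1 (sectors)
NEEDED=$((272 + 1953521280))   # Data Offset + Used Dev Size (sectors)
echo $((NEEDED - AVAIL))       # -> 160 sectors short
```

If that reading is right, the kernel rejects the superblock because the recorded data area overruns the clone by 160 sectors (80 KiB).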

I really hoped that cloning the defective device would get me back in
the game (guessing this is completely transparent to md and the
defective blocks will only corrupt filesystem blocks, not interfere
with md operation), but at this point it seems that restoring from
backup might be faster still.

Thanks,
Claude

@neil sorry about the multiple messages...

