* Likely forced assembly with wrong disk during raid5 grow. Recoverable?
       [not found] <AANLkTikhOAXQ6JAG1fK3x9V3icki8cjn0_ggyQwkGmnt@mail.gmail.com>
@ 2011-02-20  3:23 ` Claude Nobs
  2011-02-20  5:25   ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: Claude Nobs @ 2011-02-20  3:23 UTC (permalink / raw)
  To: linux-raid

Hi All,

I was wondering if someone might be willing to say whether this array is
recoverable.

I had a clean, running RAID 5 array using 4 block devices (two of those
were 2-disk raid0 md devices). Last night I decided it was safe
to grow the array by one disk. But then a) a disk failed, b) a power
loss occurred, c) I probably swapped out the wrong disk and forced
assembly, resulting in an inconsistent state. Here is a complete set
of the actions taken :

> bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
> mdadm: Need to backup 768K of critical section..
> mdadm: ... critical section passed.
> bernstein@server:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md1 : active raid0 sdg1[1] sdf1[0]
>       976770944 blocks super 1.2 64k chunks
>
> md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
>
> md0 : active raid0 sdh1[0] sdb1[1]
>       976770944 blocks super 1.2 64k chunks
>
> unused devices: <none>


Now I thought /dev/sdg1 had failed. Unfortunately I have no log for this
one, just my memory of seeing the mdstat line above change to this :

>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]

Some 10 minutes later a power loss occurred; thanks to a UPS the
server shut down cleanly, as with 'shutdown -h now'. Then I exchanged
/dev/sdg1, rebooted, and in a lapse of judgement forced assembly:

> bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
> mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
> mdadm: Failed to restore critical section for reshape, sorry.
>
> bernstein@server:~$ sudo mdadm --detail /dev/md2
> /dev/md2:
>         Version : 01.02
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
>    Raid Devices : 5
>   Total Devices : 3
> Preferred Minor : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Feb 19 22:32:04 2011
>           State : active, degraded, Not Started
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>   Delta Devices : 1, (4->5)
>
>            Name : master:public
>            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>          Events : 133609
>
>     Number   Major   Minor   RaidDevice State
>        0       8       33        0      active sync   /dev/sdc1
>        1       0        0        1      removed
>        2       0        0        2      removed
>        4       9        0        3      active sync   /dev/block/9:0
>        5       8        1        4      active sync   /dev/sda1

So I reattached the old disk, got /dev/md1 back, and did the
investigation I should have done before :

> bernstein@server:~$ sudo mdadm --examine /dev/sdd1
> /dev/sdd1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
>
>   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:23:09 2011
>        Checksum : fd0c1794 - correct
>          Events : 133567
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 1 (0, 1, failed, 2, 3, 4)
>    Array State : uUuuu 1 failed
> bernstein@server:~$ sudo mdadm --examine /dev/sda1
> /dev/sda1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
>
>   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:32:04 2011
>        Checksum : 12c832c6 - correct
>          Events : 133609
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 5 (0, failed, failed, failed, 3, 4)
>    Array State : u__uU 3 failed
> bernstein@server:~$ sudo mdadm --examine /dev/sdc1
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
>
>   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:32:04 2011
>        Checksum : 8aa7d094 - correct
>          Events : 133609
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 0 (0, failed, failed, failed, 3, 4)
>    Array State : U__uu 3 failed
> bernstein@server:~$ sudo mdadm --examine /dev/md0
> /dev/md0:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
>
>   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:32:04 2011
>        Checksum : 1bbf913b - correct
>          Events : 133609
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 4 (0, failed, failed, failed, 3, 4)
>    Array State : u__Uu 3 failed
> bernstein@server:~$ sudo mdadm --examine /dev/md1
> /dev/md1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>            Name : master:public
>   Creation Time : Sat Jan 22 00:15:43 2011
>      Raid Level : raid5
>    Raid Devices : 5
>
>  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
>
>   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
>   Delta Devices : 1 (4->5)
>
>     Update Time : Sat Feb 19 22:30:29 2011
>        Checksum : 6c591e90 - correct
>          Events : 133603
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>     Array Slot : 3 (0, failed, failed, 2, 3, 4)
>    Array State : u_Uuu 2 failed

So obviously it was not /dev/sdd1 that failed. However (due to that silly
forced assembly?!) the Reshape pos'n field of md0 and sd[ac]1 differs from
md1 by a few bytes, resulting in an inconsistent state...
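One way to see the inconsistency at a glance is to group the superblocks by their (Events, Reshape pos'n) pairs. An illustrative sketch (not an mdadm tool, just a comparison aid) using the values from the --examine outputs above:

```python
from collections import defaultdict

# (Events, Reshape pos'n) copied from the --examine outputs in this mail.
devices = {
    'sdd1': (133567, 489510400),
    'sda1': (133609, 502815488),
    'sdc1': (133609, 502815488),
    'md0':  (133609, 502815488),
    'md1':  (133603, 502809856),
}

# Devices whose superblocks still agree end up in the same group.
groups = defaultdict(list)
for dev, state in devices.items():
    groups[state].append(dev)

for (events, pos), members in sorted(groups.items(), reverse=True):
    print(events, pos, sorted(members))
```

This shows three distinct states: the majority (sda1, sdc1, md0) at events 133609, md1 slightly behind at 133603, and sdd1 far behind at 133567.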

> bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
>
> mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
> bernstein@server:~$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
>       4883823704 blocks super 1.2
>
> md1 : active raid0 sdf1[0] sdg1[1]
>       976770944 blocks super 1.2 64k chunks
>
> md0 : active raid0 sdb1[1] sdh1[0]
>       976770944 blocks super 1.2 64k chunks
>
> unused devices: <none>

I do have a backup, but since recovery from it would take a few days, I'd
like to know whether there is a way to recover the array or whether it's
completely lost.

Any suggestions gratefully received,

claude
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Likely forced assembly with wrong disk during raid5 grow. Recoverable?
  2011-02-20  3:23 ` Likely forced assembly with wrong disk during raid5 grow. Recoverable? Claude Nobs
@ 2011-02-20  5:25   ` NeilBrown
  2011-02-20 14:44     ` Claude Nobs
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2011-02-20  5:25 UTC (permalink / raw)
  To: Claude Nobs; +Cc: linux-raid

On Sun, 20 Feb 2011 04:23:09 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:

> Hi All,
> 
> I was wondering if someone might be willing to say whether this array is
> recoverable.
> 

Probably is.  But don't do anything yet - any further action taken before you
have read all of the following email will probably cause more harm than good.

> I had a clean, running RAID 5 array using 4 block devices (two of those
> were 2-disk raid0 md devices). Last night I decided it was safe
> to grow the array by one disk. But then a) a disk failed, b) a power
> loss occurred, c) I probably swapped out the wrong disk and forced
> assembly, resulting in an inconsistent state. Here is a complete set
> of the actions taken :

Providing this level of information is excellent!


> 
> > bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
> > mdadm: Need to backup 768K of critical section..
> > mdadm: ... critical section passed.
> > bernstein@server:~$ cat /proc/mdstat
> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> > md1 : active raid0 sdg1[1] sdf1[0]
> >       976770944 blocks super 1.2 64k chunks
> >
> > md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> >       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
> >
> > md0 : active raid0 sdh1[0] sdb1[1]
> >       976770944 blocks super 1.2 64k chunks
> >
> > unused devices: <none>

All looks good so far.
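As a sanity check, the finish estimate in that reshape line follows directly from the remaining blocks and the reported speed. A small illustrative sketch (assuming mdstat's K figures are kibibytes, as usual):

```python
# Figures from "reshape = 1.6% (16423164/976760640) finish=902.2min speed=17739K/sec"
done_k = 16423164
total_k = 976760640
speed_k_per_s = 17739

remaining_k = total_k - done_k
eta_min = remaining_k / speed_k_per_s / 60
print(f"finish={eta_min:.1f}min")  # close to mdstat's finish=902.2min
```

The estimate is only instantaneous, which is why mdstat's figure drifts as the reshape speed varies.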

> 
> 
> Now I thought /dev/sdg1 had failed. Unfortunately I have no log for this
> one, just my memory of seeing the mdstat line above change to this :
> 
> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]
> 

Unfortunately it is not possible to know which drive is missing from the
above info.  The [numbers] in brackets don't correspond exactly to the
positions in the array that you might think they do.  The mdstat listing above
has numbers 0,1,3,4,5.

They are the 'Number' column in the --detail output below.  This is /dev/md1
- I can tell from the --examine outputs, but it is a bit confusing.  Newer
versions of mdadm make this a little less confusing.  If you look for the
pattern of U and u in the 'Array State' line, the U is 'this device' and
each 'u' is some other device.

So /dev/md1 had a failure, which could well have been sdg1.
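That pattern convention can be decoded mechanically. An illustrative sketch (not mdadm's own code, just the reading described above):

```python
def decode_array_state(pattern):
    """Decode an mdadm 1.x 'Array State' string such as 'u__uU'.

    'U' marks the device whose superblock this is, 'u' another working
    device, '_' a failed/missing slot.  Illustrative only; mdadm reads
    these flags from superblock fields, not from the printed text.
    """
    this_device = pattern.index('U') if 'U' in pattern else None
    failed = [i for i, c in enumerate(pattern) if c == '_']
    working = [i for i, c in enumerate(pattern) if c in 'Uu']
    return this_device, failed, working

# /dev/sda1 below reports 'u__uU': it is slot 4 and sees slots 1 and 2 failed.
print(decode_array_state('u__uU'))  # (4, [1, 2], [0, 3, 4])
```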


> Some 10 minutes later a power loss occurred; thanks to a UPS the server
> shut down cleanly, as with 'shutdown -h now'. Then I exchanged /dev/sdg1,
> rebooted, and in a lapse of judgement forced assembly:

Perfect timing :-)

> 
> > bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
> > mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
> > mdadm: Failed to restore critical section for reshape, sorry.

This isn't actually a 'forced assembly' as you seem to think.  There is no
'-f' or '--force'.  It didn't cause any harm.

> >
> > bernstein@server:~$ sudo mdadm --detail /dev/md2
> > /dev/md2:
> >         Version : 01.02
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
> >    Raid Devices : 5
> >   Total Devices : 3
> > Preferred Minor : 3
> >     Persistence : Superblock is persistent
> >
> >     Update Time : Sat Feb 19 22:32:04 2011
> >           State : active, degraded, Not Started
                                        ^^^^^^^^^^^^

mdadm has put the devices together as best it can, but has not started the
array because it didn't have enough devices.  This is good.


> >  Active Devices : 3
> > Working Devices : 3
> >  Failed Devices : 0
> >   Spare Devices : 0
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >   Delta Devices : 1, (4->5)
> >
> >            Name : master:public
> >            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >          Events : 133609
> >
> >     Number   Major   Minor   RaidDevice State
> >        0       8       33        0      active sync   /dev/sdc1
> >        1       0        0        1      removed
> >        2       0        0        2      removed
> >        4       9        0        3      active sync   /dev/block/9:0
> >        5       8        1        4      active sync   /dev/sda1

So you now have 2 devices missing.  As long as we can find the devices,
  mdadm --assemble --force
should be able to put them together for you.  But let's see what we have...

> 
> So I reattached the old disk, got /dev/md1 back, and did the
> investigation I should have done before :
> 
> > bernstein@server:~$ sudo mdadm --examine /dev/sdd1
> > /dev/sdd1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
> >
> >   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:23:09 2011
> >        Checksum : fd0c1794 - correct
> >          Events : 133567
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 1 (0, 1, failed, 2, 3, 4)
> >    Array State : uUuuu 1 failed

This device thinks all is well.  The "1 failed" is misleading.  The
   uUuuu
pattern says that all the devices are thought to be working.
Note for later reference:
         Events: 133567
 Reshape pos'n : 489510400


> > bernstein@server:~$ sudo mdadm --examine /dev/sda1
> > /dev/sda1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
> >
> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:32:04 2011
> >        Checksum : 12c832c6 - correct
> >          Events : 133609
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 5 (0, failed, failed, failed, 3, 4)
> >    Array State : u__uU 3 failed

This device thinks devices 1 and 2 have failed (the '_'s).
So 'sdd1' above, and md1.
        Events : 133609 - this has advanced a bit from sdd1
 Reshape Pos'n : 502815488 - this has advanced quite a lot.


> > bernstein@server:~$ sudo mdadm --examine /dev/sdc1
> > /dev/sdc1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
> >
> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:32:04 2011
> >        Checksum : 8aa7d094 - correct
> >          Events : 133609
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 0 (0, failed, failed, failed, 3, 4)
> >    Array State : U__uu 3 failed

 Reshape pos'n, Events, and Array State are identical to sda1.
So these two are in agreement.


> > bernstein@server:~$ sudo mdadm --examine /dev/md0
> > /dev/md0:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
> >
> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:32:04 2011
> >        Checksum : 1bbf913b - correct
> >          Events : 133609
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 4 (0, failed, failed, failed, 3, 4)
> >    Array State : u__Uu 3 failed

Again, exactly the same as sda1 and sdc1.

> > bernstein@server:~$ sudo mdadm --examine /dev/md1
> > /dev/md1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x4
> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
> >            Name : master:public
> >   Creation Time : Sat Jan 22 00:15:43 2011
> >      Raid Level : raid5
> >    Raid Devices : 5
> >
> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
> >     Data Offset : 272 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
> >
> >   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
> >   Delta Devices : 1 (4->5)
> >
> >     Update Time : Sat Feb 19 22:30:29 2011
> >        Checksum : 6c591e90 - correct
> >          Events : 133603
> >
> >          Layout : left-symmetric
> >      Chunk Size : 64K
> >
> >     Array Slot : 3 (0, failed, failed, 2, 3, 4)
> >    Array State : u_Uuu 2 failed

And here is md1.  It thinks device 2 - sdd1 - has failed.
        Events : 133603 - slightly behind the 3 good devices, but well after sdd1.
 Reshape Pos'n : 502809856 - just a little before the 3 good devices too.

> 
> So obviously it was not /dev/sdd1 that failed. However (due to that silly
> forced assembly?!) the Reshape pos'n field of md0 and sd[ac]1 differs from
> md1 by a few bytes, resulting in an inconsistent state...

The way I read it is:

  sdd1 failed first - shortly after Sat Feb 19 22:23:09 2011 - the update
                      time on sdd1
The reshape continued until some time between Sat Feb 19 22:30:29 2011
and Sat Feb 19 22:32:04 2011, when md1 had a failure.
The reshape couldn't continue at that point, so it stopped.

So the data on sdd1 is old (there has been about 8 minutes of reshape since
then) and cannot be used.
The data on md1 is very close to the rest.  The data that was in the process
of being relocated lives in two locations on the 'good' drives, both the new
and the old.  It only lives in the 'old' location on md1.

So what we need to do is re-assemble the array, but telling it that the
reshape has only gone as far as md1 thinks it has.  This will make sure it
repeats that last part of the reshape.

mdadm -Af should do that BUT IT DOESN'T.  Assuming I have thought through
this properly (and I should go through it again with more care), mdadm won't
do the right thing for you.  I need to get it to handle 'reshape' specially
when doing a --force assemble.

> 
> > bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
> >
> > mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
> > bernstein@server:~$ cat /proc/mdstat
> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> > md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
> >       4883823704 blocks super 1.2
> >
> > md1 : active raid0 sdf1[0] sdg1[1]
> >       976770944 blocks super 1.2 64k chunks
> >
> > md0 : active raid0 sdb1[1] sdh1[0]
> >       976770944 blocks super 1.2 64k chunks
> >
> > unused devices: <none>
> 
> I do have a backup, but since recovery from it would take a few days, I'd
> like to know whether there is a way to recover the array or whether it's
> completely lost.
> 
> Any suggestions gratefully received,

The fact that you have a backup is excellent.  You might need it, but I hope
not.

I would like to provide you with a modified version of mdadm which you can
then use to --force assemble the array.  It should be able to get you access
to all your data.
The array will be degraded and will finish reshape in that state.  Then you
will need to add sdd1 back in (Assuming you are confident that it works) and
it will be rebuilt.

Just to go through some of the numbers...

Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
So old stripes have 192K, new stripes have 256K.

The 'good' disks think the reshape has reached 502815488K, which is
1964123 new stripes (2618830.66 old stripes).
sdd1 thinks the reshape has only reached 489510400K, which is 1912150
new stripes (2549533.33 old stripes).

So of the 51973 stripes that have been reshaped since the last metadata
update on sdd1, some will have been done on sdd1, but some not, and we don't
really know how many.  But it is perfectly safe to repeat those stripes
as all writes to that region will have been suspended (and you probably
weren't writing anyway).
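Those stripe counts can be checked directly from the Reshape pos'n values in the --examine outputs. A quick arithmetic sketch, assuming 64K chunks and 3 -> 4 data disks as described above:

```python
chunk_k = 64
old_data_disks, new_data_disks = 3, 4      # raid5 4->5 devices: n-1 data disks
old_stripe_k = chunk_k * old_data_disks    # 192K per old stripe
new_stripe_k = chunk_k * new_data_disks    # 256K per new stripe

good_pos_k = 502815488   # Reshape pos'n on sda1/sdc1/md0
sdd1_pos_k = 489510400   # Reshape pos'n on sdd1

good_stripes = good_pos_k // new_stripe_k   # 1964123 new stripes
sdd1_stripes = sdd1_pos_k // new_stripe_k   # 1912150 new stripes
print(good_stripes - sdd1_stripes)          # stripes reshaped since sdd1's update
```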

So I need to change the loop in Assemble.c which calls ->update_super
with "force-one" to also make sure the reshape_position in the 'chosen'
superblock match the oldest 'forced' superblock.
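In outline, that change amounts to winding the chosen superblock's reshape position back to the oldest one among the force-included devices, so the tail of the reshape is repeated. A hedged Python sketch of the idea only (the real change is in mdadm's C code in Assemble.c; the names below are illustrative):

```python
def clamp_reshape_position(chosen, forced):
    """Sketch of the --force assembly fix described above: make the
    'chosen' superblock's reshape_position match the oldest (smallest)
    position among the superblocks being force-included, so the last
    part of the reshape is safely redone.  Illustrative pseudologic;
    not mdadm's actual implementation.
    """
    oldest = min(sb['reshape_position'] for sb in forced)
    if oldest < chosen['reshape_position']:
        chosen['reshape_position'] = oldest
    return chosen

# With md1 forced in at 502809856K, the good disks' 502815488K is wound back:
chosen = {'reshape_position': 502815488}
print(clamp_reshape_position(chosen, [{'reshape_position': 502809856}]))
```

Repeating those stripes is safe for the reason given above: the data in that region exists in both old and new layouts on the good drives.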

So if you are able to wait a day, I'll try to write a patch first thing
tomorrow and send it to you.

Thanks for the excellent problem report.

NeilBrown



* Re: Likely forced assembly with wrong disk during raid5 grow. Recoverable?
  2011-02-20  5:25   ` NeilBrown
@ 2011-02-20 14:44     ` Claude Nobs
  2011-02-20 14:47       ` Mathias Burén
  2011-02-21  0:53       ` NeilBrown
  0 siblings, 2 replies; 9+ messages in thread
From: Claude Nobs @ 2011-02-20 14:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Sun, Feb 20, 2011 at 06:25, NeilBrown <neilb@suse.de> wrote:
> On Sun, 20 Feb 2011 04:23:09 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:
>
>> Hi All,
>>
>> I was wondering if someone might be willing to say whether this array is
>> recoverable.
>>
>
> Probably is.  But don't do anything yet - any further action taken before you
> have read all of the following email will probably cause more harm than good.
>
>> I had a clean, running RAID 5 array using 4 block devices (two of those
>> were 2-disk raid0 md devices). Last night I decided it was safe
>> to grow the array by one disk. But then a) a disk failed, b) a power
>> loss occurred, c) I probably swapped out the wrong disk and forced
>> assembly, resulting in an inconsistent state. Here is a complete set
>> of the actions taken :
>
> Providing this level of information is excellent!
>
>
>>
>> > bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
>> > mdadm: Need to backup 768K of critical section..
>> > mdadm: ... critical section passed.
>> > bernstein@server:~$ cat /proc/mdstat
>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>> > md1 : active raid0 sdg1[1] sdf1[0]
>> >       976770944 blocks super 1.2 64k chunks
>> >
>> > md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>> >       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
>> >
>> > md0 : active raid0 sdh1[0] sdb1[1]
>> >       976770944 blocks super 1.2 64k chunks
>> >
>> > unused devices: <none>
>
> All looks good so far.
>
>>
>>
>> Now I thought /dev/sdg1 had failed. Unfortunately I have no log for this
>> one, just my memory of seeing the mdstat line above change to this :
>>
>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]
>>
>
> Unfortunately it is not possible to know which drive is missing from the
> above info.  The [numbers] in brackets don't correspond exactly to the
> positions in the array that you might think they do.  The mdstat listing above
> has numbers 0,1,3,4,5.
>
> They are the 'Number' column in the --detail output below.  This is /dev/md1
> - I can tell from the --examine outputs, but it is a bit confusing.  Newer
> versions of mdadm make this a little less confusing.  If you look for the
> pattern of U and u in the 'Array State' line, the U is 'this device' and
> each 'u' is some other device.

Actually this is running a stock Ubuntu 10.10 server kernel. But as
it is from my memory it could very well have been :

       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [U_UUU]

>
> So /dev/md1 had a failure, which could well have been sdg1.
>
>
>> Some 10 minutes later a power loss occurred; thanks to a UPS the server
>> shut down cleanly, as with 'shutdown -h now'. Then I exchanged /dev/sdg1,
>> rebooted, and in a lapse of judgement forced assembly:
>
> Perfect timing :-)
>
>>
>> > bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
>> > mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
>> > mdadm: Failed to restore critical section for reshape, sorry.
>
> This isn't actually a 'forced assembly' as you seem to think.  There is no
> '-f' or '--force'.  It didn't cause any harm.

Phew... at last some luck! That "Failed to restore critical section
for reshape, sorry" really scared the hell out of me.
But then again it got me paying attention and stopped me from making things worse... :-)

>
>> >
>> > bernstein@server:~$ sudo mdadm --detail /dev/md2
>> > /dev/md2:
>> >         Version : 01.02
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
>> >    Raid Devices : 5
>> >   Total Devices : 3
>> > Preferred Minor : 3
>> >     Persistence : Superblock is persistent
>> >
>> >     Update Time : Sat Feb 19 22:32:04 2011
>> >           State : active, degraded, Not Started
>                                        ^^^^^^^^^^^^
>
> mdadm has put the devices together as best it can, but has not started the
> array because it didn't have enough devices.  This is good.
>
>
>> >  Active Devices : 3
>> > Working Devices : 3
>> >  Failed Devices : 0
>> >   Spare Devices : 0
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >   Delta Devices : 1, (4->5)
>> >
>> >            Name : master:public
>> >            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >          Events : 133609
>> >
>> >     Number   Major   Minor   RaidDevice State
>> >        0       8       33        0      active sync   /dev/sdc1
>> >        1       0        0        1      removed
>> >        2       0        0        2      removed
>> >        4       9        0        3      active sync   /dev/block/9:0
>> >        5       8        1        4      active sync   /dev/sda1
>
> So you now have 2 devices missing.  As long as we can find the devices,
>  mdadm --assemble --force
> should be able to put them together for you.  But let's see what we have...
>
>>
>> So I reattached the old disk, got /dev/md1 back, and did the
>> investigation I should have done before :
>>
>> > bernstein@server:~$ sudo mdadm --examine /dev/sdd1
>> > /dev/sdd1:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
>> >
>> >   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:23:09 2011
>> >        Checksum : fd0c1794 - correct
>> >          Events : 133567
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 1 (0, 1, failed, 2, 3, 4)
>> >    Array State : uUuuu 1 failed
>
> This device thinks all is well.  The "1 failed" is misleading.  The
>   uUuuu
> pattern says that all the devices are thought to be working.
> Note for later reference:
>         Events: 133567
>  Reshape pos'n : 489510400
>
>
>> > bernstein@server:~$ sudo mdadm --examine /dev/sda1
>> > /dev/sda1:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
>> >
>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:32:04 2011
>> >        Checksum : 12c832c6 - correct
>> >          Events : 133609
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 5 (0, failed, failed, failed, 3, 4)
>> >    Array State : u__uU 3 failed
>
> This device thinks devices 1 and 2 have failed (the '_'s).
> So 'sdd1' above, and md1.
>        Events : 133609 - this has advanced a bit from sdd1
>  Reshape Pos'n : 502815488 - this has advanced quite a lot.
>
>
>> > bernstein@server:~$ sudo mdadm --examine /dev/sdc1
>> > /dev/sdc1:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
>> >
>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:32:04 2011
>> >        Checksum : 8aa7d094 - correct
>> >          Events : 133609
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 0 (0, failed, failed, failed, 3, 4)
>> >    Array State : U__uu 3 failed
>
>  Reshape pos'n, Events, and Array State are identical to sda1.
> So these two are in agreement.
>
>
>> > bernstein@server:~$ sudo mdadm --examine /dev/md0
>> > /dev/md0:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
>> >
>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:32:04 2011
>> >        Checksum : 1bbf913b - correct
>> >          Events : 133609
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 4 (0, failed, failed, failed, 3, 4)
>> >    Array State : u__Uu 3 failed
>
> again, exactly the same as sda1 and sdc1.
>
>> > bernstein@server:~$ sudo mdadm --examine /dev/md1
>> > /dev/md1:
>> >           Magic : a92b4efc
>> >         Version : 1.2
>> >     Feature Map : 0x4
>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>> >            Name : master:public
>> >   Creation Time : Sat Jan 22 00:15:43 2011
>> >      Raid Level : raid5
>> >    Raid Devices : 5
>> >
>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>> >     Data Offset : 272 sectors
>> >    Super Offset : 8 sectors
>> >           State : clean
>> >     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
>> >
>> >   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
>> >   Delta Devices : 1 (4->5)
>> >
>> >     Update Time : Sat Feb 19 22:30:29 2011
>> >        Checksum : 6c591e90 - correct
>> >          Events : 133603
>> >
>> >          Layout : left-symmetric
>> >      Chunk Size : 64K
>> >
>> >     Array Slot : 3 (0, failed, failed, 2, 3, 4)
>> >    Array State : u_Uuu 2 failed
>
> And here is md1.  Thinks device 2 - sdd1 - has failed.
>        Events : 133603 - slightly behind the 3 good devices, but well after
>                                                  sdd1
>  Reshape Pos'n : 502809856 - just a little before the 3 good devices too.
>
>>
>> so obviously not /dev/sdd1 failed. however (due to that silly forced
>> assembly?!) the reshape pos'n field of md0, sd[ac]1 differs from md1 a
>> few bytes, resulting in an inconsistent state...
>
> The way I read it is:
>
>  sdd1 failed first - shortly after Sat Feb 19 22:23:09 2011 - the update time on sdd1
> reshape continued until some time between Sat Feb 19 22:30:29 2011
> and Sat Feb 19 22:32:04 2011 when md1 had a failure.
> The reshape couldn't continue now, so it stopped.
>
> So the data on sdd1 is well out of date (there has been about 8 minutes of
> reshape since then) and cannot be used.
> The data on md1 is very close to the rest.  The data that was in the process
> of being relocated lives in two locations on the 'good' drives, both the new
> and the old.  It only lives in the 'old' location on md1.
>
> So what we need to do is re-assemble the array, but telling it that the
> reshape has only gone as far as md1 thinks it has.  This will make sure it
> repeats that last part of the reshape.
>
> mdadm -Af should do that BUT IT DOESN'T.  Assuming I have thought through
> this properly (and I should go through it again with more care), mdadm won't
> do the right thing for you.  I need to get it to handle 'reshape' specially
> when doing a --force assemble.

exactly what i was thinking of doing, glad i waited and asked.

>
>>
>> > bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
>> >
>> > mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
>> > bernstein@server:~$ cat /proc/mdstat
>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>> > md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
>> >       4883823704 blocks super 1.2
>> >
>> > md1 : active raid0 sdf1[0] sdg1[1]
>> >       976770944 blocks super 1.2 64k chunks
>> >
>> > md0 : active raid0 sdb1[1] sdh1[0]
>> >       976770944 blocks super 1.2 64k chunks
>> >
>> > unused devices: <none>
>>
>> i do have a backup but since recovery from it takes a few days, i'd
>> like to know if there is a way to recover the array or if it's
>> completely lost.
>>
>> Any suggestions gratefully received,
>
> The fact that you have a backup is excellent.  You might need it, but I hope
> not.
>
> I would like to provide you with a modified version of mdadm which you can
> then use to --force assemble the array.  It should be able to get you access
> to all your data.
> The array will be degraded and will finish reshape in that state.  Then you
> will need to add sdd1 back in (Assuming you are confident that it works) and
> it will be rebuilt.
>
> Just to go through some of the numbers...
>
> Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
> So old stripes have 192K, new stripes have 256K.
>
> The 'good' disks think reshape has reached 502815488K which is
> 1964123 new stripes. (2618830.66 old stripes)
> md1 thinks reshape has only reached 489510400K which is 1912150
> new stripes (2549533.33 old stripes).

i think you mixed up sdd1 with md1 here? (the numbers above for md1
are for sdd1. md1 would be :  reshape has reached 502809856K which
would be 1964101 new stripes. so the difference between the good disks
and md1 would be 22 stripes.)
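Claude's correction is easy to verify; the stripe arithmetic is plain division over the reshape positions quoted above (no mdadm involved):

```python
# Re-check the stripe arithmetic for the 4->5 grow (3 -> 4 data disks).
CHUNK_K = 64
OLD_STRIPE_K = 3 * CHUNK_K   # 192K of data per old stripe
NEW_STRIPE_K = 4 * CHUNK_K   # 256K of data per new stripe

good_pos_k = 502815488   # Reshape pos'n on sda1, sdc1 and md0
md1_pos_k  = 502809856   # Reshape pos'n on md1
sdd1_pos_k = 489510400   # Reshape pos'n on sdd1

print(good_pos_k // NEW_STRIPE_K)                 # 1964123 new stripes
print(md1_pos_k // NEW_STRIPE_K)                  # 1964101 new stripes
print((good_pos_k - md1_pos_k) // NEW_STRIPE_K)   # md1 is 22 stripes behind
print((good_pos_k - sdd1_pos_k) // NEW_STRIPE_K)  # sdd1 is 51973 stripes behind
```

22 new stripes at 256K each is about 5.5M of data, which matches the figure Claude mentions further down.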

>
> So of the 51973 stripes that have been reshaped since the last metadata
> update on sdd1, some will have been done on sdd1, but some not, and we don't
> really know how many.  But it is perfectly safe to repeat those stripes
> as all writes to that region will have been suspended (and you probably
> weren't writing anyway).

jep there was nothing writing to the array. so now i am a little
confused, if you meant sdd1 (which failed first is 51973 stripes
behind) this would imply that at least so many stripes of data are
kept of the old (3 data disks) configuration as well as the new one?
if continuing from there is possible then the array would no longer be
degraded right? so i think you meant md1 (22 stripes behind), as
keeping 5.5M of data from the old and new config seems more
reasonable. however this is just a guess :-)

>
> So I need to change the loop in Assemble.c which calls ->update_super
> with "force-one" to also make sure the reshape_position in the 'chosen'
> superblock match the oldest 'forced' superblock.

uh... ah... probably, i have zero knowledge of kernel code :-)
i guess it should take into account that the oldest superblock (sdd1
in this case) may already be out of the section where the data (in the
old config) still exists? but i guess you already thought of that...
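Neil's proposed Assemble.c change boils down to: when force-assembling mid-reshape, roll the chosen superblock's reshape position back to the oldest forced superblock's value, so the tail of the reshape is repeated rather than skipped. A hypothetical Python model of that selection (the real logic is C inside mdadm; this only illustrates the idea):

```python
# Model of the force-assemble fix: when member superblocks disagree about
# how far the reshape got, restart from the *minimum* position among the
# members being assembled, so the last stripes are safely redone.
def choose_reshape_position(superblocks):  # illustrative only, not mdadm code
    # superblocks: list of (name, events, reshape_position) for the members
    # being force-assembled (members too stale to re-include are excluded).
    return min(pos for _name, _events, pos in superblocks)

members = [
    ("sdc1", 133609, 502815488),
    ("md0",  133609, 502815488),
    ("sda1", 133609, 502815488),
    ("md1",  133603, 502809856),  # slightly behind: reshape restarts here
]
print(choose_reshape_position(members))  # 502809856
```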

>
> So if you are able to wait a day, I'll try to write a patch first thing
> tomorrow and send it to you.

sure, that would be awesome! that boils down to compiling the patched
kernel doesn't it? this will probably take a few days as the system is
quite slow and i'd have to get up to speed with kernel compiling. but
shouldn't be a problem. would i have to patch the ubuntu kernel (based
on 2.6.35.4) or the latest 2.6.38-rc from kernel.org?

>
> Thanks for the excellent problem report.
>
> NeilBrown

Well i thank you for providing such an elaborate and friendly answer!
this is actually my first mailing list post and considering how many
questions get ignored (don't know about this list though) i just hoped
someone would at least answer with a one liner... i never expected
this. so thanks again.

Claude
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-20 14:44     ` Claude Nobs
@ 2011-02-20 14:47       ` Mathias Burén
  2011-02-21  0:53       ` NeilBrown
  1 sibling, 0 replies; 9+ messages in thread
From: Mathias Burén @ 2011-02-20 14:47 UTC (permalink / raw)
  To: Claude Nobs; +Cc: NeilBrown, linux-raid

On 20 February 2011 14:44, Claude Nobs <claudenobs@blunet.cc> wrote:
> On Sun, Feb 20, 2011 at 06:25, NeilBrown <neilb@suse.de> wrote:
>> On Sun, 20 Feb 2011 04:23:09 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:
>>
>>> Hi All,
>>>
>>> I was wondering if someone might be willing to share if this array is
>>> recoverable.
>>>
>>
>> Probably is.  But don't do anything yet - any further action until you have
>> read all of the following email, will probably cause more harm than good.
>>
>>> I had a clean, running raid5 using 4 block devices (two of those were
>>> 2 disk raid0 md devices) in RAID 5. Last night I decided it was safe
>>> to grow the array by one disk. But then a) a disk failed, b) a power
>>> loss occured, c) i probably switched the wrong disk and forced
>>> assembly, resulting in an inconsistent state. Here is a complete set
>>> of actions taken :
>>
>> Providing this level of information is excellent!
>>
>>
>>>
>>> > bernstein@server:~$ sudo mdadm --grow --raid-devices=5 --backup-file=/raid.grow.backupfile /dev/md2
>>> > mdadm: Need to backup 768K of critical section..
>>> > mdadm: ... critical section passed.
>>> > bernstein@server:~$ cat /proc/mdstat
>>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>>> > md1 : active raid0 sdg1[1] sdf1[0]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > md2 : active raid5 sda1[5] md0[4] md1[3] sdd1[1] sdc1[0]
>>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
>>> >       [>....................]  reshape =  1.6% (16423164/976760640) finish=902.2min speed=17739K/sec
>>> >
>>> > md0 : active raid0 sdh1[0] sdb1[1]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > unused devices: <none>
>>
>> All looks good so-far.
>>
>>>
>>>
>>> now i thought /dev/sdg1 failed. unfortunately i have no log for this
>>> one, just my memory of seeing this changed to the one above :
>>>
>>> >       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [UU_UU]
>>>
>>
>> Unfortunately it is not possible to know which drive is missing from the
>> above info.  The [numbers] in brackets don't exactly correspond to the
>> positions in the array that you might think they do.  The mdstat listing above
>> has numbers 0,1,3,4,5.
>>
>> They are the 'Number' column in the --detail output below.  This is /dev/md1
>> - I can tell from the --examine outputs, but it is a bit confusing.  Newer
>> versions of mdadm make this a little less confusing.  If you look for
>> patterns of U and u  in the 'Array State' line, the U is 'this device', the
>> 'u' is some other devices.
>
> Actually this is running a stock Ubuntu 10.10 server kernel. But as
> it is from my memory it could very well have been :
>
>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [U_UUU]
>
>>
>> So /dev/md1 had a failure, so it could well have been sdg1.
>>
>>
>>> some 10 minutes later a power loss occurred, thanks to an ups the
>>> server shut down as with 'shutdown -h now'. now i exchanged /dev/sdg1,
>>> rebooted and in a lapse of judgement forced assembly:
>>
>> Perfect timing :-)
>>
>>>
>>> > bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
>>> > mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
>>> > mdadm: Failed to restore critical section for reshape, sorry.
>>
>> This isn't actually a 'forced assembly' as you seem to think.  There is no
>> '-f' or '--force'.  It didn't cause any harm.
>
> phew... at last some luck! that "Failed to restore critical section
> for reshape, sorry" really scared the hell out of me.
> But then again it got me paying attention and stop making things worse... :-)
>
>>
>>> >
>>> > bernstein@server:~$ sudo mdadm --detail /dev/md2
>>> > /dev/md2:
>>> >         Version : 01.02
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >   Used Dev Size : 976760640 (931.51 GiB 1000.20 GB)
>>> >    Raid Devices : 5
>>> >   Total Devices : 3
>>> > Preferred Minor : 3
>>> >     Persistence : Superblock is persistent
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >           State : active, degraded, Not Started
>>                                        ^^^^^^^^^^^^
>>
>> mdadm has put the devices together as best it can, but has not started the
>> array because it didn't have enough devices.  This is good.
>>
>>
>>> >  Active Devices : 3
>>> > Working Devices : 3
>>> >  Failed Devices : 0
>>> >   Spare Devices : 0
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >   Delta Devices : 1, (4->5)
>>> >
>>> >            Name : master:public
>>> >            UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >          Events : 133609
>>> >
>>> >     Number   Major   Minor   RaidDevice State
>>> >        0       8       33        0      active sync   /dev/sdc1
>>> >        1       0        0        1      removed
>>> >        2       0        0        2      removed
>>> >        4       9        0        3      active sync   /dev/block/9:0
>>> >        5       8        1        4      active sync   /dev/sda1
>>
>> So you now have 2 devices missing.  As long as we can find the devices,
>>  mdadm --assemble --force
>> should be able to put them together for you.  But let's see what we have...
>>
>>>
>>> so i reattached the old disk, got /dev/md1 back and did the
>>> investigation i should have done before :
>>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/sdd1
>>> > /dev/sdd1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 5e37fc7c:50ff3b50:de3755a1:6bdbebc6
>>> >
>>> >   Reshape pos'n : 489510400 (466.83 GiB 501.26 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:23:09 2011
>>> >        Checksum : fd0c1794 - correct
>>> >          Events : 133567
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 1 (0, 1, failed, 2, 3, 4)
>>> >    Array State : uUuuu 1 failed
>>
>> This device thinks all is well.  The "1 failed" is misleading.  The
>>   uUuuu
>> pattern says that all the devices are thought to be working.
>> Note for later reference:
>>         Events: 133567
>>  Reshape pos'n : 489510400
>>
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/sda1
>>> > /dev/sda1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : baebd175:e4128e4c:f768b60f:4df18f77
>>> >
>>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >        Checksum : 12c832c6 - correct
>>> >          Events : 133609
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 5 (0, failed, failed, failed, 3, 4)
>>> >    Array State : u__uU 3 failed
>>
>> This device thinks devices 1 and 2 have failed (the '_'s).
>> So 'sdd1' above, and md1.
>>        Events : 133609 - this has advanced a bit from sdd1
>>  Reshape Pos'n : 502815488 - this has advanced quite a lot.
>>
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/sdc1
>>> > /dev/sdc1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 82f5284a:2bffb837:19d366ab:ef2e3d94
>>> >
>>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >        Checksum : 8aa7d094 - correct
>>> >          Events : 133609
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 0 (0, failed, failed, failed, 3, 4)
>>> >    Array State : U__uu 3 failed
>>
>>  Reshape pos'n, Events, and Array State are identical to sda1.
>> So these two are in agreement.
>>
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/md0
>>> > /dev/md0:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 83ecd60d:f3947a5e:a69c4353:3c4a0893
>>> >
>>> >   Reshape pos'n : 502815488 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:32:04 2011
>>> >        Checksum : 1bbf913b - correct
>>> >          Events : 133609
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 4 (0, failed, failed, failed, 3, 4)
>>> >    Array State : u__Uu 3 failed
>>
>> again, exactly the same as sda1 and sdc1.
>>
>>> > bernstein@server:~$ sudo mdadm --examine /dev/md1
>>> > /dev/md1:
>>> >           Magic : a92b4efc
>>> >         Version : 1.2
>>> >     Feature Map : 0x4
>>> >      Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
>>> >            Name : master:public
>>> >   Creation Time : Sat Jan 22 00:15:43 2011
>>> >      Raid Level : raid5
>>> >    Raid Devices : 5
>>> >
>>> >  Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
>>> >      Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
>>> >   Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
>>> >     Data Offset : 272 sectors
>>> >    Super Offset : 8 sectors
>>> >           State : clean
>>> >     Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed
>>> >
>>> >   Reshape pos'n : 502809856 (479.52 GiB 514.88 GB)
>>> >   Delta Devices : 1 (4->5)
>>> >
>>> >     Update Time : Sat Feb 19 22:30:29 2011
>>> >        Checksum : 6c591e90 - correct
>>> >          Events : 133603
>>> >
>>> >          Layout : left-symmetric
>>> >      Chunk Size : 64K
>>> >
>>> >     Array Slot : 3 (0, failed, failed, 2, 3, 4)
>>> >    Array State : u_Uuu 2 failed
>>
>> And here is md1.  Thinks device 2 - sdd1 - has failed.
>>        Events : 133603 - slightly behind the 3 good devices, but well after
>>                                                  sdd1
>>  Reshape Pos'n : 502809856 - just a little before the 3 good devices too.
>>
>>>
>>> so obviously not /dev/sdd1 failed. however (due to that silly forced
>>> assembly?!) the reshape pos'n field of md0, sd[ac]1 differs from md1 a
>>> few bytes, resulting in an inconsistent state...
>>
>> The way I read it is:
>>
>>  sdd1 failed first - shortly after Sat Feb 19 22:23:09 2011 - the update time on sdd1
>> reshape continued until some time between Sat Feb 19 22:30:29 2011
>> and Sat Feb 19 22:32:04 2011 when md1 had a failure.
>> The reshape couldn't continue now, so it stopped.
>>
>> So the data on sdd1 is well out of date (there has been about 8 minutes of
>> reshape since then) and cannot be used.
>> The data on md1 is very close to the rest.  The data that was in the process
>> of being relocated lives in two locations on the 'good' drives, both the new
>> and the old.  It only lives in the 'old' location on md1.
>>
>> So what we need to do is re-assemble the array, but telling it that the
>> reshape has only gone as far as md1 thinks it has.  This will make sure it
>> repeats that last part of the reshape.
>>
>> mdadm -Af should do that BUT IT DOESN'T.  Assuming I have thought through
>> this properly (and I should go through it again with more care), mdadm won't
>> do the right thing for you.  I need to get it to handle 'reshape' specially
>> when doing a --force assemble.
>
> exactly what i was thinking of doing, glad i waited and asked.
>
>>
>>>
>>> > bernstein@server:~$ sudo mdadm --assemble /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdd1 /dev/sdc1
>>> >
>>> > mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.
>>> > bernstein@server:~$ cat /proc/mdstat
>>> > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>>> > md2 : inactive sdc1[0](S) sda1[5](S) md0[4](S) md1[3](S) sdd1[1](S)
>>> >       4883823704 blocks super 1.2
>>> >
>>> > md1 : active raid0 sdf1[0] sdg1[1]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > md0 : active raid0 sdb1[1] sdh1[0]
>>> >       976770944 blocks super 1.2 64k chunks
>>> >
>>> > unused devices: <none>
>>>
>>> i do have a backup but since recovery from it takes a few days, i'd
>>> like to know if there is a way to recover the array or if it's
>>> completely lost.
>>>
>>> Any suggestions gratefully received,
>>
>> The fact that you have a backup is excellent.  You might need it, but I hope
>> not.
>>
>> I would like to provide you with a modified version of mdadm which you can
>> then use to --force assemble the array.  It should be able to get you access
>> to all your data.
>> The array will be degraded and will finish reshape in that state.  Then you
>> will need to add sdd1 back in (Assuming you are confident that it works) and
>> it will be rebuilt.
>>
>> Just to go through some of the numbers...
>>
>> Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
>> So old stripes have 192K, new stripes have 256K.
>>
>> The 'good' disks think reshape has reached 502815488K which is
>> 1964123 new stripes. (2618830.66 old stripes)
>> md1 thinks reshape has only reached 489510400K which is 1912150
>> new stripes (2549533.33 old stripes).
>
> i think you mixed up sdd1 with md1 here? (the numbers above for md1
> are for sdd1. md1 would be :  reshape has reached 502809856K which
> would be 1964101 new stripes. so the difference between the good disks
> and md1 would be 22 stripes.)
>
>>
>> So of the 51973 stripes that have been reshaped since the last metadata
>> update on sdd1, some will have been done on sdd1, but some not, and we don't
>> really know how many.  But it is perfectly safe to repeat those stripes
>> as all writes to that region will have been suspended (and you probably
>> weren't writing anyway).
>
> jep there was nothing writing to the array. so now i am a little
> confused, if you meant sdd1 (which failed first is 51973 stripes
> behind) this would imply that at least so many stripes of data are
> kept of the old (3 data disks) configuration as well as the new one?
> if continuing from there is possible then the array would no longer be
> degraded right? so i think you meant md1 (22 stripes behind), as
> keeping 5.5M of data from the old and new config seems more
> reasonable. however this is just a guess :-)
>
>>
>> So I need to change the loop in Assemble.c which calls ->update_super
>> with "force-one" to also make sure the reshape_position in the 'chosen'
>> superblock match the oldest 'forced' superblock.
>
> uh... ah... probably, i have zero knowledge of kernel code :-)
> i guess it should take into account that the oldest superblock (sdd1
> in this case) may already be out of the section where the data (in the
> old config) still exists? but i guess you already thought of that...
>
>>
>> So if you are able to wait a day, I'll try to write a patch first thing
>> tomorrow and send it to you.
>
> sure, that would be awesome! that boils down to compiling the patched
> kernel doesn't it? this will probably take a few days as the system is
> quite slow and i'd have to get up to speed with kernel compiling. but
> shouldn't be a problem. would i have to patch the ubuntu kernel (based
> on 2.6.35.4) or the latest 2.6.38-rc from kernel.org?
>
>>
>> Thanks for the excellent problem report.
>>
>> NeilBrown
>
> Well i thank you for providing such an elaborate and friendly answer!
> this is actually my first mailing list post and considering how many
> questions get ignored (don't know about this list though) i just hoped
> someone would at least answer with a one liner... i never expected
> this. so thanks again.
>
> Claude
>

Just a quick FYI, you can find (new, and unreleased) Ubuntu kernels
here: http://kernel.ubuntu.com/~kernel-ppa/mainline/

// Mathias

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-20 14:44     ` Claude Nobs
  2011-02-20 14:47       ` Mathias Burén
@ 2011-02-21  0:53       ` NeilBrown
  2011-02-21  1:03         ` NeilBrown
  2011-02-23  0:56         ` Claude Nobs
  1 sibling, 2 replies; 9+ messages in thread
From: NeilBrown @ 2011-02-21  0:53 UTC (permalink / raw)
  To: Claude Nobs; +Cc: linux-raid

On Sun, 20 Feb 2011 15:44:35 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:

> > They are the 'Number' column in the --detail output below.  This is /dev/md1
> > - I can tell from the --examine outputs, but it is a bit confusing.  Newer
> > versions of mdadm make this a little less confusing.  If you look for
> > patterns of U and u  in the 'Array State' line, the U is 'this device', the
> > 'u' is some other devices.
> 
> Actually this is running a stock Ubuntu 10.10 server kernel. But as
> it is from my memory it could very well have been :
> 
>        2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/5] [U_UUU]
> 

I'm quite sure it would have been '[U_UUU]' as you say.

When I say "Newer versions" I mean of mdadm, not the kernel.

What does
   mdadm -V

show?  Version 3.0 or later gives less confusing output for "mdadm --examine"
on 1.x metadata.

> > Just to go through some of the numbers...
> >
> > Chunk size is 64K.  Reshape was 4->5, so 3 -> 4 data disks.
> > So old stripes have 192K, new stripes have 256K.
> >
> > The 'good' disks think reshape has reached 502815488K which is
> > 1964123 new stripes. (2618830.66 old stripes)
> > md1 thinks reshape has only reached 489510400K which is 1912150
> > new stripes (2549533.33 old stripes).
> 
> i think you mixed up sdd1 with md1 here? (the numbers above for md1
> are for sdd1. md1 would be :  reshape has reached 502809856K which
> would be 1964101 new stripes. so the difference between the good disks
> and md1 would be 22 stripes.)

Yes, I got them mixed up.  But the net result is the same - the 'new' stripe
numbers haven't got close to overwriting the 'old' stripe numbers.
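That claim can be sanity-checked with arithmetic: during a grow, the new layout writes stripe s at per-disk chunk offset s, and the old-layout data previously stored at that offset belongs to a linear position far below the current reshape position, i.e. it has already been relocated. A sketch assuming the figures quoted in this thread:

```python
# Sanity-check that the reshape's write front (new layout) has not caught
# up with any un-relocated data in the old layout.
CHUNK_K = 64
OLD_DATA_DISKS, NEW_DATA_DISKS = 3, 4

pos_k = 502815488                                 # reshape pos'n, good disks
new_front_chunks = pos_k // (NEW_DATA_DISKS * CHUNK_K)  # per-disk chunk offset
# The old-layout data stored at that same per-disk offset corresponds to
# roughly this linear array position:
old_data_at_front_k = new_front_chunks * OLD_DATA_DISKS * CHUNK_K
print(old_data_at_front_k)          # 377111616
# Everything below pos_k was already relocated, so the write front is only
# overwriting data that exists in the new layout too:
print(old_data_at_front_k < pos_k)  # True
```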

> 
> >
> > So of the 51973 stripes that have been reshaped since the last metadata
> > update on sdd1, some will have been done on sdd1, but some not, and we don't
> > really know how many.  But it is perfectly safe to repeat those stripes
> > as all writes to that region will have been suspended (and you probably
> > weren't writing anyway).
> 
> jep there was nothing writing to the array. so now i am a little
> confused, if you meant sdd1 (which failed first is 51973 stripes
> behind) this would imply that at least so many stripes of data are
> kept of the old (3 data disks) configuration as well as the new one?
> if continuing from there is possible then the array would no longer be
> degraded right? so i think you meant md1 (22 stripes behind), as
> keeping 5.5M of data from the old and new config seems more
> reasonable. however this is just a guess :-)

Yes, it probably is possible to re-assemble the array to include sdd1 and not
have a degraded array, and still have all your data safe - providing you are
sure that nothing at all changed on the array (e.g. maybe it was unmounted?).

I'm not sure I'd recommend it though....  I cannot see anything that would go
wrong, but it is somewhat unknown territory.
Up to you...

If you:

% git clone git://neil.brown.name/mdadm master
% cd mdadm
% make
% sudo bash
# ./mdadm -S /dev/md2
# ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1

It should restart your array - degraded - and repeat the last stages of
reshape just in case.

Alternately, before you run 'make' you could edit Assemble.c, find:
	while (force && !enough(content->array.level, content->array.raid_disks,
				content->array.layout, 1,
				avail, okcnt)) {

around line 818, and change the '1,' to '0,', then run make, mdadm -S, and
then
# ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1 /dev/sdd1

it should assemble the array non-degraded and repeat all of the reshape since
sdd1 fell out of the array.
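If editing Assemble.c by hand feels error-prone, the same one-character change can be scripted. A sketch, which assumes the file still contains the exact line quoted above (shown here on the line itself rather than on a checkout):

```shell
# Flip the fourth argument of enough() from 1 to 0, so that forced
# assembly accepts the full (non-degraded) member set.
# In the real checkout you would run the sed with -i against Assemble.c.
echo 'content->array.layout, 1,' \
  | sed 's/content->array\.layout, 1,/content->array.layout, 0,/'
# -> content->array.layout, 0,
```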

As you have a backup, this is probably safe because even if it goes bad you
can restore from backups - not that I expect it to go bad but ....

> >
> > Thanks for the excellent problem report.
> >
> > NeilBrown
> 
> Well, I thank you for providing such an elaborate and friendly answer!
> This is actually my first mailing list post, and considering how many
> questions get ignored (I don't know about this list though) I just
> hoped someone would at least answer with a one-liner... I never
> expected this. So thanks again.

All part of the service... :-)

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-21  0:53       ` NeilBrown
@ 2011-02-21  1:03         ` NeilBrown
  2011-02-23  0:56         ` Claude Nobs
  1 sibling, 0 replies; 9+ messages in thread
From: NeilBrown @ 2011-02-21  1:03 UTC (permalink / raw)
  To: Claude Nobs; +Cc: linux-raid

On Mon, 21 Feb 2011 11:53:03 +1100 NeilBrown <neilb@suse.de> wrote:

> % git clone git://neil.brown.name/mdadm master

No, that's wrong.  It's just

    git clone git://neil.brown.name/mdadm

NeilBrown


> % cd mdadm
> % make
> % sudo bash
> # ./mdadm -S /dev/md2
> # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1


* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-21  0:53       ` NeilBrown
  2011-02-21  1:03         ` NeilBrown
@ 2011-02-23  0:56         ` Claude Nobs
  2011-02-23  1:53           ` NeilBrown
  1 sibling, 1 reply; 9+ messages in thread
From: Claude Nobs @ 2011-02-23  0:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Mon, Feb 21, 2011 at 01:53, NeilBrown <neilb@suse.de> wrote:
>
> When I say "Newer versions" I mean of mdadm, not the kernel.
>
> What does
>   mdadm -V
>
> show?  Version 3.0 or later gives less confusing output for "mdadm --examine"
> on 1.x metadata.

mdadm - v2.6.7.1 - 15th October 2008
So yes, the Ubuntu mdadm seems to be a very old version indeed.

> Yes, it probably is possible to re-assemble the array to include sdd1 and not
> have a degraded array, and still have all your data safe - providing you are
> sure that nothing at all changed on the array (e.g. maybe it was unmounted?).
>
> I'm not sure I'd recommend it though....  I cannot see anything that would go
> wrong, but it is somewhat unknown territory.
> Up to you...
>
> If you:
>
> % git clone git://neil.brown.name/mdadm master
> % cd mdadm
> % make
> % sudo bash
> # ./mdadm -S /dev/md2
> # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1
>
> It should restart your array - degraded - and repeat the last stages of
> reshape just in case.
>
> Alternately, before you run 'make' you could edit Assemble.c, find:
>        while (force && !enough(content->array.level, content->array.raid_disks,
>                                content->array.layout, 1,
>                                avail, okcnt)) {
>
> around line 818, and change the '1,' to '0,', then run make, mdadm -S, and
> then
> # ./mdadm -Afvv /dev/md2 /dev/sda1 /dev/md0 /dev/md1 /dev/sdc1 /dev/sdd1
>
> it should assemble the array non-degraded and repeat all of the reshape since
> sdd1 fell out of the array.
>
> As you have a backup, this is probably safe because even if it goes
> bad you can restore from backups - not that I expect it to go bad
> but ....

I tried to recreate the scenario so I could test both versions first,
but I just could not recreate this step (resp. its result: different
reshape pos'ns on the last 3+1 drives):

bernstein@server:~$ sudo mdadm --assemble --run /dev/md2 /dev/md0
/dev/sda1 /dev/sdc1 /dev/sdd1
mdadm: Could not open /dev/sda1 for write - cannot Assemble array.
mdadm: Failed to restore critical section for reshape, sorry.

which I think led to the inconsistent state. All I got was:

$ sudo mdadm --create /dev/md4 --level raid5 --metadata=1.2
--raid-devices=4 /dev/sde[5678]
$ sudo mkfs.ext4 /dev/md4
$ sudo mdadm --add /dev/md4 /dev/sde9
$ sudo mdadm --grow --raid-devices 5 /dev/md4
$ sudo mdadm /dev/md4 --fail /dev/sde9
$ sudo umount /dev/md4 && sudo mdadm -S /dev/md4
$ sudo reboot
$ sudo mdadm -S /dev/md4
$ sudo mdadm --assemble --run /dev/md4 /dev/sde[6789]
mdadm: failed to RUN_ARRAY /dev/md4: Input/output error
mdadm: Not enough devices to start the array.
$ sudo mdadm --examine /dev/sde[56789]
/dev/sde5:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:56 2011
    Array Slot : 0 (0, 1, 2, failed, failed, failed)
   Array State : Uuu__ 3 failed
/dev/sde6:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:56 2011
    Array Slot : 1 (0, 1, 2, failed, failed, failed)
   Array State : uUu__ 3 failed
/dev/sde7:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:56 2011
    Array Slot : 2 (0, 1, 2, failed, failed, failed)
   Array State : uuU__ 3 failed
/dev/sde8:
  Reshape pos'n : 126720 (123.77 MiB 129.76 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:15 2011
    Array Slot : 4 (0, 1, 2, failed, 3, failed)
   Array State : uuuU_ 2 failed
/dev/sde9:
  Reshape pos'n : 54016 (52.76 MiB 55.31 MB)
  Delta Devices : 1 (4->5)
    Update Time : Tue Feb 22 23:52:11 2011
    Array Slot : 5 (0, 1, 2, failed, 3, 4)
   Array State : uuuuU 1 failed

which got instantly and correctly reshaped by the freshly compiled
version. Without any more real testing, I chose the safer way and went
ahead on the real array:

bernstein@server:~/mdadm$ sudo ./mdadm -Afvv /dev/md2 /dev/sda1
/dev/md0 /dev/md1 /dev/sdc1
mdadm: looking for devices for /dev/md2
mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 4.
mdadm: /dev/md0 is identified as a member of /dev/md2, slot 3.
mdadm: /dev/md1 is identified as a member of /dev/md2, slot 2.
mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 0.
mdadm: forcing event count in /dev/md1(2) from 133603 upto 133609
mdadm: Cannot open /dev/sdc1: Device or resource busy
bernstein@server:~/mdadm$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md2 : active raid5 md1[3] md0[4] sda1[5] sdc1[0]
      2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [U_UUU]
      [==>..................]  reshape = 12.8% (125839952/976760640)
finish=825.1min speed=17186K/sec

md1 : active raid0 sdg1[1] sdf1[0]
      976770944 blocks super 1.2 64k chunks

md0 : active raid0 sdh1[0] sdb1[1]
      976770944 blocks super 1.2 64k chunks

unused devices: <none>
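As a side note, the finish estimate in /proc/mdstat is just the remaining work divided by the current rate, so the 825-minute figure above checks out:

```shell
DONE_K=125839952      # reshape position from /proc/mdstat (1K units)
TOTAL_K=976760640     # per-device total (1K units)
RATE_K=17186          # current speed in K/sec
echo "$(( (TOTAL_K - DONE_K) / RATE_K / 60 )) min"   # -> 825 min
```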

Reshape is in progress and looking good to complete overnight,
although I am a little scared about the "mdadm: forcing event count in
/dev/md1(2) from 133603 upto 133609" and the "device busy" lines. Is
this the way it's supposed to be? I assumed that when it's repeating
all the reshape it would be something like: forcing event count in
/dev/sda1, md0, sdc1 from 133609 downto 133603...

This is not strictly a raid/mdadm question, but do you know a simple
way to check everything went OK? I think that an e2fsck (ext4 fs) and
checksumming some random files located behind the interruption point
should verify all went well. Plus, just to be sure, I'd like to check
files located at the interruption point. Is the offset to the
interruption point into the md device simply the reshape pos'n (e.g.
502815488K)?

> All part of the service... :-)

Well then, great service!
Thanks a lot.

Claude


* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-23  0:56         ` Claude Nobs
@ 2011-02-23  1:53           ` NeilBrown
  2011-02-24  4:06             ` Claude Nobs
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2011-02-23  1:53 UTC (permalink / raw)
  To: Claude Nobs; +Cc: linux-raid

On Wed, 23 Feb 2011 01:56:13 +0100 Claude Nobs <claudenobs@blunet.cc> wrote:

> bernstein@server:~/mdadm$ sudo ./mdadm -Afvv /dev/md2 /dev/sda1
> /dev/md0 /dev/md1 /dev/sdc1
> mdadm: looking for devices for /dev/md2
> mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 4.
> mdadm: /dev/md0 is identified as a member of /dev/md2, slot 3.
> mdadm: /dev/md1 is identified as a member of /dev/md2, slot 2.
> mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 0.
> mdadm: forcing event count in /dev/md1(2) from 133603 upto 133609

This is normal - mdadm is just letting you know that it is including in the 
array a device that looks a bit old - we expected this.

> mdadm: Cannot open /dev/sdc1: Device or resource busy

This is odd.  I cannot explain this at all.  When this message is printed,
mdadm should give up and not continue.  Yet it seems that it did continue,
because the array is started and is reshaping.

> bernstein@server:~/mdadm$ cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md2 : active raid5 md1[3] md0[4] sda1[5] sdc1[0]
>       2930281920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [U_UUU]
>       [==>..................]  reshape = 12.8% (125839952/976760640)
> finish=825.1min speed=17186K/sec

This looks OK.  125839952 corresponds to a "Reshape pos'n" of
503359808, which is slightly after where we would expect the restart to
begin - exactly as it should be.
There won't be any info in the logs to tell us exactly where it started,
which is a shame, but it probably started at the right place.

> 
> This is not strictly a raid/mdadm question, but do you know a simple
> way to check everything went OK? I think that an e2fsck (ext4 fs) and
> checksumming some random files located behind the interruption point
> should verify all went well. Plus, just to be sure, I'd like to check
> files located at the interruption point. Is the offset to the
> interruption point into the md device simply the reshape pos'n (e.g.
> 502815488K)?

No - just the things you suggest.
The Reshape pos'n is the address in the array where reshape was up to.
You could try using 'debugfs' to have a look at the context of those blocks.
Remember to divide this number by 4 to get an ext4fs block number (assuming
4K blocks).

Use:   testb BLOCKNUMBER COUNT

to see if the blocks were even allocated.
Then
       icheck BLOCKNUM
on a few of the blocks to see what inode was using them.
Then
       ncheck INODE
to find a path to that inode number.
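Putting those steps together, a non-interactive sketch (debugfs's -R flag runs a single request; the 502815488K position and the 4K filesystem block size are taken as assumptions from the thread):

```shell
POS_K=502815488                # Reshape pos'n in 1K units
BLK=$((POS_K / 4))             # ext4 block number, assuming 4K blocks
echo "$BLK"                    # -> 125703872
# Then, read-only via debugfs:
#   debugfs -R "testb $BLK 16" /dev/md2   # were the blocks allocated?
#   debugfs -R "icheck $BLK"   /dev/md2   # block -> inode
#   debugfs -R "ncheck INODE"  /dev/md2   # inode -> path
```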


Feel free to report your results - particularly if you find anything helpful.

NeilBrown



* Re: Likely forced assemby with wrong disk during raid5 grow. Recoverable?
  2011-02-23  1:53           ` NeilBrown
@ 2011-02-24  4:06             ` Claude Nobs
  0 siblings, 0 replies; 9+ messages in thread
From: Claude Nobs @ 2011-02-24  4:06 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Feb 23, 2011 at 02:53, NeilBrown <neilb@suse.de> wrote:
> No - just the things you suggest.
> The Reshape pos'n is the address in the array where reshape was up to.
> You could try using 'debugfs' to have a look at the context of those blocks.
> Remember to divide this number by 4 to get an ext4fs block number (assuming
> 4K blocks).
>
> Use:   testb BLOCKNUMBER COUNT
>
> to see if the blocks were even allocated.
> Then
>       icheck BLOCKNUM
> on a few of the blocks to see what inode was using them.
> Then
>       ncheck INODE
> to find a path to that inode number.
>
>
> Feel free to report your results - particularly if you find anything helpful.

So... the reshape went through fine. /dev/md1 failed once more, but
doing the same thing over seemed to work fine. I then instantly went
on to resync the array. This, however, did not go so well... it failed
twice at the exact same point (/dev/md1 failing again). Looking at
dmesg I got repeated:

[66289.326235] ata2.00: exception Emask 0x0 SAct 0x1fe1ff SErr 0x0 action 0x0
[66289.326247] ata2.00: irq_stat 0x40000008
[66289.326257] ata2.00: failed command: READ FPDMA QUEUED
[66289.326273] ata2.00: cmd 60/20:a0:20:64:5c/00:00:07:00:00/40 tag 20
ncq 16384 in
[66289.326276]          res 41/40:00:36:64:5c/00:00:07:00:00/40 Emask
0x409 (media error) <F>
[66289.326284] ata2.00: status: { DRDY ERR }
[66289.326290] ata2.00: error: { UNC }
[66289.334377] ata2.00: configured for UDMA/133
[66289.334478] sd 2:0:0:0: [sdf] Unhandled sense code
[66289.334486] sd 2:0:0:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[66289.334499] sd 2:0:0:0: [sdf] Sense Key : Medium Error [current] [descriptor]
[66289.334515] Descriptor sense data with sense descriptors (in hex):
[66289.334522]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[66289.334552]         07 5c 64 36
[66289.334566] sd 2:0:0:0: [sdf] Add. Sense: Unrecovered read error -
auto reallocate failed
[66289.334582] sd 2:0:0:0: [sdf] CDB: Read(10): 28 00 07 5c 64 20 00 00 20 00
[66289.334611] end_request: I/O error, dev sdf, sector 123495478

and smartctl data confirmed a dying /dev/sdf (part of /dev/md1) :

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail
Always       -       10
197 Current_Pending_Sector  0x0032   200   200   000    Old_age
Always       -       2

Did some further digging and copied (dd) the whole /dev/md1 to another
disk (/dev/sdd1), unearthing a total of 5 unrecoverable 4K blocks. If
only I had gone with the less secure non-degraded option you gave me.
:-)
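For the record, a typical rescue copy of that kind looks like the commented command below (a sketch; conv=noerror,sync keeps dd going past read errors and pads each unreadable block with zeroes, which is where the 5 lost 4K blocks come from). Demonstrated here on a scratch file rather than the real devices:

```shell
# Real invocation from this situation (overwrites the target device!):
#   dd if=/dev/md1 of=/dev/sdd1 bs=4K conv=noerror,sync
# Same flags, demonstrated harmlessly on a temp file:
printf 'hello raid' > /tmp/gr_dd_src
dd if=/tmp/gr_dd_src of=/tmp/gr_dd_dst bs=4K conv=noerror,sync 2>/dev/null
head -c 10 /tmp/gr_dd_dst    # -> hello raid (rest of the 4K block is zero-padded)
```

GNU ddrescue, with its retry passes and log file, is usually the better tool for a dying device, but plain dd as above is what was used here.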
However, assembly with the copied disk fails:

bernstein@server:~$ sudo mdadm/mdadm -Avv /dev/md2 /dev/sda1 /dev/md0
/dev/sdd1 /dev/sdc1

mdadm: looking for devices for /dev/md2
mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 4.
mdadm: /dev/md0 is identified as a member of /dev/md2, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md2, slot 2.

mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 0.
mdadm: no uptodate device for slot 1 of /dev/md2
mdadm: failed to add /dev/sdd1 to /dev/md2: Invalid argument
mdadm: added /dev/md0 to /dev/md2 as 3
mdadm: added /dev/sda1 to /dev/md2 as 4
mdadm: added /dev/sdc1 to /dev/md2 as 0

mdadm: /dev/md2 assembled from 3 drives - not enough to start the array.

and dmesg shows :

[22728.265365] md: md2 stopped.
[22728.271142] md: sdd1 does not have a valid v1.2 superblock, not importing!
[22728.271167] md: md_import_device returned -22
[22728.271524] md: bind<md0>
[22728.271854] md: bind<sda1>
[22728.272135] md: bind<sdc1>
[22728.295812] md: sdd1 does not have a valid v1.2 superblock, not importing!
[22728.295838] md: md_import_device returned -22

but mdadm --examine /dev/md1 /dev/sdd1 outputs exactly the same
superblock information for both devices (and apart from device uuid,
checksum, array slot and array state it is identical to sdc1 & sda1):

/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0

     Array UUID : c3b6db19:b61c3ba9:0a74b12b:3041a523
           Name : master:public
  Creation Time : Sat Jan 22 00:15:43 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)
     Array Size : 7814085120 (3726.05 GiB 4000.81 GB)
  Used Dev Size : 1953521280 (931.51 GiB 1000.20 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 3c7e2c3f:8b6c7c43:a0ce7e33:ad680bed

    Update Time : Wed Feb 23 19:34:36 2011
       Checksum : 2132964 - correct
         Events : 137715


         Layout : left-symmetric
     Chunk Size : 64K

    Array Slot : 3 (0, 1, failed, 2, 3, 4)
   Array State : uuUuu 1 failed

Does it fail because the device sizes of /dev/sdd1 & /dev/md1 differ
(normally reflected in the superblock):
/dev/sdd1:

 Avail Dev Size : 1953521392 (931.51 GiB 1000.20 GB)
/dev/md1:

 Avail Dev Size : 1953541616 (931.52 GiB 1000.21 GB)

Or any other idea why it complains about an invalid superblock?
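One way to read the numbers above (a guess, not confirmed anywhere in this thread): with v1.2 metadata the data area has to fit between the Data Offset and the end of the device, and on the slightly smaller clone it no longer does:

```shell
AVAIL=1953521392               # Avail Dev Size of /dev/sdd1 (sectors)
NEEDED=$((272 + 1953521280))   # Data Offset + Used Dev Size (sectors)
echo $((NEEDED - AVAIL))       # -> 160 sectors short
```

If that reading is right, the kernel rejects the superblock because the recorded data area overruns the clone by 160 sectors (80 KiB).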

I really hoped that cloning the defective device would get me back in
the game (guessing this is completely transparent to md and the
defective blocks will only corrupt filesystem blocks, not interfere
with md operation), but at this point it seems that restoring from
backup might be faster still.

Thanks,
Claude

@neil sorry about the multiple messages...

