* RAID10 failed with two disks
@ 2011-08-22 10:39 Piotr Legiecki
  2011-08-22 11:09 ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Piotr Legiecki @ 2011-08-22 10:39 UTC (permalink / raw)
  To: linux-raid

Hi

I've got RAID10 on 4 disks. Suddenly two of the disks failed (I doubt the 
disks actually failed; more likely it is a kernel failure or maybe the 
motherboard SATA controller).

So after rebooting I cannot start the array. My first question is: on 
RAID10 (default layout), which disks may fail and the array still survive?

mdadm --examine /dev/sda1
/dev/sda1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : fab2336d:71210520:990002ab:4fde9f0c (local to host bez)
   Creation Time : Mon Aug 22 10:40:36 2011
      Raid Level : raid10
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 4

     Update Time : Mon Aug 22 10:40:36 2011
           State : clean
  Active Devices : 2
Working Devices : 2
  Failed Devices : 2
   Spare Devices : 0
        Checksum : d4ba8390 - correct
          Events : 1

          Layout : near=2, far=1
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

    0     0       8        1        0      active sync   /dev/sda1
    1     1       8       17        1      active sync   /dev/sdb1
    2     2       0        0        2      faulty
    3     3       0        0        3      faulty

The last two disks (failed ones) are sde1 and sdf1.

So do I have any chance of getting the array running, or is it dead?

I've tried a few steps to run the array but with no luck.

Regards
P.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RAID10 failed with two disks
  2011-08-22 10:39 RAID10 failed with two disks Piotr Legiecki
@ 2011-08-22 11:09 ` NeilBrown
  2011-08-22 11:42   ` Piotr Legiecki
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2011-08-22 11:09 UTC (permalink / raw)
  To: Piotr Legiecki; +Cc: linux-raid

On Mon, 22 Aug 2011 12:39:42 +0200 Piotr Legiecki <piotrlg@pum.edu.pl> wrote:

> Hi
> 
> I've got RAID10 on 4 disks. Suddenly two of the disks failed (I doubt the 
> disks actually failed; more likely it is a kernel failure or maybe the 
> motherboard SATA controller).
> 
> So after rebooting I cannot start the array. My first question is: on 
> RAID10 (default layout), which disks may fail and the array still survive?

Not adjacent disks.

> 
> mdadm --examine /dev/sda1
> /dev/sda1:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : fab2336d:71210520:990002ab:4fde9f0c (local to host bez)
>    Creation Time : Mon Aug 22 10:40:36 2011
>       Raid Level : raid10
>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 4
> 
>      Update Time : Mon Aug 22 10:40:36 2011
>            State : clean
>   Active Devices : 2
> Working Devices : 2
>   Failed Devices : 2
>    Spare Devices : 0
>         Checksum : d4ba8390 - correct
>           Events : 1
> 
>           Layout : near=2, far=1
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     0       8        1        0      active sync   /dev/sda1
> 
>     0     0       8        1        0      active sync   /dev/sda1
>     1     1       8       17        1      active sync   /dev/sdb1
>     2     2       0        0        2      faulty
>     3     3       0        0        3      faulty
> 
> The last two disks (failed ones) are sde1 and sdf1.
> 
> So do I have any chance of getting the array running, or is it dead?

Possible.
Report "mdadm --examine" of all devices that you believe should be part of
the array.
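For example, assuming the members really are the four partitions named
above (sda1, sdb1, sde1, sdf1), a shell loop along these lines would
collect all four superblocks in one go:

   for d in /dev/sd{a,b,e,f}1; do mdadm --examine "$d"; done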

NeilBrown


> 
> I've tried a few steps to run the array but with no luck.
> 
> Regards
> P.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RAID10 failed with two disks
  2011-08-22 11:09 ` NeilBrown
@ 2011-08-22 11:42   ` Piotr Legiecki
  2011-08-22 12:01     ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Piotr Legiecki @ 2011-08-22 11:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

>> mdadm --examine /dev/sda1
>> /dev/sda1:
>>            Magic : a92b4efc
>>          Version : 00.90.00
>>             UUID : fab2336d:71210520:990002ab:4fde9f0c (local to host bez)
>>    Creation Time : Mon Aug 22 10:40:36 2011
>>       Raid Level : raid10
>>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>>     Raid Devices : 4
>>    Total Devices : 4
>> Preferred Minor : 4
>>
>>      Update Time : Mon Aug 22 10:40:36 2011
>>            State : clean
>>   Active Devices : 2
>> Working Devices : 2
>>   Failed Devices : 2
>>    Spare Devices : 0
>>         Checksum : d4ba8390 - correct
>>           Events : 1
>>
>>           Layout : near=2, far=1
>>       Chunk Size : 64K
>>
>>        Number   Major   Minor   RaidDevice State
>> this     0       8        1        0      active sync   /dev/sda1
>>
>>     0     0       8        1        0      active sync   /dev/sda1
>>     1     1       8       17        1      active sync   /dev/sdb1
>>     2     2       0        0        2      faulty
>>     3     3       0        0        3      faulty
>>
>> The last two disks (failed ones) are sde1 and sdf1.
>>
>> So do I have any chance of getting the array running, or is it dead?
> 
> Possible.
> Report "mdadm --examine" of all devices that you believe should be part of
> the array.

/dev/sdb1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : fab2336d:71210520:990002ab:4fde9f0c (local to host bez)
   Creation Time : Mon Aug 22 10:40:36 2011
      Raid Level : raid10
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 4

     Update Time : Mon Aug 22 10:40:36 2011
           State : clean
  Active Devices : 2
Working Devices : 2
  Failed Devices : 2
   Spare Devices : 0
        Checksum : d4ba83a2 - correct
          Events : 1

          Layout : near=2, far=1
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

    0     0       8        1        0      active sync   /dev/sda1
    1     1       8       17        1      active sync   /dev/sdb1
    2     2       0        0        2      faulty
    3     3       0        0        3      faulty



/dev/sde1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 157a7440:4502f6db:990002ab:4fde9f0c (local to host bez)
   Creation Time : Fri Jun  3 12:18:33 2011
      Raid Level : raid10
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 4

     Update Time : Sat Aug 20 03:06:27 2011
           State : clean
  Active Devices : 4
Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0
        Checksum : c2f848c2 - correct
          Events : 24

          Layout : near=2, far=1
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     2       8       65        2      active sync   /dev/sde1

    0     0       8        1        0      active sync   /dev/sda1
    1     1       8       17        1      active sync   /dev/sdb1
    2     2       8       65        2      active sync   /dev/sde1
    3     3       8       81        3      active sync   /dev/sdf1

/dev/sdf1:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 157a7440:4502f6db:990002ab:4fde9f0c (local to host bez)
   Creation Time : Fri Jun  3 12:18:33 2011
      Raid Level : raid10
   Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
      Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 4

     Update Time : Sat Aug 20 03:06:27 2011
           State : clean
  Active Devices : 4
Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0
        Checksum : c2f848d4 - correct
          Events : 24

          Layout : near=2, far=1
      Chunk Size : 64K

       Number   Major   Minor   RaidDevice State
this     3       8       81        3      active sync   /dev/sdf1

    0     0       8        1        0      active sync   /dev/sda1
    1     1       8       17        1      active sync   /dev/sdb1
    2     2       8       65        2      active sync   /dev/sde1
    3     3       8       81        3      active sync   /dev/sdf1


smartd reported that the sde and sdf disks had failed, but after rebooting 
it does not complain anymore.

You say adjacent disks must be healthy for RAID10. So in my situation I 
have adjacent disks dead (sde and sdf). It does not look good.

And does the layout (near, far, etc.) influence this rule that adjacent 
disks must be healthy?


Regards
P.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RAID10 failed with two disks
  2011-08-22 11:42   ` Piotr Legiecki
@ 2011-08-22 12:01     ` NeilBrown
  2011-08-22 12:52       ` Piotr Legiecki
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2011-08-22 12:01 UTC (permalink / raw)
  To: Piotr Legiecki; +Cc: linux-raid

On Mon, 22 Aug 2011 13:42:54 +0200 Piotr Legiecki <piotrlg@pum.edu.pl> wrote:

> >> mdadm --examine /dev/sda1
> >> /dev/sda1:
> >>            Magic : a92b4efc
> >>          Version : 00.90.00
> >>             UUID : fab2336d:71210520:990002ab:4fde9f0c (local to host bez)
> >>    Creation Time : Mon Aug 22 10:40:36 2011
> >>       Raid Level : raid10
> >>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
> >>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
> >>     Raid Devices : 4
> >>    Total Devices : 4
> >> Preferred Minor : 4
> >>
> >>      Update Time : Mon Aug 22 10:40:36 2011
> >>            State : clean
> >>   Active Devices : 2
> >> Working Devices : 2
> >>   Failed Devices : 2
> >>    Spare Devices : 0
> >>         Checksum : d4ba8390 - correct
> >>           Events : 1
> >>
> >>           Layout : near=2, far=1
> >>       Chunk Size : 64K
> >>
> >>        Number   Major   Minor   RaidDevice State
> >> this     0       8        1        0      active sync   /dev/sda1
> >>
> >>     0     0       8        1        0      active sync   /dev/sda1
> >>     1     1       8       17        1      active sync   /dev/sdb1
> >>     2     2       0        0        2      faulty
> >>     3     3       0        0        3      faulty
> >>
> >> The last two disks (failed ones) are sde1 and sdf1.
> >>
> >> So do I have any chance of getting the array running, or is it dead?
> > 
> > Possible.
> > Report "mdadm --examine" of all devices that you believe should be part of
> > the array.
> 
> /dev/sdb1:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : fab2336d:71210520:990002ab:4fde9f0c (local to host bez)
>    Creation Time : Mon Aug 22 10:40:36 2011
>       Raid Level : raid10
>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 4
> 
>      Update Time : Mon Aug 22 10:40:36 2011
>            State : clean
>   Active Devices : 2
> Working Devices : 2
>   Failed Devices : 2
>    Spare Devices : 0
>         Checksum : d4ba83a2 - correct
>           Events : 1
> 
>           Layout : near=2, far=1
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     1       8       17        1      active sync   /dev/sdb1
> 
>     0     0       8        1        0      active sync   /dev/sda1
>     1     1       8       17        1      active sync   /dev/sdb1
>     2     2       0        0        2      faulty
>     3     3       0        0        3      faulty
> 
> 
> 
> /dev/sde1:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : 157a7440:4502f6db:990002ab:4fde9f0c (local to host bez)
>    Creation Time : Fri Jun  3 12:18:33 2011
>       Raid Level : raid10
>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 4
> 
>      Update Time : Sat Aug 20 03:06:27 2011
>            State : clean
>   Active Devices : 4
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 0
>         Checksum : c2f848c2 - correct
>           Events : 24
> 
>           Layout : near=2, far=1
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     2       8       65        2      active sync   /dev/sde1
> 
>     0     0       8        1        0      active sync   /dev/sda1
>     1     1       8       17        1      active sync   /dev/sdb1
>     2     2       8       65        2      active sync   /dev/sde1
>     3     3       8       81        3      active sync   /dev/sdf1
> 
> /dev/sdf1:
>            Magic : a92b4efc
>          Version : 00.90.00
>             UUID : 157a7440:4502f6db:990002ab:4fde9f0c (local to host bez)
>    Creation Time : Fri Jun  3 12:18:33 2011
>       Raid Level : raid10
>    Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
>       Array Size : 1953519872 (1863.02 GiB 2000.40 GB)
>     Raid Devices : 4
>    Total Devices : 4
> Preferred Minor : 4
> 
>      Update Time : Sat Aug 20 03:06:27 2011
>            State : clean
>   Active Devices : 4
> Working Devices : 4
>   Failed Devices : 0
>    Spare Devices : 0
>         Checksum : c2f848d4 - correct
>           Events : 24
> 
>           Layout : near=2, far=1
>       Chunk Size : 64K
> 
>        Number   Major   Minor   RaidDevice State
> this     3       8       81        3      active sync   /dev/sdf1
> 
>     0     0       8        1        0      active sync   /dev/sda1
>     1     1       8       17        1      active sync   /dev/sdb1
>     2     2       8       65        2      active sync   /dev/sde1
>     3     3       8       81        3      active sync   /dev/sdf1

It looks like sde1 and sdf1 are unchanged since the "failure" which happened
shortly after 3am on Saturday.  So the data on them is probably good.

It looks like someone (you?) tried to create a new array on sda1 and sdb1
thus destroying the old metadata (but probably not the data).  I'm surprised
that mdadm would have let you create a RAID10 with just 2 devices...   Is
that what happened?  or something else?
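One illustrative way to spot this from the output already posted (device
names assumed to be as listed; the grep pattern is only a sketch) is to
compare Creation Time and Events across the members -- the overwritten
superblocks stand out with a fresh Creation Time and "Events : 1":

   mdadm --examine /dev/sd[abef]1 | grep -E '^/dev/|Creation Time|Events'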

Anyway it looks as though if you run the command:

  mdadm --create /dev/md4 -l10 -n4 -e 0.90 /dev/sd{a,b,e,f}1 --assume-clean

there is a reasonable chance that /dev/md4 would have all your data.
You should then
   fsck -fn /dev/md4
to check that it is all OK.  If it is you can
   echo check > /sys/block/md4/md/sync_action
to check if the mirrors are consistent.  When it has finished
   cat /sys/block/md4/md/mismatch_cnt
will show '0' if all is consistent.

If it is not zero but a small number, you can feel safe doing
    echo repair > /sys/block/md4/md/sync_action
to fix it up.
If it is a big number.... that would be troubling.


> 
> 
> smartd reported that the sde and sdf disks had failed, but after rebooting 
> it does not complain anymore.
> 
> You say adjacent disks must be healthy for RAID10. So in my situation I 
> have adjacent disks dead (sde and sdf). It does not look good.
> 
> And does the layout (near, far, etc.) influence this rule that adjacent 
> disks must be healthy?

I didn't say adjacent disks must be healthy.  I said you cannot have
adjacent disks both failing.  This is not affected by near/far.
It is a bit more subtle than that though: it is OK for the 2nd and 3rd to
both fail, but not the 1st and 2nd, or the 3rd and 4th.
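
Roughly sketched, with the default near=2 layout and the device order shown
in the superblocks above, the two copies of each chunk land like this:

   chunk 0 -> sda1, sdb1      chunk 1 -> sde1, sdf1
   chunk 2 -> sda1, sdb1      chunk 3 -> sde1, sdf1
   ...

so copies always sit on an adjacent pair: losing sda1+sdb1 or sde1+sdf1
together loses data, while losing e.g. sdb1+sde1 does not.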

NeilBrown


> 
> 
> Regards
> P.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RAID10 failed with two disks
  2011-08-22 12:01     ` NeilBrown
@ 2011-08-22 12:52       ` Piotr Legiecki
  2011-08-22 23:56         ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Piotr Legiecki @ 2011-08-22 12:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

NeilBrown wrote:
> It looks like sde1 and sdf1 are unchanged since the "failure" which happened
> shortly after 3am on Saturday.  So the data on them is probably good.

And I think so.

> It looks like someone (you?) tried to create a new array on sda1 and sdb1
> thus destroying the old metadata (but probably not the data).  I'm surprised
> that mdadm would have let you create a RAID10 with just 2 devices...   Is
> that what happened?  or something else?

Well, it's me of course ;-) I tried to start the array. Of course it 
didn't allow me to create RAID10 on two disks only, so I used mdadm 
--create .... with 'missing missing' parameters. But it didn't help.
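
(Presumably something of roughly this shape - only a guess at the form of
the command, not necessarily the exact one used:

   mdadm --create /dev/md4 -l10 -n4 /dev/sda1 /dev/sdb1 missing missing   # hypothetical reconstruction

which would explain the fresh Creation Time and new UUID now visible in
the sda1/sdb1 superblocks.)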


> Anyway it looks as though if you run the command:
> 
>   mdadm --create /dev/md4 -l10 -n4 -e 0.90 /dev/sd{a,b,e,f}1 --assume-clean

Personalities : [raid1] [raid10]
md4 : active (auto-read-only) raid10 sdf1[3] sde1[2] sdb1[1] sda1[0]
       1953519872 blocks 64K chunks 2 near-copies [4/4] [UUUU]

md3 : active raid1 sdc4[0] sdd4[1]
       472752704 blocks [2/2] [UU]

md2 : active (auto-read-only) raid1 sdc3[0] sdd3[1]
       979840 blocks [2/2] [UU]

md0 : active raid1 sdd1[0] sdc1[1]
       9767424 blocks [2/2] [UU]

md1 : active raid1 sdd2[0] sdc2[1]
       4883648 blocks [2/2] [UU]

Hurrah, hurrah, hurrah! ;-) Well, I wonder why it didn't work for me ;-(


> there is a reasonable chance that /dev/md4 would have all your data.
> You should then
>    fsck -fn /dev/md4

fsck issued some errors
....
Illegal block #-1 (3126319976) in inode 14794786.  IGNORED.
Error while iterating over blocks in inode 14794786: Illegal indirect 
block found
e2fsck: aborted

md4 is read-only now.

> to check that it is all OK.  If it is you can
>    echo check > /sys/block/md4/md/sync_action
> to check if the mirrors are consistent.  When it has finished
>    cat /sys/block/md4/md/mismatch_cnt
> will show '0' if all is consistent.
> 
> If it is not zero but a small number, you can feel safe doing
>     echo repair > /sys/block/md4/md/sync_action
> to fix it up.
> If it is a big number.... that would be troubling.

A bit of magic, as far as I can see. Wouldn't it be reasonable to put those 
commands into mdadm?

>> And does the layout (near, far, etc.) influence this rule that adjacent 
>> disks must be healthy?
> 
> I didn't say adjacent disks must be healthy.  I said you cannot have
> adjacent disks both failing.  This is not affected by near/far.
> It is a bit more subtle than that though: it is OK for the 2nd and 3rd to
> both fail, but not the 1st and 2nd, or the 3rd and 4th.

I see. Just like ordinary RAID1+0: the first and second pairs of disks 
are RAID1, and when both disks in a pair fail that mirror is dead.

I wonder what happens when I create RAID10 on 6 disks? So we have got: 
sda1+sdb1 = RAID1
sdc1+sdd1 = RAID1
sde1+sdf1 = RAID1
Those three RAID1s are striped together in RAID0?
And assuming each disk is 1TB, I have 3TB of logical space?
In such a situation it still holds that both adjacent disks of any RAID1 
pair must not fail.


And I still wonder why it happened. A hardware issue (motherboard)? Or a 
kernel bug (2.6.26 - debian/lenny)?


Thank you very much for the help.

Regards
Piotr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RAID10 failed with two disks
  2011-08-22 12:52       ` Piotr Legiecki
@ 2011-08-22 23:56         ` NeilBrown
  2011-08-23  8:35           ` Piotr Legiecki
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2011-08-22 23:56 UTC (permalink / raw)
  To: Piotr Legiecki; +Cc: linux-raid

On Mon, 22 Aug 2011 14:52:50 +0200 Piotr Legiecki <piotrlg@pum.edu.pl> wrote:

> NeilBrown wrote:
> > It looks like sde1 and sdf1 are unchanged since the "failure" which happened
> > shortly after 3am on Saturday.  So the data on them is probably good.
> 
> And I think so.
> 
> > It looks like someone (you?) tried to create a new array on sda1 and sdb1
> > thus destroying the old metadata (but probably not the data).  I'm surprised
> > that mdadm would have let you create a RAID10 with just 2 devices...   Is
> > that what happened?  or something else?
> 
> Well, it's me of course ;-) I tried to start the array. Of course it 
> didn't allow me to create RAID10 on two disks only, so I used mdadm 
> --create .... with 'missing missing' parameters. But it didn't help.
> 
> 
> > Anyway it looks as though if you run the command:
> > 
> >   mdadm --create /dev/md4 -l10 -n4 -e 0.90 /dev/sd{a,b,e,f}1 --assume-clean
> 
> Personalities : [raid1] [raid10]
> md4 : active (auto-read-only) raid10 sdf1[3] sde1[2] sdb1[1] sda1[0]
>        1953519872 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
> md3 : active raid1 sdc4[0] sdd4[1]
>        472752704 blocks [2/2] [UU]
> 
> md2 : active (auto-read-only) raid1 sdc3[0] sdd3[1]
>        979840 blocks [2/2] [UU]
> 
> md0 : active raid1 sdd1[0] sdc1[1]
>        9767424 blocks [2/2] [UU]
> 
> md1 : active raid1 sdd2[0] sdc2[1]
>        4883648 blocks [2/2] [UU]
> 
> Hurrah, hurrah, hurrah! ;-) Well, I wonder why it didn't work for me ;-(

Looks good so far, but is your data safe?


> 
> 
> > there is a reasonable chance that /dev/md4 would have all your data.
> > You should then
> >    fsck -fn /dev/md4
> 
> fsck issued some errors
> ....
> Illegal block #-1 (3126319976) in inode 14794786.  IGNORED.
> Error while iterating over blocks in inode 14794786: Illegal indirect 
> block found
> e2fsck: aborted

Mostly safe it seems .... assuming there weren't really serious things that
you hid behind the "...".

An "fsck -f /dev/md4" would probably fix it up.


> 
> md4 is read-only now.
> 
> > to check that it is all OK.  If it is you can
> >    echo check > /sys/block/md4/md/sync_action
> > to check if the mirrors are consistent.  When it has finished
> >    cat /sys/block/md4/md/mismatch_cnt
> > will show '0' if all is consistent.
> > 
> > If it is not zero but a small number, you can feel safe doing
> >     echo repair > /sys/block/md4/md/sync_action
> > to fix it up.
> > If it is a big number.... that would be troubling.
> 
> A bit of magic, as far as I can see. Wouldn't it be reasonable to put those 
> commands into mdadm?

Maybe one day.   So much to do, so little time!


> 
> >> And does the layout (near, far, etc.) influence this rule that adjacent 
> >> disks must be healthy?
> > 
> > I didn't say adjacent disks must be healthy.  I said you cannot have
> > adjacent disks both failing.  This is not affected by near/far.
> > It is a bit more subtle than that though: it is OK for the 2nd and 3rd to
> > both fail, but not the 1st and 2nd, or the 3rd and 4th.
> 
> I see. Just like ordinary RAID1+0: the first and second pairs of disks 
> are RAID1, and when both disks in a pair fail that mirror is dead.

Like that - yes.

> 
> I wonder what happens when I create RAID10 on 6 disks? So we have got: 
> sda1+sdb1 = RAID1
> sdc1+sdd1 = RAID1
> sde1+sdf1 = RAID1
> Those three RAID1s are striped together in RAID0?
> And assuming each disk is 1TB, I have 3TB of logical space?
> In such a situation it still holds that both adjacent disks of any RAID1 
> pair must not fail.

This is correct assuming the default layout.
If you asked for "--layout=n3" you would get a 3-way mirror over a1,b1,c1 and
d1,e1,f1 and those would be raid0-ed.

If you had 5 devices then you get data copied on
  sda1+sdb1
  sdc1+sdd1
  sde1+sda1
  sdb1+sdc1
  sdd1+sde1

so if *any* pair of adjacent devices fails, you lose data.
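
For illustration only (hypothetical commands; /dev/mdX and the six device
names are placeholders, not from this thread), the two 6-drive variants
discussed above would be created roughly as:

   mdadm --create /dev/mdX -l10 -n6 --layout=n2 /dev/sd[a-f]1   # three 2-way mirrors striped, ~3TB usable from 6x1TB
   mdadm --create /dev/mdX -l10 -n6 --layout=n3 /dev/sd[a-f]1   # two 3-way mirrors striped, ~2TB usable from 6x1TB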


> 
> 
> And I still wonder why it happened. A hardware issue (motherboard)? Or a 
> kernel bug (2.6.26 - debian/lenny)?

Hard to tell without seeing kernel logs.  Almost certainly a hardware issue
of some sort.  Maybe a loose or bumped cable. Maybe a power supply spike.
Maybe a stray cosmic ray....

NeilBrown
> 
> 
> Thank you very much for the help.
> 
> Regards
> Piotr


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RAID10 failed with two disks
  2011-08-22 23:56         ` NeilBrown
@ 2011-08-23  8:35           ` Piotr Legiecki
  0 siblings, 0 replies; 7+ messages in thread
From: Piotr Legiecki @ 2011-08-23  8:35 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid


>> Hurrah, hurrah, hurrah! ;-) Well, I wonder why it didn't work for me ;-(
> 
> Looks good so far, but is your data safe?

I think so.
fsck has found some errors and corrected them.
resync done
cat /sys/block/md4/md/mismatch_cnt
0
Looks good.


> This is correct assuming the default layout.
> If you asked for "--layout=n3" you would get a 3-way mirror over a1,b1,c1 and
> d1,e1,f1 and those would be raid0-ed.

And I suppose the question of which layout is the most efficient is beyond 
the scope of our subject? Or maybe there is some general rule of thumb 
for this?

The more disks, the faster the array should be, *but* also the more data to 
mirror at once when writing...

Anyway my tests showed that RAID1 on two disks is *much* slower than 
RAID10 on 4 disks. RAID10 on SATA can easily compete with HP Smart Array 
P410i/BBC SAS RAID (but in RAID1 only ;-)). Well, at least in iozone 
benchmarks.

> If you had 5 devices then you get data copied on
>   sda1+sdb1
>   sdc1+sdd1
>   sde1+sda1
>   sdb1+sdc1
>   sdd1+sde1
> 
> so if *any* pair of adjacent devices fails, you lose data.

So from a safety point of view there is a need for more spare disks, or to 
go for RAID6.

> Hard to tell without seeing kernel logs.  Almost certainly a hardware issue
> of some sort.  Maybe a loose or bumped cable. Maybe a power supply spike.
> Maybe a stray cosmic ray....

http://pastebin.com/iapZWm0S

Those 'failed' disks are connected to motherboard SATA ports. I've also got 
an Adaptec 1430 adapter with 2 free ports; maybe I should move those disks 
there.

Thank you for all the help and time put into answering my questions.

Piotr

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-08-23  8:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-22 10:39 RAID10 failed with two disks Piotr Legiecki
2011-08-22 11:09 ` NeilBrown
2011-08-22 11:42   ` Piotr Legiecki
2011-08-22 12:01     ` NeilBrown
2011-08-22 12:52       ` Piotr Legiecki
2011-08-22 23:56         ` NeilBrown
2011-08-23  8:35           ` Piotr Legiecki
