From: NeilBrown <neilb@suse.de>
To: Piotr Legiecki <piotrlg@pum.edu.pl>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID10 failed with two disks
Date: Tue, 23 Aug 2011 09:56:34 +1000
Message-ID: <20110823095634.398f2118@notabene.brown>
In-Reply-To: <4E525122.7090607@pum.edu.pl>

On Mon, 22 Aug 2011 14:52:50 +0200 Piotr Legiecki <piotrlg@pum.edu.pl> wrote:

> NeilBrown wrote:
> > It looks like sde1 and sdf1 are unchanged since the "failure" which happened
> > shortly after 3am on Saturday.  So the data on them is probably good.
> 
> And I think so.
> 
> > It looks like someone (you?) tried to create a new array on sda1 and sdb1
> > thus destroying the old metadata (but probably not the data).  I'm surprised
> > that mdadm would have let you create a RAID10 with just 2 devices...   Is
> > that what happened?  or something else?
> 
> Well, it's me of course ;-) I tried to run the array. Of course it
> didn't allow me to create a RAID10 on two disks only, so I used mdadm
> --create .... with "missing missing" for the absent devices. But it
> didn't help.
> 
> 
> > Anyway it looks as though if you run the command:
> > 
> >   mdadm --create /dev/md4 -l10 -n4 -e 0.90 /dev/sd{a,b,e,f}1 --assume-clean
> 
> Personalities : [raid1] [raid10]
> md4 : active (auto-read-only) raid10 sdf1[3] sde1[2] sdb1[1] sda1[0]
>        1953519872 blocks 64K chunks 2 near-copies [4/4] [UUUU]
> 
> md3 : active raid1 sdc4[0] sdd4[1]
>        472752704 blocks [2/2] [UU]
> 
> md2 : active (auto-read-only) raid1 sdc3[0] sdd3[1]
>        979840 blocks [2/2] [UU]
> 
> md0 : active raid1 sdd1[0] sdc1[1]
>        9767424 blocks [2/2] [UU]
> 
> md1 : active raid1 sdd2[0] sdc2[1]
>        4883648 blocks [2/2] [UU]
> 
> Hurrah, hurrah, hurrah! ;-) Well, I wonder why it didn't work for me ;-(

Looks good so far, but is your data safe?


> 
> 
> > there is a reasonable chance that /dev/md4 would have all your data.
> > You should then
> >    fsck -fn /dev/md4
> 
> fsck issued some errors
> ....
> Illegal block #-1 (3126319976) in inode 14794786.  IGNORED.
> Error while iterating over blocks in inode 14794786: Illegal indirect 
> block found
> e2fsck: aborted

Mostly safe it seems .... assuming there was nothing really serious
hidden behind the "...".

An "fsck -f /dev/md4" would probably fix it up.


> 
> md4 is read-only now.
> 
> > to check that it is all OK.  If it is you can
> >    echo check > /sys/block/md4/md/sync_action
> > to check if the mirrors are consistent.  When it finished 
> >    cat /sys/block/md4/md/mismatch_cnt
> > will show '0' if all is consistent.
> > 
> > If it is not zero but a small number, you can feel safe doing
> >     echo repair > /sys/block/md4/md/sync_action
> > to fix it up.
> > If it is a big number.... that would be troubling.
> 
> A bit of magic, as I see it. Wouldn't it be reasonable to put those
> commands into mdadm?

Maybe one day.   So much to do, so little time!
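
In the meantime it can be scripted by hand.  A rough sketch (array name
hard-coded, polling interval arbitrary):

   #!/bin/sh
   # start a consistency check on md4 and report the mismatch count
   md=md4
   echo check > /sys/block/$md/md/sync_action
   # sync_action reads back "check" while the scrub is still running
   while grep -q check /sys/block/$md/md/sync_action; do sleep 10; done
   cat /sys/block/$md/md/mismatch_cnt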


> 
> >> And does the layout (near, far, etc.) affect this rule that adjacent
> >> disks must be healthy?
> > 
> > I didn't say adjacent disks must be healthy.  I said you cannot have
> > adjacent disks both failing.  This is not affected by near/far.
> > It is a bit more subtle than that though.  It is OK for 2nd and 3rd to both
> > fail.  But not 1st and 2nd or 3rd and 4th.
> 
> I see. Just like ordinary RAID1+0. The first and second pairs of disks
> are each a RAID1; when both disks in a pair fail, that mirror is dead.

Like that - yes.
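
Spelled out as the classic nested arrays it would look something like
this (array names purely illustrative):

   mdadm --create /dev/md10 -l1 -n2 /dev/sda1 /dev/sdb1   # first mirror pair
   mdadm --create /dev/md11 -l1 -n2 /dev/sde1 /dev/sdf1   # second mirror pair
   mdadm --create /dev/md12 -l0 -n2 /dev/md10 /dev/md11   # stripe across the mirrors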

> 
> I wonder what happens when I create a RAID10 on 6 disks? So we would have:
> sda1+sdb1 = RAID1
> sdc1+sdd1 = RAID1
> sde1+sdf1 = RAID1
> Those three RAID1 pairs are striped together in RAID0?
> And assuming each disk is 1TB, I would have 3TB of logical space?
> In that situation it is still the case that the two adjacent disks of
> any one RAID1 pair must not both fail.

This is correct assuming the default layout.
If you asked for "--layout=n3" you would get a 3-way mirror over a1,b1,c1 and
d1,e1,f1 and those would be raid0-ed.
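
For example (hypothetical array name), that would be created along these lines:

   mdadm --create /dev/md5 -l10 -n6 --layout=n3 /dev/sd[a-f]1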

If you had 5 devices then you would get the data copied onto
  sda1+sdb1
  sdc1+sdd1
  sde1+sda1
  sdb1+sdc1
  sdd1+sde1

so if *any* pair of adjacent devices fails, you lose data.
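
The pairing is simple modular arithmetic: with a near-2 layout on 5
devices, the copies of chunk k land on devices (2k mod 5) and
(2k+1 mod 5).  A quick shell illustration:

   # print which two of the 5 devices hold each of the first 5 chunks
   for k in 0 1 2 3 4; do
       echo "chunk $k: device $((2*k % 5)) + device $(((2*k+1) % 5))"
   done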


> 
> 
> And I still wonder why it happened. A hardware issue (motherboard)? Or
> a kernel bug (2.6.26 - Debian lenny)?

Hard to tell without seeing kernel logs.  Almost certainly a hardware issue
of some sort.  Maybe a loose or bumped cable. Maybe a power supply spike.
Maybe a stray cosmic ray....

NeilBrown
> 
> 
> Thank you very much for the help.
> 
> Regards
> Piotr

