* Several steps to death
@ 2010-01-25 21:21 aragonx
  2010-01-26  1:35 ` Michael Evans
  2010-01-26  9:28 ` Asdo
  0 siblings, 2 replies; 11+ messages in thread
From: aragonx @ 2010-01-25 21:21 UTC (permalink / raw)
  To: linux-raid

Hello all,

I have a RAID 5 array that was created on Fedora 9 that just holds user
files (Samba share).  Everything was fine until a kernel upgrade and
motherboard failure made it impossible for me to boot.  After a new
motherboard and an upgrade to Fedora 12, my array is toast.

The problems are my own.  I was paying more attention to the OS than to
the data.  What was originally a 5-disk RAID 5 array was somehow detected
as a RAID 5 array with 4 disks + 1 spare.  It mounted and started a
rebuild, which was somewhere around 40% done before I noticed it.

So my question is, can I get this data back or is it gone?

If I try to mount it now with the correct configuration, I get the
following error:

mdadm --create /dev/md0 --level=5 --spare-devices=0 --raid-devices=5
/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdf1[5] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      2930287616 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [>....................]  recovery =  0.1% (1255864/732571904)
finish=155.2min speed=78491K/sec

unused devices: <none>

mount -t ext4 -o usrquota,grpquota,acl,user_xattr /dev/md0 /home/data

mdadm -E /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 18928390:76024ba7:d9fdb3bf:6408b6d2 (local to host server)
  Creation Time : Mon Jan 25 16:14:08 2010
     Raid Level : raid5
  Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
     Array Size : 2930287616 (2794.54 GiB 3000.61 GB)
   Raid Devices : 5
  Total Devices : 6
Preferred Minor : 0

    Update Time : Mon Jan 25 16:14:08 2010
          State : clean
 Active Devices : 4
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 382dc6ea - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1
   3     3       8       65        3      active sync   /dev/sde1
   4     0       0        0        0      spare
   5     5       8       81        5      spare   /dev/sdf1


Here is what is in /var/log/messages

Jan 25 16:14:08 server kernel: md: bind<sdb1>
Jan 25 16:14:08 server kernel: md: bind<sdc1>
Jan 25 16:14:08 server kernel: md: bind<sdd1>
Jan 25 16:14:08 server kernel: md: bind<sde1>
Jan 25 16:14:08 server kernel: md: bind<sdf1>
Jan 25 16:14:09 server kernel: raid5: device sde1 operational as raid disk 3
Jan 25 16:14:09 server kernel: raid5: device sdd1 operational as raid disk 2
Jan 25 16:14:09 server kernel: raid5: device sdc1 operational as raid disk 1
Jan 25 16:14:09 server kernel: raid5: device sdb1 operational as raid disk 0
Jan 25 16:14:09 server kernel: raid5: allocated 5332kB for md0
Jan 25 16:14:09 server kernel: raid5: raid level 5 set md0 active with 4
out of 5 devices, algorithm 2
Jan 25 16:14:09 server kernel: RAID5 conf printout:
Jan 25 16:14:09 server kernel: --- rd:5 wd:4
Jan 25 16:14:09 server kernel: disk 0, o:1, dev:sdb1
Jan 25 16:14:09 server kernel: disk 1, o:1, dev:sdc1
Jan 25 16:14:09 server kernel: disk 2, o:1, dev:sdd1
Jan 25 16:14:09 server kernel: disk 3, o:1, dev:sde1
Jan 25 16:14:09 server kernel: md0: detected capacity change from 0 to
3000614518784
Jan 25 16:14:09 server kernel: md0: unknown partition table
Jan 25 16:14:09 server kernel: RAID5 conf printout:
Jan 25 16:14:09 server kernel: --- rd:5 wd:4
Jan 25 16:14:09 server kernel: disk 0, o:1, dev:sdb1
Jan 25 16:14:09 server kernel: disk 1, o:1, dev:sdc1
Jan 25 16:14:09 server kernel: disk 2, o:1, dev:sdd1
Jan 25 16:14:09 server kernel: disk 3, o:1, dev:sde1
Jan 25 16:14:09 server kernel: disk 4, o:1, dev:sdf1
Jan 25 16:14:09 server kernel: md: recovery of RAID array md0
Jan 25 16:14:09 server kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Jan 25 16:14:09 server kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for recovery.
Jan 25 16:14:09 server kernel: md: using 128k window, over a total of
732571904 blocks.
Jan 25 16:15:12 server kernel: EXT4-fs (md0): VFS: Can't find ext4 filesystem

Thank you in advance.

---
Will Y.

* Re: Several steps to death
  2010-01-25 21:21 Several steps to death aragonx
@ 2010-01-26  1:35 ` Michael Evans
  2010-01-26  9:28 ` Asdo
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Evans @ 2010-01-26  1:35 UTC (permalink / raw)
  To: aragonx; +Cc: linux-raid

On Mon, Jan 25, 2010 at 1:21 PM,  <aragonx@dcsnow.com> wrote:
> Hello all,
>
> I have a RAID 5 array that was created on Fedora 9 that just holds user
> files (Samba share).  Everything was fine until a kernel upgrade and
> motherboard failure made it impossible for me to boot.  After a new
> motherboard and an upgrade to Fedora 12, my array is toast.
>
> The problems are my own.  I was paying more attention to the OS than to
> the data.  What was originally a 5-disk RAID 5 array was somehow detected
> as a RAID 5 array with 4 disks + 1 spare.  It mounted and started a
> rebuild, which was somewhere around 40% done before I noticed it.
>
> So my question is, can I get this data back or is it gone?
>
> ...

Are you able to bring the 4 complete members up read-only and read
your file-system?  It sounds as if one disk was stale when your system
crashed (probably it's the one that didn't get data written/synced to
it), and the array is now trying to regenerate that stale disk (you
previously had one drive's worth of distributed parity, thanks to
using RAID 5 rather than RAID 0).
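
Untested, and only meaningful while the original superblocks are still
intact (i.e. before any --create has been run over them), but something
along these lines is what I mean, with the device names from your report:

mdadm --stop /dev/md0
mdadm --assemble --run /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mdadm --readonly /dev/md0                # mark the array read-only before touching it
mount -t ext4 -o ro /dev/md0 /home/data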

Otherwise, I think you've probably obliterated enough data for any
recovery to be problematic at best.

* Re: Several steps to death
  2010-01-25 21:21 Several steps to death aragonx
  2010-01-26  1:35 ` Michael Evans
@ 2010-01-26  9:28 ` Asdo
  2010-01-26 13:17   ` aragonx
  1 sibling, 1 reply; 11+ messages in thread
From: Asdo @ 2010-01-26  9:28 UTC (permalink / raw)
  To: aragonx; +Cc: linux-raid

aragonx@dcsnow.com wrote:
> If I try to mount it now with the correct configuration, I get the
> following error:
>
> mdadm --create /dev/md0 --level=5 --spare-devices=0 --raid-devices=5
> /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
>
> ...
>   

I think this is the point where you destroyed your data.  Probably the 
devices were not specified in the same order as before: in this command 
you listed them in increasing letter order, but after changing the 
mainboard the drives are unlikely to be detected in the same order as 
they were on the old one.

Earlier, when you thought it was detected as 4 disks + 1 spare, I believe 
it was actually just doing a resync because the array was stale; the old 
mainboard probably broke while it was writing something.  Had you not 
entered the above --create command, your data would presumably still be 
there.  You didn't try mounting it at that point, did you?

As things stand now, I'm not sure you can get your data out of there.

In theory, if you stop the current resync and recreate the array in all 
possible permutations of the 5 drives, using --assume-clean during 
creation to prevent any resyncs (and only ever mount read-only!), then 
try mounting it or running fsck -n to see whether the data makes sense, 
you might be able to get data out of it.  I have never tried this 
technique myself; I read it in a post by Neil Brown on this ML, dated 
12/04/2009 10:12 PM, subject "Re: RAID5 demise or coma? after 
re-creating with a spare"; try looking for it in the archives.
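
As a sketch only (untested; the chunk size and 0.90 metadata are the
values your mdadm -E output shows, and nothing should ever be mounted
read-write while experimenting):

mdadm --stop /dev/md0
# one permutation; repeat with the devices in a different order each time
mdadm --create /dev/md0 --assume-clean --metadata=0.90 --level=5 --chunk=64 \
      --raid-devices=5 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
fsck.ext4 -n /dev/md0                # read-only check: does the filesystem look sane?
mount -t ext4 -o ro /dev/md0 /mnt    # if it does, mount read-only and inspect the files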

* Re: Several steps to death
  2010-01-26  9:28 ` Asdo
@ 2010-01-26 13:17   ` aragonx
  2010-01-26 13:36     ` Asdo
  2010-01-26 14:45     ` Kristleifur Daðason
  0 siblings, 2 replies; 11+ messages in thread
From: aragonx @ 2010-01-26 13:17 UTC (permalink / raw)
  To: linux-raid

> In theory, if you stop the current resync and recreate the array in all
> possible permutations of the 5 drives, using --assume-clean during
> creation to prevent any resyncs (and only ever mount read-only!), then
> try mounting it or running fsck -n to see whether the data makes sense,
> you might be able to get data out of it.  I have never tried this
> technique myself; I read it in a post by Neil Brown on this ML, dated
> 12/04/2009 10:12 PM, subject "Re: RAID5 demise or coma? after
> re-creating with a spare"; try looking for it in the archives.

Wow.

I tried several times to assemble the array but I was not aware that it
has to be in the same order.  That explains a lot.  Time for a hardware
RAID controller.  :)  Thank you for the information.

---
Will Y.



* Re: Several steps to death
  2010-01-26 13:17   ` aragonx
@ 2010-01-26 13:36     ` Asdo
  2010-01-26 14:36       ` aragonx
  2010-01-26 14:45     ` Kristleifur Daðason
  1 sibling, 1 reply; 11+ messages in thread
From: Asdo @ 2010-01-26 13:36 UTC (permalink / raw)
  To: aragonx; +Cc: linux-raid

aragonx@dcsnow.com wrote:
> Wow.
>
> I tried several times to assemble the array but I was not aware that it
> has to be in the same order.  That explains a lot.  Time for a hardware
> RAID controller.  :)  Thank you for the information.
>   

"--assemble" does not need drives to be specified in the correct order, 
but "--create" on an existing array will be destructive on existing data 
unless drives are specified in the same order as before.
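
For the record, the order on the command line is irrelevant to assemble
because mdadm reads each device's role from its superblock, so (as a
sketch, with the device names from the original report) either of these
is fine:

mdadm --assemble /dev/md0 /dev/sdf1 /dev/sdb1 /dev/sde1 /dev/sdc1 /dev/sdd1
mdadm --assemble --scan     # or let mdadm find the members via /etc/mdadm.conf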

* Re: Several steps to death
  2010-01-26 13:36     ` Asdo
@ 2010-01-26 14:36       ` aragonx
  2010-01-26 14:56         ` Michał Sawicz
  0 siblings, 1 reply; 11+ messages in thread
From: aragonx @ 2010-01-26 14:36 UTC (permalink / raw)
  To: linux-raid

> "--assemble" does not need drives to be specified in the correct order,
> but "--create" on an existing array will be destructive on existing data
> unless drives are specified in the same order as before.

Poor wording on my part.  Yes, I tried to assemble the array several times
under Fedora 12 and Knoppix 6.2.  It would not start, so I then tried the
create.  Either way the data is gone and that answers my question.

Thank you


* Re: Several steps to death
  2010-01-26 13:17   ` aragonx
  2010-01-26 13:36     ` Asdo
@ 2010-01-26 14:45     ` Kristleifur Daðason
  1 sibling, 0 replies; 11+ messages in thread
From: Kristleifur Daðason @ 2010-01-26 14:45 UTC (permalink / raw)
  To: aragonx; +Cc: linux-raid

On Tue, Jan 26, 2010 at 1:17 PM, <aragonx@dcsnow.com> wrote:
> Time for a hardware RAID controller. :)

I'm sorry for your data loss. Not a nice thing to experience.

When/if you get a hardware RAID controller, be sure to get one that you
can reliably buy again and replace. A friend of mine had some pretty
bad trouble when his hardware controller fried and he couldn't easily
get a replacement.

I'd actually suggest getting a hardware controller that uses an on-disk
format that is also supported by some software implementation, so you
have more options for rescuing the array. (I think mdadm can read some
hardware controllers' metadata formats (??))

Best of luck.

-- Kristleifur

* Re: Several steps to death
  2010-01-26 14:36       ` aragonx
@ 2010-01-26 14:56         ` Michał Sawicz
  2010-01-26 15:07           ` Asdo
  2010-01-26 17:48           ` aragonx
  0 siblings, 2 replies; 11+ messages in thread
From: Michał Sawicz @ 2010-01-26 14:56 UTC (permalink / raw)
  To: aragonx; +Cc: linux-raid

On Tue, 2010-01-26 at 09:36 -0500, aragonx@dcsnow.com wrote:
> Poor wording on my part.  Yes, I tried to assemble the array several
> times under Fedora 12 and Knoppix 6.2.  It would not start, so I then
> tried the create.  Either way the data is gone and that answers my
> question.

You should have also used '--assume-clean' so that no data would be
written onto the array while the array was assembled. Saved my ass
earlier.

-- 
Cheers
Michał (Saviq) Sawicz

* Re: Several steps to death
  2010-01-26 14:56         ` Michał Sawicz
@ 2010-01-26 15:07           ` Asdo
  2010-01-26 17:48           ` aragonx
  1 sibling, 0 replies; 11+ messages in thread
From: Asdo @ 2010-01-26 15:07 UTC (permalink / raw)
  To: Michał Sawicz; +Cc: linux-raid

Michał Sawicz wrote:
> On Tue, 2010-01-26 at 09:36 -0500, aragonx@dcsnow.com wrote:
>   
>> Poor wording on my part.  Yes, I tried to assemble the array several
>> times under Fedora 12 and Knoppix 6.2.  It would not start, so I then
>> tried the create.  Either way the data is gone and that answers my
>> question.
>
> You should have also used '--assume-clean' so that no data would be
> written onto the array while the array was assembled. Saved my ass
> earlier.

--assume-clean is a create option, not an assemble option.

Assemble should never destroy data.  If it really happened, it's a bug, 
but we don't have much information.

* Re: Several steps to death
  2010-01-26 14:56         ` Michał Sawicz
  2010-01-26 15:07           ` Asdo
@ 2010-01-26 17:48           ` aragonx
  2010-01-26 18:12             ` Michał Sawicz
  1 sibling, 1 reply; 11+ messages in thread
From: aragonx @ 2010-01-26 17:48 UTC (permalink / raw)
  To: linux-raid

> You should have also used '--assume-clean' so that no data would be
> written onto the array while the array was assembled. Saved my ass
> earlier.

A very good thing to know.  I wish I had known it earlier.  I had
tried the create trick before and it worked fine, so I figured that was all
I needed to do.  This RAID stuff may be too involved for what I need.  It
is only about 2TB of data.  I may just split it up between two 2TB hard
disks and leave it be.

So I guess that opens this up to a question.  What would everyone suggest
I do?  Should I try the software RAID again?  Should I go with a hardware
controller?  Or should I just have two independent disks?

Right now I have 5 750GB disks that were part of the RAID 5.  I can of
course reuse those in whatever solution I go with.

My thinking this morning is to set up a 2nd server with the 5 750GB disks
and leave that as a software RAID.  Then I would purchase 2 2TB disks for
my current server and just use them as-is.

Throughput is a concern though.  I was getting 50MB/sec over the
network.  This was okay but I really wanted to get that up to 75MB/sec or
more.

Suggestions?

---
Will Y.


* Re: Several steps to death
  2010-01-26 17:48           ` aragonx
@ 2010-01-26 18:12             ` Michał Sawicz
  0 siblings, 0 replies; 11+ messages in thread
From: Michał Sawicz @ 2010-01-26 18:12 UTC (permalink / raw)
  To: aragonx; +Cc: linux-raid

On Tue, 2010-01-26 at 12:48 -0500, aragonx@dcsnow.com wrote:
> My thinking this morning is to set up a 2nd server with the 5 750GB
> disks and leave that as a software RAID.  Then I would purchase 2 2TB
> disks for my current server and just use them as-is.

Remember that if you lose either of the 2TB disks, you lose all the data
on that disk. With those 5 disks in a RAID5 array, you can safely replace
any one of them without losing any data.
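
The replacement itself is a short procedure; roughly (a sketch, with a
hypothetical failed member):

mdadm /dev/md0 --fail /dev/sdd1      # mark the dying disk failed, if md hasn't already
mdadm /dev/md0 --remove /dev/sdd1    # take it out of the array
# physically swap the disk, partition it like the others, then:
mdadm /dev/md0 --add /dev/sdd1       # re-add; md rebuilds it from parity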

> Throughput is a concern though.  I was getting 50MB/sec over the
> network.  This was okay but I really wanted to get that up to 75MB/sec
> or more.

From a RAID5 array (when reading) you should get more throughput than
from a single, even faster, disk. Remember that with RAID5, data is
striped across N-1 disks (where N is the number of disks in your array),
so large reads get roughly N-1 disks' worth of throughput. Writing can
be a bit slower because of the parity calculation that has to occur.
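
As a rough, back-of-the-envelope figure (assuming each 750GB disk manages
something like 75MB/sec on sequential reads, which is only a guess): large
reads are striped over the 4 data disks, so on the order of 4 x 75 =
300MB/sec from the array, and a gigabit network (~110MB/sec at best) will
be the bottleneck long before the disks are.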

I would stay with the software RAID setup, but it seems you should read
up on it some more first. Remember to use monitoring tools like smartd
and mdadm --monitor, which will shout at you when there are problems.
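
For example, something along these lines (the mail address is just a
placeholder):

mdadm --monitor --scan --daemonise --mail=you@example.com   # mails you on failure/degraded events
# and enable smartd (see /etc/smartd.conf) for background SMART checks of the disks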

-- 
Cheers
Michał (Saviq) Sawicz

end of thread, other threads:[~2010-01-26 18:12 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-25 21:21 Several steps to death aragonx
2010-01-26  1:35 ` Michael Evans
2010-01-26  9:28 ` Asdo
2010-01-26 13:17   ` aragonx
2010-01-26 13:36     ` Asdo
2010-01-26 14:36       ` aragonx
2010-01-26 14:56         ` Michał Sawicz
2010-01-26 15:07           ` Asdo
2010-01-26 17:48           ` aragonx
2010-01-26 18:12             ` Michał Sawicz
2010-01-26 14:45     ` Kristleifur Daðason
