From: NeilBrown <neilb@suse.de>
To: Peter van Es <vanes.peter@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Help needed recovering from raid failure
Date: Wed, 29 Apr 2015 08:26:03 +1000
Message-ID: <20150429082603.56fb9aa9@notabene.brown>
In-Reply-To: <4D8713B5-39E7-4EE2-898C-35DC0948B4CA@gmail.com>


On Mon, 27 Apr 2015 11:35:09 +0200 Peter van Es <vanes.peter@gmail.com> wrote:

> Sorry for the long post...
> 
> I am running Ubuntu LTS 14.04.02 Server edition, 64 bits, with 4x 2.0TB drives in a raid-5 array.
> 
> The 4th drive was beginning to show read errors. Because it was weekend, I could not go out
> and buy a spare 2TB drive to replace the one that was beginning to fail.
> 
> I first got a fail event:
> 
> This is an automatically generated mail message from mdadm
> running on bali
> 
> A Fail event had been detected on md device /dev/md/1.
> 
> It could be related to component device /dev/sdd2.
> 
> Faithfully yours, etc.
> 
> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
> md1 : active raid5 sdc2[2] sdb2[1] sda2[0] sdd2[3](F)
>     5854290432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
> 
> md0 : active raid5 sdc1[2] sdd1[3] sdb1[1] sda1[0]
>     5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
> 
> unused devices: <none>
> 
> And then subsequently, around 18 hours later:
> 
> This is an automatically generated mail message from mdadm
> running on bali
> 
> A DegradedArray event had been detected on md device /dev/md/1.

This isn't really reporting anything new.
There is probably a daily cron job that checks for degraded arrays, and this
message comes from that job.
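
(On Debian/Ubuntu that job is typically /etc/cron.daily/mdadm, which runs
something along the lines of

  mdadm --monitor --scan --oneshot

and mails a DegradedArray event for any array that is still degraded.)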

> 
> Faithfully yours, etc.
> 
> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
> md1 : active raid5 sdc2[2] sdb2[1] sda2[0] sdd2[3](F)
>     5854290432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
> 
> md0 : active raid5 sdc1[2] sdd1[3] sdb1[1] sda1[0]
>     5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
> 
> unused devices: <none>
> 
> The server had taken the array off line at that point.

Why do you think the array is off-line?  The above message doesn't suggest
that.
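
(A degraded-but-running array still shows up as "active" in /proc/mdstat, as md1
does above.  If the machine were still up, something like

  mdadm --detail /dev/md1

should report a state along the lines of "clean, degraded" rather than the array
being stopped.)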


> 
> Needless to say, I can't boot the system anymore as the boot drive is /dev/md0, and GRUB can't
> get at it. I do need to recover data (I know, but there's stuff on there I have no backup for--yet).

You boot off a RAID5?  Does grub support that?  I didn't know.
But md0 hasn't failed, has it?

Confused.



> 
> I booted Linux from a USB stick (which is on /dev/sdc1 hence changing the numbering),
> in recovery mode. Below is the output of /proc/mdstat and 
> mdadm --examine. It looks like somehow the /dev/sdd2 and /dev/sde2 drives took on the 
> superblock of the /dev/md127 device (my swap device). Could that have been caused by booting
> from the Ubuntu USB stick?

There is something VERY sick here.  I suggest that you tread very carefully.

All your '1' partitions should be about 2GB and the '2' partitions about 2TB.

But the --examine output suggests sda2 and sdb2 are 2TB, while sdd2 and sde2
are 2GB.

That really shouldn't happen, and I cannot see how it could have.  Maybe check
your partition table (fdisk).
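
One quick sanity check, assuming your rescue environment includes lsblk, is to
list the partition sizes in bytes and compare them with the 2GB/2TB split you
expect:

  lsblk -b -o NAME,SIZE,TYPE /dev/sd[a-e]

(That device list is only a guess from the names above; adjust it to whatever
the rescue boot actually shows.)
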
> 
> My plan... assemble a degraded array, with /dev/sde2 (the 4th drive, formerly known as /dev/sdd2) not in it.
> Because the fail event put the file system in RO mode, I expect /dev/sdd2 (formerly /dev/sdc2) to be ok.
> Then insert new 2TB drive in slot 4. Let system resync and recover.
> 
> I'm running xfs on the /dev/md1 device.
> 
> Questions:
> 
> 1. is this the wise course of action ?
> 2. how exactly do I reassemble the array (/etc/mdadm.conf is inaccessible in recovery mode)
> 3. what command line options do I use exactly from the --examine output below without screwing things up
> 
> Any help or pointers gratefully accepted.

Can you
  mdadm -Ss

to stop all the arrays, then

  fdisk -l /dev/sd?

then 

  mdadm -Esvv

and post all of that.  Hopefully some of it will make sense.
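
Once that output shows which three partitions really do carry current md1
metadata, recovery would most likely be a forced assembly of those three
members, something like

  mdadm --assemble --force /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdd2

where the device names are only placeholders until we have seen the --examine
output.  Please don't run anything like that yet.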

NeilBrown



