* Re: Help needed recovering from raid failure
@ 2015-04-29 18:17 Peter van Es
  2015-04-29 23:27 ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Peter van Es @ 2015-04-29 18:17 UTC
  To: NeilBrown; +Cc: linux-raid

Dear Neil,

First of all, I really appreciate you trying to help me. This is the first time I’m deploying software RAID, so I really appreciate the guidance.


> On 29 Apr 2015, at 00:26, NeilBrown <neilb@suse.de> wrote:
> 
> This isn't really reporting anything new.
> There is probably a daily cron job which reports all degraded arrays.  This
> message is reported by that job.

I understand...

> 
> 
> Why do you think the array is off-line?  The above message doesn't suggest
> that.
> 

My Ubuntu server was accessible through ssh but did not serve webpages, files etc. When I went to the console, 
it told me it had taken the array offline because of degraded /dev/sdd2 and /dev/sdc2
Those two drives were out of the array. 

> 
>> 
>> Needless to say, I can't boot the system anymore as the boot drive is /dev/md0, and GRUB can't
>> get at it. I do need to recover data (I know, but there's stuff on there I have no backup for--yet).
> 
> You boot off a RAID5?  Does grub support that?  I didn't know.
> But md0 hasn't failed, has it?
> 
> Confused.

Well, it took a little time but yes, I managed to define a raid 5 array that the system was able to boot from. 

> There is something VERY sick here.  I suggest that you tread very carefully.
> 
> All your '1' partitions should be about 2GB and the '2' partitions about 2TB
> 
> But the --examine output suggests sda2 and sdb2 are 2TB, while sdd2 and sde2
> are 2GB.
> 
> That really really shouldn't happen.  Maybe check your partition table
> (fdisk).
> I really cannot see how this would happen.

But this question, and the previous question you asked, tell me a little of what I may have done…

I think I confused /dev/md0 and /dev/md1 (now called /dev/md127 and /dev/md126, respectively, when running off the USB stick).

/dev/md0 is a swap array (around 6GB, comprised of 4 x 2 GB in raid 5)
/dev/md1 is the boot and data array (around 5 TB, comprised of 4 x ~2 TB in raid 5) 
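
As a sanity check on those two sizes: a RAID-5 array of n members holds (n - 1) members' worth of data, with the remainder used for parity. The following is a minimal sketch, assuming the Used Dev Size values (in 512-byte sectors) from the --examine output in this message and assuming mdadm prints Array Size in KiB; the function name is illustrative only.

```python
SECTOR_BYTES = 512

def raid5_array_kib(used_dev_sectors: int, raid_devices: int) -> int:
    """Usable RAID-5 capacity in KiB: one member's worth of space holds parity."""
    data_members = raid_devices - 1
    return used_dev_sectors * SECTOR_BYTES * data_members // 1024

# Used Dev Size values copied from the --examine output (sectors).
md0_kib = raid5_array_kib(3900416, 4)       # swap array built on the '1' partitions
md1_kib = raid5_array_kib(3902860288, 4)    # data array built on the '2' partitions

print(md0_kib, md1_kib)
print(round(md0_kib / 1024**2, 2), "GiB;", round(md1_kib / 1024**2, 2), "GiB")
```

Both values agree with the Array Size lines that mdadm reports for the two arrays (5850624 and 5854290432).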

I must have confused them and tried to add the /dev/sdc2 and /dev/sdd2 drives to the /dev/md0 array (mdadm --add /dev/md0 /dev/sdc2)
instead of to the /dev/md1 array. They were then added as spare drives and their superblocks were overwritten, but since
a) no swap space was used, and
b) they were added as spares,

the data should not have been overwritten.

> 
> Can you
>  mdadm -Ss
> 
> to stop all the arrays, then
> 
>  fdisk -l /dev/sd?
> 
> then 
> 
>  mdadm -Esvv
> 

Neil, here they are: again, I appreciate you taking the time and guiding me through this!

Is there any way to resurrect the superblocks and try to force-assemble the array, skipping the failing drive /dev/sdd2? (The /dev/sdd2 drive produced some errors I observed in the log; /dev/sdc2 must have had a one-off issue to be taken out.) I have two new drives (arrived today) and a new SSD. I would want to get the array assembled using /dev/sdc2, perhaps forcing it back to the array geometry and “hoping for the best”, and then install a new /dev/sdd2 to be recovered. Then I’ll create a boot and swap drive on the SSD, which means that any array failure should not prevent the system from booting.

Requested outputs are below

Thanks, 

Peter


fdisk output: (USB devices deleted)


Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f24ee

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048     3905535     1951744   fd  Linux raid autodetect
/dev/sda2   *     3905536  3907028991  1951561728   fd  Linux raid autodetect

Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00029d5c

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048     3905535     1951744   fd  Linux raid autodetect
/dev/sdb2   *     3905536  3907028991  1951561728   fd  Linux raid autodetect


Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000727bf

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048     3905535     1951744   fd  Linux raid autodetect
/dev/sdd2   *     3905536  3907028991  1951561728   fd  Linux raid autodetect

Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009fe7f

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048     3905535     1951744   fd  Linux raid autodetect
/dev/sde2   *     3905536  3907028991  1951561728   fd  Linux raid autodetect
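
The four partition tables above can be cross-checked against the superblock figures with a little arithmetic. This is a rough sketch only: the relation "Avail Dev Size = partition length minus Data Offset" is an observation that fits the numbers in this thread, not a quoted definition from mdadm, and the helper name is made up for illustration.

```python
def partition_sectors(start: int, end: int) -> int:
    """Length of a partition in 512-byte sectors (fdisk's End column is inclusive)."""
    return end - start + 1

# Start/End values copied from the fdisk listing (identical on all four disks).
part1 = partition_sectors(2048, 3905535)
part2 = partition_sectors(3905536, 3907028991)

# fdisk's "Blocks" column is in 1 KiB units, i.e. half the sector count.
print(part1 // 2, part2 // 2)

# Assumption: Avail Dev Size in --examine is the partition length minus Data Offset.
avail_sda2 = part2 - 262144   # sda2/sdb2 carry Data Offset 262144
avail_sdd2 = part2 - 2048     # sdd2/sde2 carry Data Offset 2048
avail_part1 = part1 - 2048    # all '1' partitions carry Data Offset 2048
print(avail_sda2, avail_sdd2, avail_part1)
```

Under that assumption the results line up with the reported Avail Dev Size values, which suggests the '2' partitions on sdd and sde are still about 2 TB; only the Used Dev Size recorded in their superblocks (3900416 sectors) reflects the small swap array.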


mdadm -Esvv output (USB devices deleted)

/dev/sde2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
           Name : ubuntu:0  (local to host ubuntu)
  Creation Time : Wed Apr  1 22:27:42 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3903121408 (1861.15 GiB 1998.40 GB)
     Array Size : 5850624 (5.58 GiB 5.99 GB)
  Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : cdae3287:91168194:942ba99d:1a85c466

    Update Time : Wed Apr 29 17:46:25 2015
       Checksum : b8b84dad - correct
         Events : 30

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
           Name : ubuntu:0  (local to host ubuntu)
  Creation Time : Wed Apr  1 22:27:42 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
     Array Size : 5850624 (5.58 GiB 5.99 GB)
  Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b051f523:4887e729:cd63bed1:8c2a7575

    Update Time : Wed Apr 29 17:46:25 2015
       Checksum : 453ddeef - correct
         Events : 30

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sde:
   MBR Magic : aa55
Partition[0] :      3903488 sectors at         2048 (type fd)
Partition[1] :   3903123456 sectors at      3905536 (type fd)
/dev/sdd2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
           Name : ubuntu:0  (local to host ubuntu)
  Creation Time : Wed Apr  1 22:27:42 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3903121408 (1861.15 GiB 1998.40 GB)
     Array Size : 5850624 (5.58 GiB 5.99 GB)
  Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 0f3f2b91:09cbb344:e52c4c4b:722d65c4

    Update Time : Wed Apr 29 17:46:25 2015
       Checksum : 7e273c0f - correct
         Events : 30

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
           Name : ubuntu:0  (local to host ubuntu)
  Creation Time : Wed Apr  1 22:27:42 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
     Array Size : 5850624 (5.58 GiB 5.99 GB)
  Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b6668730:3b1380bf:556700d9:30df829c

    Update Time : Wed Apr 29 17:46:25 2015
       Checksum : 15b83814 - correct
         Events : 30

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdd:
   MBR Magic : aa55
Partition[0] :      3903488 sectors at         2048 (type fd)
Partition[1] :   3903123456 sectors at      3905536 (type fd)
/dev/sdb2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
           Name : ubuntu:1  (local to host ubuntu)
  Creation Time : Wed Apr  1 22:27:58 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3902861312 (1861.03 GiB 1998.26 GB)
     Array Size : 5854290432 (5583.09 GiB 5994.79 GB)
  Used Dev Size : 3902860288 (1861.03 GiB 1998.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f1e79609:79b7ac23:55197f70:e8fbfd58

    Update Time : Sun Apr 26 05:59:13 2015
       Checksum : 696f4e76 - correct
         Events : 18014

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA.. ('A' == active, '.' == missing)
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
           Name : ubuntu:0  (local to host ubuntu)
  Creation Time : Wed Apr  1 22:27:42 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
     Array Size : 5850624 (5.58 GiB 5.99 GB)
  Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : f52239b1:0fb87e7e:71e29ea4:bf67184a

    Update Time : Wed Apr 29 17:46:25 2015
       Checksum : ce9c9cd0 - correct
         Events : 30

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sdb:
   MBR Magic : aa55
Partition[0] :      3903488 sectors at         2048 (type fd)
Partition[1] :   3903123456 sectors at      3905536 (type fd)
/dev/sda2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
           Name : ubuntu:1  (local to host ubuntu)
  Creation Time : Wed Apr  1 22:27:58 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3902861312 (1861.03 GiB 1998.26 GB)
     Array Size : 5854290432 (5583.09 GiB 5994.79 GB)
  Used Dev Size : 3902860288 (1861.03 GiB 1998.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 713e556d:ca104217:785db68a:d820a57b

    Update Time : Sun Apr 26 05:59:13 2015
       Checksum : fda151f9 - correct
         Events : 18014

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA.. ('A' == active, '.' == missing)
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
           Name : ubuntu:0  (local to host ubuntu)
  Creation Time : Wed Apr  1 22:27:42 2015
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3901440 (1905.32 MiB 1997.54 MB)
     Array Size : 5850624 (5.58 GiB 5.99 GB)
  Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c483532d:06f93351:cfdf5a92:e83855b5

    Update Time : Wed Apr 29 17:46:25 2015
       Checksum : 76650d1c - correct
         Events : 30

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing)
/dev/sda:
   MBR Magic : aa55
Partition[0] :      3903488 sectors at         2048 (type fd)
Partition[1] :   3903123456 sectors at      3905536 (type fd)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Help needed recovering from raid failure
@ 2015-04-27  9:35 Peter van Es
  2015-04-27 11:07 ` Mikael Abrahamsson
  2015-04-28 22:26 ` NeilBrown
  0 siblings, 2 replies; 7+ messages in thread
From: Peter van Es @ 2015-04-27  9:35 UTC
  To: linux-raid

Sorry for the long post...

I am running Ubuntu 14.04.2 LTS Server edition, 64-bit, with 4x 2.0TB drives in a RAID-5 array.

The 4th drive was beginning to show read errors. Because it was the weekend, I could not go out
and buy a spare 2TB drive to replace the one that was beginning to fail.

I first got a fail event:

This is an automatically generated mail message from mdadm
running on bali

A Fail event had been detected on md device /dev/md/1.

It could be related to component device /dev/sdd2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid5 sdc2[2] sdb2[1] sda2[0] sdd2[3](F)
    5854290432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

md0 : active raid5 sdc1[2] sdd1[3] sdb1[1] sda1[0]
    5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

And then subsequently, around 18 hours later:

This is an automatically generated mail message from mdadm
running on bali

A DegradedArray event had been detected on md device /dev/md/1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid5 sdc2[2] sdb2[1] sda2[0] sdd2[3](F)
    5854290432 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

md0 : active raid5 sdc1[2] sdd1[3] sdb1[1] sda1[0]
    5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

The server had taken the array offline at that point.

Needless to say, I can't boot the system anymore as the boot drive is /dev/md0, and GRUB can't
get at it. I do need to recover data (I know, but there's stuff on there I have no backup for--yet).

I booted Linux from a USB stick (which is on /dev/sdc1, hence the changed numbering)
in recovery mode. Below is the output of /proc/mdstat and
mdadm --examine. It looks like the /dev/sdd2 and /dev/sde2 drives somehow took on the
superblock of the /dev/md127 device (my swap array). Could that have been done by booting from
the Ubuntu USB stick?
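
One way to make that kind of superblock mix-up visible is to group the devices by the Array UUID that each one reports. The following is a small sketch, not an mdadm feature: it parses --examine-style text, and the sample is an abbreviated excerpt of the output further down in this message (only three fields per device are kept). The function name and the parsing approach are assumptions made for illustration.

```python
import re
from collections import defaultdict

def group_by_array(examine_text: str) -> dict:
    """Map (array name, array UUID) -> [(device, role), ...] from --examine-style text."""
    groups = defaultdict(list)
    device = uuid = name = None
    for raw in examine_text.splitlines():
        line = raw.strip()
        header = re.match(r"^(/dev/\S+):$", line)
        if header:                      # a new "/dev/xxx:" section starts
            device, uuid, name = header.group(1), None, None
            continue
        key, sep, value = line.partition(" : ")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Array UUID":
            uuid = value
        elif key == "Name":
            name = value.split()[0]     # drop the "(local to host ...)" suffix
        elif key == "Device Role":
            groups[(name, uuid)].append((device, value))
    return dict(groups)

# Abbreviated excerpt of the --examine output in this message.
SAMPLE = """\
/dev/sda2:
     Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
           Name : ubuntu:1  (local to host ubuntu)
    Device Role : Active device 0
/dev/sdb2:
     Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
           Name : ubuntu:1  (local to host ubuntu)
    Device Role : Active device 1
/dev/sdd2:
     Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
           Name : ubuntu:0  (local to host ubuntu)
    Device Role : spare
/dev/sde2:
     Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
           Name : ubuntu:0  (local to host ubuntu)
    Device Role : spare
"""

groups = group_by_array(SAMPLE)
for (name, uuid), members in groups.items():
    print(name, uuid, members)
```

Grouped this way, sda2 and sdb2 appear under ubuntu:1, while sdd2 and sde2 appear as spares under ubuntu:0, the UUID of the small array.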

My plan: assemble a degraded array without /dev/sde2 (the 4th drive, formerly known as /dev/sdd2).
Because the fail event put the file system in RO mode, I expect /dev/sdd2 (formerly /dev/sdc2) to be OK.
Then insert a new 2TB drive in slot 4 and let the system resync and recover.

I'm running xfs on the /dev/md1 device.

Questions:

1. Is this a wise course of action?
2. How exactly do I reassemble the array? (/etc/mdadm.conf is inaccessible in recovery mode.)
3. Which command-line options exactly do I use, based on the --examine output below, without screwing things up?

Any help or pointers gratefully accepted.

Peter van Es




/proc/mdstat (in recovery)

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md126 : inactive sdb2[1](S) sda2[0](S)
     3902861312 blocks super 1.2

md127 : active raid5 sde2[5](S) sde1[3] sdb1[1] sda1[0] sdd1[2] sdd2[4](S)
     5850624 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>
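
The member lists above are easier to compare when split into fields. This is a minimal sketch that parses only the "mdN : ..." lines; it assumes the usual /proc/mdstat convention that a trailing (S) marks a spare and (F) a failed member, and the regular expression and function name are illustrative, not part of any tool.

```python
import re

# device name, bracketed number, optional single-letter flag such as (S) or (F)
MEMBER = re.compile(r"(\w+)\[(\d+)\](?:\((\w)\))?")

def parse_mdstat_line(line: str):
    """Split one 'mdN : ...' line into (array, [(device, number, flag), ...])."""
    array, _, rest = line.partition(" : ")
    members = [(dev, int(num), flag) for dev, num, flag in MEMBER.findall(rest)]
    return array.strip(), members

# The two array lines copied from the recovery-mode /proc/mdstat above.
lines = [
    "md126 : inactive sdb2[1](S) sda2[0](S)",
    "md127 : active raid5 sde2[5](S) sde1[3] sdb1[1] sda1[0] sdd1[2] sdd2[4](S)",
]

parsed = dict(parse_mdstat_line(l) for l in lines)
for array, members in parsed.items():
    flagged = [dev for dev, _, flag in members if flag == "S"]
    print(array, "members flagged (S):", flagged)
```

For md127 this lists sde2 and sdd2 as the (S) members, next to the four '1' partitions that make up the active array.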

mdadm --examine /dev/sd[abde]2 


/dev/sda2:
         Magic : a92b4efc
       Version : 1.2
   Feature Map : 0x0
    Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
          Name : ubuntu:1  (local to host ubuntu)
 Creation Time : Wed Apr  1 22:27:58 2015
    Raid Level : raid5
  Raid Devices : 4

Avail Dev Size : 3902861312 (1861.03 GiB 1998.26 GB)
    Array Size : 5854290432 (5583.09 GiB 5994.79 GB)
 Used Dev Size : 3902860288 (1861.03 GiB 1998.26 GB)
   Data Offset : 262144 sectors
  Super Offset : 8 sectors
         State : clean
   Device UUID : 713e556d:ca104217:785db68a:d820a57b

   Update Time : Sun Apr 26 05:59:13 2015
      Checksum : fda151f9 - correct
        Events : 18014

        Layout : left-symmetric
    Chunk Size : 512K

  Device Role : Active device 0
  Array State : AA.. ('A' == active, '.' == missing)

/dev/sdb2:
         Magic : a92b4efc
       Version : 1.2
   Feature Map : 0x0
    Array UUID : 1f28f7bb:7b3ecd41:ca0fa5d1:ccd008df
          Name : ubuntu:1  (local to host ubuntu)
 Creation Time : Wed Apr  1 22:27:58 2015
    Raid Level : raid5
  Raid Devices : 4

Avail Dev Size : 3902861312 (1861.03 GiB 1998.26 GB)
    Array Size : 5854290432 (5583.09 GiB 5994.79 GB)
 Used Dev Size : 3902860288 (1861.03 GiB 1998.26 GB)
   Data Offset : 262144 sectors
  Super Offset : 8 sectors
         State : clean
   Device UUID : f1e79609:79b7ac23:55197f70:e8fbfd58

   Update Time : Sun Apr 26 05:59:13 2015
      Checksum : 696f4e76 - correct
        Events : 18014

        Layout : left-symmetric
    Chunk Size : 512K

  Device Role : Active device 1
  Array State : AA.. ('A' == active, '.' == missing)

/dev/sdd2:
         Magic : a92b4efc
       Version : 1.2
   Feature Map : 0x0
    Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
          Name : ubuntu:0  (local to host ubuntu)
 Creation Time : Wed Apr  1 22:27:42 2015
    Raid Level : raid5
  Raid Devices : 4

Avail Dev Size : 3903121408 (1861.15 GiB 1998.40 GB)
    Array Size : 5850624 (5.58 GiB 5.99 GB)
 Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
   Data Offset : 2048 sectors
  Super Offset : 8 sectors
         State : clean
   Device UUID : 0f3f2b91:09cbb344:e52c4c4b:722d65c4

   Update Time : Mon Apr 27 08:37:15 2015
      Checksum : 7e241855 - correct
        Events : 26

        Layout : left-symmetric
    Chunk Size : 512K

  Device Role : spare
  Array State : AAAA ('A' == active, '.' == missing)

/dev/sde2:
         Magic : a92b4efc
       Version : 1.2
   Feature Map : 0x0
    Array UUID : dbe238a3:c7a528c1:a1b78589:276ecfcf
          Name : ubuntu:0  (local to host ubuntu)
 Creation Time : Wed Apr  1 22:27:42 2015
    Raid Level : raid5
  Raid Devices : 4

Avail Dev Size : 3903121408 (1861.15 GiB 1998.40 GB)
    Array Size : 5850624 (5.58 GiB 5.99 GB)
 Used Dev Size : 3900416 (1904.82 MiB 1997.01 MB)
   Data Offset : 2048 sectors
  Super Offset : 8 sectors
         State : clean
   Device UUID : cdae3287:91168194:942ba99d:1a85c466

   Update Time : Mon Apr 27 08:37:15 2015
      Checksum : b8b529f3 - correct
        Events : 26

        Layout : left-symmetric
    Chunk Size : 512K

  Device Role : spare
  Array State : AAAA ('A' == active, '.' == missing)
