Failed drive while converting raid5 to raid6, then a hard reboot

* Failed drive while converting raid5 to raid6, then a hard reboot
@ 2012-04-30 13:59 Hákon Gíslason
  2012-05-08 20:48 ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: Hákon Gíslason @ 2012-04-30 13:59 UTC (permalink / raw)
  To: linux-raid

Hello,
I've been having frequent drive "failures", as in, they are reported
failed/bad and mdadm sends me an email telling me things went wrong,
etc... but after a reboot or two, they are perfectly fine again. I'm
not sure what it is, but this server is quite new and I think there
might be more behind it, bad memory or the motherboard (I've been
having other issues as well). I've had 4 drive "failures" in this
month, all different drives except for one, which "failed" twice, and
all have been fixed with a reboot or rebuild (all drives reported bad
by mdadm passed an extensive SMART test).
Due to this, I decided to convert my raid5 array to a raid6 array
while I find the root cause of the problem.

I started the conversion right after a drive failure & rebuild, but as
it had converted/reshaped aprox. 4%(if I remember correctly, and it
was going really slowly, ~7500 minutes to completion), it reported
another drive bad, and the conversion to raid6 stopped (it said
"rebuilding", but the speed was 0K/sec and the time left was a few
million minutes.
After that happened, I tried to stop the array and reboot the server,
as I had done previously to get the reportedly "bad" drive working
again, but It wouldn't stop the array or reboot, neither could I
unmount it, it just hung whenever I tried to do something with
/dev/md0. After trying to reboot a few times, I just killed the power
and re-started it. Admittedly this was probably not the best thing I
could have done at that point.

I have backup of ca. 80% of the data on there, it's been a month since
the last complete backup (because I ran out of backup disk space).

So, the big question, can the array be activated, and can it complete
the conversion to raid6? And will I get my data back?
I hope the data can be rescued, and any help I can get would be much
appreciated!

I'm fairly new to raid in general, and have been using mdadm for about
a month now.
Here's some data:

root@axiom:~# mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=cfedbfc1:feaee982:4e92ccf4:45e08ed1
name=axiom.is:0

root@axiom:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdc[6] sde[7] sdb[5] sda[4]
      7814054240 blocks super 1.2

root@axiom:~# mdadm --assemble --scan --force --run /dev/md0
mdadm: /dev/md0 is already in use.

root@axiom:~# mdadm --stop /dev/md0
mdadm: stopped /dev/md0

root@axiom:~# mdadm --assemble --scan --force --run /dev/md0
mdadm: Failed to restore critical section for reshape, sorry.
      Possibly you needed to specify the --backup-file

root@axiom:~# mdadm --assemble --scan --force --run /dev/md0
--backup-file=/root/mdadm-backup-file
mdadm: Failed to restore critical section for reshape, sorry.

root@axiom:~# fdisk -l | grep 2000
Disk /dev/sda doesn't contain a valid partition table
Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdb: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
Disk /dev/sdf: 2000.4 GB, 2000398934016 bytes

root@axiom:~# mdadm --examine /dev/sd{a,b,c,e,f}
/dev/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0

  (local to host axiom.is

)
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : b11a7424:fc470ea7:51ba6ea0:158c0ce6

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sun Oct 14 15:20:06 2012
       Checksum : 76ecd244 - correct
         Events : 138274

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 3
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x6
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0

  (local to host axiom.is

)
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
Recovery Offset : 161546240 sectors
          State : active
    Device UUID : 8389f39f:cc7fa027:f10cf717:1d41d40b

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sun Oct 14 15:20:06 2012
       Checksum : 19ef8090 - correct
         Events : 138274

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 4
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0

  (local to host axiom.is

)
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : b2cec17f:e526b42e:9e69e46b:23be5163

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sun Oct 14 15:20:06 2012
       Checksum : a29b468a - correct
         Events : 138274

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 1
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0

  (local to host axiom.is

)
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 21c799cd:58be3156:6830865b:fa984134

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sun Oct 14 15:20:06 2012
       Checksum : d882780e - correct
         Events : 138274

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 2
   Array State : .AAAA ('A' == active, '.' == missing)
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x4
     Array UUID : cfedbfc1:feaee982:4e92ccf4:45e08ed1
           Name : axiom.is:0

  (local to host axiom.is

)
  Creation Time : Mon Apr  9 01:05:20 2012
     Raid Level : raid6
   Raid Devices : 5

 Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
     Array Size : 11721080448 (5589.05 GiB 6001.19 GB)
  Used Dev Size : 3907026816 (1863.02 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 8b043488:8379f327:5f00e0fe:6a1e0bee

  Reshape pos'n : 242343936 (231.12 GiB 248.16 GB)
     New Layout : left-symmetric

    Update Time : Sat Apr 28 22:57:36 2012
       Checksum : c122639f - correct
         Events : 138241

         Layout : left-symmetric-6
     Chunk Size : 32K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing)
--
Hákon G.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread