All of lore.kernel.org
* broken raid level 5 array caused by user error
@ 2015-11-09 11:27 Mathias Mueller
  2015-11-09 11:56 ` Mikael Abrahamsson
  0 siblings, 1 reply; 31+ messages in thread
From: Mathias Mueller @ 2015-11-09 11:27 UTC (permalink / raw)
  To: linux-raid

Hi Folks,

I've been running a RAID level 5 array with 4 devices for some years and 
tried to grow it yesterday. I wanted to add two more devices and used 
the following commands:

mdadm --add /dev/md0 /dev/sdf1 /dev/sdg1
mdadm --grow --raid-devices=6 /dev/md0

So far, so good. Everything seemed to work, but after about 2 hours the 
reshape progress was still at 0.0%, and then my own stupidity kicked in. 
I checked the logs via journalctl (I'm running CentOS 7) and read 
something about "main process died" or similar... then I decided to 
reboot.

After reboot, assembling the array failed:

mdadm: Failed to restore critical section for reshape, sorry. Possibly 
you needed to specify the --backup-file

But I did not have a backup file and so I panicked and made even worse 
decisions.

First I tried to assemble the array using --invalid-backup, but it did 
not work. I should have stopped and asked here, but I didn't. I read on 
some forum that re-creating the original array with 4 devices would fix 
my problem. I did not validate this and entered the suggested command:

mdadm -CR /dev/md0 --metadata=1.2 -n4 -l5 -c512 /dev/sd[bcde]1 
--assume-clean

But this did not work (the array assembled, but I could not access the 
ext4 filesystem). It seems that I assembled it in the wrong device 
order, so I also tried different (i.e. all possible) orders, but nothing 
helped (I always used --assume-clean).
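For context, the search space here is at least bounded: a 4-device RAID5 has only 24 possible device orderings. A quick sketch of that count (plain Python, not tied to mdadm):

```python
from itertools import permutations

# The four member partitions of the original array.
devices = ["sdb1", "sdc1", "sdd1", "sde1"]

# Every possible device order mdadm --create could be given.
orders = list(permutations(devices))
print(len(orders))  # 4! = 24 candidate orders
```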

I guess this is the perfect guide for how _not_ to do it :(

I continued reading and found this:

http://serverfault.com/questions/347606/recover-raid-5-data-after-created-new-array-instead-of-re-using

This gave me some hope, and now I wonder if there is a way to get my 
data back. Maybe the offset is wrong?

Things I know about the array:

metadata: 1.2
layout: left-symmetric
chunk-size: 512K

When I run mdadm --detail /dev/md0, it still shows an array size of 
6TB; the UUID is also still the same:

         Version : 1.2
   Creation Time : Mon Nov  9 00:00:40 2015
      Raid Level : raid5
      Array Size : 5860142592 (5588.67 GiB 6000.79 GB)
   Used Dev Size : 1953380864 (1862.89 GiB 2000.26 GB)
    Raid Devices : 4
   Total Devices : 4
     Persistence : Superblock is persistent

   Intent Bitmap : Internal

     Update Time : Mon Nov  9 00:00:45 2015
           State : active
  Active Devices : 4
Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 512K

            Name : xxxx
            UUID : 1d0fdb4e:6111bd7a:96cad2dd:b6a29039
          Events : 1

     Number   Major   Minor   RaidDevice State
        0       8       49        0      active sync   /dev/sdd1
        1       8       65        1      active sync   /dev/sde1
        2       8       33        2      active sync   /dev/sdc1
        3       8       17        3      active sync   /dev/sdb1



mdadm --examine gives these results:

/dev/sdb1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 1d0fdb4e:6111bd7a:96cad2dd:b6a29039
            Name : xxxx
   Creation Time : Mon Nov  9 00:00:40 2015
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 3906766961 (1862.89 GiB 2000.26 GB)
      Array Size : 5860142592 (5588.67 GiB 6000.79 GB)
   Used Dev Size : 3906761728 (1862.89 GiB 2000.26 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=5233 sectors
           State : clean
     Device UUID : e14f0e2d:a26a7b90:d7dbf780:e2218327

Internal Bitmap : 8 sectors from superblock
     Update Time : Mon Nov  9 00:00:45 2015
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 7a76b0d6 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdc1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 1d0fdb4e:6111bd7a:96cad2dd:b6a29039
            Name : xxxx
   Creation Time : Mon Nov  9 00:00:40 2015
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
      Array Size : 5860142592 (5588.67 GiB 6000.79 GB)
   Used Dev Size : 3906761728 (1862.89 GiB 2000.26 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=3248 sectors
           State : clean
     Device UUID : d408e617:37f3f0f5:feb5d77f:07e57668

Internal Bitmap : 8 sectors from superblock
     Update Time : Mon Nov  9 00:00:45 2015
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : a9787e9 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdd:
    MBR Magic : aa55
Partition[0] :   3907024002 sectors at           63 (type fd)

/dev/sdd1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 1d0fdb4e:6111bd7a:96cad2dd:b6a29039
            Name : xxxx
   Creation Time : Mon Nov  9 00:00:40 2015
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 3906761858 (1862.89 GiB 2000.26 GB)
      Array Size : 5860142592 (5588.67 GiB 6000.79 GB)
   Used Dev Size : 3906761728 (1862.89 GiB 2000.26 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=130 sectors
           State : clean
     Device UUID : faf7ec39:e7c0cb77:770a439d:18dc65a0

Internal Bitmap : 8 sectors from superblock
     Update Time : Mon Nov  9 00:00:45 2015
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 3d38419 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sde1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x1
      Array UUID : 1d0fdb4e:6111bd7a:96cad2dd:b6a29039
            Name : xxx
   Creation Time : Mon Nov  9 00:00:40 2015
      Raid Level : raid5
    Raid Devices : 4

  Avail Dev Size : 3906764976 (1862.89 GiB 2000.26 GB)
      Array Size : 5860142592 (5588.67 GiB 6000.79 GB)
   Used Dev Size : 3906761728 (1862.89 GiB 2000.26 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262056 sectors, after=3248 sectors
           State : clean
     Device UUID : fe31b351:3559f949:978035ae:616ae615

Internal Bitmap : 8 sectors from superblock
     Update Time : Mon Nov  9 00:00:45 2015
   Bad Block Log : 512 entries available at offset 72 sectors
        Checksum : 743a6702 - correct
          Events : 1

          Layout : left-symmetric
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
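One detail worth noting in the --examine output: the data offset of 262144 sectors puts the start of the array data 128 MiB into each partition, which is why re-creating with the wrong --data-offset shifts everything. The conversion (assuming the usual 512-byte sectors):

```python
data_offset_sectors = 262144
bytes_per_sector = 512

offset_bytes = data_offset_sectors * bytes_per_sector
print(offset_bytes // (1024 * 1024))  # 128 MiB from the start of each member
```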



I think I know the old device order as well, since I saved an old boot log:

md: bind<sde1>
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdd1>
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
md/raid:md127: device sdd1 operational as raid disk 0
md/raid:md127: device sdc1 operational as raid disk 1
md/raid:md127: device sdb1 operational as raid disk 2
md/raid:md127: device sde1 operational as raid disk 3
md/raid:md127: allocated 4314kB
md/raid:md127: raid level 5 active with 4 out of 4 devices, algorithm 2
created bitmap (15 pages) for device md127
md127: bitmap initialized from disk: read 1 pages, set 0 of 29809 bits
md127: detected capacity change from 0 to 6001188667392
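The "algorithm 2" in that log is md's left-symmetric layout: parity rotates from the last disk downward, and each stripe's data chunks start on the disk right after the parity disk. A small model of that placement (my own sketch of the layout rule, not mdadm code):

```python
def parity_disk(stripe, n):
    # Left-symmetric (algorithm 2): parity walks from disk n-1 down to 0.
    return (n - 1) - (stripe % n)

def data_disks(stripe, n):
    # Data chunks follow the parity disk, wrapping around the array.
    p = parity_disk(stripe, n)
    return [(p + 1 + i) % n for i in range(n - 1)]

# 4-disk array: stripe 0 -> parity on disk 3, data on 0, 1, 2;
#               stripe 1 -> parity on disk 2, data on 3, 0, 1.
print(parity_disk(0, 4), data_disks(0, 4))  # 3 [0, 1, 2]
print(parity_disk(1, 4), data_disks(1, 4))  # 2 [3, 0, 1]
```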



Please help me, I know I'm stupid and don't deserve it. I really hope 
there is a chance of recovering the array.

Thanks a lot in advance

Mathias

* Re: broken raid level 5 array caused by user error
@ 2015-11-10 21:33 Mathias Mueller
  2015-11-10 21:41 ` Phil Turmel
  0 siblings, 1 reply; 31+ messages in thread
From: Mathias Mueller @ 2015-11-10 21:33 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Linux raid

Hi Phil,

>> This both combinations
> 
> Well, I gave you *four* combinations for order, and two suggestions for
> chunk size.  Eight combinations to try.

Sorry, those two words were just the beginning of a sentence which I 
didn't finish and forgot to remove :)

I tried 16 combinations in all (eight with --data-offset=1024 and eight 
with --data-offset=2048). The following four combinations gave the 
"/dev/md0 has unsupported feature(s)" message:

--data-offset=1024 --chunk 64: sde sdd sdb sdc
--data-offset=1024 --chunk 512: sde sdd sdb sdc
--data-offset=1024 --chunk 64: sde sdb sdd sdc
--data-offset=1024 --chunk 512: sde sdb sdd sdc

The other four combinations with --data-offset=1024 and all eight 
combinations with --data-offset=2048 gave:

fsck.ext2: Superblock invalid, trying backup superblocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/md0
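Those fsck errors mean the bytes at the superblock location don't look like an ext filesystem at all: ext2/3/4 keeps a little-endian magic of 0xEF53 at byte 56 of the superblock, which itself starts 1024 bytes into the device. A minimal check of that layout (synthetic buffer, not a real device):

```python
import struct

EXT_MAGIC = 0xEF53  # s_magic, little-endian u16 at byte 56 of the superblock

def looks_like_ext(image):
    # The primary superblock starts 1024 bytes into the filesystem.
    if len(image) < 1024 + 58:
        return False
    (magic,) = struct.unpack_from("<H", image, 1024 + 56)
    return magic == EXT_MAGIC

# A zeroed "device" fails; writing the magic where e2fsck expects it passes.
img = bytearray(4096)
print(looks_like_ext(bytes(img)))          # False
struct.pack_into("<H", img, 1024 + 56, EXT_MAGIC)
print(looks_like_ext(bytes(img)))          # True
```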


>> /dev/md0 has unsupported feature(s): FEATURE_C16 FEATURE_C17 
>> FEATURE_C18
>> FEATURE_C19 FEATURE_C21 FEATURE_C22 FEATURE_C23 FEATURE_C25 
>> FEATURE_C27
>> FEATURE_C28 FEATURE_I29 FEATURE_R29
>> e2fsck: Get a newer version of e2fsck!
> 
> None of the feature bits are documented to exist.  The choice of first
> drive must be wrong.

Oh, that's good to know.

>> Also mdadm --examine on one of the drives reports "Data offset: 2048
>> Sectors" when mdadm --create --data-offset 1024 is used. Is this 
>> normal?
>> It's confusing me (using --data-offset 2048 on creation gives "Data
>> offset: 2048 Sectors").
> 
> mdadm must be enforcing a minimum offset.

Is it possible that the data offset differs from device to device?
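(An aside on the 1024-vs-2048 confusion above: the two values line up exactly if the bare --data-offset number is interpreted in KiB while --examine reports 512-byte sectors. This is a guess on my part, but the arithmetic fits:)

```python
data_offset_arg_kib = 1024        # bare number passed to --data-offset (KiB assumed)
bytes_per_sector = 512

sectors = data_offset_arg_kib * 1024 // bytes_per_sector
print(sectors)  # 2048, matching "Data offset: 2048 sectors" from --examine
```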

Thanks

Mathias


Thread overview: 31+ messages
2015-11-09 11:27 broken raid level 5 array caused by user error Mathias Mueller
2015-11-09 11:56 ` Mikael Abrahamsson
2015-11-09 13:50   ` Phil Turmel
     [not found]     ` <07de4cd96f39ecb6154794d072ca12e7@pingofdeath.de>
     [not found]       ` <5640B8AD.3030800@turmel.org>
2015-11-09 15:41         ` Mathias Mueller
     [not found]           ` <d764bf541381927fa4183c9266fb3f5a@pingofdeath.de>
     [not found]             ` <5640C38B.4060503@turmel.org>
     [not found]               ` <a3a91665c4b7cdd70dacc7d8815cc365@pingofdeath.de>
2015-11-09 21:13                 ` Phil Turmel
2015-11-10  8:37                   ` Mathias Mueller
2015-11-10 13:55                     ` Phil Turmel
2015-11-10 14:55                       ` Mathias Mueller
2015-11-10 15:20                       ` Mathias Mueller
2015-11-10 15:28                         ` Phil Turmel
2015-11-10 21:02                           ` Mathias Mueller
2015-11-10 21:11                             ` Phil Turmel
2015-11-10 21:33 Mathias Mueller
2015-11-10 21:41 ` Phil Turmel
2015-11-10 23:47   ` Mathias Mueller
2015-11-10 23:59     ` Phil Turmel
     [not found]       ` <b0cdddd4394bbc1356980bb61ac199c3@pingofdeath.de>
2015-11-11  1:00         ` Phil Turmel
2015-11-11 17:53           ` Mathias Mueller
2016-01-18 15:33             ` Mathias Mueller
2016-01-18 19:09               ` Phil Turmel
2016-01-19 14:35                 ` Mathias Mueller
2016-01-19 17:51                   ` Phil Turmel
2016-01-19 19:37                     ` Phil Turmel
2016-01-20  9:04                       ` Mathias Mueller
2016-01-22  9:30                         ` Mathias Mueller
2016-01-22 17:16                           ` Phil Turmel
2016-01-22 17:39                             ` Mathias Mueller
2016-01-22 19:13                               ` Phil Turmel
2016-01-25 10:02                                 ` Mathias Mueller
2015-11-11  1:03       ` Phil Turmel
2015-11-11  1:29         ` Mathias Mueller
