* [RAID recovery] Unable to recover RAID5 array after disk failure
@ 2017-03-03 21:35 Olivier Swinkels
  2017-03-05 18:55 ` Phil Turmel
  0 siblings, 1 reply; 10+ messages in thread
From: Olivier Swinkels @ 2017-03-03 21:35 UTC (permalink / raw)
  To: linux-raid

Hi,

I'm in quite a pickle here. I can't recover from a disk failure on my
6 disk raid 5 array.
Any help would really be appreciated!

Please bear with me as I lay out the steps that got me here:

- I got a message my raid went down as 3 disks seemed to have failed.
I've dealt with this before and usually it meant that one disk failed
and took out the complete SATA controller.

- One of the disks was quite old and the two others quite new (<1 year),
so I removed the old drive and the controller came up again. I tried to
reassemble the RAID using:
sudo mdadm -v --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdg /dev/sdf /dev/sde

- However, I got the message:
mdadm: /dev/md0 assembled from 4 drives - not enough to start the array.

- This got me worried, and this is where I screwed up:

- Against the recommendations on the wiki I tried to recover the RAID
using a re-create:
sudo mdadm --verbose --create --assume-clean --level=5
--raid-devices=6 /dev/md0 /dev/sdb /dev/sdc missing /dev/sdg /dev/sdf
/dev/sde

- The second error I made was forgetting to specify the correct
superblock version and chunk size.

- The resulting RAID did not seem correct, as I couldn't find the LVM
that should be there.

- Subsequently the SATA controller went down again, so my assumption
about the failed disk was also incorrect: I had disconnected the wrong
disk.

- After some trial and error I found out one of the newer disks was the
culprit, and I tried to recover the RAID by re-creating the array with
the healthy disks and the correct superblock configuration using:
sudo mdadm --verbose --create --bitmap=none --chunk=64 --metadata=0.90
--assume-clean --level=5 --raid-devices=6 /dev/md0 /dev/sdb missing
/dev/sdc /dev/sdf /dev/sde /dev/sdd

- This gives me a degraded array, but unfortunately the LVM is still
not available.

- Is this situation still rescuable?


===============================================================================
===============================================================================
- Below is the output of "mdadm --examine /dev/sd*" BEFORE the first
create action.

/dev/sdb:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7af7d0ad:b37b1b49:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Sun Apr 10 17:59:16 2011
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Feb 24 16:31:02 2017
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0
       Checksum : ae0d0dec - correct
         Events : 51108

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       16        0      active sync   /dev/sdb

   0     0       8       16        0      active sync   /dev/sdb
   1     1       0        0        1      active sync
   2     2       0        0        2      faulty removed
   3     3       8       96        3      active sync   /dev/sdg
   4     4       8       80        4      active sync   /dev/sdf
   5     5       0        0        5      faulty removed
/dev/sdc:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7af7d0ad:b37b1b49:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Sun Apr 10 17:59:16 2011
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0

    Update Time : Fri Feb 24 02:01:04 2017
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ae0c42ac - correct
         Events : 51108

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       32        1      active sync   /dev/sdc

   0     0       8       16        0      active sync   /dev/sdb
   1     1       8       32        1      active sync   /dev/sdc
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       96        3      active sync   /dev/sdg
   4     4       8       80        4      active sync   /dev/sdf
   5     5       8       64        5      active sync   /dev/sde
/dev/sde:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7af7d0ad:b37b1b49:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Sun Apr 10 17:59:16 2011
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0

    Update Time : Fri Feb 24 02:01:04 2017
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ae0c42c0 - correct
         Events : 51088

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       64        5      active sync   /dev/sde

   0     0       8       16        0      active sync   /dev/sdb
   1     1       8       32        1      active sync   /dev/sdc
   2     2       8       48        2      active sync   /dev/sdd
   3     3       8       96        3      active sync   /dev/sdg
   4     4       8       80        4      active sync   /dev/sdf
   5     5       8       64        5      active sync   /dev/sde
/dev/sdf:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7af7d0ad:b37b1b49:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Sun Apr 10 17:59:16 2011
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Feb 24 16:31:02 2017
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0
       Checksum : ae0d0e37 - correct
         Events : 51108

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       80        4      active sync   /dev/sdf

   0     0       8       16        0      active sync   /dev/sdb
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8       96        3      active sync   /dev/sdg
   4     4       8       80        4      active sync   /dev/sdf
   5     5       0        0        5      faulty removed
/dev/sdg:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 7af7d0ad:b37b1b49:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Sun Apr 10 17:59:16 2011
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 3
Preferred Minor : 0

    Update Time : Fri Feb 24 16:31:02 2017
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0
       Checksum : ae0d0e45 - correct
         Events : 51108

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       96        3      active sync   /dev/sdg

   0     0       8       16        0      active sync   /dev/sdb
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8       96        3      active sync   /dev/sdg
   4     4       8       80        4      active sync   /dev/sdf
   5     5       0        0        5      faulty removed

===============================================================================
===============================================================================
- Below is the status of the current situation:
===============================================================================
===============================================================================

- Phil Turmel's lsdrv:

sudo ./lsdrv
[sudo] password for horus:
PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation NM10/ICH7
Family SATA Controller [IDE mode] (rev 02)
├scsi 0:0:0:0 ATA      Samsung SSD 850  {S21UNX0H601730R}
│└sda 111.79g [8:0] Partitioned (dos)
│ └sda1 97.66g [8:1] Partitioned (dos) {a2d2e5b3-cef5-44f8-83a7-3c25f285c7b4}
│  └Mounted as /dev/sda1 @ /
└scsi 1:0:0:0 ATA      SAMSUNG HD204UI  {S2H7JD2B201244}
 └sdb 1.82t [8:16] MD raid5 (0/6) (w/ sdc,sdd,sde,sdf) in_sync
{4c0518af-d198-d804-151d-b09aa68c27d9}
  └md0 9.10t [9:0] MD v0.90 raid5 (6) clean DEGRADED, 64k Chunk
{4c0518af:d198d804:151db09a:a68c27d9}
                   PV LVM2_member (inactive)
{DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS}
PCI [pata_jmicron] 01:00.1 IDE interface: JMicron Technology Corp.
JMB363 SATA/IDE Controller (rev 03)
├scsi 2:x:x:x [Empty]
└scsi 3:x:x:x [Empty]
PCI [ahci] 01:00.0 SATA controller: JMicron Technology Corp. JMB363
SATA/IDE Controller (rev 03)
├scsi 4:0:0:0 ATA      SAMSUNG HD204UI  {S2H7JD2B201246}
│└sdc 1.82t [8:32] MD raid5 (2/6) (w/ sdb,sdd,sde,sdf) in_sync
{4c0518af-d198-d804-151d-b09aa68c27d9}
│ └md0 9.10t [9:0] MD v0.90 raid5 (6) clean DEGRADED, 64k Chunk
{4c0518af:d198d804:151db09a:a68c27d9}
│                  PV LVM2_member (inactive)
{DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS}
├scsi 4:1:0:0 ATA      WDC WD40EFRX-68W {WD-WCC4E6JF3EE3}
│└sdd 3.64t [8:48] MD raid5 (5/6) (w/ sdb,sdc,sde,sdf) in_sync
{4c0518af-d198-d804-151d-b09aa68c27d9}
│ ├md0 9.10t [9:0] MD v0.90 raid5 (6) clean DEGRADED, 64k Chunk
{4c0518af:d198d804:151db09a:a68c27d9}
│ │                PV LVM2_member (inactive)
{DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS}
└scsi 5:x:x:x [Empty]
PCI [ahci] 05:00.0 SATA controller: Marvell Technology Group Ltd.
88SE9120 SATA 6Gb/s Controller (rev 12)
├scsi 6:0:0:0 ATA      Hitachi HDS72202 {JK11A1YAJN30GV}
│└sde 1.82t [8:64] MD raid5 (4/6) (w/ sdb,sdc,sdd,sdf) in_sync
{4c0518af-d198-d804-151d-b09aa68c27d9}
│ └md0 9.10t [9:0] MD v0.90 raid5 (6) clean DEGRADED, 64k Chunk
{4c0518af:d198d804:151db09a:a68c27d9}
│                  PV LVM2_member (inactive)
{DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS}
└scsi 7:0:0:0 ATA      Hitachi HDS72202 {JK1174YAH779AW}
 └sdf 1.82t [8:80] MD raid5 (3/6) (w/ sdb,sdc,sdd,sde) in_sync
{4c0518af-d198-d804-151d-b09aa68c27d9}
  └md0 9.10t [9:0] MD v0.90 raid5 (6) clean DEGRADED, 64k Chunk
{4c0518af:d198d804:151db09a:a68c27d9}
                   PV LVM2_member (inactive)
{DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS}

===============================================================================
===============================================================================
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md0 : active raid5 sdd[5] sde[4] sdf[3] sdc[2] sdb[0]
      9767572480 blocks level 5, 64k chunk, algorithm 2 [6/5] [U_UUUU]

unused devices: <none>

===============================================================================
===============================================================================
This is the current non-functional RAID.
sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Fri Mar  3 21:09:22 2017
     Raid Level : raid5
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
   Raid Devices : 6
  Total Devices : 5
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Mar  3 21:09:22 2017
          State : clean, degraded
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 4c0518af:d198d804:151db09a:a68c27d9 (local to host
horus-server)
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       2       0        0        2      removed
       2       8       32        2      active sync   /dev/sdc
       3       8       80        3      active sync   /dev/sdf
       4       8       64        4      active sync   /dev/sde
       5       8       48        5      active sync   /dev/sdd


===============================================================================
===============================================================================

sudo mdadm --examine /dev/sd*
/dev/sda:
   MBR Magic : aa55
Partition[0] :    204800000 sectors at         2048 (type 83)
/dev/sda1:
   MBR Magic : aa55
/dev/sdb:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4c0518af:d198d804:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Fri Mar  3 21:09:22 2017
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0

    Update Time : Fri Mar  3 21:09:22 2017
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 0
       Checksum : a857f8f3 - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       16        0      active sync   /dev/sdb

   0     0       8       16        0      active sync   /dev/sdb
   1     0       0        0        0      spare
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       80        3      active sync   /dev/sdf
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       48        5      active sync   /dev/sdd
/dev/sdc:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4c0518af:d198d804:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Fri Mar  3 21:09:22 2017
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0

    Update Time : Fri Mar  3 21:09:22 2017
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 0
       Checksum : a857f907 - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       32        2      active sync   /dev/sdc

   0     0       8       16        0      active sync   /dev/sdb
   1     0       0        0        0      spare
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       80        3      active sync   /dev/sdf
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       48        5      active sync   /dev/sdd
/dev/sdd:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4c0518af:d198d804:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Fri Mar  3 21:09:22 2017
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0

    Update Time : Fri Mar  3 21:09:22 2017
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 0
       Checksum : a857f91d - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       48        5      active sync   /dev/sdd

   0     0       8       16        0      active sync   /dev/sdb
   1     0       0        0        0      spare
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       80        3      active sync   /dev/sdf
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       48        5      active sync   /dev/sdd
mdadm: No md superblock detected on /dev/sdd1.
/dev/sde:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4c0518af:d198d804:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Fri Mar  3 21:09:22 2017
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0

    Update Time : Fri Mar  3 21:09:22 2017
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 0
       Checksum : a857f92b - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       64        4      active sync   /dev/sde

   0     0       8       16        0      active sync   /dev/sdb
   1     0       0        0        0      spare
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       80        3      active sync   /dev/sdf
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       48        5      active sync   /dev/sdd
/dev/sdf:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 4c0518af:d198d804:151db09a:a68c27d9 (local to host
horus-server)
  Creation Time : Fri Mar  3 21:09:22 2017
     Raid Level : raid5
  Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
     Array Size : 9767572480 (9315.08 GiB 10001.99 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0

    Update Time : Fri Mar  3 21:09:22 2017
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 0
       Checksum : a857f939 - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       80        3      active sync   /dev/sdf

   0     0       8       16        0      active sync   /dev/sdb
   1     0       0        0        0      spare
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       80        3      active sync   /dev/sdf
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       48        5      active sync   /dev/sdd


* Re: [RAID recovery] Unable to recover RAID5 array after disk failure
  2017-03-03 21:35 [RAID recovery] Unable to recover RAID5 array after disk failure Olivier Swinkels
@ 2017-03-05 18:55 ` Phil Turmel
  2017-03-06  8:26   ` Olivier Swinkels
  0 siblings, 1 reply; 10+ messages in thread
From: Phil Turmel @ 2017-03-05 18:55 UTC (permalink / raw)
  To: Olivier Swinkels, linux-raid

On 03/03/2017 04:35 PM, Olivier Swinkels wrote:
> Hi,
> 
> I'm in quite a pickle here. I can't recover from a disk failure on my
> 6 disk raid 5 array.
> Any help would really be appreciated!
> 
> Please bear with me as I lay out the the steps that got me here:

[trim /]

Well, you've learned that mdadm --create is not a good idea. /-:

However, you did save your pre-re-create --examine reports, and it
looks like you've reconstructed correctly.  (Very brief look.)

However, you discovered that mdadm's defaults have long since changed
to v1.2 superblock, 512k chunks, bitmaps, and a substantially different
metadata layout.  In fact, I'm certain your LVM metadata has been
damaged by the brief existence of mdadm's v1.2 metadata on your member
devices.  Including removal of the LVM magic signature.

What you need is a backup of your lvm configuration, which is commonly
available in /etc/ of an install, but naturally not available if /etc/
was inside this array.  In addition, though, LVM generally writes
multiple copies of this backup in its metadata.  And that is likely
still there, near the beginning of your array.

You should hexdump the first several megabytes of your array looking for
LVM's text-format configuration.  If you can locate some of those
copies, you can probably use dd to extract a copy to a file, then use
that with LVM's recovery tools to re-establish all of your LVs.
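
A minimal sketch of that kind of search (a grep shortcut rather than a
full hexdump; the 8 MiB window, the pattern, the offset placeholder and
the output file name are only illustrative):

# print any LVM-looking text together with its byte offset
dd if=/dev/md0 bs=1M count=8 2>/dev/null | grep -a -b -A10 'physical_volumes'
# once an offset is known, carve that region out to a file
dd if=/dev/md0 bs=1 skip=<offset> count=65536 of=lvm-metadata.txt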

There is a possibility that some of your actual LV content was damaged
by the mdadm v1.2 metadata, too, but first recover the LVM setup.

Phil


* Re: [RAID recovery] Unable to recover RAID5 array after disk failure
  2017-03-05 18:55 ` Phil Turmel
@ 2017-03-06  8:26   ` Olivier Swinkels
  2017-03-06  9:17     ` Olivier Swinkels
  0 siblings, 1 reply; 10+ messages in thread
From: Olivier Swinkels @ 2017-03-06  8:26 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On Sun, Mar 5, 2017 at 7:55 PM, Phil Turmel <philip@turmel.org> wrote:
>
> On 03/03/2017 04:35 PM, Olivier Swinkels wrote:
> > Hi,
> >
> > I'm in quite a pickle here. I can't recover from a disk failure on my
> > 6 disk raid 5 array.
> > Any help would really be appreciated!
> >
> > Please bear with me as I lay out the the steps that got me here:
>
> [trim /]
>
> Well, you've learned that mdadm --create is not a good idea. /-:
>
> However, you did save your pre-re-create --examine reports, and it
> looks like you've reconstructed correctly.  (Very brief look.)
>
> However, you discovered that mdadm's defaults have long since changed
> to v1.2 superblock, 512k chunks, bitmaps, and a substantially different
> metadata layout.  In fact, I'm certain your LVM metadata has been
> damaged by the brief existence of mdadm's v1.2 metadata on your member
> devices.  Including removal of the LVM magic signature.
>
> What you need is a backup of your lvm configuration, which is commonly
> available in /etc/ of an install, but naturally not available if /etc/
> was inside this array.  In addition, though, LVM generally writes
> multiple copies of this backup in its metadata.  And that is likely
> still there, near the beginning of your array.
>
> You should hexdump the first several megabytes of your array looking for
> LVM's XML formatted configuration.  If you can locate some of those
> copies, you can probably use dd to extract a copy to a file, then use
> that with LVM's recovery tools to re-establish all of your LVs.
>
> There is a possibilility that some of your actual LV content was damaged
> by the mdadm v1.2 metadata, too, but first recover the LVM setup.
>
> Phil


That sounds promising, as /etc was not on the array.
I found a backup in /etc/lvm/backup/lvm-raid (contents shown below).

Unfortunately, when I try to use it to restore the LVM I get the
following error:
vgcfgrestore -f /etc/lvm/backup/lvm-raid lvm-raid
Aborting vg_write: No metadata areas to write to!
Restore failed.

So I guess I also need to recreate the physical volume using:
pvcreate --uuid "0Esja8-U0EZ-fndQ-vjUq-oIuX-3KgA-uTL6rP" --restorefile
/etc/lvm/backup/lvm-raid
Is this correct? (I'm a bit hesitant about running another 'create'
command, as you might understand.)

Regards,

Olivier


===============================================================================
/etc/lvm/backup/lvm-raid
===============================================================================
# Generated by LVM2 version 2.02.133(2) (2015-10-30): Fri Oct 14 15:55:36 2016

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'vgcfgbackup'"

creation_host = "horus-server"  # Linux horus-server 3.13.0-98-generic
#145-Ubuntu SMP Sat Oct 8 20:13:07 UTC 2016 x86_64
creation_time = 1476453336      # Fri Oct 14 15:55:36 2016

lvm-raid {
        id = "0Esja8-U0EZ-fndQ-vjUq-oIuX-3KgA-uTL6rP"
        seqno = 8
        format = "lvm2"                 # informational
        status = ["RESIZEABLE", "READ", "WRITE"]
        flags = []
        extent_size = 524288            # 256 Megabytes
        max_lv = 0
        max_pv = 0
        metadata_copies = 0

        physical_volumes {

                pv0 {
                        id = "DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS"
                        device = "/dev/md0"     # Hint only

                        status = ["ALLOCATABLE"]
                        flags = []
                        dev_size = 19535144448  # 9.09676 Terabytes
                        pe_start = 512
                        pe_count = 37260        # 9.09668 Terabytes
                }
        }

        logical_volumes {

                lvm0 {
                        id = "OpWRpy-O4JT-Ua3t-E1A4-2SuN-GLLR-5CFMLh"
                        status = ["READ", "WRITE", "VISIBLE"]
                        flags = []
                        segment_count = 1

                        segment1 {
                                start_extent = 0
                                extent_count = 37260    # 9.09668 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv0", 0
                                ]
                        }
                }
        }
}


* Re: [RAID recovery] Unable to recover RAID5 array after disk failure
  2017-03-06  8:26   ` Olivier Swinkels
@ 2017-03-06  9:17     ` Olivier Swinkels
  2017-03-06 19:50       ` Phil Turmel
  0 siblings, 1 reply; 10+ messages in thread
From: Olivier Swinkels @ 2017-03-06  9:17 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On Mon, Mar 6, 2017 at 9:26 AM, Olivier Swinkels
<olivier.swinkels@gmail.com> wrote:
> On Sun, Mar 5, 2017 at 7:55 PM, Phil Turmel <philip@turmel.org> wrote:
>>
>> On 03/03/2017 04:35 PM, Olivier Swinkels wrote:
>> > Hi,
>> >
>> > I'm in quite a pickle here. I can't recover from a disk failure on my
>> > 6 disk raid 5 array.
>> > Any help would really be appreciated!
>> >
>> > Please bear with me as I lay out the the steps that got me here:
>>
>> [trim /]
>>
>> Well, you've learned that mdadm --create is not a good idea. /-:
>>
>> However, you did save your pre-re-create --examine reports, and it
>> looks like you've reconstructed correctly.  (Very brief look.)
>>
>> However, you discovered that mdadm's defaults have long since changed
>> to v1.2 superblock, 512k chunks, bitmaps, and a substantially different
>> metadata layout.  In fact, I'm certain your LVM metadata has been
>> damaged by the brief existence of mdadm's v1.2 metadata on your member
>> devices.  Including removal of the LVM magic signature.
>>
>> What you need is a backup of your lvm configuration, which is commonly
>> available in /etc/ of an install, but naturally not available if /etc/
>> was inside this array.  In addition, though, LVM generally writes
>> multiple copies of this backup in its metadata.  And that is likely
>> still there, near the beginning of your array.
>>
>> You should hexdump the first several megabytes of your array looking for
>> LVM's XML formatted configuration.  If you can locate some of those
>> copies, you can probably use dd to extract a copy to a file, then use
>> that with LVM's recovery tools to re-establish all of your LVs.
>>
>> There is a possibilility that some of your actual LV content was damaged
>> by the mdadm v1.2 metadata, too, but first recover the LVM setup.
>>
>> Phil
>
>
> That sounds promising, as /etc was not on the array.
> I found a backup in /etc/lvm/backup/lvm-raid (contents shown below).
>
> Unfortunatelly when I try to use it to restore the LVM I get the
> following error:
> vgcfgrestore -f /etc/lvm/backup/lvm-raid lvm-raid
> Aborting vg_write: No metadata areas to write to!
> Restore failed.
>
> So I guess I also need to recreate the physical volume using:

Correction (I put the wrong ID in the pvcreate example):
pvcreate --uuid "DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS" --restorefile
/etc/lvm/backup/lvm-raid

> Is this correct? (I'm a bit hesitant with another 'create' command as
> you might understand.)
>
> Regards,
>
> Olivier
>
>
> ===============================================================================
> /etc/lvm/backup/lvm-raid
> ===============================================================================
> # Generated by LVM2 version 2.02.133(2) (2015-10-30): Fri Oct 14 15:55:36 2016
>
> contents = "Text Format Volume Group"
> version = 1
>
> description = "Created *after* executing 'vgcfgbackup'"
>
> creation_host = "horus-server"  # Linux horus-server 3.13.0-98-generic
> #145-Ubuntu SMP Sat Oct 8 20:13:07 UTC 2016 x86_64
> creation_time = 1476453336      # Fri Oct 14 15:55:36 2016
>
> lvm-raid {
>         id = "0Esja8-U0EZ-fndQ-vjUq-oIuX-3KgA-uTL6rP"
>         seqno = 8
>         format = "lvm2"                 # informational
>         status = ["RESIZEABLE", "READ", "WRITE"]
>         flags = []
>         extent_size = 524288            # 256 Megabytes
>         max_lv = 0
>         max_pv = 0
>         metadata_copies = 0
>
>         physical_volumes {
>
>                 pv0 {
>                         id = "DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS"
>                         device = "/dev/md0"     # Hint only
>
>                         status = ["ALLOCATABLE"]
>                         flags = []
>                         dev_size = 19535144448  # 9.09676 Terabytes
>                         pe_start = 512
>                         pe_count = 37260        # 9.09668 Terabytes
>                 }
>         }
>
>         logical_volumes {
>
>                 lvm0 {
>                         id = "OpWRpy-O4JT-Ua3t-E1A4-2SuN-GLLR-5CFMLh"
>                         status = ["READ", "WRITE", "VISIBLE"]
>                         flags = []
>                         segment_count = 1
>
>                         segment1 {
>                                 start_extent = 0
>                                 extent_count = 37260    # 9.09668 Terabytes
>
>                                 type = "striped"
>                                 stripe_count = 1        # linear
>
>                                 stripes = [
>                                         "pv0", 0
>                                 ]
>                         }
>                 }
>         }
> }


* Re: [RAID recovery] Unable to recover RAID5 array after disk failure
  2017-03-06  9:17     ` Olivier Swinkels
@ 2017-03-06 19:50       ` Phil Turmel
  2017-03-07  8:39         ` Olivier Swinkels
  0 siblings, 1 reply; 10+ messages in thread
From: Phil Turmel @ 2017-03-06 19:50 UTC (permalink / raw)
  To: Olivier Swinkels; +Cc: linux-raid

On 03/06/2017 04:17 AM, Olivier Swinkels wrote:

>> That sounds promising, as /etc was not on the array.
>> I found a backup in /etc/lvm/backup/lvm-raid (contents shown below).

Yay!  That's exactly what you need.

>> Unfortunatelly when I try to use it to restore the LVM I get the
>> following error:
>> vgcfgrestore -f /etc/lvm/backup/lvm-raid lvm-raid
>> Aborting vg_write: No metadata areas to write to!
>> Restore failed.

Your command doesn't specify the device name of your reconstructed
array.

>> So I guess I also need to recreate the physical volume using:
> 
> Correction: (Put the wrong ID in the pvcreate example):
> pvcreate --uuid "DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS" --restorefile
> /etc/lvm/backup/lvm-raid
> 
>> Is this correct? (I'm a bit hesitant with another 'create' command as
>> you might understand.)

I've only actually had to do this operation once, and I don't recall
whether vgcfgrestore alone was sufficient.  But either way, you simply
didn't tell LVM where you are restoring TO.
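
Roughly, the usual sequence would look something like this (a sketch
only, with the PV UUID and VG name taken from your backup file and
/dev/md0 assumed as the target device; check the pvcreate and
vgcfgrestore man pages before running anything):

pvcreate --uuid "DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS" \
         --restorefile /etc/lvm/backup/lvm-raid /dev/md0
vgcfgrestore -f /etc/lvm/backup/lvm-raid lvm-raid
vgchange -ay lvm-raid      # activate the restored VG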

Phil



* Re: [RAID recovery] Unable to recover RAID5 array after disk failure
  2017-03-06 19:50       ` Phil Turmel
@ 2017-03-07  8:39         ` Olivier Swinkels
  2017-03-07 14:52           ` Phil Turmel
  0 siblings, 1 reply; 10+ messages in thread
From: Olivier Swinkels @ 2017-03-07  8:39 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On Mon, Mar 6, 2017 at 8:50 PM, Phil Turmel <philip@turmel.org> wrote:
> On 03/06/2017 04:17 AM, Olivier Swinkels wrote:
>
>>> That sounds promising, as /etc was not on the array.
>>> I found a backup in /etc/lvm/backup/lvm-raid (contents shown below).
>
> Yay!  That's exactly what you need.
>
>>> Unfortunatelly when I try to use it to restore the LVM I get the
>>> following error:
>>> vgcfgrestore -f /etc/lvm/backup/lvm-raid lvm-raid
>>> Aborting vg_write: No metadata areas to write to!
>>> Restore failed.
>
> You're command doesn't specify the device name of your reconstructed
> array.
>
>>> So I guess I also need to recreate the physical volume using:
>>
>> Correction: (Put the wrong ID in the pvcreate example):
>> pvcreate --uuid "DWv51O-lg9s-Dl4w-EBp9-QeIF-Vv60-8wt2uS" --restorefile
>> /etc/lvm/backup/lvm-raid
>>
>>> Is this correct? (I'm a bit hesitant with another 'create' command as
>>> you might understand.)
>
> I haven't actually had to do this operation but once, and I don't
> recall if the vgcfgrestore was sufficient.  But either way, you simply
> didn't tell LVM where you are restoring TO.
>
> Phil
>
Hi,
Thanks for your response. As far as I can see, the syntax of the
vgcfgrestore command is correct, as the destination is given in the
backup file.
After I used the pvcreate command to recreate the PV, the vgcfgrestore
command succeeded and the LVM became available (after activating it).

However, when I try to mount it I get the following error:
sudo mount -t ext4 /dev/lvm-raid/lvm0 /mnt/raid
mount: mount /dev/mapper/lvm--raid-lvm0 on /mnt/raid failed: Structure
needs cleaning

So I guess the underlying RAID array is still not ok...

Is there a way for me to validate whether the degraded array contains
valid data, or whether a disk is still swapped or corrupted?
Thanks.

Olivier


* Re: [RAID recovery] Unable to recover RAID5 array after disk failure
  2017-03-07  8:39         ` Olivier Swinkels
@ 2017-03-07 14:52           ` Phil Turmel
  2017-03-08 19:01             ` Olivier Swinkels
  0 siblings, 1 reply; 10+ messages in thread
From: Phil Turmel @ 2017-03-07 14:52 UTC (permalink / raw)
  To: Olivier Swinkels; +Cc: linux-raid

On 03/07/2017 03:39 AM, Olivier Swinkels wrote:

> After I used the pvcreate command to recreate pv the vgcfgrestore 
> command succeeds and the lvm is available (after activating).
> 
> However when I try to mount it I get the following error: sudo mount
> -t ext4 /dev/lvm-raid/lvm0 /mnt/raid mount: mount
> /dev/mapper/lvm--raid-lvm0 on /mnt/raid failed: Structure needs
> cleaning
> 
> So I guess the underlying RAID array is still not ok...

No, your underlying array is very likely correct.  But the intervening
incorrect --create operation stomped on your filesystems.  Run fsck
while unmounted to deal with the corruption and recover what you can.

Run fsck with "-n" first, to see just how extensive the problems are,
then with "-y" to actually fix things.  Based on your sequence of
events, your corruptions should be at low sector addresses (first few
Gigs) of your array.  If that's what appears with "-n", proceed.
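
Concretely, that would be something like the following, assuming the LV
shows up as /dev/lvm-raid/lvm0 as in your mount attempt:

fsck.ext4 -n /dev/lvm-raid/lvm0    # read-only pass: report problems, change nothing
fsck.ext4 -y /dev/lvm-raid/lvm0    # repair pass: answer yes to every fix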

If you are unlucky, the stompage hit one or more of your filesystems'
superblocks, requiring access to backup superblocks.  If you still see
no progress with either of the above, you might need to search your
array for ext2/3/4 superblocks.  This grep would help:

dd if=/dev/md0 bs=1M count=16k 2>/dev/null |hexdump -C |grep '30  .\+  53 ef 0'

(Not all hits from the grep will be superblocks, but they would be
visually distinguishable, and would have decipherable timestamps.)

Phil


* Re: [RAID recovery] Unable to recover RAID5 array after disk failure
  2017-03-07 14:52           ` Phil Turmel
@ 2017-03-08 19:01             ` Olivier Swinkels
  2017-03-17 19:25               ` Olivier Swinkels
  0 siblings, 1 reply; 10+ messages in thread
From: Olivier Swinkels @ 2017-03-08 19:01 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On Tue, Mar 7, 2017 at 3:52 PM, Phil Turmel <philip@turmel.org> wrote:
> On 03/07/2017 03:39 AM, Olivier Swinkels wrote:
>
>> After I used the pvcreate command to recreate pv the vgcfgrestore
>> command succeeds and the lvm is available (after activating).
>>
>> However when I try to mount it I get the following error: sudo mount
>> -t ext4 /dev/lvm-raid/lvm0 /mnt/raid mount: mount
>> /dev/mapper/lvm--raid-lvm0 on /mnt/raid failed: Structure needs
>> cleaning
>>
>> So I guess the underlying RAID array is still not ok...
>
> No, your underlying array is very likely correct.  But the intervening
> incorrect --create operation stomped on your filesystems.  Run fsck
> while unmounted to deal with the corruption and recover what you can.
>
> Run fsck with "-n" first, to see just how extensive the problems are,
> then with "-y" to actually fix things.  Based on your sequence of
> events, your corruptions should be at low sector addresses (first few
> Gigs) of your array.  If that's what appears with "-n", proceed.
>
> If you are unlucky, the stompage hit one or more of your filesystems'
> superblocks, requiring access to backup superblocks.  If you still see
> no progress with either of the above, you might need to search your
> array for ext2/3/4 superblocks.  This grep would help:
>
> dd if=/dev/md0 bs=1M count=16k 2>/dev/null |hexdump -C |grep '30  .\+  53 ef 0'
>
> (Not all hits from the grep will be superblocks, but they would be
> visually distinguishable, and would have decipherable timestamps.)
>
> Phil

Hi,

I ran fsck -n on the LVM volume and got the very large response below.
This didn't look promising, so I ran the ext2/3/4 superblock search.
I didn't recognize any obvious timestamps, but I'm not sure what to look for.
(I also pasted this output below.)

Can you recommend any further action?

Olivier

===============================================================================
fsck /dev/lvm-raid/lvm0
fsck from util-linux 2.27.1
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
fsck.ext4: Group descriptors look bad... trying backup blocks...
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/mapper/lvm--raid-lvm0 has gone 1275 days without being checked,
check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (23513, counted=5372).
Fix? no
Free blocks count wrong for group #1 (980, counted=485).
Fix? no
Free blocks count wrong for group #3 (1023, counted=151).
Fix? no
Free blocks count wrong for group #5 (1022, counted=142).
Fix? no

<<TRIMMED ~200000 lines >>>

Free inodes count wrong for group #60160 (8192, counted=8163).
Fix? no
Directories count wrong for group #60160 (0, counted=1).
Fix? no
Free inodes count wrong for group #60161 (8192, counted=8191).
Fix? no
Directories count wrong for group #60161 (0, counted=1).
Fix? no
Free inodes count wrong (488172170, counted=609035768).
Fix? no

/dev/mapper/lvm--raid-lvm0: ********** WARNING: Filesystem still has
errors **********
/dev/mapper/lvm--raid-lvm0: 122295670/610467840 files (0.0%
non-contiguous), 2291846165/2441871360 blocks
===============================================================================
dd if=/dev/md0 bs=1M count=16k 2>/dev/null |hexdump -C |grep '30  .\+  53 ef 0'
00040430  3a 3b ab 58 17 00 19 00  53 ef 01 00 01 00 00 00  |:;.X....S.......|
08040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
18040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
1d220e30  2e 93 9c 90 c1 89 27 62  53 ef 03 12 d4 82 36 76  |......'bS.....6v|
27340430  bb 78 c4 e3 9e 56 62 0f  53 ef 04 b9 38 0c d0 26  |.x...Vb.S...8..&|
28040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
2f02ab30  5e d0 c9 98 83 ce 3b 92  53 ef 08 51 c4 4b dc af  |^.....;.S..Q.K..|
38040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
44318c30  55 c1 19 f3 10 fb ab 2f  53 ef 04 0d fd c1 dc ed  |U....../S.......|
48040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
50791730  6b 04 06 e8 97 88 9b 08  53 ef 0b 24 21 68 3d a5  |k.......S..$!h=.|
5aab3030  db 52 8d 5c 82 f4 80 cd  53 ef 08 4c f7 a7 c7 a9  |.R.\....S..L....|
98b69330  41 07 bb 64 c2 4a 00 5a  53 ef 08 71 88 d2 5a 42  |A..d.J.ZS..q..ZB|
c8040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
cefc4130  14 37 b9 ef 1f 89 14 ab  53 ef 02 1c ca 88 f0 c4  |.7......S.......|
d8040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
fe124d30  8a 6f 46 db 63 9b c5 9b  53 ef 07 97 74 24 2d e0  |.oF.c...S...t$-.|
106432f30  76 d0 fc 66 02 06 e9 50  53 ef 02 44 6c ab 89 ac  |v..f...PS..Dl...|
111e92630  f8 59 df b2 af 5d 6b 8b  53 ef 0d 65 c6 29 66 0c  |.Y...]k.S..e.)f.|
12883be30  88 63 b9 3a 6a 52 03 a6  53 ef 0c 12 67 dc ad 9e  |.c.:jR..S...g...|
14d315530  30 92 ef 27 a0 dc 46 dd  53 ef 09 d7 89 5a ba 95  |0..'..F.S....Z..|
17222e430  aa 7d 4f 4a c5 88 fa f0  53 ef 0c 08 04 10 9e ad  |.}OJ....S.......|
173717c30  3d fd 91 37 5a 0e 7b aa  53 ef 0b 08 d4 51 89 fa  |=..7Z.{.S....Q..|
17665ae30  c9 a8 24 d6 ba 0a 50 f7  53 ef 08 13 da d7 52 0a  |..$...P.S.....R.|
188040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
18ef3be30  41 c3 91 02 65 af 8d 27  53 ef 0c 1f 4b 47 f5 97  |A...e..'S...KG..|
1943f1d30  d6 87 e0 d7 c1 3b e7 53  53 ef 08 a5 88 14 b6 c8  |.....;.SS.......|
19bfba930  c3 e4 07 04 b4 d1 05 4a  53 ef 02 c4 69 2f e2 d0  |.......JS...i/..|
1a807c530  a1 12 db b0 77 9d c5 6c  53 ef 0d 6e 37 05 74 61  |....w..lS..n7.ta|
1b1976330  1a 2b 57 56 46 96 c2 9b  53 ef 03 78 0d 9e 4a e0  |.+WVF...S..x..J.|
1ebf5db30  1e 76 bf 9d 25 8f fd 16  53 ef 04 ba 23 14 2f 8d  |.v..%...S...#./.|
1eec8b930  d5 dc 1e 90 b1 c8 f4 31  53 ef 06 10 3c 81 fb 37  |.......1S...<..7|
1f1392930  89 db 6c 92 6c 10 a5 f9  53 ef 05 31 0e 04 a2 d0  |..l.l...S..1....|
227cdff30  8d 8a 0a 61 91 04 c4 15  53 ef 03 5b 28 c4 52 59  |...a....S..[(.RY|
23000da30  1d e6 34 ca 2f 36 08 78  53 ef 0e 28 e4 94 5f 42  |..4./6.xS..(.._B|
245749e30  83 67 b7 d0 8b 40 0d f0  53 ef 0e ec 13 37 39 89  |.g...@..S....79.|
24f40e530  86 3b fc 43 ea ef 81 29  53 ef 09 6c fb 08 3b 6d  |.;.C...)S..l..;m|
25a473430  94 24 8d 8c 6b 15 7b 8b  53 ef 00 ae 43 b2 64 a4  |.$..k.{.S...C.d.|
288040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
29966b530  00 ad 14 71 ea c5 c6 7a  53 ef 0c 11 d2 57 fb a0  |...q...zS....W..|
2a5924830  98 43 57 9b 2f 98 18 4f  53 ef 0d 27 72 c3 58 ba  |.CW./..OS..'r.X.|
2a7bf9b30  72 f2 a7 7c 73 4f 2f a6  53 ef 02 14 5b 7c 93 9e  |r..|sO/.S...[|..|
2aae14530  25 6b d5 ab b0 48 94 cd  53 ef 06 08 41 e0 0b ba  |%k...H..S...A...|
2ac470a30  a4 a6 cb a0 40 a2 8f cd  53 ef 06 47 3d d5 e0 38  |....@...S..G=..8|
2aca3fb30  99 e4 1a cd b1 a8 be 6c  53 ef 01 03 68 6f 4c f4  |.......lS...hoL.|
2b19b6030  30 64 6a d9 08 f0 46 c9  53 ef 07 95 b2 14 fb aa  |0dj...F.S.......|
2c53e0a30  59 1b 4b c8 25 a1 56 d9  53 ef 0b 26 3a 76 72 a5  |Y.K.%.V.S..&:vr.|
2cb48d730  0a d2 b4 d5 40 fc c5 d0  53 ef 0b 6d 90 9b db 22  |....@...S..m..."|
2dc925b30  16 18 5b 30 09 18 f8 2e  53 ef 0a d1 00 74 13 e7  |..[0....S....t..|
2dcc76230  e9 75 d0 2e 92 45 59 2c  53 ef 0e 84 63 9a aa b6  |.u...EY,S...c...|
2e6d28730  f4 81 47 6a 8f 9c ae 1a  53 ef 09 9d d2 5b e4 35  |..Gj....S....[.5|
2f9639f30  82 67 ae 49 2c a1 fc c0  53 ef 07 b1 1f 54 c5 c9  |.g.I,...S....T..|
2fa204230  78 42 16 12 e9 83 72 88  53 ef 00 08 df d9 8b 2a  |xB....r.S......*|
31b7c1030  99 f0 43 99 1f 52 77 17  53 ef 0c 5f b9 51 a1 42  |..C..Rw.S.._.Q.B|
321ad8a30  4e 9b 9e 2e c6 8e 19 8d  53 ef 0e 1d 5d 20 c0 9d  |N.......S...] ..|
32ff61630  72 9f 28 d2 9a 35 79 63  53 ef 09 e6 d6 e3 27 c5  |r.(..5ycS.....'.|
33a4b8230  58 9d 65 cb da 31 07 e7  53 ef 04 07 e2 4a 9b 17  |X.e..1..S....J..|
3472a7230  b4 a9 fa cf 3d 39 c3 95  53 ef 03 a4 07 2a ac 9a  |....=9..S....*..|
3534f5130  54 d2 19 77 a4 c2 4a 2f  53 ef 07 b3 0f 60 be ee  |T..w..J/S....`..|
356145130  df f7 f8 eb 6b 2a 8b fa  53 ef 09 cf 9d cd db 68  |....k*..S......h|
361b68830  53 ab 0c 7a 8e 4e 96 1b  53 ef 0f 3b 78 e9 d5 ce  |S..z.N..S..;x...|
37381e630  24 63 4a ce 1b eb b0 df  53 ef 02 68 54 7d 7a ad  |$cJ.....S..hT}z.|
375103330  3a 20 84 18 b6 6d f5 3b  53 ef 0a 86 8f ac b2 d8  |: ...m.;S.......|
37b3f7230  11 43 a4 46 fd c8 da ae  53 ef 04 f3 80 db 5c 4b  |.C.F....S.....\K|
3a5adf530  7f c1 e1 75 26 cf 25 b2  53 ef 01 4d 64 71 10 bf  |...u&.%.S..Mdq..|
3ccc7b830  55 cb 6f 68 e3 c0 0f 36  53 ef 06 00 85 8a 17 72  |U.oh...6S......r|
3d6743f30  fa ca 0e 36 a6 4e 02 6a  53 ef 05 6d 6f e5 31 78  |...6.N.jS..mo.1x|
3e8040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|

===============================================================================


* Re: [RAID recovery] Unable to recover RAID5 array after disk failure
  2017-03-08 19:01             ` Olivier Swinkels
@ 2017-03-17 19:25               ` Olivier Swinkels
  2017-03-21 17:08                 ` Phil Turmel
  0 siblings, 1 reply; 10+ messages in thread
From: Olivier Swinkels @ 2017-03-17 19:25 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

On Wed, Mar 8, 2017 at 8:01 PM, Olivier Swinkels
<olivier.swinkels@gmail.com> wrote:
> On Tue, Mar 7, 2017 at 3:52 PM, Phil Turmel <philip@turmel.org> wrote:
>> On 03/07/2017 03:39 AM, Olivier Swinkels wrote:
>>
>>> After I used the pvcreate command to recreate pv the vgcfgrestore
>>> command succeeds and the lvm is available (after activating).
>>>
>>> However when I try to mount it I get the following error: sudo mount
>>> -t ext4 /dev/lvm-raid/lvm0 /mnt/raid mount: mount
>>> /dev/mapper/lvm--raid-lvm0 on /mnt/raid failed: Structure needs
>>> cleaning
>>>
>>> So I guess the underlying RAID array is still not ok...
>>
>> No, your underlying array is very likely correct.  But the intervening
>> incorrect --create operation stomped on your filesystems.  Run fsck
>> while unmounted to deal with the corruption and recover what you can.
>>
>> Run fsck with "-n" first, to see just how extensive the problems are,
>> then with "-y" to actually fix things.  Based on your sequence of
>> events, your corruptions should be at low sector addresses (first few
>> Gigs) of your array.  If that's what appears with "-n", proceed.
>>
>> If you are unlucky, the stompage hit one or more of your filesystems'
>> superblocks, requiring access to backup superblocks.  If you still see
>> no progress with either of the above, you might need to search your
>> array for ext2/3/4 superblocks.  This grep would help:
>>
>> dd if=/dev/md0 bs=1M count=16k 2>/dev/null |hexdump -C |grep '30  .\+  53 ef 0'
>>
>> (Not all hits from the grep will be superblocks, but they would be
>> visually distinguishable, and would have decipherable timestamps.)
>>
>> Phil
>
> Hi,
>
> I ran fsck -n disk on the lvm and got the very large response below.
> This didn't look promising, so I ran the ext2/3/4 superblock search.
> I didn't recognize any obvious timestamps, but i'm not sure what to look for.
> (I also pasted this output below)
>
> Can you recommend any further action?
>
> Olivier
>
> ===============================================================================
> fsck /dev/lvm-raid/lvm0
> fsck from util-linux 2.27.1
> ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
> fsck.ext4: Group descriptors look bad... trying backup blocks...
> Warning: skipping journal recovery because doing a read-only filesystem check.
> /dev/mapper/lvm--raid-lvm0 has gone 1275 days without being checked,
> check forced.
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Free blocks count wrong for group #0 (23513, counted=5372).
> Fix? no
> Free blocks count wrong for group #1 (980, counted=485).
> Fix? no
> Free blocks count wrong for group #3 (1023, counted=151).
> Fix? no
> Free blocks count wrong for group #5 (1022, counted=142).
> Fix? no
>
> <<TRIMMED ~200000 lines >>>
>
> Free inodes count wrong for group #60160 (8192, counted=8163).
> Fix? no
> Directories count wrong for group #60160 (0, counted=1).
> Fix? no
> Free inodes count wrong for group #60161 (8192, counted=8191).
> Fix? no
> Directories count wrong for group #60161 (0, counted=1).
> Fix? no
> Free inodes count wrong (488172170, counted=609035768).
> Fix? no
>
> /dev/mapper/lvm--raid-lvm0: ********** WARNING: Filesystem still has
> errors **********
> /dev/mapper/lvm--raid-lvm0: 122295670/610467840 files (0.0%
> non-contiguous), 2291846165/2441871360 blocks
> ===============================================================================
> dd if=/dev/md0 bs=1M count=16k 2>/dev/null |hexdump -C |grep '30  .\+  53 ef 0'
> 00040430  3a 3b ab 58 17 00 19 00  53 ef 01 00 01 00 00 00  |:;.X....S.......|
> 08040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
> 18040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
> 1d220e30  2e 93 9c 90 c1 89 27 62  53 ef 03 12 d4 82 36 76  |......'bS.....6v|
> 27340430  bb 78 c4 e3 9e 56 62 0f  53 ef 04 b9 38 0c d0 26  |.x...Vb.S...8..&|
> 28040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
> 2f02ab30  5e d0 c9 98 83 ce 3b 92  53 ef 08 51 c4 4b dc af  |^.....;.S..Q.K..|
> 38040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
> 44318c30  55 c1 19 f3 10 fb ab 2f  53 ef 04 0d fd c1 dc ed  |U....../S.......|
> 48040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
> 50791730  6b 04 06 e8 97 88 9b 08  53 ef 0b 24 21 68 3d a5  |k.......S..$!h=.|
> 5aab3030  db 52 8d 5c 82 f4 80 cd  53 ef 08 4c f7 a7 c7 a9  |.R.\....S..L....|
> 98b69330  41 07 bb 64 c2 4a 00 5a  53 ef 08 71 88 d2 5a 42  |A..d.J.ZS..q..ZB|
> c8040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
> cefc4130  14 37 b9 ef 1f 89 14 ab  53 ef 02 1c ca 88 f0 c4  |.7......S.......|
> d8040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
> fe124d30  8a 6f 46 db 63 9b c5 9b  53 ef 07 97 74 24 2d e0  |.oF.c...S...t$-.|
> 106432f30  76 d0 fc 66 02 06 e9 50  53 ef 02 44 6c ab 89 ac  |v..f...PS..Dl...|
> 111e92630  f8 59 df b2 af 5d 6b 8b  53 ef 0d 65 c6 29 66 0c  |.Y...]k.S..e.)f.|
> 12883be30  88 63 b9 3a 6a 52 03 a6  53 ef 0c 12 67 dc ad 9e  |.c.:jR..S...g...|
> 14d315530  30 92 ef 27 a0 dc 46 dd  53 ef 09 d7 89 5a ba 95  |0..'..F.S....Z..|
> 17222e430  aa 7d 4f 4a c5 88 fa f0  53 ef 0c 08 04 10 9e ad  |.}OJ....S.......|
> 173717c30  3d fd 91 37 5a 0e 7b aa  53 ef 0b 08 d4 51 89 fa  |=..7Z.{.S....Q..|
> 17665ae30  c9 a8 24 d6 ba 0a 50 f7  53 ef 08 13 da d7 52 0a  |..$...P.S.....R.|
> 188040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
> 18ef3be30  41 c3 91 02 65 af 8d 27  53 ef 0c 1f 4b 47 f5 97  |A...e..'S...KG..|
> 1943f1d30  d6 87 e0 d7 c1 3b e7 53  53 ef 08 a5 88 14 b6 c8  |.....;.SS.......|
> 19bfba930  c3 e4 07 04 b4 d1 05 4a  53 ef 02 c4 69 2f e2 d0  |.......JS...i/..|
> 1a807c530  a1 12 db b0 77 9d c5 6c  53 ef 0d 6e 37 05 74 61  |....w..lS..n7.ta|
> 1b1976330  1a 2b 57 56 46 96 c2 9b  53 ef 03 78 0d 9e 4a e0  |.+WVF...S..x..J.|
> 1ebf5db30  1e 76 bf 9d 25 8f fd 16  53 ef 04 ba 23 14 2f 8d  |.v..%...S...#./.|
> 1eec8b930  d5 dc 1e 90 b1 c8 f4 31  53 ef 06 10 3c 81 fb 37  |.......1S...<..7|
> 1f1392930  89 db 6c 92 6c 10 a5 f9  53 ef 05 31 0e 04 a2 d0  |..l.l...S..1....|
> 227cdff30  8d 8a 0a 61 91 04 c4 15  53 ef 03 5b 28 c4 52 59  |...a....S..[(.RY|
> 23000da30  1d e6 34 ca 2f 36 08 78  53 ef 0e 28 e4 94 5f 42  |..4./6.xS..(.._B|
> 245749e30  83 67 b7 d0 8b 40 0d f0  53 ef 0e ec 13 37 39 89  |.g...@..S....79.|
> 24f40e530  86 3b fc 43 ea ef 81 29  53 ef 09 6c fb 08 3b 6d  |.;.C...)S..l..;m|
> 25a473430  94 24 8d 8c 6b 15 7b 8b  53 ef 00 ae 43 b2 64 a4  |.$..k.{.S...C.d.|
> 288040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
> 29966b530  00 ad 14 71 ea c5 c6 7a  53 ef 0c 11 d2 57 fb a0  |...q...zS....W..|
> 2a5924830  98 43 57 9b 2f 98 18 4f  53 ef 0d 27 72 c3 58 ba  |.CW./..OS..'r.X.|
> 2a7bf9b30  72 f2 a7 7c 73 4f 2f a6  53 ef 02 14 5b 7c 93 9e  |r..|sO/.S...[|..|
> 2aae14530  25 6b d5 ab b0 48 94 cd  53 ef 06 08 41 e0 0b ba  |%k...H..S...A...|
> 2ac470a30  a4 a6 cb a0 40 a2 8f cd  53 ef 06 47 3d d5 e0 38  |....@...S..G=..8|
> 2aca3fb30  99 e4 1a cd b1 a8 be 6c  53 ef 01 03 68 6f 4c f4  |.......lS...hoL.|
> 2b19b6030  30 64 6a d9 08 f0 46 c9  53 ef 07 95 b2 14 fb aa  |0dj...F.S.......|
> 2c53e0a30  59 1b 4b c8 25 a1 56 d9  53 ef 0b 26 3a 76 72 a5  |Y.K.%.V.S..&:vr.|
> 2cb48d730  0a d2 b4 d5 40 fc c5 d0  53 ef 0b 6d 90 9b db 22  |....@...S..m..."|
> 2dc925b30  16 18 5b 30 09 18 f8 2e  53 ef 0a d1 00 74 13 e7  |..[0....S....t..|
> 2dcc76230  e9 75 d0 2e 92 45 59 2c  53 ef 0e 84 63 9a aa b6  |.u...EY,S...c...|
> 2e6d28730  f4 81 47 6a 8f 9c ae 1a  53 ef 09 9d d2 5b e4 35  |..Gj....S....[.5|
> 2f9639f30  82 67 ae 49 2c a1 fc c0  53 ef 07 b1 1f 54 c5 c9  |.g.I,...S....T..|
> 2fa204230  78 42 16 12 e9 83 72 88  53 ef 00 08 df d9 8b 2a  |xB....r.S......*|
> 31b7c1030  99 f0 43 99 1f 52 77 17  53 ef 0c 5f b9 51 a1 42  |..C..Rw.S.._.Q.B|
> 321ad8a30  4e 9b 9e 2e c6 8e 19 8d  53 ef 0e 1d 5d 20 c0 9d  |N.......S...] ..|
> 32ff61630  72 9f 28 d2 9a 35 79 63  53 ef 09 e6 d6 e3 27 c5  |r.(..5ycS.....'.|
> 33a4b8230  58 9d 65 cb da 31 07 e7  53 ef 04 07 e2 4a 9b 17  |X.e..1..S....J..|
> 3472a7230  b4 a9 fa cf 3d 39 c3 95  53 ef 03 a4 07 2a ac 9a  |....=9..S....*..|
> 3534f5130  54 d2 19 77 a4 c2 4a 2f  53 ef 07 b3 0f 60 be ee  |T..w..J/S....`..|
> 356145130  df f7 f8 eb 6b 2a 8b fa  53 ef 09 cf 9d cd db 68  |....k*..S......h|
> 361b68830  53 ab 0c 7a 8e 4e 96 1b  53 ef 0f 3b 78 e9 d5 ce  |S..z.N..S..;x...|
> 37381e630  24 63 4a ce 1b eb b0 df  53 ef 02 68 54 7d 7a ad  |$cJ.....S..hT}z.|
> 375103330  3a 20 84 18 b6 6d f5 3b  53 ef 0a 86 8f ac b2 d8  |: ...m.;S.......|
> 37b3f7230  11 43 a4 46 fd c8 da ae  53 ef 04 f3 80 db 5c 4b  |.C.F....S.....\K|
> 3a5adf530  7f c1 e1 75 26 cf 25 b2  53 ef 01 4d 64 71 10 bf  |...u&.%.S..Mdq..|
> 3ccc7b830  55 cb 6f 68 e3 c0 0f 36  53 ef 06 00 85 8a 17 72  |U.oh...6S......r|
> 3d6743f30  fa ca 0e 36 a6 4e 02 6a  53 ef 05 6d 6f e5 31 78  |...6.N.jS..mo.1x|
> 3e8040030  0c 58 34 52 02 00 19 00  53 ef 01 00 01 00 00 00  |.X4R....S.......|
>
> ===============================================================================

Hi Phil,

Have you had time yet to look at the results of the fsck check and the
ext2/3/4 superblock search?
I would really like some feedback, as I'm quite out of ideas.

Thanks!

Olivier


* Re: [RAID recovery] Unable to recover RAID5 array after disk failure
  2017-03-17 19:25               ` Olivier Swinkels
@ 2017-03-21 17:08                 ` Phil Turmel
  0 siblings, 0 replies; 10+ messages in thread
From: Phil Turmel @ 2017-03-21 17:08 UTC (permalink / raw)
  To: Olivier Swinkels; +Cc: linux-raid

Hi Olivier,

{ Sorry, lot of work travel lately /-: }

On 03/17/2017 03:25 PM, Olivier Swinkels wrote:

> Hi Phil,
> 
> Did you already have time to look at the results of the fsck check
> and ext2/3/4 superblock search?

Well, there are a few superblock candidates in your output, the lines
showing ".X4R" for the first four bytes, but that converts to a
timestamp of Sat, 14 Sep 2013 12:35:24 GMT.

Ewww.
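
(For reference: on a real superblock hit, the first four bytes of such a
"...30" line are the superblock's last-write time as a little-endian
Unix timestamp, so "0c 58 34 52" can be decoded with something like

date -u -d @$((16#5234580c))    # -> Sat Sep 14 12:35:24 UTC 2013

using GNU date and bash arithmetic.)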

> I would really like some feedback, as I'm quite out of ideas.

You should look at the other devices, but with that timestamp, the odds
look very poor.

Sorry.  Possibly try photorec or similar raw data recovery tools.
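
(photorec, from the testdisk package, runs interactively against a
device or image and asks where to write whatever it recovers, e.g.
something like

photorec /dev/lvm-raid/lvm0

with the recovery directory pointed at a different disk that has enough
free space.)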

Phil

