* Help in recovering a RAID5 volume
@ 2016-11-10 15:41 Felipe Kich
  2016-11-10 17:06 ` Wols Lists
  0 siblings, 1 reply; 6+ messages in thread
From: Felipe Kich @ 2016-11-10 15:41 UTC (permalink / raw)
  To: linux-raid

Hello,

I have an Iomega IX4-200D bought in 2009 with 4 Seagate Barracuda LP
1TB drives that came pre-installed, and it has been working fine ever
since; I never had any real complaints about it in those 7 years. This
week the Samba shares disappeared. When I accessed the web admin page
I saw that the shares were gone, but the disk usage looked correct
(1.2 TB in use / 1.5 TB free); the problem was the status of the
disks: disks 1, 2 and 4 showed an alert and disk 3 was offline. Until
then the unit had never given any warning or sign that the disks might
fail, but that doesn't really matter now. So I turned off the unit and
started reading about what can be done to recover the files inside.

I set up a Linux PC, connected all the disks, and began collecting
information about the condition of the HDDs, the partitions,
everything I could find. After reading the Linux RAID wiki and lots of
threads on the topic I'm still unable to mount the RAID5 volume in
question, so I'm posting below the info I gathered from the RAID
config in the hope that someone can give me some advice. I'm already
aware of the advice about using hard disks designed for NAS usage, SCT
Error Recovery Control support, desktop vs. enterprise drives, etc.,
but these drives were what we could afford at the time, unfortunately.

So, here's the info I got so far:

--------------------------------------------------------------------------------
Index
--------------------------------------------------------------------------------
1) smartctl -H -i -l scterc (for all disks)
2a) mdadm --examine /dev/sda (for the disk and both partitions)
2b) mdadm --examine /dev/sdb (for the disk and both partitions)
2c) mdadm --examine /dev/sdc (for the disk and both partitions)
2d) mdadm --examine /dev/sdd (for the disk and both partitions)
3) lsdrv
4) cat /proc/mdstat

--------------------------------------------------------------------------------
1) smartctl -H -i -l scterc
--------------------------------------------------------------------------------
root@it:/home/it/Desktop# smartctl -H -i -l scterc /dev/sda
smartctl 6.5 2016-01-24 r4214 [i686-linux-4.4.0-31-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda LP
Device Model:     ST31000520AS
Serial Number:    9VX0Y8JW
LU WWN Device Id: 5 000c50 026dca9fb
Firmware Version: CC37
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Nov 10 14:53:37 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

root@it:/home/it/Desktop# smartctl -H -i -l scterc /dev/sdb
smartctl 6.5 2016-01-24 r4214 [i686-linux-4.4.0-31-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda LP
Device Model:     ST31000520AS
Serial Number:    9VX0WRVM
LU WWN Device Id: 5 000c50 026ca4019
Firmware Version: CC37
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Nov 10 14:54:07 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

root@it:/home/it/Desktop# smartctl -H -i -l scterc /dev/sdc
smartctl 6.5 2016-01-24 r4214 [i686-linux-4.4.0-31-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda LP
Device Model:     ST31000520AS
Serial Number:    9VX0XD1S
LU WWN Device Id: 5 000c50 026dbdbf0
Firmware Version: CC38
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Nov 10 14:54:09 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   002   002   036    Pre-fail Always   FAILING_NOW 4033

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

root@it:/home/it/Desktop# smartctl -H -i -l scterc /dev/sdd
smartctl 6.5 2016-01-24 r4214 [i686-linux-4.4.0-31-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda LP
Device Model:     ST31000520AS
Serial Number:    9VX0Y9JW
LU WWN Device Id: 5 000c50 026d7169b
Firmware Version: CC38
User Capacity:    1.000.204.886.016 bytes [1,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Nov 10 14:54:10 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   003   003   036    Pre-fail Always   FAILING_NOW 4013

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

--------------------------------------------------------------------------------
2a) mdadm --examine /dev/sda (for the disk and both partitions)
--------------------------------------------------------------------------------

root@it:/home/it/Desktop# mdadm --examine /dev/sda
/dev/sda:
   MBR Magic : aa55
Partition[0] :      4080509 sectors at            1 (type 83)
Partition[1] :   1949444658 sectors at      4080510 (type 83)


root@it:/home/it/Desktop# mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ab0d7fdf:373ee9f2:5d8fd52f:304e1b90
  Creation Time : Thu May  6 20:34:46 2010
     Raid Level : raid1
  Used Dev Size : 2040128 (1992.65 MiB 2089.09 MB)
     Array Size : 2040128 (1992.65 MiB 2089.09 MB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Wed Nov  9 16:49:29 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : bd7c68c8 - correct
         Events : 37056

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       0        0        2      faulty removed
   3     3       8       17        3      active sync   /dev/sdb1


root@it:/home/it/Desktop# mdadm --examine /dev/sda2
/dev/sda2:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : b570d224:d61d7f45:8352223d:f9c68ac4
           Name : storage:1
  Creation Time : Thu Feb 17 10:22:16 2011
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1949444384 (929.57 GiB 998.12 GB)
     Array Size : 2924166528 (2788.70 GiB 2994.35 GB)
  Used Dev Size : 1949444352 (929.57 GiB 998.12 GB)
   Super Offset : 1949444640 sectors
   Unused Space : before=0 sectors, after=288 sectors
          State : clean
    Device UUID : e0b08740:62497ceb:c107ad71:6bade30e

    Update Time : Wed Nov  9 16:05:03 2016
       Checksum : 70a9b667 - correct
         Events : 161174

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)

--------------------------------------------------------------------------------
2b) mdadm --examine /dev/sdb (for the disk and both partitions)
--------------------------------------------------------------------------------

root@it:/home/it/Desktop# mdadm --examine /dev/sdb
/dev/sdb:
   MBR Magic : aa55
Partition[0] :      4080447 sectors at           63 (type 83)
Partition[1] :   1949444658 sectors at      4080510 (type 83)


root@it:/home/it/Desktop# mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ab0d7fdf:373ee9f2:5d8fd52f:304e1b90
  Creation Time : Thu May  6 20:34:46 2010
     Raid Level : raid1
  Used Dev Size : 2040128 (1992.65 MiB 2089.09 MB)
     Array Size : 2040128 (1992.65 MiB 2089.09 MB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Wed Nov  9 16:49:29 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : bd7c68de - correct
         Events : 37056

      Number   Major   Minor   RaidDevice State
this     3       8       17        3      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       0        0        2      faulty removed
   3     3       8       17        3      active sync   /dev/sdb1


root@it:/home/it/Desktop# mdadm --examine /dev/sdb2
/dev/sdb2:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : b570d224:d61d7f45:8352223d:f9c68ac4
           Name : storage:1
  Creation Time : Thu Feb 17 10:22:16 2011
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1949444384 (929.57 GiB 998.12 GB)
     Array Size : 2924166528 (2788.70 GiB 2994.35 GB)
  Used Dev Size : 1949444352 (929.57 GiB 998.12 GB)
   Super Offset : 1949444640 sectors
   Unused Space : before=0 sectors, after=288 sectors
          State : clean
    Device UUID : c07ecc29:5939c5c0:dda4e6fd:343fbf57

    Update Time : Wed Nov  9 16:05:03 2016
       Checksum : 44ca328 - correct
         Events : 161174

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)

--------------------------------------------------------------------------------
2c) mdadm --examine /dev/sdc (for the disk and both partitions)
--------------------------------------------------------------------------------

root@it:/home/it/Desktop# mdadm --examine /dev/sdc
/dev/sdc:
   MBR Magic : aa55
Partition[0] :      4080509 sectors at            1 (type 83)
Partition[1] :   1949444658 sectors at      4080510 (type 83)


root@it:/home/it/Desktop# mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ab0d7fdf:373ee9f2:5d8fd52f:304e1b90
  Creation Time : Thu May  6 20:34:46 2010
     Raid Level : raid1
  Used Dev Size : 2040128 (1992.65 MiB 2089.09 MB)
     Array Size : 2040128 (1992.65 MiB 2089.09 MB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Wed Nov  9 12:55:15 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : bd7c31b0 - correct
         Events : 37022

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       0        0        2      faulty removed
   3     3       8       17        3      active sync   /dev/sdb1


root@it:/home/it/Desktop# mdadm --examine /dev/sdc2
/dev/sdc2:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : b570d224:d61d7f45:8352223d:f9c68ac4
           Name : storage:1
  Creation Time : Thu Feb 17 10:22:16 2011
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1949444384 (929.57 GiB 998.12 GB)
     Array Size : 2924166528 (2788.70 GiB 2994.35 GB)
  Used Dev Size : 1949444352 (929.57 GiB 998.12 GB)
   Super Offset : 1949444640 sectors
   Unused Space : before=0 sectors, after=288 sectors
          State : active
    Device UUID : ceb844db:855e415a:cfc9efe5:4c2db02d

    Update Time : Wed Nov  9 12:55:49 2016
       Checksum : d39e909 - correct
         Events : 161163

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)

--------------------------------------------------------------------------------
2d) mdadm --examine /dev/sdd (for the disk and both partitions)
--------------------------------------------------------------------------------

root@it:/home/it/Desktop# mdadm --examine /dev/sdd
/dev/sdd:
   MBR Magic : aa55
Partition[0] :      4080509 sectors at            1 (type 83)
Partition[1] :   1949444658 sectors at      4080510 (type 83)


root@it:/home/it/Desktop# mdadm --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : ab0d7fdf:373ee9f2:5d8fd52f:304e1b90
  Creation Time : Thu May  6 20:34:46 2010
     Raid Level : raid1
  Used Dev Size : 2040128 (1992.65 MiB 2089.09 MB)
     Array Size : 2040128 (1992.65 MiB 2089.09 MB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Wed Nov  9 16:49:29 2016
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : bd7c68fa - correct
         Events : 37056

      Number   Major   Minor   RaidDevice State
this     1       8       49        1      active sync   /dev/sdd1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       0        0        2      faulty removed
   3     3       8       17        3      active sync   /dev/sdb1


root@it:/home/it/Desktop# mdadm --examine /dev/sdd2
/dev/sdd2:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : b570d224:d61d7f45:8352223d:f9c68ac4
           Name : storage:1
  Creation Time : Thu Feb 17 10:22:16 2011
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1949444384 (929.57 GiB 998.12 GB)
     Array Size : 2924166528 (2788.70 GiB 2994.35 GB)
  Used Dev Size : 1949444352 (929.57 GiB 998.12 GB)
   Super Offset : 1949444640 sectors
   Unused Space : before=0 sectors, after=288 sectors
          State : clean
    Device UUID : c95e2f61:d146c52c:dc6336fc:c2987aab

    Update Time : Wed Nov  9 16:05:03 2016
       Checksum : f9bab3b4 - correct
         Events : 161174

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : spare
   Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)

--------------------------------------------------------------------------------
3) lsdrv
--------------------------------------------------------------------------------

root@it:/home/it/Desktop# ./lsdrv
PCI [ahci] 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 40)
├scsi 0:0:0:0 ATA      ST31000520AS     {9VX0Y8JW}
│└sda 931.51g [8:0] Partitioned (dos)
│ ├sda1 1.95g [8:1] MD raid1 (4) inactive {ab0d7fdf-373e-e9f2-5d8f-d52f304e1b90}
│ └sda2 929.57g [8:2] MD raid5 (4) inactive 'storage:1' {b570d224-d61d-7f45-8352-223df9c68ac4}
├scsi 1:0:0:0 ATA      ST31000520AS     {9VX0WRVM}
│└sdb 931.51g [8:16] Partitioned (dos)
│ ├sdb1 1.95g [8:17] MD raid1 (4) inactive {ab0d7fdf-373e-e9f2-5d8f-d52f304e1b90}
│ └sdb2 929.57g [8:18] MD raid5 (4) inactive 'storage:1' {b570d224-d61d-7f45-8352-223df9c68ac4}
├scsi 2:0:0:0 ATA      ST31000520AS     {9VX0XD1S}
│└sdc 931.51g [8:32] Partitioned (dos)
│ ├sdc1 1.95g [8:33] MD raid1 (4) inactive {ab0d7fdf-373e-e9f2-5d8f-d52f304e1b90}
│ └sdc2 929.57g [8:34] MD raid5 (4) inactive 'storage:1' {b570d224-d61d-7f45-8352-223df9c68ac4}
└scsi 3:0:0:0 ATA      ST31000520AS     {9VX0Y9JW}
 └sdd 931.51g [8:48] Partitioned (dos)
  ├sdd1 1.95g [8:49] MD raid1 (4) inactive {ab0d7fdf-373e-e9f2-5d8f-d52f304e1b90}
  └sdd2 929.57g [8:50] MD raid5 (4) inactive 'storage:1' {b570d224-d61d-7f45-8352-223df9c68ac4}
USB [usb-storage] Bus 002 Device 002: ID 0781:5530 SanDisk Corp. Cruzer {2005244391081570854A}
└scsi 4:0:0:0 SanDisk  Cruzer
 └sde 14.91g [8:64] Partitioned (dos)
  └sde1 14.91g [8:65] vfat 'FK16GB_LIVE' {1214-3C58}
   └Mounted as /dev/sde1 @ /cdrom
Other Block Devices
├loop0 820.33m [7:0] squashfs
│└Mounted as /dev/loop0 @ /rofs
├loop1 0.00k [7:1] Empty/Unknown
├loop2 0.00k [7:2] Empty/Unknown
├loop3 0.00k [7:3] Empty/Unknown
├loop4 0.00k [7:4] Empty/Unknown
├loop5 0.00k [7:5] Empty/Unknown
├loop6 0.00k [7:6] Empty/Unknown
├loop7 0.00k [7:7] Empty/Unknown
├md0 0.00k [9:0] MD vnone  () clear, None (None) None {None}
│                Empty/Unknown
├md1 0.00k [9:1] MD vnone  () clear, None (None) None {None}
│                Empty/Unknown
├md5 0.00k [9:5] MD vnone  () clear, None (None) None {None}
│                Empty/Unknown
├ram0 64.00m [1:0] Empty/Unknown
├ram1 64.00m [1:1] Empty/Unknown
├ram2 64.00m [1:2] Empty/Unknown
├ram3 64.00m [1:3] Empty/Unknown
├ram4 64.00m [1:4] Empty/Unknown
├ram5 64.00m [1:5] Empty/Unknown
├ram6 64.00m [1:6] Empty/Unknown
├ram7 64.00m [1:7] Empty/Unknown
├ram8 64.00m [1:8] Empty/Unknown
├ram9 64.00m [1:9] Empty/Unknown
├ram10 64.00m [1:10] Empty/Unknown
├ram11 64.00m [1:11] Empty/Unknown
├ram12 64.00m [1:12] Empty/Unknown
├ram13 64.00m [1:13] Empty/Unknown
├ram14 64.00m [1:14] Empty/Unknown
├ram15 64.00m [1:15] Empty/Unknown
├zram0 910.69m [251:0] swap {dd565600-cbd9-4d3c-bfa8-d534f6b0edea}
├zram1 910.69m [251:1] swap {6cc52777-6aef-4046-8acf-fd7b88eb5d74}
├zram2 910.69m [251:2] swap {7d6eba27-e88b-46a9-9edc-b36fc273b63a}
└zram3 910.69m [251:3] swap {ce871e97-37f7-4a37-b09d-bef1f1e288b9}

--------------------------------------------------------------------------------
4) cat /proc/mdstat
--------------------------------------------------------------------------------

root@it:/home/it/Desktop# cat /proc/mdstat
Personalities : [raid1]
unused devices: <none>

--------------------------------------------------------------------------------


So, with that info, I could verify a few things that are frequently
mentioned in other posts:
- SCT Error Recovery Control is disabled for both read and write
operations (see the note below);
- the event counters on the devices are the same, except for one disk,
and even there the difference is small (<50);
- the magic numbers and checksums are all correct.
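
For reference, I gather from the Linux RAID wiki that on drives which
support it, SCT ERC is usually set to a 7-second timeout with
something like

  smartctl -l scterc,70,70 /dev/sdX

but I haven't tried it on these Barracudas, so I can't say whether
they accept the setting.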

I hope someone can give me some advice on how to proceed next.

Best regards.

-
Felipe Kich
51-9622-2067


* Re: Help in recovering a RAID5 volume
  2016-11-10 15:41 Help in recovering a RAID5 volume Felipe Kich
@ 2016-11-10 17:06 ` Wols Lists
  2016-11-10 17:32   ` Wols Lists
  2016-11-10 17:47   ` Felipe Kich
  0 siblings, 2 replies; 6+ messages in thread
From: Wols Lists @ 2016-11-10 17:06 UTC (permalink / raw)
  To: Felipe Kich, linux-raid

On 10/11/16 15:41, Felipe Kich wrote:
> So, with that info, I could verify some things that are frequently
> mentioned on the posts:
> - SCT Error Recovery Control is disabled for both Read and Write operations;
> - Events counter in the devices are the same, except for one disk, but
> the difference is small (<50);
> - Magic Numbers and Checksums are all correct;
> 
> Hope someone can give some advice as how to proceed next.
> 
Okay. It says the drives are failing, so the first thing is to go out
and get four new drives :-( Ouch!

Preferably WD Reds or Seagate NAS (Toshibas seem to support ERC too, I'm
not sure...)

DON'T TOUCH A 3TB BARRACUDA. Barracudas aren't a good idea but the 3TB
disk is apparently an especially bad choice.

Do you want to upgrade your array size? Or do you want to go Raid-6?
Four 2TB drives will give you a 4TB Raid-6 array. And look at getting 3-
or 4TB drives, they're good value for money. You might decide it's not
worth it.

Copy and replace all the failing drives with ddrescue. Hopefully you'll
get a perfect copy. Don't worry that the old drive is smaller than the
new one if you get 2TB or larger drives.
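
Just to make that concrete, a rough sketch - the device names here are
only examples, so double-check which drive is which before running
anything:

  ddrescue -f -n /dev/sda /dev/sde sda.map
  ddrescue -f -r3 /dev/sda /dev/sde sda.map

-f is needed because the destination is a block device, -n skips the
slow scraping phase on the first pass, and -r3 goes back and retries
the unreadable areas a few times, reusing the same map file.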

Assuming everything copies fine, find the three drives that are copies
of sda, sdb, sdd (ie the ones with the highest event counts), and
assemble with --force. You should now have a new array working fine. Do
a fsck to make sure everything's okay - you'll probably lose a file or
two :-(
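
Something along these lines, with /dev/sdX2 etc. standing in for
whatever names the copies come up as, and /dev/md1 just an example
array name:

  mdadm --assemble --force /dev/md1 /dev/sdX2 /dev/sdY2 /dev/sdZ2
  cat /proc/mdstat
  fsck -n /dev/md1   # -n = report only, change nothing (assuming the
                     # filesystem sits directly on the array)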

Add in the fourth disk - it'll trigger a rebuild, but that's normal.
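
For example (again with a hypothetical name for the fourth partition):

  mdadm /dev/md1 --add /dev/sdW2

and then watch the rebuild progress in /proc/mdstat.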

Now if your new disks are bigger than the old ones, you can expand the
array to use the space. You can either create a new partition in the
empty drive space for a third array, or you can use a utility to
move/expand the partitions. If you take the latter step, you should be
able to convert your raid-5 to a raid-6 (I'll let the experts chime in
on that). You can then expand the array to use all the available space,
and expand the filesystem on the array to use it.
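
Very roughly, and only as a sketch to be sanity-checked before use (it
assumes the array is /dev/md1 and the component partitions have
already been enlarged):

  mdadm --grow /dev/md1 --size=max
  mdadm --grow /dev/md1 --level=6 --raid-devices=4 --backup-file=/root/md1-reshape.backup
  resize2fs /dev/md1   # or whatever resize tool matches what actually
                       # sits on top of the array (LVM, XFS, ...)

As I said, let the experts confirm the exact sequence for the raid-5
to raid-6 conversion.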

NB: If you don't get a perfect ddrescue copy, can you please email me
the log files - especially where it logs the blocks it can't copy. One
of the things I want to do is work out how to write that utility
mentioned on the "programming" page of the wiki.

Cheers,
Wol



* Re: Help in recovering a RAID5 volume
  2016-11-10 17:06 ` Wols Lists
@ 2016-11-10 17:32   ` Wols Lists
  2016-11-10 17:47   ` Felipe Kich
  1 sibling, 0 replies; 6+ messages in thread
From: Wols Lists @ 2016-11-10 17:32 UTC (permalink / raw)
  To: Felipe Kich, linux-raid

On 10/11/16 17:06, Wols Lists wrote:
> Add in the fourth disk - it'll trigger a rebuild, but that's normal.
> 
Just had a thought. Especially if you get larger drives: if you can
identify and copy just the three good disks, then don't bother with
the bad one at all.

Just partition the new fourth disk the way you plan to do it, and then
add it back in. You can then use the utilities to re-arrange the other
drives.

Or, and it's a bit more work, partition the new drives the way you want,
and ddrescue the old drives partition by partition, rather than a drive
at a time. But it'll save moving the partitions around later.

Cheers,
Wol



* Re: Help in recovering a RAID5 volume
  2016-11-10 17:06 ` Wols Lists
  2016-11-10 17:32   ` Wols Lists
@ 2016-11-10 17:47   ` Felipe Kich
  2016-11-10 18:58     ` Wols Lists
  1 sibling, 1 reply; 6+ messages in thread
From: Felipe Kich @ 2016-11-10 17:47 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid

Hi Anthony,

Thanks for the reply. Here are some answers to your questions, plus
another question of my own.

It really does look like 2 disks are bad but 2 are still good,
according to SMART. I'll replace the bad ones ASAP.
For now I don't need to increase the array size; it's more than enough
for what I need.

About the drive duplication: I don't have spare disks available for
that right now, only a single 4TB disk at hand. So I'd like to know
whether it's possible to create device images that I can mount and use
to try to rebuild the array, just to test whether it would work; then
I can go and buy new disks to replace the defective ones.

And sure, I'll send you the logs you asked for, no problem.

Regards.

-
Felipe Kich
51-9622-2067


2016-11-10 15:06 GMT-02:00 Wols Lists <antlists@youngman.org.uk>:
> On 10/11/16 15:41, Felipe Kich wrote:
>> So, with that info, I could verify some things that are frequently
>> mentioned on the posts:
>> - SCT Error Recovery Control is disabled for both Read and Write operations;
>> - Events counter in the devices are the same, except for one disk, but
>> the difference is small (<50);
>> - Magic Numbers and Checksums are all correct;
>>
>> Hope someone can give some advice as how to proceed next.
>>
> Okay. It says the drives are failing, so the first thing is to go out
> and get four new drives :-( Ouch!
>
> Preferably WD Reds or Seagate NAS (Toshibas seem to support ERC too, I'm
> not sure...)
>
> DON'T TOUCH A 3TB BARRACUDA. Barracudas aren't a good idea but the 3TB
> disk is apparently an especially bad choice.
>
> Do you want to upgrade your array size? Or do you want to go Raid-6?
> Four 2TB drives will give you a 4TB Raid-6 array. And look at getting 3-
> or 4TB drives, they're good value for money. You might decide it's not
> worth it.
>
> Copy and replace all the failing drives with ddrescue. Hopefully you'll
> get a perfect copy. Don't worry that the old drive is smaller than the
> new one if you get 2TB or larger drives.
>
> Assuming everything copies fine, find the three drives that are copies
> of sda, sdb, sdd (ie the ones with the highest event counts), and
> assemble with --force. You should now have a new array working fine. Do
> a fsck to make sure everything's okay - you'll probably lose a file or
> two :-(
>
> Add in the fourth disk - it'll trigger a rebuild, but that's normal.
>
> Now if your new disks are bigger than the old ones, you can expand the
> array to use the space. You can either create a new partition in the
> empty drive space for a third array, or you can use a utility to
> move/expand the partitions. If you take the latter step, you should be
> able to convert your raid-5 to a raid-6 (I'll let the experts chime in
> on that). You can then expand the array to use all the available space,
> and expand the filesystem on the array to use it.
>
> NB: If you don't get a perfect ddrescue copy, can you please email me
> the log files - especially where it logs the blocks it can't copy. One
> of the things I want to do is work out how to write that utility
> mentioned on the "programming" page of the wiki.
>
> Cheers,
> Wol
>


* Re: Help in recovering a RAID5 volume
  2016-11-10 17:47   ` Felipe Kich
@ 2016-11-10 18:58     ` Wols Lists
  2016-11-22 16:12       ` Felipe Kich
  0 siblings, 1 reply; 6+ messages in thread
From: Wols Lists @ 2016-11-10 18:58 UTC (permalink / raw)
  To: Felipe Kich; +Cc: linux-raid

On 10/11/16 17:47, Felipe Kich wrote:
> Hi Anthony,
> 
> Thanks for the reply. Here's some answers to your questions and also
> another question.
> 
> It really seems that 2 disks are bad, but 2 are still good, according
> to SMART. I'll replace them ASAP.
> For now, I don't need to increase the array size. It's more than
> enough for what I need.
> 
You might find the extra price of larger drives is minimal. It's down to
you. And even 2TB drives would give you the space to go raid-6.

> About the drive duplication, I don't have spare discs available now
> for that, I only have one 4TB disk at hand, so I'd like to know if
> it's possible to create device images that I can mount and try to
> rebuild the array, to test if it would work, then I can go and buy new
> disks to replace the defective ones.

Okay, if you've got a 4TB drive ...

I can't remember what the second bad drive was ... iirc the one that was
truly dud was sdc ...

So. What I'd do is create two partitions on the 4TB that are the same
size as (or possibly slightly larger than) your sdx1 partition, and
ddrescue the "1" partitions from the dud drives across. Then create
two partitions the same size as (or larger than) your sdx2 partition,
and likewise ddrescue the "2" partitions.
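
As a sketch only - the sizes and device names below are assumptions,
so match them against your real partition tables before doing
anything:

  parted -s /dev/sdZ mklabel gpt
  parted -s /dev/sdZ mkpart copy1a 1MiB 2GiB
  parted -s /dev/sdZ mkpart copy1b 2GiB 4GiB
  parted -s /dev/sdZ mkpart copy2a 4GiB 935GiB
  parted -s /dev/sdZ mkpart copy2b 935GiB 1866GiB
  ddrescue -f /dev/sdX1 /dev/sdZ1 sdX1.map
  ddrescue -f /dev/sdX2 /dev/sdZ3 sdX2.map

(and the same again for the second dud drive onto sdZ2 and sdZ4).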

Do a --force assembly and then mount the arrays read-only. The data
should be fine - look it over and see. I think you can run fsck in a
mode where it doesn't actually change anything; it will probably find
a few problems.
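
For the read-only checks, something like this (illustrative names
again):

  mount -o ro /dev/md1 /mnt   # look the data over without touching it
  fsck -n /dev/md1            # -n reports problems but changes nothing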

If everything's fine, add in the other two partitions and let it rebuild.

And then replace the drives as quickly as possible. With this setup
you're critically vulnerable to the 4TB failing. Read up on the
--replace option to replace the drives with minimal risk.
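
Roughly, with hypothetical device names, and noting that the new
drive's partition has to be in the array as a spare or be named with
--with:

  mdadm /dev/md1 --replace /dev/sdY2 --with /dev/sdW2

--replace rebuilds onto the new member while the old one is still in
the array, so you never drop to degraded during the swap.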
> 
> And sure, I'll send you the logs you asked, no problem.
> 
> Regards.
> 
Ta muchly.

Cheers,
Wol


* Re: Help in recovering a RAID5 volume
  2016-11-10 18:58     ` Wols Lists
@ 2016-11-22 16:12       ` Felipe Kich
  0 siblings, 0 replies; 6+ messages in thread
From: Felipe Kich @ 2016-11-22 16:12 UTC (permalink / raw)
  To: Wols Lists; +Cc: linux-raid

Hello again Anthony,

Well, it's been two weeks and only now have I found the time to get
back to recovering that failed RAID5 volume.
Following your recommendations I ddrescue'd all four 1TB drives to a
single (partitioned) 4TB drive.
The problem is that mdadm now can't assemble the array; it complains
about missing superblocks. Below is what I've done so far.

I created four partitions on a single 4TB HDD, then used ddrescue to
copy the contents of the original disks onto them.
To avoid confusion when reading the logs, let me explain how the
computer was set up:

- sda is the system disk. Runs Mint.
- sdb is the 4TB disk.
- sdc is a 1TB disk from the array. I connected the 1st disk,
ddrescue'd it, shut down the PC, replaced the disk, and so on. So, the
original disks are always sdc in the outputs below.

So disk 1 (the NAS's old sda) was copied to sdb1, disk 2 (old sdb) to
sdb2, disk 3 (old sdc) to sdb3 and disk 4 (old sdd) to sdb4.

I used ddrescue version 1.19, always with the same command line (with
% standing for the destination partition / disk number, 1 to 4):
ddrescue --force --verbose /dev/sdc2 /dev/sdb% mapfile.disco%

--------------------------------------------------------------------------------
Mapfile for Disk 1

# current_pos  current_status
0xE864540000     +
#      pos        size  status
0x00000000  0xE864546400  +
--------------------------------------------------------------------------------
Mapfile for Disk 2

# current_pos  current_status
0xE864540000     +
#      pos        size  status
0x00000000  0xE864546400  +
--------------------------------------------------------------------------------
Mapfile for Disk 3

# current_pos  current_status
0x3FDB717C00     +
#      pos        size  status
0x00000000  0x3FDB717000  +
0x3FDB717000  0x00001000  -
0x3FDB718000  0xA888E2E400  +
--------------------------------------------------------------------------------
Mapfile for Disk 4

# current_pos  current_status
0xE864546000     +
#      pos        size  status
0x00000000  0x3A78C80000  +
0x3A78C80000  0xADEB8B0000  -
0xE864530000  0x00001000  +
0xE864531000  0x00013000  -
0xE864544000  0x00001000  +
0xE864545000  0x00001400  -
--------------------------------------------------------------------------------

After that, I tried to verify the data in the partitions, and got this:

--------------------------------------------------------------------------------
root@it:/home/it/Desktop# mdadm --examine /dev/sdb
/dev/sdb:
   MBR Magic : aa55
Partition[0] :  4294967295 sectors at            1 (type ee)

root@it:/home/it/Desktop# mdadm --examine /dev/sdb1
mdadm: /dev/sdb1 has no superblock - assembly aborted

root@it:/home/it/Desktop# mdadm --examine /dev/sdb2
mdadm: /dev/sdb2 has no superblock - assembly aborted

root@it:/home/it/Desktop# mdadm --examine /dev/sdb3
mdadm: /dev/sdb3 has no superblock - assembly aborted

root@it:/home/it/Desktop# mdadm --examine /dev/sdb4
mdadm: /dev/sdb4 has no superblock - assembly aborted
--------------------------------------------------------------------------------

And if I try to assemble the array, mdadm tells me that there is no
superblock on sdb1.

So now I'm stuck. Any tips on what I should do next?

I don't know if it matters, but the original disks have 2 partitions:
the first (less than 2GB) is where the EMC Lifeline software is
installed, and the rest is the data partition. When I ran ddrescue I
only copied the 2nd (data) partition.

Out of curiosity, I opened GParted to see whether it could identify
the partitions. It recognizes sdb1 and sdb4 as "LVM PV", but sdb2 and
sdb3 show up as unknown.

That's it for now. I'll keep reading about what can be done and
waiting for some more help from the list.

Regards,

-
Felipe Kich
51-9622-2067


2016-11-10 16:58 GMT-02:00 Wols Lists <antlists@youngman.org.uk>:
> On 10/11/16 17:47, Felipe Kich wrote:
>> Hi Anthony,
>>
>> Thanks for the reply. Here's some answers to your questions and also
>> another question.
>>
>> It really seems that 2 disks are bad, but 2 are still good, according
>> to SMART. I'll replace them ASAP.
>> For now, I don't need to increase the array size. It's more than
>> enough for what I need.
>>
> You might find the extra price of larger drives is minimal. It's down to
> you. And even 2TB drives would give you the space to go raid-6.
>
>> About the drive duplication, I don't have spare discs available now
>> for that, I only have one 4TB disk at hand, so I'd like to know if
>> it's possible to create device images that I can mount and try to
>> rebuild the array, to test if it would work, then I can go and buy new
>> disks to replace the defective ones.
>
> Okay, if you've got a 4TB drive ...
>
> I can't remember what the second bad drive was ... iirc the one that was
> truly dud was sdc ...
>
> So. What I'd do is create two partitions on the 4TB that are the same
> (or possibly slightly larger) than your sdx1 partition. ddrescue the 1
> partition from the best of the dud drives across. Create two partitions
> the same size (or larger) than your sdx2 partition, and likewise
> ddrescue the 2 partition.
>
> Do a --force assembly, and then mount the arrays read-only. The
> partition should be fine. Look over it and see. I think you can do a
> fsck without it actually changing anything. fsck will probably find a
> few problems.
>
> If everything's fine, add in the other two partitions and let it rebuild.
>
> And then replace the drives as quickly as possible. With this setup
> you're critically vulnerable to the 4TB failing. Read up on the
> --replace option to replace the drives with minimal risk.
>>
>> And sure, I'll send you the logs you asked, no problem.
>>
>> Regards.
>>
> Ta muchly.
>
> Cheers,
> Wol
