* RAID Recovery
@ 2017-03-06 15:07 Adam Goryachev
  2017-03-06 15:36 ` Reindl Harald
  2017-03-06 20:10 ` Phil Turmel
  0 siblings, 2 replies; 12+ messages in thread
From: Adam Goryachev @ 2017-03-06 15:07 UTC
  To: linux-raid

Hi all,

I'm trying to assist a friend to recover their RAID array, it consists 
of 4 drives, most likely in RAID10. It was a linux based NAS (AFAIK). I 
would really appreciate any tips or suggestions...

First, the bad news:

mdadm --misc --detail /dev/sd[abcd]
mdadm: /dev/sda does not appear to be an md device
mdadm: /dev/sdb does not appear to be an md device
mdadm: /dev/sdc does not appear to be an md device
mdadm: /dev/sdd does not appear to be an md device

mdadm --misc --examine /dev/sd[abcd]
/dev/sda:
    MBR Magic : aa55
/dev/sdb:
    MBR Magic : aa55
/dev/sdc:
    MBR Magic : aa55
/dev/sdd:
    MBR Magic : aa55

This really doesn't look promising.... but the disks themselves look 
"healthy"... at least mostly.

Looking at the content of the drives, it might be possible that all four 
drives were in RAID1 ... at least, I can find identical data on all four 
of the drives:

Running this command for each drive:

strings /dev/sdd |cat -n |less

looking for some "text", and I find what looks like a log file snippet 
which is identical across all four drives. That's 25 lines of output, 
that exists on the same output line number, matching across all 4 
drives. So perhaps I have a 4 drive RAID1, which I guess should make it 
easier to recover from.


Disks are /dev/sda /dev/sdb /dev/sdc /dev/sdd, all identical 
"partitions" that don't seem to exist, but there is a MBR partition table

gdisk -l /dev/sda
GPT fdisk (gdisk) version 1.0.1

Partition table scan:
   MBR: MBR only
   BSD: not present
   APM: not present
   GPT: not present


***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory.
***************************************************************

Disk /dev/sda: 1953525168 sectors, 931.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 145F71F0-4D0B-4941-9F9E-2C5301BF518F
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 2048-sector boundaries
Total free space is 1953525101 sectors (931.5 GiB)
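
A quick arithmetic check of the gdisk figures above, as a sketch (the variable names are illustrative, the numbers are copied from the output). It suggests the sector count matches a 1.00 TB drive and that the reported free space equals the whole usable range, i.e. gdisk sees no partitions at all:

```python
SECTOR_BYTES = 512            # "Logical sector size: 512 bytes"

total_sectors = 1953525168    # "Disk /dev/sda: 1953525168 sectors"
first_usable = 34             # "First usable sector is 34"
last_usable = 1953525134      # "last usable sector is 1953525134"
reported_free = 1953525101    # "Total free space is 1953525101 sectors"

total_bytes = total_sectors * SECTOR_BYTES
size_gib = total_bytes / 2**30
usable_sectors = last_usable - first_usable + 1

print(total_bytes)                      # 1000204886016
print(round(size_gib, 1))               # 931.5
print(usable_sectors == reported_free)  # True
```

The byte count is the same as the "User Capacity" that smartctl reports for these drives.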

The first two drives look like this (lots of read errors), the second 
two look perfectly clean...

/dev/sda

Number  Start (sector)    End (sector)  Size       Code  Name

=== START OF INFORMATION SECTION ===
Device Model:     ST1000NM0033         81Y9807 81Y3867IBM
Serial Number:    Z1W2ZG3M
LU WWN Device Id: 5 000c50 079c48262
Firmware Version: BB5A
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar  6 14:44:32 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                         was completed without error.
                                         Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                         without error or no self-test has ever
                                         been run.
Total time to complete Offline
data collection:                (  592) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                         Auto Offline data collection on/off support.
                                         Suspend Offline collection upon new
                                         command.
                                         Offline surface scan supported.
                                         Self-test supported.
                                         Conveyance Self-test supported.
                                         Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                         power-saving mode.
                                         Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                         General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 115) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                         SCT Error Recovery Control supported.
                                         SCT Feature Control supported.
                                         SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
   1 Raw_Read_Error_Rate     POSR--   081   063   044 -    120486603
   3 Spin_Up_Time            PO----   097   096   000 -    0
   4 Start_Stop_Count        -O--CK   100   100   020 -    80
   5 Reallocated_Sector_Ct   PO--CK   100   100   010 -    0
   7 Seek_Error_Rate         POSR--   082   060   030 -    181785851
   9 Power_On_Hours          -O--CK   087   087   000 -    11874
  10 Spin_Retry_Count        PO--C-   100   100   097 -    0
  12 Power_Cycle_Count       -O--CK   100   100   020 -    65
184 End-to-End_Error        -O--CK   100   100   099 -    0
187 Reported_Uncorrect      -O--CK   100   100   000 -    0
188 Command_Timeout         -O--CK   100   100   000 -    0
189 High_Fly_Writes         -O-RCK   100   100   000 -    0
190 Airflow_Temperature_Cel -O---K   060   051   045 -    40 (Min/Max 37/40)
191 G-Sense_Error_Rate      -O--CK   100   100   000 -    0
192 Power-Off_Retract_Count -O--CK   100   100   000 -    60
193 Load_Cycle_Count        -O--CK   100   100   000 -    559
194 Temperature_Celsius     -O---K   040   049   000 -    40 (0 21 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   021   007   000 -    120486603
197 Current_Pending_Sector  -O--C-   100   100   000 -    0
198 Offline_Uncorrectable   ----C-   100   100   000 -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000 -    0
                             ||||||_ K auto-keep
                             |||||__ C event count
                             ||||___ R error rate
                             |||____ S speed/performance
                             ||_____ O updated online
                             |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL,SL  R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    512  Current Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL     VS    4496  Device vendor specific log
0xa8       GPL,SL  VS     129  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xb0       GPL,SL  VS       1  Device vendor specific log
0xbd       GPL     VS     512  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc1       GPL,SL  VS      10  Device vendor specific log
0xc2       GPL,SL  VS      50  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xc4       GPL,SL  VS       5  Device vendor specific log
0xd1       GPL,SL  VS       8  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    40 Celsius
Power Cycle Min/Max Temperature:     37/40 Celsius
Lifetime    Min/Max Temperature:     21/49 Celsius
Under/Over Temperature Limit Count:   0/8

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        10 minutes
Min/Max recommended Temperature:      0/ 0 Celsius
Min/Max Temperature Limit:            0/ 0 Celsius
Temperature History Size (Index):    128 (50)
[SNIP TEMPERATURE SECTION]

SCT Error Recovery Control:
            Read:     75 (7.5 seconds)
           Write:     75 (7.5 seconds)

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 2) ==
0x01  0x008  4              65  ---  Lifetime Power-On Resets
0x01  0x010  4           11874  ---  Power-on Hours
0x01  0x018  6      2728215464  ---  Logical Sectors Written
0x01  0x020  6        60447755  ---  Number of Write Commands
0x01  0x028  6     72441102919  ---  Logical Sectors Read
0x01  0x030  6        47955151  ---  Number of Read Commands
0x01  0x038  6               -  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4           11667  ---  Spindle Motor Power-on Hours
0x03  0x010  4             437  ---  Head Flying Hours
0x03  0x018  4             559  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              40  N--  Current Temperature
0x05  0x010  1              38  N--  Average Short Term Temperature
0x05  0x018  1              38  N--  Average Long Term Temperature
0x05  0x020  1              49  N--  Highest Temperature
0x05  0x028  1              21  N--  Lowest Temperature
0x05  0x030  1              41  N--  Highest Average Short Term Temperature
0x05  0x038  1               2  N--  Lowest Average Short Term Temperature
0x05  0x040  1              39  N--  Highest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              55  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1              13  ---  Specified Minimum Operating Temperature
                                 |||_ C monitored condition met
                                 ||__ D supports DSN
                                 |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS

/dev/sdb is similar:
=== START OF INFORMATION SECTION ===
Device Model:     ST1000NM0033         81Y9807 81Y3867IBM
Serial Number:    Z1W2ZKKD
LU WWN Device Id: 5 000c50 079c557df
Firmware Version: BB5A
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Mar  6 14:47:49 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
   1 Raw_Read_Error_Rate     POSR--   083   063   044 -    224696760
   3 Spin_Up_Time            PO----   097   096   000 -    0
   4 Start_Stop_Count        -O--CK   100   100   020 -    80
   5 Reallocated_Sector_Ct   PO--CK   100   100   010 -    0
   7 Seek_Error_Rate         POSR--   082   060   030 -    192122915
   9 Power_On_Hours          -O--CK   087   087   000 -    11874
  10 Spin_Retry_Count        PO--C-   100   100   097 -    0
  12 Power_Cycle_Count       -O--CK   100   100   020 -    66
184 End-to-End_Error        -O--CK   100   100   099 -    0
187 Reported_Uncorrect      -O--CK   100   100   000 -    0
188 Command_Timeout         -O--CK   100   100   000 -    0
189 High_Fly_Writes         -O-RCK   100   100   000 -    0
190 Airflow_Temperature_Cel -O---K   056   047   045 -    44 (Min/Max 41/44)
191 G-Sense_Error_Rate      -O--CK   100   100   000 -    0
192 Power-Off_Retract_Count -O--CK   100   100   000 -    58
193 Load_Cycle_Count        -O--CK   100   100   000 -    556
194 Temperature_Celsius     -O---K   044   053   000 -    44 (0 21 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   023   013   000 -    224696760
197 Current_Pending_Sector  -O--C-   100   100   000 -    0
198 Offline_Uncorrectable   ----C-   100   100   000 -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000 -    0
                             ||||||_ K auto-keep
                             |||||__ C event count
                             ||||___ R error rate
                             |||____ S speed/performance
                             ||_____ O updated online
                             |______ P prefailure warning


while /dev/sdc looks OK

=== START OF INFORMATION SECTION ===
Device Model:     WD1003FBYX-23        81Y9807 81Y3867IBM
Serial Number:    WD-WCAW37DULJLP
LU WWN Device Id: 5 0014ee 261450c09
Firmware Version: WB35
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Mar  6 14:48:21 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
   1 Raw_Read_Error_Rate     POSR-K   200   200   051 -    3
   3 Spin_Up_Time            POS--K   186   173   021 -    3691
   4 Start_Stop_Count        -O--CK   100   100   000 -    90
   5 Reallocated_Sector_Ct   PO--CK   200   200   140 -    0
   7 Seek_Error_Rate         -OSR-K   100   253   000 -    0
   9 Power_On_Hours          -O--CK   084   084   000 -    11759
  10 Spin_Retry_Count        -O--CK   100   253   000 -    0
  11 Calibration_Retry_Count -O--CK   100   253   000 -    0
  12 Power_Cycle_Count       -O--CK   100   100   000 -    76
192 Power-Off_Retract_Count -O--CK   200   200   000 -    69
193 Load_Cycle_Count        -O--CK   200   200   000 -    20
194 Temperature_Celsius     -O---K   102   092   000 -    45
196 Reallocated_Event_Count -O--CK   200   200   000 -    0
197 Current_Pending_Sector  -O--CK   200   200   000 -    0
198 Offline_Uncorrectable   ----CK   200   200   000 -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000 -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000 -    0
                             ||||||_ K auto-keep
                             |||||__ C event count
                             ||||___ R error rate
                             |||____ S speed/performance
                             ||_____ O updated online
                             |______ P prefailure warning


Can anyone provide some hints or suggestions on how I should proceed?

Regards,
Adam




* Re: RAID Recovery
  2017-03-06 15:07 RAID Recovery Adam Goryachev
@ 2017-03-06 15:36 ` Reindl Harald
  2017-03-06 20:10 ` Phil Turmel
  1 sibling, 0 replies; 12+ messages in thread
From: Reindl Harald @ 2017-03-06 15:36 UTC
  To: Adam Goryachev, linux-raid



On 06.03.2017 at 16:07, Adam Goryachev wrote:
> Hi all,
>
> I'm trying to assist a friend to recover their RAID array, it consists
> of 4 drives, most likely in RAID10. It was a linux based NAS (AFAIK). I
> would really appreciate any tips or suggestions...
>
> First, the bad news:
>
> mdadm --misc --detail /dev/sd[abcd]
> mdadm: /dev/sda does not appear to be an md device
> mdadm: /dev/sdb does not appear to be an md device
> mdadm: /dev/sdc does not appear to be an md device
> mdadm: /dev/sdd does not appear to be an md device

It's not uncommon to use partitions instead of whole drives.

The example below shows two RAID10 arrays and one RAID1 array on four
2 TB drives, so /dev/sd[a-d] themselves are not md devices:

[root@rh:~]$ cat /proc/mdstat
Personalities : [raid1] [raid10]
md1 : active raid10 sda2[0] sdd2[2] sdb2[3] sdc2[1]
       30716928 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid1 sda1[0] sdd1[2] sdb1[3] sdc1[1]
       511988 blocks super 1.0 [4/4] [UUUU]

md2 : active raid10 sda3[0] sdd3[2] sdb3[3] sdc3[1]
       3875222528 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]
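
To make the point about partitions concrete, here is a minimal Python sketch that pulls the array names, levels and member partitions out of /proc/mdstat text like the sample above. The function name `parse_mdstat` and the token heuristics are illustrative assumptions, not part of mdadm, and only the array header lines are interpreted:

```python
import re

# A member token looks like "sda2[0]", optionally followed by flags such as "(S)".
MEMBER = re.compile(r"^([A-Za-z0-9_/!-]+)\[\d+\]")

def parse_mdstat(text):
    """Return {array: {"state", "level", "members"}} from /proc/mdstat text."""
    arrays = {}
    for line in text.splitlines():
        head, sep, tail = line.partition(" : ")
        if not sep or not head.startswith("md"):
            continue  # skips "Personalities : ..." and the indented detail lines
        tokens = tail.split()
        if not tokens:
            continue
        members = [m.group(1) for m in map(MEMBER.match, tokens) if m]
        # Whatever is neither a member nor a "(flag)" is taken to be the level.
        others = [t for t in tokens[1:]
                  if not MEMBER.match(t) and not t.startswith("(")]
        arrays[head] = {
            "state": tokens[0],
            "level": others[0] if others else None,
            "members": sorted(members),
        }
    return arrays

sample_mdstat = """\
Personalities : [raid1] [raid10]
md1 : active raid10 sda2[0] sdd2[2] sdb2[3] sdc2[1]
       30716928 blocks super 1.1 512K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid1 sda1[0] sdd1[2] sdb1[3] sdc1[1]
       511988 blocks super 1.0 [4/4] [UUUU]
"""

for name, info in sorted(parse_mdstat(sample_mdstat).items()):
    print(name, info["level"], info["members"])
```

Every member listed is a partition (sda1, sda2, ...), never a bare disk, which is why examining the whole disk shows only the MBR.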


* Re: RAID Recovery
  2017-03-06 15:07 RAID Recovery Adam Goryachev
  2017-03-06 15:36 ` Reindl Harald
@ 2017-03-06 20:10 ` Phil Turmel
  2017-03-07  9:17   ` Adam Goryachev
  1 sibling, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2017-03-06 20:10 UTC
  To: Adam Goryachev, linux-raid

On 03/06/2017 10:07 AM, Adam Goryachev wrote:
> Hi all,
> 
> I'm trying to assist a friend to recover their RAID array, it consists
> of 4 drives, most likely in RAID10. It was a linux based NAS (AFAIK). I
> would really appreciate any tips or suggestions...
> 
> First, the bad news:
> 
> mdadm --misc --detail /dev/sd[abcd]
> mdadm: /dev/sda does not appear to be an md device

For future reference:  --detail is only applicable to the /dev/md*
*array* device.  --examine is only applicable to *member* devices,
and is required for valid results when an array is not running.
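
The rule above can be sketched as a small Python helper that picks the mdadm mode from the device path. The helper name and the path heuristic (anything under /dev/md is treated as an array) are assumptions for illustration; only the `--detail` and `--examine` options come from the thread:

```python
import os

def mdadm_query_argv(device):
    """Build an mdadm command line: --detail for arrays, --examine for members."""
    name = os.path.basename(device)
    # Heuristic (assumption): array nodes are named /dev/mdN or live in /dev/md/.
    if device.startswith("/dev/md/") or name.startswith("md"):
        return ["mdadm", "--detail", device]
    return ["mdadm", "--examine", device]

print(mdadm_query_argv("/dev/md0"))   # ['mdadm', '--detail', '/dev/md0']
print(mdadm_query_argv("/dev/sda1"))  # ['mdadm', '--examine', '/dev/sda1']
```

The returned list could be passed to `subprocess.run`, but the sketch deliberately stops at building the arguments.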

> mdadm --misc --examine /dev/sd[abcd]
> /dev/sda:
>    MBR Magic : aa55
> /dev/sdb:
>    MBR Magic : aa55
> /dev/sdc:
>    MBR Magic : aa55
> /dev/sdd:
>    MBR Magic : aa55
> 
> This really doesn't look promising.... but the disks themselves look
> "healthy"... at least mostly.

As Reindl said, this by itself is no surprise.  The NAS has to boot off
of *something*, so partitions for /boot, /swap, /, and /data, or some
combination, is common for such small systems.

The first partition is likely raid1 across all devices for /boot.
After that, all bets are off.

> Looking at the content of the drives, it might be possible that all four
> drives were in RAID1 ... at least, I can find identical data on all four
> of the drives:
> 
> Running this command for each drive:
> 
> strings /dev/sdd |cat -n |less
> 
> looking for some "text", and I find what looks like a log file snippet
> which is identical across all four drives. That's 25 lines of output,
> that exists on the same output line number, matching across all 4
> drives. So perhaps I have a 4 drive RAID1, which I guess should make it
> easier to recover from.

Probably just a 4x raid1 mirror for the root partition.

> Disks are /dev/sda /dev/sdb /dev/sdc /dev/sdd, all identical
> "partitions" that don't seem to exist, but there is a MBR partition table
> 
> gdisk -l /dev/sda
> GPT fdisk (gdisk) version 1.0.1
> 
> Partition table scan:
>   MBR: MBR only
>   BSD: not present
>   APM: not present
>   GPT: not present
> 
> 
> ***************************************************************
> Found invalid GPT and valid MBR; converting MBR to GPT format
> in memory.
> ***************************************************************
> 
> Disk /dev/sda: 1953525168 sectors, 931.5 GiB
> Logical sector size: 512 bytes
> Disk identifier (GUID): 145F71F0-4D0B-4941-9F9E-2C5301BF518F
> Partition table holds up to 128 entries
> First usable sector is 34, last usable sector is 1953525134
> Partitions will be aligned on 2048-sector boundaries
> Total free space is 1953525101 sectors (931.5 GiB)

This is worrisome.  Please repost the complete output of fdisk -l
and gdisk -l for all of these devices.  But....

> The first two drives look like this (lots of read errors), the second
> two look perfectly clean...

Please remove the drives from the NAS box and connect to a known good
system.  Your smartctl reports include neither re-allocated sectors
nor pending relocations, which would be expected if there are many
read errors.  That means the read errors are likely due to controller,
cables, or power supply problems.
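
The check described above (no reallocated or pending sectors despite read errors) can be sketched in Python by pulling the relevant raw values out of a smartctl attribute table. This is a rough sketch: the function names and the column heuristics are assumptions, and it copes only with the two table layouts that appear in this thread:

```python
def parse_smart_attributes(text):
    """Map attribute ID -> (name, raw value) from a smartctl attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        # The `-A` layout has TYPE, UPDATED and WHEN_FAILED columns before the
        # raw value; the brief layout has a single FAIL column.
        raw_index = 9 if fields[6] in ("Pre-fail", "Old_age") else 7
        if len(fields) <= raw_index:
            continue
        attrs[int(fields[0])] = (fields[1], fields[raw_index])
    return attrs

# Attributes that normally move when the media itself is failing.
MEDIA_IDS = (5, 197, 198)

def media_defects(attrs):
    """Return {attribute name: raw count} for the media-related attributes."""
    return {attrs[i][0]: int(attrs[i][1]) for i in MEDIA_IDS if i in attrs}

sample_smart = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   010 -    0
190 Airflow_Temperature_Cel -O---K   060   051   045 -    40 (Min/Max 37/40)
197 Current_Pending_Sector  -O--C-   100   100   000 -    0
198 Offline_Uncorrectable   ----C-   100   100   000 -    0
"""

print(media_defects(parse_smart_attributes(sample_smart)))
```

All three counts being zero is what points away from the platters and toward the controller, cables or power supply.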

Note, timeout mismatch *does not* apply to sda, but you trimmed
too much to tell for the other devices.  Please submit complete output
from smartctl -iA -l scterc /dev/sdX for each of these devices.
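
For the timeout question, a minimal sketch of the comparison: parse the "SCT Error Recovery Control" lines from smartctl and compare them with the kernel's per-device command timeout. The function names are illustrative; the 30 second default is the usual Linux value found in /sys/block/sdX/device/timeout and is passed in as a parameter rather than read from the system. Anything that is not a number (ERC disabled or unsupported) is treated as a mismatch:

```python
import re

def parse_scterc(text):
    """Return {"read": seconds or None, "write": seconds or None}."""
    result = {"read": None, "write": None}
    for key in result:
        m = re.search(rf"^\s*{key}:\s+(\d+)\s+\(", text,
                      re.IGNORECASE | re.MULTILINE)
        if m:
            result[key] = int(m.group(1)) / 10.0  # smartctl prints deciseconds
    return result

def has_timeout_mismatch(erc, kernel_timeout_s=30):
    """True if the drive may keep retrying longer than the kernel will wait."""
    values = list(erc.values())
    if any(v is None for v in values):
        return True  # no usable ERC limit reported
    return max(values) >= kernel_timeout_s

sample_scterc = """\
SCT Error Recovery Control:
            Read:     75 (7.5 seconds)
           Write:     75 (7.5 seconds)
"""

erc = parse_scterc(sample_scterc)
print(erc)                        # {'read': 7.5, 'write': 7.5}
print(has_timeout_mismatch(erc))  # False
```

With 7.5 seconds on the drive side and 30 seconds on the kernel side, the drive gives up first, which is the safe ordering.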

Do the fdisk & gdisk reports from the known good system, and also,
if you can find any partitions, run --examine on each from the same
system.  Keep the --examine reports with the corresponding smartctl
report.

Phil


* Re: RAID Recovery
  2017-03-06 20:10 ` Phil Turmel
@ 2017-03-07  9:17   ` Adam Goryachev
  2017-03-07 14:06     ` Adam Goryachev
  0 siblings, 1 reply; 12+ messages in thread
From: Adam Goryachev @ 2017-03-07  9:17 UTC
  To: Phil Turmel, linux-raid



On 7/3/17 07:10, Phil Turmel wrote:
> On 03/06/2017 10:07 AM, Adam Goryachev wrote:
>> Hi all,
>>
>> I'm trying to assist a friend to recover their RAID array, it consists
>> of 4 drives, most likely in RAID10. It was a linux based NAS (AFAIK). I
>> would really appreciate any tips or suggestions...
>>
>> First, the bad news:
>>
>> mdadm --misc --examine /dev/sd[abcd]
>> /dev/sda:
>>     MBR Magic : aa55
>> /dev/sdb:
>>     MBR Magic : aa55
>> /dev/sdc:
>>     MBR Magic : aa55
>> /dev/sdd:
>>     MBR Magic : aa55
>>
>> This really doesn't look promising.... but the disks themselves look
>> "healthy"... at least mostly.
> As Reindl said, this by itself is no surprise.  The NAS has to boot off
> of *something*, so partitions for /boot, /swap, /, and /data, or some
> combination, is common for such small systems.
>
> The first partition is likely raid1 across all devices for /boot.
> After that, all bets are off.
OK, so since it looks like the partition table has been lost, is there 
something that could be used to work out where the partition 
boundaries are? E.g., if the RAID superblock sits at the beginning of 
each partition, then finding it will show that "this" is the beginning, 
or vice versa if the superblock is at the end of the partitions....
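
One way to look for such a marker, sketched in Python under stated assumptions: md version 1.x superblocks begin with the little-endian magic 0xa92b4efc followed by a major version of 1, and per the mdadm documentation the 1.1 superblock sits at the start of the member, 1.2 sits 4 KiB (8 sectors) after the start, and 1.0 sits near the end. The function name and the chunked scan are illustrative, 0.90 superblocks are not handled, and a stray match in file data is possible, so every hit would still need confirming with `mdadm --examine`:

```python
import io
import struct

MD_MAGIC = 0xA92B4EFC  # md superblock magic (little-endian on disk for v1.x)
SECTOR = 512

def find_md_v1_superblocks(stream, chunk_sectors=2048):
    """Return sector numbers whose first 8 bytes look like an md v1.x superblock."""
    hits = []
    base = 0  # sector number of the first sector in the current chunk
    while True:
        chunk = stream.read(chunk_sectors * SECTOR)
        if not chunk:
            break
        for off in range(0, len(chunk) - 7, SECTOR):
            magic, major = struct.unpack_from("<II", chunk, off)
            if magic == MD_MAGIC and major == 1:
                hits.append(base + off // SECTOR)
        base += len(chunk) // SECTOR
    return hits

# Synthetic 2 MiB "disk": a decoy magic with the wrong version at sector 10
# and a plausible superblock at sector 2056.
image = bytearray(4096 * SECTOR)
struct.pack_into("<II", image, 10 * SECTOR, MD_MAGIC, 0)
struct.pack_into("<II", image, 2056 * SECTOR, MD_MAGIC, 1)

print(find_md_v1_superblocks(io.BytesIO(bytes(image))))  # [2056]
```

A hit at sector S would suggest a member starting at S (metadata 1.1) or at S - 8 (metadata 1.2). On a real disk the stream would be a read-only `open("/dev/sdX", "rb")`, and a full scan of 1 TB in pure Python would be slow.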
>> Looking at the content of the drives, it might be possible that all four
>> drives were in RAID1 ... at least, I can find identical data on all four
>> of the drives:
>>
>> Running this command for each drive:
>>
>> strings /dev/sdd |cat -n |less
>>
>> looking for some "text", and I find what looks like a log file snippet
>> which is identical across all four drives. That's 25 lines of output,
>> that exists on the same output line number, matching across all 4
>> drives. So perhaps I have a 4 drive RAID1, which I guess should make it
>> easier to recover from.
> Probably just a 4x raid1 mirror for the root partition.
OK, so if I can find where the drives stop being identical, then I can 
probably identify the end of the root partition. Also, if I can recover 
the root partition (not the goal) then it might contain some valuable 
information on the original config of the rest of the drives.... and 
hence get to recover the actual data partitions.
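
The idea of finding where the drives stop being identical can be sketched in Python as a chunk-by-chunk comparison that reports the byte ranges in which all inputs agree. The function name and chunk size are illustrative assumptions. Two caveats apply: zero-filled (unused) regions match trivially, and each member's own md superblock differs even inside a mirror, so short mismatches within a long identical run are to be expected:

```python
import io

def identical_runs(streams, chunk_size=1 << 20):
    """Return [(start, end)] byte ranges where every stream has the same data.

    Granularity is chunk_size; comparison stops when any stream runs out.
    """
    runs = []
    start = None   # start offset of the run currently being extended
    offset = 0
    while True:
        chunks = [s.read(chunk_size) for s in streams]
        if not all(chunks):
            break
        same = all(c == chunks[0] for c in chunks[1:])
        if same and start is None:
            start = offset
        elif not same and start is not None:
            runs.append((start, offset))
            start = None
        offset += len(chunks[0])
    if start is not None:
        runs.append((start, offset))
    return runs

# Four synthetic "disks": a per-disk header, 64 shared bytes, then per-disk data.
common = bytes(range(64))
disks = [bytes([i + 1]) * 16 + common + bytes([i + 10]) * 48 for i in range(4)]

print(identical_runs([io.BytesIO(d) for d in disks], chunk_size=16))  # [(16, 80)]
```

On real hardware the streams would be the four devices opened read-only, and the end of the longest run would be only a candidate for the end of the mirrored partition.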
>> Disks are /dev/sda /dev/sdb /dev/sdc /dev/sdd, all identical
>> "partitions" that don't seem to exist, but there is a MBR partition table
>>
>> gdisk -l /dev/sda
>> GPT fdisk (gdisk) version 1.0.1
>>
>> Partition table scan:
>>    MBR: MBR only
>>    BSD: not present
>>    APM: not present
>>    GPT: not present
>>
>>
>> ***************************************************************
>> Found invalid GPT and valid MBR; converting MBR to GPT format
>> in memory.
>> ***************************************************************
>>
>> Disk /dev/sda: 1953525168 sectors, 931.5 GiB
>> Logical sector size: 512 bytes
>> Disk identifier (GUID): 145F71F0-4D0B-4941-9F9E-2C5301BF518F
>> Partition table holds up to 128 entries
>> First usable sector is 34, last usable sector is 1953525134
>> Partitions will be aligned on 2048-sector boundaries
>> Total free space is 1953525101 sectors (931.5 GiB)
> This is worrisome.  Please repost the complete output of fdisk -l
> and gdisk -l for all of these devices.  But....
>
>> The first two drives look like this (lots of read errors), the second
>> two look perfectly clean...
> Please remove the drives from the NAS box and connect to a known good
> system.  Your smartctl reports include neither re-allocated sectors
> nor pending relocations, which would be expected if there are many
> read errors.  That means the read errors are likely due to controller,
> cables, or power supply problems.
The drives have already been removed from the original NAS device; they 
are now connected to a PC running from an Ubuntu live CD...
> Note, timeout mismatch *does not* apply to sda, but you trimmed
> too much to tell for the other devices.  Please submit complete output
> from smartctl -iA -l scterc /dev/sdX for each of these devices.
All devices are the same, but I'll include it here:
smartctl -iA -l scterc /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.8.0-22-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST1000NM0033         81Y9807 81Y3867IBM
Serial Number:    Z1W2ZG3M
LU WWN Device Id: 5 000c50 079c48262
Firmware Version: BB5A
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Mar  7 09:11:11 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   063   044    Pre-fail  Always       -       122481432
  3 Spin_Up_Time            0x0003   097   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       80
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       182098084
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       11892
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       65
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   051   045    Old_age   Always       -       37 (Min/Max 36/40)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       60
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       559
194 Temperature_Celsius     0x0022   037   049   000    Old_age   Always       -       37 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   021   007   000    Old_age   Always       -       122481432
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SCT Error Recovery Control:
            Read:     75 (7.5 seconds)
           Write:     75 (7.5 seconds)

smartctl -iA -l scterc /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.8.0-22-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST1000NM0033         81Y9807 81Y3867IBM
Serial Number:    Z1W2ZKKD
LU WWN Device Id: 5 000c50 079c557df
Firmware Version: BB5A
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Mar  7 09:11:11 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   063   044    Pre-fail  Always       -       225986939
  3 Spin_Up_Time            0x0003   097   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       80
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       192404045
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       11892
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       66
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   059   047   045    Old_age   Always       -       41 (Min/Max 40/44)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       58
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       556
194 Temperature_Celsius     0x0022   041   053   000    Old_age   Always       -       41 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   023   013   000    Old_age   Always       -       225986939
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SCT Error Recovery Control:
            Read:     75 (7.5 seconds)
           Write:     75 (7.5 seconds)

smartctl -iA -l scterc /dev/sdc
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.8.0-22-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WD1003FBYX-23        81Y9807 81Y3867IBM
Serial Number:    WD-WCAW37DULJLP
LU WWN Device Id: 5 0014ee 261450c09
Firmware Version: WB35
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Mar  7 09:11:11 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       3
  3 Spin_Up_Time            0x0027   186   173   021    Pre-fail  Always       -       3691
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       90
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       11777
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       76
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       69
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       20
194 Temperature_Celsius     0x0022   103   092   000    Old_age   Always       -       44
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SCT Error Recovery Control:
            Read:     70 (7.0 seconds)
           Write:     70 (7.0 seconds)

smartctl -iA -l scterc /dev/sdd
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.8.0-22-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WD1003FBYX-23        81Y9807 81Y3867IBM
Serial Number:    WD-WCAW37DULEES
LU WWN Device Id: 5 0014ee 26143dbd2
Firmware Version: WB35
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Mar  7 09:11:11 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   184   173   021    Pre-fail  Always       -       3758
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       90
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       11767
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       76
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       67
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       22
194 Temperature_Celsius     0x0022   106   095   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SCT Error Recovery Control:
            Read:     70 (7.0 seconds)
           Write:     70 (7.0 seconds)

> Do the fdisk & gdisk reports from the known good system, and also,
> if you can find any partitions, run --examine on each from the same
> system.  Keep the --examine reports with the corresponding smartctl
> report.
>
Looks like the partition tables are all gone... fdisk and gdisk both 
report no partitions on any drive. gdisk shows them all with MBR (as 
mdadm did, which is apparently due to the magic bytes aa55).

So it seems the real problem will be to work out where the various 
partitions start and end...... Then re-create the partition table, and 
hopefully the actual data will still be good.
Any ideas on how to "find" the partitions?

Thanks,
Adam


* Re: RAID Recovery
  2017-03-07  9:17   ` Adam Goryachev
@ 2017-03-07 14:06     ` Adam Goryachev
  2017-03-07 15:00       ` Phil Turmel
  0 siblings, 1 reply; 12+ messages in thread
From: Adam Goryachev @ 2017-03-07 14:06 UTC (permalink / raw)
  To: Phil Turmel, linux-raid

BTW, just some more info I've found... either the drives are mirrored in 
pairs across almost their entire length, or all 4 are RAID1 mirrors of 
each other:

root@ubuntu:~# cmp /dev/sda /dev/sdb
/dev/sda /dev/sdb differ: byte 1000162959365, line 243319233
root@ubuntu:~# cmp /dev/sdc /dev/sdd
/dev/sdc /dev/sdd differ: byte 1000162971653, line 243236929

Now I'll need to compare sdb and sdc to find out if all 4 are identical...

PS, they are 1TB drives, so sure, there might be a minute amount of unique 
data at the very end of the drives....

The other option is that they have been re-initialised/zeroed or similar, 
and that's why all the data is identical (useless). I was hoping to get a 
starting point for where the partition boundaries might have been ....
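
To put the cmp offsets in context, here is a small Python sketch (not part 
of the original commands) that converts cmp's 1-based byte number into a 
sector number and the distance to the end of the disk. The 512-byte sector 
size and the 1,000,204,886,016-byte capacity come from the smartctl output; 
the function name locate() is made up for illustration.

```python
SECTOR_SIZE = 512                 # "Sector Size: 512 bytes logical/physical" per smartctl
DISK_BYTES = 1_000_204_886_016    # "User Capacity" per smartctl


def locate(cmp_byte, disk_bytes=DISK_BYTES, sector_size=SECTOR_SIZE):
    """Translate cmp's 1-based byte number into (sector, offset in sector, bytes left)."""
    offset = cmp_byte - 1                            # cmp counts bytes from 1
    sector, within = divmod(offset, sector_size)
    remaining = disk_bytes - sector * sector_size    # from the start of that sector to end of disk
    return sector, within, remaining


if __name__ == "__main__":
    for pair, byte in (("sda/sdb", 1000162959365), ("sdc/sdd", 1000162971653)):
        sector, within, remaining = locate(byte)
        print(f"{pair}: first difference in sector {sector} (+{within} bytes), "
              f"{remaining / 2**20:.1f} MiB before the end of the disk")
```

For both pairs the first difference lands about 40 MiB before the end of 
the disk, which fits the "unique data at the very end" guess.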

Regards,
Adam


On 7/3/17 20:17, Adam Goryachev wrote:
>
>
> On 7/3/17 07:10, Phil Turmel wrote:
>> On 03/06/2017 10:07 AM, Adam Goryachev wrote:
>>> Hi all,
>>>
>>> I'm trying to assist a friend to recover their RAID array, it consists
>>> of 4 drives, most likely in RAID10. It was a linux based NAS (AFAIK). I
>>> would really appreciate any tips or suggestions...
>>>
>>> First, the bad news:
>>>
>>> mdadm --misc --examine /dev/sd[abcd]
>>> /dev/sda:
>>>     MBR Magic : aa55
>>> /dev/sdb:
>>>     MBR Magic : aa55
>>> /dev/sdc:
>>>     MBR Magic : aa55
>>> /dev/sdd:
>>>     MBR Magic : aa55
>>>
>>> This really doesn't look promising.... but the disks themselves look
>>> "healthy"... at least mostly.
>> As Reindl said, this by itself is no surprise.  The NAS has to boot off
>> of *something*, so partitions for /boot, /swap, /, and /data, or some
>> combination, is common for such small systems.
>>
>> The first partition is likely raid1 across all devices for /boot.
>> After that, all bets are off.
> OK, so since it looks like the partition table has been lost, is there 
> something that could be used to define where the partition table 
> boundaries are? eg, if the raid is marked at the beginning of each 
> partition, then finding it will show that "this" is the beginning, or 
> vice versa if the raid marker is at the end of the partitions....
>>> Looking at the content of the drives, it might be possible that all four
>>> drives were in RAID1 ... at least, I can find identical data on all four
>>> of the drives:
>>>
>>> Running this command for each drive:
>>>
>>> strings /dev/sdd |cat -n |less
>>>
>>> looking for some "text", and I find what looks like a log file snipped
>>> which is identical across all four drives. Thats 25 lines of output,
>>> that exists on the same output line number, matching across all 4
>>> drives. So perhaps I have a 4 drive RAID1, which I guess should make it
>>> easier to recover from.
>> Probably just a 4x raid1 mirror for the root partition.
> OK, so if I can find where the drives stop being identical, then I can 
> probably identify the end of the root partition. Also, if I can 
> recover the root partition (not the goal) then it might contain some 
> valuable information on the original config of the rest of the 
> drives.... and hence get to recover the actual data partitions.
>>> Disks are /dev/sda /dev/sdb /dev/sdc /dev/sdd, all identical
>>> "partitions" that don't seem to exist, but there is a MBR partition table
>>>
>>> gdisk -l /dev/sda
>>> GPT fdisk (gdisk) version 1.0.1
>>>
>>> Partition table scan:
>>>    MBR: MBR only
>>>    BSD: not present
>>>    APM: not present
>>>    GPT: not present
>>>
>>>
>>> ***************************************************************
>>> Found invalid GPT and valid MBR; converting MBR to GPT format
>>> in memory.
>>> ***************************************************************
>>>
>>> Disk /dev/sda: 1953525168 sectors, 931.5 GiB
>>> Logical sector size: 512 bytes
>>> Disk identifier (GUID): 145F71F0-4D0B-4941-9F9E-2C5301BF518F
>>> Partition table holds up to 128 entries
>>> First usable sector is 34, last usable sector is 1953525134
>>> Partitions will be aligned on 2048-sector boundaries
>>> Total free space is 1953525101 sectors (931.5 GiB)
>> This is worrisome.  Please repost the complete output of fdisk -l
>> and gdisk -l for all of these devices.  But....
>>
>>> The first two drives look like this (lots of read errors), the second
>>> two look perfectly clean...
>> Please remove the drives from the NAS box and connect to a known good
>> system.  Your smartctl reports include neither re-allocated sectors
>> nor pending relocations, which would be expected if there are many
>> read errors.  That means the read errors are likely due to controller,
>> cables, or power supply problems.
> The drives have already been removed from the original NAS device, 
> they are now connected to a PC running from a ubuntu live CD...
>> Note, timeout mismatch *does not* apply to sda, but you trimmed
>> too much to tell for the other devices.  Please submit complete output
>> from smartctl -iA -l scterc /dev/sdX for each of these devices.
> All devices are the same, but I'll include it here:
> [smartctl output for sda, sdb, sdc and sdd snipped; identical to the
> reports earlier in the thread]
>
>> Do the fdisk & gdisk reports from the known good system, and also,
>> if you can find any partitions, run --examine on each from the same
>> system.  Keep the --examine reports with the corresponding smartctl
>> report.
>>
> Looks like the partition tables are all gone... fdisk and gdisk both 
> report no partitions on any drive. gdisk shows them all with MBR (as 
> mdadm did, which is apparently due to the magic bytes aa55
>
> So it seems the real problem will be to work out where the various 
> partitions start and end...... Then re-create the partition table, and 
> hopefully the actual data will still be good.
> Any ideas on how to "find" the partitions?
>
> Thanks,
> Adam


* Re: RAID Recovery
  2017-03-07 14:06     ` Adam Goryachev
@ 2017-03-07 15:00       ` Phil Turmel
  2017-03-08  9:08         ` Adam Goryachev
  0 siblings, 1 reply; 12+ messages in thread
From: Phil Turmel @ 2017-03-07 15:00 UTC (permalink / raw)
  To: Adam Goryachev, linux-raid

Hi Adam,

{Please remember to trim repetitive stuff, and interleave.}

On 03/07/2017 09:06 AM, Adam Goryachev wrote:
> BTW, just some more info I've found... either almost the entire
> drives are RAID1 mirrors, or all 4 are RAID1 mirrors:

> Other option, they have been re-initialised/zero'd or similar, and
> thats why all the data is identical (useless). I was hoping to get a
> starting point for where the partition boundaries might have been
> ....

Search the devices for ext2/3/4 superblocks, like so:

dd if=/dev/sdX bs=1M 2>/dev/null |hexdump -C |grep '30  .\+  53 ef 0'

This will take a very long time, and will generate false positives.
You probably would want to use screen or tmux to run these in
parallel in separate processes.

But superblock locations will give you hints as to the rest of data,
and make it possible to create partitions that will let you copy
stuff off into a new array.
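
For reference, the grep pattern works because an ext2/3/4 filesystem keeps 
its primary superblock 1024 bytes into the partition, with the 16-bit magic 
0xEF53 (stored little-endian as 53 ef) at offset 0x38 inside it, i.e. at 
byte 0x438 of the partition. A rough Python equivalent follows, as a sketch 
only: it assumes partitions start on 512-byte sector boundaries, and the 
function name and arguments are made up for illustration.

```python
SECTOR = 512
MAGIC = b"\x53\xef"        # 0xEF53 stored little-endian
MAGIC_IN_SECTOR = 0x38     # 0x438 % 512; backup superblocks land on the same offset


def find_magic(path, start=0, chunk_sectors=2048):
    """Yield absolute byte offsets where the ext magic sits at +0x38 in a sector."""
    start -= start % SECTOR                    # keep reads sector-aligned
    with open(path, "rb") as dev:              # read-only access
        dev.seek(start)
        pos = start
        while True:
            buf = dev.read(chunk_sectors * SECTOR)
            if not buf:
                break
            # Only look at the one position per sector where the magic can be.
            for off in range(MAGIC_IN_SECTOR, len(buf) - 1, SECTOR):
                if buf[off:off + 2] == MAGIC:
                    yield pos + off
            pos += len(buf)
```

Something like find_magic("/dev/sdX") walks the whole device; passing 
start= skips a region already known to be identical on another disk. Each 
hit minus 0x438 is a possible filesystem start if the hit is a primary 
superblock. Like the grep, this still produces false positives.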

Phil


* Re: RAID Recovery
  2017-03-07 15:00       ` Phil Turmel
@ 2017-03-08  9:08         ` Adam Goryachev
  2017-03-08 15:25           ` Adam Goryachev
  0 siblings, 1 reply; 12+ messages in thread
From: Adam Goryachev @ 2017-03-08  9:08 UTC (permalink / raw)
  To: Phil Turmel, linux-raid



On 8/3/17 02:00, Phil Turmel wrote:
> Hi Adam,
>
> {Please remember to trim repetitive stuff, and interleave.}
>
> On 03/07/2017 09:06 AM, Adam Goryachev wrote:
>> BTW, just some more info I've found... either almost the entire
>> drives are RAID1 mirrors, or all 4 are RAID1 mirrors:

OK, so I now have the following:
root@ubuntu:~# cmp /dev/sda /dev/sdb
/dev/sda /dev/sdb differ: byte 1000162959365, line 243319233
root@ubuntu:~# cmp /dev/sdc /dev/sdd
/dev/sdc /dev/sdd differ: byte 1000162971653, line 243236929
root@ubuntu:~# cmp /dev/sdb /dev/sdc
/dev/sdb /dev/sdc differ: byte 499637813249, line 54927275

So drives sda/sdb are an almost complete mirror of each other, and so are 
drives sdc/sdd.
On top of that, the first half of all four drives is identical (which 
seems oversized considering a "small" root RAID1 array....)
Why there is a difference half way, and whether there are more differences 
after that point, I haven't yet checked, but that looks like something to 
come back to...
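
Since cmp stops at the first mismatch, one way to see whether the disks 
diverge again later is to compare them chunk by chunk and record every 
differing range. A minimal Python sketch, assuming both devices (or images 
of them) can be read end to end; the function name and the 1 MiB chunk size 
are arbitrary choices for illustration.

```python
def differing_extents(path_a, path_b, chunk=1 << 20):
    """Return [start, end) byte ranges, at chunk granularity, where two files differ."""
    extents = []
    pos = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            buf_a, buf_b = a.read(chunk), b.read(chunk)
            if not buf_a and not buf_b:
                break                              # both inputs exhausted
            step = max(len(buf_a), len(buf_b))
            if buf_a != buf_b:
                if extents and extents[-1][1] == pos:
                    extents[-1][1] = pos + step    # extend the previous range
                else:
                    extents.append([pos, pos + step])
            pos += step
    return extents
```

Run as differing_extents("/dev/sdb", "/dev/sdc"), this would show whether 
the mismatch near byte 499637813249 continues to the end of the disks or is 
just one of several ranges.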
>> Other option, they have been re-initialised/zero'd or similar, and
>> thats why all the data is identical (useless). I was hoping to get a
>> starting point for where the partition boundaries might have been
>> ....
> Search the devices for ext2/3/4 superblocks, like so:
>
> dd if=/dev/sdX bs=1M 2>/dev/null |hexdump -C |grep '30  .\+  53 ef 0'
What is the chance it is an ext2/3/4 based FS? I suppose most NAS would 
use these filesystems... I guess I'll find out soon enough.

> This will take a very long time, and will generate false positives.
Can you advise what to do to "verify" these and work out which ones are 
false positives?
> You probably would want to use screen or tmux to run these in
> parallel in separate processes.
I'm not sure there is much of a point, since they are mostly duplicates 
of each other. I'm running it on sdd now, and copying sda and sdc to a 
spare drive. I may re-run the command on sdc (and skip the first 
490GB...) if nothing useful is found on sdd.
> But superblock locations will give you hints as to the rest of data,
> and make it possible to create partitions that will let you copy
> stuff off into a new array.
>
Sounds good, will see how it goes. Thanks for the advice!

Regards,
Adam


* Re: RAID Recovery
  2017-03-08  9:08         ` Adam Goryachev
@ 2017-03-08 15:25           ` Adam Goryachev
  0 siblings, 0 replies; 12+ messages in thread
From: Adam Goryachev @ 2017-03-08 15:25 UTC (permalink / raw)
  To: Phil Turmel, linux-raid



On 8/3/17 20:08, Adam Goryachev wrote:
>
>
> On 8/3/17 02:00, Phil Turmel wrote:
>> Hi Adam,
>>
>> Search the devices for ext2/3/4 superblocks, like so:
>>
>> dd if=/dev/sdX bs=1M 2>/dev/null |hexdump -C |grep '30  .\+  53 ef 0'
> What is the chance it is a ext2/3/4 based FS? I suppose most NAS would 
> use these filesystems... I guess I'll find out soon enough.
>
>> This will take a very long time, and will generate false positives.
> Can you advise what to do to "verify" these and work out which ones 
> are false positives?
>> You probably would want to use screen or tmux to run these in
>> parallel in separate processes.
> I'm not sure there is much of a point, since they are mostly 
> duplicates of each other. I'm running it on sdd now, and copying sda 
> and sdc to a spare drive. I may re-run the command on sdc (and skip 
> the first 490GB...) if nothing useful is found on sdd.
>> But superblock locations will give you hints as to the rest of data,
>> and make it possible to create partitions that will let you copy
>> stuff off into a new array.
>>
>

OK, the scan of the first disk has completed and found 85 matches. I
don't think there is any point in posting the raw output, but can you
advise what I should do with it? Here are the first few matches:
082f6c30  3a 5d f7 a2 d0 52 ab ba  53 ef 0d 6d 51 d0 2a 76  |:]...R..S..mQ.*v|
21146d30  cf 17 bf 15 bf 9e e2 67  53 ef 05 15 a7 89 ae 38  |.......gS......8|
217e2730  73 0f 3c 00 99 68 a4 ed  53 ef 0f b4 ed 8d a6 7b  |s.<..h..S......{|
4c64f430  a4 48 02 00 00 00 86 00  53 ef 00 00 03 7a 1f 09  |.H......S....z..|
4c688430  a4 48 02 00 00 00 86 00  53 ef 00 00 03 7a 1f 09  |.H......S....z..|
e8724430  00 2b 02 17 71 e3 27 3b  53 ef 0c f3 90 3b 0c 0e  |.+..q.';S....;..|
f1fd0d30  53 ef 04 cd 4d ef 04 cd  53 ef 04 cd 4d ef 04 cd  |S...M...S...M...|
fb9a2930  ce f0 97 69 a0 f2 1b 07  53 ef 0b 95 74 ff 98 c9  |...i....S...t...|
10a0c9d30  41 2c 9b 24 7f bf ec 76  53 ef 0f 07 3e 0a 3d 15  |A,.$...vS...>.=.|
15bd95430  e1 72 92 fb 73 64 a4 1a  53 ef 0b 6a 66 e1 ef ae  |.r..sd..S..jf...|
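
One cheap first pass at weeding out false positives, offered as a sketch
rather than a definitive method: in ext2/3/4 the magic 0xEF53 is stored
0x38 bytes into the superblock, and the grep pattern places "53 ef" at
byte 8 of a hexdump line whose offset ends in 30. A match at line offset
X therefore implies a superblock starting at X - 0x30. The primary
superblock sits 1024 bytes into the filesystem and backups sit on block
boundaries, so a genuine superblock should start on a sector boundary.
The 512-byte sector alignment of the filesystem start is an assumption,
and the function names below are made up for illustration.

```python
# Sketch: filter "53 ef" hexdump matches by the alignment a real
# ext2/3/4 superblock would have. Assumes the filesystem starts on a
# 512-byte sector boundary (an assumption, not something the thread states).

SB_MAGIC_OFFSET = 0x38    # s_magic is 0x38 bytes into the superblock
LINE_MAGIC_COLUMN = 0x08  # the grep pattern puts "53 ef" at byte 8 of the line
SECTOR = 512              # assumed alignment of the filesystem start
PRIMARY_SB_OFFSET = 1024  # primary superblock is 1024 bytes into the filesystem


def superblock_start(line_offset: int) -> int:
    """Disk offset where the superblock would begin for this hexdump line."""
    return line_offset + LINE_MAGIC_COLUMN - SB_MAGIC_OFFSET


def is_aligned(line_offset: int) -> bool:
    """True if the implied superblock start falls on a sector boundary."""
    return superblock_start(line_offset) % SECTOR == 0


def implied_fs_start(line_offset: int) -> int:
    """Filesystem start, IF this match is the primary superblock."""
    return superblock_start(line_offset) - PRIMARY_SB_OFFSET


def filter_candidates(hexdump_lines):
    """Return the line offsets that survive the alignment check."""
    kept = []
    for line in hexdump_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            offset = int(fields[0], 16)
        except ValueError:
            continue  # e.g. a wrapped ASCII column, not an offset
        if is_aligned(offset):
            kept.append(offset)
    return kept


# The ten line offsets quoted in the message above.
MATCHES = ["082f6c30", "21146d30", "217e2730", "4c64f430", "4c688430",
           "e8724430", "f1fd0d30", "fb9a2930", "10a0c9d30", "15bd95430"]

aligned = filter_candidates(MATCHES)
for off in aligned:
    print(f"{off:#x}: superblock at {superblock_start(off):#x}, "
          f"fs start if primary {implied_fs_start(off):#x}")
```

Applied to the ten offsets above, this keeps five (082f6c30, 4c64f430,
4c688430, e8724430 and 15bd95430). Passing the check does not prove a
match is real; the survivors would still need their other superblock
fields inspected.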


Regards,
Adam

-- 
Adam Goryachev
Website Managers
P: +61 2 8304 0000                    adam@websitemanagers.com.au
F: +61 2 8304 0001                     www.websitemanagers.com.au


* Re: Raid Recovery
  2015-06-29 21:05 Raid Recovery Carter J. Castor
@ 2015-07-01  1:01 ` NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2015-07-01  1:01 UTC (permalink / raw)
  To: Carter J. Castor; +Cc: linux-raid

On Mon, 29 Jun 2015 17:05:17 -0400 "Carter J. Castor"
<cjcastor@gmail.com> wrote:

> I have an Ubuntu RAID 10 server using mdadm. It has six drives and a
> hot spare. The eSATA cables to the box holding the hard drives got
> disconnected from the server. When I plugged them back in, it immediately
> saw only three of the drives and my hot spare and started rebuilding
> the array. Not wanting to screw it up, I let it finish, and then I
> rebooted. On startup, the array was inactive. When I tried to
> assemble it, it told me that there was only one disk(!?), so I tried to
> assemble it with the --force option. It then assembled, but the
> logical volume that I use to mount the RAID partition was gone. I have
> backups, but I'd like to save the data if at all possible. What's my
> best course of action from here?
> 
> Here's the /proc/mdstat after rebooting:
> md127 : active (auto-read-only) raid10 sdg1[5] sdd1[1] sdh1[6] sde1[7] sdf1[4] sdc1[2]
>       2929890816 blocks super 1.2 256K chunks 2 far-copies [6/6] [UUUUUU]
> 
> 
> Blkid output:
> 
> /dev/sda1: LABEL="KINGSTON" UUID="C3D4-FB34" TYPE="vfat"
> /dev/sdb1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
> UUID_SUB="f858b034-29ab-4d59-02d3-775b9b2139cd" LABEL="Pangolin:0"
> TYPE="linux_raid_member"
> /dev/sdc1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
> UUID_SUB="90675a7f-bf95-1514-4e54-e38ef61d5943" LABEL="Pangolin:0"
> TYPE="linux_raid_member"
> /dev/sdd1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
> UUID_SUB="eaa50f1c-e1cd-d1a2-b10b-da77f2c0d5d7" LABEL="Pangolin:0"
> TYPE="linux_raid_member"
> /dev/sde1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
> UUID_SUB="0b8210f9-cd73-a70d-bba7-01231bf9c578" LABEL="Pangolin:0"
> TYPE="linux_raid_member"
> /dev/sdf1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
> UUID_SUB="fe58a109-7ec1-37d6-ee10-76197ffb4c67" LABEL="Pangolin:0"
> TYPE="linux_raid_member"
> /dev/sdg1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
> UUID_SUB="726e7104-835c-adcc-32c4-952699b3b030" LABEL="Pangolin:0"
> TYPE="linux_raid_member"
> /dev/sdh1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
> UUID_SUB="91095954-1368-ba5e-592a-b3d245904cb8" LABEL="Pangolin:0"
> TYPE="linux_raid_member"
> /dev/sdi1: UUID="d24a5068-14fb-4cec-831a-4b6d3bdba44d" TYPE="ext4"
> /dev/sdi5: UUID="ab5617d1-58e1-4013-9690-64679982b2cf" TYPE="swap"
> /dev/md127: UUID="S2atdF-aFUn-hmKm-yzsr-iqYW-DX3W-ioWjnl" TYPE="LVM2_member"
> 

"mdadm --examine" output of each drive is usually a good idea.
Also kernel logs from the time of the failure can help.

However blkid is reporting that "md127" is an "LVM2_member", which is
encouraging.  Why do you think that the logical volume is gone?
What does
  pvdisplay /dev/md127
report?
What about
  pvck /dev/md127
??


NeilBrown


* Raid Recovery
@ 2015-06-29 21:05 Carter J. Castor
  2015-07-01  1:01 ` NeilBrown
  0 siblings, 1 reply; 12+ messages in thread
From: Carter J. Castor @ 2015-06-29 21:05 UTC (permalink / raw)
  To: linux-raid

I have an Ubuntu RAID 10 server using mdadm. It has six drives and a
hot spare. The eSATA cables to the box holding the hard drives got
disconnected from the server. When I plugged them back in, it immediately
saw only three of the drives and my hot spare and started rebuilding
the array. Not wanting to screw it up, I let it finish, and then I
rebooted. On startup, the array was inactive. When I tried to
assemble it, it told me that there was only one disk(!?), so I tried to
assemble it with the --force option. It then assembled, but the
logical volume that I use to mount the RAID partition was gone. I have
backups, but I'd like to save the data if at all possible. What's my
best course of action from here?

Here's the /proc/mdstat after rebooting:
md127 : active (auto-read-only) raid10 sdg1[5] sdd1[1] sdh1[6] sde1[7] sdf1[4] sdc1[2]
      2929890816 blocks super 1.2 256K chunks 2 far-copies [6/6] [UUUUUU]


Blkid output:

/dev/sda1: LABEL="KINGSTON" UUID="C3D4-FB34" TYPE="vfat"
/dev/sdb1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
UUID_SUB="f858b034-29ab-4d59-02d3-775b9b2139cd" LABEL="Pangolin:0"
TYPE="linux_raid_member"
/dev/sdc1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
UUID_SUB="90675a7f-bf95-1514-4e54-e38ef61d5943" LABEL="Pangolin:0"
TYPE="linux_raid_member"
/dev/sdd1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
UUID_SUB="eaa50f1c-e1cd-d1a2-b10b-da77f2c0d5d7" LABEL="Pangolin:0"
TYPE="linux_raid_member"
/dev/sde1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
UUID_SUB="0b8210f9-cd73-a70d-bba7-01231bf9c578" LABEL="Pangolin:0"
TYPE="linux_raid_member"
/dev/sdf1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
UUID_SUB="fe58a109-7ec1-37d6-ee10-76197ffb4c67" LABEL="Pangolin:0"
TYPE="linux_raid_member"
/dev/sdg1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
UUID_SUB="726e7104-835c-adcc-32c4-952699b3b030" LABEL="Pangolin:0"
TYPE="linux_raid_member"
/dev/sdh1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
UUID_SUB="91095954-1368-ba5e-592a-b3d245904cb8" LABEL="Pangolin:0"
TYPE="linux_raid_member"
/dev/sdi1: UUID="d24a5068-14fb-4cec-831a-4b6d3bdba44d" TYPE="ext4"
/dev/sdi5: UUID="ab5617d1-58e1-4013-9690-64679982b2cf" TYPE="swap"
/dev/md127: UUID="S2atdF-aFUn-hmKm-yzsr-iqYW-DX3W-ioWjnl" TYPE="LVM2_member"
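
The two listings above can be cross-checked mechanically. The sketch
below compares the partitions blkid tags as linux_raid_member with the
ones /proc/mdstat shows inside md127. It assumes the line formats shown
in this message, the blkid sample is abridged (UUID_SUB and LABEL
dropped), and the function names are made up for illustration.

```python
import re

# Sketch: find partitions that carry an md superblock (per blkid) but
# are not part of the assembled array (per /proc/mdstat).

MDSTAT = ("md127 : active (auto-read-only) raid10 sdg1[5] sdd1[1] sdh1[6] "
          "sde1[7] sdf1[4] sdc1[2]\n"
          "      2929890816 blocks super 1.2 256K chunks 2 far-copies "
          "[6/6] [UUUUUU]\n")

# Abridged from the blkid output above; records wrap across lines as posted.
BLKID = """\
/dev/sda1: LABEL="KINGSTON" UUID="C3D4-FB34" TYPE="vfat"
/dev/sdb1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
TYPE="linux_raid_member"
/dev/sdc1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
TYPE="linux_raid_member"
/dev/sdd1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
TYPE="linux_raid_member"
/dev/sde1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
TYPE="linux_raid_member"
/dev/sdf1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
TYPE="linux_raid_member"
/dev/sdg1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
TYPE="linux_raid_member"
/dev/sdh1: UUID="dedc8d6c-8155-f88d-b035-c8cf66d4e561"
TYPE="linux_raid_member"
/dev/sdi1: UUID="d24a5068-14fb-4cec-831a-4b6d3bdba44d" TYPE="ext4"
/dev/md127: UUID="S2atdF-aFUn-hmKm-yzsr-iqYW-DX3W-ioWjnl" TYPE="LVM2_member"
"""


def mdstat_members(mdstat_text):
    """Device names followed by a [slot] number, e.g. 'sdg1[5]' -> 'sdg1'."""
    return set(re.findall(r"(\w+)\[\d+\]", mdstat_text))


def blkid_raid_members(blkid_text):
    """Devices whose (possibly wrapped) blkid record is a linux_raid_member."""
    members = set()
    current = None
    for line in blkid_text.splitlines():
        start = re.match(r"/dev/(\w+):", line)
        if start:
            current = start.group(1)  # a new record begins on this line
        if current and 'TYPE="linux_raid_member"' in line:
            members.add(current)
    return members


not_in_array = sorted(blkid_raid_members(BLKID) - mdstat_members(MDSTAT))
print("RAID members not in md127:", not_in_array)
```

On the data in this message the only such partition is sdb1, which makes
it a natural first target for "mdadm --examine".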

Carter J. Castor

* Re: RAID recovery
  2003-07-11  6:54 RAID recovery Mikael Chambon
@ 2003-07-11 10:51 ` Avijit Pathania
  0 siblings, 0 replies; 12+ messages in thread
From: Avijit Pathania @ 2003-07-11 10:51 UTC (permalink / raw)
  To: Mikael Chambon; +Cc: linux-raid

After you have replaced the drive, the raid will start in degraded mode 
using only one half of the mirror. It will not replace the failed drive 
automatically, even though that drive exists.

To answer your question: make sure that you pass the correct path of the 
drive that you have just added as the argument to raidhotadd. Judging by 
the configuration in your previous email, the new drive will need to 
have the various slices created on it before it can be added to the 
existing raid setup.
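
The order of operations described above can be written down as a dry
run. This is only a sketch: the disk names, the md-to-partition mapping
and the function name are hypothetical, and the commands are printed
rather than executed. It assumes the surviving disk's partition table
is copied to the new disk with "sfdisk -d" before each slice is handed
to raidhotadd, which adds it to the degraded array so that the rebuild
writes onto the newly added slice.

```python
# Sketch: build (not run) the commands for replacing one half of a mirror.
# All device names here are hypothetical examples.

def replacement_plan(surviving_disk, new_disk, arrays):
    """Return shell commands as strings.

    arrays maps an md device to the partition number that backs it,
    e.g. {"/dev/md0": 1} means /dev/md0 is built from partition 1.
    """
    # 1. Recreate the slices on the new disk from the surviving one.
    plan = [f"sfdisk -d {surviving_disk} | sfdisk {new_disk}"]
    # 2. Add each new slice to its degraded array.
    for md_device, part in sorted(arrays.items()):
        plan.append(f"raidhotadd {md_device} {new_disk}{part}")
    # 3. Watch the reconstruction progress.
    plan.append("cat /proc/mdstat")
    return plan


for cmd in replacement_plan("/dev/hda", "/dev/hdc",
                            {"/dev/md0": 1, "/dev/md1": 2}):
    print(cmd)
```

Printing the plan first makes it easy to confirm that the new disk, not
the surviving one, is the target of every write.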


Mikael Chambon wrote:

>Hi again,
>
>Here is my last question about RAID for now.
>
>I saw a lot of docs about RAID installation, but I was not able to find
>one that clearly explains what to do after a HD crash. Does anyone have a
>good link for me?
>
>For example, let's say I have two new IDE HDs in a RAID1 array, raid-disk1
>and raid-disk2. If raid-disk1 crashes, I have to (from linux-raid-How-to):
>
>Power down the system
>replace the failed disk
>power up the system
>use raidhotadd to insert the new disk in the array as raid-disk1
>
>
>But what should I do to make sure that raid-disk1 will be reconstructed
>from raid-disk2? If raid-disk2 is reconstructed from raid-disk1, I will
>lose all my data, right?
>
>Thanks for all guys,
>
>--
>Mikael Chambon
>
>



* RAID recovery
@ 2003-07-11  6:54 Mikael Chambon
  2003-07-11 10:51 ` Avijit Pathania
  0 siblings, 1 reply; 12+ messages in thread
From: Mikael Chambon @ 2003-07-11  6:54 UTC (permalink / raw)
  To: linux-raid

Hi again,

Here is my last question about RAID for now.

I saw a lot of docs about RAID installation, but I was not able to find
one that clearly explains what to do after a HD crash. Does anyone have a
good link for me?

For example, let's say I have two new IDE HDs in a RAID1 array, raid-disk1
and raid-disk2. If raid-disk1 crashes, I have to (from linux-raid-How-to):

Power down the system
replace the failed disk
power up the system
use raidhotadd to insert the new disk in the array as raid-disk1


But what should I do to make sure that raid-disk1 will be reconstructed
from raid-disk2? If raid-disk2 is reconstructed from raid-disk1, I will
lose all my data, right?

Thanks for all guys,

--
Mikael Chambon



end of thread, other threads:[~2017-03-08 15:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-06 15:07 RAID Recovery Adam Goryachev
2017-03-06 15:36 ` Reindl Harald
2017-03-06 20:10 ` Phil Turmel
2017-03-07  9:17   ` Adam Goryachev
2017-03-07 14:06     ` Adam Goryachev
2017-03-07 15:00       ` Phil Turmel
2017-03-08  9:08         ` Adam Goryachev
2017-03-08 15:25           ` Adam Goryachev
  -- strict thread matches above, loose matches on Subject: below --
2015-06-29 21:05 Raid Recovery Carter J. Castor
2015-07-01  1:01 ` NeilBrown
2003-07-11  6:54 RAID recovery Mikael Chambon
2003-07-11 10:51 ` Avijit Pathania
