* RAID 6, 6 device array - all devices lost superblock
From: Peter Sanders @ 2022-08-28  2:00 UTC
  To: linux-raid

I have a six-device RAID 6 array that has been running for years without much trouble.

I had hardware issues with my system and ended up replacing the
motherboard, video card, and power supply, and reinstalling the OS
(Debian 11).

While the hardware problems were developing, the system would crash;
I'd reboot, unmount the array, run fsck, remount, and carry on with no
problems.

Since the hardware was replaced, the array will not assemble: mdadm
--assemble reports no RAID superblock on any of the devices.
root@superior:/etc/mdadm# mdadm --assemble --scan --verbose
mdadm: looking for devices for /dev/md/0
mdadm: cannot open device /dev/sr0: No medium found
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb

mdadm --examine reports:
/dev/sda:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
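
The `type ee` entry is the partition type used by a GPT protective MBR, so this output suggests mdadm is seeing a GPT-labelled disk at the whole-disk level rather than an md superblock. Below is a minimal Python sketch of how those fields are laid out in the first sector. The function names and the synthetic sector are illustrative assumptions, not mdadm's code, and nothing here touches a real device.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # four 16-byte entries start here
GPT_PROTECTIVE_TYPE = 0xEE


def parse_mbr(sector: bytes):
    """Return (has_boot_signature, partitions) for a 512-byte first sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")
    # The signature is stored as bytes 0x55 0xAA; read as a little-endian
    # 16-bit value that is 0xaa55, which is what "MBR Magic : aa55" shows.
    has_signature = sector[510:512] == b"\x55\xaa"
    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * index
        entry = sector[start:start + 16]
        ptype = entry[4]
        first_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({"index": index, "type": ptype,
                               "start": first_lba, "sectors": num_sectors})
    return has_signature, partitions


def looks_like_protective_mbr(sector: bytes) -> bool:
    """True if the sector has the boot signature and a type-0xEE entry."""
    has_signature, partitions = parse_mbr(sector)
    return has_signature and any(p["type"] == GPT_PROTECTIVE_TYPE
                                 for p in partitions)


def build_example_sector() -> bytes:
    """Synthetic sector mirroring the values quoted above (hypothetical data)."""
    sector = bytearray(SECTOR_SIZE)
    sector[PARTITION_TABLE_OFFSET + 4] = GPT_PROTECTIVE_TYPE
    struct.pack_into("<II", sector, PARTITION_TABLE_OFFSET + 8, 1, 0xFFFFFFFF)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    signature, parts = parse_mbr(build_example_sector())
    for p in parts:
        print(f"Partition[{p['index']}] : {p['sectors']} sectors at "
              f"{p['start']} (type {p['type']:02x})")
```

To inspect a real disk the same way, the first 512 bytes can be read read-only (for example `open("/dev/sda", "rb").read(512)` as root) and passed to `parse_mbr`. The sketch says nothing about whether the array members were originally whole disks or partitions.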

Searching on these results suggests that I can rebuild the superblock,
but details on how to do that are lacking, at least on the pages I found.

Currently I have no /dev/md* devices.
I still have the old mdadm.conf file and have tried assembling with
it, with the default mdadm.conf, and with no mdadm.conf at all in /etc
or /etc/mdadm.

Suggestions for how to get the array back would be most appreciated.

Thanks
- Peter

Here is the information the wiki page suggests collecting:

root@superior:/etc/mdadm# mdadm --assemble --scan --verbose
mdadm: looking for devices for /dev/md/0
mdadm: cannot open device /dev/sr0: No medium found
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdb
mdadm: No super block found on /dev/sdd (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdd
mdadm: No super block found on /dev/sde (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sde
mdadm: No super block found on /dev/sdf (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdf
mdadm: No super block found on /dev/sdc (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sdc
mdadm: No super block found on /dev/nvme0n1p9 (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/nvme0n1p9
mdadm: No super block found on /dev/nvme0n1p8 (Expected magic a92b4efc, got 0000040c)
mdadm: no RAID superblock on /dev/nvme0n1p8
mdadm: No super block found on /dev/nvme0n1p7 (Expected magic a92b4efc, got 00002004)
mdadm: no RAID superblock on /dev/nvme0n1p7
mdadm: No super block found on /dev/nvme0n1p6 (Expected magic a92b4efc, got 0000040d)
mdadm: no RAID superblock on /dev/nvme0n1p6
mdadm: No super block found on /dev/nvme0n1p5 (Expected magic a92b4efc, got 00000409)
mdadm: no RAID superblock on /dev/nvme0n1p5
mdadm: /dev/nvme0n1p2 is too small for md: size is 2 sectors.
mdadm: no RAID superblock on /dev/nvme0n1p2
mdadm: No super block found on /dev/nvme0n1p1 (Expected magic a92b4efc, got 00040001)
mdadm: no RAID superblock on /dev/nvme0n1p1
mdadm: No super block found on /dev/nvme0n1 (Expected magic a92b4efc, got 7a78e8ed)
mdadm: no RAID superblock on /dev/nvme0n1
root@superior:/etc/mdadm#
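
In the output above, sda through sdf and nvme0n1p9 all return 00000000 where mdadm expected the magic a92b4efc, while the other NVMe partitions return assorted non-zero values. A small Python sketch for tabulating that from pasted output follows. The function names and the shortened sample are assumptions for illustration, and the script only parses text; it does not read any device.

```python
import re

# Matches mdadm's "No super block found" message, including the case where
# mail wrapping has split the message across two lines.
NO_SUPERBLOCK = re.compile(
    r"No super block found on (\S+) \(Expected magic\s+([0-9a-f]{8}),"
    r"\s*got\s+([0-9a-f]{8})\)"
)


def magic_by_device(output: str) -> dict:
    """Map each device name to the 'got' value mdadm reported for it."""
    # Collapse line breaks so a wrapped message becomes a single line.
    joined = re.sub(r"\s*\n\s*", " ", output)
    return {m.group(1): m.group(3) for m in NO_SUPERBLOCK.finditer(joined)}


def zeroed_devices(output: str) -> list:
    """Devices where mdadm read all zeros at the superblock location."""
    return [dev for dev, got in magic_by_device(output).items()
            if got == "00000000"]


# Shortened sample taken from the output quoted in this message.
SAMPLE = """\
mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
mdadm: no RAID superblock on /dev/sda
mdadm: No super block found on /dev/nvme0n1p8 (Expected magic
a92b4efc, got 0000040c)
mdadm: no RAID superblock on /dev/nvme0n1p8
"""

if __name__ == "__main__":
    for device, got in magic_by_device(SAMPLE).items():
        print(f"{device}: got {got}")
    print("all zeros:", zeroed_devices(SAMPLE))
```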


uname -a
Linux superior 5.10.0-17-amd64 #1 SMP Debian 5.10.136-1 (2022-08-13) x86_64 GNU/Linux

mdadm --version
mdadm - v4.1 - 2018-10-01

smartctl devices ------------
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba P300
Device Model:     TOSHIBA HDWD130
Serial Number:    477ALBNAS
LU WWN Device Id: 5 000039 fe6d2e832
Firmware Version: MX6OACF0
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Aug 25 21:19:49 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (21791) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 364) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   141   141   054    -    66
  3 Spin_Up_Time            POS---   160   160   024    -    361 (Average 357)
  4 Start_Stop_Count        -O--C-   100   100   000    -    204
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   124   124   020    -    33
  9 Power_On_Hours          -O--C-   095   095   000    -    41740
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    204
192 Power-Off_Retract_Count -O--CK   100   100   000    -    759
193 Load_Cycle_Count        -O--C-   100   100   000    -    759
194 Temperature_Celsius     -O----   181   181   000    -    33 (Min/Max 20/50)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
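
When comparing several drives it can help to pull a few columns out of this attribute table. Below is a minimal Python sketch that parses rows in the layout shown above. The function name and the choice of sample rows are assumptions for illustration, and it relies only on the whitespace-separated column order in the table header.

```python
def parse_smart_attributes(text: str) -> dict:
    """Parse 'smartctl' attribute rows into a dict keyed by attribute ID.

    Expects the column order shown in the table header:
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows begin with a numeric ID and have at least 8 columns;
        # the header and the flag legend lines do not.
        if len(fields) >= 8 and fields[0].isdigit():
            rows[int(fields[0])] = {
                "name": fields[1],
                "flags": fields[2],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                # RAW_VALUE may contain spaces, e.g. "361 (Average 357)".
                "raw": " ".join(fields[7:]),
            }
    return rows


# Sample rows copied from the first drive's table in this message.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  3 Spin_Up_Time            POS---   160   160   024    -    361 (Average 357)
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  9 Power_On_Hours          -O--C-   095   095   000    -    41740
197 Current_Pending_Sector  -O---K   100   100   000    -    0
"""

if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    for attr_id in (5, 9, 197):
        print(attr_id, attrs[attr_id]["name"], attrs[attr_id]["raw"])
```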

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL     R/O      7  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x20       GPL     R/O      1  Streaming performance log [OBS-8]
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
Device State:                        Active (0)
Current Temperature:                    33 Celsius
Power Cycle Min/Max Temperature:     29/33 Celsius
Lifetime    Min/Max Temperature:     20/50 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (51)

Index    Estimated Time   Temperature Celsius
  52    2022-08-25 19:12    33  **************
 ...    ..( 76 skipped).    ..  **************
   1    2022-08-25 20:29    33  **************
   2    2022-08-25 20:30     ?  -
   3    2022-08-25 20:31    33  **************
   4    2022-08-25 20:32    34  ***************
   5    2022-08-25 20:33    33  **************
   6    2022-08-25 20:34    34  ***************
 ...    ..(  2 skipped).    ..  ***************
   9    2022-08-25 20:37    34  ***************
  10    2022-08-25 20:38     ?  -
  11    2022-08-25 20:39    29  **********
  12    2022-08-25 20:40    30  ***********
 ...    ..(  2 skipped).    ..  ***********
  15    2022-08-25 20:43    30  ***********
  16    2022-08-25 20:44    31  ************
 ...    ..(  3 skipped).    ..  ************
  20    2022-08-25 20:48    31  ************
  21    2022-08-25 20:49    32  *************
 ...    ..(  9 skipped).    ..  *************
  31    2022-08-25 20:59    32  *************
  32    2022-08-25 21:00    33  **************
 ...    ..( 18 skipped).    ..  **************
  51    2022-08-25 21:19    33  **************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4             204  ---  Lifetime Power-On Resets
0x01  0x010  4           41740  ---  Power-on Hours
0x01  0x018  6     20304278904  ---  Logical Sectors Written
0x01  0x020  6        64656942  ---  Number of Write Commands
0x01  0x028  6    350269182084  ---  Logical Sectors Read
0x01  0x030  6       481405773  ---  Number of Read Commands
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4           41734  ---  Spindle Motor Power-on Hours
0x03  0x010  4           41734  ---  Head Flying Hours
0x03  0x018  4             759  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4              22  ---  Read Recovery Attempts
0x03  0x030  4               6  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               1  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              33  ---  Current Temperature
0x05  0x010  1              33  N--  Average Short Term Temperature
0x05  0x018  1              37  N--  Average Long Term Temperature
0x05  0x020  1              50  ---  Highest Temperature
0x05  0x028  1              20  ---  Lowest Temperature
0x05  0x030  1              46  N--  Highest Average Short Term Temperature
0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
0x05  0x040  1              43  N--  Highest Average Long Term Temperature
0x05  0x048  1              25  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4            1006  ---  Number of Hardware Resets
0x06  0x010  4             494  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0009  2           35  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD30EZRX-00DC0B0
Serial Number:    WD-WCC1T0668790
LU WWN Device Id: 5 0014ee 2084d406a
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Aug 25 21:19:51 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (40560) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 407) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x70b5)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   181   178   021    -    5916
  4 Start_Stop_Count        -O--CK   100   100   000    -    377
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   007   007   000    -    68295
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    296
192 Power-Off_Retract_Count -O--CK   200   200   000    -    242
193 Load_Cycle_Count        -O--CK   052   052   000    -    445057
194 Temperature_Celsius     -O---K   121   102   000    -    29
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         7         -
# 2  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Active (0)
Current Temperature:                    29 Celsius
Power Cycle Min/Max Temperature:     28/29 Celsius
Lifetime    Min/Max Temperature:      2/48 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (138)

Index    Estimated Time   Temperature Celsius
 139    2022-08-25 13:22    30  ***********
 140    2022-08-25 13:23    29  **********
 ...    ..(  5 skipped).    ..  **********
 146    2022-08-25 13:29    29  **********
 147    2022-08-25 13:30     ?  -
 148    2022-08-25 13:31    26  *******
 149    2022-08-25 13:32     ?  -
 150    2022-08-25 13:33    28  *********
 151    2022-08-25 13:34     ?  -
 152    2022-08-25 13:35    28  *********
 153    2022-08-25 13:36    28  *********
 154    2022-08-25 13:37    29  **********
 ...    ..( 55 skipped).    ..  **********
 210    2022-08-25 14:33    29  **********
 211    2022-08-25 14:34    30  ***********
 ...    ..( 11 skipped).    ..  ***********
 223    2022-08-25 14:46    30  ***********
 224    2022-08-25 14:47    29  **********
 ...    ..(103 skipped).    ..  **********
 328    2022-08-25 16:31    29  **********
 329    2022-08-25 16:32    30  ***********
 ...    ..( 18 skipped).    ..  ***********
 348    2022-08-25 16:51    30  ***********
 349    2022-08-25 16:52    29  **********
 ...    ..( 33 skipped).    ..  **********
 383    2022-08-25 17:26    29  **********
 384    2022-08-25 17:27    30  ***********
 ...    ..( 10 skipped).    ..  ***********
 395    2022-08-25 17:38    30  ***********
 396    2022-08-25 17:39    29  **********
 ...    ..(218 skipped).    ..  **********
 137    2022-08-25 21:18    29  **********
 138    2022-08-25 21:19     ?  -

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2          305  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4         2491  Vendor specific

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba P300
Device Model:     TOSHIBA HDWD130
Serial Number:    Y7211KPAS
LU WWN Device Id: 5 000039 fe6dca946
Firmware Version: MX6OACF0
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Aug 25 21:19:51 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (21791) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 364) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   139   139   054    -    71
  3 Spin_Up_Time            POS---   160   160   024    -    361 (Average 355)
  4 Start_Stop_Count        -O--C-   100   100   000    -    189
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    31
  9 Power_On_Hours          -O--C-   095   095   000    -    35428
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    189
192 Power-Off_Retract_Count -O--CK   100   100   000    -    599
193 Load_Cycle_Count        -O--C-   100   100   000    -    599
194 Temperature_Celsius     -O----   176   176   000    -    34 (Min/Max 19/50)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL     R/O      7  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x20       GPL     R/O      1  Streaming performance log [OBS-8]
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
Device State:                        Active (0)
Current Temperature:                    34 Celsius
Power Cycle Min/Max Temperature:     28/34 Celsius
Lifetime    Min/Max Temperature:     19/50 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (15)

Index    Estimated Time   Temperature Celsius
  16    2022-08-25 19:12    33  **************
 ...    ..( 66 skipped).    ..  **************
  83    2022-08-25 20:19    33  **************
  84    2022-08-25 20:20    34  ***************
 ...    ..(  8 skipped).    ..  ***************
  93    2022-08-25 20:29    34  ***************
  94    2022-08-25 20:30     ?  -
  95    2022-08-25 20:31    34  ***************
 ...    ..(  5 skipped).    ..  ***************
 101    2022-08-25 20:37    34  ***************
 102    2022-08-25 20:38     ?  -
 103    2022-08-25 20:39    29  **********
 104    2022-08-25 20:40    29  **********
 105    2022-08-25 20:41    30  ***********
 106    2022-08-25 20:42    30  ***********
 107    2022-08-25 20:43    31  ************
 ...    ..(  3 skipped).    ..  ************
 111    2022-08-25 20:47    31  ************
 112    2022-08-25 20:48    32  *************
 ...    ..(  4 skipped).    ..  *************
 117    2022-08-25 20:53    32  *************
 118    2022-08-25 20:54    33  **************
 ...    ..( 15 skipped).    ..  **************
   6    2022-08-25 21:10    33  **************
   7    2022-08-25 21:11    34  ***************
   8    2022-08-25 21:12    33  **************
   9    2022-08-25 21:13    34  ***************
 ...    ..(  5 skipped).    ..  ***************
  15    2022-08-25 21:19    34  ***************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4             189  ---  Lifetime Power-On Resets
0x01  0x010  4           35428  ---  Power-on Hours
0x01  0x018  6     12728825059  ---  Logical Sectors Written
0x01  0x020  6        36220308  ---  Number of Write Commands
0x01  0x028  6    289884223915  ---  Logical Sectors Read
0x01  0x030  6       321688917  ---  Number of Read Commands
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4           35423  ---  Spindle Motor Power-on Hours
0x03  0x010  4           35423  ---  Head Flying Hours
0x03  0x018  4             599  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               7  ---  Read Recovery Attempts
0x03  0x030  4               6  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               1  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              34  ---  Current Temperature
0x05  0x010  1              34  N--  Average Short Term Temperature
0x05  0x018  1              37  N--  Average Long Term Temperature
0x05  0x020  1              50  ---  Highest Temperature
0x05  0x028  1              19  ---  Lowest Temperature
0x05  0x030  1              46  N--  Highest Average Short Term Temperature
0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
0x05  0x040  1              43  N--  Highest Average Long Term Temperature
0x05  0x048  1              25  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4          147297  ---  Number of Hardware Resets
0x06  0x010  4            8793  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0009  2           29  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD30EZRX-00D8PB0
Serial Number:    WD-WCC4N0091255
LU WWN Device Id: 5 0014ee 2b3d4ffa1
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Aug 25 21:19:53 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (42480) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 426) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x7035)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    2
  3 Spin_Up_Time            POS--K   184   181   021    -    5783
  4 Start_Stop_Count        -O--CK   100   100   000    -    275
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   039   039   000    -    44593
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    273
192 Power-Off_Retract_Count -O--CK   200   200   000    -    225
193 Load_Cycle_Count        -O--CK   047   047   000    -    461100
194 Temperature_Celsius     -O---K   122   105   000    -    28
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Active (0)
Current Temperature:                    28 Celsius
Power Cycle Min/Max Temperature:     27/28 Celsius
Lifetime    Min/Max Temperature:      2/44 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (444)

Index    Estimated Time   Temperature Celsius
 445    2022-08-25 13:22    29  **********
 ...    ..( 33 skipped).    ..  **********
   1    2022-08-25 13:56    29  **********
   2    2022-08-25 13:57     ?  -
   3    2022-08-25 13:58    29  **********
 ...    ..(  6 skipped).    ..  **********
  10    2022-08-25 14:05    29  **********
  11    2022-08-25 14:06     ?  -
  12    2022-08-25 14:07    26  *******
  13    2022-08-25 14:08     ?  -
  14    2022-08-25 14:09    27  ********
  15    2022-08-25 14:10    27  ********
  16    2022-08-25 14:11    28  *********
 ...    ..( 37 skipped).    ..  *********
  54    2022-08-25 14:49    28  *********
  55    2022-08-25 14:50    29  **********
 ...    ..(388 skipped).    ..  **********
 444    2022-08-25 21:19    29  **********

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2          286  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x8000  4         2493  Vendor specific

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD30EZRX-00MMMB0
Serial Number:    WD-WCAWZ2669166
LU WWN Device Id: 5 0014ee 15a13d994
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Aug 25 21:19:53 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (50160) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 482) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x3035)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   153   138   021    -    9350
  4 Start_Stop_Count        -O--CK   100   100   000    -    297
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   040   040   000    -    44409
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    268
192 Power-Off_Retract_Count -O--CK   200   200   000    -    218
193 Load_Cycle_Count        -O--CK   001   001   000    -    1082082
194 Temperature_Celsius     -O---K   122   105   000    -    30
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
Device State:                        Active (0)
Current Temperature:                    30 Celsius
Power Cycle Min/Max Temperature:     27/30 Celsius
Lifetime    Min/Max Temperature:      0/47 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (88)

Index    Estimated Time   Temperature Celsius
  89    2022-08-25 13:22    30  ***********
 ...    ..( 33 skipped).    ..  ***********
 123    2022-08-25 13:56    30  ***********
 124    2022-08-25 13:57     ?  -
 125    2022-08-25 13:58    31  ************
 126    2022-08-25 13:59    30  ***********
 ...    ..(  5 skipped).    ..  ***********
 132    2022-08-25 14:05    30  ***********
 133    2022-08-25 14:06     ?  -
 134    2022-08-25 14:07    26  *******
 135    2022-08-25 14:08     ?  -
 136    2022-08-25 14:09    27  ********
 ...    ..(  3 skipped).    ..  ********
 140    2022-08-25 14:13    27  ********
 141    2022-08-25 14:14    28  *********
 ...    ..(  3 skipped).    ..  *********
 145    2022-08-25 14:18    28  *********
 146    2022-08-25 14:19    29  **********
 ...    ..( 13 skipped).    ..  **********
 160    2022-08-25 14:33    29  **********
 161    2022-08-25 14:34    30  ***********
 ...    ..( 43 skipped).    ..  ***********
 205    2022-08-25 15:18    30  ***********
 206    2022-08-25 15:19    31  ************
 207    2022-08-25 15:20    30  ***********
 ...    ..(168 skipped).    ..  ***********
 376    2022-08-25 18:09    30  ***********
 377    2022-08-25 18:10    31  ************
 378    2022-08-25 18:11    30  ***********
 ...    ..( 34 skipped).    ..  ***********
 413    2022-08-25 18:46    30  ***********
 414    2022-08-25 18:47    31  ************
 415    2022-08-25 18:48    30  ***********
 ...    ..(  7 skipped).    ..  ***********
 423    2022-08-25 18:56    30  ***********
 424    2022-08-25 18:57    31  ************
 425    2022-08-25 18:58    30  ***********
 ...    ..(  7 skipped).    ..  ***********
 433    2022-08-25 19:06    30  ***********
 434    2022-08-25 19:07    31  ************
 435    2022-08-25 19:08    30  ***********
 ...    ..( 47 skipped).    ..  ***********
   5    2022-08-25 19:56    30  ***********
   6    2022-08-25 19:57    31  ************
   7    2022-08-25 19:58    30  ***********
 ...    ..( 80 skipped).    ..  ***********
  88    2022-08-25 21:19    30  ***********

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x8000  4         2492  Vendor specific

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba P300
Device Model:     TOSHIBA HDWD130
Serial Number:    477ABEJAS
LU WWN Device Id: 5 000039 fe6d2ce25
Firmware Version: MX6OACF0
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Aug 25 21:19:53 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (23082) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 385) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   140   140   054    -    68
  3 Spin_Up_Time            POS---   161   161   024    -    358 (Average 354)
  4 Start_Stop_Count        -O--C-   100   100   000    -    243
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   126   126   020    -    32
  9 Power_On_Hours          -O--C-   094   094   000    -    44046
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    243
192 Power-Off_Retract_Count -O--CK   100   100   000    -    912
193 Load_Cycle_Count        -O--C-   100   100   000    -    912
194 Temperature_Celsius     -O----   193   193   000    -    31 (Min/Max 19/46)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL     R/O      7  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x20       GPL     R/O      1  Streaming performance log [OBS-8]
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
Device State:                        Active (0)
Current Temperature:                    31 Celsius
Power Cycle Min/Max Temperature:     28/32 Celsius
Lifetime    Min/Max Temperature:     19/46 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (117)

Index    Estimated Time   Temperature Celsius
 118    2022-08-25 19:12    31  ************
 ...    ..( 76 skipped).    ..  ************
  67    2022-08-25 20:29    31  ************
  68    2022-08-25 20:30     ?  -
  69    2022-08-25 20:31    31  ************
  70    2022-08-25 20:32    32  *************
  71    2022-08-25 20:33    31  ************
  72    2022-08-25 20:34    31  ************
  73    2022-08-25 20:35    32  *************
  74    2022-08-25 20:36    32  *************
  75    2022-08-25 20:37    32  *************
  76    2022-08-25 20:38     ?  -
  77    2022-08-25 20:39    28  *********
  78    2022-08-25 20:40    29  **********
 ...    ..(  2 skipped).    ..  **********
  81    2022-08-25 20:43    29  **********
  82    2022-08-25 20:44    30  ***********
 ...    ..(  5 skipped).    ..  ***********
  88    2022-08-25 20:50    30  ***********
  89    2022-08-25 20:51    31  ************
 ...    ..( 27 skipped).    ..  ************
 117    2022-08-25 21:19    31  ************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4             243  ---  Lifetime Power-On Resets
0x01  0x010  4           44046  ---  Power-on Hours
0x01  0x018  6     27756962802  ---  Logical Sectors Written
0x01  0x020  6        86355955  ---  Number of Write Commands
0x01  0x028  6    381193626849  ---  Logical Sectors Read
0x01  0x030  6       791200694  ---  Number of Read Commands
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4           44040  ---  Spindle Motor Power-on Hours
0x03  0x010  4           44040  ---  Head Flying Hours
0x03  0x018  4             912  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               6  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              32  ---  Current Temperature
0x05  0x010  1              31  N--  Average Short Term Temperature
0x05  0x018  1              35  N--  Average Long Term Temperature
0x05  0x020  1              46  ---  Highest Temperature
0x05  0x028  1              19  ---  Lowest Temperature
0x05  0x030  1              43  N--  Highest Average Short Term Temperature
0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
0x05  0x040  1              41  N--  Highest Average Long Term Temperature
0x05  0x048  1              25  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4            4706  ---  Number of Hardware Resets
0x06  0x010  4            3910  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0009  2           29  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS


mdadm --examine devices -----
/dev/sda:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdb:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdc:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sde:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdf:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

mdadm --detail /dev/md0 ------

lsdrv ------------------------
PCI [nvme] 01:00.0 Non-Volatile memory controller: Phison Electronics Corporation E12 NVMe Controller (rev 01)
└nvme nvme0 PCIe SSD                                 {21112925606047}
 └nvme0n1 238.47g [259:0] Partitioned (dos)
  ├nvme0n1p1 485.00m [259:1] ext4 {f38776ac-1ce9-4fc8-ba50-94844b9f504e}
  │└Mounted as /dev/nvme0n1p1 @ /boot
  ├nvme0n1p2 1.00k [259:2] Partitioned (dos)
  ├nvme0n1p5 60.54g [259:3] ext4 {5ee1c3c0-3a05-466c-9f98-f5807c8d813b}
  │└Mounted as /dev/nvme0n1p5 @ /
  ├nvme0n1p6 93.13g [259:4] ext4 {9064169f-4fe3-4836-a906-28c1b445cdff}
  │└Mounted as /dev/nvme0n1p6 @ /var
  ├nvme0n1p7 37.00m [259:5] ext4 {25e161ad-94a0-4298-afaf-18e2433766ee}
  ├nvme0n1p8 82.89g [259:6] ext4 {ac874071-d759-4d33-b32f-83272f3eacd9}
  │└Mounted as /dev/nvme0n1p8 @ /home
  └nvme0n1p9 1.41g [259:7] swap {02cef84b-9a9d-4a0a-973c-fda1a78c533c}
PCI [pata_jmicron] 26:00.1 IDE interface: JMicron Technology Corp. JMB368 IDE controller (rev 10)
└scsi 0:0:0:0 MAD DOG  LS-DVDRW TSH652M {MAD_DOG_LS-DVDRW_TSH652M}
 └sr0 1.00g [11:0] Empty/Unknown
PCI [ahci] 26:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 10)
└scsi 2:x:x:x [Empty]
PCI [ahci] 2b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
├scsi 6:0:0:0 ATA      TOSHIBA HDWD130  {477ALBNAS}
│└sda 2.73t [8:0] Partitioned (PMBR)
└scsi 7:0:0:0 ATA      TOSHIBA HDWD130  {Y7211KPAS}
 └sdc 2.73t [8:32] Partitioned (gpt)
PCI [ahci] 2c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC1T0668790}
│└sdb 2.73t [8:16] Partitioned (gpt)
├scsi 9:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N0091255}
│└sdd 2.73t [8:48] Partitioned (gpt)
├scsi 12:0:0:0 ATA      WDC WD30EZRX-00M {WD-WCAWZ2669166}
│└sde 2.73t [8:64] Partitioned (gpt)
└scsi 13:0:0:0 ATA      TOSHIBA HDWD130  {477ABEJAS}
 └sdf 2.73t [8:80] Partitioned (gpt)

cat /proc/mdstat -------------
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>

cat /etc/mdadm/mdadm.conf ----
# mdadm.conf
#
# !NB! Run update-initramfs -u after updating this file.
# !NB! This will ensure that initramfs has an uptodate copy.
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md/0  metadata=1.2 UUID=109fa7b0:cf08fdba:e36284a9:5786ffff name=superior:0

# This configuration was auto-generated on Sun, 26 Dec 2021 13:31:14 -0500 by mkconf

cat /proc/partitions ---------
major minor  #blocks  name

 259        0  250059096 nvme0n1
 259        1     496640 nvme0n1p1
 259        2          1 nvme0n1p2
 259        3   63475712 nvme0n1p5
 259        4   97654784 nvme0n1p6
 259        5      37888 nvme0n1p7
 259        6   86913024 nvme0n1p8
 259        7    1474560 nvme0n1p9
   8       32 2930266584 sdc
   8       80 2930266584 sdf
   8       64 2930266584 sde
   8       48 2930266584 sdd
   8       16 2930266584 sdb
   8        0 2930266584 sda
  11        0    1048575 sr0


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28  2:00 RAID 6, 6 device array - all devices lost superblock Peter Sanders
@ 2022-08-28  9:14 ` Wols Lists
  2022-08-28  9:54   ` Wols Lists
  2022-08-28 15:10 ` John Stoffel
  2022-08-28 17:11 ` Andy Smith
  2 siblings, 1 reply; 29+ messages in thread
From: Wols Lists @ 2022-08-28  9:14 UTC (permalink / raw)
  To: Peter Sanders, linux-raid; +Cc: Phil Turmel, NeilBrown

On 28/08/2022 03:00, Peter Sanders wrote:
> have a RAID 6 array, 6 devices.  Been running it for years without much issue.
> 
> Had hardware issues with my system - ended up replacing the
> motherboard, video card, and power supply and re-installing the OS
> (Debian 11).
> 
> As the hardware issues evolved, I'd crash, reboot, un-mount the array,
> run fsck, mount and continue on my way - no problems.
> 
> After the hardware was replaced, my array will not assemble - mdadm
> assemble reports no RAID superblock on the devices.
> root@superior:/etc/mdadm# mdadm --assemble --scan --verbose
> mdadm: looking for devices for /dev/md/0
> mdadm: cannot open device /dev/sr0: No medium found
> mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
> mdadm: no RAID superblock on /dev/sda
> mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
> mdadm: no RAID superblock on /dev/sdb
> 
> Examine reports
> /dev/sda:
>     MBR Magic : aa55
> Partition[0] :   4294967295 sectors at            1 (type ee)
> 
> Searching for these results indicate I can rebuild the superblock, but
> details on how to do that are lacking, at least on the pages I found.

Ouch. That's not nice, but we should be able to get things back, I hope.

I notice it's looking for your superblock on the drive itself. Were your
drives partitioned? Unfortunately, it's a well-known problem that drives
moved between hardware can have their MBR/GPT wiped :-( Hopefully that's
what has happened here, and examining the drives with gdisk/fdisk will
come up with something like "your GPT is damaged. Recover?". If so,
you're probably good. If not, do you have a record of your partitions?
Can you just recreate them? If you're not sure what you're doing here,
I'd wait for a bit more advice unless you can back the drives up first.
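
A minimal sketch of a read-only way to look for GPT headers, assuming
512-byte logical sectors (as smartctl reports for these drives) and
taking the drive size from /proc/partitions, which counts 1 KiB blocks.
The function names are made up for illustration. The primary GPT header
lives at LBA 1 and the backup copy at the last LBA of the drive; both
begin with the ASCII signature "EFI PART".

```shell
#!/bin/sh
# Hypothetical helpers; nothing here writes to a disk.

# /proc/partitions reports 1 KiB blocks. Convert to 512-byte sectors and
# print the last LBA, which is where the backup GPT header is stored.
backup_gpt_lba() {
    kib_blocks=$1
    echo $(( kib_blocks * 2 - 1 ))
}

# Print "yes" if the given LBA of a device or image file starts with the
# GPT header signature "EFI PART" (45 46 49 20 50 41 52 54), else "no".
has_gpt_sig() {
    dev=$1
    lba=$2
    sig=$(dd if="$dev" bs=512 skip="$lba" count=1 2>/dev/null \
          | od -A n -t x1 -N 8 | tr -d ' \n')
    if [ "$sig" = "4546492050415254" ]; then
        echo yes
    else
        echo no
    fi
}
```

For the drives in this thread, `backup_gpt_lba 2930266584` gives
5860533167, so `has_gpt_sig /dev/sda 1` and
`has_gpt_sig /dev/sda 5860533167` would test the primary and backup
headers without changing anything on the disk.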

Whatever happens, do you have a backup? Can you make one?

If your drives were NOT partitioned, then I'm afraid we're into 
forensics here. Read up on overlays, so you can make the drives 
read-only, re-create the superblock, and check if you got it right. I've 
not done this myself, so I would hesitate to advise you, but loads of 
people have said the instructions do work, and they've recovered their 
arrays.
> 
> Currently I have no /dev/md* devices.
> I have access to the old mdadm.conf file - have tried assembling with
> it, with the default mdadm.conf, and with no mdadm.conf file in /etc
> and /etc/mdadm.

It looks like the drives weren't partitioned :-( I think you're into 
forensics.
> 
> Suggestions for how to get the array back would be most appreciated.

Not what you're asking for, but another suggestion - DITCH THOSE DRIVES.
WD Greens are just plain unsuitable for raid, and if your P300s are new,
they are too :-( (Greens will be damaged, as their optimisation is
completely wrong for raid; the new P300s are SMR.) I notice that ERC is
disabled ...
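
On the ERC point, a rough sketch of the usual timeout-mismatch
workaround, assuming the output of `smartctl -x` has been saved to a
text file for each drive. The function name is hypothetical, and it only
prints the command it would suggest. `smartctl -l scterc,70,70` asks the
drive to give up on a bad sector after 7 seconds; where the drive does
not offer SCT ERC, the fallback is to raise the kernel's per-device
command timeout to 180 seconds.

```shell
#!/bin/sh
# Hypothetical helper: prints a suggested command, changes nothing.
# $1 = device name such as sda, $2 = file holding `smartctl -x` output.
erc_advice() {
    dev=$1
    report=$2
    if grep -q 'SCT Error Recovery Control supported' "$report"; then
        # The drive can limit its own error recovery (units of 0.1 s).
        echo "smartctl -l scterc,70,70 /dev/$dev"
    else
        # No ERC: let the kernel wait longer than the drive's retries.
        echo "echo 180 > /sys/block/$dev/device/timeout"
    fi
}
```

Going by the smartctl output posted, the HDWD130 lists SCT Error
Recovery Control as supported and would take the first branch, while the
WD30EZRX shown does not and would take the second. As far as I know
neither setting survives a reboot, so it is normally applied from a boot
script.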

I'd get 4x6TB N300s or Seagate Ironwolves (if cost is an issue, you can 
get away with two). If you do get four, swap out two greens, and rebuild 
onto the 6TBs. If you can only afford two, swap out two P300s, raid-0 
them and rebuild as raid-5 onto the 6TB/P300 drives, then you CAN go 
raid-6 raid-0ing a green and your last P300 together. Just get rid of 
the greens asap, and the P300s after.
> 
> Thanks
> - Peter
> 
Cheers,
Wol


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28  9:14 ` Wols Lists
@ 2022-08-28  9:54   ` Wols Lists
  2022-08-28 16:47     ` Phil Turmel
  0 siblings, 1 reply; 29+ messages in thread
From: Wols Lists @ 2022-08-28  9:54 UTC (permalink / raw)
  To: Peter Sanders, linux-raid; +Cc: Phil Turmel, NeilBrown

On 28/08/2022 10:14, Wols Lists wrote:
>> Currently I have no /dev/md* devices.
>> I have access to the old mdadm.conf file - have tried assembling with
>> it, with the default mdadm.conf, and with no mdadm.conf file in /etc
>> and /etc/mdadm.
> 
> It looks like the drives weren't partitioned :-( I think you're into 
> forensics.

Whoops - my system froze while I was originally writing my reply, and I 
forgot to put this into my rewrite ...

Look up overlays in the wiki. I've never done it myself, but a fair few 
people have said the instructions worked a treat.

You're basically making the drives read-only (all writes get dumped into
the overlay file), and then re-creating the array over the top, so you
can test whether you got it right. If you didn't, you just ditch the
overlays and start again; if you did get it right, you can recreate the
array for real.
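
Roughly what the overlay setup boils down to, as a dry-run sketch: the
function below only prints the commands for one drive rather than
running them. The function name and the overlay file location are made
up for illustration, the 50G sparse file size and the chunk size of 8
sectors are assumptions, and the wiki page remains the authority.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the commands, runs none of them.
# $1 = drive name such as sda, $2 = directory for the overlay files.
overlay_plan() {
    name=$1
    dir=$2
    # Sparse file to hold the writes, a loop device on top of it, then a
    # device-mapper snapshot with the real drive as the untouched origin.
    cat <<EOF
truncate -s 50G $dir/overlay-$name
loop=\$(losetup -f --show $dir/overlay-$name)
size=\$(blockdev --getsz /dev/$name)
echo "0 \$size snapshot /dev/$name \$loop P 8" | dmsetup create $name
EOF
}
```

Running the printed commands as root for each drive would give
/dev/mapper/sda and so on. Any trial re-create is then pointed at the
/dev/mapper/ devices, never at the raw drives, and removing the mapper
devices and overlay files gets you back to the starting state.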

Cheers,
Wol


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28  2:00 RAID 6, 6 device array - all devices lost superblock Peter Sanders
  2022-08-28  9:14 ` Wols Lists
@ 2022-08-28 15:10 ` John Stoffel
  2022-08-28 17:11 ` Andy Smith
  2 siblings, 0 replies; 29+ messages in thread
From: John Stoffel @ 2022-08-28 15:10 UTC (permalink / raw)
  To: Peter Sanders; +Cc: linux-raid

>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:

Peter> have a RAID 6 array, 6 devices.  Been running it for years without much issue.
Peter> Had hardware issues with my system - ended up replacing the
Peter> motherboard, video card, and power supply and re-installing the OS
Peter> (Debian 11).

Can you give us details on the old vs new motherboard/cpu?  It might
be that you need to tweak the BIOS of the motherboard to expose the
old SATA formats as well.  

Did you install Debian onto a fresh boot disk?  Is your BIOS set up to
only do the new form of booting from UEFI devices?  Maybe check in your
BIOS settings that the data drives are all in AHCI mode, or possibly
even in IDE mode.  It all depends on how old the original hardware
was.

I just recently upgraded from a 2010 MB/CPU combo and I had to tweak
the BIOS defaults to see my disks.  I guess I should do a clean
install from a blank disk, but I wanted to minimize downtime.  

Wols has some great advice here, and I heartily recommend that you use
overlays when doing your testing.  Check the RAID wiki for
suggestions.

And don't panic!  Your data is probably there, but just missing the
super blocks or partition tables. 
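
One read-only way to test that, sketched under the assumption that the
array used 1.2 metadata as the old mdadm.conf says.  A 1.2 superblock
sits 4 KiB from the start of the member device, and the magic a92b4efc
is stored little-endian, so on disk it reads fc 4e 2b a9.  The function
name is hypothetical.  It steps through the first blocks of a device or
image 4 KiB at a time, which would also catch a superblock inside a
partition that starts on a 4 KiB boundary.

```shell
#!/bin/sh
# Hypothetical helper; it only reads from the device or image.
# $1 = device or image file, $2 = number of 4 KiB blocks to scan
# (default 512, i.e. the first 2 MiB).
scan_md_magic() {
    dev=$1
    blocks=${2:-512}
    i=0
    while [ "$i" -lt "$blocks" ]; do
        # First four bytes of block i as a hex string, e.g. fc4e2ba9.
        sig=$(dd if="$dev" bs=4096 skip="$i" count=1 2>/dev/null \
              | od -A n -t x1 -N 4 | tr -d ' \n')
        if [ "$sig" = "fc4e2ba9" ]; then
            echo "md magic at byte offset $((i * 4096))"
        fi
        i=$((i + 1))
    done
}
```

Something like `scan_md_magic /dev/sda` run as root prints a line for
each hit.  No output would only mean there is no superblock in the part
that was scanned, not that the data is gone.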

John


Peter> As the hardware issues evolved, I'd crash, reboot, un-mount the array,
Peter> run fsck, mount and continue on my way - no problems.

Peter> After the hardware was replaced, my array will not assemble - mdadm
Peter> assemble reports no RAID superblock on the devices.
Peter> root@superior:/etc/mdadm# mdadm --assemble --scan --verbose
Peter> mdadm: looking for devices for /dev/md/0
Peter> mdadm: cannot open device /dev/sr0: No medium found
Peter> mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
Peter> mdadm: no RAID superblock on /dev/sda
Peter> mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
Peter> mdadm: no RAID superblock on /dev/sdb

Peter> Examine reports
Peter> /dev/sda:
Peter>    MBR Magic : aa55
Peter> Partition[0] :   4294967295 sectors at            1 (type ee)

Peter> Searching for these results indicate I can rebuild the superblock, but
Peter> details on how to do that are lacking, at least on the pages I found.

Peter> Currently I have no /dev/md* devices.
Peter> I have access to the old mdadm.conf file - have tried assembling with
Peter> it, with the default mdadm.conf, and with no mdadm.conf file in /etc
Peter> and /etc/mdadm.

Peter> Suggestions for how to get the array back would be most appreciated.

Peter> Thanks
Peter> - Peter

Peter> Here is the data suggested from the wiki page:

Peter> root@superior:/etc/mdadm# mdadm --assemble --scan --verbose
Peter> mdadm: looking for devices for /dev/md/0
Peter> mdadm: cannot open device /dev/sr0: No medium found
Peter> mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
Peter> mdadm: no RAID superblock on /dev/sda
Peter> mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
Peter> mdadm: no RAID superblock on /dev/sdb
Peter> mdadm: No super block found on /dev/sdd (Expected magic a92b4efc, got 00000000)
Peter> mdadm: no RAID superblock on /dev/sdd
Peter> mdadm: No super block found on /dev/sde (Expected magic a92b4efc, got 00000000)
Peter> mdadm: no RAID superblock on /dev/sde
Peter> mdadm: No super block found on /dev/sdf (Expected magic a92b4efc, got 00000000)
Peter> mdadm: no RAID superblock on /dev/sdf
Peter> mdadm: No super block found on /dev/sdc (Expected magic a92b4efc, got 00000000)
Peter> mdadm: no RAID superblock on /dev/sdc
Peter> mdadm: No super block found on /dev/nvme0n1p9 (Expected magic a92b4efc, got 00000000)
Peter> mdadm: no RAID superblock on /dev/nvme0n1p9
Peter> mdadm: No super block found on /dev/nvme0n1p8 (Expected magic a92b4efc, got 0000040c)
Peter> mdadm: no RAID superblock on /dev/nvme0n1p8
Peter> mdadm: No super block found on /dev/nvme0n1p7 (Expected magic a92b4efc, got 00002004)
Peter> mdadm: no RAID superblock on /dev/nvme0n1p7
Peter> mdadm: No super block found on /dev/nvme0n1p6 (Expected magic a92b4efc, got 0000040d)
Peter> mdadm: no RAID superblock on /dev/nvme0n1p6
Peter> mdadm: No super block found on /dev/nvme0n1p5 (Expected magic a92b4efc, got 00000409)
Peter> mdadm: no RAID superblock on /dev/nvme0n1p5
Peter> mdadm: /dev/nvme0n1p2 is too small for md: size is 2 sectors.
Peter> mdadm: no RAID superblock on /dev/nvme0n1p2
Peter> mdadm: No super block found on /dev/nvme0n1p1 (Expected magic a92b4efc, got 00040001)
Peter> mdadm: no RAID superblock on /dev/nvme0n1p1
Peter> mdadm: No super block found on /dev/nvme0n1 (Expected magic a92b4efc, got 7a78e8ed)
Peter> mdadm: no RAID superblock on /dev/nvme0n1
Peter> root@superior:/etc/mdadm#


Peter> uname -a
Peter> Linux superior 5.10.0-17-amd64 #1 SMP Debian 5.10.136-1 (2022-08-13) x86_64 GNU/Linux

Peter> mdadm --version
Peter> mdadm - v4.1 - 2018-10-01

Peter> smartctl devices ------------
Peter> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Peter> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Peter> === START OF INFORMATION SECTION ===
Peter> Model Family:     Toshiba P300
Peter> Device Model:     TOSHIBA HDWD130
Peter> Serial Number:    477ALBNAS
Peter> LU WWN Device Id: 5 000039 fe6d2e832
Peter> Firmware Version: MX6OACF0
Peter> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Peter> Sector Sizes:     512 bytes logical, 4096 bytes physical
Peter> Rotation Rate:    7200 rpm
Peter> Form Factor:      3.5 inches
Peter> Device is:        In smartctl database [for details use: -P show]
Peter> ATA Version is:   ATA8-ACS T13/1699-D revision 4
Peter> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Peter> Local Time is:    Thu Aug 25 21:19:49 2022 EDT
Peter> SMART support is: Available - device has SMART capability.
Peter> SMART support is: Enabled
Peter> AAM feature is:   Unavailable
Peter> APM feature is:   Disabled
Peter> Rd look-ahead is: Enabled
Peter> Write cache is:   Enabled
Peter> DSN feature is:   Unavailable
Peter> ATA Security is:  Disabled, frozen [SEC2]
Peter> Wt Cache Reorder: Enabled

Peter> === START OF READ SMART DATA SECTION ===
Peter> SMART overall-health self-assessment test result: PASSED

Peter> General SMART Values:
Peter> Offline data collection status:  (0x80)    Offline data collection activity
Peter>                     was never started.
Peter>                     Auto Offline Data Collection: Enabled.
Peter> Self-test execution status:      (   0)    The previous self-test routine completed
Peter>                     without error or no self-test has ever
Peter>                     been run.
Peter> Total time to complete Offline
Peter> data collection:         (21791) seconds.
Peter> Offline data collection
Peter> capabilities:              (0x5b) SMART execute Offline immediate.
Peter>                     Auto Offline data collection on/off support.
Peter>                     Suspend Offline collection upon new
Peter>                     command.
Peter>                     Offline surface scan supported.
Peter>                     Self-test supported.
Peter>                     No Conveyance Self-test supported.
Peter>                     Selective Self-test supported.
Peter> SMART capabilities:            (0x0003)    Saves SMART data before entering
Peter>                     power-saving mode.
Peter>                     Supports SMART auto save timer.
Peter> Error logging capability:        (0x01)    Error logging supported.
Peter>                     General Purpose Logging supported.
Peter> Short self-test routine
Peter> recommended polling time:      (   1) minutes.
Peter> Extended self-test routine
Peter> recommended polling time:      ( 364) minutes.
Peter> SCT capabilities:            (0x003d)    SCT Status supported.
Peter>                     SCT Error Recovery Control supported.
Peter>                     SCT Feature Control supported.
Peter>                     SCT Data Table supported.

Peter> SMART Attributes Data Structure revision number: 16
Peter> Vendor Specific SMART Attributes with Thresholds:
Peter> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
Peter>   1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
Peter>   2 Throughput_Performance  P-S---   141   141   054    -    66
Peter>   3 Spin_Up_Time            POS---   160   160   024    -    361 (Average 357)
Peter>   4 Start_Stop_Count        -O--C-   100   100   000    -    204
Peter>   5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
Peter>   7 Seek_Error_Rate         PO-R--   100   100   067    -    0
Peter>   8 Seek_Time_Performance   P-S---   124   124   020    -    33
Peter>   9 Power_On_Hours          -O--C-   095   095   000    -    41740
Peter>  10 Spin_Retry_Count        PO--C-   100   100   060    -    0
Peter>  12 Power_Cycle_Count       -O--CK   100   100   000    -    204
Peter> 192 Power-Off_Retract_Count -O--CK   100   100   000    -    759
Peter> 193 Load_Cycle_Count        -O--C-   100   100   000    -    759
Peter> 194 Temperature_Celsius     -O----   181   181   000    -    33 (Min/Max 20/50)
Peter> 196 Reallocated_Event_Count -O--CK   100   100   000    -    0
Peter> 197 Current_Pending_Sector  -O---K   100   100   000    -    0
Peter> 198 Offline_Uncorrectable   ---R--   100   100   000    -    0
Peter> 199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
Peter>                             ||||||_ K auto-keep
Peter>                             |||||__ C event count
Peter>                             ||||___ R error rate
Peter>                             |||____ S speed/performance
Peter>                             ||_____ O updated online
Peter>                             |______ P prefailure warning

Peter> General Purpose Log Directory Version 1
Peter> SMART           Log Directory Version 1 [multi-sector log support]
Peter> Address    Access  R/W   Size  Description
Peter> 0x00       GPL,SL  R/O      1  Log Directory
Peter> 0x01           SL  R/O      1  Summary SMART error log
Peter> 0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
Peter> 0x04       GPL     R/O      7  Device Statistics log
Peter> 0x06           SL  R/O      1  SMART self-test log
Peter> 0x07       GPL     R/O      1  Extended self-test log
Peter> 0x08       GPL     R/O      2  Power Conditions log
Peter> 0x09           SL  R/W      1  Selective self-test log
Peter> 0x10       GPL     R/O      1  NCQ Command Error log
Peter> 0x11       GPL     R/O      1  SATA Phy Event Counters log
Peter> 0x20       GPL     R/O      1  Streaming performance log [OBS-8]
Peter> 0x21       GPL     R/O      1  Write stream error log
Peter> 0x22       GPL     R/O      1  Read stream error log
Peter> 0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
Peter> 0xe0       GPL,SL  R/W      1  SCT Command/Status
Peter> 0xe1       GPL,SL  R/W      1  SCT Data Transfer

Peter> SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Peter> No Errors Logged

Peter> SMART Extended Self-test Log Version: 1 (1 sectors)
Peter> No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Peter> SMART Selective self-test log data structure revision number 1
Peter>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
Peter>     1        0        0  Not_testing
Peter>     2        0        0  Not_testing
Peter>     3        0        0  Not_testing
Peter>     4        0        0  Not_testing
Peter>     5        0        0  Not_testing
Peter> Selective self-test flags (0x0):
Peter>   After scanning selected spans, do NOT read-scan remainder of disk.
Peter> If Selective self-test is pending on power-up, resume after 0 minute delay.

Peter> SCT Status Version:                  3
Peter> SCT Version (vendor specific):       256 (0x0100)
Peter> Device State:                        Active (0)
Peter> Current Temperature:                    33 Celsius
Peter> Power Cycle Min/Max Temperature:     29/33 Celsius
Peter> Lifetime    Min/Max Temperature:     20/50 Celsius
Peter> Under/Over Temperature Limit Count:   0/0

Peter> SCT Temperature History Version:     2
Peter> Temperature Sampling Period:         1 minute
Peter> Temperature Logging Interval:        1 minute
Peter> Min/Max recommended Temperature:      0/60 Celsius
Peter> Min/Max Temperature Limit:           -40/70 Celsius
Peter> Temperature History Size (Index):    128 (51)

Peter> Index    Estimated Time   Temperature Celsius
Peter>   52    2022-08-25 19:12    33  **************
Peter>  ...    ..( 76 skipped).    ..  **************
Peter>    1    2022-08-25 20:29    33  **************
Peter>    2    2022-08-25 20:30     ?  -
Peter>    3    2022-08-25 20:31    33  **************
Peter>    4    2022-08-25 20:32    34  ***************
Peter>    5    2022-08-25 20:33    33  **************
Peter>    6    2022-08-25 20:34    34  ***************
Peter>  ...    ..(  2 skipped).    ..  ***************
Peter>    9    2022-08-25 20:37    34  ***************
Peter>   10    2022-08-25 20:38     ?  -
Peter>   11    2022-08-25 20:39    29  **********
Peter>   12    2022-08-25 20:40    30  ***********
Peter>  ...    ..(  2 skipped).    ..  ***********
Peter>   15    2022-08-25 20:43    30  ***********
Peter>   16    2022-08-25 20:44    31  ************
Peter>  ...    ..(  3 skipped).    ..  ************
Peter>   20    2022-08-25 20:48    31  ************
Peter>   21    2022-08-25 20:49    32  *************
Peter>  ...    ..(  9 skipped).    ..  *************
Peter>   31    2022-08-25 20:59    32  *************
Peter>   32    2022-08-25 21:00    33  **************
Peter>  ...    ..( 18 skipped).    ..  **************
Peter>   51    2022-08-25 21:19    33  **************

Peter> SCT Error Recovery Control:
Peter>            Read: Disabled
Peter>           Write: Disabled

Peter> Device Statistics (GP Log 0x04)
Peter> Page  Offset Size        Value Flags Description
Peter> 0x01  =====  =               =  ===  == General Statistics (rev 1) ==
Peter> 0x01  0x008  4             204  ---  Lifetime Power-On Resets
Peter> 0x01  0x010  4           41740  ---  Power-on Hours
Peter> 0x01  0x018  6     20304278904  ---  Logical Sectors Written
Peter> 0x01  0x020  6        64656942  ---  Number of Write Commands
Peter> 0x01  0x028  6    350269182084  ---  Logical Sectors Read
Peter> 0x01  0x030  6       481405773  ---  Number of Read Commands
Peter> 0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
Peter> 0x03  0x008  4           41734  ---  Spindle Motor Power-on Hours
Peter> 0x03  0x010  4           41734  ---  Head Flying Hours
Peter> 0x03  0x018  4             759  ---  Head Load Events
Peter> 0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
Peter> 0x03  0x028  4              22  ---  Read Recovery Attempts
Peter> 0x03  0x030  4               6  ---  Number of Mechanical Start Failures
Peter> 0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
Peter> 0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
Peter> 0x04  0x010  4               1  ---  Resets Between Cmd Acceptance and Completion
Peter> 0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
Peter> 0x05  0x008  1              33  ---  Current Temperature
Peter> 0x05  0x010  1              33  N--  Average Short Term Temperature
Peter> 0x05  0x018  1              37  N--  Average Long Term Temperature
Peter> 0x05  0x020  1              50  ---  Highest Temperature
Peter> 0x05  0x028  1              20  ---  Lowest Temperature
Peter> 0x05  0x030  1              46  N--  Highest Average Short Term Temperature
Peter> 0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
Peter> 0x05  0x040  1              43  N--  Highest Average Long Term Temperature
Peter> 0x05  0x048  1              25  N--  Lowest Average Long Term Temperature
Peter> 0x05  0x050  4               0  ---  Time in Over-Temperature
Peter> 0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
Peter> 0x05  0x060  4               0  ---  Time in Under-Temperature
Peter> 0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
Peter> 0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
Peter> 0x06  0x008  4            1006  ---  Number of Hardware Resets
Peter> 0x06  0x010  4             494  ---  Number of ASR Events
Peter> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
Peter>                                 |||_ C monitored condition met
Peter>                                 ||__ D supports DSN
Peter>                                 |___ N normalized value

Peter> Pending Defects log (GP Log 0x0c) not supported

Peter> SATA Phy Event Counters (GP Log 0x11)
Peter> ID      Size     Value  Description
Peter> 0x0001  2            0  Command failed due to ICRC error
Peter> 0x0002  2            0  R_ERR response for data FIS
Peter> 0x0003  2            0  R_ERR response for device-to-host data FIS
Peter> 0x0004  2            0  R_ERR response for host-to-device data FIS
Peter> 0x0005  2            0  R_ERR response for non-data FIS
Peter> 0x0006  2            0  R_ERR response for device-to-host non-data FIS
Peter> 0x0007  2            0  R_ERR response for host-to-device non-data FIS
Peter> 0x0009  2           35  Transition from drive PhyRdy to drive PhyNRdy
Peter> 0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
Peter> 0x000b  2            0  CRC errors within host-to-device FIS
Peter> 0x000d  2            0  Non-CRC errors within host-to-device FIS

Peter> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Peter> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Peter> === START OF INFORMATION SECTION ===
Peter> Model Family:     Western Digital Green
Peter> Device Model:     WDC WD30EZRX-00DC0B0
Peter> Serial Number:    WD-WCC1T0668790
Peter> LU WWN Device Id: 5 0014ee 2084d406a
Peter> Firmware Version: 80.00A80
Peter> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Peter> Sector Sizes:     512 bytes logical, 4096 bytes physical
Peter> Device is:        In smartctl database [for details use: -P show]
Peter> ATA Version is:   ACS-2 (minor revision not indicated)
Peter> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Peter> Local Time is:    Thu Aug 25 21:19:51 2022 EDT
Peter> SMART support is: Available - device has SMART capability.
Peter> SMART support is: Enabled
Peter> AAM feature is:   Unavailable
Peter> APM feature is:   Unavailable
Peter> Rd look-ahead is: Enabled
Peter> Write cache is:   Enabled
Peter> DSN feature is:   Unavailable
Peter> ATA Security is:  Disabled, frozen [SEC2]
Peter> Wt Cache Reorder: Enabled

Peter> === START OF READ SMART DATA SECTION ===
Peter> SMART overall-health self-assessment test result: PASSED

Peter> General SMART Values:
Peter> Offline data collection status:  (0x82)    Offline data collection activity
Peter>                     was completed without error.
Peter>                     Auto Offline Data Collection: Enabled.
Peter> Self-test execution status:      (   0)    The previous self-test routine completed
Peter>                     without error or no self-test has ever
Peter>                     been run.
Peter> Total time to complete Offline
Peter> data collection:         (40560) seconds.
Peter> Offline data collection
Peter> capabilities:              (0x7b) SMART execute Offline immediate.
Peter>                     Auto Offline data collection on/off support.
Peter>                     Suspend Offline collection upon new
Peter>                     command.
Peter>                     Offline surface scan supported.
Peter>                     Self-test supported.
Peter>                     Conveyance Self-test supported.
Peter>                     Selective Self-test supported.
Peter> SMART capabilities:            (0x0003)    Saves SMART data before entering
Peter>                     power-saving mode.
Peter>                     Supports SMART auto save timer.
Peter> Error logging capability:        (0x01)    Error logging supported.
Peter>                     General Purpose Logging supported.
Peter> Short self-test routine
Peter> recommended polling time:      (   2) minutes.
Peter> Extended self-test routine
Peter> recommended polling time:      ( 407) minutes.
Peter> Conveyance self-test routine
Peter> recommended polling time:      (   5) minutes.
Peter> SCT capabilities:            (0x70b5)    SCT Status supported.
Peter>                     SCT Feature Control supported.
Peter>                     SCT Data Table supported.

Peter> SMART Attributes Data Structure revision number: 16
Peter> Vendor Specific SMART Attributes with Thresholds:
Peter> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
Peter>   1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
Peter>   3 Spin_Up_Time            POS--K   181   178   021    -    5916
Peter>   4 Start_Stop_Count        -O--CK   100   100   000    -    377
Peter>   5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
Peter>   7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
Peter>   9 Power_On_Hours          -O--CK   007   007   000    -    68295
Peter>  10 Spin_Retry_Count        -O--CK   100   100   000    -    0
Peter>  11 Calibration_Retry_Count -O--CK   100   100   000    -    0
Peter>  12 Power_Cycle_Count       -O--CK   100   100   000    -    296
Peter> 192 Power-Off_Retract_Count -O--CK   200   200   000    -    242
Peter> 193 Load_Cycle_Count        -O--CK   052   052   000    -    445057
Peter> 194 Temperature_Celsius     -O---K   121   102   000    -    29
Peter> 196 Reallocated_Event_Count -O--CK   200   200   000    -    0
Peter> 197 Current_Pending_Sector  -O--CK   200   200   000    -    0
Peter> 198 Offline_Uncorrectable   ----CK   200   200   000    -    0
Peter> 199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
Peter> 200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
Peter>                             ||||||_ K auto-keep
Peter>                             |||||__ C event count
Peter>                             ||||___ R error rate
Peter>                             |||____ S speed/performance
Peter>                             ||_____ O updated online
Peter>                             |______ P prefailure warning

Peter> General Purpose Log Directory Version 1
Peter> SMART           Log Directory Version 1 [multi-sector log support]
Peter> Address    Access  R/W   Size  Description
Peter> 0x00       GPL,SL  R/O      1  Log Directory
Peter> 0x01           SL  R/O      1  Summary SMART error log
Peter> 0x02           SL  R/O      5  Comprehensive SMART error log
Peter> 0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
Peter> 0x06           SL  R/O      1  SMART self-test log
Peter> 0x07       GPL     R/O      1  Extended self-test log
Peter> 0x09           SL  R/W      1  Selective self-test log
Peter> 0x10       GPL     R/O      1  NCQ Command Error log
Peter> 0x11       GPL     R/O      1  SATA Phy Event Counters log
Peter> 0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
Peter> 0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
Peter> 0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
Peter> 0xbd       GPL,SL  VS       1  Device vendor specific log
Peter> 0xc0       GPL,SL  VS       1  Device vendor specific log
Peter> 0xc1       GPL     VS      93  Device vendor specific log
Peter> 0xe0       GPL,SL  R/W      1  SCT Command/Status
Peter> 0xe1       GPL,SL  R/W      1  SCT Data Transfer

Peter> SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Peter> No Errors Logged

Peter> SMART Extended Self-test Log Version: 1 (1 sectors)
Peter> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Peter> # 1  Extended offline    Completed without error       00%         7         -
Peter> # 2  Short offline       Completed without error       00%         0         -

Peter> SMART Selective self-test log data structure revision number 1
Peter>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
Peter>     1        0        0  Not_testing
Peter>     2        0        0  Not_testing
Peter>     3        0        0  Not_testing
Peter>     4        0        0  Not_testing
Peter>     5        0        0  Not_testing
Peter> Selective self-test flags (0x0):
Peter>   After scanning selected spans, do NOT read-scan remainder of disk.
Peter> If Selective self-test is pending on power-up, resume after 0 minute delay.

Peter> SCT Status Version:                  3
Peter> SCT Version (vendor specific):       258 (0x0102)
Peter> Device State:                        Active (0)
Peter> Current Temperature:                    29 Celsius
Peter> Power Cycle Min/Max Temperature:     28/29 Celsius
Peter> Lifetime    Min/Max Temperature:      2/48 Celsius
Peter> Under/Over Temperature Limit Count:   0/0
Peter> Vendor specific:
Peter> 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Peter> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Peter> SCT Temperature History Version:     2
Peter> Temperature Sampling Period:         1 minute
Peter> Temperature Logging Interval:        1 minute
Peter> Min/Max recommended Temperature:      0/60 Celsius
Peter> Min/Max Temperature Limit:           -41/85 Celsius
Peter> Temperature History Size (Index):    478 (138)

Peter> Index    Estimated Time   Temperature Celsius
Peter>  139    2022-08-25 13:22    30  ***********
Peter>  140    2022-08-25 13:23    29  **********
Peter>  ...    ..(  5 skipped).    ..  **********
Peter>  146    2022-08-25 13:29    29  **********
Peter>  147    2022-08-25 13:30     ?  -
Peter>  148    2022-08-25 13:31    26  *******
Peter>  149    2022-08-25 13:32     ?  -
Peter>  150    2022-08-25 13:33    28  *********
Peter>  151    2022-08-25 13:34     ?  -
Peter>  152    2022-08-25 13:35    28  *********
Peter>  153    2022-08-25 13:36    28  *********
Peter>  154    2022-08-25 13:37    29  **********
Peter>  ...    ..( 55 skipped).    ..  **********
Peter>  210    2022-08-25 14:33    29  **********
Peter>  211    2022-08-25 14:34    30  ***********
Peter>  ...    ..( 11 skipped).    ..  ***********
Peter>  223    2022-08-25 14:46    30  ***********
Peter>  224    2022-08-25 14:47    29  **********
Peter>  ...    ..(103 skipped).    ..  **********
Peter>  328    2022-08-25 16:31    29  **********
Peter>  329    2022-08-25 16:32    30  ***********
Peter>  ...    ..( 18 skipped).    ..  ***********
Peter>  348    2022-08-25 16:51    30  ***********
Peter>  349    2022-08-25 16:52    29  **********
Peter>  ...    ..( 33 skipped).    ..  **********
Peter>  383    2022-08-25 17:26    29  **********
Peter>  384    2022-08-25 17:27    30  ***********
Peter>  ...    ..( 10 skipped).    ..  ***********
Peter>  395    2022-08-25 17:38    30  ***********
Peter>  396    2022-08-25 17:39    29  **********
Peter>  ...    ..(218 skipped).    ..  **********
Peter>  137    2022-08-25 21:18    29  **********
Peter>  138    2022-08-25 21:19     ?  -

Peter> SCT Error Recovery Control command not supported

Peter> Device Statistics (GP/SMART Log 0x04) not supported

Peter> Pending Defects log (GP Log 0x0c) not supported

Peter> SATA Phy Event Counters (GP Log 0x11)
Peter> ID      Size     Value  Description
Peter> 0x0001  2            0  Command failed due to ICRC error
Peter> 0x0002  2            0  R_ERR response for data FIS
Peter> 0x0003  2            0  R_ERR response for device-to-host data FIS
Peter> 0x0004  2            0  R_ERR response for host-to-device data FIS
Peter> 0x0005  2            0  R_ERR response for non-data FIS
Peter> 0x0006  2            0  R_ERR response for device-to-host non-data FIS
Peter> 0x0007  2            0  R_ERR response for host-to-device non-data FIS
Peter> 0x0008  2            0  Device-to-host non-data FIS retries
Peter> 0x0009  2          305  Transition from drive PhyRdy to drive PhyNRdy
Peter> 0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
Peter> 0x000b  2            0  CRC errors within host-to-device FIS
Peter> 0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
Peter> 0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
Peter> 0x8000  4         2491  Vendor specific

Peter> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Peter> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Peter> === START OF INFORMATION SECTION ===
Peter> Model Family:     Toshiba P300
Peter> Device Model:     TOSHIBA HDWD130
Peter> Serial Number:    Y7211KPAS
Peter> LU WWN Device Id: 5 000039 fe6dca946
Peter> Firmware Version: MX6OACF0
Peter> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Peter> Sector Sizes:     512 bytes logical, 4096 bytes physical
Peter> Rotation Rate:    7200 rpm
Peter> Form Factor:      3.5 inches
Peter> Device is:        In smartctl database [for details use: -P show]
Peter> ATA Version is:   ATA8-ACS T13/1699-D revision 4
Peter> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Peter> Local Time is:    Thu Aug 25 21:19:51 2022 EDT
Peter> SMART support is: Available - device has SMART capability.
Peter> SMART support is: Enabled
Peter> AAM feature is:   Unavailable
Peter> APM feature is:   Disabled
Peter> Rd look-ahead is: Enabled
Peter> Write cache is:   Enabled
Peter> DSN feature is:   Unavailable
Peter> ATA Security is:  Disabled, frozen [SEC2]
Peter> Wt Cache Reorder: Enabled

Peter> === START OF READ SMART DATA SECTION ===
Peter> SMART overall-health self-assessment test result: PASSED

Peter> General SMART Values:
Peter> Offline data collection status:  (0x80)    Offline data collection activity
Peter>                     was never started.
Peter>                     Auto Offline Data Collection: Enabled.
Peter> Self-test execution status:      (   0)    The previous self-test routine completed
Peter>                     without error or no self-test has ever
Peter>                     been run.
Peter> Total time to complete Offline
Peter> data collection:         (21791) seconds.
Peter> Offline data collection
Peter> capabilities:              (0x5b) SMART execute Offline immediate.
Peter>                     Auto Offline data collection on/off support.
Peter>                     Suspend Offline collection upon new
Peter>                     command.
Peter>                     Offline surface scan supported.
Peter>                     Self-test supported.
Peter>                     No Conveyance Self-test supported.
Peter>                     Selective Self-test supported.
Peter> SMART capabilities:            (0x0003)    Saves SMART data before entering
Peter>                     power-saving mode.
Peter>                     Supports SMART auto save timer.
Peter> Error logging capability:        (0x01)    Error logging supported.
Peter>                     General Purpose Logging supported.
Peter> Short self-test routine
Peter> recommended polling time:      (   1) minutes.
Peter> Extended self-test routine
Peter> recommended polling time:      ( 364) minutes.
Peter> SCT capabilities:            (0x003d)    SCT Status supported.
Peter>                     SCT Error Recovery Control supported.
Peter>                     SCT Feature Control supported.
Peter>                     SCT Data Table supported.

Peter> SMART Attributes Data Structure revision number: 16
Peter> Vendor Specific SMART Attributes with Thresholds:
Peter> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
Peter>   1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
Peter>   2 Throughput_Performance  P-S---   139   139   054    -    71
Peter>   3 Spin_Up_Time            POS---   160   160   024    -    361 (Average 355)
Peter>   4 Start_Stop_Count        -O--C-   100   100   000    -    189
Peter>   5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
Peter>   7 Seek_Error_Rate         PO-R--   100   100   067    -    0
Peter>   8 Seek_Time_Performance   P-S---   128   128   020    -    31
Peter>   9 Power_On_Hours          -O--C-   095   095   000    -    35428
Peter>  10 Spin_Retry_Count        PO--C-   100   100   060    -    0
Peter>  12 Power_Cycle_Count       -O--CK   100   100   000    -    189
Peter> 192 Power-Off_Retract_Count -O--CK   100   100   000    -    599
Peter> 193 Load_Cycle_Count        -O--C-   100   100   000    -    599
Peter> 194 Temperature_Celsius     -O----   176   176   000    -    34 (Min/Max 19/50)
Peter> 196 Reallocated_Event_Count -O--CK   100   100   000    -    0
Peter> 197 Current_Pending_Sector  -O---K   100   100   000    -    0
Peter> 198 Offline_Uncorrectable   ---R--   100   100   000    -    0
Peter> 199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
Peter>                             ||||||_ K auto-keep
Peter>                             |||||__ C event count
Peter>                             ||||___ R error rate
Peter>                             |||____ S speed/performance
Peter>                             ||_____ O updated online
Peter>                             |______ P prefailure warning

Peter> General Purpose Log Directory Version 1
Peter> SMART           Log Directory Version 1 [multi-sector log support]
Peter> Address    Access  R/W   Size  Description
Peter> 0x00       GPL,SL  R/O      1  Log Directory
Peter> 0x01           SL  R/O      1  Summary SMART error log
Peter> 0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
Peter> 0x04       GPL     R/O      7  Device Statistics log
Peter> 0x06           SL  R/O      1  SMART self-test log
Peter> 0x07       GPL     R/O      1  Extended self-test log
Peter> 0x08       GPL     R/O      2  Power Conditions log
Peter> 0x09           SL  R/W      1  Selective self-test log
Peter> 0x10       GPL     R/O      1  NCQ Command Error log
Peter> 0x11       GPL     R/O      1  SATA Phy Event Counters log
Peter> 0x20       GPL     R/O      1  Streaming performance log [OBS-8]
Peter> 0x21       GPL     R/O      1  Write stream error log
Peter> 0x22       GPL     R/O      1  Read stream error log
Peter> 0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
Peter> 0xe0       GPL,SL  R/W      1  SCT Command/Status
Peter> 0xe1       GPL,SL  R/W      1  SCT Data Transfer

Peter> SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Peter> No Errors Logged

Peter> SMART Extended Self-test Log Version: 1 (1 sectors)
Peter> No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Peter> SMART Selective self-test log data structure revision number 1
Peter>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
Peter>     1        0        0  Not_testing
Peter>     2        0        0  Not_testing
Peter>     3        0        0  Not_testing
Peter>     4        0        0  Not_testing
Peter>     5        0        0  Not_testing
Peter> Selective self-test flags (0x0):
Peter>   After scanning selected spans, do NOT read-scan remainder of disk.
Peter> If Selective self-test is pending on power-up, resume after 0 minute delay.

Peter> SCT Status Version:                  3
Peter> SCT Version (vendor specific):       256 (0x0100)
Peter> Device State:                        Active (0)
Peter> Current Temperature:                    34 Celsius
Peter> Power Cycle Min/Max Temperature:     28/34 Celsius
Peter> Lifetime    Min/Max Temperature:     19/50 Celsius
Peter> Under/Over Temperature Limit Count:   0/0

Peter> SCT Temperature History Version:     2
Peter> Temperature Sampling Period:         1 minute
Peter> Temperature Logging Interval:        1 minute
Peter> Min/Max recommended Temperature:      0/60 Celsius
Peter> Min/Max Temperature Limit:           -40/70 Celsius
Peter> Temperature History Size (Index):    128 (15)

Peter> Index    Estimated Time   Temperature Celsius
Peter>   16    2022-08-25 19:12    33  **************
Peter>  ...    ..( 66 skipped).    ..  **************
Peter>   83    2022-08-25 20:19    33  **************
Peter>   84    2022-08-25 20:20    34  ***************
Peter>  ...    ..(  8 skipped).    ..  ***************
Peter>   93    2022-08-25 20:29    34  ***************
Peter>   94    2022-08-25 20:30     ?  -
Peter>   95    2022-08-25 20:31    34  ***************
Peter>  ...    ..(  5 skipped).    ..  ***************
Peter>  101    2022-08-25 20:37    34  ***************
Peter>  102    2022-08-25 20:38     ?  -
Peter>  103    2022-08-25 20:39    29  **********
Peter>  104    2022-08-25 20:40    29  **********
Peter>  105    2022-08-25 20:41    30  ***********
Peter>  106    2022-08-25 20:42    30  ***********
Peter>  107    2022-08-25 20:43    31  ************
Peter>  ...    ..(  3 skipped).    ..  ************
Peter>  111    2022-08-25 20:47    31  ************
Peter>  112    2022-08-25 20:48    32  *************
Peter>  ...    ..(  4 skipped).    ..  *************
Peter>  117    2022-08-25 20:53    32  *************
Peter>  118    2022-08-25 20:54    33  **************
Peter>  ...    ..( 15 skipped).    ..  **************
Peter>    6    2022-08-25 21:10    33  **************
Peter>    7    2022-08-25 21:11    34  ***************
Peter>    8    2022-08-25 21:12    33  **************
Peter>    9    2022-08-25 21:13    34  ***************
Peter>  ...    ..(  5 skipped).    ..  ***************
Peter>   15    2022-08-25 21:19    34  ***************

Peter> SCT Error Recovery Control:
Peter>            Read: Disabled
Peter>           Write: Disabled

Peter> Device Statistics (GP Log 0x04)
Peter> Page  Offset Size        Value Flags Description
Peter> 0x01  =====  =               =  ===  == General Statistics (rev 1) ==
Peter> 0x01  0x008  4             189  ---  Lifetime Power-On Resets
Peter> 0x01  0x010  4           35428  ---  Power-on Hours
Peter> 0x01  0x018  6     12728825059  ---  Logical Sectors Written
Peter> 0x01  0x020  6        36220308  ---  Number of Write Commands
Peter> 0x01  0x028  6    289884223915  ---  Logical Sectors Read
Peter> 0x01  0x030  6       321688917  ---  Number of Read Commands
Peter> 0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
Peter> 0x03  0x008  4           35423  ---  Spindle Motor Power-on Hours
Peter> 0x03  0x010  4           35423  ---  Head Flying Hours
Peter> 0x03  0x018  4             599  ---  Head Load Events
Peter> 0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
Peter> 0x03  0x028  4               7  ---  Read Recovery Attempts
Peter> 0x03  0x030  4               6  ---  Number of Mechanical Start Failures
Peter> 0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
Peter> 0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
Peter> 0x04  0x010  4               1  ---  Resets Between Cmd Acceptance and Completion
Peter> 0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
Peter> 0x05  0x008  1              34  ---  Current Temperature
Peter> 0x05  0x010  1              34  N--  Average Short Term Temperature
Peter> 0x05  0x018  1              37  N--  Average Long Term Temperature
Peter> 0x05  0x020  1              50  ---  Highest Temperature
Peter> 0x05  0x028  1              19  ---  Lowest Temperature
Peter> 0x05  0x030  1              46  N--  Highest Average Short Term Temperature
Peter> 0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
Peter> 0x05  0x040  1              43  N--  Highest Average Long Term Temperature
Peter> 0x05  0x048  1              25  N--  Lowest Average Long Term Temperature
Peter> 0x05  0x050  4               0  ---  Time in Over-Temperature
Peter> 0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
Peter> 0x05  0x060  4               0  ---  Time in Under-Temperature
Peter> 0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
Peter> 0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
Peter> 0x06  0x008  4          147297  ---  Number of Hardware Resets
Peter> 0x06  0x010  4            8793  ---  Number of ASR Events
Peter> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
Peter>                                 |||_ C monitored condition met
Peter>                                 ||__ D supports DSN
Peter>                                 |___ N normalized value

Peter> Pending Defects log (GP Log 0x0c) not supported

Peter> SATA Phy Event Counters (GP Log 0x11)
Peter> ID      Size     Value  Description
Peter> 0x0001  2            0  Command failed due to ICRC error
Peter> 0x0002  2            0  R_ERR response for data FIS
Peter> 0x0003  2            0  R_ERR response for device-to-host data FIS
Peter> 0x0004  2            0  R_ERR response for host-to-device data FIS
Peter> 0x0005  2            0  R_ERR response for non-data FIS
Peter> 0x0006  2            0  R_ERR response for device-to-host non-data FIS
Peter> 0x0007  2            0  R_ERR response for host-to-device non-data FIS
Peter> 0x0009  2           29  Transition from drive PhyRdy to drive PhyNRdy
Peter> 0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
Peter> 0x000b  2            0  CRC errors within host-to-device FIS
Peter> 0x000d  2            0  Non-CRC errors within host-to-device FIS

Peter> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Peter> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Peter> === START OF INFORMATION SECTION ===
Peter> Model Family:     Western Digital Green
Peter> Device Model:     WDC WD30EZRX-00D8PB0
Peter> Serial Number:    WD-WCC4N0091255
Peter> LU WWN Device Id: 5 0014ee 2b3d4ffa1
Peter> Firmware Version: 80.00A80
Peter> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Peter> Sector Sizes:     512 bytes logical, 4096 bytes physical
Peter> Rotation Rate:    5400 rpm
Peter> Device is:        In smartctl database [for details use: -P show]
Peter> ATA Version is:   ACS-2 (minor revision not indicated)
Peter> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Peter> Local Time is:    Thu Aug 25 21:19:53 2022 EDT
Peter> SMART support is: Available - device has SMART capability.
Peter> SMART support is: Enabled
Peter> AAM feature is:   Unavailable
Peter> APM feature is:   Unavailable
Peter> Rd look-ahead is: Enabled
Peter> Write cache is:   Enabled
Peter> DSN feature is:   Unavailable
Peter> ATA Security is:  Disabled, frozen [SEC2]
Peter> Wt Cache Reorder: Enabled

Peter> === START OF READ SMART DATA SECTION ===
Peter> SMART overall-health self-assessment test result: PASSED

Peter> General SMART Values:
Peter> Offline data collection status:  (0x82)    Offline data collection activity
Peter>                     was completed without error.
Peter>                     Auto Offline Data Collection: Enabled.
Peter> Self-test execution status:      (   0)    The previous self-test routine completed
Peter>                     without error or no self-test has ever
Peter>                     been run.
Peter> Total time to complete Offline
Peter> data collection:         (42480) seconds.
Peter> Offline data collection
Peter> capabilities:              (0x7b) SMART execute Offline immediate.
Peter>                     Auto Offline data collection on/off support.
Peter>                     Suspend Offline collection upon new
Peter>                     command.
Peter>                     Offline surface scan supported.
Peter>                     Self-test supported.
Peter>                     Conveyance Self-test supported.
Peter>                     Selective Self-test supported.
Peter> SMART capabilities:            (0x0003)    Saves SMART data before entering
Peter>                     power-saving mode.
Peter>                     Supports SMART auto save timer.
Peter> Error logging capability:        (0x01)    Error logging supported.
Peter>                     General Purpose Logging supported.
Peter> Short self-test routine
Peter> recommended polling time:      (   2) minutes.
Peter> Extended self-test routine
Peter> recommended polling time:      ( 426) minutes.
Peter> Conveyance self-test routine
Peter> recommended polling time:      (   5) minutes.
Peter> SCT capabilities:            (0x7035)    SCT Status supported.
Peter>                     SCT Feature Control supported.
Peter>                     SCT Data Table supported.

Peter> SMART Attributes Data Structure revision number: 16
Peter> Vendor Specific SMART Attributes with Thresholds:
Peter> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
Peter>   1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    2
Peter>   3 Spin_Up_Time            POS--K   184   181   021    -    5783
Peter>   4 Start_Stop_Count        -O--CK   100   100   000    -    275
Peter>   5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
Peter>   7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
Peter>   9 Power_On_Hours          -O--CK   039   039   000    -    44593
Peter>  10 Spin_Retry_Count        -O--CK   100   100   000    -    0
Peter>  11 Calibration_Retry_Count -O--CK   100   100   000    -    0
Peter>  12 Power_Cycle_Count       -O--CK   100   100   000    -    273
Peter> 192 Power-Off_Retract_Count -O--CK   200   200   000    -    225
Peter> 193 Load_Cycle_Count        -O--CK   047   047   000    -    461100
Peter> 194 Temperature_Celsius     -O---K   122   105   000    -    28
Peter> 196 Reallocated_Event_Count -O--CK   200   200   000    -    0
Peter> 197 Current_Pending_Sector  -O--CK   200   200   000    -    0
Peter> 198 Offline_Uncorrectable   ----CK   200   200   000    -    0
Peter> 199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
Peter> 200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
Peter>                             ||||||_ K auto-keep
Peter>                             |||||__ C event count
Peter>                             ||||___ R error rate
Peter>                             |||____ S speed/performance
Peter>                             ||_____ O updated online
Peter>                             |______ P prefailure warning

Peter> General Purpose Log Directory Version 1
Peter> SMART           Log Directory Version 1 [multi-sector log support]
Peter> Address    Access  R/W   Size  Description
Peter> 0x00       GPL,SL  R/O      1  Log Directory
Peter> 0x01           SL  R/O      1  Summary SMART error log
Peter> 0x02           SL  R/O      5  Comprehensive SMART error log
Peter> 0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
Peter> 0x06           SL  R/O      1  SMART self-test log
Peter> 0x07       GPL     R/O      1  Extended self-test log
Peter> 0x09           SL  R/W      1  Selective self-test log
Peter> 0x10       GPL     R/O      1  NCQ Command Error log
Peter> 0x11       GPL     R/O      1  SATA Phy Event Counters log
Peter> 0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
Peter> 0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
Peter> 0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
Peter> 0xbd       GPL,SL  VS       1  Device vendor specific log
Peter> 0xc0       GPL,SL  VS       1  Device vendor specific log
Peter> 0xc1       GPL     VS      93  Device vendor specific log
Peter> 0xe0       GPL,SL  R/W      1  SCT Command/Status
Peter> 0xe1       GPL,SL  R/W      1  SCT Data Transfer

Peter> SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Peter> No Errors Logged

Peter> SMART Extended Self-test Log Version: 1 (1 sectors)
Peter> No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Peter> SMART Selective self-test log data structure revision number 1
Peter>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
Peter>     1        0        0  Not_testing
Peter>     2        0        0  Not_testing
Peter>     3        0        0  Not_testing
Peter>     4        0        0  Not_testing
Peter>     5        0        0  Not_testing
Peter> Selective self-test flags (0x0):
Peter>   After scanning selected spans, do NOT read-scan remainder of disk.
Peter> If Selective self-test is pending on power-up, resume after 0 minute delay.

Peter> SCT Status Version:                  3
Peter> SCT Version (vendor specific):       258 (0x0102)
Peter> Device State:                        Active (0)
Peter> Current Temperature:                    28 Celsius
Peter> Power Cycle Min/Max Temperature:     27/28 Celsius
Peter> Lifetime    Min/Max Temperature:      2/44 Celsius
Peter> Under/Over Temperature Limit Count:   0/0
Peter> Vendor specific:
Peter> 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Peter> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Peter> SCT Temperature History Version:     2
Peter> Temperature Sampling Period:         1 minute
Peter> Temperature Logging Interval:        1 minute
Peter> Min/Max recommended Temperature:      0/60 Celsius
Peter> Min/Max Temperature Limit:           -41/85 Celsius
Peter> Temperature History Size (Index):    478 (444)

Peter> Index    Estimated Time   Temperature Celsius
Peter>  445    2022-08-25 13:22    29  **********
Peter>  ...    ..( 33 skipped).    ..  **********
Peter>    1    2022-08-25 13:56    29  **********
Peter>    2    2022-08-25 13:57     ?  -
Peter>    3    2022-08-25 13:58    29  **********
Peter>  ...    ..(  6 skipped).    ..  **********
Peter>   10    2022-08-25 14:05    29  **********
Peter>   11    2022-08-25 14:06     ?  -
Peter>   12    2022-08-25 14:07    26  *******
Peter>   13    2022-08-25 14:08     ?  -
Peter>   14    2022-08-25 14:09    27  ********
Peter>   15    2022-08-25 14:10    27  ********
Peter>   16    2022-08-25 14:11    28  *********
Peter>  ...    ..( 37 skipped).    ..  *********
Peter>   54    2022-08-25 14:49    28  *********
Peter>   55    2022-08-25 14:50    29  **********
Peter>  ...    ..(388 skipped).    ..  **********
Peter>  444    2022-08-25 21:19    29  **********

Peter> SCT Error Recovery Control command not supported

Peter> Device Statistics (GP/SMART Log 0x04) not supported

Peter> Pending Defects log (GP Log 0x0c) not supported

Peter> SATA Phy Event Counters (GP Log 0x11)
Peter> ID      Size     Value  Description
Peter> 0x0001  2            0  Command failed due to ICRC error
Peter> 0x0002  2            0  R_ERR response for data FIS
Peter> 0x0003  2            0  R_ERR response for device-to-host data FIS
Peter> 0x0004  2            0  R_ERR response for host-to-device data FIS
Peter> 0x0005  2            0  R_ERR response for non-data FIS
Peter> 0x0006  2            0  R_ERR response for device-to-host non-data FIS
Peter> 0x0007  2            0  R_ERR response for host-to-device non-data FIS
Peter> 0x0008  2            0  Device-to-host non-data FIS retries
Peter> 0x0009  2          286  Transition from drive PhyRdy to drive PhyNRdy
Peter> 0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
Peter> 0x000b  2            0  CRC errors within host-to-device FIS
Peter> 0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
Peter> 0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
Peter> 0x8000  4         2493  Vendor specific

Peter> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Peter> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Peter> === START OF INFORMATION SECTION ===
Peter> Model Family:     Western Digital Green
Peter> Device Model:     WDC WD30EZRX-00MMMB0
Peter> Serial Number:    WD-WCAWZ2669166
Peter> LU WWN Device Id: 5 0014ee 15a13d994
Peter> Firmware Version: 80.00A80
Peter> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Peter> Sector Sizes:     512 bytes logical, 4096 bytes physical
Peter> Device is:        In smartctl database [for details use: -P show]
Peter> ATA Version is:   ATA8-ACS (minor revision not indicated)
Peter> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Peter> Local Time is:    Thu Aug 25 21:19:53 2022 EDT
Peter> SMART support is: Available - device has SMART capability.
Peter> SMART support is: Enabled
Peter> AAM feature is:   Unavailable
Peter> APM feature is:   Unavailable
Peter> Rd look-ahead is: Enabled
Peter> Write cache is:   Enabled
Peter> DSN feature is:   Unavailable
Peter> ATA Security is:  Disabled, frozen [SEC2]
Peter> Wt Cache Reorder: Enabled

Peter> === START OF READ SMART DATA SECTION ===
Peter> SMART overall-health self-assessment test result: PASSED

Peter> General SMART Values:
Peter> Offline data collection status:  (0x82)    Offline data collection activity
Peter>                     was completed without error.
Peter>                     Auto Offline Data Collection: Enabled.
Peter> Self-test execution status:      (   0)    The previous self-test routine completed
Peter>                     without error or no self-test has ever
Peter>                     been run.
Peter> Total time to complete Offline
Peter> data collection:         (50160) seconds.
Peter> Offline data collection
Peter> capabilities:              (0x7b) SMART execute Offline immediate.
Peter>                     Auto Offline data collection on/off support.
Peter>                     Suspend Offline collection upon new
Peter>                     command.
Peter>                     Offline surface scan supported.
Peter>                     Self-test supported.
Peter>                     Conveyance Self-test supported.
Peter>                     Selective Self-test supported.
Peter> SMART capabilities:            (0x0003)    Saves SMART data before entering
Peter>                     power-saving mode.
Peter>                     Supports SMART auto save timer.
Peter> Error logging capability:        (0x01)    Error logging supported.
Peter>                     General Purpose Logging supported.
Peter> Short self-test routine
Peter> recommended polling time:      (   2) minutes.
Peter> Extended self-test routine
Peter> recommended polling time:      ( 482) minutes.
Peter> Conveyance self-test routine
Peter> recommended polling time:      (   5) minutes.
Peter> SCT capabilities:            (0x3035)    SCT Status supported.
Peter>                     SCT Feature Control supported.
Peter>                     SCT Data Table supported.

Peter> SMART Attributes Data Structure revision number: 16
Peter> Vendor Specific SMART Attributes with Thresholds:
Peter> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
Peter>   1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
Peter>   3 Spin_Up_Time            POS--K   153   138   021    -    9350
Peter>   4 Start_Stop_Count        -O--CK   100   100   000    -    297
Peter>   5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
Peter>   7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
Peter>   9 Power_On_Hours          -O--CK   040   040   000    -    44409
Peter>  10 Spin_Retry_Count        -O--CK   100   100   000    -    0
Peter>  11 Calibration_Retry_Count -O--CK   100   100   000    -    0
Peter>  12 Power_Cycle_Count       -O--CK   100   100   000    -    268
Peter> 192 Power-Off_Retract_Count -O--CK   200   200   000    -    218
Peter> 193 Load_Cycle_Count        -O--CK   001   001   000    -    1082082
Peter> 194 Temperature_Celsius     -O---K   122   105   000    -    30
Peter> 196 Reallocated_Event_Count -O--CK   200   200   000    -    0
Peter> 197 Current_Pending_Sector  -O--CK   200   200   000    -    0
Peter> 198 Offline_Uncorrectable   ----CK   200   200   000    -    0
Peter> 199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
Peter> 200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
Peter>                             ||||||_ K auto-keep
Peter>                             |||||__ C event count
Peter>                             ||||___ R error rate
Peter>                             |||____ S speed/performance
Peter>                             ||_____ O updated online
Peter>                             |______ P prefailure warning

Peter> General Purpose Log Directory Version 1
Peter> SMART           Log Directory Version 1 [multi-sector log support]
Peter> Address    Access  R/W   Size  Description
Peter> 0x00       GPL,SL  R/O      1  Log Directory
Peter> 0x01           SL  R/O      1  Summary SMART error log
Peter> 0x02           SL  R/O      5  Comprehensive SMART error log
Peter> 0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
Peter> 0x06           SL  R/O      1  SMART self-test log
Peter> 0x07       GPL     R/O      1  Extended self-test log
Peter> 0x09           SL  R/W      1  Selective self-test log
Peter> 0x10       GPL     R/O      1  NCQ Command Error log
Peter> 0x11       GPL     R/O      1  SATA Phy Event Counters log
Peter> 0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
Peter> 0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
Peter> 0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
Peter> 0xbd       GPL,SL  VS       1  Device vendor specific log
Peter> 0xc0       GPL,SL  VS       1  Device vendor specific log
Peter> 0xc1       GPL     VS      93  Device vendor specific log
Peter> 0xe0       GPL,SL  R/W      1  SCT Command/Status
Peter> 0xe1       GPL,SL  R/W      1  SCT Data Transfer

Peter> SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Peter> No Errors Logged

Peter> SMART Extended Self-test Log Version: 1 (1 sectors)
Peter> No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Peter> SMART Selective self-test log data structure revision number 1
Peter>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
Peter>     1        0        0  Not_testing
Peter>     2        0        0  Not_testing
Peter>     3        0        0  Not_testing
Peter>     4        0        0  Not_testing
Peter>     5        0        0  Not_testing
Peter> Selective self-test flags (0x0):
Peter>   After scanning selected spans, do NOT read-scan remainder of disk.
Peter> If Selective self-test is pending on power-up, resume after 0 minute delay.

Peter> SCT Status Version:                  3
Peter> SCT Version (vendor specific):       258 (0x0102)
Peter> Device State:                        Active (0)
Peter> Current Temperature:                    30 Celsius
Peter> Power Cycle Min/Max Temperature:     27/30 Celsius
Peter> Lifetime    Min/Max Temperature:      0/47 Celsius
Peter> Under/Over Temperature Limit Count:   0/0

Peter> SCT Temperature History Version:     2
Peter> Temperature Sampling Period:         1 minute
Peter> Temperature Logging Interval:        1 minute
Peter> Min/Max recommended Temperature:      0/60 Celsius
Peter> Min/Max Temperature Limit:           -41/85 Celsius
Peter> Temperature History Size (Index):    478 (88)

Peter> Index    Estimated Time   Temperature Celsius
Peter>   89    2022-08-25 13:22    30  ***********
Peter>  ...    ..( 33 skipped).    ..  ***********
Peter>  123    2022-08-25 13:56    30  ***********
Peter>  124    2022-08-25 13:57     ?  -
Peter>  125    2022-08-25 13:58    31  ************
Peter>  126    2022-08-25 13:59    30  ***********
Peter>  ...    ..(  5 skipped).    ..  ***********
Peter>  132    2022-08-25 14:05    30  ***********
Peter>  133    2022-08-25 14:06     ?  -
Peter>  134    2022-08-25 14:07    26  *******
Peter>  135    2022-08-25 14:08     ?  -
Peter>  136    2022-08-25 14:09    27  ********
Peter>  ...    ..(  3 skipped).    ..  ********
Peter>  140    2022-08-25 14:13    27  ********
Peter>  141    2022-08-25 14:14    28  *********
Peter>  ...    ..(  3 skipped).    ..  *********
Peter>  145    2022-08-25 14:18    28  *********
Peter>  146    2022-08-25 14:19    29  **********
Peter>  ...    ..( 13 skipped).    ..  **********
Peter>  160    2022-08-25 14:33    29  **********
Peter>  161    2022-08-25 14:34    30  ***********
Peter>  ...    ..( 43 skipped).    ..  ***********
Peter>  205    2022-08-25 15:18    30  ***********
Peter>  206    2022-08-25 15:19    31  ************
Peter>  207    2022-08-25 15:20    30  ***********
Peter>  ...    ..(168 skipped).    ..  ***********
Peter>  376    2022-08-25 18:09    30  ***********
Peter>  377    2022-08-25 18:10    31  ************
Peter>  378    2022-08-25 18:11    30  ***********
Peter>  ...    ..( 34 skipped).    ..  ***********
Peter>  413    2022-08-25 18:46    30  ***********
Peter>  414    2022-08-25 18:47    31  ************
Peter>  415    2022-08-25 18:48    30  ***********
Peter>  ...    ..(  7 skipped).    ..  ***********
Peter>  423    2022-08-25 18:56    30  ***********
Peter>  424    2022-08-25 18:57    31  ************
Peter>  425    2022-08-25 18:58    30  ***********
Peter>  ...    ..(  7 skipped).    ..  ***********
Peter>  433    2022-08-25 19:06    30  ***********
Peter>  434    2022-08-25 19:07    31  ************
Peter>  435    2022-08-25 19:08    30  ***********
Peter>  ...    ..( 47 skipped).    ..  ***********
Peter>    5    2022-08-25 19:56    30  ***********
Peter>    6    2022-08-25 19:57    31  ************
Peter>    7    2022-08-25 19:58    30  ***********
Peter>  ...    ..( 80 skipped).    ..  ***********
Peter>   88    2022-08-25 21:19    30  ***********

Peter> SCT Error Recovery Control command not supported

Peter> Device Statistics (GP/SMART Log 0x04) not supported

Peter> Pending Defects log (GP Log 0x0c) not supported

Peter> SATA Phy Event Counters (GP Log 0x11)
Peter> ID      Size     Value  Description
Peter> 0x0001  2            0  Command failed due to ICRC error
Peter> 0x0002  2            0  R_ERR response for data FIS
Peter> 0x0003  2            0  R_ERR response for device-to-host data FIS
Peter> 0x0004  2            0  R_ERR response for host-to-device data FIS
Peter> 0x0005  2            0  R_ERR response for non-data FIS
Peter> 0x0006  2            0  R_ERR response for device-to-host non-data FIS
Peter> 0x0007  2            0  R_ERR response for host-to-device non-data FIS
Peter> 0x000a  2            3  Device-to-host register FISes sent due to a COMRESET
Peter> 0x000b  2            0  CRC errors within host-to-device FIS
Peter> 0x8000  4         2492  Vendor specific

Peter> smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-17-amd64] (local build)
Peter> Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

Peter> === START OF INFORMATION SECTION ===
Peter> Model Family:     Toshiba P300
Peter> Device Model:     TOSHIBA HDWD130
Peter> Serial Number:    477ABEJAS
Peter> LU WWN Device Id: 5 000039 fe6d2ce25
Peter> Firmware Version: MX6OACF0
Peter> User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Peter> Sector Sizes:     512 bytes logical, 4096 bytes physical
Peter> Rotation Rate:    7200 rpm
Peter> Form Factor:      3.5 inches
Peter> Device is:        In smartctl database [for details use: -P show]
Peter> ATA Version is:   ATA8-ACS T13/1699-D revision 4
Peter> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Peter> Local Time is:    Thu Aug 25 21:19:53 2022 EDT
Peter> SMART support is: Available - device has SMART capability.
Peter> SMART support is: Enabled
Peter> AAM feature is:   Unavailable
Peter> APM feature is:   Disabled
Peter> Rd look-ahead is: Enabled
Peter> Write cache is:   Enabled
Peter> DSN feature is:   Unavailable
Peter> ATA Security is:  Disabled, frozen [SEC2]
Peter> Wt Cache Reorder: Enabled

Peter> === START OF READ SMART DATA SECTION ===
Peter> SMART overall-health self-assessment test result: PASSED

Peter> General SMART Values:
Peter> Offline data collection status:  (0x80)    Offline data collection activity
Peter>                     was never started.
Peter>                     Auto Offline Data Collection: Enabled.
Peter> Self-test execution status:      (   0)    The previous self-test routine completed
Peter>                     without error or no self-test has ever
Peter>                     been run.
Peter> Total time to complete Offline
Peter> data collection:         (23082) seconds.
Peter> Offline data collection
Peter> capabilities:              (0x5b) SMART execute Offline immediate.
Peter>                     Auto Offline data collection on/off support.
Peter>                     Suspend Offline collection upon new
Peter>                     command.
Peter>                     Offline surface scan supported.
Peter>                     Self-test supported.
Peter>                     No Conveyance Self-test supported.
Peter>                     Selective Self-test supported.
Peter> SMART capabilities:            (0x0003)    Saves SMART data before entering
Peter>                     power-saving mode.
Peter>                     Supports SMART auto save timer.
Peter> Error logging capability:        (0x01)    Error logging supported.
Peter>                     General Purpose Logging supported.
Peter> Short self-test routine
Peter> recommended polling time:      (   1) minutes.
Peter> Extended self-test routine
Peter> recommended polling time:      ( 385) minutes.
Peter> SCT capabilities:            (0x003d)    SCT Status supported.
Peter>                     SCT Error Recovery Control supported.
Peter>                     SCT Feature Control supported.
Peter>                     SCT Data Table supported.

Peter> SMART Attributes Data Structure revision number: 16
Peter> Vendor Specific SMART Attributes with Thresholds:
Peter> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
Peter>   1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
Peter>   2 Throughput_Performance  P-S---   140   140   054    -    68
Peter>   3 Spin_Up_Time            POS---   161   161   024    -    358 (Average 354)
Peter>   4 Start_Stop_Count        -O--C-   100   100   000    -    243
Peter>   5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
Peter>   7 Seek_Error_Rate         PO-R--   100   100   067    -    0
Peter>   8 Seek_Time_Performance   P-S---   126   126   020    -    32
Peter>   9 Power_On_Hours          -O--C-   094   094   000    -    44046
Peter>  10 Spin_Retry_Count        PO--C-   100   100   060    -    0
Peter>  12 Power_Cycle_Count       -O--CK   100   100   000    -    243
Peter> 192 Power-Off_Retract_Count -O--CK   100   100   000    -    912
Peter> 193 Load_Cycle_Count        -O--C-   100   100   000    -    912
Peter> 194 Temperature_Celsius     -O----   193   193   000    -    31 (Min/Max 19/46)
Peter> 196 Reallocated_Event_Count -O--CK   100   100   000    -    0
Peter> 197 Current_Pending_Sector  -O---K   100   100   000    -    0
Peter> 198 Offline_Uncorrectable   ---R--   100   100   000    -    0
Peter> 199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
Peter>                             ||||||_ K auto-keep
Peter>                             |||||__ C event count
Peter>                             ||||___ R error rate
Peter>                             |||____ S speed/performance
Peter>                             ||_____ O updated online
Peter>                             |______ P prefailure warning

Peter> General Purpose Log Directory Version 1
Peter> SMART           Log Directory Version 1 [multi-sector log support]
Peter> Address    Access  R/W   Size  Description
Peter> 0x00       GPL,SL  R/O      1  Log Directory
Peter> 0x01           SL  R/O      1  Summary SMART error log
Peter> 0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
Peter> 0x04       GPL     R/O      7  Device Statistics log
Peter> 0x06           SL  R/O      1  SMART self-test log
Peter> 0x07       GPL     R/O      1  Extended self-test log
Peter> 0x08       GPL     R/O      2  Power Conditions log
Peter> 0x09           SL  R/W      1  Selective self-test log
Peter> 0x10       GPL     R/O      1  NCQ Command Error log
Peter> 0x11       GPL     R/O      1  SATA Phy Event Counters log
Peter> 0x20       GPL     R/O      1  Streaming performance log [OBS-8]
Peter> 0x21       GPL     R/O      1  Write stream error log
Peter> 0x22       GPL     R/O      1  Read stream error log
Peter> 0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
Peter> 0xe0       GPL,SL  R/W      1  SCT Command/Status
Peter> 0xe1       GPL,SL  R/W      1  SCT Data Transfer

Peter> SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Peter> No Errors Logged

Peter> SMART Extended Self-test Log Version: 1 (1 sectors)
Peter> No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Peter> SMART Selective self-test log data structure revision number 1
Peter>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
Peter>     1        0        0  Not_testing
Peter>     2        0        0  Not_testing
Peter>     3        0        0  Not_testing
Peter>     4        0        0  Not_testing
Peter>     5        0        0  Not_testing
Peter> Selective self-test flags (0x0):
Peter>   After scanning selected spans, do NOT read-scan remainder of disk.
Peter> If Selective self-test is pending on power-up, resume after 0 minute delay.

Peter> SCT Status Version:                  3
Peter> SCT Version (vendor specific):       256 (0x0100)
Peter> Device State:                        Active (0)
Peter> Current Temperature:                    31 Celsius
Peter> Power Cycle Min/Max Temperature:     28/32 Celsius
Peter> Lifetime    Min/Max Temperature:     19/46 Celsius
Peter> Under/Over Temperature Limit Count:   0/0

Peter> SCT Temperature History Version:     2
Peter> Temperature Sampling Period:         1 minute
Peter> Temperature Logging Interval:        1 minute
Peter> Min/Max recommended Temperature:      0/60 Celsius
Peter> Min/Max Temperature Limit:           -40/70 Celsius
Peter> Temperature History Size (Index):    128 (117)

Peter> Index    Estimated Time   Temperature Celsius
Peter>  118    2022-08-25 19:12    31  ************
Peter>  ...    ..( 76 skipped).    ..  ************
Peter>   67    2022-08-25 20:29    31  ************
Peter>   68    2022-08-25 20:30     ?  -
Peter>   69    2022-08-25 20:31    31  ************
Peter>   70    2022-08-25 20:32    32  *************
Peter>   71    2022-08-25 20:33    31  ************
Peter>   72    2022-08-25 20:34    31  ************
Peter>   73    2022-08-25 20:35    32  *************
Peter>   74    2022-08-25 20:36    32  *************
Peter>   75    2022-08-25 20:37    32  *************
Peter>   76    2022-08-25 20:38     ?  -
Peter>   77    2022-08-25 20:39    28  *********
Peter>   78    2022-08-25 20:40    29  **********
Peter>  ...    ..(  2 skipped).    ..  **********
Peter>   81    2022-08-25 20:43    29  **********
Peter>   82    2022-08-25 20:44    30  ***********
Peter>  ...    ..(  5 skipped).    ..  ***********
Peter>   88    2022-08-25 20:50    30  ***********
Peter>   89    2022-08-25 20:51    31  ************
Peter>  ...    ..( 27 skipped).    ..  ************
Peter>  117    2022-08-25 21:19    31  ************

Peter> SCT Error Recovery Control:
Peter>            Read: Disabled
Peter>           Write: Disabled

Peter> Device Statistics (GP Log 0x04)
Peter> Page  Offset Size        Value Flags Description
Peter> 0x01  =====  =               =  ===  == General Statistics (rev 1) ==
Peter> 0x01  0x008  4             243  ---  Lifetime Power-On Resets
Peter> 0x01  0x010  4           44046  ---  Power-on Hours
Peter> 0x01  0x018  6     27756962802  ---  Logical Sectors Written
Peter> 0x01  0x020  6        86355955  ---  Number of Write Commands
Peter> 0x01  0x028  6    381193626849  ---  Logical Sectors Read
Peter> 0x01  0x030  6       791200694  ---  Number of Read Commands
Peter> 0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
Peter> 0x03  0x008  4           44040  ---  Spindle Motor Power-on Hours
Peter> 0x03  0x010  4           44040  ---  Head Flying Hours
Peter> 0x03  0x018  4             912  ---  Head Load Events
Peter> 0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
Peter> 0x03  0x028  4               0  ---  Read Recovery Attempts
Peter> 0x03  0x030  4               6  ---  Number of Mechanical Start Failures
Peter> 0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
Peter> 0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
Peter> 0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
Peter> 0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
Peter> 0x05  0x008  1              32  ---  Current Temperature
Peter> 0x05  0x010  1              31  N--  Average Short Term Temperature
Peter> 0x05  0x018  1              35  N--  Average Long Term Temperature
Peter> 0x05  0x020  1              46  ---  Highest Temperature
Peter> 0x05  0x028  1              19  ---  Lowest Temperature
Peter> 0x05  0x030  1              43  N--  Highest Average Short Term Temperature
Peter> 0x05  0x038  1              25  N--  Lowest Average Short Term Temperature
Peter> 0x05  0x040  1              41  N--  Highest Average Long Term Temperature
Peter> 0x05  0x048  1              25  N--  Lowest Average Long Term Temperature
Peter> 0x05  0x050  4               0  ---  Time in Over-Temperature
Peter> 0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
Peter> 0x05  0x060  4               0  ---  Time in Under-Temperature
Peter> 0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
Peter> 0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
Peter> 0x06  0x008  4            4706  ---  Number of Hardware Resets
Peter> 0x06  0x010  4            3910  ---  Number of ASR Events
Peter> 0x06  0x018  4               0  ---  Number of Interface CRC Errors
Peter>                                 |||_ C monitored condition met
Peter>                                 ||__ D supports DSN
Peter>                                 |___ N normalized value

Peter> Pending Defects log (GP Log 0x0c) not supported

Peter> SATA Phy Event Counters (GP Log 0x11)
Peter> ID      Size     Value  Description
Peter> 0x0001  2            0  Command failed due to ICRC error
Peter> 0x0002  2            0  R_ERR response for data FIS
Peter> 0x0003  2            0  R_ERR response for device-to-host data FIS
Peter> 0x0004  2            0  R_ERR response for host-to-device data FIS
Peter> 0x0005  2            0  R_ERR response for non-data FIS
Peter> 0x0006  2            0  R_ERR response for device-to-host non-data FIS
Peter> 0x0007  2            0  R_ERR response for host-to-device non-data FIS
Peter> 0x0009  2           29  Transition from drive PhyRdy to drive PhyNRdy
Peter> 0x000a  2            5  Device-to-host register FISes sent due to a COMRESET
Peter> 0x000b  2            0  CRC errors within host-to-device FIS
Peter> 0x000d  2            0  Non-CRC errors within host-to-device FIS


Peter> mdadm --examine devices -----
Peter> /dev/sda:
Peter>    MBR Magic : aa55
Peter> Partition[0] :   4294967295 sectors at            1 (type ee)
Peter> /dev/sdb:
Peter>    MBR Magic : aa55
Peter> Partition[0] :   4294967295 sectors at            1 (type ee)
Peter> /dev/sdc:
Peter>    MBR Magic : aa55
Peter> Partition[0] :   4294967295 sectors at            1 (type ee)
Peter> /dev/sdd:
Peter>    MBR Magic : aa55
Peter> Partition[0] :   4294967295 sectors at            1 (type ee)
Peter> /dev/sde:
Peter>    MBR Magic : aa55
Peter> Partition[0] :   4294967295 sectors at            1 (type ee)
Peter> /dev/sdf:
Peter>    MBR Magic : aa55
Peter> Partition[0] :   4294967295 sectors at            1 (type ee)

Peter> mdadm --detail /dev/md0 ------

Peter> lsdrv ------------------------
Peter> PCI [nvme] 01:00.0 Non-Volatile memory controller: Phison Electronics Corporation E12 NVMe Controller (rev 01)
Peter> └nvme nvme0 PCIe SSD                                 {21112925606047}
Peter>  └nvme0n1 238.47g [259:0] Partitioned (dos)
Peter>   ├nvme0n1p1 485.00m [259:1] ext4 {f38776ac-1ce9-4fc8-ba50-94844b9f504e}
Peter>   │└Mounted as /dev/nvme0n1p1 @ /boot
Peter>   ├nvme0n1p2 1.00k [259:2] Partitioned (dos)
Peter>   ├nvme0n1p5 60.54g [259:3] ext4 {5ee1c3c0-3a05-466c-9f98-f5807c8d813b}
Peter>   │└Mounted as /dev/nvme0n1p5 @ /
Peter>   ├nvme0n1p6 93.13g [259:4] ext4 {9064169f-4fe3-4836-a906-28c1b445cdff}
Peter>   │└Mounted as /dev/nvme0n1p6 @ /var
Peter>   ├nvme0n1p7 37.00m [259:5] ext4 {25e161ad-94a0-4298-afaf-18e2433766ee}
Peter>   ├nvme0n1p8 82.89g [259:6] ext4 {ac874071-d759-4d33-b32f-83272f3eacd9}
Peter>   │└Mounted as /dev/nvme0n1p8 @ /home
Peter>   └nvme0n1p9 1.41g [259:7] swap {02cef84b-9a9d-4a0a-973c-fda1a78c533c}
Peter> PCI [pata_jmicron] 26:00.1 IDE interface: JMicron Technology Corp. JMB368 IDE controller (rev 10)
Peter> └scsi 0:0:0:0 MAD DOG  LS-DVDRW TSH652M {MAD_DOG_LS-DVDRW_TSH652M}
Peter>  └sr0 1.00g [11:0] Empty/Unknown
Peter> PCI [ahci] 26:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 10)
Peter> └scsi 2:x:x:x [Empty]
Peter> PCI [ahci] 2b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
Peter> ├scsi 6:0:0:0 ATA      TOSHIBA HDWD130  {477ALBNAS}
Peter> │└sda 2.73t [8:0] Partitioned (PMBR)
Peter> └scsi 7:0:0:0 ATA      TOSHIBA HDWD130  {Y7211KPAS}
Peter>  └sdc 2.73t [8:32] Partitioned (gpt)
Peter> PCI [ahci] 2c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
Peter> ├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC1T0668790}
Peter> │└sdb 2.73t [8:16] Partitioned (gpt)
Peter> ├scsi 9:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N0091255}
Peter> │└sdd 2.73t [8:48] Partitioned (gpt)
Peter> ├scsi 12:0:0:0 ATA      WDC WD30EZRX-00M {WD-WCAWZ2669166}
Peter> │└sde 2.73t [8:64] Partitioned (gpt)
Peter> └scsi 13:0:0:0 ATA      TOSHIBA HDWD130  {477ABEJAS}
Peter>  └sdf 2.73t [8:80] Partitioned (gpt)

Peter> cat /proc/mdstat -------------
Peter> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
Peter> unused devices: <none>

Peter> cat /etc/mdadm/mdadm.conf ----
Peter> # mdadm.conf
Peter> #
Peter> # !NB! Run update-initramfs -u after updating this file.
Peter> # !NB! This will ensure that initramfs has an uptodate copy.
Peter> #
Peter> # Please refer to mdadm.conf(5) for information about this file.
Peter> #

Peter> # by default (built-in), scan all partitions (/proc/partitions) and all
Peter> # containers for MD superblocks. alternatively, specify devices to scan, using
Peter> # wildcards if desired.
Peter> #DEVICE partitions containers

Peter> # automatically tag new arrays as belonging to the local system
Peter> HOMEHOST <system>

Peter> # instruct the monitoring daemon where to send mail alerts
Peter> MAILADDR root

Peter> # definitions of existing MD arrays
Peter> ARRAY /dev/md/0  metadata=1.2 UUID=109fa7b0:cf08fdba:e36284a9:5786ffff name=superior:0

Peter> # This configuration was auto-generated on Sun, 26 Dec 2021 13:31:14 -0500 by mkconf

Peter> cat /proc/partitions ---------
Peter> major minor  #blocks  name

Peter>  259        0  250059096 nvme0n1
Peter>  259        1     496640 nvme0n1p1
Peter>  259        2          1 nvme0n1p2
Peter>  259        3   63475712 nvme0n1p5
Peter>  259        4   97654784 nvme0n1p6
Peter>  259        5      37888 nvme0n1p7
Peter>  259        6   86913024 nvme0n1p8
Peter>  259        7    1474560 nvme0n1p9
Peter>    8       32 2930266584 sdc
Peter>    8       80 2930266584 sdf
Peter>    8       64 2930266584 sde
Peter>    8       48 2930266584 sdd
Peter>    8       16 2930266584 sdb
Peter>    8        0 2930266584 sda
Peter>   11        0    1048575 sr0

* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28  9:54   ` Wols Lists
@ 2022-08-28 16:47     ` Phil Turmel
       [not found]       ` <CAKAPSkJAQYsec-4zzcePbkJ7Ee0=sd_QvHj4Stnyineq+T8BXw@mail.gmail.com>
  0 siblings, 1 reply; 29+ messages in thread
From: Phil Turmel @ 2022-08-28 16:47 UTC (permalink / raw)
  To: Wols Lists, Peter Sanders, John Stoffel; +Cc: NeilBrown, linux-raid

Hi Peter, et al,

On 8/28/22 05:54, Wols Lists wrote:
> On 28/08/2022 10:14, Wols Lists wrote:
>>> Currently I have no /dev/md* devices.
>>> I have access to the old mdadm.conf file - have tried assembling with
>>> it, with the default mdadm.conf, and with no mdadm.conf file in /etc
>>> and /etc/mdadm.
>>
>> It looks like the drives weren't partitioned :-( I think you're into 
>> forensics.

It is too soon to say this.  The supplied mdadm.conf file does not
contain specific partition information.  It is possible the partition
tables have just been wiped.


> Whoops - my system froze while I was originally writing my reply, and I 
> forgot to put this into my rewrite ...
> 
> Look up overlays in the wiki. I've never done it myself, but a fair few 
> people have said the instructions worked a treat.
> 
> You're basically making the drives read-only (all writes get dumped into 
> the overlay file), and then re-creating the array over the top, so you 
> can test whether you got it right. If you don't, you just ditch the 
> overlays and start again, if you did get it right you can recreate the 
> array for real.
> 
> Cheers,
> Wol

On 8/28/22 11:10, John Stoffel wrote:
>>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
> 
> Peter> have a RAID 6 array, 6 devices.  Been running it for years without much issue.
> Peter> Had hardware issues with my system - ended up replacing the
> Peter> motherboard, video card, and power supply and re-installing the OS
> Peter> (Debian 11).
> 
> Can you give us details on the old vs new motherboard/cpu?  It might
> be that you need to tweak the BIOS of the motherboard to expose the
> old SATA formats as well.  
> 
> Did you install Debian onto a fresh boot disk?  Is your BIOS set up to
> boot only from UEFI devices?  Check your BIOS settings to confirm that
> the data drives are all in AHCI mode, or possibly even in IDE mode.
> It all depends on how old the original hardware was.
> 
> I just recently upgraded from a 2010 MB/CPU combo and I had to tweak
> the BIOS defaults to see my disks.  I guess I should do a clean
> install from a blank disk, but I wanted to minimize downtime.  

It is important to end up in AHCI mode on all MOBO ports.  If not set 
that way now, please change them.

> Wols has some great advice here, and I heartily recommend that you use
> overlayfs when doing your testing.  Check the RAID WIKI for
> suggestions.

Concur.

> And don't panic!  Your data is probably there, but just missing the
> super blocks or partition tables. 

Both, I suspect.

On 8/27/22 22:00, Peter Sanders wrote:
> lsdrv ------------------------
> PCI [nvme] 01:00.0 Non-Volatile memory controller: Phison Electronics Corporation E12 NVMe Controller (rev 01)
> └nvme nvme0 PCIe SSD                                 {21112925606047}
>  └nvme0n1 238.47g [259:0] Partitioned (dos)
>   ├nvme0n1p1 485.00m [259:1] ext4 {f38776ac-1ce9-4fc8-ba50-94844b9f504e}
>   │└Mounted as /dev/nvme0n1p1 @ /boot
>   ├nvme0n1p2 1.00k [259:2] Partitioned (dos)
>   ├nvme0n1p5 60.54g [259:3] ext4 {5ee1c3c0-3a05-466c-9f98-f5807c8d813b}
>   │└Mounted as /dev/nvme0n1p5 @ /
>   ├nvme0n1p6 93.13g [259:4] ext4 {9064169f-4fe3-4836-a906-28c1b445cdff}
>   │└Mounted as /dev/nvme0n1p6 @ /var
>   ├nvme0n1p7 37.00m [259:5] ext4 {25e161ad-94a0-4298-afaf-18e2433766ee}
>   ├nvme0n1p8 82.89g [259:6] ext4 {ac874071-d759-4d33-b32f-83272f3eacd9}
>   │└Mounted as /dev/nvme0n1p8 @ /home
>   └nvme0n1p9 1.41g [259:7] swap {02cef84b-9a9d-4a0a-973c-fda1a78c533c}
> PCI [pata_jmicron] 26:00.1 IDE interface: JMicron Technology Corp. JMB368 IDE controller (rev 10)
> └scsi 0:0:0:0 MAD DOG  LS-DVDRW TSH652M {MAD_DOG_LS-DVDRW_TSH652M}
>  └sr0 1.00g [11:0] Empty/Unknown
> PCI [ahci] 26:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 10)
> └scsi 2:x:x:x [Empty]
> PCI [ahci] 2b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
> ├scsi 6:0:0:0 ATA      TOSHIBA HDWD130  {477ALBNAS}
> │└sda 2.73t [8:0] Partitioned (PMBR)
> └scsi 7:0:0:0 ATA      TOSHIBA HDWD130  {Y7211KPAS}
>  └sdc 2.73t [8:32] Partitioned (gpt)
> PCI [ahci] 2c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
> ├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC1T0668790}
> │└sdb 2.73t [8:16] Partitioned (gpt)
> ├scsi 9:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N0091255}
> │└sdd 2.73t [8:48] Partitioned (gpt)
> ├scsi 12:0:0:0 ATA      WDC WD30EZRX-00M {WD-WCAWZ2669166}
> │└sde 2.73t [8:64] Partitioned (gpt)
> └scsi 13:0:0:0 ATA      TOSHIBA HDWD130  {477ABEJAS}
>  └sdf 2.73t [8:80] Partitioned (gpt)

Unfortunately, my lsdrv tool is not able to reconstruct missing parts. 
It is most useful when used on a *good* system and *saved* for help 
diagnosing *future* problems.

Please share your /etc/fstab, and if you were using LVM on top of the 
raid, share your lvm.conf and anything in /etc/lvm/backup.

Please describe the layer(s) that were on top of the raid.

We need to help you look for signatures, and it helps to be selective in 
what signatures to look for.

After that, we will want to figure out your raid's chunk size and data 
offsets.  If you know of a particular large file (8MB or larger) that is 
sure to be in the raid and you happen to have a copy tucked away, then 
my findHash[1] tool might be able to definitively determine those 
values.  (Time consuming, though.)

Meanwhile, don't do *anything* that would write to those drives.

Phil

[1] https://github.com/pturmel/findHash


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28  2:00 RAID 6, 6 device array - all devices lost superblock Peter Sanders
  2022-08-28  9:14 ` Wols Lists
  2022-08-28 15:10 ` John Stoffel
@ 2022-08-28 17:11 ` Andy Smith
  2022-08-28 17:22   ` Andy Smith
  2 siblings, 1 reply; 29+ messages in thread
From: Andy Smith @ 2022-08-28 17:11 UTC (permalink / raw)
  To: linux-raid

Hi Peter,

On Sat, Aug 27, 2022 at 10:00:32PM -0400, Peter Sanders wrote:
> After the hardware was replaced, my array will not assemble - mdadm
> assemble reports no RAID superblock on the devices.
> root@superior:/etc/mdadm# mdadm --assemble --scan --verbose
> mdadm: looking for devices for /dev/md/0
> mdadm: cannot open device /dev/sr0: No medium found
> mdadm: No super block found on /dev/sda (Expected magic a92b4efc, got 00000000)
> mdadm: no RAID superblock on /dev/sda
> mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 00000000)
> mdadm: no RAID superblock on /dev/sdb

I'm wondering if this is one of those motherboards that at boot
helpfully writes a new empty GPT on any drive that it thinks doesn't
have any kind of partitioning. I say this because:

- It looks like you're using sd{a,b} etc with no partitions
- I've heard of motherboards that do this
- You say you just switched to a new motherboard
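
If that is what happened, it would also fit the zeroed magic mdadm reported: a default GPT keeps its partition entry array in LBA 2 through 33, and a v1.2 md superblock sits 4 KiB into the device, inside that range. Here is a minimal read-only sketch of the check, assuming 512-byte logical sectors and v1.2 metadata (neither is confirmed in this thread); the function names are made up for illustration.

```python
import struct

SECTOR = 512                    # assumed logical sector size
MD_MAGIC = 0xA92B4EFC           # the magic mdadm says it expected
MD_V12_OFFSET = 4096            # v1.2 superblock sits 4 KiB into the device
GPT_ENTRIES_START = 2 * SECTOR  # entry array normally starts at LBA 2
GPT_ENTRIES_BYTES = 128 * 128   # default: 128 entries of 128 bytes each


def inspect_header(buf: bytes) -> dict:
    """Classify the first 8 KiB of a disk from a buffer; never writes."""
    md_magic = struct.unpack_from("<I", buf, MD_V12_OFFSET)[0]
    return {
        # MBR signature at 510..511, first partition type byte at 446 + 4
        "protective_mbr": buf[510:512] == b"\x55\xaa" and buf[450] == 0xEE,
        "gpt_header": buf[SECTOR:SECTOR + 8] == b"EFI PART",
        "md_v1.2_magic": md_magic == MD_MAGIC,
        "md_magic_found": f"{md_magic:08x}",
    }


def gpt_overlaps_md_superblock() -> bool:
    """True if a default-sized GPT entry array covers the v1.2 superblock."""
    end = GPT_ENTRIES_START + GPT_ENTRIES_BYTES
    return GPT_ENTRIES_START <= MD_V12_OFFSET < end


if __name__ == "__main__":
    # Synthetic header: protective MBR plus GPT signature, nothing at 4 KiB.
    fake = bytearray(8192)
    fake[450] = 0xEE
    fake[510:512] = b"\x55\xaa"
    fake[512:520] = b"EFI PART"
    print(inspect_header(bytes(fake)))
    print(gpt_overlaps_md_superblock())
```

On a real disk the buffer would come from something like open("/dev/sdX", "rb").read(8192) run as root, which only reads.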

Cheers,
Andy


* Re: RAID 6, 6 device array - all devices lost superblock
       [not found]       ` <CAKAPSkJAQYsec-4zzcePbkJ7Ee0=sd_QvHj4Stnyineq+T8BXw@mail.gmail.com>
@ 2022-08-28 17:16         ` Wols Lists
  2022-08-28 18:45         ` John Stoffel
  1 sibling, 0 replies; 29+ messages in thread
From: Wols Lists @ 2022-08-28 17:16 UTC (permalink / raw)
  To: Peter Sanders, Phil Turmel; +Cc: John Stoffel, NeilBrown, linux-raid

On 28/08/2022 18:01, Peter Sanders wrote:
> It was set up on the device level, not partitions.  (I remember getting 
> some advice on the web that device was better than partition... Yay for 
> internet advice)

Well, it really SHOULDN'T matter. Except there's plenty of crap software 
that says "ooh a disk with no partition table - it must be empty - let's 
initialise it without asking the user whether it's okay". Or, as in your 
case, it seems like your mobo has wiped the start of the disk for some 
reason. We now recommend partitions, not because it's better, but as a 
defensive mechanism against the idiots out there ... :-(
> 
> I'm surveying my other disks to see what I have available to do the 
> overlay attempt.
> 
> How large will the overlay files end up being?

I'm not sure what the recommendation is. I think it used to be about 
10%, but I think you'll get away with much less. If you have the space 
ELSEWHERE, e.g. on your system partition, try to allow about 1%, i.e. 
roughly 30GB per 3TB drive. So you want about 180GB of free space if 
possible.

I don't think it's dangerous if you don't allow enough space - you'll 
just hit a "disk full" on your overlays which will be a frustration but 
not a disaster. So just give it as much room as you can afford.
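
To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes six drives of nominal 3 TB each (the 2.73t devices in the lsdrv output); the function name and the percentages tried are only for illustration.

```python
DRIVE_BYTES = 3 * 10**12   # assumed: nominal 3 TB per drive (2.73t in lsdrv)
DRIVES = 6


def overlay_budget(fraction: float, drive_bytes: int = DRIVE_BYTES,
                   drives: int = DRIVES) -> tuple[float, float]:
    """Return (GB per drive, GB total) for overlays sized at `fraction`."""
    per_drive_gb = drive_bytes * fraction / 10**9
    return per_drive_gb, per_drive_gb * drives


# Compare a few sizing rules of thumb side by side.
for pct in (1, 2, 10):
    per_drive, total = overlay_budget(pct / 100)
    print(f"{pct:>2}%: {per_drive:6.0f} GB per drive, {total:6.0f} GB total")
```

These are upper limits on how far each overlay can grow, not space consumed up front.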
> 
> I did run into UEFI vs AHCI issues early in the process.. they are all 
> set to non-UEFI.
> 
> OS update was onto a new SSD...
> 
Cheers,
Wol


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28 17:11 ` Andy Smith
@ 2022-08-28 17:22   ` Andy Smith
  2022-08-28 17:34     ` Peter Sanders
  0 siblings, 1 reply; 29+ messages in thread
From: Andy Smith @ 2022-08-28 17:22 UTC (permalink / raw)
  To: linux-raid

On Sun, Aug 28, 2022 at 05:11:56PM +0000, Andy Smith wrote:
> I'm wondering if this is one of those motherboards that at boot
> helpfully writes a new empty GPT

Sorry, all the other replies speculating the same have just arrived
in my inbox!


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28 17:22   ` Andy Smith
@ 2022-08-28 17:34     ` Peter Sanders
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Sanders @ 2022-08-28 17:34 UTC (permalink / raw)
  To: linux-raid

It was set up on the device level, not partitions.  (I remember
getting some advice on the web that device was better than
partition... Yay for internet advice)

I'm surveying my other disks to see what I have available to do the
overlay attempt.

How large will the overlay files end up being?

I did run into UEFI vs AHCI issues early in the process.. they are all
set to non-UEFI.

OS update was onto a new SSD...

On Sun, Aug 28, 2022 at 1:23 PM Andy Smith <andy@strugglers.net> wrote:
>
> On Sun, Aug 28, 2022 at 05:11:56PM +0000, Andy Smith wrote:
> > I'm wondering if this is one of those motherboards that at boot
> > helpfully writes a new empty GPT
>
> Sorry, all the other replies speculating the same have just arrived
> in my inbox!


* Re: RAID 6, 6 device array - all devices lost superblock
       [not found]       ` <CAKAPSkJAQYsec-4zzcePbkJ7Ee0=sd_QvHj4Stnyineq+T8BXw@mail.gmail.com>
  2022-08-28 17:16         ` Wols Lists
@ 2022-08-28 18:45         ` John Stoffel
  2022-08-28 19:36           ` Phil Turmel
  1 sibling, 1 reply; 29+ messages in thread
From: John Stoffel @ 2022-08-28 18:45 UTC (permalink / raw)
  To: Peter Sanders
  Cc: Phil Turmel, Wols Lists, John Stoffel, NeilBrown, linux-raid

>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:

Peter> It was set up on the device level, not partitions.  (I remember
Peter> getting some advice on the web that device was better than
Peter> partition... Yay for internet advice)

Yeah, this is NOT good advice.  Generally systems will not destroy
existing partition tables, but if they see an empty (to them)
disk... all bets are off.

Peter> I'm surveying my other disks to see what I have available to do
Peter> the overlay attempt.

They're small.  They are sparse files, so just follow the
instructions. 

Peter> How large will the overlay files end up being?

Not too large, but it depends on how much data is written to the
overlays to get your data back.  If you follow the instructions on
this page:

   https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID

It says to create a sparse file for each disk that is 1% of the size
of the disk.  This can add up... you might need to add a blank disk to
your system to hold these.  
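
As a small illustration of why sparse files start out small, here is a minimal sketch that creates one and compares its apparent size with the space actually allocated. It assumes Linux, where st_blocks counts 512-byte units; the file name and the 64 MiB size are only for the demo.

```python
import os
import tempfile


def make_sparse(path: str, size: int) -> tuple[int, int]:
    """Create a sparse file; return (apparent bytes, allocated bytes)."""
    with open(path, "wb") as f:
        f.truncate(size)  # sets the length without writing any data
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    apparent, allocated = make_sparse(os.path.join(tmp, "overlay-demo"),
                                      64 << 20)
    print(f"apparent {apparent} bytes, allocated {allocated} bytes")
```

The allocated figure only grows as writes land in the file, which is why the overlays take up little room until the recovery attempts start writing.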

In this case, if you think you know which order the disks were in, you
could try to create the RAID6 array (but only using the overlay
devices!!!!!)  I can't stress this enough.


Peter> I did run into UEFI vs AHCI issues early in the process.. they
Peter> are all set to non-UEFI.

That's good. 

Peter> OS update was onto a new SSD...

Ok.  Do you have the old OS disk around by any chance?  That might
give some pointers to how the disks were set up.  You could look in
/var/tmp/initrd/... for old mdadm.conf files, which might give more
details.

Peter> On Sun, Aug 28, 2022, 12:47 Phil Turmel <philip@turmel.org> wrote:

Peter>     Hi Peter, et al,
   
Peter>     On 8/28/22 05:54, Wols Lists wrote:
>> On 28/08/2022 10:14, Wols Lists wrote:
>>>> Currently I have no /dev/md* devices.
>>>> I have access to the old mdadm.conf file - have tried assembling with
>>>> it, with the default mdadm.conf, and with no mdadm.conf file in /etc
>>>> and /etc/mdadm.
>>> 
>>> It looks like the drives weren't partitioned :-( I think you're into
>>> forensics.
   
Peter>     It is too soon to say this.  The supplied mdadm.conf file does not
Peter>     contain specific partition information.  It is possible the partition
Peter>     tables have just been wiped.

>> Whoops - my system froze while I was originally writing my reply, and I
>> forgot to put this into my rewrite ...
>> 
>> Look up overlays in the wiki. I've never done it myself, but a fair few
>> people have said the instructions worked a treat.
>> 
>> You're basically making the drives read-only (all writes get dumped into
>> the overlay file), and then re-creating the array over the top, so you
>> can test whether you got it right. If you don't, you just ditch the
>> overlays and start again, if you did get it right you can recreate the
>> array for real.
>> 
>> Cheers,
>> Wol
   
Peter>     On 8/28/22 11:10, John Stoffel wrote:
>>>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>> 
Peter> have a RAID 6 array, 6 devices.  Been running it for years without much issue.
Peter> Had hardware issues with my system - ended up replacing the
Peter> motherboard, video card, and power supply and re-installing the OS
Peter> (Debian 11).
>> 
>> Can you give us details on the old vs new motherboard/cpu?  It might
>> be that you need to tweak the BIOS of the motherboard to expose the
>> old SATA formats as well. 
>> 
>> Did you install debian onto a fresh boot disk?  Is your BIOS setup to
>> only do the new form of booting from UEFI devices, so maybe check your
>> BIOS settings that the data drives are all in AHCI mode, or possibly
>> even in IDE mode.  It all depends on how old the original hardware
>> was. 
>> 
>> I just recently upgraded from a 2010 MB/CPU combo and I had to tweak
>> the BIOS defaults to see my disks.  I guess I should do a clean
>> install from a blank disk, but I wanted to minimize downtime. 
   
Peter>     It is important to end up in AHCI mode on all MOBO ports.  If not set
Peter>     that way now, please change them.
   
>> Wols has some great advice here, and I heartily recommend that you use
>> overlayfs when doing your testing.  Check the RAID WIKI for
>> suggestions.
   
Peter>     Concur.
   
>> And don't panic!  Your data is probably there, but just missing the
>> super blocks or partition tables.
   
Peter>     Both, I suspect.
   
Peter>     On 8/27/22 22:00, Peter Sanders wrote:
>> lsdrv ------------------------
>> PCI [nvme] 01:00.0 Non-Volatile memory controller: Phison Electronics
>> Corporation E12 NVMe Controller (rev 01)
>> └nvme nvme0 PCIe SSD                                 {21112925606047}
>>   └nvme0n1 238.47g [259:0] Partitioned (dos)
>>    ├nvme0n1p1 485.00m [259:1] ext4 {f38776ac-1ce9-4fc8-ba50-94844b9f504e}
>>    │└Mounted as /dev/nvme0n1p1 @ /boot
>>    ├nvme0n1p2 1.00k [259:2] Partitioned (dos)
>>    ├nvme0n1p5 60.54g [259:3] ext4 {5ee1c3c0-3a05-466c-9f98-f5807c8d813b}
>>    │└Mounted as /dev/nvme0n1p5 @ /
>>    ├nvme0n1p6 93.13g [259:4] ext4 {9064169f-4fe3-4836-a906-28c1b445cdff}
>>    │└Mounted as /dev/nvme0n1p6 @ /var
>>    ├nvme0n1p7 37.00m [259:5] ext4 {25e161ad-94a0-4298-afaf-18e2433766ee}
>>    ├nvme0n1p8 82.89g [259:6] ext4 {ac874071-d759-4d33-b32f-83272f3eacd9}
>>    │└Mounted as /dev/nvme0n1p8 @ /home
>>    └nvme0n1p9 1.41g [259:7] swap {02cef84b-9a9d-4a0a-973c-fda1a78c533c}
>> PCI [pata_jmicron] 26:00.1 IDE interface: JMicron Technology Corp.
>> JMB368 IDE controller (rev 10)
>> └scsi 0:0:0:0 MAD DOG  LS-DVDRW TSH652M {MAD_DOG_LS-DVDRW_TSH652M}
>>   └sr0 1.00g [11:0] Empty/Unknown
>> PCI [ahci] 26:00.0 SATA controller: JMicron Technology Corp. JMB363
>> SATA/IDE Controller (rev 10)
>> └scsi 2:x:x:x [Empty]
>> PCI [ahci] 2b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
>> FCH SATA Controller [AHCI mode] (rev 51)
>> ├scsi 6:0:0:0 ATA      TOSHIBA HDWD130  {477ALBNAS}
>> │└sda 2.73t [8:0] Partitioned (PMBR)
>> └scsi 7:0:0:0 ATA      TOSHIBA HDWD130  {Y7211KPAS}
>>   └sdc 2.73t [8:32] Partitioned (gpt)
>> PCI [ahci] 2c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
>> FCH SATA Controller [AHCI mode] (rev 51)
>> ├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC1T0668790}
>> │└sdb 2.73t [8:16] Partitioned (gpt)
>> ├scsi 9:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N0091255}
>> │└sdd 2.73t [8:48] Partitioned (gpt)
>> ├scsi 12:0:0:0 ATA      WDC WD30EZRX-00M {WD-WCAWZ2669166}
>> │└sde 2.73t [8:64] Partitioned (gpt)
>> └scsi 13:0:0:0 ATA      TOSHIBA HDWD130  {477ABEJAS}
>>   └sdf 2.73t [8:80] Partitioned (gpt)
   
Peter>     Unfortunately, my lsdrv tool is not able to reconstruct missing parts.
Peter>     It is most useful when used on a *good* system and *saved* for help
Peter>     diagnosing *future* problems.
   
Peter>     Please share your /etc/fstab, and if you were using LVM on top of the
Peter>     raid, share your lvm.conf and anything in /etc/lvm/backup.
   
Peter>     Please describe the layer(s) that were on top of the raid.
   
Peter>     We need to help you look for signatures, and it helps to be selective in
Peter>     what signatures to look for.
   
Peter>     After that, we will want to figure out your raid's chunk size and data
Peter>     offsets.  If you know of a particular large file (8MB or larger) that is
Peter>     sure to be in the raid and you happen to have a copy tucked away, then
Peter>     my findHash[1] tool might be able to definitively determine those
Peter>     values.  (Time consuming, though.)
   
Peter>     Meanwhile, don't do *anything* that would write to those drives.
   
Peter>     Phil
   
Peter>     [1] https://github.com/pturmel/findHash



* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28 18:45         ` John Stoffel
@ 2022-08-28 19:36           ` Phil Turmel
  2022-08-28 19:49             ` John Stoffel
  0 siblings, 1 reply; 29+ messages in thread
From: Phil Turmel @ 2022-08-28 19:36 UTC (permalink / raw)
  To: John Stoffel, Peter Sanders; +Cc: Wols Lists, NeilBrown, linux-raid

Pssst! John,

All of my comments were attributed to Peter by your mail client. ):

On 8/28/22 14:45, John Stoffel wrote:
>>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
> 
> Peter> It was set up on the device level, not partitions.  (I remember
> Peter> getting some advice on the web that device was better than
> Peter> partition... Yay for internet advice)
> 
> Yeah, this is NOT good advice.  Generally systems will not destroy
> existing partition tables, but if they see an empty (to them)
> disk... all bets are off.
> 
> Peter> I'm surveying my other disks to see what I have available to do
> Peter> the overlay attempt.
> 
> They're small.  They are sparse files, so just follow the
> instructions.
> 
> Peter> How large will the overlay files end up being?
> 
> Not too large, but it depends on how much data is written to the
> overlayfs to get your data back.  If you follow the instructions on
> this page:
> 
>     https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID
> 
> It says to create a sparse file for each disk that is 1% of the size
> of the disk.  This can add up... you might need to add a blank disk to
> your system to hold these.
> 
> In this case, if you think you know which order the disks were in, you
> could try to create the RAID6 array (but only using the overlayfs
> devices!!!!!)  I can't stress this enough.
> 
> 
> Peter> I did run into UEFI vs AHCI issues early in the process.. they
> Peter> are all set to non-UEFI.
> 
> That's good.
> 
> Peter> OS update was onto a new SSD...
> 
> Ok.  Do you have the old OS disk around by any chance?  That might
> give some pointers to how the disks are setup..  You could look in
> /var/tmp/initrd/... for old mdadm.conf files, which might give more
> details.
> 
> Peter> On Sun, Aug 28, 2022, 12:47 Phil Turmel <philip@turmel.org> wrote:
> 
> Peter>     Hi Peter, et al,
>     
> Peter>     On 8/28/22 05:54, Wols Lists wrote:
>>> On 28/08/2022 10:14, Wols Lists wrote:
>>>>> Currently I have no /dev/md* devices.
>>>>> I have access to the old mdadm.conf file - have tried assembling with
>>>>> it, with the default mdadm.conf, and with no mdadm.conf file in /etc
>>>>> and /etc/mdadm.
>>>>
>>>> It looks like the drives weren't partitioned :-( I think you're into
>>>> forensics.
>     
> Peter>     It is too soon to say this.  The supplied mdadm.conf file does not
> Peter>     contain specific partition information.  It is possible the partition
> Peter>     tables have just been wiped.
> 
>>> Whoops - my system froze while I was originally writing my reply, and I
>>> forgot to put this into my rewrite ...
>>>
>>> Look up overlays in the wiki. I've never done it myself, but a fair few
>>> people have said the instructions worked a treat.
>>>
>>> You're basically making the drives read-only (all writes get dumped into
>>> the overlay file), and then re-creating the array over the top, so you
>>> can test whether you got it right. If you don't, you just ditch the
>>> overlays and start again, if you did get it right you can recreate the
>>> array for real.
>>>
>>> Cheers,
>>> Wol
>     
> Peter>     On 8/28/22 11:10, John Stoffel wrote:
>>>>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>>>
> Peter> have a RAID 6 array, 6 devices.  Been running it for years without much issue.
> Peter> Had hardware issues with my system - ended up replacing the
> Peter> motherboard, video card, and power supply and re-installing the OS
> Peter> (Debian 11).
>>>
>>> Can you give us details on the old vs new motherboard/cpu?  It might
>>> be that you need to tweak the BIOS of the motherboard to expose the
>>> old SATA formats as well.
>>>
>>> Did you install debian onto a fresh boot disk?  Is your BIOS setup to
>>> only do the new form of booting from UEFI devices, so maybe check your
>>> BIOS settings that the data drives are all in AHCI mode, or possibly
>>> even in IDE mode.  It all depends on how old the original hardware
>>> was.
>>>
>>> I just recently upgraded from a 2010 MB/CPU combo and I had to tweak
>>> the BIOS defaults to see my disks.  I guess I should do a clean
>>> install from a blank disk, but I wanted to minimize downtime.
>     
> Peter>     It is important to end up in AHCI mode on all MOBO ports.  If not set
> Peter>     that way now, please change them.
>     
>>> Wols has some great advice here, and I heartily recommend that you use
>>> overlayfs when doing your testing.  Check the RAID WIKI for
>>> suggestions.
>     
> Peter>     Concur.
>     
>>> And don't panic!  Your data is probably there, but just missing the
>>> super blocks or partition tables.
>     
> Peter>     Both, I suspect.
>     
> Peter>     On 8/27/22 22:00, Peter Sanders wrote:
>>> lsdrv ------------------------
>>> PCI [nvme] 01:00.0 Non-Volatile memory controller: Phison Electronics
>>> Corporation E12 NVMe Controller (rev 01)
>>> └nvme nvme0 PCIe SSD                                 {21112925606047}
>>>    └nvme0n1 238.47g [259:0] Partitioned (dos)
>>>     ├nvme0n1p1 485.00m [259:1] ext4 {f38776ac-1ce9-4fc8-ba50-94844b9f504e}
>>>     │└Mounted as /dev/nvme0n1p1 @ /boot
>>>     ├nvme0n1p2 1.00k [259:2] Partitioned (dos)
>>>     ├nvme0n1p5 60.54g [259:3] ext4 {5ee1c3c0-3a05-466c-9f98-f5807c8d813b}
>>>     │└Mounted as /dev/nvme0n1p5 @ /
>>>     ├nvme0n1p6 93.13g [259:4] ext4 {9064169f-4fe3-4836-a906-28c1b445cdff}
>>>     │└Mounted as /dev/nvme0n1p6 @ /var
>>>     ├nvme0n1p7 37.00m [259:5] ext4 {25e161ad-94a0-4298-afaf-18e2433766ee}
>>>     ├nvme0n1p8 82.89g [259:6] ext4 {ac874071-d759-4d33-b32f-83272f3eacd9}
>>>     │└Mounted as /dev/nvme0n1p8 @ /home
>>>     └nvme0n1p9 1.41g [259:7] swap {02cef84b-9a9d-4a0a-973c-fda1a78c533c}
>>> PCI [pata_jmicron] 26:00.1 IDE interface: JMicron Technology Corp.
>>> JMB368 IDE controller (rev 10)
>>> └scsi 0:0:0:0 MAD DOG  LS-DVDRW TSH652M {MAD_DOG_LS-DVDRW_TSH652M}
>>>    └sr0 1.00g [11:0] Empty/Unknown
>>> PCI [ahci] 26:00.0 SATA controller: JMicron Technology Corp. JMB363
>>> SATA/IDE Controller (rev 10)
>>> └scsi 2:x:x:x [Empty]
>>> PCI [ahci] 2b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
>>> FCH SATA Controller [AHCI mode] (rev 51)
>>> ├scsi 6:0:0:0 ATA      TOSHIBA HDWD130  {477ALBNAS}
>>> │└sda 2.73t [8:0] Partitioned (PMBR)
>>> └scsi 7:0:0:0 ATA      TOSHIBA HDWD130  {Y7211KPAS}
>>>    └sdc 2.73t [8:32] Partitioned (gpt)
>>> PCI [ahci] 2c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
>>> FCH SATA Controller [AHCI mode] (rev 51)
>>> ├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC1T0668790}
>>> │└sdb 2.73t [8:16] Partitioned (gpt)
>>> ├scsi 9:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N0091255}
>>> │└sdd 2.73t [8:48] Partitioned (gpt)
>>> ├scsi 12:0:0:0 ATA      WDC WD30EZRX-00M {WD-WCAWZ2669166}
>>> │└sde 2.73t [8:64] Partitioned (gpt)
>>> └scsi 13:0:0:0 ATA      TOSHIBA HDWD130  {477ABEJAS}
>>>    └sdf 2.73t [8:80] Partitioned (gpt)
>     
> Peter>     Unfortunately, my lsdrv tool is not able to reconstruct missing parts.
> Peter>     It is most useful when used on a *good* system and *saved* for help
> Peter>     diagnosing *future* problems.
>     
> Peter>     Please share your /etc/fstab, and if you were using LVM on top of the
> Peter>     raid, share your lvm.conf and anything in /etc/lvm/backup.
>     
> Peter>     Please describe the layer(s) that were on top of the raid.
>     
> Peter>     We need to help you look for signatures, and it helps to be selective in
> Peter>     what signatures to look for.
>     
> Peter>     After that, we will want to figure out your raid's chunk size and data
> Peter>     offsets.  If you know of a particular large file (8MB or larger) that is
> Peter>     sure to be in the raid and you happen to have a copy tucked away, then
> Peter>     my findHash[1] tool might be able to definitively determine those
> Peter>     values.  (Time consuming, though.)
>     
> Peter>     Meanwhile, don't do *anything* that would write to those drives.
>     
> Peter>     Phil
>     
> Peter>     [1] https://github.com/pturmel/findHash
> 



* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28 19:36           ` Phil Turmel
@ 2022-08-28 19:49             ` John Stoffel
  2022-08-28 23:24               ` Peter Sanders
  0 siblings, 1 reply; 29+ messages in thread
From: John Stoffel @ 2022-08-28 19:49 UTC (permalink / raw)
  To: Phil Turmel
  Cc: John Stoffel, Peter Sanders, Wols Lists, NeilBrown, linux-raid

>>>>> "Phil" == Phil Turmel <philip@turmel.org> writes:

Phil> Pssst! John,
Phil> All of my comments were attributed to Peter by your mail client. ):

Yeah... sometimes my mail reader gets confused when it cites previous
emails.  I should probably just drop to > only from now on.


Phil> On 8/28/22 14:45, John Stoffel wrote:
>>>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>> 
Peter> It was set up on the device level, not partitions.  (I remember
Peter> getting some advice on the web that device was better than
Peter> partition... Yay for internet advice)
>> 
>> Yeah, this is NOT good advice.  Generally systems will not destroy
>> existing partition tables, but if they see an empty (to them)
>> disk... all bets are off.
>> 
Peter> I'm surveying my other disks to see what I have available to do
Peter> the overlay attempt.
>> 
>> They're small.  They are sparse files, so just follow the
>> instructions.
>> 
Peter> How large will the overlay files end up being?
>> 
>> Not too large, but it depends on how much data is written to the
>> overlayfs to get your data back.  If you follow the instructions on
>> this page:
>> 
>> https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID
>> 
>> It says to create a sparse file for each disk that is 1% of the size
>> of the disk.  This can add up... you might need to add a blank disk to
>> your system to hold these.
>> 
>> In this case, if you think you know which order the disks were in, you
>> could try to create the RAID6 array (but only using the overlayfs
>> devices!!!!!)  I can't stress this enough.
>> 
>> 
Peter> I did run into UEFI vs AHCI issues early in the process.. they
Peter> are all set to non-UEFI.
>> 
>> That's good.
>> 
Peter> OS update was onto a new SSD...
>> 
>> Ok.  Do you have the old OS disk around by any chance?  That might
>> give some pointers to how the disks are setup..  You could look in
>> /var/tmp/initrd/... for old mdadm.conf files, which might give more
>> details.
>> 
Peter> On Sun, Aug 28, 2022, 12:47 Phil Turmel <philip@turmel.org> wrote:
>> 
Peter> Hi Peter, et al,
>> 
Peter> On 8/28/22 05:54, Wols Lists wrote:
>>>> On 28/08/2022 10:14, Wols Lists wrote:
>>>>> Currently I have no /dev/md* devices.
>>>>> I have access to the old mdadm.conf file - have tried assembling with
>>>>> it, with the default mdadm.conf, and with no mdadm.conf file in /etc
>>>>> and /etc/mdadm.
>>>>> 
>>>>> It looks like the drives weren't partitioned :-( I think you're into
>>>>> forensics.
>> 
Peter> It is too soon to say this.  The supplied mdadm.conf file does not
Peter> contain specific partition information.  It is possible the partition
Peter> tables have just been wiped.
>> 
>>>> Whoops - my system froze while I was originally writing my reply, and I
>>>> forgot to put this into my rewrite ...
>>>> 
>>>> Look up overlays in the wiki. I've never done it myself, but a fair few
>>>> people have said the instructions worked a treat.
>>>> 
>>>> You're basically making the drives read-only (all writes get dumped into
>>>> the overlay file), and then re-creating the array over the top, so you
>>>> can test whether you got it right. If you don't, you just ditch the
>>>> overlays and start again, if you did get it right you can recreate the
>>>> array for real.
>>>> 
>>>> Cheers,
>>>> Wol
>> 
Peter> On 8/28/22 11:10, John Stoffel wrote:
>>>>>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>>>> 
Peter> have a RAID 6 array, 6 devices.  Been running it for years without much issue.
Peter> Had hardware issues with my system - ended up replacing the
Peter> motherboard, video card, and power supply and re-installing the OS
Peter> (Debian 11).
>>>> 
>>>> Can you give us details on the old vs new motherboard/cpu?  It might
>>>> be that you need to tweak the BIOS of the motherboard to expose the
>>>> old SATA formats as well.
>>>> 
>>>> Did you install debian onto a fresh boot disk?  Is your BIOS setup to
>>>> only do the new form of booting from UEFI devices, so maybe check your
>>>> BIOS settings that the data drives are all in AHCI mode, or possibly
>>>> even in IDE mode.  It all depends on how old the original hardware
>>>> was.
>>>> 
>>>> I just recently upgraded from a 2010 MB/CPU combo and I had to tweak
>>>> the BIOS defaults to see my disks.  I guess I should do a clean
>>>> install from a blank disk, but I wanted to minimize downtime.
>> 
Peter> It is important to end up in AHCI mode on all MOBO ports.  If not set
Peter> that way now, please change them.
>> 
>>>> Wols has some great advice here, and I heartily recommend that you use
>>>> overlayfs when doing your testing.  Check the RAID WIKI for
>>>> suggestions.
>> 
Peter> Concur.
>> 
>>>> And don't panic!  Your data is probably there, but just missing the
>>>> super blocks or partition tables.
>> 
Peter> Both, I suspect.
>> 
Peter> On 8/27/22 22:00, Peter Sanders wrote:
>>>> lsdrv ------------------------
>>>> PCI [nvme] 01:00.0 Non-Volatile memory controller: Phison Electronics
>>>> Corporation E12 NVMe Controller (rev 01)
>>>> └nvme nvme0 PCIe SSD                                 {21112925606047}
>>>>   └nvme0n1 238.47g [259:0] Partitioned (dos)
>>>>    ├nvme0n1p1 485.00m [259:1] ext4 {f38776ac-1ce9-4fc8-ba50-94844b9f504e}
>>>>    │└Mounted as /dev/nvme0n1p1 @ /boot
>>>>    ├nvme0n1p2 1.00k [259:2] Partitioned (dos)
>>>>    ├nvme0n1p5 60.54g [259:3] ext4 {5ee1c3c0-3a05-466c-9f98-f5807c8d813b}
>>>>    │└Mounted as /dev/nvme0n1p5 @ /
>>>>    ├nvme0n1p6 93.13g [259:4] ext4 {9064169f-4fe3-4836-a906-28c1b445cdff}
>>>>    │└Mounted as /dev/nvme0n1p6 @ /var
>>>>    ├nvme0n1p7 37.00m [259:5] ext4 {25e161ad-94a0-4298-afaf-18e2433766ee}
>>>>    ├nvme0n1p8 82.89g [259:6] ext4 {ac874071-d759-4d33-b32f-83272f3eacd9}
>>>>    │└Mounted as /dev/nvme0n1p8 @ /home
>>>>    └nvme0n1p9 1.41g [259:7] swap {02cef84b-9a9d-4a0a-973c-fda1a78c533c}
>>>> PCI [pata_jmicron] 26:00.1 IDE interface: JMicron Technology Corp.
>>>> JMB368 IDE controller (rev 10)
>>>> └scsi 0:0:0:0 MAD DOG  LS-DVDRW TSH652M {MAD_DOG_LS-DVDRW_TSH652M}
>>>>   └sr0 1.00g [11:0] Empty/Unknown
>>>> PCI [ahci] 26:00.0 SATA controller: JMicron Technology Corp. JMB363
>>>> SATA/IDE Controller (rev 10)
>>>> └scsi 2:x:x:x [Empty]
>>>> PCI [ahci] 2b:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
>>>> FCH SATA Controller [AHCI mode] (rev 51)
>>>> ├scsi 6:0:0:0 ATA      TOSHIBA HDWD130  {477ALBNAS}
>>>> │└sda 2.73t [8:0] Partitioned (PMBR)
>>>> └scsi 7:0:0:0 ATA      TOSHIBA HDWD130  {Y7211KPAS}
>>>>   └sdc 2.73t [8:32] Partitioned (gpt)
>>>> PCI [ahci] 2c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD]
>>>> FCH SATA Controller [AHCI mode] (rev 51)
>>>> ├scsi 8:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC1T0668790}
>>>> │└sdb 2.73t [8:16] Partitioned (gpt)
>>>> ├scsi 9:0:0:0 ATA      WDC WD30EZRX-00D {WD-WCC4N0091255}
>>>> │└sdd 2.73t [8:48] Partitioned (gpt)
>>>> ├scsi 12:0:0:0 ATA      WDC WD30EZRX-00M {WD-WCAWZ2669166}
>>>> │└sde 2.73t [8:64] Partitioned (gpt)
>>>> └scsi 13:0:0:0 ATA      TOSHIBA HDWD130  {477ABEJAS}
>>>>   └sdf 2.73t [8:80] Partitioned (gpt)
>> 
Peter> Unfortunately, my lsdrv tool is not able to reconstruct missing parts.
Peter> It is most useful when used on a *good* system and *saved* for help
Peter> diagnosing *future* problems.
>> 
Peter> Please share your /etc/fstab, and if you were using LVM on top of the
Peter> raid, share your lvm.conf and anything in /etc/lvm/backup.
>> 
Peter> Please describe the layer(s) that were on top of the raid.
>> 
Peter> We need to help you look for signatures, and it helps to be selective in
Peter> what signatures to look for.
>> 
Peter> After that, we will want to figure out your raid's chunk size and data
Peter> offsets.  If you know of a particular large file (8MB or larger) that is
Peter> sure to be in the raid and you happen to have a copy tucked away, then
Peter> my findHash[1] tool might be able to definitively determine those
Peter> values.  (Time consuming, though.)
>> 
Peter> Meanwhile, don't do *anything* that would write to those drives.
>> 
Peter> Phil
>> 
Peter> [1] https://github.com/pturmel/findHash
>> 



* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28 19:49             ` John Stoffel
@ 2022-08-28 23:24               ` Peter Sanders
  2022-08-29 13:12                 ` Peter Sanders
  2022-08-29 21:45                 ` John Stoffel
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Sanders @ 2022-08-28 23:24 UTC (permalink / raw)
  To: John Stoffel; +Cc: Phil Turmel, Wols Lists, NeilBrown, linux-raid

Phil,

fstab from the working config -

# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda1 during installation
UUID=50976432-b750-4809-80ac-3bbdd2773163 /               ext4
errors=remount-ro 0       1
# /home was on /dev/sda6 during installation
UUID=eb93a2c4-0190-41fa-a41d-7a5966c6bc47 /home           ext4
defaults        0       2
# /var was on /dev/sda5 during installation
UUID=d1aa6d1f-3ee9-48a8-9350-b15149f738c4 /var            ext4
defaults        0       2
/dev/sr0        /media/cdrom0   udf,iso9660 user,noauto     0       0
/dev/sr1        /media/cdrom1   udf,iso9660 user,noauto     0       0
# raid array
/dev/md0    /mnt/raid6    ext4    defaults    0    2

No LVM, one large EXT4 partition
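
With ext4 directly on the array, one signature worth looking for is the ext4 superblock magic (0xEF53, stored 1 KiB plus 0x38 bytes into the filesystem). Below is a minimal read-only sketch that probes a few candidate data offsets on one member. The candidate offsets (2048 and 262144 sectors) are assumptions rather than values known for this array, and the helper names are made up for illustration.

```python
import struct

EXT4_SB_OFFSET = 1024      # ext4 superblock starts 1 KiB into the filesystem
EXT4_MAGIC_OFFSET = 0x38   # s_magic field within the superblock
EXT4_MAGIC = 0xEF53

# Assumed candidates, in 512-byte sectors; not taken from this array.
CANDIDATE_DATA_OFFSETS = (2048, 262144)


def file_reader(path: str):
    """Return a read_at(offset, length) callable for a device or image."""
    def read_at(offset: int, length: int) -> bytes:
        with open(path, "rb") as f:  # opened read-only
            f.seek(offset)
            return f.read(length)
    return read_at


def ext4_magic_at(read_at, data_offset_sectors: int) -> bool:
    """Check for the ext4 magic where the filesystem would start."""
    pos = data_offset_sectors * 512 + EXT4_SB_OFFSET + EXT4_MAGIC_OFFSET
    raw = read_at(pos, 2)
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT4_MAGIC


def probe(read_at, candidates=CANDIDATE_DATA_OFFSETS) -> list[int]:
    """Return the candidate data offsets at which the ext4 magic appears."""
    return [c for c in candidates if ext4_magic_at(read_at, c)]
```

Something like probe(file_reader("/dev/mapper/<overlay>")) would be run against each member in turn. Only the member that holds the array's first data chunk should show the magic, and since the magic is only two bytes, a hit is a hint rather than proof.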

I have several large files (NEF and various mpg files) that I can
identify and for which I have backup copies available.

I have the overlays created. 300G for each of the six drives.

- Peter


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28 23:24               ` Peter Sanders
@ 2022-08-29 13:12                 ` Peter Sanders
  2022-08-29 21:45                 ` John Stoffel
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Sanders @ 2022-08-29 13:12 UTC (permalink / raw)
  To: John Stoffel; +Cc: Phil Turmel, Wols Lists, NeilBrown, linux-raid

Phil,

Is this the correct findHash: the Python tool on GitHub, pturmel/findHash?

- Peter


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-28 23:24               ` Peter Sanders
  2022-08-29 13:12                 ` Peter Sanders
@ 2022-08-29 21:45                 ` John Stoffel
  2022-08-29 22:29                   ` Eyal Lebedinsky
  1 sibling, 1 reply; 29+ messages in thread
From: John Stoffel @ 2022-08-29 21:45 UTC (permalink / raw)
  To: Peter Sanders
  Cc: John Stoffel, Phil Turmel, Wols Lists, NeilBrown, linux-raid

>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:

Peter> Phil,
Peter> fstab from the working config -

Peter> # <file system> <mount point>   <type>  <options>       <dump>  <pass>
Peter> # / was on /dev/sda1 during installation
Peter> UUID=50976432-b750-4809-80ac-3bbdd2773163 /               ext4
Peter> errors=remount-ro 0       1
Peter> # /home was on /dev/sda6 during installation
Peter> UUID=eb93a2c4-0190-41fa-a41d-7a5966c6bc47 /home           ext4
Peter> defaults        0       2
Peter> # /var was on /dev/sda5 during installation
Peter> UUID=d1aa6d1f-3ee9-48a8-9350-b15149f738c4 /var            ext4
Peter> defaults        0       2
Peter> /dev/sr0        /media/cdrom0   udf,iso9660 user,noauto     0       0
Peter> /dev/sr1        /media/cdrom1   udf,iso9660 user,noauto     0       0
Peter> # raid array
Peter> /dev/md0    /mnt/raid6    ext4    defaults    0    2

Peter> No LVM, one large EXT4 partition

Peter> I have several large files ( NEF and various mpg files) I can identify
Peter> and have backup copies available.

Peter> I have the overlays created. 300G for each of the six drives.

So that's good.  Now you have to try and figure out which order they
were created in.  As the docs show, you set up an overlay on top of
each of the six drives.

Keep track by noting the drive serial numbers, since Linux can move
them around and change drive letters on reboots.
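
One way to keep such notes is to key them by serial number rather than
by drive letter.  The following is only a minimal sketch: the helper
name serial_map and the output file name are made up for illustration,
and it assumes lsblk is available to list NAME and SERIAL columns.

```shell
#!/bin/sh
# Turn "NAME SERIAL" lines (as printed by `lsblk -dn -o NAME,SERIAL`)
# into "SERIAL /dev/NAME" lines, so notes can be keyed by serial number.
# Lines without a serial (for example loop devices) are skipped.
serial_map() {
    awk 'NF >= 2 { print $2, "/dev/" $1 }'
}

# Example with two serials taken from the lsdrv listing in this thread;
# on the live system, feed it from lsblk instead (hypothetical file name):
#   lsblk -dn -o NAME,SERIAL | serial_map > drive-serials.txt
printf 'sda 477ALBNAS\nsdb WD-WCC1T0668790\n' | serial_map
```

Re-running the lsblk line after each reboot shows whether the letters
have moved relative to the saved list.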


Then using the overlays, do an:

     mdadm --create /dev/md0 --level=raid6 -n 6 /dev/sd[bcdefg] 
     fsck -n /dev/md0

and see what you get.  If it doesn't look like a real filesystem, then
you can break it down, and then modify the order you give the drive
letters, like:

	 /dev/sd[cdefge]

and rinse and repeat as it goes.  Not fun... but should hopefully fix
things for you.
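
A hedged sketch of how one candidate order might be written down before
it is tried.  It only prints the command, it does not run it.  The
helper name candidate_cmd is hypothetical, the overlays are assumed to
appear as /dev/mapper/sdX, and --assume-clean is an addition that is
not part of the command above (it tells mdadm to skip the initial
resync).  Chunk size, metadata version and data offset are left at
mdadm's defaults here, which may not match the original array.

```shell
#!/bin/sh
# Dry-run helper: print (do not run) an mdadm create command for one
# candidate device order.  Arguments are drive letters in the order to
# try, for example: b c d e f g
candidate_cmd() {
    count=$#
    devs=""
    for name in "$@"; do
        # Assumption: each overlay is exposed as /dev/mapper/sd<letter>.
        devs="$devs /dev/mapper/sd$name"
    done
    # --assume-clean is an assumption added for this sketch.
    echo "mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=$count$devs"
}

candidate_cmd b c d e f g
candidate_cmd c d e f g b
```

Listing the devices one by one keeps the order explicit, and each
printed line can be saved next to the fsck result for that attempt.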

John


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-29 21:45                 ` John Stoffel
@ 2022-08-29 22:29                   ` Eyal Lebedinsky
  2022-08-29 23:53                     ` Peter Sanders
  0 siblings, 1 reply; 29+ messages in thread
From: Eyal Lebedinsky @ 2022-08-29 22:29 UTC (permalink / raw)
  To: linux-raid


On 30/08/2022 07.45, John Stoffel wrote:
>>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
> 
> Peter> Phil,
> Peter> fstab from the working config -
> 
> Peter> # <file system> <mount point>   <type>  <options>       <dump>  <pass>
> Peter> # / was on /dev/sda1 during installation
> Peter> UUID=50976432-b750-4809-80ac-3bbdd2773163 /               ext4
> Peter> errors=remount-ro 0       1
> Peter> # /home was on /dev/sda6 during installation
> Peter> UUID=eb93a2c4-0190-41fa-a41d-7a5966c6bc47 /home           ext4
> Peter> defaults        0       2
> Peter> # /var was on /dev/sda5 during installation
> Peter> UUID=d1aa6d1f-3ee9-48a8-9350-b15149f738c4 /var            ext4
> Peter> defaults        0       2
> Peter> /dev/sr0        /media/cdrom0   udf,iso9660 user,noauto     0       0
> Peter> /dev/sr1        /media/cdrom1   udf,iso9660 user,noauto     0       0
> Peter> # raid array
> Peter> /dev/md0    /mnt/raid6    ext4    defaults    0    2
> 
> Peter> No LVM, one large EXT4 partition
> 
> Peter> I have several large files ( NEF and various mpg files) I can identify
> Peter> and have backup copies available.
> 
> Peter> I have the overlays created. 300G for each of the six drives.
> 
> So that's good.  Now you have to try and figure out which order they
> were created in.  As the docs show, you set up an overlay on top of
> each of the six drives.
> 
> Keep track by noting the drive serial numbers, since Linux can move
> them around and change drive letters on reboots.
> 
> 
> Then using the overlays, do an:
> 
>       mdadm --create /dev/md0 --level=raid6 -n 6 /dev/sd[bcdefg]
>       fsck -n /dev/md0
> 
> and see what you get.  If it doesn't look like a real filesystem, then
> you can break it down, and then modify the order you give the drive
> letters, like:
> 
> 	 /dev/sd[cdefge]
> 
> and rinse and repeat as it goes.  Not fun... but should hopefully fix
> things for you.
> 
> John

An aside, I would think the way to specify a list in a nominated order is something like

$ echo /dev/sd{c,d,a,b}
/dev/sdc /dev/sdd /dev/sda /dev/sdb

rather than

$ echo /dev/sd[cdab]
/dev/sda /dev/sdb /dev/sdc /dev/sdd

which expands in sorted order, regardless of the order of the letters.

-- 
Eyal Lebedinsky (fedora@eyal.emu.id.au)


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-29 22:29                   ` Eyal Lebedinsky
@ 2022-08-29 23:53                     ` Peter Sanders
  2022-08-30 13:27                       ` Peter Sanders
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Sanders @ 2022-08-29 23:53 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: linux-raid

Couple more questions.

For mdadm --create ... do I use the /dev/sdX or the /dev/mapper/sdX name
for the overlaid device?

And do I reset the mapping between create attempts by doing the following?
remove the loop-device/overlay association
  dmsetup remove on all devices
remove the overlay files
  rm
remove the loopback devices
  losetup -d ...
rebuild the loopback devices
  mknod -m 660 ...
build the overlay files
  truncate -s 300G overlay-...
reassociate the loop devices and the overlays
  losetup... dmsetup..

and try again.

(Yeah, I recognize that there is code to do this (I think) in the
article, but my script-fu is not up to fully understanding those
examples.)
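
The reset steps listed above could be sketched roughly as follows, for
one drive at a time.  This is a sketch under assumptions, not the wiki's
script: the helper name reset_overlay_plan is made up, the overlay file
is assumed to be called overlay-<name> in the current directory, the
mapping is assumed to be named after the drive, and the snapshot table
("P 8") follows the usual device-mapper snapshot form.  It only prints
the commands so they can be reviewed first, and it uses losetup -f in
place of the mknod step.

```shell
#!/bin/sh
# Print the commands that would discard and rebuild one overlay.
# Nothing is executed; review the output before running any of it.
# $1 = drive name (e.g. sdb), $2 = overlay size (e.g. 300G)
reset_overlay_plan() {
    name=$1
    size=$2
    cat <<EOF
dmsetup remove $name
losetup -d \$(losetup -j overlay-$name | cut -d: -f1)
rm overlay-$name
truncate -s $size overlay-$name
loop=\$(losetup -f --show overlay-$name)
echo "0 \$(blockdev --getsz /dev/$name) snapshot /dev/$name \$loop P 8" | dmsetup create $name
EOF
}

reset_overlay_plan sdb 300G
```

Any trial array built on the overlays has to be stopped first (mdadm
--stop /dev/md0), otherwise the mappings are still in use and cannot
be removed.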


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-29 23:53                     ` Peter Sanders
@ 2022-08-30 13:27                       ` Peter Sanders
  2022-08-30 18:03                         ` Wols Lists
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Sanders @ 2022-08-30 13:27 UTC (permalink / raw)
  To: Eyal Lebedinsky; +Cc: linux-raid

Tried with the /dev/mapper/sdx devices.

root@superior:/mnt/backup# mdadm --create /dev/md0 --level=raid6 -n 6
/dev/mapper/sdb  /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde
/dev/mapper/sdf /dev/mapper/sdg
mdadm: partition table exists on /dev/mapper/sdb
mdadm: partition table exists on /dev/mapper/sdc
mdadm: partition table exists on /dev/mapper/sdc but will be lost or
       meaningless after creating array
mdadm: partition table exists on /dev/mapper/sdd
mdadm: partition table exists on /dev/mapper/sdd but will be lost or
       meaningless after creating array
mdadm: partition table exists on /dev/mapper/sde
mdadm: partition table exists on /dev/mapper/sde but will be lost or
       meaningless after creating array
mdadm: partition table exists on /dev/mapper/sdf
mdadm: partition table exists on /dev/mapper/sdf but will be lost or
       meaningless after creating array
mdadm: partition table exists on /dev/mapper/sdg
mdadm: partition table exists on /dev/mapper/sdg but will be lost or
       meaningless after creating array
Continue creating array? n
mdadm: create aborted.
root@superior:/mnt/backup#

Chickened out and aborted the create.
Are those expected messages for the mess I am in?

And the victory conditions would be a mountable file system that passes a fsck?



* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-30 13:27                       ` Peter Sanders
@ 2022-08-30 18:03                         ` Wols Lists
  2022-08-31 17:48                           ` Peter Sanders
  0 siblings, 1 reply; 29+ messages in thread
From: Wols Lists @ 2022-08-30 18:03 UTC (permalink / raw)
  To: Peter Sanders, Eyal Lebedinsky; +Cc: linux-raid

On 30/08/2022 14:27, Peter Sanders wrote:
> 
> And the victory conditions would be a mountable file system that passes a fsck?

Yes. Just make sure you delve through the file system a bit and satisfy 
yourself it looks good, too ...

Cheers,
Wol

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-30 18:03                         ` Wols Lists
@ 2022-08-31 17:48                           ` Peter Sanders
  2022-08-31 20:37                             ` John Stoffel
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Sanders @ 2022-08-31 17:48 UTC (permalink / raw)
  To: Wols Lists; +Cc: Eyal Lebedinsky, linux-raid

encountering a puzzling situation.

dmsetup is failing to return.

root@superior:/mnt/backup# dmsetup status
sdg: 0 5860533168 snapshot 16/8388608000 16
sdf: 0 5860533168 snapshot 16/8388608000 16
sde: 0 5860533168 snapshot 16/8388608000 16
sdd: 0 5860533168 snapshot 16/8388608000 16
sdc: 0 5860533168 snapshot 16/8388608000 16
sdb: 0 5860533168 snapshot 16/8388608000 16
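
For anyone decoding the status lines above: per the device-mapper snapshot documentation, the fields after "snapshot" are <sectors_allocated>/<total_sectors> followed by <metadata_sectors>. The sketch below is only an illustration. The helper name snapshot_usage is made up and is not part of the thread, and it assumes 512-byte sectors.

```shell
#!/bin/sh
# snapshot_usage: hypothetical helper, reads `dmsetup status` lines on stdin.
# Expected input: "<name>: <start> <length> snapshot <alloc>/<total> <meta>"
snapshot_usage() {
    awk '$4 == "snapshot" {
        name = substr($1, 1, length($1) - 1)   # drop the trailing colon
        split($5, cow, "/")                    # cow[1]=allocated, cow[2]=total
        gib = cow[2] / 2097152                 # 2097152 sectors of 512 B per GiB
        printf "%s: %s of %s COW sectors allocated (%s metadata), overlay %d GiB\n",
               name, cow[1], cow[2], $6, gib
    }'
}

# Sample line copied from the output above.
echo "sdg: 0 5860533168 snapshot 16/8388608000 16" | snapshot_usage
```

Read this way, 16 allocated sectors against 16 metadata sectors suggests nothing has been written through the overlay yet, and 8388608000 sectors matches a 4000G overlay file.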

dmsetup remove sdg ran for hours.
I cancelled it, ran dmsetup ls --tree, and found that sdg is no longer in the list.

dmsetup status shows:
sdf: 0 5860533168 snapshot 16/8388608000 16
sde: 0 5860533168 snapshot 16/8388608000 16
sdd: 0 5860533168 snapshot 16/8388608000 16
sdc: 0 5860533168 snapshot 16/8388608000 16
sdb: 0 5860533168 snapshot 16/8388608000 16

dmsetup ls --tree
root@superior:/mnt/backup# dmsetup ls --tree
sdf (253:3)
 ├─ (7:3)
 └─ (8:80)
sde (253:1)
 ├─ (7:1)
 └─ (8:64)
sdd (253:2)
 ├─ (7:2)
 └─ (8:48)
sdc (253:0)
 ├─ (7:0)
 └─ (8:32)
sdb (253:5)
 ├─ (7:5)
 └─ (8:16)

any suggestions?




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-31 17:48                           ` Peter Sanders
@ 2022-08-31 20:37                             ` John Stoffel
  2022-09-02 14:56                               ` Peter Sanders
  0 siblings, 1 reply; 29+ messages in thread
From: John Stoffel @ 2022-08-31 20:37 UTC (permalink / raw)
  To: Peter Sanders; +Cc: Wols Lists, Eyal Lebedinsky, linux-raid

>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:

> encountering a puzzling situation.
> dmsetup is failing to return.

I don't think you need to use dmsetup in your case, but can you post
*all* the commands you ran before you got to this point, and the
output of 

       cat /proc/mdstat

as well?  Thinking on this some more, you might need to actually also
add:

	--assume-clean

to the 'mdadm --create ....' string, since you don't want it to kick off
an initial resync that rewrites parity across the array.

Sorry for not remembering this at the time!

So if you can, please just start over from scratch, showing the setup
of the loop devices, the overlay setup, and the building of the RAID6
array, along with the cat /proc/mdstat after you do the initial build.

John

P.S.  For those who hated my email citing tool, I pulled it out for
now.  Only citing with > now.  :-)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: RAID 6, 6 device array - all devices lost superblock
  2022-08-31 20:37                             ` John Stoffel
@ 2022-09-02 14:56                               ` Peter Sanders
  2022-09-02 18:52                                 ` Peter Sanders
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Sanders @ 2022-09-02 14:56 UTC (permalink / raw)
  To: John Stoffel; +Cc: Wols Lists, Eyal Lebedinsky, linux-raid

contents of /proc/mdstat

root@superior:/mnt/backup# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
unused devices: <none>
root@superior:/mnt/backup#



Here are the steps I ran (minus some mounting other devices and
looking around for mdadm tracks on the old os disk)

  410  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
--colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
echo /dev/{1})
  411  apt install parallel
  412  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
--colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
echo /dev/{1})
  413  echo $DEVICES
  414  cat /proc/partitions
  415  DEVICES=/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
  416  DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
  417  echo $DEVICES
  418  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
{#}' ::: $DEVICES
  419  ls /dev/loop*
  420  dc
  421  cd /mnt/backup/
  422  ls
  423  parallel truncate -s300G overlay-{/} ::: $DEVICES
  424  ls
  425  ls -la
  426  df -h
  427  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
--show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
create {/}' ::: $DEVICES
  428  ls /dev/mapper/
  429  OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
  430  echo $OVERLAYS
  431  dmsetup status
  432  mdadm --assemble --force /dev/md1 $OVERLAYS
  433  history
  434  dmsetup status
  435  echo $OVERLAYS
  436  mdadm --assemble --force /dev/md0 $OVERLAYS
  437  cat /proc/partitions
  438  mkdir /mnt/oldroot
  << look for initrd mdadm files >>
  484  echo $OVERLAYS
  485  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
/dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
/dev/mapper/sdg
  << cancelled out of 485, review instructions... >>
  486  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
/dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
/dev/mapper/sdg
  487  fsck -n /dev/md0
  488  mdadm --stop /dev/md0
  489  echo $DEVICES
  490   parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
  491  dmsetup status
  492  ls
  493  rm overlay-*
  494  ls
  495  parallel losetup -d ::: /dev/loop[0-9]*
  496  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
{#}' ::: $DEVICES
  497  parallel truncate -s300G overlay-{/} ::: $DEVICES
  498  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
--show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
create {/}' ::: $DEVICES
  499  dmsetup status
  500  /sbin/reboot
  501  history
  502  dmsetup status
  503  mount
  504  cat /proc/partitions
  505  nano /etc/fstab
  506  mount /mnt/backup/
  507  ls /mnt/backup/
  508  rm /mnt/backup/
  509  rm /mnt/backup/overlay-sd*
  510  emacs setupOverlay &
  511  ps auxww | grep emacs
  512  kill 65017
  513  ls /dev/loo*
  514  DEVICES='/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg'
  515  echo $DEVICES
  516   parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b
7 {#}' ::: $DEVICES
  517  ls /dev/loo*
  518  parallel truncate -s4000G overlay-{/} ::: $DEVICES
  519  ls
  520  rm overlay-sd*
  521  cd /mnt/bak
  522  cd /mnt/backup/
  523  ls
  524  parallel truncate -s4000G overlay-{/} ::: $DEVICES
  525  ls -la
  526  blockdev --getsize /dev/sdb
  527  man losetup
  528  man losetup
  529  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
--show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
create {/}' ::: $DEVICES
  530  dmsetup status
  531  history | grep mdadm
  532  history
  533  dmsetup status
  534  history | grep dmsetup
  535  dmsetup status
  536  dmsetup remove sdg
  537  dmsetup ls --tree
  538  lsof
  539  dmsetup ls --tre
  540  dmsetup ls --tree
  541  lsof | grep -i sdg
  542  lsof | grep -i sdf
  543  history |grep dmsetup | less
  544  dmsetup status
  545  history > ~plsander/Documents/raidIssues/joblog
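
The parallel one-liners in steps 423 and 427 can be hard to read. As a rough dry-run sketch of what they do for each drive (the function name overlay_plan is made up for this illustration, and it only prints commands, it runs nothing):

```shell
#!/bin/sh
# overlay_plan: hypothetical dry-run helper. For each drive it prints the
# commands that steps 423 and 427 run through GNU parallel. Nothing is executed.
overlay_plan() {
    for dev in "$@"; do
        name=${dev##*/}    # /dev/sdb -> sdb, like {/} in parallel
        # sparse copy-on-write file that will absorb all writes
        printf '%s\n' "truncate -s300G overlay-$name"
        # drive length in 512-byte sectors
        printf '%s\n' "size=\$(blockdev --getsize $dev)"
        # attach the sparse file to the first free loop device
        printf '%s\n' "loop=\$(losetup -f --show -- overlay-$name)"
        # dm table: start, length, target, origin, COW device, P(ersistent), chunk size
        printf '%s\n' "echo 0 \$size snapshot $dev \$loop P 8 | dmsetup create $name"
    done
}

overlay_plan /dev/sdb /dev/sdc
```

The resulting /dev/mapper/sdX device reads from the real drive and sends writes to the overlay file, which is why mdadm experiments on the mapper devices should leave the drives untouched.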


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: RAID 6, 6 device array - all devices lost superblock
  2022-09-02 14:56                               ` Peter Sanders
@ 2022-09-02 18:52                                 ` Peter Sanders
  2022-09-02 19:12                                   ` John Stoffel
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Sanders @ 2022-09-02 18:52 UTC (permalink / raw)
  To: John Stoffel; +Cc: Wols Lists, Eyal Lebedinsky, linux-raid

Question on restarting from scratch...

How do I reset to the starting point?
dmsetup seems to hang on both remove and create of the overlays.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: RAID 6, 6 device array - all devices lost superblock
  2022-09-02 18:52                                 ` Peter Sanders
@ 2022-09-02 19:12                                   ` John Stoffel
  2022-09-03  0:39                                     ` Peter Sanders
  0 siblings, 1 reply; 29+ messages in thread
From: John Stoffel @ 2022-09-02 19:12 UTC (permalink / raw)
  To: Peter Sanders; +Cc: John Stoffel, Wols Lists, Eyal Lebedinsky, linux-raid

>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:

Peter, please include the output of all the commands, not just the
commands themselves.  See my comments below.
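
One way to capture each command together with its output is sketched below. The run_logged wrapper and the log file name recovery.log are assumptions made for this illustration, not something from the wiki.

```shell
#!/bin/sh
# run_logged: hypothetical wrapper. Records the command line, then its
# stdout and stderr, in $LOG while still showing them on the terminal.
LOG=${LOG:-recovery.log}   # assumed log file name

run_logged() {
    printf '+ %s\n' "$*" | tee -a "$LOG"
    "$@" 2>&1 | tee -a "$LOG"
}

run_logged cat /proc/mdstat
```

The util-linux script command is another option if you would rather record the whole terminal session.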


> Question on restarting from scratch...
> How to reset to the starting point?

I think you need to blow away the loop devices and re-create them.  

Or at least blow away the dmsetup devices you just created.  

It might be quickest to just reboot.  What OS are you using for the
recovery?  Is it a recent live image?  Sorry for asking so many
questions... some of this is new to me too.
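
As a sketch of that "blow away" sequence, assuming the array is /dev/md0 and the overlays were built as in your step 427, the teardown runs in the reverse order of the setup. The helper name teardown_plan is hypothetical and it only prints the commands:

```shell
#!/bin/sh
# teardown_plan: hypothetical dry-run helper, prints commands without running them.
# Usage: teardown_plan <md device> <dm name>:<loop device> ...
teardown_plan() {
    md=$1; shift
    echo "mdadm --stop $md"                 # release the overlays first
    for pair in "$@"; do
        name=${pair%%:*}                    # e.g. sdb
        loop=${pair#*:}                     # e.g. /dev/loop1
        echo "dmsetup remove $name"         # drop the snapshot mapping
        echo "losetup -d $loop"             # detach the COW file
        echo "rm overlay-$name"             # discard the recorded writes
    done
}

teardown_plan /dev/md0 sdb:/dev/loop1 sdc:/dev/loop2
```

Which loop device backs which overlay is an assumption in this example; losetup -a lists the real pairing.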


> dmsetup, both for remove and create of the overlay seems to be hanging.

> On Fri, Sep 2, 2022 at 10:56 AM Peter Sanders <plsander@gmail.com> wrote:
>> 
>> contents of /proc/mdstat
>> 
>> root@superior:/mnt/backup# cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> [raid4] [raid10]
>> unused devices: <none>
>> root@superior:/mnt/backup#
>> 
>> 
>> 
>> Here are the steps I ran (minus some mounting other devices and
>> looking around for mdadm tracks on the old os disk)
>> 
>> 410  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
>> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
>> echo /dev/{1})
>> 411  apt install parallel
>> 412  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
>> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
>> echo /dev/{1})
>> 413  echo $DEVICES

So you found no MD RAID super blocks on any of the base devices.  You
can skip this step moving forward. 

>> 414  cat /proc/partitions
>> 415  DEVICES=/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
>> 416  DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
>> 417  echo $DEVICES
>> 418  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
>> {#}' ::: $DEVICES
>> 419  ls /dev/loop*

Can you show the output of all these commands, not just the commands please?

>> 423  parallel truncate -s300G overlay-{/} ::: $DEVICES

>> 427  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
>> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
>> create {/}' ::: $DEVICES
>> 428  ls /dev/mapper/

This is some key output to view.

>> 429  OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
>> 430  echo $OVERLAYS

What are the overlays?  

>> 431  dmsetup status

What did this command show?

>> 432  mdadm --assemble --force /dev/md1 $OVERLAYS

And here is where I think you need to use the 'create' command with
--assume-clean instead.  It's not going to assemble anything because
the superblock info was wiped.  I *think* you really want:

   mdadm --create /dev/md1 --level=raid6 -n 6 --assume-clean $OVERLAYS

And once you do this above command and it comes back, do:

    cat /proc/mdstat

and show all the output please!  

>> 433  history
>> 434  dmsetup status
>> 435  echo $OVERLAYS
>> 436  mdadm --assemble --force /dev/md0 $OVERLAYS
>> 437  cat /proc/partitions
>> 438  mkdir /mnt/oldroot
>> << look for inird mdadm files >>
>> 484  echo $OVERLAYS
>> 485  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
>> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
>> /dev/mapper/sdg

I'm confused here, what is the difference between the md1 you tried to
assemble above, and the md0 you're creating here?

>> << cancelled out of 485, review instructions... >>
>> 486  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
>> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
>> /dev/mapper/sdg
>> 487  fsck -n /dev/md0

And what output did you get here?  Did it find a filesystem?  You might want
to try:

   blkid /dev/md0  
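
A cheaper first filter before fsck, offered only as a sketch: ext2/3/4 store the magic number 0xEF53 at byte offset 1080 of the filesystem, so you can look for those two bytes directly. The helper name has_ext_magic is made up for this example.

```shell
#!/bin/sh
# has_ext_magic: hypothetical quick check, not a replacement for fsck -n.
# ext2/3/4 keep the 16-bit magic 0xEF53 (little-endian: 53 ef) at byte
# offset 1080, i.e. 56 bytes into the superblock that starts at byte 1024.
has_ext_magic() {
    magic=$(dd if="$1" bs=1 skip=1080 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "53ef" ]
}

# Example (read-only): report on a device or image file given as argument.
if [ -n "${1:-}" ]; then
    if has_ext_magic "$1"; then
        echo "ext magic found on $1"
    else
        echo "no ext magic on $1"
    fi
fi
```

As far as I understand it, those bytes sit in the very first chunk of the array, so a match mostly says the first device and the data offset look plausible. It says little about the order of the other five, so fsck -n and browsing real files are still needed.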


>> 488  mdadm --stop /dev/md0
>> 489  echo $DEVICES
>> 490   parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
>> 491  dmsetup status

This all worked properly?  No errors?

I gave up after this because it's not clear what the results really
are.  If you don't find a filesystem that fscks cleanly, then you
should just need to stop the array, then re-create it but shuffle the
order of the devices.

Instead of the disks in the order "sdb sdc sdd ... sdN", you would try the
order "sdc sdd ... sdN sdb".  See how I moved sdb to the end of the
list of devices?  With six disks, you have 6 factorial (720) orderings
to try, which is a lot of options to go through, and why you need to
automate this more.  But also keep a log and show the output!
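
A minimal sketch of that automation, assuming the overlays are named /dev/mapper/sdb through /dev/mapper/sdg as above. The helper names permutations and candidate_commands are hypothetical, and the script only prints candidate command lines, it does not run mdadm:

```shell
#!/bin/sh
# permutations: print every ordering of the arguments, one per line.
permutations() {
    printf '%s\n' "$@" | awk '
        { item[NR] = $0 }
        END { n = NR; permute("", 0) }
        function permute(prefix, depth,    i) {
            if (depth == n) { print substr(prefix, 2); return }
            for (i = 1; i <= n; i++) {
                if (!(i in used)) {
                    used[i] = 1
                    permute(prefix " " item[i], depth + 1)
                    delete used[i]
                }
            }
        }'
}

# candidate_commands: hypothetical dry-run helper. Prints one mdadm --create
# line per ordering of the overlay devices; it does not run mdadm.
candidate_commands() {
    permutations "$@" | while read -r order; do
        devs=""
        for d in $order; do devs="$devs /dev/mapper/$d"; done
        echo "mdadm --create /dev/md0 --level=raid6 -n $# --assume-clean$devs"
    done
}

# Show only the first three of the 720 candidates.
candidate_commands sdb sdc sdd sde sdf sdg | head -n 3
```

Each real attempt would still need a fsck -n and an mdadm --stop before the next one. As far as I know the chunk size, metadata version and data offset also have to match the original array, which this sketch does not try to guess.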

John



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: RAID 6, 6 device array - all devices lost superblock
  2022-09-02 19:12                                   ` John Stoffel
@ 2022-09-03  0:39                                     ` Peter Sanders
  2022-09-03  5:51                                       ` Peter Sanders
  2022-09-05 19:25                                       ` John Stoffel
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Sanders @ 2022-09-03  0:39 UTC (permalink / raw)
  To: John Stoffel; +Cc: Wols Lists, Eyal Lebedinsky, linux-raid

Repeat of run 1

plsander@superior:~$ su -
Password:
root@superior:~# cat /proc/partitions
major minor  #blocks  name

 259        0  250059096 nvme0n1
 259        1     496640 nvme0n1p1
 259        2          1 nvme0n1p2
 259        3   63475712 nvme0n1p5
 259        4   97654784 nvme0n1p6
 259        5      37888 nvme0n1p7
 259        6   86913024 nvme0n1p8
 259        7    1474560 nvme0n1p9
   8       16 2930266584 sdb
   8       80 2930266584 sdf
   8        0 1953514584 sda
   8        1 1953513472 sda1
   8       32 2930266584 sdc
   8       96 2930266584 sdg
   8       64 2930266584 sde
   8       48 2930266584 sdd
  11        0    1048575 sr0
root@superior:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
unused devices: <none>
root@superior:~# DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
root@superior:~# echo $DEVICES
/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
root@superior:~# parallel 'test -e /dev/loop{#} || mknod -m 660
/dev/loop{#} b 7 {#}' ::: $DEVICES
root@superior:~# ls /dev/lo
log           loop2         loop4         loop6
loop1         loop3         loop5         loop-control
root@superior:~# ls /dev/lo*
/dev/log  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5
/dev/loop6  /dev/loop-control
root@superior:~# ls -l /dev/loop*
brw-rw---- 1 root root  7,   1 Sep  2 20:30 /dev/loop1
brw-rw---- 1 root root  7,   2 Sep  2 20:30 /dev/loop2
brw-rw---- 1 root root  7,   3 Sep  2 20:30 /dev/loop3
brw-rw---- 1 root root  7,   4 Sep  2 20:30 /dev/loop4
brw-rw---- 1 root root  7,   5 Sep  2 20:30 /dev/loop5
brw-rw---- 1 root root  7,   6 Sep  2 20:30 /dev/loop6
crw-rw---- 1 root disk 10, 237 Sep  2 20:22 /dev/loop-control
root@superior:~# cd /mnt/backup/
root@superior:/mnt/backup# parallel truncate -s4000G overlay-{/} ::: $DEVICES
root@superior:/mnt/backup# ls -l
total 16
drwx------ 2 root root         16384 Aug 28 18:50 lost+found
-rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdb
-rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdc
-rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdd
-rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sde
-rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdf
-rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdg
root@superior:/mnt/backup# rm over*
root@superior:/mnt/backup# parallel truncate -s300G overlay-{/} ::: $DEVICES
root@superior:/mnt/backup# ls -la
total 24
drwxr-xr-x 3 root root         4096 Sep  2 20:31 .
drwxr-xr-x 7 root root         4096 Aug 29 09:17 ..
drwx------ 2 root root        16384 Aug 28 18:50 lost+found
-rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdb
-rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdc
-rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdd
-rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sde
-rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdf
-rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdg
root@superior:/mnt/backup# dmsetup status
No devices found
root@superior:/mnt/backup# date
Fri 02 Sep 2022 08:32:11 PM EDT
root@superior:/mnt/backup# parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES
root@superior:/mnt/backup# date
Fri 02 Sep 2022 08:32:20 PM EDT
root@superior:/mnt/backup# dmsetup status
sdg: 0 5860533168 snapshot 16/629145600 16
sdf: 0 5860533168 snapshot 16/629145600 16
sde: 0 5860533168 snapshot 16/629145600 16
sdd: 0 5860533168 snapshot 16/629145600 16
sdc: 0 5860533168 snapshot 16/629145600 16
sdb: 0 5860533168 snapshot 16/629145600 16
root@superior:/mnt/backup# OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
root@superior:/mnt/backup# echo $OVERLAYS
/dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde
/dev/mapper/sdf /dev/mapper/sdg
root@superior:/mnt/backup# mdadm --create /dev/md1 --level=raid6 -n 6 --assume-clean $OVERLAYS
mdadm: partition table exists on /dev/mapper/sdb
mdadm: partition table exists on /dev/mapper/sdc
mdadm: partition table exists on /dev/mapper/sdc but will be lost or
       meaningless after creating array
mdadm: partition table exists on /dev/mapper/sdd
mdadm: partition table exists on /dev/mapper/sdd but will be lost or
       meaningless after creating array
mdadm: partition table exists on /dev/mapper/sde
mdadm: partition table exists on /dev/mapper/sde but will be lost or
       meaningless after creating array
mdadm: partition table exists on /dev/mapper/sdf
mdadm: partition table exists on /dev/mapper/sdf but will be lost or
       meaningless after creating array
mdadm: partition table exists on /dev/mapper/sdg
mdadm: partition table exists on /dev/mapper/sdg but will be lost or
       meaningless after creating array
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
root@superior:/mnt/backup# ls -l /dev/md*
brw-rw---- 1 root disk 9, 1 Sep  2 20:34 /dev/md1
root@superior:/mnt/backup# fsck /dev/md1
fsck from util-linux 2.36.1
e2fsck 1.46.2 (28-Feb-2021)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/md1

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

root@superior:/mnt/backup# blkid /dev/md1
root@superior:/mnt/backup#
root@superior:/mnt/backup# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid6 dm-3[5] dm-2[4] dm-1[3] dm-5[2] dm-0[1] dm-4[0]
      11720536064 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

unused devices: <none>
root@superior:/mnt/backup#
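
Since fsck and blkid find no filesystem with this device order, the next
runs will have to try other orders, as John suggests in the quoted reply
below. A minimal sketch of enumerating them, assuming bash and the same six
/dev/mapper names as above (the permute helper is hypothetical, it is not
part of mdadm or of anything posted in this thread):

```shell
#!/bin/bash
# Sketch: print every ordering of the six overlay devices, one per line.
OVERLAYS="/dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf /dev/mapper/sdg"

permute() {
    # $1 = ordering built so far; remaining arguments = devices not yet placed
    local prefix=$1
    shift
    if [ $# -eq 0 ]; then
        echo "$prefix"
        return
    fi
    local i
    for ((i = 1; i <= $#; i++)); do
        # Place the i-th remaining device next and recurse on the others.
        permute "${prefix:+$prefix }${!i}" "${@:1:i-1}" "${@:i+1}"
    done
}

permute "" $OVERLAYS
```

Each output line is one candidate device list for
"mdadm --create /dev/md1 --level=raid6 -n 6 --assume-clean", to be followed
by "fsck -n /dev/md1" and "mdadm --stop /dev/md1". Six devices give 720 lines.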

Some questions -
- is the easiest 'reset for next run' to reboot and rebuild?
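
For reference, a reset without a reboot would amount to undoing the setup in
reverse order. A minimal dry-run sketch, assuming the device, array and
directory names from the session above; the run and teardown helpers are
hypothetical, the script only prints commands unless RUN=1 is set, and it is
not a confirmed fix for dmsetup hanging:

```shell
#!/bin/sh
# Sketch only: print the teardown steps in order; set RUN=1 to execute them.
DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
ARRAY=/dev/md1
OVERLAY_DIR=/mnt/backup

run() {
    # Dry run by default: echo the command instead of running it.
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

teardown() {
    # 1. Stop the array so nothing holds the dm devices open.
    run mdadm --stop "$ARRAY"
    # 2. Remove each snapshot target (named after its disk), one at a time.
    for dev in $DEVICES; do
        run dmsetup remove "${dev##*/}"
    done
    # 3. Detach ALL loop devices. Assumption: nothing else on this machine
    #    uses loop devices; otherwise detach them individually with -d.
    run losetup -D
    # 4. Delete the overlay files so the next run starts from empty ones.
    for dev in $DEVICES; do
        run rm -f "$OVERLAY_DIR/overlay-${dev##*/}"
    done
}

teardown
```

Running the steps one after another instead of through parallel at least
shows which step stalls, if one does.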


On Fri, Sep 2, 2022 at 3:12 PM John Stoffel <john@stoffel.org> wrote:
>
> >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>
> Peter, please include the output of all the commands, not just the
> commands themselves.  See my comments below.
>
>
> > Question on restarting from scratch...
> > How to reset to the starting point?
>
> I think you need to blow away the loop devices and re-create them.
>
> Or at least blow away the dmsetup devices you just created.
>
> It might be quickest to just reboot.  What OS are you using for the
> recovery?  Is it a recent live image?  Sorry for asking so many
> questions... some of this is new to me too.
>
>
> > dmsetup, both for remove and create of the overlay seems to be hanging.
>
> > On Fri, Sep 2, 2022 at 10:56 AM Peter Sanders <plsander@gmail.com> wrote:
> >>
> >> contents of /proc/mdstat
> >>
> >> root@superior:/mnt/backup# cat /proc/mdstat
> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >> [raid4] [raid10]
> >> unused devices: <none>
> >> root@superior:/mnt/backup#
> >>
> >>
> >>
> >> Here are the steps I ran (minus some mounting other devices and
> >> looking around for mdadm tracks on the old os disk)
> >>
> >> 410  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
> >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
> >> echo /dev/{1})
> >> 411  apt install parallel
> >> 412  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
> >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
> >> echo /dev/{1})
> >> 413  echo $DEVICES
>
> So you found no MD RAID super blocks on any of the base devices.  You
> can skip this step moving forward.
>
> >> 414  cat /proc/partitions
> >> 415  DEVICES=/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> >> 416  DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
> >> 417  echo $DEVICES
> >> 418  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
> >> {#}' ::: $DEVICES
> >> 419  ls /dev/loop*
>
> Can you show the output of all these commands, not just the commands please?
>
> >> 423  parallel truncate -s300G overlay-{/} ::: $DEVICES
>
> >> 427  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
> >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
> >> create {/}' ::: $DEVICES
> >> 428  ls /dev/mapper/
>
> This is some key output to view.
>
> >> 429  OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
> >> 430  echo $OVERLAYS
>
> What are the overlays?
>
> >> 431  dmsetup status
>
> What did this command show?
>
> >> 432  mdadm --assemble --force /dev/md1 $OVERLAYS
>
> And here is where I think you need to put --assume-clean when using
> 'create' command instead.  It's not going to assemble anything because
> the info was wiped.  I *think* you really want:
>
>    mdadm --create /dev/md1 --level=raid6 -n 6 --assume-clean $OVERLAYS
>
> And once you do this above command and it comes back, do:
>
>     cat /proc/mdstat
>
> and show all the output please!
>
> >> 433  history
> >> 434  dmsetup status
> >> 435  echo $OVERLAYS
> >> 436  mdadm --assemble --force /dev/md0 $OVERLAYS
> >> 437  cat /proc/partitions
> >> 438  mkdir /mnt/oldroot
> >> << look for initrd mdadm files >>
> >> 484  echo $OVERLAYS
> >> 485  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
> >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
> >> /dev/mapper/sdg
>
> I'm confused here, what  is the difference between the md1 you
> assembled above, and the md0 you're doing here?
>
> >> << cancelled out of 485, review instructions... >>
> >> 486  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
> >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
> >> /dev/mapper/sdg
> >> 487  fsck -n /dev/md0
>
> And what output did you get here?  Did it find a filesystem?  You might want
> to try:
>
>    blkid /dev/md0
>
>
> >> 488  mdadm --stop /dev/md0
> >> 489  echo $DEVICES
> >> 490   parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
> >> 491  dmsetup status
>
> This all worked properly?  No errors?
>
> I gave up after this because it's not clear what the results really
> are.  If you don't find a filesystem that fsck's cleanly, then you
> should just need to stop the array, then re-create it but shuffle the
> order of the devices.
>
> Instead of disk in order of "sdb sdc sdd... sdN", you would try the
> order "sdc sdd ... sdN sdb".   See how I moved sdb to the end of the
> list of devices?  With six disks, you have I think 6 factorial options
> to try.   Which is a lot of options to go through, and why you need to
> automate this more.  But also keep a log and show the output!
>
> John
>
>
> >> 492  ls
> >> 493  rm overlay-*
> >> 494  ls
> >> 495  parallel losetup -d ::: /dev/loop[0-9]*
> >> 496  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
> >> {#}' ::: $DEVICES
> >> 497  parallel truncate -s300G overlay-{/} ::: $DEVICES
> >> 498  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
> >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
> >> create {/}' ::: $DEVICES
> >> 499  dmsetup status
> >> 500  /sbin/reboot
> >> 501  history
> >> 502  dmsetup status
> >> 503  mount
> >> 504  cat /proc/partitions
> >> 505  nano /etc/fstab
> >> 506  mount /mnt/backup/
> >> 507  ls /mnt/backup/
> >> 508  rm /mnt/backup/
> >> 509  rm /mnt/backup/overlay-sd*
> >> 510  emacs setupOverlay &
> >> 511  ps auxww | grep emacs
> >> 512  kill 65017
> >> 513  ls /dev/loo*
> >> 514  DEVICES='/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg'
> >> 515  echo $DEVICES
> >> 516   parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b
> >> 7 {#}' ::: $DEVICES
> >> 517  ls /dev/loo*
> >> 518  parallel truncate -s4000G overlay-{/} ::: $DEVICES
> >> 519  ls
> >> 520  rm overlay-sd*
> >> 521  cd /mnt/bak
> >> 522  cd /mnt/backup/
> >> 523  ls
> >> 524  parallel truncate -s4000G overlay-{/} ::: $DEVICES
> >> 525  ls -la
> >> 526  blockdev --getsize /dev/sdb
> >> 527  man losetup
> >> 528  man losetup
> >> 529  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
> >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
> >> create {/}' ::: $DEVICES
> >> 530  dmsetup status
> >> 531  history | grep mdadm
> >> 532  history
> >> 533  dmsetup status
> >> 534  history | grep dmsetup
> >> 535  dmsetup status
> >> 536  dmsetup remove sdg
> >> 537  dmsetup ls --tree
> >> 538  lsof
> >> 539  dmsetup ls --tre
> >> 540  dmsetup ls --tree
> >> 541  lsof | grep -i sdg
> >> 542  lsof | grep -i sdf
> >> 543  history |grep dmsetup | less
> >> 544  dmsetup status
> >> 545  history > ~plsander/Documents/raidIssues/joblog
> >>
> >> On Wed, Aug 31, 2022 at 4:37 PM John Stoffel <john@stoffel.org> wrote:
> >> >
> >> > >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
> >> >
> >> > > encountering a puzzling situation.
> >> > > dmsetup is failing to return.
> >> >
> >> > I don't think you need to use dmsetup in your case, but can you post
> >> > *all* the commands you ran before you got to this point, and the
> >> > output of
> >> >
> >> >        cat /proc/mdstat
> >> >
> >> > as well?  Thinking on this some more, you might need to actually also
> >> > add:
> >> >
> >> >         --assume-clean
> >> >
> >> > to the 'mdadm create ....' string, since you don't want it to zero the
> >> > array or anything.
> >> >
> >> > Sorry for not remembering this at the time!
> >> >
> >> > So if you can, please just start over from scratch, showing the setup
> >> > of the loop devices, the overlayfs setup, and the building the RAID6
> >> > array, along with the cat /proc/mdstat after you do the initial build.
> >> >
> >> > John
> >> >
> >> > P.S.  For those who hated my email citing tool, I pulled it out for
> >> > now.  Only citing with > now.  :-)
> >> >
> >> > > root@superior:/mnt/backup# dmsetup status
> >> > > sdg: 0 5860533168 snapshot 16/8388608000 16
> >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
> >> > > sde: 0 5860533168 snapshot 16/8388608000 16
> >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
> >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
> >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
> >> >
> >> > > dmsetup remove sdg  runs for hours.
> >> > > Canceled it, ran dmsetup ls --tree and find that sdg is not present in the list.
> >> >
> >> > > dmsetup status shows:
> >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
> >> > > sde: 0 5860533168 snapshot 16/8388608000 16
> >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
> >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
> >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
> >> >
> >> > > dmsetup ls --tree
> >> > > root@superior:/mnt/backup# dmsetup ls --tree
> >> > > sdf (253:3)
> >> > >  ├─ (7:3)
> >> > >  └─ (8:80)
> >> > > sde (253:1)
> >> > >  ├─ (7:1)
> >> > >  └─ (8:64)
> >> > > sdd (253:2)
> >> > >  ├─ (7:2)
> >> > >  └─ (8:48)
> >> > > sdc (253:0)
> >> > >  ├─ (7:0)
> >> > >  └─ (8:32)
> >> > > sdb (253:5)
> >> > >  ├─ (7:5)
> >> > >  └─ (8:16)
> >> >
> >> > > any suggestions?
> >> >
> >> >
> >> >
> >> > > On Tue, Aug 30, 2022 at 2:03 PM Wols Lists <antlists@youngman.org.uk> wrote:
> >> > >>
> >> > >> On 30/08/2022 14:27, Peter Sanders wrote:
> >> > >> >
> >> > >> > And the victory conditions would be a mountable file system that passes a fsck?
> >> > >>
> >> > >> Yes. Just make sure you delve through the file system a bit and satisfy
> >> > >> yourself it looks good, too ...
> >> > >>
> >> > >> Cheers,
> >> > >> Wol

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: RAID 6, 6 device array - all devices lost superblock
  2022-09-03  0:39                                     ` Peter Sanders
@ 2022-09-03  5:51                                       ` Peter Sanders
  2022-09-05 19:36                                         ` John Stoffel
  2022-09-05 19:25                                       ` John Stoffel
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Sanders @ 2022-09-03  5:51 UTC (permalink / raw)
  To: John Stoffel; +Cc: Wols Lists, Eyal Lebedinsky, linux-raid

tried removing the setup:

root@superior:/mnt/backup# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@superior:/mnt/backup# parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
^C

(ran for an hour before I cancelled it)

root@superior:/mnt/backup# dmsetup status
No devices found
root@superior:/mnt/backup# ls
lost+found  overlay-sdb  overlay-sdc  overlay-sdd  overlay-sde
overlay-sdf  overlay-sdg
root@superior:/mnt/backup# rm overlay-sd*
root@superior:/mnt/backup# ls /dev/loop
ls: cannot access '/dev/loop': No such file or directory
root@superior:/mnt/backup# ls /dev/loop*
/dev/loop0  /dev/loop2    /dev/loop4  /dev/loop6    /dev/loop-control
/dev/loop1  /dev/loop3    /dev/loop5  /dev/loop7
root@superior:/mnt/backup# parallel losetup -d ::: /dev/loop[0-9]*
losetup: /dev/loop6: detach failed: No such device or address
losetup: /dev/loop7: detach failed: No such device or address
root@superior:/mnt/backup# ls /dev/loop*
/dev/loop0  /dev/loop2    /dev/loop4  /dev/loop6    /dev/loop-control
/dev/loop1  /dev/loop3    /dev/loop5  /dev/loop7
root@superior:/mnt/backup# ls -la /dev/lo*
lrwxrwxrwx 1 root root      28 Sep  2 20:22 /dev/log -> /run/systemd/journal/dev-log
brw-rw---- 1 root disk  7,   0 Sep  2 20:32 /dev/loop0
brw-rw---- 1 root disk  7,   1 Sep  2 20:32 /dev/loop1
brw-rw---- 1 root disk  7,   2 Sep  2 20:32 /dev/loop2
brw-rw---- 1 root disk  7,   3 Sep  2 20:32 /dev/loop3
brw-rw---- 1 root disk  7,   4 Sep  2 20:32 /dev/loop4
brw-rw---- 1 root disk  7,   5 Sep  2 20:32 /dev/loop5
brw-rw---- 1 root disk  7,   6 Sep  2 20:32 /dev/loop6
brw-rw---- 1 root disk  7,   7 Sep  2 20:32 /dev/loop7
crw-rw---- 1 root disk 10, 237 Sep  2 20:32 /dev/loop-control
root@superior:/mnt/backup# losetup -d ::: /dev/loop70
losetup: :::: failed to use device: No such device
root@superior:/mnt/backup# losetup -d ::: /dev/loop7
losetup: :::: failed to use device: No such device
root@superior:/mnt/backup# losetup -d  /dev/loop7
losetup: /dev/loop7: detach failed: No such device or address
root@superior:/mnt/backup# ls -la /dev/loop*
brw-rw---- 1 root disk  7,   0 Sep  2 20:32 /dev/loop0
brw-rw---- 1 root disk  7,   1 Sep  2 20:32 /dev/loop1
brw-rw---- 1 root disk  7,   2 Sep  2 20:32 /dev/loop2
brw-rw---- 1 root disk  7,   3 Sep  2 20:32 /dev/loop3
brw-rw---- 1 root disk  7,   4 Sep  2 20:32 /dev/loop4
brw-rw---- 1 root disk  7,   5 Sep  2 20:32 /dev/loop5
brw-rw---- 1 root disk  7,   6 Sep  2 20:32 /dev/loop6
brw-rw---- 1 root disk  7,   7 Sep  2 20:32 /dev/loop7
crw-rw---- 1 root disk 10, 237 Sep  2 20:32 /dev/loop-control
root@superior:/mnt/backup# losetup /dev/loop7
losetup: /dev/loop7: No such file or directory
root@superior:/mnt/backup# losetup /dev/loop5
losetup: /dev/loop5: No such file or directory
root@superior:/mnt/backup#


Not sure why losetup cannot see the existing /dev/loopN devices.
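
One possible reading, not confirmed anywhere in this thread: the /dev/loopN
entries are only device nodes, and after the "losetup -d" run above none of
them has a backing file attached, so losetup has nothing to report for them.
A minimal sketch for checking that, assuming the usual "losetup -a" output
format of "/dev/loopN: ... (file)"; the unattached helper is hypothetical:

```shell
#!/bin/sh
# Sketch: report loop nodes that exist in /dev but have no backing file.
# Reads "losetup -a" style lines on stdin; candidate nodes are the arguments.
unattached() {
    # Keep only the device name in front of the first colon of each line.
    attached=$(cut -d: -f1)
    for node in "$@"; do
        # -x: whole-line match, -F: fixed string, so loop1 never matches loop10.
        if ! printf '%s\n' "$attached" | grep -qxF -- "$node"; then
            echo "$node"
        fi
    done
}

# Intended use (needs root):
#   losetup -a | unattached /dev/loop[0-9]*
```

If every node is listed, nothing is attached and the nodes themselves are
harmless leftovers that the next "losetup -f --show" can reuse.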

On Fri, Sep 2, 2022 at 8:39 PM Peter Sanders <plsander@gmail.com> wrote:
>
> Repeat of run 1
>
> plsander@superior:~$ su -
> Password:
> root@superior:~# cat /proc/partitions
> major minor  #blocks  name
>
>  259        0  250059096 nvme0n1
>  259        1     496640 nvme0n1p1
>  259        2          1 nvme0n1p2
>  259        3   63475712 nvme0n1p5
>  259        4   97654784 nvme0n1p6
>  259        5      37888 nvme0n1p7
>  259        6   86913024 nvme0n1p8
>  259        7    1474560 nvme0n1p9
>    8       16 2930266584 sdb
>    8       80 2930266584 sdf
>    8        0 1953514584 sda
>    8        1 1953513472 sda1
>    8       32 2930266584 sdc
>    8       96 2930266584 sdg
>    8       64 2930266584 sde
>    8       48 2930266584 sdd
>   11        0    1048575 sr0
> root@superior:~# cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> unused devices: <none>
> root@superior:~# DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
> root@superior:~# echo $DEVICES
> /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> root@superior:~# parallel 'test -e /dev/loop{#} || mknod -m 660
> /dev/loop{#} b 7 {#}' ::: $DEVICES
> root@superior:~# ls /dev/lo
> log           loop2         loop4         loop6
> loop1         loop3         loop5         loop-control
> root@superior:~# ls /dev/lo*
> /dev/log  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5
> /dev/loop6  /dev/loop-control
> root@superior:~# ls -l /dev/loop*
> brw-rw---- 1 root root  7,   1 Sep  2 20:30 /dev/loop1
> brw-rw---- 1 root root  7,   2 Sep  2 20:30 /dev/loop2
> brw-rw---- 1 root root  7,   3 Sep  2 20:30 /dev/loop3
> brw-rw---- 1 root root  7,   4 Sep  2 20:30 /dev/loop4
> brw-rw---- 1 root root  7,   5 Sep  2 20:30 /dev/loop5
> brw-rw---- 1 root root  7,   6 Sep  2 20:30 /dev/loop6
> crw-rw---- 1 root disk 10, 237 Sep  2 20:22 /dev/loop-control
> root@superior:~# cd /mnt/backup/
> root@superior:/mnt/backup# parallel truncate -s4000G overlay-{/} ::: $DEVICES
> root@superior:/mnt/backup# ls -l
> total 16
> drwx------ 2 root root         16384 Aug 28 18:50 lost+found
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdb
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdc
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdd
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sde
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdf
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdg
> root@superior:/mnt/backup# rm over*
> root@superior:/mnt/backup# parallel truncate -s300G overlay-{/} ::: $DEVICES
> root@superior:/mnt/backup# ls -la
> total 24
> drwxr-xr-x 3 root root         4096 Sep  2 20:31 .
> drwxr-xr-x 7 root root         4096 Aug 29 09:17 ..
> drwx------ 2 root root        16384 Aug 28 18:50 lost+found
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdb
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdc
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdd
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sde
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdf
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdg
> root@superior:/mnt/backup# dmsetup status
> No devices found
> root@superior:/mnt/backup# date
> Fri 02 Sep 2022 08:32:11 PM EDT
> root@superior:/mnt/backup#  parallel 'size=$(blockdev --getsize {});
> loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {}
> $loop P 8 | dmsetup create {/}' ::: $DEVICES
> root@superior:/mnt/backup# date
> Fri 02 Sep 2022 08:32:20 PM EDT
> root@superior:/mnt/backup# dmsetup status
> sdg: 0 5860533168 snapshot 16/629145600 16
> sdf: 0 5860533168 snapshot 16/629145600 16
> sde: 0 5860533168 snapshot 16/629145600 16
> sdd: 0 5860533168 snapshot 16/629145600 16
> sdc: 0 5860533168 snapshot 16/629145600 16
> sdb: 0 5860533168 snapshot 16/629145600 16
> root@superior:/mnt/backup# OVERLAYS=$(parallel echo /dev/mapper/{/}
> ::: $DEVICES)
> root@superior:/mnt/backup# echo $OVERLAYS
> /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde
> /dev/mapper/sdf /dev/mapper/sdg
> root@superior:/mnt/backup# mdadm --create /dev/md1 --level=raid6 -n 6
> --assume-clean $OVERLAYS
> mdadm: partition table exists on /dev/mapper/sdb
> mdadm: partition table exists on /dev/mapper/sdc
> mdadm: partition table exists on /dev/mapper/sdc but will be lost or
>        meaningless after creating array
> mdadm: partition table exists on /dev/mapper/sdd
> mdadm: partition table exists on /dev/mapper/sdd but will be lost or
>        meaningless after creating array
> mdadm: partition table exists on /dev/mapper/sde
> mdadm: partition table exists on /dev/mapper/sde but will be lost or
>        meaningless after creating array
> mdadm: partition table exists on /dev/mapper/sdf
> mdadm: partition table exists on /dev/mapper/sdf but will be lost or
>        meaningless after creating array
> mdadm: partition table exists on /dev/mapper/sdg
> mdadm: partition table exists on /dev/mapper/sdg but will be lost or
>        meaningless after creating array
> Continue creating array? y
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md1 started.
> root@superior:/mnt/backup# ls -l /dev/md*
> brw-rw---- 1 root disk 9, 1 Sep  2 20:34 /dev/md1
> root@superior:/mnt/backup# fsck /dev/md1
> fsck from util-linux 2.36.1
> e2fsck 1.46.2 (28-Feb-2021)
> ext2fs_open2: Bad magic number in super-block
> fsck.ext2: Superblock invalid, trying backup blocks...
> fsck.ext2: Bad magic number in super-block while trying to open /dev/md1
>
> The superblock could not be read or does not describe a valid ext2/ext3/ext4
> filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
>     e2fsck -b 8193 <device>
>  or
>     e2fsck -b 32768 <device>
>
> root@superior:/mnt/backup# blkid /dev/md1
> root@superior:/mnt/backup#
> root@superior:/mnt/backup# cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md1 : active raid6 dm-3[5] dm-2[4] dm-1[3] dm-5[2] dm-0[1] dm-4[0]
>       11720536064 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [6/6] [UUUUUU]
>       bitmap: 0/22 pages [0KB], 65536KB chunk
>
> unused devices: <none>
> root@superior:/mnt/backup#
>
> Some questions -
> - is the easiest 'reset for next run' to reboot and rebuild?
>
>
> On Fri, Sep 2, 2022 at 3:12 PM John Stoffel <john@stoffel.org> wrote:
> >
> > >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
> >
> > Peter, please include the output of all the commands, not just the
> > commands themselves.  See my comments below.
> >
> >
> > > Question on restarting from scratch...
> > > How to reset to the starting point?
> >
> > I think you need to blow away the loop devices and re-create them.
> >
> > Or at least blow away the dmsetup devices you just created.
> >
> > It might be quickest to just reboot.  What OS are you using for the
> > recovery?  Is it a recent live image?  Sorry for asking so many
> > questions... some of this is new to me too.
> >
> >
> > > dmsetup, both for remove and create of the overlay seems to be hanging.
> >
> > > On Fri, Sep 2, 2022 at 10:56 AM Peter Sanders <plsander@gmail.com> wrote:
> > >>
> > >> contents of /proc/mdstat
> > >>
> > >> root@superior:/mnt/backup# cat /proc/mdstat
> > >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> > >> [raid4] [raid10]
> > >> unused devices: <none>
> > >> root@superior:/mnt/backup#
> > >>
> > >>
> > >>
> > >> Here are the steps I ran (minus some mounting other devices and
> > >> looking around for mdadm tracks on the old os disk)
> > >>
> > >> 410  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
> > >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
> > >> echo /dev/{1})
> > >> 411  apt install parallel
> > >> 412  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
> > >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
> > >> echo /dev/{1})
> > >> 413  echo $DEVICES
> >
> > So you found no MD RAID super blocks on any of the base devices.  You
> > can skip this step moving forward.
> >
> > >> 414  cat /proc/partitions
> > >> 415  DEVICES=/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> > >> 416  DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
> > >> 417  echo $DEVICES
> > >> 418  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
> > >> {#}' ::: $DEVICES
> > >> 419  ls /dev/loop*
> >
> > Can you show the output of all these commands, not just the commands please?
> >
> > >> 423  parallel truncate -s300G overlay-{/} ::: $DEVICES
> >
> > >> 427  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
> > >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
> > >> create {/}' ::: $DEVICES
> > >> 428  ls /dev/mapper/
> >
> > This is some key output to view.
> >
> > >> 429  OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
> > >> 430  echo $OVERLAYS
> >
> > What are the overlays?
> >
> > >> 431  dmsetup status
> >
> > What did this command show?
> >
> > >> 432  mdadm --assemble --force /dev/md1 $OVERLAYS
> >
> > And here is where I think you need to put --assume-clean when using
> > 'create' command instead.  It's not going to assemble anything because
> > the info was wiped.  I *think* you really want:
> >
> >    mdadm --create /dev/md1 --level=raid6 -n 6 --assume-clean $OVERLAYS
> >
> > And once you do this above command and it comes back, do:
> >
> >     cat /proc/mdstat
> >
> > and show all the output please!
> >
> > >> 433  history
> > >> 434  dmsetup status
> > >> 435  echo $OVERLAYS
> > >> 436  mdadm --assemble --force /dev/md0 $OVERLAYS
> > >> 437  cat /proc/partitions
> > >> 438  mkdir /mnt/oldroot
> > >> << look for initrd mdadm files >>
> > >> 484  echo $OVERLAYS
> > >> 485  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
> > >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
> > >> /dev/mapper/sdg
> >
> > I'm confused here, what  is the difference between the md1 you
> > assembled above, and the md0 you're doing here?
> >
> > >> << cancelled out of 485, review instructions... >>
> > >> 486  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
> > >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
> > >> /dev/mapper/sdg
> > >> 487  fsck -n /dev/md0
> >
> > And what output did you get here?  Did it find a filesystem?  You might want
> > to try:
> >
> >    blkid /dev/md0
> >
> >
> > >> 488  mdadm --stop /dev/md0
> > >> 489  echo $DEVICES
> > >> 490   parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
> > >> 491  dmsetup status
> >
> > This all worked properly?  No errors?
> >
> > I gave up after this because it's not clear what the results really
> > are.  If you don't find a filesystem that fsck's cleanly, then you
> > should just need to stop the array, then re-create it but shuffle the
> > order of the devices.
> >
> > Instead of disk in order of "sdb sdc sdd... sdN", you would try the
> > order "sdc sdd ... sdN sdb".   See how I moved sdb to the end of the
> > list of devices?  With six disks, you have I think 6 factorial options
> > to try.   Which is a lot of options to go through, and why you need to
> > automate this more.  But also keep a log and show the output!
> >
> > John
> >
> >
> > >> 492  ls
> > >> 493  rm overlay-*
> > >> 494  ls
> > >> 495  parallel losetup -d ::: /dev/loop[0-9]*
> > >> 496  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
> > >> {#}' ::: $DEVICES
> > >> 497  parallel truncate -s300G overlay-{/} ::: $DEVICES
> > >> 498  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
> > >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
> > >> create {/}' ::: $DEVICES
> > >> 499  dmsetup status
> > >> 500  /sbin/reboot
> > >> 501  history
> > >> 502  dmsetup status
> > >> 503  mount
> > >> 504  cat /proc/partitions
> > >> 505  nano /etc/fstab
> > >> 506  mount /mnt/backup/
> > >> 507  ls /mnt/backup/
> > >> 508  rm /mnt/backup/
> > >> 509  rm /mnt/backup/overlay-sd*
> > >> 510  emacs setupOverlay &
> > >> 511  ps auxww | grep emacs
> > >> 512  kill 65017
> > >> 513  ls /dev/loo*
> > >> 514  DEVICES='/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg'
> > >> 515  echo $DEVICES
> > >> 516   parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b
> > >> 7 {#}' ::: $DEVICES
> > >> 517  ls /dev/loo*
> > >> 518  parallel truncate -s4000G overlay-{/} ::: $DEVICES
> > >> 519  ls
> > >> 520  rm overlay-sd*
> > >> 521  cd /mnt/bak
> > >> 522  cd /mnt/backup/
> > >> 523  ls
> > >> 524  parallel truncate -s4000G overlay-{/} ::: $DEVICES
> > >> 525  ls -la
> > >> 526  blockdev --getsize /dev/sdb
> > >> 527  man losetup
> > >> 528  man losetup
> > >> 529  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
> > >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
> > >> create {/}' ::: $DEVICES
> > >> 530  dmsetup status
> > >> 531  history | grep mdadm
> > >> 532  history
> > >> 533  dmsetup status
> > >> 534  history | grep dmsetup
> > >> 535  dmsetup status
> > >> 536  dmsetup remove sdg
> > >> 537  dmsetup ls --tree
> > >> 538  lsof
> > >> 539  dmsetup ls --tre
> > >> 540  dmsetup ls --tree
> > >> 541  lsof | grep -i sdg
> > >> 542  lsof | grep -i sdf
> > >> 543  history |grep dmsetup | less
> > >> 544  dmsetup status
> > >> 545  history > ~plsander/Documents/raidIssues/joblog
> > >>
> > >> On Wed, Aug 31, 2022 at 4:37 PM John Stoffel <john@stoffel.org> wrote:
> > >> >
> > >> > >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
> > >> >
> > >> > > encountering a puzzling situation.
> > >> > > dmsetup is failing to return.
> > >> >
> > >> > I don't think you need to use dmsetup in your case, but can you post
> > >> > *all* the commands you ran before you got to this point, and the
> > >> > output of
> > >> >
> > >> >        cat /proc/mdstat
> > >> >
> > >> > as well?  Thinking on this some more, you might need to actually also
> > >> > add:
> > >> >
> > >> >         --assume-clean
> > >> >
> > >> > to the 'mdadm create ....' string, since you don't want it to zero the
> > >> > array or anything.
> > >> >
> > >> > Sorry for not remembering this at the time!
> > >> >
> > >> > So if you can, please just start over from scratch, showing the setup
> > >> > of the loop devices, the overlayfs setup, and the building the RAID6
> > >> > array, along with the cat /proc/mdstat after you do the initial build.
> > >> >
> > >> > John
> > >> >
> > >> > P.S.  For those who hated my email citing tool, I pulled it out for
> > >> > now.  Only citing with > now.  :-)
> > >> >
> > >> > > root@superior:/mnt/backup# dmsetup status
> > >> > > sdg: 0 5860533168 snapshot 16/8388608000 16
> > >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
> > >> > > sde: 0 5860533168 snapshot 16/8388608000 16
> > >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
> > >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
> > >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
> > >> >
> > >> > > dmsetup remove sdg  runs for hours.
> > >> > > Canceled it, ran dmsetup ls --tree and find that sdg is not present in the list.
> > >> >
> > >> > > dmsetup status shows:
> > >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
> > >> > > sde: 0 5860533168 snapshot 16/8388608000 16
> > >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
> > >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
> > >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
> > >> >
> > >> > > dmsetup ls --tree
> > >> > > root@superior:/mnt/backup# dmsetup ls --tree
> > >> > > sdf (253:3)
> > >> > >  ├─ (7:3)
> > >> > >  └─ (8:80)
> > >> > > sde (253:1)
> > >> > >  ├─ (7:1)
> > >> > >  └─ (8:64)
> > >> > > sdd (253:2)
> > >> > >  ├─ (7:2)
> > >> > >  └─ (8:48)
> > >> > > sdc (253:0)
> > >> > >  ├─ (7:0)
> > >> > >  └─ (8:32)
> > >> > > sdb (253:5)
> > >> > >  ├─ (7:5)
> > >> > >  └─ (8:16)
> > >> >
> > >> > > any suggestions?
> > >> >
> > >> >
> > >> >
> > >> > > On Tue, Aug 30, 2022 at 2:03 PM Wols Lists <antlists@youngman.org.uk> wrote:
> > >> > >>
> > >> > >> On 30/08/2022 14:27, Peter Sanders wrote:
> > >> > >> >
> > >> > >> > And the victory conditions would be a mountable file system that passes a fsck?
> > >> > >>
> > >> > >> Yes. Just make sure you delve through the file system a bit and satisfy
> > >> > >> yourself it looks good, too ...
> > >> > >>
> > >> > >> Cheers,
> > >> > >> Wol


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-09-03  0:39                                     ` Peter Sanders
  2022-09-03  5:51                                       ` Peter Sanders
@ 2022-09-05 19:25                                       ` John Stoffel
  1 sibling, 0 replies; 29+ messages in thread
From: John Stoffel @ 2022-09-05 19:25 UTC (permalink / raw)
  To: Peter Sanders; +Cc: John Stoffel, Wols Lists, Eyal Lebedinsky, linux-raid

>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:

> Repeat of run 1
> plsander@superior:~$ su -
> Password:
> root@superior:~# cat /proc/partitions
> major minor  #blocks  name

>  259        0  250059096 nvme0n1
>  259        1     496640 nvme0n1p1
>  259        2          1 nvme0n1p2
>  259        3   63475712 nvme0n1p5
>  259        4   97654784 nvme0n1p6
>  259        5      37888 nvme0n1p7
>  259        6   86913024 nvme0n1p8
>  259        7    1474560 nvme0n1p9
>    8       16 2930266584 sdb
>    8       80 2930266584 sdf
>    8        0 1953514584 sda
>    8        1 1953513472 sda1
>    8       32 2930266584 sdc
>    8       96 2930266584 sdg
>    8       64 2930266584 sde
>    8       48 2930266584 sdd
>   11        0    1048575 sr0
> root@superior:~# cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> unused devices: <none>
> root@superior:~# DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
> root@superior:~# echo $DEVICES
> /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> root@superior:~# parallel 'test -e /dev/loop{#} || mknod -m 660
> /dev/loop{#} b 7 {#}' ::: $DEVICES
> root@superior:~# ls /dev/lo
> log           loop2         loop4         loop6
> loop1         loop3         loop5         loop-control
> root@superior:~# ls /dev/lo*
> /dev/log  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5
> /dev/loop6  /dev/loop-control
> root@superior:~# ls -l /dev/loop*
> brw-rw---- 1 root root  7,   1 Sep  2 20:30 /dev/loop1
> brw-rw---- 1 root root  7,   2 Sep  2 20:30 /dev/loop2
> brw-rw---- 1 root root  7,   3 Sep  2 20:30 /dev/loop3
> brw-rw---- 1 root root  7,   4 Sep  2 20:30 /dev/loop4
> brw-rw---- 1 root root  7,   5 Sep  2 20:30 /dev/loop5
> brw-rw---- 1 root root  7,   6 Sep  2 20:30 /dev/loop6
> crw-rw---- 1 root disk 10, 237 Sep  2 20:22 /dev/loop-control
> root@superior:~# cd /mnt/backup/
> root@superior:/mnt/backup# parallel truncate -s4000G overlay-{/} ::: $DEVICES
> root@superior:/mnt/backup# ls -l
> total 16
> drwx------ 2 root root         16384 Aug 28 18:50 lost+found
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdb
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdc
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdd
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sde
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdf
> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdg
> root@superior:/mnt/backup# rm over*

So why did you remove these overlays?  Too big for some reason?

> root@superior:/mnt/backup# parallel truncate -s300G overlay-{/} ::: $DEVICES
> root@superior:/mnt/backup# ls -la
> total 24
> drwxr-xr-x 3 root root         4096 Sep  2 20:31 .
> drwxr-xr-x 7 root root         4096 Aug 29 09:17 ..
> drwx------ 2 root root        16384 Aug 28 18:50 lost+found
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdb
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdc
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdd
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sde
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdf
> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdg
> root@superior:/mnt/backup# dmsetup status
> No devices found
> root@superior:/mnt/backup# date
> Fri 02 Sep 2022 08:32:11 PM EDT

This looks good.

> root@superior:/mnt/backup#  parallel 'size=$(blockdev --getsize {});
> loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {}
> $loop P 8 | dmsetup create {/}' ::: $DEVICES
> root@superior:/mnt/backup# date
> Fri 02 Sep 2022 08:32:20 PM EDT
> root@superior:/mnt/backup# dmsetup status
> sdg: 0 5860533168 snapshot 16/629145600 16
> sdf: 0 5860533168 snapshot 16/629145600 16
> sde: 0 5860533168 snapshot 16/629145600 16
> sdd: 0 5860533168 snapshot 16/629145600 16
> sdc: 0 5860533168 snapshot 16/629145600 16
> sdb: 0 5860533168 snapshot 16/629145600 16

Here's where I'd also like to see the output of the commands pvs, vgs
and lvs.

I'm not wild about the 'dmsetup status' command and its output.


> root@superior:/mnt/backup# OVERLAYS=$(parallel echo /dev/mapper/{/}
> ::: $DEVICES)
> root@superior:/mnt/backup# echo $OVERLAYS
> /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde
> /dev/mapper/sdf /dev/mapper/sdg
> root@superior:/mnt/backup# mdadm --create /dev/md1 --level=raid6 -n 6
> --assume-clean $OVERLAYS
> mdadm: partition table exists on /dev/mapper/sdb
> mdadm: partition table exists on /dev/mapper/sdc
> mdadm: partition table exists on /dev/mapper/sdc but will be lost or
>        meaningless after creating array
> mdadm: partition table exists on /dev/mapper/sdd
> mdadm: partition table exists on /dev/mapper/sdd but will be lost or
>        meaningless after creating array
> mdadm: partition table exists on /dev/mapper/sde
> mdadm: partition table exists on /dev/mapper/sde but will be lost or
>        meaningless after creating array
> mdadm: partition table exists on /dev/mapper/sdf
> mdadm: partition table exists on /dev/mapper/sdf but will be lost or
>        meaningless after creating array
> mdadm: partition table exists on /dev/mapper/sdg
> mdadm: partition table exists on /dev/mapper/sdg but will be lost or
>        meaningless after creating array
> Continue creating array? y
> mdadm: Defaulting to version 1.2 metadata
> mdadm: array /dev/md1 started.
> root@superior:/mnt/backup# ls -l /dev/md*
> brw-rw---- 1 root disk 9, 1 Sep  2 20:34 /dev/md1
> root@superior:/mnt/backup# fsck /dev/md1
> fsck from util-linux 2.36.1
> e2fsck 1.46.2 (28-Feb-2021)
> ext2fs_open2: Bad magic number in super-block
> fsck.ext2: Superblock invalid, trying backup blocks...
> fsck.ext2: Bad magic number in super-block while trying to open /dev/md1

> The superblock could not be read or does not describe a valid ext2/ext3/ext4
> filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
>     e2fsck -b 8193 <device>
>  or
>     e2fsck -b 32768 <device>


Did you try e2fsck with these alternate superblock numbers (-b 8193,
-b 32768)?  You might need to check the e2fsck man page to see whether
there are backup superblocks at higher block numbers.  It all depends
on how much of the start of the disk(s) has been wiped here.
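
A rough sketch of where those higher numbers would be, in case it
helps.  With the sparse_super feature, ext2/3/4 keeps backup
superblocks in block group 1 and in groups that are powers of 3, 5
and 7.  The 4 KiB block size and 32768 blocks per group below are
assumptions (the original mkfs parameters aren't known in this
thread), and backup_sb_candidates is just a name made up for the
example:

```shell
#!/bin/sh
# Print candidate backup-superblock block numbers for an ext2/3/4
# filesystem using sparse_super: group 1 plus powers of 3, 5 and 7.
# ASSUMPTION: 4 KiB blocks, 32768 blocks per group, first data block 0.
# Usage: backup_sb_candidates MAX_GROUP [BLOCKS_PER_GROUP]  (MAX_GROUP >= 1)
backup_sb_candidates() {
    max_group=$1
    blocks_per_group=${2:-32768}
    {
        echo 1
        for base in 3 5 7; do
            g=$base
            while [ "$g" -le "$max_group" ]; do
                echo "$g"
                g=$((g * base))
            done
        done
    } | sort -n | while read -r g; do
        # Block number of the first block in group g.
        echo $((g * blocks_per_group))
    done
}
```

Each candidate could then be tried read-only, for example with
"e2fsck -n -b <block> -B 4096 /dev/md1".  Running "mke2fs -n" against
the device only prints what it would do, including the backup
superblock locations, but its answer is only as good as the guess at
the original mkfs options.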


> root@superior:/mnt/backup# blkid /dev/md1
> root@superior:/mnt/backup#
> root@superior:/mnt/backup# cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md1 : active raid6 dm-3[5] dm-2[4] dm-1[3] dm-5[2] dm-0[1] dm-4[0]
>       11720536064 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [6/6] [UUUUUU]
>       bitmap: 0/22 pages [0KB], 65536KB chunk

> unused devices: <none>
> root@superior:/mnt/backup#

> Some questions -
> - is the easiest 'reset for next run' to reboot and rebuild?


I think if you do:

  mdadm --stop /dev/md1
  dmsetup remove ...

it should release the disks.  You might need to add the '--force'
flag to dmsetup remove though.
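
To make the reset repeatable, here is a minimal sketch that only
prints the teardown commands for one attempt so they can be reviewed
first.  It assumes the mapper names and overlay files follow the
overlay-sdX scheme used earlier in this thread; teardown_cmds is a
hypothetical helper name, not something from mdadm or dmsetup:

```shell
#!/bin/sh
# Print (do not run) the commands that would undo one recovery attempt:
# stop the array, remove each snapshot, detach loops, delete overlays.
# ASSUMPTION: mapper names are the basenames of the disks (sdb, sdc, ...)
# and the overlay files are named overlay-<basename> in the current dir.
# Usage: teardown_cmds /dev/mdX /dev/sdb /dev/sdc ...
teardown_cmds() {
    md=$1
    shift
    echo "mdadm --stop $md"
    for dev in "$@"; do
        echo "dmsetup remove ${dev##*/}"     # /dev/sdb -> sdb
    done
    echo "losetup -D"                        # detaches ALL loop devices
    for dev in "$@"; do
        echo "rm -f overlay-${dev##*/}"
    done
}
```

Something like "teardown_cmds /dev/md1 $DEVICES" shows the list, and
piping it to "sh -x" runs it once it looks right.  Note that
"losetup -D" detaches every loop device on the system, much like the
"losetup -d" over /dev/loop[0-9]* in the earlier history.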


It should work.  I'm just worried that each time you reboot, the new
motherboard's BIOS is doing something to the disks, maybe
re-initializing them.

Could you maybe dump the first couple of MBs of one of the disks and
then hexdump it with ASCII to look for strings you might recognize?

Then you can also keep that and see if there's any changes when you
reboot, just as a sanity check.  
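
Something along these lines would do for the dump-and-compare check.
It is only a sketch: the 2 MiB default is a guess at how much of the
disk matters, save_head is a made-up helper name, and "status=none"
assumes GNU dd:

```shell
#!/bin/sh
# Copy the first N MiB of a device (or file) to OUT and print the
# SHA-256 of the copy, so the same region can be compared after a reboot.
# ASSUMPTION: 2 MiB (the default) covers whatever is being rewritten.
# Usage: save_head SRC OUT [MIB]
save_head() {
    src=$1
    out=$2
    mib=${3:-2}
    # Read-only on the source; only OUT is written.
    dd if="$src" of="$out" bs=1M count="$mib" status=none
    sha256sum "$out" | cut -d' ' -f1
}
```

For example, "save_head /dev/sdb sdb-head.before" before a reboot and
"save_head /dev/sdb sdb-head.after" afterwards; if the two hashes
differ, "cmp sdb-head.before sdb-head.after" shows where, and
"hexdump -C sdb-head.before | less" gives the hex plus ASCII view.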



> On Fri, Sep 2, 2022 at 3:12 PM John Stoffel <john@stoffel.org> wrote:
>> 
>> >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>> 
>> Peter, please include the output of all the commands, not just the
>> commands themselves.  See my comments below.
>> 
>> 
>> > Question on restarting from scratch...
>> > How to reset to the starting point?
>> 
>> I think you need to blow away the loop devices and re-create them.
>> 
>> Or at least blow away the dmsetup devices you just created.
>> 
>> It might be quickest to just reboot.  What OS are you using for the
>> recovery?  Is it a recent live image?  Sorry for asking so many
>> questions... some of this is new to me too.
>> 
>> 
>> > dmsetup, both for remove and create of the overlay seems to be hanging.
>> 
>> > On Fri, Sep 2, 2022 at 10:56 AM Peter Sanders <plsander@gmail.com> wrote:
>> >>
>> >> contents of /proc/mdstat
>> >>
>> >> root@superior:/mnt/backup# cat /proc/mdstat
>> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> >> [raid4] [raid10]
>> >> unused devices: <none>
>> >> root@superior:/mnt/backup#
>> >>
>> >>
>> >>
>> >> Here are the steps I ran (minus some mounting other devices and
>> >> looking around for mdadm tracks on the old os disk)
>> >>
>> >> 410  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
>> >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
>> >> echo /dev/{1})
>> >> 411  apt install parallel
>> >> 412  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
>> >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
>> >> echo /dev/{1})
>> >> 413  echo $DEVICES
>> 
>> So you found no MD RAID super blocks on any of the base devices.  You
>> can skip this step moving forward.
>> 
>> >> 414  cat /proc/partitions
>> >> 415  DEVICES=/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
>> >> 416  DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
>> >> 417  echo $DEVICES
>> >> 418  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
>> >> {#}' ::: $DEVICES
>> >> 419  ls /dev/loop*
>> 
>> Can you show the output of all these commands, not just the commands please?
>> 
>> >> 423  parallel truncate -s300G overlay-{/} ::: $DEVICES
>> 
>> >> 427  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
>> >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
>> >> create {/}' ::: $DEVICES
>> >> 428  ls /dev/mapper/
>> 
>> This is some key output to view.
>> 
>> >> 429  OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
>> >> 430  echo $OVERLAYS
>> 
>> What are the overlays?
>> 
>> >> 431  dmsetup status
>> 
>> What did this command show?
>> 
>> >> 432  mdadm --assemble --force /dev/md1 $OVERLAYS
>> 
>> And here is where I think you need to put --assume-clean when using
>> 'create' command instead.  It's not going to assemble anything because
>> the info was wiped.  I *think* you really want:
>> 
>> mdadm --create /dev/md1 --level=raid6 -n 6 --assume-clean $OVERLAYS
>> 
>> And once you do this above command and it comes back, do:
>> 
>> cat /proc/mdstat
>> 
>> and show all the output please!
>> 
>> >> 433  history
>> >> 434  dmsetup status
>> >> 435  echo $OVERLAYS
>> >> 436  mdadm --assemble --force /dev/md0 $OVERLAYS
>> >> 437  cat /proc/partitions
>> >> 438  mkdir /mnt/oldroot
>> >> << look for initrd mdadm files >>
>> >> 484  echo $OVERLAYS
>> >> 485  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
>> >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
>> >> /dev/mapper/sdg
>> 
>> I'm confused here, what  is the difference between the md1 you
>> assembled above, and the md0 you're doing here?
>> 
>> >> << cancelled out of 485, review instructions... >>
>> >> 486  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
>> >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
>> >> /dev/mapper/sdg
>> >> 487  fsck -n /dev/md0
>> 
>> And what output did you get here?  Did it find a filesystem?  You might want
>> to try:
>> 
>> blkid /dev/md0
>> 
>> 
>> >> 488  mdadm --stop /dev/md0
>> >> 489  echo $DEVICES
>> >> 490   parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
>> >> 491  dmsetup status
>> 
>> This all worked properly?  No errors?
>> 
>> I gave up after this because it's not clear what the results really
>> are.  If you don't find a filesystem that fsck's cleanly, then you
>> should just need to stop the array, then re-create it but shuffle the
>> order of the devices.
>> 
>> Instead of the disks in the order "sdb sdc sdd ... sdN", you would try
>> the order "sdc sdd ... sdN sdb".   See how I moved sdb to the end of
>> the list of devices?  With six disks, you have I think 6 factorial
>> (720) orderings to try, which is a lot of options to go through, and
>> why you need to automate this more.  But also keep a log and show the
>> output!
>> 
>> John
>> 
>> 
>> >> 492  ls
>> >> 493  rm overlay-*
>> >> 494  ls
>> >> 495  parallel losetup -d ::: /dev/loop[0-9]*
>> >> 496  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
>> >> {#}' ::: $DEVICES
>> >> 497  parallel truncate -s300G overlay-{/} ::: $DEVICES
>> >> 498  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
>> >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
>> >> create {/}' ::: $DEVICES
>> >> 499  dmsetup status
>> >> 500  /sbin/reboot
>> >> 501  history
>> >> 502  dmsetup status
>> >> 503  mount
>> >> 504  cat /proc/partitions
>> >> 505  nano /etc/fstab
>> >> 506  mount /mnt/backup/
>> >> 507  ls /mnt/backup/
>> >> 508  rm /mnt/backup/
>> >> 509  rm /mnt/backup/overlay-sd*
>> >> 510  emacs setupOverlay &
>> >> 511  ps auxww | grep emacs
>> >> 512  kill 65017
>> >> 513  ls /dev/loo*
>> >> 514  DEVICES='/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg'
>> >> 515  echo $DEVICES
>> >> 516   parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b
>> >> 7 {#}' ::: $DEVICES
>> >> 517  ls /dev/loo*
>> >> 518  parallel truncate -s4000G overlay-{/} ::: $DEVICES
>> >> 519  ls
>> >> 520  rm overlay-sd*
>> >> 521  cd /mnt/bak
>> >> 522  cd /mnt/backup/
>> >> 523  ls
>> >> 524  parallel truncate -s4000G overlay-{/} ::: $DEVICES
>> >> 525  ls -la
>> >> 526  blockdev --getsize /dev/sdb
>> >> 527  man losetup
>> >> 528  man losetup
>> >> 529  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
>> >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
>> >> create {/}' ::: $DEVICES
>> >> 530  dmsetup status
>> >> 531  history | grep mdadm
>> >> 532  history
>> >> 533  dmsetup status
>> >> 534  history | grep dmsetup
>> >> 535  dmsetup status
>> >> 536  dmsetup remove sdg
>> >> 537  dmsetup ls --tree
>> >> 538  lsof
>> >> 539  dmsetup ls --tre
>> >> 540  dmsetup ls --tree
>> >> 541  lsof | grep -i sdg
>> >> 542  lsof | grep -i sdf
>> >> 543  history |grep dmsetup | less
>> >> 544  dmsetup status
>> >> 545  history > ~plsander/Documents/raidIssues/joblog
>> >>
>> >> On Wed, Aug 31, 2022 at 4:37 PM John Stoffel <john@stoffel.org> wrote:
>> >> >
>> >> > >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>> >> >
>> >> > > encountering a puzzling situation.
>> >> > > dmsetup is failing to return.
>> >> >
>> >> > I don't think you need to use dmsetup in your case, but can you post
>> >> > *all* the commands you ran before you got to this point, and the
>> >> > output of
>> >> >
>> >> >        cat /proc/mdstat
>> >> >
>> >> > as well?  Thinking on this some more, you might need to actually also
>> >> > add:
>> >> >
>> >> >         --assume-clean
>> >> >
>> >> > to the 'mdadm create ....' string, since you don't want it to zero the
>> >> > array or anything.
>> >> >
>> >> > Sorry for not remembering this at the time!
>> >> >
>> >> > So if you can, please just start over from scratch, showing the setup
>> >> > of the loop devices, the overlayfs setup, and the building the RAID6
>> >> > array, along with the cat /proc/mdstat after you do the initial build.
>> >> >
>> >> > John
>> >> >
>> >> > P.S.  For those who hated my email citing tool, I pulled it out for
>> >> > now.  Only citing with > now.  :-)
>> >> >
>> >> > > root@superior:/mnt/backup# dmsetup status
>> >> > > sdg: 0 5860533168 snapshot 16/8388608000 16
>> >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
>> >> > > sde: 0 5860533168 snapshot 16/8388608000 16
>> >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
>> >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
>> >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
>> >> >
>> >> > > dmsetup remove sdg  runs for hours.
>> >> > > Canceled it, ran dmsetup ls --tree and find that sdg is not present in the list.
>> >> >
>> >> > > dmsetup status shows:
>> >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
>> >> > > sde: 0 5860533168 snapshot 16/8388608000 16
>> >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
>> >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
>> >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
>> >> >
>> >> > > dmsetup ls --tree
>> >> > > root@superior:/mnt/backup# dmsetup ls --tree
>> >> > > sdf (253:3)
>> >> > >  ├─ (7:3)
>> >> > >  └─ (8:80)
>> >> > > sde (253:1)
>> >> > >  ├─ (7:1)
>> >> > >  └─ (8:64)
>> >> > > sdd (253:2)
>> >> > >  ├─ (7:2)
>> >> > >  └─ (8:48)
>> >> > > sdc (253:0)
>> >> > >  ├─ (7:0)
>> >> > >  └─ (8:32)
>> >> > > sdb (253:5)
>> >> > >  ├─ (7:5)
>> >> > >  └─ (8:16)
>> >> >
>> >> > > any suggestions?
>> >> >
>> >> >
>> >> >
>> >> > > On Tue, Aug 30, 2022 at 2:03 PM Wols Lists <antlists@youngman.org.uk> wrote:
>> >> > >>
>> >> > >> On 30/08/2022 14:27, Peter Sanders wrote:
>> >> > >> >
>> >> > >> > And the victory conditions would be a mountable file system that passes a fsck?
>> >> > >>
>> >> > >> Yes. Just make sure you delve through the file system a bit and satisfy
>> >> > >> yourself it looks good, too ...
>> >> > >>
>> >> > >> Cheers,
>> >> > >> Wol


* Re: RAID 6, 6 device array - all devices lost superblock
  2022-09-03  5:51                                       ` Peter Sanders
@ 2022-09-05 19:36                                         ` John Stoffel
  2022-09-05 20:16                                           ` Peter Sanders
  0 siblings, 1 reply; 29+ messages in thread
From: John Stoffel @ 2022-09-05 19:36 UTC (permalink / raw)
  To: Peter Sanders; +Cc: John Stoffel, Wols Lists, Eyal Lebedinsky, linux-raid

>>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:

> tried removing the setup:
> root@superior:/mnt/backup# mdadm --stop /dev/md1
> mdadm: stopped /dev/md1
> root@superior:/mnt/backup#  parallel 'dmsetup remove {/}; rm
> overlay-{/}' ::: $DEVICES
> ^C

> (ran for an hour before cancel... )

> root@superior:/mnt/backup# dmsetup status
> No devices found
> root@superior:/mnt/backup# ls
> lost+found  overlay-sdb  overlay-sdc  overlay-sdd  overlay-sde
> overlay-sdf  overlay-sdg
> root@superior:/mnt/backup# rm overlay-sd*
> root@superior:/mnt/backup# ls /dev/loop
> ls: cannot access '/dev/loop': No such file or directory
> root@superior:/mnt/backup# ls /dev/loop*
> /dev/loop0  /dev/loop2    /dev/loop4  /dev/loop6    /dev/loop-control
> /dev/loop1  /dev/loop3    /dev/loop5  /dev/loop7
> root@superior:/mnt/backup# parallel losetup -d ::: /dev/loop[0-9]*
> losetup: /dev/loop6: detach failed: No such device or address
> losetup: /dev/loop7: detach failed: No such device or address
> root@superior:/mnt/backup# ls /dev/loop*
> /dev/loop0  /dev/loop2    /dev/loop4  /dev/loop6    /dev/loop-control
> /dev/loop1  /dev/loop3    /dev/loop5  /dev/loop7
> root@superior:/mnt/backup# ls -la /dev/lo*
> lrwxrwxrwx 1 root root      28 Sep  2 20:22 /dev/log ->
> /run/systemd/journal/dev-log
> brw-rw---- 1 root disk  7,   0 Sep  2 20:32 /dev/loop0
> brw-rw---- 1 root disk  7,   1 Sep  2 20:32 /dev/loop1
> brw-rw---- 1 root disk  7,   2 Sep  2 20:32 /dev/loop2
> brw-rw---- 1 root disk  7,   3 Sep  2 20:32 /dev/loop3
> brw-rw---- 1 root disk  7,   4 Sep  2 20:32 /dev/loop4
> brw-rw---- 1 root disk  7,   5 Sep  2 20:32 /dev/loop5
> brw-rw---- 1 root disk  7,   6 Sep  2 20:32 /dev/loop6
> brw-rw---- 1 root disk  7,   7 Sep  2 20:32 /dev/loop7
> crw-rw---- 1 root disk 10, 237 Sep  2 20:32 /dev/loop-control
> root@superior:/mnt/backup# losetup -d ::: /dev/loop70
> losetup: :::: failed to use device: No such device
> root@superior:/mnt/backup# losetup -d ::: /dev/loop7
> losetup: :::: failed to use device: No such device
> root@superior:/mnt/backup# losetup -d  /dev/loop7
> losetup: /dev/loop7: detach failed: No such device or address
> root@superior:/mnt/backup# ls -la /dev/loop*
> brw-rw---- 1 root disk  7,   0 Sep  2 20:32 /dev/loop0
> brw-rw---- 1 root disk  7,   1 Sep  2 20:32 /dev/loop1
> brw-rw---- 1 root disk  7,   2 Sep  2 20:32 /dev/loop2
> brw-rw---- 1 root disk  7,   3 Sep  2 20:32 /dev/loop3
> brw-rw---- 1 root disk  7,   4 Sep  2 20:32 /dev/loop4
> brw-rw---- 1 root disk  7,   5 Sep  2 20:32 /dev/loop5
> brw-rw---- 1 root disk  7,   6 Sep  2 20:32 /dev/loop6
> brw-rw---- 1 root disk  7,   7 Sep  2 20:32 /dev/loop7
> crw-rw---- 1 root disk 10, 237 Sep  2 20:32 /dev/loop-control
> root@superior:/mnt/backup# losetup /dev/loop7
> losetup: /dev/loop7: No such file or directory
> root@superior:/mnt/backup# losetup /dev/loop5
> losetup: /dev/loop5: No such file or directory
> root@superior:/mnt/backup#


> not sure why losetup cannot see the existing /dev/loopx devices.

I was reading the dmsetup man page, and it says that open devices
can't be removed; with --force, the remove instead replaces the table
with one that fails all further I/O to the device.  But honestly I'm
not an expert on dmsetup.
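
One way to check whether anything still holds the snapshot devices
open before removing them is to look at the open count that dmsetup
reports.  This is a sketch only, and it may not explain the hang you
saw; busy_dm_devices is a hypothetical helper that expects
"name:open_count" lines on stdin:

```shell
#!/bin/sh
# Read "name:open_count" lines on stdin and print the names of the
# device-mapper devices whose open count is greater than zero.
# ASSUMPTION: input comes from
#   dmsetup info -c --noheadings --separator : -o name,open
busy_dm_devices() {
    awk -F: '$2 + 0 > 0 { print $1 }'
}
```

Feeding it the dmsetup command from the comment lists the devices that
are still in use; an empty result would suggest nothing (such as a
running md array) has them open.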

Sure, reboot each time if that's easiest, but you also need to make
sure the disks are in the same positions after every reboot, so look
at the serial numbers.

       lsscsi -g -l 

might be enough to give you unique identifiers, or instead use:

      hdparm -i /dev/sda | grep SerialNo

to get the info and keep track of which disk is in which order.
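
A small sketch for logging that per disk on each boot.  It assumes the
hdparm -i identify line contains a "SerialNo=<value>" field; both
function names are made up for the example:

```shell
#!/bin/sh
# Extract the serial number from `hdparm -i` output read on stdin.
# ASSUMPTION: the output has a field of the form "SerialNo=<value>".
serial_from_hdparm() {
    sed -n 's/.*SerialNo= *\([^ ,]*\).*/\1/p'
}

# Print "<device> <serial>" for each device given (needs root and hdparm).
# Usage: list_serials /dev/sdb /dev/sdc ...
list_serials() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(hdparm -i "$dev" 2>/dev/null | serial_from_hdparm)"
    done
}
```

Running "list_serials $DEVICES >> serials.log" after each boot gives a
record to compare.  The symlinks under /dev/disk/by-id, which embed
model and serial, are another way to get the same mapping.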

John



> On Fri, Sep 2, 2022 at 8:39 PM Peter Sanders <plsander@gmail.com> wrote:
>> 
>> Repeat of run 1
>> 
>> plsander@superior:~$ su -
>> Password:
>> root@superior:~# cat /proc/partitions
>> major minor  #blocks  name
>> 
>> 259        0  250059096 nvme0n1
>> 259        1     496640 nvme0n1p1
>> 259        2          1 nvme0n1p2
>> 259        3   63475712 nvme0n1p5
>> 259        4   97654784 nvme0n1p6
>> 259        5      37888 nvme0n1p7
>> 259        6   86913024 nvme0n1p8
>> 259        7    1474560 nvme0n1p9
>> 8       16 2930266584 sdb
>> 8       80 2930266584 sdf
>> 8        0 1953514584 sda
>> 8        1 1953513472 sda1
>> 8       32 2930266584 sdc
>> 8       96 2930266584 sdg
>> 8       64 2930266584 sde
>> 8       48 2930266584 sdd
>> 11        0    1048575 sr0
>> root@superior:~# cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> [raid4] [raid10]
>> unused devices: <none>
>> root@superior:~# DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
>> root@superior:~# echo $DEVICES
>> /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
>> root@superior:~# parallel 'test -e /dev/loop{#} || mknod -m 660
>> /dev/loop{#} b 7 {#}' ::: $DEVICES
>> root@superior:~# ls /dev/lo
>> log           loop2         loop4         loop6
>> loop1         loop3         loop5         loop-control
>> root@superior:~# ls /dev/lo*
>> /dev/log  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5
>> /dev/loop6  /dev/loop-control
>> root@superior:~# ls -l /dev/loop*
>> brw-rw---- 1 root root  7,   1 Sep  2 20:30 /dev/loop1
>> brw-rw---- 1 root root  7,   2 Sep  2 20:30 /dev/loop2
>> brw-rw---- 1 root root  7,   3 Sep  2 20:30 /dev/loop3
>> brw-rw---- 1 root root  7,   4 Sep  2 20:30 /dev/loop4
>> brw-rw---- 1 root root  7,   5 Sep  2 20:30 /dev/loop5
>> brw-rw---- 1 root root  7,   6 Sep  2 20:30 /dev/loop6
>> crw-rw---- 1 root disk 10, 237 Sep  2 20:22 /dev/loop-control
>> root@superior:~# cd /mnt/backup/
>> root@superior:/mnt/backup# parallel truncate -s4000G overlay-{/} ::: $DEVICES
>> root@superior:/mnt/backup# ls -l
>> total 16
>> drwx------ 2 root root         16384 Aug 28 18:50 lost+found
>> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdb
>> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdc
>> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdd
>> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sde
>> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdf
>> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdg
>> root@superior:/mnt/backup# rm over*
>> root@superior:/mnt/backup# parallel truncate -s300G overlay-{/} ::: $DEVICES
>> root@superior:/mnt/backup# ls -la
>> total 24
>> drwxr-xr-x 3 root root         4096 Sep  2 20:31 .
>> drwxr-xr-x 7 root root         4096 Aug 29 09:17 ..
>> drwx------ 2 root root        16384 Aug 28 18:50 lost+found
>> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdb
>> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdc
>> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdd
>> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sde
>> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdf
>> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdg
>> root@superior:/mnt/backup# dmsetup status
>> No devices found
>> root@superior:/mnt/backup# date
>> Fri 02 Sep 2022 08:32:11 PM EDT
>> root@superior:/mnt/backup#  parallel 'size=$(blockdev --getsize {});
>> loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {}
>> $loop P 8 | dmsetup create {/}' ::: $DEVICES
>> root@superior:/mnt/backup# date
>> Fri 02 Sep 2022 08:32:20 PM EDT
>> root@superior:/mnt/backup# dmsetup status
>> sdg: 0 5860533168 snapshot 16/629145600 16
>> sdf: 0 5860533168 snapshot 16/629145600 16
>> sde: 0 5860533168 snapshot 16/629145600 16
>> sdd: 0 5860533168 snapshot 16/629145600 16
>> sdc: 0 5860533168 snapshot 16/629145600 16
>> sdb: 0 5860533168 snapshot 16/629145600 16
>> root@superior:/mnt/backup# OVERLAYS=$(parallel echo /dev/mapper/{/}
>> ::: $DEVICES)
>> root@superior:/mnt/backup# echo $OVERLAYS
>> /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde
>> /dev/mapper/sdf /dev/mapper/sdg
>> root@superior:/mnt/backup# mdadm --create /dev/md1 --level=raid6 -n 6
>> --assume-clean $OVERLAYS
>> mdadm: partition table exists on /dev/mapper/sdb
>> mdadm: partition table exists on /dev/mapper/sdc
>> mdadm: partition table exists on /dev/mapper/sdc but will be lost or
>> meaningless after creating array
>> mdadm: partition table exists on /dev/mapper/sdd
>> mdadm: partition table exists on /dev/mapper/sdd but will be lost or
>> meaningless after creating array
>> mdadm: partition table exists on /dev/mapper/sde
>> mdadm: partition table exists on /dev/mapper/sde but will be lost or
>> meaningless after creating array
>> mdadm: partition table exists on /dev/mapper/sdf
>> mdadm: partition table exists on /dev/mapper/sdf but will be lost or
>> meaningless after creating array
>> mdadm: partition table exists on /dev/mapper/sdg
>> mdadm: partition table exists on /dev/mapper/sdg but will be lost or
>> meaningless after creating array
>> Continue creating array? y
>> mdadm: Defaulting to version 1.2 metadata
>> mdadm: array /dev/md1 started.
>> root@superior:/mnt/backup# ls -l /dev/md*
>> brw-rw---- 1 root disk 9, 1 Sep  2 20:34 /dev/md1
>> root@superior:/mnt/backup# fsck /dev/md1
>> fsck from util-linux 2.36.1
>> e2fsck 1.46.2 (28-Feb-2021)
>> ext2fs_open2: Bad magic number in super-block
>> fsck.ext2: Superblock invalid, trying backup blocks...
>> fsck.ext2: Bad magic number in super-block while trying to open /dev/md1
>> 
>> The superblock could not be read or does not describe a valid ext2/ext3/ext4
>> filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
>> filesystem (and not swap or ufs or something else), then the superblock
>> is corrupt, and you might try running e2fsck with an alternate superblock:
>> e2fsck -b 8193 <device>
>> or
>> e2fsck -b 32768 <device>
>> 
>> root@superior:/mnt/backup# blkid /dev/md1
>> root@superior:/mnt/backup#
>> root@superior:/mnt/backup# cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> [raid4] [raid10]
>> md1 : active raid6 dm-3[5] dm-2[4] dm-1[3] dm-5[2] dm-0[1] dm-4[0]
>> 11720536064 blocks super 1.2 level 6, 512k chunk, algorithm 2
>> [6/6] [UUUUUU]
>> bitmap: 0/22 pages [0KB], 65536KB chunk
>> 
>> unused devices: <none>
>> root@superior:/mnt/backup#
>> 
>> Some questions -
>> - is the easiest 'reset for next run' to reboot and rebuild?
>> 
>> 
>> On Fri, Sep 2, 2022 at 3:12 PM John Stoffel <john@stoffel.org> wrote:
>> >
>> > >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>> >
>> > Peter, please include the output of all the commands, not just the
>> > commands themselves.  See my comments below.
>> >
>> >
>> > > Question on restarting from scratch...
>> > > How to reset to the starting point?
>> >
>> > I think you need to blow away the loop devices and re-create them.
>> >
>> > Or at least blow away the dmsetup devices you just created.
>> >
>> > It might be quickest to just reboot.  What OS are you using for the
>> > recovery?  Is it a recent live image?  Sorry for asking so many
>> > questions... some of this is new to me too.
>> >
>> >
>> > > dmsetup, both for remove and create of the overlay seems to be hanging.
>> >
>> > > On Fri, Sep 2, 2022 at 10:56 AM Peter Sanders <plsander@gmail.com> wrote:
>> > >>
>> > >> contents of /proc/mdstat
>> > >>
>> > >> root@superior:/mnt/backup# cat /proc/mdstat
>> > >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> > >> [raid4] [raid10]
>> > >> unused devices: <none>
>> > >> root@superior:/mnt/backup#
>> > >>
>> > >>
>> > >>
>> > >> Here are the steps I ran (minus some mounting other devices and
>> > >> looking around for mdadm tracks on the old os disk)
>> > >>
>> > >> 410  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
>> > >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
>> > >> echo /dev/{1})
>> > >> 411  apt install parallel
>> > >> 412  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
>> > >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
>> > >> echo /dev/{1})
>> > >> 413  echo $DEVICES
>> >
>> > So you found no MD RAID super blocks on any of the base devices.  You
>> > can skip this step moving forward.
>> >
>> > >> 414  cat /proc/partitions
>> > >> 415  DEVICES=/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
>> > >> 416  DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
>> > >> 417  echo $DEVICES
>> > >> 418  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
>> > >> {#}' ::: $DEVICES
>> > >> 419  ls /dev/loop*
>> >
>> > Can you show the output of all these commands, not just the commands please?
>> >
>> > >> 423  parallel truncate -s300G overlay-{/} ::: $DEVICES
>> >
>> > >> 427  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
>> > >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
>> > >> create {/}' ::: $DEVICES
>> > >> 428  ls /dev/mapper/
>> >
>> > This is some key output to view.
>> >
>> > >> 429  OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
>> > >> 430  echo $OVERLAYS
>> >
>> > What are the overlays?
>> >
>> > >> 431  dmsetup status
>> >
>> > What did this command show?
>> >
>> > >> 432  mdadm --assemble --force /dev/md1 $OVERLAYS
>> >
>> > And here is where I think you need to put --assume-clean when using
>> > 'create' command instead.  It's not going to assemble anything because
>> > the info was wiped.  I *think* you really want:
>> >
>> >    mdadm --create /dev/md1 --level=raid6 -n 6 --assume-clean $OVERLAYS
>> >
>> > And once you do this above command and it comes back, do:
>> >
>> >     cat /proc/mdstat
>> >
>> > and show all the output please!
>> >
>> > >> 433  history
>> > >> 434  dmsetup status
>> > >> 435  echo $OVERLAYS
>> > >> 436  mdadm --assemble --force /dev/md0 $OVERLAYS
>> > >> 437  cat /proc/partitions
>> > >> 438  mkdir /mnt/oldroot
>> > >> << look for initrd mdadm files >>
>> > >> 484  echo $OVERLAYS
>> > >> 485  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
>> > >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
>> > >> /dev/mapper/sdg
>> >
>> > I'm confused here, what  is the difference between the md1 you
>> > assembled above, and the md0 you're doing here?
>> >
>> > >> << cancelled out of 485, review instructions... >>
>> > >> 486  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
>> > >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
>> > >> /dev/mapper/sdg
>> > >> 487  fsck -n /dev/md0
>> >
>> > And what output did you get here?  Did it find a filesystem?  You might want
>> > to try:
>> >
>> >    blkid /dev/md0
>> >
>> >
>> > >> 488  mdadm --stop /dev/md0
>> > >> 489  echo $DEVICES
>> > >> 490   parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
>> > >> 491  dmsetup status
>> >
>> > This all worked properly?  No errors?
>> >
>> > I gave up after this because it's not clear what the results really
>> > are.  If you don't find a filesystem that fsck's cleanly, then you
>> > should just need to stop the array, then re-create it but shuffle the
>> > order of the devices.
>> >
>> > Instead of the disks in the order "sdb sdc sdd ... sdN", you would try
>> > the order "sdc sdd ... sdN sdb".  See how I moved sdb to the end of the
>> > list of devices?  With six disks you have, I think, 6 factorial (720)
>> > orderings to try.  That is a lot to go through, which is why you need to
>> > automate this more.  But also keep a log and show the output!
>> >
>> > John
>> >
>> >
>> > >> 492  ls
>> > >> 493  rm overlay-*
>> > >> 494  ls
>> > >> 495  parallel losetup -d ::: /dev/loop[0-9]*
>> > >> 496  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
>> > >> {#}' ::: $DEVICES
>> > >> 497  parallel truncate -s300G overlay-{/} ::: $DEVICES
>> > >> 498  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
>> > >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
>> > >> create {/}' ::: $DEVICES
>> > >> 499  dmsetup status
>> > >> 500  /sbin/reboot
>> > >> 501  history
>> > >> 502  dmsetup status
>> > >> 503  mount
>> > >> 504  cat /proc/partitions
>> > >> 505  nano /etc/fstab
>> > >> 506  mount /mnt/backup/
>> > >> 507  ls /mnt/backup/
>> > >> 508  rm /mnt/backup/
>> > >> 509  rm /mnt/backup/overlay-sd*
>> > >> 510  emacs setupOverlay &
>> > >> 511  ps auxww | grep emacs
>> > >> 512  kill 65017
>> > >> 513  ls /dev/loo*
>> > >> 514  DEVICES='/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg'
>> > >> 515  echo $DEVICES
>> > >> 516   parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b
>> > >> 7 {#}' ::: $DEVICES
>> > >> 517  ls /dev/loo*
>> > >> 518  parallel truncate -s4000G overlay-{/} ::: $DEVICES
>> > >> 519  ls
>> > >> 520  rm overlay-sd*
>> > >> 521  cd /mnt/bak
>> > >> 522  cd /mnt/backup/
>> > >> 523  ls
>> > >> 524  parallel truncate -s4000G overlay-{/} ::: $DEVICES
>> > >> 525  ls -la
>> > >> 526  blockdev --getsize /dev/sdb
>> > >> 527  man losetup
>> > >> 528  man losetup
>> > >> 529  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
>> > >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
>> > >> create {/}' ::: $DEVICES
>> > >> 530  dmsetup status
>> > >> 531  history | grep mdadm
>> > >> 532  history
>> > >> 533  dmsetup status
>> > >> 534  history | grep dmsetup
>> > >> 535  dmsetup status
>> > >> 536  dmsetup remove sdg
>> > >> 537  dmsetup ls --tree
>> > >> 538  lsof
>> > >> 539  dmsetup ls --tre
>> > >> 540  dmsetup ls --tree
>> > >> 541  lsof | grep -i sdg
>> > >> 542  lsof | grep -i sdf
>> > >> 543  history |grep dmsetup | less
>> > >> 544  dmsetup status
>> > >> 545  history > ~plsander/Documents/raidIssues/joblog
>> > >>
>> > >> On Wed, Aug 31, 2022 at 4:37 PM John Stoffel <john@stoffel.org> wrote:
>> > >> >
>> > >> > >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>> > >> >
>> > >> > > encountering a puzzling situation.
>> > >> > > dmsetup is failing to return.
>> > >> >
>> > >> > I don't think you need to use dmsetup in your case, but can you post
>> > >> > *all* the commands you ran before you got to this point, and the
>> > >> > output of
>> > >> >
>> > >> >        cat /proc/mdstat
>> > >> >
>> > >> > as well?  Thinking on this some more, you might need to actually also
>> > >> > add:
>> > >> >
>> > >> >         --assume-clean
>> > >> >
>> > >> > to the 'mdadm create ....' string, since you don't want it to zero the
>> > >> > array or anything.
>> > >> >
>> > >> > Sorry for not remembering this at the time!
>> > >> >
>> > >> > So if you can, please just start over from scratch, showing the setup
>> > >> > of the loop devices, the overlayfs setup, and the building the RAID6
>> > >> > array, along with the cat /proc/mdstat after you do the initial build.
>> > >> >
>> > >> > John
>> > >> >
>> > >> > P.S.  For those who hated my email citing tool, I pulled it out for
>> > >> > now.  Only citing with > now.  :-)
>> > >> >
>> > >> > > root@superior:/mnt/backup# dmsetup status
>> > >> > > sdg: 0 5860533168 snapshot 16/8388608000 16
>> > >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
>> > >> > > sde: 0 5860533168 snapshot 16/8388608000 16
>> > >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
>> > >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
>> > >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
>> > >> >
>> > >> > > dmsetup remove sdg  runs for hours.
>> > >> > > Canceled it, ran dmsetup ls --tree and find that sdg is not present in the list.
>> > >> >
>> > >> > > dmsetup status shows:
>> > >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
>> > >> > > sde: 0 5860533168 snapshot 16/8388608000 16
>> > >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
>> > >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
>> > >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
>> > >> >
>> > >> > > dmsetup ls --tree
>> > >> > > root@superior:/mnt/backup# dmsetup ls --tree
>> > >> > > sdf (253:3)
>> > >> > >  ├─ (7:3)
>> > >> > >  └─ (8:80)
>> > >> > > sde (253:1)
>> > >> > >  ├─ (7:1)
>> > >> > >  └─ (8:64)
>> > >> > > sdd (253:2)
>> > >> > >  ├─ (7:2)
>> > >> > >  └─ (8:48)
>> > >> > > sdc (253:0)
>> > >> > >  ├─ (7:0)
>> > >> > >  └─ (8:32)
>> > >> > > sdb (253:5)
>> > >> > >  ├─ (7:5)
>> > >> > >  └─ (8:16)
>> > >> >
>> > >> > > any suggestions?
>> > >> >
>> > >> >
>> > >> >
>> > >> > > On Tue, Aug 30, 2022 at 2:03 PM Wols Lists <antlists@youngman.org.uk> wrote:
>> > >> > >>
>> > >> > >> On 30/08/2022 14:27, Peter Sanders wrote:
>> > >> > >> >
>> > >> > >> > And the victory conditions would be a mountable file system that passes a fsck?
>> > >> > >>
>> > >> > >> Yes. Just make sure you delve through the file system a bit and satisfy
>> > >> > >> yourself it looks good, too ...
>> > >> > >>
>> > >> > >> Cheers,
>> > >> > >> Wol

* Re: RAID 6, 6 device array - all devices lost superblock
  2022-09-05 19:36                                         ` John Stoffel
@ 2022-09-05 20:16                                           ` Peter Sanders
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Sanders @ 2022-09-05 20:16 UTC (permalink / raw)
  To: John Stoffel; +Cc: Wols Lists, Eyal Lebedinsky, linux-raid

Wrote a script to do the setup:
-----
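#!/usr/bin/bash
# Build copy-on-write overlays over the six member disks so that
# mdadm experiments never write to the real drives.

cd /mnt/backup

DEVICES='/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg'
echo $DEVICES

# Make sure a loop device node exists for each disk.
date
parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7 {#}' \
    ::: $DEVICES
ls -l /dev/loop*

# Create one sparse 300G overlay file per disk.
date
parallel truncate -s300G overlay-{/} ::: $DEVICES
ls -l /mnt/backup

# Attach each overlay file to a loop device and stack a dm snapshot,
# named after the disk (sdb ... sdg), on top of the real device.
date
parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' \
    ::: $DEVICES

date
OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)

dmsetup status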
-----
Ran that, then ran pvs, vgs, etc.:

Mon 05 Sep 2022 04:05:50 PM EDT
sdg: 0 5860533168 snapshot 16/629145600 16
sdf: 0 5860533168 snapshot 16/629145600 16
sde: 0 5860533168 snapshot 16/629145600 16
sdd: 0 5860533168 snapshot 16/629145600 16
sdc: 0 5860533168 snapshot 16/629145600 16
sdb: 0 5860533168 snapshot 16/629145600 16
root@superior:~# pvs
pvs     pvscan
root@superior:~# pvs
root@superior:~# vg
vgcfgbackup    vgck           vgdisplay      vgimport       vgmknodes
    vgrename       vgsplit
vgcfgrestore   vgconvert      vgexport       vgimportclone  vgreduce
    vgs
vgchange       vgcreate       vgextend       vgmerge        vgremove
    vgscan
root@superior:~# vgs
root@superior:~# pvs -a
  PV             VG Fmt Attr PSize PFree
  /dev/loop0            ---     0     0
  /dev/loop1            ---     0     0
  /dev/loop2            ---     0     0
  /dev/loop3            ---     0     0
  /dev/loop4            ---     0     0
  /dev/loop5            ---     0     0
  /dev/nvme0n1          ---     0     0
  /dev/nvme0n1p1        ---     0     0
  /dev/nvme0n1p5        ---     0     0
  /dev/nvme0n1p6        ---     0     0
  /dev/nvme0n1p7        ---     0     0
  /dev/nvme0n1p8        ---     0     0
  /dev/nvme0n1p9        ---     0     0
  /dev/sda1             ---     0     0
root@superior:~# vgs -a
root@superior:~# vgs -a
root@superior:~# dmsetup remove
No device specified.
Command failed.
root@superior:~# echo $DEVICES

root@superior:~# dmsetup remove /dev/sdb
Device sdb not found
Command failed.
root@superior:~# cat /proc/partitions
major minor  #blocks  name

 259        0  250059096 nvme0n1
 259        1     496640 nvme0n1p1
 259        2          1 nvme0n1p2
 259        3   63475712 nvme0n1p5
 259        4   97654784 nvme0n1p6
 259        5      37888 nvme0n1p7
 259        6   86913024 nvme0n1p8
 259        7    1474560 nvme0n1p9
  11        0    1048575 sr0
   8       48 2930266584 sdd
   8       16 2930266584 sdb
   8       80 2930266584 sdf
   8        0 1953514584 sda
   8        1 1953513472 sda1
   8       32 2930266584 sdc
   8       64 2930266584 sde
   8       96 2930266584 sdg
   7        0  314572800 loop0
   7        1  314572800 loop1
   7        2  314572800 loop2
   7        3  314572800 loop3
   7        4  314572800 loop4
   7        5  314572800 loop5
 253        0 2930266584 dm-0
 253        1 2930266584 dm-1
 253        2 2930266584 dm-2
 253        3 2930266584 dm-3
 253        4 2930266584 dm-4
 253        5 2930266584 dm-5
root@superior:~#
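
A likely reason `echo $DEVICES` printed nothing above: a script that is
executed runs in a child shell, so the DEVICES and OVERLAYS it sets are gone
once it returns. A minimal sketch of the difference, using a throwaway
stand-in file (hypothetical, it only sets a variable) so nothing touches the
disks:

```shell
#!/usr/bin/env bash
# Stand-in for the setup script (hypothetical file; it only sets a variable).
demo=$(mktemp)
echo "DEVICES='/dev/sdb /dev/sdc'" > "$demo"

# Executed: runs in a child process, so the calling shell sees nothing.
unset DEVICES
bash "$demo"
executed_result="${DEVICES:-<empty>}"

# Sourced: runs in the current shell, so the variable persists.
. "$demo"
sourced_result="${DEVICES:-<empty>}"

rm -f "$demo"
echo "after executing: $executed_result"
echo "after sourcing:  $sourced_result"
```

If that is what happened here, sourcing the setup script (`. ./scriptname`)
or re-setting DEVICES by hand before the teardown should make the later
`parallel ... ::: $DEVICES` commands see the device list again.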


Should I be calling dmsetup remove with /dev/sdX or /dev/dm-N?
I ask because dmsetup remove fails to find /dev/sdb.
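
On the name to pass: the mappings were created with `dmsetup create {/}`, so
they are named after the basenames (sdb ... sdg), which is also what
`dmsetup status` lists. Those names, rather than /dev/sdb, are presumably
what `dmsetup remove` wants. Below is a rough dry-run sketch that only
prints the teardown commands so they can be reviewed before anything is run.
The helper name `teardown_cmds` and the loop-device lookup via `losetup -j`
are assumptions of this sketch, not something taken from the thread:

```shell
#!/usr/bin/env bash
# Dry run: print the teardown commands for each overlay instead of running them.
# teardown_cmds is a hypothetical helper name.
teardown_cmds() {
    local dev name
    for dev in "$@"; do
        name=${dev##*/}      # /dev/sdb -> sdb, the dm mapping name
        echo "dmsetup remove $name"
        # Printed literally; when run, it detaches the loop device backing the overlay.
        echo "losetup -d \$(losetup -j overlay-$name | cut -d: -f1)"
        echo "rm overlay-$name"
    done
}

DEVICES='/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg'
echo "mdadm --stop /dev/md1"   # the array has to release its members first
teardown_cmds $DEVICES
```

Printing first and running second keeps a log of exactly what was attempted,
which helps when a remove hangs partway through.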

For device order: will ls -l /dev/disk/by-id work?
The first nine lines appear to show serial numbers.


root@superior:~# ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root  9 Sep  5 16:02 ata-MAD_DOG_LS-DVDRW_TSH652M -> ../../sr0
lrwxrwxrwx 1 root root  9 Sep  5 16:02 ata-TOSHIBA_HDWD130_477ABEJAS
-> ../../sdg
lrwxrwxrwx 1 root root  9 Sep  5 16:02 ata-TOSHIBA_HDWD130_477ALBNAS
-> ../../sdb
lrwxrwxrwx 1 root root  9 Sep  5 16:02 ata-TOSHIBA_HDWD130_Y7211KPAS
-> ../../sdc
lrwxrwxrwx 1 root root  9 Sep  5 16:02
ata-WDC_WD20EARX-00PASB0_WD-WMAZA6843376 -> ../../sda
lrwxrwxrwx 1 root root 10 Sep  5 16:02
ata-WDC_WD20EARX-00PASB0_WD-WMAZA6843376-part1 -> ../../sda1
lrwxrwxrwx 1 root root  9 Sep  5 16:02
ata-WDC_WD30EZRX-00D8PB0_WD-WCC4N0091255 -> ../../sde
lrwxrwxrwx 1 root root  9 Sep  5 16:02
ata-WDC_WD30EZRX-00DC0B0_WD-WCC1T0668790 -> ../../sdd
lrwxrwxrwx 1 root root  9 Sep  5 16:02
ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2669166 -> ../../sdf
lrwxrwxrwx 1 root root 13 Sep  5 16:02 nvme-eui.6479a75970c003ee ->
../../nvme0n1
lrwxrwxrwx 1 root root 15 Sep  5 16:02 nvme-eui.6479a75970c003ee-part1
-> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Sep  5 16:02 nvme-eui.6479a75970c003ee-part2
-> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Sep  5 16:02 nvme-eui.6479a75970c003ee-part5
-> ../../nvme0n1p5
lrwxrwxrwx 1 root root 15 Sep  5 16:02 nvme-eui.6479a75970c003ee-part6
-> ../../nvme0n1p6
lrwxrwxrwx 1 root root 15 Sep  5 16:02 nvme-eui.6479a75970c003ee-part7
-> ../../nvme0n1p7
lrwxrwxrwx 1 root root 15 Sep  5 16:02 nvme-eui.6479a75970c003ee-part8
-> ../../nvme0n1p8
lrwxrwxrwx 1 root root 15 Sep  5 16:02 nvme-eui.6479a75970c003ee-part9
-> ../../nvme0n1p9
lrwxrwxrwx 1 root root 13 Sep  5 16:02 nvme-PCIe_SSD_21112925606047 ->
../../nvme0n1
lrwxrwxrwx 1 root root 15 Sep  5 16:02
nvme-PCIe_SSD_21112925606047-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Sep  5 16:02
nvme-PCIe_SSD_21112925606047-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Sep  5 16:02
nvme-PCIe_SSD_21112925606047-part5 -> ../../nvme0n1p5
lrwxrwxrwx 1 root root 15 Sep  5 16:02
nvme-PCIe_SSD_21112925606047-part6 -> ../../nvme0n1p6
lrwxrwxrwx 1 root root 15 Sep  5 16:02
nvme-PCIe_SSD_21112925606047-part7 -> ../../nvme0n1p7
lrwxrwxrwx 1 root root 15 Sep  5 16:02
nvme-PCIe_SSD_21112925606047-part8 -> ../../nvme0n1p8
lrwxrwxrwx 1 root root 15 Sep  5 16:02
nvme-PCIe_SSD_21112925606047-part9 -> ../../nvme0n1p9
lrwxrwxrwx 1 root root  9 Sep  5 16:02 wwn-0x5000039fe6d2ce25 -> ../../sdg
lrwxrwxrwx 1 root root  9 Sep  5 16:02 wwn-0x5000039fe6d2e832 -> ../../sdb
lrwxrwxrwx 1 root root  9 Sep  5 16:02 wwn-0x5000039fe6dca946 -> ../../sdc
lrwxrwxrwx 1 root root  9 Sep  5 16:02 wwn-0x50014ee15a13d994 -> ../../sdf
lrwxrwxrwx 1 root root  9 Sep  5 16:02 wwn-0x50014ee206a417d2 -> ../../sda
lrwxrwxrwx 1 root root 10 Sep  5 16:02 wwn-0x50014ee206a417d2-part1 ->
../../sda1
lrwxrwxrwx 1 root root  9 Sep  5 16:02 wwn-0x50014ee2084d406a -> ../../sdd
lrwxrwxrwx 1 root root  9 Sep  5 16:02 wwn-0x50014ee2b3d4ffa1 -> ../../sde
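
To keep track of which serial number sits behind which sdX name across
reboots, the listing above can be reduced to one line per whole disk. A
minimal sketch, assuming the `ata-*` links are the ones of interest;
`list_disks` and its optional directory argument are hypothetical, added so
it can be tried against a scratch directory first:

```shell
#!/usr/bin/env bash
# Print "kernel-name  by-id-name" for every whole-disk ata-* link.
# list_disks is a hypothetical helper; the directory defaults to /dev/disk/by-id.
list_disks() {
    local dir=${1:-/dev/disk/by-id} link target
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue                      # glob matched nothing
        case $link in *-part[0-9]*) continue ;; esac    # skip partition links
        target=$(readlink "$link")                      # e.g. ../../sdb
        printf '%s  %s\n' "${target##*/}" "${link##*/}"
    done | sort
}

list_disks
```

Saving this output next to each test run records which physical drive was
sdb, sdc, and so on at that boot. Note that it also lists non-member devices
such as sda and the DVD drive, since they have `ata-*` links too.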

On Mon, Sep 5, 2022 at 3:36 PM John Stoffel <john@stoffel.org> wrote:
>
> >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
>
> > tried removing the setup:
> > root@superior:/mnt/backup# mdadm --stop /dev/md1
> > mdadm: stopped /dev/md1
> > root@superior:/mnt/backup#  parallel 'dmsetup remove {/}; rm
> > overlay-{/}' ::: $DEVICES
> > ^C
>
> > (ran for an hour before cancel... )
>
> > root@superior:/mnt/backup# dmsetup status
> > No devices found
> > root@superior:/mnt/backup# ls
> > lost+found  overlay-sdb  overlay-sdc  overlay-sdd  overlay-sde
> > overlay-sdf  overlay-sdg
> > root@superior:/mnt/backup# rm overlay-sd*
> > root@superior:/mnt/backup# ls /dev/loop
> > ls: cannot access '/dev/loop': No such file or directory
> > root@superior:/mnt/backup# ls /dev/loop*
> > /dev/loop0  /dev/loop2    /dev/loop4  /dev/loop6    /dev/loop-control
> > /dev/loop1  /dev/loop3    /dev/loop5  /dev/loop7
> > root@superior:/mnt/backup# parallel losetup -d ::: /dev/loop[0-9]*
> > losetup: /dev/loop6: detach failed: No such device or address
> > losetup: /dev/loop7: detach failed: No such device or address
> > root@superior:/mnt/backup# ls /dev/loop*
> > /dev/loop0  /dev/loop2    /dev/loop4  /dev/loop6    /dev/loop-control
> > /dev/loop1  /dev/loop3    /dev/loop5  /dev/loop7
> > root@superior:/mnt/backup# ls -la /dev/lo*
> > lrwxrwxrwx 1 root root      28 Sep  2 20:22 /dev/log ->
> > /run/systemd/journal/dev-log
> > brw-rw---- 1 root disk  7,   0 Sep  2 20:32 /dev/loop0
> > brw-rw---- 1 root disk  7,   1 Sep  2 20:32 /dev/loop1
> > brw-rw---- 1 root disk  7,   2 Sep  2 20:32 /dev/loop2
> > brw-rw---- 1 root disk  7,   3 Sep  2 20:32 /dev/loop3
> > brw-rw---- 1 root disk  7,   4 Sep  2 20:32 /dev/loop4
> > brw-rw---- 1 root disk  7,   5 Sep  2 20:32 /dev/loop5
> > brw-rw---- 1 root disk  7,   6 Sep  2 20:32 /dev/loop6
> > brw-rw---- 1 root disk  7,   7 Sep  2 20:32 /dev/loop7
> > crw-rw---- 1 root disk 10, 237 Sep  2 20:32 /dev/loop-control
> > root@superior:/mnt/backup# losetup -d ::: /dev/loop70
> > losetup: :::: failed to use device: No such device
> > root@superior:/mnt/backup# losetup -d ::: /dev/loop7
> > losetup: :::: failed to use device: No such device
> > root@superior:/mnt/backup# losetup -d  /dev/loop7
> > losetup: /dev/loop7: detach failed: No such device or address
> > root@superior:/mnt/backup# ls -la /dev/loop*
> > brw-rw---- 1 root disk  7,   0 Sep  2 20:32 /dev/loop0
> > brw-rw---- 1 root disk  7,   1 Sep  2 20:32 /dev/loop1
> > brw-rw---- 1 root disk  7,   2 Sep  2 20:32 /dev/loop2
> > brw-rw---- 1 root disk  7,   3 Sep  2 20:32 /dev/loop3
> > brw-rw---- 1 root disk  7,   4 Sep  2 20:32 /dev/loop4
> > brw-rw---- 1 root disk  7,   5 Sep  2 20:32 /dev/loop5
> > brw-rw---- 1 root disk  7,   6 Sep  2 20:32 /dev/loop6
> > brw-rw---- 1 root disk  7,   7 Sep  2 20:32 /dev/loop7
> > crw-rw---- 1 root disk 10, 237 Sep  2 20:32 /dev/loop-control
> > root@superior:/mnt/backup# losetup /dev/loop7
> > losetup: /dev/loop7: No such file or directory
> > root@superior:/mnt/backup# losetup /dev/loop5
> > losetup: /dev/loop5: No such file or directory
> > root@superior:/mnt/backup#
>
>
> > not sure why losetup cannot see the existing /dev/loopx devices.
>
> I was reading the dmsetup man page and it said that if the devices are
> open, when you do a remove, it sorta fails them and then blocks them
> as unable to have more IO sent to them.  But honestly I'm not an
> expert on dmsetup.
>
> But sure, try to reboot each time, but you also need to make sure the
> disks are in the same positions each time after reboot, so look at the
> serial numbers.
>
>        lsscsi -g -l
>
> might be enough to give you unique or instead use:
>
>       hdparm -i /dev/sda | grep SerialNo
>
> to get the info and keep track of which disk is in which order.
>
> John
>
>
>
> > On Fri, Sep 2, 2022 at 8:39 PM Peter Sanders <plsander@gmail.com> wrote:
> >>
> >> Repeat of run 1
> >>
> >> plsander@superior:~$ su -
> >> Password:
> >> root@superior:~# cat /proc/partitions
> >> major minor  #blocks  name
> >>
> >> 259        0  250059096 nvme0n1
> >> 259        1     496640 nvme0n1p1
> >> 259        2          1 nvme0n1p2
> >> 259        3   63475712 nvme0n1p5
> >> 259        4   97654784 nvme0n1p6
> >> 259        5      37888 nvme0n1p7
> >> 259        6   86913024 nvme0n1p8
> >> 259        7    1474560 nvme0n1p9
> >> 8       16 2930266584 sdb
> >> 8       80 2930266584 sdf
> >> 8        0 1953514584 sda
> >> 8        1 1953513472 sda1
> >> 8       32 2930266584 sdc
> >> 8       96 2930266584 sdg
> >> 8       64 2930266584 sde
> >> 8       48 2930266584 sdd
> >> 11        0    1048575 sr0
> >> root@superior:~# cat /proc/mdstat
> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >> [raid4] [raid10]
> >> unused devices: <none>
> >> root@superior:~# DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
> >> root@superior:~# echo $DEVICES
> >> /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> >> root@superior:~# parallel 'test -e /dev/loop{#} || mknod -m 660
> >> /dev/loop{#} b 7 {#}' ::: $DEVICES
> >> root@superior:~# ls /dev/lo
> >> log           loop2         loop4         loop6
> >> loop1         loop3         loop5         loop-control
> >> root@superior:~# ls /dev/lo*
> >> /dev/log  /dev/loop1  /dev/loop2  /dev/loop3  /dev/loop4  /dev/loop5
> >> /dev/loop6  /dev/loop-control
> >> root@superior:~# ls -l /dev/loop*
> >> brw-rw---- 1 root root  7,   1 Sep  2 20:30 /dev/loop1
> >> brw-rw---- 1 root root  7,   2 Sep  2 20:30 /dev/loop2
> >> brw-rw---- 1 root root  7,   3 Sep  2 20:30 /dev/loop3
> >> brw-rw---- 1 root root  7,   4 Sep  2 20:30 /dev/loop4
> >> brw-rw---- 1 root root  7,   5 Sep  2 20:30 /dev/loop5
> >> brw-rw---- 1 root root  7,   6 Sep  2 20:30 /dev/loop6
> >> crw-rw---- 1 root disk 10, 237 Sep  2 20:22 /dev/loop-control
> >> root@superior:~# cd /mnt/backup/
> >> root@superior:/mnt/backup# parallel truncate -s4000G overlay-{/} ::: $DEVICES
> >> root@superior:/mnt/backup# ls -l
> >> total 16
> >> drwx------ 2 root root         16384 Aug 28 18:50 lost+found
> >> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdb
> >> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdc
> >> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdd
> >> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sde
> >> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdf
> >> -rw-r--r-- 1 root root 4294967296000 Sep  2 20:31 overlay-sdg
> >> root@superior:/mnt/backup# rm over*
> >> root@superior:/mnt/backup# parallel truncate -s300G overlay-{/} ::: $DEVICES
> >> root@superior:/mnt/backup# ls -la
> >> total 24
> >> drwxr-xr-x 3 root root         4096 Sep  2 20:31 .
> >> drwxr-xr-x 7 root root         4096 Aug 29 09:17 ..
> >> drwx------ 2 root root        16384 Aug 28 18:50 lost+found
> >> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdb
> >> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdc
> >> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdd
> >> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sde
> >> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdf
> >> -rw-r--r-- 1 root root 322122547200 Sep  2 20:31 overlay-sdg
> >> root@superior:/mnt/backup# dmsetup status
> >> No devices found
> >> root@superior:/mnt/backup# date
> >> Fri 02 Sep 2022 08:32:11 PM EDT
> >> root@superior:/mnt/backup#  parallel 'size=$(blockdev --getsize {});
> >> loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {}
> >> $loop P 8 | dmsetup create {/}' ::: $DEVICES
> >> root@superior:/mnt/backup# date
> >> Fri 02 Sep 2022 08:32:20 PM EDT
> >> root@superior:/mnt/backup# dmsetup status
> >> sdg: 0 5860533168 snapshot 16/629145600 16
> >> sdf: 0 5860533168 snapshot 16/629145600 16
> >> sde: 0 5860533168 snapshot 16/629145600 16
> >> sdd: 0 5860533168 snapshot 16/629145600 16
> >> sdc: 0 5860533168 snapshot 16/629145600 16
> >> sdb: 0 5860533168 snapshot 16/629145600 16
> >> root@superior:/mnt/backup# OVERLAYS=$(parallel echo /dev/mapper/{/}
> >> ::: $DEVICES)
> >> root@superior:/mnt/backup# echo $OVERLAYS
> >> /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde
> >> /dev/mapper/sdf /dev/mapper/sdg
> >> root@superior:/mnt/backup# mdadm --create /dev/md1 --level=raid6 -n 6
> >> --assume-clean $OVERLAYS
> >> mdadm: partition table exists on /dev/mapper/sdb
> >> mdadm: partition table exists on /dev/mapper/sdc
> >> mdadm: partition table exists on /dev/mapper/sdc but will be lost or
> >> meaningless after creating array
> >> mdadm: partition table exists on /dev/mapper/sdd
> >> mdadm: partition table exists on /dev/mapper/sdd but will be lost or
> >> meaningless after creating array
> >> mdadm: partition table exists on /dev/mapper/sde
> >> mdadm: partition table exists on /dev/mapper/sde but will be lost or
> >> meaningless after creating array
> >> mdadm: partition table exists on /dev/mapper/sdf
> >> mdadm: partition table exists on /dev/mapper/sdf but will be lost or
> >> meaningless after creating array
> >> mdadm: partition table exists on /dev/mapper/sdg
> >> mdadm: partition table exists on /dev/mapper/sdg but will be lost or
> >> meaningless after creating array
> >> Continue creating array? y
> >> mdadm: Defaulting to version 1.2 metadata
> >> mdadm: array /dev/md1 started.
> >> root@superior:/mnt/backup# ls -l /dev/md*
> >> brw-rw---- 1 root disk 9, 1 Sep  2 20:34 /dev/md1
> >> root@superior:/mnt/backup# fsck /dev/md1
> >> fsck from util-linux 2.36.1
> >> e2fsck 1.46.2 (28-Feb-2021)
> >> ext2fs_open2: Bad magic number in super-block
> >> fsck.ext2: Superblock invalid, trying backup blocks...
> >> fsck.ext2: Bad magic number in super-block while trying to open /dev/md1
> >>
> >> The superblock could not be read or does not describe a valid ext2/ext3/ext4
> >> filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
> >> filesystem (and not swap or ufs or something else), then the superblock
> >> is corrupt, and you might try running e2fsck with an alternate superblock:
> >> e2fsck -b 8193 <device>
> >> or
> >> e2fsck -b 32768 <device>
> >>
> >> root@superior:/mnt/backup# blkid /dev/md1
> >> root@superior:/mnt/backup#
> >> root@superior:/mnt/backup# cat /proc/mdstat
> >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >> [raid4] [raid10]
> >> md1 : active raid6 dm-3[5] dm-2[4] dm-1[3] dm-5[2] dm-0[1] dm-4[0]
> >> 11720536064 blocks super 1.2 level 6, 512k chunk, algorithm 2
> >> [6/6] [UUUUUU]
> >> bitmap: 0/22 pages [0KB], 65536KB chunk
> >>
> >> unused devices: <none>
> >> root@superior:/mnt/backup#
> >>
> >> Some questions -
> >> - is the easiest 'reset for next run' to reboot and rebuild?
> >>
> >>
> >> On Fri, Sep 2, 2022 at 3:12 PM John Stoffel <john@stoffel.org> wrote:
> >> >
> >> > >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
> >> >
> >> > Peter, please include the output of all the commands, not just the
> >> > commands themselves.  See my comments below.
> >> >
> >> >
> >> > > Question on restarting from scratch...
> >> > > How to reset to the starting point?
> >> >
> >> > I think you need to blow away the loop devices and re-create them.
> >> >
> >> > Or at least blow away the dmsetup devices you just created.
> >> >
> >> > It might be quickest to just reboot.  What OS are you using for the
> >> > recovery?  Is it a recent live image?  Sorry for asking so many
> >> > questions... some of this is new to me too.
> >> >
> >> >
> >> > > dmsetup, both for remove and create of the overlay seems to be hanging.
> >> >
> >> > > On Fri, Sep 2, 2022 at 10:56 AM Peter Sanders <plsander@gmail.com> wrote:
> >> > >>
> >> > >> contents of /proc/mdstat
> >> > >>
> >> > >> root@superior:/mnt/backup# cat /proc/mdstat
> >> > >> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> >> > >> [raid4] [raid10]
> >> > >> unused devices: <none>
> >> > >> root@superior:/mnt/backup#
> >> > >>
> >> > >>
> >> > >>
> >> > >> Here are the steps I ran (minus some mounting other devices and
> >> > >> looking around for mdadm tracks on the old os disk)
> >> > >>
> >> > >> 410  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
> >> > >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
> >> > >> echo /dev/{1})
> >> > >> 411  apt install parallel
> >> > >> 412  DEVICES=$(cat /proc/partitions | parallel --tagstring {5}
> >> > >> --colsep ' +' mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t'
> >> > >> echo /dev/{1})
> >> > >> 413  echo $DEVICES
> >> >
> >> > So you found no MD RAID super blocks on any of the base devices.  You
> >> > can skip this step moving forward.
> >> >
> >> > >> 414  cat /proc/partitions
> >> > >> 415  DEVICES=/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
> >> > >> 416  DEVICES="/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg"
> >> > >> 417  echo $DEVICES
> >> > >> 418  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
> >> > >> {#}' ::: $DEVICES
> >> > >> 419  ls /dev/loop*
> >> >
> >> > Can you show the output of all these commands, not just the commands please?
> >> >
> >> > >> 423  parallel truncate -s300G overlay-{/} ::: $DEVICES
> >> >
> >> > >> 427  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
> >> > >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
> >> > >> create {/}' ::: $DEVICES
> >> > >> 428  ls /dev/mapper/
> >> >
> >> > This is some key output to view.
> >> >
> >> > >> 429  OVERLAYS=$(parallel echo /dev/mapper/{/} ::: $DEVICES)
> >> > >> 430  echo $OVERLAYS
> >> >
> >> > What are the overlays?
> >> >
> >> > >> 431  dmsetup status
> >> >
> >> > What did this command show?
> >> >
> >> > >> 432  mdadm --assemble --force /dev/md1 $OVERLAYS
> >> >
> >> > And here is where I think you need to put --assume-clean when using
> >> > 'create' command instead.  It's not going to assemble anything because
> >> > the info was wiped.  I *think* you really want:
> >> >
> >> >    mdadm --create /dev/md1 --level=raid6 -n 6 --assume-clean $OVERLAYS
> >> >
> >> > And once you do this above command and it comes back, do:
> >> >
> >> >     cat /proc/mdstat
> >> >
> >> > and show all the output please!
> >> >
> >> > >> 433  history
> >> > >> 434  dmsetup status
> >> > >> 435  echo $OVERLAYS
> >> > >> 436  mdadm --assemble --force /dev/md0 $OVERLAYS
> >> > >> 437  cat /proc/partitions
> >> > >> 438  mkdir /mnt/oldroot
> >> > >> << look for initrd mdadm files >>
> >> > >> 484  echo $OVERLAYS
> >> > >> 485  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
> >> > >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
> >> > >> /dev/mapper/sdg
> >> >
> >> > I'm confused here, what  is the difference between the md1 you
> >> > assembled above, and the md0 you're doing here?
> >> >
> >> > >> << cancelled out of 485, review instructions... >>
> >> > >> 486  mdadm --create /dev/md0 --level=raid6 -n 6 /dev/mapper/sdb
> >> > >> /dev/mapper/sdc /dev/mapper/sdd /dev/mapper/sde /dev/mapper/sdf
> >> > >> /dev/mapper/sdg
> >> > >> 487  fsck -n /dev/md0
> >> >
> >> > And what output did you get here?  Did it find a filesystem?  You might want
> >> > to try:
> >> >
> >> >    blkid /dev/md0
> >> >
> >> >
> >> > >> 488  mdadm --stop /dev/md0
> >> > >> 489  echo $DEVICES
> >> > >> 490   parallel 'dmsetup remove {/}; rm overlay-{/}' ::: $DEVICES
> >> > >> 491  dmsetup status
> >> >
> >> > This all worked properly?  No errors?
> >> >
> >> > I gave up after this because it's not clear what the results really
> >> > are.  If you don't find a filesystem that fsck's cleanly, then you
> >> > should just need to stop the array, then re-create it but shuffle the
> >> > order of the devices.
> >> >
> >> > Instead of the disks in the order "sdb sdc sdd ... sdN", you would try
> >> > the order "sdc sdd ... sdN sdb".  See how I moved sdb to the end of the
> >> > list of devices?  With six disks you have, I think, 6 factorial (720)
> >> > orderings to try.  That is a lot to go through, which is why you need to
> >> > automate this more.  But also keep a log and show the output!
> >> >
> >> > John
> >> >
> >> >
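The "shuffle the order of the devices" advice above can be sketched as a dry run that only prints candidate commands. This is a minimal sketch, not the method anyone in the thread actually used: the function names `permute` and `print_orderings` are made up for illustration, `/dev/md0` and the `/dev/mapper/sd*` overlay names are taken from the history above, and `--assume-clean` follows the suggestion made elsewhere in this thread. Other parameters the original array may have had (chunk size, metadata version, data offset) are not known from this thread and are not covered here.

```shell
#!/bin/sh
# Dry run: print one candidate `mdadm --create` command per device ordering.
# Nothing here touches a disk; the commands are only echoed.

# permute PREFIX DEV...
# Recursively emits every ordering of DEV..., each prefixed by PREFIX.
# Each recursive call runs in a subshell so the caller's variables survive.
# Assumes device names are unique and contain no whitespace.
permute() {
    prefix=$1
    shift
    if [ $# -eq 0 ]; then
        echo "mdadm --create /dev/md0 --assume-clean --level=6 --raid-devices=$NDEV$prefix"
        return
    fi
    for dev in "$@"; do
        rest=
        for other in "$@"; do
            [ "$other" = "$dev" ] || rest="$rest $other"
        done
        ( permute "$prefix $dev" $rest )
    done
}

# print_orderings DEV...
# Records the device count, then emits all orderings.
print_orderings() {
    NDEV=$#
    permute "" "$@"
}

# Show the first three of the 720 candidate commands for the six overlays.
print_orderings /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd \
                /dev/mapper/sde /dev/mapper/sdf /dev/mapper/sdg | sed -n '1,3p'
```

Each printed line would still need to be run against the overlays (never the raw disks), followed by a read-only check such as `fsck -n /dev/md0` and `mdadm --stop /dev/md0`, with the output logged for every attempt.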
> >> > >> 492  ls
> >> > >> 493  rm overlay-*
> >> > >> 494  ls
> >> > >> 495  parallel losetup -d ::: /dev/loop[0-9]*
> >> > >> 496  parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b 7
> >> > >> {#}' ::: $DEVICES
> >> > >> 497  parallel truncate -s300G overlay-{/} ::: $DEVICES
> >> > >> 498  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
> >> > >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
> >> > >> create {/}' ::: $DEVICES
> >> > >> 499  dmsetup status
> >> > >> 500  /sbin/reboot
> >> > >> 501  history
> >> > >> 502  dmsetup status
> >> > >> 503  mount
> >> > >> 504  cat /proc/partitions
> >> > >> 505  nano /etc/fstab
> >> > >> 506  mount /mnt/backup/
> >> > >> 507  ls /mnt/backup/
> >> > >> 508  rm /mnt/backup/
> >> > >> 509  rm /mnt/backup/overlay-sd*
> >> > >> 510  emacs setupOverlay &
> >> > >> 511  ps auxww | grep emacs
> >> > >> 512  kill 65017
> >> > >> 513  ls /dev/loo*
> >> > >> 514  DEVICES='/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg'
> >> > >> 515  echo $DEVICES
> >> > >> 516   parallel 'test -e /dev/loop{#} || mknod -m 660 /dev/loop{#} b
> >> > >> 7 {#}' ::: $DEVICES
> >> > >> 517  ls /dev/loo*
> >> > >> 518  parallel truncate -s4000G overlay-{/} ::: $DEVICES
> >> > >> 519  ls
> >> > >> 520  rm overlay-sd*
> >> > >> 521  cd /mnt/bak
> >> > >> 522  cd /mnt/backup/
> >> > >> 523  ls
> >> > >> 524  parallel truncate -s4000G overlay-{/} ::: $DEVICES
> >> > >> 525  ls -la
> >> > >> 526  blockdev --getsize /dev/sdb
> >> > >> 527  man losetup
> >> > >> 528  man losetup
> >> > >> 529  parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f
> >> > >> --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup
> >> > >> create {/}' ::: $DEVICES
> >> > >> 530  dmsetup status
> >> > >> 531  history | grep mdadm
> >> > >> 532  history
> >> > >> 533  dmsetup status
> >> > >> 534  history | grep dmsetup
> >> > >> 535  dmsetup status
> >> > >> 536  dmsetup remove sdg
> >> > >> 537  dmsetup ls --tree
> >> > >> 538  lsof
> >> > >> 539  dmsetup ls --tre
> >> > >> 540  dmsetup ls --tree
> >> > >> 541  lsof | grep -i sdg
> >> > >> 542  lsof | grep -i sdf
> >> > >> 543  history |grep dmsetup | less
> >> > >> 544  dmsetup status
> >> > >> 545  history > ~plsander/Documents/raidIssues/joblog
> >> > >>
> >> > >> On Wed, Aug 31, 2022 at 4:37 PM John Stoffel <john@stoffel.org> wrote:
> >> > >> >
> >> > >> > >>>>> "Peter" == Peter Sanders <plsander@gmail.com> writes:
> >> > >> >
> >> > >> > > encountering a puzzling situation.
> >> > >> > > dmsetup is failing to return.
> >> > >> >
> >> > >> > I don't think you need to use dmsetup in your case, but can you post
> >> > >> > *all* the commands you ran before you got to this point, and the
> >> > >> > output of
> >> > >> >
> >> > >> >        cat /proc/mdstat
> >> > >> >
> >> > >> > as well?  Thinking on this some more, you might need to actually also
> >> > >> > add:
> >> > >> >
> >> > >> >         --assume-clean
> >> > >> >
> >> > >> > to the 'mdadm create ....' string, since you don't want it to zero the
> >> > >> > array or anything.
> >> > >> >
> >> > >> > Sorry for not remembering this at the time!
> >> > >> >
> >> > >> > So if you can, please just start over from scratch, showing the setup
> >> > >> > of the loop devices, the overlay setup, and the building of the RAID6
> >> > >> > array, along with the cat /proc/mdstat after you do the initial build.
> >> > >> >
> >> > >> > John
> >> > >> >
> >> > >> > P.S.  For those who hated my email citing tool, I pulled it out for
> >> > >> > now.  Only citing with > now.  :-)
> >> > >> >
> >> > >> > > root@superior:/mnt/backup# dmsetup status
> >> > >> > > sdg: 0 5860533168 snapshot 16/8388608000 16
> >> > >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
> >> > >> > > sde: 0 5860533168 snapshot 16/8388608000 16
> >> > >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
> >> > >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
> >> > >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
> >> > >> >
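For readers unfamiliar with the snapshot status lines above: each one reads `NAME: START LENGTH snapshot ALLOCATED/TOTAL METADATA`, with every quantity in 512-byte sectors. The following is a minimal sketch that turns such a line into sizes; the function name `snapshot_usage` is made up for illustration, and it reads captured text rather than calling `dmsetup` itself.

```shell
#!/bin/sh
# Summarise dm snapshot lines from `dmsetup status` read on stdin.
# Line layout: NAME: START LENGTH snapshot ALLOCATED/TOTAL METADATA
# All quantities are 512-byte sectors.
snapshot_usage() {
    while read -r name start length target usage meta; do
        [ "$target" = "snapshot" ] || continue
        used=${usage%/*}     # COW sectors allocated so far
        total=${usage#*/}    # COW sectors available in the overlay file
        echo "${name%:} used=$used total=$total overlay_GiB=$(( total * 512 / 1073741824 )) origin_GiB=$(( length * 512 / 1073741824 ))"
    done
}

# Sample line copied from the output above.
printf '%s\n' 'sdg: 0 5860533168 snapshot 16/8388608000 16' | snapshot_usage
# prints: sdg used=16 total=8388608000 overlay_GiB=4000 origin_GiB=2794
```

The 4000 GiB overlay size matches the `truncate -s4000G` step in the history, and 16 allocated sectors means almost nothing has been written through the overlay yet.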
> >> > >> > > dmsetup remove sdg  runs for hours.
> >> > >> > > Canceled it, ran dmsetup ls --tree and found that sdg is not present in the list.
> >> > >> >
> >> > >> > > dmsetup status shows:
> >> > >> > > sdf: 0 5860533168 snapshot 16/8388608000 16
> >> > >> > > sde: 0 5860533168 snapshot 16/8388608000 16
> >> > >> > > sdd: 0 5860533168 snapshot 16/8388608000 16
> >> > >> > > sdc: 0 5860533168 snapshot 16/8388608000 16
> >> > >> > > sdb: 0 5860533168 snapshot 16/8388608000 16
> >> > >> >
> >> > >> > > dmsetup ls --tree
> >> > >> > > root@superior:/mnt/backup# dmsetup ls --tree
> >> > >> > > sdf (253:3)
> >> > >> > >  ├─ (7:3)
> >> > >> > >  └─ (8:80)
> >> > >> > > sde (253:1)
> >> > >> > >  ├─ (7:1)
> >> > >> > >  └─ (8:64)
> >> > >> > > sdd (253:2)
> >> > >> > >  ├─ (7:2)
> >> > >> > >  └─ (8:48)
> >> > >> > > sdc (253:0)
> >> > >> > >  ├─ (7:0)
> >> > >> > >  └─ (8:32)
> >> > >> > > sdb (253:5)
> >> > >> > >  ├─ (7:5)
> >> > >> > >  └─ (8:16)
> >> > >> >
> >> > >> > > any suggestions?
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > > On Tue, Aug 30, 2022 at 2:03 PM Wols Lists <antlists@youngman.org.uk> wrote:
> >> > >> > >>
> >> > >> > >> On 30/08/2022 14:27, Peter Sanders wrote:
> >> > >> > >> >
> >> > >> > >> > And the victory conditions would be a mountable file system that passes a fsck?
> >> > >> > >>
> >> > >> > >> Yes. Just make sure you delve through the file system a bit and satisfy
> >> > >> > >> yourself it looks good, too ...
> >> > >> > >>
> >> > >> > >> Cheers,
> >> > >> > >> Wol

end of thread, other threads:[~2022-09-05 20:16 UTC | newest]

Thread overview: 29+ messages
2022-08-28  2:00 RAID 6, 6 device array - all devices lost superblock Peter Sanders
2022-08-28  9:14 ` Wols Lists
2022-08-28  9:54   ` Wols Lists
2022-08-28 16:47     ` Phil Turmel
     [not found]       ` <CAKAPSkJAQYsec-4zzcePbkJ7Ee0=sd_QvHj4Stnyineq+T8BXw@mail.gmail.com>
2022-08-28 17:16         ` Wols Lists
2022-08-28 18:45         ` John Stoffel
2022-08-28 19:36           ` Phil Turmel
2022-08-28 19:49             ` John Stoffel
2022-08-28 23:24               ` Peter Sanders
2022-08-29 13:12                 ` Peter Sanders
2022-08-29 21:45                 ` John Stoffel
2022-08-29 22:29                   ` Eyal Lebedinsky
2022-08-29 23:53                     ` Peter Sanders
2022-08-30 13:27                       ` Peter Sanders
2022-08-30 18:03                         ` Wols Lists
2022-08-31 17:48                           ` Peter Sanders
2022-08-31 20:37                             ` John Stoffel
2022-09-02 14:56                               ` Peter Sanders
2022-09-02 18:52                                 ` Peter Sanders
2022-09-02 19:12                                   ` John Stoffel
2022-09-03  0:39                                     ` Peter Sanders
2022-09-03  5:51                                       ` Peter Sanders
2022-09-05 19:36                                         ` John Stoffel
2022-09-05 20:16                                           ` Peter Sanders
2022-09-05 19:25                                       ` John Stoffel
2022-08-28 15:10 ` John Stoffel
2022-08-28 17:11 ` Andy Smith
2022-08-28 17:22   ` Andy Smith
2022-08-28 17:34     ` Peter Sanders