* Requesting assistance recovering RAID-5 array
@ 2020-03-31  0:04 Daniel Jones
  2020-03-31  0:24 ` antlists
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Jones @ 2020-03-31  0:04 UTC (permalink / raw)
  To: linux-raid

Hello,

I've ended up with my array in an unpleasant condition and am unsure how
best to attempt recovery.  Any assistance from this list would be greatly
appreciated.

The short version: I have a 4 device RAID-5 array currently degraded
to 3 devices. The superblock is missing from 3 out of 4 drives. I've
also lost track of which device was originally /dev/sd[bcde] and doubt
they are in their original order.

How did I end up here?

1) Dec 2018: Created RAID-5 array on three HDDs on CentOS 7

2) Jul 2019: Added fourth HDD to array.

3) Mar 22 2020: One drive in array failed (originally /dev/sdb). **Due
to an outgoing email issue I was not aware of this until (5a) below.**

4) Yesterday: Blissfully unaware of (3), I did a planned upgrade of the
Mobo/CPU/Boot-HDD in the chassis. This went poorly, as follows.

  a) After connecting the four drives to the new mobo I noted that the
BIOS would not recognize the drive in bay #4.

  b) After booting into the "new" system, mdadm did not recognize the array.
     Shut down and replaced various SATA/power cables; at some point bay
#4 was recognized.
     Array still not recognized by mdadm.

  c) Put the old Mobo/CPU/boot-HDD back into the chassis to try to recover
to the "last known good" state, still using the reconfigured SATA/power cables.
     The drive in bay #4 is still recognized.
     Due to all the part swapping I doubt the disks still match their
original sdb/sdc/sdd/sde mapping.

  d) After booting into the "old" system, mdadm does not recognize the array.

     ** Discover that the superblock appears overwritten on three out
of the four drives. **

     Find anecdotal reports online of superblock deletion when moving
arrays between motherboards:

       https://serverfault.com/questions/580761/is-mdadm-raid-toast
       https://forum.openmediavault.org/index.php?thread/11625-raid5-missing-superblocks-after-restart/
(see comments by Nordmann)
       Note that Nordmann claims "Sometimes it occurs that one single
drive out of the array doesnt get affected"

  e) Give up for the day.

5) Today: Looked at things with fresh eyes.

  a) Discovered the Mar-22 drive failure in /var/spool/mail/root.
Working assumption is that the bay #4 drive is the one that was /dev/sdb
at the time of failure.
  b) Collected the information posted below.

So, here is my current situation as I see it.

  A four-disk RAID-5 array that degraded to three-disk a week ago with
the failure of what was then /dev/sdb.
  Due to the moving of cables I am no longer confident that
/dev/sd[bcde] are still what they once were.
  I suspect the original failed /dev/sdb is the bay #4 drive, but I'm not
completely sure.
  Three of the four disks have erased superblocks for unknown reasons.
  Doing a full 'dd' backup of the four disks is not feasible, but if I
can get them to assemble and mount one time I can copy off the data I
need.
  I think the best chance at data recovery is to do a --create to
replace the missing superblocks, but am unsure of the best way in
light of the degraded state of the array.

The only "good news" I have at this point is that I've done nothing at
this time to intentionally overwrite anything.

Information:

Here is the mdadm failure message from 3/22:

        Date: Sun, 22 Mar 2020 13:12:50 -0600 (MDT)
        This is an automatically generated mail message from mdadm running on hulk
        A Fail event had been detected on md device /dev/md/0.
        It could be related to component device /dev/sdb.
        Faithfully yours, etc.
        P.S. The /proc/mdstat file currently contains the following:
        Personalities : [raid6] [raid5] [raid4]
        md0 : active raid5 sdd[3] sde[4] sdc[1] sdb[0](F)
              29298914304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
              [========>............]  check = 40.2% (3927650944/9766304768) finish=1271.6min speed=76523K/sec
              bitmap: 0/73 pages [0KB], 65536KB chunk

Followed shortly by:

        Date: Sun, 22 Mar 2020 13:21:20 -0600 (MDT)
        This is an automatically generated mail message from mdadm running on hulk
        A DegradedArray event had been detected on md device /dev/md/0.
        Faithfully yours, etc.
        P.S. The /proc/mdstat file currently contains the following:
        Personalities : [raid6] [raid5] [raid4]
        md0 : active raid5 sdc[1] sde[4] sdd[3]
              29298914304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
              bitmap: 2/73 pages [8KB], 65536KB chunk


Actions today:

# cat /proc/mdstat
Personalities :
md0 : inactive sdc[1](S)
      9766306304 blocks super 1.2

unused devices: <none>


#  mdadm --stop /dev/md0
mdadm: stopped /dev/md0


# mdadm -E /dev/sd[bcde]
/dev/sdb:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 423d9a8e:636a5f08:56ecbd90:282e478b
           Name : hulk:0  (local to host hulk)
  Creation Time : Wed Dec 26 14:13:35 2018
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 19532612608 sectors (9313.88 GiB 10000.70 GB)
     Array Size : 29298914304 KiB (27941.62 GiB 30002.09 GB)
  Used Dev Size : 19532609536 sectors (9313.87 GiB 10000.70 GB)
    Data Offset : 261120 sectors
   Super Offset : 8 sectors
   Unused Space : before=261040 sectors, after=3072 sectors
          State : clean
    Device UUID : 31fa9d90:a407908d:d4d7c7cc:e362b8a5

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Mar 29 15:43:14 2020
  Bad Block Log : 512 entries available at offset 48 sectors
       Checksum : d01e7462 - correct
         Events : 103087

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : .AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
/dev/sde:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

# gdisk -l /dev/sdb
GPT fdisk (gdisk) version 0.8.10

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdb: 19532873728 sectors, 9.1 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): A0CB08EC-4CA4-4A87-8848-5ED928708E84
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 19532873694
Partitions will be aligned on 2048-sector boundaries
Total free space is 19532873661 sectors (9.1 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name

# gdisk -l /dev/sdc
GPT fdisk (gdisk) version 0.8.10

Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!

Warning! One or more CRCs don't match. You should repair the disk!

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
Disk /dev/sdc: 19532873728 sectors, 9.1 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): C41898B6-A81D-41B9-BE14-F2AB6D71D8EF
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 19532873694
Partitions will be aligned on 2048-sector boundaries
Total free space is 19532873661 sectors (9.1 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name

# gdisk -l /dev/sdd
GPT fdisk (gdisk) version 0.8.10

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdd: 19532873728 sectors, 9.1 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): A0CB08EC-4CA4-4A87-8848-5ED928708E84
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 19532873694
Partitions will be aligned on 2048-sector boundaries
Total free space is 19532873661 sectors (9.1 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name

# gdisk -l /dev/sde
GPT fdisk (gdisk) version 0.8.10

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sde: 19532873728 sectors, 9.1 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): FB0B481A-6258-4F61-BA60-6AAC8F663DA8
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 19532873694
Partitions will be aligned on 2048-sector boundaries
Total free space is 19532873661 sectors (9.1 TiB)

Number  Start (sector)    End (sector)  Size       Code  Name

# lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
NAME                   SIZE FSTYPE            TYPE MOUNTPOINT
sda                  238.5G                   disk
├─sda1                 500M xfs               part /boot
└─sda2                 238G LVM2_member       part
  ├─centos_hulk-root    50G xfs               lvm  /
  ├─centos_hulk-swap     2G swap              lvm  [SWAP]
  └─centos_hulk-home 185.9G xfs               lvm  /home
sdb                    9.1T                   disk
sdc                    9.1T linux_raid_member disk
sdd                    9.1T                   disk
sde                    9.1T                   disk


# smartctl -H -i -l scterc /dev/sdb
smartctl 7.0 2018-12-30 r4883
[x86_64-linux-3.10.0-1062.18.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD100EMAZ-00WJTA0
Serial Number:    *removed*
LU WWN Device Id: 5 000cca 26ccc09f6
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Mar 30 17:21:04 2020 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

# smartctl -H -i -l scterc /dev/sdc
smartctl 7.0 2018-12-30 r4883
[x86_64-linux-3.10.0-1062.18.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD100EMAZ-00WJTA0
Serial Number:    *removed*
LU WWN Device Id: 5 000cca 273dd833e
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Mar 30 17:21:24 2020 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

# smartctl -H -i -l scterc /dev/sdd
smartctl 7.0 2018-12-30 r4883
[x86_64-linux-3.10.0-1062.18.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD100EMAZ-00WJTA0
Serial Number:    *removed*
LU WWN Device Id: 5 000cca 273e1f716
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Mar 30 17:21:43 2020 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

# smartctl -H -i -l scterc /dev/sde
smartctl 7.0 2018-12-30 r4883
[x86_64-linux-3.10.0-1062.18.1.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD100EMAZ-00WJTA0
Serial Number:    *removed*
LU WWN Device Id: 5 000cca 267d8594f
Firmware Version: 83.H0A83
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Mar 30 17:21:58 2020 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)


I am genuinely over my head at this point and unsure how to proceed.
My logic tells me the best choice is to attempt a --create to try to
rebuild the missing superblocks, but I'm not clear if I should try
devices=4 (the true size of the array) or devices=3 (the size it was
last operating in).  I'm also not sure of what device order to use
since I have likely scrambled /dev/sd[bcde] and am concerned about
what happens when I bring the previously disabled drive back into the
array.

Can anybody provide any guidance?

Thanks,
DJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-03-31  0:04 Requesting assistance recovering RAID-5 array Daniel Jones
@ 2020-03-31  0:24 ` antlists
  2020-03-31  0:51   ` Daniel Jones
  0 siblings, 1 reply; 25+ messages in thread
From: antlists @ 2020-03-31  0:24 UTC (permalink / raw)
  To: Daniel Jones, linux-raid

On 31/03/2020 01:04, Daniel Jones wrote:
> I am genuinely over my head at this point and unsure how to proceed.
> My logic tells me the best choice is to attempt a --create to try to
> rebuild the missing superblocks, but I'm not clear if I should try
> devices=4 (the true size of the array) or devices=3 (the size it was
> last operating in).  I'm also not sure of what device order to use
> since I have likely scrambled /dev/sd[bcde] and am concerned about
> what happens when I bring the previously disabled drive back into the
> array.

Don't even THINK of --create until the experts have chimed in !!!

https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

The lsdrv information is crucial - that recovers pretty much all the 
config information that is available, and massively increases the 
chances of a successful --create, if you do have to go down that route...

If your drives are 1TB, I would *seriously* consider getting hold of a 
4TB drive - they're not expensive - to make a backup. And read up on 
overlays.
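
For reference, the overlay trick boils down to roughly the following per
disk (a rough, untested sketch with sizes and device names as
placeholders - the wiki page has the full recipe):

truncate -s 50G /tmp/overlay-sdb           # sparse file to absorb any writes
loop=$(losetup -f --show /tmp/overlay-sdb) # attach it to a free loop device
size=$(blockdev --getsz /dev/sdb)          # origin size in 512-byte sectors
dmsetup create sdb-overlay --table "0 $size snapshot /dev/sdb $loop P 8"
# ...then experiment against /dev/mapper/sdb-overlay, never /dev/sdb itself.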

Hopefully we can recover your data without too much grief, but this will 
all help.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-03-31  0:24 ` antlists
@ 2020-03-31  0:51   ` Daniel Jones
  2020-03-31  1:27     ` crowston.name
  2020-03-31  1:48     ` Phil Turmel
  0 siblings, 2 replies; 25+ messages in thread
From: Daniel Jones @ 2020-03-31  0:51 UTC (permalink / raw)
  To: antlists; +Cc: linux-raid

Greetings Wol,

> Don't even THINK of --create until the experts have chimed in !!!

Yes, I have had impure thoughts, but fortunately (?) I've done nothing
yet to intentionally write to the drives.

> If your drives are 1TB, I would *seriously* consider getting hold of a 4TB drive - they're not expensive - to make a backup. And read up on overlays.

The array drives are 10TB each.  I understand the concept of overlays in
general (I have used them in a container context) and have skimmed the
wiki, but have not yet acted.

> The lsdrv information is crucial - that recovers pretty much all the config information that is available

Attached.

$ ./lsdrv
PCI [pata_marvell] 02:00.0 IDE interface: Marvell Technology Group
Ltd. 88SE6101/6102 single-port PATA133 interface (rev b2)
└scsi 0:x:x:x [Empty]
PCI [ahci] 00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10
Family) SATA AHCI Controller
├scsi 2:0:0:0 ATA      M4-CT256M4SSD2   {0000000012050904283E}
│└sda 238.47g [8:0] Partitioned (dos)
│ ├sda1 500.00m [8:1] xfs {8ed274ce-4cf6-4804-88f8-0213c002a716}
│ │└Mounted as /dev/sda1 @ /boot
│ └sda2 237.99g [8:2] PV LVM2_member 237.92g used, 64.00m free
{kn8lMS-0Cy8-xpsR-QRTk-CTRG-Eh1J-lmtfws}
│  └VG centos_hulk 237.98g 64.00m free {P5MVrD-UMGG-0IO9-zFNq-8zd2-lycX-oYqe5L}
│   ├dm-2 185.92g [253:2] LV home xfs {39075ece-de0a-4ace-b291-cc22aff5a4b2}
│   │└Mounted as /dev/mapper/centos_hulk-home @ /home
│   ├dm-0 50.00g [253:0] LV root xfs {68ffae87-7b51-4392-b3b8-59a7aa13ea68}
│   │└Mounted as /dev/mapper/centos_hulk-root @ /
│   └dm-1 2.00g [253:1] LV swap swap {f2da9893-93f0-42a1-ba86-5f3b3a72cc9b}
├scsi 3:0:0:0 ATA      WDC WD100EMAZ-00 {1DGVH01Z}
│└sdb 9.10t [8:16] Partitioned (gpt)
├scsi 4:0:0:0 ATA      WDC WD100EMAZ-00 {2YJ2XMPD}
│└sdc 9.10t [8:32] MD raid5 (4) inactive 'hulk:0'
{423d9a8e-636a-5f08-56ec-bd90282e478b}
├scsi 5:0:0:0 ATA      WDC WD100EMAZ-00 {2YJDR8LD}
│└sdd 9.10t [8:48] Partitioned (gpt)
└scsi 6:0:0:0 ATA      WDC WD100EMAZ-00 {JEHRKH2Z}
 └sde 9.10t [8:64] Partitioned (gpt)

Cheers,
DJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-03-31  0:51   ` Daniel Jones
@ 2020-03-31  1:27     ` crowston.name
  2020-03-31  1:50       ` Phil Turmel
  2020-03-31  1:48     ` Phil Turmel
  1 sibling, 1 reply; 25+ messages in thread
From: crowston.name @ 2020-03-31  1:27 UTC (permalink / raw)
  To: Daniel Jones, antlists; +Cc: linux-raid

I got the following error trying to run lsdrv: 

./lsdrv 
Traceback (most recent call last):
  File "./lsdrv", line 423, in <module>
    probe_block('/sys/block/'+x)
  File "./lsdrv", line 340, in probe_block
    blk.__dict__.update(extractvars(runx(['vol_id', '--export', '/dev/block/'+blk.dev])))
  File "./lsdrv", line 125, in runx
    out, err = sub.communicate()
  File "/usr/lib/python2.5/subprocess.py", line 667, in communicate
    return self._communicate(input)
  File "/usr/lib/python2.5/subprocess.py", line 1138, in _communicate
    rlist, wlist, xlist = select.select(read_set, write_set, [])
select.error: (4, 'Interrupted system call')




Kevin Crowston
206 Meadowbrook Dr.
Syracuse, NY 13210 USA
Phone: +1 (315) 464-0272
Fax: +1 (815) 550-2155

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-03-31  0:51   ` Daniel Jones
  2020-03-31  1:27     ` crowston.name
@ 2020-03-31  1:48     ` Phil Turmel
  2020-03-31  2:09       ` Daniel Jones
  1 sibling, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2020-03-31  1:48 UTC (permalink / raw)
  To: Daniel Jones, antlists; +Cc: linux-raid

Hi Daniel,

{ Convention on all kernel.org lists is to avoid top-posting and to trim 
unnecessary quoted material.  Please do so going forward. }

On 3/30/20 8:51 PM, Daniel Jones wrote:
> Greetings Wol,
> 
>> Don't even THINK of --create until the experts have chimed in !!!

Unfortunately, your new motherboard or your distro appears to have 
reacted to the presence of whole-disk raid members by establishing GUID 
Partition Tables (GPT) on them, blowing away those drives' superblocks.

Personally, I like the idea of whole-disk raid members, and did so for a 
while, until reports like yours made me change my ways.  Sorry.

You absolutely will need to use --create in this situation.  Much of the 
data necessary is available from the one remaining superblock, which you 
nicely included in your original report.

Since a --create will be needed, I recommend adjusting offsets to work 
with new partitions on each drive that start at a 1MB offset.

> Yes, I have had impure thoughts, but fortunately (?) I've done nothing
> yet to intentionally write to the drives.

Thank you.  This makes it much easier to help you.

[trim /]

>> The lsdrv information is crucial - that recovers pretty much all the config information that is available
> 
> Attached.
> 
> $ ./lsdrv
> PCI [pata_marvell] 02:00.0 IDE interface: Marvell Technology Group
> Ltd. 88SE6101/6102 single-port PATA133 interface (rev b2)
> └scsi 0:x:x:x [Empty]
> PCI [ahci] 00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10
> Family) SATA AHCI Controller
> ├scsi 2:0:0:0 ATA      M4-CT256M4SSD2   {0000000012050904283E}
> │└sda 238.47g [8:0] Partitioned (dos)
> │ ├sda1 500.00m [8:1] xfs {8ed274ce-4cf6-4804-88f8-0213c002a716}
> │ │└Mounted as /dev/sda1 @ /boot
> │ └sda2 237.99g [8:2] PV LVM2_member 237.92g used, 64.00m free
> {kn8lMS-0Cy8-xpsR-QRTk-CTRG-Eh1J-lmtfws}
> │  └VG centos_hulk 237.98g 64.00m free {P5MVrD-UMGG-0IO9-zFNq-8zd2-lycX-oYqe5L}
> │   ├dm-2 185.92g [253:2] LV home xfs {39075ece-de0a-4ace-b291-cc22aff5a4b2}
> │   │└Mounted as /dev/mapper/centos_hulk-home @ /home
> │   ├dm-0 50.00g [253:0] LV root xfs {68ffae87-7b51-4392-b3b8-59a7aa13ea68}
> │   │└Mounted as /dev/mapper/centos_hulk-root @ /
> │   └dm-1 2.00g [253:1] LV swap swap {f2da9893-93f0-42a1-ba86-5f3b3a72cc9b}
> ├scsi 3:0:0:0 ATA      WDC WD100EMAZ-00 {1DGVH01Z}
> │└sdb 9.10t [8:16] Partitioned (gpt)
> ├scsi 4:0:0:0 ATA      WDC WD100EMAZ-00 {2YJ2XMPD}
> │└sdc 9.10t [8:32] MD raid5 (4) inactive 'hulk:0'
> {423d9a8e-636a-5f08-56ec-bd90282e478b}
> ├scsi 5:0:0:0 ATA      WDC WD100EMAZ-00 {2YJDR8LD}
> │└sdd 9.10t [8:48] Partitioned (gpt)
> └scsi 6:0:0:0 ATA      WDC WD100EMAZ-00 {JEHRKH2Z}
>   └sde 9.10t [8:64] Partitioned (gpt)

No shocks here.  But due to the incomplete array, useful details are 
missing.  In particular, knowledge of the filesystem or nested structure 
(LVM?) present on the array will be needed to identify the real data 
offsets of the three mangled members.  (lsdrv is really intended to 
document critical details of a healthy system to minimize this kind of 
uncertainty when it eventually breaks.)

Please tell us what you can.  If it was another LVM volume group, please 
look for backups of the LVM metadata, typically in /etc/lvm/backup/.

Or we can make educated guesses until read-only access presents working 
or near-working content.

> Cheers,
> DJ

Regards,

Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-03-31  1:27     ` crowston.name
@ 2020-03-31  1:50       ` Phil Turmel
  0 siblings, 0 replies; 25+ messages in thread
From: Phil Turmel @ 2020-03-31  1:50 UTC (permalink / raw)
  To: crowston.name, Daniel Jones, antlists; +Cc: linux-raid

Hi Kevin,

On 3/30/20 9:27 PM, crowston.name wrote:
> I got the following error trying to run lsdrv:
> 
> ./lsdrv
> Traceback (most recent call last):
>    File "./lsdrv", line 423, in <module>
>      probe_block('/sys/block/'+x)
>    File "./lsdrv", line 340, in probe_block
>      blk.__dict__.update(extractvars(runx(['vol_id', '--export', '/dev/block/'+blk.dev])))
>    File "./lsdrv", line 125, in runx
>      out, err = sub.communicate()
>    File "/usr/lib/python2.5/subprocess.py", line 667, in communicate
>      return self._communicate(input)
>    File "/usr/lib/python2.5/subprocess.py", line 1138, in _communicate
>      rlist, wlist, xlist = select.select(read_set, write_set, [])
> select.error: (4, 'Interrupted system call')

Please don't hijack threads with unrelated reports.

Please report this on github where I can track it, but note that I'm not 
even trying to support python 2.5.

Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-03-31  1:48     ` Phil Turmel
@ 2020-03-31  2:09       ` Daniel Jones
  2020-03-31 12:00         ` Phil Turmel
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Jones @ 2020-03-31  2:09 UTC (permalink / raw)
  To: Phil Turmel; +Cc: antlists, linux-raid

Hello Phil,

> your new motherboard or your distro appears to have reacted to the presence of whole-disk raid members by establishing Gnu Partition Tables on them, blowing away those drives' superblocks.

Yes, this was an unpleasant surprise.  Won't build them this way again.

> In particular, knowledge of the filesystem or nested structure (LVM?) present on the array will be needed to identify the real data offsets of the three mangled members.

I don't have the history of original creation, but I'm fairly certain
it was something straightforward like:

  mdadm --create /dev/md0 {parameters}
  sudo mkfs.ext4 /dev/md0
  mount /dev/md0 /mnt/raid5

After the array was corrupted I needed to comment out the mount from
my fstab, which was as follows (confirming ext4):

    /dev/md0                                      /mnt/raid5
   ext4    defaults        0       0

Cheers,
DJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-03-31  2:09       ` Daniel Jones
@ 2020-03-31 12:00         ` Phil Turmel
  2020-03-31 13:36           ` Daniel Jones
  2020-04-01  3:39           ` Daniel Jones
  0 siblings, 2 replies; 25+ messages in thread
From: Phil Turmel @ 2020-03-31 12:00 UTC (permalink / raw)
  To: Daniel Jones; +Cc: antlists, linux-raid

Good morning Daniel,

On 3/30/20 10:09 PM, Daniel Jones wrote:
> Hello Phil,
>> In particular, knowledge of the filesystem or nested structure (LVM?) present on the array will be needed to identify the real data offsets of the three mangled members.
> 
> I don't have the history of original creation, but I'm fairly certain
> it was something straightforward like:
> 
>    mdadm --create /dev/md0 {parameters}
>    sudo mkfs.ext4 /dev/md0
>    mount /dev/md0 /mnt/raid5
> 
> After the array was corrupted I needed to comment out the mount from
> my fstab, which was as follows (confirming ext4):
> 
>      /dev/md0                                      /mnt/raid5
>     ext4    defaults        0       0

Ok.  This should be relatively easy, if a bit time consuming.  Things we 
know:

1) array layout, and chunk size: 512k or 1024 sectors
2) Active device #1 offset 261124 sectors.
3) The array had bad block logging turned on.  We won't re-enable this 
mis-feature.  It is default, so you must turn it off in your --create.

Things we don't know:

1) Data offsets for other drives.  However, the one we know appears to 
be the typical value you'd get from one reshape after a modern default 
creation (262144).  There are good odds that the others are at this 
offset, except the newest one, which might be at 262144.  You'll have to 
test four combinations: all at 261124 plus one at a time at 262144.

2) Member order for the other drives.  Three drives taken three at a 
time is six combinations.

3) Identity of the first drive kicked out. (Or do we know?)  If not 
known, there's four more combinations: whether to leave out or one of 
three left out.

That yields either 24 or 96 different --create --assume-clean 
combinations to test to find the one that gives you the cleanest 
filesystem in a read-only fsck.  (Do NOT mount!  Even a read-only mount 
will write to the filesystem.  Only test with fsck -n.)

Start by creating partitions on all devices, preferably at 2048 sectors. 
  (Should be the default offered.)  Use data offsets 259076 and 260100 
instead of 261124 and 262144.
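
A non-interactive way to do that, if you have sgdisk handy (a sketch, not
a tested recipe - and note that writing the GPT partition entries will
overwrite the area holding the one surviving superblock on /dev/sdc, since
the 1.2 superblock sits 4K into the disk, so make sure the -E output above
is saved first):

for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
    sgdisk -n 1:2048:0 -t 1:fd00 "$d"  # one partition from sector 2048, type fd00 = Linux RAID
done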

I recommend writing out all the combinations before you start and 
keeping the fsck -n output from each until you have the final version 
you want.

Yeah, I'd write a script to do it all for me, if your best guess 
combination doesn't yield a good filesystem.
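
A skeleton for such a script might look like the following (untested
sketch: it only walks the six member orders with the known drive fixed in
slot 1 and all four drives present, at a single offset - extend it for the
"missing" and offset variations, and preferably point it at overlays
rather than the raw disks):

#!/bin/bash
offset=130560   # data offset in KiB (261120 sectors / 2); adjust per combination
for order in "sdb sdd sde" "sdb sde sdd" "sdd sdb sde" \
             "sdd sde sdb" "sde sdb sdd" "sde sdd sdb"; do
    set -- $order
    mdadm --stop /dev/md0 2>/dev/null
    # --run suppresses the "appears to contain..." confirmation prompt
    mdadm --create /dev/md0 --run --assume-clean --level=5 --chunk=512K \
          --raid-devices=4 --data-offset=$offset \
          "/dev/$1" /dev/sdc "/dev/$2" "/dev/$3"
    echo "=== order: $1 sdc $2 $3 ==="
    fsck -n /dev/md0
done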

> Cheers,
> DJ

Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-03-31 12:00         ` Phil Turmel
@ 2020-03-31 13:36           ` Daniel Jones
  2020-04-01  3:39           ` Daniel Jones
  1 sibling, 0 replies; 25+ messages in thread
From: Daniel Jones @ 2020-03-31 13:36 UTC (permalink / raw)
  To: Phil Turmel; +Cc: antlists, linux-raid

Hi Phil,

Thanks for the guidance.  I'll think this through and come back with
any questions.

It turns out that I do have a file in /etc/lvm/backup/ but I believe
it is a red herring.  The file date is from 2016, well before this
array was created, and must represent some long-ago configuration
of this machine.

Regards,
DJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-03-31 12:00         ` Phil Turmel
  2020-03-31 13:36           ` Daniel Jones
@ 2020-04-01  3:39           ` Daniel Jones
  2020-04-01  4:45             ` Phil Turmel
  1 sibling, 1 reply; 25+ messages in thread
From: Daniel Jones @ 2020-04-01  3:39 UTC (permalink / raw)
  To: Phil Turmel; +Cc: antlists, linux-raid

Hello Phil, et al.,

Phil, after reading through your email I have some questions.

> The array had bad block logging turned on.  We won't re-enable this
> mis-feature.  It is default, so you must turn it off in your --create.

Am I able to turn off during --create?  The man page for mdadm on my
system (mdadm - v4.1 - 2018-10-01) suggests that --update=no-bbl can
be used for --assemble and --manage but doesn't list it for --create.

> However, the one we know appears to be the typical you'd get from
> one reshape after a modern default creation (262144).

> You'll have to test four combinations: all at 261124 plus one at a
> time at 262144.

I'm confused by the offsets. The one remaining superblock I have
reports "Data Offset : 261120 sectors".  Your email mentions 261124
and 262144. I don't understand how these three values are related?

I think it is most likely that my one existing superblock with 261120
is one of the original three drives and not the fourth drive that was
added later.  (Based on the position in drive bay).

So possible offsets (I'm still not clear on this) could be:

a) all 261120
b) all 261124
c) all 262144
d) three at 261120, one at 262144
e) three at 261120, one at 261124
f) three at 261124, one at 261120
g) three at 261124, one at 262144
h) three at 262144, one at 261120
i) three at 262144, one at 261124

( this ignores the combinations of not knowing which drive gets the
oddball offset )
( this also ignores for now the offsets of 259076 and 260100 mentioned below )

> 2) Member order for the other drives.  Three drives taken three at a
> time is six combinations.
>
> 3) Identity of the first drive kicked out. (Or do we know?)  If not
> known, there's four more combinations: whether to leave out or one of
> three left out.

Can I make any tentative conclusions from this information:

  Device Role : Active device 1
  Array State : .AAA ('A' == active, '.' == missing, 'R' == replacing)

I know /dev/sde is the device that didn't initially respond to BIOS
and suspect it is the "missing" drive from my superblock.

I know that /dev/sdc is the drive with a working superblock that
reports itself as "Active device 1".

I don't know how mdadm counts things (starting at 0 or starting at 1,
left to right or right to left, including or excluding the missing
drive).

Would it be reasonable for a first guess that:

.AAA = sde sdd sdc sdb  (assuming the order is missing, active 0,
active 1, active 2) ?

Procedure questions:

If I understand all the above, attempted recovery is going to be along
the lines of:

mdadm --create /dev/md0 --force --assume-clean --readonly
--data-offset=261120 --chunk=512K --level=5 --raid-devices=4 missing
/dev/sdd /dev/sdc /dev/sdb
fsck -n /dev/md0

Subject to:
Don't know if --force is desirable in this case?
Might need to try different offsets from above.  Don't know how to set
offsets if they are different per drive.
Should I start with guessing "missing" for 1 or should I start with all 4?
Might need to try all device orders.

> Start by creating partitions on all devices, preferably at 2048 sectors.
> (Should be the default offered.)  Use data offsets 259076 and 260100
> instead of 261124 and 262144.

If I understand, this is an alternative to recreating the whole-disk
mdadm containing one partition. Instead it would involve creating new
partition tables on each physical drive, creating one partition per
table, writing superblocks to the new /dev/sd[bcde]1 with offsets
adjusted by either 2044 or 2048 sectors, and then doing the fsck on
the assembled RAID.

I think the advantage proposed here is that it prevents this
"automated superblock overwrite" from happening again if/when I try
the motherboard upgrade, but the risk I'm not comfortable with yet is
going beyond "do the minimum to get it working again". Although it
isn't practical for me to do a dd full backup of these drives, if I
can get the array mounted again I can copy off the most important data
before doing a grander repartitioning.

Can you advise on any of the above?

Thanks,
DJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01  3:39           ` Daniel Jones
@ 2020-04-01  4:45             ` Phil Turmel
  2020-04-01  6:03               ` Daniel Jones
  0 siblings, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2020-04-01  4:45 UTC (permalink / raw)
  To: Daniel Jones; +Cc: antlists, linux-raid

Hi Daniel,

On 3/31/20 11:39 PM, Daniel Jones wrote:
> Hello Phil, et al.,
> 
> Phil, after reading through your email I have some questions.
> 
>> The array had bad block logging turned on.  We won't re-enable this
>> mis-feature.  It is default, so you must turn it off in your --create.
> 
> Am I able to turn off during --create?  The man page for mdadm on my
> system (mdadm - v4.1 - 2018-10-01) suggests that --update=no-bbl can
> be used for --assemble and --manage but doesn't list it for --create.

Uhm, self compile and see what you get.  In these situations, relying on 
a potentially buggy system mdadm is not recommended.  But if still not 
available in create, fix it afterwards.  You definitely do not want this.
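
If your build really can't do it at create time, something along these
lines afterwards should drop it (device list illustrative only):

mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --update=no-bbl /dev/sd[bcde]1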

>> However, the one we know appears to be the typical you'd get from
>> one reshape after a modern default creation (262144).
> 
>> You'll have to test four combinations: all at 261124 plus one at a
>> time at 262144.
> 
> I'm confused by the offsets. The one remaining superblock I have
> reports "Data Offset : 261120 sectors".  Your email mentions 261124
> and 262144. I don't understand how these three values are related?

Yeah, doing math in one's head quickly sometimes yields a fail.  262144 
is 128MB in sectors.  Minus 1024 sectors (your chunk size) yields 
261120.  /:

> I think it is most likely that my one existing superblock with 261120
> is one of the original three drives and not the fourth drive that was
> added later.  (Based on the position in drive bay).
> 
> So possible offsets (I'm still not clear on this) could be:
> 
> a) all 261120

Yes.

> b) all 261124

No.

> c) all 262144

No.

> d) three at 261120, one at 262144

Yes.

> e) three at 261120, one at 261124
> f) three at 261124, one at 261120
> g) three at 261124, one at 262144

No, no, and no.

> h) three at 262144, one at 261120

Extremely unlikely.  Not in my recommended combinations to check.

> i) three at 262144, one at 261124

No.

> ( this ignores the combinations of not knowing which drive gets the
> oddball offset )
> ( this also ignores for now the offsets of 259076 and 260100 mentioned below )
> 
>> 2) Member order for the other drives.  Three drives taken three at a
>> time is six combinations.
>>
>> 3) Identity of the first drive kicked out. (Or do we know?)  If not
>> known, there's four more combinations: whether to leave out or one of
>> three left out.
> 
> Can I make any tentative conclusions from this information:
> 
>    Device Role : Active device 1
>    Array State : .AAA ('A' == active, '.' == missing, 'R' == replacing)

This device will always be listed as the 2nd member in all of your 
--create commands, and always with the offset of 261120 - 2048.
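(That is, 261120 - 2048 = 259072 sectors, or 129536 if your mdadm takes 
the offset in kilobytes.)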

> I know /dev/sde is the device that didn't initially respond to BIOS
> and suspect it is the "missing" drive from my superblock.

That eliminates the combinations of (3).  Section (2) becomes three 
drives taken two at a time (since you don't know which device role 
/dev/sde had).  But that is still six combinations.

> I know that /dev/sdc is the drive with a working superblock that
> reports itself as "Active device 1".

Right, as above.

> I don't know how mdadm counts things (starting at 0 or starting at 1,
> left to right or right to left, including or excluding the missing
> drive).

Active devices start with zero.

> Would it be reasonable for a first guess that:
> 
> .AAA = sde sdd sdc sdb  (assuming the order is missing, active 0,
> active 1, active 2) ?

No.  Order is always active devices 0-3, with one of those replaced (in 
order) with "missing".

> Procedure questions:
> 
> If I understand all the above, attempted recovery is going to be along
> the lines of:
> 
> mdadm --create /dev/md0 --force --assume-clean --readonly
> --data-offset=261120 --chunk=512K --level=5 --raid-devices=4 missing
> /dev/sdd /dev/sdc /dev/sdb
> fsck -n /dev/md0

Yes, but with the order above, and with --data-offset=variable when 
mixing them.

> Subject to:
> Don't know if --force is desirable in this case?

Not applicable to --create.

> Might need to try different offsets from above.  Don't know how to set
> offsets if they are different per drive.

man page.

> Should I start with guessing "missing" for 1 or should I start with all 4?
> Might need to try all device orders.
> 
>> Start by creating partitions on all devices, preferably at 2048 sectors.
>> (Should be the default offered.)  Use data offsets 259076 and 260100
>> instead of 261124 and 262144.
> 
> If I understand, this is an alternative to recreating the whole-disk
> mdadm containing one partition. Instead it would involve creating new
> partition tables on each physical drive, creating one partition per
> table, writing superblocks to the new /dev/sd[bcde]1 with offsets
> adjusted by either 2044 or 2048 sectors, and then doing the fsck on
> the assembled RAID.

Yes, 2048.

> I think the advantage proposed here is that it prevents this
> "automated superblock overwrite" from happening again if/when I try
> the motherboard upgrade, but the risk I'm not comfortable with yet is
> going beyond "do the minimum to get it working again". Although it
> isn't practical for me to do a dd full backup of these drives, if I
> can get the array mounted again I can copy off the most important data
> before doing a grander repartitioning.

It's virtually impossible to correct at any time other than create, so 
do it now.  The "minimum" is a rather brutal situation.  Fix it right.

> Can you advise on any of the above?
> 
> Thanks,
> DJ
> 

Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01  4:45             ` Phil Turmel
@ 2020-04-01  6:03               ` Daniel Jones
  2020-04-01 12:15                 ` Wols Lists
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Jones @ 2020-04-01  6:03 UTC (permalink / raw)
  To: Phil Turmel; +Cc: antlists, linux-raid

Thanks Phil,

I'll read this a couple of times and try some commands (likely on an
overlay) tomorrow.

Regards,
DJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01  6:03               ` Daniel Jones
@ 2020-04-01 12:15                 ` Wols Lists
  2020-04-01 12:55                   ` Phil Turmel
  0 siblings, 1 reply; 25+ messages in thread
From: Wols Lists @ 2020-04-01 12:15 UTC (permalink / raw)
  To: Daniel Jones, Phil Turmel; +Cc: linux-raid

On 01/04/20 07:03, Daniel Jones wrote:
> Thanks Phil,
> 
> I'll read this a couple of times and try some commands (likely on an
> overlay) tomorrow.

If you CAN overlay, then DO. If you can't back up the drives, the more
you can do to protect them from being accidentally written to, the better.

Cheers,
Wol

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01 12:15                 ` Wols Lists
@ 2020-04-01 12:55                   ` Phil Turmel
  2020-04-01 15:21                     ` Daniel Jones
  0 siblings, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2020-04-01 12:55 UTC (permalink / raw)
  To: Wols Lists, Daniel Jones; +Cc: linux-raid

On 4/1/20 8:15 AM, Wols Lists wrote:
> On 01/04/20 07:03, Daniel Jones wrote:
>> Thanks Phil,
>>
>> I'll read this a couple of times and try some commands (likely on an
>> overlay) tomorrow.
> 
> If you CAN overlay, then DO. If you can't back up the drives, the more
> you can do to protect them from being accidentally written to, the better.

I have to admit that I pretty much never use overlays.  But then I'm 
entirely confident of what any given mdadm/lvm/fdisk operation will do, 
in regards to writing to devices.  And since it is burned into my psyche 
that raid is *not* backup, only an uptime aid, I keep good external 
backups. (:

Lacking confidence and lacking backups are both good reasons for using 
overlays.  I'm not entirely sure the mental effort for a novice to learn 
to use overlays is time better spent than learning enough about MD and 
mdadm for confident use.

I do think initial recovery efforts with --assemble and --assemble 
--force do not need to be done with overlays.  They are so safe and so 
likely to quickly yield a working array that I think overlays should be 
recommended only for invasive tasks needed after these --assemble 
operations fail.

--create is a very invasive operation.

Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01 12:55                   ` Phil Turmel
@ 2020-04-01 15:21                     ` Daniel Jones
  2020-04-01 15:38                       ` Phil Turmel
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Jones @ 2020-04-01 15:21 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Wols Lists, linux-raid

Hi Phil, Wols,

(Sorry for the top-post in my last reply).

I'm working through everything Phil recommended.  I am also using
overlays exactly as documented on the
"Irreversible_mdadm_failure_recovery" wiki.  Things look very
favorable so far.

A quick question on what I'm doing:

As per Phil's suggestion to put the array inside partitions, I have
created partitions inside each of /dev/mapper/sd[bcde].  The gdisk
operations end with the message:

  Warning: The kernel is still using the old partition table.
  The new table will be used at the next reboot.
  The operation has completed successfully.

My question is: how do I get the kernel to recognize the
/dev/mapper/sd[bcde]1 partitions I have created?  Rebooting doesn't do
anything, as the overlay loop files aren't something that gets
recognized during boot.

Thanks,
DJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01 15:21                     ` Daniel Jones
@ 2020-04-01 15:38                       ` Phil Turmel
  2020-04-01 15:39                         ` Phil Turmel
  0 siblings, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2020-04-01 15:38 UTC (permalink / raw)
  To: Daniel Jones; +Cc: Wols Lists, linux-raid

On 4/1/20 11:21 AM, Daniel Jones wrote:
> Hi Phil, Wols,

> A quick question on what I'm doing?
> 
> As per Phil's suggestion to put the array inside partitions, I have
> created partitions inside each of /dev/mapper/sd[bcde].  The gdisk
> operations end with the message:
> 
>    Warning: The kernel is still using the old partition table.
>    The new table will be used at the next reboot.
>    The operation has completed successfully.
> 
> My question is how to get the kernel to recognize the
> /dev/mapper/sd[bcde]1 partitions I have created?  Rebooting doesn't do
> anything as the overlay loop files aren't something that gets
> recognized during boot.

I would create the partition tables once on the live disks.  Then make 
overlays for the partitions on each test.

Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01 15:38                       ` Phil Turmel
@ 2020-04-01 15:39                         ` Phil Turmel
  2020-04-01 18:07                           ` Daniel Jones
  0 siblings, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2020-04-01 15:39 UTC (permalink / raw)
  To: Daniel Jones; +Cc: Wols Lists, linux-raid

On 4/1/20 11:38 AM, Phil Turmel wrote:
> On 4/1/20 11:21 AM, Daniel Jones wrote:
>> Hi Phil, Wols,

>> My question is how to get the kernel to recognize the
>> /dev/mapper/sd[bcde]1 partitions I have created?  Rebooting doesn't do
>> anything as the overlay loop files aren't something that gets
>> recognized during boot.
> 
> I would create the partition tables once on the live disks.  Then make 
> overlays for the partitions on each test.

And use partprobe if needed to tell the kernel to re-read the partition 
tables.
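
For example, assuming the partition tables were written to the real disks 
as suggested above:

  # partprobe /dev/sd[bcde]

(or blockdev --rereadpt per disk).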

Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01 15:39                         ` Phil Turmel
@ 2020-04-01 18:07                           ` Daniel Jones
  2020-04-01 18:32                             ` Phil Turmel
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Jones @ 2020-04-01 18:07 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Wols Lists, linux-raid

Hi Phil,

So far so good.

1: I have run gdisk on each physical drive to create a new partition.

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048     19532873694   9.1 TiB     8300  Linux filesystem

2: Everything from here is on overlays. I tested many combinations of
--create. This appears to be the correct one:

mdadm --create /dev/md0 --assume-clean --data-offset=129536 --level=5
--chunk=512K --raid-devices=4 missing /dev/mapper/sdc1
/dev/mapper/sdd1 /dev/mapper/sde1

Data offset was calculated as (261120-2048)/2 = 129536, since my mdadm
expects it in kB rather than sectors.
All six combinations of device orders were tested; bcde was the only
one that fsck liked.
Array was tested in configs of bcde, Xcde, bcXe, bcdX (where X is missing).
Configs that passed fsck were mounted and data inspected.

  bcde = did not contain last files known to be written to array
  Xcde = **did** contain last files known to be written to array
  bcXe = fsck 400,000+ errors
  bcdX = did not contain last files known to be written to array
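
For the record, the order testing above is easy to script against the 
overlay devices; a rough sketch, not the exact commands used here (--run 
merely suppresses mdadm's confirmation prompts, and fsck -n writes 
nothing):

  for order in "c d e" "c e d" "d c e" "d e c" "e c d" "e d c"; do
      devs=""
      for d in $order; do devs="$devs /dev/mapper/sd${d}1"; done
      mdadm --stop /dev/md0 2>/dev/null
      mdadm --create /dev/md0 --assume-clean --run --data-offset=129536 \
            --level=5 --chunk=512K --raid-devices=4 missing $devs
      echo "=== missing $order ==="
      fsck -n /dev/md0
  done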

3: I then attempted to add the removed drive (still using overlay).

# mdadm --manage /dev/md0 --re-add /dev/mapper/sdb1
mdadm: --re-add for /dev/mapper/sdb1 to /dev/md0 is not possible

# mdadm --manage /dev/md0 --add /dev/mapper/sdb1
mdadm: added /dev/mapper/sdb1

It ran like this for a short while:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 dm-3[4] dm-6[3] dm-5[2] dm-4[1]
      29298917376 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.0% (457728/9766305792)
finish=18089.5min speed=8997K/sec
      bitmap: 0/73 pages [0KB], 65536KB chunk

Then it ended in this state:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 dm-3[4](F) dm-6[3] dm-5[2] dm-4[1]
      29298917376 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      bitmap: 2/73 pages [8KB], 65536KB chunk

unused devices: <none>

# mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Wed Apr  1 11:34:26 2020
        Raid Level : raid5
        Array Size : 29298917376 (27941.63 GiB 30002.09 GB)
     Used Dev Size : 9766305792 (9313.88 GiB 10000.70 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Wed Apr  1 11:41:27 2020
             State : clean, degraded
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 1
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : hulk:0  (local to host hulk)
              UUID : 29e8195c:1da9c101:209c7751:5fc7d1b9
            Events : 37

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1     253        4        1      active sync   /dev/dm-4
       2     253        5        2      active sync   /dev/dm-5
       3     253        6        3      active sync   /dev/dm-6

       4     253        3        -      faulty   /dev/dm-3


# mdadm -E /dev/mapper/sd[bcde]1
mdadm: No md superblock detected on /dev/mapper/sdb1.
/dev/mapper/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 29e8195c:1da9c101:209c7751:5fc7d1b9
           Name : hulk:0  (local to host hulk)
  Creation Time : Wed Apr  1 11:34:26 2020
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 19532612575 sectors (9313.88 GiB 10000.70 GB)
     Array Size : 29298917376 KiB (27941.63 GiB 30002.09 GB)
  Used Dev Size : 19532611584 sectors (9313.88 GiB 10000.70 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258992 sectors, after=991 sectors
          State : clean
    Device UUID : 683a2ac8:e9242cda:e522c872:f86ca9b5

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Apr  1 11:41:27 2020
  Bad Block Log : 512 entries available at offset 48 sectors
       Checksum : 1e17785b - correct
         Events : 37

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : .AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 29e8195c:1da9c101:209c7751:5fc7d1b9
           Name : hulk:0  (local to host hulk)
  Creation Time : Wed Apr  1 11:34:26 2020
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 19532612575 sectors (9313.88 GiB 10000.70 GB)
     Array Size : 29298917376 KiB (27941.63 GiB 30002.09 GB)
  Used Dev Size : 19532611584 sectors (9313.88 GiB 10000.70 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258992 sectors, after=991 sectors
          State : clean
    Device UUID : 63885fec:40e5f57f:59f73757:958d5cf6

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Apr  1 11:41:27 2020
  Bad Block Log : 512 entries available at offset 48 sectors
       Checksum : d397bbf - correct
         Events : 37

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : .AAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 29e8195c:1da9c101:209c7751:5fc7d1b9
           Name : hulk:0  (local to host hulk)
  Creation Time : Wed Apr  1 11:34:26 2020
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 19532612575 sectors (9313.88 GiB 10000.70 GB)
     Array Size : 29298917376 KiB (27941.63 GiB 30002.09 GB)
  Used Dev Size : 19532611584 sectors (9313.88 GiB 10000.70 GB)
    Data Offset : 259072 sectors
   Super Offset : 8 sectors
   Unused Space : before=258992 sectors, after=991 sectors
          State : clean
    Device UUID : ad3fa4d3:20a0582a:098b31d1:38f2b248

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Apr  1 11:41:27 2020
  Bad Block Log : 512 entries available at offset 48 sectors
       Checksum : 6b30e63c - correct
         Events : 37

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : .AAA ('A' == active, '.' == missing, 'R' == replacing)

4: Summary

The drives have had physical partitions written.
I think I've found the correct offset and device order to use --create
to restore the array to the degraded state it was in before the
superblocks were overwritten.
I'm not sure why the --add doesn't work.

Thanks so much for your help this far.

Regards,
DJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01 18:07                           ` Daniel Jones
@ 2020-04-01 18:32                             ` Phil Turmel
  2020-04-03 18:29                               ` Daniel Jones
  0 siblings, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2020-04-01 18:32 UTC (permalink / raw)
  To: Daniel Jones; +Cc: Wols Lists, linux-raid

Hi Daniel,

On 4/1/20 2:07 PM, Daniel Jones wrote:
> Hi Phil,
> 
> So far so good.

Yes.

> # mdadm --manage /dev/md0 --add /dev/mapper/sdb1
> mdadm: added /dev/mapper/sdb1

Don't do this.  Overlays can't really handle the amount of data that 
would be involved, and you definitely don't want to rebuild yet.

> 4: Summary
> 
> The drives have had physical partitions written.
> I think I've found the correct offset and device order to use --create
> to restore the array to the degraded state it was in before the
> superblocks were overwritten.

Yes.

> I'm not sure why the --add doesn't work.

Don't do the --add operation until you've copied anything critical in 
the array to external backups (while running with 3 of 4).  The reason 
is that any not-yet-discovered URE on those three will certainly crash 
the array during rebuild.  It could still crash while copying critical 
stuff, but you can repeatedly --assemble --force to keep going with the 
next items to back up.

Only when you've backed up everything possible do you --add the fourth 
drive back into the array.
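
Concretely, something along these lines once the degraded array has been 
re-created on the real partitions (mount point, destination and the 
"critical-stuff" path are placeholders):

  # mdadm --assemble --force /dev/md0 /dev/sd[cde]1
  # mount -o ro /dev/md0 /mnt/md0
  # rsync -a /mnt/md0/critical-stuff/ /path/to/external/backup/

If a URE drops the array mid-copy, repeat the --assemble --force and carry 
on with the next items; only after everything needed is copied:

  # mdadm /dev/md0 --add /dev/sdb1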

> Thanks so much for your help this far.

You're welcome.

> Regards,
> DJ

Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-01 18:32                             ` Phil Turmel
@ 2020-04-03 18:29                               ` Daniel Jones
  2020-04-03 18:34                                 ` Phil Turmel
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Jones @ 2020-04-03 18:29 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Wols Lists, linux-raid

Hello again,

> Don't do the --add operation until you've copied anything critical in the array to external backups (while running with 3 of 4).

Everything from the array has been backed up elsewhere.

Up until now the only writes intentionally done to the physical drives
have been the new partition tables.  Everything else has been through
the overlay.

Now I think I'm ready to run a --create as follows on the physical drives:
mdadm --create /dev/md0 --assume-clean --data-offset=129536 --level=5
--chunk=512K --raid-devices=4 missing /dev/sdc1 /dev/sdd1 /dev/sde1

After that I'd try to re-add the rejected drive?
mdadm --manage /dev/md0 --add /dev/sdb1

Part of me wonders about just rebuilding the whole thing and then
copying the data back, but I don't know that it would be any better than
this path.

Thanks,
DJ

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-03 18:29                               ` Daniel Jones
@ 2020-04-03 18:34                                 ` Phil Turmel
  2020-04-03 18:42                                   ` Daniel Jones
  0 siblings, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2020-04-03 18:34 UTC (permalink / raw)
  To: Daniel Jones; +Cc: Wols Lists, linux-raid

On 4/3/20 2:29 PM, Daniel Jones wrote:
> Hello again,
> 
>> Don't do the --add operation until you've copied anything critical in the array to external backups (while running with 3 of 4).
> 
> Everything from the array has been backed up elsewhere.
> 
> Up until now the only writes intentionally done to the physical drives
> have been the new partition tables.  Everything else has been through
> the overlay.
> 
> Now I think I'm ready to run a --create as follows on the physical drives:
> mdadm --create /dev/md0 --assume-clean --data-offset=129536 --level=5
> --chunk=512K --raid-devices=4 missing /dev/sdc1 /dev/sdd1 /dev/sde1
> 
> After that I'd try to re-add the rejected drive?
> mdadm --manage /dev/md0 --add /dev/sdb1
> 
> Part of me wonders about just rebuilding the whole thing and then
> copying the data back, but I don't know that it would be any better than
> this path.

Sounds like a risk-free decision.  mdadm --create --assume-clean 
followed by a proper fsck will be lots faster than mdadm --create, mkfs, 
and copying.

I'd go fast.

Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-03 18:34                                 ` Phil Turmel
@ 2020-04-03 18:42                                   ` Daniel Jones
  2020-04-03 18:43                                     ` Phil Turmel
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Jones @ 2020-04-03 18:42 UTC (permalink / raw)
  To: Phil Turmel; +Cc: Wols Lists, linux-raid

After the "--create missing /dev/sdc1 /dev/sdd1 /dev/sde1"  and the
fsck, is "mdadm --manage /dev/md0 --add /dev/sdb" the correct syntax
for attempting to add?

-DJ

On Fri, Apr 3, 2020 at 12:34 PM Phil Turmel <philip@turmel.org> wrote:
>
> On 4/3/20 2:29 PM, Daniel Jones wrote:
> > Hello again,
> >
> >> Don't do the --add operation until you've copied anything critical in the array to external backups (while running with 3 of 4).
> >
> > Everything from the array has been backed up elsewhere.
> >
> > Up until now the only writes intentionally done to the physical drives
> > have been the new partition tables.  Everything else has been through
> > the overlay.
> >
> > Now I think I'm ready to run a --create as follows on the physical drives:
> > mdadm --create /dev/md0 --assume-clean --data-offset=129536 --level=5
> > --chunk=512K --raid-devices=4 missing /dev/sdc1 /dev/sdd1 /dev/sde1
> >
> > After that I'd try to re-add the rejected drive?
> > mdadm --manage /dev/md0 --add /dev/sdb1
> >
> > Part of me wonders about just rebuilding the whole thing and then
> > copying the data back, but I don't know that it would be any better than
> > this path.
>
> Sounds like a risk-free decision.  mdadm --create --assume-clean
> followed by a proper fsck will be lots faster than mdadm --create, mkfs,
> and copying.
>
> I'd go fast.
>
> Phil

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-03 18:42                                   ` Daniel Jones
@ 2020-04-03 18:43                                     ` Phil Turmel
  2020-04-03 20:13                                       ` Adam Goryachev
  0 siblings, 1 reply; 25+ messages in thread
From: Phil Turmel @ 2020-04-03 18:43 UTC (permalink / raw)
  To: Daniel Jones; +Cc: Wols Lists, linux-raid


On 4/3/20 2:42 PM, Daniel Jones wrote:
> After the "--create missing /dev/sdc1 /dev/sdd1 /dev/sde1"  and the
> fsck, is "mdadm --manage /dev/md0 --add /dev/sdb" the correct syntax
> for attempting to add?

You can leave out "--manage".  But yes.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-03 18:43                                     ` Phil Turmel
@ 2020-04-03 20:13                                       ` Adam Goryachev
  2020-04-03 20:14                                         ` Phil Turmel
  0 siblings, 1 reply; 25+ messages in thread
From: Adam Goryachev @ 2020-04-03 20:13 UTC (permalink / raw)
  To: Phil Turmel, Daniel Jones; +Cc: Wols Lists, linux-raid


On 4/4/20 05:43, Phil Turmel wrote:
>
> On 4/3/20 2:42 PM, Daniel Jones wrote:
>> After the "--create missing /dev/sdc1 /dev/sdd1 /dev/sde1"  and the
>> fsck, is "mdadm --manage /dev/md0 --add /dev/sdb" the correct syntax
>> for attempting to add?
>
> You can leave out "--manage".  But yes.

I was mostly following this, but might have missed something here so 
this is just a suggestion to double check....

If you are trying to use partitions instead of whole devices (to prevent 
this happening again in future), then shouldn't you use:

mdadm --manage /dev/md0 --add /dev/sdb1

ie, sdb1 not sdb....

Regards,
Adam

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Requesting assistance recovering RAID-5 array
  2020-04-03 20:13                                       ` Adam Goryachev
@ 2020-04-03 20:14                                         ` Phil Turmel
  0 siblings, 0 replies; 25+ messages in thread
From: Phil Turmel @ 2020-04-03 20:14 UTC (permalink / raw)
  To: Adam Goryachev, Daniel Jones; +Cc: Wols Lists, linux-raid

On 4/3/20 4:13 PM, Adam Goryachev wrote:
> 
> On 4/4/20 05:43, Phil Turmel wrote:
>>
>> On 4/3/20 2:42 PM, Daniel Jones wrote:
>>> After the "--create missing /dev/sdc1 /dev/sdd1 /dev/sde1"  and the
>>> fsck, is "mdadm --manage /dev/md0 --add /dev/sdb" the correct syntax
>>> for attempting to add?
>>
>> You can leave out "--manage".  But yes.
> 
> I was mostly following this, but might have missed something here so 
> this is just a suggestion to double check....
> 
> If you are trying to use partitions instead of whole devices (to prevent 
> this happening again in future), then shouldn't you use:
> 
> mdadm --manage /dev/md0 --add /dev/sdb1
> 
> ie, sdb1 not sdb....

Yes.  Good catch.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-04-03 20:14 UTC | newest]

Thread overview: 25+ messages
2020-03-31  0:04 Requesting assistance recovering RAID-5 array Daniel Jones
2020-03-31  0:24 ` antlists
2020-03-31  0:51   ` Daniel Jones
2020-03-31  1:27     ` crowston.name
2020-03-31  1:50       ` Phil Turmel
2020-03-31  1:48     ` Phil Turmel
2020-03-31  2:09       ` Daniel Jones
2020-03-31 12:00         ` Phil Turmel
2020-03-31 13:36           ` Daniel Jones
2020-04-01  3:39           ` Daniel Jones
2020-04-01  4:45             ` Phil Turmel
2020-04-01  6:03               ` Daniel Jones
2020-04-01 12:15                 ` Wols Lists
2020-04-01 12:55                   ` Phil Turmel
2020-04-01 15:21                     ` Daniel Jones
2020-04-01 15:38                       ` Phil Turmel
2020-04-01 15:39                         ` Phil Turmel
2020-04-01 18:07                           ` Daniel Jones
2020-04-01 18:32                             ` Phil Turmel
2020-04-03 18:29                               ` Daniel Jones
2020-04-03 18:34                                 ` Phil Turmel
2020-04-03 18:42                                   ` Daniel Jones
2020-04-03 18:43                                     ` Phil Turmel
2020-04-03 20:13                                       ` Adam Goryachev
2020-04-03 20:14                                         ` Phil Turmel
