* Raid 6 recovery
@ 2017-10-31 15:42 John Crisp
  2017-10-31 16:27 ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: John Crisp @ 2017-10-31 15:42 UTC (permalink / raw)
  To: linux-raid



Hi,

Returning once again to this list for some help and advice.

Long story short I have a failed Raid 6 array that I would like to try
and recover. The data is not vitally important as I have most of it in a
number of other places, but I'd like to try and resurrect the array if
possible, as much to learn as anything.

The array had an issue some while ago, but as I had no space to store
any recovered data I left the machine off.

The OS is Xubuntu 14.04

The system consisted of a boot/OS array with two mirrored drives (which
is fine), and then a Raid 6 data array made up of eight 300GB Ultra Wide
SCSI drives. Seven were in the array, with one spare (if my memory serves
me correctly).

As far as I remember the machine suffered a power failure. When it
powered up again, the system tried to restore/rebuild the array. During
this I think the power failed again (don't ask.....). It suffered at least
one disk failure. I then left it off to try another day.

Drive layout is as follows:

RAID 1 mirror /dev/sda + b

RAID 6 array /dev/sd[cdefghij]

/dev/sdd was dead and has been replaced.

As far as I remember I created the array, then added a partition and
then LVM (possibly not a good idea in hindsight). So none of the
individual drives show a partition......

I had a good read here and created some of the scripts.

https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID

Here is some of the output I have got so far. Any advice appreciated.

B. Rgds
John



root@garage:~# sed -e '/^[[:space:]]*$/d' -e '/^[[:space:]]*#/d'
/etc/mdadm/mdadm.conf
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root
DEVICE partitions
ARRAY /dev/md0 metadata=1.2 name=garage:0
UUID=90624393:3b638ad8:9aeb81ca:fa3caafc
ARRAY /dev/md1 metadata=1.2 name=garage:1
UUID=f624610a:b711ff4b:3b126550:a8f78732
ARRAY /dev/md/Data metadata=1.2 name=garage:Data
UUID=1a2f92b0:d7c1a540:165b9ab7:0baed449


cat /proc/mdstat

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : inactive sdf[3](S) sdh[5](S)
      585675356 blocks super 1.2

md1 : active raid1 sda5[2] sdb5[3]
      292674368 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[2] sdb1[3]
      248640 blocks super 1.2 [2/2] [UU]

unused devices: <none>

mdadm --stop /dev/md127


Notice the following will not work with /dev/sdc1 as there is no
partition on the drive. Have to use /dev/sdc :

UUID=$(mdadm -E /dev/sdc|perl -ne '/Array UUID : (\S+)/ and print $1')
echo $UUID
1a2f92b0:d7c1a540:165b9ab7:0baed449

DEVICES=$(cat /proc/partitions | parallel --tagstring {5} --colsep ' +'
mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t' echo /dev/{1})

echo $DEVICES
/dev/sdc /dev/sde /dev/sdf /dev/sdh /dev/sdi


Create overlays:

root@garage:~# ./overlayoptions.sh create
Currently set device are
/dev/sdc /dev/sde /dev/sdf /dev/sdh /dev/sdi
Input is create
Creating Overlay
free 235071M
Clear any old overlays
Removing Overlay
/dev/sdc 286102M /dev/loop0 /dev/mapper/sdc
/dev/sde 286102M /dev/loop1 /dev/mapper/sde
/dev/sdf 286102M /dev/loop2 /dev/mapper/sdf
/dev/sdh 286102M /dev/loop3 /dev/mapper/sdh
/dev/sdi 286102M /dev/loop4 /dev/mapper/sdi
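
(For reference, the overlay trick from the wiki boils down to roughly
this for each drive - a sketch rather than my exact script, with the
overlay file and loop device names purely illustrative:

truncate -s 4G overlay-sdc
losetup /dev/loop0 overlay-sdc
size=$(blockdev --getsz /dev/sdc)
dmsetup create sdc --table "0 $size snapshot /dev/sdc /dev/loop0 P 8"

Any writes during testing then land in the sparse overlay file, so the
real disk stays untouched.)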


root@garage:~# mdadm --assemble --force /dev/md127 $OVERLAYS
mdadm: clearing FAULTY flag for device 3 in /dev/md127 for /dev/mapper/sdh
mdadm: Marking array /dev/md127 as 'clean'
mdadm: failed to add /dev/mapper/sde to /dev/md127: Invalid argument
mdadm: failed to add /dev/mapper/sdi to /dev/md127: Invalid argument
mdadm: /dev/md127 assembled from 2 drives and  1 rebuilding - not enough
to start the array.



root@garage:~# mdadm --examine /dev/sd[cdefghij] |grep Event
         Events : 1911
         Events : 1911
         Events : 1910
         Events : 1910
         Events : 1911

(Two drives have older Events)




root@garage:~# mdadm --examine /dev/sd[cdefghij]
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1a2f92b0:d7c1a540:165b9ab7:0baed449
           Name : garage:Data  (local to host garage)
  Creation Time : Mon Sep  8 16:44:06 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 585675356 (279.27 GiB 299.87 GB)
     Array Size : 1464186880 (1396.36 GiB 1499.33 GB)
  Used Dev Size : 585674752 (279.27 GiB 299.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 9b89d817:36633ebf:6d30d5ea:5bba21a0

    Update Time : Sun Jan  8 13:47:20 2017
       Checksum : 448c93b2 - expected 448c93b1
         Events : 1911

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : A.AAAAA ('A' == active, '.' == missing)
/dev/sdd:
   MBR Magic : aa55
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1a2f92b0:d7c1a540:165b9ab7:0baed449
           Name : garage:Data  (local to host garage)
  Creation Time : Mon Sep  8 16:44:06 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 585675356 (279.27 GiB 299.87 GB)
     Array Size : 1464186880 (1396.36 GiB 1499.33 GB)
  Used Dev Size : 585674752 (279.27 GiB 299.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 3ee17acf:73dabec5:22dd87f5:981b904f

    Update Time : Sun Jan  8 13:47:20 2017
       Checksum : bcd07086 - expected bcd07085
         Events : 1911

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : A.AAAAA ('A' == active, '.' == missing)
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1a2f92b0:d7c1a540:165b9ab7:0baed449
           Name : garage:Data  (local to host garage)
  Creation Time : Mon Sep  8 16:44:06 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 585675356 (279.27 GiB 299.87 GB)
     Array Size : 1464186880 (1396.36 GiB 1499.33 GB)
  Used Dev Size : 585674752 (279.27 GiB 299.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6b8031bd:2a03c716:c2633b9f:f3b0a1d1

    Update Time : Sun Jan  8 13:42:11 2017
       Checksum : 2757532f - correct
         Events : 1910

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAAA ('A' == active, '.' == missing)
/dev/sdg:
   MBR Magic : aa55
/dev/sdh:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x12
     Array UUID : 1a2f92b0:d7c1a540:165b9ab7:0baed449
           Name : garage:Data  (local to host garage)
  Creation Time : Mon Sep  8 16:44:06 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 585675356 (279.27 GiB 299.87 GB)
     Array Size : 1464186880 (1396.36 GiB 1499.33 GB)
  Used Dev Size : 585674752 (279.27 GiB 299.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
Recovery Offset : 0 sectors
          State : clean
    Device UUID : 03670c0d:342627ae:391927a0:d8ba78d4

    Update Time : Sun Jan  8 13:42:11 2017
       Checksum : 12551c41 - correct
         Events : 1910

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAAAA ('A' == active, '.' == missing)
/dev/sdi:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1a2f92b0:d7c1a540:165b9ab7:0baed449
           Name : garage:Data  (local to host garage)
  Creation Time : Mon Sep  8 16:44:06 2014
     Raid Level : raid6
   Raid Devices : 7

 Avail Dev Size : 585675356 (279.27 GiB 299.87 GB)
     Array Size : 1464186880 (1396.36 GiB 1499.33 GB)
  Used Dev Size : 585674752 (279.27 GiB 299.87 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : e8587c1b:e9c60b00:18e20a16:1b145b10

    Update Time : Sun Jan  8 13:47:20 2017
       Checksum : 246cd221 - expected 246ad21f
         Events : 1911

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 6
   Array State : A.AAAAA ('A' == active, '.' == missing)
/dev/sdj:
   MBR Magic : aa55



* Re: Raid 6 recovery
  2017-10-31 15:42 Raid 6 recovery John Crisp
@ 2017-10-31 16:27 ` Wols Lists
  2017-10-31 17:42   ` John Crisp
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2017-10-31 16:27 UTC (permalink / raw)
  To: John Crisp, linux-raid

On 31/10/17 15:42, John Crisp wrote:
> Hi,
> 
> Returning once again to this list for some help and advice.

Doing a first-responder job ... :-)
> 
> Long story short I have a failed Raid 6 array that I would like to try
> and recover. The data is not vitally important as I have most of it in a
> number of other places, but I'd like to try and resurrect the array if
> possible, as much to learn as anything.
> 
Looks very promising ...

> The array had an issue some while ago, but as I had no space to store
> any recovered data I left the machine off.
> 
> The OS is Xubuntu 14.04
> 
> The system consisted of a boot/OS array with two mirrored drives (which
> is fine), and then a Raid 6 data array made up of eight 300GB Ultra Wide
> SCSI drives. Seven were in the array, with one spare (if my memory serves
> me correctly).

Okay. That makes 5 data drives, 2 parity, one spare. I'm wondering if
one drive failed a while back and was rebuilt, so you didn't have the
spare you think you did. I'm half-hoping that's the case, because if it
fell over in the middle of a rebuild, that could be a problem ...
> 
> As far as I remember the machine suffered a power failure. When it
> powered up again, the system tried to restore/rebuild the array. During
> this I think the power failed again (don't ask.....). It suffered at least
> one disk failure. I then left it off to try another day.
> 
> Drive layout is as follows:
> 
> RAID 1 mirror /dev/sda + b
> 
> RAID 6 array /dev/sd[cdefghij]
> 
> /dev/sdd was dead and has been replaced.


> 
> As far as I remember I created the array, then added a partition and
> then LVM (possibly not a good idea in hindsight). So none of the
> individual drives show a partition......
> 
> I had a good read here and created some of the scripts.
> 
> https://raid.wiki.kernel.org/index.php/Recovering_a_damaged_RAID
> 
> Here is some of the output I have got so far. Any advice appreciated.
> 
> B. Rgds
> John
> 
> 
> 
> root@garage:~# sed -e '/^[[:space:]]*$/d' -e '/^[[:space:]]*#/d'
> /etc/mdadm/mdadm.conf
> CREATE owner=root group=disk mode=0660 auto=yes
> HOMEHOST <system>
> MAILADDR root
> DEVICE partitions
> ARRAY /dev/md0 metadata=1.2 name=garage:0
> UUID=90624393:3b638ad8:9aeb81ca:fa3caafc
> ARRAY /dev/md1 metadata=1.2 name=garage:1
> UUID=f624610a:b711ff4b:3b126550:a8f78732
> ARRAY /dev/md/Data metadata=1.2 name=garage:Data
> UUID=1a2f92b0:d7c1a540:165b9ab7:0baed449
> 
> 
> cat /proc/mdstat
> 
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md127 : inactive sdf[3](S) sdh[5](S)
>       585675356 blocks super 1.2
> 
> md1 : active raid1 sda5[2] sdb5[3]
>       292674368 blocks super 1.2 [2/2] [UU]
> 
> md0 : active raid1 sda1[2] sdb1[3]
>       248640 blocks super 1.2 [2/2] [UU]
> 
> unused devices: <none>
> 
> mdadm --stop /dev/md127
> 
> 
> Notice the following will not work with /dev/sdc1 as there is no
> partition on the drive. Have to use /dev/sdc :
> 
> UUID=$(mdadm -E /dev/sdc|perl -ne '/Array UUID : (\S+)/ and print $1')
> echo $UUID
> 1a2f92b0:d7c1a540:165b9ab7:0baed449
> 
> DEVICES=$(cat /proc/partitions | parallel --tagstring {5} --colsep ' +'
> mdadm -E /dev/{5} |grep $UUID | parallel --colsep '\t' echo /dev/{1})
> 
> echo $DEVICES
> /dev/sdc /dev/sde /dev/sdf /dev/sdh /dev/sdi
> 
> 
> Create overlays:
> 
> root@garage:~# ./overlayoptions.sh create
> Currently set device are
> /dev/sdc /dev/sde /dev/sdf /dev/sdh /dev/sdi
> Input is create
> Creating Overlay
> free 235071M
> Clear any old overlays
> Removing Overlay
> /dev/sdc 286102M /dev/loop0 /dev/mapper/sdc
> /dev/sde 286102M /dev/loop1 /dev/mapper/sde
> /dev/sdf 286102M /dev/loop2 /dev/mapper/sdf
> /dev/sdh 286102M /dev/loop3 /dev/mapper/sdh
> /dev/sdi 286102M /dev/loop4 /dev/mapper/sdi
> 
> 
> root@garage:~# mdadm --assemble --force /dev/md127 $OVERLAYS
> mdadm: clearing FAULTY flag for device 3 in /dev/md127 for /dev/mapper/sdh
> mdadm: Marking array /dev/md127 as 'clean'
> mdadm: failed to add /dev/mapper/sde to /dev/md127: Invalid argument
> mdadm: failed to add /dev/mapper/sdi to /dev/md127: Invalid argument
> mdadm: /dev/md127 assembled from 2 drives and  1 rebuilding - not enough
> to start the array.
> 
This worries me. We have 5 drives, which would normally be enough to
recreate the array - a quick "--force" and we're up and running. Except
one drive is rebuilding, so we have one drive's worth of data scattered
across two drives :-(

Examine tells us that sdd, sdg, and sdj have been partitioned. What does
"fdisk -l" tell us about those drives? Assuming they have one large
partition each, what does "--examine" tell us about sdd1, sdg1 and sdj1
(assuming that's what the partitions are)?
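
Something like this should show it (sdd1 etc. here are assumptions -
substitute whatever partition names fdisk actually reports):

fdisk -l /dev/sdd /dev/sdg /dev/sdj
mdadm --examine /dev/sdd1 /dev/sdg1 /dev/sdj1
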
> 
> 
> root@garage:~# mdadm --examine /dev/sd[cdefghij] |grep Event
>          Events : 1911
>          Events : 1911
>          Events : 1910
>          Events : 1910
>          Events : 1911
> 
> (Two drives have older Events)
> 
Do you mean the two with 1910? That's no great shakes.
> 
> 
> 
> root@garage:~# mdadm --examine /dev/sd[cdefghij]
> /dev/sdc:

Snip the details ... :-)

First things first, I'd suggest going out and getting a 3TB drive. Once
we've worked out where the data is hiding on sdd, sdg, and sdj you can
ddrescue all that into partitions on this drive and still have space
left over. That way you've got your original drives untouched, you've
got a copy of everything on a fresh drive that's not going to die on you
(touch wood), and you've got spare space left over. (Even better, a 4TB
drive and then you can probably backup the array into the space left
over!). That'll set you back just over £100 for a Seagate Ironwolf or
similar.
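
(A sketch of what I mean, assuming the new disk shows up as /dev/sdX with
one partition per old drive - adjust the names to whatever you actually
create:

ddrescue -f /dev/sdd /dev/sdX1 sdd.map
ddrescue -f /dev/sdg /dev/sdX2 sdg.map
ddrescue -f /dev/sdj /dev/sdX3 sdj.map

and likewise for the rest if you go for the bigger drive.)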

Second, as I say, work out where that data is hiding - I strongly
suspect those drives have been partitioned.

And lastly, go back to the wiki. The page you read was the last in a
series - it would pay you to read the lot.

https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn

Note especially the utility lsdrv, which will tell the experts here
straight away where your data has decided to play hide-and-seek.
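
If you haven't already got it, it's roughly:

git clone https://github.com/pturmel/lsdrv.git
cd lsdrv
./lsdrv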

ESPECIALLY if you've ddrescued the data to a new drive, I suspect it
will be a simple matter of "--assemble --force" and your array will be
back up and running in a flash - well, maybe not a flash, it's got to rebuild
and sort itself out, but it'll be back and working.

(And then, of course, if you have built a new raid with a bunch of
partitions all on one disk, you need to backup the data, tear down the
raid, and re-organise the disk(s) into a more sensible long-term
configuration).

Oh - and putting LVM on top of a raid is perfectly sensible behaviour.
We have a problem with the raid - let's fix the raid and your LVM should
just come straight back.

Cheers,
Wol


* Re: Raid 6 recovery
  2017-10-31 16:27 ` Wols Lists
@ 2017-10-31 17:42   ` John Crisp
  2017-10-31 18:11     ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: John Crisp @ 2017-10-31 17:42 UTC (permalink / raw)
  To: Wols Lists, linux-raid



On 31/10/17 17:27, Wols Lists wrote:
> On 31/10/17 15:42, John Crisp wrote:
>> Hi,
>>
>> Returning once again to this list for some help and advice.
> 
> Doing a first-responder job ... :-)

Aww thanks :-) Thunderbirds to the rescue!


>>
>> Long story short I have a failed Raid 6 array that I would like to try
>> and recover. The data is not vitally important as I have most of it in a
>> number of other places, but I'd like to try and resurrect the array if
>> possible, as much to learn as anything.
>>
> Looks very promising ...

I hope so....
> 
> Okay. That makes 5 data drives, 2 parity, one spare. I'm wondering if
> one drive failed a while back and was rebuilt, so you didn't have the
> spare you think you did. I'm half-hoping that's the case, because if it
> fell over in the middle of a rebuild, that could be a problem ...

Quite possibly.

>> root@garage:~# mdadm --assemble --force /dev/md127 $OVERLAYS
>> mdadm: clearing FAULTY flag for device 3 in /dev/md127 for /dev/mapper/sdh
>> mdadm: Marking array /dev/md127 as 'clean'
>> mdadm: failed to add /dev/mapper/sde to /dev/md127: Invalid argument
>> mdadm: failed to add /dev/mapper/sdi to /dev/md127: Invalid argument
>> mdadm: /dev/md127 assembled from 2 drives and  1 rebuilding - not enough
>> to start the array.
>>
> This worries me. We have 5 drives, which would normally be enough to
> recreate the array - a quick "--force" and we're up and running. Except
> one drive is rebuilding, so we have one drive's worth of data scattered
> across two drives :-(
> 

Oh yuck.....


> Examine tells us that sdd, sdg, and sdj have been partitioned. What does
> "fdisk -l" tell us about those drives? Assuming they have one large
> partition each, what does "--examine" tell us about sdd1, sdg1 and sdj1
> (assuming that's what the partitions are)?

mdadm --examine was pasted at the bottom of my original post.


cat /etc/fstab

# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/mapper/xubuntu-vg__raider-root / ext4 errors=remount-ro 0 1
# /boot was on /dev/sda1 during installation
UUID=86b99e91-e21e-4381-97e3-9b38ea8dae1b /boot ext2 defaults 0 2
/dev/mapper/xubuntu-vg__raider-swap_1 none swap sw 0 0
UUID=b19a1b13-e650-4288-864a-b84a3a86edad /media/Data ext4 rw,noatime 0 0


fdisk:

root@garage:~# fdisk -l /dev/sd[cdefghij]

Disk /dev/sdc: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000b5cc0

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdd: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sde: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0003fdad

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdf: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00098c62

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdg: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000c9bb4

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdh: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000ae9f1

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdi: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00040d18

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdj: 300.0 GB, 300000000000 bytes
255 heads, 63 sectors/track, 36472 cylinders, total 585937500 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000924c0

   Device Boot      Start         End      Blocks   Id  System




>>
>> (Two drives have older Events)
>>
> Do you mean the two with 1910? That's no great shakes.

OK.

>> root@garage:~# mdadm --examine /dev/sd[cdefghij]
>> /dev/sdc:
> 
> Snip the details ... :-)
> 
> First things first, I'd suggest going out and getting a 3TB drive. Once
> we've worked out where the data is hiding on sdd, sdg, and sdj you can
> ddrescue all that into partitions on this drive and still have space
> left over. That way you've got your original drives untouched, you've
> got a copy of everything on a fresh drive that's not going to die on you
> (touch wood), and you've got spare space left over. (Even better, a 4TB
> drive and then you can probably backup the array into the space left
> over!). That'll set you back just over £100 for a Seagate Ironwolf or
> similar.

I'm not sure I can add another drive in the existing rig (which is a bit
of a jury rig - the original box died so I have a bog-standard PC with
the two SATA drives holding the OS and a PCI RAID card plugged in to
fire up the array cage on the old box. It's serious open-heart surgery
here!)

If you want to laugh look here (I told you it was bad....)
http://picpaste.com/20171031-0F5Z1t5c.jpg

I could dd over ssh to my main server which has a few TB of space.
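
(Roughly along these lines, I imagine - the host and path are just
placeholders:

dd if=/dev/sdd bs=1M | ssh john@bigserver 'cat > /srv/rescue/sdd.img'

and the same for each of the other drives.)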

> 
> Second, as I say, work out where that data is hiding - I strongly
> suspect those drives have been partitioned.
> 

See fdisk -l above - there are no partitions on the drives. This was a
data-only array and was mounted after booting the OS.

> And lastly, go back to the wiki. The page you read was the last in a
> series - it would pay you to read the lot.
> 
> https://raid.wiki.kernel.org/index.php/Linux_Raid#When_Things_Go_Wrogn
> 
> Note especially the utility lsdrv, which will tell the experts here
> straight away where your data has decided to play hide-and-seek.
> 

I had read through most of it, I think, but you miss things sometimes
(the grey cells are getting old!)

OK - grabbed a copy of lsdrv and results pasted below - for ref I did:

git clone https://github.com/pturmel/lsdrv.git

I ran the script and realised it wanted sginfo, which comes in the
sg3-utils package, which I then installed.


> ESPECIALLY if you've ddrescued the data to a new drive, I suspect it
> will be a simple matter of "--assemble --force" and your array will be
> back up and running in a flash - well, maybe not a flash, it's got to rebuild
> and sort itself out, but it'll be back and working.
> 
> (And then, of course, if you have built a new raid with a bunch of
> partitions all on one disk, you need to backup the data, tear down the
> raid, and re-organise the disk(s) into a more sensible long-term
> configuration).
> 

OK.... here's hoping :-)

> Oh - and putting LVM on top of a raid is perfectly sensible behaviour.
> We have a problem with the raid - let's fix the raid and your LVM should
> just come straight back.
> 

OK - nice to know.


Note below sda and sdb are a mirror with the OS. Drives sd[cdefghij] are
the Raid 6 data volume.

root@garage:/home/john/git/lsdrv# ./lsdrv
PCI [ata_piix] 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7
Family) IDE Controller (rev 01)
├scsi 0:x:x:x [Empty]
└scsi 1:x:x:x [Empty]
PCI [ata_piix] 00:1f.2 IDE interface: Intel Corporation NM10/ICH7 Family
SATA Controller [IDE mode] (rev 01)
├scsi 2:0:0:0 ATA      Maxtor 6L300S0   {L6159N1H}
│└sda 279.48g [8:0] Partitioned (dos)
│ ├sda1 243.00m [8:1] MD raid1 (1/2) (w/ sdb1) in_sync 'garage:0'
{90624393-3b63-8ad8-9aeb-81cafa3caafc}
│ │└md0 242.81m [9:0] MD v1.2 raid1 (2) clean
{90624393:3b638ad8:9aeb81ca:fa3caafc}
│ │ │                 ext2 {86b99e91-e21e-4381-97e3-9b38ea8dae1b}
│ │ └Mounted as /dev/md0 @ /boot
│ ├sda2 1.00k [8:2] Partitioned (dos)
│ └sda5 279.24g [8:5] MD raid1 (1/2) (w/ sdb5) in_sync 'garage:1'
{f624610a-b711-ff4b-3b12-6550a8f78732}
│  └md1 279.12g [9:1] MD v1.2 raid1 (2) clean
{f624610a:b711ff4b:3b126550:a8f78732}
│   │                 PV LVM2_member 279.11g used, 0 free
{sBqZxo-ybSN-5axJ-VKtQ-HVlJ-KSRd-rQKw5b}
│   └VG xubuntu-vg__raider 279.11g 0 free
{uSpmjO-b5cC-UfQU-7J5h-ZOMo-M6H6-Gb0qOp}
│    ├dm-0 269.21g [252:0] LV root ext4
{ce84b80b-a8cc-48ed-b8b6-5264c211feaf}
│    │└Mounted as /dev/dm-0 @ /
│    └dm-1 9.90g [252:1] LV swap_1 swap
{20e81f82-d6f9-4f46-8c34-5cece8fc6126}
└scsi 3:0:0:0 ATA      Maxtor 6L300S0   {L6159ETH}
 └sdb 279.48g [8:16] Partitioned (dos)
  ├sdb1 243.00m [8:17] MD raid1 (0/2) (w/ sda1) in_sync 'garage:0'
{90624393-3b63-8ad8-9aeb-81cafa3caafc}
  │└md0 242.81m [9:0] MD v1.2 raid1 (2) clean
{90624393:3b638ad8:9aeb81ca:fa3caafc}
  │                   ext2 {86b99e91-e21e-4381-97e3-9b38ea8dae1b}
  ├sdb2 1.00k [8:18] Partitioned (dos)
  └sdb5 279.24g [8:21] MD raid1 (0/2) (w/ sda5) in_sync 'garage:1'
{f624610a-b711-ff4b-3b12-6550a8f78732}
   └md1 279.12g [9:1] MD v1.2 raid1 (2) clean
{f624610a:b711ff4b:3b126550:a8f78732}
                      PV LVM2_member 279.11g used, 0 free
{sBqZxo-ybSN-5axJ-VKtQ-HVlJ-KSRd-rQKw5b}

PCI [aic7xxx] 03:02.0 SCSI storage controller: Adaptec AIC-7892A U160/m
(rev 02)
├scsi 4:0:0:0 COMPAQ   BD30089BBA       {DA01P770DB4P0726}
│└sdc 279.40g [8:32] MD raid6 (7) inactive 'garage:Data'
{1a2f92b0-d7c1-a540-165b-9ab70baed449}
├scsi 4:0:1:0 COMPAQ   BD30089BBA       {DA01P760D7FW0724}
│└sdd 279.40g [8:48] Partitioned (dos)
├scsi 4:0:2:0 COMPAQ   BD30089BBA       {DA01P760DABG0726}
│└sde 279.40g [8:64] MD raid6 (7) inactive 'garage:Data'
{1a2f92b0-d7c1-a540-165b-9ab70baed449}
├scsi 4:0:3:0 COMPAQ   BD30089BBA       {DA01P760D9NR0726}
│└sdf 279.40g [8:80] MD  (none/) (w/ sdh) spare 'garage:Data'
{1a2f92b0-d7c1-a540-165b-9ab70baed449}
│ └md127 0.00k [9:127] MD v1.2  () inactive, None (None) None {None}
│                      Empty/Unknown
├scsi 4:0:4:0 COMPAQ   BD30089BBA       {DA01P760DAKA0726}
│└sdg 279.40g [8:96] Partitioned (dos)
├scsi 4:0:5:0 COMPAQ   BD30089BBA       {DA01P770DB4C0726}
│└sdh 279.40g [8:112] MD  (none/) (w/ sdf) spare 'garage:Data'
{1a2f92b0-d7c1-a540-165b-9ab70baed449}
│ └md127 0.00k [9:127] MD v1.2  () inactive, None (None) None {None}
│                      Empty/Unknown
├scsi 4:0:6:0 COMPAQ   BD30089BBA       {DA01P760D9NJ0726}
│└sdi 279.40g [8:128] MD raid6 (7) inactive 'garage:Data'
{1a2f92b0-d7c1-a540-165b-9ab70baed449}
└scsi 4:0:9:0 COMPAQ   BD30089BBA       {DA01P770DBB80727}
 └sdj 279.40g [8:144] Partitioned (dos)

Other Block Devices
├loop0 0.00k [7:0] Empty/Unknown
├loop1 0.00k [7:1] Empty/Unknown
├loop2 0.00k [7:2] Empty/Unknown
├loop3 0.00k [7:3] Empty/Unknown
├loop4 0.00k [7:4] Empty/Unknown
├loop5 0.00k [7:5] Empty/Unknown
├loop6 0.00k [7:6] Empty/Unknown
└loop7 0.00k [7:7] Empty/Unknown





* Re: Raid 6 recovery
  2017-10-31 17:42   ` John Crisp
@ 2017-10-31 18:11     ` Wols Lists
  2017-11-02 23:47       ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2017-10-31 18:11 UTC (permalink / raw)
  To: John Crisp, linux-raid, Phil Turmel, NeilBrown

On 31/10/17 17:42, John Crisp wrote:
>> This worries me. We have 5 drives, which would normally be enough to
>> > recreate the array - a quick "--force" and we're up and running. Except
>> > one drive is rebuilding, so we have one drive's worth of data scattered
>> > across two drives :-(
>> > 
> Oh yuck.....
> 
> 
>> > Examine tells us that sdd, sdg, and sdj have been partitioned. What does
>> > "fdisk -l" tell us about those drives? Assuming they have one large
>> > partition each, what does "--examine" tell us about sdd1, sdg1 and sdj1
>> > (assuming that's what the partitions are)?
> mdadm --examine was pasted at the bottom of my original post.
> 
> 
> cat /etc/fstab
> 
> # <file system> <mount point>   <type>  <options>       <dump>  <pass>
> /dev/mapper/xubuntu-vg__raider-root / ext4 errors=remount-ro 0 1
> # /boot was on /dev/sda1 during installation
> UUID=86b99e91-e21e-4381-97e3-9b38ea8dae1b /boot ext2 defaults 0 2
> /dev/mapper/xubuntu-vg__raider-swap_1 none swap sw 0 0
> UUID=b19a1b13-e650-4288-864a-b84a3a86edad /media/Data ext4 rw,noatime 0 0
> 
> 
> fdisk:
> 
> root@garage:~# fdisk -l /dev/sd[cdefghij]

OUCH!!!

Okay, this is getting scary, sorry. We need to find where the data on
disks sdd, sdg, and sdj has gone. And according to all the information
you've given me, we have an array that's been saving its data in thin
air. Obviously that's not true - it's gone somewhere; the question is where.

What's also noticeable is that the three drives that have "vanished"
have all got an MBR. There's a whole bunch of possible explanations, but
I really don't have the experience or knowledge to make a call here.

Were those drives re-purposed from somewhere else? Could there have been
a blank partition table on them before you used them?

Is it possible somebody did a
# fdisk /dev/sdd
: write
and wrote a blank mbr to a drive that was already in the array?

Or it could be that these drives really did have a partition sdd1 etc,
and something's blown the partition table away (I remember a GPT case
where that happened ...)

I hate to say it, but I think you are now in the world of hexdump and
searching manually for a superblock on those three drives. See
https://raid.wiki.kernel.org/index.php/Advanced_data_recovery

It's possible - I wouldn't know - that the existence of the MBR stops
mdadm looking for a superblock that applies to the whole drive, which
could explain why those drives have disappeared.

Or it could be that the superblock is in a partition, and because the
MBR has been blanked mdadm is not looking for said partition.

In a v1.2 array, the superblock is offset 4K from the start of the
device (to stop things like a rogue fdisk overwriting it :-) so you need
to see if you can find one at about the 4K mark.
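
Something along these lines should do it - dump a few KB starting at the
4K mark and look for the md magic, which should show up in the hexdump as
the byte sequence fc 4e 2b a9 (device name assumed, repeat for sdg and sdj):

dd if=/dev/sdd bs=512 skip=8 count=8 2>/dev/null | hexdump -C | head -n 20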

If you can, the experts here should be able to recover the array fairly
easily, but it's beyond my ability, sorry.

NB - If you did get the big new drive, I was suggesting you turn all
your eight raid devices into partitions on the new drive, so you would
only have one drive to worry about while recovering :-)
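
(For instance, something like this would carve eight slots - each around
360GB on a 3TB disk, so comfortably bigger than the 300GB originals - out
of a hypothetical /dev/sdX, and then you'd ddrescue each old drive into
one. A sketch only.)

parted -s /dev/sdX mklabel gpt
for i in $(seq 0 7); do
  parted -s /dev/sdX mkpart rescue$((i+1)) $((i*12))% $(((i+1)*12))%
done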

Cheers,
Wol


* Re: Raid 6 recovery
  2017-10-31 18:11     ` Wols Lists
@ 2017-11-02 23:47       ` Wols Lists
  2017-11-03 10:44         ` John Crisp
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2017-11-02 23:47 UTC (permalink / raw)
  To: John Crisp, linux-raid, Phil Turmel, NeilBrown

On 31/10/17 18:11, Wols Lists wrote:
> It's possible - I wouldn't know - that the existence of the MBR stops
> mdadm looking for a superblock that applies to the whole drive, which
> could explain why those drives have disappeared.

Just been digging in the code - function check_raid - I think if you
pass a drive to mdadm it checks the drive for a superblock, and doesn't
look for whether a partition table exists. So I think that's that
explanation blown out the water ... :-(

Cheers,
Wol


* Re: Raid 6 recovery
  2017-11-02 23:47       ` Wols Lists
@ 2017-11-03 10:44         ` John Crisp
  2017-11-04 12:46           ` Wols Lists
  0 siblings, 1 reply; 9+ messages in thread
From: John Crisp @ 2017-11-03 10:44 UTC (permalink / raw)
  To: Wols Lists, linux-raid, Phil Turmel, NeilBrown



On 03/11/17 00:47, Wols Lists wrote:
> On 31/10/17 18:11, Wols Lists wrote:
>> It's possible - I wouldn't know - that the existence of the MBR stops
>> mdadm looking for a superblock that applies to the whole drive, which
>> could explain why those drives have disappeared.
> 
> Just been digging in the code - function check_raid - I think if you
> pass a drive to mdadm it checks the drive for a superblock, and doesn't
> look for whether a partition table exists. So I think that's that
> explanation blown out the water ... :-(
> 

:-) Thanks for looking.

As I mentioned, I think I created the array via Xubuntu.

As far as I remember I didn't partition any of the drives - just added
them to the array and I think it effectively created one partition
across the whole array (which would explain why you can't see
partitions on individual drives)

I am pretty sure that is trashed, but I like to try and understand these
things!

B. Rgds
John



* Re: Raid 6 recovery
  2017-11-03 10:44         ` John Crisp
@ 2017-11-04 12:46           ` Wols Lists
  2017-11-05 17:13             ` John Crisp
  0 siblings, 1 reply; 9+ messages in thread
From: Wols Lists @ 2017-11-04 12:46 UTC (permalink / raw)
  To: John Crisp, linux-raid, Phil Turmel, NeilBrown

On 03/11/17 10:44, John Crisp wrote:
> On 03/11/17 00:47, Wols Lists wrote:
>> On 31/10/17 18:11, Wols Lists wrote:
>>> It's possible - I wouldn't know - that the existence of the MBR stops
>>> mdadm looking for a superblock that applies to the whole drive, which
>>> could explain why those drives have disappeared.
>>
>> Just been digging in the code - function check_raid - I think if you
>> pass a drive to mdadm it checks the drive for a superblock, and doesn't
>> look for whether a partition table exists. So I think that's that
>> explanation blown out the water ... :-(
>>
> 
> :-) Thanks for looking.
> 
> As I mentioned, I think I created the array via Xubuntu.
> 
> As far as I remember I didn't partition any of the drives - just added
> them to the array and I think it effectively created one partition
> across the whole array (which would explain why you can't see
> partitions on individual drives)
> 
> I am pretty sure that is trashed, but I like to try and understand these
> things!
> 
More for Neil, this, but I've had a nasty thought. Did you reboot at all
(as in clean shutdown, not crash and restart) at any point during this?
I get the impression you might not have.

Because that *could* explain the missing superblocks! I think raid
updates the superblock every time it moves the rebuild window, so it
should get rewritten to disk regularly. BUT. Is it possible that it was
sitting in the disk cache when the system crashed?

I find it hard to believe that's the case, but it could explain at least
some of it (it doesn't explain why those drives had MBRs, though).

Cheers,
Wol



* Re: Raid 6 recovery
  2017-11-04 12:46           ` Wols Lists
@ 2017-11-05 17:13             ` John Crisp
  2017-11-05 18:08               ` Wol's lists
  0 siblings, 1 reply; 9+ messages in thread
From: John Crisp @ 2017-11-05 17:13 UTC (permalink / raw)
  To: Wols Lists, linux-raid, Phil Turmel, NeilBrown



Thanks for continuing to mull this one :-)

On 04/11/17 13:46, Wols Lists wrote:
>>
>> I am pretty sure that is trashed, but I like to try and understand these
>> things !
>>
> More for Neil, this, but I've had a nasty thought. Did you reboot at all
> (as in clean shutdown, not crash and restart) at any point during this?
> I get the impression you might not have.
> 

That I cannot remember to be honest. As I previously mentioned I *think*
it may have fallen over or crashed during the first rebuild but my
memory of exact events back then is hazy. I can try and fire it up and
check the logs - maybe they will have some history.
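
(Probably just grepping the old logs for md/raid messages, something like:

zgrep -Ei 'md[0-9]|raid' /var/log/syslog* /var/log/kern.log*

if they go back that far.)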

> Because that *could* explain the missing superblocks! I think raid
> updates the superblock every time it moves the rebuild window, so it
> should get rewritten to disk regularly. BUT. Is it possible that it was
> sitting in the disk cache when the system crashed?
> 
> I find it hard to believe that's the case, but it could explain at least
> some of it (doesn't explain why those drives had mbr's, though).
> 

I'm pretty sure the cards in use are so old that they have precious
little cache in them. However, that might not be the case.

I'll go check if I can see anything in any of the logs.

B. Rgds
John

--

PS - I seem to remember I should reply-to-all on this list but can't
see where I might have read that! Please let me know if that is wrong.



* Re: Raid 6 recovery
  2017-11-05 17:13             ` John Crisp
@ 2017-11-05 18:08               ` Wol's lists
  0 siblings, 0 replies; 9+ messages in thread
From: Wol's lists @ 2017-11-05 18:08 UTC (permalink / raw)
  To: John Crisp, linux-raid

On 05/11/17 17:13, John Crisp wrote:
> PS - I seem to remember I should do reply to all on this list but can't
> see where I might have read that ! Please let me know if that is wrong

It's standard procedure on LKML, and the list allows posts by people who 
are not subscribed, so yes, reply-to-all is the right thing to do here. I 
put that on the wiki.

HOWEVER. Just like you should trim the message body appropriately, you 
should also trim the respondents appropriately. I probably ought to put 
that on the wiki, too :-)

Cheers,
Wol

