* Recover array after I panicked
@ 2017-04-23  9:47 Patrik Dahlström
  2017-04-23 10:16 ` Andreas Klauer
  2017-04-23 14:06 ` Brad Campbell
  0 siblings, 2 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23  9:47 UTC (permalink / raw)
  To: linux-raid

Hello,

Here's the story:

I started with a 5x6 TB raid5 array. I added another 6 TB drive and
started to grow the array. However, one of my SATA cables was bad and
the reshape gave me lots of I/O errors.

Instead of fixing the SATA cable issue directly, I shut down the server
and swapped the positions of 2 drives. My reasoning was that putting the
new drive in a good slot would reduce the I/O errors. Bad move, I know. I
tried a few commands but was not able to continue the reshape.

I then took the server out of the rack and replaced the SATA cable. No
more I/O errors, but at this point I am unable to recreate the array.
This is when I really started to panic. After multiple attempts at
rescuing the array, I tried running the Permute_array.pl script [1] with
some local modifications to additionally try different chunk sizes. So
far I've had no luck.

I don't remember which commands I ran.
I have the OS on a separate drive, i.e. /var/log and /etc are intact.
raid metadata version is 1.2.
Ubuntu 16.04.02 LTS
Kernel 4.4.0-72-generic
mdadm 3.3


Is there any help you can offer?


[1] https://raid.wiki.kernel.org/index.php/Permute_array.pl





* Re: Recover array after I panicked
  2017-04-23  9:47 Recover array after I panicked Patrik Dahlström
@ 2017-04-23 10:16 ` Andreas Klauer
  2017-04-23 10:23   ` Patrik Dahlström
  2017-04-23 14:06 ` Brad Campbell
  1 sibling, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-23 10:16 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: linux-raid

On Sun, Apr 23, 2017 at 11:47:34AM +0200, Patrik Dahlström wrote:
> Is there any help you can offer?

Is there any mdadm --examine output?

What was on the array? Regular filesystem, unencrypted, or LVM, LUKS, ...?

If it's LUKS encrypted and you had RAID metadata at the end, yet 
mdadm --create'd new metadata at the start, that would likely have 
damaged your LUKS header beyond repair (and regular filesystems 
don't like it, either).

If it's unencrypted data, as a last resort you can always go and find 
the header of a large file of known type... for example if you find 
a megapixel JPEG image and the first 512K of it are part of that then 
your chunksize would be 512K and then you can go looking for the 
next chunk on the other disks... and that should give you some notion 
of the RAID layout and offset.
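
For example, something along these lines (a rough sketch, assuming GNU grep
and a JFIF-style JPEG, whose files start with the bytes ff d8 ff e0) prints
candidate byte offsets on one member disk; the 2 GiB sample size is arbitrary:

# -b prints the byte offset of each match; -a/-U keep grep from choking on binary data
dd if=/dev/sda bs=1M count=2048 2>/dev/null \
  | grep -obUaP '\xff\xd8\xff\xe0' | head

From such a hit you can dump the following 512K and check whether it still
looks like part of the same image.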

Regards
Andreas Klauer


* Re: Recover array after I panicked
  2017-04-23 10:16 ` Andreas Klauer
@ 2017-04-23 10:23   ` Patrik Dahlström
  2017-04-23 10:46     ` Andreas Klauer
  2017-04-23 12:32     ` Patrik Dahlström
  0 siblings, 2 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 10:23 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid

On 04/23/2017 12:16 PM, Andreas Klauer wrote:
> On Sun, Apr 23, 2017 at 11:47:34AM +0200, Patrik Dahlström wrote:
>> Is there any help you can offer?
> 
> Is there any mdadm --examine output?
At this point, it is incorrect. I've lost the output from the working
raid too, unless it's in some log under /var/log/.
I have /etc/mdadm/mdadm.conf, but I don't know if it's up to date.

> 
> What was on the array? Regular filesystem, unencrypted, or LVM, LUKS, ...?
Regular filesystem, unencrypted ext4.

> 
> If it's LUKS encrypted and you had RAID metadata at the end, yet 
> mdadm --create'd new metadata at the start, that would likely have 
> damaged your LUKS header beyond repair (and regular filesystems 
> don't like it, either).
No file system encryption.

> 
> If it's unencrypted data, as a last resort you can always go and find 
> the header of a large file of known type... for example if you find 
> a megapixel JPEG image and the first 512K of it are part of that then 
> your chunksize would be 512K and then you can go looking for the 
> next chunk on the other disks... and that should give you some notion 
> of the RAID layout and offset.
That's not a bad idea. Will hopefully narrow down my unknown variables.

> 
> Regards
> Andreas Klauer
> 


* Re: Recover array after I panicked
  2017-04-23 10:23   ` Patrik Dahlström
@ 2017-04-23 10:46     ` Andreas Klauer
  2017-04-23 11:12       ` Patrik Dahlström
  2017-04-23 12:32     ` Patrik Dahlström
  1 sibling, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-23 10:46 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: linux-raid

On Sun, Apr 23, 2017 at 12:23:17PM +0200, Patrik Dahlström wrote:
> At this point, it is incorrect.

:-(

> I've lost the output from the working
> raid too, unless it's located in any log in /var/log/.

If you have kernel logs from before your accident...
check for stuff like this:

[    7.328420] md: md6 stopped.
[    7.329705] md/raid:md6: device sdb6 operational as raid disk 0
[    7.329705] md/raid:md6: device sdg6 operational as raid disk 6
[    7.329706] md/raid:md6: device sdh6 operational as raid disk 5
[    7.329706] md/raid:md6: device sdf6 operational as raid disk 4
[    7.329706] md/raid:md6: device sde6 operational as raid disk 3
[    7.329707] md/raid:md6: device sdd6 operational as raid disk 2
[    7.329707] md/raid:md6: device sdc6 operational as raid disk 1
[    7.329924] md/raid:md6: raid level 5 active with 7 out of 7 devices, algorithm 2
[    7.329936] md6: detected capacity change from 0 to 1500282617856

That's not everything but it's something.
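
On Ubuntu the rotated logs usually live under /var/log/, so something like
this should pull out the relevant lines (paths are the usual defaults,
adjust as needed):

zgrep -h 'md/raid\|md1:' /var/log/kern.log* /var/log/syslog* | less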

Also in the future for experiments, go with overlays.

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
(and the overlay manipulation functions below that)

Lets you mess with mdadm --create stuff w/o actually overwriting original metadata.
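
Per disk it boils down to something like this (a minimal sketch of what the
wiki script does; the file location and the 4G COW size are just examples):

dev=/dev/sda
cow=/tmp/sda.ovl
truncate -s 4G "$cow"                        # sparse file that absorbs all writes
loop=$(losetup --find --show "$cow")
size=$(blockdev --getsz "$dev")              # device size in 512-byte sectors
dmsetup create sda_ovl --table "0 $size snapshot $dev $loop P 8"
# then point mdadm --create at /dev/mapper/sda_ovl instead of /dev/sda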

Regards
Andreas Klauer


* Re: Recover array after I panicked
  2017-04-23 10:46     ` Andreas Klauer
@ 2017-04-23 11:12       ` Patrik Dahlström
  2017-04-23 11:36         ` Wols Lists
  2017-04-23 13:16         ` Andreas Klauer
  0 siblings, 2 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 11:12 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid



On 04/23/2017 12:46 PM, Andreas Klauer wrote:
> On Sun, Apr 23, 2017 at 12:23:17PM +0200, Patrik Dahlström wrote:
>> At this point, it is incorrect.
> 
> :-(
> 
>> I've lost the output from the working
>> raid too, unless it's located in any log in /var/log/.
> 
> If you have kernel logs from before your accident...
> check for stuff like this:
> 
> [    7.328420] md: md6 stopped.
> [    7.329705] md/raid:md6: device sdb6 operational as raid disk 0
> [    7.329705] md/raid:md6: device sdg6 operational as raid disk 6
> [    7.329706] md/raid:md6: device sdh6 operational as raid disk 5
> [    7.329706] md/raid:md6: device sdf6 operational as raid disk 4
> [    7.329706] md/raid:md6: device sde6 operational as raid disk 3
> [    7.329707] md/raid:md6: device sdd6 operational as raid disk 2
> [    7.329707] md/raid:md6: device sdc6 operational as raid disk 1
> [    7.329924] md/raid:md6: raid level 5 active with 7 out of 7 devices, algorithm 2
> [    7.329936] md6: detected capacity change from 0 to 1500282617856
> 
> That's not everything but it's something.
I got some of that!
[    3.100350] md/raid:md1: device sde operational as raid disk 4
[    3.100350] md/raid:md1: device sdc operational as raid disk 3
[    3.100350] md/raid:md1: device sdd operational as raid disk 2
[    3.100351] md/raid:md1: device sda operational as raid disk 0
[    3.100351] md/raid:md1: device sdb operational as raid disk 1
[    3.100689] md/raid:md1: allocated 5432kB
[    3.100699] md/raid:md1: raid level 5 active with 5 out of 5 devices, algorithm 2
[    3.100700] RAID conf printout:
[    3.100700]  --- level:5 rd:5 wd:5
[    3.100700]  disk 0, o:1, dev:sda
[    3.100700]  disk 1, o:1, dev:sdb
[    3.100701]  disk 2, o:1, dev:sdd
[    3.100701]  disk 3, o:1, dev:sdc
[    3.100701]  disk 4, o:1, dev:sde
[    3.101006] created bitmap (44 pages) for device md1
[    3.102245] md1: bitmap initialized from disk: read 3 pages, set 0 of 89423 bits
[    3.159019] md1: detected capacity change from 0 to 24004163272704

At least I now know in what order I should assemble my original 5
drives:
sda, sdb, sdd, sdc, sde
It would only be logical for the new drive (sdf) to be last in that list.

> 
> Also in the future for experiments, go with overlays.
> 
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
> (and the overlay manipulation functions below that)
> 
> Lets you mess with mdadm --create stuff w/o actually overwriting original metadata.
Will do
> 
> Regards
> Andreas Klauer
> 


* Re: Recover array after I panicked
  2017-04-23 11:12       ` Patrik Dahlström
@ 2017-04-23 11:36         ` Wols Lists
  2017-04-23 11:47           ` Patrik Dahlström
  2017-04-23 11:58           ` Roman Mamedov
  2017-04-23 13:16         ` Andreas Klauer
  1 sibling, 2 replies; 63+ messages in thread
From: Wols Lists @ 2017-04-23 11:36 UTC (permalink / raw)
  To: Patrik Dahlström, Andreas Klauer; +Cc: linux-raid

On 23/04/17 12:12, Patrik Dahlström wrote:
> At least I now know in what order that I should assemble my original 5
> drives:
> sda, sdb, sdd, sdc, sde
> It would only be logical for the new drive (sdf) to be last in that list.

And, as the raid wiki tells you, download lspci and run that. That will
hopefully tell us a lot about what's left of the various headers on your
disks.

My worry is that Permute_Array will have tried a whole bunch of "mdadm
--create" variants and totally trashed your original headers.

Cheers,
Wol


* Re: Recover array after I panicked
  2017-04-23 11:36         ` Wols Lists
@ 2017-04-23 11:47           ` Patrik Dahlström
  2017-04-23 11:53             ` Reindl Harald
  2017-04-23 11:58           ` Roman Mamedov
  1 sibling, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 11:47 UTC (permalink / raw)
  To: Wols Lists, Andreas Klauer; +Cc: linux-raid



On 04/23/2017 01:36 PM, Wols Lists wrote:
> On 23/04/17 12:12, Patrik Dahlström wrote:
>> At least I now know in what order that I should assemble my original 5
>> drives:
>> sda, sdb, sdd, sdc, sde
>> It would only be logical for the new drive (sdf) to be last in that list.
> 
> And, as the raid wiki tells you, download lspci and run that. That will
> hopefully tell us a lot about what's left of the various headers on your
> disks.
I didn't find that part on the wiki. Here's the output of lspci:
00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 07)
00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #6 (rev f1)
00:1c.6 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #7 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
04:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

> 
> My worry is that Permute_Array will have tried a whole bunch of "mdadm
> --create" variants and totally trashed your original headers.
This is most likely.

> 
> Cheers,
> Wol
> 


* Re: Recover array after I panicked
  2017-04-23 11:47           ` Patrik Dahlström
@ 2017-04-23 11:53             ` Reindl Harald
  0 siblings, 0 replies; 63+ messages in thread
From: Reindl Harald @ 2017-04-23 11:53 UTC (permalink / raw)
  To: Patrik Dahlström, Wols Lists, Andreas Klauer; +Cc: linux-raid



Am 23.04.2017 um 13:47 schrieb Patrik Dahlström:
> 
> 
> On 04/23/2017 01:36 PM, Wols Lists wrote:
>> On 23/04/17 12:12, Patrik Dahlström wrote:
>>> At least I now know in what order that I should assemble my original 5
>>> drives:
>>> sda, sdb, sdd, sdc, sde
>>> It would only be logical for the new drive (sdf) to be last in that list.
>>
>> And, as the raid wiki tells you, download lspci and run that. That will
>> hopefully tell us a lot about what's left of the various headers on your
>> disks.
> I didn't find that part on the wiki. Here's the output of lspci

"lspci" is surely nonsense but even "lsscsi" by looking at the 
helpoutput hardly shows "various headers on your disks", so when someone 
recommends a command and it's only useful with a defined set of 
non-default params he should mention that clearly instead point to some 
random command

> 00:00.0 Host bridge: Intel Corporation Sky Lake Host Bridge/DRAM Registers (rev 07)
> 00:14.0 USB controller: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller (rev 31)
> 00:14.2 Signal processing controller: Intel Corporation Sunrise Point-H Thermal subsystem (rev 31)
> 00:16.0 Communication controller: Intel Corporation Sunrise Point-H CSME HECI #1 (rev 31)
> 00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
> 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #5 (rev f1)
> 00:1c.5 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #6 (rev f1)
> 00:1c.6 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #7 (rev f1)
> 00:1f.0 ISA bridge: Intel Corporation Sunrise Point-H LPC Controller (rev 31)
> 00:1f.2 Memory controller: Intel Corporation Sunrise Point-H PMC (rev 31)
> 00:1f.4 SMBus: Intel Corporation Sunrise Point-H SMBus (rev 31)
> 01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
> 02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
> 03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> 04:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)



* Re: Recover array after I panicked
  2017-04-23 11:36         ` Wols Lists
  2017-04-23 11:47           ` Patrik Dahlström
@ 2017-04-23 11:58           ` Roman Mamedov
  2017-04-23 12:11             ` Wols Lists
  1 sibling, 1 reply; 63+ messages in thread
From: Roman Mamedov @ 2017-04-23 11:58 UTC (permalink / raw)
  To: Wols Lists; +Cc: Patrik Dahlström, Andreas Klauer, linux-raid

On Sun, 23 Apr 2017 12:36:24 +0100
Wols Lists <antlists@youngman.org.uk> wrote:

> And, as the raid wiki tells you, download lspci and run that

Maybe you meant lsdrv. https://github.com/pturmel/lsdrv

-- 
With respect,
Roman


* Re: Recover array after I panicked
  2017-04-23 11:58           ` Roman Mamedov
@ 2017-04-23 12:11             ` Wols Lists
  2017-04-23 12:15               ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Wols Lists @ 2017-04-23 12:11 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Patrik Dahlström, Andreas Klauer, linux-raid

On 23/04/17 12:58, Roman Mamedov wrote:
> On Sun, 23 Apr 2017 12:36:24 +0100
> Wols Lists <antlists@youngman.org.uk> wrote:
> 
>> And, as the raid wiki tells you, download lspci and run that
> 
> Maybe you meant lsdrv. https://github.com/pturmel/lsdrv
> 
Sorry, yes I did ... (too many ls_xxx commands :-)

Cheers,
Wol


* Re: Recover array after I panicked
  2017-04-23 12:11             ` Wols Lists
@ 2017-04-23 12:15               ` Patrik Dahlström
  2017-04-24 21:04                 ` Phil Turmel
  0 siblings, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 12:15 UTC (permalink / raw)
  To: Wols Lists, Roman Mamedov; +Cc: Andreas Klauer, linux-raid



On 04/23/2017 02:11 PM, Wols Lists wrote:
> On 23/04/17 12:58, Roman Mamedov wrote:
>> On Sun, 23 Apr 2017 12:36:24 +0100
>> Wols Lists <antlists@youngman.org.uk> wrote:
>>
>>> And, as the raid wiki tells you, download lspci and run that
>>
>> Maybe you meant lsdrv. https://github.com/pturmel/lsdrv
>>
> Sorry, yes I did ... (too many ls_xxx commands :-)
Ok, I had to patch lsdrv a bit to make it run. Diff:
diff --git a/lsdrv b/lsdrv
index fe6e77d..e868dbc 100755
--- a/lsdrv
+++ b/lsdrv
@@ -386,7 +386,8 @@ def probe_block(blocklink):
 				peers = " (w/ %s)" % ",".join(peers)
 			else:
 				peers = ""
-			blk.FS = "MD %s (%s/%s)%s %s" % (blk.array.md.LEVEL, blk.slave.slot, blk.array.md.raid_disks, peers, blk.slave.state)
+			if blk.array.md:
+				blk.FS = "MD %s (%s/%s)%s %s" % (blk.array.md.LEVEL, blk.slave.slot, blk.array.md.raid_disks, peers, blk.slave.state)
 		else:
 			blk.__dict__.update(extractvars(runx(['mdadm', '--export', '--examine', '/dev/block/'+blk.dev])))
 			blk.FS = "MD %s (%s) inactive" % (blk.MD_LEVEL, blk.MD_DEVICES)
@@ -402,9 +403,11 @@ def probe_block(blocklink):
 	else:
 		blk.FS = "Empty/Unknown"
 	if blk.ID_FS_LABEL:
-		blk.FS += " '%s'" % blk.ID_FS_LABEL
+		if blk.FS:
+			blk.FS += " '%s'" % blk.ID_FS_LABEL
 	if blk.ID_FS_UUID:
-		blk.FS += " {%s}" % blk.ID_FS_UUID
+		if blk.FS:
+			blk.FS += " {%s}" % blk.ID_FS_UUID
 	for part in blk.partitions:
 		probe_block(blkpath+'/'+part)
 	return blk

Here's the output of a run. Overlays are enabled:
PCI [ahci] 00:17.0 SATA controller: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] (rev 31)
├scsi 0:0:0:0 ATA      WDC WD60EFRX-68M {WD-WX91D6535N7Y}
│└sda 5.46t [8:0] None
│ └dm-2 5.46t [252:2] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
├scsi 1:0:0:0 ATA      WDC WD600PF4PZ-4 {WD-WX11D741AE8K}
│└sdb 5.64t [8:16] None
│ └dm-5 5.64t [252:5] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
├scsi 2:0:0:0 ATA      WDC WD60EFRX-68M {WD-WX11DC449Y02}
│└sdc 5.46t [8:32] None
│ └dm-1 5.46t [252:1] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
├scsi 3:0:0:0 ATA      WDC WD60EFRX-68L {WD-WX11DA53427A}
│└sdd 5.46t [8:48] None
│ └dm-3 5.46t [252:3] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
├scsi 4:0:0:0 ATA      WDC WD60EFRX-68L {WD-WXB1HB4W238J}
│└sde 5.46t [8:64] None
│ └dm-4 5.46t [252:4] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
└scsi 5:0:0:0 ATA      WDC WD60EFRX-68L {WD-WX41D75LN7CK}
 └sdf 5.46t [8:80] None
  └dm-6 5.46t [252:6] MD raid5 (6) inactive 'rack-server-1:1' {18cd5b54-707a-36df-36be-8f01e8a77122}
USB [usb-storage] Bus 001 Device 003: ID 152d:2338 JMicron Technology Corp. / JMicron USA Technology Corp. JM20337 Hi-Speed USB to SATA & PATA Combo Bridge {77C301992933}
└scsi 6:0:0:0 WDC WD20  WD-WMC301992933 {WD-WMC301992933}
 └sdg 1.82t [8:96] Partitioned (dos)
  ├sdg1 1.80t [8:97] ext4 {eb94342f-2eea-4318-9f79-3517ae1ccaad}
  │└Mounted as /dev/sdg1 @ /
  ├sdg2 1.00k [8:98] Partitioned (dos)
  └sdg5 15.93g [8:101] swap {568ea822-2f0c-42a8-a355-1a2e856728a0}
   └dm-0 15.93g [252:0] swap {fac64c73-bb78-417d-9323-a5dd381178bf}
USB [usb-storage] Bus 001 Device 006: ID 0781:5567 SanDisk Corp. Cruzer Blade {2005224340054080F2CD}
└scsi 9:0:0:0 SanDisk  Cruzer Blade     {2005224340054080F2CD}
 └sdh 3.73g [8:112] Partitioned (dos)
  └sdh1 3.73g [8:113] vfat {6E17-F675}
   └Mounted as /dev/sdh1 @ /media/cdrom
Other Block Devices
├loop0 5.86t [7:0] DM_snapshot_cow
│└dm-4 5.46t [252:4] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
├loop1 5.86t [7:1] DM_snapshot_cow
│└dm-1 5.46t [252:1] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
├loop2 5.86t [7:2] Empty/Unknown
│└dm-2 5.46t [252:2] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
├loop3 5.86t [7:3] DM_snapshot_cow
│└dm-3 5.46t [252:3] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
├loop4 5.86t [7:4] DM_snapshot_cow
│└dm-5 5.64t [252:5] MD raid5 (6) inactive 'rack-server-1:1' {510d9668-d30c-b4cd-cc76-9fcace98c3b1}
├loop5 5.86t [7:5] DM_snapshot_cow
│└dm-6 5.46t [252:6] MD raid5 (6) inactive 'rack-server-1:1' {18cd5b54-707a-36df-36be-8f01e8a77122}
├loop6 0.00k [7:6] Empty/Unknown
└loop7 0.00k [7:7] Empty/Unknown

Please note that the superblocks have probably been trashed by the Permute_array.pl runs.

> 
> Cheers,
> Wol
> 


* Re: Recover array after I panicked
  2017-04-23 10:23   ` Patrik Dahlström
  2017-04-23 10:46     ` Andreas Klauer
@ 2017-04-23 12:32     ` Patrik Dahlström
  2017-04-23 12:45       ` Andreas Klauer
  1 sibling, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 12:32 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid



On 04/23/2017 12:23 PM, Patrik Dahlström wrote:
> On 04/23/2017 12:16 PM, Andreas Klauer wrote:
>> On Sun, Apr 23, 2017 at 11:47:34AM +0200, Patrik Dahlström wrote:
>>> Is there any help you can offer?
>>
>> Is there any mdadm --examine output?
> At this point, it is incorrect. I've lost the output from the working
> raid too, unless it's located in any log in /var/log/.
> I have /etc/mdadm/mdadm.conf, but don't know if it's updated.
> 
>>
>> What was on the array? Regular filesystem, unencrypted, or LVM, LUKS, ...?
> Regular filesystem, unencrypted ext4.
> 
>>
>> If it's LUKS encrypted and you had RAID metadata at the end, yet 
>> mdadm --create'd new metadata at the start, that would likely have 
>> damaged your LUKS header beyond repair (and regular filesystems 
>> don't like it, either).
> No file system encryption.
> 
>>
>> If it's unencrypted data, as a last resort you can always go and find 
>> the header of a large file of known type... for example if you find 
>> a megapixel JPEG image and the first 512K of it are part of that then 
>> your chunksize would be 512K and then you can go looking for the 
>> next chunk on the other disks... and that should give you some notion 
>> of the RAID layout and offset.
> That's not a bad idea. Will hopefully narrow down my unknown variables.
Okay, I extracted parts of an mkv file, and this is what I found out:
* playing 512 kB of data is OK
* playing 1024 kB of data gives me the following error (from mpv):
[mkv] Invalid EBML length at position 539473
[mkv] Corrupt file detected. Trying to resync starting from position 539473...

"position 539473" is at ~527 kB, which leads me to suspect that the
correct chunk size is 512 kB.

Any thoughts?

> 
>>
>> Regards
>> Andreas Klauer
>>


* Re: Recover array after I panicked
  2017-04-23 12:32     ` Patrik Dahlström
@ 2017-04-23 12:45       ` Andreas Klauer
  2017-04-23 12:57         ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-23 12:45 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: linux-raid

On Sun, Apr 23, 2017 at 02:32:07PM +0200, Patrik Dahlström wrote:
> Any thoughts?

What's the exact size of your drives?

blockdev --getsize64 /dev/sd[abcdefg]


* Re: Recover array after I panicked
  2017-04-23 12:45       ` Andreas Klauer
@ 2017-04-23 12:57         ` Patrik Dahlström
  0 siblings, 0 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 12:57 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid



On 04/23/2017 02:45 PM, Andreas Klauer wrote:
> On Sun, Apr 23, 2017 at 02:32:07PM +0200, Patrik Dahlström wrote:
>> Any thoughts?
> 
> What's the exact size of your drives?
> 
> blockdev --getsize64 /dev/sd[abcdefg]
> 
$ blockdev --getsize64 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
6001175126016
6201213935616
6001175126016
6001175126016
6001175126016
6001175126016
2000398934016

/dev/sdg is the OS disk, running from a USB enclosure.

I have a faint memory of not using the full drives when creating the
array. Maybe I saved ~5% in case I get a replacement that is slightly
smaller than the drives I have today. I could probably check for zeros
at the end of a drive.
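
Something like this would show it (hexdump -C collapses runs of identical
lines into a single "*", so an untouched tail shows up as mostly zeros):

dev=/dev/sda
sectors=$(blockdev --getsz "$dev")           # size in 512-byte sectors
# dump the last 1 MiB of the disk
dd if="$dev" bs=512 skip=$((sectors - 2048)) count=2048 2>/dev/null | hexdump -C | tail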


* Re: Recover array after I panicked
  2017-04-23 11:12       ` Patrik Dahlström
  2017-04-23 11:36         ` Wols Lists
@ 2017-04-23 13:16         ` Andreas Klauer
  2017-04-23 13:49           ` Patrik Dahlström
  1 sibling, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-23 13:16 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: linux-raid

On Sun, Apr 23, 2017 at 01:12:54PM +0200, Patrik Dahlström wrote:
> I got some of that!

> [    3.100700] RAID conf printout:
> [    3.100700]  --- level:5 rd:5 wd:5
> [    3.100700]  disk 0, o:1, dev:sda
> [    3.100700]  disk 1, o:1, dev:sdb
> [    3.100701]  disk 2, o:1, dev:sdd
> [    3.100701]  disk 3, o:1, dev:sdc
> [    3.100701]  disk 4, o:1, dev:sde
> [    3.101006] created bitmap (44 pages) for device md1
> [    3.102245] md1: bitmap initialized from disk: read 3 pages, set 0 of
> 89423 bits
> [    3.159019] md1: detected capacity change from 0 to 24004163272704

Fairly standard, RAID5, presumably 1.2 metadata with 128M data offset, 
which is the default mdadm uses lately. Older RAIDs would have smaller 
data offsets.

So... ...the output above really is from before any of your accidents?
How old is your raid ...?

Tested with loop devices:

# truncate -s 6001175126016 0 1 2 3 4
# for f in 0 1 2 3 4; do losetup --find --show "$f"; done
# mdadm --create /dev/md42 --assume-clean --data-offset=128M --level=5 --raid-devices=5 /dev/loop[01234]

| [14580.373999] md/raid:md42: device loop4 operational as raid disk 4
| [14580.373999] md/raid:md42: device loop3 operational as raid disk 3
| [14580.374000] md/raid:md42: device loop2 operational as raid disk 2
| [14580.374000] md/raid:md42: device loop1 operational as raid disk 1
| [14580.374001] md/raid:md42: device loop0 operational as raid disk 0
| [14580.374308] md/raid:md42: raid level 5 active with 5 out of 5 devices, algorithm 2
| [14580.377043] md42: detected capacity change from 0 to 24004163272704

(Results in a capacity identical to yours, so it's the most likely match.)

Again, you'd do this with overlays only...

Regards
Andreas Klauer


* Re: Recover array after I panicked
  2017-04-23 13:16         ` Andreas Klauer
@ 2017-04-23 13:49           ` Patrik Dahlström
  2017-04-23 14:36             ` Andreas Klauer
  0 siblings, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 13:49 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid



On 04/23/2017 03:16 PM, Andreas Klauer wrote:
> On Sun, Apr 23, 2017 at 01:12:54PM +0200, Patrik Dahlström wrote:
>> I got some of that!
> 
>> [    3.100700] RAID conf printout:
>> [    3.100700]  --- level:5 rd:5 wd:5
>> [    3.100700]  disk 0, o:1, dev:sda
>> [    3.100700]  disk 1, o:1, dev:sdb
>> [    3.100701]  disk 2, o:1, dev:sdd
>> [    3.100701]  disk 3, o:1, dev:sdc
>> [    3.100701]  disk 4, o:1, dev:sde
>> [    3.101006] created bitmap (44 pages) for device md1
>> [    3.102245] md1: bitmap initialized from disk: read 3 pages, set 0 of
>> 89423 bits
>> [    3.159019] md1: detected capacity change from 0 to 24004163272704
> 
> Fairly standard, RAID5, presumably 1.2 metadata with 128M data offset, 
> which is the default mdadm uses lately. Older RAIDs would have smaller 
> data offsets.
> 
> So... ...the output above really is from before any of your accidents?
Yes, it is from before adding /dev/sdf and starting the reshape.

> How old is your raid ...?
The raid is roughly 1 year old. It started as a combination of raids:
md0: 4x2TB raid5
md1: 2x6TB + md0 raid5

A few months after that, md0 was replaced with a 6 TB drive (/dev/sdd).
Last August I added /dev/sde and this January I added /dev/sde.
Yesterday I tried to add /dev/sdf.

> 
> Tested with loop devices:
> 
> # truncate -s 6001175126016 0 1 2 3 4
> # losetup --find --show
> # mdadm --create /dev/md42 --assume-clean --data-offset=128M --level=5 --raid-devices=5 /dev/loop[01234]

> 
> | [14580.373999] md/raid:md42: device loop4 operational as raid disk 4
> | [14580.373999] md/raid:md42: device loop3 operational as raid disk 3
> | [14580.374000] md/raid:md42: device loop2 operational as raid disk 2
> | [14580.374000] md/raid:md42: device loop1 operational as raid disk 1
> | [14580.374001] md/raid:md42: device loop0 operational as raid disk 0
> | [14580.374308] md/raid:md42: raid level 5 active with 5 out of 5 devices, algorithm 2
> | [14580.377043] md42: detected capacity change from 0 to 24004163272704
> 
> (Results in identical capacity as yours so it's the most likely match.)
> 
> Again, you'd do this with overlays only...
I did
$ mdadm --create /dev/md1 --assume-clean --data-offset=128M --level=5 --raid-devices=5 /dev/mapper/sd[abdce]
$ dmesg | tail
[10079.442770] md: bind<dm-2>
[10079.442835] md: bind<dm-5>
[10079.442889] md: bind<dm-1>
[10079.442954] md: bind<dm-3>
[10079.443015] md: bind<dm-4>
[10079.443814] md/raid:md1: device dm-4 operational as raid disk 4
[10079.443815] md/raid:md1: device dm-3 operational as raid disk 3
[10079.443816] md/raid:md1: device dm-1 operational as raid disk 2
[10079.443830] md/raid:md1: device dm-5 operational as raid disk 1
[10079.443830] md/raid:md1: device dm-2 operational as raid disk 0
[10079.444123] md/raid:md1: allocated 5432kB
[10079.444168] md/raid:md1: raid level 5 active with 5 out of 5 devices, algorithm 2
[10079.444169] RAID conf printout:
[10079.444170]  --- level:5 rd:5 wd:5
[10079.444171]  disk 0, o:1, dev:dm-2
[10079.444171]  disk 1, o:1, dev:dm-5
[10079.444172]  disk 2, o:1, dev:dm-1
[10079.444173]  disk 3, o:1, dev:dm-3
[10079.444173]  disk 4, o:1, dev:dm-4
[10079.444237] created bitmap (44 pages) for device md1
[10079.446272] md1: bitmap initialized from disk: read 3 pages, set 89423 of 89423 bits
[10079.451821] md1: detected capacity change from 0 to 24004163272704
$ mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Sun Apr 23 15:40:15 2017
     Raid Level : raid5
     Array Size : 23441565696 (22355.62 GiB 24004.16 GB)
  Used Dev Size : 5860391424 (5588.90 GiB 6001.04 GB)
   Raid Devices : 5
  Total Devices : 5
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Apr 23 15:40:15 2017
          State : clean 
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : rack-server-1:1  (local to host rack-server-1)
           UUID : 6beee843:59371bd6:c9278c83:1eb89111
         Events : 0

    Number   Major   Minor   RaidDevice State
       0     252        2        0      active sync   /dev/dm-2
       1     252        5        1      active sync   /dev/dm-5
       2     252        1        2      active sync   /dev/dm-1
       3     252        3        3      active sync   /dev/dm-3
       4     252        4        4      active sync   /dev/dm-4

$ mount /dev/md1 /storage
mount: wrong fs type, bad option, bad superblock on /dev/md1,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

Still no luck. Were the drives added in the wrong order?
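
(As an aside, a candidate created on overlays can also be sanity-checked
read-only before trying to mount it, e.g.:

file -s /dev/md1              # should report ext4 filesystem data if the geometry is right
dumpe2fs -h /dev/md1 | head   # prints the superblock summary, or fails if none is found

Neither of those writes anything.)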

> 
> Regards
> Andreas Klauer
> 


* Re: Recover array after I panicked
  2017-04-23  9:47 Recover array after I panicked Patrik Dahlström
  2017-04-23 10:16 ` Andreas Klauer
@ 2017-04-23 14:06 ` Brad Campbell
  2017-04-23 14:09   ` Patrik Dahlström
  2017-04-23 14:48   ` Andreas Klauer
  1 sibling, 2 replies; 63+ messages in thread
From: Brad Campbell @ 2017-04-23 14:06 UTC (permalink / raw)
  To: Patrik Dahlström, linux-raid

On 23/04/17 17:47, Patrik Dahlström wrote:
> Hello,
>
> Here's the story:
>
> I started with a 5x6 TB raid5 array. I added another 6 TB drive and
> started to grow the array. However, one of my SATA cables were bad and
> the reshape gave me lots of I/O errors.
>
> Instead of fixing the SATA cable issue directly, I shutdown the server
> and swapped places of 2 drives. My reasoning was that putting the new
> drive in a good slot would reduce the I/O errors. Bad move, I know. I
> tried a few commands but was not able to continue the reshape.
>

Nobody seems to have mentioned the reshape issue. What sort of reshape 
were you running? How far into the reshape did it get? Do you have any 
logs of the errors (which might at least indicate whereabouts in the 
array things were before you pushed it over the edge)?


What you'll have is one part of the array in one configuration, the 
remaining part in another and no record of where that split begins.

Regards,
Brad
-- 
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.


* Re: Recover array after I panicked
  2017-04-23 14:06 ` Brad Campbell
@ 2017-04-23 14:09   ` Patrik Dahlström
  2017-04-23 14:20     ` Patrik Dahlström
  2017-04-23 14:25     ` Brad Campbell
  2017-04-23 14:48   ` Andreas Klauer
  1 sibling, 2 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 14:09 UTC (permalink / raw)
  To: Brad Campbell, linux-raid



On 04/23/2017 04:06 PM, Brad Campbell wrote:
> On 23/04/17 17:47, Patrik Dahlström wrote:
>> Hello,
>>
>> Here's the story:
>>
>> I started with a 5x6 TB raid5 array. I added another 6 TB drive and
>> started to grow the array. However, one of my SATA cables were bad and
>> the reshape gave me lots of I/O errors.
>>
>> Instead of fixing the SATA cable issue directly, I shutdown the server
>> and swapped places of 2 drives. My reasoning was that putting the new
>> drive in a good slot would reduce the I/O errors. Bad move, I know. I
>> tried a few commands but was not able to continue the reshape.
>>
> 
> Nobody seems to have mentioned the reshape issue. What sort of reshape
> were you running? How far into the reshape did it get? Do you have any
> logs of the errors (which might at least indicate whereabouts in the
> array things were before you pushed it over the edge)?
These were the grow commands I ran:
mdadm --add /dev/md1 /dev/sdf
mdadm --grow --raid-devices=6 /dev/md1

It got to roughly 15-17% before I decided that the I/O errors were
scarier than stopping the reshape.
> 
> 
> What you'll have is one part of the array in one configuration, the
> remaining part in another and no record of where that split begins.
Like I said, ~15-17 % into the reshape.
> 
> Regards,
> Brad


* Re: Recover array after I panicked
  2017-04-23 14:09   ` Patrik Dahlström
@ 2017-04-23 14:20     ` Patrik Dahlström
  2017-04-23 14:25     ` Brad Campbell
  1 sibling, 0 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 14:20 UTC (permalink / raw)
  To: Brad Campbell, linux-raid



On 04/23/2017 04:09 PM, Patrik Dahlström wrote:
> 
> 
> On 04/23/2017 04:06 PM, Brad Campbell wrote:
>> On 23/04/17 17:47, Patrik Dahlström wrote:
>>> Hello,
>>>
>>> Here's the story:
>>>
>>> I started with a 5x6 TB raid5 array. I added another 6 TB drive and
>>> started to grow the array. However, one of my SATA cables were bad and
>>> the reshape gave me lots of I/O errors.
>>>
>>> Instead of fixing the SATA cable issue directly, I shutdown the server
>>> and swapped places of 2 drives. My reasoning was that putting the new
>>> drive in a good slot would reduce the I/O errors. Bad move, I know. I
>>> tried a few commands but was not able to continue the reshape.
>>>
>>
>> Nobody seems to have mentioned the reshape issue. What sort of reshape
>> were you running? How far into the reshape did it get? Do you have any
>> logs of the errors (which might at least indicate whereabouts in the
>> array things were before you pushed it over the edge)?
> These were the grow commands I ran:
> mdadm --add /dev/md1 /dev/sdf
> mdadm --grow --raid-devices=6 /dev/md1
> 
I found the kernel log output from when I ran the command:
[ 1912.303661] md: bind<sdf>
[ 1912.355423] RAID conf printout:
[ 1912.355426]  --- level:5 rd:5 wd:5
[ 1912.355428]  disk 0, o:1, dev:sda
[ 1912.355429]  disk 1, o:1, dev:sdb
[ 1912.355430]  disk 2, o:1, dev:sdd
[ 1912.355431]  disk 3, o:1, dev:sdc
[ 1912.355432]  disk 4, o:1, dev:sde
[ 1937.287333] RAID conf printout:
[ 1937.287341]  --- level:5 rd:6 wd:6
[ 1937.287347]  disk 0, o:1, dev:sda
[ 1937.287351]  disk 1, o:1, dev:sdb
[ 1937.287355]  disk 2, o:1, dev:sdd
[ 1937.287358]  disk 3, o:1, dev:sdc
[ 1937.287361]  disk 4, o:1, dev:sde
[ 1937.287365]  disk 5, o:1, dev:sdf
[ 1937.287469] md: reshape of RAID array md1
[ 1937.287475] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1937.287478] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
[ 1937.287487] md: using 128k window, over a total of 5860391424k.
[ 1937.424014] ata6.00: exception Emask 0x10 SAct 0x20000 SErr 0x480100 action 0x6 frozen
[ 1937.424086] ata6.00: irq_stat 0x08000000, interface fatal error
[ 1937.424134] ata6: SError: { UnrecovData 10B8B Handshk }
[ 1937.424179] ata6.00: failed command: WRITE FPDMA QUEUED
[ 1937.424227] ata6.00: cmd 61/40:88:00:dc:03/01:00:00:00:00/40 tag 17 ncq 163840 out
[ 1937.424227]          res 40/00:88:00:dc:03/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 1937.424341] ata6.00: status: { DRDY }
[ 1937.424375] ata6: hard resetting link
[ 1937.743934] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1937.745491] ata6.00: configured for UDMA/133
[ 1937.745498] ata6: EH complete
[ 1937.751920] ata6.00: exception Emask 0x10 SAct 0xc00000 SErr 0x400100 action 0x6 frozen
[ 1937.751948] ata6.00: irq_stat 0x08000000, interface fatal error
[ 1937.751966] ata6: SError: { UnrecovData Handshk }
[ 1937.751982] ata6.00: failed command: WRITE FPDMA QUEUED
[ 1937.751999] ata6.00: cmd 61/b8:b0:80:e2:03/02:00:00:00:00/40 tag 22 ncq 356352 out
[ 1937.751999]          res 40/00:b8:40:dd:03/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 1937.752042] ata6.00: status: { DRDY }
[ 1937.752053] ata6.00: failed command: WRITE FPDMA QUEUED
[ 1937.752070] ata6.00: cmd 61/40:b8:40:dd:03/05:00:00:00:00/40 tag 23 ncq 688128 out
[ 1937.752070]          res 40/00:b8:40:dd:03/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[ 1937.752113] ata6.00: status: { DRDY }
[ 1937.752125] ata6: hard resetting link
[ 1938.072176] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1938.074013] ata6.00: configured for UDMA/133
[ 1938.074036] ata6: EH complete
etc.

The rest is lots and lots of I/O errors due to the bad SATA cable.

> It got to roughly 15-17 % before I decided that the I/O errors were more
> scary than stopping the reshape.
>>
>>
>> What you'll have is one part of the array in one configuration, the
>> remaining part in another and no record of where that split begins.
> Like I said, ~15-17 % into the reshape.
>>
>> Regards,
>> Brad


* Re: Recover array after I panicked
  2017-04-23 14:09   ` Patrik Dahlström
  2017-04-23 14:20     ` Patrik Dahlström
@ 2017-04-23 14:25     ` Brad Campbell
  1 sibling, 0 replies; 63+ messages in thread
From: Brad Campbell @ 2017-04-23 14:25 UTC (permalink / raw)
  To: Patrik Dahlström, linux-raid



On 23/04/17 22:09, Patrik Dahlström wrote:
>
> On 04/23/2017 04:06 PM, Brad Campbell wrote:
>> On 23/04/17 17:47, Patrik Dahlström wrote:
>>> Hello,
>>>
>>> Here's the story:
>>>
>>> I started with a 5x6 TB raid5 array. I added another 6 TB drive and
>>> started to grow the array. However, one of my SATA cables were bad and
>>> the reshape gave me lots of I/O errors.
>>>
>>> Instead of fixing the SATA cable issue directly, I shutdown the server
>>> and swapped places of 2 drives. My reasoning was that putting the new
>>> drive in a good slot would reduce the I/O errors. Bad move, I know. I
>>> tried a few commands but was not able to continue the reshape.
>>>
>> Nobody seems to have mentioned the reshape issue. What sort of reshape
>> were you running? How far into the reshape did it get? Do you have any
>> logs of the errors (which might at least indicate whereabouts in the
>> array things were before you pushed it over the edge)?
> These were the grow commands I ran:
> mdadm --add /dev/md1 /dev/sdf
> mdadm --grow --raid-devices=6 /dev/md1
>
> It got to roughly 15-17 % before I decided that the I/O errors were more
> scary than stopping the reshape.
>
You might be very lucky. If my reading of the code is correct (and my 
memory is any good), simply adding a disk to a raid5 on a recent enough 
kernel should make the resync go backwards. So it should have started at 
the end and worked towards the start. This would mean the majority of 
your data should be on your 5 original disks.

If this is the case, then permuting the array with 6 disks is always 
going to fail, as 1/5th of every stripe will be bogus. Doing it with the 
original 5 disks may ultimately yield something in the order of 85% of 
your data if your estimate of 15-17% is correct.

No harm testing it with some overlays anyway. Someone more familiar with 
the code will correct me if I'm wrong.

Regards,
Brad

-- 
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.



* Re: Recover array after I panicked
  2017-04-23 13:49           ` Patrik Dahlström
@ 2017-04-23 14:36             ` Andreas Klauer
  2017-04-23 14:45               ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-23 14:36 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: linux-raid

On Sun, Apr 23, 2017 at 03:49:16PM +0200, Patrik Dahlström wrote:
> > Again, you'd do this with overlays only...
> I did
> $ mdadm --create /dev/md1 --assume-clean --data-offset=128M --level=5 --raid-devices=5 /dev/mapper/sd[abdce]
> $ dmesg | tail

Hi,

the shell globbing pattern [abdce] actually expands alphabetically; 
you probably have to write {a,b,d,c,e} instead.

$ echo [edcba]
a b c d e
$ echo {e,d,c,b,a}
e d c b a

Use 'dmsetup ls' to make sense of the dm-XX (252:XX) numbers.

Regards
Andreas Klauer


* Re: Recover array after I panicked
  2017-04-23 14:36             ` Andreas Klauer
@ 2017-04-23 14:45               ` Patrik Dahlström
  0 siblings, 0 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 14:45 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: linux-raid



On 04/23/2017 04:36 PM, Andreas Klauer wrote:
> On Sun, Apr 23, 2017 at 03:49:16PM +0200, Patrik Dahlström wrote:
>>> Again, you'd do this with overlays only...
>> I did
>> $ mdadm --create /dev/md1 --assume-clean --data-offset=128M --level=5 --raid-devices=5 /dev/mapper/sd[abdce]
>> $ dmesg | tail
> 
> Hi,
> 
> the shell globbing style [abdce] actually expands alphabetically,
> you probably have to write {a,b,d,c,e} instead.
I spelled it out for every device, but no change in the result, I'm
afraid. It still won't mount, nor will fsck recognize it.

Doesn't the triggered reshape pose an issue here? It got ~16% through.

// Patrik


* Re: Recover array after I panicked
  2017-04-23 14:06 ` Brad Campbell
  2017-04-23 14:09   ` Patrik Dahlström
@ 2017-04-23 14:48   ` Andreas Klauer
  2017-04-23 15:11     ` Patrik Dahlström
  1 sibling, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-23 14:48 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Patrik Dahlström, linux-raid

On Sun, Apr 23, 2017 at 10:06:15PM +0800, Brad Campbell wrote:
> Nobody seems to have mentioned the reshape issue.

Good point.

If it was mid-reshape you need two sets of overlays, 
create two RAIDs (one for each configuration), and 
then find the point where it converges.

> If my reading of the code is correct (and my memory
> is any good), simply adding a disk to a raid5 on a 
> recent enough kernel should make the resync go backwards.

Doesn't it cut the offset by half and grow forwards...?

Growing by one disk should give you a segment where the data is 
identical for both the 5-disk and the 6-disk RAID-5, and that's 
where you join them using dmsetup linear.

Before:

/dev/loop0:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 4611f41b:0464e815:8b6f9cfe:b29c56fd
           Name : EIS:42  (local to host EIS)
  Creation Time : Sun Apr 23 16:44:59 2017
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 11720783024 (5588.90 GiB 6001.04 GB)
     Array Size : 23441565696 (22355.62 GiB 24004.16 GB)
  Used Dev Size : 11720782848 (5588.90 GiB 6001.04 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=176 sectors
          State : clean
    Device UUID : acd8d9fd:7b7cf9a0:f63369d1:907ffa66

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Apr 23 16:44:59 2017
  Bad Block Log : 512 entries available at offset 32 sectors
       Checksum : f89bdc5 - correct
         Events : 2

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)

After/During grow:

/dev/loop0:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x45
     Array UUID : 4611f41b:0464e815:8b6f9cfe:b29c56fd
           Name : EIS:42  (local to host EIS)
  Creation Time : Sun Apr 23 16:44:59 2017
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 11720783024 (5588.90 GiB 6001.04 GB)
     Array Size : 29301957120 (27944.52 GiB 30005.20 GB)
  Used Dev Size : 11720782848 (5588.90 GiB 6001.04 GB)
    Data Offset : 262144 sectors
|     New Offset : 257024 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : acd8d9fd:7b7cf9a0:f63369d1:907ffa66

Internal Bitmap : 8 sectors from superblock
|  Reshape pos'n : 1472000 (1437.50 MiB 1507.33 MB)
|  Delta Devices : 1 (5->6)

    Update Time : Sun Apr 23 16:45:38 2017
  Bad Block Log : 512 entries available at offset 32 sectors
       Checksum : fbd9a55 - correct
         Events : 30

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

Basically you have to know the New Offset 
(search first 128M of your drives for filesystem headers, that should be it)
and then guess the Reshape pos'n by comparing raw data at offset X 
(find non-zero data at identical offsets for both raid sets)
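
A crude way to do the first search (assuming GNU grep, and assuming the
data offset is at least 4K-aligned; 0x438 = 1080 is where the ext4 magic
bytes 53 ef sit relative to the start of the filesystem):

dd if=/dev/sda bs=1M count=128 2>/dev/null \
  | grep -obUaP '\x53\xef' \
  | awk -F: '($1 % 4096) == 1080 { print $1 - 1080 }'   # candidate data offsets in bytes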

Regards
Andreas Klauer


* Re: Recover array after I panicked
  2017-04-23 14:48   ` Andreas Klauer
@ 2017-04-23 15:11     ` Patrik Dahlström
  2017-04-23 15:24       ` Patrik Dahlström
  2017-04-23 15:42       ` Andreas Klauer
  0 siblings, 2 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 15:11 UTC (permalink / raw)
  To: Andreas Klauer, Brad Campbell; +Cc: linux-raid



On 04/23/2017 04:48 PM, Andreas Klauer wrote:
> On Sun, Apr 23, 2017 at 10:06:15PM +0800, Brad Campbell wrote:
>> Nobody seems to have mentioned the reshape issue.
> 
> Good point.
> 
> If it was mid-reshape you need two sets of overlays, 
> create two RAIDs (one for each configuration), and 
> then find the point where it converges.
> 
>> If my reading of the code is correct (and my memory
>> is any good), simply adding a disk to a raid5 on a 
>> recent enough kernel should make the resync go backwards.
> 
> Doesn't it cut the offset by half and grow forwards...?
> 
> With growing a disk that should give you a segment where 
> data is identical for both 5-disk and 6-disk RAID-5. 
> And that's where you join them using dmsetup linear.
> 
> Before:
> 
> /dev/loop0:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : 4611f41b:0464e815:8b6f9cfe:b29c56fd
>            Name : EIS:42  (local to host EIS)
>   Creation Time : Sun Apr 23 16:44:59 2017
>      Raid Level : raid5
>    Raid Devices : 5
> 
>  Avail Dev Size : 11720783024 (5588.90 GiB 6001.04 GB)
>      Array Size : 23441565696 (22355.62 GiB 24004.16 GB)
>   Used Dev Size : 11720782848 (5588.90 GiB 6001.04 GB)
>     Data Offset : 262144 sectors
>    Super Offset : 8 sectors
>    Unused Space : before=262064 sectors, after=176 sectors
>           State : clean
>     Device UUID : acd8d9fd:7b7cf9a0:f63369d1:907ffa66
> 
> Internal Bitmap : 8 sectors from superblock
>     Update Time : Sun Apr 23 16:44:59 2017
>   Bad Block Log : 512 entries available at offset 32 sectors
>        Checksum : f89bdc5 - correct
>          Events : 2
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 0
>    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
> 
> After/During grow:
> 
> /dev/loop0:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x45
>      Array UUID : 4611f41b:0464e815:8b6f9cfe:b29c56fd
>            Name : EIS:42  (local to host EIS)
>   Creation Time : Sun Apr 23 16:44:59 2017
>      Raid Level : raid5
>    Raid Devices : 6
> 
>  Avail Dev Size : 11720783024 (5588.90 GiB 6001.04 GB)
>      Array Size : 29301957120 (27944.52 GiB 30005.20 GB)
>   Used Dev Size : 11720782848 (5588.90 GiB 6001.04 GB)
>     Data Offset : 262144 sectors
> |     New Offset : 257024 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : acd8d9fd:7b7cf9a0:f63369d1:907ffa66
> 
> Internal Bitmap : 8 sectors from superblock
> |  Reshape pos'n : 1472000 (1437.50 MiB 1507.33 MB)
> |  Delta Devices : 1 (5->6)
> 
>     Update Time : Sun Apr 23 16:45:38 2017
>   Bad Block Log : 512 entries available at offset 32 sectors
>        Checksum : fbd9a55 - correct
>          Events : 30
> 
>          Layout : left-symmetric
>      Chunk Size : 512K
> 
>    Device Role : Active device 0
>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
> 
> Basically you have to know the New Offset 
> (search first 128M of your drives for filesystem headers, that should be it)
Let's see if I understand you correctly:

* I try to find 0x53EF (ext4 magic) within the first 128M of
/dev/sd[abcde]. Not after? This will be an indication of my "New
Offset". I need to adjust the offset a bit since the ext4 magic is
located at 0x438 offset.

> and then guess the Reshape pos'n by comparing raw data at offset X 
> (find non-zero data at identical offsets for both raid sets)

* I create a 5 and a 6 drive raid set and try to find an offset where
they both carry the same raw data. With some overlays, I should be able
to create both these raids at the same time, correct?

> 
> Regards
> Andreas Klauer
> 


* Re: Recover array after I panicked
  2017-04-23 15:11     ` Patrik Dahlström
@ 2017-04-23 15:24       ` Patrik Dahlström
  2017-04-23 15:42       ` Andreas Klauer
  1 sibling, 0 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 15:24 UTC (permalink / raw)
  To: Andreas Klauer, Brad Campbell; +Cc: linux-raid



On 04/23/2017 05:11 PM, Patrik Dahlström wrote:
> 
> 
> On 04/23/2017 04:48 PM, Andreas Klauer wrote:
>> On Sun, Apr 23, 2017 at 10:06:15PM +0800, Brad Campbell wrote:
>>> Nobody seems to have mentioned the reshape issue.
>>
>> Good point.
>>
>> If it was mid-reshape you need two sets of overlays, 
>> create two RAIDs (one for each configuration), and 
>> then find the point where it converges.
>>
>>> If my reading of the code is correct (and my memory
>>> is any good), simply adding a disk to a raid5 on a 
>>> recent enough kernel should make the resync go backwards.
>>
>> Doesn't it cut the offset by half and grow forwards...?
>>
>> With growing a disk that should give you a segment where 
>> data is identical for both 5-disk and 6-disk RAID-5. 
>> And that's where you join them using dmsetup linear.
>>
>> Before:
>>
>> /dev/loop0:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x1
>>      Array UUID : 4611f41b:0464e815:8b6f9cfe:b29c56fd
>>            Name : EIS:42  (local to host EIS)
>>   Creation Time : Sun Apr 23 16:44:59 2017
>>      Raid Level : raid5
>>    Raid Devices : 5
>>
>>  Avail Dev Size : 11720783024 (5588.90 GiB 6001.04 GB)
>>      Array Size : 23441565696 (22355.62 GiB 24004.16 GB)
>>   Used Dev Size : 11720782848 (5588.90 GiB 6001.04 GB)
>>     Data Offset : 262144 sectors
>>    Super Offset : 8 sectors
>>    Unused Space : before=262064 sectors, after=176 sectors
>>           State : clean
>>     Device UUID : acd8d9fd:7b7cf9a0:f63369d1:907ffa66
>>
>> Internal Bitmap : 8 sectors from superblock
>>     Update Time : Sun Apr 23 16:44:59 2017
>>   Bad Block Log : 512 entries available at offset 32 sectors
>>        Checksum : f89bdc5 - correct
>>          Events : 2
>>
>>          Layout : left-symmetric
>>      Chunk Size : 512K
>>
>>    Device Role : Active device 0
>>    Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
>>
>> After/During grow:
>>
>> /dev/loop0:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x45
>>      Array UUID : 4611f41b:0464e815:8b6f9cfe:b29c56fd
>>            Name : EIS:42  (local to host EIS)
>>   Creation Time : Sun Apr 23 16:44:59 2017
>>      Raid Level : raid5
>>    Raid Devices : 6
>>
>>  Avail Dev Size : 11720783024 (5588.90 GiB 6001.04 GB)
>>      Array Size : 29301957120 (27944.52 GiB 30005.20 GB)
>>   Used Dev Size : 11720782848 (5588.90 GiB 6001.04 GB)
>>     Data Offset : 262144 sectors
>> |     New Offset : 257024 sectors
>>    Super Offset : 8 sectors
>>           State : clean
>>     Device UUID : acd8d9fd:7b7cf9a0:f63369d1:907ffa66
>>
>> Internal Bitmap : 8 sectors from superblock
>> |  Reshape pos'n : 1472000 (1437.50 MiB 1507.33 MB)
>> |  Delta Devices : 1 (5->6)
>>
>>     Update Time : Sun Apr 23 16:45:38 2017
>>   Bad Block Log : 512 entries available at offset 32 sectors
>>        Checksum : fbd9a55 - correct
>>          Events : 30
>>
>>          Layout : left-symmetric
>>      Chunk Size : 512K
>>
>>    Device Role : Active device 0
>>    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
>>
>> Basically you have to know the New Offset 
>> (search first 128M of your drives for filesystem headers, that should be it)
> Let's see if I understand you correctly:
> 
> * I try to find 0x53EF (ext4 magic) within the first 128M of
> /dev/sd[abcde]. Not after? This will be an indication of my "New
> Offset". I need to adjust the offset a bit since the ext4 magic is
> located at 0x438 offset.
> 
Okay, I located what appears to be an ext4 file system header at
0x7B80000 in both /dev/sda and /dev/sdf. I used this command:
dd if=/dev/sda bs=524288 count=256 | ./ext2scan

where ext2scan comes from https://goo.gl/2TnZSR

>> and then guess the Reshape pos'n by comparing raw data at offset X 
>> (find non-zero data at identical offsets for both raid sets)
> 
> * I create a 5 and a 6 drive raid set and try to find an offset where
> they both carry the same raw data. With some overlays, I should be able
> to create both these raids at the same time, correct?
I'm still working on this one. Should I start looking at ~15% of the raid?

What is the next step after this?

> 
>>
>> Regards
>> Andreas Klauer
>>


* Re: Recover array after I panicked
  2017-04-23 15:11     ` Patrik Dahlström
  2017-04-23 15:24       ` Patrik Dahlström
@ 2017-04-23 15:42       ` Andreas Klauer
  2017-04-23 16:29         ` Patrik Dahlström
  2017-04-23 19:21         ` Patrik Dahlström
  1 sibling, 2 replies; 63+ messages in thread
From: Andreas Klauer @ 2017-04-23 15:42 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Sun, Apr 23, 2017 at 05:11:08PM +0200, Patrik Dahlström wrote:
> * I try to find 0x53EF (ext4 magic) within the first 128M of
> /dev/sd[abcde]. Not after? This will be an indication of my "New
> Offset". I need to adjust the offset a bit since the ext4 magic is
> located at 0x438 offset.

Yes, that should be about it.
 
> * I create a 5 and a 6 drive raid set and try to find an offset where
> they both carry the same raw data. With some overlays, I should be able
> to create both these raids at the same time, correct?

Yes, two sets of overlays.
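
For the overlays themselves, something along these lines per disk should do
(just a sketch using a sparse file and the dm snapshot target; sizes and
names are placeholders):

truncate -s 50G /tmp/overlay-sda.img
cow=$(losetup --find --show /tmp/overlay-sda.img)
size=$(blockdev --getsz /dev/sda)
# all writes go to the sparse file, the real disk stays untouched
dmsetup create overlay-sda --table "0 $size snapshot /dev/sda $cow P 8"

Repeat for each disk and then mdadm --create on /dev/mapper/overlay-*
instead of the real devices.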

So overlay A is your 5 disk raid5, overlay B is your 6 disk raid5, 
and then you'll just have to take a stab at it with hexdump.

So kind of like,

hexdump --skip $((17*1024*1024*1024)) --length 4096 /dev/md42
hexdump --skip $((17*1024*1024*1024)) --length 4096 /dev/md43

If that produces the same random-looking data then your reshape 
might have progressed 17G-ish and you could try to use that as 
a starting point for a linear device mapping that uses the 
first 17G of the 6 disk raid and everything else from the 5disk.
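
The merge itself would then just be a two-segment dm linear table, roughly
like this (a sketch only, with 17 GiB standing in for whatever merge point
you actually find; dm tables count in 512-byte sectors):

merge=$((17 * 1024 * 1024 * 1024 / 512))
total=$(blockdev --getsz /dev/md42)     # size of the 5 disk array in sectors
dmsetup create merged <<EOF
0 $merge linear /dev/md43 0
$merge $((total - merge)) linear /dev/md42 $merge
EOF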

The further the grow processed the larger a zone of overlap there 
should be since more data ends up on the additional drive so 
the original representation of the same data isn't yet written 
into on the old drives.

Does that make sense?

5 disks: a b c d e   : f g h i j   : k l m n o   : p q r s t : ...
6 disks: a b c d e f : g h i j k l : m n o p q r : s t       : ...

If it stopped at "t", both have "r s t"... and that "r s t" 
would be what you have to find. You have to be wary of false 
matches though (such as zeroes or other common patterns of data 
you might find anywhere).

And there's one more thing, you mentioned a disk had a bad sata cable.
If that disk got kicked and reshape went on for a while afterwards, 
you should take that disk out of consideration. (Specify as "missing" 
when creating the arrays.) It will have outdated/unreshaped data on 
it that would be hard to incorporate into your recovery attempt...
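
For example (purely illustrative: the second slot is the one left out here,
overlay devices stand in for the real disks, the order has to be whatever
your real device order was, plus whatever --data-offset you determine):

mdadm --create /dev/md43 --assume-clean --level=5 --chunk=512 \
      --raid-devices=6 /dev/mapper/overlay-sda missing \
      /dev/mapper/overlay-sdd /dev/mapper/overlay-sdc \
      /dev/mapper/overlay-sde /dev/mapper/overlay-sdf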

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-23 15:42       ` Andreas Klauer
@ 2017-04-23 16:29         ` Patrik Dahlström
  2017-04-23 19:21         ` Patrik Dahlström
  1 sibling, 0 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 16:29 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid



On 04/23/2017 05:42 PM, Andreas Klauer wrote:
> On Sun, Apr 23, 2017 at 05:11:08PM +0200, Patrik Dahlström wrote:
>> * I try to find 0x53EF (ext4 magic) within the first 128M of
>> /dev/sd[abcde]. Not after? This will be an indication of my "New
>> Offset". I need to adjust the offset a bit since the ext4 magic is
>> located at 0x438 offset.
> 
> Yes, that should be about it.
>  
>> * I create a 5 and a 6 drive raid set and try to find an offset where
>> they both carry the same raw data. With some overlays, I should be able
>> to create both these raids at the same time, correct?
> 
> Yes, two sets of overlays.
> 
> So overlay A is your 5 disk raid5, overlay B is your 6 disk raid5, 
> and then you'll just have to take a stab at it with hexdump.
> 
> So kind of like,
> 
> hexdump --skip $((17*1024*1024*1024)) --length 4096 /dev/md42
> hexdump --skip $((17*1024*1024*1024)) --length 4096 /dev/md43
> 
> If that produces the same random-looking data then your reshape 
> might have progressed 17G-ish and you could try to use that as 
> a starting point for a linear device mapping that uses the 
> first 17G of the 6 disk raid and everything else from the 5disk.
> 
> The further the grow processed the larger a zone of overlap there 
> should be since more data ends up on the additional drive so 
> the original representation of the same data isn't yet written 
> into on the old drives.
> 
> Does that make sense?
> 
> 5 disks: a b c d e   : f g h i j   : k l m n o   : p q r s t : ...
> 6 disks: a b c d e f : g h i j k l : m n o p q r : s t       : ...
> 
> If it stopped at "t", both have "r s t"... and that "r s t" 
> would be what you have to find. You have to be wary of false 
> matches though (such as zeroes or other common patterns of data 
> you might find anywhere).
> 
> And there's one more thing, you mentioned a disk had a bad sata cable.
> If that disk got kicked and reshape went on for a while afterwards, 
> you should take that disk out of consideration. (Specify as "missing" 
> when creating the arrays.) It will have outdated/unreshaped data on 
> it that would be hard to incorporate into your recovery attempt...
The bad SATA cable has been there for a while and reported spurious I/O
errors on the old 5 disk raid. It really started to produce I/O errors
when I placed the new drive in that slot.

Is it OK to specify the new drive as missing? No special information
that should be located on that drive?

I wrote a small program to help me locate the correct offset:
https://gist.github.com/Risca/3eda5e7aba3dc6b72d61e79eaf7cc147

Am I on the right track with that program?

> 
> Regards
> Andreas Klauer
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-23 15:42       ` Andreas Klauer
  2017-04-23 16:29         ` Patrik Dahlström
@ 2017-04-23 19:21         ` Patrik Dahlström
  2017-04-24  2:09           ` Brad Campbell
  1 sibling, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-23 19:21 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid


On 04/23/2017 05:42 PM, Andreas Klauer wrote:
> On Sun, Apr 23, 2017 at 05:11:08PM +0200, Patrik Dahlström wrote:
>> * I create a 5 and a 6 drive raid set and try to find an offset where
>> they both carry the same raw data. With some overlays, I should be able
>> to create both these raids at the same time, correct?
> 
> Yes, two sets of overlays.
> 
> So overlay A is your 5 disk raid5, overlay B is your 6 disk raid5, 
> and then you'll just have to take a stab at it with hexdump.
> 
> So kind of like,
> 
> hexdump --skip $((17*1024*1024*1024)) --length 4096 /dev/md42
> hexdump --skip $((17*1024*1024*1024)) --length 4096 /dev/md43
> 
> If that produces the same random-looking data then your reshape 
> might have progressed 17G-ish and you could try to use that as 
> a starting point for a linear device mapping that uses the 
> first 17G of the 6 disk raid and everything else from the 5disk.
> 
> The further the grow processed the larger a zone of overlap there 
> should be since more data ends up on the additional drive so 
> the original representation of the same data isn't yet written 
> into on the old drives.
> 
> Does that make sense?
> 
> 5 disks: a b c d e   : f g h i j   : k l m n o   : p q r s t : ...
> 6 disks: a b c d e f : g h i j k l : m n o p q r : s t       : ...
> 
> If it stopped at "t", both have "r s t"... and that "r s t" 
> would be what you have to find. You have to be wary of false 
> matches though (such as zeroes or other common patterns of data 
> you might find anywhere).
I just thought of something: since /dev/sdf is a completely new disk,
shouldn't I be able to locate the starting point by looking at how much
of the new disk is filled? It comes filled with zeros from the factory, right?

Best regards
// Patrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-23 19:21         ` Patrik Dahlström
@ 2017-04-24  2:09           ` Brad Campbell
  2017-04-24  7:34             ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Brad Campbell @ 2017-04-24  2:09 UTC (permalink / raw)
  To: Patrik Dahlström, Andreas Klauer; +Cc: linux-raid

On 24/04/17 03:21, Patrik Dahlström wrote:

> I just thought of something: since /dev/sdf is a completely new disk,
> shouldn't I be able to locate the starting point by looking at how much
> of the new disk is filled. It comes filled with zeros from factory, right?

Pretty close. You should also be able to see how the reshape worked 
(front to back, back to front) and you should get a pretty good idea of 
the data-offset by the gap between the superblock and the data.
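
A crude binary search over 1 MiB blocks would find that boundary; a sketch
only (it assumes everything past the last reshaped chunk is still zero, which
a run of zero blocks inside the real data would of course throw off):

lo=0
hi=$(( $(blockdev --getsize64 /dev/sdf) / 1048576 ))   # size in 1 MiB blocks
while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    # read one 1 MiB block and compare it against zeros
    if dd if=/dev/sdf bs=1M count=1 skip=$mid 2>/dev/null | cmp -s -n 1048576 - /dev/zero
    then hi=$mid     # all zeros: boundary is at or before this block
    else lo=$mid     # data here: boundary is after this block
    fi
done
echo "non-zero data seems to end around byte $(( hi * 1048576 ))"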

I must say I'm having trouble imagining a more difficult recovery 
scenario, so at least you present an interesting challenge.

There's a lot to be said for the old "Don't panic" mantra, but hindsight 
is always 20/20.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24  2:09           ` Brad Campbell
@ 2017-04-24  7:34             ` Patrik Dahlström
  2017-04-24 11:04               ` Andreas Klauer
  0 siblings, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-24  7:34 UTC (permalink / raw)
  To: Brad Campbell; +Cc: Andreas Klauer, linux-raid

2017-04-24 4:09 GMT+02:00, Brad Campbell <lists2009@fnarfbargle.com>:
> On 24/04/17 03:21, Patrik Dahlström wrote:
>
>> I just thought of something: since /dev/sdf is a completely new disk,
>> shouldn't I be able to locate the starting point by looking at how much
>> of the new disk is filled. It comes filled with zeros from factory,
>> right?
>
> Pretty close. You should also be able to see how the reshape worked
> (front to back, back to front) and you should get a pretty good idea of
> the data-offset by the gap between the superblock and the data.
I've let a program compare both raid sets (5 and 6 disk) overnight. So
far it has gone from 128 MB to 14 TB without finding common data. Does
that tell us anything?
Should I pause the comparison and look at the end of sdf to find the
last written offset? I currently only have one shell in Ubuntu
maintenance/rescue mode.
>
> I must say I'm having trouble imagining a more difficult recovery
> scenario, so at least you present an interesting challenge.
I always like a challenge, but this one I believe I could've skipped :)
>
> There's a lot to be said for the old "Don't panic" mantra, but hindsight
> is always 20/20.
All too true

Best regards
// Patrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24  7:34             ` Patrik Dahlström
@ 2017-04-24 11:04               ` Andreas Klauer
  2017-04-24 12:13                 ` Patrik Dahlström
  2017-04-25 23:01                 ` Patrik Dahlström
  0 siblings, 2 replies; 63+ messages in thread
From: Andreas Klauer @ 2017-04-24 11:04 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Mon, Apr 24, 2017 at 09:34:04AM +0200, Patrik Dahlström wrote:
> I've let a program compare both raid sets (5 and 6 disk) overnight. So
> far it has gone from 128 MB to 14 TB without finding common data. Does
> that tell us anything?

Are both RAID sets created correctly?

On the 6 disk one, `file -s /dev/mdX` should say ext filesystem.

If that's not there it's certainly incorrect. (The reverse isn't true though.)

I experimented a little:

# truncate -s 100M a b c d e f
# for f in ?; do losetup --find --show "$f"; done
# mdadm --create /dev/md42 --level=5 --raid-devices=5 /dev/loop{0,1,2,3,4}
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md42 started.
# i=0; while printf "%015x\n" $i; do let i+=16; done > /dev/md42
# hexdump -C -n 64 -s 808080 /dev/md42
000c5490  30 30 30 30 30 30 30 30  30 30 63 35 34 39 30 0a  |0000000000c5490.|
000c54a0  30 30 30 30 30 30 30 30  30 30 63 35 34 61 30 0a  |0000000000c54a0.|
000c54b0  30 30 30 30 30 30 30 30  30 30 63 35 34 62 30 0a  |0000000000c54b0.|
000c54c0  30 30 30 30 30 30 30 30  30 30 63 35 34 63 30 0a  |0000000000c54c0.|
000c54d0

So in this sample array the data itself represents the offset it should be at.
This is just so we can verify later.

Now grow.

# echo 1 > /sys/block/md42/md/sync_speed_min
# echo 256 > /sys/block/md42/md/sync_speed_max
# mdadm --grow /dev/md42 --raid-devices=6 --add /dev/loop5
mdadm: added /dev/loop5
mdadm: Need to backup 10240K of critical section..
# watch grep -A3 md42 /proc/mdstat
... wait for it to reach around 50% or whatever ...
# mdadm --stop /dev/md42
mdadm: stopped /dev/md42
# mdadm --examine /dev/loop1
[...]
  Reshape pos'n : 296960 (290.00 MiB 304.09 MB)
  Delta Devices : 1 (5->6)
[...]

Now create two RAID sets:

# losetup -D
# for f in ? ; do cp "$f" "$f".a ; done;
# for f in ? ; do cp "$f" "$f".b ; done;
# for a in *.a ; do losetup --find --show "$a" ; done
# for b in *.b ; do losetup --find --show "$b" ; done
# mdadm --create /dev/md42 --assume-clean --level=5 --raid-devices=5 /dev/loop{0,1,2,3,4}
# mdadm --create /dev/md43 --assume-clean --level=5 --raid-devices=6 /dev/loop{5,6,7,8,9,10}

# cat /proc/mdstat 
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md42 : active raid5 loop4[4] loop3[3] loop2[2] loop1[1] loop0[0]
      405504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
      
md43 : active raid5 loop10[5] loop9[4] loop8[3] loop7[2] loop6[1] loop5[0]
      506880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]

And compare:

# hexdump -C -n 64 /dev/md42
00000000  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 0a  |000000000000000.|
00000010  30 30 30 30 30 30 30 30  30 30 30 30 30 31 30 0a  |000000000000010.|
00000020  30 30 30 30 30 30 30 30  30 30 30 30 30 32 30 0a  |000000000000020.|
00000030  30 30 30 30 30 30 30 30  30 30 30 30 30 33 30 0a  |000000000000030.|
00000040
# hexdump -C -n 64 /dev/md43
00000000  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 0a  |000000000000000.|
00000010  30 30 30 30 30 30 30 30  30 30 30 30 30 31 30 0a  |000000000000010.|
00000020  30 30 30 30 30 30 30 30  30 30 30 30 30 32 30 0a  |000000000000020.|
00000030  30 30 30 30 30 30 30 30  30 30 30 30 30 33 30 0a  |000000000000030.|
00000040

This is identical because in this example, the offset didn't change.

# hexdump -C -n 64 -s 80808080 /dev/md42
04d10890  30 30 30 30 30 30 30 30  35 66 31 30 38 39 30 0a  |000000005f10890.|
04d108a0  30 30 30 30 30 30 30 30  35 66 31 30 38 61 30 0a  |000000005f108a0.|
04d108b0  30 30 30 30 30 30 30 30  35 66 31 30 38 62 30 0a  |000000005f108b0.|
04d108c0  30 30 30 30 30 30 30 30  35 66 31 30 38 63 30 0a  |000000005f108c0.|
04d108d0
# hexdump -C -n 64 -s 80808080 /dev/md43
04d10890  30 30 30 30 30 30 30 30  34 64 31 30 38 39 30 0a  |000000004d10890.|
04d108a0  30 30 30 30 30 30 30 30  34 64 31 30 38 61 30 0a  |000000004d108a0.|
04d108b0  30 30 30 30 30 30 30 30  34 64 31 30 38 62 30 0a  |000000004d108b0.|
04d108c0  30 30 30 30 30 30 30 30  34 64 31 30 38 63 30 0a  |000000004d108c0.|
04d108d0

For this offset, md42 was wrong, md43 is correct.

# hexdump -C -n 64 -s 300808080 /dev/md42
11edf790  30 30 30 30 30 30 30 31  31 65 64 66 37 39 30 0a  |000000011edf790.|
11edf7a0  30 30 30 30 30 30 30 31  31 65 64 66 37 61 30 0a  |000000011edf7a0.|
11edf7b0  30 30 30 30 30 30 30 31  31 65 64 66 37 62 30 0a  |000000011edf7b0.|
11edf7c0  30 30 30 30 30 30 30 31  31 65 64 66 37 63 30 0a  |000000011edf7c0.|
11edf7d0
# hexdump -C -n 64 -s 300808080 /dev/md43
11edf790  30 30 30 30 30 30 30 31  31 65 64 66 37 39 30 0a  |000000011edf790.|
11edf7a0  30 30 30 30 30 30 30 31  31 65 64 66 37 61 30 0a  |000000011edf7a0.|
11edf7b0  30 30 30 30 30 30 30 31  31 65 64 66 37 62 30 0a  |000000011edf7b0.|
11edf7c0  30 30 30 30 30 30 30 31  31 65 64 66 37 63 30 0a  |000000011edf7c0.|
11edf7d0

For this offset, md42 and md43 overlapped. Grow progressed that far yet 
without writing into the original data of the 5disk raid5. This could be 
a suitable merge point for a linear device mapping.

# hexdump -C -n 64 -s 400008080 /dev/md42
17d7a390  30 30 30 30 30 30 30 31  37 64 37 61 33 39 30 0a  |000000017d7a390.|
17d7a3a0  30 30 30 30 30 30 30 31  37 64 37 61 33 61 30 0a  |000000017d7a3a0.|
17d7a3b0  30 30 30 30 30 30 30 31  37 64 37 61 33 62 30 0a  |000000017d7a3b0.|
17d7a3c0  30 30 30 30 30 30 30 31  37 64 37 61 33 63 30 0a  |000000017d7a3c0.|
17d7a3d0
# hexdump -C -n 64 -s 400008080 /dev/md43
17d7a390  30 30 30 30 30 30 30 31  33 31 37 61 33 39 30 0a  |00000001317a390.|
17d7a3a0  30 30 30 30 30 30 30 31  33 31 37 61 33 61 30 0a  |00000001317a3a0.|
17d7a3b0  30 30 30 30 30 30 30 31  33 31 37 61 33 62 30 0a  |00000001317a3b0.|
17d7a3c0  30 30 30 30 30 30 30 31  33 31 37 61 33 63 30 0a  |00000001317a3c0.|
17d7a3d0

For this offset, md42 is correct and md43 is wrong.
Grow did not progress that far.

That's the general outline of the idea. 
The problem in your case is of course, your data is not that easy to verify.

( You can't even easily verify your disk order, offsets, et cetera.
  These are things you have to figure out by yourself,
  not sure how else to help you. Best of luck. )

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 11:04               ` Andreas Klauer
@ 2017-04-24 12:13                 ` Patrik Dahlström
  2017-04-24 12:37                   ` Andreas Klauer
  2017-04-25 23:01                 ` Patrik Dahlström
  1 sibling, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-24 12:13 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid

2017-04-24 13:04 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
> On Mon, Apr 24, 2017 at 09:34:04AM +0200, Patrik Dahlström wrote:
>> I've let a program compare both raid sets (5 and 6 disk) overnight. So
>> far it has gone from 128 MB to 14 TB without finding common data. Does
>> that tell us anything?
>
> Are both RAID sets created correctly?
>
> On the 6 disk one, `file -s /dev/mdX` should say ext filesystem.
>
> If that's not there it's certainly incorrect. (The reverse isn't true
> though.)
I'm afraid it doesn't say that. I can get the exact command I used
when I get home. I do know that both raids contain only zeros for
many MB before any data appears.

> That's the general outline of the idea.
> The problem in your case is of course, your data is not that easy to
> verify.
My raid contains many large files (8-12 GB each). If I can get
reference data, I should be able to locate where on the disks the file
is split up. Would that help? I imagine file system fragmentation
could become an issue.
>
> ( You can't even easily verify your disk order, offsets, et cetera.
>   These are things you have to figure out by yourself,
>   not sure how else to help you. Best of luck. )
From the old kernel log, we know that the disk order was /dev/sd[abdcef],
given that the drives were always discovered in that order. Could the
offsets be verified with data from reference files as discussed above?
>
> Regards
> Andreas Klauer
>
Best regards
Patrik Dahlström

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 12:13                 ` Patrik Dahlström
@ 2017-04-24 12:37                   ` Andreas Klauer
  2017-04-24 12:54                     ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-24 12:37 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Mon, Apr 24, 2017 at 02:13:24PM +0200, Patrik Dahlström wrote:
> I'm afraid it doesn't say that.

You said you found an ext header in the raw data.

If that exists then only thing I can think of is that you ended 
up picking the wrong offset (or disk order) after all.

> I do know that both raids contains only zeros for
> many MB before any data appears.

This could be normal for the 5disk array since that part already 
reshaped and the offset changed and there just could happen to 
be zeroes somewhere in the beginning of a filesystem after the 
first block of metadata.

Basically the 5disk array is supposed to have bogus data at 
the start in your case. But it should turn into valid data 
at whatever point the reshape did not yet reach.

This bogus data makes it hard to determine the correct offset 
but according to the output you showed before the offset should 
be 128M here.

For the 6disk array you should see valid data (starting with 
the filesystem header) for however far the reshape was already done.
Depending on how the filesystem works you might even be able to 
mount it but everything that is located behind the progress point 
would appear corrupted.
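
(If you try mounting, do it read-only and on the overlay, something like
the following; a sketch, the mount point is arbitrary, and noload keeps
ext4 from replaying the journal:

mount -o ro,noload /dev/mdX /mnt/recovery
)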

Again if one of the disks actually was kicked from the array 
while the grow was going on, you should leave that disk out as missing 
as otherwise it will just appear as wrong data in both arrays.

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 12:37                   ` Andreas Klauer
@ 2017-04-24 12:54                     ` Patrik Dahlström
  2017-04-24 13:39                       ` Andreas Klauer
  0 siblings, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-24 12:54 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid

2017-04-24 14:37 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
> On Mon, Apr 24, 2017 at 02:13:24PM +0200, Patrik Dahlström wrote:
>> I'm afraid it doesn't say that.
>
> You said you found an ext header in the raw data.
Yes, I found ext headers in both /dev/sda and /dev/sdf, but it doesn't
show up in the 6 disk raid (/dev/md1).
>
> If that exists then only thing I can think of is that you ended
> up picking the wrong offset (or disk order) after all.
What offset are you referring to here? The --data-offset to the mdadm
--create command?

>
>> I do know that both raids contains only zeros for
>> many MB before any data appears.
>
> This could be normal for the 5disk array since that part already
> reshaped and the offset changed and there just could happen to
> be zeroes somewhere in the beginning of a filesystem after the
> first block of metadata.
Makes sense
>
> Basically the 5disk array is supposed to have bogus data at
> the start in your case. But it should turn into valid data
> at whatever point the reshape did not yet reach.
Should I then be able to find a copy of the ext4 superblock in the 5 disk
array once valid data starts to appear?
>
> This bogus data makes it hard to determine the correct offset
> but according to the output you showed before the offset should
> be 128M here.
I found out that there existed an ext4 file system at offset 0x7B80000
(123.5 MB) on both /dev/sda and /dev/sdb. I will adjust my mdadm
--create commands to this offset when I get home and try again.
>
> For the 6disk array you should see valid data (starting with
> the filesystem header) for however far the reshape was already done.
> Depending on how the filesystem works you might even be able to
> mount it but everything that is located behind the progress point
> would appear corrupted.
Interesting. Like I said above, I will retry the create commands with
123.5 MB --data-offset.
>
> Again if one of the disks actually was kicked from the array
> while the grow was going on, you should leave that disk out as missing
> as otherwise it will just appear as wrong data in both arrays.
I don't think the disk was ever kicked out. The kernel reset the link
and continued, I believe.
>
> Regards
> Andreas Klauer
>
Best regards
Patrik Dahlström

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 12:54                     ` Patrik Dahlström
@ 2017-04-24 13:39                       ` Andreas Klauer
  2017-04-24 14:05                         ` Patrik Dahlström
  2017-04-24 23:00                         ` Patrik Dahlström
  0 siblings, 2 replies; 63+ messages in thread
From: Andreas Klauer @ 2017-04-24 13:39 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Mon, Apr 24, 2017 at 02:54:50PM +0200, Patrik Dahlström wrote:
> What offset are you referring to here? The --data-offset to the mdadm
> --create command?

Yes, data offset...

> I don't think the disk was ever kicked out. The kernel reset the link
> and continued, I believe.

Okay.

Another thing you can check is, pick an arbitrary offset that does not 
have zeroes on the disks, and see if the XOR matches for 5, or 6 disks.

It should match for 6 for however far the grow progressed, 
and match for 5 afterwards.

Again with my example from before,

# for f in ? ; do hexdump -C -n 16 -s 10000000 "$f" ; done
00989680  30 30 30 30 30 30 30 30  32 61 38 39 36 38 30 0a  |000000002a89680.|
00989690                                                             ^^
00989680  30 30 30 30 30 30 30 30  32 61 38 39 36 38 30 0a  |000000002a89680.|
00989690                                                             ^^
00989680  30 30 30 30 30 30 30 30  32 62 30 39 36 38 30 0a  |000000002b09680.|
00989690                                                             ^^
00989680  30 30 30 30 30 30 30 30  32 62 38 39 36 38 30 0a  |000000002b89680.|
00989690                                                             ^^
00989680  30 30 30 30 30 30 30 30  32 63 30 39 36 38 30 0a  |000000002c09680.|
00989690                                                             ^^ 
00989680  30 30 30 30 30 30 30 30  32 63 38 39 36 38 30 0a  |000000002c89680.|
00989690                                                             ^^

2a ^ 2a ^ 2b ^ 2b ^ 2c = 2c OK for 6 disk area

# for f in ? ; do hexdump -C -n 16 -s 90000000 "$f" ; done
055d4a80  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
055d4a90                                                              ^^
055d4a80  30 30 30 30 30 30 30 31  35 32 35 34 61 38 30 0a  |000000015254a80.|
055d4a90                                                              ^^
055d4a80  30 30 30 30 30 30 30 31  35 32 64 34 61 38 30 0a  |0000000152d4a80.|
055d4a90                                                              ^^
055d4a80  30 30 30 30 30 30 30 31  35 33 35 34 61 38 30 0a  |000000015354a80.|
055d4a90                                                              ^^ 
055d4a80  30 30 30 30 30 30 30 31  35 33 64 34 61 38 30 0a  |0000000153d4a80.|
055d4a90                                                              ^^

25 ^ 2d ^ 35 ^ 3d = \0\0 OK for 5 disk area

(I did not intend for the parity to be zero in this example, 
 but that's just how it turned out to be...)
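
If doing that by hand gets tedious, a small loop can pull one byte per disk
at a given offset and XOR them; a sketch only (list the five old disks first
and the new disk last, and pick any offset with non-zero data):

off=$((0x8000000000))
x5=0; x6=0; i=0
for f in /dev/sda /dev/sdb /dev/sdd /dev/sdc /dev/sde /dev/sdf; do
    b=$(dd if="$f" bs=1 count=1 skip=$off 2>/dev/null | od -An -tu1 | tr -d ' ')
    b=${b:-0}
    i=$((i + 1))
    [ $i -le 5 ] && x5=$((x5 ^ b))
    x6=$((x6 ^ b))
done
echo "xor of the 5 old disks: $x5   xor of all 6: $x6   (0 means consistent)"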

If you have a 5disk parity match in the zone that should be 6disks, 
then maybe one of your earlier mdadm create experiments started to 
sync the raid.

This would be yet another nail in your coffin.

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 13:39                       ` Andreas Klauer
@ 2017-04-24 14:05                         ` Patrik Dahlström
  2017-04-24 14:21                           ` Andreas Klauer
  2017-04-24 16:00                           ` Patrik Dahlström
  2017-04-24 23:00                         ` Patrik Dahlström
  1 sibling, 2 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-24 14:05 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid

2017-04-24 15:39 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
> On Mon, Apr 24, 2017 at 02:54:50PM +0200, Patrik Dahlström wrote:
>> What offset are you referring to here? The --data-offset to the mdadm
>> --create command?
>
> Yes, data offset...
It all makes sense to me now! I believe I was still in sort of a
panicked mode yesterday. The pieces are starting to make more sense
today.

> Another thing you can check is, pick an arbitrary offset that does not
> have zeroes on the disks, and see if the XOR matches for 5, or 6 disks.
I will try that when I get home.
>
> If you have a 5disk parity match in the zone that should be 6disks,
> then maybe one of your earlier mdadm create experiments started to
> sync the raid.
This is definitely a possibility.
>
> This would be yet another nail in your coffin.
Could you define how big a nail? I can live with some data loss, but
preferably not 20 TB.

Best regards
Patrik Dahlström

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 14:05                         ` Patrik Dahlström
@ 2017-04-24 14:21                           ` Andreas Klauer
  2017-04-24 16:00                           ` Patrik Dahlström
  1 sibling, 0 replies; 63+ messages in thread
From: Andreas Klauer @ 2017-04-24 14:21 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Mon, Apr 24, 2017 at 04:05:39PM +0200, Patrik Dahlström wrote:
> > This would be yet another nail in your coffin.
> Could you define how big nail?

Lost another drive ...

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 14:05                         ` Patrik Dahlström
  2017-04-24 14:21                           ` Andreas Klauer
@ 2017-04-24 16:00                           ` Patrik Dahlström
  1 sibling, 0 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-24 16:00 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid



On 04/24/2017 04:05 PM, Patrik Dahlström wrote:
> 2017-04-24 15:39 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
>> On Mon, Apr 24, 2017 at 02:54:50PM +0200, Patrik Dahlström wrote:
>>> What offset are you referring to here? The --data-offset to the mdadm
>>> --create command?
>>
>> Yes, data offset...
> It all makes sense to me now! I believe I was still in sort of a
> panicked mode yesterday. The pieces are starting to make more sense
> today.
Picking the offset I found out yesterday, I did indeed get an ext4 file
system on the 6 disk raid. I could even mount it at one point. Strangely
enough, I also got a file system on the 5 disk raid. Running a quick
test showed that offset 0.5 MB to 1.5 MB (3 chunks) had identical data,
and also one chunk at 32 GB. I'm gonna leave the program running for a
while now and see what I can figure out. The first chunk is most
probably identical too, but my program ignores chunks that are all zeros.
> 
>> Another thing you can check is, pick an arbitrary offset that does not
>> have zeroes on the disks, and see if the XOR matches for 5, or 6 disks.
> I will try that when I get home.
Still haven't tried this. I have some life stuff to do as well.

Best regards
// Patrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-23 12:15               ` Patrik Dahlström
@ 2017-04-24 21:04                 ` Phil Turmel
  2017-04-24 21:56                   ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Phil Turmel @ 2017-04-24 21:04 UTC (permalink / raw)
  To: Patrik Dahlström, Wols Lists, Roman Mamedov
  Cc: Andreas Klauer, linux-raid

On 04/23/2017 08:15 AM, Patrik Dahlström wrote:
> 
> 
> On 04/23/2017 02:11 PM, Wols Lists wrote:
>> On 23/04/17 12:58, Roman Mamedov wrote:
>>> On Sun, 23 Apr 2017 12:36:24 +0100
>>> Wols Lists <antlists@youngman.org.uk> wrote:
>>>
>>>> And, as the raid wiki tells you, download lspci and run that
>>>
>>> Maybe you meant lsdrv. https://github.com/pturmel/lsdrv
>>>
>> Sorry, yes I did ... (too many ls_xxx commands :-)
> Ok, I had to patch lsdrv a bit to make it run. Diff:

Thanks for the patch.  Could you elaborate a bit on the errors you
received so I can reproduce and document this fully?

Also, do you have some large files (media files, perhaps) that you know
are in your array but you have a copy in hand?  If so, you could use the
findHash script in my github account to map how that file is laid out on
your array's devices.  Since large media files tend to be contiguous,
such a map would definitively show your chunk size and device order.

It would also show if your data offsets are consistent among the member
drives (but not the absolute value of the offset).

Phil



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 21:04                 ` Phil Turmel
@ 2017-04-24 21:56                   ` Patrik Dahlström
  2017-04-24 23:35                     ` Phil Turmel
  0 siblings, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-24 21:56 UTC (permalink / raw)
  To: Phil Turmel, Wols Lists, Roman Mamedov; +Cc: Andreas Klauer, linux-raid



On 04/24/2017 11:04 PM, Phil Turmel wrote:
> On 04/23/2017 08:15 AM, Patrik Dahlström wrote:
>>
>>
>> On 04/23/2017 02:11 PM, Wols Lists wrote:
>>> On 23/04/17 12:58, Roman Mamedov wrote:
>>>> On Sun, 23 Apr 2017 12:36:24 +0100
>>>> Wols Lists <antlists@youngman.org.uk> wrote:
>>>>
>>>>> And, as the raid wiki tells you, download lspci and run that
>>>>
>>>> Maybe you meant lsdrv. https://github.com/pturmel/lsdrv
>>>>
>>> Sorry, yes I did ... (too many ls_xxx commands :-)
>> Ok, I had to patch lsdrv a bit to make it run. Diff:
> 
> Thanks for the patch.  Could you elaborate a bit on the errors you
> received so I can reproduce and document this fully?
Sure. It started out with this error:
$ ./lsdrv
Traceback (most recent call last):
  File "./lsdrv", line 413, in <module>
    probe_block('/sys/block/'+x)
  File "./lsdrv", line 389, in probe_block
    blk.FS = "MD %s (%s/%s)%s %s" % (blk.array.md.LEVEL, blk.slave.slot,
blk.array.md.raid_disks, peers, blk.slave.state)
AttributeError: 'NoneType' object has no attribute 'LEVEL'

So I added an if statement for blk.array.md.
Next, I got this error:
$ ./lsdrv
Traceback (most recent call last):
  File "./lsdrv", line 414, in <module>
    probe_block('/sys/block/'+x)
  File "./lsdrv", line 406, in probe_block
    blk.FS += " '%s'" % blk.ID_FS_LABEL
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'

That's what the other 2 if statements are for. I don't claim to know the
root cause of the errors, I've simply worked around them.

> 
> Also, do you have some large files (media files, perhaps) that you know
> are in your array but you have a copy in hand?  If so, you could use the
> findHash script in my github account to map how that file is laid out on
> your array's devices.  Since large media files tend to be contiguous,
> such a map would definitively show your chunk size and device order.
I'll take a look. I definitely have some large continuous files on this
array.
> 
> It would also show if your data offsets are consistent among the member
> drives (but not the absolute value of the offset).
> 
> Phil
> 
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 13:39                       ` Andreas Klauer
  2017-04-24 14:05                         ` Patrik Dahlström
@ 2017-04-24 23:00                         ` Patrik Dahlström
  2017-04-25  0:16                           ` Andreas Klauer
  1 sibling, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-24 23:00 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid



On 04/24/2017 03:39 PM, Andreas Klauer wrote:
> On Mon, Apr 24, 2017 at 02:54:50PM +0200, Patrik Dahlström wrote:
> Another thing you can check is, pick an arbitrary offset that does not 
> have zeroes on the disks, and see if the XOR matches for 5, or 6 disks.
> 
> It should match for 6 for however far the grow progressed, 
> and match for 5 afterwards.
I got some success!

I started experimenting with this and realised that about 1 TB into
/dev/sdf, I started getting all zeros from the disk. Incidentally this
will also make the xor pass for both 5 and 6 disk arrays. Here's some
examples:
# for f in /dev/sd{a,b,d,c,e,f}; do hexdump -C -n 16 -s 0x8000000000 "$f"; done
8000000000  96 6c 5d 2c a4 03 7a 62  9c 7d 67 b9 55 24 aa 84  |.l],..zb.}g.U$..|
8000000010
8000000000  59 10 a8 e9 a7 5e fa e9  cd 8c 16 a2 7c 06 60 f6  |Y....^......|.`.|
8000000010
8000000000  7d ca 8e ea cc fe 2a 36  be b8 a8 b6 77 f9 fa 87  |}.....*6....w...|
8000000010
8000000000  4d 42 49 1e 0b ae a7 3b  42 50 68 bb c1 d5 89 96  |MBI....;BPh.....|
8000000010
8000000000  fc 3d c7 b7 82 39 06 a7  ad fc 00 81 05 5b 52 e1  |.=...9.......[R.|
8000000010
8000000000  03 c9 f5 86 46 34 0b 21  00 e5 b1 97 9a 55 eb 82  |....F4.!.....U..|
8000000010
5 disk raid xor check: 84 ^ f6 ^ 87 ^ 96 ^ e1 == 82, NOK
6 disk raid xor check: 84 ^ f6 ^ 87 ^ 96 ^ e1 ^ 82 == 0, OK

# for f in /dev/sd{a,b,d,c,e,f}; do hexdump -C -n 16 -s 0x10000000000 "$f"; done
10000000000  03 33 0a 04 d9 7a 44 4b  dc 8f be 58 0b 80 8f 46  |.3...zDK...X...F|
10000000010
10000000000  79 5f 18 51 f7 44 03 59  7f aa ce 9d f1 a9 3d 73  |y_.Q.D.Y......=s|
10000000010
10000000000  67 f4 9e b2 34 b6 c2 43  b5 8d 2f 0d 5f 80 a4 6d  |g...4..C../._..m|
10000000010
10000000000  4f ec f4 c9 da 04 64 db  dc e9 e1 72 7f e4 74 06  |O.....d....r..t.|
10000000010
10000000000  52 74 78 2e c0 8c e1 8a  ca 41 be ba da 4d 62 5e  |Rtx......A...Mb^|
10000000010
10000000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
10000000010
5 disk raid xor check: 46 ^ 73 ^ 6d ^ 06 ^ 5e == 0, OK
6 disk raid xor check: 46 ^ 73 ^ 6d ^ 06 ^ 5e ^ 00 == 0, OK

So I did a binary search from 0x8000000000 to 0x10000000000 to see
where I start getting all zeros from the newest drive. I ended up at
offset 0xfaa2880000 (~0.98 TB or ~18 % into the disk). After that, it's
all zeros.

# for f in /dev/sd{a,b,d,c,e,f}; do hexdump -C -n 16 -s 0xfaa287fff8 "$f"; done
faa287fff8  e5 4d 6b e6 ef 7b 1f b0  2e 30 82 8e 5b 4b e0 30  |.Mk..{...0..[K.0|
faa2880008
faa287fff8  dd f7 ab 02 cf e8 a2 6d  93 a8 08 a7 d8 9e c7 b4  |.......m........|
faa2880008
faa287fff8  31 f8 7b d9 41 c7 72 13  f1 37 b6 4a 51 fc 46 74  |1.{.A.r..7.JQ.Ft|
faa2880008
faa287fff8  87 29 59 58 97 05 87 1b  1a 8d 83 84 83 b0 21 4a  |.)YX..........!J|
faa2880008
faa287fff8  04 2b 32 6d 0f ab fb b7  56 22 bf e7 51 99 40 ba  |.+2m....V"..Q.@.|
faa2880008
faa287fff8  f4 06 c8 81 00 00 03 ae  00 00 00 00 00 00 00 00  |................|
faa2880008

Now, at offset 0xfaa2880000 the 5 disk raid xor sums are OK:
0xfaa2880000: 2e ^ 93 ^ f1 ^ 1a ^ 56 = 0, OK

But immediately before that, I can't get the xor sums to line up:
0xfaa287ffff: b0 ^ 6d ^ 13 ^ 1b ^ b7 != ae (62 actually), NOK
This would mean that it's incorrect for both 5 and 6 disk raids.

So I did a binary search backwards to find out where the checksums match
again and came to offset 0xed90332000 (~0.93 TB or ~17 % into the disk).
# for f in /dev/sd{a,b,d,c,e,f}; do hexdump -C -n 16 -s 0xed90331ff8 "$f"; done
ed90331ff8  69 1a b1 fb 9f 80 cf fa  6f 97 dc b7 26 40 38 6f  |i.......o...&@8o|
ed90332008
ed90331ff8  e3 58 c4 e7 7a 9a b5 a4  74 37 5a de d8 24 a6 e5  |.X..z...t7Z..$..|
ed90332008
ed90331ff8  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
ed90332008
ed90331ff8  53 0b 7c bb 49 d6 63 55  ff fb b2 40 35 00 03 39  |S.|.I.cU...@5..9|
ed90332008
ed90331ff8  9b 49 25 23 ec ad 89 0c  ce a0 29 29 a2 86 bc c2  |.I%#......))....|
ed90332008
ed90331ff8  bd ff d3 7b bf 9e 6f f8  17 bc f3 8d e7 9b d0 65  |...{..o........e|
ed90332008

0xed90331fff: fa ^ a4 ^ ff ^ 55 ^ 0c == f8, OK for 6 disk raid
0xed90332000: 6f ^ 74 ^ ff ^ ff ^ ce != 17 (d5 actually), NOK

That is a span of ~52 GB where I presumably can't get the checksums
right. What does all this mean? What am I missing?

Best regards
// Patrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 21:56                   ` Patrik Dahlström
@ 2017-04-24 23:35                     ` Phil Turmel
  0 siblings, 0 replies; 63+ messages in thread
From: Phil Turmel @ 2017-04-24 23:35 UTC (permalink / raw)
  To: Patrik Dahlström, Wols Lists, Roman Mamedov
  Cc: Andreas Klauer, linux-raid

On 04/24/2017 05:56 PM, Patrik Dahlström wrote:
> On 04/24/2017 11:04 PM, Phil Turmel wrote:

>> Thanks for the patch.  Could you elaborate a bit on the errors you
>> received so I can reproduce and document this fully?
> Sure. It started out with this error:
> $ ./lsdrv
> Traceback (most recent call last):
>   File "./lsdrv", line 413, in <module>
>     probe_block('/sys/block/'+x)
>   File "./lsdrv", line 389, in probe_block
>     blk.FS = "MD %s (%s/%s)%s %s" % (blk.array.md.LEVEL, blk.slave.slot,
> blk.array.md.raid_disks, peers, blk.slave.state)
> AttributeError: 'NoneType' object has no attribute 'LEVEL'

Ok.  I'll spin up an Ubuntu 16.04 VM to play with this.  Thanks.

>> Also, do you have some large files (media files, perhaps) that you know
>> are in your array but you have a copy in hand?  If so, you could use the
>> findHash script in my github account to map how that file is laid out on
>> your array's devices.  Since large media files tend to be contiguous,
>> such a map would definitively show your chunk size and device order.

> I'll take a look. I definitely have some large continuous files on this
> array.

The first draft of this script was written for a fellow in a situation
very similar to yours.  The results were miraculous.  Not to get your
hopes up too high, though -- my first impression of this thread is that
you're screwed. /-:

Phil


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 23:00                         ` Patrik Dahlström
@ 2017-04-25  0:16                           ` Andreas Klauer
  2017-04-25  8:44                             ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-25  0:16 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Tue, Apr 25, 2017 at 01:00:47AM +0200, Patrik Dahlström wrote:
> 6 disk raid xor check: 84 ^ f6 ^ 87 ^ 96 ^ e1 ^ 82 == 0, OK

This should be the 6 disk raid area.

> 5 disk raid xor check: 46 ^ 73 ^ 6d ^ 06 ^ 5e == 0, OK

This should be the 5 disk raid area.

> 6 disk raid xor check: 46 ^ 73 ^ 6d ^ 06 ^ 5e ^ 00 == 0, OK

Still 5 disks... grow did not progess until here, 
and the 6th disk is likely zero because it's new.

> But immediately before that, I can't get the xor sums to line up:
> 0xfaa287ffff: b0 ^ 6d ^ 13 ^ 1b ^ b7 != ae (62 actually), NOK
> This would mean that it's incorrect for both 5 and 6 disk raids.

Not too sure about this point.

If it up and died in mid-grow there might be a chunk that's wrong.

But that's a few kilobytes, not...

> That is a span of ~52 GB where I presumably can't get the checksums
> right. What does all this mean? What am I missing?

...well, it would make sense if a disk got kicked / went missing 
and it progressed the reshape for another ~52GB afterwards. 

If you still had your original md metadata the --examine would clear
that point up but unfortunately...

In a /dev/md that doesn't have that same disk as missing, this would 
result in roughly 260-320 GB of data that is garbage (because one drive
was not reshaped but the others were, so every nth chunk is wrong).

You might still be able to survive that (if the 6-disk <-> 5-disk raid5 overlap
zone is larger than that - I didn't do the math, but at a progress of 
17% of your 6T disks you've added about 1T? Might just work out).

So you might have these zones on your RAIDs

6DISK: ?G VALID-DATA : ~320G of GARBAGE : ?G 5DISK-WRONGOFFSET-NONSENSE
5DISK: ?G 6DISK-WRONGOFFSET-NONSENSE : ~260G of GARBAGE : ?G VALID DATA

And you're hoping the VALID DATA areas will overlap. They would if it 
progressed far enough with all disks and not too far with one missing.

Or you just have to identify the questionable drive and kick it out.
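
One brute-force way to do that, on overlays only: recreate the 6 disk array
once per candidate, each time with a different device replaced by "missing",
and see which variant gives the least unhappy filesystem. A rough sketch
(placeholder names, and add whatever --data-offset you settled on):

for skip in sda sdb sdd sdc sde sdf; do
    devs=""
    for d in sda sdb sdd sdc sde sdf; do
        if [ "$d" = "$skip" ]; then
            devs="$devs missing"
        else
            devs="$devs /dev/mapper/overlay-$d"
        fi
    done
    mdadm --stop /dev/md43 2>/dev/null
    mdadm --create /dev/md43 --run --assume-clean --level=5 --chunk=512 \
          --raid-devices=6 $devs
    echo "== without $skip"
    fsck.ext4 -n /dev/md43 2>&1 | head -5
done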

You have some experimenting to do :-|

( 
    Not sure if I'm still making sense at this point. Sorry.
)

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-25  0:16                           ` Andreas Klauer
@ 2017-04-25  8:44                             ` Patrik Dahlström
  2017-04-25  9:01                               ` Andreas Klauer
  0 siblings, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-25  8:44 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid

2017-04-25 2:16 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
> On Tue, Apr 25, 2017 at 01:00:47AM +0200, Patrik Dahlström wrote:
>> 6 disk raid xor check: 84 ^ f6 ^ 87 ^ 96 ^ e1 ^ 82 == 0, OK
>
> This should be the 6 disk raid area.
Agreed
>
>> 5 disk raid xor check: 46 ^ 73 ^ 6d ^ 06 ^ 5e == 0, OK
>
> This should be the 5 disk raid area.
Agreed
>
>> 6 disk raid xor check: 46 ^ 73 ^ 6d ^ 06 ^ 5e ^ 00 == 0, OK
>
> Still 5 disks... grow did not progess until here,
> and the 6th disk is likely zero because it's new.
Agreed
>
>> But immediately before that, I can't get the xor sums to line up:
>> 0xfaa287ffff: b0 ^ 6d ^ 13 ^ 1b ^ b7 != ae (62 actually), NOK
>> This would mean that it's incorrect for both 5 and 6 disk raids.
>
> Not too sure about this point.
>
> If it up and died in mid-grow there might be a chunk that's wrong.
>
> But that's a few kilobytes, not...
>
>> That is a span of ~52 GB where I presumably can't get the checksums
>> right. What does all this mean? What am I missing?
>
> ...well, it would make sense if a disk got kicked / went missing
> and it progressed the reshape for another ~52GB afterwards.
Would that mean that I should be able to get the checksum to match if
I remove one of the values in a mismatching series? I have tried this,
but never got it correct.

> And you're hoping the VALID DATA areas will overlap. They would if it
> progressed far enough with all disks and not too far with one missing.
How do I test this? I haven't really used overlays before.

> Or you just have to identify the questionable drive and kick it out.
I'm still not sure if a drive was actually kicked out of the array. I
don't have any memory of discovering that a drive was suddenly kicked
out. Would a kicked out drive be automatically re-added if it came
back?
Shouldn't I find something about it in my syslog or kernel log?
I checked my command history and I didn't find any --fail commands
(except on /dev/mapper/sdf).
Should I post the command history here? It is quite long and doesn't
contain return codes or command output.

>
> You have some experimenteering to do :-|
>
> (
>     Not sure if I'm still making sense at this point. Sorry.
> )
>
> Regards
> Andreas Klauer
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-25  8:44                             ` Patrik Dahlström
@ 2017-04-25  9:01                               ` Andreas Klauer
  2017-04-25 10:40                                 ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-25  9:01 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Tue, Apr 25, 2017 at 10:44:15AM +0200, Patrik Dahlström wrote:
> Would that mean that I should be able to get the checksum to match if
> I remove one of the values in a mismatching series. I have tried this,
> but never got it correct.

No, there is no parity left with a missing drive in a raid5.

But you might find some duplicate blocks of one drive still in 5disk 
layout on the other drives where it would have ended up in 6disk layout.
You won't find all of them since there are parity blocks on both sides.
 
> Shouldn't I find something about it in my syslog or kernel log?

I suppose so, if you didn't boot a rescue/live system in between 
(they assemble raid and grow continues and whatever happens there 
is not in your logs).

Mind showing the --examine output of the two raid sets you're working 
with? Did you check that the offsets are correct for either one?

You found multiple filesystem headers. Do you know the UUID it should
have, so you're not working with some old remnant?

Not sure what to tell you really. :)
I hope things turn out to be simpler than they appear right now.

Good luck
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-25  9:01                               ` Andreas Klauer
@ 2017-04-25 10:40                                 ` Patrik Dahlström
  2017-04-25 10:51                                   ` Patrik Dahlström
  2017-04-25 11:08                                   ` Andreas Klauer
  0 siblings, 2 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-25 10:40 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid

2017-04-25 11:01 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
> On Tue, Apr 25, 2017 at 10:44:15AM +0200, Patrik Dahlström wrote:
>> Would that mean that I should be able to get the checksum to match if
>> I remove one of the values in a mismatching series. I have tried this,
>> but never got it correct.
>
> No, there is no parity left with a missing drive in a raid5.
I was thinking that if it was in fact a 5 disk raid + garbage from 1 disk,
then the checksum would be correct if the garbage disk was filtered
out. Does that make sense?
>
> But you might find some duplicate blocks of one drive still in 5disk
> layout on the other drives where it would have ended up in 6disk layout.
> You won't find all of them since there are parity blocks on both sides.
Not sure if I follow
>
>> Shouldn't I find something about it in my syslog or kernel log?
>
> I suppose so, if you didn't boot a rescue/live system in between
> (they assemble raid and grow continues and whatever happens there
> is not in your logs).
I did boot one rescue system, but it was a raid rescue system that,
according to its docs, never writes anything to the disks, so we
should be fine. I did a grep on mdadm in both syslogs and kernel logs
and I didn't find anything about kicking a drive.
>
> Mind showing the --examine output of the two raid sets you're working
> with, did you check that the offsets are correct for either one?
Attached to the end of the email. AFAIK, the data offsets are pointing
to where the ext4 file system begins. Honestly, I don't know what to
look for in the output.

>
> You found multiple filesystem headers, do you know the UUID it should
> have so you're not working with some old remnant?
The file systems look correct:
$ grep storage /etc/fstab
# commented out during recovery
#UUID=345ec7b8-b523-45d3-8c2e-35cda1ab62c1 /storage        ext4
errors=remount-ro 0       1
$ file -s /dev/md0
/dev/md0: Linux rev 1.0 ext4 filesystem data,
UUID=345ec7b8-b523-45d3-8c2e-35cda1ab62c1 (extents) (64bit) (large
files) (huge files)
$ file -s /dev/md1
/dev/md1: Linux rev 1.0 ext4 filesystem data,
UUID=345ec7b8-b523-45d3-8c2e-35cda1ab62c1 (extents) (64bit) (large
files) (huge files)

You mentioned something about linear device mapping before. What is
that? Is it something I could experiment with? How do I do that?
If I can get copies of files I know should be present on the raid set,
would that help?

Output of examine calls:
$ mdadm --examine /dev/mapper/sd{a,b,d,c,e}
/dev/mapper/sda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 1d7570c5:da2c78dd:eb7add42:73d27d8d
           Name : rack-server-1:0  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:54 2017
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 11720792240 (5588.91 GiB 6001.05 GB)
     Array Size : 23441584128 (22355.64 GiB 24004.18 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=176 sectors
          State : clean
    Device UUID : c285c661:ae3a9df5:dadb24d8:ea226018

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:54 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 97093390 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 1d7570c5:da2c78dd:eb7add42:73d27d8d
           Name : rack-server-1:0  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:54 2017
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 12111493040 (5775.21 GiB 6201.08 GB)
     Array Size : 23441584128 (22355.64 GiB 24004.18 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=390700976 sectors
          State : clean
    Device UUID : fe47f0cf:68894c26:9464bfc6:3dd1b973

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:54 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 97201a94 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 1d7570c5:da2c78dd:eb7add42:73d27d8d
           Name : rack-server-1:0  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:54 2017
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 11720792240 (5588.91 GiB 6001.05 GB)
     Array Size : 23441584128 (22355.64 GiB 24004.18 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=176 sectors
          State : clean
    Device UUID : 80965407:ecae968e:936beab7:0dedbc67

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:54 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 4b3126a - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 1d7570c5:da2c78dd:eb7add42:73d27d8d
           Name : rack-server-1:0  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:54 2017
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 11720792240 (5588.91 GiB 6001.05 GB)
     Array Size : 23441584128 (22355.64 GiB 24004.18 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=176 sectors
          State : clean
    Device UUID : 121f2396:94a105f4:db8d9b34:8f950eb3

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:54 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : c0f3586f - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 1d7570c5:da2c78dd:eb7add42:73d27d8d
           Name : rack-server-1:0  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:54 2017
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 11720792240 (5588.91 GiB 6001.05 GB)
     Array Size : 23441584128 (22355.64 GiB 24004.18 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=176 sectors
          State : clean
    Device UUID : 973d238d:dd08107c:186646f0:2131ad56

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:54 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 9f47520d - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)

$ mdadm --examine /dev/mapper/sd{a,b,d,c,e,f}-2
/dev/mapper/sda-2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b83927d8:ec75eb8e:9d1d1016:4cce5dff
           Name : rack-server-1:1  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:22 2017
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 11720792240 (5588.91 GiB 6001.05 GB)
     Array Size : 29301980160 (27944.55 GiB 30005.23 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=176 sectors
          State : clean
    Device UUID : 4536ac05:80fa79ce:fb940867:4dea2d14

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:22 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : a7c0d060 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sdb-2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b83927d8:ec75eb8e:9d1d1016:4cce5dff
           Name : rack-server-1:1  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:22 2017
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 12111493040 (5775.21 GiB 6201.08 GB)
     Array Size : 29301980160 (27944.55 GiB 30005.23 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=390700976 sectors
          State : clean
    Device UUID : 2990130a:7759c23c:fed7a8e8:1ea1542f

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:22 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : ce812210 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sdd-2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b83927d8:ec75eb8e:9d1d1016:4cce5dff
           Name : rack-server-1:1  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:22 2017
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 11720792240 (5588.91 GiB 6001.05 GB)
     Array Size : 29301980160 (27944.55 GiB 30005.23 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=176 sectors
          State : clean
    Device UUID : ebd7e323:0480c5e8:f6362f38:0926ec10

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:22 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : ae28d543 - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sdc-2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b83927d8:ec75eb8e:9d1d1016:4cce5dff
           Name : rack-server-1:1  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:22 2017
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 11720792240 (5588.91 GiB 6001.05 GB)
     Array Size : 29301980160 (27944.55 GiB 30005.23 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=176 sectors
          State : clean
    Device UUID : 5bd0d200:3878824d:f1ad243f:130503c0

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:22 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : a5e11bed - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sde-2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b83927d8:ec75eb8e:9d1d1016:4cce5dff
           Name : rack-server-1:1  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:22 2017
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 11720792240 (5588.91 GiB 6001.05 GB)
     Array Size : 29301980160 (27944.55 GiB 30005.23 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=176 sectors
          State : clean
    Device UUID : b859bdd4:b10a1761:a5803a47:76c9775b

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:22 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 30eacedc - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/mapper/sdf-2:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : b83927d8:ec75eb8e:9d1d1016:4cce5dff
           Name : rack-server-1:1  (local to host rack-server-1)
  Creation Time : Mon Apr 24 21:54:22 2017
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 11720792240 (5588.91 GiB 6001.05 GB)
     Array Size : 29301980160 (27944.55 GiB 30005.23 GB)
  Used Dev Size : 11720792064 (5588.91 GiB 6001.05 GB)
    Data Offset : 252928 sectors
   Super Offset : 8 sectors
   Unused Space : before=252840 sectors, after=176 sectors
          State : clean
    Device UUID : 1c2e4b33:e87a49fa:55e6f42e:bab18c10

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Apr 24 21:54:22 2017
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : c57a616b - correct
         Events : 0

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-25 10:40                                 ` Patrik Dahlström
@ 2017-04-25 10:51                                   ` Patrik Dahlström
  2017-04-25 11:08                                   ` Andreas Klauer
  1 sibling, 0 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-25 10:51 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid

2017-04-25 12:40 GMT+02:00, Patrik Dahlström <risca@powerlamerz.org>:
> 2017-04-25 11:01 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
>> On Tue, Apr 25, 2017 at 10:44:15AM +0200, Patrik Dahlström wrote:
>> You found multiple filesystem headers, do you know the UUID it should
>> have so you're not working with some old remnant?
> The file systems look correct:
> $ grep storage /etc/fstab
> # commented out during recovery
> #UUID=345ec7b8-b523-45d3-8c2e-35cda1ab62c1 /storage        ext4
> errors=remount-ro 0       1
> $ file -s /dev/md0
> /dev/md0: Linux rev 1.0 ext4 filesystem data,
> UUID=345ec7b8-b523-45d3-8c2e-35cda1ab62c1 (extents) (64bit) (large
> files) (huge files)
> $ file -s /dev/md1
> /dev/md1: Linux rev 1.0 ext4 filesystem data,
> UUID=345ec7b8-b523-45d3-8c2e-35cda1ab62c1 (extents) (64bit) (large
> files) (huge files)
This actually got me thinking. During my destructive recovery
attempts, I would either not specify a --data-offset or use
--data-offset=128M. Since the correct offsets are less than 128M
(123.5 MB, actually), that data would be untouched in case a
reshape/rebuild was triggered by my attempts. That must explain why
the first 4 chunks of /dev/md{0,1} were identical when using the
correct offset, right?
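
For what it's worth, 123.5 MiB is exactly the 252928-sector data offset
shown in the --examine output above (assuming 512-byte sectors):

echo $((252928 * 512))               # 129499136 bytes
echo "scale=1; 252928 / 2048" | bc   # 123.5 MiB, just under the 128 MiB default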

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-25 10:40                                 ` Patrik Dahlström
  2017-04-25 10:51                                   ` Patrik Dahlström
@ 2017-04-25 11:08                                   ` Andreas Klauer
  2017-04-25 11:37                                     ` Patrik Dahlström
  2017-04-27 19:57                                     ` Patrik Dahlström
  1 sibling, 2 replies; 63+ messages in thread
From: Andreas Klauer @ 2017-04-25 11:08 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Tue, Apr 25, 2017 at 12:40:37PM +0200, Patrik Dahlström wrote:
> I was thinking if it was in fact a 5 disk raid + garbage from 1 disk,
> then the checksum would be correct if the garbage disk was filtered
> out. Does that make sense?

No. With one disk missing there is no parity (no way to verify it).

> You mentioned something about linear device mapping before. What is
> that? Is it something I could experiment with? How do I do that?

https://www.kernel.org/doc/Documentation/device-mapper/linear.txt

Once you have found where data overlaps on both raids, you create 
a linear mapping of start..X of the 6disk raid, followed by X..end 
of the 5disk RAID.

That way you get a device that holds your data intact as a whole, 
whereas the raid sets would give you the first half of data on 
the 6disk raid set (what was already reshaped) and the other half 
on the 5disk raid set (what had yet to be reshaped).

This is only a way to get read access at the data, making the raid 
work as a standalone again (resume reshape with lost raid metadata) 
is another problem, to be tackled after you backed up your data ;)
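
A rough sketch of what that table could look like (device names follow
the rest of the thread, md1 being the 6disk set and md0 the 5disk set;
X is a placeholder, in 512-byte sectors, that must first be determined
by comparing data as described above):

SIX=/dev/md1                    # front of this one holds the already-reshaped data
FIVE=/dev/md0                   # everything past X is still valid on this one
X=<merge point in sectors>      # verify before creating the mapping!
END=$(blockdev --getsz $FIVE)
dmsetup create joined <<EOF
0 $X linear $SIX 0
$X $((END - X)) linear $FIVE $X
EOF
# /dev/mapper/joined can then be fsck'd or mounted read-only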

> Output of examine calls:

> /dev/mapper/sda:
>     Data Offset : 252928 sectors

> /dev/mapper/sda-2:
>     Data Offset : 252928 sectors

Wrong.

mdadm --grow changes the offset, so:

The 5disk raid should have the original offset 
(presumably 128M, 262144 sectors)

The 6disk raid should have the new offset 
(where you found your already reshaped ext4 filesystem header)

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-25 11:08                                   ` Andreas Klauer
@ 2017-04-25 11:37                                     ` Patrik Dahlström
  2017-04-25 12:41                                       ` Andreas Klauer
  2017-04-25 18:22                                       ` Wols Lists
  2017-04-27 19:57                                     ` Patrik Dahlström
  1 sibling, 2 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-25 11:37 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid

2017-04-25 13:08 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
> On Tue, Apr 25, 2017 at 12:40:37PM +0200, Patrik Dahlström wrote:
>> I was thinking if it was in fact a 5 disk raid + garbage from 1 disk,
>> then the checksum would be correct if the garbage disk was filtered
>> out. Does that make sense?
>
> No. With one disk missing there is no parity (no way to verify it).
>
>> You mentioned something about linear device mapping before. What is
>> that? Is it something I could experiment with? How do I do that?
>
> https://www.kernel.org/doc/Documentation/device-mapper/linear.txt
>
> Once you have found where data overlaps on both raids, you create
> a linear mapping of start..X of the 6disk raid, followed by X..end
> of the 5disk RAID.
>
> That way you get a device that holds your data intact as a whole,
> whereas the raid sets would give you the first half of data on
> the 6disk raid set (what was already reshaped) and the other half
> on the 5disk raid set (what had yet to be reshaped).
>
> This is only a way to get read access at the data, making the raid
> work as a standalone again (resume reshape with lost raid metadata)
> is another problem, to be tackled after you backed up your data ;)
I'm not sure how I would backup 20 TB of data. I'll backup what I can,
of course. The most essential.
>
>> Output of examine calls:
>
>> /dev/mapper/sda:
>>     Data Offset : 252928 sectors
>
>> /dev/mapper/sda-2:
>>     Data Offset : 252928 sectors
>
> Wrong.
>
> mdadm --grow changes the offset, so:
Oh, I see. Does it do that every time I grow? In that case the
original 5disk raid won't have a 128M offset either.
The full history of this raid:
1. 2x6 TB + md0 (4x2 TB striped) <- data offset at 128M
2. replace md0 with 6 TB, rebuild data. Shouldn't change data offset
since this is basically the same as replacing a faulty disk, right?
3. Grow to 4x6 TB <- data offset changed (unknown).
4. Grow to 5x6 TB <- data offset changed (unknown).
5. Grow to 6x6 TB <- data offset changed. This is when the problem started.

Does the data offset change by a fixed or calculated value?
Can I calculate the data offset by comparing to known data?

I must say that your help has been most valuable. You have my eternal gratitude.

Best regards
// Patrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-25 11:37                                     ` Patrik Dahlström
@ 2017-04-25 12:41                                       ` Andreas Klauer
  2017-04-25 18:22                                       ` Wols Lists
  1 sibling, 0 replies; 63+ messages in thread
From: Andreas Klauer @ 2017-04-25 12:41 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Tue, Apr 25, 2017 at 01:37:53PM +0200, Patrik Dahlström wrote:
> Oh, I see. Does it do that every time I grow? In that case the
> original 5disk raid won't have 128M offset either.

The 128M offset is what we guessed from the device size and log output 
you provided a few mails back in the discussion.

On raid assembly the raid capacity is printed in the syslog and that's 
the capacity it would have with your disk size and 128M offset.

So I'd start with that if possible...
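
As a sanity check, the expected capacity for a candidate offset can be
recomputed from the per-disk size implied by the --examine output above
(11720792240 + 252928 = 11721045168 sectors). This is only back-of-the-
envelope arithmetic, assuming md rounds the usable size down to whole
chunks (which matches the numbers above):

dev_sectors=11721045168              # avail size + data offset from --examine
chunk=1024                           # 512K chunk, in 512-byte sectors
offset=$((128 * 2048))               # candidate data offset: 128 MiB, in sectors
used=$(( (dev_sectors - offset) / chunk * chunk ))
echo "$(( 4 * used / 2 )) KiB"       # (5 - 1) devices' worth of data, as mdadm prints it
echo "$(( 4 * used * 512 )) bytes"   # same figure in bytes, for comparing with the syslog line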

> Does the data offset change by a fixed or calculated value?
> Can I calculate the data offset by comparing to known data?

I guess it depends on mdadm/kernel version, and also the exact command, 
the number of disks involved, the chunk size, ...

For example the offset does not shift when you use --backup-file.

Without metadata it's a puzzle.

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-25 11:37                                     ` Patrik Dahlström
  2017-04-25 12:41                                       ` Andreas Klauer
@ 2017-04-25 18:22                                       ` Wols Lists
  1 sibling, 0 replies; 63+ messages in thread
From: Wols Lists @ 2017-04-25 18:22 UTC (permalink / raw)
  To: Patrik Dahlström, Andreas Klauer; +Cc: Brad Campbell, linux-raid

On 25/04/17 12:37, Patrik Dahlström wrote:
> I'm not sure how I would backup 20 TB of data. I'll backup what I can,
> of course. The most essential.

Buy some cheap disks :-)

A quick skim gives me £111 for a 4TB drive, but that'll be tricky to
find. It is a Barracuda, however ...

https://www.eclipsecomputers.com/Product/4TB-Seagate-Barracuda-5900rpm-SATA3-64Mb-HardDrive/

Better are 8TB (£260) or 10TB (£365, £420) drives, but they're pricy.

https://www.eclipsecomputers.com/Product/8Tb-Seagate-SkyHawk-7200rpm-SATA3-256Mb-Hard-Drive/

https://www.eclipsecomputers.com/Product/10Tb-Seagate-IronWolf-7200rpm-SATA3-256Mb-Hard-Drive/

https://www.eclipsecomputers.com/Product/10Tb-Seagate-IronWolf-Pro-7200rpm-SATA3-256Mb-Hard-Drive/

If it's just a temporary backup, I'd try to get another couple of
drives the same as those in your raid, and beg, borrow, or steal as
many drives from your friends as you need. Your local little computer
shop might have a load of drives salvaged from dead PCs that you can
borrow (or maybe not :-(

Cheers,
Wol

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-24 11:04               ` Andreas Klauer
  2017-04-24 12:13                 ` Patrik Dahlström
@ 2017-04-25 23:01                 ` Patrik Dahlström
  1 sibling, 0 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-25 23:01 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid



On 04/24/2017 01:04 PM, Andreas Klauer wrote:
> On Mon, Apr 24, 2017 at 09:34:04AM +0200, Patrik Dahlström wrote:
> Now create two RAID sets:
> 
> # losetup -D
> # for f in ? ; do cp "$f" "$f".a ; done;
> # for f in ? ; do cp "$f" "$f".b ; done;
> # for a in *.a ; do losetup --find --show "$a" ; done
> # for b in *.b ; do losetup --find --show "$b" ; done
> # mdadm --create /dev/md42 --assume-clean --level=5 --raid-devices=5 /dev/loop{0,1,2,3,4}
> # mdadm --create /dev/md43 --assume-clean --level=5 --raid-devices=6 /dev/loop{5,6,7,8,9,10}
> 
> # cat /proc/mdstat 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
> md42 : active raid5 loop4[4] loop3[3] loop2[2] loop1[1] loop0[0]
>       405504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
>       
> md43 : active raid5 loop10[5] loop9[4] loop8[3] loop7[2] loop6[1] loop5[0]
>       506880 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/6] [UUUUUU]
> 
> And compare:
> 
> # hexdump -C -n 64 /dev/md42
> 00000000  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 0a  |000000000000000.|
> 00000010  30 30 30 30 30 30 30 30  30 30 30 30 30 31 30 0a  |000000000000010.|
> 00000020  30 30 30 30 30 30 30 30  30 30 30 30 30 32 30 0a  |000000000000020.|
> 00000030  30 30 30 30 30 30 30 30  30 30 30 30 30 33 30 0a  |000000000000030.|
> 00000040
> # hexdump -C -n 64 /dev/md43
> 00000000  30 30 30 30 30 30 30 30  30 30 30 30 30 30 30 0a  |000000000000000.|
> 00000010  30 30 30 30 30 30 30 30  30 30 30 30 30 31 30 0a  |000000000000010.|
> 00000020  30 30 30 30 30 30 30 30  30 30 30 30 30 32 30 0a  |000000000000020.|
> 00000030  30 30 30 30 30 30 30 30  30 30 30 30 30 33 30 0a  |000000000000030.|
> 00000040
> 
> This is identical because in this example, the offset didn't change.
> 
> # hexdump -C -n 64 -s 80808080 /dev/md42
> 04d10890  30 30 30 30 30 30 30 30  35 66 31 30 38 39 30 0a  |000000005f10890.|
> 04d108a0  30 30 30 30 30 30 30 30  35 66 31 30 38 61 30 0a  |000000005f108a0.|
> 04d108b0  30 30 30 30 30 30 30 30  35 66 31 30 38 62 30 0a  |000000005f108b0.|
> 04d108c0  30 30 30 30 30 30 30 30  35 66 31 30 38 63 30 0a  |000000005f108c0.|
> 04d108d0
> # hexdump -C -n 64 -s 80808080 /dev/md43
> 04d10890  30 30 30 30 30 30 30 30  34 64 31 30 38 39 30 0a  |000000004d10890.|
> 04d108a0  30 30 30 30 30 30 30 30  34 64 31 30 38 61 30 0a  |000000004d108a0.|
> 04d108b0  30 30 30 30 30 30 30 30  34 64 31 30 38 62 30 0a  |000000004d108b0.|
> 04d108c0  30 30 30 30 30 30 30 30  34 64 31 30 38 63 30 0a  |000000004d108c0.|
> 04d108d0
> 
> For this offset, md42 was wrong, md43 is correct.
> 
> # hexdump -C -n 64 -s 300808080 /dev/md42
> 11edf790  30 30 30 30 30 30 30 31  31 65 64 66 37 39 30 0a  |000000011edf790.|
> 11edf7a0  30 30 30 30 30 30 30 31  31 65 64 66 37 61 30 0a  |000000011edf7a0.|
> 11edf7b0  30 30 30 30 30 30 30 31  31 65 64 66 37 62 30 0a  |000000011edf7b0.|
> 11edf7c0  30 30 30 30 30 30 30 31  31 65 64 66 37 63 30 0a  |000000011edf7c0.|
> 11edf7d0
> # hexdump -C -n 64 -s 300808080 /dev/md43
> 11edf790  30 30 30 30 30 30 30 31  31 65 64 66 37 39 30 0a  |000000011edf790.|
> 11edf7a0  30 30 30 30 30 30 30 31  31 65 64 66 37 61 30 0a  |000000011edf7a0.|
> 11edf7b0  30 30 30 30 30 30 30 31  31 65 64 66 37 62 30 0a  |000000011edf7b0.|
> 11edf7c0  30 30 30 30 30 30 30 31  31 65 64 66 37 63 30 0a  |000000011edf7c0.|
> 11edf7d0
> 
> For this offset, md42 and md43 overlapped. Grow progressed that far yet 
> without writing into the original data of the 5disk raid5. This could be 
> a suitable merge point for a linear device mapping.
> 
> # hexdump -C -n 64 -s 400008080 /dev/md42
> 17d7a390  30 30 30 30 30 30 30 31  37 64 37 61 33 39 30 0a  |000000017d7a390.|
> 17d7a3a0  30 30 30 30 30 30 30 31  37 64 37 61 33 61 30 0a  |000000017d7a3a0.|
> 17d7a3b0  30 30 30 30 30 30 30 31  37 64 37 61 33 62 30 0a  |000000017d7a3b0.|
> 17d7a3c0  30 30 30 30 30 30 30 31  37 64 37 61 33 63 30 0a  |000000017d7a3c0.|
> 17d7a3d0
> # hexdump -C -n 64 -s 400008080 /dev/md43
> 17d7a390  30 30 30 30 30 30 30 31  33 31 37 61 33 39 30 0a  |00000001317a390.|
> 17d7a3a0  30 30 30 30 30 30 30 31  33 31 37 61 33 61 30 0a  |00000001317a3a0.|
> 17d7a3b0  30 30 30 30 30 30 30 31  33 31 37 61 33 62 30 0a  |00000001317a3b0.|
> 17d7a3c0  30 30 30 30 30 30 30 31  33 31 37 61 33 63 30 0a  |00000001317a3c0.|
> 17d7a3d0
> 
> For this offset, md42 is correct and md43 is wrong.
> Grow did not progress that far.
> 
> That's the general outline of the idea. 
> The problem in your case is of course, your data is not that easy to verify.
I've been experimenting with this idea today by writing a small program
that looks for an MKV (Matroska) header in a file, prints the offset,
and then exits. I then extract 10 MB from that offset and try to play
the file with mpv.

$ ./find_matroska /dev/md1 0x202C0000000
Offset: 0x202C0000000 (2059 GB)
Offset: 0x20300000000 (2060 GB)
Found magic @ 0x2030CFCDDD7
Not matroska
Offset: 0x20340000000 (2061 GB)
Offset: 0x20380000000 (2062 GB)
Found magic @ 0x203A0005A49
Not matroska
Found magic @ 0x203AA800000
It's matroska

$ dd if=/dev/md1 bs=524288 count=20 skip=$((0x203AA800000/524288)) of=/tmp/raw.mkv
20+0 records in
20+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0748999 s, 140 MB/s

(copy to laptop and play)
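
For anyone without such a program, a crude equivalent is possible with
standard tools, scanning fixed windows for the four-byte EBML magic that
starts a Matroska file (window size and start offset here are arbitrary
examples, and unlike the program above this doesn't check the doctype):

dev=/dev/md1
start=$((0x20000000000))             # ~2 TB, pick whatever region you want to probe
window=$((1 << 30))                  # 1 GiB per probe
for ((off = start; off < start + 16 * window; off += window)); do
    hit=$(dd if="$dev" bs=1M skip=$((off / 1048576)) count=1024 2>/dev/null |
          grep -aobP '\x1a\x45\xdf\xa3' | head -n1 | cut -d: -f1)
    [ -n "$hit" ] && printf 'possible EBML/Matroska magic at 0x%X\n' $((off + hit))
done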

On the 6 disk raid, I had no problem finding an MKV file at a relatively
low offset (< 4 TB) that would play the whole 10 MB without issues.
Somewhere between 4 and 5 TB I started to get corrupted videos again. I
actually stumbled upon the exact same video at two offsets:
0x400C1800000 (4 TB) : intact, grow has reached this point
0x500F2B00000 (5 TB) : corrupted, grow did not progress this far yet

However, I was not able to do the same on the 5 disk raid. It didn't
matter whether I started the search at 1 TB or 16 TB; the extracted
files always had errors.

Would this mean that my data offset is wrong for the 5 disk raid?

Best regards
// Patrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-25 11:08                                   ` Andreas Klauer
  2017-04-25 11:37                                     ` Patrik Dahlström
@ 2017-04-27 19:57                                     ` Patrik Dahlström
  2017-04-27 23:12                                       ` Andreas Klauer
  1 sibling, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-27 19:57 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid


On 04/25/2017 01:08 PM, Andreas Klauer wrote:
> On Tue, Apr 25, 2017 at 12:40:37PM +0200, Patrik Dahlström wrote:
>> You mentioned something about linear device mapping before. What is
>> that? Is it something I could experiment with? How do I do that?
> 
> https://www.kernel.org/doc/Documentation/device-mapper/linear.txt
> 
> Once you have found where data overlaps on both raids, you create 
> a linear mapping of start..X of the 6disk raid, followed by X..end 
> of the 5disk RAID.
> 
> That way you get a device that holds your data intact as a whole, 
> whereas the raid sets would give you the first half of data on 
> the 6disk raid set (what was already reshaped) and the other half 
> on the 5disk raid set (what had yet to be reshaped).

Success! Using 126M as the data offset gave me valid data for the 5 disk
raid, and using a linear device mapping I'm able to access my data again.
Some is probably corrupted from my previous destructive recovery
attempts, but it seems like most of the data is still there and
accessible. I will start backing up the most essential data now.

> This is only a way to get read access at the data, making the raid 
> work as a standalone again (resume reshape with lost raid metadata) 
> is another problem, to be tackled after you backed up your data ;)

Now, how do we do this?

// Patrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-27 19:57                                     ` Patrik Dahlström
@ 2017-04-27 23:12                                       ` Andreas Klauer
  2017-04-28  7:11                                         ` Patrik Dahlström
  2017-04-28 22:46                                         ` Patrik Dahlström
  0 siblings, 2 replies; 63+ messages in thread
From: Andreas Klauer @ 2017-04-27 23:12 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Thu, Apr 27, 2017 at 09:57:20PM +0200, Patrik Dahlström wrote:
> Success! Using 126M as the data offset gave me valid data for the 5 disk
> raid, and using a linear device mapping I'm able to access my data again.

Nice.

> Some is probably corrupted from my previous destructive recovery
> attempts, but it seems like most of the data is still there and
> accessible. I will start backing up the most essential data now.

If nothing else happened, I'd expect it to be intact (or at least
not any more corrupt than it would be after a regular power loss).
If you encounter any files with corrupted contents, you could use
`filefrag` or `hdparm --fibmap` to determine where they are physically
located and then perhaps see if that's anywhere near your danger zone.
It could still mean you didn't choose the correct X for the linear mapping.
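
For example (hypothetical path):

filefrag -v /storage/path/to/suspect-file.mkv       # lists the file's physical extents
hdparm --fibmap /storage/path/to/suspect-file.mkv   # same idea, prints LBAs, needs root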

But of course, I don't know everything that happened. :-P

> > This is only a way to get read access at the data, making the raid 
> > work as a standalone again (resume reshape with lost raid metadata) 
> > is another problem, to be tackled after you backed up your data ;)
> 
> Now, how do we do this?

Well, there's an elegant approach that works beautifully and perfectly 
and entirely risk free... and then there's a simple, stupid, quick&dirty.
Only problem is, I'm way too lazy to describe the elegant one. Shucks.

The quick and dirty method is... grow using dd from 5disk raid (overlay)
to 6disk raid (no overlay) starting from the correct offset (verify!). 
That resumes your reshape in an offline-ish, hackish manner, and once 
it's done you have the whole thing. That's the theory anyway.

Pseudocode:

mdadm --create /dev/md5 --assume-clean /dev/overlay/{a,b,c}
mdadm --create /dev/md6 --assume-clean /dev/{a,b,c,d} # +1 drive
losetup --offset=X /dev/loop55 /dev/md5
losetup --offset=X /dev/loop66 /dev/md6
cmp /dev/loop55 /dev/loop66 # much identical, so wow
dd bs=1M status=progress if=/dev/loop55 of=/dev/loop66

It's dangerous, okay?
I didn't tell you to do it, okay?
Don't blame me, okay?
Backup your stuff first, okay?

As this writes to your drives, you have only one shot to get it right.
You're not allowed to mount until dd is done. It's an offline operation.
If dd were to be aborted for any reason, the offset would shift to X+n.
If dd were to encounter write errors without aborting, corruption ensues.

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-27 23:12                                       ` Andreas Klauer
@ 2017-04-28  7:11                                         ` Patrik Dahlström
  2017-04-28  9:52                                           ` Andreas Klauer
  2017-04-28 22:46                                         ` Patrik Dahlström
  1 sibling, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-28  7:11 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid

2017-04-28 1:12 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
> On Thu, Apr 27, 2017 at 09:57:20PM +0200, Patrik Dahlström wrote:
>> Success! Using 126M as the data offset gave me valid data for the 5 disk
>> raid, and using a linear device mapping I'm able to access my data again.
>
> Nice.
>
>> Some is probably corrupted from my previous destructive recovery
>> attempts, but it seems like most of the data is still there and
>> accessible. I will start backing up the most essential data now.
>
> If nothing else happened, I'd expect it to be intact (or at least
> not any more corrupt than it would be after a regular power loss).
> If you encounter any files with corrupted contents, you could use
> `filefrag` or `hdparm --fibmap` to determine where it is physically
> located and then perhaps see if that's anywhere near your danger zone.
> It could still mean you didn't choose the correct X for linear mapping.
>
> But of course, I don't know everything that happened. :-P

During some of my experiments, reshaping was triggered with the wrong
metadata. This corrupted my data. I'm gonna ignore this and let fsck
solve it. I already tried fsck with overlays and a linear device mapping.
>
>> > This is only a way to get read access at the data, making the raid
>> > work as a standalone again (resume reshape with lost raid metadata)
>> > is another problem, to be tackled after you backed up your data ;)
>>
>> Now, how do we do this?
>
> Well, there's an elegant approach that works beautifully and perfectly
> and entirely risk free... and then there's a simple, stupid, quick&dirty.
> Only problem is, I'm way too lazy to describe the elegant one. Shucks.
Ha!
>
> The quick and dirty method is... grow using dd from 5disk raid (overlay)
> to 6disk raid (no overlay) starting from the correct offset (verify!).
> That resumes your reshape in an offline-ish, hackish manner, and once
> it's done you have the whole thing. That's the theory anyway.
>
> Pseudocode:
>
> mdadm --create /dev/md5 --assume-clean /dev/overlay/{a,b,c}
> mdadm --create /dev/md6 --assume-clean /dev/{a,b,c,d} # +1 drive
> losetup --offset=X /dev/loop55 /dev/md5
> losetup --offset=X /dev/loop66 /dev/md6
> cmp /dev/loop55 /dev/loop66 # much identical, so wow
> dd bs=1M status=progress if=/dev/loop55 of=/dev/loop66
Is it missing a 'skip=<offset/1M>'  here?
>
> It's dangerous, okay?
> I didn't tell you to do it, okay?
> Don't blame me, okay?
> Backup your stuff first, okay?
Ha!
>
> As this writes to your drives, you have only one shot to get it right.
> You're not allowed to mount until dd is done. It's an offline operation.
> If dd were to be aborted for any reason, the offset would shift to X+n.
> If dd were to encounter write errors without aborting, corruption ensues.
Swell!
>
> Regards
> Andreas Klauer
>

Thanks a lot. I will try this later.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-28  7:11                                         ` Patrik Dahlström
@ 2017-04-28  9:52                                           ` Andreas Klauer
  2017-04-28 10:31                                             ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-28  9:52 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Fri, Apr 28, 2017 at 09:11:45AM +0200, Patrik Dahlström wrote:
> Is it missing a 'skip=<offset/1M>'  here?

In this example the offset is provided by the loop devices.

If you don't like loop devices you can use seek=X skip=X (same X),
but you should triple-check that the offset is correct: data at this
offset should be identical on both arrays. (That is what my example
checks using `cmp`, which should report "differ: byte xxxxxxxxx".
Have a look at it with hexdump too, so you know it's not just zeroes...)
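
A sketch of that variant (the value of X here is made up, must be a
multiple of bs, and has to be verified before the final dd):

X=$((4500 * 1024 * 1024 * 1024))    # hypothetical merge offset in bytes, verify first!
cmp <(dd if=/dev/md5 bs=1M skip=$((X / 1048576)) count=64 2>/dev/null) \
    <(dd if=/dev/md6 bs=1M skip=$((X / 1048576)) count=64 2>/dev/null)
hexdump -C -n 64 -s "$X" /dev/md5   # make sure the overlap is real data, not just zeroes
dd bs=1M status=progress if=/dev/md5 of=/dev/md6 skip=$((X / 1048576)) seek=$((X / 1048576))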

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-28  9:52                                           ` Andreas Klauer
@ 2017-04-28 10:31                                             ` Patrik Dahlström
  2017-04-28 11:39                                               ` Andreas Klauer
  0 siblings, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-28 10:31 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid

2017-04-28 11:52 GMT+02:00, Andreas Klauer <Andreas.Klauer@metamorpher.de>:
> On Fri, Apr 28, 2017 at 09:11:45AM +0200, Patrik Dahlström wrote:
>> Is it missing a 'skip=<offset/1M>'  here?
>
> In this example the offset is provided by the loop devices.
>
Yes, I saw that after I sent my reply.

Can't I reduce my risks by doing it the other way around? Restore the
5 disk raid and then restart the reshape?
dd if=/dev/md6 bs=1M count=X/1M of=/dev/md5
mdadm --grow --raid-devices=6 /dev/md5 --add /dev/sdf
--backup-file=/root/grow_md5.bak

It's a bit more disk I/O, but less risk

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-28 10:31                                             ` Patrik Dahlström
@ 2017-04-28 11:39                                               ` Andreas Klauer
  0 siblings, 0 replies; 63+ messages in thread
From: Andreas Klauer @ 2017-04-28 11:39 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Fri, Apr 28, 2017 at 12:31:43PM +0200, Patrik Dahlström wrote:
> Can't I reduce my risks by doing it the other way around?

Not like this.

> dd if=/dev/md6 bs=1M count=X/1M of=/dev/md5

This writes into data you haven't read yet, so you end up reading data
you just wrote and writing that out again... the result will be garbage.

ddrescue has a reverse mode but implementation details matter a lot. 
It might be tempting because your progress is just 10-20%ish but 
going forwards is a lot safer here.

If you must go backwards you should use mdadm's revert-reshape 
for which you need RAID metadata that properly represents the 
current mid-grow state of your RAID (the elegant approach).

You can produce such metadata by

truncate -s <drivesize> a b c d e f
losetup --find --show
mdadm --create 5disk
mdadm --grow 6disk
<fallocate punchhole in the background>
<adapt sync_speed_min/max in the background>
<watch progress until X>
mdadm --stop
mdadm --examine <verify correct offsets and reshape pos'n>

Tadaa.
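
For the throttle/watch steps, a possible sketch using the standard md
sysfs knobs (the device name and speed values are made up):

echo 200  > /sys/block/md127/md/sync_speed_min
echo 2000 > /sys/block/md127/md/sync_speed_max   # slow it down so it can be stopped near X
watch cat /sys/block/md127/md/sync_completed     # "done / total", in sectors
mdadm --stop /dev/md127                          # once it has passed the wanted position
mdadm --examine /dev/loop0                       # offsets and "Reshape pos'n" should look right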

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-27 23:12                                       ` Andreas Klauer
  2017-04-28  7:11                                         ` Patrik Dahlström
@ 2017-04-28 22:46                                         ` Patrik Dahlström
  2017-04-29  9:56                                           ` Andreas Klauer
  1 sibling, 1 reply; 63+ messages in thread
From: Patrik Dahlström @ 2017-04-28 22:46 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid



On 04/28/2017 01:12 AM, Andreas Klauer wrote:
> On Thu, Apr 27, 2017 at 09:57:20PM +0200, Patrik Dahlström wrote:
> The quick and dirty method is... grow using dd from 5disk raid (overlay)
> to 6disk raid (no overlay) starting from the correct offset (verify!). 
> That resumes your reshape in an offline-ish, hackish manner, and once 
> it's done you have the whole thing. That's the theory anyway.
> 
> Pseudocode:
> 
> mdadm --create /dev/md5 --assume-clean /dev/overlay/{a,b,c}
> mdadm --create /dev/md6 --assume-clean /dev/{a,b,c,d} # +1 drive
I'm stuck at this step
$ mdadm --create --assume-clean /dev/md1 --data-offset=126464K --level=5 --raid-devices=6 /dev/sda /dev/sdb /dev/sdd /dev/sdc /dev/sde /dev/sdf
mdadm: cannot open /dev/sda: Device or resource busy

My overlay setup is keeping /dev/sd[abdce] busy. How do I set up the
overlay so that it doesn't occupy the disks?
Overlays are currently created like this:
$ dmsetup create ${b} --table "0 $size_bkl snapshot $d $loop P 8"

Best regards
// Patrik


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-28 22:46                                         ` Patrik Dahlström
@ 2017-04-29  9:56                                           ` Andreas Klauer
  2017-05-02 13:08                                             ` Patrik Dahlström
  0 siblings, 1 reply; 63+ messages in thread
From: Andreas Klauer @ 2017-04-29  9:56 UTC (permalink / raw)
  To: Patrik Dahlström; +Cc: Brad Campbell, linux-raid

On Sat, Apr 29, 2017 at 12:46:15AM +0200, Patrik Dahlström wrote:
> mdadm: cannot open /dev/sda: Device or resource busy

Well, normally you're not supposed to do this. :-)
Easy to wreck your data if you do.

> My overlay setup is keeping /dev/sd[abdce] busy. How do I setup the
> overlay to keep it from occupying the disks?

You could create the RAID first and the overlay afterwards.
Or you can create loop devices on the busy devices and use those.
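
For the second option, a rough sketch; the data offset and drive order
are copied from the failed --create above, and the loop names are
whatever --find hands out:

loops=()
for d in /dev/sd{a,b,d,c,e,f}; do
    loops+=( "$(losetup --find --show "$d")" )   # loop devices don't need exclusive access
done
mdadm --create --assume-clean /dev/md1 --data-offset=126464K \
      --level=5 --raid-devices=6 "${loops[@]}"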

Regards
Andreas Klauer

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-04-29  9:56                                           ` Andreas Klauer
@ 2017-05-02 13:08                                             ` Patrik Dahlström
  2017-05-02 13:11                                               ` Brad Campbell
  2017-05-02 15:49                                               ` Anthony Youngman
  0 siblings, 2 replies; 63+ messages in thread
From: Patrik Dahlström @ 2017-05-02 13:08 UTC (permalink / raw)
  To: Andreas Klauer; +Cc: Brad Campbell, linux-raid



On 04/29/2017 11:56 AM, Andreas Klauer wrote:
> On Sat, Apr 29, 2017 at 12:46:15AM +0200, Patrik Dahlström wrote:
> You could create the RAID first and the overlay afterwards.
> Or you can create loop devices on the busy devices and use those.
I solved it by creating loop devices and using dd. I let it finish, then
ran fsck and resize2fs on the partition, and now my data is back!
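
(Roughly, with the 6-disk array assembled as /dev/md1 as earlier in the
thread, that last step would have been something like:)

e2fsck -f /dev/md1      # forced check before an offline resize
resize2fs /dev/md1      # grow ext4 to fill the new 6-disk capacity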

Thank you all for your invaluable help!

Would it be worth it for me to write up an article on the wiki on how to
re-create messed up metadata?

Best regards
// Patrik

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-05-02 13:08                                             ` Patrik Dahlström
@ 2017-05-02 13:11                                               ` Brad Campbell
  2017-05-02 15:49                                               ` Anthony Youngman
  1 sibling, 0 replies; 63+ messages in thread
From: Brad Campbell @ 2017-05-02 13:11 UTC (permalink / raw)
  To: Patrik Dahlström, Andreas Klauer; +Cc: linux-raid

On 02/05/17 21:08, Patrik Dahlström wrote:

>
> On 04/29/2017 11:56 AM, Andreas Klauer wrote:
>> On Sat, Apr 29, 2017 at 12:46:15AM +0200, Patrik Dahlström wrote:
>> You could create the RAID first and the overlay afterwards.
>> Or you can create loop devices on the busy devices and use those.
> I solved it by creating loop devices and using dd. I let it finish, then
> ran fsck and resize2fs on the partition and now my data is back!
>
> Thank you all for your invaluable help!
>
> Would it be worth it for me to write up an article on the wiki on how to
> re-create messed up metadata?
>
> Best regards
> // Patrik
>

Yes please! And maybe an extra warning about getting full --detail and 
--examine information *before* making any alterations to an array!

Regards,
-- 
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Recover array after I panicked
  2017-05-02 13:08                                             ` Patrik Dahlström
  2017-05-02 13:11                                               ` Brad Campbell
@ 2017-05-02 15:49                                               ` Anthony Youngman
  1 sibling, 0 replies; 63+ messages in thread
From: Anthony Youngman @ 2017-05-02 15:49 UTC (permalink / raw)
  To: Patrik Dahlström, Andreas Klauer; +Cc: Brad Campbell, linux-raid

On 02/05/17 14:08, Patrik Dahlström wrote:
>
>
> On 04/29/2017 11:56 AM, Andreas Klauer wrote:
>> On Sat, Apr 29, 2017 at 12:46:15AM +0200, Patrik Dahlström wrote:
>> You could create the RAID first and the overlay afterwards.
>> Or you can create loop devices on the busy devices and use those.
> I solved it by creating loop devices and using dd. I let it finish, then
> ran fsck and resize2fs on the partition and now my data is back!
>
> Thank you all for your invaluable help!
>
> Would it be worth it for me to write up an article on the wiki on how to
> re-create messed up metadata?
>
Hey - I was planning to do that! :-)

Seriously, yes of course, please do. That would be wonderful. It belongs 
best after "Recovering a damaged RAID" in the "When things go Wrogn" 
section, I think.

Just skim the contribution guidelines before you start; they're nothing
onerous: just use good English, use "I" a lot, and take responsibility
for what you write. It's meant to be easy reading but informative. A
case study of what you did would be marvelous.

I need to edit the page about that script you ran, and put a big big 
warning in front of it not to run it recklessly :-)

(Contact me off-line if you haven't got an account and have difficulty 
creating one.)

Cheers,
Wol

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2017-05-02 15:49 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-23  9:47 Recover array after I panicked Patrik Dahlström
2017-04-23 10:16 ` Andreas Klauer
2017-04-23 10:23   ` Patrik Dahlström
2017-04-23 10:46     ` Andreas Klauer
2017-04-23 11:12       ` Patrik Dahlström
2017-04-23 11:36         ` Wols Lists
2017-04-23 11:47           ` Patrik Dahlström
2017-04-23 11:53             ` Reindl Harald
2017-04-23 11:58           ` Roman Mamedov
2017-04-23 12:11             ` Wols Lists
2017-04-23 12:15               ` Patrik Dahlström
2017-04-24 21:04                 ` Phil Turmel
2017-04-24 21:56                   ` Patrik Dahlström
2017-04-24 23:35                     ` Phil Turmel
2017-04-23 13:16         ` Andreas Klauer
2017-04-23 13:49           ` Patrik Dahlström
2017-04-23 14:36             ` Andreas Klauer
2017-04-23 14:45               ` Patrik Dahlström
2017-04-23 12:32     ` Patrik Dahlström
2017-04-23 12:45       ` Andreas Klauer
2017-04-23 12:57         ` Patrik Dahlström
2017-04-23 14:06 ` Brad Campbell
2017-04-23 14:09   ` Patrik Dahlström
2017-04-23 14:20     ` Patrik Dahlström
2017-04-23 14:25     ` Brad Campbell
2017-04-23 14:48   ` Andreas Klauer
2017-04-23 15:11     ` Patrik Dahlström
2017-04-23 15:24       ` Patrik Dahlström
2017-04-23 15:42       ` Andreas Klauer
2017-04-23 16:29         ` Patrik Dahlström
2017-04-23 19:21         ` Patrik Dahlström
2017-04-24  2:09           ` Brad Campbell
2017-04-24  7:34             ` Patrik Dahlström
2017-04-24 11:04               ` Andreas Klauer
2017-04-24 12:13                 ` Patrik Dahlström
2017-04-24 12:37                   ` Andreas Klauer
2017-04-24 12:54                     ` Patrik Dahlström
2017-04-24 13:39                       ` Andreas Klauer
2017-04-24 14:05                         ` Patrik Dahlström
2017-04-24 14:21                           ` Andreas Klauer
2017-04-24 16:00                           ` Patrik Dahlström
2017-04-24 23:00                         ` Patrik Dahlström
2017-04-25  0:16                           ` Andreas Klauer
2017-04-25  8:44                             ` Patrik Dahlström
2017-04-25  9:01                               ` Andreas Klauer
2017-04-25 10:40                                 ` Patrik Dahlström
2017-04-25 10:51                                   ` Patrik Dahlström
2017-04-25 11:08                                   ` Andreas Klauer
2017-04-25 11:37                                     ` Patrik Dahlström
2017-04-25 12:41                                       ` Andreas Klauer
2017-04-25 18:22                                       ` Wols Lists
2017-04-27 19:57                                     ` Patrik Dahlström
2017-04-27 23:12                                       ` Andreas Klauer
2017-04-28  7:11                                         ` Patrik Dahlström
2017-04-28  9:52                                           ` Andreas Klauer
2017-04-28 10:31                                             ` Patrik Dahlström
2017-04-28 11:39                                               ` Andreas Klauer
2017-04-28 22:46                                         ` Patrik Dahlström
2017-04-29  9:56                                           ` Andreas Klauer
2017-05-02 13:08                                             ` Patrik Dahlström
2017-05-02 13:11                                               ` Brad Campbell
2017-05-02 15:49                                               ` Anthony Youngman
2017-04-25 23:01                 ` Patrik Dahlström
