* on assembly and recovery of a hardware RAID
@ 2017-03-10 20:37 Alfred Matthews
  2017-03-13 21:32 ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Alfred Matthews @ 2017-03-10 20:37 UTC (permalink / raw)
  To: linux-raid

Hello list. I'm facing a non-redundant Western Digital hardware RAID
whose hardware seems to cause a kernel panic after about 3 seconds of
running time.

I've run the customary tests. The drives appear to be striped RAID 0.

Output: http://pastebin.com/c361jGVx

Evidently the WD metadata format has changed over time, since a newer
enclosure (which adds USB) will not recognize the drives without
erasing them. Files are visible as files for the short period while
the controller stays healthy.

My plan is to assemble a 2 x 3TB RAID array from the original WD disks
attached directly as SATA.

Seeking input on the proper mdadm configuration for this.

Then I hope to recover the files, as files, from this 2 x 3TB array
onto standalone disks. Ultimately they would need to move to a new RAID.

Failing: WD My Book Thunderbolt Duo, 2x3TB
New, incompatible: WD My Book Pro, 2x3TB.

Thanks for any comment.

Thanks for your time.

Al Matthews

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-10 20:37 on assembly and recovery of a hardware RAID Alfred Matthews
@ 2017-03-13 21:32 ` NeilBrown
  2017-03-14 17:27   ` Alfred Matthews
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2017-03-13 21:32 UTC (permalink / raw)
  To: Alfred Matthews, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1605 bytes --]

On Fri, Mar 10 2017, Alfred Matthews wrote:

> Hello list. I'm facing a non-redundant Western Digital hardware RAID
> whose hardware seems to cause a kernel panic after about 3 seconds of
> running time.
>
> I've run the customary tests. The drives appear to be striped RAID 0.
>
> Output: http://pastebin.com/c361jGVx
>
> Evidently the WD metadata format has changed over time, since a newer
> enclosure (which adds USB) will not recognize the drives without
> erasing them. Files are visible as files for the short period while
> the controller stays healthy.
>
> My plan is to assemble a 2 x 3TB RAID array from the original WD disks
> attached directly as SATA.
>
> Seeking input on the proper mdadm configuration for this.
>
> Then I hope to recover the files, as files, from this 2 x 3TB array
> onto standalone disks. Ultimately they would need to move to a new RAID.
>
> Failing: WD My Book Thunderbolt Duo, 2x3TB
> New, incompatible: WD My Book Pro, 2x3TB.
>
> Thanks for any comment.
>
> Thanks for your time.
>

Does
  dmraid -b /dev/sda /dev/sdb /dev/sdc
tell you anything useful?


You will probably want a command like:

 mdadm --build /dev/md0 -l0 -n2 --chunk=SOMETHING /dev/sdXX /dev/sdYY

where SOMETHING is the chunk size. e.g. 64K or 512K or something.
Doing this is non-destructive so you can try several different chunk sizes,
using "mdadm --stop /dev/md0" to reset before trying again.

After building the array, try "cfdisk /dev/md0" or maybe "fdisk
/dev/md0" to look at the partition table.

What filesystem(s) did you have on the device? Maybe "fsck -n /dev/md0p1"
might tell you if the filesystem looks OK.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-13 21:32 ` NeilBrown
@ 2017-03-14 17:27   ` Alfred Matthews
  2017-03-15  2:56     ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Alfred Matthews @ 2017-03-14 17:27 UTC (permalink / raw)
  To: NeilBrown, linux-raid

> Does
>   dmraid -b /dev/sda /dev/sdb /dev/sdc
> tell you anything useful?
>

# dmraid -b /dev/sdb /dev/sdc
/dev/sdc:   5860533168 total, "WD-WCAWZ2927144"
/dev/sdb:   5860533168 total, "WD-WCAWZ2939730"

>
> You will probably want a command like:
>
>  mdadm --build /dev/md0 -l0 -n2 --chunk=SOMETHING /dev/sdXX /dev/sdYY
>
> where SOMETHING is the chunk size. e.g. 64K or 512K or something.
> Doing this is non-destructive so you can try several different chunk sizes,
> using "mdadm --stop /dev/md0" to reset before trying again.
>

# mdadm --build /dev/md0 -l0 -n2 --chunk=512K /dev/sdb1 /dev/sdc1
mdadm: array /dev/md0 built and started.

> After building the array, try "cfdisk /dev/md0" or maybe "fdisk
> /dev/md0" to look at the partition table.
>

# fdisk /dev/md0

Command (m for help): p
Disk /dev/md0: 400 MiB, 419430400 bytes, 819200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disklabel type: dos
Disk identifier: 0x00000000

> What filesystem(s) did you have on the device?

In fact, I'm insufficiently familiar with fsck to know why it's
running fsck.fat, except to suppose that that's the boot-sector
format. The data partitions themselves are Apple RAID and presumably
HFS-something.

Maybe "fsck -n /dev/md0p1"
> might tell you if the filesystem looks OK.
>

# fsck -n /dev/md0
fsck from util-linux 2.28.2
fsck.fat 4.0 (2016-05-06)
FATs differ - using first FAT.
Cluster 126974 out of range (43014379 > 403267). Setting to EOF.
Cluster 126975 out of range (2114643 > 403267). Setting to EOF.
Cluster 126976 out of range (3419700 > 403267). Setting to EOF.
Cluster 126977 out of range (2097410 > 403267). Setting to EOF.
Cluster 126980 out of range (1048608 > 403267). Setting to EOF.
Cluster 126982 out of range (409600 > 403267). Setting to EOF.
Cluster 126990 out of range (19464192 > 403267). Setting to EOF.
Cluster 126991 out of range (91280919 > 403267). Setting to EOF.
Cluster 126992 out of range (2115910 > 403267). Setting to EOF.
Cluster 126993 out of range (2105376 > 403267). Setting to EOF.
Cluster 126994 out of range (21372960 > 403267). Setting to EOF.
Cluster 126995 out of range (3289940 > 403267). Setting to EOF.
Cluster 126996 out of range (33169440 > 403267). Setting to EOF.
Cluster 126997 out of range (214994624 > 403267). Setting to EOF.
Cluster 126998 out of range (251362304 > 403267). Setting to EOF.
Cluster 127000 out of range (164004702 > 403267). Setting to EOF.
Cluster 127001 out of range (201328571 > 403267). Setting to EOF.
Cluster 127002 out of range (79725740 > 403267). Setting to EOF.
Cluster 127003 out of range (219067398 > 403267). Setting to EOF.
Cluster 127004 out of range (16116496 > 403267). Setting to EOF.
Cluster 127005 out of range (219598308 > 403267). Setting to EOF.
Cluster 127006 out of range (235539737 > 403267). Setting to EOF.
Cluster 127007 out of range (53309039 > 403267). Setting to EOF.
Cluster 127008 out of range (91517817 > 403267). Setting to EOF.
Cluster 127009 out of range (157556845 > 403267). Setting to EOF.
Cluster 127010 out of range (168651635 > 403267). Setting to EOF.
Cluster 127011 out of range (56980048 > 403267). Setting to EOF.
Cluster 127012 out of range (241246323 > 403267). Setting to EOF.
Cluster 127013 out of range (90906745 > 403267). Setting to EOF.
Cluster 127014 out of range (259268729 > 403267). Setting to EOF.
Cluster 127015 out of range (40202784 > 403267). Setting to EOF.
Cluster 127016 out of range (225734511 > 403267). Setting to EOF.
Cluster 127101 out of range (173342720 > 403267). Setting to EOF.
Cluster 127102 out of range (23155282 > 403267). Setting to EOF.
Cluster 127223 out of range (21066354 > 403267). Setting to EOF.
Cluster 127229 out of range (173342720 > 403267). Setting to EOF.
Cluster 127742 out of range (43014379 > 403267). Setting to EOF.
Cluster 127743 out of range (2114643 > 403267). Setting to EOF.
Cluster 127744 out of range (3419700 > 403267). Setting to EOF.
Cluster 127745 out of range (2097410 > 403267). Setting to EOF.
Cluster 127748 out of range (1048608 > 403267). Setting to EOF.
Cluster 127750 out of range (409600 > 403267). Setting to EOF.
Cluster 127758 out of range (19464192 > 403267). Setting to EOF.
Cluster 127759 out of range (91280919 > 403267). Setting to EOF.
Cluster 127760 out of range (2115910 > 403267). Setting to EOF.
Cluster 127761 out of range (2105376 > 403267). Setting to EOF.
Cluster 127762 out of range (21372960 > 403267). Setting to EOF.
Cluster 127763 out of range (3289940 > 403267). Setting to EOF.
Cluster 127764 out of range (33169440 > 403267). Setting to EOF.
Cluster 127765 out of range (214994624 > 403267). Setting to EOF.
Cluster 127766 out of range (251362304 > 403267). Setting to EOF.
Cluster 127768 out of range (164004702 > 403267). Setting to EOF.
Cluster 127769 out of range (201328571 > 403267). Setting to EOF.
Cluster 127770 out of range (79725740 > 403267). Setting to EOF.
Cluster 127771 out of range (219067398 > 403267). Setting to EOF.
Cluster 127772 out of range (16116496 > 403267). Setting to EOF.
Cluster 127773 out of range (219598308 > 403267). Setting to EOF.
Cluster 127774 out of range (235539737 > 403267). Setting to EOF.
Cluster 127775 out of range (53309039 > 403267). Setting to EOF.
Cluster 127776 out of range (91517817 > 403267). Setting to EOF.
Cluster 127777 out of range (157556845 > 403267). Setting to EOF.
Cluster 127778 out of range (168651635 > 403267). Setting to EOF.
Cluster 127779 out of range (56980048 > 403267). Setting to EOF.
Cluster 127780 out of range (241246323 > 403267). Setting to EOF.
Cluster 127781 out of range (90906745 > 403267). Setting to EOF.
Cluster 127782 out of range (259268729 > 403267). Setting to EOF.
Cluster 127783 out of range (40202784 > 403267). Setting to EOF.
Cluster 127784 out of range (225734511 > 403267). Setting to EOF.
Cluster 127869 out of range (173342720 > 403267). Setting to EOF.
Cluster 127870 out of range (23155282 > 403267). Setting to EOF.
Cluster 127991 out of range (21066354 > 403267). Setting to EOF.
Cluster 127997 out of range (173342720 > 403267). Setting to EOF.
Cluster 131070 out of range (268435440 > 403267). Setting to EOF.
Reclaimed 93 unused clusters (47616 bytes).
Leaving filesystem unchanged.
/dev/md0: 0 files, 1/403266 clusters

Thanks for your comments so far.

Al.

> NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-14 17:27   ` Alfred Matthews
@ 2017-03-15  2:56     ` NeilBrown
  2017-03-18 17:11       ` Alfred Matthews
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2017-03-15  2:56 UTC (permalink / raw)
  To: Alfred Matthews, linux-raid

[-- Attachment #1: Type: text/plain, Size: 927 bytes --]

On Tue, Mar 14 2017, Alfred Matthews wrote:

>> Does
>>   dmraid -b /dev/sda /dev/sdb /dev/sdc
>> tell you anything useful?
>>
>
> # dmraid -b /dev/sdb /dev/sdc
> /dev/sdc:   5860533168 total, "WD-WCAWZ2927144"
> /dev/sdb:   5860533168 total, "WD-WCAWZ2939730"


No, not useful.

Your other output also doesn't show anything interesting.
I had another look at the lsdrv output you showed before and I'm
wondering if there really is anything "hardware RAID" here at all.

Both drives (sdb and sdc) are partitioned into a 200MB EFI partition, a
2.75TB hfsplus (Apple file system) / unknown partition, and a 128M
hfsplus boot partition.

Maybe the two 2.75TB partitions were raided together.
sdc looks like the "first" device if this were the case.
Try:

  mdadm --build /dev/md0 --level=0 -n2 --chunk=512K /dev/sdc2 /dev/sdb2

Then try fsck.hfs or hpfsck ... in the "hfsplus" package on Debian.
 hpfsck /dev/md0
maybe.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-15  2:56     ` NeilBrown
@ 2017-03-18 17:11       ` Alfred Matthews
  2017-03-18 18:08         ` Alfred Matthews
  0 siblings, 1 reply; 14+ messages in thread
From: Alfred Matthews @ 2017-03-18 17:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Sorry for the delay. Thanks for the reply.

You're correct, there is no hardware RAID in the available lsdrv; I've
removed each drive from its housing in the controller because it's
Thunderbolt-only, Thunderbolt is borked on OSX here, and I'm rescuing
on Linux. I appreciate your diagnosis; I will try it and write back.

On 14 March 2017 at 22:56, NeilBrown <neilb@suse.com> wrote:
> On Tue, Mar 14 2017, Alfred Matthews wrote:
>
>>> Does
>>>   dmraid -b /dev/sda /dev/sdb /dev/sdc
>>> tell you anything useful?
>>>
>>
>> # dmraid -b /dev/sdb /dev/sdc
>> /dev/sdc:   5860533168 total, "WD-WCAWZ2927144"
>> /dev/sdb:   5860533168 total, "WD-WCAWZ2939730"
>
>
> No, not useful.
>
> Your other output also doesn't show anything interesting.
> I had another look at the lsdrv output you showed before and I'm
> wondering if there really is anything "hardware RAID" here at all.
>
> Both drives (sdb and sdc) are partitioned into a 200MB EFI partition, a
> 2.75TB hfsplus (Apple file system) / unknown partition, and a 128M
> hfsplus boot partition.
>
> Maybe the two 2.75TB partitions were raided together.
> sdc looks like the "first" device if this were the case.
> Try:
>
>   mdadm --build /dev/md0 --level=0 -n2 --chunk=512K /dev/sdc2 /dev/sdb2
>
> Then try fsck.hfs or hpfsck ... in the "hfsplus" package on Debian.
>  hpfsck /dev/md0
> maybe.
>
> NeilBrown



-- 

Alfred S. Matthews

Software, Visuals, Music

Atlanta, Georgia, US

+1.337.214.4688

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-18 17:11       ` Alfred Matthews
@ 2017-03-18 18:08         ` Alfred Matthews
  2017-03-20  5:34           ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Alfred Matthews @ 2017-03-18 18:08 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

I've now switched to the backup drives, which are clones of the
originals, so destructive operations are OK if necessary. Also,
signatures will have changed.

0. Hm. Evidently the system is JHFS instead of HFS+, per the output
below. Unsure if there is separate tooling in Debian.

1. Mount via

mdadm --build /dev/md0 --level=0 -n2 --chunk=512K /dev/sdc2 /dev/sdb2

works just fine. Thanks!

2. I'm still sticking with the non-destructive, non-mount edits for
now. So I can report the following:

hpfsck -v /dev/md0 | cat >> hpfsck_output.txt

yields some stuff probably more enlightening than prior.

*** Checking Volume Header:
This HFS+ volume is not wrapped.
signature       : +H
version         : 4
attributes      : 0X80002100
last_mount_vers : JSFH
reserved        : 11178
create_date     : Mon Oct 15 05:03:21 2012
modify_date     : Sat Mar  4 15:53:51 2017
backup_date     : Thu Dec 31 19:00:00 1903
checked_date    : Sun Oct 14 22:03:21 2012
file_count      : 961818
folder_count    : 192894
blocksize       : 2000
total_blocks    : 732482664
free_blocks     : 30231127
next_alloc      : 517947667
rsrc_clump_sz   : 65536
data_clump_sz   : 65536
next_cnid       : 1885617
write_count     : 348970204
encodings_bmp   : 0X200008B
                  Allocation file
total_size          : 0X5752000
clump_size          : 0X5752000
total_blocks        : 0X2BA9
extents             : (0X1+0X2BA9) (0+0) (0+0) (0+0) (0+0) (0+0) (0+0) (0+0)
                  Extension file
total_size          : 0X1400000
clump_size          : 0X1400000
total_blocks        : 0XA00
extents             : (0X10BAB+0XA00) (0+0) (0+0) (0+0) (0+0) (0+0) (0+0) (0+0)
                  Catalog file
total_size          : 0X30800000
clump_size          : 0X18400000
total_blocks        : 0X18400
extents             : (0X96BAB+0XC200) (0X15586D+0XC200) (0+0) (0+0)
(0+0) (0+0) (0+0) (0+0)
                  Attribute file
total_size          : 0X18400000
clump_size          : 0X18400000
total_blocks        : 0XC200
extents             : (0X115AB+0XC200) (0+0) (0+0) (0+0) (0+0) (0+0) (0+0) (0+0)
                  Start file
total_size          : 0
clump_size          : 0
total_blocks        : 0
extents             : (0+0) (0+0) (0+0) (0+0) (0+0) (0+0) (0+0) (0+0)
Reserved attribute in use: 2000
Reserved attribute in use: 80000000
Volume was last Mounted by unknnown implemenatation:
JSFH
Invalid total blocks 2BA8CC68, expected 0 Done ***
*** Checking Backup Volume Header:
Unexpected Volume signature '  ' expected 'H+'

There is also the following on stderr

hpfsck: hpfsck: error writing to medium (Bad file descriptor)

Al.

On 18 March 2017 at 13:11, Alfred Matthews <asm13243546@gmail.com> wrote:
> Sorry for the delay. Thanks for the reply.
>
> You're correct, there is no hardware RAID in the available lsdrv; I've
> removed each drive from its housing in the controller because it's
> Thunderbolt-only, Thunderbolt is borked on OSX here, and I'm rescuing
> on Linux. I appreciate your diagnosis; I will try it and write back.
>
> On 14 March 2017 at 22:56, NeilBrown <neilb@suse.com> wrote:
>> On Tue, Mar 14 2017, Alfred Matthews wrote:
>>
>>>> Does
>>>>   dmraid -b /dev/sda /dev/sdb /dev/sdc
>>>> tell you anything useful?
>>>>
>>>
>>> # dmraid -b /dev/sdb /dev/sdc
>>> /dev/sdc:   5860533168 total, "WD-WCAWZ2927144"
>>> /dev/sdb:   5860533168 total, "WD-WCAWZ2939730"
>>
>>
>> No, not useful.
>>
>> Your other output also doesn't show anything interesting.
>> I had another look at the lsdrv output you showed before and I'm
>> wondering if there really is anything "hardware RAID" here at all.
>>
>> Both drives (sdb and sdc) are partitioned into a 200MB EFI partition, a
>> 2.75TB hfsplus (Apple file system) / unknown partition, and a 128M
>> hfsplus boot partition.
>>
>> Maybe the two 2.75TB partitions were raided together.
>> sdc looks like the "first" device if this were the case.
>> Try:
>>
>>   mdadm --build /dev/md0 --level=0 -n2 --chunk=512K /dev/sdc2 /dev/sdb2
>>
>> Then try fsck.hfs or hpfsck ... in the "hfsplus" package on Debian.
>>  hpfsck /dev/md0
>> maybe.
>>
>> NeilBrown
>
>
>
> --
>
> Alfred S. Matthews
>
> Software, Visuals, Music
>
> Atlanta, Georgia, US
>
> +1.337.214.4688



-- 

Alfred S. Matthews

Software, Visuals, Music

Atlanta, Georgia, US

+1.337.214.4688

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-18 18:08         ` Alfred Matthews
@ 2017-03-20  5:34           ` NeilBrown
  2017-03-20 21:42             ` Alfred Matthews
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2017-03-20  5:34 UTC (permalink / raw)
  To: Alfred Matthews; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1346 bytes --]

On Sat, Mar 18 2017, Alfred Matthews wrote:

> I've now switched to the backup drives, which are clones of the
> originals, so destructive operations are OK if necessary. Also,
> signatures will have changed.
>
> 0. Hm. Evidently the system is JHFS instead of HFS+, per the output
> below. Unsure if there is separate tooling in Debian.
>
> 1. Mount via
>
> mdadm --build /dev/md0 --level=0 -n2 --chunk=512K /dev/sdc2 /dev/sdb2
>
> works just fine. Thanks!
>
> 2. I'm still sticking with the non-destructive, non-mount edits for
> now. So I can report the following:
>
> hpfsck -v /dev/md0 | cat >> hpfsck_output.txt
>
> yields some stuff probably more enlightening than prior.

This is promising until:


> *** Checking Backup Volume Header:
> Unexpected Volume signature '  ' expected 'H+'

Here the backup volume header, which is 2 blocks (blocks are 8K) from
the end of the device, looks wrong.
This probably means the chunk size is wrong.
I would suggest trying different chunksizes, starting at 4K and
doubling, until this message goes away.
That still might not be the correct chunk size, so I would continue up
to several megabytes and find all the chunksizes that seem to work.
Then look at what else hpfsck says on those.
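
If it helps the sweep, below is a rough, untested sketch in C of a quick
check you could run after each --build attempt instead of a full hpfsck
pass (the file name and program name are just placeholders).  It reads
the last two 8K blocks of the assembled device and looks for the 'H+'
signature, plus the version field of 4 shown in your hpfsck output, at
every 512-byte boundary, so it doesn't assume the header starts exactly
on a block boundary.

/* backup_check.c - look for an HFS+ volume header signature in the last
 * two 8K blocks of a device.  Exit status 0 if found, 1 if not.
 * Usage: backup_check /dev/md0
 */
#define _XOPEN_SOURCE 700       /* for pread() */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLK 8192                /* HFS+ block size from the hpfsck output */
#define TAIL (2 * BLK)          /* scan the last two blocks */

int main(int argc, char **argv)
{
    unsigned char buf[TAIL];
    off_t end;
    int fd, i, found = 0;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <device>\n", argv[0]);
        return 2;
    }
    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror(argv[1]);
        return 2;
    }
    end = lseek(fd, 0, SEEK_END);           /* device size in bytes */
    if (end < (off_t)TAIL) {
        fprintf(stderr, "device smaller than %d bytes\n", TAIL);
        close(fd);
        return 2;
    }
    if (pread(fd, buf, TAIL, end - TAIL) != TAIL) {
        perror("pread");
        close(fd);
        return 2;
    }
    close(fd);
    /* 'H' '+' 0x00 0x04 = HFS+ signature and version */
    for (i = 0; i + 4 <= TAIL; i += 512)
        if (buf[i] == 0x48 && buf[i+1] == 0x2B &&
            buf[i+2] == 0x00 && buf[i+3] == 0x04) {
            printf("backup header candidate %d bytes from the end\n",
                   TAIL - i);
            found = 1;
        }
    if (!found)
        printf("no 'H+' signature near the end of the device\n");
    return found ? 0 : 1;
}

The idea would be to rebuild /dev/md0 with each candidate chunk size,
run this against it, and "mdadm --stop /dev/md0" before the next
attempt; only chunk sizes that produce a hit are worth a full hpfsck.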

BTW, this:
> Invalid total blocks 2BA8CC68, expected 0 Done ***
is not a real problem, just some odd code.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-20  5:34           ` NeilBrown
@ 2017-03-20 21:42             ` Alfred Matthews
  2017-03-21  2:38               ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Alfred Matthews @ 2017-03-20 21:42 UTC (permalink / raw)
  To: NeilBrown, linux-raid

>> *** Checking Backup Volume Header:
>> Unexpected Volume signature '  ' expected 'H+'
>
> Here the backup volume header, which is 2 blocks (blocks are 8K) from
> the end of the device, looks wrong.
> This probably means the chunk size is wrong.
> I would suggest trying different chunksizes, starting at 4K and
> doubling, until this message goes away.
> That still might not be the correct chunk size, so I would continue up
> to several megabytes and find all the chunksizes that seem to work.
> Then look at what else hpfsck says on those.

I'm not actually able to generate happy output in hpfsck using any of
the following multiples of 4K

4
[...]
8192
16384
32768
65536
131072
262144
524288
1048576
2097152

Any chance it's not really an HFS system at all?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-20 21:42             ` Alfred Matthews
@ 2017-03-21  2:38               ` NeilBrown
  2017-03-21 12:09                 ` Alfred Matthews
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2017-03-21  2:38 UTC (permalink / raw)
  To: Alfred Matthews, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

On Mon, Mar 20 2017, Alfred Matthews wrote:

>>> *** Checking Backup Volume Header:
>>> Unexpected Volume signature '  ' expected 'H+'
>>
>> Here the backup volume header, which is 2 blocks (blocks are 8K) from
>> the end of the device, looks wrong.
>> This probably means the chunk size is wrong.
>> I would suggest trying different chunksizes, starting at 4K and
>> doubling, until this message goes away.
>> That still might not be the correct chunk size, so I would continue up
>> to several megabytes and find all the chunksizes that seem to work.
>> Then look at what else hpfsck says on those.
>
> I'm not actually able to generate happy output in hpfsck using any of
> the following multiples of 4K
>
> 4
> [...]
> 8192
> 16384
> 32768
> 65536
> 131072
> 262144
> 524288
> 1048576
> 2097152
>
> Any chance it's not really an HFS system at all?

Not likely.  hpfsck finds a perfectly valid superblock (or "Volume
Header") at the start of the device.  It just cannot find the end one.

The blocksize is:
     blocksize       : 2000

which is in HEX, so 8K.
The total_blocks is:
     total_blocks    : 732482664

which are 8K blocks, so 5859861312K or 5.4TB (using 1024*1024*1024).
which matches the fact that each partition is 2.73TB.

The problem seems to be that we are not combining the two partitions
together in the correct way to create the original 5.4TB partition.

All we know is that the backup volume header should look
much like the main header, and particularly should have 'H+' in the
signature, which is the first 2 bytes.
i.e. the first two bytes of the volume headers should be
0x482B

The second (8K) block of the disk must look like this, and
the second last should as well.
If you can search through both devices for all 8K blocks which
start with 0x4A2B, that might give us a hint what to look for.
I would write a C program to do this.  It might take a while to run, but
you can test on the first device, as you know block 2 matches.
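
To be concrete, a minimal, untested sketch of such a scanner (plain C,
standard library only; the program name is a placeholder) might look
like the following.  It checks every 512-byte boundary rather than only
8K boundaries, in case a header doesn't sit exactly at the start of an
8K block, and it also requires the version field to be 4, as in your
hpfsck output, to cut down on chance matches.

/* hfsp_scan.c - scan a device for 512-byte sectors that start with the
 * HFS+ volume header signature 'H' '+' (0x48 0x2B) followed by the
 * version word 0x0004.  Prints the byte offset of each candidate.
 * Usage: hfsp_scan /dev/sdc2
 */
#include <stdio.h>

#define CHUNK (1024 * 1024)     /* read 1MB at a time */
#define SECTOR 512

int main(int argc, char **argv)
{
    static unsigned char buf[CHUNK];
    unsigned long long offset = 0;
    size_t got, i;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <device>\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "rb");
    if (!f) {
        perror(argv[1]);
        return 1;
    }
    while ((got = fread(buf, 1, CHUNK, f)) > 0) {
        for (i = 0; i + 4 <= got; i += SECTOR)
            if (buf[i] == 0x48 && buf[i+1] == 0x2B &&
                buf[i+2] == 0x00 && buf[i+3] == 0x04)
                printf("candidate header at byte %llu (sector %llu)\n",
                       offset + i, (offset + i) / SECTOR);
        offset += got;
    }
    fclose(f);
    return 0;
}

Sanity-check it on /dev/sdc2 first, where you know a header exists near
the start, then run it over both partitions; the offsets of any hits
near the ends of the devices might hint at the chunk size.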


Hmmm... I've got a new theory.  The code is broken.
fscheck_read_wrapper() in libhfsp/src/fscheck.c should set
vol->maxblocks.
It is set to a dummy value of '3' before this is called.
In the "signature == HFS_VOLHEAD_SIG" it sets it properly,
but in the "signature == HFSP_VOLHEAD_SIG" case (which applies to you)
it doesn't.
So it tries to read the backup from block "3-2", or block 1.  And there
is nothing there.

How is your C coding?  You could
  apt-get source hfsplus
and hack the code and try to build it yourself....


NeilBrown



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-21  2:38               ` NeilBrown
@ 2017-03-21 12:09                 ` Alfred Matthews
  2017-05-03 15:44                   ` Alfred Matthews
  0 siblings, 1 reply; 14+ messages in thread
From: Alfred Matthews @ 2017-03-21 12:09 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thanks, I'll take a look. Could use a project, insane as that sounds.

On 20 March 2017 at 22:38, NeilBrown <neilb@suse.com> wrote:
> On Mon, Mar 20 2017, Alfred Matthews wrote:
>
>>>> *** Checking Backup Volume Header:
>>>> Unexpected Volume signature '  ' expected 'H+'
>>>
>>> Here the backup volume header, which is 2 blocks (blocks are 8K) from
>>> the end of the device, looks wrong.
>>> This probably means the chunk size is wrong.
>>> I would suggest trying different chunksizes, starting at 4K and
>>> doubling, until this message goes away.
>>> That still might not be the correct chunk size, so I would continue up
>>> to several megabytes and find all the chunksizes that seem to work.
>>> Then look at what else hpfsck says on those.
>>
>> I'm not actually able to generate happy output in hpfsck using any of
>> the following multiples of 4K
>>
>> 4
>> [...]
>> 8192
>> 16384
>> 32768
>> 65536
>> 131072
>> 262144
>> 524288
>> 1048576
>> 2097152
>>
>> Any chance it's not really an HFS system at all?
>
> Not likely.  hpfsck finds a perfectly valid superblock (or "Volume
> Header") at the start of the device.  It just cannot find the end one.
>
> The blocksize is:
>      blocksize       : 2000
>
> which is in HEX, so 8K.
> The total_blocks is:
>      total_blocks    : 732482664
>
> which are 8K blocks, so 5859861312K or 5.4TB (using 1024*1024*1024).
> which matches the fact that each partition is 2.73TB.
>
> The problem seems to be that we are not combining the two partitions
> together in the correct way to create the original 5.4TB partition.
>
> All we know is that the backup volume header should look
> much like the main header, and particularly should have 'H+' in the
> signature, which is the first 2 bytes.
> i.e. the first two bytes of the volume headers should be
> 0x482B
>
> The second (8K) block of the disk must look like this, and
> the second last should as well.
> If you can search through both devices for all 8K blocks which
> start with 0x4A2B, that might give us a hint what to look for.
> I would write a C program to do this.  It might take a while to run, but
> you can test on the first device, as you know block 2 matches.
>
>
> Hmmm... I've got a new theory.  The code is broken.
> fscheck_read_wrapper() in libhfsp/src/fscheck.c should set
> vol->maxblocks.
> It is set to a dummy value of '3' before this is called.
> In the "signature == HFS_VOLHEAD_SIG" it sets it properly,
> but in the "signature == HFSP_VOLHEAD_SIG" case (which applies to you)
> it doesn't.
> So it tries to read the backup from block "3-2", or block 1.  And there
> is nothing there.
>
> How is your C coding?  You could
>   apt-get source hfsplus
> and hack the code and try to build it yourself....
>
>
> NeilBrown
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-03-21 12:09                 ` Alfred Matthews
@ 2017-05-03 15:44                   ` Alfred Matthews
  2017-05-11 17:18                     ` Alfred Matthews
  2017-05-11 22:27                     ` NeilBrown
  0 siblings, 2 replies; 14+ messages in thread
From: Alfred Matthews @ 2017-05-03 15:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Sorry to disappear for a bit.

Of necessity I've migrated to another working system and I'm in Arch.
Utility here is fsck.hfs as opposed to hpfsck. I'd like to post that
output.

mdadm --build /dev/md0 --level=0 -n2 --chunk=8K /dev/sdb2 /dev/sdc2

works, then,

fsck.hfs -v /dev/md0
*** Checking volume MDB
  drSigWord      = 0xffffffff
  drCrDate       = Mon Feb  6 06:28:15 2040
  drLsMod        = Mon Feb  6 06:28:15 2040
  drAtrb         = BUSY | HLOCKED | UMOUNTED | BBSPARED | BVINCONSIS |
COPYPROT | SLOCKED
  drNmFls        = 65535
  drVBMSt        = 65535
  drAllocPtr     = 65535
  drNmAlBlks     = 65535
  drAlBlkSiz     = 4294967295
  drClpSiz       = 4294967295
  drAlBlSt       = 65535
  drNxtCNID      = 4294967295
  drFreeBks      = 65535
  drVN           = ""
  drVolBkUp      = Mon Feb  6 06:28:15 2040
  drVSeqNum      = 4294967295
  drWrCnt        = 4294967295
  drXTClpSiz     = 4294967295
  drCTClpSiz     = 4294967295
  drNmRtDirs     = 65535
  drFilCnt       = 4294967295
  drDirCnt       = 4294967295
  drEmbedSigWord = 0xffff
  drEmbedExtent  = 65535[65535..131069]
  drXTFlSize     = 4294967295
  drXTExtRec     =
65535[65535..131069]+65535[65535..131069]+65535[65535..131069]
  drCTFlSize     = 4294967295
  drCTExtRec     =
65535[65535..131069]+65535[65535..131069]+65535[65535..131069]
Bad volume signature (0xffffffff); should be 0x4244. Fix? y
Volume creation date is in the future (Mon Feb  6 06:28:15 2040). Fix? y
Volume last modify date is in the future (Mon Feb  6 06:28:15 2040). Fix? y
Volume bitmap starts at unusual location (1), not 3. Fix? y
*** Checking volume structure
*** Checking extents overflow B*-tree

It's possible that I was unwise in accepting the changes from fsck.

>>
>> How is your C coding?  You could
>>   apt-get source hfsplus
>> and hack the code and try to build it yourself....
>>

Still fine. By now not clear if this is what's needed.

Al

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-05-03 15:44                   ` Alfred Matthews
@ 2017-05-11 17:18                     ` Alfred Matthews
  2017-05-11 22:27                     ` NeilBrown
  1 sibling, 0 replies; 14+ messages in thread
From: Alfred Matthews @ 2017-05-11 17:18 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

I'm bumping this for lack of reply. I know you all are busy.

fsck.hfs /dev/md0 after building the array dumps core. Not clear to me
what I'm doing wrong. Thanks.

On 3 May 2017 at 15:44, Alfred Matthews <asm13243546@gmail.com> wrote:
> Sorry to disappear for a bit.
>
> Of necessity I've migrated to another working system and I'm in Arch.
> Utility here is fsck.hfs as opposed to hpfsck. I'd like to post that
> output.
>
> mdadm --build /dev/md0 --level=0 -n2 --chunk=8K /dev/sdb2 /dev/sdc2
>
> works, then,
>
> fsck.hfs -v /dev/md0
> *** Checking volume MDB
>   drSigWord      = 0xffffffff
>   drCrDate       = Mon Feb  6 06:28:15 2040
>   drLsMod        = Mon Feb  6 06:28:15 2040
>   drAtrb         = BUSY | HLOCKED | UMOUNTED | BBSPARED | BVINCONSIS |
> COPYPROT | SLOCKED
>   drNmFls        = 65535
>   drVBMSt        = 65535
>   drAllocPtr     = 65535
>   drNmAlBlks     = 65535
>   drAlBlkSiz     = 4294967295
>   drClpSiz       = 4294967295
>   drAlBlSt       = 65535
>   drNxtCNID      = 4294967295
>   drFreeBks      = 65535
>   drVN           = ""
>   drVolBkUp      = Mon Feb  6 06:28:15 2040
>   drVSeqNum      = 4294967295
>   drWrCnt        = 4294967295
>   drXTClpSiz     = 4294967295
>   drCTClpSiz     = 4294967295
>   drNmRtDirs     = 65535
>   drFilCnt       = 4294967295
>   drDirCnt       = 4294967295
>   drEmbedSigWord = 0xffff
>   drEmbedExtent  = 65535[65535..131069]
>   drXTFlSize     = 4294967295
>   drXTExtRec     =
> 65535[65535..131069]+65535[65535..131069]+65535[65535..131069]
>   drCTFlSize     = 4294967295
>   drCTExtRec     =
> 65535[65535..131069]+65535[65535..131069]+65535[65535..131069]
> Bad volume signature (0xffffffff); should be 0x4244. Fix? y
> Volume creation date is in the future (Mon Feb  6 06:28:15 2040). Fix? y
> Volume last modify date is in the future (Mon Feb  6 06:28:15 2040). Fix? y
> Volume bitmap starts at unusual location (1), not 3. Fix? y
> *** Checking volume structure
> *** Checking extents overflow B*-tree
>
> It's possible that I was unwise in accepting the changes from fsck.
>
>>>
>>> How is your C coding?  You could
>>>   apt-get source hfsplus
>>> and hack the code and try to build it yourself....
>>>
>
> Still fine. By now not clear if this is what's needed.
>
> Al

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-05-03 15:44                   ` Alfred Matthews
  2017-05-11 17:18                     ` Alfred Matthews
@ 2017-05-11 22:27                     ` NeilBrown
  2017-05-12 15:57                       ` Alfred Matthews
  1 sibling, 1 reply; 14+ messages in thread
From: NeilBrown @ 2017-05-11 22:27 UTC (permalink / raw)
  To: Alfred Matthews; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2533 bytes --]

On Wed, May 03 2017, Alfred Matthews wrote:

> Sorry to disappear for a bit.
>
> Of necessity I've migrated to another working system and I'm in Arch.
> Utility here is fsck.hfs as opposed to hpfsck. I'd like to post that
> output.

I wonder if they are the same thing, or completely different.

>
> mdadm --build /dev/md0 --level=0 -n2 --chunk=8K /dev/sdb2 /dev/sdc2
>
> works, then,
>
> fsck.hfs -v /dev/md0
> *** Checking volume MDB
>   drSigWord      = 0xffffffff
>   drCrDate       = Mon Feb  6 06:28:15 2040
>   drLsMod        = Mon Feb  6 06:28:15 2040
>   drAtrb         = BUSY | HLOCKED | UMOUNTED | BBSPARED | BVINCONSIS |
> COPYPROT | SLOCKED
>   drNmFls        = 65535
>   drVBMSt        = 65535
>   drAllocPtr     = 65535
>   drNmAlBlks     = 65535
>   drAlBlkSiz     = 4294967295
>   drClpSiz       = 4294967295
>   drAlBlSt       = 65535
>   drNxtCNID      = 4294967295
>   drFreeBks      = 65535
>   drVN           = ""
>   drVolBkUp      = Mon Feb  6 06:28:15 2040
>   drVSeqNum      = 4294967295
>   drWrCnt        = 4294967295
>   drXTClpSiz     = 4294967295
>   drCTClpSiz     = 4294967295
>   drNmRtDirs     = 65535
>   drFilCnt       = 4294967295
>   drDirCnt       = 4294967295
>   drEmbedSigWord = 0xffff
>   drEmbedExtent  = 65535[65535..131069]
>   drXTFlSize     = 4294967295
>   drXTExtRec     =
> 65535[65535..131069]+65535[65535..131069]+65535[65535..131069]
>   drCTFlSize     = 4294967295
>   drCTExtRec     =
> 65535[65535..131069]+65535[65535..131069]+65535[65535..131069]
> Bad volume signature (0xffffffff); should be 0x4244. Fix? y
> Volume creation date is in the future (Mon Feb  6 06:28:15 2040). Fix? y
> Volume last modify date is in the future (Mon Feb  6 06:28:15 2040). Fix? y
> Volume bitmap starts at unusual location (1), not 3. Fix? y
> *** Checking volume structure
> *** Checking extents overflow B*-tree
>
> It's possible that I was unwise in accepting the changes from fsck.

Definitely possible.

Given how many of the numbers listed above are 0xffff or 0xffffffff, it
is unlikely that the tool found anything useful.

>
>>>
>>> How is your C coding?  You could
>>>   apt-get source hfsplus
>>> and hack the code and try to build it yourself....
>>>
>
> Still fine. By now not clear if this is what's needed.

I don't think I can be of further help, sorry.

NeilBrown


>
> Al
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: on assembly and recovery of a hardware RAID
  2017-05-11 22:27                     ` NeilBrown
@ 2017-05-12 15:57                       ` Alfred Matthews
  0 siblings, 0 replies; 14+ messages in thread
From: Alfred Matthews @ 2017-05-12 15:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Thanks. You've been tremendously helpful thus far. I appreciate it.
Will post any successful results.

On 11 May 2017 at 22:27, NeilBrown <neilb@suse.com> wrote:
> On Wed, May 03 2017, Alfred Matthews wrote:
>
>> Sorry to disappear for a bit.
>>
>> Of necessity I've migrated to another working system and I'm in Arch.
>> Utility here is fsck.hfs as opposed to hpfsck. I'd like to post that
>> output.
>
> I wonder if they are the same thing, or completely different.
>
>>
>> mdadm --build /dev/md0 --level=0 -n2 --chunk=8K /dev/sdb2 /dev/sdc2
>>
>> works, then,
>>
>> fsck.hfs -v /dev/md0
>> *** Checking volume MDB
>>   drSigWord      = 0xffffffff
>>   drCrDate       = Mon Feb  6 06:28:15 2040
>>   drLsMod        = Mon Feb  6 06:28:15 2040
>>   drAtrb         = BUSY | HLOCKED | UMOUNTED | BBSPARED | BVINCONSIS |
>> COPYPROT | SLOCKED
>>   drNmFls        = 65535
>>   drVBMSt        = 65535
>>   drAllocPtr     = 65535
>>   drNmAlBlks     = 65535
>>   drAlBlkSiz     = 4294967295
>>   drClpSiz       = 4294967295
>>   drAlBlSt       = 65535
>>   drNxtCNID      = 4294967295
>>   drFreeBks      = 65535
>>   drVN           = ""
>>   drVolBkUp      = Mon Feb  6 06:28:15 2040
>>   drVSeqNum      = 4294967295
>>   drWrCnt        = 4294967295
>>   drXTClpSiz     = 4294967295
>>   drCTClpSiz     = 4294967295
>>   drNmRtDirs     = 65535
>>   drFilCnt       = 4294967295
>>   drDirCnt       = 4294967295
>>   drEmbedSigWord = 0xffff
>>   drEmbedExtent  = 65535[65535..131069]
>>   drXTFlSize     = 4294967295
>>   drXTExtRec     =
>> 65535[65535..131069]+65535[65535..131069]+65535[65535..131069]
>>   drCTFlSize     = 4294967295
>>   drCTExtRec     =
>> 65535[65535..131069]+65535[65535..131069]+65535[65535..131069]
>> Bad volume signature (0xffffffff); should be 0x4244. Fix? y
>> Volume creation date is in the future (Mon Feb  6 06:28:15 2040). Fix? y
>> Volume last modify date is in the future (Mon Feb  6 06:28:15 2040). Fix? y
>> Volume bitmap starts at unusual location (1), not 3. Fix? y
>> *** Checking volume structure
>> *** Checking extents overflow B*-tree
>>
>> It's possible that I was unwise in accepting the changes from fsck.
>
> Definitely possible.
>
> Given how many of the numbers listed above are 0xffff or 0xffffffff, it
> is unlikely that the tool found anything useful.
>
>>
>>>>
>>>> How is your C coding?  You could
>>>>   apt-get source hfsplus
>>>> and hack the code and try to build it yourself....
>>>>
>>
>> Still fine. By now not clear if this is what's needed.
>
> I don't think I can be of further help, sorry.
>
> NeilBrown
>
>
>>
>> Al
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-05-12 15:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-10 20:37 on assembly and recovery of a hardware RAID Alfred Matthews
2017-03-13 21:32 ` NeilBrown
2017-03-14 17:27   ` Alfred Matthews
2017-03-15  2:56     ` NeilBrown
2017-03-18 17:11       ` Alfred Matthews
2017-03-18 18:08         ` Alfred Matthews
2017-03-20  5:34           ` NeilBrown
2017-03-20 21:42             ` Alfred Matthews
2017-03-21  2:38               ` NeilBrown
2017-03-21 12:09                 ` Alfred Matthews
2017-05-03 15:44                   ` Alfred Matthews
2017-05-11 17:18                     ` Alfred Matthews
2017-05-11 22:27                     ` NeilBrown
2017-05-12 15:57                       ` Alfred Matthews
