* xfs_repair questions
@ 2018-02-22 23:42 Alatun Rom
  2018-02-23 12:02 ` Emmanuel Florac
  2018-02-23 19:30 ` Eric Sandeen
  0 siblings, 2 replies; 4+ messages in thread
From: Alatun Rom @ 2018-02-22 23:42 UTC
  To: linux-xfs

Hi all!

Maybe someone with knowledge about xfs_repair can tell me what could
be the reason why repairing an XFS partition is currently impossible.

The backstory: one of our servers is running a VM with a proprietary
service. After updating some packages on the host, the VM ran at first
but then crashed. At the next start the system in
the VM (a rather old Linux based on a 2.6 kernel) could no longer boot,
complaining about invalid XFS structures. The VM is using a VMDK HD
container. Using vmware-mount tools I detected that the VMDK container
itself was also damaged. Vmware-tools could repair the container and I
tried to mount the root partition which failed. I was able to locate
the offset of the MBR inside the container and mounted a loop-back
with that offset and using testdisk I was able to extract the
partition with the root file system to a file. I also extracted two
other partitions with this approach: another XFS and one EXT2.
These two could be checked and fixed, but unfortunately the root filesystem is
tricky. Maybe another piece of information is important here: initially the
system inside the VM was running on a native machine (about 10 years
ago). The root fs was created as XFS on a software raid (/dev/md0).
When I migrated the native machine to a VM I cloned the disks and
later removed all but one of the partitions of the disc array.

Now I'm trying to fix the xfs in this partition dump. xfs_repair found
7 candidates of secondary superblocks, but is unable to verify them. I
was rather disappointed with the lack of information xfs_repair
reported.

So I made a git clone of xfstools and looked for the code doing the
repair and added some more output. I added information about the file
offset of the candidates and the status, why it is considered unusable.
The output of xfs_repair for my broken partition now looks like this:
---
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!

attempting to find secondary superblock...
found candidate secondary superblock... off: 32204029952
checking sbs#: 4
check sbs@ 32212221952 - status: 1
check sbs@ 64424443904 - status: 1
check sbs@ 96636665856 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 57813565440
checking sbs#: 4
check sbs@ 32212221952 - status: 1
check sbs@ 64424443904 - status: 1
check sbs@ 96636665856 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 64185663488
checking sbs#: 4
check sbs@ 32212221952 - status: 1
check sbs@ 64424443904 - status: 1
check sbs@ 96636665856 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 96412992512
checking sbs#: 4
check sbs@ 7232069632 - status: 1
check sbs@ 14464139264 - status: 1
check sbs@ 21696208896 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 96413164544
checking sbs#: 4
check sbs@ 7232069632 - status: 1
check sbs@ 14464139264 - status: 1
check sbs@ 21696208896 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 96413271040
checking sbs#: 4
check sbs@ 7232069632 - status: 1
check sbs@ 14464139264 - status: 1
check sbs@ 21696208896 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 96419537920
checking sbs#: 4
check sbs@ 7232069632 - status: 1
check sbs@ 14464139264 - status: 1
check sbs@ 21696208896 - status: 1
unable to verify superblock, continuing... error code: 16
Sorry, could not find valid secondary superblock
Exiting now.
--
(error code 16: not enough superblocks to validate
status code 1: invalid superblock signature)
I also extracted the blocks at the given offsets (e.g. @ 32204029952),
examined them and the contents of the SSBs are looking quite good:
--
00000000  58 46 53 42 00 00 10 00  00 00 00 00 01 df ff e0  |XFSB............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  b1 f2 a8 ad 7d 75 4d a0  9a be 3f e6 a4 e4 05 c3  |....}uM...?.....|
00000030  00 00 00 00 01 00 00 04  00 00 00 00 00 00 00 80  |................|
00000040  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000050  00 00 00 01 00 77 ff f8  00 00 00 04 00 00 00 00  |.....w..........|
00000060  00 00 3b ff b4 a4 02 00  01 00 00 10 68 79 00 00  |..;.........hy..|
00000070  00 00 00 00 00 00 00 00  0c 09 08 04 17 00 00 19  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 01 df c3 d1  00 00 00 00 00 00 00 00  |................|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 02  00 00 00 00 00 00 00 00  |................|
000000c0  00 00 00 00 00 00 00 01  00 00 00 08 00 00 00 08  |................|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
--
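For readers who want to check the dump by hand, here is a minimal Python
sketch that decodes a few geometry fields from the bytes above. It assumes
the usual big-endian XFS on-disk superblock layout (magic at 0x00, block
size at 0x04, data block count at 0x08, AG block count at 0x54, AG count at
0x58); the name decode_geometry is made up for illustration.

```python
import struct

# First 0x60 bytes of the hexdump above, copied as-is.
SB_HEX = """
58 46 53 42 00 00 10 00 00 00 00 00 01 df ff e0
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b1 f2 a8 ad 7d 75 4d a0 9a be 3f e6 a4 e4 05 c3
00 00 00 00 01 00 00 04 00 00 00 00 00 00 00 80
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
00 00 00 01 00 77 ff f8 00 00 00 04 00 00 00 00
"""

def decode_geometry(raw):
    """Decode a few superblock fields (assumed big-endian layout)."""
    magic, blocksize, dblocks = struct.unpack_from(">4sIQ", raw, 0x00)
    _rextsize, agblocks, agcount = struct.unpack_from(">III", raw, 0x50)
    return {
        "magic": magic,
        "blocksize": blocksize,
        "dblocks": dblocks,
        "agblocks": agblocks,
        "agcount": agcount,
    }

raw = bytes.fromhex("".join(SB_HEX.split()))
geo = decode_geometry(raw)
ag_bytes = geo["agblocks"] * geo["blocksize"]  # size of one AG in bytes
print(geo, ag_bytes)
```

With these values one AG spans agblocks * blocksize = 32212221952 bytes,
which matches the first "check sbs@" offset in the output above.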
As far as I understand, xfs_repair considers the file system
unfixable because the secondary superblocks are not found at the offsets
where they should be. The blocks found look quite OK.
My guess: because the VMDK container got damaged, the information of
the internal partitions is now incorrect, so testdisk did extract a
file that is somehow broken. But from what I've seen so far, it could
be possible to fix it up to the point, to extract some files.
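
To put a number on "not at the offsets where they should be", the following
Python sketch compares the first three candidates with the nearest offset
predicted by the geometry in the dump. The block size (4096) and AG size
(7864312 blocks) are read from the hexdump above under the assumption of the
standard superblock layout; nearest_expected is a hypothetical helper name.

```python
BLOCK_SIZE = 4096      # sb_blocksize from the dump (assumed layout)
AG_BLOCKS = 7864312    # sb_agblocks from the dump (assumed layout)
AG_BYTES = AG_BLOCKS * BLOCK_SIZE

def nearest_expected(offset):
    """Return the multiple of AG_BYTES closest to offset."""
    ag = (offset + AG_BYTES // 2) // AG_BYTES
    return ag * AG_BYTES

candidates = [32204029952, 57813565440, 64185663488]
deltas = [nearest_expected(c) - c for c in candidates]
for cand, delta in zip(candidates, deltas):
    print(cand, "is", delta, "bytes before the nearest expected offset")
```

Only the first candidate lands near a predicted offset: it sits 8192000
bytes (2000 blocks of 4096 bytes) short of it. The arithmetic alone does not
say why.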

What is unclear for me: why are 7 SSB found? Is this a geometry issue?
The superblocks found say that a total of 4 superblocks should
exist. What happens if you grow an XFS file system? Do additional
stripes generate a layout like this? Is the distance between the
superblock copies ALWAYS a constant value? In my scenario the distance
of the first 3 superblocks is not a fixed value. But how can blocks
get lost? I don't think it is possible with VMDK to "mark blocks as
deleted" inside a partition so that they are skipped when reading it.

Perhaps somebody with more knowledge can comment on this and can give me
some pointers.


Thanks


* Re: xfs_repair questions
  2018-02-22 23:42 xfs_repair questions Alatun Rom
@ 2018-02-23 12:02 ` Emmanuel Florac
  2018-02-23 19:30 ` Eric Sandeen
  1 sibling, 0 replies; 4+ messages in thread
From: Emmanuel Florac @ 2018-02-23 12:02 UTC
  To: Alatun Rom; +Cc: linux-xfs

On Fri, 23 Feb 2018 00:42:11 +0100
Alatun Rom <alatun.rom@gmail.com> wrote:

> As far as I understood, xfs_repair considers the file system
> unfixable, because the secondary superblocks are not found at the
> offsets, where they should be.

I don't think so, as it reports "invalid superblock signature", this is
most probably the culprit. It needs at least one proper, uncorrupted
superblock.

> The blocks found look quite ok.
> My guess: because the VMDK container got damaged, the information of
> the internal partitions is now incorrect, so testdisk did extract a
> file that is somehow broken. But from what I've seen so far, it could
> be possible to fix it up to the point, to extract some files.
> 

I never knew that testdisk could be used like that. Is the volume of
the right size, matching the original partition?

> What is unclear for me: why are 7 SSB found? Is this a geometry issue?

The addresses are super weird: the first four seem OK, but the last three
are bizarrely close to each other:

32204029952
57813565440
64185663488
96412992512
96413164544
96413271040
96419537920
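
The spacing can be made explicit with a short Python sketch that computes
the gap between consecutive candidates. The offsets are the ones listed
above; nothing else is assumed.

```python
candidates = [
    32204029952,
    57813565440,
    64185663488,
    96412992512,
    96413164544,
    96413271040,
    96419537920,
]

# Distance in bytes from each candidate to the next one in the list.
gaps = [later - earlier for earlier, later in zip(candidates, candidates[1:])]
for start, gap in zip(candidates, gaps):
    print(f"{start:>12} -> next candidate is {gap} bytes further on")
```

The gaps between the early entries run to gigabytes, while the last three
gaps are 172032, 106496 and 6266880 bytes.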

> The found superblocks tell, that a total of 4 superblocks should
> exist. What happens if you grow an XFS file system? Do additional
> stripes generate a layout like this? Is the distance between the
> superblock copies ALWAYS a constant value? In my scenario the distance
> of the first 3 superblocks is not a fixed value. But how can blocks
> get lost? I don't think this is possible with VMDK to "mark blocks as
> deleted" inside a partition and they are skipped when reading a
> partition.

Honestly, AFAIK VMDKs are sparse files. If one gets damaged, all bets are
off...

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: xfs_repair questions
  2018-02-22 23:42 xfs_repair questions Alatun Rom
  2018-02-23 12:02 ` Emmanuel Florac
@ 2018-02-23 19:30 ` Eric Sandeen
  1 sibling, 0 replies; 4+ messages in thread
From: Eric Sandeen @ 2018-02-23 19:30 UTC
  To: Alatun Rom, linux-xfs

On 2/22/18 5:42 PM, Alatun Rom wrote:
> Now I'm trying to fix the xfs in this partition dump. xfs_repair found
> 7 candidates of secondary superblocks, but is unable to verify them. I
> was rather disappointed with the lack of information xfs_repair
> reported.
> 
> So I made a git clone of xfstools and looked for the code doing the
> repair and added some more output. I added information about the file
> offset of the candidates and the status, why it is considered unusable.
> The output of xfs_repair for my broken partition now looks like this:
> ---
> Phase 1 - find and verify superblock...
> bad primary superblock - bad magic number !!!

So right off the bat, the first superblock in sector zero looks bad.
Are you sure you properly extracted this part of the disk?  What
does e.g. blkid or file -s say about what you extracted, and
what does a hexdump look like?
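
As a rough companion to that check, a minimal Python sketch that tests
whether the four bytes at a given offset of an image file are the XFS magic
"XFSB". The helper name has_xfs_magic and the file name in the usage comment
are hypothetical; this is not a substitute for blkid or file -s.

```python
XFS_MAGIC = b"XFSB"

def has_xfs_magic(path, offset=0):
    """Return True if the 4 bytes at offset in the file equal XFS_MAGIC."""
    with open(path, "rb") as image:
        image.seek(offset)
        return image.read(4) == XFS_MAGIC

# Usage (hypothetical file name):
#   has_xfs_magic("rootfs.img")               # primary superblock, sector 0
#   has_xfs_magic("rootfs.img", 32204029952)  # first candidate reported
```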

> attempting to find secondary superblock...
> found candidate secondary superblock... off: 32204029952
> checking sbs#: 4
> check sbs@ 32212221952 - status: 1

without knowing what your patches do, I don't know what "status: 1"
means here.  Is that a failure?

And backing up, what version of xfsprogs are you using?

> check sbs@ 64424443904 - status: 1
> check sbs@ 96636665856 - status: 1
> unable to verify superblock, continuing... error code: 16

#define XR_INSUFF_SEC_SB        16      /* not enough matching secondary sbs */

        if (num_ok < num_sbs / 2) {
                retval = XR_INSUFF_SEC_SB;
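
A toy Python restatement of just the quoted condition, to show where the
threshold falls for a 4-AG filesystem. It does not reproduce the surrounding
xfsprogs logic; check_enough is a hypothetical name, and returning 0 for the
passing case is an assumption.

```python
XR_INSUFF_SEC_SB = 16  # value from the #define quoted above

def check_enough(num_ok, num_sbs):
    """Return XR_INSUFF_SEC_SB when the quoted condition fires, else 0."""
    # C integer division of non-negative ints matches Python's // here.
    if num_ok < num_sbs // 2:
        return XR_INSUFF_SEC_SB
    return 0  # assumption: 0 stands for "no error"

for num_ok in range(5):
    print(num_ok, check_enough(num_ok, 4))
```

With num_sbs = 4 the condition fires for num_ok of 0 or 1 and passes from 2
upward.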

> found candidate secondary superblock... off: 57813565440
> checking sbs#: 4
> check sbs@ 32212221952 - status: 1
> check sbs@ 64424443904 - status: 1
> check sbs@ 96636665856 - status: 1
> unable to verify superblock, continuing... error code: 16
> found candidate secondary superblock... off: 64185663488
> checking sbs#: 4
> check sbs@ 32212221952 - status: 1
> check sbs@ 64424443904 - status: 1
> check sbs@ 96636665856 - status: 1
> unable to verify superblock, continuing... error code: 16
> found candidate secondary superblock... off: 96412992512
> checking sbs#: 4
> check sbs@ 7232069632 - status: 1
> check sbs@ 14464139264 - status: 1
> check sbs@ 21696208896 - status: 1
> unable to verify superblock, continuing... error code: 16
> found candidate secondary superblock... off: 96413164544
> checking sbs#: 4
> check sbs@ 7232069632 - status: 1
> check sbs@ 14464139264 - status: 1
> check sbs@ 21696208896 - status: 1
> unable to verify superblock, continuing... error code: 16
> found candidate secondary superblock... off: 96413271040
> checking sbs#: 4
> check sbs@ 7232069632 - status: 1
> check sbs@ 14464139264 - status: 1
> check sbs@ 21696208896 - status: 1
> unable to verify superblock, continuing... error code: 16
> found candidate secondary superblock... off: 96419537920
> checking sbs#: 4
> check sbs@ 7232069632 - status: 1
> check sbs@ 14464139264 - status: 1
> check sbs@ 21696208896 - status: 1
> unable to verify superblock, continuing... error code: 16
> Sorry, could not find valid secondary superblock
> Exiting now.
> --
> (error code 16: not enough superblocks to validate
> status code 1: invalid superblock signature)
> I also extracted the blocks at the given offsets (eg @ 32204029952) ,
> examined them and the contents of the SSBs are looking quite good:
> --
> 00000000  58 46 53 42 00 00 10 00  00 00 00 00 01 df ff e0  |XFSB............|
> 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00000020  b1 f2 a8 ad 7d 75 4d a0  9a be 3f e6 a4 e4 05 c3  |....}uM...?.....|
> 00000030  00 00 00 00 01 00 00 04  00 00 00 00 00 00 00 80  |................|
> 00000040  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
> 00000050  00 00 00 01 00 77 ff f8  00 00 00 04 00 00 00 00  |.....w..........|
> 00000060  00 00 3b ff b4 a4 02 00  01 00 00 10 68 79 00 00  |..;.........hy..|
> 00000070  00 00 00 00 00 00 00 00  0c 09 08 04 17 00 00 19  |................|
> 00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 00000090  00 00 00 00 01 df c3 d1  00 00 00 00 00 00 00 00  |................|
> 000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> 000000b0  00 00 00 00 00 00 00 02  00 00 00 00 00 00 00 00  |................|
> 000000c0  00 00 00 00 00 00 00 01  00 00 00 08 00 00 00 08  |................|
> 000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00000200
> --
> As far as I understood, xfs_repair considers the file system
> unfixable, because the secondary superblocks are not found at the offsets,
> where they should be. The blocks found look quite ok.
> My guess: because the VMDK container got damaged, the information of
> the internal partitions is now incorrect, so testdisk did extract a
> file that is somehow broken. But from what I've seen so far, it could
> be possible to fix it up to the point, to extract some files.
> 
> What is unclear for me: why are 7 SSB found?

It may have found logged superblocks in the log region.

> Is this a geometry issue?
> The found superblocks tell, that a total of 4 superblocks should
> exist. What happens if you grow an XFS file system? Do additional
> stripes generate a layout like this? 

you get extra allocation groups added, and the superblock AG count
will reflect this.

> Is the distance between the
> superblock copies ALWAYS a constant value? 

yes, though the last one may be a smaller size.
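
A small Python sketch of that constant spacing: superblock copy N is
expected at N * agblocks * blocksize. The geometry values are the ones
decoded from the hexdump quoted above, assuming the standard superblock
layout; ag_superblock_offsets is a hypothetical helper name.

```python
def ag_superblock_offsets(agblocks, blocksize, agcount):
    """Byte offsets at which each AG, and so each superblock copy, starts."""
    return [ag * agblocks * blocksize for ag in range(agcount)]

offsets = ag_superblock_offsets(agblocks=7864312, blocksize=4096, agcount=4)
print(offsets)
```

The three non-zero results are the same "check sbs@" offsets printed for the
first three candidates.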

> In my scenario the distance
> of the first 3 superblocks is not a fixed value. But how can blocks
> get lost? I don't think this is possible with VMDK to "mark blocks as
> deleted" inside a partition and they are skipped when reading a partition.
> 
> Perhaps somebody with more knowledge can comment on this and can give me
> some pointers.

So, xfs_repair used to be really braindead and scan the entire device
sector by sector looking for candidate superblocks.  A while back some
heuristics were added to try to be a little smarter; once a candidate was
found, try checking the other locations that the found superblock
points to.  However if things fail, IIRC it is supposed to fall back to
a block-by-block scan.
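
To illustrate the brute-force idea only, here is a simplified Python sketch
that walks a stream in fixed steps and reports every aligned offset where
the "XFSB" magic appears. It is not xfs_repair's implementation; the
function name, step size and chunk size are all assumptions made for the
example, and it expects a regular file or in-memory stream that returns full
reads until EOF.

```python
import io

XFS_MAGIC = b"XFSB"

def scan_for_magic(stream, step=512, chunk_steps=2048):
    """Yield offsets, aligned to step, where XFS_MAGIC is found."""
    offset = 0
    chunk_size = step * chunk_steps
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Only look at positions that are a multiple of step.
        for pos in range(0, len(chunk), step):
            if chunk[pos:pos + 4] == XFS_MAGIC:
                yield offset + pos
        offset += len(chunk)

# Tiny in-memory demo: magic at sector 0 and sector 3, plus one unaligned
# occurrence at byte 700 that the aligned scan skips.
image = bytearray(2048)
image[0:4] = XFS_MAGIC
image[1536:1540] = XFS_MAGIC
image[700:704] = XFS_MAGIC
found = list(scan_for_magic(io.BytesIO(bytes(image))))
print(found)  # [0, 1536]
```

Each hit is only a candidate; it still has to be validated against the
other superblock copies, as described above.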

Can you try xfsprogs-4.6.0, and run an xfs_repair -n from that?

(-n will be check only and will leave the fs unchanged; if that does
work I'd like to get to the bottom of why newer xfsprogs fails for you,
so would rather not modify anything yet, if you can tolerate a bit of
a wait)

-Eric


* xfs_repair questions
@ 2018-02-22 15:11 Alatun Rom
  0 siblings, 0 replies; 4+ messages in thread
From: Alatun Rom @ 2018-02-22 15:11 UTC
  To: linux-xfs

Hi all!

Maybe someone with knowledge about xfs_repair can tell me what could
be the reason why repairing an XFS partition is currently
impossible.

The backstory: one of our servers is running a VM with a proprietary
service. After updating some packages on the host, the VM had crashed
after running initially after the update. At next start the system in
the VM (a rather old linux based on 2.6 kernel) could no longer boot,
complaining about invalid XFS structures. The VM is using a VMDK HD
container. Using vmware-mount tools I detected that the VMDK container
itself was also damaged. Vmware-tools could repair the container and I
tried to mount the root partition, which failed. I was able to locate
the offset of the MBR inside the container and mounted a loop-back
with that offset and using testdisk I was able to extract the
partition with the root file system to a file. I also extracted two
other partitions with this approach: another XFS and one ext2. Both
could be tested and fixed, but unfortunately the root filesystem is
tricky. Maybe another information is important here: initially the
system inside the VM was running on a native machine (about 10 years
ago). The root fs was created as XFS on a software raid (/dev/md0).
When I migrated the native machine to a VM I cloned the disks and
later removed one of the partitions of the disc array.

Now I'm trying to fix the xfs in this partition dump. xfs_repair found
7 candidates of secondary superblocks, but is unable to verify them. I
was rather disappointed with the lack of information xfs_repair
reported.

So I made a git clone of xfstools and looked for the code doing the
repair and added some more output. The output of xfs_repair for my
broken partition now looks like this:
---
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!

attempting to find secondary superblock...
found candidate secondary superblock... off: 32204029952
checking sbs#: 4
check sbs@ 32212221952 - status: 1
check sbs@ 64424443904 - status: 1
check sbs@ 96636665856 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 57813565440
checking sbs#: 4
check sbs@ 32212221952 - status: 1
check sbs@ 64424443904 - status: 1
check sbs@ 96636665856 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 64185663488
checking sbs#: 4
check sbs@ 32212221952 - status: 1
check sbs@ 64424443904 - status: 1
check sbs@ 96636665856 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 96412992512
checking sbs#: 4
check sbs@ 7232069632 - status: 1
check sbs@ 14464139264 - status: 1
check sbs@ 21696208896 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 96413164544
checking sbs#: 4
check sbs@ 7232069632 - status: 1
check sbs@ 14464139264 - status: 1
check sbs@ 21696208896 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 96413271040
checking sbs#: 4
check sbs@ 7232069632 - status: 1
check sbs@ 14464139264 - status: 1
check sbs@ 21696208896 - status: 1
unable to verify superblock, continuing... error code: 16
found candidate secondary superblock... off: 96419537920
checking sbs#: 4
check sbs@ 7232069632 - status: 1
check sbs@ 14464139264 - status: 1
check sbs@ 21696208896 - status: 1
unable to verify superblock, continuing... error code: 16
Sorry, could not find valid secondary superblock
Exiting now.
--
(error code 16: not enough superblocks to validate; status code 1:
invalid superblock signature)
I added information about the file offset of the candidates and the
status explaining why each is considered unusable. I also extracted the
blocks at the given offsets and examined them; the contents of the SSBs
look quite good:
--
00000000  58 46 53 42 00 00 10 00  00 00 00 00 01 df ff e0  |XFSB............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  b1 f2 a8 ad 7d 75 4d a0  9a be 3f e6 a4 e4 05 c3  |....}uM...?.....|
00000030  00 00 00 00 01 00 00 04  00 00 00 00 00 00 00 80  |................|
00000040  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000050  00 00 00 01 00 77 ff f8  00 00 00 04 00 00 00 00  |.....w..........|
00000060  00 00 3b ff b4 a4 02 00  01 00 00 10 68 79 00 00  |..;.........hy..|
00000070  00 00 00 00 00 00 00 00  0c 09 08 04 17 00 00 19  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 01 df c3 d1  00 00 00 00 00 00 00 00  |................|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 02  00 00 00 00 00 00 00 00  |................|
000000c0  00 00 00 00 00 00 00 01  00 00 00 08 00 00 00 08  |................|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000200
--
As far as I understood, xfs_repair considers the file system
unfixable, because the other superblocks are not found at the offsets,
where they should be. The blocks found look quite ok.
My guess: because the VMDK container got damaged, the information of
the internal partitions can be incorrect, so testdisk did extract a
file that is somehow broken. I think I'm pretty close to getting
it working again, but I'm stuck.

What is unclear for me: why are 7 SSB found? Is this a geometry issue?
The superblocks found say that a total of 4 superblocks should
exist. What happens if you grow an XFS file system? Do additional
stripes generate a layout like this? Is the distance between the
superblock copies ALWAYS a constant value? In my scenario the distance
of the first 3 superblocks is not a fixed value. But how can blocks
get lost? I don't think this is possible with VMDK to "mark blocks as
deleted" inside a partition.

Perhaps somebody with more knowledge can comment on this and can give me
some pointers.


Thanks
