* xfs_repair 3.2.0 cannot (?) fix fs
@ 2014-06-27 23:41 Arkadiusz Miśkiewicz
  2014-06-28 21:52 ` Arkadiusz Miśkiewicz
  2014-06-30  3:18 ` Dave Chinner
  0 siblings, 2 replies; 9+ messages in thread
From: Arkadiusz Miśkiewicz @ 2014-06-27 23:41 UTC (permalink / raw)
  To: xfs


Hello.

I have a fs (metadump of it
http://ixion.pld-linux.org/~arekm/p2/x1/web2-home.metadump.gz)
that xfs_repair 3.2.0 is unable to fix properly.

Running xfs_repair a few times shows the same errors repeating:
http://ixion.pld-linux.org/~arekm/p2/x1/repair2.txt
http://ixion.pld-linux.org/~arekm/p2/x1/repair3.txt
http://ixion.pld-linux.org/~arekm/p2/x1/repair4.txt
http://ixion.pld-linux.org/~arekm/p2/x1/repair5.txt

(repair1.txt also exists - it was the initial, very big/long repair)

Note that the fs mounts fine (and was mounting fine before and after repair), but
xfs_repair indicates that not everything got fixed.


Unfortunately there appears to be a problem with the metadump image: xfs_repair
is able to finish fixing the restored image, but is not able to do so on the real
device (see repairX.txt above). Huh?
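
(For reference, the restored-image test was along these lines - the device path
is illustrative:)

$ xfs_metadump /dev/vg0/web2-home web2-home.metadump   # metadata-only dump of the fs
$ xfs_mdrestore web2-home.metadump web2-home.img       # restore the dump into an image file
$ xfs_repair -v web2-home.img                          # completes here, unlike on the real device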

Examples of problems repeating each time xfs_repair is run:

1)
reset bad sb for ag 5
non-null group quota inode field in superblock 7

2) 
correcting nblocks for inode 965195858, was 19 - counted 20
correcting nextents for inode 965195858, was 16 - counted 17

3) clearing some entries; moving to lost+found (the same files)

4) 
Phase 7 - verify and correct link counts...
Invalid inode number 0xfeffffffffffffff
xfs_dir_ino_validate: XFS_ERROR_REPORT
Metadata corruption detected at block 0x11fbb698/0x1000
libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
Invalid inode number 0xfeffffffffffffff
xfs_dir_ino_validate: XFS_ERROR_REPORT
Metadata corruption detected at block 0x11fbb698/0x1000
libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
done

5) Metadata CRC error detected at block 0x0/0x200
but it is not a CRC-enabled fs

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl


* Re: xfs_repair 3.2.0 cannot (?) fix fs
  2014-06-27 23:41 xfs_repair 3.2.0 cannot (?) fix fs Arkadiusz Miśkiewicz
@ 2014-06-28 21:52 ` Arkadiusz Miśkiewicz
  2014-06-28 22:01   ` Arkadiusz Miśkiewicz
  2014-06-30  3:18 ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Arkadiusz Miśkiewicz @ 2014-06-28 21:52 UTC (permalink / raw)
  To: xfs; +Cc: Alex Elder

On Saturday 28 of June 2014, Arkadiusz Miśkiewicz wrote:
> Hello.
> 
> I have a fs (metadump of it
> http://ixion.pld-linux.org/~arekm/p2/x1/web2-home.metadump.gz)
> that xfs_repair 3.2.0 is unable to fix properly.
> 
> Running xfs_repair few times shows the same errors repeating:
> http://ixion.pld-linux.org/~arekm/p2/x1/repair2.txt
> http://ixion.pld-linux.org/~arekm/p2/x1/repair3.txt
> http://ixion.pld-linux.org/~arekm/p2/x1/repair4.txt
> http://ixion.pld-linux.org/~arekm/p2/x1/repair5.txt
> 
> (repair1.txt also exists - it was initial, very big/long repair)
> 
> Note that fs mounts fine (and was mounting fine before and after repair)
> but xfs_repair indicates that not everything got fixed.
> 
> 
> Unfortunately there looks to be a problem with metadump image. xfs_repair
> is able to finish fixing on a restored image but is not able (see
> repairX.txt) above on real devices. Huh?

I made an xfs_metadump without file obfuscation and I'm able to reproduce the
problem reliably on that image (if an xfs developer wants the metadump image,
please mail me - I don't want to make it public, for obvious reasons).

So there is an additional bug in xfs_metadump, where file obfuscation "fixes" some
issues. Does it obfuscate names while keeping invalid conditions (like a "/" in a
file name)? I guess it does not.
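
(The non-obfuscated dump was made along these lines - device path illustrative;
-o is the switch that disables obfuscation, -g just shows progress:)

$ xfs_metadump -o -g /dev/vg0/web2-home web2-home-noobf.metadump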

CC: Alex, as he rewrote the obfuscation algorithm at some point.


Anyway, repair logs for that metadump:
http://ixion.pld-linux.org/~arekm/p2/x2/repair-no-obfuscation1.txt
http://ixion.pld-linux.org/~arekm/p2/x2/repair-no-obfuscation2.txt

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl


* Re: xfs_repair 3.2.0 cannot (?) fix fs
  2014-06-28 21:52 ` Arkadiusz Miśkiewicz
@ 2014-06-28 22:01   ` Arkadiusz Miśkiewicz
  0 siblings, 0 replies; 9+ messages in thread
From: Arkadiusz Miśkiewicz @ 2014-06-28 22:01 UTC (permalink / raw)
  To: xfs

On Saturday 28 of June 2014, Arkadiusz Miśkiewicz wrote:
> On Saturday 28 of June 2014, Arkadiusz Miśkiewicz wrote:
> > Hello.
> > 
> > I have a fs (metadump of it
> > http://ixion.pld-linux.org/~arekm/p2/x1/web2-home.metadump.gz)
> > that xfs_repair 3.2.0 is unable to fix properly.
> > 
> > Running xfs_repair few times shows the same errors repeating:
> > http://ixion.pld-linux.org/~arekm/p2/x1/repair2.txt
> > http://ixion.pld-linux.org/~arekm/p2/x1/repair3.txt
> > http://ixion.pld-linux.org/~arekm/p2/x1/repair4.txt
> > http://ixion.pld-linux.org/~arekm/p2/x1/repair5.txt
> > 
> > (repair1.txt also exists - it was initial, very big/long repair)
> > 
> > Note that fs mounts fine (and was mounting fine before and after repair)
> > but xfs_repair indicates that not everything got fixed.
> > 
> > 
> > Unfortunately there looks to be a problem with metadump image. xfs_repair
> > is able to finish fixing on a restored image but is not able (see
> > repairX.txt) above on real devices. Huh?
> 
> Made xfs metadump without file obfuscation and I'm able to reproduce the
> problem reliably on the image (if some xfs developer wants metadump image
> then please mail me - I don't want to put it for everyone due to obvious
> reasons).

I forgot to mention a new problem. After running xfs_repair (reproducible on the
real fs and on the metadump) and trying to mount the fs:

[3571367.717167] XFS (loop0): Mounting Filesystem
[3571367.883958] XFS (loop0): Ending clean mount
[3571367.900733] XFS (loop0): Failed to initialize disk quotas.

Files are accessible etc. - just no quota. Unfortunately there is no information
about why initialization failed.

So xfs_repair wasn't able to fix that, either.
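
(If it helps, the quota state after mounting can be checked like this - the
mount point is illustrative:)

$ xfs_quota -x -c state /mnt/web2-home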

-- 
Arkadiusz Miśkiewicz, arekm / maven.pl


* Re: xfs_repair 3.2.0 cannot (?) fix fs
  2014-06-27 23:41 xfs_repair 3.2.0 cannot (?) fix fs Arkadiusz Miśkiewicz
  2014-06-28 21:52 ` Arkadiusz Miśkiewicz
@ 2014-06-30  3:18 ` Dave Chinner
  2014-06-30  3:44   ` Dave Chinner
  2014-06-30  5:36   ` Arkadiusz Miśkiewicz
  1 sibling, 2 replies; 9+ messages in thread
From: Dave Chinner @ 2014-06-30  3:18 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: Alex Elder, xfs

[Compendium reply to all 3 emails]

On Sat, Jun 28, 2014 at 01:41:54AM +0200, Arkadiusz Miśkiewicz wrote:
> 
> Hello.
> 
> I have a fs (metadump of it
> http://ixion.pld-linux.org/~arekm/p2/x1/web2-home.metadump.gz)
> that xfs_repair 3.2.0 is unable to fix properly.
> 
> Running xfs_repair few times shows the same errors repeating:
> http://ixion.pld-linux.org/~arekm/p2/x1/repair2.txt
> http://ixion.pld-linux.org/~arekm/p2/x1/repair3.txt
> http://ixion.pld-linux.org/~arekm/p2/x1/repair4.txt
> http://ixion.pld-linux.org/~arekm/p2/x1/repair5.txt
> 
> (repair1.txt also exists - it was initial, very big/long repair)
> 
> Note that fs mounts fine (and was mounting fine before and after repair) but 
> xfs_repair indicates that not everything got fixed.
> 
> 
> Unfortunately there looks to be a problem with metadump image. xfs_repair is 
> able to finish fixing on a restored image but is not able (see repairX.txt) 
> above on real devices. Huh?
> 
> Examples of problems repeating each time xfs_repair is run:
> 
> 1)
> reset bad sb for ag 5
> non-null group quota inode field in superblock 7

OK, so this is indicative of something having been screwed up a long time ago.
Firstly, the primary superblock shows:

uquotino = 4077961
gquotino = 0
qflags = 0

i.e. user quota @ inode 4077961, no group quota. The secondary
superblocks that are being warned about show:

uquotino = 0
gquotino = 4077962
qflags = 0

Which is clearly wrong. They should have been overwritten during the
growfs operation to match the primary superblock.

The similarity in inode number leads me to believe that at some point
both user and group/project quotas were enabled on this filesystem,
but right now only user quotas are enabled.  It's only AGs 1-15 that
show this, so it seems likely to me that this
filesystem was originally only 16 AGs and has been grown many times
since?
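
For reference, these superblock fields can be inspected directly with xfs_db;
something along these lines should show the mismatch (the device path is
illustrative):

$ xfs_db -r -c "sb 0" -c "p uquotino gquotino qflags" /dev/vg0/web2-home
$ xfs_db -r -c "sb 7" -c "p uquotino gquotino qflags" /dev/vg0/web2-home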

Oh, this all occurred because you had a growfs operation on 3.10
fail because of garbage in the sb of AG 16 (i.e. this from IRC:
http://sprunge.us/UJFE)? IOWs, this commit:

9802182 xfs: verify superblocks as they are read from disk

tripped up on sb 16. That means sb 16 was not modified by the
growfs operation, and so should have the pre-growfs information in
it:

uquotino = 4077961
gquotino = 4077962
qflags = 0x77

Yeah, that's what I thought - the previous grow operation had both
quotas enabled. OK, that explains why the growfs operation had
issues, but it doesn't explain exactly how the quota inodes got
screwed up like that. Anyway, the growfs issues were solved by:

10e6e65 xfs: be more forgiving of a v4 secondary sb w/ junk in v5 fields

which landed in 3.13.

> 2) 
> correcting nblocks for inode 965195858, was 19 - counted 20
> correcting nextents for inode 965195858, was 16 - counted 17

Which is preceded by:

data fork in ino 965195858 claims free block 60323539
data fork in ino 965195858 claims free block 60323532

and when combined with the later:

entry "dsc0945153ac18d4d4f1a-150x150.jpg" (ino 967349800) in dir 965195858 is a duplicate name, marking entry to be junked

errors from that directory, it looks like the space was freed but
the directory btree was not correctly updated. No idea what might have
caused that, but it is a classic symptom of volatile write caches...

Hmmm, and when it goes to junk them in my local testing:

rebuilding directory inode 965195858
name create failed in ino 965195858 (117), filesystem may be out of space

Which is an EFSCORRUPTED error while trying to rebuild that directory.
The second repair run did not throw an error, but it did not fix
the errors either, as a third run still reported this. I'll look into why.


> 3) clearing some entries; moving to lost+found (the same files)

> 
> 4) 
> Phase 7 - verify and correct link counts...
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> done

Not sure what that is yet, but it looks like writing a directory
block found entries with invalid inode numbers in it, i.e. it's
telling me that something has not been fixed up.

I'm actually seeing this in phase 4:

        - agno = 148
Invalid inode number 0xfeffffffffffffff
xfs_dir_ino_validate: XFS_ERROR_REPORT
Metadata corruption detected at block 0x11fbb698/0x1000
libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000

Second time around, this does not happen, so the error has been
corrected in a later phase of the first pass.


> 5)Metadata CRC error detected at block 0x0/0x200
> but it is not CRC enabled fs

That's typically caused by junk in the superblock beyond the end
of the v4 superblock structure. It should be followed by "zeroing
junk ..."
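
A quick way to confirm that the fs really is v4 (i.e. no CRCs) is xfs_db's
version command - device path illustrative:

$ xfs_db -r -c version /dev/vg0/web2-home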


> Made xfs metadump without file obfuscation and I'm able to reproduce the 
> problem reliably on the image (if some xfs developer wants metadump image then 
> please mail me - I don't want to put it for everyone due to obvious reasons).
> 
> So additional bug in xfs_metadump where file obfuscation "fixes" some issues. 
> Does it obfuscate but keep invalid conditions (like keeping "/" in file name) 
> ? I guess it is not doing that.

I doubt it handles a "/" in a file name properly - that's rather
illegal, and the obfuscation code probably doesn't handle it at all.
FWIW, xfs_repair will trash those files anyway:

entry at block 22 offset 560 in directory inode 419558142 has illegal name "/_198.jpg": clearing entry

So regardless of whether metadump handles them or not, that is not going to
change the fact that filenames with "/" in them are broken....

But the real question here is how did you get "/" characters in
filenames? 

> [3571367.717167] XFS (loop0): Mounting Filesystem
> [3571367.883958] XFS (loop0): Ending clean mount
> [3571367.900733] XFS (loop0): Failed to initialize disk quotas.
>
> Files are accessible etc. Just no quota. Unfortunately no information why 
> initialization failed.

I can't tell why that's happening yet. I'm not sure what the correct
state is supposed to be (mount options will tell me), so I'm not
sure what went wrong. As it is, you probably should be upgrading to
a more recent kernel....

> So xfs_repair wasn't able to fix that, too.

xfs_repair isn't detecting there is a problem because the uquotino
is not corrupt and the qflags is zero. Hence it doesn't do anything.

More as I find it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs_repair 3.2.0 cannot (?) fix fs
  2014-06-30  3:18 ` Dave Chinner
@ 2014-06-30  3:44   ` Dave Chinner
  2014-06-30  5:36   ` Arkadiusz Miśkiewicz
  1 sibling, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2014-06-30  3:44 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: Alex Elder, xfs

On Mon, Jun 30, 2014 at 01:18:10PM +1000, Dave Chinner wrote:
> On Sat, Jun 28, 2014 at 01:41:54AM +0200, Arkadiusz Miśkiewicz wrote:
> > 4) 
> > Phase 7 - verify and correct link counts...
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > done
> 
> Not sure what that is yet, but it looks like writing a directory
> block found entries with invalid inode numbers in it. i.e. it's
> telling me that there's something not been fixed up.
> 
> I'm actually seeing this in phase4:
> 
>         - agno = 148
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000

OK:

repair/dir2.c:#define   BADFSINO ((xfs_ino_t)0xfeffffffffffffffULL)

And:

                /*
                 * Now we can mark entries with BADFSINO's bad.
                 */
                if (!no_modify && ent_ino == BADFSINO) {
                        dep->name[0] = '/';
                        *dirty = 1;
                        junkit = 0;
                }

So these errors:

> entry at block 22 offset 560 in directory inode 419558142 has illegal name "/_198.jpg": clearing entry

Appear to be the result of repair failing at some point before phase
6, which cleans up bad inodes and entries in the directory structure.
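
(For anyone following along: a no-modify repair pass is a cheap way to list the
junked "/" entries still on disk without changing anything - device path
illustrative:)

$ xfs_repair -n /dev/vg0/web2-home 2>&1 | tee repair-nomodify.txt
$ grep 'illegal name' repair-nomodify.txt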

> So regardless of whether metadump handles them or is not going to
> change the fact that filenames with "/" them are broken....
> 
> But the real question here is how did you get "/" characters in
> filenames? 

Yup, there are several places where repair overwrites the dirent name
with a leading "/" to indicate a junked entry, and this is supposed
to be detected and handled in phase 6.  Seems like the directory is
not being rebuilt in phase 6?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs_repair 3.2.0 cannot (?) fix fs
  2014-06-30  3:18 ` Dave Chinner
  2014-06-30  3:44   ` Dave Chinner
@ 2014-06-30  5:36   ` Arkadiusz Miśkiewicz
  2014-06-30 11:12     ` Dave Chinner
  1 sibling, 1 reply; 9+ messages in thread
From: Arkadiusz Miśkiewicz @ 2014-06-30  5:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Alex Elder, xfs

On Monday 30 of June 2014, Dave Chinner wrote:
> [Compendium reply to all 3 emails]
> 
> On Sat, Jun 28, 2014 at 01:41:54AM +0200, Arkadiusz Miśkiewicz wrote:
> > Hello.
> > 
> > I have a fs (metadump of it
> > http://ixion.pld-linux.org/~arekm/p2/x1/web2-home.metadump.gz)
> > that xfs_repair 3.2.0 is unable to fix properly.
> > 
> > Running xfs_repair few times shows the same errors repeating:
> > http://ixion.pld-linux.org/~arekm/p2/x1/repair2.txt
> > http://ixion.pld-linux.org/~arekm/p2/x1/repair3.txt
> > http://ixion.pld-linux.org/~arekm/p2/x1/repair4.txt
> > http://ixion.pld-linux.org/~arekm/p2/x1/repair5.txt
> > 
> > (repair1.txt also exists - it was initial, very big/long repair)
> > 
> > Note that fs mounts fine (and was mounting fine before and after repair)
> > but xfs_repair indicates that not everything got fixed.
> > 
> > 
> > Unfortunately there looks to be a problem with metadump image. xfs_repair
> > is able to finish fixing on a restored image but is not able (see
> > repairX.txt) above on real devices. Huh?
> > 
> > Examples of problems repeating each time xfs_repair is run:
> > 
> > 1)
> > reset bad sb for ag 5
> >
> > non-null group quota inode field in superblock 7
> 
> OK, so this is indicative of something screwed up a long time ago.
> Firstly, the primary superblocks shows:
> 
> uquotino = 4077961
> gquotino = 0
> qflags = 0
> 
> i.e. user quota @ inode 4077961, no group quota. The secondary
> superblocks that are being warned about show:
> 
> uquotino = 0
> gquotino = 4077962
> qflags = 0
> 
> Which is clearly wrong. They should have been overwritten during the
> growfs operation to match the primary superblock.
> 
> The similarity in inode number leads me to beleive at some point
> both user and group/project quotas were enabled on this filesystem,

Both user and project quotas were enabled on this fs for the last few years.

> but right now only user quotas are enabled.  It's only AGs 1-15 that
> show this, so this seems to me that it is likely that this
> filesystem was originally only 16 AGs and it's been grown many times
> since?

The quotas were running fine until some repair run (i.e. mounting with quota
succeeded before and after the first repair) - some later xfs_repair run broke this.

> 
> Oh, this all occurred because you had a growfs operation on 3.10
> fail because of garbage in the the sb of AG 16 (i.e. this from IRC:
> http://sprunge.us/UJFE)? IOWs, this commit:
> 
> 9802182 xfs: verify superblocks as they are read from disk
> 
> tripped up on sb 16. That means sb 16 is was not modified by the
> growfs operation, and so should have the pre-growfs information in
> it:
> 
> uquotino = 4077961
> gquotino = 4077962
> qflags = 0x77
> 
> Yeah, that's what I thought - the previous grow operation had both
> quotas enabled. OK, that explains why the growfs operation had
> issues, but it doesn't explain exactly how the quota inodes got
> screwed up like that.

The fs had working quota when it already had a 3-digit number of AGs. I wouldn't
blame the growfs failure for the quota breakage. IMO some repair run broke this
(or tried to fix it and broke it).

> Anyway, the growfs issues were solved by:
> 
> 10e6e65 xfs: be more forgiving of a v4 secondary sb w/ junk in v5 fields
> 
> which landed in 3.13.

Ok.

> 
> > 2)
> > correcting nblocks for inode 965195858, was 19 - counted 20
> > correcting nextents for inode 965195858, was 16 - counted 17
> 
> Which is preceeded by:
> 
> data fork in ino 965195858 claims free block 60323539
> data fork in ino 965195858 claims free block 60323532
> 
> and when combined with the later:
> 
> entry "dsc0945153ac18d4d4f1a-150x150.jpg" (ino 967349800) in dir 965195858
> is a duplicate name, marking entry to be junked
> 
> errors from that directory, it looks like the space was freed but
> the directory btree not correctly updated. No idea what might have
> caused that, but it is a classic symptom of volatile write caches...
> 
> Hmmm, and when It goes to junk them on my local testing:
> 
> rebuilding directory inode 965195858
> name create failed in ino 965195858 (117), filesystem may be out of space
> 
> Which is an EFSCORRUPTED error trying to rebuild that directory.
> The second error pass did not throw an error, but it did not fix
> the errors as a 3rd pass still reported this. I'll look into why.
> 
> > 3) clearing some entries; moving to lost+found (the same files)
> > 
> > 
> > 4)
> > Phase 7 - verify and correct link counts...
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > done
> 
> Not sure what that is yet, but it looks like writing a directory
> block found entries with invalid inode numbers in it. i.e. it's
> telling me that there's something not been fixed up.
> 
> I'm actually seeing this in phase4:
> 
>         - agno = 148
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> 
> Second time around, this does not happen, so the error has been
> corrected in a later phase of the first pass.

Here on two runs I got exactly the same report:

Phase 7 - verify and correct link counts...

Invalid inode number 0xfeffffffffffffff
xfs_dir_ino_validate: XFS_ERROR_REPORT
Metadata corruption detected at block 0x11fbb698/0x1000
libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
Invalid inode number 0xfeffffffffffffff
xfs_dir_ino_validate: XFS_ERROR_REPORT
Metadata corruption detected at block 0x11fbb698/0x1000
libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000

but there were more errors like this earlier, so repair fixed some but was left
with these two.

> 
> > 5)Metadata CRC error detected at block 0x0/0x200
> > but it is not CRC enabled fs
> 
> That's typically caused by junk in the superblock beyond the end
> of the v4 superblock structure. It should be followed by "zeroing
> junk ..."

Shouldn't repair fix the superblocks when it notices a v4 fs?

I mean 3.2.0 repair reports:

$ xfs_repair -v ./1t-image 
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
        - block cache size set to 748144 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 2 tail block 2
        - scan filesystem freespace and inode maps...
Metadata CRC error detected at block 0x0/0x200
zeroing unused portion of primary superblock (AG #0)
        - 07:20:11: scanning filesystem freespace - 391 of 391 allocation 
groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 07:20:11: scanning agi unlinked lists - 391 of 391 allocation groups 
done
        - process known inodes and perform inode discovery...
        - agno = 0
[...]

but if I run 3.1.11 after running 3.2.0, then the superblocks get fixed:

$ ./xfsprogs/repair/xfs_repair -v ./1t-image 
Phase 1 - find and verify superblock...
        - block cache size set to 748144 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 2 tail block 2
        - scan filesystem freespace and inode maps...
zeroing unused portion of primary superblock (AG #0)
zeroing unused portion of secondary superblock (AG #3)
zeroing unused portion of secondary superblock (AG #1)
zeroing unused portion of secondary superblock (AG #8)
zeroing unused portion of secondary superblock (AG #2)
zeroing unused portion of secondary superblock (AG #5)
zeroing unused portion of secondary superblock (AG #6)
zeroing unused portion of secondary superblock (AG #20)
zeroing unused portion of secondary superblock (AG #9)
zeroing unused portion of secondary superblock (AG #7)
zeroing unused portion of secondary superblock (AG #12)
zeroing unused portion of secondary superblock (AG #10)
zeroing unused portion of secondary superblock (AG #13)
zeroing unused portion of secondary superblock (AG #14)
[...]
zeroing unused portion of secondary superblock (AG #388)
zeroing unused portion of secondary superblock (AG #363)
        - found root inode chunk
Phase 3 - for each AG...

Shouldn't these be zeroed as "unused" by 3.2.0, too (since it is a v4 fs)?

> > Made xfs metadump without file obfuscation and I'm able to reproduce the
> > problem reliably on the image (if some xfs developer wants metadump image
> > then please mail me - I don't want to put it for everyone due to obvious
> > reasons).
> > 
> > So additional bug in xfs_metadump where file obfuscation "fixes" some
> > issues. Does it obfuscate but keep invalid conditions (like keeping "/"
> > in file name) ? I guess it is not doing that.
> 
> I doubt it handles a "/" in a file name properly - that's rather
> illegal, and the obfuscation code probably doesn't handle it at all.

It would be nice to keep these bad conditions. The obfuscated metadump behaves
differently from the non-obfuscated metadump under xfs_repair here (fewer issues
with the obfuscated one than the non-obfuscated one), so obfuscation simply hides
problems.

I assume that you are doing your testing on the non-obfuscated dump I gave you on IRC?

> FWIW, xfs_repair will trash those files anyway:
> 
> entry at block 22 offset 560 in directory inode 419558142 has illegal name
> "/_198.jpg": clearing entry
> 
> So regardless of whether metadump handles them or is not going to
> change the fact that filenames with "/" them are broken....
> 
> But the real question here is how did you get "/" characters in
> filenames?

No idea. It could have gotten corrupted many months or years ago. This fs had not
seen a repair for a very long time (since there were no visible issues with it).

> > [3571367.717167] XFS (loop0): Mounting Filesystem
> > [3571367.883958] XFS (loop0): Ending clean mount
> > [3571367.900733] XFS (loop0): Failed to initialize disk quotas.
> > 
> > Files are accessible etc. Just no quota. Unfortunately no information why
> > initialization failed.
> 
> I can't tell why that's happening yet. I'm not sure what the correct
> state is supposed to be yet (mount options will tell me)

noatime,nodiratime,nodev,nosuid,usrquota,prjquota

> so I'm not
> sure what went wrong. As it is, you probaby should be upgrading to
> a more recent kernel....

I can try to mount the metadump image on a newer kernel - I will check and report
back.
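
(Roughly what I intend to do - image and mount point paths are illustrative:)

$ xfs_mdrestore web2-home.metadump web2-home.img
$ mount -o loop,noatime,nodiratime,nodev,nosuid,usrquota,prjquota web2-home.img /mnt/test
$ dmesg | tail    # look for "Failed to initialize disk quotas"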

> > So xfs_repair wasn't able to fix that, too.
> 
> xfs_repair isn't detecting there is a problem because the uquotino
> is not corrupt and the qflags is zero. Hence it doesn't do anything.
> 
> More as I find it.
> 
> Cheers,
> 
> Dave.


-- 
Arkadiusz Miśkiewicz, arekm / maven.pl


* Re: xfs_repair 3.2.0 cannot (?) fix fs
  2014-06-30  5:36   ` Arkadiusz Miśkiewicz
@ 2014-06-30 11:12     ` Dave Chinner
  2014-06-30 11:53       ` Arkadiusz Miśkiewicz
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2014-06-30 11:12 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: Alex Elder, xfs

On Mon, Jun 30, 2014 at 07:36:24AM +0200, Arkadiusz Miśkiewicz wrote:
> On Monday 30 of June 2014, Dave Chinner wrote:
> > [Compendium reply to all 3 emails]
> > 
> > On Sat, Jun 28, 2014 at 01:41:54AM +0200, Arkadiusz Miśkiewicz wrote:
> > > reset bad sb for ag 5
> > >
> > > non-null group quota inode field in superblock 7
> > 
> > OK, so this is indicative of something screwed up a long time ago.
> > Firstly, the primary superblocks shows:
> > 
> > uquotino = 4077961
> > gquotino = 0
> > qflags = 0
> > 
> > i.e. user quota @ inode 4077961, no group quota. The secondary
> > superblocks that are being warned about show:
> > 
> > uquotino = 0
> > gquotino = 4077962
> > qflags = 0
> > 
> > Which is clearly wrong. They should have been overwritten during the
> > growfs operation to match the primary superblock.
> > 
> > The similarity in inode number leads me to beleive at some point
> > both user and group/project quotas were enabled on this filesystem,
> 
> Both user and project quotas were enabled on this fs for last few years.
> 
> > but right now only user quotas are enabled.  It's only AGs 1-15 that
> > show this, so this seems to me that it is likely that this
> > filesystem was originally only 16 AGs and it's been grown many times
> > since?
> 
> The quotas was running fine until some repair run (ie. before and after first 
> repair mounting with quota succeeded) - some xfs_repair run later broke this.

Actually, it looks more likely that a quotacheck failed part way
through, leaving the quota in an indeterminate state, and then repair
was run, messing things up further...

> > > Invalid inode number 0xfeffffffffffffff
> > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > done
> > 
> > Not sure what that is yet, but it looks like writing a directory
> > block found entries with invalid inode numbers in it. i.e. it's
> > telling me that there's something not been fixed up.
> > 
> > I'm actually seeing this in phase4:
> > 
> >         - agno = 148
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > 
> > Second time around, this does not happen, so the error has been
> > corrected in a later phase of the first pass.
> 
> Here on two runs I got exactly the same report:
> 
> Phase 7 - verify and correct link counts...
> 
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> Invalid inode number 0xfeffffffffffffff
> xfs_dir_ino_validate: XFS_ERROR_REPORT
> Metadata corruption detected at block 0x11fbb698/0x1000
> libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> 
> but there were more of errors like this earlier so repair fixed some but left 
> with these two.

Right, I suspect that I've got a partial fix for this already in
place - I was having xfs_repair -n ... SEGV when parsing the
broken directory in phase 6, so I have some code that prevents that
crash, which might also be partially fixing this.

> > > 5)Metadata CRC error detected at block 0x0/0x200
> > > but it is not CRC enabled fs
> > 
> > That's typically caused by junk in the superblock beyond the end
> > of the v4 superblock structure. It should be followed by "zeroing
> > junk ..."
> 
> Shouldn't repair fix superblocks when noticing v4 fs?

It does.

> I mean 3.2.0 repair reports:
> 
> $ xfs_repair -v ./1t-image 
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 15 minutes
>         - block cache size set to 748144 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 2 tail block 2
>         - scan filesystem freespace and inode maps...
> Metadata CRC error detected at block 0x0/0x200
> zeroing unused portion of primary superblock (AG #0)
>         - 07:20:11: scanning filesystem freespace - 391 of 391 allocation 
> groups done
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - 07:20:11: scanning agi unlinked lists - 391 of 391 allocation groups 
> done
>         - process known inodes and perform inode discovery...
>         - agno = 0
> [...]
> 
> but if I run 3.1.11 after running 3.2.0 then superblocks get fixed:
> 
> $ ./xfsprogs/repair/xfs_repair -v ./1t-image 
> Phase 1 - find and verify superblock...
>         - block cache size set to 748144 entries
> Phase 2 - using internal log
>         - zero log...
> zero_log: head block 2 tail block 2
>         - scan filesystem freespace and inode maps...
> zeroing unused portion of primary superblock (AG #0)

,,,

> Shouldn't these be "unused" for 3.2.0, too (since v4 fs) ?

I'm pretty sure that's indicative of older xfs_repair code
not understanding that sb_badfeatures2  didn't need to be zeroed.
It wasn't until:

cbd7508 xfs_repair: zero out unused parts of superblocks

that xfs_repair correctly sized the unused area of the superblock.
You'll probably find that mounting this filesystem resulted in
"sb_badfeatures2 mismatch detected. Correcting." or something
similar in dmesg because of this (now fixed) repair bug.

> > > Made xfs metadump without file obfuscation and I'm able to reproduce the
> > > problem reliably on the image (if some xfs developer wants metadump image
> > > then please mail me - I don't want to put it for everyone due to obvious
> > > reasons).
> > > 
> > > So additional bug in xfs_metadump where file obfuscation "fixes" some
> > > issues. Does it obfuscate but keep invalid conditions (like keeping "/"
> > > in file name) ? I guess it is not doing that.
> > 
> > I doubt it handles a "/" in a file name properly - that's rather
> > illegal, and the obfuscation code probably doesn't handle it at all.
> 
> Would be nice to keep these bad conditions. obfuscated metadump is behaving 
> differently than non-obfuscated metadump with xfs_repair here (less issues 
> with obfuscated than non-obfuscated), so obfuscation simply hides problems.

Sure, but we didn't even know this was a problem until now, so that
will have to wait....

> I assume that you do testing on the non-obfuscated dump I gave on irc?

Yes, but I've been cross checking against the obfuscated one with
xfs_db....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs_repair 3.2.0 cannot (?) fix fs
  2014-06-30 11:12     ` Dave Chinner
@ 2014-06-30 11:53       ` Arkadiusz Miśkiewicz
  2014-06-30 12:06         ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Arkadiusz Miśkiewicz @ 2014-06-30 11:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Alex Elder, xfs

On Monday 30 of June 2014, Dave Chinner wrote:
> On Mon, Jun 30, 2014 at 07:36:24AM +0200, Arkadiusz Miśkiewicz wrote:
> > On Monday 30 of June 2014, Dave Chinner wrote:
> > > [Compendium reply to all 3 emails]
> > > 
> > > On Sat, Jun 28, 2014 at 01:41:54AM +0200, Arkadiusz Miśkiewicz wrote:
> > > > reset bad sb for ag 5
> > > >
> > > > non-null group quota inode field in superblock 7
> > > 
> > > OK, so this is indicative of something screwed up a long time ago.
> > > Firstly, the primary superblocks shows:
> > > 
> > > uquotino = 4077961
> > > gquotino = 0
> > > qflags = 0
> > > 
> > > i.e. user quota @ inode 4077961, no group quota. The secondary
> > > superblocks that are being warned about show:
> > > 
> > > uquotino = 0
> > > gquotino = 4077962
> > > qflags = 0
> > > 
> > > Which is clearly wrong. They should have been overwritten during the
> > > growfs operation to match the primary superblock.
> > > 
> > > The similarity in inode number leads me to beleive at some point
> > > both user and group/project quotas were enabled on this filesystem,
> > 
> > Both user and project quotas were enabled on this fs for last few years.
> > 
> > > but right now only user quotas are enabled.  It's only AGs 1-15 that
> > > show this, so this seems to me that it is likely that this
> > > filesystem was originally only 16 AGs and it's been grown many times
> > > since?
> > 
> > The quotas was running fine until some repair run (ie. before and after
> > first repair mounting with quota succeeded) - some xfs_repair run later
> > broke this.
> 
> Actually, it looks more likely that a quotacheck has failed part way
> though, leaving the quota in an indeterminate state and then repair
> has been run, messing things up more...

Hm, the only quotacheck I see in the logs from that day reported "Done". I assume
it wouldn't report that if some problem had occurred in the middle?

Jun 28 00:57:36 web2 kernel: [736161.906626] XFS (dm-1): Quotacheck needed: 
Please wait.
Jun 28 01:09:10 web2 kernel: [736855.851555] XFS (dm-1): Quotacheck: Done.

[...] there were a few "Internal error xfs_bmap_read_extents(1)" errors while doing
xfs_dir_lookup (I assume due to the unfixed directory entry problem).
xfs_repair was also run a few times, and then...

Jun 28 23:16:50 web2 kernel: [816515.898210] XFS (dm-1): Mounting Filesystem
Jun 28 23:16:50 web2 kernel: [816515.915356] XFS (dm-1): Ending clean mount
Jun 28 23:16:50 web2 kernel: [816515.940008] XFS (dm-1): Failed to initialize 
disk quotas.


> 
> > > > Invalid inode number 0xfeffffffffffffff
> > > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > > done
> > > 
> > > Not sure what that is yet, but it looks like writing a directory
> > > block found entries with invalid inode numbers in it. i.e. it's
> > > telling me that there's something not been fixed up.
> > > 
> > > I'm actually seeing this in phase4:
> > >         - agno = 148
> > > 
> > > Invalid inode number 0xfeffffffffffffff
> > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > 
> > > Second time around, this does not happen, so the error has been
> > > corrected in a later phase of the first pass.
> > 
> > Here on two runs I got exactly the same report:
> > 
> > Phase 7 - verify and correct link counts...
> > 
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > Invalid inode number 0xfeffffffffffffff
> > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > Metadata corruption detected at block 0x11fbb698/0x1000
> > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > 
> > but there were more of errors like this earlier so repair fixed some but
> > left with these two.
> 
> Right, I suspect that I've got a partial fix for this already in
> place - i was having xfs_repair -n ... SEGV on when parsing the
> broken directory in phase 6, so I have some code that prevents that
> crash which might also be partially fixing this.

Nice :-) Do you also know why 3.1.11 doesn't have this problem with 
xfs_dir_ino_validate: XFS_ERROR_REPORT ?

> > > > 5)Metadata CRC error detected at block 0x0/0x200
> > > > but it is not CRC enabled fs
> > > 
> > > That's typically caused by junk in the superblock beyond the end
> > > of the v4 superblock structure. It should be followed by "zeroing
> > > junk ..."
> > 
> > Shouldn't repair fix superblocks when noticing v4 fs?
> 
> It does.
> 
> > I mean 3.2.0 repair reports:
> > 
> > $ xfs_repair -v ./1t-image
> > Phase 1 - find and verify superblock...
> > 
> >         - reporting progress in intervals of 15 minutes
> >         - block cache size set to 748144 entries
> > 
> > Phase 2 - using internal log
> > 
> >         - zero log...
> > 
> > zero_log: head block 2 tail block 2
> > 
> >         - scan filesystem freespace and inode maps...
> > 
> > Metadata CRC error detected at block 0x0/0x200
> > zeroing unused portion of primary superblock (AG #0)
> > 
> >         - 07:20:11: scanning filesystem freespace - 391 of 391 allocation
> > 
> > groups done
> > 
> >         - found root inode chunk
> > 
> > Phase 3 - for each AG...
> > 
> >         - scan and clear agi unlinked lists...
> >         - 07:20:11: scanning agi unlinked lists - 391 of 391 allocation
> >         groups
> > 
> > done
> > 
> >         - process known inodes and perform inode discovery...
> >         - agno = 0
> > 
> > [...]
> > 
> > but if I run 3.1.11 after running 3.2.0 then superblocks get fixed:
> > 
> > $ ./xfsprogs/repair/xfs_repair -v ./1t-image
> > Phase 1 - find and verify superblock...
> > 
> >         - block cache size set to 748144 entries
> > 
> > Phase 2 - using internal log
> > 
> >         - zero log...
> > 
> > zero_log: head block 2 tail block 2
> > 
> >         - scan filesystem freespace and inode maps...
> > 
> > zeroing unused portion of primary superblock (AG #0)
> 
> ,,,
> 
> > Shouldn't these be "unused" for 3.2.0, too (since v4 fs) ?
> 
> I'm pretty sure that's indicative of older xfs_repair code
> not understanding that sb_badfeatures2  didn't need to be zeroed.
> It wasn't until:
> 
> cbd7508 xfs_repair: zero out unused parts of superblocks
> 
> that xfs_repair correctly sized the unused area of the superblock.
> You'll probably find that mounting this filesystem resulted in
> ""sb_badfeatures2 mistach detected. Correcting." or something
> similar in dmesg because of this (now fixed) repair bug.

Tested 3.1.11 with cbd7508 applied and indeed there is no "zeroing unused portion of
primary superblock" anymore.

> > > > Made xfs metadump without file obfuscation and I'm able to reproduce
> > > > the problem reliably on the image (if some xfs developer wants
> > > > metadump image then please mail me - I don't want to put it for
> > > > everyone due to obvious reasons).
> > > > 
> > > > So additional bug in xfs_metadump where file obfuscation "fixes" some
> > > > issues. Does it obfuscate but keep invalid conditions (like keeping
> > > > "/" in file name) ? I guess it is not doing that.
> > > 
> > > I doubt it handles a "/" in a file name properly - that's rather
> > > illegal, and the obfuscation code probably doesn't handle it at all.
> > 
> > Would be nice to keep these bad conditions. obfuscated metadump is
> > behaving differently than non-obfuscated metadump with xfs_repair here
> > (less issues with obfuscated than non-obfuscated), so obfuscation simply
> > hides problems.
> 
> Sure, but we didn't even know this was a problem until now, so that
> will have to wait....
> 
> > I assume that you do testing on the non-obfuscated dump I gave on irc?
> 
> Yes, but I've been cross checking against the obfuscated one with
> xfs_db....
> 
> Cheers,
> 
> Dave.


-- 
Arkadiusz Miśkiewicz, arekm / maven.pl


* Re: xfs_repair 3.2.0 cannot (?) fix fs
  2014-06-30 11:53       ` Arkadiusz Miśkiewicz
@ 2014-06-30 12:06         ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2014-06-30 12:06 UTC (permalink / raw)
  To: Arkadiusz Miśkiewicz; +Cc: Alex Elder, xfs

On Mon, Jun 30, 2014 at 01:53:10PM +0200, Arkadiusz Miśkiewicz wrote:
> On Monday 30 of June 2014, Dave Chinner wrote:
> > On Mon, Jun 30, 2014 at 07:36:24AM +0200, Arkadiusz Miśkiewicz wrote:
> > > On Monday 30 of June 2014, Dave Chinner wrote:
> > > > but right now only user quotas are enabled.  It's only AGs 1-15 that
> > > > show this, so this seems to me that it is likely that this
> > > > filesystem was originally only 16 AGs and it's been grown many times
> > > > since?
> > > 
> > > The quotas was running fine until some repair run (ie. before and after
> > > first repair mounting with quota succeeded) - some xfs_repair run later
> > > broke this.
> > 
> > Actually, it looks more likely that a quotacheck has failed part way
> > though, leaving the quota in an indeterminate state and then repair
> > has been run, messing things up more...
> 
> Hm, the only quotacheck I see in logs from that day reported "Done". I assume 
> it wouldn't report that if some problem occured in middle?
> 
> Jun 28 00:57:36 web2 kernel: [736161.906626] XFS (dm-1): Quotacheck needed: 
> Please wait.
> Jun 28 01:09:10 web2 kernel: [736855.851555] XFS (dm-1): Quotacheck: Done.

If there was an error, it should report it and say that quotas are
being turned off.

> [...] here were few Internal error xfs_bmap_read_extents(1) while doing 
> xfs_dir_lookup (I assume due to not fixed directory entries problem). 
> xfs_repair was also run few times and then...
> 
> Jun 28 23:16:50 web2 kernel: [816515.898210] XFS (dm-1): Mounting Filesystem
> Jun 28 23:16:50 web2 kernel: [816515.915356] XFS (dm-1): Ending clean mount
> Jun 28 23:16:50 web2 kernel: [816515.940008] XFS (dm-1): Failed to initialize 
> disk quotas.

I haven't tracked down what the error here is yet - I'm still
working on the repair side of things before I even try to mount the
images you sent me. :/

Once I get repair running cleanly, I'll look at why this is failing.

> > > > > Invalid inode number 0xfeffffffffffffff
> > > > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > > > done
> > > > 
> > > > Not sure what that is yet, but it looks like writing a directory
> > > > block found entries with invalid inode numbers in it. i.e. it's
> > > > telling me that there's something not been fixed up.
> > > > 
> > > > I'm actually seeing this in phase4:
> > > >         - agno = 148
> > > > 
> > > > Invalid inode number 0xfeffffffffffffff
> > > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > > 
> > > > Second time around, this does not happen, so the error has been
> > > > corrected in a later phase of the first pass.
> > > 
> > > Here on two runs I got exactly the same report:
> > > 
> > > Phase 7 - verify and correct link counts...
> > > 
> > > Invalid inode number 0xfeffffffffffffff
> > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > Invalid inode number 0xfeffffffffffffff
> > > xfs_dir_ino_validate: XFS_ERROR_REPORT
> > > Metadata corruption detected at block 0x11fbb698/0x1000
> > > libxfs_writebufr: write verifer failed on bno 0x11fbb698/0x1000
> > > 
> > > but there were more of errors like this earlier so repair fixed some but
> > > left with these two.
> > 
> > Right, I suspect that I've got a partial fix for this already in
> > place - i was having xfs_repair -n ... SEGV on when parsing the
> > broken directory in phase 6, so I have some code that prevents that
> > crash which might also be partially fixing this.
> 
> Nice :-) Do you also know why 3.1.11 doesn't have this problem with 
> xfs_dir_ino_validate: XFS_ERROR_REPORT ?

Oh, that's easy: 3.1.11 doesn't have write verifiers, so it would
never know that it wrote a bad inode number to disk. Like the kernel
code, the write verifiers actually check that the modifications
being made result in valid on-disk values, and that's something
we never had in repair before 3.2.0.

IOWs, 3.1.11 could well be writing inodes with 0xfeffffffffffffff in
them, but there's nothing to catch that in repair or libxfs on read
or write. Hence we could be tripping over an old bug we never knew
existed until now...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 9+ messages
2014-06-27 23:41 xfs_repair 3.2.0 cannot (?) fix fs Arkadiusz Miśkiewicz
2014-06-28 21:52 ` Arkadiusz Miśkiewicz
2014-06-28 22:01   ` Arkadiusz Miśkiewicz
2014-06-30  3:18 ` Dave Chinner
2014-06-30  3:44   ` Dave Chinner
2014-06-30  5:36   ` Arkadiusz Miśkiewicz
2014-06-30 11:12     ` Dave Chinner
2014-06-30 11:53       ` Arkadiusz Miśkiewicz
2014-06-30 12:06         ` Dave Chinner
