linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@sandeen.net>
To: John Jore <john@jore.no>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: Bug in xfs_repair 5..4.0 / Unable to repair metadata corruption
Date: Sun, 9 Feb 2020 21:47:18 -0600	[thread overview]
Message-ID: <74152f80-3a42-eab5-a95f-e29f03db46a9@sandeen.net> (raw)
In-Reply-To: <b2babb761ed24dc986abc3073c5c47fc@jore.no>

On 2/9/20 12:19 AM, John Jore wrote:
> Hi all,
> 
> Not sure if this is the appropriate forum to reports xfs_repair bugs? If wrong, please point me in the appropriate direction?

This is the place.

> I have a corrupted XFS volume which mounts fine, but xfs_repair is unable to repair it and volume eventually shuts down due to metadata corruption if writes are performed.

what does dmesg say when it shuts down?

> 
> Originally I used xfs_repair from CentOS 8.1.1911, but cloned latest xfs_repair from git://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git (Today, Feb 9th, reports as version 5.4.0)
> 
> 
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - 16:08:04: scanning agi unlinked lists - 64 of 64 allocation groups done
>         - process known inodes and perform inode discovery...
>         - agno = 45
>         - agno = 15
>         - agno = 0
>         - agno = 30
>         - agno = 60
>         - agno = 46
>         - agno = 16
> Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
>         - agno = 61
>         - agno = 31
>         - agno = 47
>         - agno = 62
>         - agno = 48
>         - agno = 49
>         - agno = 32
>         - agno = 33
>         - agno = 17
>         - agno = 1
> bad magic number 0x0 on inode 18253615584
> bad version number 0x0 on inode 18253615584
> bad magic number 0x0 on inode 18253615585
> bad version number 0x0 on inode 18253615585
> bad magic number 0x0 on inode 18253615586 
> .....
> bad magic number 0x0 on inode 18253615584, resetting magic number
> bad version number 0x0 on inode 18253615584, resetting version number
> bad magic number 0x0 on inode 18253615585, resetting magic number
> bad version number 0x0 on inode 18253615585, resetting version number
> bad magic number 0x0 on inode 18253615586, resetting magic number
> bad version number 0x0 on inode 18253615586, resetting version number

Looks like a whole chunk of inodes with at least 0 magic numbers.

> ....
>         - agno = 16
>         - agno = 17
> Metadata corruption detected at 0x4330e3, xfs_inode block 0x17312a3f0/0x2000
>         - agno = 18
>         - agno = 19
> ...   
> Phase 7 - verify and correct link counts...
>         - 16:10:41: verify and correct link counts - 64 of 64 allocation groups done
> Metadata corruption detected at 0x433385, xfs_inode block 0x17312a3f0/0x2000
> libxfs_writebufr: write verifier failed on xfs_inode bno 0x17312a3f0/0x2000

This bit seems problematic, I guess it's unable to write the updated inode buffer,
due to some corruption, which presumably is why you keep tripping over the same
corruption each time.

> releasing dirty buffer (bulk) to free list!
> 
>  
> 
> Does not matter how many times, I've lost count, I re-run xfs_repair, with, or without -d,

-d is for repairing a filesystem while mounted.  I hope you are not doing that, are you?

> it never does repair the volume.
> Volume is a ~12GB LV build using 4x 4TB disks in RAID 5 using a 3Ware 9690SA controller. 

Just to double check, are there any storage errors reported in dmesg?

> Any suggestions or additional data I can provide?

If you are willing to provide an xfs_metadump to me (off-list) I will see if I can
reproduce it from the metadump. 

# xfs_metadump /dev/$WHATEVER metadump.img
# bzip2 metadump.img

-Eric

> 
> John
> 

  reply	other threads:[~2020-02-10  3:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <186d30f217e645728ad1f34724cbe3e7@jore.no>
2020-02-09  6:19 ` Bug in xfs_repair 5..4.0 / Unable to repair metadata corruption John Jore
2020-02-10  3:47   ` Eric Sandeen [this message]
2020-02-10  3:49     ` Eric Sandeen
     [not found]       ` <60f32c031f4345a2b680fbc8531f7bd3@jore.no>
2020-02-10 10:33         ` John Jore
2020-02-10 14:36         ` Eric Sandeen
2020-02-10 14:43     ` Eric Sandeen
2020-02-10 15:35       ` Eric Sandeen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=74152f80-3a42-eab5-a95f-e29f03db46a9@sandeen.net \
    --to=sandeen@sandeen.net \
    --cc=john@jore.no \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).