linux-kernel.vger.kernel.org archive mirror
From: Qian Cai <cai@lca.pw>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	linux-xfs@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: xfs metadata corruption since 30 March
Date: Tue, 31 Mar 2020 22:13:42 -0400
Message-ID: <05FB019A-F4DC-414C-B8D9-D2735AF22034@lca.pw>
In-Reply-To: <20200331221324.GZ10776@dread.disaster.area>



> On Mar 31, 2020, at 6:13 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Tue, Mar 31, 2020 at 05:57:24PM -0400, Qian Cai wrote:
>> For the past two days, linux-next has been triggering xfs metadata corruption
>> during compilation workloads on both powerpc and arm64,
> 
> Is this on an existing filesystem, or a new filesystem?

New.

> 
>> I suspect it could be one of those commits,
>> 
>> https://lore.kernel.org/linux-xfs/20200328182533.GM29339@magnolia/
>> 
>> In particular, those commits that mark corruption more aggressively?
>> 
>>      [8d57c21600a5] xfs: add a function to deal with corrupt buffers post-verifiers
>>      [e83cf875d67a] xfs: xfs_buf_corruption_error should take __this_address
>>      [ce99494c9699] xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
>>      [1cb5deb5bc09] xfs: don't ever return a stale pointer from __xfs_dir3_free_read
>>      [6fb5aac73310] xfs: check owner of dir3 free blocks
>>      [a10c21ed5d52] xfs: check owner of dir3 data blocks
>>      [1b2c1a63b678] xfs: check owner of dir3 blocks
>>      [2e107cf869ee] xfs: mark dir corrupt when lookup-by-hash fails
>>      [806d3909a57e] xfs: mark extended attr corrupt when lookup-by-hash fails
> 
> Doubt it - they only add extra detection code and these:
> 
>> [29331.182313][  T665] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0xa9b97900 xfs_inode_buf_verify
>> xfs_inode_buf_verify at fs/xfs/libxfs/xfs_inode_buf.c:101
>> [29331.182373][  T665] XFS (dm-2): Unmount and run xfs_repair
>> [29331.182386][  T665] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
>> [29331.182402][  T665] 00000000: 2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d  /* SPDX-License-
>> [29331.182426][  T665] 00000010: 49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d  Identifier: GPL-
> 
> Would get caught by the existing verifiers as they aren't valid
> metadata at all.
> 
> Basically, you are getting file data where there should be inode
> metadata. First thing to do is fix the existing corruptions with
> xfs_repair - please post the entire output so we can see what was
> corrupted and what it fixed.
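
As an aside, the point that the buffer holds file data rather than inode
metadata can be checked by decoding the hex bytes from the dump quoted
above. This is only a minimal sketch: `decode_hex` is a hypothetical helper
name, not part of xfsprogs or the kernel, and it assumes the bytes are
passed as space-separated hex pairs copied from the log.

```shell
#!/bin/sh
# decode_hex is a hypothetical helper (an assumption, not an existing tool).
# It turns space-separated hex bytes, as printed in the kernel's
# "First 128 bytes of corrupted metadata buffer" dump, back into text.
decode_hex() {
    for byte in $1; do
        # Convert the hex byte to three octal digits, then let printf's %b
        # expand the resulting \0ooo escape into the actual character.
        printf '%b' "\\0$(printf '%03o' "0x$byte")"
    done
    printf '\n'
}

# The two rows shown in the report:
decode_hex "2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d"
decode_hex "49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d"
```

The two rows decode to the opening of an SPDX license comment, i.e. the
start of a source file, which matches the ASCII column of the dump.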


# xfs_repair -v /dev/mapper/rhel_hpe--apollo--cn99xx--11-home 
Phase 1 - find and verify superblock...
        - block cache size set to 4355512 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 793608 tail block 786824
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

# mount /dev/mapper/rhel_hpe--apollo--cn99xx--11-home /home/
# umount /home/
# xfs_repair -v /dev/mapper/rhel_hpe--apollo--cn99xx--11-home 
Phase 1 - find and verify superblock...
        - block cache size set to 4355512 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 793624 tail block 793624
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Tue Mar 31 22:10:54 2020

Phase		Start		End		Duration
Phase 1:	03/31 22:10:45	03/31 22:10:45	
Phase 2:	03/31 22:10:45	03/31 22:10:45	
Phase 3:	03/31 22:10:45	03/31 22:10:46	1 second
Phase 4:	03/31 22:10:46	03/31 22:10:53	7 seconds
Phase 5:	03/31 22:10:53	03/31 22:10:53	
Phase 6:	03/31 22:10:53	03/31 22:10:53	
Phase 7:	03/31 22:10:53	03/31 22:10:53	

Total run time: 8 seconds
done
> 
> Then if the problem is still reproducible, I suspect you are going
> to have to bisect it. i.e. run test, get corruption, mark bisect
> bad, run xfs_repair or mkfs to fix mess, install new kernel, run
> test again....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
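
The bisect loop described above (run test, check for corruption, mark the
step, repair or mkfs, install the next kernel) could be partly scripted.
The following is a rough sketch only: `bisect_verdict` is a hypothetical
helper name, and it assumes the corruption always shows up in the kernel
log with the same "Metadata corruption detected" message seen in this
thread. A "good" verdict is only as reliable as the workload, so a step
that does not reproduce on the first run may need repeating.

```shell
#!/bin/sh
# bisect_verdict is a hypothetical helper (an assumption, not a git or
# xfsprogs command). It reads kernel log text on stdin and prints "bad"
# if the XFS verifier message from this thread appears, otherwise "good".
bisect_verdict() {
    if grep -q 'XFS (.*): Metadata corruption detected'; then
        echo bad
    else
        echo good
    fi
}

# Intended use, once per bisect step, after booting the kernel under test
# and running the compilation workload on a freshly made filesystem:
#
#   git bisect "$(dmesg | bisect_verdict)"
#
# Then repair or re-mkfs the filesystem, build and install the commit git
# checks out next, reboot, and repeat.
```

Keeping the verdict as a filter on stdin means it can be tried against a
saved log before it is trusted on a live machine.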

