From: Qian Cai <cai@lca.pw>
To: Dave Chinner <david@fromorbit.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	linux-xfs@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: xfs metadata corruption since 30 March
Date: Tue, 31 Mar 2020 22:13:42 -0400
Message-ID: <05FB019A-F4DC-414C-B8D9-D2735AF22034@lca.pw>
In-Reply-To: <20200331221324.GZ10776@dread.disaster.area>

> On Mar 31, 2020, at 6:13 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, Mar 31, 2020 at 05:57:24PM -0400, Qian Cai wrote:
>> Ever since two days ago, linux-next starts to trigger xfs metadata corruption
>> during compilation workloads on both powerpc and arm64,
>
> Is this on an existing filesystem, or a new filesystem?

New.

>
>> I suspect it could be one of those commits,
>>
>> https://lore.kernel.org/linux-xfs/20200328182533.GM29339@magnolia/
>>
>> Especially, those commits that would mark corruption more aggressively?
>>
>> [8d57c21600a5] xfs: add a function to deal with corrupt buffers post-verifiers
>> [e83cf875d67a] xfs: xfs_buf_corruption_error should take __this_address
>> [ce99494c9699] xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
>> [1cb5deb5bc09] xfs: don't ever return a stale pointer from __xfs_dir3_free_read
>> [6fb5aac73310] xfs: check owner of dir3 free blocks
>> [a10c21ed5d52] xfs: check owner of dir3 data blocks
>> [1b2c1a63b678] xfs: check owner of dir3 blocks
>> [2e107cf869ee] xfs: mark dir corrupt when lookup-by-hash fails
>> [806d3909a57e] xfs: mark extended attr corrupt when lookup-by-hash fails
>
> Doubt it - they only add extra detection code and these:
>
>> [29331.182313][ T665] XFS (dm-2): Metadata corruption detected at xfs_inode_buf_verify+0x2b8/0x350 [xfs], xfs_inode block 0xa9b97900 xfs_inode_buf_verify
>> xfs_inode_buf_verify at fs/xfs/libxfs/xfs_inode_buf.c:101
>> [29331.182373][ T665] XFS (dm-2): Unmount and run xfs_repair
>> [29331.182386][ T665] XFS (dm-2): First 128 bytes of corrupted metadata buffer:
>> [29331.182402][ T665] 00000000: 2f 2a 20 53 50 44 58 2d 4c 69 63 65 6e 73 65 2d  /* SPDX-License-
>> [29331.182426][ T665] 00000010: 49 64 65 6e 74 69 66 69 65 72 3a 20 47 50 4c 2d  Identifier: GPL-
>
> Would get caught by the existing verifiers as they aren't valid
> metadata at all.
>
> Basically, you are getting file data where there should be inode
> metadata. First thing to do is fix the existing corruptions with
> xfs_repair - please post the entire output so we can see what was
> corruption and what it fixed.

# xfs_repair -v /dev/mapper/rhel_hpe--apollo--cn99xx--11-home
Phase 1 - find and verify superblock...
        - block cache size set to 4355512 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 793608 tail block 786824
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

# mount /dev/mapper/rhel_hpe--apollo--cn99xx--11-home /home/
# umount /home/
# xfs_repair -v /dev/mapper/rhel_hpe--apollo--cn99xx--11-home
Phase 1 - find and verify superblock...
        - block cache size set to 4355512 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 793624 tail block 793624
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Tue Mar 31 22:10:54 2020

Phase		Start		End		Duration
Phase 1:	03/31 22:10:45	03/31 22:10:45
Phase 2:	03/31 22:10:45	03/31 22:10:45
Phase 3:	03/31 22:10:45	03/31 22:10:46	1 second
Phase 4:	03/31 22:10:46	03/31 22:10:53	7 seconds
Phase 5:	03/31 22:10:53	03/31 22:10:53
Phase 6:	03/31 22:10:53	03/31 22:10:53
Phase 7:	03/31 22:10:53	03/31 22:10:53

Total run time: 8 seconds
done

>
> Then if the problem is still reproducable, I suspect you are going
> to have to bisect it. i.e. run test, get corruption, mark bisect
> bad, run xfs_repair or mkfs to fix mess, install new kernel, run
> test again....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
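
The bisect loop Dave describes (run test, check for corruption, mark the commit, repair or re-mkfs, install the next kernel) needs one mechanical decision per iteration: is this kernel good or bad? The sketch below is one possible way to make that decision from a captured kernel log. It is not something posted in this thread. The names bisect_verdict, CORRUPTION_RE and the workload_ran flag are hypothetical; the message text it matches is taken from the log quoted above, and the exit codes follow the documented `git bisect run` convention (0 = good, 1 = bad, 125 = skip).

```python
import re

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# Message text copied from the kernel log quoted in this thread.
# The device name (dm-2 there) is allowed to vary.
CORRUPTION_RE = re.compile(r"XFS \(\S+\): Metadata corruption detected at")


def bisect_verdict(kernel_log: str, workload_ran: bool = True) -> int:
    """Map one test iteration to a `git bisect run` exit code.

    kernel_log   -- text captured from dmesg after the workload (assumed
                    to be collected by the caller).
    workload_ran -- hypothetical flag: False if the kernel failed to
                    build or boot, so the commit cannot be judged.
    """
    if CORRUPTION_RE.search(kernel_log):
        return BAD
    if not workload_ran:
        return SKIP
    return GOOD
```

A clean log from a single run only suggests a commit is good: the corruption here showed up during compilation workloads, so the workload may need to run for a while, or several times, before a commit is marked good. The filesystem still has to be repaired or recreated between iterations, as described above, so that old corruption is not blamed on the next kernel.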