Date: Sat, 18 Mar 2023 12:20:59 -0700
From: "Darrick J. Wong"
To: Ritesh Harjani
Cc: shrikanth hegde, dchinner@redhat.com, linux-xfs@vger.kernel.org,
	Srikar Dronamraju, ojaswin@linux.ibm.com
Subject: Re: xfs: system fails to boot up due to Internal error xfs_trans_cancel
Message-ID: <20230318192059.GX11376@frogsfrogsfrogs>
References: <20230317204418.GQ11376@frogsfrogsfrogs>
 <87pm969f6z.fsf@doe.com>
In-Reply-To: <87pm969f6z.fsf@doe.com>

On Sat, Mar 18, 2023 at 10:20:28PM +0530, Ritesh Harjani wrote:
> "Darrick J. Wong" writes:
> 
> > On Wed, Mar 15, 2023 at 10:20:37PM -0700, Darrick J. Wong wrote:
> >> On Thu, Mar 16, 2023 at 10:16:02AM +0530, Ritesh Harjani wrote:
> >> > "Darrick J. Wong" writes:
> >> >
> >> > Hi Darrick,
> >> >
> >> > Thanks for your analysis and quick help on this.
> >> >
> >> > >>
> >> > >> Hi Darrick,
> >> > >>
> >> > >> Please find the information collected from the system. We added some
> >> > >> debug logs, and it looks like exactly what you pointed out is
> >> > >> happening.
> >> > >>
> >> > >> We added a debug kernel patch to get more info from the system, which
> >> > >> you had requested [1].
> >> > >>
> >> > >> 1. We first broke into the emergency shell, where the root fs first
> >> > >> gets mounted on /sysroot as a "ro" filesystem. Here are the logs.
> >> > >>
> >> > >> [ OK ] Started File System Check on /dev/mapper/rhel_ltcden3--lp1-root.
> >> > >> Mounting /sysroot...
> >> > >> [    7.203990] SGI XFS with ACLs, security attributes, quota, no debug enabled
> >> > >> [    7.205835] XFS (dm-0): Mounting V5 Filesystem 7b801289-75a7-4d39-8cd3-24526e9e9da7
> >> > >> [ ***] A start job is running for /sysroot (15s / 1min 35s)[   17.439377] XFS (dm-0): Starting recovery (logdev: internal)
> >> > >> [ *** ] A start job is running for /sysroot (16s / 1min 35s)[   17.771158] xfs_log_mount_finish: Recovery needed is set
> >> > >> [   17.771172] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:0
> >> > >> [   17.771179] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:1
> >> > >> [   17.771184] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:2
> >> > >> [   17.771190] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:3
> >> > >> [   17.771196] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:4
> >> > >> [   17.771201] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:5
> >> > >> [   17.801033] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:6
> >> > >> [   17.801041] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:7
> >> > >> [   17.801046] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:8
> >> > >> [   17.801052] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:9
> >> > >> [   17.801057] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:10
> >> > >> [   17.801063] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:11
> >> > >> [   17.801068] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:12
> >> > >> [   17.801272] xlog_recover_iunlink_bucket: bucket: 13, agino: 3064909, ino: 3064909, iget ret: 0, previno:18446744073709551615, prev_agino:4294967295
> >> > >>
> >> > >> <previno, prev_agino> is just <-1 %llu and -1 %u> in the above.
> >> > >> That's why the huge values.
> >> > >
> >> > > Ok, so log recovery finds 3064909 and clears it...
> >> > >
> >> > >> [   17.801281] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:13
> >> > >> [   17.801287] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:0, bucket:14
> >> > >
> >> > > <snip>
> >> > >
> >> > >> [   17.844910] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:3, bucket:62
> >> > >> [   17.844916] xlog_recover_iunlink_ag: ran xlog_recover_iunlink_bucket for agi:3, bucket:63
> >> > >> [   17.886079] XFS (dm-0): Ending recovery (logdev: internal)
> >> > >> [ OK ] Mounted /sysroot.
> >> > >> [ OK ] Reached target Initrd Root File System.
> >> > >>
> >> > >>
> >> > >> 2. Then these are the logs from xfs_repair -n /dev/dm-0.
> >> > >> Here you will notice the same agino 3064909 in bucket 13 (from Phase
> >> > >> 2), which also got printed by the xlog_recover_iunlink_ag() function
> >> > >> above.
> >> > >>
> >> > >> switch_root:/# xfs_repair -n /dev/dm-0
> >> > >> Phase 1 - find and verify superblock...
> >> > >> Phase 2 - using internal log
> >> > >>         - zero log...
> >> > >>         - scan filesystem freespace and inode maps...
> >> > >> agi unlinked bucket 13 is 3064909 in ag 0 (inode=3064909)
> >> > >
> >> > > ...yet here we find that 3064909 is still on the unlinked list?
> >> > >
> >> > > Just to confirm -- you ran xfs_repair -n after the successful recovery
> >> > > above, right?
> >> > >
> >> > Yes, that's right.
> >> >
> >> > >>         - found root inode chunk
> >> > >> Phase 3 - for each AG...
> >> > >>         - scan (but don't clear) agi unlinked lists...
> >> > >>         - process known inodes and perform inode discovery...
> >> > >>         - agno = 0
> >> > >>         - agno = 1
> >> > >>         - agno = 2
> >> > >>         - agno = 3
> >> > >>         - process newly discovered inodes...
> >> > >> Phase 4 - check for duplicate blocks...
> >> > >>         - setting up duplicate extent list...
> >> > >>         - check for inodes claiming duplicate blocks...
> >> > >>         - agno = 0
> >> > >>         - agno = 2
> >> > >>         - agno = 1
> >> > >>         - agno = 3
> >> > >> No modify flag set, skipping phase 5
> >> > >> Phase 6 - check inode connectivity...
> >> > >>         - traversing filesystem ...
> >> > >>         - traversal finished ...
> >> > >>         - moving disconnected inodes to lost+found ...
> >> > >> Phase 7 - verify link counts...
> >> > >> would have reset inode 3064909 nlinks from 4294967291 to 2
> >> > >
> >> > > Oh, now that's interesting: an inode on the unlinked list with a
> >> > > grossly nonzero (but probably underflowed) link count. That might
> >> > > explain why iunlink recovery ignores the inode. Is inode 3064909
> >> > > reachable via the directory tree?
> >> > >
> >> > > Would you mind sending me a metadump to play with? metadump -ago would
> >> > > be best, if filenames/xattrnames aren't sensitive customer data.
> >> >
> >> > Sorry about the delay.
> >> > I am checking on the permissions part internally.
> >> > Meanwhile, I can help out if you would like me to try anything.
> >>
> >> Ok. I'll try creating a filesystem with a weirdly-high-refcount
> >> unlinked inode, and I guess you can try it to see if you get the same
> >> symptoms. I've finished with my parent pointers work for the time
> >> being, so I might have some time tomorrow (after I kick the tires on
> >> SETFSUUID) to simulate this and see if I can adapt the AGI repair code
> >> to deal with this.
> >
> > If you uncompress and mdrestore the attached file to a blockdev, mount
> > it, and run some creat() exerciser, do you get the same symptoms? I've
> > figured out how to make online fsck deal with it. :)
> >
> > A possible solution for runtime would be to make it so that
> > xfs_iunlink_lookup could iget the inode if it's not in cache at all.
> >
> 
> Hello Darrick,
> 
> I restored the metadump you provided onto a loop blockdev with
> xfs_mdrestore. I ran fsstress on the root dir of the mounted
> filesystem, but I was unable to hit the issue.
> 
> I tried the same with the original FS metadump as well, and I am unable
> to hit the issue while running fsstress on the filesystem.
> 
> I am thinking of identifying which file unlink operation was in
> progress when we see the issue during mounting. Maybe that will help in
> recreating the issue.

Yeah, creating a bunch of O_TMPFILE files will exercise the unlinked
lists, possibly enough to trip over the affected agi bucket. See
t_open_tmpfiles.c in the fstests repo.
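Something like this is the basic idea -- a totally untested sketch, not
the actual t_open_tmpfiles.c, so adjust to taste:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	const char *dir = argc > 1 ? argv[1] : ".";
	int i;

	/*
	 * O_TMPFILE inodes are created already unlinked, so holding a
	 * pile of them open loads up the AGI unlinked buckets.
	 */
	for (i = 0; i < 1024; i++) {
		int fd = open(dir, O_TMPFILE | O_RDWR, 0600);

		if (fd < 0) {
			perror(dir);
			break;
		}
		/* Leak fd on purpose; the inode stays unlinked. */
	}

	printf("holding %d tmpfiles open on %s\n", i, dir);

	/* Park here so the inodes sit on the unlinked lists. */
	pause();
	return 0;
}

If you cut the power (or wire up dm-flakey) while that's parked in
pause(), the next mount has to walk nonempty unlinked buckets during
log recovery, which is where your bucket 13 inode would bite.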
--D

> Although xfs_repair -n does show a similar log of an unlinked inode
> with the metadump you provided:
> 
> root@ubuntu:~# xfs_repair -n -o force_geometry /dev/loop7
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
> agi unlinked bucket 3 is 6979 in ag 0 (inode=6979)
> agi unlinked bucket 4 is 6980 in ag 0 (inode=6980)
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> disconnected inode 6979, would move to lost+found
> disconnected inode 6980, would move to lost+found
> Phase 7 - verify link counts...
> would have reset inode 6979 nlinks from 5555 to 1
> would have reset inode 6980 nlinks from 0 to 1
> No modify flag set, skipping filesystem flush and exiting.
> 
> Thanks again for the help. Once I have more info I will update the
> thread!
> 
> -ritesh