From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 20C3FC433EF for ; Tue, 7 Jun 2022 02:37:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236000AbiFGCho (ORCPT ); Mon, 6 Jun 2022 22:37:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54696 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233017AbiFGChn (ORCPT ); Mon, 6 Jun 2022 22:37:43 -0400 Received: from mail1.merlins.org (magic.merlins.org [209.81.13.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD679488B2 for ; Mon, 6 Jun 2022 19:37:41 -0700 (PDT) Received: from merlin by mail1.merlins.org with local (Exim 4.94.2 #2) id 1nyP6C-000386-8j by authid ; Mon, 06 Jun 2022 19:37:40 -0700 Date: Mon, 6 Jun 2022 19:37:40 -0700 From: Marc MERLIN To: Josef Bacik Cc: linux-btrfs Subject: Re: Rebuilding 24TB Raid5 array (was btrfs corruption: parent transid verify failed + open_ctree failed) Message-ID: <20220607023740.GQ22722@merlins.org> References: <20220606012204.GP1745079@merlins.org> <20220606210855.GL22722@merlins.org> <20220606212301.GM22722@merlins.org> <20220606215013.GN22722@merlins.org> <20220606221755.GO22722@merlins.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Sysadmin: BOFH X-URL: http://marc.merlins.org/ User-Agent: Mutt/1.10.1 (2018-07-13) X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: marc@merlins.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On Mon, Jun 06, 2022 at 10:28:25PM -0400, Josef Bacik wrote: > Ok I thought I caught this particular problem but I don't, so I fixed > tree-recover to handle unordered keys in different nodes. Pull and > build, run tree-recover. It's going to start deleting slots for > unordered keys, picking whichever node is newer as the source of > truth. You should only see this happen on 164624, if you see it fire > a bunch right away stop it and send me the output so I can make sure I > didn't screw anything up. I went over the code and diff a few times > to make sure I didn't mess anything up, but I could have missed > something. If that runs and fixes stuff, run it again just to make > sure it doesn't find anything the second time. It shouldn't since I > re-start the loop if we adjust things, but just in case. I assume > this will blow up, but if it doesn't you can try running > init-extent-tree again and see how that goes. Thanks, gargamel:/var/local/src/btrfs-progs-josefbacik# ./btrfs rescue tree-recover /dev/mapper/dshelf1 FS_INFO IS 0x557d47c7ebc0 Couldn't find the last root for 8 FS_INFO AFTER IS 0x557d47c7ebc0 Checking root 2 bytenr 15645018587136 Checking root 4 bytenr 15645019078656 Checking root 5 bytenr 15645018161152 Checking root 7 bytenr 15645018570752 Checking root 9 bytenr 15645740367872 Checking root 161197 bytenr 15645018341376 Checking root 161199 bytenr 15645018652672 Checking root 161200 bytenr 15645018750976 Checking root 161889 bytenr 11160502124544 Checking root 162628 bytenr 15645018931200 Checking root 162632 bytenr 15645018210304 Checking root 163298 bytenr 15645019045888 Checking root 163302 bytenr 15645018685440 Checking root 163303 bytenr 15645019095040 Checking root 163316 bytenr 15645018996736 Checking root 163920 bytenr 15645019144192 Checking root 164620 bytenr 15645019275264 Checking root 164623 bytenr 15645019226112 Checking root 164624 bytenr 15645019602944 Checking root 164629 bytenr 15645485137920 Checking root 164631 bytenr 15645496983552 Checking root 164633 bytenr 15645526884352 Checking root 164823 bytenr 15645999005696 Tree recovery finished, you can run check now gargamel:/var/local/src/btrfs-progs-josefbacik# ./btrfs rescue init-extent-tree /dev/mapper/dshelf1 FS_INFO IS 0x564088b52bc0 Couldn't find the last root for 8 FS_INFO AFTER IS 0x564088b52bc0 Walking all our trees and pinning down the currently accessible blocks Clearing the extent root and re-init'ing the block groups deleting space cache for 11106814787584 (...) inserting block group 15932780969984 inserting block group 15933854711808 ERROR: Error reading data reloc tree -2 ERROR: failed to reinit the data reloc root searching 1 for bad extents processed 999424 of 0 possible bytes, 0% searching 4 for bad extents processed 163840 of 1064960 possible bytes, 15% searching 5 for bad extents processed 65536 of 10960896 possible bytes, 0% searching 7 for bad extents processed 16384 of 16570974208 possible bytes, 0% searching 9 for bad extents processed 16384 of 16384 possible bytes, 100% searching 161197 for bad extents processed 131072 of 108986368 possible bytes, 0% searching 161199 for bad extents processed 196608 of 49479680 possible bytes, 0% searching 161200 for bad extents processed 180224 of 254214144 possible bytes, 0% searching 161889 for bad extents processed 229376 of 49446912 possible bytes, 0% searching 162628 for bad extents processed 49152 of 49463296 possible bytes, 0% searching 162632 for bad extents processed 147456 of 94633984 possible bytes, 0% searching 163298 for bad extents processed 49152 of 49463296 possible bytes, 0% searching 163302 for bad extents processed 147456 of 94633984 possible bytes, 0% searching 163303 for bad extents processed 131072 of 76333056 possible bytes, 0% searching 163316 for bad extents processed 147456 of 108544000 possible bytes, 0% searching 163920 for bad extents processed 16384 of 108691456 possible bytes, 0% searching 164620 for bad extents processed 49152 of 49463296 possible bytes, 0% searching 164623 for bad extents processed 311296 of 63193088 possible bytes, 0% searching 164624 for bad extents Found an extent we don't have a block group for in the file file kernel-shared/ctree.c:1718: push_node_left: BUG_ON `check_sibling_keys(dst, src)` triggered, value 1 ./btrfs(+0x1cf51)[0x56090b357f51] ./btrfs(+0x1ef96)[0x56090b359f96] ./btrfs(btrfs_search_slot+0xee4)[0x56090b35d881] ./btrfs(+0x8d592)[0x56090b3c8592] ./btrfs(+0x8d2c1)[0x56090b3c82c1] ./btrfs(+0x8d2c1)[0x56090b3c82c1] ./btrfs(+0x8d8b0)[0x56090b3c88b0] ./btrfs(+0x8d0bb)[0x56090b3c80bb] ./btrfs(btrfs_init_extent_tree+0xc83)[0x56090b3c9e48] ./btrfs(+0x8467e)[0x56090b3bf67e] ./btrfs(handle_command_group+0x49)[0x56090b35317b] ./btrfs(main+0x94)[0x56090b353275] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xcd)[0x7f37ada9c7fd] ./btrfs(_start+0x2a)[0x56090b352e1a] Aborted -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Home page: http://marc.merlins.org/