Date: Thu, 10 Mar 2022 19:43:27 -0500
From: Kent Overstreet
To: Eric Wheeler
Cc: linux-bcachefs@vger.kernel.org
Subject: Re: bcachefs: Kernel panic - not syncing: trans path oveflow
Message-ID: <20220311004327.t2rtzd4eg7ktmdrp@moria.home.lan>
References: <6bc8aca6-2f93-4a81-376-13155fcc5d7@ewheeler.net>
X-Mailing-List: linux-bcachefs@vger.kernel.org

On Thu, Mar 10, 2022 at 02:25:29PM -0800, Eric Wheeler wrote:
> On Wed, 9 Mar 2022, Kent Overstreet wrote:
> 
> > On Wed, Mar 09, 2022 at 01:14:58PM -0800, Eric Wheeler wrote:
> > > Hi Kent,
> > > 
> > > We just started testing bcachefs snapshots this week: we have a bunch of
> > > mysql replicas, each in its own subvolume. Every 4 hours we stop mysql,
> > > run a subvolume snapshot and restart mysql, so it gets lots of snapshot
> > > and sync IO from the many database instances.
> > 
> > Cool! Would love to hear any comments you've got so far.
> 
> Happy to. So far we've hit this bug... but once that is fixed I'm curious
> how it will compare to btrfs, which has just become too slow...
> 
> > > We hit the following bcachefs panic while testing commit
> > > 5490c9c529770aa18b2571bd98f5416ed9ae24c6 from March 3rd. Can you tell what
> > > the issue might be?
> > > 
> > > It is easily reproducible; the same problem hits shortly after we reboot
> > > and remount, so we're happy to test patches or git pulls to rebuild with.
> > > 
> > > Here is the stack trace (more logs below):
> > 
> > So it looks like there's some code that iterates over btree keys and goes
> > further than it's supposed to - we have paths that point to different inode
> > numbers, and that's not supposed to happen in the write path, since we're
> > only updating a single inode.
> > 
> > I've had a report of a similar bug in the data move path, which may or may not
> > be the same as this bug - but I haven't worked up a repro for it yet, so I
> > haven't figured out which code path is allocating these btree paths. Could
> > you enable CONFIG_BCACHEFS_DEBUG, then run your log through
> > scripts/decode_stacktrace.sh from the kernel source tree?
> 
> Here's the stack trace, full log below that.
> 
> [ 179.179253] Kernel panic - not syncing: trans path oveflow
> [ 179.179957] CPU: 0 PID: 5197 Comm: mysqld Not tainted 5.15.0+ #1
> [ 179.180629] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 179.181296] Call Trace:
> [ 179.181954] dump_stack_lvl (lib/dump_stack.c:107)
> [ 179.182938] panic (kernel/panic.c:240)
> [ 179.184231] ? bch2_dump_trans_paths_updates (fs/bcachefs/alloc_foreground.c:618) bcachefs
> [ 179.185574] btree_path_alloc.cold.74 (fs/bcachefs/alloc_foreground.c:608) bcachefs
> [ 179.186811] btree_path_clone (fs/bcachefs/btree_iter.c:1648 fs/bcachefs/btree_iter.c:1664) bcachefs
> [ 179.187983] bch2_btree_path_set_pos (fs/bcachefs/btree_iter.c:1679 fs/bcachefs/btree_iter.c:1701) bcachefs
> [ 179.189043] ? bch2_trans_update_extent (fs/bcachefs/btree_update_leaf.c:1220) bcachefs
> [ 179.190126] bch2_btree_iter_peek (fs/bcachefs/btree_iter.c:2387) bcachefs
> [ 179.191178] bch2_trans_update_extent (fs/bcachefs/btree_update_leaf.c:1220) bcachefs
> [ 179.192200] ? bch2_trans_update_extent (fs/bcachefs/btree_update_leaf.c:1220) bcachefs
> [ 179.193174] ? bch2_inode_unpack_v2 (fs/bcachefs/inode.c:199 (discriminator 287)) bcachefs
> [ 179.194169] ? bch2_inode_peek (fs/bcachefs/inode.c:272) bcachefs
> [ 179.195078] bch2_extent_update (fs/bcachefs/io.c:297) bcachefs
> [ 179.195938] ? bch2_inode_peek (fs/bcachefs/inode.c:262) bcachefs
> [ 179.196767] __bchfs_fallocate (fs/bcachefs/fs-io.c:3039) bcachefs
> [ 179.197522] ? __bchfs_fallocate (fs/bcachefs/bkey.h:527 fs/bcachefs/fs-io.c:3006) bcachefs
> [ 179.198249] ? mntput_no_expire (fs/namespace.c:1224)
> [ 179.198940] bch2_fallocate_dispatch (fs/bcachefs/fs-io.c:3096 fs/bcachefs/fs-io.c:3139) bcachefs
> [ 179.199634] vfs_fallocate (fs/open.c:307)
> [ 179.200272] ksys_fallocate (./include/linux/file.h:45 fs/open.c:331)
> [ 179.200895] __x64_sys_fallocate (fs/open.c:338 fs/open.c:336 fs/open.c:336)
> [ 179.201519] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> [ 179.202074] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
> [ 179.202630] RIP: 0033:0x7eff23af5fb9

Thanks, I think I know what's going on. It's the BTREE_ITER_FILTER_SNAPSHOTS code,
and in particular it's the code that saves a path for the update position that's
allocating all these iterators.

So, we need two changes:

 - delay setting update_path as should_be_locked until we return from
   bch2_btree_iter_peek(), so that we don't end up saving a bunch of duplicate
   iterators

 - the bigger change: if the next inode is in a different subvolume, we could end
   up scanning past a bunch of different inodes until we find a key in the current
   snapshot to return and terminate the lookup - so we need to add a "search up to
   this position" bound to bch2_btree_iter_peek().

I'll let you know when the fixes are up.
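
To make the second point concrete, here is a toy sketch of the idea - it is not
the actual bcachefs code or its real types; struct pos, peek_upto() and the
visibility check below are made up purely for illustration. The point it shows:
when the iterator filters keys by snapshot, an explicit end position lets the
scan stop at the caller's range instead of walking into other inodes' keys and
allocating paths along the way.

/*
 * Conceptual sketch only -- NOT bcachefs code. All names are invented to
 * illustrate bounding a snapshot-filtered peek with an end position.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct pos {
	unsigned inode;
	unsigned offset;
	unsigned snapshot;
};

/* Compare positions by (inode, offset) only, ignoring snapshot. */
static int pos_cmp(struct pos a, struct pos b)
{
	if (a.inode != b.inode)
		return a.inode < b.inode ? -1 : 1;
	if (a.offset != b.offset)
		return a.offset < b.offset ? -1 : 1;
	return 0;
}

/* Stand-in for "is this key visible in the snapshot we're reading?" */
static bool key_visible_in_snapshot(struct pos k, unsigned snapshot)
{
	return k.snapshot == snapshot;
}

/*
 * Return the index of the first key at or after *iter that is visible in
 * `snapshot` and does not lie beyond `end`; -1 if none. The `end` bound is
 * what keeps a lookup in one inode from scanning across every other inode
 * whose keys happen to belong to other snapshots.
 */
static int peek_upto(const struct pos *keys, size_t nr, size_t *iter,
		     unsigned snapshot, struct pos end)
{
	while (*iter < nr) {
		struct pos k = keys[*iter];

		if (pos_cmp(k, end) > 0)
			return -1;		/* past the caller's range: stop */
		if (key_visible_in_snapshot(k, snapshot))
			return (int)(*iter);	/* found a usable key */
		(*iter)++;			/* filtered out: keep scanning */
	}
	return -1;
}

int main(void)
{
	/* Keys sorted by (inode, offset), as in an extents btree; the third
	 * field is the snapshot each key belongs to. */
	const struct pos keys[] = {
		{ 10, 0, 2 }, { 10, 8, 2 }, { 11, 0, 2 }, { 12, 0, 1 },
	};
	struct pos end = { 10, ~0u, 0 };	/* don't look past inode 10 */
	size_t iter;
	int idx;

	iter = 0;
	idx = peek_upto(keys, 4, &iter, 2, end);
	printf("snapshot 2: idx %d (expect 0)\n", idx);

	iter = 0;
	idx = peek_upto(keys, 4, &iter, 1, end);
	printf("snapshot 1: idx %d (expect -1: stop at end of inode 10)\n", idx);
	return 0;
}

Without the end bound, the second lookup would keep scanning through inode 11's
and inode 12's keys before giving up, which is the kind of walk that burns
through btree paths.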
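
For completeness, the userspace side of the path in the trace
(__x64_sys_fallocate -> vfs_fallocate -> bch2_fallocate_dispatch ->
__bchfs_fallocate) is just a plain fallocate(2) call, which is what mysqld does
when it preallocates files. This is not a verified reproducer - the file path
and size below are placeholders, not values from the report - it only shows the
syscall entry point seen in the trace.

/* Hypothetical illustration only: a plain fallocate(2) call that enters the
 * fallocate path shown in the trace above. Path and size are placeholders. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/bcachefs/subvol/testfile", O_CREAT | O_RDWR, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* mode 0 = plain preallocation; this lands in vfs_fallocate() and is
	 * dispatched to the filesystem via bch2_fallocate_dispatch(). */
	if (fallocate(fd, 0, 0, 1024 * 1024 * 1024) < 0)
		perror("fallocate");

	close(fd);
	return 0;
}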