From: Chandan Babu R <chandanrlinux@gmail.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: xfs <linux-xfs@vger.kernel.org>, oliver.sang@intel.com
Subject: Re: [PATCH] xfs: fix perag structure refcounting error when scrub fails
Date: Fri, 20 Aug 2021 15:42:31 +0530 [thread overview]
Message-ID: <87a6lcpgq8.fsf@debian-BULLSEYE-live-builder-AMD64> (raw)
In-Reply-To: <20210820050647.GW12640@magnolia>
On 20 Aug 2021 at 10:36, Darrick J. Wong wrote:
> The kernel test robot found the following bug when running xfs/355 to
> scrub a bmap btree:
>
> XFS: Assertion failed: !sa->pag, file: fs/xfs/scrub/common.c, line: 412
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/xfs_message.c:110!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 2 PID: 1415 Comm: xfs_scrub Not tainted 5.14.0-rc4-00021-g48c6615cc557 #1
> Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013
> RIP: 0010:assfail+0x23/0x28 [xfs]
> RSP: 0018:ffffc9000aacb890 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffffc9000aacbcc8 RCX: 0000000000000000
> RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffc09e7dcd
> RBP: ffffc9000aacbc80 R08: ffff8881fdf17d50 R09: 0000000000000000
> R10: 000000000000000a R11: f000000000000000 R12: 0000000000000000
> R13: ffff88820c7ed000 R14: 0000000000000001 R15: ffffc9000aacb980
> FS: 00007f185b955700(0000) GS:ffff8881fdf00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f7f6ef43000 CR3: 000000020de38002 CR4: 00000000001706e0
> Call Trace:
> xchk_ag_read_headers+0xda/0x100 [xfs]
> xchk_ag_init+0x15/0x40 [xfs]
> xchk_btree_check_block_owner+0x76/0x180 [xfs]
> xchk_btree_get_block+0xd0/0x140 [xfs]
> xchk_btree+0x32e/0x440 [xfs]
> xchk_bmap_btree+0xd4/0x140 [xfs]
> xchk_bmap+0x1eb/0x3c0 [xfs]
> xfs_scrub_metadata+0x227/0x4c0 [xfs]
> xfs_ioc_scrub_metadata+0x50/0xc0 [xfs]
> xfs_file_ioctl+0x90c/0xc40 [xfs]
> __x64_sys_ioctl+0x83/0xc0
> do_syscall_64+0x3b/0xc0
>
> The unusual handling of errors while initializing struct xchk_ag is the
> root cause here. Since the beginning of xfs_scrub, the goal of
> xchk_ag_read_headers has been to read all three AG header buffers and
> attach them both to the xchk_ag structure and the scrub transaction.
> Corruption errors on any of the three headers doesn't necessarily
> trigger an immediate return to userspace, because xfs_scrub can also
> tell us to /fix/ the problem.
>
> In other words, it's possible for the xchk_ag init functions to return
> an error code and a partially filled out structure so that scrub can use
> however much information it managed to pull. Before 5.15, it was
> sufficient to cancel (or commit) the scrub transaction on the way out of
> the scrub code to release the buffers.
>
> Ccommit 48c6615cc557 added a reference to the perag structure to struct
> xchk_ag. Since perag structures are not attached to transactions like
> buffers are, this adds the requirement that the perag ref be released
> explicitly. The scrub teardown function xchk_teardown was amended to do
> this for the xchk_ag embedded in struct xfs_scrub.
>
> Unfortunately, I forgot that certain parts of the scrub code probe
> multiple AGs and therefore handle the initialization and cleanup on
> their own. Specifically, the bmbt scrubber will initialize it long
> enough to cross-reference AG metadata for btree blocks and for the
> extent mappings in the bmbt.
>
> If one of the AG headers is corrupt, the init function returns with a
> live perag structure reference and some of the AG header buffers. If an
> error occurs, the cross referencing will be noted as XCORRUPTion and
> skipped, but the main scrub process will move on to the next record.
> It is now necessary to release the perag reference before we try to
> analyze something from a different AG, or else we'll trip over the
> assertion noted above.
Looks good to me.
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
>
> Fixes: 48c6615cc557 ("xfs: grab active perag ref when reading AG headers")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
> fs/xfs/scrub/bmap.c | 3 ++-
> fs/xfs/scrub/btree.c | 3 ++-
> fs/xfs/scrub/fscounters.c | 2 +-
> fs/xfs/scrub/inode.c | 3 ++-
> 4 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
> index 2df5e5a51cbd..017da9ceaee9 100644
> --- a/fs/xfs/scrub/bmap.c
> +++ b/fs/xfs/scrub/bmap.c
> @@ -263,7 +263,7 @@ xchk_bmap_iextent_xref(
> error = xchk_ag_init_existing(info->sc, agno, &info->sc->sa);
> if (!xchk_fblock_process_error(info->sc, info->whichfork,
> irec->br_startoff, &error))
> - return;
> + goto out_free;
>
> xchk_xref_is_used_space(info->sc, agbno, len);
> xchk_xref_is_not_inode_chunk(info->sc, agbno, len);
> @@ -283,6 +283,7 @@ xchk_bmap_iextent_xref(
> break;
> }
>
> +out_free:
> xchk_ag_free(info->sc, &info->sc->sa);
> }
>
> diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
> index fd832f103fa4..eccb855dc904 100644
> --- a/fs/xfs/scrub/btree.c
> +++ b/fs/xfs/scrub/btree.c
> @@ -377,7 +377,7 @@ xchk_btree_check_block_owner(
> error = xchk_ag_init_existing(bs->sc, agno, &bs->sc->sa);
> if (!xchk_btree_xref_process_error(bs->sc, bs->cur,
> level, &error))
> - return error;
> + goto out_free;
> }
>
> xchk_xref_is_used_space(bs->sc, agbno, 1);
> @@ -393,6 +393,7 @@ xchk_btree_check_block_owner(
> if (!bs->sc->sa.rmap_cur && btnum == XFS_BTNUM_RMAP)
> bs->cur = NULL;
>
> +out_free:
> if (init_sa)
> xchk_ag_free(bs->sc, &bs->sc->sa);
>
> diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
> index 737aa5b39d5e..48a6cbdf95d0 100644
> --- a/fs/xfs/scrub/fscounters.c
> +++ b/fs/xfs/scrub/fscounters.c
> @@ -150,7 +150,7 @@ xchk_fscount_btreeblks(
>
> error = xchk_ag_init_existing(sc, agno, &sc->sa);
> if (error)
> - return error;
> + goto out_free;
>
> error = xfs_btree_count_blocks(sc->sa.bno_cur, &blocks);
> if (error)
> diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
> index d6e0e3a11fbc..2405b09d03d0 100644
> --- a/fs/xfs/scrub/inode.c
> +++ b/fs/xfs/scrub/inode.c
> @@ -533,7 +533,7 @@ xchk_inode_xref(
>
> error = xchk_ag_init_existing(sc, agno, &sc->sa);
> if (!xchk_xref_process_error(sc, agno, agbno, &error))
> - return;
> + goto out_free;
>
> xchk_xref_is_used_space(sc, agbno, 1);
> xchk_inode_xref_finobt(sc, ino);
> @@ -541,6 +541,7 @@ xchk_inode_xref(
> xchk_xref_is_not_shared(sc, agbno, 1);
> xchk_inode_xref_bmap(sc, dip);
>
> +out_free:
> xchk_ag_free(sc, &sc->sa);
> }
>
--
chandan
prev parent reply other threads:[~2021-08-20 10:12 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-20 5:06 [PATCH] xfs: fix perag structure refcounting error when scrub fails Darrick J. Wong
2021-08-20 10:12 ` Chandan Babu R [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a6lcpgq8.fsf@debian-BULLSEYE-live-builder-AMD64 \
--to=chandanrlinux@gmail.com \
--cc=djwong@kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=oliver.sang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).