linux-xfs.vger.kernel.org archive mirror
From: Chandan Babu R <chandanrlinux@gmail.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: xfs <linux-xfs@vger.kernel.org>, oliver.sang@intel.com
Subject: Re: [PATCH] xfs: fix perag structure refcounting error when scrub fails
Date: Fri, 20 Aug 2021 15:42:31 +0530
Message-ID: <87a6lcpgq8.fsf@debian-BULLSEYE-live-builder-AMD64>
In-Reply-To: <20210820050647.GW12640@magnolia>

On 20 Aug 2021 at 10:36, Darrick J. Wong wrote:
> The kernel test robot found the following bug when running xfs/355 to
> scrub a bmap btree:
>
> XFS: Assertion failed: !sa->pag, file: fs/xfs/scrub/common.c, line: 412
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/xfs_message.c:110!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 2 PID: 1415 Comm: xfs_scrub Not tainted 5.14.0-rc4-00021-g48c6615cc557 #1
> Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013
> RIP: 0010:assfail+0x23/0x28 [xfs]
> RSP: 0018:ffffc9000aacb890 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffffc9000aacbcc8 RCX: 0000000000000000
> RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffc09e7dcd
> RBP: ffffc9000aacbc80 R08: ffff8881fdf17d50 R09: 0000000000000000
> R10: 000000000000000a R11: f000000000000000 R12: 0000000000000000
> R13: ffff88820c7ed000 R14: 0000000000000001 R15: ffffc9000aacb980
> FS:  00007f185b955700(0000) GS:ffff8881fdf00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f7f6ef43000 CR3: 000000020de38002 CR4: 00000000001706e0
> Call Trace:
>  xchk_ag_read_headers+0xda/0x100 [xfs]
>  xchk_ag_init+0x15/0x40 [xfs]
>  xchk_btree_check_block_owner+0x76/0x180 [xfs]
>  xchk_btree_get_block+0xd0/0x140 [xfs]
>  xchk_btree+0x32e/0x440 [xfs]
>  xchk_bmap_btree+0xd4/0x140 [xfs]
>  xchk_bmap+0x1eb/0x3c0 [xfs]
>  xfs_scrub_metadata+0x227/0x4c0 [xfs]
>  xfs_ioc_scrub_metadata+0x50/0xc0 [xfs]
>  xfs_file_ioctl+0x90c/0xc40 [xfs]
>  __x64_sys_ioctl+0x83/0xc0
>  do_syscall_64+0x3b/0xc0
>
> The unusual handling of errors while initializing struct xchk_ag is the
> root cause here.  Since the beginning of xfs_scrub, the goal of
> xchk_ag_read_headers has been to read all three AG header buffers and
> attach them both to the xchk_ag structure and the scrub transaction.
> Corruption errors on any of the three headers don't necessarily
> trigger an immediate return to userspace, because xfs_scrub can also
> tell us to /fix/ the problem.
>
> In other words, it's possible for the xchk_ag init functions to return
> an error code and a partially filled out structure so that scrub can use
> however much information it managed to pull.  Before 5.15, it was
> sufficient to cancel (or commit) the scrub transaction on the way out of
> the scrub code to release the buffers.
>
> Commit 48c6615cc557 added a reference to the perag structure to struct
> xchk_ag.  Since perag structures are not attached to transactions like
> buffers are, this adds the requirement that the perag ref be released
> explicitly.  The scrub teardown function xchk_teardown was amended to do
> this for the xchk_ag embedded in struct xfs_scrub.
>
> Unfortunately, I forgot that certain parts of the scrub code probe
> multiple AGs and therefore handle the initialization and cleanup on
> their own.  Specifically, the bmbt scrubber will initialize struct
> xchk_ag just long enough to cross-reference AG metadata for btree
> blocks and for the extent mappings in the bmbt.
>
> If one of the AG headers is corrupt, the init function returns with a
> live perag structure reference and some of the AG header buffers.  If an
> error occurs, the cross referencing will be noted as XCORRUPTion and
> skipped, but the main scrub process will move on to the next record.
> It is now necessary to release the perag reference before we try to
> analyze something from a different AG, or else we'll trip over the
> assertion noted above.
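
For anyone following along, the failure mode described above can be
modelled in a few lines of userspace C.  This is only a sketch under
stated assumptions, not the kernel code: every model_* name is
hypothetical, -EIO stands in for the corruption error, and -EBUSY stands
in for the tripped ASSERT(!sa->pag).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-AG structure with a reference count. */
struct model_perag {
	int refcount;
};

/* Hypothetical stand-in for struct xchk_ag. */
struct model_ag_ctx {
	struct model_perag *pag;  /* held reference, must be dropped explicitly */
	bool have_headers;        /* stands in for the AG header buffers */
};

/*
 * Take a perag reference, then "read" the headers.  On corruption this
 * returns an error but keeps the reference it already took, mirroring
 * the partially filled structure described in the commit message.
 */
static int model_ag_init(struct model_ag_ctx *sa, struct model_perag *pag,
			 bool headers_corrupt)
{
	if (sa->pag)
		return -EBUSY;  /* models the ASSERT(!sa->pag) failure */
	pag->refcount++;
	sa->pag = pag;
	if (headers_corrupt)
		return -EIO;    /* stand-in for a corruption error */
	sa->have_headers = true;
	return 0;
}

/* Drop whatever the init function managed to grab. */
static void model_ag_free(struct model_ag_ctx *sa)
{
	if (sa->pag) {
		sa->pag->refcount--;
		sa->pag = NULL;
	}
	sa->have_headers = false;
}

/* Old shape: the early return skips the cleanup and leaks the reference. */
static int model_xref_buggy(struct model_ag_ctx *sa, struct model_perag *pag,
			    bool headers_corrupt)
{
	int error = model_ag_init(sa, pag, headers_corrupt);

	if (error)
		return error;
	/* ... cross-referencing work would go here ... */
	model_ag_free(sa);
	return 0;
}

/* New shape: the error path jumps to the cleanup label instead. */
static int model_xref_fixed(struct model_ag_ctx *sa, struct model_perag *pag,
			    bool headers_corrupt)
{
	int error = model_ag_init(sa, pag, headers_corrupt);

	if (error)
		goto out_free;
	/* ... cross-referencing work would go here ... */
out_free:
	model_ag_free(sa);
	return error;
}
```

In this model, calling model_xref_buggy() with a corrupt header leaves
sa->pag set, so the next init against a different AG fails; with
model_xref_fixed() the context is always empty again on return.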

Looks good to me.

Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>

>
> Fixes: 48c6615cc557 ("xfs: grab active perag ref when reading AG headers")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/scrub/bmap.c       |    3 ++-
>  fs/xfs/scrub/btree.c      |    3 ++-
>  fs/xfs/scrub/fscounters.c |    2 +-
>  fs/xfs/scrub/inode.c      |    3 ++-
>  4 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
> index 2df5e5a51cbd..017da9ceaee9 100644
> --- a/fs/xfs/scrub/bmap.c
> +++ b/fs/xfs/scrub/bmap.c
> @@ -263,7 +263,7 @@ xchk_bmap_iextent_xref(
>  	error = xchk_ag_init_existing(info->sc, agno, &info->sc->sa);
>  	if (!xchk_fblock_process_error(info->sc, info->whichfork,
>  			irec->br_startoff, &error))
> -		return;
> +		goto out_free;
>  
>  	xchk_xref_is_used_space(info->sc, agbno, len);
>  	xchk_xref_is_not_inode_chunk(info->sc, agbno, len);
> @@ -283,6 +283,7 @@ xchk_bmap_iextent_xref(
>  		break;
>  	}
>  
> +out_free:
>  	xchk_ag_free(info->sc, &info->sc->sa);
>  }
>  
> diff --git a/fs/xfs/scrub/btree.c b/fs/xfs/scrub/btree.c
> index fd832f103fa4..eccb855dc904 100644
> --- a/fs/xfs/scrub/btree.c
> +++ b/fs/xfs/scrub/btree.c
> @@ -377,7 +377,7 @@ xchk_btree_check_block_owner(
>  		error = xchk_ag_init_existing(bs->sc, agno, &bs->sc->sa);
>  		if (!xchk_btree_xref_process_error(bs->sc, bs->cur,
>  				level, &error))
> -			return error;
> +			goto out_free;
>  	}
>  
>  	xchk_xref_is_used_space(bs->sc, agbno, 1);
> @@ -393,6 +393,7 @@ xchk_btree_check_block_owner(
>  	if (!bs->sc->sa.rmap_cur && btnum == XFS_BTNUM_RMAP)
>  		bs->cur = NULL;
>  
> +out_free:
>  	if (init_sa)
>  		xchk_ag_free(bs->sc, &bs->sc->sa);
>  
> diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
> index 737aa5b39d5e..48a6cbdf95d0 100644
> --- a/fs/xfs/scrub/fscounters.c
> +++ b/fs/xfs/scrub/fscounters.c
> @@ -150,7 +150,7 @@ xchk_fscount_btreeblks(
>  
>  	error = xchk_ag_init_existing(sc, agno, &sc->sa);
>  	if (error)
> -		return error;
> +		goto out_free;
>  
>  	error = xfs_btree_count_blocks(sc->sa.bno_cur, &blocks);
>  	if (error)
> diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
> index d6e0e3a11fbc..2405b09d03d0 100644
> --- a/fs/xfs/scrub/inode.c
> +++ b/fs/xfs/scrub/inode.c
> @@ -533,7 +533,7 @@ xchk_inode_xref(
>  
>  	error = xchk_ag_init_existing(sc, agno, &sc->sa);
>  	if (!xchk_xref_process_error(sc, agno, agbno, &error))
> -		return;
> +		goto out_free;
>  
>  	xchk_xref_is_used_space(sc, agbno, 1);
>  	xchk_inode_xref_finobt(sc, ino);
> @@ -541,6 +541,7 @@ xchk_inode_xref(
>  	xchk_xref_is_not_shared(sc, agbno, 1);
>  	xchk_inode_xref_bmap(sc, dip);
>  
> +out_free:
>  	xchk_ag_free(sc, &sc->sa);
>  }
>  


-- 
chandan

