linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET v23.1 0/4] xfs: fix handling of AG[IF] header buffers during scrub
@ 2022-10-02 18:19 Darrick J. Wong
  2022-10-02 18:19 ` [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:19 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

While reading through the online fsck code, I noticed that the setup
code for AG metadata scrubs will attach the AGI, the AGF, and the AGFL
buffers to the transaction.  It isn't necessary to hold the AGFL buffer,
since any code that wants to do anything with the AGFL will need to hold
the AGF to know which parts of the AGFL are active.  Therefore, we only
need to hold the AGFL when scrubbing the AGFL itself.

The second bug fixed by this patchset is one that I observed while
testing online repair.  When a buffer is held across a transaction roll,
its buffer log item will be detached if the bli was clean before the
roll.  If we are holding the AG headers to maintain a lock on an AG, we
then need to set the buffer type on the new bli to avoid confusing the
logging code later.

There's also a bug fix for uninitialized memory in the directory scanner
that didn't fit anywhere else.

Ths patchset finishes off by teaching the AGFL repair function to look
for and discard crosslinked blocks instead of putting them back on the
AGFL.  Is this a bug fix?  Or merely an enhancement?

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-fix-ag-header-handling
---
 fs/xfs/scrub/agheader.c        |   47 +++++++++++++++---------
 fs/xfs/scrub/agheader_repair.c |   79 +++++++++++++++++++++++++++++++++-------
 fs/xfs/scrub/common.c          |    8 ----
 fs/xfs/scrub/dir.c             |   10 +++--
 fs/xfs/scrub/repair.c          |   29 +++++++++++----
 fs/xfs/scrub/scrub.h           |    1 -
 6 files changed, 122 insertions(+), 52 deletions(-)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] xfs: fully initialize xfs_da_args in xchk_directory_blocks
  2022-10-02 18:19 [PATCHSET v23.1 0/4] xfs: fix handling of AG[IF] header buffers during scrub Darrick J. Wong
  2022-10-02 18:19 ` [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll Darrick J. Wong
  2022-10-02 18:19 ` [PATCH 2/4] xfs: don't track the AGFL buffer in the scrub AG context Darrick J. Wong
@ 2022-10-02 18:19 ` Darrick J. Wong
  2022-10-13 22:33   ` Dave Chinner
  2022-10-02 18:19 ` [PATCH 4/4] xfs: make AGFL repair function avoid crosslinked blocks Darrick J. Wong
  3 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:19 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While running the online fsck test suite, I noticed the following
assertion in the kernel log (edited for brevity):

XFS: Assertion failed: 0, file: fs/xfs/xfs_health.c, line: 571
------------[ cut here ]------------
WARNING: CPU: 3 PID: 11667 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
CPU: 3 PID: 11667 Comm: xfs_scrub Tainted: G        W         5.19.0-rc7-xfsx #rc7 6e6475eb29fd9dda3181f81b7ca7ff961d277a40
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:assfail+0x46/0x4a [xfs]
Call Trace:
 <TASK>
 xfs_dir2_isblock+0xcc/0xe0
 xchk_directory_blocks+0xc7/0x420
 xchk_directory+0x53/0xb0
 xfs_scrub_metadata+0x2b6/0x6b0
 xfs_scrubv_metadata+0x35e/0x4d0
 xfs_ioc_scrubv_metadata+0x111/0x160
 xfs_file_ioctl+0x4ec/0xef0
 __x64_sys_ioctl+0x82/0xa0
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

This assertion triggers in xfs_dirattr_mark_sick when the caller passes
in a whichfork value that is neither of XFS_{DATA,ATTR}_FORK.  The cause
of this is that xchk_directory_blocks only partially initializes the
xfs_da_args structure that is passed to xfs_dir2_isblock.  If the data
fork is not correct, the XFS_IS_CORRUPT clause will trigger.  My
development branch reports this failure to the health monitoring
subsystem, which accesses the uninitialized args->whichfork field,
leading the the assertion tripping.  We really shouldn't be passing
random stack contents around, so the solution here is to force the
compiler to zero-initialize the struct.

Found by fuzzing u3.bmx[0].blockcount = middlebit on xfs/1554.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 5abb5fdb71d9..58b9761db48d 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -666,7 +666,12 @@ xchk_directory_blocks(
 	struct xfs_scrub	*sc)
 {
 	struct xfs_bmbt_irec	got;
-	struct xfs_da_args	args;
+	struct xfs_da_args	args = {
+		.dp		= sc ->ip,
+		.whichfork	= XFS_DATA_FORK,
+		.geo		= sc->mp->m_dir_geo,
+		.trans		= sc->tp,
+	};
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(sc->ip, XFS_DATA_FORK);
 	struct xfs_mount	*mp = sc->mp;
 	xfs_fileoff_t		leaf_lblk;
@@ -689,9 +694,6 @@ xchk_directory_blocks(
 	free_lblk = XFS_B_TO_FSB(mp, XFS_DIR2_FREE_OFFSET);
 
 	/* Is this a block dir? */
-	args.dp = sc->ip;
-	args.geo = mp->m_dir_geo;
-	args.trans = sc->tp;
 	error = xfs_dir2_isblock(&args, &is_block);
 	if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, lblk, &error))
 		goto out;


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] xfs: don't track the AGFL buffer in the scrub AG context
  2022-10-02 18:19 [PATCHSET v23.1 0/4] xfs: fix handling of AG[IF] header buffers during scrub Darrick J. Wong
  2022-10-02 18:19 ` [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll Darrick J. Wong
@ 2022-10-02 18:19 ` Darrick J. Wong
  2022-10-13 22:32   ` Dave Chinner
  2022-10-02 18:19 ` [PATCH 1/4] xfs: fully initialize xfs_da_args in xchk_directory_blocks Darrick J. Wong
  2022-10-02 18:19 ` [PATCH 4/4] xfs: make AGFL repair function avoid crosslinked blocks Darrick J. Wong
  3 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:19 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While scrubbing an allocation group, we don't need to hold the AGFL
buffer as part of the scrub context.  All that is necessary to lock an
AG is to hold the AGI and AGF buffers, so fix all the existing users of
the AGFL buffer to grab them only when necessary.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/agheader.c        |   47 +++++++++++++++++++++++++---------------
 fs/xfs/scrub/agheader_repair.c |    1 -
 fs/xfs/scrub/common.c          |    8 -------
 fs/xfs/scrub/repair.c          |   11 +++++----
 fs/xfs/scrub/scrub.h           |    1 -
 5 files changed, 35 insertions(+), 33 deletions(-)


diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c
index b7b838bd4ba4..af284baa6f4c 100644
--- a/fs/xfs/scrub/agheader.c
+++ b/fs/xfs/scrub/agheader.c
@@ -609,9 +609,16 @@ xchk_agf(
 /* AGFL */
 
 struct xchk_agfl_info {
-	unsigned int		sz_entries;
+	/* Number of AGFL entries that the AGF claims are in use. */
+	unsigned int		agflcount;
+
+	/* Number of AGFL entries that we found. */
 	unsigned int		nr_entries;
+
+	/* Buffer to hold AGFL entries for extent checking. */
 	xfs_agblock_t		*entries;
+
+	struct xfs_buf		*agfl_bp;
 	struct xfs_scrub	*sc;
 };
 
@@ -641,10 +648,10 @@ xchk_agfl_block(
 	struct xfs_scrub	*sc = sai->sc;
 
 	if (xfs_verify_agbno(sc->sa.pag, agbno) &&
-	    sai->nr_entries < sai->sz_entries)
+	    sai->nr_entries < sai->agflcount)
 		sai->entries[sai->nr_entries++] = agbno;
 	else
-		xchk_block_set_corrupt(sc, sc->sa.agfl_bp);
+		xchk_block_set_corrupt(sc, sai->agfl_bp);
 
 	xchk_agfl_block_xref(sc, agbno);
 
@@ -696,19 +703,26 @@ int
 xchk_agfl(
 	struct xfs_scrub	*sc)
 {
-	struct xchk_agfl_info	sai;
+	struct xchk_agfl_info	sai = {
+		.sc		= sc,
+	};
 	struct xfs_agf		*agf;
 	xfs_agnumber_t		agno = sc->sm->sm_agno;
-	unsigned int		agflcount;
 	unsigned int		i;
 	int			error;
 
+	/* Lock the AGF and AGI so that nobody can touch this AG. */
 	error = xchk_ag_read_headers(sc, agno, &sc->sa);
 	if (!xchk_process_error(sc, agno, XFS_AGFL_BLOCK(sc->mp), &error))
-		goto out;
+		return error;
 	if (!sc->sa.agf_bp)
 		return -EFSCORRUPTED;
-	xchk_buffer_recheck(sc, sc->sa.agfl_bp);
+
+	/* Try to read the AGFL, and verify its structure if we get it. */
+	error = xfs_alloc_read_agfl(sc->sa.pag, sc->tp, &sai.agfl_bp);
+	if (!xchk_process_error(sc, agno, XFS_AGFL_BLOCK(sc->mp), &error))
+		return error;
+	xchk_buffer_recheck(sc, sai.agfl_bp);
 
 	xchk_agfl_xref(sc);
 
@@ -717,24 +731,21 @@ xchk_agfl(
 
 	/* Allocate buffer to ensure uniqueness of AGFL entries. */
 	agf = sc->sa.agf_bp->b_addr;
-	agflcount = be32_to_cpu(agf->agf_flcount);
-	if (agflcount > xfs_agfl_size(sc->mp)) {
+	sai.agflcount = be32_to_cpu(agf->agf_flcount);
+	if (sai.agflcount > xfs_agfl_size(sc->mp)) {
 		xchk_block_set_corrupt(sc, sc->sa.agf_bp);
 		goto out;
 	}
-	memset(&sai, 0, sizeof(sai));
-	sai.sc = sc;
-	sai.sz_entries = agflcount;
-	sai.entries = kmem_zalloc(sizeof(xfs_agblock_t) * agflcount,
-			KM_MAYFAIL);
+	sai.entries = kvcalloc(sai.agflcount, sizeof(xfs_agblock_t),
+			GFP_KERNEL | __GFP_RETRY_MAYFAIL);
 	if (!sai.entries) {
 		error = -ENOMEM;
 		goto out;
 	}
 
 	/* Check the blocks in the AGFL. */
-	error = xfs_agfl_walk(sc->mp, sc->sa.agf_bp->b_addr,
-			sc->sa.agfl_bp, xchk_agfl_block, &sai);
+	error = xfs_agfl_walk(sc->mp, sc->sa.agf_bp->b_addr, sai.agfl_bp,
+			xchk_agfl_block, &sai);
 	if (error == -ECANCELED) {
 		error = 0;
 		goto out_free;
@@ -742,7 +753,7 @@ xchk_agfl(
 	if (error)
 		goto out_free;
 
-	if (agflcount != sai.nr_entries) {
+	if (sai.agflcount != sai.nr_entries) {
 		xchk_block_set_corrupt(sc, sc->sa.agf_bp);
 		goto out_free;
 	}
@@ -758,7 +769,7 @@ xchk_agfl(
 	}
 
 out_free:
-	kmem_free(sai.entries);
+	kvfree(sai.entries);
 out:
 	return error;
 }
diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 1b0b4e243f77..2e75ff9b5b2e 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -697,7 +697,6 @@ xrep_agfl(
 	 * freespace overflow to the freespace btrees.
 	 */
 	sc->sa.agf_bp = agf_bp;
-	sc->sa.agfl_bp = agfl_bp;
 	error = xrep_roll_ag_trans(sc);
 	if (error)
 		goto err;
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 9bbbf20f401b..ad70f29233c3 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -424,10 +424,6 @@ xchk_ag_read_headers(
 	if (error && want_ag_read_header_failure(sc, XFS_SCRUB_TYPE_AGF))
 		return error;
 
-	error = xfs_alloc_read_agfl(sa->pag, sc->tp, &sa->agfl_bp);
-	if (error && want_ag_read_header_failure(sc, XFS_SCRUB_TYPE_AGFL))
-		return error;
-
 	return 0;
 }
 
@@ -515,10 +511,6 @@ xchk_ag_free(
 	struct xchk_ag		*sa)
 {
 	xchk_ag_btcur_free(sa);
-	if (sa->agfl_bp) {
-		xfs_trans_brelse(sc->tp, sa->agfl_bp);
-		sa->agfl_bp = NULL;
-	}
 	if (sa->agf_bp) {
 		xfs_trans_brelse(sc->tp, sa->agf_bp);
 		sa->agf_bp = NULL;
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index c18bd039fce9..2ada7fc1c398 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -126,8 +126,6 @@ xrep_roll_ag_trans(
 		xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
 	if (sc->sa.agf_bp)
 		xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
-	if (sc->sa.agfl_bp)
-		xfs_trans_bhold(sc->tp, sc->sa.agfl_bp);
 
 	/*
 	 * Roll the transaction.  We still own the buffer and the buffer lock
@@ -145,8 +143,6 @@ xrep_roll_ag_trans(
 		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
 	if (sc->sa.agf_bp)
 		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
-	if (sc->sa.agfl_bp)
-		xfs_trans_bjoin(sc->tp, sc->sa.agfl_bp);
 
 	return 0;
 }
@@ -498,6 +494,7 @@ xrep_put_freelist(
 	struct xfs_scrub	*sc,
 	xfs_agblock_t		agbno)
 {
+	struct xfs_buf		*agfl_bp;
 	int			error;
 
 	/* Make sure there's space on the freelist. */
@@ -516,8 +513,12 @@ xrep_put_freelist(
 		return error;
 
 	/* Put the block on the AGFL. */
+	error = xfs_alloc_read_agfl(sc->sa.pag, sc->tp, &agfl_bp);
+	if (error)
+		return error;
+
 	error = xfs_alloc_put_freelist(sc->sa.pag, sc->tp, sc->sa.agf_bp,
-			sc->sa.agfl_bp, agbno, 0);
+			agfl_bp, agbno, 0);
 	if (error)
 		return error;
 	xfs_extent_busy_insert(sc->tp, sc->sa.pag, agbno, 1,
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 3de5287e98d8..151567f88366 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -39,7 +39,6 @@ struct xchk_ag {
 
 	/* AG btree roots */
 	struct xfs_buf		*agf_bp;
-	struct xfs_buf		*agfl_bp;
 	struct xfs_buf		*agi_bp;
 
 	/* AG btrees */


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll
  2022-10-02 18:19 [PATCHSET v23.1 0/4] xfs: fix handling of AG[IF] header buffers during scrub Darrick J. Wong
@ 2022-10-02 18:19 ` Darrick J. Wong
  2022-10-13 22:25   ` Dave Chinner
  2022-10-31 18:08   ` [PATCH v23.2 3/4] xfs: log the AGI/AGF buffers when rolling transactions during an AG repair Darrick J. Wong
  2022-10-02 18:19 ` [PATCH 2/4] xfs: don't track the AGFL buffer in the scrub AG context Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:19 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Currently, the only way to lock an allocation group is to hold the AGI
and AGF buffers.  If repair needs to roll the transaction while
repairing some AG metadata, it maintains that lock by holding the two
buffers across the transaction roll and joins them afterwards.

However, repair is not the same as the other parts of XFS that employ
this bhold/bjoin sequence, because it's possible that the AGI or AGF
buffers are not actually dirty before the roll.  In this case, the
buffer log item can detach from the buffer, which means that we have to
re-set the buffer type in the bli after joining the buffer to the new
transaction so that log recovery will know what to do if the fs fails.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/repair.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 2ada7fc1c398..92c661b98892 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -138,11 +138,23 @@ xrep_roll_ag_trans(
 	if (error)
 		return error;
 
-	/* Join AG headers to the new transaction. */
-	if (sc->sa.agi_bp)
+	/*
+	 * Join AG headers to the new transaction.  The buffer log item can
+	 * detach from the buffer across the transaction roll if the bli is
+	 * clean, so ensure the buffer type is still set on the AG header
+	 * buffers' blis before we return.
+	 *
+	 * Normal code would never hold a clean buffer across a roll, but scrub
+	 * needs both buffers to maintain a total lock on the AG.
+	 */
+	if (sc->sa.agi_bp) {
 		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
-	if (sc->sa.agf_bp)
+		xfs_trans_buf_set_type(sc->tp, sc->sa.agi_bp, XFS_BLFT_AGI_BUF);
+	}
+	if (sc->sa.agf_bp) {
 		xfs_trans_bjoin(sc->tp, sc->sa.agf_bp);
+		xfs_trans_buf_set_type(sc->tp, sc->sa.agf_bp, XFS_BLFT_AGF_BUF);
+	}
 
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] xfs: make AGFL repair function avoid crosslinked blocks
  2022-10-02 18:19 [PATCHSET v23.1 0/4] xfs: fix handling of AG[IF] header buffers during scrub Darrick J. Wong
                   ` (2 preceding siblings ...)
  2022-10-02 18:19 ` [PATCH 1/4] xfs: fully initialize xfs_da_args in xchk_directory_blocks Darrick J. Wong
@ 2022-10-02 18:19 ` Darrick J. Wong
  2022-10-13 22:35   ` Dave Chinner
  3 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-02 18:19 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Teach the AGFL repair function to check each block of the proposed AGFL
against the rmap btree.  If the rmapbt finds any mappings that are not
OWN_AG, strike that block from the list.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/agheader_repair.c |   78 ++++++++++++++++++++++++++++++++++------
 1 file changed, 66 insertions(+), 12 deletions(-)


diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 2e75ff9b5b2e..82ceb60ea5fc 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -442,12 +442,18 @@ xrep_agf(
 /* AGFL */
 
 struct xrep_agfl {
+	/* Bitmap of alleged AGFL blocks that we're not going to add. */
+	struct xbitmap		crossed;
+
 	/* Bitmap of other OWN_AG metadata blocks. */
 	struct xbitmap		agmetablocks;
 
 	/* Bitmap of free space. */
 	struct xbitmap		*freesp;
 
+	/* rmapbt cursor for finding crosslinked blocks */
+	struct xfs_btree_cur	*rmap_cur;
+
 	struct xfs_scrub	*sc;
 };
 
@@ -477,6 +483,41 @@ xrep_agfl_walk_rmap(
 	return xbitmap_set_btcur_path(&ra->agmetablocks, cur);
 }
 
+/* Strike out the blocks that are cross-linked according to the rmapbt. */
+STATIC int
+xrep_agfl_check_extent(
+	struct xrep_agfl	*ra,
+	uint64_t		start,
+	uint64_t		len)
+{
+	xfs_agblock_t		agbno = XFS_FSB_TO_AGBNO(ra->sc->mp, start);
+	xfs_agblock_t		last_agbno = agbno + len - 1;
+	int			error;
+
+	ASSERT(XFS_FSB_TO_AGNO(ra->sc->mp, start) == ra->sc->sa.pag->pag_agno);
+
+	while (agbno <= last_agbno) {
+		bool		other_owners;
+
+		error = xfs_rmap_has_other_keys(ra->rmap_cur, agbno, 1,
+				&XFS_RMAP_OINFO_AG, &other_owners);
+		if (error)
+			return error;
+
+		if (other_owners) {
+			error = xbitmap_set(&ra->crossed, agbno, 1);
+			if (error)
+				return error;
+		}
+
+		if (xchk_should_terminate(ra->sc, &error))
+			return error;
+		agbno++;
+	}
+
+	return 0;
+}
+
 /*
  * Map out all the non-AGFL OWN_AG space in this AG so that we can deduce
  * which blocks belong to the AGFL.
@@ -496,44 +537,58 @@ xrep_agfl_collect_blocks(
 	struct xrep_agfl	ra;
 	struct xfs_mount	*mp = sc->mp;
 	struct xfs_btree_cur	*cur;
+	struct xbitmap_range	*br, *n;
 	int			error;
 
 	ra.sc = sc;
 	ra.freesp = agfl_extents;
 	xbitmap_init(&ra.agmetablocks);
+	xbitmap_init(&ra.crossed);
 
 	/* Find all space used by the free space btrees & rmapbt. */
 	cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.pag);
 	error = xfs_rmap_query_all(cur, xrep_agfl_walk_rmap, &ra);
-	if (error)
-		goto err;
 	xfs_btree_del_cursor(cur, error);
+	if (error)
+		goto out_bmp;
 
 	/* Find all blocks currently being used by the bnobt. */
 	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp,
 			sc->sa.pag, XFS_BTNUM_BNO);
 	error = xbitmap_set_btblocks(&ra.agmetablocks, cur);
-	if (error)
-		goto err;
 	xfs_btree_del_cursor(cur, error);
+	if (error)
+		goto out_bmp;
 
 	/* Find all blocks currently being used by the cntbt. */
 	cur = xfs_allocbt_init_cursor(mp, sc->tp, agf_bp,
 			sc->sa.pag, XFS_BTNUM_CNT);
 	error = xbitmap_set_btblocks(&ra.agmetablocks, cur);
-	if (error)
-		goto err;
-
 	xfs_btree_del_cursor(cur, error);
+	if (error)
+		goto out_bmp;
 
 	/*
 	 * Drop the freesp meta blocks that are in use by btrees.
 	 * The remaining blocks /should/ be AGFL blocks.
 	 */
 	error = xbitmap_disunion(agfl_extents, &ra.agmetablocks);
-	xbitmap_destroy(&ra.agmetablocks);
 	if (error)
-		return error;
+		goto out_bmp;
+
+	/* Strike out the blocks that are cross-linked. */
+	ra.rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.pag);
+	for_each_xbitmap_extent(br, n, agfl_extents) {
+		error = xrep_agfl_check_extent(&ra, br->start, br->len);
+		if (error)
+			break;
+	}
+	xfs_btree_del_cursor(ra.rmap_cur, error);
+	if (error)
+		goto out_bmp;
+	error = xbitmap_disunion(agfl_extents, &ra.crossed);
+	if (error)
+		goto out_bmp;
 
 	/*
 	 * Calculate the new AGFL size.  If we found more blocks than fit in
@@ -541,11 +596,10 @@ xrep_agfl_collect_blocks(
 	 */
 	*flcount = min_t(uint64_t, xbitmap_hweight(agfl_extents),
 			 xfs_agfl_size(mp));
-	return 0;
 
-err:
+out_bmp:
+	xbitmap_destroy(&ra.crossed);
 	xbitmap_destroy(&ra.agmetablocks);
-	xfs_btree_del_cursor(cur, error);
 	return error;
 }
 


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll
  2022-10-02 18:19 ` [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll Darrick J. Wong
@ 2022-10-13 22:25   ` Dave Chinner
  2022-10-13 23:19     ` Darrick J. Wong
  2022-10-31 18:08   ` [PATCH v23.2 3/4] xfs: log the AGI/AGF buffers when rolling transactions during an AG repair Darrick J. Wong
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2022-10-13 22:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Currently, the only way to lock an allocation group is to hold the AGI
> and AGF buffers.  If repair needs to roll the transaction while
> repairing some AG metadata, it maintains that lock by holding the two
> buffers across the transaction roll and joins them afterwards.
> 
> However, repair is not the same as the other parts of XFS that employ
> this bhold/bjoin sequence, because it's possible that the AGI or AGF
> buffers are not actually dirty before the roll.  In this case, the
> buffer log item can detach from the buffer, which means that we have to

Doesn't this imply we have a reference counting problem with
XFS_BLI_HOLD buffers? i.e. the bli can only get detached when the
reference count on it goes to zero. If the buffer is clean and
joined to a transaction, then that means the only reference to the
BLI is the current transaction. Hence the only way it can get
detached is for the transaction commit to release the current
transaction's reference to the BLI.

Ah, XFS_BLI_HOLD does not take a reference to the BLI - it just
prevents ->iop_release from releasing the -buffer- after it drops
the transaction reference to the BLI. That's the problem right there
- xfs_buf_item_release() drops the current trans ref to the clean
item via xfs_buf_item_release() regardless of whether BLI_HOLD is
set or not, hence freeing the BLI on clean buffers.

IOWs, it looks to me like XFS_BLI_HOLD should actually hold a
reference to the BLI as well as the buffer so that we don't free the
BLI for a held clean buffer in xfs_buf_item_release(). The reference
we leave behind will then be picked up by the subsequent call to
xfs_trans_bjoin() which finds the clean BLI already attached to the
buffer...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] xfs: don't track the AGFL buffer in the scrub AG context
  2022-10-02 18:19 ` [PATCH 2/4] xfs: don't track the AGFL buffer in the scrub AG context Darrick J. Wong
@ 2022-10-13 22:32   ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2022-10-13 22:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While scrubbing an allocation group, we don't need to hold the AGFL
> buffer as part of the scrub context.  All that is necessary to lock an
> AG is to hold the AGI and AGF buffers, so fix all the existing users of
> the AGFL buffer to grab them only when necessary.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/scrub/agheader.c        |   47 +++++++++++++++++++++++++---------------
>  fs/xfs/scrub/agheader_repair.c |    1 -
>  fs/xfs/scrub/common.c          |    8 -------
>  fs/xfs/scrub/repair.c          |   11 +++++----
>  fs/xfs/scrub/scrub.h           |    1 -
>  5 files changed, 35 insertions(+), 33 deletions(-)

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

> @@ -717,24 +731,21 @@ xchk_agfl(
>  
>  	/* Allocate buffer to ensure uniqueness of AGFL entries. */
>  	agf = sc->sa.agf_bp->b_addr;
> -	agflcount = be32_to_cpu(agf->agf_flcount);
> -	if (agflcount > xfs_agfl_size(sc->mp)) {
> +	sai.agflcount = be32_to_cpu(agf->agf_flcount);
> +	if (sai.agflcount > xfs_agfl_size(sc->mp)) {
>  		xchk_block_set_corrupt(sc, sc->sa.agf_bp);
>  		goto out;
>  	}
> -	memset(&sai, 0, sizeof(sai));
> -	sai.sc = sc;
> -	sai.sz_entries = agflcount;
> -	sai.entries = kmem_zalloc(sizeof(xfs_agblock_t) * agflcount,
> -			KM_MAYFAIL);
> +	sai.entries = kvcalloc(sai.agflcount, sizeof(xfs_agblock_t),
> +			GFP_KERNEL | __GFP_RETRY_MAYFAIL);

The code is fine, but I'm curious why kvcalloc()? Are there really
devices out there with sector sizes large than 4kB that we support?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] xfs: fully initialize xfs_da_args in xchk_directory_blocks
  2022-10-02 18:19 ` [PATCH 1/4] xfs: fully initialize xfs_da_args in xchk_directory_blocks Darrick J. Wong
@ 2022-10-13 22:33   ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2022-10-13 22:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While running the online fsck test suite, I noticed the following
> assertion in the kernel log (edited for brevity):
> 
> XFS: Assertion failed: 0, file: fs/xfs/xfs_health.c, line: 571
> ------------[ cut here ]------------
> WARNING: CPU: 3 PID: 11667 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
> CPU: 3 PID: 11667 Comm: xfs_scrub Tainted: G        W         5.19.0-rc7-xfsx #rc7 6e6475eb29fd9dda3181f81b7ca7ff961d277a40
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> RIP: 0010:assfail+0x46/0x4a [xfs]
> Call Trace:
>  <TASK>
>  xfs_dir2_isblock+0xcc/0xe0
>  xchk_directory_blocks+0xc7/0x420
>  xchk_directory+0x53/0xb0
>  xfs_scrub_metadata+0x2b6/0x6b0
>  xfs_scrubv_metadata+0x35e/0x4d0
>  xfs_ioc_scrubv_metadata+0x111/0x160
>  xfs_file_ioctl+0x4ec/0xef0
>  __x64_sys_ioctl+0x82/0xa0
>  do_syscall_64+0x2b/0x80
>  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> This assertion triggers in xfs_dirattr_mark_sick when the caller passes
> in a whichfork value that is neither of XFS_{DATA,ATTR}_FORK.  The cause
> of this is that xchk_directory_blocks only partially initializes the
> xfs_da_args structure that is passed to xfs_dir2_isblock.  If the data
> fork is not correct, the XFS_IS_CORRUPT clause will trigger.  My
> development branch reports this failure to the health monitoring
> subsystem, which accesses the uninitialized args->whichfork field,
> leading the the assertion tripping.  We really shouldn't be passing
> random stack contents around, so the solution here is to force the
> compiler to zero-initialize the struct.
> 
> Found by fuzzing u3.bmx[0].blockcount = middlebit on xfs/1554.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/scrub/dir.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)

Looks good, surprised it took this long to trip over this...

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] xfs: make AGFL repair function avoid crosslinked blocks
  2022-10-02 18:19 ` [PATCH 4/4] xfs: make AGFL repair function avoid crosslinked blocks Darrick J. Wong
@ 2022-10-13 22:35   ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2022-10-13 22:35 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Oct 02, 2022 at 11:19:49AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Teach the AGFL repair function to check each block of the proposed AGFL
> against the rmap btree.  If the rmapbt finds any mappings that are not
> OWN_AG, strike that block from the list.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/scrub/agheader_repair.c |   78 ++++++++++++++++++++++++++++++++++------
>  1 file changed, 66 insertions(+), 12 deletions(-)

Looks ok.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll
  2022-10-13 22:25   ` Dave Chinner
@ 2022-10-13 23:19     ` Darrick J. Wong
  2022-10-14  1:28       ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-13 23:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Oct 14, 2022 at 09:25:53AM +1100, Dave Chinner wrote:
> On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Currently, the only way to lock an allocation group is to hold the AGI
> > and AGF buffers.  If repair needs to roll the transaction while
> > repairing some AG metadata, it maintains that lock by holding the two
> > buffers across the transaction roll and joins them afterwards.
> > 
> > However, repair is not the same as the other parts of XFS that employ
> > this bhold/bjoin sequence, because it's possible that the AGI or AGF
> > buffers are not actually dirty before the roll.  In this case, the
> > buffer log item can detach from the buffer, which means that we have to
> 
> Doesn't this imply we have a reference counting problem with
> XFS_BLI_HOLD buffers? i.e. the bli can only get detached when the
> reference count on it goes to zero. If the buffer is clean and
> joined to a transaction, then that means the only reference to the
> BLI is the current transaction. Hence the only way it can get
> detached is for the transaction commit to release the current
> transaction's reference to the BLI.
> 
> Ah, XFS_BLI_HOLD does not take a reference to the BLI - it just
> prevents ->iop_release from releasing the -buffer- after it drops
> the transaction reference to the BLI. That's the problem right there
> - xfs_buf_item_release() drops the current trans ref to the clean
> item via xfs_buf_item_release() regardless of whether BLI_HOLD is
> set or not, hence freeing the BLI on clean buffers.
> 
> IOWs, it looks to me like XFS_BLI_HOLD should actually hold a
> reference to the BLI as well as the buffer so that we don't free the
> BLI for a held clean buffer in xfs_buf_item_release(). The reference
> we leave behind will then be picked up by the subsequent call to
> xfs_trans_bjoin() which finds the clean BLI already attached to the
> buffer...

<nod> I think you're saying that _xfs_trans_bjoin should:

	if (!(bip->bli_flags & XFS_BLI_HOLD))
		atomic_inc(&bip->bli_refcount);

and xfs_buf_item_release should do:

	if (hold)
		return;
	released = xfs_buf_item_put(bip);
	if (stale && !released)
		return;

I'll have to remember how I induced this error in the first place.  I
think it was when I was running repair tests with mem=600M.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll
  2022-10-13 23:19     ` Darrick J. Wong
@ 2022-10-14  1:28       ` Dave Chinner
  2022-10-24  4:16         ` Darrick J. Wong
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2022-10-14  1:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Oct 13, 2022 at 04:19:15PM -0700, Darrick J. Wong wrote:
> On Fri, Oct 14, 2022 at 09:25:53AM +1100, Dave Chinner wrote:
> > On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Currently, the only way to lock an allocation group is to hold the AGI
> > > and AGF buffers.  If repair needs to roll the transaction while
> > > repairing some AG metadata, it maintains that lock by holding the two
> > > buffers across the transaction roll and joins them afterwards.
> > > 
> > > However, repair is not the same as the other parts of XFS that employ
> > > this bhold/bjoin sequence, because it's possible that the AGI or AGF
> > > buffers are not actually dirty before the roll.  In this case, the
> > > buffer log item can detach from the buffer, which means that we have to
> > 
> > Doesn't this imply we have a reference counting problem with
> > XFS_BLI_HOLD buffers? i.e. the bli can only get detached when the
> > reference count on it goes to zero. If the buffer is clean and
> > joined to a transaction, then that means the only reference to the
> > BLI is the current transaction. Hence the only way it can get
> > detached is for the transaction commit to release the current
> > transaction's reference to the BLI.
> > 
> > Ah, XFS_BLI_HOLD does not take a reference to the BLI - it just
> > prevents ->iop_release from releasing the -buffer- after it drops
> > the transaction reference to the BLI. That's the problem right there
> > - xfs_buf_item_release() drops the current trans ref to the clean
> > item via xfs_buf_item_release() regardless of whether BLI_HOLD is
> > set or not, hence freeing the BLI on clean buffers.
> > 
> > IOWs, it looks to me like XFS_BLI_HOLD should actually hold a
> > reference to the BLI as well as the buffer so that we don't free the
> > BLI for a held clean buffer in xfs_buf_item_release(). The reference
> > we leave behind will then be picked up by the subsequent call to
> > xfs_trans_bjoin() which finds the clean BLI already attached to the
> > buffer...
> 
> <nod> I think you're saying that _xfs_trans_bjoin should:
> 
> 	if (!(bip->bli_flags & XFS_BLI_HOLD))
> 		atomic_inc(&bip->bli_refcount);
> 
> and xfs_buf_item_release should do:
> 
> 	if (hold)
> 		return;
> 	released = xfs_buf_item_put(bip);
> 	if (stale && !released)
> 		return;

Not exactly. What I was thinking was something like this:

xfs_trans_bhold() should do:

	bip->bli_flags |= XFS_BLI_HOLD;
	atomic_inc(&bip->bli_refcount);

xfs_trans_bhold_release() should do:

	bip->bli_flags &= ~XFS_BLI_HOLD;
	atomic_dec(&bip->bli_refcount);

xfs_trans_brelse() shoudl do:

	if (bip->bli_flags & XFS_BLI_HOLD) {
		bip->bli_flags &= ~XFS_BLI_HOLD;
		atomic_dec(&bip->bli_refcount);
	}

and xfs_buf_item_release() should do:

	if (hold) {
		/* Release the hold ref but not the rolling transaction ref */
		bip->bli_flags &= ~XFS_BLI_HOLD;
		atomic_dec(&bip->bli_refcount);
		return;
	}
	released = xfs_buf_item_put(bip);
	if (stale && !released)
		return;

Then _xfs_trans_bjoin() remains unchanged, as we don't remove the
BLI from the held clean buffer as there is still a reference to it.
The new transaction we rejoin the buffer to will take that
reference.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll
  2022-10-14  1:28       ` Dave Chinner
@ 2022-10-24  4:16         ` Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-24  4:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Oct 14, 2022 at 12:28:19PM +1100, Dave Chinner wrote:
> On Thu, Oct 13, 2022 at 04:19:15PM -0700, Darrick J. Wong wrote:
> > On Fri, Oct 14, 2022 at 09:25:53AM +1100, Dave Chinner wrote:
> > > On Sun, Oct 02, 2022 at 11:19:48AM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Currently, the only way to lock an allocation group is to hold the AGI
> > > > and AGF buffers.  If repair needs to roll the transaction while
> > > > repairing some AG metadata, it maintains that lock by holding the two
> > > > buffers across the transaction roll and joins them afterwards.
> > > > 
> > > > However, repair is not the same as the other parts of XFS that employ
> > > > this bhold/bjoin sequence, because it's possible that the AGI or AGF
> > > > buffers are not actually dirty before the roll.  In this case, the
> > > > buffer log item can detach from the buffer, which means that we have to
> > > 
> > > Doesn't this imply we have a reference counting problem with
> > > XFS_BLI_HOLD buffers? i.e. the bli can only get detached when the
> > > reference count on it goes to zero. If the buffer is clean and
> > > joined to a transaction, then that means the only reference to the
> > > BLI is the current transaction. Hence the only way it can get
> > > detached is for the transaction commit to release the current
> > > transaction's reference to the BLI.
> > > 
> > > Ah, XFS_BLI_HOLD does not take a reference to the BLI - it just
> > > prevents ->iop_release from releasing the -buffer- after it drops
> > > the transaction reference to the BLI. That's the problem right there
> > > - xfs_buf_item_release() drops the current trans ref to the clean
> > > item via xfs_buf_item_release() regardless of whether BLI_HOLD is
> > > set or not, hence freeing the BLI on clean buffers.
> > > 
> > > IOWs, it looks to me like XFS_BLI_HOLD should actually hold a
> > > reference to the BLI as well as the buffer so that we don't free the
> > > BLI for a held clean buffer in xfs_buf_item_release(). The reference
> > > we leave behind will then be picked up by the subsequent call to
> > > xfs_trans_bjoin() which finds the clean BLI already attached to the
> > > buffer...
> > 
> > <nod> I think you're saying that _xfs_trans_bjoin should:
> > 
> > 	if (!(bip->bli_flags & XFS_BLI_HOLD))
> > 		atomic_inc(&bip->bli_refcount);
> > 
> > and xfs_buf_item_release should do:
> > 
> > 	if (hold)
> > 		return;
> > 	released = xfs_buf_item_put(bip);
> > 	if (stale && !released)
> > 		return;
> 
> Not exactly. What I was thinking was something like this:
> 
> xfs_trans_bhold() should do:
> 
> 	bip->bli_flags |= XFS_BLI_HOLD;
> 	atomic_inc(&bip->bli_refcount);
> 
> xfs_trans_bhold_release() should do:
> 
> 	bip->bli_flags &= ~XFS_BLI_HOLD;
> 	atomic_dec(&bip->bli_refcount);
> 
> xfs_trans_brelse() shoudl do:
> 
> 	if (bip->bli_flags & XFS_BLI_HOLD) {
> 		bip->bli_flags &= ~XFS_BLI_HOLD;
> 		atomic_dec(&bip->bli_refcount);
> 	}
> 
> and xfs_buf_item_release() should do:
> 
> 	if (hold) {
> 		/* Release the hold ref but not the rolling transaction ref */
> 		bip->bli_flags &= ~XFS_BLI_HOLD;
> 		atomic_dec(&bip->bli_refcount);
> 		return;
> 	}
> 	released = xfs_buf_item_put(bip);
> 	if (stale && !released)
> 		return;
> 
> Then _xfs_trans_bjoin() remains unchanged, as we don't remove the
> BLI from the held clean buffer as there is still a reference to it.
> The new transaction we rejoin the buffer to will take that
> reference.

Hnmmmm.  I tried this, but it lead to massive memory corruption
problems, slab leak warnings, and a hang.  If I get around to it I'll
try to see if KASAN will help me figure out what went wrong.

It occurred to me that it might be easier to log the first byte of the
AGI and AGF buffers before the transaction roll, which prevents the bli
from falling off the buffer during the transaction rolling.

Yes that isn't fixing the problem that the unusual use of bhold on a
clean buffer doesn't work right, but <shrug> I don't want to increase
the risks of this patchset by reworking buffer/bli lifetimes when I can
do something simpler. :)

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v23.2 3/4] xfs: log the AGI/AGF buffers when rolling transactions during an AG repair
  2022-10-02 18:19 ` [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll Darrick J. Wong
  2022-10-13 22:25   ` Dave Chinner
@ 2022-10-31 18:08   ` Darrick J. Wong
  2022-10-31 21:17     ` Dave Chinner
  1 sibling, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-10-31 18:08 UTC (permalink / raw)
  To: linux-xfs, Dave Chinner

From: Darrick J. Wong <djwong@kernel.org>

Currently, the only way to lock an allocation group is to hold the AGI
and AGF buffers.  If a repair needs to roll the transaction while
repairing some AG metadata, it maintains that lock by holding the two
buffers across the transaction roll and joins them afterwards.

However, repair is not like other parts of XFS that employ the bhold -
roll - bjoin sequence because it's possible that the AGI or AGF buffers
are not actually dirty before the roll.  This presents two problems --
First, we need to redirty those buffers to keep them moving along in the
log to avoid pinning the log tail.  Second, a clean buffer log item can
detach from the buffer.  If this happens, the buffer type state is
discarded along with the bli and must be reattached before the next time
the buffer is logged.   If it is not, the logging code will complain and
log recovery will not work properly.

An earlier version of this patch tried to fix the second problem by
re-setting the buffer type in the bli after joining the buffer to the
new transaction, but that looked weird and didn't solve the first
problem.  Instead, solve both problems by logging the buffer before
rolling the transaction.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/repair.c |   30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 2ada7fc1c398..22335619c84e 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -121,24 +121,36 @@ xrep_roll_ag_trans(
 {
 	int			error;
 
-	/* Keep the AG header buffers locked so we can keep going. */
-	if (sc->sa.agi_bp)
+	/*
+	 * Keep the AG header buffers locked while we roll the transaction.
+	 * Ensure that both AG buffers are dirty and held when we roll the
+	 * transaction so that they move forward in the log without losing the
+	 * bli (and hence the bli type) when the transaction commits.
+	 *
+	 * Normal code would never hold clean buffers across a roll, but repair
+	 * needs both buffers to maintain a total lock on the AG.
+	 */
+	if (sc->sa.agi_bp) {
+		xfs_ialloc_log_agi(sc->tp, sc->sa.agi_bp, XFS_AGI_MAGICNUM);
 		xfs_trans_bhold(sc->tp, sc->sa.agi_bp);
-	if (sc->sa.agf_bp)
+	}
+
+	if (sc->sa.agf_bp) {
+		xfs_alloc_log_agf(sc->tp, sc->sa.agf_bp, XFS_AGF_MAGICNUM);
 		xfs_trans_bhold(sc->tp, sc->sa.agf_bp);
+	}
 
 	/*
-	 * Roll the transaction.  We still own the buffer and the buffer lock
-	 * regardless of whether or not the roll succeeds.  If the roll fails,
-	 * the buffers will be released during teardown on our way out of the
-	 * kernel.  If it succeeds, we join them to the new transaction and
-	 * move on.
+	 * Roll the transaction.  We still hold the AG header buffers locked
+	 * regardless of whether or not that succeeds.  On failure, the buffers
+	 * will be released during teardown on our way out of the kernel.  If
+	 * successful, join the buffers to the new transaction and move on.
 	 */
 	error = xfs_trans_roll(&sc->tp);
 	if (error)
 		return error;
 
-	/* Join AG headers to the new transaction. */
+	/* Join the AG headers to the new transaction. */
 	if (sc->sa.agi_bp)
 		xfs_trans_bjoin(sc->tp, sc->sa.agi_bp);
 	if (sc->sa.agf_bp)

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v23.2 3/4] xfs: log the AGI/AGF buffers when rolling transactions during an AG repair
  2022-10-31 18:08   ` [PATCH v23.2 3/4] xfs: log the AGI/AGF buffers when rolling transactions during an AG repair Darrick J. Wong
@ 2022-10-31 21:17     ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2022-10-31 21:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Mon, Oct 31, 2022 at 11:08:33AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Currently, the only way to lock an allocation group is to hold the AGI
> and AGF buffers.  If a repair needs to roll the transaction while
> repairing some AG metadata, it maintains that lock by holding the two
> buffers across the transaction roll and joins them afterwards.
> 
> However, repair is not like other parts of XFS that employ the bhold -
> roll - bjoin sequence because it's possible that the AGI or AGF buffers
> are not actually dirty before the roll.  This presents two problems --
> First, we need to redirty those buffers to keep them moving along in the
> log to avoid pinning the log tail.  Second, a clean buffer log item can
> detach from the buffer.  If this happens, the buffer type state is
> discarded along with the bli and must be reattached before the next time
> the buffer is logged.   If it is not, the logging code will complain and
> log recovery will not work properly.
> 
> An earlier version of this patch tried to fix the second problem by
> re-setting the buffer type in the bli after joining the buffer to the
> new transaction, but that looked weird and didn't solve the first
> problem.  Instead, solve both problems by logging the buffer before
> rolling the transaction.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

I guess this is fine as long as it is confined to the scrub code;
if we need to hold clean buffers across transaction rolls in other
code we really need to sort out the BLI life cycle issues that this
currently exposes.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-10-31 21:18 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-02 18:19 [PATCHSET v23.1 0/4] xfs: fix handling of AG[IF] header buffers during scrub Darrick J. Wong
2022-10-02 18:19 ` [PATCH 3/4] xfs: set the buffer type after holding the AG[IF] across trans_roll Darrick J. Wong
2022-10-13 22:25   ` Dave Chinner
2022-10-13 23:19     ` Darrick J. Wong
2022-10-14  1:28       ` Dave Chinner
2022-10-24  4:16         ` Darrick J. Wong
2022-10-31 18:08   ` [PATCH v23.2 3/4] xfs: log the AGI/AGF buffers when rolling transactions during an AG repair Darrick J. Wong
2022-10-31 21:17     ` Dave Chinner
2022-10-02 18:19 ` [PATCH 2/4] xfs: don't track the AGFL buffer in the scrub AG context Darrick J. Wong
2022-10-13 22:32   ` Dave Chinner
2022-10-02 18:19 ` [PATCH 1/4] xfs: fully initialize xfs_da_args in xchk_directory_blocks Darrick J. Wong
2022-10-13 22:33   ` Dave Chinner
2022-10-02 18:19 ` [PATCH 4/4] xfs: make AGFL repair function avoid crosslinked blocks Darrick J. Wong
2022-10-13 22:35   ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).