* [PATCH 0/7] xfs: refactor and tablise growfs
@ 2018-02-01  6:41 Dave Chinner
  2018-02-01  6:41 ` [PATCH 1/7] xfs: factor out AG header initialisation from growfs core Dave Chinner
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Dave Chinner @ 2018-02-01  6:41 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

This is a series I posted months ago with the first thinspace
filesystem support. There were no comments on any of these patches
because all the heat and light got focussed on the growfs API.
I'm posting this separately to avoid that problem again....

Anyway, the core of this change is to make the growfs code much
simpler to extend. Most of the code that does structure
initialisation is cookie-cutter code and it's whacked into one great
big function. This patch set splits it up into separate functions
and uses common helper functions where possible. The different
structures and their initialisation definitions are now held in a
table, so when we add new structures or modify existing structures
it's a simple and isolated change.

The reworked initialisation code is suitable for moving to libxfs
and converting mkfs.xfs to use it for the initial formatting of
the filesystem. This will take more work to achieve, so this
patch set stops short of moving the code to libxfs.

The other changes to the growfs code in this patchset also isolate
separate parts of the growfs functionality, such as updating the
secondary superblocks and changing imaxpct. This makes adding
thinspace functionality to growfs much easier.

Finally, there are optimisations to make a large AG count growfs
much faster. Instead of initialising and writing headers one at a
time synchronously, they are added to a delwri buffer list and
written in bulk and asynchronously. This means AG headers get merged
by the block layer and it can reduce the IO wait time by an order of
magnitude or more.

There are also mods to the secondary superblock update algorithm
which make it more resilient in the face of writeback failures. We
now use a two-pass update - the main growfs loop initialises
secondary superblocks with sb_inprogress = 1 to indicate they are
not in a valid state before we make any modifications, then after
the transactional grow we do a second pass to set sb_inprogress = 0
and mark them valid.

This means that if we fail to write any secondary superblock, repair
is not going to get confused by partial grow state. If we crash
during the initial write, nothing has changed in the primary
superblock. If we crash after the primary sb grow, then we'll know
exactly what secondary superblocks did not get updated because
they'll be the ones with sb_inprogress = 1 in them. Hence the
recovery process becomes much easier as the parts of the fs that
need updating are obvious....

Cheers,

Dave.



* [PATCH 1/7] xfs: factor out AG header initialisation from growfs core
  2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
@ 2018-02-01  6:41 ` Dave Chinner
  2018-02-08 18:53   ` Brian Foster
  2018-02-01  6:41 ` [PATCH 2/7] xfs: convert growfs AG header init to use buffer lists Dave Chinner
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-01  6:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The initialisation of new AG headers is mostly common between the
userspace mkfs code and the kernel growfs code, so start factoring
it out so we can move it to libxfs and use it in both places.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c | 637 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 331 insertions(+), 306 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8b4545623e25..cd5196bf8756 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -71,20 +71,344 @@ xfs_growfs_get_hdr_buf(
 	return bp;
 }
 
+/*
+ * Write new AG headers to disk. Non-transactional, but written
+ * synchronously so they are completed prior to the growfs transaction
+ * being logged.
+ */
+static int
+xfs_grow_ag_headers(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		agsize,
+	xfs_rfsblock_t		*nfree)
+{
+	struct xfs_agf		*agf;
+	struct xfs_agi		*agi;
+	struct xfs_agfl		*agfl;
+	__be32			*agfl_bno;
+	xfs_alloc_rec_t		*arec;
+	struct xfs_buf		*bp;
+	int			bucket;
+	xfs_extlen_t		tmpsize;
+	int			error = 0;
+
+	/*
+	 * AG freespace header block
+	 */
+	bp = xfs_growfs_get_hdr_buf(mp,
+			XFS_AG_DADDR(mp, agno, XFS_AGF_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0,
+			&xfs_agf_buf_ops);
+	if (!bp) {
+		error = -ENOMEM;
+		goto out_error;
+	}
+
+	agf = XFS_BUF_TO_AGF(bp);
+	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
+	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
+	agf->agf_seqno = cpu_to_be32(agno);
+	agf->agf_length = cpu_to_be32(agsize);
+	agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(XFS_BNO_BLOCK(mp));
+	agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(XFS_CNT_BLOCK(mp));
+	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
+	agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		agf->agf_roots[XFS_BTNUM_RMAPi] =
+					cpu_to_be32(XFS_RMAP_BLOCK(mp));
+		agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
+		agf->agf_rmap_blocks = cpu_to_be32(1);
+	}
+
+	agf->agf_flfirst = cpu_to_be32(1);
+	agf->agf_fllast = 0;
+	agf->agf_flcount = 0;
+	tmpsize = agsize - mp->m_ag_prealloc_blocks;
+	agf->agf_freeblks = cpu_to_be32(tmpsize);
+	agf->agf_longest = cpu_to_be32(tmpsize);
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		agf->agf_refcount_root = cpu_to_be32(
+				xfs_refc_block(mp));
+		agf->agf_refcount_level = cpu_to_be32(1);
+		agf->agf_refcount_blocks = cpu_to_be32(1);
+	}
+
+	error = xfs_bwrite(bp);
+	xfs_buf_relse(bp);
+	if (error)
+		goto out_error;
+
+	/*
+	 * AG freelist header block
+	 */
+	bp = xfs_growfs_get_hdr_buf(mp,
+			XFS_AG_DADDR(mp, agno, XFS_AGFL_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0,
+			&xfs_agfl_buf_ops);
+	if (!bp) {
+		error = -ENOMEM;
+		goto out_error;
+	}
+
+	agfl = XFS_BUF_TO_AGFL(bp);
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
+		agfl->agfl_seqno = cpu_to_be32(agno);
+		uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
+	}
+
+	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, bp);
+	for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
+		agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
+
+	error = xfs_bwrite(bp);
+	xfs_buf_relse(bp);
+	if (error)
+		goto out_error;
+
+	/*
+	 * AG inode header block
+	 */
+	bp = xfs_growfs_get_hdr_buf(mp,
+			XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
+			XFS_FSS_TO_BB(mp, 1), 0,
+			&xfs_agi_buf_ops);
+	if (!bp) {
+		error = -ENOMEM;
+		goto out_error;
+	}
+
+	agi = XFS_BUF_TO_AGI(bp);
+	agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
+	agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
+	agi->agi_seqno = cpu_to_be32(agno);
+	agi->agi_length = cpu_to_be32(agsize);
+	agi->agi_count = 0;
+	agi->agi_root = cpu_to_be32(XFS_IBT_BLOCK(mp));
+	agi->agi_level = cpu_to_be32(1);
+	agi->agi_freecount = 0;
+	agi->agi_newino = cpu_to_be32(NULLAGINO);
+	agi->agi_dirino = cpu_to_be32(NULLAGINO);
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
+	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		agi->agi_free_root = cpu_to_be32(XFS_FIBT_BLOCK(mp));
+		agi->agi_free_level = cpu_to_be32(1);
+	}
+	for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++)
+		agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
+
+	error = xfs_bwrite(bp);
+	xfs_buf_relse(bp);
+	if (error)
+		goto out_error;
+
+	/*
+	 * BNO btree root block
+	 */
+	bp = xfs_growfs_get_hdr_buf(mp,
+			XFS_AGB_TO_DADDR(mp, agno, XFS_BNO_BLOCK(mp)),
+			BTOBB(mp->m_sb.sb_blocksize), 0,
+			&xfs_allocbt_buf_ops);
+
+	if (!bp) {
+		error = -ENOMEM;
+		goto out_error;
+	}
+
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, agno, 0);
+
+	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
+	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
+	arec->ar_blockcount = cpu_to_be32(
+		agsize - be32_to_cpu(arec->ar_startblock));
+
+	error = xfs_bwrite(bp);
+	xfs_buf_relse(bp);
+	if (error)
+		goto out_error;
+
+	/*
+	 * CNT btree root block
+	 */
+	bp = xfs_growfs_get_hdr_buf(mp,
+			XFS_AGB_TO_DADDR(mp, agno, XFS_CNT_BLOCK(mp)),
+			BTOBB(mp->m_sb.sb_blocksize), 0,
+			&xfs_allocbt_buf_ops);
+	if (!bp) {
+		error = -ENOMEM;
+		goto out_error;
+	}
+
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, agno, 0);
+
+	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
+	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
+	arec->ar_blockcount = cpu_to_be32(
+		agsize - be32_to_cpu(arec->ar_startblock));
+	*nfree += be32_to_cpu(arec->ar_blockcount);
+
+	error = xfs_bwrite(bp);
+	xfs_buf_relse(bp);
+	if (error)
+		goto out_error;
+
+	/* RMAP btree root block */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		struct xfs_rmap_rec	*rrec;
+		struct xfs_btree_block	*block;
+
+		bp = xfs_growfs_get_hdr_buf(mp,
+			XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
+			BTOBB(mp->m_sb.sb_blocksize), 0,
+			&xfs_rmapbt_buf_ops);
+		if (!bp) {
+			error = -ENOMEM;
+			goto out_error;
+		}
+
+		xfs_btree_init_block(mp, bp, XFS_BTNUM_RMAP, 0, 0,
+					agno, 0);
+		block = XFS_BUF_TO_BLOCK(bp);
+
+
+		/*
+		 * mark the AG header regions as static metadata The BNO
+		 * btree block is the first block after the headers, so
+		 * it's location defines the size of region the static
+		 * metadata consumes.
+		 *
+		 * Note: unlike mkfs, we never have to account for log
+		 * space when growing the data regions
+		 */
+		rrec = XFS_RMAP_REC_ADDR(block, 1);
+		rrec->rm_startblock = 0;
+		rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
+		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
+		rrec->rm_offset = 0;
+		be16_add_cpu(&block->bb_numrecs, 1);
+
+		/* account freespace btree root blocks */
+		rrec = XFS_RMAP_REC_ADDR(block, 2);
+		rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
+		rrec->rm_blockcount = cpu_to_be32(2);
+		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+		rrec->rm_offset = 0;
+		be16_add_cpu(&block->bb_numrecs, 1);
+
+		/* account inode btree root blocks */
+		rrec = XFS_RMAP_REC_ADDR(block, 3);
+		rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
+		rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
+						XFS_IBT_BLOCK(mp));
+		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
+		rrec->rm_offset = 0;
+		be16_add_cpu(&block->bb_numrecs, 1);
+
+		/* account for rmap btree root */
+		rrec = XFS_RMAP_REC_ADDR(block, 4);
+		rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
+		rrec->rm_blockcount = cpu_to_be32(1);
+		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+		rrec->rm_offset = 0;
+		be16_add_cpu(&block->bb_numrecs, 1);
+
+		/* account for refc btree root */
+		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+			rrec = XFS_RMAP_REC_ADDR(block, 5);
+			rrec->rm_startblock = cpu_to_be32(xfs_refc_block(mp));
+			rrec->rm_blockcount = cpu_to_be32(1);
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
+			rrec->rm_offset = 0;
+			be16_add_cpu(&block->bb_numrecs, 1);
+		}
+
+		error = xfs_bwrite(bp);
+		xfs_buf_relse(bp);
+		if (error)
+			goto out_error;
+	}
+
+	/*
+	 * INO btree root block
+	 */
+	bp = xfs_growfs_get_hdr_buf(mp,
+			XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
+			BTOBB(mp->m_sb.sb_blocksize), 0,
+			&xfs_inobt_buf_ops);
+	if (!bp) {
+		error = -ENOMEM;
+		goto out_error;
+	}
+
+	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
+
+	error = xfs_bwrite(bp);
+	xfs_buf_relse(bp);
+	if (error)
+		goto out_error;
+
+	/*
+	 * FINO btree root block
+	 */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		bp = xfs_growfs_get_hdr_buf(mp,
+			XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
+			BTOBB(mp->m_sb.sb_blocksize), 0,
+			&xfs_inobt_buf_ops);
+		if (!bp) {
+			error = -ENOMEM;
+			goto out_error;
+		}
+
+		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO,
+					     0, 0, agno, 0);
+
+		error = xfs_bwrite(bp);
+		xfs_buf_relse(bp);
+		if (error)
+			goto out_error;
+	}
+
+	/*
+	 * refcount btree root block
+	 */
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		bp = xfs_growfs_get_hdr_buf(mp,
+			XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
+			BTOBB(mp->m_sb.sb_blocksize), 0,
+			&xfs_refcountbt_buf_ops);
+		if (!bp) {
+			error = -ENOMEM;
+			goto out_error;
+		}
+
+		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC,
+				     0, 0, agno, 0);
+
+		error = xfs_bwrite(bp);
+		xfs_buf_relse(bp);
+		if (error)
+			goto out_error;
+	}
+
+out_error:
+	return error;
+}
+
 static int
 xfs_growfs_data_private(
 	xfs_mount_t		*mp,		/* mount point for filesystem */
 	xfs_growfs_data_t	*in)		/* growfs data input struct */
 {
 	xfs_agf_t		*agf;
-	struct xfs_agfl		*agfl;
 	xfs_agi_t		*agi;
 	xfs_agnumber_t		agno;
 	xfs_extlen_t		agsize;
-	xfs_extlen_t		tmpsize;
-	xfs_alloc_rec_t		*arec;
 	xfs_buf_t		*bp;
-	int			bucket;
 	int			dpct;
 	int			error, saved_error = 0;
 	xfs_agnumber_t		nagcount;
@@ -141,318 +465,19 @@ xfs_growfs_data_private(
 	 */
 	nfree = 0;
 	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
-		__be32	*agfl_bno;
-
-		/*
-		 * AG freespace header block
-		 */
-		bp = xfs_growfs_get_hdr_buf(mp,
-				XFS_AG_DADDR(mp, agno, XFS_AGF_DADDR(mp)),
-				XFS_FSS_TO_BB(mp, 1), 0,
-				&xfs_agf_buf_ops);
-		if (!bp) {
-			error = -ENOMEM;
-			goto error0;
-		}
 
-		agf = XFS_BUF_TO_AGF(bp);
-		agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
-		agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
-		agf->agf_seqno = cpu_to_be32(agno);
 		if (agno == nagcount - 1)
-			agsize =
-				nb -
+			agsize = nb -
 				(agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
 		else
 			agsize = mp->m_sb.sb_agblocks;
-		agf->agf_length = cpu_to_be32(agsize);
-		agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(XFS_BNO_BLOCK(mp));
-		agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(XFS_CNT_BLOCK(mp));
-		agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
-		agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
-		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
-			agf->agf_roots[XFS_BTNUM_RMAPi] =
-						cpu_to_be32(XFS_RMAP_BLOCK(mp));
-			agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
-			agf->agf_rmap_blocks = cpu_to_be32(1);
-		}
-
-		agf->agf_flfirst = cpu_to_be32(1);
-		agf->agf_fllast = 0;
-		agf->agf_flcount = 0;
-		tmpsize = agsize - mp->m_ag_prealloc_blocks;
-		agf->agf_freeblks = cpu_to_be32(tmpsize);
-		agf->agf_longest = cpu_to_be32(tmpsize);
-		if (xfs_sb_version_hascrc(&mp->m_sb))
-			uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
-		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
-			agf->agf_refcount_root = cpu_to_be32(
-					xfs_refc_block(mp));
-			agf->agf_refcount_level = cpu_to_be32(1);
-			agf->agf_refcount_blocks = cpu_to_be32(1);
-		}
-
-		error = xfs_bwrite(bp);
-		xfs_buf_relse(bp);
-		if (error)
-			goto error0;
-
-		/*
-		 * AG freelist header block
-		 */
-		bp = xfs_growfs_get_hdr_buf(mp,
-				XFS_AG_DADDR(mp, agno, XFS_AGFL_DADDR(mp)),
-				XFS_FSS_TO_BB(mp, 1), 0,
-				&xfs_agfl_buf_ops);
-		if (!bp) {
-			error = -ENOMEM;
-			goto error0;
-		}
-
-		agfl = XFS_BUF_TO_AGFL(bp);
-		if (xfs_sb_version_hascrc(&mp->m_sb)) {
-			agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
-			agfl->agfl_seqno = cpu_to_be32(agno);
-			uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
-		}
-
-		agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, bp);
-		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
-			agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
 
-		error = xfs_bwrite(bp);
-		xfs_buf_relse(bp);
+		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree);
 		if (error)
 			goto error0;
-
-		/*
-		 * AG inode header block
-		 */
-		bp = xfs_growfs_get_hdr_buf(mp,
-				XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
-				XFS_FSS_TO_BB(mp, 1), 0,
-				&xfs_agi_buf_ops);
-		if (!bp) {
-			error = -ENOMEM;
-			goto error0;
-		}
-
-		agi = XFS_BUF_TO_AGI(bp);
-		agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
-		agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
-		agi->agi_seqno = cpu_to_be32(agno);
-		agi->agi_length = cpu_to_be32(agsize);
-		agi->agi_count = 0;
-		agi->agi_root = cpu_to_be32(XFS_IBT_BLOCK(mp));
-		agi->agi_level = cpu_to_be32(1);
-		agi->agi_freecount = 0;
-		agi->agi_newino = cpu_to_be32(NULLAGINO);
-		agi->agi_dirino = cpu_to_be32(NULLAGINO);
-		if (xfs_sb_version_hascrc(&mp->m_sb))
-			uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
-		if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
-			agi->agi_free_root = cpu_to_be32(XFS_FIBT_BLOCK(mp));
-			agi->agi_free_level = cpu_to_be32(1);
-		}
-		for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++)
-			agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
-
-		error = xfs_bwrite(bp);
-		xfs_buf_relse(bp);
-		if (error)
-			goto error0;
-
-		/*
-		 * BNO btree root block
-		 */
-		bp = xfs_growfs_get_hdr_buf(mp,
-				XFS_AGB_TO_DADDR(mp, agno, XFS_BNO_BLOCK(mp)),
-				BTOBB(mp->m_sb.sb_blocksize), 0,
-				&xfs_allocbt_buf_ops);
-
-		if (!bp) {
-			error = -ENOMEM;
-			goto error0;
-		}
-
-		xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, agno, 0);
-
-		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
-		arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
-		arec->ar_blockcount = cpu_to_be32(
-			agsize - be32_to_cpu(arec->ar_startblock));
-
-		error = xfs_bwrite(bp);
-		xfs_buf_relse(bp);
-		if (error)
-			goto error0;
-
-		/*
-		 * CNT btree root block
-		 */
-		bp = xfs_growfs_get_hdr_buf(mp,
-				XFS_AGB_TO_DADDR(mp, agno, XFS_CNT_BLOCK(mp)),
-				BTOBB(mp->m_sb.sb_blocksize), 0,
-				&xfs_allocbt_buf_ops);
-		if (!bp) {
-			error = -ENOMEM;
-			goto error0;
-		}
-
-		xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, agno, 0);
-
-		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
-		arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
-		arec->ar_blockcount = cpu_to_be32(
-			agsize - be32_to_cpu(arec->ar_startblock));
-		nfree += be32_to_cpu(arec->ar_blockcount);
-
-		error = xfs_bwrite(bp);
-		xfs_buf_relse(bp);
-		if (error)
-			goto error0;
-
-		/* RMAP btree root block */
-		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
-			struct xfs_rmap_rec	*rrec;
-			struct xfs_btree_block	*block;
-
-			bp = xfs_growfs_get_hdr_buf(mp,
-				XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
-				BTOBB(mp->m_sb.sb_blocksize), 0,
-				&xfs_rmapbt_buf_ops);
-			if (!bp) {
-				error = -ENOMEM;
-				goto error0;
-			}
-
-			xfs_btree_init_block(mp, bp, XFS_BTNUM_RMAP, 0, 0,
-						agno, 0);
-			block = XFS_BUF_TO_BLOCK(bp);
-
-
-			/*
-			 * mark the AG header regions as static metadata The BNO
-			 * btree block is the first block after the headers, so
-			 * it's location defines the size of region the static
-			 * metadata consumes.
-			 *
-			 * Note: unlike mkfs, we never have to account for log
-			 * space when growing the data regions
-			 */
-			rrec = XFS_RMAP_REC_ADDR(block, 1);
-			rrec->rm_startblock = 0;
-			rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
-			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
-			rrec->rm_offset = 0;
-			be16_add_cpu(&block->bb_numrecs, 1);
-
-			/* account freespace btree root blocks */
-			rrec = XFS_RMAP_REC_ADDR(block, 2);
-			rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
-			rrec->rm_blockcount = cpu_to_be32(2);
-			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
-			rrec->rm_offset = 0;
-			be16_add_cpu(&block->bb_numrecs, 1);
-
-			/* account inode btree root blocks */
-			rrec = XFS_RMAP_REC_ADDR(block, 3);
-			rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
-			rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
-							XFS_IBT_BLOCK(mp));
-			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
-			rrec->rm_offset = 0;
-			be16_add_cpu(&block->bb_numrecs, 1);
-
-			/* account for rmap btree root */
-			rrec = XFS_RMAP_REC_ADDR(block, 4);
-			rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
-			rrec->rm_blockcount = cpu_to_be32(1);
-			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
-			rrec->rm_offset = 0;
-			be16_add_cpu(&block->bb_numrecs, 1);
-
-			/* account for refc btree root */
-			if (xfs_sb_version_hasreflink(&mp->m_sb)) {
-				rrec = XFS_RMAP_REC_ADDR(block, 5);
-				rrec->rm_startblock = cpu_to_be32(
-						xfs_refc_block(mp));
-				rrec->rm_blockcount = cpu_to_be32(1);
-				rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
-				rrec->rm_offset = 0;
-				be16_add_cpu(&block->bb_numrecs, 1);
-			}
-
-			error = xfs_bwrite(bp);
-			xfs_buf_relse(bp);
-			if (error)
-				goto error0;
-		}
-
-		/*
-		 * INO btree root block
-		 */
-		bp = xfs_growfs_get_hdr_buf(mp,
-				XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
-				BTOBB(mp->m_sb.sb_blocksize), 0,
-				&xfs_inobt_buf_ops);
-		if (!bp) {
-			error = -ENOMEM;
-			goto error0;
-		}
-
-		xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
-
-		error = xfs_bwrite(bp);
-		xfs_buf_relse(bp);
-		if (error)
-			goto error0;
-
-		/*
-		 * FINO btree root block
-		 */
-		if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
-			bp = xfs_growfs_get_hdr_buf(mp,
-				XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
-				BTOBB(mp->m_sb.sb_blocksize), 0,
-				&xfs_inobt_buf_ops);
-			if (!bp) {
-				error = -ENOMEM;
-				goto error0;
-			}
-
-			xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO,
-						     0, 0, agno, 0);
-
-			error = xfs_bwrite(bp);
-			xfs_buf_relse(bp);
-			if (error)
-				goto error0;
-		}
-
-		/*
-		 * refcount btree root block
-		 */
-		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
-			bp = xfs_growfs_get_hdr_buf(mp,
-				XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
-				BTOBB(mp->m_sb.sb_blocksize), 0,
-				&xfs_refcountbt_buf_ops);
-			if (!bp) {
-				error = -ENOMEM;
-				goto error0;
-			}
-
-			xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC,
-					     0, 0, agno, 0);
-
-			error = xfs_bwrite(bp);
-			xfs_buf_relse(bp);
-			if (error)
-				goto error0;
-		}
 	}
 	xfs_trans_agblocks_delta(tp, nfree);
+
 	/*
 	 * There are new blocks in the old last a.g.
 	 */
-- 
2.15.1



* [PATCH 2/7] xfs: convert growfs AG header init to use buffer lists
  2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
  2018-02-01  6:41 ` [PATCH 1/7] xfs: factor out AG header initialisation from growfs core Dave Chinner
@ 2018-02-01  6:41 ` Dave Chinner
  2018-02-08 18:53   ` Brian Foster
  2018-02-01  6:41 ` [PATCH 3/7] xfs: factor ag btree root block initialisation Dave Chinner
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-01  6:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We currently write all new AG headers synchronously, which can be
slow for large grow operations. All we really need to do is ensure
all the headers are on disk before we run the growfs transaction, so
convert this to a buffer list and a delayed write operation. We
block waiting for the delayed write buffer submission to complete,
so this will fulfill the requirement to have all the buffers written
correctly before proceeding.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c | 74 ++++++++++++++++++++++++------------------------------
 1 file changed, 33 insertions(+), 41 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index cd5196bf8756..d9e08d8cf9ac 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -81,7 +81,8 @@ xfs_grow_ag_headers(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
 	xfs_extlen_t		agsize,
-	xfs_rfsblock_t		*nfree)
+	xfs_rfsblock_t		*nfree,
+	struct list_head	*buffer_list)
 {
 	struct xfs_agf		*agf;
 	struct xfs_agi		*agi;
@@ -135,11 +136,8 @@ xfs_grow_ag_headers(
 		agf->agf_refcount_level = cpu_to_be32(1);
 		agf->agf_refcount_blocks = cpu_to_be32(1);
 	}
-
-	error = xfs_bwrite(bp);
+	xfs_buf_delwri_queue(bp, buffer_list);
 	xfs_buf_relse(bp);
-	if (error)
-		goto out_error;
 
 	/*
 	 * AG freelist header block
@@ -164,10 +162,8 @@ xfs_grow_ag_headers(
 	for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
 		agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
 
-	error = xfs_bwrite(bp);
+	xfs_buf_delwri_queue(bp, buffer_list);
 	xfs_buf_relse(bp);
-	if (error)
-		goto out_error;
 
 	/*
 	 * AG inode header block
@@ -201,10 +197,8 @@ xfs_grow_ag_headers(
 	for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++)
 		agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
 
-	error = xfs_bwrite(bp);
+	xfs_buf_delwri_queue(bp, buffer_list);
 	xfs_buf_relse(bp);
-	if (error)
-		goto out_error;
 
 	/*
 	 * BNO btree root block
@@ -226,10 +220,8 @@ xfs_grow_ag_headers(
 	arec->ar_blockcount = cpu_to_be32(
 		agsize - be32_to_cpu(arec->ar_startblock));
 
-	error = xfs_bwrite(bp);
+	xfs_buf_delwri_queue(bp, buffer_list);
 	xfs_buf_relse(bp);
-	if (error)
-		goto out_error;
 
 	/*
 	 * CNT btree root block
@@ -251,10 +243,8 @@ xfs_grow_ag_headers(
 		agsize - be32_to_cpu(arec->ar_startblock));
 	*nfree += be32_to_cpu(arec->ar_blockcount);
 
-	error = xfs_bwrite(bp);
+	xfs_buf_delwri_queue(bp, buffer_list);
 	xfs_buf_relse(bp);
-	if (error)
-		goto out_error;
 
 	/* RMAP btree root block */
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
@@ -326,10 +316,8 @@ xfs_grow_ag_headers(
 			be16_add_cpu(&block->bb_numrecs, 1);
 		}
 
-		error = xfs_bwrite(bp);
+		xfs_buf_delwri_queue(bp, buffer_list);
 		xfs_buf_relse(bp);
-		if (error)
-			goto out_error;
 	}
 
 	/*
@@ -345,11 +333,8 @@ xfs_grow_ag_headers(
 	}
 
 	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
-
-	error = xfs_bwrite(bp);
+	xfs_buf_delwri_queue(bp, buffer_list);
 	xfs_buf_relse(bp);
-	if (error)
-		goto out_error;
 
 	/*
 	 * FINO btree root block
@@ -364,13 +349,9 @@ xfs_grow_ag_headers(
 			goto out_error;
 		}
 
-		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO,
-					     0, 0, agno, 0);
-
-		error = xfs_bwrite(bp);
+		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO, 0, 0, agno, 0);
+		xfs_buf_delwri_queue(bp, buffer_list);
 		xfs_buf_relse(bp);
-		if (error)
-			goto out_error;
 	}
 
 	/*
@@ -386,13 +367,9 @@ xfs_grow_ag_headers(
 			goto out_error;
 		}
 
-		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC,
-				     0, 0, agno, 0);
-
-		error = xfs_bwrite(bp);
+		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC, 0, 0, agno, 0);
+		xfs_buf_delwri_queue(bp, buffer_list);
 		xfs_buf_relse(bp);
-		if (error)
-			goto out_error;
 	}
 
 out_error:
@@ -419,6 +396,7 @@ xfs_growfs_data_private(
 	xfs_agnumber_t		oagcount;
 	int			pct;
 	xfs_trans_t		*tp;
+	LIST_HEAD		(buffer_list);
 
 	nb = in->newblocks;
 	pct = in->imaxpct;
@@ -459,9 +437,16 @@ xfs_growfs_data_private(
 		return error;
 
 	/*
-	 * Write new AG headers to disk. Non-transactional, but written
-	 * synchronously so they are completed prior to the growfs transaction
-	 * being logged.
+	 * Write new AG headers to disk. Non-transactional, but need to be
+	 * written and completed prior to the growfs transaction being logged.
+	 * To do this, we use a delayed write buffer list and wait for
+	 * submission and IO completion of the list as a whole. This allows the
+	 * IO subsystem to merge all the AG headers in a single AG into a single
+	 * IO and hide most of the latency of the IO from us.
+	 *
+	 * This also means that if we get an error whilst building the buffer
+	 * list to write, we can cancel the entire list without having written
+	 * anything.
 	 */
 	nfree = 0;
 	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
@@ -472,10 +457,17 @@ xfs_growfs_data_private(
 		else
 			agsize = mp->m_sb.sb_agblocks;
 
-		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree);
-		if (error)
+		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree,
+					    &buffer_list);
+		if (error) {
+			xfs_buf_delwri_cancel(&buffer_list);
 			goto error0;
+		}
 	}
+	error = xfs_buf_delwri_submit(&buffer_list);
+	if (error)
+		goto error0;
+
 	xfs_trans_agblocks_delta(tp, nfree);
 
 	/*
-- 
2.15.1



* [PATCH 3/7] xfs: factor ag btree root block initialisation
  2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
  2018-02-01  6:41 ` [PATCH 1/7] xfs: factor out AG header initialisation from growfs core Dave Chinner
  2018-02-01  6:41 ` [PATCH 2/7] xfs: convert growfs AG header init to use buffer lists Dave Chinner
@ 2018-02-01  6:41 ` Dave Chinner
  2018-02-08 18:54   ` Brian Foster
  2018-02-01  6:41 ` [PATCH 4/7] xfs: turn ag header initialisation into a table driven operation Dave Chinner
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-01  6:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Cookie cutter code, easily factored.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c | 493 +++++++++++++++++++++++++++++------------------------
 1 file changed, 271 insertions(+), 222 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index d9e08d8cf9ac..44eac79e0b49 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -71,46 +71,146 @@ xfs_growfs_get_hdr_buf(
 	return bp;
 }
 
+struct aghdr_init_data {
+	/* per ag data */
+	xfs_agblock_t		agno;
+	xfs_extlen_t		agsize;
+	struct list_head	buffer_list;
+	xfs_rfsblock_t		nfree;
+
+	/* per header data */
+	xfs_daddr_t		daddr;
+	size_t			numblks;
+	xfs_btnum_t		type;
+	int			numrecs;
+};
+
 /*
- * Write new AG headers to disk. Non-transactional, but written
- * synchronously so they are completed prior to the growfs transaction
- * being logged.
+ * Generic btree root block init function
  */
-static int
-xfs_grow_ag_headers(
+static void
+xfs_btroot_init(
 	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
-	xfs_extlen_t		agsize,
-	xfs_rfsblock_t		*nfree,
-	struct list_head	*buffer_list)
+	struct xfs_buf		*bp,
+	struct aghdr_init_data	*id)
+{
+	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
+}
+
+/*
+ * Alloc btree root block init functions
+ */
+static void
+xfs_bnoroot_init(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	struct aghdr_init_data	*id)
 {
-	struct xfs_agf		*agf;
-	struct xfs_agi		*agi;
-	struct xfs_agfl		*agfl;
-	__be32			*agfl_bno;
 	xfs_alloc_rec_t		*arec;
-	struct xfs_buf		*bp;
-	int			bucket;
-	xfs_extlen_t		tmpsize;
-	int			error = 0;
+
+	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
+	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
+	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
+	arec->ar_blockcount = cpu_to_be32(id->agsize -
+					  be32_to_cpu(arec->ar_startblock));
+}
+
+static void
+xfs_cntroot_init(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	struct aghdr_init_data	*id)
+{
+	xfs_alloc_rec_t		*arec;
+
+	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
+	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
+	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
+	arec->ar_blockcount = cpu_to_be32(id->agsize -
+					  be32_to_cpu(arec->ar_startblock));
+	id->nfree += be32_to_cpu(arec->ar_blockcount);
+}
+
+/*
+ * Reverse map root block init
+ */
+static void
+xfs_rmaproot_init(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	struct aghdr_init_data	*id)
+{
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	struct xfs_rmap_rec	*rrec;
+
+	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
 
 	/*
-	 * AG freespace header block
+	 * mark the AG header regions as static metadata The BNO
+	 * btree block is the first block after the headers, so
+	 * it's location defines the size of region the static
+	 * metadata consumes.
+	 *
+	 * Note: unlike mkfs, we never have to account for log
+	 * space when growing the data regions
 	 */
-	bp = xfs_growfs_get_hdr_buf(mp,
-			XFS_AG_DADDR(mp, agno, XFS_AGF_DADDR(mp)),
-			XFS_FSS_TO_BB(mp, 1), 0,
-			&xfs_agf_buf_ops);
-	if (!bp) {
-		error = -ENOMEM;
-		goto out_error;
+	rrec = XFS_RMAP_REC_ADDR(block, 1);
+	rrec->rm_startblock = 0;
+	rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
+	rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
+	rrec->rm_offset = 0;
+	be16_add_cpu(&block->bb_numrecs, 1);
+
+	/* account freespace btree root blocks */
+	rrec = XFS_RMAP_REC_ADDR(block, 2);
+	rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
+	rrec->rm_blockcount = cpu_to_be32(2);
+	rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+	rrec->rm_offset = 0;
+	be16_add_cpu(&block->bb_numrecs, 1);
+
+	/* account inode btree root blocks */
+	rrec = XFS_RMAP_REC_ADDR(block, 3);
+	rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
+	rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
+					  XFS_IBT_BLOCK(mp));
+	rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
+	rrec->rm_offset = 0;
+	be16_add_cpu(&block->bb_numrecs, 1);
+
+	/* account for rmap btree root */
+	rrec = XFS_RMAP_REC_ADDR(block, 4);
+	rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
+	rrec->rm_blockcount = cpu_to_be32(1);
+	rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+	rrec->rm_offset = 0;
+	be16_add_cpu(&block->bb_numrecs, 1);
+
+	/* account for refc btree root */
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		rrec = XFS_RMAP_REC_ADDR(block, 5);
+		rrec->rm_startblock = cpu_to_be32(xfs_refc_block(mp));
+		rrec->rm_blockcount = cpu_to_be32(1);
+		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
+		rrec->rm_offset = 0;
+		be16_add_cpu(&block->bb_numrecs, 1);
 	}
+}
+
+
+static void
+xfs_agfblock_init(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	struct aghdr_init_data	*id)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(bp);
+	xfs_extlen_t		tmpsize;
 
-	agf = XFS_BUF_TO_AGF(bp);
 	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
 	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
-	agf->agf_seqno = cpu_to_be32(agno);
-	agf->agf_length = cpu_to_be32(agsize);
+	agf->agf_seqno = cpu_to_be32(id->agno);
+	agf->agf_length = cpu_to_be32(id->agsize);
 	agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(XFS_BNO_BLOCK(mp));
 	agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(XFS_CNT_BLOCK(mp));
 	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
@@ -125,7 +225,7 @@ xfs_grow_ag_headers(
 	agf->agf_flfirst = cpu_to_be32(1);
 	agf->agf_fllast = 0;
 	agf->agf_flcount = 0;
-	tmpsize = agsize - mp->m_ag_prealloc_blocks;
+	tmpsize = id->agsize - mp->m_ag_prealloc_blocks;
 	agf->agf_freeblks = cpu_to_be32(tmpsize);
 	agf->agf_longest = cpu_to_be32(tmpsize);
 	if (xfs_sb_version_hascrc(&mp->m_sb))
@@ -136,52 +236,42 @@ xfs_grow_ag_headers(
 		agf->agf_refcount_level = cpu_to_be32(1);
 		agf->agf_refcount_blocks = cpu_to_be32(1);
 	}
-	xfs_buf_delwri_queue(bp, buffer_list);
-	xfs_buf_relse(bp);
+}
 
-	/*
-	 * AG freelist header block
-	 */
-	bp = xfs_growfs_get_hdr_buf(mp,
-			XFS_AG_DADDR(mp, agno, XFS_AGFL_DADDR(mp)),
-			XFS_FSS_TO_BB(mp, 1), 0,
-			&xfs_agfl_buf_ops);
-	if (!bp) {
-		error = -ENOMEM;
-		goto out_error;
-	}
+static void
+xfs_agflblock_init(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	struct aghdr_init_data	*id)
+{
+	struct xfs_agfl		*agfl = XFS_BUF_TO_AGFL(bp);
+	__be32			*agfl_bno;
+	int			bucket;
 
-	agfl = XFS_BUF_TO_AGFL(bp);
 	if (xfs_sb_version_hascrc(&mp->m_sb)) {
 		agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
-		agfl->agfl_seqno = cpu_to_be32(agno);
+		agfl->agfl_seqno = cpu_to_be32(id->agno);
 		uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
 	}
 
 	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, bp);
 	for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
 		agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
+}
 
-	xfs_buf_delwri_queue(bp, buffer_list);
-	xfs_buf_relse(bp);
-
-	/*
-	 * AG inode header block
-	 */
-	bp = xfs_growfs_get_hdr_buf(mp,
-			XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
-			XFS_FSS_TO_BB(mp, 1), 0,
-			&xfs_agi_buf_ops);
-	if (!bp) {
-		error = -ENOMEM;
-		goto out_error;
-	}
+static void
+xfs_agiblock_init(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	struct aghdr_init_data	*id)
+{
+	struct xfs_agi		*agi = XFS_BUF_TO_AGI(bp);
+	int			bucket;
 
-	agi = XFS_BUF_TO_AGI(bp);
 	agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
 	agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
-	agi->agi_seqno = cpu_to_be32(agno);
-	agi->agi_length = cpu_to_be32(agsize);
+	agi->agi_seqno = cpu_to_be32(id->agno);
+	agi->agi_length = cpu_to_be32(id->agsize);
 	agi->agi_count = 0;
 	agi->agi_root = cpu_to_be32(XFS_IBT_BLOCK(mp));
 	agi->agi_level = cpu_to_be32(1);
@@ -196,180 +286,139 @@ xfs_grow_ag_headers(
 	}
 	for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++)
 		agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
+}
 
-	xfs_buf_delwri_queue(bp, buffer_list);
-	xfs_buf_relse(bp);
-
-	/*
-	 * BNO btree root block
-	 */
-	bp = xfs_growfs_get_hdr_buf(mp,
-			XFS_AGB_TO_DADDR(mp, agno, XFS_BNO_BLOCK(mp)),
-			BTOBB(mp->m_sb.sb_blocksize), 0,
-			&xfs_allocbt_buf_ops);
+static int
+xfs_growfs_init_aghdr(
+	struct xfs_mount	*mp,
+	struct aghdr_init_data	*id,
+	void			(*work)(struct xfs_mount *, struct xfs_buf *,
+					struct aghdr_init_data *),
+	const struct xfs_buf_ops *ops)
 
-	if (!bp) {
-		error = -ENOMEM;
-		goto out_error;
-	}
+{
+	struct xfs_buf		*bp;
 
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, agno, 0);
+	bp = xfs_growfs_get_hdr_buf(mp, id->daddr, id->numblks, 0, ops);
+	if (!bp)
+		return -ENOMEM;
 
-	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
-	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
-	arec->ar_blockcount = cpu_to_be32(
-		agsize - be32_to_cpu(arec->ar_startblock));
+	(*work)(mp, bp, id);
 
-	xfs_buf_delwri_queue(bp, buffer_list);
+	xfs_buf_delwri_queue(bp, &id->buffer_list);
 	xfs_buf_relse(bp);
+	return 0;
+}
 
-	/*
-	 * CNT btree root block
-	 */
-	bp = xfs_growfs_get_hdr_buf(mp,
-			XFS_AGB_TO_DADDR(mp, agno, XFS_CNT_BLOCK(mp)),
-			BTOBB(mp->m_sb.sb_blocksize), 0,
-			&xfs_allocbt_buf_ops);
-	if (!bp) {
-		error = -ENOMEM;
-		goto out_error;
-	}
-
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, agno, 0);
+/*
+ * Write new AG headers to disk. Non-transactional, but written
+ * synchronously so they are completed prior to the growfs transaction
+ * being logged.
+ */
+static int
+xfs_grow_ag_headers(
+	struct xfs_mount	*mp,
+	struct aghdr_init_data	*id)
 
-	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
-	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
-	arec->ar_blockcount = cpu_to_be32(
-		agsize - be32_to_cpu(arec->ar_startblock));
-	*nfree += be32_to_cpu(arec->ar_blockcount);
+{
+	int			error = 0;
 
-	xfs_buf_delwri_queue(bp, buffer_list);
-	xfs_buf_relse(bp);
+	/* AG freespace header block */
+	id->daddr = XFS_AG_DADDR(mp, id->agno, XFS_AGF_DADDR(mp));
+	id->numblks = XFS_FSS_TO_BB(mp, 1);
+	error = xfs_growfs_init_aghdr(mp, id, xfs_agfblock_init,
+					&xfs_agf_buf_ops);
+	if (error)
+		goto out_error;
 
-	/* RMAP btree root block */
-	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
-		struct xfs_rmap_rec	*rrec;
-		struct xfs_btree_block	*block;
-
-		bp = xfs_growfs_get_hdr_buf(mp,
-			XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
-			BTOBB(mp->m_sb.sb_blocksize), 0,
-			&xfs_rmapbt_buf_ops);
-		if (!bp) {
-			error = -ENOMEM;
-			goto out_error;
-		}
+	/* AG freelist header block */
+	id->daddr = XFS_AG_DADDR(mp, id->agno, XFS_AGFL_DADDR(mp));
+	id->numblks = XFS_FSS_TO_BB(mp, 1);
+	error = xfs_growfs_init_aghdr(mp, id, xfs_agflblock_init,
+					&xfs_agfl_buf_ops);
+	if (error)
+		goto out_error;
 
-		xfs_btree_init_block(mp, bp, XFS_BTNUM_RMAP, 0, 0,
-					agno, 0);
-		block = XFS_BUF_TO_BLOCK(bp);
+	/* AG inode header block */
+	id->daddr = XFS_AG_DADDR(mp, id->agno, XFS_AGI_DADDR(mp));
+	id->numblks = XFS_FSS_TO_BB(mp, 1);
+	error = xfs_growfs_init_aghdr(mp, id, xfs_agiblock_init,
+					&xfs_agi_buf_ops);
+	if (error)
+		goto out_error;
 
 
-		/*
-		 * mark the AG header regions as static metadata The BNO
-		 * btree block is the first block after the headers, so
-		 * it's location defines the size of region the static
-		 * metadata consumes.
-		 *
-		 * Note: unlike mkfs, we never have to account for log
-		 * space when growing the data regions
-		 */
-		rrec = XFS_RMAP_REC_ADDR(block, 1);
-		rrec->rm_startblock = 0;
-		rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
-		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
-		rrec->rm_offset = 0;
-		be16_add_cpu(&block->bb_numrecs, 1);
-
-		/* account freespace btree root blocks */
-		rrec = XFS_RMAP_REC_ADDR(block, 2);
-		rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
-		rrec->rm_blockcount = cpu_to_be32(2);
-		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
-		rrec->rm_offset = 0;
-		be16_add_cpu(&block->bb_numrecs, 1);
+	/* BNO btree root block */
+	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp));
+	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
+	id->type = XFS_BTNUM_BNO;
+	id->numrecs = 1;
+	error = xfs_growfs_init_aghdr(mp, id, xfs_bnoroot_init,
+				   &xfs_allocbt_buf_ops);
+	if (error)
+		goto out_error;
 
-		/* account inode btree root blocks */
-		rrec = XFS_RMAP_REC_ADDR(block, 3);
-		rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
-		rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
-						XFS_IBT_BLOCK(mp));
-		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
-		rrec->rm_offset = 0;
-		be16_add_cpu(&block->bb_numrecs, 1);
 
-		/* account for rmap btree root */
-		rrec = XFS_RMAP_REC_ADDR(block, 4);
-		rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
-		rrec->rm_blockcount = cpu_to_be32(1);
-		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
-		rrec->rm_offset = 0;
-		be16_add_cpu(&block->bb_numrecs, 1);
+	/* CNT btree root block */
+	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp));
+	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
+	id->type = XFS_BTNUM_CNT;
+	id->numrecs = 1;
+	error = xfs_growfs_init_aghdr(mp, id, xfs_cntroot_init,
+				   &xfs_allocbt_buf_ops);
+	if (error)
+		goto out_error;
 
-		/* account for refc btree root */
-		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
-			rrec = XFS_RMAP_REC_ADDR(block, 5);
-			rrec->rm_startblock = cpu_to_be32(xfs_refc_block(mp));
-			rrec->rm_blockcount = cpu_to_be32(1);
-			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
-			rrec->rm_offset = 0;
-			be16_add_cpu(&block->bb_numrecs, 1);
-		}
+	/* RMAP btree root block */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp));
+		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
+		id->type = XFS_BTNUM_RMAP;
+		id->numrecs = 0;
+		error = xfs_growfs_init_aghdr(mp, id, xfs_rmaproot_init,
+					   &xfs_rmapbt_buf_ops);
+		if (error)
+			goto out_error;
 
-		xfs_buf_delwri_queue(bp, buffer_list);
-		xfs_buf_relse(bp);
 	}
 
-	/*
-	 * INO btree root block
-	 */
-	bp = xfs_growfs_get_hdr_buf(mp,
-			XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
-			BTOBB(mp->m_sb.sb_blocksize), 0,
-			&xfs_inobt_buf_ops);
-	if (!bp) {
-		error = -ENOMEM;
+	/* INO btree root block */
+	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp));
+	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
+	id->type = XFS_BTNUM_INO;
+	id->numrecs = 0;
+	error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
+				   &xfs_inobt_buf_ops);
+	if (error)
 		goto out_error;
-	}
 
-	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
-	xfs_buf_delwri_queue(bp, buffer_list);
-	xfs_buf_relse(bp);
 
 	/*
 	 * FINO btree root block
 	 */
 	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
-		bp = xfs_growfs_get_hdr_buf(mp,
-			XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
-			BTOBB(mp->m_sb.sb_blocksize), 0,
-			&xfs_inobt_buf_ops);
-		if (!bp) {
-			error = -ENOMEM;
+		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp));
+		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
+		id->type = XFS_BTNUM_FINO;
+		id->numrecs = 0;
+		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
+					   &xfs_inobt_buf_ops);
+		if (error)
 			goto out_error;
-		}
-
-		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO, 0, 0, agno, 0);
-		xfs_buf_delwri_queue(bp, buffer_list);
-		xfs_buf_relse(bp);
 	}
 
 	/*
 	 * refcount btree root block
 	 */
 	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
-		bp = xfs_growfs_get_hdr_buf(mp,
-			XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
-			BTOBB(mp->m_sb.sb_blocksize), 0,
-			&xfs_refcountbt_buf_ops);
-		if (!bp) {
-			error = -ENOMEM;
+		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp));
+		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
+		id->type = XFS_BTNUM_REFC;
+		id->numrecs = 0;
+		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
+					   &xfs_refcountbt_buf_ops);
+		if (error)
 			goto out_error;
-		}
-
-		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC, 0, 0, agno, 0);
-		xfs_buf_delwri_queue(bp, buffer_list);
-		xfs_buf_relse(bp);
 	}
 
 out_error:
@@ -384,7 +433,6 @@ xfs_growfs_data_private(
 	xfs_agf_t		*agf;
 	xfs_agi_t		*agi;
 	xfs_agnumber_t		agno;
-	xfs_extlen_t		agsize;
 	xfs_buf_t		*bp;
 	int			dpct;
 	int			error, saved_error = 0;
@@ -392,11 +440,11 @@ xfs_growfs_data_private(
 	xfs_agnumber_t		nagimax = 0;
 	xfs_rfsblock_t		nb, nb_mod;
 	xfs_rfsblock_t		new;
-	xfs_rfsblock_t		nfree;
 	xfs_agnumber_t		oagcount;
 	int			pct;
 	xfs_trans_t		*tp;
 	LIST_HEAD		(buffer_list);
+	struct aghdr_init_data	id = {};
 
 	nb = in->newblocks;
 	pct = in->imaxpct;
@@ -448,27 +496,28 @@ xfs_growfs_data_private(
 	 * list to write, we can cancel the entire list without having written
 	 * anything.
 	 */
-	nfree = 0;
-	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
-
-		if (agno == nagcount - 1)
-			agsize = nb -
-				(agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
+	INIT_LIST_HEAD(&id.buffer_list);
+	for (id.agno = nagcount - 1;
+	     id.agno >= oagcount;
+	     id.agno--, new -= id.agsize) {
+
+		if (id.agno == nagcount - 1)
+			id.agsize = nb -
+				(id.agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
 		else
-			agsize = mp->m_sb.sb_agblocks;
+			id.agsize = mp->m_sb.sb_agblocks;
 
-		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree,
-					    &buffer_list);
+		error = xfs_grow_ag_headers(mp, &id);
 		if (error) {
-			xfs_buf_delwri_cancel(&buffer_list);
+			xfs_buf_delwri_cancel(&id.buffer_list);
 			goto error0;
 		}
 	}
-	error = xfs_buf_delwri_submit(&buffer_list);
+	error = xfs_buf_delwri_submit(&id.buffer_list);
 	if (error)
 		goto error0;
 
-	xfs_trans_agblocks_delta(tp, nfree);
+	xfs_trans_agblocks_delta(tp, id.nfree);
 
 	/*
 	 * There are new blocks in the old last a.g.
@@ -479,7 +528,7 @@ xfs_growfs_data_private(
 		/*
 		 * Change the agi length.
 		 */
-		error = xfs_ialloc_read_agi(mp, tp, agno, &bp);
+		error = xfs_ialloc_read_agi(mp, tp, id.agno, &bp);
 		if (error) {
 			goto error0;
 		}
@@ -492,7 +541,7 @@ xfs_growfs_data_private(
 		/*
 		 * Change agf length.
 		 */
-		error = xfs_alloc_read_agf(mp, tp, agno, 0, &bp);
+		error = xfs_alloc_read_agf(mp, tp, id.agno, 0, &bp);
 		if (error) {
 			goto error0;
 		}
@@ -511,13 +560,13 @@ xfs_growfs_data_private(
 		 * this doesn't actually exist in the rmap btree.
 		 */
 		xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL);
-		error = xfs_rmap_free(tp, bp, agno,
+		error = xfs_rmap_free(tp, bp, id.agno,
 				be32_to_cpu(agf->agf_length) - new,
 				new, &oinfo);
 		if (error)
 			goto error0;
 		error = xfs_free_extent(tp,
-				XFS_AGB_TO_FSB(mp, agno,
+				XFS_AGB_TO_FSB(mp, id.agno,
 					be32_to_cpu(agf->agf_length) - new),
 				new, &oinfo, XFS_AG_RESV_NONE);
 		if (error)
@@ -534,8 +583,8 @@ xfs_growfs_data_private(
 	if (nb > mp->m_sb.sb_dblocks)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
 				 nb - mp->m_sb.sb_dblocks);
-	if (nfree)
-		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, nfree);
+	if (id.nfree)
+		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
 	if (dpct)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
 	xfs_trans_set_sync(tp);
@@ -562,7 +611,7 @@ xfs_growfs_data_private(
 	if (new) {
 		struct xfs_perag	*pag;
 
-		pag = xfs_perag_get(mp, agno);
+		pag = xfs_perag_get(mp, id.agno);
 		error = xfs_ag_resv_free(pag);
 		xfs_perag_put(pag);
 		if (error)
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/7] xfs: turn ag header initialisation into a table driven operation
  2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
                   ` (2 preceding siblings ...)
  2018-02-01  6:41 ` [PATCH 3/7] xfs: factor ag btree root block initialisation Dave Chinner
@ 2018-02-01  6:41 ` Dave Chinner
  2018-02-09 16:11   ` Brian Foster
  2018-02-01  6:42 ` [PATCH 5/7] xfs: make imaxpct changes in growfs separate Dave Chinner
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-01  6:41 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

There's still more cookie-cutter code in setting up each AG header.
Separate all the variables into a simple structure and iterate over a
table of header definitions to initialise everything.
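
The table-driven pattern the commit message describes can be sketched in
isolation. This is a minimal, hypothetical model (the names and struct
layouts are illustrative stand-ins, not the actual xfs_fsops.c types):
walk an array of per-header definitions, skip entries whose feature bit
isn't set, and hand shared state plus a callback to one common helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct aghdr_init_data: shared state
 * threaded through every per-header init callback. */
struct hdr_init_data {
	long daddr;		/* disk address of this header */
	char label[16];		/* records which header ran, for testing */
};

typedef void (*hdr_init_fn)(struct hdr_init_data *id);

static void agf_init(struct hdr_init_data *id) { strcpy(id->label, "agf"); }
static void agi_init(struct hdr_init_data *id) { strcpy(id->label, "agi"); }

/* One table entry per header: address, callback, and whether the
 * current feature set needs this header initialised at all. */
struct hdr_grow_data {
	long		daddr;
	hdr_init_fn	work;
	int		need_init;
};

/* Iterate the table; the loop body is identical for every header,
 * so adding a new on-disk structure only means adding a table row. */
static int grow_headers(const struct hdr_grow_data *tab, size_t n,
			struct hdr_init_data *id, char out[][16])
{
	size_t i, done = 0;

	for (i = 0; i < n; i++) {
		if (!tab[i].need_init)
			continue;	/* feature not enabled: skip */
		id->daddr = tab[i].daddr;
		tab[i].work(id);
		strcpy(out[done++], id->label);
	}
	return (int)done;
}
```

The payoff is the same as in the patch: per-header variation lives in
data, and the control flow is written exactly once.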

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c | 163 ++++++++++++++++++++++-------------------------------
 1 file changed, 66 insertions(+), 97 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 44eac79e0b49..94650b7d517e 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -288,12 +288,13 @@ xfs_agiblock_init(
 		agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
 }
 
+typedef void (*aghdr_init_work_f)(struct xfs_mount *mp, struct xfs_buf *bp,
+				  struct aghdr_init_data *id);
 static int
 xfs_growfs_init_aghdr(
 	struct xfs_mount	*mp,
 	struct aghdr_init_data	*id,
-	void			(*work)(struct xfs_mount *, struct xfs_buf *,
-					struct aghdr_init_data *),
+	aghdr_init_work_f	work,
 	const struct xfs_buf_ops *ops)
 
 {
@@ -310,6 +311,16 @@ xfs_growfs_init_aghdr(
 	return 0;
 }
 
+struct xfs_aghdr_grow_data {
+	xfs_daddr_t		daddr;
+	size_t			numblks;
+	const struct xfs_buf_ops *ops;
+	aghdr_init_work_f	work;
+	xfs_btnum_t		type;
+	int			numrecs;
+	bool			need_init;
+};
+
 /*
  * Write new AG headers to disk. Non-transactional, but written
  * synchronously so they are completed prior to the growfs transaction
@@ -321,107 +332,65 @@ xfs_grow_ag_headers(
 	struct aghdr_init_data	*id)
 
 {
+	struct xfs_aghdr_grow_data aghdr_data[] = {
+		/* AGF */
+		{ XFS_AG_DADDR(mp, id->agno, XFS_AGF_DADDR(mp)),
+		  XFS_FSS_TO_BB(mp, 1), &xfs_agf_buf_ops,
+		  &xfs_agfblock_init, 0, 0, true },
+		/* AGFL */
+		{ XFS_AG_DADDR(mp, id->agno, XFS_AGFL_DADDR(mp)),
+		  XFS_FSS_TO_BB(mp, 1), &xfs_agfl_buf_ops,
+		  &xfs_agflblock_init, 0, 0, true },
+		/* AGI */
+		{ XFS_AG_DADDR(mp, id->agno, XFS_AGI_DADDR(mp)),
+		  XFS_FSS_TO_BB(mp, 1), &xfs_agi_buf_ops,
+		  &xfs_agiblock_init, 0, 0, true },
+		/* BNO root block */
+		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp)),
+		  BTOBB(mp->m_sb.sb_blocksize), &xfs_allocbt_buf_ops,
+		  &xfs_bnoroot_init, XFS_BTNUM_BNO, 1, true },
+		/* CNT root block */
+		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp)),
+		  BTOBB(mp->m_sb.sb_blocksize), &xfs_allocbt_buf_ops,
+		  &xfs_cntroot_init, XFS_BTNUM_CNT, 1, true },
+		/* INO root block */
+		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp)),
+		  BTOBB(mp->m_sb.sb_blocksize), &xfs_inobt_buf_ops,
+		  &xfs_btroot_init, XFS_BTNUM_INO, 0, true },
+		/* FINO root block */
+		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp)),
+		  BTOBB(mp->m_sb.sb_blocksize), &xfs_inobt_buf_ops,
+		  &xfs_btroot_init, XFS_BTNUM_FINO, 0,
+		  xfs_sb_version_hasfinobt(&mp->m_sb) },
+		/* RMAP root block */
+		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp)),
+		  BTOBB(mp->m_sb.sb_blocksize), &xfs_rmapbt_buf_ops,
+		  &xfs_rmaproot_init, XFS_BTNUM_RMAP, 0,
+		  xfs_sb_version_hasrmapbt(&mp->m_sb) },
+		/* REFC root block */
+		{ XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp)),
+		  BTOBB(mp->m_sb.sb_blocksize), &xfs_refcountbt_buf_ops,
+		  &xfs_btroot_init, XFS_BTNUM_REFC, 0,
+		  xfs_sb_version_hasreflink(&mp->m_sb) },
+		/* NULL terminating block */
+		{ XFS_BUF_DADDR_NULL, 0, NULL, NULL, 0, 0, false },
+	};
+	struct  xfs_aghdr_grow_data *dp;
 	int			error = 0;
 
-	/* AG freespace header block */
-	id->daddr = XFS_AG_DADDR(mp, id->agno, XFS_AGF_DADDR(mp));
-	id->numblks = XFS_FSS_TO_BB(mp, 1);
-	error = xfs_growfs_init_aghdr(mp, id, xfs_agfblock_init,
-					&xfs_agf_buf_ops);
-	if (error)
-		goto out_error;
-
-	/* AG freelist header block */
-	id->daddr = XFS_AG_DADDR(mp, id->agno, XFS_AGFL_DADDR(mp));
-	id->numblks = XFS_FSS_TO_BB(mp, 1);
-	error = xfs_growfs_init_aghdr(mp, id, xfs_agflblock_init,
-					&xfs_agfl_buf_ops);
-	if (error)
-		goto out_error;
-
-	/* AG inode header block */
-	id->daddr = XFS_AG_DADDR(mp, id->agno, XFS_AGI_DADDR(mp));
-	id->numblks = XFS_FSS_TO_BB(mp, 1);
-	error = xfs_growfs_init_aghdr(mp, id, xfs_agiblock_init,
-					&xfs_agi_buf_ops);
-	if (error)
-		goto out_error;
-
-
-	/* BNO btree root block */
-	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp));
-	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
-	id->type = XFS_BTNUM_BNO;
-	id->numrecs = 1;
-	error = xfs_growfs_init_aghdr(mp, id, xfs_bnoroot_init,
-				   &xfs_allocbt_buf_ops);
-	if (error)
-		goto out_error;
-
-
-	/* CNT btree root block */
-	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp));
-	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
-	id->type = XFS_BTNUM_CNT;
-	id->numrecs = 1;
-	error = xfs_growfs_init_aghdr(mp, id, xfs_cntroot_init,
-				   &xfs_allocbt_buf_ops);
-	if (error)
-		goto out_error;
-
-	/* RMAP btree root block */
-	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
-		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp));
-		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
-		id->type = XFS_BTNUM_RMAP;
-		id->numrecs = 0;
-		error = xfs_growfs_init_aghdr(mp, id, xfs_rmaproot_init,
-					   &xfs_rmapbt_buf_ops);
-		if (error)
-			goto out_error;
-
-	}
-
-	/* INO btree root block */
-	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp));
-	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
-	id->type = XFS_BTNUM_INO;
-	id->numrecs = 0;
-	error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
-				   &xfs_inobt_buf_ops);
-	if (error)
-		goto out_error;
-
-
-	/*
-	 * FINO btree root block
-	 */
-	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
-		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp));
-		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
-		id->type = XFS_BTNUM_FINO;
-		id->numrecs = 0;
-		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
-					   &xfs_inobt_buf_ops);
-		if (error)
-			goto out_error;
-	}
+	for (dp = &aghdr_data[0]; dp->daddr != XFS_BUF_DADDR_NULL; dp++) {
+		if (!dp->need_init)
+			continue;
 
-	/*
-	 * refcount btree root block
-	 */
-	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
-		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp));
-		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
-		id->type = XFS_BTNUM_REFC;
-		id->numrecs = 0;
-		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
-					   &xfs_refcountbt_buf_ops);
+		id->daddr = dp->daddr;
+		id->numblks = dp->numblks;
+		id->numrecs = dp->numrecs;
+		id->type = dp->type;
+		error = xfs_growfs_init_aghdr(mp, id, dp->work, dp->ops);
 		if (error)
-			goto out_error;
+			break;
 	}
 
-out_error:
 	return error;
 }
 
-- 
2.15.1



* [PATCH 5/7] xfs: make imaxpct changes in growfs separate
  2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
                   ` (3 preceding siblings ...)
  2018-02-01  6:41 ` [PATCH 4/7] xfs: turn ag header initialisation into a table driven operation Dave Chinner
@ 2018-02-01  6:42 ` Dave Chinner
  2018-02-09 16:11   ` Brian Foster
  2018-02-01  6:42 ` [PATCH 6/7] xfs: separate secondary sb update in growfs Dave Chinner
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-01  6:42 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When growfs changes the imaxpct value of the filesystem, it runs
through all the "change size" growfs code, whether it needs to or
not. Separate out changing imaxpct into its own function and
transaction to simplify the rest of the growfs code.
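
The m_maxicount recalculation that this patch moves out of
xfs_growfs_data_private() is a straightforward derivation: a percentage
of the data blocks, converted to an inode count via the inodes-per-block
shift. A hedged sketch with illustrative names (not the kernel's
do_div()-based implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Derive the maximum inode count from the data block count, the
 * imaxpct percentage, and log2(inodes per block). An imaxpct of 0
 * means "no limit", which the mount code represents as 0. */
static uint64_t max_icount(uint64_t dblocks, uint32_t imax_pct,
			   uint8_t inopblog)
{
	uint64_t icount;

	if (!imax_pct)
		return 0;
	icount = dblocks * imax_pct;
	icount /= 100;			/* stands in for do_div(icount, 100) */
	return icount << inopblog;	/* blocks -> inodes */
}
```

With the calculation factored out like this, both the grow path and the
imaxpct-only path can update the cached limit without duplicating the
arithmetic.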

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c | 67 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 49 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 94650b7d517e..5c844e540320 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -403,25 +403,21 @@ xfs_growfs_data_private(
 	xfs_agi_t		*agi;
 	xfs_agnumber_t		agno;
 	xfs_buf_t		*bp;
-	int			dpct;
 	int			error, saved_error = 0;
 	xfs_agnumber_t		nagcount;
 	xfs_agnumber_t		nagimax = 0;
 	xfs_rfsblock_t		nb, nb_mod;
 	xfs_rfsblock_t		new;
 	xfs_agnumber_t		oagcount;
-	int			pct;
 	xfs_trans_t		*tp;
 	LIST_HEAD		(buffer_list);
 	struct aghdr_init_data	id = {};
 
 	nb = in->newblocks;
-	pct = in->imaxpct;
-	if (nb < mp->m_sb.sb_dblocks || pct < 0 || pct > 100)
+	if (nb < mp->m_sb.sb_dblocks)
 		return -EINVAL;
 	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
 		return error;
-	dpct = pct - mp->m_sb.sb_imax_pct;
 	error = xfs_buf_read_uncached(mp->m_ddev_targp,
 				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
 				XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
@@ -554,8 +550,6 @@ xfs_growfs_data_private(
 				 nb - mp->m_sb.sb_dblocks);
 	if (id.nfree)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
-	if (dpct)
-		xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
 	xfs_trans_set_sync(tp);
 	error = xfs_trans_commit(tp);
 	if (error)
@@ -564,12 +558,6 @@ xfs_growfs_data_private(
 	/* New allocation groups fully initialized, so update mount struct */
 	if (nagimax)
 		mp->m_maxagi = nagimax;
-	if (mp->m_sb.sb_imax_pct) {
-		uint64_t icount = mp->m_sb.sb_dblocks * mp->m_sb.sb_imax_pct;
-		do_div(icount, 100);
-		mp->m_maxicount = icount << mp->m_sb.sb_inopblog;
-	} else
-		mp->m_maxicount = 0;
 	xfs_set_low_space_thresholds(mp);
 	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
 
@@ -673,25 +661,68 @@ xfs_growfs_log_private(
 	return -ENOSYS;
 }
 
+static int
+xfs_growfs_imaxpct(
+	struct xfs_mount	*mp,
+	__u32			imaxpct)
+{
+	struct xfs_trans	*tp;
+	int64_t			dpct;
+	int			error;
+
+	if (imaxpct > 100)
+		return -EINVAL;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
+			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
+	if (error)
+		return error;
+
+	dpct = (int64_t)imaxpct - mp->m_sb.sb_imax_pct;
+	xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
+	xfs_trans_set_sync(tp);
+	return xfs_trans_commit(tp);
+}
+
 /*
  * protected versions of growfs function acquire and release locks on the mount
  * point - exported through ioctls: XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG,
  * XFS_IOC_FSGROWFSRT
  */
-
-
 int
 xfs_growfs_data(
-	xfs_mount_t		*mp,
-	xfs_growfs_data_t	*in)
+	struct xfs_mount	*mp,
+	struct xfs_growfs_data	*in)
 {
-	int error;
+	int			error = 0;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 	if (!mutex_trylock(&mp->m_growlock))
 		return -EWOULDBLOCK;
+
+	/* update imaxpct separately from the physical grow of the filesystem */
+	if (in->imaxpct != mp->m_sb.sb_imax_pct) {
+		error = xfs_growfs_imaxpct(mp, in->imaxpct);
+		if (error)
+			goto out_error;
+	}
+
 	error = xfs_growfs_data_private(mp, in);
+	if (error)
+		goto out_error;
+
+	/*
+	 * Post growfs calculations needed to reflect new state in operations
+	 */
+	if (mp->m_sb.sb_imax_pct) {
+		uint64_t icount = mp->m_sb.sb_dblocks * mp->m_sb.sb_imax_pct;
+		do_div(icount, 100);
+		mp->m_maxicount = icount << mp->m_sb.sb_inopblog;
+	} else
+		mp->m_maxicount = 0;
+
+out_error:
 	/*
 	 * Increment the generation unconditionally, the error could be from
 	 * updating the secondary superblocks, in which case the new size
-- 
2.15.1



* [PATCH 6/7] xfs: separate secondary sb update in growfs
  2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
                   ` (4 preceding siblings ...)
  2018-02-01  6:42 ` [PATCH 5/7] xfs: make imaxpct changes in growfs separate Dave Chinner
@ 2018-02-01  6:42 ` Dave Chinner
  2018-02-09 16:11   ` Brian Foster
  2018-02-01  6:42 ` [PATCH 7/7] xfs: rework secondary superblock updates " Dave Chinner
  2018-02-06 23:44 ` [PATCH 0/7] xfs: refactor and tablise growfs Darrick J. Wong
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-01  6:42 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Updating the secondary superblocks happens after all the transactions
that update the primary superblock have committed, and its errors need
to be handled slightly differently. Separate the code out into its own
function, and clean up the error goto stack in the core growfs code,
which is now much simpler.
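
The secondary-superblock walk keeps the "record the first failure but
keep updating the remaining AGs" behaviour. That error-accumulation
pattern can be sketched on its own; update_one() here is a hypothetical
stand-in for the per-AG buffer write:

```c
#include <assert.h>

/* Hypothetical per-AG update: fails only for AG 2 in this sketch. */
static int update_one(int agno)
{
	return (agno == 2) ? -5 : 0;
}

/* Walk all secondary superblocks (AG 1 onwards). A failure is noted
 * but does not stop the loop, so later AGs still get updated; the
 * first error seen is what the caller ultimately receives. */
static int update_all(int nag)
{
	int agno, error = 0, saved_error = 0;

	for (agno = 1; agno < nag; agno++) {
		error = update_one(agno);
		if (error && !saved_error)
			saved_error = error;
	}
	return saved_error ? saved_error : error;
}
```

Isolating this loop in its own function is what lets the main grow path
drop its shared error/out label stack.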

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c | 154 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 87 insertions(+), 67 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 5c844e540320..113be7dbdc81 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -401,9 +401,8 @@ xfs_growfs_data_private(
 {
 	xfs_agf_t		*agf;
 	xfs_agi_t		*agi;
-	xfs_agnumber_t		agno;
 	xfs_buf_t		*bp;
-	int			error, saved_error = 0;
+	int			error;
 	xfs_agnumber_t		nagcount;
 	xfs_agnumber_t		nagimax = 0;
 	xfs_rfsblock_t		nb, nb_mod;
@@ -475,12 +474,12 @@ xfs_growfs_data_private(
 		error = xfs_grow_ag_headers(mp, &id);
 		if (error) {
 			xfs_buf_delwri_cancel(&id.buffer_list);
-			goto error0;
+			goto out_trans_cancel;
 		}
 	}
 	error = xfs_buf_delwri_submit(&id.buffer_list);
 	if (error)
-		goto error0;
+		goto out_trans_cancel;
 
 	xfs_trans_agblocks_delta(tp, id.nfree);
 
@@ -494,22 +493,23 @@ xfs_growfs_data_private(
 		 * Change the agi length.
 		 */
 		error = xfs_ialloc_read_agi(mp, tp, id.agno, &bp);
-		if (error) {
-			goto error0;
-		}
+		if (error)
+			goto out_trans_cancel;
+
 		ASSERT(bp);
 		agi = XFS_BUF_TO_AGI(bp);
 		be32_add_cpu(&agi->agi_length, new);
 		ASSERT(nagcount == oagcount ||
 		       be32_to_cpu(agi->agi_length) == mp->m_sb.sb_agblocks);
 		xfs_ialloc_log_agi(tp, bp, XFS_AGI_LENGTH);
+
 		/*
 		 * Change agf length.
 		 */
 		error = xfs_alloc_read_agf(mp, tp, id.agno, 0, &bp);
-		if (error) {
-			goto error0;
-		}
+		if (error)
+			goto out_trans_cancel;
+
 		ASSERT(bp);
 		agf = XFS_BUF_TO_AGF(bp);
 		be32_add_cpu(&agf->agf_length, new);
@@ -529,13 +529,13 @@ xfs_growfs_data_private(
 				be32_to_cpu(agf->agf_length) - new,
 				new, &oinfo);
 		if (error)
-			goto error0;
+			goto out_trans_cancel;
 		error = xfs_free_extent(tp,
 				XFS_AGB_TO_FSB(mp, id.agno,
 					be32_to_cpu(agf->agf_length) - new),
 				new, &oinfo, XFS_AG_RESV_NONE);
 		if (error)
-			goto error0;
+			goto out_trans_cancel;
 	}
 
 	/*
@@ -572,16 +572,79 @@ xfs_growfs_data_private(
 		error = xfs_ag_resv_free(pag);
 		xfs_perag_put(pag);
 		if (error)
-			goto out;
+			return error;
 	}
 
 	/* Reserve AG metadata blocks. */
-	error = xfs_fs_reserve_ag_blocks(mp);
-	if (error && error != -ENOSPC)
-		goto out;
+	return xfs_fs_reserve_ag_blocks(mp);
+
+out_trans_cancel:
+	xfs_trans_cancel(tp);
+	return error;
+}
+
+static int
+xfs_growfs_log_private(
+	xfs_mount_t		*mp,	/* mount point for filesystem */
+	xfs_growfs_log_t	*in)	/* growfs log input struct */
+{
+	xfs_extlen_t		nb;
+
+	nb = in->newblocks;
+	if (nb < XFS_MIN_LOG_BLOCKS || nb < XFS_B_TO_FSB(mp, XFS_MIN_LOG_BYTES))
+		return -EINVAL;
+	if (nb == mp->m_sb.sb_logblocks &&
+	    in->isint == (mp->m_sb.sb_logstart != 0))
+		return -EINVAL;
+	/*
+	 * Moving the log is hard, need new interfaces to sync
+	 * the log first, hold off all activity while moving it.
+	 * Can have shorter or longer log in the same space,
+	 * or transform internal to external log or vice versa.
+	 */
+	return -ENOSYS;
+}
+
+static int
+xfs_growfs_imaxpct(
+	struct xfs_mount	*mp,
+	__u32			imaxpct)
+{
+	struct xfs_trans	*tp;
+	int			dpct;
+	int			error;
+
+	if (imaxpct > 100)
+		return -EINVAL;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
+			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
+	if (error)
+		return error;
+
+	dpct = imaxpct - mp->m_sb.sb_imax_pct;
+	xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
+	xfs_trans_set_sync(tp);
+	return xfs_trans_commit(tp);
+}
+
+/*
+ * After a grow operation, we need to update all the secondary superblocks
+ * to match the new state of the primary. Read/init the superblocks and update
+ * them appropriately.
+ */
+static int
+xfs_growfs_update_superblocks(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		oagcount)
+{
+	struct xfs_buf		*bp;
+	xfs_agnumber_t		agno;
+	int			saved_error = 0;
+	int			error = 0;
 
 	/* update secondary superblocks. */
-	for (agno = 1; agno < nagcount; agno++) {
+	for (agno = 1; agno < mp->m_sb.sb_agcount; agno++) {
 		error = 0;
 		/*
 		 * new secondary superblocks need to be zeroed, not read from
@@ -631,57 +694,7 @@ xfs_growfs_data_private(
 		}
 	}
 
- out:
 	return saved_error ? saved_error : error;
-
- error0:
-	xfs_trans_cancel(tp);
-	return error;
-}
-
-static int
-xfs_growfs_log_private(
-	xfs_mount_t		*mp,	/* mount point for filesystem */
-	xfs_growfs_log_t	*in)	/* growfs log input struct */
-{
-	xfs_extlen_t		nb;
-
-	nb = in->newblocks;
-	if (nb < XFS_MIN_LOG_BLOCKS || nb < XFS_B_TO_FSB(mp, XFS_MIN_LOG_BYTES))
-		return -EINVAL;
-	if (nb == mp->m_sb.sb_logblocks &&
-	    in->isint == (mp->m_sb.sb_logstart != 0))
-		return -EINVAL;
-	/*
-	 * Moving the log is hard, need new interfaces to sync
-	 * the log first, hold off all activity while moving it.
-	 * Can have shorter or longer log in the same space,
-	 * or transform internal to external log or vice versa.
-	 */
-	return -ENOSYS;
-}
-
-static int
-xfs_growfs_imaxpct(
-	struct xfs_mount	*mp,
-	__u32			imaxpct)
-{
-	struct xfs_trans	*tp;
-	int64_t			dpct;
-	int			error;
-
-	if (imaxpct > 100)
-		return -EINVAL;
-
-	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
-			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
-	if (error)
-		return error;
-
-	dpct = (int64_t)imaxpct - mp->m_sb.sb_imax_pct;
-	xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
-	xfs_trans_set_sync(tp);
-	return xfs_trans_commit(tp);
 }
 
 /*
@@ -694,6 +707,7 @@ xfs_growfs_data(
 	struct xfs_mount	*mp,
 	struct xfs_growfs_data	*in)
 {
+	xfs_agnumber_t		oagcount;
 	int			error = 0;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -708,6 +722,7 @@ xfs_growfs_data(
 			goto out_error;
 	}
 
+	oagcount = mp->m_sb.sb_agcount;
 	error = xfs_growfs_data_private(mp, in);
 	if (error)
 		goto out_error;
@@ -722,6 +737,11 @@ xfs_growfs_data(
 	} else
 		mp->m_maxicount = 0;
 
+	/*
+	 * Update secondary superblocks now the physical grow has completed
+	 */
+	error = xfs_growfs_update_superblocks(mp, oagcount);
+
 out_error:
 	/*
 	 * Increment the generation unconditionally, the error could be from
-- 
2.15.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
                   ` (5 preceding siblings ...)
  2018-02-01  6:42 ` [PATCH 6/7] xfs: separate secondary sb update in growfs Dave Chinner
@ 2018-02-01  6:42 ` Dave Chinner
  2018-02-09 16:12   ` Brian Foster
  2018-02-06 23:44 ` [PATCH 0/7] xfs: refactor and tablise growfs Darrick J. Wong
  7 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-01  6:42 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Right now we wait until we've committed changes to the primary
superblock before we initialise any of the new secondary
superblocks. This means that if we have any write errors for new
secondary superblocks we end up with garbage in place rather than
zeros or even an "in progress" superblock to indicate a grow
operation is being done.

To ensure we can write the secondary superblocks, initialise them
earlier in the same loop that initialises the AG headers. We stamp
the new secondary superblocks here with the old geometry, but set
the "sb_inprogress" field to indicate that updates are being done to
the superblock so they cannot be used.  This will result in the
secondary superblock fields being updated or triggering errors that
will abort the grow before we commit any permanent changes.

This also means we can change the update mechanism of the secondary
superblocks.  We know that we are going to wholly overwrite the
information in the struct xfs_sb in the buffer, so there's no point
reading it from disk. Just allocate an uncached buffer, zero it in
memory, stamp the new superblock structure in it and write it out.
If we fail to write it out, then we'll leave the existing sb (old or
new w/ inprogress) on disk for repair to deal with later.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c | 92 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 55 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 113be7dbdc81..7318cebb591d 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -197,6 +197,25 @@ xfs_rmaproot_init(
 	}
 }
 
+/*
+ * Initialise new secondary superblocks with the pre-grow geometry, but mark
+ * them as "in progress" so we know they haven't yet been activated. This will
+ * get cleared when the update with the new geometry information is done after
+ * changes to the primary are committed. This isn't strictly necessary, but we
+ * get it for free with the delayed buffer write lists and it means we can tell
+ * if a grow operation didn't complete properly after the fact.
+ */
+static void
+xfs_sbblock_init(
+	struct xfs_mount	*mp,
+	struct xfs_buf		*bp,
+	struct aghdr_init_data	*id)
+{
+	struct xfs_dsb		*dsb = XFS_BUF_TO_SBP(bp);
+
+	xfs_sb_to_disk(dsb, &mp->m_sb);
+	dsb->sb_inprogress = 1;
+}
 
 static void
 xfs_agfblock_init(
@@ -333,6 +352,10 @@ xfs_grow_ag_headers(
 
 {
 	struct xfs_aghdr_grow_data aghdr_data[] = {
+		/* SB */
+		{ XFS_AG_DADDR(mp, id->agno, XFS_SB_DADDR),
+		  XFS_FSS_TO_BB(mp, 1), &xfs_sb_buf_ops,
+		  &xfs_sbblock_init, 0, 0, true },
 		/* AGF */
 		{ XFS_AG_DADDR(mp, id->agno, XFS_AGF_DADDR(mp)),
 		  XFS_FSS_TO_BB(mp, 1), &xfs_agf_buf_ops,
@@ -630,43 +653,27 @@ xfs_growfs_imaxpct(
 
 /*
  * After a grow operation, we need to update all the secondary superblocks
- * to match the new state of the primary. Read/init the superblocks and update
- * them appropriately.
+ * to match the new state of the primary. Because we are completely overwriting
+ * all the existing fields in the secondary superblock buffers, there is no need
+ * to read them in from disk. Just get a new uncached buffer, stamp it and
+ * write it.
  */
 static int
 xfs_growfs_update_superblocks(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		oagcount)
+	struct xfs_mount	*mp)
 {
-	struct xfs_buf		*bp;
 	xfs_agnumber_t		agno;
 	int			saved_error = 0;
 	int			error = 0;
+	LIST_HEAD		(buffer_list);
 
 	/* update secondary superblocks. */
 	for (agno = 1; agno < mp->m_sb.sb_agcount; agno++) {
-		error = 0;
-		/*
-		 * new secondary superblocks need to be zeroed, not read from
-		 * disk as the contents of the new area we are growing into is
-		 * completely unknown.
-		 */
-		if (agno < oagcount) {
-			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
-				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
-				  XFS_FSS_TO_BB(mp, 1), 0, &bp,
-				  &xfs_sb_buf_ops);
-		} else {
-			bp = xfs_trans_get_buf(NULL, mp->m_ddev_targp,
-				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
-				  XFS_FSS_TO_BB(mp, 1), 0);
-			if (bp) {
-				bp->b_ops = &xfs_sb_buf_ops;
-				xfs_buf_zero(bp, 0, BBTOB(bp->b_length));
-			} else
-				error = -ENOMEM;
-		}
+		struct xfs_buf		*bp;
 
+		bp = xfs_growfs_get_hdr_buf(mp,
+				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
+				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);
 		/*
 		 * If we get an error reading or writing alternate superblocks,
 		 * continue.  xfs_repair chooses the "best" superblock based
@@ -674,25 +681,38 @@ xfs_growfs_update_superblocks(
 		 * superblocks un-updated than updated, and xfs_repair may
 		 * pick them over the properly-updated primary.
 		 */
-		if (error) {
+		if (!bp) {
 			xfs_warn(mp,
-		"error %d reading secondary superblock for ag %d",
-				error, agno);
-			saved_error = error;
+		"error allocating secondary superblock for ag %d",
+				agno);
+			if (!saved_error)
+				saved_error = -ENOMEM;
 			continue;
 		}
 		xfs_sb_to_disk(XFS_BUF_TO_SBP(bp), &mp->m_sb);
-
-		error = xfs_bwrite(bp);
+		xfs_buf_delwri_queue(bp, &buffer_list);
 		xfs_buf_relse(bp);
+
+		/* don't hold too many buffers at once */
+		if (agno % 16)
+			continue;
+
+		error = xfs_buf_delwri_submit(&buffer_list);
 		if (error) {
 			xfs_warn(mp,
-		"write error %d updating secondary superblock for ag %d",
+		"write error %d updating a secondary superblock near ag %d",
 				error, agno);
-			saved_error = error;
+			if (!saved_error)
+				saved_error = error;
 			continue;
 		}
 	}
+	error = xfs_buf_delwri_submit(&buffer_list);
+	if (error) {
+		xfs_warn(mp,
+		"write error %d updating a secondary superblock near ag %d",
+			error, agno);
+	}
 
 	return saved_error ? saved_error : error;
 }
@@ -707,7 +727,6 @@ xfs_growfs_data(
 	struct xfs_mount	*mp,
 	struct xfs_growfs_data	*in)
 {
-	xfs_agnumber_t		oagcount;
 	int			error = 0;
 
 	if (!capable(CAP_SYS_ADMIN))
@@ -722,7 +741,6 @@ xfs_growfs_data(
 			goto out_error;
 	}
 
-	oagcount = mp->m_sb.sb_agcount;
 	error = xfs_growfs_data_private(mp, in);
 	if (error)
 		goto out_error;
@@ -740,7 +758,7 @@ xfs_growfs_data(
 	/*
 	 * Update secondary superblocks now the physical grow has completed
 	 */
-	error = xfs_growfs_update_superblocks(mp, oagcount);
+	error = xfs_growfs_update_superblocks(mp);
 
 out_error:
 	/*
-- 
2.15.1



* Re: [PATCH 0/7] xfs: refactor and tablise growfs
  2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
                   ` (6 preceding siblings ...)
  2018-02-01  6:42 ` [PATCH 7/7] xfs: rework secondary superblock updates " Dave Chinner
@ 2018-02-06 23:44 ` Darrick J. Wong
  2018-02-07  7:10   ` Dave Chinner
  7 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2018-02-06 23:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Feb 01, 2018 at 05:41:55PM +1100, Dave Chinner wrote:
> Hi folks,
> 
> This is a series I posted months ago with the first thinspace
> filesystem support. There were no comments on any of these patches
> because all the heat and light got focussed on the growfs API.
> I'm posting this separately to avoid that problem again....

...just in time to collide head-on with the online repair series that I
am planning to push out for review for 4.17. :)

> Anyway, the core of this change is to make the growfs code much
> simpler to extend. Most of the code that does structure
> initialisation is cookie-cutter code and it's whacked into one great
> big function. This patch set splits it up into separate functions
> and uses common helper functions where possible. The different
> structures and their initialisation definitions are now held in a
> table, so when we add new structures or modify existing structures
> it's a simple and isolated change.
> 
> The reworked initialisation code is suitable for moving to libxfs
> and converting mkfs.xfs to use it for the initial formatting of
> the filesystem. This will take more work to achieve, so this
> patch set stops short of moving the code to libxfs.

Or maybe I'll just pull the patches into my dev tree and move all the
code to libxfs /now/ since I don't see much difference between growing
extra limbs and regrowing new body parts.  The "repair" code I've
written so far chooses to rebuild the entire data structure from other
parts rather than trying to save an existing structure:

1. Lock the AG{I,F,FL} header/inode/whatever we're repairing.
2. Gather all the data that would have been in that data structure.
3. Make a list of all the blocks with the relevant rmap owner.
4. Make a list of all the blocks with the relevant rmap owner that are
   owned by any other structure.  For example, if we're rebuilding the
   inobt then we make a list of all the OWN_INOBT blocks, and then we
   iterate the finobt to make a list of finobt blocks.
5. Allocate a new block for the root, if necessary.
6. Initialize the data structure.
7. Import all the data gathered in step 2.
8. Subtract the list made in step 4 from the list made in step 3.  These
   are all the blocks that were owned by the structure we just rebuilt,
   so free them.
9. Commit transaction, release locks, we're done.

(Steps 7-8 involve rolling transactions.)

I think growfs'ing a new AG is basically steps 1, 5, 6, 9, with the only
twist being that growfs uses a delwri list instead of joining things to
a transaction.  For this to work there needs to be separate functions to
initialize a block and to deal with writing the xfs_buf to disk; I think
I see this happening in the patchset, but tbh I suck at reading diff. :)

> The other changes to the growfs code in this patchset also isolate
> separate parts of the growfs functionality, such as updating the
> secondary superblocks and changing imaxpct. This makes adding
> thinspace functionality to growfs much easier.
> 
> Finally, there are optimisations to make a large AG count growfs
> much faster. Instead of initialising and writing headers one at a
> time synchronously, they are added to a delwri buffer list and
> written in bulk and asynchronously. This means AG headers get merged
> by the block layer and it can reduce the IO wait time by an order of
> magnitude or more.

Sounds good.

> There are also mods to the secondary superblock update algorithm
> which make it more resilient in the face of writeback failures. We
> use a two pass update now - the main growfs loop now initialises
> secondary superblocks with sb_inprogress = 1 to indicate they are not
> in a valid state before we make any modifications, then after the
> transactional grow we do a second pass to set sb_inprogress = 0 and
> mark them valid.
> 
> This means that if we fail to write any secondary superblock, repair
> is not going to get confused by partial grow state. If we crash
> during the initial write, nothing has changed in the primary
> superblock. If we crash after the primary sb grow, then we'll know
> exactly what secondary superblocks did not get updated because
> they'll be the ones with sb_inprogress = 1 in them. Hence the
> recovery process becomes much easier as the parts of the fs that
> need updating are obvious....

I assume there will eventually be some kind of code to detect
sb_inprogress==1 and fix it as soon as we try to write an AG?

--D

> Cheers,
> 
> Dave.
> 


* Re: [PATCH 0/7] xfs: refactor and tablise growfs
  2018-02-06 23:44 ` [PATCH 0/7] xfs: refactor and tablise growfs Darrick J. Wong
@ 2018-02-07  7:10   ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-02-07  7:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Feb 06, 2018 at 03:44:07PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 01, 2018 at 05:41:55PM +1100, Dave Chinner wrote:
> > Hi folks,
> > 
> > This is a series I posted months ago with the first thinspace
> > filesystem support. There were no comments on any of these patches
> > because all the heat and light got focussed on the growfs API.
> > I'm posting this separately to avoid that problem again....
> 
> ...just in time to collide head-on with the online repair series that I
> am planning to push out for review for 4.17. :)

Yuck. :(

> > Anyway, the core of this change is to make the growfs code much
> > simpler to extend. Most of the code that does structure
> > initialisation is cookie-cutter code and it's whacked into one great
> > big function. This patch set splits it up into separate functions
> > and uses common helper functions where possible. The different
> > structures and their initialisation definitions are now held in a
> > table, so when we add new structures or modify existing structures
> > it's a simple and isolated change.
> > 
> > The reworked initialisation code is suitable for moving to libxfs
> > and converting mkfs.xfs to use it for the initial formatting of
> > the filesystem. This will take more work to achieve, so this
> > patch set stops short of moving the code to libxfs.
> 
> Or maybe I'll just pull the patches into my dev tree and move all the
> code to libxfs /now/ since I don't see much difference between growing
> extra limbs and regrowing new body parts.  The "repair" code I've
> written so far chooses to rebuild the entire data structure from other
> parts rather than trying to save an existing structure:
> 
> 1. Lock the AG{I,F,FL} header/inode/whatever we're repairing.
> 2. Gather all the data that would have been in that data structure.
> 3. Make a list of all the blocks with the relevant rmap owner.
> 4. Make a list of all the blocks with the relevant rmap owner that are
>    owned by any other structure.  For example, if we're rebuilding the
>    inobt then we make a list of all the OWN_INOBT blocks, and then we
>    iterate the finobt to make a list of finobt blocks.
> 5. Allocate a new block for the root, if necessary.
> 6. Initialize the data structure.
> 7. Import all the data gathered in step 2.
> 8. Subtract the list made in step 4 from the list made in step 3.  These
>    are all the blocks that were owned by the structure we just rebuilt,
>    so free them.
> 9. Commit transaction, release locks, we're done.
> 
> (Steps 7-8 involve rolling transactions.)
> 
> I think growfs'ing a new AG is basically steps 1, 5, 6, 9, with the only
> twist being that growfs uses a delwri list instead of joining things to
> a transaction. 

Actually, it's a lot more different than just that. It's not
transactional, and we can't use cached buffers (and hence
transactions) because we're writing beyond the current valid end
of the filesystem. i.e. we're initialising all the structures before
we modify the superblock to indicate they are valid. That's also why
all the dirty buffers are gathered on a delwri list rather than in a
transaction context and we don't have to worry about locking...

> For this to work there needs to be separate functions to
> initialize a block and to deal with writing the xfs_buf to disk; I think
> I see this happening in the patchset, but tbh I suck at reading diff. :)

Yeah, there are separate functions to initialise the different
structures, but common code controls the delwri list addition and
writeback.

> > There are also mods to the secondary superblock update algorithm
> > which make it more resilient in the face of writeback failures. We
> > use a two pass update now - the main growfs loop now initialises
> > secondary superblocks with sb_inprogress = 1 to indicate they are not
> > in a valid state before we make any modifications, then after the
> > transactional grow we do a second pass to set sb_inprogress = 0 and
> > mark them valid.
> > 
> > This means that if we fail to write any secondary superblock, repair
> > is not going to get confused by partial grow state. If we crash
> > during the initial write, nothing has changed in the primary
> > superblock. If we crash after the primary sb grow, then we'll know
> > exactly what secondary superblocks did not get updated because
> > they'll be the ones with sb_inprogress = 1 in them. Hence the
> > recovery process becomes much easier as the parts of the fs that
> > need updating are obvious....
> 
> I assume there will eventually be some kind of code to detect
> sb_inprogress==1 and fix it as soon as we try to write an AG?

Well, sort of. We never actually read secondary superblocks, except
in repair and now scrub. But repair needs to know if secondary
superblocks are stale or not yet valid - right now it's an interesting
situation to be in if grow succeeds and then fails to write all the
secondary superblocks because we can't tell the difference between
pre-grow and post-grow sbs. At least with sb_inprogress set in these
secondary superblocks we'll be able to tell how far the update got
before failing, and hence that makes it easier to recover to
consistent state.

Ultimately, I want to make the secondary superblock update loop a
set of ordered buffers associated with the primary superblock change
transaction. That way we can make log recovery correctly recover
from crashes during grow operations (i.e. it can re-run the
secondary sb updates based on the sb_inprogress flags...).

Or alternatively, turn grow into a set of deferred ops with intents
so we can do the superblock size change first and always be able to
complete the secondary superblock updates in a transactional
manner...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 1/7] xfs: factor out AG header initialisation from growfs core
  2018-02-01  6:41 ` [PATCH 1/7] xfs: factor out AG header initialisation from growfs core Dave Chinner
@ 2018-02-08 18:53   ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2018-02-08 18:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Feb 01, 2018 at 05:41:56PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The initialisation of new AG headers is mostly common with the
> userspace mkfs code and growfs in the kernel, so start factoring it
> out so we can move it to libxfs and use it in both places.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_fsops.c | 637 ++++++++++++++++++++++++++++-------------------------
>  1 file changed, 331 insertions(+), 306 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 8b4545623e25..cd5196bf8756 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -71,20 +71,344 @@ xfs_growfs_get_hdr_buf(
>  	return bp;
>  }
>  
> +/*
> + * Write new AG headers to disk. Non-transactional, but written
> + * synchronously so they are completed prior to the growfs transaction
> + * being logged.
> + */
> +static int
> +xfs_grow_ag_headers(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	xfs_extlen_t		agsize,
> +	xfs_rfsblock_t		*nfree)
> +{
...
> +	bp = xfs_growfs_get_hdr_buf(mp,
> +			XFS_AG_DADDR(mp, agno, XFS_AGF_DADDR(mp)),
> +			XFS_FSS_TO_BB(mp, 1), 0,
> +			&xfs_agf_buf_ops);
> +	if (!bp) {
> +		error = -ENOMEM;
> +		goto out_error;

I was going to suggest to kill the error label since it serves no
purpose in this helper, but a quick look ahead shows that it goes away.
Looks good to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	}
> +
> +	agf = XFS_BUF_TO_AGF(bp);
> +	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
> +	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
> +	agf->agf_seqno = cpu_to_be32(agno);
> +	agf->agf_length = cpu_to_be32(agsize);
> +	agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(XFS_BNO_BLOCK(mp));
> +	agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(XFS_CNT_BLOCK(mp));
> +	agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
> +	agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		agf->agf_roots[XFS_BTNUM_RMAPi] =
> +					cpu_to_be32(XFS_RMAP_BLOCK(mp));
> +		agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
> +		agf->agf_rmap_blocks = cpu_to_be32(1);
> +	}
> +
> +	agf->agf_flfirst = cpu_to_be32(1);
> +	agf->agf_fllast = 0;
> +	agf->agf_flcount = 0;
> +	tmpsize = agsize - mp->m_ag_prealloc_blocks;
> +	agf->agf_freeblks = cpu_to_be32(tmpsize);
> +	agf->agf_longest = cpu_to_be32(tmpsize);
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
> +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		agf->agf_refcount_root = cpu_to_be32(
> +				xfs_refc_block(mp));
> +		agf->agf_refcount_level = cpu_to_be32(1);
> +		agf->agf_refcount_blocks = cpu_to_be32(1);
> +	}
> +
> +	error = xfs_bwrite(bp);
> +	xfs_buf_relse(bp);
> +	if (error)
> +		goto out_error;
> +
> +	/*
> +	 * AG freelist header block
> +	 */
> +	bp = xfs_growfs_get_hdr_buf(mp,
> +			XFS_AG_DADDR(mp, agno, XFS_AGFL_DADDR(mp)),
> +			XFS_FSS_TO_BB(mp, 1), 0,
> +			&xfs_agfl_buf_ops);
> +	if (!bp) {
> +		error = -ENOMEM;
> +		goto out_error;
> +	}
> +
> +	agfl = XFS_BUF_TO_AGFL(bp);
> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
> +		agfl->agfl_seqno = cpu_to_be32(agno);
> +		uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
> +	}
> +
> +	agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, bp);
> +	for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
> +		agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
> +
> +	error = xfs_bwrite(bp);
> +	xfs_buf_relse(bp);
> +	if (error)
> +		goto out_error;
> +
> +	/*
> +	 * AG inode header block
> +	 */
> +	bp = xfs_growfs_get_hdr_buf(mp,
> +			XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
> +			XFS_FSS_TO_BB(mp, 1), 0,
> +			&xfs_agi_buf_ops);
> +	if (!bp) {
> +		error = -ENOMEM;
> +		goto out_error;
> +	}
> +
> +	agi = XFS_BUF_TO_AGI(bp);
> +	agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
> +	agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
> +	agi->agi_seqno = cpu_to_be32(agno);
> +	agi->agi_length = cpu_to_be32(agsize);
> +	agi->agi_count = 0;
> +	agi->agi_root = cpu_to_be32(XFS_IBT_BLOCK(mp));
> +	agi->agi_level = cpu_to_be32(1);
> +	agi->agi_freecount = 0;
> +	agi->agi_newino = cpu_to_be32(NULLAGINO);
> +	agi->agi_dirino = cpu_to_be32(NULLAGINO);
> +	if (xfs_sb_version_hascrc(&mp->m_sb))
> +		uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
> +	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> +		agi->agi_free_root = cpu_to_be32(XFS_FIBT_BLOCK(mp));
> +		agi->agi_free_level = cpu_to_be32(1);
> +	}
> +	for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++)
> +		agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
> +
> +	error = xfs_bwrite(bp);
> +	xfs_buf_relse(bp);
> +	if (error)
> +		goto out_error;
> +
> +	/*
> +	 * BNO btree root block
> +	 */
> +	bp = xfs_growfs_get_hdr_buf(mp,
> +			XFS_AGB_TO_DADDR(mp, agno, XFS_BNO_BLOCK(mp)),
> +			BTOBB(mp->m_sb.sb_blocksize), 0,
> +			&xfs_allocbt_buf_ops);
> +
> +	if (!bp) {
> +		error = -ENOMEM;
> +		goto out_error;
> +	}
> +
> +	xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, agno, 0);
> +
> +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> +	arec->ar_blockcount = cpu_to_be32(
> +		agsize - be32_to_cpu(arec->ar_startblock));
> +
> +	error = xfs_bwrite(bp);
> +	xfs_buf_relse(bp);
> +	if (error)
> +		goto out_error;
> +
> +	/*
> +	 * CNT btree root block
> +	 */
> +	bp = xfs_growfs_get_hdr_buf(mp,
> +			XFS_AGB_TO_DADDR(mp, agno, XFS_CNT_BLOCK(mp)),
> +			BTOBB(mp->m_sb.sb_blocksize), 0,
> +			&xfs_allocbt_buf_ops);
> +	if (!bp) {
> +		error = -ENOMEM;
> +		goto out_error;
> +	}
> +
> +	xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, agno, 0);
> +
> +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> +	arec->ar_blockcount = cpu_to_be32(
> +		agsize - be32_to_cpu(arec->ar_startblock));
> +	*nfree += be32_to_cpu(arec->ar_blockcount);
> +
> +	error = xfs_bwrite(bp);
> +	xfs_buf_relse(bp);
> +	if (error)
> +		goto out_error;
> +
> +	/* RMAP btree root block */
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		struct xfs_rmap_rec	*rrec;
> +		struct xfs_btree_block	*block;
> +
> +		bp = xfs_growfs_get_hdr_buf(mp,
> +			XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
> +			BTOBB(mp->m_sb.sb_blocksize), 0,
> +			&xfs_rmapbt_buf_ops);
> +		if (!bp) {
> +			error = -ENOMEM;
> +			goto out_error;
> +		}
> +
> +		xfs_btree_init_block(mp, bp, XFS_BTNUM_RMAP, 0, 0,
> +					agno, 0);
> +		block = XFS_BUF_TO_BLOCK(bp);
> +
> +
> +		/*
> +		 * mark the AG header regions as static metadata The BNO
> +		 * btree block is the first block after the headers, so
> +		 * it's location defines the size of region the static
> +		 * metadata consumes.
> +		 *
> +		 * Note: unlike mkfs, we never have to account for log
> +		 * space when growing the data regions
> +		 */
> +		rrec = XFS_RMAP_REC_ADDR(block, 1);
> +		rrec->rm_startblock = 0;
> +		rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
> +		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
> +		rrec->rm_offset = 0;
> +		be16_add_cpu(&block->bb_numrecs, 1);
> +
> +		/* account freespace btree root blocks */
> +		rrec = XFS_RMAP_REC_ADDR(block, 2);
> +		rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
> +		rrec->rm_blockcount = cpu_to_be32(2);
> +		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
> +		rrec->rm_offset = 0;
> +		be16_add_cpu(&block->bb_numrecs, 1);
> +
> +		/* account inode btree root blocks */
> +		rrec = XFS_RMAP_REC_ADDR(block, 3);
> +		rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
> +		rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
> +						XFS_IBT_BLOCK(mp));
> +		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
> +		rrec->rm_offset = 0;
> +		be16_add_cpu(&block->bb_numrecs, 1);
> +
> +		/* account for rmap btree root */
> +		rrec = XFS_RMAP_REC_ADDR(block, 4);
> +		rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
> +		rrec->rm_blockcount = cpu_to_be32(1);
> +		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
> +		rrec->rm_offset = 0;
> +		be16_add_cpu(&block->bb_numrecs, 1);
> +
> +		/* account for refc btree root */
> +		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> +			rrec = XFS_RMAP_REC_ADDR(block, 5);
> +			rrec->rm_startblock = cpu_to_be32(xfs_refc_block(mp));
> +			rrec->rm_blockcount = cpu_to_be32(1);
> +			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
> +			rrec->rm_offset = 0;
> +			be16_add_cpu(&block->bb_numrecs, 1);
> +		}
> +
> +		error = xfs_bwrite(bp);
> +		xfs_buf_relse(bp);
> +		if (error)
> +			goto out_error;
> +	}
> +
> +	/*
> +	 * INO btree root block
> +	 */
> +	bp = xfs_growfs_get_hdr_buf(mp,
> +			XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
> +			BTOBB(mp->m_sb.sb_blocksize), 0,
> +			&xfs_inobt_buf_ops);
> +	if (!bp) {
> +		error = -ENOMEM;
> +		goto out_error;
> +	}
> +
> +	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
> +
> +	error = xfs_bwrite(bp);
> +	xfs_buf_relse(bp);
> +	if (error)
> +		goto out_error;
> +
> +	/*
> +	 * FINO btree root block
> +	 */
> +	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> +		bp = xfs_growfs_get_hdr_buf(mp,
> +			XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
> +			BTOBB(mp->m_sb.sb_blocksize), 0,
> +			&xfs_inobt_buf_ops);
> +		if (!bp) {
> +			error = -ENOMEM;
> +			goto out_error;
> +		}
> +
> +		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO,
> +					     0, 0, agno, 0);
> +
> +		error = xfs_bwrite(bp);
> +		xfs_buf_relse(bp);
> +		if (error)
> +			goto out_error;
> +	}
> +
> +	/*
> +	 * refcount btree root block
> +	 */
> +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		bp = xfs_growfs_get_hdr_buf(mp,
> +			XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
> +			BTOBB(mp->m_sb.sb_blocksize), 0,
> +			&xfs_refcountbt_buf_ops);
> +		if (!bp) {
> +			error = -ENOMEM;
> +			goto out_error;
> +		}
> +
> +		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC,
> +				     0, 0, agno, 0);
> +
> +		error = xfs_bwrite(bp);
> +		xfs_buf_relse(bp);
> +		if (error)
> +			goto out_error;
> +	}
> +
> +out_error:
> +	return error;
> +}
> +
>  static int
>  xfs_growfs_data_private(
>  	xfs_mount_t		*mp,		/* mount point for filesystem */
>  	xfs_growfs_data_t	*in)		/* growfs data input struct */
>  {
>  	xfs_agf_t		*agf;
> -	struct xfs_agfl		*agfl;
>  	xfs_agi_t		*agi;
>  	xfs_agnumber_t		agno;
>  	xfs_extlen_t		agsize;
> -	xfs_extlen_t		tmpsize;
> -	xfs_alloc_rec_t		*arec;
>  	xfs_buf_t		*bp;
> -	int			bucket;
>  	int			dpct;
>  	int			error, saved_error = 0;
>  	xfs_agnumber_t		nagcount;
> @@ -141,318 +465,19 @@ xfs_growfs_data_private(
>  	 */
>  	nfree = 0;
>  	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
> -		__be32	*agfl_bno;
> -
> -		/*
> -		 * AG freespace header block
> -		 */
> -		bp = xfs_growfs_get_hdr_buf(mp,
> -				XFS_AG_DADDR(mp, agno, XFS_AGF_DADDR(mp)),
> -				XFS_FSS_TO_BB(mp, 1), 0,
> -				&xfs_agf_buf_ops);
> -		if (!bp) {
> -			error = -ENOMEM;
> -			goto error0;
> -		}
>  
> -		agf = XFS_BUF_TO_AGF(bp);
> -		agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
> -		agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
> -		agf->agf_seqno = cpu_to_be32(agno);
>  		if (agno == nagcount - 1)
> -			agsize =
> -				nb -
> +			agsize = nb -
>  				(agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
>  		else
>  			agsize = mp->m_sb.sb_agblocks;
> -		agf->agf_length = cpu_to_be32(agsize);
> -		agf->agf_roots[XFS_BTNUM_BNOi] = cpu_to_be32(XFS_BNO_BLOCK(mp));
> -		agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(XFS_CNT_BLOCK(mp));
> -		agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
> -		agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
> -		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> -			agf->agf_roots[XFS_BTNUM_RMAPi] =
> -						cpu_to_be32(XFS_RMAP_BLOCK(mp));
> -			agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
> -			agf->agf_rmap_blocks = cpu_to_be32(1);
> -		}
> -
> -		agf->agf_flfirst = cpu_to_be32(1);
> -		agf->agf_fllast = 0;
> -		agf->agf_flcount = 0;
> -		tmpsize = agsize - mp->m_ag_prealloc_blocks;
> -		agf->agf_freeblks = cpu_to_be32(tmpsize);
> -		agf->agf_longest = cpu_to_be32(tmpsize);
> -		if (xfs_sb_version_hascrc(&mp->m_sb))
> -			uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
> -		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> -			agf->agf_refcount_root = cpu_to_be32(
> -					xfs_refc_block(mp));
> -			agf->agf_refcount_level = cpu_to_be32(1);
> -			agf->agf_refcount_blocks = cpu_to_be32(1);
> -		}
> -
> -		error = xfs_bwrite(bp);
> -		xfs_buf_relse(bp);
> -		if (error)
> -			goto error0;
> -
> -		/*
> -		 * AG freelist header block
> -		 */
> -		bp = xfs_growfs_get_hdr_buf(mp,
> -				XFS_AG_DADDR(mp, agno, XFS_AGFL_DADDR(mp)),
> -				XFS_FSS_TO_BB(mp, 1), 0,
> -				&xfs_agfl_buf_ops);
> -		if (!bp) {
> -			error = -ENOMEM;
> -			goto error0;
> -		}
> -
> -		agfl = XFS_BUF_TO_AGFL(bp);
> -		if (xfs_sb_version_hascrc(&mp->m_sb)) {
> -			agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
> -			agfl->agfl_seqno = cpu_to_be32(agno);
> -			uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
> -		}
> -
> -		agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, bp);
> -		for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
> -			agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
>  
> -		error = xfs_bwrite(bp);
> -		xfs_buf_relse(bp);
> +		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree);
>  		if (error)
>  			goto error0;
> -
> -		/*
> -		 * AG inode header block
> -		 */
> -		bp = xfs_growfs_get_hdr_buf(mp,
> -				XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
> -				XFS_FSS_TO_BB(mp, 1), 0,
> -				&xfs_agi_buf_ops);
> -		if (!bp) {
> -			error = -ENOMEM;
> -			goto error0;
> -		}
> -
> -		agi = XFS_BUF_TO_AGI(bp);
> -		agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
> -		agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
> -		agi->agi_seqno = cpu_to_be32(agno);
> -		agi->agi_length = cpu_to_be32(agsize);
> -		agi->agi_count = 0;
> -		agi->agi_root = cpu_to_be32(XFS_IBT_BLOCK(mp));
> -		agi->agi_level = cpu_to_be32(1);
> -		agi->agi_freecount = 0;
> -		agi->agi_newino = cpu_to_be32(NULLAGINO);
> -		agi->agi_dirino = cpu_to_be32(NULLAGINO);
> -		if (xfs_sb_version_hascrc(&mp->m_sb))
> -			uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
> -		if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> -			agi->agi_free_root = cpu_to_be32(XFS_FIBT_BLOCK(mp));
> -			agi->agi_free_level = cpu_to_be32(1);
> -		}
> -		for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++)
> -			agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
> -
> -		error = xfs_bwrite(bp);
> -		xfs_buf_relse(bp);
> -		if (error)
> -			goto error0;
> -
> -		/*
> -		 * BNO btree root block
> -		 */
> -		bp = xfs_growfs_get_hdr_buf(mp,
> -				XFS_AGB_TO_DADDR(mp, agno, XFS_BNO_BLOCK(mp)),
> -				BTOBB(mp->m_sb.sb_blocksize), 0,
> -				&xfs_allocbt_buf_ops);
> -
> -		if (!bp) {
> -			error = -ENOMEM;
> -			goto error0;
> -		}
> -
> -		xfs_btree_init_block(mp, bp, XFS_BTNUM_BNO, 0, 1, agno, 0);
> -
> -		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> -		arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> -		arec->ar_blockcount = cpu_to_be32(
> -			agsize - be32_to_cpu(arec->ar_startblock));
> -
> -		error = xfs_bwrite(bp);
> -		xfs_buf_relse(bp);
> -		if (error)
> -			goto error0;
> -
> -		/*
> -		 * CNT btree root block
> -		 */
> -		bp = xfs_growfs_get_hdr_buf(mp,
> -				XFS_AGB_TO_DADDR(mp, agno, XFS_CNT_BLOCK(mp)),
> -				BTOBB(mp->m_sb.sb_blocksize), 0,
> -				&xfs_allocbt_buf_ops);
> -		if (!bp) {
> -			error = -ENOMEM;
> -			goto error0;
> -		}
> -
> -		xfs_btree_init_block(mp, bp, XFS_BTNUM_CNT, 0, 1, agno, 0);
> -
> -		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> -		arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> -		arec->ar_blockcount = cpu_to_be32(
> -			agsize - be32_to_cpu(arec->ar_startblock));
> -		nfree += be32_to_cpu(arec->ar_blockcount);
> -
> -		error = xfs_bwrite(bp);
> -		xfs_buf_relse(bp);
> -		if (error)
> -			goto error0;
> -
> -		/* RMAP btree root block */
> -		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> -			struct xfs_rmap_rec	*rrec;
> -			struct xfs_btree_block	*block;
> -
> -			bp = xfs_growfs_get_hdr_buf(mp,
> -				XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
> -				BTOBB(mp->m_sb.sb_blocksize), 0,
> -				&xfs_rmapbt_buf_ops);
> -			if (!bp) {
> -				error = -ENOMEM;
> -				goto error0;
> -			}
> -
> -			xfs_btree_init_block(mp, bp, XFS_BTNUM_RMAP, 0, 0,
> -						agno, 0);
> -			block = XFS_BUF_TO_BLOCK(bp);
> -
> -
> -			/*
> -			 * mark the AG header regions as static metadata The BNO
> -			 * btree block is the first block after the headers, so
> -			 * it's location defines the size of region the static
> -			 * metadata consumes.
> -			 *
> -			 * Note: unlike mkfs, we never have to account for log
> -			 * space when growing the data regions
> -			 */
> -			rrec = XFS_RMAP_REC_ADDR(block, 1);
> -			rrec->rm_startblock = 0;
> -			rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
> -			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
> -			rrec->rm_offset = 0;
> -			be16_add_cpu(&block->bb_numrecs, 1);
> -
> -			/* account freespace btree root blocks */
> -			rrec = XFS_RMAP_REC_ADDR(block, 2);
> -			rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
> -			rrec->rm_blockcount = cpu_to_be32(2);
> -			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
> -			rrec->rm_offset = 0;
> -			be16_add_cpu(&block->bb_numrecs, 1);
> -
> -			/* account inode btree root blocks */
> -			rrec = XFS_RMAP_REC_ADDR(block, 3);
> -			rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
> -			rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
> -							XFS_IBT_BLOCK(mp));
> -			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
> -			rrec->rm_offset = 0;
> -			be16_add_cpu(&block->bb_numrecs, 1);
> -
> -			/* account for rmap btree root */
> -			rrec = XFS_RMAP_REC_ADDR(block, 4);
> -			rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
> -			rrec->rm_blockcount = cpu_to_be32(1);
> -			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
> -			rrec->rm_offset = 0;
> -			be16_add_cpu(&block->bb_numrecs, 1);
> -
> -			/* account for refc btree root */
> -			if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> -				rrec = XFS_RMAP_REC_ADDR(block, 5);
> -				rrec->rm_startblock = cpu_to_be32(
> -						xfs_refc_block(mp));
> -				rrec->rm_blockcount = cpu_to_be32(1);
> -				rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
> -				rrec->rm_offset = 0;
> -				be16_add_cpu(&block->bb_numrecs, 1);
> -			}
> -
> -			error = xfs_bwrite(bp);
> -			xfs_buf_relse(bp);
> -			if (error)
> -				goto error0;
> -		}
> -
> -		/*
> -		 * INO btree root block
> -		 */
> -		bp = xfs_growfs_get_hdr_buf(mp,
> -				XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
> -				BTOBB(mp->m_sb.sb_blocksize), 0,
> -				&xfs_inobt_buf_ops);
> -		if (!bp) {
> -			error = -ENOMEM;
> -			goto error0;
> -		}
> -
> -		xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
> -
> -		error = xfs_bwrite(bp);
> -		xfs_buf_relse(bp);
> -		if (error)
> -			goto error0;
> -
> -		/*
> -		 * FINO btree root block
> -		 */
> -		if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> -			bp = xfs_growfs_get_hdr_buf(mp,
> -				XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
> -				BTOBB(mp->m_sb.sb_blocksize), 0,
> -				&xfs_inobt_buf_ops);
> -			if (!bp) {
> -				error = -ENOMEM;
> -				goto error0;
> -			}
> -
> -			xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO,
> -						     0, 0, agno, 0);
> -
> -			error = xfs_bwrite(bp);
> -			xfs_buf_relse(bp);
> -			if (error)
> -				goto error0;
> -		}
> -
> -		/*
> -		 * refcount btree root block
> -		 */
> -		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> -			bp = xfs_growfs_get_hdr_buf(mp,
> -				XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
> -				BTOBB(mp->m_sb.sb_blocksize), 0,
> -				&xfs_refcountbt_buf_ops);
> -			if (!bp) {
> -				error = -ENOMEM;
> -				goto error0;
> -			}
> -
> -			xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC,
> -					     0, 0, agno, 0);
> -
> -			error = xfs_bwrite(bp);
> -			xfs_buf_relse(bp);
> -			if (error)
> -				goto error0;
> -		}
>  	}
>  	xfs_trans_agblocks_delta(tp, nfree);
> +
>  	/*
>  	 * There are new blocks in the old last a.g.
>  	 */
> -- 
> 2.15.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/7] xfs: convert growfs AG header init to use buffer lists
  2018-02-01  6:41 ` [PATCH 2/7] xfs: convert growfs AG header init to use buffer lists Dave Chinner
@ 2018-02-08 18:53   ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2018-02-08 18:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Feb 01, 2018 at 05:41:57PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently write all new AG headers synchronously, which can be
> slow for large grow operations. All we really need to do is ensure
> all the headers are on disk before we run the growfs transaction, so
> convert this to a buffer list and a delayed write operation. We
> block waiting for the delayed write buffer submission to complete,
> so this will fulfill the requirement to have all the buffers written
> correctly before proceeding.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_fsops.c | 74 ++++++++++++++++++++++++------------------------------
>  1 file changed, 33 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index cd5196bf8756..d9e08d8cf9ac 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -81,7 +81,8 @@ xfs_grow_ag_headers(
>  	struct xfs_mount	*mp,
>  	xfs_agnumber_t		agno,
>  	xfs_extlen_t		agsize,
> -	xfs_rfsblock_t		*nfree)
> +	xfs_rfsblock_t		*nfree,
> +	struct list_head	*buffer_list)
>  {
>  	struct xfs_agf		*agf;
>  	struct xfs_agi		*agi;
> @@ -135,11 +136,8 @@ xfs_grow_ag_headers(
>  		agf->agf_refcount_level = cpu_to_be32(1);
>  		agf->agf_refcount_blocks = cpu_to_be32(1);
>  	}
> -
> -	error = xfs_bwrite(bp);
> +	xfs_buf_delwri_queue(bp, buffer_list);
>  	xfs_buf_relse(bp);
> -	if (error)
> -		goto out_error;
>  
>  	/*
>  	 * AG freelist header block
> @@ -164,10 +162,8 @@ xfs_grow_ag_headers(
>  	for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
>  		agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
>  
> -	error = xfs_bwrite(bp);
> +	xfs_buf_delwri_queue(bp, buffer_list);
>  	xfs_buf_relse(bp);
> -	if (error)
> -		goto out_error;
>  
>  	/*
>  	 * AG inode header block
> @@ -201,10 +197,8 @@ xfs_grow_ag_headers(
>  	for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++)
>  		agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
>  
> -	error = xfs_bwrite(bp);
> +	xfs_buf_delwri_queue(bp, buffer_list);
>  	xfs_buf_relse(bp);
> -	if (error)
> -		goto out_error;
>  
>  	/*
>  	 * BNO btree root block
> @@ -226,10 +220,8 @@ xfs_grow_ag_headers(
>  	arec->ar_blockcount = cpu_to_be32(
>  		agsize - be32_to_cpu(arec->ar_startblock));
>  
> -	error = xfs_bwrite(bp);
> +	xfs_buf_delwri_queue(bp, buffer_list);
>  	xfs_buf_relse(bp);
> -	if (error)
> -		goto out_error;
>  
>  	/*
>  	 * CNT btree root block
> @@ -251,10 +243,8 @@ xfs_grow_ag_headers(
>  		agsize - be32_to_cpu(arec->ar_startblock));
>  	*nfree += be32_to_cpu(arec->ar_blockcount);
>  
> -	error = xfs_bwrite(bp);
> +	xfs_buf_delwri_queue(bp, buffer_list);
>  	xfs_buf_relse(bp);
> -	if (error)
> -		goto out_error;
>  
>  	/* RMAP btree root block */
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> @@ -326,10 +316,8 @@ xfs_grow_ag_headers(
>  			be16_add_cpu(&block->bb_numrecs, 1);
>  		}
>  
> -		error = xfs_bwrite(bp);
> +		xfs_buf_delwri_queue(bp, buffer_list);
>  		xfs_buf_relse(bp);
> -		if (error)
> -			goto out_error;
>  	}
>  
>  	/*
> @@ -345,11 +333,8 @@ xfs_grow_ag_headers(
>  	}
>  
>  	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
> -
> -	error = xfs_bwrite(bp);
> +	xfs_buf_delwri_queue(bp, buffer_list);
>  	xfs_buf_relse(bp);
> -	if (error)
> -		goto out_error;
>  
>  	/*
>  	 * FINO btree root block
> @@ -364,13 +349,9 @@ xfs_grow_ag_headers(
>  			goto out_error;
>  		}
>  
> -		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO,
> -					     0, 0, agno, 0);
> -
> -		error = xfs_bwrite(bp);
> +		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO, 0, 0, agno, 0);
> +		xfs_buf_delwri_queue(bp, buffer_list);
>  		xfs_buf_relse(bp);
> -		if (error)
> -			goto out_error;
>  	}
>  
>  	/*
> @@ -386,13 +367,9 @@ xfs_grow_ag_headers(
>  			goto out_error;
>  		}
>  
> -		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC,
> -				     0, 0, agno, 0);
> -
> -		error = xfs_bwrite(bp);
> +		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC, 0, 0, agno, 0);
> +		xfs_buf_delwri_queue(bp, buffer_list);
>  		xfs_buf_relse(bp);
> -		if (error)
> -			goto out_error;
>  	}
>  
>  out_error:
> @@ -419,6 +396,7 @@ xfs_growfs_data_private(
>  	xfs_agnumber_t		oagcount;
>  	int			pct;
>  	xfs_trans_t		*tp;
> +	LIST_HEAD		(buffer_list);
>  
>  	nb = in->newblocks;
>  	pct = in->imaxpct;
> @@ -459,9 +437,16 @@ xfs_growfs_data_private(
>  		return error;
>  
>  	/*
> -	 * Write new AG headers to disk. Non-transactional, but written
> -	 * synchronously so they are completed prior to the growfs transaction
> -	 * being logged.
> +	 * Write new AG headers to disk. Non-transactional, but need to be
> +	 * written and completed prior to the growfs transaction being logged.
> +	 * To do this, we use a delayed write buffer list and wait for
> +	 * submission and IO completion of the list as a whole. This allows the
> +	 * IO subsystem to merge all the AG headers in a single AG into a single
> +	 * IO and hide most of the latency of the IO from us.
> +	 *
> +	 * This also means that if we get an error whilst building the buffer
> +	 * list to write, we can cancel the entire list without having written
> +	 * anything.
>  	 */
>  	nfree = 0;
>  	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
> @@ -472,10 +457,17 @@ xfs_growfs_data_private(
>  		else
>  			agsize = mp->m_sb.sb_agblocks;
>  
> -		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree);
> -		if (error)
> +		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree,
> +					    &buffer_list);
> +		if (error) {
> +			xfs_buf_delwri_cancel(&buffer_list);
>  			goto error0;
> +		}
>  	}
> +	error = xfs_buf_delwri_submit(&buffer_list);
> +	if (error)
> +		goto error0;
> +
>  	xfs_trans_agblocks_delta(tp, nfree);
>  
>  	/*
> -- 
> 2.15.1
> 


* Re: [PATCH 3/7] xfs: factor ag btree root block initialisation
  2018-02-01  6:41 ` [PATCH 3/7] xfs: factor ag btree root block initialisation Dave Chinner
@ 2018-02-08 18:54   ` Brian Foster
  2018-02-08 20:00     ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2018-02-08 18:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Feb 01, 2018 at 05:41:58PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Cookie cutter code, easily factored.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---

Seems sane, a couple factoring nits..

>  fs/xfs/xfs_fsops.c | 493 +++++++++++++++++++++++++++++------------------------
>  1 file changed, 271 insertions(+), 222 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index d9e08d8cf9ac..44eac79e0b49 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -71,46 +71,146 @@ xfs_growfs_get_hdr_buf(
>  	return bp;
>  }
>  
...
> +/*
> + * Alloc btree root block init functions
> + */
> +static void
> +xfs_bnoroot_init(
> +	struct xfs_mount	*mp,
> +	struct xfs_buf		*bp,
> +	struct aghdr_init_data	*id)
>  {
> -	struct xfs_agf		*agf;
> -	struct xfs_agi		*agi;
> -	struct xfs_agfl		*agfl;
> -	__be32			*agfl_bno;
>  	xfs_alloc_rec_t		*arec;

A couple more typedef instances to kill (here and cntroot_init() below).

> -	struct xfs_buf		*bp;
> -	int			bucket;
> -	xfs_extlen_t		tmpsize;
> -	int			error = 0;
> +
> +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> +					  be32_to_cpu(arec->ar_startblock));
> +}
> +
> +static void
> +xfs_cntroot_init(
> +	struct xfs_mount	*mp,
> +	struct xfs_buf		*bp,
> +	struct aghdr_init_data	*id)
> +{
> +	xfs_alloc_rec_t		*arec;
> +
> +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> +					  be32_to_cpu(arec->ar_startblock));
> +	id->nfree += be32_to_cpu(arec->ar_blockcount);

This seems unrelated to the cntbt. Perhaps move it to the parent
function? It looks like all we need are mp->m_ag_prealloc_blocks and
id->agsize, after all.

That also looks like the only difference between xfs_bnoroot_init() and
xfs_cntroot_init(), fwiw, so we could condense those as well.

> +}
> +
...
> +/*
> + * Write new AG headers to disk. Non-transactional, but written
> + * synchronously so they are completed prior to the growfs transaction
> + * being logged.
> + */
> +static int
> +xfs_grow_ag_headers(
> +	struct xfs_mount	*mp,
> +	struct aghdr_init_data	*id)
>  
> -	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> -	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> -	arec->ar_blockcount = cpu_to_be32(
> -		agsize - be32_to_cpu(arec->ar_startblock));
> -	*nfree += be32_to_cpu(arec->ar_blockcount);
> +{
> +	int			error = 0;
>  
...
> +	/* BNO btree root block */
> +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp));
> +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> +	id->type = XFS_BTNUM_BNO;
> +	id->numrecs = 1;

Do we really need to set numrecs for all of these calls? It looks out of
place/context and inconsistently used to me. Case in point: we pass 1 to
the space btree init functions which add a single record, but pass 0 to
the rmapbt init function which actually adds up to 5 records (and
increments the initial numrecs count).

AFAICT, each initialization function knows how many records it's going
to add. I don't see why that information needs to leak outside of init
function context..?

Brian

> +	error = xfs_growfs_init_aghdr(mp, id, xfs_bnoroot_init,
> +				   &xfs_allocbt_buf_ops);
> +	if (error)
> +		goto out_error;
>  
> -		/* account inode btree root blocks */
> -		rrec = XFS_RMAP_REC_ADDR(block, 3);
> -		rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
> -		rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
> -						XFS_IBT_BLOCK(mp));
> -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
> -		rrec->rm_offset = 0;
> -		be16_add_cpu(&block->bb_numrecs, 1);
>  
> -		/* account for rmap btree root */
> -		rrec = XFS_RMAP_REC_ADDR(block, 4);
> -		rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
> -		rrec->rm_blockcount = cpu_to_be32(1);
> -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
> -		rrec->rm_offset = 0;
> -		be16_add_cpu(&block->bb_numrecs, 1);
> +	/* CNT btree root block */
> +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp));
> +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> +	id->type = XFS_BTNUM_CNT;
> +	id->numrecs = 1;
> +	error = xfs_growfs_init_aghdr(mp, id, xfs_cntroot_init,
> +				   &xfs_allocbt_buf_ops);
> +	if (error)
> +		goto out_error;
>  
> -		/* account for refc btree root */
> -		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> -			rrec = XFS_RMAP_REC_ADDR(block, 5);
> -			rrec->rm_startblock = cpu_to_be32(xfs_refc_block(mp));
> -			rrec->rm_blockcount = cpu_to_be32(1);
> -			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
> -			rrec->rm_offset = 0;
> -			be16_add_cpu(&block->bb_numrecs, 1);
> -		}
> +	/* RMAP btree root block */
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp));
> +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> +		id->type = XFS_BTNUM_RMAP;
> +		id->numrecs = 0;
> +		error = xfs_growfs_init_aghdr(mp, id, xfs_rmaproot_init,
> +					   &xfs_rmapbt_buf_ops);
> +		if (error)
> +			goto out_error;
>  
> -		xfs_buf_delwri_queue(bp, buffer_list);
> -		xfs_buf_relse(bp);
>  	}
>  
> -	/*
> -	 * INO btree root block
> -	 */
> -	bp = xfs_growfs_get_hdr_buf(mp,
> -			XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
> -			BTOBB(mp->m_sb.sb_blocksize), 0,
> -			&xfs_inobt_buf_ops);
> -	if (!bp) {
> -		error = -ENOMEM;
> +	/* INO btree root block */
> +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp));
> +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> +	id->type = XFS_BTNUM_INO;
> +	id->numrecs = 0;
> +	error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> +				   &xfs_inobt_buf_ops);
> +	if (error)
>  		goto out_error;
> -	}
>  
> -	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
> -	xfs_buf_delwri_queue(bp, buffer_list);
> -	xfs_buf_relse(bp);
>  
>  	/*
>  	 * FINO btree root block
>  	 */
>  	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> -		bp = xfs_growfs_get_hdr_buf(mp,
> -			XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
> -			BTOBB(mp->m_sb.sb_blocksize), 0,
> -			&xfs_inobt_buf_ops);
> -		if (!bp) {
> -			error = -ENOMEM;
> +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp));
> +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> +		id->type = XFS_BTNUM_FINO;
> +		id->numrecs = 0;
> +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> +					   &xfs_inobt_buf_ops);
> +		if (error)
>  			goto out_error;
> -		}
> -
> -		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO, 0, 0, agno, 0);
> -		xfs_buf_delwri_queue(bp, buffer_list);
> -		xfs_buf_relse(bp);
>  	}
>  
>  	/*
>  	 * refcount btree root block
>  	 */
>  	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> -		bp = xfs_growfs_get_hdr_buf(mp,
> -			XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
> -			BTOBB(mp->m_sb.sb_blocksize), 0,
> -			&xfs_refcountbt_buf_ops);
> -		if (!bp) {
> -			error = -ENOMEM;
> +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp));
> +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> +		id->type = XFS_BTNUM_REFC;
> +		id->numrecs = 0;
> +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> +					   &xfs_refcountbt_buf_ops);
> +		if (error)
>  			goto out_error;
> -		}
> -
> -		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC, 0, 0, agno, 0);
> -		xfs_buf_delwri_queue(bp, buffer_list);
> -		xfs_buf_relse(bp);
>  	}
>  
>  out_error:
> @@ -384,7 +433,6 @@ xfs_growfs_data_private(
>  	xfs_agf_t		*agf;
>  	xfs_agi_t		*agi;
>  	xfs_agnumber_t		agno;
> -	xfs_extlen_t		agsize;
>  	xfs_buf_t		*bp;
>  	int			dpct;
>  	int			error, saved_error = 0;
> @@ -392,11 +440,11 @@ xfs_growfs_data_private(
>  	xfs_agnumber_t		nagimax = 0;
>  	xfs_rfsblock_t		nb, nb_mod;
>  	xfs_rfsblock_t		new;
> -	xfs_rfsblock_t		nfree;
>  	xfs_agnumber_t		oagcount;
>  	int			pct;
>  	xfs_trans_t		*tp;
>  	LIST_HEAD		(buffer_list);
> +	struct aghdr_init_data	id = {};
>  
>  	nb = in->newblocks;
>  	pct = in->imaxpct;
> @@ -448,27 +496,28 @@ xfs_growfs_data_private(
>  	 * list to write, we can cancel the entire list without having written
>  	 * anything.
>  	 */
> -	nfree = 0;
> -	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
> -
> -		if (agno == nagcount - 1)
> -			agsize = nb -
> -				(agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> +	INIT_LIST_HEAD(&id.buffer_list);
> +	for (id.agno = nagcount - 1;
> +	     id.agno >= oagcount;
> +	     id.agno--, new -= id.agsize) {
> +
> +		if (id.agno == nagcount - 1)
> +			id.agsize = nb -
> +				(id.agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
>  		else
> -			agsize = mp->m_sb.sb_agblocks;
> +			id.agsize = mp->m_sb.sb_agblocks;
>  
> -		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree,
> -					    &buffer_list);
> +		error = xfs_grow_ag_headers(mp, &id);
>  		if (error) {
> -			xfs_buf_delwri_cancel(&buffer_list);
> +			xfs_buf_delwri_cancel(&id.buffer_list);
>  			goto error0;
>  		}
>  	}
> -	error = xfs_buf_delwri_submit(&buffer_list);
> +	error = xfs_buf_delwri_submit(&id.buffer_list);
>  	if (error)
>  		goto error0;
>  
> -	xfs_trans_agblocks_delta(tp, nfree);
> +	xfs_trans_agblocks_delta(tp, id.nfree);
>  
>  	/*
>  	 * There are new blocks in the old last a.g.
> @@ -479,7 +528,7 @@ xfs_growfs_data_private(
>  		/*
>  		 * Change the agi length.
>  		 */
> -		error = xfs_ialloc_read_agi(mp, tp, agno, &bp);
> +		error = xfs_ialloc_read_agi(mp, tp, id.agno, &bp);
>  		if (error) {
>  			goto error0;
>  		}
> @@ -492,7 +541,7 @@ xfs_growfs_data_private(
>  		/*
>  		 * Change agf length.
>  		 */
> -		error = xfs_alloc_read_agf(mp, tp, agno, 0, &bp);
> +		error = xfs_alloc_read_agf(mp, tp, id.agno, 0, &bp);
>  		if (error) {
>  			goto error0;
>  		}
> @@ -511,13 +560,13 @@ xfs_growfs_data_private(
>  		 * this doesn't actually exist in the rmap btree.
>  		 */
>  		xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL);
> -		error = xfs_rmap_free(tp, bp, agno,
> +		error = xfs_rmap_free(tp, bp, id.agno,
>  				be32_to_cpu(agf->agf_length) - new,
>  				new, &oinfo);
>  		if (error)
>  			goto error0;
>  		error = xfs_free_extent(tp,
> -				XFS_AGB_TO_FSB(mp, agno,
> +				XFS_AGB_TO_FSB(mp, id.agno,
>  					be32_to_cpu(agf->agf_length) - new),
>  				new, &oinfo, XFS_AG_RESV_NONE);
>  		if (error)
> @@ -534,8 +583,8 @@ xfs_growfs_data_private(
>  	if (nb > mp->m_sb.sb_dblocks)
>  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
>  				 nb - mp->m_sb.sb_dblocks);
> -	if (nfree)
> -		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, nfree);
> +	if (id.nfree)
> +		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
>  	if (dpct)
>  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
>  	xfs_trans_set_sync(tp);
> @@ -562,7 +611,7 @@ xfs_growfs_data_private(
>  	if (new) {
>  		struct xfs_perag	*pag;
>  
> -		pag = xfs_perag_get(mp, agno);
> +		pag = xfs_perag_get(mp, id.agno);
>  		error = xfs_ag_resv_free(pag);
>  		xfs_perag_put(pag);
>  		if (error)
> -- 
> 2.15.1
> 


* Re: [PATCH 3/7] xfs: factor ag btree root block initialisation
  2018-02-08 18:54   ` Brian Foster
@ 2018-02-08 20:00     ` Darrick J. Wong
  2018-02-09 13:10       ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2018-02-08 20:00 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Thu, Feb 08, 2018 at 01:54:03PM -0500, Brian Foster wrote:
> On Thu, Feb 01, 2018 at 05:41:58PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Cookie cutter code, easily factored.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Seems sane, a couple factoring nits..
> 
> >  fs/xfs/xfs_fsops.c | 493 +++++++++++++++++++++++++++++------------------------
> >  1 file changed, 271 insertions(+), 222 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index d9e08d8cf9ac..44eac79e0b49 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> > @@ -71,46 +71,146 @@ xfs_growfs_get_hdr_buf(
> >  	return bp;
> >  }
> >  
> ...
> > +/*
> > + * Alloc btree root block init functions
> > + */
> > +static void
> > +xfs_bnoroot_init(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_buf		*bp,
> > +	struct aghdr_init_data	*id)
> >  {
> > -	struct xfs_agf		*agf;
> > -	struct xfs_agi		*agi;
> > -	struct xfs_agfl		*agfl;
> > -	__be32			*agfl_bno;
> >  	xfs_alloc_rec_t		*arec;
> 
> A couple more typedef instances to kill (here and cntroot_init() below).
> 
> > -	struct xfs_buf		*bp;
> > -	int			bucket;
> > -	xfs_extlen_t		tmpsize;
> > -	int			error = 0;
> > +
> > +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> > +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> > +					  be32_to_cpu(arec->ar_startblock));
> > +}
> > +
> > +static void
> > +xfs_cntroot_init(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_buf		*bp,
> > +	struct aghdr_init_data	*id)
> > +{
> > +	xfs_alloc_rec_t		*arec;
> > +
> > +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> > +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> > +					  be32_to_cpu(arec->ar_startblock));
> > +	id->nfree += be32_to_cpu(arec->ar_blockcount);
> 
> This seems unrelated to the cntbt. Perhaps move it to the parent
> function? It looks like all we need are mp->m_ag_prealloc_blocks and
> id->agsize, after all.
> 
> That also looks like the only difference between xfs_bnoroot_init() and
> xfs_cntroot_init(), fwiw, so we could condense those as well.
> 
> > +}
> > +
> ...
> > +/*
> > + * Write new AG headers to disk. Non-transactional, but written
> > + * synchronously so they are completed prior to the growfs transaction
> > + * being logged.
> > + */
> > +static int
> > +xfs_grow_ag_headers(
> > +	struct xfs_mount	*mp,
> > +	struct aghdr_init_data	*id)
> >  
> > -	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > -	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > -	arec->ar_blockcount = cpu_to_be32(
> > -		agsize - be32_to_cpu(arec->ar_startblock));
> > -	*nfree += be32_to_cpu(arec->ar_blockcount);
> > +{
> > +	int			error = 0;
> >  
> ...
> > +	/* BNO btree root block */
> > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp));
> > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > +	id->type = XFS_BTNUM_BNO;
> > +	id->numrecs = 1;
> 
> Do we really need to set numrecs for all of these calls? It looks out of
> place/context and inconsistently used to me. Case in point: we pass 1 to
> the space btree init functions which add a single record, but pass 0 to
> the rmapbt init function which actually adds up to 5 records (and
> increments the initial numrecs count).

I would've (will?) refactor all this to look like:

struct xfs_rmap_li {
	struct list_head	list;
	struct xfs_rmap_irec	rmap;
};

int
xfs_rmapbt_initialize(
	struct xfs_mount	*mp,
	struct xfs_buf		*agf_bp,
	struct xfs_buf		*root_bp,
	struct list_head	*rmaps)
{
	struct xfs_agf		*agf = XFS_AGF_BUF(agf_bp);
	struct xfs_rmap_li	*li;
	struct xfs_rmap_rec	*disk_rec;
	struct xfs_btree_block	*rmap_hdr;

	agf->rmaproot = cpu_to_be32(...);
	agf->rmaplevel = cpu_to_be32(1);

	disk_rec = xfs_get_rmap_entries(root_bp);
	rmap_hdr = root_bp->b_addr;
	list_for_each_entry(li, rmaps, list) {
		xfs_rmap_irec_to_disk(disk_rec, &li->rmap);
		disk_rec++;
		be16_add_cpu(&rmap_hdr->bb_numrecs, 1);
	}

	/* mark agf dirty, mark rootbp dirty */
	return 0;
}

So then you can call it via:

INIT_LIST_HEAD(&rec_list);
/* construct rec_list of records */
agf_bp = xfs_alloc_read_agf(...);
root_bp = xfs_buf_get(..., XFS_RMAP_BLOCK(mp));
error = xfs_rmapbt_initialize(mp, agf_bp, root_bp, &rec_list);

But I haven't had time to look through this in enough detail to figure
out how to merge it with the online repair stuff.  Maybe it doesn't even
make sense to merge them just to shave a few lines of header
initialization.

--D

> AFAICT, each initialization function knows how many records it's going
> to add. I don't see why that information needs to leak outside of init
> function context..?
> 
> Brian
> 
> > +	error = xfs_growfs_init_aghdr(mp, id, xfs_bnoroot_init,
> > +				   &xfs_allocbt_buf_ops);
> > +	if (error)
> > +		goto out_error;
> >  
> > -		/* account inode btree root blocks */
> > -		rrec = XFS_RMAP_REC_ADDR(block, 3);
> > -		rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
> > -		rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
> > -						XFS_IBT_BLOCK(mp));
> > -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
> > -		rrec->rm_offset = 0;
> > -		be16_add_cpu(&block->bb_numrecs, 1);
> >  
> > -		/* account for rmap btree root */
> > -		rrec = XFS_RMAP_REC_ADDR(block, 4);
> > -		rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
> > -		rrec->rm_blockcount = cpu_to_be32(1);
> > -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
> > -		rrec->rm_offset = 0;
> > -		be16_add_cpu(&block->bb_numrecs, 1);
> > +	/* CNT btree root block */
> > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp));
> > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > +	id->type = XFS_BTNUM_CNT;
> > +	id->numrecs = 1;
> > +	error = xfs_growfs_init_aghdr(mp, id, xfs_cntroot_init,
> > +				   &xfs_allocbt_buf_ops);
> > +	if (error)
> > +		goto out_error;
> >  
> > -		/* account for refc btree root */
> > -		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > -			rrec = XFS_RMAP_REC_ADDR(block, 5);
> > -			rrec->rm_startblock = cpu_to_be32(xfs_refc_block(mp));
> > -			rrec->rm_blockcount = cpu_to_be32(1);
> > -			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
> > -			rrec->rm_offset = 0;
> > -			be16_add_cpu(&block->bb_numrecs, 1);
> > -		}
> > +	/* RMAP btree root block */
> > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp));
> > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > +		id->type = XFS_BTNUM_RMAP;
> > +		id->numrecs = 0;
> > +		error = xfs_growfs_init_aghdr(mp, id, xfs_rmaproot_init,
> > +					   &xfs_rmapbt_buf_ops);
> > +		if (error)
> > +			goto out_error;
> >  
> > -		xfs_buf_delwri_queue(bp, buffer_list);
> > -		xfs_buf_relse(bp);
> >  	}
> >  
> > -	/*
> > -	 * INO btree root block
> > -	 */
> > -	bp = xfs_growfs_get_hdr_buf(mp,
> > -			XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
> > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > -			&xfs_inobt_buf_ops);
> > -	if (!bp) {
> > -		error = -ENOMEM;
> > +	/* INO btree root block */
> > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp));
> > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > +	id->type = XFS_BTNUM_INO;
> > +	id->numrecs = 0;
> > +	error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > +				   &xfs_inobt_buf_ops);
> > +	if (error)
> >  		goto out_error;
> > -	}
> >  
> > -	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
> > -	xfs_buf_delwri_queue(bp, buffer_list);
> > -	xfs_buf_relse(bp);
> >  
> >  	/*
> >  	 * FINO btree root block
> >  	 */
> >  	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> > -		bp = xfs_growfs_get_hdr_buf(mp,
> > -			XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
> > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > -			&xfs_inobt_buf_ops);
> > -		if (!bp) {
> > -			error = -ENOMEM;
> > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp));
> > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > +		id->type = XFS_BTNUM_FINO;
> > +		id->numrecs = 0;
> > +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > +					   &xfs_inobt_buf_ops);
> > +		if (error)
> >  			goto out_error;
> > -		}
> > -
> > -		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO, 0, 0, agno, 0);
> > -		xfs_buf_delwri_queue(bp, buffer_list);
> > -		xfs_buf_relse(bp);
> >  	}
> >  
> >  	/*
> >  	 * refcount btree root block
> >  	 */
> >  	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > -		bp = xfs_growfs_get_hdr_buf(mp,
> > -			XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
> > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > -			&xfs_refcountbt_buf_ops);
> > -		if (!bp) {
> > -			error = -ENOMEM;
> > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp));
> > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > +		id->type = XFS_BTNUM_REFC;
> > +		id->numrecs = 0;
> > +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > +					   &xfs_refcountbt_buf_ops);
> > +		if (error)
> >  			goto out_error;
> > -		}
> > -
> > -		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC, 0, 0, agno, 0);
> > -		xfs_buf_delwri_queue(bp, buffer_list);
> > -		xfs_buf_relse(bp);
> >  	}
> >  
> >  out_error:
> > @@ -384,7 +433,6 @@ xfs_growfs_data_private(
> >  	xfs_agf_t		*agf;
> >  	xfs_agi_t		*agi;
> >  	xfs_agnumber_t		agno;
> > -	xfs_extlen_t		agsize;
> >  	xfs_buf_t		*bp;
> >  	int			dpct;
> >  	int			error, saved_error = 0;
> > @@ -392,11 +440,11 @@ xfs_growfs_data_private(
> >  	xfs_agnumber_t		nagimax = 0;
> >  	xfs_rfsblock_t		nb, nb_mod;
> >  	xfs_rfsblock_t		new;
> > -	xfs_rfsblock_t		nfree;
> >  	xfs_agnumber_t		oagcount;
> >  	int			pct;
> >  	xfs_trans_t		*tp;
> >  	LIST_HEAD		(buffer_list);
> > +	struct aghdr_init_data	id = {};
> >  
> >  	nb = in->newblocks;
> >  	pct = in->imaxpct;
> > @@ -448,27 +496,28 @@ xfs_growfs_data_private(
> >  	 * list to write, we can cancel the entire list without having written
> >  	 * anything.
> >  	 */
> > -	nfree = 0;
> > -	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
> > -
> > -		if (agno == nagcount - 1)
> > -			agsize = nb -
> > -				(agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> > +	INIT_LIST_HEAD(&id.buffer_list);
> > +	for (id.agno = nagcount - 1;
> > +	     id.agno >= oagcount;
> > +	     id.agno--, new -= id.agsize) {
> > +
> > +		if (id.agno == nagcount - 1)
> > +			id.agsize = nb -
> > +				(id.agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> >  		else
> > -			agsize = mp->m_sb.sb_agblocks;
> > +			id.agsize = mp->m_sb.sb_agblocks;
> >  
> > -		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree,
> > -					    &buffer_list);
> > +		error = xfs_grow_ag_headers(mp, &id);
> >  		if (error) {
> > -			xfs_buf_delwri_cancel(&buffer_list);
> > +			xfs_buf_delwri_cancel(&id.buffer_list);
> >  			goto error0;
> >  		}
> >  	}
> > -	error = xfs_buf_delwri_submit(&buffer_list);
> > +	error = xfs_buf_delwri_submit(&id.buffer_list);
> >  	if (error)
> >  		goto error0;
> >  
> > -	xfs_trans_agblocks_delta(tp, nfree);
> > +	xfs_trans_agblocks_delta(tp, id.nfree);
> >  
> >  	/*
> >  	 * There are new blocks in the old last a.g.
> > @@ -479,7 +528,7 @@ xfs_growfs_data_private(
> >  		/*
> >  		 * Change the agi length.
> >  		 */
> > -		error = xfs_ialloc_read_agi(mp, tp, agno, &bp);
> > +		error = xfs_ialloc_read_agi(mp, tp, id.agno, &bp);
> >  		if (error) {
> >  			goto error0;
> >  		}
> > @@ -492,7 +541,7 @@ xfs_growfs_data_private(
> >  		/*
> >  		 * Change agf length.
> >  		 */
> > -		error = xfs_alloc_read_agf(mp, tp, agno, 0, &bp);
> > +		error = xfs_alloc_read_agf(mp, tp, id.agno, 0, &bp);
> >  		if (error) {
> >  			goto error0;
> >  		}
> > @@ -511,13 +560,13 @@ xfs_growfs_data_private(
> >  		 * this doesn't actually exist in the rmap btree.
> >  		 */
> >  		xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL);
> > -		error = xfs_rmap_free(tp, bp, agno,
> > +		error = xfs_rmap_free(tp, bp, id.agno,
> >  				be32_to_cpu(agf->agf_length) - new,
> >  				new, &oinfo);
> >  		if (error)
> >  			goto error0;
> >  		error = xfs_free_extent(tp,
> > -				XFS_AGB_TO_FSB(mp, agno,
> > +				XFS_AGB_TO_FSB(mp, id.agno,
> >  					be32_to_cpu(agf->agf_length) - new),
> >  				new, &oinfo, XFS_AG_RESV_NONE);
> >  		if (error)
> > @@ -534,8 +583,8 @@ xfs_growfs_data_private(
> >  	if (nb > mp->m_sb.sb_dblocks)
> >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> >  				 nb - mp->m_sb.sb_dblocks);
> > -	if (nfree)
> > -		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, nfree);
> > +	if (id.nfree)
> > +		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> >  	if (dpct)
> >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
> >  	xfs_trans_set_sync(tp);
> > @@ -562,7 +611,7 @@ xfs_growfs_data_private(
> >  	if (new) {
> >  		struct xfs_perag	*pag;
> >  
> > -		pag = xfs_perag_get(mp, agno);
> > +		pag = xfs_perag_get(mp, id.agno);
> >  		error = xfs_ag_resv_free(pag);
> >  		xfs_perag_put(pag);
> >  		if (error)
> > -- 
> > 2.15.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/7] xfs: factor ag btree root block initialisation
  2018-02-08 20:00     ` Darrick J. Wong
@ 2018-02-09 13:10       ` Brian Foster
  2018-02-12  0:45         ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2018-02-09 13:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Thu, Feb 08, 2018 at 12:00:07PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 08, 2018 at 01:54:03PM -0500, Brian Foster wrote:
> > On Thu, Feb 01, 2018 at 05:41:58PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Cookie cutter code, easily factored.
> > > 
> > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > ---
> > 
> > Seems sane, a couple factoring nits..
> > 
> > >  fs/xfs/xfs_fsops.c | 493 +++++++++++++++++++++++++++++------------------------
> > >  1 file changed, 271 insertions(+), 222 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > index d9e08d8cf9ac..44eac79e0b49 100644
> > > --- a/fs/xfs/xfs_fsops.c
> > > +++ b/fs/xfs/xfs_fsops.c
> > > @@ -71,46 +71,146 @@ xfs_growfs_get_hdr_buf(
> > >  	return bp;
> > >  }
> > >  
> > ...
> > > +/*
> > > + * Alloc btree root block init functions
> > > + */
> > > +static void
> > > +xfs_bnoroot_init(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_buf		*bp,
> > > +	struct aghdr_init_data	*id)
> > >  {
> > > -	struct xfs_agf		*agf;
> > > -	struct xfs_agi		*agi;
> > > -	struct xfs_agfl		*agfl;
> > > -	__be32			*agfl_bno;
> > >  	xfs_alloc_rec_t		*arec;
> > 
> > A couple more typedef instances to kill (here and cntroot_init() below).
> > 
> > > -	struct xfs_buf		*bp;
> > > -	int			bucket;
> > > -	xfs_extlen_t		tmpsize;
> > > -	int			error = 0;
> > > +
> > > +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> > > +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > > +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > > +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> > > +					  be32_to_cpu(arec->ar_startblock));
> > > +}
> > > +
> > > +static void
> > > +xfs_cntroot_init(
> > > +	struct xfs_mount	*mp,
> > > +	struct xfs_buf		*bp,
> > > +	struct aghdr_init_data	*id)
> > > +{
> > > +	xfs_alloc_rec_t		*arec;
> > > +
> > > +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> > > +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > > +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > > +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> > > +					  be32_to_cpu(arec->ar_startblock));
> > > +	id->nfree += be32_to_cpu(arec->ar_blockcount);
> > 
> > This seems unrelated to the cntbt. Perhaps move it to the parent
> > function? It looks like all we need are mp->m_ag_prealloc_blocks and
> > id->agsize, after all.
> > 
> > That also looks like the only difference between xfs_bnoroot_init() and
> > xfs_cntroot_init(), fwiw, so we could condense those as well.
> > 
> > > +}
> > > +
> > ...
> > > +/*
> > > + * Write new AG headers to disk. Non-transactional, but written
> > > + * synchronously so they are completed prior to the growfs transaction
> > > + * being logged.
> > > + */
> > > +static int
> > > +xfs_grow_ag_headers(
> > > +	struct xfs_mount	*mp,
> > > +	struct aghdr_init_data	*id)
> > >  
> > > -	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > > -	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > > -	arec->ar_blockcount = cpu_to_be32(
> > > -		agsize - be32_to_cpu(arec->ar_startblock));
> > > -	*nfree += be32_to_cpu(arec->ar_blockcount);
> > > +{
> > > +	int			error = 0;
> > >  
> > ...
> > > +	/* BNO btree root block */
> > > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp));
> > > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > +	id->type = XFS_BTNUM_BNO;
> > > +	id->numrecs = 1;
> > 
> > Do we really need to set numrecs for all of these calls? It looks out of
> > place/context and inconsistently used to me. Case in point: we pass 1 to
> > the space btree init functions which add a single record, but pass 0 to
> > the rmapbt init function which actually adds up to 5 records (and
> > increments the initial numrecs count).
> 
> I would've (will?) refactor all this to look like:
> 
> struct xfs_rmap_li {
> 	struct list_head	list;
> 	struct xfs_rmap_irec	rmap;
> };
> 
> int
> xfs_rmapbt_initialize(
> 	struct xfs_mount	*mp,
> 	struct xfs_buf		*agf_bp,
> 	struct xfs_buf		*root_bp,
> 	struct list_head	*rmaps)
> {
> 	struct xfs_agf		*agf = XFS_AGF_BUF(agf_bp);
> 	struct xfs_rmap_li	*li;
> 	struct xfs_rmap_rec	*disk_rec;
> 	struct xfs_btree_block	*rmap_hdr;
> 
> 	agf->rmaproot = cpu_to_be32(...);
> 	agf->rmaplevel = cpu_to_be32(1);
> 
> 	disk_rec = xfs_get_rmap_entries(root_bp);
> 	rmap_hdr = root_bp->b_addr;
> 	list_for_each_entry(li, rmaps, list) {
> 		xfs_rmap_irec_to_disk(disk_rec, &li->rmap);
> 		disk_rec++;
> 		be16_add_cpu(&rmap_hdr->bb_numrecs, 1);
> 	}
> 
> 	/* mark agf dirty, mark rootbp dirty */
> 	return 0;
> }
> 
> So then you can call it via:
> 
> INIT_LIST_HEAD(&rec_list);
> /* construct rec_list of records */
> agf_bp = xfs_alloc_read_agf(...);
> root_bp = xfs_buf_get(..., XFS_RMAP_BLOCK(mp));
> error = xfs_rmapbt_initialize(mp, agf_bp, root_bp, &rec_list);
> 
> But I haven't had time to look through this in enough detail to figure
> out how to merge it with the online repair stuff.  Maybe it doesn't even
> make sense to merge them just to shave a few lines of header
> initialization.
> 

Hm, that looks like it has the potential to change this a decent amount.
It's not clear to me how it would affect the growfs init function
interface. From the perspective of the original comment
(aghdr_init_data->numrecs), ISTM you'd still expect it to be zero in
this example, regardless of whether the init function hardcodes
insertion of a fixed number of entries or dynamically adds those (likely
the same number in the growfs case) entries.

So what is the status of this series then? Is it being tossed in favor
of something else that is pending?

Brian

> --D
> 
> > AFAICT, each initialization function knows how many records it's going
> > to add. I don't see why that information needs to leak outside of init
> > function context..?
> > 
> > Brian
> > 
> > > +	error = xfs_growfs_init_aghdr(mp, id, xfs_bnoroot_init,
> > > +				   &xfs_allocbt_buf_ops);
> > > +	if (error)
> > > +		goto out_error;
> > >  
> > > -		/* account inode btree root blocks */
> > > -		rrec = XFS_RMAP_REC_ADDR(block, 3);
> > > -		rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
> > > -		rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
> > > -						XFS_IBT_BLOCK(mp));
> > > -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
> > > -		rrec->rm_offset = 0;
> > > -		be16_add_cpu(&block->bb_numrecs, 1);
> > >  
> > > -		/* account for rmap btree root */
> > > -		rrec = XFS_RMAP_REC_ADDR(block, 4);
> > > -		rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
> > > -		rrec->rm_blockcount = cpu_to_be32(1);
> > > -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
> > > -		rrec->rm_offset = 0;
> > > -		be16_add_cpu(&block->bb_numrecs, 1);
> > > +	/* CNT btree root block */
> > > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp));
> > > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > +	id->type = XFS_BTNUM_CNT;
> > > +	id->numrecs = 1;
> > > +	error = xfs_growfs_init_aghdr(mp, id, xfs_cntroot_init,
> > > +				   &xfs_allocbt_buf_ops);
> > > +	if (error)
> > > +		goto out_error;
> > >  
> > > -		/* account for refc btree root */
> > > -		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > -			rrec = XFS_RMAP_REC_ADDR(block, 5);
> > > -			rrec->rm_startblock = cpu_to_be32(xfs_refc_block(mp));
> > > -			rrec->rm_blockcount = cpu_to_be32(1);
> > > -			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
> > > -			rrec->rm_offset = 0;
> > > -			be16_add_cpu(&block->bb_numrecs, 1);
> > > -		}
> > > +	/* RMAP btree root block */
> > > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp));
> > > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > +		id->type = XFS_BTNUM_RMAP;
> > > +		id->numrecs = 0;
> > > +		error = xfs_growfs_init_aghdr(mp, id, xfs_rmaproot_init,
> > > +					   &xfs_rmapbt_buf_ops);
> > > +		if (error)
> > > +			goto out_error;
> > >  
> > > -		xfs_buf_delwri_queue(bp, buffer_list);
> > > -		xfs_buf_relse(bp);
> > >  	}
> > >  
> > > -	/*
> > > -	 * INO btree root block
> > > -	 */
> > > -	bp = xfs_growfs_get_hdr_buf(mp,
> > > -			XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
> > > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > > -			&xfs_inobt_buf_ops);
> > > -	if (!bp) {
> > > -		error = -ENOMEM;
> > > +	/* INO btree root block */
> > > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp));
> > > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > +	id->type = XFS_BTNUM_INO;
> > > +	id->numrecs = 0;
> > > +	error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > > +				   &xfs_inobt_buf_ops);
> > > +	if (error)
> > >  		goto out_error;
> > > -	}
> > >  
> > > -	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
> > > -	xfs_buf_delwri_queue(bp, buffer_list);
> > > -	xfs_buf_relse(bp);
> > >  
> > >  	/*
> > >  	 * FINO btree root block
> > >  	 */
> > >  	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> > > -		bp = xfs_growfs_get_hdr_buf(mp,
> > > -			XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
> > > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > > -			&xfs_inobt_buf_ops);
> > > -		if (!bp) {
> > > -			error = -ENOMEM;
> > > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp));
> > > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > +		id->type = XFS_BTNUM_FINO;
> > > +		id->numrecs = 0;
> > > +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > > +					   &xfs_inobt_buf_ops);
> > > +		if (error)
> > >  			goto out_error;
> > > -		}
> > > -
> > > -		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO, 0, 0, agno, 0);
> > > -		xfs_buf_delwri_queue(bp, buffer_list);
> > > -		xfs_buf_relse(bp);
> > >  	}
> > >  
> > >  	/*
> > >  	 * refcount btree root block
> > >  	 */
> > >  	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > -		bp = xfs_growfs_get_hdr_buf(mp,
> > > -			XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
> > > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > > -			&xfs_refcountbt_buf_ops);
> > > -		if (!bp) {
> > > -			error = -ENOMEM;
> > > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp));
> > > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > +		id->type = XFS_BTNUM_REFC;
> > > +		id->numrecs = 0;
> > > +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > > +					   &xfs_refcountbt_buf_ops);
> > > +		if (error)
> > >  			goto out_error;
> > > -		}
> > > -
> > > -		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC, 0, 0, agno, 0);
> > > -		xfs_buf_delwri_queue(bp, buffer_list);
> > > -		xfs_buf_relse(bp);
> > >  	}
> > >  
> > >  out_error:
> > > @@ -384,7 +433,6 @@ xfs_growfs_data_private(
> > >  	xfs_agf_t		*agf;
> > >  	xfs_agi_t		*agi;
> > >  	xfs_agnumber_t		agno;
> > > -	xfs_extlen_t		agsize;
> > >  	xfs_buf_t		*bp;
> > >  	int			dpct;
> > >  	int			error, saved_error = 0;
> > > @@ -392,11 +440,11 @@ xfs_growfs_data_private(
> > >  	xfs_agnumber_t		nagimax = 0;
> > >  	xfs_rfsblock_t		nb, nb_mod;
> > >  	xfs_rfsblock_t		new;
> > > -	xfs_rfsblock_t		nfree;
> > >  	xfs_agnumber_t		oagcount;
> > >  	int			pct;
> > >  	xfs_trans_t		*tp;
> > >  	LIST_HEAD		(buffer_list);
> > > +	struct aghdr_init_data	id = {};
> > >  
> > >  	nb = in->newblocks;
> > >  	pct = in->imaxpct;
> > > @@ -448,27 +496,28 @@ xfs_growfs_data_private(
> > >  	 * list to write, we can cancel the entire list without having written
> > >  	 * anything.
> > >  	 */
> > > -	nfree = 0;
> > > -	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
> > > -
> > > -		if (agno == nagcount - 1)
> > > -			agsize = nb -
> > > -				(agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> > > +	INIT_LIST_HEAD(&id.buffer_list);
> > > +	for (id.agno = nagcount - 1;
> > > +	     id.agno >= oagcount;
> > > +	     id.agno--, new -= id.agsize) {
> > > +
> > > +		if (id.agno == nagcount - 1)
> > > +			id.agsize = nb -
> > > +				(id.agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> > >  		else
> > > -			agsize = mp->m_sb.sb_agblocks;
> > > +			id.agsize = mp->m_sb.sb_agblocks;
> > >  
> > > -		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree,
> > > -					    &buffer_list);
> > > +		error = xfs_grow_ag_headers(mp, &id);
> > >  		if (error) {
> > > -			xfs_buf_delwri_cancel(&buffer_list);
> > > +			xfs_buf_delwri_cancel(&id.buffer_list);
> > >  			goto error0;
> > >  		}
> > >  	}
> > > -	error = xfs_buf_delwri_submit(&buffer_list);
> > > +	error = xfs_buf_delwri_submit(&id.buffer_list);
> > >  	if (error)
> > >  		goto error0;
> > >  
> > > -	xfs_trans_agblocks_delta(tp, nfree);
> > > +	xfs_trans_agblocks_delta(tp, id.nfree);
> > >  
> > >  	/*
> > >  	 * There are new blocks in the old last a.g.
> > > @@ -479,7 +528,7 @@ xfs_growfs_data_private(
> > >  		/*
> > >  		 * Change the agi length.
> > >  		 */
> > > -		error = xfs_ialloc_read_agi(mp, tp, agno, &bp);
> > > +		error = xfs_ialloc_read_agi(mp, tp, id.agno, &bp);
> > >  		if (error) {
> > >  			goto error0;
> > >  		}
> > > @@ -492,7 +541,7 @@ xfs_growfs_data_private(
> > >  		/*
> > >  		 * Change agf length.
> > >  		 */
> > > -		error = xfs_alloc_read_agf(mp, tp, agno, 0, &bp);
> > > +		error = xfs_alloc_read_agf(mp, tp, id.agno, 0, &bp);
> > >  		if (error) {
> > >  			goto error0;
> > >  		}
> > > @@ -511,13 +560,13 @@ xfs_growfs_data_private(
> > >  		 * this doesn't actually exist in the rmap btree.
> > >  		 */
> > >  		xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL);
> > > -		error = xfs_rmap_free(tp, bp, agno,
> > > +		error = xfs_rmap_free(tp, bp, id.agno,
> > >  				be32_to_cpu(agf->agf_length) - new,
> > >  				new, &oinfo);
> > >  		if (error)
> > >  			goto error0;
> > >  		error = xfs_free_extent(tp,
> > > -				XFS_AGB_TO_FSB(mp, agno,
> > > +				XFS_AGB_TO_FSB(mp, id.agno,
> > >  					be32_to_cpu(agf->agf_length) - new),
> > >  				new, &oinfo, XFS_AG_RESV_NONE);
> > >  		if (error)
> > > @@ -534,8 +583,8 @@ xfs_growfs_data_private(
> > >  	if (nb > mp->m_sb.sb_dblocks)
> > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > >  				 nb - mp->m_sb.sb_dblocks);
> > > -	if (nfree)
> > > -		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, nfree);
> > > +	if (id.nfree)
> > > +		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> > >  	if (dpct)
> > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
> > >  	xfs_trans_set_sync(tp);
> > > @@ -562,7 +611,7 @@ xfs_growfs_data_private(
> > >  	if (new) {
> > >  		struct xfs_perag	*pag;
> > >  
> > > -		pag = xfs_perag_get(mp, agno);
> > > +		pag = xfs_perag_get(mp, id.agno);
> > >  		error = xfs_ag_resv_free(pag);
> > >  		xfs_perag_put(pag);
> > >  		if (error)
> > > -- 
> > > 2.15.1
> > > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/7] xfs: turn ag header initialisation into a table driven operation
  2018-02-01  6:41 ` [PATCH 4/7] xfs: turn ag header initialisation into a table driven operation Dave Chinner
@ 2018-02-09 16:11   ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2018-02-09 16:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Feb 01, 2018 at 05:41:59PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There's still more cookie cutter code in setting up each AG header.
> Separate all the variables into a simple structure and iterate a
> table of header definitions to initialise everything.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---

Seems fine, modulo previous refactoring comments:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_fsops.c | 163 ++++++++++++++++++++++-------------------------------
>  1 file changed, 66 insertions(+), 97 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 44eac79e0b49..94650b7d517e 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -288,12 +288,13 @@ xfs_agiblock_init(
>  		agi->agi_unlinked[bucket] = cpu_to_be32(NULLAGINO);
>  }
>  
> +typedef void (*aghdr_init_work_f)(struct xfs_mount *mp, struct xfs_buf *bp,
> +				  struct aghdr_init_data *id);
>  static int
>  xfs_growfs_init_aghdr(
>  	struct xfs_mount	*mp,
>  	struct aghdr_init_data	*id,
> -	void			(*work)(struct xfs_mount *, struct xfs_buf *,
> -					struct aghdr_init_data *),
> +	aghdr_init_work_f	work,
>  	const struct xfs_buf_ops *ops)
>  
>  {
> @@ -310,6 +311,16 @@ xfs_growfs_init_aghdr(
>  	return 0;
>  }
>  
> +struct xfs_aghdr_grow_data {
> +	xfs_daddr_t		daddr;
> +	size_t			numblks;
> +	const struct xfs_buf_ops *ops;
> +	aghdr_init_work_f	work;
> +	xfs_btnum_t		type;
> +	int			numrecs;
> +	bool			need_init;
> +};
> +
>  /*
>   * Write new AG headers to disk. Non-transactional, but written
>   * synchronously so they are completed prior to the growfs transaction
> @@ -321,107 +332,65 @@ xfs_grow_ag_headers(
>  	struct aghdr_init_data	*id)
>  
>  {
> +	struct xfs_aghdr_grow_data aghdr_data[] = {
> +		/* AGF */
> +		{ XFS_AG_DADDR(mp, id->agno, XFS_AGF_DADDR(mp)),
> +		  XFS_FSS_TO_BB(mp, 1), &xfs_agf_buf_ops,
> +		  &xfs_agfblock_init, 0, 0, true },
> +		/* AGFL */
> +		{ XFS_AG_DADDR(mp, id->agno, XFS_AGFL_DADDR(mp)),
> +		  XFS_FSS_TO_BB(mp, 1), &xfs_agfl_buf_ops,
> +		  &xfs_agflblock_init, 0, 0, true },
> +		/* AGI */
> +		{ XFS_AG_DADDR(mp, id->agno, XFS_AGI_DADDR(mp)),
> +		  XFS_FSS_TO_BB(mp, 1), &xfs_agi_buf_ops,
> +		  &xfs_agiblock_init, 0, 0, true },
> +		/* BNO root block */
> +		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp)),
> +		  BTOBB(mp->m_sb.sb_blocksize), &xfs_allocbt_buf_ops,
> +		  &xfs_bnoroot_init, XFS_BTNUM_BNO, 1, true },
> +		/* CNT root block */
> +		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp)),
> +		  BTOBB(mp->m_sb.sb_blocksize), &xfs_allocbt_buf_ops,
> +		  &xfs_cntroot_init, XFS_BTNUM_CNT, 1, true },
> +		/* INO root block */
> +		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp)),
> +		  BTOBB(mp->m_sb.sb_blocksize), &xfs_inobt_buf_ops,
> +		  &xfs_btroot_init, XFS_BTNUM_INO, 0, true },
> +		/* FINO root block */
> +		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp)),
> +		  BTOBB(mp->m_sb.sb_blocksize), &xfs_inobt_buf_ops,
> +		  &xfs_btroot_init, XFS_BTNUM_FINO, 0,
> +		  xfs_sb_version_hasfinobt(&mp->m_sb) },
> +		/* RMAP root block */
> +		{ XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp)),
> +		  BTOBB(mp->m_sb.sb_blocksize), &xfs_rmapbt_buf_ops,
> +		  &xfs_rmaproot_init, XFS_BTNUM_RMAP, 0,
> +		  xfs_sb_version_hasrmapbt(&mp->m_sb) },
> +		/* REFC root block */
> +		{ XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp)),
> +		  BTOBB(mp->m_sb.sb_blocksize), &xfs_refcountbt_buf_ops,
> +		  &xfs_btroot_init, XFS_BTNUM_REFC, 0,
> +		  xfs_sb_version_hasreflink(&mp->m_sb) },
> +		/* NULL terminating block */
> +		{ XFS_BUF_DADDR_NULL, 0, NULL, NULL, 0, 0, false },
> +	};
> +	struct  xfs_aghdr_grow_data *dp;
>  	int			error = 0;
>  
> -	/* AG freespace header block */
> -	id->daddr = XFS_AG_DADDR(mp, id->agno, XFS_AGF_DADDR(mp));
> -	id->numblks = XFS_FSS_TO_BB(mp, 1);
> -	error = xfs_growfs_init_aghdr(mp, id, xfs_agfblock_init,
> -					&xfs_agf_buf_ops);
> -	if (error)
> -		goto out_error;
> -
> -	/* AG freelist header block */
> -	id->daddr = XFS_AG_DADDR(mp, id->agno, XFS_AGFL_DADDR(mp));
> -	id->numblks = XFS_FSS_TO_BB(mp, 1);
> -	error = xfs_growfs_init_aghdr(mp, id, xfs_agflblock_init,
> -					&xfs_agfl_buf_ops);
> -	if (error)
> -		goto out_error;
> -
> -	/* AG inode header block */
> -	id->daddr = XFS_AG_DADDR(mp, id->agno, XFS_AGI_DADDR(mp));
> -	id->numblks = XFS_FSS_TO_BB(mp, 1);
> -	error = xfs_growfs_init_aghdr(mp, id, xfs_agiblock_init,
> -					&xfs_agi_buf_ops);
> -	if (error)
> -		goto out_error;
> -
> -
> -	/* BNO btree root block */
> -	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp));
> -	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> -	id->type = XFS_BTNUM_BNO;
> -	id->numrecs = 1;
> -	error = xfs_growfs_init_aghdr(mp, id, xfs_bnoroot_init,
> -				   &xfs_allocbt_buf_ops);
> -	if (error)
> -		goto out_error;
> -
> -
> -	/* CNT btree root block */
> -	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp));
> -	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> -	id->type = XFS_BTNUM_CNT;
> -	id->numrecs = 1;
> -	error = xfs_growfs_init_aghdr(mp, id, xfs_cntroot_init,
> -				   &xfs_allocbt_buf_ops);
> -	if (error)
> -		goto out_error;
> -
> -	/* RMAP btree root block */
> -	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> -		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp));
> -		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> -		id->type = XFS_BTNUM_RMAP;
> -		id->numrecs = 0;
> -		error = xfs_growfs_init_aghdr(mp, id, xfs_rmaproot_init,
> -					   &xfs_rmapbt_buf_ops);
> -		if (error)
> -			goto out_error;
> -
> -	}
> -
> -	/* INO btree root block */
> -	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp));
> -	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> -	id->type = XFS_BTNUM_INO;
> -	id->numrecs = 0;
> -	error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> -				   &xfs_inobt_buf_ops);
> -	if (error)
> -		goto out_error;
> -
> -
> -	/*
> -	 * FINO btree root block
> -	 */
> -	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> -		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp));
> -		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> -		id->type = XFS_BTNUM_FINO;
> -		id->numrecs = 0;
> -		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> -					   &xfs_inobt_buf_ops);
> -		if (error)
> -			goto out_error;
> -	}
> +	for (dp = &aghdr_data[0]; dp->daddr != XFS_BUF_DADDR_NULL; dp++) {
> +		if (!dp->need_init)
> +			continue;
>  
> -	/*
> -	 * refcount btree root block
> -	 */
> -	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> -		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp));
> -		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> -		id->type = XFS_BTNUM_REFC;
> -		id->numrecs = 0;
> -		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> -					   &xfs_refcountbt_buf_ops);
> +		id->daddr = dp->daddr;
> +		id->numblks = dp->numblks;
> +		id->numrecs = dp->numrecs;
> +		id->type = dp->type;
> +		error = xfs_growfs_init_aghdr(mp, id, dp->work, dp->ops);
>  		if (error)
> -			goto out_error;
> +			break;
>  	}
>  
> -out_error:
>  	return error;
>  }
>  
> -- 
> 2.15.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/7] xfs: make imaxpct changes in growfs separate
  2018-02-01  6:42 ` [PATCH 5/7] xfs: make imaxpct changes in growfs separate Dave Chinner
@ 2018-02-09 16:11   ` Brian Foster
  2018-02-15 22:10     ` Dave Chinner
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2018-02-09 16:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Feb 01, 2018 at 05:42:00PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When growfs changes the imaxpct value of the filesystem, it runs
> through all the "change size" growfs code, whether it needs to or
> not. Separate out changing imaxpct into its own function and
> transaction to simplify the rest of the growfs code.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_fsops.c | 67 +++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 49 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 94650b7d517e..5c844e540320 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
...
> @@ -673,25 +661,68 @@ xfs_growfs_log_private(
...
>  int
>  xfs_growfs_data(
> -	xfs_mount_t		*mp,
> -	xfs_growfs_data_t	*in)
> +	struct xfs_mount	*mp,
> +	struct xfs_growfs_data	*in)
>  {
> -	int error;
> +	int			error = 0;
>  
>  	if (!capable(CAP_SYS_ADMIN))
>  		return -EPERM;
>  	if (!mutex_trylock(&mp->m_growlock))
>  		return -EWOULDBLOCK;
> +
> +	/* update imaxpct seperately to the physical grow of the filesystem */

separately

> +	if (in->imaxpct != mp->m_sb.sb_imax_pct) {
> +		error = xfs_growfs_imaxpct(mp, in->imaxpct);
> +		if (error)
> +			goto out_error;
> +	}
> +
>  	error = xfs_growfs_data_private(mp, in);
> +	if (error)
> +		goto out_error;

The 'xfs_growfs -m <maxpct>' use case typically doesn't involve a size
change. With this change, there's no reason to run through
xfs_growfs_data_private() if in.newblocks == mp->m_sb.sb_dblocks, right?
Otherwise this seems fine.

Brian

> +
> +	/*
> +	 * Post growfs calculations needed to reflect new state in operations
> +	 */
> +	if (mp->m_sb.sb_imax_pct) {
> +		uint64_t icount = mp->m_sb.sb_dblocks * mp->m_sb.sb_imax_pct;
> +		do_div(icount, 100);
> +		mp->m_maxicount = icount << mp->m_sb.sb_inopblog;
> +	} else
> +		mp->m_maxicount = 0;
> +
> +out_error:
>  	/*
>  	 * Increment the generation unconditionally, the error could be from
>  	 * updating the secondary superblocks, in which case the new size
> -- 
> 2.15.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/7] xfs: separate secondary sb update in growfs
  2018-02-01  6:42 ` [PATCH 6/7] xfs: separate secondary sb update in growfs Dave Chinner
@ 2018-02-09 16:11   ` Brian Foster
  2018-02-15 22:23     ` Dave Chinner
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2018-02-09 16:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Feb 01, 2018 at 05:42:01PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> This happens after all the transactions to update the superblock
> occur, and errors need to be handled slightly differently. Seperate

Separate

> out the code into its own function, and clean up the error goto
> stack in the core growfs code as it is now much simpler.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_fsops.c | 154 ++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 87 insertions(+), 67 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 5c844e540320..113be7dbdc81 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
...
> @@ -572,16 +572,79 @@ xfs_growfs_data_private(
>  		error = xfs_ag_resv_free(pag);
>  		xfs_perag_put(pag);
>  		if (error)
> -			goto out;
> +			return error;
>  	}
>  
>  	/* Reserve AG metadata blocks. */
> -	error = xfs_fs_reserve_ag_blocks(mp);
> -	if (error && error != -ENOSPC)
> -		goto out;
> +	return xfs_fs_reserve_ag_blocks(mp);

It looks like we change the semantics of -ENOSPC during perag
reservation init. No mention of whether this is intentional and/or
why..?

Brian

> +
> +out_trans_cancel:
> +	xfs_trans_cancel(tp);
> +	return error;
> +}
> +
> +static int
> +xfs_growfs_log_private(
> +	xfs_mount_t		*mp,	/* mount point for filesystem */
> +	xfs_growfs_log_t	*in)	/* growfs log input struct */
> +{
> +	xfs_extlen_t		nb;
> +
> +	nb = in->newblocks;
> +	if (nb < XFS_MIN_LOG_BLOCKS || nb < XFS_B_TO_FSB(mp, XFS_MIN_LOG_BYTES))
> +		return -EINVAL;
> +	if (nb == mp->m_sb.sb_logblocks &&
> +	    in->isint == (mp->m_sb.sb_logstart != 0))
> +		return -EINVAL;
> +	/*
> +	 * Moving the log is hard, need new interfaces to sync
> +	 * the log first, hold off all activity while moving it.
> +	 * Can have shorter or longer log in the same space,
> +	 * or transform internal to external log or vice versa.
> +	 */
> +	return -ENOSYS;
> +}
> +
> +static int
> +xfs_growfs_imaxpct(
> +	struct xfs_mount	*mp,
> +	__u32			imaxpct)
> +{
> +	struct xfs_trans	*tp;
> +	int			dpct;
> +	int			error;
> +
> +	if (imaxpct > 100)
> +		return -EINVAL;
> +
> +	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> +			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> +	if (error)
> +		return error;
> +
> +	dpct = imaxpct - mp->m_sb.sb_imax_pct;
> +	xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
> +	xfs_trans_set_sync(tp);
> +	return xfs_trans_commit(tp);
> +}
> +
> +/*
> + * After a grow operation, we need to update all the secondary superblocks
> + * to match the new state of the primary. Read/init the superblocks and update
> + * them appropriately.
> + */
> +static int
> +xfs_growfs_update_superblocks(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		oagcount)
> +{
> +	struct xfs_buf		*bp;
> +	xfs_agnumber_t		agno;
> +	int			saved_error = 0;
> +	int			error = 0;
>  
>  	/* update secondary superblocks. */
> -	for (agno = 1; agno < nagcount; agno++) {
> +	for (agno = 1; agno < mp->m_sb.sb_agcount; agno++) {
>  		error = 0;
>  		/*
>  		 * new secondary superblocks need to be zeroed, not read from
> @@ -631,57 +694,7 @@ xfs_growfs_data_private(
>  		}
>  	}
>  
> - out:
>  	return saved_error ? saved_error : error;
> -
> - error0:
> -	xfs_trans_cancel(tp);
> -	return error;
> -}
> -
> -static int
> -xfs_growfs_log_private(
> -	xfs_mount_t		*mp,	/* mount point for filesystem */
> -	xfs_growfs_log_t	*in)	/* growfs log input struct */
> -{
> -	xfs_extlen_t		nb;
> -
> -	nb = in->newblocks;
> -	if (nb < XFS_MIN_LOG_BLOCKS || nb < XFS_B_TO_FSB(mp, XFS_MIN_LOG_BYTES))
> -		return -EINVAL;
> -	if (nb == mp->m_sb.sb_logblocks &&
> -	    in->isint == (mp->m_sb.sb_logstart != 0))
> -		return -EINVAL;
> -	/*
> -	 * Moving the log is hard, need new interfaces to sync
> -	 * the log first, hold off all activity while moving it.
> -	 * Can have shorter or longer log in the same space,
> -	 * or transform internal to external log or vice versa.
> -	 */
> -	return -ENOSYS;
> -}
> -
> -static int
> -xfs_growfs_imaxpct(
> -	struct xfs_mount	*mp,
> -	__u32			imaxpct)
> -{
> -	struct xfs_trans	*tp;
> -	int64_t			dpct;
> -	int			error;
> -
> -	if (imaxpct > 100)
> -		return -EINVAL;
> -
> -	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata,
> -			XFS_GROWFS_SPACE_RES(mp), 0, XFS_TRANS_RESERVE, &tp);
> -	if (error)
> -		return error;
> -
> -	dpct = (int64_t)imaxpct - mp->m_sb.sb_imax_pct;
> -	xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
> -	xfs_trans_set_sync(tp);
> -	return xfs_trans_commit(tp);
>  }
>  
>  /*
> @@ -694,6 +707,7 @@ xfs_growfs_data(
>  	struct xfs_mount	*mp,
>  	struct xfs_growfs_data	*in)
>  {
> +	xfs_agnumber_t		oagcount;
>  	int			error = 0;
>  
>  	if (!capable(CAP_SYS_ADMIN))
> @@ -708,6 +722,7 @@ xfs_growfs_data(
>  			goto out_error;
>  	}
>  
> +	oagcount = mp->m_sb.sb_agcount;
>  	error = xfs_growfs_data_private(mp, in);
>  	if (error)
>  		goto out_error;
> @@ -722,6 +737,11 @@ xfs_growfs_data(
>  	} else
>  		mp->m_maxicount = 0;
>  
> +	/*
> +	 * Update secondary superblocks now the physical grow has completed
> +	 */
> +	error = xfs_growfs_update_superblocks(mp, oagcount);
> +
>  out_error:
>  	/*
>  	 * Increment the generation unconditionally, the error could be from
> -- 
> 2.15.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-01  6:42 ` [PATCH 7/7] xfs: rework secondary superblock updates " Dave Chinner
@ 2018-02-09 16:12   ` Brian Foster
  2018-02-15 22:31     ` Dave Chinner
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2018-02-09 16:12 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Feb 01, 2018 at 05:42:02PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Right now we wait until we've committed changes to the primary
> superblock before we initialise any of the new secondary
> superblocks. This means that if we have any write errors for new
> secondary superblocks we end up with garbage in place rather than
> zeros or even an "in progress" superblock to indicate a grow
> operation is being done.
> 
> To ensure we can write the secondary superblocks, initialise them
> earlier in the same loop that initialises the AG headers. We stamp
> the new secondary superblocks here with the old geometry, but set
> the "sb_inprogress" field to indicate that updates are being done to
> the superblock so they cannot be used.  This will result in the
> secondary superblock fields being updated or triggering errors that
> will abort the grow before we commit any permanent changes.
> 
> This also means we can change the update mechanism of the secondary
> superblocks.  We know that we are going to wholly overwrite the
> information in the struct xfs_sb in the buffer, so there's no point
> reading it from disk. Just allocate an uncached buffer, zero it in
> memory, stamp the new superblock structure in it and write it out.
> If we fail to write it out, then we'll leave the existing sb (old or
> new w/ inprogress) on disk for repair to deal with later.
> 
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_fsops.c | 92 ++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 55 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 113be7dbdc81..7318cebb591d 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
...
> @@ -630,43 +653,27 @@ xfs_growfs_imaxpct(
>  
...
>  static int
>  xfs_growfs_update_superblocks(
...
>  	/* update secondary superblocks. */
>  	for (agno = 1; agno < mp->m_sb.sb_agcount; agno++) {
> -		error = 0;
> -		/*
> -		 * new secondary superblocks need to be zeroed, not read from
> -		 * disk as the contents of the new area we are growing into is
> -		 * completely unknown.
> -		 */
> -		if (agno < oagcount) {
> -			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
> -				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> -				  XFS_FSS_TO_BB(mp, 1), 0, &bp,
> -				  &xfs_sb_buf_ops);
> -		} else {
> -			bp = xfs_trans_get_buf(NULL, mp->m_ddev_targp,
> -				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> -				  XFS_FSS_TO_BB(mp, 1), 0);
> -			if (bp) {
> -				bp->b_ops = &xfs_sb_buf_ops;
> -				xfs_buf_zero(bp, 0, BBTOB(bp->b_length));
> -			} else
> -				error = -ENOMEM;
> -		}
> +		struct xfs_buf		*bp;
>  
> +		bp = xfs_growfs_get_hdr_buf(mp,
> +				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
> +				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);

This all seems fine to me up until the point where we use uncached
buffers for pre-existing secondary superblocks. This may all be fine now
if nothing else happens to access/use secondary supers, but it seems
like this essentially enforces that going forward.

Hmm, I see that scrub does appear to look at secondary superblocks via
cached buffers. Shouldn't we expect this path to maintain coherency with
an sb buffer that may have been read/cached from there?

Brian

>  		/*
>  		 * If we get an error reading or writing alternate superblocks,
>  		 * continue.  xfs_repair chooses the "best" superblock based
> @@ -674,25 +681,38 @@ xfs_growfs_update_superblocks(
>  		 * superblocks un-updated than updated, and xfs_repair may
>  		 * pick them over the properly-updated primary.
>  		 */
> -		if (error) {
> +		if (!bp) {
>  			xfs_warn(mp,
> -		"error %d reading secondary superblock for ag %d",
> -				error, agno);
> -			saved_error = error;
> +		"error allocating secondary superblock for ag %d",
> +				agno);
> +			if (!saved_error)
> +				saved_error = -ENOMEM;
>  			continue;
>  		}
>  		xfs_sb_to_disk(XFS_BUF_TO_SBP(bp), &mp->m_sb);
> -
> -		error = xfs_bwrite(bp);
> +		xfs_buf_delwri_queue(bp, &buffer_list);
>  		xfs_buf_relse(bp);
> +
> +		/* don't hold too many buffers at once */
> +		if (agno % 16)
> +			continue;
> +
> +		error = xfs_buf_delwri_submit(&buffer_list);
>  		if (error) {
>  			xfs_warn(mp,
> -		"write error %d updating secondary superblock for ag %d",
> +		"write error %d updating a secondary superblock near ag %d",
>  				error, agno);
> -			saved_error = error;
> +			if (!saved_error)
> +				saved_error = error;
>  			continue;
>  		}
>  	}
> +	error = xfs_buf_delwri_submit(&buffer_list);
> +	if (error) {
> +		xfs_warn(mp,
> +		"write error %d updating a secondary superblock near ag %d",
> +			error, agno);
> +	}
>  
>  	return saved_error ? saved_error : error;
>  }
> @@ -707,7 +727,6 @@ xfs_growfs_data(
>  	struct xfs_mount	*mp,
>  	struct xfs_growfs_data	*in)
>  {
> -	xfs_agnumber_t		oagcount;
>  	int			error = 0;
>  
>  	if (!capable(CAP_SYS_ADMIN))
> @@ -722,7 +741,6 @@ xfs_growfs_data(
>  			goto out_error;
>  	}
>  
> -	oagcount = mp->m_sb.sb_agcount;
>  	error = xfs_growfs_data_private(mp, in);
>  	if (error)
>  		goto out_error;
> @@ -740,7 +758,7 @@ xfs_growfs_data(
>  	/*
>  	 * Update secondary superblocks now the physical grow has completed
>  	 */
> -	error = xfs_growfs_update_superblocks(mp, oagcount);
> +	error = xfs_growfs_update_superblocks(mp);
>  
>  out_error:
>  	/*
> -- 
> 2.15.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/7] xfs: factor ag btree reoot block initialisation
  2018-02-09 13:10       ` Brian Foster
@ 2018-02-12  0:45         ` Darrick J. Wong
  2018-02-15  5:53           ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2018-02-12  0:45 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Fri, Feb 09, 2018 at 08:10:10AM -0500, Brian Foster wrote:
> On Thu, Feb 08, 2018 at 12:00:07PM -0800, Darrick J. Wong wrote:
> > On Thu, Feb 08, 2018 at 01:54:03PM -0500, Brian Foster wrote:
> > > On Thu, Feb 01, 2018 at 05:41:58PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Cookie cutter code, easily factored.
> > > > 
> > > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > 
> > > Seems sane, a couple factoring nits..
> > > 
> > > >  fs/xfs/xfs_fsops.c | 493 +++++++++++++++++++++++++++++------------------------
> > > >  1 file changed, 271 insertions(+), 222 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > > index d9e08d8cf9ac..44eac79e0b49 100644
> > > > --- a/fs/xfs/xfs_fsops.c
> > > > +++ b/fs/xfs/xfs_fsops.c
> > > > @@ -71,46 +71,146 @@ xfs_growfs_get_hdr_buf(
> > > >  	return bp;
> > > >  }
> > > >  
> > > ...
> > > > +/*
> > > > + * Alloc btree root block init functions
> > > > + */
> > > > +static void
> > > > +xfs_bnoroot_init(
> > > > +	struct xfs_mount	*mp,
> > > > +	struct xfs_buf		*bp,
> > > > +	struct aghdr_init_data	*id)
> > > >  {
> > > > -	struct xfs_agf		*agf;
> > > > -	struct xfs_agi		*agi;
> > > > -	struct xfs_agfl		*agfl;
> > > > -	__be32			*agfl_bno;
> > > >  	xfs_alloc_rec_t		*arec;
> > > 
> > > A couple more typedef instances to kill (here and cntroot_init() below).
> > > 
> > > > -	struct xfs_buf		*bp;
> > > > -	int			bucket;
> > > > -	xfs_extlen_t		tmpsize;
> > > > -	int			error = 0;
> > > > +
> > > > +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> > > > +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > > > +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > > > +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> > > > +					  be32_to_cpu(arec->ar_startblock));
> > > > +}
> > > > +
> > > > +static void
> > > > +xfs_cntroot_init(
> > > > +	struct xfs_mount	*mp,
> > > > +	struct xfs_buf		*bp,
> > > > +	struct aghdr_init_data	*id)
> > > > +{
> > > > +	xfs_alloc_rec_t		*arec;
> > > > +
> > > > +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> > > > +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > > > +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > > > +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> > > > +					  be32_to_cpu(arec->ar_startblock));
> > > > +	id->nfree += be32_to_cpu(arec->ar_blockcount);
> > > 
> > > This seems unrelated to the cntbt. Perhaps move it to the parent
> > > function? It looks like all we need are mp->m_ag_prealloc_blocks and
> > > id->agsize, after all.
> > > 
> > > That also looks like the only difference between xfs_bnoroot_init() and
> > > xfs_cntroot_init(), fwiw, so we could condense those as well.
> > > 
> > > > +}
> > > > +
> > > ...
> > > > +/*
> > > > + * Write new AG headers to disk. Non-transactional, but written
> > > > + * synchronously so they are completed prior to the growfs transaction
> > > > + * being logged.
> > > > + */
> > > > +static int
> > > > +xfs_grow_ag_headers(
> > > > +	struct xfs_mount	*mp,
> > > > +	struct aghdr_init_data	*id)
> > > >  
> > > > -	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > > > -	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > > > -	arec->ar_blockcount = cpu_to_be32(
> > > > -		agsize - be32_to_cpu(arec->ar_startblock));
> > > > -	*nfree += be32_to_cpu(arec->ar_blockcount);
> > > > +{
> > > > +	int			error = 0;
> > > >  
> > > ...
> > > > +	/* BNO btree root block */
> > > > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp));
> > > > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > +	id->type = XFS_BTNUM_BNO;
> > > > +	id->numrecs = 1;
> > > 
> > > Do we really need to set numrecs for all of these calls? It looks out of
> > > place/context and inconsistently used to me. Case in point: we pass 1 to
> > > the space btree init functions which add a single record, but pass 0 to
> > > the rmapbt init function which actually adds up to 5 records (and
> > > increments the initial numrecs count).
> > 
> > I would've (will?) refactor all this to look like:
> > 
> > struct xfs_rmap_li {
> > 	struct list_head	list;
> > 	struct xfs_rmap_irec	rmap;
> > };
> > 
> > int
> > xfs_rmapbt_initialize(
> > 	struct xfs_mount	*mp,
> > 	struct xfs_buf		*agf_bp,
> > 	struct xfs_buf		*root_bp,
> > 	struct list_head	*rmaps)
> > {
> > 	struct xfs_agf		*agf = XFS_AGF_BUF(agf_bp);
> > 	struct xfs_rmap_li	*li;
> > 	struct xfs_rmap_rec	*disk_rec;
> > 	struct xfs_btree_headr	*rmap_hdr;
> > 
> > 	agf->rmaproot = be32_to_cpu(...);
> > 	agf->rmaplevel = 1;
> > 
> > 	disk_rec = xfs_get_rmap_entries(root_bp);
> > 	rmap_hdr = root_bp->b_addr;
> > 	list_for_each_entry(li, ..., rmaps) {
> > 		xfs_rmap_irec_to_disk(disk_rec, &li->rmap);
> > 		disk_rec++;
> > 		rmap_hdr->numrecs++; /* yeah yeah be32 */
> > 	}
> > 
> > 	/* mark agf dirty, mark rootbp dirty */
> > 	return 0;
> > }
> > 
> > So then you can call it via:
> > 
> > INIT_LIST_HEAD(&rec_list);
> > /* construct rec_list of records */
> > agf_bp = xfs_alloc_read_agf(...);
> > root_bp = xfs_buf_get(..., XFS_RMAP_BLOCK(mp));
> > error = xfs_rmapbt_initialize(mp, agf_bp, root_bp, &rec_list);
> > 
> > But I haven't had time to look through this in enough detail to figure
> > out how to merge it with the online repair stuff.  Maybe it doesn't even
> > make sense to merge them just to shave a few lines of header
> > initialization.
> > 
> 
> Hm, that looks like it has the potential to change this a decent amount.
> It's not clear to me how it would affect the growfs init function
> interface. From the perspective of the original comment
> (aghdr_init_data->numrecs), ISTM you'd still expect it to be zero in
> this example, regardless of whether the init function hardcodes
> insertion of a fixed number of entries or dynamically adds those (likely
> the same number in the growfs case) entries.
> 
> So what is the status of this series then? Is it being tossed in favor
> of something else that is pending?

I plan to take a closer look at whether or not it makes sense to try to
share initialization code between growfs and onlinerepair and then
decide what to do.  I'd probably just pull this as is and then take a
look at what I'd have to change to make it work with repair code.  It
may very well turn out that it's not worth sharing for two rather
different use cases.

--D

> Brian
> 
> > --D
> > 
> > > AFAICT, each initialization function knows how many records it's going
> > > to add. I don't see why that information needs to leak outside of init
> > > function context..?
> > > 
> > > Brian
> > > 
> > > > +	error = xfs_growfs_init_aghdr(mp, id, xfs_bnoroot_init,
> > > > +				   &xfs_allocbt_buf_ops);
> > > > +	if (error)
> > > > +		goto out_error;
> > > >  
> > > > -		/* account inode btree root blocks */
> > > > -		rrec = XFS_RMAP_REC_ADDR(block, 3);
> > > > -		rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
> > > > -		rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
> > > > -						XFS_IBT_BLOCK(mp));
> > > > -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
> > > > -		rrec->rm_offset = 0;
> > > > -		be16_add_cpu(&block->bb_numrecs, 1);
> > > >  
> > > > -		/* account for rmap btree root */
> > > > -		rrec = XFS_RMAP_REC_ADDR(block, 4);
> > > > -		rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
> > > > -		rrec->rm_blockcount = cpu_to_be32(1);
> > > > -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
> > > > -		rrec->rm_offset = 0;
> > > > -		be16_add_cpu(&block->bb_numrecs, 1);
> > > > +	/* CNT btree root block */
> > > > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp));
> > > > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > +	id->type = XFS_BTNUM_CNT;
> > > > +	id->numrecs = 1;
> > > > +	error = xfs_growfs_init_aghdr(mp, id, xfs_cntroot_init,
> > > > +				   &xfs_allocbt_buf_ops);
> > > > +	if (error)
> > > > +		goto out_error;
> > > >  
> > > > -		/* account for refc btree root */
> > > > -		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > > -			rrec = XFS_RMAP_REC_ADDR(block, 5);
> > > > -			rrec->rm_startblock = cpu_to_be32(xfs_refc_block(mp));
> > > > -			rrec->rm_blockcount = cpu_to_be32(1);
> > > > -			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
> > > > -			rrec->rm_offset = 0;
> > > > -			be16_add_cpu(&block->bb_numrecs, 1);
> > > > -		}
> > > > +	/* RMAP btree root block */
> > > > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > > > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp));
> > > > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > +		id->type = XFS_BTNUM_RMAP;
> > > > +		id->numrecs = 0;
> > > > +		error = xfs_growfs_init_aghdr(mp, id, xfs_rmaproot_init,
> > > > +					   &xfs_rmapbt_buf_ops);
> > > > +		if (error)
> > > > +			goto out_error;
> > > >  
> > > > -		xfs_buf_delwri_queue(bp, buffer_list);
> > > > -		xfs_buf_relse(bp);
> > > >  	}
> > > >  
> > > > -	/*
> > > > -	 * INO btree root block
> > > > -	 */
> > > > -	bp = xfs_growfs_get_hdr_buf(mp,
> > > > -			XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
> > > > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > > > -			&xfs_inobt_buf_ops);
> > > > -	if (!bp) {
> > > > -		error = -ENOMEM;
> > > > +	/* INO btree root block */
> > > > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp));
> > > > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > +	id->type = XFS_BTNUM_INO;
> > > > +	id->numrecs = 0;
> > > > +	error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > > > +				   &xfs_inobt_buf_ops);
> > > > +	if (error)
> > > >  		goto out_error;
> > > > -	}
> > > >  
> > > > -	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
> > > > -	xfs_buf_delwri_queue(bp, buffer_list);
> > > > -	xfs_buf_relse(bp);
> > > >  
> > > >  	/*
> > > >  	 * FINO btree root block
> > > >  	 */
> > > >  	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> > > > -		bp = xfs_growfs_get_hdr_buf(mp,
> > > > -			XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
> > > > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > > > -			&xfs_inobt_buf_ops);
> > > > -		if (!bp) {
> > > > -			error = -ENOMEM;
> > > > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp));
> > > > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > +		id->type = XFS_BTNUM_FINO;
> > > > +		id->numrecs = 0;
> > > > +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > > > +					   &xfs_inobt_buf_ops);
> > > > +		if (error)
> > > >  			goto out_error;
> > > > -		}
> > > > -
> > > > -		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO, 0, 0, agno, 0);
> > > > -		xfs_buf_delwri_queue(bp, buffer_list);
> > > > -		xfs_buf_relse(bp);
> > > >  	}
> > > >  
> > > >  	/*
> > > >  	 * refcount btree root block
> > > >  	 */
> > > >  	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > > -		bp = xfs_growfs_get_hdr_buf(mp,
> > > > -			XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
> > > > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > > > -			&xfs_refcountbt_buf_ops);
> > > > -		if (!bp) {
> > > > -			error = -ENOMEM;
> > > > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp));
> > > > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > +		id->type = XFS_BTNUM_REFC;
> > > > +		id->numrecs = 0;
> > > > +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > > > +					   &xfs_refcountbt_buf_ops);
> > > > +		if (error)
> > > >  			goto out_error;
> > > > -		}
> > > > -
> > > > -		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC, 0, 0, agno, 0);
> > > > -		xfs_buf_delwri_queue(bp, buffer_list);
> > > > -		xfs_buf_relse(bp);
> > > >  	}
> > > >  
> > > >  out_error:
> > > > @@ -384,7 +433,6 @@ xfs_growfs_data_private(
> > > >  	xfs_agf_t		*agf;
> > > >  	xfs_agi_t		*agi;
> > > >  	xfs_agnumber_t		agno;
> > > > -	xfs_extlen_t		agsize;
> > > >  	xfs_buf_t		*bp;
> > > >  	int			dpct;
> > > >  	int			error, saved_error = 0;
> > > > @@ -392,11 +440,11 @@ xfs_growfs_data_private(
> > > >  	xfs_agnumber_t		nagimax = 0;
> > > >  	xfs_rfsblock_t		nb, nb_mod;
> > > >  	xfs_rfsblock_t		new;
> > > > -	xfs_rfsblock_t		nfree;
> > > >  	xfs_agnumber_t		oagcount;
> > > >  	int			pct;
> > > >  	xfs_trans_t		*tp;
> > > >  	LIST_HEAD		(buffer_list);
> > > > +	struct aghdr_init_data	id = {};
> > > >  
> > > >  	nb = in->newblocks;
> > > >  	pct = in->imaxpct;
> > > > @@ -448,27 +496,28 @@ xfs_growfs_data_private(
> > > >  	 * list to write, we can cancel the entire list without having written
> > > >  	 * anything.
> > > >  	 */
> > > > -	nfree = 0;
> > > > -	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
> > > > -
> > > > -		if (agno == nagcount - 1)
> > > > -			agsize = nb -
> > > > -				(agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> > > > +	INIT_LIST_HEAD(&id.buffer_list);
> > > > +	for (id.agno = nagcount - 1;
> > > > +	     id.agno >= oagcount;
> > > > +	     id.agno--, new -= id.agsize) {
> > > > +
> > > > +		if (id.agno == nagcount - 1)
> > > > +			id.agsize = nb -
> > > > +				(id.agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> > > >  		else
> > > > -			agsize = mp->m_sb.sb_agblocks;
> > > > +			id.agsize = mp->m_sb.sb_agblocks;
> > > >  
> > > > -		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree,
> > > > -					    &buffer_list);
> > > > +		error = xfs_grow_ag_headers(mp, &id);
> > > >  		if (error) {
> > > > -			xfs_buf_delwri_cancel(&buffer_list);
> > > > +			xfs_buf_delwri_cancel(&id.buffer_list);
> > > >  			goto error0;
> > > >  		}
> > > >  	}
> > > > -	error = xfs_buf_delwri_submit(&buffer_list);
> > > > +	error = xfs_buf_delwri_submit(&id.buffer_list);
> > > >  	if (error)
> > > >  		goto error0;
> > > >  
> > > > -	xfs_trans_agblocks_delta(tp, nfree);
> > > > +	xfs_trans_agblocks_delta(tp, id.nfree);
> > > >  
> > > >  	/*
> > > >  	 * There are new blocks in the old last a.g.
> > > > @@ -479,7 +528,7 @@ xfs_growfs_data_private(
> > > >  		/*
> > > >  		 * Change the agi length.
> > > >  		 */
> > > > -		error = xfs_ialloc_read_agi(mp, tp, agno, &bp);
> > > > +		error = xfs_ialloc_read_agi(mp, tp, id.agno, &bp);
> > > >  		if (error) {
> > > >  			goto error0;
> > > >  		}
> > > > @@ -492,7 +541,7 @@ xfs_growfs_data_private(
> > > >  		/*
> > > >  		 * Change agf length.
> > > >  		 */
> > > > -		error = xfs_alloc_read_agf(mp, tp, agno, 0, &bp);
> > > > +		error = xfs_alloc_read_agf(mp, tp, id.agno, 0, &bp);
> > > >  		if (error) {
> > > >  			goto error0;
> > > >  		}
> > > > @@ -511,13 +560,13 @@ xfs_growfs_data_private(
> > > >  		 * this doesn't actually exist in the rmap btree.
> > > >  		 */
> > > >  		xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL);
> > > > -		error = xfs_rmap_free(tp, bp, agno,
> > > > +		error = xfs_rmap_free(tp, bp, id.agno,
> > > >  				be32_to_cpu(agf->agf_length) - new,
> > > >  				new, &oinfo);
> > > >  		if (error)
> > > >  			goto error0;
> > > >  		error = xfs_free_extent(tp,
> > > > -				XFS_AGB_TO_FSB(mp, agno,
> > > > +				XFS_AGB_TO_FSB(mp, id.agno,
> > > >  					be32_to_cpu(agf->agf_length) - new),
> > > >  				new, &oinfo, XFS_AG_RESV_NONE);
> > > >  		if (error)
> > > > @@ -534,8 +583,8 @@ xfs_growfs_data_private(
> > > >  	if (nb > mp->m_sb.sb_dblocks)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > > >  				 nb - mp->m_sb.sb_dblocks);
> > > > -	if (nfree)
> > > > -		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, nfree);
> > > > +	if (id.nfree)
> > > > +		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> > > >  	if (dpct)
> > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
> > > >  	xfs_trans_set_sync(tp);
> > > > @@ -562,7 +611,7 @@ xfs_growfs_data_private(
> > > >  	if (new) {
> > > >  		struct xfs_perag	*pag;
> > > >  
> > > > -		pag = xfs_perag_get(mp, agno);
> > > > +		pag = xfs_perag_get(mp, id.agno);
> > > >  		error = xfs_ag_resv_free(pag);
> > > >  		xfs_perag_put(pag);
> > > >  		if (error)
> > > > -- 
> > > > 2.15.1
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/7] xfs: factor ag btree root block initialisation
  2018-02-12  0:45         ` Darrick J. Wong
@ 2018-02-15  5:53           ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2018-02-15  5:53 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Sun, Feb 11, 2018 at 04:45:16PM -0800, Darrick J. Wong wrote:
> On Fri, Feb 09, 2018 at 08:10:10AM -0500, Brian Foster wrote:
> > On Thu, Feb 08, 2018 at 12:00:07PM -0800, Darrick J. Wong wrote:
> > > On Thu, Feb 08, 2018 at 01:54:03PM -0500, Brian Foster wrote:
> > > > On Thu, Feb 01, 2018 at 05:41:58PM +1100, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > Cookie cutter code, easily factored.
> > > > > 
> > > > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > > > ---
> > > > 
> > > > Seems sane, a couple factoring nits..
> > > > 
> > > > >  fs/xfs/xfs_fsops.c | 493 +++++++++++++++++++++++++++++------------------------
> > > > >  1 file changed, 271 insertions(+), 222 deletions(-)
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > > > index d9e08d8cf9ac..44eac79e0b49 100644
> > > > > --- a/fs/xfs/xfs_fsops.c
> > > > > +++ b/fs/xfs/xfs_fsops.c
> > > > > @@ -71,46 +71,146 @@ xfs_growfs_get_hdr_buf(
> > > > >  	return bp;
> > > > >  }
> > > > >  
> > > > ...
> > > > > +/*
> > > > > + * Alloc btree root block init functions
> > > > > + */
> > > > > +static void
> > > > > +xfs_bnoroot_init(
> > > > > +	struct xfs_mount	*mp,
> > > > > +	struct xfs_buf		*bp,
> > > > > +	struct aghdr_init_data	*id)
> > > > >  {
> > > > > -	struct xfs_agf		*agf;
> > > > > -	struct xfs_agi		*agi;
> > > > > -	struct xfs_agfl		*agfl;
> > > > > -	__be32			*agfl_bno;
> > > > >  	xfs_alloc_rec_t		*arec;
> > > > 
> > > > A couple more typedef instances to kill (here and cntroot_init() below).
> > > > 
> > > > > -	struct xfs_buf		*bp;
> > > > > -	int			bucket;
> > > > > -	xfs_extlen_t		tmpsize;
> > > > > -	int			error = 0;
> > > > > +
> > > > > +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> > > > > +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > > > > +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > > > > +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> > > > > +					  be32_to_cpu(arec->ar_startblock));
> > > > > +}
> > > > > +
> > > > > +static void
> > > > > +xfs_cntroot_init(
> > > > > +	struct xfs_mount	*mp,
> > > > > +	struct xfs_buf		*bp,
> > > > > +	struct aghdr_init_data	*id)
> > > > > +{
> > > > > +	xfs_alloc_rec_t		*arec;
> > > > > +
> > > > > +	xfs_btree_init_block(mp, bp, id->type, 0, id->numrecs, id->agno, 0);
> > > > > +	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > > > > +	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > > > > +	arec->ar_blockcount = cpu_to_be32(id->agsize -
> > > > > +					  be32_to_cpu(arec->ar_startblock));
> > > > > +	id->nfree += be32_to_cpu(arec->ar_blockcount);
> > > > 
> > > > This seems unrelated to the cntbt. Perhaps move it to the parent
> > > > function? It looks like all we need are mp->m_ag_prealloc_blocks and
> > > > id->agsize, after all.
> > > > 
> > > > That also looks like the only difference between xfs_bnoroot_init() and
> > > > xfs_cntroot_init(), fwiw, so we could condense those as well.
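[Editorial illustration: the condensation suggested above could look like the following standalone userspace toy. All `fake_*` names, struct layouts, and the hoisted-nfree convention are invented here for illustration; this is not the kernel code.]

```c
#include <assert.h>
#include <stdint.h>

/* Invented stand-ins for the kernel structures. */
struct fake_mount { uint32_t m_ag_prealloc_blocks; };
struct fake_init_data { uint32_t agsize; uint64_t nfree; };
struct fake_rec { uint32_t ar_startblock; uint32_t ar_blockcount; };

/*
 * One shared body for both free space btree roots: each root gets the
 * single record covering [m_ag_prealloc_blocks, agsize).  With the
 * nfree accounting hoisted out, BNO and CNT no longer need separate
 * init functions.
 */
static void fake_freesp_root_init(const struct fake_mount *mp,
				  const struct fake_init_data *id,
				  struct fake_rec *rec)
{
	rec->ar_startblock = mp->m_ag_prealloc_blocks;
	rec->ar_blockcount = id->agsize - rec->ar_startblock;
}

/* The caller accounts the new free space exactly once. */
static uint64_t fake_init_both_roots(const struct fake_mount *mp,
				     struct fake_init_data *id,
				     struct fake_rec *bno,
				     struct fake_rec *cnt)
{
	fake_freesp_root_init(mp, id, bno);
	fake_freesp_root_init(mp, id, cnt);
	id->nfree += cnt->ar_blockcount;
	return id->nfree;
}
```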
> > > > 
> > > > > +}
> > > > > +
> > > > ...
> > > > > +/*
> > > > > + * Write new AG headers to disk. Non-transactional, but written
> > > > > + * synchronously so they are completed prior to the growfs transaction
> > > > > + * being logged.
> > > > > + */
> > > > > +static int
> > > > > +xfs_grow_ag_headers(
> > > > > +	struct xfs_mount	*mp,
> > > > > +	struct aghdr_init_data	*id)
> > > > >  
> > > > > -	arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
> > > > > -	arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
> > > > > -	arec->ar_blockcount = cpu_to_be32(
> > > > > -		agsize - be32_to_cpu(arec->ar_startblock));
> > > > > -	*nfree += be32_to_cpu(arec->ar_blockcount);
> > > > > +{
> > > > > +	int			error = 0;
> > > > >  
> > > > ...
> > > > > +	/* BNO btree root block */
> > > > > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_BNO_BLOCK(mp));
> > > > > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > > +	id->type = XFS_BTNUM_BNO;
> > > > > +	id->numrecs = 1;
> > > > 
> > > > Do we really need to set numrecs for all of these calls? It looks out of
> > > > place/context and inconsistently used to me. Case in point: we pass 1 to
> > > > the space btree init functions which add a single record, but pass 0 to
> > > > the rmapbt init function which actually adds up to 5 records (and
> > > > increments the initial numrecs count).
> > > 
> > > I would've (will?) refactor all this to look like:
> > > 
> > > struct xfs_rmap_li {
> > > 	struct list_head	list;
> > > 	struct xfs_rmap_irec	rmap;
> > > };
> > > 
> > > int
> > > xfs_rmapbt_initialize(
> > > 	struct xfs_mount	*mp,
> > > 	struct xfs_buf		*agf_bp,
> > > 	struct xfs_buf		*root_bp,
> > > 	struct list_head	*rmaps)
> > > {
> > > 	struct xfs_agf		*agf = XFS_AGF_BUF(agf_bp);
> > > 	struct xfs_rmap_li	*li;
> > > 	struct xfs_rmap_rec	*disk_rec;
> > > 	struct xfs_btree_header	*rmap_hdr;
> > > 
> > > 	agf->rmaproot = cpu_to_be32(...);
> > > 	agf->rmaplevel = 1;
> > > 
> > > 	disk_rec = xfs_get_rmap_entries(root_bp);
> > > 	rmap_hdr = root_bp->b_addr;
> > > 	list_for_each_entry(li, ..., rmaps) {
> > > 		xfs_rmap_irec_to_disk(disk_rec, &li->rmap);
> > > 		disk_rec++;
> > > 		rmap_hdr->numrecs++; /* yeah yeah be32 */
> > > 	}
> > > 
> > > 	/* mark agf dirty, mark rootbp dirty */
> > > 	return 0;
> > > }
> > > 
> > > So then you can call it via:
> > > 
> > > INIT_LIST_HEAD(&rec_list);
> > > /* construct rec_list of records */
> > > agf_bp = xfs_alloc_read_agf(...);
> > > root_bp = xfs_buf_get(..., XFS_RMAP_BLOCK(mp));
> > > error = xfs_rmapbt_initialize(mp, agf_bp, root_bp, &rec_list);
> > > 
> > > But I haven't had time to look through this in enough detail to figure
> > > out how to merge it with the online repair stuff.  Maybe it doesn't even
> > > make sense to merge them just to shave a few lines of header
> > > initialization.
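[Editorial illustration: a toy of the list-driven shape sketched above, with invented names and an array standing in for the `list_head` plumbing. The point is that numrecs is counted while stamping records, rather than preset by each caller.]

```c
#include <assert.h>
#include <stdint.h>

/* Invented stand-ins; an array replaces the list_head plumbing. */
struct fake_rmap { uint32_t startblock; uint32_t blockcount; };
struct fake_root { uint16_t numrecs; struct fake_rmap recs[8]; };

/*
 * Stamp the collected records into the "root block", counting numrecs
 * as we go, so no caller has to know the record count up front.
 */
static void fake_rmapbt_initialize(struct fake_root *root,
				   const struct fake_rmap *rmaps,
				   int nr)
{
	int i;

	root->numrecs = 0;
	for (i = 0; i < nr; i++) {
		root->recs[i] = rmaps[i];
		root->numrecs++;
	}
}
```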
> > > 
> > 
> > Hm, that looks like it has the potential to change this a decent amount.
> > It's not clear to me how it would affect the growfs init function
> > interface. From the perspective of the original comment
> > (aghdr_init_data->numrecs), ISTM you'd still expect it to be zero in
> > this example, regardless of whether the init function hardcodes
> > insertion of a fixed number of entries or dynamically adds those (likely
> > the same number of in the growfs case) entries.
> > 
> > So what is the status of this series then? Is it being tossed in favor
> > of something else that is pending?
> 
> I plan to take a closer look at whether or not it makes sense to try to
> share initialization code between growfs and onlinerepair and then
> decide what to do.  I'd probably just pull this as is and then take a
> look at what I'd have to change to make it work with repair code.  It
> may very well turn out that it's not worth sharing for two rather
> different use cases.

I took another look, as promised.  The growfs functions are very
simplistic in the sense that they don't have to contend with existing space
allocations, which is a fancy way of saying that they initialize the
header, and stuff in the minimal set of entries.  The repair code is
rather more complex since it has to step around adjusting whatever
counters were already initialized, and deal with a large number of
records, so it might as well live separately from growfs.  growfs using
delwri buffer lists (vs. repair which uses transactions) is another
stumbling block.

TLDR: I think I'll just take this series (modulo review comments)
without trying to merge it with the repair code.

--D

> --D
> 
> > Brian
> > 
> > > --D
> > > 
> > > > AFAICT, each initialization function knows how many records it's going
> > > > to add. I don't see why that information needs to leak outside of init
> > > > function context..?
> > > > 
> > > > Brian
> > > > 
> > > > > +	error = xfs_growfs_init_aghdr(mp, id, xfs_bnoroot_init,
> > > > > +				   &xfs_allocbt_buf_ops);
> > > > > +	if (error)
> > > > > +		goto out_error;
> > > > >  
> > > > > -		/* account inode btree root blocks */
> > > > > -		rrec = XFS_RMAP_REC_ADDR(block, 3);
> > > > > -		rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
> > > > > -		rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
> > > > > -						XFS_IBT_BLOCK(mp));
> > > > > -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
> > > > > -		rrec->rm_offset = 0;
> > > > > -		be16_add_cpu(&block->bb_numrecs, 1);
> > > > >  
> > > > > -		/* account for rmap btree root */
> > > > > -		rrec = XFS_RMAP_REC_ADDR(block, 4);
> > > > > -		rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
> > > > > -		rrec->rm_blockcount = cpu_to_be32(1);
> > > > > -		rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
> > > > > -		rrec->rm_offset = 0;
> > > > > -		be16_add_cpu(&block->bb_numrecs, 1);
> > > > > +	/* CNT btree root block */
> > > > > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_CNT_BLOCK(mp));
> > > > > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > > +	id->type = XFS_BTNUM_CNT;
> > > > > +	id->numrecs = 1;
> > > > > +	error = xfs_growfs_init_aghdr(mp, id, xfs_cntroot_init,
> > > > > +				   &xfs_allocbt_buf_ops);
> > > > > +	if (error)
> > > > > +		goto out_error;
> > > > >  
> > > > > -		/* account for refc btree root */
> > > > > -		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > > > -			rrec = XFS_RMAP_REC_ADDR(block, 5);
> > > > > -			rrec->rm_startblock = cpu_to_be32(xfs_refc_block(mp));
> > > > > -			rrec->rm_blockcount = cpu_to_be32(1);
> > > > > -			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
> > > > > -			rrec->rm_offset = 0;
> > > > > -			be16_add_cpu(&block->bb_numrecs, 1);
> > > > > -		}
> > > > > +	/* RMAP btree root block */
> > > > > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > > > > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_RMAP_BLOCK(mp));
> > > > > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > > +		id->type = XFS_BTNUM_RMAP;
> > > > > +		id->numrecs = 0;
> > > > > +		error = xfs_growfs_init_aghdr(mp, id, xfs_rmaproot_init,
> > > > > +					   &xfs_rmapbt_buf_ops);
> > > > > +		if (error)
> > > > > +			goto out_error;
> > > > >  
> > > > > -		xfs_buf_delwri_queue(bp, buffer_list);
> > > > > -		xfs_buf_relse(bp);
> > > > >  	}
> > > > >  
> > > > > -	/*
> > > > > -	 * INO btree root block
> > > > > -	 */
> > > > > -	bp = xfs_growfs_get_hdr_buf(mp,
> > > > > -			XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
> > > > > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > > > > -			&xfs_inobt_buf_ops);
> > > > > -	if (!bp) {
> > > > > -		error = -ENOMEM;
> > > > > +	/* INO btree root block */
> > > > > +	id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_IBT_BLOCK(mp));
> > > > > +	id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > > +	id->type = XFS_BTNUM_INO;
> > > > > +	id->numrecs = 0;
> > > > > +	error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > > > > +				   &xfs_inobt_buf_ops);
> > > > > +	if (error)
> > > > >  		goto out_error;
> > > > > -	}
> > > > >  
> > > > > -	xfs_btree_init_block(mp, bp, XFS_BTNUM_INO , 0, 0, agno, 0);
> > > > > -	xfs_buf_delwri_queue(bp, buffer_list);
> > > > > -	xfs_buf_relse(bp);
> > > > >  
> > > > >  	/*
> > > > >  	 * FINO btree root block
> > > > >  	 */
> > > > >  	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> > > > > -		bp = xfs_growfs_get_hdr_buf(mp,
> > > > > -			XFS_AGB_TO_DADDR(mp, agno, XFS_FIBT_BLOCK(mp)),
> > > > > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > > > > -			&xfs_inobt_buf_ops);
> > > > > -		if (!bp) {
> > > > > -			error = -ENOMEM;
> > > > > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, XFS_FIBT_BLOCK(mp));
> > > > > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > > +		id->type = XFS_BTNUM_FINO;
> > > > > +		id->numrecs = 0;
> > > > > +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > > > > +					   &xfs_inobt_buf_ops);
> > > > > +		if (error)
> > > > >  			goto out_error;
> > > > > -		}
> > > > > -
> > > > > -		xfs_btree_init_block(mp, bp, XFS_BTNUM_FINO, 0, 0, agno, 0);
> > > > > -		xfs_buf_delwri_queue(bp, buffer_list);
> > > > > -		xfs_buf_relse(bp);
> > > > >  	}
> > > > >  
> > > > >  	/*
> > > > >  	 * refcount btree root block
> > > > >  	 */
> > > > >  	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > > > -		bp = xfs_growfs_get_hdr_buf(mp,
> > > > > -			XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
> > > > > -			BTOBB(mp->m_sb.sb_blocksize), 0,
> > > > > -			&xfs_refcountbt_buf_ops);
> > > > > -		if (!bp) {
> > > > > -			error = -ENOMEM;
> > > > > +		id->daddr = XFS_AGB_TO_DADDR(mp, id->agno, xfs_refc_block(mp));
> > > > > +		id->numblks = BTOBB(mp->m_sb.sb_blocksize);
> > > > > +		id->type = XFS_BTNUM_REFC;
> > > > > +		id->numrecs = 0;
> > > > > +		error = xfs_growfs_init_aghdr(mp, id, xfs_btroot_init,
> > > > > +					   &xfs_refcountbt_buf_ops);
> > > > > +		if (error)
> > > > >  			goto out_error;
> > > > > -		}
> > > > > -
> > > > > -		xfs_btree_init_block(mp, bp, XFS_BTNUM_REFC, 0, 0, agno, 0);
> > > > > -		xfs_buf_delwri_queue(bp, buffer_list);
> > > > > -		xfs_buf_relse(bp);
> > > > >  	}
> > > > >  
> > > > >  out_error:
> > > > > @@ -384,7 +433,6 @@ xfs_growfs_data_private(
> > > > >  	xfs_agf_t		*agf;
> > > > >  	xfs_agi_t		*agi;
> > > > >  	xfs_agnumber_t		agno;
> > > > > -	xfs_extlen_t		agsize;
> > > > >  	xfs_buf_t		*bp;
> > > > >  	int			dpct;
> > > > >  	int			error, saved_error = 0;
> > > > > @@ -392,11 +440,11 @@ xfs_growfs_data_private(
> > > > >  	xfs_agnumber_t		nagimax = 0;
> > > > >  	xfs_rfsblock_t		nb, nb_mod;
> > > > >  	xfs_rfsblock_t		new;
> > > > > -	xfs_rfsblock_t		nfree;
> > > > >  	xfs_agnumber_t		oagcount;
> > > > >  	int			pct;
> > > > >  	xfs_trans_t		*tp;
> > > > >  	LIST_HEAD		(buffer_list);
> > > > > +	struct aghdr_init_data	id = {};
> > > > >  
> > > > >  	nb = in->newblocks;
> > > > >  	pct = in->imaxpct;
> > > > > @@ -448,27 +496,28 @@ xfs_growfs_data_private(
> > > > >  	 * list to write, we can cancel the entire list without having written
> > > > >  	 * anything.
> > > > >  	 */
> > > > > -	nfree = 0;
> > > > > -	for (agno = nagcount - 1; agno >= oagcount; agno--, new -= agsize) {
> > > > > -
> > > > > -		if (agno == nagcount - 1)
> > > > > -			agsize = nb -
> > > > > -				(agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> > > > > +	INIT_LIST_HEAD(&id.buffer_list);
> > > > > +	for (id.agno = nagcount - 1;
> > > > > +	     id.agno >= oagcount;
> > > > > +	     id.agno--, new -= id.agsize) {
> > > > > +
> > > > > +		if (id.agno == nagcount - 1)
> > > > > +			id.agsize = nb -
> > > > > +				(id.agno * (xfs_rfsblock_t)mp->m_sb.sb_agblocks);
> > > > >  		else
> > > > > -			agsize = mp->m_sb.sb_agblocks;
> > > > > +			id.agsize = mp->m_sb.sb_agblocks;
> > > > >  
> > > > > -		error = xfs_grow_ag_headers(mp, agno, agsize, &nfree,
> > > > > -					    &buffer_list);
> > > > > +		error = xfs_grow_ag_headers(mp, &id);
> > > > >  		if (error) {
> > > > > -			xfs_buf_delwri_cancel(&buffer_list);
> > > > > +			xfs_buf_delwri_cancel(&id.buffer_list);
> > > > >  			goto error0;
> > > > >  		}
> > > > >  	}
> > > > > -	error = xfs_buf_delwri_submit(&buffer_list);
> > > > > +	error = xfs_buf_delwri_submit(&id.buffer_list);
> > > > >  	if (error)
> > > > >  		goto error0;
> > > > >  
> > > > > -	xfs_trans_agblocks_delta(tp, nfree);
> > > > > +	xfs_trans_agblocks_delta(tp, id.nfree);
> > > > >  
> > > > >  	/*
> > > > >  	 * There are new blocks in the old last a.g.
> > > > > @@ -479,7 +528,7 @@ xfs_growfs_data_private(
> > > > >  		/*
> > > > >  		 * Change the agi length.
> > > > >  		 */
> > > > > -		error = xfs_ialloc_read_agi(mp, tp, agno, &bp);
> > > > > +		error = xfs_ialloc_read_agi(mp, tp, id.agno, &bp);
> > > > >  		if (error) {
> > > > >  			goto error0;
> > > > >  		}
> > > > > @@ -492,7 +541,7 @@ xfs_growfs_data_private(
> > > > >  		/*
> > > > >  		 * Change agf length.
> > > > >  		 */
> > > > > -		error = xfs_alloc_read_agf(mp, tp, agno, 0, &bp);
> > > > > +		error = xfs_alloc_read_agf(mp, tp, id.agno, 0, &bp);
> > > > >  		if (error) {
> > > > >  			goto error0;
> > > > >  		}
> > > > > @@ -511,13 +560,13 @@ xfs_growfs_data_private(
> > > > >  		 * this doesn't actually exist in the rmap btree.
> > > > >  		 */
> > > > >  		xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL);
> > > > > -		error = xfs_rmap_free(tp, bp, agno,
> > > > > +		error = xfs_rmap_free(tp, bp, id.agno,
> > > > >  				be32_to_cpu(agf->agf_length) - new,
> > > > >  				new, &oinfo);
> > > > >  		if (error)
> > > > >  			goto error0;
> > > > >  		error = xfs_free_extent(tp,
> > > > > -				XFS_AGB_TO_FSB(mp, agno,
> > > > > +				XFS_AGB_TO_FSB(mp, id.agno,
> > > > >  					be32_to_cpu(agf->agf_length) - new),
> > > > >  				new, &oinfo, XFS_AG_RESV_NONE);
> > > > >  		if (error)
> > > > > @@ -534,8 +583,8 @@ xfs_growfs_data_private(
> > > > >  	if (nb > mp->m_sb.sb_dblocks)
> > > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_DBLOCKS,
> > > > >  				 nb - mp->m_sb.sb_dblocks);
> > > > > -	if (nfree)
> > > > > -		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, nfree);
> > > > > +	if (id.nfree)
> > > > > +		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, id.nfree);
> > > > >  	if (dpct)
> > > > >  		xfs_trans_mod_sb(tp, XFS_TRANS_SB_IMAXPCT, dpct);
> > > > >  	xfs_trans_set_sync(tp);
> > > > > @@ -562,7 +611,7 @@ xfs_growfs_data_private(
> > > > >  	if (new) {
> > > > >  		struct xfs_perag	*pag;
> > > > >  
> > > > > -		pag = xfs_perag_get(mp, agno);
> > > > > +		pag = xfs_perag_get(mp, id.agno);
> > > > >  		error = xfs_ag_resv_free(pag);
> > > > >  		xfs_perag_put(pag);
> > > > >  		if (error)
> > > > > -- 
> > > > > 2.15.1
> > > > > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/7] xfs: make imaxpct changes in growfs separate
  2018-02-09 16:11   ` Brian Foster
@ 2018-02-15 22:10     ` Dave Chinner
  0 siblings, 0 replies; 32+ messages in thread
From: Dave Chinner @ 2018-02-15 22:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Feb 09, 2018 at 11:11:43AM -0500, Brian Foster wrote:
> On Thu, Feb 01, 2018 at 05:42:00PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When growfs changes the imaxpct value of the filesystem, it runs
> > through all the "change size" growfs code, whether it needs to or
> > not. Separate out changing imaxpct into its own function and
> > transaction to simplify the rest of the growfs code.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_fsops.c | 67 +++++++++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 49 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index 94650b7d517e..5c844e540320 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> ...
> > @@ -673,25 +661,68 @@ xfs_growfs_log_private(
> ...
> >  int
> >  xfs_growfs_data(
> > -	xfs_mount_t		*mp,
> > -	xfs_growfs_data_t	*in)
> > +	struct xfs_mount	*mp,
> > +	struct xfs_growfs_data	*in)
> >  {
> > -	int error;
> > +	int			error = 0;
> >  
> >  	if (!capable(CAP_SYS_ADMIN))
> >  		return -EPERM;
> >  	if (!mutex_trylock(&mp->m_growlock))
> >  		return -EWOULDBLOCK;
> > +
> > +	/* update imaxpct seperately to the physical grow of the filesystem */
> 
> separately
> 
> > +	if (in->imaxpct != mp->m_sb.sb_imax_pct) {
> > +		error = xfs_growfs_imaxpct(mp, in->imaxpct);
> > +		if (error)
> > +			goto out_error;
> > +	}
> > +
> >  	error = xfs_growfs_data_private(mp, in);
> > +	if (error)
> > +		goto out_error;
> 
> The 'xfs_growfs -m <maxpct>' use case typically doesn't involve a size
> change. With this change, there's no reason to run through
> xfs_growfs_data_private() if in.newblocks == mp->m_sb.sb_dblocks, right?

Yeah, we can probably do that. I hadn't done that because those
checks (and much more complex ones like a runt last AG) were
already in the xfs_growfs_data_private() code.
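[Editorial illustration: one possible shape of the short-circuit being discussed, as a standalone userspace toy with invented names; skip the physical grow entirely when the requested size equals the current size.]

```c
#include <assert.h>
#include <stdint.h>

/* Invented model of the growfs entry point. */
struct fake_sb { uint64_t sb_dblocks; int sb_imax_pct; };
struct fake_growfs_in { uint64_t newblocks; int imaxpct; };

#define DID_IMAXPCT	(1 << 0)
#define DID_DATA_GROW	(1 << 1)

/* Returns a mask of the work actually performed. */
static int fake_growfs_data(struct fake_sb *sb,
			    const struct fake_growfs_in *in)
{
	int did = 0;

	/* imaxpct is updated separately from the physical grow. */
	if (in->imaxpct != sb->sb_imax_pct) {
		sb->sb_imax_pct = in->imaxpct;
		did |= DID_IMAXPCT;
	}

	/* Skip the data grow entirely when the size is unchanged. */
	if (in->newblocks != sb->sb_dblocks) {
		sb->sb_dblocks = in->newblocks;
		did |= DID_DATA_GROW;
	}
	return did;
}
```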

> Otherwise this seems fine.

Thanks!

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/7] xfs: separate secondary sb update in growfs
  2018-02-09 16:11   ` Brian Foster
@ 2018-02-15 22:23     ` Dave Chinner
  2018-02-16 12:31       ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-15 22:23 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Feb 09, 2018 at 11:11:54AM -0500, Brian Foster wrote:
> On Thu, Feb 01, 2018 at 05:42:01PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > This happens after all the transactions to update the superblock
> > occur, and errors need to be handled slightly differently. Seperate
> 
> Separate
> 
> > out the code into its own function, and clean up the error goto
> > stack in the core growfs code as it is now much simpler.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_fsops.c | 154 ++++++++++++++++++++++++++++++-----------------------
> >  1 file changed, 87 insertions(+), 67 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index 5c844e540320..113be7dbdc81 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> ...
> > @@ -572,16 +572,79 @@ xfs_growfs_data_private(
> >  		error = xfs_ag_resv_free(pag);
> >  		xfs_perag_put(pag);
> >  		if (error)
> > -			goto out;
> > +			return error;
> >  	}
> >  
> >  	/* Reserve AG metadata blocks. */
> > -	error = xfs_fs_reserve_ag_blocks(mp);
> > -	if (error && error != -ENOSPC)
> > -		goto out;
> > +	return xfs_fs_reserve_ag_blocks(mp);
> 
> It looks like we change the semantics of -ENOSPC during perag
> reservation init. No mention of whether this is intentional and/or
> why..?

Not sure what I changed here - it just returns the error to the
caller because it's no longer going to jump over code after
xfs_fs_reserve_ag_blocks(mp) has already shut down the filesystem
(which it does on any error other than ENOSPC).

Perhaps....

> > @@ -694,6 +707,7 @@ xfs_growfs_data(
> >  	struct xfs_mount	*mp,
> >  	struct xfs_growfs_data	*in)
> >  {
> > +	xfs_agnumber_t		oagcount;
> >  	int			error = 0;
> >  
> >  	if (!capable(CAP_SYS_ADMIN))
> > @@ -708,6 +722,7 @@ xfs_growfs_data(
> >  			goto out_error;
> >  	}
> >  
> > +	oagcount = mp->m_sb.sb_agcount;
> >  	error = xfs_growfs_data_private(mp, in);
> >  	if (error)
> >  		goto out_error;

.... you are commenting on this code here, were ENOSPC is not
specially handled to all the superblocks to be updated even if we
got an ENOSPC on data-grow?

> > @@ -722,6 +737,11 @@ xfs_growfs_data(
> >  	} else
> >  		mp->m_maxicount = 0;
> >  
> > +	/*
> > +	 * Update secondary superblocks now the physical grow has completed
> > +	 */
> > +	error = xfs_growfs_update_superblocks(mp, oagcount);
> > +

i.e. it doesn't run this at ENOSPC now?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-09 16:12   ` Brian Foster
@ 2018-02-15 22:31     ` Dave Chinner
  2018-02-16 12:56       ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-15 22:31 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Feb 09, 2018 at 11:12:41AM -0500, Brian Foster wrote:
> On Thu, Feb 01, 2018 at 05:42:02PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Right now we wait until we've committed changes to the primary
> > superblock before we initialise any of the new secondary
> > superblocks. This means that if we have any write errors for new
> > secondary superblocks we end up with garbage in place rather than
> > zeros or even an "in progress" superblock to indicate a grow
> > operation is being done.
> > 
> > To ensure we can write the secondary superblocks, initialise them
> > earlier in the same loop that initialises the AG headers. We stamp
> > the new secondary superblocks here with the old geometry, but set
> > the "sb_inprogress" field to indicate that updates are being done to
> > the superblock so they cannot be used.  This will result in the
> > secondary superblock fields being updated or triggering errors that
> > will abort the grow before we commit any permanent changes.
> > 
> > This also means we can change the update mechanism of the secondary
> > superblocks.  We know that we are going to wholly overwrite the
> > information in the struct xfs_sb in the buffer, so there's no point
> > reading it from disk. Just allocate an uncached buffer, zero it in
> > memory, stamp the new superblock structure in it and write it out.
> > If we fail to write it out, then we'll leave the existing sb (old or
> > new w/ inprogress) on disk for repair to deal with later.
> > 
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_fsops.c | 92 ++++++++++++++++++++++++++++++++----------------------
> >  1 file changed, 55 insertions(+), 37 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index 113be7dbdc81..7318cebb591d 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> ...
> > @@ -630,43 +653,27 @@ xfs_growfs_imaxpct(
> >  
> ...
> >  static int
> >  xfs_growfs_update_superblocks(
> ...
> >  	/* update secondary superblocks. */
> >  	for (agno = 1; agno < mp->m_sb.sb_agcount; agno++) {
> > -		error = 0;
> > -		/*
> > -		 * new secondary superblocks need to be zeroed, not read from
> > -		 * disk as the contents of the new area we are growing into is
> > -		 * completely unknown.
> > -		 */
> > -		if (agno < oagcount) {
> > -			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
> > -				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> > -				  XFS_FSS_TO_BB(mp, 1), 0, &bp,
> > -				  &xfs_sb_buf_ops);
> > -		} else {
> > -			bp = xfs_trans_get_buf(NULL, mp->m_ddev_targp,
> > -				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> > -				  XFS_FSS_TO_BB(mp, 1), 0);
> > -			if (bp) {
> > -				bp->b_ops = &xfs_sb_buf_ops;
> > -				xfs_buf_zero(bp, 0, BBTOB(bp->b_length));
> > -			} else
> > -				error = -ENOMEM;
> > -		}
> > +		struct xfs_buf		*bp;
> >  
> > +		bp = xfs_growfs_get_hdr_buf(mp,
> > +				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
> > +				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);
> 
> This all seems fine to me up until the point where we use uncached
> buffers for pre-existing secondary superblocks. This may all be fine now
> if nothing else happens to access/use secondary supers, but it seems
> like this essentially enforces that going forward.
> 
> Hmm, I see that scrub does appear to look at secondary superblocks via
> cached buffers. Shouldn't we expect this path to maintain coherency with
> an sb buffer that may have been read/cached from there?

Good catch! I wrote this before scrub started looking at secondary
superblocks. As a general rule, we don't want to cache secondary
superblocks as they should never be used by the kernel except in
exceptional situations like grow or scrub.

I'll have a look at making this use cached buffers that get freed
immediately after we release them (i.e. don't go onto the LRU) and
that should solve the problem.
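[Editorial illustration: a toy single-slot cache model of that plan, with entirely invented names; the real kernel mechanism may differ. Lookups return the same object while a buffer is cached, which gives the coherency scrub needs, and a buffer whose LRU reference is zeroed is freed on release instead of lingering on the LRU.]

```c
#include <assert.h>
#include <stdlib.h>

struct fake_buf { int lru_ref; int data; };

static struct fake_buf *cached = NULL;	/* one-slot "cache" */

static struct fake_buf *fake_buf_get(void)
{
	if (!cached) {
		cached = calloc(1, sizeof(*cached));
		cached->lru_ref = 1;	/* default: park on the LRU */
	}
	return cached;			/* coherent: same object back */
}

static void fake_buf_oneshot(struct fake_buf *bp)
{
	bp->lru_ref = 0;		/* free as soon as released */
}

static void fake_buf_release(struct fake_buf *bp)
{
	if (bp->lru_ref == 0) {
		free(bp);		/* one-shot: gone immediately */
		cached = NULL;
	}
	/* otherwise the buffer stays cached for the next lookup */
}
```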

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/7] xfs: separate secondary sb update in growfs
  2018-02-15 22:23     ` Dave Chinner
@ 2018-02-16 12:31       ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2018-02-16 12:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Feb 16, 2018 at 09:23:58AM +1100, Dave Chinner wrote:
> On Fri, Feb 09, 2018 at 11:11:54AM -0500, Brian Foster wrote:
> > On Thu, Feb 01, 2018 at 05:42:01PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > This happens after all the transactions to update the superblock
> > > occur, and errors need to be handled slightly differently. Seperate
> > 
> > Separate
> > 
> > > out the code into it's own function, and clean up the error goto
> > > stack in the core growfs code as it is now much simpler.
> > > 
> > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_fsops.c | 154 ++++++++++++++++++++++++++++++-----------------------
> > >  1 file changed, 87 insertions(+), 67 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > index 5c844e540320..113be7dbdc81 100644
> > > --- a/fs/xfs/xfs_fsops.c
> > > +++ b/fs/xfs/xfs_fsops.c
> > ...
> > > @@ -572,16 +572,79 @@ xfs_growfs_data_private(
> > >  		error = xfs_ag_resv_free(pag);
> > >  		xfs_perag_put(pag);
> > >  		if (error)
> > > -			goto out;
> > > +			return error;
> > >  	}
> > >  
> > >  	/* Reserve AG metadata blocks. */
> > > -	error = xfs_fs_reserve_ag_blocks(mp);
> > > -	if (error && error != -ENOSPC)
> > > -		goto out;
> > > +	return xfs_fs_reserve_ag_blocks(mp);
> > 
> > It looks like we change the semantics of -ENOSPC during perag
> > reservation init. No mention of whether this is intentional and/or
> > why..?
> 
> Not sure what I changed here - it just returns the error to the
> caller because it's no longer going to jump over code after
> xfs_fs_reserve_ag_blocks(mp) has already shut down the filesystem
> (which it does on any error other than ENOSPC).
> 

It's the semantics of -ENOSPC (i.e., how that error is/was specially
handled) that looked different..

> Perhaps....
> 
> > > @@ -694,6 +707,7 @@ xfs_growfs_data(
> > >  	struct xfs_mount	*mp,
> > >  	struct xfs_growfs_data	*in)
> > >  {
> > > +	xfs_agnumber_t		oagcount;
> > >  	int			error = 0;
> > >  
> > >  	if (!capable(CAP_SYS_ADMIN))
> > > @@ -708,6 +722,7 @@ xfs_growfs_data(
> > >  			goto out_error;
> > >  	}
> > >  
> > > +	oagcount = mp->m_sb.sb_agcount;
> > >  	error = xfs_growfs_data_private(mp, in);
> > >  	if (error)
> > >  		goto out_error;
> 
> .... you are commenting on this code here, where ENOSPC is not
> specially handled to allow the superblocks to be updated even if we
> got an ENOSPC on data-grow?
> 

Not sure I parse that...

> > > @@ -722,6 +737,11 @@ xfs_growfs_data(
> > >  	} else
> > >  		mp->m_maxicount = 0;
> > >  
> > > +	/*
> > > +	 * Update secondary superblocks now the physical grow has completed
> > > +	 */
> > > +	error = xfs_growfs_update_superblocks(mp, oagcount);
> > > +
> 
> i.e. it doesn't run this at ENOSPC now?
> 

... but yeah, this I think, taking a quick look back.

Essentially it looked like -ENOSPC from the perag res init currently
does not result in a growfs operation error. We'd simply move on to the
next step and the growfs may very well return success. Here, it looks
like we've changed behavior to return -ENOSPC to userspace (without any
explanation).

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-15 22:31     ` Dave Chinner
@ 2018-02-16 12:56       ` Brian Foster
  2018-02-16 16:20         ` Darrick J. Wong
  2018-02-19  2:16         ` Dave Chinner
  0 siblings, 2 replies; 32+ messages in thread
From: Brian Foster @ 2018-02-16 12:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Feb 16, 2018 at 09:31:38AM +1100, Dave Chinner wrote:
> On Fri, Feb 09, 2018 at 11:12:41AM -0500, Brian Foster wrote:
> > On Thu, Feb 01, 2018 at 05:42:02PM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Right now we wait until we've committed changes to the primary
> > > superblock before we initialise any of the new secondary
> > > superblocks. This means that if we have any write errors for new
> > > secondary superblocks we end up with garbage in place rather than
> > > zeros or even an "in progress" superblock to indicate a grow
> > > operation is being done.
> > > 
> > > To ensure we can write the secondary superblocks, initialise them
> > > earlier in the same loop that initialises the AG headers. We stamp
> > > the new secondary superblocks here with the old geometry, but set
> > > the "sb_inprogress" field to indicate that updates are being done to
> > > the superblock so they cannot be used.  This will result in the
> > > secondary superblock fields being updated or triggering errors that
> > > will abort the grow before we commit any permanent changes.
> > > 
> > > This also means we can change the update mechanism of the secondary
> > > superblocks.  We know that we are going to wholly overwrite the
> > > information in the struct xfs_sb in the buffer, so there's no point
> > > reading it from disk. Just allocate an uncached buffer, zero it in
> > > memory, stamp the new superblock structure in it and write it out.
> > > If we fail to write it out, then we'll leave the existing sb (old or
> > > new w/ inprogress) on disk for repair to deal with later.
> > > 
> > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/xfs_fsops.c | 92 ++++++++++++++++++++++++++++++++----------------------
> > >  1 file changed, 55 insertions(+), 37 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > index 113be7dbdc81..7318cebb591d 100644
> > > --- a/fs/xfs/xfs_fsops.c
> > > +++ b/fs/xfs/xfs_fsops.c
> > ...
> > > @@ -630,43 +653,27 @@ xfs_growfs_imaxpct(
> > >  
> > ...
> > >  static int
> > >  xfs_growfs_update_superblocks(
> > ...
> > >  	/* update secondary superblocks. */
> > >  	for (agno = 1; agno < mp->m_sb.sb_agcount; agno++) {
> > > -		error = 0;
> > > -		/*
> > > -		 * new secondary superblocks need to be zeroed, not read from
> > > -		 * disk as the contents of the new area we are growing into is
> > > -		 * completely unknown.
> > > -		 */
> > > -		if (agno < oagcount) {
> > > -			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
> > > -				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> > > -				  XFS_FSS_TO_BB(mp, 1), 0, &bp,
> > > -				  &xfs_sb_buf_ops);
> > > -		} else {
> > > -			bp = xfs_trans_get_buf(NULL, mp->m_ddev_targp,
> > > -				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> > > -				  XFS_FSS_TO_BB(mp, 1), 0);
> > > -			if (bp) {
> > > -				bp->b_ops = &xfs_sb_buf_ops;
> > > -				xfs_buf_zero(bp, 0, BBTOB(bp->b_length));
> > > -			} else
> > > -				error = -ENOMEM;
> > > -		}
> > > +		struct xfs_buf		*bp;
> > >  
> > > +		bp = xfs_growfs_get_hdr_buf(mp,
> > > +				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
> > > +				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);
> > 
> > This all seems fine to me up until the point where we use uncached
> > buffers for pre-existing secondary superblocks. This may all be fine now
> > if nothing else happens to access/use secondary supers, but it seems
> > like this essentially enforces that going forward.
> > 
> > Hmm, I see that scrub does appear to look at secondary superblocks via
> > cached buffers. Shouldn't we expect this path to maintain coherency with
> > an sb buffer that may have been read/cached from there?
> 
> Good catch! I wrote this before scrub started looking at secondary
> superblocks. As a general rulle, we don't want to cache secondary
> superblocks as they should never be used by the kernel except in
> exceptional situations like grow or scrub.
> 
> I'll have a look at making this use cached buffers that get freed
> immediately after we release them (i.e. don't go onto the LRU) and
> that should solve the problem.
> 

Ok. Though that sounds a bit odd. What is the purpose of a cached buffer
that is not cached? Isn't the behavior you're after here (perhaps
analogous to pagecache coherency management between buffered/direct I/O)
more cleanly implemented using a cache invalidation mechanism? E.g.,
invalidate cache, use uncached buffer (then perhaps invalidate again).

I guess I'm also a little curious why we couldn't continue to use cached
buffers here, but it doesn't really matter to me that much so long as
the metadata ends up coherent between subsystems..

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-16 12:56       ` Brian Foster
@ 2018-02-16 16:20         ` Darrick J. Wong
  2018-02-19  2:16         ` Dave Chinner
  1 sibling, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2018-02-16 16:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Fri, Feb 16, 2018 at 07:56:25AM -0500, Brian Foster wrote:
> On Fri, Feb 16, 2018 at 09:31:38AM +1100, Dave Chinner wrote:
> > On Fri, Feb 09, 2018 at 11:12:41AM -0500, Brian Foster wrote:
> > > On Thu, Feb 01, 2018 at 05:42:02PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Right now we wait until we've committed changes to the primary
> > > > superblock before we initialise any of the new secondary
> > > > superblocks. This means that if we have any write errors for new
> > > > secondary superblocks we end up with garbage in place rather than
> > > > zeros or even an "in progress" superblock to indicate a grow
> > > > operation is being done.
> > > > 
> > > > To ensure we can write the secondary superblocks, initialise them
> > > > earlier in the same loop that initialises the AG headers. We stamp
> > > > the new secondary superblocks here with the old geometry, but set
> > > > the "sb_inprogress" field to indicate that updates are being done to
> > > > the superblock so they cannot be used.  This will result in the
> > > > secondary superblock fields being updated or triggering errors that
> > > > will abort the grow before we commit any permanent changes.
> > > > 
> > > > This also means we can change the update mechanism of the secondary
> > > > superblocks.  We know that we are going to wholly overwrite the
> > > > information in the struct xfs_sb in the buffer, so there's no point
> > > > reading it from disk. Just allocate an uncached buffer, zero it in
> > > > memory, stamp the new superblock structure in it and write it out.
> > > > If we fail to write it out, then we'll leave the existing sb (old or
> > > > new w/ inprogress) on disk for repair to deal with later.
> > > > 
> > > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > >  fs/xfs/xfs_fsops.c | 92 ++++++++++++++++++++++++++++++++----------------------
> > > >  1 file changed, 55 insertions(+), 37 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > > > index 113be7dbdc81..7318cebb591d 100644
> > > > --- a/fs/xfs/xfs_fsops.c
> > > > +++ b/fs/xfs/xfs_fsops.c
> > > ...
> > > > @@ -630,43 +653,27 @@ xfs_growfs_imaxpct(
> > > >  
> > > ...
> > > >  static int
> > > >  xfs_growfs_update_superblocks(
> > > ...
> > > >  	/* update secondary superblocks. */
> > > >  	for (agno = 1; agno < mp->m_sb.sb_agcount; agno++) {
> > > > -		error = 0;
> > > > -		/*
> > > > -		 * new secondary superblocks need to be zeroed, not read from
> > > > -		 * disk as the contents of the new area we are growing into is
> > > > -		 * completely unknown.
> > > > -		 */
> > > > -		if (agno < oagcount) {
> > > > -			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
> > > > -				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> > > > -				  XFS_FSS_TO_BB(mp, 1), 0, &bp,
> > > > -				  &xfs_sb_buf_ops);
> > > > -		} else {
> > > > -			bp = xfs_trans_get_buf(NULL, mp->m_ddev_targp,
> > > > -				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
> > > > -				  XFS_FSS_TO_BB(mp, 1), 0);
> > > > -			if (bp) {
> > > > -				bp->b_ops = &xfs_sb_buf_ops;
> > > > -				xfs_buf_zero(bp, 0, BBTOB(bp->b_length));
> > > > -			} else
> > > > -				error = -ENOMEM;
> > > > -		}
> > > > +		struct xfs_buf		*bp;
> > > >  
> > > > +		bp = xfs_growfs_get_hdr_buf(mp,
> > > > +				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
> > > > +				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);
> > > 
> > > This all seems fine to me up until the point where we use uncached
> > > buffers for pre-existing secondary superblocks. This may all be fine now
> > > if nothing else happens to access/use secondary supers, but it seems
> > > like this essentially enforces that going forward.
> > > 
> > > Hmm, I see that scrub does appear to look at secondary superblocks via
> > > cached buffers. Shouldn't we expect this path to maintain coherency with
> > > an sb buffer that may have been read/cached from there?
> > 
> > Good catch! I wrote this before scrub started looking at secondary
> > superblocks. As a general rulle, we don't want to cache secondary
> > superblocks as they should never be used by the kernel except in
> > exceptional situations like grow or scrub.
> > 
> > I'll have a look at making this use cached buffers that get freed
> > immediately after we release them (i.e. don't go onto the LRU) and
> > that should solve the problem.
> > 
> 
> Ok. Though that sounds a bit odd. What is the purpose of a cached buffer
> that is not cached? Isn't the behavior you're after here (perhaps
> analogous to pagecache coherency management between buffered/direct I/O)
> more cleanly implemented using a cache invalidation mechanism? E.g.,
> invalidate cache, use uncached buffer (then perhaps invalidate again).
>
> I guess I'm also a little curious why we couldn't continue to use cached
> buffers here, but it doesn't really matter to me that much so long as
> the metadata ends up coherent between subsystems..

Perhaps it would be easier to change the sb scrub to use
xfs_buf_read_uncached instead?  The critical blind spot here for me is
that I'm not sure why secondary superblock buffers are uncached.

--D

> 
> Brian
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com


* Re: [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-16 12:56       ` Brian Foster
  2018-02-16 16:20         ` Darrick J. Wong
@ 2018-02-19  2:16         ` Dave Chinner
  2018-02-19 13:21           ` Brian Foster
  1 sibling, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-19  2:16 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Feb 16, 2018 at 07:56:25AM -0500, Brian Foster wrote:
> On Fri, Feb 16, 2018 at 09:31:38AM +1100, Dave Chinner wrote:
> > On Fri, Feb 09, 2018 at 11:12:41AM -0500, Brian Foster wrote:
> > > On Thu, Feb 01, 2018 at 05:42:02PM +1100, Dave Chinner wrote:
> > > > +		bp = xfs_growfs_get_hdr_buf(mp,
> > > > +				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
> > > > +				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);
> > > 
> > > This all seems fine to me up until the point where we use uncached
> > > buffers for pre-existing secondary superblocks. This may all be fine now
> > > if nothing else happens to access/use secondary supers, but it seems
> > > like this essentially enforces that going forward.
> > > 
> > > Hmm, I see that scrub does appear to look at secondary superblocks via
> > > cached buffers. Shouldn't we expect this path to maintain coherency with
> > > an sb buffer that may have been read/cached from there?
> > 
> > Good catch! I wrote this before scrub started looking at secondary
> > superblocks. As a general rulle, we don't want to cache secondary
> > superblocks as they should never be used by the kernel except in
> > exceptional situations like grow or scrub.
> > 
> > I'll have a look at making this use cached buffers that get freed
> > immediately after we release them (i.e. don't go onto the LRU) and
> > that should solve the problem.
> > 
> 
> Ok. Though that sounds a bit odd. What is the purpose of a cached buffer
> that is not cached?

Serialisation of concurrent access to what is normally a single-use
access code path while it is in memory. i.e. exactly the reason we
have XFS_IGET_DONTCACHE and use it for things like bulkstat lookups.

> Isn't the behavior you're after here (perhaps
> analogous to pagecache coherency management between buffered/direct I/O)
> more cleanly implemented using a cache invalidation mechanism? E.g.,
> invalidate cache, use uncached buffer (then perhaps invalidate again).

Invalidation as a mechanism for non-coherent access synchronisation
is a completely broken model when it comes to concurrent access. We
explicitly tell app developers not to mix cached + uncached IO to
the same file for exactly this reason.  Using a cached buffer and
using the existing xfs_buf_find/lock serialisation avoids this
problem, and by freeing them immediately after we've used them we
also minimise the memory footprint of single-use access patterns.

> I guess I'm also a little curious why we couldn't continue to use cached
> buffers here,

As I said, we will continue to use cached buffers here. I'll just
call xfs_buf_set_ref(bp, 0) on them so they are reclaimed when
released. That means concurrent access will serialise correctly
through _xfs_buf_find(), otherwise we won't keep them in memory.

> but it doesn't really matter to me that much so long as
> the metadata ends up coherent between subsystems..

Yup, that's the idea.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-19  2:16         ` Dave Chinner
@ 2018-02-19 13:21           ` Brian Foster
  2018-02-19 22:14             ` Dave Chinner
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2018-02-19 13:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Mon, Feb 19, 2018 at 01:16:36PM +1100, Dave Chinner wrote:
> On Fri, Feb 16, 2018 at 07:56:25AM -0500, Brian Foster wrote:
> > On Fri, Feb 16, 2018 at 09:31:38AM +1100, Dave Chinner wrote:
> > > On Fri, Feb 09, 2018 at 11:12:41AM -0500, Brian Foster wrote:
> > > > On Thu, Feb 01, 2018 at 05:42:02PM +1100, Dave Chinner wrote:
> > > > > +		bp = xfs_growfs_get_hdr_buf(mp,
> > > > > +				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
> > > > > +				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);
> > > > 
> > > > This all seems fine to me up until the point where we use uncached
> > > > buffers for pre-existing secondary superblocks. This may all be fine now
> > > > if nothing else happens to access/use secondary supers, but it seems
> > > > like this essentially enforces that going forward.
> > > > 
> > > > Hmm, I see that scrub does appear to look at secondary superblocks via
> > > > cached buffers. Shouldn't we expect this path to maintain coherency with
> > > > an sb buffer that may have been read/cached from there?
> > > 
> > > Good catch! I wrote this before scrub started looking at secondary
> > > superblocks. As a general rulle, we don't want to cache secondary
> > > superblocks as they should never be used by the kernel except in
> > > exceptional situations like grow or scrub.
> > > 
> > > I'll have a look at making this use cached buffers that get freed
> > > immediately after we release them (i.e. don't go onto the LRU) and
> > > that should solve the problem.
> > > 
> > 
> > Ok. Though that sounds a bit odd. What is the purpose of a cached buffer
> > that is not cached?
> 
> Serialisation of concurrent access to what is normal a single-use
> access code path while it is in memory. i.e. exactly the reason we
> have XFS_IGET_DONTCACHE and use it for things like bulkstat lookups.
> 

Well, that's the purpose of looking up a cached instance of an uncached
buffer. That makes sense, but that's only half the question...

> > Isn't the behavior you're after here (perhaps
> > analogous to pagecache coherency management between buffered/direct I/O)
> > more cleanly implemented using a cache invalidation mechanism? E.g.,
> > invalidate cache, use uncached buffer (then perhaps invalidate again).
> 
> Invalidation as a mechanism for non-coherent access sycnhronisation
> is completely broken model when it comes to concurrent access. We
> explicitly tell app developers not ot mix cached + uncached IO to
> the same file for exactly this reason.  Using a cached buffer and
> using the existing xfs_buf_find/lock serialisation avoids this
> problem, and by freeing them immediately after we've used them we
> also minimise the memory footprint of single-use access patterns.
> 

Ok..

> > I guess I'm also a little curious why we couldn't continue to use cached
> > buffers here,
> 
> As I said, we will continue to use cached buffers here. I'll just
> call xfs_buf_set_ref(bp, 0) on them so they are reclaimed when
> released. That means concurrent access will serialise correctly
> through _xfs_buf_find(), otherwise we won't keep them in memory.
> 

Ok, but what's the purpose/motivation for doing that here? Purely to
save on memory? Is that really an impactful enough change in behavior
for (pre-existing) secondary superblocks? This seems a clear enough
decision when growfs was the only consumer of these buffers, but having
another cached accessor kind of clouds the logic.

E.g., if task A reads a set of buffers cached, it's made a decision that
it's potentially beneficial to leave them around. Now we have task B
that has decided it doesn't want to cache the buffers, but what bearing
does that have on task A? It certainly makes sense for task B to drop
any buffer that wasn't already cached, but for already cached buffers it
doesn't really make sense for task B to decide there is no further
advantage to caching for task A.

FWIW, I think this is how IGET_DONTCACHE works: don't cache the inode
unless it was actually found in cache. I presume that is so a bulkstat
or whatever doesn't toss the existing cached inode working set. It also
looks like an intermediate xfs_iget_cache_hit() actually clears the
pending 'don't cache' state (which makes me wonder what happens when
simultaneous 'don't cache' lookups occur; afaict we'd end up with a
cached inode :/). Bugs aside, perhaps that is a better approach here
rather than stomping on the lru reference count?

Brian

P.S., Another factor to consider is that I think this may have
potential for unintended side effects without one of the previously
suggested changes
to not call into the growfs internals code on pure imaxpct changes
(which I think you indicated you were going to fix, I just haven't
looked back).

> > but it doesn't really matter to me that much so long as
> > the metadata ends up coherent between subsystems..
> 
> Yup, that's the idea.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-19 13:21           ` Brian Foster
@ 2018-02-19 22:14             ` Dave Chinner
  2018-02-20 12:44               ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Dave Chinner @ 2018-02-19 22:14 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Feb 19, 2018 at 08:21:04AM -0500, Brian Foster wrote:
> On Mon, Feb 19, 2018 at 01:16:36PM +1100, Dave Chinner wrote:
> > On Fri, Feb 16, 2018 at 07:56:25AM -0500, Brian Foster wrote:
> > > On Fri, Feb 16, 2018 at 09:31:38AM +1100, Dave Chinner wrote:
> > > > On Fri, Feb 09, 2018 at 11:12:41AM -0500, Brian Foster wrote:
> > > > > On Thu, Feb 01, 2018 at 05:42:02PM +1100, Dave Chinner wrote:
> > > > > > +		bp = xfs_growfs_get_hdr_buf(mp,
> > > > > > +				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
> > > > > > +				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);
> > > > > 
> > > > > This all seems fine to me up until the point where we use uncached
> > > > > buffers for pre-existing secondary superblocks. This may all be fine now
> > > > > if nothing else happens to access/use secondary supers, but it seems
> > > > > like this essentially enforces that going forward.
> > > > > 
> > > > > Hmm, I see that scrub does appear to look at secondary superblocks via
> > > > > cached buffers. Shouldn't we expect this path to maintain coherency with
> > > > > an sb buffer that may have been read/cached from there?
> > > > 
> > > > Good catch! I wrote this before scrub started looking at secondary
> > > > superblocks. As a general rulle, we don't want to cache secondary
> > > > superblocks as they should never be used by the kernel except in
> > > > exceptional situations like grow or scrub.
> > > > 
> > > > I'll have a look at making this use cached buffers that get freed
> > > > immediately after we release them (i.e. don't go onto the LRU) and
> > > > that should solve the problem.
> > > > 
> > > 
> > > Ok. Though that sounds a bit odd. What is the purpose of a cached buffer
> > > that is not cached?
> > 
> > Serialisation of concurrent access to what is normal a single-use
> > access code path while it is in memory. i.e. exactly the reason we
> > have XFS_IGET_DONTCACHE and use it for things like bulkstat lookups.
> > 
> 
> Well, that's the purpose of looking up a cached instance of an uncached
> buffer. That makes sense, but that's only half the question...
> 
> > > Isn't the behavior you're after here (perhaps
> > > analogous to pagecache coherency management between buffered/direct I/O)
> > > more cleanly implemented using a cache invalidation mechanism? E.g.,
> > > invalidate cache, use uncached buffer (then perhaps invalidate again).
> > 
> > Invalidation as a mechanism for non-coherent access sycnhronisation
> > is completely broken model when it comes to concurrent access. We
> > explicitly tell app developers not ot mix cached + uncached IO to
> > the same file for exactly this reason.  Using a cached buffer and
> > using the existing xfs_buf_find/lock serialisation avoids this
> > problem, and by freeing them immediately after we've used them we
> > also minimise the memory footprint of single-use access patterns.
> > 
> 
> Ok..
> 
> > > I guess I'm also a little curious why we couldn't continue to use cached
> > > buffers here,
> > 
> > As I said, we will continue to use cached buffers here. I'll just
> > call xfs_buf_set_ref(bp, 0) on them so they are reclaimed when
> > released. That means concurrent access will serialise correctly
> > through _xfs_buf_find(), otherwise we won't keep them in memory.
> > 
> 
> Ok, but what's the purpose/motivation for doing that here? Purely to
> save on memory?

Partly, but mainly because they are single use buffers and accesses
are so rare that it's a waste of resources to cache them because
they'll be reclaimed long before they are ever accessed again.

> Is that really an impactful enough change in behavior
> for (pre-existing) secondary superblocks?

Yes. We know that there are people out there doing "create tiny,
deploy, grow to thousands of AGs" as part of their crazy, screwed up
container deployment scripts. That's thousands of secondary
superblocks that will be cached and generate unnecessary memory
pressure.

> This seems a clear enough
> decision when growfs was the only consumer of these buffers, but having
> another cached accessor kind of clouds the logic.

Scrub is not something that runs often enough that we should be
trying to cache its metadata to speed up the next run. The whole point of
scrub is that it reads metadata that hasn't been accessed in a long
time to verify it hasn't degraded. Caching secondary superblocks for
either growfs or scrub makes no sense. However, we have to make sure
if the two occur at the same time, their actions are coherent and
correctly serialised.

> E.g., if task A reads a set of buffers cached, it's made a decision that
> it's potentially beneficial to leave them around. Now we have task B
> that has decided it doesn't want to cache the buffers, but what bearing
> does that have on task A? It certainly makes sense for task B to drop
> any buffer that wasn't already cached, but for already cached buffers it
> doesn't really make sense for task B to decide there is no further
> advantage to caching for task A.
> 
> FWIW, I think this is how IGET_DONTCACHE works: don't cache the inode
> unless it was actually found in cache. I presume that is so a bulkstat
> or whatever doesn't toss the existing cached inode working set.

Yes, precisely the point of this inode cache behaviour. However,
that's not a concern for secondary superblocks because they are
never part of the working set of metadata ongoing user workloads
require to be cached. They only get brought into memory as a result
of admin operations, and those are very, very rare.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-19 22:14             ` Dave Chinner
@ 2018-02-20 12:44               ` Brian Foster
  2018-03-24  0:37                 ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2018-02-20 12:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Feb 20, 2018 at 09:14:04AM +1100, Dave Chinner wrote:
> On Mon, Feb 19, 2018 at 08:21:04AM -0500, Brian Foster wrote:
> > On Mon, Feb 19, 2018 at 01:16:36PM +1100, Dave Chinner wrote:
> > > On Fri, Feb 16, 2018 at 07:56:25AM -0500, Brian Foster wrote:
> > > > On Fri, Feb 16, 2018 at 09:31:38AM +1100, Dave Chinner wrote:
> > > > > On Fri, Feb 09, 2018 at 11:12:41AM -0500, Brian Foster wrote:
> > > > > > On Thu, Feb 01, 2018 at 05:42:02PM +1100, Dave Chinner wrote:
> > > > > > > +		bp = xfs_growfs_get_hdr_buf(mp,
> > > > > > > +				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
> > > > > > > +				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);
> > > > > > 
> > > > > > This all seems fine to me up until the point where we use uncached
> > > > > > buffers for pre-existing secondary superblocks. This may all be fine now
> > > > > > if nothing else happens to access/use secondary supers, but it seems
> > > > > > like this essentially enforces that going forward.
> > > > > > 
> > > > > > Hmm, I see that scrub does appear to look at secondary superblocks via
> > > > > > cached buffers. Shouldn't we expect this path to maintain coherency with
> > > > > > an sb buffer that may have been read/cached from there?
> > > > > 
> > > > > Good catch! I wrote this before scrub started looking at secondary
> > > > > superblocks. As a general rulle, we don't want to cache secondary
> > > > > superblocks as they should never be used by the kernel except in
> > > > > exceptional situations like grow or scrub.
> > > > > 
> > > > > I'll have a look at making this use cached buffers that get freed
> > > > > immediately after we release them (i.e. don't go onto the LRU) and
> > > > > that should solve the problem.
> > > > > 
> > > > 
> > > > Ok. Though that sounds a bit odd. What is the purpose of a cached buffer
> > > > that is not cached?
> > > 
> > > Serialisation of concurrent access to what is normal a single-use
> > > access code path while it is in memory. i.e. exactly the reason we
> > > have XFS_IGET_DONTCACHE and use it for things like bulkstat lookups.
> > > 
> > 
> > Well, that's the purpose of looking up a cached instance of an uncached
> > buffer. That makes sense, but that's only half the question...
> > 
> > > > Isn't the behavior you're after here (perhaps
> > > > analogous to pagecache coherency management between buffered/direct I/O)
> > > > more cleanly implemented using a cache invalidation mechanism? E.g.,
> > > > invalidate cache, use uncached buffer (then perhaps invalidate again).
> > > 
> > > Invalidation as a mechanism for non-coherent access sycnhronisation
> > > is completely broken model when it comes to concurrent access. We
> > > explicitly tell app developers not ot mix cached + uncached IO to
> > > the same file for exactly this reason.  Using a cached buffer and
> > > using the existing xfs_buf_find/lock serialisation avoids this
> > > problem, and by freeing them immediately after we've used them we
> > > also minimise the memory footprint of single-use access patterns.
> > > 
> > 
> > Ok..
> > 
> > > > I guess I'm also a little curious why we couldn't continue to use cached
> > > > buffers here,
> > > 
> > > As I said, we will continue to use cached buffers here. I'll just
> > > call xfs_buf_set_ref(bp, 0) on them so they are reclaimed when
> > > released. That means concurrent access will serialise correctly
> > > through _xfs_buf_find(), otherwise we won't keep them in memory.
> > > 
> > 
> > Ok, but what's the purpose/motivation for doing that here? Purely to
> > save on memory?
> 
> Partly, but mainly because they are single use buffers and accesses
> are so rare that it's a waste of resources to cache them because
> they'll be reclaimed long before they are ever accessed again.
> 
> > Is that really an impactful enough change in behavior
> > for (pre-existing) secondary superblocks?
> 
> Yes. We know that there are people out there doing "create tiny,
> deploy, grow to thousands of AGs" as part of their crazy, screwed up
> > container deployment scripts. That's thousands of secondary
> superblocks that will be cached and generate unnecessary memory
> > pressure when cached.
> 
> > This seems a clear enough
> > decision when growfs was the only consumer of these buffers, but having
> > another cached accessor kind of clouds the logic.
> 
> > Scrub is not something that runs often enough that we should be trying
> > to cache its metadata to speed up the next run. The whole point of
> scrub is that it reads metadata that hasn't been accessed in a long
> time to verify it hasn't degraded. Caching secondary superblocks for
> either growfs or scrub makes no sense. However, we have to make sure
> if the two occur at the same time, their actions are coherent and
> correctly serialised.
> 

Ok, so then the right thing to do (as Darrick posited earlier) is also
tweak scrub to effectively not cache buffers from that path. That seems
perfectly reasonable to me.

> > E.g., if task A reads a set of buffers cached, it's made a decision that
> > it's potentially beneficial to leave them around. Now we have task B
> > that has decided it doesn't want to cache the buffers, but what bearing
> > does that have on task A? It certainly makes sense for task B to drop
> > any buffer that wasn't already cached, but for already cached buffers it
> > doesn't really make sense for task B to decide there is no further
> > advantage to caching for task A.
> > 
> > FWIW, I think this is how IGET_DONTCACHE works: don't cache the inode
> > unless it was actually found in cache. I presume that is so a bulkstat
> > or whatever doesn't toss the existing cached inode working set.
> 
> Yes, precisely the point of this inode cache behaviour. However,
> that's not a concern for secondary superblocks because they are
> never part of the working set of metadata ongoing user workloads
> require to be cached. They only get brought into memory as a result
> of admin operations, and those are very, very rare.
> 

I'm not concerned about trashing a working set of secondary superblocks
in practice... Darrick has already suggested that it's probably not
critical for scrub and I think your reasoning also makes sense. I'm just
pointing out that we have a similar interface/control in place for
another cached object, and that is how it happens to work.

With that in mind, I am still interested in having sane/consistent and
predictable behavior here. Having two user driven operations A and B
where path A caches buffers and path B effectively invalidates that
cache (for the purpose of saving memory) doesn't make a lot of sense to
me. However, having two paths that both use "don't cache" references is
clean, predictable and provides the necessary coherency between them.

So to be more specific, all I'm really suggesting here is something like
an xfs_read_secondary_sb() helper that calls xfs_buf_set_ref(bp,
XFS_SSB_REF) on the buffer, and to use that in both places so it's clear
that we expect to handle such buffers in a certain way going forward. It
might also be worth factoring into a separate patch since this is
technically a change in behavior (growfs currently uses cached buffers)
worthy of an independent commit log (IMO), but not a huge deal if that
is too much churn.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH 7/7] xfs: rework secondary superblock updates in growfs
  2018-02-20 12:44               ` Brian Foster
@ 2018-03-24  0:37                 ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2018-03-24  0:37 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

Drat, this fell off my radar...

On Tue, Feb 20, 2018 at 07:44:39AM -0500, Brian Foster wrote:
> On Tue, Feb 20, 2018 at 09:14:04AM +1100, Dave Chinner wrote:
> > On Mon, Feb 19, 2018 at 08:21:04AM -0500, Brian Foster wrote:
> > > On Mon, Feb 19, 2018 at 01:16:36PM +1100, Dave Chinner wrote:
> > > > On Fri, Feb 16, 2018 at 07:56:25AM -0500, Brian Foster wrote:
> > > > > On Fri, Feb 16, 2018 at 09:31:38AM +1100, Dave Chinner wrote:
> > > > > > On Fri, Feb 09, 2018 at 11:12:41AM -0500, Brian Foster wrote:
> > > > > > > On Thu, Feb 01, 2018 at 05:42:02PM +1100, Dave Chinner wrote:
> > > > > > > > +		bp = xfs_growfs_get_hdr_buf(mp,
> > > > > > > > +				XFS_AG_DADDR(mp, agno, XFS_SB_DADDR),
> > > > > > > > +				XFS_FSS_TO_BB(mp, 1), 0, &xfs_sb_buf_ops);
> > > > > > > 
> > > > > > > This all seems fine to me up until the point where we use uncached
> > > > > > > buffers for pre-existing secondary superblocks. This may all be fine now
> > > > > > > if nothing else happens to access/use secondary supers, but it seems
> > > > > > > like this essentially enforces that going forward.
> > > > > > > 
> > > > > > > Hmm, I see that scrub does appear to look at secondary superblocks via
> > > > > > > cached buffers. Shouldn't we expect this path to maintain coherency with
> > > > > > > an sb buffer that may have been read/cached from there?
> > > > > > 
> > > > > > Good catch! I wrote this before scrub started looking at secondary
> > > > > > superblocks. As a general rule, we don't want to cache secondary
> > > > > > superblocks as they should never be used by the kernel except in
> > > > > > exceptional situations like grow or scrub.
> > > > > > 
> > > > > > I'll have a look at making this use cached buffers that get freed
> > > > > > immediately after we release them (i.e. don't go onto the LRU) and
> > > > > > that should solve the problem.
> > > > > > 
> > > > > 
> > > > > Ok. Though that sounds a bit odd. What is the purpose of a cached buffer
> > > > > that is not cached?
> > > > 
> > > > Serialisation of concurrent access to what is normally a single-use
> > > > access code path while it is in memory. i.e. exactly the reason we
> > > > have XFS_IGET_DONTCACHE and use it for things like bulkstat lookups.
> > > > 
> > > 
> > > Well, that's the purpose of looking up a cached instance of an uncached
> > > buffer. That makes sense, but that's only half the question...
> > > 
> > > > > Isn't the behavior you're after here (perhaps
> > > > > analogous to pagecache coherency management between buffered/direct I/O)
> > > > > more cleanly implemented using a cache invalidation mechanism? E.g.,
> > > > > invalidate cache, use uncached buffer (then perhaps invalidate again).
> > > > 
> > > > Invalidation as a mechanism for non-coherent access synchronisation
> > > > is a completely broken model when it comes to concurrent access. We
> > > > explicitly tell app developers not to mix cached + uncached IO to
> > > > the same file for exactly this reason.  Using a cached buffer and
> > > > using the existing xfs_buf_find/lock serialisation avoids this
> > > > problem, and by freeing them immediately after we've used them we
> > > > also minimise the memory footprint of single-use access patterns.
> > > > 
> > > 
> > > Ok..
> > > 
> > > > > I guess I'm also a little curious why we couldn't continue to use cached
> > > > > buffers here,
> > > > 
> > > > As I said, we will continue to use cached buffers here. I'll just
> > > > call xfs_buf_set_ref(bp, 0) on them so they are reclaimed when
> > > > released. That means concurrent access will serialise correctly
> > > > through _xfs_buf_find(), otherwise we won't keep them in memory.
> > > > 
> > > 
> > > Ok, but what's the purpose/motivation for doing that here? Purely to
> > > save on memory?
> > 
> > Partly, but mainly because they are single use buffers and accesses
> > are so rare that it's a waste of resources to cache them because
> > they'll be reclaimed long before they are ever accessed again.
> > 
> > > Is that really an impactful enough change in behavior
> > > for (pre-existing) secondary superblocks?
> > 
> > Yes. We know that there are people out there doing "create tiny,
> > deploy, grow to thousands of AGs" as part of their crazy, screwed up
> > container deployment scripts. That's thousands of secondary
> > superblocks that will be cached and generate unnecessary memory
> > pressure when cached.
> > 
> > > This seems a clear enough
> > > decision when growfs was the only consumer of these buffers, but having
> > > another cached accessor kind of clouds the logic.
> > 
> > Scrub is not something that runs often enough that we should be trying
> > to cache its metadata to speed up the next run. The whole point of
> > scrub is that it reads metadata that hasn't been accessed in a long
> > time to verify it hasn't degraded. Caching secondary superblocks for
> > either growfs or scrub makes no sense. However, we have to make sure
> > if the two occur at the same time, their actions are coherent and
> > correctly serialised.
> > 
> 
> Ok, so then the right thing to do (as Darrick posited earlier) is also
> tweak scrub to effectively not cache buffers from that path. That seems
> perfectly reasonable to me.

Yes. :)

> > > E.g., if task A reads a set of buffers cached, it's made a decision that
> > > it's potentially beneficial to leave them around. Now we have task B
> > > that has decided it doesn't want to cache the buffers, but what bearing
> > > does that have on task A? It certainly makes sense for task B to drop
> > > any buffer that wasn't already cached, but for already cached buffers it
> > > doesn't really make sense for task B to decide there is no further
> > > advantage to caching for task A.
> > > 
> > > FWIW, I think this is how IGET_DONTCACHE works: don't cache the inode
> > > unless it was actually found in cache. I presume that is so a bulkstat
> > > or whatever doesn't toss the existing cached inode working set.
> > 
> > Yes, precisely the point of this inode cache behaviour. However,
> > that's not a concern for secondary superblocks because they are
> > never part of the working set of metadata ongoing user workloads
> > require to be cached. They only get brought into memory as a result
> > of admin operations, and those are very, very rare.
> > 
> 
> I'm not concerned about trashing a working set of secondary superblocks
> in practice... Darrick has already suggested that it's probably not
> critical for scrub and I think your reasoning also makes sense. I'm just
> pointing out that we have a similar interface/control in place for
> another cached object, and that is how it happens to work.
> 
> With that in mind, I am still interested in having sane/consistent and
> predictable behavior here. Having two user driven operations A and B
> where path A caches buffers and path B effectively invalidates that
> cache (for the purpose of saving memory) doesn't make a lot of sense to
> me. However, having two paths that both use "don't cache" references is
> clean, predictable and provides the necessary coherency between them.
> 
> So to be more specific, all I'm really suggesting here is something like
> an xfs_read_secondary_sb() helper that calls xfs_buf_set_ref(bp,
> XFS_SSB_REF) on the buffer, and to use that in both places so it's clear
> that we expect to handle such buffers in a certain way going forward. It
> might also be worth factoring into a separate patch since this is
> technically a change in behavior (growfs currently uses cached buffers)
> worthy of an independent commit log (IMO), but not a huge deal if that
> is too much churn.

Agree.  If such a helper were added as part of this patchset, I'd patch
up the corresponding part of scrub to use it.

Dave, will you be reposting this series soon?  I've decided against
trying to combine this with repair, so (afaict) once Brian's review
comments are addressed I think this one is in relatively good shape.

--D

> 
> Brian
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com


end of thread, other threads:[~2018-03-24  0:37 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-01  6:41 [PATCH 0/7] xfs: refactor and tablise growfs Dave Chinner
2018-02-01  6:41 ` [PATCH 1/7] xfs: factor out AG header initialisation from growfs core Dave Chinner
2018-02-08 18:53   ` Brian Foster
2018-02-01  6:41 ` [PATCH 2/7] xfs: convert growfs AG header init to use buffer lists Dave Chinner
2018-02-08 18:53   ` Brian Foster
2018-02-01  6:41 ` [PATCH 3/7] xfs: factor ag btree reoot block initialisation Dave Chinner
2018-02-08 18:54   ` Brian Foster
2018-02-08 20:00     ` Darrick J. Wong
2018-02-09 13:10       ` Brian Foster
2018-02-12  0:45         ` Darrick J. Wong
2018-02-15  5:53           ` Darrick J. Wong
2018-02-01  6:41 ` [PATCH 4/7] xfs: turn ag header initialisation into a table driven operation Dave Chinner
2018-02-09 16:11   ` Brian Foster
2018-02-01  6:42 ` [PATCH 5/7] xfs: make imaxpct changes in growfs separate Dave Chinner
2018-02-09 16:11   ` Brian Foster
2018-02-15 22:10     ` Dave Chinner
2018-02-01  6:42 ` [PATCH 6/7] xfs: separate secondary sb update in growfs Dave Chinner
2018-02-09 16:11   ` Brian Foster
2018-02-15 22:23     ` Dave Chinner
2018-02-16 12:31       ` Brian Foster
2018-02-01  6:42 ` [PATCH 7/7] xfs: rework secondary superblock updates " Dave Chinner
2018-02-09 16:12   ` Brian Foster
2018-02-15 22:31     ` Dave Chinner
2018-02-16 12:56       ` Brian Foster
2018-02-16 16:20         ` Darrick J. Wong
2018-02-19  2:16         ` Dave Chinner
2018-02-19 13:21           ` Brian Foster
2018-02-19 22:14             ` Dave Chinner
2018-02-20 12:44               ` Brian Foster
2018-03-24  0:37                 ` Darrick J. Wong
2018-02-06 23:44 ` [PATCH 0/7] xfs: refactor and tablise growfs Darrick J. Wong
2018-02-07  7:10   ` Dave Chinner
