* [PATCH v6 00/12] xfs_repair: use btree bulk loading
@ 2020-06-02  4:26 Darrick J. Wong
  2020-06-02  4:26 ` [PATCH 01/12] xfs_repair: drop lostblocks from build_agf_agfl Darrick J. Wong
                   ` (11 more replies)
  0 siblings, 12 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:26 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: Eric Sandeen, Brian Foster, linux-xfs, bfoster

Hi all,

In preparation for landing the online fs repair feature, it was
necessary to design a generic btree bulk loading module that it could
use.  Fortunately, xfs_repair has four of these (for the four btree
types), so I synthesized one generic version and pushed it into the
kernel libxfs in 5.7.

That being done, port xfs_repair to use the generic btree bulk loader.
In addition to dropping a lot of code from xfs_repair, this also enables
us to control the fullness of the tree nodes in the rebuilt indices for
testing.
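(As a concrete illustration of that last point: the slack controls added later in this series surface as debug-only -o suboptions, so a test harness could force fully packed btree blocks roughly like this.  The device path is a placeholder, and a slack of 0 is just one possible setting.)

```shell
# Hypothetical invocation; /dev/sdb1 stands in for a real scratch device.
# Slack of 0 asks the bulk loader to pack each new btree block completely.
xfs_repair -o debug_bload_leaf_slack=0 -o debug_bload_node_slack=0 /dev/sdb1
```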

For v5 I rebased the support code from my kernel tree, and fixed some
of the more obvious warts that Brian found in v4.

For v6 I shortened function prefixes, stripped out all the code that
wasn't strictly necessary, and moved the new code to a separate file
so that phase5.c will be less cluttered.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-bulk-load
---
 include/libxfs.h         |    1 
 libxfs/libxfs_api_defs.h |    8 
 repair/Makefile          |    4 
 repair/agbtree.c         |  659 +++++++++++++
 repair/agbtree.h         |   62 +
 repair/bulkload.c        |  134 +++
 repair/bulkload.h        |   59 +
 repair/phase5.c          | 2397 ++++------------------------------------------
 repair/xfs_repair.c      |   17 
 9 files changed, 1164 insertions(+), 2177 deletions(-)
 create mode 100644 repair/agbtree.c
 create mode 100644 repair/agbtree.h
 create mode 100644 repair/bulkload.c
 create mode 100644 repair/bulkload.h


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 01/12] xfs_repair: drop lostblocks from build_agf_agfl
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
@ 2020-06-02  4:26 ` Darrick J. Wong
  2020-06-17 12:09   ` Brian Foster
  2020-06-02  4:27 ` [PATCH 02/12] xfs_repair: rename the agfl index loop variable in build_agf_agfl Darrick J. Wong
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:26 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

We don't do anything with this parameter, so get rid of it.

Fixes: ef4332b8 ("xfs_repair: add freesp btree block overflow to the free space")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase5.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)


diff --git a/repair/phase5.c b/repair/phase5.c
index 677297fe..c9b278bd 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -2049,7 +2049,6 @@ build_agf_agfl(
 	struct bt_status	*bno_bt,
 	struct bt_status	*bcnt_bt,
 	xfs_extlen_t		freeblks,	/* # free blocks in tree */
-	int			lostblocks,	/* # blocks that will be lost */
 	struct bt_status	*rmap_bt,
 	struct bt_status	*refcnt_bt,
 	struct xfs_slab		*lost_fsb)
@@ -2465,9 +2464,9 @@ phase5_func(
 		/*
 		 * set up agf and agfl
 		 */
-		build_agf_agfl(mp, agno, &bno_btree_curs,
-				&bcnt_btree_curs, freeblks1, extra_blocks,
-				&rmap_btree_curs, &refcnt_btree_curs, lost_fsb);
+		build_agf_agfl(mp, agno, &bno_btree_curs, &bcnt_btree_curs,
+				freeblks1, &rmap_btree_curs,
+				&refcnt_btree_curs, lost_fsb);
 		/*
 		 * build inode allocation tree.
 		 */



* [PATCH 02/12] xfs_repair: rename the agfl index loop variable in build_agf_agfl
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
  2020-06-02  4:26 ` [PATCH 01/12] xfs_repair: drop lostblocks from build_agf_agfl Darrick J. Wong
@ 2020-06-02  4:27 ` Darrick J. Wong
  2020-06-17 12:09   ` Brian Foster
  2020-06-02  4:27 ` [PATCH 03/12] xfs_repair: make container for btree bulkload root and block reservation Darrick J. Wong
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:27 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

The variable 'i' is used to index the AGFL block list, so rename it to
make its purpose clearer.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase5.c |   28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)


diff --git a/repair/phase5.c b/repair/phase5.c
index c9b278bd..169a2d89 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -2055,7 +2055,7 @@ build_agf_agfl(
 {
 	struct extent_tree_node	*ext_ptr;
 	struct xfs_buf		*agf_buf, *agfl_buf;
-	int			i;
+	unsigned int		agfl_idx;
 	struct xfs_agfl		*agfl;
 	struct xfs_agf		*agf;
 	xfs_fsblock_t		fsb;
@@ -2153,8 +2153,8 @@ build_agf_agfl(
 		agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
 		agfl->agfl_seqno = cpu_to_be32(agno);
 		platform_uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
-		for (i = 0; i < libxfs_agfl_size(mp); i++)
-			freelist[i] = cpu_to_be32(NULLAGBLOCK);
+		for (agfl_idx = 0; agfl_idx < libxfs_agfl_size(mp); agfl_idx++)
+			freelist[agfl_idx] = cpu_to_be32(NULLAGBLOCK);
 	}
 
 	/*
@@ -2165,19 +2165,21 @@ build_agf_agfl(
 		/*
 		 * yes, now grab as many blocks as we can
 		 */
-		i = 0;
-		while (bno_bt->num_free_blocks > 0 && i < libxfs_agfl_size(mp))
+		agfl_idx = 0;
+		while (bno_bt->num_free_blocks > 0 &&
+		       agfl_idx < libxfs_agfl_size(mp))
 		{
-			freelist[i] = cpu_to_be32(
+			freelist[agfl_idx] = cpu_to_be32(
 					get_next_blockaddr(agno, 0, bno_bt));
-			i++;
+			agfl_idx++;
 		}
 
-		while (bcnt_bt->num_free_blocks > 0 && i < libxfs_agfl_size(mp))
+		while (bcnt_bt->num_free_blocks > 0 &&
+		       agfl_idx < libxfs_agfl_size(mp))
 		{
-			freelist[i] = cpu_to_be32(
+			freelist[agfl_idx] = cpu_to_be32(
 					get_next_blockaddr(agno, 0, bcnt_bt));
-			i++;
+			agfl_idx++;
 		}
 		/*
 		 * now throw the rest of the blocks away and complain
@@ -2200,9 +2202,9 @@ _("Insufficient memory saving lost blocks.\n"));
 		}
 
 		agf->agf_flfirst = 0;
-		agf->agf_fllast = cpu_to_be32(i - 1);
-		agf->agf_flcount = cpu_to_be32(i);
-		rmap_store_agflcount(mp, agno, i);
+		agf->agf_fllast = cpu_to_be32(agfl_idx - 1);
+		agf->agf_flcount = cpu_to_be32(agfl_idx);
+		rmap_store_agflcount(mp, agno, agfl_idx);
 
 #ifdef XR_BLD_FREE_TRACE
 		fprintf(stderr, "writing agfl for ag %u\n", agno);



* [PATCH 03/12] xfs_repair: make container for btree bulkload root and block reservation
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
  2020-06-02  4:26 ` [PATCH 01/12] xfs_repair: drop lostblocks from build_agf_agfl Darrick J. Wong
  2020-06-02  4:27 ` [PATCH 02/12] xfs_repair: rename the agfl index loop variable in build_agf_agfl Darrick J. Wong
@ 2020-06-02  4:27 ` Darrick J. Wong
  2020-06-17 12:09   ` Brian Foster
  2020-06-02  4:27 ` [PATCH 04/12] xfs_repair: remove gratuitous code block in phase5 Darrick J. Wong
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:27 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Create appropriate data structures to manage the fake btree root and
block reservation lists needed to stage a btree bulkload operation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/libxfs.h         |    1 
 libxfs/libxfs_api_defs.h |    2 +
 repair/Makefile          |    4 +-
 repair/bulkload.c        |   97 ++++++++++++++++++++++++++++++++++++++++++++++
 repair/bulkload.h        |   57 +++++++++++++++++++++++++++
 repair/xfs_repair.c      |   17 ++++++++
 6 files changed, 176 insertions(+), 2 deletions(-)
 create mode 100644 repair/bulkload.c
 create mode 100644 repair/bulkload.h


diff --git a/include/libxfs.h b/include/libxfs.h
index 12447835..b9370139 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -76,6 +76,7 @@ struct iomap;
 #include "xfs_rmap.h"
 #include "xfs_refcount_btree.h"
 #include "xfs_refcount.h"
+#include "xfs_btree_staging.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index be06c763..61047f8f 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -27,12 +27,14 @@
 #define xfs_alloc_fix_freelist		libxfs_alloc_fix_freelist
 #define xfs_alloc_min_freelist		libxfs_alloc_min_freelist
 #define xfs_alloc_read_agf		libxfs_alloc_read_agf
+#define xfs_alloc_vextent		libxfs_alloc_vextent
 
 #define xfs_attr_get			libxfs_attr_get
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
 #define xfs_attr_set			libxfs_attr_set
 
+#define __xfs_bmap_add_free		__libxfs_bmap_add_free
 #define xfs_bmapi_read			libxfs_bmapi_read
 #define xfs_bmapi_write			libxfs_bmapi_write
 #define xfs_bmap_last_offset		libxfs_bmap_last_offset
diff --git a/repair/Makefile b/repair/Makefile
index 0964499a..62d84bbf 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -9,11 +9,11 @@ LSRCFILES = README
 
 LTCOMMAND = xfs_repair
 
-HFILES = agheader.h attr_repair.h avl.h bmap.h btree.h \
+HFILES = agheader.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
 	da_util.h dinode.h dir2.h err_protos.h globals.h incore.h protos.h \
 	rt.h progress.h scan.h versions.h prefetch.h rmap.h slab.h threads.h
 
-CFILES = agheader.c attr_repair.c avl.c bmap.c btree.c \
+CFILES = agheader.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
 	da_util.c dino_chunks.c dinode.c dir2.c globals.c incore.c \
 	incore_bmc.c init.c incore_ext.c incore_ino.c phase1.c \
 	phase2.c phase3.c phase4.c phase5.c phase6.c phase7.c \
diff --git a/repair/bulkload.c b/repair/bulkload.c
new file mode 100644
index 00000000..4c69fe0d
--- /dev/null
+++ b/repair/bulkload.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2020 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include <libxfs.h>
+#include "bulkload.h"
+
+int bload_leaf_slack = -1;
+int bload_node_slack = -1;
+
+/* Initialize accounting resources for staging a new AG btree. */
+void
+bulkload_init_ag(
+	struct bulkload			*bkl,
+	struct repair_ctx		*sc,
+	const struct xfs_owner_info	*oinfo)
+{
+	memset(bkl, 0, sizeof(struct bulkload));
+	bkl->sc = sc;
+	bkl->oinfo = *oinfo; /* structure copy */
+	INIT_LIST_HEAD(&bkl->resv_list);
+}
+
+/* Designate specific blocks to be used to build our new btree. */
+int
+bulkload_add_blocks(
+	struct bulkload		*bkl,
+	xfs_fsblock_t		fsbno,
+	xfs_extlen_t		len)
+{
+	struct bulkload_resv	*resv;
+
+	resv = kmem_alloc(sizeof(struct bulkload_resv), KM_MAYFAIL);
+	if (!resv)
+		return ENOMEM;
+
+	INIT_LIST_HEAD(&resv->list);
+	resv->fsbno = fsbno;
+	resv->len = len;
+	resv->used = 0;
+	list_add_tail(&resv->list, &bkl->resv_list);
+	return 0;
+}
+
+/* Free all the accounting info and disk space we reserved for a new btree. */
+void
+bulkload_destroy(
+	struct bulkload		*bkl,
+	int			error)
+{
+	struct bulkload_resv	*resv, *n;
+
+	list_for_each_entry_safe(resv, n, &bkl->resv_list, list) {
+		list_del(&resv->list);
+		kmem_free(resv);
+	}
+}
+
+/* Feed one of the reserved btree blocks to the bulk loader. */
+int
+bulkload_claim_block(
+	struct xfs_btree_cur	*cur,
+	struct bulkload		*bkl,
+	union xfs_btree_ptr	*ptr)
+{
+	struct bulkload_resv	*resv;
+	xfs_fsblock_t		fsb;
+
+	/*
+	 * The first item in the list should always have a free block unless
+	 * we're completely out.
+	 */
+	resv = list_first_entry(&bkl->resv_list, struct bulkload_resv, list);
+	if (resv->used == resv->len)
+		return ENOSPC;
+
+	/*
+	 * Peel off a block from the start of the reservation.  We allocate
+	 * blocks in order to place blocks on disk in increasing record or key
+	 * order.  The block reservations tend to end up on the list in
+	 * decreasing order, which hopefully results in leaf blocks ending up
+	 * together.
+	 */
+	fsb = resv->fsbno + resv->used;
+	resv->used++;
+
+	/* If we used all the blocks in this reservation, move it to the end. */
+	if (resv->used == resv->len)
+		list_move_tail(&resv->list, &bkl->resv_list);
+
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+		ptr->l = cpu_to_be64(fsb);
+	else
+		ptr->s = cpu_to_be32(XFS_FSB_TO_AGBNO(cur->bc_mp, fsb));
+	return 0;
+}
diff --git a/repair/bulkload.h b/repair/bulkload.h
new file mode 100644
index 00000000..79f81cb0
--- /dev/null
+++ b/repair/bulkload.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2020 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#ifndef __XFS_REPAIR_BULKLOAD_H__
+#define __XFS_REPAIR_BULKLOAD_H__
+
+extern int bload_leaf_slack;
+extern int bload_node_slack;
+
+struct repair_ctx {
+	struct xfs_mount	*mp;
+};
+
+struct bulkload_resv {
+	/* Link to list of extents that we've reserved. */
+	struct list_head	list;
+
+	/* FSB of the block we reserved. */
+	xfs_fsblock_t		fsbno;
+
+	/* Length of the reservation. */
+	xfs_extlen_t		len;
+
+	/* How much of this reservation we've used. */
+	xfs_extlen_t		used;
+};
+
+struct bulkload {
+	struct repair_ctx	*sc;
+
+	/* List of extents that we've reserved. */
+	struct list_head	resv_list;
+
+	/* Fake root for new btree. */
+	struct xbtree_afakeroot	afake;
+
+	/* rmap owner of these blocks */
+	struct xfs_owner_info	oinfo;
+
+	/* The last reservation we allocated from. */
+	struct bulkload_resv	*last_resv;
+};
+
+#define for_each_bulkload_reservation(bkl, resv, n)	\
+	list_for_each_entry_safe((resv), (n), &(bkl)->resv_list, list)
+
+void bulkload_init_ag(struct bulkload *bkl, struct repair_ctx *sc,
+		const struct xfs_owner_info *oinfo);
+int bulkload_add_blocks(struct bulkload *bkl, xfs_fsblock_t fsbno,
+		xfs_extlen_t len);
+void bulkload_destroy(struct bulkload *bkl, int error);
+int bulkload_claim_block(struct xfs_btree_cur *cur, struct bulkload *bkl,
+		union xfs_btree_ptr *ptr);
+
+#endif /* __XFS_REPAIR_BULKLOAD_H__ */
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index 9d72fa8e..3bfc8311 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -24,6 +24,7 @@
 #include "rmap.h"
 #include "libfrog/fsgeom.h"
 #include "libfrog/platform.h"
+#include "bulkload.h"
 
 /*
  * option tables for getsubopt calls
@@ -39,6 +40,8 @@ enum o_opt_nums {
 	AG_STRIDE,
 	FORCE_GEO,
 	PHASE2_THREADS,
+	BLOAD_LEAF_SLACK,
+	BLOAD_NODE_SLACK,
 	O_MAX_OPTS,
 };
 
@@ -49,6 +52,8 @@ static char *o_opts[] = {
 	[AG_STRIDE]		= "ag_stride",
 	[FORCE_GEO]		= "force_geometry",
 	[PHASE2_THREADS]	= "phase2_threads",
+	[BLOAD_LEAF_SLACK]	= "debug_bload_leaf_slack",
+	[BLOAD_NODE_SLACK]	= "debug_bload_node_slack",
 	[O_MAX_OPTS]		= NULL,
 };
 
@@ -260,6 +265,18 @@ process_args(int argc, char **argv)
 		_("-o phase2_threads requires a parameter\n"));
 					phase2_threads = (int)strtol(val, NULL, 0);
 					break;
+				case BLOAD_LEAF_SLACK:
+					if (!val)
+						do_abort(
+		_("-o debug_bload_leaf_slack requires a parameter\n"));
+					bload_leaf_slack = (int)strtol(val, NULL, 0);
+					break;
+				case BLOAD_NODE_SLACK:
+					if (!val)
+						do_abort(
+		_("-o debug_bload_node_slack requires a parameter\n"));
+					bload_node_slack = (int)strtol(val, NULL, 0);
+					break;
 				default:
 					unknown('o', val);
 					break;



* [PATCH 04/12] xfs_repair: remove gratuitous code block in phase5
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
                   ` (2 preceding siblings ...)
  2020-06-02  4:27 ` [PATCH 03/12] xfs_repair: make container for btree bulkload root and block reservation Darrick J. Wong
@ 2020-06-02  4:27 ` Darrick J. Wong
  2020-06-02  4:27 ` [PATCH 05/12] xfs_repair: inject lost blocks back into the fs no matter the owner Darrick J. Wong
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:27 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: Eric Sandeen, Brian Foster, linux-xfs, bfoster

From: Eric Sandeen <sandeen@redhat.com>

A commit back in 2008 removed a "for" loop ahead of this code block, but
left the indented code block in place.  Remove the now-superfluous braces
and indentation for clarity, and reflow comments and lines as needed.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 repair/phase5.c |  316 ++++++++++++++++++++++++++-----------------------------
 1 file changed, 150 insertions(+), 166 deletions(-)


diff --git a/repair/phase5.c b/repair/phase5.c
index 169a2d89..44a6bda8 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -2314,201 +2314,185 @@ phase5_func(
 	if (verbose)
 		do_log(_("        - agno = %d\n"), agno);
 
-	{
-		/*
-		 * build up incore bno and bcnt extent btrees
-		 */
-		num_extents = mk_incore_fstree(mp, agno);
+	/*
+	 * build up incore bno and bcnt extent btrees
+	 */
+	num_extents = mk_incore_fstree(mp, agno);
 
 #ifdef XR_BLD_FREE_TRACE
-		fprintf(stderr, "# of bno extents is %d\n",
-				count_bno_extents(agno));
+	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
 #endif
 
-		if (num_extents == 0)  {
-			/*
-			 * XXX - what we probably should do here is pick an
-			 * inode for a regular file in the allocation group
-			 * that has space allocated and shoot it by traversing
-			 * the bmap list and putting all its extents on the
-			 * incore freespace trees, clearing the inode,
-			 * and clearing the in-use bit in the incore inode
-			 * tree.  Then try mk_incore_fstree() again.
-			 */
-			do_error(_("unable to rebuild AG %u.  "
-				  "Not enough free space in on-disk AG.\n"),
-				agno);
-		}
-
-		/*
-		 * ok, now set up the btree cursors for the
-		 * on-disk btrees (includs pre-allocating all
-		 * required blocks for the trees themselves)
-		 */
-		init_ino_cursor(mp, agno, &ino_btree_curs, &num_inos,
-				&num_free_inos, 0);
-
-		if (xfs_sb_version_hasfinobt(&mp->m_sb))
-			init_ino_cursor(mp, agno, &fino_btree_curs,
-					&finobt_num_inos, &finobt_num_free_inos,
-					1);
-
-		sb_icount_ag[agno] += num_inos;
-		sb_ifree_ag[agno] += num_free_inos;
-
-		/*
-		 * Set up the btree cursors for the on-disk rmap btrees,
-		 * which includes pre-allocating all required blocks.
-		 */
-		init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
-
+	if (num_extents == 0)  {
 		/*
-		 * Set up the btree cursors for the on-disk refcount btrees,
-		 * which includes pre-allocating all required blocks.
+		 * XXX - what we probably should do here is pick an inode for
+		 * a regular file in the allocation group that has space
+		 * allocated and shoot it by traversing the bmap list and
+		 * putting all its extents on the incore freespace trees,
+		 * clearing the inode, and clearing the in-use bit in the
+		 * incore inode tree.  Then try mk_incore_fstree() again.
 		 */
-		init_refc_cursor(mp, agno, &refcnt_btree_curs);
-
-		num_extents = count_bno_extents_blocks(agno, &num_freeblocks);
+		do_error(
+_("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
+			agno);
+	}
+
+	/*
+	 * ok, now set up the btree cursors for the on-disk btrees (includes
+	 * pre-allocating all required blocks for the trees themselves)
+	 */
+	init_ino_cursor(mp, agno, &ino_btree_curs, &num_inos,
+			&num_free_inos, 0);
+
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		init_ino_cursor(mp, agno, &fino_btree_curs, &finobt_num_inos,
+				&finobt_num_free_inos, 1);
+
+	sb_icount_ag[agno] += num_inos;
+	sb_ifree_ag[agno] += num_free_inos;
+
+	/*
+	 * Set up the btree cursors for the on-disk rmap btrees, which includes
+	 * pre-allocating all required blocks.
+	 */
+	init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
+
+	/*
+	 * Set up the btree cursors for the on-disk refcount btrees,
+	 * which includes pre-allocating all required blocks.
+	 */
+	init_refc_cursor(mp, agno, &refcnt_btree_curs);
+
+	num_extents = count_bno_extents_blocks(agno, &num_freeblocks);
+	/*
+	 * lose two blocks per AG -- the space tree roots are counted as
+	 * allocated since the space trees always have roots
+	 */
+	sb_fdblocks_ag[agno] += num_freeblocks - 2;
+
+	if (num_extents == 0)  {
 		/*
-		 * lose two blocks per AG -- the space tree roots
-		 * are counted as allocated since the space trees
-		 * always have roots
+		 * XXX - what we probably should do here is pick an inode for
+		 * a regular file in the allocation group that has space
+		 * allocated and shoot it by traversing the bmap list and
+		 * putting all its extents on the incore freespace trees,
+		 * clearing the inode, and clearing the in-use bit in the
+		 * incore inode tree.  Then try mk_incore_fstree() again.
 		 */
-		sb_fdblocks_ag[agno] += num_freeblocks - 2;
-
-		if (num_extents == 0)  {
-			/*
-			 * XXX - what we probably should do here is pick an
-			 * inode for a regular file in the allocation group
-			 * that has space allocated and shoot it by traversing
-			 * the bmap list and putting all its extents on the
-			 * incore freespace trees, clearing the inode,
-			 * and clearing the in-use bit in the incore inode
-			 * tree.  Then try mk_incore_fstree() again.
-			 */
-			do_error(
-			_("unable to rebuild AG %u.  No free space.\n"), agno);
-		}
+		do_error(_("unable to rebuild AG %u.  No free space.\n"), agno);
+	}
 
 #ifdef XR_BLD_FREE_TRACE
-		fprintf(stderr, "# of bno extents is %d\n", num_extents);
+	fprintf(stderr, "# of bno extents is %d\n", num_extents);
 #endif
 
-		/*
-		 * track blocks that we might really lose
-		 */
-		extra_blocks = calculate_freespace_cursor(mp, agno,
-					&num_extents, &bno_btree_curs);
+	/*
+	 * track blocks that we might really lose
+	 */
+	extra_blocks = calculate_freespace_cursor(mp, agno,
+				&num_extents, &bno_btree_curs);
 
-		/*
-		 * freespace btrees live in the "free space" but
-		 * the filesystem treats AGFL blocks as allocated
-		 * since they aren't described by the freespace trees
-		 */
+	/*
+	 * freespace btrees live in the "free space" but the filesystem treats
+	 * AGFL blocks as allocated since they aren't described by the
+	 * freespace trees
+	 */
 
-		/*
-		 * see if we can fit all the extra blocks into the AGFL
-		 */
-		extra_blocks = (extra_blocks - libxfs_agfl_size(mp) > 0)
-				? extra_blocks - libxfs_agfl_size(mp)
-				: 0;
+	/*
+	 * see if we can fit all the extra blocks into the AGFL
+	 */
+	extra_blocks = (extra_blocks - libxfs_agfl_size(mp) > 0) ?
+			extra_blocks - libxfs_agfl_size(mp) : 0;
 
-		if (extra_blocks > 0)
-			sb_fdblocks_ag[agno] -= extra_blocks;
+	if (extra_blocks > 0)
+		sb_fdblocks_ag[agno] -= extra_blocks;
 
-		bcnt_btree_curs = bno_btree_curs;
+	bcnt_btree_curs = bno_btree_curs;
 
-		bno_btree_curs.owner = XFS_RMAP_OWN_AG;
-		bcnt_btree_curs.owner = XFS_RMAP_OWN_AG;
-		setup_cursor(mp, agno, &bno_btree_curs);
-		setup_cursor(mp, agno, &bcnt_btree_curs);
+	bno_btree_curs.owner = XFS_RMAP_OWN_AG;
+	bcnt_btree_curs.owner = XFS_RMAP_OWN_AG;
+	setup_cursor(mp, agno, &bno_btree_curs);
+	setup_cursor(mp, agno, &bcnt_btree_curs);
 
 #ifdef XR_BLD_FREE_TRACE
-		fprintf(stderr, "# of bno extents is %d\n",
-				count_bno_extents(agno));
-		fprintf(stderr, "# of bcnt extents is %d\n",
-				count_bcnt_extents(agno));
+	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
+	fprintf(stderr, "# of bcnt extents is %d\n", count_bcnt_extents(agno));
 #endif
 
-		/*
-		 * now rebuild the freespace trees
-		 */
-		freeblks1 = build_freespace_tree(mp, agno,
+	/*
+	 * now rebuild the freespace trees
+	 */
+	freeblks1 = build_freespace_tree(mp, agno,
 					&bno_btree_curs, XFS_BTNUM_BNO);
 #ifdef XR_BLD_FREE_TRACE
-		fprintf(stderr, "# of free blocks == %d\n", freeblks1);
+	fprintf(stderr, "# of free blocks == %d\n", freeblks1);
 #endif
-		write_cursor(&bno_btree_curs);
+	write_cursor(&bno_btree_curs);
 
 #ifdef DEBUG
-		freeblks2 = build_freespace_tree(mp, agno,
-					&bcnt_btree_curs, XFS_BTNUM_CNT);
+	freeblks2 = build_freespace_tree(mp, agno,
+				&bcnt_btree_curs, XFS_BTNUM_CNT);
 #else
-		(void) build_freespace_tree(mp, agno,
-					&bcnt_btree_curs, XFS_BTNUM_CNT);
+	(void) build_freespace_tree(mp, agno, &bcnt_btree_curs, XFS_BTNUM_CNT);
 #endif
-		write_cursor(&bcnt_btree_curs);
-
-		ASSERT(freeblks1 == freeblks2);
-
-		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
-			build_rmap_tree(mp, agno, &rmap_btree_curs);
-			write_cursor(&rmap_btree_curs);
-			sb_fdblocks_ag[agno] += (rmap_btree_curs.num_tot_blocks -
-					rmap_btree_curs.num_free_blocks) - 1;
-		}
-
-		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
-			build_refcount_tree(mp, agno, &refcnt_btree_curs);
-			write_cursor(&refcnt_btree_curs);
-		}
-
-		/*
-		 * set up agf and agfl
-		 */
-		build_agf_agfl(mp, agno, &bno_btree_curs, &bcnt_btree_curs,
-				freeblks1, &rmap_btree_curs,
-				&refcnt_btree_curs, lost_fsb);
-		/*
-		 * build inode allocation tree.
-		 */
-		build_ino_tree(mp, agno, &ino_btree_curs, XFS_BTNUM_INO,
-				&agi_stat);
-		write_cursor(&ino_btree_curs);
-
-		/*
-		 * build free inode tree
-		 */
-		if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
-			build_ino_tree(mp, agno, &fino_btree_curs,
-					XFS_BTNUM_FINO, NULL);
-			write_cursor(&fino_btree_curs);
-		}
-
-		/* build the agi */
-		build_agi(mp, agno, &ino_btree_curs, &fino_btree_curs,
-			  &agi_stat);
-
-		/*
-		 * tear down cursors
-		 */
-		finish_cursor(&bno_btree_curs);
-		finish_cursor(&ino_btree_curs);
-		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
-			finish_cursor(&rmap_btree_curs);
-		if (xfs_sb_version_hasreflink(&mp->m_sb))
-			finish_cursor(&refcnt_btree_curs);
-		if (xfs_sb_version_hasfinobt(&mp->m_sb))
-			finish_cursor(&fino_btree_curs);
-		finish_cursor(&bcnt_btree_curs);
-
-		/*
-		 * release the incore per-AG bno/bcnt trees so
-		 * the extent nodes can be recycled
-		 */
-		release_agbno_extent_tree(agno);
-		release_agbcnt_extent_tree(agno);
+	write_cursor(&bcnt_btree_curs);
+
+	ASSERT(freeblks1 == freeblks2);
+
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		build_rmap_tree(mp, agno, &rmap_btree_curs);
+		write_cursor(&rmap_btree_curs);
+		sb_fdblocks_ag[agno] += (rmap_btree_curs.num_tot_blocks -
+				rmap_btree_curs.num_free_blocks) - 1;
+	}
+
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		build_refcount_tree(mp, agno, &refcnt_btree_curs);
+		write_cursor(&refcnt_btree_curs);
 	}
+
+	/*
+	 * set up agf and agfl
+	 */
+	build_agf_agfl(mp, agno, &bno_btree_curs, &bcnt_btree_curs, freeblks1,
+			&rmap_btree_curs, &refcnt_btree_curs, lost_fsb);
+	/*
+	 * build inode allocation tree.
+	 */
+	build_ino_tree(mp, agno, &ino_btree_curs, XFS_BTNUM_INO, &agi_stat);
+	write_cursor(&ino_btree_curs);
+
+	/*
+	 * build free inode tree
+	 */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
+		build_ino_tree(mp, agno, &fino_btree_curs,
+				XFS_BTNUM_FINO, NULL);
+		write_cursor(&fino_btree_curs);
+	}
+
+	/* build the agi */
+	build_agi(mp, agno, &ino_btree_curs, &fino_btree_curs, &agi_stat);
+
+	/*
+	 * tear down cursors
+	 */
+	finish_cursor(&bno_btree_curs);
+	finish_cursor(&ino_btree_curs);
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		finish_cursor(&rmap_btree_curs);
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		finish_cursor(&refcnt_btree_curs);
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		finish_cursor(&fino_btree_curs);
+	finish_cursor(&bcnt_btree_curs);
+
+	/*
+	 * release the incore per-AG bno/bcnt trees so the extent nodes
+	 * can be recycled
+	 */
+	release_agbno_extent_tree(agno);
+	release_agbcnt_extent_tree(agno);
 	PROG_RPT_INC(prog_rpt_done[agno], 1);
 }
 



* [PATCH 05/12] xfs_repair: inject lost blocks back into the fs no matter the owner
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
                   ` (3 preceding siblings ...)
  2020-06-02  4:27 ` [PATCH 04/12] xfs_repair: remove gratuitous code block in phase5 Darrick J. Wong
@ 2020-06-02  4:27 ` Darrick J. Wong
  2020-06-17 12:09   ` Brian Foster
  2020-06-02  4:27 ` [PATCH 06/12] xfs_repair: create a new class of btree rebuild cursors Darrick J. Wong
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:27 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

In repair phase 5, inject_lost_blocks takes the blocks that we allocated
but didn't use for constructing the new AG btrees and puts them back in
the filesystem by adding them to the free space.  The only btrees that
can overestimate like that are the free space btrees, but in principle
any of the btrees can do that.  If the others did, the rmap record owner
for those blocks wouldn't necessarily be OWNER_AG, and if it isn't,
repair will fail.

Get rid of this logic bomb so that we can use it for /any/ block count
overestimation, and then we can use it to clean up after all
reconstruction of any btree type.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase5.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/repair/phase5.c b/repair/phase5.c
index 44a6bda8..75c480fd 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -2516,8 +2516,8 @@ inject_lost_blocks(
 		if (error)
 			goto out_cancel;
 
-		error = -libxfs_free_extent(tp, *fsb, 1, &XFS_RMAP_OINFO_AG,
-					    XFS_AG_RESV_NONE);
+		error = -libxfs_free_extent(tp, *fsb, 1,
+				&XFS_RMAP_OINFO_ANY_OWNER, XFS_AG_RESV_NONE);
 		if (error)
 			goto out_cancel;
 



* [PATCH 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
                   ` (4 preceding siblings ...)
  2020-06-02  4:27 ` [PATCH 05/12] xfs_repair: inject lost blocks back into the fs no matter the owner Darrick J. Wong
@ 2020-06-02  4:27 ` Darrick J. Wong
  2020-06-17 12:10   ` Brian Foster
  2020-07-02 15:18   ` [PATCH v2 " Darrick J. Wong
  2020-06-02  4:27 ` [PATCH 07/12] xfs_repair: rebuild free space btrees with bulk loader Darrick J. Wong
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:27 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Create some new support structures and functions to assist phase5 in
using the btree bulk loader to reconstruct metadata btrees.  This is the
first step in removing the open-coded AG btree rebuilding code.

Note: The code in this patch will not be used anywhere until the next
patch, so warnings about unused symbols are expected.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/Makefile   |    4 +
 repair/agbtree.c  |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/agbtree.h  |   29 ++++++++++
 repair/bulkload.c |   37 +++++++++++++
 repair/bulkload.h |    2 +
 repair/phase5.c   |   41 ++++++++------
 6 files changed, 244 insertions(+), 21 deletions(-)
 create mode 100644 repair/agbtree.c
 create mode 100644 repair/agbtree.h


diff --git a/repair/Makefile b/repair/Makefile
index 62d84bbf..f6a6e3f9 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -9,11 +9,11 @@ LSRCFILES = README
 
 LTCOMMAND = xfs_repair
 
-HFILES = agheader.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
+HFILES = agheader.h agbtree.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
 	da_util.h dinode.h dir2.h err_protos.h globals.h incore.h protos.h \
 	rt.h progress.h scan.h versions.h prefetch.h rmap.h slab.h threads.h
 
-CFILES = agheader.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
+CFILES = agheader.c agbtree.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
 	da_util.c dino_chunks.c dinode.c dir2.c globals.c incore.c \
 	incore_bmc.c init.c incore_ext.c incore_ino.c phase1.c \
 	phase2.c phase3.c phase4.c phase5.c phase6.c phase7.c \
diff --git a/repair/agbtree.c b/repair/agbtree.c
new file mode 100644
index 00000000..e4179a44
--- /dev/null
+++ b/repair/agbtree.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2020 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include <libxfs.h>
+#include "err_protos.h"
+#include "slab.h"
+#include "rmap.h"
+#include "incore.h"
+#include "bulkload.h"
+#include "agbtree.h"
+
+/* Initialize a btree rebuild context. */
+static void
+init_rebuild(
+	struct repair_ctx		*sc,
+	const struct xfs_owner_info	*oinfo,
+	xfs_agblock_t			free_space,
+	struct bt_rebuild		*btr)
+{
+	memset(btr, 0, sizeof(struct bt_rebuild));
+
+	bulkload_init_ag(&btr->newbt, sc, oinfo);
+	bulkload_estimate_ag_slack(sc, &btr->bload, free_space);
+}
+
+/*
+ * Update this free space record to reflect the blocks we stole from the
+ * beginning of the record.
+ */
+static void
+consume_freespace(
+	xfs_agnumber_t		agno,
+	struct extent_tree_node	*ext_ptr,
+	uint32_t		len)
+{
+	struct extent_tree_node	*bno_ext_ptr;
+	xfs_agblock_t		new_start = ext_ptr->ex_startblock + len;
+	xfs_extlen_t		new_len = ext_ptr->ex_blockcount - len;
+
+	/* Delete the used-up extent from both extent trees. */
+#ifdef XR_BLD_FREE_TRACE
+	fprintf(stderr, "releasing extent: %u [%u %u]\n", agno,
+			ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
+#endif
+	bno_ext_ptr = find_bno_extent(agno, ext_ptr->ex_startblock);
+	ASSERT(bno_ext_ptr != NULL);
+	get_bno_extent(agno, bno_ext_ptr);
+	release_extent_tree_node(bno_ext_ptr);
+
+	ext_ptr = get_bcnt_extent(agno, ext_ptr->ex_startblock,
+			ext_ptr->ex_blockcount);
+	release_extent_tree_node(ext_ptr);
+
+	/*
+	 * If we only used part of this last extent, then we must reinsert the
+	 * extent to maintain proper sorting order.
+	 */
+	if (new_len > 0) {
+		add_bno_extent(agno, new_start, new_len);
+		add_bcnt_extent(agno, new_start, new_len);
+	}
+}
+
+/* Reserve blocks for the new btree. */
+static void
+reserve_btblocks(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_rebuild	*btr,
+	uint32_t		nr_blocks)
+{
+	struct extent_tree_node	*ext_ptr;
+	uint32_t		blocks_allocated = 0;
+	uint32_t		len;
+	int			error;
+
+	while (blocks_allocated < nr_blocks)  {
+		xfs_fsblock_t	fsbno;
+
+		/*
+		 * Grab the smallest extent and use it up, then get the
+		 * next smallest.  This mimics the init_*_cursor code.
+		 */
+		ext_ptr = findfirst_bcnt_extent(agno);
+		if (!ext_ptr)
+			do_error(
+_("error - not enough free space in filesystem\n"));
+
+		/* Use up the extent we've got. */
+		len = min(ext_ptr->ex_blockcount, nr_blocks - blocks_allocated);
+		fsbno = XFS_AGB_TO_FSB(mp, agno, ext_ptr->ex_startblock);
+		error = bulkload_add_blocks(&btr->newbt, fsbno, len);
+		if (error)
+			do_error(_("could not set up btree reservation: %s\n"),
+				strerror(-error));
+
+		error = rmap_add_ag_rec(mp, agno, ext_ptr->ex_startblock, len,
+				btr->newbt.oinfo.oi_owner);
+		if (error)
+			do_error(_("could not set up btree rmaps: %s\n"),
+				strerror(-error));
+
+		consume_freespace(agno, ext_ptr, len);
+		blocks_allocated += len;
+	}
+#ifdef XR_BLD_FREE_TRACE
+	fprintf(stderr, "blocks_allocated = %d\n",
+		blocks_allocated);
+#endif
+}
+
+/* Feed one of the new btree blocks to the bulk loader. */
+static int
+rebuild_claim_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	void			*priv)
+{
+	struct bt_rebuild	*btr = priv;
+
+	return bulkload_claim_block(cur, &btr->newbt, ptr);
+}
+
+/*
+ * Scoop up leftovers from a rebuild cursor for later freeing, then free the
+ * rebuild context.
+ */
+void
+finish_rebuild(
+	struct xfs_mount	*mp,
+	struct bt_rebuild	*btr,
+	struct xfs_slab		*lost_fsb)
+{
+	struct bulkload_resv	*resv, *n;
+
+	for_each_bulkload_reservation(&btr->newbt, resv, n) {
+		while (resv->used < resv->len) {
+			xfs_fsblock_t	fsb = resv->fsbno + resv->used;
+			int		error;
+
+			error = slab_add(lost_fsb, &fsb);
+			if (error)
+				do_error(
+_("Insufficient memory saving lost blocks.\n"));
+			resv->used++;
+		}
+	}
+
+	bulkload_destroy(&btr->newbt, 0);
+}
diff --git a/repair/agbtree.h b/repair/agbtree.h
new file mode 100644
index 00000000..50ea3c60
--- /dev/null
+++ b/repair/agbtree.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2020 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#ifndef __XFS_REPAIR_AG_BTREE_H__
+#define __XFS_REPAIR_AG_BTREE_H__
+
+/* Context for rebuilding a per-AG btree. */
+struct bt_rebuild {
+	/* Fake root for staging and space preallocations. */
+	struct bulkload	newbt;
+
+	/* Geometry of the new btree. */
+	struct xfs_btree_bload	bload;
+
+	/* Staging btree cursor for the new tree. */
+	struct xfs_btree_cur	*cur;
+
+	/* Tree-specific data. */
+	union {
+		struct xfs_slab_cursor	*slab_cursor;
+	};
+};
+
+void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
+		struct xfs_slab *lost_fsb);
+
+#endif /* __XFS_REPAIR_AG_BTREE_H__ */
diff --git a/repair/bulkload.c b/repair/bulkload.c
index 4c69fe0d..9a6ca0c2 100644
--- a/repair/bulkload.c
+++ b/repair/bulkload.c
@@ -95,3 +95,40 @@ bulkload_claim_block(
 		ptr->s = cpu_to_be32(XFS_FSB_TO_AGBNO(cur->bc_mp, fsb));
 	return 0;
 }
+
+/*
+ * Estimate proper slack values for a btree that's being reloaded.
+ *
+ * Under most circumstances, we'll take whatever default loading value the
+ * btree bulk loading code calculates for us.  However, there are some
+ * exceptions to this rule:
+ *
+ * (1) If someone turned one of the debug knobs.
+ * (2) The AG has less than ~9% space free.
+ *
+ * Note that we actually use 3/32 for the comparison to avoid division.
+ */
+void
+bulkload_estimate_ag_slack(
+	struct repair_ctx	*sc,
+	struct xfs_btree_bload	*bload,
+	unsigned int		free)
+{
+	/*
+	 * The global values are set to -1 (i.e. take the bload defaults)
+	 * unless someone has set them otherwise, so we just pull the values
+	 * here.
+	 */
+	bload->leaf_slack = bload_leaf_slack;
+	bload->node_slack = bload_node_slack;
+
+	/* No further changes if there's more than 3/32ths space left. */
+	if (free >= ((sc->mp->m_sb.sb_agblocks * 3) >> 5))
+		return;
+
+	/* We're low on space; load the btrees as tightly as possible. */
+	if (bload->leaf_slack < 0)
+		bload->leaf_slack = 0;
+	if (bload->node_slack < 0)
+		bload->node_slack = 0;
+}
diff --git a/repair/bulkload.h b/repair/bulkload.h
index 79f81cb0..01f67279 100644
--- a/repair/bulkload.h
+++ b/repair/bulkload.h
@@ -53,5 +53,7 @@ int bulkload_add_blocks(struct bulkload *bkl, xfs_fsblock_t fsbno,
 void bulkload_destroy(struct bulkload *bkl, int error);
 int bulkload_claim_block(struct xfs_btree_cur *cur, struct bulkload *bkl,
 		union xfs_btree_ptr *ptr);
+void bulkload_estimate_ag_slack(struct repair_ctx *sc,
+		struct xfs_btree_bload *bload, unsigned int free);
 
 #endif /* __XFS_REPAIR_BULKLOAD_H__ */
diff --git a/repair/phase5.c b/repair/phase5.c
index 75c480fd..8175aa6f 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -18,6 +18,8 @@
 #include "progress.h"
 #include "slab.h"
 #include "rmap.h"
+#include "bulkload.h"
+#include "agbtree.h"
 
 /*
  * we maintain the current slice (path from root to leaf)
@@ -2288,28 +2290,29 @@ keep_fsinos(xfs_mount_t *mp)
 
 static void
 phase5_func(
-	xfs_mount_t	*mp,
-	xfs_agnumber_t	agno,
-	struct xfs_slab	*lost_fsb)
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct xfs_slab		*lost_fsb)
 {
-	uint64_t	num_inos;
-	uint64_t	num_free_inos;
-	uint64_t	finobt_num_inos;
-	uint64_t	finobt_num_free_inos;
-	bt_status_t	bno_btree_curs;
-	bt_status_t	bcnt_btree_curs;
-	bt_status_t	ino_btree_curs;
-	bt_status_t	fino_btree_curs;
-	bt_status_t	rmap_btree_curs;
-	bt_status_t	refcnt_btree_curs;
-	int		extra_blocks = 0;
-	uint		num_freeblocks;
-	xfs_extlen_t	freeblks1;
+	struct repair_ctx	sc = { .mp = mp, };
+	struct agi_stat		agi_stat = {0,};
+	uint64_t		num_inos;
+	uint64_t		num_free_inos;
+	uint64_t		finobt_num_inos;
+	uint64_t		finobt_num_free_inos;
+	bt_status_t		bno_btree_curs;
+	bt_status_t		bcnt_btree_curs;
+	bt_status_t		ino_btree_curs;
+	bt_status_t		fino_btree_curs;
+	bt_status_t		rmap_btree_curs;
+	bt_status_t		refcnt_btree_curs;
+	int			extra_blocks = 0;
+	uint			num_freeblocks;
+	xfs_extlen_t		freeblks1;
 #ifdef DEBUG
-	xfs_extlen_t	freeblks2;
+	xfs_extlen_t		freeblks2;
 #endif
-	xfs_agblock_t	num_extents;
-	struct agi_stat	agi_stat = {0,};
+	xfs_agblock_t		num_extents;
 
 	if (verbose)
 		do_log(_("        - agno = %d\n"), agno);



* [PATCH 07/12] xfs_repair: rebuild free space btrees with bulk loader
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
                   ` (5 preceding siblings ...)
  2020-06-02  4:27 ` [PATCH 06/12] xfs_repair: create a new class of btree rebuild cursors Darrick J. Wong
@ 2020-06-02  4:27 ` Darrick J. Wong
  2020-06-18 15:23   ` Brian Foster
  2020-06-02  4:27 ` [PATCH 08/12] xfs_repair: rebuild inode " Darrick J. Wong
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:27 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the btree bulk loading functions to rebuild the free space btrees
and drop the open-coded implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_api_defs.h |    3 
 repair/agbtree.c         |  158 ++++++++++
 repair/agbtree.h         |   10 +
 repair/phase5.c          |  703 ++++------------------------------------------
 4 files changed, 236 insertions(+), 638 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 61047f8f..bace739c 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -24,6 +24,7 @@
 
 #define xfs_alloc_ag_max_usable		libxfs_alloc_ag_max_usable
 #define xfs_allocbt_maxrecs		libxfs_allocbt_maxrecs
+#define xfs_allocbt_stage_cursor	libxfs_allocbt_stage_cursor
 #define xfs_alloc_fix_freelist		libxfs_alloc_fix_freelist
 #define xfs_alloc_min_freelist		libxfs_alloc_min_freelist
 #define xfs_alloc_read_agf		libxfs_alloc_read_agf
@@ -41,6 +42,8 @@
 #define xfs_bmbt_maxrecs		libxfs_bmbt_maxrecs
 #define xfs_bmdr_maxrecs		libxfs_bmdr_maxrecs
 
+#define xfs_btree_bload			libxfs_btree_bload
+#define xfs_btree_bload_compute_geometry libxfs_btree_bload_compute_geometry
 #define xfs_btree_del_cursor		libxfs_btree_del_cursor
 #define xfs_btree_init_block		libxfs_btree_init_block
 #define xfs_buf_delwri_submit		libxfs_buf_delwri_submit
diff --git a/repair/agbtree.c b/repair/agbtree.c
index e4179a44..3b8ab47c 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -150,3 +150,161 @@ _("Insufficient memory saving lost blocks.\n"));
 
 	bulkload_destroy(&btr->newbt, 0);
 }
+
+/*
+ * Free Space Btrees
+ *
+ * We need to leave some free records in the tree for the corner case of
+ * setting up the AGFL. This may require allocation of blocks, and as
+ * such can require insertion of new records into the tree (e.g. moving
+ * a record in the by-count tree when a long extent is shortened). If we
+ * pack the records into the leaves with no slack space, this requires a
+ * leaf split to occur and a block to be allocated from the free list.
+ * If we don't have any blocks on the free list (because we are setting
+ * it up!), then we fail, and the filesystem will fail with the same
+ * failure at runtime. Hence leave a couple of records slack space in
+ * each block to allow immediate modification of the tree without
+ * requiring splits to be done.
+ */
+
+/*
+ * Return the next free space extent tree record from the previous value we
+ * saw.
+ */
+static inline struct extent_tree_node *
+get_bno_rec(
+	struct xfs_btree_cur	*cur,
+	struct extent_tree_node	*prev_value)
+{
+	xfs_agnumber_t		agno = cur->bc_ag.agno;
+
+	if (cur->bc_btnum == XFS_BTNUM_BNO) {
+		if (!prev_value)
+			return findfirst_bno_extent(agno);
+		return findnext_bno_extent(prev_value);
+	}
+
+	/* cnt btree */
+	if (!prev_value)
+		return findfirst_bcnt_extent(agno);
+	return findnext_bcnt_extent(agno, prev_value);
+}
+
+/* Grab one bnobt record and put it in the btree cursor. */
+static int
+get_bnobt_record(
+	struct xfs_btree_cur		*cur,
+	void				*priv)
+{
+	struct bt_rebuild		*btr = priv;
+	struct xfs_alloc_rec_incore	*arec = &cur->bc_rec.a;
+
+	btr->bno_rec = get_bno_rec(cur, btr->bno_rec);
+	arec->ar_startblock = btr->bno_rec->ex_startblock;
+	arec->ar_blockcount = btr->bno_rec->ex_blockcount;
+	btr->freeblks += btr->bno_rec->ex_blockcount;
+	return 0;
+}
+
+void
+init_freespace_cursors(
+	struct repair_ctx	*sc,
+	xfs_agnumber_t		agno,
+	unsigned int		free_space,
+	unsigned int		*nr_extents,
+	int			*extra_blocks,
+	struct bt_rebuild	*btr_bno,
+	struct bt_rebuild	*btr_cnt)
+{
+	unsigned int		bno_blocks;
+	unsigned int		cnt_blocks;
+	int			error;
+
+	init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr_bno);
+	init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr_cnt);
+
+	btr_bno->cur = libxfs_allocbt_stage_cursor(sc->mp,
+			&btr_bno->newbt.afake, agno, XFS_BTNUM_BNO);
+	btr_cnt->cur = libxfs_allocbt_stage_cursor(sc->mp,
+			&btr_cnt->newbt.afake, agno, XFS_BTNUM_CNT);
+
+	btr_bno->bload.get_record = get_bnobt_record;
+	btr_bno->bload.claim_block = rebuild_claim_block;
+
+	btr_cnt->bload.get_record = get_bnobt_record;
+	btr_cnt->bload.claim_block = rebuild_claim_block;
+
+	/*
+	 * Now we need to allocate blocks for the free space btrees using the
+	 * free space records we're about to put in them.  Every record we use
+	 * can change the shape of the free space trees, so we recompute the
+	 * btree shape until we stop needing /more/ blocks.  If we have any
+	 * left over we'll stash them in the AGFL when we're done.
+	 */
+	do {
+		unsigned int	num_freeblocks;
+
+		bno_blocks = btr_bno->bload.nr_blocks;
+		cnt_blocks = btr_cnt->bload.nr_blocks;
+
+		/* Compute how many bnobt blocks we'll need. */
+		error = -libxfs_btree_bload_compute_geometry(btr_bno->cur,
+				&btr_bno->bload, *nr_extents);
+		if (error)
+			do_error(
+_("Unable to compute free space by block btree geometry, error %d.\n"), -error);
+
+		/* Compute how many cntbt blocks we'll need. */
+		error = -libxfs_btree_bload_compute_geometry(btr_cnt->cur,
+				&btr_cnt->bload, *nr_extents);
+		if (error)
+			do_error(
+_("Unable to compute free space by length btree geometry, error %d.\n"), -error);
+
+		/* We don't need any more blocks, so we're done. */
+		if (bno_blocks >= btr_bno->bload.nr_blocks &&
+		    cnt_blocks >= btr_cnt->bload.nr_blocks)
+			break;
+
+		/* Allocate however many more blocks we need this time. */
+		if (bno_blocks < btr_bno->bload.nr_blocks)
+			reserve_btblocks(sc->mp, agno, btr_bno,
+					btr_bno->bload.nr_blocks - bno_blocks);
+		if (cnt_blocks < btr_cnt->bload.nr_blocks)
+			reserve_btblocks(sc->mp, agno, btr_cnt,
+					btr_cnt->bload.nr_blocks - cnt_blocks);
+
+		/* Ok, now how many free space records do we have? */
+		*nr_extents = count_bno_extents_blocks(agno, &num_freeblocks);
+	} while (1);
+
+	*extra_blocks = (bno_blocks - btr_bno->bload.nr_blocks) +
+			(cnt_blocks - btr_cnt->bload.nr_blocks);
+}
+
+/* Rebuild the free space btrees. */
+void
+build_freespace_btrees(
+	struct repair_ctx	*sc,
+	xfs_agnumber_t		agno,
+	struct bt_rebuild	*btr_bno,
+	struct bt_rebuild	*btr_cnt)
+{
+	int			error;
+
+	/* Add all observed bnobt records. */
+	error = -libxfs_btree_bload(btr_bno->cur, &btr_bno->bload, btr_bno);
+	if (error)
+		do_error(
+_("Error %d while creating bnobt btree for AG %u.\n"), error, agno);
+
+	/* Add all observed cntbt records. */
+	error = -libxfs_btree_bload(btr_cnt->cur, &btr_cnt->bload, btr_cnt);
+	if (error)
+		do_error(
+_("Error %d while creating cntbt btree for AG %u.\n"), error, agno);
+
+	/* Since we're not writing the AGF yet, no need to commit the cursor */
+	libxfs_btree_del_cursor(btr_bno->cur, 0);
+	libxfs_btree_del_cursor(btr_cnt->cur, 0);
+}
diff --git a/repair/agbtree.h b/repair/agbtree.h
index 50ea3c60..63352247 100644
--- a/repair/agbtree.h
+++ b/repair/agbtree.h
@@ -20,10 +20,20 @@ struct bt_rebuild {
 	/* Tree-specific data. */
 	union {
 		struct xfs_slab_cursor	*slab_cursor;
+		struct {
+			struct extent_tree_node	*bno_rec;
+			unsigned int		freeblks;
+		};
 	};
 };
 
 void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
 		struct xfs_slab *lost_fsb);
+void init_freespace_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
+		unsigned int free_space, unsigned int *nr_extents,
+		int *extra_blocks, struct bt_rebuild *btr_bno,
+		struct bt_rebuild *btr_cnt);
+void build_freespace_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
+		struct bt_rebuild *btr_bno, struct bt_rebuild *btr_cnt);
 
 #endif /* __XFS_REPAIR_AG_BTREE_H__ */
diff --git a/repair/phase5.c b/repair/phase5.c
index 8175aa6f..a93d900d 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -81,7 +81,10 @@ static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
 static uint64_t	*sb_fdblocks_ag;	/* free data blocks per ag */
 
 static int
-mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
+mk_incore_fstree(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	unsigned int		*num_freeblocks)
 {
 	int			in_extent;
 	int			num_extents;
@@ -93,6 +96,8 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
 	xfs_extlen_t		blen;
 	int			bstate;
 
+	*num_freeblocks = 0;
+
 	/*
 	 * scan the bitmap for the ag looking for continuous
 	 * extents of free blocks.  At this point, we know
@@ -148,6 +153,7 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
 #endif
 				add_bno_extent(agno, extent_start, extent_len);
 				add_bcnt_extent(agno, extent_start, extent_len);
+				*num_freeblocks += extent_len;
 			}
 		}
 	}
@@ -161,6 +167,7 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
 #endif
 		add_bno_extent(agno, extent_start, extent_len);
 		add_bcnt_extent(agno, extent_start, extent_len);
+		*num_freeblocks += extent_len;
 	}
 
 	return(num_extents);
@@ -338,287 +345,6 @@ finish_cursor(bt_status_t *curs)
 	free(curs->btree_blocks);
 }
 
-/*
- * We need to leave some free records in the tree for the corner case of
- * setting up the AGFL. This may require allocation of blocks, and as
- * such can require insertion of new records into the tree (e.g. moving
- * a record in the by-count tree when a long extent is shortened). If we
- * pack the records into the leaves with no slack space, this requires a
- * leaf split to occur and a block to be allocated from the free list.
- * If we don't have any blocks on the free list (because we are setting
- * it up!), then we fail, and the filesystem will fail with the same
- * failure at runtime. Hence leave a couple of records slack space in
- * each block to allow immediate modification of the tree without
- * requiring splits to be done.
- *
- * XXX(hch): any reason we don't just look at mp->m_alloc_mxr?
- */
-#define XR_ALLOC_BLOCK_MAXRECS(mp, level) \
-	(libxfs_allocbt_maxrecs((mp), (mp)->m_sb.sb_blocksize, (level) == 0) - 2)
-
-/*
- * this calculates a freespace cursor for an ag.
- * btree_curs is an in/out.  returns the number of
- * blocks that will show up in the AGFL.
- */
-static int
-calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
-			xfs_agblock_t *extents, bt_status_t *btree_curs)
-{
-	xfs_extlen_t		blocks_needed;		/* a running count */
-	xfs_extlen_t		blocks_allocated_pt;	/* per tree */
-	xfs_extlen_t		blocks_allocated_total;	/* for both trees */
-	xfs_agblock_t		num_extents;
-	int			i;
-	int			extents_used;
-	int			extra_blocks;
-	bt_stat_level_t		*lptr;
-	bt_stat_level_t		*p_lptr;
-	extent_tree_node_t	*ext_ptr;
-	int			level;
-
-	num_extents = *extents;
-	extents_used = 0;
-
-	ASSERT(num_extents != 0);
-
-	lptr = &btree_curs->level[0];
-	btree_curs->init = 1;
-
-	/*
-	 * figure out how much space we need for the leaf level
-	 * of the tree and set up the cursor for the leaf level
-	 * (note that the same code is duplicated further down)
-	 */
-	lptr->num_blocks = howmany(num_extents, XR_ALLOC_BLOCK_MAXRECS(mp, 0));
-	lptr->num_recs_pb = num_extents / lptr->num_blocks;
-	lptr->modulo = num_extents % lptr->num_blocks;
-	lptr->num_recs_tot = num_extents;
-	level = 1;
-
-#ifdef XR_BLD_FREE_TRACE
-	fprintf(stderr, "%s 0 %d %d %d %d\n", __func__,
-			lptr->num_blocks,
-			lptr->num_recs_pb,
-			lptr->modulo,
-			lptr->num_recs_tot);
-#endif
-	/*
-	 * if we need more levels, set them up.  # of records
-	 * per level is the # of blocks in the level below it
-	 */
-	if (lptr->num_blocks > 1)  {
-		for (; btree_curs->level[level - 1].num_blocks > 1
-				&& level < XFS_BTREE_MAXLEVELS;
-				level++)  {
-			lptr = &btree_curs->level[level];
-			p_lptr = &btree_curs->level[level - 1];
-			lptr->num_blocks = howmany(p_lptr->num_blocks,
-					XR_ALLOC_BLOCK_MAXRECS(mp, level));
-			lptr->modulo = p_lptr->num_blocks
-					% lptr->num_blocks;
-			lptr->num_recs_pb = p_lptr->num_blocks
-					/ lptr->num_blocks;
-			lptr->num_recs_tot = p_lptr->num_blocks;
-#ifdef XR_BLD_FREE_TRACE
-			fprintf(stderr, "%s %d %d %d %d %d\n", __func__,
-					level,
-					lptr->num_blocks,
-					lptr->num_recs_pb,
-					lptr->modulo,
-					lptr->num_recs_tot);
-#endif
-		}
-	}
-
-	ASSERT(lptr->num_blocks == 1);
-	btree_curs->num_levels = level;
-
-	/*
-	 * ok, now we have a hypothetical cursor that
-	 * will work for both the bno and bcnt trees.
-	 * now figure out if using up blocks to set up the
-	 * trees will perturb the shape of the freespace tree.
-	 * if so, we've over-allocated.  the freespace trees
-	 * as they will be *after* accounting for the free space
-	 * we've used up will need fewer blocks to to represent
-	 * than we've allocated.  We can use the AGFL to hold
-	 * xfs_agfl_size (sector/struct xfs_agfl) blocks but that's it.
-	 * Thus we limit things to xfs_agfl_size/2 for each of the 2 btrees.
-	 * if the number of extra blocks is more than that,
-	 * we'll have to be called again.
-	 */
-	for (blocks_needed = 0, i = 0; i < level; i++)  {
-		blocks_needed += btree_curs->level[i].num_blocks;
-	}
-
-	/*
-	 * record the # of blocks we've allocated
-	 */
-	blocks_allocated_pt = blocks_needed;
-	blocks_needed *= 2;
-	blocks_allocated_total = blocks_needed;
-
-	/*
-	 * figure out how many free extents will be used up by
-	 * our space allocation
-	 */
-	if ((ext_ptr = findfirst_bcnt_extent(agno)) == NULL)
-		do_error(_("can't rebuild fs trees -- not enough free space "
-			   "on ag %u\n"), agno);
-
-	while (ext_ptr != NULL && blocks_needed > 0)  {
-		if (ext_ptr->ex_blockcount <= blocks_needed)  {
-			blocks_needed -= ext_ptr->ex_blockcount;
-			extents_used++;
-		} else  {
-			blocks_needed = 0;
-		}
-
-		ext_ptr = findnext_bcnt_extent(agno, ext_ptr);
-
-#ifdef XR_BLD_FREE_TRACE
-		if (ext_ptr != NULL)  {
-			fprintf(stderr, "got next extent [%u %u]\n",
-				ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
-		} else  {
-			fprintf(stderr, "out of extents\n");
-		}
-#endif
-	}
-	if (blocks_needed > 0)
-		do_error(_("ag %u - not enough free space to build freespace "
-			   "btrees\n"), agno);
-
-	ASSERT(num_extents >= extents_used);
-
-	num_extents -= extents_used;
-
-	/*
-	 * see if the number of leaf blocks will change as a result
-	 * of the number of extents changing
-	 */
-	if (howmany(num_extents, XR_ALLOC_BLOCK_MAXRECS(mp, 0))
-			!= btree_curs->level[0].num_blocks)  {
-		/*
-		 * yes -- recalculate the cursor.  If the number of
-		 * excess (overallocated) blocks is < xfs_agfl_size/2, we're ok.
-		 * we can put those into the AGFL.  we don't try
-		 * and get things to converge exactly (reach a
-		 * state with zero excess blocks) because there
-		 * exist pathological cases which will never
-		 * converge.  first, check for the zero-case.
-		 */
-		if (num_extents == 0)  {
-			/*
-			 * ok, we've used up all the free blocks
-			 * trying to lay out the leaf level. go
-			 * to a one block (empty) btree and put the
-			 * already allocated blocks into the AGFL
-			 */
-			if (btree_curs->level[0].num_blocks != 1)  {
-				/*
-				 * we really needed more blocks because
-				 * the old tree had more than one level.
-				 * this is bad.
-				 */
-				 do_warn(_("not enough free blocks left to "
-					   "describe all free blocks in AG "
-					   "%u\n"), agno);
-			}
-#ifdef XR_BLD_FREE_TRACE
-			fprintf(stderr,
-				"ag %u -- no free extents, alloc'ed %d\n",
-				agno, blocks_allocated_pt);
-#endif
-			lptr->num_blocks = 1;
-			lptr->modulo = 0;
-			lptr->num_recs_pb = 0;
-			lptr->num_recs_tot = 0;
-
-			btree_curs->num_levels = 1;
-
-			/*
-			 * don't reset the allocation stats, assume
-			 * they're all extra blocks
-			 * don't forget to return the total block count
-			 * not the per-tree block count.  these are the
-			 * extras that will go into the AGFL.  subtract
-			 * two for the root blocks.
-			 */
-			btree_curs->num_tot_blocks = blocks_allocated_pt;
-			btree_curs->num_free_blocks = blocks_allocated_pt;
-
-			*extents = 0;
-
-			return(blocks_allocated_total - 2);
-		}
-
-		lptr = &btree_curs->level[0];
-		lptr->num_blocks = howmany(num_extents,
-					XR_ALLOC_BLOCK_MAXRECS(mp, 0));
-		lptr->num_recs_pb = num_extents / lptr->num_blocks;
-		lptr->modulo = num_extents % lptr->num_blocks;
-		lptr->num_recs_tot = num_extents;
-		level = 1;
-
-		/*
-		 * if we need more levels, set them up
-		 */
-		if (lptr->num_blocks > 1)  {
-			for (level = 1; btree_curs->level[level-1].num_blocks
-					> 1 && level < XFS_BTREE_MAXLEVELS;
-					level++)  {
-				lptr = &btree_curs->level[level];
-				p_lptr = &btree_curs->level[level-1];
-				lptr->num_blocks = howmany(p_lptr->num_blocks,
-					XR_ALLOC_BLOCK_MAXRECS(mp, level));
-				lptr->modulo = p_lptr->num_blocks
-						% lptr->num_blocks;
-				lptr->num_recs_pb = p_lptr->num_blocks
-						/ lptr->num_blocks;
-				lptr->num_recs_tot = p_lptr->num_blocks;
-			}
-		}
-		ASSERT(lptr->num_blocks == 1);
-		btree_curs->num_levels = level;
-
-		/*
-		 * now figure out the number of excess blocks
-		 */
-		for (blocks_needed = 0, i = 0; i < level; i++)  {
-			blocks_needed += btree_curs->level[i].num_blocks;
-		}
-		blocks_needed *= 2;
-
-		ASSERT(blocks_allocated_total >= blocks_needed);
-		extra_blocks = blocks_allocated_total - blocks_needed;
-	} else  {
-		if (extents_used > 0) {
-			/*
-			 * reset the leaf level geometry to account
-			 * for consumed extents.  we can leave the
-			 * rest of the cursor alone since the number
-			 * of leaf blocks hasn't changed.
-			 */
-			lptr = &btree_curs->level[0];
-
-			lptr->num_recs_pb = num_extents / lptr->num_blocks;
-			lptr->modulo = num_extents % lptr->num_blocks;
-			lptr->num_recs_tot = num_extents;
-		}
-
-		extra_blocks = 0;
-	}
-
-	btree_curs->num_tot_blocks = blocks_allocated_pt;
-	btree_curs->num_free_blocks = blocks_allocated_pt;
-
-	*extents = num_extents;
-
-	return(extra_blocks);
-}
-
 /* Map btnum to buffer ops for the types that need it. */
 static const struct xfs_buf_ops *
 btnum_to_ops(
@@ -643,270 +369,6 @@ btnum_to_ops(
 	}
 }
 
-static void
-prop_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
-		bt_status_t *btree_curs, xfs_agblock_t startblock,
-		xfs_extlen_t blockcount, int level, xfs_btnum_t btnum)
-{
-	struct xfs_btree_block	*bt_hdr;
-	xfs_alloc_key_t		*bt_key;
-	xfs_alloc_ptr_t		*bt_ptr;
-	xfs_agblock_t		agbno;
-	bt_stat_level_t		*lptr;
-	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
-	int			error;
-
-	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
-
-	level++;
-
-	if (level >= btree_curs->num_levels)
-		return;
-
-	lptr = &btree_curs->level[level];
-	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-
-	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
-		/*
-		 * only happens once when initializing the
-		 * left-hand side of the tree.
-		 */
-		prop_freespace_cursor(mp, agno, btree_curs, startblock,
-				blockcount, level, btnum);
-	}
-
-	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
-				lptr->num_recs_pb + (lptr->modulo > 0))  {
-		/*
-		 * write out current prev block, grab us a new block,
-		 * and set the rightsib pointer of current block
-		 */
-#ifdef XR_BLD_FREE_TRACE
-		fprintf(stderr, " %d ", lptr->prev_agbno);
-#endif
-		if (lptr->prev_agbno != NULLAGBLOCK) {
-			ASSERT(lptr->prev_buf_p != NULL);
-			libxfs_buf_mark_dirty(lptr->prev_buf_p);
-			libxfs_buf_relse(lptr->prev_buf_p);
-		}
-		lptr->prev_agbno = lptr->agbno;;
-		lptr->prev_buf_p = lptr->buf_p;
-		agbno = get_next_blockaddr(agno, level, btree_curs);
-
-		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
-
-		error = -libxfs_buf_get(mp->m_dev,
-				XFS_AGB_TO_DADDR(mp, agno, agbno),
-				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
-		if (error)
-			do_error(
-	_("Cannot grab free space btree buffer, err=%d"),
-					error);
-		lptr->agbno = agbno;
-
-		if (lptr->modulo)
-			lptr->modulo--;
-
-		/*
-		 * initialize block header
-		 */
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, btnum, level,
-					0, agno);
-
-		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
-
-		/*
-		 * propagate extent record for first extent in new block up
-		 */
-		prop_freespace_cursor(mp, agno, btree_curs, startblock,
-				blockcount, level, btnum);
-	}
-	/*
-	 * add extent info to current block
-	 */
-	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
-
-	bt_key = XFS_ALLOC_KEY_ADDR(mp, bt_hdr,
-				be16_to_cpu(bt_hdr->bb_numrecs));
-	bt_ptr = XFS_ALLOC_PTR_ADDR(mp, bt_hdr,
-				be16_to_cpu(bt_hdr->bb_numrecs),
-				mp->m_alloc_mxr[1]);
-
-	bt_key->ar_startblock = cpu_to_be32(startblock);
-	bt_key->ar_blockcount = cpu_to_be32(blockcount);
-	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
-}
-
-/*
- * rebuilds a freespace tree given a cursor and type
- * of tree to build (bno or bcnt).  returns the number of free blocks
- * represented by the tree.
- */
-static xfs_extlen_t
-build_freespace_tree(xfs_mount_t *mp, xfs_agnumber_t agno,
-		bt_status_t *btree_curs, xfs_btnum_t btnum)
-{
-	xfs_agnumber_t		i;
-	xfs_agblock_t		j;
-	struct xfs_btree_block	*bt_hdr;
-	xfs_alloc_rec_t		*bt_rec;
-	int			level;
-	xfs_agblock_t		agbno;
-	extent_tree_node_t	*ext_ptr;
-	bt_stat_level_t		*lptr;
-	xfs_extlen_t		freeblks;
-	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
-	int			error;
-
-	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
-
-#ifdef XR_BLD_FREE_TRACE
-	fprintf(stderr, "in build_freespace_tree, agno = %d\n", agno);
-#endif
-	level = btree_curs->num_levels;
-	freeblks = 0;
-
-	ASSERT(level > 0);
-
-	/*
-	 * initialize the first block on each btree level
-	 */
-	for (i = 0; i < level; i++)  {
-		lptr = &btree_curs->level[i];
-
-		agbno = get_next_blockaddr(agno, i, btree_curs);
-		error = -libxfs_buf_get(mp->m_dev,
-				XFS_AGB_TO_DADDR(mp, agno, agbno),
-				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
-		if (error)
-			do_error(
-	_("Cannot grab free space btree buffer, err=%d"),
-					error);
-
-		if (i == btree_curs->num_levels - 1)
-			btree_curs->root = agbno;
-
-		lptr->agbno = agbno;
-		lptr->prev_agbno = NULLAGBLOCK;
-		lptr->prev_buf_p = NULL;
-		/*
-		 * initialize block header
-		 */
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, btnum, i, 0, agno);
-	}
-	/*
-	 * run along leaf, setting up records.  as we have to switch
-	 * blocks, call the prop_freespace_cursor routine to set up the new
-	 * pointers for the parent.  that can recurse up to the root
-	 * if required.  set the sibling pointers for leaf level here.
-	 */
-	if (btnum == XFS_BTNUM_BNO)
-		ext_ptr = findfirst_bno_extent(agno);
-	else
-		ext_ptr = findfirst_bcnt_extent(agno);
-
-#ifdef XR_BLD_FREE_TRACE
-	fprintf(stderr, "bft, agno = %d, start = %u, count = %u\n",
-		agno, ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
-#endif
-
-	lptr = &btree_curs->level[0];
-
-	for (i = 0; i < btree_curs->level[0].num_blocks; i++)  {
-		/*
-		 * block initialization, lay in block header
-		 */
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, btnum, 0, 0, agno);
-
-		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
-		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
-							(lptr->modulo > 0));
-#ifdef XR_BLD_FREE_TRACE
-		fprintf(stderr, "bft, bb_numrecs = %d\n",
-				be16_to_cpu(bt_hdr->bb_numrecs));
-#endif
-
-		if (lptr->modulo > 0)
-			lptr->modulo--;
-
-		/*
-		 * initialize values in the path up to the root if
-		 * this is a multi-level btree
-		 */
-		if (btree_curs->num_levels > 1)
-			prop_freespace_cursor(mp, agno, btree_curs,
-					ext_ptr->ex_startblock,
-					ext_ptr->ex_blockcount,
-					0, btnum);
-
-		bt_rec = (xfs_alloc_rec_t *)
-			  ((char *)bt_hdr + XFS_ALLOC_BLOCK_LEN(mp));
-		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
-			ASSERT(ext_ptr != NULL);
-			bt_rec[j].ar_startblock = cpu_to_be32(
-							ext_ptr->ex_startblock);
-			bt_rec[j].ar_blockcount = cpu_to_be32(
-							ext_ptr->ex_blockcount);
-			freeblks += ext_ptr->ex_blockcount;
-			if (btnum == XFS_BTNUM_BNO)
-				ext_ptr = findnext_bno_extent(ext_ptr);
-			else
-				ext_ptr = findnext_bcnt_extent(agno, ext_ptr);
-#if 0
-#ifdef XR_BLD_FREE_TRACE
-			if (ext_ptr == NULL)
-				fprintf(stderr, "null extent pointer, j = %d\n",
-					j);
-			else
-				fprintf(stderr,
-				"bft, agno = %d, start = %u, count = %u\n",
-					agno, ext_ptr->ex_startblock,
-					ext_ptr->ex_blockcount);
-#endif
-#endif
-		}
-
-		if (ext_ptr != NULL)  {
-			/*
-			 * get next leaf level block
-			 */
-			if (lptr->prev_buf_p != NULL)  {
-#ifdef XR_BLD_FREE_TRACE
-				fprintf(stderr, " writing fst agbno %u\n",
-					lptr->prev_agbno);
-#endif
-				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
-				libxfs_buf_mark_dirty(lptr->prev_buf_p);
-				libxfs_buf_relse(lptr->prev_buf_p);
-			}
-			lptr->prev_buf_p = lptr->buf_p;
-			lptr->prev_agbno = lptr->agbno;
-			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
-			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
-
-			error = -libxfs_buf_get(mp->m_dev,
-					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
-					XFS_FSB_TO_BB(mp, 1),
-					&lptr->buf_p);
-			if (error)
-				do_error(
-	_("Cannot grab free space btree buffer, err=%d"),
-						error);
-		}
-	}
-
-	return(freeblks);
-}
-
 /*
  * XXX(hch): any reason we don't just look at mp->m_inobt_mxr?
  */
@@ -2038,6 +1500,28 @@ _("Insufficient memory to construct refcount cursor."));
 	free_slab_cursor(&refc_cur);
 }
 
+/* Fill the AGFL with any leftover bnobt rebuilder blocks. */
+static void
+fill_agfl(
+	struct bt_rebuild	*btr,
+	__be32			*agfl_bnos,
+	unsigned int		*agfl_idx)
+{
+	struct bulkload_resv	*resv, *n;
+	struct xfs_mount	*mp = btr->newbt.sc->mp;
+
+	for_each_bulkload_reservation(&btr->newbt, resv, n) {
+		xfs_agblock_t	bno;
+
+		bno = XFS_FSB_TO_AGBNO(mp, resv->fsbno + resv->used);
+		while (resv->used < resv->len &&
+		       *agfl_idx < libxfs_agfl_size(mp)) {
+			agfl_bnos[(*agfl_idx)++] = cpu_to_be32(bno++);
+			resv->used++;
+		}
+	}
+}
+
 /*
  * build both the agf and the agfl for an agno given both
  * btree cursors.
@@ -2048,9 +1532,8 @@ static void
 build_agf_agfl(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
-	struct bt_status	*bno_bt,
-	struct bt_status	*bcnt_bt,
-	xfs_extlen_t		freeblks,	/* # free blocks in tree */
+	struct bt_rebuild	*btr_bno,
+	struct bt_rebuild	*btr_cnt,
 	struct bt_status	*rmap_bt,
 	struct bt_status	*refcnt_bt,
 	struct xfs_slab		*lost_fsb)
@@ -2060,7 +1543,6 @@ build_agf_agfl(
 	unsigned int		agfl_idx;
 	struct xfs_agfl		*agfl;
 	struct xfs_agf		*agf;
-	xfs_fsblock_t		fsb;
 	__be32			*freelist;
 	int			error;
 
@@ -2092,13 +1574,17 @@ build_agf_agfl(
 		agf->agf_length = cpu_to_be32(mp->m_sb.sb_dblocks -
 			(xfs_rfsblock_t) mp->m_sb.sb_agblocks * agno);
 
-	agf->agf_roots[XFS_BTNUM_BNO] = cpu_to_be32(bno_bt->root);
-	agf->agf_levels[XFS_BTNUM_BNO] = cpu_to_be32(bno_bt->num_levels);
-	agf->agf_roots[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->root);
-	agf->agf_levels[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->num_levels);
+	agf->agf_roots[XFS_BTNUM_BNO] =
+			cpu_to_be32(btr_bno->newbt.afake.af_root);
+	agf->agf_levels[XFS_BTNUM_BNO] =
+			cpu_to_be32(btr_bno->newbt.afake.af_levels);
+	agf->agf_roots[XFS_BTNUM_CNT] =
+			cpu_to_be32(btr_cnt->newbt.afake.af_root);
+	agf->agf_levels[XFS_BTNUM_CNT] =
+			cpu_to_be32(btr_cnt->newbt.afake.af_levels);
 	agf->agf_roots[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->root);
 	agf->agf_levels[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->num_levels);
-	agf->agf_freeblks = cpu_to_be32(freeblks);
+	agf->agf_freeblks = cpu_to_be32(btr_bno->freeblks);
 	agf->agf_rmap_blocks = cpu_to_be32(rmap_bt->num_tot_blocks -
 			rmap_bt->num_free_blocks);
 	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
@@ -2115,9 +1601,8 @@ build_agf_agfl(
 		 * Don't count the root blocks as they are already
 		 * accounted for.
 		 */
-		blks = (bno_bt->num_tot_blocks - bno_bt->num_free_blocks) +
-			(bcnt_bt->num_tot_blocks - bcnt_bt->num_free_blocks) -
-			2;
+		blks = btr_bno->newbt.afake.af_blocks +
+			btr_cnt->newbt.afake.af_blocks - 2;
 		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 			blks += rmap_bt->num_tot_blocks - rmap_bt->num_free_blocks - 1;
 		agf->agf_btreeblks = cpu_to_be32(blks);
@@ -2159,50 +1644,14 @@ build_agf_agfl(
 			freelist[agfl_idx] = cpu_to_be32(NULLAGBLOCK);
 	}
 
-	/*
-	 * do we have left-over blocks in the btree cursors that should
-	 * be used to fill the AGFL?
-	 */
-	if (bno_bt->num_free_blocks > 0 || bcnt_bt->num_free_blocks > 0)  {
-		/*
-		 * yes, now grab as many blocks as we can
-		 */
-		agfl_idx = 0;
-		while (bno_bt->num_free_blocks > 0 &&
-		       agfl_idx < libxfs_agfl_size(mp))
-		{
-			freelist[agfl_idx] = cpu_to_be32(
-					get_next_blockaddr(agno, 0, bno_bt));
-			agfl_idx++;
-		}
-
-		while (bcnt_bt->num_free_blocks > 0 &&
-		       agfl_idx < libxfs_agfl_size(mp))
-		{
-			freelist[agfl_idx] = cpu_to_be32(
-					get_next_blockaddr(agno, 0, bcnt_bt));
-			agfl_idx++;
-		}
-		/*
-		 * now throw the rest of the blocks away and complain
-		 */
-		while (bno_bt->num_free_blocks > 0) {
-			fsb = XFS_AGB_TO_FSB(mp, agno,
-					get_next_blockaddr(agno, 0, bno_bt));
-			error = slab_add(lost_fsb, &fsb);
-			if (error)
-				do_error(
-_("Insufficient memory saving lost blocks.\n"));
-		}
-		while (bcnt_bt->num_free_blocks > 0) {
-			fsb = XFS_AGB_TO_FSB(mp, agno,
-					get_next_blockaddr(agno, 0, bcnt_bt));
-			error = slab_add(lost_fsb, &fsb);
-			if (error)
-				do_error(
-_("Insufficient memory saving lost blocks.\n"));
-		}
+	/* Fill the AGFL with leftover blocks or save them for later. */
+	agfl_idx = 0;
+	freelist = xfs_buf_to_agfl_bno(agfl_buf);
+	fill_agfl(btr_bno, freelist, &agfl_idx);
+	fill_agfl(btr_cnt, freelist, &agfl_idx);
 
+	/* Set the AGF counters for the AGFL. */
+	if (agfl_idx > 0) {
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(agfl_idx - 1);
 		agf->agf_flcount = cpu_to_be32(agfl_idx);
@@ -2300,18 +1749,14 @@ phase5_func(
 	uint64_t		num_free_inos;
 	uint64_t		finobt_num_inos;
 	uint64_t		finobt_num_free_inos;
-	bt_status_t		bno_btree_curs;
-	bt_status_t		bcnt_btree_curs;
+	struct bt_rebuild	btr_bno;
+	struct bt_rebuild	btr_cnt;
 	bt_status_t		ino_btree_curs;
 	bt_status_t		fino_btree_curs;
 	bt_status_t		rmap_btree_curs;
 	bt_status_t		refcnt_btree_curs;
 	int			extra_blocks = 0;
 	uint			num_freeblocks;
-	xfs_extlen_t		freeblks1;
-#ifdef DEBUG
-	xfs_extlen_t		freeblks2;
-#endif
 	xfs_agblock_t		num_extents;
 
 	if (verbose)
@@ -2320,7 +1765,7 @@ phase5_func(
 	/*
 	 * build up incore bno and bcnt extent btrees
 	 */
-	num_extents = mk_incore_fstree(mp, agno);
+	num_extents = mk_incore_fstree(mp, agno, &num_freeblocks);
 
 #ifdef XR_BLD_FREE_TRACE
 	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
@@ -2392,8 +1837,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	/*
 	 * track blocks that we might really lose
 	 */
-	extra_blocks = calculate_freespace_cursor(mp, agno,
-				&num_extents, &bno_btree_curs);
+	init_freespace_cursors(&sc, agno, num_freeblocks, &num_extents,
+			&extra_blocks, &btr_bno, &btr_cnt);
 
 	/*
 	 * freespace btrees live in the "free space" but the filesystem treats
@@ -2410,37 +1855,18 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	if (extra_blocks > 0)
 		sb_fdblocks_ag[agno] -= extra_blocks;
 
-	bcnt_btree_curs = bno_btree_curs;
-
-	bno_btree_curs.owner = XFS_RMAP_OWN_AG;
-	bcnt_btree_curs.owner = XFS_RMAP_OWN_AG;
-	setup_cursor(mp, agno, &bno_btree_curs);
-	setup_cursor(mp, agno, &bcnt_btree_curs);
-
 #ifdef XR_BLD_FREE_TRACE
 	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
 	fprintf(stderr, "# of bcnt extents is %d\n", count_bcnt_extents(agno));
 #endif
 
-	/*
-	 * now rebuild the freespace trees
-	 */
-	freeblks1 = build_freespace_tree(mp, agno,
-					&bno_btree_curs, XFS_BTNUM_BNO);
+	build_freespace_btrees(&sc, agno, &btr_bno, &btr_cnt);
+
 #ifdef XR_BLD_FREE_TRACE
-	fprintf(stderr, "# of free blocks == %d\n", freeblks1);
+	fprintf(stderr, "# of free blocks == %d/%d\n", btr_bno.freeblks,
+			btr_cnt.freeblks);
 #endif
-	write_cursor(&bno_btree_curs);
-
-#ifdef DEBUG
-	freeblks2 = build_freespace_tree(mp, agno,
-				&bcnt_btree_curs, XFS_BTNUM_CNT);
-#else
-	(void) build_freespace_tree(mp, agno, &bcnt_btree_curs, XFS_BTNUM_CNT);
-#endif
-	write_cursor(&bcnt_btree_curs);
-
-	ASSERT(freeblks1 == freeblks2);
+	ASSERT(btr_bno.freeblks == btr_cnt.freeblks);
 
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
 		build_rmap_tree(mp, agno, &rmap_btree_curs);
@@ -2457,8 +1883,9 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	/*
 	 * set up agf and agfl
 	 */
-	build_agf_agfl(mp, agno, &bno_btree_curs, &bcnt_btree_curs, freeblks1,
-			&rmap_btree_curs, &refcnt_btree_curs, lost_fsb);
+	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
+			&refcnt_btree_curs, lost_fsb);
+
 	/*
 	 * build inode allocation tree.
 	 */
@@ -2480,7 +1907,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	/*
 	 * tear down cursors
 	 */
-	finish_cursor(&bno_btree_curs);
+	finish_rebuild(mp, &btr_bno, lost_fsb);
+	finish_rebuild(mp, &btr_cnt, lost_fsb);
 	finish_cursor(&ino_btree_curs);
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 		finish_cursor(&rmap_btree_curs);
@@ -2488,7 +1916,6 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 		finish_cursor(&refcnt_btree_curs);
 	if (xfs_sb_version_hasfinobt(&mp->m_sb))
 		finish_cursor(&fino_btree_curs);
-	finish_cursor(&bcnt_btree_curs);
 
 	/*
 	 * release the incore per-AG bno/bcnt trees so the extent nodes



* [PATCH 08/12] xfs_repair: rebuild inode btrees with bulk loader
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
                   ` (6 preceding siblings ...)
  2020-06-02  4:27 ` [PATCH 07/12] xfs_repair: rebuild free space btrees with bulk loader Darrick J. Wong
@ 2020-06-02  4:27 ` Darrick J. Wong
  2020-06-18 15:24   ` Brian Foster
  2020-06-02  4:27 ` [PATCH 09/12] xfs_repair: rebuild reverse mapping " Darrick J. Wong
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:27 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the btree bulk loading functions to rebuild the inode btrees
and drop the open-coded implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_api_defs.h |    1 
 repair/agbtree.c         |  207 ++++++++++++++++++++
 repair/agbtree.h         |   13 +
 repair/phase5.c          |  488 +++-------------------------------------------
 4 files changed, 248 insertions(+), 461 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index bace739c..5d0868c2 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -115,6 +115,7 @@
 #define xfs_init_local_fork		libxfs_init_local_fork
 
 #define xfs_inobt_maxrecs		libxfs_inobt_maxrecs
+#define xfs_inobt_stage_cursor		libxfs_inobt_stage_cursor
 #define xfs_inode_from_disk		libxfs_inode_from_disk
 #define xfs_inode_to_disk		libxfs_inode_to_disk
 #define xfs_inode_validate_cowextsize	libxfs_inode_validate_cowextsize
diff --git a/repair/agbtree.c b/repair/agbtree.c
index 3b8ab47c..e44475fc 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -308,3 +308,210 @@ _("Error %d while creating cntbt btree for AG %u.\n"), error, agno);
 	libxfs_btree_del_cursor(btr_bno->cur, 0);
 	libxfs_btree_del_cursor(btr_cnt->cur, 0);
 }
+
+/* Inode Btrees */
+
+static inline struct ino_tree_node *
+get_ino_rec(
+	struct xfs_btree_cur	*cur,
+	struct ino_tree_node	*prev_value)
+{
+	xfs_agnumber_t		agno = cur->bc_ag.agno;
+
+	if (cur->bc_btnum == XFS_BTNUM_INO) {
+		if (!prev_value)
+			return findfirst_inode_rec(agno);
+		return next_ino_rec(prev_value);
+	}
+
+	/* finobt */
+	if (!prev_value)
+		return findfirst_free_inode_rec(agno);
+	return next_free_ino_rec(prev_value);
+}
+
+/* Grab one inobt record. */
+static int
+get_inobt_record(
+	struct xfs_btree_cur		*cur,
+	void				*priv)
+{
+	struct bt_rebuild		*btr = priv;
+	struct xfs_inobt_rec_incore	*irec = &cur->bc_rec.i;
+	struct ino_tree_node		*ino_rec;
+	int				inocnt = 0;
+	int				finocnt = 0;
+	int				k;
+
+	btr->ino_rec = ino_rec = get_ino_rec(cur, btr->ino_rec);
+
+	/* Transform the incore record into an on-disk record. */
+	irec->ir_startino = ino_rec->ino_startnum;
+	irec->ir_free = ino_rec->ir_free;
+
+	for (k = 0; k < sizeof(xfs_inofree_t) * NBBY; k++)  {
+		ASSERT(is_inode_confirmed(ino_rec, k));
+
+		if (is_inode_sparse(ino_rec, k))
+			continue;
+		if (is_inode_free(ino_rec, k))
+			finocnt++;
+		inocnt++;
+	}
+
+	irec->ir_count = inocnt;
+	irec->ir_freecount = finocnt;
+
+	if (xfs_sb_version_hassparseinodes(&cur->bc_mp->m_sb)) {
+		uint64_t		sparse;
+		int			spmask;
+		uint16_t		holemask;
+
+		/*
+		 * Convert the 64-bit in-core sparse inode state to the
+		 * 16-bit on-disk holemask.
+		 */
+		holemask = 0;
+		spmask = (1 << XFS_INODES_PER_HOLEMASK_BIT) - 1;
+		sparse = ino_rec->ir_sparse;
+		for (k = 0; k < XFS_INOBT_HOLEMASK_BITS; k++) {
+			if (sparse & spmask) {
+				ASSERT((sparse & spmask) == spmask);
+				holemask |= (1 << k);
+			} else
+				ASSERT((sparse & spmask) == 0);
+			sparse >>= XFS_INODES_PER_HOLEMASK_BIT;
+		}
+
+		irec->ir_holemask = holemask;
+	} else {
+		irec->ir_holemask = 0;
+	}
+
+	if (btr->first_agino == NULLAGINO)
+		btr->first_agino = ino_rec->ino_startnum;
+	btr->freecount += finocnt;
+	btr->count += inocnt;
+	return 0;
+}
+
+/* Initialize both inode btree cursors as needed. */
+void
+init_ino_cursors(
+	struct repair_ctx	*sc,
+	xfs_agnumber_t		agno,
+	unsigned int		free_space,
+	uint64_t		*num_inos,
+	uint64_t		*num_free_inos,
+	struct bt_rebuild	*btr_ino,
+	struct bt_rebuild	*btr_fino)
+{
+	struct ino_tree_node	*ino_rec;
+	unsigned int		ino_recs = 0;
+	unsigned int		fino_recs = 0;
+	bool			finobt;
+	int			error;
+
+	finobt = xfs_sb_version_hasfinobt(&sc->mp->m_sb);
+	init_rebuild(sc, &XFS_RMAP_OINFO_INOBT, free_space, btr_ino);
+
+	/* Compute inode statistics. */
+	*num_free_inos = 0;
+	*num_inos = 0;
+	for (ino_rec = findfirst_inode_rec(agno);
+	     ino_rec != NULL;
+	     ino_rec = next_ino_rec(ino_rec))  {
+		unsigned int	rec_ninos = 0;
+		unsigned int	rec_nfinos = 0;
+		int		i;
+
+		for (i = 0; i < XFS_INODES_PER_CHUNK; i++)  {
+			ASSERT(is_inode_confirmed(ino_rec, i));
+			/*
+			 * sparse inodes are not factored into superblock (free)
+			 * inode counts
+			 */
+			if (is_inode_sparse(ino_rec, i))
+				continue;
+			if (is_inode_free(ino_rec, i))
+				rec_nfinos++;
+			rec_ninos++;
+		}
+
+		*num_free_inos += rec_nfinos;
+		*num_inos += rec_ninos;
+		ino_recs++;
+
+		/* finobt only considers records with free inodes */
+		if (rec_nfinos)
+			fino_recs++;
+	}
+
+	btr_ino->cur = libxfs_inobt_stage_cursor(sc->mp, &btr_ino->newbt.afake,
+			agno, XFS_BTNUM_INO);
+
+	btr_ino->bload.get_record = get_inobt_record;
+	btr_ino->bload.claim_block = rebuild_claim_block;
+	btr_ino->first_agino = NULLAGINO;
+
+	/* Compute how many inobt blocks we'll need. */
+	error = -libxfs_btree_bload_compute_geometry(btr_ino->cur,
+			&btr_ino->bload, ino_recs);
+	if (error)
+		do_error(
+_("Unable to compute inode btree geometry, error %d.\n"), error);
+
+	reserve_btblocks(sc->mp, agno, btr_ino, btr_ino->bload.nr_blocks);
+
+	if (!finobt)
+		return;
+
+	init_rebuild(sc, &XFS_RMAP_OINFO_INOBT, free_space, btr_fino);
+	btr_fino->cur = libxfs_inobt_stage_cursor(sc->mp,
+			&btr_fino->newbt.afake, agno, XFS_BTNUM_FINO);
+
+	btr_fino->bload.get_record = get_inobt_record;
+	btr_fino->bload.claim_block = rebuild_claim_block;
+	btr_fino->first_agino = NULLAGINO;
+
+	/* Compute how many finobt blocks we'll need. */
+	error = -libxfs_btree_bload_compute_geometry(btr_fino->cur,
+			&btr_fino->bload, fino_recs);
+	if (error)
+		do_error(
+_("Unable to compute free inode btree geometry, error %d.\n"), error);
+
+	reserve_btblocks(sc->mp, agno, btr_fino, btr_fino->bload.nr_blocks);
+}
+
+/* Rebuild the inode btrees. */
+void
+build_inode_btrees(
+	struct repair_ctx	*sc,
+	xfs_agnumber_t		agno,
+	struct bt_rebuild	*btr_ino,
+	struct bt_rebuild	*btr_fino)
+{
+	int			error;
+
+	/* Add all observed inobt records. */
+	error = -libxfs_btree_bload(btr_ino->cur, &btr_ino->bload, btr_ino);
+	if (error)
+		do_error(
+_("Error %d while creating inobt btree for AG %u.\n"), error, agno);
+
+	/* Since we're not writing the AGI yet, no need to commit the cursor */
+	libxfs_btree_del_cursor(btr_ino->cur, 0);
+
+	if (!xfs_sb_version_hasfinobt(&sc->mp->m_sb))
+		return;
+
+	/* Add all observed finobt records. */
+	error = -libxfs_btree_bload(btr_fino->cur, &btr_fino->bload, btr_fino);
+	if (error)
+		do_error(
+_("Error %d while creating finobt btree for AG %u.\n"), error, agno);
+
+	/* Since we're not writing the AGI yet, no need to commit the cursor */
+	libxfs_btree_del_cursor(btr_fino->cur, 0);
+}
diff --git a/repair/agbtree.h b/repair/agbtree.h
index 63352247..3cad2a8e 100644
--- a/repair/agbtree.h
+++ b/repair/agbtree.h
@@ -24,6 +24,12 @@ struct bt_rebuild {
 			struct extent_tree_node	*bno_rec;
 			unsigned int		freeblks;
 		};
+		struct {
+			struct ino_tree_node	*ino_rec;
+			xfs_agino_t		first_agino;
+			xfs_agino_t		count;
+			xfs_agino_t		freecount;
+		};
 	};
 };
 
@@ -36,4 +42,11 @@ void init_freespace_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
 void build_freespace_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
 		struct bt_rebuild *btr_bno, struct bt_rebuild *btr_cnt);
 
+void init_ino_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
+		unsigned int free_space, uint64_t *num_inos,
+		uint64_t *num_free_inos, struct bt_rebuild *btr_ino,
+		struct bt_rebuild *btr_fino);
+void build_inode_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
+		struct bt_rebuild *btr_ino, struct bt_rebuild *btr_fino);
+
 #endif /* __XFS_REPAIR_AG_BTREE_H__ */
diff --git a/repair/phase5.c b/repair/phase5.c
index a93d900d..e570349d 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -67,15 +67,6 @@ typedef struct bt_status  {
 	uint64_t		owner;		/* owner */
 } bt_status_t;
 
-/*
- * extra metadata for the agi
- */
-struct agi_stat {
-	xfs_agino_t		first_agino;
-	xfs_agino_t		count;
-	xfs_agino_t		freecount;
-};
-
 static uint64_t	*sb_icount_ag;		/* allocated inodes per ag */
 static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
 static uint64_t	*sb_fdblocks_ag;	/* free data blocks per ag */
@@ -369,229 +360,20 @@ btnum_to_ops(
 	}
 }
 
-/*
- * XXX(hch): any reason we don't just look at mp->m_inobt_mxr?
- */
-#define XR_INOBT_BLOCK_MAXRECS(mp, level) \
-			libxfs_inobt_maxrecs((mp), (mp)->m_sb.sb_blocksize, \
-						(level) == 0)
-
-/*
- * we don't have to worry here about how chewing up free extents
- * may perturb things because inode tree building happens before
- * freespace tree building.
- */
-static void
-init_ino_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
-		uint64_t *num_inos, uint64_t *num_free_inos, int finobt)
-{
-	uint64_t		ninos;
-	uint64_t		nfinos;
-	int			rec_nfinos;
-	int			rec_ninos;
-	ino_tree_node_t		*ino_rec;
-	int			num_recs;
-	int			level;
-	bt_stat_level_t		*lptr;
-	bt_stat_level_t		*p_lptr;
-	xfs_extlen_t		blocks_allocated;
-	int			i;
-
-	*num_inos = *num_free_inos = 0;
-	ninos = nfinos = 0;
-
-	lptr = &btree_curs->level[0];
-	btree_curs->init = 1;
-	btree_curs->owner = XFS_RMAP_OWN_INOBT;
-
-	/*
-	 * build up statistics
-	 */
-	ino_rec = findfirst_inode_rec(agno);
-	for (num_recs = 0; ino_rec != NULL; ino_rec = next_ino_rec(ino_rec))  {
-		rec_ninos = 0;
-		rec_nfinos = 0;
-		for (i = 0; i < XFS_INODES_PER_CHUNK; i++)  {
-			ASSERT(is_inode_confirmed(ino_rec, i));
-			/*
-			 * sparse inodes are not factored into superblock (free)
-			 * inode counts
-			 */
-			if (is_inode_sparse(ino_rec, i))
-				continue;
-			if (is_inode_free(ino_rec, i))
-				rec_nfinos++;
-			rec_ninos++;
-		}
-
-		/*
-		 * finobt only considers records with free inodes
-		 */
-		if (finobt && !rec_nfinos)
-			continue;
-
-		nfinos += rec_nfinos;
-		ninos += rec_ninos;
-		num_recs++;
-	}
-
-	if (num_recs == 0) {
-		/*
-		 * easy corner-case -- no inode records
-		 */
-		lptr->num_blocks = 1;
-		lptr->modulo = 0;
-		lptr->num_recs_pb = 0;
-		lptr->num_recs_tot = 0;
-
-		btree_curs->num_levels = 1;
-		btree_curs->num_tot_blocks = btree_curs->num_free_blocks = 1;
-
-		setup_cursor(mp, agno, btree_curs);
-
-		return;
-	}
-
-	blocks_allocated = lptr->num_blocks = howmany(num_recs,
-					XR_INOBT_BLOCK_MAXRECS(mp, 0));
-
-	lptr->modulo = num_recs % lptr->num_blocks;
-	lptr->num_recs_pb = num_recs / lptr->num_blocks;
-	lptr->num_recs_tot = num_recs;
-	level = 1;
-
-	if (lptr->num_blocks > 1)  {
-		for (; btree_curs->level[level-1].num_blocks > 1
-				&& level < XFS_BTREE_MAXLEVELS;
-				level++)  {
-			lptr = &btree_curs->level[level];
-			p_lptr = &btree_curs->level[level - 1];
-			lptr->num_blocks = howmany(p_lptr->num_blocks,
-				XR_INOBT_BLOCK_MAXRECS(mp, level));
-			lptr->modulo = p_lptr->num_blocks % lptr->num_blocks;
-			lptr->num_recs_pb = p_lptr->num_blocks
-					/ lptr->num_blocks;
-			lptr->num_recs_tot = p_lptr->num_blocks;
-
-			blocks_allocated += lptr->num_blocks;
-		}
-	}
-	ASSERT(lptr->num_blocks == 1);
-	btree_curs->num_levels = level;
-
-	btree_curs->num_tot_blocks = btree_curs->num_free_blocks
-			= blocks_allocated;
-
-	setup_cursor(mp, agno, btree_curs);
-
-	*num_inos = ninos;
-	*num_free_inos = nfinos;
-
-	return;
-}
-
-static void
-prop_ino_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
-	xfs_btnum_t btnum, xfs_agino_t startino, int level)
-{
-	struct xfs_btree_block	*bt_hdr;
-	xfs_inobt_key_t		*bt_key;
-	xfs_inobt_ptr_t		*bt_ptr;
-	xfs_agblock_t		agbno;
-	bt_stat_level_t		*lptr;
-	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
-	int			error;
-
-	level++;
-
-	if (level >= btree_curs->num_levels)
-		return;
-
-	lptr = &btree_curs->level[level];
-	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-
-	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
-		/*
-		 * this only happens once to initialize the
-		 * first path up the left side of the tree
-		 * where the agbno's are already set up
-		 */
-		prop_ino_cursor(mp, agno, btree_curs, btnum, startino, level);
-	}
-
-	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
-				lptr->num_recs_pb + (lptr->modulo > 0))  {
-		/*
-		 * write out current prev block, grab us a new block,
-		 * and set the rightsib pointer of current block
-		 */
-#ifdef XR_BLD_INO_TRACE
-		fprintf(stderr, " ino prop agbno %d ", lptr->prev_agbno);
-#endif
-		if (lptr->prev_agbno != NULLAGBLOCK)  {
-			ASSERT(lptr->prev_buf_p != NULL);
-			libxfs_buf_mark_dirty(lptr->prev_buf_p);
-			libxfs_buf_relse(lptr->prev_buf_p);
-		}
-		lptr->prev_agbno = lptr->agbno;;
-		lptr->prev_buf_p = lptr->buf_p;
-		agbno = get_next_blockaddr(agno, level, btree_curs);
-
-		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
-
-		error = -libxfs_buf_get(mp->m_dev,
-				XFS_AGB_TO_DADDR(mp, agno, agbno),
-				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
-		if (error)
-			do_error(_("Cannot grab inode btree buffer, err=%d"),
-					error);
-		lptr->agbno = agbno;
-
-		if (lptr->modulo)
-			lptr->modulo--;
-
-		/*
-		 * initialize block header
-		 */
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, btnum,
-					level, 0, agno);
-
-		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
-
-		/*
-		 * propagate extent record for first extent in new block up
-		 */
-		prop_ino_cursor(mp, agno, btree_curs, btnum, startino, level);
-	}
-	/*
-	 * add inode info to current block
-	 */
-	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
-
-	bt_key = XFS_INOBT_KEY_ADDR(mp, bt_hdr,
-				    be16_to_cpu(bt_hdr->bb_numrecs));
-	bt_ptr = XFS_INOBT_PTR_ADDR(mp, bt_hdr,
-				    be16_to_cpu(bt_hdr->bb_numrecs),
-				    M_IGEO(mp)->inobt_mxr[1]);
-
-	bt_key->ir_startino = cpu_to_be32(startino);
-	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
-}
-
 /*
  * XXX: yet more code that can be shared with mkfs, growfs.
  */
 static void
-build_agi(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
-		bt_status_t *finobt_curs, struct agi_stat *agi_stat)
+build_agi(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_rebuild	*btr_ino,
+	struct bt_rebuild	*btr_fino)
 {
-	xfs_buf_t	*agi_buf;
-	xfs_agi_t	*agi;
-	int		i;
-	int		error;
+	struct xfs_buf		*agi_buf;
+	struct xfs_agi		*agi;
+	int			i;
+	int			error;
 
 	error = -libxfs_buf_get(mp->m_dev,
 			XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
@@ -611,11 +393,11 @@ build_agi(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
 	else
 		agi->agi_length = cpu_to_be32(mp->m_sb.sb_dblocks -
 			(xfs_rfsblock_t) mp->m_sb.sb_agblocks * agno);
-	agi->agi_count = cpu_to_be32(agi_stat->count);
-	agi->agi_root = cpu_to_be32(btree_curs->root);
-	agi->agi_level = cpu_to_be32(btree_curs->num_levels);
-	agi->agi_freecount = cpu_to_be32(agi_stat->freecount);
-	agi->agi_newino = cpu_to_be32(agi_stat->first_agino);
+	agi->agi_count = cpu_to_be32(btr_ino->count);
+	agi->agi_root = cpu_to_be32(btr_ino->newbt.afake.af_root);
+	agi->agi_level = cpu_to_be32(btr_ino->newbt.afake.af_levels);
+	agi->agi_freecount = cpu_to_be32(btr_ino->freecount);
+	agi->agi_newino = cpu_to_be32(btr_ino->first_agino);
 	agi->agi_dirino = cpu_to_be32(NULLAGINO);
 
 	for (i = 0; i < XFS_AGI_UNLINKED_BUCKETS; i++)
@@ -625,203 +407,16 @@ build_agi(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
 		platform_uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
 
 	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
-		agi->agi_free_root = cpu_to_be32(finobt_curs->root);
-		agi->agi_free_level = cpu_to_be32(finobt_curs->num_levels);
+		agi->agi_free_root =
+				cpu_to_be32(btr_fino->newbt.afake.af_root);
+		agi->agi_free_level =
+				cpu_to_be32(btr_fino->newbt.afake.af_levels);
 	}
 
 	libxfs_buf_mark_dirty(agi_buf);
 	libxfs_buf_relse(agi_buf);
 }
 
-/*
- * rebuilds an inode tree given a cursor.  We're lazy here and call
- * the routine that builds the agi
- */
-static void
-build_ino_tree(xfs_mount_t *mp, xfs_agnumber_t agno,
-		bt_status_t *btree_curs, xfs_btnum_t btnum,
-		struct agi_stat *agi_stat)
-{
-	xfs_agnumber_t		i;
-	xfs_agblock_t		j;
-	xfs_agblock_t		agbno;
-	xfs_agino_t		first_agino;
-	struct xfs_btree_block	*bt_hdr;
-	xfs_inobt_rec_t		*bt_rec;
-	ino_tree_node_t		*ino_rec;
-	bt_stat_level_t		*lptr;
-	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
-	xfs_agino_t		count = 0;
-	xfs_agino_t		freecount = 0;
-	int			inocnt;
-	uint8_t			finocnt;
-	int			k;
-	int			level = btree_curs->num_levels;
-	int			spmask;
-	uint64_t		sparse;
-	uint16_t		holemask;
-	int			error;
-
-	ASSERT(btnum == XFS_BTNUM_INO || btnum == XFS_BTNUM_FINO);
-
-	for (i = 0; i < level; i++)  {
-		lptr = &btree_curs->level[i];
-
-		agbno = get_next_blockaddr(agno, i, btree_curs);
-		error = -libxfs_buf_get(mp->m_dev,
-				XFS_AGB_TO_DADDR(mp, agno, agbno),
-				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
-		if (error)
-			do_error(_("Cannot grab inode btree buffer, err=%d"),
-					error);
-
-		if (i == btree_curs->num_levels - 1)
-			btree_curs->root = agbno;
-
-		lptr->agbno = agbno;
-		lptr->prev_agbno = NULLAGBLOCK;
-		lptr->prev_buf_p = NULL;
-		/*
-		 * initialize block header
-		 */
-
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, btnum, i, 0, agno);
-	}
-
-	/*
-	 * run along leaf, setting up records.  as we have to switch
-	 * blocks, call the prop_ino_cursor routine to set up the new
-	 * pointers for the parent.  that can recurse up to the root
-	 * if required.  set the sibling pointers for leaf level here.
-	 */
-	if (btnum == XFS_BTNUM_FINO)
-		ino_rec = findfirst_free_inode_rec(agno);
-	else
-		ino_rec = findfirst_inode_rec(agno);
-
-	if (ino_rec != NULL)
-		first_agino = ino_rec->ino_startnum;
-	else
-		first_agino = NULLAGINO;
-
-	lptr = &btree_curs->level[0];
-
-	for (i = 0; i < lptr->num_blocks; i++)  {
-		/*
-		 * block initialization, lay in block header
-		 */
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, btnum, 0, 0, agno);
-
-		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
-		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
-							(lptr->modulo > 0));
-
-		if (lptr->modulo > 0)
-			lptr->modulo--;
-
-		if (lptr->num_recs_pb > 0)
-			prop_ino_cursor(mp, agno, btree_curs, btnum,
-					ino_rec->ino_startnum, 0);
-
-		bt_rec = (xfs_inobt_rec_t *)
-			  ((char *)bt_hdr + XFS_INOBT_BLOCK_LEN(mp));
-		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
-			ASSERT(ino_rec != NULL);
-			bt_rec[j].ir_startino =
-					cpu_to_be32(ino_rec->ino_startnum);
-			bt_rec[j].ir_free = cpu_to_be64(ino_rec->ir_free);
-
-			inocnt = finocnt = 0;
-			for (k = 0; k < sizeof(xfs_inofree_t)*NBBY; k++)  {
-				ASSERT(is_inode_confirmed(ino_rec, k));
-
-				if (is_inode_sparse(ino_rec, k))
-					continue;
-				if (is_inode_free(ino_rec, k))
-					finocnt++;
-				inocnt++;
-			}
-
-			/*
-			 * Set the freecount and check whether we need to update
-			 * the sparse format fields. Otherwise, skip to the next
-			 * record.
-			 */
-			inorec_set_freecount(mp, &bt_rec[j], finocnt);
-			if (!xfs_sb_version_hassparseinodes(&mp->m_sb))
-				goto nextrec;
-
-			/*
-			 * Convert the 64-bit in-core sparse inode state to the
-			 * 16-bit on-disk holemask.
-			 */
-			holemask = 0;
-			spmask = (1 << XFS_INODES_PER_HOLEMASK_BIT) - 1;
-			sparse = ino_rec->ir_sparse;
-			for (k = 0; k < XFS_INOBT_HOLEMASK_BITS; k++) {
-				if (sparse & spmask) {
-					ASSERT((sparse & spmask) == spmask);
-					holemask |= (1 << k);
-				} else
-					ASSERT((sparse & spmask) == 0);
-				sparse >>= XFS_INODES_PER_HOLEMASK_BIT;
-			}
-
-			bt_rec[j].ir_u.sp.ir_count = inocnt;
-			bt_rec[j].ir_u.sp.ir_holemask = cpu_to_be16(holemask);
-
-nextrec:
-			freecount += finocnt;
-			count += inocnt;
-
-			if (btnum == XFS_BTNUM_FINO)
-				ino_rec = next_free_ino_rec(ino_rec);
-			else
-				ino_rec = next_ino_rec(ino_rec);
-		}
-
-		if (ino_rec != NULL)  {
-			/*
-			 * get next leaf level block
-			 */
-			if (lptr->prev_buf_p != NULL)  {
-#ifdef XR_BLD_INO_TRACE
-				fprintf(stderr, "writing inobt agbno %u\n",
-					lptr->prev_agbno);
-#endif
-				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
-				libxfs_buf_mark_dirty(lptr->prev_buf_p);
-				libxfs_buf_relse(lptr->prev_buf_p);
-			}
-			lptr->prev_buf_p = lptr->buf_p;
-			lptr->prev_agbno = lptr->agbno;
-			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
-			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
-
-			error = -libxfs_buf_get(mp->m_dev,
-					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
-					XFS_FSB_TO_BB(mp, 1),
-					&lptr->buf_p);
-			if (error)
-				do_error(
-	_("Cannot grab inode btree buffer, err=%d"),
-						error);
-		}
-	}
-
-	if (agi_stat) {
-		agi_stat->first_agino = first_agino;
-		agi_stat->count = count;
-		agi_stat->freecount = freecount;
-	}
-}
-
 /* rebuild the rmap tree */
 
 /*
@@ -1744,15 +1339,10 @@ phase5_func(
 	struct xfs_slab		*lost_fsb)
 {
 	struct repair_ctx	sc = { .mp = mp, };
-	struct agi_stat		agi_stat = {0,};
-	uint64_t		num_inos;
-	uint64_t		num_free_inos;
-	uint64_t		finobt_num_inos;
-	uint64_t		finobt_num_free_inos;
 	struct bt_rebuild	btr_bno;
 	struct bt_rebuild	btr_cnt;
-	bt_status_t		ino_btree_curs;
-	bt_status_t		fino_btree_curs;
+	struct bt_rebuild	btr_ino;
+	struct bt_rebuild	btr_fino;
 	bt_status_t		rmap_btree_curs;
 	bt_status_t		refcnt_btree_curs;
 	int			extra_blocks = 0;
@@ -1785,19 +1375,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 			agno);
 	}
 
-	/*
-	 * ok, now set up the btree cursors for the on-disk btrees (includes
-	 * pre-allocating all required blocks for the trees themselves)
-	 */
-	init_ino_cursor(mp, agno, &ino_btree_curs, &num_inos,
-			&num_free_inos, 0);
-
-	if (xfs_sb_version_hasfinobt(&mp->m_sb))
-		init_ino_cursor(mp, agno, &fino_btree_curs, &finobt_num_inos,
-				&finobt_num_free_inos, 1);
-
-	sb_icount_ag[agno] += num_inos;
-	sb_ifree_ag[agno] += num_free_inos;
+	init_ino_cursors(&sc, agno, num_freeblocks, &sb_icount_ag[agno],
+			&sb_ifree_ag[agno], &btr_ino, &btr_fino);
 
 	/*
 	 * Set up the btree cursors for the on-disk rmap btrees, which includes
@@ -1886,36 +1465,23 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
 			&refcnt_btree_curs, lost_fsb);
 
-	/*
-	 * build inode allocation tree.
-	 */
-	build_ino_tree(mp, agno, &ino_btree_curs, XFS_BTNUM_INO, &agi_stat);
-	write_cursor(&ino_btree_curs);
-
-	/*
-	 * build free inode tree
-	 */
-	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
-		build_ino_tree(mp, agno, &fino_btree_curs,
-				XFS_BTNUM_FINO, NULL);
-		write_cursor(&fino_btree_curs);
-	}
+	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
 
 	/* build the agi */
-	build_agi(mp, agno, &ino_btree_curs, &fino_btree_curs, &agi_stat);
+	build_agi(mp, agno, &btr_ino, &btr_fino);
 
 	/*
 	 * tear down cursors
 	 */
 	finish_rebuild(mp, &btr_bno, lost_fsb);
 	finish_rebuild(mp, &btr_cnt, lost_fsb);
-	finish_cursor(&ino_btree_curs);
+	finish_rebuild(mp, &btr_ino, lost_fsb);
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		finish_rebuild(mp, &btr_fino, lost_fsb);
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 		finish_cursor(&rmap_btree_curs);
 	if (xfs_sb_version_hasreflink(&mp->m_sb))
 		finish_cursor(&refcnt_btree_curs);
-	if (xfs_sb_version_hasfinobt(&mp->m_sb))
-		finish_cursor(&fino_btree_curs);
 
 	/*
 	 * release the incore per-AG bno/bcnt trees so the extent nodes


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 09/12] xfs_repair: rebuild reverse mapping btrees with bulk loader
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
                   ` (7 preceding siblings ...)
  2020-06-02  4:27 ` [PATCH 08/12] xfs_repair: rebuild inode " Darrick J. Wong
@ 2020-06-02  4:27 ` Darrick J. Wong
  2020-06-18 15:25   ` Brian Foster
  2020-06-02  4:27 ` [PATCH 10/12] xfs_repair: rebuild refcount " Darrick J. Wong
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:27 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the btree bulk loading functions to rebuild the reverse mapping
btrees and drop the open-coded implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_api_defs.h |    1 
 repair/agbtree.c         |   70 ++++++++
 repair/agbtree.h         |    5 +
 repair/phase5.c          |  409 ++--------------------------------------------
 4 files changed, 96 insertions(+), 389 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 5d0868c2..0026ca45 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -142,6 +142,7 @@
 #define xfs_rmapbt_calc_reserves	libxfs_rmapbt_calc_reserves
 #define xfs_rmapbt_init_cursor		libxfs_rmapbt_init_cursor
 #define xfs_rmapbt_maxrecs		libxfs_rmapbt_maxrecs
+#define xfs_rmapbt_stage_cursor		libxfs_rmapbt_stage_cursor
 #define xfs_rmap_compare		libxfs_rmap_compare
 #define xfs_rmap_get_rec		libxfs_rmap_get_rec
 #define xfs_rmap_irec_offset_pack	libxfs_rmap_irec_offset_pack
diff --git a/repair/agbtree.c b/repair/agbtree.c
index e44475fc..7b075a52 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -515,3 +515,73 @@ _("Error %d while creating finobt btree for AG %u.\n"), error, agno);
 	/* Since we're not writing the AGI yet, no need to commit the cursor */
 	libxfs_btree_del_cursor(btr_fino->cur, 0);
 }
+
+/* rebuild the rmap tree */
+
+/* Grab one rmap record. */
+static int
+get_rmapbt_record(
+	struct xfs_btree_cur		*cur,
+	void				*priv)
+{
+	struct xfs_rmap_irec		*rec;
+	struct bt_rebuild		*btr = priv;
+
+	rec = pop_slab_cursor(btr->slab_cursor);
+	memcpy(&cur->bc_rec.r, rec, sizeof(struct xfs_rmap_irec));
+	return 0;
+}
+
+/* Set up the rmap rebuild parameters. */
+void
+init_rmapbt_cursor(
+	struct repair_ctx	*sc,
+	xfs_agnumber_t		agno,
+	unsigned int		free_space,
+	struct bt_rebuild	*btr)
+{
+	int			error;
+
+	if (!xfs_sb_version_hasrmapbt(&sc->mp->m_sb))
+		return;
+
+	init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr);
+	btr->cur = libxfs_rmapbt_stage_cursor(sc->mp, &btr->newbt.afake, agno);
+
+	btr->bload.get_record = get_rmapbt_record;
+	btr->bload.claim_block = rebuild_claim_block;
+
+	/* Compute how many blocks we'll need. */
+	error = -libxfs_btree_bload_compute_geometry(btr->cur, &btr->bload,
+			rmap_record_count(sc->mp, agno));
+	if (error)
+		do_error(
+_("Unable to compute rmap btree geometry, error %d.\n"), error);
+
+	reserve_btblocks(sc->mp, agno, btr, btr->bload.nr_blocks);
+}
+
+/* Rebuild a rmap btree. */
+void
+build_rmap_tree(
+	struct repair_ctx	*sc,
+	xfs_agnumber_t		agno,
+	struct bt_rebuild	*btr)
+{
+	int			error;
+
+	error = rmap_init_cursor(agno, &btr->slab_cursor);
+	if (error)
+		do_error(
+_("Insufficient memory to construct rmap cursor.\n"));
+
+	/* Add all observed rmap records. */
+	error = -libxfs_btree_bload(btr->cur, &btr->bload, btr);
+	if (error)
+		do_error(
+_("Error %d while creating rmap btree for AG %u.\n"), error, agno);
+
+	/* Since we're not writing the AGF yet, no need to commit the cursor */
+	libxfs_btree_del_cursor(btr->cur, 0);
+	free_slab_cursor(&btr->slab_cursor);
+}
diff --git a/repair/agbtree.h b/repair/agbtree.h
index 3cad2a8e..ca6e70de 100644
--- a/repair/agbtree.h
+++ b/repair/agbtree.h
@@ -49,4 +49,9 @@ void init_ino_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
 void build_inode_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
 		struct bt_rebuild *btr_ino, struct bt_rebuild *btr_fino);
 
+void init_rmapbt_cursor(struct repair_ctx *sc, xfs_agnumber_t agno,
+		unsigned int free_space, struct bt_rebuild *btr);
+void build_rmap_tree(struct repair_ctx *sc, xfs_agnumber_t agno,
+		struct bt_rebuild *btr);
+
 #endif /* __XFS_REPAIR_AG_BTREE_H__ */
diff --git a/repair/phase5.c b/repair/phase5.c
index e570349d..1c6448f4 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -417,377 +417,6 @@ build_agi(
 	libxfs_buf_relse(agi_buf);
 }
 
-/* rebuild the rmap tree */
-
-/*
- * we don't have to worry here about how chewing up free extents
- * may perturb things because rmap tree building happens before
- * freespace tree building.
- */
-static void
-init_rmapbt_cursor(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
-	struct bt_status	*btree_curs)
-{
-	size_t			num_recs;
-	int			level;
-	struct bt_stat_level	*lptr;
-	struct bt_stat_level	*p_lptr;
-	xfs_extlen_t		blocks_allocated;
-	int			maxrecs;
-
-	if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) {
-		memset(btree_curs, 0, sizeof(struct bt_status));
-		return;
-	}
-
-	lptr = &btree_curs->level[0];
-	btree_curs->init = 1;
-	btree_curs->owner = XFS_RMAP_OWN_AG;
-
-	/*
-	 * build up statistics
-	 */
-	num_recs = rmap_record_count(mp, agno);
-	if (num_recs == 0) {
-		/*
-		 * easy corner-case -- no rmap records
-		 */
-		lptr->num_blocks = 1;
-		lptr->modulo = 0;
-		lptr->num_recs_pb = 0;
-		lptr->num_recs_tot = 0;
-
-		btree_curs->num_levels = 1;
-		btree_curs->num_tot_blocks = btree_curs->num_free_blocks = 1;
-
-		setup_cursor(mp, agno, btree_curs);
-
-		return;
-	}
-
-	/*
-	 * Leave enough slack in the rmapbt that we can insert the
-	 * metadata AG entries without too many splits.
-	 */
-	maxrecs = mp->m_rmap_mxr[0];
-	if (num_recs > maxrecs)
-		maxrecs -= 10;
-	blocks_allocated = lptr->num_blocks = howmany(num_recs, maxrecs);
-
-	lptr->modulo = num_recs % lptr->num_blocks;
-	lptr->num_recs_pb = num_recs / lptr->num_blocks;
-	lptr->num_recs_tot = num_recs;
-	level = 1;
-
-	if (lptr->num_blocks > 1)  {
-		for (; btree_curs->level[level-1].num_blocks > 1
-				&& level < XFS_BTREE_MAXLEVELS;
-				level++)  {
-			lptr = &btree_curs->level[level];
-			p_lptr = &btree_curs->level[level - 1];
-			lptr->num_blocks = howmany(p_lptr->num_blocks,
-				mp->m_rmap_mxr[1]);
-			lptr->modulo = p_lptr->num_blocks % lptr->num_blocks;
-			lptr->num_recs_pb = p_lptr->num_blocks
-					/ lptr->num_blocks;
-			lptr->num_recs_tot = p_lptr->num_blocks;
-
-			blocks_allocated += lptr->num_blocks;
-		}
-	}
-	ASSERT(lptr->num_blocks == 1);
-	btree_curs->num_levels = level;
-
-	btree_curs->num_tot_blocks = btree_curs->num_free_blocks
-			= blocks_allocated;
-
-	setup_cursor(mp, agno, btree_curs);
-}
-
-static void
-prop_rmap_cursor(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
-	struct bt_status	*btree_curs,
-	struct xfs_rmap_irec	*rm_rec,
-	int			level)
-{
-	struct xfs_btree_block	*bt_hdr;
-	struct xfs_rmap_key	*bt_key;
-	xfs_rmap_ptr_t		*bt_ptr;
-	xfs_agblock_t		agbno;
-	struct bt_stat_level	*lptr;
-	const struct xfs_buf_ops *ops = btnum_to_ops(XFS_BTNUM_RMAP);
-	int			error;
-
-	level++;
-
-	if (level >= btree_curs->num_levels)
-		return;
-
-	lptr = &btree_curs->level[level];
-	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-
-	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
-		/*
-		 * this only happens once to initialize the
-		 * first path up the left side of the tree
-		 * where the agbno's are already set up
-		 */
-		prop_rmap_cursor(mp, agno, btree_curs, rm_rec, level);
-	}
-
-	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
-				lptr->num_recs_pb + (lptr->modulo > 0))  {
-		/*
-		 * write out current prev block, grab us a new block,
-		 * and set the rightsib pointer of current block
-		 */
-#ifdef XR_BLD_INO_TRACE
-		fprintf(stderr, " rmap prop agbno %d ", lptr->prev_agbno);
-#endif
-		if (lptr->prev_agbno != NULLAGBLOCK)  {
-			ASSERT(lptr->prev_buf_p != NULL);
-			libxfs_buf_mark_dirty(lptr->prev_buf_p);
-			libxfs_buf_relse(lptr->prev_buf_p);
-		}
-		lptr->prev_agbno = lptr->agbno;
-		lptr->prev_buf_p = lptr->buf_p;
-		agbno = get_next_blockaddr(agno, level, btree_curs);
-
-		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
-
-		error = -libxfs_buf_get(mp->m_dev,
-				XFS_AGB_TO_DADDR(mp, agno, agbno),
-				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
-		if (error)
-			do_error(_("Cannot grab rmapbt buffer, err=%d"),
-					error);
-		lptr->agbno = agbno;
-
-		if (lptr->modulo)
-			lptr->modulo--;
-
-		/*
-		 * initialize block header
-		 */
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, XFS_BTNUM_RMAP,
-					level, 0, agno);
-
-		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
-
-		/*
-		 * propagate extent record for first extent in new block up
-		 */
-		prop_rmap_cursor(mp, agno, btree_curs, rm_rec, level);
-	}
-	/*
-	 * add rmap info to current block
-	 */
-	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
-
-	bt_key = XFS_RMAP_KEY_ADDR(bt_hdr,
-				    be16_to_cpu(bt_hdr->bb_numrecs));
-	bt_ptr = XFS_RMAP_PTR_ADDR(bt_hdr,
-				    be16_to_cpu(bt_hdr->bb_numrecs),
-				    mp->m_rmap_mxr[1]);
-
-	bt_key->rm_startblock = cpu_to_be32(rm_rec->rm_startblock);
-	bt_key->rm_owner = cpu_to_be64(rm_rec->rm_owner);
-	bt_key->rm_offset = cpu_to_be64(rm_rec->rm_offset);
-
-	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
-}
-
-static void
-prop_rmap_highkey(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
-	struct bt_status	*btree_curs,
-	struct xfs_rmap_irec	*rm_highkey)
-{
-	struct xfs_btree_block	*bt_hdr;
-	struct xfs_rmap_key	*bt_key;
-	struct bt_stat_level	*lptr;
-	struct xfs_rmap_irec	key = {0};
-	struct xfs_rmap_irec	high_key;
-	int			level;
-	int			i;
-	int			numrecs;
-
-	high_key = *rm_highkey;
-	for (level = 1; level < btree_curs->num_levels; level++) {
-		lptr = &btree_curs->level[level];
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		numrecs = be16_to_cpu(bt_hdr->bb_numrecs);
-		bt_key = XFS_RMAP_HIGH_KEY_ADDR(bt_hdr, numrecs);
-
-		bt_key->rm_startblock = cpu_to_be32(high_key.rm_startblock);
-		bt_key->rm_owner = cpu_to_be64(high_key.rm_owner);
-		bt_key->rm_offset = cpu_to_be64(
-				libxfs_rmap_irec_offset_pack(&high_key));
-
-		for (i = 1; i <= numrecs; i++) {
-			bt_key = XFS_RMAP_HIGH_KEY_ADDR(bt_hdr, i);
-			key.rm_startblock = be32_to_cpu(bt_key->rm_startblock);
-			key.rm_owner = be64_to_cpu(bt_key->rm_owner);
-			key.rm_offset = be64_to_cpu(bt_key->rm_offset);
-			if (rmap_diffkeys(&key, &high_key) > 0)
-				high_key = key;
-		}
-	}
-}
-
-/*
- * rebuilds a rmap btree given a cursor.
- */
-static void
-build_rmap_tree(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
-	struct bt_status	*btree_curs)
-{
-	xfs_agnumber_t		i;
-	xfs_agblock_t		j;
-	xfs_agblock_t		agbno;
-	struct xfs_btree_block	*bt_hdr;
-	struct xfs_rmap_irec	*rm_rec;
-	struct xfs_slab_cursor	*rmap_cur;
-	struct xfs_rmap_rec	*bt_rec;
-	struct xfs_rmap_irec	highest_key = {0};
-	struct xfs_rmap_irec	hi_key = {0};
-	struct bt_stat_level	*lptr;
-	const struct xfs_buf_ops *ops = btnum_to_ops(XFS_BTNUM_RMAP);
-	int			numrecs;
-	int			level = btree_curs->num_levels;
-	int			error;
-
-	highest_key.rm_flags = 0;
-	for (i = 0; i < level; i++)  {
-		lptr = &btree_curs->level[i];
-
-		agbno = get_next_blockaddr(agno, i, btree_curs);
-		error = -libxfs_buf_get(mp->m_dev,
-				XFS_AGB_TO_DADDR(mp, agno, agbno),
-				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
-		if (error)
-			do_error(_("Cannot grab rmapbt buffer, err=%d"),
-					error);
-
-		if (i == btree_curs->num_levels - 1)
-			btree_curs->root = agbno;
-
-		lptr->agbno = agbno;
-		lptr->prev_agbno = NULLAGBLOCK;
-		lptr->prev_buf_p = NULL;
-		/*
-		 * initialize block header
-		 */
-
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, XFS_BTNUM_RMAP,
-					i, 0, agno);
-	}
-
-	/*
-	 * run along leaf, setting up records.  as we have to switch
-	 * blocks, call the prop_rmap_cursor routine to set up the new
-	 * pointers for the parent.  that can recurse up to the root
-	 * if required.  set the sibling pointers for leaf level here.
-	 */
-	error = rmap_init_cursor(agno, &rmap_cur);
-	if (error)
-		do_error(
-_("Insufficient memory to construct reverse-map cursor."));
-	rm_rec = pop_slab_cursor(rmap_cur);
-	lptr = &btree_curs->level[0];
-
-	for (i = 0; i < lptr->num_blocks; i++)  {
-		numrecs = lptr->num_recs_pb + (lptr->modulo > 0);
-		ASSERT(rm_rec != NULL || numrecs == 0);
-
-		/*
-		 * block initialization, lay in block header
-		 */
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, XFS_BTNUM_RMAP,
-					0, 0, agno);
-
-		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
-		bt_hdr->bb_numrecs = cpu_to_be16(numrecs);
-
-		if (lptr->modulo > 0)
-			lptr->modulo--;
-
-		if (lptr->num_recs_pb > 0) {
-			ASSERT(rm_rec != NULL);
-			prop_rmap_cursor(mp, agno, btree_curs, rm_rec, 0);
-		}
-
-		bt_rec = (struct xfs_rmap_rec *)
-			  ((char *)bt_hdr + XFS_RMAP_BLOCK_LEN);
-		highest_key.rm_startblock = 0;
-		highest_key.rm_owner = 0;
-		highest_key.rm_offset = 0;
-		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
-			ASSERT(rm_rec != NULL);
-			bt_rec[j].rm_startblock =
-					cpu_to_be32(rm_rec->rm_startblock);
-			bt_rec[j].rm_blockcount =
-					cpu_to_be32(rm_rec->rm_blockcount);
-			bt_rec[j].rm_owner = cpu_to_be64(rm_rec->rm_owner);
-			bt_rec[j].rm_offset = cpu_to_be64(
-					libxfs_rmap_irec_offset_pack(rm_rec));
-			rmap_high_key_from_rec(rm_rec, &hi_key);
-			if (rmap_diffkeys(&hi_key, &highest_key) > 0)
-				highest_key = hi_key;
-
-			rm_rec = pop_slab_cursor(rmap_cur);
-		}
-
-		/* Now go set the parent key */
-		prop_rmap_highkey(mp, agno, btree_curs, &highest_key);
-
-		if (rm_rec != NULL)  {
-			/*
-			 * get next leaf level block
-			 */
-			if (lptr->prev_buf_p != NULL)  {
-#ifdef XR_BLD_RL_TRACE
-				fprintf(stderr, "writing rmapbt agbno %u\n",
-					lptr->prev_agbno);
-#endif
-				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
-				libxfs_buf_mark_dirty(lptr->prev_buf_p);
-				libxfs_buf_relse(lptr->prev_buf_p);
-			}
-			lptr->prev_buf_p = lptr->buf_p;
-			lptr->prev_agbno = lptr->agbno;
-			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
-			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
-
-			error = -libxfs_buf_get(mp->m_dev,
-					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
-					XFS_FSB_TO_BB(mp, 1),
-					&lptr->buf_p);
-			if (error)
-				do_error(
-	_("Cannot grab rmapbt buffer, err=%d"),
-						error);
-		}
-	}
-	free_slab_cursor(&rmap_cur);
-}
-
 /* rebuild the refcount tree */
 
 /*
@@ -1129,7 +758,7 @@ build_agf_agfl(
 	xfs_agnumber_t		agno,
 	struct bt_rebuild	*btr_bno,
 	struct bt_rebuild	*btr_cnt,
-	struct bt_status	*rmap_bt,
+	struct bt_rebuild	*btr_rmap,
 	struct bt_status	*refcnt_bt,
 	struct xfs_slab		*lost_fsb)
 {
@@ -1177,11 +806,17 @@ build_agf_agfl(
 			cpu_to_be32(btr_cnt->newbt.afake.af_root);
 	agf->agf_levels[XFS_BTNUM_CNT] =
 			cpu_to_be32(btr_cnt->newbt.afake.af_levels);
-	agf->agf_roots[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->root);
-	agf->agf_levels[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->num_levels);
 	agf->agf_freeblks = cpu_to_be32(btr_bno->freeblks);
-	agf->agf_rmap_blocks = cpu_to_be32(rmap_bt->num_tot_blocks -
-			rmap_bt->num_free_blocks);
+
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		agf->agf_roots[XFS_BTNUM_RMAP] =
+				cpu_to_be32(btr_rmap->newbt.afake.af_root);
+		agf->agf_levels[XFS_BTNUM_RMAP] =
+				cpu_to_be32(btr_rmap->newbt.afake.af_levels);
+		agf->agf_rmap_blocks =
+				cpu_to_be32(btr_rmap->newbt.afake.af_blocks);
+	}
+
 	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
 	agf->agf_refcount_level = cpu_to_be32(refcnt_bt->num_levels);
 	agf->agf_refcount_blocks = cpu_to_be32(refcnt_bt->num_tot_blocks -
@@ -1199,7 +834,7 @@ build_agf_agfl(
 		blks = btr_bno->newbt.afake.af_blocks +
 			btr_cnt->newbt.afake.af_blocks - 2;
 		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
-			blks += rmap_bt->num_tot_blocks - rmap_bt->num_free_blocks - 1;
+			blks += btr_rmap->newbt.afake.af_blocks - 1;
 		agf->agf_btreeblks = cpu_to_be32(blks);
 #ifdef XR_BLD_FREE_TRACE
 		fprintf(stderr, "agf->agf_btreeblks = %u\n",
@@ -1244,6 +879,8 @@ build_agf_agfl(
 	freelist = xfs_buf_to_agfl_bno(agfl_buf);
 	fill_agfl(btr_bno, freelist, &agfl_idx);
 	fill_agfl(btr_cnt, freelist, &agfl_idx);
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		fill_agfl(btr_rmap, freelist, &agfl_idx);
 
 	/* Set the AGF counters for the AGFL. */
 	if (agfl_idx > 0) {
@@ -1343,7 +980,7 @@ phase5_func(
 	struct bt_rebuild	btr_cnt;
 	struct bt_rebuild	btr_ino;
 	struct bt_rebuild	btr_fino;
-	bt_status_t		rmap_btree_curs;
+	struct bt_rebuild	btr_rmap;
 	bt_status_t		refcnt_btree_curs;
 	int			extra_blocks = 0;
 	uint			num_freeblocks;
@@ -1378,11 +1015,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	init_ino_cursors(&sc, agno, num_freeblocks, &sb_icount_ag[agno],
 			&sb_ifree_ag[agno], &btr_ino, &btr_fino);
 
-	/*
-	 * Set up the btree cursors for the on-disk rmap btrees, which includes
-	 * pre-allocating all required blocks.
-	 */
-	init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
+	init_rmapbt_cursor(&sc, agno, num_freeblocks, &btr_rmap);
 
 	/*
 	 * Set up the btree cursors for the on-disk refcount btrees,
@@ -1448,10 +1081,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	ASSERT(btr_bno.freeblks == btr_cnt.freeblks);
 
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
-		build_rmap_tree(mp, agno, &rmap_btree_curs);
-		write_cursor(&rmap_btree_curs);
-		sb_fdblocks_ag[agno] += (rmap_btree_curs.num_tot_blocks -
-				rmap_btree_curs.num_free_blocks) - 1;
+		build_rmap_tree(&sc, agno, &btr_rmap);
+		sb_fdblocks_ag[agno] += btr_rmap.newbt.afake.af_blocks - 1;
 	}
 
 	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
@@ -1462,7 +1093,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	/*
 	 * set up agf and agfl
 	 */
-	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
+	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap,
 			&refcnt_btree_curs, lost_fsb);
 
 	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
@@ -1479,7 +1110,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	if (xfs_sb_version_hasfinobt(&mp->m_sb))
 		finish_rebuild(mp, &btr_fino, lost_fsb);
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
-		finish_cursor(&rmap_btree_curs);
+		finish_rebuild(mp, &btr_rmap, lost_fsb);
 	if (xfs_sb_version_hasreflink(&mp->m_sb))
 		finish_cursor(&refcnt_btree_curs);
 



* [PATCH 10/12] xfs_repair: rebuild refcount btrees with bulk loader
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
                   ` (8 preceding siblings ...)
  2020-06-02  4:27 ` [PATCH 09/12] xfs_repair: rebuild reverse mapping " Darrick J. Wong
@ 2020-06-02  4:27 ` Darrick J. Wong
  2020-06-18 15:26   ` Brian Foster
  2020-06-02  4:28 ` [PATCH 11/12] xfs_repair: remove old btree rebuild support code Darrick J. Wong
  2020-06-02  4:28 ` [PATCH 12/12] xfs_repair: use bitmap to track blocks lost during btree construction Darrick J. Wong
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:27 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the btree bulk loading functions to rebuild the refcount btrees
and drop the open-coded implementation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libxfs/libxfs_api_defs.h |    1 
 repair/agbtree.c         |   71 ++++++++++
 repair/agbtree.h         |    5 +
 repair/phase5.c          |  341 ++--------------------------------------------
 4 files changed, 93 insertions(+), 325 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 0026ca45..1a7cdbf9 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -135,6 +135,7 @@
 #define xfs_refcountbt_calc_reserves	libxfs_refcountbt_calc_reserves
 #define xfs_refcountbt_init_cursor	libxfs_refcountbt_init_cursor
 #define xfs_refcountbt_maxrecs		libxfs_refcountbt_maxrecs
+#define xfs_refcountbt_stage_cursor	libxfs_refcountbt_stage_cursor
 #define xfs_refcount_get_rec		libxfs_refcount_get_rec
 #define xfs_refcount_lookup_le		libxfs_refcount_lookup_le
 
diff --git a/repair/agbtree.c b/repair/agbtree.c
index 7b075a52..d3639fe4 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -585,3 +585,74 @@ _("Error %d while creating rmap btree for AG %u.\n"), error, agno);
 	libxfs_btree_del_cursor(btr->cur, 0);
 	free_slab_cursor(&btr->slab_cursor);
 }
+
+/* rebuild the refcount tree */
+
+/* Grab one refcount record. */
+static int
+get_refcountbt_record(
+	struct xfs_btree_cur		*cur,
+	void				*priv)
+{
+	struct xfs_refcount_irec	*rec;
+	struct bt_rebuild		*btr = priv;
+
+	rec = pop_slab_cursor(btr->slab_cursor);
+	memcpy(&cur->bc_rec.rc, rec, sizeof(struct xfs_refcount_irec));
+	return 0;
+}
+
+/* Set up the refcount rebuild parameters. */
+void
+init_refc_cursor(
+	struct repair_ctx	*sc,
+	xfs_agnumber_t		agno,
+	unsigned int		free_space,
+	struct bt_rebuild	*btr)
+{
+	int			error;
+
+	if (!xfs_sb_version_hasreflink(&sc->mp->m_sb))
+		return;
+
+	init_rebuild(sc, &XFS_RMAP_OINFO_REFC, free_space, btr);
+	btr->cur = libxfs_refcountbt_stage_cursor(sc->mp, &btr->newbt.afake,
+			agno);
+
+	btr->bload.get_record = get_refcountbt_record;
+	btr->bload.claim_block = rebuild_claim_block;
+
+	/* Compute how many blocks we'll need. */
+	error = -libxfs_btree_bload_compute_geometry(btr->cur, &btr->bload,
+			refcount_record_count(sc->mp, agno));
+	if (error)
+		do_error(
+_("Unable to compute refcount btree geometry, error %d.\n"), error);
+
+	reserve_btblocks(sc->mp, agno, btr, btr->bload.nr_blocks);
+}
+
+/* Rebuild a refcount btree. */
+void
+build_refcount_tree(
+	struct repair_ctx	*sc,
+	xfs_agnumber_t		agno,
+	struct bt_rebuild	*btr)
+{
+	int			error;
+
+	error = init_refcount_cursor(agno, &btr->slab_cursor);
+	if (error)
+		do_error(
+_("Insufficient memory to construct refcount cursor.\n"));
+
+	/* Add all observed refcount records. */
+	error = -libxfs_btree_bload(btr->cur, &btr->bload, btr);
+	if (error)
+		do_error(
+_("Error %d while creating refcount btree for AG %u.\n"), error, agno);
+
+	/* Since we're not writing the AGF yet, no need to commit the cursor */
+	libxfs_btree_del_cursor(btr->cur, 0);
+	free_slab_cursor(&btr->slab_cursor);
+}
diff --git a/repair/agbtree.h b/repair/agbtree.h
index ca6e70de..6bbeb022 100644
--- a/repair/agbtree.h
+++ b/repair/agbtree.h
@@ -54,4 +54,9 @@ void init_rmapbt_cursor(struct repair_ctx *sc, xfs_agnumber_t agno,
 void build_rmap_tree(struct repair_ctx *sc, xfs_agnumber_t agno,
 		struct bt_rebuild *btr);
 
+void init_refc_cursor(struct repair_ctx *sc, xfs_agnumber_t agno,
+		unsigned int free_space, struct bt_rebuild *btr);
+void build_refcount_tree(struct repair_ctx *sc, xfs_agnumber_t agno,
+		struct bt_rebuild *btr);
+
 #endif /* __XFS_REPAIR_AG_BTREE_H__ */
diff --git a/repair/phase5.c b/repair/phase5.c
index 1c6448f4..ad009416 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -417,313 +417,6 @@ build_agi(
 	libxfs_buf_relse(agi_buf);
 }
 
-/* rebuild the refcount tree */
-
-/*
- * we don't have to worry here about how chewing up free extents
- * may perturb things because reflink tree building happens before
- * freespace tree building.
- */
-static void
-init_refc_cursor(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
-	struct bt_status	*btree_curs)
-{
-	size_t			num_recs;
-	int			level;
-	struct bt_stat_level	*lptr;
-	struct bt_stat_level	*p_lptr;
-	xfs_extlen_t		blocks_allocated;
-
-	if (!xfs_sb_version_hasreflink(&mp->m_sb)) {
-		memset(btree_curs, 0, sizeof(struct bt_status));
-		return;
-	}
-
-	lptr = &btree_curs->level[0];
-	btree_curs->init = 1;
-	btree_curs->owner = XFS_RMAP_OWN_REFC;
-
-	/*
-	 * build up statistics
-	 */
-	num_recs = refcount_record_count(mp, agno);
-	if (num_recs == 0) {
-		/*
-		 * easy corner-case -- no refcount records
-		 */
-		lptr->num_blocks = 1;
-		lptr->modulo = 0;
-		lptr->num_recs_pb = 0;
-		lptr->num_recs_tot = 0;
-
-		btree_curs->num_levels = 1;
-		btree_curs->num_tot_blocks = btree_curs->num_free_blocks = 1;
-
-		setup_cursor(mp, agno, btree_curs);
-
-		return;
-	}
-
-	blocks_allocated = lptr->num_blocks = howmany(num_recs,
-					mp->m_refc_mxr[0]);
-
-	lptr->modulo = num_recs % lptr->num_blocks;
-	lptr->num_recs_pb = num_recs / lptr->num_blocks;
-	lptr->num_recs_tot = num_recs;
-	level = 1;
-
-	if (lptr->num_blocks > 1)  {
-		for (; btree_curs->level[level-1].num_blocks > 1
-				&& level < XFS_BTREE_MAXLEVELS;
-				level++)  {
-			lptr = &btree_curs->level[level];
-			p_lptr = &btree_curs->level[level - 1];
-			lptr->num_blocks = howmany(p_lptr->num_blocks,
-					mp->m_refc_mxr[1]);
-			lptr->modulo = p_lptr->num_blocks % lptr->num_blocks;
-			lptr->num_recs_pb = p_lptr->num_blocks
-					/ lptr->num_blocks;
-			lptr->num_recs_tot = p_lptr->num_blocks;
-
-			blocks_allocated += lptr->num_blocks;
-		}
-	}
-	ASSERT(lptr->num_blocks == 1);
-	btree_curs->num_levels = level;
-
-	btree_curs->num_tot_blocks = btree_curs->num_free_blocks
-			= blocks_allocated;
-
-	setup_cursor(mp, agno, btree_curs);
-}
-
-static void
-prop_refc_cursor(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
-	struct bt_status	*btree_curs,
-	xfs_agblock_t		startbno,
-	int			level)
-{
-	struct xfs_btree_block	*bt_hdr;
-	struct xfs_refcount_key	*bt_key;
-	xfs_refcount_ptr_t	*bt_ptr;
-	xfs_agblock_t		agbno;
-	struct bt_stat_level	*lptr;
-	const struct xfs_buf_ops *ops = btnum_to_ops(XFS_BTNUM_REFC);
-	int			error;
-
-	level++;
-
-	if (level >= btree_curs->num_levels)
-		return;
-
-	lptr = &btree_curs->level[level];
-	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-
-	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
-		/*
-		 * this only happens once to initialize the
-		 * first path up the left side of the tree
-		 * where the agbno's are already set up
-		 */
-		prop_refc_cursor(mp, agno, btree_curs, startbno, level);
-	}
-
-	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
-				lptr->num_recs_pb + (lptr->modulo > 0))  {
-		/*
-		 * write out current prev block, grab us a new block,
-		 * and set the rightsib pointer of current block
-		 */
-#ifdef XR_BLD_INO_TRACE
-		fprintf(stderr, " ino prop agbno %d ", lptr->prev_agbno);
-#endif
-		if (lptr->prev_agbno != NULLAGBLOCK)  {
-			ASSERT(lptr->prev_buf_p != NULL);
-			libxfs_buf_mark_dirty(lptr->prev_buf_p);
-			libxfs_buf_relse(lptr->prev_buf_p);
-		}
-		lptr->prev_agbno = lptr->agbno;
-		lptr->prev_buf_p = lptr->buf_p;
-		agbno = get_next_blockaddr(agno, level, btree_curs);
-
-		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
-
-		error = -libxfs_buf_get(mp->m_dev,
-				XFS_AGB_TO_DADDR(mp, agno, agbno),
-				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
-		if (error)
-			do_error(_("Cannot grab refcountbt buffer, err=%d"),
-					error);
-		lptr->agbno = agbno;
-
-		if (lptr->modulo)
-			lptr->modulo--;
-
-		/*
-		 * initialize block header
-		 */
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, XFS_BTNUM_REFC,
-					level, 0, agno);
-
-		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
-
-		/*
-		 * propagate extent record for first extent in new block up
-		 */
-		prop_refc_cursor(mp, agno, btree_curs, startbno, level);
-	}
-	/*
-	 * add inode info to current block
-	 */
-	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
-
-	bt_key = XFS_REFCOUNT_KEY_ADDR(bt_hdr,
-				    be16_to_cpu(bt_hdr->bb_numrecs));
-	bt_ptr = XFS_REFCOUNT_PTR_ADDR(bt_hdr,
-				    be16_to_cpu(bt_hdr->bb_numrecs),
-				    mp->m_refc_mxr[1]);
-
-	bt_key->rc_startblock = cpu_to_be32(startbno);
-	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
-}
-
-/*
- * rebuilds a refcount btree given a cursor.
- */
-static void
-build_refcount_tree(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
-	struct bt_status	*btree_curs)
-{
-	xfs_agnumber_t		i;
-	xfs_agblock_t		j;
-	xfs_agblock_t		agbno;
-	struct xfs_btree_block	*bt_hdr;
-	struct xfs_refcount_irec	*refc_rec;
-	struct xfs_slab_cursor	*refc_cur;
-	struct xfs_refcount_rec	*bt_rec;
-	struct bt_stat_level	*lptr;
-	const struct xfs_buf_ops *ops = btnum_to_ops(XFS_BTNUM_REFC);
-	int			numrecs;
-	int			level = btree_curs->num_levels;
-	int			error;
-
-	for (i = 0; i < level; i++)  {
-		lptr = &btree_curs->level[i];
-
-		agbno = get_next_blockaddr(agno, i, btree_curs);
-		error = -libxfs_buf_get(mp->m_dev,
-				XFS_AGB_TO_DADDR(mp, agno, agbno),
-				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
-		if (error)
-			do_error(_("Cannot grab refcountbt buffer, err=%d"),
-					error);
-
-		if (i == btree_curs->num_levels - 1)
-			btree_curs->root = agbno;
-
-		lptr->agbno = agbno;
-		lptr->prev_agbno = NULLAGBLOCK;
-		lptr->prev_buf_p = NULL;
-		/*
-		 * initialize block header
-		 */
-
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, XFS_BTNUM_REFC,
-					i, 0, agno);
-	}
-
-	/*
-	 * run along leaf, setting up records.  as we have to switch
-	 * blocks, call the prop_refc_cursor routine to set up the new
-	 * pointers for the parent.  that can recurse up to the root
-	 * if required.  set the sibling pointers for leaf level here.
-	 */
-	error = init_refcount_cursor(agno, &refc_cur);
-	if (error)
-		do_error(
-_("Insufficient memory to construct refcount cursor."));
-	refc_rec = pop_slab_cursor(refc_cur);
-	lptr = &btree_curs->level[0];
-
-	for (i = 0; i < lptr->num_blocks; i++)  {
-		numrecs = lptr->num_recs_pb + (lptr->modulo > 0);
-		ASSERT(refc_rec != NULL || numrecs == 0);
-
-		/*
-		 * block initialization, lay in block header
-		 */
-		lptr->buf_p->b_ops = ops;
-		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
-		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
-		libxfs_btree_init_block(mp, lptr->buf_p, XFS_BTNUM_REFC,
-					0, 0, agno);
-
-		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
-		bt_hdr->bb_numrecs = cpu_to_be16(numrecs);
-
-		if (lptr->modulo > 0)
-			lptr->modulo--;
-
-		if (lptr->num_recs_pb > 0)
-			prop_refc_cursor(mp, agno, btree_curs,
-					refc_rec->rc_startblock, 0);
-
-		bt_rec = (struct xfs_refcount_rec *)
-			  ((char *)bt_hdr + XFS_REFCOUNT_BLOCK_LEN);
-		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
-			ASSERT(refc_rec != NULL);
-			bt_rec[j].rc_startblock =
-					cpu_to_be32(refc_rec->rc_startblock);
-			bt_rec[j].rc_blockcount =
-					cpu_to_be32(refc_rec->rc_blockcount);
-			bt_rec[j].rc_refcount = cpu_to_be32(refc_rec->rc_refcount);
-
-			refc_rec = pop_slab_cursor(refc_cur);
-		}
-
-		if (refc_rec != NULL)  {
-			/*
-			 * get next leaf level block
-			 */
-			if (lptr->prev_buf_p != NULL)  {
-#ifdef XR_BLD_RL_TRACE
-				fprintf(stderr, "writing refcntbt agbno %u\n",
-					lptr->prev_agbno);
-#endif
-				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
-				libxfs_buf_mark_dirty(lptr->prev_buf_p);
-				libxfs_buf_relse(lptr->prev_buf_p);
-			}
-			lptr->prev_buf_p = lptr->buf_p;
-			lptr->prev_agbno = lptr->agbno;
-			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
-			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
-
-			error = -libxfs_buf_get(mp->m_dev,
-					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
-					XFS_FSB_TO_BB(mp, 1),
-					&lptr->buf_p);
-			if (error)
-				do_error(
-	_("Cannot grab refcountbt buffer, err=%d"),
-						error);
-		}
-	}
-	free_slab_cursor(&refc_cur);
-}
-
 /* Fill the AGFL with any leftover bnobt rebuilder blocks. */
 static void
 fill_agfl(
@@ -759,7 +452,7 @@ build_agf_agfl(
 	struct bt_rebuild	*btr_bno,
 	struct bt_rebuild	*btr_cnt,
 	struct bt_rebuild	*btr_rmap,
-	struct bt_status	*refcnt_bt,
+	struct bt_rebuild	*btr_refc,
 	struct xfs_slab		*lost_fsb)
 {
 	struct extent_tree_node	*ext_ptr;
@@ -817,10 +510,14 @@ build_agf_agfl(
 				cpu_to_be32(btr_rmap->newbt.afake.af_blocks);
 	}
 
-	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
-	agf->agf_refcount_level = cpu_to_be32(refcnt_bt->num_levels);
-	agf->agf_refcount_blocks = cpu_to_be32(refcnt_bt->num_tot_blocks -
-			refcnt_bt->num_free_blocks);
+	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+		agf->agf_refcount_root =
+				cpu_to_be32(btr_refc->newbt.afake.af_root);
+		agf->agf_refcount_level =
+				cpu_to_be32(btr_refc->newbt.afake.af_levels);
+		agf->agf_refcount_blocks =
+				cpu_to_be32(btr_refc->newbt.afake.af_blocks);
+	}
 
 	/*
 	 * Count and record the number of btree blocks consumed if required.
@@ -981,7 +678,7 @@ phase5_func(
 	struct bt_rebuild	btr_ino;
 	struct bt_rebuild	btr_fino;
 	struct bt_rebuild	btr_rmap;
-	bt_status_t		refcnt_btree_curs;
+	struct bt_rebuild	btr_refc;
 	int			extra_blocks = 0;
 	uint			num_freeblocks;
 	xfs_agblock_t		num_extents;
@@ -1017,11 +714,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 
 	init_rmapbt_cursor(&sc, agno, num_freeblocks, &btr_rmap);
 
-	/*
-	 * Set up the btree cursors for the on-disk refcount btrees,
-	 * which includes pre-allocating all required blocks.
-	 */
-	init_refc_cursor(mp, agno, &refcnt_btree_curs);
+	init_refc_cursor(&sc, agno, num_freeblocks, &btr_refc);
 
 	num_extents = count_bno_extents_blocks(agno, &num_freeblocks);
 	/*
@@ -1085,16 +778,14 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 		sb_fdblocks_ag[agno] += btr_rmap.newbt.afake.af_blocks - 1;
 	}
 
-	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
-		build_refcount_tree(mp, agno, &refcnt_btree_curs);
-		write_cursor(&refcnt_btree_curs);
-	}
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		build_refcount_tree(&sc, agno, &btr_refc);
 
 	/*
 	 * set up agf and agfl
 	 */
-	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap,
-			&refcnt_btree_curs, lost_fsb);
+	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap, &btr_refc,
+			lost_fsb);
 
 	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
 
@@ -1112,7 +803,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 		finish_rebuild(mp, &btr_rmap, lost_fsb);
 	if (xfs_sb_version_hasreflink(&mp->m_sb))
-		finish_cursor(&refcnt_btree_curs);
+		finish_rebuild(mp, &btr_refc, lost_fsb);
 
 	/*
 	 * release the incore per-AG bno/bcnt trees so the extent nodes


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 11/12] xfs_repair: remove old btree rebuild support code
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
                   ` (9 preceding siblings ...)
  2020-06-02  4:27 ` [PATCH 10/12] xfs_repair: rebuild refcount " Darrick J. Wong
@ 2020-06-02  4:28 ` Darrick J. Wong
  2020-06-19 11:10   ` Brian Foster
  2020-06-02  4:28 ` [PATCH 12/12] xfs_repair: use bitmap to track blocks lost during btree construction Darrick J. Wong
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:28 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Now that all of the per-AG btrees are rebuilt with the generic bulk
loader, the old cursor-based rebuilding support code isn't needed
anymore, so get rid of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase5.c |  242 -------------------------------------------------------
 1 file changed, 242 deletions(-)


diff --git a/repair/phase5.c b/repair/phase5.c
index ad009416..439c1065 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -21,52 +21,6 @@
 #include "bulkload.h"
 #include "agbtree.h"
 
-/*
- * we maintain the current slice (path from root to leaf)
- * of the btree incore.  when we need a new block, we ask
- * the block allocator for the address of a block on that
- * level, map the block in, and set up the appropriate
- * pointers (child, silbing, etc.) and keys that should
- * point to the new block.
- */
-typedef struct bt_stat_level  {
-	/*
-	 * set in setup_cursor routine and maintained in the tree-building
-	 * routines
-	 */
-	xfs_buf_t		*buf_p;		/* 2 buffer pointers to ... */
-	xfs_buf_t		*prev_buf_p;
-	xfs_agblock_t		agbno;		/* current block being filled */
-	xfs_agblock_t		prev_agbno;	/* previous block */
-	/*
-	 * set in calculate/init cursor routines for each btree level
-	 */
-	int			num_recs_tot;	/* # tree recs in level */
-	int			num_blocks;	/* # tree blocks in level */
-	int			num_recs_pb;	/* num_recs_tot / num_blocks */
-	int			modulo;		/* num_recs_tot % num_blocks */
-} bt_stat_level_t;
-
-typedef struct bt_status  {
-	int			init;		/* cursor set up once? */
-	int			num_levels;	/* # of levels in btree */
-	xfs_extlen_t		num_tot_blocks;	/* # blocks alloc'ed for tree */
-	xfs_extlen_t		num_free_blocks;/* # blocks currently unused */
-
-	xfs_agblock_t		root;		/* root block */
-	/*
-	 * list of blocks to be used to set up this tree
-	 * and pointer to the first unused block on the list
-	 */
-	xfs_agblock_t		*btree_blocks;		/* block list */
-	xfs_agblock_t		*free_btree_blocks;	/* first unused block */
-	/*
-	 * per-level status info
-	 */
-	bt_stat_level_t		level[XFS_BTREE_MAXLEVELS];
-	uint64_t		owner;		/* owner */
-} bt_status_t;
-
 static uint64_t	*sb_icount_ag;		/* allocated inodes per ag */
 static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
 static uint64_t	*sb_fdblocks_ag;	/* free data blocks per ag */
@@ -164,202 +118,6 @@ mk_incore_fstree(
 	return(num_extents);
 }
 
-static xfs_agblock_t
-get_next_blockaddr(xfs_agnumber_t agno, int level, bt_status_t *curs)
-{
-	ASSERT(curs->free_btree_blocks < curs->btree_blocks +
-						curs->num_tot_blocks);
-	ASSERT(curs->num_free_blocks > 0);
-
-	curs->num_free_blocks--;
-	return(*curs->free_btree_blocks++);
-}
-
-/*
- * set up the dynamically allocated block allocation data in the btree
- * cursor that depends on the info in the static portion of the cursor.
- * allocates space from the incore bno/bcnt extent trees and sets up
- * the first path up the left side of the tree.  Also sets up the
- * cursor pointer to the btree root.   called by init_freespace_cursor()
- * and init_ino_cursor()
- */
-static void
-setup_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *curs)
-{
-	int			j;
-	unsigned int		u;
-	xfs_extlen_t		big_extent_len;
-	xfs_agblock_t		big_extent_start;
-	extent_tree_node_t	*ext_ptr;
-	extent_tree_node_t	*bno_ext_ptr;
-	xfs_extlen_t		blocks_allocated;
-	xfs_agblock_t		*agb_ptr;
-	int			error;
-
-	/*
-	 * get the number of blocks we need to allocate, then
-	 * set up block number array, set the free block pointer
-	 * to the first block in the array, and null the array
-	 */
-	big_extent_len = curs->num_tot_blocks;
-	blocks_allocated = 0;
-
-	ASSERT(big_extent_len > 0);
-
-	if ((curs->btree_blocks = malloc(sizeof(xfs_agblock_t)
-					* big_extent_len)) == NULL)
-		do_error(_("could not set up btree block array\n"));
-
-	agb_ptr = curs->free_btree_blocks = curs->btree_blocks;
-
-	for (j = 0; j < curs->num_free_blocks; j++, agb_ptr++)
-		*agb_ptr = NULLAGBLOCK;
-
-	/*
-	 * grab the smallest extent and use it up, then get the
-	 * next smallest.  This mimics the init_*_cursor code.
-	 */
-	ext_ptr =  findfirst_bcnt_extent(agno);
-
-	agb_ptr = curs->btree_blocks;
-
-	/*
-	 * set up the free block array
-	 */
-	while (blocks_allocated < big_extent_len)  {
-		if (!ext_ptr)
-			do_error(
-_("error - not enough free space in filesystem\n"));
-		/*
-		 * use up the extent we've got
-		 */
-		for (u = 0; u < ext_ptr->ex_blockcount &&
-				blocks_allocated < big_extent_len; u++)  {
-			ASSERT(agb_ptr < curs->btree_blocks
-					+ curs->num_tot_blocks);
-			*agb_ptr++ = ext_ptr->ex_startblock + u;
-			blocks_allocated++;
-		}
-
-		error = rmap_add_ag_rec(mp, agno, ext_ptr->ex_startblock, u,
-				curs->owner);
-		if (error)
-			do_error(_("could not set up btree rmaps: %s\n"),
-				strerror(-error));
-
-		/*
-		 * if we only used part of this last extent, then we
-		 * need only to reset the extent in the extent
-		 * trees and we're done
-		 */
-		if (u < ext_ptr->ex_blockcount)  {
-			big_extent_start = ext_ptr->ex_startblock + u;
-			big_extent_len = ext_ptr->ex_blockcount - u;
-
-			ASSERT(big_extent_len > 0);
-
-			bno_ext_ptr = find_bno_extent(agno,
-						ext_ptr->ex_startblock);
-			ASSERT(bno_ext_ptr != NULL);
-			get_bno_extent(agno, bno_ext_ptr);
-			release_extent_tree_node(bno_ext_ptr);
-
-			ext_ptr = get_bcnt_extent(agno, ext_ptr->ex_startblock,
-					ext_ptr->ex_blockcount);
-			release_extent_tree_node(ext_ptr);
-#ifdef XR_BLD_FREE_TRACE
-			fprintf(stderr, "releasing extent: %u [%u %u]\n",
-				agno, ext_ptr->ex_startblock,
-				ext_ptr->ex_blockcount);
-			fprintf(stderr, "blocks_allocated = %d\n",
-				blocks_allocated);
-#endif
-
-			add_bno_extent(agno, big_extent_start, big_extent_len);
-			add_bcnt_extent(agno, big_extent_start, big_extent_len);
-
-			return;
-		}
-		/*
-		 * delete the used-up extent from both extent trees and
-		 * find next biggest extent
-		 */
-#ifdef XR_BLD_FREE_TRACE
-		fprintf(stderr, "releasing extent: %u [%u %u]\n",
-			agno, ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
-#endif
-		bno_ext_ptr = find_bno_extent(agno, ext_ptr->ex_startblock);
-		ASSERT(bno_ext_ptr != NULL);
-		get_bno_extent(agno, bno_ext_ptr);
-		release_extent_tree_node(bno_ext_ptr);
-
-		ext_ptr = get_bcnt_extent(agno, ext_ptr->ex_startblock,
-				ext_ptr->ex_blockcount);
-		ASSERT(ext_ptr != NULL);
-		release_extent_tree_node(ext_ptr);
-
-		ext_ptr = findfirst_bcnt_extent(agno);
-	}
-#ifdef XR_BLD_FREE_TRACE
-	fprintf(stderr, "blocks_allocated = %d\n",
-		blocks_allocated);
-#endif
-}
-
-static void
-write_cursor(bt_status_t *curs)
-{
-	int i;
-
-	for (i = 0; i < curs->num_levels; i++)  {
-#if defined(XR_BLD_FREE_TRACE) || defined(XR_BLD_INO_TRACE)
-		fprintf(stderr, "writing bt block %u\n", curs->level[i].agbno);
-#endif
-		if (curs->level[i].prev_buf_p != NULL)  {
-			ASSERT(curs->level[i].prev_agbno != NULLAGBLOCK);
-#if defined(XR_BLD_FREE_TRACE) || defined(XR_BLD_INO_TRACE)
-			fprintf(stderr, "writing bt prev block %u\n",
-						curs->level[i].prev_agbno);
-#endif
-			libxfs_buf_mark_dirty(curs->level[i].prev_buf_p);
-			libxfs_buf_relse(curs->level[i].prev_buf_p);
-		}
-		libxfs_buf_mark_dirty(curs->level[i].buf_p);
-		libxfs_buf_relse(curs->level[i].buf_p);
-	}
-}
-
-static void
-finish_cursor(bt_status_t *curs)
-{
-	ASSERT(curs->num_free_blocks == 0);
-	free(curs->btree_blocks);
-}
-
-/* Map btnum to buffer ops for the types that need it. */
-static const struct xfs_buf_ops *
-btnum_to_ops(
-	xfs_btnum_t	btnum)
-{
-	switch (btnum) {
-	case XFS_BTNUM_BNO:
-		return &xfs_bnobt_buf_ops;
-	case XFS_BTNUM_CNT:
-		return &xfs_cntbt_buf_ops;
-	case XFS_BTNUM_INO:
-		return &xfs_inobt_buf_ops;
-	case XFS_BTNUM_FINO:
-		return &xfs_finobt_buf_ops;
-	case XFS_BTNUM_RMAP:
-		return &xfs_rmapbt_buf_ops;
-	case XFS_BTNUM_REFC:
-		return &xfs_refcountbt_buf_ops;
-	default:
-		ASSERT(0);
-		return NULL;
-	}
-}
-
 /*
  * XXX: yet more code that can be shared with mkfs, growfs.
  */



* [PATCH 12/12] xfs_repair: use bitmap to track blocks lost during btree construction
  2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
                   ` (10 preceding siblings ...)
  2020-06-02  4:28 ` [PATCH 11/12] xfs_repair: remove old btree rebuild support code Darrick J. Wong
@ 2020-06-02  4:28 ` Darrick J. Wong
  2020-06-19 11:10   ` Brian Foster
  11 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-02  4:28 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Use the incore bitmap structure to track blocks that were lost during
btree construction.  The bitmap coalesces adjacent blocks into extents,
so it uses less memory than the slab of single-block records it
replaces, and it lets us return each lost extent to the filesystem in a
single transaction instead of freeing the blocks one at a time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/agbtree.c |   21 ++++++++--------
 repair/agbtree.h |    2 +-
 repair/phase5.c  |   72 ++++++++++++++++++++++--------------------------------
 3 files changed, 41 insertions(+), 54 deletions(-)


diff --git a/repair/agbtree.c b/repair/agbtree.c
index d3639fe4..9f87253f 100644
--- a/repair/agbtree.c
+++ b/repair/agbtree.c
@@ -5,6 +5,7 @@
  */
 #include <libxfs.h>
 #include "err_protos.h"
+#include "libfrog/bitmap.h"
 #include "slab.h"
 #include "rmap.h"
 #include "incore.h"
@@ -131,21 +132,21 @@ void
 finish_rebuild(
 	struct xfs_mount	*mp,
 	struct bt_rebuild	*btr,
-	struct xfs_slab		*lost_fsb)
+	struct bitmap		*lost_blocks)
 {
 	struct bulkload_resv	*resv, *n;
+	int			error;
 
 	for_each_bulkload_reservation(&btr->newbt, resv, n) {
-		while (resv->used < resv->len) {
-			xfs_fsblock_t	fsb = resv->fsbno + resv->used;
-			int		error;
+		if (resv->used == resv->len)
+			continue;
 
-			error = slab_add(lost_fsb, &fsb);
-			if (error)
-				do_error(
-_("Insufficient memory saving lost blocks.\n"));
-			resv->used++;
-		}
+		error = bitmap_set(lost_blocks, resv->fsbno + resv->used,
+				   resv->len - resv->used);
+		if (error)
+			do_error(
+_("Insufficient memory saving lost blocks, err=%d.\n"), error);
+		resv->used = resv->len;
 	}
 
 	bulkload_destroy(&btr->newbt, 0);
diff --git a/repair/agbtree.h b/repair/agbtree.h
index 6bbeb022..d8095d20 100644
--- a/repair/agbtree.h
+++ b/repair/agbtree.h
@@ -34,7 +34,7 @@ struct bt_rebuild {
 };
 
 void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
-		struct xfs_slab *lost_fsb);
+		struct bitmap *lost_blocks);
 void init_freespace_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
 		unsigned int free_space, unsigned int *nr_extents,
 		int *extra_blocks, struct bt_rebuild *btr_bno,
diff --git a/repair/phase5.c b/repair/phase5.c
index 439c1065..446f7ec0 100644
--- a/repair/phase5.c
+++ b/repair/phase5.c
@@ -5,6 +5,7 @@
  */
 
 #include "libxfs.h"
+#include "libfrog/bitmap.h"
 #include "avl.h"
 #include "globals.h"
 #include "agheader.h"
@@ -211,7 +212,7 @@ build_agf_agfl(
 	struct bt_rebuild	*btr_cnt,
 	struct bt_rebuild	*btr_rmap,
 	struct bt_rebuild	*btr_refc,
-	struct xfs_slab		*lost_fsb)
+	struct bitmap		*lost_blocks)
 {
 	struct extent_tree_node	*ext_ptr;
 	struct xfs_buf		*agf_buf, *agfl_buf;
@@ -428,7 +429,7 @@ static void
 phase5_func(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
-	struct xfs_slab		*lost_fsb)
+	struct bitmap		*lost_blocks)
 {
 	struct repair_ctx	sc = { .mp = mp, };
 	struct bt_rebuild	btr_bno;
@@ -543,7 +544,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	 * set up agf and agfl
 	 */
 	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap, &btr_refc,
-			lost_fsb);
+			lost_blocks);
 
 	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
 
@@ -553,15 +554,15 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	/*
 	 * tear down cursors
 	 */
-	finish_rebuild(mp, &btr_bno, lost_fsb);
-	finish_rebuild(mp, &btr_cnt, lost_fsb);
-	finish_rebuild(mp, &btr_ino, lost_fsb);
+	finish_rebuild(mp, &btr_bno, lost_blocks);
+	finish_rebuild(mp, &btr_cnt, lost_blocks);
+	finish_rebuild(mp, &btr_ino, lost_blocks);
 	if (xfs_sb_version_hasfinobt(&mp->m_sb))
-		finish_rebuild(mp, &btr_fino, lost_fsb);
+		finish_rebuild(mp, &btr_fino, lost_blocks);
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
-		finish_rebuild(mp, &btr_rmap, lost_fsb);
+		finish_rebuild(mp, &btr_rmap, lost_blocks);
 	if (xfs_sb_version_hasreflink(&mp->m_sb))
-		finish_rebuild(mp, &btr_refc, lost_fsb);
+		finish_rebuild(mp, &btr_refc, lost_blocks);
 
 	/*
 	 * release the incore per-AG bno/bcnt trees so the extent nodes
@@ -572,48 +573,33 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
 	PROG_RPT_INC(prog_rpt_done[agno], 1);
 }
 
-/* Inject lost blocks back into the filesystem. */
+/* Inject this unused space back into the filesystem. */
 static int
-inject_lost_blocks(
-	struct xfs_mount	*mp,
-	struct xfs_slab		*lost_fsbs)
+inject_lost_extent(
+	uint64_t		start,
+	uint64_t		length,
+	void			*arg)
 {
-	struct xfs_trans	*tp = NULL;
-	struct xfs_slab_cursor	*cur = NULL;
-	xfs_fsblock_t		*fsb;
+	struct xfs_mount	*mp = arg;
+	struct xfs_trans	*tp;
 	int			error;
 
-	error = init_slab_cursor(lost_fsbs, NULL, &cur);
+	error = -libxfs_trans_alloc_rollable(mp, 16, &tp);
 	if (error)
 		return error;
 
-	while ((fsb = pop_slab_cursor(cur)) != NULL) {
-		error = -libxfs_trans_alloc_rollable(mp, 16, &tp);
-		if (error)
-			goto out_cancel;
-
-		error = -libxfs_free_extent(tp, *fsb, 1,
-				&XFS_RMAP_OINFO_ANY_OWNER, XFS_AG_RESV_NONE);
-		if (error)
-			goto out_cancel;
-
-		error = -libxfs_trans_commit(tp);
-		if (error)
-			goto out_cancel;
-		tp = NULL;
-	}
+	error = -libxfs_free_extent(tp, start, length,
+			&XFS_RMAP_OINFO_ANY_OWNER, XFS_AG_RESV_NONE);
+	if (error)
+		return error;
 
-out_cancel:
-	if (tp)
-		libxfs_trans_cancel(tp);
-	free_slab_cursor(&cur);
-	return error;
+	return -libxfs_trans_commit(tp);
 }
 
 void
 phase5(xfs_mount_t *mp)
 {
-	struct xfs_slab		*lost_fsb;
+	struct bitmap		*lost_blocks = NULL;
 	xfs_agnumber_t		agno;
 	int			error;
 
@@ -656,12 +642,12 @@ phase5(xfs_mount_t *mp)
 	if (sb_fdblocks_ag == NULL)
 		do_error(_("cannot alloc sb_fdblocks_ag buffers\n"));
 
-	error = init_slab(&lost_fsb, sizeof(xfs_fsblock_t));
+	error = bitmap_alloc(&lost_blocks);
 	if (error)
-		do_error(_("cannot alloc lost block slab\n"));
+		do_error(_("cannot alloc lost block bitmap\n"));
 
 	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++)
-		phase5_func(mp, agno, lost_fsb);
+		phase5_func(mp, agno, lost_blocks);
 
 	print_final_rpt();
 
@@ -704,10 +690,10 @@ _("unable to add AG %u reverse-mapping data to btree.\n"), agno);
 	 * Put blocks that were unnecessarily reserved for btree
 	 * reconstruction back into the filesystem free space data.
 	 */
-	error = inject_lost_blocks(mp, lost_fsb);
+	error = bitmap_iterate(lost_blocks, inject_lost_extent, mp);
 	if (error)
 		do_error(_("Unable to reinsert lost blocks into filesystem.\n"));
-	free_slab(&lost_fsb);
+	bitmap_free(&lost_blocks);
 
 	bad_ino_btree = 0;
 



* Re: [PATCH 01/12] xfs_repair: drop lostblocks from build_agf_agfl
  2020-06-02  4:26 ` [PATCH 01/12] xfs_repair: drop lostblocks from build_agf_agfl Darrick J. Wong
@ 2020-06-17 12:09   ` Brian Foster
  0 siblings, 0 replies; 42+ messages in thread
From: Brian Foster @ 2020-06-17 12:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:26:59PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> We don't do anything with this parameter, so get rid of it.
> 
> Fixes: ef4332b8 ("xfs_repair: add freesp btree block overflow to the free space")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  repair/phase5.c |    7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> 
> diff --git a/repair/phase5.c b/repair/phase5.c
> index 677297fe..c9b278bd 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -2049,7 +2049,6 @@ build_agf_agfl(
>  	struct bt_status	*bno_bt,
>  	struct bt_status	*bcnt_bt,
>  	xfs_extlen_t		freeblks,	/* # free blocks in tree */
> -	int			lostblocks,	/* # blocks that will be lost */
>  	struct bt_status	*rmap_bt,
>  	struct bt_status	*refcnt_bt,
>  	struct xfs_slab		*lost_fsb)
> @@ -2465,9 +2464,9 @@ phase5_func(
>  		/*
>  		 * set up agf and agfl
>  		 */
> -		build_agf_agfl(mp, agno, &bno_btree_curs,
> -				&bcnt_btree_curs, freeblks1, extra_blocks,
> -				&rmap_btree_curs, &refcnt_btree_curs, lost_fsb);
> +		build_agf_agfl(mp, agno, &bno_btree_curs, &bcnt_btree_curs,
> +				freeblks1, &rmap_btree_curs,
> +				&refcnt_btree_curs, lost_fsb);
>  		/*
>  		 * build inode allocation tree.
>  		 */
> 



* Re: [PATCH 02/12] xfs_repair: rename the agfl index loop variable in build_agf_agfl
  2020-06-02  4:27 ` [PATCH 02/12] xfs_repair: rename the agfl index loop variable in build_agf_agfl Darrick J. Wong
@ 2020-06-17 12:09   ` Brian Foster
  0 siblings, 0 replies; 42+ messages in thread
From: Brian Foster @ 2020-06-17 12:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:27:05PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> The variable 'i' is used to index the AGFL block list, so change the
> name to make it clearer what this is to be used for.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  repair/phase5.c |   28 +++++++++++++++-------------
>  1 file changed, 15 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/repair/phase5.c b/repair/phase5.c
> index c9b278bd..169a2d89 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -2055,7 +2055,7 @@ build_agf_agfl(
>  {
>  	struct extent_tree_node	*ext_ptr;
>  	struct xfs_buf		*agf_buf, *agfl_buf;
> -	int			i;
> +	unsigned int		agfl_idx;
>  	struct xfs_agfl		*agfl;
>  	struct xfs_agf		*agf;
>  	xfs_fsblock_t		fsb;
> @@ -2153,8 +2153,8 @@ build_agf_agfl(
>  		agfl->agfl_magicnum = cpu_to_be32(XFS_AGFL_MAGIC);
>  		agfl->agfl_seqno = cpu_to_be32(agno);
>  		platform_uuid_copy(&agfl->agfl_uuid, &mp->m_sb.sb_meta_uuid);
> -		for (i = 0; i < libxfs_agfl_size(mp); i++)
> -			freelist[i] = cpu_to_be32(NULLAGBLOCK);
> +		for (agfl_idx = 0; agfl_idx < libxfs_agfl_size(mp); agfl_idx++)
> +			freelist[agfl_idx] = cpu_to_be32(NULLAGBLOCK);
>  	}
>  
>  	/*
> @@ -2165,19 +2165,21 @@ build_agf_agfl(
>  		/*
>  		 * yes, now grab as many blocks as we can
>  		 */
> -		i = 0;
> -		while (bno_bt->num_free_blocks > 0 && i < libxfs_agfl_size(mp))
> +		agfl_idx = 0;
> +		while (bno_bt->num_free_blocks > 0 &&
> +		       agfl_idx < libxfs_agfl_size(mp))
>  		{
> -			freelist[i] = cpu_to_be32(
> +			freelist[agfl_idx] = cpu_to_be32(
>  					get_next_blockaddr(agno, 0, bno_bt));
> -			i++;
> +			agfl_idx++;
>  		}
>  
> -		while (bcnt_bt->num_free_blocks > 0 && i < libxfs_agfl_size(mp))
> +		while (bcnt_bt->num_free_blocks > 0 &&
> +		       agfl_idx < libxfs_agfl_size(mp))
>  		{
> -			freelist[i] = cpu_to_be32(
> +			freelist[agfl_idx] = cpu_to_be32(
>  					get_next_blockaddr(agno, 0, bcnt_bt));
> -			i++;
> +			agfl_idx++;
>  		}
>  		/*
>  		 * now throw the rest of the blocks away and complain
> @@ -2200,9 +2202,9 @@ _("Insufficient memory saving lost blocks.\n"));
>  		}
>  
>  		agf->agf_flfirst = 0;
> -		agf->agf_fllast = cpu_to_be32(i - 1);
> -		agf->agf_flcount = cpu_to_be32(i);
> -		rmap_store_agflcount(mp, agno, i);
> +		agf->agf_fllast = cpu_to_be32(agfl_idx - 1);
> +		agf->agf_flcount = cpu_to_be32(agfl_idx);
> +		rmap_store_agflcount(mp, agno, agfl_idx);
>  
>  #ifdef XR_BLD_FREE_TRACE
>  		fprintf(stderr, "writing agfl for ag %u\n", agno);
> 



* Re: [PATCH 03/12] xfs_repair: make container for btree bulkload root and block reservation
  2020-06-02  4:27 ` [PATCH 03/12] xfs_repair: make container for btree bulkload root and block reservation Darrick J. Wong
@ 2020-06-17 12:09   ` Brian Foster
  0 siblings, 0 replies; 42+ messages in thread
From: Brian Foster @ 2020-06-17 12:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:27:12PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create appropriate data structures to manage the fake btree root and
> block reservation lists needed to stage a btree bulkload operation.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  include/libxfs.h         |    1 
>  libxfs/libxfs_api_defs.h |    2 +
>  repair/Makefile          |    4 +-
>  repair/bulkload.c        |   97 ++++++++++++++++++++++++++++++++++++++++++++++
>  repair/bulkload.h        |   57 +++++++++++++++++++++++++++
>  repair/xfs_repair.c      |   17 ++++++++
>  6 files changed, 176 insertions(+), 2 deletions(-)
>  create mode 100644 repair/bulkload.c
>  create mode 100644 repair/bulkload.h
> 
> 
> diff --git a/include/libxfs.h b/include/libxfs.h
> index 12447835..b9370139 100644
> --- a/include/libxfs.h
> +++ b/include/libxfs.h
> @@ -76,6 +76,7 @@ struct iomap;
>  #include "xfs_rmap.h"
>  #include "xfs_refcount_btree.h"
>  #include "xfs_refcount.h"
> +#include "xfs_btree_staging.h"
>  
>  #ifndef ARRAY_SIZE
>  #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
> index be06c763..61047f8f 100644
> --- a/libxfs/libxfs_api_defs.h
> +++ b/libxfs/libxfs_api_defs.h
> @@ -27,12 +27,14 @@
>  #define xfs_alloc_fix_freelist		libxfs_alloc_fix_freelist
>  #define xfs_alloc_min_freelist		libxfs_alloc_min_freelist
>  #define xfs_alloc_read_agf		libxfs_alloc_read_agf
> +#define xfs_alloc_vextent		libxfs_alloc_vextent
>  
>  #define xfs_attr_get			libxfs_attr_get
>  #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
>  #define xfs_attr_namecheck		libxfs_attr_namecheck
>  #define xfs_attr_set			libxfs_attr_set
>  
> +#define __xfs_bmap_add_free		__libxfs_bmap_add_free
>  #define xfs_bmapi_read			libxfs_bmapi_read
>  #define xfs_bmapi_write			libxfs_bmapi_write
>  #define xfs_bmap_last_offset		libxfs_bmap_last_offset
> diff --git a/repair/Makefile b/repair/Makefile
> index 0964499a..62d84bbf 100644
> --- a/repair/Makefile
> +++ b/repair/Makefile
> @@ -9,11 +9,11 @@ LSRCFILES = README
>  
>  LTCOMMAND = xfs_repair
>  
> -HFILES = agheader.h attr_repair.h avl.h bmap.h btree.h \
> +HFILES = agheader.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
>  	da_util.h dinode.h dir2.h err_protos.h globals.h incore.h protos.h \
>  	rt.h progress.h scan.h versions.h prefetch.h rmap.h slab.h threads.h
>  
> -CFILES = agheader.c attr_repair.c avl.c bmap.c btree.c \
> +CFILES = agheader.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
>  	da_util.c dino_chunks.c dinode.c dir2.c globals.c incore.c \
>  	incore_bmc.c init.c incore_ext.c incore_ino.c phase1.c \
>  	phase2.c phase3.c phase4.c phase5.c phase6.c phase7.c \
> diff --git a/repair/bulkload.c b/repair/bulkload.c
> new file mode 100644
> index 00000000..4c69fe0d
> --- /dev/null
> +++ b/repair/bulkload.c
> @@ -0,0 +1,97 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2020 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#include <libxfs.h>
> +#include "bulkload.h"
> +
> +int bload_leaf_slack = -1;
> +int bload_node_slack = -1;
> +
> +/* Initialize accounting resources for staging a new AG btree. */
> +void
> +bulkload_init_ag(
> +	struct bulkload			*bkl,
> +	struct repair_ctx		*sc,
> +	const struct xfs_owner_info	*oinfo)
> +{
> +	memset(bkl, 0, sizeof(struct bulkload));
> +	bkl->sc = sc;
> +	bkl->oinfo = *oinfo; /* structure copy */
> +	INIT_LIST_HEAD(&bkl->resv_list);
> +}
> +
> +/* Designate specific blocks to be used to build our new btree. */
> +int
> +bulkload_add_blocks(
> +	struct bulkload		*bkl,
> +	xfs_fsblock_t		fsbno,
> +	xfs_extlen_t		len)
> +{
> +	struct bulkload_resv	*resv;
> +
> +	resv = kmem_alloc(sizeof(struct bulkload_resv), KM_MAYFAIL);
> +	if (!resv)
> +		return ENOMEM;
> +
> +	INIT_LIST_HEAD(&resv->list);
> +	resv->fsbno = fsbno;
> +	resv->len = len;
> +	resv->used = 0;
> +	list_add_tail(&resv->list, &bkl->resv_list);
> +	return 0;
> +}
> +
> +/* Free all the accounting info and disk space we reserved for a new btree. */
> +void
> +bulkload_destroy(
> +	struct bulkload		*bkl,
> +	int			error)
> +{
> +	struct bulkload_resv	*resv, *n;
> +
> +	list_for_each_entry_safe(resv, n, &bkl->resv_list, list) {
> +		list_del(&resv->list);
> +		kmem_free(resv);
> +	}
> +}
> +
> +/* Feed one of the reserved btree blocks to the bulk loader. */
> +int
> +bulkload_claim_block(
> +	struct xfs_btree_cur	*cur,
> +	struct bulkload		*bkl,
> +	union xfs_btree_ptr	*ptr)
> +{
> +	struct bulkload_resv	*resv;
> +	xfs_fsblock_t		fsb;
> +
> +	/*
> +	 * The first item in the list should always have a free block unless
> +	 * we're completely out.
> +	 */
> +	resv = list_first_entry(&bkl->resv_list, struct bulkload_resv, list);
> +	if (resv->used == resv->len)
> +		return ENOSPC;
> +
> +	/*
> +	 * Peel off a block from the start of the reservation.  We allocate
> +	 * blocks in order to place blocks on disk in increasing record or key
> +	 * order.  The block reservations tend to end up on the list in
> +	 * decreasing order, which hopefully results in leaf blocks ending up
> +	 * together.
> +	 */
> +	fsb = resv->fsbno + resv->used;
> +	resv->used++;
> +
> +	/* If we used all the blocks in this reservation, move it to the end. */
> +	if (resv->used == resv->len)
> +		list_move_tail(&resv->list, &bkl->resv_list);
> +
> +	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
> +		ptr->l = cpu_to_be64(fsb);
> +	else
> +		ptr->s = cpu_to_be32(XFS_FSB_TO_AGBNO(cur->bc_mp, fsb));
> +	return 0;
> +}
> diff --git a/repair/bulkload.h b/repair/bulkload.h
> new file mode 100644
> index 00000000..79f81cb0
> --- /dev/null
> +++ b/repair/bulkload.h
> @@ -0,0 +1,57 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2020 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#ifndef __XFS_REPAIR_BULKLOAD_H__
> +#define __XFS_REPAIR_BULKLOAD_H__
> +
> +extern int bload_leaf_slack;
> +extern int bload_node_slack;
> +
> +struct repair_ctx {
> +	struct xfs_mount	*mp;
> +};
> +
> +struct bulkload_resv {
> +	/* Link to list of extents that we've reserved. */
> +	struct list_head	list;
> +
> +	/* FSB of the block we reserved. */
> +	xfs_fsblock_t		fsbno;
> +
> +	/* Length of the reservation. */
> +	xfs_extlen_t		len;
> +
> +	/* How much of this reservation we've used. */
> +	xfs_extlen_t		used;
> +};
> +
> +struct bulkload {
> +	struct repair_ctx	*sc;
> +
> +	/* List of extents that we've reserved. */
> +	struct list_head	resv_list;
> +
> +	/* Fake root for new btree. */
> +	struct xbtree_afakeroot	afake;
> +
> +	/* rmap owner of these blocks */
> +	struct xfs_owner_info	oinfo;
> +
> +	/* The last reservation we allocated from. */
> +	struct bulkload_resv	*last_resv;
> +};
> +
> +#define for_each_bulkload_reservation(bkl, resv, n)	\
> +	list_for_each_entry_safe((resv), (n), &(bkl)->resv_list, list)
> +
> +void bulkload_init_ag(struct bulkload *bkl, struct repair_ctx *sc,
> +		const struct xfs_owner_info *oinfo);
> +int bulkload_add_blocks(struct bulkload *bkl, xfs_fsblock_t fsbno,
> +		xfs_extlen_t len);
> +void bulkload_destroy(struct bulkload *bkl, int error);
> +int bulkload_claim_block(struct xfs_btree_cur *cur, struct bulkload *bkl,
> +		union xfs_btree_ptr *ptr);
> +
> +#endif /* __XFS_REPAIR_BULKLOAD_H__ */
> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> index 9d72fa8e..3bfc8311 100644
> --- a/repair/xfs_repair.c
> +++ b/repair/xfs_repair.c
> @@ -24,6 +24,7 @@
>  #include "rmap.h"
>  #include "libfrog/fsgeom.h"
>  #include "libfrog/platform.h"
> +#include "bulkload.h"
>  
>  /*
>   * option tables for getsubopt calls
> @@ -39,6 +40,8 @@ enum o_opt_nums {
>  	AG_STRIDE,
>  	FORCE_GEO,
>  	PHASE2_THREADS,
> +	BLOAD_LEAF_SLACK,
> +	BLOAD_NODE_SLACK,
>  	O_MAX_OPTS,
>  };
>  
> @@ -49,6 +52,8 @@ static char *o_opts[] = {
>  	[AG_STRIDE]		= "ag_stride",
>  	[FORCE_GEO]		= "force_geometry",
>  	[PHASE2_THREADS]	= "phase2_threads",
> +	[BLOAD_LEAF_SLACK]	= "debug_bload_leaf_slack",
> +	[BLOAD_NODE_SLACK]	= "debug_bload_node_slack",
>  	[O_MAX_OPTS]		= NULL,
>  };
>  
> @@ -260,6 +265,18 @@ process_args(int argc, char **argv)
>  		_("-o phase2_threads requires a parameter\n"));
>  					phase2_threads = (int)strtol(val, NULL, 0);
>  					break;
> +				case BLOAD_LEAF_SLACK:
> +					if (!val)
> +						do_abort(
> +		_("-o debug_bload_leaf_slack requires a parameter\n"));
> +					bload_leaf_slack = (int)strtol(val, NULL, 0);
> +					break;
> +				case BLOAD_NODE_SLACK:
> +					if (!val)
> +						do_abort(
> +		_("-o debug_bload_node_slack requires a parameter\n"));
> +					bload_node_slack = (int)strtol(val, NULL, 0);
> +					break;
>  				default:
>  					unknown('o', val);
>  					break;
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/12] xfs_repair: inject lost blocks back into the fs no matter the owner
  2020-06-02  4:27 ` [PATCH 05/12] xfs_repair: inject lost blocks back into the fs no matter the owner Darrick J. Wong
@ 2020-06-17 12:09   ` Brian Foster
  0 siblings, 0 replies; 42+ messages in thread
From: Brian Foster @ 2020-06-17 12:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:27:24PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In repair phase 5, inject_lost_blocks takes the blocks that we allocated
> but didn't use for constructing the new AG btrees and puts them back in
> the filesystem by adding them to the free space.  The only btrees that
> can overestimate like that are the free space btrees, but in principle
> any of the btrees can do that.  If the others did, the rmap record owner
> for those blocks wouldn't necessarily be OWNER_AG, and if it isn't,
> repair will fail.
> 
> Get rid of this logic bomb so that we can use it for /any/ block count
> overestimation, and then we can use it to clean up after all
> reconstruction of any btree type.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  repair/phase5.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/repair/phase5.c b/repair/phase5.c
> index 44a6bda8..75c480fd 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -2516,8 +2516,8 @@ inject_lost_blocks(
>  		if (error)
>  			goto out_cancel;
>  
> -		error = -libxfs_free_extent(tp, *fsb, 1, &XFS_RMAP_OINFO_AG,
> -					    XFS_AG_RESV_NONE);
> +		error = -libxfs_free_extent(tp, *fsb, 1,
> +				&XFS_RMAP_OINFO_ANY_OWNER, XFS_AG_RESV_NONE);
>  		if (error)
>  			goto out_cancel;
>  
> 



* Re: [PATCH 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-06-02  4:27 ` [PATCH 06/12] xfs_repair: create a new class of btree rebuild cursors Darrick J. Wong
@ 2020-06-17 12:10   ` Brian Foster
  2020-06-18 18:30     ` Darrick J. Wong
  2020-06-29 23:10     ` Darrick J. Wong
  2020-07-02 15:18   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 2 replies; 42+ messages in thread
From: Brian Foster @ 2020-06-17 12:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:27:31PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create some new support structures and functions to assist phase5 in
> using the btree bulk loader to reconstruct metadata btrees.  This is the
> first step in removing the open-coded AG btree rebuilding code.
> 
> Note: The code in this patch will not be used anywhere until the next
> patch, so warnings about unused symbols are expected.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

I still find it odd to include the phase5.c changes in this patch when
they amount to the addition of a single unused parameter, but I'll defer
to the maintainer on that. Otherwise LGTM:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  repair/Makefile   |    4 +
>  repair/agbtree.c  |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  repair/agbtree.h  |   29 ++++++++++
>  repair/bulkload.c |   37 +++++++++++++
>  repair/bulkload.h |    2 +
>  repair/phase5.c   |   41 ++++++++------
>  6 files changed, 244 insertions(+), 21 deletions(-)
>  create mode 100644 repair/agbtree.c
>  create mode 100644 repair/agbtree.h
> 
> 
> diff --git a/repair/Makefile b/repair/Makefile
> index 62d84bbf..f6a6e3f9 100644
> --- a/repair/Makefile
> +++ b/repair/Makefile
> @@ -9,11 +9,11 @@ LSRCFILES = README
>  
>  LTCOMMAND = xfs_repair
>  
> -HFILES = agheader.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
> +HFILES = agheader.h agbtree.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
>  	da_util.h dinode.h dir2.h err_protos.h globals.h incore.h protos.h \
>  	rt.h progress.h scan.h versions.h prefetch.h rmap.h slab.h threads.h
>  
> -CFILES = agheader.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
> +CFILES = agheader.c agbtree.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
>  	da_util.c dino_chunks.c dinode.c dir2.c globals.c incore.c \
>  	incore_bmc.c init.c incore_ext.c incore_ino.c phase1.c \
>  	phase2.c phase3.c phase4.c phase5.c phase6.c phase7.c \
> diff --git a/repair/agbtree.c b/repair/agbtree.c
> new file mode 100644
> index 00000000..e4179a44
> --- /dev/null
> +++ b/repair/agbtree.c
> @@ -0,0 +1,152 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2020 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#include <libxfs.h>
> +#include "err_protos.h"
> +#include "slab.h"
> +#include "rmap.h"
> +#include "incore.h"
> +#include "bulkload.h"
> +#include "agbtree.h"
> +
> +/* Initialize a btree rebuild context. */
> +static void
> +init_rebuild(
> +	struct repair_ctx		*sc,
> +	const struct xfs_owner_info	*oinfo,
> +	xfs_agblock_t			free_space,
> +	struct bt_rebuild		*btr)
> +{
> +	memset(btr, 0, sizeof(struct bt_rebuild));
> +
> +	bulkload_init_ag(&btr->newbt, sc, oinfo);
> +	bulkload_estimate_ag_slack(sc, &btr->bload, free_space);
> +}
> +
> +/*
> + * Update this free space record to reflect the blocks we stole from the
> + * beginning of the record.
> + */
> +static void
> +consume_freespace(
> +	xfs_agnumber_t		agno,
> +	struct extent_tree_node	*ext_ptr,
> +	uint32_t		len)
> +{
> +	struct extent_tree_node	*bno_ext_ptr;
> +	xfs_agblock_t		new_start = ext_ptr->ex_startblock + len;
> +	xfs_extlen_t		new_len = ext_ptr->ex_blockcount - len;
> +
> +	/* Delete the used-up extent from both extent trees. */
> +#ifdef XR_BLD_FREE_TRACE
> +	fprintf(stderr, "releasing extent: %u [%u %u]\n", agno,
> +			ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> +#endif
> +	bno_ext_ptr = find_bno_extent(agno, ext_ptr->ex_startblock);
> +	ASSERT(bno_ext_ptr != NULL);
> +	get_bno_extent(agno, bno_ext_ptr);
> +	release_extent_tree_node(bno_ext_ptr);
> +
> +	ext_ptr = get_bcnt_extent(agno, ext_ptr->ex_startblock,
> +			ext_ptr->ex_blockcount);
> +	release_extent_tree_node(ext_ptr);
> +
> +	/*
> +	 * If we only used part of this last extent, then we must reinsert the
> +	 * extent to maintain proper sorting order.
> +	 */
> +	if (new_len > 0) {
> +		add_bno_extent(agno, new_start, new_len);
> +		add_bcnt_extent(agno, new_start, new_len);
> +	}
> +}
> +
> +/* Reserve blocks for the new btree. */
> +static void
> +reserve_btblocks(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	struct bt_rebuild	*btr,
> +	uint32_t		nr_blocks)
> +{
> +	struct extent_tree_node	*ext_ptr;
> +	uint32_t		blocks_allocated = 0;
> +	uint32_t		len;
> +	int			error;
> +
> +	while (blocks_allocated < nr_blocks)  {
> +		xfs_fsblock_t	fsbno;
> +
> +		/*
> +		 * Grab the smallest extent and use it up, then get the
> +		 * next smallest.  This mimics the init_*_cursor code.
> +		 */
> +		ext_ptr = findfirst_bcnt_extent(agno);
> +		if (!ext_ptr)
> +			do_error(
> +_("error - not enough free space in filesystem\n"));
> +
> +		/* Use up the extent we've got. */
> +		len = min(ext_ptr->ex_blockcount, nr_blocks - blocks_allocated);
> +		fsbno = XFS_AGB_TO_FSB(mp, agno, ext_ptr->ex_startblock);
> +		error = bulkload_add_blocks(&btr->newbt, fsbno, len);
> +		if (error)
> +			do_error(_("could not set up btree reservation: %s\n"),
> +				strerror(-error));
> +
> +		error = rmap_add_ag_rec(mp, agno, ext_ptr->ex_startblock, len,
> +				btr->newbt.oinfo.oi_owner);
> +		if (error)
> +			do_error(_("could not set up btree rmaps: %s\n"),
> +				strerror(-error));
> +
> +		consume_freespace(agno, ext_ptr, len);
> +		blocks_allocated += len;
> +	}
> +#ifdef XR_BLD_FREE_TRACE
> +	fprintf(stderr, "blocks_allocated = %d\n",
> +		blocks_allocated);
> +#endif
> +}
> +
> +/* Feed one of the new btree blocks to the bulk loader. */
> +static int
> +rebuild_claim_block(
> +	struct xfs_btree_cur	*cur,
> +	union xfs_btree_ptr	*ptr,
> +	void			*priv)
> +{
> +	struct bt_rebuild	*btr = priv;
> +
> +	return bulkload_claim_block(cur, &btr->newbt, ptr);
> +}
> +
> +/*
> + * Scoop up leftovers from a rebuild cursor for later freeing, then free the
> + * rebuild context.
> + */
> +void
> +finish_rebuild(
> +	struct xfs_mount	*mp,
> +	struct bt_rebuild	*btr,
> +	struct xfs_slab		*lost_fsb)
> +{
> +	struct bulkload_resv	*resv, *n;
> +
> +	for_each_bulkload_reservation(&btr->newbt, resv, n) {
> +		while (resv->used < resv->len) {
> +			xfs_fsblock_t	fsb = resv->fsbno + resv->used;
> +			int		error;
> +
> +			error = slab_add(lost_fsb, &fsb);
> +			if (error)
> +				do_error(
> +_("Insufficient memory saving lost blocks.\n"));
> +			resv->used++;
> +		}
> +	}
> +
> +	bulkload_destroy(&btr->newbt, 0);
> +}
> diff --git a/repair/agbtree.h b/repair/agbtree.h
> new file mode 100644
> index 00000000..50ea3c60
> --- /dev/null
> +++ b/repair/agbtree.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2020 Oracle.  All Rights Reserved.
> + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> + */
> +#ifndef __XFS_REPAIR_AG_BTREE_H__
> +#define __XFS_REPAIR_AG_BTREE_H__
> +
> +/* Context for rebuilding a per-AG btree. */
> +struct bt_rebuild {
> +	/* Fake root for staging and space preallocations. */
> +	struct bulkload	newbt;
> +
> +	/* Geometry of the new btree. */
> +	struct xfs_btree_bload	bload;
> +
> +	/* Staging btree cursor for the new tree. */
> +	struct xfs_btree_cur	*cur;
> +
> +	/* Tree-specific data. */
> +	union {
> +		struct xfs_slab_cursor	*slab_cursor;
> +	};
> +};
> +
> +void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
> +		struct xfs_slab *lost_fsb);
> +
> +#endif /* __XFS_REPAIR_AG_BTREE_H__ */
> diff --git a/repair/bulkload.c b/repair/bulkload.c
> index 4c69fe0d..9a6ca0c2 100644
> --- a/repair/bulkload.c
> +++ b/repair/bulkload.c
> @@ -95,3 +95,40 @@ bulkload_claim_block(
>  		ptr->s = cpu_to_be32(XFS_FSB_TO_AGBNO(cur->bc_mp, fsb));
>  	return 0;
>  }
> +
> +/*
> + * Estimate proper slack values for a btree that's being reloaded.
> + *
> + * Under most circumstances, we'll take whatever default loading value the
> + * btree bulk loading code calculates for us.  However, there are some
> + * exceptions to this rule:
> + *
> + * (1) If someone turned one of the debug knobs.
> + * (2) The AG has less than ~9% space free.
> + *
> + * Note that we actually use 3/32 for the comparison to avoid division.
> + */
> +void
> +bulkload_estimate_ag_slack(
> +	struct repair_ctx	*sc,
> +	struct xfs_btree_bload	*bload,
> +	unsigned int		free)
> +{
> +	/*
> +	 * The global values are set to -1 (i.e. take the bload defaults)
> +	 * unless someone has set them otherwise, so we just pull the values
> +	 * here.
> +	 */
> +	bload->leaf_slack = bload_leaf_slack;
> +	bload->node_slack = bload_node_slack;
> +
> +	/* No further changes if there's more than 3/32ths space left. */
> +	if (free >= ((sc->mp->m_sb.sb_agblocks * 3) >> 5))
> +		return;
> +
> +	/* We're low on space; load the btrees as tightly as possible. */
> +	if (bload->leaf_slack < 0)
> +		bload->leaf_slack = 0;
> +	if (bload->node_slack < 0)
> +		bload->node_slack = 0;
> +}
> diff --git a/repair/bulkload.h b/repair/bulkload.h
> index 79f81cb0..01f67279 100644
> --- a/repair/bulkload.h
> +++ b/repair/bulkload.h
> @@ -53,5 +53,7 @@ int bulkload_add_blocks(struct bulkload *bkl, xfs_fsblock_t fsbno,
>  void bulkload_destroy(struct bulkload *bkl, int error);
>  int bulkload_claim_block(struct xfs_btree_cur *cur, struct bulkload *bkl,
>  		union xfs_btree_ptr *ptr);
> +void bulkload_estimate_ag_slack(struct repair_ctx *sc,
> +		struct xfs_btree_bload *bload, unsigned int free);
>  
>  #endif /* __XFS_REPAIR_BULKLOAD_H__ */
> diff --git a/repair/phase5.c b/repair/phase5.c
> index 75c480fd..8175aa6f 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -18,6 +18,8 @@
>  #include "progress.h"
>  #include "slab.h"
>  #include "rmap.h"
> +#include "bulkload.h"
> +#include "agbtree.h"
>  
>  /*
>   * we maintain the current slice (path from root to leaf)
> @@ -2288,28 +2290,29 @@ keep_fsinos(xfs_mount_t *mp)
>  
>  static void
>  phase5_func(
> -	xfs_mount_t	*mp,
> -	xfs_agnumber_t	agno,
> -	struct xfs_slab	*lost_fsb)
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	struct xfs_slab		*lost_fsb)
>  {
> -	uint64_t	num_inos;
> -	uint64_t	num_free_inos;
> -	uint64_t	finobt_num_inos;
> -	uint64_t	finobt_num_free_inos;
> -	bt_status_t	bno_btree_curs;
> -	bt_status_t	bcnt_btree_curs;
> -	bt_status_t	ino_btree_curs;
> -	bt_status_t	fino_btree_curs;
> -	bt_status_t	rmap_btree_curs;
> -	bt_status_t	refcnt_btree_curs;
> -	int		extra_blocks = 0;
> -	uint		num_freeblocks;
> -	xfs_extlen_t	freeblks1;
> +	struct repair_ctx	sc = { .mp = mp, };
> +	struct agi_stat		agi_stat = {0,};
> +	uint64_t		num_inos;
> +	uint64_t		num_free_inos;
> +	uint64_t		finobt_num_inos;
> +	uint64_t		finobt_num_free_inos;
> +	bt_status_t		bno_btree_curs;
> +	bt_status_t		bcnt_btree_curs;
> +	bt_status_t		ino_btree_curs;
> +	bt_status_t		fino_btree_curs;
> +	bt_status_t		rmap_btree_curs;
> +	bt_status_t		refcnt_btree_curs;
> +	int			extra_blocks = 0;
> +	uint			num_freeblocks;
> +	xfs_extlen_t		freeblks1;
>  #ifdef DEBUG
> -	xfs_extlen_t	freeblks2;
> +	xfs_extlen_t		freeblks2;
>  #endif
> -	xfs_agblock_t	num_extents;
> -	struct agi_stat	agi_stat = {0,};
> +	xfs_agblock_t		num_extents;
>  
>  	if (verbose)
>  		do_log(_("        - agno = %d\n"), agno);
> 



* Re: [PATCH 07/12] xfs_repair: rebuild free space btrees with bulk loader
  2020-06-02  4:27 ` [PATCH 07/12] xfs_repair: rebuild free space btrees with bulk loader Darrick J. Wong
@ 2020-06-18 15:23   ` Brian Foster
  2020-06-18 16:41     ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Brian Foster @ 2020-06-18 15:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:27:38PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Use the btree bulk loading functions to rebuild the free space btrees
> and drop the open-coded implementation.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  libxfs/libxfs_api_defs.h |    3 
>  repair/agbtree.c         |  158 ++++++++++
>  repair/agbtree.h         |   10 +
>  repair/phase5.c          |  703 ++++------------------------------------------
>  4 files changed, 236 insertions(+), 638 deletions(-)
> 
> 
...
> diff --git a/repair/agbtree.c b/repair/agbtree.c
> index e4179a44..3b8ab47c 100644
> --- a/repair/agbtree.c
> +++ b/repair/agbtree.c
> @@ -150,3 +150,161 @@ _("Insufficient memory saving lost blocks.\n"));
>  
>  	bulkload_destroy(&btr->newbt, 0);
>  }
...
> +/*
> + * Return the next free space extent tree record from the previous value we
> + * saw.
> + */
> +static inline struct extent_tree_node *
> +get_bno_rec(
> +	struct xfs_btree_cur	*cur,
> +	struct extent_tree_node	*prev_value)
> +{
> +	xfs_agnumber_t		agno = cur->bc_ag.agno;
> +
> +	if (cur->bc_btnum == XFS_BTNUM_BNO) {
> +		if (!prev_value)
> +			return findfirst_bno_extent(agno);
> +		return findnext_bno_extent(prev_value);
> +	}
> +
> +	/* cnt btree */
> +	if (!prev_value)
> +		return findfirst_bcnt_extent(agno);
> +	return findnext_bcnt_extent(agno, prev_value);
> +}
> +
> +/* Grab one bnobt record and put it in the btree cursor. */
> +static int
> +get_bnobt_record(
> +	struct xfs_btree_cur		*cur,
> +	void				*priv)
> +{
> +	struct bt_rebuild		*btr = priv;
> +	struct xfs_alloc_rec_incore	*arec = &cur->bc_rec.a;
> +
> +	btr->bno_rec = get_bno_rec(cur, btr->bno_rec);
> +	arec->ar_startblock = btr->bno_rec->ex_startblock;
> +	arec->ar_blockcount = btr->bno_rec->ex_blockcount;
> +	btr->freeblks += btr->bno_rec->ex_blockcount;
> +	return 0;
> +}

Nit, but the 'bno' naming in the above functions suggests this is bnobt
specific when it actually covers the bnobt and cntbt. Can we call these
something more generic? get_[bt_]record() seems reasonable enough to me
given they're static.

Other than that the factoring looks much nicer and the rest LGTM:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +
> +void
> +init_freespace_cursors(
> +	struct repair_ctx	*sc,
> +	xfs_agnumber_t		agno,
> +	unsigned int		free_space,
> +	unsigned int		*nr_extents,
> +	int			*extra_blocks,
> +	struct bt_rebuild	*btr_bno,
> +	struct bt_rebuild	*btr_cnt)
> +{
> +	unsigned int		bno_blocks;
> +	unsigned int		cnt_blocks;
> +	int			error;
> +
> +	init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr_bno);
> +	init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr_cnt);
> +
> +	btr_bno->cur = libxfs_allocbt_stage_cursor(sc->mp,
> +			&btr_bno->newbt.afake, agno, XFS_BTNUM_BNO);
> +	btr_cnt->cur = libxfs_allocbt_stage_cursor(sc->mp,
> +			&btr_cnt->newbt.afake, agno, XFS_BTNUM_CNT);
> +
> +	btr_bno->bload.get_record = get_bnobt_record;
> +	btr_bno->bload.claim_block = rebuild_claim_block;
> +
> +	btr_cnt->bload.get_record = get_bnobt_record;
> +	btr_cnt->bload.claim_block = rebuild_claim_block;
> +
> +	/*
> +	 * Now we need to allocate blocks for the free space btrees using the
> +	 * free space records we're about to put in them.  Every record we use
> +	 * can change the shape of the free space trees, so we recompute the
> +	 * btree shape until we stop needing /more/ blocks.  If we have any
> +	 * left over we'll stash them in the AGFL when we're done.
> +	 */
> +	do {
> +		unsigned int	num_freeblocks;
> +
> +		bno_blocks = btr_bno->bload.nr_blocks;
> +		cnt_blocks = btr_cnt->bload.nr_blocks;
> +
> +		/* Compute how many bnobt blocks we'll need. */
> +		error = -libxfs_btree_bload_compute_geometry(btr_bno->cur,
> +				&btr_bno->bload, *nr_extents);
> +		if (error)
> +			do_error(
> +_("Unable to compute free space by block btree geometry, error %d.\n"), -error);
> +
> +		/* Compute how many cntbt blocks we'll need. */
> +		error = -libxfs_btree_bload_compute_geometry(btr_cnt->cur,
> +				&btr_cnt->bload, *nr_extents);
> +		if (error)
> +			do_error(
> +_("Unable to compute free space by length btree geometry, error %d.\n"), -error);
> +
> +		/* We don't need any more blocks, so we're done. */
> +		if (bno_blocks >= btr_bno->bload.nr_blocks &&
> +		    cnt_blocks >= btr_cnt->bload.nr_blocks)
> +			break;
> +
> +		/* Allocate however many more blocks we need this time. */
> +		if (bno_blocks < btr_bno->bload.nr_blocks)
> +			reserve_btblocks(sc->mp, agno, btr_bno,
> +					btr_bno->bload.nr_blocks - bno_blocks);
> +		if (cnt_blocks < btr_cnt->bload.nr_blocks)
> +			reserve_btblocks(sc->mp, agno, btr_cnt,
> +					btr_cnt->bload.nr_blocks - cnt_blocks);
> +
> +		/* Ok, now how many free space records do we have? */
> +		*nr_extents = count_bno_extents_blocks(agno, &num_freeblocks);
> +	} while (1);
> +
> +	*extra_blocks = (bno_blocks - btr_bno->bload.nr_blocks) +
> +			(cnt_blocks - btr_cnt->bload.nr_blocks);
> +}
> +
> +/* Rebuild the free space btrees. */
> +void
> +build_freespace_btrees(
> +	struct repair_ctx	*sc,
> +	xfs_agnumber_t		agno,
> +	struct bt_rebuild	*btr_bno,
> +	struct bt_rebuild	*btr_cnt)
> +{
> +	int			error;
> +
> +	/* Add all observed bnobt records. */
> +	error = -libxfs_btree_bload(btr_bno->cur, &btr_bno->bload, btr_bno);
> +	if (error)
> +		do_error(
> +_("Error %d while creating bnobt btree for AG %u.\n"), error, agno);
> +
> +	/* Add all observed cntbt records. */
> +	error = -libxfs_btree_bload(btr_cnt->cur, &btr_cnt->bload, btr_cnt);
> +	if (error)
> +		do_error(
> +_("Error %d while creating cntbt btree for AG %u.\n"), error, agno);
> +
> +	/* Since we're not writing the AGF yet, no need to commit the cursor */
> +	libxfs_btree_del_cursor(btr_bno->cur, 0);
> +	libxfs_btree_del_cursor(btr_cnt->cur, 0);
> +}
> diff --git a/repair/agbtree.h b/repair/agbtree.h
> index 50ea3c60..63352247 100644
> --- a/repair/agbtree.h
> +++ b/repair/agbtree.h
> @@ -20,10 +20,20 @@ struct bt_rebuild {
>  	/* Tree-specific data. */
>  	union {
>  		struct xfs_slab_cursor	*slab_cursor;
> +		struct {
> +			struct extent_tree_node	*bno_rec;
> +			unsigned int		freeblks;
> +		};
>  	};
>  };
>  
>  void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
>  		struct xfs_slab *lost_fsb);
> +void init_freespace_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
> +		unsigned int free_space, unsigned int *nr_extents,
> +		int *extra_blocks, struct bt_rebuild *btr_bno,
> +		struct bt_rebuild *btr_cnt);
> +void build_freespace_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
> +		struct bt_rebuild *btr_bno, struct bt_rebuild *btr_cnt);
>  
>  #endif /* __XFS_REPAIR_AG_BTREE_H__ */
> diff --git a/repair/phase5.c b/repair/phase5.c
> index 8175aa6f..a93d900d 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -81,7 +81,10 @@ static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
>  static uint64_t	*sb_fdblocks_ag;	/* free data blocks per ag */
>  
>  static int
> -mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
> +mk_incore_fstree(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	unsigned int		*num_freeblocks)
>  {
>  	int			in_extent;
>  	int			num_extents;
> @@ -93,6 +96,8 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
>  	xfs_extlen_t		blen;
>  	int			bstate;
>  
> +	*num_freeblocks = 0;
> +
>  	/*
>  	 * scan the bitmap for the ag looking for continuous
>  	 * extents of free blocks.  At this point, we know
> @@ -148,6 +153,7 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
>  #endif
>  				add_bno_extent(agno, extent_start, extent_len);
>  				add_bcnt_extent(agno, extent_start, extent_len);
> +				*num_freeblocks += extent_len;
>  			}
>  		}
>  	}
> @@ -161,6 +167,7 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
>  #endif
>  		add_bno_extent(agno, extent_start, extent_len);
>  		add_bcnt_extent(agno, extent_start, extent_len);
> +		*num_freeblocks += extent_len;
>  	}
>  
>  	return(num_extents);
> @@ -338,287 +345,6 @@ finish_cursor(bt_status_t *curs)
>  	free(curs->btree_blocks);
>  }
>  
> -/*
> - * We need to leave some free records in the tree for the corner case of
> - * setting up the AGFL. This may require allocation of blocks, and as
> - * such can require insertion of new records into the tree (e.g. moving
> - * a record in the by-count tree when a long extent is shortened). If we
> - * pack the records into the leaves with no slack space, this requires a
> - * leaf split to occur and a block to be allocated from the free list.
> - * If we don't have any blocks on the free list (because we are setting
> - * it up!), then we fail, and the filesystem will fail with the same
> - * failure at runtime. Hence leave a couple of records slack space in
> - * each block to allow immediate modification of the tree without
> - * requiring splits to be done.
> - *
> - * XXX(hch): any reason we don't just look at mp->m_alloc_mxr?
> - */
> -#define XR_ALLOC_BLOCK_MAXRECS(mp, level) \
> -	(libxfs_allocbt_maxrecs((mp), (mp)->m_sb.sb_blocksize, (level) == 0) - 2)
> -
> -/*
> - * this calculates a freespace cursor for an ag.
> - * btree_curs is an in/out.  returns the number of
> - * blocks that will show up in the AGFL.
> - */
> -static int
> -calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
> -			xfs_agblock_t *extents, bt_status_t *btree_curs)
> -{
> -	xfs_extlen_t		blocks_needed;		/* a running count */
> -	xfs_extlen_t		blocks_allocated_pt;	/* per tree */
> -	xfs_extlen_t		blocks_allocated_total;	/* for both trees */
> -	xfs_agblock_t		num_extents;
> -	int			i;
> -	int			extents_used;
> -	int			extra_blocks;
> -	bt_stat_level_t		*lptr;
> -	bt_stat_level_t		*p_lptr;
> -	extent_tree_node_t	*ext_ptr;
> -	int			level;
> -
> -	num_extents = *extents;
> -	extents_used = 0;
> -
> -	ASSERT(num_extents != 0);
> -
> -	lptr = &btree_curs->level[0];
> -	btree_curs->init = 1;
> -
> -	/*
> -	 * figure out how much space we need for the leaf level
> -	 * of the tree and set up the cursor for the leaf level
> -	 * (note that the same code is duplicated further down)
> -	 */
> -	lptr->num_blocks = howmany(num_extents, XR_ALLOC_BLOCK_MAXRECS(mp, 0));
> -	lptr->num_recs_pb = num_extents / lptr->num_blocks;
> -	lptr->modulo = num_extents % lptr->num_blocks;
> -	lptr->num_recs_tot = num_extents;
> -	level = 1;
> -
> -#ifdef XR_BLD_FREE_TRACE
> -	fprintf(stderr, "%s 0 %d %d %d %d\n", __func__,
> -			lptr->num_blocks,
> -			lptr->num_recs_pb,
> -			lptr->modulo,
> -			lptr->num_recs_tot);
> -#endif
> -	/*
> -	 * if we need more levels, set them up.  # of records
> -	 * per level is the # of blocks in the level below it
> -	 */
> -	if (lptr->num_blocks > 1)  {
> -		for (; btree_curs->level[level - 1].num_blocks > 1
> -				&& level < XFS_BTREE_MAXLEVELS;
> -				level++)  {
> -			lptr = &btree_curs->level[level];
> -			p_lptr = &btree_curs->level[level - 1];
> -			lptr->num_blocks = howmany(p_lptr->num_blocks,
> -					XR_ALLOC_BLOCK_MAXRECS(mp, level));
> -			lptr->modulo = p_lptr->num_blocks
> -					% lptr->num_blocks;
> -			lptr->num_recs_pb = p_lptr->num_blocks
> -					/ lptr->num_blocks;
> -			lptr->num_recs_tot = p_lptr->num_blocks;
> -#ifdef XR_BLD_FREE_TRACE
> -			fprintf(stderr, "%s %d %d %d %d %d\n", __func__,
> -					level,
> -					lptr->num_blocks,
> -					lptr->num_recs_pb,
> -					lptr->modulo,
> -					lptr->num_recs_tot);
> -#endif
> -		}
> -	}
> -
> -	ASSERT(lptr->num_blocks == 1);
> -	btree_curs->num_levels = level;
> -
> -	/*
> -	 * ok, now we have a hypothetical cursor that
> -	 * will work for both the bno and bcnt trees.
> -	 * now figure out if using up blocks to set up the
> -	 * trees will perturb the shape of the freespace tree.
> -	 * if so, we've over-allocated.  the freespace trees
> -	 * as they will be *after* accounting for the free space
> -	 * we've used up will need fewer blocks to to represent
> -	 * than we've allocated.  We can use the AGFL to hold
> -	 * xfs_agfl_size (sector/struct xfs_agfl) blocks but that's it.
> -	 * Thus we limit things to xfs_agfl_size/2 for each of the 2 btrees.
> -	 * if the number of extra blocks is more than that,
> -	 * we'll have to be called again.
> -	 */
> -	for (blocks_needed = 0, i = 0; i < level; i++)  {
> -		blocks_needed += btree_curs->level[i].num_blocks;
> -	}
> -
> -	/*
> -	 * record the # of blocks we've allocated
> -	 */
> -	blocks_allocated_pt = blocks_needed;
> -	blocks_needed *= 2;
> -	blocks_allocated_total = blocks_needed;
> -
> -	/*
> -	 * figure out how many free extents will be used up by
> -	 * our space allocation
> -	 */
> -	if ((ext_ptr = findfirst_bcnt_extent(agno)) == NULL)
> -		do_error(_("can't rebuild fs trees -- not enough free space "
> -			   "on ag %u\n"), agno);
> -
> -	while (ext_ptr != NULL && blocks_needed > 0)  {
> -		if (ext_ptr->ex_blockcount <= blocks_needed)  {
> -			blocks_needed -= ext_ptr->ex_blockcount;
> -			extents_used++;
> -		} else  {
> -			blocks_needed = 0;
> -		}
> -
> -		ext_ptr = findnext_bcnt_extent(agno, ext_ptr);
> -
> -#ifdef XR_BLD_FREE_TRACE
> -		if (ext_ptr != NULL)  {
> -			fprintf(stderr, "got next extent [%u %u]\n",
> -				ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> -		} else  {
> -			fprintf(stderr, "out of extents\n");
> -		}
> -#endif
> -	}
> -	if (blocks_needed > 0)
> -		do_error(_("ag %u - not enough free space to build freespace "
> -			   "btrees\n"), agno);
> -
> -	ASSERT(num_extents >= extents_used);
> -
> -	num_extents -= extents_used;
> -
> -	/*
> -	 * see if the number of leaf blocks will change as a result
> -	 * of the number of extents changing
> -	 */
> -	if (howmany(num_extents, XR_ALLOC_BLOCK_MAXRECS(mp, 0))
> -			!= btree_curs->level[0].num_blocks)  {
> -		/*
> -		 * yes -- recalculate the cursor.  If the number of
> -		 * excess (overallocated) blocks is < xfs_agfl_size/2, we're ok.
> -		 * we can put those into the AGFL.  we don't try
> -		 * and get things to converge exactly (reach a
> -		 * state with zero excess blocks) because there
> -		 * exist pathological cases which will never
> -		 * converge.  first, check for the zero-case.
> -		 */
> -		if (num_extents == 0)  {
> -			/*
> -			 * ok, we've used up all the free blocks
> -			 * trying to lay out the leaf level. go
> -			 * to a one block (empty) btree and put the
> -			 * already allocated blocks into the AGFL
> -			 */
> -			if (btree_curs->level[0].num_blocks != 1)  {
> -				/*
> -				 * we really needed more blocks because
> -				 * the old tree had more than one level.
> -				 * this is bad.
> -				 */
> -				 do_warn(_("not enough free blocks left to "
> -					   "describe all free blocks in AG "
> -					   "%u\n"), agno);
> -			}
> -#ifdef XR_BLD_FREE_TRACE
> -			fprintf(stderr,
> -				"ag %u -- no free extents, alloc'ed %d\n",
> -				agno, blocks_allocated_pt);
> -#endif
> -			lptr->num_blocks = 1;
> -			lptr->modulo = 0;
> -			lptr->num_recs_pb = 0;
> -			lptr->num_recs_tot = 0;
> -
> -			btree_curs->num_levels = 1;
> -
> -			/*
> -			 * don't reset the allocation stats, assume
> -			 * they're all extra blocks
> -			 * don't forget to return the total block count
> -			 * not the per-tree block count.  these are the
> -			 * extras that will go into the AGFL.  subtract
> -			 * two for the root blocks.
> -			 */
> -			btree_curs->num_tot_blocks = blocks_allocated_pt;
> -			btree_curs->num_free_blocks = blocks_allocated_pt;
> -
> -			*extents = 0;
> -
> -			return(blocks_allocated_total - 2);
> -		}
> -
> -		lptr = &btree_curs->level[0];
> -		lptr->num_blocks = howmany(num_extents,
> -					XR_ALLOC_BLOCK_MAXRECS(mp, 0));
> -		lptr->num_recs_pb = num_extents / lptr->num_blocks;
> -		lptr->modulo = num_extents % lptr->num_blocks;
> -		lptr->num_recs_tot = num_extents;
> -		level = 1;
> -
> -		/*
> -		 * if we need more levels, set them up
> -		 */
> -		if (lptr->num_blocks > 1)  {
> -			for (level = 1; btree_curs->level[level-1].num_blocks
> -					> 1 && level < XFS_BTREE_MAXLEVELS;
> -					level++)  {
> -				lptr = &btree_curs->level[level];
> -				p_lptr = &btree_curs->level[level-1];
> -				lptr->num_blocks = howmany(p_lptr->num_blocks,
> -					XR_ALLOC_BLOCK_MAXRECS(mp, level));
> -				lptr->modulo = p_lptr->num_blocks
> -						% lptr->num_blocks;
> -				lptr->num_recs_pb = p_lptr->num_blocks
> -						/ lptr->num_blocks;
> -				lptr->num_recs_tot = p_lptr->num_blocks;
> -			}
> -		}
> -		ASSERT(lptr->num_blocks == 1);
> -		btree_curs->num_levels = level;
> -
> -		/*
> -		 * now figure out the number of excess blocks
> -		 */
> -		for (blocks_needed = 0, i = 0; i < level; i++)  {
> -			blocks_needed += btree_curs->level[i].num_blocks;
> -		}
> -		blocks_needed *= 2;
> -
> -		ASSERT(blocks_allocated_total >= blocks_needed);
> -		extra_blocks = blocks_allocated_total - blocks_needed;
> -	} else  {
> -		if (extents_used > 0) {
> -			/*
> -			 * reset the leaf level geometry to account
> -			 * for consumed extents.  we can leave the
> -			 * rest of the cursor alone since the number
> -			 * of leaf blocks hasn't changed.
> -			 */
> -			lptr = &btree_curs->level[0];
> -
> -			lptr->num_recs_pb = num_extents / lptr->num_blocks;
> -			lptr->modulo = num_extents % lptr->num_blocks;
> -			lptr->num_recs_tot = num_extents;
> -		}
> -
> -		extra_blocks = 0;
> -	}
> -
> -	btree_curs->num_tot_blocks = blocks_allocated_pt;
> -	btree_curs->num_free_blocks = blocks_allocated_pt;
> -
> -	*extents = num_extents;
> -
> -	return(extra_blocks);
> -}
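The feedback loop the deleted comment describes -- btree blocks come out of the very free space the btree indexes, so consuming them can change the tree's own geometry -- can be sketched with a toy block-count calculation. MAXRECS and the function names here are made up for illustration; the real per-block capacities come from libxfs_allocbt_maxrecs() and vary with the filesystem block size:

```c
#include <assert.h>

/* Assumed per-block record capacity; the real value depends on
 * the fs block size and differs between leaf and node levels. */
#define MAXRECS 16

static unsigned int howmany_u(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * Total blocks needed for a btree over nrecs records: leaves first,
 * then one pointer per child block at each interior level until a
 * single root block remains.
 */
static unsigned int btree_blocks(unsigned int nrecs, unsigned int maxrecs)
{
	unsigned int blocks = howmany_u(nrecs, maxrecs);
	unsigned int total = blocks;

	while (blocks > 1) {
		blocks = howmany_u(blocks, maxrecs);
		total += blocks;
	}
	return total;
}
```

Subtracting the consumed extents from the record count and re-running btree_blocks() is the recomputation the old code performed, capped so that leftover blocks go to the AGFL rather than iterating toward a fixpoint that may not exist.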
> -
>  /* Map btnum to buffer ops for the types that need it. */
>  static const struct xfs_buf_ops *
>  btnum_to_ops(
> @@ -643,270 +369,6 @@ btnum_to_ops(
>  	}
>  }
>  
> -static void
> -prop_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
> -		bt_status_t *btree_curs, xfs_agblock_t startblock,
> -		xfs_extlen_t blockcount, int level, xfs_btnum_t btnum)
> -{
> -	struct xfs_btree_block	*bt_hdr;
> -	xfs_alloc_key_t		*bt_key;
> -	xfs_alloc_ptr_t		*bt_ptr;
> -	xfs_agblock_t		agbno;
> -	bt_stat_level_t		*lptr;
> -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> -	int			error;
> -
> -	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
> -
> -	level++;
> -
> -	if (level >= btree_curs->num_levels)
> -		return;
> -
> -	lptr = &btree_curs->level[level];
> -	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> -
> -	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
> -		/*
> -		 * only happens once when initializing the
> -		 * left-hand side of the tree.
> -		 */
> -		prop_freespace_cursor(mp, agno, btree_curs, startblock,
> -				blockcount, level, btnum);
> -	}
> -
> -	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
> -				lptr->num_recs_pb + (lptr->modulo > 0))  {
> -		/*
> -		 * write out current prev block, grab us a new block,
> -		 * and set the rightsib pointer of current block
> -		 */
> -#ifdef XR_BLD_FREE_TRACE
> -		fprintf(stderr, " %d ", lptr->prev_agbno);
> -#endif
> -		if (lptr->prev_agbno != NULLAGBLOCK) {
> -			ASSERT(lptr->prev_buf_p != NULL);
> -			libxfs_buf_mark_dirty(lptr->prev_buf_p);
> -			libxfs_buf_relse(lptr->prev_buf_p);
> -		}
> -		lptr->prev_agbno = lptr->agbno;
> -		lptr->prev_buf_p = lptr->buf_p;
> -		agbno = get_next_blockaddr(agno, level, btree_curs);
> -
> -		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
> -
> -		error = -libxfs_buf_get(mp->m_dev,
> -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> -		if (error)
> -			do_error(
> -	_("Cannot grab free space btree buffer, err=%d"),
> -					error);
> -		lptr->agbno = agbno;
> -
> -		if (lptr->modulo)
> -			lptr->modulo--;
> -
> -		/*
> -		 * initialize block header
> -		 */
> -		lptr->buf_p->b_ops = ops;
> -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, level,
> -					0, agno);
> -
> -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> -
> -		/*
> -		 * propagate extent record for first extent in new block up
> -		 */
> -		prop_freespace_cursor(mp, agno, btree_curs, startblock,
> -				blockcount, level, btnum);
> -	}
> -	/*
> -	 * add extent info to current block
> -	 */
> -	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
> -
> -	bt_key = XFS_ALLOC_KEY_ADDR(mp, bt_hdr,
> -				be16_to_cpu(bt_hdr->bb_numrecs));
> -	bt_ptr = XFS_ALLOC_PTR_ADDR(mp, bt_hdr,
> -				be16_to_cpu(bt_hdr->bb_numrecs),
> -				mp->m_alloc_mxr[1]);
> -
> -	bt_key->ar_startblock = cpu_to_be32(startblock);
> -	bt_key->ar_blockcount = cpu_to_be32(blockcount);
> -	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
> -}
> -
> -/*
> - * rebuilds a freespace tree given a cursor and type
> - * of tree to build (bno or bcnt).  returns the number of free blocks
> - * represented by the tree.
> - */
> -static xfs_extlen_t
> -build_freespace_tree(xfs_mount_t *mp, xfs_agnumber_t agno,
> -		bt_status_t *btree_curs, xfs_btnum_t btnum)
> -{
> -	xfs_agnumber_t		i;
> -	xfs_agblock_t		j;
> -	struct xfs_btree_block	*bt_hdr;
> -	xfs_alloc_rec_t		*bt_rec;
> -	int			level;
> -	xfs_agblock_t		agbno;
> -	extent_tree_node_t	*ext_ptr;
> -	bt_stat_level_t		*lptr;
> -	xfs_extlen_t		freeblks;
> -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> -	int			error;
> -
> -	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
> -
> -#ifdef XR_BLD_FREE_TRACE
> -	fprintf(stderr, "in build_freespace_tree, agno = %d\n", agno);
> -#endif
> -	level = btree_curs->num_levels;
> -	freeblks = 0;
> -
> -	ASSERT(level > 0);
> -
> -	/*
> -	 * initialize the first block on each btree level
> -	 */
> -	for (i = 0; i < level; i++)  {
> -		lptr = &btree_curs->level[i];
> -
> -		agbno = get_next_blockaddr(agno, i, btree_curs);
> -		error = -libxfs_buf_get(mp->m_dev,
> -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> -		if (error)
> -			do_error(
> -	_("Cannot grab free space btree buffer, err=%d"),
> -					error);
> -
> -		if (i == btree_curs->num_levels - 1)
> -			btree_curs->root = agbno;
> -
> -		lptr->agbno = agbno;
> -		lptr->prev_agbno = NULLAGBLOCK;
> -		lptr->prev_buf_p = NULL;
> -		/*
> -		 * initialize block header
> -		 */
> -		lptr->buf_p->b_ops = ops;
> -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, i, 0, agno);
> -	}
> -	/*
> -	 * run along leaf, setting up records.  as we have to switch
> -	 * blocks, call the prop_freespace_cursor routine to set up the new
> -	 * pointers for the parent.  that can recurse up to the root
> -	 * if required.  set the sibling pointers for leaf level here.
> -	 */
> -	if (btnum == XFS_BTNUM_BNO)
> -		ext_ptr = findfirst_bno_extent(agno);
> -	else
> -		ext_ptr = findfirst_bcnt_extent(agno);
> -
> -#ifdef XR_BLD_FREE_TRACE
> -	fprintf(stderr, "bft, agno = %d, start = %u, count = %u\n",
> -		agno, ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> -#endif
> -
> -	lptr = &btree_curs->level[0];
> -
> -	for (i = 0; i < btree_curs->level[0].num_blocks; i++)  {
> -		/*
> -		 * block initialization, lay in block header
> -		 */
> -		lptr->buf_p->b_ops = ops;
> -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, 0, 0, agno);
> -
> -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> -		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
> -							(lptr->modulo > 0));
> -#ifdef XR_BLD_FREE_TRACE
> -		fprintf(stderr, "bft, bb_numrecs = %d\n",
> -				be16_to_cpu(bt_hdr->bb_numrecs));
> -#endif
> -
> -		if (lptr->modulo > 0)
> -			lptr->modulo--;
> -
> -		/*
> -		 * initialize values in the path up to the root if
> -		 * this is a multi-level btree
> -		 */
> -		if (btree_curs->num_levels > 1)
> -			prop_freespace_cursor(mp, agno, btree_curs,
> -					ext_ptr->ex_startblock,
> -					ext_ptr->ex_blockcount,
> -					0, btnum);
> -
> -		bt_rec = (xfs_alloc_rec_t *)
> -			  ((char *)bt_hdr + XFS_ALLOC_BLOCK_LEN(mp));
> -		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
> -			ASSERT(ext_ptr != NULL);
> -			bt_rec[j].ar_startblock = cpu_to_be32(
> -							ext_ptr->ex_startblock);
> -			bt_rec[j].ar_blockcount = cpu_to_be32(
> -							ext_ptr->ex_blockcount);
> -			freeblks += ext_ptr->ex_blockcount;
> -			if (btnum == XFS_BTNUM_BNO)
> -				ext_ptr = findnext_bno_extent(ext_ptr);
> -			else
> -				ext_ptr = findnext_bcnt_extent(agno, ext_ptr);
> -#if 0
> -#ifdef XR_BLD_FREE_TRACE
> -			if (ext_ptr == NULL)
> -				fprintf(stderr, "null extent pointer, j = %d\n",
> -					j);
> -			else
> -				fprintf(stderr,
> -				"bft, agno = %d, start = %u, count = %u\n",
> -					agno, ext_ptr->ex_startblock,
> -					ext_ptr->ex_blockcount);
> -#endif
> -#endif
> -		}
> -
> -		if (ext_ptr != NULL)  {
> -			/*
> -			 * get next leaf level block
> -			 */
> -			if (lptr->prev_buf_p != NULL)  {
> -#ifdef XR_BLD_FREE_TRACE
> -				fprintf(stderr, " writing fst agbno %u\n",
> -					lptr->prev_agbno);
> -#endif
> -				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
> -				libxfs_buf_mark_dirty(lptr->prev_buf_p);
> -				libxfs_buf_relse(lptr->prev_buf_p);
> -			}
> -			lptr->prev_buf_p = lptr->buf_p;
> -			lptr->prev_agbno = lptr->agbno;
> -			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
> -			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
> -
> -			error = -libxfs_buf_get(mp->m_dev,
> -					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
> -					XFS_FSB_TO_BB(mp, 1),
> -					&lptr->buf_p);
> -			if (error)
> -				do_error(
> -	_("Cannot grab free space btree buffer, err=%d"),
> -						error);
> -		}
> -	}
> -
> -	return(freeblks);
> -}
> -
>  /*
>   * XXX(hch): any reason we don't just look at mp->m_inobt_mxr?
>   */
> @@ -2038,6 +1500,28 @@ _("Insufficient memory to construct refcount cursor."));
>  	free_slab_cursor(&refc_cur);
>  }
>  
> +/* Fill the AGFL with any leftover bnobt rebuilder blocks. */
> +static void
> +fill_agfl(
> +	struct bt_rebuild	*btr,
> +	__be32			*agfl_bnos,
> +	unsigned int		*agfl_idx)
> +{
> +	struct bulkload_resv	*resv, *n;
> +	struct xfs_mount	*mp = btr->newbt.sc->mp;
> +
> +	for_each_bulkload_reservation(&btr->newbt, resv, n) {
> +		xfs_agblock_t	bno;
> +
> +		bno = XFS_FSB_TO_AGBNO(mp, resv->fsbno + resv->used);
> +		while (resv->used < resv->len &&
> +		       *agfl_idx < libxfs_agfl_size(mp)) {
> +			agfl_bnos[(*agfl_idx)++] = cpu_to_be32(bno++);
> +			resv->used++;
> +		}
> +	}
> +}
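For readers skimming the new helper: it drains each bulkload reservation's unused tail into consecutive AGFL slots. A freestanding sketch of the same loop follows, with a made-up struct and capacity (the real code walks struct bulkload_resv, converts with XFS_FSB_TO_AGBNO(), caps at libxfs_agfl_size(mp), and stores big-endian values):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct bulkload_resv; "agbno" here is
 * already an AG block number to keep the sketch self-contained. */
struct resv {
	uint32_t	agbno;
	uint32_t	len;
	uint32_t	used;
};

#define AGFL_SIZE 4	/* assumed tiny freelist capacity for the demo */

/* Push each reservation's leftover blocks onto the freelist until
 * either the reservations or the freelist slots run out. */
static unsigned int fill_list(struct resv *rv, int nresv,
			      uint32_t *list, unsigned int idx)
{
	for (int i = 0; i < nresv; i++) {
		uint32_t bno = rv[i].agbno + rv[i].used;

		while (rv[i].used < rv[i].len && idx < AGFL_SIZE) {
			list[idx++] = bno++;
			rv[i].used++;
		}
	}
	return idx;
}
```

Any reservation blocks still unused after this loop are the ones finish_rebuild() later hands to the lost_fsb slab.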
> +
>  /*
>   * build both the agf and the agfl for an agno given both
>   * btree cursors.
> @@ -2048,9 +1532,8 @@ static void
>  build_agf_agfl(
>  	struct xfs_mount	*mp,
>  	xfs_agnumber_t		agno,
> -	struct bt_status	*bno_bt,
> -	struct bt_status	*bcnt_bt,
> -	xfs_extlen_t		freeblks,	/* # free blocks in tree */
> +	struct bt_rebuild	*btr_bno,
> +	struct bt_rebuild	*btr_cnt,
>  	struct bt_status	*rmap_bt,
>  	struct bt_status	*refcnt_bt,
>  	struct xfs_slab		*lost_fsb)
> @@ -2060,7 +1543,6 @@ build_agf_agfl(
>  	unsigned int		agfl_idx;
>  	struct xfs_agfl		*agfl;
>  	struct xfs_agf		*agf;
> -	xfs_fsblock_t		fsb;
>  	__be32			*freelist;
>  	int			error;
>  
> @@ -2092,13 +1574,17 @@ build_agf_agfl(
>  		agf->agf_length = cpu_to_be32(mp->m_sb.sb_dblocks -
>  			(xfs_rfsblock_t) mp->m_sb.sb_agblocks * agno);
>  
> -	agf->agf_roots[XFS_BTNUM_BNO] = cpu_to_be32(bno_bt->root);
> -	agf->agf_levels[XFS_BTNUM_BNO] = cpu_to_be32(bno_bt->num_levels);
> -	agf->agf_roots[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->root);
> -	agf->agf_levels[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->num_levels);
> +	agf->agf_roots[XFS_BTNUM_BNO] =
> +			cpu_to_be32(btr_bno->newbt.afake.af_root);
> +	agf->agf_levels[XFS_BTNUM_BNO] =
> +			cpu_to_be32(btr_bno->newbt.afake.af_levels);
> +	agf->agf_roots[XFS_BTNUM_CNT] =
> +			cpu_to_be32(btr_cnt->newbt.afake.af_root);
> +	agf->agf_levels[XFS_BTNUM_CNT] =
> +			cpu_to_be32(btr_cnt->newbt.afake.af_levels);
>  	agf->agf_roots[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->root);
>  	agf->agf_levels[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->num_levels);
> -	agf->agf_freeblks = cpu_to_be32(freeblks);
> +	agf->agf_freeblks = cpu_to_be32(btr_bno->freeblks);
>  	agf->agf_rmap_blocks = cpu_to_be32(rmap_bt->num_tot_blocks -
>  			rmap_bt->num_free_blocks);
>  	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
> @@ -2115,9 +1601,8 @@ build_agf_agfl(
>  		 * Don't count the root blocks as they are already
>  		 * accounted for.
>  		 */
> -		blks = (bno_bt->num_tot_blocks - bno_bt->num_free_blocks) +
> -			(bcnt_bt->num_tot_blocks - bcnt_bt->num_free_blocks) -
> -			2;
> +		blks = btr_bno->newbt.afake.af_blocks +
> +			btr_cnt->newbt.afake.af_blocks - 2;
>  		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
>  			blks += rmap_bt->num_tot_blocks - rmap_bt->num_free_blocks - 1;
>  		agf->agf_btreeblks = cpu_to_be32(blks);
> @@ -2159,50 +1644,14 @@ build_agf_agfl(
>  			freelist[agfl_idx] = cpu_to_be32(NULLAGBLOCK);
>  	}
>  
> -	/*
> -	 * do we have left-over blocks in the btree cursors that should
> -	 * be used to fill the AGFL?
> -	 */
> -	if (bno_bt->num_free_blocks > 0 || bcnt_bt->num_free_blocks > 0)  {
> -		/*
> -		 * yes, now grab as many blocks as we can
> -		 */
> -		agfl_idx = 0;
> -		while (bno_bt->num_free_blocks > 0 &&
> -		       agfl_idx < libxfs_agfl_size(mp))
> -		{
> -			freelist[agfl_idx] = cpu_to_be32(
> -					get_next_blockaddr(agno, 0, bno_bt));
> -			agfl_idx++;
> -		}
> -
> -		while (bcnt_bt->num_free_blocks > 0 &&
> -		       agfl_idx < libxfs_agfl_size(mp))
> -		{
> -			freelist[agfl_idx] = cpu_to_be32(
> -					get_next_blockaddr(agno, 0, bcnt_bt));
> -			agfl_idx++;
> -		}
> -		/*
> -		 * now throw the rest of the blocks away and complain
> -		 */
> -		while (bno_bt->num_free_blocks > 0) {
> -			fsb = XFS_AGB_TO_FSB(mp, agno,
> -					get_next_blockaddr(agno, 0, bno_bt));
> -			error = slab_add(lost_fsb, &fsb);
> -			if (error)
> -				do_error(
> -_("Insufficient memory saving lost blocks.\n"));
> -		}
> -		while (bcnt_bt->num_free_blocks > 0) {
> -			fsb = XFS_AGB_TO_FSB(mp, agno,
> -					get_next_blockaddr(agno, 0, bcnt_bt));
> -			error = slab_add(lost_fsb, &fsb);
> -			if (error)
> -				do_error(
> -_("Insufficient memory saving lost blocks.\n"));
> -		}
> +	/* Fill the AGFL with leftover blocks or save them for later. */
> +	agfl_idx = 0;
> +	freelist = xfs_buf_to_agfl_bno(agfl_buf);
> +	fill_agfl(btr_bno, freelist, &agfl_idx);
> +	fill_agfl(btr_cnt, freelist, &agfl_idx);
>  
> +	/* Set the AGF counters for the AGFL. */
> +	if (agfl_idx > 0) {
>  		agf->agf_flfirst = 0;
>  		agf->agf_fllast = cpu_to_be32(agfl_idx - 1);
>  		agf->agf_flcount = cpu_to_be32(agfl_idx);
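The three counters written here describe the AGFL as a circular buffer over libxfs_agfl_size(mp) slots. After a rebuild flfirst is always 0, so flcount is simply fllast + 1, but the general relation has to handle wraparound; a sketch with an assumed capacity:

```c
#include <assert.h>

#define AGFL_SIZE 118	/* assumed; the real value is libxfs_agfl_size(mp) */

/* Number of live AGFL entries between flfirst and fllast inclusive,
 * accounting for the list wrapping past the end of the buffer. */
static unsigned int flcount(unsigned int flfirst, unsigned int fllast)
{
	if (fllast >= flfirst)
		return fllast - flfirst + 1;
	return AGFL_SIZE - flfirst + fllast + 1;
}
```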
> @@ -2300,18 +1749,14 @@ phase5_func(
>  	uint64_t		num_free_inos;
>  	uint64_t		finobt_num_inos;
>  	uint64_t		finobt_num_free_inos;
> -	bt_status_t		bno_btree_curs;
> -	bt_status_t		bcnt_btree_curs;
> +	struct bt_rebuild	btr_bno;
> +	struct bt_rebuild	btr_cnt;
>  	bt_status_t		ino_btree_curs;
>  	bt_status_t		fino_btree_curs;
>  	bt_status_t		rmap_btree_curs;
>  	bt_status_t		refcnt_btree_curs;
>  	int			extra_blocks = 0;
>  	uint			num_freeblocks;
> -	xfs_extlen_t		freeblks1;
> -#ifdef DEBUG
> -	xfs_extlen_t		freeblks2;
> -#endif
>  	xfs_agblock_t		num_extents;
>  
>  	if (verbose)
> @@ -2320,7 +1765,7 @@ phase5_func(
>  	/*
>  	 * build up incore bno and bcnt extent btrees
>  	 */
> -	num_extents = mk_incore_fstree(mp, agno);
> +	num_extents = mk_incore_fstree(mp, agno, &num_freeblocks);
>  
>  #ifdef XR_BLD_FREE_TRACE
>  	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
> @@ -2392,8 +1837,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	/*
>  	 * track blocks that we might really lose
>  	 */
> -	extra_blocks = calculate_freespace_cursor(mp, agno,
> -				&num_extents, &bno_btree_curs);
> +	init_freespace_cursors(&sc, agno, num_freeblocks, &num_extents,
> +			&extra_blocks, &btr_bno, &btr_cnt);
>  
>  	/*
>  	 * freespace btrees live in the "free space" but the filesystem treats
> @@ -2410,37 +1855,18 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	if (extra_blocks > 0)
>  		sb_fdblocks_ag[agno] -= extra_blocks;
>  
> -	bcnt_btree_curs = bno_btree_curs;
> -
> -	bno_btree_curs.owner = XFS_RMAP_OWN_AG;
> -	bcnt_btree_curs.owner = XFS_RMAP_OWN_AG;
> -	setup_cursor(mp, agno, &bno_btree_curs);
> -	setup_cursor(mp, agno, &bcnt_btree_curs);
> -
>  #ifdef XR_BLD_FREE_TRACE
>  	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
>  	fprintf(stderr, "# of bcnt extents is %d\n", count_bcnt_extents(agno));
>  #endif
>  
> -	/*
> -	 * now rebuild the freespace trees
> -	 */
> -	freeblks1 = build_freespace_tree(mp, agno,
> -					&bno_btree_curs, XFS_BTNUM_BNO);
> +	build_freespace_btrees(&sc, agno, &btr_bno, &btr_cnt);
> +
>  #ifdef XR_BLD_FREE_TRACE
> -	fprintf(stderr, "# of free blocks == %d\n", freeblks1);
> +	fprintf(stderr, "# of free blocks == %d/%d\n", btr_bno.freeblks,
> +			btr_cnt.freeblks);
>  #endif
> -	write_cursor(&bno_btree_curs);
> -
> -#ifdef DEBUG
> -	freeblks2 = build_freespace_tree(mp, agno,
> -				&bcnt_btree_curs, XFS_BTNUM_CNT);
> -#else
> -	(void) build_freespace_tree(mp, agno, &bcnt_btree_curs, XFS_BTNUM_CNT);
> -#endif
> -	write_cursor(&bcnt_btree_curs);
> -
> -	ASSERT(freeblks1 == freeblks2);
> +	ASSERT(btr_bno.freeblks == btr_cnt.freeblks);
>  
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
>  		build_rmap_tree(mp, agno, &rmap_btree_curs);
> @@ -2457,8 +1883,9 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	/*
>  	 * set up agf and agfl
>  	 */
> -	build_agf_agfl(mp, agno, &bno_btree_curs, &bcnt_btree_curs, freeblks1,
> -			&rmap_btree_curs, &refcnt_btree_curs, lost_fsb);
> +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
> +			&refcnt_btree_curs, lost_fsb);
> +
>  	/*
>  	 * build inode allocation tree.
>  	 */
> @@ -2480,7 +1907,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	/*
>  	 * tear down cursors
>  	 */
> -	finish_cursor(&bno_btree_curs);
> +	finish_rebuild(mp, &btr_bno, lost_fsb);
> +	finish_rebuild(mp, &btr_cnt, lost_fsb);
>  	finish_cursor(&ino_btree_curs);
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
>  		finish_cursor(&rmap_btree_curs);
> @@ -2488,7 +1916,6 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  		finish_cursor(&refcnt_btree_curs);
>  	if (xfs_sb_version_hasfinobt(&mp->m_sb))
>  		finish_cursor(&fino_btree_curs);
> -	finish_cursor(&bcnt_btree_curs);
>  
>  	/*
>  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/12] xfs_repair: rebuild inode btrees with bulk loader
  2020-06-02  4:27 ` [PATCH 08/12] xfs_repair: rebuild inode " Darrick J. Wong
@ 2020-06-18 15:24   ` Brian Foster
  2020-06-18 18:33     ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Brian Foster @ 2020-06-18 15:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:27:44PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Use the btree bulk loading functions to rebuild the inode btrees
> and drop the open-coded implementation.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  libxfs/libxfs_api_defs.h |    1 
>  repair/agbtree.c         |  207 ++++++++++++++++++++
>  repair/agbtree.h         |   13 +
>  repair/phase5.c          |  488 +++-------------------------------------------
>  4 files changed, 248 insertions(+), 461 deletions(-)
> 
> 
...
> diff --git a/repair/agbtree.c b/repair/agbtree.c
> index 3b8ab47c..e44475fc 100644
> --- a/repair/agbtree.c
> +++ b/repair/agbtree.c
> @@ -308,3 +308,210 @@ _("Error %d while creating cntbt btree for AG %u.\n"), error, agno);
>  	libxfs_btree_del_cursor(btr_bno->cur, 0);
>  	libxfs_btree_del_cursor(btr_cnt->cur, 0);
>  }
...
> +/* Initialize both inode btree cursors as needed. */
> +void
> +init_ino_cursors(
> +	struct repair_ctx	*sc,
> +	xfs_agnumber_t		agno,
> +	unsigned int		free_space,
> +	uint64_t		*num_inos,
> +	uint64_t		*num_free_inos,
> +	struct bt_rebuild	*btr_ino,
> +	struct bt_rebuild	*btr_fino)
> +{
> +	struct ino_tree_node	*ino_rec;
> +	unsigned int		ino_recs = 0;
> +	unsigned int		fino_recs = 0;
> +	bool			finobt;
> +	int			error;
> +
> +	finobt = xfs_sb_version_hasfinobt(&sc->mp->m_sb);

Seems like a pointless variable given it is only used in one place.
Otherwise looks good:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	init_rebuild(sc, &XFS_RMAP_OINFO_INOBT, free_space, btr_ino);
> +
> +	/* Compute inode statistics. */
> +	*num_free_inos = 0;
> +	*num_inos = 0;
> +	for (ino_rec = findfirst_inode_rec(agno);
> +	     ino_rec != NULL;
> +	     ino_rec = next_ino_rec(ino_rec))  {
> +		unsigned int	rec_ninos = 0;
> +		unsigned int	rec_nfinos = 0;
> +		int		i;
> +
> +		for (i = 0; i < XFS_INODES_PER_CHUNK; i++)  {
> +			ASSERT(is_inode_confirmed(ino_rec, i));
> +			/*
> +			 * sparse inodes are not factored into superblock (free)
> +			 * inode counts
> +			 */
> +			if (is_inode_sparse(ino_rec, i))
> +				continue;
> +			if (is_inode_free(ino_rec, i))
> +				rec_nfinos++;
> +			rec_ninos++;
> +		}
> +
> +		*num_free_inos += rec_nfinos;
> +		*num_inos += rec_ninos;
> +		ino_recs++;
> +
> +		/* finobt only considers records with free inodes */
> +		if (rec_nfinos)
> +			fino_recs++;
> +	}
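The statistics pass above can be exercised in isolation. Below is a sketch using plain bitmaps in place of the is_inode_sparse()/is_inode_free() record helpers -- the masks and function are invented for the demo; only the skip-sparse-then-count logic mirrors the loop above:

```c
#include <assert.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64	/* XFS_INODES_PER_CHUNK */

/* Count allocated and free inodes in one chunk from two bitmaps,
 * skipping sparse slots: sparse inodes are never factored into the
 * superblock (free) inode counts. */
static void count_chunk(uint64_t sparse_mask, uint64_t free_mask,
			unsigned int *ninos, unsigned int *nfinos)
{
	for (int i = 0; i < INODES_PER_CHUNK; i++) {
		if (sparse_mask & (1ULL << i))
			continue;
		if (free_mask & (1ULL << i))
			(*nfinos)++;
		(*ninos)++;
	}
}
```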
> +
> +	btr_ino->cur = libxfs_inobt_stage_cursor(sc->mp, &btr_ino->newbt.afake,
> +			agno, XFS_BTNUM_INO);
> +
> +	btr_ino->bload.get_record = get_inobt_record;
> +	btr_ino->bload.claim_block = rebuild_claim_block;
> +	btr_ino->first_agino = NULLAGINO;
> +
> +	/* Compute how many inobt blocks we'll need. */
> +	error = -libxfs_btree_bload_compute_geometry(btr_ino->cur,
> +			&btr_ino->bload, ino_recs);
> +	if (error)
> +		do_error(
> +_("Unable to compute inode btree geometry, error %d.\n"), error);
> +
> +	reserve_btblocks(sc->mp, agno, btr_ino, btr_ino->bload.nr_blocks);
> +
> +	if (!finobt)
> +		return;
> +
> +	init_rebuild(sc, &XFS_RMAP_OINFO_INOBT, free_space, btr_fino);
> +	btr_fino->cur = libxfs_inobt_stage_cursor(sc->mp,
> +			&btr_fino->newbt.afake, agno, XFS_BTNUM_FINO);
> +
> +	btr_fino->bload.get_record = get_inobt_record;
> +	btr_fino->bload.claim_block = rebuild_claim_block;
> +	btr_fino->first_agino = NULLAGINO;
> +
> +	/* Compute how many finobt blocks we'll need. */
> +	error = -libxfs_btree_bload_compute_geometry(btr_fino->cur,
> +			&btr_fino->bload, fino_recs);
> +	if (error)
> +		do_error(
> +_("Unable to compute free inode btree geometry, error %d.\n"), error);
> +
> +	reserve_btblocks(sc->mp, agno, btr_fino, btr_fino->bload.nr_blocks);
> +}
> +
> +/* Rebuild the inode btrees. */
> +void
> +build_inode_btrees(
> +	struct repair_ctx	*sc,
> +	xfs_agnumber_t		agno,
> +	struct bt_rebuild	*btr_ino,
> +	struct bt_rebuild	*btr_fino)
> +{
> +	int			error;
> +
> +	/* Add all observed inobt records. */
> +	error = -libxfs_btree_bload(btr_ino->cur, &btr_ino->bload, btr_ino);
> +	if (error)
> +		do_error(
> +_("Error %d while creating inobt btree for AG %u.\n"), error, agno);
> +
> +	/* Since we're not writing the AGI yet, no need to commit the cursor */
> +	libxfs_btree_del_cursor(btr_ino->cur, 0);
> +
> +	if (!xfs_sb_version_hasfinobt(&sc->mp->m_sb))
> +		return;
> +
> +	/* Add all observed finobt records. */
> +	error = -libxfs_btree_bload(btr_fino->cur, &btr_fino->bload, btr_fino);
> +	if (error)
> +		do_error(
> +_("Error %d while creating finobt btree for AG %u.\n"), error, agno);
> +
> +	/* Since we're not writing the AGI yet, no need to commit the cursor */
> +	libxfs_btree_del_cursor(btr_fino->cur, 0);
> +}
> diff --git a/repair/agbtree.h b/repair/agbtree.h
> index 63352247..3cad2a8e 100644
> --- a/repair/agbtree.h
> +++ b/repair/agbtree.h
> @@ -24,6 +24,12 @@ struct bt_rebuild {
>  			struct extent_tree_node	*bno_rec;
>  			unsigned int		freeblks;
>  		};
> +		struct {
> +			struct ino_tree_node	*ino_rec;
> +			xfs_agino_t		first_agino;
> +			xfs_agino_t		count;
> +			xfs_agino_t		freecount;
> +		};
>  	};
>  };
>  
> @@ -36,4 +42,11 @@ void init_freespace_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
>  void build_freespace_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
>  		struct bt_rebuild *btr_bno, struct bt_rebuild *btr_cnt);
>  
> +void init_ino_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
> +		unsigned int free_space, uint64_t *num_inos,
> +		uint64_t *num_free_inos, struct bt_rebuild *btr_ino,
> +		struct bt_rebuild *btr_fino);
> +void build_inode_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
> +		struct bt_rebuild *btr_ino, struct bt_rebuild *btr_fino);
> +
>  #endif /* __XFS_REPAIR_AG_BTREE_H__ */
> diff --git a/repair/phase5.c b/repair/phase5.c
> index a93d900d..e570349d 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -67,15 +67,6 @@ typedef struct bt_status  {
>  	uint64_t		owner;		/* owner */
>  } bt_status_t;
>  
> -/*
> - * extra metadata for the agi
> - */
> -struct agi_stat {
> -	xfs_agino_t		first_agino;
> -	xfs_agino_t		count;
> -	xfs_agino_t		freecount;
> -};
> -
>  static uint64_t	*sb_icount_ag;		/* allocated inodes per ag */
>  static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
>  static uint64_t	*sb_fdblocks_ag;	/* free data blocks per ag */
> @@ -369,229 +360,20 @@ btnum_to_ops(
>  	}
>  }
>  
> -/*
> - * XXX(hch): any reason we don't just look at mp->m_inobt_mxr?
> - */
> -#define XR_INOBT_BLOCK_MAXRECS(mp, level) \
> -			libxfs_inobt_maxrecs((mp), (mp)->m_sb.sb_blocksize, \
> -						(level) == 0)
> -
> -/*
> - * we don't have to worry here about how chewing up free extents
> - * may perturb things because inode tree building happens before
> - * freespace tree building.
> - */
> -static void
> -init_ino_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
> -		uint64_t *num_inos, uint64_t *num_free_inos, int finobt)
> -{
> -	uint64_t		ninos;
> -	uint64_t		nfinos;
> -	int			rec_nfinos;
> -	int			rec_ninos;
> -	ino_tree_node_t		*ino_rec;
> -	int			num_recs;
> -	int			level;
> -	bt_stat_level_t		*lptr;
> -	bt_stat_level_t		*p_lptr;
> -	xfs_extlen_t		blocks_allocated;
> -	int			i;
> -
> -	*num_inos = *num_free_inos = 0;
> -	ninos = nfinos = 0;
> -
> -	lptr = &btree_curs->level[0];
> -	btree_curs->init = 1;
> -	btree_curs->owner = XFS_RMAP_OWN_INOBT;
> -
> -	/*
> -	 * build up statistics
> -	 */
> -	ino_rec = findfirst_inode_rec(agno);
> -	for (num_recs = 0; ino_rec != NULL; ino_rec = next_ino_rec(ino_rec))  {
> -		rec_ninos = 0;
> -		rec_nfinos = 0;
> -		for (i = 0; i < XFS_INODES_PER_CHUNK; i++)  {
> -			ASSERT(is_inode_confirmed(ino_rec, i));
> -			/*
> -			 * sparse inodes are not factored into superblock (free)
> -			 * inode counts
> -			 */
> -			if (is_inode_sparse(ino_rec, i))
> -				continue;
> -			if (is_inode_free(ino_rec, i))
> -				rec_nfinos++;
> -			rec_ninos++;
> -		}
> -
> -		/*
> -		 * finobt only considers records with free inodes
> -		 */
> -		if (finobt && !rec_nfinos)
> -			continue;
> -
> -		nfinos += rec_nfinos;
> -		ninos += rec_ninos;
> -		num_recs++;
> -	}
> -
> -	if (num_recs == 0) {
> -		/*
> -		 * easy corner-case -- no inode records
> -		 */
> -		lptr->num_blocks = 1;
> -		lptr->modulo = 0;
> -		lptr->num_recs_pb = 0;
> -		lptr->num_recs_tot = 0;
> -
> -		btree_curs->num_levels = 1;
> -		btree_curs->num_tot_blocks = btree_curs->num_free_blocks = 1;
> -
> -		setup_cursor(mp, agno, btree_curs);
> -
> -		return;
> -	}
> -
> -	blocks_allocated = lptr->num_blocks = howmany(num_recs,
> -					XR_INOBT_BLOCK_MAXRECS(mp, 0));
> -
> -	lptr->modulo = num_recs % lptr->num_blocks;
> -	lptr->num_recs_pb = num_recs / lptr->num_blocks;
> -	lptr->num_recs_tot = num_recs;
> -	level = 1;
> -
> -	if (lptr->num_blocks > 1)  {
> -		for (; btree_curs->level[level-1].num_blocks > 1
> -				&& level < XFS_BTREE_MAXLEVELS;
> -				level++)  {
> -			lptr = &btree_curs->level[level];
> -			p_lptr = &btree_curs->level[level - 1];
> -			lptr->num_blocks = howmany(p_lptr->num_blocks,
> -				XR_INOBT_BLOCK_MAXRECS(mp, level));
> -			lptr->modulo = p_lptr->num_blocks % lptr->num_blocks;
> -			lptr->num_recs_pb = p_lptr->num_blocks
> -					/ lptr->num_blocks;
> -			lptr->num_recs_tot = p_lptr->num_blocks;
> -
> -			blocks_allocated += lptr->num_blocks;
> -		}
> -	}
> -	ASSERT(lptr->num_blocks == 1);
> -	btree_curs->num_levels = level;
> -
> -	btree_curs->num_tot_blocks = btree_curs->num_free_blocks
> -			= blocks_allocated;
> -
> -	setup_cursor(mp, agno, btree_curs);
> -
> -	*num_inos = ninos;
> -	*num_free_inos = nfinos;
> -
> -	return;
> -}
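The modulo bookkeeping that permeated the deleted cursor code distributes records as evenly as possible across a level's blocks: the first (nrecs % nblocks) blocks each carry one extra record, which is what the num_recs_pb/modulo pair encoded. A minimal sketch (the function is hypothetical, not part of repair):

```c
#include <assert.h>

/* How many records land in block number blockno (0-based) when
 * nrecs records are spread across nblocks blocks, old-cursor style:
 * the first (nrecs % nblocks) blocks get one extra. */
static unsigned int recs_in_block(unsigned int nrecs, unsigned int nblocks,
				  unsigned int blockno)
{
	unsigned int per_block = nrecs / nblocks;
	unsigned int modulo = nrecs % nblocks;

	return per_block + (blockno < modulo ? 1 : 0);
}
```

The bulk loader replaces all of this with libxfs_btree_bload_compute_geometry(), which also lets the caller tune how full each block should be.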
> -
> -static void
> -prop_ino_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
> -	xfs_btnum_t btnum, xfs_agino_t startino, int level)
> -{
> -	struct xfs_btree_block	*bt_hdr;
> -	xfs_inobt_key_t		*bt_key;
> -	xfs_inobt_ptr_t		*bt_ptr;
> -	xfs_agblock_t		agbno;
> -	bt_stat_level_t		*lptr;
> -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> -	int			error;
> -
> -	level++;
> -
> -	if (level >= btree_curs->num_levels)
> -		return;
> -
> -	lptr = &btree_curs->level[level];
> -	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> -
> -	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
> -		/*
> -		 * this only happens once to initialize the
> -		 * first path up the left side of the tree
> -		 * where the agbno's are already set up
> -		 */
> -		prop_ino_cursor(mp, agno, btree_curs, btnum, startino, level);
> -	}
> -
> -	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
> -				lptr->num_recs_pb + (lptr->modulo > 0))  {
> -		/*
> -		 * write out current prev block, grab us a new block,
> -		 * and set the rightsib pointer of current block
> -		 */
> -#ifdef XR_BLD_INO_TRACE
> -		fprintf(stderr, " ino prop agbno %d ", lptr->prev_agbno);
> -#endif
> -		if (lptr->prev_agbno != NULLAGBLOCK)  {
> -			ASSERT(lptr->prev_buf_p != NULL);
> -			libxfs_buf_mark_dirty(lptr->prev_buf_p);
> -			libxfs_buf_relse(lptr->prev_buf_p);
> -		}
> -		lptr->prev_agbno = lptr->agbno;;
> -		lptr->prev_buf_p = lptr->buf_p;
> -		agbno = get_next_blockaddr(agno, level, btree_curs);
> -
> -		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
> -
> -		error = -libxfs_buf_get(mp->m_dev,
> -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> -		if (error)
> -			do_error(_("Cannot grab inode btree buffer, err=%d"),
> -					error);
> -		lptr->agbno = agbno;
> -
> -		if (lptr->modulo)
> -			lptr->modulo--;
> -
> -		/*
> -		 * initialize block header
> -		 */
> -		lptr->buf_p->b_ops = ops;
> -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> -		libxfs_btree_init_block(mp, lptr->buf_p, btnum,
> -					level, 0, agno);
> -
> -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> -
> -		/*
> -		 * propagate extent record for first extent in new block up
> -		 */
> -		prop_ino_cursor(mp, agno, btree_curs, btnum, startino, level);
> -	}
> -	/*
> -	 * add inode info to current block
> -	 */
> -	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
> -
> -	bt_key = XFS_INOBT_KEY_ADDR(mp, bt_hdr,
> -				    be16_to_cpu(bt_hdr->bb_numrecs));
> -	bt_ptr = XFS_INOBT_PTR_ADDR(mp, bt_hdr,
> -				    be16_to_cpu(bt_hdr->bb_numrecs),
> -				    M_IGEO(mp)->inobt_mxr[1]);
> -
> -	bt_key->ir_startino = cpu_to_be32(startino);
> -	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
> -}
> -
>  /*
>   * XXX: yet more code that can be shared with mkfs, growfs.
>   */
>  static void
> -build_agi(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
> -		bt_status_t *finobt_curs, struct agi_stat *agi_stat)
> +build_agi(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno,
> +	struct bt_rebuild	*btr_ino,
> +	struct bt_rebuild	*btr_fino)
>  {
> -	xfs_buf_t	*agi_buf;
> -	xfs_agi_t	*agi;
> -	int		i;
> -	int		error;
> +	struct xfs_buf		*agi_buf;
> +	struct xfs_agi		*agi;
> +	int			i;
> +	int			error;
>  
>  	error = -libxfs_buf_get(mp->m_dev,
>  			XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
> @@ -611,11 +393,11 @@ build_agi(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
>  	else
>  		agi->agi_length = cpu_to_be32(mp->m_sb.sb_dblocks -
>  			(xfs_rfsblock_t) mp->m_sb.sb_agblocks * agno);
> -	agi->agi_count = cpu_to_be32(agi_stat->count);
> -	agi->agi_root = cpu_to_be32(btree_curs->root);
> -	agi->agi_level = cpu_to_be32(btree_curs->num_levels);
> -	agi->agi_freecount = cpu_to_be32(agi_stat->freecount);
> -	agi->agi_newino = cpu_to_be32(agi_stat->first_agino);
> +	agi->agi_count = cpu_to_be32(btr_ino->count);
> +	agi->agi_root = cpu_to_be32(btr_ino->newbt.afake.af_root);
> +	agi->agi_level = cpu_to_be32(btr_ino->newbt.afake.af_levels);
> +	agi->agi_freecount = cpu_to_be32(btr_ino->freecount);
> +	agi->agi_newino = cpu_to_be32(btr_ino->first_agino);
>  	agi->agi_dirino = cpu_to_be32(NULLAGINO);
>  
>  	for (i = 0; i < XFS_AGI_UNLINKED_BUCKETS; i++)
> @@ -625,203 +407,16 @@ build_agi(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
>  		platform_uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
>  
>  	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> -		agi->agi_free_root = cpu_to_be32(finobt_curs->root);
> -		agi->agi_free_level = cpu_to_be32(finobt_curs->num_levels);
> +		agi->agi_free_root =
> +				cpu_to_be32(btr_fino->newbt.afake.af_root);
> +		agi->agi_free_level =
> +				cpu_to_be32(btr_fino->newbt.afake.af_levels);
>  	}
>  
>  	libxfs_buf_mark_dirty(agi_buf);
>  	libxfs_buf_relse(agi_buf);
>  }
>  
> -/*
> - * rebuilds an inode tree given a cursor.  We're lazy here and call
> - * the routine that builds the agi
> - */
> -static void
> -build_ino_tree(xfs_mount_t *mp, xfs_agnumber_t agno,
> -		bt_status_t *btree_curs, xfs_btnum_t btnum,
> -		struct agi_stat *agi_stat)
> -{
> -	xfs_agnumber_t		i;
> -	xfs_agblock_t		j;
> -	xfs_agblock_t		agbno;
> -	xfs_agino_t		first_agino;
> -	struct xfs_btree_block	*bt_hdr;
> -	xfs_inobt_rec_t		*bt_rec;
> -	ino_tree_node_t		*ino_rec;
> -	bt_stat_level_t		*lptr;
> -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> -	xfs_agino_t		count = 0;
> -	xfs_agino_t		freecount = 0;
> -	int			inocnt;
> -	uint8_t			finocnt;
> -	int			k;
> -	int			level = btree_curs->num_levels;
> -	int			spmask;
> -	uint64_t		sparse;
> -	uint16_t		holemask;
> -	int			error;
> -
> -	ASSERT(btnum == XFS_BTNUM_INO || btnum == XFS_BTNUM_FINO);
> -
> -	for (i = 0; i < level; i++)  {
> -		lptr = &btree_curs->level[i];
> -
> -		agbno = get_next_blockaddr(agno, i, btree_curs);
> -		error = -libxfs_buf_get(mp->m_dev,
> -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> -		if (error)
> -			do_error(_("Cannot grab inode btree buffer, err=%d"),
> -					error);
> -
> -		if (i == btree_curs->num_levels - 1)
> -			btree_curs->root = agbno;
> -
> -		lptr->agbno = agbno;
> -		lptr->prev_agbno = NULLAGBLOCK;
> -		lptr->prev_buf_p = NULL;
> -		/*
> -		 * initialize block header
> -		 */
> -
> -		lptr->buf_p->b_ops = ops;
> -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, i, 0, agno);
> -	}
> -
> -	/*
> -	 * run along leaf, setting up records.  as we have to switch
> -	 * blocks, call the prop_ino_cursor routine to set up the new
> -	 * pointers for the parent.  that can recurse up to the root
> -	 * if required.  set the sibling pointers for leaf level here.
> -	 */
> -	if (btnum == XFS_BTNUM_FINO)
> -		ino_rec = findfirst_free_inode_rec(agno);
> -	else
> -		ino_rec = findfirst_inode_rec(agno);
> -
> -	if (ino_rec != NULL)
> -		first_agino = ino_rec->ino_startnum;
> -	else
> -		first_agino = NULLAGINO;
> -
> -	lptr = &btree_curs->level[0];
> -
> -	for (i = 0; i < lptr->num_blocks; i++)  {
> -		/*
> -		 * block initialization, lay in block header
> -		 */
> -		lptr->buf_p->b_ops = ops;
> -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, 0, 0, agno);
> -
> -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> -		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
> -							(lptr->modulo > 0));
> -
> -		if (lptr->modulo > 0)
> -			lptr->modulo--;
> -
> -		if (lptr->num_recs_pb > 0)
> -			prop_ino_cursor(mp, agno, btree_curs, btnum,
> -					ino_rec->ino_startnum, 0);
> -
> -		bt_rec = (xfs_inobt_rec_t *)
> -			  ((char *)bt_hdr + XFS_INOBT_BLOCK_LEN(mp));
> -		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
> -			ASSERT(ino_rec != NULL);
> -			bt_rec[j].ir_startino =
> -					cpu_to_be32(ino_rec->ino_startnum);
> -			bt_rec[j].ir_free = cpu_to_be64(ino_rec->ir_free);
> -
> -			inocnt = finocnt = 0;
> -			for (k = 0; k < sizeof(xfs_inofree_t)*NBBY; k++)  {
> -				ASSERT(is_inode_confirmed(ino_rec, k));
> -
> -				if (is_inode_sparse(ino_rec, k))
> -					continue;
> -				if (is_inode_free(ino_rec, k))
> -					finocnt++;
> -				inocnt++;
> -			}
> -
> -			/*
> -			 * Set the freecount and check whether we need to update
> -			 * the sparse format fields. Otherwise, skip to the next
> -			 * record.
> -			 */
> -			inorec_set_freecount(mp, &bt_rec[j], finocnt);
> -			if (!xfs_sb_version_hassparseinodes(&mp->m_sb))
> -				goto nextrec;
> -
> -			/*
> -			 * Convert the 64-bit in-core sparse inode state to the
> -			 * 16-bit on-disk holemask.
> -			 */
> -			holemask = 0;
> -			spmask = (1 << XFS_INODES_PER_HOLEMASK_BIT) - 1;
> -			sparse = ino_rec->ir_sparse;
> -			for (k = 0; k < XFS_INOBT_HOLEMASK_BITS; k++) {
> -				if (sparse & spmask) {
> -					ASSERT((sparse & spmask) == spmask);
> -					holemask |= (1 << k);
> -				} else
> -					ASSERT((sparse & spmask) == 0);
> -				sparse >>= XFS_INODES_PER_HOLEMASK_BIT;
> -			}
> -
> -			bt_rec[j].ir_u.sp.ir_count = inocnt;
> -			bt_rec[j].ir_u.sp.ir_holemask = cpu_to_be16(holemask);
> -
> -nextrec:
> -			freecount += finocnt;
> -			count += inocnt;
> -
> -			if (btnum == XFS_BTNUM_FINO)
> -				ino_rec = next_free_ino_rec(ino_rec);
> -			else
> -				ino_rec = next_ino_rec(ino_rec);
> -		}
> -
> -		if (ino_rec != NULL)  {
> -			/*
> -			 * get next leaf level block
> -			 */
> -			if (lptr->prev_buf_p != NULL)  {
> -#ifdef XR_BLD_INO_TRACE
> -				fprintf(stderr, "writing inobt agbno %u\n",
> -					lptr->prev_agbno);
> -#endif
> -				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
> -				libxfs_buf_mark_dirty(lptr->prev_buf_p);
> -				libxfs_buf_relse(lptr->prev_buf_p);
> -			}
> -			lptr->prev_buf_p = lptr->buf_p;
> -			lptr->prev_agbno = lptr->agbno;
> -			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
> -			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
> -
> -			error = -libxfs_buf_get(mp->m_dev,
> -					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
> -					XFS_FSB_TO_BB(mp, 1),
> -					&lptr->buf_p);
> -			if (error)
> -				do_error(
> -	_("Cannot grab inode btree buffer, err=%d"),
> -						error);
> -		}
> -	}
> -
> -	if (agi_stat) {
> -		agi_stat->first_agino = first_agino;
> -		agi_stat->count = count;
> -		agi_stat->freecount = freecount;
> -	}
> -}
> -
>  /* rebuild the rmap tree */
>  
>  /*
> @@ -1744,15 +1339,10 @@ phase5_func(
>  	struct xfs_slab		*lost_fsb)
>  {
>  	struct repair_ctx	sc = { .mp = mp, };
> -	struct agi_stat		agi_stat = {0,};
> -	uint64_t		num_inos;
> -	uint64_t		num_free_inos;
> -	uint64_t		finobt_num_inos;
> -	uint64_t		finobt_num_free_inos;
>  	struct bt_rebuild	btr_bno;
>  	struct bt_rebuild	btr_cnt;
> -	bt_status_t		ino_btree_curs;
> -	bt_status_t		fino_btree_curs;
> +	struct bt_rebuild	btr_ino;
> +	struct bt_rebuild	btr_fino;
>  	bt_status_t		rmap_btree_curs;
>  	bt_status_t		refcnt_btree_curs;
>  	int			extra_blocks = 0;
> @@ -1785,19 +1375,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  			agno);
>  	}
>  
> -	/*
> -	 * ok, now set up the btree cursors for the on-disk btrees (includes
> -	 * pre-allocating all required blocks for the trees themselves)
> -	 */
> -	init_ino_cursor(mp, agno, &ino_btree_curs, &num_inos,
> -			&num_free_inos, 0);
> -
> -	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> -		init_ino_cursor(mp, agno, &fino_btree_curs, &finobt_num_inos,
> -				&finobt_num_free_inos, 1);
> -
> -	sb_icount_ag[agno] += num_inos;
> -	sb_ifree_ag[agno] += num_free_inos;
> +	init_ino_cursors(&sc, agno, num_freeblocks, &sb_icount_ag[agno],
> +			&sb_ifree_ag[agno], &btr_ino, &btr_fino);
>  
>  	/*
>  	 * Set up the btree cursors for the on-disk rmap btrees, which includes
> @@ -1886,36 +1465,23 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
>  			&refcnt_btree_curs, lost_fsb);
>  
> -	/*
> -	 * build inode allocation tree.
> -	 */
> -	build_ino_tree(mp, agno, &ino_btree_curs, XFS_BTNUM_INO, &agi_stat);
> -	write_cursor(&ino_btree_curs);
> -
> -	/*
> -	 * build free inode tree
> -	 */
> -	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> -		build_ino_tree(mp, agno, &fino_btree_curs,
> -				XFS_BTNUM_FINO, NULL);
> -		write_cursor(&fino_btree_curs);
> -	}
> +	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
>  
>  	/* build the agi */
> -	build_agi(mp, agno, &ino_btree_curs, &fino_btree_curs, &agi_stat);
> +	build_agi(mp, agno, &btr_ino, &btr_fino);
>  
>  	/*
>  	 * tear down cursors
>  	 */
>  	finish_rebuild(mp, &btr_bno, lost_fsb);
>  	finish_rebuild(mp, &btr_cnt, lost_fsb);
> -	finish_cursor(&ino_btree_curs);
> +	finish_rebuild(mp, &btr_ino, lost_fsb);
> +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> +		finish_rebuild(mp, &btr_fino, lost_fsb);
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
>  		finish_cursor(&rmap_btree_curs);
>  	if (xfs_sb_version_hasreflink(&mp->m_sb))
>  		finish_cursor(&refcnt_btree_curs);
> -	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> -		finish_cursor(&fino_btree_curs);
>  
>  	/*
>  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/12] xfs_repair: rebuild reverse mapping btrees with bulk loader
  2020-06-02  4:27 ` [PATCH 09/12] xfs_repair: rebuild reverse mapping " Darrick J. Wong
@ 2020-06-18 15:25   ` Brian Foster
  2020-06-18 15:31     ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Brian Foster @ 2020-06-18 15:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:27:51PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Use the btree bulk loading functions to rebuild the reverse mapping
> btrees and drop the open-coded implementation.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  libxfs/libxfs_api_defs.h |    1 
>  repair/agbtree.c         |   70 ++++++++
>  repair/agbtree.h         |    5 +
>  repair/phase5.c          |  409 ++--------------------------------------------
>  4 files changed, 96 insertions(+), 389 deletions(-)
> 
> 
...
> diff --git a/repair/phase5.c b/repair/phase5.c
> index e570349d..1c6448f4 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
...
> @@ -1244,6 +879,8 @@ build_agf_agfl(
>  	freelist = xfs_buf_to_agfl_bno(agfl_buf);
>  	fill_agfl(btr_bno, freelist, &agfl_idx);
>  	fill_agfl(btr_cnt, freelist, &agfl_idx);
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		fill_agfl(btr_rmap, freelist, &agfl_idx);

Is this new behavior? Either way, I guess it makes sense since the
rmapbt feeds from/to the agfl:

Reviewed-by: Brian Foster <bfoster@redhat.com>
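
The rmapbt-to-AGFL handoff discussed above boils down to copying each rebuilt tree's unused preallocated blocks onto the free list until it is full. A minimal sketch of that idea (the structure, block counts, and AGFL size here are simplified stand-ins, not xfs_repair's real types):

```c
#include <assert.h>
#include <stdint.h>

#define AGFL_SIZE	16	/* assumed small free list for the sketch */

/* Simplified stand-in for the leftover blocks of one rebuilt btree. */
struct bt_leftover {
	uint32_t	blocks[8];
	unsigned int	nr;
};

/*
 * Move as many leftover blocks as fit onto the AGFL, advancing *agfl_idx.
 * Mirrors the idea of fill_agfl(): each rebuilt tree donates its unused
 * preallocated blocks to the free list instead of leaking them.
 */
static void
fill_agfl_sketch(
	struct bt_leftover	*btr,
	uint32_t		*freelist,
	unsigned int		*agfl_idx)
{
	unsigned int		i;

	for (i = 0; i < btr->nr && *agfl_idx < AGFL_SIZE; i++)
		freelist[(*agfl_idx)++] = btr->blocks[i];
	/* blocks that do not fit would be freed back to the fs instead */
}
```

In the real code, anything that does not fit on the AGFL is handed back through finish_rebuild() rather than dropped.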

>  
>  	/* Set the AGF counters for the AGFL. */
>  	if (agfl_idx > 0) {
> @@ -1343,7 +980,7 @@ phase5_func(
>  	struct bt_rebuild	btr_cnt;
>  	struct bt_rebuild	btr_ino;
>  	struct bt_rebuild	btr_fino;
> -	bt_status_t		rmap_btree_curs;
> +	struct bt_rebuild	btr_rmap;
>  	bt_status_t		refcnt_btree_curs;
>  	int			extra_blocks = 0;
>  	uint			num_freeblocks;
> @@ -1378,11 +1015,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	init_ino_cursors(&sc, agno, num_freeblocks, &sb_icount_ag[agno],
>  			&sb_ifree_ag[agno], &btr_ino, &btr_fino);
>  
> -	/*
> -	 * Set up the btree cursors for the on-disk rmap btrees, which includes
> -	 * pre-allocating all required blocks.
> -	 */
> -	init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
> +	init_rmapbt_cursor(&sc, agno, num_freeblocks, &btr_rmap);
>  
>  	/*
>  	 * Set up the btree cursors for the on-disk refcount btrees,
> @@ -1448,10 +1081,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	ASSERT(btr_bno.freeblks == btr_cnt.freeblks);
>  
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> -		build_rmap_tree(mp, agno, &rmap_btree_curs);
> -		write_cursor(&rmap_btree_curs);
> -		sb_fdblocks_ag[agno] += (rmap_btree_curs.num_tot_blocks -
> -				rmap_btree_curs.num_free_blocks) - 1;
> +		build_rmap_tree(&sc, agno, &btr_rmap);
> +		sb_fdblocks_ag[agno] += btr_rmap.newbt.afake.af_blocks - 1;
>  	}
>  
>  	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> @@ -1462,7 +1093,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	/*
>  	 * set up agf and agfl
>  	 */
> -	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
> +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap,
>  			&refcnt_btree_curs, lost_fsb);
>  
>  	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
> @@ -1479,7 +1110,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	if (xfs_sb_version_hasfinobt(&mp->m_sb))
>  		finish_rebuild(mp, &btr_fino, lost_fsb);
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> -		finish_cursor(&rmap_btree_curs);
> +		finish_rebuild(mp, &btr_rmap, lost_fsb);
>  	if (xfs_sb_version_hasreflink(&mp->m_sb))
>  		finish_cursor(&refcnt_btree_curs);
>  
> 



* Re: [PATCH 10/12] xfs_repair: rebuild refcount btrees with bulk loader
  2020-06-02  4:27 ` [PATCH 10/12] xfs_repair: rebuild refcount " Darrick J. Wong
@ 2020-06-18 15:26   ` Brian Foster
  2020-06-18 16:56     ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Brian Foster @ 2020-06-18 15:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:27:57PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Use the btree bulk loading functions to rebuild the refcount btrees
> and drop the open-coded implementation.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  libxfs/libxfs_api_defs.h |    1 
>  repair/agbtree.c         |   71 ++++++++++
>  repair/agbtree.h         |    5 +
>  repair/phase5.c          |  341 ++--------------------------------------------
>  4 files changed, 93 insertions(+), 325 deletions(-)
> 
> 
...
> diff --git a/repair/phase5.c b/repair/phase5.c
> index 1c6448f4..ad009416 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
...
> @@ -817,10 +510,14 @@ build_agf_agfl(
>  				cpu_to_be32(btr_rmap->newbt.afake.af_blocks);
>  	}
>  
> -	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
> -	agf->agf_refcount_level = cpu_to_be32(refcnt_bt->num_levels);
> -	agf->agf_refcount_blocks = cpu_to_be32(refcnt_bt->num_tot_blocks -
> -			refcnt_bt->num_free_blocks);
> +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> +		agf->agf_refcount_root =
> +				cpu_to_be32(btr_refc->newbt.afake.af_root);
> +		agf->agf_refcount_level =
> +				cpu_to_be32(btr_refc->newbt.afake.af_levels);
> +		agf->agf_refcount_blocks =
> +				cpu_to_be32(btr_refc->newbt.afake.af_blocks);
> +	}

It looks like the previous cursor variant (refcnt_bt) would be zeroed
out if the feature isn't enabled (causing this to zero out the agf
fields on disk), whereas now we only write the fields when the feature
is enabled. Any concern over removing that zeroing behavior? Also note
that an assert further down unconditionally reads the
->agf_refcount_root field.

BTW, I suppose the same question may apply to the previous patch as
well...

Brian
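
One way to address the concern above, keeping the old zeroing behavior while still using the staged values only when reflink is enabled, looks roughly like this (the struct and function names are illustrative, not the on-disk struct xfs_agf or real xfs_repair code):

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the refcount fields of the on-disk AGF. */
struct fake_agf {
	uint32_t refcount_root;
	uint32_t refcount_level;
	uint32_t refcount_blocks;
};

/*
 * Write the staged refcount btree geometry when the feature is enabled,
 * and explicitly zero the fields otherwise, preserving the behavior of
 * the old zero-initialized cursor.
 */
static void
set_refcount_fields(struct fake_agf *agf, int has_reflink,
		uint32_t root, uint32_t levels, uint32_t blocks)
{
	if (has_reflink) {
		agf->refcount_root = root;
		agf->refcount_level = levels;
		agf->refcount_blocks = blocks;
	} else {
		/* keep the old zeroing behavior when reflink is disabled */
		agf->refcount_root = 0;
		agf->refcount_level = 0;
		agf->refcount_blocks = 0;
	}
}
```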

>  
>  	/*
>  	 * Count and record the number of btree blocks consumed if required.
> @@ -981,7 +678,7 @@ phase5_func(
>  	struct bt_rebuild	btr_ino;
>  	struct bt_rebuild	btr_fino;
>  	struct bt_rebuild	btr_rmap;
> -	bt_status_t		refcnt_btree_curs;
> +	struct bt_rebuild	btr_refc;
>  	int			extra_blocks = 0;
>  	uint			num_freeblocks;
>  	xfs_agblock_t		num_extents;
> @@ -1017,11 +714,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  
>  	init_rmapbt_cursor(&sc, agno, num_freeblocks, &btr_rmap);
>  
> -	/*
> -	 * Set up the btree cursors for the on-disk refcount btrees,
> -	 * which includes pre-allocating all required blocks.
> -	 */
> -	init_refc_cursor(mp, agno, &refcnt_btree_curs);
> +	init_refc_cursor(&sc, agno, num_freeblocks, &btr_refc);
>  
>  	num_extents = count_bno_extents_blocks(agno, &num_freeblocks);
>  	/*
> @@ -1085,16 +778,14 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  		sb_fdblocks_ag[agno] += btr_rmap.newbt.afake.af_blocks - 1;
>  	}
>  
> -	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> -		build_refcount_tree(mp, agno, &refcnt_btree_curs);
> -		write_cursor(&refcnt_btree_curs);
> -	}
> +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> +		build_refcount_tree(&sc, agno, &btr_refc);
>  
>  	/*
>  	 * set up agf and agfl
>  	 */
> -	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap,
> -			&refcnt_btree_curs, lost_fsb);
> +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap, &btr_refc,
> +			lost_fsb);
>  
>  	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
>  
> @@ -1112,7 +803,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
>  		finish_rebuild(mp, &btr_rmap, lost_fsb);
>  	if (xfs_sb_version_hasreflink(&mp->m_sb))
> -		finish_cursor(&refcnt_btree_curs);
> +		finish_rebuild(mp, &btr_refc, lost_fsb);
>  
>  	/*
>  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> 



* Re: [PATCH 09/12] xfs_repair: rebuild reverse mapping btrees with bulk loader
  2020-06-18 15:25   ` Brian Foster
@ 2020-06-18 15:31     ` Darrick J. Wong
  2020-06-18 15:37       ` Brian Foster
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-18 15:31 UTC (permalink / raw)
  To: Brian Foster; +Cc: sandeen, linux-xfs

On Thu, Jun 18, 2020 at 11:25:11AM -0400, Brian Foster wrote:
> On Mon, Jun 01, 2020 at 09:27:51PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use the btree bulk loading functions to rebuild the reverse mapping
> > btrees and drop the open-coded implementation.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  libxfs/libxfs_api_defs.h |    1 
> >  repair/agbtree.c         |   70 ++++++++
> >  repair/agbtree.h         |    5 +
> >  repair/phase5.c          |  409 ++--------------------------------------------
> >  4 files changed, 96 insertions(+), 389 deletions(-)
> > 
> > 
> ...
> > diff --git a/repair/phase5.c b/repair/phase5.c
> > index e570349d..1c6448f4 100644
> > --- a/repair/phase5.c
> > +++ b/repair/phase5.c
> ...
> > @@ -1244,6 +879,8 @@ build_agf_agfl(
> >  	freelist = xfs_buf_to_agfl_bno(agfl_buf);
> >  	fill_agfl(btr_bno, freelist, &agfl_idx);
> >  	fill_agfl(btr_cnt, freelist, &agfl_idx);
> > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > +		fill_agfl(btr_rmap, freelist, &agfl_idx);
> 
> Is this new behavior? Either way, I guess it makes sense since the
> rmapbt feeds from/to the agfl:

It's a defensive move to make sure we don't lose the blocks if we
overestimate the size of the rmapbt.  We never did in the past (and we
shouldn't now) but I figured I should throw that in as a defensive
measure so we don't leak the blocks if something goes wrong.

(Granted, I think in the past any overages would have been freed back
into the filesystem...)

Thanks for the review.

--D
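
The defensive flow described here, reserve an estimated number of blocks, then route any overage to the AGFL first and free whatever remains, can be sketched as follows (the sizes and names are made up for illustration and do not match the real bt_rebuild machinery):

```c
#include <assert.h>

enum { AGFL_CAP = 4 };	/* assumed tiny AGFL for the sketch */

struct rebuild_sketch {
	unsigned int reserved;	/* blocks preallocated up front */
	unsigned int used;	/* blocks the final btree consumed */
};

/*
 * Decide where an overestimate goes: as many leftover blocks as fit are
 * stashed on the AGFL; anything beyond that would be freed back to the
 * filesystem.  Returns the number of blocks freed back.
 */
static unsigned int
dispose_overage(struct rebuild_sketch *btr, unsigned int *agfl_used)
{
	unsigned int overage = btr->reserved - btr->used;
	unsigned int to_agfl = overage;

	if (*agfl_used + to_agfl > AGFL_CAP)
		to_agfl = AGFL_CAP - *agfl_used;
	*agfl_used += to_agfl;
	return overage - to_agfl;
}
```

Either way, no block is leaked: it ends up in the new tree, on the AGFL, or back in the free space.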

> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  
> >  	/* Set the AGF counters for the AGFL. */
> >  	if (agfl_idx > 0) {
> > @@ -1343,7 +980,7 @@ phase5_func(
> >  	struct bt_rebuild	btr_cnt;
> >  	struct bt_rebuild	btr_ino;
> >  	struct bt_rebuild	btr_fino;
> > -	bt_status_t		rmap_btree_curs;
> > +	struct bt_rebuild	btr_rmap;
> >  	bt_status_t		refcnt_btree_curs;
> >  	int			extra_blocks = 0;
> >  	uint			num_freeblocks;
> > @@ -1378,11 +1015,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	init_ino_cursors(&sc, agno, num_freeblocks, &sb_icount_ag[agno],
> >  			&sb_ifree_ag[agno], &btr_ino, &btr_fino);
> >  
> > -	/*
> > -	 * Set up the btree cursors for the on-disk rmap btrees, which includes
> > -	 * pre-allocating all required blocks.
> > -	 */
> > -	init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
> > +	init_rmapbt_cursor(&sc, agno, num_freeblocks, &btr_rmap);
> >  
> >  	/*
> >  	 * Set up the btree cursors for the on-disk refcount btrees,
> > @@ -1448,10 +1081,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	ASSERT(btr_bno.freeblks == btr_cnt.freeblks);
> >  
> >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > -		build_rmap_tree(mp, agno, &rmap_btree_curs);
> > -		write_cursor(&rmap_btree_curs);
> > -		sb_fdblocks_ag[agno] += (rmap_btree_curs.num_tot_blocks -
> > -				rmap_btree_curs.num_free_blocks) - 1;
> > +		build_rmap_tree(&sc, agno, &btr_rmap);
> > +		sb_fdblocks_ag[agno] += btr_rmap.newbt.afake.af_blocks - 1;
> >  	}
> >  
> >  	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > @@ -1462,7 +1093,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	/*
> >  	 * set up agf and agfl
> >  	 */
> > -	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
> > +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap,
> >  			&refcnt_btree_curs, lost_fsb);
> >  
> >  	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
> > @@ -1479,7 +1110,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> >  		finish_rebuild(mp, &btr_fino, lost_fsb);
> >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > -		finish_cursor(&rmap_btree_curs);
> > +		finish_rebuild(mp, &btr_rmap, lost_fsb);
> >  	if (xfs_sb_version_hasreflink(&mp->m_sb))
> >  		finish_cursor(&refcnt_btree_curs);
> >  
> > 
> 


* Re: [PATCH 09/12] xfs_repair: rebuild reverse mapping btrees with bulk loader
  2020-06-18 15:31     ` Darrick J. Wong
@ 2020-06-18 15:37       ` Brian Foster
  2020-06-18 16:54         ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Brian Foster @ 2020-06-18 15:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Thu, Jun 18, 2020 at 08:31:00AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 18, 2020 at 11:25:11AM -0400, Brian Foster wrote:
> > On Mon, Jun 01, 2020 at 09:27:51PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Use the btree bulk loading functions to rebuild the reverse mapping
> > > btrees and drop the open-coded implementation.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  libxfs/libxfs_api_defs.h |    1 
> > >  repair/agbtree.c         |   70 ++++++++
> > >  repair/agbtree.h         |    5 +
> > >  repair/phase5.c          |  409 ++--------------------------------------------
> > >  4 files changed, 96 insertions(+), 389 deletions(-)
> > > 
> > > 
> > ...
> > > diff --git a/repair/phase5.c b/repair/phase5.c
> > > index e570349d..1c6448f4 100644
> > > --- a/repair/phase5.c
> > > +++ b/repair/phase5.c
> > ...
> > > @@ -1244,6 +879,8 @@ build_agf_agfl(
> > >  	freelist = xfs_buf_to_agfl_bno(agfl_buf);
> > >  	fill_agfl(btr_bno, freelist, &agfl_idx);
> > >  	fill_agfl(btr_cnt, freelist, &agfl_idx);
> > > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > +		fill_agfl(btr_rmap, freelist, &agfl_idx);
> > 
> > Is this new behavior? Either way, I guess it makes sense since the
> > rmapbt feeds from/to the agfl:
> 
> It's a defensive move to make sure we don't lose the blocks if we
> overestimate the size of the rmapbt.  We never did in the past (and we
> shouldn't now) but I figured I should throw that in as a defensive
> measure so we don't leak the blocks if something goes wrong.
> 
> (Granted, I think in the past any overages would have been freed back
> into the filesystem...)
> 

I thought that was still the case since finish_rebuild() moves any
unused blocks over to the lost_fsb slab, which is why I was asking about
the agfl filling specifically.

Brian
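
The lost_fsb slab behavior referenced here amounts to appending each unused block to a list that a later phase frees back to the filesystem. A toy fixed-size version of that bookkeeping (the real code uses a growable slab keyed by fsblock, not a fixed array):

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the lost_fsb slab: a bounded list of fsblock numbers. */
struct lost_block_list {
	uint64_t	fsbs[32];
	unsigned int	nr;
};

/*
 * Record one block that the rebuild reserved but never used, so that a
 * later pass can free it.  Returns 0 on success, -1 when the toy list
 * is full (the real slab would grow instead).
 */
static int
stash_lost_block(struct lost_block_list *lost, uint64_t fsb)
{
	if (lost->nr >= 32)
		return -1;
	lost->fsbs[lost->nr++] = fsb;
	return 0;
}
```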

> Thanks for the review.
> 
> --D
> 
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > 
> > >  
> > >  	/* Set the AGF counters for the AGFL. */
> > >  	if (agfl_idx > 0) {
> > > @@ -1343,7 +980,7 @@ phase5_func(
> > >  	struct bt_rebuild	btr_cnt;
> > >  	struct bt_rebuild	btr_ino;
> > >  	struct bt_rebuild	btr_fino;
> > > -	bt_status_t		rmap_btree_curs;
> > > +	struct bt_rebuild	btr_rmap;
> > >  	bt_status_t		refcnt_btree_curs;
> > >  	int			extra_blocks = 0;
> > >  	uint			num_freeblocks;
> > > @@ -1378,11 +1015,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  	init_ino_cursors(&sc, agno, num_freeblocks, &sb_icount_ag[agno],
> > >  			&sb_ifree_ag[agno], &btr_ino, &btr_fino);
> > >  
> > > -	/*
> > > -	 * Set up the btree cursors for the on-disk rmap btrees, which includes
> > > -	 * pre-allocating all required blocks.
> > > -	 */
> > > -	init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
> > > +	init_rmapbt_cursor(&sc, agno, num_freeblocks, &btr_rmap);
> > >  
> > >  	/*
> > >  	 * Set up the btree cursors for the on-disk refcount btrees,
> > > @@ -1448,10 +1081,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  	ASSERT(btr_bno.freeblks == btr_cnt.freeblks);
> > >  
> > >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > > -		build_rmap_tree(mp, agno, &rmap_btree_curs);
> > > -		write_cursor(&rmap_btree_curs);
> > > -		sb_fdblocks_ag[agno] += (rmap_btree_curs.num_tot_blocks -
> > > -				rmap_btree_curs.num_free_blocks) - 1;
> > > +		build_rmap_tree(&sc, agno, &btr_rmap);
> > > +		sb_fdblocks_ag[agno] += btr_rmap.newbt.afake.af_blocks - 1;
> > >  	}
> > >  
> > >  	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > @@ -1462,7 +1093,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  	/*
> > >  	 * set up agf and agfl
> > >  	 */
> > > -	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
> > > +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap,
> > >  			&refcnt_btree_curs, lost_fsb);
> > >  
> > >  	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
> > > @@ -1479,7 +1110,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > >  		finish_rebuild(mp, &btr_fino, lost_fsb);
> > >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > -		finish_cursor(&rmap_btree_curs);
> > > +		finish_rebuild(mp, &btr_rmap, lost_fsb);
> > >  	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > >  		finish_cursor(&refcnt_btree_curs);
> > >  
> > > 
> > 
> 



* Re: [PATCH 07/12] xfs_repair: rebuild free space btrees with bulk loader
  2020-06-18 15:23   ` Brian Foster
@ 2020-06-18 16:41     ` Darrick J. Wong
  2020-06-18 16:51       ` Brian Foster
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-18 16:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: sandeen, linux-xfs

On Thu, Jun 18, 2020 at 11:23:40AM -0400, Brian Foster wrote:
> On Mon, Jun 01, 2020 at 09:27:38PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use the btree bulk loading functions to rebuild the free space btrees
> > and drop the open-coded implementation.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  libxfs/libxfs_api_defs.h |    3 
> >  repair/agbtree.c         |  158 ++++++++++
> >  repair/agbtree.h         |   10 +
> >  repair/phase5.c          |  703 ++++------------------------------------------
> >  4 files changed, 236 insertions(+), 638 deletions(-)
> > 
> > 
> ...
> > diff --git a/repair/agbtree.c b/repair/agbtree.c
> > index e4179a44..3b8ab47c 100644
> > --- a/repair/agbtree.c
> > +++ b/repair/agbtree.c
> > @@ -150,3 +150,161 @@ _("Insufficient memory saving lost blocks.\n"));
> >  
> >  	bulkload_destroy(&btr->newbt, 0);
> >  }
> ...
> > +/*
> > + * Return the next free space extent tree record from the previous value we
> > + * saw.
> > + */
> > +static inline struct extent_tree_node *
> > +get_bno_rec(
> > +	struct xfs_btree_cur	*cur,
> > +	struct extent_tree_node	*prev_value)
> > +{
> > +	xfs_agnumber_t		agno = cur->bc_ag.agno;
> > +
> > +	if (cur->bc_btnum == XFS_BTNUM_BNO) {
> > +		if (!prev_value)
> > +			return findfirst_bno_extent(agno);
> > +		return findnext_bno_extent(prev_value);
> > +	}
> > +
> > +	/* cnt btree */
> > +	if (!prev_value)
> > +		return findfirst_bcnt_extent(agno);
> > +	return findnext_bcnt_extent(agno, prev_value);
> > +}
> > +
> > +/* Grab one bnobt record and put it in the btree cursor. */
> > +static int
> > +get_bnobt_record(
> > +	struct xfs_btree_cur		*cur,
> > +	void				*priv)
> > +{
> > +	struct bt_rebuild		*btr = priv;
> > +	struct xfs_alloc_rec_incore	*arec = &cur->bc_rec.a;
> > +
> > +	btr->bno_rec = get_bno_rec(cur, btr->bno_rec);
> > +	arec->ar_startblock = btr->bno_rec->ex_startblock;
> > +	arec->ar_blockcount = btr->bno_rec->ex_blockcount;
> > +	btr->freeblks += btr->bno_rec->ex_blockcount;
> > +	return 0;
> > +}
> 
> Nit, but the 'bno' naming in the above functions suggests this is bnobt
> specific when it actually covers both the bnobt and the cntbt. Can we call
> something more generic? get_[bt_]record() seems reasonable enough to me
> given they're static.

get_freesp() and get_freesp_record()?

--D

> Other than that the factoring looks much nicer and the rest LGTM:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > +
> > +void
> > +init_freespace_cursors(
> > +	struct repair_ctx	*sc,
> > +	xfs_agnumber_t		agno,
> > +	unsigned int		free_space,
> > +	unsigned int		*nr_extents,
> > +	int			*extra_blocks,
> > +	struct bt_rebuild	*btr_bno,
> > +	struct bt_rebuild	*btr_cnt)
> > +{
> > +	unsigned int		bno_blocks;
> > +	unsigned int		cnt_blocks;
> > +	int			error;
> > +
> > +	init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr_bno);
> > +	init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr_cnt);
> > +
> > +	btr_bno->cur = libxfs_allocbt_stage_cursor(sc->mp,
> > +			&btr_bno->newbt.afake, agno, XFS_BTNUM_BNO);
> > +	btr_cnt->cur = libxfs_allocbt_stage_cursor(sc->mp,
> > +			&btr_cnt->newbt.afake, agno, XFS_BTNUM_CNT);
> > +
> > +	btr_bno->bload.get_record = get_bnobt_record;
> > +	btr_bno->bload.claim_block = rebuild_claim_block;
> > +
> > +	btr_cnt->bload.get_record = get_bnobt_record;
> > +	btr_cnt->bload.claim_block = rebuild_claim_block;
> > +
> > +	/*
> > +	 * Now we need to allocate blocks for the free space btrees using the
> > +	 * free space records we're about to put in them.  Every record we use
> > +	 * can change the shape of the free space trees, so we recompute the
> > +	 * btree shape until we stop needing /more/ blocks.  If we have any
> > +	 * left over we'll stash them in the AGFL when we're done.
> > +	 */
> > +	do {
> > +		unsigned int	num_freeblocks;
> > +
> > +		bno_blocks = btr_bno->bload.nr_blocks;
> > +		cnt_blocks = btr_cnt->bload.nr_blocks;
> > +
> > +		/* Compute how many bnobt blocks we'll need. */
> > +		error = -libxfs_btree_bload_compute_geometry(btr_bno->cur,
> > +				&btr_bno->bload, *nr_extents);
> > +		if (error)
> > +			do_error(
> > +_("Unable to compute free space by block btree geometry, error %d.\n"), -error);
> > +
> > +		/* Compute how many cntbt blocks we'll need. */
> > +		error = -libxfs_btree_bload_compute_geometry(btr_cnt->cur,
> > +				&btr_cnt->bload, *nr_extents);
> > +		if (error)
> > +			do_error(
> > +_("Unable to compute free space by length btree geometry, error %d.\n"), -error);
> > +
> > +		/* We don't need any more blocks, so we're done. */
> > +		if (bno_blocks >= btr_bno->bload.nr_blocks &&
> > +		    cnt_blocks >= btr_cnt->bload.nr_blocks)
> > +			break;
> > +
> > +		/* Allocate however many more blocks we need this time. */
> > +		if (bno_blocks < btr_bno->bload.nr_blocks)
> > +			reserve_btblocks(sc->mp, agno, btr_bno,
> > +					btr_bno->bload.nr_blocks - bno_blocks);
> > +		if (cnt_blocks < btr_cnt->bload.nr_blocks)
> > +			reserve_btblocks(sc->mp, agno, btr_cnt,
> > +					btr_cnt->bload.nr_blocks - cnt_blocks);
> > +
> > +		/* Ok, now how many free space records do we have? */
> > +		*nr_extents = count_bno_extents_blocks(agno, &num_freeblocks);
> > +	} while (1);
> > +
> > +	*extra_blocks = (bno_blocks - btr_bno->bload.nr_blocks) +
> > +			(cnt_blocks - btr_cnt->bload.nr_blocks);
> > +}
> > +
> > +/* Rebuild the free space btrees. */
> > +void
> > +build_freespace_btrees(
> > +	struct repair_ctx	*sc,
> > +	xfs_agnumber_t		agno,
> > +	struct bt_rebuild	*btr_bno,
> > +	struct bt_rebuild	*btr_cnt)
> > +{
> > +	int			error;
> > +
> > +	/* Add all observed bnobt records. */
> > +	error = -libxfs_btree_bload(btr_bno->cur, &btr_bno->bload, btr_bno);
> > +	if (error)
> > +		do_error(
> > +_("Error %d while creating bnobt btree for AG %u.\n"), error, agno);
> > +
> > +	/* Add all observed cntbt records. */
> > +	error = -libxfs_btree_bload(btr_cnt->cur, &btr_cnt->bload, btr_cnt);
> > +	if (error)
> > +		do_error(
> > +_("Error %d while creating cntbt btree for AG %u.\n"), error, agno);
> > +
> > +	/* Since we're not writing the AGF yet, no need to commit the cursor */
> > +	libxfs_btree_del_cursor(btr_bno->cur, 0);
> > +	libxfs_btree_del_cursor(btr_cnt->cur, 0);
> > +}
> > diff --git a/repair/agbtree.h b/repair/agbtree.h
> > index 50ea3c60..63352247 100644
> > --- a/repair/agbtree.h
> > +++ b/repair/agbtree.h
> > @@ -20,10 +20,20 @@ struct bt_rebuild {
> >  	/* Tree-specific data. */
> >  	union {
> >  		struct xfs_slab_cursor	*slab_cursor;
> > +		struct {
> > +			struct extent_tree_node	*bno_rec;
> > +			unsigned int		freeblks;
> > +		};
> >  	};
> >  };
> >  
> >  void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
> >  		struct xfs_slab *lost_fsb);
> > +void init_freespace_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
> > +		unsigned int free_space, unsigned int *nr_extents,
> > +		int *extra_blocks, struct bt_rebuild *btr_bno,
> > +		struct bt_rebuild *btr_cnt);
> > +void build_freespace_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
> > +		struct bt_rebuild *btr_bno, struct bt_rebuild *btr_cnt);
> >  
> >  #endif /* __XFS_REPAIR_AG_BTREE_H__ */
> > diff --git a/repair/phase5.c b/repair/phase5.c
> > index 8175aa6f..a93d900d 100644
> > --- a/repair/phase5.c
> > +++ b/repair/phase5.c
> > @@ -81,7 +81,10 @@ static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
> >  static uint64_t	*sb_fdblocks_ag;	/* free data blocks per ag */
> >  
> >  static int
> > -mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
> > +mk_incore_fstree(
> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	unsigned int		*num_freeblocks)
> >  {
> >  	int			in_extent;
> >  	int			num_extents;
> > @@ -93,6 +96,8 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
> >  	xfs_extlen_t		blen;
> >  	int			bstate;
> >  
> > +	*num_freeblocks = 0;
> > +
> >  	/*
> >  	 * scan the bitmap for the ag looking for continuous
> >  	 * extents of free blocks.  At this point, we know
> > @@ -148,6 +153,7 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
> >  #endif
> >  				add_bno_extent(agno, extent_start, extent_len);
> >  				add_bcnt_extent(agno, extent_start, extent_len);
> > +				*num_freeblocks += extent_len;
> >  			}
> >  		}
> >  	}
> > @@ -161,6 +167,7 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
> >  #endif
> >  		add_bno_extent(agno, extent_start, extent_len);
> >  		add_bcnt_extent(agno, extent_start, extent_len);
> > +		*num_freeblocks += extent_len;
> >  	}
> >  
> >  	return(num_extents);
> > @@ -338,287 +345,6 @@ finish_cursor(bt_status_t *curs)
> >  	free(curs->btree_blocks);
> >  }
> >  
> > -/*
> > - * We need to leave some free records in the tree for the corner case of
> > - * setting up the AGFL. This may require allocation of blocks, and as
> > - * such can require insertion of new records into the tree (e.g. moving
> > - * a record in the by-count tree when a long extent is shortened). If we
> > - * pack the records into the leaves with no slack space, this requires a
> > - * leaf split to occur and a block to be allocated from the free list.
> > - * If we don't have any blocks on the free list (because we are setting
> > - * it up!), then we fail, and the filesystem will fail with the same
> > - * failure at runtime. Hence leave a couple of records slack space in
> > - * each block to allow immediate modification of the tree without
> > - * requiring splits to be done.
> > - *
> > - * XXX(hch): any reason we don't just look at mp->m_alloc_mxr?
> > - */
> > -#define XR_ALLOC_BLOCK_MAXRECS(mp, level) \
> > -	(libxfs_allocbt_maxrecs((mp), (mp)->m_sb.sb_blocksize, (level) == 0) - 2)
> > -
> > -/*
> > - * this calculates a freespace cursor for an ag.
> > - * btree_curs is an in/out.  returns the number of
> > - * blocks that will show up in the AGFL.
> > - */
> > -static int
> > -calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
> > -			xfs_agblock_t *extents, bt_status_t *btree_curs)
> > -{
> > -	xfs_extlen_t		blocks_needed;		/* a running count */
> > -	xfs_extlen_t		blocks_allocated_pt;	/* per tree */
> > -	xfs_extlen_t		blocks_allocated_total;	/* for both trees */
> > -	xfs_agblock_t		num_extents;
> > -	int			i;
> > -	int			extents_used;
> > -	int			extra_blocks;
> > -	bt_stat_level_t		*lptr;
> > -	bt_stat_level_t		*p_lptr;
> > -	extent_tree_node_t	*ext_ptr;
> > -	int			level;
> > -
> > -	num_extents = *extents;
> > -	extents_used = 0;
> > -
> > -	ASSERT(num_extents != 0);
> > -
> > -	lptr = &btree_curs->level[0];
> > -	btree_curs->init = 1;
> > -
> > -	/*
> > -	 * figure out how much space we need for the leaf level
> > -	 * of the tree and set up the cursor for the leaf level
> > -	 * (note that the same code is duplicated further down)
> > -	 */
> > -	lptr->num_blocks = howmany(num_extents, XR_ALLOC_BLOCK_MAXRECS(mp, 0));
> > -	lptr->num_recs_pb = num_extents / lptr->num_blocks;
> > -	lptr->modulo = num_extents % lptr->num_blocks;
> > -	lptr->num_recs_tot = num_extents;
> > -	level = 1;
> > -
> > -#ifdef XR_BLD_FREE_TRACE
> > -	fprintf(stderr, "%s 0 %d %d %d %d\n", __func__,
> > -			lptr->num_blocks,
> > -			lptr->num_recs_pb,
> > -			lptr->modulo,
> > -			lptr->num_recs_tot);
> > -#endif
> > -	/*
> > -	 * if we need more levels, set them up.  # of records
> > -	 * per level is the # of blocks in the level below it
> > -	 */
> > -	if (lptr->num_blocks > 1)  {
> > -		for (; btree_curs->level[level - 1].num_blocks > 1
> > -				&& level < XFS_BTREE_MAXLEVELS;
> > -				level++)  {
> > -			lptr = &btree_curs->level[level];
> > -			p_lptr = &btree_curs->level[level - 1];
> > -			lptr->num_blocks = howmany(p_lptr->num_blocks,
> > -					XR_ALLOC_BLOCK_MAXRECS(mp, level));
> > -			lptr->modulo = p_lptr->num_blocks
> > -					% lptr->num_blocks;
> > -			lptr->num_recs_pb = p_lptr->num_blocks
> > -					/ lptr->num_blocks;
> > -			lptr->num_recs_tot = p_lptr->num_blocks;
> > -#ifdef XR_BLD_FREE_TRACE
> > -			fprintf(stderr, "%s %d %d %d %d %d\n", __func__,
> > -					level,
> > -					lptr->num_blocks,
> > -					lptr->num_recs_pb,
> > -					lptr->modulo,
> > -					lptr->num_recs_tot);
> > -#endif
> > -		}
> > -	}
> > -
> > -	ASSERT(lptr->num_blocks == 1);
> > -	btree_curs->num_levels = level;
> > -
> > -	/*
> > -	 * ok, now we have a hypothetical cursor that
> > -	 * will work for both the bno and bcnt trees.
> > -	 * now figure out if using up blocks to set up the
> > -	 * trees will perturb the shape of the freespace tree.
> > -	 * if so, we've over-allocated.  the freespace trees
> > -	 * as they will be *after* accounting for the free space
> > -	 * we've used up will need fewer blocks to to represent
> > -	 * than we've allocated.  We can use the AGFL to hold
> > -	 * xfs_agfl_size (sector/struct xfs_agfl) blocks but that's it.
> > -	 * Thus we limit things to xfs_agfl_size/2 for each of the 2 btrees.
> > -	 * if the number of extra blocks is more than that,
> > -	 * we'll have to be called again.
> > -	 */
> > -	for (blocks_needed = 0, i = 0; i < level; i++)  {
> > -		blocks_needed += btree_curs->level[i].num_blocks;
> > -	}
> > -
> > -	/*
> > -	 * record the # of blocks we've allocated
> > -	 */
> > -	blocks_allocated_pt = blocks_needed;
> > -	blocks_needed *= 2;
> > -	blocks_allocated_total = blocks_needed;
> > -
> > -	/*
> > -	 * figure out how many free extents will be used up by
> > -	 * our space allocation
> > -	 */
> > -	if ((ext_ptr = findfirst_bcnt_extent(agno)) == NULL)
> > -		do_error(_("can't rebuild fs trees -- not enough free space "
> > -			   "on ag %u\n"), agno);
> > -
> > -	while (ext_ptr != NULL && blocks_needed > 0)  {
> > -		if (ext_ptr->ex_blockcount <= blocks_needed)  {
> > -			blocks_needed -= ext_ptr->ex_blockcount;
> > -			extents_used++;
> > -		} else  {
> > -			blocks_needed = 0;
> > -		}
> > -
> > -		ext_ptr = findnext_bcnt_extent(agno, ext_ptr);
> > -
> > -#ifdef XR_BLD_FREE_TRACE
> > -		if (ext_ptr != NULL)  {
> > -			fprintf(stderr, "got next extent [%u %u]\n",
> > -				ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> > -		} else  {
> > -			fprintf(stderr, "out of extents\n");
> > -		}
> > -#endif
> > -	}
> > -	if (blocks_needed > 0)
> > -		do_error(_("ag %u - not enough free space to build freespace "
> > -			   "btrees\n"), agno);
> > -
> > -	ASSERT(num_extents >= extents_used);
> > -
> > -	num_extents -= extents_used;
> > -
> > -	/*
> > -	 * see if the number of leaf blocks will change as a result
> > -	 * of the number of extents changing
> > -	 */
> > -	if (howmany(num_extents, XR_ALLOC_BLOCK_MAXRECS(mp, 0))
> > -			!= btree_curs->level[0].num_blocks)  {
> > -		/*
> > -		 * yes -- recalculate the cursor.  If the number of
> > -		 * excess (overallocated) blocks is < xfs_agfl_size/2, we're ok.
> > -		 * we can put those into the AGFL.  we don't try
> > -		 * and get things to converge exactly (reach a
> > -		 * state with zero excess blocks) because there
> > -		 * exist pathological cases which will never
> > -		 * converge.  first, check for the zero-case.
> > -		 */
> > -		if (num_extents == 0)  {
> > -			/*
> > -			 * ok, we've used up all the free blocks
> > -			 * trying to lay out the leaf level. go
> > -			 * to a one block (empty) btree and put the
> > -			 * already allocated blocks into the AGFL
> > -			 */
> > -			if (btree_curs->level[0].num_blocks != 1)  {
> > -				/*
> > -				 * we really needed more blocks because
> > -				 * the old tree had more than one level.
> > -				 * this is bad.
> > -				 */
> > -				 do_warn(_("not enough free blocks left to "
> > -					   "describe all free blocks in AG "
> > -					   "%u\n"), agno);
> > -			}
> > -#ifdef XR_BLD_FREE_TRACE
> > -			fprintf(stderr,
> > -				"ag %u -- no free extents, alloc'ed %d\n",
> > -				agno, blocks_allocated_pt);
> > -#endif
> > -			lptr->num_blocks = 1;
> > -			lptr->modulo = 0;
> > -			lptr->num_recs_pb = 0;
> > -			lptr->num_recs_tot = 0;
> > -
> > -			btree_curs->num_levels = 1;
> > -
> > -			/*
> > -			 * don't reset the allocation stats, assume
> > -			 * they're all extra blocks
> > -			 * don't forget to return the total block count
> > -			 * not the per-tree block count.  these are the
> > -			 * extras that will go into the AGFL.  subtract
> > -			 * two for the root blocks.
> > -			 */
> > -			btree_curs->num_tot_blocks = blocks_allocated_pt;
> > -			btree_curs->num_free_blocks = blocks_allocated_pt;
> > -
> > -			*extents = 0;
> > -
> > -			return(blocks_allocated_total - 2);
> > -		}
> > -
> > -		lptr = &btree_curs->level[0];
> > -		lptr->num_blocks = howmany(num_extents,
> > -					XR_ALLOC_BLOCK_MAXRECS(mp, 0));
> > -		lptr->num_recs_pb = num_extents / lptr->num_blocks;
> > -		lptr->modulo = num_extents % lptr->num_blocks;
> > -		lptr->num_recs_tot = num_extents;
> > -		level = 1;
> > -
> > -		/*
> > -		 * if we need more levels, set them up
> > -		 */
> > -		if (lptr->num_blocks > 1)  {
> > -			for (level = 1; btree_curs->level[level-1].num_blocks
> > -					> 1 && level < XFS_BTREE_MAXLEVELS;
> > -					level++)  {
> > -				lptr = &btree_curs->level[level];
> > -				p_lptr = &btree_curs->level[level-1];
> > -				lptr->num_blocks = howmany(p_lptr->num_blocks,
> > -					XR_ALLOC_BLOCK_MAXRECS(mp, level));
> > -				lptr->modulo = p_lptr->num_blocks
> > -						% lptr->num_blocks;
> > -				lptr->num_recs_pb = p_lptr->num_blocks
> > -						/ lptr->num_blocks;
> > -				lptr->num_recs_tot = p_lptr->num_blocks;
> > -			}
> > -		}
> > -		ASSERT(lptr->num_blocks == 1);
> > -		btree_curs->num_levels = level;
> > -
> > -		/*
> > -		 * now figure out the number of excess blocks
> > -		 */
> > -		for (blocks_needed = 0, i = 0; i < level; i++)  {
> > -			blocks_needed += btree_curs->level[i].num_blocks;
> > -		}
> > -		blocks_needed *= 2;
> > -
> > -		ASSERT(blocks_allocated_total >= blocks_needed);
> > -		extra_blocks = blocks_allocated_total - blocks_needed;
> > -	} else  {
> > -		if (extents_used > 0) {
> > -			/*
> > -			 * reset the leaf level geometry to account
> > -			 * for consumed extents.  we can leave the
> > -			 * rest of the cursor alone since the number
> > -			 * of leaf blocks hasn't changed.
> > -			 */
> > -			lptr = &btree_curs->level[0];
> > -
> > -			lptr->num_recs_pb = num_extents / lptr->num_blocks;
> > -			lptr->modulo = num_extents % lptr->num_blocks;
> > -			lptr->num_recs_tot = num_extents;
> > -		}
> > -
> > -		extra_blocks = 0;
> > -	}
> > -
> > -	btree_curs->num_tot_blocks = blocks_allocated_pt;
> > -	btree_curs->num_free_blocks = blocks_allocated_pt;
> > -
> > -	*extents = num_extents;
> > -
> > -	return(extra_blocks);
> > -}
> > -
> >  /* Map btnum to buffer ops for the types that need it. */
> >  static const struct xfs_buf_ops *
> >  btnum_to_ops(
> > @@ -643,270 +369,6 @@ btnum_to_ops(
> >  	}
> >  }
> >  
> > -static void
> > -prop_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
> > -		bt_status_t *btree_curs, xfs_agblock_t startblock,
> > -		xfs_extlen_t blockcount, int level, xfs_btnum_t btnum)
> > -{
> > -	struct xfs_btree_block	*bt_hdr;
> > -	xfs_alloc_key_t		*bt_key;
> > -	xfs_alloc_ptr_t		*bt_ptr;
> > -	xfs_agblock_t		agbno;
> > -	bt_stat_level_t		*lptr;
> > -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> > -	int			error;
> > -
> > -	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
> > -
> > -	level++;
> > -
> > -	if (level >= btree_curs->num_levels)
> > -		return;
> > -
> > -	lptr = &btree_curs->level[level];
> > -	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > -
> > -	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
> > -		/*
> > -		 * only happens once when initializing the
> > -		 * left-hand side of the tree.
> > -		 */
> > -		prop_freespace_cursor(mp, agno, btree_curs, startblock,
> > -				blockcount, level, btnum);
> > -	}
> > -
> > -	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
> > -				lptr->num_recs_pb + (lptr->modulo > 0))  {
> > -		/*
> > -		 * write out current prev block, grab us a new block,
> > -		 * and set the rightsib pointer of current block
> > -		 */
> > -#ifdef XR_BLD_FREE_TRACE
> > -		fprintf(stderr, " %d ", lptr->prev_agbno);
> > -#endif
> > -		if (lptr->prev_agbno != NULLAGBLOCK) {
> > -			ASSERT(lptr->prev_buf_p != NULL);
> > -			libxfs_buf_mark_dirty(lptr->prev_buf_p);
> > -			libxfs_buf_relse(lptr->prev_buf_p);
> > -		}
> > -		lptr->prev_agbno = lptr->agbno;;
> > -		lptr->prev_buf_p = lptr->buf_p;
> > -		agbno = get_next_blockaddr(agno, level, btree_curs);
> > -
> > -		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
> > -
> > -		error = -libxfs_buf_get(mp->m_dev,
> > -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> > -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> > -		if (error)
> > -			do_error(
> > -	_("Cannot grab free space btree buffer, err=%d"),
> > -					error);
> > -		lptr->agbno = agbno;
> > -
> > -		if (lptr->modulo)
> > -			lptr->modulo--;
> > -
> > -		/*
> > -		 * initialize block header
> > -		 */
> > -		lptr->buf_p->b_ops = ops;
> > -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> > -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, level,
> > -					0, agno);
> > -
> > -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> > -
> > -		/*
> > -		 * propagate extent record for first extent in new block up
> > -		 */
> > -		prop_freespace_cursor(mp, agno, btree_curs, startblock,
> > -				blockcount, level, btnum);
> > -	}
> > -	/*
> > -	 * add extent info to current block
> > -	 */
> > -	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
> > -
> > -	bt_key = XFS_ALLOC_KEY_ADDR(mp, bt_hdr,
> > -				be16_to_cpu(bt_hdr->bb_numrecs));
> > -	bt_ptr = XFS_ALLOC_PTR_ADDR(mp, bt_hdr,
> > -				be16_to_cpu(bt_hdr->bb_numrecs),
> > -				mp->m_alloc_mxr[1]);
> > -
> > -	bt_key->ar_startblock = cpu_to_be32(startblock);
> > -	bt_key->ar_blockcount = cpu_to_be32(blockcount);
> > -	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
> > -}
> > -
> > -/*
> > - * rebuilds a freespace tree given a cursor and type
> > - * of tree to build (bno or bcnt).  returns the number of free blocks
> > - * represented by the tree.
> > - */
> > -static xfs_extlen_t
> > -build_freespace_tree(xfs_mount_t *mp, xfs_agnumber_t agno,
> > -		bt_status_t *btree_curs, xfs_btnum_t btnum)
> > -{
> > -	xfs_agnumber_t		i;
> > -	xfs_agblock_t		j;
> > -	struct xfs_btree_block	*bt_hdr;
> > -	xfs_alloc_rec_t		*bt_rec;
> > -	int			level;
> > -	xfs_agblock_t		agbno;
> > -	extent_tree_node_t	*ext_ptr;
> > -	bt_stat_level_t		*lptr;
> > -	xfs_extlen_t		freeblks;
> > -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> > -	int			error;
> > -
> > -	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
> > -
> > -#ifdef XR_BLD_FREE_TRACE
> > -	fprintf(stderr, "in build_freespace_tree, agno = %d\n", agno);
> > -#endif
> > -	level = btree_curs->num_levels;
> > -	freeblks = 0;
> > -
> > -	ASSERT(level > 0);
> > -
> > -	/*
> > -	 * initialize the first block on each btree level
> > -	 */
> > -	for (i = 0; i < level; i++)  {
> > -		lptr = &btree_curs->level[i];
> > -
> > -		agbno = get_next_blockaddr(agno, i, btree_curs);
> > -		error = -libxfs_buf_get(mp->m_dev,
> > -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> > -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> > -		if (error)
> > -			do_error(
> > -	_("Cannot grab free space btree buffer, err=%d"),
> > -					error);
> > -
> > -		if (i == btree_curs->num_levels - 1)
> > -			btree_curs->root = agbno;
> > -
> > -		lptr->agbno = agbno;
> > -		lptr->prev_agbno = NULLAGBLOCK;
> > -		lptr->prev_buf_p = NULL;
> > -		/*
> > -		 * initialize block header
> > -		 */
> > -		lptr->buf_p->b_ops = ops;
> > -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> > -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, i, 0, agno);
> > -	}
> > -	/*
> > -	 * run along leaf, setting up records.  as we have to switch
> > -	 * blocks, call the prop_freespace_cursor routine to set up the new
> > -	 * pointers for the parent.  that can recurse up to the root
> > -	 * if required.  set the sibling pointers for leaf level here.
> > -	 */
> > -	if (btnum == XFS_BTNUM_BNO)
> > -		ext_ptr = findfirst_bno_extent(agno);
> > -	else
> > -		ext_ptr = findfirst_bcnt_extent(agno);
> > -
> > -#ifdef XR_BLD_FREE_TRACE
> > -	fprintf(stderr, "bft, agno = %d, start = %u, count = %u\n",
> > -		agno, ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> > -#endif
> > -
> > -	lptr = &btree_curs->level[0];
> > -
> > -	for (i = 0; i < btree_curs->level[0].num_blocks; i++)  {
> > -		/*
> > -		 * block initialization, lay in block header
> > -		 */
> > -		lptr->buf_p->b_ops = ops;
> > -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> > -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, 0, 0, agno);
> > -
> > -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> > -		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
> > -							(lptr->modulo > 0));
> > -#ifdef XR_BLD_FREE_TRACE
> > -		fprintf(stderr, "bft, bb_numrecs = %d\n",
> > -				be16_to_cpu(bt_hdr->bb_numrecs));
> > -#endif
> > -
> > -		if (lptr->modulo > 0)
> > -			lptr->modulo--;
> > -
> > -		/*
> > -		 * initialize values in the path up to the root if
> > -		 * this is a multi-level btree
> > -		 */
> > -		if (btree_curs->num_levels > 1)
> > -			prop_freespace_cursor(mp, agno, btree_curs,
> > -					ext_ptr->ex_startblock,
> > -					ext_ptr->ex_blockcount,
> > -					0, btnum);
> > -
> > -		bt_rec = (xfs_alloc_rec_t *)
> > -			  ((char *)bt_hdr + XFS_ALLOC_BLOCK_LEN(mp));
> > -		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
> > -			ASSERT(ext_ptr != NULL);
> > -			bt_rec[j].ar_startblock = cpu_to_be32(
> > -							ext_ptr->ex_startblock);
> > -			bt_rec[j].ar_blockcount = cpu_to_be32(
> > -							ext_ptr->ex_blockcount);
> > -			freeblks += ext_ptr->ex_blockcount;
> > -			if (btnum == XFS_BTNUM_BNO)
> > -				ext_ptr = findnext_bno_extent(ext_ptr);
> > -			else
> > -				ext_ptr = findnext_bcnt_extent(agno, ext_ptr);
> > -#if 0
> > -#ifdef XR_BLD_FREE_TRACE
> > -			if (ext_ptr == NULL)
> > -				fprintf(stderr, "null extent pointer, j = %d\n",
> > -					j);
> > -			else
> > -				fprintf(stderr,
> > -				"bft, agno = %d, start = %u, count = %u\n",
> > -					agno, ext_ptr->ex_startblock,
> > -					ext_ptr->ex_blockcount);
> > -#endif
> > -#endif
> > -		}
> > -
> > -		if (ext_ptr != NULL)  {
> > -			/*
> > -			 * get next leaf level block
> > -			 */
> > -			if (lptr->prev_buf_p != NULL)  {
> > -#ifdef XR_BLD_FREE_TRACE
> > -				fprintf(stderr, " writing fst agbno %u\n",
> > -					lptr->prev_agbno);
> > -#endif
> > -				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
> > -				libxfs_buf_mark_dirty(lptr->prev_buf_p);
> > -				libxfs_buf_relse(lptr->prev_buf_p);
> > -			}
> > -			lptr->prev_buf_p = lptr->buf_p;
> > -			lptr->prev_agbno = lptr->agbno;
> > -			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
> > -			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
> > -
> > -			error = -libxfs_buf_get(mp->m_dev,
> > -					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
> > -					XFS_FSB_TO_BB(mp, 1),
> > -					&lptr->buf_p);
> > -			if (error)
> > -				do_error(
> > -	_("Cannot grab free space btree buffer, err=%d"),
> > -						error);
> > -		}
> > -	}
> > -
> > -	return(freeblks);
> > -}
> > -
> >  /*
> >   * XXX(hch): any reason we don't just look at mp->m_inobt_mxr?
> >   */
> > @@ -2038,6 +1500,28 @@ _("Insufficient memory to construct refcount cursor."));
> >  	free_slab_cursor(&refc_cur);
> >  }
> >  
> > +/* Fill the AGFL with any leftover bnobt rebuilder blocks. */
> > +static void
> > +fill_agfl(
> > +	struct bt_rebuild	*btr,
> > +	__be32			*agfl_bnos,
> > +	unsigned int		*agfl_idx)
> > +{
> > +	struct bulkload_resv	*resv, *n;
> > +	struct xfs_mount	*mp = btr->newbt.sc->mp;
> > +
> > +	for_each_bulkload_reservation(&btr->newbt, resv, n) {
> > +		xfs_agblock_t	bno;
> > +
> > +		bno = XFS_FSB_TO_AGBNO(mp, resv->fsbno + resv->used);
> > +		while (resv->used < resv->len &&
> > +		       *agfl_idx < libxfs_agfl_size(mp)) {
> > +			agfl_bnos[(*agfl_idx)++] = cpu_to_be32(bno++);
> > +			resv->used++;
> > +		}
> > +	}
> > +}
> > +
> >  /*
> >   * build both the agf and the agfl for an agno given both
> >   * btree cursors.
> > @@ -2048,9 +1532,8 @@ static void
> >  build_agf_agfl(
> >  	struct xfs_mount	*mp,
> >  	xfs_agnumber_t		agno,
> > -	struct bt_status	*bno_bt,
> > -	struct bt_status	*bcnt_bt,
> > -	xfs_extlen_t		freeblks,	/* # free blocks in tree */
> > +	struct bt_rebuild	*btr_bno,
> > +	struct bt_rebuild	*btr_cnt,
> >  	struct bt_status	*rmap_bt,
> >  	struct bt_status	*refcnt_bt,
> >  	struct xfs_slab		*lost_fsb)
> > @@ -2060,7 +1543,6 @@ build_agf_agfl(
> >  	unsigned int		agfl_idx;
> >  	struct xfs_agfl		*agfl;
> >  	struct xfs_agf		*agf;
> > -	xfs_fsblock_t		fsb;
> >  	__be32			*freelist;
> >  	int			error;
> >  
> > @@ -2092,13 +1574,17 @@ build_agf_agfl(
> >  		agf->agf_length = cpu_to_be32(mp->m_sb.sb_dblocks -
> >  			(xfs_rfsblock_t) mp->m_sb.sb_agblocks * agno);
> >  
> > -	agf->agf_roots[XFS_BTNUM_BNO] = cpu_to_be32(bno_bt->root);
> > -	agf->agf_levels[XFS_BTNUM_BNO] = cpu_to_be32(bno_bt->num_levels);
> > -	agf->agf_roots[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->root);
> > -	agf->agf_levels[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->num_levels);
> > +	agf->agf_roots[XFS_BTNUM_BNO] =
> > +			cpu_to_be32(btr_bno->newbt.afake.af_root);
> > +	agf->agf_levels[XFS_BTNUM_BNO] =
> > +			cpu_to_be32(btr_bno->newbt.afake.af_levels);
> > +	agf->agf_roots[XFS_BTNUM_CNT] =
> > +			cpu_to_be32(btr_cnt->newbt.afake.af_root);
> > +	agf->agf_levels[XFS_BTNUM_CNT] =
> > +			cpu_to_be32(btr_cnt->newbt.afake.af_levels);
> >  	agf->agf_roots[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->root);
> >  	agf->agf_levels[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->num_levels);
> > -	agf->agf_freeblks = cpu_to_be32(freeblks);
> > +	agf->agf_freeblks = cpu_to_be32(btr_bno->freeblks);
> >  	agf->agf_rmap_blocks = cpu_to_be32(rmap_bt->num_tot_blocks -
> >  			rmap_bt->num_free_blocks);
> >  	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
> > @@ -2115,9 +1601,8 @@ build_agf_agfl(
> >  		 * Don't count the root blocks as they are already
> >  		 * accounted for.
> >  		 */
> > -		blks = (bno_bt->num_tot_blocks - bno_bt->num_free_blocks) +
> > -			(bcnt_bt->num_tot_blocks - bcnt_bt->num_free_blocks) -
> > -			2;
> > +		blks = btr_bno->newbt.afake.af_blocks +
> > +			btr_cnt->newbt.afake.af_blocks - 2;
> >  		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> >  			blks += rmap_bt->num_tot_blocks - rmap_bt->num_free_blocks - 1;
> >  		agf->agf_btreeblks = cpu_to_be32(blks);
> > @@ -2159,50 +1644,14 @@ build_agf_agfl(
> >  			freelist[agfl_idx] = cpu_to_be32(NULLAGBLOCK);
> >  	}
> >  
> > -	/*
> > -	 * do we have left-over blocks in the btree cursors that should
> > -	 * be used to fill the AGFL?
> > -	 */
> > -	if (bno_bt->num_free_blocks > 0 || bcnt_bt->num_free_blocks > 0)  {
> > -		/*
> > -		 * yes, now grab as many blocks as we can
> > -		 */
> > -		agfl_idx = 0;
> > -		while (bno_bt->num_free_blocks > 0 &&
> > -		       agfl_idx < libxfs_agfl_size(mp))
> > -		{
> > -			freelist[agfl_idx] = cpu_to_be32(
> > -					get_next_blockaddr(agno, 0, bno_bt));
> > -			agfl_idx++;
> > -		}
> > -
> > -		while (bcnt_bt->num_free_blocks > 0 &&
> > -		       agfl_idx < libxfs_agfl_size(mp))
> > -		{
> > -			freelist[agfl_idx] = cpu_to_be32(
> > -					get_next_blockaddr(agno, 0, bcnt_bt));
> > -			agfl_idx++;
> > -		}
> > -		/*
> > -		 * now throw the rest of the blocks away and complain
> > -		 */
> > -		while (bno_bt->num_free_blocks > 0) {
> > -			fsb = XFS_AGB_TO_FSB(mp, agno,
> > -					get_next_blockaddr(agno, 0, bno_bt));
> > -			error = slab_add(lost_fsb, &fsb);
> > -			if (error)
> > -				do_error(
> > -_("Insufficient memory saving lost blocks.\n"));
> > -		}
> > -		while (bcnt_bt->num_free_blocks > 0) {
> > -			fsb = XFS_AGB_TO_FSB(mp, agno,
> > -					get_next_blockaddr(agno, 0, bcnt_bt));
> > -			error = slab_add(lost_fsb, &fsb);
> > -			if (error)
> > -				do_error(
> > -_("Insufficient memory saving lost blocks.\n"));
> > -		}
> > +	/* Fill the AGFL with leftover blocks or save them for later. */
> > +	agfl_idx = 0;
> > +	freelist = xfs_buf_to_agfl_bno(agfl_buf);
> > +	fill_agfl(btr_bno, freelist, &agfl_idx);
> > +	fill_agfl(btr_cnt, freelist, &agfl_idx);
> >  
> > +	/* Set the AGF counters for the AGFL. */
> > +	if (agfl_idx > 0) {
> >  		agf->agf_flfirst = 0;
> >  		agf->agf_fllast = cpu_to_be32(agfl_idx - 1);
> >  		agf->agf_flcount = cpu_to_be32(agfl_idx);
> > @@ -2300,18 +1749,14 @@ phase5_func(
> >  	uint64_t		num_free_inos;
> >  	uint64_t		finobt_num_inos;
> >  	uint64_t		finobt_num_free_inos;
> > -	bt_status_t		bno_btree_curs;
> > -	bt_status_t		bcnt_btree_curs;
> > +	struct bt_rebuild	btr_bno;
> > +	struct bt_rebuild	btr_cnt;
> >  	bt_status_t		ino_btree_curs;
> >  	bt_status_t		fino_btree_curs;
> >  	bt_status_t		rmap_btree_curs;
> >  	bt_status_t		refcnt_btree_curs;
> >  	int			extra_blocks = 0;
> >  	uint			num_freeblocks;
> > -	xfs_extlen_t		freeblks1;
> > -#ifdef DEBUG
> > -	xfs_extlen_t		freeblks2;
> > -#endif
> >  	xfs_agblock_t		num_extents;
> >  
> >  	if (verbose)
> > @@ -2320,7 +1765,7 @@ phase5_func(
> >  	/*
> >  	 * build up incore bno and bcnt extent btrees
> >  	 */
> > -	num_extents = mk_incore_fstree(mp, agno);
> > +	num_extents = mk_incore_fstree(mp, agno, &num_freeblocks);
> >  
> >  #ifdef XR_BLD_FREE_TRACE
> >  	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
> > @@ -2392,8 +1837,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	/*
> >  	 * track blocks that we might really lose
> >  	 */
> > -	extra_blocks = calculate_freespace_cursor(mp, agno,
> > -				&num_extents, &bno_btree_curs);
> > +	init_freespace_cursors(&sc, agno, num_freeblocks, &num_extents,
> > +			&extra_blocks, &btr_bno, &btr_cnt);
> >  
> >  	/*
> >  	 * freespace btrees live in the "free space" but the filesystem treats
> > @@ -2410,37 +1855,18 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	if (extra_blocks > 0)
> >  		sb_fdblocks_ag[agno] -= extra_blocks;
> >  
> > -	bcnt_btree_curs = bno_btree_curs;
> > -
> > -	bno_btree_curs.owner = XFS_RMAP_OWN_AG;
> > -	bcnt_btree_curs.owner = XFS_RMAP_OWN_AG;
> > -	setup_cursor(mp, agno, &bno_btree_curs);
> > -	setup_cursor(mp, agno, &bcnt_btree_curs);
> > -
> >  #ifdef XR_BLD_FREE_TRACE
> >  	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
> >  	fprintf(stderr, "# of bcnt extents is %d\n", count_bcnt_extents(agno));
> >  #endif
> >  
> > -	/*
> > -	 * now rebuild the freespace trees
> > -	 */
> > -	freeblks1 = build_freespace_tree(mp, agno,
> > -					&bno_btree_curs, XFS_BTNUM_BNO);
> > +	build_freespace_btrees(&sc, agno, &btr_bno, &btr_cnt);
> > +
> >  #ifdef XR_BLD_FREE_TRACE
> > -	fprintf(stderr, "# of free blocks == %d\n", freeblks1);
> > +	fprintf(stderr, "# of free blocks == %d/%d\n", btr_bno.freeblks,
> > +			btr_cnt.freeblks);
> >  #endif
> > -	write_cursor(&bno_btree_curs);
> > -
> > -#ifdef DEBUG
> > -	freeblks2 = build_freespace_tree(mp, agno,
> > -				&bcnt_btree_curs, XFS_BTNUM_CNT);
> > -#else
> > -	(void) build_freespace_tree(mp, agno, &bcnt_btree_curs, XFS_BTNUM_CNT);
> > -#endif
> > -	write_cursor(&bcnt_btree_curs);
> > -
> > -	ASSERT(freeblks1 == freeblks2);
> > +	ASSERT(btr_bno.freeblks == btr_cnt.freeblks);
> >  
> >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> >  		build_rmap_tree(mp, agno, &rmap_btree_curs);
> > @@ -2457,8 +1883,9 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	/*
> >  	 * set up agf and agfl
> >  	 */
> > -	build_agf_agfl(mp, agno, &bno_btree_curs, &bcnt_btree_curs, freeblks1,
> > -			&rmap_btree_curs, &refcnt_btree_curs, lost_fsb);
> > +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
> > +			&refcnt_btree_curs, lost_fsb);
> > +
> >  	/*
> >  	 * build inode allocation tree.
> >  	 */
> > @@ -2480,7 +1907,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	/*
> >  	 * tear down cursors
> >  	 */
> > -	finish_cursor(&bno_btree_curs);
> > +	finish_rebuild(mp, &btr_bno, lost_fsb);
> > +	finish_rebuild(mp, &btr_cnt, lost_fsb);
> >  	finish_cursor(&ino_btree_curs);
> >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> >  		finish_cursor(&rmap_btree_curs);
> > @@ -2488,7 +1916,6 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  		finish_cursor(&refcnt_btree_curs);
> >  	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> >  		finish_cursor(&fino_btree_curs);
> > -	finish_cursor(&bcnt_btree_curs);
> >  
> >  	/*
> >  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> > 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 07/12] xfs_repair: rebuild free space btrees with bulk loader
  2020-06-18 16:41     ` Darrick J. Wong
@ 2020-06-18 16:51       ` Brian Foster
  0 siblings, 0 replies; 42+ messages in thread
From: Brian Foster @ 2020-06-18 16:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Thu, Jun 18, 2020 at 09:41:15AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 18, 2020 at 11:23:40AM -0400, Brian Foster wrote:
> > On Mon, Jun 01, 2020 at 09:27:38PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Use the btree bulk loading functions to rebuild the free space btrees
> > > and drop the open-coded implementation.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  libxfs/libxfs_api_defs.h |    3 
> > >  repair/agbtree.c         |  158 ++++++++++
> > >  repair/agbtree.h         |   10 +
> > >  repair/phase5.c          |  703 ++++------------------------------------------
> > >  4 files changed, 236 insertions(+), 638 deletions(-)
> > > 
> > > 
> > ...
> > > diff --git a/repair/agbtree.c b/repair/agbtree.c
> > > index e4179a44..3b8ab47c 100644
> > > --- a/repair/agbtree.c
> > > +++ b/repair/agbtree.c
> > > @@ -150,3 +150,161 @@ _("Insufficient memory saving lost blocks.\n"));
> > >  
> > >  	bulkload_destroy(&btr->newbt, 0);
> > >  }
> > ...
> > > +/*
> > > + * Return the next free space extent tree record from the previous value we
> > > + * saw.
> > > + */
> > > +static inline struct extent_tree_node *
> > > +get_bno_rec(
> > > +	struct xfs_btree_cur	*cur,
> > > +	struct extent_tree_node	*prev_value)
> > > +{
> > > +	xfs_agnumber_t		agno = cur->bc_ag.agno;
> > > +
> > > +	if (cur->bc_btnum == XFS_BTNUM_BNO) {
> > > +		if (!prev_value)
> > > +			return findfirst_bno_extent(agno);
> > > +		return findnext_bno_extent(prev_value);
> > > +	}
> > > +
> > > +	/* cnt btree */
> > > +	if (!prev_value)
> > > +		return findfirst_bcnt_extent(agno);
> > > +	return findnext_bcnt_extent(agno, prev_value);
> > > +}
> > > +
> > > +/* Grab one bnobt record and put it in the btree cursor. */
> > > +static int
> > > +get_bnobt_record(
> > > +	struct xfs_btree_cur		*cur,
> > > +	void				*priv)
> > > +{
> > > +	struct bt_rebuild		*btr = priv;
> > > +	struct xfs_alloc_rec_incore	*arec = &cur->bc_rec.a;
> > > +
> > > +	btr->bno_rec = get_bno_rec(cur, btr->bno_rec);
> > > +	arec->ar_startblock = btr->bno_rec->ex_startblock;
> > > +	arec->ar_blockcount = btr->bno_rec->ex_blockcount;
> > > +	btr->freeblks += btr->bno_rec->ex_blockcount;
> > > +	return 0;
> > > +}
> > 
> > Nit, but the 'bno' naming in the above functions suggests this is bnobt
> > specific when it actually covers the bnobt and cntbt. Can we call these
> > something more generic? get_[bt_]record() seems reasonable enough to me
> > given they're static.
> 
> get_freesp() and get_freesp_record()?
> 

Sounds good, thanks!

Brian

> --D
> 
> > Other than that the factoring looks much nicer and the rest LGTM:
> > 
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > 
> > > +
> > > +void
> > > +init_freespace_cursors(
> > > +	struct repair_ctx	*sc,
> > > +	xfs_agnumber_t		agno,
> > > +	unsigned int		free_space,
> > > +	unsigned int		*nr_extents,
> > > +	int			*extra_blocks,
> > > +	struct bt_rebuild	*btr_bno,
> > > +	struct bt_rebuild	*btr_cnt)
> > > +{
> > > +	unsigned int		bno_blocks;
> > > +	unsigned int		cnt_blocks;
> > > +	int			error;
> > > +
> > > +	init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr_bno);
> > > +	init_rebuild(sc, &XFS_RMAP_OINFO_AG, free_space, btr_cnt);
> > > +
> > > +	btr_bno->cur = libxfs_allocbt_stage_cursor(sc->mp,
> > > +			&btr_bno->newbt.afake, agno, XFS_BTNUM_BNO);
> > > +	btr_cnt->cur = libxfs_allocbt_stage_cursor(sc->mp,
> > > +			&btr_cnt->newbt.afake, agno, XFS_BTNUM_CNT);
> > > +
> > > +	btr_bno->bload.get_record = get_bnobt_record;
> > > +	btr_bno->bload.claim_block = rebuild_claim_block;
> > > +
> > > +	btr_cnt->bload.get_record = get_bnobt_record;
> > > +	btr_cnt->bload.claim_block = rebuild_claim_block;
> > > +
> > > +	/*
> > > +	 * Now we need to allocate blocks for the free space btrees using the
> > > +	 * free space records we're about to put in them.  Every record we use
> > > +	 * can change the shape of the free space trees, so we recompute the
> > > +	 * btree shape until we stop needing /more/ blocks.  If we have any
> > > +	 * left over we'll stash them in the AGFL when we're done.
> > > +	 */
> > > +	do {
> > > +		unsigned int	num_freeblocks;
> > > +
> > > +		bno_blocks = btr_bno->bload.nr_blocks;
> > > +		cnt_blocks = btr_cnt->bload.nr_blocks;
> > > +
> > > +		/* Compute how many bnobt blocks we'll need. */
> > > +		error = -libxfs_btree_bload_compute_geometry(btr_bno->cur,
> > > +				&btr_bno->bload, *nr_extents);
> > > +		if (error)
> > > +			do_error(
> > > +_("Unable to compute free space by block btree geometry, error %d.\n"), -error);
> > > +
> > > +		/* Compute how many cntbt blocks we'll need. */
> > > +		error = -libxfs_btree_bload_compute_geometry(btr_cnt->cur,
> > > +				&btr_cnt->bload, *nr_extents);
> > > +		if (error)
> > > +			do_error(
> > > +_("Unable to compute free space by length btree geometry, error %d.\n"), -error);
> > > +
> > > +		/* We don't need any more blocks, so we're done. */
> > > +		if (bno_blocks >= btr_bno->bload.nr_blocks &&
> > > +		    cnt_blocks >= btr_cnt->bload.nr_blocks)
> > > +			break;
> > > +
> > > +		/* Allocate however many more blocks we need this time. */
> > > +		if (bno_blocks < btr_bno->bload.nr_blocks)
> > > +			reserve_btblocks(sc->mp, agno, btr_bno,
> > > +					btr_bno->bload.nr_blocks - bno_blocks);
> > > +		if (cnt_blocks < btr_cnt->bload.nr_blocks)
> > > +			reserve_btblocks(sc->mp, agno, btr_cnt,
> > > +					btr_cnt->bload.nr_blocks - cnt_blocks);
> > > +
> > > +		/* Ok, now how many free space records do we have? */
> > > +		*nr_extents = count_bno_extents_blocks(agno, &num_freeblocks);
> > > +	} while (1);
> > > +
> > > +	*extra_blocks = (bno_blocks - btr_bno->bload.nr_blocks) +
> > > +			(cnt_blocks - btr_cnt->bload.nr_blocks);
> > > +}
> > > +
> > > +/* Rebuild the free space btrees. */
> > > +void
> > > +build_freespace_btrees(
> > > +	struct repair_ctx	*sc,
> > > +	xfs_agnumber_t		agno,
> > > +	struct bt_rebuild	*btr_bno,
> > > +	struct bt_rebuild	*btr_cnt)
> > > +{
> > > +	int			error;
> > > +
> > > +	/* Add all observed bnobt records. */
> > > +	error = -libxfs_btree_bload(btr_bno->cur, &btr_bno->bload, btr_bno);
> > > +	if (error)
> > > +		do_error(
> > > +_("Error %d while creating bnobt btree for AG %u.\n"), error, agno);
> > > +
> > > +	/* Add all observed cntbt records. */
> > > +	error = -libxfs_btree_bload(btr_cnt->cur, &btr_cnt->bload, btr_cnt);
> > > +	if (error)
> > > +		do_error(
> > > +_("Error %d while creating cntbt btree for AG %u.\n"), error, agno);
> > > +
> > > +	/* Since we're not writing the AGF yet, no need to commit the cursor */
> > > +	libxfs_btree_del_cursor(btr_bno->cur, 0);
> > > +	libxfs_btree_del_cursor(btr_cnt->cur, 0);
> > > +}
> > > diff --git a/repair/agbtree.h b/repair/agbtree.h
> > > index 50ea3c60..63352247 100644
> > > --- a/repair/agbtree.h
> > > +++ b/repair/agbtree.h
> > > @@ -20,10 +20,20 @@ struct bt_rebuild {
> > >  	/* Tree-specific data. */
> > >  	union {
> > >  		struct xfs_slab_cursor	*slab_cursor;
> > > +		struct {
> > > +			struct extent_tree_node	*bno_rec;
> > > +			unsigned int		freeblks;
> > > +		};
> > >  	};
> > >  };
> > >  
> > >  void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
> > >  		struct xfs_slab *lost_fsb);
> > > +void init_freespace_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
> > > +		unsigned int free_space, unsigned int *nr_extents,
> > > +		int *extra_blocks, struct bt_rebuild *btr_bno,
> > > +		struct bt_rebuild *btr_cnt);
> > > +void build_freespace_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
> > > +		struct bt_rebuild *btr_bno, struct bt_rebuild *btr_cnt);
> > >  
> > >  #endif /* __XFS_REPAIR_AG_BTREE_H__ */
> > > diff --git a/repair/phase5.c b/repair/phase5.c
> > > index 8175aa6f..a93d900d 100644
> > > --- a/repair/phase5.c
> > > +++ b/repair/phase5.c
> > > @@ -81,7 +81,10 @@ static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
> > >  static uint64_t	*sb_fdblocks_ag;	/* free data blocks per ag */
> > >  
> > >  static int
> > > -mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
> > > +mk_incore_fstree(
> > > +	struct xfs_mount	*mp,
> > > +	xfs_agnumber_t		agno,
> > > +	unsigned int		*num_freeblocks)
> > >  {
> > >  	int			in_extent;
> > >  	int			num_extents;
> > > @@ -93,6 +96,8 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
> > >  	xfs_extlen_t		blen;
> > >  	int			bstate;
> > >  
> > > +	*num_freeblocks = 0;
> > > +
> > >  	/*
> > >  	 * scan the bitmap for the ag looking for continuous
> > >  	 * extents of free blocks.  At this point, we know
> > > @@ -148,6 +153,7 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
> > >  #endif
> > >  				add_bno_extent(agno, extent_start, extent_len);
> > >  				add_bcnt_extent(agno, extent_start, extent_len);
> > > +				*num_freeblocks += extent_len;
> > >  			}
> > >  		}
> > >  	}
> > > @@ -161,6 +167,7 @@ mk_incore_fstree(xfs_mount_t *mp, xfs_agnumber_t agno)
> > >  #endif
> > >  		add_bno_extent(agno, extent_start, extent_len);
> > >  		add_bcnt_extent(agno, extent_start, extent_len);
> > > +		*num_freeblocks += extent_len;
> > >  	}
> > >  
> > >  	return(num_extents);
> > > @@ -338,287 +345,6 @@ finish_cursor(bt_status_t *curs)
> > >  	free(curs->btree_blocks);
> > >  }
> > >  
> > > -/*
> > > - * We need to leave some free records in the tree for the corner case of
> > > - * setting up the AGFL. This may require allocation of blocks, and as
> > > - * such can require insertion of new records into the tree (e.g. moving
> > > - * a record in the by-count tree when a long extent is shortened). If we
> > > - * pack the records into the leaves with no slack space, this requires a
> > > - * leaf split to occur and a block to be allocated from the free list.
> > > - * If we don't have any blocks on the free list (because we are setting
> > > - * it up!), then we fail, and the filesystem will fail with the same
> > > - * failure at runtime. Hence leave a couple of records slack space in
> > > - * each block to allow immediate modification of the tree without
> > > - * requiring splits to be done.
> > > - *
> > > - * XXX(hch): any reason we don't just look at mp->m_alloc_mxr?
> > > - */
> > > -#define XR_ALLOC_BLOCK_MAXRECS(mp, level) \
> > > -	(libxfs_allocbt_maxrecs((mp), (mp)->m_sb.sb_blocksize, (level) == 0) - 2)
> > > -
> > > -/*
> > > - * this calculates a freespace cursor for an ag.
> > > - * btree_curs is an in/out.  returns the number of
> > > - * blocks that will show up in the AGFL.
> > > - */
> > > -static int
> > > -calculate_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
> > > -			xfs_agblock_t *extents, bt_status_t *btree_curs)
> > > -{
> > > -	xfs_extlen_t		blocks_needed;		/* a running count */
> > > -	xfs_extlen_t		blocks_allocated_pt;	/* per tree */
> > > -	xfs_extlen_t		blocks_allocated_total;	/* for both trees */
> > > -	xfs_agblock_t		num_extents;
> > > -	int			i;
> > > -	int			extents_used;
> > > -	int			extra_blocks;
> > > -	bt_stat_level_t		*lptr;
> > > -	bt_stat_level_t		*p_lptr;
> > > -	extent_tree_node_t	*ext_ptr;
> > > -	int			level;
> > > -
> > > -	num_extents = *extents;
> > > -	extents_used = 0;
> > > -
> > > -	ASSERT(num_extents != 0);
> > > -
> > > -	lptr = &btree_curs->level[0];
> > > -	btree_curs->init = 1;
> > > -
> > > -	/*
> > > -	 * figure out how much space we need for the leaf level
> > > -	 * of the tree and set up the cursor for the leaf level
> > > -	 * (note that the same code is duplicated further down)
> > > -	 */
> > > -	lptr->num_blocks = howmany(num_extents, XR_ALLOC_BLOCK_MAXRECS(mp, 0));
> > > -	lptr->num_recs_pb = num_extents / lptr->num_blocks;
> > > -	lptr->modulo = num_extents % lptr->num_blocks;
> > > -	lptr->num_recs_tot = num_extents;
> > > -	level = 1;
> > > -
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -	fprintf(stderr, "%s 0 %d %d %d %d\n", __func__,
> > > -			lptr->num_blocks,
> > > -			lptr->num_recs_pb,
> > > -			lptr->modulo,
> > > -			lptr->num_recs_tot);
> > > -#endif
> > > -	/*
> > > -	 * if we need more levels, set them up.  # of records
> > > -	 * per level is the # of blocks in the level below it
> > > -	 */
> > > -	if (lptr->num_blocks > 1)  {
> > > -		for (; btree_curs->level[level - 1].num_blocks > 1
> > > -				&& level < XFS_BTREE_MAXLEVELS;
> > > -				level++)  {
> > > -			lptr = &btree_curs->level[level];
> > > -			p_lptr = &btree_curs->level[level - 1];
> > > -			lptr->num_blocks = howmany(p_lptr->num_blocks,
> > > -					XR_ALLOC_BLOCK_MAXRECS(mp, level));
> > > -			lptr->modulo = p_lptr->num_blocks
> > > -					% lptr->num_blocks;
> > > -			lptr->num_recs_pb = p_lptr->num_blocks
> > > -					/ lptr->num_blocks;
> > > -			lptr->num_recs_tot = p_lptr->num_blocks;
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -			fprintf(stderr, "%s %d %d %d %d %d\n", __func__,
> > > -					level,
> > > -					lptr->num_blocks,
> > > -					lptr->num_recs_pb,
> > > -					lptr->modulo,
> > > -					lptr->num_recs_tot);
> > > -#endif
> > > -		}
> > > -	}
> > > -
> > > -	ASSERT(lptr->num_blocks == 1);
> > > -	btree_curs->num_levels = level;
> > > -
> > > -	/*
> > > -	 * ok, now we have a hypothetical cursor that
> > > -	 * will work for both the bno and bcnt trees.
> > > -	 * now figure out if using up blocks to set up the
> > > -	 * trees will perturb the shape of the freespace tree.
> > > -	 * if so, we've over-allocated.  the freespace trees
> > > -	 * as they will be *after* accounting for the free space
> > > -	 * we've used up will need fewer blocks to represent
> > > -	 * than we've allocated.  We can use the AGFL to hold
> > > -	 * xfs_agfl_size (sector/struct xfs_agfl) blocks but that's it.
> > > -	 * Thus we limit things to xfs_agfl_size/2 for each of the 2 btrees.
> > > -	 * if the number of extra blocks is more than that,
> > > -	 * we'll have to be called again.
> > > -	 */
> > > -	for (blocks_needed = 0, i = 0; i < level; i++)  {
> > > -		blocks_needed += btree_curs->level[i].num_blocks;
> > > -	}
> > > -
> > > -	/*
> > > -	 * record the # of blocks we've allocated
> > > -	 */
> > > -	blocks_allocated_pt = blocks_needed;
> > > -	blocks_needed *= 2;
> > > -	blocks_allocated_total = blocks_needed;
> > > -
> > > -	/*
> > > -	 * figure out how many free extents will be used up by
> > > -	 * our space allocation
> > > -	 */
> > > -	if ((ext_ptr = findfirst_bcnt_extent(agno)) == NULL)
> > > -		do_error(_("can't rebuild fs trees -- not enough free space "
> > > -			   "on ag %u\n"), agno);
> > > -
> > > -	while (ext_ptr != NULL && blocks_needed > 0)  {
> > > -		if (ext_ptr->ex_blockcount <= blocks_needed)  {
> > > -			blocks_needed -= ext_ptr->ex_blockcount;
> > > -			extents_used++;
> > > -		} else  {
> > > -			blocks_needed = 0;
> > > -		}
> > > -
> > > -		ext_ptr = findnext_bcnt_extent(agno, ext_ptr);
> > > -
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -		if (ext_ptr != NULL)  {
> > > -			fprintf(stderr, "got next extent [%u %u]\n",
> > > -				ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> > > -		} else  {
> > > -			fprintf(stderr, "out of extents\n");
> > > -		}
> > > -#endif
> > > -	}
> > > -	if (blocks_needed > 0)
> > > -		do_error(_("ag %u - not enough free space to build freespace "
> > > -			   "btrees\n"), agno);
> > > -
> > > -	ASSERT(num_extents >= extents_used);
> > > -
> > > -	num_extents -= extents_used;
> > > -
> > > -	/*
> > > -	 * see if the number of leaf blocks will change as a result
> > > -	 * of the number of extents changing
> > > -	 */
> > > -	if (howmany(num_extents, XR_ALLOC_BLOCK_MAXRECS(mp, 0))
> > > -			!= btree_curs->level[0].num_blocks)  {
> > > -		/*
> > > -		 * yes -- recalculate the cursor.  If the number of
> > > -		 * excess (overallocated) blocks is < xfs_agfl_size/2, we're ok.
> > > -		 * we can put those into the AGFL.  we don't try
> > > -		 * and get things to converge exactly (reach a
> > > -		 * state with zero excess blocks) because there
> > > -		 * exist pathological cases which will never
> > > -		 * converge.  first, check for the zero-case.
> > > -		 */
> > > -		if (num_extents == 0)  {
> > > -			/*
> > > -			 * ok, we've used up all the free blocks
> > > -			 * trying to lay out the leaf level. go
> > > -			 * to a one block (empty) btree and put the
> > > -			 * already allocated blocks into the AGFL
> > > -			 */
> > > -			if (btree_curs->level[0].num_blocks != 1)  {
> > > -				/*
> > > -				 * we really needed more blocks because
> > > -				 * the old tree had more than one level.
> > > -				 * this is bad.
> > > -				 */
> > > -				 do_warn(_("not enough free blocks left to "
> > > -					   "describe all free blocks in AG "
> > > -					   "%u\n"), agno);
> > > -			}
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -			fprintf(stderr,
> > > -				"ag %u -- no free extents, alloc'ed %d\n",
> > > -				agno, blocks_allocated_pt);
> > > -#endif
> > > -			lptr->num_blocks = 1;
> > > -			lptr->modulo = 0;
> > > -			lptr->num_recs_pb = 0;
> > > -			lptr->num_recs_tot = 0;
> > > -
> > > -			btree_curs->num_levels = 1;
> > > -
> > > -			/*
> > > -			 * don't reset the allocation stats, assume
> > > -			 * they're all extra blocks
> > > -			 * don't forget to return the total block count
> > > -			 * not the per-tree block count.  these are the
> > > -			 * extras that will go into the AGFL.  subtract
> > > -			 * two for the root blocks.
> > > -			 */
> > > -			btree_curs->num_tot_blocks = blocks_allocated_pt;
> > > -			btree_curs->num_free_blocks = blocks_allocated_pt;
> > > -
> > > -			*extents = 0;
> > > -
> > > -			return(blocks_allocated_total - 2);
> > > -		}
> > > -
> > > -		lptr = &btree_curs->level[0];
> > > -		lptr->num_blocks = howmany(num_extents,
> > > -					XR_ALLOC_BLOCK_MAXRECS(mp, 0));
> > > -		lptr->num_recs_pb = num_extents / lptr->num_blocks;
> > > -		lptr->modulo = num_extents % lptr->num_blocks;
> > > -		lptr->num_recs_tot = num_extents;
> > > -		level = 1;
> > > -
> > > -		/*
> > > -		 * if we need more levels, set them up
> > > -		 */
> > > -		if (lptr->num_blocks > 1)  {
> > > -			for (level = 1; btree_curs->level[level-1].num_blocks
> > > -					> 1 && level < XFS_BTREE_MAXLEVELS;
> > > -					level++)  {
> > > -				lptr = &btree_curs->level[level];
> > > -				p_lptr = &btree_curs->level[level-1];
> > > -				lptr->num_blocks = howmany(p_lptr->num_blocks,
> > > -					XR_ALLOC_BLOCK_MAXRECS(mp, level));
> > > -				lptr->modulo = p_lptr->num_blocks
> > > -						% lptr->num_blocks;
> > > -				lptr->num_recs_pb = p_lptr->num_blocks
> > > -						/ lptr->num_blocks;
> > > -				lptr->num_recs_tot = p_lptr->num_blocks;
> > > -			}
> > > -		}
> > > -		ASSERT(lptr->num_blocks == 1);
> > > -		btree_curs->num_levels = level;
> > > -
> > > -		/*
> > > -		 * now figure out the number of excess blocks
> > > -		 */
> > > -		for (blocks_needed = 0, i = 0; i < level; i++)  {
> > > -			blocks_needed += btree_curs->level[i].num_blocks;
> > > -		}
> > > -		blocks_needed *= 2;
> > > -
> > > -		ASSERT(blocks_allocated_total >= blocks_needed);
> > > -		extra_blocks = blocks_allocated_total - blocks_needed;
> > > -	} else  {
> > > -		if (extents_used > 0) {
> > > -			/*
> > > -			 * reset the leaf level geometry to account
> > > -			 * for consumed extents.  we can leave the
> > > -			 * rest of the cursor alone since the number
> > > -			 * of leaf blocks hasn't changed.
> > > -			 */
> > > -			lptr = &btree_curs->level[0];
> > > -
> > > -			lptr->num_recs_pb = num_extents / lptr->num_blocks;
> > > -			lptr->modulo = num_extents % lptr->num_blocks;
> > > -			lptr->num_recs_tot = num_extents;
> > > -		}
> > > -
> > > -		extra_blocks = 0;
> > > -	}
> > > -
> > > -	btree_curs->num_tot_blocks = blocks_allocated_pt;
> > > -	btree_curs->num_free_blocks = blocks_allocated_pt;
> > > -
> > > -	*extents = num_extents;
> > > -
> > > -	return(extra_blocks);
> > > -}
> > > -
> > >  /* Map btnum to buffer ops for the types that need it. */
> > >  static const struct xfs_buf_ops *
> > >  btnum_to_ops(
> > > @@ -643,270 +369,6 @@ btnum_to_ops(
> > >  	}
> > >  }
> > >  
> > > -static void
> > > -prop_freespace_cursor(xfs_mount_t *mp, xfs_agnumber_t agno,
> > > -		bt_status_t *btree_curs, xfs_agblock_t startblock,
> > > -		xfs_extlen_t blockcount, int level, xfs_btnum_t btnum)
> > > -{
> > > -	struct xfs_btree_block	*bt_hdr;
> > > -	xfs_alloc_key_t		*bt_key;
> > > -	xfs_alloc_ptr_t		*bt_ptr;
> > > -	xfs_agblock_t		agbno;
> > > -	bt_stat_level_t		*lptr;
> > > -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> > > -	int			error;
> > > -
> > > -	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
> > > -
> > > -	level++;
> > > -
> > > -	if (level >= btree_curs->num_levels)
> > > -		return;
> > > -
> > > -	lptr = &btree_curs->level[level];
> > > -	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > > -
> > > -	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
> > > -		/*
> > > -		 * only happens once when initializing the
> > > -		 * left-hand side of the tree.
> > > -		 */
> > > -		prop_freespace_cursor(mp, agno, btree_curs, startblock,
> > > -				blockcount, level, btnum);
> > > -	}
> > > -
> > > -	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
> > > -				lptr->num_recs_pb + (lptr->modulo > 0))  {
> > > -		/*
> > > -		 * write out current prev block, grab us a new block,
> > > -		 * and set the rightsib pointer of current block
> > > -		 */
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -		fprintf(stderr, " %d ", lptr->prev_agbno);
> > > -#endif
> > > -		if (lptr->prev_agbno != NULLAGBLOCK) {
> > > -			ASSERT(lptr->prev_buf_p != NULL);
> > > -			libxfs_buf_mark_dirty(lptr->prev_buf_p);
> > > -			libxfs_buf_relse(lptr->prev_buf_p);
> > > -		}
> > > -		lptr->prev_agbno = lptr->agbno;
> > > -		lptr->prev_buf_p = lptr->buf_p;
> > > -		agbno = get_next_blockaddr(agno, level, btree_curs);
> > > -
> > > -		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
> > > -
> > > -		error = -libxfs_buf_get(mp->m_dev,
> > > -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> > > -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> > > -		if (error)
> > > -			do_error(
> > > -	_("Cannot grab free space btree buffer, err=%d"),
> > > -					error);
> > > -		lptr->agbno = agbno;
> > > -
> > > -		if (lptr->modulo)
> > > -			lptr->modulo--;
> > > -
> > > -		/*
> > > -		 * initialize block header
> > > -		 */
> > > -		lptr->buf_p->b_ops = ops;
> > > -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > > -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> > > -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, level,
> > > -					0, agno);
> > > -
> > > -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> > > -
> > > -		/*
> > > -		 * propagate extent record for first extent in new block up
> > > -		 */
> > > -		prop_freespace_cursor(mp, agno, btree_curs, startblock,
> > > -				blockcount, level, btnum);
> > > -	}
> > > -	/*
> > > -	 * add extent info to current block
> > > -	 */
> > > -	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
> > > -
> > > -	bt_key = XFS_ALLOC_KEY_ADDR(mp, bt_hdr,
> > > -				be16_to_cpu(bt_hdr->bb_numrecs));
> > > -	bt_ptr = XFS_ALLOC_PTR_ADDR(mp, bt_hdr,
> > > -				be16_to_cpu(bt_hdr->bb_numrecs),
> > > -				mp->m_alloc_mxr[1]);
> > > -
> > > -	bt_key->ar_startblock = cpu_to_be32(startblock);
> > > -	bt_key->ar_blockcount = cpu_to_be32(blockcount);
> > > -	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
> > > -}
> > > -
> > > -/*
> > > - * rebuilds a freespace tree given a cursor and type
> > > - * of tree to build (bno or bcnt).  returns the number of free blocks
> > > - * represented by the tree.
> > > - */
> > > -static xfs_extlen_t
> > > -build_freespace_tree(xfs_mount_t *mp, xfs_agnumber_t agno,
> > > -		bt_status_t *btree_curs, xfs_btnum_t btnum)
> > > -{
> > > -	xfs_agnumber_t		i;
> > > -	xfs_agblock_t		j;
> > > -	struct xfs_btree_block	*bt_hdr;
> > > -	xfs_alloc_rec_t		*bt_rec;
> > > -	int			level;
> > > -	xfs_agblock_t		agbno;
> > > -	extent_tree_node_t	*ext_ptr;
> > > -	bt_stat_level_t		*lptr;
> > > -	xfs_extlen_t		freeblks;
> > > -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> > > -	int			error;
> > > -
> > > -	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
> > > -
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -	fprintf(stderr, "in build_freespace_tree, agno = %d\n", agno);
> > > -#endif
> > > -	level = btree_curs->num_levels;
> > > -	freeblks = 0;
> > > -
> > > -	ASSERT(level > 0);
> > > -
> > > -	/*
> > > -	 * initialize the first block on each btree level
> > > -	 */
> > > -	for (i = 0; i < level; i++)  {
> > > -		lptr = &btree_curs->level[i];
> > > -
> > > -		agbno = get_next_blockaddr(agno, i, btree_curs);
> > > -		error = -libxfs_buf_get(mp->m_dev,
> > > -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> > > -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> > > -		if (error)
> > > -			do_error(
> > > -	_("Cannot grab free space btree buffer, err=%d"),
> > > -					error);
> > > -
> > > -		if (i == btree_curs->num_levels - 1)
> > > -			btree_curs->root = agbno;
> > > -
> > > -		lptr->agbno = agbno;
> > > -		lptr->prev_agbno = NULLAGBLOCK;
> > > -		lptr->prev_buf_p = NULL;
> > > -		/*
> > > -		 * initialize block header
> > > -		 */
> > > -		lptr->buf_p->b_ops = ops;
> > > -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > > -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> > > -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, i, 0, agno);
> > > -	}
> > > -	/*
> > > -	 * run along leaf, setting up records.  as we have to switch
> > > -	 * blocks, call the prop_freespace_cursor routine to set up the new
> > > -	 * pointers for the parent.  that can recurse up to the root
> > > -	 * if required.  set the sibling pointers for leaf level here.
> > > -	 */
> > > -	if (btnum == XFS_BTNUM_BNO)
> > > -		ext_ptr = findfirst_bno_extent(agno);
> > > -	else
> > > -		ext_ptr = findfirst_bcnt_extent(agno);
> > > -
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -	fprintf(stderr, "bft, agno = %d, start = %u, count = %u\n",
> > > -		agno, ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> > > -#endif
> > > -
> > > -	lptr = &btree_curs->level[0];
> > > -
> > > -	for (i = 0; i < btree_curs->level[0].num_blocks; i++)  {
> > > -		/*
> > > -		 * block initialization, lay in block header
> > > -		 */
> > > -		lptr->buf_p->b_ops = ops;
> > > -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > > -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> > > -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, 0, 0, agno);
> > > -
> > > -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> > > -		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
> > > -							(lptr->modulo > 0));
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -		fprintf(stderr, "bft, bb_numrecs = %d\n",
> > > -				be16_to_cpu(bt_hdr->bb_numrecs));
> > > -#endif
> > > -
> > > -		if (lptr->modulo > 0)
> > > -			lptr->modulo--;
> > > -
> > > -		/*
> > > -		 * initialize values in the path up to the root if
> > > -		 * this is a multi-level btree
> > > -		 */
> > > -		if (btree_curs->num_levels > 1)
> > > -			prop_freespace_cursor(mp, agno, btree_curs,
> > > -					ext_ptr->ex_startblock,
> > > -					ext_ptr->ex_blockcount,
> > > -					0, btnum);
> > > -
> > > -		bt_rec = (xfs_alloc_rec_t *)
> > > -			  ((char *)bt_hdr + XFS_ALLOC_BLOCK_LEN(mp));
> > > -		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
> > > -			ASSERT(ext_ptr != NULL);
> > > -			bt_rec[j].ar_startblock = cpu_to_be32(
> > > -							ext_ptr->ex_startblock);
> > > -			bt_rec[j].ar_blockcount = cpu_to_be32(
> > > -							ext_ptr->ex_blockcount);
> > > -			freeblks += ext_ptr->ex_blockcount;
> > > -			if (btnum == XFS_BTNUM_BNO)
> > > -				ext_ptr = findnext_bno_extent(ext_ptr);
> > > -			else
> > > -				ext_ptr = findnext_bcnt_extent(agno, ext_ptr);
> > > -#if 0
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -			if (ext_ptr == NULL)
> > > -				fprintf(stderr, "null extent pointer, j = %d\n",
> > > -					j);
> > > -			else
> > > -				fprintf(stderr,
> > > -				"bft, agno = %d, start = %u, count = %u\n",
> > > -					agno, ext_ptr->ex_startblock,
> > > -					ext_ptr->ex_blockcount);
> > > -#endif
> > > -#endif
> > > -		}
> > > -
> > > -		if (ext_ptr != NULL)  {
> > > -			/*
> > > -			 * get next leaf level block
> > > -			 */
> > > -			if (lptr->prev_buf_p != NULL)  {
> > > -#ifdef XR_BLD_FREE_TRACE
> > > -				fprintf(stderr, " writing fst agbno %u\n",
> > > -					lptr->prev_agbno);
> > > -#endif
> > > -				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
> > > -				libxfs_buf_mark_dirty(lptr->prev_buf_p);
> > > -				libxfs_buf_relse(lptr->prev_buf_p);
> > > -			}
> > > -			lptr->prev_buf_p = lptr->buf_p;
> > > -			lptr->prev_agbno = lptr->agbno;
> > > -			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
> > > -			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
> > > -
> > > -			error = -libxfs_buf_get(mp->m_dev,
> > > -					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
> > > -					XFS_FSB_TO_BB(mp, 1),
> > > -					&lptr->buf_p);
> > > -			if (error)
> > > -				do_error(
> > > -	_("Cannot grab free space btree buffer, err=%d"),
> > > -						error);
> > > -		}
> > > -	}
> > > -
> > > -	return(freeblks);
> > > -}
> > > -
> > >  /*
> > >   * XXX(hch): any reason we don't just look at mp->m_inobt_mxr?
> > >   */
> > > @@ -2038,6 +1500,28 @@ _("Insufficient memory to construct refcount cursor."));
> > >  	free_slab_cursor(&refc_cur);
> > >  }
> > >  
> > > +/* Fill the AGFL with any leftover bnobt rebuilder blocks. */
> > > +static void
> > > +fill_agfl(
> > > +	struct bt_rebuild	*btr,
> > > +	__be32			*agfl_bnos,
> > > +	unsigned int		*agfl_idx)
> > > +{
> > > +	struct bulkload_resv	*resv, *n;
> > > +	struct xfs_mount	*mp = btr->newbt.sc->mp;
> > > +
> > > +	for_each_bulkload_reservation(&btr->newbt, resv, n) {
> > > +		xfs_agblock_t	bno;
> > > +
> > > +		bno = XFS_FSB_TO_AGBNO(mp, resv->fsbno + resv->used);
> > > +		while (resv->used < resv->len &&
> > > +		       *agfl_idx < libxfs_agfl_size(mp)) {
> > > +			agfl_bnos[(*agfl_idx)++] = cpu_to_be32(bno++);
> > > +			resv->used++;
> > > +		}
> > > +	}
> > > +}
> > > +
> > >  /*
> > >   * build both the agf and the agfl for an agno given both
> > >   * btree cursors.
> > > @@ -2048,9 +1532,8 @@ static void
> > >  build_agf_agfl(
> > >  	struct xfs_mount	*mp,
> > >  	xfs_agnumber_t		agno,
> > > -	struct bt_status	*bno_bt,
> > > -	struct bt_status	*bcnt_bt,
> > > -	xfs_extlen_t		freeblks,	/* # free blocks in tree */
> > > +	struct bt_rebuild	*btr_bno,
> > > +	struct bt_rebuild	*btr_cnt,
> > >  	struct bt_status	*rmap_bt,
> > >  	struct bt_status	*refcnt_bt,
> > >  	struct xfs_slab		*lost_fsb)
> > > @@ -2060,7 +1543,6 @@ build_agf_agfl(
> > >  	unsigned int		agfl_idx;
> > >  	struct xfs_agfl		*agfl;
> > >  	struct xfs_agf		*agf;
> > > -	xfs_fsblock_t		fsb;
> > >  	__be32			*freelist;
> > >  	int			error;
> > >  
> > > @@ -2092,13 +1574,17 @@ build_agf_agfl(
> > >  		agf->agf_length = cpu_to_be32(mp->m_sb.sb_dblocks -
> > >  			(xfs_rfsblock_t) mp->m_sb.sb_agblocks * agno);
> > >  
> > > -	agf->agf_roots[XFS_BTNUM_BNO] = cpu_to_be32(bno_bt->root);
> > > -	agf->agf_levels[XFS_BTNUM_BNO] = cpu_to_be32(bno_bt->num_levels);
> > > -	agf->agf_roots[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->root);
> > > -	agf->agf_levels[XFS_BTNUM_CNT] = cpu_to_be32(bcnt_bt->num_levels);
> > > +	agf->agf_roots[XFS_BTNUM_BNO] =
> > > +			cpu_to_be32(btr_bno->newbt.afake.af_root);
> > > +	agf->agf_levels[XFS_BTNUM_BNO] =
> > > +			cpu_to_be32(btr_bno->newbt.afake.af_levels);
> > > +	agf->agf_roots[XFS_BTNUM_CNT] =
> > > +			cpu_to_be32(btr_cnt->newbt.afake.af_root);
> > > +	agf->agf_levels[XFS_BTNUM_CNT] =
> > > +			cpu_to_be32(btr_cnt->newbt.afake.af_levels);
> > >  	agf->agf_roots[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->root);
> > >  	agf->agf_levels[XFS_BTNUM_RMAP] = cpu_to_be32(rmap_bt->num_levels);
> > > -	agf->agf_freeblks = cpu_to_be32(freeblks);
> > > +	agf->agf_freeblks = cpu_to_be32(btr_bno->freeblks);
> > >  	agf->agf_rmap_blocks = cpu_to_be32(rmap_bt->num_tot_blocks -
> > >  			rmap_bt->num_free_blocks);
> > >  	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
> > > @@ -2115,9 +1601,8 @@ build_agf_agfl(
> > >  		 * Don't count the root blocks as they are already
> > >  		 * accounted for.
> > >  		 */
> > > -		blks = (bno_bt->num_tot_blocks - bno_bt->num_free_blocks) +
> > > -			(bcnt_bt->num_tot_blocks - bcnt_bt->num_free_blocks) -
> > > -			2;
> > > +		blks = btr_bno->newbt.afake.af_blocks +
> > > +			btr_cnt->newbt.afake.af_blocks - 2;
> > >  		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > >  			blks += rmap_bt->num_tot_blocks - rmap_bt->num_free_blocks - 1;
> > >  		agf->agf_btreeblks = cpu_to_be32(blks);
> > > @@ -2159,50 +1644,14 @@ build_agf_agfl(
> > >  			freelist[agfl_idx] = cpu_to_be32(NULLAGBLOCK);
> > >  	}
> > >  
> > > -	/*
> > > -	 * do we have left-over blocks in the btree cursors that should
> > > -	 * be used to fill the AGFL?
> > > -	 */
> > > -	if (bno_bt->num_free_blocks > 0 || bcnt_bt->num_free_blocks > 0)  {
> > > -		/*
> > > -		 * yes, now grab as many blocks as we can
> > > -		 */
> > > -		agfl_idx = 0;
> > > -		while (bno_bt->num_free_blocks > 0 &&
> > > -		       agfl_idx < libxfs_agfl_size(mp))
> > > -		{
> > > -			freelist[agfl_idx] = cpu_to_be32(
> > > -					get_next_blockaddr(agno, 0, bno_bt));
> > > -			agfl_idx++;
> > > -		}
> > > -
> > > -		while (bcnt_bt->num_free_blocks > 0 &&
> > > -		       agfl_idx < libxfs_agfl_size(mp))
> > > -		{
> > > -			freelist[agfl_idx] = cpu_to_be32(
> > > -					get_next_blockaddr(agno, 0, bcnt_bt));
> > > -			agfl_idx++;
> > > -		}
> > > -		/*
> > > -		 * now throw the rest of the blocks away and complain
> > > -		 */
> > > -		while (bno_bt->num_free_blocks > 0) {
> > > -			fsb = XFS_AGB_TO_FSB(mp, agno,
> > > -					get_next_blockaddr(agno, 0, bno_bt));
> > > -			error = slab_add(lost_fsb, &fsb);
> > > -			if (error)
> > > -				do_error(
> > > -_("Insufficient memory saving lost blocks.\n"));
> > > -		}
> > > -		while (bcnt_bt->num_free_blocks > 0) {
> > > -			fsb = XFS_AGB_TO_FSB(mp, agno,
> > > -					get_next_blockaddr(agno, 0, bcnt_bt));
> > > -			error = slab_add(lost_fsb, &fsb);
> > > -			if (error)
> > > -				do_error(
> > > -_("Insufficient memory saving lost blocks.\n"));
> > > -		}
> > > +	/* Fill the AGFL with leftover blocks or save them for later. */
> > > +	agfl_idx = 0;
> > > +	freelist = xfs_buf_to_agfl_bno(agfl_buf);
> > > +	fill_agfl(btr_bno, freelist, &agfl_idx);
> > > +	fill_agfl(btr_cnt, freelist, &agfl_idx);
> > >  
> > > +	/* Set the AGF counters for the AGFL. */
> > > +	if (agfl_idx > 0) {
> > >  		agf->agf_flfirst = 0;
> > >  		agf->agf_fllast = cpu_to_be32(agfl_idx - 1);
> > >  		agf->agf_flcount = cpu_to_be32(agfl_idx);
> > > @@ -2300,18 +1749,14 @@ phase5_func(
> > >  	uint64_t		num_free_inos;
> > >  	uint64_t		finobt_num_inos;
> > >  	uint64_t		finobt_num_free_inos;
> > > -	bt_status_t		bno_btree_curs;
> > > -	bt_status_t		bcnt_btree_curs;
> > > +	struct bt_rebuild	btr_bno;
> > > +	struct bt_rebuild	btr_cnt;
> > >  	bt_status_t		ino_btree_curs;
> > >  	bt_status_t		fino_btree_curs;
> > >  	bt_status_t		rmap_btree_curs;
> > >  	bt_status_t		refcnt_btree_curs;
> > >  	int			extra_blocks = 0;
> > >  	uint			num_freeblocks;
> > > -	xfs_extlen_t		freeblks1;
> > > -#ifdef DEBUG
> > > -	xfs_extlen_t		freeblks2;
> > > -#endif
> > >  	xfs_agblock_t		num_extents;
> > >  
> > >  	if (verbose)
> > > @@ -2320,7 +1765,7 @@ phase5_func(
> > >  	/*
> > >  	 * build up incore bno and bcnt extent btrees
> > >  	 */
> > > -	num_extents = mk_incore_fstree(mp, agno);
> > > +	num_extents = mk_incore_fstree(mp, agno, &num_freeblocks);
> > >  
> > >  #ifdef XR_BLD_FREE_TRACE
> > >  	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
> > > @@ -2392,8 +1837,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  	/*
> > >  	 * track blocks that we might really lose
> > >  	 */
> > > -	extra_blocks = calculate_freespace_cursor(mp, agno,
> > > -				&num_extents, &bno_btree_curs);
> > > +	init_freespace_cursors(&sc, agno, num_freeblocks, &num_extents,
> > > +			&extra_blocks, &btr_bno, &btr_cnt);
> > >  
> > >  	/*
> > >  	 * freespace btrees live in the "free space" but the filesystem treats
> > > @@ -2410,37 +1855,18 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  	if (extra_blocks > 0)
> > >  		sb_fdblocks_ag[agno] -= extra_blocks;
> > >  
> > > -	bcnt_btree_curs = bno_btree_curs;
> > > -
> > > -	bno_btree_curs.owner = XFS_RMAP_OWN_AG;
> > > -	bcnt_btree_curs.owner = XFS_RMAP_OWN_AG;
> > > -	setup_cursor(mp, agno, &bno_btree_curs);
> > > -	setup_cursor(mp, agno, &bcnt_btree_curs);
> > > -
> > >  #ifdef XR_BLD_FREE_TRACE
> > >  	fprintf(stderr, "# of bno extents is %d\n", count_bno_extents(agno));
> > >  	fprintf(stderr, "# of bcnt extents is %d\n", count_bcnt_extents(agno));
> > >  #endif
> > >  
> > > -	/*
> > > -	 * now rebuild the freespace trees
> > > -	 */
> > > -	freeblks1 = build_freespace_tree(mp, agno,
> > > -					&bno_btree_curs, XFS_BTNUM_BNO);
> > > +	build_freespace_btrees(&sc, agno, &btr_bno, &btr_cnt);
> > > +
> > >  #ifdef XR_BLD_FREE_TRACE
> > > -	fprintf(stderr, "# of free blocks == %d\n", freeblks1);
> > > +	fprintf(stderr, "# of free blocks == %d/%d\n", btr_bno.freeblks,
> > > +			btr_cnt.freeblks);
> > >  #endif
> > > -	write_cursor(&bno_btree_curs);
> > > -
> > > -#ifdef DEBUG
> > > -	freeblks2 = build_freespace_tree(mp, agno,
> > > -				&bcnt_btree_curs, XFS_BTNUM_CNT);
> > > -#else
> > > -	(void) build_freespace_tree(mp, agno, &bcnt_btree_curs, XFS_BTNUM_CNT);
> > > -#endif
> > > -	write_cursor(&bcnt_btree_curs);
> > > -
> > > -	ASSERT(freeblks1 == freeblks2);
> > > +	ASSERT(btr_bno.freeblks == btr_cnt.freeblks);
> > >  
> > >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > >  		build_rmap_tree(mp, agno, &rmap_btree_curs);
> > > @@ -2457,8 +1883,9 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  	/*
> > >  	 * set up agf and agfl
> > >  	 */
> > > -	build_agf_agfl(mp, agno, &bno_btree_curs, &bcnt_btree_curs, freeblks1,
> > > -			&rmap_btree_curs, &refcnt_btree_curs, lost_fsb);
> > > +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
> > > +			&refcnt_btree_curs, lost_fsb);
> > > +
> > >  	/*
> > >  	 * build inode allocation tree.
> > >  	 */
> > > @@ -2480,7 +1907,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  	/*
> > >  	 * tear down cursors
> > >  	 */
> > > -	finish_cursor(&bno_btree_curs);
> > > +	finish_rebuild(mp, &btr_bno, lost_fsb);
> > > +	finish_rebuild(mp, &btr_cnt, lost_fsb);
> > >  	finish_cursor(&ino_btree_curs);
> > >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > >  		finish_cursor(&rmap_btree_curs);
> > > @@ -2488,7 +1916,6 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  		finish_cursor(&refcnt_btree_curs);
> > >  	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > >  		finish_cursor(&fino_btree_curs);
> > > -	finish_cursor(&bcnt_btree_curs);
> > >  
> > >  	/*
> > >  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/12] xfs_repair: rebuild reverse mapping btrees with bulk loader
  2020-06-18 15:37       ` Brian Foster
@ 2020-06-18 16:54         ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-18 16:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: sandeen, linux-xfs

On Thu, Jun 18, 2020 at 11:37:40AM -0400, Brian Foster wrote:
> On Thu, Jun 18, 2020 at 08:31:00AM -0700, Darrick J. Wong wrote:
> > On Thu, Jun 18, 2020 at 11:25:11AM -0400, Brian Foster wrote:
> > > On Mon, Jun 01, 2020 at 09:27:51PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Use the btree bulk loading functions to rebuild the reverse mapping
> > > > btrees and drop the open-coded implementation.
> > > > 
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  libxfs/libxfs_api_defs.h |    1 
> > > >  repair/agbtree.c         |   70 ++++++++
> > > >  repair/agbtree.h         |    5 +
> > > >  repair/phase5.c          |  409 ++--------------------------------------------
> > > >  4 files changed, 96 insertions(+), 389 deletions(-)
> > > > 
> > > > 
> > > ...
> > > > diff --git a/repair/phase5.c b/repair/phase5.c
> > > > index e570349d..1c6448f4 100644
> > > > --- a/repair/phase5.c
> > > > +++ b/repair/phase5.c
> > > ...
> > > > @@ -1244,6 +879,8 @@ build_agf_agfl(
> > > >  	freelist = xfs_buf_to_agfl_bno(agfl_buf);
> > > >  	fill_agfl(btr_bno, freelist, &agfl_idx);
> > > >  	fill_agfl(btr_cnt, freelist, &agfl_idx);
> > > > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > > +		fill_agfl(btr_rmap, freelist, &agfl_idx);
> > > 
> > > Is this new behavior? Either way, I guess it makes sense since the
> > > rmapbt feeds from/to the agfl:
> > 
> > It's a defensive move to make sure we don't lose the blocks if we
> > overestimate the size of the rmapbt.  We never did in the past (and we
> > shouldn't now) but I figured I should throw that in as a defensive
> > measure so we don't leak the blocks if something goes wrong.
> > 
> > (Granted, I think in the past any overages would have been freed back
> > into the filesystem...)
> > 
> 
> I thought that was still the case since finish_rebuild() moves any
> unused blocks over to the lost_fsb slab, which is why I was asking about
> the agfl filling specifically.

Ah, right.  Ok.  I'll add a note to the commit message about how we now
feed unused rmapbt blocks back to the AGFL, similar to how we would in
regular operation.

--D

> Brian
> 
> > Thanks for the review.
> > 
> > --D
> > 
> > > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > > 
> > > >  
> > > >  	/* Set the AGF counters for the AGFL. */
> > > >  	if (agfl_idx > 0) {
> > > > @@ -1343,7 +980,7 @@ phase5_func(
> > > >  	struct bt_rebuild	btr_cnt;
> > > >  	struct bt_rebuild	btr_ino;
> > > >  	struct bt_rebuild	btr_fino;
> > > > -	bt_status_t		rmap_btree_curs;
> > > > +	struct bt_rebuild	btr_rmap;
> > > >  	bt_status_t		refcnt_btree_curs;
> > > >  	int			extra_blocks = 0;
> > > >  	uint			num_freeblocks;
> > > > @@ -1378,11 +1015,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > > >  	init_ino_cursors(&sc, agno, num_freeblocks, &sb_icount_ag[agno],
> > > >  			&sb_ifree_ag[agno], &btr_ino, &btr_fino);
> > > >  
> > > > -	/*
> > > > -	 * Set up the btree cursors for the on-disk rmap btrees, which includes
> > > > -	 * pre-allocating all required blocks.
> > > > -	 */
> > > > -	init_rmapbt_cursor(mp, agno, &rmap_btree_curs);
> > > > +	init_rmapbt_cursor(&sc, agno, num_freeblocks, &btr_rmap);
> > > >  
> > > >  	/*
> > > >  	 * Set up the btree cursors for the on-disk refcount btrees,
> > > > @@ -1448,10 +1081,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > > >  	ASSERT(btr_bno.freeblks == btr_cnt.freeblks);
> > > >  
> > > >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > > > -		build_rmap_tree(mp, agno, &rmap_btree_curs);
> > > > -		write_cursor(&rmap_btree_curs);
> > > > -		sb_fdblocks_ag[agno] += (rmap_btree_curs.num_tot_blocks -
> > > > -				rmap_btree_curs.num_free_blocks) - 1;
> > > > +		build_rmap_tree(&sc, agno, &btr_rmap);
> > > > +		sb_fdblocks_ag[agno] += btr_rmap.newbt.afake.af_blocks - 1;
> > > >  	}
> > > >  
> > > >  	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > > @@ -1462,7 +1093,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > > >  	/*
> > > >  	 * set up agf and agfl
> > > >  	 */
> > > > -	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
> > > > +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap,
> > > >  			&refcnt_btree_curs, lost_fsb);
> > > >  
> > > >  	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
> > > > @@ -1479,7 +1110,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > > >  	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > > >  		finish_rebuild(mp, &btr_fino, lost_fsb);
> > > >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > > > -		finish_cursor(&rmap_btree_curs);
> > > > +		finish_rebuild(mp, &btr_rmap, lost_fsb);
> > > >  	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > > >  		finish_cursor(&refcnt_btree_curs);
> > > >  
> > > > 
> > > 
> > 
> 


* Re: [PATCH 10/12] xfs_repair: rebuild refcount btrees with bulk loader
  2020-06-18 15:26   ` Brian Foster
@ 2020-06-18 16:56     ` Darrick J. Wong
  2020-06-18 17:05       ` Brian Foster
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-18 16:56 UTC (permalink / raw)
  To: Brian Foster; +Cc: sandeen, linux-xfs

On Thu, Jun 18, 2020 at 11:26:17AM -0400, Brian Foster wrote:
> On Mon, Jun 01, 2020 at 09:27:57PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use the btree bulk loading functions to rebuild the refcount btrees
> > and drop the open-coded implementation.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  libxfs/libxfs_api_defs.h |    1 
> >  repair/agbtree.c         |   71 ++++++++++
> >  repair/agbtree.h         |    5 +
> >  repair/phase5.c          |  341 ++--------------------------------------------
> >  4 files changed, 93 insertions(+), 325 deletions(-)
> > 
> > 
> ...
> > diff --git a/repair/phase5.c b/repair/phase5.c
> > index 1c6448f4..ad009416 100644
> > --- a/repair/phase5.c
> > +++ b/repair/phase5.c
> ...
> > @@ -817,10 +510,14 @@ build_agf_agfl(
> >  				cpu_to_be32(btr_rmap->newbt.afake.af_blocks);
> >  	}
> >  
> > -	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
> > -	agf->agf_refcount_level = cpu_to_be32(refcnt_bt->num_levels);
> > -	agf->agf_refcount_blocks = cpu_to_be32(refcnt_bt->num_tot_blocks -
> > -			refcnt_bt->num_free_blocks);
> > +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > +		agf->agf_refcount_root =
> > +				cpu_to_be32(btr_refc->newbt.afake.af_root);
> > +		agf->agf_refcount_level =
> > +				cpu_to_be32(btr_refc->newbt.afake.af_levels);
> > +		agf->agf_refcount_blocks =
> > +				cpu_to_be32(btr_refc->newbt.afake.af_blocks);
> > +	}
> 
> It looks like the previous cursor variant (refcnt_bt) would be zeroed
> out if the feature isn't enabled (causing this to zero out the agf
> fields on disk), whereas now we only write the fields when the feature
> is enabled. Any concern over removing that zeroing behavior? Also note
> that an assert further down unconditionally reads the
> ->agf_refcount_root field.
> 
> BTW, I suppose the same question may apply to the previous patch as
> well...

I'll double check, but we do memset the AGF (and AGI) to zero before we
start initializing things, so the asserts should be fine even on
!reflink filesystems.

--D

> Brian
> 
> >  
> >  	/*
> >  	 * Count and record the number of btree blocks consumed if required.
> > @@ -981,7 +678,7 @@ phase5_func(
> >  	struct bt_rebuild	btr_ino;
> >  	struct bt_rebuild	btr_fino;
> >  	struct bt_rebuild	btr_rmap;
> > -	bt_status_t		refcnt_btree_curs;
> > +	struct bt_rebuild	btr_refc;
> >  	int			extra_blocks = 0;
> >  	uint			num_freeblocks;
> >  	xfs_agblock_t		num_extents;
> > @@ -1017,11 +714,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  
> >  	init_rmapbt_cursor(&sc, agno, num_freeblocks, &btr_rmap);
> >  
> > -	/*
> > -	 * Set up the btree cursors for the on-disk refcount btrees,
> > -	 * which includes pre-allocating all required blocks.
> > -	 */
> > -	init_refc_cursor(mp, agno, &refcnt_btree_curs);
> > +	init_refc_cursor(&sc, agno, num_freeblocks, &btr_refc);
> >  
> >  	num_extents = count_bno_extents_blocks(agno, &num_freeblocks);
> >  	/*
> > @@ -1085,16 +778,14 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  		sb_fdblocks_ag[agno] += btr_rmap.newbt.afake.af_blocks - 1;
> >  	}
> >  
> > -	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > -		build_refcount_tree(mp, agno, &refcnt_btree_curs);
> > -		write_cursor(&refcnt_btree_curs);
> > -	}
> > +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > +		build_refcount_tree(&sc, agno, &btr_refc);
> >  
> >  	/*
> >  	 * set up agf and agfl
> >  	 */
> > -	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap,
> > -			&refcnt_btree_curs, lost_fsb);
> > +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap, &btr_refc,
> > +			lost_fsb);
> >  
> >  	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
> >  
> > @@ -1112,7 +803,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> >  		finish_rebuild(mp, &btr_rmap, lost_fsb);
> >  	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > -		finish_cursor(&refcnt_btree_curs);
> > +		finish_rebuild(mp, &btr_refc, lost_fsb);
> >  
> >  	/*
> >  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> > 
> 


* Re: [PATCH 10/12] xfs_repair: rebuild refcount btrees with bulk loader
  2020-06-18 16:56     ` Darrick J. Wong
@ 2020-06-18 17:05       ` Brian Foster
  0 siblings, 0 replies; 42+ messages in thread
From: Brian Foster @ 2020-06-18 17:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Thu, Jun 18, 2020 at 09:56:22AM -0700, Darrick J. Wong wrote:
> On Thu, Jun 18, 2020 at 11:26:17AM -0400, Brian Foster wrote:
> > On Mon, Jun 01, 2020 at 09:27:57PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Use the btree bulk loading functions to rebuild the refcount btrees
> > > and drop the open-coded implementation.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  libxfs/libxfs_api_defs.h |    1 
> > >  repair/agbtree.c         |   71 ++++++++++
> > >  repair/agbtree.h         |    5 +
> > >  repair/phase5.c          |  341 ++--------------------------------------------
> > >  4 files changed, 93 insertions(+), 325 deletions(-)
> > > 
> > > 
> > ...
> > > diff --git a/repair/phase5.c b/repair/phase5.c
> > > index 1c6448f4..ad009416 100644
> > > --- a/repair/phase5.c
> > > +++ b/repair/phase5.c
> > ...
> > > @@ -817,10 +510,14 @@ build_agf_agfl(
> > >  				cpu_to_be32(btr_rmap->newbt.afake.af_blocks);
> > >  	}
> > >  
> > > -	agf->agf_refcount_root = cpu_to_be32(refcnt_bt->root);
> > > -	agf->agf_refcount_level = cpu_to_be32(refcnt_bt->num_levels);
> > > -	agf->agf_refcount_blocks = cpu_to_be32(refcnt_bt->num_tot_blocks -
> > > -			refcnt_bt->num_free_blocks);
> > > +	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > +		agf->agf_refcount_root =
> > > +				cpu_to_be32(btr_refc->newbt.afake.af_root);
> > > +		agf->agf_refcount_level =
> > > +				cpu_to_be32(btr_refc->newbt.afake.af_levels);
> > > +		agf->agf_refcount_blocks =
> > > +				cpu_to_be32(btr_refc->newbt.afake.af_blocks);
> > > +	}
> > 
> > It looks like the previous cursor variant (refcnt_bt) would be zeroed
> > out if the feature isn't enabled (causing this to zero out the agf
> > fields on disk), whereas now we only write the fields when the feature
> > is enabled. Any concern over removing that zeroing behavior? Also note
> > that an assert further down unconditionally reads the
> > ->agf_refcount_root field.
> > 
> > BTW, I suppose the same question may apply to the previous patch as
> > well...
> 
> I'll double check, but we do memset the AGF (and AGI) to zero before we
> start initializing things, so the asserts should be fine even on
> !reflink filesystems.
> 

Ah, so the implicit per-field zeroing behavior of the old implementation
is superfluous. Assert aside, I just wanted to make sure we weren't
removing some subtle mechanism for clearing unused metadata fields if
they happened to contain garbage. That is not the case, so this one
looks fine to me as well:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> --D
> 
> > Brian
> > 
> > >  
> > >  	/*
> > >  	 * Count and record the number of btree blocks consumed if required.
> > > @@ -981,7 +678,7 @@ phase5_func(
> > >  	struct bt_rebuild	btr_ino;
> > >  	struct bt_rebuild	btr_fino;
> > >  	struct bt_rebuild	btr_rmap;
> > > -	bt_status_t		refcnt_btree_curs;
> > > +	struct bt_rebuild	btr_refc;
> > >  	int			extra_blocks = 0;
> > >  	uint			num_freeblocks;
> > >  	xfs_agblock_t		num_extents;
> > > @@ -1017,11 +714,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  
> > >  	init_rmapbt_cursor(&sc, agno, num_freeblocks, &btr_rmap);
> > >  
> > > -	/*
> > > -	 * Set up the btree cursors for the on-disk refcount btrees,
> > > -	 * which includes pre-allocating all required blocks.
> > > -	 */
> > > -	init_refc_cursor(mp, agno, &refcnt_btree_curs);
> > > +	init_refc_cursor(&sc, agno, num_freeblocks, &btr_refc);
> > >  
> > >  	num_extents = count_bno_extents_blocks(agno, &num_freeblocks);
> > >  	/*
> > > @@ -1085,16 +778,14 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  		sb_fdblocks_ag[agno] += btr_rmap.newbt.afake.af_blocks - 1;
> > >  	}
> > >  
> > > -	if (xfs_sb_version_hasreflink(&mp->m_sb)) {
> > > -		build_refcount_tree(mp, agno, &refcnt_btree_curs);
> > > -		write_cursor(&refcnt_btree_curs);
> > > -	}
> > > +	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > > +		build_refcount_tree(&sc, agno, &btr_refc);
> > >  
> > >  	/*
> > >  	 * set up agf and agfl
> > >  	 */
> > > -	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap,
> > > -			&refcnt_btree_curs, lost_fsb);
> > > +	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap, &btr_refc,
> > > +			lost_fsb);
> > >  
> > >  	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
> > >  
> > > @@ -1112,7 +803,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> > >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > >  		finish_rebuild(mp, &btr_rmap, lost_fsb);
> > >  	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > > -		finish_cursor(&refcnt_btree_curs);
> > > +		finish_rebuild(mp, &btr_refc, lost_fsb);
> > >  
> > >  	/*
> > >  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> > > 
> > 
> 



* Re: [PATCH 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-06-17 12:10   ` Brian Foster
@ 2020-06-18 18:30     ` Darrick J. Wong
  2020-06-29 23:10     ` Darrick J. Wong
  1 sibling, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-18 18:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: sandeen, linux-xfs

On Wed, Jun 17, 2020 at 08:10:01AM -0400, Brian Foster wrote:
> On Mon, Jun 01, 2020 at 09:27:31PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create some new support structures and functions to assist phase5 in
> > using the btree bulk loader to reconstruct metadata btrees.  This is the
> > first step in removing the open-coded AG btree rebuilding code.
> > 
> > Note: The code in this patch will not be used anywhere until the next
> > patch, so warnings about unused symbols are expected.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> I still find it odd to include the phase5.c changes in this patch when
> it amounts to the addition of a single unused parameter, but I'll defer
> to the maintainer on that. Otherwise LGTM:

Yeah, I'll move it to the next patch.

--D

> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  repair/Makefile   |    4 +
> >  repair/agbtree.c  |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  repair/agbtree.h  |   29 ++++++++++
> >  repair/bulkload.c |   37 +++++++++++++
> >  repair/bulkload.h |    2 +
> >  repair/phase5.c   |   41 ++++++++------
> >  6 files changed, 244 insertions(+), 21 deletions(-)
> >  create mode 100644 repair/agbtree.c
> >  create mode 100644 repair/agbtree.h
> > 
> > 
> > diff --git a/repair/Makefile b/repair/Makefile
> > index 62d84bbf..f6a6e3f9 100644
> > --- a/repair/Makefile
> > +++ b/repair/Makefile
> > @@ -9,11 +9,11 @@ LSRCFILES = README
> >  
> >  LTCOMMAND = xfs_repair
> >  
> > -HFILES = agheader.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
> > +HFILES = agheader.h agbtree.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
> >  	da_util.h dinode.h dir2.h err_protos.h globals.h incore.h protos.h \
> >  	rt.h progress.h scan.h versions.h prefetch.h rmap.h slab.h threads.h
> >  
> > -CFILES = agheader.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
> > +CFILES = agheader.c agbtree.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
> >  	da_util.c dino_chunks.c dinode.c dir2.c globals.c incore.c \
> >  	incore_bmc.c init.c incore_ext.c incore_ino.c phase1.c \
> >  	phase2.c phase3.c phase4.c phase5.c phase6.c phase7.c \
> > diff --git a/repair/agbtree.c b/repair/agbtree.c
> > new file mode 100644
> > index 00000000..e4179a44
> > --- /dev/null
> > +++ b/repair/agbtree.c
> > @@ -0,0 +1,152 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2020 Oracle.  All Rights Reserved.
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + */
> > +#include <libxfs.h>
> > +#include "err_protos.h"
> > +#include "slab.h"
> > +#include "rmap.h"
> > +#include "incore.h"
> > +#include "bulkload.h"
> > +#include "agbtree.h"
> > +
> > +/* Initialize a btree rebuild context. */
> > +static void
> > +init_rebuild(
> > +	struct repair_ctx		*sc,
> > +	const struct xfs_owner_info	*oinfo,
> > +	xfs_agblock_t			free_space,
> > +	struct bt_rebuild		*btr)
> > +{
> > +	memset(btr, 0, sizeof(struct bt_rebuild));
> > +
> > +	bulkload_init_ag(&btr->newbt, sc, oinfo);
> > +	bulkload_estimate_ag_slack(sc, &btr->bload, free_space);
> > +}
> > +
> > +/*
> > + * Update this free space record to reflect the blocks we stole from the
> > + * beginning of the record.
> > + */
> > +static void
> > +consume_freespace(
> > +	xfs_agnumber_t		agno,
> > +	struct extent_tree_node	*ext_ptr,
> > +	uint32_t		len)
> > +{
> > +	struct extent_tree_node	*bno_ext_ptr;
> > +	xfs_agblock_t		new_start = ext_ptr->ex_startblock + len;
> > +	xfs_extlen_t		new_len = ext_ptr->ex_blockcount - len;
> > +
> > +	/* Delete the used-up extent from both extent trees. */
> > +#ifdef XR_BLD_FREE_TRACE
> > +	fprintf(stderr, "releasing extent: %u [%u %u]\n", agno,
> > +			ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> > +#endif
> > +	bno_ext_ptr = find_bno_extent(agno, ext_ptr->ex_startblock);
> > +	ASSERT(bno_ext_ptr != NULL);
> > +	get_bno_extent(agno, bno_ext_ptr);
> > +	release_extent_tree_node(bno_ext_ptr);
> > +
> > +	ext_ptr = get_bcnt_extent(agno, ext_ptr->ex_startblock,
> > +			ext_ptr->ex_blockcount);
> > +	release_extent_tree_node(ext_ptr);
> > +
> > +	/*
> > +	 * If we only used part of this last extent, then we must reinsert the
> > +	 * extent to maintain proper sorting order.
> > +	 */
> > +	if (new_len > 0) {
> > +		add_bno_extent(agno, new_start, new_len);
> > +		add_bcnt_extent(agno, new_start, new_len);
> > +	}
> > +}
> > +
> > +/* Reserve blocks for the new btree. */
> > +static void
> > +reserve_btblocks(
> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	struct bt_rebuild	*btr,
> > +	uint32_t		nr_blocks)
> > +{
> > +	struct extent_tree_node	*ext_ptr;
> > +	uint32_t		blocks_allocated = 0;
> > +	uint32_t		len;
> > +	int			error;
> > +
> > +	while (blocks_allocated < nr_blocks)  {
> > +		xfs_fsblock_t	fsbno;
> > +
> > +		/*
> > +		 * Grab the smallest extent and use it up, then get the
> > +		 * next smallest.  This mimics the init_*_cursor code.
> > +		 */
> > +		ext_ptr = findfirst_bcnt_extent(agno);
> > +		if (!ext_ptr)
> > +			do_error(
> > +_("error - not enough free space in filesystem\n"));
> > +
> > +		/* Use up the extent we've got. */
> > +		len = min(ext_ptr->ex_blockcount, nr_blocks - blocks_allocated);
> > +		fsbno = XFS_AGB_TO_FSB(mp, agno, ext_ptr->ex_startblock);
> > +		error = bulkload_add_blocks(&btr->newbt, fsbno, len);
> > +		if (error)
> > +			do_error(_("could not set up btree reservation: %s\n"),
> > +				strerror(-error));
> > +
> > +		error = rmap_add_ag_rec(mp, agno, ext_ptr->ex_startblock, len,
> > +				btr->newbt.oinfo.oi_owner);
> > +		if (error)
> > +			do_error(_("could not set up btree rmaps: %s\n"),
> > +				strerror(-error));
> > +
> > +		consume_freespace(agno, ext_ptr, len);
> > +		blocks_allocated += len;
> > +	}
> > +#ifdef XR_BLD_FREE_TRACE
> > +	fprintf(stderr, "blocks_allocated = %d\n",
> > +		blocks_allocated);
> > +#endif
> > +}
> > +
> > +/* Feed one of the new btree blocks to the bulk loader. */
> > +static int
> > +rebuild_claim_block(
> > +	struct xfs_btree_cur	*cur,
> > +	union xfs_btree_ptr	*ptr,
> > +	void			*priv)
> > +{
> > +	struct bt_rebuild	*btr = priv;
> > +
> > +	return bulkload_claim_block(cur, &btr->newbt, ptr);
> > +}
> > +
> > +/*
> > + * Scoop up leftovers from a rebuild cursor for later freeing, then free the
> > + * rebuild context.
> > + */
> > +void
> > +finish_rebuild(
> > +	struct xfs_mount	*mp,
> > +	struct bt_rebuild	*btr,
> > +	struct xfs_slab		*lost_fsb)
> > +{
> > +	struct bulkload_resv	*resv, *n;
> > +
> > +	for_each_bulkload_reservation(&btr->newbt, resv, n) {
> > +		while (resv->used < resv->len) {
> > +			xfs_fsblock_t	fsb = resv->fsbno + resv->used;
> > +			int		error;
> > +
> > +			error = slab_add(lost_fsb, &fsb);
> > +			if (error)
> > +				do_error(
> > +_("Insufficient memory saving lost blocks.\n"));
> > +			resv->used++;
> > +		}
> > +	}
> > +
> > +	bulkload_destroy(&btr->newbt, 0);
> > +}
> > diff --git a/repair/agbtree.h b/repair/agbtree.h
> > new file mode 100644
> > index 00000000..50ea3c60
> > --- /dev/null
> > +++ b/repair/agbtree.h
> > @@ -0,0 +1,29 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +/*
> > + * Copyright (C) 2020 Oracle.  All Rights Reserved.
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + */
> > +#ifndef __XFS_REPAIR_AG_BTREE_H__
> > +#define __XFS_REPAIR_AG_BTREE_H__
> > +
> > +/* Context for rebuilding a per-AG btree. */
> > +struct bt_rebuild {
> > +	/* Fake root for staging and space preallocations. */
> > +	struct bulkload	newbt;
> > +
> > +	/* Geometry of the new btree. */
> > +	struct xfs_btree_bload	bload;
> > +
> > +	/* Staging btree cursor for the new tree. */
> > +	struct xfs_btree_cur	*cur;
> > +
> > +	/* Tree-specific data. */
> > +	union {
> > +		struct xfs_slab_cursor	*slab_cursor;
> > +	};
> > +};
> > +
> > +void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
> > +		struct xfs_slab *lost_fsb);
> > +
> > +#endif /* __XFS_REPAIR_AG_BTREE_H__ */
> > diff --git a/repair/bulkload.c b/repair/bulkload.c
> > index 4c69fe0d..9a6ca0c2 100644
> > --- a/repair/bulkload.c
> > +++ b/repair/bulkload.c
> > @@ -95,3 +95,40 @@ bulkload_claim_block(
> >  		ptr->s = cpu_to_be32(XFS_FSB_TO_AGBNO(cur->bc_mp, fsb));
> >  	return 0;
> >  }
> > +
> > +/*
> > + * Estimate proper slack values for a btree that's being reloaded.
> > + *
> > + * Under most circumstances, we'll take whatever default loading value the
> > + * btree bulk loading code calculates for us.  However, there are some
> > + * exceptions to this rule:
> > + *
> > + * (1) If someone turned one of the debug knobs.
> > + * (2) The AG has less than ~9% space free.
> > + *
> > + * Note that we actually use 3/32 for the comparison to avoid division.
> > + */
> > +void
> > +bulkload_estimate_ag_slack(
> > +	struct repair_ctx	*sc,
> > +	struct xfs_btree_bload	*bload,
> > +	unsigned int		free)
> > +{
> > +	/*
> > +	 * The global values are set to -1 (i.e. take the bload defaults)
> > +	 * unless someone has set them otherwise, so we just pull the values
> > +	 * here.
> > +	 */
> > +	bload->leaf_slack = bload_leaf_slack;
> > +	bload->node_slack = bload_node_slack;
> > +
> > +	/* No further changes if there's at least 3/32 of the space left. */
> > +	if (free >= ((sc->mp->m_sb.sb_agblocks * 3) >> 5))
> > +		return;
> > +
> > +	/* We're low on space; load the btrees as tightly as possible. */
> > +	if (bload->leaf_slack < 0)
> > +		bload->leaf_slack = 0;
> > +	if (bload->node_slack < 0)
> > +		bload->node_slack = 0;
> > +}
> > diff --git a/repair/bulkload.h b/repair/bulkload.h
> > index 79f81cb0..01f67279 100644
> > --- a/repair/bulkload.h
> > +++ b/repair/bulkload.h
> > @@ -53,5 +53,7 @@ int bulkload_add_blocks(struct bulkload *bkl, xfs_fsblock_t fsbno,
> >  void bulkload_destroy(struct bulkload *bkl, int error);
> >  int bulkload_claim_block(struct xfs_btree_cur *cur, struct bulkload *bkl,
> >  		union xfs_btree_ptr *ptr);
> > +void bulkload_estimate_ag_slack(struct repair_ctx *sc,
> > +		struct xfs_btree_bload *bload, unsigned int free);
> >  
> >  #endif /* __XFS_REPAIR_BULKLOAD_H__ */
> > diff --git a/repair/phase5.c b/repair/phase5.c
> > index 75c480fd..8175aa6f 100644
> > --- a/repair/phase5.c
> > +++ b/repair/phase5.c
> > @@ -18,6 +18,8 @@
> >  #include "progress.h"
> >  #include "slab.h"
> >  #include "rmap.h"
> > +#include "bulkload.h"
> > +#include "agbtree.h"
> >  
> >  /*
> >   * we maintain the current slice (path from root to leaf)
> > @@ -2288,28 +2290,29 @@ keep_fsinos(xfs_mount_t *mp)
> >  
> >  static void
> >  phase5_func(
> > -	xfs_mount_t	*mp,
> > -	xfs_agnumber_t	agno,
> > -	struct xfs_slab	*lost_fsb)
> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	struct xfs_slab		*lost_fsb)
> >  {
> > -	uint64_t	num_inos;
> > -	uint64_t	num_free_inos;
> > -	uint64_t	finobt_num_inos;
> > -	uint64_t	finobt_num_free_inos;
> > -	bt_status_t	bno_btree_curs;
> > -	bt_status_t	bcnt_btree_curs;
> > -	bt_status_t	ino_btree_curs;
> > -	bt_status_t	fino_btree_curs;
> > -	bt_status_t	rmap_btree_curs;
> > -	bt_status_t	refcnt_btree_curs;
> > -	int		extra_blocks = 0;
> > -	uint		num_freeblocks;
> > -	xfs_extlen_t	freeblks1;
> > +	struct repair_ctx	sc = { .mp = mp, };
> > +	struct agi_stat		agi_stat = {0,};
> > +	uint64_t		num_inos;
> > +	uint64_t		num_free_inos;
> > +	uint64_t		finobt_num_inos;
> > +	uint64_t		finobt_num_free_inos;
> > +	bt_status_t		bno_btree_curs;
> > +	bt_status_t		bcnt_btree_curs;
> > +	bt_status_t		ino_btree_curs;
> > +	bt_status_t		fino_btree_curs;
> > +	bt_status_t		rmap_btree_curs;
> > +	bt_status_t		refcnt_btree_curs;
> > +	int			extra_blocks = 0;
> > +	uint			num_freeblocks;
> > +	xfs_extlen_t		freeblks1;
> >  #ifdef DEBUG
> > -	xfs_extlen_t	freeblks2;
> > +	xfs_extlen_t		freeblks2;
> >  #endif
> > -	xfs_agblock_t	num_extents;
> > -	struct agi_stat	agi_stat = {0,};
> > +	xfs_agblock_t		num_extents;
> >  
> >  	if (verbose)
> >  		do_log(_("        - agno = %d\n"), agno);
> > 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/12] xfs_repair: rebuild inode btrees with bulk loader
  2020-06-18 15:24   ` Brian Foster
@ 2020-06-18 18:33     ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-18 18:33 UTC (permalink / raw)
  To: Brian Foster; +Cc: sandeen, linux-xfs

On Thu, Jun 18, 2020 at 11:24:11AM -0400, Brian Foster wrote:
> On Mon, Jun 01, 2020 at 09:27:44PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use the btree bulk loading functions to rebuild the inode btrees
> > and drop the open-coded implementation.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  libxfs/libxfs_api_defs.h |    1 
> >  repair/agbtree.c         |  207 ++++++++++++++++++++
> >  repair/agbtree.h         |   13 +
> >  repair/phase5.c          |  488 +++-------------------------------------------
> >  4 files changed, 248 insertions(+), 461 deletions(-)
> > 
> > 
> ...
> > diff --git a/repair/agbtree.c b/repair/agbtree.c
> > index 3b8ab47c..e44475fc 100644
> > --- a/repair/agbtree.c
> > +++ b/repair/agbtree.c
> > @@ -308,3 +308,210 @@ _("Error %d while creating cntbt btree for AG %u.\n"), error, agno);
> >  	libxfs_btree_del_cursor(btr_bno->cur, 0);
> >  	libxfs_btree_del_cursor(btr_cnt->cur, 0);
> >  }
> ...
> > +/* Initialize both inode btree cursors as needed. */
> > +void
> > +init_ino_cursors(
> > +	struct repair_ctx	*sc,
> > +	xfs_agnumber_t		agno,
> > +	unsigned int		free_space,
> > +	uint64_t		*num_inos,
> > +	uint64_t		*num_free_inos,
> > +	struct bt_rebuild	*btr_ino,
> > +	struct bt_rebuild	*btr_fino)
> > +{
> > +	struct ino_tree_node	*ino_rec;
> > +	unsigned int		ino_recs = 0;
> > +	unsigned int		fino_recs = 0;
> > +	bool			finobt;
> > +	int			error;
> > +
> > +	finobt = xfs_sb_version_hasfinobt(&sc->mp->m_sb);
> 
> Seems like a pointless variable given it is only used in one place.
> Otherwise looks good:

Fixed.

--D

> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> > +	init_rebuild(sc, &XFS_RMAP_OINFO_INOBT, free_space, btr_ino);
> > +
> > +	/* Compute inode statistics. */
> > +	*num_free_inos = 0;
> > +	*num_inos = 0;
> > +	for (ino_rec = findfirst_inode_rec(agno);
> > +	     ino_rec != NULL;
> > +	     ino_rec = next_ino_rec(ino_rec))  {
> > +		unsigned int	rec_ninos = 0;
> > +		unsigned int	rec_nfinos = 0;
> > +		int		i;
> > +
> > +		for (i = 0; i < XFS_INODES_PER_CHUNK; i++)  {
> > +			ASSERT(is_inode_confirmed(ino_rec, i));
> > +			/*
> > +			 * sparse inodes are not factored into superblock (free)
> > +			 * inode counts
> > +			 */
> > +			if (is_inode_sparse(ino_rec, i))
> > +				continue;
> > +			if (is_inode_free(ino_rec, i))
> > +				rec_nfinos++;
> > +			rec_ninos++;
> > +		}
> > +
> > +		*num_free_inos += rec_nfinos;
> > +		*num_inos += rec_ninos;
> > +		ino_recs++;
> > +
> > +		/* finobt only considers records with free inodes */
> > +		if (rec_nfinos)
> > +			fino_recs++;
> > +	}
> > +
> > +	btr_ino->cur = libxfs_inobt_stage_cursor(sc->mp, &btr_ino->newbt.afake,
> > +			agno, XFS_BTNUM_INO);
> > +
> > +	btr_ino->bload.get_record = get_inobt_record;
> > +	btr_ino->bload.claim_block = rebuild_claim_block;
> > +	btr_ino->first_agino = NULLAGINO;
> > +
> > +	/* Compute how many inobt blocks we'll need. */
> > +	error = -libxfs_btree_bload_compute_geometry(btr_ino->cur,
> > +			&btr_ino->bload, ino_recs);
> > +	if (error)
> > +		do_error(
> > +_("Unable to compute inode btree geometry, error %d.\n"), error);
> > +
> > +	reserve_btblocks(sc->mp, agno, btr_ino, btr_ino->bload.nr_blocks);
> > +
> > +	if (!finobt)
> > +		return;
> > +
> > +	init_rebuild(sc, &XFS_RMAP_OINFO_INOBT, free_space, btr_fino);
> > +	btr_fino->cur = libxfs_inobt_stage_cursor(sc->mp,
> > +			&btr_fino->newbt.afake, agno, XFS_BTNUM_FINO);
> > +
> > +	btr_fino->bload.get_record = get_inobt_record;
> > +	btr_fino->bload.claim_block = rebuild_claim_block;
> > +	btr_fino->first_agino = NULLAGINO;
> > +
> > +	/* Compute how many finobt blocks we'll need. */
> > +	error = -libxfs_btree_bload_compute_geometry(btr_fino->cur,
> > +			&btr_fino->bload, fino_recs);
> > +	if (error)
> > +		do_error(
> > +_("Unable to compute free inode btree geometry, error %d.\n"), error);
> > +
> > +	reserve_btblocks(sc->mp, agno, btr_fino, btr_fino->bload.nr_blocks);
> > +}
> > +
> > +/* Rebuild the inode btrees. */
> > +void
> > +build_inode_btrees(
> > +	struct repair_ctx	*sc,
> > +	xfs_agnumber_t		agno,
> > +	struct bt_rebuild	*btr_ino,
> > +	struct bt_rebuild	*btr_fino)
> > +{
> > +	int			error;
> > +
> > +	/* Add all observed inobt records. */
> > +	error = -libxfs_btree_bload(btr_ino->cur, &btr_ino->bload, btr_ino);
> > +	if (error)
> > +		do_error(
> > +_("Error %d while creating inobt btree for AG %u.\n"), error, agno);
> > +
> > +	/* Since we're not writing the AGI yet, no need to commit the cursor */
> > +	libxfs_btree_del_cursor(btr_ino->cur, 0);
> > +
> > +	if (!xfs_sb_version_hasfinobt(&sc->mp->m_sb))
> > +		return;
> > +
> > +	/* Add all observed finobt records. */
> > +	error = -libxfs_btree_bload(btr_fino->cur, &btr_fino->bload, btr_fino);
> > +	if (error)
> > +		do_error(
> > +_("Error %d while creating finobt btree for AG %u.\n"), error, agno);
> > +
> > +	/* Since we're not writing the AGI yet, no need to commit the cursor */
> > +	libxfs_btree_del_cursor(btr_fino->cur, 0);
> > +}
> > diff --git a/repair/agbtree.h b/repair/agbtree.h
> > index 63352247..3cad2a8e 100644
> > --- a/repair/agbtree.h
> > +++ b/repair/agbtree.h
> > @@ -24,6 +24,12 @@ struct bt_rebuild {
> >  			struct extent_tree_node	*bno_rec;
> >  			unsigned int		freeblks;
> >  		};
> > +		struct {
> > +			struct ino_tree_node	*ino_rec;
> > +			xfs_agino_t		first_agino;
> > +			xfs_agino_t		count;
> > +			xfs_agino_t		freecount;
> > +		};
> >  	};
> >  };
> >  
> > @@ -36,4 +42,11 @@ void init_freespace_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
> >  void build_freespace_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
> >  		struct bt_rebuild *btr_bno, struct bt_rebuild *btr_cnt);
> >  
> > +void init_ino_cursors(struct repair_ctx *sc, xfs_agnumber_t agno,
> > +		unsigned int free_space, uint64_t *num_inos,
> > +		uint64_t *num_free_inos, struct bt_rebuild *btr_ino,
> > +		struct bt_rebuild *btr_fino);
> > +void build_inode_btrees(struct repair_ctx *sc, xfs_agnumber_t agno,
> > +		struct bt_rebuild *btr_ino, struct bt_rebuild *btr_fino);
> > +
> >  #endif /* __XFS_REPAIR_AG_BTREE_H__ */
> > diff --git a/repair/phase5.c b/repair/phase5.c
> > index a93d900d..e570349d 100644
> > --- a/repair/phase5.c
> > +++ b/repair/phase5.c
> > @@ -67,15 +67,6 @@ typedef struct bt_status  {
> >  	uint64_t		owner;		/* owner */
> >  } bt_status_t;
> >  
> > -/*
> > - * extra metadata for the agi
> > - */
> > -struct agi_stat {
> > -	xfs_agino_t		first_agino;
> > -	xfs_agino_t		count;
> > -	xfs_agino_t		freecount;
> > -};
> > -
> >  static uint64_t	*sb_icount_ag;		/* allocated inodes per ag */
> >  static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
> >  static uint64_t	*sb_fdblocks_ag;	/* free data blocks per ag */
> > @@ -369,229 +360,20 @@ btnum_to_ops(
> >  	}
> >  }
> >  
> > -/*
> > - * XXX(hch): any reason we don't just look at mp->m_inobt_mxr?
> > - */
> > -#define XR_INOBT_BLOCK_MAXRECS(mp, level) \
> > -			libxfs_inobt_maxrecs((mp), (mp)->m_sb.sb_blocksize, \
> > -						(level) == 0)
> > -
> > -/*
> > - * we don't have to worry here about how chewing up free extents
> > - * may perturb things because inode tree building happens before
> > - * freespace tree building.
> > - */
> > -static void
> > -init_ino_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
> > -		uint64_t *num_inos, uint64_t *num_free_inos, int finobt)
> > -{
> > -	uint64_t		ninos;
> > -	uint64_t		nfinos;
> > -	int			rec_nfinos;
> > -	int			rec_ninos;
> > -	ino_tree_node_t		*ino_rec;
> > -	int			num_recs;
> > -	int			level;
> > -	bt_stat_level_t		*lptr;
> > -	bt_stat_level_t		*p_lptr;
> > -	xfs_extlen_t		blocks_allocated;
> > -	int			i;
> > -
> > -	*num_inos = *num_free_inos = 0;
> > -	ninos = nfinos = 0;
> > -
> > -	lptr = &btree_curs->level[0];
> > -	btree_curs->init = 1;
> > -	btree_curs->owner = XFS_RMAP_OWN_INOBT;
> > -
> > -	/*
> > -	 * build up statistics
> > -	 */
> > -	ino_rec = findfirst_inode_rec(agno);
> > -	for (num_recs = 0; ino_rec != NULL; ino_rec = next_ino_rec(ino_rec))  {
> > -		rec_ninos = 0;
> > -		rec_nfinos = 0;
> > -		for (i = 0; i < XFS_INODES_PER_CHUNK; i++)  {
> > -			ASSERT(is_inode_confirmed(ino_rec, i));
> > -			/*
> > -			 * sparse inodes are not factored into superblock (free)
> > -			 * inode counts
> > -			 */
> > -			if (is_inode_sparse(ino_rec, i))
> > -				continue;
> > -			if (is_inode_free(ino_rec, i))
> > -				rec_nfinos++;
> > -			rec_ninos++;
> > -		}
> > -
> > -		/*
> > -		 * finobt only considers records with free inodes
> > -		 */
> > -		if (finobt && !rec_nfinos)
> > -			continue;
> > -
> > -		nfinos += rec_nfinos;
> > -		ninos += rec_ninos;
> > -		num_recs++;
> > -	}
> > -
> > -	if (num_recs == 0) {
> > -		/*
> > -		 * easy corner-case -- no inode records
> > -		 */
> > -		lptr->num_blocks = 1;
> > -		lptr->modulo = 0;
> > -		lptr->num_recs_pb = 0;
> > -		lptr->num_recs_tot = 0;
> > -
> > -		btree_curs->num_levels = 1;
> > -		btree_curs->num_tot_blocks = btree_curs->num_free_blocks = 1;
> > -
> > -		setup_cursor(mp, agno, btree_curs);
> > -
> > -		return;
> > -	}
> > -
> > -	blocks_allocated = lptr->num_blocks = howmany(num_recs,
> > -					XR_INOBT_BLOCK_MAXRECS(mp, 0));
> > -
> > -	lptr->modulo = num_recs % lptr->num_blocks;
> > -	lptr->num_recs_pb = num_recs / lptr->num_blocks;
> > -	lptr->num_recs_tot = num_recs;
> > -	level = 1;
> > -
> > -	if (lptr->num_blocks > 1)  {
> > -		for (; btree_curs->level[level-1].num_blocks > 1
> > -				&& level < XFS_BTREE_MAXLEVELS;
> > -				level++)  {
> > -			lptr = &btree_curs->level[level];
> > -			p_lptr = &btree_curs->level[level - 1];
> > -			lptr->num_blocks = howmany(p_lptr->num_blocks,
> > -				XR_INOBT_BLOCK_MAXRECS(mp, level));
> > -			lptr->modulo = p_lptr->num_blocks % lptr->num_blocks;
> > -			lptr->num_recs_pb = p_lptr->num_blocks
> > -					/ lptr->num_blocks;
> > -			lptr->num_recs_tot = p_lptr->num_blocks;
> > -
> > -			blocks_allocated += lptr->num_blocks;
> > -		}
> > -	}
> > -	ASSERT(lptr->num_blocks == 1);
> > -	btree_curs->num_levels = level;
> > -
> > -	btree_curs->num_tot_blocks = btree_curs->num_free_blocks
> > -			= blocks_allocated;
> > -
> > -	setup_cursor(mp, agno, btree_curs);
> > -
> > -	*num_inos = ninos;
> > -	*num_free_inos = nfinos;
> > -
> > -	return;
> > -}
> > -
> > -static void
> > -prop_ino_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
> > -	xfs_btnum_t btnum, xfs_agino_t startino, int level)
> > -{
> > -	struct xfs_btree_block	*bt_hdr;
> > -	xfs_inobt_key_t		*bt_key;
> > -	xfs_inobt_ptr_t		*bt_ptr;
> > -	xfs_agblock_t		agbno;
> > -	bt_stat_level_t		*lptr;
> > -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> > -	int			error;
> > -
> > -	level++;
> > -
> > -	if (level >= btree_curs->num_levels)
> > -		return;
> > -
> > -	lptr = &btree_curs->level[level];
> > -	bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > -
> > -	if (be16_to_cpu(bt_hdr->bb_numrecs) == 0)  {
> > -		/*
> > -		 * this only happens once to initialize the
> > -		 * first path up the left side of the tree
> > -		 * where the agbno's are already set up
> > -		 */
> > -		prop_ino_cursor(mp, agno, btree_curs, btnum, startino, level);
> > -	}
> > -
> > -	if (be16_to_cpu(bt_hdr->bb_numrecs) ==
> > -				lptr->num_recs_pb + (lptr->modulo > 0))  {
> > -		/*
> > -		 * write out current prev block, grab us a new block,
> > -		 * and set the rightsib pointer of current block
> > -		 */
> > -#ifdef XR_BLD_INO_TRACE
> > -		fprintf(stderr, " ino prop agbno %d ", lptr->prev_agbno);
> > -#endif
> > -		if (lptr->prev_agbno != NULLAGBLOCK)  {
> > -			ASSERT(lptr->prev_buf_p != NULL);
> > -			libxfs_buf_mark_dirty(lptr->prev_buf_p);
> > -			libxfs_buf_relse(lptr->prev_buf_p);
> > -		}
> > -		lptr->prev_agbno = lptr->agbno;;
> > -		lptr->prev_buf_p = lptr->buf_p;
> > -		agbno = get_next_blockaddr(agno, level, btree_curs);
> > -
> > -		bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(agbno);
> > -
> > -		error = -libxfs_buf_get(mp->m_dev,
> > -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> > -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> > -		if (error)
> > -			do_error(_("Cannot grab inode btree buffer, err=%d"),
> > -					error);
> > -		lptr->agbno = agbno;
> > -
> > -		if (lptr->modulo)
> > -			lptr->modulo--;
> > -
> > -		/*
> > -		 * initialize block header
> > -		 */
> > -		lptr->buf_p->b_ops = ops;
> > -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> > -		libxfs_btree_init_block(mp, lptr->buf_p, btnum,
> > -					level, 0, agno);
> > -
> > -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> > -
> > -		/*
> > -		 * propagate extent record for first extent in new block up
> > -		 */
> > -		prop_ino_cursor(mp, agno, btree_curs, btnum, startino, level);
> > -	}
> > -	/*
> > -	 * add inode info to current block
> > -	 */
> > -	be16_add_cpu(&bt_hdr->bb_numrecs, 1);
> > -
> > -	bt_key = XFS_INOBT_KEY_ADDR(mp, bt_hdr,
> > -				    be16_to_cpu(bt_hdr->bb_numrecs));
> > -	bt_ptr = XFS_INOBT_PTR_ADDR(mp, bt_hdr,
> > -				    be16_to_cpu(bt_hdr->bb_numrecs),
> > -				    M_IGEO(mp)->inobt_mxr[1]);
> > -
> > -	bt_key->ir_startino = cpu_to_be32(startino);
> > -	*bt_ptr = cpu_to_be32(btree_curs->level[level-1].agbno);
> > -}
> > -
> >  /*
> >   * XXX: yet more code that can be shared with mkfs, growfs.
> >   */
> >  static void
> > -build_agi(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
> > -		bt_status_t *finobt_curs, struct agi_stat *agi_stat)
> > +build_agi(
> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	struct bt_rebuild	*btr_ino,
> > +	struct bt_rebuild	*btr_fino)
> >  {
> > -	xfs_buf_t	*agi_buf;
> > -	xfs_agi_t	*agi;
> > -	int		i;
> > -	int		error;
> > +	struct xfs_buf		*agi_buf;
> > +	struct xfs_agi		*agi;
> > +	int			i;
> > +	int			error;
> >  
> >  	error = -libxfs_buf_get(mp->m_dev,
> >  			XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
> > @@ -611,11 +393,11 @@ build_agi(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
> >  	else
> >  		agi->agi_length = cpu_to_be32(mp->m_sb.sb_dblocks -
> >  			(xfs_rfsblock_t) mp->m_sb.sb_agblocks * agno);
> > -	agi->agi_count = cpu_to_be32(agi_stat->count);
> > -	agi->agi_root = cpu_to_be32(btree_curs->root);
> > -	agi->agi_level = cpu_to_be32(btree_curs->num_levels);
> > -	agi->agi_freecount = cpu_to_be32(agi_stat->freecount);
> > -	agi->agi_newino = cpu_to_be32(agi_stat->first_agino);
> > +	agi->agi_count = cpu_to_be32(btr_ino->count);
> > +	agi->agi_root = cpu_to_be32(btr_ino->newbt.afake.af_root);
> > +	agi->agi_level = cpu_to_be32(btr_ino->newbt.afake.af_levels);
> > +	agi->agi_freecount = cpu_to_be32(btr_ino->freecount);
> > +	agi->agi_newino = cpu_to_be32(btr_ino->first_agino);
> >  	agi->agi_dirino = cpu_to_be32(NULLAGINO);
> >  
> >  	for (i = 0; i < XFS_AGI_UNLINKED_BUCKETS; i++)
> > @@ -625,203 +407,16 @@ build_agi(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *btree_curs,
> >  		platform_uuid_copy(&agi->agi_uuid, &mp->m_sb.sb_meta_uuid);
> >  
> >  	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> > -		agi->agi_free_root = cpu_to_be32(finobt_curs->root);
> > -		agi->agi_free_level = cpu_to_be32(finobt_curs->num_levels);
> > +		agi->agi_free_root =
> > +				cpu_to_be32(btr_fino->newbt.afake.af_root);
> > +		agi->agi_free_level =
> > +				cpu_to_be32(btr_fino->newbt.afake.af_levels);
> >  	}
> >  
> >  	libxfs_buf_mark_dirty(agi_buf);
> >  	libxfs_buf_relse(agi_buf);
> >  }
> >  
> > -/*
> > - * rebuilds an inode tree given a cursor.  We're lazy here and call
> > - * the routine that builds the agi
> > - */
> > -static void
> > -build_ino_tree(xfs_mount_t *mp, xfs_agnumber_t agno,
> > -		bt_status_t *btree_curs, xfs_btnum_t btnum,
> > -		struct agi_stat *agi_stat)
> > -{
> > -	xfs_agnumber_t		i;
> > -	xfs_agblock_t		j;
> > -	xfs_agblock_t		agbno;
> > -	xfs_agino_t		first_agino;
> > -	struct xfs_btree_block	*bt_hdr;
> > -	xfs_inobt_rec_t		*bt_rec;
> > -	ino_tree_node_t		*ino_rec;
> > -	bt_stat_level_t		*lptr;
> > -	const struct xfs_buf_ops *ops = btnum_to_ops(btnum);
> > -	xfs_agino_t		count = 0;
> > -	xfs_agino_t		freecount = 0;
> > -	int			inocnt;
> > -	uint8_t			finocnt;
> > -	int			k;
> > -	int			level = btree_curs->num_levels;
> > -	int			spmask;
> > -	uint64_t		sparse;
> > -	uint16_t		holemask;
> > -	int			error;
> > -
> > -	ASSERT(btnum == XFS_BTNUM_INO || btnum == XFS_BTNUM_FINO);
> > -
> > -	for (i = 0; i < level; i++)  {
> > -		lptr = &btree_curs->level[i];
> > -
> > -		agbno = get_next_blockaddr(agno, i, btree_curs);
> > -		error = -libxfs_buf_get(mp->m_dev,
> > -				XFS_AGB_TO_DADDR(mp, agno, agbno),
> > -				XFS_FSB_TO_BB(mp, 1), &lptr->buf_p);
> > -		if (error)
> > -			do_error(_("Cannot grab inode btree buffer, err=%d"),
> > -					error);
> > -
> > -		if (i == btree_curs->num_levels - 1)
> > -			btree_curs->root = agbno;
> > -
> > -		lptr->agbno = agbno;
> > -		lptr->prev_agbno = NULLAGBLOCK;
> > -		lptr->prev_buf_p = NULL;
> > -		/*
> > -		 * initialize block header
> > -		 */
> > -
> > -		lptr->buf_p->b_ops = ops;
> > -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> > -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, i, 0, agno);
> > -	}
> > -
> > -	/*
> > -	 * run along leaf, setting up records.  as we have to switch
> > -	 * blocks, call the prop_ino_cursor routine to set up the new
> > -	 * pointers for the parent.  that can recurse up to the root
> > -	 * if required.  set the sibling pointers for leaf level here.
> > -	 */
> > -	if (btnum == XFS_BTNUM_FINO)
> > -		ino_rec = findfirst_free_inode_rec(agno);
> > -	else
> > -		ino_rec = findfirst_inode_rec(agno);
> > -
> > -	if (ino_rec != NULL)
> > -		first_agino = ino_rec->ino_startnum;
> > -	else
> > -		first_agino = NULLAGINO;
> > -
> > -	lptr = &btree_curs->level[0];
> > -
> > -	for (i = 0; i < lptr->num_blocks; i++)  {
> > -		/*
> > -		 * block initialization, lay in block header
> > -		 */
> > -		lptr->buf_p->b_ops = ops;
> > -		bt_hdr = XFS_BUF_TO_BLOCK(lptr->buf_p);
> > -		memset(bt_hdr, 0, mp->m_sb.sb_blocksize);
> > -		libxfs_btree_init_block(mp, lptr->buf_p, btnum, 0, 0, agno);
> > -
> > -		bt_hdr->bb_u.s.bb_leftsib = cpu_to_be32(lptr->prev_agbno);
> > -		bt_hdr->bb_numrecs = cpu_to_be16(lptr->num_recs_pb +
> > -							(lptr->modulo > 0));
> > -
> > -		if (lptr->modulo > 0)
> > -			lptr->modulo--;
> > -
> > -		if (lptr->num_recs_pb > 0)
> > -			prop_ino_cursor(mp, agno, btree_curs, btnum,
> > -					ino_rec->ino_startnum, 0);
> > -
> > -		bt_rec = (xfs_inobt_rec_t *)
> > -			  ((char *)bt_hdr + XFS_INOBT_BLOCK_LEN(mp));
> > -		for (j = 0; j < be16_to_cpu(bt_hdr->bb_numrecs); j++) {
> > -			ASSERT(ino_rec != NULL);
> > -			bt_rec[j].ir_startino =
> > -					cpu_to_be32(ino_rec->ino_startnum);
> > -			bt_rec[j].ir_free = cpu_to_be64(ino_rec->ir_free);
> > -
> > -			inocnt = finocnt = 0;
> > -			for (k = 0; k < sizeof(xfs_inofree_t)*NBBY; k++)  {
> > -				ASSERT(is_inode_confirmed(ino_rec, k));
> > -
> > -				if (is_inode_sparse(ino_rec, k))
> > -					continue;
> > -				if (is_inode_free(ino_rec, k))
> > -					finocnt++;
> > -				inocnt++;
> > -			}
> > -
> > -			/*
> > -			 * Set the freecount and check whether we need to update
> > -			 * the sparse format fields. Otherwise, skip to the next
> > -			 * record.
> > -			 */
> > -			inorec_set_freecount(mp, &bt_rec[j], finocnt);
> > -			if (!xfs_sb_version_hassparseinodes(&mp->m_sb))
> > -				goto nextrec;
> > -
> > -			/*
> > -			 * Convert the 64-bit in-core sparse inode state to the
> > -			 * 16-bit on-disk holemask.
> > -			 */
> > -			holemask = 0;
> > -			spmask = (1 << XFS_INODES_PER_HOLEMASK_BIT) - 1;
> > -			sparse = ino_rec->ir_sparse;
> > -			for (k = 0; k < XFS_INOBT_HOLEMASK_BITS; k++) {
> > -				if (sparse & spmask) {
> > -					ASSERT((sparse & spmask) == spmask);
> > -					holemask |= (1 << k);
> > -				} else
> > -					ASSERT((sparse & spmask) == 0);
> > -				sparse >>= XFS_INODES_PER_HOLEMASK_BIT;
> > -			}
> > -
> > -			bt_rec[j].ir_u.sp.ir_count = inocnt;
> > -			bt_rec[j].ir_u.sp.ir_holemask = cpu_to_be16(holemask);
> > -
> > -nextrec:
> > -			freecount += finocnt;
> > -			count += inocnt;
> > -
> > -			if (btnum == XFS_BTNUM_FINO)
> > -				ino_rec = next_free_ino_rec(ino_rec);
> > -			else
> > -				ino_rec = next_ino_rec(ino_rec);
> > -		}
> > -
> > -		if (ino_rec != NULL)  {
> > -			/*
> > -			 * get next leaf level block
> > -			 */
> > -			if (lptr->prev_buf_p != NULL)  {
> > -#ifdef XR_BLD_INO_TRACE
> > -				fprintf(stderr, "writing inobt agbno %u\n",
> > -					lptr->prev_agbno);
> > -#endif
> > -				ASSERT(lptr->prev_agbno != NULLAGBLOCK);
> > -				libxfs_buf_mark_dirty(lptr->prev_buf_p);
> > -				libxfs_buf_relse(lptr->prev_buf_p);
> > -			}
> > -			lptr->prev_buf_p = lptr->buf_p;
> > -			lptr->prev_agbno = lptr->agbno;
> > -			lptr->agbno = get_next_blockaddr(agno, 0, btree_curs);
> > -			bt_hdr->bb_u.s.bb_rightsib = cpu_to_be32(lptr->agbno);
> > -
> > -			error = -libxfs_buf_get(mp->m_dev,
> > -					XFS_AGB_TO_DADDR(mp, agno, lptr->agbno),
> > -					XFS_FSB_TO_BB(mp, 1),
> > -					&lptr->buf_p);
> > -			if (error)
> > -				do_error(
> > -	_("Cannot grab inode btree buffer, err=%d"),
> > -						error);
> > -		}
> > -	}
> > -
> > -	if (agi_stat) {
> > -		agi_stat->first_agino = first_agino;
> > -		agi_stat->count = count;
> > -		agi_stat->freecount = freecount;
> > -	}
> > -}
> > -
> >  /* rebuild the rmap tree */
> >  
> >  /*
> > @@ -1744,15 +1339,10 @@ phase5_func(
> >  	struct xfs_slab		*lost_fsb)
> >  {
> >  	struct repair_ctx	sc = { .mp = mp, };
> > -	struct agi_stat		agi_stat = {0,};
> > -	uint64_t		num_inos;
> > -	uint64_t		num_free_inos;
> > -	uint64_t		finobt_num_inos;
> > -	uint64_t		finobt_num_free_inos;
> >  	struct bt_rebuild	btr_bno;
> >  	struct bt_rebuild	btr_cnt;
> > -	bt_status_t		ino_btree_curs;
> > -	bt_status_t		fino_btree_curs;
> > +	struct bt_rebuild	btr_ino;
> > +	struct bt_rebuild	btr_fino;
> >  	bt_status_t		rmap_btree_curs;
> >  	bt_status_t		refcnt_btree_curs;
> >  	int			extra_blocks = 0;
> > @@ -1785,19 +1375,8 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  			agno);
> >  	}
> >  
> > -	/*
> > -	 * ok, now set up the btree cursors for the on-disk btrees (includes
> > -	 * pre-allocating all required blocks for the trees themselves)
> > -	 */
> > -	init_ino_cursor(mp, agno, &ino_btree_curs, &num_inos,
> > -			&num_free_inos, 0);
> > -
> > -	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > -		init_ino_cursor(mp, agno, &fino_btree_curs, &finobt_num_inos,
> > -				&finobt_num_free_inos, 1);
> > -
> > -	sb_icount_ag[agno] += num_inos;
> > -	sb_ifree_ag[agno] += num_free_inos;
> > +	init_ino_cursors(&sc, agno, num_freeblocks, &sb_icount_ag[agno],
> > +			&sb_ifree_ag[agno], &btr_ino, &btr_fino);
> >  
> >  	/*
> >  	 * Set up the btree cursors for the on-disk rmap btrees, which includes
> > @@ -1886,36 +1465,23 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &rmap_btree_curs,
> >  			&refcnt_btree_curs, lost_fsb);
> >  
> > -	/*
> > -	 * build inode allocation tree.
> > -	 */
> > -	build_ino_tree(mp, agno, &ino_btree_curs, XFS_BTNUM_INO, &agi_stat);
> > -	write_cursor(&ino_btree_curs);
> > -
> > -	/*
> > -	 * build free inode tree
> > -	 */
> > -	if (xfs_sb_version_hasfinobt(&mp->m_sb)) {
> > -		build_ino_tree(mp, agno, &fino_btree_curs,
> > -				XFS_BTNUM_FINO, NULL);
> > -		write_cursor(&fino_btree_curs);
> > -	}
> > +	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
> >  
> >  	/* build the agi */
> > -	build_agi(mp, agno, &ino_btree_curs, &fino_btree_curs, &agi_stat);
> > +	build_agi(mp, agno, &btr_ino, &btr_fino);
> >  
> >  	/*
> >  	 * tear down cursors
> >  	 */
> >  	finish_rebuild(mp, &btr_bno, lost_fsb);
> >  	finish_rebuild(mp, &btr_cnt, lost_fsb);
> > -	finish_cursor(&ino_btree_curs);
> > +	finish_rebuild(mp, &btr_ino, lost_fsb);
> > +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > +		finish_rebuild(mp, &btr_fino, lost_fsb);
> >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> >  		finish_cursor(&rmap_btree_curs);
> >  	if (xfs_sb_version_hasreflink(&mp->m_sb))
> >  		finish_cursor(&refcnt_btree_curs);
> > -	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > -		finish_cursor(&fino_btree_curs);
> >  
> >  	/*
> >  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> > 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 11/12] xfs_repair: remove old btree rebuild support code
  2020-06-02  4:28 ` [PATCH 11/12] xfs_repair: remove old btree rebuild support code Darrick J. Wong
@ 2020-06-19 11:10   ` Brian Foster
  0 siblings, 0 replies; 42+ messages in thread
From: Brian Foster @ 2020-06-19 11:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:28:04PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> This code isn't needed anymore, so get rid of it.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  repair/phase5.c |  242 -------------------------------------------------------
>  1 file changed, 242 deletions(-)
> 
> 
> diff --git a/repair/phase5.c b/repair/phase5.c
> index ad009416..439c1065 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
> @@ -21,52 +21,6 @@
>  #include "bulkload.h"
>  #include "agbtree.h"
>  
> -/*
> - * we maintain the current slice (path from root to leaf)
> - * of the btree incore.  when we need a new block, we ask
> - * the block allocator for the address of a block on that
> - * level, map the block in, and set up the appropriate
> - * pointers (child, silbing, etc.) and keys that should
> - * point to the new block.
> - */
> -typedef struct bt_stat_level  {
> -	/*
> -	 * set in setup_cursor routine and maintained in the tree-building
> -	 * routines
> -	 */
> -	xfs_buf_t		*buf_p;		/* 2 buffer pointers to ... */
> -	xfs_buf_t		*prev_buf_p;
> -	xfs_agblock_t		agbno;		/* current block being filled */
> -	xfs_agblock_t		prev_agbno;	/* previous block */
> -	/*
> -	 * set in calculate/init cursor routines for each btree level
> -	 */
> -	int			num_recs_tot;	/* # tree recs in level */
> -	int			num_blocks;	/* # tree blocks in level */
> -	int			num_recs_pb;	/* num_recs_tot / num_blocks */
> -	int			modulo;		/* num_recs_tot % num_blocks */
> -} bt_stat_level_t;
> -
> -typedef struct bt_status  {
> -	int			init;		/* cursor set up once? */
> -	int			num_levels;	/* # of levels in btree */
> -	xfs_extlen_t		num_tot_blocks;	/* # blocks alloc'ed for tree */
> -	xfs_extlen_t		num_free_blocks;/* # blocks currently unused */
> -
> -	xfs_agblock_t		root;		/* root block */
> -	/*
> -	 * list of blocks to be used to set up this tree
> -	 * and pointer to the first unused block on the list
> -	 */
> -	xfs_agblock_t		*btree_blocks;		/* block list */
> -	xfs_agblock_t		*free_btree_blocks;	/* first unused block */
> -	/*
> -	 * per-level status info
> -	 */
> -	bt_stat_level_t		level[XFS_BTREE_MAXLEVELS];
> -	uint64_t		owner;		/* owner */
> -} bt_status_t;
> -
>  static uint64_t	*sb_icount_ag;		/* allocated inodes per ag */
>  static uint64_t	*sb_ifree_ag;		/* free inodes per ag */
>  static uint64_t	*sb_fdblocks_ag;	/* free data blocks per ag */
> @@ -164,202 +118,6 @@ mk_incore_fstree(
>  	return(num_extents);
>  }
>  
> -static xfs_agblock_t
> -get_next_blockaddr(xfs_agnumber_t agno, int level, bt_status_t *curs)
> -{
> -	ASSERT(curs->free_btree_blocks < curs->btree_blocks +
> -						curs->num_tot_blocks);
> -	ASSERT(curs->num_free_blocks > 0);
> -
> -	curs->num_free_blocks--;
> -	return(*curs->free_btree_blocks++);
> -}
> -
> -/*
> - * set up the dynamically allocated block allocation data in the btree
> - * cursor that depends on the info in the static portion of the cursor.
> - * allocates space from the incore bno/bcnt extent trees and sets up
> - * the first path up the left side of the tree.  Also sets up the
> - * cursor pointer to the btree root.   called by init_freespace_cursor()
> - * and init_ino_cursor()
> - */
> -static void
> -setup_cursor(xfs_mount_t *mp, xfs_agnumber_t agno, bt_status_t *curs)
> -{
> -	int			j;
> -	unsigned int		u;
> -	xfs_extlen_t		big_extent_len;
> -	xfs_agblock_t		big_extent_start;
> -	extent_tree_node_t	*ext_ptr;
> -	extent_tree_node_t	*bno_ext_ptr;
> -	xfs_extlen_t		blocks_allocated;
> -	xfs_agblock_t		*agb_ptr;
> -	int			error;
> -
> -	/*
> -	 * get the number of blocks we need to allocate, then
> -	 * set up block number array, set the free block pointer
> -	 * to the first block in the array, and null the array
> -	 */
> -	big_extent_len = curs->num_tot_blocks;
> -	blocks_allocated = 0;
> -
> -	ASSERT(big_extent_len > 0);
> -
> -	if ((curs->btree_blocks = malloc(sizeof(xfs_agblock_t)
> -					* big_extent_len)) == NULL)
> -		do_error(_("could not set up btree block array\n"));
> -
> -	agb_ptr = curs->free_btree_blocks = curs->btree_blocks;
> -
> -	for (j = 0; j < curs->num_free_blocks; j++, agb_ptr++)
> -		*agb_ptr = NULLAGBLOCK;
> -
> -	/*
> -	 * grab the smallest extent and use it up, then get the
> -	 * next smallest.  This mimics the init_*_cursor code.
> -	 */
> -	ext_ptr =  findfirst_bcnt_extent(agno);
> -
> -	agb_ptr = curs->btree_blocks;
> -
> -	/*
> -	 * set up the free block array
> -	 */
> -	while (blocks_allocated < big_extent_len)  {
> -		if (!ext_ptr)
> -			do_error(
> -_("error - not enough free space in filesystem\n"));
> -		/*
> -		 * use up the extent we've got
> -		 */
> -		for (u = 0; u < ext_ptr->ex_blockcount &&
> -				blocks_allocated < big_extent_len; u++)  {
> -			ASSERT(agb_ptr < curs->btree_blocks
> -					+ curs->num_tot_blocks);
> -			*agb_ptr++ = ext_ptr->ex_startblock + u;
> -			blocks_allocated++;
> -		}
> -
> -		error = rmap_add_ag_rec(mp, agno, ext_ptr->ex_startblock, u,
> -				curs->owner);
> -		if (error)
> -			do_error(_("could not set up btree rmaps: %s\n"),
> -				strerror(-error));
> -
> -		/*
> -		 * if we only used part of this last extent, then we
> -		 * need only to reset the extent in the extent
> -		 * trees and we're done
> -		 */
> -		if (u < ext_ptr->ex_blockcount)  {
> -			big_extent_start = ext_ptr->ex_startblock + u;
> -			big_extent_len = ext_ptr->ex_blockcount - u;
> -
> -			ASSERT(big_extent_len > 0);
> -
> -			bno_ext_ptr = find_bno_extent(agno,
> -						ext_ptr->ex_startblock);
> -			ASSERT(bno_ext_ptr != NULL);
> -			get_bno_extent(agno, bno_ext_ptr);
> -			release_extent_tree_node(bno_ext_ptr);
> -
> -			ext_ptr = get_bcnt_extent(agno, ext_ptr->ex_startblock,
> -					ext_ptr->ex_blockcount);
> -			release_extent_tree_node(ext_ptr);
> -#ifdef XR_BLD_FREE_TRACE
> -			fprintf(stderr, "releasing extent: %u [%u %u]\n",
> -				agno, ext_ptr->ex_startblock,
> -				ext_ptr->ex_blockcount);
> -			fprintf(stderr, "blocks_allocated = %d\n",
> -				blocks_allocated);
> -#endif
> -
> -			add_bno_extent(agno, big_extent_start, big_extent_len);
> -			add_bcnt_extent(agno, big_extent_start, big_extent_len);
> -
> -			return;
> -		}
> -		/*
> -		 * delete the used-up extent from both extent trees and
> -		 * find next biggest extent
> -		 */
> -#ifdef XR_BLD_FREE_TRACE
> -		fprintf(stderr, "releasing extent: %u [%u %u]\n",
> -			agno, ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> -#endif
> -		bno_ext_ptr = find_bno_extent(agno, ext_ptr->ex_startblock);
> -		ASSERT(bno_ext_ptr != NULL);
> -		get_bno_extent(agno, bno_ext_ptr);
> -		release_extent_tree_node(bno_ext_ptr);
> -
> -		ext_ptr = get_bcnt_extent(agno, ext_ptr->ex_startblock,
> -				ext_ptr->ex_blockcount);
> -		ASSERT(ext_ptr != NULL);
> -		release_extent_tree_node(ext_ptr);
> -
> -		ext_ptr = findfirst_bcnt_extent(agno);
> -	}
> -#ifdef XR_BLD_FREE_TRACE
> -	fprintf(stderr, "blocks_allocated = %d\n",
> -		blocks_allocated);
> -#endif
> -}
> -
> -static void
> -write_cursor(bt_status_t *curs)
> -{
> -	int i;
> -
> -	for (i = 0; i < curs->num_levels; i++)  {
> -#if defined(XR_BLD_FREE_TRACE) || defined(XR_BLD_INO_TRACE)
> -		fprintf(stderr, "writing bt block %u\n", curs->level[i].agbno);
> -#endif
> -		if (curs->level[i].prev_buf_p != NULL)  {
> -			ASSERT(curs->level[i].prev_agbno != NULLAGBLOCK);
> -#if defined(XR_BLD_FREE_TRACE) || defined(XR_BLD_INO_TRACE)
> -			fprintf(stderr, "writing bt prev block %u\n",
> -						curs->level[i].prev_agbno);
> -#endif
> -			libxfs_buf_mark_dirty(curs->level[i].prev_buf_p);
> -			libxfs_buf_relse(curs->level[i].prev_buf_p);
> -		}
> -		libxfs_buf_mark_dirty(curs->level[i].buf_p);
> -		libxfs_buf_relse(curs->level[i].buf_p);
> -	}
> -}
> -
> -static void
> -finish_cursor(bt_status_t *curs)
> -{
> -	ASSERT(curs->num_free_blocks == 0);
> -	free(curs->btree_blocks);
> -}
> -
> -/* Map btnum to buffer ops for the types that need it. */
> -static const struct xfs_buf_ops *
> -btnum_to_ops(
> -	xfs_btnum_t	btnum)
> -{
> -	switch (btnum) {
> -	case XFS_BTNUM_BNO:
> -		return &xfs_bnobt_buf_ops;
> -	case XFS_BTNUM_CNT:
> -		return &xfs_cntbt_buf_ops;
> -	case XFS_BTNUM_INO:
> -		return &xfs_inobt_buf_ops;
> -	case XFS_BTNUM_FINO:
> -		return &xfs_finobt_buf_ops;
> -	case XFS_BTNUM_RMAP:
> -		return &xfs_rmapbt_buf_ops;
> -	case XFS_BTNUM_REFC:
> -		return &xfs_refcountbt_buf_ops;
> -	default:
> -		ASSERT(0);
> -		return NULL;
> -	}
> -}
> -
>  /*
>   * XXX: yet more code that can be shared with mkfs, growfs.
>   */
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 12/12] xfs_repair: use bitmap to track blocks lost during btree construction
  2020-06-02  4:28 ` [PATCH 12/12] xfs_repair: use bitmap to track blocks lost during btree construction Darrick J. Wong
@ 2020-06-19 11:10   ` Brian Foster
  2020-06-19 21:36     ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Brian Foster @ 2020-06-19 11:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Jun 01, 2020 at 09:28:10PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Use the incore bitmap structure to track blocks that were lost
> during btree construction.  This makes it somewhat more efficient.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  repair/agbtree.c |   21 ++++++++--------
>  repair/agbtree.h |    2 +-
>  repair/phase5.c  |   72 ++++++++++++++++++++++--------------------------------
>  3 files changed, 41 insertions(+), 54 deletions(-)
> 
> 
...
> diff --git a/repair/phase5.c b/repair/phase5.c
> index 439c1065..446f7ec0 100644
> --- a/repair/phase5.c
> +++ b/repair/phase5.c
...
> @@ -211,7 +212,7 @@ build_agf_agfl(
>  	struct bt_rebuild	*btr_cnt,
>  	struct bt_rebuild	*btr_rmap,
>  	struct bt_rebuild	*btr_refc,
> -	struct xfs_slab		*lost_fsb)
> +	struct bitmap		*lost_blocks)
>  {

Looks like another case of an unused parameter here, otherwise looks
good:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  	struct extent_tree_node	*ext_ptr;
>  	struct xfs_buf		*agf_buf, *agfl_buf;
> @@ -428,7 +429,7 @@ static void
>  phase5_func(
>  	struct xfs_mount	*mp,
>  	xfs_agnumber_t		agno,
> -	struct xfs_slab		*lost_fsb)
> +	struct bitmap		*lost_blocks)
>  {
>  	struct repair_ctx	sc = { .mp = mp, };
>  	struct bt_rebuild	btr_bno;
> @@ -543,7 +544,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	 * set up agf and agfl
>  	 */
>  	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap, &btr_refc,
> -			lost_fsb);
> +			lost_blocks);
>  
>  	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
>  
> @@ -553,15 +554,15 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	/*
>  	 * tear down cursors
>  	 */
> -	finish_rebuild(mp, &btr_bno, lost_fsb);
> -	finish_rebuild(mp, &btr_cnt, lost_fsb);
> -	finish_rebuild(mp, &btr_ino, lost_fsb);
> +	finish_rebuild(mp, &btr_bno, lost_blocks);
> +	finish_rebuild(mp, &btr_cnt, lost_blocks);
> +	finish_rebuild(mp, &btr_ino, lost_blocks);
>  	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> -		finish_rebuild(mp, &btr_fino, lost_fsb);
> +		finish_rebuild(mp, &btr_fino, lost_blocks);
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> -		finish_rebuild(mp, &btr_rmap, lost_fsb);
> +		finish_rebuild(mp, &btr_rmap, lost_blocks);
>  	if (xfs_sb_version_hasreflink(&mp->m_sb))
> -		finish_rebuild(mp, &btr_refc, lost_fsb);
> +		finish_rebuild(mp, &btr_refc, lost_blocks);
>  
>  	/*
>  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> @@ -572,48 +573,33 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
>  	PROG_RPT_INC(prog_rpt_done[agno], 1);
>  }
>  
> -/* Inject lost blocks back into the filesystem. */
> +/* Inject this unused space back into the filesystem. */
>  static int
> -inject_lost_blocks(
> -	struct xfs_mount	*mp,
> -	struct xfs_slab		*lost_fsbs)
> +inject_lost_extent(
> +	uint64_t		start,
> +	uint64_t		length,
> +	void			*arg)
>  {
> -	struct xfs_trans	*tp = NULL;
> -	struct xfs_slab_cursor	*cur = NULL;
> -	xfs_fsblock_t		*fsb;
> +	struct xfs_mount	*mp = arg;
> +	struct xfs_trans	*tp;
>  	int			error;
>  
> -	error = init_slab_cursor(lost_fsbs, NULL, &cur);
> +	error = -libxfs_trans_alloc_rollable(mp, 16, &tp);
>  	if (error)
>  		return error;
>  
> -	while ((fsb = pop_slab_cursor(cur)) != NULL) {
> -		error = -libxfs_trans_alloc_rollable(mp, 16, &tp);
> -		if (error)
> -			goto out_cancel;
> -
> -		error = -libxfs_free_extent(tp, *fsb, 1,
> -				&XFS_RMAP_OINFO_ANY_OWNER, XFS_AG_RESV_NONE);
> -		if (error)
> -			goto out_cancel;
> -
> -		error = -libxfs_trans_commit(tp);
> -		if (error)
> -			goto out_cancel;
> -		tp = NULL;
> -	}
> +	error = -libxfs_free_extent(tp, start, length,
> +			&XFS_RMAP_OINFO_ANY_OWNER, XFS_AG_RESV_NONE);
> +	if (error)
> +		return error;
>  
> -out_cancel:
> -	if (tp)
> -		libxfs_trans_cancel(tp);
> -	free_slab_cursor(&cur);
> -	return error;
> +	return -libxfs_trans_commit(tp);
>  }
>  
>  void
>  phase5(xfs_mount_t *mp)
>  {
> -	struct xfs_slab		*lost_fsb;
> +	struct bitmap		*lost_blocks = NULL;
>  	xfs_agnumber_t		agno;
>  	int			error;
>  
> @@ -656,12 +642,12 @@ phase5(xfs_mount_t *mp)
>  	if (sb_fdblocks_ag == NULL)
>  		do_error(_("cannot alloc sb_fdblocks_ag buffers\n"));
>  
> -	error = init_slab(&lost_fsb, sizeof(xfs_fsblock_t));
> +	error = bitmap_alloc(&lost_blocks);
>  	if (error)
> -		do_error(_("cannot alloc lost block slab\n"));
> +		do_error(_("cannot alloc lost block bitmap\n"));
>  
>  	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++)
> -		phase5_func(mp, agno, lost_fsb);
> +		phase5_func(mp, agno, lost_blocks);
>  
>  	print_final_rpt();
>  
> @@ -704,10 +690,10 @@ _("unable to add AG %u reverse-mapping data to btree.\n"), agno);
>  	 * Put blocks that were unnecessarily reserved for btree
>  	 * reconstruction back into the filesystem free space data.
>  	 */
> -	error = inject_lost_blocks(mp, lost_fsb);
> +	error = bitmap_iterate(lost_blocks, inject_lost_extent, mp);
>  	if (error)
>  		do_error(_("Unable to reinsert lost blocks into filesystem.\n"));
> -	free_slab(&lost_fsb);
> +	bitmap_free(&lost_blocks);
>  
>  	bad_ino_btree = 0;
>  
> 



* Re: [PATCH 12/12] xfs_repair: use bitmap to track blocks lost during btree construction
  2020-06-19 11:10   ` Brian Foster
@ 2020-06-19 21:36     ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-19 21:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: sandeen, linux-xfs

On Fri, Jun 19, 2020 at 07:10:47AM -0400, Brian Foster wrote:
> On Mon, Jun 01, 2020 at 09:28:10PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Use the incore bitmap structure to track blocks that were lost
> > during btree construction.  This makes it somewhat more efficient.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  repair/agbtree.c |   21 ++++++++--------
> >  repair/agbtree.h |    2 +-
> >  repair/phase5.c  |   72 ++++++++++++++++++++++--------------------------------
> >  3 files changed, 41 insertions(+), 54 deletions(-)
> > 
> > 
> ...
> > diff --git a/repair/phase5.c b/repair/phase5.c
> > index 439c1065..446f7ec0 100644
> > --- a/repair/phase5.c
> > +++ b/repair/phase5.c
> ...
> > @@ -211,7 +212,7 @@ build_agf_agfl(
> >  	struct bt_rebuild	*btr_cnt,
> >  	struct bt_rebuild	*btr_rmap,
> >  	struct bt_rebuild	*btr_refc,
> > -	struct xfs_slab		*lost_fsb)
> > +	struct bitmap		*lost_blocks)
> >  {
> 
> Looks like another case of an unused parameter here, otherwise looks
> good:

Heh, yep, this could be removed all the way back in "xfs_repair: rebuild
free space btrees with bulk loader" so I'll go do it there.

Thanks for reviewing!

--D

> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  	struct extent_tree_node	*ext_ptr;
> >  	struct xfs_buf		*agf_buf, *agfl_buf;
> > @@ -428,7 +429,7 @@ static void
> >  phase5_func(
> >  	struct xfs_mount	*mp,
> >  	xfs_agnumber_t		agno,
> > -	struct xfs_slab		*lost_fsb)
> > +	struct bitmap		*lost_blocks)
> >  {
> >  	struct repair_ctx	sc = { .mp = mp, };
> >  	struct bt_rebuild	btr_bno;
> > @@ -543,7 +544,7 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	 * set up agf and agfl
> >  	 */
> >  	build_agf_agfl(mp, agno, &btr_bno, &btr_cnt, &btr_rmap, &btr_refc,
> > -			lost_fsb);
> > +			lost_blocks);
> >  
> >  	build_inode_btrees(&sc, agno, &btr_ino, &btr_fino);
> >  
> > @@ -553,15 +554,15 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	/*
> >  	 * tear down cursors
> >  	 */
> > -	finish_rebuild(mp, &btr_bno, lost_fsb);
> > -	finish_rebuild(mp, &btr_cnt, lost_fsb);
> > -	finish_rebuild(mp, &btr_ino, lost_fsb);
> > +	finish_rebuild(mp, &btr_bno, lost_blocks);
> > +	finish_rebuild(mp, &btr_cnt, lost_blocks);
> > +	finish_rebuild(mp, &btr_ino, lost_blocks);
> >  	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > -		finish_rebuild(mp, &btr_fino, lost_fsb);
> > +		finish_rebuild(mp, &btr_fino, lost_blocks);
> >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
> > -		finish_rebuild(mp, &btr_rmap, lost_fsb);
> > +		finish_rebuild(mp, &btr_rmap, lost_blocks);
> >  	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > -		finish_rebuild(mp, &btr_refc, lost_fsb);
> > +		finish_rebuild(mp, &btr_refc, lost_blocks);
> >  
> >  	/*
> >  	 * release the incore per-AG bno/bcnt trees so the extent nodes
> > @@ -572,48 +573,33 @@ _("unable to rebuild AG %u.  Not enough free space in on-disk AG.\n"),
> >  	PROG_RPT_INC(prog_rpt_done[agno], 1);
> >  }
> >  
> > -/* Inject lost blocks back into the filesystem. */
> > +/* Inject this unused space back into the filesystem. */
> >  static int
> > -inject_lost_blocks(
> > -	struct xfs_mount	*mp,
> > -	struct xfs_slab		*lost_fsbs)
> > +inject_lost_extent(
> > +	uint64_t		start,
> > +	uint64_t		length,
> > +	void			*arg)
> >  {
> > -	struct xfs_trans	*tp = NULL;
> > -	struct xfs_slab_cursor	*cur = NULL;
> > -	xfs_fsblock_t		*fsb;
> > +	struct xfs_mount	*mp = arg;
> > +	struct xfs_trans	*tp;
> >  	int			error;
> >  
> > -	error = init_slab_cursor(lost_fsbs, NULL, &cur);
> > +	error = -libxfs_trans_alloc_rollable(mp, 16, &tp);
> >  	if (error)
> >  		return error;
> >  
> > -	while ((fsb = pop_slab_cursor(cur)) != NULL) {
> > -		error = -libxfs_trans_alloc_rollable(mp, 16, &tp);
> > -		if (error)
> > -			goto out_cancel;
> > -
> > -		error = -libxfs_free_extent(tp, *fsb, 1,
> > -				&XFS_RMAP_OINFO_ANY_OWNER, XFS_AG_RESV_NONE);
> > -		if (error)
> > -			goto out_cancel;
> > -
> > -		error = -libxfs_trans_commit(tp);
> > -		if (error)
> > -			goto out_cancel;
> > -		tp = NULL;
> > -	}
> > +	error = -libxfs_free_extent(tp, start, length,
> > +			&XFS_RMAP_OINFO_ANY_OWNER, XFS_AG_RESV_NONE);
> > +	if (error)
> > +		return error;
> >  
> > -out_cancel:
> > -	if (tp)
> > -		libxfs_trans_cancel(tp);
> > -	free_slab_cursor(&cur);
> > -	return error;
> > +	return -libxfs_trans_commit(tp);
> >  }
> >  
> >  void
> >  phase5(xfs_mount_t *mp)
> >  {
> > -	struct xfs_slab		*lost_fsb;
> > +	struct bitmap		*lost_blocks = NULL;
> >  	xfs_agnumber_t		agno;
> >  	int			error;
> >  
> > @@ -656,12 +642,12 @@ phase5(xfs_mount_t *mp)
> >  	if (sb_fdblocks_ag == NULL)
> >  		do_error(_("cannot alloc sb_fdblocks_ag buffers\n"));
> >  
> > -	error = init_slab(&lost_fsb, sizeof(xfs_fsblock_t));
> > +	error = bitmap_alloc(&lost_blocks);
> >  	if (error)
> > -		do_error(_("cannot alloc lost block slab\n"));
> > +		do_error(_("cannot alloc lost block bitmap\n"));
> >  
> >  	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++)
> > -		phase5_func(mp, agno, lost_fsb);
> > +		phase5_func(mp, agno, lost_blocks);
> >  
> >  	print_final_rpt();
> >  
> > @@ -704,10 +690,10 @@ _("unable to add AG %u reverse-mapping data to btree.\n"), agno);
> >  	 * Put blocks that were unnecessarily reserved for btree
> >  	 * reconstruction back into the filesystem free space data.
> >  	 */
> > -	error = inject_lost_blocks(mp, lost_fsb);
> > +	error = bitmap_iterate(lost_blocks, inject_lost_extent, mp);
> >  	if (error)
> >  		do_error(_("Unable to reinsert lost blocks into filesystem.\n"));
> > -	free_slab(&lost_fsb);
> > +	bitmap_free(&lost_blocks);
> >  
> >  	bad_ino_btree = 0;
> >  
> > 
> 


* Re: [PATCH 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-06-17 12:10   ` Brian Foster
  2020-06-18 18:30     ` Darrick J. Wong
@ 2020-06-29 23:10     ` Darrick J. Wong
  1 sibling, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-06-29 23:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: sandeen, linux-xfs

On Wed, Jun 17, 2020 at 08:10:01AM -0400, Brian Foster wrote:
> On Mon, Jun 01, 2020 at 09:27:31PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create some new support structures and functions to assist phase5 in
> > using the btree bulk loader to reconstruct metadata btrees.  This is the
> > first step in removing the open-coded AG btree rebuilding code.
> > 
> > Note: The code in this patch will not be used anywhere until the next
> > patch, so warnings about unused symbols are expected.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> I still find it odd to include the phase5.c changes in this patch when
> it amounts to the addition of a single unused parameter, but I'll defer
> to the maintainer on that. Otherwise LGTM:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  repair/Makefile   |    4 +
> >  repair/agbtree.c  |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  repair/agbtree.h  |   29 ++++++++++
> >  repair/bulkload.c |   37 +++++++++++++
> >  repair/bulkload.h |    2 +
> >  repair/phase5.c   |   41 ++++++++------
> >  6 files changed, 244 insertions(+), 21 deletions(-)
> >  create mode 100644 repair/agbtree.c
> >  create mode 100644 repair/agbtree.h
> > 
> > 
> > diff --git a/repair/Makefile b/repair/Makefile
> > index 62d84bbf..f6a6e3f9 100644
> > --- a/repair/Makefile
> > +++ b/repair/Makefile
> > @@ -9,11 +9,11 @@ LSRCFILES = README
> >  
> >  LTCOMMAND = xfs_repair
> >  
> > -HFILES = agheader.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
> > +HFILES = agheader.h agbtree.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
> >  	da_util.h dinode.h dir2.h err_protos.h globals.h incore.h protos.h \
> >  	rt.h progress.h scan.h versions.h prefetch.h rmap.h slab.h threads.h
> >  
> > -CFILES = agheader.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
> > +CFILES = agheader.c agbtree.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
> >  	da_util.c dino_chunks.c dinode.c dir2.c globals.c incore.c \
> >  	incore_bmc.c init.c incore_ext.c incore_ino.c phase1.c \
> >  	phase2.c phase3.c phase4.c phase5.c phase6.c phase7.c \
> > diff --git a/repair/agbtree.c b/repair/agbtree.c
> > new file mode 100644
> > index 00000000..e4179a44
> > --- /dev/null
> > +++ b/repair/agbtree.c
> > @@ -0,0 +1,152 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/*
> > + * Copyright (C) 2020 Oracle.  All Rights Reserved.
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + */
> > +#include <libxfs.h>
> > +#include "err_protos.h"
> > +#include "slab.h"
> > +#include "rmap.h"
> > +#include "incore.h"
> > +#include "bulkload.h"
> > +#include "agbtree.h"
> > +
> > +/* Initialize a btree rebuild context. */
> > +static void
> > +init_rebuild(
> > +	struct repair_ctx		*sc,
> > +	const struct xfs_owner_info	*oinfo,
> > +	xfs_agblock_t			free_space,
> > +	struct bt_rebuild		*btr)
> > +{
> > +	memset(btr, 0, sizeof(struct bt_rebuild));
> > +
> > +	bulkload_init_ag(&btr->newbt, sc, oinfo);
> > +	bulkload_estimate_ag_slack(sc, &btr->bload, free_space);
> > +}
> > +
> > +/*
> > + * Update this free space record to reflect the blocks we stole from the
> > + * beginning of the record.
> > + */
> > +static void
> > +consume_freespace(
> > +	xfs_agnumber_t		agno,
> > +	struct extent_tree_node	*ext_ptr,
> > +	uint32_t		len)
> > +{
> > +	struct extent_tree_node	*bno_ext_ptr;
> > +	xfs_agblock_t		new_start = ext_ptr->ex_startblock + len;
> > +	xfs_extlen_t		new_len = ext_ptr->ex_blockcount - len;
> > +
> > +	/* Delete the used-up extent from both extent trees. */
> > +#ifdef XR_BLD_FREE_TRACE
> > +	fprintf(stderr, "releasing extent: %u [%u %u]\n", agno,
> > +			ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
> > +#endif
> > +	bno_ext_ptr = find_bno_extent(agno, ext_ptr->ex_startblock);
> > +	ASSERT(bno_ext_ptr != NULL);
> > +	get_bno_extent(agno, bno_ext_ptr);
> > +	release_extent_tree_node(bno_ext_ptr);
> > +
> > +	ext_ptr = get_bcnt_extent(agno, ext_ptr->ex_startblock,
> > +			ext_ptr->ex_blockcount);
> > +	release_extent_tree_node(ext_ptr);
> > +
> > +	/*
> > +	 * If we only used part of this last extent, then we must reinsert the
> > +	 * extent to maintain proper sorting order.
> > +	 */
> > +	if (new_len > 0) {
> > +		add_bno_extent(agno, new_start, new_len);
> > +		add_bcnt_extent(agno, new_start, new_len);
> > +	}
> > +}
> > +
> > +/* Reserve blocks for the new btree. */
> > +static void
> > +reserve_btblocks(
> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	struct bt_rebuild	*btr,
> > +	uint32_t		nr_blocks)
> > +{
> > +	struct extent_tree_node	*ext_ptr;
> > +	uint32_t		blocks_allocated = 0;
> > +	uint32_t		len;
> > +	int			error;
> > +
> > +	while (blocks_allocated < nr_blocks)  {
> > +		xfs_fsblock_t	fsbno;
> > +
> > +		/*
> > +		 * Grab the smallest extent and use it up, then get the
> > +		 * next smallest.  This mimics the init_*_cursor code.
> > +		 */
> > +		ext_ptr = findfirst_bcnt_extent(agno);
> > +		if (!ext_ptr)
> > +			do_error(
> > +_("error - not enough free space in filesystem\n"));
> > +
> > +		/* Use up the extent we've got. */
> > +		len = min(ext_ptr->ex_blockcount, nr_blocks - blocks_allocated);
> > +		fsbno = XFS_AGB_TO_FSB(mp, agno, ext_ptr->ex_startblock);
> > +		error = bulkload_add_blocks(&btr->newbt, fsbno, len);
> > +		if (error)
> > +			do_error(_("could not set up btree reservation: %s\n"),
> > +				strerror(-error));
> > +
> > +		error = rmap_add_ag_rec(mp, agno, ext_ptr->ex_startblock, len,
> > +				btr->newbt.oinfo.oi_owner);
> > +		if (error)
> > +			do_error(_("could not set up btree rmaps: %s\n"),
> > +				strerror(-error));
> > +
> > +		consume_freespace(agno, ext_ptr, len);
> > +		blocks_allocated += len;
> > +	}
> > +#ifdef XR_BLD_FREE_TRACE
> > +	fprintf(stderr, "blocks_allocated = %d\n",
> > +		blocks_allocated);
> > +#endif
> > +}
> > +
> > +/* Feed one of the new btree blocks to the bulk loader. */
> > +static int
> > +rebuild_claim_block(
> > +	struct xfs_btree_cur	*cur,
> > +	union xfs_btree_ptr	*ptr,
> > +	void			*priv)
> > +{
> > +	struct bt_rebuild	*btr = priv;
> > +
> > +	return bulkload_claim_block(cur, &btr->newbt, ptr);
> > +}
> > +
> > +/*
> > + * Scoop up leftovers from a rebuild cursor for later freeing, then free the
> > + * rebuild context.
> > + */
> > +void
> > +finish_rebuild(
> > +	struct xfs_mount	*mp,
> > +	struct bt_rebuild	*btr,
> > +	struct xfs_slab		*lost_fsb)
> > +{
> > +	struct bulkload_resv	*resv, *n;
> > +
> > +	for_each_bulkload_reservation(&btr->newbt, resv, n) {
> > +		while (resv->used < resv->len) {
> > +			xfs_fsblock_t	fsb = resv->fsbno + resv->used;
> > +			int		error;
> > +
> > +			error = slab_add(lost_fsb, &fsb);
> > +			if (error)
> > +				do_error(
> > +_("Insufficient memory saving lost blocks.\n"));
> > +			resv->used++;
> > +		}
> > +	}
> > +
> > +	bulkload_destroy(&btr->newbt, 0);
> > +}
> > diff --git a/repair/agbtree.h b/repair/agbtree.h
> > new file mode 100644
> > index 00000000..50ea3c60
> > --- /dev/null
> > +++ b/repair/agbtree.h
> > @@ -0,0 +1,29 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +/*
> > + * Copyright (C) 2020 Oracle.  All Rights Reserved.
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + */
> > +#ifndef __XFS_REPAIR_AG_BTREE_H__
> > +#define __XFS_REPAIR_AG_BTREE_H__
> > +
> > +/* Context for rebuilding a per-AG btree. */
> > +struct bt_rebuild {
> > +	/* Fake root for staging and space preallocations. */
> > +	struct bulkload	newbt;
> > +
> > +	/* Geometry of the new btree. */
> > +	struct xfs_btree_bload	bload;
> > +
> > +	/* Staging btree cursor for the new tree. */
> > +	struct xfs_btree_cur	*cur;
> > +
> > +	/* Tree-specific data. */
> > +	union {
> > +		struct xfs_slab_cursor	*slab_cursor;
> > +	};
> > +};
> > +
> > +void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
> > +		struct xfs_slab *lost_fsb);
> > +
> > +#endif /* __XFS_REPAIR_AG_BTREE_H__ */
> > diff --git a/repair/bulkload.c b/repair/bulkload.c
> > index 4c69fe0d..9a6ca0c2 100644
> > --- a/repair/bulkload.c
> > +++ b/repair/bulkload.c
> > @@ -95,3 +95,40 @@ bulkload_claim_block(
> >  		ptr->s = cpu_to_be32(XFS_FSB_TO_AGBNO(cur->bc_mp, fsb));
> >  	return 0;
> >  }
> > +
> > +/*
> > + * Estimate proper slack values for a btree that's being reloaded.
> > + *
> > + * Under most circumstances, we'll take whatever default loading value the
> > + * btree bulk loading code calculates for us.  However, there are some
> > + * exceptions to this rule:
> > + *
> > + * (1) If someone turned one of the debug knobs.
> > + * (2) The AG has less than ~9% space free.
> > + *
> > + * Note that we actually use 3/32 for the comparison to avoid division.
> > + */
> > +void
> > +bulkload_estimate_ag_slack(
> > +	struct repair_ctx	*sc,
> > +	struct xfs_btree_bload	*bload,
> > +	unsigned int		free)
> > +{
> > +	/*
> > +	 * The global values are set to -1 (i.e. take the bload defaults)
> > +	 * unless someone has set them otherwise, so we just pull the values
> > +	 * here.
> > +	 */
> > +	bload->leaf_slack = bload_leaf_slack;
> > +	bload->node_slack = bload_node_slack;
> > +
> > +	/* No further changes if there's more than 3/32ths space left. */
> > +	if (free >= ((sc->mp->m_sb.sb_agblocks * 3) >> 5))
> > +		return;
> > +
> > +	/* We're low on space; load the btrees as tightly as possible. */
> > +	if (bload->leaf_slack < 0)
> > +		bload->leaf_slack = 0;
> > +	if (bload->node_slack < 0)
> > +		bload->node_slack = 0;

Heh.  It turns out that this caused infrequent warnings in
generic/333 because adding the extra rmap records for the AG btrees at
the end of phase 5 could cause enough rmapbt splits such that we
wouldn't have enough space left in the AG to satisfy the per-AG
reservation at the next mount.

I /think/ the solution here is to set the slack values to 2 (instead of
zero) like we did in xfs_repair before this patch.

--D

> > +}
> > diff --git a/repair/bulkload.h b/repair/bulkload.h
> > index 79f81cb0..01f67279 100644
> > --- a/repair/bulkload.h
> > +++ b/repair/bulkload.h
> > @@ -53,5 +53,7 @@ int bulkload_add_blocks(struct bulkload *bkl, xfs_fsblock_t fsbno,
> >  void bulkload_destroy(struct bulkload *bkl, int error);
> >  int bulkload_claim_block(struct xfs_btree_cur *cur, struct bulkload *bkl,
> >  		union xfs_btree_ptr *ptr);
> > +void bulkload_estimate_ag_slack(struct repair_ctx *sc,
> > +		struct xfs_btree_bload *bload, unsigned int free);
> >  
> >  #endif /* __XFS_REPAIR_BULKLOAD_H__ */
> > diff --git a/repair/phase5.c b/repair/phase5.c
> > index 75c480fd..8175aa6f 100644
> > --- a/repair/phase5.c
> > +++ b/repair/phase5.c
> > @@ -18,6 +18,8 @@
> >  #include "progress.h"
> >  #include "slab.h"
> >  #include "rmap.h"
> > +#include "bulkload.h"
> > +#include "agbtree.h"
> >  
> >  /*
> >   * we maintain the current slice (path from root to leaf)
> > @@ -2288,28 +2290,29 @@ keep_fsinos(xfs_mount_t *mp)
> >  
> >  static void
> >  phase5_func(
> > -	xfs_mount_t	*mp,
> > -	xfs_agnumber_t	agno,
> > -	struct xfs_slab	*lost_fsb)
> > +	struct xfs_mount	*mp,
> > +	xfs_agnumber_t		agno,
> > +	struct xfs_slab		*lost_fsb)
> >  {
> > -	uint64_t	num_inos;
> > -	uint64_t	num_free_inos;
> > -	uint64_t	finobt_num_inos;
> > -	uint64_t	finobt_num_free_inos;
> > -	bt_status_t	bno_btree_curs;
> > -	bt_status_t	bcnt_btree_curs;
> > -	bt_status_t	ino_btree_curs;
> > -	bt_status_t	fino_btree_curs;
> > -	bt_status_t	rmap_btree_curs;
> > -	bt_status_t	refcnt_btree_curs;
> > -	int		extra_blocks = 0;
> > -	uint		num_freeblocks;
> > -	xfs_extlen_t	freeblks1;
> > +	struct repair_ctx	sc = { .mp = mp, };
> > +	struct agi_stat		agi_stat = {0,};
> > +	uint64_t		num_inos;
> > +	uint64_t		num_free_inos;
> > +	uint64_t		finobt_num_inos;
> > +	uint64_t		finobt_num_free_inos;
> > +	bt_status_t		bno_btree_curs;
> > +	bt_status_t		bcnt_btree_curs;
> > +	bt_status_t		ino_btree_curs;
> > +	bt_status_t		fino_btree_curs;
> > +	bt_status_t		rmap_btree_curs;
> > +	bt_status_t		refcnt_btree_curs;
> > +	int			extra_blocks = 0;
> > +	uint			num_freeblocks;
> > +	xfs_extlen_t		freeblks1;
> >  #ifdef DEBUG
> > -	xfs_extlen_t	freeblks2;
> > +	xfs_extlen_t		freeblks2;
> >  #endif
> > -	xfs_agblock_t	num_extents;
> > -	struct agi_stat	agi_stat = {0,};
> > +	xfs_agblock_t		num_extents;
> >  
> >  	if (verbose)
> >  		do_log(_("        - agno = %d\n"), agno);
> > 
> 


* [PATCH v2 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-06-02  4:27 ` [PATCH 06/12] xfs_repair: create a new class of btree rebuild cursors Darrick J. Wong
  2020-06-17 12:10   ` Brian Foster
@ 2020-07-02 15:18   ` Darrick J. Wong
  2020-07-03  3:24     ` Eric Sandeen
  2020-07-10 19:10     ` Eric Sandeen
  1 sibling, 2 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-07-02 15:18 UTC (permalink / raw)
  To: sandeen; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Create some new support structures and functions to assist phase5 in
using the btree bulk loader to reconstruct metadata btrees.  This is the
first step in removing the open-coded AG btree rebuilding code.

Note: The code in this patch will not be used anywhere until the next
patch, so warnings about unused symbols are expected.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: set the "nearly out of space" slack value to 2 so that we don't
start out with tons of btree splitting right after mount
---
 repair/Makefile   |    4 +
 repair/agbtree.c  |  152 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/agbtree.h  |   29 ++++++++++
 repair/bulkload.c |   41 ++++++++++++++
 repair/bulkload.h |    2 +
 5 files changed, 226 insertions(+), 2 deletions(-)
 create mode 100644 repair/agbtree.c
 create mode 100644 repair/agbtree.h

diff --git a/repair/Makefile b/repair/Makefile
index 62d84bbf..f6a6e3f9 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -9,11 +9,11 @@ LSRCFILES = README
 
 LTCOMMAND = xfs_repair
 
-HFILES = agheader.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
+HFILES = agheader.h agbtree.h attr_repair.h avl.h bulkload.h bmap.h btree.h \
 	da_util.h dinode.h dir2.h err_protos.h globals.h incore.h protos.h \
 	rt.h progress.h scan.h versions.h prefetch.h rmap.h slab.h threads.h
 
-CFILES = agheader.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
+CFILES = agheader.c agbtree.c attr_repair.c avl.c bulkload.c bmap.c btree.c \
 	da_util.c dino_chunks.c dinode.c dir2.c globals.c incore.c \
 	incore_bmc.c init.c incore_ext.c incore_ino.c phase1.c \
 	phase2.c phase3.c phase4.c phase5.c phase6.c phase7.c \
diff --git a/repair/agbtree.c b/repair/agbtree.c
new file mode 100644
index 00000000..95a3eac9
--- /dev/null
+++ b/repair/agbtree.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2020 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#include <libxfs.h>
+#include "err_protos.h"
+#include "slab.h"
+#include "rmap.h"
+#include "incore.h"
+#include "bulkload.h"
+#include "agbtree.h"
+
+/* Initialize a btree rebuild context. */
+static void
+init_rebuild(
+	struct repair_ctx		*sc,
+	const struct xfs_owner_info	*oinfo,
+	xfs_agblock_t			free_space,
+	struct bt_rebuild		*btr)
+{
+	memset(btr, 0, sizeof(struct bt_rebuild));
+
+	bulkload_init_ag(&btr->newbt, sc, oinfo);
+	bulkload_estimate_ag_slack(sc, &btr->bload, free_space);
+}
+
+/*
+ * Update this free space record to reflect the blocks we stole from the
+ * beginning of the record.
+ */
+static void
+consume_freespace(
+	xfs_agnumber_t		agno,
+	struct extent_tree_node	*ext_ptr,
+	uint32_t		len)
+{
+	struct extent_tree_node	*bno_ext_ptr;
+	xfs_agblock_t		new_start = ext_ptr->ex_startblock + len;
+	xfs_extlen_t		new_len = ext_ptr->ex_blockcount - len;
+
+	/* Delete the used-up extent from both extent trees. */
+#ifdef XR_BLD_FREE_TRACE
+	fprintf(stderr, "releasing extent: %u [%u %u]\n", agno,
+			ext_ptr->ex_startblock, ext_ptr->ex_blockcount);
+#endif
+	bno_ext_ptr = find_bno_extent(agno, ext_ptr->ex_startblock);
+	ASSERT(bno_ext_ptr != NULL);
+	get_bno_extent(agno, bno_ext_ptr);
+	release_extent_tree_node(bno_ext_ptr);
+
+	ext_ptr = get_bcnt_extent(agno, ext_ptr->ex_startblock,
+			ext_ptr->ex_blockcount);
+	release_extent_tree_node(ext_ptr);
+
+	/*
+	 * If we only used part of this last extent, then we must reinsert the
+	 * extent to maintain proper sorting order.
+	 */
+	if (new_len > 0) {
+		add_bno_extent(agno, new_start, new_len);
+		add_bcnt_extent(agno, new_start, new_len);
+	}
+}
+
+/* Reserve blocks for the new per-AG structures. */
+static void
+reserve_btblocks(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	struct bt_rebuild	*btr,
+	uint32_t		nr_blocks)
+{
+	struct extent_tree_node	*ext_ptr;
+	uint32_t		blocks_allocated = 0;
+	uint32_t		len;
+	int			error;
+
+	while (blocks_allocated < nr_blocks)  {
+		xfs_fsblock_t	fsbno;
+
+		/*
+		 * Grab the smallest extent and use it up, then get the
+		 * next smallest.  This mimics the init_*_cursor code.
+		 */
+		ext_ptr = findfirst_bcnt_extent(agno);
+		if (!ext_ptr)
+			do_error(
+_("error - not enough free space in filesystem\n"));
+
+		/* Use up the extent we've got. */
+		len = min(ext_ptr->ex_blockcount, nr_blocks - blocks_allocated);
+		fsbno = XFS_AGB_TO_FSB(mp, agno, ext_ptr->ex_startblock);
+		error = bulkload_add_blocks(&btr->newbt, fsbno, len);
+		if (error)
+			do_error(_("could not set up btree reservation: %s\n"),
+				strerror(-error));
+
+		error = rmap_add_ag_rec(mp, agno, ext_ptr->ex_startblock, len,
+				btr->newbt.oinfo.oi_owner);
+		if (error)
+			do_error(_("could not set up btree rmaps: %s\n"),
+				strerror(-error));
+
+		consume_freespace(agno, ext_ptr, len);
+		blocks_allocated += len;
+	}
+#ifdef XR_BLD_FREE_TRACE
+	fprintf(stderr, "blocks_allocated = %d\n",
+		blocks_allocated);
+#endif
+}
+
+/* Feed one of the new btree blocks to the bulk loader. */
+static int
+rebuild_claim_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	void			*priv)
+{
+	struct bt_rebuild	*btr = priv;
+
+	return bulkload_claim_block(cur, &btr->newbt, ptr);
+}
+
+/*
+ * Scoop up leftovers from a rebuild cursor for later freeing, then free the
+ * rebuild context.
+ */
+void
+finish_rebuild(
+	struct xfs_mount	*mp,
+	struct bt_rebuild	*btr,
+	struct xfs_slab		*lost_fsb)
+{
+	struct bulkload_resv	*resv, *n;
+
+	for_each_bulkload_reservation(&btr->newbt, resv, n) {
+		while (resv->used < resv->len) {
+			xfs_fsblock_t	fsb = resv->fsbno + resv->used;
+			int		error;
+
+			error = slab_add(lost_fsb, &fsb);
+			if (error)
+				do_error(
+_("Insufficient memory saving lost blocks.\n"));
+			resv->used++;
+		}
+	}
+
+	bulkload_destroy(&btr->newbt, 0);
+}
diff --git a/repair/agbtree.h b/repair/agbtree.h
new file mode 100644
index 00000000..50ea3c60
--- /dev/null
+++ b/repair/agbtree.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2020 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ */
+#ifndef __XFS_REPAIR_AG_BTREE_H__
+#define __XFS_REPAIR_AG_BTREE_H__
+
+/* Context for rebuilding a per-AG btree. */
+struct bt_rebuild {
+	/* Fake root for staging and space preallocations. */
+	struct bulkload	newbt;
+
+	/* Geometry of the new btree. */
+	struct xfs_btree_bload	bload;
+
+	/* Staging btree cursor for the new tree. */
+	struct xfs_btree_cur	*cur;
+
+	/* Tree-specific data. */
+	union {
+		struct xfs_slab_cursor	*slab_cursor;
+	};
+};
+
+void finish_rebuild(struct xfs_mount *mp, struct bt_rebuild *btr,
+		struct xfs_slab *lost_fsb);
+
+#endif /* __XFS_REPAIR_AG_BTREE_H__ */
diff --git a/repair/bulkload.c b/repair/bulkload.c
index 4c69fe0d..81d67e62 100644
--- a/repair/bulkload.c
+++ b/repair/bulkload.c
@@ -95,3 +95,44 @@ bulkload_claim_block(
 		ptr->s = cpu_to_be32(XFS_FSB_TO_AGBNO(cur->bc_mp, fsb));
 	return 0;
 }
+
+/*
+ * Estimate proper slack values for a btree that's being reloaded.
+ *
+ * Under most circumstances, we'll take whatever default loading value the
+ * btree bulk loading code calculates for us.  However, there are some
+ * exceptions to this rule:
+ *
+ * (1) If someone turned one of the debug knobs.
+ * (2) The AG has less than ~9% space free.
+ *
+ * Note that we actually use 3/32 for the comparison to avoid division.
+ */
+void
+bulkload_estimate_ag_slack(
+	struct repair_ctx	*sc,
+	struct xfs_btree_bload	*bload,
+	unsigned int		free)
+{
+	/*
+	 * The global values are set to -1 (i.e. take the bload defaults)
+	 * unless someone has set them otherwise, so we just pull the values
+	 * here.
+	 */
+	bload->leaf_slack = bload_leaf_slack;
+	bload->node_slack = bload_node_slack;
+
+	/* No further changes if there's more than 3/32ths space left. */
+	if (free >= ((sc->mp->m_sb.sb_agblocks * 3) >> 5))
+		return;
+
+	/*
+	 * We're low on space; load the btrees as tightly as possible.  Leave
+	 * a couple of open slots in each btree block so that we don't end up
+	 * splitting the btrees like crazy right after mount.
+	 */
+	if (bload->leaf_slack < 0)
+		bload->leaf_slack = 2;
+	if (bload->node_slack < 0)
+		bload->node_slack = 2;
+}
diff --git a/repair/bulkload.h b/repair/bulkload.h
index 79f81cb0..01f67279 100644
--- a/repair/bulkload.h
+++ b/repair/bulkload.h
@@ -53,5 +53,7 @@ int bulkload_add_blocks(struct bulkload *bkl, xfs_fsblock_t fsbno,
 void bulkload_destroy(struct bulkload *bkl, int error);
 int bulkload_claim_block(struct xfs_btree_cur *cur, struct bulkload *bkl,
 		union xfs_btree_ptr *ptr);
+void bulkload_estimate_ag_slack(struct repair_ctx *sc,
+		struct xfs_btree_bload *bload, unsigned int free);
 
 #endif /* __XFS_REPAIR_BULKLOAD_H__ */

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-07-02 15:18   ` [PATCH v2 " Darrick J. Wong
@ 2020-07-03  3:24     ` Eric Sandeen
  2020-07-03 20:26       ` Darrick J. Wong
  2020-07-10 19:10     ` Eric Sandeen
  1 sibling, 1 reply; 42+ messages in thread
From: Eric Sandeen @ 2020-07-03  3:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster

On 7/2/20 10:18 AM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create some new support structures and functions to assist phase5 in
> using the btree bulk loader to reconstruct metadata btrees.  This is the
> first step in removing the open-coded AG btree rebuilding code.
> 
> Note: The code in this patch will not be used anywhere until the next
> patch, so warnings about unused symbols are expected.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: set the "nearly out of space" slack value to 2 so that we don't
> start out with tons of btree splitting right after mount

This also took out the changes to phase5_func() I think, but there is no
V2 of 07/12 to add them back?

-Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-07-03  3:24     ` Eric Sandeen
@ 2020-07-03 20:26       ` Darrick J. Wong
  2020-07-03 21:51         ` Eric Sandeen
  0 siblings, 1 reply; 42+ messages in thread
From: Darrick J. Wong @ 2020-07-03 20:26 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, bfoster

On Thu, Jul 02, 2020 at 10:24:30PM -0500, Eric Sandeen wrote:
> On 7/2/20 10:18 AM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create some new support structures and functions to assist phase5 in
> > using the btree bulk loader to reconstruct metadata btrees.  This is the
> > first step in removing the open-coded AG btree rebuilding code.
> > 
> > Note: The code in this patch will not be used anywhere until the next
> > patch, so warnings about unused symbols are expected.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: set the "nearly out of space" slack value to 2 so that we don't
> > start out with tons of btree splitting right after mount
> 
> This also took out the changes to phase5_func() I think, but there is no
> V2 of 07/12 to add them back?

Doh.  Do you want me just to resend the entire pile that I have?  I've
forgotten which patches have been updated because tracking dozens of
small changes individually via email chains is awful save for the
automatic archiving.

--D

> -Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-07-03 20:26       ` Darrick J. Wong
@ 2020-07-03 21:51         ` Eric Sandeen
  2020-07-04  3:39           ` Darrick J. Wong
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Sandeen @ 2020-07-03 21:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster

On 7/3/20 3:26 PM, Darrick J. Wong wrote:
> On Thu, Jul 02, 2020 at 10:24:30PM -0500, Eric Sandeen wrote:
>> On 7/2/20 10:18 AM, Darrick J. Wong wrote:
>>> From: Darrick J. Wong <darrick.wong@oracle.com>
>>>
>>> Create some new support structures and functions to assist phase5 in
>>> using the btree bulk loader to reconstruct metadata btrees.  This is the
>>> first step in removing the open-coded AG btree rebuilding code.
>>>
>>> Note: The code in this patch will not be used anywhere until the next
>>> patch, so warnings about unused symbols are expected.
>>>
>>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>>> ---
>>> v2: set the "nearly out of space" slack value to 2 so that we don't
>>> start out with tons of btree splitting right after mount
>>
>> This also took out the changes to phase5_func() I think, but there is no
>> V2 of 07/12 to add them back?
> 
> Doh.  Do you want me just to resend the entire pile that I have?  I've
> forgotten which patches have been updated because tracking dozens of
> small changes individually via email chains is awful save for the
> automatic archiving.

I think I have it all good to go but if you want to point me at a branch to
compare against that might be good.

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-07-03 21:51         ` Eric Sandeen
@ 2020-07-04  3:39           ` Darrick J. Wong
  0 siblings, 0 replies; 42+ messages in thread
From: Darrick J. Wong @ 2020-07-04  3:39 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs, bfoster

On Fri, Jul 03, 2020 at 04:51:10PM -0500, Eric Sandeen wrote:
> On 7/3/20 3:26 PM, Darrick J. Wong wrote:
> > On Thu, Jul 02, 2020 at 10:24:30PM -0500, Eric Sandeen wrote:
> >> On 7/2/20 10:18 AM, Darrick J. Wong wrote:
> >>> From: Darrick J. Wong <darrick.wong@oracle.com>
> >>>
> >>> Create some new support structures and functions to assist phase5 in
> >>> using the btree bulk loader to reconstruct metadata btrees.  This is the
> >>> first step in removing the open-coded AG btree rebuilding code.
> >>>
> >>> Note: The code in this patch will not be used anywhere until the next
> >>> patch, so warnings about unused symbols are expected.
> >>>
> >>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> >>> ---
> >>> v2: set the "nearly out of space" slack value to 2 so that we don't
> >>> start out with tons of btree splitting right after mount
> >>
> >> This also took out the changes to phase5_func() I think, but there is no
> >> V2 of 07/12 to add them back?
> > 
> > Doh.  Do you want me just to resend the entire pile that I have?  I've
> > forgotten which patches have been updated because tracking dozens of
> > small changes individually via email chains is awful save for the
> > automatic archiving.
> 
> I think I have it all good to go but if you want to point me at a branch to
> compare against that might be good.

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-quotacheck_2020-07-02

--D

> 
> Thanks,
> -Eric

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-07-02 15:18   ` [PATCH v2 " Darrick J. Wong
  2020-07-03  3:24     ` Eric Sandeen
@ 2020-07-10 19:10     ` Eric Sandeen
  2020-07-13 13:37       ` Brian Foster
  1 sibling, 1 reply; 42+ messages in thread
From: Eric Sandeen @ 2020-07-10 19:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster

On 7/2/20 10:18 AM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create some new support structures and functions to assist phase5 in
> using the btree bulk loader to reconstruct metadata btrees.  This is the
> first step in removing the open-coded AG btree rebuilding code.
> 
> Note: The code in this patch will not be used anywhere until the next
> patch, so warnings about unused symbols are expected.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: set the "nearly out of space" slack value to 2 so that we don't
> start out with tons of btree splitting right after mount

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

Not sure if Brian's RVB carries through the V2 change or not ...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 06/12] xfs_repair: create a new class of btree rebuild cursors
  2020-07-10 19:10     ` Eric Sandeen
@ 2020-07-13 13:37       ` Brian Foster
  0 siblings, 0 replies; 42+ messages in thread
From: Brian Foster @ 2020-07-13 13:37 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Darrick J. Wong, linux-xfs

On Fri, Jul 10, 2020 at 12:10:26PM -0700, Eric Sandeen wrote:
> On 7/2/20 10:18 AM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create some new support structures and functions to assist phase5 in
> > using the btree bulk loader to reconstruct metadata btrees.  This is the
> > first step in removing the open-coded AG btree rebuilding code.
> > 
> > Note: The code in this patch will not be used anywhere until the next
> > patch, so warnings about unused symbols are expected.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: set the "nearly out of space" slack value to 2 so that we don't
> > start out with tons of btree splitting right after mount
> 
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> 
> Not sure if Brian's RVB carries through the V2 change or not ...
> 

No objection from me if the only changes were adjusting the default slack
values and lifting out the unrelated hunk..

Brian

> > + */
> > +void
> > +bulkload_estimate_ag_slack(
> > +	struct repair_ctx	*sc,
> > +	struct xfs_btree_bload	*bload,
> > +	unsigned int		free)
> > +{
> > +	/*
> > +	 * The global values are set to -1 (i.e. take the bload defaults)
> > +	 * unless someone has set them otherwise, so we just pull the values
> > +	 * here.
> > +	 */
> > +	bload->leaf_slack = bload_leaf_slack;
> > +	bload->node_slack = bload_node_slack;
> > +
> > +	/* No further changes if at least 3/32 of the space is free. */
> > +	if (free >= ((sc->mp->m_sb.sb_agblocks * 3) >> 5))
> > +		return;
> > +
> > +	/*
> > +	 * We're low on space; load the btrees as tightly as possible.  Leave
> > +	 * a couple of open slots in each btree block so that we don't end up
> > +	 * splitting the btrees like crazy right after mount.
> > +	 */
> > +	if (bload->leaf_slack < 0)
> > +		bload->leaf_slack = 2;
> > +	if (bload->node_slack < 0)
> > +		bload->node_slack = 2;
> > +}
> > diff --git a/repair/bulkload.h b/repair/bulkload.h
> > index 79f81cb0..01f67279 100644
> > --- a/repair/bulkload.h
> > +++ b/repair/bulkload.h
> > @@ -53,5 +53,7 @@ int bulkload_add_blocks(struct bulkload *bkl, xfs_fsblock_t fsbno,
> >  void bulkload_destroy(struct bulkload *bkl, int error);
> >  int bulkload_claim_block(struct xfs_btree_cur *cur, struct bulkload *bkl,
> >  		union xfs_btree_ptr *ptr);
> > +void bulkload_estimate_ag_slack(struct repair_ctx *sc,
> > +		struct xfs_btree_bload *bload, unsigned int free);
> >  
> >  #endif /* __XFS_REPAIR_BULKLOAD_H__ */
> > 
> 



Thread overview: 42+ messages
2020-06-02  4:26 [PATCH v6 00/12] xfs_repair: use btree bulk loading Darrick J. Wong
2020-06-02  4:26 ` [PATCH 01/12] xfs_repair: drop lostblocks from build_agf_agfl Darrick J. Wong
2020-06-17 12:09   ` Brian Foster
2020-06-02  4:27 ` [PATCH 02/12] xfs_repair: rename the agfl index loop variable in build_agf_agfl Darrick J. Wong
2020-06-17 12:09   ` Brian Foster
2020-06-02  4:27 ` [PATCH 03/12] xfs_repair: make container for btree bulkload root and block reservation Darrick J. Wong
2020-06-17 12:09   ` Brian Foster
2020-06-02  4:27 ` [PATCH 04/12] xfs_repair: remove gratuitous code block in phase5 Darrick J. Wong
2020-06-02  4:27 ` [PATCH 05/12] xfs_repair: inject lost blocks back into the fs no matter the owner Darrick J. Wong
2020-06-17 12:09   ` Brian Foster
2020-06-02  4:27 ` [PATCH 06/12] xfs_repair: create a new class of btree rebuild cursors Darrick J. Wong
2020-06-17 12:10   ` Brian Foster
2020-06-18 18:30     ` Darrick J. Wong
2020-06-29 23:10     ` Darrick J. Wong
2020-07-02 15:18   ` [PATCH v2 " Darrick J. Wong
2020-07-03  3:24     ` Eric Sandeen
2020-07-03 20:26       ` Darrick J. Wong
2020-07-03 21:51         ` Eric Sandeen
2020-07-04  3:39           ` Darrick J. Wong
2020-07-10 19:10     ` Eric Sandeen
2020-07-13 13:37       ` Brian Foster
2020-06-02  4:27 ` [PATCH 07/12] xfs_repair: rebuild free space btrees with bulk loader Darrick J. Wong
2020-06-18 15:23   ` Brian Foster
2020-06-18 16:41     ` Darrick J. Wong
2020-06-18 16:51       ` Brian Foster
2020-06-02  4:27 ` [PATCH 08/12] xfs_repair: rebuild inode " Darrick J. Wong
2020-06-18 15:24   ` Brian Foster
2020-06-18 18:33     ` Darrick J. Wong
2020-06-02  4:27 ` [PATCH 09/12] xfs_repair: rebuild reverse mapping " Darrick J. Wong
2020-06-18 15:25   ` Brian Foster
2020-06-18 15:31     ` Darrick J. Wong
2020-06-18 15:37       ` Brian Foster
2020-06-18 16:54         ` Darrick J. Wong
2020-06-02  4:27 ` [PATCH 10/12] xfs_repair: rebuild refcount " Darrick J. Wong
2020-06-18 15:26   ` Brian Foster
2020-06-18 16:56     ` Darrick J. Wong
2020-06-18 17:05       ` Brian Foster
2020-06-02  4:28 ` [PATCH 11/12] xfs_repair: remove old btree rebuild support code Darrick J. Wong
2020-06-19 11:10   ` Brian Foster
2020-06-02  4:28 ` [PATCH 12/12] xfs_repair: use bitmap to track blocks lost during btree construction Darrick J. Wong
2020-06-19 11:10   ` Brian Foster
2020-06-19 21:36     ` Darrick J. Wong
