* [PATCH v4 0/7] xfs: btree bulk loading
@ 2020-03-12  3:45 Darrick J. Wong
  2020-03-12  3:45 ` [PATCH 1/7] xfs: introduce fake roots for ag-rooted btrees Darrick J. Wong
From: Darrick J. Wong @ 2020-03-12  3:45 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

Hi all,

This series creates a bulk loading function for metadata btree cursors.

We start by creating the idea of a "fake root" for each of the btree
root types (AG header and inode) so that we can use a special btree
cursor to stage a new btree without altering anything that might already
exist.

Next, we add utility functions to compute the desired btree shape for a
given number of records, load records into new leaf blocks, compute the
node blocks from that, and present the new root ready for commit.

Finally we extend all four per-AG btree cursor types to support staging
cursors and therefore bulk loading.  This will be used by upcoming patch
series to implement online repair and refactor offline repair.
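
As a rough sketch in pseudo-C, the intended calling sequence looks like
this; the xfs_xxxbt_* wrappers stand in for the per-btree-type helpers
added in patches 4-7, and error handling is elided:

	cur = xfs_xxxbt_stage_cursor(mp, &fakeroot, ...);
	error = xfs_btree_bload_compute_geometry(cur, &bbl, nr_records);
	/* caller preallocates bbl.nr_blocks blocks for bbl.claim_block */
	error = xfs_btree_bload(cur, &bbl, priv);
	xfs_xxxbt_commit_staged_btree(cur, tp, ...);
	xfs_btree_del_cursor(cur, error);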

For v4, I addressed a lot of review comments from Brian Foster, most of
which relate to disentangling thornier parts of the code and clarifying
the documentation so that someone other than the author can understand
what is going on here. :)

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This has been lightly tested with fstests.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=btree-bulk-loading-5.7


* [PATCH 1/7] xfs: introduce fake roots for ag-rooted btrees
  2020-03-12  3:45 [PATCH v4 0/7] xfs: btree bulk loading Darrick J. Wong
@ 2020-03-12  3:45 ` Darrick J. Wong
  2020-03-13 14:47   ` Brian Foster
  2020-03-12  3:45 ` [PATCH 2/7] xfs: introduce fake roots for inode-rooted btrees Darrick J. Wong
From: Darrick J. Wong @ 2020-03-12  3:45 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an in-core fake root for AG-rooted btree types so that callers
can generate a whole new btree using the upcoming btree bulk load
function without making the new tree accessible from the rest of the
filesystem.  It is up to the individual btree type to provide a function
to create a staged cursor (presumably with the appropriate callouts to
update the fakeroot) and then commit the staged root back into the
filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_btree.c |  168 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_btree.h |   30 ++++++++
 fs/xfs/xfs_trace.h        |   28 ++++++++
 3 files changed, 225 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 4ef9f0b42c7f..085bc070e804 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -382,6 +382,8 @@ xfs_btree_del_cursor(
 	/*
 	 * Free the cursor.
 	 */
+	if (unlikely(cur->bc_flags & XFS_BTREE_STAGING))
+		kmem_free((void *)cur->bc_ops);
 	kmem_cache_free(xfs_btree_cur_zone, cur);
 }
 
@@ -4908,3 +4910,169 @@ xfs_btree_has_more_records(
 	else
 		return block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK);
 }
+
+/*
+ * Staging Cursors and Fake Roots for Btrees
+ * =========================================
+ *
+ * A staging btree cursor is a special type of btree cursor that callers must
+ * use to construct a new btree index using the btree bulk loader code.  The
+ * bulk loading code uses the staging btree cursor to abstract the details of
+ * initializing new btree blocks and filling them with records or key/ptr
+ * pairs.  Regular btree operations (e.g. queries and modifications) are not
+ * supported with staging cursors, and callers must not invoke them.
+ *
+ * Fake root structures contain all the information about a btree that is under
+ * construction by the bulk loading code.  Staging btree cursors point to fake
+ * root structures instead of the usual AG header or inode structure.
+ *
+ * Callers are expected to initialize a fake root structure and pass it into
+ * the _stage_cursor function for a specific btree type.  When bulk loading is
+ * complete, callers should call the _commit_staged_btree function for that
+ * specific btree type to commit the new btree into the filesystem.
+ */
+
+/*
+ * Don't allow staging cursors to be duplicated because they're supposed to be
+ * kept private to a single thread.
+ */
+STATIC struct xfs_btree_cur *
+xfs_btree_fakeroot_dup_cursor(
+	struct xfs_btree_cur	*cur)
+{
+	ASSERT(0);
+	return NULL;
+}
+
+/*
+ * Don't allow block allocation for a staging cursor, because staging cursors
+ * do not support regular btree modifications.
+ *
+ * Bulk loading uses a separate callback to obtain new blocks from a
+ * preallocated list, which prevents ENOSPC failures during loading.
+ */
+STATIC int
+xfs_btree_fakeroot_alloc_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*start_bno,
+	union xfs_btree_ptr	*new_bno,
+	int			*stat)
+{
+	ASSERT(0);
+	return -EFSCORRUPTED;
+}
+
+/*
+ * Don't allow block freeing for a staging cursor, because staging cursors
+ * do not support regular btree modifications.
+ */
+STATIC int
+xfs_btree_fakeroot_free_block(
+	struct xfs_btree_cur	*cur,
+	struct xfs_buf		*bp)
+{
+	ASSERT(0);
+	return -EFSCORRUPTED;
+}
+
+/* Initialize a pointer to the root block from the fakeroot. */
+STATIC void
+xfs_btree_fakeroot_init_ptr_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr)
+{
+	struct xbtree_afakeroot	*afake;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	afake = cur->bc_ag.afake;
+	ptr->s = cpu_to_be32(afake->af_root);
+}
+
+/*
+ * Bulk Loading for AG Btrees
+ * ==========================
+ *
+ * For a btree rooted in an AG header, pass an xbtree_afakeroot structure to the
+ * staging cursor.  Callers should initialize this to zero.
+ *
+ * The _stage_cursor() function for a specific btree type should call
+ * xfs_btree_stage_afakeroot to set up the in-memory cursor as a staging
+ * cursor.  The corresponding _commit_staged_btree() function should log the
+ * new root and call xfs_btree_commit_afakeroot() to transform the staging
+ * cursor into a regular btree cursor.
+ */
+
+/* Update the btree root information for a per-AG fake root. */
+STATIC void
+xfs_btree_afakeroot_set_root(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	int			inc)
+{
+	struct xbtree_afakeroot	*afake = cur->bc_ag.afake;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+	afake->af_root = be32_to_cpu(ptr->s);
+	afake->af_levels += inc;
+}
+
+/*
+ * Initialize an AG-rooted btree cursor with the given AG btree fake root.  The
+ * btree cursor's bc_ops will be overridden as needed to make the staging
+ * functionality work.  If new_ops is not NULL, these new ops will be passed
+ * out to the caller for further overriding.
+ */
+void
+xfs_btree_stage_afakeroot(
+	struct xfs_btree_cur		*cur,
+	struct xbtree_afakeroot		*afake,
+	struct xfs_btree_ops		**new_ops)
+{
+	struct xfs_btree_ops		*nops;
+
+	ASSERT(!(cur->bc_flags & XFS_BTREE_STAGING));
+	ASSERT(!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE));
+	ASSERT(cur->bc_tp == NULL);
+
+	nops = kmem_alloc(sizeof(struct xfs_btree_ops), KM_NOFS);
+	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
+	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
+	nops->free_block = xfs_btree_fakeroot_free_block;
+	nops->init_ptr_from_cur = xfs_btree_fakeroot_init_ptr_from_cur;
+	nops->set_root = xfs_btree_afakeroot_set_root;
+	nops->dup_cursor = xfs_btree_fakeroot_dup_cursor;
+
+	cur->bc_ag.afake = afake;
+	cur->bc_nlevels = afake->af_levels;
+	cur->bc_ops = nops;
+	cur->bc_flags |= XFS_BTREE_STAGING;
+
+	if (new_ops)
+		*new_ops = nops;
+}
+
+/*
+ * Transform an AG-rooted staging btree cursor back into a regular cursor by
+ * substituting a real btree root for the fake one and restoring normal btree
+ * cursor ops.  The caller must log the btree root change prior to calling
+ * this.
+ */
+void
+xfs_btree_commit_afakeroot(
+	struct xfs_btree_cur		*cur,
+	struct xfs_trans		*tp,
+	struct xfs_buf			*agbp,
+	const struct xfs_btree_ops	*ops)
+{
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+	ASSERT(cur->bc_tp == NULL);
+
+	trace_xfs_btree_commit_afakeroot(cur);
+
+	kmem_free((void *)cur->bc_ops);
+	cur->bc_ag.agbp = agbp;
+	cur->bc_ops = ops;
+	cur->bc_flags &= ~XFS_BTREE_STAGING;
+	cur->bc_tp = tp;
+}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 0d10bbd5223a..aa4a7bd40023 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -179,7 +179,10 @@ union xfs_btree_irec {
 
 /* Per-AG btree information. */
 struct xfs_btree_cur_ag {
-	struct xfs_buf		*agbp;
+	union {
+		struct xfs_buf		*agbp;
+		struct xbtree_afakeroot	*afake;	/* fake ag header root */
+	};
 	xfs_agnumber_t		agno;
 	union {
 		struct {
@@ -235,6 +238,12 @@ typedef struct xfs_btree_cur
 #define XFS_BTREE_LASTREC_UPDATE	(1<<2)	/* track last rec externally */
 #define XFS_BTREE_CRC_BLOCKS		(1<<3)	/* uses extended btree blocks */
 #define XFS_BTREE_OVERLAPPING		(1<<4)	/* overlapping intervals */
+/*
+ * The root of this btree is a fakeroot structure so that we can stage a btree
+ * rebuild without leaving it accessible via primary metadata.  The ops struct
+ * is dynamically allocated and must be freed when the cursor is deleted.
+ */
+#define XFS_BTREE_STAGING		(1<<5)
 
 
 #define	XFS_BTREE_NOERROR	0
@@ -515,4 +524,23 @@ xfs_btree_islastblock(
 	return block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK);
 }
 
+/* Fake root for an AG-rooted btree. */
+struct xbtree_afakeroot {
+	/* AG block number of the new btree root. */
+	xfs_agblock_t		af_root;
+
+	/* Height of the new btree. */
+	unsigned int		af_levels;
+
+	/* Number of blocks used by the btree. */
+	unsigned int		af_blocks;
+};
+
+/* Cursor interactions with fake roots for AG-rooted btrees. */
+void xfs_btree_stage_afakeroot(struct xfs_btree_cur *cur,
+		struct xbtree_afakeroot *afake,
+		struct xfs_btree_ops **new_ops);
+void xfs_btree_commit_afakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
+		struct xfs_buf *agbp, const struct xfs_btree_ops *ops);
+
 #endif	/* __XFS_BTREE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 059c3098a4a0..d8c229492973 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3605,6 +3605,34 @@ TRACE_EVENT(xfs_check_new_dalign,
 		  __entry->calc_rootino)
 )
 
+TRACE_EVENT(xfs_btree_commit_afakeroot,
+	TP_PROTO(struct xfs_btree_cur *cur),
+	TP_ARGS(cur),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_btnum_t, btnum)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, agbno)
+		__field(unsigned int, levels)
+		__field(unsigned int, blocks)
+	),
+	TP_fast_assign(
+		__entry->dev = cur->bc_mp->m_super->s_dev;
+		__entry->btnum = cur->bc_btnum;
+		__entry->agno = cur->bc_ag.agno;
+		__entry->agbno = cur->bc_ag.afake->af_root;
+		__entry->levels = cur->bc_ag.afake->af_levels;
+		__entry->blocks = cur->bc_ag.afake->af_blocks;
+	),
+	TP_printk("dev %d:%d btree %s ag %u levels %u blocks %u root %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
+		  __entry->agno,
+		  __entry->levels,
+		  __entry->blocks,
+		  __entry->agbno)
+)
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH
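
To illustrate the calling convention, here is a hedged sketch of a
per-btree _stage_cursor wrapper built on the new helpers.  The
xfs_foobt_* names are invented for illustration (patch 4 adds the real
thing for the free space btrees); a caller that needs to override more
callbacks would pass a non-NULL new_ops pointer instead of NULL:

	/* Hypothetical staging constructor for a made-up "foo" btree. */
	struct xfs_btree_cur *
	xfs_foobt_stage_cursor(
		struct xfs_mount	*mp,
		struct xbtree_afakeroot	*afake,
		xfs_agnumber_t		agno)
	{
		struct xfs_btree_cur	*cur;

		/* Set up bc_mp, bc_ops, etc. but with no transaction. */
		cur = xfs_foobt_init_common(mp, NULL, agno);
		/* Swap in the fakeroot callbacks and attach the fake root. */
		xfs_btree_stage_afakeroot(cur, afake, NULL);
		return cur;
	}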



* [PATCH 2/7] xfs: introduce fake roots for inode-rooted btrees
  2020-03-12  3:45 [PATCH v4 0/7] xfs: btree bulk loading Darrick J. Wong
  2020-03-12  3:45 ` [PATCH 1/7] xfs: introduce fake roots for ag-rooted btrees Darrick J. Wong
@ 2020-03-12  3:45 ` Darrick J. Wong
  2020-03-13 14:47   ` Brian Foster
  2020-03-12  3:45 ` [PATCH 3/7] xfs: support bulk loading of staged btrees Darrick J. Wong
From: Darrick J. Wong @ 2020-03-12  3:45 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Create an in-core fake root for inode-rooted btree types so that callers
can generate a whole new btree using the upcoming btree bulk load
function without making the new tree accessible from the rest of the
filesystem.  It is up to the individual btree type to provide a function
to create a staged cursor (presumably with the appropriate callouts to
update the fakeroot) and then commit the staged root back into the
filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_btree.c |  111 +++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_btree.h |   31 +++++++++++++
 fs/xfs/xfs_trace.h        |   33 +++++++++++++
 3 files changed, 171 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 085bc070e804..4e1d4f184d4b 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -644,6 +644,17 @@ xfs_btree_ptr_addr(
 		((char *)block + xfs_btree_ptr_offset(cur, n, level));
 }
 
+struct xfs_ifork *
+xfs_btree_ifork_ptr(
+	struct xfs_btree_cur	*cur)
+{
+	ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
+
+	if (cur->bc_flags & XFS_BTREE_STAGING)
+		return cur->bc_ino.ifake->if_fork;
+	return XFS_IFORK_PTR(cur->bc_ino.ip, cur->bc_ino.whichfork);
+}
+
 /*
  * Get the root block which is stored in the inode.
  *
@@ -654,9 +665,8 @@ STATIC struct xfs_btree_block *
 xfs_btree_get_iroot(
 	struct xfs_btree_cur	*cur)
 {
-	struct xfs_ifork	*ifp;
+	struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
 
-	ifp = XFS_IFORK_PTR(cur->bc_ino.ip, cur->bc_ino.whichfork);
 	return (struct xfs_btree_block *)ifp->if_broot;
 }
 
@@ -4985,8 +4995,17 @@ xfs_btree_fakeroot_init_ptr_from_cur(
 
 	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
 
-	afake = cur->bc_ag.afake;
-	ptr->s = cpu_to_be32(afake->af_root);
+	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
+		/*
+		 * The root block lives in the inode core, so we zero the
+		 * pointer (like the bmbt code does) to make it obvious if
+		 * anyone ever tries to use this pointer.
+		 */
+		ptr->l = cpu_to_be64(0);
+	} else {
+		afake = cur->bc_ag.afake;
+		ptr->s = cpu_to_be32(afake->af_root);
+	}
 }
 
 /*
@@ -5076,3 +5095,87 @@ xfs_btree_commit_afakeroot(
 	cur->bc_flags &= ~XFS_BTREE_STAGING;
 	cur->bc_tp = tp;
 }
+
+/*
+ * Bulk Loading for Inode-Rooted Btrees
+ * ====================================
+ *
+ * For a btree rooted in an inode fork, pass an xbtree_ifakeroot structure to
+ * the staging cursor.  This structure should be initialized as follows:
+ *
+ * - if_fork_size field should be set to the number of bytes available to the
+ *   fork in the inode.
+ *
+ * - if_fork should point to a freshly allocated struct xfs_ifork.
+ *
+ * - if_format should be set to the appropriate fork type (e.g.
+ *   XFS_DINODE_FMT_BTREE).
+ *
+ * All other fields must be zero.
+ *
+ * The _stage_cursor() function for a specific btree type should call
+ * xfs_btree_stage_ifakeroot to set up the in-memory cursor as a staging
+ * cursor.  The corresponding _commit_staged_btree() function should log the
+ * new root and call xfs_btree_commit_ifakeroot() to transform the staging
+ * cursor into a regular btree cursor.
+ */
+
+/*
+ * Initialize an inode-rooted btree cursor with the given inode btree fake
+ * root.  The btree cursor's bc_ops will be overridden as needed to make the
+ * staging functionality work.  If new_ops is not NULL, these new ops will be
+ * passed out to the caller for further overriding.
+ */
+void
+xfs_btree_stage_ifakeroot(
+	struct xfs_btree_cur		*cur,
+	struct xbtree_ifakeroot		*ifake,
+	struct xfs_btree_ops		**new_ops)
+{
+	struct xfs_btree_ops		*nops;
+
+	ASSERT(!(cur->bc_flags & XFS_BTREE_STAGING));
+	ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
+	ASSERT(cur->bc_tp == NULL);
+
+	nops = kmem_alloc(sizeof(struct xfs_btree_ops), KM_NOFS);
+	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
+	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
+	nops->free_block = xfs_btree_fakeroot_free_block;
+	nops->init_ptr_from_cur = xfs_btree_fakeroot_init_ptr_from_cur;
+	nops->dup_cursor = xfs_btree_fakeroot_dup_cursor;
+
+	cur->bc_ino.ifake = ifake;
+	cur->bc_nlevels = ifake->if_levels;
+	cur->bc_ops = nops;
+	cur->bc_flags |= XFS_BTREE_STAGING;
+
+	if (new_ops)
+		*new_ops = nops;
+}
+
+/*
+ * Transform an inode-rooted staging btree cursor back into a regular cursor by
+ * substituting a real btree root for the fake one and restoring normal btree
+ * cursor ops.  The caller must log the btree root change prior to calling
+ * this.
+ */
+void
+xfs_btree_commit_ifakeroot(
+	struct xfs_btree_cur		*cur,
+	struct xfs_trans		*tp,
+	int				whichfork,
+	const struct xfs_btree_ops	*ops)
+{
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+	ASSERT(cur->bc_tp == NULL);
+
+	trace_xfs_btree_commit_ifakeroot(cur);
+
+	kmem_free((void *)cur->bc_ops);
+	cur->bc_ino.ifake = NULL;
+	cur->bc_ino.whichfork = whichfork;
+	cur->bc_ops = ops;
+	cur->bc_flags &= ~XFS_BTREE_STAGING;
+	cur->bc_tp = tp;
+}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index aa4a7bd40023..047067f52063 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -10,6 +10,7 @@ struct xfs_buf;
 struct xfs_inode;
 struct xfs_mount;
 struct xfs_trans;
+struct xfs_ifork;
 
 extern kmem_zone_t	*xfs_btree_cur_zone;
 
@@ -198,6 +199,7 @@ struct xfs_btree_cur_ag {
 /* Btree-in-inode cursor information */
 struct xfs_btree_cur_ino {
 	struct xfs_inode	*ip;
+	struct xbtree_ifakeroot	*ifake;		/* fake inode fork */
 	int			allocated;
 	short			forksize;
 	char			whichfork;
@@ -506,6 +508,7 @@ union xfs_btree_key *xfs_btree_high_key_from_key(struct xfs_btree_cur *cur,
 int xfs_btree_has_record(struct xfs_btree_cur *cur, union xfs_btree_irec *low,
 		union xfs_btree_irec *high, bool *exists);
 bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
+struct xfs_ifork *xfs_btree_ifork_ptr(struct xfs_btree_cur *cur);
 
 /* Does this cursor point to the last block in the given level? */
 static inline bool
@@ -543,4 +546,32 @@ void xfs_btree_stage_afakeroot(struct xfs_btree_cur *cur,
 void xfs_btree_commit_afakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
 		struct xfs_buf *agbp, const struct xfs_btree_ops *ops);
 
+/* Fake root for an inode-rooted btree. */
+struct xbtree_ifakeroot {
+	/* Fake inode fork. */
+	struct xfs_ifork	*if_fork;
+
+	/* Number of blocks used by the btree. */
+	int64_t			if_blocks;
+
+	/* Height of the new btree. */
+	unsigned int		if_levels;
+
+	/* Number of bytes available for this fork in the inode. */
+	unsigned int		if_fork_size;
+
+	/* Fork format. */
+	unsigned int		if_format;
+
+	/* Number of records. */
+	unsigned int		if_extents;
+};
+
+/* Cursor interactions with fake roots for inode-rooted btrees. */
+void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur,
+		struct xbtree_ifakeroot *ifake,
+		struct xfs_btree_ops **new_ops);
+void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
+		int whichfork, const struct xfs_btree_ops *ops);
+
 #endif	/* __XFS_BTREE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d8c229492973..05db0398f040 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3633,6 +3633,39 @@ TRACE_EVENT(xfs_btree_commit_afakeroot,
 		  __entry->agbno)
 )
 
+TRACE_EVENT(xfs_btree_commit_ifakeroot,
+	TP_PROTO(struct xfs_btree_cur *cur),
+	TP_ARGS(cur),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_btnum_t, btnum)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agino_t, agino)
+		__field(unsigned int, levels)
+		__field(unsigned int, blocks)
+		__field(int, whichfork)
+	),
+	TP_fast_assign(
+		__entry->dev = cur->bc_mp->m_super->s_dev;
+		__entry->btnum = cur->bc_btnum;
+		__entry->agno = XFS_INO_TO_AGNO(cur->bc_mp,
+					cur->bc_ino.ip->i_ino);
+		__entry->agino = XFS_INO_TO_AGINO(cur->bc_mp,
+					cur->bc_ino.ip->i_ino);
+		__entry->levels = cur->bc_ino.ifake->if_levels;
+		__entry->blocks = cur->bc_ino.ifake->if_blocks;
+		__entry->whichfork = cur->bc_ino.whichfork;
+	),
+	TP_printk("dev %d:%d btree %s ag %u agino %u whichfork %s levels %u blocks %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
+		  __entry->agno,
+		  __entry->agino,
+		  __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
+		  __entry->levels,
+		  __entry->blocks)
+)
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH
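
Following the initialization rules in the "Bulk Loading for
Inode-Rooted Btrees" comment above, staging might look like the sketch
below.  This assumes an existing inode-rooted cursor, assumes the fork
is allocated from xfs_ifork_zone as forks are elsewhere in this era of
the code, and uses a made-up xfs_foobt_ops; error handling is elided:

	struct xbtree_ifakeroot	ifake = {
		.if_fork	= kmem_zone_zalloc(xfs_ifork_zone, KM_NOFS),
		.if_fork_size	= XFS_IFORK_SIZE(ip, XFS_DATA_FORK),
		.if_format	= XFS_DINODE_FMT_BTREE,
		/* designated initializer zeroes all other fields */
	};

	xfs_btree_stage_ifakeroot(cur, &ifake, NULL);
	/* ...bulk load the new btree blocks... */
	xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK, &xfs_foobt_ops);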



* [PATCH 3/7] xfs: support bulk loading of staged btrees
  2020-03-12  3:45 [PATCH v4 0/7] xfs: btree bulk loading Darrick J. Wong
  2020-03-12  3:45 ` [PATCH 1/7] xfs: introduce fake roots for ag-rooted btrees Darrick J. Wong
  2020-03-12  3:45 ` [PATCH 2/7] xfs: introduce fake roots for inode-rooted btrees Darrick J. Wong
@ 2020-03-12  3:45 ` Darrick J. Wong
  2020-03-13 14:49   ` Brian Foster
  2020-03-12  3:45 ` [PATCH 4/7] xfs: add support for free space btree staging cursors Darrick J. Wong
From: Darrick J. Wong @ 2020-03-12  3:45 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Add a new btree function that enables us to bulk load a btree cursor.
This will be used by the upcoming online repair patches to generate new
btrees.  This avoids the programmatic inefficiency of calling
xfs_btree_insert in a loop (which generates a lot of log traffic) in
favor of stamping out new btree blocks with ordered buffers, and then
committing the new root and scheduling the removal of the old btree
blocks in a single transaction commit.

The design of this new generic code is based on the btree rebuilding
code in phase 5 of xfs_repair, with the explicit goal of enabling us
to share that code between scrub and repair.  It has the additional
feature of being able to control btree block loading factors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_btree.c |  604 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_btree.h |   68 +++++
 fs/xfs/xfs_trace.c        |    1 
 fs/xfs/xfs_trace.h        |   85 ++++++
 4 files changed, 757 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 4e1d4f184d4b..d579d8e99046 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1324,7 +1324,7 @@ STATIC void
 xfs_btree_copy_ptrs(
 	struct xfs_btree_cur	*cur,
 	union xfs_btree_ptr	*dst_ptr,
-	union xfs_btree_ptr	*src_ptr,
+	const union xfs_btree_ptr *src_ptr,
 	int			numptrs)
 {
 	ASSERT(numptrs >= 0);
@@ -5179,3 +5179,605 @@ xfs_btree_commit_ifakeroot(
 	cur->bc_flags &= ~XFS_BTREE_STAGING;
 	cur->bc_tp = tp;
 }
+
+/*
+ * Bulk Loading of Staged Btrees
+ * =============================
+ *
+ * This interface is used with a staged btree cursor to create a totally new
+ * btree with a large number of records (i.e. more than what would fit in a
+ * single root block).  When the creation is complete, the new root can be
+ * linked atomically into the filesystem by committing the staged cursor.
+ *
+ * Creation of a new btree proceeds roughly as follows:
+ *
+ * The first step is to initialize an appropriate fake btree root structure and
+ * then construct a staged btree cursor.  Refer to the block comments about
+ * "Bulk Loading for AG Btrees" and "Bulk Loading for Inode-Rooted Btrees" for
+ * more information about how to do this.
+ *
+ * The second step is to initialize a struct xfs_btree_bload context as
+ * documented in the structure definition.
+ *
+ * The third step is to call xfs_btree_bload_compute_geometry to compute the
+ * height of and the number of blocks needed to construct the btree.  See the
+ * section "Computing the Geometry of the New Btree" for details about this
+ * computation.
+ *
+ * In step four, the caller must allocate xfs_btree_bload.nr_blocks blocks and
+ * save them for later use by ->claim_block().  Bulk loading requires all
+ * blocks to be allocated beforehand to avoid ENOSPC failures midway through a
+ * rebuild, and to minimize seek distances of the new btree.
+ *
+ * Step five is to call xfs_btree_bload() to start constructing the btree.
+ *
+ * The final step is to commit the staging btree cursor, which logs the new
+ * btree root and turns the staging cursor into a regular cursor.  The caller
+ * is responsible for cleaning up the previous btree blocks, if any.
+ *
+ * Computing the Geometry of the New Btree
+ * =======================================
+ *
+ * The number of items placed in each btree block is computed via the following
+ * algorithm: For leaf levels, the number of items for the level is nr_records
+ * in the bload structure.  For node levels, the number of items for the level
+ * is the number of blocks in the next lower level of the tree.  For each
+ * level, the desired number of items per block is defined as:
+ *
+ * desired = max(minrecs, maxrecs - slack factor)
+ *
+ * The number of blocks for the level is defined to be:
+ *
+ * blocks = floor(nr_items / desired)
+ *
+ * Note this is rounded down so that the npb calculation below will never fall
+ * below minrecs.  The number of items that will actually be loaded into each
+ * btree block is defined as:
+ *
+ * npb =  nr_items / blocks
+ *
+ * Some of the leftmost blocks in the level will contain one extra record as
+ * needed to handle uneven division.  If the number of records in any block
+ * would exceed maxrecs for that level, blocks is incremented and npb is
+ * recalculated.
+ *
+ * In other words, we compute the number of blocks needed to satisfy a given
+ * loading level, then spread the items as evenly as possible.
+ *
+ * The height and number of fs blocks required to create the btree are computed
+ * and returned via btree_height and nr_blocks.
+ */
+
+/*
+ * Put a btree block that we're loading onto the ordered list and release it.
+ * The btree blocks will be written to disk when bulk loading is finished.
+ */
+static void
+xfs_btree_bload_drop_buf(
+	struct list_head	*buffers_list,
+	struct xfs_buf		**bpp)
+{
+	if (*bpp == NULL)
+		return;
+
+	xfs_buf_delwri_queue(*bpp, buffers_list);
+	xfs_buf_relse(*bpp);
+	*bpp = NULL;
+}
+
+/*
+ * Allocate and initialize one btree block for bulk loading.
+ *
+ * The new btree block will have its level and numrecs fields set to the values
+ * of the level and nr_this_block parameters, respectively.  On exit, ptrp,
+ * bpp, and blockp will all point to the new block.
+ */
+STATIC int
+xfs_btree_bload_prep_block(
+	struct xfs_btree_cur		*cur,
+	struct xfs_btree_bload		*bbl,
+	unsigned int			level,
+	unsigned int			nr_this_block,
+	union xfs_btree_ptr		*ptrp,
+	struct xfs_buf			**bpp,
+	struct xfs_btree_block		**blockp,
+	void				*priv)
+{
+	union xfs_btree_ptr		new_ptr;
+	struct xfs_buf			*new_bp;
+	struct xfs_btree_block		*new_block;
+	int				ret;
+
+	ASSERT(*bpp == NULL);
+
+	if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
+	    level == cur->bc_nlevels - 1) {
+		struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
+		size_t			new_size;
+
+		/* Allocate a new incore btree root block. */
+		new_size = bbl->iroot_size(cur, nr_this_block, priv);
+		ifp->if_broot = kmem_zalloc(new_size, 0);
+		ifp->if_broot_bytes = (int)new_size;
+		ifp->if_flags |= XFS_IFBROOT;
+
+		/* Initialize it and send it out. */
+		xfs_btree_init_block_int(cur->bc_mp, ifp->if_broot,
+				XFS_BUF_DADDR_NULL, cur->bc_btnum, level,
+				nr_this_block, cur->bc_ino.ip->i_ino,
+				cur->bc_flags);
+
+		*bpp = NULL;
+		*blockp = ifp->if_broot;
+		xfs_btree_set_ptr_null(cur, ptrp);
+		return 0;
+	}
+
+	/* Claim one of the caller's preallocated blocks. */
+	xfs_btree_set_ptr_null(cur, &new_ptr);
+	ret = bbl->claim_block(cur, &new_ptr, priv);
+	if (ret)
+		return ret;
+
+	ASSERT(!xfs_btree_ptr_is_null(cur, &new_ptr));
+
+	ret = xfs_btree_get_buf_block(cur, &new_ptr, &new_block, &new_bp);
+	if (ret)
+		return ret;
+
+	/* Initialize the btree block. */
+	xfs_btree_init_block_cur(cur, new_bp, level, nr_this_block);
+	if (*blockp)
+		xfs_btree_set_sibling(cur, *blockp, &new_ptr, XFS_BB_RIGHTSIB);
+	xfs_btree_set_sibling(cur, new_block, ptrp, XFS_BB_LEFTSIB);
+
+	/* Set the out parameters. */
+	*bpp = new_bp;
+	*blockp = new_block;
+	xfs_btree_copy_ptrs(cur, ptrp, &new_ptr, 1);
+	return 0;
+}
+
+/* Load one leaf block. */
+STATIC int
+xfs_btree_bload_leaf(
+	struct xfs_btree_cur		*cur,
+	unsigned int			recs_this_block,
+	xfs_btree_bload_get_record_fn	get_record,
+	struct xfs_btree_block		*block,
+	void				*priv)
+{
+	unsigned int			j;
+	int				ret;
+
+	/* Fill the leaf block with records. */
+	for (j = 1; j <= recs_this_block; j++) {
+		union xfs_btree_rec	*block_rec;
+
+		ret = get_record(cur, priv);
+		if (ret)
+			return ret;
+		block_rec = xfs_btree_rec_addr(cur, j, block);
+		cur->bc_ops->init_rec_from_cur(cur, block_rec);
+	}
+
+	return 0;
+}
+
+/*
+ * Load one node block with key/ptr pairs.
+ *
+ * child_ptr must point to a block within the next level down in the tree.  A
+ * key/ptr entry will be created in the new node block to the block pointed to
+ * by child_ptr.  On exit, child_ptr points to the next block on the child
+ * level that needs processing.
+ */
+STATIC int
+xfs_btree_bload_node(
+	struct xfs_btree_cur	*cur,
+	unsigned int		recs_this_block,
+	union xfs_btree_ptr	*child_ptr,
+	struct xfs_btree_block	*block)
+{
+	unsigned int		j;
+	int			ret;
+
+	/* Fill the node block with keys and pointers. */
+	for (j = 1; j <= recs_this_block; j++) {
+		union xfs_btree_key	child_key;
+		union xfs_btree_ptr	*block_ptr;
+		union xfs_btree_key	*block_key;
+		struct xfs_btree_block	*child_block;
+		struct xfs_buf		*child_bp;
+
+		ASSERT(!xfs_btree_ptr_is_null(cur, child_ptr));
+
+		ret = xfs_btree_get_buf_block(cur, child_ptr, &child_block,
+				&child_bp);
+		if (ret)
+			return ret;
+
+		block_ptr = xfs_btree_ptr_addr(cur, j, block);
+		xfs_btree_copy_ptrs(cur, block_ptr, child_ptr, 1);
+
+		block_key = xfs_btree_key_addr(cur, j, block);
+		xfs_btree_get_keys(cur, child_block, &child_key);
+		xfs_btree_copy_keys(cur, block_key, &child_key, 1);
+
+		xfs_btree_get_sibling(cur, child_block, child_ptr,
+				XFS_BB_RIGHTSIB);
+		xfs_buf_relse(child_bp);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute the maximum number of records (or keyptrs) per block that we want to
+ * install at this level in the btree.  Caller is responsible for having set
+ * @cur->bc_ino.forksize to the desired fork size, if appropriate.
+ */
+STATIC unsigned int
+xfs_btree_bload_max_npb(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_bload	*bbl,
+	unsigned int		level)
+{
+	unsigned int		ret;
+
+	if (level == cur->bc_nlevels - 1 && cur->bc_ops->get_dmaxrecs)
+		return cur->bc_ops->get_dmaxrecs(cur, level);
+
+	ret = cur->bc_ops->get_maxrecs(cur, level);
+	if (level == 0)
+		ret -= bbl->leaf_slack;
+	else
+		ret -= bbl->node_slack;
+	return ret;
+}
+
+/*
+ * Compute the desired number of records (or keyptrs) per block that we want to
+ * install at this level in the btree, which must be somewhere between minrecs
+ * and max_npb.  The caller is free to install fewer records per block.
+ */
+STATIC unsigned int
+xfs_btree_bload_desired_npb(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_bload	*bbl,
+	unsigned int		level)
+{
+	unsigned int		npb = xfs_btree_bload_max_npb(cur, bbl, level);
+
+	/* Root blocks are not subject to minrecs rules. */
+	if (level == cur->bc_nlevels - 1)
+		return max(1U, npb);
+
+	return max_t(unsigned int, cur->bc_ops->get_minrecs(cur, level), npb);
+}
+
+/*
+ * Compute the number of records to be stored in each block at this level and
+ * the number of blocks for this level.  For leaf levels, we must populate an
+ * empty root block even if there are no records, so we have to have at least
+ * one block.
+ */
+STATIC void
+xfs_btree_bload_level_geometry(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_bload	*bbl,
+	unsigned int		level,
+	uint64_t		nr_this_level,
+	unsigned int		*avg_per_block,
+	uint64_t		*blocks,
+	uint64_t		*blocks_with_extra)
+{
+	uint64_t		npb;
+	uint64_t		dontcare;
+	unsigned int		desired_npb;
+	unsigned int		maxnr;
+
+	maxnr = cur->bc_ops->get_maxrecs(cur, level);
+
+	/*
+	 * Compute the number of blocks we need to fill each block with the
+	 * desired number of records/keyptrs per block.  Because desired_npb
+	 * could be minrecs, we use regular integer division (which rounds
+	 * the block count down) so that in the next step the effective # of
+	 * items per block will never be less than desired_npb.
+	 */
+	desired_npb = xfs_btree_bload_desired_npb(cur, bbl, level);
+	*blocks = div64_u64_rem(nr_this_level, desired_npb, &dontcare);
+	*blocks = max(1ULL, *blocks);
+
+	/*
+	 * Compute the number of records that we will actually put in each
+	 * block, assuming that we want to spread the records evenly between
+	 * the blocks.  Take care that the effective # of items per block (npb)
+	 * won't exceed maxrecs even for the blocks that get an extra record,
+	 * since desired_npb could be maxrecs, and in the previous step we
+	 * rounded the block count down.
+	 */
+	npb = div64_u64_rem(nr_this_level, *blocks, blocks_with_extra);
+	if (npb > maxnr || (npb == maxnr && *blocks_with_extra > 0)) {
+		(*blocks)++;
+		npb = div64_u64_rem(nr_this_level, *blocks, blocks_with_extra);
+	}
+
+	*avg_per_block = min_t(uint64_t, npb, nr_this_level);
+
+	trace_xfs_btree_bload_level_geometry(cur, level, nr_this_level,
+			*avg_per_block, desired_npb, *blocks,
+			*blocks_with_extra);
+}
+
+/*
+ * Ensure a slack value is appropriate for the btree.
+ *
+ * If the slack value is negative, set slack so that we fill the block to
+ * halfway between minrecs and maxrecs.  Make sure the slack is never so large
+ * that we can underflow minrecs.
+ */
+static void
+xfs_btree_bload_ensure_slack(
+	struct xfs_btree_cur	*cur,
+	int			*slack,
+	int			level)
+{
+	int			maxr;
+	int			minr;
+
+	maxr = cur->bc_ops->get_maxrecs(cur, level);
+	minr = cur->bc_ops->get_minrecs(cur, level);
+
+	/*
+	 * If slack is negative, automatically set slack so that we load the
+	 * btree block approximately halfway between minrecs and maxrecs.
+	 * Generally, this will net us 75% loading.
+	 */
+	if (*slack < 0)
+		*slack = maxr - ((maxr + minr) >> 1);
+
+	*slack = min(*slack, maxr - minr);
+}
+
+/*
+ * Prepare a btree cursor for a bulk load operation by computing the geometry
+ * fields in bbl.  Caller must ensure that the btree cursor is a staging
+ * cursor.  This function can be called multiple times.
+ */
+int
+xfs_btree_bload_compute_geometry(
+	struct xfs_btree_cur	*cur,
+	struct xfs_btree_bload	*bbl,
+	uint64_t		nr_records)
+{
+	uint64_t		nr_blocks = 0;
+	uint64_t		nr_this_level;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	/*
+	 * Make sure that the slack values make sense for btree blocks that are
+	 * full disk blocks.  We do this by setting the btree nlevels to 3,
+	 * because inode-rooted btrees will return different minrecs/maxrecs
+	 * values for the root block.  Note that slack settings are not applied
+	 * to inode roots.
+	 */
+	cur->bc_nlevels = 3;
+	xfs_btree_bload_ensure_slack(cur, &bbl->leaf_slack, 0);
+	xfs_btree_bload_ensure_slack(cur, &bbl->node_slack, 1);
+
+	bbl->nr_records = nr_this_level = nr_records;
+	for (cur->bc_nlevels = 1; cur->bc_nlevels < XFS_BTREE_MAXLEVELS;) {
+		uint64_t	level_blocks;
+		uint64_t	dontcare64;
+		unsigned int	level = cur->bc_nlevels - 1;
+		unsigned int	avg_per_block;
+
+		xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
+				&avg_per_block, &level_blocks, &dontcare64);
+
+		if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
+			/*
+			 * If all the items we want to store at this level
+			 * would fit in the inode root block, then we have our
+			 * btree root and are done.
+			 *
+			 * Note that bmap btrees forbid records in the root.
+			 */
+			if (level != 0 && nr_this_level <= avg_per_block) {
+				nr_blocks++;
+				break;
+			}
+
+			/*
+			 * Otherwise, we have to store all the items for this
+			 * level in traditional btree blocks and therefore need
+			 * another level of btree to point to those blocks.
+			 *
+			 * We have to re-compute the geometry for each level of
+			 * an inode-rooted btree because the geometry differs
+			 * between a btree root in an inode fork and a
+			 * traditional btree block.
+			 *
+			 * This distinction is made in the btree code based on
+			 * whether level == bc_nlevels - 1.  Based on the
+			 * previous root block size check against the root
+			 * block geometry, we know that we aren't yet ready to
+			 * populate the root.  Increment bc_nlevels and
+			 * recalculate the geometry for a traditional
+			 * block-based btree level.
+			 */
+			cur->bc_nlevels++;
+			xfs_btree_bload_level_geometry(cur, bbl, level,
+					nr_this_level, &avg_per_block,
+					&level_blocks, &dontcare64);
+		} else {
+			/*
+			 * If all the items we want to store at this level
+			 * would fit in a single root block, we're done.
+			 */
+			if (nr_this_level <= avg_per_block) {
+				nr_blocks++;
+				break;
+			}
+
+			/* Otherwise, we need another level of btree. */
+			cur->bc_nlevels++;
+		}
+
+		nr_blocks += level_blocks;
+		nr_this_level = level_blocks;
+	}
+
+	if (cur->bc_nlevels == XFS_BTREE_MAXLEVELS)
+		return -EOVERFLOW;
+
+	bbl->btree_height = cur->bc_nlevels;
+	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
+		bbl->nr_blocks = nr_blocks - 1;
+	else
+		bbl->nr_blocks = nr_blocks;
+	return 0;
+}
+
+/* Bulk load a btree given the parameters and geometry established in bbl. */
+int
+xfs_btree_bload(
+	struct xfs_btree_cur		*cur,
+	struct xfs_btree_bload		*bbl,
+	void				*priv)
+{
+	struct list_head		buffers_list;
+	union xfs_btree_ptr		child_ptr;
+	union xfs_btree_ptr		ptr;
+	struct xfs_buf			*bp = NULL;
+	struct xfs_btree_block		*block = NULL;
+	uint64_t			nr_this_level = bbl->nr_records;
+	uint64_t			blocks;
+	uint64_t			i;
+	uint64_t			blocks_with_extra;
+	uint64_t			total_blocks = 0;
+	unsigned int			avg_per_block;
+	unsigned int			level = 0;
+	int				ret;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	INIT_LIST_HEAD(&buffers_list);
+	cur->bc_nlevels = bbl->btree_height;
+	xfs_btree_set_ptr_null(cur, &child_ptr);
+	xfs_btree_set_ptr_null(cur, &ptr);
+
+	xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
+			&avg_per_block, &blocks, &blocks_with_extra);
+
+	/* Load each leaf block. */
+	for (i = 0; i < blocks; i++) {
+		unsigned int		nr_this_block = avg_per_block;
+
+		if (i < blocks_with_extra)
+			nr_this_block++;
+
+		xfs_btree_bload_drop_buf(&buffers_list, &bp);
+
+		ret = xfs_btree_bload_prep_block(cur, bbl, level,
+				nr_this_block, &ptr, &bp, &block, priv);
+		if (ret)
+			goto out;
+
+		trace_xfs_btree_bload_block(cur, level, i, blocks, &ptr,
+				nr_this_block);
+
+		ret = xfs_btree_bload_leaf(cur, nr_this_block, bbl->get_record,
+				block, priv);
+		if (ret)
+			goto out;
+
+		/*
+		 * Record the leftmost leaf pointer so we know where to start
+		 * with the first node level.
+		 */
+		if (i == 0)
+			xfs_btree_copy_ptrs(cur, &child_ptr, &ptr, 1);
+	}
+	total_blocks += blocks;
+	xfs_btree_bload_drop_buf(&buffers_list, &bp);
+
+	/* Populate the internal btree nodes. */
+	for (level = 1; level < cur->bc_nlevels; level++) {
+		union xfs_btree_ptr	first_ptr;
+
+		nr_this_level = blocks;
+		block = NULL;
+		xfs_btree_set_ptr_null(cur, &ptr);
+
+		xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
+				&avg_per_block, &blocks, &blocks_with_extra);
+
+		/* Load each node block. */
+		for (i = 0; i < blocks; i++) {
+			unsigned int	nr_this_block = avg_per_block;
+
+			if (i < blocks_with_extra)
+				nr_this_block++;
+
+			xfs_btree_bload_drop_buf(&buffers_list, &bp);
+
+			ret = xfs_btree_bload_prep_block(cur, bbl, level,
+					nr_this_block, &ptr, &bp, &block,
+					priv);
+			if (ret)
+				goto out;
+
+			trace_xfs_btree_bload_block(cur, level, i, blocks,
+					&ptr, nr_this_block);
+
+			ret = xfs_btree_bload_node(cur, nr_this_block,
+					&child_ptr, block);
+			if (ret)
+				goto out;
+
+			/*
+			 * Record the leftmost node pointer so that we know
+			 * where to start the next node level above this one.
+			 */
+			if (i == 0)
+				xfs_btree_copy_ptrs(cur, &first_ptr, &ptr, 1);
+		}
+		total_blocks += blocks;
+		xfs_btree_bload_drop_buf(&buffers_list, &bp);
+		xfs_btree_copy_ptrs(cur, &child_ptr, &first_ptr, 1);
+	}
+
+	/* Initialize the new root. */
+	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
+		ASSERT(xfs_btree_ptr_is_null(cur, &ptr));
+		cur->bc_ino.ifake->if_levels = cur->bc_nlevels;
+		cur->bc_ino.ifake->if_blocks = total_blocks - 1;
+	} else {
+		cur->bc_ag.afake->af_root = be32_to_cpu(ptr.s);
+		cur->bc_ag.afake->af_levels = cur->bc_nlevels;
+		cur->bc_ag.afake->af_blocks = total_blocks;
+	}
+
+	/*
+	 * Write the new blocks to disk.  If the ordered list isn't empty after
+	 * that, then something went wrong and we have to fail.  This should
+	 * never happen, but we'll check anyway.
+	 */
+	ret = xfs_buf_delwri_submit(&buffers_list);
+	if (ret)
+		goto out;
+	if (!list_empty(&buffers_list)) {
+		ASSERT(list_empty(&buffers_list));
+		ret = -EIO;
+	}
+
+out:
+	xfs_buf_delwri_cancel(&buffers_list);
+	if (bp)
+		xfs_buf_relse(bp);
+	return ret;
+}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 047067f52063..c2de439a6f0d 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -574,4 +574,72 @@ void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur,
 void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
 		int whichfork, const struct xfs_btree_ops *ops);
 
+/* Bulk loading of staged btrees. */
+typedef int (*xfs_btree_bload_get_record_fn)(struct xfs_btree_cur *cur, void *priv);
+typedef int (*xfs_btree_bload_claim_block_fn)(struct xfs_btree_cur *cur,
+		union xfs_btree_ptr *ptr, void *priv);
+typedef size_t (*xfs_btree_bload_iroot_size_fn)(struct xfs_btree_cur *cur,
+		unsigned int nr_this_level, void *priv);
+
+struct xfs_btree_bload {
+	/*
+	 * This function will be called nr_records times to load records into
+	 * the btree.  The function does this by setting the cursor's bc_rec
+	 * field in in-core format.  Records must be returned in sort order.
+	 */
+	xfs_btree_bload_get_record_fn	get_record;
+
+	/*
+	 * This function will be called nr_blocks times to obtain a pointer
+	 * to a new btree block on disk.  Callers must preallocate all space
+	 * for the new btree before calling xfs_btree_bload, and this function
+	 * is what claims that reservation.
+	 */
+	xfs_btree_bload_claim_block_fn	claim_block;
+
+	/*
+	 * This function should return the size of the in-core btree root
+	 * block.  It is only necessary for XFS_BTREE_ROOT_IN_INODE btree
+	 * types.
+	 */
+	xfs_btree_bload_iroot_size_fn	iroot_size;
+
+	/*
+	 * The caller should set this to the number of records that will be
+	 * stored in the new btree.
+	 */
+	uint64_t			nr_records;
+
+	/*
+	 * Number of free records to leave in each leaf block.  If the caller
+	 * sets this to -1, the slack value will be calculated to be halfway
+	 * between maxrecs and minrecs.  This typically leaves the block 75%
+	 * full.  Note that slack values are not enforced on inode root blocks.
+	 */
+	int				leaf_slack;
+
+	/*
+	 * Number of free key/ptr pairs to leave in each node block.  This
+	 * field has the same semantics as leaf_slack.
+	 */
+	int				node_slack;
+
+	/*
+	 * The xfs_btree_bload_compute_geometry function will set this to the
+	 * number of btree blocks needed to store nr_records records.
+	 */
+	uint64_t			nr_blocks;
+
+	/*
+	 * The xfs_btree_bload_compute_geometry function will set this to the
+	 * height of the new btree.
+	 */
+	unsigned int			btree_height;
+};
+
+int xfs_btree_bload_compute_geometry(struct xfs_btree_cur *cur,
+		struct xfs_btree_bload *bbl, uint64_t nr_records);
+int xfs_btree_bload(struct xfs_btree_cur *cur, struct xfs_btree_bload *bbl,
+		void *priv);
+
 #endif	/* __XFS_BTREE_H__ */
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index bc85b89f88ca..9b5e58a92381 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -6,6 +6,7 @@
 #include "xfs.h"
 #include "xfs_fs.h"
 #include "xfs_shared.h"
+#include "xfs_bit.h"
 #include "xfs_format.h"
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 05db0398f040..efc7751550d9 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -35,6 +35,7 @@ struct xfs_icreate_log;
 struct xfs_owner_info;
 struct xfs_trans_res;
 struct xfs_inobt_rec_incore;
+union xfs_btree_ptr;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -3666,6 +3667,90 @@ TRACE_EVENT(xfs_btree_commit_ifakeroot,
 		  __entry->blocks)
 )
 
+TRACE_EVENT(xfs_btree_bload_level_geometry,
+	TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
+		 uint64_t nr_this_level, unsigned int nr_per_block,
+		 unsigned int desired_npb, uint64_t blocks,
+		 uint64_t blocks_with_extra),
+	TP_ARGS(cur, level, nr_this_level, nr_per_block, desired_npb, blocks,
+		blocks_with_extra),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_btnum_t, btnum)
+		__field(unsigned int, level)
+		__field(unsigned int, nlevels)
+		__field(uint64_t, nr_this_level)
+		__field(unsigned int, nr_per_block)
+		__field(unsigned int, desired_npb)
+		__field(unsigned long long, blocks)
+		__field(unsigned long long, blocks_with_extra)
+	),
+	TP_fast_assign(
+		__entry->dev = cur->bc_mp->m_super->s_dev;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->nlevels = cur->bc_nlevels;
+		__entry->nr_this_level = nr_this_level;
+		__entry->nr_per_block = nr_per_block;
+		__entry->desired_npb = desired_npb;
+		__entry->blocks = blocks;
+		__entry->blocks_with_extra = blocks_with_extra;
+	),
+	TP_printk("dev %d:%d btree %s level %u/%u nr_this_level %llu nr_per_block %u desired_npb %u blocks %llu blocks_with_extra %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
+		  __entry->level,
+		  __entry->nlevels,
+		  __entry->nr_this_level,
+		  __entry->nr_per_block,
+		  __entry->desired_npb,
+		  __entry->blocks,
+		  __entry->blocks_with_extra)
+)
+
+TRACE_EVENT(xfs_btree_bload_block,
+	TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
+		 uint64_t block_idx, uint64_t nr_blocks,
+		 union xfs_btree_ptr *ptr, unsigned int nr_records),
+	TP_ARGS(cur, level, block_idx, nr_blocks, ptr, nr_records),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_btnum_t, btnum)
+		__field(unsigned int, level)
+		__field(unsigned long long, block_idx)
+		__field(unsigned long long, nr_blocks)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, agbno)
+		__field(unsigned int, nr_records)
+	),
+	TP_fast_assign(
+		__entry->dev = cur->bc_mp->m_super->s_dev;
+		__entry->btnum = cur->bc_btnum;
+		__entry->level = level;
+		__entry->block_idx = block_idx;
+		__entry->nr_blocks = nr_blocks;
+		if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+			xfs_fsblock_t	fsb = be64_to_cpu(ptr->l);
+
+			__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsb);
+			__entry->agbno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsb);
+		} else {
+			__entry->agno = cur->bc_ag.agno;
+			__entry->agbno = be32_to_cpu(ptr->s);
+		}
+		__entry->nr_records = nr_records;
+	),
+	TP_printk("dev %d:%d btree %s level %u block %llu/%llu fsb (%u/%u) recs %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
+		  __entry->level,
+		  __entry->block_idx,
+		  __entry->nr_blocks,
+		  __entry->agno,
+		  __entry->agbno,
+		  __entry->nr_records)
+)
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH
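
To make the geometry computation concrete, here is a worked example
with made-up numbers.  Suppose a leaf block has maxrecs = 100 and
minrecs = 50, leaf_slack = 25, and nr_records = 383.  Then:

	desired = max(50, 100 - 25) = 75
	blocks  = floor(383 / 75)   = 5
	npb     = 383 / 5           = 76, remainder 3

so the loader creates five leaf blocks, the leftmost three holding 77
records and the other two holding 76 (3 * 77 + 2 * 76 = 383), all
within [minrecs, maxrecs].  Those five blocks then become the five
items to be loaded at the next level up.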


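And a hedged sketch of the two required callbacks for an AG btree
rebuild; struct xrep_ctx, its sorted record array, and its preallocated
block list are all invented for illustration:

	/* Hypothetical rebuild context. */
	struct xrep_ctx {
		struct xfs_alloc_rec_incore	*recs;		/* sorted */
		uint64_t			next_rec;
		xfs_agblock_t			*free_blocks;	/* prealloc'd */
		uint64_t			next_block;
	};

	/* Feed the next record (in sort order) to the loader via bc_rec. */
	STATIC int
	xrep_get_record(
		struct xfs_btree_cur	*cur,
		void			*priv)
	{
		struct xrep_ctx		*xc = priv;

		cur->bc_rec.a = xc->recs[xc->next_rec++];
		return 0;
	}

	/* Hand out a block reserved before calling xfs_btree_bload. */
	STATIC int
	xrep_claim_block(
		struct xfs_btree_cur	*cur,
		union xfs_btree_ptr	*ptr,
		void			*priv)
	{
		struct xrep_ctx		*xc = priv;

		ptr->s = cpu_to_be32(xc->free_blocks[xc->next_block++]);
		return 0;
	}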

* [PATCH 4/7] xfs: add support for free space btree staging cursors
  2020-03-12  3:45 [PATCH v4 0/7] xfs: btree bulk loading Darrick J. Wong
  2020-03-12  3:45 ` [PATCH 3/7] xfs: support bulk loading of staged btrees Darrick J. Wong
@ 2020-03-12  3:45 ` Darrick J. Wong
  2020-03-12  3:46 ` [PATCH 5/7] xfs: add support for inode " Darrick J. Wong
From: Darrick J. Wong @ 2020-03-12  3:45 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Add support for btree staging cursors for the free space btrees.  This
is needed both for online repair and also to convert xfs_repair to use
btree bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc_btree.c |   96 ++++++++++++++++++++++++++++++++-------
 fs/xfs/libxfs/xfs_alloc_btree.h |    7 +++
 2 files changed, 85 insertions(+), 18 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index a28041fdf4c0..e9d095e19f83 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -471,6 +471,41 @@ static const struct xfs_btree_ops xfs_cntbt_ops = {
 	.recs_inorder		= xfs_cntbt_recs_inorder,
 };
 
+/* Allocate most of a new allocation btree cursor. */
+STATIC struct xfs_btree_cur *
+xfs_allocbt_init_common(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_btnum_t		btnum)
+{
+	struct xfs_btree_cur	*cur;
+
+	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
+
+	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
+
+	cur->bc_tp = tp;
+	cur->bc_mp = mp;
+	cur->bc_btnum = btnum;
+	cur->bc_blocklog = mp->m_sb.sb_blocklog;
+	cur->bc_ag.agno = agno;
+	cur->bc_ag.abt.active = false;
+
+	if (btnum == XFS_BTNUM_CNT) {
+		cur->bc_ops = &xfs_cntbt_ops;
+		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtc_2);
+	} else {
+		cur->bc_ops = &xfs_bnobt_ops;
+		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtb_2);
+	}
+
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
+
+	return cur;
+}
+
 /*
  * Allocate a new allocation btree cursor.
  */
@@ -485,36 +520,61 @@ xfs_allocbt_init_cursor(
 	struct xfs_agf		*agf = agbp->b_addr;
 	struct xfs_btree_cur	*cur;
 
-	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
-
-	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
-
-	cur->bc_tp = tp;
-	cur->bc_mp = mp;
-	cur->bc_btnum = btnum;
-	cur->bc_blocklog = mp->m_sb.sb_blocklog;
-
+	cur = xfs_allocbt_init_common(mp, tp, agno, btnum);
 	if (btnum == XFS_BTNUM_CNT) {
-		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtc_2);
-		cur->bc_ops = &xfs_cntbt_ops;
 		cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]);
-		cur->bc_flags = XFS_BTREE_LASTREC_UPDATE;
+		cur->bc_flags |= XFS_BTREE_LASTREC_UPDATE;
 	} else {
-		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtb_2);
-		cur->bc_ops = &xfs_bnobt_ops;
 		cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]);
 	}
 
 	cur->bc_ag.agbp = agbp;
-	cur->bc_ag.agno = agno;
-	cur->bc_ag.abt.active = false;
 
-	if (xfs_sb_version_hascrc(&mp->m_sb))
-		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
+	return cur;
+}
 
+/* Create a free space btree cursor with a fake root for staging. */
+struct xfs_btree_cur *
+xfs_allocbt_stage_cursor(
+	struct xfs_mount	*mp,
+	struct xbtree_afakeroot	*afake,
+	xfs_agnumber_t		agno,
+	xfs_btnum_t		btnum)
+{
+	struct xfs_btree_cur	*cur;
+
+	cur = xfs_allocbt_init_common(mp, NULL, agno, btnum);
+	xfs_btree_stage_afakeroot(cur, afake, NULL);
 	return cur;
 }
 
+/*
+ * Install a new free space btree root.  Caller is responsible for invalidating
+ * and freeing the old btree blocks.
+ */
+void
+xfs_allocbt_commit_staged_btree(
+	struct xfs_btree_cur	*cur,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp)
+{
+	struct xfs_agf		*agf = agbp->b_addr;
+	struct xbtree_afakeroot	*afake = cur->bc_ag.afake;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	agf->agf_roots[cur->bc_btnum] = cpu_to_be32(afake->af_root);
+	agf->agf_levels[cur->bc_btnum] = cpu_to_be32(afake->af_levels);
+	xfs_alloc_log_agf(tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
+
+	if (cur->bc_btnum == XFS_BTNUM_BNO) {
+		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_bnobt_ops);
+	} else {
+		cur->bc_flags |= XFS_BTREE_LASTREC_UPDATE;
+		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_cntbt_ops);
+	}
+}
+
 /*
  * Calculate number of records in an alloc btree block.
  */
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.h b/fs/xfs/libxfs/xfs_alloc_btree.h
index c9305ebb69f6..047f09f0be3c 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.h
+++ b/fs/xfs/libxfs/xfs_alloc_btree.h
@@ -13,6 +13,7 @@
 struct xfs_buf;
 struct xfs_btree_cur;
 struct xfs_mount;
+struct xbtree_afakeroot;
 
 /*
  * Btree block header size depends on a superblock flag.
@@ -48,8 +49,14 @@ struct xfs_mount;
 extern struct xfs_btree_cur *xfs_allocbt_init_cursor(struct xfs_mount *,
 		struct xfs_trans *, struct xfs_buf *,
 		xfs_agnumber_t, xfs_btnum_t);
+struct xfs_btree_cur *xfs_allocbt_stage_cursor(struct xfs_mount *mp,
+		struct xbtree_afakeroot *afake, xfs_agnumber_t agno,
+		xfs_btnum_t btnum);
 extern int xfs_allocbt_maxrecs(struct xfs_mount *, int, int);
 extern xfs_extlen_t xfs_allocbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
 
+void xfs_allocbt_commit_staged_btree(struct xfs_btree_cur *cur,
+		struct xfs_trans *tp, struct xfs_buf *agbp);
+
 #endif	/* __XFS_ALLOC_BTREE_H__ */
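
Putting the pieces together, a bnobt rebuild might look like the hedged
sketch below (bbl and its callbacks as in the previous patch; error
handling elided).  Note that committing only installs the new root; the
caller still has to invalidate and free the old btree blocks:

	struct xbtree_afakeroot	afake = { 0 };
	struct xfs_btree_cur	*cur;

	cur = xfs_allocbt_stage_cursor(mp, &afake, agno, XFS_BTNUM_BNO);
	error = xfs_btree_bload_compute_geometry(cur, &bbl, nr_records);
	/* ...reserve bbl.nr_blocks blocks for bbl.claim_block... */
	error = xfs_btree_bload(cur, &bbl, priv);
	xfs_allocbt_commit_staged_btree(cur, tp, agbp);
	xfs_btree_del_cursor(cur, 0);
	/* ...invalidate and free the old bnobt blocks... */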



* [PATCH 5/7] xfs: add support for inode btree staging cursors
  2020-03-12  3:45 [PATCH v4 0/7] xfs: btree bulk loading Darrick J. Wong
  2020-03-12  3:45 ` [PATCH 4/7] xfs: add support for free space btree staging cursors Darrick J. Wong
@ 2020-03-12  3:46 ` Darrick J. Wong
  2020-03-12  3:46 ` [PATCH 6/7] xfs: add support for refcount " Darrick J. Wong
  2020-03-12  3:46 ` [PATCH 7/7] xfs: add support for rmap " Darrick J. Wong
From: Darrick J. Wong @ 2020-03-12  3:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Add support for btree staging cursors for the inode btrees.  This
is needed both for online repair and also to convert xfs_repair to use
btree bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_ialloc_btree.c |   80 +++++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_ialloc_btree.h |    6 +++
 2 files changed, 75 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index e0e8570af023..d7cf74a68578 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -400,32 +400,27 @@ static const struct xfs_btree_ops xfs_finobt_ops = {
 };
 
 /*
- * Allocate a new inode btree cursor.
+ * Initialize a new inode btree cursor.
  */
-struct xfs_btree_cur *				/* new inode btree cursor */
-xfs_inobt_init_cursor(
+static struct xfs_btree_cur *
+xfs_inobt_init_common(
 	struct xfs_mount	*mp,		/* file system mount point */
 	struct xfs_trans	*tp,		/* transaction pointer */
-	struct xfs_buf		*agbp,		/* buffer for agi structure */
 	xfs_agnumber_t		agno,		/* allocation group number */
 	xfs_btnum_t		btnum)		/* ialloc or free ino btree */
 {
-	struct xfs_agi		*agi = agbp->b_addr;
 	struct xfs_btree_cur	*cur;
 
 	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
-
 	cur->bc_tp = tp;
 	cur->bc_mp = mp;
 	cur->bc_btnum = btnum;
 	if (btnum == XFS_BTNUM_INO) {
-		cur->bc_nlevels = be32_to_cpu(agi->agi_level);
-		cur->bc_ops = &xfs_inobt_ops;
 		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_ibt_2);
+		cur->bc_ops = &xfs_inobt_ops;
 	} else {
-		cur->bc_nlevels = be32_to_cpu(agi->agi_free_level);
-		cur->bc_ops = &xfs_finobt_ops;
 		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_fibt_2);
+		cur->bc_ops = &xfs_finobt_ops;
 	}
 
 	cur->bc_blocklog = mp->m_sb.sb_blocklog;
@@ -433,12 +428,75 @@ xfs_inobt_init_cursor(
 	if (xfs_sb_version_hascrc(&mp->m_sb))
 		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
 
-	cur->bc_ag.agbp = agbp;
 	cur->bc_ag.agno = agno;
+	return cur;
+}
 
+/* Create an inode btree cursor. */
+struct xfs_btree_cur *
+xfs_inobt_init_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_btnum_t		btnum)
+{
+	struct xfs_btree_cur	*cur;
+	struct xfs_agi		*agi = agbp->b_addr;
+
+	cur = xfs_inobt_init_common(mp, tp, agno, btnum);
+	if (btnum == XFS_BTNUM_INO)
+		cur->bc_nlevels = be32_to_cpu(agi->agi_level);
+	else
+		cur->bc_nlevels = be32_to_cpu(agi->agi_free_level);
+	cur->bc_ag.agbp = agbp;
 	return cur;
 }
 
+/* Create an inode btree cursor with a fake root for staging. */
+struct xfs_btree_cur *
+xfs_inobt_stage_cursor(
+	struct xfs_mount	*mp,
+	struct xbtree_afakeroot	*afake,
+	xfs_agnumber_t		agno,
+	xfs_btnum_t		btnum)
+{
+	struct xfs_btree_cur	*cur;
+
+	cur = xfs_inobt_init_common(mp, NULL, agno, btnum);
+	xfs_btree_stage_afakeroot(cur, afake, NULL);
+	return cur;
+}
+
+/*
+ * Install a new inobt btree root.  Caller is responsible for invalidating
+ * and freeing the old btree blocks.
+ */
+void
+xfs_inobt_commit_staged_btree(
+	struct xfs_btree_cur	*cur,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp)
+{
+	struct xfs_agi		*agi = agbp->b_addr;
+	struct xbtree_afakeroot	*afake = cur->bc_ag.afake;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	if (cur->bc_btnum == XFS_BTNUM_INO) {
+		agi->agi_root = cpu_to_be32(afake->af_root);
+		agi->agi_level = cpu_to_be32(afake->af_levels);
+		xfs_ialloc_log_agi(tp, agbp, XFS_AGI_ROOT | XFS_AGI_LEVEL);
+		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_inobt_ops);
+	} else {
+		agi->agi_free_root = cpu_to_be32(afake->af_root);
+		agi->agi_free_level = cpu_to_be32(afake->af_levels);
+		xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREE_ROOT |
+					     XFS_AGI_FREE_LEVEL);
+		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_finobt_ops);
+	}
+}
+
 /*
  * Calculate number of records in an inobt btree block.
  */
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index 951305ecaae1..35bbd978c272 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -48,6 +48,9 @@ struct xfs_mount;
 extern struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_mount *,
 		struct xfs_trans *, struct xfs_buf *, xfs_agnumber_t,
 		xfs_btnum_t);
+struct xfs_btree_cur *xfs_inobt_stage_cursor(struct xfs_mount *mp,
+		struct xbtree_afakeroot *afake, xfs_agnumber_t agno,
+		xfs_btnum_t btnum);
 extern int xfs_inobt_maxrecs(struct xfs_mount *, int, int);
 
 /* ir_holemask to inode allocation bitmap conversion */
@@ -68,4 +71,7 @@ int xfs_inobt_cur(struct xfs_mount *mp, struct xfs_trans *tp,
 		xfs_agnumber_t agno, xfs_btnum_t btnum,
 		struct xfs_btree_cur **curpp, struct xfs_buf **agi_bpp);
 
+void xfs_inobt_commit_staged_btree(struct xfs_btree_cur *cur,
+		struct xfs_trans *tp, struct xfs_buf *agbp);
+
 #endif	/* __XFS_IALLOC_BTREE_H__ */


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/7] xfs: add support for refcount btree staging cursors
  2020-03-12  3:45 [PATCH v4 0/7] xfs: btree bulk loading Darrick J. Wong
                   ` (4 preceding siblings ...)
  2020-03-12  3:46 ` [PATCH 5/7] xfs: add support for inode " Darrick J. Wong
@ 2020-03-12  3:46 ` Darrick J. Wong
  2020-03-12  3:46 ` [PATCH 7/7] xfs: add support for rmap " Darrick J. Wong
  6 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2020-03-12  3:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Add support for btree staging cursors for the refcount btrees.  This is
needed both for online repair and to convert xfs_repair to use btree
bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_refcount_btree.c |   69 +++++++++++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_refcount_btree.h |    6 +++
 2 files changed, 65 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index e07a2c45f8ec..0224b13733c8 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -311,41 +311,90 @@ static const struct xfs_btree_ops xfs_refcountbt_ops = {
 };
 
 /*
- * Allocate a new refcount btree cursor.
+ * Initialize a new refcount btree cursor.
  */
-struct xfs_btree_cur *
-xfs_refcountbt_init_cursor(
+static struct xfs_btree_cur *
+xfs_refcountbt_init_common(
 	struct xfs_mount	*mp,
 	struct xfs_trans	*tp,
-	struct xfs_buf		*agbp,
 	xfs_agnumber_t		agno)
 {
-	struct xfs_agf		*agf = agbp->b_addr;
 	struct xfs_btree_cur	*cur;
 
 	ASSERT(agno != NULLAGNUMBER);
 	ASSERT(agno < mp->m_sb.sb_agcount);
-	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
 
+	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
 	cur->bc_tp = tp;
 	cur->bc_mp = mp;
 	cur->bc_btnum = XFS_BTNUM_REFC;
 	cur->bc_blocklog = mp->m_sb.sb_blocklog;
-	cur->bc_ops = &xfs_refcountbt_ops;
 	cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_refcbt_2);
 
-	cur->bc_nlevels = be32_to_cpu(agf->agf_refcount_level);
-
-	cur->bc_ag.agbp = agbp;
 	cur->bc_ag.agno = agno;
 	cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
 
 	cur->bc_ag.refc.nr_ops = 0;
 	cur->bc_ag.refc.shape_changes = 0;
+	cur->bc_ops = &xfs_refcountbt_ops;
+	return cur;
+}
+
+/* Create a btree cursor. */
+struct xfs_btree_cur *
+xfs_refcountbt_init_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_agf		*agf = agbp->b_addr;
+	struct xfs_btree_cur	*cur;
+
+	cur = xfs_refcountbt_init_common(mp, tp, agno);
+	cur->bc_nlevels = be32_to_cpu(agf->agf_refcount_level);
+	cur->bc_ag.agbp = agbp;
+	return cur;
+}
+
+/* Create a btree cursor with a fake root for staging. */
+struct xfs_btree_cur *
+xfs_refcountbt_stage_cursor(
+	struct xfs_mount	*mp,
+	struct xbtree_afakeroot	*afake,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_btree_cur	*cur;
 
+	cur = xfs_refcountbt_init_common(mp, NULL, agno);
+	xfs_btree_stage_afakeroot(cur, afake, NULL);
 	return cur;
 }
 
+/*
+ * Swap in the new btree root.  Once we pass this point the newly rebuilt btree
+ * is in place and we have to kill off all the old btree blocks.
+ */
+void
+xfs_refcountbt_commit_staged_btree(
+	struct xfs_btree_cur	*cur,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp)
+{
+	struct xfs_agf		*agf = agbp->b_addr;
+	struct xbtree_afakeroot	*afake = cur->bc_ag.afake;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	agf->agf_refcount_root = cpu_to_be32(afake->af_root);
+	agf->agf_refcount_level = cpu_to_be32(afake->af_levels);
+	agf->agf_refcount_blocks = cpu_to_be32(afake->af_blocks);
+	xfs_alloc_log_agf(tp, agbp, XFS_AGF_REFCOUNT_BLOCKS |
+				    XFS_AGF_REFCOUNT_ROOT |
+				    XFS_AGF_REFCOUNT_LEVEL);
+	xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_refcountbt_ops);
+}
+
 /*
  * Calculate the number of records in a refcount btree block.
  */
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h
index ba416f71c824..69dc515db671 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.h
+++ b/fs/xfs/libxfs/xfs_refcount_btree.h
@@ -13,6 +13,7 @@
 struct xfs_buf;
 struct xfs_btree_cur;
 struct xfs_mount;
+struct xbtree_afakeroot;
 
 /*
  * Btree block header size
@@ -46,6 +47,8 @@ struct xfs_mount;
 extern struct xfs_btree_cur *xfs_refcountbt_init_cursor(struct xfs_mount *mp,
 		struct xfs_trans *tp, struct xfs_buf *agbp,
 		xfs_agnumber_t agno);
+struct xfs_btree_cur *xfs_refcountbt_stage_cursor(struct xfs_mount *mp,
+		struct xbtree_afakeroot *afake, xfs_agnumber_t agno);
 extern int xfs_refcountbt_maxrecs(int blocklen, bool leaf);
 extern void xfs_refcountbt_compute_maxlevels(struct xfs_mount *mp);
 
@@ -58,4 +61,7 @@ extern int xfs_refcountbt_calc_reserves(struct xfs_mount *mp,
 		struct xfs_trans *tp, xfs_agnumber_t agno, xfs_extlen_t *ask,
 		xfs_extlen_t *used);
 
+void xfs_refcountbt_commit_staged_btree(struct xfs_btree_cur *cur,
+		struct xfs_trans *tp, struct xfs_buf *agbp);
+
 #endif	/* __XFS_REFCOUNT_BTREE_H__ */


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 7/7] xfs: add support for rmap btree staging cursors
  2020-03-12  3:45 [PATCH v4 0/7] xfs: btree bulk loading Darrick J. Wong
                   ` (5 preceding siblings ...)
  2020-03-12  3:46 ` [PATCH 6/7] xfs: add support for refcount " Darrick J. Wong
@ 2020-03-12  3:46 ` Darrick J. Wong
  6 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2020-03-12  3:46 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Add support for btree staging cursors for the rmap btrees.  This is
needed both for online repair and to convert xfs_repair to use btree
bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_rmap_btree.c |   66 ++++++++++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_rmap_btree.h |    5 +++
 2 files changed, 61 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index af7e4966416f..a33543e1d434 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -448,17 +448,12 @@ static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.recs_inorder		= xfs_rmapbt_recs_inorder,
 };
 
-/*
- * Allocate a new allocation btree cursor.
- */
-struct xfs_btree_cur *
-xfs_rmapbt_init_cursor(
+static struct xfs_btree_cur *
+xfs_rmapbt_init_common(
 	struct xfs_mount	*mp,
 	struct xfs_trans	*tp,
-	struct xfs_buf		*agbp,
 	xfs_agnumber_t		agno)
 {
-	struct xfs_agf		*agf = agbp->b_addr;
 	struct xfs_btree_cur	*cur;
 
 	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
@@ -468,16 +463,67 @@ xfs_rmapbt_init_cursor(
 	cur->bc_btnum = XFS_BTNUM_RMAP;
 	cur->bc_flags = XFS_BTREE_CRC_BLOCKS | XFS_BTREE_OVERLAPPING;
 	cur->bc_blocklog = mp->m_sb.sb_blocklog;
-	cur->bc_ops = &xfs_rmapbt_ops;
-	cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
 	cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_rmap_2);
+	cur->bc_ag.agno = agno;
+	cur->bc_ops = &xfs_rmapbt_ops;
 
+	return cur;
+}
+
+/* Create a new reverse mapping btree cursor. */
+struct xfs_btree_cur *
+xfs_rmapbt_init_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_agf		*agf = agbp->b_addr;
+	struct xfs_btree_cur	*cur;
+
+	cur = xfs_rmapbt_init_common(mp, tp, agno);
+	cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
 	cur->bc_ag.agbp = agbp;
-	cur->bc_ag.agno = agno;
+	return cur;
+}
+
+/* Create a new reverse mapping btree cursor with a fake root for staging. */
+struct xfs_btree_cur *
+xfs_rmapbt_stage_cursor(
+	struct xfs_mount	*mp,
+	struct xbtree_afakeroot	*afake,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_btree_cur	*cur;
 
+	cur = xfs_rmapbt_init_common(mp, NULL, agno);
+	xfs_btree_stage_afakeroot(cur, afake, NULL);
 	return cur;
 }
 
+/*
+ * Install a new reverse mapping btree root.  Caller is responsible for
+ * invalidating and freeing the old btree blocks.
+ */
+void
+xfs_rmapbt_commit_staged_btree(
+	struct xfs_btree_cur	*cur,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp)
+{
+	struct xfs_agf		*agf = agbp->b_addr;
+	struct xbtree_afakeroot	*afake = cur->bc_ag.afake;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	agf->agf_roots[cur->bc_btnum] = cpu_to_be32(afake->af_root);
+	agf->agf_levels[cur->bc_btnum] = cpu_to_be32(afake->af_levels);
+	agf->agf_rmap_blocks = cpu_to_be32(afake->af_blocks);
+	xfs_alloc_log_agf(tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS |
+				    XFS_AGF_RMAP_BLOCKS);
+	xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_rmapbt_ops);
+}
+
 /*
  * Calculate number of records in an rmap btree block.
  */
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index 820d668b063d..115c3455a734 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -9,6 +9,7 @@
 struct xfs_buf;
 struct xfs_btree_cur;
 struct xfs_mount;
+struct xbtree_afakeroot;
 
 /* rmaps only exist on crc enabled filesystems */
 #define XFS_RMAP_BLOCK_LEN	XFS_BTREE_SBLOCK_CRC_LEN
@@ -43,6 +44,10 @@ struct xfs_mount;
 struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
 				struct xfs_trans *tp, struct xfs_buf *bp,
 				xfs_agnumber_t agno);
+struct xfs_btree_cur *xfs_rmapbt_stage_cursor(struct xfs_mount *mp,
+		struct xbtree_afakeroot *afake, xfs_agnumber_t agno);
+void xfs_rmapbt_commit_staged_btree(struct xfs_btree_cur *cur,
+		struct xfs_trans *tp, struct xfs_buf *agbp);
 int xfs_rmapbt_maxrecs(int blocklen, int leaf);
 extern void xfs_rmapbt_compute_maxlevels(struct xfs_mount *mp);
 


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] xfs: introduce fake roots for ag-rooted btrees
  2020-03-12  3:45 ` [PATCH 1/7] xfs: introduce fake roots for ag-rooted btrees Darrick J. Wong
@ 2020-03-13 14:47   ` Brian Foster
  2020-03-13 16:30     ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2020-03-13 14:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Mar 11, 2020 at 08:45:37PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create an in-core fake root for AG-rooted btree types so that callers
> can generate a whole new btree using the upcoming btree bulk load
> function without making the new tree accessible from the rest of the
> filesystem.  It is up to the individual btree type to provide a function
> to create a staged cursor (presumably with the appropriate callouts to
> update the fakeroot) and then commit the staged root back into the
> filesystem.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_btree.c |  168 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_btree.h |   30 ++++++++
>  fs/xfs/xfs_trace.h        |   28 ++++++++
>  3 files changed, 225 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 4ef9f0b42c7f..085bc070e804 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
...
> @@ -4908,3 +4910,169 @@ xfs_btree_has_more_records(
>  	else
>  		return block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK);
>  }
> +
...
> +/*
> + * Initialize an AG-rooted btree cursor with the given AG btree fake root.  The
> + * btree cursor's bc_ops will be overridden as needed to make the staging
> + * functionality work.  If new_ops is not NULL, these new ops will be passed
> + * out to the caller for further overriding.
> + */
> +void
> +xfs_btree_stage_afakeroot(
> +	struct xfs_btree_cur		*cur,
> +	struct xbtree_afakeroot		*afake,
> +	struct xfs_btree_ops		**new_ops)
> +{
> +	struct xfs_btree_ops		*nops;
> +
> +	ASSERT(!(cur->bc_flags & XFS_BTREE_STAGING));
> +	ASSERT(!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE));
> +	ASSERT(cur->bc_tp == NULL);
> +
> +	nops = kmem_alloc(sizeof(struct xfs_btree_ops), KM_NOFS);
> +	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
> +	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
> +	nops->free_block = xfs_btree_fakeroot_free_block;
> +	nops->init_ptr_from_cur = xfs_btree_fakeroot_init_ptr_from_cur;
> +	nops->set_root = xfs_btree_afakeroot_set_root;
> +	nops->dup_cursor = xfs_btree_fakeroot_dup_cursor;
> +
> +	cur->bc_ag.afake = afake;
> +	cur->bc_nlevels = afake->af_levels;
> +	cur->bc_ops = nops;
> +	cur->bc_flags |= XFS_BTREE_STAGING;
> +
> +	if (new_ops)
> +		*new_ops = nops;

Curious why we have new_ops if the caller unconditionally assigns
->bc_ops to the same value..? That aside:

Reviewed-by: Brian Foster <bfoster@redhat.com> 

> +}
> +
> +/*
> + * Transform an AG-rooted staging btree cursor back into a regular cursor by
> + * substituting a real btree root for the fake one and restoring normal btree
> + * cursor ops.  The caller must log the btree root change prior to calling
> + * this.
> + */
> +void
> +xfs_btree_commit_afakeroot(
> +	struct xfs_btree_cur		*cur,
> +	struct xfs_trans		*tp,
> +	struct xfs_buf			*agbp,
> +	const struct xfs_btree_ops	*ops)
> +{
> +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> +	ASSERT(cur->bc_tp == NULL);
> +
> +	trace_xfs_btree_commit_afakeroot(cur);
> +
> +	kmem_free((void *)cur->bc_ops);
> +	cur->bc_ag.agbp = agbp;
> +	cur->bc_ops = ops;
> +	cur->bc_flags &= ~XFS_BTREE_STAGING;
> +	cur->bc_tp = tp;
> +}
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index 0d10bbd5223a..aa4a7bd40023 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -179,7 +179,10 @@ union xfs_btree_irec {
>  
>  /* Per-AG btree information. */
>  struct xfs_btree_cur_ag {
> -	struct xfs_buf		*agbp;
> +	union {
> +		struct xfs_buf		*agbp;
> +		struct xbtree_afakeroot	*afake;	/* fake ag header root */
> +	};
>  	xfs_agnumber_t		agno;
>  	union {
>  		struct {
> @@ -235,6 +238,12 @@ typedef struct xfs_btree_cur
>  #define XFS_BTREE_LASTREC_UPDATE	(1<<2)	/* track last rec externally */
>  #define XFS_BTREE_CRC_BLOCKS		(1<<3)	/* uses extended btree blocks */
>  #define XFS_BTREE_OVERLAPPING		(1<<4)	/* overlapping intervals */
> +/*
> + * The root of this btree is a fakeroot structure so that we can stage a btree
> + * rebuild without leaving it accessible via primary metadata.  The ops struct
> + * is dynamically allocated and must be freed when the cursor is deleted.
> + */
> +#define XFS_BTREE_STAGING		(1<<5)
>  
>  
>  #define	XFS_BTREE_NOERROR	0
> @@ -515,4 +524,23 @@ xfs_btree_islastblock(
>  	return block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK);
>  }
>  
> +/* Fake root for an AG-rooted btree. */
> +struct xbtree_afakeroot {
> +	/* AG block number of the new btree root. */
> +	xfs_agblock_t		af_root;
> +
> +	/* Height of the new btree. */
> +	unsigned int		af_levels;
> +
> +	/* Number of blocks used by the btree. */
> +	unsigned int		af_blocks;
> +};
> +
> +/* Cursor interactions with fake roots for AG-rooted btrees. */
> +void xfs_btree_stage_afakeroot(struct xfs_btree_cur *cur,
> +		struct xbtree_afakeroot *afake,
> +		struct xfs_btree_ops **new_ops);
> +void xfs_btree_commit_afakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
> +		struct xfs_buf *agbp, const struct xfs_btree_ops *ops);
> +
>  #endif	/* __XFS_BTREE_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 059c3098a4a0..d8c229492973 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3605,6 +3605,34 @@ TRACE_EVENT(xfs_check_new_dalign,
>  		  __entry->calc_rootino)
>  )
>  
> +TRACE_EVENT(xfs_btree_commit_afakeroot,
> +	TP_PROTO(struct xfs_btree_cur *cur),
> +	TP_ARGS(cur),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_btnum_t, btnum)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agblock_t, agbno)
> +		__field(unsigned int, levels)
> +		__field(unsigned int, blocks)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = cur->bc_mp->m_super->s_dev;
> +		__entry->btnum = cur->bc_btnum;
> +		__entry->agno = cur->bc_ag.agno;
> +		__entry->agbno = cur->bc_ag.afake->af_root;
> +		__entry->levels = cur->bc_ag.afake->af_levels;
> +		__entry->blocks = cur->bc_ag.afake->af_blocks;
> +	),
> +	TP_printk("dev %d:%d btree %s ag %u levels %u blocks %u root %u",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> +		  __entry->agno,
> +		  __entry->levels,
> +		  __entry->blocks,
> +		  __entry->agbno)
> +)
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/7] xfs: introduce fake roots for inode-rooted btrees
  2020-03-12  3:45 ` [PATCH 2/7] xfs: introduce fake roots for inode-rooted btrees Darrick J. Wong
@ 2020-03-13 14:47   ` Brian Foster
  2020-03-13 16:32     ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2020-03-13 14:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Mar 11, 2020 at 08:45:43PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Create an in-core fake root for inode-rooted btree types so that callers
> can generate a whole new btree using the upcoming btree bulk load
> function without making the new tree accessible from the rest of the
> filesystem.  It is up to the individual btree type to provide a function
> to create a staged cursor (presumably with the appropriate callouts to
> update the fakeroot) and then commit the staged root back into the
> filesystem.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Same question as the previous patch, but otherwise looks Ok to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_btree.c |  111 +++++++++++++++++++++++++++++++++++++++++++--
>  fs/xfs/libxfs/xfs_btree.h |   31 +++++++++++++
>  fs/xfs/xfs_trace.h        |   33 +++++++++++++
>  3 files changed, 171 insertions(+), 4 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 085bc070e804..4e1d4f184d4b 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -644,6 +644,17 @@ xfs_btree_ptr_addr(
>  		((char *)block + xfs_btree_ptr_offset(cur, n, level));
>  }
>  
> +struct xfs_ifork *
> +xfs_btree_ifork_ptr(
> +	struct xfs_btree_cur	*cur)
> +{
> +	ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
> +
> +	if (cur->bc_flags & XFS_BTREE_STAGING)
> +		return cur->bc_ino.ifake->if_fork;
> +	return XFS_IFORK_PTR(cur->bc_ino.ip, cur->bc_ino.whichfork);
> +}
> +
>  /*
>   * Get the root block which is stored in the inode.
>   *
> @@ -654,9 +665,8 @@ STATIC struct xfs_btree_block *
>  xfs_btree_get_iroot(
>  	struct xfs_btree_cur	*cur)
>  {
> -	struct xfs_ifork	*ifp;
> +	struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
>  
> -	ifp = XFS_IFORK_PTR(cur->bc_ino.ip, cur->bc_ino.whichfork);
>  	return (struct xfs_btree_block *)ifp->if_broot;
>  }
>  
> @@ -4985,8 +4995,17 @@ xfs_btree_fakeroot_init_ptr_from_cur(
>  
>  	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
>  
> -	afake = cur->bc_ag.afake;
> -	ptr->s = cpu_to_be32(afake->af_root);
> +	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
> +		/*
> +		 * The root block lives in the inode core, so we zero the
> +		 * pointer (like the bmbt code does) to make it obvious if
> +		 * anyone ever tries to use this pointer.
> +		 */
> +		ptr->l = cpu_to_be64(0);
> +	} else {
> +		afake = cur->bc_ag.afake;
> +		ptr->s = cpu_to_be32(afake->af_root);
> +	}
>  }
>  
>  /*
> @@ -5076,3 +5095,87 @@ xfs_btree_commit_afakeroot(
>  	cur->bc_flags &= ~XFS_BTREE_STAGING;
>  	cur->bc_tp = tp;
>  }
> +
> +/*
> + * Bulk Loading for Inode-Rooted Btrees
> + * ====================================
> + *
> + * For a btree rooted in an inode fork, pass a xbtree_ifakeroot structure to
> + * the staging cursor.  This structure should be initialized as follows:
> + *
> + * - if_fork_size field should be set to the number of bytes available to the
> + *   fork in the inode.
> + *
> + * - if_fork should point to a freshly allocated struct xfs_ifork.
> + *
> + * - if_format should be set to the appropriate fork type (e.g.
> + *   XFS_DINODE_FMT_BTREE).
> + *
> + * All other fields must be zero.
> + *
> + * The _stage_cursor() function for a specific btree type should call
> + * xfs_btree_stage_ifakeroot to set up the in-memory cursor as a staging
> + * cursor.  The corresponding _commit_staged_btree() function should log the
> + * new root and call xfs_btree_commit_ifakeroot() to transform the staging
> + * cursor into a regular btree cursor.
> + */
> +
> +/*
> + * Initialize an inode-rooted btree cursor with the given inode btree fake
> + * root.  The btree cursor's bc_ops will be overridden as needed to make the
> + * staging functionality work.  If new_ops is not NULL, these new ops will be
> + * passed out to the caller for further overriding.
> + */
> +void
> +xfs_btree_stage_ifakeroot(
> +	struct xfs_btree_cur		*cur,
> +	struct xbtree_ifakeroot		*ifake,
> +	struct xfs_btree_ops		**new_ops)
> +{
> +	struct xfs_btree_ops		*nops;
> +
> +	ASSERT(!(cur->bc_flags & XFS_BTREE_STAGING));
> +	ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
> +	ASSERT(cur->bc_tp == NULL);
> +
> +	nops = kmem_alloc(sizeof(struct xfs_btree_ops), KM_NOFS);
> +	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
> +	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
> +	nops->free_block = xfs_btree_fakeroot_free_block;
> +	nops->init_ptr_from_cur = xfs_btree_fakeroot_init_ptr_from_cur;
> +	nops->dup_cursor = xfs_btree_fakeroot_dup_cursor;
> +
> +	cur->bc_ino.ifake = ifake;
> +	cur->bc_nlevels = ifake->if_levels;
> +	cur->bc_ops = nops;
> +	cur->bc_flags |= XFS_BTREE_STAGING;
> +
> +	if (new_ops)
> +		*new_ops = nops;
> +}
> +
> +/*
> + * Transform an inode-rooted staging btree cursor back into a regular cursor by
> + * substituting a real btree root for the fake one and restoring normal btree
> + * cursor ops.  The caller must log the btree root change prior to calling
> + * this.
> + */
> +void
> +xfs_btree_commit_ifakeroot(
> +	struct xfs_btree_cur		*cur,
> +	struct xfs_trans		*tp,
> +	int				whichfork,
> +	const struct xfs_btree_ops	*ops)
> +{
> +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> +	ASSERT(cur->bc_tp == NULL);
> +
> +	trace_xfs_btree_commit_ifakeroot(cur);
> +
> +	kmem_free((void *)cur->bc_ops);
> +	cur->bc_ino.ifake = NULL;
> +	cur->bc_ino.whichfork = whichfork;
> +	cur->bc_ops = ops;
> +	cur->bc_flags &= ~XFS_BTREE_STAGING;
> +	cur->bc_tp = tp;
> +}
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index aa4a7bd40023..047067f52063 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -10,6 +10,7 @@ struct xfs_buf;
>  struct xfs_inode;
>  struct xfs_mount;
>  struct xfs_trans;
> +struct xfs_ifork;
>  
>  extern kmem_zone_t	*xfs_btree_cur_zone;
>  
> @@ -198,6 +199,7 @@ struct xfs_btree_cur_ag {
>  /* Btree-in-inode cursor information */
>  struct xfs_btree_cur_ino {
>  	struct xfs_inode	*ip;
> +	struct xbtree_ifakeroot	*ifake;		/* fake inode fork */
>  	int			allocated;
>  	short			forksize;
>  	char			whichfork;
> @@ -506,6 +508,7 @@ union xfs_btree_key *xfs_btree_high_key_from_key(struct xfs_btree_cur *cur,
>  int xfs_btree_has_record(struct xfs_btree_cur *cur, union xfs_btree_irec *low,
>  		union xfs_btree_irec *high, bool *exists);
>  bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
> +struct xfs_ifork *xfs_btree_ifork_ptr(struct xfs_btree_cur *cur);
>  
>  /* Does this cursor point to the last block in the given level? */
>  static inline bool
> @@ -543,4 +546,32 @@ void xfs_btree_stage_afakeroot(struct xfs_btree_cur *cur,
>  void xfs_btree_commit_afakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
>  		struct xfs_buf *agbp, const struct xfs_btree_ops *ops);
>  
> +/* Fake root for an inode-rooted btree. */
> +struct xbtree_ifakeroot {
> +	/* Fake inode fork. */
> +	struct xfs_ifork	*if_fork;
> +
> +	/* Number of blocks used by the btree. */
> +	int64_t			if_blocks;
> +
> +	/* Height of the new btree. */
> +	unsigned int		if_levels;
> +
> +	/* Number of bytes available for this fork in the inode. */
> +	unsigned int		if_fork_size;
> +
> +	/* Fork format. */
> +	unsigned int		if_format;
> +
> +	/* Number of records. */
> +	unsigned int		if_extents;
> +};
> +
> +/* Cursor interactions with fake roots for inode-rooted btrees. */
> +void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur,
> +		struct xbtree_ifakeroot *ifake,
> +		struct xfs_btree_ops **new_ops);
> +void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
> +		int whichfork, const struct xfs_btree_ops *ops);
> +
>  #endif	/* __XFS_BTREE_H__ */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index d8c229492973..05db0398f040 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3633,6 +3633,39 @@ TRACE_EVENT(xfs_btree_commit_afakeroot,
>  		  __entry->agbno)
>  )
>  
> +TRACE_EVENT(xfs_btree_commit_ifakeroot,
> +	TP_PROTO(struct xfs_btree_cur *cur),
> +	TP_ARGS(cur),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_btnum_t, btnum)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agino_t, agino)
> +		__field(unsigned int, levels)
> +		__field(unsigned int, blocks)
> +		__field(int, whichfork)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = cur->bc_mp->m_super->s_dev;
> +		__entry->btnum = cur->bc_btnum;
> +		__entry->agno = XFS_INO_TO_AGNO(cur->bc_mp,
> +					cur->bc_ino.ip->i_ino);
> +		__entry->agino = XFS_INO_TO_AGINO(cur->bc_mp,
> +					cur->bc_ino.ip->i_ino);
> +		__entry->levels = cur->bc_ino.ifake->if_levels;
> +		__entry->blocks = cur->bc_ino.ifake->if_blocks;
> +		__entry->whichfork = cur->bc_ino.whichfork;
> +	),
> +	TP_printk("dev %d:%d btree %s ag %u agino %u whichfork %s levels %u blocks %u",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> +		  __entry->agno,
> +		  __entry->agino,
> +		  __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
> +		  __entry->levels,
> +		  __entry->blocks)
> +)
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/7] xfs: support bulk loading of staged btrees
  2020-03-12  3:45 ` [PATCH 3/7] xfs: support bulk loading of staged btrees Darrick J. Wong
@ 2020-03-13 14:49   ` Brian Foster
  2020-03-13 16:28     ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2020-03-13 14:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Mar 11, 2020 at 08:45:49PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Add a new btree function that enables us to bulk load a btree cursor.
> This will be used by the upcoming online repair patches to generate new
> btrees.  This avoids the programmatic inefficiency of calling
> xfs_btree_insert in a loop (which generates a lot of log traffic) in
> favor of stamping out new btree blocks with ordered buffers, and then
> committing both the new root and scheduling the removal of the old btree
> blocks in a single transaction commit.
> 
> The design of this new generic code is based off the btree rebuilding
> code in xfs_repair's phase 5 code, with the explicit goal of enabling us
> to share that code between scrub and repair.  It has the additional
> feature of being able to control btree block loading factors.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

The code mostly looks fine to me. A few nits around comments and such
below. With those fixed up:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_btree.c |  604 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_btree.h |   68 +++++
>  fs/xfs/xfs_trace.c        |    1 
>  fs/xfs/xfs_trace.h        |   85 ++++++
>  4 files changed, 757 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index 4e1d4f184d4b..d579d8e99046 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -1324,7 +1324,7 @@ STATIC void
>  xfs_btree_copy_ptrs(
>  	struct xfs_btree_cur	*cur,
>  	union xfs_btree_ptr	*dst_ptr,
> -	union xfs_btree_ptr	*src_ptr,
> +	const union xfs_btree_ptr *src_ptr,
>  	int			numptrs)
>  {
>  	ASSERT(numptrs >= 0);
> @@ -5179,3 +5179,605 @@ xfs_btree_commit_ifakeroot(
>  	cur->bc_flags &= ~XFS_BTREE_STAGING;
>  	cur->bc_tp = tp;
>  }
> +
...
> +/*
> + * Put a btree block that we're loading onto the ordered list and release it.
> + * The btree blocks will be written to disk when bulk loading is finished.
> + */
> +static void
> +xfs_btree_bload_drop_buf(
> +	struct list_head	*buffers_list,
> +	struct xfs_buf		**bpp)
> +{
> +	if (*bpp == NULL)
> +		return;
> +
> +	xfs_buf_delwri_queue(*bpp, buffers_list);

Might want to do something like the following here, given there is no
error path:

	if (!xfs_buf_delwri_queue(...))
		ASSERT(0);

> +	xfs_buf_relse(*bpp);
> +	*bpp = NULL;
> +}
> +
> +/*
> + * Allocate and initialize one btree block for bulk loading.
> + *
> + * The new btree block will have its level and numrecs fields set to the values
> + * of the level and nr_this_block parameters, respectively.  On exit, ptrp,
> + * bpp, and blockp will all point to the new block.
> + */
> +STATIC int
> +xfs_btree_bload_prep_block(
> +	struct xfs_btree_cur		*cur,
> +	struct xfs_btree_bload		*bbl,
> +	unsigned int			level,
> +	unsigned int			nr_this_block,
> +	union xfs_btree_ptr		*ptrp,
> +	struct xfs_buf			**bpp,
> +	struct xfs_btree_block		**blockp,
> +	void				*priv)

The header comment doesn't mention that ptrp and blockp are input values
as well. I'd expect inline comments for the parameters that have
non-obvious uses. Something like the following for example:

	union xfs_btree_ptr		*ptrp,	/* in: prev ptr, out: current */
	...
	struct xfs_btree_block		**blockp, /* in: prev block, out: current */

> +{
> +	union xfs_btree_ptr		new_ptr;
> +	struct xfs_buf			*new_bp;
> +	struct xfs_btree_block		*new_block;
> +	int				ret;
> +
> +	ASSERT(*bpp == NULL);
> +
> +	if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
> +	    level == cur->bc_nlevels - 1) {
> +		struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
> +		size_t			new_size;
> +
> +		/* Allocate a new incore btree root block. */
> +		new_size = bbl->iroot_size(cur, nr_this_block, priv);
> +		ifp->if_broot = kmem_zalloc(new_size, 0);
> +		ifp->if_broot_bytes = (int)new_size;
> +		ifp->if_flags |= XFS_IFBROOT;
> +
> +		/* Initialize it and send it out. */
> +		xfs_btree_init_block_int(cur->bc_mp, ifp->if_broot,
> +				XFS_BUF_DADDR_NULL, cur->bc_btnum, level,
> +				nr_this_block, cur->bc_ino.ip->i_ino,
> +				cur->bc_flags);
> +
> +		*bpp = NULL;
> +		*blockp = ifp->if_broot;
> +		xfs_btree_set_ptr_null(cur, ptrp);
> +		return 0;
> +	}
> +
> +	/* Claim one of the caller's preallocated blocks. */
> +	xfs_btree_set_ptr_null(cur, &new_ptr);
> +	ret = bbl->claim_block(cur, &new_ptr, priv);
> +	if (ret)
> +		return ret;
> +
> +	ASSERT(!xfs_btree_ptr_is_null(cur, &new_ptr));
> +
> +	ret = xfs_btree_get_buf_block(cur, &new_ptr, &new_block, &new_bp);
> +	if (ret)
> +		return ret;
> +
> +	/* Initialize the btree block. */
> +	xfs_btree_init_block_cur(cur, new_bp, level, nr_this_block);
> +	if (*blockp)
> +		xfs_btree_set_sibling(cur, *blockp, &new_ptr, XFS_BB_RIGHTSIB);
> +	xfs_btree_set_sibling(cur, new_block, ptrp, XFS_BB_LEFTSIB);
> +
> +	/* Set the out parameters. */
> +	*bpp = new_bp;
> +	*blockp = new_block;
> +	xfs_btree_copy_ptrs(cur, ptrp, &new_ptr, 1);
> +	return 0;
> +}
...
> +/*
> + * Prepare a btree cursor for a bulk load operation by computing the geometry
> + * fields in bbl.  Caller must ensure that the btree cursor is a staging
> + * cursor.  This function can be called multiple times.
> + */
> +int
> +xfs_btree_bload_compute_geometry(
> +	struct xfs_btree_cur	*cur,
> +	struct xfs_btree_bload	*bbl,
> +	uint64_t		nr_records)
> +{
> +	uint64_t		nr_blocks = 0;
> +	uint64_t		nr_this_level;
> +
> +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> +
> +	/*
> +	 * Make sure that the slack values make sense for btree blocks that are
> +	 * full disk blocks.  We do this by setting the btree nlevels to 3,
> +	 * because inode-rooted btrees will return different minrecs/maxrecs
> +	 * values for the root block.  Note that slack settings are not applied
> +	 * to inode roots.
> +	 */
> +	cur->bc_nlevels = 3;

I still find the wording of the comment a little confusing...

"Make sure the slack values make sense for leaf and node blocks.
Inode-rooted btrees return different geometry for the root block (when
->bc_nlevels == level - 1). We're checking levels 0 and 1 here, so set
->bc_nlevels such that btree code doesn't interpret either as the root
level."

BTW.. I also wonder if just setting XFS_BTREE_MAXLEVELS-1 would be more
clear than 3?

> +	xfs_btree_bload_ensure_slack(cur, &bbl->leaf_slack, 0);
> +	xfs_btree_bload_ensure_slack(cur, &bbl->node_slack, 1);
> +
> +	bbl->nr_records = nr_this_level = nr_records;
> +	for (cur->bc_nlevels = 1; cur->bc_nlevels < XFS_BTREE_MAXLEVELS;) {
> +		uint64_t	level_blocks;
> +		uint64_t	dontcare64;
> +		unsigned int	level = cur->bc_nlevels - 1;
> +		unsigned int	avg_per_block;
> +
> +		xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> +				&avg_per_block, &level_blocks, &dontcare64);
> +
> +		if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
> +			/*
> +			 * If all the items we want to store at this level
> +			 * would fit in the inode root block, then we have our
> +			 * btree root and are done.
> +			 *
> +			 * Note that bmap btrees forbid records in the root.
> +			 */
> +			if (level != 0 && nr_this_level <= avg_per_block) {
> +				nr_blocks++;
> +				break;
> +			}
> +
> +			/*
> +			 * Otherwise, we have to store all the items for this
> +			 * level in traditional btree blocks and therefore need
> +			 * another level of btree to point to those blocks.
> +			 *
> +			 * We have to re-compute the geometry for each level of
> +			 * an inode-rooted btree because the geometry differs
> +			 * between a btree root in an inode fork and a
> +			 * traditional btree block.
> +			 *
> +			 * This distinction is made in the btree code based on
> +			 * whether level == bc_nlevels - 1.  Based on the
> +			 * previous root block size check against the root
> +			 * block geometry, we know that we aren't yet ready to
> +			 * populate the root.  Increment bc_nlevels and
> +			 * recalculate the geometry for a traditional
> +			 * block-based btree level.
> +			 */
> +			cur->bc_nlevels++;
> +			xfs_btree_bload_level_geometry(cur, bbl, level,
> +					nr_this_level, &avg_per_block,
> +					&level_blocks, &dontcare64);
> +		} else {
> +			/*
> +			 * If all the items we want to store at this level
> +			 * would fit in a single root block, we're done.
> +			 */
> +			if (nr_this_level <= avg_per_block) {
> +				nr_blocks++;
> +				break;
> +			}
> +
> +			/* Otherwise, we need another level of btree. */
> +			cur->bc_nlevels++;
> +		}
> +
> +		nr_blocks += level_blocks;
> +		nr_this_level = level_blocks;
> +	}
> +
> +	if (cur->bc_nlevels == XFS_BTREE_MAXLEVELS)
> +		return -EOVERFLOW;
> +
> +	bbl->btree_height = cur->bc_nlevels;
> +	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> +		bbl->nr_blocks = nr_blocks - 1;
> +	else
> +		bbl->nr_blocks = nr_blocks;
> +	return 0;
> +}
> +
> +/* Bulk load a btree given the parameters and geometry established in bbl. */
> +int
> +xfs_btree_bload(
> +	struct xfs_btree_cur		*cur,
> +	struct xfs_btree_bload		*bbl,
> +	void				*priv)
> +{
> +	struct list_head		buffers_list;
> +	union xfs_btree_ptr		child_ptr;
> +	union xfs_btree_ptr		ptr;
> +	struct xfs_buf			*bp = NULL;
> +	struct xfs_btree_block		*block = NULL;
> +	uint64_t			nr_this_level = bbl->nr_records;
> +	uint64_t			blocks;
> +	uint64_t			i;
> +	uint64_t			blocks_with_extra;
> +	uint64_t			total_blocks = 0;
> +	unsigned int			avg_per_block;
> +	unsigned int			level = 0;
> +	int				ret;
> +
> +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> +
> +	INIT_LIST_HEAD(&buffers_list);
> +	cur->bc_nlevels = bbl->btree_height;
> +	xfs_btree_set_ptr_null(cur, &child_ptr);
> +	xfs_btree_set_ptr_null(cur, &ptr);
> +
> +	xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> +			&avg_per_block, &blocks, &blocks_with_extra);
> +
> +	/* Load each leaf block. */
> +	for (i = 0; i < blocks; i++) {
> +		unsigned int		nr_this_block = avg_per_block;
> +
> +		if (i < blocks_with_extra)
> +			nr_this_block++;

The blocks_with_extra thing kind of confused me until I made it through
the related functions. A brief comment would be helpful here, just to
explain what's going on in the high-level context. I.e.:

"btree blocks will not be evenly populated in most cases.
blocks_with_extra tells us how many blocks get an extra record to evenly
distribute the excess across the current level (e.g. 100 records over 7
blocks gives avg_per_block = 14 and blocks_with_extra = 2, so the first
two blocks hold 15 records each)."

Brian

> +
> +		xfs_btree_bload_drop_buf(&buffers_list, &bp);
> +
> +		ret = xfs_btree_bload_prep_block(cur, bbl, level,
> +				nr_this_block, &ptr, &bp, &block, priv);
> +		if (ret)
> +			goto out;
> +
> +		trace_xfs_btree_bload_block(cur, level, i, blocks, &ptr,
> +				nr_this_block);
> +
> +		ret = xfs_btree_bload_leaf(cur, nr_this_block, bbl->get_record,
> +				block, priv);
> +		if (ret)
> +			goto out;
> +
> +		/*
> +		 * Record the leftmost leaf pointer so we know where to start
> +		 * with the first node level.
> +		 */
> +		if (i == 0)
> +			xfs_btree_copy_ptrs(cur, &child_ptr, &ptr, 1);
> +	}
> +	total_blocks += blocks;
> +	xfs_btree_bload_drop_buf(&buffers_list, &bp);
> +
> +	/* Populate the internal btree nodes. */
> +	for (level = 1; level < cur->bc_nlevels; level++) {
> +		union xfs_btree_ptr	first_ptr;
> +
> +		nr_this_level = blocks;
> +		block = NULL;
> +		xfs_btree_set_ptr_null(cur, &ptr);
> +
> +		xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> +				&avg_per_block, &blocks, &blocks_with_extra);
> +
> +		/* Load each node block. */
> +		for (i = 0; i < blocks; i++) {
> +			unsigned int	nr_this_block = avg_per_block;
> +
> +			if (i < blocks_with_extra)
> +				nr_this_block++;
> +
> +			xfs_btree_bload_drop_buf(&buffers_list, &bp);
> +
> +			ret = xfs_btree_bload_prep_block(cur, bbl, level,
> +					nr_this_block, &ptr, &bp, &block,
> +					priv);
> +			if (ret)
> +				goto out;
> +
> +			trace_xfs_btree_bload_block(cur, level, i, blocks,
> +					&ptr, nr_this_block);
> +
> +			ret = xfs_btree_bload_node(cur, nr_this_block,
> +					&child_ptr, block);
> +			if (ret)
> +				goto out;
> +
> +			/*
> +			 * Record the leftmost node pointer so that we know
> +			 * where to start the next node level above this one.
> +			 */
> +			if (i == 0)
> +				xfs_btree_copy_ptrs(cur, &first_ptr, &ptr, 1);
> +		}
> +		total_blocks += blocks;
> +		xfs_btree_bload_drop_buf(&buffers_list, &bp);
> +		xfs_btree_copy_ptrs(cur, &child_ptr, &first_ptr, 1);
> +	}
> +
> +	/* Initialize the new root. */
> +	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
> +		ASSERT(xfs_btree_ptr_is_null(cur, &ptr));
> +		cur->bc_ino.ifake->if_levels = cur->bc_nlevels;
> +		cur->bc_ino.ifake->if_blocks = total_blocks - 1;
> +	} else {
> +		cur->bc_ag.afake->af_root = be32_to_cpu(ptr.s);
> +		cur->bc_ag.afake->af_levels = cur->bc_nlevels;
> +		cur->bc_ag.afake->af_blocks = total_blocks;
> +	}
> +
> +	/*
> +	 * Write the new blocks to disk.  If the ordered list isn't empty after
> +	 * that, then something went wrong and we have to fail.  This should
> +	 * never happen, but we'll check anyway.
> +	 */
> +	ret = xfs_buf_delwri_submit(&buffers_list);
> +	if (ret)
> +		goto out;
> +	if (!list_empty(&buffers_list)) {
> +		ASSERT(list_empty(&buffers_list));
> +		ret = -EIO;
> +	}
> +
> +out:
> +	xfs_buf_delwri_cancel(&buffers_list);
> +	if (bp)
> +		xfs_buf_relse(bp);
> +	return ret;
> +}
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index 047067f52063..c2de439a6f0d 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -574,4 +574,72 @@ void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur,
>  void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
>  		int whichfork, const struct xfs_btree_ops *ops);
>  
> +/* Bulk loading of staged btrees. */
> +typedef int (*xfs_btree_bload_get_record_fn)(struct xfs_btree_cur *cur, void *priv);
> +typedef int (*xfs_btree_bload_claim_block_fn)(struct xfs_btree_cur *cur,
> +		union xfs_btree_ptr *ptr, void *priv);
> +typedef size_t (*xfs_btree_bload_iroot_size_fn)(struct xfs_btree_cur *cur,
> +		unsigned int nr_this_level, void *priv);
> +
> +struct xfs_btree_bload {
> +	/*
> +	 * This function will be called nr_records times to load records into
> +	 * the btree.  The function does this by setting the cursor's bc_rec
> +	 * field in in-core format.  Records must be returned in sort order.
> +	 */
> +	xfs_btree_bload_get_record_fn	get_record;
> +
> +	/*
> +	 * This function will be called nr_blocks times to obtain a pointer
> +	 * to a new btree block on disk.  Callers must preallocate all space
> +	 * for the new btree before calling xfs_btree_bload, and this function
> +	 * is what claims that reservation.
> +	 */
> +	xfs_btree_bload_claim_block_fn	claim_block;
> +
> +	/*
> +	 * This function should return the size of the in-core btree root
> +	 * block.  It is only necessary for XFS_BTREE_ROOT_IN_INODE btree
> +	 * types.
> +	 */
> +	xfs_btree_bload_iroot_size_fn	iroot_size;
> +
> +	/*
> +	 * The caller should set this to the number of records that will be
> +	 * stored in the new btree.
> +	 */
> +	uint64_t			nr_records;
> +
> +	/*
> +	 * Number of free records to leave in each leaf block.  If the caller
> +	 * sets this to -1, the slack value will be calculated to be halfway
> +	 * between maxrecs and minrecs.  This typically leaves the block 75%
> +	 * full.  Note that slack values are not enforced on inode root blocks.
> +	 */
> +	int				leaf_slack;
> +
> +	/*
> +	 * Number of free key/ptr pairs to leave in each node block.  This
> +	 * field has the same semantics as leaf_slack.
> +	 */
> +	int				node_slack;
> +
> +	/*
> +	 * The xfs_btree_bload_compute_geometry function will set this to the
> +	 * number of btree blocks needed to store nr_records records.
> +	 */
> +	uint64_t			nr_blocks;
> +
> +	/*
> +	 * The xfs_btree_bload_compute_geometry function will set this to the
> +	 * height of the new btree.
> +	 */
> +	unsigned int			btree_height;
> +};
> +
> +int xfs_btree_bload_compute_geometry(struct xfs_btree_cur *cur,
> +		struct xfs_btree_bload *bbl, uint64_t nr_records);
> +int xfs_btree_bload(struct xfs_btree_cur *cur, struct xfs_btree_bload *bbl,
> +		void *priv);
> +
>  #endif	/* __XFS_BTREE_H__ */
> diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> index bc85b89f88ca..9b5e58a92381 100644
> --- a/fs/xfs/xfs_trace.c
> +++ b/fs/xfs/xfs_trace.c
> @@ -6,6 +6,7 @@
>  #include "xfs.h"
>  #include "xfs_fs.h"
>  #include "xfs_shared.h"
> +#include "xfs_bit.h"
>  #include "xfs_format.h"
>  #include "xfs_log_format.h"
>  #include "xfs_trans_resv.h"
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 05db0398f040..efc7751550d9 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -35,6 +35,7 @@ struct xfs_icreate_log;
>  struct xfs_owner_info;
>  struct xfs_trans_res;
>  struct xfs_inobt_rec_incore;
> +union xfs_btree_ptr;
>  
>  #define XFS_ATTR_FILTER_FLAGS \
>  	{ XFS_ATTR_ROOT,	"ROOT" }, \
> @@ -3666,6 +3667,90 @@ TRACE_EVENT(xfs_btree_commit_ifakeroot,
>  		  __entry->blocks)
>  )
>  
> +TRACE_EVENT(xfs_btree_bload_level_geometry,
> +	TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
> +		 uint64_t nr_this_level, unsigned int nr_per_block,
> +		 unsigned int desired_npb, uint64_t blocks,
> +		 uint64_t blocks_with_extra),
> +	TP_ARGS(cur, level, nr_this_level, nr_per_block, desired_npb, blocks,
> +		blocks_with_extra),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_btnum_t, btnum)
> +		__field(unsigned int, level)
> +		__field(unsigned int, nlevels)
> +		__field(uint64_t, nr_this_level)
> +		__field(unsigned int, nr_per_block)
> +		__field(unsigned int, desired_npb)
> +		__field(unsigned long long, blocks)
> +		__field(unsigned long long, blocks_with_extra)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = cur->bc_mp->m_super->s_dev;
> +		__entry->btnum = cur->bc_btnum;
> +		__entry->level = level;
> +		__entry->nlevels = cur->bc_nlevels;
> +		__entry->nr_this_level = nr_this_level;
> +		__entry->nr_per_block = nr_per_block;
> +		__entry->desired_npb = desired_npb;
> +		__entry->blocks = blocks;
> +		__entry->blocks_with_extra = blocks_with_extra;
> +	),
> +	TP_printk("dev %d:%d btree %s level %u/%u nr_this_level %llu nr_per_block %u desired_npb %u blocks %llu blocks_with_extra %llu",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> +		  __entry->level,
> +		  __entry->nlevels,
> +		  __entry->nr_this_level,
> +		  __entry->nr_per_block,
> +		  __entry->desired_npb,
> +		  __entry->blocks,
> +		  __entry->blocks_with_extra)
> +)
> +
> +TRACE_EVENT(xfs_btree_bload_block,
> +	TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
> +		 uint64_t block_idx, uint64_t nr_blocks,
> +		 union xfs_btree_ptr *ptr, unsigned int nr_records),
> +	TP_ARGS(cur, level, block_idx, nr_blocks, ptr, nr_records),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_btnum_t, btnum)
> +		__field(unsigned int, level)
> +		__field(unsigned long long, block_idx)
> +		__field(unsigned long long, nr_blocks)
> +		__field(xfs_agnumber_t, agno)
> +		__field(xfs_agblock_t, agbno)
> +		__field(unsigned int, nr_records)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = cur->bc_mp->m_super->s_dev;
> +		__entry->btnum = cur->bc_btnum;
> +		__entry->level = level;
> +		__entry->block_idx = block_idx;
> +		__entry->nr_blocks = nr_blocks;
> +		if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> +			xfs_fsblock_t	fsb = be64_to_cpu(ptr->l);
> +
> +			__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsb);
> +			__entry->agbno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsb);
> +		} else {
> +			__entry->agno = cur->bc_ag.agno;
> +			__entry->agbno = be32_to_cpu(ptr->s);
> +		}
> +		__entry->nr_records = nr_records;
> +	),
> +	TP_printk("dev %d:%d btree %s level %u block %llu/%llu fsb (%u/%u) recs %u",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> +		  __entry->level,
> +		  __entry->block_idx,
> +		  __entry->nr_blocks,
> +		  __entry->agno,
> +		  __entry->agbno,
> +		  __entry->nr_records)
> +)
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/7] xfs: support bulk loading of staged btrees
  2020-03-13 14:49   ` Brian Foster
@ 2020-03-13 16:28     ` Darrick J. Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2020-03-13 16:28 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Mar 13, 2020 at 10:49:43AM -0400, Brian Foster wrote:
> On Wed, Mar 11, 2020 at 08:45:49PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Add a new btree function that enables us to bulk load a btree cursor.
> > This will be used by the upcoming online repair patches to generate new
> > btrees.  This avoids the programmatic inefficiency of calling
> > xfs_btree_insert in a loop (which generates a lot of log traffic) in
> > favor of stamping out new btree blocks with ordered buffers, and then
> > committing both the new root and scheduling the removal of the old btree
> > blocks in a single transaction commit.
> > 
> > The design of this new generic code is based off the btree rebuilding
> > code in xfs_repair's phase 5 code, with the explicit goal of enabling us
> > to share that code between scrub and repair.  It has the additional
> > feature of being able to control btree block loading factors.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> The code mostly looks fine to me. A few nits around comments and such
> below. With those fixed up:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  fs/xfs/libxfs/xfs_btree.c |  604 +++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_btree.h |   68 +++++
> >  fs/xfs/xfs_trace.c        |    1 
> >  fs/xfs/xfs_trace.h        |   85 ++++++
> >  4 files changed, 757 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 4e1d4f184d4b..d579d8e99046 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -1324,7 +1324,7 @@ STATIC void
> >  xfs_btree_copy_ptrs(
> >  	struct xfs_btree_cur	*cur,
> >  	union xfs_btree_ptr	*dst_ptr,
> > -	union xfs_btree_ptr	*src_ptr,
> > +	const union xfs_btree_ptr *src_ptr,
> >  	int			numptrs)
> >  {
> >  	ASSERT(numptrs >= 0);
> > @@ -5179,3 +5179,605 @@ xfs_btree_commit_ifakeroot(
> >  	cur->bc_flags &= ~XFS_BTREE_STAGING;
> >  	cur->bc_tp = tp;
> >  }
> > +
> ...
> > +/*
> > + * Put a btree block that we're loading onto the ordered list and release it.
> > + * The btree blocks will be written to disk when bulk loading is finished.
> > + */
> > +static void
> > +xfs_btree_bload_drop_buf(
> > +	struct list_head	*buffers_list,
> > +	struct xfs_buf		**bpp)
> > +{
> > +	if (*bpp == NULL)
> > +		return;
> > +
> > +	xfs_buf_delwri_queue(*bpp, buffers_list);
> 
> Might want to do something like the following here, given there is no
> error path:
> 
> 	if (!xfs_buf_delwri_queue(...))
> 		ASSERT(0);

ok.
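
Something like this, I guess (untested sketch of the reworked helper):

static void
xfs_btree_bload_drop_buf(
	struct list_head	*buffers_list,
	struct xfs_buf		**bpp)
{
	if (*bpp == NULL)
		return;

	/* Queueing a freshly staged buffer should never fail. */
	if (!xfs_buf_delwri_queue(*bpp, buffers_list))
		ASSERT(0);

	xfs_buf_relse(*bpp);
	*bpp = NULL;
}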

> > +	xfs_buf_relse(*bpp);
> > +	*bpp = NULL;
> > +}
> > +
> > +/*
> > + * Allocate and initialize one btree block for bulk loading.
> > + *
> > + * The new btree block will have its level and numrecs fields set to the values
> > + * of the level and nr_this_block parameters, respectively.  On exit, ptrp,
> > + * bpp, and blockp will all point to the new block.
> > + */
> > +STATIC int
> > +xfs_btree_bload_prep_block(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xfs_btree_bload		*bbl,
> > +	unsigned int			level,
> > +	unsigned int			nr_this_block,
> > +	union xfs_btree_ptr		*ptrp,
> > +	struct xfs_buf			**bpp,
> > +	struct xfs_btree_block		**blockp,
> > +	void				*priv)
> 
> The header comment doesn't mention that ptrp and blockp are input values
> as well. I'd expect inline comments for the parameters that have
> non-obvious uses. Something like the following for example:
> 
> 	union xfs_btree_ptr		*ptrp,	/* in: prev ptr, out: current */
> 	...
> 	struct xfs_btree_block		**blockp, /* in: prev block, out: current */

Ok.  I think I'll rework the function comment to describe the in/outness
in more detail:

/*
 * Allocate and initialize one btree block for bulk loading.
 *
 * The new btree block will have its level and numrecs fields set to the values
 * of the level and nr_this_block parameters, respectively.
 *
 * The caller should ensure that ptrp, bpp, and blockp refer to the left
 * sibling of the new block, if there is any.  On exit, ptrp, bpp, and blockp
 * will all point to the new block.
 */
STATIC int
xfs_btree_bload_prep_block(
	struct xfs_btree_cur		*cur,
	struct xfs_btree_bload		*bbl,
	unsigned int			level,
	unsigned int			nr_this_block,
	union xfs_btree_ptr		*ptrp,    /* in/out */
	struct xfs_buf			**bpp,    /* in/out */
	struct xfs_btree_block		**blockp, /* in/out */
	void				*priv)

> 
> > +{
> > +	union xfs_btree_ptr		new_ptr;
> > +	struct xfs_buf			*new_bp;
> > +	struct xfs_btree_block		*new_block;
> > +	int				ret;
> > +
> > +	ASSERT(*bpp == NULL);
> > +
> > +	if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
> > +	    level == cur->bc_nlevels - 1) {
> > +		struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
> > +		size_t			new_size;
> > +
> > +		/* Allocate a new incore btree root block. */
> > +		new_size = bbl->iroot_size(cur, nr_this_block, priv);
> > +		ifp->if_broot = kmem_zalloc(new_size, 0);
> > +		ifp->if_broot_bytes = (int)new_size;
> > +		ifp->if_flags |= XFS_IFBROOT;
> > +
> > +		/* Initialize it and send it out. */
> > +		xfs_btree_init_block_int(cur->bc_mp, ifp->if_broot,
> > +				XFS_BUF_DADDR_NULL, cur->bc_btnum, level,
> > +				nr_this_block, cur->bc_ino.ip->i_ino,
> > +				cur->bc_flags);
> > +
> > +		*bpp = NULL;
> > +		*blockp = ifp->if_broot;
> > +		xfs_btree_set_ptr_null(cur, ptrp);
> > +		return 0;
> > +	}
> > +
> > +	/* Claim one of the caller's preallocated blocks. */
> > +	xfs_btree_set_ptr_null(cur, &new_ptr);
> > +	ret = bbl->claim_block(cur, &new_ptr, priv);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ASSERT(!xfs_btree_ptr_is_null(cur, &new_ptr));
> > +
> > +	ret = xfs_btree_get_buf_block(cur, &new_ptr, &new_block, &new_bp);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* Initialize the btree block. */
> > +	xfs_btree_init_block_cur(cur, new_bp, level, nr_this_block);
> > +	if (*blockp)
> > +		xfs_btree_set_sibling(cur, *blockp, &new_ptr, XFS_BB_RIGHTSIB);
> > +	xfs_btree_set_sibling(cur, new_block, ptrp, XFS_BB_LEFTSIB);
> > +
> > +	/* Set the out parameters. */
> > +	*bpp = new_bp;
> > +	*blockp = new_block;
> > +	xfs_btree_copy_ptrs(cur, ptrp, &new_ptr, 1);
> > +	return 0;
> > +}
> ...
> > +/*
> > + * Prepare a btree cursor for a bulk load operation by computing the geometry
> > + * fields in bbl.  Caller must ensure that the btree cursor is a staging
> > + * cursor.  This function can be called multiple times.
> > + */
> > +int
> > +xfs_btree_bload_compute_geometry(
> > +	struct xfs_btree_cur	*cur,
> > +	struct xfs_btree_bload	*bbl,
> > +	uint64_t		nr_records)
> > +{
> > +	uint64_t		nr_blocks = 0;
> > +	uint64_t		nr_this_level;
> > +
> > +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> > +
> > +	/*
> > +	 * Make sure that the slack values make sense for btree blocks that are
> > +	 * full disk blocks.  We do this by setting the btree nlevels to 3,
> > +	 * because inode-rooted btrees will return different minrecs/maxrecs
> > +	 * values for the root block.  Note that slack settings are not applied
> > +	 * to inode roots.
> > +	 */
> > +	cur->bc_nlevels = 3;
> 
> I still find the wording of the comment a little confusing...
> 
> "Make sure the slack values make sense for leaf and node blocks.
> Inode-rooted btrees return different geometry for the root block (when
> level == ->bc_nlevels - 1). We're checking levels 0 and 1 here, so set
> ->bc_nlevels such that btree code doesn't interpret either as the root
> level."

Ok.

> BTW.. I also wonder if just setting XFS_BTREE_MAXLEVELS-1 would be more
> clear than 3?

It'll at least get rid of the seeming magic number. Fixed.
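
I.e. the hunk would read something like:

	/*
	 * Make sure the slack values make sense for leaf and node blocks.
	 * Set bc_nlevels to the maximum so that neither level 0 nor level
	 * 1 is mistaken for an inode root.
	 */
	cur->bc_nlevels = XFS_BTREE_MAXLEVELS - 1;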

> > +	xfs_btree_bload_ensure_slack(cur, &bbl->leaf_slack, 0);
> > +	xfs_btree_bload_ensure_slack(cur, &bbl->node_slack, 1);
> > +
> > +	bbl->nr_records = nr_this_level = nr_records;
> > +	for (cur->bc_nlevels = 1; cur->bc_nlevels < XFS_BTREE_MAXLEVELS;) {
> > +		uint64_t	level_blocks;
> > +		uint64_t	dontcare64;
> > +		unsigned int	level = cur->bc_nlevels - 1;
> > +		unsigned int	avg_per_block;
> > +
> > +		xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > +				&avg_per_block, &level_blocks, &dontcare64);
> > +
> > +		if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
> > +			/*
> > +			 * If all the items we want to store at this level
> > +			 * would fit in the inode root block, then we have our
> > +			 * btree root and are done.
> > +			 *
> > +			 * Note that bmap btrees forbid records in the root.
> > +			 */
> > +			if (level != 0 && nr_this_level <= avg_per_block) {
> > +				nr_blocks++;
> > +				break;
> > +			}
> > +
> > +			/*
> > +			 * Otherwise, we have to store all the items for this
> > +			 * level in traditional btree blocks and therefore need
> > +			 * another level of btree to point to those blocks.
> > +			 *
> > +			 * We have to re-compute the geometry for each level of
> > +			 * an inode-rooted btree because the geometry differs
> > +			 * between a btree root in an inode fork and a
> > +			 * traditional btree block.
> > +			 *
> > +			 * This distinction is made in the btree code based on
> > +			 * whether level == bc_nlevels - 1.  Based on the
> > +			 * previous root block size check against the root
> > +			 * block geometry, we know that we aren't yet ready to
> > +			 * populate the root.  Increment bc_nlevels and
> > +			 * recalculate the geometry for a traditional
> > +			 * block-based btree level.
> > +			 */
> > +			cur->bc_nlevels++;
> > +			xfs_btree_bload_level_geometry(cur, bbl, level,
> > +					nr_this_level, &avg_per_block,
> > +					&level_blocks, &dontcare64);
> > +		} else {
> > +			/*
> > +			 * If all the items we want to store at this level
> > +			 * would fit in a single root block, we're done.
> > +			 */
> > +			if (nr_this_level <= avg_per_block) {
> > +				nr_blocks++;
> > +				break;
> > +			}
> > +
> > +			/* Otherwise, we need another level of btree. */
> > +			cur->bc_nlevels++;
> > +		}
> > +
> > +		nr_blocks += level_blocks;
> > +		nr_this_level = level_blocks;
> > +	}
> > +
> > +	if (cur->bc_nlevels == XFS_BTREE_MAXLEVELS)
> > +		return -EOVERFLOW;
> > +
> > +	bbl->btree_height = cur->bc_nlevels;
> > +	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> > +		bbl->nr_blocks = nr_blocks - 1;
> > +	else
> > +		bbl->nr_blocks = nr_blocks;
> > +	return 0;
> > +}
> > +
> > +/* Bulk load a btree given the parameters and geometry established in bbl. */
> > +int
> > +xfs_btree_bload(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xfs_btree_bload		*bbl,
> > +	void				*priv)
> > +{
> > +	struct list_head		buffers_list;
> > +	union xfs_btree_ptr		child_ptr;
> > +	union xfs_btree_ptr		ptr;
> > +	struct xfs_buf			*bp = NULL;
> > +	struct xfs_btree_block		*block = NULL;
> > +	uint64_t			nr_this_level = bbl->nr_records;
> > +	uint64_t			blocks;
> > +	uint64_t			i;
> > +	uint64_t			blocks_with_extra;
> > +	uint64_t			total_blocks = 0;
> > +	unsigned int			avg_per_block;
> > +	unsigned int			level = 0;
> > +	int				ret;
> > +
> > +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> > +
> > +	INIT_LIST_HEAD(&buffers_list);
> > +	cur->bc_nlevels = bbl->btree_height;
> > +	xfs_btree_set_ptr_null(cur, &child_ptr);
> > +	xfs_btree_set_ptr_null(cur, &ptr);
> > +
> > +	xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > +			&avg_per_block, &blocks, &blocks_with_extra);
> > +
> > +	/* Load each leaf block. */
> > +	for (i = 0; i < blocks; i++) {
> > +		unsigned int		nr_this_block = avg_per_block;
> > +
> > +		if (i < blocks_with_extra)
> > +			nr_this_block++;
> 
> The blocks_with_extra thing kind of confused me until I made it through
> the related functions. A brief comment would be helpful here, just to
> explain what's going on in the high level context. I.e.:
> 
> "btree blocks will not be evenly populated in most cases.
> blocks_with_extra tells us how many blocks get an extra record to evenly
> distribute the excess across the current level."

Ok, added.
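
For anyone reading along, the split is plain div/mod arithmetic -- a
sketch of the idea, not the exact helper:

	/* Spread nr_this_level records across the blocks for a level. */
	avg_per_block = nr_this_level / blocks;
	blocks_with_extra = nr_this_level % blocks;
	/* The first blocks_with_extra blocks get avg_per_block + 1. */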

--D

> Brian
> 
> > +
> > +		xfs_btree_bload_drop_buf(&buffers_list, &bp);
> > +
> > +		ret = xfs_btree_bload_prep_block(cur, bbl, level,
> > +				nr_this_block, &ptr, &bp, &block, priv);
> > +		if (ret)
> > +			goto out;
> > +
> > +		trace_xfs_btree_bload_block(cur, level, i, blocks, &ptr,
> > +				nr_this_block);
> > +
> > +		ret = xfs_btree_bload_leaf(cur, nr_this_block, bbl->get_record,
> > +				block, priv);
> > +		if (ret)
> > +			goto out;
> > +
> > +		/*
> > +		 * Record the leftmost leaf pointer so we know where to start
> > +		 * with the first node level.
> > +		 */
> > +		if (i == 0)
> > +			xfs_btree_copy_ptrs(cur, &child_ptr, &ptr, 1);
> > +	}
> > +	total_blocks += blocks;
> > +	xfs_btree_bload_drop_buf(&buffers_list, &bp);
> > +
> > +	/* Populate the internal btree nodes. */
> > +	for (level = 1; level < cur->bc_nlevels; level++) {
> > +		union xfs_btree_ptr	first_ptr;
> > +
> > +		nr_this_level = blocks;
> > +		block = NULL;
> > +		xfs_btree_set_ptr_null(cur, &ptr);
> > +
> > +		xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > +				&avg_per_block, &blocks, &blocks_with_extra);
> > +
> > +		/* Load each node block. */
> > +		for (i = 0; i < blocks; i++) {
> > +			unsigned int	nr_this_block = avg_per_block;
> > +
> > +			if (i < blocks_with_extra)
> > +				nr_this_block++;
> > +
> > +			xfs_btree_bload_drop_buf(&buffers_list, &bp);
> > +
> > +			ret = xfs_btree_bload_prep_block(cur, bbl, level,
> > +					nr_this_block, &ptr, &bp, &block,
> > +					priv);
> > +			if (ret)
> > +				goto out;
> > +
> > +			trace_xfs_btree_bload_block(cur, level, i, blocks,
> > +					&ptr, nr_this_block);
> > +
> > +			ret = xfs_btree_bload_node(cur, nr_this_block,
> > +					&child_ptr, block);
> > +			if (ret)
> > +				goto out;
> > +
> > +			/*
> > +			 * Record the leftmost node pointer so that we know
> > +			 * where to start the next node level above this one.
> > +			 */
> > +			if (i == 0)
> > +				xfs_btree_copy_ptrs(cur, &first_ptr, &ptr, 1);
> > +		}
> > +		total_blocks += blocks;
> > +		xfs_btree_bload_drop_buf(&buffers_list, &bp);
> > +		xfs_btree_copy_ptrs(cur, &child_ptr, &first_ptr, 1);
> > +	}
> > +
> > +	/* Initialize the new root. */
> > +	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
> > +		ASSERT(xfs_btree_ptr_is_null(cur, &ptr));
> > +		cur->bc_ino.ifake->if_levels = cur->bc_nlevels;
> > +		cur->bc_ino.ifake->if_blocks = total_blocks - 1;
> > +	} else {
> > +		cur->bc_ag.afake->af_root = be32_to_cpu(ptr.s);
> > +		cur->bc_ag.afake->af_levels = cur->bc_nlevels;
> > +		cur->bc_ag.afake->af_blocks = total_blocks;
> > +	}
> > +
> > +	/*
> > +	 * Write the new blocks to disk.  If the ordered list isn't empty after
> > +	 * that, then something went wrong and we have to fail.  This should
> > +	 * never happen, but we'll check anyway.
> > +	 */
> > +	ret = xfs_buf_delwri_submit(&buffers_list);
> > +	if (ret)
> > +		goto out;
> > +	if (!list_empty(&buffers_list)) {
> > +		ASSERT(list_empty(&buffers_list));
> > +		ret = -EIO;
> > +	}
> > +
> > +out:
> > +	xfs_buf_delwri_cancel(&buffers_list);
> > +	if (bp)
> > +		xfs_buf_relse(bp);
> > +	return ret;
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index 047067f52063..c2de439a6f0d 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -574,4 +574,72 @@ void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur,
> >  void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
> >  		int whichfork, const struct xfs_btree_ops *ops);
> >  
> > +/* Bulk loading of staged btrees. */
> > +typedef int (*xfs_btree_bload_get_record_fn)(struct xfs_btree_cur *cur, void *priv);
> > +typedef int (*xfs_btree_bload_claim_block_fn)(struct xfs_btree_cur *cur,
> > +		union xfs_btree_ptr *ptr, void *priv);
> > +typedef size_t (*xfs_btree_bload_iroot_size_fn)(struct xfs_btree_cur *cur,
> > +		unsigned int nr_this_level, void *priv);
> > +
> > +struct xfs_btree_bload {
> > +	/*
> > +	 * This function will be called nr_records times to load records into
> > +	 * the btree.  The function does this by setting the cursor's bc_rec
> > +	 * field in in-core format.  Records must be returned in sort order.
> > +	 */
> > +	xfs_btree_bload_get_record_fn	get_record;
> > +
> > +	/*
> > +	 * This function will be called nr_blocks times to obtain a pointer
> > +	 * to a new btree block on disk.  Callers must preallocate all space
> > +	 * for the new btree before calling xfs_btree_bload, and this function
> > +	 * is what claims that reservation.
> > +	 */
> > +	xfs_btree_bload_claim_block_fn	claim_block;
> > +
> > +	/*
> > +	 * This function should return the size of the in-core btree root
> > +	 * block.  It is only necessary for XFS_BTREE_ROOT_IN_INODE btree
> > +	 * types.
> > +	 */
> > +	xfs_btree_bload_iroot_size_fn	iroot_size;
> > +
> > +	/*
> > +	 * The caller should set this to the number of records that will be
> > +	 * stored in the new btree.
> > +	 */
> > +	uint64_t			nr_records;
> > +
> > +	/*
> > +	 * Number of free records to leave in each leaf block.  If the caller
> > +	 * sets this to -1, the slack value will be calculated to be halfway
> > +	 * between maxrecs and minrecs.  This typically leaves the block 75%
> > +	 * full.  Note that slack values are not enforced on inode root blocks.
> > +	 */
> > +	int				leaf_slack;
> > +
> > +	/*
> > +	 * Number of free key/ptrs pairs to leave in each node block.  This
> > +	 * field has the same semantics as leaf_slack.
> > +	 */
> > +	int				node_slack;
> > +
> > +	/*
> > +	 * The xfs_btree_bload_compute_geometry function will set this to the
> > +	 * number of btree blocks needed to store nr_records records.
> > +	 */
> > +	uint64_t			nr_blocks;
> > +
> > +	/*
> > +	 * The xfs_btree_bload_compute_geometry function will set this to the
> > +	 * height of the new btree.
> > +	 */
> > +	unsigned int			btree_height;
> > +};
> > +
> > +int xfs_btree_bload_compute_geometry(struct xfs_btree_cur *cur,
> > +		struct xfs_btree_bload *bbl, uint64_t nr_records);
> > +int xfs_btree_bload(struct xfs_btree_cur *cur, struct xfs_btree_bload *bbl,
> > +		void *priv);
> > +
> >  #endif	/* __XFS_BTREE_H__ */
> > diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> > index bc85b89f88ca..9b5e58a92381 100644
> > --- a/fs/xfs/xfs_trace.c
> > +++ b/fs/xfs/xfs_trace.c
> > @@ -6,6 +6,7 @@
> >  #include "xfs.h"
> >  #include "xfs_fs.h"
> >  #include "xfs_shared.h"
> > +#include "xfs_bit.h"
> >  #include "xfs_format.h"
> >  #include "xfs_log_format.h"
> >  #include "xfs_trans_resv.h"
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 05db0398f040..efc7751550d9 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -35,6 +35,7 @@ struct xfs_icreate_log;
> >  struct xfs_owner_info;
> >  struct xfs_trans_res;
> >  struct xfs_inobt_rec_incore;
> > +union xfs_btree_ptr;
> >  
> >  #define XFS_ATTR_FILTER_FLAGS \
> >  	{ XFS_ATTR_ROOT,	"ROOT" }, \
> > @@ -3666,6 +3667,90 @@ TRACE_EVENT(xfs_btree_commit_ifakeroot,
> >  		  __entry->blocks)
> >  )
> >  
> > +TRACE_EVENT(xfs_btree_bload_level_geometry,
> > +	TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
> > +		 uint64_t nr_this_level, unsigned int nr_per_block,
> > +		 unsigned int desired_npb, uint64_t blocks,
> > +		 uint64_t blocks_with_extra),
> > +	TP_ARGS(cur, level, nr_this_level, nr_per_block, desired_npb, blocks,
> > +		blocks_with_extra),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_btnum_t, btnum)
> > +		__field(unsigned int, level)
> > +		__field(unsigned int, nlevels)
> > +		__field(uint64_t, nr_this_level)
> > +		__field(unsigned int, nr_per_block)
> > +		__field(unsigned int, desired_npb)
> > +		__field(unsigned long long, blocks)
> > +		__field(unsigned long long, blocks_with_extra)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = cur->bc_mp->m_super->s_dev;
> > +		__entry->btnum = cur->bc_btnum;
> > +		__entry->level = level;
> > +		__entry->nlevels = cur->bc_nlevels;
> > +		__entry->nr_this_level = nr_this_level;
> > +		__entry->nr_per_block = nr_per_block;
> > +		__entry->desired_npb = desired_npb;
> > +		__entry->blocks = blocks;
> > +		__entry->blocks_with_extra = blocks_with_extra;
> > +	),
> > +	TP_printk("dev %d:%d btree %s level %u/%u nr_this_level %llu nr_per_block %u desired_npb %u blocks %llu blocks_with_extra %llu",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> > +		  __entry->level,
> > +		  __entry->nlevels,
> > +		  __entry->nr_this_level,
> > +		  __entry->nr_per_block,
> > +		  __entry->desired_npb,
> > +		  __entry->blocks,
> > +		  __entry->blocks_with_extra)
> > +)
> > +
> > +TRACE_EVENT(xfs_btree_bload_block,
> > +	TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
> > +		 uint64_t block_idx, uint64_t nr_blocks,
> > +		 union xfs_btree_ptr *ptr, unsigned int nr_records),
> > +	TP_ARGS(cur, level, block_idx, nr_blocks, ptr, nr_records),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_btnum_t, btnum)
> > +		__field(unsigned int, level)
> > +		__field(unsigned long long, block_idx)
> > +		__field(unsigned long long, nr_blocks)
> > +		__field(xfs_agnumber_t, agno)
> > +		__field(xfs_agblock_t, agbno)
> > +		__field(unsigned int, nr_records)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = cur->bc_mp->m_super->s_dev;
> > +		__entry->btnum = cur->bc_btnum;
> > +		__entry->level = level;
> > +		__entry->block_idx = block_idx;
> > +		__entry->nr_blocks = nr_blocks;
> > +		if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> > +			xfs_fsblock_t	fsb = be64_to_cpu(ptr->l);
> > +
> > +			__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsb);
> > +			__entry->agbno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsb);
> > +		} else {
> > +			__entry->agno = cur->bc_ag.agno;
> > +			__entry->agbno = be32_to_cpu(ptr->s);
> > +		}
> > +		__entry->nr_records = nr_records;
> > +	),
> > +	TP_printk("dev %d:%d btree %s level %u block %llu/%llu fsb (%u/%u) recs %u",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> > +		  __entry->level,
> > +		  __entry->block_idx,
> > +		  __entry->nr_blocks,
> > +		  __entry->agno,
> > +		  __entry->agbno,
> > +		  __entry->nr_records)
> > +)
> > +
> >  #endif /* _TRACE_XFS_H */
> >  
> >  #undef TRACE_INCLUDE_PATH
> > 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/7] xfs: introduce fake roots for ag-rooted btrees
  2020-03-13 14:47   ` Brian Foster
@ 2020-03-13 16:30     ` Darrick J. Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2020-03-13 16:30 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Mar 13, 2020 at 10:47:12AM -0400, Brian Foster wrote:
> On Wed, Mar 11, 2020 at 08:45:37PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create an in-core fake root for AG-rooted btree types so that callers
> > can generate a whole new btree using the upcoming btree bulk load
> > function without making the new tree accessible from the rest of the
> > filesystem.  It is up to the individual btree type to provide a function
> > to create a staged cursor (presumably with the appropriate callouts to
> > update the fakeroot) and then commit the staged root back into the
> > filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_btree.c |  168 +++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_btree.h |   30 ++++++++
> >  fs/xfs/xfs_trace.h        |   28 ++++++++
> >  3 files changed, 225 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 4ef9f0b42c7f..085bc070e804 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> ...
> > @@ -4908,3 +4910,169 @@ xfs_btree_has_more_records(
> >  	else
> >  		return block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK);
> >  }
> > +
> ...
> > +/*
> > + * Initialize an AG-rooted btree cursor with the given AG btree fake root.  The
> > + * btree cursor's bc_ops will be overridden as needed to make the staging
> > + * functionality work.  If new_ops is not NULL, these new ops will be passed
> > + * out to the caller for further overriding.
> > + */
> > +void
> > +xfs_btree_stage_afakeroot(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xbtree_afakeroot		*afake,
> > +	struct xfs_btree_ops		**new_ops)
> > +{
> > +	struct xfs_btree_ops		*nops;
> > +
> > +	ASSERT(!(cur->bc_flags & XFS_BTREE_STAGING));
> > +	ASSERT(!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE));
> > +	ASSERT(cur->bc_tp == NULL);
> > +
> > +	nops = kmem_alloc(sizeof(struct xfs_btree_ops), KM_NOFS);
> > +	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
> > +	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
> > +	nops->free_block = xfs_btree_fakeroot_free_block;
> > +	nops->init_ptr_from_cur = xfs_btree_fakeroot_init_ptr_from_cur;
> > +	nops->set_root = xfs_btree_afakeroot_set_root;
> > +	nops->dup_cursor = xfs_btree_fakeroot_dup_cursor;
> > +
> > +	cur->bc_ag.afake = afake;
> > +	cur->bc_nlevels = afake->af_levels;
> > +	cur->bc_ops = nops;
> > +	cur->bc_flags |= XFS_BTREE_STAGING;
> > +
> > +	if (new_ops)
> > +		*new_ops = nops;
> 
> Curious why we have new_ops if the caller unconditionally assigns
> ->bc_ops to the same value..? That aside:

The callers don't assign bc_ops anymore, though with the benefit of
hindsight, nobody uses *new_ops here, so it probably makes more sense to
drop the parameter for _afakeroot.
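
i.e. the _afakeroot variant would shrink to:

void xfs_btree_stage_afakeroot(struct xfs_btree_cur *cur,
		struct xbtree_afakeroot *afake);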

--D

> Reviewed-by: Brian Foster <bfoster@redhat.com> 
> 
> > +}
> > +
> > +/*
> > + * Transform an AG-rooted staging btree cursor back into a regular cursor by
> > + * substituting a real btree root for the fake one and restoring normal btree
> > + * cursor ops.  The caller must log the btree root change prior to calling
> > + * this.
> > + */
> > +void
> > +xfs_btree_commit_afakeroot(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xfs_trans		*tp,
> > +	struct xfs_buf			*agbp,
> > +	const struct xfs_btree_ops	*ops)
> > +{
> > +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> > +	ASSERT(cur->bc_tp == NULL);
> > +
> > +	trace_xfs_btree_commit_afakeroot(cur);
> > +
> > +	kmem_free((void *)cur->bc_ops);
> > +	cur->bc_ag.agbp = agbp;
> > +	cur->bc_ops = ops;
> > +	cur->bc_flags &= ~XFS_BTREE_STAGING;
> > +	cur->bc_tp = tp;
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index 0d10bbd5223a..aa4a7bd40023 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -179,7 +179,10 @@ union xfs_btree_irec {
> >  
> >  /* Per-AG btree information. */
> >  struct xfs_btree_cur_ag {
> > -	struct xfs_buf		*agbp;
> > +	union {
> > +		struct xfs_buf		*agbp;
> > +		struct xbtree_afakeroot	*afake;	/* fake ag header root */
> > +	};
> >  	xfs_agnumber_t		agno;
> >  	union {
> >  		struct {
> > @@ -235,6 +238,12 @@ typedef struct xfs_btree_cur
> >  #define XFS_BTREE_LASTREC_UPDATE	(1<<2)	/* track last rec externally */
> >  #define XFS_BTREE_CRC_BLOCKS		(1<<3)	/* uses extended btree blocks */
> >  #define XFS_BTREE_OVERLAPPING		(1<<4)	/* overlapping intervals */
> > +/*
> > + * The root of this btree is a fakeroot structure so that we can stage a btree
> > + * rebuild without leaving it accessible via primary metadata.  The ops struct
> > + * is dynamically allocated and must be freed when the cursor is deleted.
> > + */
> > +#define XFS_BTREE_STAGING		(1<<5)
> >  
> >  
> >  #define	XFS_BTREE_NOERROR	0
> > @@ -515,4 +524,23 @@ xfs_btree_islastblock(
> >  	return block->bb_u.s.bb_rightsib == cpu_to_be32(NULLAGBLOCK);
> >  }
> >  
> > +/* Fake root for an AG-rooted btree. */
> > +struct xbtree_afakeroot {
> > +	/* AG block number of the new btree root. */
> > +	xfs_agblock_t		af_root;
> > +
> > +	/* Height of the new btree. */
> > +	unsigned int		af_levels;
> > +
> > +	/* Number of blocks used by the btree. */
> > +	unsigned int		af_blocks;
> > +};
> > +
> > +/* Cursor interactions with fake roots for AG-rooted btrees. */
> > +void xfs_btree_stage_afakeroot(struct xfs_btree_cur *cur,
> > +		struct xbtree_afakeroot *afake,
> > +		struct xfs_btree_ops **new_ops);
> > +void xfs_btree_commit_afakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
> > +		struct xfs_buf *agbp, const struct xfs_btree_ops *ops);
> > +
> >  #endif	/* __XFS_BTREE_H__ */
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index 059c3098a4a0..d8c229492973 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3605,6 +3605,34 @@ TRACE_EVENT(xfs_check_new_dalign,
> >  		  __entry->calc_rootino)
> >  )
> >  
> > +TRACE_EVENT(xfs_btree_commit_afakeroot,
> > +	TP_PROTO(struct xfs_btree_cur *cur),
> > +	TP_ARGS(cur),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_btnum_t, btnum)
> > +		__field(xfs_agnumber_t, agno)
> > +		__field(xfs_agblock_t, agbno)
> > +		__field(unsigned int, levels)
> > +		__field(unsigned int, blocks)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = cur->bc_mp->m_super->s_dev;
> > +		__entry->btnum = cur->bc_btnum;
> > +		__entry->agno = cur->bc_ag.agno;
> > +		__entry->agbno = cur->bc_ag.afake->af_root;
> > +		__entry->levels = cur->bc_ag.afake->af_levels;
> > +		__entry->blocks = cur->bc_ag.afake->af_blocks;
> > +	),
> > +	TP_printk("dev %d:%d btree %s ag %u levels %u blocks %u root %u",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> > +		  __entry->agno,
> > +		  __entry->levels,
> > +		  __entry->blocks,
> > +		  __entry->agbno)
> > +)
> > +
> >  #endif /* _TRACE_XFS_H */
> >  
> >  #undef TRACE_INCLUDE_PATH
> > 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/7] xfs: introduce fake roots for inode-rooted btrees
  2020-03-13 14:47   ` Brian Foster
@ 2020-03-13 16:32     ` Darrick J. Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2020-03-13 16:32 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Mar 13, 2020 at 10:47:21AM -0400, Brian Foster wrote:
> On Wed, Mar 11, 2020 at 08:45:43PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Create an in-core fake root for inode-rooted btree types so that callers
> > can generate a whole new btree using the upcoming btree bulk load
> > function without making the new tree accessible from the rest of the
> > filesystem.  It is up to the individual btree type to provide a function
> > to create a staged cursor (presumably with the appropriate callouts to
> > update the fakeroot) and then commit the staged root back into the
> > filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> Same question as the previous patch, but otherwise looks Ok to me:

This one's different -- the bmbt type will use *new_ops to override more
of the function pointers.  None of the _stage_ifakeroot callers will set
bc_ops since the generic staging function does that now.
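
A rough sketch of what I have in mind for the bmbt staging setup (the
helper and override names here are made up for illustration):

	struct xfs_btree_ops	*ops;
	struct xfs_btree_cur	*cur;

	cur = xfs_bmbt_init_common(mp, NULL, ip, whichfork);
	cur->bc_nlevels = ifake->if_levels;
	xfs_btree_stage_ifakeroot(cur, ifake, &ops);
	/* bmbt-specific tweaks on top of the generic fakeroot ops: */
	ops->dup_cursor = xfs_bmbt_stage_dup_cursor;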

--D

> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> >  fs/xfs/libxfs/xfs_btree.c |  111 +++++++++++++++++++++++++++++++++++++++++++--
> >  fs/xfs/libxfs/xfs_btree.h |   31 +++++++++++++
> >  fs/xfs/xfs_trace.h        |   33 +++++++++++++
> >  3 files changed, 171 insertions(+), 4 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 085bc070e804..4e1d4f184d4b 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -644,6 +644,17 @@ xfs_btree_ptr_addr(
> >  		((char *)block + xfs_btree_ptr_offset(cur, n, level));
> >  }
> >  
> > +struct xfs_ifork *
> > +xfs_btree_ifork_ptr(
> > +	struct xfs_btree_cur	*cur)
> > +{
> > +	ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
> > +
> > +	if (cur->bc_flags & XFS_BTREE_STAGING)
> > +		return cur->bc_ino.ifake->if_fork;
> > +	return XFS_IFORK_PTR(cur->bc_ino.ip, cur->bc_ino.whichfork);
> > +}
> > +
> >  /*
> >   * Get the root block which is stored in the inode.
> >   *
> > @@ -654,9 +665,8 @@ STATIC struct xfs_btree_block *
> >  xfs_btree_get_iroot(
> >  	struct xfs_btree_cur	*cur)
> >  {
> > -	struct xfs_ifork	*ifp;
> > +	struct xfs_ifork	*ifp = xfs_btree_ifork_ptr(cur);
> >  
> > -	ifp = XFS_IFORK_PTR(cur->bc_ino.ip, cur->bc_ino.whichfork);
> >  	return (struct xfs_btree_block *)ifp->if_broot;
> >  }
> >  
> > @@ -4985,8 +4995,17 @@ xfs_btree_fakeroot_init_ptr_from_cur(
> >  
> >  	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> >  
> > -	afake = cur->bc_ag.afake;
> > -	ptr->s = cpu_to_be32(afake->af_root);
> > +	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
> > +		/*
> > +		 * The root block lives in the inode core, so we zero the
> > +		 * pointer (like the bmbt code does) to make it obvious if
> > +		 * anyone ever tries to use this pointer.
> > +		 */
> > +		ptr->l = cpu_to_be64(0);
> > +	} else {
> > +		afake = cur->bc_ag.afake;
> > +		ptr->s = cpu_to_be32(afake->af_root);
> > +	}
> >  }
> >  
> >  /*
> > @@ -5076,3 +5095,87 @@ xfs_btree_commit_afakeroot(
> >  	cur->bc_flags &= ~XFS_BTREE_STAGING;
> >  	cur->bc_tp = tp;
> >  }
> > +
> > +/*
> > + * Bulk Loading for Inode-Rooted Btrees
> > + * ====================================
> > + *
> > + * For a btree rooted in an inode fork, pass a xbtree_ifakeroot structure to
> > + * the staging cursor.  This structure should be initialized as follows:
> > + *
> > + * - if_fork_size field should be set to the number of bytes available to the
> > + *   fork in the inode.
> > + *
> > + * - if_fork should point to a freshly allocated struct xfs_ifork.
> > + *
> > + * - if_format should be set to the appropriate fork type (e.g.
> > + *   XFS_DINODE_FMT_BTREE).
> > + *
> > + * All other fields must be zero.
> > + *
> > + * The _stage_cursor() function for a specific btree type should call
> > + * xfs_btree_stage_ifakeroot to set up the in-memory cursor as a staging
> > + * cursor.  The corresponding _commit_staged_btree() function should log the
> > + * new root and call xfs_btree_commit_ifakeroot() to transform the staging
> > + * cursor into a regular btree cursor.
> > + */
> > +
> > +/*
> > + * Initialize an inode-rooted btree cursor with the given inode btree fake
> > + * root.  The btree cursor's bc_ops will be overridden as needed to make the
> > + * staging functionality work.  If new_ops is not NULL, these new ops will be
> > + * passed out to the caller for further overriding.
> > + */
> > +void
> > +xfs_btree_stage_ifakeroot(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xbtree_ifakeroot		*ifake,
> > +	struct xfs_btree_ops		**new_ops)
> > +{
> > +	struct xfs_btree_ops		*nops;
> > +
> > +	ASSERT(!(cur->bc_flags & XFS_BTREE_STAGING));
> > +	ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
> > +	ASSERT(cur->bc_tp == NULL);
> > +
> > +	nops = kmem_alloc(sizeof(struct xfs_btree_ops), KM_NOFS);
> > +	memcpy(nops, cur->bc_ops, sizeof(struct xfs_btree_ops));
> > +	nops->alloc_block = xfs_btree_fakeroot_alloc_block;
> > +	nops->free_block = xfs_btree_fakeroot_free_block;
> > +	nops->init_ptr_from_cur = xfs_btree_fakeroot_init_ptr_from_cur;
> > +	nops->dup_cursor = xfs_btree_fakeroot_dup_cursor;
> > +
> > +	cur->bc_ino.ifake = ifake;
> > +	cur->bc_nlevels = ifake->if_levels;
> > +	cur->bc_ops = nops;
> > +	cur->bc_flags |= XFS_BTREE_STAGING;
> > +
> > +	if (new_ops)
> > +		*new_ops = nops;
> > +}
> > +
> > +/*
> > + * Transform an inode-rooted staging btree cursor back into a regular cursor by
> > + * substituting a real btree root for the fake one and restoring normal btree
> > + * cursor ops.  The caller must log the btree root change prior to calling
> > + * this.
> > + */
> > +void
> > +xfs_btree_commit_ifakeroot(
> > +	struct xfs_btree_cur		*cur,
> > +	struct xfs_trans		*tp,
> > +	int				whichfork,
> > +	const struct xfs_btree_ops	*ops)
> > +{
> > +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> > +	ASSERT(cur->bc_tp == NULL);
> > +
> > +	trace_xfs_btree_commit_ifakeroot(cur);
> > +
> > +	kmem_free((void *)cur->bc_ops);
> > +	cur->bc_ino.ifake = NULL;
> > +	cur->bc_ino.whichfork = whichfork;
> > +	cur->bc_ops = ops;
> > +	cur->bc_flags &= ~XFS_BTREE_STAGING;
> > +	cur->bc_tp = tp;
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index aa4a7bd40023..047067f52063 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -10,6 +10,7 @@ struct xfs_buf;
> >  struct xfs_inode;
> >  struct xfs_mount;
> >  struct xfs_trans;
> > +struct xfs_ifork;
> >  
> >  extern kmem_zone_t	*xfs_btree_cur_zone;
> >  
> > @@ -198,6 +199,7 @@ struct xfs_btree_cur_ag {
> >  /* Btree-in-inode cursor information */
> >  struct xfs_btree_cur_ino {
> >  	struct xfs_inode	*ip;
> > +	struct xbtree_ifakeroot	*ifake;		/* fake inode fork */
> >  	int			allocated;
> >  	short			forksize;
> >  	char			whichfork;
> > @@ -506,6 +508,7 @@ union xfs_btree_key *xfs_btree_high_key_from_key(struct xfs_btree_cur *cur,
> >  int xfs_btree_has_record(struct xfs_btree_cur *cur, union xfs_btree_irec *low,
> >  		union xfs_btree_irec *high, bool *exists);
> >  bool xfs_btree_has_more_records(struct xfs_btree_cur *cur);
> > +struct xfs_ifork *xfs_btree_ifork_ptr(struct xfs_btree_cur *cur);
> >  
> >  /* Does this cursor point to the last block in the given level? */
> >  static inline bool
> > @@ -543,4 +546,32 @@ void xfs_btree_stage_afakeroot(struct xfs_btree_cur *cur,
> >  void xfs_btree_commit_afakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
> >  		struct xfs_buf *agbp, const struct xfs_btree_ops *ops);
> >  
> > +/* Fake root for an inode-rooted btree. */
> > +struct xbtree_ifakeroot {
> > +	/* Fake inode fork. */
> > +	struct xfs_ifork	*if_fork;
> > +
> > +	/* Number of blocks used by the btree. */
> > +	int64_t			if_blocks;
> > +
> > +	/* Height of the new btree. */
> > +	unsigned int		if_levels;
> > +
> > +	/* Number of bytes available for this fork in the inode. */
> > +	unsigned int		if_fork_size;
> > +
> > +	/* Fork format. */
> > +	unsigned int		if_format;
> > +
> > +	/* Number of records. */
> > +	unsigned int		if_extents;
> > +};
> > +
> > +/* Cursor interactions with fake roots for inode-rooted btrees. */
> > +void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur,
> > +		struct xbtree_ifakeroot *ifake,
> > +		struct xfs_btree_ops **new_ops);
> > +void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, struct xfs_trans *tp,
> > +		int whichfork, const struct xfs_btree_ops *ops);
> > +
> >  #endif	/* __XFS_BTREE_H__ */
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index d8c229492973..05db0398f040 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -3633,6 +3633,39 @@ TRACE_EVENT(xfs_btree_commit_afakeroot,
> >  		  __entry->agbno)
> >  )
> >  
> > +TRACE_EVENT(xfs_btree_commit_ifakeroot,
> > +	TP_PROTO(struct xfs_btree_cur *cur),
> > +	TP_ARGS(cur),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_btnum_t, btnum)
> > +		__field(xfs_agnumber_t, agno)
> > +		__field(xfs_agino_t, agino)
> > +		__field(unsigned int, levels)
> > +		__field(unsigned int, blocks)
> > +		__field(int, whichfork)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = cur->bc_mp->m_super->s_dev;
> > +		__entry->btnum = cur->bc_btnum;
> > +		__entry->agno = XFS_INO_TO_AGNO(cur->bc_mp,
> > +					cur->bc_ino.ip->i_ino);
> > +		__entry->agino = XFS_INO_TO_AGINO(cur->bc_mp,
> > +					cur->bc_ino.ip->i_ino);
> > +		__entry->levels = cur->bc_ino.ifake->if_levels;
> > +		__entry->blocks = cur->bc_ino.ifake->if_blocks;
> > +		__entry->whichfork = cur->bc_ino.whichfork;
> > +	),
> > +	TP_printk("dev %d:%d btree %s ag %u agino %u whichfork %s levels %u blocks %u",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> > +		  __entry->agno,
> > +		  __entry->agino,
> > +		  __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
> > +		  __entry->levels,
> > +		  __entry->blocks)
> > +)
> > +
> >  #endif /* _TRACE_XFS_H */
> >  
> >  #undef TRACE_INCLUDE_PATH
> > 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/7] xfs: add support for free space btree staging cursors
  2020-03-16 12:29   ` Brian Foster
@ 2020-03-16 14:58     ` Darrick J. Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2020-03-16 14:58 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Mar 16, 2020 at 08:29:42AM -0400, Brian Foster wrote:
> On Sun, Mar 15, 2020 at 04:51:06PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Add support for btree staging cursors for the free space btrees.  This
> > is needed both for online repair and also to convert xfs_repair to use
> > btree bulk loading.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc_btree.c |   98 +++++++++++++++++++++++++++++++--------
> >  fs/xfs/libxfs/xfs_alloc_btree.h |    7 +++
> >  2 files changed, 86 insertions(+), 19 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
> > index a28041fdf4c0..93792ee7924e 100644
> > --- a/fs/xfs/libxfs/xfs_alloc_btree.c
> > +++ b/fs/xfs/libxfs/xfs_alloc_btree.c
> ...
> > @@ -485,36 +520,61 @@ xfs_allocbt_init_cursor(
> ...
> >  
> > +/*
> > + * Install a new free space btree root.  Caller is responsible for invalidating
> > + * and freeing the old btree blocks.
> > + */
> > +void
> > +xfs_allocbt_commit_staged_btree(
> > +	struct xfs_btree_cur	*cur,
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agbp)
> > +{
> > +	struct xfs_agf		*agf = agbp->b_addr;
> > +	struct xbtree_afakeroot	*afake = cur->bc_ag.afake;
> > +
> > +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> > +
> > +	agf->agf_roots[cur->bc_btnum] = cpu_to_be32(afake->af_root);
> > +	agf->agf_levels[cur->bc_btnum] = cpu_to_be32(afake->af_levels);
> > +	xfs_alloc_log_agf(tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
> > +
> > +	if (cur->bc_btnum == XFS_BTNUM_BNO) {
> > +		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_bnobt_ops);
> > +	} else {
> > +		cur->bc_flags |= XFS_BTREE_LASTREC_UPDATE;
> 
> Any reason this is set here and not at init time for the staging cursor?

Originally it was so that ->update_lastrec couldn't get called, but
since you're not supposed to be calling the regular btree operations
anyway, I suppose it doesn't matter if the flag is set in staging
cursors... will fix.
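
i.e. set it once in xfs_allocbt_init_common so both regular and staging
cursors pick it up:

	if (btnum == XFS_BTNUM_CNT) {
		cur->bc_ops = &xfs_cntbt_ops;
		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtc_2);
		cur->bc_flags |= XFS_BTREE_LASTREC_UPDATE;
	} else {
		cur->bc_ops = &xfs_bnobt_ops;
		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtb_2);
	}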

--D

> Brian
> 
> > +		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_cntbt_ops);
> > +	}
> > +}
> > +
> >  /*
> >   * Calculate number of records in an alloc btree block.
> >   */
> > diff --git a/fs/xfs/libxfs/xfs_alloc_btree.h b/fs/xfs/libxfs/xfs_alloc_btree.h
> > index c9305ebb69f6..047f09f0be3c 100644
> > --- a/fs/xfs/libxfs/xfs_alloc_btree.h
> > +++ b/fs/xfs/libxfs/xfs_alloc_btree.h
> > @@ -13,6 +13,7 @@
> >  struct xfs_buf;
> >  struct xfs_btree_cur;
> >  struct xfs_mount;
> > +struct xbtree_afakeroot;
> >  
> >  /*
> >   * Btree block header size depends on a superblock flag.
> > @@ -48,8 +49,14 @@ struct xfs_mount;
> >  extern struct xfs_btree_cur *xfs_allocbt_init_cursor(struct xfs_mount *,
> >  		struct xfs_trans *, struct xfs_buf *,
> >  		xfs_agnumber_t, xfs_btnum_t);
> > +struct xfs_btree_cur *xfs_allocbt_stage_cursor(struct xfs_mount *mp,
> > +		struct xbtree_afakeroot *afake, xfs_agnumber_t agno,
> > +		xfs_btnum_t btnum);
> >  extern int xfs_allocbt_maxrecs(struct xfs_mount *, int, int);
> >  extern xfs_extlen_t xfs_allocbt_calc_size(struct xfs_mount *mp,
> >  		unsigned long long len);
> >  
> > +void xfs_allocbt_commit_staged_btree(struct xfs_btree_cur *cur,
> > +		struct xfs_trans *tp, struct xfs_buf *agbp);
> > +
> >  #endif	/* __XFS_ALLOC_BTREE_H__ */
> > 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/7] xfs: add support for free space btree staging cursors
  2020-03-15 23:51 ` [PATCH 4/7] xfs: add support for free space btree staging cursors Darrick J. Wong
@ 2020-03-16 12:29   ` Brian Foster
  2020-03-16 14:58     ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2020-03-16 12:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Sun, Mar 15, 2020 at 04:51:06PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Add support for btree staging cursors for the free space btrees.  This
> is needed both for online repair and also to convert xfs_repair to use
> btree bulk loading.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_alloc_btree.c |   98 +++++++++++++++++++++++++++++++--------
>  fs/xfs/libxfs/xfs_alloc_btree.h |    7 +++
>  2 files changed, 86 insertions(+), 19 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
> index a28041fdf4c0..93792ee7924e 100644
> --- a/fs/xfs/libxfs/xfs_alloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_alloc_btree.c
...
> @@ -485,36 +520,61 @@ xfs_allocbt_init_cursor(
...
>  
> +/*
> + * Install a new free space btree root.  Caller is responsible for invalidating
> + * and freeing the old btree blocks.
> + */
> +void
> +xfs_allocbt_commit_staged_btree(
> +	struct xfs_btree_cur	*cur,
> +	struct xfs_trans	*tp,
> +	struct xfs_buf		*agbp)
> +{
> +	struct xfs_agf		*agf = agbp->b_addr;
> +	struct xbtree_afakeroot	*afake = cur->bc_ag.afake;
> +
> +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> +
> +	agf->agf_roots[cur->bc_btnum] = cpu_to_be32(afake->af_root);
> +	agf->agf_levels[cur->bc_btnum] = cpu_to_be32(afake->af_levels);
> +	xfs_alloc_log_agf(tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
> +
> +	if (cur->bc_btnum == XFS_BTNUM_BNO) {
> +		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_bnobt_ops);
> +	} else {
> +		cur->bc_flags |= XFS_BTREE_LASTREC_UPDATE;

Any reason this is set here and not at init time for the staging cursor?

Brian

> +		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_cntbt_ops);
> +	}
> +}
> +
>  /*
>   * Calculate number of records in an alloc btree block.
>   */
> diff --git a/fs/xfs/libxfs/xfs_alloc_btree.h b/fs/xfs/libxfs/xfs_alloc_btree.h
> index c9305ebb69f6..047f09f0be3c 100644
> --- a/fs/xfs/libxfs/xfs_alloc_btree.h
> +++ b/fs/xfs/libxfs/xfs_alloc_btree.h
> @@ -13,6 +13,7 @@
>  struct xfs_buf;
>  struct xfs_btree_cur;
>  struct xfs_mount;
> +struct xbtree_afakeroot;
>  
>  /*
>   * Btree block header size depends on a superblock flag.
> @@ -48,8 +49,14 @@ struct xfs_mount;
>  extern struct xfs_btree_cur *xfs_allocbt_init_cursor(struct xfs_mount *,
>  		struct xfs_trans *, struct xfs_buf *,
>  		xfs_agnumber_t, xfs_btnum_t);
> +struct xfs_btree_cur *xfs_allocbt_stage_cursor(struct xfs_mount *mp,
> +		struct xbtree_afakeroot *afake, xfs_agnumber_t agno,
> +		xfs_btnum_t btnum);
>  extern int xfs_allocbt_maxrecs(struct xfs_mount *, int, int);
>  extern xfs_extlen_t xfs_allocbt_calc_size(struct xfs_mount *mp,
>  		unsigned long long len);
>  
> +void xfs_allocbt_commit_staged_btree(struct xfs_btree_cur *cur,
> +		struct xfs_trans *tp, struct xfs_buf *agbp);
> +
>  #endif	/* __XFS_ALLOC_BTREE_H__ */
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 4/7] xfs: add support for free space btree staging cursors
  2020-03-15 23:50 [PATCH v5 0/7] xfs: btree bulk loading Darrick J. Wong
@ 2020-03-15 23:51 ` Darrick J. Wong
  2020-03-16 12:29   ` Brian Foster
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2020-03-15 23:51 UTC (permalink / raw)
  To: darrick.wong; +Cc: linux-xfs, bfoster

From: Darrick J. Wong <darrick.wong@oracle.com>

Add support for btree staging cursors for the free space btrees.  This
is needed both for online repair and also to convert xfs_repair to use
btree bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc_btree.c |   98 +++++++++++++++++++++++++++++++--------
 fs/xfs/libxfs/xfs_alloc_btree.h |    7 +++
 2 files changed, 86 insertions(+), 19 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index a28041fdf4c0..93792ee7924e 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -12,6 +12,7 @@
 #include "xfs_sb.h"
 #include "xfs_mount.h"
 #include "xfs_btree.h"
+#include "xfs_btree_staging.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_alloc.h"
 #include "xfs_extent_busy.h"
@@ -19,7 +20,6 @@
 #include "xfs_trace.h"
 #include "xfs_trans.h"
 
-
 STATIC struct xfs_btree_cur *
 xfs_allocbt_dup_cursor(
 	struct xfs_btree_cur	*cur)
@@ -471,6 +471,41 @@ static const struct xfs_btree_ops xfs_cntbt_ops = {
 	.recs_inorder		= xfs_cntbt_recs_inorder,
 };
 
+/* Allocate most of a new allocation btree cursor. */
+STATIC struct xfs_btree_cur *
+xfs_allocbt_init_common(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_btnum_t		btnum)
+{
+	struct xfs_btree_cur	*cur;
+
+	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
+
+	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
+
+	cur->bc_tp = tp;
+	cur->bc_mp = mp;
+	cur->bc_btnum = btnum;
+	cur->bc_blocklog = mp->m_sb.sb_blocklog;
+	cur->bc_ag.agno = agno;
+	cur->bc_ag.abt.active = false;
+
+	if (btnum == XFS_BTNUM_CNT) {
+		cur->bc_ops = &xfs_cntbt_ops;
+		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtc_2);
+	} else {
+		cur->bc_ops = &xfs_bnobt_ops;
+		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtb_2);
+	}
+
+	if (xfs_sb_version_hascrc(&mp->m_sb))
+		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
+
+	return cur;
+}
+
 /*
  * Allocate a new allocation btree cursor.
  */
@@ -485,36 +520,61 @@ xfs_allocbt_init_cursor(
 	struct xfs_agf		*agf = agbp->b_addr;
 	struct xfs_btree_cur	*cur;
 
-	ASSERT(btnum == XFS_BTNUM_BNO || btnum == XFS_BTNUM_CNT);
-
-	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_NOFS);
-
-	cur->bc_tp = tp;
-	cur->bc_mp = mp;
-	cur->bc_btnum = btnum;
-	cur->bc_blocklog = mp->m_sb.sb_blocklog;
-
+	cur = xfs_allocbt_init_common(mp, tp, agno, btnum);
 	if (btnum == XFS_BTNUM_CNT) {
-		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtc_2);
-		cur->bc_ops = &xfs_cntbt_ops;
 		cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]);
-		cur->bc_flags = XFS_BTREE_LASTREC_UPDATE;
+		cur->bc_flags |= XFS_BTREE_LASTREC_UPDATE;
 	} else {
-		cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_abtb_2);
-		cur->bc_ops = &xfs_bnobt_ops;
 		cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNO]);
 	}
 
 	cur->bc_ag.agbp = agbp;
-	cur->bc_ag.agno = agno;
-	cur->bc_ag.abt.active = false;
 
-	if (xfs_sb_version_hascrc(&mp->m_sb))
-		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
+	return cur;
+}
 
+/* Create a free space btree cursor with a fake root for staging. */
+struct xfs_btree_cur *
+xfs_allocbt_stage_cursor(
+	struct xfs_mount	*mp,
+	struct xbtree_afakeroot	*afake,
+	xfs_agnumber_t		agno,
+	xfs_btnum_t		btnum)
+{
+	struct xfs_btree_cur	*cur;
+
+	cur = xfs_allocbt_init_common(mp, NULL, agno, btnum);
+	xfs_btree_stage_afakeroot(cur, afake);
 	return cur;
 }
 
+/*
+ * Install a new free space btree root.  Caller is responsible for invalidating
+ * and freeing the old btree blocks.
+ */
+void
+xfs_allocbt_commit_staged_btree(
+	struct xfs_btree_cur	*cur,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp)
+{
+	struct xfs_agf		*agf = agbp->b_addr;
+	struct xbtree_afakeroot	*afake = cur->bc_ag.afake;
+
+	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
+
+	agf->agf_roots[cur->bc_btnum] = cpu_to_be32(afake->af_root);
+	agf->agf_levels[cur->bc_btnum] = cpu_to_be32(afake->af_levels);
+	xfs_alloc_log_agf(tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
+
+	if (cur->bc_btnum == XFS_BTNUM_BNO) {
+		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_bnobt_ops);
+	} else {
+		cur->bc_flags |= XFS_BTREE_LASTREC_UPDATE;
+		xfs_btree_commit_afakeroot(cur, tp, agbp, &xfs_cntbt_ops);
+	}
+}
+
 /*
  * Calculate number of records in an alloc btree block.
  */
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.h b/fs/xfs/libxfs/xfs_alloc_btree.h
index c9305ebb69f6..047f09f0be3c 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.h
+++ b/fs/xfs/libxfs/xfs_alloc_btree.h
@@ -13,6 +13,7 @@
 struct xfs_buf;
 struct xfs_btree_cur;
 struct xfs_mount;
+struct xbtree_afakeroot;
 
 /*
  * Btree block header size depends on a superblock flag.
@@ -48,8 +49,14 @@ struct xfs_mount;
 extern struct xfs_btree_cur *xfs_allocbt_init_cursor(struct xfs_mount *,
 		struct xfs_trans *, struct xfs_buf *,
 		xfs_agnumber_t, xfs_btnum_t);
+struct xfs_btree_cur *xfs_allocbt_stage_cursor(struct xfs_mount *mp,
+		struct xbtree_afakeroot *afake, xfs_agnumber_t agno,
+		xfs_btnum_t btnum);
 extern int xfs_allocbt_maxrecs(struct xfs_mount *, int, int);
 extern xfs_extlen_t xfs_allocbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
 
+void xfs_allocbt_commit_staged_btree(struct xfs_btree_cur *cur,
+		struct xfs_trans *tp, struct xfs_buf *agbp);
+
 #endif	/* __XFS_ALLOC_BTREE_H__ */


^ permalink raw reply related	[flat|nested] 17+ messages in thread

Thread overview: 17+ messages
2020-03-12  3:45 [PATCH v4 0/7] xfs: btree bulk loading Darrick J. Wong
2020-03-12  3:45 ` [PATCH 1/7] xfs: introduce fake roots for ag-rooted btrees Darrick J. Wong
2020-03-13 14:47   ` Brian Foster
2020-03-13 16:30     ` Darrick J. Wong
2020-03-12  3:45 ` [PATCH 2/7] xfs: introduce fake roots for inode-rooted btrees Darrick J. Wong
2020-03-13 14:47   ` Brian Foster
2020-03-13 16:32     ` Darrick J. Wong
2020-03-12  3:45 ` [PATCH 3/7] xfs: support bulk loading of staged btrees Darrick J. Wong
2020-03-13 14:49   ` Brian Foster
2020-03-13 16:28     ` Darrick J. Wong
2020-03-12  3:45 ` [PATCH 4/7] xfs: add support for free space btree staging cursors Darrick J. Wong
2020-03-12  3:46 ` [PATCH 5/7] xfs: add support for inode " Darrick J. Wong
2020-03-12  3:46 ` [PATCH 6/7] xfs: add support for refcount " Darrick J. Wong
2020-03-12  3:46 ` [PATCH 7/7] xfs: add support for rmap " Darrick J. Wong
2020-03-15 23:50 [PATCH v5 0/7] xfs: btree bulk loading Darrick J. Wong
2020-03-15 23:51 ` [PATCH 4/7] xfs: add support for free space btree staging cursors Darrick J. Wong
2020-03-16 12:29   ` Brian Foster
2020-03-16 14:58     ` Darrick J. Wong
