All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/4] xfs: support bulk loading of staged btrees
Date: Sat, 7 Mar 2020 07:40:41 -0500	[thread overview]
Message-ID: <20200307124041.GA27459@bfoster> (raw)
In-Reply-To: <20200306201458.GA8045@magnolia>

On Fri, Mar 06, 2020 at 12:14:58PM -0800, Darrick J. Wong wrote:
> On Fri, Mar 06, 2020 at 12:21:29PM -0500, Brian Foster wrote:
> > On Fri, Mar 06, 2020 at 08:27:05AM -0800, Darrick J. Wong wrote:
> > > On Fri, Mar 06, 2020 at 09:22:50AM -0500, Brian Foster wrote:
> > > > On Thu, Mar 05, 2020 at 10:13:29AM -0800, Darrick J. Wong wrote:
> > > > > On Thu, Mar 05, 2020 at 09:30:29AM -0500, Brian Foster wrote:
> > > > > > On Wed, Mar 04, 2020 at 05:22:13PM -0800, Darrick J. Wong wrote:
> > > > > > > On Wed, Mar 04, 2020 at 01:21:44PM -0500, Brian Foster wrote:
> > > > > > > > On Tue, Mar 03, 2020 at 07:28:41PM -0800, Darrick J. Wong wrote:
> > > > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > > > 
> > > > > > > > > Add a new btree function that enables us to bulk load a btree cursor.
> > > > > > > > > This will be used by the upcoming online repair patches to generate new
> > > > > > > > > btrees.  This avoids the programmatic inefficiency of calling
> > > > > > > > > xfs_btree_insert in a loop (which generates a lot of log traffic) in
> > > > > > > > > favor of stamping out new btree blocks with ordered buffers, and then
> > > > > > > > > committing both the new root and scheduling the removal of the old btree
> > > > > > > > > blocks in a single transaction commit.
> > > > > > > > > 
> > > > > > > > > The design of this new generic code is based off the btree rebuilding
> > > > > > > > > code in xfs_repair's phase 5 code, with the explicit goal of enabling us
> > > > > > > > > to share that code between scrub and repair.  It has the additional
> > > > > > > > > feature of being able to control btree block loading factors.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > > > ---
> > > > > > > > >  fs/xfs/libxfs/xfs_btree.c |  581 +++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > > >  fs/xfs/libxfs/xfs_btree.h |   46 ++++
> > > > > > > > >  fs/xfs/xfs_trace.c        |    1 
> > > > > > > > >  fs/xfs/xfs_trace.h        |   85 +++++++
> > > > > > > > >  4 files changed, 712 insertions(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > > > > > > > > index 469e1e9053bb..c21db7ed8481 100644
> > > > > > > > > --- a/fs/xfs/libxfs/xfs_btree.c
> > > > > > > > > +++ b/fs/xfs/libxfs/xfs_btree.c
> > > > ...
> > > > > > > > > +	for (cur->bc_nlevels = 1; cur->bc_nlevels < XFS_BTREE_MAXLEVELS;) {
> > > > > > > > > +		uint64_t	level_blocks;
> > > > > > > > > +		uint64_t	dontcare64;
> > > > > > > > > +		unsigned int	level = cur->bc_nlevels - 1;
> > > > > > > > > +		unsigned int	avg_per_block;
> > > > > > > > > +
> > > > > > > > > +		/*
> > > > > > > > > +		 * If all the things we want to store at this level would fit
> > > > > > > > > +		 * in a single root block, then we have our btree root and are
> > > > > > > > > +		 * done.  Note that bmap btrees do not allow records in the
> > > > > > > > > +		 * root.
> > > > > > > > > +		 */
> > > > > > > > > +		if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) || level != 0) {
> > > > > > > > > +			xfs_btree_bload_level_geometry(cur, bbl, level,
> > > > > > > > > +					nr_this_level, &avg_per_block,
> > > > > > > > > +					&level_blocks, &dontcare64);
> > > > > > > > > +			if (nr_this_level <= avg_per_block) {
> > > > > > > > > +				nr_blocks++;
> > > > > > > > > +				break;
> > > > > > > > > +			}
> > > > > > > > > +		}
> > > > > > > > > +
> > > > > > > > > +		/*
> > > > > > > > > +		 * Otherwise, we have to store all the records for this level
> > > > > > > > > +		 * in blocks and therefore need another level of btree to point
> > > > > > > > > +		 * to those blocks.  Increase the number of levels and
> > > > > > > > > +		 * recompute the number of records we can store at this level
> > > > > > > > > +		 * because that can change depending on whether or not a level
> > > > > > > > > +		 * is the root level.
> > > > > > > > > +		 */
> > > > > > > > > +		cur->bc_nlevels++;
> > > > > > > > 
> > > > > > > > Hmm.. so does the ->bc_nlevels increment affect the
> > > > > > > > _bload_level_geometry() call or is it just part of the loop iteration?
> > > > > > > > If the latter, can these two _bload_level_geometry() calls be combined?
> > > > > > > 
> > > > > > > It affects the xfs_btree_bload_level_geometry call because that calls
> > > > > > > ->get_maxrecs(), which returns a different answer for the root level
> > > > > > > when the root is an inode fork.  Therefore, we cannot combine the calls.
> > > > > > > 
> > > > > > 
> > > > > > Hmm.. but doesn't this cause double calls for other cases? I.e. for
> > > > > > non-inode rooted trees it looks like we call the function once, check
> > > > > > the avg_per_block and then potentially call it again until we get to the
> > > > > > root block. Confused.. :/
> > > > > 
> > > > > Yes, we do end up computing the geometry twice per level, which frees
> > > > > the bulkload code from having to know anything at all about the
> > > > > relationship between bc_nlevels and specific behaviors of some of the
> > > > > ->maxrecs functions.
> > > > > 
> > > > 
> > > > Sort of.. I think the pattern is odd enough that the fact it needs to
> > > > accommodate this special case kind of bleeds through even though it
> > > > isn't explicit.
> > > 
> > > <nod>
> > > 
> > > > > I guess you could do:
> > > > > 
> > > > > 	xfs_btree_bload_level_geometry(...)
> > > > > 
> > > > > 	if ((!ROOT_IN_INODE || level != 0) ** nr_this_level <= avg_per_block) {
> > > > > 		nr_blocks++
> > > > > 		break
> > > > > 	}
> > > > > 
> > > > > 	nlevels++
> > > > > 
> > > > > 	if (ROOT_IN_INODE) {
> > > > > 		xfs_btree_bload_level_geometry(...)
> > > > > 	}
> > > > > 
> > > > > 	nr_blocks += level_blocks
> > > > > 	nr_this_level = level_blocks
> > > > > 
> > > > > ...which would be slightly more efficient for AG btrees, though my
> > > > > crappy perf trace showed that the overhead for the _level_geometry()
> > > > > calls is ~0.4% even for a huge ugly rmap btree because most of the time
> > > > > gets spent in the delwri_submit_buffers at the end.
> > > > > 
> > > > 
> > > > It wasn't primarily a performance concern rather than a "this sure looks
> > > > like we call the function twice per loop for no reason" comment.
> > > > Something like the above might be more clear, but I need to make sure I
> > > > understand this loop first...
> > > > 
> > > > Having stared at this some more, I _think_ I understand why this is
> > > > written as such. For the non-inode rooted case, the double call is
> > > > basically unnecessary so the whole loop could look something like this
> > > > (if we factored out the bmbt case):
> > > > 
> > > > 	for (cur->bc_nlevels = 1; ...) {
> > > > 		xfs_btree_bload_level_geometry(...);
> > > > 		if (nr_this_level <= avg_per_block) {
> > > > 			nr_blocks++;
> > > > 			break;
> > > > 		}
> > > > 		cur->bc_nlevels++;
> > > > 		nr_this_level = level_blocks;
> > > > 	}
> > > > 
> > > > Is that correct?
> > > 
> > > Yes, that is correct for inode-rooted ("non-bmbt") btrees.
> > > 
> > > > The bmbt case has these special cases where 1.) the bmbt root must be a
> > > > node block (not a leaf) and 2.) the root block has different size rules
> > > > than a typical node block because it's in the inode fork.
> > > 
> > > Right.
> > > 
> > > > The former
> > > > seems straightforward and explains the level != 0 check. The latter is
> > > > detected by the (level == ->bc_nlevels - 1) condition down in the
> > > > maxrecs code, so that means the order of ->bc_nlevels++ increment with
> > > > respect to the geometry call affects whether we check for a potential
> > > > bmbt root or regular node block (assuming level != 0).
> > > 
> > > Right.
> > > 
> > > > Hence, the bottom
> > > > part of the loop does the increment first and makes the geometry call
> > > > again... Am I following that correctly?
> > > 
> > > Correct.
> > > 
> > > > If so, I think at the very least the existing comments should start by
> > > > explaining the intentional construction of the loop and subtle ordering
> > > > requirements between ->bc_nlevels and the geometry calls for the bmbt.
> > > > In staring at it a bit more, I find something like the following more
> > > > clear even though it is more verbose:
> > > > 
> > > >        for (cur->bc_nlevels = 1; cur->bc_nlevels < XFS_BTREE_MAXLEVELS;) {
> > > >                 ...
> > > >                 xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > > >                                                &avg_per_block,&level_blocks,
> > > >                                                &dontcare64);
> > > >                 if (<inode rooted>) {
> > > >                         /* bmbt root must be node format, skip check for level 0 */
> > > >                         if (level != 0 && nr_this_level <= avg_per_block) {
> > > >                                 nr_blocks++;
> > > >                                 break;
> > > >                         }
> > > >                         /*
> > > > 			 * We have to calculate geometry for each bmbt level
> > > > 			 * twice because there is a distinction between a bmbt
> > > > 			 * root in an inode fork and a traditional node block.
> > > > 			 * This distinction is made in the btree code based on
> > > > 			 * whether level == ->bc_nlevels - 1. We aren't yet at
> > > > 			 * the root, so bump ->bc_nevels and recalculate
> > > > 			 * geometry for a traditional node block tree level.
> > > 
> > > Oooh, I like this comment.  I'll put that in, with a bit of rewording?
> > > 
> > > 	/*
> > > 	 * Otherwise, we have to store all the items for this
> > > 	 * level in traditional btree blocks and therefore need
> > > 	 * another level of btree to point to those blocks.
> > > 	 *
> > > 	 * We have to re-compute the geometry for each level of
> > > 	 * an inode-rooted btree because the geometry differs
> > > 	 * between a btree root in an inode fork and a
> > > 	 * traditional btree block.
> > > 	 *
> > > 	 * This distinction is made in the btree code based on
> > > 	 * whether level == ->bc_nlevels - 1.  We know that we
> > > 	 * aren't yet ready to populate the root, so increment
> > > 	 * ->bc_nevels and recalculate the geometry for a
> > > 	 * traditional block-based btree level.
> > > 	 */
> > > 	cur->bc_nlevels++;
> > > 	xfs_btree_bload_level_geometry(...);
> > 
> > One thing I would tweak (even when reading back my own comment) is to
> > say something like "We aren't ready to populate the root based on the
> > previous check ..." in that last sentence so it's clear the previous
> > check was the "root check." Otherwise that looks good to me, thanks!
> 
> "This distinction is made in the btree code based on whether level ==
> bc_nlevels - 1.  Based on the previous root block size check against the
> root block geometry, we know that we aren't yet ready to populate the
> root.  Increment bc_nlevels and recalculate the geometry for a
> traditional block-based btree level."?
> 

ACK

> --D
> 
> > Brian
> > 
> > > 
> > > 
> > > >                          */
> > > >                         cur->bc_nlevels++;
> > > >                         xfs_btree_bload_level_geometry();
> > > >                 } else {
> > > >                         if (nr_this_level <= avg_per_block) {
> > > >                                 nr_blocks++;
> > > >                                 break;
> > > >                         }
> > > >                         cur->bc_nlevels++;
> > > >                 }
> > > > 
> > > >                 nr_blocks += level_blocks;
> > > >                 nr_this_level = level_blocks;
> > > >         }
> > > > 
> > > > The comments and whatnot could use massaging and perhaps it would still
> > > > be fine to factor out the root block check from the if/else, but that
> > > > illustrates the idea. Thoughts?
> > > 
> > > That structure (and the more verbose commenting) looks good to me.
> > > Truth be told, it took me quite a while to work out all these weird
> > > kinks vs. the phase5.c code which never had to deal with bmbts.
> > > 
> > > Onto the next email...
> > > 
> > > --D
> > > 
> > > > Brian
> > > > 
> > > > > --D
> > > > > 
> > > > > > Brian
> > > > > > 
> > > > > > > > 
> > > > > > > > > +		xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > > > > > > > > +				&avg_per_block, &level_blocks, &dontcare64);
> > > > > > > > > +		nr_blocks += level_blocks;
> > > > > > > > > +		nr_this_level = level_blocks;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	if (cur->bc_nlevels == XFS_BTREE_MAXLEVELS)
> > > > > > > > > +		return -EOVERFLOW;
> > > > > > > > > +
> > > > > > > > > +	bbl->btree_height = cur->bc_nlevels;
> > > > > > > > > +	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE)
> > > > > > > > > +		bbl->nr_blocks = nr_blocks - 1;
> > > > > > > > > +	else
> > > > > > > > > +		bbl->nr_blocks = nr_blocks;
> > > > > > > > > +	return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Bulk load a btree.
> > > > > > > > > + *
> > > > > > > > > + * Load @bbl->nr_records quantity of records into a btree using the supplied
> > > > > > > > > + * empty and staging btree cursor @cur and a @bbl that has been filled out by
> > > > > > > > > + * the xfs_btree_bload_compute_geometry function.
> > > > > > > > > + *
> > > > > > > > > + * The @bbl->get_data function must populate the cursor's bc_rec every time it
> > > > > > > > > + * is called.  The @bbl->alloc_block function will be used to allocate new
> > > > > > > > > + * btree blocks.  @priv is passed to both functions.
> > > > > > > > > + *
> > > > > > > > > + * Caller must ensure that @cur is a staging cursor.  Any existing btree rooted
> > > > > > > > > + * in the fakeroot will be lost, so do not call this function twice.
> > > > > > > > > + */
> > > > > > > > > +int
> > > > > > > > > +xfs_btree_bload(
> > > > > > > > > +	struct xfs_btree_cur		*cur,
> > > > > > > > > +	struct xfs_btree_bload		*bbl,
> > > > > > > > > +	void				*priv)
> > > > > > > > > +{
> > > > > > > > > +	union xfs_btree_ptr		child_ptr;
> > > > > > > > > +	union xfs_btree_ptr		ptr;
> > > > > > > > > +	struct xfs_buf			*bp = NULL;
> > > > > > > > > +	struct xfs_btree_block		*block = NULL;
> > > > > > > > > +	uint64_t			nr_this_level = bbl->nr_records;
> > > > > > > > > +	uint64_t			blocks;
> > > > > > > > > +	uint64_t			i;
> > > > > > > > > +	uint64_t			blocks_with_extra;
> > > > > > > > > +	uint64_t			total_blocks = 0;
> > > > > > > > > +	unsigned int			avg_per_block;
> > > > > > > > > +	unsigned int			level = 0;
> > > > > > > > > +	int				ret;
> > > > > > > > > +
> > > > > > > > > +	ASSERT(cur->bc_flags & XFS_BTREE_STAGING);
> > > > > > > > > +
> > > > > > > > > +	INIT_LIST_HEAD(&bbl->buffers_list);
> > > > > > > > > +	cur->bc_nlevels = bbl->btree_height;
> > > > > > > > > +	xfs_btree_set_ptr_null(cur, &child_ptr);
> > > > > > > > > +	xfs_btree_set_ptr_null(cur, &ptr);
> > > > > > > > > +
> > > > > > > > > +	xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > > > > > > > > +			&avg_per_block, &blocks, &blocks_with_extra);
> > > > > > > > > +
> > > > > > > > > +	/* Load each leaf block. */
> > > > > > > > > +	for (i = 0; i < blocks; i++) {
> > > > > > > > > +		unsigned int		nr_this_block = avg_per_block;
> > > > > > > > > +
> > > > > > > > > +		if (i < blocks_with_extra)
> > > > > > > > > +			nr_this_block++;
> > > > > > > > > +
> > > > > > > > > +		ret = xfs_btree_bload_prep_block(cur, bbl, level,
> > > > > > > > > +				nr_this_block, &ptr, &bp, &block, priv);
> > > > > > > > > +		if (ret)
> > > > > > > > > +			return ret;
> > > > > > > > > +
> > > > > > > > > +		trace_xfs_btree_bload_block(cur, level, i, blocks, &ptr,
> > > > > > > > > +				nr_this_block);
> > > > > > > > > +
> > > > > > > > > +		ret = xfs_btree_bload_leaf(cur, nr_this_block, bbl->get_data,
> > > > > > > > > +				block, priv);
> > > > > > > > > +		if (ret)
> > > > > > > > > +			goto out;
> > > > > > > > > +
> > > > > > > > > +		/* Record the leftmost pointer to start the next level. */
> > > > > > > > > +		if (i == 0)
> > > > > > > > > +			xfs_btree_copy_ptrs(cur, &child_ptr, &ptr, 1);
> > > > > > > > 
> > > > > > > > "leftmost pointer" refers to the leftmost leaf block..?
> > > > > > > 
> > > > > > > Yes.  "Record the leftmost leaf pointer so we know where to start with
> > > > > > > the first node level." ?
> > > > > > > 
> > > > > > > > > +	}
> > > > > > > > > +	total_blocks += blocks;
> > > > > > > > > +	xfs_btree_bload_drop_buf(bbl, cur->bc_tp, &bp);
> > > > > > > > > +
> > > > > > > > > +	/* Populate the internal btree nodes. */
> > > > > > > > > +	for (level = 1; level < cur->bc_nlevels; level++) {
> > > > > > > > > +		union xfs_btree_ptr	first_ptr;
> > > > > > > > > +
> > > > > > > > > +		nr_this_level = blocks;
> > > > > > > > > +		block = NULL;
> > > > > > > > > +		xfs_btree_set_ptr_null(cur, &ptr);
> > > > > > > > > +
> > > > > > > > > +		xfs_btree_bload_level_geometry(cur, bbl, level, nr_this_level,
> > > > > > > > > +				&avg_per_block, &blocks, &blocks_with_extra);
> > > > > > > > > +
> > > > > > > > > +		/* Load each node block. */
> > > > > > > > > +		for (i = 0; i < blocks; i++) {
> > > > > > > > > +			unsigned int	nr_this_block = avg_per_block;
> > > > > > > > > +
> > > > > > > > > +			if (i < blocks_with_extra)
> > > > > > > > > +				nr_this_block++;
> > > > > > > > > +
> > > > > > > > > +			ret = xfs_btree_bload_prep_block(cur, bbl, level,
> > > > > > > > > +					nr_this_block, &ptr, &bp, &block,
> > > > > > > > > +					priv);
> > > > > > > > > +			if (ret)
> > > > > > > > > +				return ret;
> > > > > > > > > +
> > > > > > > > > +			trace_xfs_btree_bload_block(cur, level, i, blocks,
> > > > > > > > > +					&ptr, nr_this_block);
> > > > > > > > > +
> > > > > > > > > +			ret = xfs_btree_bload_node(cur, nr_this_block,
> > > > > > > > > +					&child_ptr, block);
> > > > > > > > > +			if (ret)
> > > > > > > > > +				goto out;
> > > > > > > > > +
> > > > > > > > > +			/*
> > > > > > > > > +			 * Record the leftmost pointer to start the next level.
> > > > > > > > > +			 */
> > > > > > > > 
> > > > > > > > And the same thing here. I think the generic ptr name is a little
> > > > > > > > confusing, though I don't have a better suggestion. I think it would
> > > > > > > > help if the comments were more explicit to say something like: "ptr
> > > > > > > > refers to the current block addr. Save the first block in the current
> > > > > > > > level so the next level up knows where to start looking for keys."
> > > > > > > 
> > > > > > > Yes, I'll do that:
> > > > > > > 
> > > > > > > "Record the leftmost node pointer so that we know where to start the
> > > > > > > next node level above this one."
> > > > > > > 
> > > > > > > Thanks for reviewing!
> > > > > > > 
> > > > > > > --D
> > > > > > > 
> > > > > > > > Brian
> > > > > > > > 
> > > > > > > > > +			if (i == 0)
> > > > > > > > > +				xfs_btree_copy_ptrs(cur, &first_ptr, &ptr, 1);
> > > > > > > > > +		}
> > > > > > > > > +		total_blocks += blocks;
> > > > > > > > > +		xfs_btree_bload_drop_buf(bbl, cur->bc_tp, &bp);
> > > > > > > > > +		xfs_btree_copy_ptrs(cur, &child_ptr, &first_ptr, 1);
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	/* Initialize the new root. */
> > > > > > > > > +	if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) {
> > > > > > > > > +		ASSERT(xfs_btree_ptr_is_null(cur, &ptr));
> > > > > > > > > +		cur->bc_private.b.ifake->if_levels = cur->bc_nlevels;
> > > > > > > > > +		cur->bc_private.b.ifake->if_blocks = total_blocks - 1;
> > > > > > > > > +	} else {
> > > > > > > > > +		cur->bc_private.a.afake->af_root = be32_to_cpu(ptr.s);
> > > > > > > > > +		cur->bc_private.a.afake->af_levels = cur->bc_nlevels;
> > > > > > > > > +		cur->bc_private.a.afake->af_blocks = total_blocks;
> > > > > > > > > +	}
> > > > > > > > > +
> > > > > > > > > +	/*
> > > > > > > > > +	 * Write the new blocks to disk.  If the ordered list isn't empty after
> > > > > > > > > +	 * that, then something went wrong and we have to fail.  This should
> > > > > > > > > +	 * never happen, but we'll check anyway.
> > > > > > > > > +	 */
> > > > > > > > > +	ret = xfs_buf_delwri_submit(&bbl->buffers_list);
> > > > > > > > > +	if (ret)
> > > > > > > > > +		goto out;
> > > > > > > > > +	if (!list_empty(&bbl->buffers_list)) {
> > > > > > > > > +		ASSERT(list_empty(&bbl->buffers_list));
> > > > > > > > > +		ret = -EIO;
> > > > > > > > > +	}
> > > > > > > > > +out:
> > > > > > > > > +	xfs_buf_delwri_cancel(&bbl->buffers_list);
> > > > > > > > > +	if (bp)
> > > > > > > > > +		xfs_trans_brelse(cur->bc_tp, bp);
> > > > > > > > > +	return ret;
> > > > > > > > > +}
> > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > > > > > > > > index 2965ed663418..51720de366ae 100644
> > > > > > > > > --- a/fs/xfs/libxfs/xfs_btree.h
> > > > > > > > > +++ b/fs/xfs/libxfs/xfs_btree.h
> > > > > > > > > @@ -578,4 +578,50 @@ void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur,
> > > > > > > > >  void xfs_btree_commit_ifakeroot(struct xfs_btree_cur *cur, int whichfork,
> > > > > > > > >  		const struct xfs_btree_ops *ops);
> > > > > > > > >  
> > > > > > > > > +typedef int (*xfs_btree_bload_get_fn)(struct xfs_btree_cur *cur, void *priv);
> > > > > > > > > +typedef int (*xfs_btree_bload_alloc_block_fn)(struct xfs_btree_cur *cur,
> > > > > > > > > +		union xfs_btree_ptr *ptr, void *priv);
> > > > > > > > > +typedef size_t (*xfs_btree_bload_iroot_size_fn)(struct xfs_btree_cur *cur,
> > > > > > > > > +		unsigned int nr_this_level, void *priv);
> > > > > > > > > +
> > > > > > > > > +/* Bulk loading of staged btrees. */
> > > > > > > > > +struct xfs_btree_bload {
> > > > > > > > > +	/* Buffer list for delwri_queue. */
> > > > > > > > > +	struct list_head		buffers_list;
> > > > > > > > > +
> > > > > > > > > +	/* Function to store a record in the cursor. */
> > > > > > > > > +	xfs_btree_bload_get_fn		get_data;
> > > > > > > > > +
> > > > > > > > > +	/* Function to allocate a block for the btree. */
> > > > > > > > > +	xfs_btree_bload_alloc_block_fn	alloc_block;
> > > > > > > > > +
> > > > > > > > > +	/* Function to compute the size of the in-core btree root block. */
> > > > > > > > > +	xfs_btree_bload_iroot_size_fn	iroot_size;
> > > > > > > > > +
> > > > > > > > > +	/* Number of records the caller wants to store. */
> > > > > > > > > +	uint64_t			nr_records;
> > > > > > > > > +
> > > > > > > > > +	/* Number of btree blocks needed to store those records. */
> > > > > > > > > +	uint64_t			nr_blocks;
> > > > > > > > > +
> > > > > > > > > +	/*
> > > > > > > > > +	 * Number of free records to leave in each leaf block.  If this (or
> > > > > > > > > +	 * any of the slack values) are negative, this will be computed to
> > > > > > > > > +	 * be halfway between maxrecs and minrecs.  This typically leaves the
> > > > > > > > > +	 * block 75% full.
> > > > > > > > > +	 */
> > > > > > > > > +	int				leaf_slack;
> > > > > > > > > +
> > > > > > > > > +	/* Number of free keyptrs to leave in each node block. */
> > > > > > > > > +	int				node_slack;
> > > > > > > > > +
> > > > > > > > > +	/* Computed btree height. */
> > > > > > > > > +	unsigned int			btree_height;
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > +int xfs_btree_bload_compute_geometry(struct xfs_btree_cur *cur,
> > > > > > > > > +		struct xfs_btree_bload *bbl, uint64_t nr_records);
> > > > > > > > > +int xfs_btree_bload(struct xfs_btree_cur *cur, struct xfs_btree_bload *bbl,
> > > > > > > > > +		void *priv);
> > > > > > > > > +
> > > > > > > > >  #endif	/* __XFS_BTREE_H__ */
> > > > > > > > > diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> > > > > > > > > index bc85b89f88ca..9b5e58a92381 100644
> > > > > > > > > --- a/fs/xfs/xfs_trace.c
> > > > > > > > > +++ b/fs/xfs/xfs_trace.c
> > > > > > > > > @@ -6,6 +6,7 @@
> > > > > > > > >  #include "xfs.h"
> > > > > > > > >  #include "xfs_fs.h"
> > > > > > > > >  #include "xfs_shared.h"
> > > > > > > > > +#include "xfs_bit.h"
> > > > > > > > >  #include "xfs_format.h"
> > > > > > > > >  #include "xfs_log_format.h"
> > > > > > > > >  #include "xfs_trans_resv.h"
> > > > > > > > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > > > > > > > index 7e162ca80c92..69e8605f9f97 100644
> > > > > > > > > --- a/fs/xfs/xfs_trace.h
> > > > > > > > > +++ b/fs/xfs/xfs_trace.h
> > > > > > > > > @@ -35,6 +35,7 @@ struct xfs_icreate_log;
> > > > > > > > >  struct xfs_owner_info;
> > > > > > > > >  struct xfs_trans_res;
> > > > > > > > >  struct xfs_inobt_rec_incore;
> > > > > > > > > +union xfs_btree_ptr;
> > > > > > > > >  
> > > > > > > > >  DECLARE_EVENT_CLASS(xfs_attr_list_class,
> > > > > > > > >  	TP_PROTO(struct xfs_attr_list_context *ctx),
> > > > > > > > > @@ -3655,6 +3656,90 @@ TRACE_EVENT(xfs_btree_commit_ifakeroot,
> > > > > > > > >  		  __entry->blocks)
> > > > > > > > >  )
> > > > > > > > >  
> > > > > > > > > +TRACE_EVENT(xfs_btree_bload_level_geometry,
> > > > > > > > > +	TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
> > > > > > > > > +		 uint64_t nr_this_level, unsigned int nr_per_block,
> > > > > > > > > +		 unsigned int desired_npb, uint64_t blocks,
> > > > > > > > > +		 uint64_t blocks_with_extra),
> > > > > > > > > +	TP_ARGS(cur, level, nr_this_level, nr_per_block, desired_npb, blocks,
> > > > > > > > > +		blocks_with_extra),
> > > > > > > > > +	TP_STRUCT__entry(
> > > > > > > > > +		__field(dev_t, dev)
> > > > > > > > > +		__field(xfs_btnum_t, btnum)
> > > > > > > > > +		__field(unsigned int, level)
> > > > > > > > > +		__field(unsigned int, nlevels)
> > > > > > > > > +		__field(uint64_t, nr_this_level)
> > > > > > > > > +		__field(unsigned int, nr_per_block)
> > > > > > > > > +		__field(unsigned int, desired_npb)
> > > > > > > > > +		__field(unsigned long long, blocks)
> > > > > > > > > +		__field(unsigned long long, blocks_with_extra)
> > > > > > > > > +	),
> > > > > > > > > +	TP_fast_assign(
> > > > > > > > > +		__entry->dev = cur->bc_mp->m_super->s_dev;
> > > > > > > > > +		__entry->btnum = cur->bc_btnum;
> > > > > > > > > +		__entry->level = level;
> > > > > > > > > +		__entry->nlevels = cur->bc_nlevels;
> > > > > > > > > +		__entry->nr_this_level = nr_this_level;
> > > > > > > > > +		__entry->nr_per_block = nr_per_block;
> > > > > > > > > +		__entry->desired_npb = desired_npb;
> > > > > > > > > +		__entry->blocks = blocks;
> > > > > > > > > +		__entry->blocks_with_extra = blocks_with_extra;
> > > > > > > > > +	),
> > > > > > > > > +	TP_printk("dev %d:%d btree %s level %u/%u nr_this_level %llu nr_per_block %u desired_npb %u blocks %llu blocks_with_extra %llu",
> > > > > > > > > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > > > > > > > > +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> > > > > > > > > +		  __entry->level,
> > > > > > > > > +		  __entry->nlevels,
> > > > > > > > > +		  __entry->nr_this_level,
> > > > > > > > > +		  __entry->nr_per_block,
> > > > > > > > > +		  __entry->desired_npb,
> > > > > > > > > +		  __entry->blocks,
> > > > > > > > > +		  __entry->blocks_with_extra)
> > > > > > > > > +)
> > > > > > > > > +
> > > > > > > > > +TRACE_EVENT(xfs_btree_bload_block,
> > > > > > > > > +	TP_PROTO(struct xfs_btree_cur *cur, unsigned int level,
> > > > > > > > > +		 uint64_t block_idx, uint64_t nr_blocks,
> > > > > > > > > +		 union xfs_btree_ptr *ptr, unsigned int nr_records),
> > > > > > > > > +	TP_ARGS(cur, level, block_idx, nr_blocks, ptr, nr_records),
> > > > > > > > > +	TP_STRUCT__entry(
> > > > > > > > > +		__field(dev_t, dev)
> > > > > > > > > +		__field(xfs_btnum_t, btnum)
> > > > > > > > > +		__field(unsigned int, level)
> > > > > > > > > +		__field(unsigned long long, block_idx)
> > > > > > > > > +		__field(unsigned long long, nr_blocks)
> > > > > > > > > +		__field(xfs_agnumber_t, agno)
> > > > > > > > > +		__field(xfs_agblock_t, agbno)
> > > > > > > > > +		__field(unsigned int, nr_records)
> > > > > > > > > +	),
> > > > > > > > > +	TP_fast_assign(
> > > > > > > > > +		__entry->dev = cur->bc_mp->m_super->s_dev;
> > > > > > > > > +		__entry->btnum = cur->bc_btnum;
> > > > > > > > > +		__entry->level = level;
> > > > > > > > > +		__entry->block_idx = block_idx;
> > > > > > > > > +		__entry->nr_blocks = nr_blocks;
> > > > > > > > > +		if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
> > > > > > > > > +			xfs_fsblock_t	fsb = be64_to_cpu(ptr->l);
> > > > > > > > > +
> > > > > > > > > +			__entry->agno = XFS_FSB_TO_AGNO(cur->bc_mp, fsb);
> > > > > > > > > +			__entry->agbno = XFS_FSB_TO_AGBNO(cur->bc_mp, fsb);
> > > > > > > > > +		} else {
> > > > > > > > > +			__entry->agno = cur->bc_private.a.agno;
> > > > > > > > > +			__entry->agbno = be32_to_cpu(ptr->s);
> > > > > > > > > +		}
> > > > > > > > > +		__entry->nr_records = nr_records;
> > > > > > > > > +	),
> > > > > > > > > +	TP_printk("dev %d:%d btree %s level %u block %llu/%llu fsb (%u/%u) recs %u",
> > > > > > > > > +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> > > > > > > > > +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> > > > > > > > > +		  __entry->level,
> > > > > > > > > +		  __entry->block_idx,
> > > > > > > > > +		  __entry->nr_blocks,
> > > > > > > > > +		  __entry->agno,
> > > > > > > > > +		  __entry->agbno,
> > > > > > > > > +		  __entry->nr_records)
> > > > > > > > > +)
> > > > > > > > > +
> > > > > > > > >  #endif /* _TRACE_XFS_H */
> > > > > > > > >  
> > > > > > > > >  #undef TRACE_INCLUDE_PATH
> > > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 


  reply	other threads:[~2020-03-07 12:40 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-04  3:28 [PATCH v3 0/4] xfs: btree bulk loading Darrick J. Wong
2020-03-04  3:28 ` [PATCH 1/4] xfs: introduce fake roots for ag-rooted btrees Darrick J. Wong
2020-03-04 18:21   ` Brian Foster
2020-03-04 23:34     ` Darrick J. Wong
2020-03-04 23:53       ` Dave Chinner
2020-03-05  1:23         ` Darrick J. Wong
2020-03-05 14:30       ` Brian Foster
2020-03-05 17:39         ` Darrick J. Wong
2020-03-05  1:20   ` Dave Chinner
2020-03-05  1:23     ` Darrick J. Wong
2020-03-04  3:28 ` [PATCH 2/4] xfs: introduce fake roots for inode-rooted btrees Darrick J. Wong
2020-03-04 23:40   ` Darrick J. Wong
2020-03-04  3:28 ` [PATCH 3/4] xfs: support bulk loading of staged btrees Darrick J. Wong
2020-03-04 18:21   ` Brian Foster
2020-03-05  1:22     ` Darrick J. Wong
2020-03-05 14:30       ` Brian Foster
2020-03-05 18:13         ` Darrick J. Wong
2020-03-06 14:22           ` Brian Foster
2020-03-06 16:27             ` Darrick J. Wong
2020-03-06 17:21               ` Brian Foster
2020-03-06 20:14                 ` Darrick J. Wong
2020-03-07 12:40                   ` Brian Foster [this message]
2020-03-05 23:59         ` Darrick J. Wong
2020-03-06 14:23           ` Brian Foster
2020-03-06 16:51             ` Darrick J. Wong
2020-03-06 17:25               ` Brian Foster
2020-03-06 19:55                 ` Darrick J. Wong
2020-03-04  3:28 ` [PATCH 4/4] xfs: support staging cursors for per-AG btree types Darrick J. Wong
2020-03-05 14:30   ` Brian Foster
2020-03-05 18:18     ` Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2020-01-01  1:00 [PATCH v2 0/4] xfs: btree bulk loading Darrick J. Wong
2020-01-01  1:01 ` [PATCH 3/4] xfs: support bulk loading of staged btrees Darrick J. Wong
2019-10-29 23:30 [PATCH v2 0/4] xfs: btree bulk loading Darrick J. Wong
2019-10-29 23:30 ` [PATCH 3/4] xfs: support bulk loading of staged btrees Darrick J. Wong
2019-10-09 16:47 [PATCH 0/4] xfs: btree bulk loading Darrick J. Wong
2019-10-09 16:48 ` [PATCH 3/4] xfs: support bulk loading of staged btrees Darrick J. Wong
2019-10-16 15:26   ` Brian Foster
2019-10-16 18:15     ` Darrick J. Wong
2019-10-16 21:07       ` Brian Foster
2019-10-17  0:40         ` Darrick J. Wong
2019-10-17  9:32           ` Brian Foster
2019-10-17 19:06             ` Darrick J. Wong
2019-10-18 14:38               ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200307124041.GA27459@bfoster \
    --to=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.