linux-xfs.vger.kernel.org archive mirror
* [PATCH RFC 0/5] XFS near block allocation algorithm prototype
@ 2018-12-14 12:34 Brian Foster
  2018-12-14 12:34 ` [PATCH RFC 1/5] xfs: factor out bnobt based near allocation algorithm Brian Foster
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Brian Foster @ 2018-12-14 12:34 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This is a prototype of a rework for the NEAR type block allocation
algorithm in XFS. As a bit of background, this came up recently in a
thread[1] from a user report of terrible allocation performance on a
filesystem with at least one AG with highly fragmented free space. The
cause of the performance degradation is the reliance on the brute force
bnobt search as a fallback in the current algorithm.

A couple ideas materialized through this discussion. The first was to
take advantage of the fact that a cntbt extent lookup already supports
the ability to use the agbno as a secondary key. This allows us to find
an extent with ideal locality of a particular size and could help
optimize the NEAR algorithm. The second was a high level observation
that each of the current near, size and exact allocation types has a
unique allocation algorithm implementation, and that these ultimately
aren't all that much different from each other. While playing around with this
further, I also pondered the idea of storing maximum record length data
in non-leaf bnobt keys in support of an optimized seek mechanism (i.e.,
find the next closest extent of a minimum size without having to process
each record), but that's a thought experiment and separate discussion
atm.

The purpose of this prototype is to try and rework the NEAR allocation
algorithm into something that is a bit more efficient with an eye
towards longer term reuse for the other higher level allocation types.
At the moment, this prototype only covers the common case of
>=args.maxlen extents. It is very lightly tested and only performance
tested enough to determine that it prospectively improves allocation
performance on a metadump associated with [1] without any obvious
regression from a very simple fs_mark test. I see a factor of 10
increase in small file allocation performance (files/sec) on the
associated fs_mark test. I'm sending an RFC at this point because this
is the minimal implementation that includes potential high level
algorithm changes. I would obviously like to solicit any thoughts,
feedback or ideas on the core allocation algorithm before getting too
far into refining this further.

On the details of the algorithm update, the original thought was to
replace the bnobt scan with a basic cntbt lookup (that incorporated
agbno). I ended up with an approach that performs iterative cntbt lookups
in combination with a bnobt scan for several reasons. First, the bnobt
lookup is actually more efficient for smaller allocations. If we
consider that NEAR lookups are used not just for user data, but for
inodes, inode btree blocks, bmap btree blocks, etc., it's pretty clear
that many of these are generally smaller (even single block) allocations
that are trivially completed with a bnobt lookup. Second, I didn't have
a great means otherwise to determine when to cap cntbt lookups outside
of going to the end of the tree. Third, a one time agbno+len cntbt
lookup is essentially a crapshoot with regard to locality, so I don't
think that is an appropriate solution.

The idea of using both trees in this manner is that they essentially
balance each other out. Smaller allocations are likely to be more
efficient through the bnobt, while larger allocations are likely to be
more efficient through the cntbt. Admittedly, I still need to figure out
a way to measure "allocation locality effectiveness" so I can at least
compare with the current algorithm.

Beyond the obvious need for design review and more thorough regression
and performance testing, etc., there are still plenty of changes
required to make this production worthy. I expect to add minlen support
for the case where there are no maxlen extents available as well as to
fold in the small allocation case and eventually replace (i.e. remove)
the existing NEAR alloc code. The prototype implementation is simply
bolted on in a manner to facilitate experimentation.

In the longer term, I think the implementation can fairly easily support
the existing EXACT and SIZE allocation modes with some minor
enhancements to consider allocation type at the appropriate points. For
example, a SIZE allocation could simply elide bnobt cursor allocation at
setup time and stop the search as soon as the first usable extent is
found. In other words, it's a NEAR search without any locality
requirement. I think we could do something similar for the EXACT case by
allocating a single bnobt cursor and stopping the allocation sequence as
soon as we process the first extent, regardless of whether it satisfies
the exact allocation requirement.

Finally, note that the first four patches are refactors/cleanups of the
existing code but aren't technically required for patch 5. I started out
on this by refactoring the existing code in anticipation of shifting
bits around and whatnot (and because it made it easier to reason about)
and instead ended up bolting on additional code for the time being. I'm
including the refactoring patches simply because they are the base in my
tree for patch 5, but for the purposes of the RFC they can probably be
ignored.

Thoughts, reviews, flames appreciated.

Brian

[1] https://marc.info/?l=linux-xfs&m=154028142608367&w=2

Brian Foster (5):
  xfs: factor out bnobt based near allocation algorithm
  xfs: factor out cntbt based near allocation algorithm
  xfs: refactor near alloc small case
  xfs: clean up variable names in xfs_alloc_ag_vextent_near()
  xfs: new cntbt based near allocation algorithm

 fs/xfs/libxfs/xfs_alloc.c | 956 +++++++++++++++++++++++++++++---------
 fs/xfs/xfs_trace.h        |   1 +
 2 files changed, 726 insertions(+), 231 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH RFC 1/5] xfs: factor out bnobt based near allocation algorithm
  2018-12-14 12:34 [PATCH RFC 0/5] XFS near block allocation algorithm prototype Brian Foster
@ 2018-12-14 12:34 ` Brian Foster
  2018-12-14 12:34 ` [PATCH RFC 2/5] xfs: factor out cntbt " Brian Foster
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2018-12-14 12:34 UTC (permalink / raw)
  To: linux-xfs

- mostly for reviewability
- clean up some var names, labels, logic format, comments
- use -EAGAIN to implement busy restart

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 486 +++++++++++++++++++++-----------------
 1 file changed, 265 insertions(+), 221 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index e1c0c0d2f1b0..76affbcbd5ff 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -980,6 +980,228 @@ xfs_alloc_find_best_extent(
 	return error;
 }
 
+/*
+ * Second NEAR allocation algorithm. Search in the by-bno tree to the left and
+ * to the right simultaneously until in each case we find a space big enough or
+ * run into the edge of the tree. When we run into the edge, we deallocate that
+ * cursor. If both searches succeed, we compare the two spaces and pick the
+ * better one. With alignment, it's possible for both to fail; the upper level
+ * algorithm that picks allocation groups for allocations is not supposed to do
+ * this.
+ */
+STATIC int
+xfs_alloc_ag_vextent_near_bnobt(
+	struct xfs_alloc_arg	*args,
+	struct xfs_btree_cur	*cnt_cur,
+	bool			*busy,
+	unsigned		*busy_gen)
+{
+	struct xfs_btree_cur	*bno_cur_gt;
+	struct xfs_btree_cur	*bno_cur_lt;
+	xfs_agblock_t		gtbno;	/* start bno of right side entry */
+	xfs_agblock_t		gtbnoa;	/* aligned ... */
+	xfs_extlen_t		gtdiff;	/* difference to right side entry */
+	xfs_extlen_t		gtlen;	/* length of right side entry */
+	xfs_extlen_t		gtlena;	/* aligned ... */
+	xfs_agblock_t		gtnew;	/* useful start bno of right side */
+	int			error;	/* error code */
+	int			i;	/* result code, temporary */
+	int			j;	/* result code, temporary */
+	xfs_agblock_t		ltbno;	/* start bno of left side entry */
+	xfs_agblock_t		ltbnoa;	/* aligned ... */
+	xfs_extlen_t		ltdiff;	/* difference to left side entry */
+	xfs_extlen_t		ltlen;	/* length of left side entry */
+	xfs_extlen_t		ltlena;	/* aligned ... */
+	xfs_agblock_t		ltnew;	/* useful start bno of left side */
+	xfs_extlen_t		rlen;	/* length of returned extent */
+
+	/*
+	 * Allocate and initialize the cursor for the leftward search.
+	 */
+	bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
+		args->agno, XFS_BTNUM_BNO);
+	/*
+	 * Lookup <= bno to find the leftward search's starting point.
+	 */
+	error = xfs_alloc_lookup_le(bno_cur_lt, args->agbno, args->maxlen, &i);
+	if (error)
+		goto error;
+	if (!i) {
+		/*
+		 * Didn't find anything; use this cursor for the rightward
+		 * search.
+		 */
+		bno_cur_gt = bno_cur_lt;
+		bno_cur_lt = NULL;
+	}
+	/*
+	 * Found something.  Duplicate the cursor for the rightward search.
+	 */
+	else if ((error = xfs_btree_dup_cursor(bno_cur_lt, &bno_cur_gt)))
+		goto error;
+	/*
+	 * Increment the cursor, so we will point at the entry just right
+	 * of the leftward entry if any, or to the leftmost entry.
+	 */
+	error = xfs_btree_increment(bno_cur_gt, 0, &i);
+	if (error)
+		goto error;
+	if (!i) {
+		/*
+		 * It failed, there are no rightward entries.
+		 */
+		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_NOERROR);
+		bno_cur_gt = NULL;
+	}
+	/*
+	 * Loop going left with the leftward cursor, right with the
+	 * rightward cursor, until either both directions give up or
+	 * we find an entry at least as big as minlen.
+	 */
+	do {
+		if (bno_cur_lt) {
+			error = xfs_alloc_get_rec(bno_cur_lt, &ltbno, &ltlen, &i);
+			if (error)
+				goto error;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
+			*busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
+					&ltbnoa, &ltlena, busy_gen);
+			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
+				break;
+			error = xfs_btree_decrement(bno_cur_lt, 0, &i);
+			if (error)
+				goto error;
+			if (!i || ltbnoa < args->min_agbno) {
+				xfs_btree_del_cursor(bno_cur_lt,
+						     XFS_BTREE_NOERROR);
+				bno_cur_lt = NULL;
+			}
+		}
+		if (bno_cur_gt) {
+			error = xfs_alloc_get_rec(bno_cur_gt, &gtbno, &gtlen, &i);
+			if (error)
+				goto error;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
+			*busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
+					&gtbnoa, &gtlena, busy_gen);
+			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
+				break;
+			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
+				goto error;
+			if (!i || gtbnoa > args->max_agbno) {
+				xfs_btree_del_cursor(bno_cur_gt,
+						     XFS_BTREE_NOERROR);
+				bno_cur_gt = NULL;
+			}
+		}
+	} while (bno_cur_lt || bno_cur_gt);
+
+	/*
+	 * Got both cursors still active, need to find better entry.
+	 */
+	if (bno_cur_lt && bno_cur_gt) {
+		if (ltlena >= args->minlen) {
+			/*
+			 * Left side is good, look for a right side entry.
+			 */
+			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
+			xfs_alloc_fix_len(args);
+			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
+				args->alignment, args->datatype, ltbnoa,
+				ltlena, &ltnew);
+
+			error = xfs_alloc_find_best_extent(args,
+						&bno_cur_lt, &bno_cur_gt,
+						ltdiff, &gtbno, &gtlen,
+						&gtbnoa, &gtlena,
+						0 /* search right */);
+		} else {
+			ASSERT(gtlena >= args->minlen);
+
+			/*
+			 * Right side is good, look for a left side entry.
+			 */
+			args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
+			xfs_alloc_fix_len(args);
+			gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
+				args->alignment, args->datatype, gtbnoa,
+				gtlena, &gtnew);
+
+			error = xfs_alloc_find_best_extent(args,
+						&bno_cur_gt, &bno_cur_lt,
+						gtdiff, &ltbno, &ltlen,
+						&ltbnoa, &ltlena,
+						1 /* search left */);
+		}
+
+		if (error)
+			goto error;
+	}
+
+	/*
+	 * If we couldn't get anything, give up.
+	 */
+	if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
+		if (*busy)
+			return -EAGAIN;
+		trace_xfs_alloc_size_neither(args);
+		args->agbno = NULLAGBLOCK;
+		return 0;
+	}
+
+	/*
+	 * At this point we have selected a freespace entry, either to the
+	 * left or to the right.  If it's on the right, copy all the
+	 * useful variables to the "left" set so we only have one
+	 * copy of this code.
+	 */
+	if (bno_cur_gt) {
+		bno_cur_lt = bno_cur_gt;
+		bno_cur_gt = NULL;
+		ltbno = gtbno;
+		ltbnoa = gtbnoa;
+		ltlen = gtlen;
+		ltlena = gtlena;
+		j = 1;
+	} else
+		j = 0;
+
+	/*
+	 * Fix up the length and compute the useful address.
+	 */
+	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
+	xfs_alloc_fix_len(args);
+	rlen = args->len;
+	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
+				     args->datatype, ltbnoa, ltlena, &ltnew);
+	ASSERT(ltnew >= ltbno);
+	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
+	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
+	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
+	args->agbno = ltnew;
+
+	error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen, ltnew,
+				rlen, XFSA_FIXUP_BNO_OK);
+	if (error)
+		goto error;
+
+	if (j)
+		trace_xfs_alloc_near_greater(args);
+	else
+		trace_xfs_alloc_near_lesser(args);
+
+	xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
+	return 0;
+
+error:
+	if (bno_cur_lt)
+		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_ERROR);
+	if (bno_cur_gt)
+		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_ERROR);
+	return error;
+
+}
+
 /*
  * Allocate a variable extent near bno in the allocation group agno.
  * Extent's length (returned in len) will be between minlen and maxlen,
@@ -990,15 +1212,8 @@ STATIC int				/* error */
 xfs_alloc_ag_vextent_near(
 	xfs_alloc_arg_t	*args)		/* allocation argument structure */
 {
-	xfs_btree_cur_t	*bno_cur_gt;	/* cursor for bno btree, right side */
-	xfs_btree_cur_t	*bno_cur_lt;	/* cursor for bno btree, left side */
+	xfs_btree_cur_t	*bno_cur = NULL;/* cursor for bno btree */
 	xfs_btree_cur_t	*cnt_cur;	/* cursor for count btree */
-	xfs_agblock_t	gtbno;		/* start bno of right side entry */
-	xfs_agblock_t	gtbnoa;		/* aligned ... */
-	xfs_extlen_t	gtdiff;		/* difference to right side entry */
-	xfs_extlen_t	gtlen;		/* length of right side entry */
-	xfs_extlen_t	gtlena;		/* aligned ... */
-	xfs_agblock_t	gtnew;		/* useful start bno of right side */
 	int		error;		/* error code */
 	int		i;		/* result code, temporary */
 	int		j;		/* result code, temporary */
@@ -1008,7 +1223,6 @@ xfs_alloc_ag_vextent_near(
 	xfs_extlen_t	ltlen;		/* length of left side entry */
 	xfs_extlen_t	ltlena;		/* aligned ... */
 	xfs_agblock_t	ltnew;		/* useful start bno of left side */
-	xfs_extlen_t	rlen;		/* length of returned extent */
 	bool		busy;
 	unsigned	busy_gen;
 #ifdef DEBUG
@@ -1032,10 +1246,7 @@ xfs_alloc_ag_vextent_near(
 		args->agbno = args->max_agbno;
 
 restart:
-	bno_cur_lt = NULL;
-	bno_cur_gt = NULL;
 	ltlen = 0;
-	gtlena = 0;
 	ltlena = 0;
 	busy = false;
 
@@ -1048,16 +1259,18 @@ xfs_alloc_ag_vextent_near(
 	/*
 	 * See if there are any free extents as big as maxlen.
 	 */
-	if ((error = xfs_alloc_lookup_ge(cnt_cur, 0, args->maxlen, &i)))
-		goto error0;
+	error = xfs_alloc_lookup_ge(cnt_cur, 0, args->maxlen, &i);
+	if (error)
+		goto error;
 	/*
 	 * If none, then pick up the last entry in the tree unless the
 	 * tree is empty.
 	 */
 	if (!i) {
-		if ((error = xfs_alloc_ag_vextent_small(args, cnt_cur, &ltbno,
-				&ltlen, &i)))
-			goto error0;
+		error = xfs_alloc_ag_vextent_small(args, cnt_cur, &ltbno,
+				&ltlen, &i);
+		if (error)
+			goto error;
 		if (i == 0 || ltlen == 0) {
 			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
 			trace_xfs_alloc_near_noentry(args);
@@ -1096,14 +1309,15 @@ xfs_alloc_ag_vextent_near(
 		if (ltlen || args->alignment > 1) {
 			cnt_cur->bc_ptrs[0] = 1;
 			do {
-				if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno,
-						&ltlen, &i)))
-					goto error0;
-				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+				error = xfs_alloc_get_rec(cnt_cur, &ltbno,
+						&ltlen, &i);
+				if (error)
+					goto error;
+				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
 				if (ltlen >= args->minlen)
 					break;
 				if ((error = xfs_btree_increment(cnt_cur, 0, &i)))
-					goto error0;
+					goto error;
 			} while (i);
 			ASSERT(ltlen >= args->minlen);
 			if (!i)
@@ -1117,9 +1331,10 @@ xfs_alloc_ag_vextent_near(
 			 * For each entry, decide if it's better than
 			 * the previous best entry.
 			 */
-			if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+			error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i);
+			if (error)
+				goto error;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
 			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
 					&ltbnoa, &ltlena, &busy_gen);
 			if (ltlena < args->minlen)
@@ -1152,9 +1367,10 @@ xfs_alloc_ag_vextent_near(
 		 * Point at the best entry, and retrieve it again.
 		 */
 		cnt_cur->bc_ptrs[0] = besti;
-		if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+		error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i);
+		if (error)
+			goto error;
+		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
 		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
 		args->len = blen;
 
@@ -1167,218 +1383,46 @@ xfs_alloc_ag_vextent_near(
 		/*
 		 * Set up a cursor for the by-bno tree.
 		 */
-		bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp,
+		bno_cur = xfs_allocbt_init_cursor(args->mp, args->tp,
 			args->agbp, args->agno, XFS_BTNUM_BNO);
 		/*
 		 * Fix up the btree entries.
 		 */
-		if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno,
-				ltlen, bnew, blen, XFSA_FIXUP_CNT_OK)))
-			goto error0;
+		error = xfs_alloc_fixup_trees(cnt_cur, bno_cur, ltbno,
+				ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
+		if (error)
+			goto error;
 		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
+		xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
 
 		trace_xfs_alloc_near_first(args);
 		return 0;
 	}
-	/*
-	 * Second algorithm.
-	 * Search in the by-bno tree to the left and to the right
-	 * simultaneously, until in each case we find a space big enough,
-	 * or run into the edge of the tree.  When we run into the edge,
-	 * we deallocate that cursor.
-	 * If both searches succeed, we compare the two spaces and pick
-	 * the better one.
-	 * With alignment, it's possible for both to fail; the upper
-	 * level algorithm that picks allocation groups for allocations
-	 * is not supposed to do this.
-	 */
-	/*
-	 * Allocate and initialize the cursor for the leftward search.
-	 */
-	bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_BNO);
-	/*
-	 * Lookup <= bno to find the leftward search's starting point.
-	 */
-	if ((error = xfs_alloc_lookup_le(bno_cur_lt, args->agbno, args->maxlen, &i)))
-		goto error0;
-	if (!i) {
-		/*
-		 * Didn't find anything; use this cursor for the rightward
-		 * search.
-		 */
-		bno_cur_gt = bno_cur_lt;
-		bno_cur_lt = NULL;
-	}
-	/*
-	 * Found something.  Duplicate the cursor for the rightward search.
-	 */
-	else if ((error = xfs_btree_dup_cursor(bno_cur_lt, &bno_cur_gt)))
-		goto error0;
-	/*
-	 * Increment the cursor, so we will point at the entry just right
-	 * of the leftward entry if any, or to the leftmost entry.
-	 */
-	if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
-		goto error0;
-	if (!i) {
-		/*
-		 * It failed, there are no rightward entries.
-		 */
-		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_NOERROR);
-		bno_cur_gt = NULL;
-	}
-	/*
-	 * Loop going left with the leftward cursor, right with the
-	 * rightward cursor, until either both directions give up or
-	 * we find an entry at least as big as minlen.
-	 */
-	do {
-		if (bno_cur_lt) {
-			if ((error = xfs_alloc_get_rec(bno_cur_lt, &ltbno, &ltlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-			busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &busy_gen);
-			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
-				break;
-			if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i)))
-				goto error0;
-			if (!i || ltbnoa < args->min_agbno) {
-				xfs_btree_del_cursor(bno_cur_lt,
-						     XFS_BTREE_NOERROR);
-				bno_cur_lt = NULL;
-			}
-		}
-		if (bno_cur_gt) {
-			if ((error = xfs_alloc_get_rec(bno_cur_gt, &gtbno, &gtlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-			busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
-					&gtbnoa, &gtlena, &busy_gen);
-			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
-				break;
-			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
-				goto error0;
-			if (!i || gtbnoa > args->max_agbno) {
-				xfs_btree_del_cursor(bno_cur_gt,
-						     XFS_BTREE_NOERROR);
-				bno_cur_gt = NULL;
-			}
-		}
-	} while (bno_cur_lt || bno_cur_gt);
-
-	/*
-	 * Got both cursors still active, need to find better entry.
-	 */
-	if (bno_cur_lt && bno_cur_gt) {
-		if (ltlena >= args->minlen) {
-			/*
-			 * Left side is good, look for a right side entry.
-			 */
-			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, ltbnoa,
-				ltlena, &ltnew);
-
-			error = xfs_alloc_find_best_extent(args,
-						&bno_cur_lt, &bno_cur_gt,
-						ltdiff, &gtbno, &gtlen,
-						&gtbnoa, &gtlena,
-						0 /* search right */);
-		} else {
-			ASSERT(gtlena >= args->minlen);
-
-			/*
-			 * Right side is good, look for a left side entry.
-			 */
-			args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, gtbnoa,
-				gtlena, &gtnew);
-
-			error = xfs_alloc_find_best_extent(args,
-						&bno_cur_gt, &bno_cur_lt,
-						gtdiff, &ltbno, &ltlen,
-						&ltbnoa, &ltlena,
-						1 /* search left */);
-		}
-
-		if (error)
-			goto error0;
-	}
 
 	/*
-	 * If we couldn't get anything, give up.
+	 * Execute the fallback bnobt-based algorithm. This returns -EAGAIN if
+	 * the allocation failed due to busy extents. In that case, flush all
+	 * busy extents and restart.
 	 */
-	if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
+	error = xfs_alloc_ag_vextent_near_bnobt(args, cnt_cur, &busy,
+						&busy_gen);
+	if (error == -EAGAIN) {
+		trace_xfs_alloc_near_busy(args);
 		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-
-		if (busy) {
-			trace_xfs_alloc_near_busy(args);
-			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
-			goto restart;
-		}
-		trace_xfs_alloc_size_neither(args);
-		args->agbno = NULLAGBLOCK;
-		return 0;
+		xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
+		goto restart;
 	}
-
-	/*
-	 * At this point we have selected a freespace entry, either to the
-	 * left or to the right.  If it's on the right, copy all the
-	 * useful variables to the "left" set so we only have one
-	 * copy of this code.
-	 */
-	if (bno_cur_gt) {
-		bno_cur_lt = bno_cur_gt;
-		bno_cur_gt = NULL;
-		ltbno = gtbno;
-		ltbnoa = gtbnoa;
-		ltlen = gtlen;
-		ltlena = gtlena;
-		j = 1;
-	} else
-		j = 0;
-
-	/*
-	 * Fix up the length and compute the useful address.
-	 */
-	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-	xfs_alloc_fix_len(args);
-	rlen = args->len;
-	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
-				     args->datatype, ltbnoa, ltlena, &ltnew);
-	ASSERT(ltnew >= ltbno);
-	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
-	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
-	args->agbno = ltnew;
-
-	if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen,
-			ltnew, rlen, XFSA_FIXUP_BNO_OK)))
-		goto error0;
-
-	if (j)
-		trace_xfs_alloc_near_greater(args);
-	else
-		trace_xfs_alloc_near_lesser(args);
-
+	if (error)
+		goto error;
 	xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-	xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
 	return 0;
 
- error0:
+error:
 	trace_xfs_alloc_near_error(args);
-	if (cnt_cur != NULL)
+	if (cnt_cur)
 		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_ERROR);
-	if (bno_cur_lt != NULL)
-		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_ERROR);
-	if (bno_cur_gt != NULL)
-		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_ERROR);
+	if (bno_cur)
+		xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
 	return error;
 }
 
-- 
2.17.2


* [PATCH RFC 2/5] xfs: factor out cntbt based near allocation algorithm
  2018-12-14 12:34 [PATCH RFC 0/5] XFS near block allocation algorithm prototype Brian Foster
  2018-12-14 12:34 ` [PATCH RFC 1/5] xfs: factor out bnobt based near allocation algorithm Brian Foster
@ 2018-12-14 12:34 ` Brian Foster
  2018-12-14 12:34 ` [PATCH RFC 3/5] xfs: refactor near alloc small case Brian Foster
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2018-12-14 12:34 UTC (permalink / raw)
  To: linux-xfs

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 224 +++++++++++++++++++++-----------------
 1 file changed, 127 insertions(+), 97 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 76affbcbd5ff..a7e3daa16717 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -980,6 +980,119 @@ xfs_alloc_find_best_extent(
 	return error;
 }
 
+/*
+ * First NEAR algorithm. If the requested extent is large wrt the freespace
+ * available in this AG, then the cursor will point to a btree entry near the
+ * right edge of the tree. If it's in the last btree leaf block, then we just
+ * examine all the entries in that block that are big enough and pick the best
+ * one.
+ */
+STATIC int
+xfs_alloc_ag_vextent_near_cntbt(
+	struct xfs_alloc_arg	*args,
+	struct xfs_btree_cur	*cnt_cur,
+	bool			*busy,
+	unsigned		*busy_gen,
+	int			*done)
+{
+	struct xfs_btree_cur	*bno_cur = NULL;
+	xfs_agblock_t		bno;
+	xfs_extlen_t		len;
+	xfs_agblock_t		bnoa;
+	xfs_extlen_t		lena;
+	xfs_extlen_t		diff;
+	xfs_agblock_t		new;
+	xfs_extlen_t		bdiff;
+	int			besti = 0;
+	xfs_extlen_t		blen;
+	xfs_agblock_t		bnew = 0;
+	int			i, j;
+	int			error = 0;
+
+	*done = 0;
+
+	/*
+	 * Inspect each record to the end of the tree and keep track of the one
+	 * with the best locality. Note that the caller points cnt_cur to the
+	 * last block in the tree before we get here.
+	 */
+	for (j = 1, blen = 0, bdiff = 0;
+	     !error && j && (blen < args->maxlen || bdiff > 0);
+	     error = xfs_btree_increment(cnt_cur, 0, &j)) {
+		error = xfs_alloc_get_rec(cnt_cur, &bno, &len, &i);
+		if (error)
+			goto error;
+		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
+
+		*busy = xfs_alloc_compute_aligned(args, bno, len, &bnoa, &lena,
+						  busy_gen);
+		if (lena < args->minlen)
+			continue;
+		if (bnoa < args->min_agbno || bnoa > args->max_agbno)
+			continue;
+		args->len = XFS_EXTLEN_MIN(lena, args->maxlen);
+		xfs_alloc_fix_len(args);
+		ASSERT(args->len >= args->minlen);
+		if (args->len < blen)
+			continue;
+
+		diff = xfs_alloc_compute_diff(args->agbno, args->len,
+					      args->alignment, args->datatype,
+					      bnoa, lena, &new);
+		if (new != NULLAGBLOCK && (args->len > blen || diff < bdiff)) {
+			bdiff = diff;
+			bnew = new;
+			blen = args->len;
+			besti = cnt_cur->bc_ptrs[0];
+		}
+	}
+
+	/*
+	 * If we didn't find a suitable record, return and let the caller
+	 * try another algorithm.
+	 */
+	if (blen == 0)
+		return 0;
+
+	/*
+	 * We found a suitable record. Point the cntbt back at the record,
+	 * retrieve it and fix up args with the final allocation.
+	 */
+	cnt_cur->bc_ptrs[0] = besti;
+	error = xfs_alloc_get_rec(cnt_cur, &bno, &len, &i);
+	if (error)
+		goto error;
+	XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
+	ASSERT(bno + len <=
+	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
+	ASSERT(bnew >= bno);
+	ASSERT(bnew + blen <= bno + len);
+
+	args->agbno = bnew;
+	args->len = blen;
+
+	/*
+	 * Set up a bnobt cursor and fix up both trees with the allocation. Note
+	 * that the caller owns cnt_cur so we don't close it here.
+	 */
+	bno_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
+					  args->agno, XFS_BTNUM_BNO);
+	error = xfs_alloc_fixup_trees(cnt_cur, bno_cur, bno, len, bnew, blen,
+				      XFSA_FIXUP_CNT_OK);
+	if (error)
+		goto error;
+	xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
+
+	*done = 1;
+	trace_xfs_alloc_near_first(args);
+	return 0;
+
+error:
+	if (bno_cur)
+		xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
+	return error;
+}
+
 /*
  * Second NEAR allocation algorithm. Search in the by-bno tree to the left and
  * to the right simultaneously until in each case we find a space big enough or
@@ -1216,13 +1329,8 @@ xfs_alloc_ag_vextent_near(
 	xfs_btree_cur_t	*cnt_cur;	/* cursor for count btree */
 	int		error;		/* error code */
 	int		i;		/* result code, temporary */
-	int		j;		/* result code, temporary */
 	xfs_agblock_t	ltbno;		/* start bno of left side entry */
-	xfs_agblock_t	ltbnoa;		/* aligned ... */
-	xfs_extlen_t	ltdiff;		/* difference to left side entry */
 	xfs_extlen_t	ltlen;		/* length of left side entry */
-	xfs_extlen_t	ltlena;		/* aligned ... */
-	xfs_agblock_t	ltnew;		/* useful start bno of left side */
 	bool		busy;
 	unsigned	busy_gen;
 #ifdef DEBUG
@@ -1247,7 +1355,6 @@ xfs_alloc_ag_vextent_near(
 
 restart:
 	ltlen = 0;
-	ltlena = 0;
 	busy = false;
 
 	/*
@@ -1280,25 +1387,10 @@ xfs_alloc_ag_vextent_near(
 	}
 	args->wasfromfl = 0;
 
-	/*
-	 * First algorithm.
-	 * If the requested extent is large wrt the freespaces available
-	 * in this a.g., then the cursor will be pointing to a btree entry
-	 * near the right edge of the tree.  If it's in the last btree leaf
-	 * block, then we just examine all the entries in that block
-	 * that are big enough, and pick the best one.
-	 * This is written as a while loop so we can break out of it,
-	 * but we never loop back to the top.
-	 */
-	while (xfs_btree_islastblock(cnt_cur, 0)) {
-		xfs_extlen_t	bdiff;
-		int		besti=0;
-		xfs_extlen_t	blen=0;
-		xfs_agblock_t	bnew=0;
-
+	if (xfs_btree_islastblock(cnt_cur, 0)) {
 #ifdef DEBUG
 		if (dofirst)
-			break;
+			goto near_bnobt;
 #endif
 		/*
 		 * Start from the entry that lookup found, sequence through
@@ -1313,92 +1405,30 @@ xfs_alloc_ag_vextent_near(
 						&ltlen, &i);
 				if (error)
 					goto error;
-				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
+				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1,
+							error);
 				if (ltlen >= args->minlen)
 					break;
-				if ((error = xfs_btree_increment(cnt_cur, 0, &i)))
+				error = xfs_btree_increment(cnt_cur, 0, &i);
+				if (error)
 					goto error;
 			} while (i);
 			ASSERT(ltlen >= args->minlen);
 			if (!i)
-				break;
-		}
-		i = cnt_cur->bc_ptrs[0];
-		for (j = 1, blen = 0, bdiff = 0;
-		     !error && j && (blen < args->maxlen || bdiff > 0);
-		     error = xfs_btree_increment(cnt_cur, 0, &j)) {
-			/*
-			 * For each entry, decide if it's better than
-			 * the previous best entry.
-			 */
-			error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i);
-			if (error)
-				goto error;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
-			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &busy_gen);
-			if (ltlena < args->minlen)
-				continue;
-			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
-				continue;
-			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			ASSERT(args->len >= args->minlen);
-			if (args->len < blen)
-				continue;
-			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, ltbnoa,
-				ltlena, &ltnew);
-			if (ltnew != NULLAGBLOCK &&
-			    (args->len > blen || ltdiff < bdiff)) {
-				bdiff = ltdiff;
-				bnew = ltnew;
-				blen = args->len;
-				besti = cnt_cur->bc_ptrs[0];
-			}
+				goto near_bnobt;
 		}
-		/*
-		 * It didn't work.  We COULD be in a case where
-		 * there's a good record somewhere, so try again.
-		 */
-		if (blen == 0)
-			break;
-		/*
-		 * Point at the best entry, and retrieve it again.
-		 */
-		cnt_cur->bc_ptrs[0] = besti;
-		error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i);
-		if (error)
-			goto error;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
-		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-		args->len = blen;
 
-		/*
-		 * We are allocating starting at bnew for blen blocks.
-		 */
-		args->agbno = bnew;
-		ASSERT(bnew >= ltbno);
-		ASSERT(bnew + blen <= ltbno + ltlen);
-		/*
-		 * Set up a cursor for the by-bno tree.
-		 */
-		bno_cur = xfs_allocbt_init_cursor(args->mp, args->tp,
-			args->agbp, args->agno, XFS_BTNUM_BNO);
-		/*
-		 * Fix up the btree entries.
-		 */
-		error = xfs_alloc_fixup_trees(cnt_cur, bno_cur, ltbno,
-				ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
+		error = xfs_alloc_ag_vextent_near_cntbt(args, cnt_cur, &busy,
+							&busy_gen, &i);
 		if (error)
 			goto error;
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-		xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
-
-		trace_xfs_alloc_near_first(args);
-		return 0;
+		if (i) {
+			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+			return 0;
+		}
 	}
 
+near_bnobt:
 	/*
 	 * Execute the fallback bnobt-based algorithm. This returns -EAGAIN if
 	 * the allocation failed due to busy extents. In that case, flush all
-- 
2.17.2


* [PATCH RFC 3/5] xfs: refactor near alloc small case
From: Brian Foster @ 2018-12-14 12:34 UTC (permalink / raw)
  To: linux-xfs

- open-code the empty cntbt case: check the largest cntbt record against
  minlen directly and call xfs_alloc_ag_vextent_small() only when the
  cntbt is completely empty

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 61 +++++++++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index a7e3daa16717..cd35502eee2b 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1323,21 +1323,21 @@ xfs_alloc_ag_vextent_near_bnobt(
  */
 STATIC int				/* error */
 xfs_alloc_ag_vextent_near(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
+	struct xfs_alloc_arg	*args)	/* allocation argument structure */
 {
-	xfs_btree_cur_t	*bno_cur = NULL;/* cursor for bno btree */
-	xfs_btree_cur_t	*cnt_cur;	/* cursor for count btree */
-	int		error;		/* error code */
-	int		i;		/* result code, temporary */
-	xfs_agblock_t	ltbno;		/* start bno of left side entry */
-	xfs_extlen_t	ltlen;		/* length of left side entry */
-	bool		busy;
-	unsigned	busy_gen;
+	struct xfs_btree_cur	*bno_cur = NULL;/* cursor for bno btree */
+	struct xfs_btree_cur	*cnt_cur;	/* cursor for count btree */
+	int			error;		/* error code */
+	int			i;		/* result code, temporary */
+	xfs_agblock_t		ltbno;	      /* start bno of left side entry */
+	xfs_extlen_t		ltlen;		/* length of left side entry */
+	bool			busy;
+	unsigned		busy_gen;
 #ifdef DEBUG
 	/*
 	 * Randomly don't execute the first algorithm.
 	 */
-	int		dofirst;	/* set to do first algorithm */
+	int			dofirst;	/* set to do first algorithm */
 
 	dofirst = prandom_u32() & 1;
 #endif
@@ -1358,32 +1358,49 @@ xfs_alloc_ag_vextent_near(
 	busy = false;
 
 	/*
-	 * Get a cursor for the by-size btree.
+	 * Get a cursor for the by-size btree and start with a lookup for maxlen
+	 * free extents.
 	 */
 	cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_CNT);
-
-	/*
-	 * See if there are any free extents as big as maxlen.
-	 */
+					  args->agno, XFS_BTNUM_CNT);
 	error = xfs_alloc_lookup_ge(cnt_cur, 0, args->maxlen, &i);
 	if (error)
 		goto error;
+
 	/*
-	 * If none, then pick up the last entry in the tree unless the
-	 * tree is empty.
+	 * If there are no maxlen extents, check the last (largest) record
+	 * against minlen. If it's usable, the cntbt search will find the best
+	 * >= minlen extent to use in the last block. If the cntbt is completely
+	 * empty, the only other option is to try and allocate from the AGFL.
 	 */
 	if (!i) {
-		error = xfs_alloc_ag_vextent_small(args, cnt_cur, &ltbno,
-				&ltlen, &i);
+		error = xfs_btree_decrement(cnt_cur, 0, &i);
 		if (error)
 			goto error;
-		if (i == 0 || ltlen == 0) {
+		if (i) {
+			error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i);
+			if (error)
+				goto error;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
+			if (ltlen < args->minlen) {
+				trace_xfs_alloc_small_notenough(args);
+				return 0;
+			}
+		} else {
+			/*
+			 * We only get here if the cntbt is empty so this call
+			 * returns an AGFL block or nothing. We're done either
+			 * way.
+			 */
+			error = xfs_alloc_ag_vextent_small(args, cnt_cur,
+							   &ltbno, &ltlen, &i);
+			if (error)
+				goto error;
+			ASSERT(i == 0 || (i && ltlen == 0));
 			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
 			trace_xfs_alloc_near_noentry(args);
 			return 0;
 		}
-		ASSERT(i == 1);
 	}
 	args->wasfromfl = 0;
 
-- 
2.17.2


* [PATCH RFC 4/5] xfs: clean up variable names in xfs_alloc_ag_vextent_near()
From: Brian Foster @ 2018-12-14 12:34 UTC (permalink / raw)
  To: linux-xfs

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index cd35502eee2b..95ee48e3fb9d 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1329,8 +1329,8 @@ xfs_alloc_ag_vextent_near(
 	struct xfs_btree_cur	*cnt_cur;	/* cursor for count btree */
 	int			error;		/* error code */
 	int			i;		/* result code, temporary */
-	xfs_agblock_t		ltbno;	      /* start bno of left side entry */
-	xfs_extlen_t		ltlen;		/* length of left side entry */
+	xfs_agblock_t		bno;	      /* start bno of left side entry */
+	xfs_extlen_t		len;		/* length of left side entry */
 	bool			busy;
 	unsigned		busy_gen;
 #ifdef DEBUG
@@ -1354,7 +1354,7 @@ xfs_alloc_ag_vextent_near(
 		args->agbno = args->max_agbno;
 
 restart:
-	ltlen = 0;
+	len = 0;
 	busy = false;
 
 	/*
@@ -1378,11 +1378,11 @@ xfs_alloc_ag_vextent_near(
 		if (error)
 			goto error;
 		if (i) {
-			error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i);
+			error = xfs_alloc_get_rec(cnt_cur, &bno, &len, &i);
 			if (error)
 				goto error;
 			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error);
-			if (ltlen < args->minlen) {
+			if (len < args->minlen) {
 				trace_xfs_alloc_small_notenough(args);
 				return 0;
 			}
@@ -1393,10 +1393,10 @@ xfs_alloc_ag_vextent_near(
 			 * way.
 			 */
 			error = xfs_alloc_ag_vextent_small(args, cnt_cur,
-							   &ltbno, &ltlen, &i);
+							   &bno, &len, &i);
 			if (error)
 				goto error;
-			ASSERT(i == 0 || (i && ltlen == 0));
+			ASSERT(i == 0 || (i && len == 0));
 			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
 			trace_xfs_alloc_near_noentry(args);
 			return 0;
@@ -1415,22 +1415,22 @@ xfs_alloc_ag_vextent_near(
 		 * record smaller than maxlen, go to the start of this block,
 		 * and skip all those smaller than minlen.
 		 */
-		if (ltlen || args->alignment > 1) {
+		if (len || args->alignment > 1) {
 			cnt_cur->bc_ptrs[0] = 1;
 			do {
-				error = xfs_alloc_get_rec(cnt_cur, &ltbno,
-						&ltlen, &i);
+				error = xfs_alloc_get_rec(cnt_cur, &bno,
+						&len, &i);
 				if (error)
 					goto error;
 				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1,
 							error);
-				if (ltlen >= args->minlen)
+				if (len >= args->minlen)
 					break;
 				error = xfs_btree_increment(cnt_cur, 0, &i);
 				if (error)
 					goto error;
 			} while (i);
-			ASSERT(ltlen >= args->minlen);
+			ASSERT(len >= args->minlen);
 			if (!i)
 				goto near_bnobt;
 		}
-- 
2.17.2


* [PATCH RFC 5/5] xfs: new cntbt based near allocation algorithm
From: Brian Foster @ 2018-12-14 12:34 UTC (permalink / raw)
  To: linux-xfs

Prototype a new NEAR-type allocation algorithm. Use both the by-block
(bnobt) and by-size (cntbt) btrees to allocate the largest available
extent with ideal locality.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 405 +++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trace.h        |   1 +
 2 files changed, 405 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 95ee48e3fb9d..96aef9f9eaca 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1315,6 +1315,392 @@ xfs_alloc_ag_vextent_near_bnobt(
 
 }
 
+enum xfs_alloc_cur {
+	XFS_ALLOC_CNT = 0,
+	XFS_ALLOC_BNOLT,
+	XFS_ALLOC_BNOGT,
+	XFS_ALLOC_MAX
+};
+
+struct xfs_alloc_best {
+	struct xfs_btree_cur	*cur[XFS_ALLOC_MAX];
+	xfs_extlen_t		cur_len;	/* current search length */
+	xfs_agblock_t		rec_bno;	/* record startblock */
+	xfs_extlen_t		rec_len;	/* record length */
+	xfs_agblock_t		bno;		/* alloc bno */
+	xfs_extlen_t		len;		/* alloc len */
+	xfs_extlen_t		diff;		/* diff from search bno */
+	bool			done;
+	bool			busy;
+	unsigned		busy_gen;
+};
+
+static int
+xfs_alloc_best_setup(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_best	*best)
+{
+	int			error;
+	int			i;
+
+	best->diff = -1;
+
+	best->cur[XFS_ALLOC_CNT] = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_CNT);
+
+	best->cur[XFS_ALLOC_BNOLT] = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_BNO);
+	error = xfs_alloc_lookup_le(best->cur[XFS_ALLOC_BNOLT], args->agbno,
+				    args->maxlen, &i);
+	if (error)
+		return error;
+	if (!i) {
+		best->cur[XFS_ALLOC_BNOGT] = best->cur[XFS_ALLOC_BNOLT];
+		best->cur[XFS_ALLOC_BNOLT] = NULL;
+	} else {
+		xfs_btree_dup_cursor(best->cur[XFS_ALLOC_BNOLT],
+				     &best->cur[XFS_ALLOC_BNOGT]);
+	}
+
+	error = xfs_btree_increment(best->cur[XFS_ALLOC_BNOGT], 0, &i);
+	if (error)
+		return error;
+	if (!i) {
+		xfs_btree_del_cursor(best->cur[XFS_ALLOC_BNOGT],
+				     XFS_BTREE_NOERROR);
+		best->cur[XFS_ALLOC_BNOGT] = NULL;
+	}
+
+	return error;
+}
+
+static inline void
+__xfs_alloc_best_close(
+	struct xfs_alloc_best	*best,
+	enum xfs_alloc_cur	cidx,
+	int			error)
+{
+	if (!best->cur[cidx])
+		return;
+
+	xfs_btree_del_cursor(best->cur[cidx], error);
+	best->cur[cidx] = NULL;
+}
+
+static void
+xfs_alloc_best_close(
+	struct xfs_alloc_best	*best,
+	bool			error)
+{
+	int			cur_error = XFS_BTREE_NOERROR;
+	int			i;
+
+	if (error)
+		cur_error = XFS_BTREE_ERROR;
+	for (i = 0; i < XFS_ALLOC_MAX; i++)
+		__xfs_alloc_best_close(best, i, cur_error);
+}
+
+static bool
+xfs_alloc_best_check(
+	struct xfs_alloc_arg	*args,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	struct xfs_alloc_best	*best,
+	enum xfs_alloc_cur	cidx)
+{
+	xfs_agblock_t		bnoa, bnew;
+	xfs_extlen_t		lena, diff = -1;
+	bool			busy;
+	unsigned		busy_gen = 0;
+	bool			badrange = false;
+	bool			isbnobt = false;
+
+	isbnobt = (cidx == XFS_ALLOC_BNOLT || cidx == XFS_ALLOC_BNOGT);
+
+	busy = xfs_alloc_compute_aligned(args, bno, len, &bnoa, &lena,
+					 &busy_gen);
+	best->busy |= busy;
+	if (busy)
+		best->busy_gen = busy_gen;
+
+	if (bnoa < args->min_agbno || bnoa > args->max_agbno) {
+		badrange = true;
+		goto fail;
+	}
+	if (lena < args->minlen)
+		goto fail;
+
+	args->len = XFS_EXTLEN_MIN(lena, args->maxlen);
+	xfs_alloc_fix_len(args);
+	ASSERT(args->len >= args->minlen);
+	if (args->len < best->len)
+		goto fail;
+
+	diff = xfs_alloc_compute_diff(args->agbno, args->len,
+				      args->alignment, args->datatype,
+				      bnoa, lena, &bnew);
+	if (bnew == NULLAGBLOCK)
+		goto fail;
+	if (diff > best->diff) {
+		badrange = true;
+		goto fail;
+	}
+	if (args->len < best->len)
+		goto fail;
+
+	best->rec_bno = bno;
+	best->rec_len = len;
+	best->bno = bnew;
+	best->len = args->len;
+	best->diff = diff;
+	return true;
+
+fail:
+	/* close bnobt cursors once out of range to prevent further searching */
+	if (isbnobt && badrange)
+		__xfs_alloc_best_close(best, cidx, XFS_BTREE_NOERROR);
+	return false;
+}
+
+/*
+ * Complete an allocation if we have a candidate extent.
+ */
+STATIC int
+xfs_alloc_best_finish(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_best	*best,
+	int			*stat)
+{
+	struct xfs_btree_cur	*bno_cur;
+	int			error;
+
+	ASSERT(best->len);
+	ASSERT(best->cur[XFS_ALLOC_CNT]);
+
+	/*
+	 * Reuse or reopen a bnobt cursor if necessary. It will be closed as
+	 * part of the xfs_alloc_best cleanup.
+	 */
+	if (best->cur[XFS_ALLOC_BNOLT]) {
+		bno_cur = best->cur[XFS_ALLOC_BNOLT];
+	} else if (best->cur[XFS_ALLOC_BNOGT]) {
+		bno_cur = best->cur[XFS_ALLOC_BNOGT];
+	} else {
+		bno_cur = xfs_allocbt_init_cursor(args->mp, args->tp,
+						  args->agbp, args->agno,
+						  XFS_BTNUM_BNO);
+		best->cur[XFS_ALLOC_BNOLT] = bno_cur;
+	}
+
+	error = xfs_alloc_fixup_trees(best->cur[XFS_ALLOC_CNT], bno_cur,
+				      best->rec_bno, best->rec_len, best->bno,
+				      best->len, 0);
+	if (error)
+		return error;
+
+	args->agbno = best->bno;
+	args->len = best->len;
+	*stat = 1;
+
+	return 0;
+}
+
+STATIC int
+xfs_alloc_size_iter(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_best	*best,
+	enum xfs_alloc_cur	cidx)
+{
+	struct xfs_btree_cur	*cur = best->cur[cidx];
+	xfs_agblock_t		bno;
+	xfs_extlen_t		len, cur_len;
+	int			error;
+	int			i;
+
+	/*
+	 * Store the current search key and perform a locality optimized lookup.
+	 */
+	cur_len = best->cur_len;
+	error = xfs_alloc_lookup_ge(cur, args->agbno, cur_len, &i);
+	if (error)
+		return error;
+	if (i == 0) {
+		best->done = true;
+		return 0;
+	}
+	error = xfs_alloc_get_rec(cur, &bno, &len, &i);
+	if (error)
+		return error;
+
+	/*
+	 * Check the record and update the search key to the extent length we
+	 * actually found in the tree.
+	 */
+	XFS_WANT_CORRUPTED_RETURN(args->mp, i == 1);
+	xfs_alloc_best_check(args, bno, len, best, cidx);
+	ASSERT(len >= best->cur_len);
+	best->cur_len = len;
+
+	/*
+	 * We looked up the first record >= [agbno, len] above. If this is a
+	 * cntbt search, the agbno is a secondary key and so the record may not
+	 * start beyond agbno if there are no such records for the particular
+	 * length. If it is past agbno, check the previous record too so long as
+	 * the length matches as it may be closer.
+	 */
+	if (bno > args->agbno) {
+		error = xfs_btree_decrement(cur, 0, &i);
+		if (error)
+			return error;
+		if (i) {
+			error = xfs_alloc_get_rec(cur, &bno, &len, &i);
+			if (error)
+				return error;
+			XFS_WANT_CORRUPTED_RETURN(args->mp, i == 1);
+			if (len == best->cur_len)
+				xfs_alloc_best_check(args, bno, len, best,
+						     cidx);
+		}
+	}
+
+	/*
+	 * Increment the search key if we haven't yet found a candidate extent
+	 * or this lookup already significantly bumped the key. We have to make
+	 * sure to not skip any records until we have at least one allocatable
+	 * extent. Note that if the allocation ultimately fails due to busy
+	 * extents, we'll flush the busy list and restart the whole thing.
+	 *
+	 * Otherwise, double the search key for the next lookup to optimize the
+	 * search. This allows us to find good enough locality at the expense of
+	 * absolute best locality. Max extent size and reasonable allocation
+	 * efficiency are more important here than perfect locality.
+	 */
+	cur_len <<= 1;
+	if (!best->len || best->cur_len >= cur_len)
+		best->cur_len++;
+	else
+		best->cur_len = cur_len;
+
+	return error;
+}
+
+STATIC int
+xfs_alloc_bno_iter(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_best	*best,
+	enum xfs_alloc_cur	cidx,
+	int			iters,
+	int			*stat)
+{
+	struct xfs_btree_cur	*cur;
+	int			error;
+	int			i;
+	xfs_agblock_t		bno;
+	xfs_extlen_t		len;
+	bool			found = false;
+
+	*stat = 0;
+	cur = best->cur[cidx];
+	if (!cur)
+		return 0;
+
+	for (; iters > 0; iters--) {
+		error = xfs_alloc_get_rec(cur, &bno, &len, &i);
+		if (error)
+			return error;
+		XFS_WANT_CORRUPTED_RETURN(args->mp, i == 1);
+		found = xfs_alloc_best_check(args, bno, len, best, cidx);
+		if (found) {
+			*stat = 1;
+			break;
+		}
+		if (!best->cur[cidx])
+			break;
+
+		if (cidx == XFS_ALLOC_BNOLT)
+			error = xfs_btree_decrement(cur, 0, &i);
+		else if (cidx == XFS_ALLOC_BNOGT)
+			error = xfs_btree_increment(cur, 0, &i);
+		if (error)
+			return error;
+		if (i == 0) {
+			xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+			best->cur[cidx] = NULL;
+			break;
+		}
+	}
+
+	return error;
+}
+
+STATIC int
+xfs_alloc_ag_vextent_new(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_best	*best,
+	int			*stat)
+{
+	int			error;
+	int			i;
+	unsigned int		findbest = 0;
+	enum xfs_alloc_cur	findbestc;
+
+	*stat = 0;
+
+	error = xfs_alloc_best_setup(args, best);
+	if (error)
+		goto out;
+
+	best->cur_len = args->maxlen;
+	while (!best->done) {
+		/*
+		 * Search the bnobt left and right. If either of these find a
+		 * suitable extent, we know we've found ideal locality. Do a
+		 * capped search in the opposite direction and we're done.
+		 */
+		error = xfs_alloc_bno_iter(args, best, XFS_ALLOC_BNOLT, 1, &i);
+		if (error)
+			goto out;
+		if (i) {
+			best->done = true;
+			findbest = args->mp->m_alloc_mxr[0];
+			findbestc = XFS_ALLOC_BNOGT;
+			break;
+		}
+
+		error = xfs_alloc_bno_iter(args, best, XFS_ALLOC_BNOGT, 1, &i);
+		if (error)
+			goto out;
+		if (i) {
+			best->done = true;
+			findbest = args->mp->m_alloc_mxr[0];
+			findbestc = XFS_ALLOC_BNOLT;
+			break;
+		}
+
+		/*
+		 * Search for an extent based on size. This only sets best.done
+		 * if we hit the end of the tree.
+		 */
+		error = xfs_alloc_size_iter(args, best, XFS_ALLOC_CNT);
+		if (error)
+			goto out;
+	}
+
+	if (findbest) {
+		error = xfs_alloc_bno_iter(args, best, findbestc, findbest, &i);
+		if (error)
+			goto out;
+	}
+
+	if (best->len)
+		error = xfs_alloc_best_finish(args, best, stat);
+
+out:
+	xfs_alloc_best_close(best, error);
+	return error;
+}
+
 /*
  * Allocate a variable extent near bno in the allocation group agno.
  * Extent's length (returned in len) will be between minlen and maxlen,
@@ -1326,7 +1712,7 @@ xfs_alloc_ag_vextent_near(
 	struct xfs_alloc_arg	*args)	/* allocation argument structure */
 {
 	struct xfs_btree_cur	*bno_cur = NULL;/* cursor for bno btree */
-	struct xfs_btree_cur	*cnt_cur;	/* cursor for count btree */
+	struct xfs_btree_cur	*cnt_cur = NULL;/* cursor for count btree */
 	int			error;		/* error code */
 	int			i;		/* result code, temporary */
 	xfs_agblock_t		bno;	      /* start bno of left side entry */
@@ -1357,6 +1743,23 @@ xfs_alloc_ag_vextent_near(
 	len = 0;
 	busy = false;
 
+	if (args->agbno != NULLAGBLOCK) {
+		struct xfs_alloc_best	best = {0,};
+
+		error = xfs_alloc_ag_vextent_new(args, &best, &i);
+		if (error)
+			goto error;
+		if (i) {
+			trace_xfs_alloc_near_new(args);
+			return 0;
+		}
+		if (best.busy) {
+			xfs_extent_busy_flush(args->mp, args->pag,
+					      best.busy_gen);
+			goto restart;
+		}
+	}
+
 	/*
 	 * Get a cursor for the by-size btree and start with a lookup for maxlen
 	 * free extents.
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 8a6532aae779..7a4e5829d594 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1626,6 +1626,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_done);
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
+DEFINE_ALLOC_EVENT(xfs_alloc_near_new);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
-- 
2.17.2

