linux-xfs.vger.kernel.org archive mirror
* [PATCH v5 00/11] xfs: rework near mode extent allocation
@ 2019-09-27 17:17 Brian Foster
  2019-09-27 17:17 ` [PATCH v5 01/11] xfs: track active state of allocation btree cursors Brian Foster
                   ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:17 UTC (permalink / raw)
  To: linux-xfs

Hi all,

Here's v5 of the near mode block allocation rework. This mostly consists
of various minor cleanups from v4. The only functional change is to
deactivate the cntbt cursor on finding a perfect extent in patch 8.
Thoughts, reviews, flames appreciated.

Brian

v5:
- Fix up active logic in patch 1.
- Fix busy_gen type.
- Change ->diff initialization value in proper patch.
- Deactivate cntbt cursor on location of perfect extent.
- Various aesthetic cleanups. 
v4: https://lore.kernel.org/linux-xfs/20190916121635.43148-1-bfoster@redhat.com/
- Fix up cursor active tracking type usage.
- Fix up cntbt lookup function signature.
- Added high level comment on optimized allocation algorithm.
- Split up series into smaller patches to separate refactoring from
  functional changes.
v3: https://lore.kernel.org/linux-xfs/20190815125538.49570-1-bfoster@redhat.com/
- Drop by-size and exact allocation rework bits.
- Add near mode last block scan.
- Add debug mode patch to randomly toggle near mode algos.
- Refactor cursor setup/lookup logic.
- Refactor minlen reverse scan to be common between near mode algos.
- Fix up logic to consistently prioritize extent size over locality.
- Add more useful tracepoints.
- Miscellaneous bug fixes and code/comment cleanups.
v2: https://marc.info/?l=linux-xfs&m=155854834815400&w=2
- Lift small mode refactoring into separate patch (retained review
  tag(s)).
- Various logic cleanups and refactors.
- Push active flag down into btree cursor private area; eliminate cursor
  container struct.
- Refactor final allocation code. Fold xfs_alloc_ag_vextent_type() into
  caller and factor out accounting.
- Fix up tracepoints.
v1: https://marc.info/?l=linux-xfs&m=155742169729590&w=2
- Continued development (various fixes, refinements) on generic bits and
  near mode implementation.
- Added patches 4-6 to refactor exact, by-size and small allocation
  modes.
rfcv2: https://marc.info/?l=linux-xfs&m=155197946630582&w=2
- Dropped spurious initial refactoring.
- Added minlen functionality.
- Properly tied into near alloc path.
- General refactoring and cleanups.
rfcv1: https://marc.info/?l=linux-xfs&m=154479089914351&w=2

Brian Foster (11):
  xfs: track active state of allocation btree cursors
  xfs: introduce allocation cursor data structure
  xfs: track allocation busy state in allocation cursor
  xfs: track best extent from cntbt lastblock scan in alloc cursor
  xfs: refactor cntbt lastblock scan best extent logic into helper
  xfs: reuse best extent tracking logic for bnobt scan
  xfs: refactor allocation tree fixup code
  xfs: refactor and reuse best extent scanning logic
  xfs: refactor near mode alloc bnobt scan into separate function
  xfs: factor out tree fixup logic into helper
  xfs: optimize near mode bnobt scans with concurrent cntbt lookups

 fs/xfs/libxfs/xfs_alloc.c       | 897 ++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_alloc_btree.c |   1 +
 fs/xfs/libxfs/xfs_btree.h       |   3 +
 fs/xfs/xfs_trace.h              |  33 +-
 4 files changed, 547 insertions(+), 387 deletions(-)

-- 
2.20.1



* [PATCH v5 01/11] xfs: track active state of allocation btree cursors
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
@ 2019-09-27 17:17 ` Brian Foster
  2019-09-30  8:11   ` Christoph Hellwig
  2019-10-01  5:35   ` Darrick J. Wong
  2019-09-27 17:17 ` [PATCH v5 02/11] xfs: introduce allocation cursor data structure Brian Foster
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:17 UTC (permalink / raw)
  To: linux-xfs

The upcoming allocation algorithm update searches multiple
allocation btree cursors concurrently. As such, it requires an
active state to track when a particular cursor should continue
searching. While the active state will also be updated by higher
level logic, the base behavior can be defined by the results of
allocation btree lookups.

Define an active flag in the private area of the btree cursor.
Update it based on the result of lookups in the existing allocation
btree helpers. Finally, provide a new helper to query the current
state.
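
As a rough illustration (not part of this patch), later patches in
the series consume the flag along these lines:

	/*
	 * Sketch of a search loop: walk a cursor while it remains
	 * active. Lookups and range checks clear the flag once the
	 * cursor can no longer yield a usable extent.
	 */
	while (xfs_alloc_cur_active(cur)) {
		/* ... check the current record ... */
		error = xfs_btree_increment(cur, 0, &i);
		if (error)
			return error;
		if (i == 0)
			cur->bc_private.a.priv.abt.active = false;
	}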

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c       | 24 +++++++++++++++++++++---
 fs/xfs/libxfs/xfs_alloc_btree.c |  1 +
 fs/xfs/libxfs/xfs_btree.h       |  3 +++
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 533b04aaf6f6..0ecc142c833b 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -146,9 +146,13 @@ xfs_alloc_lookup_eq(
 	xfs_extlen_t		len,	/* length of extent */
 	int			*stat)	/* success/failure */
 {
+	int			error;
+
 	cur->bc_rec.a.ar_startblock = bno;
 	cur->bc_rec.a.ar_blockcount = len;
-	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
+	error = xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
+	cur->bc_private.a.priv.abt.active = (*stat == 1);
+	return error;
 }
 
 /*
@@ -162,9 +166,13 @@ xfs_alloc_lookup_ge(
 	xfs_extlen_t		len,	/* length of extent */
 	int			*stat)	/* success/failure */
 {
+	int			error;
+
 	cur->bc_rec.a.ar_startblock = bno;
 	cur->bc_rec.a.ar_blockcount = len;
-	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
+	error = xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
+	cur->bc_private.a.priv.abt.active = (*stat == 1);
+	return error;
 }
 
 /*
@@ -178,9 +186,19 @@ xfs_alloc_lookup_le(
 	xfs_extlen_t		len,	/* length of extent */
 	int			*stat)	/* success/failure */
 {
+	int			error;
 	cur->bc_rec.a.ar_startblock = bno;
 	cur->bc_rec.a.ar_blockcount = len;
-	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+	error = xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+	cur->bc_private.a.priv.abt.active = (*stat == 1);
+	return error;
+}
+
+static inline bool
+xfs_alloc_cur_active(
+	struct xfs_btree_cur	*cur)
+{
+	return cur && cur->bc_private.a.priv.abt.active;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 2a94543857a1..279694d73e4e 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -507,6 +507,7 @@ xfs_allocbt_init_cursor(
 
 	cur->bc_private.a.agbp = agbp;
 	cur->bc_private.a.agno = agno;
+	cur->bc_private.a.priv.abt.active = false;
 
 	if (xfs_sb_version_hascrc(&mp->m_sb))
 		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index ced1e65d1483..b4e3ec1d7ff9 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -183,6 +183,9 @@ union xfs_btree_cur_private {
 		unsigned long	nr_ops;		/* # record updates */
 		int		shape_changes;	/* # of extent splits */
 	} refc;
+	struct {
+		bool		active;		/* allocation cursor state */
+	} abt;
 };
 
 /*
-- 
2.20.1



* [PATCH v5 02/11] xfs: introduce allocation cursor data structure
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
  2019-09-27 17:17 ` [PATCH v5 01/11] xfs: track active state of allocation btree cursors Brian Foster
@ 2019-09-27 17:17 ` Brian Foster
  2019-09-27 17:17 ` [PATCH v5 03/11] xfs: track allocation busy state in allocation cursor Brian Foster
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:17 UTC (permalink / raw)
  To: linux-xfs

Introduce a new allocation cursor data structure to encapsulate the
various states and structures used to perform an extent allocation.
This structure will eventually be used to track overall allocation
state across different search algorithms on both free space btrees.

To start, include the three btree cursors (one for the cntbt and two
for the bnobt left/right search) used by the near mode allocation
algorithm and refactor the cursor setup and teardown code into
helpers. This slightly changes cursor memory allocation patterns,
but otherwise makes no functional changes to the allocation
algorithm.
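
The intended lifecycle is roughly as follows (a condensed sketch of
the near mode caller below, error handling elided):

	struct xfs_alloc_cur	acur = {0,};

	error = xfs_alloc_cur_setup(args, &acur);	/* init cursors */
	if (error == -ENOSPC) {
		/* no maxlen extents; try a small allocation */
	}
	/* ... search via acur.cnt, acur.bnolt and acur.bnogt ... */
	xfs_alloc_cur_close(&acur, error);	/* tear down all cursors */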

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 318 +++++++++++++++++++-------------------
 1 file changed, 163 insertions(+), 155 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 0ecc142c833b..e14ab480de92 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -710,8 +710,71 @@ xfs_alloc_update_counters(
 }
 
 /*
- * Allocation group level functions.
+ * Block allocation algorithm and data structures.
  */
+struct xfs_alloc_cur {
+	struct xfs_btree_cur		*cnt;	/* btree cursors */
+	struct xfs_btree_cur		*bnolt;
+	struct xfs_btree_cur		*bnogt;
+};
+
+/*
+ * Set up cursors, etc. in the extent allocation cursor. This function can be
+ * called multiple times to reset an initialized structure without having to
+ * reallocate cursors.
+ */
+static int
+xfs_alloc_cur_setup(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur)
+{
+	int			error;
+	int			i;
+
+	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
+
+	/*
+	 * Perform an initial cntbt lookup to check for availability of maxlen
+	 * extents. If this fails, we'll return -ENOSPC to signal the caller to
+	 * attempt a small allocation.
+	 */
+	if (!acur->cnt)
+		acur->cnt = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_CNT);
+	error = xfs_alloc_lookup_ge(acur->cnt, 0, args->maxlen, &i);
+	if (error)
+		return error;
+
+	/*
+	 * Allocate the bnobt left and right search cursors.
+	 */
+	if (!acur->bnolt)
+		acur->bnolt = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_BNO);
+	if (!acur->bnogt)
+		acur->bnogt = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_BNO);
+	return i == 1 ? 0 : -ENOSPC;
+}
+
+static void
+xfs_alloc_cur_close(
+	struct xfs_alloc_cur	*acur,
+	bool			error)
+{
+	int			cur_error = XFS_BTREE_NOERROR;
+
+	if (error)
+		cur_error = XFS_BTREE_ERROR;
+
+	if (acur->cnt)
+		xfs_btree_del_cursor(acur->cnt, cur_error);
+	if (acur->bnolt)
+		xfs_btree_del_cursor(acur->bnolt, cur_error);
+	if (acur->bnogt)
+		xfs_btree_del_cursor(acur->bnogt, cur_error);
+	acur->cnt = acur->bnolt = acur->bnogt = NULL;
+}
 
 /*
  * Deal with the case where only small freespaces remain. Either return the
@@ -1008,8 +1071,8 @@ xfs_alloc_ag_vextent_exact(
 STATIC int
 xfs_alloc_find_best_extent(
 	struct xfs_alloc_arg	*args,	/* allocation argument structure */
-	struct xfs_btree_cur	**gcur,	/* good cursor */
-	struct xfs_btree_cur	**scur,	/* searching cursor */
+	struct xfs_btree_cur	*gcur,	/* good cursor */
+	struct xfs_btree_cur	*scur,	/* searching cursor */
 	xfs_agblock_t		gdiff,	/* difference for search comparison */
 	xfs_agblock_t		*sbno,	/* extent found by search */
 	xfs_extlen_t		*slen,	/* extent length */
@@ -1031,7 +1094,7 @@ xfs_alloc_find_best_extent(
 	 * Look until we find a better one, run out of space or run off the end.
 	 */
 	do {
-		error = xfs_alloc_get_rec(*scur, sbno, slen, &i);
+		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
 		if (error)
 			goto error0;
 		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
@@ -1074,21 +1137,19 @@ xfs_alloc_find_best_extent(
 		}
 
 		if (!dir)
-			error = xfs_btree_increment(*scur, 0, &i);
+			error = xfs_btree_increment(scur, 0, &i);
 		else
-			error = xfs_btree_decrement(*scur, 0, &i);
+			error = xfs_btree_decrement(scur, 0, &i);
 		if (error)
 			goto error0;
 	} while (i);
 
 out_use_good:
-	xfs_btree_del_cursor(*scur, XFS_BTREE_NOERROR);
-	*scur = NULL;
+	scur->bc_private.a.priv.abt.active = false;
 	return 0;
 
 out_use_search:
-	xfs_btree_del_cursor(*gcur, XFS_BTREE_NOERROR);
-	*gcur = NULL;
+	gcur->bc_private.a.priv.abt.active = false;
 	return 0;
 
 error0:
@@ -1102,13 +1163,12 @@ xfs_alloc_find_best_extent(
  * and of the form k * prod + mod unless there's nothing that large.
  * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
  */
-STATIC int				/* error */
+STATIC int
 xfs_alloc_ag_vextent_near(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
+	struct xfs_alloc_arg	*args)
 {
-	xfs_btree_cur_t	*bno_cur_gt;	/* cursor for bno btree, right side */
-	xfs_btree_cur_t	*bno_cur_lt;	/* cursor for bno btree, left side */
-	xfs_btree_cur_t	*cnt_cur;	/* cursor for count btree */
+	struct xfs_alloc_cur	acur = {0,};
+	struct xfs_btree_cur	*bno_cur;
 	xfs_agblock_t	gtbno;		/* start bno of right side entry */
 	xfs_agblock_t	gtbnoa;		/* aligned ... */
 	xfs_extlen_t	gtdiff;		/* difference to right side entry */
@@ -1148,38 +1208,29 @@ xfs_alloc_ag_vextent_near(
 		args->agbno = args->max_agbno;
 
 restart:
-	bno_cur_lt = NULL;
-	bno_cur_gt = NULL;
 	ltlen = 0;
 	gtlena = 0;
 	ltlena = 0;
 	busy = false;
 
 	/*
-	 * Get a cursor for the by-size btree.
+	 * Set up cursors and see if there are any free extents as big as
+	 * maxlen. If not, pick the last entry in the tree unless the tree is
+	 * empty.
 	 */
-	cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_CNT);
-
-	/*
-	 * See if there are any free extents as big as maxlen.
-	 */
-	if ((error = xfs_alloc_lookup_ge(cnt_cur, 0, args->maxlen, &i)))
-		goto error0;
-	/*
-	 * If none, then pick up the last entry in the tree unless the
-	 * tree is empty.
-	 */
-	if (!i) {
-		if ((error = xfs_alloc_ag_vextent_small(args, cnt_cur, &ltbno,
-				&ltlen, &i)))
-			goto error0;
+	error = xfs_alloc_cur_setup(args, &acur);
+	if (error == -ENOSPC) {
+		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &ltbno,
+				&ltlen, &i);
+		if (error)
+			goto out;
 		if (i == 0 || ltlen == 0) {
-			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
 			trace_xfs_alloc_near_noentry(args);
-			return 0;
+			goto out;
 		}
 		ASSERT(i == 1);
+	} else if (error) {
+		goto out;
 	}
 	args->wasfromfl = 0;
 
@@ -1193,7 +1244,7 @@ xfs_alloc_ag_vextent_near(
 	 * This is written as a while loop so we can break out of it,
 	 * but we never loop back to the top.
 	 */
-	while (xfs_btree_islastblock(cnt_cur, 0)) {
+	while (xfs_btree_islastblock(acur.cnt, 0)) {
 		xfs_extlen_t	bdiff;
 		int		besti=0;
 		xfs_extlen_t	blen=0;
@@ -1210,32 +1261,35 @@ xfs_alloc_ag_vextent_near(
 		 * and skip all those smaller than minlen.
 		 */
 		if (ltlen || args->alignment > 1) {
-			cnt_cur->bc_ptrs[0] = 1;
+			acur.cnt->bc_ptrs[0] = 1;
 			do {
-				if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno,
-						&ltlen, &i)))
-					goto error0;
-				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+				error = xfs_alloc_get_rec(acur.cnt, &ltbno,
+						&ltlen, &i);
+				if (error)
+					goto out;
+				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 				if (ltlen >= args->minlen)
 					break;
-				if ((error = xfs_btree_increment(cnt_cur, 0, &i)))
-					goto error0;
+				error = xfs_btree_increment(acur.cnt, 0, &i);
+				if (error)
+					goto out;
 			} while (i);
 			ASSERT(ltlen >= args->minlen);
 			if (!i)
 				break;
 		}
-		i = cnt_cur->bc_ptrs[0];
+		i = acur.cnt->bc_ptrs[0];
 		for (j = 1, blen = 0, bdiff = 0;
 		     !error && j && (blen < args->maxlen || bdiff > 0);
-		     error = xfs_btree_increment(cnt_cur, 0, &j)) {
+		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
 			/*
 			 * For each entry, decide if it's better than
 			 * the previous best entry.
 			 */
-			if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+			error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
+			if (error)
+				goto out;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
 					&ltbnoa, &ltlena, &busy_gen);
 			if (ltlena < args->minlen)
@@ -1255,7 +1309,7 @@ xfs_alloc_ag_vextent_near(
 				bdiff = ltdiff;
 				bnew = ltnew;
 				blen = args->len;
-				besti = cnt_cur->bc_ptrs[0];
+				besti = acur.cnt->bc_ptrs[0];
 			}
 		}
 		/*
@@ -1267,10 +1321,11 @@ xfs_alloc_ag_vextent_near(
 		/*
 		 * Point at the best entry, and retrieve it again.
 		 */
-		cnt_cur->bc_ptrs[0] = besti;
-		if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+		acur.cnt->bc_ptrs[0] = besti;
+		error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
+		if (error)
+			goto out;
+		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
 		args->len = blen;
 
@@ -1280,23 +1335,14 @@ xfs_alloc_ag_vextent_near(
 		args->agbno = bnew;
 		ASSERT(bnew >= ltbno);
 		ASSERT(bnew + blen <= ltbno + ltlen);
-		/*
-		 * Set up a cursor for the by-bno tree.
-		 */
-		bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp,
-			args->agbp, args->agno, XFS_BTNUM_BNO);
-		/*
-		 * Fix up the btree entries.
-		 */
-		if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno,
-				ltlen, bnew, blen, XFSA_FIXUP_CNT_OK)))
-			goto error0;
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
-
+		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, ltbno,
+					ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
+		if (error)
+			goto out;
 		trace_xfs_alloc_near_first(args);
-		return 0;
+		goto out;
 	}
+
 	/*
 	 * Second algorithm.
 	 * Search in the by-bno tree to the left and to the right
@@ -1309,86 +1355,57 @@ xfs_alloc_ag_vextent_near(
 	 * level algorithm that picks allocation groups for allocations
 	 * is not supposed to do this.
 	 */
-	/*
-	 * Allocate and initialize the cursor for the leftward search.
-	 */
-	bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_BNO);
-	/*
-	 * Lookup <= bno to find the leftward search's starting point.
-	 */
-	if ((error = xfs_alloc_lookup_le(bno_cur_lt, args->agbno, args->maxlen, &i)))
-		goto error0;
-	if (!i) {
-		/*
-		 * Didn't find anything; use this cursor for the rightward
-		 * search.
-		 */
-		bno_cur_gt = bno_cur_lt;
-		bno_cur_lt = NULL;
-	}
-	/*
-	 * Found something.  Duplicate the cursor for the rightward search.
-	 */
-	else if ((error = xfs_btree_dup_cursor(bno_cur_lt, &bno_cur_gt)))
-		goto error0;
-	/*
-	 * Increment the cursor, so we will point at the entry just right
-	 * of the leftward entry if any, or to the leftmost entry.
-	 */
-	if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
-		goto error0;
-	if (!i) {
-		/*
-		 * It failed, there are no rightward entries.
-		 */
-		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_NOERROR);
-		bno_cur_gt = NULL;
-	}
+	error = xfs_alloc_lookup_le(acur.bnolt, args->agbno, 0, &i);
+	if (error)
+		goto out;
+	error = xfs_alloc_lookup_ge(acur.bnogt, args->agbno, 0, &i);
+	if (error)
+		goto out;
+
 	/*
 	 * Loop going left with the leftward cursor, right with the
 	 * rightward cursor, until either both directions give up or
 	 * we find an entry at least as big as minlen.
 	 */
 	do {
-		if (bno_cur_lt) {
-			if ((error = xfs_alloc_get_rec(bno_cur_lt, &ltbno, &ltlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+		if (xfs_alloc_cur_active(acur.bnolt)) {
+			error = xfs_alloc_get_rec(acur.bnolt, &ltbno, &ltlen, &i);
+			if (error)
+				goto out;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 			busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
 					&ltbnoa, &ltlena, &busy_gen);
 			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
 				break;
-			if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i)))
-				goto error0;
-			if (!i || ltbnoa < args->min_agbno) {
-				xfs_btree_del_cursor(bno_cur_lt,
-						     XFS_BTREE_NOERROR);
-				bno_cur_lt = NULL;
-			}
+			error = xfs_btree_decrement(acur.bnolt, 0, &i);
+			if (error)
+				goto out;
+			if (!i || ltbnoa < args->min_agbno)
+				acur.bnolt->bc_private.a.priv.abt.active = false;
 		}
-		if (bno_cur_gt) {
-			if ((error = xfs_alloc_get_rec(bno_cur_gt, &gtbno, &gtlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+		if (xfs_alloc_cur_active(acur.bnogt)) {
+			error = xfs_alloc_get_rec(acur.bnogt, &gtbno, &gtlen, &i);
+			if (error)
+				goto out;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 			busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
 					&gtbnoa, &gtlena, &busy_gen);
 			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
 				break;
-			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
-				goto error0;
-			if (!i || gtbnoa > args->max_agbno) {
-				xfs_btree_del_cursor(bno_cur_gt,
-						     XFS_BTREE_NOERROR);
-				bno_cur_gt = NULL;
-			}
+			error = xfs_btree_increment(acur.bnogt, 0, &i);
+			if (error)
+				goto out;
+			if (!i || gtbnoa > args->max_agbno)
+				acur.bnogt->bc_private.a.priv.abt.active = false;
 		}
-	} while (bno_cur_lt || bno_cur_gt);
+	} while (xfs_alloc_cur_active(acur.bnolt) ||
+		 xfs_alloc_cur_active(acur.bnogt));
 
 	/*
 	 * Got both cursors still active, need to find better entry.
 	 */
-	if (bno_cur_lt && bno_cur_gt) {
+	if (xfs_alloc_cur_active(acur.bnolt) &&
+	    xfs_alloc_cur_active(acur.bnogt)) {
 		if (ltlena >= args->minlen) {
 			/*
 			 * Left side is good, look for a right side entry.
@@ -1400,7 +1417,7 @@ xfs_alloc_ag_vextent_near(
 				ltlena, &ltnew);
 
 			error = xfs_alloc_find_best_extent(args,
-						&bno_cur_lt, &bno_cur_gt,
+						acur.bnolt, acur.bnogt,
 						ltdiff, &gtbno, &gtlen,
 						&gtbnoa, &gtlena,
 						0 /* search right */);
@@ -1417,22 +1434,21 @@ xfs_alloc_ag_vextent_near(
 				gtlena, &gtnew);
 
 			error = xfs_alloc_find_best_extent(args,
-						&bno_cur_gt, &bno_cur_lt,
+						acur.bnogt, acur.bnolt,
 						gtdiff, &ltbno, &ltlen,
 						&ltbnoa, &ltlena,
 						1 /* search left */);
 		}
 
 		if (error)
-			goto error0;
+			goto out;
 	}
 
 	/*
 	 * If we couldn't get anything, give up.
 	 */
-	if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-
+	if (!xfs_alloc_cur_active(acur.bnolt) &&
+	    !xfs_alloc_cur_active(acur.bnogt)) {
 		if (busy) {
 			trace_xfs_alloc_near_busy(args);
 			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
@@ -1440,7 +1456,7 @@ xfs_alloc_ag_vextent_near(
 		}
 		trace_xfs_alloc_size_neither(args);
 		args->agbno = NULLAGBLOCK;
-		return 0;
+		goto out;
 	}
 
 	/*
@@ -1449,16 +1465,17 @@ xfs_alloc_ag_vextent_near(
 	 * useful variables to the "left" set so we only have one
 	 * copy of this code.
 	 */
-	if (bno_cur_gt) {
-		bno_cur_lt = bno_cur_gt;
-		bno_cur_gt = NULL;
+	if (xfs_alloc_cur_active(acur.bnogt)) {
+		bno_cur = acur.bnogt;
 		ltbno = gtbno;
 		ltbnoa = gtbnoa;
 		ltlen = gtlen;
 		ltlena = gtlena;
 		j = 1;
-	} else
+	} else {
+		bno_cur = acur.bnolt;
 		j = 0;
+	}
 
 	/*
 	 * Fix up the length and compute the useful address.
@@ -1474,27 +1491,18 @@ xfs_alloc_ag_vextent_near(
 	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
 	args->agbno = ltnew;
 
-	if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen,
-			ltnew, rlen, XFSA_FIXUP_BNO_OK)))
-		goto error0;
+	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
+				      rlen, XFSA_FIXUP_BNO_OK);
+	if (error)
+		goto out;
 
 	if (j)
 		trace_xfs_alloc_near_greater(args);
 	else
 		trace_xfs_alloc_near_lesser(args);
 
-	xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-	xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
-	return 0;
-
- error0:
-	trace_xfs_alloc_near_error(args);
-	if (cnt_cur != NULL)
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_ERROR);
-	if (bno_cur_lt != NULL)
-		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_ERROR);
-	if (bno_cur_gt != NULL)
-		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_ERROR);
+out:
+	xfs_alloc_cur_close(&acur, error);
 	return error;
 }
 
-- 
2.20.1



* [PATCH v5 03/11] xfs: track allocation busy state in allocation cursor
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
  2019-09-27 17:17 ` [PATCH v5 01/11] xfs: track active state of allocation btree cursors Brian Foster
  2019-09-27 17:17 ` [PATCH v5 02/11] xfs: introduce allocation cursor data structure Brian Foster
@ 2019-09-27 17:17 ` Brian Foster
  2019-09-27 17:17 ` [PATCH v5 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor Brian Foster
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:17 UTC (permalink / raw)
  To: linux-xfs

Extend the allocation cursor to track extent busy state for an
allocation attempt. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index e14ab480de92..aa71a01a93b1 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -716,6 +716,8 @@ struct xfs_alloc_cur {
 	struct xfs_btree_cur		*cnt;	/* btree cursors */
 	struct xfs_btree_cur		*bnolt;
 	struct xfs_btree_cur		*bnogt;
+	unsigned int			busy_gen;/* busy state */
+	bool				busy;
 };
 
 /*
@@ -733,6 +735,9 @@ xfs_alloc_cur_setup(
 
 	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
 
+	acur->busy = false;
+	acur->busy_gen = 0;
+
 	/*
 	 * Perform an initial cntbt lookup to check for availability of maxlen
 	 * extents. If this fails, we'll return -ENOSPC to signal the caller to
@@ -1185,8 +1190,6 @@ xfs_alloc_ag_vextent_near(
 	xfs_extlen_t	ltlena;		/* aligned ... */
 	xfs_agblock_t	ltnew;		/* useful start bno of left side */
 	xfs_extlen_t	rlen;		/* length of returned extent */
-	bool		busy;
-	unsigned	busy_gen;
 #ifdef DEBUG
 	/*
 	 * Randomly don't execute the first algorithm.
@@ -1211,7 +1214,6 @@ xfs_alloc_ag_vextent_near(
 	ltlen = 0;
 	gtlena = 0;
 	ltlena = 0;
-	busy = false;
 
 	/*
 	 * Set up cursors and see if there are any free extents as big as
@@ -1290,8 +1292,8 @@ xfs_alloc_ag_vextent_near(
 			if (error)
 				goto out;
 			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &busy_gen);
+			acur.busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
+					&ltbnoa, &ltlena, &acur.busy_gen);
 			if (ltlena < args->minlen)
 				continue;
 			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
@@ -1373,8 +1375,8 @@ xfs_alloc_ag_vextent_near(
 			if (error)
 				goto out;
 			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &busy_gen);
+			acur.busy |= xfs_alloc_compute_aligned(args, ltbno,
+					ltlen, &ltbnoa, &ltlena, &acur.busy_gen);
 			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
 				break;
 			error = xfs_btree_decrement(acur.bnolt, 0, &i);
@@ -1388,8 +1390,8 @@ xfs_alloc_ag_vextent_near(
 			if (error)
 				goto out;
 			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
-					&gtbnoa, &gtlena, &busy_gen);
+			acur.busy |= xfs_alloc_compute_aligned(args, gtbno,
+					gtlen, &gtbnoa, &gtlena, &acur.busy_gen);
 			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
 				break;
 			error = xfs_btree_increment(acur.bnogt, 0, &i);
@@ -1449,9 +1451,10 @@ xfs_alloc_ag_vextent_near(
 	 */
 	if (!xfs_alloc_cur_active(acur.bnolt) &&
 	    !xfs_alloc_cur_active(acur.bnogt)) {
-		if (busy) {
+		if (acur.busy) {
 			trace_xfs_alloc_near_busy(args);
-			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
+			xfs_extent_busy_flush(args->mp, args->pag,
+					      acur.busy_gen);
 			goto restart;
 		}
 		trace_xfs_alloc_size_neither(args);
-- 
2.20.1



* [PATCH v5 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (2 preceding siblings ...)
  2019-09-27 17:17 ` [PATCH v5 03/11] xfs: track allocation busy state in allocation cursor Brian Foster
@ 2019-09-27 17:17 ` Brian Foster
  2019-10-04 22:35   ` Darrick J. Wong
  2019-09-27 17:17 ` [PATCH v5 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper Brian Foster
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:17 UTC (permalink / raw)
  To: linux-xfs

If the size lookup lands in the last block of the by-size btree, the
near mode algorithm scans the entire block for the extent with best
available locality. In preparation for similar best available
extent tracking across both btrees, extend the allocation cursor
with best extent data and lift the associated state from the cntbt
last block scan code. No functional changes.
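
Loosely, the new fields relate as follows (a sketch using the field
names from the patch; the asserts mirror the ones added below):

	/*
	 * rec_bno/rec_len describe the free extent record found in the
	 * btree; bno/len describe the portion of that record we intend
	 * to allocate; diff is the locality distance from args->agbno.
	 */
	ASSERT(acur.bno >= acur.rec_bno);
	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);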

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 63 ++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index aa71a01a93b1..4514a059486e 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -716,6 +716,11 @@ struct xfs_alloc_cur {
 	struct xfs_btree_cur		*cnt;	/* btree cursors */
 	struct xfs_btree_cur		*bnolt;
 	struct xfs_btree_cur		*bnogt;
+	xfs_agblock_t			rec_bno;/* extent startblock */
+	xfs_extlen_t			rec_len;/* extent length */
+	xfs_agblock_t			bno;	/* alloc bno */
+	xfs_extlen_t			len;	/* alloc len */
+	xfs_extlen_t			diff;	/* diff from search bno */
 	unsigned int			busy_gen;/* busy state */
 	bool				busy;
 };
@@ -735,6 +740,11 @@ xfs_alloc_cur_setup(
 
 	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
 
+	acur->rec_bno = 0;
+	acur->rec_len = 0;
+	acur->bno = 0;
+	acur->len = 0;
+	acur->diff = 0;
 	acur->busy = false;
 	acur->busy_gen = 0;
 
@@ -1247,10 +1257,7 @@ xfs_alloc_ag_vextent_near(
 	 * but we never loop back to the top.
 	 */
 	while (xfs_btree_islastblock(acur.cnt, 0)) {
-		xfs_extlen_t	bdiff;
-		int		besti=0;
-		xfs_extlen_t	blen=0;
-		xfs_agblock_t	bnew=0;
+		xfs_extlen_t	diff;
 
 #ifdef DEBUG
 		if (dofirst)
@@ -1281,8 +1288,8 @@ xfs_alloc_ag_vextent_near(
 				break;
 		}
 		i = acur.cnt->bc_ptrs[0];
-		for (j = 1, blen = 0, bdiff = 0;
-		     !error && j && (blen < args->maxlen || bdiff > 0);
+		for (j = 1;
+		     !error && j && (acur.len < args->maxlen || acur.diff > 0);
 		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
 			/*
 			 * For each entry, decide if it's better than
@@ -1301,44 +1308,40 @@ xfs_alloc_ag_vextent_near(
 			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
 			xfs_alloc_fix_len(args);
 			ASSERT(args->len >= args->minlen);
-			if (args->len < blen)
+			if (args->len < acur.len)
 				continue;
-			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
+			diff = xfs_alloc_compute_diff(args->agbno, args->len,
 				args->alignment, args->datatype, ltbnoa,
 				ltlena, &ltnew);
 			if (ltnew != NULLAGBLOCK &&
-			    (args->len > blen || ltdiff < bdiff)) {
-				bdiff = ltdiff;
-				bnew = ltnew;
-				blen = args->len;
-				besti = acur.cnt->bc_ptrs[0];
+			    (args->len > acur.len || diff < acur.diff)) {
+				acur.rec_bno = ltbno;
+				acur.rec_len = ltlen;
+				acur.diff = diff;
+				acur.bno = ltnew;
+				acur.len = args->len;
 			}
 		}
 		/*
 		 * It didn't work.  We COULD be in a case where
 		 * there's a good record somewhere, so try again.
 		 */
-		if (blen == 0)
+		if (acur.len == 0)
 			break;
-		/*
-		 * Point at the best entry, and retrieve it again.
-		 */
-		acur.cnt->bc_ptrs[0] = besti;
-		error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
-		if (error)
-			goto out;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-		args->len = blen;
 
 		/*
-		 * We are allocating starting at bnew for blen blocks.
+		 * Allocate at the bno/len tracked in the cursor.
 		 */
-		args->agbno = bnew;
-		ASSERT(bnew >= ltbno);
-		ASSERT(bnew + blen <= ltbno + ltlen);
-		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, ltbno,
-					ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
+		args->agbno = acur.bno;
+		args->len = acur.len;
+		ASSERT(acur.bno >= acur.rec_bno);
+		ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
+		ASSERT(acur.rec_bno + acur.rec_len <=
+		       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
+
+		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt,
+				acur.rec_bno, acur.rec_len, acur.bno, acur.len,
+				0);
 		if (error)
 			goto out;
 		trace_xfs_alloc_near_first(args);
-- 
2.20.1



* [PATCH v5 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (3 preceding siblings ...)
  2019-09-27 17:17 ` [PATCH v5 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor Brian Foster
@ 2019-09-27 17:17 ` Brian Foster
  2019-10-04 22:40   ` Darrick J. Wong
  2019-09-27 17:17 ` [PATCH v5 06/11] xfs: reuse best extent tracking logic for bnobt scan Brian Foster
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:17 UTC (permalink / raw)
  To: linux-xfs

The cntbt lastblock scan checks the size, alignment, locality, etc.
of each free extent in the block and compares it with the current
best candidate. This logic will be reused by the upcoming optimized
cntbt algorithm, so refactor it into a separate helper. Note that
acur->diff is now initialized to -1 (unsigned) instead of 0 to
support the more granular comparison logic in the new helper.
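
For reference, the sentinel works because xfs_extlen_t is an unsigned
32-bit type, so the initial value compares as the worst possible
locality. A small illustrative sketch (new_diff is a hypothetical
name for a candidate's computed diff):

	xfs_extlen_t	diff = -1;	/* wraps to 0xffffffff */

	/*
	 * Any real candidate's diff is <= the initial value, so the
	 * first suitable extent always replaces the empty best.
	 */
	if (new_diff > diff)
		goto out;	/* reject: worse locality than best */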

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 115 ++++++++++++++++++++++++++++----------
 fs/xfs/xfs_trace.h        |  26 +++++++++
 2 files changed, 113 insertions(+), 28 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 4514a059486e..38183953636a 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -744,7 +744,7 @@ xfs_alloc_cur_setup(
 	acur->rec_len = 0;
 	acur->bno = 0;
 	acur->len = 0;
-	acur->diff = 0;
+	acur->diff = -1;
 	acur->busy = false;
 	acur->busy_gen = 0;
 
@@ -791,6 +791,89 @@ xfs_alloc_cur_close(
 	acur->cnt = acur->bnolt = acur->bnogt = NULL;
 }
 
+/*
+ * Check an extent for allocation and track the best available candidate in the
+ * allocation structure. The cursor is deactivated if it has entered an out of
+ * range state based on allocation arguments. Optionally return the
+ * extent geometry and allocation status if requested by the caller.
+ */
+static int
+xfs_alloc_cur_check(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur,
+	struct xfs_btree_cur	*cur,
+	int			*new)
+{
+	int			error, i;
+	xfs_agblock_t		bno, bnoa, bnew;
+	xfs_extlen_t		len, lena, diff = -1;
+	bool			busy;
+	unsigned		busy_gen = 0;
+	bool			deactivate = false;
+
+	*new = 0;
+
+	error = xfs_alloc_get_rec(cur, &bno, &len, &i);
+	if (error)
+		return error;
+	XFS_WANT_CORRUPTED_RETURN(args->mp, i == 1);
+
+	/*
+	 * Check minlen and deactivate a cntbt cursor if out of acceptable size
+	 * range (i.e., walking backwards looking for a minlen extent).
+	 */
+	if (len < args->minlen) {
+		deactivate = true;
+		goto out;
+	}
+
+	busy = xfs_alloc_compute_aligned(args, bno, len, &bnoa, &lena,
+					 &busy_gen);
+	acur->busy |= busy;
+	if (busy)
+		acur->busy_gen = busy_gen;
+	/* deactivate a bnobt cursor outside of locality range */
+	if (bnoa < args->min_agbno || bnoa > args->max_agbno)
+		goto out;
+	if (lena < args->minlen)
+		goto out;
+
+	args->len = XFS_EXTLEN_MIN(lena, args->maxlen);
+	xfs_alloc_fix_len(args);
+	ASSERT(args->len >= args->minlen);
+	if (args->len < acur->len)
+		goto out;
+
+	/*
+	 * We have an aligned record that satisfies minlen and beats or matches
+	 * the candidate extent size. Compare locality for near allocation mode.
+	 */
+	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
+	diff = xfs_alloc_compute_diff(args->agbno, args->len,
+				      args->alignment, args->datatype,
+				      bnoa, lena, &bnew);
+	if (bnew == NULLAGBLOCK)
+		goto out;
+	if (diff > acur->diff)
+		goto out;
+
+	ASSERT(args->len > acur->len ||
+	       (args->len == acur->len && diff <= acur->diff));
+	acur->rec_bno = bno;
+	acur->rec_len = len;
+	acur->bno = bnew;
+	acur->len = args->len;
+	acur->diff = diff;
+	*new = 1;
+
+out:
+	if (deactivate)
+		cur->bc_private.a.priv.abt.active = false;
+	trace_xfs_alloc_cur_check(args->mp, cur->bc_btnum, bno, len, diff,
+				  *new);
+	return 0;
+}
+
 /*
  * Deal with the case where only small freespaces remain. Either return the
  * contents of the last freespace record, or allocate space from the freelist if
@@ -1257,8 +1340,6 @@ xfs_alloc_ag_vextent_near(
 	 * but we never loop back to the top.
 	 */
 	while (xfs_btree_islastblock(acur.cnt, 0)) {
-		xfs_extlen_t	diff;
-
 #ifdef DEBUG
 		if (dofirst)
 			break;
@@ -1289,38 +1370,16 @@ xfs_alloc_ag_vextent_near(
 		}
 		i = acur.cnt->bc_ptrs[0];
 		for (j = 1;
-		     !error && j && (acur.len < args->maxlen || acur.diff > 0);
+		     !error && j && xfs_alloc_cur_active(acur.cnt) &&
+		     (acur.len < args->maxlen || acur.diff > 0);
 		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
 			/*
 			 * For each entry, decide if it's better than
 			 * the previous best entry.
 			 */
-			error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
+			error = xfs_alloc_cur_check(args, &acur, acur.cnt, &i);
 			if (error)
 				goto out;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			acur.busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &acur.busy_gen);
-			if (ltlena < args->minlen)
-				continue;
-			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
-				continue;
-			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			ASSERT(args->len >= args->minlen);
-			if (args->len < acur.len)
-				continue;
-			diff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, ltbnoa,
-				ltlena, &ltnew);
-			if (ltnew != NULLAGBLOCK &&
-			    (args->len > acur.len || diff < acur.diff)) {
-				acur.rec_bno = ltbno;
-				acur.rec_len = ltlen;
-				acur.diff = diff;
-				acur.bno = ltnew;
-				acur.len = args->len;
-			}
 		}
 		/*
 		 * It didn't work.  We COULD be in a case where
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index eaae275ed430..cf618919cacf 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1663,6 +1663,32 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
 
+TRACE_EVENT(xfs_alloc_cur_check,
+	TP_PROTO(struct xfs_mount *mp, xfs_btnum_t btnum, xfs_agblock_t bno,
+		 xfs_extlen_t len, xfs_extlen_t diff, bool new),
+	TP_ARGS(mp, btnum, bno, len, diff, new),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_btnum_t, btnum)
+		__field(xfs_agblock_t, bno)
+		__field(xfs_extlen_t, len)
+		__field(xfs_extlen_t, diff)
+		__field(bool, new)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->btnum = btnum;
+		__entry->bno = bno;
+		__entry->len = len;
+		__entry->diff = diff;
+		__entry->new = new;
+	),
+	TP_printk("dev %d:%d btree %s bno 0x%x len 0x%x diff 0x%x new %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
+		  __entry->bno, __entry->len, __entry->diff, __entry->new)
+)
+
 DECLARE_EVENT_CLASS(xfs_da_class,
 	TP_PROTO(struct xfs_da_args *args),
 	TP_ARGS(args),
-- 
2.20.1



* [PATCH v5 06/11] xfs: reuse best extent tracking logic for bnobt scan
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (4 preceding siblings ...)
  2019-09-27 17:17 ` [PATCH v5 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper Brian Foster
@ 2019-09-27 17:17 ` Brian Foster
  2019-10-04 22:45   ` Darrick J. Wong
  2019-09-27 17:17 ` [PATCH v5 07/11] xfs: refactor allocation tree fixup code Brian Foster
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:17 UTC (permalink / raw)
  To: linux-xfs

The near mode bnobt scan searches left and right in the bnobt
looking for the closest free extent to the allocation hint that
satisfies minlen. Once such an extent is found, the left/right
search terminates, one more search runs in the opposite direction,
and the allocation finishes with the best overall extent.
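
In outline, the scan looks like this (a condensed sketch of the loop
as it stands after this patch):

	do {
		if (xfs_alloc_cur_active(acur.bnolt)) {
			error = xfs_alloc_cur_check(args, &acur,
						    acur.bnolt, &i);
			if (error)
				goto out;
			if (i == 1) {	/* candidate found to the left */
				fbcur = acur.bnogt;	/* finish rightwards */
				fbinc = true;
				break;
			}
			/* ... otherwise step left ... */
		}
		/* ... symmetric check and step for acur.bnogt ... */
	} while (xfs_alloc_cur_active(acur.bnolt) ||
		 xfs_alloc_cur_active(acur.bnogt));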

The left/right and find best searches are currently controlled via a
combination of cursor state and local variables. Clean up this code
and prepare for further improvements to the near mode fallback
algorithm by reusing the allocation cursor best extent tracking
mechanism. Update the tracking logic to deactivate bnobt cursors
when out of allocation range and replace open-coded extent checks with
calls to the common helper. In doing so, rename some misnamed local
variables in the top-level near mode allocation function.
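
Condensed from the helper changes in this patch, the deactivation
policy differs by tree type (sketch only; the real code uses gotos
rather than else-if):

	bool	isbnobt = cur->bc_btnum == XFS_BTNUM_BNO;

	if (len < args->minlen)
		deactivate = !isbnobt;	/* cntbt fell below minlen */
	else if (bnoa < args->min_agbno || bnoa > args->max_agbno)
		deactivate = isbnobt;	/* bnobt left the locality range */
	else if (diff > acur->diff)
		deactivate = isbnobt;	/* bnobt locality only gets worse */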

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 276 +++++++++++---------------------------
 fs/xfs/xfs_trace.h        |   4 +-
 2 files changed, 79 insertions(+), 201 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 38183953636a..d412ae41676a 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -810,6 +810,7 @@ xfs_alloc_cur_check(
 	bool			busy;
 	unsigned		busy_gen = 0;
 	bool			deactivate = false;
+	bool			isbnobt = cur->bc_btnum == XFS_BTNUM_BNO;
 
 	*new = 0;
 
@@ -823,7 +824,7 @@ xfs_alloc_cur_check(
 	 * range (i.e., walking backwards looking for a minlen extent).
 	 */
 	if (len < args->minlen) {
-		deactivate = true;
+		deactivate = !isbnobt;
 		goto out;
 	}
 
@@ -833,8 +834,10 @@ xfs_alloc_cur_check(
 	if (busy)
 		acur->busy_gen = busy_gen;
 	/* deactivate a bnobt cursor outside of locality range */
-	if (bnoa < args->min_agbno || bnoa > args->max_agbno)
+	if (bnoa < args->min_agbno || bnoa > args->max_agbno) {
+		deactivate = isbnobt;
 		goto out;
+	}
 	if (lena < args->minlen)
 		goto out;
 
@@ -854,8 +857,14 @@ xfs_alloc_cur_check(
 				      bnoa, lena, &bnew);
 	if (bnew == NULLAGBLOCK)
 		goto out;
-	if (diff > acur->diff)
+
+	/*
+	 * Deactivate a bnobt cursor with worse locality than the current best.
+	 */
+	if (diff > acur->diff) {
+		deactivate = isbnobt;
 		goto out;
+	}
 
 	ASSERT(args->len > acur->len ||
 	       (args->len == acur->len && diff <= acur->diff));
@@ -1163,96 +1172,44 @@ xfs_alloc_ag_vextent_exact(
 }
 
 /*
- * Search the btree in a given direction via the search cursor and compare
- * the records found against the good extent we've already found.
+ * Search the btree in a given direction and check the records against the good
+ * extent we've already found.
  */
 STATIC int
 xfs_alloc_find_best_extent(
-	struct xfs_alloc_arg	*args,	/* allocation argument structure */
-	struct xfs_btree_cur	*gcur,	/* good cursor */
-	struct xfs_btree_cur	*scur,	/* searching cursor */
-	xfs_agblock_t		gdiff,	/* difference for search comparison */
-	xfs_agblock_t		*sbno,	/* extent found by search */
-	xfs_extlen_t		*slen,	/* extent length */
-	xfs_agblock_t		*sbnoa,	/* aligned extent found by search */
-	xfs_extlen_t		*slena,	/* aligned extent length */
-	int			dir)	/* 0 = search right, 1 = search left */
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur,
+	struct xfs_btree_cur	*cur,
+	bool			increment)
 {
-	xfs_agblock_t		new;
-	xfs_agblock_t		sdiff;
 	int			error;
 	int			i;
-	unsigned		busy_gen;
-
-	/* The good extent is perfect, no need to  search. */
-	if (!gdiff)
-		goto out_use_good;
 
 	/*
-	 * Look until we find a better one, run out of space or run off the end.
+	 * Search so long as the cursor is active or we find a better extent.
+	 * The cursor is deactivated if it extends beyond the range of the
+	 * current allocation candidate.
 	 */
-	do {
-		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
+	while (xfs_alloc_cur_active(cur)) {
+		error = xfs_alloc_cur_check(args, acur, cur, &i);
 		if (error)
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-		xfs_alloc_compute_aligned(args, *sbno, *slen,
-				sbnoa, slena, &busy_gen);
-
-		/*
-		 * The good extent is closer than this one.
-		 */
-		if (!dir) {
-			if (*sbnoa > args->max_agbno)
-				goto out_use_good;
-			if (*sbnoa >= args->agbno + gdiff)
-				goto out_use_good;
-		} else {
-			if (*sbnoa < args->min_agbno)
-				goto out_use_good;
-			if (*sbnoa <= args->agbno - gdiff)
-				goto out_use_good;
-		}
-
-		/*
-		 * Same distance, compare length and pick the best.
-		 */
-		if (*slena >= args->minlen) {
-			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
-			xfs_alloc_fix_len(args);
-
-			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-						       args->alignment,
-						       args->datatype, *sbnoa,
-						       *slena, &new);
-
-			/*
-			 * Choose closer size and invalidate other cursor.
-			 */
-			if (sdiff < gdiff)
-				goto out_use_search;
-			goto out_use_good;
-		}
+			return error;
+		if (i == 1)
+			break;
+		if (!xfs_alloc_cur_active(cur))
+			break;
 
-		if (!dir)
-			error = xfs_btree_increment(scur, 0, &i);
+		if (increment)
+			error = xfs_btree_increment(cur, 0, &i);
 		else
-			error = xfs_btree_decrement(scur, 0, &i);
+			error = xfs_btree_decrement(cur, 0, &i);
 		if (error)
-			goto error0;
-	} while (i);
-
-out_use_good:
-	scur->bc_private.a.priv.abt.active = false;
-	return 0;
+			return error;
+		if (i == 0)
+			cur->bc_private.a.priv.abt.active = false;
+	}
 
-out_use_search:
-	gcur->bc_private.a.priv.abt.active = false;
 	return 0;
-
-error0:
-	/* caller invalidates cursors */
-	return error;
 }
 
 /*
@@ -1266,23 +1223,13 @@ xfs_alloc_ag_vextent_near(
 	struct xfs_alloc_arg	*args)
 {
 	struct xfs_alloc_cur	acur = {0,};
-	struct xfs_btree_cur	*bno_cur;
-	xfs_agblock_t	gtbno;		/* start bno of right side entry */
-	xfs_agblock_t	gtbnoa;		/* aligned ... */
-	xfs_extlen_t	gtdiff;		/* difference to right side entry */
-	xfs_extlen_t	gtlen;		/* length of right side entry */
-	xfs_extlen_t	gtlena;		/* aligned ... */
-	xfs_agblock_t	gtnew;		/* useful start bno of right side */
-	int		error;		/* error code */
-	int		i;		/* result code, temporary */
-	int		j;		/* result code, temporary */
-	xfs_agblock_t	ltbno;		/* start bno of left side entry */
-	xfs_agblock_t	ltbnoa;		/* aligned ... */
-	xfs_extlen_t	ltdiff;		/* difference to left side entry */
-	xfs_extlen_t	ltlen;		/* length of left side entry */
-	xfs_extlen_t	ltlena;		/* aligned ... */
-	xfs_agblock_t	ltnew;		/* useful start bno of left side */
-	xfs_extlen_t	rlen;		/* length of returned extent */
+	struct xfs_btree_cur	*fbcur = NULL;
+	int			error;		/* error code */
+	int			i;		/* result code, temporary */
+	int			j;		/* result code, temporary */
+	xfs_agblock_t		bno;
+	xfs_extlen_t		len;
+	bool			fbinc = false;
 #ifdef DEBUG
 	/*
 	 * Randomly don't execute the first algorithm.
@@ -1304,9 +1251,7 @@ xfs_alloc_ag_vextent_near(
 		args->agbno = args->max_agbno;
 
 restart:
-	ltlen = 0;
-	gtlena = 0;
-	ltlena = 0;
+	len = 0;
 
 	/*
 	 * Set up cursors and see if there are any free extents as big as
@@ -1315,11 +1260,11 @@ xfs_alloc_ag_vextent_near(
 	 */
 	error = xfs_alloc_cur_setup(args, &acur);
 	if (error == -ENOSPC) {
-		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &ltbno,
-				&ltlen, &i);
+		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &bno,
+				&len, &i);
 		if (error)
 			goto out;
-		if (i == 0 || ltlen == 0) {
+		if (i == 0 || len == 0) {
 			trace_xfs_alloc_near_noentry(args);
 			goto out;
 		}
@@ -1350,21 +1295,21 @@ xfs_alloc_ag_vextent_near(
 		 * record smaller than maxlen, go to the start of this block,
 		 * and skip all those smaller than minlen.
 		 */
-		if (ltlen || args->alignment > 1) {
+		if (len || args->alignment > 1) {
 			acur.cnt->bc_ptrs[0] = 1;
 			do {
-				error = xfs_alloc_get_rec(acur.cnt, &ltbno,
-						&ltlen, &i);
+				error = xfs_alloc_get_rec(acur.cnt, &bno, &len,
+						&i);
 				if (error)
 					goto out;
 				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-				if (ltlen >= args->minlen)
+				if (len >= args->minlen)
 					break;
 				error = xfs_btree_increment(acur.cnt, 0, &i);
 				if (error)
 					goto out;
 			} while (i);
-			ASSERT(ltlen >= args->minlen);
+			ASSERT(len >= args->minlen);
 			if (!i)
 				break;
 		}
@@ -1433,77 +1378,43 @@ xfs_alloc_ag_vextent_near(
 	 */
 	do {
 		if (xfs_alloc_cur_active(acur.bnolt)) {
-			error = xfs_alloc_get_rec(acur.bnolt, &ltbno, &ltlen, &i);
+			error = xfs_alloc_cur_check(args, &acur, acur.bnolt, &i);
 			if (error)
 				goto out;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			acur.busy |= xfs_alloc_compute_aligned(args, ltbno,
-					ltlen, &ltbnoa, &ltlena, &acur.busy_gen);
-			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
+			if (i == 1) {
+				trace_xfs_alloc_cur_left(args);
+				fbcur = acur.bnogt;
+				fbinc = true;
 				break;
+			}
 			error = xfs_btree_decrement(acur.bnolt, 0, &i);
 			if (error)
 				goto out;
-			if (!i || ltbnoa < args->min_agbno)
+			if (!i)
 				acur.bnolt->bc_private.a.priv.abt.active = false;
 		}
 		if (xfs_alloc_cur_active(acur.bnogt)) {
-			error = xfs_alloc_get_rec(acur.bnogt, &gtbno, &gtlen, &i);
+			error = xfs_alloc_cur_check(args, &acur, acur.bnogt, &i);
 			if (error)
 				goto out;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			acur.busy |= xfs_alloc_compute_aligned(args, gtbno,
-					gtlen, &gtbnoa, &gtlena, &acur.busy_gen);
-			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
+			if (i == 1) {
+				trace_xfs_alloc_cur_right(args);
+				fbcur = acur.bnolt;
+				fbinc = false;
 				break;
+			}
 			error = xfs_btree_increment(acur.bnogt, 0, &i);
 			if (error)
 				goto out;
-			if (!i || gtbnoa > args->max_agbno)
+			if (!i)
 				acur.bnogt->bc_private.a.priv.abt.active = false;
 		}
 	} while (xfs_alloc_cur_active(acur.bnolt) ||
 		 xfs_alloc_cur_active(acur.bnogt));
 
-	/*
-	 * Got both cursors still active, need to find better entry.
-	 */
-	if (xfs_alloc_cur_active(acur.bnolt) &&
-	    xfs_alloc_cur_active(acur.bnogt)) {
-		if (ltlena >= args->minlen) {
-			/*
-			 * Left side is good, look for a right side entry.
-			 */
-			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, ltbnoa,
-				ltlena, &ltnew);
-
-			error = xfs_alloc_find_best_extent(args,
-						acur.bnolt, acur.bnogt,
-						ltdiff, &gtbno, &gtlen,
-						&gtbnoa, &gtlena,
-						0 /* search right */);
-		} else {
-			ASSERT(gtlena >= args->minlen);
-
-			/*
-			 * Right side is good, look for a left side entry.
-			 */
-			args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, gtbnoa,
-				gtlena, &gtnew);
-
-			error = xfs_alloc_find_best_extent(args,
-						acur.bnogt, acur.bnolt,
-						gtdiff, &ltbno, &ltlen,
-						&ltbnoa, &ltlena,
-						1 /* search left */);
-		}
-
+	/* search the opposite direction for a better entry */
+	if (fbcur) {
+		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);
 		if (error)
 			goto out;
 	}
@@ -1511,8 +1422,7 @@ xfs_alloc_ag_vextent_near(
 	/*
 	 * If we couldn't get anything, give up.
 	 */
-	if (!xfs_alloc_cur_active(acur.bnolt) &&
-	    !xfs_alloc_cur_active(acur.bnogt)) {
+	if (!acur.len) {
 		if (acur.busy) {
 			trace_xfs_alloc_near_busy(args);
 			xfs_extent_busy_flush(args->mp, args->pag,
@@ -1524,47 +1434,15 @@ xfs_alloc_ag_vextent_near(
 		goto out;
 	}
 
-	/*
-	 * At this point we have selected a freespace entry, either to the
-	 * left or to the right.  If it's on the right, copy all the
-	 * useful variables to the "left" set so we only have one
-	 * copy of this code.
-	 */
-	if (xfs_alloc_cur_active(acur.bnogt)) {
-		bno_cur = acur.bnogt;
-		ltbno = gtbno;
-		ltbnoa = gtbnoa;
-		ltlen = gtlen;
-		ltlena = gtlena;
-		j = 1;
-	} else {
-		bno_cur = acur.bnolt;
-		j = 0;
-	}
-
-	/*
-	 * Fix up the length and compute the useful address.
-	 */
-	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-	xfs_alloc_fix_len(args);
-	rlen = args->len;
-	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
-				     args->datatype, ltbnoa, ltlena, &ltnew);
-	ASSERT(ltnew >= ltbno);
-	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
-	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
-	args->agbno = ltnew;
-
-	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
-				      rlen, XFSA_FIXUP_BNO_OK);
-	if (error)
-		goto out;
+	args->agbno = acur.bno;
+	args->len = acur.len;
+	ASSERT(acur.bno >= acur.rec_bno);
+	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
+	ASSERT(acur.rec_bno + acur.rec_len <=
+	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
 
-	if (j)
-		trace_xfs_alloc_near_greater(args);
-	else
-		trace_xfs_alloc_near_lesser(args);
+	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
+				      acur.rec_len, acur.bno, acur.len, 0);
 
 out:
 	xfs_alloc_cur_close(&acur, error);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index cf618919cacf..ebe85e9ed7c8 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1642,8 +1642,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
-DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
-DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
-- 
2.20.1



* [PATCH v5 07/11] xfs: refactor allocation tree fixup code
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (5 preceding siblings ...)
  2019-09-27 17:17 ` [PATCH v5 06/11] xfs: reuse best extent tracking logic for bnobt scan Brian Foster
@ 2019-09-27 17:17 ` Brian Foster
  2019-09-27 17:17 ` [PATCH v5 08/11] xfs: refactor and reuse best extent scanning logic Brian Foster
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:17 UTC (permalink / raw)
  To: linux-xfs

Both algorithms duplicate the btree allocation code. Eliminate the
duplication and reuse the fallback algorithm's codepath.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index d412ae41676a..32b378c8e16c 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1333,23 +1333,8 @@ xfs_alloc_ag_vextent_near(
 		if (acur.len == 0)
 			break;
 
-		/*
-		 * Allocate at the bno/len tracked in the cursor.
-		 */
-		args->agbno = acur.bno;
-		args->len = acur.len;
-		ASSERT(acur.bno >= acur.rec_bno);
-		ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
-		ASSERT(acur.rec_bno + acur.rec_len <=
-		       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-
-		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt,
-				acur.rec_bno, acur.rec_len, acur.bno, acur.len,
-				0);
-		if (error)
-			goto out;
 		trace_xfs_alloc_near_first(args);
-		goto out;
+		goto alloc;
 	}
 
 	/*
@@ -1434,6 +1419,7 @@ xfs_alloc_ag_vextent_near(
 		goto out;
 	}
 
+alloc:
 	args->agbno = acur.bno;
 	args->len = acur.len;
 	ASSERT(acur.bno >= acur.rec_bno);
-- 
2.20.1


* [PATCH v5 08/11] xfs: refactor and reuse best extent scanning logic
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (6 preceding siblings ...)
  2019-09-27 17:17 ` [PATCH v5 07/11] xfs: refactor allocation tree fixup code Brian Foster
@ 2019-09-27 17:17 ` Brian Foster
  2019-10-04 22:59   ` Darrick J. Wong
  2019-09-27 17:18 ` [PATCH v5 09/11] xfs: refactor near mode alloc bnobt scan into separate function Brian Foster
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:17 UTC (permalink / raw)
  To: linux-xfs

The bnobt "find best" helper implements a simple btree walker
function. This general pattern, or a subset thereof, is reused in
various parts of a near mode allocation operation. For example, the
bnobt left/right scans are each iterative btree walks along with the
cntbt lastblock scan.

Rework this function into a generic btree walker, add a couple
parameters to control termination behavior from various contexts and
reuse it where applicable.
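
For reference, the new termination controls map onto the three reuse
sites as follows (calls copied from the hunks below; the trailing
parameters are increment, find_one and count):

	/* cntbt lastblock scan: check every record to the tree edge */
	error = xfs_alloc_walk_iter(args, &acur, acur.cnt, true, false,
				    -1, &i);

	/* bnobt left/right scan: step one record, stop on any candidate */
	error = xfs_alloc_walk_iter(args, &acur, acur.bnolt, false, true,
				    1, &i);

	/* opposite direction search: unbounded, stop on the first hit */
	error = xfs_alloc_walk_iter(args, &acur, fbcur, fbinc, true, -1,
				    &i);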

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 110 +++++++++++++++++++-------------------
 1 file changed, 55 insertions(+), 55 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 32b378c8e16c..85e82e184ec9 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -875,6 +875,13 @@ xfs_alloc_cur_check(
 	acur->diff = diff;
 	*new = 1;
 
+	/*
+	 * We're done if we found a perfect allocation. This only deactivates
+	 * the current cursor, but this is just an optimization to terminate a
+	 * cntbt search that otherwise runs to the edge of the tree.
+	 */
+	if (acur->diff == 0 && acur->len == args->maxlen)
+		deactivate = true;
 out:
 	if (deactivate)
 		cur->bc_private.a.priv.abt.active = false;
@@ -1172,30 +1179,38 @@ xfs_alloc_ag_vextent_exact(
 }
 
 /*
- * Search the btree in a given direction and check the records against the good
- * extent we've already found.
+ * Search a given number of btree records in a given direction. Check each
+ * record against the good extent we've already found.
  */
 STATIC int
-xfs_alloc_find_best_extent(
+xfs_alloc_walk_iter(
 	struct xfs_alloc_arg	*args,
 	struct xfs_alloc_cur	*acur,
 	struct xfs_btree_cur	*cur,
-	bool			increment)
+	bool			increment,
+	bool			find_one, /* quit on first candidate */
+	int			count,    /* rec count (-1 for infinite) */
+	int			*stat)
 {
 	int			error;
 	int			i;
 
+	*stat = 0;
+
 	/*
 	 * Search so long as the cursor is active or we find a better extent.
 	 * The cursor is deactivated if it extends beyond the range of the
 	 * current allocation candidate.
 	 */
-	while (xfs_alloc_cur_active(cur)) {
+	while (xfs_alloc_cur_active(cur) && count) {
 		error = xfs_alloc_cur_check(args, acur, cur, &i);
 		if (error)
 			return error;
-		if (i == 1)
-			break;
+		if (i == 1) {
+			*stat = 1;
+			if (find_one)
+				break;
+		}
 		if (!xfs_alloc_cur_active(cur))
 			break;
 
@@ -1207,6 +1222,9 @@ xfs_alloc_find_best_extent(
 			return error;
 		if (i == 0)
 			cur->bc_private.a.priv.abt.active = false;
+
+		if (count > 0)
+			count--;
 	}
 
 	return 0;
@@ -1226,7 +1244,6 @@ xfs_alloc_ag_vextent_near(
 	struct xfs_btree_cur	*fbcur = NULL;
 	int			error;		/* error code */
 	int			i;		/* result code, temporary */
-	int			j;		/* result code, temporary */
 	xfs_agblock_t		bno;
 	xfs_extlen_t		len;
 	bool			fbinc = false;
@@ -1313,19 +1330,12 @@ xfs_alloc_ag_vextent_near(
 			if (!i)
 				break;
 		}
-		i = acur.cnt->bc_ptrs[0];
-		for (j = 1;
-		     !error && j && xfs_alloc_cur_active(acur.cnt) &&
-		     (acur.len < args->maxlen || acur.diff > 0);
-		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
-			/*
-			 * For each entry, decide if it's better than
-			 * the previous best entry.
-			 */
-			error = xfs_alloc_cur_check(args, &acur, acur.cnt, &i);
-			if (error)
-				goto out;
-		}
+
+		error = xfs_alloc_walk_iter(args, &acur, acur.cnt, true, false,
+					    -1, &i);
+		if (error)
+			goto out;
+
 		/*
 		 * It didn't work.  We COULD be in a case where
 		 * there's a good record somewhere, so try again.
@@ -1357,49 +1367,39 @@ xfs_alloc_ag_vextent_near(
 		goto out;
 
 	/*
-	 * Loop going left with the leftward cursor, right with the
-	 * rightward cursor, until either both directions give up or
-	 * we find an entry at least as big as minlen.
+	 * Loop going left with the leftward cursor, right with the rightward
+	 * cursor, until either both directions give up or we find an entry at
+	 * least as big as minlen.
 	 */
 	do {
-		if (xfs_alloc_cur_active(acur.bnolt)) {
-			error = xfs_alloc_cur_check(args, &acur, acur.bnolt, &i);
-			if (error)
-				goto out;
-			if (i == 1) {
-				trace_xfs_alloc_cur_left(args);
-				fbcur = acur.bnogt;
-				fbinc = true;
-				break;
-			}
-			error = xfs_btree_decrement(acur.bnolt, 0, &i);
-			if (error)
-				goto out;
-			if (!i)
-				acur.bnolt->bc_private.a.priv.abt.active = false;
+		error = xfs_alloc_walk_iter(args, &acur, acur.bnolt, false,
+					    true, 1, &i);
+		if (error)
+			goto out;
+		if (i == 1) {
+			trace_xfs_alloc_cur_left(args);
+			fbcur = acur.bnogt;
+			fbinc = true;
+			break;
 		}
-		if (xfs_alloc_cur_active(acur.bnogt)) {
-			error = xfs_alloc_cur_check(args, &acur, acur.bnogt, &i);
-			if (error)
-				goto out;
-			if (i == 1) {
-				trace_xfs_alloc_cur_right(args);
-				fbcur = acur.bnolt;
-				fbinc = false;
-				break;
-			}
-			error = xfs_btree_increment(acur.bnogt, 0, &i);
-			if (error)
-				goto out;
-			if (!i)
-				acur.bnogt->bc_private.a.priv.abt.active = false;
+
+		error = xfs_alloc_walk_iter(args, &acur, acur.bnogt, true, true,
+					    1, &i);
+		if (error)
+			goto out;
+		if (i == 1) {
+			trace_xfs_alloc_cur_right(args);
+			fbcur = acur.bnolt;
+			fbinc = false;
+			break;
 		}
 	} while (xfs_alloc_cur_active(acur.bnolt) ||
 		 xfs_alloc_cur_active(acur.bnogt));
 
 	/* search the opposite direction for a better entry */
 	if (fbcur) {
-		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);
+		error = xfs_alloc_walk_iter(args, &acur, fbcur, fbinc, true, -1,
+					    &i);
 		if (error)
 			goto out;
 	}
-- 
2.20.1


* [PATCH v5 09/11] xfs: refactor near mode alloc bnobt scan into separate function
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (7 preceding siblings ...)
  2019-09-27 17:17 ` [PATCH v5 08/11] xfs: refactor and reuse best extent scanning logic Brian Foster
@ 2019-09-27 17:18 ` Brian Foster
  2019-09-27 17:18 ` [PATCH v5 10/11] xfs: factor out tree fixup logic into helper Brian Foster
  2019-09-27 17:18 ` [PATCH v5 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups Brian Foster
  10 siblings, 0 replies; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:18 UTC (permalink / raw)
  To: linux-xfs

In preparation for enhancing the near mode allocation bnobt scan
algorithm, lift it into a separate function. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 128 ++++++++++++++++++++++----------------
 1 file changed, 74 insertions(+), 54 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 85e82e184ec9..c1f59bfa8d09 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1230,6 +1230,78 @@ xfs_alloc_walk_iter(
 	return 0;
 }
 
+/*
+ * Search in the by-bno btree to the left and to the right simultaneously until
+ * in each case we find a large enough free extent or run into the edge of the
+ * tree. When we run into the edge of the tree, we deactivate that cursor.
+ */
+STATIC int
+xfs_alloc_ag_vextent_bnobt(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur,
+	int			*stat)
+{
+	struct xfs_btree_cur	*fbcur = NULL;
+	int			error;
+	int			i;
+	bool			fbinc;
+
+	ASSERT(acur->len == 0);
+	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
+
+	*stat = 0;
+
+	error = xfs_alloc_lookup_le(acur->bnolt, args->agbno, 0, &i);
+	if (error)
+		return error;
+	error = xfs_alloc_lookup_ge(acur->bnogt, args->agbno, 0, &i);
+	if (error)
+		return error;
+
+	/*
+	 * Loop going left with the leftward cursor, right with the rightward
+	 * cursor, until either both directions give up or we find an entry at
+	 * least as big as minlen.
+	 */
+	while (xfs_alloc_cur_active(acur->bnolt) ||
+	       xfs_alloc_cur_active(acur->bnogt)) {
+		error = xfs_alloc_walk_iter(args, acur, acur->bnolt, false,
+					    true, 1, &i);
+		if (error)
+			return error;
+		if (i == 1) {
+			trace_xfs_alloc_cur_left(args);
+			fbcur = acur->bnogt;
+			fbinc = true;
+			break;
+		}
+
+		error = xfs_alloc_walk_iter(args, acur, acur->bnogt, true, true,
+					    1, &i);
+		if (error)
+			return error;
+		if (i == 1) {
+			trace_xfs_alloc_cur_right(args);
+			fbcur = acur->bnolt;
+			fbinc = false;
+			break;
+		}
+	}
+
+	/* search the opposite direction for a better entry */
+	if (fbcur) {
+		error = xfs_alloc_walk_iter(args, acur, fbcur, fbinc, true, -1,
+					    &i);
+		if (error)
+			return error;
+	}
+
+	if (acur->len)
+		*stat = 1;
+
+	return 0;
+}
+
 /*
  * Allocate a variable extent near bno in the allocation group agno.
  * Extent's length (returned in len) will be between minlen and maxlen,
@@ -1241,12 +1313,10 @@ xfs_alloc_ag_vextent_near(
 	struct xfs_alloc_arg	*args)
 {
 	struct xfs_alloc_cur	acur = {0,};
-	struct xfs_btree_cur	*fbcur = NULL;
 	int			error;		/* error code */
 	int			i;		/* result code, temporary */
 	xfs_agblock_t		bno;
 	xfs_extlen_t		len;
-	bool			fbinc = false;
 #ifdef DEBUG
 	/*
 	 * Randomly don't execute the first algorithm.
@@ -1348,62 +1418,12 @@ xfs_alloc_ag_vextent_near(
 	}
 
 	/*
-	 * Second algorithm.
-	 * Search in the by-bno tree to the left and to the right
-	 * simultaneously, until in each case we find a space big enough,
-	 * or run into the edge of the tree.  When we run into the edge,
-	 * we deallocate that cursor.
-	 * If both searches succeed, we compare the two spaces and pick
-	 * the better one.
-	 * With alignment, it's possible for both to fail; the upper
-	 * level algorithm that picks allocation groups for allocations
-	 * is not supposed to do this.
+	 * Second algorithm. Search the bnobt left and right.
 	 */
-	error = xfs_alloc_lookup_le(acur.bnolt, args->agbno, 0, &i);
-	if (error)
-		goto out;
-	error = xfs_alloc_lookup_ge(acur.bnogt, args->agbno, 0, &i);
+	error = xfs_alloc_ag_vextent_bnobt(args, &acur, &i);
 	if (error)
 		goto out;
 
-	/*
-	 * Loop going left with the leftward cursor, right with the rightward
-	 * cursor, until either both directions give up or we find an entry at
-	 * least as big as minlen.
-	 */
-	do {
-		error = xfs_alloc_walk_iter(args, &acur, acur.bnolt, false,
-					    true, 1, &i);
-		if (error)
-			goto out;
-		if (i == 1) {
-			trace_xfs_alloc_cur_left(args);
-			fbcur = acur.bnogt;
-			fbinc = true;
-			break;
-		}
-
-		error = xfs_alloc_walk_iter(args, &acur, acur.bnogt, true, true,
-					    1, &i);
-		if (error)
-			goto out;
-		if (i == 1) {
-			trace_xfs_alloc_cur_right(args);
-			fbcur = acur.bnolt;
-			fbinc = false;
-			break;
-		}
-	} while (xfs_alloc_cur_active(acur.bnolt) ||
-		 xfs_alloc_cur_active(acur.bnogt));
-
-	/* search the opposite direction for a better entry */
-	if (fbcur) {
-		error = xfs_alloc_walk_iter(args, &acur, fbcur, fbinc, true, -1,
-					    &i);
-		if (error)
-			goto out;
-	}
-
 	/*
 	 * If we couldn't get anything, give up.
 	 */
-- 
2.20.1


* [PATCH v5 10/11] xfs: factor out tree fixup logic into helper
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (8 preceding siblings ...)
  2019-09-27 17:18 ` [PATCH v5 09/11] xfs: refactor near mode alloc bnobt scan into separate function Brian Foster
@ 2019-09-27 17:18 ` Brian Foster
  2019-09-27 17:18 ` [PATCH v5 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups Brian Foster
  10 siblings, 0 replies; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:18 UTC (permalink / raw)
  To: linux-xfs

Lift the btree fixup path into a helper function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 42 +++++++++++++++++++++++++++++----------
 fs/xfs/xfs_trace.h        |  1 +
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index c1f59bfa8d09..b16aa41507e8 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -890,6 +890,36 @@ xfs_alloc_cur_check(
 	return 0;
 }
 
+/*
+ * Complete an allocation of a candidate extent. Remove the extent from both
+ * trees and update the args structure.
+ */
+STATIC int
+xfs_alloc_cur_finish(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur)
+{
+	int			error;
+
+	ASSERT(acur->cnt && acur->bnolt);
+	ASSERT(acur->bno >= acur->rec_bno);
+	ASSERT(acur->bno + acur->len <= acur->rec_bno + acur->rec_len);
+	ASSERT(acur->rec_bno + acur->rec_len <=
+	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
+
+	error = xfs_alloc_fixup_trees(acur->cnt, acur->bnolt, acur->rec_bno,
+				      acur->rec_len, acur->bno, acur->len, 0);
+	if (error)
+		return error;
+
+	args->agbno = acur->bno;
+	args->len = acur->len;
+	args->wasfromfl = 0;
+
+	trace_xfs_alloc_cur(args);
+	return 0;
+}
+
 /*
  * Deal with the case where only small freespaces remain. Either return the
  * contents of the last freespace record, or allocate space from the freelist if
@@ -1359,7 +1389,6 @@ xfs_alloc_ag_vextent_near(
 	} else if (error) {
 		goto out;
 	}
-	args->wasfromfl = 0;
 
 	/*
 	 * First algorithm.
@@ -1440,15 +1469,8 @@ xfs_alloc_ag_vextent_near(
 	}
 
 alloc:
-	args->agbno = acur.bno;
-	args->len = acur.len;
-	ASSERT(acur.bno >= acur.rec_bno);
-	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
-	ASSERT(acur.rec_bno + acur.rec_len <=
-	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-
-	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
-				      acur.rec_len, acur.bno, acur.len, 0);
+	/* fix up btrees on a successful allocation */
+	error = xfs_alloc_cur_finish(args, &acur);
 
 out:
 	xfs_alloc_cur_close(&acur, error);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index ebe85e9ed7c8..21fa0e8822e3 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1642,6 +1642,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
-- 
2.20.1


* [PATCH v5 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups
  2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (9 preceding siblings ...)
  2019-09-27 17:18 ` [PATCH v5 10/11] xfs: factor out tree fixup logic into helper Brian Foster
@ 2019-09-27 17:18 ` Brian Foster
  2019-10-04 23:20   ` Darrick J. Wong
  10 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2019-09-27 17:18 UTC (permalink / raw)
  To: linux-xfs

The near mode fallback algorithm consists of a left/right scan of
the bnobt. This algorithm has very poor breakdown characteristics
under worst case free space fragmentation conditions. If a suitable
extent is far enough from the locality hint, each allocation may
scan most or all of the bnobt before it completes. This causes
pathological behavior and extremely high allocation latencies.

While locality is important to near mode allocations, it is not so
important as to incur pathological allocation latency to provide the
asolute best available locality for every allocation. If the
allocation is large enough or far enough away, there is a point of
diminishing returns. As such, we can bound the overall operation by
including an iterative cntbt lookup in the broader search. The cntbt
lookup is optimized to immediately find the extent with best
locality for the given size on each iteration. Since the cntbt is
indexed by extent size, the lookup repeats with a search key that
grows at a variably aggressive rate until it runs off the edge of
the tree.

This approach provides a natural balance between the two algorithms
for various situations. For example, the bnobt scan is able to
satisfy smaller allocations such as for inode chunks or btree blocks
more quickly, whereas the cntbt search may have to walk through a
large set of extent sizes when the search key starts off small
relative to the largest extent in the tree. On the other hand, the
cntbt search more deterministically covers the set of suitable
extents for larger data extent allocation requests, which the bnobt
scan might only locate after searching the entire tree.
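
To make the variably aggressive key growth concrete, below is a
minimal userspace sketch of the key update rule from
xfs_alloc_cntbt_iter() in the hunk that follows. This is an
illustration, not kernel code; it assumes each lookup finds an
extent exactly as large as the current key, and that a usable
candidate first appears on the fourth lookup:

	#include <stdio.h>

	int main(void)
	{
		unsigned int key = 4;	/* initial key, i.e. args->maxlen */
		int have_candidate = 0;
		int iter;

		for (iter = 0; iter < 8; iter++) {
			/* assume the lookup returns a key-sized extent */
			unsigned int found = key;
			unsigned int doubled = key << 1;

			/* pretend the fourth lookup yields a candidate */
			if (iter == 3)
				have_candidate = 1;

			/*
			 * Mirror of the key update in xfs_alloc_cntbt_iter():
			 * creep by one block while no candidate exists (or
			 * the found extent already meets the doubled key),
			 * otherwise double the key.
			 */
			if (!have_candidate || found >= doubled)
				key = found + 1;
			else
				key = doubled;

			printf("iteration %d: next search key %u blocks\n",
			       iter, key);
		}
		return 0;
	}

The printed progression is 5, 6, 7 (careful creep before a candidate
exists), then 14, 28, 56, ... (aggressive doubling once one does).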

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 154 +++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_trace.h        |   2 +
 2 files changed, 144 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index b16aa41507e8..e9f74eb92073 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -716,6 +716,7 @@ struct xfs_alloc_cur {
 	struct xfs_btree_cur		*cnt;	/* btree cursors */
 	struct xfs_btree_cur		*bnolt;
 	struct xfs_btree_cur		*bnogt;
+	xfs_extlen_t			cur_len;/* current search length */
 	xfs_agblock_t			rec_bno;/* extent startblock */
 	xfs_extlen_t			rec_len;/* extent length */
 	xfs_agblock_t			bno;	/* alloc bno */
@@ -740,6 +741,7 @@ xfs_alloc_cur_setup(
 
 	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
 
+	acur->cur_len = args->maxlen;
 	acur->rec_bno = 0;
 	acur->rec_len = 0;
 	acur->bno = 0;
@@ -920,6 +922,76 @@ xfs_alloc_cur_finish(
 	return 0;
 }
 
+/*
+ * Locality allocation lookup algorithm. This expects a cntbt cursor and uses
+ * bno optimized lookup to search for extents with ideal size and locality.
+ */
+STATIC int
+xfs_alloc_cntbt_iter(
+	struct xfs_alloc_arg		*args,
+	struct xfs_alloc_cur		*acur)
+{
+	struct xfs_btree_cur	*cur = acur->cnt;
+	xfs_agblock_t		bno;
+	xfs_extlen_t		len, cur_len;
+	int			error;
+	int			i;
+
+	if (!xfs_alloc_cur_active(cur))
+		return 0;
+
+	/* locality optimized lookup */
+	cur_len = acur->cur_len;
+	error = xfs_alloc_lookup_ge(cur, args->agbno, cur_len, &i);
+	if (error)
+		return error;
+	if (i == 0)
+		return 0;
+	error = xfs_alloc_get_rec(cur, &bno, &len, &i);
+	if (error)
+		return error;
+
+	/* check the current record and update search length from it */
+	error = xfs_alloc_cur_check(args, acur, cur, &i);
+	if (error)
+		return error;
+	ASSERT(len >= acur->cur_len);
+	acur->cur_len = len;
+
+	/*
+	 * We looked up the first record >= [agbno, len] above. The agbno is a
+	 * secondary key and so the current record may lie just before or after
+	 * agbno. If it is past agbno, check the previous record too so long as
+	 * the length matches as it may be closer. Don't check a smaller record
+	 * because that could deactivate our cursor.
+	 */
+	if (bno > args->agbno) {
+		error = xfs_btree_decrement(cur, 0, &i);
+		if (!error && i) {
+			error = xfs_alloc_get_rec(cur, &bno, &len, &i);
+			if (!error && i && len == acur->cur_len)
+				error = xfs_alloc_cur_check(args, acur, cur,
+							    &i);
+		}
+		if (error)
+			return error;
+	}
+
+	/*
+	 * Increment the search key until we find at least one allocation
+	 * candidate or if the extent we found was larger. Otherwise, double the
+	 * search key to optimize the search. Efficiency is more important here
+	 * than absolute best locality.
+	 */
+	cur_len <<= 1;
+	if (!acur->len || acur->cur_len >= cur_len)
+		acur->cur_len++;
+	else
+		acur->cur_len = cur_len;
+
+	return error;
+}
+
 /*
  * Deal with the case where only small freespaces remain. Either return the
  * contents of the last freespace record, or allocate space from the freelist if
@@ -1261,12 +1333,11 @@ xfs_alloc_walk_iter(
 }
 
 /*
- * Search in the by-bno btree to the left and to the right simultaneously until
- * in each case we find a large enough free extent or run into the edge of the
- * tree. When we run into the edge of the tree, we deactivate that cursor.
+ * Search the by-bno and by-size btrees in parallel in search of an extent with
+ * ideal locality based on the NEAR mode ->agbno locality hint.
  */
 STATIC int
-xfs_alloc_ag_vextent_bnobt(
+xfs_alloc_ag_vextent_locality(
 	struct xfs_alloc_arg	*args,
 	struct xfs_alloc_cur	*acur,
 	int			*stat)
@@ -1281,6 +1352,9 @@ xfs_alloc_ag_vextent_bnobt(
 
 	*stat = 0;
 
+	error = xfs_alloc_lookup_ge(acur->cnt, args->agbno, acur->cur_len, &i);
+	if (error)
+		return error;
 	error = xfs_alloc_lookup_le(acur->bnolt, args->agbno, 0, &i);
 	if (error)
 		return error;
@@ -1289,12 +1363,37 @@ xfs_alloc_ag_vextent_bnobt(
 		return error;
 
 	/*
-	 * Loop going left with the leftward cursor, right with the rightward
-	 * cursor, until either both directions give up or we find an entry at
-	 * least as big as minlen.
+	 * Search the bnobt and cntbt in parallel. Search the bnobt left and
+	 * right and lookup the closest extent to the locality hint for each
+	 * extent size key in the cntbt. The entire search terminates
+	 * immediately on a bnobt hit because that means we've found best case
+	 * locality. Otherwise the search continues until the cntbt cursor runs
+	 * off the end of the tree. If no allocation candidate is found at this
+	 * point, give up on locality, walk backwards from the end of the cntbt
+	 * and take the first available extent.
+	 *
+	 * The parallel tree searches balance each other out to provide fairly
+	 * consistent performance for various situations. The bnobt search can
+	 * have pathological behavior in the worst case scenario of larger
+	 * allocation requests and fragmented free space. On the other hand, the
+	 * bnobt is able to satisfy most smaller allocation requests much more
+	 * quickly than the cntbt. The cntbt search can sift through fragmented
+	 * free space and sets of free extents for larger allocation requests
+	 * more quickly than the bnobt. Since the locality hint is just a hint
+	 * and we don't want to scan the entire bnobt for perfect locality, the
+	 * cntbt search essentially bounds the bnobt search such that we can
+	 * find good enough locality at reasonable performance in most cases.
 	 */
 	while (xfs_alloc_cur_active(acur->bnolt) ||
-	       xfs_alloc_cur_active(acur->bnogt)) {
+	       xfs_alloc_cur_active(acur->bnogt) ||
+	       xfs_alloc_cur_active(acur->cnt)) {
+
+		trace_xfs_alloc_cur_lookup(args);
+
+		/*
+		 * Search the bnobt left and right. In the case of a hit, finish
+		 * the search in the opposite direction and we're done.
+		 */
 		error = xfs_alloc_walk_iter(args, acur, acur->bnolt, false,
 					    true, 1, &i);
 		if (error)
@@ -1305,7 +1404,6 @@ xfs_alloc_ag_vextent_bnobt(
 			fbinc = true;
 			break;
 		}
-
 		error = xfs_alloc_walk_iter(args, acur, acur->bnogt, true, true,
 					    1, &i);
 		if (error)
@@ -1316,9 +1414,40 @@ xfs_alloc_ag_vextent_bnobt(
 			fbinc = false;
 			break;
 		}
+
+		/*
+		 * Check the extent with best locality based on the current
+		 * extent size search key and keep track of the best candidate.
+		 */
+		error = xfs_alloc_cntbt_iter(args, acur);
+		if (error)
+			return error;
+		if (!xfs_alloc_cur_active(acur->cnt)) {
+			trace_xfs_alloc_cur_lookup_done(args);
+			break;
+		}
+	}
+
+	/*
+	 * If we failed to find anything due to busy extents, return empty
+	 * handed so the caller can flush and retry. If no busy extents were
+	 * found, walk backwards from the end of the cntbt as a last resort.
+	 */
+	if (!xfs_alloc_cur_active(acur->cnt) && !acur->len && !acur->busy) {
+		error = xfs_btree_decrement(acur->cnt, 0, &i);
+		if (error)
+			return error;
+		if (i) {
+			acur->cnt->bc_private.a.priv.abt.active = true;
+			fbcur = acur->cnt;
+			fbinc = false;
+		}
 	}
 
-	/* search the opposite direction for a better entry */
+	/*
+	 * Search in the opposite direction for a better entry in the case of
+	 * a bnobt hit or walk backwards from the end of the cntbt.
+	 */
 	if (fbcur) {
 		error = xfs_alloc_walk_iter(args, acur, fbcur, fbinc, true, -1,
 					    &i);
@@ -1447,9 +1576,10 @@ xfs_alloc_ag_vextent_near(
 	}
 
 	/*
-	 * Second algorithm. Search the bnobt left and right.
+	 * Second algorithm. Combined cntbt and bnobt search to find ideal
+	 * locality.
 	 */
-	error = xfs_alloc_ag_vextent_bnobt(args, &acur, &i);
+	error = xfs_alloc_ag_vextent_locality(args, &acur, &i);
 	if (error)
 		goto out;
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 21fa0e8822e3..a07a5fe22218 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1645,6 +1645,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup_done);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
-- 
2.20.1


* Re: [PATCH v5 01/11] xfs: track active state of allocation btree cursors
  2019-09-27 17:17 ` [PATCH v5 01/11] xfs: track active state of allocation btree cursors Brian Foster
@ 2019-09-30  8:11   ` Christoph Hellwig
  2019-09-30 12:17     ` Brian Foster
  2019-10-01  5:35   ` Darrick J. Wong
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2019-09-30  8:11 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 27, 2019 at 01:17:52PM -0400, Brian Foster wrote:
> The upcoming allocation algorithm update searches multiple
> allocation btree cursors concurrently. As such, it requires an
> active state to track when a particular cursor should continue
> searching. While active state will be modified based on higher level
> logic, we can define base functionality based on the result of
> allocation btree lookups.
> 
> Define an active flag in the private area of the btree cursor.
> Update it based on the result of lookups in the existing allocation
> btree helpers. Finally, provide a new helper to query the current
> state.

I vaguely remember having the discussion before, but why isn't the
active flag in the generic part of xfs_btree_cur and just tracked
for all types?  That would seem both simpler and more useful in
the long run.

* Re: [PATCH v5 01/11] xfs: track active state of allocation btree cursors
  2019-09-30  8:11   ` Christoph Hellwig
@ 2019-09-30 12:17     ` Brian Foster
  2019-10-01  6:36       ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2019-09-30 12:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Sep 30, 2019 at 01:11:38AM -0700, Christoph Hellwig wrote:
> On Fri, Sep 27, 2019 at 01:17:52PM -0400, Brian Foster wrote:
> > The upcoming allocation algorithm update searches multiple
> > allocation btree cursors concurrently. As such, it requires an
> > active state to track when a particular cursor should continue
> > searching. While active state will be modified based on higher level
> > logic, we can define base functionality based on the result of
> > allocation btree lookups.
> > 
> > Define an active flag in the private area of the btree cursor.
> > Update it based on the result of lookups in the existing allocation
> > btree helpers. Finally, provide a new helper to query the current
> > state.
> 
> I vaguely remember having the discussion before, but why isn't the
> active flag in the generic part of xfs_btree_cur and just tracked
> for all types?  That would seem both simpler and more useful in
> the long run.

The active flag was in the allocation cursor originally and was moved to
the private portion of the btree cursor simply because IIRC that's where
you suggested to put it. FWIW, that seems like the appropriate place to
me because 1.) as of right now I don't have any other use case in mind
outside of allocbt cursors 2.) flag state is similarly managed in the
allocation btree helpers and 3.) the flag is not necessarily used as a
generic btree cursor state (it is more accurately a superset of the
generic btree state where the allocation algorithm can also make higher
level changes). The latter bit is why it was originally put in the
allocation tracking structure, FWIW.

I've no fundamental objection to moving some or all of this to more
generic code down the road, but I'd prefer not to do that until there's
another user so the above can be rectified against an actual use case.
I can include the reasoning for the current placement in the commit log
description if that is useful.

Brian

* Re: [PATCH v5 01/11] xfs: track active state of allocation btree cursors
  2019-09-27 17:17 ` [PATCH v5 01/11] xfs: track active state of allocation btree cursors Brian Foster
  2019-09-30  8:11   ` Christoph Hellwig
@ 2019-10-01  5:35   ` Darrick J. Wong
  1 sibling, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2019-10-01  5:35 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 27, 2019 at 01:17:52PM -0400, Brian Foster wrote:
> The upcoming allocation algorithm update searches multiple
> allocation btree cursors concurrently. As such, it requires an
> active state to track when a particular cursor should continue
> searching. While active state will be modified based on higher level
> logic, we can define base functionality based on the result of
> allocation btree lookups.
> 
> Define an active flag in the private area of the btree cursor.
> Update it based on the result of lookups in the existing allocation
> btree helpers. Finally, provide a new helper to query the current
> state.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks good to me,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c       | 24 +++++++++++++++++++++---
>  fs/xfs/libxfs/xfs_alloc_btree.c |  1 +
>  fs/xfs/libxfs/xfs_btree.h       |  3 +++
>  3 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 533b04aaf6f6..0ecc142c833b 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -146,9 +146,13 @@ xfs_alloc_lookup_eq(
>  	xfs_extlen_t		len,	/* length of extent */
>  	int			*stat)	/* success/failure */
>  {
> +	int			error;
> +
>  	cur->bc_rec.a.ar_startblock = bno;
>  	cur->bc_rec.a.ar_blockcount = len;
> -	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
> +	error = xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
> +	cur->bc_private.a.priv.abt.active = (*stat == 1);
> +	return error;
>  }
>  
>  /*
> @@ -162,9 +166,13 @@ xfs_alloc_lookup_ge(
>  	xfs_extlen_t		len,	/* length of extent */
>  	int			*stat)	/* success/failure */
>  {
> +	int			error;
> +
>  	cur->bc_rec.a.ar_startblock = bno;
>  	cur->bc_rec.a.ar_blockcount = len;
> -	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
> +	error = xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
> +	cur->bc_private.a.priv.abt.active = (*stat == 1);
> +	return error;
>  }
>  
>  /*
> @@ -178,9 +186,19 @@ xfs_alloc_lookup_le(
>  	xfs_extlen_t		len,	/* length of extent */
>  	int			*stat)	/* success/failure */
>  {
> +	int			error;
>  	cur->bc_rec.a.ar_startblock = bno;
>  	cur->bc_rec.a.ar_blockcount = len;
> -	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
> +	error = xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
> +	cur->bc_private.a.priv.abt.active = (*stat == 1);
> +	return error;
> +}
> +
> +static inline bool
> +xfs_alloc_cur_active(
> +	struct xfs_btree_cur	*cur)
> +{
> +	return cur && cur->bc_private.a.priv.abt.active;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
> index 2a94543857a1..279694d73e4e 100644
> --- a/fs/xfs/libxfs/xfs_alloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_alloc_btree.c
> @@ -507,6 +507,7 @@ xfs_allocbt_init_cursor(
>  
>  	cur->bc_private.a.agbp = agbp;
>  	cur->bc_private.a.agno = agno;
> +	cur->bc_private.a.priv.abt.active = false;
>  
>  	if (xfs_sb_version_hascrc(&mp->m_sb))
>  		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index ced1e65d1483..b4e3ec1d7ff9 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -183,6 +183,9 @@ union xfs_btree_cur_private {
>  		unsigned long	nr_ops;		/* # record updates */
>  		int		shape_changes;	/* # of extent splits */
>  	} refc;
> +	struct {
> +		bool		active;		/* allocation cursor state */
> +	} abt;
>  };
>  
>  /*
> -- 
> 2.20.1
> 

* Re: [PATCH v5 01/11] xfs: track active state of allocation btree cursors
  2019-09-30 12:17     ` Brian Foster
@ 2019-10-01  6:36       ` Christoph Hellwig
  2019-10-01 10:30         ` Brian Foster
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2019-10-01  6:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Mon, Sep 30, 2019 at 08:17:01AM -0400, Brian Foster wrote:
> The active flag was in the allocation cursor originally and was moved to
> the private portion of the btree cursor simply because IIRC that's where
> you suggested to put it.

My memory starts fading, but IIRC you had a separate containing
structure and I asked to move it into xfs_btree_cur itself.

> FWIW, that seems like the appropriate place to
> me because 1.) as of right now I don't have any other use case in mind
> outside of allocbt cursors 2.) flag state is similarly managed in the
> allocation btree helpers and 3.) the flag is not necessarily used as a
> generic btree cursor state (it is more accurately a superset of the
> generic btree state where the allocation algorithm can also make higher
> level changes). The latter bit is why it was originally put in the
> allocation tracking structure, FWIW.

Ok, sounds fine with me for now.  It just feels like doing it in the
generic code would actually be simpler than updating all the wrappers.

* Re: [PATCH v5 01/11] xfs: track active state of allocation btree cursors
  2019-10-01  6:36       ` Christoph Hellwig
@ 2019-10-01 10:30         ` Brian Foster
  0 siblings, 0 replies; 22+ messages in thread
From: Brian Foster @ 2019-10-01 10:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Mon, Sep 30, 2019 at 11:36:34PM -0700, Christoph Hellwig wrote:
> On Mon, Sep 30, 2019 at 08:17:01AM -0400, Brian Foster wrote:
> > The active flag was in the allocation cursor originally and was moved to
> > the private portion of the btree cursor simply because IIRC that's where
> > you suggested to put it.
> 
> My memory starts fading, but IIRC you had a separate containing
> structure and I asked to move it into xfs_btree_cur itself.
> 

Right, that's the "allocation cursor" structure. I'd eventually like to
fold that into or with the existing allocation arg structure, but that's
something for after the other allocation modes are converted.

Anyways.. this was all buried in a single patch as well, which makes
it harder to dig out. For reference, the original feedback was here:

https://marc.info/?l=linux-xfs&m=155750947225047&w=2

> > FWIW, that seems like the appropriate place to
> > me because 1.) as of right now I don't have any other use case in mind
> > outside of allocbt cursors 2.) flag state is similarly managed in the
> > allocation btree helpers and 3.) the flag is not necessarily used as a
> > generic btree cursor state (it is more accurately a superset of the
> > generic btree state where the allocation algorithm can also make higher
> > level changes). The latter bit is why it was originally put in the
> > allocation tracking structure, FWIW.
> 
> Ok, sounds fine with me for now.  It just feels like doing it in the
> generic code would actually be simpler than updating all the wrappers.

Ok. It's not quite as simple due to the semantics described above. I'm
not totally convinced the generic "active" state would exactly match the
semantics used by the block allocation code. I'd hate to bury it in
there as is and have it end up being a landmine or wart if it is not
ever reused outside of extent allocation (or replaced with something
cleaner, ideally).

Brian

* Re: [PATCH v5 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor
  2019-09-27 17:17 ` [PATCH v5 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor Brian Foster
@ 2019-10-04 22:35   ` Darrick J. Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2019-10-04 22:35 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 27, 2019 at 01:17:55PM -0400, Brian Foster wrote:
> If the size lookup lands in the last block of the by-size btree, the
> near mode algorithm scans the entire block for the extent with best
> available locality. In preparation for similar best available
> extent tracking across both btrees, extend the allocation cursor
> with best extent data and lift the associated state from the cntbt
> last block scan code. No functional changes.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 63 ++++++++++++++++++++-------------------
>  1 file changed, 33 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index aa71a01a93b1..4514a059486e 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -716,6 +716,11 @@ struct xfs_alloc_cur {
>  	struct xfs_btree_cur		*cnt;	/* btree cursors */
>  	struct xfs_btree_cur		*bnolt;
>  	struct xfs_btree_cur		*bnogt;
> +	xfs_agblock_t			rec_bno;/* extent startblock */
> +	xfs_extlen_t			rec_len;/* extent length */
> +	xfs_agblock_t			bno;	/* alloc bno */
> +	xfs_extlen_t			len;	/* alloc len */
> +	xfs_extlen_t			diff;	/* diff from search bno */
>  	unsigned int			busy_gen;/* busy state */
>  	bool				busy;
>  };
> @@ -735,6 +740,11 @@ xfs_alloc_cur_setup(
>  
>  	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
>  
> +	acur->rec_bno = 0;
> +	acur->rec_len = 0;
> +	acur->bno = 0;
> +	acur->len = 0;
> +	acur->diff = 0;
>  	acur->busy = false;
>  	acur->busy_gen = 0;
>  
> @@ -1247,10 +1257,7 @@ xfs_alloc_ag_vextent_near(
>  	 * but we never loop back to the top.
>  	 */
>  	while (xfs_btree_islastblock(acur.cnt, 0)) {
> -		xfs_extlen_t	bdiff;
> -		int		besti=0;
> -		xfs_extlen_t	blen=0;
> -		xfs_agblock_t	bnew=0;
> +		xfs_extlen_t	diff;
>  
>  #ifdef DEBUG
>  		if (dofirst)
> @@ -1281,8 +1288,8 @@ xfs_alloc_ag_vextent_near(
>  				break;
>  		}
>  		i = acur.cnt->bc_ptrs[0];
> -		for (j = 1, blen = 0, bdiff = 0;
> -		     !error && j && (blen < args->maxlen || bdiff > 0);
> +		for (j = 1;
> +		     !error && j && (acur.len < args->maxlen || acur.diff > 0);
>  		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
>  			/*
>  			 * For each entry, decide if it's better than
> @@ -1301,44 +1308,40 @@ xfs_alloc_ag_vextent_near(
>  			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
>  			xfs_alloc_fix_len(args);
>  			ASSERT(args->len >= args->minlen);
> -			if (args->len < blen)
> +			if (args->len < acur.len)
>  				continue;
> -			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> +			diff = xfs_alloc_compute_diff(args->agbno, args->len,
>  				args->alignment, args->datatype, ltbnoa,
>  				ltlena, &ltnew);
>  			if (ltnew != NULLAGBLOCK &&
> -			    (args->len > blen || ltdiff < bdiff)) {
> -				bdiff = ltdiff;
> -				bnew = ltnew;
> -				blen = args->len;
> -				besti = acur.cnt->bc_ptrs[0];
> +			    (args->len > acur.len || diff < acur.diff)) {
> +				acur.rec_bno = ltbno;
> +				acur.rec_len = ltlen;
> +				acur.diff = diff;
> +				acur.bno = ltnew;
> +				acur.len = args->len;
>  			}
>  		}
>  		/*
>  		 * It didn't work.  We COULD be in a case where
>  		 * there's a good record somewhere, so try again.
>  		 */
> -		if (blen == 0)
> +		if (acur.len == 0)
>  			break;
> -		/*
> -		 * Point at the best entry, and retrieve it again.
> -		 */
> -		acur.cnt->bc_ptrs[0] = besti;
> -		error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
> -		if (error)
> -			goto out;
> -		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> -		args->len = blen;
>  
>  		/*
> -		 * We are allocating starting at bnew for blen blocks.
> +		 * Allocate at the bno/len tracked in the cursor.
>  		 */
> -		args->agbno = bnew;
> -		ASSERT(bnew >= ltbno);
> -		ASSERT(bnew + blen <= ltbno + ltlen);
> -		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, ltbno,
> -					ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
> +		args->agbno = acur.bno;
> +		args->len = acur.len;
> +		ASSERT(acur.bno >= acur.rec_bno);
> +		ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> +		ASSERT(acur.rec_bno + acur.rec_len <=
> +		       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> +
> +		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt,
> +				acur.rec_bno, acur.rec_len, acur.bno, acur.len,
> +				0);
>  		if (error)
>  			goto out;
>  		trace_xfs_alloc_near_first(args);
> -- 
> 2.20.1
> 

* Re: [PATCH v5 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper
  2019-09-27 17:17 ` [PATCH v5 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper Brian Foster
@ 2019-10-04 22:40   ` Darrick J. Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2019-10-04 22:40 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 27, 2019 at 01:17:56PM -0400, Brian Foster wrote:
> The cntbt lastblock scan checks the size, alignment, locality, etc.
> of each free extent in the block and compares it with the current
> best candidate. This logic will be reused by the upcoming optimized
> cntbt algorithm, so refactor it into a separate helper. Note that
> acur->diff is now initialized to -1 (unsigned) instead of 0 to
> support the more granular comparison logic in the new helper.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 115 ++++++++++++++++++++++++++++----------
>  fs/xfs/xfs_trace.h        |  26 +++++++++
>  2 files changed, 113 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 4514a059486e..38183953636a 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -744,7 +744,7 @@ xfs_alloc_cur_setup(
>  	acur->rec_len = 0;
>  	acur->bno = 0;
>  	acur->len = 0;
> -	acur->diff = 0;
> +	acur->diff = -1;
>  	acur->busy = false;
>  	acur->busy_gen = 0;
>  
> @@ -791,6 +791,89 @@ xfs_alloc_cur_close(
>  	acur->cnt = acur->bnolt = acur->bnogt = NULL;
>  }
>  
> +/*
> + * Check an extent for allocation and track the best available candidate in the
> + * allocation structure. The cursor is deactivated if it has entered an out of
> + * range state based on allocation arguments. Optionally return the extent
> + * geometry and allocation status if requested by the caller.
> + */
> +static int
> +xfs_alloc_cur_check(
> +	struct xfs_alloc_arg	*args,
> +	struct xfs_alloc_cur	*acur,
> +	struct xfs_btree_cur	*cur,
> +	int			*new)
> +{
> +	int			error, i;
> +	xfs_agblock_t		bno, bnoa, bnew;
> +	xfs_extlen_t		len, lena, diff = -1;
> +	bool			busy;
> +	unsigned		busy_gen = 0;
> +	bool			deactivate = false;
> +
> +	*new = 0;
> +
> +	error = xfs_alloc_get_rec(cur, &bno, &len, &i);
> +	if (error)
> +		return error;
> +	XFS_WANT_CORRUPTED_RETURN(args->mp, i == 1);
> +
> +	/*
> +	 * Check minlen and deactivate a cntbt cursor if out of acceptable size
> +	 * range (i.e., walking backwards looking for a minlen extent).
> +	 */
> +	if (len < args->minlen) {
> +		deactivate = true;
> +		goto out;
> +	}
> +
> +	busy = xfs_alloc_compute_aligned(args, bno, len, &bnoa, &lena,
> +					 &busy_gen);
> +	acur->busy |= busy;
> +	if (busy)
> +		acur->busy_gen = busy_gen;
> +	/* deactivate a bnobt cursor outside of locality range */
> +	if (bnoa < args->min_agbno || bnoa > args->max_agbno)
> +		goto out;
> +	if (lena < args->minlen)
> +		goto out;
> +
> +	args->len = XFS_EXTLEN_MIN(lena, args->maxlen);
> +	xfs_alloc_fix_len(args);
> +	ASSERT(args->len >= args->minlen);
> +	if (args->len < acur->len)
> +		goto out;
> +
> +	/*
> +	 * We have an aligned record that satisfies minlen and beats or matches
> +	 * the candidate extent size. Compare locality for near allocation mode.
> +	 */
> +	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
> +	diff = xfs_alloc_compute_diff(args->agbno, args->len,
> +				      args->alignment, args->datatype,
> +				      bnoa, lena, &bnew);
> +	if (bnew == NULLAGBLOCK)
> +		goto out;
> +	if (diff > acur->diff)
> +		goto out;
> +
> +	ASSERT(args->len > acur->len ||
> +	       (args->len == acur->len && diff <= acur->diff));
> +	acur->rec_bno = bno;
> +	acur->rec_len = len;
> +	acur->bno = bnew;
> +	acur->len = args->len;
> +	acur->diff = diff;
> +	*new = 1;
> +
> +out:
> +	if (deactivate)
> +		cur->bc_private.a.priv.abt.active = false;
> +	trace_xfs_alloc_cur_check(args->mp, cur->bc_btnum, bno, len, diff,
> +				  *new);
> +	return 0;
> +}
> +
>  /*
>   * Deal with the case where only small freespaces remain. Either return the
>   * contents of the last freespace record, or allocate space from the freelist if
> @@ -1257,8 +1340,6 @@ xfs_alloc_ag_vextent_near(
>  	 * but we never loop back to the top.
>  	 */
>  	while (xfs_btree_islastblock(acur.cnt, 0)) {
> -		xfs_extlen_t	diff;
> -
>  #ifdef DEBUG
>  		if (dofirst)
>  			break;
> @@ -1289,38 +1370,16 @@ xfs_alloc_ag_vextent_near(
>  		}
>  		i = acur.cnt->bc_ptrs[0];
>  		for (j = 1;
> -		     !error && j && (acur.len < args->maxlen || acur.diff > 0);
> +		     !error && j && xfs_alloc_cur_active(acur.cnt) &&
> +		     (acur.len < args->maxlen || acur.diff > 0);
>  		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
>  			/*
>  			 * For each entry, decide if it's better than
>  			 * the previous best entry.
>  			 */
> -			error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
> +			error = xfs_alloc_cur_check(args, &acur, acur.cnt, &i);
>  			if (error)
>  				goto out;
> -			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -			acur.busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
> -					&ltbnoa, &ltlena, &acur.busy_gen);
> -			if (ltlena < args->minlen)
> -				continue;
> -			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
> -				continue;
> -			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> -			xfs_alloc_fix_len(args);
> -			ASSERT(args->len >= args->minlen);
> -			if (args->len < acur.len)
> -				continue;
> -			diff = xfs_alloc_compute_diff(args->agbno, args->len,
> -				args->alignment, args->datatype, ltbnoa,
> -				ltlena, &ltnew);
> -			if (ltnew != NULLAGBLOCK &&
> -			    (args->len > acur.len || diff < acur.diff)) {
> -				acur.rec_bno = ltbno;
> -				acur.rec_len = ltlen;
> -				acur.diff = diff;
> -				acur.bno = ltnew;
> -				acur.len = args->len;
> -			}
>  		}
>  		/*
>  		 * It didn't work.  We COULD be in a case where
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index eaae275ed430..cf618919cacf 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1663,6 +1663,32 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
>  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
>  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
>  
> +TRACE_EVENT(xfs_alloc_cur_check,
> +	TP_PROTO(struct xfs_mount *mp, xfs_btnum_t btnum, xfs_agblock_t bno,
> +		 xfs_extlen_t len, xfs_extlen_t diff, bool new),
> +	TP_ARGS(mp, btnum, bno, len, diff, new),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_btnum_t, btnum)
> +		__field(xfs_agblock_t, bno)
> +		__field(xfs_extlen_t, len)
> +		__field(xfs_extlen_t, diff)
> +		__field(bool, new)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->btnum = btnum;
> +		__entry->bno = bno;
> +		__entry->len = len;
> +		__entry->diff = diff;
> +		__entry->new = new;
> +	),
> +	TP_printk("dev %d:%d btree %s bno 0x%x len 0x%x diff 0x%x new %d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> +		  __entry->bno, __entry->len, __entry->diff, __entry->new)
> +)
> +
>  DECLARE_EVENT_CLASS(xfs_da_class,
>  	TP_PROTO(struct xfs_da_args *args),
>  	TP_ARGS(args),
> -- 
> 2.20.1
> 

* Re: [PATCH v5 06/11] xfs: reuse best extent tracking logic for bnobt scan
  2019-09-27 17:17 ` [PATCH v5 06/11] xfs: reuse best extent tracking logic for bnobt scan Brian Foster
@ 2019-10-04 22:45   ` Darrick J. Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2019-10-04 22:45 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 27, 2019 at 01:17:57PM -0400, Brian Foster wrote:
> The near mode bnobt scan searches left and right in the bnobt
> looking for the closest free extent to the allocation hint that
> satisfies minlen. Once such an extent is found, the left/right
> search terminates, we search one more time in the opposite direction
> and finish the allocation with the best overall extent.
> 
> The left/right and find best searches are currently controlled via a
> combination of cursor state and local variables. Clean up this code
> and prepare for further improvements to the near mode fallback
> algorithm by reusing the allocation cursor best extent tracking
> mechanism. Update the tracking logic to deactivate bnobt cursors
> when out of allocation range and replace open-coded extent checks to
> calls to the common helper. In doing so, rename some misnamed local
> variables in the top-level near mode allocation function.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks reasonable,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 276 +++++++++++---------------------------
>  fs/xfs/xfs_trace.h        |   4 +-
>  2 files changed, 79 insertions(+), 201 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 38183953636a..d412ae41676a 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -810,6 +810,7 @@ xfs_alloc_cur_check(
>  	bool			busy;
>  	unsigned		busy_gen = 0;
>  	bool			deactivate = false;
> +	bool			isbnobt = cur->bc_btnum == XFS_BTNUM_BNO;
>  
>  	*new = 0;
>  
> @@ -823,7 +824,7 @@ xfs_alloc_cur_check(
>  	 * range (i.e., walking backwards looking for a minlen extent).
>  	 */
>  	if (len < args->minlen) {
> -		deactivate = true;
> +		deactivate = !isbnobt;
>  		goto out;
>  	}
>  
> @@ -833,8 +834,10 @@ xfs_alloc_cur_check(
>  	if (busy)
>  		acur->busy_gen = busy_gen;
>  	/* deactivate a bnobt cursor outside of locality range */
> -	if (bnoa < args->min_agbno || bnoa > args->max_agbno)
> +	if (bnoa < args->min_agbno || bnoa > args->max_agbno) {
> +		deactivate = isbnobt;
>  		goto out;
> +	}
>  	if (lena < args->minlen)
>  		goto out;
>  
> @@ -854,8 +857,14 @@ xfs_alloc_cur_check(
>  				      bnoa, lena, &bnew);
>  	if (bnew == NULLAGBLOCK)
>  		goto out;
> -	if (diff > acur->diff)
> +
> +	/*
> +	 * Deactivate a bnobt cursor with worse locality than the current best.
> +	 */
> +	if (diff > acur->diff) {
> +		deactivate = isbnobt;
>  		goto out;
> +	}
>  
>  	ASSERT(args->len > acur->len ||
>  	       (args->len == acur->len && diff <= acur->diff));
> @@ -1163,96 +1172,44 @@ xfs_alloc_ag_vextent_exact(
>  }
>  
>  /*
> - * Search the btree in a given direction via the search cursor and compare
> - * the records found against the good extent we've already found.
> + * Search the btree in a given direction and check the records against the good
> + * extent we've already found.
>   */
>  STATIC int
>  xfs_alloc_find_best_extent(
> -	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> -	struct xfs_btree_cur	*gcur,	/* good cursor */
> -	struct xfs_btree_cur	*scur,	/* searching cursor */
> -	xfs_agblock_t		gdiff,	/* difference for search comparison */
> -	xfs_agblock_t		*sbno,	/* extent found by search */
> -	xfs_extlen_t		*slen,	/* extent length */
> -	xfs_agblock_t		*sbnoa,	/* aligned extent found by search */
> -	xfs_extlen_t		*slena,	/* aligned extent length */
> -	int			dir)	/* 0 = search right, 1 = search left */
> +	struct xfs_alloc_arg	*args,
> +	struct xfs_alloc_cur	*acur,
> +	struct xfs_btree_cur	*cur,
> +	bool			increment)
>  {
> -	xfs_agblock_t		new;
> -	xfs_agblock_t		sdiff;
>  	int			error;
>  	int			i;
> -	unsigned		busy_gen;
> -
> -	/* The good extent is perfect, no need to  search. */
> -	if (!gdiff)
> -		goto out_use_good;
>  
>  	/*
> -	 * Look until we find a better one, run out of space or run off the end.
> +	 * Search so long as the cursor is active or we find a better extent.
> +	 * The cursor is deactivated if it extends beyond the range of the
> +	 * current allocation candidate.
>  	 */
> -	do {
> -		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
> +	while (xfs_alloc_cur_active(cur)) {
> +		error = xfs_alloc_cur_check(args, acur, cur, &i);
>  		if (error)
> -			goto error0;
> -		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> -		xfs_alloc_compute_aligned(args, *sbno, *slen,
> -				sbnoa, slena, &busy_gen);
> -
> -		/*
> -		 * The good extent is closer than this one.
> -		 */
> -		if (!dir) {
> -			if (*sbnoa > args->max_agbno)
> -				goto out_use_good;
> -			if (*sbnoa >= args->agbno + gdiff)
> -				goto out_use_good;
> -		} else {
> -			if (*sbnoa < args->min_agbno)
> -				goto out_use_good;
> -			if (*sbnoa <= args->agbno - gdiff)
> -				goto out_use_good;
> -		}
> -
> -		/*
> -		 * Same distance, compare length and pick the best.
> -		 */
> -		if (*slena >= args->minlen) {
> -			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
> -			xfs_alloc_fix_len(args);
> -
> -			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -						       args->alignment,
> -						       args->datatype, *sbnoa,
> -						       *slena, &new);
> -
> -			/*
> -			 * Choose closer size and invalidate other cursor.
> -			 */
> -			if (sdiff < gdiff)
> -				goto out_use_search;
> -			goto out_use_good;
> -		}
> +			return error;
> +		if (i == 1)
> +			break;
> +		if (!xfs_alloc_cur_active(cur))
> +			break;
>  
> -		if (!dir)
> -			error = xfs_btree_increment(scur, 0, &i);
> +		if (increment)
> +			error = xfs_btree_increment(cur, 0, &i);
>  		else
> -			error = xfs_btree_decrement(scur, 0, &i);
> +			error = xfs_btree_decrement(cur, 0, &i);
>  		if (error)
> -			goto error0;
> -	} while (i);
> -
> -out_use_good:
> -	scur->bc_private.a.priv.abt.active = false;
> -	return 0;
> +			return error;
> +		if (i == 0)
> +			cur->bc_private.a.priv.abt.active = false;
> +	}
>  
> -out_use_search:
> -	gcur->bc_private.a.priv.abt.active = false;
>  	return 0;
> -
> -error0:
> -	/* caller invalidates cursors */
> -	return error;
>  }
>  
>  /*
> @@ -1266,23 +1223,13 @@ xfs_alloc_ag_vextent_near(
>  	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_alloc_cur	acur = {0,};
> -	struct xfs_btree_cur	*bno_cur;
> -	xfs_agblock_t	gtbno;		/* start bno of right side entry */
> -	xfs_agblock_t	gtbnoa;		/* aligned ... */
> -	xfs_extlen_t	gtdiff;		/* difference to right side entry */
> -	xfs_extlen_t	gtlen;		/* length of right side entry */
> -	xfs_extlen_t	gtlena;		/* aligned ... */
> -	xfs_agblock_t	gtnew;		/* useful start bno of right side */
> -	int		error;		/* error code */
> -	int		i;		/* result code, temporary */
> -	int		j;		/* result code, temporary */
> -	xfs_agblock_t	ltbno;		/* start bno of left side entry */
> -	xfs_agblock_t	ltbnoa;		/* aligned ... */
> -	xfs_extlen_t	ltdiff;		/* difference to left side entry */
> -	xfs_extlen_t	ltlen;		/* length of left side entry */
> -	xfs_extlen_t	ltlena;		/* aligned ... */
> -	xfs_agblock_t	ltnew;		/* useful start bno of left side */
> -	xfs_extlen_t	rlen;		/* length of returned extent */
> +	struct xfs_btree_cur	*fbcur = NULL;
> +	int			error;		/* error code */
> +	int			i;		/* result code, temporary */
> +	int			j;		/* result code, temporary */
> +	xfs_agblock_t		bno;
> +	xfs_extlen_t		len;
> +	bool			fbinc = false;
>  #ifdef DEBUG
>  	/*
>  	 * Randomly don't execute the first algorithm.
> @@ -1304,9 +1251,7 @@ xfs_alloc_ag_vextent_near(
>  		args->agbno = args->max_agbno;
>  
>  restart:
> -	ltlen = 0;
> -	gtlena = 0;
> -	ltlena = 0;
> +	len = 0;
>  
>  	/*
>  	 * Set up cursors and see if there are any free extents as big as
> @@ -1315,11 +1260,11 @@ xfs_alloc_ag_vextent_near(
>  	 */
>  	error = xfs_alloc_cur_setup(args, &acur);
>  	if (error == -ENOSPC) {
> -		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &ltbno,
> -				&ltlen, &i);
> +		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &bno,
> +				&len, &i);
>  		if (error)
>  			goto out;
> -		if (i == 0 || ltlen == 0) {
> +		if (i == 0 || len == 0) {
>  			trace_xfs_alloc_near_noentry(args);
>  			goto out;
>  		}
> @@ -1350,21 +1295,21 @@ xfs_alloc_ag_vextent_near(
>  		 * record smaller than maxlen, go to the start of this block,
>  		 * and skip all those smaller than minlen.
>  		 */
> -		if (ltlen || args->alignment > 1) {
> +		if (len || args->alignment > 1) {
>  			acur.cnt->bc_ptrs[0] = 1;
>  			do {
> -				error = xfs_alloc_get_rec(acur.cnt, &ltbno,
> -						&ltlen, &i);
> +				error = xfs_alloc_get_rec(acur.cnt, &bno, &len,
> +						&i);
>  				if (error)
>  					goto out;
>  				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -				if (ltlen >= args->minlen)
> +				if (len >= args->minlen)
>  					break;
>  				error = xfs_btree_increment(acur.cnt, 0, &i);
>  				if (error)
>  					goto out;
>  			} while (i);
> -			ASSERT(ltlen >= args->minlen);
> +			ASSERT(len >= args->minlen);
>  			if (!i)
>  				break;
>  		}
> @@ -1433,77 +1378,43 @@ xfs_alloc_ag_vextent_near(
>  	 */
>  	do {
>  		if (xfs_alloc_cur_active(acur.bnolt)) {
> -			error = xfs_alloc_get_rec(acur.bnolt, &ltbno, &ltlen, &i);
> +			error = xfs_alloc_cur_check(args, &acur, acur.bnolt, &i);
>  			if (error)
>  				goto out;
> -			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -			acur.busy |= xfs_alloc_compute_aligned(args, ltbno,
> -					ltlen, &ltbnoa, &ltlena, &acur.busy_gen);
> -			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
> +			if (i == 1) {
> +				trace_xfs_alloc_cur_left(args);
> +				fbcur = acur.bnogt;
> +				fbinc = true;
>  				break;
> +			}
>  			error = xfs_btree_decrement(acur.bnolt, 0, &i);
>  			if (error)
>  				goto out;
> -			if (!i || ltbnoa < args->min_agbno)
> +			if (!i)
>  				acur.bnolt->bc_private.a.priv.abt.active = false;
>  		}
>  		if (xfs_alloc_cur_active(acur.bnogt)) {
> -			error = xfs_alloc_get_rec(acur.bnogt, &gtbno, &gtlen, &i);
> +			error = xfs_alloc_cur_check(args, &acur, acur.bnogt, &i);
>  			if (error)
>  				goto out;
> -			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -			acur.busy |= xfs_alloc_compute_aligned(args, gtbno,
> -					gtlen, &gtbnoa, &gtlena, &acur.busy_gen);
> -			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
> +			if (i == 1) {
> +				trace_xfs_alloc_cur_right(args);
> +				fbcur = acur.bnolt;
> +				fbinc = false;
>  				break;
> +			}
>  			error = xfs_btree_increment(acur.bnogt, 0, &i);
>  			if (error)
>  				goto out;
> -			if (!i || gtbnoa > args->max_agbno)
> +			if (!i)
>  				acur.bnogt->bc_private.a.priv.abt.active = false;
>  		}
>  	} while (xfs_alloc_cur_active(acur.bnolt) ||
>  		 xfs_alloc_cur_active(acur.bnogt));
>  
> -	/*
> -	 * Got both cursors still active, need to find better entry.
> -	 */
> -	if (xfs_alloc_cur_active(acur.bnolt) &&
> -	    xfs_alloc_cur_active(acur.bnogt)) {
> -		if (ltlena >= args->minlen) {
> -			/*
> -			 * Left side is good, look for a right side entry.
> -			 */
> -			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> -			xfs_alloc_fix_len(args);
> -			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -				args->alignment, args->datatype, ltbnoa,
> -				ltlena, &ltnew);
> -
> -			error = xfs_alloc_find_best_extent(args,
> -						acur.bnolt, acur.bnogt,
> -						ltdiff, &gtbno, &gtlen,
> -						&gtbnoa, &gtlena,
> -						0 /* search right */);
> -		} else {
> -			ASSERT(gtlena >= args->minlen);
> -
> -			/*
> -			 * Right side is good, look for a left side entry.
> -			 */
> -			args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
> -			xfs_alloc_fix_len(args);
> -			gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -				args->alignment, args->datatype, gtbnoa,
> -				gtlena, &gtnew);
> -
> -			error = xfs_alloc_find_best_extent(args,
> -						acur.bnogt, acur.bnolt,
> -						gtdiff, &ltbno, &ltlen,
> -						&ltbnoa, &ltlena,
> -						1 /* search left */);
> -		}
> -
> +	/* search the opposite direction for a better entry */
> +	if (fbcur) {
> +		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);
>  		if (error)
>  			goto out;
>  	}
> @@ -1511,8 +1422,7 @@ xfs_alloc_ag_vextent_near(
>  	/*
>  	 * If we couldn't get anything, give up.
>  	 */
> -	if (!xfs_alloc_cur_active(acur.bnolt) &&
> -	    !xfs_alloc_cur_active(acur.bnogt)) {
> +	if (!acur.len) {
>  		if (acur.busy) {
>  			trace_xfs_alloc_near_busy(args);
>  			xfs_extent_busy_flush(args->mp, args->pag,
> @@ -1524,47 +1434,15 @@ xfs_alloc_ag_vextent_near(
>  		goto out;
>  	}
>  
> -	/*
> -	 * At this point we have selected a freespace entry, either to the
> -	 * left or to the right.  If it's on the right, copy all the
> -	 * useful variables to the "left" set so we only have one
> -	 * copy of this code.
> -	 */
> -	if (xfs_alloc_cur_active(acur.bnogt)) {
> -		bno_cur = acur.bnogt;
> -		ltbno = gtbno;
> -		ltbnoa = gtbnoa;
> -		ltlen = gtlen;
> -		ltlena = gtlena;
> -		j = 1;
> -	} else {
> -		bno_cur = acur.bnolt;
> -		j = 0;
> -	}
> -
> -	/*
> -	 * Fix up the length and compute the useful address.
> -	 */
> -	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> -	xfs_alloc_fix_len(args);
> -	rlen = args->len;
> -	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
> -				     args->datatype, ltbnoa, ltlena, &ltnew);
> -	ASSERT(ltnew >= ltbno);
> -	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
> -	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> -	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
> -	args->agbno = ltnew;
> -
> -	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
> -				      rlen, XFSA_FIXUP_BNO_OK);
> -	if (error)
> -		goto out;
> +	args->agbno = acur.bno;
> +	args->len = acur.len;
> +	ASSERT(acur.bno >= acur.rec_bno);
> +	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> +	ASSERT(acur.rec_bno + acur.rec_len <=
> +	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
>  
> -	if (j)
> -		trace_xfs_alloc_near_greater(args);
> -	else
> -		trace_xfs_alloc_near_lesser(args);
> +	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
> +				      acur.rec_len, acur.bno, acur.len, 0);
>  
>  out:
>  	xfs_alloc_cur_close(&acur, error);
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index cf618919cacf..ebe85e9ed7c8 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1642,8 +1642,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
>  DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
> -DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
> -DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
> +DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
> +DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
> -- 
> 2.20.1
> 
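
A note on the rewritten loop above: after a hit on one bnobt cursor, the
search continues from the opposite cursor in its natural direction. A
minimal sketch of that bookkeeping, distilled from the hunk above (the
hit_on_* condition names are illustrative, not from the patch):

	/*
	 * fbcur/fbinc select which cursor keeps walking, and in which
	 * direction, once one side of the bnobt scan finds a candidate.
	 */
	if (hit_on_left) {		/* acur.bnolt found a candidate */
		fbcur = acur.bnogt;	/* keep searching right ... */
		fbinc = true;		/* ... by incrementing */
	} else if (hit_on_right) {	/* acur.bnogt found a candidate */
		fbcur = acur.bnolt;	/* keep searching left ... */
		fbinc = false;		/* ... by decrementing */
	}
	if (fbcur)
		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);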


* Re: [PATCH v5 08/11] xfs: refactor and reuse best extent scanning logic
  2019-09-27 17:17 ` [PATCH v5 08/11] xfs: refactor and reuse best extent scanning logic Brian Foster
@ 2019-10-04 22:59   ` Darrick J. Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2019-10-04 22:59 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 27, 2019 at 01:17:59PM -0400, Brian Foster wrote:
> The bnobt "find best" helper implements a simple btree walker
> function. This general pattern, or a subset thereof, is reused in
> various parts of a near mode allocation operation. For example, the
> bnobt left/right scans and the cntbt lastblock scan are each
> iterative btree walks.
> 
> Rework this function into a generic btree walker, add a couple
> parameters to control termination behavior from various contexts and
> reuse it where applicable.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 110 +++++++++++++++++++-------------------
>  1 file changed, 55 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 32b378c8e16c..85e82e184ec9 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -875,6 +875,13 @@ xfs_alloc_cur_check(
>  	acur->diff = diff;
>  	*new = 1;
>  
> +	/*
> +	 * We're done if we found a perfect allocation. This only deactivates
> +	 * the current cursor, but this is just an optimization to terminate a
> +	 * cntbt search that otherwise runs to the edge of the tree.
> +	 */
> +	if (acur->diff == 0 && acur->len == args->maxlen)
> +		deactivate = true;
>  out:
>  	if (deactivate)
>  		cur->bc_private.a.priv.abt.active = false;
> @@ -1172,30 +1179,38 @@ xfs_alloc_ag_vextent_exact(
>  }
>  
>  /*
> - * Search the btree in a given direction and check the records against the good
> - * extent we've already found.
> + * Search a given number of btree records in a given direction. Check each
> + * record against the good extent we've already found.
>   */
>  STATIC int
> -xfs_alloc_find_best_extent(
> +xfs_alloc_walk_iter(
>  	struct xfs_alloc_arg	*args,
>  	struct xfs_alloc_cur	*acur,
>  	struct xfs_btree_cur	*cur,
> -	bool			increment)
> +	bool			increment,
> +	bool			find_one, /* quit on first candidate */
> +	int			count,    /* rec count (-1 for infinite) */
> +	int			*stat)
>  {
>  	int			error;
>  	int			i;
>  
> +	*stat = 0;
> +
>  	/*
>  	 * Search so long as the cursor is active or we find a better extent.
>  	 * The cursor is deactivated if it extends beyond the range of the
>  	 * current allocation candidate.
>  	 */
> -	while (xfs_alloc_cur_active(cur)) {
> +	while (xfs_alloc_cur_active(cur) && count) {
>  		error = xfs_alloc_cur_check(args, acur, cur, &i);
>  		if (error)
>  			return error;
> -		if (i == 1)
> -			break;
> +		if (i == 1) {
> +			*stat = 1;
> +			if (find_one)
> +				break;
> +		}
>  		if (!xfs_alloc_cur_active(cur))
>  			break;
>  
> @@ -1207,6 +1222,9 @@ xfs_alloc_find_best_extent(
>  			return error;
>  		if (i == 0)
>  			cur->bc_private.a.priv.abt.active = false;
> +
> +		if (count > 0)
> +			count--;
>  	}
>  
>  	return 0;
> @@ -1226,7 +1244,6 @@ xfs_alloc_ag_vextent_near(
>  	struct xfs_btree_cur	*fbcur = NULL;
>  	int			error;		/* error code */
>  	int			i;		/* result code, temporary */
> -	int			j;		/* result code, temporary */
>  	xfs_agblock_t		bno;
>  	xfs_extlen_t		len;
>  	bool			fbinc = false;
> @@ -1313,19 +1330,12 @@ xfs_alloc_ag_vextent_near(
>  			if (!i)
>  				break;
>  		}
> -		i = acur.cnt->bc_ptrs[0];
> -		for (j = 1;
> -		     !error && j && xfs_alloc_cur_active(acur.cnt) &&
> -		     (acur.len < args->maxlen || acur.diff > 0);
> -		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
> -			/*
> -			 * For each entry, decide if it's better than
> -			 * the previous best entry.
> -			 */
> -			error = xfs_alloc_cur_check(args, &acur, acur.cnt, &i);
> -			if (error)
> -				goto out;
> -		}
> +
> +		error = xfs_alloc_walk_iter(args, &acur, acur.cnt, true, false,
> +					    -1, &i);
> +		if (error)
> +			goto out;
> +
>  		/*
>  		 * It didn't work.  We COULD be in a case where
>  		 * there's a good record somewhere, so try again.
> @@ -1357,49 +1367,39 @@ xfs_alloc_ag_vextent_near(
>  		goto out;
>  
>  	/*
> -	 * Loop going left with the leftward cursor, right with the
> -	 * rightward cursor, until either both directions give up or
> -	 * we find an entry at least as big as minlen.
> +	 * Loop going left with the leftward cursor, right with the rightward
> +	 * cursor, until either both directions give up or we find an entry at
> +	 * least as big as minlen.
>  	 */
>  	do {
> -		if (xfs_alloc_cur_active(acur.bnolt)) {
> -			error = xfs_alloc_cur_check(args, &acur, acur.bnolt, &i);
> -			if (error)
> -				goto out;
> -			if (i == 1) {
> -				trace_xfs_alloc_cur_left(args);
> -				fbcur = acur.bnogt;
> -				fbinc = true;
> -				break;
> -			}
> -			error = xfs_btree_decrement(acur.bnolt, 0, &i);
> -			if (error)
> -				goto out;
> -			if (!i)
> -				acur.bnolt->bc_private.a.priv.abt.active = false;
> +		error = xfs_alloc_walk_iter(args, &acur, acur.bnolt, false,
> +					    true, 1, &i);
> +		if (error)
> +			goto out;
> +		if (i == 1) {
> +			trace_xfs_alloc_cur_left(args);
> +			fbcur = acur.bnogt;
> +			fbinc = true;
> +			break;
>  		}
> -		if (xfs_alloc_cur_active(acur.bnogt)) {
> -			error = xfs_alloc_cur_check(args, &acur, acur.bnogt, &i);
> -			if (error)
> -				goto out;
> -			if (i == 1) {
> -				trace_xfs_alloc_cur_right(args);
> -				fbcur = acur.bnolt;
> -				fbinc = false;
> -				break;
> -			}
> -			error = xfs_btree_increment(acur.bnogt, 0, &i);
> -			if (error)
> -				goto out;
> -			if (!i)
> -				acur.bnogt->bc_private.a.priv.abt.active = false;
> +
> +		error = xfs_alloc_walk_iter(args, &acur, acur.bnogt, true, true,
> +					    1, &i);
> +		if (error)
> +			goto out;
> +		if (i == 1) {
> +			trace_xfs_alloc_cur_right(args);
> +			fbcur = acur.bnolt;
> +			fbinc = false;
> +			break;
>  		}
>  	} while (xfs_alloc_cur_active(acur.bnolt) ||
>  		 xfs_alloc_cur_active(acur.bnogt));
>  
>  	/* search the opposite direction for a better entry */
>  	if (fbcur) {
> -		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);
> +		error = xfs_alloc_walk_iter(args, &acur, fbcur, fbinc, true, -1,
> +					    &i);
>  		if (error)
>  			goto out;
>  	}
> -- 
> 2.20.1
> 
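
For reference, the three xfs_alloc_walk_iter() call sites above exercise
the new parameters differently. A minimal summary sketch (the calls are
from the patch; the per-argument comments are added here for
illustration):

	/* cntbt lastblock scan: check every record in the block */
	error = xfs_alloc_walk_iter(args, &acur, acur.cnt,
			true,	/* increment */
			false,	/* check all candidates */
			-1,	/* no record limit */
			&i);

	/* bnobt left scan: one record per pass, quit on a candidate */
	error = xfs_alloc_walk_iter(args, &acur, acur.bnolt,
			false,	/* decrement */
			true,	/* quit on first candidate */
			1,	/* single record */
			&i);

	/* opposite-direction search: run until a candidate or tree edge */
	error = xfs_alloc_walk_iter(args, &acur, fbcur,
			fbinc,	/* direction set by the bnobt hit */
			true,	/* quit on first candidate */
			-1,	/* no record limit */
			&i);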


* Re: [PATCH v5 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups
  2019-09-27 17:18 ` [PATCH v5 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups Brian Foster
@ 2019-10-04 23:20   ` Darrick J. Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2019-10-04 23:20 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Fri, Sep 27, 2019 at 01:18:02PM -0400, Brian Foster wrote:
> The near mode fallback algorithm consists of a left/right scan of
> the bnobt. This algorithm has very poor breakdown characteristics
> under worst case free space fragmentation conditions. If a suitable
> extent is far enough from the locality hint, each allocation may
> scan most or all of the bnobt before it completes. This causes
> pathological behavior and extremely high allocation latencies.
> 
> While locality is important to near mode allocations, it is not so
> important as to incur pathological allocation latency to provide the
> absolute best available locality for every allocation. If the
> allocation is large enough or far enough away, there is a point of
> diminishing returns. As such, we can bound the overall operation by
> including an iterative cntbt lookup in the broader search. The cntbt
> lookup is optimized to immediately find the extent with best
> locality for the given size on each iteration. Since the cntbt is
> indexed by extent size, the lookup repeats with an increasingly
> aggressive search key size until it runs off the edge of
> the tree.
> 
> This approach provides a natural balance between the two algorithms
> for various situations. For example, the bnobt scan is able to
> satisfy smaller allocations such as for inode chunks or btree blocks
> more quickly where the cntbt search may have to search through a
> large set of extent sizes when the search key starts off small
> relative to the largest extent in the tree. On the other hand, the
> cntbt search more deterministically covers the set of suitable
> extents for larger data extent allocation requests; the bnobt scan
> may have to walk the entire tree to locate those extents.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok, I'll give it a spin along with all the other 5.5 stuff...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 154 +++++++++++++++++++++++++++++++++++---
>  fs/xfs/xfs_trace.h        |   2 +
>  2 files changed, 144 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index b16aa41507e8..e9f74eb92073 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -716,6 +716,7 @@ struct xfs_alloc_cur {
>  	struct xfs_btree_cur		*cnt;	/* btree cursors */
>  	struct xfs_btree_cur		*bnolt;
>  	struct xfs_btree_cur		*bnogt;
> +	xfs_extlen_t			cur_len;/* current search length */
>  	xfs_agblock_t			rec_bno;/* extent startblock */
>  	xfs_extlen_t			rec_len;/* extent length */
>  	xfs_agblock_t			bno;	/* alloc bno */
> @@ -740,6 +741,7 @@ xfs_alloc_cur_setup(
>  
>  	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
>  
> +	acur->cur_len = args->maxlen;
>  	acur->rec_bno = 0;
>  	acur->rec_len = 0;
>  	acur->bno = 0;
> @@ -920,6 +922,76 @@ xfs_alloc_cur_finish(
>  	return 0;
>  }
>  
> +/*
> + * Locality allocation lookup algorithm. This expects a cntbt cursor and uses
> + * bno optimized lookup to search for extents with ideal size and locality.
> + */
> +STATIC int
> +xfs_alloc_cntbt_iter(
> +	struct xfs_alloc_arg		*args,
> +	struct xfs_alloc_cur		*acur)
> +{
> +	struct xfs_btree_cur	*cur = acur->cnt;
> +	xfs_agblock_t		bno;
> +	xfs_extlen_t		len, cur_len;
> +	int			error;
> +	int			i;
> +
> +	if (!xfs_alloc_cur_active(cur))
> +		return 0;
> +
> +	/* locality optimized lookup */
> +	cur_len = acur->cur_len;
> +	error = xfs_alloc_lookup_ge(cur, args->agbno, cur_len, &i);
> +	if (error)
> +		return error;
> +	if (i == 0)
> +		return 0;
> +	error = xfs_alloc_get_rec(cur, &bno, &len, &i);
> +	if (error)
> +		return error;
> +
> +	/* check the current record and update search length from it */
> +	error = xfs_alloc_cur_check(args, acur, cur, &i);
> +	if (error)
> +		return error;
> +	ASSERT(len >= acur->cur_len);
> +	acur->cur_len = len;
> +
> +	/*
> +	 * We looked up the first record >= [agbno, len] above. The agbno is a
> +	 * secondary key and so the current record may lie just before or after
> +	 * agbno. If it is past agbno, check the previous record too so long as
> +	 * the length matches as it may be closer. Don't check a smaller record
> +	 * because that could deactivate our cursor.
> +	 */
> +	if (bno > args->agbno) {
> +		error = xfs_btree_decrement(cur, 0, &i);
> +		if (!error && i) {
> +			error = xfs_alloc_get_rec(cur, &bno, &len, &i);
> +			if (!error && i && len == acur->cur_len)
> +				error = xfs_alloc_cur_check(args, acur, cur,
> +							    &i);
> +		}
> +		if (error)
> +			return error;
> +	}
> +
> +	/*
> +	 * Increment the search key until we find at least one allocation
> +	 * candidate or if the extent we found was larger. Otherwise, double the
> +	 * search key to optimize the search. Efficiency is more important here
> +	 * than absolute best locality.
> +	 */
> +	cur_len <<= 1;
> +	if (!acur->len || acur->cur_len >= cur_len)
> +		acur->cur_len++;
> +	else
> +		acur->cur_len = cur_len;
> +
> +	return error;
> +}
> +
>  /*
>   * Deal with the case where only small freespaces remain. Either return the
>   * contents of the last freespace record, or allocate space from the freelist if
> @@ -1261,12 +1333,11 @@ xfs_alloc_walk_iter(
>  }
>  
>  /*
> - * Search in the by-bno btree to the left and to the right simultaneously until
> - * in each case we find a large enough free extent or run into the edge of the
> - * tree. When we run into the edge of the tree, we deactivate that cursor.
> + * Search the by-bno and by-size btrees in parallel in search of an extent with
> + * ideal locality based on the NEAR mode ->agbno locality hint.
>   */
>  STATIC int
> -xfs_alloc_ag_vextent_bnobt(
> +xfs_alloc_ag_vextent_locality(
>  	struct xfs_alloc_arg	*args,
>  	struct xfs_alloc_cur	*acur,
>  	int			*stat)
> @@ -1281,6 +1352,9 @@ xfs_alloc_ag_vextent_bnobt(
>  
>  	*stat = 0;
>  
> +	error = xfs_alloc_lookup_ge(acur->cnt, args->agbno, acur->cur_len, &i);
> +	if (error)
> +		return error;
>  	error = xfs_alloc_lookup_le(acur->bnolt, args->agbno, 0, &i);
>  	if (error)
>  		return error;
> @@ -1289,12 +1363,37 @@ xfs_alloc_ag_vextent_bnobt(
>  		return error;
>  
>  	/*
> -	 * Loop going left with the leftward cursor, right with the rightward
> -	 * cursor, until either both directions give up or we find an entry at
> -	 * least as big as minlen.
> +	 * Search the bnobt and cntbt in parallel. Search the bnobt left and
> +	 * right and lookup the closest extent to the locality hint for each
> +	 * extent size key in the cntbt. The entire search terminates
> +	 * immediately on a bnobt hit because that means we've found best case
> +	 * locality. Otherwise the search continues until the cntbt cursor runs
> +	 * off the end of the tree. If no allocation candidate is found at this
> +	 * point, give up on locality, walk backwards from the end of the cntbt
> +	 * and take the first available extent.
> +	 *
> +	 * The parallel tree searches balance each other out to provide fairly
> +	 * consistent performance for various situations. The bnobt search can
> +	 * have pathological behavior in the worst case scenario of larger
> +	 * allocation requests and fragmented free space. On the other hand, the
> +	 * bnobt is able to satisfy most smaller allocation requests much more
> +	 * quickly than the cntbt. The cntbt search can sift through fragmented
> +	 * free space and sets of free extents for larger allocation requests
> +	 * more quickly than the bnobt. Since the locality hint is just a hint
> +	 * and we don't want to scan the entire bnobt for perfect locality, the
> +	 * cntbt search essentially bounds the bnobt search such that we can
> +	 * find good enough locality at reasonable performance in most cases.
>  	 */
>  	while (xfs_alloc_cur_active(acur->bnolt) ||
> -	       xfs_alloc_cur_active(acur->bnogt)) {
> +	       xfs_alloc_cur_active(acur->bnogt) ||
> +	       xfs_alloc_cur_active(acur->cnt)) {
> +
> +		trace_xfs_alloc_cur_lookup(args);
> +
> +		/*
> +		 * Search the bnobt left and right. In the case of a hit, finish
> +		 * the search in the opposite direction and we're done.
> +		 */
>  		error = xfs_alloc_walk_iter(args, acur, acur->bnolt, false,
>  					    true, 1, &i);
>  		if (error)
> @@ -1305,7 +1404,6 @@ xfs_alloc_ag_vextent_bnobt(
>  			fbinc = true;
>  			break;
>  		}
> -
>  		error = xfs_alloc_walk_iter(args, acur, acur->bnogt, true, true,
>  					    1, &i);
>  		if (error)
> @@ -1316,9 +1414,40 @@ xfs_alloc_ag_vextent_bnobt(
>  			fbinc = false;
>  			break;
>  		}
> +
> +		/*
> +		 * Check the extent with best locality based on the current
> +		 * extent size search key and keep track of the best candidate.
> +		 */
> +		error = xfs_alloc_cntbt_iter(args, acur);
> +		if (error)
> +			return error;
> +		if (!xfs_alloc_cur_active(acur->cnt)) {
> +			trace_xfs_alloc_cur_lookup_done(args);
> +			break;
> +		}
> +	}
> +
> +	/*
> +	 * If we failed to find anything due to busy extents, return empty
> +	 * handed so the caller can flush and retry. If no busy extents were
> +	 * found, walk backwards from the end of the cntbt as a last resort.
> +	 */
> +	if (!xfs_alloc_cur_active(acur->cnt) && !acur->len && !acur->busy) {
> +		error = xfs_btree_decrement(acur->cnt, 0, &i);
> +		if (error)
> +			return error;
> +		if (i) {
> +			acur->cnt->bc_private.a.priv.abt.active = true;
> +			fbcur = acur->cnt;
> +			fbinc = false;
> +		}
>  	}
>  
> -	/* search the opposite direction for a better entry */
> +	/*
> +	 * Search in the opposite direction for a better entry in the case of
> +	 * a bnobt hit or walk backwards from the end of the cntbt.
> +	 */
>  	if (fbcur) {
>  		error = xfs_alloc_walk_iter(args, acur, fbcur, fbinc, true, -1,
>  					    &i);
> @@ -1447,9 +1576,10 @@ xfs_alloc_ag_vextent_near(
>  	}
>  
>  	/*
> -	 * Second algorithm. Search the bnobt left and right.
> +	 * Second algorithm. Combined cntbt and bnobt search to find ideal
> +	 * locality.
>  	 */
> -	error = xfs_alloc_ag_vextent_bnobt(args, &acur, &i);
> +	error = xfs_alloc_ag_vextent_locality(args, &acur, &i);
>  	if (error)
>  		goto out;
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 21fa0e8822e3..a07a5fe22218 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1645,6 +1645,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
>  DEFINE_ALLOC_EVENT(xfs_alloc_cur);
>  DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
>  DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
> +DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup);
> +DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup_done);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
> -- 
> 2.20.1
> 
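
The search key growth in xfs_alloc_cntbt_iter() is the piece that bounds
the cntbt walk. Here is a small userspace sketch of just that update
rule, extracted from the hunk above (the helper name and standalone form
are illustrative, not from the patch):

	#include <stdbool.h>

	/*
	 * Next cntbt search key, per the update logic in the patch:
	 * creep forward one block while no candidate is held (or when
	 * the record found already jumped past double the old key),
	 * otherwise double the key to reach the tree edge quickly.
	 */
	static unsigned int
	next_search_key(
		unsigned int	prev_key,	/* key of the last lookup */
		unsigned int	found_len,	/* length of record found */
		bool		have_candidate)	/* acur->len != 0 */
	{
		unsigned int	doubled = prev_key << 1;

		if (!have_candidate || found_len >= doubled)
			return found_len + 1;
		return doubled;
	}

For example, starting from a maxlen of 8 with no candidate held, a
lookup that lands on a 10-block record moves the key to 11; once a
candidate is held, the same lookup would move the key to 16.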


end of thread

Thread overview: 22+ messages
2019-09-27 17:17 [PATCH v5 00/11] xfs: rework near mode extent allocation Brian Foster
2019-09-27 17:17 ` [PATCH v5 01/11] xfs: track active state of allocation btree cursors Brian Foster
2019-09-30  8:11   ` Christoph Hellwig
2019-09-30 12:17     ` Brian Foster
2019-10-01  6:36       ` Christoph Hellwig
2019-10-01 10:30         ` Brian Foster
2019-10-01  5:35   ` Darrick J. Wong
2019-09-27 17:17 ` [PATCH v5 02/11] xfs: introduce allocation cursor data structure Brian Foster
2019-09-27 17:17 ` [PATCH v5 03/11] xfs: track allocation busy state in allocation cursor Brian Foster
2019-09-27 17:17 ` [PATCH v5 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor Brian Foster
2019-10-04 22:35   ` Darrick J. Wong
2019-09-27 17:17 ` [PATCH v5 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper Brian Foster
2019-10-04 22:40   ` Darrick J. Wong
2019-09-27 17:17 ` [PATCH v5 06/11] xfs: reuse best extent tracking logic for bnobt scan Brian Foster
2019-10-04 22:45   ` Darrick J. Wong
2019-09-27 17:17 ` [PATCH v5 07/11] xfs: refactor allocation tree fixup code Brian Foster
2019-09-27 17:17 ` [PATCH v5 08/11] xfs: refactor and reuse best extent scanning logic Brian Foster
2019-10-04 22:59   ` Darrick J. Wong
2019-09-27 17:18 ` [PATCH v5 09/11] xfs: refactor near mode alloc bnobt scan into separate function Brian Foster
2019-09-27 17:18 ` [PATCH v5 10/11] xfs: factor out tree fixup logic into helper Brian Foster
2019-09-27 17:18 ` [PATCH v5 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups Brian Foster
2019-10-04 23:20   ` Darrick J. Wong
