* [PATCH v4 00/11] xfs: rework near mode extent allocation
@ 2019-09-16 12:16 Brian Foster
  2019-09-16 12:16 ` [PATCH v4 01/11] xfs: track active state of allocation btree cursors Brian Foster
                   ` (10 more replies)
  0 siblings, 11 replies; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

Hi all,

Here is v4 of the near mode allocation rework. The primary difference in
v4 is that I've split up the one big rework patch from previous versions
into smaller patches that separate most of the functional changes from
the larger amount of refactoring required to hopefully facilitate
review. Since the previous versions were basically a rewrite of the
existing algorithm, this approach of evolving the current code to the
combined bnobt+cntbt algorithm ends up with slightly different code
from the previous version. For example, some changes to how minlen is
handled and tweaks to the best extent selection logic are dropped in
favor of preserving the existing logic. The debug mode patch is no longer
necessary because the existing equivalent code is preserved.

I think these differences are mostly harmless and essentially just
artifacts of the difference in how this patch series is constructed.
Some of these tweaks may be reintroduced as independent fixups or to
ultimately support the other allocation modes, but they are not required
to fix the fundamental problem of pathological worst case bnobt scanning
behavior. That said, I can't rule out some quirks that might have been
introduced through the process of taking the previous version apart and
reintroducing it in smaller increments, so review for that would be
useful.

I've run this through a series of tests mostly to verify that there
haven't been any regressions since v3. This survives fstests from a
functional perspective, maintains relative performance on clean and
pre-fragmented filesystems and maintains the same level of locality
effectiveness described by the previous version. It provides the same
significant speedup to the highly (free space) fragmented metadump image
provided by Mao Cheng that originally exhibited this problem.

Thoughts, reviews, flames appreciated.

Brian

v4:
- Fix up cursor active tracking type usage.
- Fix up cntbt lookup function signature.
- Added high level comment on optimized allocation algorithm.
- Split up series into smaller patches to separate refactoring from
  functional changes.
v3: https://lore.kernel.org/linux-xfs/20190815125538.49570-1-bfoster@redhat.com/
- Drop by-size and exact allocation rework bits.
- Add near mode last block scan.
- Add debug mode patch to randomly toggle near mode algos.
- Refactor cursor setup/lookup logic.
- Refactor minlen reverse scan to be common between near mode algos.
- Fix up logic to consistently prioritize extent size over locality.
- Add more useful tracepoints.
- Miscellaneous bug fixes and code/comment cleanups.
v2: https://marc.info/?l=linux-xfs&m=155854834815400&w=2
- Lift small mode refactoring into separate patch (retained review
  tag(s)).
- Various logic cleanups and refactors.
- Push active flag down into btree cursor private area; eliminate cursor
  container struct.
- Refactor final allocation code. Fold xfs_alloc_ag_vextent_type() into
  caller and factor out accounting.
- Fix up tracepoints.
v1: https://marc.info/?l=linux-xfs&m=155742169729590&w=2
- Continued development (various fixes, refinements) on generic bits and
  near mode implementation.
- Added patches 4-6 to refactor exact, by-size and small allocation
  modes.
rfcv2: https://marc.info/?l=linux-xfs&m=155197946630582&w=2
- Dropped spurious initial refactoring.
- Added minlen functionality.
- Properly tied into near alloc path.
- General refactoring and cleanups.
rfcv1: https://marc.info/?l=linux-xfs&m=154479089914351&w=2


Brian Foster (11):
  xfs: track active state of allocation btree cursors
  xfs: introduce allocation cursor data structure
  xfs: track allocation busy state in allocation cursor
  xfs: track best extent from cntbt lastblock scan in alloc cursor
  xfs: refactor cntbt lastblock scan best extent logic into helper
  xfs: reuse best extent tracking logic for bnobt scan
  xfs: refactor allocation tree fixup code
  xfs: refactor and reuse best extent scanning logic
  xfs: refactor near mode alloc bnobt scan into separate function
  xfs: factor out tree fixup logic into helper
  xfs: optimize near mode bnobt scans with concurrent cntbt lookups

 fs/xfs/libxfs/xfs_alloc.c       | 887 ++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_alloc_btree.c |   1 +
 fs/xfs/libxfs/xfs_btree.h       |   3 +
 fs/xfs/xfs_trace.h              |  32 +-
 4 files changed, 537 insertions(+), 386 deletions(-)

-- 
2.20.1


* [PATCH v4 01/11] xfs: track active state of allocation btree cursors
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 18:38   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 02/11] xfs: introduce allocation cursor data structure Brian Foster
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

The upcoming allocation algorithm update searches multiple
allocation btree cursors concurrently. As such, it requires an
active state to track when a particular cursor should continue
searching. While the active state will also be modified by higher
level logic, the base behavior can be defined by the result of
allocation btree lookups.

Define an active flag in the private area of the btree cursor.
Update it based on the result of lookups in the existing allocation
btree helpers. Finally, provide a new helper to query the current
state.
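
The rule this patch establishes can be sketched in plain C. The struct
and function names below are simplified stand-ins for the real XFS
types (xfs_btree_cur, xfs_alloc_lookup_*), shown only to illustrate how
lookup results drive the active flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an allocation btree cursor. */
struct alloc_cur {
	bool	active;		/* should this cursor keep searching? */
};

/* Model of a btree lookup: *stat reports whether a record was found. */
static int alloc_lookup(struct alloc_cur *cur, int found, int *stat)
{
	*stat = found;
	/* Base rule: a cursor is active iff its last lookup succeeded. */
	cur->active = (*stat == 1);
	return 0;
}

/* Query helper; tolerates a NULL cursor like xfs_alloc_cur_active(). */
static bool cur_active(struct alloc_cur *cur)
{
	return cur && cur->active;
}
```

Higher level code can then loop on cur_active() for several cursors at
once instead of nulling out and reallocating cursors as they drop out
of the search.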

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c       | 24 +++++++++++++++++++++---
 fs/xfs/libxfs/xfs_alloc_btree.c |  1 +
 fs/xfs/libxfs/xfs_btree.h       |  3 +++
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 533b04aaf6f6..512a45888e06 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -146,9 +146,13 @@ xfs_alloc_lookup_eq(
 	xfs_extlen_t		len,	/* length of extent */
 	int			*stat)	/* success/failure */
 {
+	int			error;
+
 	cur->bc_rec.a.ar_startblock = bno;
 	cur->bc_rec.a.ar_blockcount = len;
-	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
+	error = xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
+	cur->bc_private.a.priv.abt.active = *stat == 1 ? true : false;
+	return error;
 }
 
 /*
@@ -162,9 +166,13 @@ xfs_alloc_lookup_ge(
 	xfs_extlen_t		len,	/* length of extent */
 	int			*stat)	/* success/failure */
 {
+	int			error;
+
 	cur->bc_rec.a.ar_startblock = bno;
 	cur->bc_rec.a.ar_blockcount = len;
-	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
+	error = xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
+	cur->bc_private.a.priv.abt.active = *stat == 1 ? true : false;
+	return error;
 }
 
 /*
@@ -178,9 +186,19 @@ xfs_alloc_lookup_le(
 	xfs_extlen_t		len,	/* length of extent */
 	int			*stat)	/* success/failure */
 {
+	int			error;
 	cur->bc_rec.a.ar_startblock = bno;
 	cur->bc_rec.a.ar_blockcount = len;
-	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+	error = xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+	cur->bc_private.a.priv.abt.active = *stat == 1 ? true : false;
+	return error;
+}
+
+static inline bool
+xfs_alloc_cur_active(
+	struct xfs_btree_cur	*cur)
+{
+	return cur && cur->bc_private.a.priv.abt.active;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 2a94543857a1..279694d73e4e 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -507,6 +507,7 @@ xfs_allocbt_init_cursor(
 
 	cur->bc_private.a.agbp = agbp;
 	cur->bc_private.a.agno = agno;
+	cur->bc_private.a.priv.abt.active = false;
 
 	if (xfs_sb_version_hascrc(&mp->m_sb))
 		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index ced1e65d1483..b4e3ec1d7ff9 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -183,6 +183,9 @@ union xfs_btree_cur_private {
 		unsigned long	nr_ops;		/* # record updates */
 		int		shape_changes;	/* # of extent splits */
 	} refc;
+	struct {
+		bool		active;		/* allocation cursor state */
+	} abt;
 };
 
 /*
-- 
2.20.1


* [PATCH v4 02/11] xfs: introduce allocation cursor data structure
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
  2019-09-16 12:16 ` [PATCH v4 01/11] xfs: track active state of allocation btree cursors Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 18:46   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 03/11] xfs: track allocation busy state in allocation cursor Brian Foster
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

Introduce a new allocation cursor data structure to encapsulate the
various states and structures used to perform an extent allocation.
This structure will eventually be used to track overall allocation
state across different search algorithms on both free space btrees.

To start, include the three btree cursors (one for the cntbt and two
for the bnobt left/right search) used by the near mode allocation
algorithm and refactor the cursor setup and teardown code into
helpers. This slightly changes cursor memory allocation patterns,
but otherwise makes no functional changes to the allocation
algorithm.
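
The cursor lifecycle this introduces can be modeled outside the kernel
as follows; the names are simplified stand-ins for struct xfs_alloc_cur
and its helpers, not the actual kernel API, and the int allocations
merely stand in for real btree cursor allocation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for struct xfs_alloc_cur: one container that
 * owns all three btree cursors used by a near mode allocation. */
struct alloc_cur {
	int	*cnt;	/* by-size (cntbt) cursor */
	int	*bnolt;	/* by-bno leftward search cursor */
	int	*bnogt;	/* by-bno rightward search cursor */
};

/* Setup may be called repeatedly (e.g. on a busy restart); cursors are
 * only allocated on first use, mirroring the !acur->cnt checks. */
static int cur_setup(struct alloc_cur *acur)
{
	if (!acur->cnt)
		acur->cnt = malloc(sizeof(int));
	if (!acur->bnolt)
		acur->bnolt = malloc(sizeof(int));
	if (!acur->bnogt)
		acur->bnogt = malloc(sizeof(int));
	return (acur->cnt && acur->bnolt && acur->bnogt) ? 0 : -1;
}

/* One teardown path replaces the per-cursor cleanup on every exit. */
static void cur_close(struct alloc_cur *acur)
{
	free(acur->cnt);
	free(acur->bnolt);
	free(acur->bnogt);
	acur->cnt = acur->bnolt = acur->bnogt = NULL;
}
```

The design payoff is the single "goto out" cleanup in
xfs_alloc_ag_vextent_near(): every error path funnels through one
close helper instead of tracking which of three cursors is still live.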

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 318 +++++++++++++++++++-------------------
 1 file changed, 163 insertions(+), 155 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 512a45888e06..d159377ed603 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -710,8 +710,71 @@ xfs_alloc_update_counters(
 }
 
 /*
- * Allocation group level functions.
+ * Block allocation algorithm and data structures.
  */
+struct xfs_alloc_cur {
+	struct xfs_btree_cur		*cnt;	/* btree cursors */
+	struct xfs_btree_cur		*bnolt;
+	struct xfs_btree_cur		*bnogt;
+};
+
+/*
+ * Set up cursors, etc. in the extent allocation cursor. This function can be
+ * called multiple times to reset an initialized structure without having to
+ * reallocate cursors.
+ */
+static int
+xfs_alloc_cur_setup(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur)
+{
+	int			error;
+	int			i;
+
+	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
+
+	/*
+	 * Perform an initial cntbt lookup to check for availability of maxlen
+	 * extents. If this fails, we'll return -ENOSPC to signal the caller to
+	 * attempt a small allocation.
+	 */
+	if (!acur->cnt)
+		acur->cnt = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_CNT);
+	error = xfs_alloc_lookup_ge(acur->cnt, 0, args->maxlen, &i);
+	if (error)
+		return error;
+
+	/*
+	 * Allocate the bnobt left and right search cursors.
+	 */
+	if (!acur->bnolt)
+		acur->bnolt = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_BNO);
+	if (!acur->bnogt)
+		acur->bnogt = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_BNO);
+	return i == 1 ? 0 : -ENOSPC;
+}
+
+static void
+xfs_alloc_cur_close(
+	struct xfs_alloc_cur	*acur,
+	bool			error)
+{
+	int			cur_error = XFS_BTREE_NOERROR;
+
+	if (error)
+		cur_error = XFS_BTREE_ERROR;
+
+	if (acur->cnt)
+		xfs_btree_del_cursor(acur->cnt, cur_error);
+	if (acur->bnolt)
+		xfs_btree_del_cursor(acur->bnolt, cur_error);
+	if (acur->bnogt)
+		xfs_btree_del_cursor(acur->bnogt, cur_error);
+	acur->cnt = acur->bnolt = acur->bnogt = NULL;
+}
 
 /*
  * Deal with the case where only small freespaces remain. Either return the
@@ -1008,8 +1071,8 @@ xfs_alloc_ag_vextent_exact(
 STATIC int
 xfs_alloc_find_best_extent(
 	struct xfs_alloc_arg	*args,	/* allocation argument structure */
-	struct xfs_btree_cur	**gcur,	/* good cursor */
-	struct xfs_btree_cur	**scur,	/* searching cursor */
+	struct xfs_btree_cur	*gcur,	/* good cursor */
+	struct xfs_btree_cur	*scur,	/* searching cursor */
 	xfs_agblock_t		gdiff,	/* difference for search comparison */
 	xfs_agblock_t		*sbno,	/* extent found by search */
 	xfs_extlen_t		*slen,	/* extent length */
@@ -1031,7 +1094,7 @@ xfs_alloc_find_best_extent(
 	 * Look until we find a better one, run out of space or run off the end.
 	 */
 	do {
-		error = xfs_alloc_get_rec(*scur, sbno, slen, &i);
+		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
 		if (error)
 			goto error0;
 		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
@@ -1074,21 +1137,19 @@ xfs_alloc_find_best_extent(
 		}
 
 		if (!dir)
-			error = xfs_btree_increment(*scur, 0, &i);
+			error = xfs_btree_increment(scur, 0, &i);
 		else
-			error = xfs_btree_decrement(*scur, 0, &i);
+			error = xfs_btree_decrement(scur, 0, &i);
 		if (error)
 			goto error0;
 	} while (i);
 
 out_use_good:
-	xfs_btree_del_cursor(*scur, XFS_BTREE_NOERROR);
-	*scur = NULL;
+	scur->bc_private.a.priv.abt.active = false;
 	return 0;
 
 out_use_search:
-	xfs_btree_del_cursor(*gcur, XFS_BTREE_NOERROR);
-	*gcur = NULL;
+	gcur->bc_private.a.priv.abt.active = false;
 	return 0;
 
 error0:
@@ -1102,13 +1163,12 @@ xfs_alloc_find_best_extent(
  * and of the form k * prod + mod unless there's nothing that large.
  * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
  */
-STATIC int				/* error */
+STATIC int
 xfs_alloc_ag_vextent_near(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
+	struct xfs_alloc_arg	*args)
 {
-	xfs_btree_cur_t	*bno_cur_gt;	/* cursor for bno btree, right side */
-	xfs_btree_cur_t	*bno_cur_lt;	/* cursor for bno btree, left side */
-	xfs_btree_cur_t	*cnt_cur;	/* cursor for count btree */
+	struct xfs_alloc_cur	acur = {0,};
+	struct xfs_btree_cur	*bno_cur;
 	xfs_agblock_t	gtbno;		/* start bno of right side entry */
 	xfs_agblock_t	gtbnoa;		/* aligned ... */
 	xfs_extlen_t	gtdiff;		/* difference to right side entry */
@@ -1148,38 +1208,29 @@ xfs_alloc_ag_vextent_near(
 		args->agbno = args->max_agbno;
 
 restart:
-	bno_cur_lt = NULL;
-	bno_cur_gt = NULL;
 	ltlen = 0;
 	gtlena = 0;
 	ltlena = 0;
 	busy = false;
 
 	/*
-	 * Get a cursor for the by-size btree.
+	 * Set up cursors and see if there are any free extents as big as
+	 * maxlen. If not, pick the last entry in the tree unless the tree is
+	 * empty.
 	 */
-	cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_CNT);
-
-	/*
-	 * See if there are any free extents as big as maxlen.
-	 */
-	if ((error = xfs_alloc_lookup_ge(cnt_cur, 0, args->maxlen, &i)))
-		goto error0;
-	/*
-	 * If none, then pick up the last entry in the tree unless the
-	 * tree is empty.
-	 */
-	if (!i) {
-		if ((error = xfs_alloc_ag_vextent_small(args, cnt_cur, &ltbno,
-				&ltlen, &i)))
-			goto error0;
+	error = xfs_alloc_cur_setup(args, &acur);
+	if (error == -ENOSPC) {
+		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &ltbno,
+				&ltlen, &i);
+		if (error)
+			goto out;
 		if (i == 0 || ltlen == 0) {
-			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
 			trace_xfs_alloc_near_noentry(args);
-			return 0;
+			goto out;
 		}
 		ASSERT(i == 1);
+	} else if (error) {
+		goto out;
 	}
 	args->wasfromfl = 0;
 
@@ -1193,7 +1244,7 @@ xfs_alloc_ag_vextent_near(
 	 * This is written as a while loop so we can break out of it,
 	 * but we never loop back to the top.
 	 */
-	while (xfs_btree_islastblock(cnt_cur, 0)) {
+	while (xfs_btree_islastblock(acur.cnt, 0)) {
 		xfs_extlen_t	bdiff;
 		int		besti=0;
 		xfs_extlen_t	blen=0;
@@ -1210,32 +1261,35 @@ xfs_alloc_ag_vextent_near(
 		 * and skip all those smaller than minlen.
 		 */
 		if (ltlen || args->alignment > 1) {
-			cnt_cur->bc_ptrs[0] = 1;
+			acur.cnt->bc_ptrs[0] = 1;
 			do {
-				if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno,
-						&ltlen, &i)))
-					goto error0;
-				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+				error = xfs_alloc_get_rec(acur.cnt, &ltbno,
+						&ltlen, &i);
+				if (error)
+					goto out;
+				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 				if (ltlen >= args->minlen)
 					break;
-				if ((error = xfs_btree_increment(cnt_cur, 0, &i)))
-					goto error0;
+				error = xfs_btree_increment(acur.cnt, 0, &i);
+				if (error)
+					goto out;
 			} while (i);
 			ASSERT(ltlen >= args->minlen);
 			if (!i)
 				break;
 		}
-		i = cnt_cur->bc_ptrs[0];
+		i = acur.cnt->bc_ptrs[0];
 		for (j = 1, blen = 0, bdiff = 0;
 		     !error && j && (blen < args->maxlen || bdiff > 0);
-		     error = xfs_btree_increment(cnt_cur, 0, &j)) {
+		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
 			/*
 			 * For each entry, decide if it's better than
 			 * the previous best entry.
 			 */
-			if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+			error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
+			if (error)
+				goto out;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
 					&ltbnoa, &ltlena, &busy_gen);
 			if (ltlena < args->minlen)
@@ -1255,7 +1309,7 @@ xfs_alloc_ag_vextent_near(
 				bdiff = ltdiff;
 				bnew = ltnew;
 				blen = args->len;
-				besti = cnt_cur->bc_ptrs[0];
+				besti = acur.cnt->bc_ptrs[0];
 			}
 		}
 		/*
@@ -1267,10 +1321,11 @@ xfs_alloc_ag_vextent_near(
 		/*
 		 * Point at the best entry, and retrieve it again.
 		 */
-		cnt_cur->bc_ptrs[0] = besti;
-		if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+		acur.cnt->bc_ptrs[0] = besti;
+		error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
+		if (error)
+			goto out;
+		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
 		args->len = blen;
 
@@ -1280,23 +1335,14 @@ xfs_alloc_ag_vextent_near(
 		args->agbno = bnew;
 		ASSERT(bnew >= ltbno);
 		ASSERT(bnew + blen <= ltbno + ltlen);
-		/*
-		 * Set up a cursor for the by-bno tree.
-		 */
-		bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp,
-			args->agbp, args->agno, XFS_BTNUM_BNO);
-		/*
-		 * Fix up the btree entries.
-		 */
-		if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno,
-				ltlen, bnew, blen, XFSA_FIXUP_CNT_OK)))
-			goto error0;
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
-
+		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, ltbno,
+					ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
+		if (error)
+			goto out;
 		trace_xfs_alloc_near_first(args);
-		return 0;
+		goto out;
 	}
+
 	/*
 	 * Second algorithm.
 	 * Search in the by-bno tree to the left and to the right
@@ -1309,86 +1355,57 @@ xfs_alloc_ag_vextent_near(
 	 * level algorithm that picks allocation groups for allocations
 	 * is not supposed to do this.
 	 */
-	/*
-	 * Allocate and initialize the cursor for the leftward search.
-	 */
-	bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_BNO);
-	/*
-	 * Lookup <= bno to find the leftward search's starting point.
-	 */
-	if ((error = xfs_alloc_lookup_le(bno_cur_lt, args->agbno, args->maxlen, &i)))
-		goto error0;
-	if (!i) {
-		/*
-		 * Didn't find anything; use this cursor for the rightward
-		 * search.
-		 */
-		bno_cur_gt = bno_cur_lt;
-		bno_cur_lt = NULL;
-	}
-	/*
-	 * Found something.  Duplicate the cursor for the rightward search.
-	 */
-	else if ((error = xfs_btree_dup_cursor(bno_cur_lt, &bno_cur_gt)))
-		goto error0;
-	/*
-	 * Increment the cursor, so we will point at the entry just right
-	 * of the leftward entry if any, or to the leftmost entry.
-	 */
-	if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
-		goto error0;
-	if (!i) {
-		/*
-		 * It failed, there are no rightward entries.
-		 */
-		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_NOERROR);
-		bno_cur_gt = NULL;
-	}
+	error = xfs_alloc_lookup_le(acur.bnolt, args->agbno, 0, &i);
+	if (error)
+		goto out;
+	error = xfs_alloc_lookup_ge(acur.bnogt, args->agbno, 0, &i);
+	if (error)
+		goto out;
+
 	/*
 	 * Loop going left with the leftward cursor, right with the
 	 * rightward cursor, until either both directions give up or
 	 * we find an entry at least as big as minlen.
 	 */
 	do {
-		if (bno_cur_lt) {
-			if ((error = xfs_alloc_get_rec(bno_cur_lt, &ltbno, &ltlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+		if (xfs_alloc_cur_active(acur.bnolt)) {
+			error = xfs_alloc_get_rec(acur.bnolt, &ltbno, &ltlen, &i);
+			if (error)
+				goto out;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 			busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
 					&ltbnoa, &ltlena, &busy_gen);
 			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
 				break;
-			if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i)))
-				goto error0;
-			if (!i || ltbnoa < args->min_agbno) {
-				xfs_btree_del_cursor(bno_cur_lt,
-						     XFS_BTREE_NOERROR);
-				bno_cur_lt = NULL;
-			}
+			error = xfs_btree_decrement(acur.bnolt, 0, &i);
+			if (error)
+				goto out;
+			if (!i || ltbnoa < args->min_agbno)
+				acur.bnolt->bc_private.a.priv.abt.active = false;
 		}
-		if (bno_cur_gt) {
-			if ((error = xfs_alloc_get_rec(bno_cur_gt, &gtbno, &gtlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
+		if (xfs_alloc_cur_active(acur.bnogt)) {
+			error = xfs_alloc_get_rec(acur.bnogt, &gtbno, &gtlen, &i);
+			if (error)
+				goto out;
+			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
 			busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
 					&gtbnoa, &gtlena, &busy_gen);
 			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
 				break;
-			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
-				goto error0;
-			if (!i || gtbnoa > args->max_agbno) {
-				xfs_btree_del_cursor(bno_cur_gt,
-						     XFS_BTREE_NOERROR);
-				bno_cur_gt = NULL;
-			}
+			error = xfs_btree_increment(acur.bnogt, 0, &i);
+			if (error)
+				goto out;
+			if (!i || gtbnoa > args->max_agbno)
+				acur.bnogt->bc_private.a.priv.abt.active = false;
 		}
-	} while (bno_cur_lt || bno_cur_gt);
+	} while (xfs_alloc_cur_active(acur.bnolt) ||
+		 xfs_alloc_cur_active(acur.bnogt));
 
 	/*
 	 * Got both cursors still active, need to find better entry.
 	 */
-	if (bno_cur_lt && bno_cur_gt) {
+	if (xfs_alloc_cur_active(acur.bnolt) &&
+	    xfs_alloc_cur_active(acur.bnogt)) {
 		if (ltlena >= args->minlen) {
 			/*
 			 * Left side is good, look for a right side entry.
@@ -1400,7 +1417,7 @@ xfs_alloc_ag_vextent_near(
 				ltlena, &ltnew);
 
 			error = xfs_alloc_find_best_extent(args,
-						&bno_cur_lt, &bno_cur_gt,
+						acur.bnolt, acur.bnogt,
 						ltdiff, &gtbno, &gtlen,
 						&gtbnoa, &gtlena,
 						0 /* search right */);
@@ -1417,22 +1434,21 @@ xfs_alloc_ag_vextent_near(
 				gtlena, &gtnew);
 
 			error = xfs_alloc_find_best_extent(args,
-						&bno_cur_gt, &bno_cur_lt,
+						acur.bnogt, acur.bnolt,
 						gtdiff, &ltbno, &ltlen,
 						&ltbnoa, &ltlena,
 						1 /* search left */);
 		}
 
 		if (error)
-			goto error0;
+			goto out;
 	}
 
 	/*
 	 * If we couldn't get anything, give up.
 	 */
-	if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-
+	if (!xfs_alloc_cur_active(acur.bnolt) &&
+	    !xfs_alloc_cur_active(acur.bnogt)) {
 		if (busy) {
 			trace_xfs_alloc_near_busy(args);
 			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
@@ -1440,7 +1456,7 @@ xfs_alloc_ag_vextent_near(
 		}
 		trace_xfs_alloc_size_neither(args);
 		args->agbno = NULLAGBLOCK;
-		return 0;
+		goto out;
 	}
 
 	/*
@@ -1449,16 +1465,17 @@ xfs_alloc_ag_vextent_near(
 	 * useful variables to the "left" set so we only have one
 	 * copy of this code.
 	 */
-	if (bno_cur_gt) {
-		bno_cur_lt = bno_cur_gt;
-		bno_cur_gt = NULL;
+	if (xfs_alloc_cur_active(acur.bnogt)) {
+		bno_cur = acur.bnogt;
 		ltbno = gtbno;
 		ltbnoa = gtbnoa;
 		ltlen = gtlen;
 		ltlena = gtlena;
 		j = 1;
-	} else
+	} else {
+		bno_cur = acur.bnolt;
 		j = 0;
+	}
 
 	/*
 	 * Fix up the length and compute the useful address.
@@ -1474,27 +1491,18 @@ xfs_alloc_ag_vextent_near(
 	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
 	args->agbno = ltnew;
 
-	if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen,
-			ltnew, rlen, XFSA_FIXUP_BNO_OK)))
-		goto error0;
+	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
+				      rlen, XFSA_FIXUP_BNO_OK);
+	if (error)
+		goto out;
 
 	if (j)
 		trace_xfs_alloc_near_greater(args);
 	else
 		trace_xfs_alloc_near_lesser(args);
 
-	xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-	xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
-	return 0;
-
- error0:
-	trace_xfs_alloc_near_error(args);
-	if (cnt_cur != NULL)
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_ERROR);
-	if (bno_cur_lt != NULL)
-		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_ERROR);
-	if (bno_cur_gt != NULL)
-		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_ERROR);
+out:
+	xfs_alloc_cur_close(&acur, error);
 	return error;
 }
 
-- 
2.20.1


* [PATCH v4 03/11] xfs: track allocation busy state in allocation cursor
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
  2019-09-16 12:16 ` [PATCH v4 01/11] xfs: track active state of allocation btree cursors Brian Foster
  2019-09-16 12:16 ` [PATCH v4 02/11] xfs: introduce allocation cursor data structure Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 18:48   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor Brian Foster
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

Extend the allocation cursor to track extent busy state for an
allocation attempt. No functional changes.
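
The accumulation pattern being lifted into the cursor can be sketched
as follows; the names are illustrative stand-ins (compute_aligned
models xfs_alloc_compute_aligned's busy reporting, with a hypothetical
per-record generation in place of the real global busy generation):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: busy state is accumulated across every record the
 * search visits, so a restart can wait on the right busy generation. */
struct alloc_cur {
	bool		busy;		/* was any visited extent busy? */
	unsigned	busy_gen;	/* generation to flush/wait on */
};

static bool compute_aligned(bool rec_busy, unsigned rec_gen,
			    unsigned *gen)
{
	if (rec_busy)
		*gen = rec_gen;	/* stand-in for the real busy gen */
	return rec_busy;
}

static void visit(struct alloc_cur *acur, bool rec_busy, unsigned rec_gen)
{
	/* |= mirrors the patch: one busy record marks the whole attempt
	 * as busy; later non-busy records never clear it. */
	acur->busy |= compute_aligned(rec_busy, rec_gen, &acur->busy_gen);
}
```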

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index d159377ed603..5c34d4c41761 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -716,6 +716,8 @@ struct xfs_alloc_cur {
 	struct xfs_btree_cur		*cnt;	/* btree cursors */
 	struct xfs_btree_cur		*bnolt;
 	struct xfs_btree_cur		*bnogt;
+	unsigned			busy_gen;/* busy state */
+	bool				busy;
 };
 
 /*
@@ -733,6 +735,9 @@ xfs_alloc_cur_setup(
 
 	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
 
+	acur->busy = false;
+	acur->busy_gen = 0;
+
 	/*
 	 * Perform an initial cntbt lookup to check for availability of maxlen
 	 * extents. If this fails, we'll return -ENOSPC to signal the caller to
@@ -1185,8 +1190,6 @@ xfs_alloc_ag_vextent_near(
 	xfs_extlen_t	ltlena;		/* aligned ... */
 	xfs_agblock_t	ltnew;		/* useful start bno of left side */
 	xfs_extlen_t	rlen;		/* length of returned extent */
-	bool		busy;
-	unsigned	busy_gen;
 #ifdef DEBUG
 	/*
 	 * Randomly don't execute the first algorithm.
@@ -1211,7 +1214,6 @@ xfs_alloc_ag_vextent_near(
 	ltlen = 0;
 	gtlena = 0;
 	ltlena = 0;
-	busy = false;
 
 	/*
 	 * Set up cursors and see if there are any free extents as big as
@@ -1290,8 +1292,8 @@ xfs_alloc_ag_vextent_near(
 			if (error)
 				goto out;
 			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &busy_gen);
+			acur.busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
+					&ltbnoa, &ltlena, &acur.busy_gen);
 			if (ltlena < args->minlen)
 				continue;
 			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
@@ -1373,8 +1375,8 @@ xfs_alloc_ag_vextent_near(
 			if (error)
 				goto out;
 			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &busy_gen);
+			acur.busy |= xfs_alloc_compute_aligned(args, ltbno,
+					ltlen, &ltbnoa, &ltlena, &acur.busy_gen);
 			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
 				break;
 			error = xfs_btree_decrement(acur.bnolt, 0, &i);
@@ -1388,8 +1390,8 @@ xfs_alloc_ag_vextent_near(
 			if (error)
 				goto out;
 			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
-					&gtbnoa, &gtlena, &busy_gen);
+			acur.busy |= xfs_alloc_compute_aligned(args, gtbno,
+					gtlen, &gtbnoa, &gtlena, &acur.busy_gen);
 			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
 				break;
 			error = xfs_btree_increment(acur.bnogt, 0, &i);
@@ -1449,9 +1451,10 @@ xfs_alloc_ag_vextent_near(
 	 */
 	if (!xfs_alloc_cur_active(acur.bnolt) &&
 	    !xfs_alloc_cur_active(acur.bnogt)) {
-		if (busy) {
+		if (acur.busy) {
 			trace_xfs_alloc_near_busy(args);
-			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
+			xfs_extent_busy_flush(args->mp, args->pag,
+					      acur.busy_gen);
 			goto restart;
 		}
 		trace_xfs_alloc_size_neither(args);
-- 
2.20.1


* [PATCH v4 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (2 preceding siblings ...)
  2019-09-16 12:16 ` [PATCH v4 03/11] xfs: track allocation busy state in allocation cursor Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 18:56   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper Brian Foster
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

If the size lookup lands in the last block of the by-size btree, the
near mode algorithm scans the entire block for the extent with best
available locality. In preparation for similar best available
extent tracking across both btrees, extend the allocation cursor
with best extent data and lift the associated state from the cntbt
last block scan code. No functional changes.
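
The selection rule carried in the new cursor fields can be sketched in
plain C; the names are simplified stand-ins for the acur.len/acur.diff
comparison in the patch, not the actual kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the best-extent fields lifted into the cursor. */
struct alloc_cur {
	unsigned	len;	/* best usable length so far */
	unsigned	diff;	/* best locality diff so far */
};

/* Mirrors the scan's rule: never trade usable length away; prefer a
 * longer extent, and on equal length a smaller distance from the
 * target bno. diff starts at (unsigned)-1 so the first valid
 * candidate always wins. */
static bool consider(struct alloc_cur *acur, unsigned len, unsigned diff)
{
	if (len < acur->len)
		return false;
	if (len > acur->len || diff < acur->diff) {
		acur->len = len;
		acur->diff = diff;
		return true;
	}
	return false;
}
```

Tracking bno/len directly in the cursor is what lets the follow-up
patches drop the "point back at the best entry and retrieve it again"
step via bc_ptrs[0].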

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 63 ++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 5c34d4c41761..ee46989ab723 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -716,6 +716,11 @@ struct xfs_alloc_cur {
 	struct xfs_btree_cur		*cnt;	/* btree cursors */
 	struct xfs_btree_cur		*bnolt;
 	struct xfs_btree_cur		*bnogt;
+	xfs_agblock_t			rec_bno;/* extent startblock */
+	xfs_extlen_t			rec_len;/* extent length */
+	xfs_agblock_t			bno;	/* alloc bno */
+	xfs_extlen_t			len;	/* alloc len */
+	xfs_extlen_t			diff;	/* diff from search bno */
 	unsigned			busy_gen;/* busy state */
 	bool				busy;
 };
@@ -735,6 +740,11 @@ xfs_alloc_cur_setup(
 
 	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
 
+	acur->rec_bno = 0;
+	acur->rec_len = 0;
+	acur->bno = 0;
+	acur->len = 0;
+	acur->diff = -1;
 	acur->busy = false;
 	acur->busy_gen = 0;
 
@@ -1247,10 +1257,7 @@ xfs_alloc_ag_vextent_near(
 	 * but we never loop back to the top.
 	 */
 	while (xfs_btree_islastblock(acur.cnt, 0)) {
-		xfs_extlen_t	bdiff;
-		int		besti=0;
-		xfs_extlen_t	blen=0;
-		xfs_agblock_t	bnew=0;
+		xfs_extlen_t	diff;
 
 #ifdef DEBUG
 		if (dofirst)
@@ -1281,8 +1288,8 @@ xfs_alloc_ag_vextent_near(
 				break;
 		}
 		i = acur.cnt->bc_ptrs[0];
-		for (j = 1, blen = 0, bdiff = 0;
-		     !error && j && (blen < args->maxlen || bdiff > 0);
+		for (j = 1;
+		     !error && j && (acur.len < args->maxlen || acur.diff > 0);
 		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
 			/*
 			 * For each entry, decide if it's better than
@@ -1301,44 +1308,40 @@ xfs_alloc_ag_vextent_near(
 			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
 			xfs_alloc_fix_len(args);
 			ASSERT(args->len >= args->minlen);
-			if (args->len < blen)
+			if (args->len < acur.len)
 				continue;
-			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
+			diff = xfs_alloc_compute_diff(args->agbno, args->len,
 				args->alignment, args->datatype, ltbnoa,
 				ltlena, &ltnew);
 			if (ltnew != NULLAGBLOCK &&
-			    (args->len > blen || ltdiff < bdiff)) {
-				bdiff = ltdiff;
-				bnew = ltnew;
-				blen = args->len;
-				besti = acur.cnt->bc_ptrs[0];
+			    (args->len > acur.len || diff < acur.diff)) {
+				acur.rec_bno = ltbno;
+				acur.rec_len = ltlen;
+				acur.diff = diff;
+				acur.bno = ltnew;
+				acur.len = args->len;
 			}
 		}
 		/*
 		 * It didn't work.  We COULD be in a case where
 		 * there's a good record somewhere, so try again.
 		 */
-		if (blen == 0)
+		if (acur.len == 0)
 			break;
-		/*
-		 * Point at the best entry, and retrieve it again.
-		 */
-		acur.cnt->bc_ptrs[0] = besti;
-		error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
-		if (error)
-			goto out;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-		args->len = blen;
 
 		/*
-		 * We are allocating starting at bnew for blen blocks.
+		 * Allocate at the bno/len tracked in the cursor.
 		 */
-		args->agbno = bnew;
-		ASSERT(bnew >= ltbno);
-		ASSERT(bnew + blen <= ltbno + ltlen);
-		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, ltbno,
-					ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
+		args->agbno = acur.bno;
+		args->len = acur.len;
+		ASSERT(acur.bno >= acur.rec_bno);
+		ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
+		ASSERT(acur.rec_bno + acur.rec_len <=
+		       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
+
+		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt,
+				acur.rec_bno, acur.rec_len, acur.bno, acur.len,
+				0);
 		if (error)
 			goto out;
 		trace_xfs_alloc_near_first(args);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (3 preceding siblings ...)
  2019-09-16 12:16 ` [PATCH v4 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 19:03   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 06/11] xfs: reuse best extent tracking logic for bnobt scan Brian Foster
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

The cntbt lastblock scan checks the size, alignment, locality, etc.
of each free extent in the block and compares it with the current
best candidate. This logic will be reused by the upcoming optimized
cntbt algorithm, so refactor it into a separate helper.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 113 +++++++++++++++++++++++++++++---------
 fs/xfs/xfs_trace.h        |  25 +++++++++
 2 files changed, 111 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index ee46989ab723..2fa7bb6a00a8 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -791,6 +791,89 @@ xfs_alloc_cur_close(
 	acur->cnt = acur->bnolt = acur->bnogt = NULL;
 }
 
+/*
+ * Check an extent for allocation and track the best available candidate in the
+ * allocation structure. The cursor is deactivated if it has entered an
+ * out-of-range state based on the allocation arguments. Optionally return the
+ * extent geometry and allocation status if requested by the caller.
+ */
+static int
+xfs_alloc_cur_check(
+	struct xfs_alloc_arg		*args,
+	struct xfs_alloc_cur		*acur,
+	struct xfs_btree_cur		*cur,
+	int				*new)
+{
+	int			error, i;
+	xfs_agblock_t		bno, bnoa, bnew;
+	xfs_extlen_t		len, lena, diff = -1;
+	bool			busy;
+	unsigned		busy_gen = 0;
+	bool			deactivate = false;
+
+	*new = 0;
+
+	error = xfs_alloc_get_rec(cur, &bno, &len, &i);
+	if (error)
+		return error;
+	XFS_WANT_CORRUPTED_RETURN(args->mp, i == 1);
+
+	/*
+	 * Check minlen and deactivate a cntbt cursor if out of acceptable size
+	 * range (i.e., walking backwards looking for a minlen extent).
+	 */
+	if (len < args->minlen) {
+		deactivate = true;
+		goto out;
+	}
+
+	busy = xfs_alloc_compute_aligned(args, bno, len, &bnoa, &lena,
+					 &busy_gen);
+	acur->busy |= busy;
+	if (busy)
+		acur->busy_gen = busy_gen;
+	/* deactivate a bnobt cursor outside of locality range */
+	if (bnoa < args->min_agbno || bnoa > args->max_agbno)
+		goto out;
+	if (lena < args->minlen)
+		goto out;
+
+	args->len = XFS_EXTLEN_MIN(lena, args->maxlen);
+	xfs_alloc_fix_len(args);
+	ASSERT(args->len >= args->minlen);
+	if (args->len < acur->len)
+		goto out;
+
+	/*
+	 * We have an aligned record that satisfies minlen and beats or matches
+	 * the candidate extent size. Compare locality for near allocation mode.
+	 */
+	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
+	diff = xfs_alloc_compute_diff(args->agbno, args->len,
+				      args->alignment, args->datatype,
+				      bnoa, lena, &bnew);
+	if (bnew == NULLAGBLOCK)
+		goto out;
+	if (diff > acur->diff)
+		goto out;
+
+	ASSERT(args->len > acur->len ||
+	       (args->len == acur->len && diff <= acur->diff));
+	acur->rec_bno = bno;
+	acur->rec_len = len;
+	acur->bno = bnew;
+	acur->len = args->len;
+	acur->diff = diff;
+	*new = 1;
+
+out:
+	if (deactivate)
+		cur->bc_private.a.priv.abt.active = false;
+	trace_xfs_alloc_cur_check(args->mp, cur->bc_btnum, bno, len, diff,
+				  *new);
+	return 0;
+}
+
 /*
  * Deal with the case where only small freespaces remain. Either return the
  * contents of the last freespace record, or allocate space from the freelist if
@@ -1257,8 +1340,6 @@ xfs_alloc_ag_vextent_near(
 	 * but we never loop back to the top.
 	 */
 	while (xfs_btree_islastblock(acur.cnt, 0)) {
-		xfs_extlen_t	diff;
-
 #ifdef DEBUG
 		if (dofirst)
 			break;
@@ -1289,38 +1370,16 @@ xfs_alloc_ag_vextent_near(
 		}
 		i = acur.cnt->bc_ptrs[0];
 		for (j = 1;
-		     !error && j && (acur.len < args->maxlen || acur.diff > 0);
+		     !error && j && xfs_alloc_cur_active(acur.cnt) &&
+		     (acur.len < args->maxlen || acur.diff > 0);
 		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
 			/*
 			 * For each entry, decide if it's better than
 			 * the previous best entry.
 			 */
-			error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
+			error = xfs_alloc_cur_check(args, &acur, acur.cnt, &i);
 			if (error)
 				goto out;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			acur.busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &acur.busy_gen);
-			if (ltlena < args->minlen)
-				continue;
-			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
-				continue;
-			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			ASSERT(args->len >= args->minlen);
-			if (args->len < acur.len)
-				continue;
-			diff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, ltbnoa,
-				ltlena, &ltnew);
-			if (ltnew != NULLAGBLOCK &&
-			    (args->len > acur.len || diff < acur.diff)) {
-				acur.rec_bno = ltbno;
-				acur.rec_len = ltlen;
-				acur.diff = diff;
-				acur.bno = ltnew;
-				acur.len = args->len;
-			}
 		}
 		/*
 		 * It didn't work.  We COULD be in a case where
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index eaae275ed430..b12fad3e45cb 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1663,6 +1663,31 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
 
+TRACE_EVENT(xfs_alloc_cur_check,
+	TP_PROTO(struct xfs_mount *mp, xfs_btnum_t btnum, xfs_agblock_t bno,
+		 xfs_extlen_t len, xfs_extlen_t diff, bool new),
+	TP_ARGS(mp, btnum, bno, len, diff, new),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_btnum_t, btnum)
+		__field(xfs_agblock_t, bno)
+		__field(xfs_extlen_t, len)
+		__field(xfs_extlen_t, diff)
+		__field(bool, new)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->btnum = btnum;
+		__entry->bno = bno;
+		__entry->len = len;
+		__entry->diff = diff;
+		__entry->new = new;
+	),
+	TP_printk("dev %d:%d btnum %d bno 0x%x len 0x%x diff 0x%x new %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->btnum,
+		  __entry->bno, __entry->len, __entry->diff, __entry->new)
+)
+
 DECLARE_EVENT_CLASS(xfs_da_class,
 	TP_PROTO(struct xfs_da_args *args),
 	TP_ARGS(args),
-- 
2.20.1



* [PATCH v4 06/11] xfs: reuse best extent tracking logic for bnobt scan
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (4 preceding siblings ...)
  2019-09-16 12:16 ` [PATCH v4 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 20:43   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 07/11] xfs: refactor allocation tree fixup code Brian Foster
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

The near mode bnobt scan searches left and right in the bnobt
looking for the closest free extent to the allocation hint that
satisfies minlen. Once such an extent is found, the left/right
search terminates, we search one more time in the opposite direction
and finish the allocation with the best overall extent.

The left/right and find best searches are currently controlled via a
combination of cursor state and local variables. Clean up this code
and prepare for further improvements to the near mode fallback
algorithm by reusing the allocation cursor best extent tracking
mechanism. Update the tracking logic to deactivate bnobt cursors
when out of allocation range and replace open-coded extent checks with
calls to the common helper. In doing so, rename some misnamed local
variables in the top-level near mode allocation function.
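
For reference, the left/right scan shape that this patch converts to common
helper calls can be modeled outside of XFS as follows. This is a toy sketch
under stated assumptions, not kernel code: a sorted array stands in for the
bnobt, and the `ext`/`near_scan` names are made up for illustration. Each
direction is "deactivated" when it runs off the end of the records, mirroring
how the bnobt cursors are deactivated when out of range.

```c
/* A free extent record; the array below stands in for the by-bno btree. */
struct ext {
	unsigned int	bno;
	unsigned int	len;
};

/*
 * Walk left and right from the locality hint, deactivating a direction
 * when it runs off the edge, and stop at the first extent >= minlen on
 * either side. Returns the index of the satisfying record, or -1.
 */
static int
near_scan(
	const struct ext	*recs,
	int			nrecs,
	int			start_left,	/* last record <= the hint */
	unsigned int		minlen)
{
	int	left = start_left;
	int	right = start_left + 1;
	int	left_active = left >= 0;
	int	right_active = right < nrecs;

	while (left_active || right_active) {
		if (left_active) {
			if (recs[left].len >= minlen)
				return left;
			if (--left < 0)
				left_active = 0;	/* ran off the left edge */
		}
		if (right_active) {
			if (recs[right].len >= minlen)
				return right;
			if (++right >= nrecs)
				right_active = 0;	/* ran off the right edge */
		}
	}
	return -1;
}
```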

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 270 +++++++++++---------------------------
 fs/xfs/xfs_trace.h        |   4 +-
 2 files changed, 76 insertions(+), 198 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 2fa7bb6a00a8..edcec975c306 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -810,6 +810,7 @@ xfs_alloc_cur_check(
 	bool			busy;
 	unsigned		busy_gen = 0;
 	bool			deactivate = false;
+	bool			isbnobt = cur->bc_btnum == XFS_BTNUM_BNO;
 
 	*new = 0;
 
@@ -823,7 +824,7 @@ xfs_alloc_cur_check(
 	 * range (i.e., walking backwards looking for a minlen extent).
 	 */
 	if (len < args->minlen) {
-		deactivate = true;
+		deactivate = !isbnobt;
 		goto out;
 	}
 
@@ -833,8 +834,10 @@ xfs_alloc_cur_check(
 	if (busy)
 		acur->busy_gen = busy_gen;
 	/* deactivate a bnobt cursor outside of locality range */
-	if (bnoa < args->min_agbno || bnoa > args->max_agbno)
+	if (bnoa < args->min_agbno || bnoa > args->max_agbno) {
+		deactivate = isbnobt;
 		goto out;
+	}
 	if (lena < args->minlen)
 		goto out;
 
@@ -854,8 +857,14 @@ xfs_alloc_cur_check(
 				      bnoa, lena, &bnew);
 	if (bnew == NULLAGBLOCK)
 		goto out;
-	if (diff > acur->diff)
+
+	/*
+	 * Deactivate a bnobt cursor with worse locality than the current best.
+	 */
+	if (diff > acur->diff) {
+		deactivate = isbnobt;
 		goto out;
+	}
 
 	ASSERT(args->len > acur->len ||
 	       (args->len == acur->len && diff <= acur->diff));
@@ -1163,96 +1172,44 @@ xfs_alloc_ag_vextent_exact(
 }
 
 /*
- * Search the btree in a given direction via the search cursor and compare
- * the records found against the good extent we've already found.
+ * Search the btree in a given direction and check the records against the good
+ * extent we've already found.
  */
 STATIC int
 xfs_alloc_find_best_extent(
-	struct xfs_alloc_arg	*args,	/* allocation argument structure */
-	struct xfs_btree_cur	*gcur,	/* good cursor */
-	struct xfs_btree_cur	*scur,	/* searching cursor */
-	xfs_agblock_t		gdiff,	/* difference for search comparison */
-	xfs_agblock_t		*sbno,	/* extent found by search */
-	xfs_extlen_t		*slen,	/* extent length */
-	xfs_agblock_t		*sbnoa,	/* aligned extent found by search */
-	xfs_extlen_t		*slena,	/* aligned extent length */
-	int			dir)	/* 0 = search right, 1 = search left */
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur,
+	struct xfs_btree_cur	*cur,
+	bool			increment)
 {
-	xfs_agblock_t		new;
-	xfs_agblock_t		sdiff;
 	int			error;
 	int			i;
-	unsigned		busy_gen;
-
-	/* The good extent is perfect, no need to  search. */
-	if (!gdiff)
-		goto out_use_good;
 
 	/*
-	 * Look until we find a better one, run out of space or run off the end.
+	 * Search so long as the cursor is active or we find a better extent.
+	 * The cursor is deactivated if it extends beyond the range of the
+	 * current allocation candidate.
 	 */
-	do {
-		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
+	while (xfs_alloc_cur_active(cur)) {
+		error = xfs_alloc_cur_check(args, acur, cur, &i);
 		if (error)
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-		xfs_alloc_compute_aligned(args, *sbno, *slen,
-				sbnoa, slena, &busy_gen);
-
-		/*
-		 * The good extent is closer than this one.
-		 */
-		if (!dir) {
-			if (*sbnoa > args->max_agbno)
-				goto out_use_good;
-			if (*sbnoa >= args->agbno + gdiff)
-				goto out_use_good;
-		} else {
-			if (*sbnoa < args->min_agbno)
-				goto out_use_good;
-			if (*sbnoa <= args->agbno - gdiff)
-				goto out_use_good;
-		}
-
-		/*
-		 * Same distance, compare length and pick the best.
-		 */
-		if (*slena >= args->minlen) {
-			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
-			xfs_alloc_fix_len(args);
-
-			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-						       args->alignment,
-						       args->datatype, *sbnoa,
-						       *slena, &new);
-
-			/*
-			 * Choose closer size and invalidate other cursor.
-			 */
-			if (sdiff < gdiff)
-				goto out_use_search;
-			goto out_use_good;
-		}
+			return error;
+		if (i == 1)
+			break;
+		if (!xfs_alloc_cur_active(cur))
+			break;
 
-		if (!dir)
-			error = xfs_btree_increment(scur, 0, &i);
+		if (increment)
+			error = xfs_btree_increment(cur, 0, &i);
 		else
-			error = xfs_btree_decrement(scur, 0, &i);
+			error = xfs_btree_decrement(cur, 0, &i);
 		if (error)
-			goto error0;
-	} while (i);
-
-out_use_good:
-	scur->bc_private.a.priv.abt.active = false;
-	return 0;
+			return error;
+		if (i == 0)
+			cur->bc_private.a.priv.abt.active = false;
+	}
 
-out_use_search:
-	gcur->bc_private.a.priv.abt.active = false;
 	return 0;
-
-error0:
-	/* caller invalidates cursors */
-	return error;
 }
 
 /*
@@ -1266,23 +1223,13 @@ xfs_alloc_ag_vextent_near(
 	struct xfs_alloc_arg	*args)
 {
 	struct xfs_alloc_cur	acur = {0,};
-	struct xfs_btree_cur	*bno_cur;
-	xfs_agblock_t	gtbno;		/* start bno of right side entry */
-	xfs_agblock_t	gtbnoa;		/* aligned ... */
-	xfs_extlen_t	gtdiff;		/* difference to right side entry */
-	xfs_extlen_t	gtlen;		/* length of right side entry */
-	xfs_extlen_t	gtlena;		/* aligned ... */
-	xfs_agblock_t	gtnew;		/* useful start bno of right side */
+	struct xfs_btree_cur	*fbcur = NULL;
 	int		error;		/* error code */
 	int		i;		/* result code, temporary */
 	int		j;		/* result code, temporary */
-	xfs_agblock_t	ltbno;		/* start bno of left side entry */
-	xfs_agblock_t	ltbnoa;		/* aligned ... */
-	xfs_extlen_t	ltdiff;		/* difference to left side entry */
-	xfs_extlen_t	ltlen;		/* length of left side entry */
-	xfs_extlen_t	ltlena;		/* aligned ... */
-	xfs_agblock_t	ltnew;		/* useful start bno of left side */
-	xfs_extlen_t	rlen;		/* length of returned extent */
+	xfs_agblock_t	bno;
+	xfs_extlen_t	len;
+	bool		fbinc = false;
 #ifdef DEBUG
 	/*
 	 * Randomly don't execute the first algorithm.
@@ -1304,9 +1251,7 @@ xfs_alloc_ag_vextent_near(
 		args->agbno = args->max_agbno;
 
 restart:
-	ltlen = 0;
-	gtlena = 0;
-	ltlena = 0;
+	len = 0;
 
 	/*
 	 * Set up cursors and see if there are any free extents as big as
@@ -1315,11 +1260,11 @@ xfs_alloc_ag_vextent_near(
 	 */
 	error = xfs_alloc_cur_setup(args, &acur);
 	if (error == -ENOSPC) {
-		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &ltbno,
-				&ltlen, &i);
+		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &bno,
+				&len, &i);
 		if (error)
 			goto out;
-		if (i == 0 || ltlen == 0) {
+		if (i == 0 || len == 0) {
 			trace_xfs_alloc_near_noentry(args);
 			goto out;
 		}
@@ -1350,21 +1295,21 @@ xfs_alloc_ag_vextent_near(
 		 * record smaller than maxlen, go to the start of this block,
 		 * and skip all those smaller than minlen.
 		 */
-		if (ltlen || args->alignment > 1) {
+		if (len || args->alignment > 1) {
 			acur.cnt->bc_ptrs[0] = 1;
 			do {
-				error = xfs_alloc_get_rec(acur.cnt, &ltbno,
-						&ltlen, &i);
+				error = xfs_alloc_get_rec(acur.cnt, &bno, &len,
+						&i);
 				if (error)
 					goto out;
 				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-				if (ltlen >= args->minlen)
+				if (len >= args->minlen)
 					break;
 				error = xfs_btree_increment(acur.cnt, 0, &i);
 				if (error)
 					goto out;
 			} while (i);
-			ASSERT(ltlen >= args->minlen);
+			ASSERT(len >= args->minlen);
 			if (!i)
 				break;
 		}
@@ -1433,77 +1378,43 @@ xfs_alloc_ag_vextent_near(
 	 */
 	do {
 		if (xfs_alloc_cur_active(acur.bnolt)) {
-			error = xfs_alloc_get_rec(acur.bnolt, &ltbno, &ltlen, &i);
+			error = xfs_alloc_cur_check(args, &acur, acur.bnolt, &i);
 			if (error)
 				goto out;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			acur.busy |= xfs_alloc_compute_aligned(args, ltbno,
-					ltlen, &ltbnoa, &ltlena, &acur.busy_gen);
-			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
+			if (i == 1) {
+				trace_xfs_alloc_cur_left(args);
+				fbcur = acur.bnogt;
+				fbinc = true;
 				break;
+			}
 			error = xfs_btree_decrement(acur.bnolt, 0, &i);
 			if (error)
 				goto out;
-			if (!i || ltbnoa < args->min_agbno)
+			if (!i)
 				acur.bnolt->bc_private.a.priv.abt.active = false;
 		}
 		if (xfs_alloc_cur_active(acur.bnogt)) {
-			error = xfs_alloc_get_rec(acur.bnogt, &gtbno, &gtlen, &i);
+			error = xfs_alloc_cur_check(args, &acur, acur.bnogt, &i);
 			if (error)
 				goto out;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
-			acur.busy |= xfs_alloc_compute_aligned(args, gtbno,
-					gtlen, &gtbnoa, &gtlena, &acur.busy_gen);
-			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
+			if (i == 1) {
+				trace_xfs_alloc_cur_right(args);
+				fbcur = acur.bnolt;
+				fbinc = false;
 				break;
+			}
 			error = xfs_btree_increment(acur.bnogt, 0, &i);
 			if (error)
 				goto out;
-			if (!i || gtbnoa > args->max_agbno)
+			if (!i)
 				acur.bnogt->bc_private.a.priv.abt.active = false;
 		}
 	} while (xfs_alloc_cur_active(acur.bnolt) ||
 		 xfs_alloc_cur_active(acur.bnogt));
 
-	/*
-	 * Got both cursors still active, need to find better entry.
-	 */
-	if (xfs_alloc_cur_active(acur.bnolt) &&
-	    xfs_alloc_cur_active(acur.bnogt)) {
-		if (ltlena >= args->minlen) {
-			/*
-			 * Left side is good, look for a right side entry.
-			 */
-			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, ltbnoa,
-				ltlena, &ltnew);
-
-			error = xfs_alloc_find_best_extent(args,
-						acur.bnolt, acur.bnogt,
-						ltdiff, &gtbno, &gtlen,
-						&gtbnoa, &gtlena,
-						0 /* search right */);
-		} else {
-			ASSERT(gtlena >= args->minlen);
-
-			/*
-			 * Right side is good, look for a left side entry.
-			 */
-			args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, gtbnoa,
-				gtlena, &gtnew);
-
-			error = xfs_alloc_find_best_extent(args,
-						acur.bnogt, acur.bnolt,
-						gtdiff, &ltbno, &ltlen,
-						&ltbnoa, &ltlena,
-						1 /* search left */);
-		}
-
+	/* search the opposite direction for a better entry */
+	if (fbcur) {
+		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);
 		if (error)
 			goto out;
 	}
@@ -1511,8 +1422,7 @@ xfs_alloc_ag_vextent_near(
 	/*
 	 * If we couldn't get anything, give up.
 	 */
-	if (!xfs_alloc_cur_active(acur.bnolt) &&
-	    !xfs_alloc_cur_active(acur.bnogt)) {
+	if (!acur.len) {
 		if (acur.busy) {
 			trace_xfs_alloc_near_busy(args);
 			xfs_extent_busy_flush(args->mp, args->pag,
@@ -1524,47 +1434,15 @@ xfs_alloc_ag_vextent_near(
 		goto out;
 	}
 
-	/*
-	 * At this point we have selected a freespace entry, either to the
-	 * left or to the right.  If it's on the right, copy all the
-	 * useful variables to the "left" set so we only have one
-	 * copy of this code.
-	 */
-	if (xfs_alloc_cur_active(acur.bnogt)) {
-		bno_cur = acur.bnogt;
-		ltbno = gtbno;
-		ltbnoa = gtbnoa;
-		ltlen = gtlen;
-		ltlena = gtlena;
-		j = 1;
-	} else {
-		bno_cur = acur.bnolt;
-		j = 0;
-	}
-
-	/*
-	 * Fix up the length and compute the useful address.
-	 */
-	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-	xfs_alloc_fix_len(args);
-	rlen = args->len;
-	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
-				     args->datatype, ltbnoa, ltlena, &ltnew);
-	ASSERT(ltnew >= ltbno);
-	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
-	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
-	args->agbno = ltnew;
-
-	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
-				      rlen, XFSA_FIXUP_BNO_OK);
-	if (error)
-		goto out;
+	args->agbno = acur.bno;
+	args->len = acur.len;
+	ASSERT(acur.bno >= acur.rec_bno);
+	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
+	ASSERT(acur.rec_bno + acur.rec_len <=
+	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
 
-	if (j)
-		trace_xfs_alloc_near_greater(args);
-	else
-		trace_xfs_alloc_near_lesser(args);
+	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
+				      acur.rec_len, acur.bno, acur.len, 0);
 
 out:
 	xfs_alloc_cur_close(&acur, error);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index b12fad3e45cb..2e57dc3d4230 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1642,8 +1642,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
-DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
-DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
-- 
2.20.1



* [PATCH v4 07/11] xfs: refactor allocation tree fixup code
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (5 preceding siblings ...)
  2019-09-16 12:16 ` [PATCH v4 06/11] xfs: reuse best extent tracking logic for bnobt scan Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 20:44   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 08/11] xfs: refactor and reuse best extent scanning logic Brian Foster
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

Both near mode algorithms duplicate the same allocation btree fixup
code. Eliminate the duplication and reuse the fallback algorithm
codepath.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index edcec975c306..3eacc799c4cb 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1333,23 +1333,8 @@ xfs_alloc_ag_vextent_near(
 		if (acur.len == 0)
 			break;
 
-		/*
-		 * Allocate at the bno/len tracked in the cursor.
-		 */
-		args->agbno = acur.bno;
-		args->len = acur.len;
-		ASSERT(acur.bno >= acur.rec_bno);
-		ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
-		ASSERT(acur.rec_bno + acur.rec_len <=
-		       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-
-		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt,
-				acur.rec_bno, acur.rec_len, acur.bno, acur.len,
-				0);
-		if (error)
-			goto out;
 		trace_xfs_alloc_near_first(args);
-		goto out;
+		goto alloc;
 	}
 
 	/*
@@ -1434,6 +1419,7 @@ xfs_alloc_ag_vextent_near(
 		goto out;
 	}
 
+alloc:
 	args->agbno = acur.bno;
 	args->len = acur.len;
 	ASSERT(acur.bno >= acur.rec_bno);
-- 
2.20.1



* [PATCH v4 08/11] xfs: refactor and reuse best extent scanning logic
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (6 preceding siblings ...)
  2019-09-16 12:16 ` [PATCH v4 07/11] xfs: refactor allocation tree fixup code Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 21:03   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 09/11] xfs: refactor near mode alloc bnobt scan into separate function Brian Foster
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

The bnobt "find best" helper implements a simple btree walker
function. This general pattern, or a subset thereof, is reused in
various parts of a near mode allocation operation. For example, the
bnobt left/right scans and the cntbt lastblock scan are each iterative
btree walks.

Rework this function into a generic btree walker, add a couple
parameters to control termination behavior from various contexts and
reuse it where applicable.

XXX: This slightly changes the cntbt lastblock scan logic to scan
the entire block and use the best candidate as opposed to the
current logic of checking until we locate a perfect match. Fix up
cur_check() to terminate on perfect match.
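
The walker's control parameters can be illustrated with a toy sketch. This is
not kernel code: `walk_iter` below is a hypothetical stand-in operating on an
array of extent lengths, with the same parameter semantics as
xfs_alloc_walk_iter(): `increment` picks the direction, `find_one` stops on
the first satisfactory record, and a negative `count` means no record limit.

```c
/*
 * Visit up to `count` records (count < 0 means unlimited) from `pos` in
 * one direction, treating a record as a hit if it is at least `want`
 * long. With find_one set, stop at the first hit. Returns whether any
 * hit was found and reports how many records were examined.
 */
static int
walk_iter(
	const unsigned int	*lens,
	int			nrecs,
	int			pos,
	int			increment,	/* 1 = walk right, 0 = left */
	int			find_one,	/* quit on first candidate */
	int			count,		/* rec count (-1 = infinite) */
	unsigned int		want,
	int			*nchecked)
{
	int	found = 0;

	*nchecked = 0;
	while (pos >= 0 && pos < nrecs && count) {
		(*nchecked)++;
		if (lens[pos] >= want) {
			found = 1;
			if (find_one)
				break;
		}
		pos += increment ? 1 : -1;
		if (count > 0)
			count--;
	}
	return found;
}
```

A full scan (find_one = 0, count = -1) corresponds to the cntbt lastblock
pass, while find_one = 1 with count = 1 corresponds to the single-step
left/right bnobt probes.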

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 103 ++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 55 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 3eacc799c4cb..ab494fd50dd7 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1172,30 +1172,38 @@ xfs_alloc_ag_vextent_exact(
 }
 
 /*
- * Search the btree in a given direction and check the records against the good
- * extent we've already found.
+ * Search a given number of btree records in a given direction. Check each
+ * record against the good extent we've already found.
  */
 STATIC int
-xfs_alloc_find_best_extent(
+xfs_alloc_walk_iter(
 	struct xfs_alloc_arg	*args,
 	struct xfs_alloc_cur	*acur,
 	struct xfs_btree_cur	*cur,
-	bool			increment)
+	bool			increment,
+	bool			find_one,/* quit on first candidate */
+	int			count,	/* rec count (-1 for infinite) */
+	int			*stat)
 {
 	int			error;
 	int			i;
 
+	*stat = 0;
+
 	/*
 	 * Search so long as the cursor is active or we find a better extent.
 	 * The cursor is deactivated if it extends beyond the range of the
 	 * current allocation candidate.
 	 */
-	while (xfs_alloc_cur_active(cur)) {
+	while (xfs_alloc_cur_active(cur) && count) {
 		error = xfs_alloc_cur_check(args, acur, cur, &i);
 		if (error)
 			return error;
-		if (i == 1)
-			break;
+		if (i == 1) {
+			*stat = 1;
+			if (find_one)
+				break;
+		}
 		if (!xfs_alloc_cur_active(cur))
 			break;
 
@@ -1207,6 +1215,9 @@ xfs_alloc_find_best_extent(
 			return error;
 		if (i == 0)
 			cur->bc_private.a.priv.abt.active = false;
+
+		if (count > 0)
+			count--;
 	}
 
 	return 0;
@@ -1226,7 +1237,6 @@ xfs_alloc_ag_vextent_near(
 	struct xfs_btree_cur	*fbcur = NULL;
 	int		error;		/* error code */
 	int		i;		/* result code, temporary */
-	int		j;		/* result code, temporary */
 	xfs_agblock_t	bno;
 	xfs_extlen_t	len;
 	bool		fbinc = false;
@@ -1313,19 +1323,12 @@ xfs_alloc_ag_vextent_near(
 			if (!i)
 				break;
 		}
-		i = acur.cnt->bc_ptrs[0];
-		for (j = 1;
-		     !error && j && xfs_alloc_cur_active(acur.cnt) &&
-		     (acur.len < args->maxlen || acur.diff > 0);
-		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
-			/*
-			 * For each entry, decide if it's better than
-			 * the previous best entry.
-			 */
-			error = xfs_alloc_cur_check(args, &acur, acur.cnt, &i);
-			if (error)
-				goto out;
-		}
+
+		error = xfs_alloc_walk_iter(args, &acur, acur.cnt, true, false,
+					    -1, &i);
+		if (error)
+			goto out;
+
 		/*
 		 * It didn't work.  We COULD be in a case where
 		 * there's a good record somewhere, so try again.
@@ -1357,49 +1360,39 @@ xfs_alloc_ag_vextent_near(
 		goto out;
 
 	/*
-	 * Loop going left with the leftward cursor, right with the
-	 * rightward cursor, until either both directions give up or
-	 * we find an entry at least as big as minlen.
+	 * Loop going left with the leftward cursor, right with the rightward
+	 * cursor, until either both directions give up or we find an entry at
+	 * least as big as minlen.
 	 */
 	do {
-		if (xfs_alloc_cur_active(acur.bnolt)) {
-			error = xfs_alloc_cur_check(args, &acur, acur.bnolt, &i);
-			if (error)
-				goto out;
-			if (i == 1) {
-				trace_xfs_alloc_cur_left(args);
-				fbcur = acur.bnogt;
-				fbinc = true;
-				break;
-			}
-			error = xfs_btree_decrement(acur.bnolt, 0, &i);
-			if (error)
-				goto out;
-			if (!i)
-				acur.bnolt->bc_private.a.priv.abt.active = false;
+		error = xfs_alloc_walk_iter(args, &acur, acur.bnolt, false,
+					    true, 1, &i);
+		if (error)
+			goto out;
+		if (i == 1) {
+			trace_xfs_alloc_cur_left(args);
+			fbcur = acur.bnogt;
+			fbinc = true;
+			break;
 		}
-		if (xfs_alloc_cur_active(acur.bnogt)) {
-			error = xfs_alloc_cur_check(args, &acur, acur.bnogt, &i);
-			if (error)
-				goto out;
-			if (i == 1) {
-				trace_xfs_alloc_cur_right(args);
-				fbcur = acur.bnolt;
-				fbinc = false;
-				break;
-			}
-			error = xfs_btree_increment(acur.bnogt, 0, &i);
-			if (error)
-				goto out;
-			if (!i)
-				acur.bnogt->bc_private.a.priv.abt.active = false;
+
+		error = xfs_alloc_walk_iter(args, &acur, acur.bnogt, true, true,
+					    1, &i);
+		if (error)
+			goto out;
+		if (i == 1) {
+			trace_xfs_alloc_cur_right(args);
+			fbcur = acur.bnolt;
+			fbinc = false;
+			break;
 		}
 	} while (xfs_alloc_cur_active(acur.bnolt) ||
 		 xfs_alloc_cur_active(acur.bnogt));
 
 	/* search the opposite direction for a better entry */
 	if (fbcur) {
-		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);
+		error = xfs_alloc_walk_iter(args, &acur, fbcur, fbinc, true, -1,
+					    &i);
 		if (error)
 			goto out;
 	}
-- 
2.20.1



* [PATCH v4 09/11] xfs: refactor near mode alloc bnobt scan into separate function
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (7 preceding siblings ...)
  2019-09-16 12:16 ` [PATCH v4 08/11] xfs: refactor and reuse best extent scanning logic Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 20:55   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 10/11] xfs: factor out tree fixup logic into helper Brian Foster
  2019-09-16 12:16 ` [PATCH v4 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups Brian Foster
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

In preparation for enhancing the near mode allocation bnobt scan
algorithm, lift it into a separate function. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 128 ++++++++++++++++++++++----------------
 1 file changed, 74 insertions(+), 54 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index ab494fd50dd7..6f7e4666250c 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1223,6 +1223,78 @@ xfs_alloc_walk_iter(
 	return 0;
 }
 
+/*
+ * Search in the by-bno btree to the left and to the right simultaneously until
+ * in each case we find a large enough free extent or run into the edge of the
+ * tree. When we run into the edge of the tree, we deactivate that cursor.
+ */
+STATIC int
+xfs_alloc_ag_vextent_bnobt(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur,
+	int			*stat)
+{
+	struct xfs_btree_cur	*fbcur = NULL;
+	int			error;
+	int			i;
+	bool			fbinc;
+
+	ASSERT(acur->len == 0);
+	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
+
+	*stat = 0;
+
+	error = xfs_alloc_lookup_le(acur->bnolt, args->agbno, 0, &i);
+	if (error)
+		return error;
+	error = xfs_alloc_lookup_ge(acur->bnogt, args->agbno, 0, &i);
+	if (error)
+		return error;
+
+	/*
+	 * Loop going left with the leftward cursor, right with the rightward
+	 * cursor, until either both directions give up or we find an entry at
+	 * least as big as minlen.
+	 */
+	while (xfs_alloc_cur_active(acur->bnolt) ||
+	       xfs_alloc_cur_active(acur->bnogt)) {
+		error = xfs_alloc_walk_iter(args, acur, acur->bnolt, false,
+					    true, 1, &i);
+		if (error)
+			return error;
+		if (i == 1) {
+			trace_xfs_alloc_cur_left(args);
+			fbcur = acur->bnogt;
+			fbinc = true;
+			break;
+		}
+
+		error = xfs_alloc_walk_iter(args, acur, acur->bnogt, true, true,
+					    1, &i);
+		if (error)
+			return error;
+		if (i == 1) {
+			trace_xfs_alloc_cur_right(args);
+			fbcur = acur->bnolt;
+			fbinc = false;
+			break;
+		}
+	}
+
+	/* search the opposite direction for a better entry */
+	if (fbcur) {
+		error = xfs_alloc_walk_iter(args, acur, fbcur, fbinc, true, -1,
+					    &i);
+		if (error)
+			return error;
+	}
+
+	if (acur->len)
+		*stat = 1;
+
+	return 0;
+}
+
 /*
  * Allocate a variable extent near bno in the allocation group agno.
  * Extent's length (returned in len) will be between minlen and maxlen,
@@ -1234,12 +1306,10 @@ xfs_alloc_ag_vextent_near(
 	struct xfs_alloc_arg	*args)
 {
 	struct xfs_alloc_cur	acur = {0,};
-	struct xfs_btree_cur	*fbcur = NULL;
 	int		error;		/* error code */
 	int		i;		/* result code, temporary */
 	xfs_agblock_t	bno;
 	xfs_extlen_t	len;
-	bool		fbinc = false;
 #ifdef DEBUG
 	/*
 	 * Randomly don't execute the first algorithm.
@@ -1341,62 +1411,12 @@ xfs_alloc_ag_vextent_near(
 	}
 
 	/*
-	 * Second algorithm.
-	 * Search in the by-bno tree to the left and to the right
-	 * simultaneously, until in each case we find a space big enough,
-	 * or run into the edge of the tree.  When we run into the edge,
-	 * we deallocate that cursor.
-	 * If both searches succeed, we compare the two spaces and pick
-	 * the better one.
-	 * With alignment, it's possible for both to fail; the upper
-	 * level algorithm that picks allocation groups for allocations
-	 * is not supposed to do this.
+	 * Second algorithm. Search the bnobt left and right.
 	 */
-	error = xfs_alloc_lookup_le(acur.bnolt, args->agbno, 0, &i);
-	if (error)
-		goto out;
-	error = xfs_alloc_lookup_ge(acur.bnogt, args->agbno, 0, &i);
+	error = xfs_alloc_ag_vextent_bnobt(args, &acur, &i);
 	if (error)
 		goto out;
 
-	/*
-	 * Loop going left with the leftward cursor, right with the rightward
-	 * cursor, until either both directions give up or we find an entry at
-	 * least as big as minlen.
-	 */
-	do {
-		error = xfs_alloc_walk_iter(args, &acur, acur.bnolt, false,
-					    true, 1, &i);
-		if (error)
-			goto out;
-		if (i == 1) {
-			trace_xfs_alloc_cur_left(args);
-			fbcur = acur.bnogt;
-			fbinc = true;
-			break;
-		}
-
-		error = xfs_alloc_walk_iter(args, &acur, acur.bnogt, true, true,
-					    1, &i);
-		if (error)
-			goto out;
-		if (i == 1) {
-			trace_xfs_alloc_cur_right(args);
-			fbcur = acur.bnolt;
-			fbinc = false;
-			break;
-		}
-	} while (xfs_alloc_cur_active(acur.bnolt) ||
-		 xfs_alloc_cur_active(acur.bnogt));
-
-	/* search the opposite direction for a better entry */
-	if (fbcur) {
-		error = xfs_alloc_walk_iter(args, &acur, fbcur, fbinc, true, -1,
-					    &i);
-		if (error)
-			goto out;
-	}
-
 	/*
 	 * If we couldn't get anything, give up.
 	 */
-- 
2.20.1



* [PATCH v4 10/11] xfs: factor out tree fixup logic into helper
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (8 preceding siblings ...)
  2019-09-16 12:16 ` [PATCH v4 09/11] xfs: refactor near mode alloc bnobt scan into separate function Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 20:56   ` Darrick J. Wong
  2019-09-16 12:16 ` [PATCH v4 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups Brian Foster
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

Lift the btree fixup path into a helper function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 42 +++++++++++++++++++++++++++++----------
 fs/xfs/xfs_trace.h        |  1 +
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6f7e4666250c..381a08257aaf 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -883,6 +883,36 @@ xfs_alloc_cur_check(
 	return 0;
 }
 
+/*
+ * Complete an allocation of a candidate extent. Remove the extent from both
+ * trees and update the args structure.
+ */
+STATIC int
+xfs_alloc_cur_finish(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur)
+{
+	int			error;
+
+	ASSERT(acur->cnt && acur->bnolt);
+	ASSERT(acur->bno >= acur->rec_bno);
+	ASSERT(acur->bno + acur->len <= acur->rec_bno + acur->rec_len);
+	ASSERT(acur->rec_bno + acur->rec_len <=
+	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
+
+	error = xfs_alloc_fixup_trees(acur->cnt, acur->bnolt, acur->rec_bno,
+				      acur->rec_len, acur->bno, acur->len, 0);
+	if (error)
+		return error;
+
+	args->agbno = acur->bno;
+	args->len = acur->len;
+	args->wasfromfl = 0;
+
+	trace_xfs_alloc_cur(args);
+	return 0;
+}
+
 /*
  * Deal with the case where only small freespaces remain. Either return the
  * contents of the last freespace record, or allocate space from the freelist if
@@ -1352,7 +1382,6 @@ xfs_alloc_ag_vextent_near(
 	} else if (error) {
 		goto out;
 	}
-	args->wasfromfl = 0;
 
 	/*
 	 * First algorithm.
@@ -1433,15 +1462,8 @@ xfs_alloc_ag_vextent_near(
 	}
 
 alloc:
-	args->agbno = acur.bno;
-	args->len = acur.len;
-	ASSERT(acur.bno >= acur.rec_bno);
-	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
-	ASSERT(acur.rec_bno + acur.rec_len <=
-	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-
-	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
-				      acur.rec_len, acur.bno, acur.len, 0);
+	/* fix up btrees on a successful allocation */
+	error = xfs_alloc_cur_finish(args, &acur);
 
 out:
 	xfs_alloc_cur_close(&acur, error);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 2e57dc3d4230..b8b93068efe7 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1642,6 +1642,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
-- 
2.20.1



* [PATCH v4 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups
  2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
                   ` (9 preceding siblings ...)
  2019-09-16 12:16 ` [PATCH v4 10/11] xfs: factor out tree fixup logic into helper Brian Foster
@ 2019-09-16 12:16 ` Brian Foster
  2019-09-18 21:11   ` Darrick J. Wong
  10 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-16 12:16 UTC (permalink / raw)
  To: linux-xfs

The near mode fallback algorithm consists of a left/right scan of
the bnobt. This algorithm has very poor breakdown characteristics
under worst case free space fragmentation conditions. If a suitable
extent is far enough from the locality hint, each allocation may
scan most or all of the bnobt before it completes. This causes
pathological behavior and extremely high allocation latencies.

While locality is important to near mode allocations, it is not so
important as to incur pathological allocation latency to provide the
absolute best available locality for every allocation. If the
allocation is large enough or far enough away, there is a point of
diminishing returns. As such, we can bound the overall operation by
including an iterative cntbt lookup in the broader search. The cntbt
lookup is optimized to immediately find the extent with best
locality for the given size on each iteration. Since the cntbt is
indexed by extent size, the lookup repeats with a search key that
grows at a variable rate, either incrementally or by doubling, until
it runs off the edge of the tree.

This approach provides a natural balance between the two algorithms
for various situations. For example, the bnobt scan is able to
satisfy smaller allocations such as for inode chunks or btree blocks
more quickly where the cntbt search may have to search through a
large set of extent sizes when the search key starts off small
relative to the largest extent in the tree. On the other hand, the
cntbt search more deterministically covers the set of suitable
extents for larger data extent allocation requests that the bnobt
scan may have to search the entire tree to locate.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 153 +++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_trace.h        |   2 +
 2 files changed, 143 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 381a08257aaf..4ec22040e516 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -716,6 +716,7 @@ struct xfs_alloc_cur {
 	struct xfs_btree_cur		*cnt;	/* btree cursors */
 	struct xfs_btree_cur		*bnolt;
 	struct xfs_btree_cur		*bnogt;
+	xfs_extlen_t			cur_len;/* current search length */
 	xfs_agblock_t			rec_bno;/* extent startblock */
 	xfs_extlen_t			rec_len;/* extent length */
 	xfs_agblock_t			bno;	/* alloc bno */
@@ -740,6 +741,7 @@ xfs_alloc_cur_setup(
 
 	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
 
+	acur->cur_len = args->maxlen;
 	acur->rec_bno = 0;
 	acur->rec_len = 0;
 	acur->bno = 0;
@@ -913,6 +915,76 @@ xfs_alloc_cur_finish(
 	return 0;
 }
 
+/*
+ * Locality allocation lookup algorithm. This expects a cntbt cursor and uses
+ * bno optimized lookup to search for extents with ideal size and locality.
+ */
+STATIC int
+xfs_alloc_cntbt_iter(
+	struct xfs_alloc_arg		*args,
+	struct xfs_alloc_cur		*acur)
+{
+	struct xfs_btree_cur	*cur = acur->cnt;
+	xfs_agblock_t		bno;
+	xfs_extlen_t		len, cur_len;
+	int			error;
+	int			i;
+
+	if (!xfs_alloc_cur_active(cur))
+		return 0;
+
+	/* locality optimized lookup */
+	cur_len = acur->cur_len;
+	error = xfs_alloc_lookup_ge(cur, args->agbno, cur_len, &i);
+	if (error)
+		return error;
+	if (i == 0)
+		return 0;
+	error = xfs_alloc_get_rec(cur, &bno, &len, &i);
+	if (error)
+		return error;
+
+	/* check the current record and update search length from it */
+	error = xfs_alloc_cur_check(args, acur, cur, &i);
+	if (error)
+		return error;
+	ASSERT(len >= acur->cur_len);
+	acur->cur_len = len;
+
+	/*
+	 * We looked up the first record >= [agbno, len] above. The agbno is a
+	 * secondary key and so the current record may lie just before or after
+	 * agbno. If it is past agbno, check the previous record too so long as
+	 * the length matches as it may be closer. Don't check a smaller record
+	 * because that could deactivate our cursor.
+	 */
+	if (bno > args->agbno) {
+		error = xfs_btree_decrement(cur, 0, &i);
+		if (!error && i) {
+			error = xfs_alloc_get_rec(cur, &bno, &len, &i);
+			if (!error && i && len == acur->cur_len)
+				error = xfs_alloc_cur_check(args, acur, cur,
+							    &i);
+		}
+		if (error)
+			return error;
+	}
+
+	/*
+	 * Bump the search key by one block if we have no allocation candidate
+	 * yet or if the record we just found already exceeds the doubled key.
+	 * Otherwise, double the search key to speed up the search. Efficiency
+	 * is more important here than absolute best locality.
+	 */
+	cur_len <<= 1;
+	if (!acur->len || acur->cur_len >= cur_len)
+		acur->cur_len++;
+	else
+		acur->cur_len = cur_len;
+
+	return error;
+}
+
 /*
  * Deal with the case where only small freespaces remain. Either return the
  * contents of the last freespace record, or allocate space from the freelist if
@@ -1254,12 +1326,11 @@ xfs_alloc_walk_iter(
 }
 
 /*
- * Search in the by-bno btree to the left and to the right simultaneously until
- * in each case we find a large enough free extent or run into the edge of the
- * tree. When we run into the edge of the tree, we deactivate that cursor.
+ * Search the by-bno and by-size btrees in parallel in search of an extent with
+ * ideal locality based on the NEAR mode ->agbno locality hint.
  */
 STATIC int
-xfs_alloc_ag_vextent_bnobt(
+xfs_alloc_ag_vextent_locality(
 	struct xfs_alloc_arg	*args,
 	struct xfs_alloc_cur	*acur,
 	int			*stat)
@@ -1274,6 +1345,9 @@ xfs_alloc_ag_vextent_bnobt(
 
 	*stat = 0;
 
+	error = xfs_alloc_lookup_ge(acur->cnt, args->agbno, acur->cur_len, &i);
+	if (error)
+		return error;
 	error = xfs_alloc_lookup_le(acur->bnolt, args->agbno, 0, &i);
 	if (error)
 		return error;
@@ -1282,12 +1356,37 @@ xfs_alloc_ag_vextent_bnobt(
 		return error;
 
 	/*
-	 * Loop going left with the leftward cursor, right with the rightward
-	 * cursor, until either both directions give up or we find an entry at
-	 * least as big as minlen.
+	 * Search the bnobt and cntbt in parallel. Search the bnobt left and
+	 * right and lookup the closest extent to the locality hint for each
+	 * extent size key in the cntbt. The entire search terminates
+	 * immediately on a bnobt hit because that means we've found best case
+	 * locality. Otherwise the search continues until the cntbt cursor runs
+	 * off the end of the tree. If no allocation candidate is found at this
+	 * point, give up on locality, walk backwards from the end of the cntbt
+	 * and take the first available extent.
+	 *
+	 * The parallel tree searches balance each other out to provide fairly
+	 * consistent performance for various situations. The bnobt search can
+	 * have pathological behavior in the worst case scenario of larger
+	 * allocation requests and fragmented free space. On the other hand, the
+	 * bnobt is able to satisfy most smaller allocation requests much more
+	 * quickly than the cntbt. The cntbt search can sift through fragmented
+	 * free space and sets of free extents for larger allocation requests
+	 * more quickly than the bnobt. Since the locality hint is just a hint
+	 * and we don't want to scan the entire bnobt for perfect locality, the
+	 * cntbt search essentially bounds the bnobt search such that we can
+	 * find good enough locality at reasonable performance in most cases.
 	 */
 	while (xfs_alloc_cur_active(acur->bnolt) ||
-	       xfs_alloc_cur_active(acur->bnogt)) {
+	       xfs_alloc_cur_active(acur->bnogt) ||
+	       xfs_alloc_cur_active(acur->cnt)) {
+
+		trace_xfs_alloc_cur_lookup(args);
+
+		/*
+		 * Search the bnobt left and right. In the case of a hit, finish
+		 * the search in the opposite direction and we're done.
+		 */
 		error = xfs_alloc_walk_iter(args, acur, acur->bnolt, false,
 					    true, 1, &i);
 		if (error)
@@ -1298,7 +1397,6 @@ xfs_alloc_ag_vextent_bnobt(
 			fbinc = true;
 			break;
 		}
-
 		error = xfs_alloc_walk_iter(args, acur, acur->bnogt, true, true,
 					    1, &i);
 		if (error)
@@ -1309,9 +1407,39 @@ xfs_alloc_ag_vextent_bnobt(
 			fbinc = false;
 			break;
 		}
+
+		/*
+		 * Check the extent with best locality based on the current
+		 * extent size search key and keep track of the best candidate.
+		 * If we fail to find anything due to busy extents, return
+		 * empty-handed so the caller can flush and retry the search. If
+		 * no busy extents were found, walk backwards from the end of
+		 * the cntbt as a last resort.
+		 */
+		error = xfs_alloc_cntbt_iter(args, acur);
+		if (error)
+			return error;
+		if (!xfs_alloc_cur_active(acur->cnt)) {
+			trace_xfs_alloc_cur_lookup_done(args);
+			if (!acur->len && !acur->busy) {
+				error = xfs_btree_decrement(acur->cnt, 0, &i);
+				if (error)
+					return error;
+				if (i) {
+					acur->cnt->bc_private.a.priv.abt.active = true;
+					fbcur = acur->cnt;
+					fbinc = false;
+				}
+			}
+			break;
+		}
+
 	}
 
-	/* search the opposite direction for a better entry */
+	/*
+	 * Search in the opposite direction for a better entry in the case of
+	 * a bnobt hit or walk backwards from the end of the cntbt.
+	 */
 	if (fbcur) {
 		error = xfs_alloc_walk_iter(args, acur, fbcur, fbinc, true, -1,
 					    &i);
@@ -1440,9 +1568,10 @@ xfs_alloc_ag_vextent_near(
 	}
 
 	/*
-	 * Second algorithm. Search the bnobt left and right.
+	 * Second algorithm. Combined cntbt and bnobt search to find ideal
+	 * locality.
 	 */
-	error = xfs_alloc_ag_vextent_bnobt(args, &acur, &i);
+	error = xfs_alloc_ag_vextent_locality(args, &acur, &i);
 	if (error)
 		goto out;
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index b8b93068efe7..0c9dfeac4e75 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1645,6 +1645,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup_done);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
-- 
2.20.1



* Re: [PATCH v4 01/11] xfs: track active state of allocation btree cursors
  2019-09-16 12:16 ` [PATCH v4 01/11] xfs: track active state of allocation btree cursors Brian Foster
@ 2019-09-18 18:38   ` Darrick J. Wong
  2019-09-19 14:57     ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 18:38 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:25AM -0400, Brian Foster wrote:
> The upcoming allocation algorithm update searches multiple
> allocation btree cursors concurrently. As such, it requires an
> active state to track when a particular cursor should continue
> searching. While active state will be modified based on higher level
> logic, we can define base functionality based on the result of
> allocation btree lookups.
> 
> Define an active flag in the private area of the btree cursor.
> Update it based on the result of lookups in the existing allocation
> btree helpers. Finally, provide a new helper to query the current
> state.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c       | 24 +++++++++++++++++++++---
>  fs/xfs/libxfs/xfs_alloc_btree.c |  1 +
>  fs/xfs/libxfs/xfs_btree.h       |  3 +++
>  3 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 533b04aaf6f6..512a45888e06 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -146,9 +146,13 @@ xfs_alloc_lookup_eq(
>  	xfs_extlen_t		len,	/* length of extent */
>  	int			*stat)	/* success/failure */
>  {
> +	int			error;
> +
>  	cur->bc_rec.a.ar_startblock = bno;
>  	cur->bc_rec.a.ar_blockcount = len;
> -	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
> +	error = xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
> +	cur->bc_private.a.priv.abt.active = *stat == 1 ? true : false;

I think "cur->bc_private.a.priv.abt.active = (*stat == 1);" would have
sufficed for these, right?  (Yeah, sorry, picking at nits here...)

--D

> +	return error;
>  }
>  
>  /*
> @@ -162,9 +166,13 @@ xfs_alloc_lookup_ge(
>  	xfs_extlen_t		len,	/* length of extent */
>  	int			*stat)	/* success/failure */
>  {
> +	int			error;
> +
>  	cur->bc_rec.a.ar_startblock = bno;
>  	cur->bc_rec.a.ar_blockcount = len;
> -	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
> +	error = xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
> +	cur->bc_private.a.priv.abt.active = *stat == 1 ? true : false;
> +	return error;
>  }
>  
>  /*
> @@ -178,9 +186,19 @@ xfs_alloc_lookup_le(
>  	xfs_extlen_t		len,	/* length of extent */
>  	int			*stat)	/* success/failure */
>  {
> +	int			error;
>  	cur->bc_rec.a.ar_startblock = bno;
>  	cur->bc_rec.a.ar_blockcount = len;
> -	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
> +	error = xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
> +	cur->bc_private.a.priv.abt.active = *stat == 1 ? true : false;
> +	return error;
> +}
> +
> +static inline bool
> +xfs_alloc_cur_active(
> +	struct xfs_btree_cur	*cur)
> +{
> +	return cur && cur->bc_private.a.priv.abt.active;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
> index 2a94543857a1..279694d73e4e 100644
> --- a/fs/xfs/libxfs/xfs_alloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_alloc_btree.c
> @@ -507,6 +507,7 @@ xfs_allocbt_init_cursor(
>  
>  	cur->bc_private.a.agbp = agbp;
>  	cur->bc_private.a.agno = agno;
> +	cur->bc_private.a.priv.abt.active = false;
>  
>  	if (xfs_sb_version_hascrc(&mp->m_sb))
>  		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index ced1e65d1483..b4e3ec1d7ff9 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -183,6 +183,9 @@ union xfs_btree_cur_private {
>  		unsigned long	nr_ops;		/* # record updates */
>  		int		shape_changes;	/* # of extent splits */
>  	} refc;
> +	struct {
> +		bool		active;		/* allocation cursor state */
> +	} abt;
>  };
>  
>  /*
> -- 
> 2.20.1
> 


* Re: [PATCH v4 02/11] xfs: introduce allocation cursor data structure
  2019-09-16 12:16 ` [PATCH v4 02/11] xfs: introduce allocation cursor data structure Brian Foster
@ 2019-09-18 18:46   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 18:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:26AM -0400, Brian Foster wrote:
> Introduce a new allocation cursor data structure to encapsulate the
> various states and structures used to perform an extent allocation.
> This structure will eventually be used to track overall allocation
> state across different search algorithms on both free space btrees.
> 
> To start, include the three btree cursors (one for the cntbt and two
> for the bnobt left/right search) used by the near mode allocation
> algorithm and refactor the cursor setup and teardown code into
> helpers. This slightly changes cursor memory allocation patterns,
> but otherwise makes no functional changes to the allocation
> algorithm.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks good to me; thanks for breaking this up a little more. :)

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 318 +++++++++++++++++++-------------------
>  1 file changed, 163 insertions(+), 155 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 512a45888e06..d159377ed603 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -710,8 +710,71 @@ xfs_alloc_update_counters(
>  }
>  
>  /*
> - * Allocation group level functions.
> + * Block allocation algorithm and data structures.
>   */
> +struct xfs_alloc_cur {
> +	struct xfs_btree_cur		*cnt;	/* btree cursors */
> +	struct xfs_btree_cur		*bnolt;
> +	struct xfs_btree_cur		*bnogt;
> +};
> +
> +/*
> + * Set up cursors, etc. in the extent allocation cursor. This function can be
> + * called multiple times to reset an initialized structure without having to
> + * reallocate cursors.
> + */
> +static int
> +xfs_alloc_cur_setup(
> +	struct xfs_alloc_arg	*args,
> +	struct xfs_alloc_cur	*acur)
> +{
> +	int			error;
> +	int			i;
> +
> +	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
> +
> +	/*
> +	 * Perform an initial cntbt lookup to check for availability of maxlen
> +	 * extents. If this fails, we'll return -ENOSPC to signal the caller to
> +	 * attempt a small allocation.
> +	 */
> +	if (!acur->cnt)
> +		acur->cnt = xfs_allocbt_init_cursor(args->mp, args->tp,
> +					args->agbp, args->agno, XFS_BTNUM_CNT);
> +	error = xfs_alloc_lookup_ge(acur->cnt, 0, args->maxlen, &i);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Allocate the bnobt left and right search cursors.
> +	 */
> +	if (!acur->bnolt)
> +		acur->bnolt = xfs_allocbt_init_cursor(args->mp, args->tp,
> +					args->agbp, args->agno, XFS_BTNUM_BNO);
> +	if (!acur->bnogt)
> +		acur->bnogt = xfs_allocbt_init_cursor(args->mp, args->tp,
> +					args->agbp, args->agno, XFS_BTNUM_BNO);
> +	return i == 1 ? 0 : -ENOSPC;
> +}
> +
> +static void
> +xfs_alloc_cur_close(
> +	struct xfs_alloc_cur	*acur,
> +	bool			error)
> +{
> +	int			cur_error = XFS_BTREE_NOERROR;
> +
> +	if (error)
> +		cur_error = XFS_BTREE_ERROR;
> +
> +	if (acur->cnt)
> +		xfs_btree_del_cursor(acur->cnt, cur_error);
> +	if (acur->bnolt)
> +		xfs_btree_del_cursor(acur->bnolt, cur_error);
> +	if (acur->bnogt)
> +		xfs_btree_del_cursor(acur->bnogt, cur_error);
> +	acur->cnt = acur->bnolt = acur->bnogt = NULL;
> +}
>  
>  /*
>   * Deal with the case where only small freespaces remain. Either return the
> @@ -1008,8 +1071,8 @@ xfs_alloc_ag_vextent_exact(
>  STATIC int
>  xfs_alloc_find_best_extent(
>  	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> -	struct xfs_btree_cur	**gcur,	/* good cursor */
> -	struct xfs_btree_cur	**scur,	/* searching cursor */
> +	struct xfs_btree_cur	*gcur,	/* good cursor */
> +	struct xfs_btree_cur	*scur,	/* searching cursor */
>  	xfs_agblock_t		gdiff,	/* difference for search comparison */
>  	xfs_agblock_t		*sbno,	/* extent found by search */
>  	xfs_extlen_t		*slen,	/* extent length */
> @@ -1031,7 +1094,7 @@ xfs_alloc_find_best_extent(
>  	 * Look until we find a better one, run out of space or run off the end.
>  	 */
>  	do {
> -		error = xfs_alloc_get_rec(*scur, sbno, slen, &i);
> +		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
>  		if (error)
>  			goto error0;
>  		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> @@ -1074,21 +1137,19 @@ xfs_alloc_find_best_extent(
>  		}
>  
>  		if (!dir)
> -			error = xfs_btree_increment(*scur, 0, &i);
> +			error = xfs_btree_increment(scur, 0, &i);
>  		else
> -			error = xfs_btree_decrement(*scur, 0, &i);
> +			error = xfs_btree_decrement(scur, 0, &i);
>  		if (error)
>  			goto error0;
>  	} while (i);
>  
>  out_use_good:
> -	xfs_btree_del_cursor(*scur, XFS_BTREE_NOERROR);
> -	*scur = NULL;
> +	scur->bc_private.a.priv.abt.active = false;
>  	return 0;
>  
>  out_use_search:
> -	xfs_btree_del_cursor(*gcur, XFS_BTREE_NOERROR);
> -	*gcur = NULL;
> +	gcur->bc_private.a.priv.abt.active = false;
>  	return 0;
>  
>  error0:
> @@ -1102,13 +1163,12 @@ xfs_alloc_find_best_extent(
>   * and of the form k * prod + mod unless there's nothing that large.
>   * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
>   */
> -STATIC int				/* error */
> +STATIC int
>  xfs_alloc_ag_vextent_near(
> -	xfs_alloc_arg_t	*args)		/* allocation argument structure */
> +	struct xfs_alloc_arg	*args)
>  {
> -	xfs_btree_cur_t	*bno_cur_gt;	/* cursor for bno btree, right side */
> -	xfs_btree_cur_t	*bno_cur_lt;	/* cursor for bno btree, left side */
> -	xfs_btree_cur_t	*cnt_cur;	/* cursor for count btree */
> +	struct xfs_alloc_cur	acur = {0,};
> +	struct xfs_btree_cur	*bno_cur;
>  	xfs_agblock_t	gtbno;		/* start bno of right side entry */
>  	xfs_agblock_t	gtbnoa;		/* aligned ... */
>  	xfs_extlen_t	gtdiff;		/* difference to right side entry */
> @@ -1148,38 +1208,29 @@ xfs_alloc_ag_vextent_near(
>  		args->agbno = args->max_agbno;
>  
>  restart:
> -	bno_cur_lt = NULL;
> -	bno_cur_gt = NULL;
>  	ltlen = 0;
>  	gtlena = 0;
>  	ltlena = 0;
>  	busy = false;
>  
>  	/*
> -	 * Get a cursor for the by-size btree.
> +	 * Set up cursors and see if there are any free extents as big as
> +	 * maxlen. If not, pick the last entry in the tree unless the tree is
> +	 * empty.
>  	 */
> -	cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
> -		args->agno, XFS_BTNUM_CNT);
> -
> -	/*
> -	 * See if there are any free extents as big as maxlen.
> -	 */
> -	if ((error = xfs_alloc_lookup_ge(cnt_cur, 0, args->maxlen, &i)))
> -		goto error0;
> -	/*
> -	 * If none, then pick up the last entry in the tree unless the
> -	 * tree is empty.
> -	 */
> -	if (!i) {
> -		if ((error = xfs_alloc_ag_vextent_small(args, cnt_cur, &ltbno,
> -				&ltlen, &i)))
> -			goto error0;
> +	error = xfs_alloc_cur_setup(args, &acur);
> +	if (error == -ENOSPC) {
> +		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &ltbno,
> +				&ltlen, &i);
> +		if (error)
> +			goto out;
>  		if (i == 0 || ltlen == 0) {
> -			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
>  			trace_xfs_alloc_near_noentry(args);
> -			return 0;
> +			goto out;
>  		}
>  		ASSERT(i == 1);
> +	} else if (error) {
> +		goto out;
>  	}
>  	args->wasfromfl = 0;
>  
> @@ -1193,7 +1244,7 @@ xfs_alloc_ag_vextent_near(
>  	 * This is written as a while loop so we can break out of it,
>  	 * but we never loop back to the top.
>  	 */
> -	while (xfs_btree_islastblock(cnt_cur, 0)) {
> +	while (xfs_btree_islastblock(acur.cnt, 0)) {
>  		xfs_extlen_t	bdiff;
>  		int		besti=0;
>  		xfs_extlen_t	blen=0;
> @@ -1210,32 +1261,35 @@ xfs_alloc_ag_vextent_near(
>  		 * and skip all those smaller than minlen.
>  		 */
>  		if (ltlen || args->alignment > 1) {
> -			cnt_cur->bc_ptrs[0] = 1;
> +			acur.cnt->bc_ptrs[0] = 1;
>  			do {
> -				if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno,
> -						&ltlen, &i)))
> -					goto error0;
> -				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> +				error = xfs_alloc_get_rec(acur.cnt, &ltbno,
> +						&ltlen, &i);
> +				if (error)
> +					goto out;
> +				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
>  				if (ltlen >= args->minlen)
>  					break;
> -				if ((error = xfs_btree_increment(cnt_cur, 0, &i)))
> -					goto error0;
> +				error = xfs_btree_increment(acur.cnt, 0, &i);
> +				if (error)
> +					goto out;
>  			} while (i);
>  			ASSERT(ltlen >= args->minlen);
>  			if (!i)
>  				break;
>  		}
> -		i = cnt_cur->bc_ptrs[0];
> +		i = acur.cnt->bc_ptrs[0];
>  		for (j = 1, blen = 0, bdiff = 0;
>  		     !error && j && (blen < args->maxlen || bdiff > 0);
> -		     error = xfs_btree_increment(cnt_cur, 0, &j)) {
> +		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
>  			/*
>  			 * For each entry, decide if it's better than
>  			 * the previous best entry.
>  			 */
> -			if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
> -				goto error0;
> -			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> +			error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
> +			if (error)
> +				goto out;
> +			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
>  			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
>  					&ltbnoa, &ltlena, &busy_gen);
>  			if (ltlena < args->minlen)
> @@ -1255,7 +1309,7 @@ xfs_alloc_ag_vextent_near(
>  				bdiff = ltdiff;
>  				bnew = ltnew;
>  				blen = args->len;
> -				besti = cnt_cur->bc_ptrs[0];
> +				besti = acur.cnt->bc_ptrs[0];
>  			}
>  		}
>  		/*
> @@ -1267,10 +1321,11 @@ xfs_alloc_ag_vextent_near(
>  		/*
>  		 * Point at the best entry, and retrieve it again.
>  		 */
> -		cnt_cur->bc_ptrs[0] = besti;
> -		if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
> -			goto error0;
> -		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> +		acur.cnt->bc_ptrs[0] = besti;
> +		error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
> +		if (error)
> +			goto out;
> +		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
>  		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
>  		args->len = blen;
>  
> @@ -1280,23 +1335,14 @@ xfs_alloc_ag_vextent_near(
>  		args->agbno = bnew;
>  		ASSERT(bnew >= ltbno);
>  		ASSERT(bnew + blen <= ltbno + ltlen);
> -		/*
> -		 * Set up a cursor for the by-bno tree.
> -		 */
> -		bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp,
> -			args->agbp, args->agno, XFS_BTNUM_BNO);
> -		/*
> -		 * Fix up the btree entries.
> -		 */
> -		if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno,
> -				ltlen, bnew, blen, XFSA_FIXUP_CNT_OK)))
> -			goto error0;
> -		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
> -		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
> -
> +		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, ltbno,
> +					ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
> +		if (error)
> +			goto out;
>  		trace_xfs_alloc_near_first(args);
> -		return 0;
> +		goto out;
>  	}
> +
>  	/*
>  	 * Second algorithm.
>  	 * Search in the by-bno tree to the left and to the right
> @@ -1309,86 +1355,57 @@ xfs_alloc_ag_vextent_near(
>  	 * level algorithm that picks allocation groups for allocations
>  	 * is not supposed to do this.
>  	 */
> -	/*
> -	 * Allocate and initialize the cursor for the leftward search.
> -	 */
> -	bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
> -		args->agno, XFS_BTNUM_BNO);
> -	/*
> -	 * Lookup <= bno to find the leftward search's starting point.
> -	 */
> -	if ((error = xfs_alloc_lookup_le(bno_cur_lt, args->agbno, args->maxlen, &i)))
> -		goto error0;
> -	if (!i) {
> -		/*
> -		 * Didn't find anything; use this cursor for the rightward
> -		 * search.
> -		 */
> -		bno_cur_gt = bno_cur_lt;
> -		bno_cur_lt = NULL;
> -	}
> -	/*
> -	 * Found something.  Duplicate the cursor for the rightward search.
> -	 */
> -	else if ((error = xfs_btree_dup_cursor(bno_cur_lt, &bno_cur_gt)))
> -		goto error0;
> -	/*
> -	 * Increment the cursor, so we will point at the entry just right
> -	 * of the leftward entry if any, or to the leftmost entry.
> -	 */
> -	if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
> -		goto error0;
> -	if (!i) {
> -		/*
> -		 * It failed, there are no rightward entries.
> -		 */
> -		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_NOERROR);
> -		bno_cur_gt = NULL;
> -	}
> +	error = xfs_alloc_lookup_le(acur.bnolt, args->agbno, 0, &i);
> +	if (error)
> +		goto out;
> +	error = xfs_alloc_lookup_ge(acur.bnogt, args->agbno, 0, &i);
> +	if (error)
> +		goto out;
> +
>  	/*
>  	 * Loop going left with the leftward cursor, right with the
>  	 * rightward cursor, until either both directions give up or
>  	 * we find an entry at least as big as minlen.
>  	 */
>  	do {
> -		if (bno_cur_lt) {
> -			if ((error = xfs_alloc_get_rec(bno_cur_lt, &ltbno, &ltlen, &i)))
> -				goto error0;
> -			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> +		if (xfs_alloc_cur_active(acur.bnolt)) {
> +			error = xfs_alloc_get_rec(acur.bnolt, &ltbno, &ltlen, &i);
> +			if (error)
> +				goto out;
> +			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
>  			busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
>  					&ltbnoa, &ltlena, &busy_gen);
>  			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
>  				break;
> -			if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i)))
> -				goto error0;
> -			if (!i || ltbnoa < args->min_agbno) {
> -				xfs_btree_del_cursor(bno_cur_lt,
> -						     XFS_BTREE_NOERROR);
> -				bno_cur_lt = NULL;
> -			}
> +			error = xfs_btree_decrement(acur.bnolt, 0, &i);
> +			if (error)
> +				goto out;
> +			if (!i || ltbnoa < args->min_agbno)
> +				acur.bnolt->bc_private.a.priv.abt.active = false;
>  		}
> -		if (bno_cur_gt) {
> -			if ((error = xfs_alloc_get_rec(bno_cur_gt, &gtbno, &gtlen, &i)))
> -				goto error0;
> -			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> +		if (xfs_alloc_cur_active(acur.bnogt)) {
> +			error = xfs_alloc_get_rec(acur.bnogt, &gtbno, &gtlen, &i);
> +			if (error)
> +				goto out;
> +			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
>  			busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
>  					&gtbnoa, &gtlena, &busy_gen);
>  			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
>  				break;
> -			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
> -				goto error0;
> -			if (!i || gtbnoa > args->max_agbno) {
> -				xfs_btree_del_cursor(bno_cur_gt,
> -						     XFS_BTREE_NOERROR);
> -				bno_cur_gt = NULL;
> -			}
> +			error = xfs_btree_increment(acur.bnogt, 0, &i);
> +			if (error)
> +				goto out;
> +			if (!i || gtbnoa > args->max_agbno)
> +				acur.bnogt->bc_private.a.priv.abt.active = false;
>  		}
> -	} while (bno_cur_lt || bno_cur_gt);
> +	} while (xfs_alloc_cur_active(acur.bnolt) ||
> +		 xfs_alloc_cur_active(acur.bnogt));
>  
>  	/*
>  	 * Got both cursors still active, need to find better entry.
>  	 */
> -	if (bno_cur_lt && bno_cur_gt) {
> +	if (xfs_alloc_cur_active(acur.bnolt) &&
> +	    xfs_alloc_cur_active(acur.bnogt)) {
>  		if (ltlena >= args->minlen) {
>  			/*
>  			 * Left side is good, look for a right side entry.
> @@ -1400,7 +1417,7 @@ xfs_alloc_ag_vextent_near(
>  				ltlena, &ltnew);
>  
>  			error = xfs_alloc_find_best_extent(args,
> -						&bno_cur_lt, &bno_cur_gt,
> +						acur.bnolt, acur.bnogt,
>  						ltdiff, &gtbno, &gtlen,
>  						&gtbnoa, &gtlena,
>  						0 /* search right */);
> @@ -1417,22 +1434,21 @@ xfs_alloc_ag_vextent_near(
>  				gtlena, &gtnew);
>  
>  			error = xfs_alloc_find_best_extent(args,
> -						&bno_cur_gt, &bno_cur_lt,
> +						acur.bnogt, acur.bnolt,
>  						gtdiff, &ltbno, &ltlen,
>  						&ltbnoa, &ltlena,
>  						1 /* search left */);
>  		}
>  
>  		if (error)
> -			goto error0;
> +			goto out;
>  	}
>  
>  	/*
>  	 * If we couldn't get anything, give up.
>  	 */
> -	if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
> -		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
> -
> +	if (!xfs_alloc_cur_active(acur.bnolt) &&
> +	    !xfs_alloc_cur_active(acur.bnogt)) {
>  		if (busy) {
>  			trace_xfs_alloc_near_busy(args);
>  			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
> @@ -1440,7 +1456,7 @@ xfs_alloc_ag_vextent_near(
>  		}
>  		trace_xfs_alloc_size_neither(args);
>  		args->agbno = NULLAGBLOCK;
> -		return 0;
> +		goto out;
>  	}
>  
>  	/*
> @@ -1449,16 +1465,17 @@ xfs_alloc_ag_vextent_near(
>  	 * useful variables to the "left" set so we only have one
>  	 * copy of this code.
>  	 */
> -	if (bno_cur_gt) {
> -		bno_cur_lt = bno_cur_gt;
> -		bno_cur_gt = NULL;
> +	if (xfs_alloc_cur_active(acur.bnogt)) {
> +		bno_cur = acur.bnogt;
>  		ltbno = gtbno;
>  		ltbnoa = gtbnoa;
>  		ltlen = gtlen;
>  		ltlena = gtlena;
>  		j = 1;
> -	} else
> +	} else {
> +		bno_cur = acur.bnolt;
>  		j = 0;
> +	}
>  
>  	/*
>  	 * Fix up the length and compute the useful address.
> @@ -1474,27 +1491,18 @@ xfs_alloc_ag_vextent_near(
>  	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
>  	args->agbno = ltnew;
>  
> -	if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen,
> -			ltnew, rlen, XFSA_FIXUP_BNO_OK)))
> -		goto error0;
> +	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
> +				      rlen, XFSA_FIXUP_BNO_OK);
> +	if (error)
> +		goto out;
>  
>  	if (j)
>  		trace_xfs_alloc_near_greater(args);
>  	else
>  		trace_xfs_alloc_near_lesser(args);
>  
> -	xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
> -	xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
> -	return 0;
> -
> - error0:
> -	trace_xfs_alloc_near_error(args);
> -	if (cnt_cur != NULL)
> -		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_ERROR);
> -	if (bno_cur_lt != NULL)
> -		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_ERROR);
> -	if (bno_cur_gt != NULL)
> -		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_ERROR);
> +out:
> +	xfs_alloc_cur_close(&acur, error);
>  	return error;
>  }
>  
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 03/11] xfs: track allocation busy state in allocation cursor
  2019-09-16 12:16 ` [PATCH v4 03/11] xfs: track allocation busy state in allocation cursor Brian Foster
@ 2019-09-18 18:48   ` Darrick J. Wong
  2019-09-19 14:58     ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 18:48 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:27AM -0400, Brian Foster wrote:
> Extend the allocation cursor to track extent busy state for an
> allocation attempt. No functional changes.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index d159377ed603..5c34d4c41761 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -716,6 +716,8 @@ struct xfs_alloc_cur {
>  	struct xfs_btree_cur		*cnt;	/* btree cursors */
>  	struct xfs_btree_cur		*bnolt;
>  	struct xfs_btree_cur		*bnogt;
> +	unsigned			busy_gen;/* busy state */

Nit: unsigned int here?

'unsigned' without the 'int' looks a little funny to me, but eh
whatever, I guess we do that in the iomap code so:

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> +	bool				busy;
>  };
>  
>  /*
> @@ -733,6 +735,9 @@ xfs_alloc_cur_setup(
>  
>  	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
>  
> +	acur->busy = false;
> +	acur->busy_gen = 0;
> +
>  	/*
>  	 * Perform an initial cntbt lookup to check for availability of maxlen
>  	 * extents. If this fails, we'll return -ENOSPC to signal the caller to
> @@ -1185,8 +1190,6 @@ xfs_alloc_ag_vextent_near(
>  	xfs_extlen_t	ltlena;		/* aligned ... */
>  	xfs_agblock_t	ltnew;		/* useful start bno of left side */
>  	xfs_extlen_t	rlen;		/* length of returned extent */
> -	bool		busy;
> -	unsigned	busy_gen;
>  #ifdef DEBUG
>  	/*
>  	 * Randomly don't execute the first algorithm.
> @@ -1211,7 +1214,6 @@ xfs_alloc_ag_vextent_near(
>  	ltlen = 0;
>  	gtlena = 0;
>  	ltlena = 0;
> -	busy = false;
>  
>  	/*
>  	 * Set up cursors and see if there are any free extents as big as
> @@ -1290,8 +1292,8 @@ xfs_alloc_ag_vextent_near(
>  			if (error)
>  				goto out;
>  			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
> -					&ltbnoa, &ltlena, &busy_gen);
> +			acur.busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
> +					&ltbnoa, &ltlena, &acur.busy_gen);
>  			if (ltlena < args->minlen)
>  				continue;
>  			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
> @@ -1373,8 +1375,8 @@ xfs_alloc_ag_vextent_near(
>  			if (error)
>  				goto out;
>  			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -			busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
> -					&ltbnoa, &ltlena, &busy_gen);
> +			acur.busy |= xfs_alloc_compute_aligned(args, ltbno,
> +					ltlen, &ltbnoa, &ltlena, &acur.busy_gen);
>  			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
>  				break;
>  			error = xfs_btree_decrement(acur.bnolt, 0, &i);
> @@ -1388,8 +1390,8 @@ xfs_alloc_ag_vextent_near(
>  			if (error)
>  				goto out;
>  			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -			busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
> -					&gtbnoa, &gtlena, &busy_gen);
> +			acur.busy |= xfs_alloc_compute_aligned(args, gtbno,
> +					gtlen, &gtbnoa, &gtlena, &acur.busy_gen);
>  			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
>  				break;
>  			error = xfs_btree_increment(acur.bnogt, 0, &i);
> @@ -1449,9 +1451,10 @@ xfs_alloc_ag_vextent_near(
>  	 */
>  	if (!xfs_alloc_cur_active(acur.bnolt) &&
>  	    !xfs_alloc_cur_active(acur.bnogt)) {
> -		if (busy) {
> +		if (acur.busy) {
>  			trace_xfs_alloc_near_busy(args);
> -			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
> +			xfs_extent_busy_flush(args->mp, args->pag,
> +					      acur.busy_gen);
>  			goto restart;
>  		}
>  		trace_xfs_alloc_size_neither(args);
> -- 
> 2.20.1
> 


* Re: [PATCH v4 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor
  2019-09-16 12:16 ` [PATCH v4 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor Brian Foster
@ 2019-09-18 18:56   ` Darrick J. Wong
  2019-09-19 15:00     ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 18:56 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:28AM -0400, Brian Foster wrote:
> If the size lookup lands in the last block of the by-size btree, the
> near mode algorithm scans the entire block for the extent with best
> available locality. In preparation for similar best available
> extent tracking across both btrees, extend the allocation cursor
> with best extent data and lift the associated state from the cntbt
> last block scan code. No functional changes.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 63 ++++++++++++++++++++-------------------
>  1 file changed, 33 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 5c34d4c41761..ee46989ab723 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -716,6 +716,11 @@ struct xfs_alloc_cur {
>  	struct xfs_btree_cur		*cnt;	/* btree cursors */
>  	struct xfs_btree_cur		*bnolt;
>  	struct xfs_btree_cur		*bnogt;
> +	xfs_agblock_t			rec_bno;/* extent startblock */
> +	xfs_extlen_t			rec_len;/* extent length */
> +	xfs_agblock_t			bno;	/* alloc bno */
> +	xfs_extlen_t			len;	/* alloc len */
> +	xfs_extlen_t			diff;	/* diff from search bno */
>  	unsigned			busy_gen;/* busy state */
>  	bool				busy;
>  };
> @@ -735,6 +740,11 @@ xfs_alloc_cur_setup(
>  
>  	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
>  
> +	acur->rec_bno = 0;
> +	acur->rec_len = 0;
> +	acur->bno = 0;
> +	acur->len = 0;
> +	acur->diff = -1;

xfs_extlen_t is a uint32_t; is this going to cause comparison problems?

Also ... assuming that acur->diff is the successor to the bdiff variable
below, shouldn't it be initialized to zero like bdiff was?

--D

>  	acur->busy = false;
>  	acur->busy_gen = 0;
>  
> @@ -1247,10 +1257,7 @@ xfs_alloc_ag_vextent_near(
>  	 * but we never loop back to the top.
>  	 */
>  	while (xfs_btree_islastblock(acur.cnt, 0)) {
> -		xfs_extlen_t	bdiff;
> -		int		besti=0;
> -		xfs_extlen_t	blen=0;
> -		xfs_agblock_t	bnew=0;
> +		xfs_extlen_t	diff;
>  
>  #ifdef DEBUG
>  		if (dofirst)
> @@ -1281,8 +1288,8 @@ xfs_alloc_ag_vextent_near(
>  				break;
>  		}
>  		i = acur.cnt->bc_ptrs[0];
> -		for (j = 1, blen = 0, bdiff = 0;
> -		     !error && j && (blen < args->maxlen || bdiff > 0);
> +		for (j = 1;
> +		     !error && j && (acur.len < args->maxlen || acur.diff > 0);
>  		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
>  			/*
>  			 * For each entry, decide if it's better than
> @@ -1301,44 +1308,40 @@ xfs_alloc_ag_vextent_near(
>  			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
>  			xfs_alloc_fix_len(args);
>  			ASSERT(args->len >= args->minlen);
> -			if (args->len < blen)
> +			if (args->len < acur.len)
>  				continue;
> -			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> +			diff = xfs_alloc_compute_diff(args->agbno, args->len,
>  				args->alignment, args->datatype, ltbnoa,
>  				ltlena, &ltnew);
>  			if (ltnew != NULLAGBLOCK &&
> -			    (args->len > blen || ltdiff < bdiff)) {
> -				bdiff = ltdiff;
> -				bnew = ltnew;
> -				blen = args->len;
> -				besti = acur.cnt->bc_ptrs[0];
> +			    (args->len > acur.len || diff < acur.diff)) {
> +				acur.rec_bno = ltbno;
> +				acur.rec_len = ltlen;
> +				acur.diff = diff;
> +				acur.bno = ltnew;
> +				acur.len = args->len;
>  			}
>  		}
>  		/*
>  		 * It didn't work.  We COULD be in a case where
>  		 * there's a good record somewhere, so try again.
>  		 */
> -		if (blen == 0)
> +		if (acur.len == 0)
>  			break;
> -		/*
> -		 * Point at the best entry, and retrieve it again.
> -		 */
> -		acur.cnt->bc_ptrs[0] = besti;
> -		error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
> -		if (error)
> -			goto out;
> -		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> -		args->len = blen;
>  
>  		/*
> -		 * We are allocating starting at bnew for blen blocks.
> +		 * Allocate at the bno/len tracked in the cursor.
>  		 */
> -		args->agbno = bnew;
> -		ASSERT(bnew >= ltbno);
> -		ASSERT(bnew + blen <= ltbno + ltlen);
> -		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, ltbno,
> -					ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
> +		args->agbno = acur.bno;
> +		args->len = acur.len;
> +		ASSERT(acur.bno >= acur.rec_bno);
> +		ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> +		ASSERT(acur.rec_bno + acur.rec_len <=
> +		       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> +
> +		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt,
> +				acur.rec_bno, acur.rec_len, acur.bno, acur.len,
> +				0);
>  		if (error)
>  			goto out;
>  		trace_xfs_alloc_near_first(args);
> -- 
> 2.20.1
> 


* Re: [PATCH v4 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper
  2019-09-16 12:16 ` [PATCH v4 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper Brian Foster
@ 2019-09-18 19:03   ` Darrick J. Wong
  2019-09-19 15:01     ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 19:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:29AM -0400, Brian Foster wrote:
> The cntbt lastblock scan checks the size, alignment, locality, etc.
> of each free extent in the block and compares it with the current
> best candidate. This logic will be reused by the upcoming optimized
> cntbt algorithm, so refactor it into a separate helper.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 113 +++++++++++++++++++++++++++++---------
>  fs/xfs/xfs_trace.h        |  25 +++++++++
>  2 files changed, 111 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index ee46989ab723..2fa7bb6a00a8 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -791,6 +791,89 @@ xfs_alloc_cur_close(
>  	acur->cnt = acur->bnolt = acur->bnogt = NULL;
>  }
>  
> +/*
> + * Check an extent for allocation and track the best available candidate in the
> + * allocation structure. The cursor is deactivated if it has entered an out of
> + * range state based on allocation arguments. Optionally return the extent
> + * geometry and allocation status if requested by the caller.
> + */
> +static int
> +xfs_alloc_cur_check(
> +	struct xfs_alloc_arg		*args,
> +	struct xfs_alloc_cur		*acur,
> +	struct xfs_btree_cur		*cur,
> +	int				*new)
> +{
> +	int			error, i;

Inconsistent indentation here.

> +	xfs_agblock_t		bno, bnoa, bnew;
> +	xfs_extlen_t		len, lena, diff = -1;
> +	bool			busy;
> +	unsigned		busy_gen = 0;
> +	bool			deactivate = false;
> +
> +	*new = 0;
> +
> +	error = xfs_alloc_get_rec(cur, &bno, &len, &i);
> +	if (error)
> +		return error;
> +	XFS_WANT_CORRUPTED_RETURN(args->mp, i == 1);
> +
> +	/*
> +	 * Check minlen and deactivate a cntbt cursor if out of acceptable size
> +	 * range (i.e., walking backwards looking for a minlen extent).
> +	 */
> +	if (len < args->minlen) {
> +		deactivate = true;
> +		goto out;
> +	}
> +
> +	busy = xfs_alloc_compute_aligned(args, bno, len, &bnoa, &lena,
> +					 &busy_gen);
> +	acur->busy |= busy;
> +	if (busy)
> +		acur->busy_gen = busy_gen;
> +	/* deactivate a bnobt cursor outside of locality range */
> +	if (bnoa < args->min_agbno || bnoa > args->max_agbno)
> +		goto out;
> +	if (lena < args->minlen)
> +		goto out;
> +
> +	args->len = XFS_EXTLEN_MIN(lena, args->maxlen);
> +	xfs_alloc_fix_len(args);
> +	ASSERT(args->len >= args->minlen);
> +	if (args->len < acur->len)
> +		goto out;
> +
> +	/*
> +	 * We have an aligned record that satisfies minlen and beats or matches
> +	 * the candidate extent size. Compare locality for near allocation mode.
> +	 */
> +	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
> +	diff = xfs_alloc_compute_diff(args->agbno, args->len,
> +				      args->alignment, args->datatype,
> +				      bnoa, lena, &bnew);
> +	if (bnew == NULLAGBLOCK)
> +		goto out;
> +	if (diff > acur->diff)
> +		goto out;
> +
> +	ASSERT(args->len > acur->len ||
> +	       (args->len == acur->len && diff <= acur->diff));
> +	acur->rec_bno = bno;
> +	acur->rec_len = len;
> +	acur->bno = bnew;
> +	acur->len = args->len;
> +	acur->diff = diff;
> +	*new = 1;
> +
> +out:
> +	if (deactivate)
> +		cur->bc_private.a.priv.abt.active = false;
> +	trace_xfs_alloc_cur_check(args->mp, cur->bc_btnum, bno, len, diff,
> +				  *new);
> +	return 0;
> +}
> +
>  /*
>   * Deal with the case where only small freespaces remain. Either return the
>   * contents of the last freespace record, or allocate space from the freelist if
> @@ -1257,8 +1340,6 @@ xfs_alloc_ag_vextent_near(
>  	 * but we never loop back to the top.
>  	 */
>  	while (xfs_btree_islastblock(acur.cnt, 0)) {
> -		xfs_extlen_t	diff;
> -
>  #ifdef DEBUG
>  		if (dofirst)
>  			break;
> @@ -1289,38 +1370,16 @@ xfs_alloc_ag_vextent_near(
>  		}
>  		i = acur.cnt->bc_ptrs[0];
>  		for (j = 1;
> -		     !error && j && (acur.len < args->maxlen || acur.diff > 0);
> +		     !error && j && xfs_alloc_cur_active(acur.cnt) &&
> +		     (acur.len < args->maxlen || acur.diff > 0);
>  		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
>  			/*
>  			 * For each entry, decide if it's better than
>  			 * the previous best entry.
>  			 */
> -			error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
> +			error = xfs_alloc_cur_check(args, &acur, acur.cnt, &i);
>  			if (error)
>  				goto out;
> -			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -			acur.busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
> -					&ltbnoa, &ltlena, &acur.busy_gen);
> -			if (ltlena < args->minlen)
> -				continue;
> -			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
> -				continue;
> -			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> -			xfs_alloc_fix_len(args);
> -			ASSERT(args->len >= args->minlen);
> -			if (args->len < acur.len)
> -				continue;
> -			diff = xfs_alloc_compute_diff(args->agbno, args->len,
> -				args->alignment, args->datatype, ltbnoa,
> -				ltlena, &ltnew);
> -			if (ltnew != NULLAGBLOCK &&
> -			    (args->len > acur.len || diff < acur.diff)) {
> -				acur.rec_bno = ltbno;
> -				acur.rec_len = ltlen;
> -				acur.diff = diff;
> -				acur.bno = ltnew;
> -				acur.len = args->len;
> -			}
>  		}
>  		/*
>  		 * It didn't work.  We COULD be in a case where
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index eaae275ed430..b12fad3e45cb 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1663,6 +1663,31 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
>  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
>  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
>  
> +TRACE_EVENT(xfs_alloc_cur_check,
> +	TP_PROTO(struct xfs_mount *mp, xfs_btnum_t btnum, xfs_agblock_t bno,
> +		 xfs_extlen_t len, xfs_extlen_t diff, bool new),
> +	TP_ARGS(mp, btnum, bno, len, diff, new),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_btnum_t, btnum)
> +		__field(xfs_agblock_t, bno)
> +		__field(xfs_extlen_t, len)
> +		__field(xfs_extlen_t, diff)
> +		__field(bool, new)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = mp->m_super->s_dev;
> +		__entry->btnum = btnum;
> +		__entry->bno = bno;
> +		__entry->len = len;
> +		__entry->diff = diff;
> +		__entry->new = new;
> +	),
> +	TP_printk("dev %d:%d btnum %d bno 0x%x len 0x%x diff 0x%x new %d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->btnum,

Perhaps:

	__print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),

instead of dumping the raw btnum value?

Other than those two things, this looks like a pretty straightforward
hoisting.

--D

> +		  __entry->bno, __entry->len, __entry->diff, __entry->new)
> +)
> +
>  DECLARE_EVENT_CLASS(xfs_da_class,
>  	TP_PROTO(struct xfs_da_args *args),
>  	TP_ARGS(args),
> -- 
> 2.20.1
> 


* Re: [PATCH v4 06/11] xfs: reuse best extent tracking logic for bnobt scan
  2019-09-16 12:16 ` [PATCH v4 06/11] xfs: reuse best extent tracking logic for bnobt scan Brian Foster
@ 2019-09-18 20:43   ` Darrick J. Wong
  2019-09-19 15:04     ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 20:43 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:30AM -0400, Brian Foster wrote:
> The near mode bnobt scan searches left and right in the bnobt
> looking for the closest free extent to the allocation hint that
> satisfies minlen. Once such an extent is found, the left/right
> search terminates, we search one more time in the opposite direction
> and finish the allocation with the best overall extent.
> 
> The left/right and find best searches are currently controlled via a
> combination of cursor state and local variables. Clean up this code
> and prepare for further improvements to the near mode fallback
> algorithm by reusing the allocation cursor best extent tracking
> mechanism. Update the tracking logic to deactivate bnobt cursors
> when out of allocation range and replace open-coded extent checks with
> calls to the common helper. In doing so, rename some misnamed local
> variables in the top-level near mode allocation function.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 270 +++++++++++---------------------------
>  fs/xfs/xfs_trace.h        |   4 +-
>  2 files changed, 76 insertions(+), 198 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 2fa7bb6a00a8..edcec975c306 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -810,6 +810,7 @@ xfs_alloc_cur_check(
>  	bool			busy;
>  	unsigned		busy_gen = 0;
>  	bool			deactivate = false;
> +	bool			isbnobt = cur->bc_btnum == XFS_BTNUM_BNO;
>  
>  	*new = 0;
>  
> @@ -823,7 +824,7 @@ xfs_alloc_cur_check(
>  	 * range (i.e., walking backwards looking for a minlen extent).
>  	 */
>  	if (len < args->minlen) {
> -		deactivate = true;
> +		deactivate = !isbnobt;
>  		goto out;
>  	}
>  
> @@ -833,8 +834,10 @@ xfs_alloc_cur_check(
>  	if (busy)
>  		acur->busy_gen = busy_gen;
>  	/* deactivate a bnobt cursor outside of locality range */
> -	if (bnoa < args->min_agbno || bnoa > args->max_agbno)
> +	if (bnoa < args->min_agbno || bnoa > args->max_agbno) {
> +		deactivate = isbnobt;
>  		goto out;
> +	}
>  	if (lena < args->minlen)
>  		goto out;
>  
> @@ -854,8 +857,14 @@ xfs_alloc_cur_check(
>  				      bnoa, lena, &bnew);
>  	if (bnew == NULLAGBLOCK)
>  		goto out;
> -	if (diff > acur->diff)
> +
> +	/*
> +	 * Deactivate a bnobt cursor with worse locality than the current best.
> +	 */
> +	if (diff > acur->diff) {
> +		deactivate = isbnobt;
>  		goto out;
> +	}
>  
>  	ASSERT(args->len > acur->len ||
>  	       (args->len == acur->len && diff <= acur->diff));
> @@ -1163,96 +1172,44 @@ xfs_alloc_ag_vextent_exact(
>  }
>  
>  /*
> - * Search the btree in a given direction via the search cursor and compare
> - * the records found against the good extent we've already found.
> + * Search the btree in a given direction and check the records against the good
> + * extent we've already found.
>   */
>  STATIC int
>  xfs_alloc_find_best_extent(
> -	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> -	struct xfs_btree_cur	*gcur,	/* good cursor */
> -	struct xfs_btree_cur	*scur,	/* searching cursor */
> -	xfs_agblock_t		gdiff,	/* difference for search comparison */
> -	xfs_agblock_t		*sbno,	/* extent found by search */
> -	xfs_extlen_t		*slen,	/* extent length */
> -	xfs_agblock_t		*sbnoa,	/* aligned extent found by search */
> -	xfs_extlen_t		*slena,	/* aligned extent length */
> -	int			dir)	/* 0 = search right, 1 = search left */
> +	struct xfs_alloc_arg	*args,
> +	struct xfs_alloc_cur	*acur,
> +	struct xfs_btree_cur	*cur,
> +	bool			increment)
>  {
> -	xfs_agblock_t		new;
> -	xfs_agblock_t		sdiff;
>  	int			error;
>  	int			i;
> -	unsigned		busy_gen;
> -
> -	/* The good extent is perfect, no need to  search. */
> -	if (!gdiff)
> -		goto out_use_good;
>  
>  	/*
> -	 * Look until we find a better one, run out of space or run off the end.
> +	 * Search so long as the cursor is active or we find a better extent.
> +	 * The cursor is deactivated if it extends beyond the range of the
> +	 * current allocation candidate.
>  	 */
> -	do {
> -		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
> +	while (xfs_alloc_cur_active(cur)) {
> +		error = xfs_alloc_cur_check(args, acur, cur, &i);

Does @i > 0 now mean "we found a better candidate for allocation so
we're done" ?

>  		if (error)
> -			goto error0;
> -		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> -		xfs_alloc_compute_aligned(args, *sbno, *slen,
> -				sbnoa, slena, &busy_gen);
> -
> -		/*
> -		 * The good extent is closer than this one.
> -		 */
> -		if (!dir) {
> -			if (*sbnoa > args->max_agbno)
> -				goto out_use_good;
> -			if (*sbnoa >= args->agbno + gdiff)
> -				goto out_use_good;
> -		} else {
> -			if (*sbnoa < args->min_agbno)
> -				goto out_use_good;
> -			if (*sbnoa <= args->agbno - gdiff)
> -				goto out_use_good;
> -		}
> -
> -		/*
> -		 * Same distance, compare length and pick the best.
> -		 */
> -		if (*slena >= args->minlen) {
> -			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
> -			xfs_alloc_fix_len(args);
> -
> -			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -						       args->alignment,
> -						       args->datatype, *sbnoa,
> -						       *slena, &new);
> -
> -			/*
> -			 * Choose closer size and invalidate other cursor.
> -			 */
> -			if (sdiff < gdiff)
> -				goto out_use_search;
> -			goto out_use_good;
> -		}
> +			return error;
> +		if (i == 1)
> +			break;
> +		if (!xfs_alloc_cur_active(cur))
> +			break;
>  
> -		if (!dir)
> -			error = xfs_btree_increment(scur, 0, &i);
> +		if (increment)
> +			error = xfs_btree_increment(cur, 0, &i);
>  		else
> -			error = xfs_btree_decrement(scur, 0, &i);
> +			error = xfs_btree_decrement(cur, 0, &i);
>  		if (error)
> -			goto error0;
> -	} while (i);
> -
> -out_use_good:
> -	scur->bc_private.a.priv.abt.active = false;
> -	return 0;
> +			return error;
> +		if (i == 0)
> +			cur->bc_private.a.priv.abt.active = false;

...and I guess @i == 0 here means "@cur hit the end of the records so
deactivate it"?

> +	}
>  
> -out_use_search:
> -	gcur->bc_private.a.priv.abt.active = false;
>  	return 0;
> -
> -error0:
> -	/* caller invalidates cursors */
> -	return error;
>  }
>  
>  /*
> @@ -1266,23 +1223,13 @@ xfs_alloc_ag_vextent_near(
>  	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_alloc_cur	acur = {0,};
> -	struct xfs_btree_cur	*bno_cur;
> -	xfs_agblock_t	gtbno;		/* start bno of right side entry */
> -	xfs_agblock_t	gtbnoa;		/* aligned ... */
> -	xfs_extlen_t	gtdiff;		/* difference to right side entry */
> -	xfs_extlen_t	gtlen;		/* length of right side entry */
> -	xfs_extlen_t	gtlena;		/* aligned ... */
> -	xfs_agblock_t	gtnew;		/* useful start bno of right side */
> +	struct xfs_btree_cur	*fbcur = NULL;
>  	int		error;		/* error code */
>  	int		i;		/* result code, temporary */
>  	int		j;		/* result code, temporary */
> -	xfs_agblock_t	ltbno;		/* start bno of left side entry */
> -	xfs_agblock_t	ltbnoa;		/* aligned ... */
> -	xfs_extlen_t	ltdiff;		/* difference to left side entry */
> -	xfs_extlen_t	ltlen;		/* length of left side entry */
> -	xfs_extlen_t	ltlena;		/* aligned ... */
> -	xfs_agblock_t	ltnew;		/* useful start bno of left side */
> -	xfs_extlen_t	rlen;		/* length of returned extent */
> +	xfs_agblock_t	bno;

Is the indenting here inconsistent with the *fbcur declaration above?

> +	xfs_extlen_t	len;
> +	bool		fbinc = false;
>  #ifdef DEBUG
>  	/*
>  	 * Randomly don't execute the first algorithm.
> @@ -1304,9 +1251,7 @@ xfs_alloc_ag_vextent_near(
>  		args->agbno = args->max_agbno;
>  
>  restart:
> -	ltlen = 0;
> -	gtlena = 0;
> -	ltlena = 0;
> +	len = 0;
>  
>  	/*
>  	 * Set up cursors and see if there are any free extents as big as
> @@ -1315,11 +1260,11 @@ xfs_alloc_ag_vextent_near(
>  	 */
>  	error = xfs_alloc_cur_setup(args, &acur);
>  	if (error == -ENOSPC) {
> -		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &ltbno,
> -				&ltlen, &i);
> +		error = xfs_alloc_ag_vextent_small(args, acur.cnt, &bno,
> +				&len, &i);
>  		if (error)
>  			goto out;
> -		if (i == 0 || ltlen == 0) {
> +		if (i == 0 || len == 0) {
>  			trace_xfs_alloc_near_noentry(args);
>  			goto out;
>  		}
> @@ -1350,21 +1295,21 @@ xfs_alloc_ag_vextent_near(
>  		 * record smaller than maxlen, go to the start of this block,
>  		 * and skip all those smaller than minlen.
>  		 */
> -		if (ltlen || args->alignment > 1) {
> +		if (len || args->alignment > 1) {
>  			acur.cnt->bc_ptrs[0] = 1;
>  			do {
> -				error = xfs_alloc_get_rec(acur.cnt, &ltbno,
> -						&ltlen, &i);
> +				error = xfs_alloc_get_rec(acur.cnt, &bno, &len,
> +						&i);
>  				if (error)
>  					goto out;
>  				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -				if (ltlen >= args->minlen)
> +				if (len >= args->minlen)
>  					break;
>  				error = xfs_btree_increment(acur.cnt, 0, &i);
>  				if (error)
>  					goto out;
>  			} while (i);
> -			ASSERT(ltlen >= args->minlen);
> +			ASSERT(len >= args->minlen);
>  			if (!i)
>  				break;
>  		}
> @@ -1433,77 +1378,43 @@ xfs_alloc_ag_vextent_near(
>  	 */
>  	do {
>  		if (xfs_alloc_cur_active(acur.bnolt)) {
> -			error = xfs_alloc_get_rec(acur.bnolt, &ltbno, &ltlen, &i);
> +			error = xfs_alloc_cur_check(args, &acur, acur.bnolt, &i);
>  			if (error)
>  				goto out;
> -			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -			acur.busy |= xfs_alloc_compute_aligned(args, ltbno,
> -					ltlen, &ltbnoa, &ltlena, &acur.busy_gen);
> -			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
> +			if (i == 1) {
> +				trace_xfs_alloc_cur_left(args);
> +				fbcur = acur.bnogt;
> +				fbinc = true;
>  				break;
> +			}
>  			error = xfs_btree_decrement(acur.bnolt, 0, &i);
>  			if (error)
>  				goto out;
> -			if (!i || ltbnoa < args->min_agbno)
> +			if (!i)
>  				acur.bnolt->bc_private.a.priv.abt.active = false;
>  		}
>  		if (xfs_alloc_cur_active(acur.bnogt)) {
> -			error = xfs_alloc_get_rec(acur.bnogt, &gtbno, &gtlen, &i);
> +			error = xfs_alloc_cur_check(args, &acur, acur.bnogt, &i);
>  			if (error)
>  				goto out;
> -			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> -			acur.busy |= xfs_alloc_compute_aligned(args, gtbno,
> -					gtlen, &gtbnoa, &gtlena, &acur.busy_gen);
> -			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
> +			if (i == 1) {
> +				trace_xfs_alloc_cur_right(args);
> +				fbcur = acur.bnolt;
> +				fbinc = false;
>  				break;
> +			}
>  			error = xfs_btree_increment(acur.bnogt, 0, &i);
>  			if (error)
>  				goto out;
> -			if (!i || gtbnoa > args->max_agbno)
> +			if (!i)
>  				acur.bnogt->bc_private.a.priv.abt.active = false;
>  		}
>  	} while (xfs_alloc_cur_active(acur.bnolt) ||
>  		 xfs_alloc_cur_active(acur.bnogt));
>  
> -	/*
> -	 * Got both cursors still active, need to find better entry.
> -	 */
> -	if (xfs_alloc_cur_active(acur.bnolt) &&
> -	    xfs_alloc_cur_active(acur.bnogt)) {
> -		if (ltlena >= args->minlen) {
> -			/*
> -			 * Left side is good, look for a right side entry.
> -			 */
> -			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> -			xfs_alloc_fix_len(args);
> -			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -				args->alignment, args->datatype, ltbnoa,
> -				ltlena, &ltnew);
> -
> -			error = xfs_alloc_find_best_extent(args,
> -						acur.bnolt, acur.bnogt,
> -						ltdiff, &gtbno, &gtlen,
> -						&gtbnoa, &gtlena,
> -						0 /* search right */);
> -		} else {
> -			ASSERT(gtlena >= args->minlen);
> -
> -			/*
> -			 * Right side is good, look for a left side entry.
> -			 */
> -			args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
> -			xfs_alloc_fix_len(args);
> -			gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> -				args->alignment, args->datatype, gtbnoa,
> -				gtlena, &gtnew);
> -
> -			error = xfs_alloc_find_best_extent(args,
> -						acur.bnogt, acur.bnolt,
> -						gtdiff, &ltbno, &ltlen,
> -						&ltbnoa, &ltlena,
> -						1 /* search left */);
> -		}
> -
> +	/* search the opposite direction for a better entry */
> +	if (fbcur) {
> +		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);

Heh, nice clean up. :)

>  		if (error)
>  			goto out;
>  	}
> @@ -1511,8 +1422,7 @@ xfs_alloc_ag_vextent_near(
>  	/*
>  	 * If we couldn't get anything, give up.
>  	 */
> -	if (!xfs_alloc_cur_active(acur.bnolt) &&
> -	    !xfs_alloc_cur_active(acur.bnogt)) {
> +	if (!acur.len) {
>  		if (acur.busy) {
>  			trace_xfs_alloc_near_busy(args);
>  			xfs_extent_busy_flush(args->mp, args->pag,
> @@ -1524,47 +1434,15 @@ xfs_alloc_ag_vextent_near(
>  		goto out;
>  	}
>  
> -	/*
> -	 * At this point we have selected a freespace entry, either to the
> -	 * left or to the right.  If it's on the right, copy all the
> -	 * useful variables to the "left" set so we only have one
> -	 * copy of this code.
> -	 */
> -	if (xfs_alloc_cur_active(acur.bnogt)) {
> -		bno_cur = acur.bnogt;
> -		ltbno = gtbno;
> -		ltbnoa = gtbnoa;
> -		ltlen = gtlen;
> -		ltlena = gtlena;
> -		j = 1;
> -	} else {
> -		bno_cur = acur.bnolt;
> -		j = 0;
> -	}
> -
> -	/*
> -	 * Fix up the length and compute the useful address.
> -	 */
> -	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> -	xfs_alloc_fix_len(args);
> -	rlen = args->len;
> -	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
> -				     args->datatype, ltbnoa, ltlena, &ltnew);

Hmm.  So I /think/ the reason this can go away is that
xfs_alloc_cur_check already did all this trimming and whatnot and
stuffed the values into the xfs_alloc_cur structure, so we can just copy
the allocated extent to the caller's @args, update the bnobt/cntbt to
reflect the allocation, and exit?

I think this looks ok, but woof a lot to digest. :/

--D

> -	ASSERT(ltnew >= ltbno);
> -	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
> -	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> -	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
> -	args->agbno = ltnew;
> -
> -	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
> -				      rlen, XFSA_FIXUP_BNO_OK);
> -	if (error)
> -		goto out;
> +	args->agbno = acur.bno;
> +	args->len = acur.len;
> +	ASSERT(acur.bno >= acur.rec_bno);
> +	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> +	ASSERT(acur.rec_bno + acur.rec_len <=
> +	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
>  
> -	if (j)
> -		trace_xfs_alloc_near_greater(args);
> -	else
> -		trace_xfs_alloc_near_lesser(args);
> +	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
> +				      acur.rec_len, acur.bno, acur.len, 0);
>  
>  out:
>  	xfs_alloc_cur_close(&acur, error);
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index b12fad3e45cb..2e57dc3d4230 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1642,8 +1642,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
>  DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
> -DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
> -DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
> +DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
> +DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread
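To make the i/active convention discussed in the message above concrete, here is a simplified, self-contained sketch in plain C. The array-backed `struct cur`, `cur_step()`, `scan_near()` and `demo_scan()` are hypothetical stand-ins for the btree cursor machinery, not the kernel code:

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for an allocation btree cursor: an index into
 * a sorted array of free extent records, plus an "active" flag that
 * mirrors bc_private.a.priv.abt.active in the patch.
 */
struct cur {
	const unsigned	*recs;	/* extent start blocks, ascending */
	int		nrecs;
	int		pos;	/* current record index */
	bool		active;
};

static bool cur_active(const struct cur *c)
{
	return c->active;
}

/*
 * Step one record left or right; deactivate the cursor when it runs
 * off the edge of the "tree" (the i == 0 case discussed above).
 */
static void cur_step(struct cur *c, bool increment)
{
	int next = c->pos + (increment ? 1 : -1);

	if (next < 0 || next >= c->nrecs)
		c->active = false;
	else
		c->pos = next;
}

/*
 * Simplified bidirectional scan: alternate left and right steps from a
 * locality hint until one side finds a record of at least @minlen (the
 * i == 1 "found a candidate, we're done" case) or both cursors
 * deactivate. Returns the chosen start block, or 0 if nothing fits.
 */
static unsigned scan_near(struct cur *lt, struct cur *gt,
			  const unsigned *lens, unsigned minlen)
{
	while (cur_active(lt) || cur_active(gt)) {
		if (cur_active(lt)) {
			if (lens[lt->pos] >= minlen)
				return lt->recs[lt->pos];
			cur_step(lt, false);
		}
		if (cur_active(gt)) {
			if (lens[gt->pos] >= minlen)
				return gt->recs[gt->pos];
			cur_step(gt, true);
		}
	}
	return 0;
}

/* Run the scan over a fixed sample: five records, hint between 30/40. */
static unsigned demo_scan(unsigned minlen)
{
	static const unsigned recs[] = { 10, 20, 30, 40, 50 };
	static const unsigned lens[] = {  1,  2,  8,  1,  9 };
	struct cur lt = { recs, 5, 2, true };
	struct cur gt = { recs, 5, 3, true };

	return scan_near(&lt, &gt, lens, minlen);
}
```

With this model, a minlen of 8 hits immediately on the left cursor, a minlen of 9 walks both sides until the right cursor reaches the last record, and an unsatisfiable minlen deactivates both cursors and falls out of the loop.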

* Re: [PATCH v4 07/11] xfs: refactor allocation tree fixup code
  2019-09-16 12:16 ` [PATCH v4 07/11] xfs: refactor allocation tree fixup code Brian Foster
@ 2019-09-18 20:44   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 20:44 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:31AM -0400, Brian Foster wrote:
> Both algorithms duplicate the same btree allocation code. Eliminate
> the duplication and reuse the fallback algorithm codepath.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 18 ++----------------
>  1 file changed, 2 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index edcec975c306..3eacc799c4cb 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1333,23 +1333,8 @@ xfs_alloc_ag_vextent_near(
>  		if (acur.len == 0)
>  			break;
>  
> -		/*
> -		 * Allocate at the bno/len tracked in the cursor.
> -		 */
> -		args->agbno = acur.bno;
> -		args->len = acur.len;
> -		ASSERT(acur.bno >= acur.rec_bno);
> -		ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> -		ASSERT(acur.rec_bno + acur.rec_len <=
> -		       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> -
> -		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt,
> -				acur.rec_bno, acur.rec_len, acur.bno, acur.len,
> -				0);
> -		if (error)
> -			goto out;
>  		trace_xfs_alloc_near_first(args);
> -		goto out;
> +		goto alloc;
>  	}
>  
>  	/*
> @@ -1434,6 +1419,7 @@ xfs_alloc_ag_vextent_near(
>  		goto out;
>  	}
>  
> +alloc:
>  	args->agbno = acur.bno;
>  	args->len = acur.len;
>  	ASSERT(acur.bno >= acur.rec_bno);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 09/11] xfs: refactor near mode alloc bnobt scan into separate function
  2019-09-16 12:16 ` [PATCH v4 09/11] xfs: refactor near mode alloc bnobt scan into separate function Brian Foster
@ 2019-09-18 20:55   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 20:55 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:33AM -0400, Brian Foster wrote:
> In preparation to enhance the near mode allocation bnobt scan algorithm, lift
> it into a separate function. No functional changes.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 128 ++++++++++++++++++++++----------------
>  1 file changed, 74 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index ab494fd50dd7..6f7e4666250c 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1223,6 +1223,78 @@ xfs_alloc_walk_iter(
>  	return 0;
>  }
>  
> +/*
> + * Search in the by-bno btree to the left and to the right simultaneously until
> + * in each case we find a large enough free extent or run into the edge of the
> + * tree. When we run into the edge of the tree, we deactivate that cursor.
> + */
> +STATIC int
> +xfs_alloc_ag_vextent_bnobt(
> +	struct xfs_alloc_arg	*args,
> +	struct xfs_alloc_cur	*acur,
> +	int			*stat)
> +{
> +	struct xfs_btree_cur	*fbcur = NULL;
> +	int			error;
> +	int			i;
> +	bool			fbinc;
> +
> +	ASSERT(acur->len == 0);
> +	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
> +
> +	*stat = 0;
> +
> +	error = xfs_alloc_lookup_le(acur->bnolt, args->agbno, 0, &i);
> +	if (error)
> +		return error;
> +	error = xfs_alloc_lookup_ge(acur->bnogt, args->agbno, 0, &i);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Loop going left with the leftward cursor, right with the rightward
> +	 * cursor, until either both directions give up or we find an entry at
> +	 * least as big as minlen.
> +	 */
> +	while (xfs_alloc_cur_active(acur->bnolt) ||
> +	       xfs_alloc_cur_active(acur->bnogt)) {
> +		error = xfs_alloc_walk_iter(args, acur, acur->bnolt, false,
> +					    true, 1, &i);
> +		if (error)
> +			return error;
> +		if (i == 1) {
> +			trace_xfs_alloc_cur_left(args);
> +			fbcur = acur->bnogt;
> +			fbinc = true;
> +			break;
> +		}
> +
> +		error = xfs_alloc_walk_iter(args, acur, acur->bnogt, true, true,
> +					    1, &i);
> +		if (error)
> +			return error;
> +		if (i == 1) {
> +			trace_xfs_alloc_cur_right(args);
> +			fbcur = acur->bnolt;
> +			fbinc = false;
> +			break;
> +		}
> +	}
> +
> +	/* search the opposite direction for a better entry */
> +	if (fbcur) {
> +		error = xfs_alloc_walk_iter(args, acur, fbcur, fbinc, true, -1,
> +					    &i);
> +		if (error)
> +			return error;
> +	}
> +
> +	if (acur->len)
> +		*stat = 1;
> +
> +	return 0;
> +}
> +
>  /*
>   * Allocate a variable extent near bno in the allocation group agno.
>   * Extent's length (returned in len) will be between minlen and maxlen,
> @@ -1234,12 +1306,10 @@ xfs_alloc_ag_vextent_near(
>  	struct xfs_alloc_arg	*args)
>  {
>  	struct xfs_alloc_cur	acur = {0,};
> -	struct xfs_btree_cur	*fbcur = NULL;
>  	int		error;		/* error code */
>  	int		i;		/* result code, temporary */
>  	xfs_agblock_t	bno;
>  	xfs_extlen_t	len;
> -	bool		fbinc = false;
>  #ifdef DEBUG
>  	/*
>  	 * Randomly don't execute the first algorithm.
> @@ -1341,62 +1411,12 @@ xfs_alloc_ag_vextent_near(
>  	}
>  
>  	/*
> -	 * Second algorithm.
> -	 * Search in the by-bno tree to the left and to the right
> -	 * simultaneously, until in each case we find a space big enough,
> -	 * or run into the edge of the tree.  When we run into the edge,
> -	 * we deallocate that cursor.
> -	 * If both searches succeed, we compare the two spaces and pick
> -	 * the better one.
> -	 * With alignment, it's possible for both to fail; the upper
> -	 * level algorithm that picks allocation groups for allocations
> -	 * is not supposed to do this.
> +	 * Second algorithm. Search the bnobt left and right.
>  	 */
> -	error = xfs_alloc_lookup_le(acur.bnolt, args->agbno, 0, &i);
> -	if (error)
> -		goto out;
> -	error = xfs_alloc_lookup_ge(acur.bnogt, args->agbno, 0, &i);
> +	error = xfs_alloc_ag_vextent_bnobt(args, &acur, &i);
>  	if (error)
>  		goto out;
>  
> -	/*
> -	 * Loop going left with the leftward cursor, right with the rightward
> -	 * cursor, until either both directions give up or we find an entry at
> -	 * least as big as minlen.
> -	 */
> -	do {
> -		error = xfs_alloc_walk_iter(args, &acur, acur.bnolt, false,
> -					    true, 1, &i);
> -		if (error)
> -			goto out;
> -		if (i == 1) {
> -			trace_xfs_alloc_cur_left(args);
> -			fbcur = acur.bnogt;
> -			fbinc = true;
> -			break;
> -		}
> -
> -		error = xfs_alloc_walk_iter(args, &acur, acur.bnogt, true, true,
> -					    1, &i);
> -		if (error)
> -			goto out;
> -		if (i == 1) {
> -			trace_xfs_alloc_cur_right(args);
> -			fbcur = acur.bnolt;
> -			fbinc = false;
> -			break;
> -		}
> -	} while (xfs_alloc_cur_active(acur.bnolt) ||
> -		 xfs_alloc_cur_active(acur.bnogt));
> -
> -	/* search the opposite direction for a better entry */
> -	if (fbcur) {
> -		error = xfs_alloc_walk_iter(args, &acur, fbcur, fbinc, true, -1,
> -					    &i);
> -		if (error)
> -			goto out;
> -	}
> -
>  	/*
>  	 * If we couldn't get anything, give up.
>  	 */
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 10/11] xfs: factor out tree fixup logic into helper
  2019-09-16 12:16 ` [PATCH v4 10/11] xfs: factor out tree fixup logic into helper Brian Foster
@ 2019-09-18 20:56   ` Darrick J. Wong
  0 siblings, 0 replies; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 20:56 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:34AM -0400, Brian Foster wrote:
> Lift the btree fixup path into a helper function.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c | 42 +++++++++++++++++++++++++++++----------
>  fs/xfs/xfs_trace.h        |  1 +
>  2 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 6f7e4666250c..381a08257aaf 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -883,6 +883,36 @@ xfs_alloc_cur_check(
>  	return 0;
>  }
>  
> +/*
> + * Complete an allocation of a candidate extent. Remove the extent from both
> + * trees and update the args structure.
> + */
> +STATIC int
> +xfs_alloc_cur_finish(
> +	struct xfs_alloc_arg	*args,
> +	struct xfs_alloc_cur	*acur)
> +{
> +	int			error;
> +
> +	ASSERT(acur->cnt && acur->bnolt);
> +	ASSERT(acur->bno >= acur->rec_bno);
> +	ASSERT(acur->bno + acur->len <= acur->rec_bno + acur->rec_len);
> +	ASSERT(acur->rec_bno + acur->rec_len <=
> +	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> +
> +	error = xfs_alloc_fixup_trees(acur->cnt, acur->bnolt, acur->rec_bno,
> +				      acur->rec_len, acur->bno, acur->len, 0);
> +	if (error)
> +		return error;
> +
> +	args->agbno = acur->bno;
> +	args->len = acur->len;
> +	args->wasfromfl = 0;
> +
> +	trace_xfs_alloc_cur(args);
> +	return 0;
> +}
> +
>  /*
>   * Deal with the case where only small freespaces remain. Either return the
>   * contents of the last freespace record, or allocate space from the freelist if
> @@ -1352,7 +1382,6 @@ xfs_alloc_ag_vextent_near(
>  	} else if (error) {
>  		goto out;
>  	}
> -	args->wasfromfl = 0;
>  
>  	/*
>  	 * First algorithm.
> @@ -1433,15 +1462,8 @@ xfs_alloc_ag_vextent_near(
>  	}
>  
>  alloc:
> -	args->agbno = acur.bno;
> -	args->len = acur.len;
> -	ASSERT(acur.bno >= acur.rec_bno);
> -	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> -	ASSERT(acur.rec_bno + acur.rec_len <=
> -	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> -
> -	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
> -				      acur.rec_len, acur.bno, acur.len, 0);
> +	/* fix up btrees on a successful allocation */
> +	error = xfs_alloc_cur_finish(args, &acur);
>  
>  out:
>  	xfs_alloc_cur_close(&acur, error);
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 2e57dc3d4230..b8b93068efe7 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1642,6 +1642,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
>  DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
> +DEFINE_ALLOC_EVENT(xfs_alloc_cur);
>  DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
>  DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 08/11] xfs: refactor and reuse best extent scanning logic
  2019-09-16 12:16 ` [PATCH v4 08/11] xfs: refactor and reuse best extent scanning logic Brian Foster
@ 2019-09-18 21:03   ` Darrick J. Wong
  2019-09-19 15:04     ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 21:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:32AM -0400, Brian Foster wrote:
> The bnobt "find best" helper implements a simple btree walker
> function. This general pattern, or a subset thereof, is reused in
> various parts of a near mode allocation operation. For example, the
> bnobt left/right scans are each iterative btree walks along with the
> cntbt lastblock scan.
> 
> Rework this function into a generic btree walker, add a couple
> parameters to control termination behavior from various contexts and
> reuse it where applicable.
> 
> XXX: This slightly changes the cntbt lastblock scan logic to scan
> the entire block and use the best candidate as opposed to the
> current logic of checking until we locate a perfect match. Fix up
> cur_check() to terminate on perfect match.

Did that fix up happen in a previous patch?

> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 103 ++++++++++++++++++--------------------
>  1 file changed, 48 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 3eacc799c4cb..ab494fd50dd7 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1172,30 +1172,38 @@ xfs_alloc_ag_vextent_exact(
>  }
>  
>  /*
> - * Search the btree in a given direction and check the records against the good
> - * extent we've already found.
> + * Search a given number of btree records in a given direction. Check each
> + * record against the good extent we've already found.
>   */
>  STATIC int
> -xfs_alloc_find_best_extent(
> +xfs_alloc_walk_iter(
>  	struct xfs_alloc_arg	*args,
>  	struct xfs_alloc_cur	*acur,
>  	struct xfs_btree_cur	*cur,
> -	bool			increment)
> +	bool			increment,
> +	bool			find_one,/* quit on first candidate */

Space between the comma and the comment?

--D

> +	int			count,	/* rec count (-1 for infinite) */
> +	int			*stat)
>  {
>  	int			error;
>  	int			i;
>  
> +	*stat = 0;
> +
>  	/*
>  	 * Search so long as the cursor is active or we find a better extent.
>  	 * The cursor is deactivated if it extends beyond the range of the
>  	 * current allocation candidate.
>  	 */
> -	while (xfs_alloc_cur_active(cur)) {
> +	while (xfs_alloc_cur_active(cur) && count) {
>  		error = xfs_alloc_cur_check(args, acur, cur, &i);
>  		if (error)
>  			return error;
> -		if (i == 1)
> -			break;
> +		if (i == 1) {
> +			*stat = 1;
> +			if (find_one)
> +				break;
> +		}
>  		if (!xfs_alloc_cur_active(cur))
>  			break;
>  
> @@ -1207,6 +1215,9 @@ xfs_alloc_find_best_extent(
>  			return error;
>  		if (i == 0)
>  			cur->bc_private.a.priv.abt.active = false;
> +
> +		if (count > 0)
> +			count--;
>  	}
>  
>  	return 0;
> @@ -1226,7 +1237,6 @@ xfs_alloc_ag_vextent_near(
>  	struct xfs_btree_cur	*fbcur = NULL;
>  	int		error;		/* error code */
>  	int		i;		/* result code, temporary */
> -	int		j;		/* result code, temporary */
>  	xfs_agblock_t	bno;
>  	xfs_extlen_t	len;
>  	bool		fbinc = false;
> @@ -1313,19 +1323,12 @@ xfs_alloc_ag_vextent_near(
>  			if (!i)
>  				break;
>  		}
> -		i = acur.cnt->bc_ptrs[0];
> -		for (j = 1;
> -		     !error && j && xfs_alloc_cur_active(acur.cnt) &&
> -		     (acur.len < args->maxlen || acur.diff > 0);
> -		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
> -			/*
> -			 * For each entry, decide if it's better than
> -			 * the previous best entry.
> -			 */
> -			error = xfs_alloc_cur_check(args, &acur, acur.cnt, &i);
> -			if (error)
> -				goto out;
> -		}
> +
> +		error = xfs_alloc_walk_iter(args, &acur, acur.cnt, true, false,
> +					    -1, &i);
> +		if (error)
> +			goto out;
> +
>  		/*
>  		 * It didn't work.  We COULD be in a case where
>  		 * there's a good record somewhere, so try again.
> @@ -1357,49 +1360,39 @@ xfs_alloc_ag_vextent_near(
>  		goto out;
>  
>  	/*
> -	 * Loop going left with the leftward cursor, right with the
> -	 * rightward cursor, until either both directions give up or
> -	 * we find an entry at least as big as minlen.
> +	 * Loop going left with the leftward cursor, right with the rightward
> +	 * cursor, until either both directions give up or we find an entry at
> +	 * least as big as minlen.
>  	 */
>  	do {
> -		if (xfs_alloc_cur_active(acur.bnolt)) {
> -			error = xfs_alloc_cur_check(args, &acur, acur.bnolt, &i);
> -			if (error)
> -				goto out;
> -			if (i == 1) {
> -				trace_xfs_alloc_cur_left(args);
> -				fbcur = acur.bnogt;
> -				fbinc = true;
> -				break;
> -			}
> -			error = xfs_btree_decrement(acur.bnolt, 0, &i);
> -			if (error)
> -				goto out;
> -			if (!i)
> -				acur.bnolt->bc_private.a.priv.abt.active = false;
> +		error = xfs_alloc_walk_iter(args, &acur, acur.bnolt, false,
> +					    true, 1, &i);
> +		if (error)
> +			goto out;
> +		if (i == 1) {
> +			trace_xfs_alloc_cur_left(args);
> +			fbcur = acur.bnogt;
> +			fbinc = true;
> +			break;
>  		}
> -		if (xfs_alloc_cur_active(acur.bnogt)) {
> -			error = xfs_alloc_cur_check(args, &acur, acur.bnogt, &i);
> -			if (error)
> -				goto out;
> -			if (i == 1) {
> -				trace_xfs_alloc_cur_right(args);
> -				fbcur = acur.bnolt;
> -				fbinc = false;
> -				break;
> -			}
> -			error = xfs_btree_increment(acur.bnogt, 0, &i);
> -			if (error)
> -				goto out;
> -			if (!i)
> -				acur.bnogt->bc_private.a.priv.abt.active = false;
> +
> +		error = xfs_alloc_walk_iter(args, &acur, acur.bnogt, true, true,
> +					    1, &i);
> +		if (error)
> +			goto out;
> +		if (i == 1) {
> +			trace_xfs_alloc_cur_right(args);
> +			fbcur = acur.bnolt;
> +			fbinc = false;
> +			break;
>  		}
>  	} while (xfs_alloc_cur_active(acur.bnolt) ||
>  		 xfs_alloc_cur_active(acur.bnogt));
>  
>  	/* search the opposite direction for a better entry */
>  	if (fbcur) {
> -		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);
> +		error = xfs_alloc_walk_iter(args, &acur, fbcur, fbinc, true, -1,
> +					    &i);
>  		if (error)
>  			goto out;
>  	}
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread
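The `find_one`/`count` termination controls that patch 08 above adds to the walker can be modeled outside the kernel with a small sketch. All names here (`struct wcur`, `walk_iter()`, `demo_walk()`) are illustrative stand-ins under the assumption that "check" means "record length satisfies minlen", not the actual kernel functions:

```c
#include <stdbool.h>

/* Model a cursor as an index into an array of extent lengths. */
struct wcur {
	const unsigned	*lens;
	int		nrecs;
	int		pos;
	bool		active;
};

/*
 * Sketch of the generic walker: visit up to @count records (-1 for no
 * limit) moving left or right, track the best (longest) candidate seen
 * in *best, and set *stat once a record satisfies @minlen. With
 * @find_one the walk stops at the first such record.
 */
static void walk_iter(struct wcur *c, bool increment, bool find_one,
		      int count, unsigned minlen, unsigned *best, int *stat)
{
	*stat = 0;
	while (c->active && count) {
		unsigned len = c->lens[c->pos];
		int next;

		if (len >= minlen) {
			if (len > *best)
				*best = len;
			*stat = 1;
			if (find_one)
				break;
		}

		/* step the cursor, deactivating at the edge */
		next = c->pos + (increment ? 1 : -1);
		if (next < 0 || next >= c->nrecs)
			c->active = false;
		else
			c->pos = next;

		if (count > 0)
			count--;
	}
}

/* Walk right over a fixed sample of lengths with minlen == 8. */
static unsigned demo_walk(bool find_one, int count)
{
	static const unsigned lens[] = { 4, 9, 2, 12, 7 };
	struct wcur c = { lens, 5, 0, true };
	unsigned best = 0;
	int stat;

	walk_iter(&c, true, find_one, count, 8, &best, &stat);
	return best;
}
```

In this model, `find_one` stops at the first satisfying record (length 9), an unbounded exhaustive walk finds the larger 12, and a count of 2 caps the walk before the larger record is ever visited.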

* Re: [PATCH v4 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups
  2019-09-16 12:16 ` [PATCH v4 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups Brian Foster
@ 2019-09-18 21:11   ` Darrick J. Wong
  2019-09-19 15:05     ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-09-18 21:11 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Sep 16, 2019 at 08:16:35AM -0400, Brian Foster wrote:
> The near mode fallback algorithm consists of a left/right scan of
> the bnobt. This algorithm has very poor breakdown characteristics
> under worst case free space fragmentation conditions. If a suitable
> extent is far enough from the locality hint, each allocation may
> scan most or all of the bnobt before it completes. This causes
> pathological behavior and extremely high allocation latencies.
> 
> While locality is important to near mode allocations, it is not so
> important as to incur pathological allocation latency to provide the
> absolute best available locality for every allocation. If the

> allocation is large enough or far enough away, there is a point of
> diminishing returns. As such, we can bound the overall operation by
> including an iterative cntbt lookup in the broader search. The cntbt
> lookup is optimized to immediately find the extent with best
> locality for the given size on each iteration. Since the cntbt is
> indexed by extent size, the lookup repeats with a variably
> aggressive increasing search key size until it runs off the edge of
> the tree.
> 
> This approach provides a natural balance between the two algorithms
> for various situations. For example, the bnobt scan is able to
> satisfy smaller allocations such as for inode chunks or btree blocks
> more quickly where the cntbt search may have to search through a
> large set of extent sizes when the search key starts off small
> relative to the largest extent in the tree. On the other hand, the
> cntbt search more deterministically covers the set of suitable
> extents for larger data extent allocation requests that the bnobt
> scan may have to search the entire tree to locate.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 153 +++++++++++++++++++++++++++++++++++---
>  fs/xfs/xfs_trace.h        |   2 +
>  2 files changed, 143 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 381a08257aaf..4ec22040e516 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -716,6 +716,7 @@ struct xfs_alloc_cur {
>  	struct xfs_btree_cur		*cnt;	/* btree cursors */
>  	struct xfs_btree_cur		*bnolt;
>  	struct xfs_btree_cur		*bnogt;
> +	xfs_extlen_t			cur_len;/* current search length */
>  	xfs_agblock_t			rec_bno;/* extent startblock */
>  	xfs_extlen_t			rec_len;/* extent length */
>  	xfs_agblock_t			bno;	/* alloc bno */
> @@ -740,6 +741,7 @@ xfs_alloc_cur_setup(
>  
>  	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
>  
> +	acur->cur_len = args->maxlen;
>  	acur->rec_bno = 0;
>  	acur->rec_len = 0;
>  	acur->bno = 0;
> @@ -913,6 +915,76 @@ xfs_alloc_cur_finish(
>  	return 0;
>  }
>  
> +/*
> + * Locality allocation lookup algorithm. This expects a cntbt cursor and uses
> + * bno optimized lookup to search for extents with ideal size and locality.
> + */
> +STATIC int
> +xfs_alloc_cntbt_iter(
> +	struct xfs_alloc_arg		*args,
> +	struct xfs_alloc_cur		*acur)
> +{
> +	struct xfs_btree_cur	*cur = acur->cnt;
> +	xfs_agblock_t		bno;
> +	xfs_extlen_t		len, cur_len;
> +	int			error;
> +	int			i;
> +
> +	if (!xfs_alloc_cur_active(cur))
> +		return 0;
> +
> +	/* locality optimized lookup */
> +	cur_len = acur->cur_len;
> +	error = xfs_alloc_lookup_ge(cur, args->agbno, cur_len, &i);
> +	if (error)
> +		return error;
> +	if (i == 0)
> +		return 0;
> +	error = xfs_alloc_get_rec(cur, &bno, &len, &i);
> +	if (error)
> +		return error;
> +
> +	/* check the current record and update search length from it */
> +	error = xfs_alloc_cur_check(args, acur, cur, &i);
> +	if (error)
> +		return error;
> +	ASSERT(len >= acur->cur_len);
> +	acur->cur_len = len;
> +
> +	/*
> +	 * We looked up the first record >= [agbno, len] above. The agbno is a
> +	 * secondary key and so the current record may lie just before or after
> +	 * agbno. If it is past agbno, check the previous record too so long as
> +	 * the length matches as it may be closer. Don't check a smaller record
> +	 * because that could deactivate our cursor.
> +	 */
> +	if (bno > args->agbno) {
> +		error = xfs_btree_decrement(cur, 0, &i);
> +		if (!error && i) {
> +			error = xfs_alloc_get_rec(cur, &bno, &len, &i);
> +			if (!error && i && len == acur->cur_len)
> +				error = xfs_alloc_cur_check(args, acur, cur,
> +							    &i);
> +		}
> +		if (error)
> +			return error;
> +	}
> +
> +	/*
> +	 * Increment the search key until we find at least one allocation
> +	 * candidate or if the extent we found was larger. Otherwise, double the
> +	 * search key to optimize the search. Efficiency is more important here
> +	 * than absolute best locality.
> +	 */
> +	cur_len <<= 1;
> +	if (!acur->len || acur->cur_len >= cur_len)
> +		acur->cur_len++;
> +	else
> +		acur->cur_len = cur_len;
> +
> +	return error;
> +}
> +
>  /*
>   * Deal with the case where only small freespaces remain. Either return the
>   * contents of the last freespace record, or allocate space from the freelist if
> @@ -1254,12 +1326,11 @@ xfs_alloc_walk_iter(
>  }
>  
>  /*
> - * Search in the by-bno btree to the left and to the right simultaneously until
> - * in each case we find a large enough free extent or run into the edge of the
> - * tree. When we run into the edge of the tree, we deactivate that cursor.
> + * Search the by-bno and by-size btrees in parallel in search of an extent with
> + * ideal locality based on the NEAR mode ->agbno locality hint.
>   */
>  STATIC int
> -xfs_alloc_ag_vextent_bnobt(
> +xfs_alloc_ag_vextent_locality(
>  	struct xfs_alloc_arg	*args,
>  	struct xfs_alloc_cur	*acur,
>  	int			*stat)
> @@ -1274,6 +1345,9 @@ xfs_alloc_ag_vextent_bnobt(
>  
>  	*stat = 0;
>  
> +	error = xfs_alloc_lookup_ge(acur->cnt, args->agbno, acur->cur_len, &i);
> +	if (error)
> +		return error;
>  	error = xfs_alloc_lookup_le(acur->bnolt, args->agbno, 0, &i);
>  	if (error)
>  		return error;
> @@ -1282,12 +1356,37 @@ xfs_alloc_ag_vextent_bnobt(
>  		return error;
>  
>  	/*
> -	 * Loop going left with the leftward cursor, right with the rightward
> -	 * cursor, until either both directions give up or we find an entry at
> -	 * least as big as minlen.
> +	 * Search the bnobt and cntbt in parallel. Search the bnobt left and
> +	 * right and lookup the closest extent to the locality hint for each
> +	 * extent size key in the cntbt. The entire search terminates
> +	 * immediately on a bnobt hit because that means we've found best case
> +	 * locality. Otherwise the search continues until the cntbt cursor runs
> +	 * off the end of the tree. If no allocation candidate is found at this
> +	 * point, give up on locality, walk backwards from the end of the cntbt
> +	 * and take the first available extent.
> +	 *
> +	 * The parallel tree searches balance each other out to provide fairly
> +	 * consistent performance for various situations. The bnobt search can
> +	 * have pathological behavior in the worst case scenario of larger
> +	 * allocation requests and fragmented free space. On the other hand, the
> +	 * bnobt is able to satisfy most smaller allocation requests much more
> +	 * quickly than the cntbt. The cntbt search can sift through fragmented
> +	 * free space and sets of free extents for larger allocation requests
> +	 * more quickly than the bnobt. Since the locality hint is just a hint
> +	 * and we don't want to scan the entire bnobt for perfect locality, the
> +	 * cntbt search essentially bounds the bnobt search such that we can
> +	 * find good enough locality at reasonable performance in most cases.
>  	 */
>  	while (xfs_alloc_cur_active(acur->bnolt) ||
> -	       xfs_alloc_cur_active(acur->bnogt)) {
> +	       xfs_alloc_cur_active(acur->bnogt) ||
> +	       xfs_alloc_cur_active(acur->cnt)) {
> +
> +		trace_xfs_alloc_cur_lookup(args);
> +
> +		/*
> +		 * Search the bnobt left and right. In the case of a hit, finish
> +		 * the search in the opposite direction and we're done.
> +		 */
>  		error = xfs_alloc_walk_iter(args, acur, acur->bnolt, false,
>  					    true, 1, &i);
>  		if (error)
> @@ -1298,7 +1397,6 @@ xfs_alloc_ag_vextent_bnobt(
>  			fbinc = true;
>  			break;
>  		}
> -
>  		error = xfs_alloc_walk_iter(args, acur, acur->bnogt, true, true,
>  					    1, &i);
>  		if (error)
> @@ -1309,9 +1407,39 @@ xfs_alloc_ag_vextent_bnobt(
>  			fbinc = false;
>  			break;
>  		}
> +
> +		/*
> +		 * Check the extent with best locality based on the current
> +		 * extent size search key and keep track of the best candidate.
> +		 * If we fail to find anything due to busy extents, return
> +		 * empty handed so the caller can flush and retry the search. If
> +		 * no busy extents were found, walk backwards from the end of
> +		 * the cntbt as a last resort.
> +		 */
> +		error = xfs_alloc_cntbt_iter(args, acur);
> +		if (error)
> +			return error;
> +		if (!xfs_alloc_cur_active(acur->cnt)) {
> +			trace_xfs_alloc_cur_lookup_done(args);
> +			if (!acur->len && !acur->busy) {
> +				error = xfs_btree_decrement(acur->cnt, 0, &i);
> +				if (error)
> +					return error;
> +				if (i) {
> +					acur->cnt->bc_private.a.priv.abt.active = true;

Line over 80 columns?

Or, put another way, could this be refactored not to have 5 levels of
indent?
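
One way to flatten it (a sketch of the general shape only, not necessarily what this series ends up doing): break out of the loop as soon as the cntbt cursor goes inactive, and move the busy/last-resort handling after the loop:

```c
	while (...) {
		...
		if (!xfs_alloc_cur_active(acur->cnt)) {
			trace_xfs_alloc_cur_lookup_done(args);
			break;
		}
	}

	/*
	 * If nothing was found and no busy extents were seen, walk
	 * backwards from the end of the cntbt as a last resort.
	 */
	if (!xfs_alloc_cur_active(acur->cnt) && !acur->len && !acur->busy) {
		error = xfs_btree_decrement(acur->cnt, 0, &i);
		if (error)
			return error;
		if (i) {
			acur->cnt->bc_private.a.priv.abt.active = true;
			fbcur = acur->cnt;
			fbinc = false;
		}
	}
```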

Otherwise looks good.

--D

> +					fbcur = acur->cnt;
> +					fbinc = false;
> +				}
> +			}
> +			break;
> +		}
> +
>  	}
>  
> -	/* search the opposite direction for a better entry */
> +	/*
> +	 * Search in the opposite direction for a better entry in the case of
> +	 * a bnobt hit or walk backwards from the end of the cntbt.
> +	 */
>  	if (fbcur) {
>  		error = xfs_alloc_walk_iter(args, acur, fbcur, fbinc, true, -1,
>  					    &i);
> @@ -1440,9 +1568,10 @@ xfs_alloc_ag_vextent_near(
>  	}
>  
>  	/*
> -	 * Second algorithm. Search the bnobt left and right.
> +	 * Second algorithm. Combined cntbt and bnobt search to find ideal
> +	 * locality.
>  	 */
> -	error = xfs_alloc_ag_vextent_bnobt(args, &acur, &i);
> +	error = xfs_alloc_ag_vextent_locality(args, &acur, &i);
>  	if (error)
>  		goto out;
>  
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index b8b93068efe7..0c9dfeac4e75 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1645,6 +1645,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
>  DEFINE_ALLOC_EVENT(xfs_alloc_cur);
>  DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
>  DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
> +DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup);
> +DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup_done);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
>  DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 01/11] xfs: track active state of allocation btree cursors
  2019-09-18 18:38   ` Darrick J. Wong
@ 2019-09-19 14:57     ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2019-09-19 14:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 18, 2019 at 11:38:18AM -0700, Darrick J. Wong wrote:
> On Mon, Sep 16, 2019 at 08:16:25AM -0400, Brian Foster wrote:
> > The upcoming allocation algorithm update searches multiple
> > allocation btree cursors concurrently. As such, it requires an
> > active state to track when a particular cursor should continue
> > searching. While active state will be modified based on higher level
> > logic, we can define base functionality based on the result of
> > allocation btree lookups.
> > 
> > Define an active flag in the private area of the btree cursor.
> > Update it based on the result of lookups in the existing allocation
> > btree helpers. Finally, provide a new helper to query the current
> > state.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c       | 24 +++++++++++++++++++++---
> >  fs/xfs/libxfs/xfs_alloc_btree.c |  1 +
> >  fs/xfs/libxfs/xfs_btree.h       |  3 +++
> >  3 files changed, 25 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 533b04aaf6f6..512a45888e06 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -146,9 +146,13 @@ xfs_alloc_lookup_eq(
> >  	xfs_extlen_t		len,	/* length of extent */
> >  	int			*stat)	/* success/failure */
> >  {
> > +	int			error;
> > +
> >  	cur->bc_rec.a.ar_startblock = bno;
> >  	cur->bc_rec.a.ar_blockcount = len;
> > -	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
> > +	error = xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
> > +	cur->bc_private.a.priv.abt.active = *stat == 1 ? true : false;
> 
> I think "cur->bc_private.a.priv.abt.active = (*stat == 1);" would have
> sufficed for these, right?  (Yeah, sorry, picking at nits here...)
> 

Sure, I'll fix those up.

Brian

> --D
> 
> > +	return error;
> >  }
> >  
> >  /*
> > @@ -162,9 +166,13 @@ xfs_alloc_lookup_ge(
> >  	xfs_extlen_t		len,	/* length of extent */
> >  	int			*stat)	/* success/failure */
> >  {
> > +	int			error;
> > +
> >  	cur->bc_rec.a.ar_startblock = bno;
> >  	cur->bc_rec.a.ar_blockcount = len;
> > -	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
> > +	error = xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
> > +	cur->bc_private.a.priv.abt.active = *stat == 1 ? true : false;
> > +	return error;
> >  }
> >  
> >  /*
> > @@ -178,9 +186,19 @@ xfs_alloc_lookup_le(
> >  	xfs_extlen_t		len,	/* length of extent */
> >  	int			*stat)	/* success/failure */
> >  {
> > +	int			error;
> >  	cur->bc_rec.a.ar_startblock = bno;
> >  	cur->bc_rec.a.ar_blockcount = len;
> > -	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
> > +	error = xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
> > +	cur->bc_private.a.priv.abt.active = *stat == 1 ? true : false;
> > +	return error;
> > +}
> > +
> > +static inline bool
> > +xfs_alloc_cur_active(
> > +	struct xfs_btree_cur	*cur)
> > +{
> > +	return cur && cur->bc_private.a.priv.abt.active;
> >  }
> >  
> >  /*
> > diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
> > index 2a94543857a1..279694d73e4e 100644
> > --- a/fs/xfs/libxfs/xfs_alloc_btree.c
> > +++ b/fs/xfs/libxfs/xfs_alloc_btree.c
> > @@ -507,6 +507,7 @@ xfs_allocbt_init_cursor(
> >  
> >  	cur->bc_private.a.agbp = agbp;
> >  	cur->bc_private.a.agno = agno;
> > +	cur->bc_private.a.priv.abt.active = false;
> >  
> >  	if (xfs_sb_version_hascrc(&mp->m_sb))
> >  		cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index ced1e65d1483..b4e3ec1d7ff9 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -183,6 +183,9 @@ union xfs_btree_cur_private {
> >  		unsigned long	nr_ops;		/* # record updates */
> >  		int		shape_changes;	/* # of extent splits */
> >  	} refc;
> > +	struct {
> > +		bool		active;		/* allocation cursor state */
> > +	} abt;
> >  };
> >  
> >  /*
> > -- 
> > 2.20.1
> > 

* Re: [PATCH v4 03/11] xfs: track allocation busy state in allocation cursor
  2019-09-18 18:48   ` Darrick J. Wong
@ 2019-09-19 14:58     ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2019-09-19 14:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 18, 2019 at 11:48:32AM -0700, Darrick J. Wong wrote:
> On Mon, Sep 16, 2019 at 08:16:27AM -0400, Brian Foster wrote:
> > Extend the allocation cursor to track extent busy state for an
> > allocation attempt. No functional changes.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 25 ++++++++++++++-----------
> >  1 file changed, 14 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index d159377ed603..5c34d4c41761 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -716,6 +716,8 @@ struct xfs_alloc_cur {
> >  	struct xfs_btree_cur		*cnt;	/* btree cursors */
> >  	struct xfs_btree_cur		*bnolt;
> >  	struct xfs_btree_cur		*bnogt;
> > +	unsigned			busy_gen;/* busy state */
> 
> Nit: unsigned int here?
> 
> 'unsigned' without the 'int' looks a little funny to me, but eh
> whatever, I guess we do that in the iomap code so:
> 

Yeah, I was just copying from the original code and the lack of an int
didn't even register. Easy enough to fix. Note that there are still
other uses of 'unsigned busy_gen', but those should be replaced with
this one over time.

Brian

> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> --D
> 
> > +	bool				busy;
> >  };
> >  
> >  /*
> > @@ -733,6 +735,9 @@ xfs_alloc_cur_setup(
> >  
> >  	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
> >  
> > +	acur->busy = false;
> > +	acur->busy_gen = 0;
> > +
> >  	/*
> >  	 * Perform an initial cntbt lookup to check for availability of maxlen
> >  	 * extents. If this fails, we'll return -ENOSPC to signal the caller to
> > @@ -1185,8 +1190,6 @@ xfs_alloc_ag_vextent_near(
> >  	xfs_extlen_t	ltlena;		/* aligned ... */
> >  	xfs_agblock_t	ltnew;		/* useful start bno of left side */
> >  	xfs_extlen_t	rlen;		/* length of returned extent */
> > -	bool		busy;
> > -	unsigned	busy_gen;
> >  #ifdef DEBUG
> >  	/*
> >  	 * Randomly don't execute the first algorithm.
> > @@ -1211,7 +1214,6 @@ xfs_alloc_ag_vextent_near(
> >  	ltlen = 0;
> >  	gtlena = 0;
> >  	ltlena = 0;
> > -	busy = false;
> >  
> >  	/*
> >  	 * Set up cursors and see if there are any free extents as big as
> > @@ -1290,8 +1292,8 @@ xfs_alloc_ag_vextent_near(
> >  			if (error)
> >  				goto out;
> >  			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> > -			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
> > -					&ltbnoa, &ltlena, &busy_gen);
> > +			acur.busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
> > +					&ltbnoa, &ltlena, &acur.busy_gen);
> >  			if (ltlena < args->minlen)
> >  				continue;
> >  			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
> > @@ -1373,8 +1375,8 @@ xfs_alloc_ag_vextent_near(
> >  			if (error)
> >  				goto out;
> >  			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> > -			busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
> > -					&ltbnoa, &ltlena, &busy_gen);
> > +			acur.busy |= xfs_alloc_compute_aligned(args, ltbno,
> > +					ltlen, &ltbnoa, &ltlena, &acur.busy_gen);
> >  			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
> >  				break;
> >  			error = xfs_btree_decrement(acur.bnolt, 0, &i);
> > @@ -1388,8 +1390,8 @@ xfs_alloc_ag_vextent_near(
> >  			if (error)
> >  				goto out;
> >  			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> > -			busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
> > -					&gtbnoa, &gtlena, &busy_gen);
> > +			acur.busy |= xfs_alloc_compute_aligned(args, gtbno,
> > +					gtlen, &gtbnoa, &gtlena, &acur.busy_gen);
> >  			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
> >  				break;
> >  			error = xfs_btree_increment(acur.bnogt, 0, &i);
> > @@ -1449,9 +1451,10 @@ xfs_alloc_ag_vextent_near(
> >  	 */
> >  	if (!xfs_alloc_cur_active(acur.bnolt) &&
> >  	    !xfs_alloc_cur_active(acur.bnogt)) {
> > -		if (busy) {
> > +		if (acur.busy) {
> >  			trace_xfs_alloc_near_busy(args);
> > -			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
> > +			xfs_extent_busy_flush(args->mp, args->pag,
> > +					      acur.busy_gen);
> >  			goto restart;
> >  		}
> >  		trace_xfs_alloc_size_neither(args);
> > -- 
> > 2.20.1
> > 

* Re: [PATCH v4 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor
  2019-09-18 18:56   ` Darrick J. Wong
@ 2019-09-19 15:00     ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2019-09-19 15:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 18, 2019 at 11:56:16AM -0700, Darrick J. Wong wrote:
> On Mon, Sep 16, 2019 at 08:16:28AM -0400, Brian Foster wrote:
> > If the size lookup lands in the last block of the by-size btree, the
> > near mode algorithm scans the entire block for the extent with best
> > available locality. In preparation for similar best available
> > extent tracking across both btrees, extend the allocation cursor
> > with best extent data and lift the associated state from the cntbt
> > last block scan code. No functional changes.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 63 ++++++++++++++++++++-------------------
> >  1 file changed, 33 insertions(+), 30 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 5c34d4c41761..ee46989ab723 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -716,6 +716,11 @@ struct xfs_alloc_cur {
> >  	struct xfs_btree_cur		*cnt;	/* btree cursors */
> >  	struct xfs_btree_cur		*bnolt;
> >  	struct xfs_btree_cur		*bnogt;
> > +	xfs_agblock_t			rec_bno;/* extent startblock */
> > +	xfs_extlen_t			rec_len;/* extent length */
> > +	xfs_agblock_t			bno;	/* alloc bno */
> > +	xfs_extlen_t			len;	/* alloc len */
> > +	xfs_extlen_t			diff;	/* diff from search bno */
> >  	unsigned			busy_gen;/* busy state */
> >  	bool				busy;
> >  };
> > @@ -735,6 +740,11 @@ xfs_alloc_cur_setup(
> >  
> >  	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
> >  
> > +	acur->rec_bno = 0;
> > +	acur->rec_len = 0;
> > +	acur->bno = 0;
> > +	acur->len = 0;
> > +	acur->diff = -1;
> 
> xfs_extlen_t is a uint32_t; is this going to cause comparison problems?
> 

How so? This is the distance (in FSB) between the allocation hint and
the current candidate extent. I didn't think that could exceed a 32-bit
value, but I could be missing something (if so, that sounds like a bug
fix for a separate patch since this should be mostly refactoring).

> Also ... assuming that acur->diff is the successor to the bdiff variable
> below, shouldn't it be initialized to zero like bdiff was?
> 

So I think the init to -1 carried over from my initial approach of
hacking on this from scratch. Initializing to a max possible diff seemed
more clear and less fragile than 0, which basically means perfect
locality. Looking at this patch in isolation, it seems it doesn't matter
how diff is initialized because acur->len is zero until we find the
first candidate extent and assign a real value to acur->diff.

If you skip ahead to the next patch, however, you'll see the logic is
split up in xfs_alloc_cur_check() such that we depend on a "proper"
initialization of acur->diff to function correctly. I guess this is kind
of a quirk of how the patches were split up. I'll change it to init to 0
in this patch and switch it over to -1 in the next with an explanation
in the commit log...

Brian

> --D
> 
> >  	acur->busy = false;
> >  	acur->busy_gen = 0;
> >  
> > @@ -1247,10 +1257,7 @@ xfs_alloc_ag_vextent_near(
> >  	 * but we never loop back to the top.
> >  	 */
> >  	while (xfs_btree_islastblock(acur.cnt, 0)) {
> > -		xfs_extlen_t	bdiff;
> > -		int		besti=0;
> > -		xfs_extlen_t	blen=0;
> > -		xfs_agblock_t	bnew=0;
> > +		xfs_extlen_t	diff;
> >  
> >  #ifdef DEBUG
> >  		if (dofirst)
> > @@ -1281,8 +1288,8 @@ xfs_alloc_ag_vextent_near(
> >  				break;
> >  		}
> >  		i = acur.cnt->bc_ptrs[0];
> > -		for (j = 1, blen = 0, bdiff = 0;
> > -		     !error && j && (blen < args->maxlen || bdiff > 0);
> > +		for (j = 1;
> > +		     !error && j && (acur.len < args->maxlen || acur.diff > 0);
> >  		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
> >  			/*
> >  			 * For each entry, decide if it's better than
> > @@ -1301,44 +1308,40 @@ xfs_alloc_ag_vextent_near(
> >  			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> >  			xfs_alloc_fix_len(args);
> >  			ASSERT(args->len >= args->minlen);
> > -			if (args->len < blen)
> > +			if (args->len < acur.len)
> >  				continue;
> > -			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> > +			diff = xfs_alloc_compute_diff(args->agbno, args->len,
> >  				args->alignment, args->datatype, ltbnoa,
> >  				ltlena, &ltnew);
> >  			if (ltnew != NULLAGBLOCK &&
> > -			    (args->len > blen || ltdiff < bdiff)) {
> > -				bdiff = ltdiff;
> > -				bnew = ltnew;
> > -				blen = args->len;
> > -				besti = acur.cnt->bc_ptrs[0];
> > +			    (args->len > acur.len || diff < acur.diff)) {
> > +				acur.rec_bno = ltbno;
> > +				acur.rec_len = ltlen;
> > +				acur.diff = diff;
> > +				acur.bno = ltnew;
> > +				acur.len = args->len;
> >  			}
> >  		}
> >  		/*
> >  		 * It didn't work.  We COULD be in a case where
> >  		 * there's a good record somewhere, so try again.
> >  		 */
> > -		if (blen == 0)
> > +		if (acur.len == 0)
> >  			break;
> > -		/*
> > -		 * Point at the best entry, and retrieve it again.
> > -		 */
> > -		acur.cnt->bc_ptrs[0] = besti;
> > -		error = xfs_alloc_get_rec(acur.cnt, &ltbno, &ltlen, &i);
> > -		if (error)
> > -			goto out;
> > -		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, out);
> > -		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> > -		args->len = blen;
> >  
> >  		/*
> > -		 * We are allocating starting at bnew for blen blocks.
> > +		 * Allocate at the bno/len tracked in the cursor.
> >  		 */
> > -		args->agbno = bnew;
> > -		ASSERT(bnew >= ltbno);
> > -		ASSERT(bnew + blen <= ltbno + ltlen);
> > -		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, ltbno,
> > -					ltlen, bnew, blen, XFSA_FIXUP_CNT_OK);
> > +		args->agbno = acur.bno;
> > +		args->len = acur.len;
> > +		ASSERT(acur.bno >= acur.rec_bno);
> > +		ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> > +		ASSERT(acur.rec_bno + acur.rec_len <=
> > +		       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> > +
> > +		error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt,
> > +				acur.rec_bno, acur.rec_len, acur.bno, acur.len,
> > +				0);
> >  		if (error)
> >  			goto out;
> >  		trace_xfs_alloc_near_first(args);
> > -- 
> > 2.20.1
> > 

* Re: [PATCH v4 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper
  2019-09-18 19:03   ` Darrick J. Wong
@ 2019-09-19 15:01     ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2019-09-19 15:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 18, 2019 at 12:03:41PM -0700, Darrick J. Wong wrote:
> On Mon, Sep 16, 2019 at 08:16:29AM -0400, Brian Foster wrote:
> > The cntbt lastblock scan checks the size, alignment, locality, etc.
> > of each free extent in the block and compares it with the current
> > best candidate. This logic will be reused by the upcoming optimized
> > cntbt algorithm, so refactor it into a separate helper.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 113 +++++++++++++++++++++++++++++---------
> >  fs/xfs/xfs_trace.h        |  25 +++++++++
> >  2 files changed, 111 insertions(+), 27 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index ee46989ab723..2fa7bb6a00a8 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -791,6 +791,89 @@ xfs_alloc_cur_close(
> >  	acur->cnt = acur->bnolt = acur->bnogt = NULL;
> >  }
> >  
> > +/*
> > + * Check an extent for allocation and track the best available candidate in the
> > + * allocation structure. The cursor is deactivated if it has entered an out of
> > + * range state based on allocation arguments. Optionally return the extent
> > + * geometry and allocation status if requested by the caller.
> > + */
> > +static int
> > +xfs_alloc_cur_check(
> > +	struct xfs_alloc_arg		*args,
> > +	struct xfs_alloc_cur		*acur,
> > +	struct xfs_btree_cur		*cur,
> > +	int				*new)
> > +{
> > +	int			error, i;
> 
> Inconsistent indentation here.
> 

Fixed.

> > +	xfs_agblock_t		bno, bnoa, bnew;
> > +	xfs_extlen_t		len, lena, diff = -1;
> > +	bool			busy;
> > +	unsigned		busy_gen = 0;
> > +	bool			deactivate = false;
> > +
> > +	*new = 0;
> > +
...
> > +}
> > +
> >  /*
> >   * Deal with the case where only small freespaces remain. Either return the
> >   * contents of the last freespace record, or allocate space from the freelist if
...
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index eaae275ed430..b12fad3e45cb 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -1663,6 +1663,31 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
> >  
> > +TRACE_EVENT(xfs_alloc_cur_check,
> > +	TP_PROTO(struct xfs_mount *mp, xfs_btnum_t btnum, xfs_agblock_t bno,
> > +		 xfs_extlen_t len, xfs_extlen_t diff, bool new),
> > +	TP_ARGS(mp, btnum, bno, len, diff, new),
> > +	TP_STRUCT__entry(
> > +		__field(dev_t, dev)
> > +		__field(xfs_btnum_t, btnum)
> > +		__field(xfs_agblock_t, bno)
> > +		__field(xfs_extlen_t, len)
> > +		__field(xfs_extlen_t, diff)
> > +		__field(bool, new)
> > +	),
> > +	TP_fast_assign(
> > +		__entry->dev = mp->m_super->s_dev;
> > +		__entry->btnum = btnum;
> > +		__entry->bno = bno;
> > +		__entry->len = len;
> > +		__entry->diff = diff;
> > +		__entry->new = new;
> > +	),
> > +	TP_printk("dev %d:%d btnum %d bno 0x%x len 0x%x diff 0x%x new %d",
> > +		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->btnum,
> 
> Perhaps:
> 
> 	__print_symbolic(__entry->btnum, XFS_BTNUM_STRINGS),
> 
> instead of dumping the raw btnum value?
> 

Good point, fixed.

Brian

> Other than those two things, this looks like a pretty straightforward
> hoisting.
> 
> --D
> 
> > +		  __entry->bno, __entry->len, __entry->diff, __entry->new)
> > +)
> > +
> >  DECLARE_EVENT_CLASS(xfs_da_class,
> >  	TP_PROTO(struct xfs_da_args *args),
> >  	TP_ARGS(args),
> > -- 
> > 2.20.1
> > 

* Re: [PATCH v4 06/11] xfs: reuse best extent tracking logic for bnobt scan
  2019-09-18 20:43   ` Darrick J. Wong
@ 2019-09-19 15:04     ` Brian Foster
  2019-10-04 22:44       ` Darrick J. Wong
  0 siblings, 1 reply; 32+ messages in thread
From: Brian Foster @ 2019-09-19 15:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 18, 2019 at 01:43:11PM -0700, Darrick J. Wong wrote:
> On Mon, Sep 16, 2019 at 08:16:30AM -0400, Brian Foster wrote:
> > The near mode bnobt scan searches left and right in the bnobt
> > looking for the closest free extent to the allocation hint that
> > satisfies minlen. Once such an extent is found, the left/right
> > search terminates, we search one more time in the opposite direction
> > and finish the allocation with the best overall extent.
> > 
> > The left/right and find best searches are currently controlled via a
> > combination of cursor state and local variables. Clean up this code
> > and prepare for further improvements to the near mode fallback
> > algorithm by reusing the allocation cursor best extent tracking
> > mechanism. Update the tracking logic to deactivate bnobt cursors
> > when out of allocation range and replace open-coded extent checks to
> > calls to the common helper. In doing so, rename some misnamed local
> > variables in the top-level near mode allocation function.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 270 +++++++++++---------------------------
> >  fs/xfs/xfs_trace.h        |   4 +-
> >  2 files changed, 76 insertions(+), 198 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 2fa7bb6a00a8..edcec975c306 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
...
> > @@ -1163,96 +1172,44 @@ xfs_alloc_ag_vextent_exact(
> >  }
> >  
> >  /*
> > - * Search the btree in a given direction via the search cursor and compare
> > - * the records found against the good extent we've already found.
> > + * Search the btree in a given direction and check the records against the good
> > + * extent we've already found.
> >   */
> >  STATIC int
> >  xfs_alloc_find_best_extent(
> > -	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> > -	struct xfs_btree_cur	*gcur,	/* good cursor */
> > -	struct xfs_btree_cur	*scur,	/* searching cursor */
> > -	xfs_agblock_t		gdiff,	/* difference for search comparison */
> > -	xfs_agblock_t		*sbno,	/* extent found by search */
> > -	xfs_extlen_t		*slen,	/* extent length */
> > -	xfs_agblock_t		*sbnoa,	/* aligned extent found by search */
> > -	xfs_extlen_t		*slena,	/* aligned extent length */
> > -	int			dir)	/* 0 = search right, 1 = search left */
> > +	struct xfs_alloc_arg	*args,
> > +	struct xfs_alloc_cur	*acur,
> > +	struct xfs_btree_cur	*cur,
> > +	bool			increment)
> >  {
> > -	xfs_agblock_t		new;
> > -	xfs_agblock_t		sdiff;
> >  	int			error;
> >  	int			i;
> > -	unsigned		busy_gen;
> > -
> > -	/* The good extent is perfect, no need to  search. */
> > -	if (!gdiff)
> > -		goto out_use_good;
> >  
> >  	/*
> > -	 * Look until we find a better one, run out of space or run off the end.
> > +	 * Search so long as the cursor is active, stopping once we find a
> > +	 * better extent. The cursor is deactivated if it extends beyond the
> > +	 * range of the current allocation candidate.
> >  	 */
> > -	do {
> > -		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
> > +	while (xfs_alloc_cur_active(cur)) {
> > +		error = xfs_alloc_cur_check(args, acur, cur, &i);
> 
> Does @i > 0 now mean "we found a better candidate for allocation so
> we're done" ?
> 

In the context of xfs_alloc_cur_check(), i == 1 just means the checked
extent became the new allocation candidate. Whether "we're done" or not
is a higher level policy decision. In the xfs_alloc_find_best_extent()
case, we're doing a last effort search in the opposite direction after
finding an extent to allocate in the higher level left/right bnobt
search. So yes, we are done in that scenario, but that's not necessarily
the case for all calls to xfs_alloc_cur_check() that return i == 1.

Ideally this could be documented better, but this function is reworked
into xfs_alloc_walk_iter() in the next couple patches and the behavior
changes slightly.

> >  		if (error)
> > -			goto error0;
> > -		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> > -		xfs_alloc_compute_aligned(args, *sbno, *slen,
> > -				sbnoa, slena, &busy_gen);
> > -
> > -		/*
> > -		 * The good extent is closer than this one.
> > -		 */
> > -		if (!dir) {
> > -			if (*sbnoa > args->max_agbno)
> > -				goto out_use_good;
> > -			if (*sbnoa >= args->agbno + gdiff)
> > -				goto out_use_good;
> > -		} else {
> > -			if (*sbnoa < args->min_agbno)
> > -				goto out_use_good;
> > -			if (*sbnoa <= args->agbno - gdiff)
> > -				goto out_use_good;
> > -		}
> > -
> > -		/*
> > -		 * Same distance, compare length and pick the best.
> > -		 */
> > -		if (*slena >= args->minlen) {
> > -			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
> > -			xfs_alloc_fix_len(args);
> > -
> > -			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> > -						       args->alignment,
> > -						       args->datatype, *sbnoa,
> > -						       *slena, &new);
> > -
> > -			/*
> > -			 * Choose closer size and invalidate other cursor.
> > -			 */
> > -			if (sdiff < gdiff)
> > -				goto out_use_search;
> > -			goto out_use_good;
> > -		}
> > +			return error;
> > +		if (i == 1)
> > +			break;
> > +		if (!xfs_alloc_cur_active(cur))
> > +			break;
> >  
> > -		if (!dir)
> > -			error = xfs_btree_increment(scur, 0, &i);
> > +		if (increment)
> > +			error = xfs_btree_increment(cur, 0, &i);
> >  		else
> > -			error = xfs_btree_decrement(scur, 0, &i);
> > +			error = xfs_btree_decrement(cur, 0, &i);
> >  		if (error)
> > -			goto error0;
> > -	} while (i);
> > -
> > -out_use_good:
> > -	scur->bc_private.a.priv.abt.active = false;
> > -	return 0;
> > +			return error;
> > +		if (i == 0)
> > +			cur->bc_private.a.priv.abt.active = false;
> 
> ...and I guess @i == 0 here means "@cur hit the end of the records so
> deactivate it"?
> 

Yeah, this is a bit of a quirk in that xfs_btree_[inc|dec]rement() are
generic btree functions and so there was no place to bury the 'active'
updates in the first patch of the series. I suppose you could argue that
updating active is a reason to create a couple
xfs_alloc_[inc|dec]rement() wrappers, but I don't have a strong
preference. Thoughts?

> > +	}
> >  
> > -out_use_search:
> > -	gcur->bc_private.a.priv.abt.active = false;
> >  	return 0;
> > -
> > -error0:
> > -	/* caller invalidates cursors */
> > -	return error;
> >  }
> >  
> >  /*
> > @@ -1266,23 +1223,13 @@ xfs_alloc_ag_vextent_near(
> >  	struct xfs_alloc_arg	*args)
> >  {
> >  	struct xfs_alloc_cur	acur = {0,};
> > -	struct xfs_btree_cur	*bno_cur;
> > -	xfs_agblock_t	gtbno;		/* start bno of right side entry */
> > -	xfs_agblock_t	gtbnoa;		/* aligned ... */
> > -	xfs_extlen_t	gtdiff;		/* difference to right side entry */
> > -	xfs_extlen_t	gtlen;		/* length of right side entry */
> > -	xfs_extlen_t	gtlena;		/* aligned ... */
> > -	xfs_agblock_t	gtnew;		/* useful start bno of right side */
> > +	struct xfs_btree_cur	*fbcur = NULL;
> >  	int		error;		/* error code */
> >  	int		i;		/* result code, temporary */
> >  	int		j;		/* result code, temporary */
> > -	xfs_agblock_t	ltbno;		/* start bno of left side entry */
> > -	xfs_agblock_t	ltbnoa;		/* aligned ... */
> > -	xfs_extlen_t	ltdiff;		/* difference to left side entry */
> > -	xfs_extlen_t	ltlen;		/* length of left side entry */
> > -	xfs_extlen_t	ltlena;		/* aligned ... */
> > -	xfs_agblock_t	ltnew;		/* useful start bno of left side */
> > -	xfs_extlen_t	rlen;		/* length of returned extent */
> > +	xfs_agblock_t	bno;
> 
> Is the indenting here inconsistent with the *fbcur declaration above?
> 

No, will fix.

> > +	xfs_extlen_t	len;
> > +	bool		fbinc = false;
> >  #ifdef DEBUG
> >  	/*
> >  	 * Randomly don't execute the first algorithm.
...
> > @@ -1524,47 +1434,15 @@ xfs_alloc_ag_vextent_near(
> >  		goto out;
> >  	}
> >  
> > -	/*
> > -	 * At this point we have selected a freespace entry, either to the
> > -	 * left or to the right.  If it's on the right, copy all the
> > -	 * useful variables to the "left" set so we only have one
> > -	 * copy of this code.
> > -	 */
> > -	if (xfs_alloc_cur_active(acur.bnogt)) {
> > -		bno_cur = acur.bnogt;
> > -		ltbno = gtbno;
> > -		ltbnoa = gtbnoa;
> > -		ltlen = gtlen;
> > -		ltlena = gtlena;
> > -		j = 1;
> > -	} else {
> > -		bno_cur = acur.bnolt;
> > -		j = 0;
> > -	}
> > -
> > -	/*
> > -	 * Fix up the length and compute the useful address.
> > -	 */
> > -	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> > -	xfs_alloc_fix_len(args);
> > -	rlen = args->len;
> > -	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
> > -				     args->datatype, ltbnoa, ltlena, &ltnew);
> 
> Hmm.  So I /think/ the reason this can go away is that
> xfs_alloc_cur_check already did all this trimming and whatnot and
> stuffed the values into the xfs_alloc_cur structure, so we can just copy
> the allocated extent to the caller's @args, update the bnobt/cntbt to
> reflect the allocation, and exit?
> 

Right, the code just above should be a subset of what
xfs_alloc_cur_check() is already doing before it selects a new
candidate. This was necessary in the current code because this was all
tracked by cursor state (i.e. xfs_alloc_find_best_extent() decides
whether to use the left or right cursor). This is all replaced with
explicit "best extent" tracking, independent from logic around things
like which direction to do the "find best" search in, etc.
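For illustration, that explicit candidate-tracking idea can be modeled with simplified stand-in types (hypothetical names; the real xfs_alloc_cur_check() also weighs extent length, alignment, and busy state, none of which is shown here):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the fields xfs_alloc_cur tracks (not the real structure). */
struct model_acur {
	unsigned int rec_bno;	/* best extent record start */
	unsigned int rec_len;	/* best extent record length */
	unsigned int bno;	/* allocation bno within the record */
	unsigned int len;	/* allocation length */
	unsigned int diff;	/* locality distance from the hint */
	bool	     found;	/* any candidate recorded yet? */
};

/*
 * Model of xfs_alloc_cur_check()-style selection: a record replaces the
 * current best candidate only if it is closer to the hint (smaller diff).
 * Returns 1 when a new best candidate was recorded, 0 otherwise.
 */
static int model_cur_check(struct model_acur *acur, unsigned int rec_bno,
			   unsigned int rec_len, unsigned int bno,
			   unsigned int len, unsigned int diff)
{
	if (acur->found && diff >= acur->diff)
		return 0;

	acur->rec_bno = rec_bno;
	acur->rec_len = rec_len;
	acur->bno = bno;
	acur->len = len;
	acur->diff = diff;
	acur->found = true;
	return 1;
}
```

Because the best extent lives in the tracking structure rather than in cursor state, the caller no longer has to copy "right side" variables into the "left side" set before finishing the allocation.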

> I think this looks ok, but woof a lot to digest. :/
> 

This was the hairiest patch of the set IMO. :P

Brian

> --D
> 
> > -	ASSERT(ltnew >= ltbno);
> > -	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
> > -	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> > -	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
> > -	args->agbno = ltnew;
> > -
> > -	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
> > -				      rlen, XFSA_FIXUP_BNO_OK);
> > -	if (error)
> > -		goto out;
> > +	args->agbno = acur.bno;
> > +	args->len = acur.len;
> > +	ASSERT(acur.bno >= acur.rec_bno);
> > +	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> > +	ASSERT(acur.rec_bno + acur.rec_len <=
> > +	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> >  
> > -	if (j)
> > -		trace_xfs_alloc_near_greater(args);
> > -	else
> > -		trace_xfs_alloc_near_lesser(args);
> > +	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
> > +				      acur.rec_len, acur.bno, acur.len, 0);
> >  
> >  out:
> >  	xfs_alloc_cur_close(&acur, error);
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index b12fad3e45cb..2e57dc3d4230 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -1642,8 +1642,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
> > -DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
> > -DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
> > +DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
> > +DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 08/11] xfs: refactor and reuse best extent scanning logic
  2019-09-18 21:03   ` Darrick J. Wong
@ 2019-09-19 15:04     ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2019-09-19 15:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 18, 2019 at 02:03:20PM -0700, Darrick J. Wong wrote:
> On Mon, Sep 16, 2019 at 08:16:32AM -0400, Brian Foster wrote:
> > The bnobt "find best" helper implements a simple btree walker
> > function. This general pattern, or a subset thereof, is reused in
> > various parts of a near mode allocation operation. For example, the
> > bnobt left/right scans are each iterative btree walks along with the
> > cntbt lastblock scan.
> > 
> > Rework this function into a generic btree walker, add a couple
> > parameters to control termination behavior from various contexts and
> > reuse it where applicable.
> > 
> > XXX: This slightly changes the cntbt lastblock scan logic to scan
> > the entire block and use the best candidate as opposed to the
> > current logic of checking until we locate a perfect match. Fix up
> > cur_check() to terminate on perfect match.
> 
> Did that fix up happen in a previous patch?
> 

No, I think at one point I convinced myself to just drop this because it
didn't seem likely to find an exact extent via the cntbt and then forgot
to update the commit log. That said, I'll revisit this because as rare
as it might be, I think it might make sense to just deactivate the cntbt
cursor in the (acur->len == args->maxlen && acur->diff == 0) scenario
and let that terminate the search reasonably gracefully.
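A minimal model of that termination rule, using stand-in types rather than the real xfs structures (hypothetical sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures (not the real xfs types). */
struct model_cur  { bool active; };
struct model_args { unsigned int maxlen; };
struct model_acur {
	unsigned int	 len;	/* best candidate length */
	unsigned int	 diff;	/* best candidate locality diff */
	struct model_cur *cnt;	/* cntbt cursor */
};

/*
 * Model of the proposed early termination: once the tracked candidate is a
 * perfect match (maxlen-sized with zero locality diff), deactivate the
 * cntbt cursor so the caller's walk loop stops gracefully.
 */
static void model_check_done(struct model_args *args, struct model_acur *acur)
{
	if (acur->len == args->maxlen && acur->diff == 0)
		acur->cnt->active = false;
}
```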

> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 103 ++++++++++++++++++--------------------
> >  1 file changed, 48 insertions(+), 55 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 3eacc799c4cb..ab494fd50dd7 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -1172,30 +1172,38 @@ xfs_alloc_ag_vextent_exact(
> >  }
> >  
> >  /*
> > - * Search the btree in a given direction and check the records against the good
> > - * extent we've already found.
> > + * Search a given number of btree records in a given direction. Check each
> > + * record against the good extent we've already found.
> >   */
> >  STATIC int
> > -xfs_alloc_find_best_extent(
> > +xfs_alloc_walk_iter(
> >  	struct xfs_alloc_arg	*args,
> >  	struct xfs_alloc_cur	*acur,
> >  	struct xfs_btree_cur	*cur,
> > -	bool			increment)
> > +	bool			increment,
> > +	bool			find_one,/* quit on first candidate */
> 
> Space between the comma and the comment?
> 

Fixed and aligned both comments with spaces.

Brian

> --D
> 
> > +	int			count,	/* rec count (-1 for infinite) */
> > +	int			*stat)
> >  {
> >  	int			error;
> >  	int			i;
> >  
> > +	*stat = 0;
> > +
> >  	/*
> >  	 * Search so long as the cursor is active or we find a better extent.
> >  	 * The cursor is deactivated if it extends beyond the range of the
> >  	 * current allocation candidate.
> >  	 */
> > -	while (xfs_alloc_cur_active(cur)) {
> > +	while (xfs_alloc_cur_active(cur) && count) {
> >  		error = xfs_alloc_cur_check(args, acur, cur, &i);
> >  		if (error)
> >  			return error;
> > -		if (i == 1)
> > -			break;
> > +		if (i == 1) {
> > +			*stat = 1;
> > +			if (find_one)
> > +				break;
> > +		}
> >  		if (!xfs_alloc_cur_active(cur))
> >  			break;
> >  
> > @@ -1207,6 +1215,9 @@ xfs_alloc_find_best_extent(
> >  			return error;
> >  		if (i == 0)
> >  			cur->bc_private.a.priv.abt.active = false;
> > +
> > +		if (count > 0)
> > +			count--;
> >  	}
> >  
> >  	return 0;
> > @@ -1226,7 +1237,6 @@ xfs_alloc_ag_vextent_near(
> >  	struct xfs_btree_cur	*fbcur = NULL;
> >  	int		error;		/* error code */
> >  	int		i;		/* result code, temporary */
> > -	int		j;		/* result code, temporary */
> >  	xfs_agblock_t	bno;
> >  	xfs_extlen_t	len;
> >  	bool		fbinc = false;
> > @@ -1313,19 +1323,12 @@ xfs_alloc_ag_vextent_near(
> >  			if (!i)
> >  				break;
> >  		}
> > -		i = acur.cnt->bc_ptrs[0];
> > -		for (j = 1;
> > -		     !error && j && xfs_alloc_cur_active(acur.cnt) &&
> > -		     (acur.len < args->maxlen || acur.diff > 0);
> > -		     error = xfs_btree_increment(acur.cnt, 0, &j)) {
> > -			/*
> > -			 * For each entry, decide if it's better than
> > -			 * the previous best entry.
> > -			 */
> > -			error = xfs_alloc_cur_check(args, &acur, acur.cnt, &i);
> > -			if (error)
> > -				goto out;
> > -		}
> > +
> > +		error = xfs_alloc_walk_iter(args, &acur, acur.cnt, true, false,
> > +					    -1, &i);
> > +		if (error)
> > +			goto out;
> > +
> >  		/*
> >  		 * It didn't work.  We COULD be in a case where
> >  		 * there's a good record somewhere, so try again.
> > @@ -1357,49 +1360,39 @@ xfs_alloc_ag_vextent_near(
> >  		goto out;
> >  
> >  	/*
> > -	 * Loop going left with the leftward cursor, right with the
> > -	 * rightward cursor, until either both directions give up or
> > -	 * we find an entry at least as big as minlen.
> > +	 * Loop going left with the leftward cursor, right with the rightward
> > +	 * cursor, until either both directions give up or we find an entry at
> > +	 * least as big as minlen.
> >  	 */
> >  	do {
> > -		if (xfs_alloc_cur_active(acur.bnolt)) {
> > -			error = xfs_alloc_cur_check(args, &acur, acur.bnolt, &i);
> > -			if (error)
> > -				goto out;
> > -			if (i == 1) {
> > -				trace_xfs_alloc_cur_left(args);
> > -				fbcur = acur.bnogt;
> > -				fbinc = true;
> > -				break;
> > -			}
> > -			error = xfs_btree_decrement(acur.bnolt, 0, &i);
> > -			if (error)
> > -				goto out;
> > -			if (!i)
> > -				acur.bnolt->bc_private.a.priv.abt.active = false;
> > +		error = xfs_alloc_walk_iter(args, &acur, acur.bnolt, false,
> > +					    true, 1, &i);
> > +		if (error)
> > +			goto out;
> > +		if (i == 1) {
> > +			trace_xfs_alloc_cur_left(args);
> > +			fbcur = acur.bnogt;
> > +			fbinc = true;
> > +			break;
> >  		}
> > -		if (xfs_alloc_cur_active(acur.bnogt)) {
> > -			error = xfs_alloc_cur_check(args, &acur, acur.bnogt, &i);
> > -			if (error)
> > -				goto out;
> > -			if (i == 1) {
> > -				trace_xfs_alloc_cur_right(args);
> > -				fbcur = acur.bnolt;
> > -				fbinc = false;
> > -				break;
> > -			}
> > -			error = xfs_btree_increment(acur.bnogt, 0, &i);
> > -			if (error)
> > -				goto out;
> > -			if (!i)
> > -				acur.bnogt->bc_private.a.priv.abt.active = false;
> > +
> > +		error = xfs_alloc_walk_iter(args, &acur, acur.bnogt, true, true,
> > +					    1, &i);
> > +		if (error)
> > +			goto out;
> > +		if (i == 1) {
> > +			trace_xfs_alloc_cur_right(args);
> > +			fbcur = acur.bnolt;
> > +			fbinc = false;
> > +			break;
> >  		}
> >  	} while (xfs_alloc_cur_active(acur.bnolt) ||
> >  		 xfs_alloc_cur_active(acur.bnogt));
> >  
> >  	/* search the opposite direction for a better entry */
> >  	if (fbcur) {
> > -		error = xfs_alloc_find_best_extent(args, &acur, fbcur, fbinc);
> > +		error = xfs_alloc_walk_iter(args, &acur, fbcur, fbinc, true, -1,
> > +					    &i);
> >  		if (error)
> >  			goto out;
> >  	}
> > -- 
> > 2.20.1
> > 


* Re: [PATCH v4 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups
  2019-09-18 21:11   ` Darrick J. Wong
@ 2019-09-19 15:05     ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2019-09-19 15:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Sep 18, 2019 at 02:11:58PM -0700, Darrick J. Wong wrote:
> On Mon, Sep 16, 2019 at 08:16:35AM -0400, Brian Foster wrote:
> > The near mode fallback algorithm consists of a left/right scan of
> > the bnobt. This algorithm has very poor breakdown characteristics
> > under worst case free space fragmentation conditions. If a suitable
> > extent is far enough from the locality hint, each allocation may
> > scan most or all of the bnobt before it completes. This causes
> > pathological behavior and extremely high allocation latencies.
> > 
> > While locality is important to near mode allocations, it is not so
> > important as to incur pathological allocation latency to provide the
> > absolute best available locality for every allocation. If the
> > allocation is large enough or far enough away, there is a point of
> > diminishing returns. As such, we can bound the overall operation by
> > including an iterative cntbt lookup in the broader search. The cntbt
> > lookup is optimized to immediately find the extent with best
> > locality for the given size on each iteration. Since the cntbt is
> > indexed by extent size, the lookup repeats with a variably
> > aggressive increasing search key size until it runs off the edge of
> > the tree.
> > 
> > This approach provides a natural balance between the two algorithms
> > for various situations. For example, the bnobt scan is able to
> > satisfy smaller allocations such as for inode chunks or btree blocks
> > more quickly where the cntbt search may have to search through a
> > large set of extent sizes when the search key starts off small
> > relative to the largest extent in the tree. On the other hand, the
> > cntbt search more deterministically covers the set of suitable
> > extents for larger data extent allocation requests that the bnobt
> > scan may have to search the entire tree to locate.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 153 +++++++++++++++++++++++++++++++++++---
> >  fs/xfs/xfs_trace.h        |   2 +
> >  2 files changed, 143 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 381a08257aaf..4ec22040e516 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
...
> > @@ -1309,9 +1407,39 @@ xfs_alloc_ag_vextent_bnobt(
> >  			fbinc = false;
> >  			break;
> >  		}
> > +
> > +		/*
> > +		 * Check the extent with best locality based on the current
> > +		 * extent size search key and keep track of the best candidate.
> > +		 * If we fail to find anything due to busy extents, return
> > +		 * empty handed so the caller can flush and retry the search. If
> > +		 * no busy extents were found, walk backwards from the end of
> > +		 * the cntbt as a last resort.
> > +		 */
> > +		error = xfs_alloc_cntbt_iter(args, acur);
> > +		if (error)
> > +			return error;
> > +		if (!xfs_alloc_cur_active(acur->cnt)) {
> > +			trace_xfs_alloc_cur_lookup_done(args);
> > +			if (!acur->len && !acur->busy) {
> > +				error = xfs_btree_decrement(acur->cnt, 0, &i);
> > +				if (error)
> > +					return error;
> > +				if (i) {
> > +					acur->cnt->bc_private.a.priv.abt.active = true;
> 
> Line over 80 columns?
> 

Yeah..

> Or, put another way, could this be refactored not to have 5 levels of
> indent?
> 

Hmm.. I think this could just break out of the loop and then check
whether the cntbt cursor is !active again just after the loop to reset
the cursor and set up a cntbt reverse search. I'll look into it. Thanks
for the feedback...
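As a rough control-flow sketch of that restructuring (stand-in types and names, not the real xfs code):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in type; the real code operates on struct xfs_btree_cur. */
struct model_cur { bool active; int recs_left; };

/* Stand-in for xfs_btree_decrement(): stat is 1 while records remain. */
static int model_decrement(struct model_cur *cur, int *stat)
{
	*stat = cur->recs_left > 0 ? 1 : 0;
	if (*stat)
		cur->recs_left--;
	return 0;
}

/*
 * Sketch of the proposed shape: the search loop simply breaks when the
 * cntbt cursor goes inactive, and the last-resort reverse-search setup
 * runs once after the loop, at a single indentation level.
 */
static int model_locality_search(struct model_cur *cnt, bool found_len,
				 bool busy, struct model_cur **fbcur,
				 bool *fbinc)
{
	int error, i;

	while (cnt->active) {
		/* ... bnobt/cntbt iteration elided ... */
		cnt->active = false;	/* lookup ran off the tree */
	}

	/* Reverse-search setup hoisted out of the loop. */
	if (!cnt->active && !found_len && !busy) {
		error = model_decrement(cnt, &i);
		if (error)
			return error;
		if (i) {
			cnt->active = true;
			*fbcur = cnt;
			*fbinc = false;
		}
	}
	return 0;
}
```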

Brian

> Otherwise looks good.
> 
> --D
> 
> > +					fbcur = acur->cnt;
> > +					fbinc = false;
> > +				}
> > +			}
> > +			break;
> > +		}
> > +
> >  	}
> >  
> > -	/* search the opposite direction for a better entry */
> > +	/*
> > +	 * Search in the opposite direction for a better entry in the case of
> > +	 * a bnobt hit or walk backwards from the end of the cntbt.
> > +	 */
> >  	if (fbcur) {
> >  		error = xfs_alloc_walk_iter(args, acur, fbcur, fbinc, true, -1,
> >  					    &i);
> > @@ -1440,9 +1568,10 @@ xfs_alloc_ag_vextent_near(
> >  	}
> >  
> >  	/*
> > -	 * Second algorithm. Search the bnobt left and right.
> > +	 * Second algorithm. Combined cntbt and bnobt search to find ideal
> > +	 * locality.
> >  	 */
> > -	error = xfs_alloc_ag_vextent_bnobt(args, &acur, &i);
> > +	error = xfs_alloc_ag_vextent_locality(args, &acur, &i);
> >  	if (error)
> >  		goto out;
> >  
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index b8b93068efe7..0c9dfeac4e75 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -1645,6 +1645,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_cur);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
> > +DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup);
> > +DEFINE_ALLOC_EVENT(xfs_alloc_cur_lookup_done);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
> >  DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
> > -- 
> > 2.20.1
> > 


* Re: [PATCH v4 06/11] xfs: reuse best extent tracking logic for bnobt scan
  2019-09-19 15:04     ` Brian Foster
@ 2019-10-04 22:44       ` Darrick J. Wong
  2019-10-07 11:08         ` Brian Foster
  0 siblings, 1 reply; 32+ messages in thread
From: Darrick J. Wong @ 2019-10-04 22:44 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Sep 19, 2019 at 11:04:24AM -0400, Brian Foster wrote:
> On Wed, Sep 18, 2019 at 01:43:11PM -0700, Darrick J. Wong wrote:
> > On Mon, Sep 16, 2019 at 08:16:30AM -0400, Brian Foster wrote:
> > > The near mode bnobt scan searches left and right in the bnobt
> > > looking for the closest free extent to the allocation hint that
> > > satisfies minlen. Once such an extent is found, the left/right
> > > search terminates, we search one more time in the opposite direction
> > > and finish the allocation with the best overall extent.
> > > 
> > > The left/right and find best searches are currently controlled via a
> > > combination of cursor state and local variables. Clean up this code
> > > and prepare for further improvements to the near mode fallback
> > > algorithm by reusing the allocation cursor best extent tracking
> > > mechanism. Update the tracking logic to deactivate bnobt cursors
> > > when out of allocation range and replace open-coded extent checks to
> > > calls to the common helper. In doing so, rename some misnamed local
> > > variables in the top-level near mode allocation function.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_alloc.c | 270 +++++++++++---------------------------
> > >  fs/xfs/xfs_trace.h        |   4 +-
> > >  2 files changed, 76 insertions(+), 198 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index 2fa7bb6a00a8..edcec975c306 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> ...
> > > @@ -1163,96 +1172,44 @@ xfs_alloc_ag_vextent_exact(
> > >  }
> > >  
> > >  /*
> > > - * Search the btree in a given direction via the search cursor and compare
> > > - * the records found against the good extent we've already found.
> > > + * Search the btree in a given direction and check the records against the good
> > > + * extent we've already found.
> > >   */
> > >  STATIC int
> > >  xfs_alloc_find_best_extent(
> > > -	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> > > -	struct xfs_btree_cur	*gcur,	/* good cursor */
> > > -	struct xfs_btree_cur	*scur,	/* searching cursor */
> > > -	xfs_agblock_t		gdiff,	/* difference for search comparison */
> > > -	xfs_agblock_t		*sbno,	/* extent found by search */
> > > -	xfs_extlen_t		*slen,	/* extent length */
> > > -	xfs_agblock_t		*sbnoa,	/* aligned extent found by search */
> > > -	xfs_extlen_t		*slena,	/* aligned extent length */
> > > -	int			dir)	/* 0 = search right, 1 = search left */
> > > +	struct xfs_alloc_arg	*args,
> > > +	struct xfs_alloc_cur	*acur,
> > > +	struct xfs_btree_cur	*cur,
> > > +	bool			increment)
> > >  {
> > > -	xfs_agblock_t		new;
> > > -	xfs_agblock_t		sdiff;
> > >  	int			error;
> > >  	int			i;
> > > -	unsigned		busy_gen;
> > > -
> > > -	/* The good extent is perfect, no need to  search. */
> > > -	if (!gdiff)
> > > -		goto out_use_good;
> > >  
> > >  	/*
> > > -	 * Look until we find a better one, run out of space or run off the end.
> > > +	 * Search so long as the cursor is active or we find a better extent.
> > > +	 * The cursor is deactivated if it extends beyond the range of the
> > > +	 * current allocation candidate.
> > >  	 */
> > > -	do {
> > > -		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
> > > +	while (xfs_alloc_cur_active(cur)) {
> > > +		error = xfs_alloc_cur_check(args, acur, cur, &i);
> > 
> > Does @i > 0 now mean "we found a better candidate for allocation so
> > we're done" ?
> > 
> 
> In the context of xfs_alloc_cur_check(), i == 1 just means the checked
> extent became the new allocation candidate. Whether "we're done" or not
> is a higher level policy decision. In the xfs_alloc_find_best_extent()
> case, we're doing a last effort search in the opposite direction after
> finding an extent to allocate in the higher level left/right bnobt
> search. So yes, we are done in that scenario, but that's not necessarily
> the case for all calls to xfs_alloc_cur_check() that return i == 1.
> 
> Ideally this could be documented better, but this function is reworked
> into xfs_alloc_walk_iter() in the next couple patches and the behavior
> changes slightly.
> 
> > >  		if (error)
> > > -			goto error0;
> > > -		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> > > -		xfs_alloc_compute_aligned(args, *sbno, *slen,
> > > -				sbnoa, slena, &busy_gen);
> > > -
> > > -		/*
> > > -		 * The good extent is closer than this one.
> > > -		 */
> > > -		if (!dir) {
> > > -			if (*sbnoa > args->max_agbno)
> > > -				goto out_use_good;
> > > -			if (*sbnoa >= args->agbno + gdiff)
> > > -				goto out_use_good;
> > > -		} else {
> > > -			if (*sbnoa < args->min_agbno)
> > > -				goto out_use_good;
> > > -			if (*sbnoa <= args->agbno - gdiff)
> > > -				goto out_use_good;
> > > -		}
> > > -
> > > -		/*
> > > -		 * Same distance, compare length and pick the best.
> > > -		 */
> > > -		if (*slena >= args->minlen) {
> > > -			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
> > > -			xfs_alloc_fix_len(args);
> > > -
> > > -			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> > > -						       args->alignment,
> > > -						       args->datatype, *sbnoa,
> > > -						       *slena, &new);
> > > -
> > > -			/*
> > > -			 * Choose closer size and invalidate other cursor.
> > > -			 */
> > > -			if (sdiff < gdiff)
> > > -				goto out_use_search;
> > > -			goto out_use_good;
> > > -		}
> > > +			return error;
> > > +		if (i == 1)
> > > +			break;
> > > +		if (!xfs_alloc_cur_active(cur))
> > > +			break;
> > >  
> > > -		if (!dir)
> > > -			error = xfs_btree_increment(scur, 0, &i);
> > > +		if (increment)
> > > +			error = xfs_btree_increment(cur, 0, &i);
> > >  		else
> > > -			error = xfs_btree_decrement(scur, 0, &i);
> > > +			error = xfs_btree_decrement(cur, 0, &i);
> > >  		if (error)
> > > -			goto error0;
> > > -	} while (i);
> > > -
> > > -out_use_good:
> > > -	scur->bc_private.a.priv.abt.active = false;
> > > -	return 0;
> > > +			return error;
> > > +		if (i == 0)
> > > +			cur->bc_private.a.priv.abt.active = false;
> > 
> > ...and I guess @i == 0 here means "@cur hit the end of the records so
> > deactivate it"?
> > 
> 
> Yeah, this is a bit of a quirk in that xfs_btree_[inc|dec]rement() are
> generic btree functions and so there was no place to bury the 'active'
> updates in the first patch of the series. I suppose you could argue that
> updating active is a reason to create a couple
> xfs_alloc_[inc|dec]rement() wrappers, but I don't have a strong
> preference. Thoughts?

Heh, oops, forgot to reply to this.  I would say that if we're
open-coding a lot of:

	error = xfs_btree_increment(some_alloc_cur, 0, &i);
	if (error)
		return error;
	if (i == 0)
		some_alloc_cur->bc_private.a.priv.abt.active = false;

then yes, let's have a helper.  However, I'd only do that if there are
places other than this function that do this, and only as a cleanup to
add to the end of the series.
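A hypothetical wrapper along those lines, modeled with stand-in types rather than the real struct xfs_btree_cur (the real helper would call xfs_btree_increment()/xfs_btree_decrement() and clear bc_private.a.priv.abt.active):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for struct xfs_btree_cur and its allocbt state. */
struct model_cur { bool active; int pos; int nrecs; };

/* Stand-in for xfs_btree_increment(): stat is 1 while records remain. */
static int model_btree_increment(struct model_cur *cur, int *stat)
{
	cur->pos++;
	*stat = cur->pos < cur->nrecs ? 1 : 0;
	return 0;
}

/*
 * Hypothetical xfs_alloc_cur_increment()-style wrapper: step the cursor
 * and deactivate it when it runs off the end of the records, so callers
 * don't open-code the active-flag update.
 */
static int model_alloc_increment(struct model_cur *cur, int *stat)
{
	int error = model_btree_increment(cur, stat);

	if (error)
		return error;
	if (*stat == 0)
		cur->active = false;
	return 0;
}
```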

--D

> 
> > > +	}
> > >  
> > > -out_use_search:
> > > -	gcur->bc_private.a.priv.abt.active = false;
> > >  	return 0;
> > > -
> > > -error0:
> > > -	/* caller invalidates cursors */
> > > -	return error;
> > >  }
> > >  
> > >  /*
> > > @@ -1266,23 +1223,13 @@ xfs_alloc_ag_vextent_near(
> > >  	struct xfs_alloc_arg	*args)
> > >  {
> > >  	struct xfs_alloc_cur	acur = {0,};
> > > -	struct xfs_btree_cur	*bno_cur;
> > > -	xfs_agblock_t	gtbno;		/* start bno of right side entry */
> > > -	xfs_agblock_t	gtbnoa;		/* aligned ... */
> > > -	xfs_extlen_t	gtdiff;		/* difference to right side entry */
> > > -	xfs_extlen_t	gtlen;		/* length of right side entry */
> > > -	xfs_extlen_t	gtlena;		/* aligned ... */
> > > -	xfs_agblock_t	gtnew;		/* useful start bno of right side */
> > > +	struct xfs_btree_cur	*fbcur = NULL;
> > >  	int		error;		/* error code */
> > >  	int		i;		/* result code, temporary */
> > >  	int		j;		/* result code, temporary */
> > > -	xfs_agblock_t	ltbno;		/* start bno of left side entry */
> > > -	xfs_agblock_t	ltbnoa;		/* aligned ... */
> > > -	xfs_extlen_t	ltdiff;		/* difference to left side entry */
> > > -	xfs_extlen_t	ltlen;		/* length of left side entry */
> > > -	xfs_extlen_t	ltlena;		/* aligned ... */
> > > -	xfs_agblock_t	ltnew;		/* useful start bno of left side */
> > > -	xfs_extlen_t	rlen;		/* length of returned extent */
> > > +	xfs_agblock_t	bno;
> > 
> > Is the indenting here inconsistent with the *fbcur declaration above?
> > 
> 
> No, will fix.
> 
> > > +	xfs_extlen_t	len;
> > > +	bool		fbinc = false;
> > >  #ifdef DEBUG
> > >  	/*
> > >  	 * Randomly don't execute the first algorithm.
> ...
> > > @@ -1524,47 +1434,15 @@ xfs_alloc_ag_vextent_near(
> > >  		goto out;
> > >  	}
> > >  
> > > -	/*
> > > -	 * At this point we have selected a freespace entry, either to the
> > > -	 * left or to the right.  If it's on the right, copy all the
> > > -	 * useful variables to the "left" set so we only have one
> > > -	 * copy of this code.
> > > -	 */
> > > -	if (xfs_alloc_cur_active(acur.bnogt)) {
> > > -		bno_cur = acur.bnogt;
> > > -		ltbno = gtbno;
> > > -		ltbnoa = gtbnoa;
> > > -		ltlen = gtlen;
> > > -		ltlena = gtlena;
> > > -		j = 1;
> > > -	} else {
> > > -		bno_cur = acur.bnolt;
> > > -		j = 0;
> > > -	}
> > > -
> > > -	/*
> > > -	 * Fix up the length and compute the useful address.
> > > -	 */
> > > -	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> > > -	xfs_alloc_fix_len(args);
> > > -	rlen = args->len;
> > > -	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
> > > -				     args->datatype, ltbnoa, ltlena, &ltnew);
> > 
> > Hmm.  So I /think/ the reason this can go away is that
> > xfs_alloc_cur_check already did all this trimming and whatnot and
> > stuffed the values into the xfs_alloc_cur structure, so we can just copy
> > the allocated extent to the caller's @args, update the bnobt/cntbt to
> > reflect the allocation, and exit?
> > 
> 
> Right, the code just above should be a subset of what
> xfs_alloc_cur_check() is already doing before it selects a new
> candidate. This was necessary in the current code because this was all
> tracked by cursor state (i.e. xfs_alloc_find_best_extent() decides
> whether to use the left or right cursor). This is all replaced with
> explicit "best extent" tracking, independent from logic around things
> like which direction to do the "find best" search in, etc.
> 
> > I think this looks ok, but woof a lot to digest. :/
> > 
> 
> This was the hairiest patch of the set IMO. :P
> 
> Brian
> 
> > --D
> > 
> > > -	ASSERT(ltnew >= ltbno);
> > > -	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
> > > -	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> > > -	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
> > > -	args->agbno = ltnew;
> > > -
> > > -	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
> > > -				      rlen, XFSA_FIXUP_BNO_OK);
> > > -	if (error)
> > > -		goto out;
> > > +	args->agbno = acur.bno;
> > > +	args->len = acur.len;
> > > +	ASSERT(acur.bno >= acur.rec_bno);
> > > +	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> > > +	ASSERT(acur.rec_bno + acur.rec_len <=
> > > +	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> > >  
> > > -	if (j)
> > > -		trace_xfs_alloc_near_greater(args);
> > > -	else
> > > -		trace_xfs_alloc_near_lesser(args);
> > > +	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
> > > +				      acur.rec_len, acur.bno, acur.len, 0);
> > >  
> > >  out:
> > >  	xfs_alloc_cur_close(&acur, error);
> > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > index b12fad3e45cb..2e57dc3d4230 100644
> > > --- a/fs/xfs/xfs_trace.h
> > > +++ b/fs/xfs/xfs_trace.h
> > > @@ -1642,8 +1642,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
> > >  DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
> > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
> > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
> > > -DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
> > > -DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
> > > +DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
> > > +DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
> > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
> > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
> > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
> > > -- 
> > > 2.20.1
> > > 


* Re: [PATCH v4 06/11] xfs: reuse best extent tracking logic for bnobt scan
  2019-10-04 22:44       ` Darrick J. Wong
@ 2019-10-07 11:08         ` Brian Foster
  0 siblings, 0 replies; 32+ messages in thread
From: Brian Foster @ 2019-10-07 11:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Oct 04, 2019 at 03:44:43PM -0700, Darrick J. Wong wrote:
> On Thu, Sep 19, 2019 at 11:04:24AM -0400, Brian Foster wrote:
> > On Wed, Sep 18, 2019 at 01:43:11PM -0700, Darrick J. Wong wrote:
> > > On Mon, Sep 16, 2019 at 08:16:30AM -0400, Brian Foster wrote:
> > > > The near mode bnobt scan searches left and right in the bnobt
> > > > looking for the closest free extent to the allocation hint that
> > > > satisfies minlen. Once such an extent is found, the left/right
> > > > search terminates, we search one more time in the opposite direction
> > > > and finish the allocation with the best overall extent.
> > > > 
> > > > The left/right and find best searches are currently controlled via a
> > > > combination of cursor state and local variables. Clean up this code
> > > > and prepare for further improvements to the near mode fallback
> > > > algorithm by reusing the allocation cursor best extent tracking
> > > > mechanism. Update the tracking logic to deactivate bnobt cursors
> > > > when out of allocation range and replace open-coded extent checks to
> > > > calls to the common helper. In doing so, rename some misnamed local
> > > > variables in the top-level near mode allocation function.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_alloc.c | 270 +++++++++++---------------------------
> > > >  fs/xfs/xfs_trace.h        |   4 +-
> > > >  2 files changed, 76 insertions(+), 198 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > > index 2fa7bb6a00a8..edcec975c306 100644
> > > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > ...
> > > > @@ -1163,96 +1172,44 @@ xfs_alloc_ag_vextent_exact(
> > > >  }
> > > >  
> > > >  /*
> > > > - * Search the btree in a given direction via the search cursor and compare
> > > > - * the records found against the good extent we've already found.
> > > > + * Search the btree in a given direction and check the records against the good
> > > > + * extent we've already found.
> > > >   */
> > > >  STATIC int
> > > >  xfs_alloc_find_best_extent(
> > > > -	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> > > > -	struct xfs_btree_cur	*gcur,	/* good cursor */
> > > > -	struct xfs_btree_cur	*scur,	/* searching cursor */
> > > > -	xfs_agblock_t		gdiff,	/* difference for search comparison */
> > > > -	xfs_agblock_t		*sbno,	/* extent found by search */
> > > > -	xfs_extlen_t		*slen,	/* extent length */
> > > > -	xfs_agblock_t		*sbnoa,	/* aligned extent found by search */
> > > > -	xfs_extlen_t		*slena,	/* aligned extent length */
> > > > -	int			dir)	/* 0 = search right, 1 = search left */
> > > > +	struct xfs_alloc_arg	*args,
> > > > +	struct xfs_alloc_cur	*acur,
> > > > +	struct xfs_btree_cur	*cur,
> > > > +	bool			increment)
> > > >  {
> > > > -	xfs_agblock_t		new;
> > > > -	xfs_agblock_t		sdiff;
> > > >  	int			error;
> > > >  	int			i;
> > > > -	unsigned		busy_gen;
> > > > -
> > > > -	/* The good extent is perfect, no need to  search. */
> > > > -	if (!gdiff)
> > > > -		goto out_use_good;
> > > >  
> > > >  	/*
> > > > -	 * Look until we find a better one, run out of space or run off the end.
> > > > +	 * Search so long as the cursor is active or we find a better extent.
> > > > +	 * The cursor is deactivated if it extends beyond the range of the
> > > > +	 * current allocation candidate.
> > > >  	 */
> > > > -	do {
> > > > -		error = xfs_alloc_get_rec(scur, sbno, slen, &i);
> > > > +	while (xfs_alloc_cur_active(cur)) {
> > > > +		error = xfs_alloc_cur_check(args, acur, cur, &i);
> > > 
> > > Does @i > 0 now mean "we found a better candidate for allocation so
> > > we're done" ?
> > > 
> > 
> > In the context of xfs_alloc_cur_check(), i == 1 just means the checked
> > extent became the new allocation candidate. Whether "we're done" or not
> > is a higher level policy decision. In the xfs_alloc_find_best_extent()
> > case, we're doing a last effort search in the opposite direction after
> > finding an extent to allocate in the higher level left/right bnobt
> > search. So yes, we are done in that scenario, but that's not necessarily
> > the case for all calls to xfs_alloc_cur_check() that return i == 1.
> > 
> > Ideally this could be documented better, but this function is reworked
> > into xfs_alloc_walk_iter() in the next couple patches and the behavior
> > changes slightly.
> > 
> > > >  		if (error)
> > > > -			goto error0;
> > > > -		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
> > > > -		xfs_alloc_compute_aligned(args, *sbno, *slen,
> > > > -				sbnoa, slena, &busy_gen);
> > > > -
> > > > -		/*
> > > > -		 * The good extent is closer than this one.
> > > > -		 */
> > > > -		if (!dir) {
> > > > -			if (*sbnoa > args->max_agbno)
> > > > -				goto out_use_good;
> > > > -			if (*sbnoa >= args->agbno + gdiff)
> > > > -				goto out_use_good;
> > > > -		} else {
> > > > -			if (*sbnoa < args->min_agbno)
> > > > -				goto out_use_good;
> > > > -			if (*sbnoa <= args->agbno - gdiff)
> > > > -				goto out_use_good;
> > > > -		}
> > > > -
> > > > -		/*
> > > > -		 * Same distance, compare length and pick the best.
> > > > -		 */
> > > > -		if (*slena >= args->minlen) {
> > > > -			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
> > > > -			xfs_alloc_fix_len(args);
> > > > -
> > > > -			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
> > > > -						       args->alignment,
> > > > -						       args->datatype, *sbnoa,
> > > > -						       *slena, &new);
> > > > -
> > > > -			/*
> > > > -			 * Choose closer size and invalidate other cursor.
> > > > -			 */
> > > > -			if (sdiff < gdiff)
> > > > -				goto out_use_search;
> > > > -			goto out_use_good;
> > > > -		}
> > > > +			return error;
> > > > +		if (i == 1)
> > > > +			break;
> > > > +		if (!xfs_alloc_cur_active(cur))
> > > > +			break;
> > > >  
> > > > -		if (!dir)
> > > > -			error = xfs_btree_increment(scur, 0, &i);
> > > > +		if (increment)
> > > > +			error = xfs_btree_increment(cur, 0, &i);
> > > >  		else
> > > > -			error = xfs_btree_decrement(scur, 0, &i);
> > > > +			error = xfs_btree_decrement(cur, 0, &i);
> > > >  		if (error)
> > > > -			goto error0;
> > > > -	} while (i);
> > > > -
> > > > -out_use_good:
> > > > -	scur->bc_private.a.priv.abt.active = false;
> > > > -	return 0;
> > > > +			return error;
> > > > +		if (i == 0)
> > > > +			cur->bc_private.a.priv.abt.active = false;
> > > 
> > > ...and I guess @i == 0 here means "@cur hit the end of the records so
> > > deactivate it"?
> > > 
> > 
> > Yeah, this is a bit of a quirk in that xfs_btree_[inc|dec]rement() are
> > generic btree functions and so there was no place to bury the 'active'
> > updates in the first patch of the series. I suppose you could argue that
> > updating active is a reason to create a couple
> > xfs_alloc_[inc|dec]rement() wrappers, but I don't have a strong
> > preference. Thoughts?
> 
> Heh, oops, forgot to reply to this.  I would say that if we're
> open-coding a lot of:
> 
> 	error = xfs_btree_increment(some_alloc_cur, &i);
> 	if (error)
> 		return error;
> 	if (i == 0)
> 		some_alloc_cur->bc_private.a.priv.abt.active = false;
> 
> then yes, let's have a helper.  However, I'd only do that if there are
> places other than this function that do this, and only as a cleanup to
> add to the end of the series.
> 

That pattern is used in at least a couple places as of this series. Once
in xfs_alloc_walk_iter() (to increment or decrement) and again in
xfs_alloc_ag_vextent_locality() to restore a cntbt cursor for a fallback
reverse search. I'll start the next round of reworks off with a patch to
refactor those into a helper. Thanks for the reviews.

Brian

> --D
> 
> > 
> > > > +	}
> > > >  
> > > > -out_use_search:
> > > > -	gcur->bc_private.a.priv.abt.active = false;
> > > >  	return 0;
> > > > -
> > > > -error0:
> > > > -	/* caller invalidates cursors */
> > > > -	return error;
> > > >  }
> > > >  
> > > >  /*
> > > > @@ -1266,23 +1223,13 @@ xfs_alloc_ag_vextent_near(
> > > >  	struct xfs_alloc_arg	*args)
> > > >  {
> > > >  	struct xfs_alloc_cur	acur = {0,};
> > > > -	struct xfs_btree_cur	*bno_cur;
> > > > -	xfs_agblock_t	gtbno;		/* start bno of right side entry */
> > > > -	xfs_agblock_t	gtbnoa;		/* aligned ... */
> > > > -	xfs_extlen_t	gtdiff;		/* difference to right side entry */
> > > > -	xfs_extlen_t	gtlen;		/* length of right side entry */
> > > > -	xfs_extlen_t	gtlena;		/* aligned ... */
> > > > -	xfs_agblock_t	gtnew;		/* useful start bno of right side */
> > > > +	struct xfs_btree_cur	*fbcur = NULL;
> > > >  	int		error;		/* error code */
> > > >  	int		i;		/* result code, temporary */
> > > >  	int		j;		/* result code, temporary */
> > > > -	xfs_agblock_t	ltbno;		/* start bno of left side entry */
> > > > -	xfs_agblock_t	ltbnoa;		/* aligned ... */
> > > > -	xfs_extlen_t	ltdiff;		/* difference to left side entry */
> > > > -	xfs_extlen_t	ltlen;		/* length of left side entry */
> > > > -	xfs_extlen_t	ltlena;		/* aligned ... */
> > > > -	xfs_agblock_t	ltnew;		/* useful start bno of left side */
> > > > -	xfs_extlen_t	rlen;		/* length of returned extent */
> > > > +	xfs_agblock_t	bno;
> > > 
> > > Is the indenting here inconsistent with the *fbcur declaration above?
> > > 
> > 
> > No, will fix.
> > 
> > > > +	xfs_extlen_t	len;
> > > > +	bool		fbinc = false;
> > > >  #ifdef DEBUG
> > > >  	/*
> > > >  	 * Randomly don't execute the first algorithm.
> > ...
> > > > @@ -1524,47 +1434,15 @@ xfs_alloc_ag_vextent_near(
> > > >  		goto out;
> > > >  	}
> > > >  
> > > > -	/*
> > > > -	 * At this point we have selected a freespace entry, either to the
> > > > -	 * left or to the right.  If it's on the right, copy all the
> > > > -	 * useful variables to the "left" set so we only have one
> > > > -	 * copy of this code.
> > > > -	 */
> > > > -	if (xfs_alloc_cur_active(acur.bnogt)) {
> > > > -		bno_cur = acur.bnogt;
> > > > -		ltbno = gtbno;
> > > > -		ltbnoa = gtbnoa;
> > > > -		ltlen = gtlen;
> > > > -		ltlena = gtlena;
> > > > -		j = 1;
> > > > -	} else {
> > > > -		bno_cur = acur.bnolt;
> > > > -		j = 0;
> > > > -	}
> > > > -
> > > > -	/*
> > > > -	 * Fix up the length and compute the useful address.
> > > > -	 */
> > > > -	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
> > > > -	xfs_alloc_fix_len(args);
> > > > -	rlen = args->len;
> > > > -	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
> > > > -				     args->datatype, ltbnoa, ltlena, &ltnew);
> > > 
> > > Hmm.  So I /think/ the reason this can go away is that
> > > xfs_alloc_cur_check already did all this trimming and whatnot and
> > > stuffed the values into the xfs_alloc_cur structure, so we can just copy
> > > the allocated extent to the caller's @args, update the bnobt/cntbt to
> > > reflect the allocation, and exit?
> > > 
> > 
> > Right, the code just above should be a subset of what
> > xfs_alloc_cur_check() is already doing before it selects a new
> > candidate. This was necessary in the current code because this was all
> > tracked by cursor state (i.e. xfs_alloc_find_best_extent() decides
> > whether to use the left or right cursor). This is all replaced with
> > explicit "best extent" tracking, independent from logic around things
> > like which direction to do the "find best" search in, etc.
> > 
> > > I think this looks ok, but woof a lot to digest. :/
> > > 
> > 
> > This was the hairiest patch of the set IMO. :P
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > -	ASSERT(ltnew >= ltbno);
> > > > -	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
> > > > -	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> > > > -	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
> > > > -	args->agbno = ltnew;
> > > > -
> > > > -	error = xfs_alloc_fixup_trees(acur.cnt, bno_cur, ltbno, ltlen, ltnew,
> > > > -				      rlen, XFSA_FIXUP_BNO_OK);
> > > > -	if (error)
> > > > -		goto out;
> > > > +	args->agbno = acur.bno;
> > > > +	args->len = acur.len;
> > > > +	ASSERT(acur.bno >= acur.rec_bno);
> > > > +	ASSERT(acur.bno + acur.len <= acur.rec_bno + acur.rec_len);
> > > > +	ASSERT(acur.rec_bno + acur.rec_len <=
> > > > +	       be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
> > > >  
> > > > -	if (j)
> > > > -		trace_xfs_alloc_near_greater(args);
> > > > -	else
> > > > -		trace_xfs_alloc_near_lesser(args);
> > > > +	error = xfs_alloc_fixup_trees(acur.cnt, acur.bnolt, acur.rec_bno,
> > > > +				      acur.rec_len, acur.bno, acur.len, 0);
> > > >  
> > > >  out:
> > > >  	xfs_alloc_cur_close(&acur, error);
> > > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > > index b12fad3e45cb..2e57dc3d4230 100644
> > > > --- a/fs/xfs/xfs_trace.h
> > > > +++ b/fs/xfs/xfs_trace.h
> > > > @@ -1642,8 +1642,8 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
> > > >  DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
> > > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
> > > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
> > > > -DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
> > > > -DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
> > > > +DEFINE_ALLOC_EVENT(xfs_alloc_cur_right);
> > > > +DEFINE_ALLOC_EVENT(xfs_alloc_cur_left);
> > > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
> > > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
> > > >  DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
> > > > -- 
> > > > 2.20.1
> > > > 



Thread overview: 32+ messages
2019-09-16 12:16 [PATCH v4 00/11] xfs: rework near mode extent allocation Brian Foster
2019-09-16 12:16 ` [PATCH v4 01/11] xfs: track active state of allocation btree cursors Brian Foster
2019-09-18 18:38   ` Darrick J. Wong
2019-09-19 14:57     ` Brian Foster
2019-09-16 12:16 ` [PATCH v4 02/11] xfs: introduce allocation cursor data structure Brian Foster
2019-09-18 18:46   ` Darrick J. Wong
2019-09-16 12:16 ` [PATCH v4 03/11] xfs: track allocation busy state in allocation cursor Brian Foster
2019-09-18 18:48   ` Darrick J. Wong
2019-09-19 14:58     ` Brian Foster
2019-09-16 12:16 ` [PATCH v4 04/11] xfs: track best extent from cntbt lastblock scan in alloc cursor Brian Foster
2019-09-18 18:56   ` Darrick J. Wong
2019-09-19 15:00     ` Brian Foster
2019-09-16 12:16 ` [PATCH v4 05/11] xfs: refactor cntbt lastblock scan best extent logic into helper Brian Foster
2019-09-18 19:03   ` Darrick J. Wong
2019-09-19 15:01     ` Brian Foster
2019-09-16 12:16 ` [PATCH v4 06/11] xfs: reuse best extent tracking logic for bnobt scan Brian Foster
2019-09-18 20:43   ` Darrick J. Wong
2019-09-19 15:04     ` Brian Foster
2019-10-04 22:44       ` Darrick J. Wong
2019-10-07 11:08         ` Brian Foster
2019-09-16 12:16 ` [PATCH v4 07/11] xfs: refactor allocation tree fixup code Brian Foster
2019-09-18 20:44   ` Darrick J. Wong
2019-09-16 12:16 ` [PATCH v4 08/11] xfs: refactor and reuse best extent scanning logic Brian Foster
2019-09-18 21:03   ` Darrick J. Wong
2019-09-19 15:04     ` Brian Foster
2019-09-16 12:16 ` [PATCH v4 09/11] xfs: refactor near mode alloc bnobt scan into separate function Brian Foster
2019-09-18 20:55   ` Darrick J. Wong
2019-09-16 12:16 ` [PATCH v4 10/11] xfs: factor out tree fixup logic into helper Brian Foster
2019-09-18 20:56   ` Darrick J. Wong
2019-09-16 12:16 ` [PATCH v4 11/11] xfs: optimize near mode bnobt scans with concurrent cntbt lookups Brian Foster
2019-09-18 21:11   ` Darrick J. Wong
2019-09-19 15:05     ` Brian Foster
