* [PATCH 0/6] xfs: rework extent allocation
@ 2019-05-09 16:58 Brian Foster
From: Brian Foster @ 2019-05-09 16:58 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This is a first (non-rfc) pass at the XFS extent allocation rework. This
series aims to 1) improve the near mode allocation algorithm so it
degrades more gracefully under free space fragmentation, by taking
advantage of locality optimized cntbt lookups instead of relying so
heavily on bnobt scans, and 2) refactor the remaining allocation modes
to reuse similar code and reduce duplication.

Patches 1 and 2 are cleanups to the small allocation mode fallback
function.  Patch 3 reworks the near mode allocation algorithm as noted
above and introduces reusable infrastructure. Patches 4 and 5 rework the
exact and by-size allocation modes respectively. Patch 6 removes the
unused bits of the small allocation mode and refactors the remaining
code into an AGFL allocation helper.

With regard to regression testing, this series survives multiple fstests
runs with various geometries, such as defaults, abnormally high AG
counts, and nonstandard block sizes. This also survives several days of
iterative, concurrent stress loads of fsstress, fs_mark and growfiles
(xfstests/ltp/growfiles.c) to repeatedly run a filesystem out of space
without any explosions.

With regard to performance, I have a user provided metadump image with
heavy free space fragmentation that demonstrates pathological allocation
behavior. An fs_mark test against this filesystem image shows an
improvement from 30-100 files/second to 600-800 files/second. On newly
created filesystems, I ran a filebench workload of 16 file creator
threads creating a mix of small (4k-1mb), medium (10mb-100mb) and large
(500mb-1g) files and issuing fsyncs; it shows comparable performance
and post-test free space properties with and without the patchset.
Finally, I've also run some ad hoc tests with fs_mark and fio (fallocate
and aio engines doing random writes) and observe no notable regressions
with these patches.

I may continue to experiment with filebench but so far I've not seen
anything that shows significant regressions. Let me know if anybody
would like to see test output or configuration details or whatnot from
any of that, or would like me to run any additional tests, etc.

Thoughts, reviews, flames appreciated.

Brian

v1:
- Continued development (various fixes, refinements) on generic bits and
  near mode implementation.
- Added patches 4-6 to refactor exact, by-size and small allocation
  modes.
rfcv2: https://marc.info/?l=linux-xfs&m=155197946630582&w=2
- Dropped spurious initial refactoring.
- Added minlen functionality.
- Properly tied into near alloc path.
- General refactoring and cleanups.
rfcv1: https://marc.info/?l=linux-xfs&m=154479089914351&w=2

Brian Foster (6):
  xfs: refactor small allocation helper to skip cntbt attempt
  xfs: always update params on small allocation
  xfs: use locality optimized cntbt lookups for near mode allocations
  xfs: refactor exact extent allocation mode
  xfs: refactor by-size extent allocation mode
  xfs: replace small allocation logic with agfl only logic

 fs/xfs/libxfs/xfs_alloc.c | 1375 +++++++++++++++----------------------
 fs/xfs/xfs_trace.h        |   38 +-
 2 files changed, 595 insertions(+), 818 deletions(-)

-- 
2.17.2


* [PATCH 1/6] xfs: refactor small allocation helper to skip cntbt attempt
From: Brian Foster @ 2019-05-09 16:58 UTC (permalink / raw)
  To: linux-xfs

The small allocation helper is fairly tightly integrated with the
existing allocation algorithms. It expects
a cntbt cursor beyond the end of the tree, attempts to locate the
last record in the tree and only attempts an AGFL allocation if the
cntbt is empty.

The upcoming generic algorithm doesn't rely on the cntbt processing
of this function. It will only call this function when the cntbt
doesn't have a big enough extent or is empty and thus AGFL
allocation is the only remaining option. Tweak
xfs_alloc_ag_vextent_small() to handle a NULL cntbt cursor and skip
the cntbt logic. This facilitates use by the existing allocation
code and new code that only requires an AGFL allocation attempt.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 52 +++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index a9ff3cf82cce..55eda416e18b 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1589,33 +1589,36 @@ xfs_alloc_ag_vextent_size(
  */
 STATIC int			/* error */
 xfs_alloc_ag_vextent_small(
-	xfs_alloc_arg_t	*args,	/* allocation argument structure */
-	xfs_btree_cur_t	*ccur,	/* by-size cursor */
-	xfs_agblock_t	*fbnop,	/* result block number */
-	xfs_extlen_t	*flenp,	/* result length */
-	int		*stat)	/* status: 0-freelist, 1-normal/none */
+	struct xfs_alloc_arg	*args,	/* allocation argument structure */
+	struct xfs_btree_cur	*ccur,	/* optional by-size cursor */
+	xfs_agblock_t		*fbnop,	/* result block number */
+	xfs_extlen_t		*flenp,	/* result length */
+	int			*stat)	/* status: 0-freelist, 1-normal/none */
 {
-	int		error;
-	xfs_agblock_t	fbno;
-	xfs_extlen_t	flen;
-	int		i;
+	int			error = 0;
+	xfs_agblock_t		fbno;
+	xfs_extlen_t		flen;
+	int			i = 0;
 
-	if ((error = xfs_btree_decrement(ccur, 0, &i)))
+	/*
+	 * If a cntbt cursor is provided, try to allocate the largest record in
+	 * the tree. Try the AGFL if the cntbt is empty, otherwise fail the
+	 * allocation. Make sure to respect minleft even when pulling from the
+	 * freelist.
+	 */
+	if (ccur)
+		error = xfs_btree_decrement(ccur, 0, &i);
+	if (error)
 		goto error0;
 	if (i) {
-		if ((error = xfs_alloc_get_rec(ccur, &fbno, &flen, &i)))
+		error = xfs_alloc_get_rec(ccur, &fbno, &flen, &i);
+		if (error)
 			goto error0;
 		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-	}
-	/*
-	 * Nothing in the btree, try the freelist.  Make sure
-	 * to respect minleft even when pulling from the
-	 * freelist.
-	 */
-	else if (args->minlen == 1 && args->alignment == 1 &&
-		 args->resv != XFS_AG_RESV_AGFL &&
-		 (be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_flcount)
-		  > args->minleft)) {
+	} else if (args->minlen == 1 && args->alignment == 1 &&
+		   args->resv != XFS_AG_RESV_AGFL &&
+		   (be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_flcount) >
+		    args->minleft)) {
 		error = xfs_alloc_get_freelist(args->tp, args->agbp, &fbno, 0);
 		if (error)
 			goto error0;
@@ -1661,14 +1664,11 @@ xfs_alloc_ag_vextent_small(
 		 */
 		else
 			flen = 0;
-	}
-	/*
-	 * Can't allocate from the freelist for some reason.
-	 */
-	else {
+	} else {
 		fbno = NULLAGBLOCK;
 		flen = 0;
 	}
+
 	/*
 	 * Can't do the allocation, give up.
 	 */
-- 
2.17.2


* [PATCH 2/6] xfs: always update params on small allocation
From: Brian Foster @ 2019-05-09 16:58 UTC (permalink / raw)
  To: linux-xfs

xfs_alloc_ag_vextent_small() doesn't update the output parameters in
the event of an AGFL allocation. Instead, it updates the
xfs_alloc_arg structure directly to complete the allocation.

Update both args and the output params to provide consistent
behavior for future callers.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 55eda416e18b..231e8ca5cce0 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1637,8 +1637,6 @@ xfs_alloc_ag_vextent_small(
 				}
 				xfs_trans_binval(args->tp, bp);
 			}
-			args->len = 1;
-			args->agbno = fbno;
 			XFS_WANT_CORRUPTED_GOTO(args->mp,
 				args->agbno + args->len <=
 				be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length),
@@ -1656,6 +1654,8 @@ xfs_alloc_ag_vextent_small(
 			if (error)
 				goto error0;
 
+			*fbnop = args->agbno = fbno;
+			*flenp = args->len = 1;
 			*stat = 0;
 			return 0;
 		}
-- 
2.17.2


* [PATCH 3/6] xfs: use locality optimized cntbt lookups for near mode allocations
From: Brian Foster @ 2019-05-09 16:58 UTC (permalink / raw)
  To: linux-xfs

The extent allocation code in XFS has several allocation modes with
unique implementations. This is slightly unfortunate as the
allocation modes are not all that different from a high level
perspective. The most involved mode is the near allocation mode
which attempts to allocate an optimally sized extent with ideal
locality with respect to a provided agbno.

In the common case, a near mode allocation consists of a conditional
scan of the last cntbt block followed by concurrent left and right
spanning searches of the bnobt starting from the ideal point of
locality. This works reasonably well as filesystems age
via most common allocation patterns. If free space fragments as the
filesystem ages, however, the near algorithm has very poor breakdown
characteristics. If the extent size lookup happens to land outside
(i.e., before) the last cntbt block, the alloc bypasses the cntbt
entirely. If a suitably sized extent lies beyond a large enough
number of unusable extents from the starting point(s) of the bnobt
search, the bnobt search can take a significant amount of time to
locate the target extent. This leads to pathological allocation
latencies in certain workloads.

The near allocation algorithm can be fundamentally improved to take
advantage of a preexisting mechanism: that by-size cntbt record
lookups can incorporate locality. This means that a single cntbt
lookup can return the extent of a particular size with best
locality. A single locality lookup only covers extents of the
requested size, but for larger extent allocations, repeated locality
lookups of increasing sizes can search more efficiently than the
bnobt scan because it isolates the search space to extents of
suitable size. Such a cntbt search may not always find the extent
with absolute best locality, but trading good enough locality for a
more efficient scan is worthwhile: more often than not, extent size
and contiguity matter more for performance than perfect locality for
data allocations.

This patch introduces generic allocation infrastructure for cursor
setup/teardown, selected extent allocation and various means of
btree scanning. Based on this infrastructure, it reimplements the
near allocation algorithm to balance between repeated cntbt lookups
and bnobt left/right scans as opposed to effectively choosing one
search algorithm or the other. This provides more predictable
allocation latency under breakdown conditions with good enough
locality in the common case. The algorithm naturally balances
between smaller and larger allocations as smaller allocations are
more likely to be satisfied immediately from the bnobt whereas
larger allocations are more likely satisfied by the cntbt. The
generic infrastructure introduced by this patch will be reused to
reduce code duplication between different, but conceptually similar
allocation modes in subsequent patches.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 895 ++++++++++++++++++++------------------
 fs/xfs/xfs_trace.h        |  26 +-
 2 files changed, 483 insertions(+), 438 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 231e8ca5cce0..112411a46891 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -39,7 +39,7 @@ struct workqueue_struct *xfs_alloc_wq;
 #define	XFSA_FIXUP_CNT_OK	2
 
 STATIC int xfs_alloc_ag_vextent_exact(xfs_alloc_arg_t *);
-STATIC int xfs_alloc_ag_vextent_near(xfs_alloc_arg_t *);
+STATIC int xfs_alloc_ag_vextent_type(struct xfs_alloc_arg *);
 STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
@@ -728,7 +728,7 @@ xfs_alloc_ag_vextent(
 		error = xfs_alloc_ag_vextent_size(args);
 		break;
 	case XFS_ALLOCTYPE_NEAR_BNO:
-		error = xfs_alloc_ag_vextent_near(args);
+		error = xfs_alloc_ag_vextent_type(args);
 		break;
 	case XFS_ALLOCTYPE_THIS_BNO:
 		error = xfs_alloc_ag_vextent_exact(args);
@@ -887,499 +887,526 @@ xfs_alloc_ag_vextent_exact(
 }
 
 /*
- * Search the btree in a given direction via the search cursor and compare
- * the records found against the good extent we've already found.
+ * Block allocation algorithm and data structures.
  */
-STATIC int
-xfs_alloc_find_best_extent(
-	struct xfs_alloc_arg	*args,	/* allocation argument structure */
-	struct xfs_btree_cur	**gcur,	/* good cursor */
-	struct xfs_btree_cur	**scur,	/* searching cursor */
-	xfs_agblock_t		gdiff,	/* difference for search comparison */
-	xfs_agblock_t		*sbno,	/* extent found by search */
-	xfs_extlen_t		*slen,	/* extent length */
-	xfs_agblock_t		*sbnoa,	/* aligned extent found by search */
-	xfs_extlen_t		*slena,	/* aligned extent length */
-	int			dir)	/* 0 = search right, 1 = search left */
+struct xfs_alloc_btree_cur {
+	struct xfs_btree_cur	*cur;		/* cursor */
+	bool			active;		/* cursor active flag */
+};
+
+struct xfs_alloc_cur {
+	struct xfs_alloc_btree_cur	cnt;	/* btree cursors */
+	struct xfs_alloc_btree_cur	bnolt;
+	struct xfs_alloc_btree_cur	bnogt;
+	xfs_extlen_t			cur_len;/* current search length */
+	xfs_agblock_t			rec_bno;/* extent startblock */
+	xfs_extlen_t			rec_len;/* extent length */
+	xfs_agblock_t			bno;	/* alloc bno */
+	xfs_extlen_t			len;	/* alloc len */
+	xfs_extlen_t			diff;	/* diff from search bno */
+	unsigned			busy_gen;/* busy state */
+	bool				busy;
+};
+
+/*
+ * Set up cursors, etc. in the extent allocation cursor. This function can be
+ * called multiple times to reset an initialized structure without having to
+ * reallocate cursors.
+ */
+static int
+xfs_alloc_cur_setup(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur)
 {
-	xfs_agblock_t		new;
-	xfs_agblock_t		sdiff;
+	xfs_agblock_t		agbno = 0;
 	int			error;
 	int			i;
-	unsigned		busy_gen;
 
-	/* The good extent is perfect, no need to  search. */
-	if (!gdiff)
-		goto out_use_good;
+	acur->cnt.active = acur->bnolt.active = acur->bnogt.active = false;
+	acur->cur_len = args->maxlen;
+	acur->rec_bno = 0;
+	acur->rec_len = 0;
+	acur->bno = 0;
+	acur->len = 0;
+	acur->diff = -1;
+	acur->busy = false;
+	acur->busy_gen = 0;
+
+	if (args->agbno != NULLAGBLOCK)
+		agbno = args->agbno;
 
 	/*
-	 * Look until we find a better one, run out of space or run off the end.
+	 * Initialize the cntbt cursor and determine whether to start the search
+	 * at maxlen or minlen. THIS_AG allocation mode expects the cursor at
+	 * the first available maxlen extent or at the end of the tree.
 	 */
-	do {
-		error = xfs_alloc_get_rec(*scur, sbno, slen, &i);
+	if (!acur->cnt.cur)
+		acur->cnt.cur = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_CNT);
+	error = xfs_alloc_lookup_ge(acur->cnt.cur, agbno, acur->cur_len, &i);
+	if (!i) {
+		acur->cur_len = args->minlen;
+		error = xfs_alloc_lookup_ge(acur->cnt.cur, agbno, acur->cur_len,
+					    &i);
 		if (error)
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-		xfs_alloc_compute_aligned(args, *sbno, *slen,
-				sbnoa, slena, &busy_gen);
+			return error;
+	}
+	if (i)
+		acur->cnt.active = true;
+
+	/* init bnobt left/right search cursors */
+	if (!acur->bnolt.cur)
+		acur->bnolt.cur = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_BNO);
+	error = xfs_alloc_lookup_le(acur->bnolt.cur, agbno, args->maxlen, &i);
+	if (error)
+		return error;
+	if (i)
+		acur->bnolt.active = true;
 
-		/*
-		 * The good extent is closer than this one.
-		 */
-		if (!dir) {
-			if (*sbnoa > args->max_agbno)
-				goto out_use_good;
-			if (*sbnoa >= args->agbno + gdiff)
-				goto out_use_good;
-		} else {
-			if (*sbnoa < args->min_agbno)
-				goto out_use_good;
-			if (*sbnoa <= args->agbno - gdiff)
-				goto out_use_good;
-		}
+	if (!acur->bnogt.cur)
+		acur->bnogt.cur = xfs_allocbt_init_cursor(args->mp, args->tp,
+					args->agbp, args->agno, XFS_BTNUM_BNO);
+	error = xfs_alloc_lookup_ge(acur->bnogt.cur, agbno, args->maxlen, &i);
+	if (error)
+		return error;
+	if (i)
+		acur->bnogt.active = true;
 
-		/*
-		 * Same distance, compare length and pick the best.
-		 */
-		if (*slena >= args->minlen) {
-			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
-			xfs_alloc_fix_len(args);
+	return 0;
+}
 
-			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-						       args->alignment,
-						       args->datatype, *sbnoa,
-						       *slena, &new);
+static void
+xfs_alloc_cur_close(
+	struct xfs_alloc_cur	*acur,
+	bool			error)
+{
+	int			cur_error = XFS_BTREE_NOERROR;
 
-			/*
-			 * Choose closer size and invalidate other cursor.
-			 */
-			if (sdiff < gdiff)
-				goto out_use_search;
-			goto out_use_good;
-		}
+	if (error)
+		cur_error = XFS_BTREE_ERROR;
+
+	if (acur->cnt.cur)
+		xfs_btree_del_cursor(acur->cnt.cur, cur_error);
+	if (acur->bnolt.cur)
+		xfs_btree_del_cursor(acur->bnolt.cur, cur_error);
+	if (acur->bnogt.cur)
+		xfs_btree_del_cursor(acur->bnogt.cur, cur_error);
+	acur->cnt.cur = acur->bnolt.cur = acur->bnogt.cur = NULL;
+}
 
-		if (!dir)
-			error = xfs_btree_increment(*scur, 0, &i);
-		else
-			error = xfs_btree_decrement(*scur, 0, &i);
-		if (error)
-			goto error0;
-	} while (i);
+/*
+ * Check an extent for allocation and track the best available candidate in the
+ * allocation structure. The cursor is deactivated if it has entered an out of
+ * range state based on allocation arguments. Optionally return the extent
+ * geometry and allocation status if requested by the caller.
+ */
+static int
+xfs_alloc_cur_check(
+	struct xfs_alloc_arg		*args,
+	struct xfs_alloc_cur		*acur,
+	struct xfs_alloc_btree_cur	*bcur,
+	xfs_agblock_t			*obno,
+	xfs_extlen_t			*olen,
+	bool				*new)
+{
+	int			error, i;
+	xfs_agblock_t		bno, bnoa, bnew;
+	xfs_extlen_t		len, lena, diff = -1;
+	bool			busy;
+	unsigned		busy_gen = 0;
+	bool			deactivate = false;
+	bool			isbnobt = bcur->cur->bc_btnum == XFS_BTNUM_BNO;
+
+	if (new)
+		*new = false;
+
+	error = xfs_alloc_get_rec(bcur->cur, &bno, &len, &i);
+	if (error)
+		return error;
+	XFS_WANT_CORRUPTED_RETURN(args->mp, i == 1);
+	if (obno)
+		*obno = bno;
+	if (olen)
+		*olen = len;
 
-out_use_good:
-	xfs_btree_del_cursor(*scur, XFS_BTREE_NOERROR);
-	*scur = NULL;
-	return 0;
+	/*
+	 * Check against minlen and then compute and check the aligned record.
+	 * If a cntbt record is out of size range (i.e., we're walking
+	 * backwards) or a bnobt record is out of locality range, deactivate the
+	 * cursor.
+	 */
+	if (len < args->minlen) {
+		deactivate = !isbnobt;
+		goto fail;
+	}
+
+	busy = xfs_alloc_compute_aligned(args, bno, len, &bnoa, &lena,
+					 &busy_gen);
+	acur->busy |= busy;
+	if (busy)
+		acur->busy_gen = busy_gen;
+	if (bnoa < args->min_agbno || bnoa > args->max_agbno) {
+		deactivate = isbnobt;
+		goto fail;
+	}
+	if (lena < args->minlen)
+		goto fail;
 
-out_use_search:
-	xfs_btree_del_cursor(*gcur, XFS_BTREE_NOERROR);
-	*gcur = NULL;
+	args->len = XFS_EXTLEN_MIN(lena, args->maxlen);
+	xfs_alloc_fix_len(args);
+	ASSERT(args->len >= args->minlen);
+	if (args->len < acur->len)
+		goto fail;
+
+	/*
+	 * We have an aligned record that satisfies minlen and beats the current
+	 * candidate length. The remaining locality checks are specific to near
+	 * allocation mode.
+	 */
+	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
+	diff = xfs_alloc_compute_diff(args->agbno, args->len,
+				      args->alignment, args->datatype,
+				      bnoa, lena, &bnew);
+	if (bnew == NULLAGBLOCK)
+		goto fail;
+	if (diff > acur->diff) {
+		/* deactivate bnobt cursor with worse locality */
+		deactivate = isbnobt;
+		goto fail;
+	}
+	if (args->len < acur->len)
+		goto fail;
+
+	/* found a new candidate extent */
+	acur->rec_bno = bno;
+	acur->rec_len = len;
+	acur->bno = bnew;
+	acur->len = args->len;
+	acur->diff = diff;
+	if (new)
+		*new = true;
+	trace_xfs_alloc_cur_new(args->mp, acur->bno, acur->len, acur->diff);
 	return 0;
 
-error0:
-	/* caller invalidates cursors */
-	return error;
+fail:
+	if (deactivate)
+		bcur->active = false;
+	return 0;
 }
 
 /*
- * Allocate a variable extent near bno in the allocation group agno.
- * Extent's length (returned in len) will be between minlen and maxlen,
- * and of the form k * prod + mod unless there's nothing that large.
- * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
+ * Complete an allocation of a candidate extent. Remove the extent from both
+ * trees and update the args structure.
  */
-STATIC int				/* error */
-xfs_alloc_ag_vextent_near(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
+STATIC int
+xfs_alloc_cur_finish(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur)
 {
-	xfs_btree_cur_t	*bno_cur_gt;	/* cursor for bno btree, right side */
-	xfs_btree_cur_t	*bno_cur_lt;	/* cursor for bno btree, left side */
-	xfs_btree_cur_t	*cnt_cur;	/* cursor for count btree */
-	xfs_agblock_t	gtbno;		/* start bno of right side entry */
-	xfs_agblock_t	gtbnoa;		/* aligned ... */
-	xfs_extlen_t	gtdiff;		/* difference to right side entry */
-	xfs_extlen_t	gtlen;		/* length of right side entry */
-	xfs_extlen_t	gtlena;		/* aligned ... */
-	xfs_agblock_t	gtnew;		/* useful start bno of right side */
-	int		error;		/* error code */
-	int		i;		/* result code, temporary */
-	int		j;		/* result code, temporary */
-	xfs_agblock_t	ltbno;		/* start bno of left side entry */
-	xfs_agblock_t	ltbnoa;		/* aligned ... */
-	xfs_extlen_t	ltdiff;		/* difference to left side entry */
-	xfs_extlen_t	ltlen;		/* length of left side entry */
-	xfs_extlen_t	ltlena;		/* aligned ... */
-	xfs_agblock_t	ltnew;		/* useful start bno of left side */
-	xfs_extlen_t	rlen;		/* length of returned extent */
-	bool		busy;
-	unsigned	busy_gen;
-#ifdef DEBUG
-	/*
-	 * Randomly don't execute the first algorithm.
-	 */
-	int		dofirst;	/* set to do first algorithm */
+	int			error;
 
-	dofirst = prandom_u32() & 1;
-#endif
+	ASSERT(acur->len);
+	ASSERT(acur->cnt.cur && acur->bnolt.cur);
 
-	/* handle unitialized agbno range so caller doesn't have to */
-	if (!args->min_agbno && !args->max_agbno)
-		args->max_agbno = args->mp->m_sb.sb_agblocks - 1;
-	ASSERT(args->min_agbno <= args->max_agbno);
+	error = xfs_alloc_fixup_trees(acur->cnt.cur, acur->bnolt.cur,
+				      acur->rec_bno, acur->rec_len, acur->bno,
+				      acur->len, 0);
+	if (error)
+		return error;
 
-	/* clamp agbno to the range if it's outside */
-	if (args->agbno < args->min_agbno)
-		args->agbno = args->min_agbno;
-	if (args->agbno > args->max_agbno)
-		args->agbno = args->max_agbno;
+	args->agbno = acur->bno;
+	args->len = acur->len;
+	args->wasfromfl = 0;
 
-restart:
-	bno_cur_lt = NULL;
-	bno_cur_gt = NULL;
-	ltlen = 0;
-	gtlena = 0;
-	ltlena = 0;
-	busy = false;
+	trace_xfs_alloc_cur(args);
+	return 0;
+}
 
-	/*
-	 * Get a cursor for the by-size btree.
-	 */
-	cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_CNT);
+/*
+ * Locality allocation lookup algorithm. This expects a cntbt cursor and uses
+ * bno optimized lookup to search for extents with ideal size and locality.
+ */
+STATIC int
+xfs_alloc_lookup_iter(
+	struct xfs_alloc_arg		*args,
+	struct xfs_alloc_cur		*acur,
+	struct xfs_alloc_btree_cur	*bcur)
+{
+	struct xfs_btree_cur	*cur = bcur->cur;
+	xfs_agblock_t		bno;
+	xfs_extlen_t		len, cur_len;
+	int			error;
+	int			i;
 
-	/*
-	 * See if there are any free extents as big as maxlen.
-	 */
-	if ((error = xfs_alloc_lookup_ge(cnt_cur, 0, args->maxlen, &i)))
-		goto error0;
-	/*
-	 * If none, then pick up the last entry in the tree unless the
-	 * tree is empty.
-	 */
-	if (!i) {
-		if ((error = xfs_alloc_ag_vextent_small(args, cnt_cur, &ltbno,
-				&ltlen, &i)))
-			goto error0;
-		if (i == 0 || ltlen == 0) {
-			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-			trace_xfs_alloc_near_noentry(args);
-			return 0;
+	if (!bcur->active)
+		return 0;
+
+	/* locality optimized lookup */
+	cur_len = acur->cur_len;
+	error = xfs_alloc_lookup_ge(cur, args->agbno, cur_len, &i);
+	if (error)
+		return error;
+	if (i == 0) {
+		bcur->active = false;
+		return 0;
+	}
+
+	/* check the current record and update search length from it */
+	error = xfs_alloc_cur_check(args, acur, bcur, &bno, &len, NULL);
+	if (error)
+		return error;
+	ASSERT(len >= acur->cur_len);
+	acur->cur_len = len;
+
+	/*
+	 * We looked up the first record >= [agbno, len] above. The agbno is a
+	 * secondary key and so the current record may lie just before or after
+	 * agbno. If it is past agbno, check the previous record too so long as
+	 * the length matches as it may be closer. Don't check a smaller record
+	 * because that could deactivate our cursor.
+	 */
+	if (bno > args->agbno) {
+		error = xfs_btree_decrement(cur, 0, &i);
+		if (!error && i) {
+			error = xfs_alloc_get_rec(bcur->cur, &bno, &len, &i);
+			if (!error && i && len == acur->cur_len) {
+				error = xfs_alloc_cur_check(args, acur, bcur,
+							    NULL, NULL, NULL);
+			}
 		}
-		ASSERT(i == 1);
+		if (error)
+			return error;
 	}
-	args->wasfromfl = 0;
 
 	/*
-	 * First algorithm.
-	 * If the requested extent is large wrt the freespaces available
-	 * in this a.g., then the cursor will be pointing to a btree entry
-	 * near the right edge of the tree.  If it's in the last btree leaf
-	 * block, then we just examine all the entries in that block
-	 * that are big enough, and pick the best one.
-	 * This is written as a while loop so we can break out of it,
-	 * but we never loop back to the top.
+	 * Increment the search key until we find at least one allocation
+	 * candidate or if the extent we found was larger. Otherwise, double the
+	 * search key to optimize the search. Efficiency is more important here
+	 * than absolute best locality.
 	 */
-	while (xfs_btree_islastblock(cnt_cur, 0)) {
-		xfs_extlen_t	bdiff;
-		int		besti=0;
-		xfs_extlen_t	blen=0;
-		xfs_agblock_t	bnew=0;
+	cur_len <<= 1;
+	if (!acur->len || acur->cur_len >= cur_len)
+		acur->cur_len++;
+	else
+		acur->cur_len = cur_len;
 
-#ifdef DEBUG
-		if (dofirst)
-			break;
-#endif
-		/*
-		 * Start from the entry that lookup found, sequence through
-		 * all larger free blocks.  If we're actually pointing at a
-		 * record smaller than maxlen, go to the start of this block,
-		 * and skip all those smaller than minlen.
-		 */
-		if (ltlen || args->alignment > 1) {
-			cnt_cur->bc_ptrs[0] = 1;
-			do {
-				if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno,
-						&ltlen, &i)))
-					goto error0;
-				XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-				if (ltlen >= args->minlen)
-					break;
-				if ((error = xfs_btree_increment(cnt_cur, 0, &i)))
-					goto error0;
-			} while (i);
-			ASSERT(ltlen >= args->minlen);
-			if (!i)
+	return error;
+}
+
+/*
+ * Incremental lookup algorithm. Walk a btree in either direction looking for
+ * candidate extents. This works for either bnobt (locality allocation) or cntbt
+ * (by-size allocation) cursors.
+ */
+STATIC int
+xfs_alloc_walk_iter(
+	struct xfs_alloc_arg		*args,
+	struct xfs_alloc_cur		*acur,
+	struct xfs_alloc_btree_cur	*bcur,
+	bool				increment,
+	bool				findone,
+	int				iters,
+	int				*stat)
+{
+	struct xfs_btree_cur	*cur;
+	int			error;
+	int			i;
+	bool			found = false;
+
+	if (!bcur->active)
+		return 0;
+
+	*stat = 0;
+	cur = bcur->cur;
+	for (; iters > 0; iters--) {
+		error = xfs_alloc_cur_check(args, acur, bcur, NULL, NULL,
+					    &found);
+		if (error)
+			return error;
+		if (found) {
+			*stat = 1;
+			if (findone)
 				break;
 		}
-		i = cnt_cur->bc_ptrs[0];
-		for (j = 1, blen = 0, bdiff = 0;
-		     !error && j && (blen < args->maxlen || bdiff > 0);
-		     error = xfs_btree_increment(cnt_cur, 0, &j)) {
-			/*
-			 * For each entry, decide if it's better than
-			 * the previous best entry.
-			 */
-			if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-			busy = xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &busy_gen);
-			if (ltlena < args->minlen)
-				continue;
-			if (ltbnoa < args->min_agbno || ltbnoa > args->max_agbno)
-				continue;
-			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			ASSERT(args->len >= args->minlen);
-			if (args->len < blen)
-				continue;
-			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, ltbnoa,
-				ltlena, &ltnew);
-			if (ltnew != NULLAGBLOCK &&
-			    (args->len > blen || ltdiff < bdiff)) {
-				bdiff = ltdiff;
-				bnew = ltnew;
-				blen = args->len;
-				besti = cnt_cur->bc_ptrs[0];
-			}
+		if (!bcur->active)
+			break;
+
+		if (increment)
+			error = xfs_btree_increment(cur, 0, &i);
+		else
+			error = xfs_btree_decrement(cur, 0, &i);
+		if (error)
+			return error;
+		if (i == 0) {
+			bcur->active = false;
+			break;
 		}
+	}
+
+	return error;
+}
+
+/*
+ * High level locality allocation algorithm. Search the bnobt (left and right)
+ * in parallel with locality-optimized cntbt lookups to find an extent with
+ * ideal locality.
+ */
+STATIC int
+xfs_alloc_ag_vextent_cur(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur,
+	int			*stat)
+{
+	int				error;
+	int				i;
+	unsigned int			findbestcount;
+	struct xfs_alloc_btree_cur	*fbcur = NULL;
+	bool				fbinc = false;
+
+	ASSERT(acur->cnt.cur);
+	ASSERT(args->type != XFS_ALLOCTYPE_THIS_AG);
+	findbestcount = args->mp->m_alloc_mxr[0];
+	*stat = 0;
+
+	/* search as long as we have at least one active cursor */
+	while (acur->cnt.active || acur->bnolt.active || acur->bnogt.active) {
 		/*
-		 * It didn't work.  We COULD be in a case where
-		 * there's a good record somewhere, so try again.
+		 * Search the bnobt left and right. If either of these find a
+		 * suitable extent, we know we've found ideal locality. Do a
+		 * capped search in the opposite direction and we're done.
 		 */
-		if (blen == 0)
+		error = xfs_alloc_walk_iter(args, acur, &acur->bnolt, false,
+					    true, 1, &i);
+		if (error)
+			return error;
+		if (i) {
+			fbcur = &acur->bnogt;
+			fbinc = true;
 			break;
-		/*
-		 * Point at the best entry, and retrieve it again.
-		 */
-		cnt_cur->bc_ptrs[0] = besti;
-		if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-		ASSERT(ltbno + ltlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-		args->len = blen;
+		}
 
-		/*
-		 * We are allocating starting at bnew for blen blocks.
-		 */
-		args->agbno = bnew;
-		ASSERT(bnew >= ltbno);
-		ASSERT(bnew + blen <= ltbno + ltlen);
-		/*
-		 * Set up a cursor for the by-bno tree.
-		 */
-		bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp,
-			args->agbp, args->agno, XFS_BTNUM_BNO);
-		/*
-		 * Fix up the btree entries.
-		 */
-		if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno,
-				ltlen, bnew, blen, XFSA_FIXUP_CNT_OK)))
-			goto error0;
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
+		error = xfs_alloc_walk_iter(args, acur, &acur->bnogt, true,
+					    true, 1, &i);
+		if (error)
+			return error;
+		if (i) {
+			fbcur = &acur->bnolt;
+			break;
+		}
 
-		trace_xfs_alloc_near_first(args);
-		return 0;
-	}
-	/*
-	 * Second algorithm.
-	 * Search in the by-bno tree to the left and to the right
-	 * simultaneously, until in each case we find a space big enough,
-	 * or run into the edge of the tree.  When we run into the edge,
-	 * we deallocate that cursor.
-	 * If both searches succeed, we compare the two spaces and pick
-	 * the better one.
-	 * With alignment, it's possible for both to fail; the upper
-	 * level algorithm that picks allocation groups for allocations
-	 * is not supposed to do this.
-	 */
-	/*
-	 * Allocate and initialize the cursor for the leftward search.
-	 */
-	bno_cur_lt = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_BNO);
-	/*
-	 * Lookup <= bno to find the leftward search's starting point.
-	 */
-	if ((error = xfs_alloc_lookup_le(bno_cur_lt, args->agbno, args->maxlen, &i)))
-		goto error0;
-	if (!i) {
 		/*
-		 * Didn't find anything; use this cursor for the rightward
-		 * search.
+		 * Search the cntbt for a maximum sized extent with ideal
+		 * locality. The lookup search terminates on reaching the end of
+		 * the cntbt. Break out of the loop if this occurs to throttle
+		 * the bnobt scans.
 		 */
-		bno_cur_gt = bno_cur_lt;
-		bno_cur_lt = NULL;
-	}
-	/*
-	 * Found something.  Duplicate the cursor for the rightward search.
-	 */
-	else if ((error = xfs_btree_dup_cursor(bno_cur_lt, &bno_cur_gt)))
-		goto error0;
-	/*
-	 * Increment the cursor, so we will point at the entry just right
-	 * of the leftward entry if any, or to the leftmost entry.
-	 */
-	if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
-		goto error0;
-	if (!i) {
-		/*
-		 * It failed, there are no rightward entries.
-		 */
-		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_NOERROR);
-		bno_cur_gt = NULL;
-	}
-	/*
-	 * Loop going left with the leftward cursor, right with the
-	 * rightward cursor, until either both directions give up or
-	 * we find an entry at least as big as minlen.
-	 */
-	do {
-		if (bno_cur_lt) {
-			if ((error = xfs_alloc_get_rec(bno_cur_lt, &ltbno, &ltlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-			busy |= xfs_alloc_compute_aligned(args, ltbno, ltlen,
-					&ltbnoa, &ltlena, &busy_gen);
-			if (ltlena >= args->minlen && ltbnoa >= args->min_agbno)
-				break;
-			if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i)))
-				goto error0;
-			if (!i || ltbnoa < args->min_agbno) {
-				xfs_btree_del_cursor(bno_cur_lt,
-						     XFS_BTREE_NOERROR);
-				bno_cur_lt = NULL;
-			}
-		}
-		if (bno_cur_gt) {
-			if ((error = xfs_alloc_get_rec(bno_cur_gt, &gtbno, &gtlen, &i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-			busy |= xfs_alloc_compute_aligned(args, gtbno, gtlen,
-					&gtbnoa, &gtlena, &busy_gen);
-			if (gtlena >= args->minlen && gtbnoa <= args->max_agbno)
-				break;
-			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
-				goto error0;
-			if (!i || gtbnoa > args->max_agbno) {
-				xfs_btree_del_cursor(bno_cur_gt,
-						     XFS_BTREE_NOERROR);
-				bno_cur_gt = NULL;
+		error = xfs_alloc_lookup_iter(args, acur, &acur->cnt);
+		if (error)
+			return error;
+		if (!acur->cnt.active) {
+			if (!acur->len) {
+				fbcur = &acur->cnt;
+				error = xfs_btree_decrement(fbcur->cur, 0, &i);
+				if (error)
+					return error;
+				if (i)
+					fbcur->active = true;
 			}
+			break;
 		}
-	} while (bno_cur_lt || bno_cur_gt);
+	}
 
 	/*
-	 * Got both cursors still active, need to find better entry.
+	 * Perform a best-effort search in the opposite direction from a bnobt
+	 * allocation or backwards from the end of the cntbt if we couldn't find
+	 * a maxlen extent.
 	 */
-	if (bno_cur_lt && bno_cur_gt) {
-		if (ltlena >= args->minlen) {
-			/*
-			 * Left side is good, look for a right side entry.
-			 */
-			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, ltbnoa,
-				ltlena, &ltnew);
-
-			error = xfs_alloc_find_best_extent(args,
-						&bno_cur_lt, &bno_cur_gt,
-						ltdiff, &gtbno, &gtlen,
-						&gtbnoa, &gtlena,
-						0 /* search right */);
-		} else {
-			ASSERT(gtlena >= args->minlen);
-
-			/*
-			 * Right side is good, look for a left side entry.
-			 */
-			args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
-			xfs_alloc_fix_len(args);
-			gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, args->datatype, gtbnoa,
-				gtlena, &gtnew);
-
-			error = xfs_alloc_find_best_extent(args,
-						&bno_cur_gt, &bno_cur_lt,
-						gtdiff, &ltbno, &ltlen,
-						&ltbnoa, &ltlena,
-						1 /* search left */);
-		}
-
+	if (fbcur) {
+		error = xfs_alloc_walk_iter(args, acur, fbcur, fbinc, true,
+					    findbestcount, &i);
 		if (error)
-			goto error0;
+			return error;
 	}
 
-	/*
-	 * If we couldn't get anything, give up.
-	 */
-	if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+	if (acur->len)
+		*stat = 1;
 
-		if (busy) {
-			trace_xfs_alloc_near_busy(args);
-			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
-			goto restart;
-		}
-		trace_xfs_alloc_size_neither(args);
-		args->agbno = NULLAGBLOCK;
-		return 0;
-	}
+	return error;
+}
 
-	/*
-	 * At this point we have selected a freespace entry, either to the
-	 * left or to the right.  If it's on the right, copy all the
-	 * useful variables to the "left" set so we only have one
-	 * copy of this code.
-	 */
-	if (bno_cur_gt) {
-		bno_cur_lt = bno_cur_gt;
-		bno_cur_gt = NULL;
-		ltbno = gtbno;
-		ltbnoa = gtbnoa;
-		ltlen = gtlen;
-		ltlena = gtlena;
-		j = 1;
-	} else
-		j = 0;
+/*
+ * Allocate a variable extent near bno in the allocation group agno.
+ * Extent's length (returned in len) will be between minlen and maxlen,
+ * and of the form k * prod + mod unless there's nothing that large.
+ * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
+ */
+STATIC int				/* error */
+xfs_alloc_ag_vextent_type(
+	xfs_alloc_arg_t	*args)		/* allocation argument structure */
+{
+	struct xfs_alloc_cur	acur = {0,};
+	int			error;		/* error code */
+	int			i;		/* result code, temporary */
+	xfs_agblock_t		bno;	    /* start bno of small alloc extent */
+	xfs_extlen_t		len;	    /* length of small alloc extent */
+
+	/* handle uninitialized agbno range so the caller doesn't have to */
+	if (!args->min_agbno && !args->max_agbno)
+		args->max_agbno = args->mp->m_sb.sb_agblocks - 1;
+	ASSERT(args->min_agbno <= args->max_agbno);
+
+	/* clamp agbno to the range if it's outside */
+	if (args->agbno < args->min_agbno)
+		args->agbno = args->min_agbno;
+	if (args->agbno > args->max_agbno)
+		args->agbno = args->max_agbno;
+
+restart:
+	/* set up cursors and allocation tracking structure based on args */
+	error = xfs_alloc_cur_setup(args, &acur);
+	if (error)
+		goto out;
+
+	error = xfs_alloc_ag_vextent_cur(args, &acur, &i);
+	if (error)
+		goto out;
 
 	/*
-	 * Fix up the length and compute the useful address.
+	 * If we got an extent, finish the allocation. Otherwise check for busy
+	 * extents and retry or attempt a small allocation.
 	 */
-	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
-	xfs_alloc_fix_len(args);
-	rlen = args->len;
-	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
-				     args->datatype, ltbnoa, ltlena, &ltnew);
-	ASSERT(ltnew >= ltbno);
-	ASSERT(ltnew + rlen <= ltbnoa + ltlena);
-	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-	ASSERT(ltnew >= args->min_agbno && ltnew <= args->max_agbno);
-	args->agbno = ltnew;
-
-	if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen,
-			ltnew, rlen, XFSA_FIXUP_BNO_OK)))
-		goto error0;
+	if (i) {
+		error = xfs_alloc_cur_finish(args, &acur);
+		if (error)
+			goto out;
+	} else {
+		if (acur.busy) {
+			trace_xfs_alloc_near_busy(args);
+			xfs_extent_busy_flush(args->mp, args->pag,
+					      acur.busy_gen);
+			goto restart;
+		}
 
-	if (j)
-		trace_xfs_alloc_near_greater(args);
-	else
-		trace_xfs_alloc_near_lesser(args);
+		/*
+		 * We get here if we can't satisfy minlen or the trees are
+		 * empty. We don't pass a cursor so this returns an AGFL block
+		 * (i == 0) or nothing.
+		 */
+		error = xfs_alloc_ag_vextent_small(args, NULL, &bno, &len, &i);
+		if (error)
+			goto out;
+		ASSERT(i == 0 || (i && len == 0));
+		trace_xfs_alloc_near_noentry(args);
 
-	xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-	xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_NOERROR);
-	return 0;
+		args->agbno = bno;
+		args->len = len;
+	}
 
- error0:
-	trace_xfs_alloc_near_error(args);
-	if (cnt_cur != NULL)
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_ERROR);
-	if (bno_cur_lt != NULL)
-		xfs_btree_del_cursor(bno_cur_lt, XFS_BTREE_ERROR);
-	if (bno_cur_gt != NULL)
-		xfs_btree_del_cursor(bno_cur_gt, XFS_BTREE_ERROR);
+out:
+	xfs_alloc_cur_close(&acur, error);
+	if (error)
+		trace_xfs_alloc_near_error(args);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 2464ea351f83..d08d747b51a8 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1636,13 +1636,10 @@ DEFINE_ALLOC_EVENT(xfs_alloc_exact_done);
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
-DEFINE_ALLOC_EVENT(xfs_alloc_near_first);
-DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
-DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
-DEFINE_ALLOC_EVENT(xfs_alloc_size_neither);
+DEFINE_ALLOC_EVENT(xfs_alloc_cur);
 DEFINE_ALLOC_EVENT(xfs_alloc_size_noentry);
 DEFINE_ALLOC_EVENT(xfs_alloc_size_nominleft);
 DEFINE_ALLOC_EVENT(xfs_alloc_size_done);
@@ -1658,6 +1655,27 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
 
+TRACE_EVENT(xfs_alloc_cur_new,
+	TP_PROTO(struct xfs_mount *mp, xfs_agblock_t bno, xfs_extlen_t len,
+		 xfs_extlen_t diff),
+	TP_ARGS(mp, bno, len, diff),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agblock_t, bno)
+		__field(xfs_extlen_t, len)
+		__field(xfs_extlen_t, diff)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->bno = bno;
+		__entry->len = len;
+		__entry->diff = diff;
+	),
+	TP_printk("dev %d:%d bno 0x%x len 0x%x diff 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->bno, __entry->len, __entry->diff)
+)
+
 DECLARE_EVENT_CLASS(xfs_da_class,
 	TP_PROTO(struct xfs_da_args *args),
 	TP_ARGS(args),
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/6] xfs: refactor exact extent allocation mode
  2019-05-09 16:58 [PATCH 0/6] xfs: rework extent allocation Brian Foster
                   ` (2 preceding siblings ...)
  2019-05-09 16:58 ` [PATCH 3/6] xfs: use locality optimized cntbt lookups for near mode allocations Brian Foster
@ 2019-05-09 16:58 ` Brian Foster
  2019-05-09 16:58 ` [PATCH 5/6] xfs: refactor by-size " Brian Foster
  2019-05-09 16:58 ` [PATCH 6/6] xfs: replace small allocation logic with agfl only logic Brian Foster
  5 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2019-05-09 16:58 UTC (permalink / raw)
  To: linux-xfs

Exact allocation mode attempts to allocate at a specific block and
otherwise fails. The implementation is straightforward and mostly
contained in a single function. It uses the bnobt to look up the
requested block and succeeds or fails.

An exact allocation is essentially just a near allocation with
slightly more strict requirements. Most of the boilerplate code
associated with an exact allocation is already implemented in the
generic infrastructure. The only additional logic required is
one-shot behavior for cursor allocation and lookup, along with the
record examination requirements specific to this allocation mode.

Update the generic allocation code to support exact mode allocations
and replace the existing implementation. This essentially provides
the same behavior with improved code reuse and less duplicated code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 181 +++++++++++---------------------------
 fs/xfs/xfs_trace.h        |   1 -
 2 files changed, 49 insertions(+), 133 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 112411a46891..9e22c6740ce3 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -38,7 +38,6 @@ struct workqueue_struct *xfs_alloc_wq;
 #define	XFSA_FIXUP_BNO_OK	1
 #define	XFSA_FIXUP_CNT_OK	2
 
-STATIC int xfs_alloc_ag_vextent_exact(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_type(struct xfs_alloc_arg *);
 STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
@@ -728,10 +727,8 @@ xfs_alloc_ag_vextent(
 		error = xfs_alloc_ag_vextent_size(args);
 		break;
 	case XFS_ALLOCTYPE_NEAR_BNO:
-		error = xfs_alloc_ag_vextent_type(args);
-		break;
 	case XFS_ALLOCTYPE_THIS_BNO:
-		error = xfs_alloc_ag_vextent_exact(args);
+		error = xfs_alloc_ag_vextent_type(args);
 		break;
 	default:
 		ASSERT(0);
@@ -772,120 +769,6 @@ xfs_alloc_ag_vextent(
 	return error;
 }
 
-/*
- * Allocate a variable extent at exactly agno/bno.
- * Extent's length (returned in *len) will be between minlen and maxlen,
- * and of the form k * prod + mod unless there's nothing that large.
- * Return the starting a.g. block (bno), or NULLAGBLOCK if we can't do it.
- */
-STATIC int			/* error */
-xfs_alloc_ag_vextent_exact(
-	xfs_alloc_arg_t	*args)	/* allocation argument structure */
-{
-	xfs_btree_cur_t	*bno_cur;/* by block-number btree cursor */
-	xfs_btree_cur_t	*cnt_cur;/* by count btree cursor */
-	int		error;
-	xfs_agblock_t	fbno;	/* start block of found extent */
-	xfs_extlen_t	flen;	/* length of found extent */
-	xfs_agblock_t	tbno;	/* start block of busy extent */
-	xfs_extlen_t	tlen;	/* length of busy extent */
-	xfs_agblock_t	tend;	/* end block of busy extent */
-	int		i;	/* success/failure of operation */
-	unsigned	busy_gen;
-
-	ASSERT(args->alignment == 1);
-
-	/*
-	 * Allocate/initialize a cursor for the by-number freespace btree.
-	 */
-	bno_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-					  args->agno, XFS_BTNUM_BNO);
-
-	/*
-	 * Lookup bno and minlen in the btree (minlen is irrelevant, really).
-	 * Look for the closest free block <= bno, it must contain bno
-	 * if any free block does.
-	 */
-	error = xfs_alloc_lookup_le(bno_cur, args->agbno, args->minlen, &i);
-	if (error)
-		goto error0;
-	if (!i)
-		goto not_found;
-
-	/*
-	 * Grab the freespace record.
-	 */
-	error = xfs_alloc_get_rec(bno_cur, &fbno, &flen, &i);
-	if (error)
-		goto error0;
-	XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-	ASSERT(fbno <= args->agbno);
-
-	/*
-	 * Check for overlapping busy extents.
-	 */
-	tbno = fbno;
-	tlen = flen;
-	xfs_extent_busy_trim(args, &tbno, &tlen, &busy_gen);
-
-	/*
-	 * Give up if the start of the extent is busy, or the freespace isn't
-	 * long enough for the minimum request.
-	 */
-	if (tbno > args->agbno)
-		goto not_found;
-	if (tlen < args->minlen)
-		goto not_found;
-	tend = tbno + tlen;
-	if (tend < args->agbno + args->minlen)
-		goto not_found;
-
-	/*
-	 * End of extent will be smaller of the freespace end and the
-	 * maximal requested end.
-	 *
-	 * Fix the length according to mod and prod if given.
-	 */
-	args->len = XFS_AGBLOCK_MIN(tend, args->agbno + args->maxlen)
-						- args->agbno;
-	xfs_alloc_fix_len(args);
-	ASSERT(args->agbno + args->len <= tend);
-
-	/*
-	 * We are allocating agbno for args->len
-	 * Allocate/initialize a cursor for the by-size btree.
-	 */
-	cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_CNT);
-	ASSERT(args->agbno + args->len <=
-		be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
-	error = xfs_alloc_fixup_trees(cnt_cur, bno_cur, fbno, flen, args->agbno,
-				      args->len, XFSA_FIXUP_BNO_OK);
-	if (error) {
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_ERROR);
-		goto error0;
-	}
-
-	xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
-	xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-
-	args->wasfromfl = 0;
-	trace_xfs_alloc_exact_done(args);
-	return 0;
-
-not_found:
-	/* Didn't find it, return null. */
-	xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
-	args->agbno = NULLAGBLOCK;
-	trace_xfs_alloc_exact_notfound(args);
-	return 0;
-
-error0:
-	xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
-	trace_xfs_alloc_exact_error(args);
-	return error;
-}
-
 /*
  * Block allocation algorithm and data structures.
  */
@@ -964,6 +847,15 @@ xfs_alloc_cur_setup(
 	if (i)
 		acur->bnolt.active = true;
 
+	/*
+	 * Exact allocation mode requires only one bnobt cursor.
+	 */
+	if (args->type == XFS_ALLOCTYPE_THIS_BNO) {
+		ASSERT(args->alignment == 1);
+		acur->cnt.active = false;
+		return 0;
+	}
+
 	if (!acur->bnogt.cur)
 		acur->bnogt.cur = xfs_allocbt_init_cursor(args->mp, args->tp,
 					args->agbp, args->agno, XFS_BTNUM_BNO);
@@ -1030,6 +922,12 @@ xfs_alloc_cur_check(
 	if (olen)
 		*olen = len;
 
+	/* exact allocs only check one record, mark the cursor inactive */
+	if (args->type == XFS_ALLOCTYPE_THIS_BNO) {
+		ASSERT(isbnobt);
+		bcur->active = false;
+	}
+
 	/*
 	 * Check against minlen and then compute and check the aligned record.
 	 * If a cntbt record is out of size range (i.e., we're walking
@@ -1061,22 +959,39 @@ xfs_alloc_cur_check(
 
 	/*
 	 * We have an aligned record that satisfies minlen and beats the current
-	 * candidate length. The remaining locality checks are specific to near
-	 * allocation mode.
+	 * candidate length. The remaining checks depend on allocation type.
+	 * Exact allocation checks one record and either succeeds or fails. Near
+	 * allocation computes and checks locality.
 	 */
-	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
-	diff = xfs_alloc_compute_diff(args->agbno, args->len,
-				      args->alignment, args->datatype,
-				      bnoa, lena, &bnew);
-	if (bnew == NULLAGBLOCK)
-		goto fail;
-	if (diff > acur->diff) {
-		/* deactivate bnobt cursor with worse locality */
-		deactivate = isbnobt;
-		goto fail;
+	if (args->type == XFS_ALLOCTYPE_THIS_BNO) {
+		if ((bnoa > args->agbno) ||
+		    (bnoa + lena < args->agbno + args->minlen)) {
+			trace_xfs_alloc_exact_notfound(args);
+			goto fail;
+		}
+
+		bnew = args->agbno;
+		args->len = XFS_AGBLOCK_MIN(bnoa + lena,
+					    args->agbno + args->maxlen) -
+			    args->agbno;
+		diff = 0;
+		trace_xfs_alloc_exact_done(args);
+	} else {
+		ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
+		diff = xfs_alloc_compute_diff(args->agbno, args->len,
+					      args->alignment, args->datatype,
+					      bnoa, lena, &bnew);
+		if (bnew == NULLAGBLOCK)
+			goto fail;
+		if (diff > acur->diff) {
+			/* deactivate bnobt cursor with worse locality */
+			deactivate = isbnobt;
+			goto fail;
+		}
+		if (args->len < acur->len)
+			goto fail;
 	}
-	if (args->len < acur->len)
-		goto fail;
 
 	/* found a new candidate extent */
 	acur->rec_bno = bno;
@@ -1280,6 +1195,8 @@ xfs_alloc_ag_vextent_cur(
 					    true, 1, &i);
 		if (error)
 			return error;
+		if (args->type == XFS_ALLOCTYPE_THIS_BNO)
+			break;
 		if (i) {
 			fbcur = &acur->bnogt;
 			fbinc = true;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d08d747b51a8..aa3b6f181d08 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1634,7 +1634,6 @@ DEFINE_EVENT(xfs_alloc_class, name, \
 	TP_ARGS(args))
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_done);
 DEFINE_ALLOC_EVENT(xfs_alloc_exact_notfound);
-DEFINE_ALLOC_EVENT(xfs_alloc_exact_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_nominleft);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] xfs: refactor by-size extent allocation mode
  2019-05-09 16:58 [PATCH 0/6] xfs: rework extent allocation Brian Foster
                   ` (3 preceding siblings ...)
  2019-05-09 16:58 ` [PATCH 4/6] xfs: refactor exact extent allocation mode Brian Foster
@ 2019-05-09 16:58 ` Brian Foster
  2019-05-10 17:34   ` Christoph Hellwig
  2019-05-09 16:58 ` [PATCH 6/6] xfs: replace small allocation logic with agfl only logic Brian Foster
  5 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-05-09 16:58 UTC (permalink / raw)
  To: linux-xfs

By-size allocation mode is essentially a near allocation mode
without a locality requirement. The existing code looks up a
suitably sized extent in the cntbt and either succeeds or falls back
to a forward or reverse scan and eventually to the AGFL.

While similar in concept to near allocation mode, the lookup/search
algorithm is far simpler. As such, size allocation mode is still
more cleanly implemented with a mode-specific algorithm function.
However, this function reuses the underlying mechanism used by the bnobt
scan for a near mode allocation to instead walk the cntbt looking
for a suitably sized extent. Much of the setup, finish and AGFL
fallback code is also unnecessarily duplicated in the current
implementation and can be removed.

Implement a by-size allocation mode search algorithm, tweak the
generic infrastructure to handle by-size allocations and replace the
old by-size implementation. As with exact allocation mode, this
essentially provides the same behavior with less duplicated
mode-specific code.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 274 +++++++++-----------------------------
 fs/xfs/xfs_trace.h        |   4 -
 2 files changed, 65 insertions(+), 213 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 9e22c6740ce3..0b121cb5ef3f 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -39,7 +39,6 @@ struct workqueue_struct *xfs_alloc_wq;
 #define	XFSA_FIXUP_CNT_OK	2
 
 STATIC int xfs_alloc_ag_vextent_type(struct xfs_alloc_arg *);
-STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
 
@@ -724,8 +723,6 @@ xfs_alloc_ag_vextent(
 	args->wasfromfl = 0;
 	switch (args->type) {
 	case XFS_ALLOCTYPE_THIS_AG:
-		error = xfs_alloc_ag_vextent_size(args);
-		break;
 	case XFS_ALLOCTYPE_NEAR_BNO:
 	case XFS_ALLOCTYPE_THIS_BNO:
 		error = xfs_alloc_ag_vextent_type(args);
@@ -817,6 +814,8 @@ xfs_alloc_cur_setup(
 
 	if (args->agbno != NULLAGBLOCK)
 		agbno = args->agbno;
+	if (args->type == XFS_ALLOCTYPE_THIS_AG)
+		acur->cur_len += args->alignment - 1;
 
 	/*
 	 * Initialize the cntbt cursor and determine whether to start the search
@@ -827,7 +826,7 @@ xfs_alloc_cur_setup(
 		acur->cnt.cur = xfs_allocbt_init_cursor(args->mp, args->tp,
 					args->agbp, args->agno, XFS_BTNUM_CNT);
 	error = xfs_alloc_lookup_ge(acur->cnt.cur, agbno, acur->cur_len, &i);
-	if (!i) {
+	if (!i && args->type != XFS_ALLOCTYPE_THIS_AG) {
 		acur->cur_len = args->minlen;
 		error = xfs_alloc_lookup_ge(acur->cnt.cur, agbno, acur->cur_len,
 					    &i);
@@ -848,13 +847,15 @@ xfs_alloc_cur_setup(
 		acur->bnolt.active = true;
 
 	/*
-	 * Exact allocation mode requires only one bnobt cursor.
+	 * Exact allocation mode uses the bnobt, by-size allocation mode uses
+	 * the cntbt, either one requires only one bnobt cursor.
 	 */
 	if (args->type == XFS_ALLOCTYPE_THIS_BNO) {
 		ASSERT(args->alignment == 1);
 		acur->cnt.active = false;
 		return 0;
-	}
+	} else if (args->type == XFS_ALLOCTYPE_THIS_AG)
+		return 0;
 
 	if (!acur->bnogt.cur)
 		acur->bnogt.cur = xfs_allocbt_init_cursor(args->mp, args->tp,
@@ -960,9 +961,10 @@ xfs_alloc_cur_check(
 	/*
 	 * We have an aligned record that satisfies minlen and beats the current
 	 * candidate length. The remaining checks depend on allocation type.
-	 * Exact allocation checks one record and either succeeds or fails. Near
-	 * allocation computes and checks locality.  Near allocation computes
-	 * and checks locality.
+	 * Exact allocation checks one record and either succeeds or fails.
+	 * By-size allocation only needs to deactivate the cursor once we've
+	 * found a maxlen candidate. Near allocation computes and checks
+	 * locality.
 	 */
 	if (args->type == XFS_ALLOCTYPE_THIS_BNO) {
 		if ((bnoa > args->agbno) ||
@@ -977,6 +979,12 @@ xfs_alloc_cur_check(
 			    args->agbno;
 		diff = 0;
 		trace_xfs_alloc_exact_done(args);
+	} else if (args->type == XFS_ALLOCTYPE_THIS_AG) {
+		if (lena >= args->maxlen) {
+			bcur->active = false;
+			trace_xfs_alloc_size_done(args);
+		}
+		bnew = bnoa;
 	} else {
 		ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
 		diff = xfs_alloc_compute_diff(args->agbno, args->len,
@@ -1162,6 +1170,50 @@ xfs_alloc_walk_iter(
 	return error;
 }
 
+/*
+ * High level size allocation algorithm.
+ */
+STATIC int
+xfs_alloc_ag_vextent_size(
+	struct xfs_alloc_arg	*args,
+	struct xfs_alloc_cur	*acur,
+	int			*stat)
+{
+	int			error;
+	int			i;
+	bool			increment = true;
+
+	ASSERT(args->type == XFS_ALLOCTYPE_THIS_AG);
+	*stat = 0;
+
+	/*
+	 * The cursor either points at the first sufficiently sized extent for
+	 * an aligned maxlen allocation or off the edge of the tree. The only
+	 * way the former should fail is if the target extents are busy, so
+	 * return nothing and let the caller flush and retry. If the latter,
+	 * point the cursor at the last valid record and walk backwards from
+	 * there. There is still a chance to find a minlen extent.
+	 */
+	if (!acur->cnt.active) {
+		increment = false;
+		error = xfs_btree_decrement(acur->cnt.cur, 0, &i);
+		if (error)
+			return error;
+		if (i)
+			acur->cnt.active = true;
+	}
+
+	error = xfs_alloc_walk_iter(args, acur, &acur->cnt, increment, false,
+				    INT_MAX, &i);
+	if (error)
+		return error;
+
+	ASSERT(i == 1 || acur->busy || !increment);
+	if (acur->len)
+		*stat = 1;
+	return 0;
+}
+
 /*
  * High level locality allocation algorithm. Search the bnobt (left and right)
  * in parallel with locality-optimized cntbt lookups to find an extent with
@@ -1285,7 +1337,10 @@ xfs_alloc_ag_vextent_type(
 	if (error)
 		goto out;
 
-	error = xfs_alloc_ag_vextent_cur(args, &acur, &i);
+	if (args->type == XFS_ALLOCTYPE_THIS_AG)
+		error = xfs_alloc_ag_vextent_size(args, &acur, &i);
+	else
+		error = xfs_alloc_ag_vextent_cur(args, &acur, &i);
 	if (error)
 		goto out;
 
@@ -1327,205 +1382,6 @@ xfs_alloc_ag_vextent_type(
 	return error;
 }
 
-/*
- * Allocate a variable extent anywhere in the allocation group agno.
- * Extent's length (returned in len) will be between minlen and maxlen,
- * and of the form k * prod + mod unless there's nothing that large.
- * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
- */
-STATIC int				/* error */
-xfs_alloc_ag_vextent_size(
-	xfs_alloc_arg_t	*args)		/* allocation argument structure */
-{
-	xfs_btree_cur_t	*bno_cur;	/* cursor for bno btree */
-	xfs_btree_cur_t	*cnt_cur;	/* cursor for cnt btree */
-	int		error;		/* error result */
-	xfs_agblock_t	fbno;		/* start of found freespace */
-	xfs_extlen_t	flen;		/* length of found freespace */
-	int		i;		/* temp status variable */
-	xfs_agblock_t	rbno;		/* returned block number */
-	xfs_extlen_t	rlen;		/* length of returned extent */
-	bool		busy;
-	unsigned	busy_gen;
-
-restart:
-	/*
-	 * Allocate and initialize a cursor for the by-size btree.
-	 */
-	cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_CNT);
-	bno_cur = NULL;
-	busy = false;
-
-	/*
-	 * Look for an entry >= maxlen+alignment-1 blocks.
-	 */
-	if ((error = xfs_alloc_lookup_ge(cnt_cur, 0,
-			args->maxlen + args->alignment - 1, &i)))
-		goto error0;
-
-	/*
-	 * If none then we have to settle for a smaller extent. In the case that
-	 * there are no large extents, this will return the last entry in the
-	 * tree unless the tree is empty. In the case that there are only busy
-	 * large extents, this will return the largest small extent unless there
-	 * are no smaller extents available.
-	 */
-	if (!i) {
-		error = xfs_alloc_ag_vextent_small(args, cnt_cur,
-						   &fbno, &flen, &i);
-		if (error)
-			goto error0;
-		if (i == 0 || flen == 0) {
-			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-			trace_xfs_alloc_size_noentry(args);
-			return 0;
-		}
-		ASSERT(i == 1);
-		busy = xfs_alloc_compute_aligned(args, fbno, flen, &rbno,
-				&rlen, &busy_gen);
-	} else {
-		/*
-		 * Search for a non-busy extent that is large enough.
-		 */
-		for (;;) {
-			error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i);
-			if (error)
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-
-			busy = xfs_alloc_compute_aligned(args, fbno, flen,
-					&rbno, &rlen, &busy_gen);
-
-			if (rlen >= args->maxlen)
-				break;
-
-			error = xfs_btree_increment(cnt_cur, 0, &i);
-			if (error)
-				goto error0;
-			if (i == 0) {
-				/*
-				 * Our only valid extents must have been busy.
-				 * Make it unbusy by forcing the log out and
-				 * retrying.
-				 */
-				xfs_btree_del_cursor(cnt_cur,
-						     XFS_BTREE_NOERROR);
-				trace_xfs_alloc_size_busy(args);
-				xfs_extent_busy_flush(args->mp,
-							args->pag, busy_gen);
-				goto restart;
-			}
-		}
-	}
-
-	/*
-	 * In the first case above, we got the last entry in the
-	 * by-size btree.  Now we check to see if the space hits maxlen
-	 * once aligned; if not, we search left for something better.
-	 * This can't happen in the second case above.
-	 */
-	rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
-	XFS_WANT_CORRUPTED_GOTO(args->mp, rlen == 0 ||
-			(rlen <= flen && rbno + rlen <= fbno + flen), error0);
-	if (rlen < args->maxlen) {
-		xfs_agblock_t	bestfbno;
-		xfs_extlen_t	bestflen;
-		xfs_agblock_t	bestrbno;
-		xfs_extlen_t	bestrlen;
-
-		bestrlen = rlen;
-		bestrbno = rbno;
-		bestflen = flen;
-		bestfbno = fbno;
-		for (;;) {
-			if ((error = xfs_btree_decrement(cnt_cur, 0, &i)))
-				goto error0;
-			if (i == 0)
-				break;
-			if ((error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen,
-					&i)))
-				goto error0;
-			XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-			if (flen < bestrlen)
-				break;
-			busy = xfs_alloc_compute_aligned(args, fbno, flen,
-					&rbno, &rlen, &busy_gen);
-			rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
-			XFS_WANT_CORRUPTED_GOTO(args->mp, rlen == 0 ||
-				(rlen <= flen && rbno + rlen <= fbno + flen),
-				error0);
-			if (rlen > bestrlen) {
-				bestrlen = rlen;
-				bestrbno = rbno;
-				bestflen = flen;
-				bestfbno = fbno;
-				if (rlen == args->maxlen)
-					break;
-			}
-		}
-		if ((error = xfs_alloc_lookup_eq(cnt_cur, bestfbno, bestflen,
-				&i)))
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-		rlen = bestrlen;
-		rbno = bestrbno;
-		flen = bestflen;
-		fbno = bestfbno;
-	}
-	args->wasfromfl = 0;
-	/*
-	 * Fix up the length.
-	 */
-	args->len = rlen;
-	if (rlen < args->minlen) {
-		if (busy) {
-			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-			trace_xfs_alloc_size_busy(args);
-			xfs_extent_busy_flush(args->mp, args->pag, busy_gen);
-			goto restart;
-		}
-		goto out_nominleft;
-	}
-	xfs_alloc_fix_len(args);
-
-	rlen = args->len;
-	XFS_WANT_CORRUPTED_GOTO(args->mp, rlen <= flen, error0);
-	/*
-	 * Allocate and initialize a cursor for the by-block tree.
-	 */
-	bno_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
-		args->agno, XFS_BTNUM_BNO);
-	if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur, fbno, flen,
-			rbno, rlen, XFSA_FIXUP_CNT_OK)))
-		goto error0;
-	xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-	xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
-	cnt_cur = bno_cur = NULL;
-	args->len = rlen;
-	args->agbno = rbno;
-	XFS_WANT_CORRUPTED_GOTO(args->mp,
-		args->agbno + args->len <=
-			be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length),
-		error0);
-	trace_xfs_alloc_size_done(args);
-	return 0;
-
-error0:
-	trace_xfs_alloc_size_error(args);
-	if (cnt_cur)
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_ERROR);
-	if (bno_cur)
-		xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
-	return error;
-
-out_nominleft:
-	xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-	trace_xfs_alloc_size_nominleft(args);
-	args->agbno = NULLAGBLOCK;
-	return 0;
-}
-
 /*
  * Deal with the case where only small freespaces remain.
  * Either return the contents of the last freespace record,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index aa3b6f181d08..54be8e30ab11 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1639,11 +1639,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur);
-DEFINE_ALLOC_EVENT(xfs_alloc_size_noentry);
-DEFINE_ALLOC_EVENT(xfs_alloc_size_nominleft);
 DEFINE_ALLOC_EVENT(xfs_alloc_size_done);
-DEFINE_ALLOC_EVENT(xfs_alloc_size_error);
-DEFINE_ALLOC_EVENT(xfs_alloc_size_busy);
 DEFINE_ALLOC_EVENT(xfs_alloc_small_freelist);
 DEFINE_ALLOC_EVENT(xfs_alloc_small_notenough);
 DEFINE_ALLOC_EVENT(xfs_alloc_small_done);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/6] xfs: replace small allocation logic with agfl only logic
  2019-05-09 16:58 [PATCH 0/6] xfs: rework extent allocation Brian Foster
                   ` (4 preceding siblings ...)
  2019-05-09 16:58 ` [PATCH 5/6] xfs: refactor by-size " Brian Foster
@ 2019-05-09 16:58 ` Brian Foster
  2019-05-15  7:53   ` Christoph Hellwig
  5 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-05-09 16:58 UTC (permalink / raw)
  To: linux-xfs

Now that the various extent allocation modes have been reworked,
there are no more users of a large portion of
xfs_alloc_ag_vextent_small(). Remove the unnecessary record handling
logic, refactor and rename this function to a simple AGFL allocation
helper and simplify the interface.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 141 ++++++++++++++------------------------
 fs/xfs/xfs_trace.h        |   7 +-
 2 files changed, 56 insertions(+), 92 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 0b121cb5ef3f..4f2fa44a1460 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -39,8 +39,7 @@ struct workqueue_struct *xfs_alloc_wq;
 #define	XFSA_FIXUP_CNT_OK	2
 
 STATIC int xfs_alloc_ag_vextent_type(struct xfs_alloc_arg *);
-STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
-		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
+STATIC int xfs_alloc_ag_vextent_agfl(struct xfs_alloc_arg *, xfs_agblock_t *);
 
 /*
  * Size of the AGFL.  For CRC-enabled filesystes we steal a couple of slots in
@@ -1318,7 +1317,6 @@ xfs_alloc_ag_vextent_type(
 	int			error;		/* error code */
 	int			i;		/* result code, temporary */
 	xfs_agblock_t		bno;	      /* start bno of left side entry */
-	xfs_extlen_t		len;		/* length of left side entry */
 
 	/* handle unitialized agbno range so caller doesn't have to */
 	if (!args->min_agbno && !args->max_agbno)
@@ -1365,14 +1363,16 @@ xfs_alloc_ag_vextent_type(
 		 * empty. We don't pass a cursor so this returns an AGFL block
 		 * (i == 0) or nothing.
 		 */
-		error = xfs_alloc_ag_vextent_small(args, NULL, &bno, &len, &i);
+		error = xfs_alloc_ag_vextent_agfl(args, &bno);
 		if (error)
 			goto out;
-		ASSERT(i == 0 || (i && len == 0));
 		trace_xfs_alloc_near_noentry(args);
 
 		args->agbno = bno;
-		args->len = len;
+		if (bno != NULLAGBLOCK) {
+			args->wasfromfl = 1;
+			args->len = 1;
+		}
 	}
 
 out:
@@ -1383,108 +1383,73 @@ xfs_alloc_ag_vextent_type(
 }
 
 /*
- * Deal with the case where only small freespaces remain.
- * Either return the contents of the last freespace record,
- * or allocate space from the freelist if there is nothing in the tree.
+ * Attempt to allocate from the AGFL. This is a last resort when no other free
+ * space is available.
  */
-STATIC int			/* error */
-xfs_alloc_ag_vextent_small(
+STATIC int
+xfs_alloc_ag_vextent_agfl(
 	struct xfs_alloc_arg	*args,	/* allocation argument structure */
-	struct xfs_btree_cur	*ccur,	/* optional by-size cursor */
-	xfs_agblock_t		*fbnop,	/* result block number */
-	xfs_extlen_t		*flenp,	/* result length */
-	int			*stat)	/* status: 0-freelist, 1-normal/none */
+	xfs_agblock_t		*fbnop)	/* result block number */
 {
 	int			error = 0;
 	xfs_agblock_t		fbno;
-	xfs_extlen_t		flen;
-	int			i = 0;
+
+	*fbnop = NULLAGBLOCK;
 
 	/*
-	 * If a cntbt cursor is provided, try to allocate the largest record in
-	 * the tree. Try the AGFL if the cntbt is empty, otherwise fail the
-	 * allocation. Make sure to respect minleft even when pulling from the
-	 * freelist.
+	 * The AGFL can only perform unaligned, single block allocations. Also
+	 * make sure this isn't an allocation for the AGFL itself and to respect
+	 * minleft before we take a block.
 	 */
-	if (ccur)
-		error = xfs_btree_decrement(ccur, 0, &i);
+	if (args->minlen != 1 || args->alignment != 1 ||
+	    args->resv == XFS_AG_RESV_AGFL ||
+	    (be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_flcount) <=
+	     args->minleft)) {
+		trace_xfs_alloc_agfl_notenough(args);
+		goto out;
+	}
+
+	error = xfs_alloc_get_freelist(args->tp, args->agbp, &fbno, 0);
 	if (error)
-		goto error0;
-	if (i) {
-		error = xfs_alloc_get_rec(ccur, &fbno, &flen, &i);
-		if (error)
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(args->mp, i == 1, error0);
-	} else if (args->minlen == 1 && args->alignment == 1 &&
-		   args->resv != XFS_AG_RESV_AGFL &&
-		   (be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_flcount) >
-		    args->minleft)) {
-		error = xfs_alloc_get_freelist(args->tp, args->agbp, &fbno, 0);
-		if (error)
-			goto error0;
-		if (fbno != NULLAGBLOCK) {
-			xfs_extent_busy_reuse(args->mp, args->agno, fbno, 1,
-			      xfs_alloc_allow_busy_reuse(args->datatype));
+		goto out;
 
-			if (xfs_alloc_is_userdata(args->datatype)) {
-				xfs_buf_t	*bp;
+	if (fbno == NULLAGBLOCK)
+		goto out;
 
-				bp = xfs_btree_get_bufs(args->mp, args->tp,
-					args->agno, fbno, 0);
-				if (!bp) {
-					error = -EFSCORRUPTED;
-					goto error0;
-				}
-				xfs_trans_binval(args->tp, bp);
-			}
-			XFS_WANT_CORRUPTED_GOTO(args->mp,
-				args->agbno + args->len <=
-				be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length),
-				error0);
-			args->wasfromfl = 1;
-			trace_xfs_alloc_small_freelist(args);
+	xfs_extent_busy_reuse(args->mp, args->agno, fbno, 1,
+			      xfs_alloc_allow_busy_reuse(args->datatype));
 
-			/*
-			 * If we're feeding an AGFL block to something that
-			 * doesn't live in the free space, we need to clear
-			 * out the OWN_AG rmap.
-			 */
-			error = xfs_rmap_free(args->tp, args->agbp, args->agno,
-					fbno, 1, &XFS_RMAP_OINFO_AG);
-			if (error)
-				goto error0;
+	if (xfs_alloc_is_userdata(args->datatype)) {
+		struct xfs_buf	*bp;
 
-			*fbnop = args->agbno = fbno;
-			*flenp = args->len = 1;
-			*stat = 0;
-			return 0;
+		bp = xfs_btree_get_bufs(args->mp, args->tp, args->agno, fbno,
+					0);
+		if (!bp) {
+			error = -EFSCORRUPTED;
+			goto out;
 		}
-		/*
-		 * Nothing in the freelist.
-		 */
-		else
-			flen = 0;
-	} else {
-		fbno = NULLAGBLOCK;
-		flen = 0;
+		xfs_trans_binval(args->tp, bp);
 	}
+	XFS_WANT_CORRUPTED_GOTO(args->mp,
+		fbno < be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length),
+		out);
 
 	/*
-	 * Can't do the allocation, give up.
+	 * If we're feeding an AGFL block to something that doesn't live in the
+	 * free space, we need to clear out the OWN_AG rmap.
 	 */
-	if (flen < args->minlen) {
-		args->agbno = NULLAGBLOCK;
-		trace_xfs_alloc_small_notenough(args);
-		flen = 0;
-	}
+	error = xfs_rmap_free(args->tp, args->agbp, args->agno, fbno, 1,
+			      &XFS_RMAP_OINFO_AG);
+	if (error)
+		goto out;
+
 	*fbnop = fbno;
-	*flenp = flen;
-	*stat = 1;
-	trace_xfs_alloc_small_done(args);
-	return 0;
 
-error0:
-	trace_xfs_alloc_small_error(args);
+out:
+	if (error)
+		trace_xfs_alloc_agfl_error(args);
+	else
+		trace_xfs_alloc_agfl_done(args);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 54be8e30ab11..e0df6e8bc87a 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1640,10 +1640,9 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
 DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
 DEFINE_ALLOC_EVENT(xfs_alloc_cur);
 DEFINE_ALLOC_EVENT(xfs_alloc_size_done);
-DEFINE_ALLOC_EVENT(xfs_alloc_small_freelist);
-DEFINE_ALLOC_EVENT(xfs_alloc_small_notenough);
-DEFINE_ALLOC_EVENT(xfs_alloc_small_done);
-DEFINE_ALLOC_EVENT(xfs_alloc_small_error);
+DEFINE_ALLOC_EVENT(xfs_alloc_agfl_notenough);
+DEFINE_ALLOC_EVENT(xfs_alloc_agfl_done);
+DEFINE_ALLOC_EVENT(xfs_alloc_agfl_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_badargs);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_nofix);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
-- 
2.17.2


* Re: [PATCH 1/6] xfs: refactor small allocation helper to skip cntbt attempt
  2019-05-09 16:58 ` [PATCH 1/6] xfs: refactor small allocation helper to skip cntbt attempt Brian Foster
@ 2019-05-10 17:24   ` Christoph Hellwig
  2019-05-13 15:44     ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-10 17:24 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

This looks pretty sensible to me.  What confuses me a bit is that
the patch is much more (good!) refactoring than the actual change.

If you have to respin it, maybe split it up, making the actual
behavior change even more obvious.

Otherwise:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/6] xfs: always update params on small allocation
  2019-05-09 16:58 ` [PATCH 2/6] xfs: always update params on small allocation Brian Foster
@ 2019-05-10 17:26   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-10 17:26 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, May 09, 2019 at 12:58:35PM -0400, Brian Foster wrote:
> xfs_alloc_ag_vextent_small() doesn't update the output parameters in
> the event of an AGFL allocation. Instead, it updates the
> xfs_alloc_arg structure directly to complete the allocation.
> 
> Update both args and the output params to provide consistent
> behavior for future callers.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/6] xfs: use locality optimized cntbt lookups for near mode allocations
  2019-05-09 16:58 ` [PATCH 3/6] xfs: use locality optimized cntbt lookups for near mode allocations Brian Foster
@ 2019-05-10 17:31   ` Christoph Hellwig
  2019-05-13 15:45     ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-10 17:31 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

Still digesting the algorithmic changes, but a few nitpicks below:

>  /*
> + * BLock allocation algorithm and data structures.

I think the upper case L is a typo.

> +struct xfs_alloc_btree_cur {
> +	struct xfs_btree_cur	*cur;		/* cursor */
> +	bool			active;		/* cursor active flag */
> +};

Can't we move the active flag inside the btree_cur, more specifically
into enum xfs_btree_cur_private?

Or maybe we should bite the bullet and make xfs_btree_cur a structure
embedded into the type specific structures and use container_of.
But I certainly don't want to burden that on you and this series..
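
[Editor's note: a minimal, self-contained sketch of the embedding pattern
Christoph describes, using invented stand-in names rather than the actual
xfs types:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the generic struct xfs_btree_cur. */
struct btree_cur {
	int level;
};

/*
 * Hypothetical type-specific cursor that embeds the generic one, so
 * allocation-private state needs no slot in the generic structure.
 */
struct alloc_cur {
	struct btree_cur cur;	/* embedded generic cursor */
	bool active;		/* allocation-private state */
};

/* Recover the containing structure from a pointer to an embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic code only ever sees a struct btree_cur pointer... */
static bool alloc_cur_active(struct btree_cur *cur)
{
	/* ...but the wrapper's private state stays reachable. */
	return container_of(cur, struct alloc_cur, cur)->active;
}
```

Generic btree helpers keep taking a plain cursor pointer while the
allocation code carries its own flags in the wrapper, which is the
trade-off against widening the generic cursor's private union.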


* Re: [PATCH 5/6] xfs: refactor by-size extent allocation mode
  2019-05-09 16:58 ` [PATCH 5/6] xfs: refactor by-size " Brian Foster
@ 2019-05-10 17:34   ` Christoph Hellwig
  2019-05-13 15:46     ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-10 17:34 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

> @@ -724,8 +723,6 @@ xfs_alloc_ag_vextent(
>  	args->wasfromfl = 0;
>  	switch (args->type) {
>  	case XFS_ALLOCTYPE_THIS_AG:
> -		error = xfs_alloc_ag_vextent_size(args);
> -		break;
>  	case XFS_ALLOCTYPE_NEAR_BNO:
>  	case XFS_ALLOCTYPE_THIS_BNO:
>  		error = xfs_alloc_ag_vextent_type(args);
> @@ -817,6 +814,8 @@ xfs_alloc_cur_setup(
>  
>  	if (args->agbno != NULLAGBLOCK)
>  		agbno = args->agbno;
> +	if (args->type == XFS_ALLOCTYPE_THIS_AG)
> +		acur->cur_len += args->alignment - 1;

At this point we can just kill that switch, or even better
merge xfs_alloc_ag_vextent_type and xfs_alloc_ag_vextent.


* Re: [PATCH 1/6] xfs: refactor small allocation helper to skip cntbt attempt
  2019-05-10 17:24   ` Christoph Hellwig
@ 2019-05-13 15:44     ` Brian Foster
  2019-05-15  7:52       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-05-13 15:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, May 10, 2019 at 10:24:46AM -0700, Christoph Hellwig wrote:
> This looks pretty sensible to me.  What confuses me a bit is that
> the patch is much more (good!) refactoring than the actual change.
> 
> If you have to respin it maybe split it up, making the actual
> behavior change even more obvious.
> 

The only functional change was basically to check for ccur before using
it and to initialize i to zero. It just seemed to make sense to clean up
the surrounding code while there, but I can either split out the
aesthetic cleanup or defer that stuff to the broader rework at the end
of the series (where the cursor stuff just gets ripped out anyways) if
either of those is cleaner..

Brian

> Otherwise:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/6] xfs: use locality optimized cntbt lookups for near mode allocations
  2019-05-10 17:31   ` Christoph Hellwig
@ 2019-05-13 15:45     ` Brian Foster
  2019-05-15  7:54       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-05-13 15:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, May 10, 2019 at 10:31:10AM -0700, Christoph Hellwig wrote:
> Still digesting the algorithmic changes, but a few nitpicks below:
> 

Ok. FWIW, note that this patch contains the core algorithm changes
(associated with NEAR alloc mode) and core factoring changes. The
remaining modes are intended to be refactors that for the most part
preserve the current algorithms.

> >  /*
> > + * BLock allocation algorithm and data structures.
> 
> I think the upper case L is a typo.
> 

Yep.

> > +struct xfs_alloc_btree_cur {
> > +	struct xfs_btree_cur	*cur;		/* cursor */
> > +	bool			active;		/* cursor active flag */
> > +};
> 
> Can't we move the active flag inside the btree_cur, more specically
> into enum xfs_btree_cur_private?
> 

Hmm, I had attempted some tighter integration with the xfs_btree_cur on
a previous variant and ran into some roadblocks, the details of which I
don't recall. That said, I may have tried to include more than just this
active state, ran into header issues, and then never really stepped back
from that to explore more incremental changes.

I think extending the priv union with something for the allocation code
to use makes sense. Your suggestion has me wondering if down the road we
could genericize this further to a bc_valid state or some such that
quickly indicates whether a cursor points at a valid record or off into
space. That's a matter for another series however..

> Or maybe we should bite the bullet and make xfs_btree_cur a structure
> embedded into the type specific structures and use container_of.
> But I certainly don't want to burden that on you and this series..

You mean to create tree specific cursor structures of which
xfs_btree_cur is a member, then the tree specific logic uses
container_of() to pull out whatever it needs from cur..? I'd need to
think more about that one, but indeed that is beyond the scope of this
work.

I do intend to take a closer look at what further cleanups this rework
might facilitate once finalized, but in my mind the initial target are
allocation bits like xfs_alloc_arg and replacing/condensing some of that
with the xfs_alloc_cur structure introduced here. I don't want to get
too far into that until the functional and factoring bits are settled
some, however..

Brian


* Re: [PATCH 5/6] xfs: refactor by-size extent allocation mode
  2019-05-10 17:34   ` Christoph Hellwig
@ 2019-05-13 15:46     ` Brian Foster
  2019-05-15  8:10       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2019-05-13 15:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Fri, May 10, 2019 at 10:34:13AM -0700, Christoph Hellwig wrote:
> > @@ -724,8 +723,6 @@ xfs_alloc_ag_vextent(
> >  	args->wasfromfl = 0;
> >  	switch (args->type) {
> >  	case XFS_ALLOCTYPE_THIS_AG:
> > -		error = xfs_alloc_ag_vextent_size(args);
> > -		break;
> >  	case XFS_ALLOCTYPE_NEAR_BNO:
> >  	case XFS_ALLOCTYPE_THIS_BNO:
> >  		error = xfs_alloc_ag_vextent_type(args);
> > @@ -817,6 +814,8 @@ xfs_alloc_cur_setup(
> >  
> >  	if (args->agbno != NULLAGBLOCK)
> >  		agbno = args->agbno;
> > +	if (args->type == XFS_ALLOCTYPE_THIS_AG)
> > +		acur->cur_len += args->alignment - 1;
> 
> At this point we can just kill that switch, or even better
> merge xfs_alloc_ag_vextent_type and xfs_alloc_ag_vextent.

Yeah, we can probably replace that switch with a simple assert on the
allocation type.

WRT merging the functions, I'm a little concerned about the result
being too large. What do you think about folding in _vextent_type() but
at the same time factoring out the rmap/counter/resv post alloc bits
into an xfs_alloc_ag_vextent_accounting() helper or some such?

Brian


* Re: [PATCH 1/6] xfs: refactor small allocation helper to skip cntbt attempt
  2019-05-13 15:44     ` Brian Foster
@ 2019-05-15  7:52       ` Christoph Hellwig
  2019-05-15 14:41         ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-15  7:52 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Mon, May 13, 2019 at 11:44:34AM -0400, Brian Foster wrote:
> The only functional change was basically to check for ccur before using
> it and initializing i to zero. It just seemed to make sense to clean up
> the surrounding code while there, but I can either split out the
> aesthetic cleanup or defer that stuff to the broader rework at the end
> of the series (where the cursor stuff just gets ripped out anyways) if
> either of those is cleaner..

Yeah, I noticed how trivial the actual change was.  That is why just
doing the cleanup in pass 1 and applying the change in pass 2 might
make it more readable.  It might also be worth throwing the switch to
a goto label instead of long conditionals from your last patch into
that cleanup prep patch.
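
[Editor's note: the goto-label error-handling idiom being suggested,
sketched with invented helper names rather than the actual xfs_alloc
functions:]

```c
#include <assert.h>

/* Hypothetical steps standing in for the real lookup/allocation calls. */
static int step_lookup(int arg)   { return arg < 0 ? -22 : 0; }  /* -EINVAL-like */
static int step_allocate(int arg) { return arg == 0 ? -28 : 0; } /* -ENOSPC-like */

/*
 * Unwind every failure through a single exit label instead of nesting
 * each step in progressively longer conditionals.
 */
static int do_alloc(int arg)
{
	int error;

	error = step_lookup(arg);
	if (error)
		goto out;
	error = step_allocate(arg);
out:
	return error;
}
```

Each step checks its own error and jumps to the common label, keeping the
success path flat and the cleanup in one place.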


* Re: [PATCH 6/6] xfs: replace small allocation logic with agfl only logic
  2019-05-09 16:58 ` [PATCH 6/6] xfs: replace small allocation logic with agfl only logic Brian Foster
@ 2019-05-15  7:53   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-15  7:53 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

> +out:
> +	if (error)
> +		trace_xfs_alloc_agfl_error(args);
> +	else
> +		trace_xfs_alloc_agfl_done(args);
>  	return error;

Splitting this into an out and an out_error label might be a little
nicer.  Otherwise this looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/6] xfs: use locality optimized cntbt lookups for near mode allocations
  2019-05-13 15:45     ` Brian Foster
@ 2019-05-15  7:54       ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-15  7:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

On Mon, May 13, 2019 at 11:45:44AM -0400, Brian Foster wrote:
> Hmm, I had attempted some tighter integration with the xfs_btree_cur on
> a previous variant and ran into some roadblocks, the details of which I
> don't recall. That said, I may have tried to include more than just this
> active state, ran into header issues, and then never really stepped back
> from that to explore more incremental changes.
> 
> I think extending the priv union with something for the allocation code
> to use makes sense. Your suggestion has me wondering if down the road we
> could genericize this further to a bc_valid state or some such that
> quickly indicates whether a cursor points at a valid record or off into
> space. That's a matter for another series however..

Yep.

> You mean to create tree specific cursor structures of which
> xfs_btree_cur is a member, then the tree specific logic uses
> container_of() to pull out whatever it needs from cur..? I'd need to
> think more about that one, but indeed that is beyond the scope of this
> work.

Yes.


* Re: [PATCH 5/6] xfs: refactor by-size extent allocation mode
  2019-05-13 15:46     ` Brian Foster
@ 2019-05-15  8:10       ` Christoph Hellwig
  2019-05-15 14:42         ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-15  8:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs

> WRT to merging the functions, I'm a little concerned about the result
> being too large. What do you think about folding in _vextent_type() but
> at the same time factoring out the rmap/counter/resv post alloc bits
> into an xfs_alloc_ag_vextent_accounting() helper or some such?

Sounds good to me.  I've looked at the function and another nice thing
to do would be to not pass the ret bno to xfs_alloc_ag_vextent_agfl,
but let that function fill out the args structure return value itself.

Also, for the trace points that still say near in them - maybe we should
change that near to ag?


* Re: [PATCH 1/6] xfs: refactor small allocation helper to skip cntbt attempt
  2019-05-15  7:52       ` Christoph Hellwig
@ 2019-05-15 14:41         ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2019-05-15 14:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, May 15, 2019 at 12:52:53AM -0700, Christoph Hellwig wrote:
> On Mon, May 13, 2019 at 11:44:34AM -0400, Brian Foster wrote:
> > The only functional change was basically to check for ccur before using
> > it and initializing i to zero. It just seemed to make sense to clean up
> > the surrounding code while there, but I can either split out the
> > aesthetic cleanup or defer that stuff to the broader rework at the end
> > of the series (where the cursor stuff just gets ripped out anyways) if
> > either of those is cleaner..
> 
> Yeah, I noticed how trivial the actual change was.  That is why just
> doing the cleanup in pass 1 and applying the change in pass 2 might
> make it more readable.  It might also be worth throwing the switch to
> a goto label instead of long conditionals from your last patch into
> that cleanup prep patch.

Ok, I've lifted all of the small mode refactoring (from both this patch
and the last one) into an initial patch. Unless you indicate otherwise,
I've also retained your review tag(s), though you might want to take a
cursory look once the next version lands.

Brian


* Re: [PATCH 5/6] xfs: refactor by-size extent allocation mode
  2019-05-15  8:10       ` Christoph Hellwig
@ 2019-05-15 14:42         ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2019-05-15 14:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Wed, May 15, 2019 at 01:10:16AM -0700, Christoph Hellwig wrote:
> > WRT merging the functions, I'm a little concerned about the result
> > being too large. What do you think about folding in _vextent_type() but
> > at the same time factoring out the rmap/counter/resv post alloc bits
> > into an xfs_alloc_ag_vextent_accounting() helper or some such?
> 
> Sounds good to me.  I've looked at the function and another nice thing
> to do would be to not pass the ret bno to xfs_alloc_ag_vextent_agfl,
> but let that function fill out the args structure return value itself.
> 

I believe that's what the existing xfs_alloc_ag_vextent_small() function
does if it happens to allocate from the AGFL. I initially found that
inconsistent, but looking at the additional refactoring with the _type()
function folded away and whatnot I think it's actually better. I'll push
the args update back down into the AGFL helper.

> Also for the trace piints that still say near in them - maybe we should
> change that near to ag?

Yeah, I need to make another pass over the tracepoints...

Brian


