All of lore.kernel.org
* [PATCH 00/42] xfs: per-ag centric allocation algorithms
@ 2023-01-18 22:44 Dave Chinner
  2023-01-18 22:44 ` [PATCH 01/42] xfs: fix low space alloc deadlock Dave Chinner
                   ` (42 more replies)
  0 siblings, 43 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

This series continues the work towards making shrinking a filesystem
possible.  We need to be able to stop operations from taking place
on AGs that need to be removed by a shrink, so before shrink can be
implemented we need to have the infrastructure in place to prevent
incursion into AGs that are going to be, or are in the process of
being, removed from active duty.

The focus of this is making operations that depend on access to AGs
use the perag to access and pin the AG in active use, thereby
creating a barrier we can use to delay shrink until all active uses
of an AG have been drained and new uses are prevented.

This series starts by fixing some existing issues that are exposed
by changes later in the series. They stand alone, so can be picked
up independently of the rest of this patchset.

The most complex of these fixes is cleaning up the mess that is the
AGF deadlock avoidance algorithm. This algorithm stores the first
block that is allocated in a transaction in tp->t_firstblock, then
uses this to try to limit future allocations within the transaction
to AGs at or higher than the filesystem block stored in
tp->t_firstblock. This depends on one of the initial bug fixes in
the series to move the deadlock avoidance checks to
xfs_alloc_vextent(), and then builds on it to relax the constraints
of the avoidance algorithm to only be active when a deadlock is
possible.

We also update the algorithm to record allocations from higher AGs
as they are made, because when we need to lock more than two AGs we
still have to ensure lock order is correct. Therefore we can't lock
AGs in the order 1, 3, 2, even though tp->t_firstblock indicates
that we've allocated from AG 1 and so AG 2 appears valid to lock.
It's not valid, because we already hold AG 3 locked, and so
tp->t_firstblock should actually point at AG 3, not AG 1, in this
situation.

It should now be obvious that the deadlock avoidance algorithm
should record AGs, not filesystem blocks. So the series then changes
the transaction to store the highest AG we've allocated in rather
than a filesystem block we allocated.  This makes it obvious what
the constraints are, and trivial to update as we lock and allocate
from various AGs.
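The constraint described above can be sketched in a few lines of C. This is purely an illustrative model of the cover letter's idea, not the kernel code: the struct, the NO_AG sentinel and the helper names (can_lock_agf, record_alloc) are all invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_AG (-1)

/* Toy per-transaction tracker: the highest AG we hold an AGF lock in. */
struct tx {
	int	highest_ag;	/* NO_AG until the first allocation */
};

/* Blocking AGF locks may only be taken in ascending AG order. */
static bool can_lock_agf(const struct tx *tp, int agno)
{
	return tp->highest_ag == NO_AG || agno >= tp->highest_ag;
}

/* Record an allocation, keeping track of the highest AG locked so far. */
static void record_alloc(struct tx *tp, int agno)
{
	if (agno > tp->highest_ag)
		tp->highest_ag = agno;
}
```

Tracking the AG number directly (rather than a filesystem block) makes the lock-order check a single integer comparison, and updating it on every allocation keeps the "1, 3, 2" case from slipping through.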

With all the bug fixes out of the way, the series then starts
converting the code to use active references. Active reference
counts are used by high level code that needs to prevent the AG from
being taken out from under it by a shrink operation. The high level
code needs to be able to handle not getting an active reference
gracefully, and the shrink code will need to wait for active
references to drain before continuing.

Active references are implemented just as reference counts right now
- an active reference is taken at perag init during mount, and all
other active references are dependent on the active reference count
being greater than zero. This gives us an initial method of stopping
new active references without needing other infrastructure; just
drop the reference taken at filesystem mount time and when the
refcount then falls to zero no new references can be taken.
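As a rough model of that scheme, an active reference can only be taken while the count is non-zero, so dropping the mount-time reference acts as the barrier. This is a hedged sketch using C11 atomics, not the kernel's atomic API, and the names (perag_tryget_active etc.) are illustrative only:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct perag {
	atomic_int	active_ref;
};

/* Mount takes the initial active reference at perag init. */
static void perag_init(struct perag *pag)
{
	atomic_store(&pag->active_ref, 1);
}

/* New active references are only granted while the count is non-zero. */
static bool perag_tryget_active(struct perag *pag)
{
	int old = atomic_load(&pag->active_ref);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&pag->active_ref, &old,
				old + 1))
			return true;
	}
	return false;
}

static void perag_put_active(struct perag *pag)
{
	atomic_fetch_sub(&pag->active_ref, 1);
}
```

Once the mount reference is dropped and the count reaches zero, perag_tryget_active() fails for all new callers without any further infrastructure, which is exactly the "basic barrier" property the cover letter describes.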

In future, this will need to take into account AG control state
(e.g. offline, no alloc, etc) as well as the reference count, but
right now we can implement a basic barrier for shrink with just
reference count manipulations. As such, patches to convert the perag
state to atomic opstate fields similar to the xfs_mount and xlog
opstate fields follow the initial active perag reference counting
patches.

The first target for active reference conversion is the
for_each_perag*() iterators. This captures a lot of high level code
that should skip offline AGs, and introduces the ability to
differentiate between a lookup that didn't have an online AG and the
end of the AG iteration range.

From there, the inode allocation AG selection is converted to active
references, and the perag is driven deeper into the inode allocation
and btree code to replace the xfs_mount. Most of the inode
allocation code operates on a single AG once it is selected, hence
it should pass the perag as the primary referenced object around for
allocation, not the xfs_mount. There is a bit of churn here, but it
emphasises that inode allocation is inherently an allocation group
based operation.

Next the bmap/alloc interface undergoes a major untangling,
reworking xfs_bmap_btalloc() into separate allocation operations for
different contexts and failure handling behaviours. This then allows
us to completely remove the xfs_alloc_vextent() layer via
restructuring xfs_alloc_vextent/xfs_alloc_ag_vextent() into a set of
relatively simple helper functions that describe the allocation they
are doing, e.g. xfs_alloc_vextent_exact_bno().

This allows the requirements for accessing AGs to be allocation
context dependent. The allocations that require operation on a
single AG generally can't tolerate failure after the allocation
method and AG has been decided on, and hence the caller needs to
manage the active references to ensure the allocation does not race
with shrink removing the selected AG for the duration of the
operation that requires access to that allocation group.

Other allocations iterate AGs and so the first AG is just a hint -
these do not need to pin a perag first as they can tolerate not
being able to access an AG by simply skipping over it. These require
new perag iteration functions that can start at arbitrary AGs and
wrap around at arbitrary AGs, hence a new set of
for_each_perag_wrap*() helpers to do this.
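The wrap-around scan can be modelled as a small step function. Again this is only a sketch of the iteration pattern, not the for_each_perag_wrap*() implementation; the function name and -1 end marker are invented here:

```c
#include <assert.h>

/*
 * Advance from agno, wrapping at agcount, and return -1 once the scan
 * arrives back at the AG it started from.
 */
static int next_ag_wrap(int agno, int start_agno, int agcount)
{
	agno = (agno + 1) % agcount;
	return agno == start_agno ? -1 : agno;
}
```

Starting at an arbitrary AG and terminating when the scan wraps back to it is what lets these iterators treat the first AG as a mere hint and simply skip AGs they cannot pin.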

Next is the rework of the filestreams allocator. This doesn't change
any functionality, but gets rid of the unnecessary multi-pass
selection algorithm when the selected AG is not available. It
currently does a lookup pass which might iterate all AGs to select
an AG, then checks if the AG is acceptable and if not does a "new
AG" pass that is essentially identical to the lookup pass. Both of
these scans also do the same "longest extent in AG" check before
selecting an AG as is done after the AG is selected.

IOWs, the filestreams algorithm can be greatly simplified into a
single new AG selection pass if there is no current association
or the currently associated AG doesn't have enough contiguous free
space for the allocation to proceed.  With this simplification of
the filestreams allocator, it's then trivial to convert it to use
for_each_perag_wrap() for the AG scan algorithm.

This series passes auto group fstests with rmapbt=1 on both 1kB and
4kB block size configurations without functional or performance
regressions. In some cases ENOSPC behaviour is improved, but fstests
does not capture those improvements as it only tests for regressions
in behaviour.

Version 2:
- AGI, AGF and AGFL access conversion patches removed due to being
  merged.
- AG geometry conversion patches removed due to being merged
- Rebase on 6.2-rc4
- fixed "firstblock" AGF deadlock avoidance algorithm
- lots of cleanups and bug fixes.

Version 1 [RFC]:
- https://lore.kernel.org/linux-xfs/20220611012659.3418072-1-david@fromorbit.com/


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH 01/42] xfs: fix low space alloc deadlock
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-19 16:39   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 02/42] xfs: prefer free inodes at ENOSPC over chunk allocation Dave Chinner
                   ` (41 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

I've recently encountered an ABBA deadlock with g/476. The upcoming
changes seem to make this much easier to hit, but the underlying
problem is a pre-existing one.

Essentially, if we select an AG for allocation, then lock the AGF
and then fail to allocate for some reason (e.g. minimum length
requirements cannot be satisfied), then we drop out of the
allocation with the AGF still locked.

The caller then modifies the allocation constraints - usually
loosening them up - and tries again. This can result in trying to
access AGFs that are lower than the AGF we already have locked from
the failed attempt. e.g. the failed attempt skipped several AGs
before failing, so we have locked an AG higher than the start AG.
Retrying the allocation from the start AG then causes us to violate
AGF lock ordering and this can lead to deadlocks.

The deadlock exists even if allocation succeeds - we can do
followup allocations in the same transaction for BMBT blocks that
aren't guaranteed to be in the same AG as the original, and can move
into higher AGs. Hence we really need to move the tp->t_firstblock
tracking down into xfs_alloc_vextent() where it can be set when we
exit with a locked AG.

xfs_alloc_vextent() can also check there whether the requested
allocation falls within the allowed range of AGs set by
tp->t_firstblock. If we can't allocate within the range set, we have
to fail the allocation. If we are allowed to do non-blocking AGF
locking, we can ignore the AG locking order limitations as we can
use try-locks for the first iteration over the requested AG range.

This invalidates a set of post allocation asserts that check that
the allocation is always above tp->t_firstblock if it is set.
Because we can use try-locks to avoid the deadlock in some
circumstances, having a pre-existing locked AGF doesn't always
prevent allocation from lower order AGFs. Hence those ASSERTs need
to be removed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 69 ++++++++++++++++++++++++++++++++-------
 fs/xfs/libxfs/xfs_bmap.c  | 14 --------
 fs/xfs/xfs_trace.h        |  1 +
 3 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 989cf341779b..c2f38f595d7f 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3164,10 +3164,13 @@ xfs_alloc_vextent(
 	xfs_alloctype_t		type;	/* input allocation type */
 	int			bump_rotor = 0;
 	xfs_agnumber_t		rotorstep = xfs_rotorstep; /* inode32 agf stepper */
+	xfs_agnumber_t		minimum_agno = 0;
 
 	mp = args->mp;
 	type = args->otype = args->type;
 	args->agbno = NULLAGBLOCK;
+	if (args->tp->t_firstblock != NULLFSBLOCK)
+		minimum_agno = XFS_FSB_TO_AGNO(mp, args->tp->t_firstblock);
 	/*
 	 * Just fix this up, for the case where the last a.g. is shorter
 	 * (or there's only one a.g.) and the caller couldn't easily figure
@@ -3201,6 +3204,13 @@ xfs_alloc_vextent(
 		 */
 		args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
 		args->pag = xfs_perag_get(mp, args->agno);
+
+		if (minimum_agno > args->agno) {
+			trace_xfs_alloc_vextent_skip_deadlock(args);
+			error = 0;
+			break;
+		}
+
 		error = xfs_alloc_fix_freelist(args, 0);
 		if (error) {
 			trace_xfs_alloc_vextent_nofix(args);
@@ -3232,6 +3242,8 @@ xfs_alloc_vextent(
 	case XFS_ALLOCTYPE_FIRST_AG:
 		/*
 		 * Rotate through the allocation groups looking for a winner.
+		 * If we are blocking, we must obey minimum_agno constraints for
+		 * avoiding ABBA deadlocks on AGF locking.
 		 */
 		if (type == XFS_ALLOCTYPE_FIRST_AG) {
 			/*
@@ -3239,7 +3251,7 @@ xfs_alloc_vextent(
 			 */
 			args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
 			args->type = XFS_ALLOCTYPE_THIS_AG;
-			sagno = 0;
+			sagno = minimum_agno;
 			flags = 0;
 		} else {
 			/*
@@ -3248,6 +3260,7 @@ xfs_alloc_vextent(
 			args->agno = sagno = XFS_FSB_TO_AGNO(mp, args->fsbno);
 			flags = XFS_ALLOC_FLAG_TRYLOCK;
 		}
+
 		/*
 		 * Loop over allocation groups twice; first time with
 		 * trylock set, second time without.
@@ -3276,19 +3289,21 @@ xfs_alloc_vextent(
 			if (args->agno == sagno &&
 			    type == XFS_ALLOCTYPE_START_BNO)
 				args->type = XFS_ALLOCTYPE_THIS_AG;
+
 			/*
-			* For the first allocation, we can try any AG to get
-			* space.  However, if we already have allocated a
-			* block, we don't want to try AGs whose number is below
-			* sagno. Otherwise, we may end up with out-of-order
-			* locking of AGF, which might cause deadlock.
-			*/
+			 * If we are try-locking, we can't deadlock on AGF
+			 * locks, so we can wrap all the way back to the first
+			 * AG. Otherwise, wrap back to the start AG so we can't
+			 * deadlock, and let the end of scan handler decide what
+			 * to do next.
+			 */
 			if (++(args->agno) == mp->m_sb.sb_agcount) {
-				if (args->tp->t_firstblock != NULLFSBLOCK)
-					args->agno = sagno;
-				else
+				if (flags & XFS_ALLOC_FLAG_TRYLOCK)
 					args->agno = 0;
+				else
+					args->agno = sagno;
 			}
+
 			/*
 			 * Reached the starting a.g., must either be done
 			 * or switch to non-trylock mode.
@@ -3300,7 +3315,14 @@ xfs_alloc_vextent(
 					break;
 				}
 
+				/*
+				 * Blocking pass next, so we must obey minimum
+				 * agno constraints to avoid ABBA AGF deadlocks.
+				 */
 				flags = 0;
+				if (minimum_agno > sagno)
+					sagno = minimum_agno;
+
 				if (type == XFS_ALLOCTYPE_START_BNO) {
 					args->agbno = XFS_FSB_TO_AGBNO(mp,
 						args->fsbno);
@@ -3322,9 +3344,9 @@ xfs_alloc_vextent(
 		ASSERT(0);
 		/* NOTREACHED */
 	}
-	if (args->agbno == NULLAGBLOCK)
+	if (args->agbno == NULLAGBLOCK) {
 		args->fsbno = NULLFSBLOCK;
-	else {
+	} else {
 		args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno);
 #ifdef DEBUG
 		ASSERT(args->len >= args->minlen);
@@ -3335,6 +3357,29 @@ xfs_alloc_vextent(
 #endif
 
 	}
+
+	/*
+	 * We end up here with a locked AGF. If we failed, the caller is likely
+	 * going to try to allocate again with different parameters, and that
+	 * can widen the AGs that are searched for free space. If we have to do
+	 * BMBT block allocation, we have to do a new allocation.
+	 *
+	 * Hence leaving this function with the AGF locked opens up potential
+	 * ABBA AGF deadlocks because a future allocation attempt in this
+	 * transaction may attempt to lock a lower number AGF.
+	 *
+	 * We can't release the AGF until the transaction is committed, so at
+	 * this point we must update the "firstblock" tracker to point at this
+	 * AG if the tracker is empty or points to a lower AG. This allows the
+	 * next allocation attempt to be modified appropriately to avoid
+	 * deadlocks.
+	 */
+	if (args->agbp &&
+	    (args->tp->t_firstblock == NULLFSBLOCK ||
+	     args->pag->pag_agno > minimum_agno)) {
+		args->tp->t_firstblock = XFS_AGB_TO_FSB(mp,
+					args->pag->pag_agno, 0);
+	}
 	xfs_perag_put(args->pag);
 	return 0;
 error0:
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 0d56a8d862e8..018837bd72c8 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3413,21 +3413,7 @@ xfs_bmap_process_allocated_extent(
 	xfs_fileoff_t		orig_offset,
 	xfs_extlen_t		orig_length)
 {
-	int			nullfb;
-
-	nullfb = ap->tp->t_firstblock == NULLFSBLOCK;
-
-	/*
-	 * check the allocation happened at the same or higher AG than
-	 * the first block that was allocated.
-	 */
-	ASSERT(nullfb ||
-		XFS_FSB_TO_AGNO(args->mp, ap->tp->t_firstblock) <=
-		XFS_FSB_TO_AGNO(args->mp, args->fsbno));
-
 	ap->blkno = args->fsbno;
-	if (nullfb)
-		ap->tp->t_firstblock = args->fsbno;
 	ap->length = args->len;
 	/*
 	 * If the extent size hint is active, we tried to round the
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 421d1e504ac4..918e778fdd55 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1877,6 +1877,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_small_notenough);
 DEFINE_ALLOC_EVENT(xfs_alloc_small_done);
 DEFINE_ALLOC_EVENT(xfs_alloc_small_error);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_badargs);
+DEFINE_ALLOC_EVENT(xfs_alloc_vextent_skip_deadlock);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_nofix);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
-- 
2.39.0



* [PATCH 02/42] xfs: prefer free inodes at ENOSPC over chunk allocation
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
  2023-01-18 22:44 ` [PATCH 01/42] xfs: fix low space alloc deadlock Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-19 19:08   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 03/42] xfs: block reservation too large for minleft allocation Dave Chinner
                   ` (40 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When an XFS filesystem has free inodes in chunks already allocated
on disk, it will still allocate new inode chunks if the target AG
has no free inodes in it. Normally, this is a good idea as it
preserves locality of all the inodes in a given directory.

However, at ENOSPC this can lead to using the last few remaining
free filesystem blocks to allocate a new chunk when there are many,
many free inodes that could be allocated without consuming free
space. This speeds up the consumption of the last few blocks, and
inode create operations then return ENOSPC even though there are
free inodes available, because we don't have enough blocks left in
the filesystem for directory creation reservations to proceed.

Hence when we are near ENOSPC, we should be attempting to preserve
the remaining blocks for directory block allocation rather than
using them for unnecessary inode chunk creation.

This particular behaviour is exposed by xfs/294, when it drives to
ENOSPC on empty file creation whilst there are still thousands of
free inodes available for allocation in other AGs in the filesystem.

Hence, when we are within 1% of ENOSPC, change the inode allocation
behaviour to prefer to use existing free inodes over allocating new
inode chunks, even though it results in poorer locality of the data
set. It is more important for the allocations to be space efficient
near ENOSPC than to have optimal locality for performance, so let's
modify the inode AG selection code to reflect that fact.

This allows generic/294 to not only pass with this allocator rework
patchset, but to increase the number of post-ENOSPC empty inode
allocations from ~600 to ~9080 before we hit ENOSPC on the
directory create transaction reservation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 5118dedf9267..e8068422aa21 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1737,6 +1737,7 @@ xfs_dialloc(
 	struct xfs_perag	*pag;
 	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
 	bool			ok_alloc = true;
+	bool			low_space = false;
 	int			flags;
 	xfs_ino_t		ino;
 
@@ -1767,6 +1768,20 @@ xfs_dialloc(
 		ok_alloc = false;
 	}
 
+	/*
+	 * If we are near to ENOSPC, we want to prefer allocation from AGs that
+	 * have free inodes in them rather than use up free space allocating new
+	 * inode chunks. Hence we turn off allocation for the first non-blocking
+	 * pass through the AGs if we are near ENOSPC to consume free inodes
+	 * that we can immediately allocate, but then we allow allocation on the
+	 * second pass if we fail to find an AG with free inodes in it.
+	 */
+	if (percpu_counter_read_positive(&mp->m_fdblocks) <
+			mp->m_low_space[XFS_LOWSP_1_PCNT]) {
+		ok_alloc = false;
+		low_space = true;
+	}
+
 	/*
 	 * Loop until we find an allocation group that either has free inodes
 	 * or in which we can allocate some inodes.  Iterate through the
@@ -1795,6 +1810,8 @@ xfs_dialloc(
 				break;
 			}
 			flags = 0;
+			if (low_space)
+				ok_alloc = true;
 		}
 		xfs_perag_put(pag);
 	}
-- 
2.39.0



* [PATCH 03/42] xfs: block reservation too large for minleft allocation
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
  2023-01-18 22:44 ` [PATCH 01/42] xfs: fix low space alloc deadlock Dave Chinner
  2023-01-18 22:44 ` [PATCH 02/42] xfs: prefer free inodes at ENOSPC over chunk allocation Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-19 20:38   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 04/42] xfs: drop firstblock constraints from allocation setup Dave Chinner
                   ` (39 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When we enter xfs_bmbt_alloc_block() without having first allocated
a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we
are doing something like unwritten extent conversion, the transaction
block reservation is used as the minleft value.

This works for operations like unwritten extent conversion, but it
assumes that the block reservation is only for a BMBT split. This is
not always true, and sometimes results in larger than necessary
minleft values being set. We only actually need enough space for a
btree split, something we already handle correctly in
xfs_bmapi_write() via the xfs_bmapi_minleft() calculation.

We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to
calculate the number of blocks a BMBT split on this inode is going to
require, not use the transaction block reservation that contains the
maximum number of blocks this transaction may consume in it...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c       |  2 +-
 fs/xfs/libxfs/xfs_bmap.h       |  2 ++
 fs/xfs/libxfs/xfs_bmap_btree.c | 19 +++++++++----------
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 018837bd72c8..9dc33cdc2ab9 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4242,7 +4242,7 @@ xfs_bmapi_convert_unwritten(
 	return 0;
 }
 
-static inline xfs_extlen_t
+xfs_extlen_t
 xfs_bmapi_minleft(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip,
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 16db95b11589..08c16e4edc0f 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -220,6 +220,8 @@ int	xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp,
 		struct xfs_bmbt_irec *new, int *logflagsp);
+xfs_extlen_t xfs_bmapi_minleft(struct xfs_trans *tp, struct xfs_inode *ip,
+		int fork);
 
 enum xfs_bmap_intent_type {
 	XFS_BMAP_MAP = 1,
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index cfa052d40105..18de4fbfef4e 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -213,18 +213,16 @@ xfs_bmbt_alloc_block(
 	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = be64_to_cpu(start->l);
 		args.type = XFS_ALLOCTYPE_START_BNO;
+
 		/*
-		 * Make sure there is sufficient room left in the AG to
-		 * complete a full tree split for an extent insert.  If
-		 * we are converting the middle part of an extent then
-		 * we may need space for two tree splits.
-		 *
-		 * We are relying on the caller to make the correct block
-		 * reservation for this operation to succeed.  If the
-		 * reservation amount is insufficient then we may fail a
-		 * block allocation here and corrupt the filesystem.
+		 * If we are coming here from something like unwritten extent
+		 * conversion, there has been no data extent allocation already
+		 * done, so we have to ensure that we attempt to locate the
+		 * entire set of bmbt allocations in the same AG, as
+		 * xfs_bmapi_write() would have reserved.
 		 */
-		args.minleft = args.tp->t_blk_res;
+		args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip,
+						cur->bc_ino.whichfork);
 	} else if (cur->bc_tp->t_flags & XFS_TRANS_LOWMODE) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 	} else {
@@ -248,6 +246,7 @@ xfs_bmbt_alloc_block(
 		 * successful activate the lowspace algorithm.
 		 */
 		args.fsbno = 0;
+		args.minleft = 0;
 		args.type = XFS_ALLOCTYPE_FIRST_AG;
 		error = xfs_alloc_vextent(&args);
 		if (error)
-- 
2.39.0



* [PATCH 04/42] xfs: drop firstblock constraints from allocation setup
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (2 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 03/42] xfs: block reservation too large for minleft allocation Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-19 22:03   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 05/42] xfs: t_firstblock is tracking AGs not blocks Dave Chinner
                   ` (38 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Now that xfs_alloc_vextent() does all the AGF deadlock prevention
filtering for multiple allocations in a single transaction, we no
longer need the allocation setup code to care about what AGs we
might already have locked.

Hence we can remove all the "nullfb" conditional logic in places
like xfs_bmap_btalloc() and instead have them focus simply on
setting up locality constraints. If the allocation fails due to
AGF lock filtering in xfs_alloc_vextent, then we just fall back as
we normally do to more relaxed allocation constraints.

As a result, any allocation that allows AG scanning (i.e. not
confined to a single AG) and does not force a worst case full
filesystem scan will now be able to attempt allocation from AGs
lower than that defined by tp->t_firstblock. This is because
xfs_alloc_vextent() allows try-locking of the AGFs and hence enables
low space algorithms to at least -try- to get space from AGs lower
than the one that we have currently locked and allocated from. This
is a significant improvement in the low space allocation algorithm.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c       | 168 +++++++++++----------------------
 fs/xfs/libxfs/xfs_bmap.h       |   1 +
 fs/xfs/libxfs/xfs_bmap_btree.c |  30 +++---
 3 files changed, 67 insertions(+), 132 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9dc33cdc2ab9..bc566aae4246 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -645,16 +645,9 @@ xfs_bmap_extents_to_btree(
 	args.tp = tp;
 	args.mp = mp;
 	xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, whichfork);
-	if (tp->t_firstblock == NULLFSBLOCK) {
-		args.type = XFS_ALLOCTYPE_START_BNO;
-		args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
-	} else if (tp->t_flags & XFS_TRANS_LOWMODE) {
-		args.type = XFS_ALLOCTYPE_START_BNO;
-		args.fsbno = tp->t_firstblock;
-	} else {
-		args.type = XFS_ALLOCTYPE_NEAR_BNO;
-		args.fsbno = tp->t_firstblock;
-	}
+
+	args.type = XFS_ALLOCTYPE_START_BNO;
+	args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
 	args.minlen = args.maxlen = args.prod = 1;
 	args.wasdel = wasdel;
 	*logflagsp = 0;
@@ -662,17 +655,14 @@ xfs_bmap_extents_to_btree(
 	if (error)
 		goto out_root_realloc;
 
+	/*
+	 * Allocation can't fail, the space was reserved.
+	 */
 	if (WARN_ON_ONCE(args.fsbno == NULLFSBLOCK)) {
 		error = -ENOSPC;
 		goto out_root_realloc;
 	}
 
-	/*
-	 * Allocation can't fail, the space was reserved.
-	 */
-	ASSERT(tp->t_firstblock == NULLFSBLOCK ||
-	       args.agno >= XFS_FSB_TO_AGNO(mp, tp->t_firstblock));
-	tp->t_firstblock = args.fsbno;
 	cur->bc_ino.allocated++;
 	ip->i_nblocks++;
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, 1L);
@@ -804,13 +794,8 @@ xfs_bmap_local_to_extents(
 	 * Allocate a block.  We know we need only one, since the
 	 * file currently fits in an inode.
 	 */
-	if (tp->t_firstblock == NULLFSBLOCK) {
-		args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
-		args.type = XFS_ALLOCTYPE_START_BNO;
-	} else {
-		args.fsbno = tp->t_firstblock;
-		args.type = XFS_ALLOCTYPE_NEAR_BNO;
-	}
+	args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
+	args.type = XFS_ALLOCTYPE_START_BNO;
 	args.total = total;
 	args.minlen = args.maxlen = args.prod = 1;
 	error = xfs_alloc_vextent(&args);
@@ -820,7 +805,6 @@ xfs_bmap_local_to_extents(
 	/* Can't fail, the space was reserved. */
 	ASSERT(args.fsbno != NULLFSBLOCK);
 	ASSERT(args.len == 1);
-	tp->t_firstblock = args.fsbno;
 	error = xfs_trans_get_buf(tp, args.mp->m_ddev_targp,
 			XFS_FSB_TO_DADDR(args.mp, args.fsbno),
 			args.mp->m_bsize, 0, &bp);
@@ -854,8 +838,7 @@ xfs_bmap_local_to_extents(
 
 	ifp->if_nextents = 1;
 	ip->i_nblocks = 1;
-	xfs_trans_mod_dquot_byino(tp, ip,
-		XFS_TRANS_DQ_BCOUNT, 1L);
+	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, 1L);
 	flags |= xfs_ilog_fext(whichfork);
 
 done:
@@ -3025,9 +3008,7 @@ xfs_bmap_adjacent(
 	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
 {
 	xfs_fsblock_t	adjust;		/* adjustment to block numbers */
-	xfs_agnumber_t	fb_agno;	/* ag number of ap->firstblock */
 	xfs_mount_t	*mp;		/* mount point structure */
-	int		nullfb;		/* true if ap->firstblock isn't set */
 	int		rt;		/* true if inode is realtime */
 
 #define	ISVALID(x,y)	\
@@ -3038,11 +3019,8 @@ xfs_bmap_adjacent(
 		XFS_FSB_TO_AGBNO(mp, x) < mp->m_sb.sb_agblocks)
 
 	mp = ap->ip->i_mount;
-	nullfb = ap->tp->t_firstblock == NULLFSBLOCK;
 	rt = XFS_IS_REALTIME_INODE(ap->ip) &&
 		(ap->datatype & XFS_ALLOC_USERDATA);
-	fb_agno = nullfb ? NULLAGNUMBER : XFS_FSB_TO_AGNO(mp,
-							ap->tp->t_firstblock);
 	/*
 	 * If allocating at eof, and there's a previous real block,
 	 * try to use its last block as our starting point.
@@ -3101,13 +3079,6 @@ xfs_bmap_adjacent(
 				prevbno += adjust;
 			else
 				prevdiff += adjust;
-			/*
-			 * If the firstblock forbids it, can't use it,
-			 * must use default.
-			 */
-			if (!rt && !nullfb &&
-			    XFS_FSB_TO_AGNO(mp, prevbno) != fb_agno)
-				prevbno = NULLFSBLOCK;
 		}
 		/*
 		 * No previous block or can't follow it, just default.
@@ -3143,13 +3114,6 @@ xfs_bmap_adjacent(
 				gotdiff += adjust - ap->length;
 			} else
 				gotdiff += adjust;
-			/*
-			 * If the firstblock forbids it, can't use it,
-			 * must use default.
-			 */
-			if (!rt && !nullfb &&
-			    XFS_FSB_TO_AGNO(mp, gotbno) != fb_agno)
-				gotbno = NULLFSBLOCK;
 		}
 		/*
 		 * No next block, just default.
@@ -3236,7 +3200,7 @@ xfs_bmap_select_minlen(
 }
 
 STATIC int
-xfs_bmap_btalloc_nullfb(
+xfs_bmap_btalloc_select_lengths(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		*blen)
@@ -3247,8 +3211,13 @@ xfs_bmap_btalloc_nullfb(
 	int			error;
 
 	args->type = XFS_ALLOCTYPE_START_BNO;
-	args->total = ap->total;
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
+		args->total = ap->minlen;
+		args->minlen = ap->minlen;
+		return 0;
+	}
 
+	args->total = ap->total;
 	startag = ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
 	if (startag == NULLAGNUMBER)
 		startag = ag = 0;
@@ -3280,6 +3249,13 @@ xfs_bmap_btalloc_filestreams(
 	int			notinit = 0;
 	int			error;
 
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
+		args->type = XFS_ALLOCTYPE_FIRST_AG;
+		args->total = ap->minlen;
+		args->minlen = ap->minlen;
+		return 0;
+	}
+
 	args->type = XFS_ALLOCTYPE_NEAR_BNO;
 	args->total = ap->total;
 
@@ -3460,19 +3436,15 @@ xfs_bmap_exact_minlen_extent_alloc(
 
 	xfs_bmap_compute_alignments(ap, &args);
 
-	if (ap->tp->t_firstblock == NULLFSBLOCK) {
-		/*
-		 * Unlike the longest extent available in an AG, we don't track
-		 * the length of an AG's shortest extent.
-		 * XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT is a debug only knob and
-		 * hence we can afford to start traversing from the 0th AG since
-		 * we need not be concerned about a drop in performance in
-		 * "debug only" code paths.
-		 */
-		ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
-	} else {
-		ap->blkno = ap->tp->t_firstblock;
-	}
+	/*
+	 * Unlike the longest extent available in an AG, we don't track
+	 * the length of an AG's shortest extent.
+	 * XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT is a debug only knob and
+	 * hence we can afford to start traversing from the 0th AG since
+	 * we need not be concerned about a drop in performance in
+	 * "debug only" code paths.
+	 */
+	ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
 
 	args.fsbno = ap->blkno;
 	args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
@@ -3515,13 +3487,11 @@ xfs_bmap_btalloc(
 	struct xfs_mount	*mp = ap->ip->i_mount;
 	struct xfs_alloc_arg	args = { .tp = ap->tp, .mp = mp };
 	xfs_alloctype_t		atype = 0;
-	xfs_agnumber_t		fb_agno;	/* ag number of ap->firstblock */
 	xfs_agnumber_t		ag;
 	xfs_fileoff_t		orig_offset;
 	xfs_extlen_t		orig_length;
 	xfs_extlen_t		blen;
 	xfs_extlen_t		nextminlen = 0;
-	int			nullfb; /* true if ap->firstblock isn't set */
 	int			isaligned;
 	int			tryagain;
 	int			error;
@@ -3533,34 +3503,17 @@ xfs_bmap_btalloc(
 
 	stripe_align = xfs_bmap_compute_alignments(ap, &args);
 
-	nullfb = ap->tp->t_firstblock == NULLFSBLOCK;
-	fb_agno = nullfb ? NULLAGNUMBER : XFS_FSB_TO_AGNO(mp,
-							ap->tp->t_firstblock);
-	if (nullfb) {
-		if ((ap->datatype & XFS_ALLOC_USERDATA) &&
-		    xfs_inode_is_filestream(ap->ip)) {
-			ag = xfs_filestream_lookup_ag(ap->ip);
-			ag = (ag != NULLAGNUMBER) ? ag : 0;
-			ap->blkno = XFS_AGB_TO_FSB(mp, ag, 0);
-		} else {
-			ap->blkno = XFS_INO_TO_FSB(mp, ap->ip->i_ino);
-		}
-	} else
-		ap->blkno = ap->tp->t_firstblock;
+	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
+	    xfs_inode_is_filestream(ap->ip)) {
+		ag = xfs_filestream_lookup_ag(ap->ip);
+		ag = (ag != NULLAGNUMBER) ? ag : 0;
+		ap->blkno = XFS_AGB_TO_FSB(mp, ag, 0);
+	} else {
+		ap->blkno = XFS_INO_TO_FSB(mp, ap->ip->i_ino);
+	}
 
 	xfs_bmap_adjacent(ap);
 
-	/*
-	 * If allowed, use ap->blkno; otherwise must use firstblock since
-	 * it's in the right allocation group.
-	 */
-	if (nullfb || XFS_FSB_TO_AGNO(mp, ap->blkno) == fb_agno)
-		;
-	else
-		ap->blkno = ap->tp->t_firstblock;
-	/*
-	 * Normal allocation, done through xfs_alloc_vextent.
-	 */
 	tryagain = isaligned = 0;
 	args.fsbno = ap->blkno;
 	args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
@@ -3568,30 +3521,19 @@ xfs_bmap_btalloc(
 	/* Trim the allocation back to the maximum an AG can fit. */
 	args.maxlen = min(ap->length, mp->m_ag_max_usable);
 	blen = 0;
-	if (nullfb) {
-		/*
-		 * Search for an allocation group with a single extent large
-		 * enough for the request.  If one isn't found, then adjust
-		 * the minimum allocation size to the largest space found.
-		 */
-		if ((ap->datatype & XFS_ALLOC_USERDATA) &&
-		    xfs_inode_is_filestream(ap->ip))
-			error = xfs_bmap_btalloc_filestreams(ap, &args, &blen);
-		else
-			error = xfs_bmap_btalloc_nullfb(ap, &args, &blen);
-		if (error)
-			return error;
-	} else if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
-		if (xfs_inode_is_filestream(ap->ip))
-			args.type = XFS_ALLOCTYPE_FIRST_AG;
-		else
-			args.type = XFS_ALLOCTYPE_START_BNO;
-		args.total = args.minlen = ap->minlen;
-	} else {
-		args.type = XFS_ALLOCTYPE_NEAR_BNO;
-		args.total = ap->total;
-		args.minlen = ap->minlen;
-	}
+
+	/*
+	 * Search for an allocation group with a single extent large
+	 * enough for the request.  If one isn't found, then adjust
+	 * the minimum allocation size to the largest space found.
+	 */
+	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
+	    xfs_inode_is_filestream(ap->ip))
+		error = xfs_bmap_btalloc_filestreams(ap, &args, &blen);
+	else
+		error = xfs_bmap_btalloc_select_lengths(ap, &args, &blen);
+	if (error)
+		return error;
 
 	/*
 	 * If we are not low on available data blocks, and the underlying
@@ -3678,7 +3620,7 @@ xfs_bmap_btalloc(
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
 	}
-	if (args.fsbno == NULLFSBLOCK && nullfb &&
+	if (args.fsbno == NULLFSBLOCK &&
 	    args.minlen > ap->minlen) {
 		args.minlen = ap->minlen;
 		args.type = XFS_ALLOCTYPE_START_BNO;
@@ -3686,7 +3628,7 @@ xfs_bmap_btalloc(
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
 	}
-	if (args.fsbno == NULLFSBLOCK && nullfb) {
+	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = 0;
 		args.type = XFS_ALLOCTYPE_FIRST_AG;
 		args.total = ap->minlen;
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 08c16e4edc0f..0ffc0d998850 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -269,4 +269,5 @@ extern struct kmem_cache	*xfs_bmap_intent_cache;
 int __init xfs_bmap_intent_init_cache(void);
 void xfs_bmap_intent_destroy_cache(void);
 
+
 #endif	/* __XFS_BMAP_H__ */
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 18de4fbfef4e..76a0f0d260a4 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -206,28 +206,21 @@ xfs_bmbt_alloc_block(
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
-	args.fsbno = cur->bc_tp->t_firstblock;
 	xfs_rmap_ino_bmbt_owner(&args.oinfo, cur->bc_ino.ip->i_ino,
 			cur->bc_ino.whichfork);
 
-	if (args.fsbno == NULLFSBLOCK) {
-		args.fsbno = be64_to_cpu(start->l);
-		args.type = XFS_ALLOCTYPE_START_BNO;
+	args.fsbno = be64_to_cpu(start->l);
+	args.type = XFS_ALLOCTYPE_START_BNO;
 
-		/*
-		 * If we are coming here from something like unwritten extent
-		 * conversion, there has been no data extent allocation already
-		 * done, so we have to ensure that we attempt to locate the
-		 * entire set of bmbt allocations in the same AG, as
-		 * xfs_bmapi_write() would have reserved.
-		 */
+	/*
+	 * If we are coming here from something like unwritten extent
+	 * conversion, there has been no data extent allocation already done, so
+	 * we have to ensure that we attempt to locate the entire set of bmbt
+	 * allocations in the same AG, as xfs_bmapi_write() would have reserved.
+	 */
+	if (cur->bc_tp->t_firstblock == NULLFSBLOCK)
 		args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip,
-						cur->bc_ino.whichfork);
-	} else if (cur->bc_tp->t_flags & XFS_TRANS_LOWMODE) {
-		args.type = XFS_ALLOCTYPE_START_BNO;
-	} else {
-		args.type = XFS_ALLOCTYPE_NEAR_BNO;
-	}
+					cur->bc_ino.whichfork);
 
 	args.minlen = args.maxlen = args.prod = 1;
 	args.wasdel = cur->bc_ino.flags & XFS_BTCUR_BMBT_WASDEL;
@@ -247,7 +240,7 @@ xfs_bmbt_alloc_block(
 		 */
 		args.fsbno = 0;
 		args.minleft = 0;
-		args.type = XFS_ALLOCTYPE_FIRST_AG;
+		args.type = XFS_ALLOCTYPE_START_BNO;
 		error = xfs_alloc_vextent(&args);
 		if (error)
 			goto error0;
@@ -259,7 +252,6 @@ xfs_bmbt_alloc_block(
 	}
 
 	ASSERT(args.len == 1);
-	cur->bc_tp->t_firstblock = args.fsbno;
 	cur->bc_ino.allocated++;
 	cur->bc_ino.ip->i_nblocks++;
 	xfs_trans_log_inode(args.tp, cur->bc_ino.ip, XFS_ILOG_CORE);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 05/42] xfs: t_firstblock is tracking AGs not blocks
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (3 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 04/42] xfs: drop firstblock constraints from allocation setup Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-19 22:12   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 06/42] xfs: don't assert fail on transaction cancel with deferred ops Dave Chinner
                   ` (37 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The tp->t_firstblock field is now really tracking the highest AG we
have locked, not the block number of the highest allocation we've
made. Its purpose is to prevent AGF locking deadlocks, so rename it
to "t_highest_agno" and simplify the implementation to just track
the agno rather than a fsbno.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c      | 12 +++++-------
 fs/xfs/libxfs/xfs_bmap.c       |  4 ++--
 fs/xfs/libxfs/xfs_bmap_btree.c |  6 +++---
 fs/xfs/xfs_bmap_util.c         |  2 +-
 fs/xfs/xfs_inode.c             |  2 +-
 fs/xfs/xfs_reflink.c           |  2 +-
 fs/xfs/xfs_trace.h             |  8 ++++----
 fs/xfs/xfs_trans.c             |  4 ++--
 fs/xfs/xfs_trans.h             |  2 +-
 9 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index c2f38f595d7f..9f26a9368eeb 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3169,8 +3169,8 @@ xfs_alloc_vextent(
 	mp = args->mp;
 	type = args->otype = args->type;
 	args->agbno = NULLAGBLOCK;
-	if (args->tp->t_firstblock != NULLFSBLOCK)
-		minimum_agno = XFS_FSB_TO_AGNO(mp, args->tp->t_firstblock);
+	if (args->tp->t_highest_agno != NULLAGNUMBER)
+		minimum_agno = args->tp->t_highest_agno;
 	/*
 	 * Just fix this up, for the case where the last a.g. is shorter
 	 * (or there's only one a.g.) and the caller couldn't easily figure
@@ -3375,11 +3375,9 @@ xfs_alloc_vextent(
 	 * deadlocks.
 	 */
 	if (args->agbp &&
-	    (args->tp->t_firstblock == NULLFSBLOCK ||
-	     args->pag->pag_agno > minimum_agno)) {
-		args->tp->t_firstblock = XFS_AGB_TO_FSB(mp,
-					args->pag->pag_agno, 0);
-	}
+	    (args->tp->t_highest_agno == NULLAGNUMBER ||
+	     args->pag->pag_agno > minimum_agno))
+		args->tp->t_highest_agno = args->pag->pag_agno;
 	xfs_perag_put(args->pag);
 	return 0;
 error0:
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index bc566aae4246..f15d45af661f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4192,7 +4192,7 @@ xfs_bmapi_minleft(
 {
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, fork);
 
-	if (tp && tp->t_firstblock != NULLFSBLOCK)
+	if (tp && tp->t_highest_agno != NULLAGNUMBER)
 		return 0;
 	if (ifp->if_format != XFS_DINODE_FMT_BTREE)
 		return 1;
@@ -6084,7 +6084,7 @@ xfs_bmap_finish_one(
 {
 	int				error = 0;
 
-	ASSERT(tp->t_firstblock == NULLFSBLOCK);
+	ASSERT(tp->t_highest_agno == NULLAGNUMBER);
 
 	trace_xfs_bmap_deferred(tp->t_mountp,
 			XFS_FSB_TO_AGNO(tp->t_mountp, startblock), type,
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 76a0f0d260a4..afd9b2d962a3 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -184,11 +184,11 @@ xfs_bmbt_update_cursor(
 	struct xfs_btree_cur	*src,
 	struct xfs_btree_cur	*dst)
 {
-	ASSERT((dst->bc_tp->t_firstblock != NULLFSBLOCK) ||
+	ASSERT((dst->bc_tp->t_highest_agno != NULLAGNUMBER) ||
 	       (dst->bc_ino.ip->i_diflags & XFS_DIFLAG_REALTIME));
 
 	dst->bc_ino.allocated += src->bc_ino.allocated;
-	dst->bc_tp->t_firstblock = src->bc_tp->t_firstblock;
+	dst->bc_tp->t_highest_agno = src->bc_tp->t_highest_agno;
 
 	src->bc_ino.allocated = 0;
 }
@@ -218,7 +218,7 @@ xfs_bmbt_alloc_block(
 	 * we have to ensure that we attempt to locate the entire set of bmbt
 	 * allocations in the same AG, as xfs_bmapi_write() would have reserved.
 	 */
-	if (cur->bc_tp->t_firstblock == NULLFSBLOCK)
+	if (cur->bc_tp->t_highest_agno == NULLAGNUMBER)
 		args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip,
 					cur->bc_ino.whichfork);
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 867645b74d88..a09dd2606479 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1410,7 +1410,7 @@ xfs_swap_extent_rmap(
 
 		/* Unmap the old blocks in the source file. */
 		while (tirec.br_blockcount) {
-			ASSERT(tp->t_firstblock == NULLFSBLOCK);
+			ASSERT(tp->t_highest_agno == NULLAGNUMBER);
 			trace_xfs_swap_extent_rmap_remap_piece(tip, &tirec);
 
 			/* Read extent from the source file */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d354ea2b74f9..dbe274b8065d 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1367,7 +1367,7 @@ xfs_itruncate_extents_flags(
 
 	unmap_len = XFS_MAX_FILEOFF - first_unmap_block + 1;
 	while (unmap_len > 0) {
-		ASSERT(tp->t_firstblock == NULLFSBLOCK);
+		ASSERT(tp->t_highest_agno == NULLAGNUMBER);
 		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
 				flags, XFS_ITRUNC_MAX_EXTENTS);
 		if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 5535778a98f9..57bf59ff4854 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -610,7 +610,7 @@ xfs_reflink_cancel_cow_blocks(
 			if (error)
 				break;
 		} else if (del.br_state == XFS_EXT_UNWRITTEN || cancel_real) {
-			ASSERT((*tpp)->t_firstblock == NULLFSBLOCK);
+			ASSERT((*tpp)->t_highest_agno == NULLAGNUMBER);
 
 			/* Free the CoW orphan record. */
 			xfs_refcount_free_cow_extent(*tpp, del.br_startblock,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 918e778fdd55..7dc57db6aa42 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1801,7 +1801,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__field(char, wasfromfl)
 		__field(int, resv)
 		__field(int, datatype)
-		__field(xfs_fsblock_t, firstblock)
+		__field(xfs_agnumber_t, highest_agno)
 	),
 	TP_fast_assign(
 		__entry->dev = args->mp->m_super->s_dev;
@@ -1822,12 +1822,12 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__entry->wasfromfl = args->wasfromfl;
 		__entry->resv = args->resv;
 		__entry->datatype = args->datatype;
-		__entry->firstblock = args->tp->t_firstblock;
+		__entry->highest_agno = args->tp->t_highest_agno;
 	),
 	TP_printk("dev %d:%d agno 0x%x agbno 0x%x minlen %u maxlen %u mod %u "
 		  "prod %u minleft %u total %u alignment %u minalignslop %u "
 		  "len %u type %s otype %s wasdel %d wasfromfl %d resv %d "
-		  "datatype 0x%x firstblock 0x%llx",
+		  "datatype 0x%x highest_agno 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->agno,
 		  __entry->agbno,
@@ -1846,7 +1846,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		  __entry->wasfromfl,
 		  __entry->resv,
 		  __entry->datatype,
-		  (unsigned long long)__entry->firstblock)
+		  __entry->highest_agno)
 )
 
 #define DEFINE_ALLOC_EVENT(name) \
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7bd16fbff534..53ab544e4c2c 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -102,7 +102,7 @@ xfs_trans_dup(
 	INIT_LIST_HEAD(&ntp->t_items);
 	INIT_LIST_HEAD(&ntp->t_busy);
 	INIT_LIST_HEAD(&ntp->t_dfops);
-	ntp->t_firstblock = NULLFSBLOCK;
+	ntp->t_highest_agno = NULLAGNUMBER;
 
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(tp->t_ticket != NULL);
@@ -278,7 +278,7 @@ xfs_trans_alloc(
 	INIT_LIST_HEAD(&tp->t_items);
 	INIT_LIST_HEAD(&tp->t_busy);
 	INIT_LIST_HEAD(&tp->t_dfops);
-	tp->t_firstblock = NULLFSBLOCK;
+	tp->t_highest_agno = NULLAGNUMBER;
 
 	error = xfs_trans_reserve(tp, resp, blocks, rtextents);
 	if (error == -ENOSPC && want_retry) {
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 55819785941c..6e3646d524ce 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -132,7 +132,7 @@ typedef struct xfs_trans {
 	unsigned int		t_rtx_res;	/* # of rt extents resvd */
 	unsigned int		t_rtx_res_used;	/* # of resvd rt extents used */
 	unsigned int		t_flags;	/* misc flags */
-	xfs_fsblock_t		t_firstblock;	/* first block allocated */
+	xfs_agnumber_t		t_highest_agno;	/* highest AGF locked */
 	struct xlog_ticket	*t_ticket;	/* log mgr ticket */
 	struct xfs_mount	*t_mountp;	/* ptr to fs mount struct */
 	struct xfs_dquot_acct   *t_dqinfo;	/* acctg info for dquots */
-- 
2.39.0



* [PATCH 06/42] xfs: don't assert fail on transaction cancel with deferred ops
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (4 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 05/42] xfs: t_firstblock is tracking AGs not blocks Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-19 22:18   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 07/42] xfs: active perag reference counting Dave Chinner
                   ` (36 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We can error out of an allocation transaction when updating BMBT
blocks when things go wrong. This can be a btree corruption, an
unexpected ENOSPC, etc. In these cases, we already have deferred ops
queued for the first allocation that has been done, and we just want
to cancel out the transaction and shut down the filesystem on error.

In fact, we do just that for production systems - the assert that we
can't have a transaction with defer ops attached unless we are
already shut down is bogus and gets in the way of debugging
whatever issue is actually causing the transaction to be cancelled.

Remove the assert because it is causing spurious test failures to
hang test machines.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_trans.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 53ab544e4c2c..8afc0c080861 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1078,10 +1078,10 @@ xfs_trans_cancel(
 	/*
 	 * It's never valid to cancel a transaction with deferred ops attached,
 	 * because the transaction is effectively dirty.  Complain about this
-	 * loudly before freeing the in-memory defer items.
+	 * loudly before freeing the in-memory defer items and shutting down the
+	 * filesystem.
 	 */
 	if (!list_empty(&tp->t_dfops)) {
-		ASSERT(xfs_is_shutdown(mp) || list_empty(&tp->t_dfops));
 		ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 		dirty = true;
 		xfs_defer_cancel(tp);
-- 
2.39.0



* [PATCH 07/42] xfs: active perag reference counting
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (5 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 06/42] xfs: don't assert fail on transaction cancel with deferred ops Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-21  5:16   ` Allison Henderson
  2023-02-01 19:08   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 08/42] xfs: rework the perag trace points to be perag centric Dave Chinner
                   ` (35 subsequent siblings)
  42 siblings, 2 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We need to be able to dynamically remove instantiated AGs from
memory safely, either for shrinking the filesystem or paging AG
state in and out of memory (e.g. supporting millions of AGs). This
means we need to be able to safely exclude operations from accessing
perags while dynamic removal is in progress.

To do this, introduce the concept of active and passive references.
Active references are required for high level operations that make
use of an AG for a given operation (e.g. allocation) and pin the
perag in memory for the duration of the operation that is operating
on the perag (e.g. transaction scope). This means we can fail to get
an active reference to an AG, hence callers of the new active
reference API must be able to handle lookup failure gracefully.

Passive references are used in low level code, where we might need
to access the perag structure for the purposes of completing high
level operations. For example, buffers need to use passive
references because:
- we need to be able to do metadata IO during operations like grow
  and shrink transactions where high level active references to the
  AG have already been blocked
- buffers need to pin the perag until they are reclaimed from
  memory, something that high level code has no direct control over.
- unused cached buffers should not prevent a shrink from being
  started.

Hence we have active references that will form exclusion barriers
for operations to be performed on an AG, and passive references that
will prevent reclaim of the perag until all objects with passive
references have been reclaimed themselves.

This patch introduces xfs_perag_grab()/xfs_perag_rele() as the API
for active AG reference functionality. We also need to convert the
for_each_perag*() iterators to use active references, which will
start the process of converting high level code over to using active
references. Conversion of non-iterator based code to active
references will be done in followup patches.

Note that the implementation using reference counting is really just
a development vehicle for the API to ensure we don't have any leaks
in the callers. Once we need to remove perag structures from memory
dynamically, we will need a much more robust per-ag state transition
mechanism for preventing new references from being taken while we
wait for existing references to drain before removal from memory can
occur....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c    | 70 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ag.h    | 31 ++++++++++++-----
 fs/xfs/scrub/bmap.c       |  2 +-
 fs/xfs/scrub/fscounters.c |  4 +--
 fs/xfs/xfs_fsmap.c        |  4 +--
 fs/xfs/xfs_icache.c       |  2 +-
 fs/xfs/xfs_iwalk.c        |  6 ++--
 fs/xfs/xfs_reflink.c      |  2 +-
 fs/xfs/xfs_trace.h        |  3 ++
 9 files changed, 105 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index bb0c700afe3c..46e25c682bf4 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -94,6 +94,68 @@ xfs_perag_put(
 	trace_xfs_perag_put(pag->pag_mount, pag->pag_agno, ref, _RET_IP_);
 }
 
+/*
+ * Active references for perag structures. This is for short term access to the
+ * per ag structures for walking trees or accessing state. If an AG is being
+ * shrunk or is offline, then this will fail to find that AG and return NULL
+ * instead.
+ */
+struct xfs_perag *
+xfs_perag_grab(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_perag	*pag;
+
+	rcu_read_lock();
+	pag = radix_tree_lookup(&mp->m_perag_tree, agno);
+	if (pag) {
+		trace_xfs_perag_grab(mp, pag->pag_agno,
+				atomic_read(&pag->pag_active_ref), _RET_IP_);
+		if (!atomic_inc_not_zero(&pag->pag_active_ref))
+			pag = NULL;
+	}
+	rcu_read_unlock();
+	return pag;
+}
+
+/*
+ * search from @first to find the next perag with the given tag set.
+ */
+struct xfs_perag *
+xfs_perag_grab_tag(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		first,
+	int			tag)
+{
+	struct xfs_perag	*pag;
+	int			found;
+
+	rcu_read_lock();
+	found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
+					(void **)&pag, first, 1, tag);
+	if (found <= 0) {
+		rcu_read_unlock();
+		return NULL;
+	}
+	trace_xfs_perag_grab_tag(mp, pag->pag_agno,
+			atomic_read(&pag->pag_active_ref), _RET_IP_);
+	if (!atomic_inc_not_zero(&pag->pag_active_ref))
+		pag = NULL;
+	rcu_read_unlock();
+	return pag;
+}
+
+void
+xfs_perag_rele(
+	struct xfs_perag	*pag)
+{
+	trace_xfs_perag_rele(pag->pag_mount, pag->pag_agno,
+			atomic_read(&pag->pag_active_ref), _RET_IP_);
+	if (atomic_dec_and_test(&pag->pag_active_ref))
+		wake_up(&pag->pag_active_wq);
+}
+
 /*
  * xfs_initialize_perag_data
  *
@@ -196,6 +258,10 @@ xfs_free_perag(
 		cancel_delayed_work_sync(&pag->pag_blockgc_work);
 		xfs_buf_hash_destroy(pag);
 
+		/* drop the mount's active reference */
+		xfs_perag_rele(pag);
+		XFS_IS_CORRUPT(pag->pag_mount,
+				atomic_read(&pag->pag_active_ref) != 0);
 		call_rcu(&pag->rcu_head, __xfs_free_perag);
 	}
 }
@@ -314,6 +380,7 @@ xfs_initialize_perag(
 		INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker);
 		INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
 		init_waitqueue_head(&pag->pagb_wait);
+		init_waitqueue_head(&pag->pag_active_wq);
 		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
 #endif /* __KERNEL__ */
@@ -322,6 +389,9 @@ xfs_initialize_perag(
 		if (error)
 			goto out_remove_pag;
 
+		/* Active ref owned by mount indicates AG is online. */
+		atomic_set(&pag->pag_active_ref, 1);
+
 		/* first new pag is fully initialized */
 		if (first_initialised == NULLAGNUMBER)
 			first_initialised = index;
diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index 191b22b9a35b..aeb21c8df201 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -32,7 +32,9 @@ struct xfs_ag_resv {
 struct xfs_perag {
 	struct xfs_mount *pag_mount;	/* owner filesystem */
 	xfs_agnumber_t	pag_agno;	/* AG this structure belongs to */
-	atomic_t	pag_ref;	/* perag reference count */
+	atomic_t	pag_ref;	/* passive reference count */
+	atomic_t	pag_active_ref;	/* active reference count */
+	wait_queue_head_t pag_active_wq;/* woken active_ref falls to zero */
 	char		pagf_init;	/* this agf's entry is initialized */
 	char		pagi_init;	/* this agi's entry is initialized */
 	char		pagf_metadata;	/* the agf is preferred to be metadata */
@@ -111,11 +113,18 @@ int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount,
 int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno);
 void xfs_free_perag(struct xfs_mount *mp);
 
+/* Passive AG references */
 struct xfs_perag *xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno);
 struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *mp, xfs_agnumber_t agno,
 		unsigned int tag);
 void xfs_perag_put(struct xfs_perag *pag);
 
+/* Active AG references */
+struct xfs_perag *xfs_perag_grab(struct xfs_mount *, xfs_agnumber_t);
+struct xfs_perag *xfs_perag_grab_tag(struct xfs_mount *, xfs_agnumber_t,
+				   int tag);
+void xfs_perag_rele(struct xfs_perag *pag);
+
 /*
  * Per-ag geometry infomation and validation
  */
@@ -193,14 +202,18 @@ xfs_perag_next(
 	struct xfs_mount	*mp = pag->pag_mount;
 
 	*agno = pag->pag_agno + 1;
-	xfs_perag_put(pag);
-	if (*agno > end_agno)
-		return NULL;
-	return xfs_perag_get(mp, *agno);
+	xfs_perag_rele(pag);
+	while (*agno <= end_agno) {
+		pag = xfs_perag_grab(mp, *agno);
+		if (pag)
+			return pag;
+		(*agno)++;
+	}
+	return NULL;
 }
 
 #define for_each_perag_range(mp, agno, end_agno, pag) \
-	for ((pag) = xfs_perag_get((mp), (agno)); \
+	for ((pag) = xfs_perag_grab((mp), (agno)); \
 		(pag) != NULL; \
 		(pag) = xfs_perag_next((pag), &(agno), (end_agno)))
 
@@ -213,11 +226,11 @@ xfs_perag_next(
 	for_each_perag_from((mp), (agno), (pag))
 
 #define for_each_perag_tag(mp, agno, pag, tag) \
-	for ((agno) = 0, (pag) = xfs_perag_get_tag((mp), 0, (tag)); \
+	for ((agno) = 0, (pag) = xfs_perag_grab_tag((mp), 0, (tag)); \
 		(pag) != NULL; \
 		(agno) = (pag)->pag_agno + 1, \
-		xfs_perag_put(pag), \
-		(pag) = xfs_perag_get_tag((mp), (agno), (tag)))
+		xfs_perag_rele(pag), \
+		(pag) = xfs_perag_grab_tag((mp), (agno), (tag)))
 
 struct aghdr_init_data {
 	/* per ag data */
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index d50d0eab196a..dbbc7037074c 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -662,7 +662,7 @@ xchk_bmap_check_rmaps(
 		error = xchk_bmap_check_ag_rmaps(sc, whichfork, pag);
 		if (error ||
 		    (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) {
-			xfs_perag_put(pag);
+			xfs_perag_rele(pag);
 			return error;
 		}
 	}
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index 4777e7b89fdc..ef97670970c3 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -117,7 +117,7 @@ xchk_fscount_warmup(
 	if (agi_bp)
 		xfs_buf_relse(agi_bp);
 	if (pag)
-		xfs_perag_put(pag);
+		xfs_perag_rele(pag);
 	return error;
 }
 
@@ -249,7 +249,7 @@ xchk_fscount_aggregate_agcounts(
 
 	}
 	if (pag)
-		xfs_perag_put(pag);
+		xfs_perag_rele(pag);
 	if (error) {
 		xchk_set_incomplete(sc);
 		return error;
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 88a88506ffff..120d284a03fe 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -688,11 +688,11 @@ __xfs_getfsmap_datadev(
 		info->agf_bp = NULL;
 	}
 	if (info->pag) {
-		xfs_perag_put(info->pag);
+		xfs_perag_rele(info->pag);
 		info->pag = NULL;
 	} else if (pag) {
 		/* loop termination case */
-		xfs_perag_put(pag);
+		xfs_perag_rele(pag);
 	}
 
 	return error;
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index ddeaccc04aec..0f4a014dded3 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1767,7 +1767,7 @@ xfs_icwalk(
 		if (error) {
 			last_error = error;
 			if (error == -EFSCORRUPTED) {
-				xfs_perag_put(pag);
+				xfs_perag_rele(pag);
 				break;
 			}
 		}
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index 7558486f4937..c31857d903a4 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -591,7 +591,7 @@ xfs_iwalk(
 	}
 
 	if (iwag.pag)
-		xfs_perag_put(pag);
+		xfs_perag_rele(pag);
 	xfs_iwalk_free(&iwag);
 	return error;
 }
@@ -683,7 +683,7 @@ xfs_iwalk_threaded(
 			break;
 	}
 	if (pag)
-		xfs_perag_put(pag);
+		xfs_perag_rele(pag);
 	if (polled)
 		xfs_pwork_poll(&pctl);
 	return xfs_pwork_destroy(&pctl);
@@ -776,7 +776,7 @@ xfs_inobt_walk(
 	}
 
 	if (iwag.pag)
-		xfs_perag_put(pag);
+		xfs_perag_rele(pag);
 	xfs_iwalk_free(&iwag);
 	return error;
 }
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 57bf59ff4854..f5dc46ce9803 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -927,7 +927,7 @@ xfs_reflink_recover_cow(
 	for_each_perag(mp, agno, pag) {
 		error = xfs_refcount_recover_cow_leftovers(mp, pag);
 		if (error) {
-			xfs_perag_put(pag);
+			xfs_perag_rele(pag);
 			break;
 		}
 	}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 7dc57db6aa42..f0b62054ea68 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -189,6 +189,9 @@ DEFINE_EVENT(xfs_perag_class, name,	\
 DEFINE_PERAG_REF_EVENT(xfs_perag_get);
 DEFINE_PERAG_REF_EVENT(xfs_perag_get_tag);
 DEFINE_PERAG_REF_EVENT(xfs_perag_put);
+DEFINE_PERAG_REF_EVENT(xfs_perag_grab);
+DEFINE_PERAG_REF_EVENT(xfs_perag_grab_tag);
+DEFINE_PERAG_REF_EVENT(xfs_perag_rele);
 DEFINE_PERAG_REF_EVENT(xfs_perag_set_inode_tag);
 DEFINE_PERAG_REF_EVENT(xfs_perag_clear_inode_tag);
 
-- 
2.39.0



* [PATCH 08/42] xfs: rework the perag trace points to be perag centric
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (6 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 07/42] xfs: active perag reference counting Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-21  5:16   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 09/42] xfs: convert xfs_imap() to take a perag Dave Chinner
                   ` (34 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

So that they all output the same information in the traces to make
debugging refcount issues easier.

This means that all the lookup/drop functions no longer need to use
the full memory barrier atomic operations (atomic*_return()), and so
will have less overhead when tracing is off. The set/clear tag
tracepoints no longer abuse the reference count to pass the tag -
the tag being cleared is obvious from the _RET_IP_ that is recorded
in the trace point.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c | 25 +++++++++----------------
 fs/xfs/xfs_icache.c    |  4 ++--
 fs/xfs/xfs_trace.h     | 21 +++++++++++----------
 3 files changed, 22 insertions(+), 28 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 46e25c682bf4..7cff61875340 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -44,16 +44,15 @@ xfs_perag_get(
 	xfs_agnumber_t		agno)
 {
 	struct xfs_perag	*pag;
-	int			ref = 0;
 
 	rcu_read_lock();
 	pag = radix_tree_lookup(&mp->m_perag_tree, agno);
 	if (pag) {
+		trace_xfs_perag_get(pag, _RET_IP_);
 		ASSERT(atomic_read(&pag->pag_ref) >= 0);
-		ref = atomic_inc_return(&pag->pag_ref);
+		atomic_inc(&pag->pag_ref);
 	}
 	rcu_read_unlock();
-	trace_xfs_perag_get(mp, agno, ref, _RET_IP_);
 	return pag;
 }
 
@@ -68,7 +67,6 @@ xfs_perag_get_tag(
 {
 	struct xfs_perag	*pag;
 	int			found;
-	int			ref;
 
 	rcu_read_lock();
 	found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
@@ -77,9 +75,9 @@ xfs_perag_get_tag(
 		rcu_read_unlock();
 		return NULL;
 	}
-	ref = atomic_inc_return(&pag->pag_ref);
+	trace_xfs_perag_get_tag(pag, _RET_IP_);
+	atomic_inc(&pag->pag_ref);
 	rcu_read_unlock();
-	trace_xfs_perag_get_tag(mp, pag->pag_agno, ref, _RET_IP_);
 	return pag;
 }
 
@@ -87,11 +85,9 @@ void
 xfs_perag_put(
 	struct xfs_perag	*pag)
 {
-	int	ref;
-
+	trace_xfs_perag_put(pag, _RET_IP_);
 	ASSERT(atomic_read(&pag->pag_ref) > 0);
-	ref = atomic_dec_return(&pag->pag_ref);
-	trace_xfs_perag_put(pag->pag_mount, pag->pag_agno, ref, _RET_IP_);
+	atomic_dec(&pag->pag_ref);
 }
 
 /*
@@ -110,8 +106,7 @@ xfs_perag_grab(
 	rcu_read_lock();
 	pag = radix_tree_lookup(&mp->m_perag_tree, agno);
 	if (pag) {
-		trace_xfs_perag_grab(mp, pag->pag_agno,
-				atomic_read(&pag->pag_active_ref), _RET_IP_);
+		trace_xfs_perag_grab(pag, _RET_IP_);
 		if (!atomic_inc_not_zero(&pag->pag_active_ref))
 			pag = NULL;
 	}
@@ -138,8 +133,7 @@ xfs_perag_grab_tag(
 		rcu_read_unlock();
 		return NULL;
 	}
-	trace_xfs_perag_grab_tag(mp, pag->pag_agno,
-			atomic_read(&pag->pag_active_ref), _RET_IP_);
+	trace_xfs_perag_grab_tag(pag, _RET_IP_);
 	if (!atomic_inc_not_zero(&pag->pag_active_ref))
 		pag = NULL;
 	rcu_read_unlock();
@@ -150,8 +144,7 @@ void
 xfs_perag_rele(
 	struct xfs_perag	*pag)
 {
-	trace_xfs_perag_rele(pag->pag_mount, pag->pag_agno,
-			atomic_read(&pag->pag_active_ref), _RET_IP_);
+	trace_xfs_perag_rele(pag, _RET_IP_);
 	if (atomic_dec_and_test(&pag->pag_active_ref))
 		wake_up(&pag->pag_active_wq);
 }
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0f4a014dded3..8b2823d85a68 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -255,7 +255,7 @@ xfs_perag_set_inode_tag(
 		break;
 	}
 
-	trace_xfs_perag_set_inode_tag(mp, pag->pag_agno, tag, _RET_IP_);
+	trace_xfs_perag_set_inode_tag(pag, _RET_IP_);
 }
 
 /* Clear a tag on both the AG incore inode tree and the AG radix tree. */
@@ -289,7 +289,7 @@ xfs_perag_clear_inode_tag(
 	radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno, tag);
 	spin_unlock(&mp->m_perag_lock);
 
-	trace_xfs_perag_clear_inode_tag(mp, pag->pag_agno, tag, _RET_IP_);
+	trace_xfs_perag_clear_inode_tag(pag, _RET_IP_);
 }
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f0b62054ea68..c921e9a5256d 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -159,33 +159,34 @@ TRACE_EVENT(xlog_intent_recovery_failed,
 );
 
 DECLARE_EVENT_CLASS(xfs_perag_class,
-	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int refcount,
-		 unsigned long caller_ip),
-	TP_ARGS(mp, agno, refcount, caller_ip),
+	TP_PROTO(struct xfs_perag *pag, unsigned long caller_ip),
+	TP_ARGS(pag, caller_ip),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
 		__field(int, refcount)
+		__field(int, active_refcount)
 		__field(unsigned long, caller_ip)
 	),
 	TP_fast_assign(
-		__entry->dev = mp->m_super->s_dev;
-		__entry->agno = agno;
-		__entry->refcount = refcount;
+		__entry->dev = pag->pag_mount->m_super->s_dev;
+		__entry->agno = pag->pag_agno;
+		__entry->refcount = atomic_read(&pag->pag_ref);
+		__entry->active_refcount = atomic_read(&pag->pag_active_ref);
 		__entry->caller_ip = caller_ip;
 	),
-	TP_printk("dev %d:%d agno 0x%x refcount %d caller %pS",
+	TP_printk("dev %d:%d agno 0x%x passive refs %d active refs %d caller %pS",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->agno,
 		  __entry->refcount,
+		  __entry->active_refcount,
 		  (char *)__entry->caller_ip)
 );
 
 #define DEFINE_PERAG_REF_EVENT(name)	\
 DEFINE_EVENT(xfs_perag_class, name,	\
-	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int refcount,	\
-		 unsigned long caller_ip),					\
-	TP_ARGS(mp, agno, refcount, caller_ip))
+	TP_PROTO(struct xfs_perag *pag, unsigned long caller_ip), \
+	TP_ARGS(pag, caller_ip))
 DEFINE_PERAG_REF_EVENT(xfs_perag_get);
 DEFINE_PERAG_REF_EVENT(xfs_perag_get_tag);
 DEFINE_PERAG_REF_EVENT(xfs_perag_put);
-- 
2.39.0



* [PATCH 09/42] xfs: convert xfs_imap() to take a perag
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (7 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 08/42] xfs: rework the perag trace points to be perag centric Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-02-01 19:10   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 10/42] xfs: use active perag references for inode allocation Dave Chinner
                   ` (33 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Callers hold referenced perags, but they don't pass them into
xfs_imap(), so it takes its own reference. Fix that so we can change
inode allocation over to using active references.
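
The shape of the change can be sketched in a few lines of userspace C
(a toy model with invented names, not the kernel code): the old
function looked up and released its own reference on every call, while
the new one simply borrows the reference its caller already holds:

```c
#include <assert.h>
#include <stddef.h>

struct perag_model { int agno; int ref; };

static struct perag_model ags[4];	/* stand-in for the per-ag tree */

static struct perag_model *perag_get(int agno)
{
	if (agno < 0 || agno >= 4)
		return NULL;
	ags[agno].ref++;
	return &ags[agno];
}

static void perag_put(struct perag_model *pag)
{
	pag->ref--;
}

/* Old shape: takes its own reference even though the caller already
 * holds one, so each call pays an extra get/put pair. */
static int imap_old(int agno, int *ref_seen)
{
	struct perag_model *pag = perag_get(agno);

	if (!pag)
		return -1;
	*ref_seen = pag->ref;
	perag_put(pag);
	return 0;
}

/* New shape: borrows the caller's reference for the whole call. */
static int imap_new(struct perag_model *pag, int *ref_seen)
{
	*ref_seen = pag->ref;
	return 0;
}
```

Borrowing also means the function can later be switched to an active
reference without changing its body: only the caller's get/put pair
needs to change.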

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 43 +++++++++++++-------------------------
 fs/xfs/libxfs/xfs_ialloc.h |  3 ++-
 fs/xfs/scrub/common.c      | 13 ++++++++----
 fs/xfs/xfs_icache.c        |  2 +-
 4 files changed, 27 insertions(+), 34 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index e8068422aa21..2b4961ff2e24 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -2217,15 +2217,15 @@ xfs_difree(
 
 STATIC int
 xfs_imap_lookup(
-	struct xfs_mount	*mp,
-	struct xfs_trans	*tp,
 	struct xfs_perag	*pag,
+	struct xfs_trans	*tp,
 	xfs_agino_t		agino,
 	xfs_agblock_t		agbno,
 	xfs_agblock_t		*chunk_agbno,
 	xfs_agblock_t		*offset_agbno,
 	int			flags)
 {
+	struct xfs_mount	*mp = pag->pag_mount;
 	struct xfs_inobt_rec_incore rec;
 	struct xfs_btree_cur	*cur;
 	struct xfs_buf		*agbp;
@@ -2280,12 +2280,13 @@ xfs_imap_lookup(
  */
 int
 xfs_imap(
-	struct xfs_mount	 *mp,	/* file system mount structure */
+	struct xfs_perag	*pag,
 	struct xfs_trans	 *tp,	/* transaction pointer */
 	xfs_ino_t		ino,	/* inode to locate */
 	struct xfs_imap		*imap,	/* location map structure */
 	uint			flags)	/* flags for inode btree lookup */
 {
+	struct xfs_mount	*mp = pag->pag_mount;
 	xfs_agblock_t		agbno;	/* block number of inode in the alloc group */
 	xfs_agino_t		agino;	/* inode number within alloc group */
 	xfs_agblock_t		chunk_agbno;	/* first block in inode chunk */
@@ -2293,17 +2294,15 @@ xfs_imap(
 	int			error;	/* error code */
 	int			offset;	/* index of inode in its buffer */
 	xfs_agblock_t		offset_agbno;	/* blks from chunk start to inode */
-	struct xfs_perag	*pag;
 
 	ASSERT(ino != NULLFSINO);
 
 	/*
 	 * Split up the inode number into its parts.
 	 */
-	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ino));
 	agino = XFS_INO_TO_AGINO(mp, ino);
 	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
-	if (!pag || agbno >= mp->m_sb.sb_agblocks ||
+	if (agbno >= mp->m_sb.sb_agblocks ||
 	    ino != XFS_AGINO_TO_INO(mp, pag->pag_agno, agino)) {
 		error = -EINVAL;
 #ifdef DEBUG
@@ -2312,20 +2311,14 @@ xfs_imap(
 		 * as they can be invalid without implying corruption.
 		 */
 		if (flags & XFS_IGET_UNTRUSTED)
-			goto out_drop;
-		if (!pag) {
-			xfs_alert(mp,
-				"%s: agno (%d) >= mp->m_sb.sb_agcount (%d)",
-				__func__, XFS_INO_TO_AGNO(mp, ino),
-				mp->m_sb.sb_agcount);
-		}
+			return error;
 		if (agbno >= mp->m_sb.sb_agblocks) {
 			xfs_alert(mp,
 		"%s: agbno (0x%llx) >= mp->m_sb.sb_agblocks (0x%lx)",
 				__func__, (unsigned long long)agbno,
 				(unsigned long)mp->m_sb.sb_agblocks);
 		}
-		if (pag && ino != XFS_AGINO_TO_INO(mp, pag->pag_agno, agino)) {
+		if (ino != XFS_AGINO_TO_INO(mp, pag->pag_agno, agino)) {
 			xfs_alert(mp,
 		"%s: ino (0x%llx) != XFS_AGINO_TO_INO() (0x%llx)",
 				__func__, ino,
@@ -2333,7 +2326,7 @@ xfs_imap(
 		}
 		xfs_stack_trace();
 #endif /* DEBUG */
-		goto out_drop;
+		return error;
 	}
 
 	/*
@@ -2344,10 +2337,10 @@ xfs_imap(
 	 * in all cases where an untrusted inode number is passed.
 	 */
 	if (flags & XFS_IGET_UNTRUSTED) {
-		error = xfs_imap_lookup(mp, tp, pag, agino, agbno,
+		error = xfs_imap_lookup(pag, tp, agino, agbno,
 					&chunk_agbno, &offset_agbno, flags);
 		if (error)
-			goto out_drop;
+			return error;
 		goto out_map;
 	}
 
@@ -2363,8 +2356,7 @@ xfs_imap(
 		imap->im_len = XFS_FSB_TO_BB(mp, 1);
 		imap->im_boffset = (unsigned short)(offset <<
 							mp->m_sb.sb_inodelog);
-		error = 0;
-		goto out_drop;
+		return 0;
 	}
 
 	/*
@@ -2376,10 +2368,10 @@ xfs_imap(
 		offset_agbno = agbno & M_IGEO(mp)->inoalign_mask;
 		chunk_agbno = agbno - offset_agbno;
 	} else {
-		error = xfs_imap_lookup(mp, tp, pag, agino, agbno,
+		error = xfs_imap_lookup(pag, tp, agino, agbno,
 					&chunk_agbno, &offset_agbno, flags);
 		if (error)
-			goto out_drop;
+			return error;
 	}
 
 out_map:
@@ -2407,14 +2399,9 @@ xfs_imap(
 			__func__, (unsigned long long) imap->im_blkno,
 			(unsigned long long) imap->im_len,
 			XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks));
-		error = -EINVAL;
-		goto out_drop;
+		return -EINVAL;
 	}
-	error = 0;
-out_drop:
-	if (pag)
-		xfs_perag_put(pag);
-	return error;
+	return 0;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index 9bbbca6ac4ed..4cfce2eebe7e 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -12,6 +12,7 @@ struct xfs_imap;
 struct xfs_mount;
 struct xfs_trans;
 struct xfs_btree_cur;
+struct xfs_perag;
 
 /* Move inodes in clusters of this size */
 #define	XFS_INODE_BIG_CLUSTER_SIZE	8192
@@ -47,7 +48,7 @@ int xfs_difree(struct xfs_trans *tp, struct xfs_perag *pag,
  */
 int
 xfs_imap(
-	struct xfs_mount *mp,		/* file system mount structure */
+	struct xfs_perag *pag,
 	struct xfs_trans *tp,		/* transaction pointer */
 	xfs_ino_t	ino,		/* inode to locate */
 	struct xfs_imap	*imap,		/* location map structure */
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 613260b04a3d..033bf6730ece 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -636,6 +636,7 @@ xchk_get_inode(
 {
 	struct xfs_imap		imap;
 	struct xfs_mount	*mp = sc->mp;
+	struct xfs_perag	*pag;
 	struct xfs_inode	*ip_in = XFS_I(file_inode(sc->file));
 	struct xfs_inode	*ip = NULL;
 	int			error;
@@ -671,10 +672,14 @@ xchk_get_inode(
 		 * Otherwise, we really couldn't find it so tell userspace
 		 * that it no longer exists.
 		 */
-		error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap,
-				XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE);
-		if (error)
-			return -ENOENT;
+		pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, sc->sm->sm_ino));
+		if (pag) {
+			error = xfs_imap(pag, sc->tp, sc->sm->sm_ino, &imap,
+					XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE);
+			xfs_perag_put(pag);
+			if (error)
+				return -ENOENT;
+		}
 		error = -EFSCORRUPTED;
 		fallthrough;
 	default:
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 8b2823d85a68..c9a7e270a428 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -586,7 +586,7 @@ xfs_iget_cache_miss(
 	if (!ip)
 		return -ENOMEM;
 
-	error = xfs_imap(mp, tp, ip->i_ino, &ip->i_imap, flags);
+	error = xfs_imap(pag, tp, ip->i_ino, &ip->i_imap, flags);
 	if (error)
 		goto out_destroy;
 
-- 
2.39.0



* [PATCH 10/42] xfs: use active perag references for inode allocation
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (8 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 09/42] xfs: convert xfs_imap() to take a perag Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-22  6:48   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 11/42] xfs: inobt can use perags in many more places than it does Dave Chinner
                   ` (32 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Convert the inode allocation routines to use active perag references
or references held by callers rather than grabbing their own. Also
drive the perag further inwards to replace the xfs_mount when doing
operations on a specific AG.
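
The difference between the two reference types can be modelled in a
few lines of C11 (invented names; a sketch, not the kernel
implementation). A passive get always succeeds, but an active grab
uses an inc-not-zero operation, so once shrink has drained the active
count to zero, new active uses are refused:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct perag_model {
	atomic_int	passive_ref;	/* xfs_perag_get/put style */
	atomic_int	active_ref;	/* xfs_perag_grab/rele style */
};

/* atomic_inc_not_zero() analogue: only succeeds while the count > 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old > 0) {
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

static struct perag_model *perag_grab(struct perag_model *pag)
{
	return inc_not_zero(&pag->active_ref) ? pag : NULL;
}

static void perag_rele(struct perag_model *pag)
{
	/* the real code wakes shrink waiters when this reaches zero */
	atomic_fetch_sub(&pag->active_ref, 1);
}
```

This is the barrier the cover letter describes: dropping the initial
active reference drains the AG, and every later grab fails until the
AG is brought back into service.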

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c     |  3 +-
 fs/xfs/libxfs/xfs_ialloc.c | 63 +++++++++++++++++++-------------------
 fs/xfs/libxfs/xfs_ialloc.h |  2 +-
 3 files changed, 33 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 7cff61875340..a3bdcde95845 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -925,8 +925,7 @@ xfs_ag_shrink_space(
 	 * Make sure that the last inode cluster cannot overlap with the new
 	 * end of the AG, even if it's sparse.
 	 */
-	error = xfs_ialloc_check_shrink(*tpp, pag->pag_agno, agibp,
-			aglen - delta);
+	error = xfs_ialloc_check_shrink(pag, *tpp, agibp, aglen - delta);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 2b4961ff2e24..a1a482ec3065 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -169,14 +169,14 @@ xfs_inobt_insert_rec(
  */
 STATIC int
 xfs_inobt_insert(
-	struct xfs_mount	*mp,
+	struct xfs_perag	*pag,
 	struct xfs_trans	*tp,
 	struct xfs_buf		*agbp,
-	struct xfs_perag	*pag,
 	xfs_agino_t		newino,
 	xfs_agino_t		newlen,
 	xfs_btnum_t		btnum)
 {
+	struct xfs_mount	*mp = pag->pag_mount;
 	struct xfs_btree_cur	*cur;
 	xfs_agino_t		thisino;
 	int			i;
@@ -514,14 +514,14 @@ __xfs_inobt_rec_merge(
  */
 STATIC int
 xfs_inobt_insert_sprec(
-	struct xfs_mount		*mp,
+	struct xfs_perag		*pag,
 	struct xfs_trans		*tp,
 	struct xfs_buf			*agbp,
-	struct xfs_perag		*pag,
 	int				btnum,
 	struct xfs_inobt_rec_incore	*nrec,	/* in/out: new/merged rec. */
 	bool				merge)	/* merge or replace */
 {
+	struct xfs_mount		*mp = pag->pag_mount;
 	struct xfs_btree_cur		*cur;
 	int				error;
 	int				i;
@@ -609,9 +609,9 @@ xfs_inobt_insert_sprec(
  */
 STATIC int
 xfs_ialloc_ag_alloc(
+	struct xfs_perag	*pag,
 	struct xfs_trans	*tp,
-	struct xfs_buf		*agbp,
-	struct xfs_perag	*pag)
+	struct xfs_buf		*agbp)
 {
 	struct xfs_agi		*agi;
 	struct xfs_alloc_arg	args;
@@ -831,7 +831,7 @@ xfs_ialloc_ag_alloc(
 		 * if necessary. If a merge does occur, rec is updated to the
 		 * merged record.
 		 */
-		error = xfs_inobt_insert_sprec(args.mp, tp, agbp, pag,
+		error = xfs_inobt_insert_sprec(pag, tp, agbp,
 				XFS_BTNUM_INO, &rec, true);
 		if (error == -EFSCORRUPTED) {
 			xfs_alert(args.mp,
@@ -856,20 +856,20 @@ xfs_ialloc_ag_alloc(
 		 * existing record with this one.
 		 */
 		if (xfs_has_finobt(args.mp)) {
-			error = xfs_inobt_insert_sprec(args.mp, tp, agbp, pag,
+			error = xfs_inobt_insert_sprec(pag, tp, agbp,
 				       XFS_BTNUM_FINO, &rec, false);
 			if (error)
 				return error;
 		}
 	} else {
 		/* full chunk - insert new records to both btrees */
-		error = xfs_inobt_insert(args.mp, tp, agbp, pag, newino, newlen,
+		error = xfs_inobt_insert(pag, tp, agbp, newino, newlen,
 					 XFS_BTNUM_INO);
 		if (error)
 			return error;
 
 		if (xfs_has_finobt(args.mp)) {
-			error = xfs_inobt_insert(args.mp, tp, agbp, pag, newino,
+			error = xfs_inobt_insert(pag, tp, agbp, newino,
 						 newlen, XFS_BTNUM_FINO);
 			if (error)
 				return error;
@@ -981,9 +981,9 @@ xfs_inobt_first_free_inode(
  */
 STATIC int
 xfs_dialloc_ag_inobt(
+	struct xfs_perag	*pag,
 	struct xfs_trans	*tp,
 	struct xfs_buf		*agbp,
-	struct xfs_perag	*pag,
 	xfs_ino_t		parent,
 	xfs_ino_t		*inop)
 {
@@ -1429,9 +1429,9 @@ xfs_dialloc_ag_update_inobt(
  */
 static int
 xfs_dialloc_ag(
+	struct xfs_perag	*pag,
 	struct xfs_trans	*tp,
 	struct xfs_buf		*agbp,
-	struct xfs_perag	*pag,
 	xfs_ino_t		parent,
 	xfs_ino_t		*inop)
 {
@@ -1448,7 +1448,7 @@ xfs_dialloc_ag(
 	int				i;
 
 	if (!xfs_has_finobt(mp))
-		return xfs_dialloc_ag_inobt(tp, agbp, pag, parent, inop);
+		return xfs_dialloc_ag_inobt(pag, tp, agbp, parent, inop);
 
 	/*
 	 * If pagino is 0 (this is the root inode allocation) use newino.
@@ -1594,8 +1594,8 @@ xfs_ialloc_next_ag(
 
 static bool
 xfs_dialloc_good_ag(
-	struct xfs_trans	*tp,
 	struct xfs_perag	*pag,
+	struct xfs_trans	*tp,
 	umode_t			mode,
 	int			flags,
 	bool			ok_alloc)
@@ -1606,6 +1606,8 @@ xfs_dialloc_good_ag(
 	int			needspace;
 	int			error;
 
+	if (!pag)
+		return false;
 	if (!pag->pagi_inodeok)
 		return false;
 
@@ -1665,8 +1667,8 @@ xfs_dialloc_good_ag(
 
 static int
 xfs_dialloc_try_ag(
-	struct xfs_trans	**tpp,
 	struct xfs_perag	*pag,
+	struct xfs_trans	**tpp,
 	xfs_ino_t		parent,
 	xfs_ino_t		*new_ino,
 	bool			ok_alloc)
@@ -1689,7 +1691,7 @@ xfs_dialloc_try_ag(
 			goto out_release;
 		}
 
-		error = xfs_ialloc_ag_alloc(*tpp, agbp, pag);
+		error = xfs_ialloc_ag_alloc(pag, *tpp, agbp);
 		if (error < 0)
 			goto out_release;
 
@@ -1705,7 +1707,7 @@ xfs_dialloc_try_ag(
 	}
 
 	/* Allocate an inode in the found AG */
-	error = xfs_dialloc_ag(*tpp, agbp, pag, parent, &ino);
+	error = xfs_dialloc_ag(pag, *tpp, agbp, parent, &ino);
 	if (!error)
 		*new_ino = ino;
 	return error;
@@ -1790,9 +1792,9 @@ xfs_dialloc(
 	agno = start_agno;
 	flags = XFS_ALLOC_FLAG_TRYLOCK;
 	for (;;) {
-		pag = xfs_perag_get(mp, agno);
-		if (xfs_dialloc_good_ag(*tpp, pag, mode, flags, ok_alloc)) {
-			error = xfs_dialloc_try_ag(tpp, pag, parent,
+		pag = xfs_perag_grab(mp, agno);
+		if (xfs_dialloc_good_ag(pag, *tpp, mode, flags, ok_alloc)) {
+			error = xfs_dialloc_try_ag(pag, tpp, parent,
 					&ino, ok_alloc);
 			if (error != -EAGAIN)
 				break;
@@ -1813,12 +1815,12 @@ xfs_dialloc(
 			if (low_space)
 				ok_alloc = true;
 		}
-		xfs_perag_put(pag);
+		xfs_perag_rele(pag);
 	}
 
 	if (!error)
 		*new_ino = ino;
-	xfs_perag_put(pag);
+	xfs_perag_rele(pag);
 	return error;
 }
 
@@ -1902,14 +1904,14 @@ xfs_difree_inode_chunk(
 
 STATIC int
 xfs_difree_inobt(
-	struct xfs_mount		*mp,
+	struct xfs_perag		*pag,
 	struct xfs_trans		*tp,
 	struct xfs_buf			*agbp,
-	struct xfs_perag		*pag,
 	xfs_agino_t			agino,
 	struct xfs_icluster		*xic,
 	struct xfs_inobt_rec_incore	*orec)
 {
+	struct xfs_mount		*mp = pag->pag_mount;
 	struct xfs_agi			*agi = agbp->b_addr;
 	struct xfs_btree_cur		*cur;
 	struct xfs_inobt_rec_incore	rec;
@@ -2036,13 +2038,13 @@ xfs_difree_inobt(
  */
 STATIC int
 xfs_difree_finobt(
-	struct xfs_mount		*mp,
+	struct xfs_perag		*pag,
 	struct xfs_trans		*tp,
 	struct xfs_buf			*agbp,
-	struct xfs_perag		*pag,
 	xfs_agino_t			agino,
 	struct xfs_inobt_rec_incore	*ibtrec) /* inobt record */
 {
+	struct xfs_mount		*mp = pag->pag_mount;
 	struct xfs_btree_cur		*cur;
 	struct xfs_inobt_rec_incore	rec;
 	int				offset = agino - ibtrec->ir_startino;
@@ -2196,7 +2198,7 @@ xfs_difree(
 	/*
 	 * Fix up the inode allocation btree.
 	 */
-	error = xfs_difree_inobt(mp, tp, agbp, pag, agino, xic, &rec);
+	error = xfs_difree_inobt(pag, tp, agbp, agino, xic, &rec);
 	if (error)
 		goto error0;
 
@@ -2204,7 +2206,7 @@ xfs_difree(
 	 * Fix up the free inode btree.
 	 */
 	if (xfs_has_finobt(mp)) {
-		error = xfs_difree_finobt(mp, tp, agbp, pag, agino, &rec);
+		error = xfs_difree_finobt(pag, tp, agbp, agino, &rec);
 		if (error)
 			goto error0;
 	}
@@ -2928,15 +2930,14 @@ xfs_ialloc_calc_rootino(
  */
 int
 xfs_ialloc_check_shrink(
+	struct xfs_perag	*pag,
 	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
 	struct xfs_buf		*agibp,
 	xfs_agblock_t		new_length)
 {
 	struct xfs_inobt_rec_incore rec;
 	struct xfs_btree_cur	*cur;
 	struct xfs_mount	*mp = tp->t_mountp;
-	struct xfs_perag	*pag;
 	xfs_agino_t		agino = XFS_AGB_TO_AGINO(mp, new_length);
 	int			has;
 	int			error;
@@ -2944,7 +2945,6 @@ xfs_ialloc_check_shrink(
 	if (!xfs_has_sparseinodes(mp))
 		return 0;
 
-	pag = xfs_perag_get(mp, agno);
 	cur = xfs_inobt_init_cursor(mp, tp, agibp, pag, XFS_BTNUM_INO);
 
 	/* Look up the inobt record that would correspond to the new EOFS. */
@@ -2968,6 +2968,5 @@ xfs_ialloc_check_shrink(
 	}
 out:
 	xfs_btree_del_cursor(cur, error);
-	xfs_perag_put(pag);
 	return error;
 }
diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
index 4cfce2eebe7e..ab8c30b4ec22 100644
--- a/fs/xfs/libxfs/xfs_ialloc.h
+++ b/fs/xfs/libxfs/xfs_ialloc.h
@@ -107,7 +107,7 @@ int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
 void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
 xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
 
-int xfs_ialloc_check_shrink(struct xfs_trans *tp, xfs_agnumber_t agno,
+int xfs_ialloc_check_shrink(struct xfs_perag *pag, struct xfs_trans *tp,
 		struct xfs_buf *agibp, xfs_agblock_t new_length);
 
 #endif	/* __XFS_IALLOC_H__ */
-- 
2.39.0



* [PATCH 11/42] xfs: inobt can use perags in many more places than it does
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (9 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 10/42] xfs: use active perag references for inode allocation Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-22  6:48   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 12/42] xfs: convert xfs_ialloc_next_ag() to an atomic Dave Chinner
                   ` (31 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Lots of code in the inobt infrastructure is passed both the
xfs_mount and a perag. We only need the perag for the per-ag inode
allocation code, so reduce the duplication by passing the perag as
the primary object.

This ends up reducing the code size by a bit:

	   text    data     bss     dec     hex filename
orig	1138878  323979     548 1463405  16546d (TOTALS)
patched	1138709  323979     548 1463236  1653c4 (TOTALS)
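
The source of the size win is easy to see in a toy example (invented
names; not the kernel code): once the perag is the primary object, the
mount can be derived from it, so one argument disappears from every
signature and call site:

```c
#include <assert.h>

struct mount_model { int agcount; };
struct perag_model {
	struct mount_model	*pag_mount;
	int			agno;
};

/* Old shape: carries both objects even though one implies the other. */
static int ags_after_old(struct mount_model *mp, struct perag_model *pag)
{
	return mp->agcount - pag->agno;
}

/* New shape: the perag is primary and the mount is derived from it. */
static int ags_after_new(struct perag_model *pag)
{
	struct mount_model *mp = pag->pag_mount;

	return mp->agcount - pag->agno;
}
```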

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag_resv.c      |  2 +-
 fs/xfs/libxfs/xfs_ialloc.c       | 25 +++++++++++----------
 fs/xfs/libxfs/xfs_ialloc_btree.c | 37 ++++++++++++++------------------
 fs/xfs/libxfs/xfs_ialloc_btree.h | 20 ++++++++---------
 fs/xfs/scrub/agheader_repair.c   |  7 +++---
 fs/xfs/scrub/common.c            |  8 +++----
 fs/xfs/xfs_iwalk.c               |  4 ++--
 7 files changed, 47 insertions(+), 56 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
index 5af123d13a63..7fd1fea95552 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.c
+++ b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -264,7 +264,7 @@ xfs_ag_resv_init(
 		if (error)
 			goto out;
 
-		error = xfs_finobt_calc_reserves(mp, tp, pag, &ask, &used);
+		error = xfs_finobt_calc_reserves(pag, tp, &ask, &used);
 		if (error)
 			goto out;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index a1a482ec3065..5b8401038bab 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -176,13 +176,12 @@ xfs_inobt_insert(
 	xfs_agino_t		newlen,
 	xfs_btnum_t		btnum)
 {
-	struct xfs_mount	*mp = pag->pag_mount;
 	struct xfs_btree_cur	*cur;
 	xfs_agino_t		thisino;
 	int			i;
 	int			error;
 
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, pag, btnum);
+	cur = xfs_inobt_init_cursor(pag, tp, agbp, btnum);
 
 	for (thisino = newino;
 	     thisino < newino + newlen;
@@ -527,7 +526,7 @@ xfs_inobt_insert_sprec(
 	int				i;
 	struct xfs_inobt_rec_incore	rec;
 
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, pag, btnum);
+	cur = xfs_inobt_init_cursor(pag, tp, agbp, btnum);
 
 	/* the new record is pre-aligned so we know where to look */
 	error = xfs_inobt_lookup(cur, nrec->ir_startino, XFS_LOOKUP_EQ, &i);
@@ -1004,7 +1003,7 @@ xfs_dialloc_ag_inobt(
 	ASSERT(pag->pagi_freecount > 0);
 
  restart_pagno:
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_INO);
+	cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_INO);
 	/*
 	 * If pagino is 0 (this is the root inode allocation) use newino.
 	 * This must work because we've just allocated some.
@@ -1457,7 +1456,7 @@ xfs_dialloc_ag(
 	if (!pagino)
 		pagino = be32_to_cpu(agi->agi_newino);
 
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_FINO);
+	cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_FINO);
 
 	error = xfs_check_agi_freecount(cur);
 	if (error)
@@ -1500,7 +1499,7 @@ xfs_dialloc_ag(
 	 * the original freecount. If all is well, make the equivalent update to
 	 * the inobt using the finobt record and offset information.
 	 */
-	icur = xfs_inobt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_INO);
+	icur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_INO);
 
 	error = xfs_check_agi_freecount(icur);
 	if (error)
@@ -1926,7 +1925,7 @@ xfs_difree_inobt(
 	/*
 	 * Initialize the cursor.
 	 */
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_INO);
+	cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_INO);
 
 	error = xfs_check_agi_freecount(cur);
 	if (error)
@@ -2051,7 +2050,7 @@ xfs_difree_finobt(
 	int				error;
 	int				i;
 
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_FINO);
+	cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_FINO);
 
 	error = xfs_inobt_lookup(cur, ibtrec->ir_startino, XFS_LOOKUP_EQ, &i);
 	if (error)
@@ -2248,7 +2247,7 @@ xfs_imap_lookup(
 	 * we have a record, we need to ensure it contains the inode number
 	 * we are looking up.
 	 */
-	cur = xfs_inobt_init_cursor(mp, tp, agbp, pag, XFS_BTNUM_INO);
+	cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_INO);
 	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &i);
 	if (!error) {
 		if (i)
@@ -2937,17 +2936,17 @@ xfs_ialloc_check_shrink(
 {
 	struct xfs_inobt_rec_incore rec;
 	struct xfs_btree_cur	*cur;
-	struct xfs_mount	*mp = tp->t_mountp;
-	xfs_agino_t		agino = XFS_AGB_TO_AGINO(mp, new_length);
+	xfs_agino_t		agino;
 	int			has;
 	int			error;
 
-	if (!xfs_has_sparseinodes(mp))
+	if (!xfs_has_sparseinodes(pag->pag_mount))
 		return 0;
 
-	cur = xfs_inobt_init_cursor(mp, tp, agibp, pag, XFS_BTNUM_INO);
+	cur = xfs_inobt_init_cursor(pag, tp, agibp, XFS_BTNUM_INO);
 
 	/* Look up the inobt record that would correspond to the new EOFS. */
+	agino = XFS_AGB_TO_AGINO(pag->pag_mount, new_length);
 	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &has);
 	if (error || !has)
 		goto out;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 8c83e265770c..d657af2ec350 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -36,8 +36,8 @@ STATIC struct xfs_btree_cur *
 xfs_inobt_dup_cursor(
 	struct xfs_btree_cur	*cur)
 {
-	return xfs_inobt_init_cursor(cur->bc_mp, cur->bc_tp,
-			cur->bc_ag.agbp, cur->bc_ag.pag, cur->bc_btnum);
+	return xfs_inobt_init_cursor(cur->bc_ag.pag, cur->bc_tp,
+			cur->bc_ag.agbp, cur->bc_btnum);
 }
 
 STATIC void
@@ -427,11 +427,11 @@ static const struct xfs_btree_ops xfs_finobt_ops = {
  */
 static struct xfs_btree_cur *
 xfs_inobt_init_common(
-	struct xfs_mount	*mp,		/* file system mount point */
-	struct xfs_trans	*tp,		/* transaction pointer */
 	struct xfs_perag	*pag,
+	struct xfs_trans	*tp,		/* transaction pointer */
 	xfs_btnum_t		btnum)		/* ialloc or free ino btree */
 {
+	struct xfs_mount	*mp = pag->pag_mount;
 	struct xfs_btree_cur	*cur;
 
 	cur = xfs_btree_alloc_cursor(mp, tp, btnum,
@@ -456,16 +456,15 @@ xfs_inobt_init_common(
 /* Create an inode btree cursor. */
 struct xfs_btree_cur *
 xfs_inobt_init_cursor(
-	struct xfs_mount	*mp,
+	struct xfs_perag	*pag,
 	struct xfs_trans	*tp,
 	struct xfs_buf		*agbp,
-	struct xfs_perag	*pag,
 	xfs_btnum_t		btnum)
 {
 	struct xfs_btree_cur	*cur;
 	struct xfs_agi		*agi = agbp->b_addr;
 
-	cur = xfs_inobt_init_common(mp, tp, pag, btnum);
+	cur = xfs_inobt_init_common(pag, tp, btnum);
 	if (btnum == XFS_BTNUM_INO)
 		cur->bc_nlevels = be32_to_cpu(agi->agi_level);
 	else
@@ -477,14 +476,13 @@ xfs_inobt_init_cursor(
 /* Create an inode btree cursor with a fake root for staging. */
 struct xfs_btree_cur *
 xfs_inobt_stage_cursor(
-	struct xfs_mount	*mp,
-	struct xbtree_afakeroot	*afake,
 	struct xfs_perag	*pag,
+	struct xbtree_afakeroot	*afake,
 	xfs_btnum_t		btnum)
 {
 	struct xfs_btree_cur	*cur;
 
-	cur = xfs_inobt_init_common(mp, NULL, pag, btnum);
+	cur = xfs_inobt_init_common(pag, NULL, btnum);
 	xfs_btree_stage_afakeroot(cur, afake);
 	return cur;
 }
@@ -708,9 +706,8 @@ xfs_inobt_max_size(
 /* Read AGI and create inobt cursor. */
 int
 xfs_inobt_cur(
-	struct xfs_mount	*mp,
-	struct xfs_trans	*tp,
 	struct xfs_perag	*pag,
+	struct xfs_trans	*tp,
 	xfs_btnum_t		which,
 	struct xfs_btree_cur	**curpp,
 	struct xfs_buf		**agi_bpp)
@@ -725,16 +722,15 @@ xfs_inobt_cur(
 	if (error)
 		return error;
 
-	cur = xfs_inobt_init_cursor(mp, tp, *agi_bpp, pag, which);
+	cur = xfs_inobt_init_cursor(pag, tp, *agi_bpp, which);
 	*curpp = cur;
 	return 0;
 }
 
 static int
 xfs_inobt_count_blocks(
-	struct xfs_mount	*mp,
-	struct xfs_trans	*tp,
 	struct xfs_perag	*pag,
+	struct xfs_trans	*tp,
 	xfs_btnum_t		btnum,
 	xfs_extlen_t		*tree_blocks)
 {
@@ -742,7 +738,7 @@ xfs_inobt_count_blocks(
 	struct xfs_btree_cur	*cur = NULL;
 	int			error;
 
-	error = xfs_inobt_cur(mp, tp, pag, btnum, &cur, &agbp);
+	error = xfs_inobt_cur(pag, tp, btnum, &cur, &agbp);
 	if (error)
 		return error;
 
@@ -779,22 +775,21 @@ xfs_finobt_read_blocks(
  */
 int
 xfs_finobt_calc_reserves(
-	struct xfs_mount	*mp,
-	struct xfs_trans	*tp,
 	struct xfs_perag	*pag,
+	struct xfs_trans	*tp,
 	xfs_extlen_t		*ask,
 	xfs_extlen_t		*used)
 {
 	xfs_extlen_t		tree_len = 0;
 	int			error;
 
-	if (!xfs_has_finobt(mp))
+	if (!xfs_has_finobt(pag->pag_mount))
 		return 0;
 
-	if (xfs_has_inobtcounts(mp))
+	if (xfs_has_inobtcounts(pag->pag_mount))
 		error = xfs_finobt_read_blocks(pag, tp, &tree_len);
 	else
-		error = xfs_inobt_count_blocks(mp, tp, pag, XFS_BTNUM_FINO,
+		error = xfs_inobt_count_blocks(pag, tp, XFS_BTNUM_FINO,
 				&tree_len);
 	if (error)
 		return error;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h
index 26451cb76b98..e859a6e05230 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.h
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
@@ -46,12 +46,10 @@ struct xfs_perag;
 		 (maxrecs) * sizeof(xfs_inobt_key_t) + \
 		 ((index) - 1) * sizeof(xfs_inobt_ptr_t)))
 
-extern struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_mount *mp,
-		struct xfs_trans *tp, struct xfs_buf *agbp,
-		struct xfs_perag *pag, xfs_btnum_t btnum);
-struct xfs_btree_cur *xfs_inobt_stage_cursor(struct xfs_mount *mp,
-		struct xbtree_afakeroot *afake, struct xfs_perag *pag,
-		xfs_btnum_t btnum);
+extern struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_perag *pag,
+		struct xfs_trans *tp, struct xfs_buf *agbp, xfs_btnum_t btnum);
+struct xfs_btree_cur *xfs_inobt_stage_cursor(struct xfs_perag *pag,
+		struct xbtree_afakeroot *afake, xfs_btnum_t btnum);
 extern int xfs_inobt_maxrecs(struct xfs_mount *, int, int);
 
 /* ir_holemask to inode allocation bitmap conversion */
@@ -64,13 +62,13 @@ int xfs_inobt_rec_check_count(struct xfs_mount *,
 #define xfs_inobt_rec_check_count(mp, rec)	0
 #endif	/* DEBUG */
 
-int xfs_finobt_calc_reserves(struct xfs_mount *mp, struct xfs_trans *tp,
-		struct xfs_perag *pag, xfs_extlen_t *ask, xfs_extlen_t *used);
+int xfs_finobt_calc_reserves(struct xfs_perag *perag, struct xfs_trans *tp,
+		xfs_extlen_t *ask, xfs_extlen_t *used);
 extern xfs_extlen_t xfs_iallocbt_calc_size(struct xfs_mount *mp,
 		unsigned long long len);
-int xfs_inobt_cur(struct xfs_mount *mp, struct xfs_trans *tp,
-		struct xfs_perag *pag, xfs_btnum_t btnum,
-		struct xfs_btree_cur **curpp, struct xfs_buf **agi_bpp);
+int xfs_inobt_cur(struct xfs_perag *pag, struct xfs_trans *tp,
+		xfs_btnum_t btnum, struct xfs_btree_cur **curpp,
+		struct xfs_buf **agi_bpp);
 
 void xfs_inobt_commit_staged_btree(struct xfs_btree_cur *cur,
 		struct xfs_trans *tp, struct xfs_buf *agbp);
diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index d75d82151eeb..b80b9111e781 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -873,8 +873,7 @@ xrep_agi_calc_from_btrees(
 	xfs_agino_t		freecount;
 	int			error;
 
-	cur = xfs_inobt_init_cursor(mp, sc->tp, agi_bp,
-			sc->sa.pag, XFS_BTNUM_INO);
+	cur = xfs_inobt_init_cursor(sc->sa.pag, sc->tp, agi_bp, XFS_BTNUM_INO);
 	error = xfs_ialloc_count_inodes(cur, &count, &freecount);
 	if (error)
 		goto err;
@@ -894,8 +893,8 @@ xrep_agi_calc_from_btrees(
 	if (xfs_has_finobt(mp) && xfs_has_inobtcounts(mp)) {
 		xfs_agblock_t	blocks;
 
-		cur = xfs_inobt_init_cursor(mp, sc->tp, agi_bp,
-				sc->sa.pag, XFS_BTNUM_FINO);
+		cur = xfs_inobt_init_cursor(sc->sa.pag, sc->tp, agi_bp,
+				XFS_BTNUM_FINO);
 		error = xfs_btree_count_blocks(cur, &blocks);
 		if (error)
 			goto err;
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 033bf6730ece..848a8e32e56f 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -478,15 +478,15 @@ xchk_ag_btcur_init(
 	/* Set up a inobt cursor for cross-referencing. */
 	if (sa->agi_bp &&
 	    xchk_ag_btree_healthy_enough(sc, sa->pag, XFS_BTNUM_INO)) {
-		sa->ino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
-				sa->pag, XFS_BTNUM_INO);
+		sa->ino_cur = xfs_inobt_init_cursor(sa->pag, sc->tp, sa->agi_bp,
+				XFS_BTNUM_INO);
 	}
 
 	/* Set up a finobt cursor for cross-referencing. */
 	if (sa->agi_bp && xfs_has_finobt(mp) &&
 	    xchk_ag_btree_healthy_enough(sc, sa->pag, XFS_BTNUM_FINO)) {
-		sa->fino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa->agi_bp,
-				sa->pag, XFS_BTNUM_FINO);
+		sa->fino_cur = xfs_inobt_init_cursor(sa->pag, sc->tp, sa->agi_bp,
+				XFS_BTNUM_FINO);
 	}
 
 	/* Set up a rmapbt cursor for cross-referencing. */
diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
index c31857d903a4..21be93bf006d 100644
--- a/fs/xfs/xfs_iwalk.c
+++ b/fs/xfs/xfs_iwalk.c
@@ -275,7 +275,7 @@ xfs_iwalk_ag_start(
 
 	/* Set up a fresh cursor and empty the inobt cache. */
 	iwag->nr_recs = 0;
-	error = xfs_inobt_cur(mp, tp, pag, XFS_BTNUM_INO, curpp, agi_bpp);
+	error = xfs_inobt_cur(pag, tp, XFS_BTNUM_INO, curpp, agi_bpp);
 	if (error)
 		return error;
 
@@ -390,7 +390,7 @@ xfs_iwalk_run_callbacks(
 	}
 
 	/* ...and recreate the cursor just past where we left off. */
-	error = xfs_inobt_cur(mp, iwag->tp, iwag->pag, XFS_BTNUM_INO, curpp,
+	error = xfs_inobt_cur(iwag->pag, iwag->tp, XFS_BTNUM_INO, curpp,
 			agi_bpp);
 	if (error)
 		return error;
-- 
2.39.0



* [PATCH 12/42] xfs: convert xfs_ialloc_next_ag() to an atomic
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (10 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 11/42] xfs: inobt can use perags in many more places than it does Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-22  7:03   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 13/42] xfs: perags need atomic operational state Dave Chinner
                   ` (30 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

This is currently a spinlock-protected rotor which can be
implemented with a single atomic operation. Change it to be more
efficient and get rid of the m_agirotor_lock. Noticed while
converting the inode allocation AG selection loop to active perag
references.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ialloc.c | 17 +----------------
 fs/xfs/libxfs/xfs_sb.c     |  3 ++-
 fs/xfs/xfs_mount.h         |  3 +--
 fs/xfs/xfs_super.c         |  1 -
 4 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 5b8401038bab..c8d837d8876f 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1576,21 +1576,6 @@ xfs_dialloc_roll(
 	return error;
 }
 
-static xfs_agnumber_t
-xfs_ialloc_next_ag(
-	xfs_mount_t	*mp)
-{
-	xfs_agnumber_t	agno;
-
-	spin_lock(&mp->m_agirotor_lock);
-	agno = mp->m_agirotor;
-	if (++mp->m_agirotor >= mp->m_maxagi)
-		mp->m_agirotor = 0;
-	spin_unlock(&mp->m_agirotor_lock);
-
-	return agno;
-}
-
 static bool
 xfs_dialloc_good_ag(
 	struct xfs_perag	*pag,
@@ -1748,7 +1733,7 @@ xfs_dialloc(
 	 * an AG has enough space for file creation.
 	 */
 	if (S_ISDIR(mode))
-		start_agno = xfs_ialloc_next_ag(mp);
+		start_agno = atomic_inc_return(&mp->m_agirotor) % mp->m_maxagi;
 	else {
 		start_agno = XFS_INO_TO_AGNO(mp, parent);
 		if (start_agno >= mp->m_maxagi)
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 1eeecf2eb2a7..99cc03a298e2 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -909,7 +909,8 @@ xfs_sb_mount_common(
 	struct xfs_mount	*mp,
 	struct xfs_sb		*sbp)
 {
-	mp->m_agfrotor = mp->m_agirotor = 0;
+	mp->m_agfrotor = 0;
+	atomic_set(&mp->m_agirotor, 0);
 	mp->m_maxagi = mp->m_sb.sb_agcount;
 	mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
 	mp->m_blkbb_log = sbp->sb_blocklog - BBSHIFT;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 8aca2cc173ac..f3269c0626f0 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -210,8 +210,7 @@ typedef struct xfs_mount {
 	struct xfs_error_cfg	m_error_cfg[XFS_ERR_CLASS_MAX][XFS_ERR_ERRNO_MAX];
 	struct xstats		m_stats;	/* per-fs stats */
 	xfs_agnumber_t		m_agfrotor;	/* last ag where space found */
-	xfs_agnumber_t		m_agirotor;	/* last ag dir inode alloced */
-	spinlock_t		m_agirotor_lock;/* .. and lock protecting it */
+	atomic_t		m_agirotor;	/* last ag dir inode alloced */
 
 	/* Memory shrinker to throttle and reprioritize inodegc */
 	struct shrinker		m_inodegc_shrinker;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 0c4b73e9b29d..96375b5622fd 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1922,7 +1922,6 @@ static int xfs_init_fs_context(
 		return -ENOMEM;
 
 	spin_lock_init(&mp->m_sb_lock);
-	spin_lock_init(&mp->m_agirotor_lock);
 	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
 	spin_lock_init(&mp->m_perag_lock);
 	mutex_init(&mp->m_growlock);
-- 
2.39.0



* [PATCH 13/42] xfs: perags need atomic operational state
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (11 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 12/42] xfs: convert xfs_ialloc_next_ag() to an atomic Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-23  4:04   ` Allison Henderson
  2023-01-18 22:44 ` [PATCH 14/42] xfs: introduce xfs_for_each_perag_wrap() Dave Chinner
                   ` (29 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The xfs_perag currently has no unified operational state; what it
does have is a scatter of individual flags: pagf_init and pagi_init,
the pagf_agflreset flag, and the pagf_metadata and pagi_inodeok
policy flags.

For controlling per-ag operations, we are going to need some atomic
state flags. Hence add an opstate field similar to what we already
have in the mount and log, and convert all these state flags across
to atomic bit operations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.h             | 27 ++++++++++++++----
 fs/xfs/libxfs/xfs_alloc.c          | 23 ++++++++-------
 fs/xfs/libxfs/xfs_alloc_btree.c    |  2 +-
 fs/xfs/libxfs/xfs_bmap.c           |  2 +-
 fs/xfs/libxfs/xfs_ialloc.c         | 14 ++++-----
 fs/xfs/libxfs/xfs_ialloc_btree.c   |  4 +--
 fs/xfs/libxfs/xfs_refcount_btree.c |  2 +-
 fs/xfs/libxfs/xfs_rmap_btree.c     |  2 +-
 fs/xfs/scrub/agheader_repair.c     | 28 +++++++++---------
 fs/xfs/scrub/fscounters.c          |  9 ++++--
 fs/xfs/scrub/repair.c              |  2 +-
 fs/xfs/xfs_filestream.c            |  5 ++--
 fs/xfs/xfs_super.c                 | 46 ++++++++++++++++++------------
 13 files changed, 101 insertions(+), 65 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index aeb21c8df201..187d30d9bb13 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -35,13 +35,9 @@ struct xfs_perag {
 	atomic_t	pag_ref;	/* passive reference count */
 	atomic_t	pag_active_ref;	/* active reference count */
 	wait_queue_head_t pag_active_wq;/* woken active_ref falls to zero */
-	char		pagf_init;	/* this agf's entry is initialized */
-	char		pagi_init;	/* this agi's entry is initialized */
-	char		pagf_metadata;	/* the agf is preferred to be metadata */
-	char		pagi_inodeok;	/* The agi is ok for inodes */
+	unsigned long	pag_opstate;
 	uint8_t		pagf_levels[XFS_BTNUM_AGF];
 					/* # of levels in bno & cnt btree */
-	bool		pagf_agflreset; /* agfl requires reset before use */
 	uint32_t	pagf_flcount;	/* count of blocks in freelist */
 	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
 	xfs_extlen_t	pagf_longest;	/* longest free space */
@@ -108,6 +104,27 @@ struct xfs_perag {
 #endif /* __KERNEL__ */
 };
 
+/*
+ * Per-AG operational state. These are atomic flag bits.
+ */
+#define XFS_AGSTATE_AGF_INIT		0
+#define XFS_AGSTATE_AGI_INIT		1
+#define XFS_AGSTATE_PREFERS_METADATA	2
+#define XFS_AGSTATE_ALLOWS_INODES	3
+#define XFS_AGSTATE_AGFL_NEEDS_RESET	4
+
+#define __XFS_AG_OPSTATE(name, NAME) \
+static inline bool xfs_perag_ ## name (struct xfs_perag *pag) \
+{ \
+	return test_bit(XFS_AGSTATE_ ## NAME, &pag->pag_opstate); \
+}
+
+__XFS_AG_OPSTATE(initialised_agf, AGF_INIT)
+__XFS_AG_OPSTATE(initialised_agi, AGI_INIT)
+__XFS_AG_OPSTATE(prefers_metadata, PREFERS_METADATA)
+__XFS_AG_OPSTATE(allows_inodes, ALLOWS_INODES)
+__XFS_AG_OPSTATE(agfl_needs_reset, AGFL_NEEDS_RESET)
+
 int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount,
 			xfs_rfsblock_t dcount, xfs_agnumber_t *maxagi);
 int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno);
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 9f26a9368eeb..246c2e7d9e7a 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2435,7 +2435,7 @@ xfs_agfl_reset(
 	struct xfs_mount	*mp = tp->t_mountp;
 	struct xfs_agf		*agf = agbp->b_addr;
 
-	ASSERT(pag->pagf_agflreset);
+	ASSERT(xfs_perag_agfl_needs_reset(pag));
 	trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
 
 	xfs_warn(mp,
@@ -2450,7 +2450,7 @@ xfs_agfl_reset(
 				    XFS_AGF_FLCOUNT);
 
 	pag->pagf_flcount = 0;
-	pag->pagf_agflreset = false;
+	clear_bit(XFS_AGSTATE_AGFL_NEEDS_RESET, &pag->pag_opstate);
 }
 
 /*
@@ -2605,7 +2605,7 @@ xfs_alloc_fix_freelist(
 	/* deferred ops (AGFL block frees) require permanent transactions */
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 
-	if (!pag->pagf_init) {
+	if (!xfs_perag_initialised_agf(pag)) {
 		error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
 		if (error) {
 			/* Couldn't lock the AGF so skip this AG. */
@@ -2620,7 +2620,8 @@ xfs_alloc_fix_freelist(
 	 * somewhere else if we are not being asked to try harder at this
 	 * point
 	 */
-	if (pag->pagf_metadata && (args->datatype & XFS_ALLOC_USERDATA) &&
+	if (xfs_perag_prefers_metadata(pag) &&
+	    (args->datatype & XFS_ALLOC_USERDATA) &&
 	    (flags & XFS_ALLOC_FLAG_TRYLOCK)) {
 		ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
 		goto out_agbp_relse;
@@ -2646,7 +2647,7 @@ xfs_alloc_fix_freelist(
 	}
 
 	/* reset a padding mismatched agfl before final free space check */
-	if (pag->pagf_agflreset)
+	if (xfs_perag_agfl_needs_reset(pag))
 		xfs_agfl_reset(tp, agbp, pag);
 
 	/* If there isn't enough total space or single-extent, reject it. */
@@ -2803,7 +2804,7 @@ xfs_alloc_get_freelist(
 	if (be32_to_cpu(agf->agf_flfirst) == xfs_agfl_size(mp))
 		agf->agf_flfirst = 0;
 
-	ASSERT(!pag->pagf_agflreset);
+	ASSERT(!xfs_perag_agfl_needs_reset(pag));
 	be32_add_cpu(&agf->agf_flcount, -1);
 	pag->pagf_flcount--;
 
@@ -2892,7 +2893,7 @@ xfs_alloc_put_freelist(
 	if (be32_to_cpu(agf->agf_fllast) == xfs_agfl_size(mp))
 		agf->agf_fllast = 0;
 
-	ASSERT(!pag->pagf_agflreset);
+	ASSERT(!xfs_perag_agfl_needs_reset(pag));
 	be32_add_cpu(&agf->agf_flcount, 1);
 	pag->pagf_flcount++;
 
@@ -3099,7 +3100,7 @@ xfs_alloc_read_agf(
 		return error;
 
 	agf = agfbp->b_addr;
-	if (!pag->pagf_init) {
+	if (!xfs_perag_initialised_agf(pag)) {
 		pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
 		pag->pagf_btreeblks = be32_to_cpu(agf->agf_btreeblks);
 		pag->pagf_flcount = be32_to_cpu(agf->agf_flcount);
@@ -3111,8 +3112,8 @@ xfs_alloc_read_agf(
 		pag->pagf_levels[XFS_BTNUM_RMAPi] =
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
 		pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
-		pag->pagf_init = 1;
-		pag->pagf_agflreset = xfs_agfl_needs_reset(pag->pag_mount, agf);
+		if (xfs_agfl_needs_reset(pag->pag_mount, agf))
+			set_bit(XFS_AGSTATE_AGFL_NEEDS_RESET, &pag->pag_opstate);
 
 		/*
 		 * Update the in-core allocbt counter. Filter out the rmapbt
@@ -3127,6 +3128,8 @@ xfs_alloc_read_agf(
 		if (allocbt_blks > 0)
 			atomic64_add(allocbt_blks,
 					&pag->pag_mount->m_allocbt_blks);
+
+		set_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
 	}
 #ifdef DEBUG
 	else if (!xfs_is_shutdown(pag->pag_mount)) {
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 549a3cba0234..0f29c7b1b39f 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -315,7 +315,7 @@ xfs_allocbt_verify(
 	level = be16_to_cpu(block->bb_level);
 	if (bp->b_ops->magic[0] == cpu_to_be32(XFS_ABTC_MAGIC))
 		btnum = XFS_BTNUM_CNTi;
-	if (pag && pag->pagf_init) {
+	if (pag && xfs_perag_initialised_agf(pag)) {
 		if (level >= pag->pagf_levels[btnum])
 			return __this_address;
 	} else if (level >= mp->m_alloc_maxlevels)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index f15d45af661f..6aad0ea5e606 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3147,7 +3147,7 @@ xfs_bmap_longest_free_extent(
 	int			error = 0;
 
 	pag = xfs_perag_get(mp, ag);
-	if (!pag->pagf_init) {
+	if (!xfs_perag_initialised_agf(pag)) {
 		error = xfs_alloc_read_agf(pag, tp, XFS_ALLOC_FLAG_TRYLOCK,
 				NULL);
 		if (error) {
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index c8d837d8876f..2a323ffa5ba9 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -998,8 +998,8 @@ xfs_dialloc_ag_inobt(
 	int			i, j;
 	int			searchdistance = 10;
 
-	ASSERT(pag->pagi_init);
-	ASSERT(pag->pagi_inodeok);
+	ASSERT(xfs_perag_initialised_agi(pag));
+	ASSERT(xfs_perag_allows_inodes(pag));
 	ASSERT(pag->pagi_freecount > 0);
 
  restart_pagno:
@@ -1592,10 +1592,10 @@ xfs_dialloc_good_ag(
 
 	if (!pag)
 		return false;
-	if (!pag->pagi_inodeok)
+	if (!xfs_perag_allows_inodes(pag))
 		return false;
 
-	if (!pag->pagi_init) {
+	if (!xfs_perag_initialised_agi(pag)) {
 		error = xfs_ialloc_read_agi(pag, tp, NULL);
 		if (error)
 			return false;
@@ -1606,7 +1606,7 @@ xfs_dialloc_good_ag(
 	if (!ok_alloc)
 		return false;
 
-	if (!pag->pagf_init) {
+	if (!xfs_perag_initialised_agf(pag)) {
 		error = xfs_alloc_read_agf(pag, tp, flags, NULL);
 		if (error)
 			return false;
@@ -2603,10 +2603,10 @@ xfs_ialloc_read_agi(
 		return error;
 
 	agi = agibp->b_addr;
-	if (!pag->pagi_init) {
+	if (!xfs_perag_initialised_agi(pag)) {
 		pag->pagi_freecount = be32_to_cpu(agi->agi_freecount);
 		pag->pagi_count = be32_to_cpu(agi->agi_count);
-		pag->pagi_init = 1;
+		set_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
 	}
 
 	/*
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index d657af2ec350..3675a0d29310 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -291,8 +291,8 @@ xfs_inobt_verify(
 	 * Similarly, during log recovery we will have a perag structure
 	 * attached, but the agi information will not yet have been initialised
 	 * from the on disk AGI. We don't currently use any of this information,
-	 * but beware of the landmine (i.e. need to check pag->pagi_init) if we
-	 * ever do.
+	 * but beware of the landmine (i.e. need to check
+	 * xfs_perag_initialised_agi(pag)) if we ever do.
 	 */
 	if (xfs_has_crc(mp)) {
 		fa = xfs_btree_sblock_v5hdr_verify(bp);
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index e1f789866683..d20abf0390fc 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -227,7 +227,7 @@ xfs_refcountbt_verify(
 		return fa;
 
 	level = be16_to_cpu(block->bb_level);
-	if (pag && pag->pagf_init) {
+	if (pag && xfs_perag_initialised_agf(pag)) {
 		if (level >= pag->pagf_refcount_level)
 			return __this_address;
 	} else if (level >= mp->m_refc_maxlevels)
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 7f83f62e51e0..d3285684bb5e 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -313,7 +313,7 @@ xfs_rmapbt_verify(
 		return fa;
 
 	level = be16_to_cpu(block->bb_level);
-	if (pag && pag->pagf_init) {
+	if (pag && xfs_perag_initialised_agf(pag)) {
 		if (level >= pag->pagf_levels[XFS_BTNUM_RMAPi])
 			return __this_address;
 	} else if (level >= mp->m_rmap_maxlevels)
diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index b80b9111e781..c37e6d72760b 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -191,14 +191,15 @@ xrep_agf_init_header(
 	struct xfs_agf		*old_agf)
 {
 	struct xfs_mount	*mp = sc->mp;
+	struct xfs_perag	*pag = sc->sa.pag;
 	struct xfs_agf		*agf = agf_bp->b_addr;
 
 	memcpy(old_agf, agf, sizeof(*old_agf));
 	memset(agf, 0, BBTOB(agf_bp->b_length));
 	agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
 	agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
-	agf->agf_seqno = cpu_to_be32(sc->sa.pag->pag_agno);
-	agf->agf_length = cpu_to_be32(sc->sa.pag->block_count);
+	agf->agf_seqno = cpu_to_be32(pag->pag_agno);
+	agf->agf_length = cpu_to_be32(pag->block_count);
 	agf->agf_flfirst = old_agf->agf_flfirst;
 	agf->agf_fllast = old_agf->agf_fllast;
 	agf->agf_flcount = old_agf->agf_flcount;
@@ -206,8 +207,8 @@ xrep_agf_init_header(
 		uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
 
 	/* Mark the incore AGF data stale until we're done fixing things. */
-	ASSERT(sc->sa.pag->pagf_init);
-	sc->sa.pag->pagf_init = 0;
+	ASSERT(xfs_perag_initialised_agf(pag));
+	clear_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
 }
 
 /* Set btree root information in an AGF. */
@@ -333,7 +334,7 @@ xrep_agf_commit_new(
 	pag->pagf_levels[XFS_BTNUM_RMAPi] =
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
 	pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
-	pag->pagf_init = 1;
+	set_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
 
 	return 0;
 }
@@ -434,7 +435,7 @@ xrep_agf(
 
 out_revert:
 	/* Mark the incore AGF state stale and revert the AGF. */
-	sc->sa.pag->pagf_init = 0;
+	clear_bit(XFS_AGSTATE_AGF_INIT, &sc->sa.pag->pag_opstate);
 	memcpy(agf, &old_agf, sizeof(old_agf));
 	return error;
 }
@@ -618,7 +619,7 @@ xrep_agfl_update_agf(
 	xfs_force_summary_recalc(sc->mp);
 
 	/* Update the AGF counters. */
-	if (sc->sa.pag->pagf_init)
+	if (xfs_perag_initialised_agf(sc->sa.pag))
 		sc->sa.pag->pagf_flcount = flcount;
 	agf->agf_flfirst = cpu_to_be32(0);
 	agf->agf_flcount = cpu_to_be32(flcount);
@@ -822,14 +823,15 @@ xrep_agi_init_header(
 	struct xfs_agi		*old_agi)
 {
 	struct xfs_agi		*agi = agi_bp->b_addr;
+	struct xfs_perag	*pag = sc->sa.pag;
 	struct xfs_mount	*mp = sc->mp;
 
 	memcpy(old_agi, agi, sizeof(*old_agi));
 	memset(agi, 0, BBTOB(agi_bp->b_length));
 	agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
 	agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
-	agi->agi_seqno = cpu_to_be32(sc->sa.pag->pag_agno);
-	agi->agi_length = cpu_to_be32(sc->sa.pag->block_count);
+	agi->agi_seqno = cpu_to_be32(pag->pag_agno);
+	agi->agi_length = cpu_to_be32(pag->block_count);
 	agi->agi_newino = cpu_to_be32(NULLAGINO);
 	agi->agi_dirino = cpu_to_be32(NULLAGINO);
 	if (xfs_has_crc(mp))
@@ -840,8 +842,8 @@ xrep_agi_init_header(
 			sizeof(agi->agi_unlinked));
 
 	/* Mark the incore AGF data stale until we're done fixing things. */
-	ASSERT(sc->sa.pag->pagi_init);
-	sc->sa.pag->pagi_init = 0;
+	ASSERT(xfs_perag_initialised_agi(pag));
+	clear_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
 }
 
 /* Set btree root information in an AGI. */
@@ -928,7 +930,7 @@ xrep_agi_commit_new(
 	pag = sc->sa.pag;
 	pag->pagi_count = be32_to_cpu(agi->agi_count);
 	pag->pagi_freecount = be32_to_cpu(agi->agi_freecount);
-	pag->pagi_init = 1;
+	set_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
 
 	return 0;
 }
@@ -993,7 +995,7 @@ xrep_agi(
 
 out_revert:
 	/* Mark the incore AGI state stale and revert the AGI. */
-	sc->sa.pag->pagi_init = 0;
+	clear_bit(XFS_AGSTATE_AGI_INIT, &sc->sa.pag->pag_opstate);
 	memcpy(agi, &old_agi, sizeof(old_agi));
 	return error;
 }
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index ef97670970c3..f0c7f41897b9 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -86,7 +86,8 @@ xchk_fscount_warmup(
 	for_each_perag(mp, agno, pag) {
 		if (xchk_should_terminate(sc, &error))
 			break;
-		if (pag->pagi_init && pag->pagf_init)
+		if (xfs_perag_initialised_agi(pag) &&
+		    xfs_perag_initialised_agf(pag))
 			continue;
 
 		/* Lock both AG headers. */
@@ -101,7 +102,8 @@ xchk_fscount_warmup(
 		 * These are supposed to be initialized by the header read
 		 * function.
 		 */
-		if (!pag->pagi_init || !pag->pagf_init) {
+		if (!xfs_perag_initialised_agi(pag) ||
+		    !xfs_perag_initialised_agf(pag)) {
 			error = -EFSCORRUPTED;
 			break;
 		}
@@ -220,7 +222,8 @@ xchk_fscount_aggregate_agcounts(
 			break;
 
 		/* This somehow got unset since the warmup? */
-		if (!pag->pagi_init || !pag->pagf_init) {
+		if (!xfs_perag_initialised_agi(pag) ||
+		    !xfs_perag_initialised_agf(pag)) {
 			error = -EFSCORRUPTED;
 			break;
 		}
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 4b92f9253ccd..d0b1644efb89 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -206,7 +206,7 @@ xrep_calc_ag_resblks(
 		return 0;
 
 	pag = xfs_perag_get(mp, sm->sm_agno);
-	if (pag->pagi_init) {
+	if (xfs_perag_initialised_agi(pag)) {
 		/* Use in-core icount if possible. */
 		icount = pag->pagi_count;
 	} else {
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 34b21a29c39b..7e8b25ab6c46 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -125,7 +125,7 @@ xfs_filestream_pick_ag(
 
 		pag = xfs_perag_get(mp, ag);
 
-		if (!pag->pagf_init) {
+		if (!xfs_perag_initialised_agf(pag)) {
 			err = xfs_alloc_read_agf(pag, NULL, trylock, NULL);
 			if (err) {
 				if (err != -EAGAIN) {
@@ -159,7 +159,8 @@ xfs_filestream_pick_ag(
 				xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE));
 		if (((minlen && longest >= minlen) ||
 		     (!minlen && pag->pagf_freeblks >= minfree)) &&
-		    (!pag->pagf_metadata || !(flags & XFS_PICK_USERDATA) ||
+		    (!xfs_perag_prefers_metadata(pag) ||
+		     !(flags & XFS_PICK_USERDATA) ||
 		     (flags & XFS_PICK_LOWSPACE))) {
 
 			/* Break out, retaining the reference on the AG. */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 96375b5622fd..2479b5cbd75e 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -247,6 +247,32 @@ xfs_fs_show_options(
 	return 0;
 }
 
+static bool
+xfs_set_inode_alloc_perag(
+	struct xfs_perag	*pag,
+	xfs_ino_t		ino,
+	xfs_agnumber_t		max_metadata)
+{
+	if (!xfs_is_inode32(pag->pag_mount)) {
+		set_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
+		clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
+		return false;
+	}
+
+	if (ino > XFS_MAXINUMBER_32) {
+		clear_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
+		clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
+		return false;
+	}
+
+	set_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
+	if (pag->pag_agno < max_metadata)
+		set_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
+	else
+		clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
+	return true;
+}
+
 /*
  * Set parameters for inode allocation heuristics, taking into account
  * filesystem size and inode32/inode64 mount options; i.e. specifically
@@ -310,24 +336,8 @@ xfs_set_inode_alloc(
 		ino = XFS_AGINO_TO_INO(mp, index, agino);
 
 		pag = xfs_perag_get(mp, index);
-
-		if (xfs_is_inode32(mp)) {
-			if (ino > XFS_MAXINUMBER_32) {
-				pag->pagi_inodeok = 0;
-				pag->pagf_metadata = 0;
-			} else {
-				pag->pagi_inodeok = 1;
-				maxagi++;
-				if (index < max_metadata)
-					pag->pagf_metadata = 1;
-				else
-					pag->pagf_metadata = 0;
-			}
-		} else {
-			pag->pagi_inodeok = 1;
-			pag->pagf_metadata = 0;
-		}
-
+		if (xfs_set_inode_alloc_perag(pag, ino, max_metadata))
+			maxagi++;
 		xfs_perag_put(pag);
 	}
 
-- 
2.39.0



* [PATCH 14/42] xfs: introduce xfs_for_each_perag_wrap()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (12 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 13/42] xfs: perags need atomic operational state Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-23  5:41   ` Allison Henderson
  2023-02-01 19:28   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 15/42] xfs: rework xfs_alloc_vextent() Dave Chinner
                   ` (28 subsequent siblings)
  42 siblings, 2 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

In several places we iterate every AG from a specific start agno and
wrap back to the first AG when we reach the end of the filesystem to
continue searching. We don't have a primitive for this iteration
yet, so add one for conversion of these algorithms to per-ag based
iteration.

The filestream AG select code is a mess, and this initially makes it
worse. The per-ag selection needs to be driven completely into the
filestream code to clean this up and it will be done in a future
patch that makes the filestream allocator use active per-ag
references correctly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.h     | 45 +++++++++++++++++++++-
 fs/xfs/libxfs/xfs_bmap.c   | 76 ++++++++++++++++++++++----------------
 fs/xfs/libxfs/xfs_ialloc.c | 32 ++++++++--------
 3 files changed, 104 insertions(+), 49 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index 187d30d9bb13..8f43b91d4cf3 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -237,7 +237,6 @@ xfs_perag_next(
 #define for_each_perag_from(mp, agno, pag) \
 	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
 
-
 #define for_each_perag(mp, agno, pag) \
 	(agno) = 0; \
 	for_each_perag_from((mp), (agno), (pag))
@@ -249,6 +248,50 @@ xfs_perag_next(
 		xfs_perag_rele(pag), \
 		(pag) = xfs_perag_grab_tag((mp), (agno), (tag)))
 
+static inline struct xfs_perag *
+xfs_perag_next_wrap(
+	struct xfs_perag	*pag,
+	xfs_agnumber_t		*agno,
+	xfs_agnumber_t		stop_agno,
+	xfs_agnumber_t		wrap_agno)
+{
+	struct xfs_mount	*mp = pag->pag_mount;
+
+	*agno = pag->pag_agno + 1;
+	xfs_perag_rele(pag);
+	while (*agno != stop_agno) {
+		if (*agno >= wrap_agno)
+			*agno = 0;
+		if (*agno == stop_agno)
+			break;
+
+		pag = xfs_perag_grab(mp, *agno);
+		if (pag)
+			return pag;
+		(*agno)++;
+	}
+	return NULL;
+}
+
+/*
+ * Iterate all AGs from start_agno through wrap_agno, then 0 through
+ * (start_agno - 1).
+ */
+#define for_each_perag_wrap_at(mp, start_agno, wrap_agno, agno, pag) \
+	for ((agno) = (start_agno), (pag) = xfs_perag_grab((mp), (agno)); \
+		(pag) != NULL; \
+		(pag) = xfs_perag_next_wrap((pag), &(agno), (start_agno), \
+				(wrap_agno)))
+
+/*
+ * Iterate all AGs from start_agno through to the end of the filesystem, then 0
+ * through (start_agno - 1).
+ */
+#define for_each_perag_wrap(mp, start_agno, agno, pag) \
+	for_each_perag_wrap_at((mp), (start_agno), (mp)->m_sb.sb_agcount, \
+				(agno), (pag))
+
+
 struct aghdr_init_data {
 	/* per ag data */
 	xfs_agblock_t		agno;		/* ag to init */
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6aad0ea5e606..e5519abbfa0d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3136,17 +3136,14 @@ xfs_bmap_adjacent(
 
 static int
 xfs_bmap_longest_free_extent(
+	struct xfs_perag	*pag,
 	struct xfs_trans	*tp,
-	xfs_agnumber_t		ag,
 	xfs_extlen_t		*blen,
 	int			*notinit)
 {
-	struct xfs_mount	*mp = tp->t_mountp;
-	struct xfs_perag	*pag;
 	xfs_extlen_t		longest;
 	int			error = 0;
 
-	pag = xfs_perag_get(mp, ag);
 	if (!xfs_perag_initialised_agf(pag)) {
 		error = xfs_alloc_read_agf(pag, tp, XFS_ALLOC_FLAG_TRYLOCK,
 				NULL);
@@ -3156,19 +3153,17 @@ xfs_bmap_longest_free_extent(
 				*notinit = 1;
 				error = 0;
 			}
-			goto out;
+			return error;
 		}
 	}
 
 	longest = xfs_alloc_longest_free_extent(pag,
-				xfs_alloc_min_freelist(mp, pag),
+				xfs_alloc_min_freelist(pag->pag_mount, pag),
 				xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE));
 	if (*blen < longest)
 		*blen = longest;
 
-out:
-	xfs_perag_put(pag);
-	return error;
+	return 0;
 }
 
 static void
@@ -3206,9 +3201,10 @@ xfs_bmap_btalloc_select_lengths(
 	xfs_extlen_t		*blen)
 {
 	struct xfs_mount	*mp = ap->ip->i_mount;
-	xfs_agnumber_t		ag, startag;
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno, startag;
 	int			notinit = 0;
-	int			error;
+	int			error = 0;
 
 	args->type = XFS_ALLOCTYPE_START_BNO;
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
@@ -3218,21 +3214,21 @@ xfs_bmap_btalloc_select_lengths(
 	}
 
 	args->total = ap->total;
-	startag = ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
+	startag = XFS_FSB_TO_AGNO(mp, args->fsbno);
 	if (startag == NULLAGNUMBER)
-		startag = ag = 0;
+		startag = 0;
 
-	while (*blen < args->maxlen) {
-		error = xfs_bmap_longest_free_extent(args->tp, ag, blen,
+	*blen = 0;
+	for_each_perag_wrap(mp, startag, agno, pag) {
+		error = xfs_bmap_longest_free_extent(pag, args->tp, blen,
 						     &notinit);
 		if (error)
-			return error;
-
-		if (++ag == mp->m_sb.sb_agcount)
-			ag = 0;
-		if (ag == startag)
+			break;
+		if (*blen >= args->maxlen)
 			break;
 	}
+	if (pag)
+		xfs_perag_rele(pag);
 
 	xfs_bmap_select_minlen(ap, args, blen, notinit);
 	return 0;
@@ -3245,7 +3241,8 @@ xfs_bmap_btalloc_filestreams(
 	xfs_extlen_t		*blen)
 {
 	struct xfs_mount	*mp = ap->ip->i_mount;
-	xfs_agnumber_t		ag;
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		start_agno;
 	int			notinit = 0;
 	int			error;
 
@@ -3259,33 +3256,50 @@ xfs_bmap_btalloc_filestreams(
 	args->type = XFS_ALLOCTYPE_NEAR_BNO;
 	args->total = ap->total;
 
-	ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
-	if (ag == NULLAGNUMBER)
-		ag = 0;
+	start_agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
+	if (start_agno == NULLAGNUMBER)
+		start_agno = 0;
 
-	error = xfs_bmap_longest_free_extent(args->tp, ag, blen, &notinit);
-	if (error)
-		return error;
+	pag = xfs_perag_grab(mp, start_agno);
+	if (pag) {
+		error = xfs_bmap_longest_free_extent(pag, args->tp, blen,
+				&notinit);
+		xfs_perag_rele(pag);
+		if (error)
+			return error;
+	}
 
 	if (*blen < args->maxlen) {
-		error = xfs_filestream_new_ag(ap, &ag);
+		xfs_agnumber_t	agno = start_agno;
+
+		error = xfs_filestream_new_ag(ap, &agno);
 		if (error)
 			return error;
+		if (agno == NULLAGNUMBER)
+			goto out_select;
 
-		error = xfs_bmap_longest_free_extent(args->tp, ag, blen,
-						     &notinit);
+		pag = xfs_perag_grab(mp, agno);
+		if (!pag)
+			goto out_select;
+
+		error = xfs_bmap_longest_free_extent(pag, args->tp,
+				blen, &notinit);
+		xfs_perag_rele(pag);
 		if (error)
 			return error;
 
+		start_agno = agno;
+
 	}
 
+out_select:
 	xfs_bmap_select_minlen(ap, args, blen, notinit);
 
 	/*
 	 * Set the failure fallback case to look in the selected AG as stream
 	 * may have moved.
 	 */
-	ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, ag, 0);
+	ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, start_agno, 0);
 	return 0;
 }
 
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 2a323ffa5ba9..50fef3f5af51 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1725,7 +1725,7 @@ xfs_dialloc(
 	bool			ok_alloc = true;
 	bool			low_space = false;
 	int			flags;
-	xfs_ino_t		ino;
+	xfs_ino_t		ino = NULLFSINO;
 
 	/*
 	 * Directories, symlinks, and regular files frequently allocate at least
@@ -1773,39 +1773,37 @@ xfs_dialloc(
 	 * or in which we can allocate some inodes.  Iterate through the
 	 * allocation groups upward, wrapping at the end.
 	 */
-	agno = start_agno;
 	flags = XFS_ALLOC_FLAG_TRYLOCK;
-	for (;;) {
-		pag = xfs_perag_grab(mp, agno);
+retry:
+	for_each_perag_wrap_at(mp, start_agno, mp->m_maxagi, agno, pag) {
 		if (xfs_dialloc_good_ag(pag, *tpp, mode, flags, ok_alloc)) {
 			error = xfs_dialloc_try_ag(pag, tpp, parent,
 					&ino, ok_alloc);
 			if (error != -EAGAIN)
 				break;
+			error = 0;
 		}
 
 		if (xfs_is_shutdown(mp)) {
 			error = -EFSCORRUPTED;
 			break;
 		}
-		if (++agno == mp->m_maxagi)
-			agno = 0;
-		if (agno == start_agno) {
-			if (!flags) {
-				error = -ENOSPC;
-				break;
-			}
+	}
+	if (pag)
+		xfs_perag_rele(pag);
+	if (error)
+		return error;
+	if (ino == NULLFSINO) {
+		if (flags) {
 			flags = 0;
 			if (low_space)
 				ok_alloc = true;
+			goto retry;
 		}
-		xfs_perag_rele(pag);
+		return -ENOSPC;
 	}
-
-	if (!error)
-		*new_ino = ino;
-	xfs_perag_rele(pag);
-	return error;
+	*new_ino = ino;
+	return 0;
 }
 
 /*
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 15/42] xfs: rework xfs_alloc_vextent()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (13 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 14/42] xfs: introduce xfs_for_each_perag_wrap() Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-02-01 19:39   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 16/42] xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags() Dave Chinner
                   ` (27 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

It's a multiplexing mess that can be greatly simplified, and really
needs to be simplified to allow active per-ag references to
propagate from the initial AG selection code to the bmapi code.

This splits the code out into a separate parameter checking
function, an iterator function, and allocation completion functions,
and then implements the individual policies using these functions.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 464 +++++++++++++++++++++++---------------
 1 file changed, 285 insertions(+), 179 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 246c2e7d9e7a..39e34a1bfa31 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3151,29 +3151,20 @@ xfs_alloc_read_agf(
 }
 
 /*
- * Allocate an extent (variable-size).
- * Depending on the allocation type, we either look in a single allocation
- * group or loop over the allocation groups to find the result.
+ * Pre-process allocation arguments to set initial state that we don't require
+ * callers to set up correctly, as well as bounds check the allocation args
+ * that are set up.
  */
-int				/* error */
-xfs_alloc_vextent(
-	struct xfs_alloc_arg	*args)	/* allocation argument structure */
+static int
+xfs_alloc_vextent_check_args(
+	struct xfs_alloc_arg	*args)
 {
-	xfs_agblock_t		agsize;	/* allocation group size */
-	int			error;
-	int			flags;	/* XFS_ALLOC_FLAG_... locking flags */
-	struct xfs_mount	*mp;	/* mount structure pointer */
-	xfs_agnumber_t		sagno;	/* starting allocation group number */
-	xfs_alloctype_t		type;	/* input allocation type */
-	int			bump_rotor = 0;
-	xfs_agnumber_t		rotorstep = xfs_rotorstep; /* inode32 agf stepper */
-	xfs_agnumber_t		minimum_agno = 0;
+	struct xfs_mount	*mp = args->mp;
+	xfs_agblock_t		agsize;
 
-	mp = args->mp;
-	type = args->otype = args->type;
+	args->otype = args->type;
 	args->agbno = NULLAGBLOCK;
-	if (args->tp->t_highest_agno != NULLAGNUMBER)
-		minimum_agno = args->tp->t_highest_agno;
+
 	/*
 	 * Just fix this up, for the case where the last a.g. is shorter
 	 * (or there's only one a.g.) and the caller couldn't easily figure
@@ -3195,199 +3186,314 @@ xfs_alloc_vextent(
 	    args->mod >= args->prod) {
 		args->fsbno = NULLFSBLOCK;
 		trace_xfs_alloc_vextent_badargs(args);
+		return -ENOSPC;
+	}
+	return 0;
+}
+
+/*
+ * Post-process allocation results to set the allocated block number correctly
+ * for the caller.
+ *
+ * XXX: xfs_alloc_vextent() should really be returning ENOSPC for ENOSPC, not
+ * hiding it behind a "successful" NULLFSBLOCK allocation.
+ */
+static void
+xfs_alloc_vextent_set_fsbno(
+	struct xfs_alloc_arg	*args,
+	xfs_agnumber_t		minimum_agno)
+{
+	struct xfs_mount	*mp = args->mp;
+
+	/*
+	 * We can end up here with a locked AGF. If we failed, the caller is
+	 * likely going to try to allocate again with different parameters, and
+	 * that can widen the AGs that are searched for free space. If we have
+	 * to do BMBT block allocation, we have to do a new allocation.
+	 *
+	 * Hence leaving this function with the AGF locked opens up potential
+	 * ABBA AGF deadlocks because a future allocation attempt in this
+	 * transaction may attempt to lock a lower number AGF.
+	 *
+	 * We can't release the AGF until the transaction is committed, so at
+	 * this point we must update the "first allocation" tracker to point at
+	 * this AG if the tracker is empty or points to a lower AG. This allows
+	 * the next allocation attempt to be modified appropriately to avoid
+	 * deadlocks.
+	 */
+	if (args->agbp &&
+	    (args->tp->t_highest_agno == NULLAGNUMBER ||
+	     args->agno > minimum_agno))
+		args->tp->t_highest_agno = args->agno;
+
+	/* Allocation failed with ENOSPC if NULLAGBLOCK was returned. */
+	if (args->agbno == NULLAGBLOCK) {
+		args->fsbno = NULLFSBLOCK;
+		return;
+	}
+
+	args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno);
+#ifdef DEBUG
+	ASSERT(args->len >= args->minlen);
+	ASSERT(args->len <= args->maxlen);
+	ASSERT(args->agbno % args->alignment == 0);
+	XFS_AG_CHECK_DADDR(mp, XFS_FSB_TO_DADDR(mp, args->fsbno), args->len);
+#endif
+}
+
+/*
+ * Allocate within a single AG only.
+ */
+static int
+xfs_alloc_vextent_this_ag(
+	struct xfs_alloc_arg	*args,
+	xfs_agnumber_t		minimum_agno)
+{
+	struct xfs_mount	*mp = args->mp;
+	int			error;
+
+	error = xfs_alloc_vextent_check_args(args);
+	if (error) {
+		if (error == -ENOSPC)
+			return 0;
+		return error;
+	}
+
+	args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
+	if (minimum_agno > args->agno) {
+		trace_xfs_alloc_vextent_skip_deadlock(args);
+		args->fsbno = NULLFSBLOCK;
 		return 0;
 	}
 
-	switch (type) {
-	case XFS_ALLOCTYPE_THIS_AG:
-	case XFS_ALLOCTYPE_NEAR_BNO:
-	case XFS_ALLOCTYPE_THIS_BNO:
-		/*
-		 * These three force us into a single a.g.
-		 */
-		args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
-		args->pag = xfs_perag_get(mp, args->agno);
+	args->pag = xfs_perag_get(mp, args->agno);
+	error = xfs_alloc_fix_freelist(args, 0);
+	if (error) {
+		trace_xfs_alloc_vextent_nofix(args);
+		goto out_error;
+	}
+	if (!args->agbp) {
+		trace_xfs_alloc_vextent_noagbp(args);
+		args->fsbno = NULLFSBLOCK;
+		goto out_error;
+	}
+	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
+	error = xfs_alloc_ag_vextent(args);
 
-		if (minimum_agno > args->agno) {
-			trace_xfs_alloc_vextent_skip_deadlock(args);
-			error = 0;
-			break;
-		}
+	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
+out_error:
+	xfs_perag_put(args->pag);
+	return error;
+}
+
+/*
+ * Iterate all AGs trying to allocate an extent starting from @start_ag.
+ *
+ * If the incoming allocation type is XFS_ALLOCTYPE_NEAR_BNO, it means the
+ * allocation attempts in @start_agno have locality information. If we fail to
+ * allocate in that AG, then we revert to anywhere-in-AG for all the other AGs
+ * we attempt to allocate in, as there is no locality optimisation possible for
+ * those allocations.
+ *
+ * When we wrap the AG iteration at the end of the filesystem, we have to be
+ * careful not to wrap into AGs below ones we already have locked in the
+ * transaction if we are doing a blocking iteration. This will result in an
+ * out-of-order locking of AGFs and hence can cause deadlocks.
+ */
+static int
+xfs_alloc_vextent_iterate_ags(
+	struct xfs_alloc_arg	*args,
+	xfs_agnumber_t		minimum_agno,
+	xfs_agnumber_t		start_agno,
+	uint32_t		flags)
+{
+	struct xfs_mount	*mp = args->mp;
+	int			error = 0;
 
-		error = xfs_alloc_fix_freelist(args, 0);
+	ASSERT(start_agno >= minimum_agno);
+
+	/*
+	 * Loop over allocation groups twice; first time with
+	 * trylock set, second time without.
+	 */
+	args->agno = start_agno;
+	for (;;) {
+		args->pag = xfs_perag_get(mp, args->agno);
+		error = xfs_alloc_fix_freelist(args, flags);
 		if (error) {
 			trace_xfs_alloc_vextent_nofix(args);
-			goto error0;
-		}
-		if (!args->agbp) {
-			trace_xfs_alloc_vextent_noagbp(args);
 			break;
 		}
-		args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
-		if ((error = xfs_alloc_ag_vextent(args)))
-			goto error0;
-		break;
-	case XFS_ALLOCTYPE_START_BNO:
 		/*
-		 * Try near allocation first, then anywhere-in-ag after
-		 * the first a.g. fails.
+		 * If we get a buffer back then the allocation will fly.
 		 */
-		if ((args->datatype & XFS_ALLOC_INITIAL_USER_DATA) &&
-		    xfs_is_inode32(mp)) {
-			args->fsbno = XFS_AGB_TO_FSB(mp,
-					((mp->m_agfrotor / rotorstep) %
-					mp->m_sb.sb_agcount), 0);
-			bump_rotor = 1;
+		if (args->agbp) {
+			error = xfs_alloc_ag_vextent(args);
+			break;
 		}
-		args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
-		args->type = XFS_ALLOCTYPE_NEAR_BNO;
-		fallthrough;
-	case XFS_ALLOCTYPE_FIRST_AG:
+
+		trace_xfs_alloc_vextent_loopfailed(args);
+
 		/*
-		 * Rotate through the allocation groups looking for a winner.
-		 * If we are blocking, we must obey minimum_agno contraints for
-		 * avoiding ABBA deadlocks on AGF locking.
+		 * Didn't work, figure out the next iteration.
 		 */
-		if (type == XFS_ALLOCTYPE_FIRST_AG) {
-			/*
-			 * Start with allocation group given by bno.
-			 */
-			args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
+		if (args->agno == start_agno &&
+		    args->otype == XFS_ALLOCTYPE_START_BNO)
 			args->type = XFS_ALLOCTYPE_THIS_AG;
-			sagno = minimum_agno;
-			flags = 0;
-		} else {
-			/*
-			 * Start with the given allocation group.
-			 */
-			args->agno = sagno = XFS_FSB_TO_AGNO(mp, args->fsbno);
-			flags = XFS_ALLOC_FLAG_TRYLOCK;
+
+		/*
+		 * If we are try-locking, we can't deadlock on AGF locks so we
+		 * can wrap all the way back to the first AG. Otherwise, wrap
+		 * back to the start AG so we can't deadlock and let the end of
+		 * scan handler decide what to do next.
+		 */
+		if (++(args->agno) == mp->m_sb.sb_agcount) {
+			if (flags & XFS_ALLOC_FLAG_TRYLOCK)
+				args->agno = 0;
+			else
+				args->agno = minimum_agno;
 		}
 
 		/*
-		 * Loop over allocation groups twice; first time with
-		 * trylock set, second time without.
+		 * Reached the starting a.g., must either be done
+		 * or switch to non-trylock mode.
 		 */
-		for (;;) {
-			args->pag = xfs_perag_get(mp, args->agno);
-			error = xfs_alloc_fix_freelist(args, flags);
-			if (error) {
-				trace_xfs_alloc_vextent_nofix(args);
-				goto error0;
-			}
-			/*
-			 * If we get a buffer back then the allocation will fly.
-			 */
-			if (args->agbp) {
-				if ((error = xfs_alloc_ag_vextent(args)))
-					goto error0;
+		if (args->agno == start_agno) {
+			if (flags == 0) {
+				args->agbno = NULLAGBLOCK;
+				trace_xfs_alloc_vextent_allfailed(args);
 				break;
 			}
 
-			trace_xfs_alloc_vextent_loopfailed(args);
+			flags = 0;
+			if (args->otype == XFS_ALLOCTYPE_START_BNO) {
+				args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
+				args->type = XFS_ALLOCTYPE_NEAR_BNO;
+			}
+		}
+		xfs_perag_put(args->pag);
+		args->pag = NULL;
+	}
+	if (args->pag) {
+		xfs_perag_put(args->pag);
+		args->pag = NULL;
+	}
+	return error;
+}
 
-			/*
-			 * Didn't work, figure out the next iteration.
-			 */
-			if (args->agno == sagno &&
-			    type == XFS_ALLOCTYPE_START_BNO)
-				args->type = XFS_ALLOCTYPE_THIS_AG;
+/*
+ * Iterate the AGs from the start AG to the end of the filesystem, trying
+ * to allocate blocks. It starts with a near allocation attempt in the initial
+ * AG, then falls back to anywhere-in-ag after the first AG fails. It will wrap
+ * back to zero if allowed by previous allocations in this transaction,
+ * otherwise will wrap back to the start AG and run a second blocking pass to
+ * the end of the filesystem.
+ */
+static int
+xfs_alloc_vextent_start_ag(
+	struct xfs_alloc_arg	*args,
+	xfs_agnumber_t		minimum_agno)
+{
+	struct xfs_mount	*mp = args->mp;
+	xfs_agnumber_t		start_agno;
+	xfs_agnumber_t		rotorstep = xfs_rotorstep;
+	bool			bump_rotor = false;
+	int			error;
 
-			/*
-			 * If we are try-locking, we can't deadlock on AGF
-			 * locks, so we can wrap all the way back to the first
-			 * AG. Otherwise, wrap back to the start AG so we can't
-			 * deadlock, and let the end of scan handler decide what
-			 * to do next.
-			 */
-			if (++(args->agno) == mp->m_sb.sb_agcount) {
-				if (flags & XFS_ALLOC_FLAG_TRYLOCK)
-					args->agno = 0;
-				else
-					args->agno = sagno;
-			}
+	error = xfs_alloc_vextent_check_args(args);
+	if (error) {
+		if (error == -ENOSPC)
+			return 0;
+		return error;
+	}
 
-			/*
-			 * Reached the starting a.g., must either be done
-			 * or switch to non-trylock mode.
-			 */
-			if (args->agno == sagno) {
-				if (flags == 0) {
-					args->agbno = NULLAGBLOCK;
-					trace_xfs_alloc_vextent_allfailed(args);
-					break;
-				}
+	if ((args->datatype & XFS_ALLOC_INITIAL_USER_DATA) &&
+	    xfs_is_inode32(mp)) {
+		args->fsbno = XFS_AGB_TO_FSB(mp,
+				((mp->m_agfrotor / rotorstep) %
+				mp->m_sb.sb_agcount), 0);
+		bump_rotor = 1;
+	}
+	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, args->fsbno));
+	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
+	args->type = XFS_ALLOCTYPE_NEAR_BNO;
 
-				/*
-				 * Blocking pass next, so we must obey minimum
-				 * agno constraints to avoid ABBA AGF deadlocks.
-				 */
-				flags = 0;
-				if (minimum_agno > sagno)
-					sagno = minimum_agno;
-
-				if (type == XFS_ALLOCTYPE_START_BNO) {
-					args->agbno = XFS_FSB_TO_AGBNO(mp,
-						args->fsbno);
-					args->type = XFS_ALLOCTYPE_NEAR_BNO;
-				}
-			}
-			xfs_perag_put(args->pag);
-		}
-		if (bump_rotor) {
-			if (args->agno == sagno)
-				mp->m_agfrotor = (mp->m_agfrotor + 1) %
-					(mp->m_sb.sb_agcount * rotorstep);
-			else
-				mp->m_agfrotor = (args->agno * rotorstep + 1) %
-					(mp->m_sb.sb_agcount * rotorstep);
-		}
-		break;
-	default:
-		ASSERT(0);
-		/* NOTREACHED */
+	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno, start_agno,
+			XFS_ALLOC_FLAG_TRYLOCK);
+	if (bump_rotor) {
+		if (args->agno == start_agno)
+			mp->m_agfrotor = (mp->m_agfrotor + 1) %
+				(mp->m_sb.sb_agcount * rotorstep);
+		else
+			mp->m_agfrotor = (args->agno * rotorstep + 1) %
+				(mp->m_sb.sb_agcount * rotorstep);
 	}
-	if (args->agbno == NULLAGBLOCK) {
-		args->fsbno = NULLFSBLOCK;
-	} else {
-		args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno);
-#ifdef DEBUG
-		ASSERT(args->len >= args->minlen);
-		ASSERT(args->len <= args->maxlen);
-		ASSERT(args->agbno % args->alignment == 0);
-		XFS_AG_CHECK_DADDR(mp, XFS_FSB_TO_DADDR(mp, args->fsbno),
-			args->len);
-#endif
 
+	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
+	return error;
+}
+
+/*
+ * Iterate from the agno indicated from args->fsbno through to the end of the
+ * filesystem attempting blocking allocation. This does not wrap or try a second
+ * pass, so will not recurse into AGs lower than indicated by fsbno.
+ */
+static int
+xfs_alloc_vextent_first_ag(
+	struct xfs_alloc_arg	*args,
+	xfs_agnumber_t		minimum_agno)
+{
+	struct xfs_mount	*mp = args->mp;
+	xfs_agnumber_t		start_agno;
+	int			error;
+
+	error = xfs_alloc_vextent_check_args(args);
+	if (error) {
+		if (error == -ENOSPC)
+			return 0;
+		return error;
 	}
 
-	/*
-	 * We end up here with a locked AGF. If we failed, the caller is likely
-	 * going to try to allocate again with different parameters, and that
-	 * can widen the AGs that are searched for free space. If we have to do
-	 * BMBT block allocation, we have to do a new allocation.
-	 *
-	 * Hence leaving this function with the AGF locked opens up potential
-	 * ABBA AGF deadlocks because a future allocation attempt in this
-	 * transaction may attempt to lock a lower number AGF.
-	 *
-	 * We can't release the AGF until the transaction is commited, so at
-	 * this point we must update the "firstblock" tracker to point at this
-	 * AG if the tracker is empty or points to a lower AG. This allows the
-	 * next allocation attempt to be modified appropriately to avoid
-	 * deadlocks.
-	 */
-	if (args->agbp &&
-	    (args->tp->t_highest_agno == NULLAGNUMBER ||
-	     args->pag->pag_agno > minimum_agno))
-		args->tp->t_highest_agno = args->pag->pag_agno;
-	xfs_perag_put(args->pag);
-	return 0;
-error0:
-	xfs_perag_put(args->pag);
+	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, args->fsbno));
+
+	args->type = XFS_ALLOCTYPE_THIS_AG;
+	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno,
+			start_agno, 0);
+	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
 	return error;
 }
 
+/*
+ * Allocate an extent (variable-size).
+ * Depending on the allocation type, we either look in a single allocation
+ * group or loop over the allocation groups to find the result.
+ */
+int
+xfs_alloc_vextent(
+	struct xfs_alloc_arg	*args)
+{
+	xfs_agnumber_t		minimum_agno = 0;
+
+	if (args->tp->t_highest_agno != NULLAGNUMBER)
+		minimum_agno = args->tp->t_highest_agno;
+
+	switch (args->type) {
+	case XFS_ALLOCTYPE_THIS_AG:
+	case XFS_ALLOCTYPE_NEAR_BNO:
+	case XFS_ALLOCTYPE_THIS_BNO:
+		return xfs_alloc_vextent_this_ag(args, minimum_agno);
+	case XFS_ALLOCTYPE_START_BNO:
+		return xfs_alloc_vextent_start_ag(args, minimum_agno);
+	case XFS_ALLOCTYPE_FIRST_AG:
+		return xfs_alloc_vextent_first_ag(args, minimum_agno);
+	default:
+		ASSERT(0);
+		/* NOTREACHED */
+	}
+	/* Should never get here */
+	return -EFSCORRUPTED;
+}
+
 /* Ensure that the freelist is at full capacity. */
 int
 xfs_free_extent_fix_freelist(
-- 
2.39.0



* [PATCH 16/42] xfs: factor xfs_alloc_vextent_this_ag() for  _iterate_ags()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (14 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 15/42] xfs: rework xfs_alloc_vextent() Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 17/42] xfs: combine __xfs_alloc_vextent_this_ag and xfs_alloc_ag_vextent Dave Chinner
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The core of the per-ag iteration is effectively doing a "this ag"
allocation on one AG at a time. Use the same code to implement the
core "this ag" allocation in both xfs_alloc_vextent_this_ag()
and xfs_alloc_vextent_iterate_ags().

This means we only call xfs_alloc_ag_vextent() from one place, so we
can easily collapse the call stack in future patches.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 50 ++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 39e34a1bfa31..2dec95f35562 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3244,6 +3244,28 @@ xfs_alloc_vextent_set_fsbno(
 /*
  * Allocate within a single AG only.
  */
+static int
+__xfs_alloc_vextent_this_ag(
+	struct xfs_alloc_arg	*args)
+{
+	struct xfs_mount	*mp = args->mp;
+	int			error;
+
+	error = xfs_alloc_fix_freelist(args, 0);
+	if (error) {
+		trace_xfs_alloc_vextent_nofix(args);
+		return error;
+	}
+	if (!args->agbp) {
+		/* cannot allocate in this AG at all */
+		trace_xfs_alloc_vextent_noagbp(args);
+		args->agbno = NULLAGBLOCK;
+		return 0;
+	}
+	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
+	return xfs_alloc_ag_vextent(args);
+}
+
 static int
 xfs_alloc_vextent_this_ag(
 	struct xfs_alloc_arg	*args,
@@ -3267,21 +3289,9 @@ xfs_alloc_vextent_this_ag(
 	}
 
 	args->pag = xfs_perag_get(mp, args->agno);
-	error = xfs_alloc_fix_freelist(args, 0);
-	if (error) {
-		trace_xfs_alloc_vextent_nofix(args);
-		goto out_error;
-	}
-	if (!args->agbp) {
-		trace_xfs_alloc_vextent_noagbp(args);
-		args->fsbno = NULLFSBLOCK;
-		goto out_error;
-	}
-	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
-	error = xfs_alloc_ag_vextent(args);
+	error = __xfs_alloc_vextent_this_ag(args);
 
 	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
-out_error:
 	xfs_perag_put(args->pag);
 	return error;
 }
@@ -3319,24 +3329,16 @@ xfs_alloc_vextent_iterate_ags(
 	args->agno = start_agno;
 	for (;;) {
 		args->pag = xfs_perag_get(mp, args->agno);
-		error = xfs_alloc_fix_freelist(args, flags);
+		error = __xfs_alloc_vextent_this_ag(args);
 		if (error) {
-			trace_xfs_alloc_vextent_nofix(args);
+			args->agbno = NULLAGBLOCK;
 			break;
 		}
-		/*
-		 * If we get a buffer back then the allocation will fly.
-		 */
-		if (args->agbp) {
-			error = xfs_alloc_ag_vextent(args);
+		if (args->agbp)
 			break;
-		}
 
 		trace_xfs_alloc_vextent_loopfailed(args);
 
-		/*
-		 * Didn't work, figure out the next iteration.
-		 */
 		if (args->agno == start_agno &&
 		    args->otype == XFS_ALLOCTYPE_START_BNO)
 			args->type = XFS_ALLOCTYPE_THIS_AG;
-- 
2.39.0



* [PATCH 17/42] xfs: combine __xfs_alloc_vextent_this_ag and  xfs_alloc_ag_vextent
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (15 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 16/42] xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags() Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-02-01 22:25   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 18/42] xfs: use xfs_alloc_vextent_this_ag() where appropriate Dave Chinner
                   ` (25 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

There's a bit of a recursive conundrum around
xfs_alloc_ag_vextent(). We can't first call xfs_alloc_ag_vextent()
without preparing the AGFL for the allocation, and preparing the
AGFL calls xfs_alloc_ag_vextent() to prepare the AGFL for the
allocation. This "double allocation" requirement is not really clear
from the current xfs_alloc_fix_freelist() calls that are sprinkled
through the allocation code.

It's not helped that xfs_alloc_ag_vextent() can actually allocate
from the AGFL itself, but there's special code to prevent AGFL prep
allocations from allocating from the free list it's trying to prep.
The naming is also not consistent: args->wasfromfl is true when we
allocated _from_ the free list, but the indication that we are
allocating _for_ the free list is via checking that (args->resv ==
XFS_AG_RESV_AGFL).

So, let's make this "allocation required for allocation" situation
clear by moving it all inside xfs_alloc_ag_vextent(). The freelist
allocation is a specific XFS_ALLOCTYPE_THIS_AG allocation, which
translates directly to an xfs_alloc_ag_vextent_size() allocation.

This enables us to replace __xfs_alloc_vextent_this_ag() with a call
to xfs_alloc_ag_vextent(), and we drive the freelist fixing further
into the per-ag allocation algorithm.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 65 +++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 30 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 2dec95f35562..011baace7e9d 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1140,22 +1140,38 @@ xfs_alloc_ag_vextent_small(
  * and of the form k * prod + mod unless there's nothing that large.
  * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
  */
-STATIC int			/* error */
+static int
 xfs_alloc_ag_vextent(
-	xfs_alloc_arg_t	*args)	/* argument structure for allocation */
+	struct xfs_alloc_arg	*args)
 {
-	int		error=0;
+	struct xfs_mount	*mp = args->mp;
+	int			error = 0;
 
 	ASSERT(args->minlen > 0);
 	ASSERT(args->maxlen > 0);
 	ASSERT(args->minlen <= args->maxlen);
 	ASSERT(args->mod < args->prod);
 	ASSERT(args->alignment > 0);
+	ASSERT(args->resv != XFS_AG_RESV_AGFL);
+
+
+	error = xfs_alloc_fix_freelist(args, 0);
+	if (error) {
+		trace_xfs_alloc_vextent_nofix(args);
+		return error;
+	}
+	if (!args->agbp) {
+		/* cannot allocate in this AG at all */
+		trace_xfs_alloc_vextent_noagbp(args);
+		args->agbno = NULLAGBLOCK;
+		return 0;
+	}
+	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
+	args->wasfromfl = 0;
 
 	/*
 	 * Branch to correct routine based on the type.
 	 */
-	args->wasfromfl = 0;
 	switch (args->type) {
 	case XFS_ALLOCTYPE_THIS_AG:
 		error = xfs_alloc_ag_vextent_size(args);
@@ -1176,7 +1192,6 @@ xfs_alloc_ag_vextent(
 
 	ASSERT(args->len >= args->minlen);
 	ASSERT(args->len <= args->maxlen);
-	ASSERT(!args->wasfromfl || args->resv != XFS_AG_RESV_AGFL);
 	ASSERT(args->agbno % args->alignment == 0);
 
 	/* if not file data, insert new block into the reverse map btree */
@@ -2721,7 +2736,7 @@ xfs_alloc_fix_freelist(
 		targs.resv = XFS_AG_RESV_AGFL;
 
 		/* Allocate as many blocks as possible at once. */
-		error = xfs_alloc_ag_vextent(&targs);
+		error = xfs_alloc_ag_vextent_size(&targs);
 		if (error)
 			goto out_agflbp_relse;
 
@@ -2735,6 +2750,18 @@ xfs_alloc_fix_freelist(
 				break;
 			goto out_agflbp_relse;
 		}
+
+		if (!xfs_rmap_should_skip_owner_update(&targs.oinfo)) {
+			error = xfs_rmap_alloc(tp, agbp, pag,
+				       targs.agbno, targs.len, &targs.oinfo);
+			if (error)
+				goto out_agflbp_relse;
+		}
+		error = xfs_alloc_update_counters(tp, agbp,
+						  -((long)(targs.len)));
+		if (error)
+			goto out_agflbp_relse;
+
 		/*
 		 * Put each allocated block on the list.
 		 */
@@ -3244,28 +3271,6 @@ xfs_alloc_vextent_set_fsbno(
 /*
  * Allocate within a single AG only.
  */
-static int
-__xfs_alloc_vextent_this_ag(
-	struct xfs_alloc_arg	*args)
-{
-	struct xfs_mount	*mp = args->mp;
-	int			error;
-
-	error = xfs_alloc_fix_freelist(args, 0);
-	if (error) {
-		trace_xfs_alloc_vextent_nofix(args);
-		return error;
-	}
-	if (!args->agbp) {
-		/* cannot allocate in this AG at all */
-		trace_xfs_alloc_vextent_noagbp(args);
-		args->agbno = NULLAGBLOCK;
-		return 0;
-	}
-	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
-	return xfs_alloc_ag_vextent(args);
-}
-
 static int
 xfs_alloc_vextent_this_ag(
 	struct xfs_alloc_arg	*args,
@@ -3289,7 +3294,7 @@ xfs_alloc_vextent_this_ag(
 	}
 
 	args->pag = xfs_perag_get(mp, args->agno);
-	error = __xfs_alloc_vextent_this_ag(args);
+	error = xfs_alloc_ag_vextent(args);
 
 	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
 	xfs_perag_put(args->pag);
@@ -3329,7 +3334,7 @@ xfs_alloc_vextent_iterate_ags(
 	args->agno = start_agno;
 	for (;;) {
 		args->pag = xfs_perag_get(mp, args->agno);
-		error = __xfs_alloc_vextent_this_ag(args);
+		error = xfs_alloc_ag_vextent(args);
 		if (error) {
 			args->agbno = NULLAGBLOCK;
 			break;
-- 
2.39.0



* [PATCH 18/42] xfs: use xfs_alloc_vextent_this_ag() where appropriate
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (16 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 17/42] xfs: combine __xfs_alloc_vextent_this_ag and xfs_alloc_ag_vextent Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 19/42] xfs: factor xfs_bmap_btalloc() Dave Chinner
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Change obvious callers of single AG allocation to use
xfs_alloc_vextent_this_ag(). Drive the per-ag grabbing out to the
callers, too, so that callers with active references don't need
to do new lookups just for an allocation in a context that already
has a perag reference.

The only remaining caller that does single AG allocation through
xfs_alloc_vextent() is xfs_bmap_btalloc() with
XFS_ALLOCTYPE_NEAR_BNO. That is going to need more untangling before
it can be converted cleanly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c             |  3 +-
 fs/xfs/libxfs/xfs_alloc.c          | 26 ++++++++-------
 fs/xfs/libxfs/xfs_alloc.h          |  6 ++++
 fs/xfs/libxfs/xfs_bmap.c           | 52 +++++++++++++++++-------------
 fs/xfs/libxfs/xfs_bmap_btree.c     | 22 ++++++-------
 fs/xfs/libxfs/xfs_ialloc.c         |  9 ++++--
 fs/xfs/libxfs/xfs_ialloc_btree.c   |  3 +-
 fs/xfs/libxfs/xfs_refcount_btree.c |  3 +-
 fs/xfs/scrub/repair.c              |  3 +-
 9 files changed, 74 insertions(+), 53 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index a3bdcde95845..053d77a283f7 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -887,6 +887,7 @@ xfs_ag_shrink_space(
 	struct xfs_alloc_arg	args = {
 		.tp	= *tpp,
 		.mp	= mp,
+		.pag	= pag,
 		.type	= XFS_ALLOCTYPE_THIS_BNO,
 		.minlen = delta,
 		.maxlen = delta,
@@ -938,7 +939,7 @@ xfs_ag_shrink_space(
 		return error;
 
 	/* internal log shouldn't also show up in the free space btrees */
-	error = xfs_alloc_vextent(&args);
+	error = xfs_alloc_vextent_this_ag(&args);
 	if (!error && args.agbno == NULLAGBLOCK)
 		error = -ENOSPC;
 
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 011baace7e9d..28b79facf2e3 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2723,7 +2723,6 @@ xfs_alloc_fix_freelist(
 	targs.agbp = agbp;
 	targs.agno = args->agno;
 	targs.alignment = targs.minlen = targs.prod = 1;
-	targs.type = XFS_ALLOCTYPE_THIS_AG;
 	targs.pag = pag;
 	error = xfs_alloc_read_agfl(pag, tp, &agflbp);
 	if (error)
@@ -3271,14 +3270,17 @@ xfs_alloc_vextent_set_fsbno(
 /*
  * Allocate within a single AG only.
  */
-static int
+int
 xfs_alloc_vextent_this_ag(
-	struct xfs_alloc_arg	*args,
-	xfs_agnumber_t		minimum_agno)
+	struct xfs_alloc_arg	*args)
 {
 	struct xfs_mount	*mp = args->mp;
+	xfs_agnumber_t		minimum_agno = 0;
 	int			error;
 
+	if (args->tp->t_highest_agno != NULLAGNUMBER)
+		minimum_agno = args->tp->t_highest_agno;
+
 	error = xfs_alloc_vextent_check_args(args);
 	if (error) {
 		if (error == -ENOSPC)
@@ -3293,11 +3295,8 @@ xfs_alloc_vextent_this_ag(
 		return 0;
 	}
 
-	args->pag = xfs_perag_get(mp, args->agno);
 	error = xfs_alloc_ag_vextent(args);
-
 	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
-	xfs_perag_put(args->pag);
 	return error;
 }
 
@@ -3480,6 +3479,7 @@ xfs_alloc_vextent(
 	struct xfs_alloc_arg	*args)
 {
 	xfs_agnumber_t		minimum_agno = 0;
+	int			error;
 
 	if (args->tp->t_highest_agno != NULLAGNUMBER)
 		minimum_agno = args->tp->t_highest_agno;
@@ -3488,17 +3488,21 @@ xfs_alloc_vextent(
 	case XFS_ALLOCTYPE_THIS_AG:
 	case XFS_ALLOCTYPE_NEAR_BNO:
 	case XFS_ALLOCTYPE_THIS_BNO:
-		return xfs_alloc_vextent_this_ag(args, minimum_agno);
+		args->pag = xfs_perag_get(args->mp,
+				XFS_FSB_TO_AGNO(args->mp, args->fsbno));
+		error = xfs_alloc_vextent_this_ag(args);
+		xfs_perag_put(args->pag);
+		break;
 	case XFS_ALLOCTYPE_START_BNO:
 		return xfs_alloc_vextent_start_ag(args, minimum_agno);
 	case XFS_ALLOCTYPE_FIRST_AG:
 		return xfs_alloc_vextent_first_ag(args, minimum_agno);
 	default:
+		error = -EFSCORRUPTED;
 		ASSERT(0);
-		/* NOTREACHED */
+		break;
 	}
-	/* Should never get here */
-	return -EFSCORRUPTED;
+	return error;
 }
 
 /* Ensure that the freelist is at full capacity. */
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 2c3f762dfb58..0a9ad6cd18e2 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -124,6 +124,12 @@ int				/* error */
 xfs_alloc_vextent(
 	xfs_alloc_arg_t	*args);	/* allocation argument structure */
 
+/*
+ * Allocate an extent in the specific AG defined by args->fsbno. If there is no
+ * space in that AG, then the allocation will fail.
+ */
+int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
+
 /*
  * Free an extent.
  */
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index e5519abbfa0d..fec00cceeba7 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -789,6 +789,8 @@ xfs_bmap_local_to_extents(
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = ip->i_mount;
+	args.total = total;
+	args.minlen = args.maxlen = args.prod = 1;
 	xfs_rmap_ino_owner(&args.oinfo, ip->i_ino, whichfork, 0);
 	/*
 	 * Allocate a block.  We know we need only one, since the
@@ -3506,8 +3508,7 @@ xfs_bmap_btalloc(
 	xfs_extlen_t		orig_length;
 	xfs_extlen_t		blen;
 	xfs_extlen_t		nextminlen = 0;
-	int			isaligned;
-	int			tryagain;
+	int			isaligned = 0;
 	int			error;
 	int			stripe_align;
 
@@ -3528,7 +3529,6 @@ xfs_bmap_btalloc(
 
 	xfs_bmap_adjacent(ap);
 
-	tryagain = isaligned = 0;
 	args.fsbno = ap->blkno;
 	args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
 
@@ -3576,9 +3576,9 @@ xfs_bmap_btalloc(
 			 * allocation with alignment turned on.
 			 */
 			atype = args.type;
-			tryagain = 1;
 			args.type = XFS_ALLOCTYPE_THIS_BNO;
 			args.alignment = 1;
+
 			/*
 			 * Compute the minlen+alignment for the
 			 * next case.  Set slop so that the value
@@ -3595,34 +3595,37 @@ xfs_bmap_btalloc(
 					args.minlen - 1;
 			else
 				args.minalignslop = 0;
+
+			args.pag = xfs_perag_get(mp,
+					XFS_FSB_TO_AGNO(mp, args.fsbno));
+			error = xfs_alloc_vextent_this_ag(&args);
+			xfs_perag_put(args.pag);
+			if (error)
+				return error;
+
+			if (args.fsbno != NULLFSBLOCK)
+				goto out_success;
+			/*
+			 * Exact allocation failed. Now try with alignment
+			 * turned on.
+			 */
+			args.pag = NULL;
+			args.type = atype;
+			args.fsbno = ap->blkno;
+			args.alignment = stripe_align;
+			args.minlen = nextminlen;
+			args.minalignslop = 0;
+			isaligned = 1;
 		}
 	} else {
 		args.alignment = 1;
 		args.minalignslop = 0;
 	}
-	args.minleft = ap->minleft;
-	args.wasdel = ap->wasdel;
-	args.resv = XFS_AG_RESV_NONE;
-	args.datatype = ap->datatype;
 
 	error = xfs_alloc_vextent(&args);
 	if (error)
 		return error;
 
-	if (tryagain && args.fsbno == NULLFSBLOCK) {
-		/*
-		 * Exact allocation failed. Now try with alignment
-		 * turned on.
-		 */
-		args.type = atype;
-		args.fsbno = ap->blkno;
-		args.alignment = stripe_align;
-		args.minlen = nextminlen;
-		args.minalignslop = 0;
-		isaligned = 1;
-		if ((error = xfs_alloc_vextent(&args)))
-			return error;
-	}
 	if (isaligned && args.fsbno == NULLFSBLOCK) {
 		/*
 		 * allocation failed, so turn off alignment and
@@ -3650,8 +3653,13 @@ xfs_bmap_btalloc(
 			return error;
 		ap->tp->t_flags |= XFS_TRANS_LOWMODE;
 	}
+	args.minleft = ap->minleft;
+	args.wasdel = ap->wasdel;
+	args.resv = XFS_AG_RESV_NONE;
+	args.datatype = ap->datatype;
 
 	if (args.fsbno != NULLFSBLOCK) {
+out_success:
 		xfs_bmap_process_allocated_extent(ap, &args, orig_offset,
 			orig_length);
 	} else {
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index afd9b2d962a3..d42c1a1da1fc 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -21,6 +21,7 @@
 #include "xfs_quota.h"
 #include "xfs_trace.h"
 #include "xfs_rmap.h"
+#include "xfs_ag.h"
 
 static struct kmem_cache	*xfs_bmbt_cur_cache;
 
@@ -200,14 +201,18 @@ xfs_bmbt_alloc_block(
 	union xfs_btree_ptr		*new,
 	int				*stat)
 {
-	xfs_alloc_arg_t		args;		/* block allocation args */
-	int			error;		/* error return value */
+	struct xfs_alloc_arg	args;
+	int			error;
 
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
 	xfs_rmap_ino_bmbt_owner(&args.oinfo, cur->bc_ino.ip->i_ino,
 			cur->bc_ino.whichfork);
+	args.minlen = args.maxlen = args.prod = 1;
+	args.wasdel = cur->bc_ino.flags & XFS_BTCUR_BMBT_WASDEL;
+	if (!args.wasdel && args.tp->t_blk_res == 0)
+		return -ENOSPC;
 
 	args.fsbno = be64_to_cpu(start->l);
 	args.type = XFS_ALLOCTYPE_START_BNO;
@@ -222,15 +227,9 @@ xfs_bmbt_alloc_block(
 		args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip,
 					cur->bc_ino.whichfork);
 
-	args.minlen = args.maxlen = args.prod = 1;
-	args.wasdel = cur->bc_ino.flags & XFS_BTCUR_BMBT_WASDEL;
-	if (!args.wasdel && args.tp->t_blk_res == 0) {
-		error = -ENOSPC;
-		goto error0;
-	}
 	error = xfs_alloc_vextent(&args);
 	if (error)
-		goto error0;
+		return error;
 
 	if (args.fsbno == NULLFSBLOCK && args.minleft) {
 		/*
@@ -243,7 +242,7 @@ xfs_bmbt_alloc_block(
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		error = xfs_alloc_vextent(&args);
 		if (error)
-			goto error0;
+			return error;
 		cur->bc_tp->t_flags |= XFS_TRANS_LOWMODE;
 	}
 	if (WARN_ON_ONCE(args.fsbno == NULLFSBLOCK)) {
@@ -262,9 +261,6 @@ xfs_bmbt_alloc_block(
 
 	*stat = 1;
 	return 0;
-
- error0:
-	return error;
 }
 
 STATIC int
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 50fef3f5af51..2f3e47cb9332 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -630,6 +630,7 @@ xfs_ialloc_ag_alloc(
 	args.mp = tp->t_mountp;
 	args.fsbno = NULLFSBLOCK;
 	args.oinfo = XFS_RMAP_OINFO_INODES;
+	args.pag = pag;
 
 #ifdef DEBUG
 	/* randomly do sparse inode allocations */
@@ -683,7 +684,8 @@ xfs_ialloc_ag_alloc(
 
 		/* Allow space for the inode btree to split. */
 		args.minleft = igeo->inobt_maxlevels;
-		if ((error = xfs_alloc_vextent(&args)))
+		error = xfs_alloc_vextent_this_ag(&args);
+		if (error)
 			return error;
 
 		/*
@@ -731,7 +733,8 @@ xfs_ialloc_ag_alloc(
 		 * Allow space for the inode btree to split.
 		 */
 		args.minleft = igeo->inobt_maxlevels;
-		if ((error = xfs_alloc_vextent(&args)))
+		error = xfs_alloc_vextent_this_ag(&args);
+		if (error)
 			return error;
 	}
 
@@ -780,7 +783,7 @@ xfs_ialloc_ag_alloc(
 					    args.mp->m_sb.sb_inoalignmt) -
 				 igeo->ialloc_blks;
 
-		error = xfs_alloc_vextent(&args);
+		error = xfs_alloc_vextent_this_ag(&args);
 		if (error)
 			return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 3675a0d29310..fa6cd2502970 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -103,6 +103,7 @@ __xfs_inobt_alloc_block(
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
+	args.pag = cur->bc_ag.pag;
 	args.oinfo = XFS_RMAP_OINFO_INOBT;
 	args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_ag.pag->pag_agno, sbno);
 	args.minlen = 1;
@@ -111,7 +112,7 @@ __xfs_inobt_alloc_block(
 	args.type = XFS_ALLOCTYPE_NEAR_BNO;
 	args.resv = resv;
 
-	error = xfs_alloc_vextent(&args);
+	error = xfs_alloc_vextent_this_ag(&args);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index d20abf0390fc..a980fb18bde2 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -67,6 +67,7 @@ xfs_refcountbt_alloc_block(
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
+	args.pag = cur->bc_ag.pag;
 	args.type = XFS_ALLOCTYPE_NEAR_BNO;
 	args.fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno,
 			xfs_refc_block(args.mp));
@@ -74,7 +75,7 @@ xfs_refcountbt_alloc_block(
 	args.minlen = args.maxlen = args.prod = 1;
 	args.resv = XFS_AG_RESV_METADATA;
 
-	error = xfs_alloc_vextent(&args);
+	error = xfs_alloc_vextent_this_ag(&args);
 	if (error)
 		goto out_error;
 	trace_xfs_refcountbt_alloc_block(cur->bc_mp, cur->bc_ag.pag->pag_agno,
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index d0b1644efb89..5f4b50aac4bb 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -326,6 +326,7 @@ xrep_alloc_ag_block(
 
 	args.tp = sc->tp;
 	args.mp = sc->mp;
+	args.pag = sc->sa.pag;
 	args.oinfo = *oinfo;
 	args.fsbno = XFS_AGB_TO_FSB(args.mp, sc->sa.pag->pag_agno, 0);
 	args.minlen = 1;
@@ -334,7 +335,7 @@ xrep_alloc_ag_block(
 	args.type = XFS_ALLOCTYPE_THIS_AG;
 	args.resv = resv;
 
-	error = xfs_alloc_vextent(&args);
+	error = xfs_alloc_vextent_this_ag(&args);
 	if (error)
 		return error;
 	if (args.fsbno == NULLFSBLOCK)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 19/42] xfs: factor xfs_bmap_btalloc()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (17 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 18/42] xfs: use xfs_alloc_vextent_this_ag() where appropriate Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 20/42] xfs: use xfs_alloc_vextent_first_ag() where appropriate Dave Chinner
                   ` (23 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_bmap_btalloc() handles several different allocation contexts, and
large chunks of its code execute as independent allocation contexts.
Factor these out to untangle this mess a bit.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 333 +++++++++++++++++++++++----------------
 1 file changed, 196 insertions(+), 137 deletions(-)

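To make the factoring easier to follow, here is a standalone sketch (not the kernel code: the struct, the stub allocator, and the constants are invented) of the control flow this patch produces. `btalloc_best_length()` runs the fallback cascade, trying the EOF-optimised path first, and `btalloc_at_eof()` restores the args on failure so the caller proceeds as if it was never called:

```c
#include <assert.h>
#include <stdbool.h>

#define NULLFSBLOCK (~0ULL)

/* Simplified stand-in for struct xfs_alloc_arg: only the fields the
 * cascade below cares about. */
struct alloc_args {
	unsigned long long fsbno;	/* result block, NULLFSBLOCK on failure */
	unsigned int minlen;
	int alignment;
	bool low_space;			/* stand-in for XFS_TRANS_LOWMODE */
};

/* Invented stub allocator: aligned allocations fail, as does anything
 * when space is low; the last-resort scan only satisfies minlen <= 1. */
static int stub_alloc(struct alloc_args *args, bool last_resort)
{
	if (last_resort) {
		args->fsbno = (args->minlen <= 1) ? 42 : NULLFSBLOCK;
		return 0;
	}
	if (args->low_space || args->alignment > 1) {
		args->fsbno = NULLFSBLOCK;	/* no contiguous space */
		return 0;
	}
	args->fsbno = 100;
	return 0;
}

/* Mirrors the shape of xfs_bmap_btalloc_at_eof(): try an aligned
 * allocation, and on failure restore the args so the caller can
 * proceed as if this function was never called. */
static int btalloc_at_eof(struct alloc_args *args)
{
	int saved_alignment = args->alignment;

	args->alignment = 8;		/* pretend stripe alignment */
	stub_alloc(args, false);
	if (args->fsbno != NULLFSBLOCK)
		return 0;
	args->alignment = saved_alignment;	/* reset for the caller */
	return 0;
}

/* Mirrors the shape of xfs_bmap_btalloc_best_length(): EOF-optimised
 * attempt, then a plain attempt, then a minimum-length last resort. */
static int btalloc_best_length(struct alloc_args *args, bool at_eof)
{
	if (at_eof && !args->low_space) {
		btalloc_at_eof(args);
		if (args->fsbno != NULLFSBLOCK)
			return 0;
	}
	stub_alloc(args, false);
	if (args->fsbno != NULLFSBLOCK)
		return 0;

	/* Last resort: minimum length, full filesystem scan. */
	args->minlen = 1;
	stub_alloc(args, true);
	args->low_space = true;	/* flag later allocations in this tp */
	return 0;
}
```

The key property the factoring relies on is visible here: each helper either succeeds or leaves the args in a state the next fallback can use unchanged.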
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index fec00cceeba7..cdf3b551ef7b 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3196,13 +3196,13 @@ xfs_bmap_select_minlen(
 	}
 }
 
-STATIC int
+static int
 xfs_bmap_btalloc_select_lengths(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		*blen)
 {
-	struct xfs_mount	*mp = ap->ip->i_mount;
+	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*pag;
 	xfs_agnumber_t		agno, startag;
 	int			notinit = 0;
@@ -3216,7 +3216,7 @@ xfs_bmap_btalloc_select_lengths(
 	}
 
 	args->total = ap->total;
-	startag = XFS_FSB_TO_AGNO(mp, args->fsbno);
+	startag = XFS_FSB_TO_AGNO(mp, ap->blkno);
 	if (startag == NULLAGNUMBER)
 		startag = 0;
 
@@ -3258,7 +3258,7 @@ xfs_bmap_btalloc_filestreams(
 	args->type = XFS_ALLOCTYPE_NEAR_BNO;
 	args->total = ap->total;
 
-	start_agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
+	start_agno = XFS_FSB_TO_AGNO(mp, ap->blkno);
 	if (start_agno == NULLAGNUMBER)
 		start_agno = 0;
 
@@ -3496,170 +3496,229 @@ xfs_bmap_exact_minlen_extent_alloc(
 
 #endif
 
-STATIC int
-xfs_bmap_btalloc(
-	struct xfs_bmalloca	*ap)
+/*
+ * If we are not low on available data blocks and we are allocating at
+ * EOF, optimise allocation for contiguous file extension and/or stripe
+ * alignment of the new extent.
+ *
+ * NOTE: ap->aeof is only set if the allocation length is >= the
+ * stripe unit and the allocation offset is at the end of file.
+ */
+static int
+xfs_bmap_btalloc_at_eof(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	xfs_extlen_t		blen,
+	int			stripe_align)
 {
-	struct xfs_mount	*mp = ap->ip->i_mount;
-	struct xfs_alloc_arg	args = { .tp = ap->tp, .mp = mp };
-	xfs_alloctype_t		atype = 0;
-	xfs_agnumber_t		ag;
-	xfs_fileoff_t		orig_offset;
-	xfs_extlen_t		orig_length;
-	xfs_extlen_t		blen;
-	xfs_extlen_t		nextminlen = 0;
-	int			isaligned = 0;
+	struct xfs_mount	*mp = args->mp;
+	xfs_alloctype_t		atype;
 	int			error;
-	int			stripe_align;
 
-	ASSERT(ap->length);
-	orig_offset = ap->offset;
-	orig_length = ap->length;
+	/*
+	 * If there are already extents in the file, try an exact EOF block
+	 * allocation to extend the file as a contiguous extent. If that fails,
+	 * or it's the first allocation in a file, just try for a stripe aligned
+	 * allocation.
+	 */
+	if (ap->offset) {
+		xfs_extlen_t	nextminlen = 0;
 
-	stripe_align = xfs_bmap_compute_alignments(ap, &args);
+		atype = args->type;
+		args->type = XFS_ALLOCTYPE_THIS_BNO;
+		args->alignment = 1;
 
+		/*
+		 * Compute the minlen+alignment for the next case.  Set slop so
+		 * that the value of minlen+alignment+slop doesn't go up between
+		 * the calls.
+		 */
+		if (blen > stripe_align && blen <= args->maxlen)
+			nextminlen = blen - stripe_align;
+		else
+			nextminlen = args->minlen;
+		if (nextminlen + stripe_align > args->minlen + 1)
+			args->minalignslop = nextminlen + stripe_align -
+					args->minlen - 1;
+		else
+			args->minalignslop = 0;
+
+		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, args->fsbno));
+		error = xfs_alloc_vextent_this_ag(args);
+		xfs_perag_put(args->pag);
+		if (error)
+			return error;
+
+		if (args->fsbno != NULLFSBLOCK)
+			return 0;
+		/*
+		 * Exact allocation failed. Reset to try an aligned allocation
+		 * according to the original allocation specification.
+		 */
+		args->pag = NULL;
+		args->type = atype;
+		args->fsbno = ap->blkno;
+		args->alignment = stripe_align;
+		args->minlen = nextminlen;
+		args->minalignslop = 0;
+	} else {
+		args->alignment = stripe_align;
+		atype = args->type;
+		/*
+		 * Adjust minlen to try and preserve alignment if we
+		 * can't guarantee an aligned maxlen extent.
+		 */
+		if (blen > args->alignment &&
+		    blen <= args->maxlen + args->alignment)
+			args->minlen = blen - args->alignment;
+		args->minalignslop = 0;
+	}
+
+	error = xfs_alloc_vextent(args);
+	if (error)
+		return error;
+
+	if (args->fsbno != NULLFSBLOCK)
+		return 0;
+
+	/*
+	 * Allocation failed, so return the allocation args to their
+	 * original non-aligned state so the caller can proceed on allocation
+	 * failure as if this function was never called.
+	 */
+	args->type = atype;
+	args->fsbno = ap->blkno;
+	args->alignment = 1;
+	return 0;
+}
+
+static int
+xfs_bmap_btalloc_best_length(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	int			stripe_align)
+{
+	struct xfs_mount	*mp = args->mp;
+	xfs_extlen_t		blen = 0;
+	int			error;
+
+	/*
+	 * Determine the initial block number we will target for allocation.
+	 */
 	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
 	    xfs_inode_is_filestream(ap->ip)) {
-		ag = xfs_filestream_lookup_ag(ap->ip);
-		ag = (ag != NULLAGNUMBER) ? ag : 0;
-		ap->blkno = XFS_AGB_TO_FSB(mp, ag, 0);
+		xfs_agnumber_t	agno = xfs_filestream_lookup_ag(ap->ip);
+		if (agno == NULLAGNUMBER)
+			agno = 0;
+		ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
 	} else {
 		ap->blkno = XFS_INO_TO_FSB(mp, ap->ip->i_ino);
 	}
-
 	xfs_bmap_adjacent(ap);
-
-	args.fsbno = ap->blkno;
-	args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
-
-	/* Trim the allocation back to the maximum an AG can fit. */
-	args.maxlen = min(ap->length, mp->m_ag_max_usable);
-	blen = 0;
+	args->fsbno = ap->blkno;
 
 	/*
-	 * Search for an allocation group with a single extent large
-	 * enough for the request.  If one isn't found, then adjust
-	 * the minimum allocation size to the largest space found.
+	 * Search for an allocation group with a single extent large enough for
+	 * the request.  If one isn't found, then adjust the minimum allocation
+	 * size to the largest space found.
 	 */
 	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
 	    xfs_inode_is_filestream(ap->ip))
-		error = xfs_bmap_btalloc_filestreams(ap, &args, &blen);
+		error = xfs_bmap_btalloc_filestreams(ap, args, &blen);
 	else
-		error = xfs_bmap_btalloc_select_lengths(ap, &args, &blen);
+		error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
 	if (error)
 		return error;
 
 	/*
-	 * If we are not low on available data blocks, and the underlying
-	 * logical volume manager is a stripe, and the file offset is zero then
-	 * try to allocate data blocks on stripe unit boundary. NOTE: ap->aeof
-	 * is only set if the allocation length is >= the stripe unit and the
-	 * allocation offset is at the end of file.
+	 * Don't attempt optimal EOF allocation if previous allocations barely
+	 * succeeded due to being near ENOSPC. It is highly unlikely we'll get
+	 * optimal or even aligned allocations in this case, so don't waste time
+	 * trying.
 	 */
-	if (!(ap->tp->t_flags & XFS_TRANS_LOWMODE) && ap->aeof) {
-		if (!ap->offset) {
-			args.alignment = stripe_align;
-			atype = args.type;
-			isaligned = 1;
-			/*
-			 * Adjust minlen to try and preserve alignment if we
-			 * can't guarantee an aligned maxlen extent.
-			 */
-			if (blen > args.alignment &&
-			    blen <= args.maxlen + args.alignment)
-				args.minlen = blen - args.alignment;
-			args.minalignslop = 0;
-		} else {
-			/*
-			 * First try an exact bno allocation.
-			 * If it fails then do a near or start bno
-			 * allocation with alignment turned on.
-			 */
-			atype = args.type;
-			args.type = XFS_ALLOCTYPE_THIS_BNO;
-			args.alignment = 1;
-
-			/*
-			 * Compute the minlen+alignment for the
-			 * next case.  Set slop so that the value
-			 * of minlen+alignment+slop doesn't go up
-			 * between the calls.
-			 */
-			if (blen > stripe_align && blen <= args.maxlen)
-				nextminlen = blen - stripe_align;
-			else
-				nextminlen = args.minlen;
-			if (nextminlen + stripe_align > args.minlen + 1)
-				args.minalignslop =
-					nextminlen + stripe_align -
-					args.minlen - 1;
-			else
-				args.minalignslop = 0;
-
-			args.pag = xfs_perag_get(mp,
-					XFS_FSB_TO_AGNO(mp, args.fsbno));
-			error = xfs_alloc_vextent_this_ag(&args);
-			xfs_perag_put(args.pag);
-			if (error)
-				return error;
-
-			if (args.fsbno != NULLFSBLOCK)
-				goto out_success;
-			/*
-			 * Exact allocation failed. Now try with alignment
-			 * turned on.
-			 */
-			args.pag = NULL;
-			args.type = atype;
-			args.fsbno = ap->blkno;
-			args.alignment = stripe_align;
-			args.minlen = nextminlen;
-			args.minalignslop = 0;
-			isaligned = 1;
-		}
-	} else {
-		args.alignment = 1;
-		args.minalignslop = 0;
+	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align);
+		if (error)
+			return error;
+		if (args->fsbno != NULLFSBLOCK)
+			return 0;
 	}
 
-	error = xfs_alloc_vextent(&args);
+	error = xfs_alloc_vextent(args);
 	if (error)
 		return error;
+	if (args->fsbno != NULLFSBLOCK)
+		return 0;
 
-	if (isaligned && args.fsbno == NULLFSBLOCK) {
-		/*
-		 * allocation failed, so turn off alignment and
-		 * try again.
-		 */
-		args.type = atype;
-		args.fsbno = ap->blkno;
-		args.alignment = 0;
-		if ((error = xfs_alloc_vextent(&args)))
-			return error;
-	}
-	if (args.fsbno == NULLFSBLOCK &&
-	    args.minlen > ap->minlen) {
-		args.minlen = ap->minlen;
-		args.type = XFS_ALLOCTYPE_START_BNO;
-		args.fsbno = ap->blkno;
-		if ((error = xfs_alloc_vextent(&args)))
-			return error;
-	}
-	if (args.fsbno == NULLFSBLOCK) {
-		args.fsbno = 0;
-		args.type = XFS_ALLOCTYPE_FIRST_AG;
-		args.total = ap->minlen;
-		if ((error = xfs_alloc_vextent(&args)))
+	/*
+	 * Try a locality first full filesystem minimum length allocation whilst
+	 * still maintaining necessary total block reservation requirements.
+	 */
+	if (args->minlen > ap->minlen) {
+		args->minlen = ap->minlen;
+		args->type = XFS_ALLOCTYPE_START_BNO;
+		args->fsbno = ap->blkno;
+		error = xfs_alloc_vextent(args);
+		if (error)
 			return error;
-		ap->tp->t_flags |= XFS_TRANS_LOWMODE;
 	}
-	args.minleft = ap->minleft;
-	args.wasdel = ap->wasdel;
-	args.resv = XFS_AG_RESV_NONE;
-	args.datatype = ap->datatype;
+	if (args->fsbno != NULLFSBLOCK)
+		return 0;
+
+	/*
+	 * We are now critically low on space, so this is a last resort
+	 * allocation attempt: no reserve, no locality, blocking, minimum
+	 * length, full filesystem free space scan. We also indicate to future
+	 * allocations in this transaction that we are critically low on space
+	 * so they don't waste time on allocation modes that are unlikely to
+	 * succeed.
+	 */
+	args->fsbno = 0;
+	args->type = XFS_ALLOCTYPE_FIRST_AG;
+	args->total = ap->minlen;
+	error = xfs_alloc_vextent(args);
+	if (error)
+		return error;
+	ap->tp->t_flags |= XFS_TRANS_LOWMODE;
+	return 0;
+}
+
+static int
+xfs_bmap_btalloc(
+	struct xfs_bmalloca	*ap)
+{
+	struct xfs_mount	*mp = ap->ip->i_mount;
+	struct xfs_alloc_arg	args = {
+		.tp		= ap->tp,
+		.mp		= mp,
+		.fsbno		= NULLFSBLOCK,
+		.oinfo		= XFS_RMAP_OINFO_SKIP_UPDATE,
+		.minleft	= ap->minleft,
+		.wasdel		= ap->wasdel,
+		.resv		= XFS_AG_RESV_NONE,
+		.datatype	= ap->datatype,
+		.alignment	= 1,
+		.minalignslop	= 0,
+	};
+	xfs_fileoff_t		orig_offset;
+	xfs_extlen_t		orig_length;
+	int			error;
+	int			stripe_align;
+
+	ASSERT(ap->length);
+	orig_offset = ap->offset;
+	orig_length = ap->length;
+
+	stripe_align = xfs_bmap_compute_alignments(ap, &args);
+
+	/* Trim the allocation back to the maximum an AG can fit. */
+	args.maxlen = min(ap->length, mp->m_ag_max_usable);
+
+	error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align);
+	if (error)
+		return error;
 
 	if (args.fsbno != NULLFSBLOCK) {
-out_success:
 		xfs_bmap_process_allocated_extent(ap, &args, orig_offset,
 			orig_length);
 	} else {
-- 
2.39.0



* [PATCH 20/42] xfs: use xfs_alloc_vextent_first_ag() where appropriate
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (18 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 19/42] xfs: factor xfs_bmap_btalloc() Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-02-01 22:43   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 21/42] xfs: use xfs_alloc_vextent_start_bno() " Dave Chinner
                   ` (22 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Change obvious callers of single AG allocation to use
xfs_alloc_vextent_first_ag(). This gets rid of
XFS_ALLOCTYPE_FIRST_AG, as the type used within
xfs_alloc_vextent_first_ag() during iteration is _THIS_AG. Hence we
can remove the setting of args->type from all the callers of
_first_ag() and remove the alloctype.

While doing this, pass the allocation target fsb as a parameter
rather than encoding it in args->fsbno. This starts the process
of making args->fsbno an output-only variable rather than
input/output.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 35 +++++++++++++++++++----------------
 fs/xfs/libxfs/xfs_alloc.h | 10 ++++++++--
 fs/xfs/libxfs/xfs_bmap.c  | 31 ++++++++++++++++---------------
 3 files changed, 43 insertions(+), 33 deletions(-)

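The signature change this describes can be illustrated in isolation (made-up types and a trivial stub allocator, not the kernel API): the allocation target moves from the in/out args->fsbno field to an explicit parameter, leaving fsbno as a pure output:

```c
#include <assert.h>

typedef unsigned long long fsblock_t;
#define NULLFSBLOCK (~0ULL)

/* Invented, minimal stand-in for struct xfs_alloc_arg. */
struct alloc_args {
	fsblock_t fsbno;	/* after the change: result only */
	unsigned int minlen;
};

/* Before: the target comes in via args->fsbno and is overwritten with
 * the result, making the field both input and output. */
static int alloc_first_ag_old(struct alloc_args *args)
{
	fsblock_t target = args->fsbno;	/* input hidden in the result field */
	args->fsbno = target + 1;	/* pretend the next block was free */
	return 0;
}

/* After: the target is an explicit parameter, so args->fsbno only ever
 * carries the result (or NULLFSBLOCK on failure). */
static int alloc_first_ag_new(struct alloc_args *args, fsblock_t target)
{
	args->fsbno = target + 1;
	return 0;
}
```

Separating input from output this way means callers no longer need to reinitialise fsbno between retries, which is what makes the later fallback restructuring safe.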
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 28b79facf2e3..186ce3aee9e0 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3183,7 +3183,8 @@ xfs_alloc_read_agf(
  */
 static int
 xfs_alloc_vextent_check_args(
-	struct xfs_alloc_arg	*args)
+	struct xfs_alloc_arg	*args,
+	xfs_rfsblock_t		target)
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_agblock_t		agsize;
@@ -3201,13 +3202,13 @@ xfs_alloc_vextent_check_args(
 		args->maxlen = agsize;
 	if (args->alignment == 0)
 		args->alignment = 1;
-	ASSERT(XFS_FSB_TO_AGNO(mp, args->fsbno) < mp->m_sb.sb_agcount);
-	ASSERT(XFS_FSB_TO_AGBNO(mp, args->fsbno) < agsize);
+	ASSERT(XFS_FSB_TO_AGNO(mp, target) < mp->m_sb.sb_agcount);
+	ASSERT(XFS_FSB_TO_AGBNO(mp, target) < agsize);
 	ASSERT(args->minlen <= args->maxlen);
 	ASSERT(args->minlen <= agsize);
 	ASSERT(args->mod < args->prod);
-	if (XFS_FSB_TO_AGNO(mp, args->fsbno) >= mp->m_sb.sb_agcount ||
-	    XFS_FSB_TO_AGBNO(mp, args->fsbno) >= agsize ||
+	if (XFS_FSB_TO_AGNO(mp, target) >= mp->m_sb.sb_agcount ||
+	    XFS_FSB_TO_AGBNO(mp, target) >= agsize ||
 	    args->minlen > args->maxlen || args->minlen > agsize ||
 	    args->mod >= args->prod) {
 		args->fsbno = NULLFSBLOCK;
@@ -3281,7 +3282,7 @@ xfs_alloc_vextent_this_ag(
 	if (args->tp->t_highest_agno != NULLAGNUMBER)
 		minimum_agno = args->tp->t_highest_agno;
 
-	error = xfs_alloc_vextent_check_args(args);
+	error = xfs_alloc_vextent_check_args(args, args->fsbno);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
@@ -3406,7 +3407,7 @@ xfs_alloc_vextent_start_ag(
 	bool			bump_rotor = false;
 	int			error;
 
-	error = xfs_alloc_vextent_check_args(args);
+	error = xfs_alloc_vextent_check_args(args, args->fsbno);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
@@ -3444,25 +3445,29 @@ xfs_alloc_vextent_start_ag(
  * filesystem attempting blocking allocation. This does not wrap or try a second
  * pass, so will not recurse into AGs lower than indicated by fsbno.
  */
-static int
+int
 xfs_alloc_vextent_first_ag(
 	struct xfs_alloc_arg	*args,
-	xfs_agnumber_t		minimum_agno)
+	xfs_rfsblock_t		target)
 {
 	struct xfs_mount	*mp = args->mp;
+	xfs_agnumber_t		minimum_agno = 0;
 	xfs_agnumber_t		start_agno;
 	int			error;
 
-	error = xfs_alloc_vextent_check_args(args);
+	if (args->tp->t_highest_agno != NULLAGNUMBER)
+		minimum_agno = args->tp->t_highest_agno;
+
+	error = xfs_alloc_vextent_check_args(args, target);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
 		return error;
 	}
 
-	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, args->fsbno));
-
+	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target));
 	args->type = XFS_ALLOCTYPE_THIS_AG;
+	args->fsbno = target;
 	error =  xfs_alloc_vextent_iterate_ags(args, minimum_agno,
 			start_agno, 0);
 	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
@@ -3495,8 +3500,6 @@ xfs_alloc_vextent(
 		break;
 	case XFS_ALLOCTYPE_START_BNO:
 		return xfs_alloc_vextent_start_ag(args, minimum_agno);
-	case XFS_ALLOCTYPE_FIRST_AG:
-		return xfs_alloc_vextent_first_ag(args, minimum_agno);
 	default:
 		error = -EFSCORRUPTED;
 		ASSERT(0);
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 0a9ad6cd18e2..73697dd3ca55 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -19,7 +19,6 @@ unsigned int xfs_agfl_size(struct xfs_mount *mp);
 /*
  * Freespace allocation types.  Argument to xfs_alloc_[v]extent.
  */
-#define XFS_ALLOCTYPE_FIRST_AG	0x02	/* ... start at ag 0 */
 #define XFS_ALLOCTYPE_THIS_AG	0x08	/* anywhere in this a.g. */
 #define XFS_ALLOCTYPE_START_BNO	0x10	/* near this block else anywhere */
 #define XFS_ALLOCTYPE_NEAR_BNO	0x20	/* in this a.g. and near this block */
@@ -29,7 +28,6 @@ unsigned int xfs_agfl_size(struct xfs_mount *mp);
 typedef unsigned int xfs_alloctype_t;
 
 #define XFS_ALLOC_TYPES \
-	{ XFS_ALLOCTYPE_FIRST_AG,	"FIRST_AG" }, \
 	{ XFS_ALLOCTYPE_THIS_AG,	"THIS_AG" }, \
 	{ XFS_ALLOCTYPE_START_BNO,	"START_BNO" }, \
 	{ XFS_ALLOCTYPE_NEAR_BNO,	"NEAR_BNO" }, \
@@ -130,6 +128,14 @@ xfs_alloc_vextent(
  */
 int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
 
+/*
+ * Iterate from the AG indicated by the target fsb through to the end of the
+ * filesystem attempting blocking allocation. This is for use in last
+ * resort allocation attempts when everything else has failed.
+ */
+int xfs_alloc_vextent_first_ag(struct xfs_alloc_arg *args,
+		xfs_rfsblock_t target);
+
 /*
  * Free an extent.
  */
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index cdf3b551ef7b..eb3dc8d5319b 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3248,13 +3248,6 @@ xfs_bmap_btalloc_filestreams(
 	int			notinit = 0;
 	int			error;
 
-	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
-		args->type = XFS_ALLOCTYPE_FIRST_AG;
-		args->total = ap->minlen;
-		args->minlen = ap->minlen;
-		return 0;
-	}
-
 	args->type = XFS_ALLOCTYPE_NEAR_BNO;
 	args->total = ap->total;
 
@@ -3462,9 +3455,7 @@ xfs_bmap_exact_minlen_extent_alloc(
 	 */
 	ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
 
-	args.fsbno = ap->blkno;
 	args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
-	args.type = XFS_ALLOCTYPE_FIRST_AG;
 	args.minlen = args.maxlen = ap->minlen;
 	args.total = ap->total;
 
@@ -3476,7 +3467,7 @@ xfs_bmap_exact_minlen_extent_alloc(
 	args.resv = XFS_AG_RESV_NONE;
 	args.datatype = ap->datatype;
 
-	error = xfs_alloc_vextent(&args);
+	error = xfs_alloc_vextent_first_ag(&args, ap->blkno);
 	if (error)
 		return error;
 
@@ -3623,10 +3614,21 @@ xfs_bmap_btalloc_best_length(
 	 * size to the largest space found.
 	 */
 	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
-	    xfs_inode_is_filestream(ap->ip))
+	    xfs_inode_is_filestream(ap->ip)) {
+		/*
+		 * If there is very little free space before we start a
+		 * filestreams allocation, we're almost guaranteed to fail to
+		 * find an AG with enough contiguous free space to succeed, so
+		 * just go straight to the low space algorithm.
+		 */
+		if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
+			args->minlen = ap->minlen;
+			goto critically_low_space;
+		}
 		error = xfs_bmap_btalloc_filestreams(ap, args, &blen);
-	else
+	} else {
 		error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
+	}
 	if (error)
 		return error;
 
@@ -3673,10 +3675,9 @@ xfs_bmap_btalloc_best_length(
 	 * so they don't waste time on allocation modes that are unlikely to
 	 * succeed.
 	 */
-	args->fsbno = 0;
-	args->type = XFS_ALLOCTYPE_FIRST_AG;
+critically_low_space:
 	args->total = ap->minlen;
-	error = xfs_alloc_vextent(args);
+	error = xfs_alloc_vextent_first_ag(args, 0);
 	if (error)
 		return error;
 	ap->tp->t_flags |= XFS_TRANS_LOWMODE;
-- 
2.39.0



* [PATCH 21/42] xfs: use xfs_alloc_vextent_start_bno() where appropriate
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (19 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 20/42] xfs: use xfs_alloc_vextent_first_ag() where appropriate Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-02-01 22:51   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 22/42] xfs: introduce xfs_alloc_vextent_near_bno() Dave Chinner
                   ` (21 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Change obvious callers of single AG allocation to use
xfs_alloc_vextent_start_bno(). Callers no longer need to specify
XFS_ALLOCTYPE_START_BNO, and so the type can be driven inward and
removed.

While doing this, also pass the allocation target fsb as a parameter
rather than encoding it in args->fsbno.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c      | 24 ++++++++++---------
 fs/xfs/libxfs/xfs_alloc.h      | 13 ++++++++--
 fs/xfs/libxfs/xfs_bmap.c       | 43 ++++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_bmap_btree.c |  9 ++-----
 4 files changed, 51 insertions(+), 38 deletions(-)

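The AG iteration policy this patch adjusts (first AG tried with locality, later AGs without, wrapping back to AG 0) can be sketched as follows. This is illustrative only: the types, the stub per-AG allocator, and the success condition are invented, not the xfs_alloc_vextent_iterate_ags() implementation:

```c
#include <assert.h>

enum alloc_type { NEAR_BNO, THIS_AG };

/* Invented stub per-AG allocator: succeeds only in the AG given by
 * 'free_ag', regardless of allocation type. Returns 1 on success. */
static int ag_alloc(unsigned int agno, enum alloc_type type,
		    unsigned int free_ag)
{
	(void)type;
	return agno == free_ag;
}

/* Sketch of the iteration shape: a near-block (locality) attempt in
 * the starting AG, then plain this-AG attempts while wrapping through
 * the remaining AGs. Returns the AG allocated from, or -1. */
static int iterate_ags(unsigned int start_agno, unsigned int agcount,
		       unsigned int free_ag)
{
	unsigned int agno = start_agno;
	enum alloc_type type = NEAR_BNO;	/* locality on first pass */
	unsigned int tries;

	for (tries = 0; tries < agcount; tries++) {
		if (ag_alloc(agno, type, free_ag))
			return (int)agno;
		type = THIS_AG;		/* drop locality after start AG */
		agno = (agno + 1) % agcount;	/* wrap back to AG 0 */
	}
	return -1;
}
```

Recording the original type in args->otype, as the real code does, is what lets the iterator know whether the first AG deserves the locality-aware attempt before falling back to plain per-AG allocation.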
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 186ce3aee9e0..294f80d596d9 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3189,7 +3189,6 @@ xfs_alloc_vextent_check_args(
 	struct xfs_mount	*mp = args->mp;
 	xfs_agblock_t		agsize;
 
-	args->otype = args->type;
 	args->agbno = NULLAGBLOCK;
 
 	/*
@@ -3345,7 +3344,7 @@ xfs_alloc_vextent_iterate_ags(
 		trace_xfs_alloc_vextent_loopfailed(args);
 
 		if (args->agno == start_agno &&
-		    args->otype == XFS_ALLOCTYPE_START_BNO)
+		    args->otype == XFS_ALLOCTYPE_NEAR_BNO)
 			args->type = XFS_ALLOCTYPE_THIS_AG;
 
 		/*
@@ -3373,7 +3372,7 @@ xfs_alloc_vextent_iterate_ags(
 			}
 
 			flags = 0;
-			if (args->otype == XFS_ALLOCTYPE_START_BNO) {
+			if (args->otype == XFS_ALLOCTYPE_NEAR_BNO) {
 				args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
 				args->type = XFS_ALLOCTYPE_NEAR_BNO;
 			}
@@ -3396,18 +3395,22 @@ xfs_alloc_vextent_iterate_ags(
  * otherwise will wrap back to the start AG and run a second blocking pass to
  * the end of the filesystem.
  */
-static int
+int
 xfs_alloc_vextent_start_ag(
 	struct xfs_alloc_arg	*args,
-	xfs_agnumber_t		minimum_agno)
+	xfs_rfsblock_t		target)
 {
 	struct xfs_mount	*mp = args->mp;
+	xfs_agnumber_t		minimum_agno = 0;
 	xfs_agnumber_t		start_agno;
 	xfs_agnumber_t		rotorstep = xfs_rotorstep;
 	bool			bump_rotor = false;
 	int			error;
 
-	error = xfs_alloc_vextent_check_args(args, args->fsbno);
+	if (args->tp->t_highest_agno != NULLAGNUMBER)
+		minimum_agno = args->tp->t_highest_agno;
+
+	error = xfs_alloc_vextent_check_args(args, target);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
@@ -3416,14 +3419,15 @@ xfs_alloc_vextent_start_ag(
 
 	if ((args->datatype & XFS_ALLOC_INITIAL_USER_DATA) &&
 	    xfs_is_inode32(mp)) {
-		args->fsbno = XFS_AGB_TO_FSB(mp,
+		target = XFS_AGB_TO_FSB(mp,
 				((mp->m_agfrotor / rotorstep) %
 				mp->m_sb.sb_agcount), 0);
 		bump_rotor = 1;
 	}
-	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, args->fsbno));
-	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
+	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target));
+	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
 	args->type = XFS_ALLOCTYPE_NEAR_BNO;
+	args->fsbno = target;
 
 	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno, start_agno,
 			XFS_ALLOC_FLAG_TRYLOCK);
@@ -3498,8 +3502,6 @@ xfs_alloc_vextent(
 		error = xfs_alloc_vextent_this_ag(args);
 		xfs_perag_put(args->pag);
 		break;
-	case XFS_ALLOCTYPE_START_BNO:
-		return xfs_alloc_vextent_start_ag(args, minimum_agno);
 	default:
 		error = -EFSCORRUPTED;
 		ASSERT(0);
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 73697dd3ca55..5487dff3d68a 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -20,7 +20,6 @@ unsigned int xfs_agfl_size(struct xfs_mount *mp);
  * Freespace allocation types.  Argument to xfs_alloc_[v]extent.
  */
 #define XFS_ALLOCTYPE_THIS_AG	0x08	/* anywhere in this a.g. */
-#define XFS_ALLOCTYPE_START_BNO	0x10	/* near this block else anywhere */
 #define XFS_ALLOCTYPE_NEAR_BNO	0x20	/* in this a.g. and near this block */
 #define XFS_ALLOCTYPE_THIS_BNO	0x40	/* at exactly this block */
 
@@ -29,7 +28,6 @@ typedef unsigned int xfs_alloctype_t;
 
 #define XFS_ALLOC_TYPES \
 	{ XFS_ALLOCTYPE_THIS_AG,	"THIS_AG" }, \
-	{ XFS_ALLOCTYPE_START_BNO,	"START_BNO" }, \
 	{ XFS_ALLOCTYPE_NEAR_BNO,	"NEAR_BNO" }, \
 	{ XFS_ALLOCTYPE_THIS_BNO,	"THIS_BNO" }
 
@@ -128,6 +126,17 @@ xfs_alloc_vextent(
  */
 int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
 
+/*
+ * Best effort full filesystem allocation scan.
+ *
+ * Locality aware allocation will be attempted in the initial AG, but on failure
+ * non-localised attempts will be made. The AGs are constrained by previous
+ * allocations in the current transaction. Two passes will be made - the first
+ * non-blocking, the second blocking.
+ */
+int xfs_alloc_vextent_start_ag(struct xfs_alloc_arg *args,
+		xfs_rfsblock_t target);
+
 /*
  * Iterate from the AG indicated from args->fsbno through to the end of the
  * filesystem attempting blocking allocation. This is for use in last
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index eb3dc8d5319b..aefcdf2bfd57 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -646,12 +646,11 @@ xfs_bmap_extents_to_btree(
 	args.mp = mp;
 	xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, whichfork);
 
-	args.type = XFS_ALLOCTYPE_START_BNO;
-	args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
 	args.minlen = args.maxlen = args.prod = 1;
 	args.wasdel = wasdel;
 	*logflagsp = 0;
-	error = xfs_alloc_vextent(&args);
+	error = xfs_alloc_vextent_start_ag(&args,
+				XFS_INO_TO_FSB(mp, ip->i_ino));
 	if (error)
 		goto out_root_realloc;
 
@@ -792,15 +791,15 @@ xfs_bmap_local_to_extents(
 	args.total = total;
 	args.minlen = args.maxlen = args.prod = 1;
 	xfs_rmap_ino_owner(&args.oinfo, ip->i_ino, whichfork, 0);
+
 	/*
 	 * Allocate a block.  We know we need only one, since the
 	 * file currently fits in an inode.
 	 */
-	args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
-	args.type = XFS_ALLOCTYPE_START_BNO;
 	args.total = total;
 	args.minlen = args.maxlen = args.prod = 1;
-	error = xfs_alloc_vextent(&args);
+	error = xfs_alloc_vextent_start_ag(&args,
+			XFS_INO_TO_FSB(args.mp, ip->i_ino));
 	if (error)
 		goto done;
 
@@ -3208,7 +3207,6 @@ xfs_bmap_btalloc_select_lengths(
 	int			notinit = 0;
 	int			error = 0;
 
-	args->type = XFS_ALLOCTYPE_START_BNO;
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
 		args->total = ap->minlen;
 		args->minlen = ap->minlen;
@@ -3500,7 +3498,8 @@ xfs_bmap_btalloc_at_eof(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		blen,
-	int			stripe_align)
+	int			stripe_align,
+	bool			ag_only)
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_alloctype_t		atype;
@@ -3565,7 +3564,10 @@ xfs_bmap_btalloc_at_eof(
 		args->minalignslop = 0;
 	}
 
-	error = xfs_alloc_vextent(args);
+	if (ag_only)
+		error = xfs_alloc_vextent(args);
+	else
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	if (error)
 		return error;
 
@@ -3591,13 +3593,17 @@ xfs_bmap_btalloc_best_length(
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_extlen_t		blen = 0;
+	bool			is_filestream = false;
 	int			error;
 
+	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
+	    xfs_inode_is_filestream(ap->ip))
+		is_filestream = true;
+
 	/*
 	 * Determine the initial block number we will target for allocation.
 	 */
-	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
-	    xfs_inode_is_filestream(ap->ip)) {
+	if (is_filestream) {
 		xfs_agnumber_t	agno = xfs_filestream_lookup_ag(ap->ip);
 		if (agno == NULLAGNUMBER)
 			agno = 0;
@@ -3613,8 +3619,7 @@ xfs_bmap_btalloc_best_length(
 	 * the request.  If one isn't found, then adjust the minimum allocation
 	 * size to the largest space found.
 	 */
-	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
-	    xfs_inode_is_filestream(ap->ip)) {
+	if (is_filestream) {
 		/*
 		 * If there is very little free space before we start a
 		 * filestreams allocation, we're almost guaranteed to fail to
@@ -3639,14 +3644,18 @@ xfs_bmap_btalloc_best_length(
 	 * trying.
 	 */
 	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
-		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align);
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
+				is_filestream);
 		if (error)
 			return error;
 		if (args->fsbno != NULLFSBLOCK)
 			return 0;
 	}
 
-	error = xfs_alloc_vextent(args);
+	if (is_filestream)
+		error = xfs_alloc_vextent(args);
+	else
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	if (error)
 		return error;
 	if (args->fsbno != NULLFSBLOCK)
@@ -3658,9 +3667,7 @@ xfs_bmap_btalloc_best_length(
 	 */
 	if (args->minlen > ap->minlen) {
 		args->minlen = ap->minlen;
-		args->type = XFS_ALLOCTYPE_START_BNO;
-		args->fsbno = ap->blkno;
-		error = xfs_alloc_vextent(args);
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 		if (error)
 			return error;
 	}
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index d42c1a1da1fc..b8ad95050c9b 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -214,9 +214,6 @@ xfs_bmbt_alloc_block(
 	if (!args.wasdel && args.tp->t_blk_res == 0)
 		return -ENOSPC;
 
-	args.fsbno = be64_to_cpu(start->l);
-	args.type = XFS_ALLOCTYPE_START_BNO;
-
 	/*
 	 * If we are coming here from something like unwritten extent
 	 * conversion, there has been no data extent allocation already done, so
@@ -227,7 +224,7 @@ xfs_bmbt_alloc_block(
 		args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip,
 					cur->bc_ino.whichfork);
 
-	error = xfs_alloc_vextent(&args);
+	error = xfs_alloc_vextent_start_ag(&args, be64_to_cpu(start->l));
 	if (error)
 		return error;
 
@@ -237,10 +234,8 @@ xfs_bmbt_alloc_block(
 		 * a full btree split.  Try again and if
 		 * successful activate the lowspace algorithm.
 		 */
-		args.fsbno = 0;
 		args.minleft = 0;
-		args.type = XFS_ALLOCTYPE_START_BNO;
-		error = xfs_alloc_vextent(&args);
+		error = xfs_alloc_vextent_start_ag(&args, 0);
 		if (error)
 			return error;
 		cur->bc_tp->t_flags |= XFS_TRANS_LOWMODE;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 22/42] xfs: introduce xfs_alloc_vextent_near_bno()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (20 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 21/42] xfs: use xfs_alloc_vextent_start_ag() " Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-02-01 22:52   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 23/42] xfs: introduce xfs_alloc_vextent_exact_bno() Dave Chinner
                   ` (20 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The remaining callers of xfs_alloc_vextent() are all doing NEAR_BNO
allocations. We can replace that function with a new
xfs_alloc_vextent_near_bno() function that does this explicitly.

We also multiplex NEAR_BNO allocations through
xfs_alloc_vextent_this_ag() via args->type. Replace all of these
with direct calls to xfs_alloc_vextent_near_bno(), too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c          | 50 ++++++++++++++++++------------
 fs/xfs/libxfs/xfs_alloc.h          | 14 ++++-----
 fs/xfs/libxfs/xfs_bmap.c           |  6 ++--
 fs/xfs/libxfs/xfs_ialloc.c         | 27 ++++++----------
 fs/xfs/libxfs/xfs_ialloc_btree.c   |  5 ++-
 fs/xfs/libxfs/xfs_refcount_btree.c |  7 ++---
 6 files changed, 55 insertions(+), 54 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 294f80d596d9..485a73eab9d9 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3479,35 +3479,47 @@ int
 }
 
 /*
- * Allocate an extent (variable-size).
- * Depending on the allocation type, we either look in a single allocation
- * group or loop over the allocation groups to find the result.
+ * Allocate an extent as close to the target as possible. If there are not
+ * viable candidates in the AG, then fail the allocation.
  */
 int
-xfs_alloc_vextent(
-	struct xfs_alloc_arg	*args)
+xfs_alloc_vextent_near_bno(
+	struct xfs_alloc_arg	*args,
+	xfs_rfsblock_t		target)
 {
+	struct xfs_mount	*mp = args->mp;
+	bool			need_pag = !args->pag;
 	xfs_agnumber_t		minimum_agno = 0;
 	int			error;
 
 	if (args->tp->t_highest_agno != NULLAGNUMBER)
 		minimum_agno = args->tp->t_highest_agno;
 
-	switch (args->type) {
-	case XFS_ALLOCTYPE_THIS_AG:
-	case XFS_ALLOCTYPE_NEAR_BNO:
-	case XFS_ALLOCTYPE_THIS_BNO:
-		args->pag = xfs_perag_get(args->mp,
-				XFS_FSB_TO_AGNO(args->mp, args->fsbno));
-		error = xfs_alloc_vextent_this_ag(args);
-		xfs_perag_put(args->pag);
-		break;
-	default:
-		error = -EFSCORRUPTED;
-		ASSERT(0);
-		break;
+	error = xfs_alloc_vextent_check_args(args, target);
+	if (error) {
+		if (error == -ENOSPC)
+			return 0;
+		return error;
 	}
-	return error;
+
+	args->agno = XFS_FSB_TO_AGNO(mp, target);
+	if (minimum_agno > args->agno) {
+		trace_xfs_alloc_vextent_skip_deadlock(args);
+		return 0;
+	}
+
+	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
+	args->type = XFS_ALLOCTYPE_NEAR_BNO;
+	if (need_pag)
+		args->pag = xfs_perag_get(args->mp, args->agno);
+	error = xfs_alloc_ag_vextent(args);
+	if (need_pag)
+		xfs_perag_put(args->pag);
+	if (error)
+		return error;
+
+	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
+	return 0;
 }
 
 /* Ensure that the freelist is at full capacity. */
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 5487dff3d68a..f38a2f8e20fb 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -113,19 +113,19 @@ xfs_alloc_log_agf(
 	struct xfs_buf	*bp,	/* buffer for a.g. freelist header */
 	uint32_t	fields);/* mask of fields to be logged (XFS_AGF_...) */
 
-/*
- * Allocate an extent (variable-size).
- */
-int				/* error */
-xfs_alloc_vextent(
-	xfs_alloc_arg_t	*args);	/* allocation argument structure */
-
 /*
  * Allocate an extent in the specific AG defined by args->fsbno. If there is no
  * space in that AG, then the allocation will fail.
  */
 int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
 
+/*
+ * Allocate an extent as close to the target as possible. If there are not
+ * viable candidates in the AG, then fail the allocation.
+ */
+int xfs_alloc_vextent_near_bno(struct xfs_alloc_arg *args,
+		xfs_rfsblock_t target);
+
 /*
  * Best effort full filesystem allocation scan.
  *
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index aefcdf2bfd57..4446b035eed5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3246,7 +3246,6 @@ xfs_bmap_btalloc_filestreams(
 	int			notinit = 0;
 	int			error;
 
-	args->type = XFS_ALLOCTYPE_NEAR_BNO;
 	args->total = ap->total;
 
 	start_agno = XFS_FSB_TO_AGNO(mp, ap->blkno);
@@ -3565,7 +3564,7 @@ xfs_bmap_btalloc_at_eof(
 	}
 
 	if (ag_only)
-		error = xfs_alloc_vextent(args);
+		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
 	else
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	if (error)
@@ -3612,7 +3611,6 @@ xfs_bmap_btalloc_best_length(
 		ap->blkno = XFS_INO_TO_FSB(mp, ap->ip->i_ino);
 	}
 	xfs_bmap_adjacent(ap);
-	args->fsbno = ap->blkno;
 
 	/*
 	 * Search for an allocation group with a single extent large enough for
@@ -3653,7 +3651,7 @@ xfs_bmap_btalloc_best_length(
 	}
 
 	if (is_filestream)
-		error = xfs_alloc_vextent(args);
+		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
 	else
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
 	if (error)
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 2f3e47cb9332..daa6f7055bba 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -717,23 +717,17 @@ xfs_ialloc_ag_alloc(
 			isaligned = 1;
 		} else
 			args.alignment = igeo->cluster_align;
-		/*
-		 * Need to figure out where to allocate the inode blocks.
-		 * Ideally they should be spaced out through the a.g.
-		 * For now, just allocate blocks up front.
-		 */
-		args.agbno = be32_to_cpu(agi->agi_root);
-		args.fsbno = XFS_AGB_TO_FSB(args.mp, pag->pag_agno, args.agbno);
 		/*
 		 * Allocate a fixed-size extent of inodes.
 		 */
-		args.type = XFS_ALLOCTYPE_NEAR_BNO;
 		args.prod = 1;
 		/*
 		 * Allow space for the inode btree to split.
 		 */
 		args.minleft = igeo->inobt_maxlevels;
-		error = xfs_alloc_vextent_this_ag(&args);
+		error = xfs_alloc_vextent_near_bno(&args,
+				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
+						be32_to_cpu(agi->agi_root)));
 		if (error)
 			return error;
 	}
@@ -743,11 +737,11 @@ xfs_ialloc_ag_alloc(
 	 * alignment.
 	 */
 	if (isaligned && args.fsbno == NULLFSBLOCK) {
-		args.type = XFS_ALLOCTYPE_NEAR_BNO;
-		args.agbno = be32_to_cpu(agi->agi_root);
-		args.fsbno = XFS_AGB_TO_FSB(args.mp, pag->pag_agno, args.agbno);
 		args.alignment = igeo->cluster_align;
-		if ((error = xfs_alloc_vextent(&args)))
+		error = xfs_alloc_vextent_near_bno(&args,
+				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
+						be32_to_cpu(agi->agi_root)));
+		if (error)
 			return error;
 	}
 
@@ -759,9 +753,6 @@ xfs_ialloc_ag_alloc(
 	    igeo->ialloc_min_blks < igeo->ialloc_blks &&
 	    args.fsbno == NULLFSBLOCK) {
 sparse_alloc:
-		args.type = XFS_ALLOCTYPE_NEAR_BNO;
-		args.agbno = be32_to_cpu(agi->agi_root);
-		args.fsbno = XFS_AGB_TO_FSB(args.mp, pag->pag_agno, args.agbno);
 		args.alignment = args.mp->m_sb.sb_spino_align;
 		args.prod = 1;
 
@@ -783,7 +774,9 @@ xfs_ialloc_ag_alloc(
 					    args.mp->m_sb.sb_inoalignmt) -
 				 igeo->ialloc_blks;
 
-		error = xfs_alloc_vextent_this_ag(&args);
+		error = xfs_alloc_vextent_near_bno(&args,
+				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
+						be32_to_cpu(agi->agi_root)));
 		if (error)
 			return error;
 
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index fa6cd2502970..9b28211d5a4c 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -105,14 +105,13 @@ __xfs_inobt_alloc_block(
 	args.mp = cur->bc_mp;
 	args.pag = cur->bc_ag.pag;
 	args.oinfo = XFS_RMAP_OINFO_INOBT;
-	args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_ag.pag->pag_agno, sbno);
 	args.minlen = 1;
 	args.maxlen = 1;
 	args.prod = 1;
-	args.type = XFS_ALLOCTYPE_NEAR_BNO;
 	args.resv = resv;
 
-	error = xfs_alloc_vextent_this_ag(&args);
+	error = xfs_alloc_vextent_near_bno(&args,
+			XFS_AGB_TO_FSB(args.mp, args.pag->pag_agno, sbno));
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index a980fb18bde2..f3b860970b26 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -68,14 +68,13 @@ xfs_refcountbt_alloc_block(
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
 	args.pag = cur->bc_ag.pag;
-	args.type = XFS_ALLOCTYPE_NEAR_BNO;
-	args.fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno,
-			xfs_refc_block(args.mp));
 	args.oinfo = XFS_RMAP_OINFO_REFC;
 	args.minlen = args.maxlen = args.prod = 1;
 	args.resv = XFS_AG_RESV_METADATA;
 
-	error = xfs_alloc_vextent_this_ag(&args);
+	error = xfs_alloc_vextent_near_bno(&args,
+			XFS_AGB_TO_FSB(args.mp, args.pag->pag_agno,
+					xfs_refc_block(args.mp)));
 	if (error)
 		goto out_error;
 	trace_xfs_refcountbt_alloc_block(cur->bc_mp, cur->bc_ag.pag->pag_agno,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 23/42] xfs: introduce xfs_alloc_vextent_exact_bno()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (21 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 22/42] xfs: introduce xfs_alloc_vextent_near_bno() Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-02-01 23:00   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 24/42] xfs: introduce xfs_alloc_vextent_prepare() Dave Chinner
                   ` (19 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Two of the callers to xfs_alloc_vextent_this_ag() actually want
exact block number allocation, not anywhere-in-AG allocation. Split
this out from _this_ag() as a first-class citizen so no external
extent allocation code needs to care about args->type anymore.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.c     |  6 ++--
 fs/xfs/libxfs/xfs_alloc.c  | 65 ++++++++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_alloc.h  | 13 ++++++--
 fs/xfs/libxfs/xfs_bmap.c   |  6 ++--
 fs/xfs/libxfs/xfs_ialloc.c |  6 ++--
 fs/xfs/scrub/repair.c      |  4 +--
 6 files changed, 73 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 053d77a283f7..86696a1c6891 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -888,7 +888,6 @@ xfs_ag_shrink_space(
 		.tp	= *tpp,
 		.mp	= mp,
 		.pag	= pag,
-		.type	= XFS_ALLOCTYPE_THIS_BNO,
 		.minlen = delta,
 		.maxlen = delta,
 		.oinfo	= XFS_RMAP_OINFO_SKIP_UPDATE,
@@ -920,8 +919,6 @@ xfs_ag_shrink_space(
 	if (delta >= aglen)
 		return -EINVAL;
 
-	args.fsbno = XFS_AGB_TO_FSB(mp, pag->pag_agno, aglen - delta);
-
 	/*
 	 * Make sure that the last inode cluster cannot overlap with the new
 	 * end of the AG, even if it's sparse.
@@ -939,7 +936,8 @@ xfs_ag_shrink_space(
 		return error;
 
 	/* internal log shouldn't also show up in the free space btrees */
-	error = xfs_alloc_vextent_this_ag(&args);
+	error = xfs_alloc_vextent_exact_bno(&args,
+			XFS_AGB_TO_FSB(mp, pag->pag_agno, aglen - delta));
 	if (!error && args.agbno == NULLAGBLOCK)
 		error = -ENOSPC;
 
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 485a73eab9d9..b810a94aad70 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3272,28 +3272,34 @@ xfs_alloc_vextent_set_fsbno(
  */
 int
 xfs_alloc_vextent_this_ag(
-	struct xfs_alloc_arg	*args)
+	struct xfs_alloc_arg	*args,
+	xfs_agnumber_t		agno)
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_agnumber_t		minimum_agno = 0;
+	xfs_rfsblock_t		target = XFS_AGB_TO_FSB(mp, agno, 0);
 	int			error;
 
 	if (args->tp->t_highest_agno != NULLAGNUMBER)
 		minimum_agno = args->tp->t_highest_agno;
 
-	error = xfs_alloc_vextent_check_args(args, args->fsbno);
+	if (minimum_agno > agno) {
+		trace_xfs_alloc_vextent_skip_deadlock(args);
+		args->fsbno = NULLFSBLOCK;
+		return 0;
+	}
+
+	error = xfs_alloc_vextent_check_args(args, target);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
 		return error;
 	}
 
-	args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
-	if (minimum_agno > args->agno) {
-		trace_xfs_alloc_vextent_skip_deadlock(args);
-		args->fsbno = NULLFSBLOCK;
-		return 0;
-	}
+	args->agno = agno;
+	args->agbno = 0;
+	args->fsbno = target;
+	args->type = XFS_ALLOCTYPE_THIS_AG;
 
 	error = xfs_alloc_ag_vextent(args);
 	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
@@ -3450,7 +3456,7 @@ xfs_alloc_vextent_start_ag(
  * pass, so will not recurse into AGs lower than indicated by fsbno.
  */
 int
- xfs_alloc_vextent_first_ag(
+xfs_alloc_vextent_first_ag(
 	struct xfs_alloc_arg	*args,
 	xfs_rfsblock_t		target)
  {
@@ -3472,12 +3478,51 @@ int
 	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target));
 	args->type = XFS_ALLOCTYPE_THIS_AG;
 	args->fsbno = target;
-	error =  xfs_alloc_vextent_iterate_ags(args, minimum_agno,
+	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno,
 			start_agno, 0);
 	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
 	return error;
 }
 
+/*
+ * Allocate within a single AG only.
+ */
+int
+xfs_alloc_vextent_exact_bno(
+	struct xfs_alloc_arg	*args,
+	xfs_rfsblock_t		target)
+{
+	struct xfs_mount	*mp = args->mp;
+	xfs_agnumber_t		minimum_agno = 0;
+	int			error;
+
+	if (args->tp->t_highest_agno != NULLAGNUMBER)
+		minimum_agno = args->tp->t_highest_agno;
+
+	error = xfs_alloc_vextent_check_args(args, target);
+	if (error) {
+		if (error == -ENOSPC)
+			return 0;
+		return error;
+	}
+
+	args->agno = XFS_FSB_TO_AGNO(mp, target);
+	if (minimum_agno > args->agno) {
+		trace_xfs_alloc_vextent_skip_deadlock(args);
+		return 0;
+	}
+
+	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
+	args->fsbno = target;
+	args->type = XFS_ALLOCTYPE_THIS_BNO;
+	error = xfs_alloc_ag_vextent(args);
+	if (error)
+		return error;
+
+	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
+	return 0;
+}
+
 /*
  * Allocate an extent as close to the target as possible. If there are not
  * viable candidates in the AG, then fail the allocation.
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index f38a2f8e20fb..106b4deb1110 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -114,10 +114,10 @@ xfs_alloc_log_agf(
 	uint32_t	fields);/* mask of fields to be logged (XFS_AGF_...) */
 
 /*
- * Allocate an extent in the specific AG defined by args->fsbno. If there is no
- * space in that AG, then the allocation will fail.
+ * Allocate an extent anywhere in the specific AG given. If there is no
+ * space matching the requirements in that AG, then the allocation will fail.
  */
-int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
+int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args, xfs_agnumber_t agno);
 
 /*
  * Allocate an extent as close to the target as possible. If there are not
@@ -126,6 +126,13 @@ int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
 int xfs_alloc_vextent_near_bno(struct xfs_alloc_arg *args,
 		xfs_rfsblock_t target);
 
+/*
+ * Allocate an extent exactly at the target given. If this is not possible
+ * then the allocation fails.
+ */
+int xfs_alloc_vextent_exact_bno(struct xfs_alloc_arg *args,
+		xfs_rfsblock_t target);
+
 /*
  * Best effort full filesystem allocation scan.
  *
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 4446b035eed5..c9902df16e25 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3514,7 +3514,6 @@ xfs_bmap_btalloc_at_eof(
 		xfs_extlen_t	nextminlen = 0;
 
 		atype = args->type;
-		args->type = XFS_ALLOCTYPE_THIS_BNO;
 		args->alignment = 1;
 
 		/*
@@ -3532,8 +3531,8 @@ xfs_bmap_btalloc_at_eof(
 		else
 			args->minalignslop = 0;
 
-		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, args->fsbno));
-		error = xfs_alloc_vextent_this_ag(args);
+		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
+		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
 		xfs_perag_put(args->pag);
 		if (error)
 			return error;
@@ -3546,7 +3545,6 @@ xfs_bmap_btalloc_at_eof(
 		 */
 		args->pag = NULL;
 		args->type = atype;
-		args->fsbno = ap->blkno;
 		args->alignment = stripe_align;
 		args->minlen = nextminlen;
 		args->minalignslop = 0;
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index daa6f7055bba..d2525f0cc6cd 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -662,8 +662,6 @@ xfs_ialloc_ag_alloc(
 		goto sparse_alloc;
 	if (likely(newino != NULLAGINO &&
 		  (args.agbno < be32_to_cpu(agi->agi_length)))) {
-		args.fsbno = XFS_AGB_TO_FSB(args.mp, pag->pag_agno, args.agbno);
-		args.type = XFS_ALLOCTYPE_THIS_BNO;
 		args.prod = 1;
 
 		/*
@@ -684,7 +682,9 @@ xfs_ialloc_ag_alloc(
 
 		/* Allow space for the inode btree to split. */
 		args.minleft = igeo->inobt_maxlevels;
-		error = xfs_alloc_vextent_this_ag(&args);
+		error = xfs_alloc_vextent_exact_bno(&args,
+				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
+						args.agbno));
 		if (error)
 			return error;
 
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 5f4b50aac4bb..1b71174ec0d6 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -328,14 +328,12 @@ xrep_alloc_ag_block(
 	args.mp = sc->mp;
 	args.pag = sc->sa.pag;
 	args.oinfo = *oinfo;
-	args.fsbno = XFS_AGB_TO_FSB(args.mp, sc->sa.pag->pag_agno, 0);
 	args.minlen = 1;
 	args.maxlen = 1;
 	args.prod = 1;
-	args.type = XFS_ALLOCTYPE_THIS_AG;
 	args.resv = resv;
 
-	error = xfs_alloc_vextent_this_ag(&args);
+	error = xfs_alloc_vextent_this_ag(&args, sc->sa.pag->pag_agno);
 	if (error)
 		return error;
 	if (args.fsbno == NULLFSBLOCK)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 24/42] xfs: introduce xfs_alloc_vextent_prepare()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (22 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 23/42] xfs: introduce xfs_alloc_vextent_exact_bno() Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 25/42] xfs: move allocation accounting to xfs_alloc_vextent_set_fsbno() Dave Chinner
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Now that we have wrapper functions for each type of allocation we
can ask for, we can start unravelling xfs_alloc_ag_vextent(). That
is essentially just a prepare stage, the allocation multiplexer
and a post-allocation accounting step if the allocation proceeded.

The current xfs_alloc_vextent*() wrappers all have a prepare stage,
the allocation operation and a post-allocation accounting step.

We can consolidate this by moving the AG alloc prep code into the
wrapper functions, the accounting code in the wrapper accounting
functions, and cut out the multiplexer layer entirely.

This patch consolidates the AG preparation stage.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 120 ++++++++++++++++++++++++--------------
 1 file changed, 76 insertions(+), 44 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index b810a94aad70..bfbbb7536310 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1144,31 +1144,8 @@ static int
 xfs_alloc_ag_vextent(
 	struct xfs_alloc_arg	*args)
 {
-	struct xfs_mount	*mp = args->mp;
 	int			error = 0;
 
-	ASSERT(args->minlen > 0);
-	ASSERT(args->maxlen > 0);
-	ASSERT(args->minlen <= args->maxlen);
-	ASSERT(args->mod < args->prod);
-	ASSERT(args->alignment > 0);
-	ASSERT(args->resv != XFS_AG_RESV_AGFL);
-
-
-	error = xfs_alloc_fix_freelist(args, 0);
-	if (error) {
-		trace_xfs_alloc_vextent_nofix(args);
-		return error;
-	}
-	if (!args->agbp) {
-		/* cannot allocate in this AG at all */
-		trace_xfs_alloc_vextent_noagbp(args);
-		args->agbno = NULLAGBLOCK;
-		return 0;
-	}
-	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
-	args->wasfromfl = 0;
-
 	/*
 	 * Branch to correct routine based on the type.
 	 */
@@ -3201,11 +3178,18 @@ xfs_alloc_vextent_check_args(
 		args->maxlen = agsize;
 	if (args->alignment == 0)
 		args->alignment = 1;
+
+	ASSERT(args->minlen > 0);
+	ASSERT(args->maxlen > 0);
+	ASSERT(args->alignment > 0);
+	ASSERT(args->resv != XFS_AG_RESV_AGFL);
+
 	ASSERT(XFS_FSB_TO_AGNO(mp, target) < mp->m_sb.sb_agcount);
 	ASSERT(XFS_FSB_TO_AGBNO(mp, target) < agsize);
 	ASSERT(args->minlen <= args->maxlen);
 	ASSERT(args->minlen <= agsize);
 	ASSERT(args->mod < args->prod);
+
 	if (XFS_FSB_TO_AGNO(mp, target) >= mp->m_sb.sb_agcount ||
 	    XFS_FSB_TO_AGBNO(mp, target) >= agsize ||
 	    args->minlen > args->maxlen || args->minlen > agsize ||
@@ -3217,6 +3201,41 @@ xfs_alloc_vextent_check_args(
 	return 0;
 }
 
+/*
+ * Prepare an AG for allocation. If the AG is not prepared to accept the
+ * allocation, return failure.
+ *
+ * XXX(dgc): The complexity of "need_pag" will go away as all caller paths are
+ * modified to hold their own perag references.
+ */
+static int
+xfs_alloc_vextent_prepare_ag(
+	struct xfs_alloc_arg	*args)
+{
+	bool			need_pag = !args->pag;
+	int			error;
+
+	if (need_pag)
+		args->pag = xfs_perag_get(args->mp, args->agno);
+
+	error = xfs_alloc_fix_freelist(args, 0);
+	if (error) {
+		trace_xfs_alloc_vextent_nofix(args);
+		if (need_pag)
+			xfs_perag_put(args->pag);
+		args->agbno = NULLAGBLOCK;
+		return error;
+	}
+	if (!args->agbp) {
+		/* cannot allocate in this AG at all */
+		trace_xfs_alloc_vextent_noagbp(args);
+		args->agbno = NULLAGBLOCK;
+		return 0;
+	}
+	args->wasfromfl = 0;
+	return 0;
+}
+
 /*
  * Post-process allocation results to set the allocated block number correctly
  * for the caller.
@@ -3268,7 +3287,8 @@ xfs_alloc_vextent_set_fsbno(
 }
 
 /*
- * Allocate within a single AG only.
+ * Allocate within a single AG only. Caller is expected to hold a
+ * perag reference in args->pag.
  */
 int
 xfs_alloc_vextent_this_ag(
@@ -3301,7 +3321,10 @@ xfs_alloc_vextent_this_ag(
 	args->fsbno = target;
 	args->type = XFS_ALLOCTYPE_THIS_AG;
 
-	error = xfs_alloc_ag_vextent(args);
+	error = xfs_alloc_vextent_prepare_ag(args);
+	if (!error && args->agbp)
+		error = xfs_alloc_ag_vextent(args);
+
 	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
 	return error;
 }
@@ -3339,13 +3362,19 @@ xfs_alloc_vextent_iterate_ags(
 	args->agno = start_agno;
 	for (;;) {
 		args->pag = xfs_perag_get(mp, args->agno);
-		error = xfs_alloc_ag_vextent(args);
-		if (error) {
-			args->agbno = NULLAGBLOCK;
+		args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
+		error = xfs_alloc_vextent_prepare_ag(args);
+		if (error)
 			break;
-		}
-		if (args->agbp)
+
+		if (args->agbp) {
+			/*
+			 * Allocation is supposed to succeed now, so break out
+			 * of the loop regardless of whether we succeed or not.
+			 */
+			error = xfs_alloc_ag_vextent(args);
 			break;
+		}
 
 		trace_xfs_alloc_vextent_loopfailed(args);
 
@@ -3378,10 +3407,8 @@ xfs_alloc_vextent_iterate_ags(
 			}
 
 			flags = 0;
-			if (args->otype == XFS_ALLOCTYPE_NEAR_BNO) {
-				args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
+			if (args->otype == XFS_ALLOCTYPE_NEAR_BNO)
 				args->type = XFS_ALLOCTYPE_NEAR_BNO;
-			}
 		}
 		xfs_perag_put(args->pag);
 		args->pag = NULL;
@@ -3485,7 +3512,8 @@ xfs_alloc_vextent_first_ag(
 }
 
 /*
- * Allocate within a single AG only.
+ * Allocate at the exact block target or fail. Caller is expected to hold a
+ * perag reference in args->pag.
  */
 int
 xfs_alloc_vextent_exact_bno(
@@ -3515,9 +3543,10 @@ xfs_alloc_vextent_exact_bno(
 	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
 	args->fsbno = target;
 	args->type = XFS_ALLOCTYPE_THIS_BNO;
-	error = xfs_alloc_ag_vextent(args);
-	if (error)
-		return error;
+
+	error = xfs_alloc_vextent_prepare_ag(args);
+	if (!error && args->agbp)
+		error = xfs_alloc_ag_vextent(args);
 
 	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
 	return 0;
@@ -3526,6 +3555,8 @@ xfs_alloc_vextent_exact_bno(
 /*
  * Allocate an extent as close to the target as possible. If there are not
  * viable candidates in the AG, then fail the allocation.
+ *
+ * Caller may or may not have a per-ag reference in args->pag.
  */
 int
 xfs_alloc_vextent_near_bno(
@@ -3550,21 +3581,22 @@ xfs_alloc_vextent_near_bno(
 	args->agno = XFS_FSB_TO_AGNO(mp, target);
 	if (minimum_agno > args->agno) {
 		trace_xfs_alloc_vextent_skip_deadlock(args);
+		args->fsbno = NULLFSBLOCK;
 		return 0;
 	}
 
 	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
 	args->type = XFS_ALLOCTYPE_NEAR_BNO;
-	if (need_pag)
-		args->pag = xfs_perag_get(args->mp, args->agno);
-	error = xfs_alloc_ag_vextent(args);
+
+	error = xfs_alloc_vextent_prepare_ag(args);
+	if (!error && args->agbp)
+		error = xfs_alloc_ag_vextent(args);
+
+	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
 	if (need_pag)
 		xfs_perag_put(args->pag);
-	if (error)
-		return error;
 
-	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
-	return 0;
+	return error;
 }
 
 /* Ensure that the freelist is at full capacity. */
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 25/42] xfs: move allocation accounting to xfs_alloc_vextent_set_fsbno()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (23 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 24/42] xfs: introduce xfs_alloc_vextent_prepare() Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 26/42] xfs: fold xfs_alloc_ag_vextent() into callers Dave Chinner
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Move it from xfs_alloc_ag_vextent() so we can get rid of that layer.
Rename xfs_alloc_vextent_set_fsbno() to xfs_alloc_vextent_finish()
to indicate that its function is to finish off the allocation we've
run, now that it contains much more functionality.
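The error-preservation rule this patch moves into the finish path can be sketched in isolation. This is a hypothetical, stripped-down model for illustration only: `struct alloc_result`, `alloc_finish()` and the linear fsbno encoding are inventions, not XFS code.

```c
#include <assert.h>

#define NULLFSBLOCK	(-1LL)
#define NULLAGBLOCK	(-1)

/* Hypothetical stand-in for the few xfs_alloc_arg fields the finishing
 * logic looks at. */
struct alloc_result {
	long long	fsbno;	/* out: filesystem block, or NULLFSBLOCK */
	int		agno;	/* AG the allocation ran in */
	int		agbno;	/* AG block, or NULLAGBLOCK on ENOSPC */
};

/*
 * Mirror of the rule in the patch: if the allocator returned an error OR
 * signalled ENOSPC via NULLAGBLOCK, mark the result as "no extent
 * allocated" while preserving the original error, so callers that only
 * look at fsbno still see a failed allocation.
 */
static int alloc_finish(struct alloc_result *res, int alloc_error,
			int blocks_per_ag)
{
	if (alloc_error || res->agbno == NULLAGBLOCK) {
		res->fsbno = NULLFSBLOCK;
		return alloc_error;
	}
	res->fsbno = (long long)res->agno * blocks_per_ag + res->agbno;
	return 0;
}
```

The real code additionally performs the rmap, counter and reservation accounting, and optionally drops the perag reference, at this point.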

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 122 ++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 59 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index bfbbb7536310..ad2b91b230f6 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1163,36 +1163,6 @@ xfs_alloc_ag_vextent(
 		ASSERT(0);
 		/* NOTREACHED */
 	}
-
-	if (error || args->agbno == NULLAGBLOCK)
-		return error;
-
-	ASSERT(args->len >= args->minlen);
-	ASSERT(args->len <= args->maxlen);
-	ASSERT(args->agbno % args->alignment == 0);
-
-	/* if not file data, insert new block into the reverse map btree */
-	if (!xfs_rmap_should_skip_owner_update(&args->oinfo)) {
-		error = xfs_rmap_alloc(args->tp, args->agbp, args->pag,
-				       args->agbno, args->len, &args->oinfo);
-		if (error)
-			return error;
-	}
-
-	if (!args->wasfromfl) {
-		error = xfs_alloc_update_counters(args->tp, args->agbp,
-						  -((long)(args->len)));
-		if (error)
-			return error;
-
-		ASSERT(!xfs_extent_busy_search(args->mp, args->pag,
-					      args->agbno, args->len));
-	}
-
-	xfs_ag_resv_alloc_extent(args->pag, args->resv, args);
-
-	XFS_STATS_INC(args->mp, xs_allocx);
-	XFS_STATS_ADD(args->mp, xs_allocb, args->len);
 	return error;
 }
 
@@ -3237,18 +3207,21 @@ xfs_alloc_vextent_prepare_ag(
 }
 
 /*
- * Post-process allocation results to set the allocated block number correctly
- * for the caller.
+ * Post-process allocation results to account for the allocation if it
+ * succeeded and set the allocated block number correctly for the caller.
  *
- * XXX: xfs_alloc_vextent() should really be returning ENOSPC for ENOSPC, not
+ * XXX: we should really be returning ENOSPC for ENOSPC, not
  * hiding it behind a "successful" NULLFSBLOCK allocation.
  */
-static void
-xfs_alloc_vextent_set_fsbno(
+static int
+xfs_alloc_vextent_finish(
 	struct xfs_alloc_arg	*args,
-	xfs_agnumber_t		minimum_agno)
+	xfs_agnumber_t		minimum_agno,
+	int			alloc_error,
+	bool			drop_perag)
 {
 	struct xfs_mount	*mp = args->mp;
+	int			error = 0;
 
 	/*
 	 * We can end up here with a locked AGF. If we failed, the caller is
@@ -3271,19 +3244,54 @@ xfs_alloc_vextent_set_fsbno(
 	     args->agno > minimum_agno))
 		args->tp->t_highest_agno = args->agno;
 
-	/* Allocation failed with ENOSPC if NULLAGBLOCK was returned. */
-	if (args->agbno == NULLAGBLOCK) {
+	/*
+	 * If the allocation failed with an error or we had an ENOSPC result,
+	 * preserve the returned error whilst also marking the allocation result
+	 * as "no extent allocated". This ensures that callers that fail to
+	 * capture the error will still treat it as a failed allocation.
+	 */
+	if (alloc_error || args->agbno == NULLAGBLOCK) {
 		args->fsbno = NULLFSBLOCK;
-		return;
+		error = alloc_error;
+		goto out_drop_perag;
 	}
 
 	args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno);
-#ifdef DEBUG
+
 	ASSERT(args->len >= args->minlen);
 	ASSERT(args->len <= args->maxlen);
 	ASSERT(args->agbno % args->alignment == 0);
 	XFS_AG_CHECK_DADDR(mp, XFS_FSB_TO_DADDR(mp, args->fsbno), args->len);
-#endif
+
+	/* if not file data, insert new block into the reverse map btree */
+	if (!xfs_rmap_should_skip_owner_update(&args->oinfo)) {
+		error = xfs_rmap_alloc(args->tp, args->agbp, args->pag,
+				       args->agbno, args->len, &args->oinfo);
+		if (error)
+			goto out_drop_perag;
+	}
+
+	if (!args->wasfromfl) {
+		error = xfs_alloc_update_counters(args->tp, args->agbp,
+						  -((long)(args->len)));
+		if (error)
+			goto out_drop_perag;
+
+		ASSERT(!xfs_extent_busy_search(mp, args->pag, args->agbno,
+				args->len));
+	}
+
+	xfs_ag_resv_alloc_extent(args->pag, args->resv, args);
+
+	XFS_STATS_INC(mp, xs_allocx);
+	XFS_STATS_ADD(mp, xs_allocb, args->len);
+
+out_drop_perag:
+	if (drop_perag) {
+		xfs_perag_put(args->pag);
+		args->pag = NULL;
+	}
+	return error;
 }
 
 /*
@@ -3325,8 +3333,7 @@ xfs_alloc_vextent_this_ag(
 	if (!error && args->agbp)
 		error = xfs_alloc_ag_vextent(args);
 
-	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
-	return error;
+	return xfs_alloc_vextent_finish(args, minimum_agno, error, false);
 }
 
 /*
@@ -3413,10 +3420,10 @@ xfs_alloc_vextent_iterate_ags(
 		xfs_perag_put(args->pag);
 		args->pag = NULL;
 	}
-	if (args->pag) {
-		xfs_perag_put(args->pag);
-		args->pag = NULL;
-	}
+	/*
+	 * The perag is left referenced in args for the caller to clean
+	 * up after they've finished the allocation.
+	 */
 	return error;
 }
 
@@ -3473,8 +3480,7 @@ xfs_alloc_vextent_start_ag(
 				(mp->m_sb.sb_agcount * rotorstep);
 	}
 
-	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
-	return error;
+	return xfs_alloc_vextent_finish(args, minimum_agno, error, true);
 }
 
 /*
@@ -3507,8 +3513,7 @@ xfs_alloc_vextent_first_ag(
 	args->fsbno = target;
 	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno,
 			start_agno, 0);
-	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
-	return error;
+	return xfs_alloc_vextent_finish(args, minimum_agno, error, true);
 }
 
 /*
@@ -3537,6 +3542,7 @@ xfs_alloc_vextent_exact_bno(
 	args->agno = XFS_FSB_TO_AGNO(mp, target);
 	if (minimum_agno > args->agno) {
 		trace_xfs_alloc_vextent_skip_deadlock(args);
+		args->fsbno = NULLFSBLOCK;
 		return 0;
 	}
 
@@ -3548,8 +3554,7 @@ xfs_alloc_vextent_exact_bno(
 	if (!error && args->agbp)
 		error = xfs_alloc_ag_vextent(args);
 
-	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
-	return 0;
+	return xfs_alloc_vextent_finish(args, minimum_agno, error, false);
 }
 
 /*
@@ -3564,8 +3569,8 @@ xfs_alloc_vextent_near_bno(
 	xfs_rfsblock_t		target)
 {
 	struct xfs_mount	*mp = args->mp;
-	bool			need_pag = !args->pag;
 	xfs_agnumber_t		minimum_agno = 0;
+	bool			needs_perag = args->pag == NULL;
 	int			error;
 
 	if (args->tp->t_highest_agno != NULLAGNUMBER)
@@ -3585,6 +3590,9 @@ xfs_alloc_vextent_near_bno(
 		return 0;
 	}
 
+	if (needs_perag)
+		args->pag = xfs_perag_get(mp, args->agno);
+
 	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
 	args->type = XFS_ALLOCTYPE_NEAR_BNO;
 
@@ -3592,11 +3600,7 @@ xfs_alloc_vextent_near_bno(
 	if (!error && args->agbp)
 		error = xfs_alloc_ag_vextent(args);
 
-	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
-	if (need_pag)
-		xfs_perag_put(args->pag);
-
-	return error;
+	return xfs_alloc_vextent_finish(args, minimum_agno, error, needs_perag);
 }
 
 /* Ensure that the freelist is at full capacity. */
-- 
2.39.0



* [PATCH 26/42] xfs: fold xfs_alloc_ag_vextent() into callers
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (24 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 25/42] xfs: move allocation accounting to xfs_alloc_vextent_set_fsbno() Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 27/42] xfs: move the minimum agno checks into xfs_alloc_vextent_check_args Dave Chinner
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We don't need the multiplexing that xfs_alloc_ag_vextent() provided
anymore - we can just call the exact/near/size variants directly.
This allows us to remove args->type completely and stop using
args->fsbno as an input to the allocator algorithms.
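The shape of this refactor - replacing a switch-based multiplexer with direct calls - can be illustrated with a toy model. All names below are hypothetical stand-ins, not the XFS functions.

```c
#include <assert.h>

/* Toy strategies standing in for the size/near/exact variants. */
static int alloc_size(int target)  { (void)target; return 0; } /* best-fit ignores target */
static int alloc_near(int target)  { return target; }
static int alloc_exact(int target) { return target; }

/* Before: every caller funnels through a type switch. */
enum alloc_type { ALLOC_SIZE, ALLOC_NEAR, ALLOC_EXACT };

static int alloc_vextent_mux(enum alloc_type type, int target)
{
	switch (type) {
	case ALLOC_SIZE:	return alloc_size(target);
	case ALLOC_NEAR:	return alloc_near(target);
	case ALLOC_EXACT:	return alloc_exact(target);
	}
	return -1;
}

/* After: the caller already knows its policy, so it calls the variant
 * directly; both the type field and the mux can then be deleted. */
static int caller_after(int target)
{
	return alloc_near(target);
}
```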

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 100 ++++++++++----------------------------
 fs/xfs/libxfs/xfs_alloc.h |  17 -------
 fs/xfs/libxfs/xfs_bmap.c  |  10 +---
 fs/xfs/xfs_trace.h        |   8 +--
 4 files changed, 29 insertions(+), 106 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index ad2b91b230f6..4de9026d872f 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -36,10 +36,6 @@ struct workqueue_struct *xfs_alloc_wq;
 #define	XFSA_FIXUP_BNO_OK	1
 #define	XFSA_FIXUP_CNT_OK	2
 
-STATIC int xfs_alloc_ag_vextent_exact(xfs_alloc_arg_t *);
-STATIC int xfs_alloc_ag_vextent_near(xfs_alloc_arg_t *);
-STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
-
 /*
  * Size of the AGFL.  For CRC-enabled filesystes we steal a couple of slots in
  * the beginning of the block for a proper header with the location information
@@ -772,8 +768,6 @@ xfs_alloc_cur_setup(
 	int			error;
 	int			i;
 
-	ASSERT(args->alignment == 1 || args->type != XFS_ALLOCTYPE_THIS_BNO);
-
 	acur->cur_len = args->maxlen;
 	acur->rec_bno = 0;
 	acur->rec_len = 0;
@@ -887,7 +881,6 @@ xfs_alloc_cur_check(
 	 * We have an aligned record that satisfies minlen and beats or matches
 	 * the candidate extent size. Compare locality for near allocation mode.
 	 */
-	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
 	diff = xfs_alloc_compute_diff(args->agbno, args->len,
 				      args->alignment, args->datatype,
 				      bnoa, lena, &bnew);
@@ -1132,40 +1125,6 @@ xfs_alloc_ag_vextent_small(
 	return error;
 }
 
-/*
- * Allocate a variable extent in the allocation group agno.
- * Type and bno are used to determine where in the allocation group the
- * extent will start.
- * Extent's length (returned in *len) will be between minlen and maxlen,
- * and of the form k * prod + mod unless there's nothing that large.
- * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
- */
-static int
-xfs_alloc_ag_vextent(
-	struct xfs_alloc_arg	*args)
-{
-	int			error = 0;
-
-	/*
-	 * Branch to correct routine based on the type.
-	 */
-	switch (args->type) {
-	case XFS_ALLOCTYPE_THIS_AG:
-		error = xfs_alloc_ag_vextent_size(args);
-		break;
-	case XFS_ALLOCTYPE_NEAR_BNO:
-		error = xfs_alloc_ag_vextent_near(args);
-		break;
-	case XFS_ALLOCTYPE_THIS_BNO:
-		error = xfs_alloc_ag_vextent_exact(args);
-		break;
-	default:
-		ASSERT(0);
-		/* NOTREACHED */
-	}
-	return error;
-}
-
 /*
  * Allocate a variable extent at exactly agno/bno.
  * Extent's length (returned in *len) will be between minlen and maxlen,
@@ -1351,7 +1310,6 @@ xfs_alloc_ag_vextent_locality(
 	bool			fbinc;
 
 	ASSERT(acur->len == 0);
-	ASSERT(args->type == XFS_ALLOCTYPE_NEAR_BNO);
 
 	*stat = 0;
 
@@ -3137,6 +3095,7 @@ xfs_alloc_vextent_check_args(
 	xfs_agblock_t		agsize;
 
 	args->agbno = NULLAGBLOCK;
+	args->fsbno = NULLFSBLOCK;
 
 	/*
 	 * Just fix this up, for the case where the last a.g. is shorter
@@ -3295,8 +3254,11 @@ xfs_alloc_vextent_finish(
 }
 
 /*
- * Allocate within a single AG only. Caller is expected to hold a
- * perag reference in args->pag.
+ * Allocate within a single AG only. This uses a best-fit length algorithm so if
+ * you need an exact sized allocation without locality constraints, this is the
+ * fastest way to do it.
+ *
+ * Caller is expected to hold a perag reference in args->pag.
  */
 int
 xfs_alloc_vextent_this_ag(
@@ -3305,7 +3267,6 @@ xfs_alloc_vextent_this_ag(
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_agnumber_t		minimum_agno = 0;
-	xfs_rfsblock_t		target = XFS_AGB_TO_FSB(mp, agno, 0);
 	int			error;
 
 	if (args->tp->t_highest_agno != NULLAGNUMBER)
@@ -3317,7 +3278,7 @@ xfs_alloc_vextent_this_ag(
 		return 0;
 	}
 
-	error = xfs_alloc_vextent_check_args(args, target);
+	error = xfs_alloc_vextent_check_args(args, XFS_AGB_TO_FSB(mp, agno, 0));
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
@@ -3326,12 +3287,10 @@ xfs_alloc_vextent_this_ag(
 
 	args->agno = agno;
 	args->agbno = 0;
-	args->fsbno = target;
-	args->type = XFS_ALLOCTYPE_THIS_AG;
 
 	error = xfs_alloc_vextent_prepare_ag(args);
 	if (!error && args->agbp)
-		error = xfs_alloc_ag_vextent(args);
+		error = xfs_alloc_ag_vextent_size(args);
 
 	return xfs_alloc_vextent_finish(args, minimum_agno, error, false);
 }
@@ -3355,6 +3314,7 @@ xfs_alloc_vextent_iterate_ags(
 	struct xfs_alloc_arg	*args,
 	xfs_agnumber_t		minimum_agno,
 	xfs_agnumber_t		start_agno,
+	xfs_agblock_t		target_agbno,
 	uint32_t		flags)
 {
 	struct xfs_mount	*mp = args->mp;
@@ -3369,7 +3329,6 @@ xfs_alloc_vextent_iterate_ags(
 	args->agno = start_agno;
 	for (;;) {
 		args->pag = xfs_perag_get(mp, args->agno);
-		args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
 		error = xfs_alloc_vextent_prepare_ag(args);
 		if (error)
 			break;
@@ -3379,16 +3338,18 @@ xfs_alloc_vextent_iterate_ags(
 			 * Allocation is supposed to succeed now, so break out
 			 * of the loop regardless of whether we succeed or not.
 			 */
-			error = xfs_alloc_ag_vextent(args);
+			if (args->agno == start_agno && target_agbno) {
+				args->agbno = target_agbno;
+				error = xfs_alloc_ag_vextent_near(args);
+			} else {
+				args->agbno = 0;
+				error = xfs_alloc_ag_vextent_size(args);
+			}
 			break;
 		}
 
 		trace_xfs_alloc_vextent_loopfailed(args);
 
-		if (args->agno == start_agno &&
-		    args->otype == XFS_ALLOCTYPE_NEAR_BNO)
-			args->type = XFS_ALLOCTYPE_THIS_AG;
-
 		/*
 		 * If we are try-locking, we can't deadlock on AGF locks so we
 		 * can wrap all the way back to the first AG. Otherwise, wrap
@@ -3412,10 +3373,8 @@ xfs_alloc_vextent_iterate_ags(
 				trace_xfs_alloc_vextent_allfailed(args);
 				break;
 			}
-
+			args->agbno = target_agbno;
 			flags = 0;
-			if (args->otype == XFS_ALLOCTYPE_NEAR_BNO)
-				args->type = XFS_ALLOCTYPE_NEAR_BNO;
 		}
 		xfs_perag_put(args->pag);
 		args->pag = NULL;
@@ -3464,13 +3423,11 @@ xfs_alloc_vextent_start_ag(
 				mp->m_sb.sb_agcount), 0);
 		bump_rotor = 1;
 	}
-	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target));
-	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
-	args->type = XFS_ALLOCTYPE_NEAR_BNO;
-	args->fsbno = target;
 
+	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target));
 	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno, start_agno,
-			XFS_ALLOC_FLAG_TRYLOCK);
+			XFS_FSB_TO_AGBNO(mp, target), XFS_ALLOC_FLAG_TRYLOCK);
+
 	if (bump_rotor) {
 		if (args->agno == start_agno)
 			mp->m_agfrotor = (mp->m_agfrotor + 1) %
@@ -3484,9 +3441,9 @@ xfs_alloc_vextent_start_ag(
 }
 
 /*
- * Iterate from the agno indicated from args->fsbno through to the end of the
+ * Iterate from the agno indicated via @target through to the end of the
  * filesystem attempting blocking allocation. This does not wrap or try a second
- * pass, so will not recurse into AGs lower than indicated by fsbno.
+ * pass, so will not recurse into AGs lower than indicated by the target.
  */
 int
 xfs_alloc_vextent_first_ag(
@@ -3509,10 +3466,8 @@ xfs_alloc_vextent_first_ag(
 	}
 
 	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target));
-	args->type = XFS_ALLOCTYPE_THIS_AG;
-	args->fsbno = target;
-	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno,
-			start_agno, 0);
+	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno, start_agno,
+			XFS_FSB_TO_AGBNO(mp, target), 0);
 	return xfs_alloc_vextent_finish(args, minimum_agno, error, true);
 }
 
@@ -3547,12 +3502,10 @@ xfs_alloc_vextent_exact_bno(
 	}
 
 	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
-	args->fsbno = target;
-	args->type = XFS_ALLOCTYPE_THIS_BNO;
 
 	error = xfs_alloc_vextent_prepare_ag(args);
 	if (!error && args->agbp)
-		error = xfs_alloc_ag_vextent(args);
+		error = xfs_alloc_ag_vextent_exact(args);
 
 	return xfs_alloc_vextent_finish(args, minimum_agno, error, false);
 }
@@ -3594,11 +3547,10 @@ xfs_alloc_vextent_near_bno(
 		args->pag = xfs_perag_get(mp, args->agno);
 
 	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
-	args->type = XFS_ALLOCTYPE_NEAR_BNO;
 
 	error = xfs_alloc_vextent_prepare_ag(args);
 	if (!error && args->agbp)
-		error = xfs_alloc_ag_vextent(args);
+		error = xfs_alloc_ag_vextent_near(args);
 
 	return xfs_alloc_vextent_finish(args, minimum_agno, error, needs_perag);
 }
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 106b4deb1110..689419409e09 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -16,21 +16,6 @@ extern struct workqueue_struct *xfs_alloc_wq;
 
 unsigned int xfs_agfl_size(struct xfs_mount *mp);
 
-/*
- * Freespace allocation types.  Argument to xfs_alloc_[v]extent.
- */
-#define XFS_ALLOCTYPE_THIS_AG	0x08	/* anywhere in this a.g. */
-#define XFS_ALLOCTYPE_NEAR_BNO	0x20	/* in this a.g. and near this block */
-#define XFS_ALLOCTYPE_THIS_BNO	0x40	/* at exactly this block */
-
-/* this should become an enum again when the tracing code is fixed */
-typedef unsigned int xfs_alloctype_t;
-
-#define XFS_ALLOC_TYPES \
-	{ XFS_ALLOCTYPE_THIS_AG,	"THIS_AG" }, \
-	{ XFS_ALLOCTYPE_NEAR_BNO,	"NEAR_BNO" }, \
-	{ XFS_ALLOCTYPE_THIS_BNO,	"THIS_BNO" }
-
 /*
  * Flags for xfs_alloc_fix_freelist.
  */
@@ -64,8 +49,6 @@ typedef struct xfs_alloc_arg {
 	xfs_agblock_t	min_agbno;	/* set an agbno range for NEAR allocs */
 	xfs_agblock_t	max_agbno;	/* ... */
 	xfs_extlen_t	len;		/* output: actual size of extent */
-	xfs_alloctype_t	type;		/* allocation type XFS_ALLOCTYPE_... */
-	xfs_alloctype_t	otype;		/* original allocation type */
 	int		datatype;	/* mask defining data type treatment */
 	char		wasdel;		/* set if allocation was prev delayed */
 	char		wasfromfl;	/* set if allocation is from freelist */
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index c9902df16e25..ba74aea034b0 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3501,7 +3501,6 @@ xfs_bmap_btalloc_at_eof(
 	bool			ag_only)
 {
 	struct xfs_mount	*mp = args->mp;
-	xfs_alloctype_t		atype;
 	int			error;
 
 	/*
@@ -3513,14 +3512,12 @@ xfs_bmap_btalloc_at_eof(
 	if (ap->offset) {
 		xfs_extlen_t	nextminlen = 0;
 
-		atype = args->type;
-		args->alignment = 1;
-
 		/*
 		 * Compute the minlen+alignment for the next case.  Set slop so
 		 * that the value of minlen+alignment+slop doesn't go up between
 		 * the calls.
 		 */
+		args->alignment = 1;
 		if (blen > stripe_align && blen <= args->maxlen)
 			nextminlen = blen - stripe_align;
 		else
@@ -3544,17 +3541,15 @@ xfs_bmap_btalloc_at_eof(
 		 * according to the original allocation specification.
 		 */
 		args->pag = NULL;
-		args->type = atype;
 		args->alignment = stripe_align;
 		args->minlen = nextminlen;
 		args->minalignslop = 0;
 	} else {
-		args->alignment = stripe_align;
-		atype = args->type;
 		/*
 		 * Adjust minlen to try and preserve alignment if we
 		 * can't guarantee an aligned maxlen extent.
 		 */
+		args->alignment = stripe_align;
 		if (blen > args->alignment &&
 		    blen <= args->maxlen + args->alignment)
 			args->minlen = blen - args->alignment;
@@ -3576,7 +3571,6 @@ xfs_bmap_btalloc_at_eof(
 	 * original non-aligned state so the caller can proceed on allocation
 	 * failure as if this function was never called.
 	 */
-	args->type = atype;
 	args->fsbno = ap->blkno;
 	args->alignment = 1;
 	return 0;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index c921e9a5256d..3b25b10fccc1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1799,8 +1799,6 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__field(xfs_extlen_t, alignment)
 		__field(xfs_extlen_t, minalignslop)
 		__field(xfs_extlen_t, len)
-		__field(short, type)
-		__field(short, otype)
 		__field(char, wasdel)
 		__field(char, wasfromfl)
 		__field(int, resv)
@@ -1820,8 +1818,6 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		__entry->alignment = args->alignment;
 		__entry->minalignslop = args->minalignslop;
 		__entry->len = args->len;
-		__entry->type = args->type;
-		__entry->otype = args->otype;
 		__entry->wasdel = args->wasdel;
 		__entry->wasfromfl = args->wasfromfl;
 		__entry->resv = args->resv;
@@ -1830,7 +1826,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 	),
 	TP_printk("dev %d:%d agno 0x%x agbno 0x%x minlen %u maxlen %u mod %u "
 		  "prod %u minleft %u total %u alignment %u minalignslop %u "
-		  "len %u type %s otype %s wasdel %d wasfromfl %d resv %d "
+		  "len %u wasdel %d wasfromfl %d resv %d "
 		  "datatype 0x%x highest_agno 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->agno,
@@ -1844,8 +1840,6 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
 		  __entry->alignment,
 		  __entry->minalignslop,
 		  __entry->len,
-		  __print_symbolic(__entry->type, XFS_ALLOC_TYPES),
-		  __print_symbolic(__entry->otype, XFS_ALLOC_TYPES),
 		  __entry->wasdel,
 		  __entry->wasfromfl,
 		  __entry->resv,
-- 
2.39.0



* [PATCH 27/42] xfs: move the minimum agno checks into xfs_alloc_vextent_check_args
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (25 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 26/42] xfs: fold xfs_alloc_ag_vextent() into callers Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 28/42] xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walker Dave Chinner
                   ` (15 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

All of the allocation functions now extract the minimum allowed AG
from the transaction and then use it in some way. The allocation
functions that are restricted to a single AG all check whether the
requested AG can be allocated from, and return an error if it
cannot. These all set args->agno appropriately.

All the allocation functions that iterate AGs use it to calculate
the scan start AG. args->agno is not set until the iterator starts
walking AGs.

Hence we can easily set up a conditional check against the minimum
allowed AG in xfs_alloc_vextent_check_args() based on whether
args->agno contains NULLAGNUMBER or not, and move all the repeated
setup code into xfs_alloc_vextent_check_args() as well, further
simplifying the allocation functions.
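The NULLAGNUMBER-sentinel logic this relies on can be condensed as follows. This is a hypothetical sketch: check_args() is a stand-in and only the sentinel/minimum-AG logic mirrors xfs_alloc_vextent_check_args().

```c
#include <assert.h>
#include <errno.h>

#define NULLAGNUMBER	(-1)

/*
 * Single-AG callers set agno before calling; iterating callers leave it
 * at NULLAGNUMBER because the iterator picks the AG later. Hence the
 * minimum-AG deadlock check can run here conditionally instead of being
 * open-coded in every single-AG allocation function.
 */
static int check_args(int agno, int highest_agno, int *minimum_agno)
{
	*minimum_agno = 0;
	if (highest_agno != NULLAGNUMBER)
		*minimum_agno = highest_agno;

	if (agno != NULLAGNUMBER && *minimum_agno > agno)
		return -ENOSPC;	/* skip to avoid AGF lock-order deadlock */
	return 0;
}
```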

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 88 +++++++++++++++------------------------
 1 file changed, 33 insertions(+), 55 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 4de9026d872f..43a054002da3 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3089,14 +3089,18 @@ xfs_alloc_read_agf(
 static int
 xfs_alloc_vextent_check_args(
 	struct xfs_alloc_arg	*args,
-	xfs_rfsblock_t		target)
+	xfs_rfsblock_t		target,
+	xfs_agnumber_t		*minimum_agno)
 {
 	struct xfs_mount	*mp = args->mp;
 	xfs_agblock_t		agsize;
 
-	args->agbno = NULLAGBLOCK;
 	args->fsbno = NULLFSBLOCK;
 
+	*minimum_agno = 0;
+	if (args->tp->t_highest_agno != NULLAGNUMBER)
+		*minimum_agno = args->tp->t_highest_agno;
+
 	/*
 	 * Just fix this up, for the case where the last a.g. is shorter
 	 * (or there's only one a.g.) and the caller couldn't easily figure
@@ -3123,11 +3127,16 @@ xfs_alloc_vextent_check_args(
 	    XFS_FSB_TO_AGBNO(mp, target) >= agsize ||
 	    args->minlen > args->maxlen || args->minlen > agsize ||
 	    args->mod >= args->prod) {
-		args->fsbno = NULLFSBLOCK;
 		trace_xfs_alloc_vextent_badargs(args);
 		return -ENOSPC;
 	}
+
+	if (args->agno != NULLAGNUMBER && *minimum_agno > args->agno) {
+		trace_xfs_alloc_vextent_skip_deadlock(args);
+		return -ENOSPC;
+	}
 	return 0;
+
 }
 
 /*
@@ -3266,28 +3275,19 @@ xfs_alloc_vextent_this_ag(
 	xfs_agnumber_t		agno)
 {
 	struct xfs_mount	*mp = args->mp;
-	xfs_agnumber_t		minimum_agno = 0;
+	xfs_agnumber_t		minimum_agno;
 	int			error;
 
-	if (args->tp->t_highest_agno != NULLAGNUMBER)
-		minimum_agno = args->tp->t_highest_agno;
-
-	if (minimum_agno > agno) {
-		trace_xfs_alloc_vextent_skip_deadlock(args);
-		args->fsbno = NULLFSBLOCK;
-		return 0;
-	}
-
-	error = xfs_alloc_vextent_check_args(args, XFS_AGB_TO_FSB(mp, agno, 0));
+	args->agno = agno;
+	args->agbno = 0;
+	error = xfs_alloc_vextent_check_args(args, XFS_AGB_TO_FSB(mp, agno, 0),
+			&minimum_agno);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
 		return error;
 	}
 
-	args->agno = agno;
-	args->agbno = 0;
-
 	error = xfs_alloc_vextent_prepare_ag(args);
 	if (!error && args->agbp)
 		error = xfs_alloc_ag_vextent_size(args);
@@ -3400,16 +3400,15 @@ xfs_alloc_vextent_start_ag(
 	xfs_rfsblock_t		target)
 {
 	struct xfs_mount	*mp = args->mp;
-	xfs_agnumber_t		minimum_agno = 0;
+	xfs_agnumber_t		minimum_agno;
 	xfs_agnumber_t		start_agno;
 	xfs_agnumber_t		rotorstep = xfs_rotorstep;
 	bool			bump_rotor = false;
 	int			error;
 
-	if (args->tp->t_highest_agno != NULLAGNUMBER)
-		minimum_agno = args->tp->t_highest_agno;
-
-	error = xfs_alloc_vextent_check_args(args, target);
+	args->agno = NULLAGNUMBER;
+	args->agbno = NULLAGBLOCK;
+	error = xfs_alloc_vextent_check_args(args, target, &minimum_agno);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
@@ -3451,14 +3450,13 @@ xfs_alloc_vextent_first_ag(
 	xfs_rfsblock_t		target)
  {
 	struct xfs_mount	*mp = args->mp;
-	xfs_agnumber_t		minimum_agno = 0;
+	xfs_agnumber_t		minimum_agno;
 	xfs_agnumber_t		start_agno;
 	int			error;
 
-	if (args->tp->t_highest_agno != NULLAGNUMBER)
-		minimum_agno = args->tp->t_highest_agno;
-
-	error = xfs_alloc_vextent_check_args(args, target);
+	args->agno = NULLAGNUMBER;
+	args->agbno = NULLAGBLOCK;
+	error = xfs_alloc_vextent_check_args(args, target, &minimum_agno);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
@@ -3481,28 +3479,18 @@ xfs_alloc_vextent_exact_bno(
 	xfs_rfsblock_t		target)
 {
 	struct xfs_mount	*mp = args->mp;
-	xfs_agnumber_t		minimum_agno = 0;
+	xfs_agnumber_t		minimum_agno;
 	int			error;
 
-	if (args->tp->t_highest_agno != NULLAGNUMBER)
-		minimum_agno = args->tp->t_highest_agno;
-
-	error = xfs_alloc_vextent_check_args(args, target);
+	args->agno = XFS_FSB_TO_AGNO(mp, target);
+	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
+	error = xfs_alloc_vextent_check_args(args, target, &minimum_agno);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
 		return error;
 	}
 
-	args->agno = XFS_FSB_TO_AGNO(mp, target);
-	if (minimum_agno > args->agno) {
-		trace_xfs_alloc_vextent_skip_deadlock(args);
-		args->fsbno = NULLFSBLOCK;
-		return 0;
-	}
-
-	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
-
 	error = xfs_alloc_vextent_prepare_ag(args);
 	if (!error && args->agbp)
 		error = xfs_alloc_ag_vextent_exact(args);
@@ -3522,32 +3510,22 @@ xfs_alloc_vextent_near_bno(
 	xfs_rfsblock_t		target)
 {
 	struct xfs_mount	*mp = args->mp;
-	xfs_agnumber_t		minimum_agno = 0;
+	xfs_agnumber_t		minimum_agno;
 	bool			needs_perag = args->pag == NULL;
 	int			error;
 
-	if (args->tp->t_highest_agno != NULLAGNUMBER)
-		minimum_agno = args->tp->t_highest_agno;
-
-	error = xfs_alloc_vextent_check_args(args, target);
+	args->agno = XFS_FSB_TO_AGNO(mp, target);
+	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
+	error = xfs_alloc_vextent_check_args(args, target, &minimum_agno);
 	if (error) {
 		if (error == -ENOSPC)
 			return 0;
 		return error;
 	}
 
-	args->agno = XFS_FSB_TO_AGNO(mp, target);
-	if (minimum_agno > args->agno) {
-		trace_xfs_alloc_vextent_skip_deadlock(args);
-		args->fsbno = NULLFSBLOCK;
-		return 0;
-	}
-
 	if (needs_perag)
 		args->pag = xfs_perag_get(mp, args->agno);
 
-	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
-
 	error = xfs_alloc_vextent_prepare_ag(args);
 	if (!error && args->agbp)
 		error = xfs_alloc_ag_vextent_near(args);
-- 
2.39.0



* [PATCH 28/42] xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walker
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (26 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 27/42] xfs: move the minimum agno checks into xfs_alloc_vextent_check_args Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-02-01 23:13   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 29/42] xfs: convert trim to use for_each_perag_range Dave Chinner
                   ` (14 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Now that the AG iteration code in the core allocation code has been
cleaned up, we can easily convert it to use a for_each_perag..()
variant that takes active references and skips AGs on which it
can't get an active reference.
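The visit order produced by the wrapped walk can be modelled with plain integers. This is a hypothetical simplification: the real xfs_perag_next_wrap() also drops and grabs perag references, and skips AGs it cannot grab an active reference on.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Advance *agno one step through the wrapped walk:
 * start_agno .. wrap_agno - 1, then restart_agno .. start_agno - 1.
 * Returns false once the walk is complete.
 */
static bool next_wrap(int *agno, int stop_agno, int restart_agno,
		      int wrap_agno)
{
	(*agno)++;
	if (*agno >= wrap_agno) {
		if (restart_agno >= stop_agno)
			return false;	/* nothing left below the wrap point */
		*agno = restart_agno;
	}
	return *agno != stop_agno;
}

/* Record the visit order for a walk starting at @start. */
static int walk(int start, int restart, int wrap, int *out)
{
	int n = 0;
	int agno = start;

	do {
		out[n++] = agno;
	} while (next_wrap(&agno, start, restart, wrap));
	return n;
}
```

For a four-AG filesystem, starting at AG 2 with a restart of 0 visits AGs 2, 3, 0, 1; raising the restart to 1 drops AG 0 from the walk.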

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_ag.h    | 22 ++++++---
 fs/xfs/libxfs/xfs_alloc.c | 98 ++++++++++++++++++---------------------
 2 files changed, 60 insertions(+), 60 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index 8f43b91d4cf3..5e18536dfdce 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -253,6 +253,7 @@ xfs_perag_next_wrap(
 	struct xfs_perag	*pag,
 	xfs_agnumber_t		*agno,
 	xfs_agnumber_t		stop_agno,
+	xfs_agnumber_t		restart_agno,
 	xfs_agnumber_t		wrap_agno)
 {
 	struct xfs_mount	*mp = pag->pag_mount;
@@ -260,10 +261,11 @@ xfs_perag_next_wrap(
 	*agno = pag->pag_agno + 1;
 	xfs_perag_rele(pag);
 	while (*agno != stop_agno) {
-		if (*agno >= wrap_agno)
-			*agno = 0;
-		if (*agno == stop_agno)
-			break;
+		if (*agno >= wrap_agno) {
+			if (restart_agno >= stop_agno)
+				break;
+			*agno = restart_agno;
+		}
 
 		pag = xfs_perag_grab(mp, *agno);
 		if (pag)
@@ -274,14 +276,20 @@ xfs_perag_next_wrap(
 }
 
 /*
- * Iterate all AGs from start_agno through wrap_agno, then 0 through
+ * Iterate all AGs from start_agno through wrap_agno, then restart_agno through
  * (start_agno - 1).
  */
-#define for_each_perag_wrap_at(mp, start_agno, wrap_agno, agno, pag) \
+#define for_each_perag_wrap_range(mp, start_agno, restart_agno, wrap_agno, agno, pag) \
 	for ((agno) = (start_agno), (pag) = xfs_perag_grab((mp), (agno)); \
 		(pag) != NULL; \
 		(pag) = xfs_perag_next_wrap((pag), &(agno), (start_agno), \
-				(wrap_agno)))
+				(restart_agno), (wrap_agno)))
+/*
+ * Iterate all AGs from start_agno through wrap_agno, then 0 through
+ * (start_agno - 1).
+ */
+#define for_each_perag_wrap_at(mp, start_agno, wrap_agno, agno, pag) \
+	for_each_perag_wrap_range((mp), (start_agno), 0, (wrap_agno), (agno), (pag))
 
 /*
  * Iterate all AGs from start_agno through to the end of the filesystem, then 0
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 43a054002da3..39f3e76efcab 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3156,6 +3156,7 @@ xfs_alloc_vextent_prepare_ag(
 	if (need_pag)
 		args->pag = xfs_perag_get(args->mp, args->agno);
 
+	args->agbp = NULL;
 	error = xfs_alloc_fix_freelist(args, 0);
 	if (error) {
 		trace_xfs_alloc_vextent_nofix(args);
@@ -3255,8 +3256,8 @@ xfs_alloc_vextent_finish(
 	XFS_STATS_ADD(mp, xs_allocb, args->len);
 
 out_drop_perag:
-	if (drop_perag) {
-		xfs_perag_put(args->pag);
+	if (drop_perag && args->pag) {
+		xfs_perag_rele(args->pag);
 		args->pag = NULL;
 	}
 	return error;
@@ -3304,6 +3305,10 @@ xfs_alloc_vextent_this_ag(
  * we attempt to allocation in as there is no locality optimisation possible for
  * those allocations.
  *
+ * On return, args->pag may be left referenced if we finish before the "all
+ * failed" return point. The allocation finish still needs the perag, and
+ * so the caller will release it once they've finished the allocation.
+ *
  * When we wrap the AG iteration at the end of the filesystem, we have to be
  * careful not to wrap into AGs below ones we already have locked in the
  * transaction if we are doing a blocking iteration. This will result in an
@@ -3318,72 +3323,59 @@ xfs_alloc_vextent_iterate_ags(
 	uint32_t		flags)
 {
 	struct xfs_mount	*mp = args->mp;
+	xfs_agnumber_t		agno;
 	int			error = 0;
 
-	ASSERT(start_agno >= minimum_agno);
+restart:
+	for_each_perag_wrap_range(mp, start_agno, minimum_agno,
+			mp->m_sb.sb_agcount, agno, args->pag) {
+		args->agno = agno;
+		trace_printk("sag %u minag %u agno %u pag %u, agbno %u, agcnt %u",
+			start_agno, minimum_agno, agno, args->pag->pag_agno,
+			target_agbno, mp->m_sb.sb_agcount);
 
-	/*
-	 * Loop over allocation groups twice; first time with
-	 * trylock set, second time without.
-	 */
-	args->agno = start_agno;
-	for (;;) {
-		args->pag = xfs_perag_get(mp, args->agno);
 		error = xfs_alloc_vextent_prepare_ag(args);
 		if (error)
 			break;
-
-		if (args->agbp) {
-			/*
-			 * Allocation is supposed to succeed now, so break out
-			 * of the loop regardless of whether we succeed or not.
-			 */
-			if (args->agno == start_agno && target_agbno) {
-				args->agbno = target_agbno;
-				error = xfs_alloc_ag_vextent_near(args);
-			} else {
-				args->agbno = 0;
-				error = xfs_alloc_ag_vextent_size(args);
-			}
-			break;
+		if (!args->agbp) {
+			trace_xfs_alloc_vextent_loopfailed(args);
+			continue;
 		}
 
-		trace_xfs_alloc_vextent_loopfailed(args);
-
 		/*
-		 * If we are try-locking, we can't deadlock on AGF locks so we
-		 * can wrap all the way back to the first AG. Otherwise, wrap
-		 * back to the start AG so we can't deadlock and let the end of
-		 * scan handler decide what to do next.
+		 * Allocation is supposed to succeed now, so break out of the
+		 * loop regardless of whether we succeed or not.
 		 */
-		if (++(args->agno) == mp->m_sb.sb_agcount) {
-			if (flags & XFS_ALLOC_FLAG_TRYLOCK)
-				args->agno = 0;
-			else
-				args->agno = minimum_agno;
-		}
-
-		/*
-		 * Reached the starting a.g., must either be done
-		 * or switch to non-trylock mode.
-		 */
-		if (args->agno == start_agno) {
-			if (flags == 0) {
-				args->agbno = NULLAGBLOCK;
-				trace_xfs_alloc_vextent_allfailed(args);
-				break;
-			}
+		if (args->agno == start_agno && target_agbno) {
 			args->agbno = target_agbno;
-			flags = 0;
+			error = xfs_alloc_ag_vextent_near(args);
+		} else {
+			args->agbno = 0;
+			error = xfs_alloc_ag_vextent_size(args);
 		}
-		xfs_perag_put(args->pag);
+		break;
+	}
+	if (error) {
+		xfs_perag_rele(args->pag);
 		args->pag = NULL;
+		return error;
 	}
+	if (args->agbp)
+		return 0;
+
 	/*
-	 * The perag is left referenced in args for the caller to clean
-	 * up after they've finished the allocation.
+	 * We didn't find an AG we can allocate from. If we were given
+	 * constraining flags by the caller, drop them and retry the allocation
+	 * without any constraints being set.
 	 */
-	return error;
+	if (flags) {
+		flags = 0;
+		goto restart;
+	}
+
+	ASSERT(args->pag == NULL);
+	trace_xfs_alloc_vextent_allfailed(args);
+	return 0;
 }
 
 /*
@@ -3524,7 +3516,7 @@ xfs_alloc_vextent_near_bno(
 	}
 
 	if (needs_perag)
-		args->pag = xfs_perag_get(mp, args->agno);
+		args->pag = xfs_perag_grab(mp, args->agno);
 
 	error = xfs_alloc_vextent_prepare_ag(args);
 	if (!error && args->agbp)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 29/42] xfs: convert trim to use for_each_perag_range
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (27 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 28/42] xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walker Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-02-01 23:15   ` Darrick J. Wong
  2023-01-18 22:44 ` [PATCH 30/42] xfs: factor out filestreams from xfs_bmap_btalloc_nullfb Dave Chinner
                   ` (13 subsequent siblings)
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Convert the trim code to use active perag references, and hence make
it shrink safe.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_discard.c | 50 ++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index bfc829c07f03..afc4c78b9eed 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -21,23 +21,20 @@
 
 STATIC int
 xfs_trim_extents(
-	struct xfs_mount	*mp,
-	xfs_agnumber_t		agno,
+	struct xfs_perag	*pag,
 	xfs_daddr_t		start,
 	xfs_daddr_t		end,
 	xfs_daddr_t		minlen,
 	uint64_t		*blocks_trimmed)
 {
+	struct xfs_mount	*mp = pag->pag_mount;
 	struct block_device	*bdev = mp->m_ddev_targp->bt_bdev;
 	struct xfs_btree_cur	*cur;
 	struct xfs_buf		*agbp;
 	struct xfs_agf		*agf;
-	struct xfs_perag	*pag;
 	int			error;
 	int			i;
 
-	pag = xfs_perag_get(mp, agno);
-
 	/*
 	 * Force out the log.  This means any transactions that might have freed
 	 * space before we take the AGF buffer lock are now on disk, and the
@@ -47,7 +44,7 @@ xfs_trim_extents(
 
 	error = xfs_alloc_read_agf(pag, NULL, 0, &agbp);
 	if (error)
-		goto out_put_perag;
+		return error;
 	agf = agbp->b_addr;
 
 	cur = xfs_allocbt_init_cursor(mp, NULL, agbp, pag, XFS_BTNUM_CNT);
@@ -71,10 +68,10 @@ xfs_trim_extents(
 
 		error = xfs_alloc_get_rec(cur, &fbno, &flen, &i);
 		if (error)
-			goto out_del_cursor;
+			break;
 		if (XFS_IS_CORRUPT(mp, i != 1)) {
 			error = -EFSCORRUPTED;
-			goto out_del_cursor;
+			break;
 		}
 		ASSERT(flen <= be32_to_cpu(agf->agf_longest));
 
@@ -83,15 +80,15 @@ xfs_trim_extents(
 		 * the format the range/len variables are supplied in by
 		 * userspace.
 		 */
-		dbno = XFS_AGB_TO_DADDR(mp, agno, fbno);
+		dbno = XFS_AGB_TO_DADDR(mp, pag->pag_agno, fbno);
 		dlen = XFS_FSB_TO_BB(mp, flen);
 
 		/*
 		 * Too small?  Give up.
 		 */
 		if (dlen < minlen) {
-			trace_xfs_discard_toosmall(mp, agno, fbno, flen);
-			goto out_del_cursor;
+			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
+			break;
 		}
 
 		/*
@@ -100,7 +97,7 @@ xfs_trim_extents(
 		 * down partially overlapping ranges for now.
 		 */
 		if (dbno + dlen < start || dbno > end) {
-			trace_xfs_discard_exclude(mp, agno, fbno, flen);
+			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
 			goto next_extent;
 		}
 
@@ -109,32 +106,30 @@ xfs_trim_extents(
 		 * discard and try again the next time.
 		 */
 		if (xfs_extent_busy_search(mp, pag, fbno, flen)) {
-			trace_xfs_discard_busy(mp, agno, fbno, flen);
+			trace_xfs_discard_busy(mp, pag->pag_agno, fbno, flen);
 			goto next_extent;
 		}
 
-		trace_xfs_discard_extent(mp, agno, fbno, flen);
+		trace_xfs_discard_extent(mp, pag->pag_agno, fbno, flen);
 		error = blkdev_issue_discard(bdev, dbno, dlen, GFP_NOFS);
 		if (error)
-			goto out_del_cursor;
+			break;
 		*blocks_trimmed += flen;
 
 next_extent:
 		error = xfs_btree_decrement(cur, 0, &i);
 		if (error)
-			goto out_del_cursor;
+			break;
 
 		if (fatal_signal_pending(current)) {
 			error = -ERESTARTSYS;
-			goto out_del_cursor;
+			break;
 		}
 	}
 
 out_del_cursor:
 	xfs_btree_del_cursor(cur, error);
 	xfs_buf_relse(agbp);
-out_put_perag:
-	xfs_perag_put(pag);
 	return error;
 }
 
@@ -152,11 +147,12 @@ xfs_ioc_trim(
 	struct xfs_mount		*mp,
 	struct fstrim_range __user	*urange)
 {
+	struct xfs_perag	*pag;
 	unsigned int		granularity =
 		bdev_discard_granularity(mp->m_ddev_targp->bt_bdev);
 	struct fstrim_range	range;
 	xfs_daddr_t		start, end, minlen;
-	xfs_agnumber_t		start_agno, end_agno, agno;
+	xfs_agnumber_t		agno;
 	uint64_t		blocks_trimmed = 0;
 	int			error, last_error = 0;
 
@@ -193,18 +189,18 @@ xfs_ioc_trim(
 	end = start + BTOBBT(range.len) - 1;
 
 	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
-		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)- 1;
-
-	start_agno = xfs_daddr_to_agno(mp, start);
-	end_agno = xfs_daddr_to_agno(mp, end);
+		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1;
 
-	for (agno = start_agno; agno <= end_agno; agno++) {
-		error = xfs_trim_extents(mp, agno, start, end, minlen,
+	agno = xfs_daddr_to_agno(mp, start);
+	for_each_perag_range(mp, agno, xfs_daddr_to_agno(mp, end), pag) {
+		error = xfs_trim_extents(pag, start, end, minlen,
 					  &blocks_trimmed);
 		if (error) {
 			last_error = error;
-			if (error == -ERESTARTSYS)
+			if (error == -ERESTARTSYS) {
+				xfs_perag_rele(pag);
 				break;
+			}
 		}
 	}
 
-- 
2.39.0



* [PATCH 30/42] xfs: factor out filestreams from xfs_bmap_btalloc_nullfb
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (28 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 29/42] xfs: convert trim to use for_each_perag_range Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 31/42] xfs: get rid of notinit from xfs_bmap_longest_free_extent Dave Chinner
                   ` (12 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

There are many if (filestreams) {} else {} branches in this function.
Split it out into a filestreams-specific function so that we can
then work directly on cleaning up the filestreams code without
impacting the rest of the allocation algorithms.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 167 ++++++++++++++++++++++-----------------
 1 file changed, 96 insertions(+), 71 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index ba74aea034b0..7ae08b44e4d8 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3234,8 +3234,8 @@ xfs_bmap_btalloc_select_lengths(
 	return 0;
 }
 
-STATIC int
-xfs_bmap_btalloc_filestreams(
+static int
+xfs_bmap_btalloc_filestreams_select_lengths(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		*blen)
@@ -3576,54 +3576,109 @@ xfs_bmap_btalloc_at_eof(
 	return 0;
 }
 
+/*
+ * We have failed multiple allocation attempts so now are in a low space
+ * allocation situation. Try a locality first full filesystem minimum length
+ * allocation whilst still maintaining necessary total block reservation
+ * requirements.
+ *
+ * If that fails, we are now critically low on space, so perform a last resort
+ * allocation attempt: no reserve, no locality, blocking, minimum length, full
+ * filesystem free space scan. We also indicate to future allocations in this
+ * transaction that we are critically low on space so they don't waste time on
+ * allocation modes that are unlikely to succeed.
+ */
 static int
-xfs_bmap_btalloc_best_length(
+xfs_bmap_btalloc_low_space(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args)
+{
+	int			error;
+
+	if (args->minlen > ap->minlen) {
+		args->minlen = ap->minlen;
+		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
+		if (error || args->fsbno != NULLFSBLOCK)
+			return error;
+	}
+
+	/* Last ditch attempt before failure is declared. */
+	args->total = ap->minlen;
+	error = xfs_alloc_vextent_first_ag(args, 0);
+	if (error)
+		return error;
+	ap->tp->t_flags |= XFS_TRANS_LOWMODE;
+	return 0;
+}
+
+static int
+xfs_bmap_btalloc_filestreams(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	int			stripe_align)
 {
-	struct xfs_mount	*mp = args->mp;
+	xfs_agnumber_t		agno = xfs_filestream_lookup_ag(ap->ip);
 	xfs_extlen_t		blen = 0;
-	bool			is_filestream = false;
 	int			error;
 
-	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
-	    xfs_inode_is_filestream(ap->ip))
-		is_filestream = true;
+	/* Determine the initial block number we will target for allocation. */
+	if (agno == NULLAGNUMBER)
+		agno = 0;
+	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
+	xfs_bmap_adjacent(ap);
 
 	/*
-	 * Determine the initial block number we will target for allocation.
+	 * If there is very little free space before we start a
+	 * filestreams allocation, we're almost guaranteed to fail to
+	 * find an AG with enough contiguous free space to succeed, so
+	 * just go straight to the low space algorithm.
 	 */
-	if (is_filestream) {
-		xfs_agnumber_t	agno = xfs_filestream_lookup_ag(ap->ip);
-		if (agno == NULLAGNUMBER)
-			agno = 0;
-		ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
-	} else {
-		ap->blkno = XFS_INO_TO_FSB(mp, ap->ip->i_ino);
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
+		args->minlen = ap->minlen;
+		return xfs_bmap_btalloc_low_space(ap, args);
 	}
-	xfs_bmap_adjacent(ap);
 
 	/*
 	 * Search for an allocation group with a single extent large enough for
 	 * the request.  If one isn't found, then adjust the minimum allocation
 	 * size to the largest space found.
 	 */
-	if (is_filestream) {
-		/*
-		 * If there is very little free space before we start a
-		 * filestreams allocation, we're almost guaranteed to fail to
-		 * find an AG with enough contiguous free space to succeed, so
-		 * just go straight to the low space algorithm.
-		 */
-		if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
-			args->minlen = ap->minlen;
-			goto critically_low_space;
-		}
-		error = xfs_bmap_btalloc_filestreams(ap, args, &blen);
-	} else {
-		error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
+	error = xfs_bmap_btalloc_filestreams_select_lengths(ap, args, &blen);
+	if (error)
+		return error;
+
+	if (ap->aeof) {
+		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
+				true);
+		if (error || args->fsbno != NULLFSBLOCK)
+			return error;
 	}
+
+	error = xfs_alloc_vextent_near_bno(args, ap->blkno);
+	if (error || args->fsbno != NULLFSBLOCK)
+		return error;
+
+	return xfs_bmap_btalloc_low_space(ap, args);
+}
+
+static int
+xfs_bmap_btalloc_best_length(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	int			stripe_align)
+{
+	xfs_extlen_t		blen = 0;
+	int			error;
+
+	ap->blkno = XFS_INO_TO_FSB(args->mp, ap->ip->i_ino);
+	xfs_bmap_adjacent(ap);
+
+	/*
+	 * Search for an allocation group with a single extent large enough for
+	 * the request.  If one isn't found, then adjust the minimum allocation
+	 * size to the largest space found.
+	 */
+	error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
 	if (error)
 		return error;
 
@@ -3635,50 +3690,16 @@ xfs_bmap_btalloc_best_length(
 	 */
 	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
 		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
-				is_filestream);
-		if (error)
+				false);
+		if (error || args->fsbno != NULLFSBLOCK)
 			return error;
-		if (args->fsbno != NULLFSBLOCK)
-			return 0;
 	}
 
-	if (is_filestream)
-		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
-	else
-		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
-	if (error)
+	error = xfs_alloc_vextent_start_ag(args, ap->blkno);
+	if (error || args->fsbno != NULLFSBLOCK)
 		return error;
-	if (args->fsbno != NULLFSBLOCK)
-		return 0;
-
-	/*
-	 * Try a locality first full filesystem minimum length allocation whilst
-	 * still maintaining necessary total block reservation requirements.
-	 */
-	if (args->minlen > ap->minlen) {
-		args->minlen = ap->minlen;
-		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
-		if (error)
-			return error;
-	}
-	if (args->fsbno != NULLFSBLOCK)
-		return 0;
 
-	/*
-	 * We are now critically low on space, so this is a last resort
-	 * allocation attempt: no reserve, no locality, blocking, minimum
-	 * length, full filesystem free space scan. We also indicate to future
-	 * allocations in this transaction that we are critically low on space
-	 * so they don't waste time on allocation modes that are unlikely to
-	 * succeed.
-	 */
-critically_low_space:
-	args->total = ap->minlen;
-	error = xfs_alloc_vextent_first_ag(args, 0);
-	if (error)
-		return error;
-	ap->tp->t_flags |= XFS_TRANS_LOWMODE;
-	return 0;
+	return xfs_bmap_btalloc_low_space(ap, args);
 }
 
 static int
@@ -3712,7 +3733,11 @@ xfs_bmap_btalloc(
 	/* Trim the allocation back to the maximum an AG can fit. */
 	args.maxlen = min(ap->length, mp->m_ag_max_usable);
 
-	error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align);
+	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
+	    xfs_inode_is_filestream(ap->ip))
+		error = xfs_bmap_btalloc_filestreams(ap, &args, stripe_align);
+	else
+		error = xfs_bmap_btalloc_best_length(ap, &args, stripe_align);
 	if (error)
 		return error;
 
-- 
2.39.0



* [PATCH 31/42] xfs: get rid of notinit from xfs_bmap_longest_free_extent
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (29 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 30/42] xfs: factor out filestreams from xfs_bmap_btalloc_nullfb Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 32/42] xfs: use xfs_bmap_longest_free_extent() in filestreams Dave Chinner
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The notinit flag is only set if reading the AGF returns an EAGAIN
error. Just return the EAGAIN error and handle it in the callers.

This means we can remove the not_init parameter from
xfs_bmap_select_minlen(), too, because the use of not_init there is
pessimistic. If we can't read the agf, it won't increase blen.

The only time we actually care whether we checked all the AGFs for
contiguous free space is when the best length is less than the
minimum allocation length. If not_init is set, then we ignore blen
and set the minimum alloc length to the absolute minimum, not the
best length we know already is present.

However, if blen is less than the minimum we're going to ignore it
anyway, regardless of whether we scanned all the AGFs or not. Hence
not_init can go away: we only use blen if it is good from the scanned
AGs, otherwise we ignore it altogether and use minlen.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 84 +++++++++++++++++-----------------------
 1 file changed, 36 insertions(+), 48 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7ae08b44e4d8..58790951be3e 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3139,8 +3139,7 @@ static int
 xfs_bmap_longest_free_extent(
 	struct xfs_perag	*pag,
 	struct xfs_trans	*tp,
-	xfs_extlen_t		*blen,
-	int			*notinit)
+	xfs_extlen_t		*blen)
 {
 	xfs_extlen_t		longest;
 	int			error = 0;
@@ -3148,14 +3147,8 @@ xfs_bmap_longest_free_extent(
 	if (!xfs_perag_initialised_agf(pag)) {
 		error = xfs_alloc_read_agf(pag, tp, XFS_ALLOC_FLAG_TRYLOCK,
 				NULL);
-		if (error) {
-			/* Couldn't lock the AGF, so skip this AG. */
-			if (error == -EAGAIN) {
-				*notinit = 1;
-				error = 0;
-			}
+		if (error)
 			return error;
-		}
 	}
 
 	longest = xfs_alloc_longest_free_extent(pag,
@@ -3167,32 +3160,28 @@ xfs_bmap_longest_free_extent(
 	return 0;
 }
 
-static void
+static xfs_extlen_t
 xfs_bmap_select_minlen(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		*blen,
-	int			notinit)
+	xfs_extlen_t		blen)
 {
-	if (notinit || *blen < ap->minlen) {
-		/*
-		 * Since we did a BUF_TRYLOCK above, it is possible that
-		 * there is space for this request.
-		 */
-		args->minlen = ap->minlen;
-	} else if (*blen < args->maxlen) {
-		/*
-		 * If the best seen length is less than the request length,
-		 * use the best as the minimum.
-		 */
-		args->minlen = *blen;
-	} else {
-		/*
-		 * Otherwise we've seen an extent as big as maxlen, use that
-		 * as the minimum.
-		 */
-		args->minlen = args->maxlen;
-	}
+
+	/*
+	 * Since we used XFS_ALLOC_FLAG_TRYLOCK in _longest_free_extent(), it is
+	 * possible that there is enough contiguous free space for this request.
+	 */
+	if (blen < ap->minlen)
+		return ap->minlen;
+
+	/*
+	 * If the best seen length is less than the request length,
+	 * use the best as the minimum, otherwise we've got the maxlen we
+	 * were asked for.
+	 */
+	if (blen < args->maxlen)
+		return blen;
+	return args->maxlen;
 }
 
 static int
@@ -3204,7 +3193,6 @@ xfs_bmap_btalloc_select_lengths(
 	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*pag;
 	xfs_agnumber_t		agno, startag;
-	int			notinit = 0;
 	int			error = 0;
 
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
@@ -3220,9 +3208,8 @@ xfs_bmap_btalloc_select_lengths(
 
 	*blen = 0;
 	for_each_perag_wrap(mp, startag, agno, pag) {
-		error = xfs_bmap_longest_free_extent(pag, args->tp, blen,
-						     &notinit);
-		if (error)
+		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
+		if (error && error != -EAGAIN)
 			break;
 		if (*blen >= args->maxlen)
 			break;
@@ -3230,7 +3217,7 @@ xfs_bmap_btalloc_select_lengths(
 	if (pag)
 		xfs_perag_rele(pag);
 
-	xfs_bmap_select_minlen(ap, args, blen, notinit);
+	args->minlen = xfs_bmap_select_minlen(ap, args, *blen);
 	return 0;
 }
 
@@ -3243,7 +3230,6 @@ xfs_bmap_btalloc_filestreams_select_lengths(
 	struct xfs_mount	*mp = ap->ip->i_mount;
 	struct xfs_perag	*pag;
 	xfs_agnumber_t		start_agno;
-	int			notinit = 0;
 	int			error;
 
 	args->total = ap->total;
@@ -3254,11 +3240,13 @@ xfs_bmap_btalloc_filestreams_select_lengths(
 
 	pag = xfs_perag_grab(mp, start_agno);
 	if (pag) {
-		error = xfs_bmap_longest_free_extent(pag, args->tp, blen,
-				&notinit);
+		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
 		xfs_perag_rele(pag);
-		if (error)
-			return error;
+		if (error) {
+			if (error != -EAGAIN)
+				return error;
+			*blen = 0;
+		}
 	}
 
 	if (*blen < args->maxlen) {
@@ -3274,18 +3262,18 @@ xfs_bmap_btalloc_filestreams_select_lengths(
 		if (!pag)
 			goto out_select;
 
-		error = xfs_bmap_longest_free_extent(pag, args->tp,
-				blen, &notinit);
+		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
 		xfs_perag_rele(pag);
-		if (error)
-			return error;
-
+		if (error) {
+			if (error != -EAGAIN)
+				return error;
+			*blen = 0;
+		}
 		start_agno = agno;
-
 	}
 
 out_select:
-	xfs_bmap_select_minlen(ap, args, blen, notinit);
+	args->minlen = xfs_bmap_select_minlen(ap, args, *blen);
 
 	/*
 	 * Set the failure fallback case to look in the selected AG as stream
-- 
2.39.0



* [PATCH 32/42] xfs: use xfs_bmap_longest_free_extent() in filestreams
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (30 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 31/42] xfs: get rid of notinit from xfs_bmap_longest_free_extent Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 33/42] xfs: move xfs_bmap_btalloc_filestreams() to xfs_filestreams.c Dave Chinner
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The code in xfs_bmap_longest_free_extent() is open coded in
xfs_filestream_pick_ag(). Export xfs_bmap_longest_free_extent() and
call it from the filestreams code instead.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c |  2 +-
 fs/xfs/libxfs/xfs_bmap.h |  2 ++
 fs/xfs/xfs_filestream.c  | 22 ++++++++--------------
 3 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 58790951be3e..c6a617dada27 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3135,7 +3135,7 @@ xfs_bmap_adjacent(
 #undef ISVALID
 }
 
-static int
+int
 xfs_bmap_longest_free_extent(
 	struct xfs_perag	*pag,
 	struct xfs_trans	*tp,
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 0ffc0d998850..7bd619eb2f7d 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -168,6 +168,8 @@ static inline bool xfs_bmap_is_written_extent(struct xfs_bmbt_irec *irec)
 #define xfs_valid_startblock(ip, startblock) \
 	((startblock) != 0 || XFS_IS_REALTIME_INODE(ip))
 
+int	xfs_bmap_longest_free_extent(struct xfs_perag *pag,
+		struct xfs_trans *tp, xfs_extlen_t *blen);
 void	xfs_trim_extent(struct xfs_bmbt_irec *irec, xfs_fileoff_t bno,
 		xfs_filblks_t len);
 unsigned int xfs_bmap_compute_attr_offset(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 7e8b25ab6c46..2eb702034d05 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -124,17 +124,14 @@ xfs_filestream_pick_ag(
 		trace_xfs_filestream_scan(mp, ip->i_ino, ag);
 
 		pag = xfs_perag_get(mp, ag);
-
-		if (!xfs_perag_initialised_agf(pag)) {
-			err = xfs_alloc_read_agf(pag, NULL, trylock, NULL);
-			if (err) {
-				if (err != -EAGAIN) {
-					xfs_perag_put(pag);
-					return err;
-				}
-				/* Couldn't lock the AGF, skip this AG. */
-				goto next_ag;
-			}
+		longest = 0;
+		err = xfs_bmap_longest_free_extent(pag, NULL, &longest);
+		if (err) {
+			xfs_perag_put(pag);
+			if (err != -EAGAIN)
+				return err;
+			/* Couldn't lock the AGF, skip this AG. */
+			goto next_ag;
 		}
 
 		/* Keep track of the AG with the most free blocks. */
@@ -154,9 +151,6 @@ xfs_filestream_pick_ag(
 			goto next_ag;
 		}
 
-		longest = xfs_alloc_longest_free_extent(pag,
-				xfs_alloc_min_freelist(mp, pag),
-				xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE));
 		if (((minlen && longest >= minlen) ||
 		     (!minlen && pag->pagf_freeblks >= minfree)) &&
 		    (!xfs_perag_prefers_metadata(pag) ||
-- 
2.39.0



* [PATCH 33/42] xfs: move xfs_bmap_btalloc_filestreams() to xfs_filestreams.c
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (31 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 32/42] xfs: use xfs_bmap_longest_free_extent() in filestreams Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 34/42] xfs: merge filestream AG lookup into xfs_filestream_select_ag() Dave Chinner
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_bmap_btalloc_filestreams() calls two filestreams functions to
select the AG to allocate from. Both those functions end up in
the same selection function that iterates all AGs multiple times.
Worst case, xfs_bmap_btalloc_filestreams() can iterate all AGs 4
times just to select the initial AG to allocate in.

Move the AG selection to fs/xfs/xfs_filestreams.c as a single
interface so that the inefficient AG iteration is contained
entirely within the filestreams code. This will allow the
implementation to be simplified and made more efficient in future
patches.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c |  94 +++++-------------------------------
 fs/xfs/libxfs/xfs_bmap.h |   3 ++
 fs/xfs/xfs_filestream.c  | 100 ++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_filestream.h  |   5 +-
 4 files changed, 115 insertions(+), 87 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index c6a617dada27..098b46f3f3e3 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3221,68 +3221,6 @@ xfs_bmap_btalloc_select_lengths(
 	return 0;
 }
 
-static int
-xfs_bmap_btalloc_filestreams_select_lengths(
-	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		*blen)
-{
-	struct xfs_mount	*mp = ap->ip->i_mount;
-	struct xfs_perag	*pag;
-	xfs_agnumber_t		start_agno;
-	int			error;
-
-	args->total = ap->total;
-
-	start_agno = XFS_FSB_TO_AGNO(mp, ap->blkno);
-	if (start_agno == NULLAGNUMBER)
-		start_agno = 0;
-
-	pag = xfs_perag_grab(mp, start_agno);
-	if (pag) {
-		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
-		xfs_perag_rele(pag);
-		if (error) {
-			if (error != -EAGAIN)
-				return error;
-			*blen = 0;
-		}
-	}
-
-	if (*blen < args->maxlen) {
-		xfs_agnumber_t	agno = start_agno;
-
-		error = xfs_filestream_new_ag(ap, &agno);
-		if (error)
-			return error;
-		if (agno == NULLAGNUMBER)
-			goto out_select;
-
-		pag = xfs_perag_grab(mp, agno);
-		if (!pag)
-			goto out_select;
-
-		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
-		xfs_perag_rele(pag);
-		if (error) {
-			if (error != -EAGAIN)
-				return error;
-			*blen = 0;
-		}
-		start_agno = agno;
-	}
-
-out_select:
-	args->minlen = xfs_bmap_select_minlen(ap, args, *blen);
-
-	/*
-	 * Set the failure fallback case to look in the selected AG as stream
-	 * may have moved.
-	 */
-	ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, start_agno, 0);
-	return 0;
-}
-
 /* Update all inode and quota accounting for the allocation we just did. */
 static void
 xfs_bmap_btalloc_accounting(
@@ -3576,7 +3514,7 @@ xfs_bmap_btalloc_at_eof(
  * transaction that we are critically low on space so they don't waste time on
  * allocation modes that are unlikely to succeed.
  */
-static int
+int
 xfs_bmap_btalloc_low_space(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args)
@@ -3605,36 +3543,25 @@ xfs_bmap_btalloc_filestreams(
 	struct xfs_alloc_arg	*args,
 	int			stripe_align)
 {
-	xfs_agnumber_t		agno = xfs_filestream_lookup_ag(ap->ip);
 	xfs_extlen_t		blen = 0;
 	int			error;
 
-	/* Determine the initial block number we will target for allocation. */
-	if (agno == NULLAGNUMBER)
-		agno = 0;
-	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
-	xfs_bmap_adjacent(ap);
+
+	error = xfs_filestream_select_ag(ap, args, &blen);
+	if (error)
+		return error;
 
 	/*
-	 * If there is very little free space before we start a
-	 * filestreams allocation, we're almost guaranteed to fail to
-	 * find an AG with enough contiguous free space to succeed, so
-	 * just go straight to the low space algorithm.
+	 * If we are in low space mode, then optimal allocation will fail so
+	 * prepare for minimal allocation and jump to the low space algorithm
+	 * immediately.
 	 */
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
 		args->minlen = ap->minlen;
-		return xfs_bmap_btalloc_low_space(ap, args);
+		goto out_low_space;
 	}
 
-	/*
-	 * Search for an allocation group with a single extent large enough for
-	 * the request.  If one isn't found, then adjust the minimum allocation
-	 * size to the largest space found.
-	 */
-	error = xfs_bmap_btalloc_filestreams_select_lengths(ap, args, &blen);
-	if (error)
-		return error;
-
+	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
 	if (ap->aeof) {
 		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
 				true);
@@ -3646,6 +3573,7 @@ xfs_bmap_btalloc_filestreams(
 	if (error || args->fsbno != NULLFSBLOCK)
 		return error;
 
+out_low_space:
 	return xfs_bmap_btalloc_low_space(ap, args);
 }
 
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 7bd619eb2f7d..94d9285eeba1 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -12,6 +12,7 @@ struct xfs_ifork;
 struct xfs_inode;
 struct xfs_mount;
 struct xfs_trans;
+struct xfs_alloc_arg;
 
 /*
  * Argument structure for xfs_bmap_alloc.
@@ -224,6 +225,8 @@ int	xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp,
 		struct xfs_bmbt_irec *new, int *logflagsp);
 xfs_extlen_t xfs_bmapi_minleft(struct xfs_trans *tp, struct xfs_inode *ip,
 		int fork);
+int	xfs_bmap_btalloc_low_space(struct xfs_bmalloca *ap,
+		struct xfs_alloc_arg *args);
 
 enum xfs_bmap_intent_type {
 	XFS_BMAP_MAP = 1,
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 2eb702034d05..a641404aa9a6 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -12,6 +12,7 @@
 #include "xfs_mount.h"
 #include "xfs_inode.h"
 #include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
 #include "xfs_alloc.h"
 #include "xfs_mru_cache.h"
 #include "xfs_trace.h"
@@ -263,7 +264,7 @@ xfs_filestream_get_parent(
  *
  * Returns NULLAGNUMBER in case of an error.
  */
-xfs_agnumber_t
+static xfs_agnumber_t
 xfs_filestream_lookup_ag(
 	struct xfs_inode	*ip)
 {
@@ -312,7 +313,7 @@ xfs_filestream_lookup_ag(
  * This is called when the allocator can't find a suitable extent in the
  * current AG, and we have to move the stream into a new AG with more space.
  */
-int
+static int
 xfs_filestream_new_ag(
 	struct xfs_bmalloca	*ap,
 	xfs_agnumber_t		*agp)
@@ -358,6 +359,101 @@ xfs_filestream_new_ag(
 	return err;
 }
 
+static int
+xfs_filestreams_select_lengths(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	xfs_extlen_t		*blen)
+{
+	struct xfs_mount	*mp = ap->ip->i_mount;
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		start_agno;
+	int			error;
+
+	args->total = ap->total;
+
+	start_agno = XFS_FSB_TO_AGNO(mp, ap->blkno);
+	if (start_agno == NULLAGNUMBER)
+		start_agno = 0;
+
+	pag = xfs_perag_grab(mp, start_agno);
+	if (pag) {
+		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
+		xfs_perag_rele(pag);
+		if (error) {
+			if (error != -EAGAIN)
+				return error;
+			*blen = 0;
+		}
+	}
+
+	if (*blen < args->maxlen) {
+		xfs_agnumber_t	agno = start_agno;
+
+		error = xfs_filestream_new_ag(ap, &agno);
+		if (error)
+			return error;
+		if (agno == NULLAGNUMBER)
+			goto out_select;
+
+		pag = xfs_perag_grab(mp, agno);
+		if (!pag)
+			goto out_select;
+
+		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
+		xfs_perag_rele(pag);
+		if (error) {
+			if (error != -EAGAIN)
+				return error;
+			*blen = 0;
+		}
+		start_agno = agno;
+	}
+
+out_select:
+	/*
+	 * Set the failure fallback case to look in the selected AG as stream
+	 * may have moved.
+	 */
+	ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, start_agno, 0);
+	return 0;
+}
+
+/*
+ * Search for an allocation group with a single extent large enough for
+ * the request.  If one isn't found, then the largest available free extent is
+ * returned as the best length possible.
+ */
+int
+xfs_filestream_select_ag(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	xfs_extlen_t		*blen)
+{
+	xfs_agnumber_t		start_agno = xfs_filestream_lookup_ag(ap->ip);
+
+	/* Determine the initial block number we will target for allocation. */
+	if (start_agno == NULLAGNUMBER)
+		start_agno = 0;
+	ap->blkno = XFS_AGB_TO_FSB(args->mp, start_agno, 0);
+	xfs_bmap_adjacent(ap);
+
+	/*
+	 * If there is very little free space before we start a filestreams
+	 * allocation, we're almost guaranteed to fail to find a better AG with
+	 * larger free space available so we don't even try.
+	 */
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
+		return 0;
+
+	/*
+	 * Search for an allocation group with a single extent large enough for
+	 * the request.  If one isn't found, then adjust the minimum allocation
+	 * size to the largest space found.
+	 */
+	return xfs_filestreams_select_lengths(ap, args, blen);
+}
+
 void
 xfs_filestream_deassociate(
 	struct xfs_inode	*ip)
diff --git a/fs/xfs/xfs_filestream.h b/fs/xfs/xfs_filestream.h
index 403226ebb80b..df9f7553e106 100644
--- a/fs/xfs/xfs_filestream.h
+++ b/fs/xfs/xfs_filestream.h
@@ -9,13 +9,14 @@
 struct xfs_mount;
 struct xfs_inode;
 struct xfs_bmalloca;
+struct xfs_alloc_arg;
 
 int xfs_filestream_mount(struct xfs_mount *mp);
 void xfs_filestream_unmount(struct xfs_mount *mp);
 void xfs_filestream_deassociate(struct xfs_inode *ip);
-xfs_agnumber_t xfs_filestream_lookup_ag(struct xfs_inode *ip);
-int xfs_filestream_new_ag(struct xfs_bmalloca *ap, xfs_agnumber_t *agp);
 int xfs_filestream_peek_ag(struct xfs_mount *mp, xfs_agnumber_t agno);
+int xfs_filestream_select_ag(struct xfs_bmalloca *ap,
+		struct xfs_alloc_arg *args, xfs_extlen_t *blen);
 
 static inline int
 xfs_inode_is_filestream(
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH 34/42] xfs: merge filestream AG lookup into xfs_filestream_select_ag()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (32 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 33/42] xfs: move xfs_bmap_btalloc_filestreams() to xfs_filestreams.c Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 35/42] xfs: merge new filestream AG selection " Dave Chinner
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

The lookup currently either returns the cached filestream AG or
calls xfs_filestreams_select_lengths() to look up a new AG. The
caller then has to verify the AG that was selected, so we end up
running the "select a new AG" loop in a couple of places when only
one is really needed.  Merge the initial lookup functionality with
the length selection so that we only need to do a single pick loop
on lookup or verification failure.

This undoes a lot of the factoring that enabled the selection to be
moved over to the filestreams code. It makes
xfs_filestream_select_ag() an awful lot messier, but it has to be
made worse before it can get better in future patches...
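[Editor's note: the merged flow described above can be modelled in a
minimal userspace sketch. The names and the free-space table here are
illustrative stand-ins, not the kernel's perag or MRU cache APIs: try
the cached association first, and fall through to a single pick loop
on lookup or verification failure.]

```c
#include <assert.h>

/* Hypothetical userspace model of the merged lookup. */
struct ag_cache { int valid; unsigned agno; };

/* Per-AG longest free extent; a stand-in for the AGF data. */
static unsigned longest_free[4] = { 10, 50, 5, 80 };

/*
 * Return the AG to allocate from: use the cached association if it
 * still has enough contiguous space, otherwise run one pick loop,
 * scanning forward (wrapping) and re-caching the AG that fits.
 */
static unsigned select_ag(struct ag_cache *c, unsigned maxlen,
			  unsigned agcount)
{
	unsigned agno = c->valid ? c->agno : 0;

	if (c->valid && longest_free[agno] >= maxlen)
		return agno;		/* cached association still good */

	/* Single pick loop, starting from the stale/default AG. */
	for (unsigned i = 0; i < agcount; i++) {
		unsigned ag = (agno + i) % agcount;

		if (longest_free[ag] >= maxlen) {
			c->valid = 1;
			c->agno = ag;
			return ag;
		}
	}
	return 0;			/* fall back to AG 0 */
}
```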

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_filestream.c | 184 +++++++++++++++-------------------------
 1 file changed, 70 insertions(+), 114 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index a641404aa9a6..23044dab2001 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -258,55 +258,6 @@ xfs_filestream_get_parent(
 	return dir ? XFS_I(dir) : NULL;
 }
 
-/*
- * Find the right allocation group for a file, either by finding an
- * existing file stream or creating a new one.
- *
- * Returns NULLAGNUMBER in case of an error.
- */
-static xfs_agnumber_t
-xfs_filestream_lookup_ag(
-	struct xfs_inode	*ip)
-{
-	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_inode	*pip = NULL;
-	xfs_agnumber_t		startag, ag = NULLAGNUMBER;
-	struct xfs_mru_cache_elem *mru;
-
-	ASSERT(S_ISREG(VFS_I(ip)->i_mode));
-
-	pip = xfs_filestream_get_parent(ip);
-	if (!pip)
-		return NULLAGNUMBER;
-
-	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
-	if (mru) {
-		ag = container_of(mru, struct xfs_fstrm_item, mru)->ag;
-		xfs_mru_cache_done(mp->m_filestream);
-
-		trace_xfs_filestream_lookup(mp, ip->i_ino, ag);
-		goto out;
-	}
-
-	/*
-	 * Set the starting AG using the rotor for inode32, otherwise
-	 * use the directory inode's AG.
-	 */
-	if (xfs_is_inode32(mp)) {
-		xfs_agnumber_t	 rotorstep = xfs_rotorstep;
-		startag = (mp->m_agfrotor / rotorstep) % mp->m_sb.sb_agcount;
-		mp->m_agfrotor = (mp->m_agfrotor + 1) %
-		                 (mp->m_sb.sb_agcount * rotorstep);
-	} else
-		startag = XFS_INO_TO_AGNO(mp, pip->i_ino);
-
-	if (xfs_filestream_pick_ag(pip, startag, &ag, 0, 0))
-		ag = NULLAGNUMBER;
-out:
-	xfs_irele(pip);
-	return ag;
-}
-
 /*
  * Pick a new allocation group for the current file and its file stream.
  *
@@ -359,83 +310,70 @@ xfs_filestream_new_ag(
 	return err;
 }
 
-static int
-xfs_filestreams_select_lengths(
+/*
+ * Search for an allocation group with a single extent large enough for
+ * the request.  If one isn't found, then the largest available free extent is
+ * returned as the best length possible.
+ */
+int
+xfs_filestream_select_ag(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
 	xfs_extlen_t		*blen)
 {
 	struct xfs_mount	*mp = ap->ip->i_mount;
 	struct xfs_perag	*pag;
-	xfs_agnumber_t		start_agno;
+	struct xfs_inode	*pip = NULL;
+	xfs_agnumber_t		agno = NULLAGNUMBER;
+	struct xfs_mru_cache_elem *mru;
 	int			error;
 
 	args->total = ap->total;
+	*blen = 0;
 
-	start_agno = XFS_FSB_TO_AGNO(mp, ap->blkno);
-	if (start_agno == NULLAGNUMBER)
-		start_agno = 0;
-
-	pag = xfs_perag_grab(mp, start_agno);
-	if (pag) {
-		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
-		xfs_perag_rele(pag);
-		if (error) {
-			if (error != -EAGAIN)
-				return error;
-			*blen = 0;
-		}
+	pip = xfs_filestream_get_parent(ap->ip);
+	if (!pip) {
+		agno = 0;
+		goto new_ag;
 	}
 
-	if (*blen < args->maxlen) {
-		xfs_agnumber_t	agno = start_agno;
+	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
+	if (mru) {
+		agno = container_of(mru, struct xfs_fstrm_item, mru)->ag;
+		xfs_mru_cache_done(mp->m_filestream);
 
-		error = xfs_filestream_new_ag(ap, &agno);
-		if (error)
-			return error;
-		if (agno == NULLAGNUMBER)
-			goto out_select;
+		trace_xfs_filestream_lookup(mp, ap->ip->i_ino, agno);
+		xfs_irele(pip);
 
-		pag = xfs_perag_grab(mp, agno);
-		if (!pag)
-			goto out_select;
+		ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
+		xfs_bmap_adjacent(ap);
 
-		error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
-		xfs_perag_rele(pag);
-		if (error) {
-			if (error != -EAGAIN)
-				return error;
-			*blen = 0;
+		pag = xfs_perag_grab(mp, agno);
+		if (pag) {
+			error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
+			xfs_perag_rele(pag);
+			if (error) {
+				if (error != -EAGAIN)
+					return error;
+				*blen = 0;
+			}
 		}
-		start_agno = agno;
+		if (*blen >= args->maxlen)
+			goto out_select;
+	} else if (xfs_is_inode32(mp)) {
+		xfs_agnumber_t	 rotorstep = xfs_rotorstep;
+		agno = (mp->m_agfrotor / rotorstep) %
+				mp->m_sb.sb_agcount;
+		mp->m_agfrotor = (mp->m_agfrotor + 1) %
+				 (mp->m_sb.sb_agcount * rotorstep);
+		xfs_irele(pip);
+	} else {
+		agno = XFS_INO_TO_AGNO(mp, pip->i_ino);
+		xfs_irele(pip);
 	}
 
-out_select:
-	/*
-	 * Set the failure fallback case to look in the selected AG as stream
-	 * may have moved.
-	 */
-	ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, start_agno, 0);
-	return 0;
-}
-
-/*
- * Search for an allocation group with a single extent large enough for
- * the request.  If one isn't found, then the largest available free extent is
- * returned as the best length possible.
- */
-int
-xfs_filestream_select_ag(
-	struct xfs_bmalloca	*ap,
-	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		*blen)
-{
-	xfs_agnumber_t		start_agno = xfs_filestream_lookup_ag(ap->ip);
-
-	/* Determine the initial block number we will target for allocation. */
-	if (start_agno == NULLAGNUMBER)
-		start_agno = 0;
-	ap->blkno = XFS_AGB_TO_FSB(args->mp, start_agno, 0);
+new_ag:
+	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
 	xfs_bmap_adjacent(ap);
 
 	/*
@@ -446,14 +384,32 @@ xfs_filestream_select_ag(
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
 		return 0;
 
-	/*
-	 * Search for an allocation group with a single extent large enough for
-	 * the request.  If one isn't found, then adjust the minimum allocation
-	 * size to the largest space found.
-	 */
-	return xfs_filestreams_select_lengths(ap, args, blen);
+	error = xfs_filestream_new_ag(ap, &agno);
+	if (error)
+		return error;
+	if (agno == NULLAGNUMBER) {
+		agno = 0;
+		goto out_select;
+	}
+
+	pag = xfs_perag_grab(mp, agno);
+	if (!pag)
+		goto out_select;
+
+	error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
+	xfs_perag_rele(pag);
+	if (error) {
+		if (error != -EAGAIN)
+			return error;
+		*blen = 0;
+	}
+
+out_select:
+	ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
+	return 0;
 }
 
+
 void
 xfs_filestream_deassociate(
 	struct xfs_inode	*ip)
-- 
2.39.0



* [PATCH 35/42] xfs: merge new filestream AG selection into xfs_filestream_select_ag()
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (33 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 34/42] xfs: merge filestream AG lookup into xfs_filestream_select_ag() Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:44 ` [PATCH 36/42] xfs: remove xfs_filestream_select_ag() longest extent check Dave Chinner
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_filestream_new_ag() is largely a wrapper around
xfs_filestream_pick_ag() that repeats a lot of the lookups we just
merged back into xfs_filestream_select_ag() from the lookup code.
Merge the xfs_filestream_new_ag() code back into _select_ag() to get
rid of all the unnecessary logic.

Indeed, this makes it obvious that if we have no parent inode,
the filestreams allocator always selects AG 0 regardless of whether
it is fit for purpose or not.
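[Editor's note: the inode32 rotor that picks the starting AG when
there is no cached association is easy to model in isolation. This is
a userspace sketch with illustrative names; it shows that each AG is
handed out xfs_rotorstep times before the rotor advances, spreading
new streams across all AGs.]

```c
#include <assert.h>

/* Userspace model of the inode32 AG rotor. */
struct rotor { unsigned agfrotor; };

/*
 * Return the next starting AG. The rotor counts 0..agcount*rotorstep-1,
 * so dividing by rotorstep yields each AG number rotorstep times in a
 * row before moving on, then wraps back to AG 0.
 */
static unsigned rotor_next_ag(struct rotor *r, unsigned rotorstep,
			      unsigned agcount)
{
	unsigned startag = (r->agfrotor / rotorstep) % agcount;

	r->agfrotor = (r->agfrotor + 1) % (agcount * rotorstep);
	return startag;
}
```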

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_filestream.c | 112 ++++++++++++++--------------------------
 1 file changed, 40 insertions(+), 72 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 23044dab2001..713766729dcf 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -98,16 +98,18 @@ xfs_fstrm_free_func(
 static int
 xfs_filestream_pick_ag(
 	struct xfs_inode	*ip,
-	xfs_agnumber_t		startag,
 	xfs_agnumber_t		*agp,
 	int			flags,
-	xfs_extlen_t		minlen)
+	xfs_extlen_t		*longest)
 {
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_fstrm_item	*item;
 	struct xfs_perag	*pag;
-	xfs_extlen_t		longest, free = 0, minfree, maxfree = 0;
-	xfs_agnumber_t		ag, max_ag = NULLAGNUMBER;
+	xfs_extlen_t		minlen = *longest;
+	xfs_extlen_t		free = 0, minfree, maxfree = 0;
+	xfs_agnumber_t		startag = *agp;
+	xfs_agnumber_t		ag = startag;
+	xfs_agnumber_t		max_ag = NULLAGNUMBER;
 	int			err, trylock, nscan;
 
 	ASSERT(S_ISDIR(VFS_I(ip)->i_mode));
@@ -115,7 +117,6 @@ xfs_filestream_pick_ag(
 	/* 2% of an AG's blocks must be free for it to be chosen. */
 	minfree = mp->m_sb.sb_agblocks / 50;
 
-	ag = startag;
 	*agp = NULLAGNUMBER;
 
 	/* For the first pass, don't sleep trying to init the per-AG. */
@@ -125,8 +126,8 @@ xfs_filestream_pick_ag(
 		trace_xfs_filestream_scan(mp, ip->i_ino, ag);
 
 		pag = xfs_perag_get(mp, ag);
-		longest = 0;
-		err = xfs_bmap_longest_free_extent(pag, NULL, &longest);
+		*longest = 0;
+		err = xfs_bmap_longest_free_extent(pag, NULL, longest);
 		if (err) {
 			xfs_perag_put(pag);
 			if (err != -EAGAIN)
@@ -152,7 +153,7 @@ xfs_filestream_pick_ag(
 			goto next_ag;
 		}
 
-		if (((minlen && longest >= minlen) ||
+		if (((minlen && *longest >= minlen) ||
 		     (!minlen && pag->pagf_freeblks >= minfree)) &&
 		    (!xfs_perag_prefers_metadata(pag) ||
 		     !(flags & XFS_PICK_USERDATA) ||
@@ -258,58 +259,6 @@ xfs_filestream_get_parent(
 	return dir ? XFS_I(dir) : NULL;
 }
 
-/*
- * Pick a new allocation group for the current file and its file stream.
- *
- * This is called when the allocator can't find a suitable extent in the
- * current AG, and we have to move the stream into a new AG with more space.
- */
-static int
-xfs_filestream_new_ag(
-	struct xfs_bmalloca	*ap,
-	xfs_agnumber_t		*agp)
-{
-	struct xfs_inode	*ip = ap->ip, *pip;
-	struct xfs_mount	*mp = ip->i_mount;
-	xfs_extlen_t		minlen = ap->length;
-	xfs_agnumber_t		startag = 0;
-	int			flags = 0;
-	int			err = 0;
-	struct xfs_mru_cache_elem *mru;
-
-	*agp = NULLAGNUMBER;
-
-	pip = xfs_filestream_get_parent(ip);
-	if (!pip)
-		goto exit;
-
-	mru = xfs_mru_cache_remove(mp->m_filestream, pip->i_ino);
-	if (mru) {
-		struct xfs_fstrm_item *item =
-			container_of(mru, struct xfs_fstrm_item, mru);
-		startag = (item->ag + 1) % mp->m_sb.sb_agcount;
-	}
-
-	if (ap->datatype & XFS_ALLOC_USERDATA)
-		flags |= XFS_PICK_USERDATA;
-	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
-		flags |= XFS_PICK_LOWSPACE;
-
-	err = xfs_filestream_pick_ag(pip, startag, agp, flags, minlen);
-
-	/*
-	 * Only free the item here so we skip over the old AG earlier.
-	 */
-	if (mru)
-		xfs_fstrm_free_func(mp, mru);
-
-	xfs_irele(pip);
-exit:
-	if (*agp == NULLAGNUMBER)
-		*agp = 0;
-	return err;
-}
-
 /*
  * Search for an allocation group with a single extent large enough for
  * the request.  If one isn't found, then the largest available free extent is
@@ -326,6 +275,7 @@ xfs_filestream_select_ag(
 	struct xfs_inode	*pip = NULL;
 	xfs_agnumber_t		agno = NULLAGNUMBER;
 	struct xfs_mru_cache_elem *mru;
+	int			flags = 0;
 	int			error;
 
 	args->total = ap->total;
@@ -334,13 +284,14 @@ xfs_filestream_select_ag(
 	pip = xfs_filestream_get_parent(ap->ip);
 	if (!pip) {
 		agno = 0;
-		goto new_ag;
+		goto out_select;
 	}
 
 	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
 	if (mru) {
 		agno = container_of(mru, struct xfs_fstrm_item, mru)->ag;
 		xfs_mru_cache_done(mp->m_filestream);
+		mru = NULL;
 
 		trace_xfs_filestream_lookup(mp, ap->ip->i_ino, agno);
 		xfs_irele(pip);
@@ -354,7 +305,7 @@ xfs_filestream_select_ag(
 			xfs_perag_rele(pag);
 			if (error) {
 				if (error != -EAGAIN)
-					return error;
+					goto out_error;
 				*blen = 0;
 			}
 		}
@@ -366,13 +317,18 @@ xfs_filestream_select_ag(
 				mp->m_sb.sb_agcount;
 		mp->m_agfrotor = (mp->m_agfrotor + 1) %
 				 (mp->m_sb.sb_agcount * rotorstep);
-		xfs_irele(pip);
 	} else {
 		agno = XFS_INO_TO_AGNO(mp, pip->i_ino);
-		xfs_irele(pip);
 	}
 
-new_ag:
+	/* Changing parent AG association now, so remove the existing one. */
+	mru = xfs_mru_cache_remove(mp->m_filestream, pip->i_ino);
+	if (mru) {
+		struct xfs_fstrm_item *item =
+			container_of(mru, struct xfs_fstrm_item, mru);
+		agno = (item->ag + 1) % mp->m_sb.sb_agcount;
+		xfs_fstrm_free_func(mp, mru);
+	}
 	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
 	xfs_bmap_adjacent(ap);
 
@@ -382,33 +338,45 @@ xfs_filestream_select_ag(
 	 * larger free space available so we don't even try.
 	 */
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
-		return 0;
+		goto out_select;
+
+	if (ap->datatype & XFS_ALLOC_USERDATA)
+		flags |= XFS_PICK_USERDATA;
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
+		flags |= XFS_PICK_LOWSPACE;
 
-	error = xfs_filestream_new_ag(ap, &agno);
+	*blen = ap->length;
+	error = xfs_filestream_pick_ag(pip, &agno, flags, blen);
 	if (error)
-		return error;
+		goto out_error;
 	if (agno == NULLAGNUMBER) {
 		agno = 0;
-		goto out_select;
+		goto out_irele;
 	}
 
 	pag = xfs_perag_grab(mp, agno);
 	if (!pag)
-		goto out_select;
+		goto out_irele;
 
 	error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
 	xfs_perag_rele(pag);
 	if (error) {
 		if (error != -EAGAIN)
-			return error;
+			goto out_error;
 		*blen = 0;
 	}
 
+out_irele:
+	xfs_irele(pip);
 out_select:
 	ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
 	return 0;
-}
 
+out_error:
+	xfs_irele(pip);
+	return error;
+
+}
 
 void
 xfs_filestream_deassociate(
-- 
2.39.0



* [PATCH 36/42] xfs: remove xfs_filestream_select_ag() longest extent check
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (34 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 35/42] xfs: merge new filestream AG selection " Dave Chinner
@ 2023-01-18 22:44 ` Dave Chinner
  2023-01-18 22:45 ` [PATCH 37/42] xfs: factor out MRU hit case in xfs_filestream_select_ag Dave Chinner
                   ` (6 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:44 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Picking a new AG checks that the longest free extent in the AG is
valid, so there's no need to repeat the check in
xfs_filestream_select_ag(). Remove it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_filestream.c | 18 +-----------------
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 713766729dcf..95e28aae35ab 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -276,7 +276,7 @@ xfs_filestream_select_ag(
 	xfs_agnumber_t		agno = NULLAGNUMBER;
 	struct xfs_mru_cache_elem *mru;
 	int			flags = 0;
-	int			error;
+	int			error = 0;
 
 	args->total = ap->total;
 	*blen = 0;
@@ -351,27 +351,11 @@ xfs_filestream_select_ag(
 		goto out_error;
 	if (agno == NULLAGNUMBER) {
 		agno = 0;
-		goto out_irele;
-	}
-
-	pag = xfs_perag_grab(mp, agno);
-	if (!pag)
-		goto out_irele;
-
-	error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
-	xfs_perag_rele(pag);
-	if (error) {
-		if (error != -EAGAIN)
-			goto out_error;
 		*blen = 0;
 	}
 
-out_irele:
-	xfs_irele(pip);
 out_select:
 	ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
-	return 0;
-
 out_error:
 	xfs_irele(pip);
 	return error;
-- 
2.39.0



* [PATCH 37/42] xfs: factor out MRU hit case in xfs_filestream_select_ag
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (35 preceding siblings ...)
  2023-01-18 22:44 ` [PATCH 36/42] xfs: remove xfs_filestream_select_ag() longest extent check Dave Chinner
@ 2023-01-18 22:45 ` Dave Chinner
  2023-01-18 22:45 ` [PATCH 38/42] xfs: track an active perag reference in filestreams Dave Chinner
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:45 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Because it now stands out like a sore thumb. Factoring out this case
starts the process of simplifying xfs_filestream_select_ag() again.
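[Editor's note: the contract of the factored-out MRU hit helper can be
sketched in userspace. Names here are hypothetical, not the kernel
API: on a usable hit return the cached AG with the longest extent, and
on a stale hit drop the association and return the next AG to try with
blen set to 0.]

```c
#include <assert.h>

/* Stand-in for the per-directory MRU association. */
struct mru { int cached; unsigned ag; };

/*
 * If the cached AG still has a long enough extent, report it via
 * *blen and allocate there. Otherwise clear *blen; if there was a
 * stale association, drop it and return the AG after it so the pick
 * loop skips the AG that just failed.
 */
static unsigned mru_hit(struct mru *m, unsigned longest, unsigned maxlen,
			unsigned agcount, unsigned *blen)
{
	if (m->cached && longest >= maxlen) {
		*blen = longest;	/* association still good */
		return m->ag;
	}
	*blen = 0;
	if (m->cached) {
		unsigned next = (m->ag + 1) % agcount; /* skip stale AG */

		m->cached = 0;
		return next;
	}
	return 0;			/* no association: default AG */
}
```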

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_filestream.c | 133 +++++++++++++++++++++++++---------------
 1 file changed, 83 insertions(+), 50 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 95e28aae35ab..147296a1079e 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -259,10 +259,85 @@ xfs_filestream_get_parent(
 	return dir ? XFS_I(dir) : NULL;
 }
 
+/*
+ * Lookup the mru cache for an existing association. If one exists and we can
+ * use it, return with the agno and blen indicating that the allocation will
+ * proceed with that association.
+ *
+ * If we have no association, or we cannot use the current one and have to
+ * destroy it, return with blen = 0 and agno pointing at the next agno to try.
+ */
+int
+xfs_filestream_select_ag_mru(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	struct xfs_inode	*pip,
+	xfs_agnumber_t		*agno,
+	xfs_extlen_t		*blen)
+{
+	struct xfs_mount	*mp = ap->ip->i_mount;
+	struct xfs_perag	*pag;
+	struct xfs_mru_cache_elem *mru;
+	int			error;
+
+	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
+	if (!mru)
+		goto out_default_agno;
+
+	*agno = container_of(mru, struct xfs_fstrm_item, mru)->ag;
+	xfs_mru_cache_done(mp->m_filestream);
+
+	trace_xfs_filestream_lookup(mp, ap->ip->i_ino, *agno);
+
+	ap->blkno = XFS_AGB_TO_FSB(args->mp, *agno, 0);
+	xfs_bmap_adjacent(ap);
+
+	pag = xfs_perag_grab(mp, *agno);
+	if (!pag)
+		goto out_default_agno;
+
+	error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
+	xfs_perag_rele(pag);
+	if (error) {
+		if (error != -EAGAIN)
+			return error;
+		*blen = 0;
+	}
+
+	/*
+	 * We are done if there's still enough contiguous free space to succeed.
+	 */
+	if (*blen >= args->maxlen)
+		return 0;
+
+	/* Changing parent AG association now, so remove the existing one. */
+	mru = xfs_mru_cache_remove(mp->m_filestream, pip->i_ino);
+	if (mru) {
+		struct xfs_fstrm_item *item =
+			container_of(mru, struct xfs_fstrm_item, mru);
+		*agno = (item->ag + 1) % mp->m_sb.sb_agcount;
+		xfs_fstrm_free_func(mp, mru);
+		return 0;
+	}
+
+out_default_agno:
+	if (xfs_is_inode32(mp)) {
+		xfs_agnumber_t	 rotorstep = xfs_rotorstep;
+		*agno = (mp->m_agfrotor / rotorstep) %
+				mp->m_sb.sb_agcount;
+		mp->m_agfrotor = (mp->m_agfrotor + 1) %
+				 (mp->m_sb.sb_agcount * rotorstep);
+		return 0;
+	}
+	*agno = XFS_INO_TO_AGNO(mp, pip->i_ino);
+	return 0;
+
+}
+
 /*
  * Search for an allocation group with a single extent large enough for
- * the request.  If one isn't found, then the largest available free extent is
- * returned as the best length possible.
+ * the request.  If one isn't found, then adjust the minimum allocation
+ * size to the largest space found.
  */
 int
 xfs_filestream_select_ag(
@@ -271,12 +346,10 @@ xfs_filestream_select_ag(
 	xfs_extlen_t		*blen)
 {
 	struct xfs_mount	*mp = ap->ip->i_mount;
-	struct xfs_perag	*pag;
 	struct xfs_inode	*pip = NULL;
-	xfs_agnumber_t		agno = NULLAGNUMBER;
-	struct xfs_mru_cache_elem *mru;
+	xfs_agnumber_t		agno;
 	int			flags = 0;
-	int			error = 0;
+	int			error;
 
 	args->total = ap->total;
 	*blen = 0;
@@ -287,48 +360,10 @@ xfs_filestream_select_ag(
 		goto out_select;
 	}
 
-	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
-	if (mru) {
-		agno = container_of(mru, struct xfs_fstrm_item, mru)->ag;
-		xfs_mru_cache_done(mp->m_filestream);
-		mru = NULL;
-
-		trace_xfs_filestream_lookup(mp, ap->ip->i_ino, agno);
-		xfs_irele(pip);
-
-		ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
-		xfs_bmap_adjacent(ap);
-
-		pag = xfs_perag_grab(mp, agno);
-		if (pag) {
-			error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
-			xfs_perag_rele(pag);
-			if (error) {
-				if (error != -EAGAIN)
-					goto out_error;
-				*blen = 0;
-			}
-		}
-		if (*blen >= args->maxlen)
-			goto out_select;
-	} else if (xfs_is_inode32(mp)) {
-		xfs_agnumber_t	 rotorstep = xfs_rotorstep;
-		agno = (mp->m_agfrotor / rotorstep) %
-				mp->m_sb.sb_agcount;
-		mp->m_agfrotor = (mp->m_agfrotor + 1) %
-				 (mp->m_sb.sb_agcount * rotorstep);
-	} else {
-		agno = XFS_INO_TO_AGNO(mp, pip->i_ino);
-	}
+	error = xfs_filestream_select_ag_mru(ap, args, pip, &agno, blen);
+	if (error || *blen >= args->maxlen)
+		goto out_rele;
 
-	/* Changing parent AG association now, so remove the existing one. */
-	mru = xfs_mru_cache_remove(mp->m_filestream, pip->i_ino);
-	if (mru) {
-		struct xfs_fstrm_item *item =
-			container_of(mru, struct xfs_fstrm_item, mru);
-		agno = (item->ag + 1) % mp->m_sb.sb_agcount;
-		xfs_fstrm_free_func(mp, mru);
-	}
 	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
 	xfs_bmap_adjacent(ap);
 
@@ -347,8 +382,6 @@ xfs_filestream_select_ag(
 
 	*blen = ap->length;
 	error = xfs_filestream_pick_ag(pip, &agno, flags, blen);
-	if (error)
-		goto out_error;
 	if (agno == NULLAGNUMBER) {
 		agno = 0;
 		*blen = 0;
@@ -356,7 +389,7 @@ xfs_filestream_select_ag(
 
 out_select:
 	ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
-out_error:
+out_rele:
 	xfs_irele(pip);
 	return error;
 
-- 
2.39.0



* [PATCH 38/42] xfs: track an active perag reference in filestreams
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (36 preceding siblings ...)
  2023-01-18 22:45 ` [PATCH 37/42] xfs: factor out MRU hit case in xfs_filestream_select_ag Dave Chinner
@ 2023-01-18 22:45 ` Dave Chinner
  2023-01-18 22:45 ` [PATCH 39/42] xfs: use for_each_perag_wrap in xfs_filestream_pick_ag Dave Chinner
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:45 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Rather than just track the agno of the reference, track a referenced
perag pointer instead. This will allow active filestreams to prevent
AGs from going away until the filestreams have been torn down.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_filestream.c | 100 +++++++++++++++++-----------------------
 1 file changed, 43 insertions(+), 57 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 147296a1079e..c92429272ff7 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -23,7 +23,7 @@
 
 struct xfs_fstrm_item {
 	struct xfs_mru_cache_elem	mru;
-	xfs_agnumber_t			ag; /* AG in use for this directory */
+	struct xfs_perag		*pag; /* AG in use for this directory */
 };
 
 enum xfs_fstrm_alloc {
@@ -50,43 +50,18 @@ xfs_filestream_peek_ag(
 	return ret;
 }
 
-static int
-xfs_filestream_get_ag(
-	xfs_mount_t	*mp,
-	xfs_agnumber_t	agno)
-{
-	struct xfs_perag *pag;
-	int		ret;
-
-	pag = xfs_perag_get(mp, agno);
-	ret = atomic_inc_return(&pag->pagf_fstrms);
-	xfs_perag_put(pag);
-	return ret;
-}
-
-static void
-xfs_filestream_put_ag(
-	xfs_mount_t	*mp,
-	xfs_agnumber_t	agno)
-{
-	struct xfs_perag *pag;
-
-	pag = xfs_perag_get(mp, agno);
-	atomic_dec(&pag->pagf_fstrms);
-	xfs_perag_put(pag);
-}
-
 static void
 xfs_fstrm_free_func(
 	void			*data,
 	struct xfs_mru_cache_elem *mru)
 {
-	struct xfs_mount	*mp = data;
 	struct xfs_fstrm_item	*item =
 		container_of(mru, struct xfs_fstrm_item, mru);
+	struct xfs_perag	*pag = item->pag;
 
-	xfs_filestream_put_ag(mp, item->ag);
-	trace_xfs_filestream_free(mp, mru->key, item->ag);
+	trace_xfs_filestream_free(pag->pag_mount, mru->key, pag->pag_agno);
+	atomic_dec(&pag->pagf_fstrms);
+	xfs_perag_rele(pag);
 
 	kmem_free(item);
 }
@@ -105,11 +80,11 @@ xfs_filestream_pick_ag(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_fstrm_item	*item;
 	struct xfs_perag	*pag;
+	struct xfs_perag	*max_pag = NULL;
 	xfs_extlen_t		minlen = *longest;
 	xfs_extlen_t		free = 0, minfree, maxfree = 0;
 	xfs_agnumber_t		startag = *agp;
 	xfs_agnumber_t		ag = startag;
-	xfs_agnumber_t		max_ag = NULLAGNUMBER;
 	int			err, trylock, nscan;
 
 	ASSERT(S_ISDIR(VFS_I(ip)->i_mode));
@@ -125,13 +100,16 @@ xfs_filestream_pick_ag(
 	for (nscan = 0; 1; nscan++) {
 		trace_xfs_filestream_scan(mp, ip->i_ino, ag);
 
-		pag = xfs_perag_get(mp, ag);
+		err = 0;
+		pag = xfs_perag_grab(mp, ag);
+		if (!pag)
+			goto next_ag;
 		*longest = 0;
 		err = xfs_bmap_longest_free_extent(pag, NULL, longest);
 		if (err) {
-			xfs_perag_put(pag);
+			xfs_perag_rele(pag);
 			if (err != -EAGAIN)
-				return err;
+				break;
 			/* Couldn't lock the AGF, skip this AG. */
 			goto next_ag;
 		}
@@ -139,7 +117,10 @@ xfs_filestream_pick_ag(
 		/* Keep track of the AG with the most free blocks. */
 		if (pag->pagf_freeblks > maxfree) {
 			maxfree = pag->pagf_freeblks;
-			max_ag = ag;
+			if (max_pag)
+				xfs_perag_rele(max_pag);
+			atomic_inc(&pag->pag_active_ref);
+			max_pag = pag;
 		}
 
 		/*
@@ -148,8 +129,9 @@ xfs_filestream_pick_ag(
 		 * loop, and it guards against two filestreams being established
 		 * in the same AG as each other.
 		 */
-		if (xfs_filestream_get_ag(mp, ag) > 1) {
-			xfs_filestream_put_ag(mp, ag);
+		if (atomic_inc_return(&pag->pagf_fstrms) > 1) {
+			atomic_dec(&pag->pagf_fstrms);
+			xfs_perag_rele(pag);
 			goto next_ag;
 		}
 
@@ -161,15 +143,12 @@ xfs_filestream_pick_ag(
 
 			/* Break out, retaining the reference on the AG. */
 			free = pag->pagf_freeblks;
-			xfs_perag_put(pag);
-			*agp = ag;
 			break;
 		}
 
 		/* Drop the reference on this AG, it's not usable. */
-		xfs_filestream_put_ag(mp, ag);
+		atomic_dec(&pag->pagf_fstrms);
 next_ag:
-		xfs_perag_put(pag);
 		/* Move to the next AG, wrapping to AG 0 if necessary. */
 		if (++ag >= mp->m_sb.sb_agcount)
 			ag = 0;
@@ -194,10 +173,10 @@ xfs_filestream_pick_ag(
 		 * Take the AG with the most free space, regardless of whether
 		 * it's already in use by another filestream.
 		 */
-		if (max_ag != NULLAGNUMBER) {
-			xfs_filestream_get_ag(mp, max_ag);
+		if (max_pag) {
+			pag = max_pag;
+			atomic_inc(&pag->pagf_fstrms);
 			free = maxfree;
-			*agp = max_ag;
 			break;
 		}
 
@@ -207,17 +186,26 @@ xfs_filestream_pick_ag(
 		return 0;
 	}
 
-	trace_xfs_filestream_pick(ip, *agp, free, nscan);
+	trace_xfs_filestream_pick(ip, pag ? pag->pag_agno : NULLAGNUMBER,
+			free, nscan);
 
-	if (*agp == NULLAGNUMBER)
+	if (max_pag)
+		xfs_perag_rele(max_pag);
+
+	if (err)
+		return err;
+
+	if (!pag) {
+		*agp = NULLAGNUMBER;
 		return 0;
+	}
 
 	err = -ENOMEM;
 	item = kmem_alloc(sizeof(*item), KM_MAYFAIL);
 	if (!item)
 		goto out_put_ag;
 
-	item->ag = *agp;
+	item->pag = pag;
 
 	err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru);
 	if (err) {
@@ -226,12 +214,14 @@ xfs_filestream_pick_ag(
 		goto out_free_item;
 	}
 
+	*agp = pag->pag_agno;
 	return 0;
 
 out_free_item:
 	kmem_free(item);
 out_put_ag:
-	xfs_filestream_put_ag(mp, *agp);
+	atomic_dec(&pag->pagf_fstrms);
+	xfs_perag_rele(pag);
 	return err;
 }
 
@@ -284,20 +274,15 @@ xfs_filestream_select_ag_mru(
 	if (!mru)
 		goto out_default_agno;
 
-	*agno = container_of(mru, struct xfs_fstrm_item, mru)->ag;
+	pag = container_of(mru, struct xfs_fstrm_item, mru)->pag;
 	xfs_mru_cache_done(mp->m_filestream);
 
-	trace_xfs_filestream_lookup(mp, ap->ip->i_ino, *agno);
+	trace_xfs_filestream_lookup(mp, ap->ip->i_ino, pag->pag_agno);
 
-	ap->blkno = XFS_AGB_TO_FSB(args->mp, *agno, 0);
+	ap->blkno = XFS_AGB_TO_FSB(args->mp, pag->pag_agno, 0);
 	xfs_bmap_adjacent(ap);
 
-	pag = xfs_perag_grab(mp, *agno);
-	if (!pag)
-		goto out_default_agno;
-
 	error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
-	xfs_perag_rele(pag);
 	if (error) {
 		if (error != -EAGAIN)
 			return error;
@@ -307,6 +292,7 @@ xfs_filestream_select_ag_mru(
 	/*
 	 * We are done if there's still enough contiguous free space to succeed.
 	 */
+	*agno = pag->pag_agno;
 	if (*blen >= args->maxlen)
 		return 0;
 
@@ -315,7 +301,7 @@ xfs_filestream_select_ag_mru(
 	if (mru) {
 		struct xfs_fstrm_item *item =
 			container_of(mru, struct xfs_fstrm_item, mru);
-		*agno = (item->ag + 1) % mp->m_sb.sb_agcount;
+		*agno = (item->pag->pag_agno + 1) % mp->m_sb.sb_agcount;
 		xfs_fstrm_free_func(mp, mru);
 		return 0;
 	}
-- 
2.39.0



* [PATCH 39/42] xfs: use for_each_perag_wrap in xfs_filestream_pick_ag
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (37 preceding siblings ...)
  2023-01-18 22:45 ` [PATCH 38/42] xfs: track an active perag reference in filestreams Dave Chinner
@ 2023-01-18 22:45 ` Dave Chinner
  2023-01-18 22:45 ` [PATCH 40/42] xfs: pass perag to filestreams tracing Dave Chinner
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:45 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_filestream_pick_ag() is now ready to be reworked to use
for_each_perag_wrap() to iterate the perags during the AG
selection scan.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_filestream.c | 101 ++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 60 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index c92429272ff7..71fa44485a2f 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -83,9 +83,9 @@ xfs_filestream_pick_ag(
 	struct xfs_perag	*max_pag = NULL;
 	xfs_extlen_t		minlen = *longest;
 	xfs_extlen_t		free = 0, minfree, maxfree = 0;
-	xfs_agnumber_t		startag = *agp;
-	xfs_agnumber_t		ag = startag;
-	int			err, trylock, nscan;
+	xfs_agnumber_t		start_agno = *agp;
+	xfs_agnumber_t		agno;
+	int			err, trylock;
 
 	ASSERT(S_ISDIR(VFS_I(ip)->i_mode));
 
@@ -97,13 +97,9 @@ xfs_filestream_pick_ag(
 	/* For the first pass, don't sleep trying to init the per-AG. */
 	trylock = XFS_ALLOC_FLAG_TRYLOCK;
 
-	for (nscan = 0; 1; nscan++) {
-		trace_xfs_filestream_scan(mp, ip->i_ino, ag);
-
-		err = 0;
-		pag = xfs_perag_grab(mp, ag);
-		if (!pag)
-			goto next_ag;
+restart:
+	for_each_perag_wrap(mp, start_agno, agno, pag) {
+		trace_xfs_filestream_scan(mp, ip->i_ino, agno);
 		*longest = 0;
 		err = xfs_bmap_longest_free_extent(pag, NULL, longest);
 		if (err) {
@@ -111,6 +107,7 @@ xfs_filestream_pick_ag(
 			if (err != -EAGAIN)
 				break;
 			/* Couldn't lock the AGF, skip this AG. */
+			err = 0;
 			goto next_ag;
 		}
 
@@ -129,77 +126,61 @@ xfs_filestream_pick_ag(
 		 * loop, and it guards against two filestreams being established
 		 * in the same AG as each other.
 		 */
-		if (atomic_inc_return(&pag->pagf_fstrms) > 1) {
-			atomic_dec(&pag->pagf_fstrms);
-			xfs_perag_rele(pag);
-			goto next_ag;
-		}
-
-		if (((minlen && *longest >= minlen) ||
-		     (!minlen && pag->pagf_freeblks >= minfree)) &&
-		    (!xfs_perag_prefers_metadata(pag) ||
-		     !(flags & XFS_PICK_USERDATA) ||
-		     (flags & XFS_PICK_LOWSPACE))) {
-
-			/* Break out, retaining the reference on the AG. */
-			free = pag->pagf_freeblks;
-			break;
+		if (atomic_inc_return(&pag->pagf_fstrms) <= 1) {
+			if (((minlen && *longest >= minlen) ||
+			     (!minlen && pag->pagf_freeblks >= minfree)) &&
+			    (!xfs_perag_prefers_metadata(pag) ||
+			     !(flags & XFS_PICK_USERDATA) ||
+			     (flags & XFS_PICK_LOWSPACE))) {
+				/* Break out, retaining the reference on the AG. */
+				free = pag->pagf_freeblks;
+				break;
+			}
 		}
 
 		/* Drop the reference on this AG, it's not usable. */
 		atomic_dec(&pag->pagf_fstrms);
-next_ag:
-		/* Move to the next AG, wrapping to AG 0 if necessary. */
-		if (++ag >= mp->m_sb.sb_agcount)
-			ag = 0;
+	}
 
-		/* If a full pass of the AGs hasn't been done yet, continue. */
-		if (ag != startag)
-			continue;
+	if (err) {
+		xfs_perag_rele(pag);
+		if (max_pag)
+			xfs_perag_rele(max_pag);
+		return err;
+	}
 
+	if (!pag) {
 		/* Allow sleeping in xfs_alloc_read_agf() on the 2nd pass. */
-		if (trylock != 0) {
+		if (trylock) {
 			trylock = 0;
-			continue;
+			goto restart;
 		}
 
 		/* Finally, if lowspace wasn't set, set it for the 3rd pass. */
 		if (!(flags & XFS_PICK_LOWSPACE)) {
 			flags |= XFS_PICK_LOWSPACE;
-			continue;
+			goto restart;
 		}
 
 		/*
-		 * Take the AG with the most free space, regardless of whether
-		 * it's already in use by another filestream.
+		 * No unassociated AGs are available, so select the AG with the
+		 * most free space, regardless of whether it's already in use by
+		 * another filestream. If none suit, return NULLAGNUMBER.
 		 */
-		if (max_pag) {
-			pag = max_pag;
-			atomic_inc(&pag->pagf_fstrms);
-			free = maxfree;
-			break;
+		if (!max_pag) {
+			*agp = NULLAGNUMBER;
+			trace_xfs_filestream_pick(ip, *agp, free, 0);
+			return 0;
 		}
-
-		/* take AG 0 if none matched */
-		trace_xfs_filestream_pick(ip, *agp, free, nscan);
-		*agp = 0;
-		return 0;
-	}
-
-	trace_xfs_filestream_pick(ip, pag ? pag->pag_agno : NULLAGNUMBER,
-			free, nscan);
-
-	if (max_pag)
+		pag = max_pag;
+		free = maxfree;
+		atomic_inc(&pag->pagf_fstrms);
+	} else if (max_pag) {
 		xfs_perag_rele(max_pag);
-
-	if (err)
-		return err;
-
-	if (!pag) {
-		*agp = NULLAGNUMBER;
-		return 0;
 	}
 
+	trace_xfs_filestream_pick(ip, pag->pag_agno, free, 0);
+
 	err = -ENOMEM;
 	item = kmem_alloc(sizeof(*item), KM_MAYFAIL);
 	if (!item)
-- 
2.39.0



* [PATCH 40/42] xfs: pass perag to filestreams tracing
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (38 preceding siblings ...)
  2023-01-18 22:45 ` [PATCH 39/42] xfs: use for_each_perag_wrap in xfs_filestream_pick_ag Dave Chinner
@ 2023-01-18 22:45 ` Dave Chinner
  2023-01-18 22:45 ` [PATCH 41/42] xfs: return a referenced perag from filestreams allocator Dave Chinner
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:45 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Pass perags instead of raw ag numbers, avoiding the need for the
special peek function for the tracing code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_filestream.c | 29 +++++------------------------
 fs/xfs/xfs_filestream.h |  1 -
 fs/xfs/xfs_trace.h      | 37 ++++++++++++++++++++-----------------
 3 files changed, 25 insertions(+), 42 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 71fa44485a2f..81aebe3e09ba 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -31,25 +31,6 @@ enum xfs_fstrm_alloc {
 	XFS_PICK_LOWSPACE = 2,
 };
 
-/*
- * Allocation group filestream associations are tracked with per-ag atomic
- * counters.  These counters allow xfs_filestream_pick_ag() to tell whether a
- * particular AG already has active filestreams associated with it.
- */
-int
-xfs_filestream_peek_ag(
-	xfs_mount_t	*mp,
-	xfs_agnumber_t	agno)
-{
-	struct xfs_perag *pag;
-	int		ret;
-
-	pag = xfs_perag_get(mp, agno);
-	ret = atomic_read(&pag->pagf_fstrms);
-	xfs_perag_put(pag);
-	return ret;
-}
-
 static void
 xfs_fstrm_free_func(
 	void			*data,
@@ -59,7 +40,7 @@ xfs_fstrm_free_func(
 		container_of(mru, struct xfs_fstrm_item, mru);
 	struct xfs_perag	*pag = item->pag;
 
-	trace_xfs_filestream_free(pag->pag_mount, mru->key, pag->pag_agno);
+	trace_xfs_filestream_free(pag, mru->key);
 	atomic_dec(&pag->pagf_fstrms);
 	xfs_perag_rele(pag);
 
@@ -99,7 +80,7 @@ xfs_filestream_pick_ag(
 
 restart:
 	for_each_perag_wrap(mp, start_agno, agno, pag) {
-		trace_xfs_filestream_scan(mp, ip->i_ino, agno);
+		trace_xfs_filestream_scan(pag, ip->i_ino);
 		*longest = 0;
 		err = xfs_bmap_longest_free_extent(pag, NULL, longest);
 		if (err) {
@@ -169,7 +150,7 @@ xfs_filestream_pick_ag(
 		 */
 		if (!max_pag) {
 			*agp = NULLAGNUMBER;
-			trace_xfs_filestream_pick(ip, *agp, free, 0);
+			trace_xfs_filestream_pick(ip, NULL, free);
 			return 0;
 		}
 		pag = max_pag;
@@ -179,7 +160,7 @@ xfs_filestream_pick_ag(
 		xfs_perag_rele(max_pag);
 	}
 
-	trace_xfs_filestream_pick(ip, pag->pag_agno, free, 0);
+	trace_xfs_filestream_pick(ip, pag, free);
 
 	err = -ENOMEM;
 	item = kmem_alloc(sizeof(*item), KM_MAYFAIL);
@@ -258,7 +239,7 @@ xfs_filestream_select_ag_mru(
 	pag = container_of(mru, struct xfs_fstrm_item, mru)->pag;
 	xfs_mru_cache_done(mp->m_filestream);
 
-	trace_xfs_filestream_lookup(mp, ap->ip->i_ino, pag->pag_agno);
+	trace_xfs_filestream_lookup(pag, ap->ip->i_ino);
 
 	ap->blkno = XFS_AGB_TO_FSB(args->mp, pag->pag_agno, 0);
 	xfs_bmap_adjacent(ap);
diff --git a/fs/xfs/xfs_filestream.h b/fs/xfs/xfs_filestream.h
index df9f7553e106..84149ed0e340 100644
--- a/fs/xfs/xfs_filestream.h
+++ b/fs/xfs/xfs_filestream.h
@@ -14,7 +14,6 @@ struct xfs_alloc_arg;
 int xfs_filestream_mount(struct xfs_mount *mp);
 void xfs_filestream_unmount(struct xfs_mount *mp);
 void xfs_filestream_deassociate(struct xfs_inode *ip);
-int xfs_filestream_peek_ag(struct xfs_mount *mp, xfs_agnumber_t agno);
 int xfs_filestream_select_ag(struct xfs_bmalloca *ap,
 		struct xfs_alloc_arg *args, xfs_extlen_t *blen);
 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 3b25b10fccc1..b5f7d225d5b4 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -74,6 +74,7 @@ struct xfs_inobt_rec_incore;
 union xfs_btree_ptr;
 struct xfs_dqtrx;
 struct xfs_icwalk;
+struct xfs_perag;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -638,8 +639,8 @@ DEFINE_BUF_ITEM_EVENT(xfs_trans_bhold_release);
 DEFINE_BUF_ITEM_EVENT(xfs_trans_binval);
 
 DECLARE_EVENT_CLASS(xfs_filestream_class,
-	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno),
-	TP_ARGS(mp, ino, agno),
+	TP_PROTO(struct xfs_perag *pag, xfs_ino_t ino),
+	TP_ARGS(pag, ino),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
@@ -647,10 +648,10 @@ DECLARE_EVENT_CLASS(xfs_filestream_class,
 		__field(int, streams)
 	),
 	TP_fast_assign(
-		__entry->dev = mp->m_super->s_dev;
+		__entry->dev = pag->pag_mount->m_super->s_dev;
 		__entry->ino = ino;
-		__entry->agno = agno;
-		__entry->streams = xfs_filestream_peek_ag(mp, agno);
+		__entry->agno = pag->pag_agno;
+		__entry->streams = atomic_read(&pag->pagf_fstrms);
 	),
 	TP_printk("dev %d:%d ino 0x%llx agno 0x%x streams %d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
@@ -660,39 +661,41 @@ DECLARE_EVENT_CLASS(xfs_filestream_class,
 )
 #define DEFINE_FILESTREAM_EVENT(name) \
 DEFINE_EVENT(xfs_filestream_class, name, \
-	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino, xfs_agnumber_t agno), \
-	TP_ARGS(mp, ino, agno))
+	TP_PROTO(struct xfs_perag *pag, xfs_ino_t ino), \
+	TP_ARGS(pag, ino))
 DEFINE_FILESTREAM_EVENT(xfs_filestream_free);
 DEFINE_FILESTREAM_EVENT(xfs_filestream_lookup);
 DEFINE_FILESTREAM_EVENT(xfs_filestream_scan);
 
 TRACE_EVENT(xfs_filestream_pick,
-	TP_PROTO(struct xfs_inode *ip, xfs_agnumber_t agno,
-		 xfs_extlen_t free, int nscan),
-	TP_ARGS(ip, agno, free, nscan),
+	TP_PROTO(struct xfs_inode *ip, struct xfs_perag *pag,
+		 xfs_extlen_t free),
+	TP_ARGS(ip, pag, free),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
 		__field(xfs_agnumber_t, agno)
 		__field(int, streams)
 		__field(xfs_extlen_t, free)
-		__field(int, nscan)
 	),
 	TP_fast_assign(
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
 		__entry->ino = ip->i_ino;
-		__entry->agno = agno;
-		__entry->streams = xfs_filestream_peek_ag(ip->i_mount, agno);
+		if (pag) {
+			__entry->agno = pag->pag_agno;
+			__entry->streams = atomic_read(&pag->pagf_fstrms);
+		} else {
+			__entry->agno = NULLAGNUMBER;
+			__entry->streams = 0;
+		}
 		__entry->free = free;
-		__entry->nscan = nscan;
 	),
-	TP_printk("dev %d:%d ino 0x%llx agno 0x%x streams %d free %d nscan %d",
+	TP_printk("dev %d:%d ino 0x%llx agno 0x%x streams %d free %d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->agno,
 		  __entry->streams,
-		  __entry->free,
-		  __entry->nscan)
+		  __entry->free)
 );
 
 DECLARE_EVENT_CLASS(xfs_lock_class,
-- 
2.39.0



* [PATCH 41/42] xfs: return a referenced perag from filestreams allocator
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (39 preceding siblings ...)
  2023-01-18 22:45 ` [PATCH 40/42] xfs: pass perag to filestreams tracing Dave Chinner
@ 2023-01-18 22:45 ` Dave Chinner
  2023-02-02  0:01   ` Darrick J. Wong
  2023-01-18 22:45 ` [PATCH 42/42] xfs: refactor the filestreams allocator pick functions Dave Chinner
  2023-02-02  0:14 ` [PATCH 00/42] xfs: per-ag centric allocation alogrithms Darrick J. Wong
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:45 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Now that the filestreams AG selection tracks active perags, we need
to return an active perag to the core allocator code. This is
because the allocations the filestreams code will run are AG
specific, and so we need to pin the AG until those allocations
complete.

We cannot rely on the filestreams item reference to do this - the
filestreams association can be torn down at any time, hence we
need to have a separate reference for the allocation process to pin
the AG after it has been selected.

This means there is some perag juggling in allocation failure
fallback paths, as they will scan all AGs when the AG specific
allocation fails. Hence we need to track the perag
reference that the filestream allocator returned to make sure we
don't leak it on repeated allocation failure.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 38 +++++++++++-----
 fs/xfs/xfs_filestream.c  | 93 ++++++++++++++++++++++++----------------
 2 files changed, 84 insertions(+), 47 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 098b46f3f3e3..7f56002b545d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3427,6 +3427,7 @@ xfs_bmap_btalloc_at_eof(
 	bool			ag_only)
 {
 	struct xfs_mount	*mp = args->mp;
+	struct xfs_perag	*caller_pag = args->pag;
 	int			error;
 
 	/*
@@ -3454,9 +3455,11 @@ xfs_bmap_btalloc_at_eof(
 		else
 			args->minalignslop = 0;
 
-		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
+		if (!caller_pag)
+			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
 		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
-		xfs_perag_put(args->pag);
+		if (!caller_pag)
+			xfs_perag_put(args->pag);
 		if (error)
 			return error;
 
@@ -3482,10 +3485,13 @@ xfs_bmap_btalloc_at_eof(
 		args->minalignslop = 0;
 	}
 
-	if (ag_only)
+	if (ag_only) {
 		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
-	else
+	} else {
+		args->pag = NULL;
 		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
+		args->pag = caller_pag;
+	}
 	if (error)
 		return error;
 
@@ -3544,12 +3550,13 @@ xfs_bmap_btalloc_filestreams(
 	int			stripe_align)
 {
 	xfs_extlen_t		blen = 0;
-	int			error;
+	int			error = 0;
 
 
 	error = xfs_filestream_select_ag(ap, args, &blen);
 	if (error)
 		return error;
+	ASSERT(args->pag);
 
 	/*
 	 * If we are in low space mode, then optimal allocation will fail so
@@ -3558,22 +3565,31 @@ xfs_bmap_btalloc_filestreams(
 	 */
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
 		args->minlen = ap->minlen;
+		ASSERT(args->fsbno == NULLFSBLOCK);
 		goto out_low_space;
 	}
 
 	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
-	if (ap->aeof) {
+	if (ap->aeof)
 		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
 				true);
-		if (error || args->fsbno != NULLFSBLOCK)
-			return error;
-	}
 
-	error = xfs_alloc_vextent_near_bno(args, ap->blkno);
+	if (!error && args->fsbno == NULLFSBLOCK)
+		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
+
+out_low_space:
+	/*
+	 * We are now done with the perag reference for the filestreams
+	 * association provided by xfs_filestream_select_ag(). Release it now as
+	 * we've either succeeded, had a fatal error or we are out of space and
+	 * need to do a full filesystem scan for free space which will take its
+	 * own references.
+	 */
+	xfs_perag_rele(args->pag);
+	args->pag = NULL;
 	if (error || args->fsbno != NULLFSBLOCK)
 		return error;
 
-out_low_space:
 	return xfs_bmap_btalloc_low_space(ap, args);
 }
 
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 81aebe3e09ba..523a3b8b5754 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -53,8 +53,9 @@ xfs_fstrm_free_func(
  */
 static int
 xfs_filestream_pick_ag(
+	struct xfs_alloc_arg	*args,
 	struct xfs_inode	*ip,
-	xfs_agnumber_t		*agp,
+	xfs_agnumber_t		start_agno,
 	int			flags,
 	xfs_extlen_t		*longest)
 {
@@ -64,7 +65,6 @@ xfs_filestream_pick_ag(
 	struct xfs_perag	*max_pag = NULL;
 	xfs_extlen_t		minlen = *longest;
 	xfs_extlen_t		free = 0, minfree, maxfree = 0;
-	xfs_agnumber_t		start_agno = *agp;
 	xfs_agnumber_t		agno;
 	int			err, trylock;
 
@@ -73,8 +73,6 @@ xfs_filestream_pick_ag(
 	/* 2% of an AG's blocks must be free for it to be chosen. */
 	minfree = mp->m_sb.sb_agblocks / 50;
 
-	*agp = NULLAGNUMBER;
-
 	/* For the first pass, don't sleep trying to init the per-AG. */
 	trylock = XFS_ALLOC_FLAG_TRYLOCK;
 
@@ -89,7 +87,7 @@ xfs_filestream_pick_ag(
 				break;
 			/* Couldn't lock the AGF, skip this AG. */
 			err = 0;
-			goto next_ag;
+			continue;
 		}
 
 		/* Keep track of the AG with the most free blocks. */
@@ -146,16 +144,19 @@ xfs_filestream_pick_ag(
 		/*
 		 * No unassociated AGs are available, so select the AG with the
 		 * most free space, regardless of whether it's already in use by
-		 * another filestream. If none suit, return NULLAGNUMBER.
+		 * another filestream. If none suit, just use whatever AG we can
+		 * grab.
 		 */
 		if (!max_pag) {
-			*agp = NULLAGNUMBER;
-			trace_xfs_filestream_pick(ip, NULL, free);
-			return 0;
+			for_each_perag_wrap(mp, start_agno, agno, pag)
+				break;
+			atomic_inc(&pag->pagf_fstrms);
+			*longest = 0;
+		} else {
+			pag = max_pag;
+			free = maxfree;
+			atomic_inc(&pag->pagf_fstrms);
 		}
-		pag = max_pag;
-		free = maxfree;
-		atomic_inc(&pag->pagf_fstrms);
 	} else if (max_pag) {
 		xfs_perag_rele(max_pag);
 	}
@@ -167,16 +168,29 @@ xfs_filestream_pick_ag(
 	if (!item)
 		goto out_put_ag;
 
+
+	/*
+	 * We are going to use this perag now, so take another ref to it for the
+	 * allocation context returned to the caller. If we raced to create and
+	 * insert the filestreams item into the MRU (-EEXIST), then we still
+	 * keep this reference but free the item reference we gained above. On
+	 * any other failure, we have to drop both.
+	 */
+	atomic_inc(&pag->pag_active_ref);
 	item->pag = pag;
+	args->pag = pag;
 
 	err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru);
 	if (err) {
-		if (err == -EEXIST)
+		if (err == -EEXIST) {
 			err = 0;
+		} else {
+			xfs_perag_rele(args->pag);
+			args->pag = NULL;
+		}
 		goto out_free_item;
 	}
 
-	*agp = pag->pag_agno;
 	return 0;
 
 out_free_item:
@@ -236,7 +250,14 @@ xfs_filestream_select_ag_mru(
 	if (!mru)
 		goto out_default_agno;
 
+	/*
+	 * Grab the pag and take an extra active reference for the caller whilst
+	 * the mru item cannot go away. This means we'll pin the perag with
+	 * the reference we get here even if the filestreams association is torn
+	 * down immediately after we mark the lookup as done.
+	 */
 	pag = container_of(mru, struct xfs_fstrm_item, mru)->pag;
+	atomic_inc(&pag->pag_active_ref);
 	xfs_mru_cache_done(mp->m_filestream);
 
 	trace_xfs_filestream_lookup(pag, ap->ip->i_ino);
@@ -246,6 +267,8 @@ xfs_filestream_select_ag_mru(
 
 	error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
 	if (error) {
+		/* We aren't going to use this perag */
+		xfs_perag_rele(pag);
 		if (error != -EAGAIN)
 			return error;
 		*blen = 0;
@@ -253,12 +276,18 @@ xfs_filestream_select_ag_mru(
 
 	/*
 	 * We are done if there's still enough contiguous free space to succeed.
+	 * If there is very little free space before we start a filestreams
+	 * allocation, we're almost guaranteed to fail to find a better AG with
+	 * larger free space available so we don't even try.
 	 */
 	*agno = pag->pag_agno;
-	if (*blen >= args->maxlen)
+	if (*blen >= args->maxlen || (ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
+		args->pag = pag;
 		return 0;
+	}
 
 	/* Changing parent AG association now, so remove the existing one. */
+	xfs_perag_rele(pag);
 	mru = xfs_mru_cache_remove(mp->m_filestream, pip->i_ino);
 	if (mru) {
 		struct xfs_fstrm_item *item =
@@ -297,46 +326,38 @@ xfs_filestream_select_ag(
 	struct xfs_inode	*pip = NULL;
 	xfs_agnumber_t		agno;
 	int			flags = 0;
-	int			error;
+	int			error = 0;
 
 	args->total = ap->total;
 	*blen = 0;
 
 	pip = xfs_filestream_get_parent(ap->ip);
 	if (!pip) {
-		agno = 0;
-		goto out_select;
+		ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
+		return 0;
 	}
 
 	error = xfs_filestream_select_ag_mru(ap, args, pip, &agno, blen);
-	if (error || *blen >= args->maxlen)
+	if (error)
 		goto out_rele;
-
-	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
-	xfs_bmap_adjacent(ap);
-
-	/*
-	 * If there is very little free space before we start a filestreams
-	 * allocation, we're almost guaranteed to fail to find a better AG with
-	 * larger free space available so we don't even try.
-	 */
+	if (*blen >= args->maxlen)
+		goto out_select;
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
 		goto out_select;
 
+	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
+	xfs_bmap_adjacent(ap);
+	*blen = ap->length;
 	if (ap->datatype & XFS_ALLOC_USERDATA)
 		flags |= XFS_PICK_USERDATA;
 	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
 		flags |= XFS_PICK_LOWSPACE;
 
-	*blen = ap->length;
-	error = xfs_filestream_pick_ag(pip, &agno, flags, blen);
-	if (agno == NULLAGNUMBER) {
-		agno = 0;
-		*blen = 0;
-	}
-
+	error = xfs_filestream_pick_ag(args, pip, agno, flags, blen);
+	if (error)
+		goto out_rele;
 out_select:
-	ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
+	ap->blkno = XFS_AGB_TO_FSB(mp, args->pag->pag_agno, 0);
 out_rele:
 	xfs_irele(pip);
 	return error;
-- 
2.39.0



* [PATCH 42/42] xfs: refactor the filestreams allocator pick functions
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
                   ` (40 preceding siblings ...)
  2023-01-18 22:45 ` [PATCH 41/42] xfs: return a referenced perag from filestreams allocator Dave Chinner
@ 2023-01-18 22:45 ` Dave Chinner
  2023-02-02  0:08   ` Darrick J. Wong
  2023-02-02  0:14 ` [PATCH 00/42] xfs: per-ag centric allocation alogrithms Darrick J. Wong
  42 siblings, 1 reply; 77+ messages in thread
From: Dave Chinner @ 2023-01-18 22:45 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Now that the filestreams allocator is largely rewritten,
restructure the main entry point and pick function to separate out
the different operations cleanly. The MRU lookup function should not
handle the start AG selection on MRU lookup failure, nor should
the pick function handle building the association that is inserted
into the MRU.

This leaves the filestreams allocator fairly clean and easy to
understand, returning to the caller with an active perag reference
and a target block to allocate at.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_filestream.c | 247 +++++++++++++++++++++-------------------
 fs/xfs/xfs_trace.h      |   9 +-
 2 files changed, 132 insertions(+), 124 deletions(-)

diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 523a3b8b5754..0a1d316ebdba 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -48,19 +48,19 @@ xfs_fstrm_free_func(
 }
 
 /*
- * Scan the AGs starting at startag looking for an AG that isn't in use and has
- * at least minlen blocks free.
+ * Scan the AGs starting at start_agno looking for an AG that isn't in use and
+ * has at least minlen blocks free. If no AG is found to match the allocation
+ * requirements, pick the AG with the most free space in it.
  */
 static int
 xfs_filestream_pick_ag(
 	struct xfs_alloc_arg	*args,
-	struct xfs_inode	*ip,
+	xfs_ino_t		pino,
 	xfs_agnumber_t		start_agno,
 	int			flags,
 	xfs_extlen_t		*longest)
 {
-	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_fstrm_item	*item;
+	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*pag;
 	struct xfs_perag	*max_pag = NULL;
 	xfs_extlen_t		minlen = *longest;
@@ -68,8 +68,6 @@ xfs_filestream_pick_ag(
 	xfs_agnumber_t		agno;
 	int			err, trylock;
 
-	ASSERT(S_ISDIR(VFS_I(ip)->i_mode));
-
 	/* 2% of an AG's blocks must be free for it to be chosen. */
 	minfree = mp->m_sb.sb_agblocks / 50;
 
@@ -78,7 +76,7 @@ xfs_filestream_pick_ag(
 
 restart:
 	for_each_perag_wrap(mp, start_agno, agno, pag) {
-		trace_xfs_filestream_scan(pag, ip->i_ino);
+		trace_xfs_filestream_scan(pag, pino);
 		*longest = 0;
 		err = xfs_bmap_longest_free_extent(pag, NULL, longest);
 		if (err) {
@@ -148,9 +146,9 @@ xfs_filestream_pick_ag(
 		 * grab.
 		 */
 		if (!max_pag) {
-			for_each_perag_wrap(mp, start_agno, agno, pag)
+			for_each_perag_wrap(args->mp, 0, start_agno, args->pag)
 				break;
-			atomic_inc(&pag->pagf_fstrms);
+			atomic_inc(&args->pag->pagf_fstrms);
 			*longest = 0;
 		} else {
 			pag = max_pag;
@@ -161,44 +159,10 @@ xfs_filestream_pick_ag(
 		xfs_perag_rele(max_pag);
 	}
 
-	trace_xfs_filestream_pick(ip, pag, free);
-
-	err = -ENOMEM;
-	item = kmem_alloc(sizeof(*item), KM_MAYFAIL);
-	if (!item)
-		goto out_put_ag;
-
-
-	/*
-	 * We are going to use this perag now, so take another ref to it for the
-	 * allocation context returned to the caller. If we raced to create and
-	 * insert the filestreams item into the MRU (-EEXIST), then we still
-	 * keep this reference but free the item reference we gained above. On
-	 * any other failure, we have to drop both.
-	 */
-	atomic_inc(&pag->pag_active_ref);
-	item->pag = pag;
+	trace_xfs_filestream_pick(pag, pino, free);
 	args->pag = pag;
-
-	err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru);
-	if (err) {
-		if (err == -EEXIST) {
-			err = 0;
-		} else {
-			xfs_perag_rele(args->pag);
-			args->pag = NULL;
-		}
-		goto out_free_item;
-	}
-
 	return 0;
 
-out_free_item:
-	kmem_free(item);
-out_put_ag:
-	atomic_dec(&pag->pagf_fstrms);
-	xfs_perag_rele(pag);
-	return err;
 }
 
 static struct xfs_inode *
@@ -227,29 +191,29 @@ xfs_filestream_get_parent(
 
 /*
  * Lookup the mru cache for an existing association. If one exists and we can
- * use it, return with the agno and blen indicating that the allocation will
- * proceed with that association.
+ * use it, return with an active perag reference indicating that the allocation
+ * will proceed with that association.
  *
  * If we have no association, or we cannot use the current one and have to
- * destroy it, return with blen = 0 and agno pointing at the next agno to try.
+ * destroy it, return with longest = 0 to tell the caller to create a new
+ * association.
  */
-int
-xfs_filestream_select_ag_mru(
+static int
+xfs_filestream_lookup_association(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
-	struct xfs_inode	*pip,
-	xfs_agnumber_t		*agno,
-	xfs_extlen_t		*blen)
+	xfs_ino_t		pino,
+	xfs_extlen_t		*longest)
 {
-	struct xfs_mount	*mp = ap->ip->i_mount;
+	struct xfs_mount	*mp = args->mp;
 	struct xfs_perag	*pag;
 	struct xfs_mru_cache_elem *mru;
-	int			error;
+	int			error = 0;
 
-	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
+	*longest = 0;
+	mru = xfs_mru_cache_lookup(mp->m_filestream, pino);
 	if (!mru)
-		goto out_default_agno;
-
+		return 0;
 	/*
 	 * Grab the pag and take an extra active reference for the caller whilst
 	 * the mru item cannot go away. This means we'll pin the perag with
@@ -265,103 +229,148 @@ xfs_filestream_select_ag_mru(
 	ap->blkno = XFS_AGB_TO_FSB(args->mp, pag->pag_agno, 0);
 	xfs_bmap_adjacent(ap);
 
-	error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
-	if (error) {
-		/* We aren't going to use this perag */
-		xfs_perag_rele(pag);
-		if (error != -EAGAIN)
-			return error;
-		*blen = 0;
-	}
-
 	/*
-	 * We are done if there's still enough contiguous free space to succeed.
 	 * If there is very little free space before we start a filestreams
-	 * allocation, we're almost guaranteed to fail to find a better AG with
-	 * larger free space available so we don't even try.
+	 * allocation, we're almost guaranteed to fail to find a large enough
+	 * free space, so just use the cached AG.
 	 */
-	*agno = pag->pag_agno;
-	if (*blen >= args->maxlen || (ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
-		args->pag = pag;
-		return 0;
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
+		*longest = 1;
+		goto out_done;
 	}
 
+	error = xfs_bmap_longest_free_extent(pag, args->tp, longest);
+	if (error == -EAGAIN)
+		error = 0;
+	if (error || *longest < args->maxlen) {
+		/* We aren't going to use this perag */
+		*longest = 0;
+		xfs_perag_rele(pag);
+		return error;
+	}
+
+out_done:
+	args->pag = pag;
+	return 0;
+}
+
+static int
+xfs_filestream_create_association(
+	struct xfs_bmalloca	*ap,
+	struct xfs_alloc_arg	*args,
+	xfs_ino_t		pino,
+	xfs_extlen_t		*longest)
+{
+	struct xfs_mount	*mp = args->mp;
+	struct xfs_mru_cache_elem *mru;
+	struct xfs_fstrm_item	*item;
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, pino);
+	int			flags = 0;
+	int			error;
+
 	/* Changing parent AG association now, so remove the existing one. */
-	xfs_perag_rele(pag);
-	mru = xfs_mru_cache_remove(mp->m_filestream, pip->i_ino);
+	mru = xfs_mru_cache_remove(mp->m_filestream, pino);
 	if (mru) {
 		struct xfs_fstrm_item *item =
 			container_of(mru, struct xfs_fstrm_item, mru);
-		*agno = (item->pag->pag_agno + 1) % mp->m_sb.sb_agcount;
-		xfs_fstrm_free_func(mp, mru);
-		return 0;
-	}
 
-out_default_agno:
-	if (xfs_is_inode32(mp)) {
+		agno = (item->pag->pag_agno + 1) % mp->m_sb.sb_agcount;
+		xfs_fstrm_free_func(mp, mru);
+	} else if (xfs_is_inode32(mp)) {
 		xfs_agnumber_t	 rotorstep = xfs_rotorstep;
-		*agno = (mp->m_agfrotor / rotorstep) %
-				mp->m_sb.sb_agcount;
+
+		agno = (mp->m_agfrotor / rotorstep) % mp->m_sb.sb_agcount;
 		mp->m_agfrotor = (mp->m_agfrotor + 1) %
 				 (mp->m_sb.sb_agcount * rotorstep);
-		return 0;
 	}
-	*agno = XFS_INO_TO_AGNO(mp, pip->i_ino);
+
+	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
+	xfs_bmap_adjacent(ap);
+
+	if (ap->datatype & XFS_ALLOC_USERDATA)
+		flags |= XFS_PICK_USERDATA;
+	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
+		flags |= XFS_PICK_LOWSPACE;
+
+	*longest = ap->length;
+	error = xfs_filestream_pick_ag(args, pino, agno, flags, longest);
+	if (error)
+		return error;
+
+	/*
+	 * We are going to use this perag now, so create an association for it.
+	 * xfs_filestream_pick_ag() has already bumped the perag fstrms counter
+	 * for us, so all we need to do here is take another active reference to
+	 * the perag for the cached association.
+	 *
+	 * If we fail to store the association, we need to drop the fstrms
+	 * counter as well as drop the perag reference we take here for the
+	 * item. We do not need to return an error for this failure - as long as
+	 * we return a referenced AG, the allocation can still go ahead just
+	 * fine.
+	 */
+	item = kmem_alloc(sizeof(*item), KM_MAYFAIL);
+	if (!item)
+		goto out_put_fstrms;
+
+	atomic_inc(&args->pag->pag_active_ref);
+	item->pag = args->pag;
+	error = xfs_mru_cache_insert(mp->m_filestream, pino, &item->mru);
+	if (error)
+		goto out_free_item;
 	return 0;
 
+out_free_item:
+	xfs_perag_rele(item->pag);
+	kmem_free(item);
+out_put_fstrms:
+	atomic_dec(&args->pag->pagf_fstrms);
+	return 0;
 }
 
 /*
  * Search for an allocation group with a single extent large enough for
- * the request.  If one isn't found, then adjust the minimum allocation
- * size to the largest space found.
+ * the request. First we look for an existing association and use that if it
+ * is found. Otherwise, we create a new association by selecting an AG that fits
+ * the allocation criteria.
+ *
+ * We return with a referenced perag in args->pag to indicate which AG we are
+ * allocating into or an error with no references held.
  */
 int
 xfs_filestream_select_ag(
 	struct xfs_bmalloca	*ap,
 	struct xfs_alloc_arg	*args,
-	xfs_extlen_t		*blen)
+	xfs_extlen_t		*longest)
 {
-	struct xfs_mount	*mp = ap->ip->i_mount;
-	struct xfs_inode	*pip = NULL;
-	xfs_agnumber_t		agno;
-	int			flags = 0;
+	struct xfs_mount	*mp = args->mp;
+	struct xfs_inode	*pip;
+	xfs_ino_t		ino = 0;
 	int			error = 0;
 
+	*longest = 0;
 	args->total = ap->total;
-	*blen = 0;
-
 	pip = xfs_filestream_get_parent(ap->ip);
-	if (!pip) {
-		ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
-		return 0;
+	if (pip) {
+		ino = pip->i_ino;
+		error = xfs_filestream_lookup_association(ap, args, ino,
+				longest);
+		xfs_irele(pip);
+		if (error)
+			return error;
+		if (*longest >= args->maxlen)
+			goto out_select;
+		if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
+			goto out_select;
 	}
 
-	error = xfs_filestream_select_ag_mru(ap, args, pip, &agno, blen);
+	error = xfs_filestream_create_association(ap, args, ino, longest);
 	if (error)
-		goto out_rele;
-	if (*blen >= args->maxlen)
-		goto out_select;
-	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
-		goto out_select;
-
-	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
-	xfs_bmap_adjacent(ap);
-	*blen = ap->length;
-	if (ap->datatype & XFS_ALLOC_USERDATA)
-		flags |= XFS_PICK_USERDATA;
-	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
-		flags |= XFS_PICK_LOWSPACE;
+		return error;
 
-	error = xfs_filestream_pick_ag(args, pip, agno, flags, blen);
-	if (error)
-		goto out_rele;
 out_select:
 	ap->blkno = XFS_AGB_TO_FSB(mp, args->pag->pag_agno, 0);
-out_rele:
-	xfs_irele(pip);
-	return error;
-
+	return 0;
 }
 
 void
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index b5f7d225d5b4..1d3569c0d2fe 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -668,9 +668,8 @@ DEFINE_FILESTREAM_EVENT(xfs_filestream_lookup);
 DEFINE_FILESTREAM_EVENT(xfs_filestream_scan);
 
 TRACE_EVENT(xfs_filestream_pick,
-	TP_PROTO(struct xfs_inode *ip, struct xfs_perag *pag,
-		 xfs_extlen_t free),
-	TP_ARGS(ip, pag, free),
+	TP_PROTO(struct xfs_perag *pag, xfs_ino_t ino, xfs_extlen_t free),
+	TP_ARGS(pag, ino, free),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
@@ -679,8 +678,8 @@ TRACE_EVENT(xfs_filestream_pick,
 		__field(xfs_extlen_t, free)
 	),
 	TP_fast_assign(
-		__entry->dev = VFS_I(ip)->i_sb->s_dev;
-		__entry->ino = ip->i_ino;
+		__entry->dev = pag->pag_mount->m_super->s_dev;
+		__entry->ino = ino;
 		if (pag) {
 			__entry->agno = pag->pag_agno;
 			__entry->streams = atomic_read(&pag->pagf_fstrms);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 01/42] xfs: fix low space alloc deadlock
  2023-01-18 22:44 ` [PATCH 01/42] xfs: fix low space alloc deadlock Dave Chinner
@ 2023-01-19 16:39   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-19 16:39 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> I've recently encountered an ABBA deadlock with g/476. The upcoming
> changes seem to make this much easier to hit, but the underlying
> problem is a pre-existing one.
> 
> Essentially, if we select an AG for allocation, then lock the AGF
> and then fail to allocate for some reason (e.g. minimum length
> requirements cannot be satisfied), then we drop out of the
> allocation with the AGF still locked.
> 
> The caller then modifies the allocation constraints - usually
> loosening them up - and tries again. This can result in trying to
> access AGFs that are lower than the AGF we already have locked from
> the failed attempt. e.g. the failed attempt skipped several AGs
> before failing, so we have locked an AG higher than the start AG.
> Retrying the allocation from the start AG then causes us to violate
> AGF lock ordering and this can lead to deadlocks.
> 
> The deadlock exists even if allocation succeeds - we can do a
> followup allocations in the same transaction for BMBT blocks that
> aren't guaranteed to be in the same AG as the original, and can move
> into higher AGs. Hence we really need to move the tp->t_firstblock
> tracking down into xfs_alloc_vextent() where it can be set when we
> exit with a locked AG.
> 
> xfs_alloc_vextent() can also check there if the requested
> allocation falls within the allow range of AGs set by
> tp->t_firstblock. If we can't allocate within the range set, we have
> to fail the allocation. If we are allowed to do non-blocking AGF
> locking, we can ignore the AG locking order limitations as we can
> use try-locks for the first iteration over the requested AG range.
> 
> This invalidates a set of post allocation asserts that check that
> the allocation is always above tp->t_firstblock if it is set.
> Because we can use try-locks to avoid the deadlock in some
> circumstances, having a pre-existing locked AGF doesn't always
> prevent allocation from lower order AGFs. Hence those ASSERTs need
> to be removed.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
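
The lock-ordering rule described above can be condensed into a small
predicate; the names here are illustrative, not the kernel's. Once a
transaction holds an AGF, a blocking lock attempt may only target AGs
at or above that AG's number, while trylock attempts are always safe
because they cannot wait and so cannot participate in an ABBA deadlock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative-only sketch of the constraint xfs_alloc_vextent() now
 * enforces: minimum_agno is derived from tp->t_firstblock, and blocking
 * AGF locks must not go below it.  Trylocks are exempt from the rule.
 */
static bool agf_lock_allowed(uint32_t minimum_agno, uint32_t target_agno,
			     bool trylock)
{
	if (trylock)
		return true;		/* cannot block, cannot deadlock */
	return target_agno >= minimum_agno;
}
```

This is why the first pass over the AG range can wrap all the way back
to AG 0 (it uses trylocks), while the blocking second pass must start
no lower than minimum_agno.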
This makes sense to me.  I kinda wish git had a way of making subsets
of sets to help break up large series like this.  Like a bug fix subset
and then the new feature subset.

Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 69 ++++++++++++++++++++++++++++++++-----
> --
>  fs/xfs/libxfs/xfs_bmap.c  | 14 --------
>  fs/xfs/xfs_trace.h        |  1 +
>  3 files changed, 58 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 989cf341779b..c2f38f595d7f 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3164,10 +3164,13 @@ xfs_alloc_vextent(
>         xfs_alloctype_t         type;   /* input allocation type */
>         int                     bump_rotor = 0;
>         xfs_agnumber_t          rotorstep = xfs_rotorstep; /* inode32
> agf stepper */
> +       xfs_agnumber_t          minimum_agno = 0;
>  
>         mp = args->mp;
>         type = args->otype = args->type;
>         args->agbno = NULLAGBLOCK;
> +       if (args->tp->t_firstblock != NULLFSBLOCK)
> +               minimum_agno = XFS_FSB_TO_AGNO(mp, args->tp-
> >t_firstblock);
>         /*
>          * Just fix this up, for the case where the last a.g. is
> shorter
>          * (or there's only one a.g.) and the caller couldn't easily
> figure
> @@ -3201,6 +3204,13 @@ xfs_alloc_vextent(
>                  */
>                 args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
>                 args->pag = xfs_perag_get(mp, args->agno);
> +
> +               if (minimum_agno > args->agno) {
> +                       trace_xfs_alloc_vextent_skip_deadlock(args);
> +                       error = 0;
> +                       break;
> +               }
> +
>                 error = xfs_alloc_fix_freelist(args, 0);
>                 if (error) {
>                         trace_xfs_alloc_vextent_nofix(args);
> @@ -3232,6 +3242,8 @@ xfs_alloc_vextent(
>         case XFS_ALLOCTYPE_FIRST_AG:
>                 /*
>                  * Rotate through the allocation groups looking for a
> winner.
> +                * If we are blocking, we must obey minimum_agno
> contraints for
> +                * avoiding ABBA deadlocks on AGF locking.
>                  */
>                 if (type == XFS_ALLOCTYPE_FIRST_AG) {
>                         /*
> @@ -3239,7 +3251,7 @@ xfs_alloc_vextent(
>                          */
>                         args->agno = XFS_FSB_TO_AGNO(mp, args-
> >fsbno);
>                         args->type = XFS_ALLOCTYPE_THIS_AG;
> -                       sagno = 0;
> +                       sagno = minimum_agno;
>                         flags = 0;
>                 } else {
>                         /*
> @@ -3248,6 +3260,7 @@ xfs_alloc_vextent(
>                         args->agno = sagno = XFS_FSB_TO_AGNO(mp,
> args->fsbno);
>                         flags = XFS_ALLOC_FLAG_TRYLOCK;
>                 }
> +
>                 /*
>                  * Loop over allocation groups twice; first time with
>                  * trylock set, second time without.
> @@ -3276,19 +3289,21 @@ xfs_alloc_vextent(
>                         if (args->agno == sagno &&
>                             type == XFS_ALLOCTYPE_START_BNO)
>                                 args->type = XFS_ALLOCTYPE_THIS_AG;
> +
>                         /*
> -                       * For the first allocation, we can try any AG
> to get
> -                       * space.  However, if we already have
> allocated a
> -                       * block, we don't want to try AGs whose
> number is below
> -                       * sagno. Otherwise, we may end up with out-
> of-order
> -                       * locking of AGF, which might cause deadlock.
> -                       */
> +                        * If we are try-locking, we can't deadlock
> on AGF
> +                        * locks, so we can wrap all the way back to
> the first
> +                        * AG. Otherwise, wrap back to the start AG
> so we can't
> +                        * deadlock, and let the end of scan handler
> decide what
> +                        * to do next.
> +                        */
>                         if (++(args->agno) == mp->m_sb.sb_agcount) {
> -                               if (args->tp->t_firstblock !=
> NULLFSBLOCK)
> -                                       args->agno = sagno;
> -                               else
> +                               if (flags & XFS_ALLOC_FLAG_TRYLOCK)
>                                         args->agno = 0;
> +                               else
> +                                       args->agno = sagno;
>                         }
> +
>                         /*
>                          * Reached the starting a.g., must either be
> done
>                          * or switch to non-trylock mode.
> @@ -3300,7 +3315,14 @@ xfs_alloc_vextent(
>                                         break;
>                                 }
>  
> +                               /*
> +                                * Blocking pass next, so we must
> obey minimum
> +                                * agno constraints to avoid ABBA AGF
> deadlocks.
> +                                */
>                                 flags = 0;
> +                               if (minimum_agno > sagno)
> +                                       sagno = minimum_agno;
> +
>                                 if (type == XFS_ALLOCTYPE_START_BNO)
> {
>                                         args->agbno =
> XFS_FSB_TO_AGBNO(mp,
>                                                 args->fsbno);
> @@ -3322,9 +3344,9 @@ xfs_alloc_vextent(
>                 ASSERT(0);
>                 /* NOTREACHED */
>         }
> -       if (args->agbno == NULLAGBLOCK)
> +       if (args->agbno == NULLAGBLOCK) {
>                 args->fsbno = NULLFSBLOCK;
> -       else {
> +       } else {
>                 args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args-
> >agbno);
>  #ifdef DEBUG
>                 ASSERT(args->len >= args->minlen);
> @@ -3335,6 +3357,29 @@ xfs_alloc_vextent(
>  #endif
>  
>         }
> +
> +       /*
> +        * We end up here with a locked AGF. If we failed, the caller
> is likely
> +        * going to try to allocate again with different parameters,
> and that
> +        * can widen the AGs that are searched for free space. If we
> have to do
> +        * BMBT block allocation, we have to do a new allocation.
> +        *
> +        * Hence leaving this function with the AGF locked opens up
> potential
> +        * ABBA AGF deadlocks because a future allocation attempt in
> this
> +        * transaction may attempt to lock a lower number AGF.
> +        *
> +        * We can't release the AGF until the transaction is
> commited, so at
> +        * this point we must update the "firstblock" tracker to
> point at this
> +        * AG if the tracker is empty or points to a lower AG. This
> allows the
> +        * next allocation attempt to be modified appropriately to
> avoid
> +        * deadlocks.
> +        */
> +       if (args->agbp &&
> +           (args->tp->t_firstblock == NULLFSBLOCK ||
> +            args->pag->pag_agno > minimum_agno)) {
> +               args->tp->t_firstblock = XFS_AGB_TO_FSB(mp,
> +                                       args->pag->pag_agno, 0);
> +       }
>         xfs_perag_put(args->pag);
>         return 0;
>  error0:
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 0d56a8d862e8..018837bd72c8 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3413,21 +3413,7 @@ xfs_bmap_process_allocated_extent(
>         xfs_fileoff_t           orig_offset,
>         xfs_extlen_t            orig_length)
>  {
> -       int                     nullfb;
> -
> -       nullfb = ap->tp->t_firstblock == NULLFSBLOCK;
> -
> -       /*
> -        * check the allocation happened at the same or higher AG
> than
> -        * the first block that was allocated.
> -        */
> -       ASSERT(nullfb ||
> -               XFS_FSB_TO_AGNO(args->mp, ap->tp->t_firstblock) <=
> -               XFS_FSB_TO_AGNO(args->mp, args->fsbno));
> -
>         ap->blkno = args->fsbno;
> -       if (nullfb)
> -               ap->tp->t_firstblock = args->fsbno;
>         ap->length = args->len;
>         /*
>          * If the extent size hint is active, we tried to round the
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 421d1e504ac4..918e778fdd55 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1877,6 +1877,7 @@ DEFINE_ALLOC_EVENT(xfs_alloc_small_notenough);
>  DEFINE_ALLOC_EVENT(xfs_alloc_small_done);
>  DEFINE_ALLOC_EVENT(xfs_alloc_small_error);
>  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_badargs);
> +DEFINE_ALLOC_EVENT(xfs_alloc_vextent_skip_deadlock);
>  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_nofix);
>  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
>  DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);



* Re: [PATCH 02/42] xfs: prefer free inodes at ENOSPC over chunk allocation
  2023-01-18 22:44 ` [PATCH 02/42] xfs: prefer free inodes at ENOSPC over chunk allocation Dave Chinner
@ 2023-01-19 19:08   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-19 19:08 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When an XFS filesystem has free inodes in chunks already allocated
> on disk, it will still allocate new inode chunks if the target AG
> has no free inodes in it. Normally, this is a good idea as it
> preserves locality of all the inodes in a given directory.
> 
> However, at ENOSPC this can lead to using the last few remaining
> free filesystem blocks to allocate a new chunk when there are many,
> many free inodes that could be allocated without consuming free
> space. This results in speeding up the consumption of the last few
> blocks and inode create operations then returning ENOSPC when there
> are free inodes available because we don't have enough blocks left in the
> filesystem for directory creation reservations to proceed.
> 
> Hence when we are near ENOSPC, we should be attempting to preserve
> the remaining blocks for directory block allocation rather than
> using them for unnecessary inode chunk creation.
> 
> This particular behaviour is exposed by xfs/294, when it drives to
> ENOSPC on empty file creation whilst there are still thousands of
> free inodes available for allocation in other AGs in the filesystem.
> 
> Hence, when we are within 1% of ENOSPC, change the inode allocation
> behaviour to prefer to use existing free inodes over allocating new
> inode chunks, even though it results is poorer locality of the data
> set. It is more important for the allocations to be space efficient
> near ENOSPC than to have optimal locality for performance, so lets
> modify the inode AG selection code to reflect that fact.
> 
> This allows generic/294 to not only pass with this allocator rework
> patchset, but to increase the number of post-ENOSPC empty inode
> allocations to from ~600 to ~9080 before we hit ENOSPC on the
> directory create transaction reservation.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
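
The two-pass policy described above reduces to a small decision function;
the names and the 1% threshold plumbing here are illustrative, not the
kernel's. Near ENOSPC, the first pass over the AGs may only consume
existing free inodes, and chunk allocation is re-enabled for the second
pass:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative-only model: fdblocks is the free block count and
 * low_space_1pct the precomputed 1% threshold (m_low_space in the
 * kernel).  Pass 0 is the non-blocking scan, pass 1 the retry.
 */
static bool chunk_alloc_allowed(uint64_t fdblocks, uint64_t low_space_1pct,
				int pass)
{
	bool low_space = fdblocks < low_space_1pct;

	if (low_space && pass == 0)
		return false;	/* prefer existing free inodes first */
	return true;		/* retry pass may allocate new chunks */
}
```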
Ok, makes sense
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
 
> ---
>  fs/xfs/libxfs/xfs_ialloc.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 5118dedf9267..e8068422aa21 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -1737,6 +1737,7 @@ xfs_dialloc(
>         struct xfs_perag        *pag;
>         struct xfs_ino_geometry *igeo = M_IGEO(mp);
>         bool                    ok_alloc = true;
> +       bool                    low_space = false;
>         int                     flags;
>         xfs_ino_t               ino;
>  
> @@ -1767,6 +1768,20 @@ xfs_dialloc(
>                 ok_alloc = false;
>         }
>  
> +       /*
> +        * If we are near to ENOSPC, we want to prefer allocation
> from AGs that
> +        * have free inodes in them rather than use up free space
> allocating new
> +        * inode chunks. Hence we turn off allocation for the first
> non-blocking
> +        * pass through the AGs if we are near ENOSPC to consume free
> inodes
> +        * that we can immediately allocate, but then we allow
> allocation on the
> +        * second pass if we fail to find an AG with free inodes in
> it.
> +        */
> +       if (percpu_counter_read_positive(&mp->m_fdblocks) <
> +                       mp->m_low_space[XFS_LOWSP_1_PCNT]) {
> +               ok_alloc = false;
> +               low_space = true;
> +       }
> +
>         /*
>          * Loop until we find an allocation group that either has
> free inodes
>          * or in which we can allocate some inodes.  Iterate through
> the
> @@ -1795,6 +1810,8 @@ xfs_dialloc(
>                                 break;
>                         }
>                         flags = 0;
> +                       if (low_space)
> +                               ok_alloc = true;
>                 }
>                 xfs_perag_put(pag);
>         }



* Re: [PATCH 03/42] xfs: block reservation too large for minleft allocation
  2023-01-18 22:44 ` [PATCH 03/42] xfs: block reservation too large for minleft allocation Dave Chinner
@ 2023-01-19 20:38   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-19 20:38 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we enter xfs_bmbt_alloc_block() without having first allocated
> a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we
> are doing something like unwritten extent conversion, the transaction
> block reservation is used as the minleft value.
> 
> This works for operations like unwritten extent conversion, but it
> assumes that the block reservation is only for a BMBT split. This is
> not always true, and sometimes results in larger than necessary
> minleft values being set. We only actually need enough space for a
> btree split, something we already handle correctly in
> xfs_bmapi_write() via the xfs_bmapi_minleft() calculation.
> 
> We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to
> calculate the number of blocks a BMBT split on this inode is going to
> require, not use the transaction block reservation that contains the
> maximum number of blocks this transaction may consume in it...
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
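
To see why the old minleft was too large, a sketch helps; the names and
numbers are assumptions for illustration only, the real calculation
lives in xfs_bmapi_minleft(). A BMBT split needs at most one new block
per level of the tree, whereas the transaction block reservation also
covers data extent allocation and so demands far more free space than
the split itself requires:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative-only: old behaviour used the whole reservation. */
static uint32_t minleft_old(uint32_t tp_blk_res)
{
	return tp_blk_res;	/* maximum blocks the whole tx may use */
}

/* Illustrative-only: new behaviour sizes for a worst-case split. */
static uint32_t minleft_new(uint32_t bmbt_levels)
{
	return bmbt_levels;	/* one new block per BMBT level */
}
```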
Ok, makes sense
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/libxfs/xfs_bmap.c       |  2 +-
>  fs/xfs/libxfs/xfs_bmap.h       |  2 ++
>  fs/xfs/libxfs/xfs_bmap_btree.c | 19 +++++++++----------
>  3 files changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 018837bd72c8..9dc33cdc2ab9 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4242,7 +4242,7 @@ xfs_bmapi_convert_unwritten(
>         return 0;
>  }
>  
> -static inline xfs_extlen_t
> +xfs_extlen_t
>  xfs_bmapi_minleft(
>         struct xfs_trans        *tp,
>         struct xfs_inode        *ip,
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 16db95b11589..08c16e4edc0f 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -220,6 +220,8 @@ int xfs_bmap_add_extent_unwritten_real(struct
> xfs_trans *tp,
>                 struct xfs_inode *ip, int whichfork,
>                 struct xfs_iext_cursor *icur, struct xfs_btree_cur
> **curp,
>                 struct xfs_bmbt_irec *new, int *logflagsp);
> +xfs_extlen_t xfs_bmapi_minleft(struct xfs_trans *tp, struct
> xfs_inode *ip,
> +               int fork);
>  
>  enum xfs_bmap_intent_type {
>         XFS_BMAP_MAP = 1,
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c
> b/fs/xfs/libxfs/xfs_bmap_btree.c
> index cfa052d40105..18de4fbfef4e 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -213,18 +213,16 @@ xfs_bmbt_alloc_block(
>         if (args.fsbno == NULLFSBLOCK) {
>                 args.fsbno = be64_to_cpu(start->l);
>                 args.type = XFS_ALLOCTYPE_START_BNO;
> +
>                 /*
> -                * Make sure there is sufficient room left in the AG
> to
> -                * complete a full tree split for an extent insert. 
> If
> -                * we are converting the middle part of an extent
> then
> -                * we may need space for two tree splits.
> -                *
> -                * We are relying on the caller to make the correct
> block
> -                * reservation for this operation to succeed.  If the
> -                * reservation amount is insufficient then we may
> fail a
> -                * block allocation here and corrupt the filesystem.
> +                * If we are coming here from something like
> unwritten extent
> +                * conversion, there has been no data extent
> allocation already
> +                * done, so we have to ensure that we attempt to
> locate the
> +                * entire set of bmbt allocations in the same AG, as
> +                * xfs_bmapi_write() would have reserved.
>                  */
> -               args.minleft = args.tp->t_blk_res;
> +               args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur-
> >bc_ino.ip,
> +                                               cur-
> >bc_ino.whichfork);
>         } else if (cur->bc_tp->t_flags & XFS_TRANS_LOWMODE) {
>                 args.type = XFS_ALLOCTYPE_START_BNO;
>         } else {
> @@ -248,6 +246,7 @@ xfs_bmbt_alloc_block(
>                  * successful activate the lowspace algorithm.
>                  */
>                 args.fsbno = 0;
> +               args.minleft = 0;
>                 args.type = XFS_ALLOCTYPE_FIRST_AG;
>                 error = xfs_alloc_vextent(&args);
>                 if (error)



* Re: [PATCH 04/42] xfs: drop firstblock constraints from allocation setup
  2023-01-18 22:44 ` [PATCH 04/42] xfs: drop firstblock constraints from allocation setup Dave Chinner
@ 2023-01-19 22:03   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-19 22:03 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that xfs_alloc_vextent() does all the AGF deadlock prevention
> filtering for multiple allocations in a single transaction, we no
> longer need the allocation setup code to care about what AGs we
> might already have locked.
> 
> Hence we can remove all the "nullfb" conditional logic in places
> like xfs_bmap_btalloc() and instead have them focus simply on
> setting up locality constraints. If the allocation fails due to
> AGF lock filtering in xfs_alloc_vextent, then we just fall back as
> we normally do to more relaxed allocation constraints.
> 
> As a result, any allocation that allows AG scanning (i.e. not
> confined to a single AG) and does not force a worst case full
> filesystem scan will now be able to attempt allocation from AGs
> lower than that defined by tp->t_firstblock. This is because
> xfs_alloc_vextent() allows try-locking of the AGFs and hence enables
> low space algorithms to at least -try- to get space from AGs lower
> than the one that we have currently locked and allocated from. This
> is a significant improvement in the low space allocation algorithm.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
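
The simplification can be sketched as the difference between conditional
and unconditional locality setup; types and names here are illustrative
only, not the kernel's. The old callers picked the target based on
whether the transaction already held an AGF (tp->t_firstblock); the new
callers always derive locality from the inode and leave deadlock
filtering to xfs_alloc_vextent():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum alloctype { START_BNO, NEAR_BNO };

struct alloc_hint {
	enum alloctype	type;
	uint64_t	fsbno;
};

/* Illustrative-only: the old, AGF-aware setup in the callers. */
static struct alloc_hint setup_before(uint64_t ino_fsb, uint64_t firstblock,
				      bool have_firstblock, bool lowmode)
{
	if (!have_firstblock)
		return (struct alloc_hint){ START_BNO, ino_fsb };
	if (lowmode)
		return (struct alloc_hint){ START_BNO, firstblock };
	return (struct alloc_hint){ NEAR_BNO, firstblock };
}

/* Illustrative-only: the new setup, locality from the inode alone. */
static struct alloc_hint setup_after(uint64_t ino_fsb)
{
	return (struct alloc_hint){ START_BNO, ino_fsb };
}
```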
> ---
>  fs/xfs/libxfs/xfs_bmap.c       | 168 +++++++++++--------------------
> --
>  fs/xfs/libxfs/xfs_bmap.h       |   1 +
>  fs/xfs/libxfs/xfs_bmap_btree.c |  30 +++---
>  3 files changed, 67 insertions(+), 132 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 9dc33cdc2ab9..bc566aae4246 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -645,16 +645,9 @@ xfs_bmap_extents_to_btree(
>         args.tp = tp;
>         args.mp = mp;
>         xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, whichfork);
> -       if (tp->t_firstblock == NULLFSBLOCK) {
> -               args.type = XFS_ALLOCTYPE_START_BNO;
> -               args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
> -       } else if (tp->t_flags & XFS_TRANS_LOWMODE) {
> -               args.type = XFS_ALLOCTYPE_START_BNO;
> -               args.fsbno = tp->t_firstblock;
> -       } else {
> -               args.type = XFS_ALLOCTYPE_NEAR_BNO;
> -               args.fsbno = tp->t_firstblock;
> -       }
> +
> +       args.type = XFS_ALLOCTYPE_START_BNO;
> +       args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
>         args.minlen = args.maxlen = args.prod = 1;
>         args.wasdel = wasdel;
>         *logflagsp = 0;
> @@ -662,17 +655,14 @@ xfs_bmap_extents_to_btree(
>         if (error)
>                 goto out_root_realloc;
>  
> +       /*
> +        * Allocation can't fail, the space was reserved.
> +        */
>         if (WARN_ON_ONCE(args.fsbno == NULLFSBLOCK)) {
>                 error = -ENOSPC;
>                 goto out_root_realloc;
>         }
>  
> -       /*
> -        * Allocation can't fail, the space was reserved.
> -        */
> -       ASSERT(tp->t_firstblock == NULLFSBLOCK ||
> -              args.agno >= XFS_FSB_TO_AGNO(mp, tp->t_firstblock));
> -       tp->t_firstblock = args.fsbno;
>         cur->bc_ino.allocated++;
>         ip->i_nblocks++;
>         xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, 1L);
> @@ -804,13 +794,8 @@ xfs_bmap_local_to_extents(
>          * Allocate a block.  We know we need only one, since the
>          * file currently fits in an inode.
>          */
> -       if (tp->t_firstblock == NULLFSBLOCK) {
> -               args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
> -               args.type = XFS_ALLOCTYPE_START_BNO;
> -       } else {
> -               args.fsbno = tp->t_firstblock;
> -               args.type = XFS_ALLOCTYPE_NEAR_BNO;
> -       }
> +       args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
> +       args.type = XFS_ALLOCTYPE_START_BNO;
>         args.total = total;
>         args.minlen = args.maxlen = args.prod = 1;
>         error = xfs_alloc_vextent(&args);
> @@ -820,7 +805,6 @@ xfs_bmap_local_to_extents(
>         /* Can't fail, the space was reserved. */
>         ASSERT(args.fsbno != NULLFSBLOCK);
>         ASSERT(args.len == 1);
> -       tp->t_firstblock = args.fsbno;
>         error = xfs_trans_get_buf(tp, args.mp->m_ddev_targp,
>                         XFS_FSB_TO_DADDR(args.mp, args.fsbno),
>                         args.mp->m_bsize, 0, &bp);
> @@ -854,8 +838,7 @@ xfs_bmap_local_to_extents(
>  
>         ifp->if_nextents = 1;
>         ip->i_nblocks = 1;
> -       xfs_trans_mod_dquot_byino(tp, ip,
> -               XFS_TRANS_DQ_BCOUNT, 1L);
> +       xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, 1L);
>         flags |= xfs_ilog_fext(whichfork);
>  
>  done:
> @@ -3025,9 +3008,7 @@ xfs_bmap_adjacent(
>         struct xfs_bmalloca     *ap)    /* bmap alloc argument struct
> */
>  {
>         xfs_fsblock_t   adjust;         /* adjustment to block
> numbers */
> -       xfs_agnumber_t  fb_agno;        /* ag number of ap-
> >firstblock */
>         xfs_mount_t     *mp;            /* mount point structure */
> -       int             nullfb;         /* true if ap->firstblock
> isn't set */
>         int             rt;             /* true if inode is realtime
> */
>  
>  #define        ISVALID(x,y)    \
> @@ -3038,11 +3019,8 @@ xfs_bmap_adjacent(
>                 XFS_FSB_TO_AGBNO(mp, x) < mp->m_sb.sb_agblocks)
>  
>         mp = ap->ip->i_mount;
> -       nullfb = ap->tp->t_firstblock == NULLFSBLOCK;
>         rt = XFS_IS_REALTIME_INODE(ap->ip) &&
>                 (ap->datatype & XFS_ALLOC_USERDATA);
> -       fb_agno = nullfb ? NULLAGNUMBER : XFS_FSB_TO_AGNO(mp,
> -                                                       ap->tp-
> >t_firstblock);
>         /*
>          * If allocating at eof, and there's a previous real block,
>          * try to use its last block as our starting point.
> @@ -3101,13 +3079,6 @@ xfs_bmap_adjacent(
>                                 prevbno += adjust;
>                         else
>                                 prevdiff += adjust;
> -                       /*
> -                        * If the firstblock forbids it, can't use
> it,
> -                        * must use default.
> -                        */
> -                       if (!rt && !nullfb &&
> -                           XFS_FSB_TO_AGNO(mp, prevbno) != fb_agno)
> -                               prevbno = NULLFSBLOCK;
>                 }
>                 /*
>                  * No previous block or can't follow it, just
> default.
> @@ -3143,13 +3114,6 @@ xfs_bmap_adjacent(
>                                 gotdiff += adjust - ap->length;
>                         } else
>                                 gotdiff += adjust;
> -                       /*
> -                        * If the firstblock forbids it, can't use
> it,
> -                        * must use default.
> -                        */
> -                       if (!rt && !nullfb &&
> -                           XFS_FSB_TO_AGNO(mp, gotbno) != fb_agno)
> -                               gotbno = NULLFSBLOCK;
>                 }
>                 /*
>                  * No next block, just default.
> @@ -3236,7 +3200,7 @@ xfs_bmap_select_minlen(
>  }
>  
>  STATIC int
> -xfs_bmap_btalloc_nullfb(
> +xfs_bmap_btalloc_select_lengths(
>         struct xfs_bmalloca     *ap,
>         struct xfs_alloc_arg    *args,
>         xfs_extlen_t            *blen)
> @@ -3247,8 +3211,13 @@ xfs_bmap_btalloc_nullfb(
>         int                     error;
>  
>         args->type = XFS_ALLOCTYPE_START_BNO;
> -       args->total = ap->total;
> +       if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> +               args->total = ap->minlen;
> +               args->minlen = ap->minlen;
> +               return 0;
> +       }
>  
> +       args->total = ap->total;
>         startag = ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
>         if (startag == NULLAGNUMBER)
>                 startag = ag = 0;
> @@ -3280,6 +3249,13 @@ xfs_bmap_btalloc_filestreams(
>         int                     notinit = 0;
>         int                     error;
>  
> +       if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> +               args->type = XFS_ALLOCTYPE_FIRST_AG;
> +               args->total = ap->minlen;
> +               args->minlen = ap->minlen;
> +               return 0;
> +       }
> +
>         args->type = XFS_ALLOCTYPE_NEAR_BNO;
>         args->total = ap->total;
>  
> @@ -3460,19 +3436,15 @@ xfs_bmap_exact_minlen_extent_alloc(
>  
>         xfs_bmap_compute_alignments(ap, &args);
>  
> -       if (ap->tp->t_firstblock == NULLFSBLOCK) {
> -               /*
> -                * Unlike the longest extent available in an AG, we
> don't track
> -                * the length of an AG's shortest extent.
> -                * XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT is a debug
> only knob and
> -                * hence we can afford to start traversing from the
> 0th AG since
> -                * we need not be concerned about a drop in
> performance in
> -                * "debug only" code paths.
> -                */
> -               ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
> -       } else {
> -               ap->blkno = ap->tp->t_firstblock;
> -       }
> +       /*
> +        * Unlike the longest extent available in an AG, we don't
> track
> +        * the length of an AG's shortest extent.
> +        * XFS_ERRTAG_BMAP_ALLOC_MINLEN_EXTENT is a debug only knob
> and
> +        * hence we can afford to start traversing from the 0th AG
> since
> +        * we need not be concerned about a drop in performance in
> +        * "debug only" code paths.
> +        */
> +       ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
>  
>         args.fsbno = ap->blkno;
>         args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
> @@ -3515,13 +3487,11 @@ xfs_bmap_btalloc(
>         struct xfs_mount        *mp = ap->ip->i_mount;
>         struct xfs_alloc_arg    args = { .tp = ap->tp, .mp = mp };
>         xfs_alloctype_t         atype = 0;
> -       xfs_agnumber_t          fb_agno;        /* ag number of ap-
> >firstblock */
>         xfs_agnumber_t          ag;
>         xfs_fileoff_t           orig_offset;
>         xfs_extlen_t            orig_length;
>         xfs_extlen_t            blen;
>         xfs_extlen_t            nextminlen = 0;
> -       int                     nullfb; /* true if ap->firstblock
> isn't set */
>         int                     isaligned;
>         int                     tryagain;
>         int                     error;
> @@ -3533,34 +3503,17 @@ xfs_bmap_btalloc(
>  
>         stripe_align = xfs_bmap_compute_alignments(ap, &args);
>  
> -       nullfb = ap->tp->t_firstblock == NULLFSBLOCK;
> -       fb_agno = nullfb ? NULLAGNUMBER : XFS_FSB_TO_AGNO(mp,
> -                                                       ap->tp-
> >t_firstblock);
> -       if (nullfb) {
> -               if ((ap->datatype & XFS_ALLOC_USERDATA) &&
> -                   xfs_inode_is_filestream(ap->ip)) {
> -                       ag = xfs_filestream_lookup_ag(ap->ip);
> -                       ag = (ag != NULLAGNUMBER) ? ag : 0;
> -                       ap->blkno = XFS_AGB_TO_FSB(mp, ag, 0);
> -               } else {
> -                       ap->blkno = XFS_INO_TO_FSB(mp, ap->ip-
> >i_ino);
> -               }
> -       } else
> -               ap->blkno = ap->tp->t_firstblock;
> +       if ((ap->datatype & XFS_ALLOC_USERDATA) &&
> +           xfs_inode_is_filestream(ap->ip)) {
> +               ag = xfs_filestream_lookup_ag(ap->ip);
> +               ag = (ag != NULLAGNUMBER) ? ag : 0;
> +               ap->blkno = XFS_AGB_TO_FSB(mp, ag, 0);
> +       } else {
> +               ap->blkno = XFS_INO_TO_FSB(mp, ap->ip->i_ino);
> +       }
>  
>         xfs_bmap_adjacent(ap);
>  
> -       /*
> -        * If allowed, use ap->blkno; otherwise must use firstblock
> since
> -        * it's in the right allocation group.
> -        */
> -       if (nullfb || XFS_FSB_TO_AGNO(mp, ap->blkno) == fb_agno)
> -               ;
> -       else
> -               ap->blkno = ap->tp->t_firstblock;
> -       /*
> -        * Normal allocation, done through xfs_alloc_vextent.
> -        */
>         tryagain = isaligned = 0;
>         args.fsbno = ap->blkno;
>         args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
> @@ -3568,30 +3521,19 @@ xfs_bmap_btalloc(
>         /* Trim the allocation back to the maximum an AG can fit. */
>         args.maxlen = min(ap->length, mp->m_ag_max_usable);
>         blen = 0;
> -       if (nullfb) {
> -               /*
> -                * Search for an allocation group with a single
> extent large
> -                * enough for the request.  If one isn't found, then
> adjust
> -                * the minimum allocation size to the largest space
> found.
> -                */
> -               if ((ap->datatype & XFS_ALLOC_USERDATA) &&
> -                   xfs_inode_is_filestream(ap->ip))
> -                       error = xfs_bmap_btalloc_filestreams(ap,
> &args, &blen);
> -               else
> -                       error = xfs_bmap_btalloc_nullfb(ap, &args,
> &blen);
> -               if (error)
> -                       return error;
> -       } else if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> -               if (xfs_inode_is_filestream(ap->ip))
> -                       args.type = XFS_ALLOCTYPE_FIRST_AG;
> -               else
> -                       args.type = XFS_ALLOCTYPE_START_BNO;
> -               args.total = args.minlen = ap->minlen;
> -       } else {
> -               args.type = XFS_ALLOCTYPE_NEAR_BNO;
> -               args.total = ap->total;
> -               args.minlen = ap->minlen;
> -       }
> +
> +       /*
> +        * Search for an allocation group with a single extent large
> +        * enough for the request.  If one isn't found, then adjust
> +        * the minimum allocation size to the largest space found.
> +        */
> +       if ((ap->datatype & XFS_ALLOC_USERDATA) &&
> +           xfs_inode_is_filestream(ap->ip))
> +               error = xfs_bmap_btalloc_filestreams(ap, &args,
> &blen);
> +       else
> +               error = xfs_bmap_btalloc_select_lengths(ap, &args,
> &blen);
> +       if (error)
> +               return error;
>  
>         /*
>          * If we are not low on available data blocks, and the
> underlying
> @@ -3678,7 +3620,7 @@ xfs_bmap_btalloc(
>                 if ((error = xfs_alloc_vextent(&args)))
>                         return error;
>         }
> -       if (args.fsbno == NULLFSBLOCK && nullfb &&
> +       if (args.fsbno == NULLFSBLOCK &&
>             args.minlen > ap->minlen) {
>                 args.minlen = ap->minlen;
>                 args.type = XFS_ALLOCTYPE_START_BNO;
> @@ -3686,7 +3628,7 @@ xfs_bmap_btalloc(
>                 if ((error = xfs_alloc_vextent(&args)))
>                         return error;
>         }
> -       if (args.fsbno == NULLFSBLOCK && nullfb) {
> +       if (args.fsbno == NULLFSBLOCK) {
>                 args.fsbno = 0;
>                 args.type = XFS_ALLOCTYPE_FIRST_AG;
>                 args.total = ap->minlen;
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 08c16e4edc0f..0ffc0d998850 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -269,4 +269,5 @@ extern struct
> kmem_cache    *xfs_bmap_intent_cache;
>  int __init xfs_bmap_intent_init_cache(void);
>  void xfs_bmap_intent_destroy_cache(void);
>  
> +
Stray newline?

Otherwise looks like a nice cleanup.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

>  #endif /* __XFS_BMAP_H__ */
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c
> b/fs/xfs/libxfs/xfs_bmap_btree.c
> index 18de4fbfef4e..76a0f0d260a4 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -206,28 +206,21 @@ xfs_bmbt_alloc_block(
>         memset(&args, 0, sizeof(args));
>         args.tp = cur->bc_tp;
>         args.mp = cur->bc_mp;
> -       args.fsbno = cur->bc_tp->t_firstblock;
>         xfs_rmap_ino_bmbt_owner(&args.oinfo, cur->bc_ino.ip->i_ino,
>                         cur->bc_ino.whichfork);
>  
> -       if (args.fsbno == NULLFSBLOCK) {
> -               args.fsbno = be64_to_cpu(start->l);
> -               args.type = XFS_ALLOCTYPE_START_BNO;
> +       args.fsbno = be64_to_cpu(start->l);
> +       args.type = XFS_ALLOCTYPE_START_BNO;
>  
> -               /*
> -                * If we are coming here from something like
> unwritten extent
> -                * conversion, there has been no data extent
> allocation already
> -                * done, so we have to ensure that we attempt to
> locate the
> -                * entire set of bmbt allocations in the same AG, as
> -                * xfs_bmapi_write() would have reserved.
> -                */
> +       /*
> +        * If we are coming here from something like unwritten extent
> +        * conversion, there has been no data extent allocation
> already done, so
> +        * we have to ensure that we attempt to locate the entire set
> of bmbt
> +        * allocations in the same AG, as xfs_bmapi_write() would
> have reserved.
> +        */
> +       if (cur->bc_tp->t_firstblock == NULLFSBLOCK)
>                 args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur-
> >bc_ino.ip,
> -                                               cur-
> >bc_ino.whichfork);
> -       } else if (cur->bc_tp->t_flags & XFS_TRANS_LOWMODE) {
> -               args.type = XFS_ALLOCTYPE_START_BNO;
> -       } else {
> -               args.type = XFS_ALLOCTYPE_NEAR_BNO;
> -       }
> +                                       cur->bc_ino.whichfork);
>  
>         args.minlen = args.maxlen = args.prod = 1;
>         args.wasdel = cur->bc_ino.flags & XFS_BTCUR_BMBT_WASDEL;
> @@ -247,7 +240,7 @@ xfs_bmbt_alloc_block(
>                  */
>                 args.fsbno = 0;
>                 args.minleft = 0;
> -               args.type = XFS_ALLOCTYPE_FIRST_AG;
> +               args.type = XFS_ALLOCTYPE_START_BNO;
>                 error = xfs_alloc_vextent(&args);
>                 if (error)
>                         goto error0;
> @@ -259,7 +252,6 @@ xfs_bmbt_alloc_block(
>         }
>  
>         ASSERT(args.len == 1);
> -       cur->bc_tp->t_firstblock = args.fsbno;
>         cur->bc_ino.allocated++;
>         cur->bc_ino.ip->i_nblocks++;
>         xfs_trans_log_inode(args.tp, cur->bc_ino.ip, XFS_ILOG_CORE);


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 05/42] xfs: t_firstblock is tracking AGs not blocks
  2023-01-18 22:44 ` [PATCH 05/42] xfs: t_firstblock is tracking AGs not blocks Dave Chinner
@ 2023-01-19 22:12   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-19 22:12 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The tp->t_firstblock field is now really tracking the highest AG we
> have locked, not the block number of the highest allocation we've
> made. Its purpose is to prevent AGF locking deadlocks, so rename it
> to "t_highest_agno" and simplify the implementation to just track the
> agno rather than a fsbno.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Looks like a straightforward rename
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
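
For anyone following along, the deadlock-avoidance rule the renamed field
enforces can be sketched in plain C. The types and helpers below are
illustrative stand-ins for the kernel structures in the patch, not the
actual kernel code: the idea is simply that AGF locks must be taken in
ascending AG order within a transaction, so recording only the highest
AG number ever locked is enough to reject an out-of-order lock attempt
(e.g. the 1, 3, 2 ordering the cover letter mentions):

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the kernel types used by the patch. */
#define NULLAGNUMBER UINT_MAX
typedef unsigned int xfs_agnumber_t;

struct xfs_trans_sketch {
	xfs_agnumber_t	t_highest_agno;	/* highest AG locked so far */
};

/*
 * AGF locks must be taken in ascending AG order. Once some AG N has
 * been locked in this transaction, later allocations may only lock
 * AGs at or above N without risking an ABBA deadlock against another
 * transaction locking the same AGs in the opposite order.
 */
static int can_lock_agf(struct xfs_trans_sketch *tp, xfs_agnumber_t agno)
{
	if (tp->t_highest_agno == NULLAGNUMBER)
		return 1;		/* nothing locked yet */
	return agno >= tp->t_highest_agno;
}

/* Record a successful AGF lock; only the highest AG number matters. */
static void record_agf_lock(struct xfs_trans_sketch *tp, xfs_agnumber_t agno)
{
	if (tp->t_highest_agno == NULLAGNUMBER ||
	    agno > tp->t_highest_agno)
		tp->t_highest_agno = agno;
}
```

This is also why tracking an agno rather than a fsbno loses nothing:
the block offset within the AG never influenced the ordering decision.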

> ---
>  fs/xfs/libxfs/xfs_alloc.c      | 12 +++++-------
>  fs/xfs/libxfs/xfs_bmap.c       |  4 ++--
>  fs/xfs/libxfs/xfs_bmap_btree.c |  6 +++---
>  fs/xfs/xfs_bmap_util.c         |  2 +-
>  fs/xfs/xfs_inode.c             |  2 +-
>  fs/xfs/xfs_reflink.c           |  2 +-
>  fs/xfs/xfs_trace.h             |  8 ++++----
>  fs/xfs/xfs_trans.c             |  4 ++--
>  fs/xfs/xfs_trans.h             |  2 +-
>  9 files changed, 20 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index c2f38f595d7f..9f26a9368eeb 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3169,8 +3169,8 @@ xfs_alloc_vextent(
>         mp = args->mp;
>         type = args->otype = args->type;
>         args->agbno = NULLAGBLOCK;
> -       if (args->tp->t_firstblock != NULLFSBLOCK)
> -               minimum_agno = XFS_FSB_TO_AGNO(mp, args->tp-
> >t_firstblock);
> +       if (args->tp->t_highest_agno != NULLAGNUMBER)
> +               minimum_agno = args->tp->t_highest_agno;
>         /*
>          * Just fix this up, for the case where the last a.g. is
> shorter
>          * (or there's only one a.g.) and the caller couldn't easily
> figure
> @@ -3375,11 +3375,9 @@ xfs_alloc_vextent(
>          * deadlocks.
>          */
>         if (args->agbp &&
> -           (args->tp->t_firstblock == NULLFSBLOCK ||
> -            args->pag->pag_agno > minimum_agno)) {
> -               args->tp->t_firstblock = XFS_AGB_TO_FSB(mp,
> -                                       args->pag->pag_agno, 0);
> -       }
> +           (args->tp->t_highest_agno == NULLAGNUMBER ||
> +            args->pag->pag_agno > minimum_agno))
> +               args->tp->t_highest_agno = args->pag->pag_agno;
>         xfs_perag_put(args->pag);
>         return 0;
>  error0:
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index bc566aae4246..f15d45af661f 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4192,7 +4192,7 @@ xfs_bmapi_minleft(
>  {
>         struct xfs_ifork        *ifp = xfs_ifork_ptr(ip, fork);
>  
> -       if (tp && tp->t_firstblock != NULLFSBLOCK)
> +       if (tp && tp->t_highest_agno != NULLAGNUMBER)
>                 return 0;
>         if (ifp->if_format != XFS_DINODE_FMT_BTREE)
>                 return 1;
> @@ -6084,7 +6084,7 @@ xfs_bmap_finish_one(
>  {
>         int                             error = 0;
>  
> -       ASSERT(tp->t_firstblock == NULLFSBLOCK);
> +       ASSERT(tp->t_highest_agno == NULLAGNUMBER);
>  
>         trace_xfs_bmap_deferred(tp->t_mountp,
>                         XFS_FSB_TO_AGNO(tp->t_mountp, startblock),
> type,
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c
> b/fs/xfs/libxfs/xfs_bmap_btree.c
> index 76a0f0d260a4..afd9b2d962a3 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -184,11 +184,11 @@ xfs_bmbt_update_cursor(
>         struct xfs_btree_cur    *src,
>         struct xfs_btree_cur    *dst)
>  {
> -       ASSERT((dst->bc_tp->t_firstblock != NULLFSBLOCK) ||
> +       ASSERT((dst->bc_tp->t_highest_agno != NULLAGNUMBER) ||
>                (dst->bc_ino.ip->i_diflags & XFS_DIFLAG_REALTIME));
>  
>         dst->bc_ino.allocated += src->bc_ino.allocated;
> -       dst->bc_tp->t_firstblock = src->bc_tp->t_firstblock;
> +       dst->bc_tp->t_highest_agno = src->bc_tp->t_highest_agno;
>  
>         src->bc_ino.allocated = 0;
>  }
> @@ -218,7 +218,7 @@ xfs_bmbt_alloc_block(
>          * we have to ensure that we attempt to locate the entire set
> of bmbt
>          * allocations in the same AG, as xfs_bmapi_write() would
> have reserved.
>          */
> -       if (cur->bc_tp->t_firstblock == NULLFSBLOCK)
> +       if (cur->bc_tp->t_highest_agno == NULLAGNUMBER)
>                 args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur-
> >bc_ino.ip,
>                                         cur->bc_ino.whichfork);
>  
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 867645b74d88..a09dd2606479 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1410,7 +1410,7 @@ xfs_swap_extent_rmap(
>  
>                 /* Unmap the old blocks in the source file. */
>                 while (tirec.br_blockcount) {
> -                       ASSERT(tp->t_firstblock == NULLFSBLOCK);
> +                       ASSERT(tp->t_highest_agno == NULLAGNUMBER);
>                         trace_xfs_swap_extent_rmap_remap_piece(tip,
> &tirec);
>  
>                         /* Read extent from the source file */
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index d354ea2b74f9..dbe274b8065d 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1367,7 +1367,7 @@ xfs_itruncate_extents_flags(
>  
>         unmap_len = XFS_MAX_FILEOFF - first_unmap_block + 1;
>         while (unmap_len > 0) {
> -               ASSERT(tp->t_firstblock == NULLFSBLOCK);
> +               ASSERT(tp->t_highest_agno == NULLAGNUMBER);
>                 error = __xfs_bunmapi(tp, ip, first_unmap_block,
> &unmap_len,
>                                 flags, XFS_ITRUNC_MAX_EXTENTS);
>                 if (error)
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 5535778a98f9..57bf59ff4854 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -610,7 +610,7 @@ xfs_reflink_cancel_cow_blocks(
>                         if (error)
>                                 break;
>                 } else if (del.br_state == XFS_EXT_UNWRITTEN ||
> cancel_real) {
> -                       ASSERT((*tpp)->t_firstblock == NULLFSBLOCK);
> +                       ASSERT((*tpp)->t_highest_agno ==
> NULLAGNUMBER);
>  
>                         /* Free the CoW orphan record. */
>                         xfs_refcount_free_cow_extent(*tpp,
> del.br_startblock,
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 918e778fdd55..7dc57db6aa42 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1801,7 +1801,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>                 __field(char, wasfromfl)
>                 __field(int, resv)
>                 __field(int, datatype)
> -               __field(xfs_fsblock_t, firstblock)
> +               __field(xfs_agnumber_t, highest_agno)
>         ),
>         TP_fast_assign(
>                 __entry->dev = args->mp->m_super->s_dev;
> @@ -1822,12 +1822,12 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>                 __entry->wasfromfl = args->wasfromfl;
>                 __entry->resv = args->resv;
>                 __entry->datatype = args->datatype;
> -               __entry->firstblock = args->tp->t_firstblock;
> +               __entry->highest_agno = args->tp->t_highest_agno;
>         ),
>         TP_printk("dev %d:%d agno 0x%x agbno 0x%x minlen %u maxlen %u
> mod %u "
>                   "prod %u minleft %u total %u alignment %u
> minalignslop %u "
>                   "len %u type %s otype %s wasdel %d wasfromfl %d
> resv %d "
> -                 "datatype 0x%x firstblock 0x%llx",
> +                 "datatype 0x%x highest_agno 0x%x",
>                   MAJOR(__entry->dev), MINOR(__entry->dev),
>                   __entry->agno,
>                   __entry->agbno,
> @@ -1846,7 +1846,7 @@ DECLARE_EVENT_CLASS(xfs_alloc_class,
>                   __entry->wasfromfl,
>                   __entry->resv,
>                   __entry->datatype,
> -                 (unsigned long long)__entry->firstblock)
> +                 __entry->highest_agno)
>  )
>  
>  #define DEFINE_ALLOC_EVENT(name) \
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 7bd16fbff534..53ab544e4c2c 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -102,7 +102,7 @@ xfs_trans_dup(
>         INIT_LIST_HEAD(&ntp->t_items);
>         INIT_LIST_HEAD(&ntp->t_busy);
>         INIT_LIST_HEAD(&ntp->t_dfops);
> -       ntp->t_firstblock = NULLFSBLOCK;
> +       ntp->t_highest_agno = NULLAGNUMBER;
>  
>         ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
>         ASSERT(tp->t_ticket != NULL);
> @@ -278,7 +278,7 @@ xfs_trans_alloc(
>         INIT_LIST_HEAD(&tp->t_items);
>         INIT_LIST_HEAD(&tp->t_busy);
>         INIT_LIST_HEAD(&tp->t_dfops);
> -       tp->t_firstblock = NULLFSBLOCK;
> +       tp->t_highest_agno = NULLAGNUMBER;
>  
>         error = xfs_trans_reserve(tp, resp, blocks, rtextents);
>         if (error == -ENOSPC && want_retry) {
> diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
> index 55819785941c..6e3646d524ce 100644
> --- a/fs/xfs/xfs_trans.h
> +++ b/fs/xfs/xfs_trans.h
> @@ -132,7 +132,7 @@ typedef struct xfs_trans {
>         unsigned int            t_rtx_res;      /* # of rt extents
> resvd */
>         unsigned int            t_rtx_res_used; /* # of resvd rt
> extents used */
>         unsigned int            t_flags;        /* misc flags */
> -       xfs_fsblock_t           t_firstblock;   /* first block
> allocated */
> +       xfs_agnumber_t          t_highest_agno; /* highest AGF locked
> */
>         struct xlog_ticket      *t_ticket;      /* log mgr ticket */
>         struct xfs_mount        *t_mountp;      /* ptr to fs mount
> struct */
>         struct xfs_dquot_acct   *t_dqinfo;      /* acctg info for
> dquots */


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 06/42] xfs: don't assert fail on transaction cancel with deferred ops
  2023-01-18 22:44 ` [PATCH 06/42] xfs: don't assert fail on transaction cancel with deferred ops Dave Chinner
@ 2023-01-19 22:18   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-19 22:18 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We can error out of an allocation transaction when updating BMBT
> blocks when things go wrong. This can be a btree corruption, an
> unexpected ENOSPC, etc. In these cases, we already have deferred ops
> queued for the first allocation that has been done, and we just want
> to cancel out the transaction and shut down the filesystem on error.
> 
> In fact, we do just that for production systems - the assert that we
> can't have a transaction with defer ops attached unless we are
> already shut down is bogus and gets in the way of debugging
> whatever issue is actually causing the transaction to be cancelled.
> 
> Remove the assert because it is causing spurious test failures to
> hang test machines.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, makes sense
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/xfs_trans.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 53ab544e4c2c..8afc0c080861 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -1078,10 +1078,10 @@ xfs_trans_cancel(
>         /*
>          * It's never valid to cancel a transaction with deferred ops
> attached,
>          * because the transaction is effectively dirty.  Complain
> about this
> -        * loudly before freeing the in-memory defer items.
> +        * loudly before freeing the in-memory defer items and
> shutting down the
> +        * filesystem.
>          */
>         if (!list_empty(&tp->t_dfops)) {
> -               ASSERT(xfs_is_shutdown(mp) || list_empty(&tp-
> >t_dfops));
>                 ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
>                 dirty = true;
>                 xfs_defer_cancel(tp);


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 07/42] xfs: active perag reference counting
  2023-01-18 22:44 ` [PATCH 07/42] xfs: active perag reference counting Dave Chinner
@ 2023-01-21  5:16   ` Allison Henderson
  2023-02-01 19:08   ` Darrick J. Wong
  1 sibling, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-21  5:16 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We need to be able to dynamically remove instantiated AGs from
> memory safely, either for shrinking the filesystem or paging AG
> state in and out of memory (e.g. supporting millions of AGs). This
> means we need to be able to safely exclude operations from accessing
> perags while dynamic removal is in progress.
> 
> To do this, introduce the concept of active and passive references.
> Active references are required for high level operations that make
> use of an AG for a given operation (e.g. allocation) and pin the
> perag in memory for the duration of the operation that is operating
> on the perag (e.g. transaction scope). This means we can fail to get
> an active reference to an AG, hence callers of the new active
> reference API must be able to handle lookup failure gracefully.
> 
> Passive references are used in low level code, where we might need
> to access the perag structure for the purposes of completing high
> level operations. For example, buffers need to use passive
> references because:
> - we need to be able to do metadata IO during operations like grow
>   and shrink transactions where high level active references to the
>   AG have already been blocked
> - buffers need to pin the perag until they are reclaimed from
>   memory, something that high level code has no direct control over.
> - unused cached buffers should not prevent a shrink from being
>   started.
> 
> Hence we have active references that will form exclusion barriers
> for operations to be performed on an AG, and passive references that
> will prevent reclaim of the perag until all objects with passive
> references have been reclaimed themselves.
> 
> This patch introduces xfs_perag_grab()/xfs_perag_rele() as the API
> for active AG reference functionality. We also need to convert the
> for_each_perag*() iterators to use active references, which will
> start the process of converting high level code over to using active
> references. Conversion of non-iterator based code to active
> references will be done in followup patches.
> 
> Note that the implementation using reference counting is really just
> a development vehicle for the API to ensure we don't have any leaks
> in the callers. Once we need to remove perag structures from memory
> dynamically, we will need a much more robust per-ag state transition
> mechanism for preventing new references from being taken while we
> wait for existing references to drain before removal from memory can
> occur....
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
ok, I was able to follow it
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
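
The active-reference scheme described above can be sketched with C11
atomics standing in for the kernel's atomic_t; the names below are
illustrative, not the kernel API. The key property is that the perag
starts with one active reference owned by the mount, and a lookup
(grab) succeeds only while the count is non-zero, which is exactly
what atomic_inc_not_zero() provides: once the mount drops its
reference to take the AG offline, no new active references can be
taken and the structure can be drained:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for struct xfs_perag's active ref count. */
struct perag_sketch {
	atomic_int	active_ref;
};

/*
 * Equivalent of atomic_inc_not_zero(): increment only if the count
 * is currently non-zero, retrying on CAS contention. A zero count
 * means the AG is offline or being removed, so the grab fails.
 */
static bool perag_grab(struct perag_sketch *pag)
{
	int old = atomic_load(&pag->active_ref);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&pag->active_ref,
						 &old, old + 1))
			return true;	/* got an active reference */
	}
	return false;			/* AG offline: caller must cope */
}

/* Returns true when the last reference drops (time to wake waiters). */
static bool perag_rele(struct perag_sketch *pag)
{
	return atomic_fetch_sub(&pag->active_ref, 1) == 1;
}
```

The graceful-failure requirement in the commit message falls out of
this shape: unlike a plain refcount get, perag_grab() can return
false, so every caller of the active-reference API has to handle a
NULL/absent AG.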

>  fs/xfs/libxfs/xfs_ag.c    | 70
> +++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_ag.h    | 31 ++++++++++++-----
>  fs/xfs/scrub/bmap.c       |  2 +-
>  fs/xfs/scrub/fscounters.c |  4 +--
>  fs/xfs/xfs_fsmap.c        |  4 +--
>  fs/xfs/xfs_icache.c       |  2 +-
>  fs/xfs/xfs_iwalk.c        |  6 ++--
>  fs/xfs/xfs_reflink.c      |  2 +-
>  fs/xfs/xfs_trace.h        |  3 ++
>  9 files changed, 105 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index bb0c700afe3c..46e25c682bf4 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -94,6 +94,68 @@ xfs_perag_put(
>         trace_xfs_perag_put(pag->pag_mount, pag->pag_agno, ref,
> _RET_IP_);
>  }
>  
> +/*
> + * Active references for perag structures. This is for short term
> access to the
> + * per ag structures for walking trees or accessing state. If an AG
> is being
> + * shrunk or is offline, then this will fail to find that AG and
> return NULL
> + * instead.
> + */
> +struct xfs_perag *
> +xfs_perag_grab(
> +       struct xfs_mount        *mp,
> +       xfs_agnumber_t          agno)
> +{
> +       struct xfs_perag        *pag;
> +
> +       rcu_read_lock();
> +       pag = radix_tree_lookup(&mp->m_perag_tree, agno);
> +       if (pag) {
> +               trace_xfs_perag_grab(mp, pag->pag_agno,
> +                               atomic_read(&pag->pag_active_ref),
> _RET_IP_);
> +               if (!atomic_inc_not_zero(&pag->pag_active_ref))
> +                       pag = NULL;
> +       }
> +       rcu_read_unlock();
> +       return pag;
> +}
> +
> +/*
> + * search from @first to find the next perag with the given tag set.
> + */
> +struct xfs_perag *
> +xfs_perag_grab_tag(
> +       struct xfs_mount        *mp,
> +       xfs_agnumber_t          first,
> +       int                     tag)
> +{
> +       struct xfs_perag        *pag;
> +       int                     found;
> +
> +       rcu_read_lock();
> +       found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
> +                                       (void **)&pag, first, 1,
> tag);
> +       if (found <= 0) {
> +               rcu_read_unlock();
> +               return NULL;
> +       }
> +       trace_xfs_perag_grab_tag(mp, pag->pag_agno,
> +                       atomic_read(&pag->pag_active_ref), _RET_IP_);
> +       if (!atomic_inc_not_zero(&pag->pag_active_ref))
> +               pag = NULL;
> +       rcu_read_unlock();
> +       return pag;
> +}
> +
> +void
> +xfs_perag_rele(
> +       struct xfs_perag        *pag)
> +{
> +       trace_xfs_perag_rele(pag->pag_mount, pag->pag_agno,
> +                       atomic_read(&pag->pag_active_ref), _RET_IP_);
> +       if (atomic_dec_and_test(&pag->pag_active_ref))
> +               wake_up(&pag->pag_active_wq);
> +}
> +
>  /*
>   * xfs_initialize_perag_data
>   *
> @@ -196,6 +258,10 @@ xfs_free_perag(
>                 cancel_delayed_work_sync(&pag->pag_blockgc_work);
>                 xfs_buf_hash_destroy(pag);
>  
> +               /* drop the mount's active reference */
> +               xfs_perag_rele(pag);
> +               XFS_IS_CORRUPT(pag->pag_mount,
> +                               atomic_read(&pag->pag_active_ref) != 0);
>                 call_rcu(&pag->rcu_head, __xfs_free_perag);
>         }
>  }
> @@ -314,6 +380,7 @@ xfs_initialize_perag(
>                 INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker);
>                 INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
>                 init_waitqueue_head(&pag->pagb_wait);
> +               init_waitqueue_head(&pag->pag_active_wq);
>                 pag->pagb_count = 0;
>                 pag->pagb_tree = RB_ROOT;
>  #endif /* __KERNEL__ */
> @@ -322,6 +389,9 @@ xfs_initialize_perag(
>                 if (error)
>                         goto out_remove_pag;
>  
> +               /* Active ref owned by mount indicates AG is online. */
> +               atomic_set(&pag->pag_active_ref, 1);
> +
>                 /* first new pag is fully initialized */
>                 if (first_initialised == NULLAGNUMBER)
>                         first_initialised = index;
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index 191b22b9a35b..aeb21c8df201 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -32,7 +32,9 @@ struct xfs_ag_resv {
>  struct xfs_perag {
>         struct xfs_mount *pag_mount;    /* owner filesystem */
>         xfs_agnumber_t  pag_agno;       /* AG this structure belongs to */
> -       atomic_t        pag_ref;        /* perag reference count */
> +       atomic_t        pag_ref;        /* passive reference count */
> +       atomic_t        pag_active_ref; /* active reference count */
> +       wait_queue_head_t pag_active_wq;/* woken active_ref falls to zero */
>         char            pagf_init;      /* this agf's entry is initialized */
>         char            pagi_init;      /* this agi's entry is initialized */
>         char            pagf_metadata;  /* the agf is preferred to be metadata */
> @@ -111,11 +113,18 @@ int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount,
>  int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno);
>  void xfs_free_perag(struct xfs_mount *mp);
>  
> +/* Passive AG references */
>  struct xfs_perag *xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno);
>  struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *mp, xfs_agnumber_t agno,
>                 unsigned int tag);
>  void xfs_perag_put(struct xfs_perag *pag);
>  
> +/* Active AG references */
> +struct xfs_perag *xfs_perag_grab(struct xfs_mount *, xfs_agnumber_t);
> +struct xfs_perag *xfs_perag_grab_tag(struct xfs_mount *, xfs_agnumber_t,
> +                                  int tag);
> +void xfs_perag_rele(struct xfs_perag *pag);
> +
>  /*
>   * Per-ag geometry infomation and validation
>   */
> @@ -193,14 +202,18 @@ xfs_perag_next(
>         struct xfs_mount        *mp = pag->pag_mount;
>  
>         *agno = pag->pag_agno + 1;
> -       xfs_perag_put(pag);
> -       if (*agno > end_agno)
> -               return NULL;
> -       return xfs_perag_get(mp, *agno);
> +       xfs_perag_rele(pag);
> +       while (*agno <= end_agno) {
> +               pag = xfs_perag_grab(mp, *agno);
> +               if (pag)
> +                       return pag;
> +               (*agno)++;
> +       }
> +       return NULL;
>  }
>  
>  #define for_each_perag_range(mp, agno, end_agno, pag) \
> -       for ((pag) = xfs_perag_get((mp), (agno)); \
> +       for ((pag) = xfs_perag_grab((mp), (agno)); \
>                 (pag) != NULL; \
>                 (pag) = xfs_perag_next((pag), &(agno), (end_agno)))
>  
> @@ -213,11 +226,11 @@ xfs_perag_next(
>         for_each_perag_from((mp), (agno), (pag))
>  
>  #define for_each_perag_tag(mp, agno, pag, tag) \
> -       for ((agno) = 0, (pag) = xfs_perag_get_tag((mp), 0, (tag)); \
> +       for ((agno) = 0, (pag) = xfs_perag_grab_tag((mp), 0, (tag)); \
>                 (pag) != NULL; \
>                 (agno) = (pag)->pag_agno + 1, \
> -               xfs_perag_put(pag), \
> -               (pag) = xfs_perag_get_tag((mp), (agno), (tag)))
> +               xfs_perag_rele(pag), \
> +               (pag) = xfs_perag_grab_tag((mp), (agno), (tag)))
>  
>  struct aghdr_init_data {
>         /* per ag data */
> diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
> index d50d0eab196a..dbbc7037074c 100644
> --- a/fs/xfs/scrub/bmap.c
> +++ b/fs/xfs/scrub/bmap.c
> @@ -662,7 +662,7 @@ xchk_bmap_check_rmaps(
>                 error = xchk_bmap_check_ag_rmaps(sc, whichfork, pag);
>                 if (error ||
>                     (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) {
> -                       xfs_perag_put(pag);
> +                       xfs_perag_rele(pag);
>                         return error;
>                 }
>         }
> diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
> index 4777e7b89fdc..ef97670970c3 100644
> --- a/fs/xfs/scrub/fscounters.c
> +++ b/fs/xfs/scrub/fscounters.c
> @@ -117,7 +117,7 @@ xchk_fscount_warmup(
>         if (agi_bp)
>                 xfs_buf_relse(agi_bp);
>         if (pag)
> -               xfs_perag_put(pag);
> +               xfs_perag_rele(pag);
>         return error;
>  }
>  
> @@ -249,7 +249,7 @@ xchk_fscount_aggregate_agcounts(
>  
>         }
>         if (pag)
> -               xfs_perag_put(pag);
> +               xfs_perag_rele(pag);
>         if (error) {
>                 xchk_set_incomplete(sc);
>                 return error;
> diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
> index 88a88506ffff..120d284a03fe 100644
> --- a/fs/xfs/xfs_fsmap.c
> +++ b/fs/xfs/xfs_fsmap.c
> @@ -688,11 +688,11 @@ __xfs_getfsmap_datadev(
>                 info->agf_bp = NULL;
>         }
>         if (info->pag) {
> -               xfs_perag_put(info->pag);
> +               xfs_perag_rele(info->pag);
>                 info->pag = NULL;
>         } else if (pag) {
>                 /* loop termination case */
> -               xfs_perag_put(pag);
> +               xfs_perag_rele(pag);
>         }
>  
>         return error;
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index ddeaccc04aec..0f4a014dded3 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1767,7 +1767,7 @@ xfs_icwalk(
>                 if (error) {
>                         last_error = error;
>                         if (error == -EFSCORRUPTED) {
> -                               xfs_perag_put(pag);
> +                               xfs_perag_rele(pag);
>                                 break;
>                         }
>                 }
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index 7558486f4937..c31857d903a4 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -591,7 +591,7 @@ xfs_iwalk(
>         }
>  
>         if (iwag.pag)
> -               xfs_perag_put(pag);
> +               xfs_perag_rele(pag);
>         xfs_iwalk_free(&iwag);
>         return error;
>  }
> @@ -683,7 +683,7 @@ xfs_iwalk_threaded(
>                         break;
>         }
>         if (pag)
> -               xfs_perag_put(pag);
> +               xfs_perag_rele(pag);
>         if (polled)
>                 xfs_pwork_poll(&pctl);
>         return xfs_pwork_destroy(&pctl);
> @@ -776,7 +776,7 @@ xfs_inobt_walk(
>         }
>  
>         if (iwag.pag)
> -               xfs_perag_put(pag);
> +               xfs_perag_rele(pag);
>         xfs_iwalk_free(&iwag);
>         return error;
>  }
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 57bf59ff4854..f5dc46ce9803 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -927,7 +927,7 @@ xfs_reflink_recover_cow(
>         for_each_perag(mp, agno, pag) {
>                 error = xfs_refcount_recover_cow_leftovers(mp, pag);
>                 if (error) {
> -                       xfs_perag_put(pag);
> +                       xfs_perag_rele(pag);
>                         break;
>                 }
>         }
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 7dc57db6aa42..f0b62054ea68 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -189,6 +189,9 @@ DEFINE_EVENT(xfs_perag_class, name, \
>  DEFINE_PERAG_REF_EVENT(xfs_perag_get);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_get_tag);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_put);
> +DEFINE_PERAG_REF_EVENT(xfs_perag_grab);
> +DEFINE_PERAG_REF_EVENT(xfs_perag_grab_tag);
> +DEFINE_PERAG_REF_EVENT(xfs_perag_rele);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_set_inode_tag);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_clear_inode_tag);
>  


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 08/42] xfs: rework the perag trace points to be perag centric
  2023-01-18 22:44 ` [PATCH 08/42] xfs: rework the perag trace points to be perag centric Dave Chinner
@ 2023-01-21  5:16   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-21  5:16 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> So that they all output the same information in the traces to make
> debugging refcount issues easier.
> 
> This means that all the lookup/drop functions no longer need to use
> the full memory barrier atomic operations (atomic*_return()) so
> will have less overhead when tracing is off. The set/clear tag
> tracepoints no longer abuse the reference count to pass the tag -
> the tag being cleared is obvious from the _RET_IP_ that is recorded
> in the trace point.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, makes sense
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

>  fs/xfs/libxfs/xfs_ag.c | 25 +++++++++----------------
>  fs/xfs/xfs_icache.c    |  4 ++--
>  fs/xfs/xfs_trace.h     | 21 +++++++++++----------
>  3 files changed, 22 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 46e25c682bf4..7cff61875340 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -44,16 +44,15 @@ xfs_perag_get(
>         xfs_agnumber_t          agno)
>  {
>         struct xfs_perag        *pag;
> -       int                     ref = 0;
>  
>         rcu_read_lock();
>         pag = radix_tree_lookup(&mp->m_perag_tree, agno);
>         if (pag) {
> +               trace_xfs_perag_get(pag, _RET_IP_);
>                 ASSERT(atomic_read(&pag->pag_ref) >= 0);
> -               ref = atomic_inc_return(&pag->pag_ref);
> +               atomic_inc(&pag->pag_ref);
>         }
>         rcu_read_unlock();
> -       trace_xfs_perag_get(mp, agno, ref, _RET_IP_);
>         return pag;
>  }
>  
> @@ -68,7 +67,6 @@ xfs_perag_get_tag(
>  {
>         struct xfs_perag        *pag;
>         int                     found;
> -       int                     ref;
>  
>         rcu_read_lock();
>         found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
> @@ -77,9 +75,9 @@ xfs_perag_get_tag(
>                 rcu_read_unlock();
>                 return NULL;
>         }
> -       ref = atomic_inc_return(&pag->pag_ref);
> +       trace_xfs_perag_get_tag(pag, _RET_IP_);
> +       atomic_inc(&pag->pag_ref);
>         rcu_read_unlock();
> -       trace_xfs_perag_get_tag(mp, pag->pag_agno, ref, _RET_IP_);
>         return pag;
>  }
>  
> @@ -87,11 +85,9 @@ void
>  xfs_perag_put(
>         struct xfs_perag        *pag)
>  {
> -       int     ref;
> -
> +       trace_xfs_perag_put(pag, _RET_IP_);
>         ASSERT(atomic_read(&pag->pag_ref) > 0);
> -       ref = atomic_dec_return(&pag->pag_ref);
> -       trace_xfs_perag_put(pag->pag_mount, pag->pag_agno, ref, _RET_IP_);
> +       atomic_dec(&pag->pag_ref);
>  }
>  
>  /*
> @@ -110,8 +106,7 @@ xfs_perag_grab(
>         rcu_read_lock();
>         pag = radix_tree_lookup(&mp->m_perag_tree, agno);
>         if (pag) {
> -               trace_xfs_perag_grab(mp, pag->pag_agno,
> -                               atomic_read(&pag->pag_active_ref), _RET_IP_);
> +               trace_xfs_perag_grab(pag, _RET_IP_);
>                 if (!atomic_inc_not_zero(&pag->pag_active_ref))
>                         pag = NULL;
>         }
> @@ -138,8 +133,7 @@ xfs_perag_grab_tag(
>                 rcu_read_unlock();
>                 return NULL;
>         }
> -       trace_xfs_perag_grab_tag(mp, pag->pag_agno,
> -                       atomic_read(&pag->pag_active_ref), _RET_IP_);
> +       trace_xfs_perag_grab_tag(pag, _RET_IP_);
>         if (!atomic_inc_not_zero(&pag->pag_active_ref))
>                 pag = NULL;
>         rcu_read_unlock();
> @@ -150,8 +144,7 @@ void
>  xfs_perag_rele(
>         struct xfs_perag        *pag)
>  {
> -       trace_xfs_perag_rele(pag->pag_mount, pag->pag_agno,
> -                       atomic_read(&pag->pag_active_ref), _RET_IP_);
> +       trace_xfs_perag_rele(pag, _RET_IP_);
>         if (atomic_dec_and_test(&pag->pag_active_ref))
>                 wake_up(&pag->pag_active_wq);
>  }
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 0f4a014dded3..8b2823d85a68 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -255,7 +255,7 @@ xfs_perag_set_inode_tag(
>                 break;
>         }
>  
> -       trace_xfs_perag_set_inode_tag(mp, pag->pag_agno, tag, _RET_IP_);
> +       trace_xfs_perag_set_inode_tag(pag, _RET_IP_);
>  }
>  
>  /* Clear a tag on both the AG incore inode tree and the AG radix tree. */
> @@ -289,7 +289,7 @@ xfs_perag_clear_inode_tag(
>         radix_tree_tag_clear(&mp->m_perag_tree, pag->pag_agno, tag);
>         spin_unlock(&mp->m_perag_lock);
>  
> -       trace_xfs_perag_clear_inode_tag(mp, pag->pag_agno, tag, _RET_IP_);
> +       trace_xfs_perag_clear_inode_tag(pag, _RET_IP_);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index f0b62054ea68..c921e9a5256d 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -159,33 +159,34 @@ TRACE_EVENT(xlog_intent_recovery_failed,
>  );
>  
>  DECLARE_EVENT_CLASS(xfs_perag_class,
> -       TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int refcount,
> -                unsigned long caller_ip),
> -       TP_ARGS(mp, agno, refcount, caller_ip),
> +       TP_PROTO(struct xfs_perag *pag, unsigned long caller_ip),
> +       TP_ARGS(pag, caller_ip),
>         TP_STRUCT__entry(
>                 __field(dev_t, dev)
>                 __field(xfs_agnumber_t, agno)
>                 __field(int, refcount)
> +               __field(int, active_refcount)
>                 __field(unsigned long, caller_ip)
>         ),
>         TP_fast_assign(
> -               __entry->dev = mp->m_super->s_dev;
> -               __entry->agno = agno;
> -               __entry->refcount = refcount;
> +               __entry->dev = pag->pag_mount->m_super->s_dev;
> +               __entry->agno = pag->pag_agno;
> +               __entry->refcount = atomic_read(&pag->pag_ref);
> +               __entry->active_refcount = atomic_read(&pag->pag_active_ref);
>                 __entry->caller_ip = caller_ip;
>         ),
> -       TP_printk("dev %d:%d agno 0x%x refcount %d caller %pS",
> +       TP_printk("dev %d:%d agno 0x%x passive refs %d active refs %d caller %pS",
>                   MAJOR(__entry->dev), MINOR(__entry->dev),
>                   __entry->agno,
>                   __entry->refcount,
> +                 __entry->active_refcount,
>                   (char *)__entry->caller_ip)
>  );
>  
>  #define DEFINE_PERAG_REF_EVENT(name)   \
>  DEFINE_EVENT(xfs_perag_class, name,    \
> -       TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int refcount,       \
> -                unsigned long caller_ip),                                      \
> -       TP_ARGS(mp, agno, refcount, caller_ip))
> +       TP_PROTO(struct xfs_perag *pag, unsigned long caller_ip), \
> +       TP_ARGS(pag, caller_ip))
>  DEFINE_PERAG_REF_EVENT(xfs_perag_get);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_get_tag);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_put);


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 10/42] xfs: use active perag references for inode allocation
  2023-01-18 22:44 ` [PATCH 10/42] xfs: use active perag references for inode allocation Dave Chinner
@ 2023-01-22  6:48   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-22  6:48 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Convert the inode allocation routines to use active perag references
> or references held by callers rather than grab their own. Also drive
> the perag further inwards to replace xfs_mounts when doing
> operations on a specific AG.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, looks ok to me
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/libxfs/xfs_ag.c     |  3 +-
>  fs/xfs/libxfs/xfs_ialloc.c | 63 +++++++++++++++++++-------------------
>  fs/xfs/libxfs/xfs_ialloc.h |  2 +-
>  3 files changed, 33 insertions(+), 35 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 7cff61875340..a3bdcde95845 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -925,8 +925,7 @@ xfs_ag_shrink_space(
>          * Make sure that the last inode cluster cannot overlap with the new
>          * end of the AG, even if it's sparse.
>          */
> -       error = xfs_ialloc_check_shrink(*tpp, pag->pag_agno, agibp,
> -                       aglen - delta);
> +       error = xfs_ialloc_check_shrink(pag, *tpp, agibp, aglen - delta);
>         if (error)
>                 return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 2b4961ff2e24..a1a482ec3065 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -169,14 +169,14 @@ xfs_inobt_insert_rec(
>   */
>  STATIC int
>  xfs_inobt_insert(
> -       struct xfs_mount        *mp,
> +       struct xfs_perag        *pag,
>         struct xfs_trans        *tp,
>         struct xfs_buf          *agbp,
> -       struct xfs_perag        *pag,
>         xfs_agino_t             newino,
>         xfs_agino_t             newlen,
>         xfs_btnum_t             btnum)
>  {
> +       struct xfs_mount        *mp = pag->pag_mount;
>         struct xfs_btree_cur    *cur;
>         xfs_agino_t             thisino;
>         int                     i;
> @@ -514,14 +514,14 @@ __xfs_inobt_rec_merge(
>   */
>  STATIC int
>  xfs_inobt_insert_sprec(
> -       struct xfs_mount                *mp,
> +       struct xfs_perag                *pag,
>         struct xfs_trans                *tp,
>         struct xfs_buf                  *agbp,
> -       struct xfs_perag                *pag,
>         int                             btnum,
>         struct xfs_inobt_rec_incore     *nrec,  /* in/out: new/merged rec. */
>         bool                            merge)  /* merge or replace */
>  {
> +       struct xfs_mount                *mp = pag->pag_mount;
>         struct xfs_btree_cur            *cur;
>         int                             error;
>         int                             i;
> @@ -609,9 +609,9 @@ xfs_inobt_insert_sprec(
>   */
>  STATIC int
>  xfs_ialloc_ag_alloc(
> +       struct xfs_perag        *pag,
>         struct xfs_trans        *tp,
> -       struct xfs_buf          *agbp,
> -       struct xfs_perag        *pag)
> +       struct xfs_buf          *agbp)
>  {
>         struct xfs_agi          *agi;
>         struct xfs_alloc_arg    args;
> @@ -831,7 +831,7 @@ xfs_ialloc_ag_alloc(
>                  * if necessary. If a merge does occur, rec is updated to the
>                  * merged record.
>                  */
> -               error = xfs_inobt_insert_sprec(args.mp, tp, agbp, pag,
> +               error = xfs_inobt_insert_sprec(pag, tp, agbp,
>                                 XFS_BTNUM_INO, &rec, true);
>                 if (error == -EFSCORRUPTED) {
>                         xfs_alert(args.mp,
> @@ -856,20 +856,20 @@ xfs_ialloc_ag_alloc(
>                  * existing record with this one.
>                  */
>                 if (xfs_has_finobt(args.mp)) {
> -                       error = xfs_inobt_insert_sprec(args.mp, tp, agbp, pag,
> +                       error = xfs_inobt_insert_sprec(pag, tp, agbp,
>                                        XFS_BTNUM_FINO, &rec, false);
>                         if (error)
>                                 return error;
>                 }
>         } else {
>                 /* full chunk - insert new records to both btrees */
> -               error = xfs_inobt_insert(args.mp, tp, agbp, pag, newino, newlen,
> +               error = xfs_inobt_insert(pag, tp, agbp, newino, newlen,
>                                          XFS_BTNUM_INO);
>                 if (error)
>                         return error;
>  
>                 if (xfs_has_finobt(args.mp)) {
> -                       error = xfs_inobt_insert(args.mp, tp, agbp, pag, newino,
> +                       error = xfs_inobt_insert(pag, tp, agbp, newino,
>                                                  newlen, XFS_BTNUM_FINO);
>                         if (error)
>                                 return error;
> @@ -981,9 +981,9 @@ xfs_inobt_first_free_inode(
>   */
>  STATIC int
>  xfs_dialloc_ag_inobt(
> +       struct xfs_perag        *pag,
>         struct xfs_trans        *tp,
>         struct xfs_buf          *agbp,
> -       struct xfs_perag        *pag,
>         xfs_ino_t               parent,
>         xfs_ino_t               *inop)
>  {
> @@ -1429,9 +1429,9 @@ xfs_dialloc_ag_update_inobt(
>   */
>  static int
>  xfs_dialloc_ag(
> +       struct xfs_perag        *pag,
>         struct xfs_trans        *tp,
>         struct xfs_buf          *agbp,
> -       struct xfs_perag        *pag,
>         xfs_ino_t               parent,
>         xfs_ino_t               *inop)
>  {
> @@ -1448,7 +1448,7 @@ xfs_dialloc_ag(
>         int                             i;
>  
>         if (!xfs_has_finobt(mp))
> -               return xfs_dialloc_ag_inobt(tp, agbp, pag, parent, inop);
> +               return xfs_dialloc_ag_inobt(pag, tp, agbp, parent, inop);
>  
>         /*
>          * If pagino is 0 (this is the root inode allocation) use newino.
> @@ -1594,8 +1594,8 @@ xfs_ialloc_next_ag(
>  
>  static bool
>  xfs_dialloc_good_ag(
> -       struct xfs_trans        *tp,
>         struct xfs_perag        *pag,
> +       struct xfs_trans        *tp,
>         umode_t                 mode,
>         int                     flags,
>         bool                    ok_alloc)
> @@ -1606,6 +1606,8 @@ xfs_dialloc_good_ag(
>         int                     needspace;
>         int                     error;
>  
> +       if (!pag)
> +               return false;
>         if (!pag->pagi_inodeok)
>                 return false;
>  
> @@ -1665,8 +1667,8 @@ xfs_dialloc_good_ag(
>  
>  static int
>  xfs_dialloc_try_ag(
> -       struct xfs_trans        **tpp,
>         struct xfs_perag        *pag,
> +       struct xfs_trans        **tpp,
>         xfs_ino_t               parent,
>         xfs_ino_t               *new_ino,
>         bool                    ok_alloc)
> @@ -1689,7 +1691,7 @@ xfs_dialloc_try_ag(
>                         goto out_release;
>                 }
>  
> -               error = xfs_ialloc_ag_alloc(*tpp, agbp, pag);
> +               error = xfs_ialloc_ag_alloc(pag, *tpp, agbp);
>                 if (error < 0)
>                         goto out_release;
>  
> @@ -1705,7 +1707,7 @@ xfs_dialloc_try_ag(
>         }
>  
>         /* Allocate an inode in the found AG */
> -       error = xfs_dialloc_ag(*tpp, agbp, pag, parent, &ino);
> +       error = xfs_dialloc_ag(pag, *tpp, agbp, parent, &ino);
>         if (!error)
>                 *new_ino = ino;
>         return error;
> @@ -1790,9 +1792,9 @@ xfs_dialloc(
>         agno = start_agno;
>         flags = XFS_ALLOC_FLAG_TRYLOCK;
>         for (;;) {
> -               pag = xfs_perag_get(mp, agno);
> -               if (xfs_dialloc_good_ag(*tpp, pag, mode, flags, ok_alloc)) {
> -                       error = xfs_dialloc_try_ag(tpp, pag, parent,
> +               pag = xfs_perag_grab(mp, agno);
> +               if (xfs_dialloc_good_ag(pag, *tpp, mode, flags, ok_alloc)) {
> +                       error = xfs_dialloc_try_ag(pag, tpp, parent,
>                                         &ino, ok_alloc);
>                         if (error != -EAGAIN)
>                                 break;
> @@ -1813,12 +1815,12 @@ xfs_dialloc(
>                         if (low_space)
>                                 ok_alloc = true;
>                 }
> -               xfs_perag_put(pag);
> +               xfs_perag_rele(pag);
>         }
>  
>         if (!error)
>                 *new_ino = ino;
> -       xfs_perag_put(pag);
> +       xfs_perag_rele(pag);
>         return error;
>  }
>  
> @@ -1902,14 +1904,14 @@ xfs_difree_inode_chunk(
>  
>  STATIC int
>  xfs_difree_inobt(
> -       struct xfs_mount                *mp,
> +       struct xfs_perag                *pag,
>         struct xfs_trans                *tp,
>         struct xfs_buf                  *agbp,
> -       struct xfs_perag                *pag,
>         xfs_agino_t                     agino,
>         struct xfs_icluster             *xic,
>         struct xfs_inobt_rec_incore     *orec)
>  {
> +       struct xfs_mount                *mp = pag->pag_mount;
>         struct xfs_agi                  *agi = agbp->b_addr;
>         struct xfs_btree_cur            *cur;
>         struct xfs_inobt_rec_incore     rec;
> @@ -2036,13 +2038,13 @@ xfs_difree_inobt(
>   */
>  STATIC int
>  xfs_difree_finobt(
> -       struct xfs_mount                *mp,
> +       struct xfs_perag                *pag,
>         struct xfs_trans                *tp,
>         struct xfs_buf                  *agbp,
> -       struct xfs_perag                *pag,
>         xfs_agino_t                     agino,
>         struct xfs_inobt_rec_incore     *ibtrec) /* inobt record */
>  {
> +       struct xfs_mount                *mp = pag->pag_mount;
>         struct xfs_btree_cur            *cur;
>         struct xfs_inobt_rec_incore     rec;
>         int                             offset = agino - ibtrec->ir_startino;
> @@ -2196,7 +2198,7 @@ xfs_difree(
>         /*
>          * Fix up the inode allocation btree.
>          */
> -       error = xfs_difree_inobt(mp, tp, agbp, pag, agino, xic, &rec);
> +       error = xfs_difree_inobt(pag, tp, agbp, agino, xic, &rec);
>         if (error)
>                 goto error0;
>  
> @@ -2204,7 +2206,7 @@ xfs_difree(
>          * Fix up the free inode btree.
>          */
>         if (xfs_has_finobt(mp)) {
> -               error = xfs_difree_finobt(mp, tp, agbp, pag, agino, &rec);
> +               error = xfs_difree_finobt(pag, tp, agbp, agino, &rec);
>                 if (error)
>                         goto error0;
>         }
> @@ -2928,15 +2930,14 @@ xfs_ialloc_calc_rootino(
>   */
>  int
>  xfs_ialloc_check_shrink(
> +       struct xfs_perag        *pag,
>         struct xfs_trans        *tp,
> -       xfs_agnumber_t          agno,
>         struct xfs_buf          *agibp,
>         xfs_agblock_t           new_length)
>  {
>         struct xfs_inobt_rec_incore rec;
>         struct xfs_btree_cur    *cur;
>         struct xfs_mount        *mp = tp->t_mountp;
> -       struct xfs_perag        *pag;
>         xfs_agino_t             agino = XFS_AGB_TO_AGINO(mp, new_length);
>         int                     has;
>         int                     error;
> @@ -2944,7 +2945,6 @@ xfs_ialloc_check_shrink(
>         if (!xfs_has_sparseinodes(mp))
>                 return 0;
>  
> -       pag = xfs_perag_get(mp, agno);
>         cur = xfs_inobt_init_cursor(mp, tp, agibp, pag,
> XFS_BTNUM_INO);
>  
>         /* Look up the inobt record that would correspond to the new EOFS. */
> @@ -2968,6 +2968,5 @@ xfs_ialloc_check_shrink(
>         }
>  out:
>         xfs_btree_del_cursor(cur, error);
> -       xfs_perag_put(pag);
>         return error;
>  }
> diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> index 4cfce2eebe7e..ab8c30b4ec22 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.h
> +++ b/fs/xfs/libxfs/xfs_ialloc.h
> @@ -107,7 +107,7 @@ int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
>  void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
>  xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
>  
> -int xfs_ialloc_check_shrink(struct xfs_trans *tp, xfs_agnumber_t agno,
> +int xfs_ialloc_check_shrink(struct xfs_perag *pag, struct xfs_trans *tp,
>                 struct xfs_buf *agibp, xfs_agblock_t new_length);
>  
>  #endif /* __XFS_IALLOC_H__ */


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 11/42] xfs: inobt can use perags in many more places than it does
  2023-01-18 22:44 ` [PATCH 11/42] xfs: inobt can use perags in many more places than it does Dave Chinner
@ 2023-01-22  6:48   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-22  6:48 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Lots of code in the inobt infrastructure is passed both xfs_mount
> and perags. We only need perags for the per-ag inode allocation
> code, so reduce the duplication by passing only the perags as the
> primary object.
> 
> This ends up reducing the code size by a bit:
> 
>            text    data     bss     dec     hex filename
> orig    1138878  323979     548 1463405  16546d (TOTALS)
> patched 1138709  323979     548 1463236  1653c4 (TOTALS)
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Alrighty, looks like a nice clean up
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
>  fs/xfs/libxfs/xfs_ag_resv.c      |  2 +-
>  fs/xfs/libxfs/xfs_ialloc.c       | 25 +++++++++++----------
>  fs/xfs/libxfs/xfs_ialloc_btree.c | 37 ++++++++++++++------------------
>  fs/xfs/libxfs/xfs_ialloc_btree.h | 20 ++++++++---------
>  fs/xfs/scrub/agheader_repair.c   |  7 +++---
>  fs/xfs/scrub/common.c            |  8 +++----
>  fs/xfs/xfs_iwalk.c               |  4 ++--
>  7 files changed, 47 insertions(+), 56 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
> index 5af123d13a63..7fd1fea95552 100644
> --- a/fs/xfs/libxfs/xfs_ag_resv.c
> +++ b/fs/xfs/libxfs/xfs_ag_resv.c
> @@ -264,7 +264,7 @@ xfs_ag_resv_init(
>                 if (error)
>                         goto out;
>  
> -               error = xfs_finobt_calc_reserves(mp, tp, pag, &ask, &used);
> +               error = xfs_finobt_calc_reserves(pag, tp, &ask, &used);
>                 if (error)
>                         goto out;
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index a1a482ec3065..5b8401038bab 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -176,13 +176,12 @@ xfs_inobt_insert(
>         xfs_agino_t             newlen,
>         xfs_btnum_t             btnum)
>  {
> -       struct xfs_mount        *mp = pag->pag_mount;
>         struct xfs_btree_cur    *cur;
>         xfs_agino_t             thisino;
>         int                     i;
>         int                     error;
>  
> -       cur = xfs_inobt_init_cursor(mp, tp, agbp, pag, btnum);
> +       cur = xfs_inobt_init_cursor(pag, tp, agbp, btnum);
>  
>         for (thisino = newino;
>              thisino < newino + newlen;
> @@ -527,7 +526,7 @@ xfs_inobt_insert_sprec(
>         int                             i;
>         struct xfs_inobt_rec_incore     rec;
>  
> -       cur = xfs_inobt_init_cursor(mp, tp, agbp, pag, btnum);
> +       cur = xfs_inobt_init_cursor(pag, tp, agbp, btnum);
>  
>         /* the new record is pre-aligned so we know where to look */
>         error = xfs_inobt_lookup(cur, nrec->ir_startino,
> XFS_LOOKUP_EQ, &i);
> @@ -1004,7 +1003,7 @@ xfs_dialloc_ag_inobt(
>         ASSERT(pag->pagi_freecount > 0);
>  
>   restart_pagno:
> -       cur = xfs_inobt_init_cursor(mp, tp, agbp, pag,
> XFS_BTNUM_INO);
> +       cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_INO);
>         /*
>          * If pagino is 0 (this is the root inode allocation) use
> newino.
>          * This must work because we've just allocated some.
> @@ -1457,7 +1456,7 @@ xfs_dialloc_ag(
>         if (!pagino)
>                 pagino = be32_to_cpu(agi->agi_newino);
>  
> -       cur = xfs_inobt_init_cursor(mp, tp, agbp, pag,
> XFS_BTNUM_FINO);
> +       cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_FINO);
>  
>         error = xfs_check_agi_freecount(cur);
>         if (error)
> @@ -1500,7 +1499,7 @@ xfs_dialloc_ag(
>          * the original freecount. If all is well, make the
> equivalent update to
>          * the inobt using the finobt record and offset information.
>          */
> -       icur = xfs_inobt_init_cursor(mp, tp, agbp, pag,
> XFS_BTNUM_INO);
> +       icur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_INO);
>  
>         error = xfs_check_agi_freecount(icur);
>         if (error)
> @@ -1926,7 +1925,7 @@ xfs_difree_inobt(
>         /*
>          * Initialize the cursor.
>          */
> -       cur = xfs_inobt_init_cursor(mp, tp, agbp, pag,
> XFS_BTNUM_INO);
> +       cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_INO);
>  
>         error = xfs_check_agi_freecount(cur);
>         if (error)
> @@ -2051,7 +2050,7 @@ xfs_difree_finobt(
>         int                             error;
>         int                             i;
>  
> -       cur = xfs_inobt_init_cursor(mp, tp, agbp, pag,
> XFS_BTNUM_FINO);
> +       cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_FINO);
>  
>         error = xfs_inobt_lookup(cur, ibtrec->ir_startino,
> XFS_LOOKUP_EQ, &i);
>         if (error)
> @@ -2248,7 +2247,7 @@ xfs_imap_lookup(
>          * we have a record, we need to ensure it contains the inode
> number
>          * we are looking up.
>          */
> -       cur = xfs_inobt_init_cursor(mp, tp, agbp, pag,
> XFS_BTNUM_INO);
> +       cur = xfs_inobt_init_cursor(pag, tp, agbp, XFS_BTNUM_INO);
>         error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &i);
>         if (!error) {
>                 if (i)
> @@ -2937,17 +2936,17 @@ xfs_ialloc_check_shrink(
>  {
>         struct xfs_inobt_rec_incore rec;
>         struct xfs_btree_cur    *cur;
> -       struct xfs_mount        *mp = tp->t_mountp;
> -       xfs_agino_t             agino = XFS_AGB_TO_AGINO(mp,
> new_length);
> +       xfs_agino_t             agino;
>         int                     has;
>         int                     error;
>  
> -       if (!xfs_has_sparseinodes(mp))
> +       if (!xfs_has_sparseinodes(pag->pag_mount))
>                 return 0;
>  
> -       cur = xfs_inobt_init_cursor(mp, tp, agibp, pag,
> XFS_BTNUM_INO);
> +       cur = xfs_inobt_init_cursor(pag, tp, agibp, XFS_BTNUM_INO);
>  
>         /* Look up the inobt record that would correspond to the new
> EOFS. */
> +       agino = XFS_AGB_TO_AGINO(pag->pag_mount, new_length);
>         error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &has);
>         if (error || !has)
>                 goto out;
> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c
> b/fs/xfs/libxfs/xfs_ialloc_btree.c
> index 8c83e265770c..d657af2ec350 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> @@ -36,8 +36,8 @@ STATIC struct xfs_btree_cur *
>  xfs_inobt_dup_cursor(
>         struct xfs_btree_cur    *cur)
>  {
> -       return xfs_inobt_init_cursor(cur->bc_mp, cur->bc_tp,
> -                       cur->bc_ag.agbp, cur->bc_ag.pag, cur-
> >bc_btnum);
> +       return xfs_inobt_init_cursor(cur->bc_ag.pag, cur->bc_tp,
> +                       cur->bc_ag.agbp, cur->bc_btnum);
>  }
>  
>  STATIC void
> @@ -427,11 +427,11 @@ static const struct xfs_btree_ops
> xfs_finobt_ops = {
>   */
>  static struct xfs_btree_cur *
>  xfs_inobt_init_common(
> -       struct xfs_mount        *mp,            /* file system mount
> point */
> -       struct xfs_trans        *tp,            /* transaction
> pointer */
>         struct xfs_perag        *pag,
> +       struct xfs_trans        *tp,            /* transaction
> pointer */
>         xfs_btnum_t             btnum)          /* ialloc or free ino
> btree */
>  {
> +       struct xfs_mount        *mp = pag->pag_mount;
>         struct xfs_btree_cur    *cur;
>  
>         cur = xfs_btree_alloc_cursor(mp, tp, btnum,
> @@ -456,16 +456,15 @@ xfs_inobt_init_common(
>  /* Create an inode btree cursor. */
>  struct xfs_btree_cur *
>  xfs_inobt_init_cursor(
> -       struct xfs_mount        *mp,
> +       struct xfs_perag        *pag,
>         struct xfs_trans        *tp,
>         struct xfs_buf          *agbp,
> -       struct xfs_perag        *pag,
>         xfs_btnum_t             btnum)
>  {
>         struct xfs_btree_cur    *cur;
>         struct xfs_agi          *agi = agbp->b_addr;
>  
> -       cur = xfs_inobt_init_common(mp, tp, pag, btnum);
> +       cur = xfs_inobt_init_common(pag, tp, btnum);
>         if (btnum == XFS_BTNUM_INO)
>                 cur->bc_nlevels = be32_to_cpu(agi->agi_level);
>         else
> @@ -477,14 +476,13 @@ xfs_inobt_init_cursor(
>  /* Create an inode btree cursor with a fake root for staging. */
>  struct xfs_btree_cur *
>  xfs_inobt_stage_cursor(
> -       struct xfs_mount        *mp,
> -       struct xbtree_afakeroot *afake,
>         struct xfs_perag        *pag,
> +       struct xbtree_afakeroot *afake,
>         xfs_btnum_t             btnum)
>  {
>         struct xfs_btree_cur    *cur;
>  
> -       cur = xfs_inobt_init_common(mp, NULL, pag, btnum);
> +       cur = xfs_inobt_init_common(pag, NULL, btnum);
>         xfs_btree_stage_afakeroot(cur, afake);
>         return cur;
>  }
> @@ -708,9 +706,8 @@ xfs_inobt_max_size(
>  /* Read AGI and create inobt cursor. */
>  int
>  xfs_inobt_cur(
> -       struct xfs_mount        *mp,
> -       struct xfs_trans        *tp,
>         struct xfs_perag        *pag,
> +       struct xfs_trans        *tp,
>         xfs_btnum_t             which,
>         struct xfs_btree_cur    **curpp,
>         struct xfs_buf          **agi_bpp)
> @@ -725,16 +722,15 @@ xfs_inobt_cur(
>         if (error)
>                 return error;
>  
> -       cur = xfs_inobt_init_cursor(mp, tp, *agi_bpp, pag, which);
> +       cur = xfs_inobt_init_cursor(pag, tp, *agi_bpp, which);
>         *curpp = cur;
>         return 0;
>  }
>  
>  static int
>  xfs_inobt_count_blocks(
> -       struct xfs_mount        *mp,
> -       struct xfs_trans        *tp,
>         struct xfs_perag        *pag,
> +       struct xfs_trans        *tp,
>         xfs_btnum_t             btnum,
>         xfs_extlen_t            *tree_blocks)
>  {
> @@ -742,7 +738,7 @@ xfs_inobt_count_blocks(
>         struct xfs_btree_cur    *cur = NULL;
>         int                     error;
>  
> -       error = xfs_inobt_cur(mp, tp, pag, btnum, &cur, &agbp);
> +       error = xfs_inobt_cur(pag, tp, btnum, &cur, &agbp);
>         if (error)
>                 return error;
>  
> @@ -779,22 +775,21 @@ xfs_finobt_read_blocks(
>   */
>  int
>  xfs_finobt_calc_reserves(
> -       struct xfs_mount        *mp,
> -       struct xfs_trans        *tp,
>         struct xfs_perag        *pag,
> +       struct xfs_trans        *tp,
>         xfs_extlen_t            *ask,
>         xfs_extlen_t            *used)
>  {
>         xfs_extlen_t            tree_len = 0;
>         int                     error;
>  
> -       if (!xfs_has_finobt(mp))
> +       if (!xfs_has_finobt(pag->pag_mount))
>                 return 0;
>  
> -       if (xfs_has_inobtcounts(mp))
> +       if (xfs_has_inobtcounts(pag->pag_mount))
>                 error = xfs_finobt_read_blocks(pag, tp, &tree_len);
>         else
> -               error = xfs_inobt_count_blocks(mp, tp, pag,
> XFS_BTNUM_FINO,
> +               error = xfs_inobt_count_blocks(pag, tp,
> XFS_BTNUM_FINO,
>                                 &tree_len);
>         if (error)
>                 return error;
> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h
> b/fs/xfs/libxfs/xfs_ialloc_btree.h
> index 26451cb76b98..e859a6e05230 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.h
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.h
> @@ -46,12 +46,10 @@ struct xfs_perag;
>                  (maxrecs) * sizeof(xfs_inobt_key_t) + \
>                  ((index) - 1) * sizeof(xfs_inobt_ptr_t)))
>  
> -extern struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_mount
> *mp,
> -               struct xfs_trans *tp, struct xfs_buf *agbp,
> -               struct xfs_perag *pag, xfs_btnum_t btnum);
> -struct xfs_btree_cur *xfs_inobt_stage_cursor(struct xfs_mount *mp,
> -               struct xbtree_afakeroot *afake, struct xfs_perag
> *pag,
> -               xfs_btnum_t btnum);
> +extern struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_perag
> *pag,
> +               struct xfs_trans *tp, struct xfs_buf *agbp,
> xfs_btnum_t btnum);
> +struct xfs_btree_cur *xfs_inobt_stage_cursor(struct xfs_perag *pag,
> +               struct xbtree_afakeroot *afake, xfs_btnum_t btnum);
>  extern int xfs_inobt_maxrecs(struct xfs_mount *, int, int);
>  
>  /* ir_holemask to inode allocation bitmap conversion */
> @@ -64,13 +62,13 @@ int xfs_inobt_rec_check_count(struct xfs_mount *,
>  #define xfs_inobt_rec_check_count(mp, rec)     0
>  #endif /* DEBUG */
>  
> -int xfs_finobt_calc_reserves(struct xfs_mount *mp, struct xfs_trans
> *tp,
> -               struct xfs_perag *pag, xfs_extlen_t *ask,
> xfs_extlen_t *used);
> +int xfs_finobt_calc_reserves(struct xfs_perag *perag, struct
> xfs_trans *tp,
> +               xfs_extlen_t *ask, xfs_extlen_t *used);
>  extern xfs_extlen_t xfs_iallocbt_calc_size(struct xfs_mount *mp,
>                 unsigned long long len);
> -int xfs_inobt_cur(struct xfs_mount *mp, struct xfs_trans *tp,
> -               struct xfs_perag *pag, xfs_btnum_t btnum,
> -               struct xfs_btree_cur **curpp, struct xfs_buf
> **agi_bpp);
> +int xfs_inobt_cur(struct xfs_perag *pag, struct xfs_trans *tp,
> +               xfs_btnum_t btnum, struct xfs_btree_cur **curpp,
> +               struct xfs_buf **agi_bpp);
>  
>  void xfs_inobt_commit_staged_btree(struct xfs_btree_cur *cur,
>                 struct xfs_trans *tp, struct xfs_buf *agbp);
> diff --git a/fs/xfs/scrub/agheader_repair.c
> b/fs/xfs/scrub/agheader_repair.c
> index d75d82151eeb..b80b9111e781 100644
> --- a/fs/xfs/scrub/agheader_repair.c
> +++ b/fs/xfs/scrub/agheader_repair.c
> @@ -873,8 +873,7 @@ xrep_agi_calc_from_btrees(
>         xfs_agino_t             freecount;
>         int                     error;
>  
> -       cur = xfs_inobt_init_cursor(mp, sc->tp, agi_bp,
> -                       sc->sa.pag, XFS_BTNUM_INO);
> +       cur = xfs_inobt_init_cursor(sc->sa.pag, sc->tp, agi_bp,
> XFS_BTNUM_INO);
>         error = xfs_ialloc_count_inodes(cur, &count, &freecount);
>         if (error)
>                 goto err;
> @@ -894,8 +893,8 @@ xrep_agi_calc_from_btrees(
>         if (xfs_has_finobt(mp) && xfs_has_inobtcounts(mp)) {
>                 xfs_agblock_t   blocks;
>  
> -               cur = xfs_inobt_init_cursor(mp, sc->tp, agi_bp,
> -                               sc->sa.pag, XFS_BTNUM_FINO);
> +               cur = xfs_inobt_init_cursor(sc->sa.pag, sc->tp,
> agi_bp,
> +                               XFS_BTNUM_FINO);
>                 error = xfs_btree_count_blocks(cur, &blocks);
>                 if (error)
>                         goto err;
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 033bf6730ece..848a8e32e56f 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -478,15 +478,15 @@ xchk_ag_btcur_init(
>         /* Set up a inobt cursor for cross-referencing. */
>         if (sa->agi_bp &&
>             xchk_ag_btree_healthy_enough(sc, sa->pag, XFS_BTNUM_INO))
> {
> -               sa->ino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa-
> >agi_bp,
> -                               sa->pag, XFS_BTNUM_INO);
> +               sa->ino_cur = xfs_inobt_init_cursor(sa->pag, sc->tp,
> sa->agi_bp,
> +                               XFS_BTNUM_INO);
>         }
>  
>         /* Set up a finobt cursor for cross-referencing. */
>         if (sa->agi_bp && xfs_has_finobt(mp) &&
>             xchk_ag_btree_healthy_enough(sc, sa->pag,
> XFS_BTNUM_FINO)) {
> -               sa->fino_cur = xfs_inobt_init_cursor(mp, sc->tp, sa-
> >agi_bp,
> -                               sa->pag, XFS_BTNUM_FINO);
> +               sa->fino_cur = xfs_inobt_init_cursor(sa->pag, sc->tp,
> sa->agi_bp,
> +                               XFS_BTNUM_FINO);
>         }
>  
>         /* Set up a rmapbt cursor for cross-referencing. */
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index c31857d903a4..21be93bf006d 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -275,7 +275,7 @@ xfs_iwalk_ag_start(
>  
>         /* Set up a fresh cursor and empty the inobt cache. */
>         iwag->nr_recs = 0;
> -       error = xfs_inobt_cur(mp, tp, pag, XFS_BTNUM_INO, curpp,
> agi_bpp);
> +       error = xfs_inobt_cur(pag, tp, XFS_BTNUM_INO, curpp,
> agi_bpp);
>         if (error)
>                 return error;
>  
> @@ -390,7 +390,7 @@ xfs_iwalk_run_callbacks(
>         }
>  
>         /* ...and recreate the cursor just past where we left off. */
> -       error = xfs_inobt_cur(mp, iwag->tp, iwag->pag, XFS_BTNUM_INO,
> curpp,
> +       error = xfs_inobt_cur(iwag->pag, iwag->tp, XFS_BTNUM_INO,
> curpp,
>                         agi_bpp);
>         if (error)
>                 return error;


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 12/42] xfs: convert xfs_ialloc_next_ag() to an atomic
  2023-01-18 22:44 ` [PATCH 12/42] xfs: convert xfs_ialloc_next_ag() to an atomic Dave Chinner
@ 2023-01-22  7:03   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-22  7:03 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> This is currently a spinlock lock protected rotor which can be
> implemented with a single atomic operation. Change it to be more
> efficient and get rid of the m_agirotor_lock. Noticed while
> converting the inode allocation AG selection loop to active perag
> references.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Ok, makes sense
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
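For readers skimming the conversion, the rotor change in the hunk below
boils down to replacing a lock/read/bump/wrap/unlock sequence with a
single atomic RMW. A minimal userspace sketch using C11 atomics
(stand-in names, not the kernel's atomic_t API; the patch itself uses
atomic_inc_return(), which returns the post-increment value):

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for mp->m_agirotor and mp->m_maxagi. */
struct mount_sketch {
	atomic_uint	agirotor;	/* last AG a dir inode was allocated in */
	unsigned int	maxagi;		/* number of AGs eligible for inodes */
};

/*
 * Each caller takes a distinct ticket from the rotor and reduces it
 * modulo the AG count; no spinlock and no explicit wrap-around check.
 */
static unsigned int next_ag(struct mount_sketch *mp)
{
	return atomic_fetch_add(&mp->agirotor, 1) % mp->maxagi;
}
```

Both variants hand out AG numbers round-robin; the atomic version just
lets the modulo do the wrapping instead of the locked compare-and-reset.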
> ---
>  fs/xfs/libxfs/xfs_ialloc.c | 17 +----------------
>  fs/xfs/libxfs/xfs_sb.c     |  3 ++-
>  fs/xfs/xfs_mount.h         |  3 +--
>  fs/xfs/xfs_super.c         |  1 -
>  4 files changed, 4 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 5b8401038bab..c8d837d8876f 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -1576,21 +1576,6 @@ xfs_dialloc_roll(
>         return error;
>  }
>  
> -static xfs_agnumber_t
> -xfs_ialloc_next_ag(
> -       xfs_mount_t     *mp)
> -{
> -       xfs_agnumber_t  agno;
> -
> -       spin_lock(&mp->m_agirotor_lock);
> -       agno = mp->m_agirotor;
> -       if (++mp->m_agirotor >= mp->m_maxagi)
> -               mp->m_agirotor = 0;
> -       spin_unlock(&mp->m_agirotor_lock);
> -
> -       return agno;
> -}
> -
>  static bool
>  xfs_dialloc_good_ag(
>         struct xfs_perag        *pag,
> @@ -1748,7 +1733,7 @@ xfs_dialloc(
>          * an AG has enough space for file creation.
>          */
>         if (S_ISDIR(mode))
> -               start_agno = xfs_ialloc_next_ag(mp);
> +               start_agno = atomic_inc_return(&mp->m_agirotor) % mp-
> >m_maxagi;
>         else {
>                 start_agno = XFS_INO_TO_AGNO(mp, parent);
>                 if (start_agno >= mp->m_maxagi)
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 1eeecf2eb2a7..99cc03a298e2 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -909,7 +909,8 @@ xfs_sb_mount_common(
>         struct xfs_mount        *mp,
>         struct xfs_sb           *sbp)
>  {
> -       mp->m_agfrotor = mp->m_agirotor = 0;
> +       mp->m_agfrotor = 0;
> +       atomic_set(&mp->m_agirotor, 0);
>         mp->m_maxagi = mp->m_sb.sb_agcount;
>         mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
>         mp->m_blkbb_log = sbp->sb_blocklog - BBSHIFT;
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 8aca2cc173ac..f3269c0626f0 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -210,8 +210,7 @@ typedef struct xfs_mount {
>         struct
> xfs_error_cfg    m_error_cfg[XFS_ERR_CLASS_MAX][XFS_ERR_ERRNO_MAX];
>         struct xstats           m_stats;        /* per-fs stats */
>         xfs_agnumber_t          m_agfrotor;     /* last ag where
> space found */
> -       xfs_agnumber_t          m_agirotor;     /* last ag dir inode
> alloced */
> -       spinlock_t              m_agirotor_lock;/* .. and lock
> protecting it */
> +       atomic_t                m_agirotor;     /* last ag dir inode
> alloced */
>  
>         /* Memory shrinker to throttle and reprioritize inodegc */
>         struct shrinker         m_inodegc_shrinker;
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 0c4b73e9b29d..96375b5622fd 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1922,7 +1922,6 @@ static int xfs_init_fs_context(
>                 return -ENOMEM;
>  
>         spin_lock_init(&mp->m_sb_lock);
> -       spin_lock_init(&mp->m_agirotor_lock);
>         INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
>         spin_lock_init(&mp->m_perag_lock);
>         mutex_init(&mp->m_growlock);



* Re: [PATCH 13/42] xfs: perags need atomic operational state
  2023-01-18 22:44 ` [PATCH 13/42] xfs: perags need atomic operational state Dave Chinner
@ 2023-01-23  4:04   ` Allison Henderson
  0 siblings, 0 replies; 77+ messages in thread
From: Allison Henderson @ 2023-01-23  4:04 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We currently don't have any flags or operational state in the
> xfs_perag except for the pagf_init and pagi_init flags. And the
> agflreset flag. Oh, there's also the pagf_metadata and pagi_inodeok
> flags, too.
> 
> For controlling per-ag operations, we are going to need some atomic
> state flags. Hence add an opstate field similar to what we already
> have in the mount and log, and convert all these state flags across
> to atomic bit operations.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
Seems like a reasonable conversion
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
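The pattern in the patch below — folding several per-field flags into
one packed opstate word driven by set_bit()/clear_bit()/test_bit() —
can be sketched in portable C11 as follows (illustrative stand-ins for
the kernel bitop helpers and the XFS_AGSTATE_* bits, not the actual
kernel API):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers mirroring the XFS_AGSTATE_* definitions in the patch. */
#define AGSTATE_AGF_INIT	0
#define AGSTATE_AGI_INIT	1
#define AGSTATE_ALLOWS_INODES	3

struct perag_sketch {
	atomic_ulong	opstate;	/* packed state bits, like pag_opstate */
};

/* C11 approximations of the kernel's set_bit()/clear_bit()/test_bit(). */
static void ag_set(struct perag_sketch *pag, int bit)
{
	atomic_fetch_or(&pag->opstate, 1UL << bit);
}

static void ag_clear(struct perag_sketch *pag, int bit)
{
	atomic_fetch_and(&pag->opstate, ~(1UL << bit));
}

static bool ag_test(struct perag_sketch *pag, int bit)
{
	return atomic_load(&pag->opstate) & (1UL << bit);
}
```

Because each helper is a single atomic RMW or load, the flags can be
flipped and checked concurrently without the old per-field char flags
or any additional locking.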

> ---
>  fs/xfs/libxfs/xfs_ag.h             | 27 ++++++++++++++----
>  fs/xfs/libxfs/xfs_alloc.c          | 23 ++++++++-------
>  fs/xfs/libxfs/xfs_alloc_btree.c    |  2 +-
>  fs/xfs/libxfs/xfs_bmap.c           |  2 +-
>  fs/xfs/libxfs/xfs_ialloc.c         | 14 ++++-----
>  fs/xfs/libxfs/xfs_ialloc_btree.c   |  4 +--
>  fs/xfs/libxfs/xfs_refcount_btree.c |  2 +-
>  fs/xfs/libxfs/xfs_rmap_btree.c     |  2 +-
>  fs/xfs/scrub/agheader_repair.c     | 28 +++++++++---------
>  fs/xfs/scrub/fscounters.c          |  9 ++++--
>  fs/xfs/scrub/repair.c              |  2 +-
>  fs/xfs/xfs_filestream.c            |  5 ++--
>  fs/xfs/xfs_super.c                 | 46 ++++++++++++++++++----------
> --
>  13 files changed, 101 insertions(+), 65 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index aeb21c8df201..187d30d9bb13 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -35,13 +35,9 @@ struct xfs_perag {
>         atomic_t        pag_ref;        /* passive reference count */
>         atomic_t        pag_active_ref; /* active reference count */
>         wait_queue_head_t pag_active_wq;/* woken active_ref falls to
> zero */
> -       char            pagf_init;      /* this agf's entry is
> initialized */
> -       char            pagi_init;      /* this agi's entry is
> initialized */
> -       char            pagf_metadata;  /* the agf is preferred to be
> metadata */
> -       char            pagi_inodeok;   /* The agi is ok for inodes
> */
> +       unsigned long   pag_opstate;
>         uint8_t         pagf_levels[XFS_BTNUM_AGF];
>                                         /* # of levels in bno & cnt
> btree */
> -       bool            pagf_agflreset; /* agfl requires reset before
> use */
>         uint32_t        pagf_flcount;   /* count of blocks in
> freelist */
>         xfs_extlen_t    pagf_freeblks;  /* total free blocks */
>         xfs_extlen_t    pagf_longest;   /* longest free space */
> @@ -108,6 +104,27 @@ struct xfs_perag {
>  #endif /* __KERNEL__ */
>  };
>  
> +/*
> + * Per-AG operational state. These are atomic flag bits.
> + */
> +#define XFS_AGSTATE_AGF_INIT           0
> +#define XFS_AGSTATE_AGI_INIT           1
> +#define XFS_AGSTATE_PREFERS_METADATA   2
> +#define XFS_AGSTATE_ALLOWS_INODES      3
> +#define XFS_AGSTATE_AGFL_NEEDS_RESET   4
> +
> +#define __XFS_AG_OPSTATE(name, NAME) \
> +static inline bool xfs_perag_ ## name (struct xfs_perag *pag) \
> +{ \
> +       return test_bit(XFS_AGSTATE_ ## NAME, &pag->pag_opstate); \
> +}
> +
> +__XFS_AG_OPSTATE(initialised_agf, AGF_INIT)
> +__XFS_AG_OPSTATE(initialised_agi, AGI_INIT)
> +__XFS_AG_OPSTATE(prefers_metadata, PREFERS_METADATA)
> +__XFS_AG_OPSTATE(allows_inodes, ALLOWS_INODES)
> +__XFS_AG_OPSTATE(agfl_needs_reset, AGFL_NEEDS_RESET)
> +
>  int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t
> agcount,
>                         xfs_rfsblock_t dcount, xfs_agnumber_t
> *maxagi);
>  int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t
> agno);
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 9f26a9368eeb..246c2e7d9e7a 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2435,7 +2435,7 @@ xfs_agfl_reset(
>         struct xfs_mount        *mp = tp->t_mountp;
>         struct xfs_agf          *agf = agbp->b_addr;
>  
> -       ASSERT(pag->pagf_agflreset);
> +       ASSERT(xfs_perag_agfl_needs_reset(pag));
>         trace_xfs_agfl_reset(mp, agf, 0, _RET_IP_);
>  
>         xfs_warn(mp,
> @@ -2450,7 +2450,7 @@ xfs_agfl_reset(
>                                     XFS_AGF_FLCOUNT);
>  
>         pag->pagf_flcount = 0;
> -       pag->pagf_agflreset = false;
> +       clear_bit(XFS_AGSTATE_AGFL_NEEDS_RESET, &pag->pag_opstate);
>  }
>  
>  /*
> @@ -2605,7 +2605,7 @@ xfs_alloc_fix_freelist(
>         /* deferred ops (AGFL block frees) require permanent
> transactions */
>         ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
>  
> -       if (!pag->pagf_init) {
> +       if (!xfs_perag_initialised_agf(pag)) {
>                 error = xfs_alloc_read_agf(pag, tp, flags, &agbp);
>                 if (error) {
>                         /* Couldn't lock the AGF so skip this AG. */
> @@ -2620,7 +2620,8 @@ xfs_alloc_fix_freelist(
>          * somewhere else if we are not being asked to try harder at
> this
>          * point
>          */
> -       if (pag->pagf_metadata && (args->datatype &
> XFS_ALLOC_USERDATA) &&
> +       if (xfs_perag_prefers_metadata(pag) &&
> +           (args->datatype & XFS_ALLOC_USERDATA) &&
>             (flags & XFS_ALLOC_FLAG_TRYLOCK)) {
>                 ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
>                 goto out_agbp_relse;
> @@ -2646,7 +2647,7 @@ xfs_alloc_fix_freelist(
>         }
>  
>         /* reset a padding mismatched agfl before final free space
> check */
> -       if (pag->pagf_agflreset)
> +       if (xfs_perag_agfl_needs_reset(pag))
>                 xfs_agfl_reset(tp, agbp, pag);
>  
>         /* If there isn't enough total space or single-extent, reject
> it. */
> @@ -2803,7 +2804,7 @@ xfs_alloc_get_freelist(
>         if (be32_to_cpu(agf->agf_flfirst) == xfs_agfl_size(mp))
>                 agf->agf_flfirst = 0;
>  
> -       ASSERT(!pag->pagf_agflreset);
> +       ASSERT(!xfs_perag_agfl_needs_reset(pag));
>         be32_add_cpu(&agf->agf_flcount, -1);
>         pag->pagf_flcount--;
>  
> @@ -2892,7 +2893,7 @@ xfs_alloc_put_freelist(
>         if (be32_to_cpu(agf->agf_fllast) == xfs_agfl_size(mp))
>                 agf->agf_fllast = 0;
>  
> -       ASSERT(!pag->pagf_agflreset);
> +       ASSERT(!xfs_perag_agfl_needs_reset(pag));
>         be32_add_cpu(&agf->agf_flcount, 1);
>         pag->pagf_flcount++;
>  
> @@ -3099,7 +3100,7 @@ xfs_alloc_read_agf(
>                 return error;
>  
>         agf = agfbp->b_addr;
> -       if (!pag->pagf_init) {
> +       if (!xfs_perag_initialised_agf(pag)) {
>                 pag->pagf_freeblks = be32_to_cpu(agf->agf_freeblks);
>                 pag->pagf_btreeblks = be32_to_cpu(agf-
> >agf_btreeblks);
>                 pag->pagf_flcount = be32_to_cpu(agf->agf_flcount);
> @@ -3111,8 +3112,8 @@ xfs_alloc_read_agf(
>                 pag->pagf_levels[XFS_BTNUM_RMAPi] =
>                         be32_to_cpu(agf-
> >agf_levels[XFS_BTNUM_RMAPi]);
>                 pag->pagf_refcount_level = be32_to_cpu(agf-
> >agf_refcount_level);
> -               pag->pagf_init = 1;
> -               pag->pagf_agflreset = xfs_agfl_needs_reset(pag-
> >pag_mount, agf);
> +               if (xfs_agfl_needs_reset(pag->pag_mount, agf))
> +                       set_bit(XFS_AGSTATE_AGFL_NEEDS_RESET, &pag-
> >pag_opstate);
>  
>                 /*
>                  * Update the in-core allocbt counter. Filter out the
> rmapbt
> @@ -3127,6 +3128,8 @@ xfs_alloc_read_agf(
>                 if (allocbt_blks > 0)
>                         atomic64_add(allocbt_blks,
>                                         &pag->pag_mount-
> >m_allocbt_blks);
> +
> +               set_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
>         }
>  #ifdef DEBUG
>         else if (!xfs_is_shutdown(pag->pag_mount)) {
> diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c
> b/fs/xfs/libxfs/xfs_alloc_btree.c
> index 549a3cba0234..0f29c7b1b39f 100644
> --- a/fs/xfs/libxfs/xfs_alloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_alloc_btree.c
> @@ -315,7 +315,7 @@ xfs_allocbt_verify(
>         level = be16_to_cpu(block->bb_level);
>         if (bp->b_ops->magic[0] == cpu_to_be32(XFS_ABTC_MAGIC))
>                 btnum = XFS_BTNUM_CNTi;
> -       if (pag && pag->pagf_init) {
> +       if (pag && xfs_perag_initialised_agf(pag)) {
>                 if (level >= pag->pagf_levels[btnum])
>                         return __this_address;
>         } else if (level >= mp->m_alloc_maxlevels)
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index f15d45af661f..6aad0ea5e606 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3147,7 +3147,7 @@ xfs_bmap_longest_free_extent(
>         int                     error = 0;
>  
>         pag = xfs_perag_get(mp, ag);
> -       if (!pag->pagf_init) {
> +       if (!xfs_perag_initialised_agf(pag)) {
>                 error = xfs_alloc_read_agf(pag, tp,
> XFS_ALLOC_FLAG_TRYLOCK,
>                                 NULL);
>                 if (error) {
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index c8d837d8876f..2a323ffa5ba9 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -998,8 +998,8 @@ xfs_dialloc_ag_inobt(
>         int                     i, j;
>         int                     searchdistance = 10;
>  
> -       ASSERT(pag->pagi_init);
> -       ASSERT(pag->pagi_inodeok);
> +       ASSERT(xfs_perag_initialised_agi(pag));
> +       ASSERT(xfs_perag_allows_inodes(pag));
>         ASSERT(pag->pagi_freecount > 0);
>  
>   restart_pagno:
> @@ -1592,10 +1592,10 @@ xfs_dialloc_good_ag(
>  
>         if (!pag)
>                 return false;
> -       if (!pag->pagi_inodeok)
> +       if (!xfs_perag_allows_inodes(pag))
>                 return false;
>  
> -       if (!pag->pagi_init) {
> +       if (!xfs_perag_initialised_agi(pag)) {
>                 error = xfs_ialloc_read_agi(pag, tp, NULL);
>                 if (error)
>                         return false;
> @@ -1606,7 +1606,7 @@ xfs_dialloc_good_ag(
>         if (!ok_alloc)
>                 return false;
>  
> -       if (!pag->pagf_init) {
> +       if (!xfs_perag_initialised_agf(pag)) {
>                 error = xfs_alloc_read_agf(pag, tp, flags, NULL);
>                 if (error)
>                         return false;
> @@ -2603,10 +2603,10 @@ xfs_ialloc_read_agi(
>                 return error;
>  
>         agi = agibp->b_addr;
> -       if (!pag->pagi_init) {
> +       if (!xfs_perag_initialised_agi(pag)) {
>                 pag->pagi_freecount = be32_to_cpu(agi-
> >agi_freecount);
>                 pag->pagi_count = be32_to_cpu(agi->agi_count);
> -               pag->pagi_init = 1;
> +               set_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
>         }
>  
>         /*
> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c
> b/fs/xfs/libxfs/xfs_ialloc_btree.c
> index d657af2ec350..3675a0d29310 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> @@ -291,8 +291,8 @@ xfs_inobt_verify(
>          * Similarly, during log recovery we will have a perag
> structure
>          * attached, but the agi information will not yet have been
> initialised
>          * from the on disk AGI. We don't currently use any of this
> information,
> -        * but beware of the landmine (i.e. need to check pag-
> >pagi_init) if we
> -        * ever do.
> +        * but beware of the landmine (i.e. need to check
> +        * xfs_perag_initialised_agi(pag)) if we ever do.
>          */
>         if (xfs_has_crc(mp)) {
>                 fa = xfs_btree_sblock_v5hdr_verify(bp);
> diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c
> b/fs/xfs/libxfs/xfs_refcount_btree.c
> index e1f789866683..d20abf0390fc 100644
> --- a/fs/xfs/libxfs/xfs_refcount_btree.c
> +++ b/fs/xfs/libxfs/xfs_refcount_btree.c
> @@ -227,7 +227,7 @@ xfs_refcountbt_verify(
>                 return fa;
>  
>         level = be16_to_cpu(block->bb_level);
> -       if (pag && pag->pagf_init) {
> +       if (pag && xfs_perag_initialised_agf(pag)) {
>                 if (level >= pag->pagf_refcount_level)
>                         return __this_address;
>         } else if (level >= mp->m_refc_maxlevels)
> diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c
> b/fs/xfs/libxfs/xfs_rmap_btree.c
> index 7f83f62e51e0..d3285684bb5e 100644
> --- a/fs/xfs/libxfs/xfs_rmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_rmap_btree.c
> @@ -313,7 +313,7 @@ xfs_rmapbt_verify(
>                 return fa;
>  
>         level = be16_to_cpu(block->bb_level);
> -       if (pag && pag->pagf_init) {
> +       if (pag && xfs_perag_initialised_agf(pag)) {
>                 if (level >= pag->pagf_levels[XFS_BTNUM_RMAPi])
>                         return __this_address;
>         } else if (level >= mp->m_rmap_maxlevels)
> diff --git a/fs/xfs/scrub/agheader_repair.c
> b/fs/xfs/scrub/agheader_repair.c
> index b80b9111e781..c37e6d72760b 100644
> --- a/fs/xfs/scrub/agheader_repair.c
> +++ b/fs/xfs/scrub/agheader_repair.c
> @@ -191,14 +191,15 @@ xrep_agf_init_header(
>         struct xfs_agf          *old_agf)
>  {
>         struct xfs_mount        *mp = sc->mp;
> +       struct xfs_perag        *pag = sc->sa.pag;
>         struct xfs_agf          *agf = agf_bp->b_addr;
>  
>         memcpy(old_agf, agf, sizeof(*old_agf));
>         memset(agf, 0, BBTOB(agf_bp->b_length));
>         agf->agf_magicnum = cpu_to_be32(XFS_AGF_MAGIC);
>         agf->agf_versionnum = cpu_to_be32(XFS_AGF_VERSION);
> -       agf->agf_seqno = cpu_to_be32(sc->sa.pag->pag_agno);
> -       agf->agf_length = cpu_to_be32(sc->sa.pag->block_count);
> +       agf->agf_seqno = cpu_to_be32(pag->pag_agno);
> +       agf->agf_length = cpu_to_be32(pag->block_count);
>         agf->agf_flfirst = old_agf->agf_flfirst;
>         agf->agf_fllast = old_agf->agf_fllast;
>         agf->agf_flcount = old_agf->agf_flcount;
> @@ -206,8 +207,8 @@ xrep_agf_init_header(
>                 uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
>  
>         /* Mark the incore AGF data stale until we're done fixing things. */
> -       ASSERT(sc->sa.pag->pagf_init);
> -       sc->sa.pag->pagf_init = 0;
> +       ASSERT(xfs_perag_initialised_agf(pag));
> +       clear_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
>  }
>  
>  /* Set btree root information in an AGF. */
> @@ -333,7 +334,7 @@ xrep_agf_commit_new(
>         pag->pagf_levels[XFS_BTNUM_RMAPi] =
>                         be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
>         pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
> -       pag->pagf_init = 1;
> +       set_bit(XFS_AGSTATE_AGF_INIT, &pag->pag_opstate);
>  
>         return 0;
>  }
> @@ -434,7 +435,7 @@ xrep_agf(
>  
>  out_revert:
>         /* Mark the incore AGF state stale and revert the AGF. */
> -       sc->sa.pag->pagf_init = 0;
> +       clear_bit(XFS_AGSTATE_AGF_INIT, &sc->sa.pag->pag_opstate);
>         memcpy(agf, &old_agf, sizeof(old_agf));
>         return error;
>  }
> @@ -618,7 +619,7 @@ xrep_agfl_update_agf(
>         xfs_force_summary_recalc(sc->mp);
>  
>         /* Update the AGF counters. */
> -       if (sc->sa.pag->pagf_init)
> +       if (xfs_perag_initialised_agf(sc->sa.pag))
>                 sc->sa.pag->pagf_flcount = flcount;
>         agf->agf_flfirst = cpu_to_be32(0);
>         agf->agf_flcount = cpu_to_be32(flcount);
> @@ -822,14 +823,15 @@ xrep_agi_init_header(
>         struct xfs_agi          *old_agi)
>  {
>         struct xfs_agi          *agi = agi_bp->b_addr;
> +       struct xfs_perag        *pag = sc->sa.pag;
>         struct xfs_mount        *mp = sc->mp;
>  
>         memcpy(old_agi, agi, sizeof(*old_agi));
>         memset(agi, 0, BBTOB(agi_bp->b_length));
>         agi->agi_magicnum = cpu_to_be32(XFS_AGI_MAGIC);
>         agi->agi_versionnum = cpu_to_be32(XFS_AGI_VERSION);
> -       agi->agi_seqno = cpu_to_be32(sc->sa.pag->pag_agno);
> -       agi->agi_length = cpu_to_be32(sc->sa.pag->block_count);
> +       agi->agi_seqno = cpu_to_be32(pag->pag_agno);
> +       agi->agi_length = cpu_to_be32(pag->block_count);
>         agi->agi_newino = cpu_to_be32(NULLAGINO);
>         agi->agi_dirino = cpu_to_be32(NULLAGINO);
>         if (xfs_has_crc(mp))
> @@ -840,8 +842,8 @@ xrep_agi_init_header(
>                         sizeof(agi->agi_unlinked));
>  
>         /* Mark the incore AGF data stale until we're done fixing things. */
> -       ASSERT(sc->sa.pag->pagi_init);
> -       sc->sa.pag->pagi_init = 0;
> +       ASSERT(xfs_perag_initialised_agi(pag));
> +       clear_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
>  }
>  
>  /* Set btree root information in an AGI. */
> @@ -928,7 +930,7 @@ xrep_agi_commit_new(
>         pag = sc->sa.pag;
>         pag->pagi_count = be32_to_cpu(agi->agi_count);
>         pag->pagi_freecount = be32_to_cpu(agi->agi_freecount);
> -       pag->pagi_init = 1;
> +       set_bit(XFS_AGSTATE_AGI_INIT, &pag->pag_opstate);
>  
>         return 0;
>  }
> @@ -993,7 +995,7 @@ xrep_agi(
>  
>  out_revert:
>         /* Mark the incore AGI state stale and revert the AGI. */
> -       sc->sa.pag->pagi_init = 0;
> +       clear_bit(XFS_AGSTATE_AGI_INIT, &sc->sa.pag->pag_opstate);
>         memcpy(agi, &old_agi, sizeof(old_agi));
>         return error;
>  }
> diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
> index ef97670970c3..f0c7f41897b9 100644
> --- a/fs/xfs/scrub/fscounters.c
> +++ b/fs/xfs/scrub/fscounters.c
> @@ -86,7 +86,8 @@ xchk_fscount_warmup(
>         for_each_perag(mp, agno, pag) {
>                 if (xchk_should_terminate(sc, &error))
>                         break;
> -               if (pag->pagi_init && pag->pagf_init)
> +               if (xfs_perag_initialised_agi(pag) &&
> +                   xfs_perag_initialised_agf(pag))
>                         continue;
>  
>                 /* Lock both AG headers. */
> @@ -101,7 +102,8 @@ xchk_fscount_warmup(
>                  * These are supposed to be initialized by the header read
>                  * function.
>                  */
> -               if (!pag->pagi_init || !pag->pagf_init) {
> +               if (!xfs_perag_initialised_agi(pag) ||
> +                   !xfs_perag_initialised_agf(pag)) {
>                         error = -EFSCORRUPTED;
>                         break;
>                 }
> @@ -220,7 +222,8 @@ xchk_fscount_aggregate_agcounts(
>                         break;
>  
>                 /* This somehow got unset since the warmup? */
> -               if (!pag->pagi_init || !pag->pagf_init) {
> +               if (!xfs_perag_initialised_agi(pag) ||
> +                   !xfs_perag_initialised_agf(pag)) {
>                         error = -EFSCORRUPTED;
>                         break;
>                 }
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 4b92f9253ccd..d0b1644efb89 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -206,7 +206,7 @@ xrep_calc_ag_resblks(
>                 return 0;
>  
>         pag = xfs_perag_get(mp, sm->sm_agno);
> -       if (pag->pagi_init) {
> +       if (xfs_perag_initialised_agi(pag)) {
>                 /* Use in-core icount if possible. */
>                 icount = pag->pagi_count;
>         } else {
> diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> index 34b21a29c39b..7e8b25ab6c46 100644
> --- a/fs/xfs/xfs_filestream.c
> +++ b/fs/xfs/xfs_filestream.c
> @@ -125,7 +125,7 @@ xfs_filestream_pick_ag(
>  
>                 pag = xfs_perag_get(mp, ag);
>  
> -               if (!pag->pagf_init) {
> +               if (!xfs_perag_initialised_agf(pag)) {
>                         err = xfs_alloc_read_agf(pag, NULL, trylock, NULL);
>                         if (err) {
>                                 if (err != -EAGAIN) {
> @@ -159,7 +159,8 @@ xfs_filestream_pick_ag(
>                                 xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE));
>                 if (((minlen && longest >= minlen) ||
>                      (!minlen && pag->pagf_freeblks >= minfree)) &&
> -                   (!pag->pagf_metadata || !(flags & XFS_PICK_USERDATA) ||
> +                   (!xfs_perag_prefers_metadata(pag) ||
> +                    !(flags & XFS_PICK_USERDATA) ||
>                      (flags & XFS_PICK_LOWSPACE))) {
>  
>                         /* Break out, retaining the reference on the AG. */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 96375b5622fd..2479b5cbd75e 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -247,6 +247,32 @@ xfs_fs_show_options(
>         return 0;
>  }
>  
> +static bool
> +xfs_set_inode_alloc_perag(
> +       struct xfs_perag        *pag,
> +       xfs_ino_t               ino,
> +       xfs_agnumber_t          max_metadata)
> +{
> +       if (!xfs_is_inode32(pag->pag_mount)) {
> +               set_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
> +               clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
> +               return false;
> +       }
> +
> +       if (ino > XFS_MAXINUMBER_32) {
> +               clear_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
> +               clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
> +               return false;
> +       }
> +
> +       set_bit(XFS_AGSTATE_ALLOWS_INODES, &pag->pag_opstate);
> +       if (pag->pag_agno < max_metadata)
> +               set_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
> +       else
> +               clear_bit(XFS_AGSTATE_PREFERS_METADATA, &pag->pag_opstate);
> +       return true;
> +}
> +
>  /*
>   * Set parameters for inode allocation heuristics, taking into account
>   * filesystem size and inode32/inode64 mount options; i.e. specifically
> @@ -310,24 +336,8 @@ xfs_set_inode_alloc(
>                 ino = XFS_AGINO_TO_INO(mp, index, agino);
>  
>                 pag = xfs_perag_get(mp, index);
> -
> -               if (xfs_is_inode32(mp)) {
> -                       if (ino > XFS_MAXINUMBER_32) {
> -                               pag->pagi_inodeok = 0;
> -                               pag->pagf_metadata = 0;
> -                       } else {
> -                               pag->pagi_inodeok = 1;
> -                               maxagi++;
> -                               if (index < max_metadata)
> -                                       pag->pagf_metadata = 1;
> -                               else
> -                                       pag->pagf_metadata = 0;
> -                       }
> -               } else {
> -                       pag->pagi_inodeok = 1;
> -                       pag->pagf_metadata = 0;
> -               }
> -
> +               if (xfs_set_inode_alloc_perag(pag, ino, max_metadata))
> +                       maxagi++;
>                 xfs_perag_put(pag);
>         }
>  


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 14/42] xfs: introduce xfs_for_each_perag_wrap()
  2023-01-18 22:44 ` [PATCH 14/42] xfs: introduce xfs_for_each_perag_wrap() Dave Chinner
@ 2023-01-23  5:41   ` Allison Henderson
  2023-02-06 23:14     ` Dave Chinner
  2023-02-01 19:28   ` Darrick J. Wong
  1 sibling, 1 reply; 77+ messages in thread
From: Allison Henderson @ 2023-01-23  5:41 UTC (permalink / raw)
  To: david, linux-xfs

On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In several places we iterate every AG from a specific start agno and
> wrap back to the first AG when we reach the end of the filesystem to
> continue searching. We don't have a primitive for this iteration
> yet, so add one for conversion of these algorithms to per-ag based
> iteration.
> 
> The filestream AG select code is a mess, and this initially makes it
> worse. The per-ag selection needs to be driven completely into the
> filestream code to clean this up and it will be done in a future
> patch that makes the filestream allocator use active per-ag
> references correctly.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.h     | 45 +++++++++++++++++++++-
>  fs/xfs/libxfs/xfs_bmap.c   | 76 ++++++++++++++++++++++----------------
>  fs/xfs/libxfs/xfs_ialloc.c | 32 ++++++++--------
>  3 files changed, 104 insertions(+), 49 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index 187d30d9bb13..8f43b91d4cf3 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -237,7 +237,6 @@ xfs_perag_next(
>  #define for_each_perag_from(mp, agno, pag) \
>         for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
>  
> -
>  #define for_each_perag(mp, agno, pag) \
>         (agno) = 0; \
>         for_each_perag_from((mp), (agno), (pag))
> @@ -249,6 +248,50 @@ xfs_perag_next(
>                 xfs_perag_rele(pag), \
>                 (pag) = xfs_perag_grab_tag((mp), (agno), (tag)))
>  
> +static inline struct xfs_perag *
> +xfs_perag_next_wrap(
> +       struct xfs_perag        *pag,
> +       xfs_agnumber_t          *agno,
> +       xfs_agnumber_t          stop_agno,
> +       xfs_agnumber_t          wrap_agno)
> +{
> +       struct xfs_mount        *mp = pag->pag_mount;
> +
> +       *agno = pag->pag_agno + 1;
> +       xfs_perag_rele(pag);
> +       while (*agno != stop_agno) {
> +               if (*agno >= wrap_agno)
> +                       *agno = 0;
> +               if (*agno == stop_agno)
> +                       break;
> +
> +               pag = xfs_perag_grab(mp, *agno);
> +               if (pag)
> +                       return pag;
> +               (*agno)++;
> +       }
> +       return NULL;
> +}
> +
> +/*
> + * Iterate all AGs from start_agno through wrap_agno, then 0 through
> + * (start_agno - 1).
> + */
> +#define for_each_perag_wrap_at(mp, start_agno, wrap_agno, agno, pag) \
> +       for ((agno) = (start_agno), (pag) = xfs_perag_grab((mp), (agno)); \
> +               (pag) != NULL; \
> +               (pag) = xfs_perag_next_wrap((pag), &(agno), (start_agno), \
> +                               (wrap_agno)))
> +
> +/*
> + * Iterate all AGs from start_agno through to the end of the filesystem, then 0
> + * through (start_agno - 1).
> + */
> +#define for_each_perag_wrap(mp, start_agno, agno, pag) \
> +       for_each_perag_wrap_at((mp), (start_agno), (mp)->m_sb.sb_agcount, \
> +                               (agno), (pag))
> +
> +
>  struct aghdr_init_data {
>         /* per ag data */
>         xfs_agblock_t           agno;           /* ag to init */
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 6aad0ea5e606..e5519abbfa0d 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3136,17 +3136,14 @@ xfs_bmap_adjacent(
>  
>  static int
>  xfs_bmap_longest_free_extent(
> +       struct xfs_perag        *pag,
>         struct xfs_trans        *tp,
> -       xfs_agnumber_t          ag,
>         xfs_extlen_t            *blen,
>         int                     *notinit)
>  {
> -       struct xfs_mount        *mp = tp->t_mountp;
> -       struct xfs_perag        *pag;
>         xfs_extlen_t            longest;
>         int                     error = 0;
>  
> -       pag = xfs_perag_get(mp, ag);
>         if (!xfs_perag_initialised_agf(pag)) {
>                 error = xfs_alloc_read_agf(pag, tp, XFS_ALLOC_FLAG_TRYLOCK,
>                                 NULL);
> @@ -3156,19 +3153,17 @@ xfs_bmap_longest_free_extent(
>                                 *notinit = 1;
>                                 error = 0;
>                         }
> -                       goto out;
> +                       return error;
>                 }
>         }
>  
>         longest = xfs_alloc_longest_free_extent(pag,
> -                               xfs_alloc_min_freelist(mp, pag),
> +                               xfs_alloc_min_freelist(pag->pag_mount, pag),
>                                 xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE));
>         if (*blen < longest)
>                 *blen = longest;
>  
> -out:
> -       xfs_perag_put(pag);
> -       return error;
> +       return 0;
>  }
>  
>  static void
> @@ -3206,9 +3201,10 @@ xfs_bmap_btalloc_select_lengths(
>         xfs_extlen_t            *blen)
>  {
>         struct xfs_mount        *mp = ap->ip->i_mount;
> -       xfs_agnumber_t          ag, startag;
> +       struct xfs_perag        *pag;
> +       xfs_agnumber_t          agno, startag;
>         int                     notinit = 0;
> -       int                     error;
> +       int                     error = 0;
>  
>         args->type = XFS_ALLOCTYPE_START_BNO;
>         if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> @@ -3218,21 +3214,21 @@ xfs_bmap_btalloc_select_lengths(
>         }
>  
>         args->total = ap->total;
> -       startag = ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
> +       startag = XFS_FSB_TO_AGNO(mp, args->fsbno);
>         if (startag == NULLAGNUMBER)
> -               startag = ag = 0;
> +               startag = 0;
>  
> -       while (*blen < args->maxlen) {
> -               error = xfs_bmap_longest_free_extent(args->tp, ag, blen,
> +       *blen = 0;
> +       for_each_perag_wrap(mp, startag, agno, pag) {
> +               error = xfs_bmap_longest_free_extent(pag, args->tp, blen,
>                                                      &notinit);
>                 if (error)
> -                       return error;
> -
> -               if (++ag == mp->m_sb.sb_agcount)
> -                       ag = 0;
> -               if (ag == startag)
> +                       break;
> +               if (*blen >= args->maxlen)
>                         break;
>         }
> +       if (pag)
> +               xfs_perag_rele(pag);
>  
>         xfs_bmap_select_minlen(ap, args, blen, notinit);
>         return 0;
Hmm, did you want to return error here?  Since now we only break on
error in the loop body above?

Otherwise looks good.
Allison

> @@ -3245,7 +3241,8 @@ xfs_bmap_btalloc_filestreams(
>         xfs_extlen_t            *blen)
>  {
>         struct xfs_mount        *mp = ap->ip->i_mount;
> -       xfs_agnumber_t          ag;
> +       struct xfs_perag        *pag;
> +       xfs_agnumber_t          start_agno;
>         int                     notinit = 0;
>         int                     error;
>  
> @@ -3259,33 +3256,50 @@ xfs_bmap_btalloc_filestreams(
>         args->type = XFS_ALLOCTYPE_NEAR_BNO;
>         args->total = ap->total;
>  
> -       ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
> -       if (ag == NULLAGNUMBER)
> -               ag = 0;
> +       start_agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
> +       if (start_agno == NULLAGNUMBER)
> +               start_agno = 0;
>  
> -       error = xfs_bmap_longest_free_extent(args->tp, ag, blen, &notinit);
> -       if (error)
> -               return error;
> +       pag = xfs_perag_grab(mp, start_agno);
> +       if (pag) {
> +               error = xfs_bmap_longest_free_extent(pag, args->tp, blen,
> +                               &notinit);
> +               xfs_perag_rele(pag);
> +               if (error)
> +                       return error;
> +       }
>  
>         if (*blen < args->maxlen) {
> -               error = xfs_filestream_new_ag(ap, &ag);
> +               xfs_agnumber_t  agno = start_agno;
> +
> +               error = xfs_filestream_new_ag(ap, &agno);
>                 if (error)
>                         return error;
> +               if (agno == NULLAGNUMBER)
> +                       goto out_select;
>  
> -               error = xfs_bmap_longest_free_extent(args->tp, ag, blen,
> -                                                    &notinit);
> +               pag = xfs_perag_grab(mp, agno);
> +               if (!pag)
> +                       goto out_select;
> +
> +               error = xfs_bmap_longest_free_extent(pag, args->tp,
> +                               blen, &notinit);
> +               xfs_perag_rele(pag);
>                 if (error)
>                         return error;
>  
> +               start_agno = agno;
> +
>         }
>  
> +out_select:
>         xfs_bmap_select_minlen(ap, args, blen, notinit);
>  
>         /*
>          * Set the failure fallback case to look in the selected AG as stream
>          * may have moved.
>          */
> -       ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, ag, 0);
> +       ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, start_agno, 0);
>         return 0;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 2a323ffa5ba9..50fef3f5af51 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -1725,7 +1725,7 @@ xfs_dialloc(
>         bool                    ok_alloc = true;
>         bool                    low_space = false;
>         int                     flags;
> -       xfs_ino_t               ino;
> +       xfs_ino_t               ino = NULLFSINO;
>  
>         /*
>          * Directories, symlinks, and regular files frequently allocate at least
> @@ -1773,39 +1773,37 @@ xfs_dialloc(
>          * or in which we can allocate some inodes.  Iterate through the
>          * allocation groups upward, wrapping at the end.
>          */
> -       agno = start_agno;
>         flags = XFS_ALLOC_FLAG_TRYLOCK;
> -       for (;;) {
> -               pag = xfs_perag_grab(mp, agno);
> +retry:
> +       for_each_perag_wrap_at(mp, start_agno, mp->m_maxagi, agno, pag) {
>                 if (xfs_dialloc_good_ag(pag, *tpp, mode, flags, ok_alloc)) {
>                         error = xfs_dialloc_try_ag(pag, tpp, parent,
>                                         &ino, ok_alloc);
>                         if (error != -EAGAIN)
>                                 break;
> +                       error = 0;
>                 }
>  
>                 if (xfs_is_shutdown(mp)) {
>                         error = -EFSCORRUPTED;
>                         break;
>                 }
> -               if (++agno == mp->m_maxagi)
> -                       agno = 0;
> -               if (agno == start_agno) {
> -                       if (!flags) {
> -                               error = -ENOSPC;
> -                               break;
> -                       }
> +       }
> +       if (pag)
> +               xfs_perag_rele(pag);
> +       if (error)
> +               return error;
> +       if (ino == NULLFSINO) {
> +               if (flags) {
>                         flags = 0;
>                         if (low_space)
>                                 ok_alloc = true;
> +                       goto retry;
>                 }
> -               xfs_perag_rele(pag);
> +               return -ENOSPC;
>         }
> -
> -       if (!error)
> -               *new_ino = ino;
> -       xfs_perag_rele(pag);
> -       return error;
> +       *new_ino = ino;
> +       return 0;
>  }
>  
>  /*



* Re: [PATCH 07/42] xfs: active perag reference counting
  2023-01-18 22:44 ` [PATCH 07/42] xfs: active perag reference counting Dave Chinner
  2023-01-21  5:16   ` Allison Henderson
@ 2023-02-01 19:08   ` Darrick J. Wong
  2023-02-06 22:56     ` Dave Chinner
  1 sibling, 1 reply; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 19:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:30AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We need to be able to dynamically remove instantiated AGs from
> memory safely, either for shrinking the filesystem or paging AG
> state in and out of memory (e.g. supporting millions of AGs). This
> means we need to be able to safely exclude operations from accessing
> perags while dynamic removal is in progress.
> 
> To do this, introduce the concept of active and passive references.
> Active references are required for high level operations that make
> use of an AG for a given operation (e.g. allocation) and pin the
> perag in memory for the duration of the operation that is operating
> on the perag (e.g. transaction scope). This means we can fail to get
> an active reference to an AG, hence callers of the new active
> reference API must be able to handle lookup failure gracefully.
> 
> Passive references are used in low level code, where we might need
> to access the perag structure for the purposes of completing high
> level operations. For example, buffers need to use passive
> references because:
> - we need to be able to do metadata IO during operations like grow
>   and shrink transactions where high level active references to the
>   AG have already been blocked
> - buffers need to pin the perag until they are reclaimed from
>   memory, something that high level code has no direct control over.
> - unused cached buffers should not prevent a shrink from being
>   started.
> 
> Hence we have active references that will form exclusion barriers
> for operations to be performed on an AG, and passive references that
> will prevent reclaim of the perag until all objects with passive
> references have been reclaimed themselves.

This is going to be fun to rebase the online fsck series on top of. :)

If I'm understanding correctly, active perag refs are for high level
code that wants to call down into an AG to do some operation
(allocating, freeing, scanning, whatever)?  So I think online fsck
uniformly wants xfs_perag_grab/rele, right?

Passive refs are (I think) for lower level code that wants to call up
into an AG to finish off something that was already started?  And
probably by upper level code?  So the amount of code that actually wants
a passive reference is pretty small?

> This patch introduces xfs_perag_grab()/xfs_perag_rele() as the API
> for active AG reference functionality. We also need to convert the
> for_each_perag*() iterators to use active references, which will
> start the process of converting high level code over to using active
> references. Conversion of non-iterator based code to active
> references will be done in followup patches.

Is there any code that iterates perag structures via passive references?
I think the answer to this is 'no'?

The code changes look all right.  If the answers to the above questions
are "yes", "yes", "yes", and "no", then:
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> Note that the implementation using reference counting is really just
> a development vehicle for the API to ensure we don't have any leaks
> in the callers. Once we need to remove perag structures from memory
> dyanmically, we will need a much more robust per-ag state transition
> mechanism for preventing new references from being taken while we
> wait for existing references to drain before removal from memory can
> occur....
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.c    | 70 +++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_ag.h    | 31 ++++++++++++-----
>  fs/xfs/scrub/bmap.c       |  2 +-
>  fs/xfs/scrub/fscounters.c |  4 +--
>  fs/xfs/xfs_fsmap.c        |  4 +--
>  fs/xfs/xfs_icache.c       |  2 +-
>  fs/xfs/xfs_iwalk.c        |  6 ++--
>  fs/xfs/xfs_reflink.c      |  2 +-
>  fs/xfs/xfs_trace.h        |  3 ++
>  9 files changed, 105 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index bb0c700afe3c..46e25c682bf4 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -94,6 +94,68 @@ xfs_perag_put(
>  	trace_xfs_perag_put(pag->pag_mount, pag->pag_agno, ref, _RET_IP_);
>  }
>  
> +/*
> + * Active references for perag structures. This is for short term access to the
> + * per ag structures for walking trees or accessing state. If an AG is being
> + * shrunk or is offline, then this will fail to find that AG and return NULL
> + * instead.
> + */
> +struct xfs_perag *
> +xfs_perag_grab(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		agno)
> +{
> +	struct xfs_perag	*pag;
> +
> +	rcu_read_lock();
> +	pag = radix_tree_lookup(&mp->m_perag_tree, agno);
> +	if (pag) {
> +		trace_xfs_perag_grab(mp, pag->pag_agno,
> +				atomic_read(&pag->pag_active_ref), _RET_IP_);
> +		if (!atomic_inc_not_zero(&pag->pag_active_ref))
> +			pag = NULL;
> +	}
> +	rcu_read_unlock();
> +	return pag;
> +}
> +
> +/*
> + * search from @first to find the next perag with the given tag set.
> + */
> +struct xfs_perag *
> +xfs_perag_grab_tag(
> +	struct xfs_mount	*mp,
> +	xfs_agnumber_t		first,
> +	int			tag)
> +{
> +	struct xfs_perag	*pag;
> +	int			found;
> +
> +	rcu_read_lock();
> +	found = radix_tree_gang_lookup_tag(&mp->m_perag_tree,
> +					(void **)&pag, first, 1, tag);
> +	if (found <= 0) {
> +		rcu_read_unlock();
> +		return NULL;
> +	}
> +	trace_xfs_perag_grab_tag(mp, pag->pag_agno,
> +			atomic_read(&pag->pag_active_ref), _RET_IP_);
> +	if (!atomic_inc_not_zero(&pag->pag_active_ref))
> +		pag = NULL;
> +	rcu_read_unlock();
> +	return pag;
> +}
> +
> +void
> +xfs_perag_rele(
> +	struct xfs_perag	*pag)
> +{
> +	trace_xfs_perag_rele(pag->pag_mount, pag->pag_agno,
> +			atomic_read(&pag->pag_active_ref), _RET_IP_);
> +	if (atomic_dec_and_test(&pag->pag_active_ref))
> +		wake_up(&pag->pag_active_wq);
> +}
> +
>  /*
>   * xfs_initialize_perag_data
>   *
> @@ -196,6 +258,10 @@ xfs_free_perag(
>  		cancel_delayed_work_sync(&pag->pag_blockgc_work);
>  		xfs_buf_hash_destroy(pag);
>  
> +		/* drop the mount's active reference */
> +		xfs_perag_rele(pag);
> +		XFS_IS_CORRUPT(pag->pag_mount,
> +				atomic_read(&pag->pag_active_ref) != 0);
>  		call_rcu(&pag->rcu_head, __xfs_free_perag);
>  	}
>  }
> @@ -314,6 +380,7 @@ xfs_initialize_perag(
>  		INIT_DELAYED_WORK(&pag->pag_blockgc_work, xfs_blockgc_worker);
>  		INIT_RADIX_TREE(&pag->pag_ici_root, GFP_ATOMIC);
>  		init_waitqueue_head(&pag->pagb_wait);
> +		init_waitqueue_head(&pag->pag_active_wq);
>  		pag->pagb_count = 0;
>  		pag->pagb_tree = RB_ROOT;
>  #endif /* __KERNEL__ */
> @@ -322,6 +389,9 @@ xfs_initialize_perag(
>  		if (error)
>  			goto out_remove_pag;
>  
> +		/* Active ref owned by mount indicates AG is online. */
> +		atomic_set(&pag->pag_active_ref, 1);
> +
>  		/* first new pag is fully initialized */
>  		if (first_initialised == NULLAGNUMBER)
>  			first_initialised = index;
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index 191b22b9a35b..aeb21c8df201 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -32,7 +32,9 @@ struct xfs_ag_resv {
>  struct xfs_perag {
>  	struct xfs_mount *pag_mount;	/* owner filesystem */
>  	xfs_agnumber_t	pag_agno;	/* AG this structure belongs to */
> -	atomic_t	pag_ref;	/* perag reference count */
> +	atomic_t	pag_ref;	/* passive reference count */
> +	atomic_t	pag_active_ref;	/* active reference count */
> +	wait_queue_head_t pag_active_wq;/* woken active_ref falls to zero */
>  	char		pagf_init;	/* this agf's entry is initialized */
>  	char		pagi_init;	/* this agi's entry is initialized */
>  	char		pagf_metadata;	/* the agf is preferred to be metadata */
> @@ -111,11 +113,18 @@ int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount,
>  int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno);
>  void xfs_free_perag(struct xfs_mount *mp);
>  
> +/* Passive AG references */
>  struct xfs_perag *xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno);
>  struct xfs_perag *xfs_perag_get_tag(struct xfs_mount *mp, xfs_agnumber_t agno,
>  		unsigned int tag);
>  void xfs_perag_put(struct xfs_perag *pag);
>  
> +/* Active AG references */
> +struct xfs_perag *xfs_perag_grab(struct xfs_mount *, xfs_agnumber_t);
> +struct xfs_perag *xfs_perag_grab_tag(struct xfs_mount *, xfs_agnumber_t,
> +				   int tag);
> +void xfs_perag_rele(struct xfs_perag *pag);
> +
>  /*
>   * Per-ag geometry infomation and validation
>   */
> @@ -193,14 +202,18 @@ xfs_perag_next(
>  	struct xfs_mount	*mp = pag->pag_mount;
>  
>  	*agno = pag->pag_agno + 1;
> -	xfs_perag_put(pag);
> -	if (*agno > end_agno)
> -		return NULL;
> -	return xfs_perag_get(mp, *agno);
> +	xfs_perag_rele(pag);
> +	while (*agno <= end_agno) {
> +		pag = xfs_perag_grab(mp, *agno);
> +		if (pag)
> +			return pag;
> +		(*agno)++;
> +	}
> +	return NULL;
>  }
>  
>  #define for_each_perag_range(mp, agno, end_agno, pag) \
> -	for ((pag) = xfs_perag_get((mp), (agno)); \
> +	for ((pag) = xfs_perag_grab((mp), (agno)); \
>  		(pag) != NULL; \
>  		(pag) = xfs_perag_next((pag), &(agno), (end_agno)))
>  
> @@ -213,11 +226,11 @@ xfs_perag_next(
>  	for_each_perag_from((mp), (agno), (pag))
>  
>  #define for_each_perag_tag(mp, agno, pag, tag) \
> -	for ((agno) = 0, (pag) = xfs_perag_get_tag((mp), 0, (tag)); \
> +	for ((agno) = 0, (pag) = xfs_perag_grab_tag((mp), 0, (tag)); \
>  		(pag) != NULL; \
>  		(agno) = (pag)->pag_agno + 1, \
> -		xfs_perag_put(pag), \
> -		(pag) = xfs_perag_get_tag((mp), (agno), (tag)))
> +		xfs_perag_rele(pag), \
> +		(pag) = xfs_perag_grab_tag((mp), (agno), (tag)))
>  
>  struct aghdr_init_data {
>  	/* per ag data */
> diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
> index d50d0eab196a..dbbc7037074c 100644
> --- a/fs/xfs/scrub/bmap.c
> +++ b/fs/xfs/scrub/bmap.c
> @@ -662,7 +662,7 @@ xchk_bmap_check_rmaps(
>  		error = xchk_bmap_check_ag_rmaps(sc, whichfork, pag);
>  		if (error ||
>  		    (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) {
> -			xfs_perag_put(pag);
> +			xfs_perag_rele(pag);
>  			return error;
>  		}
>  	}
> diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
> index 4777e7b89fdc..ef97670970c3 100644
> --- a/fs/xfs/scrub/fscounters.c
> +++ b/fs/xfs/scrub/fscounters.c
> @@ -117,7 +117,7 @@ xchk_fscount_warmup(
>  	if (agi_bp)
>  		xfs_buf_relse(agi_bp);
>  	if (pag)
> -		xfs_perag_put(pag);
> +		xfs_perag_rele(pag);
>  	return error;
>  }
>  
> @@ -249,7 +249,7 @@ xchk_fscount_aggregate_agcounts(
>  
>  	}
>  	if (pag)
> -		xfs_perag_put(pag);
> +		xfs_perag_rele(pag);
>  	if (error) {
>  		xchk_set_incomplete(sc);
>  		return error;
> diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
> index 88a88506ffff..120d284a03fe 100644
> --- a/fs/xfs/xfs_fsmap.c
> +++ b/fs/xfs/xfs_fsmap.c
> @@ -688,11 +688,11 @@ __xfs_getfsmap_datadev(
>  		info->agf_bp = NULL;
>  	}
>  	if (info->pag) {
> -		xfs_perag_put(info->pag);
> +		xfs_perag_rele(info->pag);
>  		info->pag = NULL;
>  	} else if (pag) {
>  		/* loop termination case */
> -		xfs_perag_put(pag);
> +		xfs_perag_rele(pag);
>  	}
>  
>  	return error;
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index ddeaccc04aec..0f4a014dded3 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1767,7 +1767,7 @@ xfs_icwalk(
>  		if (error) {
>  			last_error = error;
>  			if (error == -EFSCORRUPTED) {
> -				xfs_perag_put(pag);
> +				xfs_perag_rele(pag);
>  				break;
>  			}
>  		}
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index 7558486f4937..c31857d903a4 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -591,7 +591,7 @@ xfs_iwalk(
>  	}
>  
>  	if (iwag.pag)
> -		xfs_perag_put(pag);
> +		xfs_perag_rele(pag);
>  	xfs_iwalk_free(&iwag);
>  	return error;
>  }
> @@ -683,7 +683,7 @@ xfs_iwalk_threaded(
>  			break;
>  	}
>  	if (pag)
> -		xfs_perag_put(pag);
> +		xfs_perag_rele(pag);
>  	if (polled)
>  		xfs_pwork_poll(&pctl);
>  	return xfs_pwork_destroy(&pctl);
> @@ -776,7 +776,7 @@ xfs_inobt_walk(
>  	}
>  
>  	if (iwag.pag)
> -		xfs_perag_put(pag);
> +		xfs_perag_rele(pag);
>  	xfs_iwalk_free(&iwag);
>  	return error;
>  }
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 57bf59ff4854..f5dc46ce9803 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -927,7 +927,7 @@ xfs_reflink_recover_cow(
>  	for_each_perag(mp, agno, pag) {
>  		error = xfs_refcount_recover_cow_leftovers(mp, pag);
>  		if (error) {
> -			xfs_perag_put(pag);
> +			xfs_perag_rele(pag);
>  			break;
>  		}
>  	}
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 7dc57db6aa42..f0b62054ea68 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -189,6 +189,9 @@ DEFINE_EVENT(xfs_perag_class, name,	\
>  DEFINE_PERAG_REF_EVENT(xfs_perag_get);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_get_tag);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_put);
> +DEFINE_PERAG_REF_EVENT(xfs_perag_grab);
> +DEFINE_PERAG_REF_EVENT(xfs_perag_grab_tag);
> +DEFINE_PERAG_REF_EVENT(xfs_perag_rele);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_set_inode_tag);
>  DEFINE_PERAG_REF_EVENT(xfs_perag_clear_inode_tag);
>  
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread
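The grab/rele conversion in the patch above is easier to reason about with a runnable model. The sketch below is a toy simulation in plain C, not the kernel API (the struct layout and helper names are invented for illustration): an "active" reference can only be taken while the AG is not being torn down, so the reworked xfs_perag_next() loop naturally skips AGs that shrink has started to remove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: each AG has an active refcount and a "dying" flag that a
 * hypothetical shrink operation sets when it wants to remove the AG. */
struct perag {
	unsigned	agno;
	int		active_ref;
	bool		dying;
};

/* Analogue of xfs_perag_grab(): take an active reference, unless the
 * AG is being torn down, in which case fail the grab. */
static struct perag *
perag_grab(struct perag *ags, unsigned nags, unsigned agno)
{
	if (agno >= nags || ags[agno].dying)
		return NULL;
	ags[agno].active_ref++;
	return &ags[agno];
}

/* Analogue of xfs_perag_rele(): drop an active reference. */
static void
perag_rele(struct perag *pag)
{
	pag->active_ref--;
}

/* Mirrors the reworked xfs_perag_next() above: release the current AG,
 * then keep walking until a grabbable AG is found or the range ends. */
static struct perag *
perag_next(struct perag *ags, unsigned nags, struct perag *pag,
	   unsigned *agno, unsigned end_agno)
{
	*agno = pag->agno + 1;
	perag_rele(pag);
	while (*agno <= end_agno) {
		pag = perag_grab(ags, nags, *agno);
		if (pag)
			return pag;
		(*agno)++;
	}
	return NULL;
}
```

A for_each_perag_range()-style loop built from these helpers visits only the live AGs and leaves every refcount balanced when it terminates.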

* Re: [PATCH 09/42] xfs: convert xfs_imap() to take a perag
  2023-01-18 22:44 ` [PATCH 09/42] xfs: convert xfs_imap() to take a perag Dave Chinner
@ 2023-02-01 19:10   ` Darrick J. Wong
  0 siblings, 0 replies; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 19:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:32AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Callers already hold referenced perags but don't pass them into
> xfs_imap(), so it takes its own reference. Fix that so we can change
> inode allocation over to using active references.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ialloc.c | 43 +++++++++++++-------------------------
>  fs/xfs/libxfs/xfs_ialloc.h |  3 ++-
>  fs/xfs/scrub/common.c      | 13 ++++++++----
>  fs/xfs/xfs_icache.c        |  2 +-
>  4 files changed, 27 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index e8068422aa21..2b4961ff2e24 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -2217,15 +2217,15 @@ xfs_difree(
>  
>  STATIC int
>  xfs_imap_lookup(
> -	struct xfs_mount	*mp,
> -	struct xfs_trans	*tp,
>  	struct xfs_perag	*pag,
> +	struct xfs_trans	*tp,
>  	xfs_agino_t		agino,
>  	xfs_agblock_t		agbno,
>  	xfs_agblock_t		*chunk_agbno,
>  	xfs_agblock_t		*offset_agbno,
>  	int			flags)
>  {
> +	struct xfs_mount	*mp = pag->pag_mount;
>  	struct xfs_inobt_rec_incore rec;
>  	struct xfs_btree_cur	*cur;
>  	struct xfs_buf		*agbp;
> @@ -2280,12 +2280,13 @@ xfs_imap_lookup(
>   */
>  int
>  xfs_imap(
> -	struct xfs_mount	 *mp,	/* file system mount structure */
> +	struct xfs_perag	*pag,
>  	struct xfs_trans	 *tp,	/* transaction pointer */

Stupid nit: fix the extra space ^ problem here.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

>  	xfs_ino_t		ino,	/* inode to locate */
>  	struct xfs_imap		*imap,	/* location map structure */
>  	uint			flags)	/* flags for inode btree lookup */
>  {
> +	struct xfs_mount	*mp = pag->pag_mount;
>  	xfs_agblock_t		agbno;	/* block number of inode in the alloc group */
>  	xfs_agino_t		agino;	/* inode number within alloc group */
>  	xfs_agblock_t		chunk_agbno;	/* first block in inode chunk */
> @@ -2293,17 +2294,15 @@ xfs_imap(
>  	int			error;	/* error code */
>  	int			offset;	/* index of inode in its buffer */
>  	xfs_agblock_t		offset_agbno;	/* blks from chunk start to inode */
> -	struct xfs_perag	*pag;
>  
>  	ASSERT(ino != NULLFSINO);
>  
>  	/*
>  	 * Split up the inode number into its parts.
>  	 */
> -	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ino));
>  	agino = XFS_INO_TO_AGINO(mp, ino);
>  	agbno = XFS_AGINO_TO_AGBNO(mp, agino);
> -	if (!pag || agbno >= mp->m_sb.sb_agblocks ||
> +	if (agbno >= mp->m_sb.sb_agblocks ||
>  	    ino != XFS_AGINO_TO_INO(mp, pag->pag_agno, agino)) {
>  		error = -EINVAL;
>  #ifdef DEBUG
> @@ -2312,20 +2311,14 @@ xfs_imap(
>  		 * as they can be invalid without implying corruption.
>  		 */
>  		if (flags & XFS_IGET_UNTRUSTED)
> -			goto out_drop;
> -		if (!pag) {
> -			xfs_alert(mp,
> -				"%s: agno (%d) >= mp->m_sb.sb_agcount (%d)",
> -				__func__, XFS_INO_TO_AGNO(mp, ino),
> -				mp->m_sb.sb_agcount);
> -		}
> +			return error;
>  		if (agbno >= mp->m_sb.sb_agblocks) {
>  			xfs_alert(mp,
>  		"%s: agbno (0x%llx) >= mp->m_sb.sb_agblocks (0x%lx)",
>  				__func__, (unsigned long long)agbno,
>  				(unsigned long)mp->m_sb.sb_agblocks);
>  		}
> -		if (pag && ino != XFS_AGINO_TO_INO(mp, pag->pag_agno, agino)) {
> +		if (ino != XFS_AGINO_TO_INO(mp, pag->pag_agno, agino)) {
>  			xfs_alert(mp,
>  		"%s: ino (0x%llx) != XFS_AGINO_TO_INO() (0x%llx)",
>  				__func__, ino,
> @@ -2333,7 +2326,7 @@ xfs_imap(
>  		}
>  		xfs_stack_trace();
>  #endif /* DEBUG */
> -		goto out_drop;
> +		return error;
>  	}
>  
>  	/*
> @@ -2344,10 +2337,10 @@ xfs_imap(
>  	 * in all cases where an untrusted inode number is passed.
>  	 */
>  	if (flags & XFS_IGET_UNTRUSTED) {
> -		error = xfs_imap_lookup(mp, tp, pag, agino, agbno,
> +		error = xfs_imap_lookup(pag, tp, agino, agbno,
>  					&chunk_agbno, &offset_agbno, flags);
>  		if (error)
> -			goto out_drop;
> +			return error;
>  		goto out_map;
>  	}
>  
> @@ -2363,8 +2356,7 @@ xfs_imap(
>  		imap->im_len = XFS_FSB_TO_BB(mp, 1);
>  		imap->im_boffset = (unsigned short)(offset <<
>  							mp->m_sb.sb_inodelog);
> -		error = 0;
> -		goto out_drop;
> +		return 0;
>  	}
>  
>  	/*
> @@ -2376,10 +2368,10 @@ xfs_imap(
>  		offset_agbno = agbno & M_IGEO(mp)->inoalign_mask;
>  		chunk_agbno = agbno - offset_agbno;
>  	} else {
> -		error = xfs_imap_lookup(mp, tp, pag, agino, agbno,
> +		error = xfs_imap_lookup(pag, tp, agino, agbno,
>  					&chunk_agbno, &offset_agbno, flags);
>  		if (error)
> -			goto out_drop;
> +			return error;
>  	}
>  
>  out_map:
> @@ -2407,14 +2399,9 @@ xfs_imap(
>  			__func__, (unsigned long long) imap->im_blkno,
>  			(unsigned long long) imap->im_len,
>  			XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks));
> -		error = -EINVAL;
> -		goto out_drop;
> +		return -EINVAL;
>  	}
> -	error = 0;
> -out_drop:
> -	if (pag)
> -		xfs_perag_put(pag);
> -	return error;
> +	return 0;
>  }
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> index 9bbbca6ac4ed..4cfce2eebe7e 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.h
> +++ b/fs/xfs/libxfs/xfs_ialloc.h
> @@ -12,6 +12,7 @@ struct xfs_imap;
>  struct xfs_mount;
>  struct xfs_trans;
>  struct xfs_btree_cur;
> +struct xfs_perag;
>  
>  /* Move inodes in clusters of this size */
>  #define	XFS_INODE_BIG_CLUSTER_SIZE	8192
> @@ -47,7 +48,7 @@ int xfs_difree(struct xfs_trans *tp, struct xfs_perag *pag,
>   */
>  int
>  xfs_imap(
> -	struct xfs_mount *mp,		/* file system mount structure */
> +	struct xfs_perag *pag,
>  	struct xfs_trans *tp,		/* transaction pointer */
>  	xfs_ino_t	ino,		/* inode to locate */
>  	struct xfs_imap	*imap,		/* location map structure */
> diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> index 613260b04a3d..033bf6730ece 100644
> --- a/fs/xfs/scrub/common.c
> +++ b/fs/xfs/scrub/common.c
> @@ -636,6 +636,7 @@ xchk_get_inode(
>  {
>  	struct xfs_imap		imap;
>  	struct xfs_mount	*mp = sc->mp;
> +	struct xfs_perag	*pag;
>  	struct xfs_inode	*ip_in = XFS_I(file_inode(sc->file));
>  	struct xfs_inode	*ip = NULL;
>  	int			error;
> @@ -671,10 +672,14 @@ xchk_get_inode(
>  		 * Otherwise, we really couldn't find it so tell userspace
>  		 * that it no longer exists.
>  		 */
> -		error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap,
> -				XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE);
> -		if (error)
> -			return -ENOENT;
> +		pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, sc->sm->sm_ino));
> +		if (pag) {
> +			error = xfs_imap(pag, sc->tp, sc->sm->sm_ino, &imap,
> +					XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE);
> +			xfs_perag_put(pag);
> +			if (error)
> +				return -ENOENT;
> +		}
>  		error = -EFSCORRUPTED;
>  		fallthrough;
>  	default:
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 8b2823d85a68..c9a7e270a428 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -586,7 +586,7 @@ xfs_iget_cache_miss(
>  	if (!ip)
>  		return -ENOMEM;
>  
> -	error = xfs_imap(mp, tp, ip->i_ino, &ip->i_imap, flags);
> +	error = xfs_imap(pag, tp, ip->i_ino, &ip->i_imap, flags);
>  	if (error)
>  		goto out_destroy;
>  
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread
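The conversion above follows a simple ownership convention: xfs_imap() now borrows the caller's perag reference rather than taking and dropping its own, which is what lets all the out_drop unwinding disappear. A minimal sketch of that convention follows (toy types and names, not the kernel API):

```c
#include <assert.h>

/* Toy perag with a plain reference count. */
struct perag {
	unsigned	agno;
	int		refs;
};

static void perag_get(struct perag *pag) { pag->refs++; }
static void perag_put(struct perag *pag) { pag->refs--; }

/* After the patch: the callee simply borrows @pag. It never touches
 * the reference count, so there is no error-path unwinding to get
 * wrong inside the lookup itself. */
static int
imap(struct perag *pag, unsigned agino, unsigned *blkno)
{
	/* Stand-in for the real inode-location calculation. */
	*blkno = pag->agno * 1000u + agino;
	return 0;
}

/* Callers that don't already hold a reference take one around the
 * call, as scrub's xchk_get_inode() does in the hunk above. */
static int
lookup(struct perag *pag, unsigned agino, unsigned *blkno)
{
	int	error;

	perag_get(pag);
	error = imap(pag, agino, blkno);
	perag_put(pag);
	return error;
}
```

Callers such as xfs_iget_cache_miss(), which already hold a perag for other reasons, pass it straight through with no extra get/put pair at all.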

* Re: [PATCH 14/42] xfs: introduce xfs_for_each_perag_wrap()
  2023-01-18 22:44 ` [PATCH 14/42] xfs: introduce xfs_for_each_perag_wrap() Dave Chinner
  2023-01-23  5:41   ` Allison Henderson
@ 2023-02-01 19:28   ` Darrick J. Wong
  1 sibling, 0 replies; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 19:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:37AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In several places we iterate every AG from a specific start agno and
> wrap back to the first AG when we reach the end of the filesystem to
> continue searching. We don't have a primitive for this iteration
> yet, so add one for conversion of these algorithms to per-ag based
> iteration.
> 
> The filestream AG select code is a mess, and this initially makes it
> worse. The per-ag selection needs to be driven completely into the
> filestream code to clean this up and it will be done in a future
> patch that makes the filestream allocator use active per-ag
> references correctly.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.h     | 45 +++++++++++++++++++++-
>  fs/xfs/libxfs/xfs_bmap.c   | 76 ++++++++++++++++++++++----------------
>  fs/xfs/libxfs/xfs_ialloc.c | 32 ++++++++--------
>  3 files changed, 104 insertions(+), 49 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index 187d30d9bb13..8f43b91d4cf3 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -237,7 +237,6 @@ xfs_perag_next(
>  #define for_each_perag_from(mp, agno, pag) \
>  	for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
>  
> -
>  #define for_each_perag(mp, agno, pag) \
>  	(agno) = 0; \
>  	for_each_perag_from((mp), (agno), (pag))
> @@ -249,6 +248,50 @@ xfs_perag_next(
>  		xfs_perag_rele(pag), \
>  		(pag) = xfs_perag_grab_tag((mp), (agno), (tag)))
>  
> +static inline struct xfs_perag *
> +xfs_perag_next_wrap(
> +	struct xfs_perag	*pag,
> +	xfs_agnumber_t		*agno,
> +	xfs_agnumber_t		stop_agno,
> +	xfs_agnumber_t		wrap_agno)
> +{
> +	struct xfs_mount	*mp = pag->pag_mount;
> +
> +	*agno = pag->pag_agno + 1;
> +	xfs_perag_rele(pag);
> +	while (*agno != stop_agno) {
> +		if (*agno >= wrap_agno)
> +			*agno = 0;
> +		if (*agno == stop_agno)
> +			break;
> +
> +		pag = xfs_perag_grab(mp, *agno);
> +		if (pag)
> +			return pag;
> +		(*agno)++;
> +	}
> +	return NULL;
> +}
> +
> +/*
> + * Iterate all AGs from start_agno through wrap_agno, then 0 through
> + * (start_agno - 1).
> + */
> +#define for_each_perag_wrap_at(mp, start_agno, wrap_agno, agno, pag) \
> +	for ((agno) = (start_agno), (pag) = xfs_perag_grab((mp), (agno)); \
> +		(pag) != NULL; \
> +		(pag) = xfs_perag_next_wrap((pag), &(agno), (start_agno), \
> +				(wrap_agno)))
> +
> +/*
> + * Iterate all AGs from start_agno through to the end of the filesystem, then 0
> + * through (start_agno - 1).
> + */
> +#define for_each_perag_wrap(mp, start_agno, agno, pag) \
> +	for_each_perag_wrap_at((mp), (start_agno), (mp)->m_sb.sb_agcount, \
> +				(agno), (pag))

This seems like a useful new iterator.  I like that the open-coded loops
finally got cleaned up.

> +
> +
>  struct aghdr_init_data {
>  	/* per ag data */
>  	xfs_agblock_t		agno;		/* ag to init */
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 6aad0ea5e606..e5519abbfa0d 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c

<snip>

> @@ -3218,21 +3214,21 @@ xfs_bmap_btalloc_select_lengths(
>  	}
>  
>  	args->total = ap->total;
> -	startag = ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
> +	startag = XFS_FSB_TO_AGNO(mp, args->fsbno);
>  	if (startag == NULLAGNUMBER)
> -		startag = ag = 0;
> +		startag = 0;
>  
> -	while (*blen < args->maxlen) {
> -		error = xfs_bmap_longest_free_extent(args->tp, ag, blen,
> +	*blen = 0;
> +	for_each_perag_wrap(mp, startag, agno, pag) {
> +		error = xfs_bmap_longest_free_extent(pag, args->tp, blen,
>  						     &notinit);
>  		if (error)
> -			return error;
> -
> -		if (++ag == mp->m_sb.sb_agcount)
> -			ag = 0;
> -		if (ag == startag)
> +			break;
> +		if (*blen >= args->maxlen)
>  			break;
>  	}
> +	if (pag)
> +		xfs_perag_rele(pag);
>  
>  	xfs_bmap_select_minlen(ap, args, blen, notinit);
>  	return 0;

Same question as Allison -- if xfs_bmap_longest_free_extent returned a
non-EAGAIN error code, don't we want to return that to the caller?

--D

> @@ -3245,7 +3241,8 @@ xfs_bmap_btalloc_filestreams(
>  	xfs_extlen_t		*blen)
>  {
>  	struct xfs_mount	*mp = ap->ip->i_mount;
> -	xfs_agnumber_t		ag;
> +	struct xfs_perag	*pag;
> +	xfs_agnumber_t		start_agno;
>  	int			notinit = 0;
>  	int			error;
>  
> @@ -3259,33 +3256,50 @@ xfs_bmap_btalloc_filestreams(
>  	args->type = XFS_ALLOCTYPE_NEAR_BNO;
>  	args->total = ap->total;
>  
> -	ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
> -	if (ag == NULLAGNUMBER)
> -		ag = 0;
> +	start_agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
> +	if (start_agno == NULLAGNUMBER)
> +		start_agno = 0;
>  
> -	error = xfs_bmap_longest_free_extent(args->tp, ag, blen, &notinit);
> -	if (error)
> -		return error;
> +	pag = xfs_perag_grab(mp, start_agno);
> +	if (pag) {
> +		error = xfs_bmap_longest_free_extent(pag, args->tp, blen,
> +				&notinit);
> +		xfs_perag_rele(pag);
> +		if (error)
> +			return error;
> +	}
>  
>  	if (*blen < args->maxlen) {
> -		error = xfs_filestream_new_ag(ap, &ag);
> +		xfs_agnumber_t	agno = start_agno;
> +
> +		error = xfs_filestream_new_ag(ap, &agno);
>  		if (error)
>  			return error;
> +		if (agno == NULLAGNUMBER)
> +			goto out_select;
>  
> -		error = xfs_bmap_longest_free_extent(args->tp, ag, blen,
> -						     &notinit);
> +		pag = xfs_perag_grab(mp, agno);
> +		if (!pag)
> +			goto out_select;
> +
> +		error = xfs_bmap_longest_free_extent(pag, args->tp,
> +				blen, &notinit);
> +		xfs_perag_rele(pag);
>  		if (error)
>  			return error;
>  
> +		start_agno = agno;
> +
>  	}
>  
> +out_select:
>  	xfs_bmap_select_minlen(ap, args, blen, notinit);
>  
>  	/*
>  	 * Set the failure fallback case to look in the selected AG as stream
>  	 * may have moved.
>  	 */
> -	ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, ag, 0);
> +	ap->blkno = args->fsbno = XFS_AGB_TO_FSB(mp, start_agno, 0);
>  	return 0;
>  }
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 2a323ffa5ba9..50fef3f5af51 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -1725,7 +1725,7 @@ xfs_dialloc(
>  	bool			ok_alloc = true;
>  	bool			low_space = false;
>  	int			flags;
> -	xfs_ino_t		ino;
> +	xfs_ino_t		ino = NULLFSINO;
>  
>  	/*
>  	 * Directories, symlinks, and regular files frequently allocate at least
> @@ -1773,39 +1773,37 @@ xfs_dialloc(
>  	 * or in which we can allocate some inodes.  Iterate through the
>  	 * allocation groups upward, wrapping at the end.
>  	 */
> -	agno = start_agno;
>  	flags = XFS_ALLOC_FLAG_TRYLOCK;
> -	for (;;) {
> -		pag = xfs_perag_grab(mp, agno);
> +retry:
> +	for_each_perag_wrap_at(mp, start_agno, mp->m_maxagi, agno, pag) {
>  		if (xfs_dialloc_good_ag(pag, *tpp, mode, flags, ok_alloc)) {
>  			error = xfs_dialloc_try_ag(pag, tpp, parent,
>  					&ino, ok_alloc);
>  			if (error != -EAGAIN)
>  				break;
> +			error = 0;
>  		}
>  
>  		if (xfs_is_shutdown(mp)) {
>  			error = -EFSCORRUPTED;
>  			break;
>  		}
> -		if (++agno == mp->m_maxagi)
> -			agno = 0;
> -		if (agno == start_agno) {
> -			if (!flags) {
> -				error = -ENOSPC;
> -				break;
> -			}
> +	}
> +	if (pag)
> +		xfs_perag_rele(pag);
> +	if (error)
> +		return error;
> +	if (ino == NULLFSINO) {
> +		if (flags) {
>  			flags = 0;
>  			if (low_space)
>  				ok_alloc = true;
> +			goto retry;
>  		}
> -		xfs_perag_rele(pag);
> +		return -ENOSPC;
>  	}
> -
> -	if (!error)
> -		*new_ino = ino;
> -	xfs_perag_rele(pag);
> -	return error;
> +	*new_ino = ino;
> +	return 0;
>  }
>  
>  /*
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread
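The visit order implied by for_each_perag_wrap_at() can be sketched in isolation. The helper below is a toy model (invented name, no perag references, and it omits the xfs_perag_grab() failures that let the real iterator skip offline AGs): it enumerates start_agno through wrap_agno - 1, then wraps to 0 and continues until it comes back around to the start.

```c
#include <assert.h>

/* Fill @order with the AG visit order of a wrapping scan that starts
 * at @start_agno in a filesystem whose AG numbers wrap at @wrap_agno.
 * Returns the number of AGs visited. */
static unsigned
wrap_order(unsigned start_agno, unsigned wrap_agno, unsigned *order)
{
	unsigned	n = 0;
	unsigned	agno = start_agno;

	do {
		order[n++] = agno;
		if (++agno >= wrap_agno)
			agno = 0;	/* wrap back to AG 0 */
	} while (agno != start_agno);	/* stop where we started */
	return n;
}
```

This is the shape xfs_dialloc() relies on above: with wrap_agno set to mp->m_maxagi, every AG below the inode32 limit is tried exactly once per pass, regardless of which AG the scan started in.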

* Re: [PATCH 15/42] xfs: rework xfs_alloc_vextent()
  2023-01-18 22:44 ` [PATCH 15/42] xfs: rework xfs_alloc_vextent() Dave Chinner
@ 2023-02-01 19:39   ` Darrick J. Wong
  0 siblings, 0 replies; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 19:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:38AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> It's a multiplexing mess that can be greatly simplified, and really
> needs to be simplified to allow active per-ag references to
> propagate from the initial AG selection code to the bmapi code.
> 
> This splits the code out into a separate parameter checking
> function, an iterator function, and allocation completion functions,
> and then implements the individual policies using these functions.

This patch was **so** much easier to read once I imported it and
re-exported it with git set to patience diff mode.  With that in hand
it's far easier to see that the diff really does break up an overlong
function and nothing else.

$ cat /etc/gitconfig
[diff]
        algorithm = patience

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

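For anyone reproducing this review trick, the patience algorithm can also be selected per-invocation rather than via /etc/gitconfig. A sketch (run in a throwaway repo; flag spellings as documented by git-diff(1)):

```shell
# Build a throwaway repo with one change so the commands below have
# something to diff.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
printf 'a\nb\nc\n' > f.c
git add f.c && git commit -qm base
printf 'a\nx\nb\nc\n' > f.c
git commit -qam change

# Per-invocation: --patience works with any diff-producing command,
# including format-patch, so a posted patch can be re-exported.
git format-patch -1 --patience --stdout HEAD > patience.patch
git log -1 -p --patience HEAD > /dev/null

# Per-repository: the config knob from the snippet above, scoped to
# this repo instead of /etc/gitconfig.
git config diff.algorithm patience
```

Histogram mode (`diff.algorithm = histogram`) is a faster variant of patience and often produces the same easier-to-read hunks.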

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 464 +++++++++++++++++++++++---------------
>  1 file changed, 285 insertions(+), 179 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 246c2e7d9e7a..39e34a1bfa31 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3151,29 +3151,20 @@ xfs_alloc_read_agf(
>  }
>  
>  /*
> - * Allocate an extent (variable-size).
> - * Depending on the allocation type, we either look in a single allocation
> - * group or loop over the allocation groups to find the result.
> + * Pre-process allocation arguments to set initial state that we don't require
> + * callers to set up correctly, as well as bounds check the allocation args
> + * that are set up.
>   */
> -int				/* error */
> -xfs_alloc_vextent(
> -	struct xfs_alloc_arg	*args)	/* allocation argument structure */
> +static int
> +xfs_alloc_vextent_check_args(
> +	struct xfs_alloc_arg	*args)
>  {
> -	xfs_agblock_t		agsize;	/* allocation group size */
> -	int			error;
> -	int			flags;	/* XFS_ALLOC_FLAG_... locking flags */
> -	struct xfs_mount	*mp;	/* mount structure pointer */
> -	xfs_agnumber_t		sagno;	/* starting allocation group number */
> -	xfs_alloctype_t		type;	/* input allocation type */
> -	int			bump_rotor = 0;
> -	xfs_agnumber_t		rotorstep = xfs_rotorstep; /* inode32 agf stepper */
> -	xfs_agnumber_t		minimum_agno = 0;
> +	struct xfs_mount	*mp = args->mp;
> +	xfs_agblock_t		agsize;
>  
> -	mp = args->mp;
> -	type = args->otype = args->type;
> +	args->otype = args->type;
>  	args->agbno = NULLAGBLOCK;
> -	if (args->tp->t_highest_agno != NULLAGNUMBER)
> -		minimum_agno = args->tp->t_highest_agno;
> +
>  	/*
>  	 * Just fix this up, for the case where the last a.g. is shorter
>  	 * (or there's only one a.g.) and the caller couldn't easily figure
> @@ -3195,199 +3186,314 @@ xfs_alloc_vextent(
>  	    args->mod >= args->prod) {
>  		args->fsbno = NULLFSBLOCK;
>  		trace_xfs_alloc_vextent_badargs(args);
> +		return -ENOSPC;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Post-process allocation results to set the allocated block number correctly
> + * for the caller.
> + *
> + * XXX: xfs_alloc_vextent() should really be returning ENOSPC for ENOSPC, not
> + * hiding it behind a "successful" NULLFSBLOCK allocation.
> + */
> +static void
> +xfs_alloc_vextent_set_fsbno(
> +	struct xfs_alloc_arg	*args,
> +	xfs_agnumber_t		minimum_agno)
> +{
> +	struct xfs_mount	*mp = args->mp;
> +
> +	/*
> +	 * We can end up here with a locked AGF. If we failed, the caller is
> +	 * likely going to try to allocate again with different parameters, and
> +	 * that can widen the AGs that are searched for free space. If we have
> +	 * to do BMBT block allocation, we have to do a new allocation.
> +	 *
> +	 * Hence leaving this function with the AGF locked opens up potential
> +	 * ABBA AGF deadlocks because a future allocation attempt in this
> +	 * transaction may attempt to lock a lower number AGF.
> +	 *
> +	 * We can't release the AGF until the transaction is committed, so at
> +	 * this point we must update the "first allocation" tracker to point at
> +	 * this AG if the tracker is empty or points to a lower AG. This allows
> +	 * the next allocation attempt to be modified appropriately to avoid
> +	 * deadlocks.
> +	 */
> +	if (args->agbp &&
> +	    (args->tp->t_highest_agno == NULLAGNUMBER ||
> +	     args->agno > minimum_agno))
> +		args->tp->t_highest_agno = args->agno;
> +
> +	/* Allocation failed with ENOSPC if NULLAGBLOCK was returned. */
> +	if (args->agbno == NULLAGBLOCK) {
> +		args->fsbno = NULLFSBLOCK;
> +		return;
> +	}
> +
> +	args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno);
> +#ifdef DEBUG
> +	ASSERT(args->len >= args->minlen);
> +	ASSERT(args->len <= args->maxlen);
> +	ASSERT(args->agbno % args->alignment == 0);
> +	XFS_AG_CHECK_DADDR(mp, XFS_FSB_TO_DADDR(mp, args->fsbno), args->len);
> +#endif
> +}
> +
> +/*
> + * Allocate within a single AG only.
> + */
> +static int
> +xfs_alloc_vextent_this_ag(
> +	struct xfs_alloc_arg	*args,
> +	xfs_agnumber_t		minimum_agno)
> +{
> +	struct xfs_mount	*mp = args->mp;
> +	int			error;
> +
> +	error = xfs_alloc_vextent_check_args(args);
> +	if (error) {
> +		if (error == -ENOSPC)
> +			return 0;
> +		return error;
> +	}
> +
> +	args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
> +	if (minimum_agno > args->agno) {
> +		trace_xfs_alloc_vextent_skip_deadlock(args);
> +		args->fsbno = NULLFSBLOCK;
>  		return 0;
>  	}
>  
> -	switch (type) {
> -	case XFS_ALLOCTYPE_THIS_AG:
> -	case XFS_ALLOCTYPE_NEAR_BNO:
> -	case XFS_ALLOCTYPE_THIS_BNO:
> -		/*
> -		 * These three force us into a single a.g.
> -		 */
> -		args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
> -		args->pag = xfs_perag_get(mp, args->agno);
> +	args->pag = xfs_perag_get(mp, args->agno);
> +	error = xfs_alloc_fix_freelist(args, 0);
> +	if (error) {
> +		trace_xfs_alloc_vextent_nofix(args);
> +		goto out_error;
> +	}
> +	if (!args->agbp) {
> +		trace_xfs_alloc_vextent_noagbp(args);
> +		args->fsbno = NULLFSBLOCK;
> +		goto out_error;
> +	}
> +	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
> +	error = xfs_alloc_ag_vextent(args);
>  
> -		if (minimum_agno > args->agno) {
> -			trace_xfs_alloc_vextent_skip_deadlock(args);
> -			error = 0;
> -			break;
> -		}
> +	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
> +out_error:
> +	xfs_perag_put(args->pag);
> +	return error;
> +}
> +
> +/*
> + * Iterate all AGs trying to allocate an extent starting from @start_agno.
> + *
> + * If the incoming allocation type is XFS_ALLOCTYPE_NEAR_BNO, it means the
> + * allocation attempts in @start_agno have locality information. If we fail to
> + * allocate in that AG, then we revert to anywhere-in-AG for all the other AGs
> + * we attempt to allocate in, as there is no locality optimisation possible for
> + * those allocations.
> + *
> + * When we wrap the AG iteration at the end of the filesystem, we have to be
> + * careful not to wrap into AGs below ones we already have locked in the
> + * transaction if we are doing a blocking iteration. This will result in an
> + * out-of-order locking of AGFs and hence can cause deadlocks.
> + */
> +static int
> +xfs_alloc_vextent_iterate_ags(
> +	struct xfs_alloc_arg	*args,
> +	xfs_agnumber_t		minimum_agno,
> +	xfs_agnumber_t		start_agno,
> +	uint32_t		flags)
> +{
> +	struct xfs_mount	*mp = args->mp;
> +	int			error = 0;
>  
> -		error = xfs_alloc_fix_freelist(args, 0);
> +	ASSERT(start_agno >= minimum_agno);
> +
> +	/*
> +	 * Loop over allocation groups twice; first time with
> +	 * trylock set, second time without.
> +	 */
> +	args->agno = start_agno;
> +	for (;;) {
> +		args->pag = xfs_perag_get(mp, args->agno);
> +		error = xfs_alloc_fix_freelist(args, flags);
>  		if (error) {
>  			trace_xfs_alloc_vextent_nofix(args);
> -			goto error0;
> -		}
> -		if (!args->agbp) {
> -			trace_xfs_alloc_vextent_noagbp(args);
>  			break;
>  		}
> -		args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
> -		if ((error = xfs_alloc_ag_vextent(args)))
> -			goto error0;
> -		break;
> -	case XFS_ALLOCTYPE_START_BNO:
>  		/*
> -		 * Try near allocation first, then anywhere-in-ag after
> -		 * the first a.g. fails.
> +		 * If we get a buffer back then the allocation will fly.
>  		 */
> -		if ((args->datatype & XFS_ALLOC_INITIAL_USER_DATA) &&
> -		    xfs_is_inode32(mp)) {
> -			args->fsbno = XFS_AGB_TO_FSB(mp,
> -					((mp->m_agfrotor / rotorstep) %
> -					mp->m_sb.sb_agcount), 0);
> -			bump_rotor = 1;
> +		if (args->agbp) {
> +			error = xfs_alloc_ag_vextent(args);
> +			break;
>  		}
> -		args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
> -		args->type = XFS_ALLOCTYPE_NEAR_BNO;
> -		fallthrough;
> -	case XFS_ALLOCTYPE_FIRST_AG:
> +
> +		trace_xfs_alloc_vextent_loopfailed(args);
> +
>  		/*
> -		 * Rotate through the allocation groups looking for a winner.
> -		 * If we are blocking, we must obey minimum_agno contraints for
> -		 * avoiding ABBA deadlocks on AGF locking.
> +		 * Didn't work, figure out the next iteration.
>  		 */
> -		if (type == XFS_ALLOCTYPE_FIRST_AG) {
> -			/*
> -			 * Start with allocation group given by bno.
> -			 */
> -			args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
> +		if (args->agno == start_agno &&
> +		    args->otype == XFS_ALLOCTYPE_START_BNO)
>  			args->type = XFS_ALLOCTYPE_THIS_AG;
> -			sagno = minimum_agno;
> -			flags = 0;
> -		} else {
> -			/*
> -			 * Start with the given allocation group.
> -			 */
> -			args->agno = sagno = XFS_FSB_TO_AGNO(mp, args->fsbno);
> -			flags = XFS_ALLOC_FLAG_TRYLOCK;
> +
> +		/*
> +		 * If we are try-locking, we can't deadlock on AGF locks so we
> +		 * can wrap all the way back to the first AG. Otherwise, wrap
> +		 * back to the start AG so we can't deadlock and let the end of
> +		 * scan handler decide what to do next.
> +		 */
> +		if (++(args->agno) == mp->m_sb.sb_agcount) {
> +			if (flags & XFS_ALLOC_FLAG_TRYLOCK)
> +				args->agno = 0;
> +			else
> +				args->agno = minimum_agno;
>  		}
>  
>  		/*
> -		 * Loop over allocation groups twice; first time with
> -		 * trylock set, second time without.
> +		 * Reached the starting a.g., must either be done
> +		 * or switch to non-trylock mode.
>  		 */
> -		for (;;) {
> -			args->pag = xfs_perag_get(mp, args->agno);
> -			error = xfs_alloc_fix_freelist(args, flags);
> -			if (error) {
> -				trace_xfs_alloc_vextent_nofix(args);
> -				goto error0;
> -			}
> -			/*
> -			 * If we get a buffer back then the allocation will fly.
> -			 */
> -			if (args->agbp) {
> -				if ((error = xfs_alloc_ag_vextent(args)))
> -					goto error0;
> +		if (args->agno == start_agno) {
> +			if (flags == 0) {
> +				args->agbno = NULLAGBLOCK;
> +				trace_xfs_alloc_vextent_allfailed(args);
>  				break;
>  			}
>  
> -			trace_xfs_alloc_vextent_loopfailed(args);
> +			flags = 0;
> +			if (args->otype == XFS_ALLOCTYPE_START_BNO) {
> +				args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
> +				args->type = XFS_ALLOCTYPE_NEAR_BNO;
> +			}
> +		}
> +		xfs_perag_put(args->pag);
> +		args->pag = NULL;
> +	}
> +	if (args->pag) {
> +		xfs_perag_put(args->pag);
> +		args->pag = NULL;
> +	}
> +	return error;
> +}
>  
> -			/*
> -			 * Didn't work, figure out the next iteration.
> -			 */
> -			if (args->agno == sagno &&
> -			    type == XFS_ALLOCTYPE_START_BNO)
> -				args->type = XFS_ALLOCTYPE_THIS_AG;
> +/*
> + * Iterate from the AGs from the start AG to the end of the filesystem, trying
> + * to allocate blocks. It starts with a near allocation attempt in the initial
> + * AG, then falls back to anywhere-in-ag after the first AG fails. It will wrap
> + * back to zero if allowed by previous allocations in this transaction,
> + * otherwise will wrap back to the start AG and run a second blocking pass to
> + * the end of the filesystem.
> + */
> +static int
> +xfs_alloc_vextent_start_ag(
> +	struct xfs_alloc_arg	*args,
> +	xfs_agnumber_t		minimum_agno)
> +{
> +	struct xfs_mount	*mp = args->mp;
> +	xfs_agnumber_t		start_agno;
> +	xfs_agnumber_t		rotorstep = xfs_rotorstep;
> +	bool			bump_rotor = false;
> +	int			error;
>  
> -			/*
> -			 * If we are try-locking, we can't deadlock on AGF
> -			 * locks, so we can wrap all the way back to the first
> -			 * AG. Otherwise, wrap back to the start AG so we can't
> -			 * deadlock, and let the end of scan handler decide what
> -			 * to do next.
> -			 */
> -			if (++(args->agno) == mp->m_sb.sb_agcount) {
> -				if (flags & XFS_ALLOC_FLAG_TRYLOCK)
> -					args->agno = 0;
> -				else
> -					args->agno = sagno;
> -			}
> +	error = xfs_alloc_vextent_check_args(args);
> +	if (error) {
> +		if (error == -ENOSPC)
> +			return 0;
> +		return error;
> +	}
>  
> -			/*
> -			 * Reached the starting a.g., must either be done
> -			 * or switch to non-trylock mode.
> -			 */
> -			if (args->agno == sagno) {
> -				if (flags == 0) {
> -					args->agbno = NULLAGBLOCK;
> -					trace_xfs_alloc_vextent_allfailed(args);
> -					break;
> -				}
> +	if ((args->datatype & XFS_ALLOC_INITIAL_USER_DATA) &&
> +	    xfs_is_inode32(mp)) {
> +		args->fsbno = XFS_AGB_TO_FSB(mp,
> +				((mp->m_agfrotor / rotorstep) %
> +				mp->m_sb.sb_agcount), 0);
> +		bump_rotor = 1;
> +	}
> +	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, args->fsbno));
> +	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
> +	args->type = XFS_ALLOCTYPE_NEAR_BNO;
>  
> -				/*
> -				 * Blocking pass next, so we must obey minimum
> -				 * agno constraints to avoid ABBA AGF deadlocks.
> -				 */
> -				flags = 0;
> -				if (minimum_agno > sagno)
> -					sagno = minimum_agno;
> -
> -				if (type == XFS_ALLOCTYPE_START_BNO) {
> -					args->agbno = XFS_FSB_TO_AGBNO(mp,
> -						args->fsbno);
> -					args->type = XFS_ALLOCTYPE_NEAR_BNO;
> -				}
> -			}
> -			xfs_perag_put(args->pag);
> -		}
> -		if (bump_rotor) {
> -			if (args->agno == sagno)
> -				mp->m_agfrotor = (mp->m_agfrotor + 1) %
> -					(mp->m_sb.sb_agcount * rotorstep);
> -			else
> -				mp->m_agfrotor = (args->agno * rotorstep + 1) %
> -					(mp->m_sb.sb_agcount * rotorstep);
> -		}
> -		break;
> -	default:
> -		ASSERT(0);
> -		/* NOTREACHED */
> +	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno, start_agno,
> +			XFS_ALLOC_FLAG_TRYLOCK);
> +	if (bump_rotor) {
> +		if (args->agno == start_agno)
> +			mp->m_agfrotor = (mp->m_agfrotor + 1) %
> +				(mp->m_sb.sb_agcount * rotorstep);
> +		else
> +			mp->m_agfrotor = (args->agno * rotorstep + 1) %
> +				(mp->m_sb.sb_agcount * rotorstep);
>  	}
> -	if (args->agbno == NULLAGBLOCK) {
> -		args->fsbno = NULLFSBLOCK;
> -	} else {
> -		args->fsbno = XFS_AGB_TO_FSB(mp, args->agno, args->agbno);
> -#ifdef DEBUG
> -		ASSERT(args->len >= args->minlen);
> -		ASSERT(args->len <= args->maxlen);
> -		ASSERT(args->agbno % args->alignment == 0);
> -		XFS_AG_CHECK_DADDR(mp, XFS_FSB_TO_DADDR(mp, args->fsbno),
> -			args->len);
> -#endif
>  
> +	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
> +	return error;
> +}
> +
> +/*
> + * Iterate from the agno indicated by args->fsbno through to the end of the
> + * filesystem attempting blocking allocation. This does not wrap or try a second
> + * pass, so will not recurse into AGs lower than indicated by fsbno.
> + */
> +static int
> +xfs_alloc_vextent_first_ag(
> +	struct xfs_alloc_arg	*args,
> +	xfs_agnumber_t		minimum_agno)
> +{
> +	struct xfs_mount	*mp = args->mp;
> +	xfs_agnumber_t		start_agno;
> +	int			error;
> +
> +	error = xfs_alloc_vextent_check_args(args);
> +	if (error) {
> +		if (error == -ENOSPC)
> +			return 0;
> +		return error;
>  	}
>  
> -	/*
> -	 * We end up here with a locked AGF. If we failed, the caller is likely
> -	 * going to try to allocate again with different parameters, and that
> -	 * can widen the AGs that are searched for free space. If we have to do
> -	 * BMBT block allocation, we have to do a new allocation.
> -	 *
> -	 * Hence leaving this function with the AGF locked opens up potential
> -	 * ABBA AGF deadlocks because a future allocation attempt in this
> -	 * transaction may attempt to lock a lower number AGF.
> -	 *
> -	 * We can't release the AGF until the transaction is commited, so at
> -	 * this point we must update the "firstblock" tracker to point at this
> -	 * AG if the tracker is empty or points to a lower AG. This allows the
> -	 * next allocation attempt to be modified appropriately to avoid
> -	 * deadlocks.
> -	 */
> -	if (args->agbp &&
> -	    (args->tp->t_highest_agno == NULLAGNUMBER ||
> -	     args->pag->pag_agno > minimum_agno))
> -		args->tp->t_highest_agno = args->pag->pag_agno;
> -	xfs_perag_put(args->pag);
> -	return 0;
> -error0:
> -	xfs_perag_put(args->pag);
> +	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, args->fsbno));
> +
> +	args->type = XFS_ALLOCTYPE_THIS_AG;
> +	error =  xfs_alloc_vextent_iterate_ags(args, minimum_agno,
> +			start_agno, 0);
> +	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
>  	return error;
>  }
>  
> +/*
> + * Allocate an extent (variable-size).
> + * Depending on the allocation type, we either look in a single allocation
> + * group or loop over the allocation groups to find the result.
> + */
> +int
> +xfs_alloc_vextent(
> +	struct xfs_alloc_arg	*args)
> +{
> +	xfs_agnumber_t		minimum_agno = 0;
> +
> +	if (args->tp->t_highest_agno != NULLAGNUMBER)
> +		minimum_agno = args->tp->t_highest_agno;
> +
> +	switch (args->type) {
> +	case XFS_ALLOCTYPE_THIS_AG:
> +	case XFS_ALLOCTYPE_NEAR_BNO:
> +	case XFS_ALLOCTYPE_THIS_BNO:
> +		return xfs_alloc_vextent_this_ag(args, minimum_agno);
> +	case XFS_ALLOCTYPE_START_BNO:
> +		return xfs_alloc_vextent_start_ag(args, minimum_agno);
> +	case XFS_ALLOCTYPE_FIRST_AG:
> +		return xfs_alloc_vextent_first_ag(args, minimum_agno);
> +	default:
> +		ASSERT(0);
> +		/* NOTREACHED */
> +	}
> +	/* Should never get here */
> +	return -EFSCORRUPTED;
> +}
> +
>  /* Ensure that the freelist is at full capacity. */
>  int
>  xfs_free_extent_fix_freelist(
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread
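[Editorial aside: the inode32 rotor arithmetic in xfs_alloc_vextent_start_ag() above is compact enough to model outside the kernel. The sketch below is illustrative only — plain C with invented struct and function names, not kernel code — mirroring the `m_agfrotor / rotorstep` start-AG selection and the two rotor-bump cases at the end of the function. The rotor counts in units of 1/rotorstep of an AG, so the start AG only advances once every rotorstep initial-data allocations.]

```c
#include <assert.h>

/* Illustrative model of the AG rotor state; names are not the kernel's. */
struct rotor {
	unsigned int agfrotor;   /* rotor position, in rotorstep units */
	unsigned int agcount;    /* number of AGs in the filesystem */
	unsigned int rotorstep;  /* allocations per AG before advancing */
};

/* AG an initial user-data allocation starts in (mirrors the patch). */
static unsigned int rotor_start_agno(const struct rotor *r)
{
	return (r->agfrotor / r->rotorstep) % r->agcount;
}

/* Advance by one step when the allocation landed in the start AG. */
static void rotor_bump_same_ag(struct rotor *r)
{
	r->agfrotor = (r->agfrotor + 1) % (r->agcount * r->rotorstep);
}

/* Jump past the AG we actually allocated in when we had to move on. */
static void rotor_bump_other_ag(struct rotor *r, unsigned int agno)
{
	r->agfrotor = (agno * r->rotorstep + 1) %
			(r->agcount * r->rotorstep);
}
```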

* Re: [PATCH 17/42] xfs: combine __xfs_alloc_vextent_this_ag and xfs_alloc_ag_vextent
  2023-01-18 22:44 ` [PATCH 17/42] xfs: combine __xfs_alloc_vextent_this_ag and xfs_alloc_ag_vextent Dave Chinner
@ 2023-02-01 22:25   ` Darrick J. Wong
  0 siblings, 0 replies; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 22:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:40AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> There's a bit of a recursive conundrum around
> xfs_alloc_ag_vextent(). We can't first call xfs_alloc_ag_vextent()
> without preparing the AGFL for the allocation, and preparing the
> AGFL calls xfs_alloc_ag_vextent() to prepare the AGFL for the
> allocation. This "double allocation" requirement is not really clear
> from the current xfs_alloc_fix_freelist() calls that are sprinkled
> through the allocation code.
> 
> It's not helped that xfs_alloc_ag_vextent() can actually allocate
> from the AGFL itself, but there's special code to prevent AGFL prep
> allocations from allocating from the free list it's trying to prep.
> The naming is also not consistent: args->wasfromfl is true when we
> allocated _from_ the free list, but the indication that we are
> allocating _for_ the free list is via checking that (args->resv ==
> XFS_AG_RESV_AGFL).
> 
> So, let's make this "allocation required for allocation" situation
> clear by moving it all inside xfs_alloc_ag_vextent(). The freelist
> allocation is a specific XFS_ALLOCTYPE_THIS_AG allocation, which
> translates directly to an xfs_alloc_ag_vextent_size() allocation.
> 
> This enables us to replace __xfs_alloc_vextent_this_ag() with a call
> to xfs_alloc_ag_vextent(), and we drive the freelist fixing further
> into the per-ag allocation algorithm.

Hmm.  My first reaction to all this was "why do I care about all this
slicing and dicing?" and "uugh, what confusing code".  Then I skipped to the
end of the book and observed that the end goal seems to be the
elimination of:

	args.type = XFS_ALLOCTYPE_START_BNO;
	args.fsbno = sometarget;
	/* fill out other fields mysteriously */

by turning them all into explicit functions!

	error = xfs_alloc_vextent_start_ag(&args, sometarget);

So I looked at all the replacements and noticed that it's quite a bit
easier to understand what each variant on allocation does.

It took me a minute to realize that the additional call to
xfs_rmap_should_skip_owner_update is because _alloc_fix_freelist doesn't
call what becomes the _vextent_finish function.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 65 +++++++++++++++++++++------------------
>  1 file changed, 35 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 2dec95f35562..011baace7e9d 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1140,22 +1140,38 @@ xfs_alloc_ag_vextent_small(
>   * and of the form k * prod + mod unless there's nothing that large.
>   * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
>   */
> -STATIC int			/* error */
> +static int
>  xfs_alloc_ag_vextent(
> -	xfs_alloc_arg_t	*args)	/* argument structure for allocation */
> +	struct xfs_alloc_arg	*args)
>  {
> -	int		error=0;
> +	struct xfs_mount	*mp = args->mp;
> +	int			error = 0;
>  
>  	ASSERT(args->minlen > 0);
>  	ASSERT(args->maxlen > 0);
>  	ASSERT(args->minlen <= args->maxlen);
>  	ASSERT(args->mod < args->prod);
>  	ASSERT(args->alignment > 0);
> +	ASSERT(args->resv != XFS_AG_RESV_AGFL);
> +
> +
> +	error = xfs_alloc_fix_freelist(args, 0);
> +	if (error) {
> +		trace_xfs_alloc_vextent_nofix(args);
> +		return error;
> +	}
> +	if (!args->agbp) {
> +		/* cannot allocate in this AG at all */
> +		trace_xfs_alloc_vextent_noagbp(args);
> +		args->agbno = NULLAGBLOCK;
> +		return 0;
> +	}
> +	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
> +	args->wasfromfl = 0;
>  
>  	/*
>  	 * Branch to correct routine based on the type.
>  	 */
> -	args->wasfromfl = 0;
>  	switch (args->type) {
>  	case XFS_ALLOCTYPE_THIS_AG:
>  		error = xfs_alloc_ag_vextent_size(args);
> @@ -1176,7 +1192,6 @@ xfs_alloc_ag_vextent(
>  
>  	ASSERT(args->len >= args->minlen);
>  	ASSERT(args->len <= args->maxlen);
> -	ASSERT(!args->wasfromfl || args->resv != XFS_AG_RESV_AGFL);
>  	ASSERT(args->agbno % args->alignment == 0);
>  
>  	/* if not file data, insert new block into the reverse map btree */
> @@ -2721,7 +2736,7 @@ xfs_alloc_fix_freelist(
>  		targs.resv = XFS_AG_RESV_AGFL;
>  
>  		/* Allocate as many blocks as possible at once. */
> -		error = xfs_alloc_ag_vextent(&targs);
> +		error = xfs_alloc_ag_vextent_size(&targs);
>  		if (error)
>  			goto out_agflbp_relse;
>  
> @@ -2735,6 +2750,18 @@ xfs_alloc_fix_freelist(
>  				break;
>  			goto out_agflbp_relse;
>  		}
> +
> +		if (!xfs_rmap_should_skip_owner_update(&targs.oinfo)) {
> +			error = xfs_rmap_alloc(tp, agbp, pag,
> +				       targs.agbno, targs.len, &targs.oinfo);
> +			if (error)
> +				goto out_agflbp_relse;
> +		}
> +		error = xfs_alloc_update_counters(tp, agbp,
> +						  -((long)(targs.len)));
> +		if (error)
> +			goto out_agflbp_relse;
> +
>  		/*
>  		 * Put each allocated block on the list.
>  		 */
> @@ -3244,28 +3271,6 @@ xfs_alloc_vextent_set_fsbno(
>  /*
>   * Allocate within a single AG only.
>   */
> -static int
> -__xfs_alloc_vextent_this_ag(
> -	struct xfs_alloc_arg	*args)
> -{
> -	struct xfs_mount	*mp = args->mp;
> -	int			error;
> -
> -	error = xfs_alloc_fix_freelist(args, 0);
> -	if (error) {
> -		trace_xfs_alloc_vextent_nofix(args);
> -		return error;
> -	}
> -	if (!args->agbp) {
> -		/* cannot allocate in this AG at all */
> -		trace_xfs_alloc_vextent_noagbp(args);
> -		args->agbno = NULLAGBLOCK;
> -		return 0;
> -	}
> -	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
> -	return xfs_alloc_ag_vextent(args);
> -}
> -
>  static int
>  xfs_alloc_vextent_this_ag(
>  	struct xfs_alloc_arg	*args,
> @@ -3289,7 +3294,7 @@ xfs_alloc_vextent_this_ag(
>  	}
>  
>  	args->pag = xfs_perag_get(mp, args->agno);
> -	error = __xfs_alloc_vextent_this_ag(args);
> +	error = xfs_alloc_ag_vextent(args);
>  
>  	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
>  	xfs_perag_put(args->pag);
> @@ -3329,7 +3334,7 @@ xfs_alloc_vextent_iterate_ags(
>  	args->agno = start_agno;
>  	for (;;) {
>  		args->pag = xfs_perag_get(mp, args->agno);
> -		error = __xfs_alloc_vextent_this_ag(args);
> +		error = xfs_alloc_ag_vextent(args);
>  		if (error) {
>  			args->agbno = NULLAGBLOCK;
>  			break;
> -- 
> 2.39.0
> 


* Re: [PATCH 20/42] xfs: use xfs_alloc_vextent_first_ag() where appropriate
  2023-01-18 22:44 ` [PATCH 20/42] xfs: use xfs_alloc_vextent_first_ag() where appropriate Dave Chinner
@ 2023-02-01 22:43   ` Darrick J. Wong
  2023-02-06 23:16     ` Dave Chinner
  0 siblings, 1 reply; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 22:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:43AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Change obvious callers of single AG allocation to use
> xfs_alloc_vextent_first_ag(). This gets rid of
> XFS_ALLOCTYPE_FIRST_AG as the type used within
> xfs_alloc_vextent_first_ag() during iteration is _THIS_AG. Hence we
> can remove the setting of args->type from all the callers of
> _first_ag() and remove the alloctype.
> 
> While doing this, pass the allocation target fsb as a parameter
> rather than encoding it in args->fsbno. This starts the process
> of making args->fsbno an output-only variable rather than
> input/output.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 35 +++++++++++++++++++----------------
>  fs/xfs/libxfs/xfs_alloc.h | 10 ++++++++--
>  fs/xfs/libxfs/xfs_bmap.c  | 31 ++++++++++++++++---------------
>  3 files changed, 43 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 28b79facf2e3..186ce3aee9e0 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3183,7 +3183,8 @@ xfs_alloc_read_agf(
>   */
>  static int
>  xfs_alloc_vextent_check_args(
> -	struct xfs_alloc_arg	*args)
> +	struct xfs_alloc_arg	*args,
> +	xfs_rfsblock_t		target)

Isn't xfs_rfsblock_t supposed to be used to measure quantities of raw fs
blocks, and not the segmented agno/agbno numbers that we encode in most
places?

>  {
>  	struct xfs_mount	*mp = args->mp;
>  	xfs_agblock_t		agsize;
> @@ -3201,13 +3202,13 @@ xfs_alloc_vextent_check_args(
>  		args->maxlen = agsize;
>  	if (args->alignment == 0)
>  		args->alignment = 1;
> -	ASSERT(XFS_FSB_TO_AGNO(mp, args->fsbno) < mp->m_sb.sb_agcount);
> -	ASSERT(XFS_FSB_TO_AGBNO(mp, args->fsbno) < agsize);
> +	ASSERT(XFS_FSB_TO_AGNO(mp, target) < mp->m_sb.sb_agcount);
> +	ASSERT(XFS_FSB_TO_AGBNO(mp, target) < agsize);

Yes, I think @target should be xfs_fsblock_t since we pass it to
XFS_FSB_TO_AG{,B}NO here.
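[Editorial aside: the distinction being raised is that xfs_fsblock_t is a segmented address — the AG number sits in the high bits above sb_agblklog and the AG block number in the low bits — while xfs_rfsblock_t counts raw filesystem blocks linearly. Only the segmented form can be split with the FSB_TO_AGNO/FSB_TO_AGBNO shift-and-mask macros. A toy model of the encoding, with an illustrative field width and invented names:]

```c
#include <assert.h>

typedef unsigned long long fsblock_t;

/* log2(maximum blocks per AG); 20 is an arbitrary illustrative value,
 * the real value comes from the superblock's sb_agblklog field. */
static const unsigned int agblklog = 20;

/* Pack an (agno, agbno) pair into a segmented fsblock address. */
static fsblock_t agb_to_fsb(unsigned int agno, unsigned int agbno)
{
	return ((fsblock_t)agno << agblklog) | agbno;
}

/* AG number lives in the high bits... */
static unsigned int fsb_to_agno(fsblock_t fsbno)
{
	return (unsigned int)(fsbno >> agblklog);
}

/* ...and the AG-relative block number in the low bits. */
static unsigned int fsb_to_agbno(fsblock_t fsbno)
{
	return (unsigned int)(fsbno & ((1ULL << agblklog) - 1));
}
```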

>  	ASSERT(args->minlen <= args->maxlen);
>  	ASSERT(args->minlen <= agsize);
>  	ASSERT(args->mod < args->prod);
> -	if (XFS_FSB_TO_AGNO(mp, args->fsbno) >= mp->m_sb.sb_agcount ||
> -	    XFS_FSB_TO_AGBNO(mp, args->fsbno) >= agsize ||
> +	if (XFS_FSB_TO_AGNO(mp, target) >= mp->m_sb.sb_agcount ||
> +	    XFS_FSB_TO_AGBNO(mp, target) >= agsize ||
>  	    args->minlen > args->maxlen || args->minlen > agsize ||
>  	    args->mod >= args->prod) {
>  		args->fsbno = NULLFSBLOCK;
> @@ -3281,7 +3282,7 @@ xfs_alloc_vextent_this_ag(
>  	if (args->tp->t_highest_agno != NULLAGNUMBER)
>  		minimum_agno = args->tp->t_highest_agno;
>  
> -	error = xfs_alloc_vextent_check_args(args);
> +	error = xfs_alloc_vextent_check_args(args, args->fsbno);
>  	if (error) {
>  		if (error == -ENOSPC)
>  			return 0;
> @@ -3406,7 +3407,7 @@ xfs_alloc_vextent_start_ag(
>  	bool			bump_rotor = false;
>  	int			error;
>  
> -	error = xfs_alloc_vextent_check_args(args);
> +	error = xfs_alloc_vextent_check_args(args, args->fsbno);
>  	if (error) {
>  		if (error == -ENOSPC)
>  			return 0;
> @@ -3444,25 +3445,29 @@ xfs_alloc_vextent_start_ag(
>   * filesystem attempting blocking allocation. This does not wrap or try a second
>   * pass, so will not recurse into AGs lower than indicated by fsbno.
>   */
> -static int
> -xfs_alloc_vextent_first_ag(
> +int
> + xfs_alloc_vextent_first_ag(
>  	struct xfs_alloc_arg	*args,
> -	xfs_agnumber_t		minimum_agno)
> -{
> +	xfs_rfsblock_t		target)
> + {

Extra spaces here, and seemingly another variable that ought to be
xfs_fsblock_t?

--D

>  	struct xfs_mount	*mp = args->mp;
> +	xfs_agnumber_t		minimum_agno = 0;
>  	xfs_agnumber_t		start_agno;
>  	int			error;
>  
> -	error = xfs_alloc_vextent_check_args(args);
> +	if (args->tp->t_highest_agno != NULLAGNUMBER)
> +		minimum_agno = args->tp->t_highest_agno;
> +
> +	error = xfs_alloc_vextent_check_args(args, target);
>  	if (error) {
>  		if (error == -ENOSPC)
>  			return 0;
>  		return error;
>  	}
>  
> -	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, args->fsbno));
> -
> +	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target));
>  	args->type = XFS_ALLOCTYPE_THIS_AG;
> +	args->fsbno = target;
>  	error =  xfs_alloc_vextent_iterate_ags(args, minimum_agno,
>  			start_agno, 0);
>  	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
> @@ -3495,8 +3500,6 @@ xfs_alloc_vextent(
>  		break;
>  	case XFS_ALLOCTYPE_START_BNO:
>  		return xfs_alloc_vextent_start_ag(args, minimum_agno);
> -	case XFS_ALLOCTYPE_FIRST_AG:
> -		return xfs_alloc_vextent_first_ag(args, minimum_agno);
>  	default:
>  		error = -EFSCORRUPTED;
>  		ASSERT(0);
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 0a9ad6cd18e2..73697dd3ca55 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -19,7 +19,6 @@ unsigned int xfs_agfl_size(struct xfs_mount *mp);
>  /*
>   * Freespace allocation types.  Argument to xfs_alloc_[v]extent.
>   */
> -#define XFS_ALLOCTYPE_FIRST_AG	0x02	/* ... start at ag 0 */
>  #define XFS_ALLOCTYPE_THIS_AG	0x08	/* anywhere in this a.g. */
>  #define XFS_ALLOCTYPE_START_BNO	0x10	/* near this block else anywhere */
>  #define XFS_ALLOCTYPE_NEAR_BNO	0x20	/* in this a.g. and near this block */
> @@ -29,7 +28,6 @@ unsigned int xfs_agfl_size(struct xfs_mount *mp);
>  typedef unsigned int xfs_alloctype_t;
>  
>  #define XFS_ALLOC_TYPES \
> -	{ XFS_ALLOCTYPE_FIRST_AG,	"FIRST_AG" }, \
>  	{ XFS_ALLOCTYPE_THIS_AG,	"THIS_AG" }, \
>  	{ XFS_ALLOCTYPE_START_BNO,	"START_BNO" }, \
>  	{ XFS_ALLOCTYPE_NEAR_BNO,	"NEAR_BNO" }, \
> @@ -130,6 +128,14 @@ xfs_alloc_vextent(
>   */
>  int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
>  
> +/*
> + * Iterate from the AG indicated by args->fsbno through to the end of the
> + * filesystem attempting blocking allocation. This is for use in last
> + * resort allocation attempts when everything else has failed.
> + */
> +int xfs_alloc_vextent_first_ag(struct xfs_alloc_arg *args,
> +		xfs_rfsblock_t target);
> +
>  /*
>   * Free an extent.
>   */
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index cdf3b551ef7b..eb3dc8d5319b 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3248,13 +3248,6 @@ xfs_bmap_btalloc_filestreams(
>  	int			notinit = 0;
>  	int			error;
>  
> -	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> -		args->type = XFS_ALLOCTYPE_FIRST_AG;
> -		args->total = ap->minlen;
> -		args->minlen = ap->minlen;
> -		return 0;
> -	}
> -
>  	args->type = XFS_ALLOCTYPE_NEAR_BNO;
>  	args->total = ap->total;
>  
> @@ -3462,9 +3455,7 @@ xfs_bmap_exact_minlen_extent_alloc(
>  	 */
>  	ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
>  
> -	args.fsbno = ap->blkno;
>  	args.oinfo = XFS_RMAP_OINFO_SKIP_UPDATE;
> -	args.type = XFS_ALLOCTYPE_FIRST_AG;
>  	args.minlen = args.maxlen = ap->minlen;
>  	args.total = ap->total;
>  
> @@ -3476,7 +3467,7 @@ xfs_bmap_exact_minlen_extent_alloc(
>  	args.resv = XFS_AG_RESV_NONE;
>  	args.datatype = ap->datatype;
>  
> -	error = xfs_alloc_vextent(&args);
> +	error = xfs_alloc_vextent_first_ag(&args, ap->blkno);
>  	if (error)
>  		return error;
>  
> @@ -3623,10 +3614,21 @@ xfs_bmap_btalloc_best_length(
>  	 * size to the largest space found.
>  	 */
>  	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
> -	    xfs_inode_is_filestream(ap->ip))
> +	    xfs_inode_is_filestream(ap->ip)) {
> +		/*
> +		 * If there is very little free space before we start a
> +		 * filestreams allocation, we're almost guaranteed to fail to
> +		 * find an AG with enough contiguous free space to succeed, so
> +		 * just go straight to the low space algorithm.
> +		 */
> +		if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> +			args->minlen = ap->minlen;
> +			goto critically_low_space;
> +		}
>  		error = xfs_bmap_btalloc_filestreams(ap, args, &blen);
> -	else
> +	} else {
>  		error = xfs_bmap_btalloc_select_lengths(ap, args, &blen);
> +	}
>  	if (error)
>  		return error;
>  
> @@ -3673,10 +3675,9 @@ xfs_bmap_btalloc_best_length(
>  	 * so they don't waste time on allocation modes that are unlikely to
>  	 * succeed.
>  	 */
> -	args->fsbno = 0;
> -	args->type = XFS_ALLOCTYPE_FIRST_AG;
> +critically_low_space:
>  	args->total = ap->minlen;
> -	error = xfs_alloc_vextent(args);
> +	error = xfs_alloc_vextent_first_ag(args, 0);
>  	if (error)
>  		return error;
>  	ap->tp->t_flags |= XFS_TRANS_LOWMODE;
> -- 
> 2.39.0
> 
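[Editorial aside: the minimum_agno clamp recurring through these helpers — `start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target))` — encodes the AGF lock-ordering rule described in the cover letter: once a transaction holds an AGF, later allocations in that transaction may only take AGFs at the same or higher AG number, otherwise two transactions walking in opposite directions could ABBA-deadlock. A standalone sketch of just that clamp, with invented names:]

```c
#include <assert.h>

#define NULLAGNUMBER ((unsigned int)-1)

/*
 * Clamp the scan start to the highest AG number this transaction has
 * already locked an AGF in (t_highest_agno in the patches), so that no
 * later allocation in the transaction needs a lower-numbered AGF.
 */
static unsigned int clamp_start_agno(unsigned int target_agno,
				     unsigned int t_highest_agno)
{
	unsigned int minimum_agno = 0;

	if (t_highest_agno != NULLAGNUMBER)
		minimum_agno = t_highest_agno;
	return minimum_agno > target_agno ? minimum_agno : target_agno;
}
```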


* Re: [PATCH 21/42] xfs: use xfs_alloc_vextent_start_bno() where appropriate
  2023-01-18 22:44 ` [PATCH 21/42] xfs: use xfs_alloc_vextent_start_bno() " Dave Chinner
@ 2023-02-01 22:51   ` Darrick J. Wong
  0 siblings, 0 replies; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 22:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:44AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Change obvious callers of single AG allocation to use
> xfs_alloc_vextent_start_bno(). Callers no longer need to specify
> XFS_ALLOCTYPE_START_BNO, and so the type can be driven inward and
> removed.
> 
> While doing this, also pass the allocation target fsb as a parameter
> rather than encoding it in args->fsbno.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c      | 24 ++++++++++---------
>  fs/xfs/libxfs/xfs_alloc.h      | 13 ++++++++--
>  fs/xfs/libxfs/xfs_bmap.c       | 43 ++++++++++++++++++++--------------
>  fs/xfs/libxfs/xfs_bmap_btree.c |  9 ++-----
>  4 files changed, 51 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 186ce3aee9e0..294f80d596d9 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3189,7 +3189,6 @@ xfs_alloc_vextent_check_args(
>  	struct xfs_mount	*mp = args->mp;
>  	xfs_agblock_t		agsize;
>  
> -	args->otype = args->type;
>  	args->agbno = NULLAGBLOCK;
>  
>  	/*
> @@ -3345,7 +3344,7 @@ xfs_alloc_vextent_iterate_ags(
>  		trace_xfs_alloc_vextent_loopfailed(args);
>  
>  		if (args->agno == start_agno &&
> -		    args->otype == XFS_ALLOCTYPE_START_BNO)
> +		    args->otype == XFS_ALLOCTYPE_NEAR_BNO)
>  			args->type = XFS_ALLOCTYPE_THIS_AG;
>  
>  		/*
> @@ -3373,7 +3372,7 @@ xfs_alloc_vextent_iterate_ags(
>  			}
>  
>  			flags = 0;
> -			if (args->otype == XFS_ALLOCTYPE_START_BNO) {
> +			if (args->otype == XFS_ALLOCTYPE_NEAR_BNO) {
>  				args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
>  				args->type = XFS_ALLOCTYPE_NEAR_BNO;
>  			}
> @@ -3396,18 +3395,22 @@ xfs_alloc_vextent_iterate_ags(
>   * otherwise will wrap back to the start AG and run a second blocking pass to
>   * the end of the filesystem.
>   */
> -static int
> +int
>  xfs_alloc_vextent_start_ag(
>  	struct xfs_alloc_arg	*args,
> -	xfs_agnumber_t		minimum_agno)
> +	xfs_rfsblock_t		target)

Same xfs_fsblock_t vs. xfs_rfsblock_t comment as the last patch.  The
rest looks ok though.

--D

>  {
>  	struct xfs_mount	*mp = args->mp;
> +	xfs_agnumber_t		minimum_agno = 0;
>  	xfs_agnumber_t		start_agno;
>  	xfs_agnumber_t		rotorstep = xfs_rotorstep;
>  	bool			bump_rotor = false;
>  	int			error;
>  
> -	error = xfs_alloc_vextent_check_args(args, args->fsbno);
> +	if (args->tp->t_highest_agno != NULLAGNUMBER)
> +		minimum_agno = args->tp->t_highest_agno;
> +
> +	error = xfs_alloc_vextent_check_args(args, target);
>  	if (error) {
>  		if (error == -ENOSPC)
>  			return 0;
> @@ -3416,14 +3419,15 @@ xfs_alloc_vextent_start_ag(
>  
>  	if ((args->datatype & XFS_ALLOC_INITIAL_USER_DATA) &&
>  	    xfs_is_inode32(mp)) {
> -		args->fsbno = XFS_AGB_TO_FSB(mp,
> +		target = XFS_AGB_TO_FSB(mp,
>  				((mp->m_agfrotor / rotorstep) %
>  				mp->m_sb.sb_agcount), 0);
>  		bump_rotor = 1;
>  	}
> -	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, args->fsbno));
> -	args->agbno = XFS_FSB_TO_AGBNO(mp, args->fsbno);
> +	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target));
> +	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
>  	args->type = XFS_ALLOCTYPE_NEAR_BNO;
> +	args->fsbno = target;
>  
>  	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno, start_agno,
>  			XFS_ALLOC_FLAG_TRYLOCK);
> @@ -3498,8 +3502,6 @@ xfs_alloc_vextent(
>  		error = xfs_alloc_vextent_this_ag(args);
>  		xfs_perag_put(args->pag);
>  		break;
> -	case XFS_ALLOCTYPE_START_BNO:
> -		return xfs_alloc_vextent_start_ag(args, minimum_agno);
>  	default:
>  		error = -EFSCORRUPTED;
>  		ASSERT(0);
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 73697dd3ca55..5487dff3d68a 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -20,7 +20,6 @@ unsigned int xfs_agfl_size(struct xfs_mount *mp);
>   * Freespace allocation types.  Argument to xfs_alloc_[v]extent.
>   */
>  #define XFS_ALLOCTYPE_THIS_AG	0x08	/* anywhere in this a.g. */
> -#define XFS_ALLOCTYPE_START_BNO	0x10	/* near this block else anywhere */
>  #define XFS_ALLOCTYPE_NEAR_BNO	0x20	/* in this a.g. and near this block */
>  #define XFS_ALLOCTYPE_THIS_BNO	0x40	/* at exactly this block */
>  
> @@ -29,7 +28,6 @@ typedef unsigned int xfs_alloctype_t;
>  
>  #define XFS_ALLOC_TYPES \
>  	{ XFS_ALLOCTYPE_THIS_AG,	"THIS_AG" }, \
> -	{ XFS_ALLOCTYPE_START_BNO,	"START_BNO" }, \
>  	{ XFS_ALLOCTYPE_NEAR_BNO,	"NEAR_BNO" }, \
>  	{ XFS_ALLOCTYPE_THIS_BNO,	"THIS_BNO" }
>  
> @@ -128,6 +126,17 @@ xfs_alloc_vextent(
>   */
>  int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
>  
> +/*
> + * Best effort full filesystem allocation scan.
> + *
> + * Locality aware allocation will be attempted in the initial AG, but on failure
> + * non-localised attempts will be made. The AGs are constrained by previous
> + * allocations in the current transaction. Two passes will be made - the first
> + * non-blocking, the second blocking.
> + */
> +int xfs_alloc_vextent_start_ag(struct xfs_alloc_arg *args,
> +		xfs_rfsblock_t target);
> +
>  /*
>   * Iterate from the AG indicated from args->fsbno through to the end of the
>   * filesystem attempting blocking allocation. This is for use in last
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index eb3dc8d5319b..aefcdf2bfd57 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -646,12 +646,11 @@ xfs_bmap_extents_to_btree(
>  	args.mp = mp;
>  	xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, whichfork);
>  
> -	args.type = XFS_ALLOCTYPE_START_BNO;
> -	args.fsbno = XFS_INO_TO_FSB(mp, ip->i_ino);
>  	args.minlen = args.maxlen = args.prod = 1;
>  	args.wasdel = wasdel;
>  	*logflagsp = 0;
> -	error = xfs_alloc_vextent(&args);
> +	error = xfs_alloc_vextent_start_ag(&args,
> +				XFS_INO_TO_FSB(mp, ip->i_ino));
>  	if (error)
>  		goto out_root_realloc;
>  
> @@ -792,15 +791,15 @@ xfs_bmap_local_to_extents(
>  	args.total = total;
>  	args.minlen = args.maxlen = args.prod = 1;
>  	xfs_rmap_ino_owner(&args.oinfo, ip->i_ino, whichfork, 0);
> +
>  	/*
>  	 * Allocate a block.  We know we need only one, since the
>  	 * file currently fits in an inode.
>  	 */
> -	args.fsbno = XFS_INO_TO_FSB(args.mp, ip->i_ino);
> -	args.type = XFS_ALLOCTYPE_START_BNO;
>  	args.total = total;
>  	args.minlen = args.maxlen = args.prod = 1;
> -	error = xfs_alloc_vextent(&args);
> +	error = xfs_alloc_vextent_start_ag(&args,
> +			XFS_INO_TO_FSB(args.mp, ip->i_ino));
>  	if (error)
>  		goto done;
>  
> @@ -3208,7 +3207,6 @@ xfs_bmap_btalloc_select_lengths(
>  	int			notinit = 0;
>  	int			error = 0;
>  
> -	args->type = XFS_ALLOCTYPE_START_BNO;
>  	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
>  		args->total = ap->minlen;
>  		args->minlen = ap->minlen;
> @@ -3500,7 +3498,8 @@ xfs_bmap_btalloc_at_eof(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args,
>  	xfs_extlen_t		blen,
> -	int			stripe_align)
> +	int			stripe_align,
> +	bool			ag_only)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	xfs_alloctype_t		atype;
> @@ -3565,7 +3564,10 @@ xfs_bmap_btalloc_at_eof(
>  		args->minalignslop = 0;
>  	}
>  
> -	error = xfs_alloc_vextent(args);
> +	if (ag_only)
> +		error = xfs_alloc_vextent(args);
> +	else
> +		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	if (error)
>  		return error;
>  
> @@ -3591,13 +3593,17 @@ xfs_bmap_btalloc_best_length(
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	xfs_extlen_t		blen = 0;
> +	bool			is_filestream = false;
>  	int			error;
>  
> +	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
> +	    xfs_inode_is_filestream(ap->ip))
> +		is_filestream = true;
> +
>  	/*
>  	 * Determine the initial block number we will target for allocation.
>  	 */
> -	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
> -	    xfs_inode_is_filestream(ap->ip)) {
> +	if (is_filestream) {
>  		xfs_agnumber_t	agno = xfs_filestream_lookup_ag(ap->ip);
>  		if (agno == NULLAGNUMBER)
>  			agno = 0;
> @@ -3613,8 +3619,7 @@ xfs_bmap_btalloc_best_length(
>  	 * the request.  If one isn't found, then adjust the minimum allocation
>  	 * size to the largest space found.
>  	 */
> -	if ((ap->datatype & XFS_ALLOC_USERDATA) &&
> -	    xfs_inode_is_filestream(ap->ip)) {
> +	if (is_filestream) {
>  		/*
>  		 * If there is very little free space before we start a
>  		 * filestreams allocation, we're almost guaranteed to fail to
> @@ -3639,14 +3644,18 @@ xfs_bmap_btalloc_best_length(
>  	 * trying.
>  	 */
>  	if (ap->aeof && !(ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
> -		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align);
> +		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
> +				is_filestream);
>  		if (error)
>  			return error;
>  		if (args->fsbno != NULLFSBLOCK)
>  			return 0;
>  	}
>  
> -	error = xfs_alloc_vextent(args);
> +	if (is_filestream)
> +		error = xfs_alloc_vextent(args);
> +	else
> +		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	if (error)
>  		return error;
>  	if (args->fsbno != NULLFSBLOCK)
> @@ -3658,9 +3667,7 @@ xfs_bmap_btalloc_best_length(
>  	 */
>  	if (args->minlen > ap->minlen) {
>  		args->minlen = ap->minlen;
> -		args->type = XFS_ALLOCTYPE_START_BNO;
> -		args->fsbno = ap->blkno;
> -		error = xfs_alloc_vextent(args);
> +		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  		if (error)
>  			return error;
>  	}
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> index d42c1a1da1fc..b8ad95050c9b 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -214,9 +214,6 @@ xfs_bmbt_alloc_block(
>  	if (!args.wasdel && args.tp->t_blk_res == 0)
>  		return -ENOSPC;
>  
> -	args.fsbno = be64_to_cpu(start->l);
> -	args.type = XFS_ALLOCTYPE_START_BNO;
> -
>  	/*
>  	 * If we are coming here from something like unwritten extent
>  	 * conversion, there has been no data extent allocation already done, so
> @@ -227,7 +224,7 @@ xfs_bmbt_alloc_block(
>  		args.minleft = xfs_bmapi_minleft(cur->bc_tp, cur->bc_ino.ip,
>  					cur->bc_ino.whichfork);
>  
> -	error = xfs_alloc_vextent(&args);
> +	error = xfs_alloc_vextent_start_ag(&args, be64_to_cpu(start->l));
>  	if (error)
>  		return error;
>  
> @@ -237,10 +234,8 @@ xfs_bmbt_alloc_block(
>  		 * a full btree split.  Try again and if
>  		 * successful activate the lowspace algorithm.
>  		 */
> -		args.fsbno = 0;
>  		args.minleft = 0;
> -		args.type = XFS_ALLOCTYPE_START_BNO;
> -		error = xfs_alloc_vextent(&args);
> +		error = xfs_alloc_vextent_start_ag(&args, 0);
>  		if (error)
>  			return error;
>  		cur->bc_tp->t_flags |= XFS_TRANS_LOWMODE;
> -- 
> 2.39.0
> 


* Re: [PATCH 22/42] xfs: introduce xfs_alloc_vextent_near_bno()
  2023-01-18 22:44 ` [PATCH 22/42] xfs: introduce xfs_alloc_vextent_near_bno() Dave Chinner
@ 2023-02-01 22:52   ` Darrick J. Wong
  0 siblings, 0 replies; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 22:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:45AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The remaining callers of xfs_alloc_vextent() are all doing NEAR_BNO
> allocations. We can replace that function with a new
> xfs_alloc_vextent_near_bno() function that does this explicitly.
> 
> We also multiplex NEAR_BNO allocations through
> xfs_alloc_vextent_this_ag via args->type. Replace all of these with
> direct calls to xfs_alloc_vextent_near_bno(), too.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c          | 50 ++++++++++++++++++------------
>  fs/xfs/libxfs/xfs_alloc.h          | 14 ++++-----
>  fs/xfs/libxfs/xfs_bmap.c           |  6 ++--
>  fs/xfs/libxfs/xfs_ialloc.c         | 27 ++++++----------
>  fs/xfs/libxfs/xfs_ialloc_btree.c   |  5 ++-
>  fs/xfs/libxfs/xfs_refcount_btree.c |  7 ++---
>  6 files changed, 55 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 294f80d596d9..485a73eab9d9 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3479,35 +3479,47 @@ int
>  }
>  
>  /*
> - * Allocate an extent (variable-size).
> - * Depending on the allocation type, we either look in a single allocation
> - * group or loop over the allocation groups to find the result.
> + * Allocate an extent as close to the target as possible. If there are not
> + * viable candidates in the AG, then fail the allocation.
>   */
>  int
> -xfs_alloc_vextent(
> -	struct xfs_alloc_arg	*args)
> +xfs_alloc_vextent_near_bno(
> +	struct xfs_alloc_arg	*args,
> +	xfs_rfsblock_t		target)

xfs_rfsblock_t vs. xfs_fsblock_t here too...

--D

>  {
> +	struct xfs_mount	*mp = args->mp;
> +	bool			need_pag = !args->pag;
>  	xfs_agnumber_t		minimum_agno = 0;
>  	int			error;
>  
>  	if (args->tp->t_highest_agno != NULLAGNUMBER)
>  		minimum_agno = args->tp->t_highest_agno;
>  
> -	switch (args->type) {
> -	case XFS_ALLOCTYPE_THIS_AG:
> -	case XFS_ALLOCTYPE_NEAR_BNO:
> -	case XFS_ALLOCTYPE_THIS_BNO:
> -		args->pag = xfs_perag_get(args->mp,
> -				XFS_FSB_TO_AGNO(args->mp, args->fsbno));
> -		error = xfs_alloc_vextent_this_ag(args);
> -		xfs_perag_put(args->pag);
> -		break;
> -	default:
> -		error = -EFSCORRUPTED;
> -		ASSERT(0);
> -		break;
> +	error = xfs_alloc_vextent_check_args(args, target);
> +	if (error) {
> +		if (error == -ENOSPC)
> +			return 0;
> +		return error;
>  	}
> -	return error;
> +
> +	args->agno = XFS_FSB_TO_AGNO(mp, target);
> +	if (minimum_agno > args->agno) {
> +		trace_xfs_alloc_vextent_skip_deadlock(args);
> +		return 0;
> +	}
> +
> +	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
> +	args->type = XFS_ALLOCTYPE_NEAR_BNO;
> +	if (need_pag)
> +		args->pag = xfs_perag_get(args->mp, args->agno);
> +	error = xfs_alloc_ag_vextent(args);
> +	if (need_pag)
> +		xfs_perag_put(args->pag);
> +	if (error)
> +		return error;
> +
> +	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
> +	return 0;
>  }
>  
>  /* Ensure that the freelist is at full capacity. */
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 5487dff3d68a..f38a2f8e20fb 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -113,19 +113,19 @@ xfs_alloc_log_agf(
>  	struct xfs_buf	*bp,	/* buffer for a.g. freelist header */
>  	uint32_t	fields);/* mask of fields to be logged (XFS_AGF_...) */
>  
> -/*
> - * Allocate an extent (variable-size).
> - */
> -int				/* error */
> -xfs_alloc_vextent(
> -	xfs_alloc_arg_t	*args);	/* allocation argument structure */
> -
>  /*
>   * Allocate an extent in the specific AG defined by args->fsbno. If there is no
>   * space in that AG, then the allocation will fail.
>   */
>  int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
>  
> +/*
> + * Allocate an extent as close to the target as possible. If there are not
> + * viable candidates in the AG, then fail the allocation.
> + */
> +int xfs_alloc_vextent_near_bno(struct xfs_alloc_arg *args,
> +		xfs_rfsblock_t target);
> +
>  /*
>   * Best effort full filesystem allocation scan.
>   *
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index aefcdf2bfd57..4446b035eed5 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3246,7 +3246,6 @@ xfs_bmap_btalloc_filestreams(
>  	int			notinit = 0;
>  	int			error;
>  
> -	args->type = XFS_ALLOCTYPE_NEAR_BNO;
>  	args->total = ap->total;
>  
>  	start_agno = XFS_FSB_TO_AGNO(mp, ap->blkno);
> @@ -3565,7 +3564,7 @@ xfs_bmap_btalloc_at_eof(
>  	}
>  
>  	if (ag_only)
> -		error = xfs_alloc_vextent(args);
> +		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
>  	else
>  		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	if (error)
> @@ -3612,7 +3611,6 @@ xfs_bmap_btalloc_best_length(
>  		ap->blkno = XFS_INO_TO_FSB(mp, ap->ip->i_ino);
>  	}
>  	xfs_bmap_adjacent(ap);
> -	args->fsbno = ap->blkno;
>  
>  	/*
>  	 * Search for an allocation group with a single extent large enough for
> @@ -3653,7 +3651,7 @@ xfs_bmap_btalloc_best_length(
>  	}
>  
>  	if (is_filestream)
> -		error = xfs_alloc_vextent(args);
> +		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
>  	else
>  		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
>  	if (error)
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 2f3e47cb9332..daa6f7055bba 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -717,23 +717,17 @@ xfs_ialloc_ag_alloc(
>  			isaligned = 1;
>  		} else
>  			args.alignment = igeo->cluster_align;
> -		/*
> -		 * Need to figure out where to allocate the inode blocks.
> -		 * Ideally they should be spaced out through the a.g.
> -		 * For now, just allocate blocks up front.
> -		 */
> -		args.agbno = be32_to_cpu(agi->agi_root);
> -		args.fsbno = XFS_AGB_TO_FSB(args.mp, pag->pag_agno, args.agbno);
>  		/*
>  		 * Allocate a fixed-size extent of inodes.
>  		 */
> -		args.type = XFS_ALLOCTYPE_NEAR_BNO;
>  		args.prod = 1;
>  		/*
>  		 * Allow space for the inode btree to split.
>  		 */
>  		args.minleft = igeo->inobt_maxlevels;
> -		error = xfs_alloc_vextent_this_ag(&args);
> +		error = xfs_alloc_vextent_near_bno(&args,
> +				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
> +						be32_to_cpu(agi->agi_root)));
>  		if (error)
>  			return error;
>  	}
> @@ -743,11 +737,11 @@ xfs_ialloc_ag_alloc(
>  	 * alignment.
>  	 */
>  	if (isaligned && args.fsbno == NULLFSBLOCK) {
> -		args.type = XFS_ALLOCTYPE_NEAR_BNO;
> -		args.agbno = be32_to_cpu(agi->agi_root);
> -		args.fsbno = XFS_AGB_TO_FSB(args.mp, pag->pag_agno, args.agbno);
>  		args.alignment = igeo->cluster_align;
> -		if ((error = xfs_alloc_vextent(&args)))
> +		error = xfs_alloc_vextent_near_bno(&args,
> +				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
> +						be32_to_cpu(agi->agi_root)));
> +		if (error)
>  			return error;
>  	}
>  
> @@ -759,9 +753,6 @@ xfs_ialloc_ag_alloc(
>  	    igeo->ialloc_min_blks < igeo->ialloc_blks &&
>  	    args.fsbno == NULLFSBLOCK) {
>  sparse_alloc:
> -		args.type = XFS_ALLOCTYPE_NEAR_BNO;
> -		args.agbno = be32_to_cpu(agi->agi_root);
> -		args.fsbno = XFS_AGB_TO_FSB(args.mp, pag->pag_agno, args.agbno);
>  		args.alignment = args.mp->m_sb.sb_spino_align;
>  		args.prod = 1;
>  
> @@ -783,7 +774,9 @@ xfs_ialloc_ag_alloc(
>  					    args.mp->m_sb.sb_inoalignmt) -
>  				 igeo->ialloc_blks;
>  
> -		error = xfs_alloc_vextent_this_ag(&args);
> +		error = xfs_alloc_vextent_near_bno(&args,
> +				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
> +						be32_to_cpu(agi->agi_root)));
>  		if (error)
>  			return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> index fa6cd2502970..9b28211d5a4c 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> @@ -105,14 +105,13 @@ __xfs_inobt_alloc_block(
>  	args.mp = cur->bc_mp;
>  	args.pag = cur->bc_ag.pag;
>  	args.oinfo = XFS_RMAP_OINFO_INOBT;
> -	args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_ag.pag->pag_agno, sbno);
>  	args.minlen = 1;
>  	args.maxlen = 1;
>  	args.prod = 1;
> -	args.type = XFS_ALLOCTYPE_NEAR_BNO;
>  	args.resv = resv;
>  
> -	error = xfs_alloc_vextent_this_ag(&args);
> +	error = xfs_alloc_vextent_near_bno(&args,
> +			XFS_AGB_TO_FSB(args.mp, args.pag->pag_agno, sbno));
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
> index a980fb18bde2..f3b860970b26 100644
> --- a/fs/xfs/libxfs/xfs_refcount_btree.c
> +++ b/fs/xfs/libxfs/xfs_refcount_btree.c
> @@ -68,14 +68,13 @@ xfs_refcountbt_alloc_block(
>  	args.tp = cur->bc_tp;
>  	args.mp = cur->bc_mp;
>  	args.pag = cur->bc_ag.pag;
> -	args.type = XFS_ALLOCTYPE_NEAR_BNO;
> -	args.fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno,
> -			xfs_refc_block(args.mp));
>  	args.oinfo = XFS_RMAP_OINFO_REFC;
>  	args.minlen = args.maxlen = args.prod = 1;
>  	args.resv = XFS_AG_RESV_METADATA;
>  
> -	error = xfs_alloc_vextent_this_ag(&args);
> +	error = xfs_alloc_vextent_near_bno(&args,
> +			XFS_AGB_TO_FSB(args.mp, args.pag->pag_agno,
> +					xfs_refc_block(args.mp)));
>  	if (error)
>  		goto out_error;
>  	trace_xfs_refcountbt_alloc_block(cur->bc_mp, cur->bc_ag.pag->pag_agno,
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 23/42] xfs: introduce xfs_alloc_vextent_exact_bno()
  2023-01-18 22:44 ` [PATCH 23/42] xfs: introduce xfs_alloc_vextent_exact_bno() Dave Chinner
@ 2023-02-01 23:00   ` Darrick J. Wong
  0 siblings, 0 replies; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 23:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:46AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Two of the callers to xfs_alloc_vextent_this_ag() actually want
> exact block number allocation, not anywhere-in-ag allocation. Split
> this out from _this_ag() as a first class citizen so no external
> extent allocation code needs to care about args->type anymore.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.c     |  6 ++--
>  fs/xfs/libxfs/xfs_alloc.c  | 65 ++++++++++++++++++++++++++++++++------
>  fs/xfs/libxfs/xfs_alloc.h  | 13 ++++++--
>  fs/xfs/libxfs/xfs_bmap.c   |  6 ++--
>  fs/xfs/libxfs/xfs_ialloc.c |  6 ++--
>  fs/xfs/scrub/repair.c      |  4 +--
>  6 files changed, 73 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 053d77a283f7..86696a1c6891 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -888,7 +888,6 @@ xfs_ag_shrink_space(
>  		.tp	= *tpp,
>  		.mp	= mp,
>  		.pag	= pag,
> -		.type	= XFS_ALLOCTYPE_THIS_BNO,
>  		.minlen = delta,
>  		.maxlen = delta,
>  		.oinfo	= XFS_RMAP_OINFO_SKIP_UPDATE,
> @@ -920,8 +919,6 @@ xfs_ag_shrink_space(
>  	if (delta >= aglen)
>  		return -EINVAL;
>  
> -	args.fsbno = XFS_AGB_TO_FSB(mp, pag->pag_agno, aglen - delta);
> -
>  	/*
>  	 * Make sure that the last inode cluster cannot overlap with the new
>  	 * end of the AG, even if it's sparse.
> @@ -939,7 +936,8 @@ xfs_ag_shrink_space(
>  		return error;
>  
>  	/* internal log shouldn't also show up in the free space btrees */
> -	error = xfs_alloc_vextent_this_ag(&args);
> +	error = xfs_alloc_vextent_exact_bno(&args,
> +			XFS_AGB_TO_FSB(mp, pag->pag_agno, aglen - delta));
>  	if (!error && args.agbno == NULLAGBLOCK)
>  		error = -ENOSPC;
>  
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 485a73eab9d9..b810a94aad70 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3272,28 +3272,34 @@ xfs_alloc_vextent_set_fsbno(
>   */
>  int
>  xfs_alloc_vextent_this_ag(
> -	struct xfs_alloc_arg	*args)
> +	struct xfs_alloc_arg	*args,
> +	xfs_agnumber_t		agno)
>  {
>  	struct xfs_mount	*mp = args->mp;
>  	xfs_agnumber_t		minimum_agno = 0;
> +	xfs_rfsblock_t		target = XFS_AGB_TO_FSB(mp, agno, 0);

Same xfs_rfsblock_t vs. xfs_fsblock_t comment here too.

These conversions look like a good improvement though -- it's very
helpful that we can now look at an xfs_alloc_vextent_* call and know
exactly which behavior we're getting.

--D

>  	int			error;
>  
>  	if (args->tp->t_highest_agno != NULLAGNUMBER)
>  		minimum_agno = args->tp->t_highest_agno;
>  
> -	error = xfs_alloc_vextent_check_args(args, args->fsbno);
> +	if (minimum_agno > agno) {
> +		trace_xfs_alloc_vextent_skip_deadlock(args);
> +		args->fsbno = NULLFSBLOCK;
> +		return 0;
> +	}
> +
> +	error = xfs_alloc_vextent_check_args(args, target);
>  	if (error) {
>  		if (error == -ENOSPC)
>  			return 0;
>  		return error;
>  	}
>  
> -	args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
> -	if (minimum_agno > args->agno) {
> -		trace_xfs_alloc_vextent_skip_deadlock(args);
> -		args->fsbno = NULLFSBLOCK;
> -		return 0;
> -	}
> +	args->agno = agno;
> +	args->agbno = 0;
> +	args->fsbno = target;
> +	args->type = XFS_ALLOCTYPE_THIS_AG;
>  
>  	error = xfs_alloc_ag_vextent(args);
>  	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
> @@ -3450,7 +3456,7 @@ xfs_alloc_vextent_start_ag(
>   * pass, so will not recurse into AGs lower than indicated by fsbno.
>   */
>  int
> - xfs_alloc_vextent_first_ag(
> +xfs_alloc_vextent_first_ag(
>  	struct xfs_alloc_arg	*args,
>  	xfs_rfsblock_t		target)
>   {
> @@ -3472,12 +3478,51 @@ int
>  	start_agno = max(minimum_agno, XFS_FSB_TO_AGNO(mp, target));
>  	args->type = XFS_ALLOCTYPE_THIS_AG;
>  	args->fsbno = target;
> -	error =  xfs_alloc_vextent_iterate_ags(args, minimum_agno,
> +	error = xfs_alloc_vextent_iterate_ags(args, minimum_agno,
>  			start_agno, 0);
>  	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
>  	return error;
>  }
>  
> +/*
> + * Allocate within a single AG only.
> + */
> +int
> +xfs_alloc_vextent_exact_bno(
> +	struct xfs_alloc_arg	*args,
> +	xfs_rfsblock_t		target)
> +{
> +	struct xfs_mount	*mp = args->mp;
> +	xfs_agnumber_t		minimum_agno = 0;
> +	int			error;
> +
> +	if (args->tp->t_highest_agno != NULLAGNUMBER)
> +		minimum_agno = args->tp->t_highest_agno;
> +
> +	error = xfs_alloc_vextent_check_args(args, target);
> +	if (error) {
> +		if (error == -ENOSPC)
> +			return 0;
> +		return error;
> +	}
> +
> +	args->agno = XFS_FSB_TO_AGNO(mp, target);
> +	if (minimum_agno > args->agno) {
> +		trace_xfs_alloc_vextent_skip_deadlock(args);
> +		return 0;
> +	}
> +
> +	args->agbno = XFS_FSB_TO_AGBNO(mp, target);
> +	args->fsbno = target;
> +	args->type = XFS_ALLOCTYPE_THIS_BNO;
> +	error = xfs_alloc_ag_vextent(args);
> +	if (error)
> +		return error;
> +
> +	xfs_alloc_vextent_set_fsbno(args, minimum_agno);
> +	return 0;
> +}
> +
>  /*
>   * Allocate an extent as close to the target as possible. If there are not
>   * viable candidates in the AG, then fail the allocation.
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index f38a2f8e20fb..106b4deb1110 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -114,10 +114,10 @@ xfs_alloc_log_agf(
>  	uint32_t	fields);/* mask of fields to be logged (XFS_AGF_...) */
>  
>  /*
> - * Allocate an extent in the specific AG defined by args->fsbno. If there is no
> - * space in that AG, then the allocation will fail.
> + * Allocate an extent anywhere in the specific AG given. If there is no
> + * space matching the requirements in that AG, then the allocation will fail.
>   */
> -int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
> +int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args, xfs_agnumber_t agno);
>  
>  /*
>   * Allocate an extent as close to the target as possible. If there are not
> @@ -126,6 +126,13 @@ int xfs_alloc_vextent_this_ag(struct xfs_alloc_arg *args);
>  int xfs_alloc_vextent_near_bno(struct xfs_alloc_arg *args,
>  		xfs_rfsblock_t target);
>  
> +/*
> + * Allocate an extent exactly at the target given. If this is not possible
> + * then the allocation fails.
> + */
> +int xfs_alloc_vextent_exact_bno(struct xfs_alloc_arg *args,
> +		xfs_rfsblock_t target);
> +
>  /*
>   * Best effort full filesystem allocation scan.
>   *
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 4446b035eed5..c9902df16e25 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3514,7 +3514,6 @@ xfs_bmap_btalloc_at_eof(
>  		xfs_extlen_t	nextminlen = 0;
>  
>  		atype = args->type;
> -		args->type = XFS_ALLOCTYPE_THIS_BNO;
>  		args->alignment = 1;
>  
>  		/*
> @@ -3532,8 +3531,8 @@ xfs_bmap_btalloc_at_eof(
>  		else
>  			args->minalignslop = 0;
>  
> -		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, args->fsbno));
> -		error = xfs_alloc_vextent_this_ag(args);
> +		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> +		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
>  		xfs_perag_put(args->pag);
>  		if (error)
>  			return error;
> @@ -3546,7 +3545,6 @@ xfs_bmap_btalloc_at_eof(
>  		 */
>  		args->pag = NULL;
>  		args->type = atype;
> -		args->fsbno = ap->blkno;
>  		args->alignment = stripe_align;
>  		args->minlen = nextminlen;
>  		args->minalignslop = 0;
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index daa6f7055bba..d2525f0cc6cd 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -662,8 +662,6 @@ xfs_ialloc_ag_alloc(
>  		goto sparse_alloc;
>  	if (likely(newino != NULLAGINO &&
>  		  (args.agbno < be32_to_cpu(agi->agi_length)))) {
> -		args.fsbno = XFS_AGB_TO_FSB(args.mp, pag->pag_agno, args.agbno);
> -		args.type = XFS_ALLOCTYPE_THIS_BNO;
>  		args.prod = 1;
>  
>  		/*
> @@ -684,7 +682,9 @@ xfs_ialloc_ag_alloc(
>  
>  		/* Allow space for the inode btree to split. */
>  		args.minleft = igeo->inobt_maxlevels;
> -		error = xfs_alloc_vextent_this_ag(&args);
> +		error = xfs_alloc_vextent_exact_bno(&args,
> +				XFS_AGB_TO_FSB(args.mp, pag->pag_agno,
> +						args.agbno));
>  		if (error)
>  			return error;
>  
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index 5f4b50aac4bb..1b71174ec0d6 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -328,14 +328,12 @@ xrep_alloc_ag_block(
>  	args.mp = sc->mp;
>  	args.pag = sc->sa.pag;
>  	args.oinfo = *oinfo;
> -	args.fsbno = XFS_AGB_TO_FSB(args.mp, sc->sa.pag->pag_agno, 0);
>  	args.minlen = 1;
>  	args.maxlen = 1;
>  	args.prod = 1;
> -	args.type = XFS_ALLOCTYPE_THIS_AG;
>  	args.resv = resv;
>  
> -	error = xfs_alloc_vextent_this_ag(&args);
> +	error = xfs_alloc_vextent_this_ag(&args, sc->sa.pag->pag_agno);
>  	if (error)
>  		return error;
>  	if (args.fsbno == NULLFSBLOCK)
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 28/42] xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walker
  2023-01-18 22:44 ` [PATCH 28/42] xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walker Dave Chinner
@ 2023-02-01 23:13   ` Darrick J. Wong
  0 siblings, 0 replies; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 23:13 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:51AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that the AG iteration code in the core allocation code has been
> cleaned up, we can easily convert it to use a for_each_perag..()
> variant that uses active references and skips AGs that it can't get
> active references on.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ag.h    | 22 ++++++---
>  fs/xfs/libxfs/xfs_alloc.c | 98 ++++++++++++++++++---------------------
>  2 files changed, 60 insertions(+), 60 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index 8f43b91d4cf3..5e18536dfdce 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -253,6 +253,7 @@ xfs_perag_next_wrap(
>  	struct xfs_perag	*pag,
>  	xfs_agnumber_t		*agno,
>  	xfs_agnumber_t		stop_agno,
> +	xfs_agnumber_t		restart_agno,
>  	xfs_agnumber_t		wrap_agno)
>  {
>  	struct xfs_mount	*mp = pag->pag_mount;
> @@ -260,10 +261,11 @@ xfs_perag_next_wrap(
>  	*agno = pag->pag_agno + 1;
>  	xfs_perag_rele(pag);
>  	while (*agno != stop_agno) {
> -		if (*agno >= wrap_agno)
> -			*agno = 0;
> -		if (*agno == stop_agno)
> -			break;
> +		if (*agno >= wrap_agno) {
> +			if (restart_agno >= stop_agno)
> +				break;
> +			*agno = restart_agno;
> +		}
>  
>  		pag = xfs_perag_grab(mp, *agno);
>  		if (pag)
> @@ -274,14 +276,20 @@ xfs_perag_next_wrap(
>  }
>  
>  /*
> - * Iterate all AGs from start_agno through wrap_agno, then 0 through
> + * Iterate all AGs from start_agno through wrap_agno, then restart_agno through
>   * (start_agno - 1).
>   */
> -#define for_each_perag_wrap_at(mp, start_agno, wrap_agno, agno, pag) \
> +#define for_each_perag_wrap_range(mp, start_agno, restart_agno, wrap_agno, agno, pag) \
>  	for ((agno) = (start_agno), (pag) = xfs_perag_grab((mp), (agno)); \
>  		(pag) != NULL; \
>  		(pag) = xfs_perag_next_wrap((pag), &(agno), (start_agno), \
> -				(wrap_agno)))
> +				(restart_agno), (wrap_agno)))
> +/*
> + * Iterate all AGs from start_agno through wrap_agno, then 0 through
> + * (start_agno - 1).
> + */
> +#define for_each_perag_wrap_at(mp, start_agno, wrap_agno, agno, pag) \
> +	for_each_perag_wrap_range((mp), (start_agno), 0, (wrap_agno), (agno), (pag))
>  
>  /*
>   * Iterate all AGs from start_agno through to the end of the filesystem, then 0
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 43a054002da3..39f3e76efcab 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3156,6 +3156,7 @@ xfs_alloc_vextent_prepare_ag(
>  	if (need_pag)
>  		args->pag = xfs_perag_get(args->mp, args->agno);
>  
> +	args->agbp = NULL;
>  	error = xfs_alloc_fix_freelist(args, 0);
>  	if (error) {
>  		trace_xfs_alloc_vextent_nofix(args);
> @@ -3255,8 +3256,8 @@ xfs_alloc_vextent_finish(
>  	XFS_STATS_ADD(mp, xs_allocb, args->len);
>  
>  out_drop_perag:
> -	if (drop_perag) {
> -		xfs_perag_put(args->pag);
> +	if (drop_perag && args->pag) {
> +		xfs_perag_rele(args->pag);
>  		args->pag = NULL;
>  	}
>  	return error;
> @@ -3304,6 +3305,10 @@ xfs_alloc_vextent_this_ag(
>   * we attempt to allocation in as there is no locality optimisation possible for
>   * those allocations.
>   *
> + * On return, args->pag may be left referenced if we finish before the "all
> + * failed" return point. The allocation finish still needs the perag, and
> + * so the caller will release it once they've finished the allocation.
> + *
>   * When we wrap the AG iteration at the end of the filesystem, we have to be
>   * careful not to wrap into AGs below ones we already have locked in the
>   * transaction if we are doing a blocking iteration. This will result in an
> @@ -3318,72 +3323,59 @@ xfs_alloc_vextent_iterate_ags(
>  	uint32_t		flags)
>  {
>  	struct xfs_mount	*mp = args->mp;
> +	xfs_agnumber_t		agno;
>  	int			error = 0;
>  
> -	ASSERT(start_agno >= minimum_agno);
> +restart:
> +	for_each_perag_wrap_range(mp, start_agno, minimum_agno,
> +			mp->m_sb.sb_agcount, agno, args->pag) {
> +		args->agno = agno;
> +		trace_printk("sag %u minag %u agno %u pag %u, agbno %u, agcnt %u",
> +			start_agno, minimum_agno, agno, args->pag->pag_agno,
> +			target_agbno, mp->m_sb.sb_agcount);

Please remove the debugging statement or (if it's useful) convert this
to a static tracepoint.

--D

>  
> -	/*
> -	 * Loop over allocation groups twice; first time with
> -	 * trylock set, second time without.
> -	 */
> -	args->agno = start_agno;
> -	for (;;) {
> -		args->pag = xfs_perag_get(mp, args->agno);
>  		error = xfs_alloc_vextent_prepare_ag(args);
>  		if (error)
>  			break;
> -
> -		if (args->agbp) {
> -			/*
> -			 * Allocation is supposed to succeed now, so break out
> -			 * of the loop regardless of whether we succeed or not.
> -			 */
> -			if (args->agno == start_agno && target_agbno) {
> -				args->agbno = target_agbno;
> -				error = xfs_alloc_ag_vextent_near(args);
> -			} else {
> -				args->agbno = 0;
> -				error = xfs_alloc_ag_vextent_size(args);
> -			}
> -			break;
> +		if (!args->agbp) {
> +			trace_xfs_alloc_vextent_loopfailed(args);
> +			continue;
>  		}
>  
> -		trace_xfs_alloc_vextent_loopfailed(args);
> -
>  		/*
> -		 * If we are try-locking, we can't deadlock on AGF locks so we
> -		 * can wrap all the way back to the first AG. Otherwise, wrap
> -		 * back to the start AG so we can't deadlock and let the end of
> -		 * scan handler decide what to do next.
> +		 * Allocation is supposed to succeed now, so break out of the
> +		 * loop regardless of whether we succeed or not.
>  		 */
> -		if (++(args->agno) == mp->m_sb.sb_agcount) {
> -			if (flags & XFS_ALLOC_FLAG_TRYLOCK)
> -				args->agno = 0;
> -			else
> -				args->agno = minimum_agno;
> -		}
> -
> -		/*
> -		 * Reached the starting a.g., must either be done
> -		 * or switch to non-trylock mode.
> -		 */
> -		if (args->agno == start_agno) {
> -			if (flags == 0) {
> -				args->agbno = NULLAGBLOCK;
> -				trace_xfs_alloc_vextent_allfailed(args);
> -				break;
> -			}
> +		if (args->agno == start_agno && target_agbno) {
>  			args->agbno = target_agbno;
> -			flags = 0;
> +			error = xfs_alloc_ag_vextent_near(args);
> +		} else {
> +			args->agbno = 0;
> +			error = xfs_alloc_ag_vextent_size(args);
>  		}
> -		xfs_perag_put(args->pag);
> +		break;
> +	}
> +	if (error) {
> +		xfs_perag_rele(args->pag);
>  		args->pag = NULL;
> +		return error;
>  	}
> +	if (args->agbp)
> +		return 0;
> +
>  	/*
> -	 * The perag is left referenced in args for the caller to clean
> -	 * up after they've finished the allocation.
> +	 * We didn't find an AG we can allocate from. If we were given
> +	 * constraining flags by the caller, drop them and retry the allocation
> +	 * without any constraints being set.
>  	 */
> -	return error;
> +	if (flags) {
> +		flags = 0;
> +		goto restart;
> +	}
> +
> +	ASSERT(args->pag == NULL);
> +	trace_xfs_alloc_vextent_allfailed(args);
> +	return 0;
>  }
>  
>  /*
> @@ -3524,7 +3516,7 @@ xfs_alloc_vextent_near_bno(
>  	}
>  
>  	if (needs_perag)
> -		args->pag = xfs_perag_get(mp, args->agno);
> +		args->pag = xfs_perag_grab(mp, args->agno);
>  
>  	error = xfs_alloc_vextent_prepare_ag(args);
>  	if (!error && args->agbp)
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 29/42] xfs: convert trim to use for_each_perag_range
  2023-01-18 22:44 ` [PATCH 29/42] xfs: convert trim to use for_each_perag_range Dave Chinner
@ 2023-02-01 23:15   ` Darrick J. Wong
  2023-02-06 23:19     ` Dave Chinner
  0 siblings, 1 reply; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-01 23:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:52AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> To convert it to using active perag references and hence make it
> shrink safe.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_discard.c | 50 ++++++++++++++++++++------------------------
>  1 file changed, 23 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index bfc829c07f03..afc4c78b9eed 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -21,23 +21,20 @@
>  
>  STATIC int
>  xfs_trim_extents(
> -	struct xfs_mount	*mp,
> -	xfs_agnumber_t		agno,
> +	struct xfs_perag	*pag,
>  	xfs_daddr_t		start,
>  	xfs_daddr_t		end,
>  	xfs_daddr_t		minlen,
>  	uint64_t		*blocks_trimmed)
>  {
> +	struct xfs_mount	*mp = pag->pag_mount;
>  	struct block_device	*bdev = mp->m_ddev_targp->bt_bdev;
>  	struct xfs_btree_cur	*cur;
>  	struct xfs_buf		*agbp;
>  	struct xfs_agf		*agf;
> -	struct xfs_perag	*pag;
>  	int			error;
>  	int			i;
>  
> -	pag = xfs_perag_get(mp, agno);
> -
>  	/*
>  	 * Force out the log.  This means any transactions that might have freed

This is a tangent, but one thing I've wondered is if it's really
necessary to force the log for *every* AG that we want to trim?  Even if
we've just come from trimming the previous AG?

Looks good otherwise,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


>  	 * space before we take the AGF buffer lock are now on disk, and the
> @@ -47,7 +44,7 @@ xfs_trim_extents(
>  
>  	error = xfs_alloc_read_agf(pag, NULL, 0, &agbp);
>  	if (error)
> -		goto out_put_perag;
> +		return error;
>  	agf = agbp->b_addr;
>  
>  	cur = xfs_allocbt_init_cursor(mp, NULL, agbp, pag, XFS_BTNUM_CNT);
> @@ -71,10 +68,10 @@ xfs_trim_extents(
>  
>  		error = xfs_alloc_get_rec(cur, &fbno, &flen, &i);
>  		if (error)
> -			goto out_del_cursor;
> +			break;
>  		if (XFS_IS_CORRUPT(mp, i != 1)) {
>  			error = -EFSCORRUPTED;
> -			goto out_del_cursor;
> +			break;
>  		}
>  		ASSERT(flen <= be32_to_cpu(agf->agf_longest));
>  
> @@ -83,15 +80,15 @@ xfs_trim_extents(
>  		 * the format the range/len variables are supplied in by
>  		 * userspace.
>  		 */
> -		dbno = XFS_AGB_TO_DADDR(mp, agno, fbno);
> +		dbno = XFS_AGB_TO_DADDR(mp, pag->pag_agno, fbno);
>  		dlen = XFS_FSB_TO_BB(mp, flen);
>  
>  		/*
>  		 * Too small?  Give up.
>  		 */
>  		if (dlen < minlen) {
> -			trace_xfs_discard_toosmall(mp, agno, fbno, flen);
> -			goto out_del_cursor;
> +			trace_xfs_discard_toosmall(mp, pag->pag_agno, fbno, flen);
> +			break;
>  		}
>  
>  		/*
> @@ -100,7 +97,7 @@ xfs_trim_extents(
>  		 * down partially overlapping ranges for now.
>  		 */
>  		if (dbno + dlen < start || dbno > end) {
> -			trace_xfs_discard_exclude(mp, agno, fbno, flen);
> +			trace_xfs_discard_exclude(mp, pag->pag_agno, fbno, flen);
>  			goto next_extent;
>  		}
>  
> @@ -109,32 +106,30 @@ xfs_trim_extents(
>  		 * discard and try again the next time.
>  		 */
>  		if (xfs_extent_busy_search(mp, pag, fbno, flen)) {
> -			trace_xfs_discard_busy(mp, agno, fbno, flen);
> +			trace_xfs_discard_busy(mp, pag->pag_agno, fbno, flen);
>  			goto next_extent;
>  		}
>  
> -		trace_xfs_discard_extent(mp, agno, fbno, flen);
> +		trace_xfs_discard_extent(mp, pag->pag_agno, fbno, flen);
>  		error = blkdev_issue_discard(bdev, dbno, dlen, GFP_NOFS);
>  		if (error)
> -			goto out_del_cursor;
> +			break;
>  		*blocks_trimmed += flen;
>  
>  next_extent:
>  		error = xfs_btree_decrement(cur, 0, &i);
>  		if (error)
> -			goto out_del_cursor;
> +			break;
>  
>  		if (fatal_signal_pending(current)) {
>  			error = -ERESTARTSYS;
> -			goto out_del_cursor;
> +			break;
>  		}
>  	}
>  
>  out_del_cursor:
>  	xfs_btree_del_cursor(cur, error);
>  	xfs_buf_relse(agbp);
> -out_put_perag:
> -	xfs_perag_put(pag);
>  	return error;
>  }
>  
> @@ -152,11 +147,12 @@ xfs_ioc_trim(
>  	struct xfs_mount		*mp,
>  	struct fstrim_range __user	*urange)
>  {
> +	struct xfs_perag	*pag;
>  	unsigned int		granularity =
>  		bdev_discard_granularity(mp->m_ddev_targp->bt_bdev);
>  	struct fstrim_range	range;
>  	xfs_daddr_t		start, end, minlen;
> -	xfs_agnumber_t		start_agno, end_agno, agno;
> +	xfs_agnumber_t		agno;
>  	uint64_t		blocks_trimmed = 0;
>  	int			error, last_error = 0;
>  
> @@ -193,18 +189,18 @@ xfs_ioc_trim(
>  	end = start + BTOBBT(range.len) - 1;
>  
>  	if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
> -		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)- 1;
> -
> -	start_agno = xfs_daddr_to_agno(mp, start);
> -	end_agno = xfs_daddr_to_agno(mp, end);
> +		end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1;
>  
> -	for (agno = start_agno; agno <= end_agno; agno++) {
> -		error = xfs_trim_extents(mp, agno, start, end, minlen,
> +	agno = xfs_daddr_to_agno(mp, start);
> +	for_each_perag_range(mp, agno, xfs_daddr_to_agno(mp, end), pag) {
> +		error = xfs_trim_extents(pag, start, end, minlen,
>  					  &blocks_trimmed);
>  		if (error) {
>  			last_error = error;
> -			if (error == -ERESTARTSYS)
> +			if (error == -ERESTARTSYS) {
> +				xfs_perag_rele(pag);
>  				break;
> +			}
>  		}
>  	}
>  
> -- 
> 2.39.0
> 

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 41/42] xfs: return a referenced perag from filestreams allocator
  2023-01-18 22:45 ` [PATCH 41/42] xfs: return a referenced perag from filestreams allocator Dave Chinner
@ 2023-02-02  0:01   ` Darrick J. Wong
  2023-02-06 23:22     ` Dave Chinner
  0 siblings, 1 reply; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-02  0:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:45:04AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that the filestreams AG selection tracks active perags, we need
> to return an active perag to the core allocator code. This is
> because the allocations the filestreams code will run are AG
> specific allocations and so need to pin the AG until they
> complete.
> 
> We cannot rely on the filestreams item reference to do this - the
> filestreams association can be torn down at any time, hence we
> need to have a separate reference for the allocation process to pin
> the AG after it has been selected.
> 
> This means there is some perag juggling in allocation failure
> fallback paths, as they will scan all AGs when the AG
> specific allocation fails. Hence we need to track the perag
> reference that the filestream allocator returned to make sure we
> don't leak it on repeated allocation failure.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 38 +++++++++++-----
>  fs/xfs/xfs_filestream.c  | 93 ++++++++++++++++++++++++----------------
>  2 files changed, 84 insertions(+), 47 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 098b46f3f3e3..7f56002b545d 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3427,6 +3427,7 @@ xfs_bmap_btalloc_at_eof(
>  	bool			ag_only)
>  {
>  	struct xfs_mount	*mp = args->mp;
> +	struct xfs_perag	*caller_pag = args->pag;
>  	int			error;
>  
>  	/*
> @@ -3454,9 +3455,11 @@ xfs_bmap_btalloc_at_eof(
>  		else
>  			args->minalignslop = 0;
>  
> -		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> +		if (!caller_pag)
> +			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
>  		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> -		xfs_perag_put(args->pag);
> +		if (!caller_pag)
> +			xfs_perag_put(args->pag);
>  		if (error)
>  			return error;
>  
> @@ -3482,10 +3485,13 @@ xfs_bmap_btalloc_at_eof(
>  		args->minalignslop = 0;
>  	}
>  
> -	if (ag_only)
> +	if (ag_only) {
>  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> -	else
> +	} else {
> +		args->pag = NULL;
>  		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> +		args->pag = caller_pag;

At first glance I wondered if we end up leaking any args->pag set by the
_iterate_ags function, but I think it's the case that _finish will
release args->pag and set it back to NULL?  So in effect we're
preserving the caller's args->pag here, and nothing leaks.  In that
case, I think we should check that assumption:

		ASSERT(args->pag == NULL);
		args->pag = caller_pag;

If the answer to the above is yes, then with the above fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +	}
>  	if (error)
>  		return error;
>  
> @@ -3544,12 +3550,13 @@ xfs_bmap_btalloc_filestreams(
>  	int			stripe_align)
>  {
>  	xfs_extlen_t		blen = 0;
> -	int			error;
> +	int			error = 0;
>  
>  
>  	error = xfs_filestream_select_ag(ap, args, &blen);
>  	if (error)
>  		return error;
> +	ASSERT(args->pag);
>  
>  	/*
>  	 * If we are in low space mode, then optimal allocation will fail so
> @@ -3558,22 +3565,31 @@ xfs_bmap_btalloc_filestreams(
>  	 */
>  	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
>  		args->minlen = ap->minlen;
> +		ASSERT(args->fsbno == NULLFSBLOCK);
>  		goto out_low_space;
>  	}
>  
>  	args->minlen = xfs_bmap_select_minlen(ap, args, blen);
> -	if (ap->aeof) {
> +	if (ap->aeof)
>  		error = xfs_bmap_btalloc_at_eof(ap, args, blen, stripe_align,
>  				true);
> -		if (error || args->fsbno != NULLFSBLOCK)
> -			return error;
> -	}
>  
> -	error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> +	if (!error && args->fsbno == NULLFSBLOCK)
> +		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> +
> +out_low_space:
> +	/*
> +	 * We are now done with the perag reference for the filestreams
> +	 * association provided by xfs_filestream_select_ag(). Release it now as
> +	 * we've either succeeded, had a fatal error or we are out of space and
> +	 * need to do a full filesystem scan for free space which will take it's
> +	 * own references.
> +	 */
> +	xfs_perag_rele(args->pag);
> +	args->pag = NULL;
>  	if (error || args->fsbno != NULLFSBLOCK)
>  		return error;
>  
> -out_low_space:
>  	return xfs_bmap_btalloc_low_space(ap, args);
>  }
>  
> diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> index 81aebe3e09ba..523a3b8b5754 100644
> --- a/fs/xfs/xfs_filestream.c
> +++ b/fs/xfs/xfs_filestream.c
> @@ -53,8 +53,9 @@ xfs_fstrm_free_func(
>   */
>  static int
>  xfs_filestream_pick_ag(
> +	struct xfs_alloc_arg	*args,
>  	struct xfs_inode	*ip,
> -	xfs_agnumber_t		*agp,
> +	xfs_agnumber_t		start_agno,
>  	int			flags,
>  	xfs_extlen_t		*longest)
>  {
> @@ -64,7 +65,6 @@ xfs_filestream_pick_ag(
>  	struct xfs_perag	*max_pag = NULL;
>  	xfs_extlen_t		minlen = *longest;
>  	xfs_extlen_t		free = 0, minfree, maxfree = 0;
> -	xfs_agnumber_t		start_agno = *agp;
>  	xfs_agnumber_t		agno;
>  	int			err, trylock;
>  
> @@ -73,8 +73,6 @@ xfs_filestream_pick_ag(
>  	/* 2% of an AG's blocks must be free for it to be chosen. */
>  	minfree = mp->m_sb.sb_agblocks / 50;
>  
> -	*agp = NULLAGNUMBER;
> -
>  	/* For the first pass, don't sleep trying to init the per-AG. */
>  	trylock = XFS_ALLOC_FLAG_TRYLOCK;
>  
> @@ -89,7 +87,7 @@ xfs_filestream_pick_ag(
>  				break;
>  			/* Couldn't lock the AGF, skip this AG. */
>  			err = 0;
> -			goto next_ag;
> +			continue;
>  		}
>  
>  		/* Keep track of the AG with the most free blocks. */
> @@ -146,16 +144,19 @@ xfs_filestream_pick_ag(
>  		/*
>  		 * No unassociated AGs are available, so select the AG with the
>  		 * most free space, regardless of whether it's already in use by
> -		 * another filestream. It none suit, return NULLAGNUMBER.
> +		 * another filestream. It none suit, just use whatever AG we can
> +		 * grab.
>  		 */
>  		if (!max_pag) {
> -			*agp = NULLAGNUMBER;
> -			trace_xfs_filestream_pick(ip, NULL, free);
> -			return 0;
> +			for_each_perag_wrap(mp, start_agno, agno, pag)
> +				break;
> +			atomic_inc(&pag->pagf_fstrms);
> +			*longest = 0;
> +		} else {
> +			pag = max_pag;
> +			free = maxfree;
> +			atomic_inc(&pag->pagf_fstrms);
>  		}
> -		pag = max_pag;
> -		free = maxfree;
> -		atomic_inc(&pag->pagf_fstrms);
>  	} else if (max_pag) {
>  		xfs_perag_rele(max_pag);
>  	}
> @@ -167,16 +168,29 @@ xfs_filestream_pick_ag(
>  	if (!item)
>  		goto out_put_ag;
>  
> +
> +	/*
> +	 * We are going to use this perag now, so take another ref to it for the
> +	 * allocation context returned to the caller. If we raced to create and
> +	 * insert the filestreams item into the MRU (-EEXIST), then we still
> +	 * keep this reference but free the item reference we gained above. On
> +	 * any other failure, we have to drop both.
> +	 */
> +	atomic_inc(&pag->pag_active_ref);
>  	item->pag = pag;
> +	args->pag = pag;
>  
>  	err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru);
>  	if (err) {
> -		if (err == -EEXIST)
> +		if (err == -EEXIST) {
>  			err = 0;
> +		} else {
> +			xfs_perag_rele(args->pag);
> +			args->pag = NULL;
> +		}
>  		goto out_free_item;
>  	}
>  
> -	*agp = pag->pag_agno;
>  	return 0;
>  
>  out_free_item:
> @@ -236,7 +250,14 @@ xfs_filestream_select_ag_mru(
>  	if (!mru)
>  		goto out_default_agno;
>  
> +	/*
> +	 * Grab the pag and take an extra active reference for the caller whilst
> +	 * the mru item cannot go away. This means we'll pin the perag with
> +	 * the reference we get here even if the filestreams association is torn
> +	 * down immediately after we mark the lookup as done.
> +	 */
>  	pag = container_of(mru, struct xfs_fstrm_item, mru)->pag;
> +	atomic_inc(&pag->pag_active_ref);
>  	xfs_mru_cache_done(mp->m_filestream);
>  
>  	trace_xfs_filestream_lookup(pag, ap->ip->i_ino);
> @@ -246,6 +267,8 @@ xfs_filestream_select_ag_mru(
>  
>  	error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
>  	if (error) {
> +		/* We aren't going to use this perag */
> +		xfs_perag_rele(pag);
>  		if (error != -EAGAIN)
>  			return error;
>  		*blen = 0;
> @@ -253,12 +276,18 @@ xfs_filestream_select_ag_mru(
>  
>  	/*
>  	 * We are done if there's still enough contiguous free space to succeed.
> +	 * If there is very little free space before we start a filestreams
> +	 * allocation, we're almost guaranteed to fail to find a better AG with
> +	 * larger free space available so we don't even try.
>  	 */
>  	*agno = pag->pag_agno;
> -	if (*blen >= args->maxlen)
> +	if (*blen >= args->maxlen || (ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
> +		args->pag = pag;
>  		return 0;
> +	}
>  
>  	/* Changing parent AG association now, so remove the existing one. */
> +	xfs_perag_rele(pag);
>  	mru = xfs_mru_cache_remove(mp->m_filestream, pip->i_ino);
>  	if (mru) {
>  		struct xfs_fstrm_item *item =
> @@ -297,46 +326,38 @@ xfs_filestream_select_ag(
>  	struct xfs_inode	*pip = NULL;
>  	xfs_agnumber_t		agno;
>  	int			flags = 0;
> -	int			error;
> +	int			error = 0;
>  
>  	args->total = ap->total;
>  	*blen = 0;
>  
>  	pip = xfs_filestream_get_parent(ap->ip);
>  	if (!pip) {
> -		agno = 0;
> -		goto out_select;
> +		ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
> +		return 0;
>  	}
>  
>  	error = xfs_filestream_select_ag_mru(ap, args, pip, &agno, blen);
> -	if (error || *blen >= args->maxlen)
> +	if (error)
>  		goto out_rele;
> -
> -	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
> -	xfs_bmap_adjacent(ap);
> -
> -	/*
> -	 * If there is very little free space before we start a filestreams
> -	 * allocation, we're almost guaranteed to fail to find a better AG with
> -	 * larger free space available so we don't even try.
> -	 */
> +	if (*blen >= args->maxlen)
> +		goto out_select;
>  	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
>  		goto out_select;
>  
> +	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
> +	xfs_bmap_adjacent(ap);
> +	*blen = ap->length;
>  	if (ap->datatype & XFS_ALLOC_USERDATA)
>  		flags |= XFS_PICK_USERDATA;
>  	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
>  		flags |= XFS_PICK_LOWSPACE;
>  
> -	*blen = ap->length;
> -	error = xfs_filestream_pick_ag(pip, &agno, flags, blen);
> -	if (agno == NULLAGNUMBER) {
> -		agno = 0;
> -		*blen = 0;
> -	}
> -
> +	error = xfs_filestream_pick_ag(args, pip, agno, flags, blen);
> +	if (error)
> +		goto out_rele;
>  out_select:
> -	ap->blkno = XFS_AGB_TO_FSB(mp, agno, 0);
> +	ap->blkno = XFS_AGB_TO_FSB(mp, args->pag->pag_agno, 0);
>  out_rele:
>  	xfs_irele(pip);
>  	return error;
> -- 
> 2.39.0
> 


* Re: [PATCH 42/42] xfs: refactor the filestreams allocator pick functions
  2023-01-18 22:45 ` [PATCH 42/42] xfs: refactor the filestreams allocator pick functions Dave Chinner
@ 2023-02-02  0:08   ` Darrick J. Wong
  2023-02-06 23:26     ` Dave Chinner
  0 siblings, 1 reply; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-02  0:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:45:05AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that the filestreams allocator is largely rewritten,
> restructure the main entry point and pick function to separate out
> the different operations cleanly. The MRU lookup function should not
> handle the start AG selection on MRU lookup failure, nor should
> the pick function handle building the association that is inserted
> into the MRU.
> 
> This leaves the filestreams allocator fairly clean and easy to
> understand, returning to the caller with an active perag reference
> and a target block to allocate at.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_filestream.c | 247 +++++++++++++++++++++-------------------
>  fs/xfs/xfs_trace.h      |   9 +-
>  2 files changed, 132 insertions(+), 124 deletions(-)
> 
> diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> index 523a3b8b5754..0a1d316ebdba 100644
> --- a/fs/xfs/xfs_filestream.c
> +++ b/fs/xfs/xfs_filestream.c
> @@ -48,19 +48,19 @@ xfs_fstrm_free_func(
>  }
>  
>  /*
> - * Scan the AGs starting at startag looking for an AG that isn't in use and has
> - * at least minlen blocks free.
> + * Scan the AGs starting at start_agno looking for an AG that isn't in use and
> + * has at least minlen blocks free. If no AG is found to match the allocation
> + * requirements, pick the AG with the most free space in it.
>   */
>  static int
>  xfs_filestream_pick_ag(
>  	struct xfs_alloc_arg	*args,
> -	struct xfs_inode	*ip,
> +	xfs_ino_t		pino,
>  	xfs_agnumber_t		start_agno,
>  	int			flags,
>  	xfs_extlen_t		*longest)
>  {
> -	struct xfs_mount	*mp = ip->i_mount;
> -	struct xfs_fstrm_item	*item;
> +	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*pag;
>  	struct xfs_perag	*max_pag = NULL;
>  	xfs_extlen_t		minlen = *longest;
> @@ -68,8 +68,6 @@ xfs_filestream_pick_ag(
>  	xfs_agnumber_t		agno;
>  	int			err, trylock;

Who consumes trylock?  Is this supposed to get passed through to
xfs_bmap_longest_free_extent, or is the goal here merely to run the
for_each_perag_wrap loop twice before going for the most free or any old
perag?

--D

> -	ASSERT(S_ISDIR(VFS_I(ip)->i_mode));
> -
>  	/* 2% of an AG's blocks must be free for it to be chosen. */
>  	minfree = mp->m_sb.sb_agblocks / 50;
>  
> @@ -78,7 +76,7 @@ xfs_filestream_pick_ag(
>  
>  restart:
>  	for_each_perag_wrap(mp, start_agno, agno, pag) {
> -		trace_xfs_filestream_scan(pag, ip->i_ino);
> +		trace_xfs_filestream_scan(pag, pino);
>  		*longest = 0;
>  		err = xfs_bmap_longest_free_extent(pag, NULL, longest);
>  		if (err) {
> @@ -148,9 +146,9 @@ xfs_filestream_pick_ag(
>  		 * grab.
>  		 */
>  		if (!max_pag) {
> -			for_each_perag_wrap(mp, start_agno, agno, pag)
> +			for_each_perag_wrap(args->mp, 0, start_agno, args->pag)
>  				break;
> -			atomic_inc(&pag->pagf_fstrms);
> +			atomic_inc(&args->pag->pagf_fstrms);
>  			*longest = 0;
>  		} else {
>  			pag = max_pag;
> @@ -161,44 +159,10 @@ xfs_filestream_pick_ag(
>  		xfs_perag_rele(max_pag);
>  	}
>  
> -	trace_xfs_filestream_pick(ip, pag, free);
> -
> -	err = -ENOMEM;
> -	item = kmem_alloc(sizeof(*item), KM_MAYFAIL);
> -	if (!item)
> -		goto out_put_ag;
> -
> -
> -	/*
> -	 * We are going to use this perag now, so take another ref to it for the
> -	 * allocation context returned to the caller. If we raced to create and
> -	 * insert the filestreams item into the MRU (-EEXIST), then we still
> -	 * keep this reference but free the item reference we gained above. On
> -	 * any other failure, we have to drop both.
> -	 */
> -	atomic_inc(&pag->pag_active_ref);
> -	item->pag = pag;
> +	trace_xfs_filestream_pick(pag, pino, free);
>  	args->pag = pag;
> -
> -	err = xfs_mru_cache_insert(mp->m_filestream, ip->i_ino, &item->mru);
> -	if (err) {
> -		if (err == -EEXIST) {
> -			err = 0;
> -		} else {
> -			xfs_perag_rele(args->pag);
> -			args->pag = NULL;
> -		}
> -		goto out_free_item;
> -	}
> -
>  	return 0;
>  
> -out_free_item:
> -	kmem_free(item);
> -out_put_ag:
> -	atomic_dec(&pag->pagf_fstrms);
> -	xfs_perag_rele(pag);
> -	return err;
>  }
>  
>  static struct xfs_inode *
> @@ -227,29 +191,29 @@ xfs_filestream_get_parent(
>  
>  /*
>   * Lookup the mru cache for an existing association. If one exists and we can
> - * use it, return with the agno and blen indicating that the allocation will
> - * proceed with that association.
> + * use it, return with an active perag reference indicating that the allocation
> + * will proceed with that association.
>   *
>   * If we have no association, or we cannot use the current one and have to
> - * destroy it, return with blen = 0 and agno pointing at the next agno to try.
> + * destroy it, return with longest = 0 to tell the caller to create a new
> + * association.
>   */
> -int
> -xfs_filestream_select_ag_mru(
> +static int
> +xfs_filestream_lookup_association(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args,
> -	struct xfs_inode	*pip,
> -	xfs_agnumber_t		*agno,
> -	xfs_extlen_t		*blen)
> +	xfs_ino_t		pino,
> +	xfs_extlen_t		*longest)
>  {
> -	struct xfs_mount	*mp = ap->ip->i_mount;
> +	struct xfs_mount	*mp = args->mp;
>  	struct xfs_perag	*pag;
>  	struct xfs_mru_cache_elem *mru;
> -	int			error;
> +	int			error = 0;
>  
> -	mru = xfs_mru_cache_lookup(mp->m_filestream, pip->i_ino);
> +	*longest = 0;
> +	mru = xfs_mru_cache_lookup(mp->m_filestream, pino);
>  	if (!mru)
> -		goto out_default_agno;
> -
> +		return 0;
>  	/*
>  	 * Grab the pag and take an extra active reference for the caller whilst
>  	 * the mru item cannot go away. This means we'll pin the perag with
> @@ -265,103 +229,148 @@ xfs_filestream_select_ag_mru(
>  	ap->blkno = XFS_AGB_TO_FSB(args->mp, pag->pag_agno, 0);
>  	xfs_bmap_adjacent(ap);
>  
> -	error = xfs_bmap_longest_free_extent(pag, args->tp, blen);
> -	if (error) {
> -		/* We aren't going to use this perag */
> -		xfs_perag_rele(pag);
> -		if (error != -EAGAIN)
> -			return error;
> -		*blen = 0;
> -	}
> -
>  	/*
> -	 * We are done if there's still enough contiguous free space to succeed.
>  	 * If there is very little free space before we start a filestreams
> -	 * allocation, we're almost guaranteed to fail to find a better AG with
> -	 * larger free space available so we don't even try.
> +	 * allocation, we're almost guaranteed to fail to find a large enough
> +	 * free space available so just use the cached AG.
>  	 */
> -	*agno = pag->pag_agno;
> -	if (*blen >= args->maxlen || (ap->tp->t_flags & XFS_TRANS_LOWMODE)) {
> -		args->pag = pag;
> -		return 0;
> +	if (ap->tp->t_flags & XFS_TRANS_LOWMODE) {
> +		*longest = 1;
> +		goto out_done;
>  	}
>  
> +	error = xfs_bmap_longest_free_extent(pag, args->tp, longest);
> +	if (error == -EAGAIN)
> +		error = 0;
> +	if (error || *longest < args->maxlen) {
> +		/* We aren't going to use this perag */
> +		*longest = 0;
> +		xfs_perag_rele(pag);
> +		return error;
> +	}
> +
> +out_done:
> +	args->pag = pag;
> +	return 0;
> +}
> +
> +static int
> +xfs_filestream_create_association(
> +	struct xfs_bmalloca	*ap,
> +	struct xfs_alloc_arg	*args,
> +	xfs_ino_t		pino,
> +	xfs_extlen_t		*longest)
> +{
> +	struct xfs_mount	*mp = args->mp;
> +	struct xfs_mru_cache_elem *mru;
> +	struct xfs_fstrm_item	*item;
> +	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, pino);
> +	int			flags = 0;
> +	int			error;
> +
>  	/* Changing parent AG association now, so remove the existing one. */
> -	xfs_perag_rele(pag);
> -	mru = xfs_mru_cache_remove(mp->m_filestream, pip->i_ino);
> +	mru = xfs_mru_cache_remove(mp->m_filestream, pino);
>  	if (mru) {
>  		struct xfs_fstrm_item *item =
>  			container_of(mru, struct xfs_fstrm_item, mru);
> -		*agno = (item->pag->pag_agno + 1) % mp->m_sb.sb_agcount;
> -		xfs_fstrm_free_func(mp, mru);
> -		return 0;
> -	}
>  
> -out_default_agno:
> -	if (xfs_is_inode32(mp)) {
> +		agno = (item->pag->pag_agno + 1) % mp->m_sb.sb_agcount;
> +		xfs_fstrm_free_func(mp, mru);
> +	} else if (xfs_is_inode32(mp)) {
>  		xfs_agnumber_t	 rotorstep = xfs_rotorstep;
> -		*agno = (mp->m_agfrotor / rotorstep) %
> -				mp->m_sb.sb_agcount;
> +
> +		agno = (mp->m_agfrotor / rotorstep) % mp->m_sb.sb_agcount;
>  		mp->m_agfrotor = (mp->m_agfrotor + 1) %
>  				 (mp->m_sb.sb_agcount * rotorstep);
> -		return 0;
>  	}
> -	*agno = XFS_INO_TO_AGNO(mp, pip->i_ino);
> +
> +	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
> +	xfs_bmap_adjacent(ap);
> +
> +	if (ap->datatype & XFS_ALLOC_USERDATA)
> +		flags |= XFS_PICK_USERDATA;
> +	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
> +		flags |= XFS_PICK_LOWSPACE;
> +
> +	*longest = ap->length;
> +	error = xfs_filestream_pick_ag(args, pino, agno, flags, longest);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * We are going to use this perag now, so create an assoication for it.
> +	 * xfs_filestream_pick_ag() has already bumped the perag fstrms counter
> +	 * for us, so all we need to do here is take another active reference to
> +	 * the perag for the cached association.
> +	 *
> +	 * If we fail to store the association, we need to drop the fstrms
> +	 * counter as well as drop the perag reference we take here for the
> +	 * item. We do not need to return an error for this failure - as long as
> +	 * we return a referenced AG, the allocation can still go ahead just
> +	 * fine.
> +	 */
> +	item = kmem_alloc(sizeof(*item), KM_MAYFAIL);
> +	if (!item)
> +		goto out_put_fstrms;
> +
> +	atomic_inc(&args->pag->pag_active_ref);
> +	item->pag = args->pag;
> +	error = xfs_mru_cache_insert(mp->m_filestream, pino, &item->mru);
> +	if (error)
> +		goto out_free_item;
>  	return 0;
>  
> +out_free_item:
> +	xfs_perag_rele(item->pag);
> +	kmem_free(item);
> +out_put_fstrms:
> +	atomic_dec(&args->pag->pagf_fstrms);
> +	return 0;
>  }
>  
>  /*
>   * Search for an allocation group with a single extent large enough for
> - * the request.  If one isn't found, then adjust the minimum allocation
> - * size to the largest space found.
> + * the request. First we look for an existing association and use that if it
> + * is found. Otherwise, we create a new association by selecting an AG that fits
> + * the allocation criteria.
> + *
> + * We return with a referenced perag in args->pag to indicate which AG we are
> + * allocating into or an error with no references held.
>   */
>  int
>  xfs_filestream_select_ag(
>  	struct xfs_bmalloca	*ap,
>  	struct xfs_alloc_arg	*args,
> -	xfs_extlen_t		*blen)
> +	xfs_extlen_t		*longest)
>  {
> -	struct xfs_mount	*mp = ap->ip->i_mount;
> -	struct xfs_inode	*pip = NULL;
> -	xfs_agnumber_t		agno;
> -	int			flags = 0;
> +	struct xfs_mount	*mp = args->mp;
> +	struct xfs_inode	*pip;
> +	xfs_ino_t		ino = 0;
>  	int			error = 0;
>  
> +	*longest = 0;
>  	args->total = ap->total;
> -	*blen = 0;
> -
>  	pip = xfs_filestream_get_parent(ap->ip);
> -	if (!pip) {
> -		ap->blkno = XFS_AGB_TO_FSB(mp, 0, 0);
> -		return 0;
> +	if (pip) {
> +		ino = pip->i_ino;
> +		error = xfs_filestream_lookup_association(ap, args, ino,
> +				longest);
> +		xfs_irele(pip);
> +		if (error)
> +			return error;
> +		if (*longest >= args->maxlen)
> +			goto out_select;
> +		if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
> +			goto out_select;
>  	}
>  
> -	error = xfs_filestream_select_ag_mru(ap, args, pip, &agno, blen);
> +	error = xfs_filestream_create_association(ap, args, ino, longest);
>  	if (error)
> -		goto out_rele;
> -	if (*blen >= args->maxlen)
> -		goto out_select;
> -	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
> -		goto out_select;
> -
> -	ap->blkno = XFS_AGB_TO_FSB(args->mp, agno, 0);
> -	xfs_bmap_adjacent(ap);
> -	*blen = ap->length;
> -	if (ap->datatype & XFS_ALLOC_USERDATA)
> -		flags |= XFS_PICK_USERDATA;
> -	if (ap->tp->t_flags & XFS_TRANS_LOWMODE)
> -		flags |= XFS_PICK_LOWSPACE;
> +		return error;
>  
> -	error = xfs_filestream_pick_ag(args, pip, agno, flags, blen);
> -	if (error)
> -		goto out_rele;
>  out_select:
>  	ap->blkno = XFS_AGB_TO_FSB(mp, args->pag->pag_agno, 0);
> -out_rele:
> -	xfs_irele(pip);
> -	return error;
> -
> +	return 0;
>  }
>  
>  void
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index b5f7d225d5b4..1d3569c0d2fe 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -668,9 +668,8 @@ DEFINE_FILESTREAM_EVENT(xfs_filestream_lookup);
>  DEFINE_FILESTREAM_EVENT(xfs_filestream_scan);
>  
>  TRACE_EVENT(xfs_filestream_pick,
> -	TP_PROTO(struct xfs_inode *ip, struct xfs_perag *pag,
> -		 xfs_extlen_t free),
> -	TP_ARGS(ip, pag, free),
> +	TP_PROTO(struct xfs_perag *pag, xfs_ino_t ino, xfs_extlen_t free),
> +	TP_ARGS(pag, ino, free),
>  	TP_STRUCT__entry(
>  		__field(dev_t, dev)
>  		__field(xfs_ino_t, ino)
> @@ -679,8 +678,8 @@ TRACE_EVENT(xfs_filestream_pick,
>  		__field(xfs_extlen_t, free)
>  	),
>  	TP_fast_assign(
> -		__entry->dev = VFS_I(ip)->i_sb->s_dev;
> -		__entry->ino = ip->i_ino;
> +		__entry->dev = pag->pag_mount->m_super->s_dev;
> +		__entry->ino = ino;
>  		if (pag) {
>  			__entry->agno = pag->pag_agno;
>  			__entry->streams = atomic_read(&pag->pagf_fstrms);
> -- 
> 2.39.0
> 


* Re: [PATCH 00/42] xfs: per-ag centric allocation algorithms
  2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation algorithms Dave Chinner
                   ` (41 preceding siblings ...)
  2023-01-18 22:45 ` [PATCH 42/42] xfs: refactor the filestreams allocator pick functions Dave Chinner
@ 2023-02-02  0:14 ` Darrick J. Wong
  2023-02-06 23:13   ` Dave Chinner
  42 siblings, 1 reply; 77+ messages in thread
From: Darrick J. Wong @ 2023-02-02  0:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 19, 2023 at 09:44:23AM +1100, Dave Chinner wrote:
> This series continues the work towards making shrinking a filesystem
> possible.  We need to be able to stop operations from taking place
> on AGs that need to be removed by a shrink, so before shrink can be
> implemented we need to have the infrastructure in place to prevent
> incursion into AGs that are going to be, or are in the process of,
> being removed from active duty.
> 
> The focus of this is making operations that depend on access to AGs
> use the perag to access and pin the AG in active use, thereby
> creating a barrier we can use to delay shrink until all active uses
> of an AG have been drained and new uses are prevented.
> 
> This series starts by fixing some existing issues that are exposed
> by changes later in the series. They stand alone, so can be picked
> up independently of the rest of this patchset.

Hmm if I had to pick up only the bugfixes, which patches are those?
Patches 1-3 look like bug fixes, 4-6 might be but might not be?

> The most complex of these fixes is cleaning up the mess that is the
> AGF deadlock avoidance algorithm. This algorithm stores the first
> block that is allocated in a transaction in tp->t_firstblock, then
> uses this to try to limit future allocations within the transaction
> to AGs at or higher than the filesystem block stored in
> tp->t_firstblock. This depends on one of the initial bug fixes in
> the series to move the deadlock avoidance checks to
> xfs_alloc_vextent(), and then builds on it to relax the constraints
> of the avoidance algorithm to only be active when a deadlock is
> possible.
> 
> We also update the algorithm to record allocations from higher AGs
> that are allocated from, because when we need to lock more than
> two AGs we still have to ensure lock order is correct. Therefore we
> can't lock AGs in the order 1, 3, 2, even though tp->t_firstblock
> indicates that we've allocated from AG 1 and so AG 2 is valid to lock.
> It's not valid, because we already hold AG 3 locked, and so
> tp->t_firstblock should actually point at AG 3, not AG 1 in this
> situation.
> 
> It should now be obvious that the deadlock avoidance algorithm
> should record AGs, not filesystem blocks. So the series then changes
> the transaction to store the highest AG we've allocated in rather
> than a filesystem block we allocated.  This makes it obvious what
> the constraints are, and trivial to update as we lock and allocate
> from various AGs.
> 
> With all the bug fixes out of the way, the series then starts
> converting the code to use active references. Active reference
> counts are used by high level code that needs to prevent the AG from
> being taken out from under it by a shrink operation. The high level
> code needs to be able to handle not getting an active reference
> gracefully, and the shrink code will need to wait for active
> references to drain before continuing.
> 
> Active references are implemented just as reference counts right now
> - an active reference is taken at perag init during mount, and all
> other active references are dependent on the active reference count
> being greater than zero. This gives us an initial method of stopping
> new active references without needing other infrastructure; just
> drop the reference taken at filesystem mount time and when the
> refcount then falls to zero no new references can be taken.
> 
> In future, this will need to take into account AG control state
> (e.g. offline, no alloc, etc) as well as the reference count, but
> right now we can implement a basic barrier for shrink with just
> reference count manipulations. As such, patches to convert the perag
> state to atomic opstate fields similar to the xfs_mount and xlog
> opstate fields follow the initial active perag reference counting
> patches.
> 
> The first target for active reference conversion is the
> for_each_perag*() iterators. This captures a lot of high level code
> that should skip offline AGs, and introduces the ability to
> differentiate between a lookup that didn't have an online AG and the
> end of the AG iteration range.
> 
> From there, the inode allocation AG selection is converted to active
> references, and the perag is driven deeper into the inode allocation
> and btree code to replace the xfs_mount. Most of the inode
> allocation code operates on a single AG once it is selected, hence
> it should pass the perag as the primary referenced object around for
> allocation, not the xfs_mount. There is a bit of churn here, but it
> emphasises that inode allocation is inherently an allocation group
> based operation.
> 
> Next the bmap/alloc interface undergoes a major untangling,
> reworking xfs_bmap_btalloc() into separate allocation operations for
> different contexts and failure handling behaviours. This then allows
> us to completely remove the xfs_alloc_vextent() layer via
> restructuring the xfs_alloc_vextent/xfs_alloc_ag_vextent() into a
> set of relatively simple helper functions that describe the
> allocation that they are doing. e.g.  xfs_alloc_vextent_exact_bno().
> 
> This allows the requirements for accessing AGs to be allocation
> context dependent. The allocations that require operation on a
> single AG generally can't tolerate failure after the allocation
> method and AG has been decided on, and hence the caller needs to
> manage the active references to ensure the allocation does not race
> with shrink removing the selected AG for the duration of the
> operation that requires access to that allocation group.
> 
> Other allocations iterate AGs and so the first AG is just a hint -
> these do not need to pin a perag first as they can tolerate not
> being able to access an AG by simply skipping over it. These require
> new perag iteration functions that can start at arbitrary AGs and
> wrap around at arbitrary AGs, hence a new set for
> for_each_perag_wrap*() helpers to do this.
> 
> Next is the rework of the filestreams allocator. This doesn't change
> any functionality, but gets rid of the unnecessary multi-pass
> selection algorithm when the selected AG is not available. It
> currently does a lookup pass which might iterate all AGs to select
> an AG, then checks if the AG is acceptable and if not does a "new
> AG" pass that is essentially identical to the lookup pass. Both of
> these scans also do the same "longest extent in AG" check before
> selecting an AG as is done after the AG is selected.
> 
> IOWs, the filestreams algorithm can be greatly simplified into a
> single new AG selection pass if there is no current association
> or the currently associated AG doesn't have enough contiguous free
> space for the allocation to proceed.  With this simplification of
> the filestreams allocator, it's then trivial to convert it to use
> for_each_perag_wrap() for the AG scan algorithm.
> 
> This series passes auto group fstests with rmapbt=1 on both 1kB and
> 4kB block size configurations without functional or performance
> regressions. In some cases ENOSPC behaviour is improved, but fstests
> does not capture those improvements as it only tests for regressions
> in behaviour.
> 

For all the patches that I have not sent replies to,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

IIRC that's patches 1-6, 8, 10-13, 16, 18-19, 24-27, and 30-40.

--D

> Version 2:
> - AGI, AGF and AGFL access conversion patches removed due to being
>   merged.
> - AG geometry conversion patches removed due to being merged
> - Rebase on 6.2-rc4
> - fixed "firstblock" AGF deadlock avoidance algorithm
> - lots of cleanups and bug fixes.
> 
> Version 1 [RFC]:
> - https://lore.kernel.org/linux-xfs/20220611012659.3418072-1-david@fromorbit.com/
> 


* Re: [PATCH 07/42] xfs: active perag reference counting
  2023-02-01 19:08   ` Darrick J. Wong
@ 2023-02-06 22:56     ` Dave Chinner
  0 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-02-06 22:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Feb 01, 2023 at 11:08:53AM -0800, Darrick J. Wong wrote:
> On Thu, Jan 19, 2023 at 09:44:30AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We need to be able to dynamically remove instantiated AGs from
> > memory safely, either for shrinking the filesystem or paging AG
> > state in and out of memory (e.g. supporting millions of AGs). This
> > means we need to be able to safely exclude operations from accessing
> > perags while dynamic removal is in progress.
> > 
> > To do this, introduce the concept of active and passive references.
> > Active references are required for high level operations that make
> > use of an AG for a given operation (e.g. allocation) and pin the
> > perag in memory for the duration of the operation that is operating
> > on the perag (e.g. transaction scope). This means we can fail to get
> > an active reference to an AG, hence callers of the new active
> > reference API must be able to handle lookup failure gracefully.
> > 
> > Passive references are used in low level code, where we might need
> > to access the perag structure for the purposes of completing high
> > level operations. For example, buffers need to use passive
> > references because:
> > - we need to be able to do metadata IO during operations like grow
> >   and shrink transactions where high level active references to the
> >   AG have already been blocked
> > - buffers need to pin the perag until they are reclaimed from
> >   memory, something that high level code has no direct control over.
> > - unused cached buffers should not prevent a shrink from being
> >   started.
> > 
> > Hence we have active references that will form exclusion barriers
> > for operations to be performed on an AG, and passive references that
> > will prevent reclaim of the perag until all objects with passive
> > references have been reclaimed themselves.
> 
> This is going to be fun to rebase the online fsck series on top of. :)
> 
> If I'm understanding correctly, active perag refs are for high level
> code that wants to call down into an AG to do some operation
> (allocating, freeing, scanning, whatever)?  So I think online fsck
> uniformly wants xfs_perag_grab/rele, right?

That depends. For scrubbing, yes, active references are probably
going to be needed. For repair of AG structures, we will likely
have to take the AG offline to prevent allocation from being
attempted in it. Yes, we currently use the AGF/AGI lock to prevent
that, but this results in blocking user applications during
allocation until repair is done with the AG. We really want
application allocation to naturally skip AGs under repair, not
block until the repair is done....

As such, I think the answer is scrub should use active references as
it scans, but repair needs to use passive references once the AG has
had its state changed to "offline" as active references will only
be available on "fully online" AGs.

> Passive refs are (I think) for lower level code that wants to call up
> into an AG to finish off something that was already started? 

Yes, like buffers carrying a passive reference to pin the perag
while there are cached buffers indexed by the perag buffer hash.
Here we only care about the existence of the perag structure, as we
need to do IO to the AG metadata regardless of whether the perag is
active or not.

> And
> probably by upper level code?  So the amount of code that actually wants
> a passive reference is pretty small?

I don't think it's "small" - all the back end code that uses the
perag as the root of indexing structures will likely need passive
references.

The mental model I'm using is that active references are for
tracking user-facing and user-data operations that require perag
access.  That's things like inode allocation, data extent
allocation, etc which will need to skip over AGs that aren't
available for storing new user data/metadata at the current time.

Anything that is internal (e.g. metadata buffers, inode cache walks
for reclaim) that needs to run regardless of user operation just
needs an existence guarantee over the life of the object. This is
what passive references provide - the perag cannot be freed from
memory while there are still passive references to it.

Hence I'm looking at active references as a mechanism that can
provide an access barrier/drain for serialising per-ag operational
state changes, not to provide per-ag existence guarantees. Passive
references provide low level existence guarantees, active references
allow online/offline/no-alloc/shrinking/etc operational state
changes to be made safely.
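The active/passive split described above can be modelled in a few
lines of userspace C. This is a minimal, single-threaded sketch - the
function names echo the patch's API, but the structure, fields and
logic are illustrative, not the kernel implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, single-threaded model of the two reference types. */
struct perag {
	int	passive_refs;	/* existence guarantee only */
	int	active_refs;	/* operational use; can be refused */
	bool	online;		/* operational state gates active refs */
};

/* Active reference: fails once the AG has been taken offline. */
static bool perag_grab(struct perag *pag)
{
	if (!pag->online)
		return false;
	pag->active_refs++;
	return true;
}

static void perag_rele(struct perag *pag)
{
	pag->active_refs--;
}

/* Passive reference: always succeeds while the structure exists. */
static void perag_get(struct perag *pag)
{
	pag->passive_refs++;
}

static void perag_put(struct perag *pag)
{
	pag->passive_refs--;
}

/*
 * Offlining: refuse new active references, then wait (here: just
 * check) for existing ones to drain. Passive references may still
 * be held, e.g. by cached buffers, without blocking the state
 * change - they only pin the structure in memory.
 */
static bool perag_set_offline(struct perag *pag)
{
	pag->online = false;
	return pag->active_refs == 0;	/* drained? */
}
```

The point of the model is the asymmetry: perag_grab() can fail and
callers must handle that, while perag_get() cannot.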

> > This patch introduces xfs_perag_grab()/xfs_perag_rele() as the API
> > for active AG reference functionality. We also need to convert the
> > for_each_perag*() iterators to use active references, which will
> > start the process of converting high level code over to using active
> > references. Conversion of non-iterator based code to active
> > references will be done in followup patches.
> 
> Is there any code that iterates perag structures via passive references?
> I think the answer to this is 'no'?

I think the answer is yes - inode cache walking is a good example of
this. That will (eventually) have to grab a passive reference to the
perag and check the return - if it fails the perag has just been
torn down so we need to skip it. If it succeeds then we have a
reference that pins the perag in memory and we can safely walk the
inode cache structures in that perag.

Some of the operations that the inode cache walks perform (e.g.
block trimming) might need active references to per-ags to perform
their work (e.g. because a different AG is offline being repaired
and so we cannot free the post-eof blocks without blocking on that
offline AG). However, we don't want to skip inode cache walks just
because an AG is not allowing new allocations to be made in it....

> The code changes look all right.  If the answers to the above questions
> are "yes", "yes", "yes", and "no", then:
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

The answers are a whole lot more nuanced than that, unfortunately.
Which means that some of the repair infrastructure will need to be
done differently as the state changes for shrink are introduced. I
don't think there's any show-stoppers here, though.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 00/42] xfs: per-ag centric allocation alogrithms
  2023-02-02  0:14 ` [PATCH 00/42] xfs: per-ag centric allocation alogrithms Darrick J. Wong
@ 2023-02-06 23:13   ` Dave Chinner
  0 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-02-06 23:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Feb 01, 2023 at 04:14:21PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 19, 2023 at 09:44:23AM +1100, Dave Chinner wrote:
> > This series continues the work towards making shrinking a filesystem
> > possible.  We need to be able to stop operations from taking place
> > on AGs that need to be removed by a shrink, so before shrink can be
> > implemented we need to have the infrastructure in place to prevent
> > incursion into AGs that are going to be, or are in the process, of
> > being removed from active duty.
> > 
> > The focus of this is making operations that depend on access to AGs
> > use the perag to access and pin the AG in active use, thereby
> > creating a barrier we can use to delay shrink until all active uses
> > of an AG have been drained and new uses are prevented.
> > 
> > This series starts by fixing some existing issues that are exposed
> > by changes later in the series. They stand alone, so can be picked
> > up independently of the rest of this patchset.
> 
> Hmm if I had to pick up only the bugfixes, which patches are those?
> Patches 1-3 look like bug fixes, 4-6 might be but might not be?

1-3 are bug fixes. 4-6 are dependent on 1 and they expand the range
of AGs that can be allocated in when a single AG is at ENOSPC. We
have had users reporting premature ENOSPC being reported to
applications in this exact situation in the past (maybe half a dozen
in the past decade or so?), so it is a bug fix of sorts. It's not a
critical bug fix, though, as it's not a common problem.

.....
> 
> For all the patches that I have not sent replies to,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> 
> IIRC that's patches 1-6, 8, 10-13, 16, 18-19, 24-27, and 30-40.

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 14/42] xfs: introduce xfs_for_each_perag_wrap()
  2023-01-23  5:41   ` Allison Henderson
@ 2023-02-06 23:14     ` Dave Chinner
  0 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-02-06 23:14 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Mon, Jan 23, 2023 at 05:41:09AM +0000, Allison Henderson wrote:
> On Thu, 2023-01-19 at 09:44 +1100, Dave Chinner wrote:
> > @@ -3218,21 +3214,21 @@ xfs_bmap_btalloc_select_lengths(
> >         }
> >  
> >         args->total = ap->total;
> > -       startag = ag = XFS_FSB_TO_AGNO(mp, args->fsbno);
> > +       startag = XFS_FSB_TO_AGNO(mp, args->fsbno);
> >         if (startag == NULLAGNUMBER)
> > -               startag = ag = 0;
> > +               startag = 0;
> >  
> > -       while (*blen < args->maxlen) {
> > -               error = xfs_bmap_longest_free_extent(args->tp, ag,
> > blen,
> > +       *blen = 0;
> > +       for_each_perag_wrap(mp, startag, agno, pag) {
> > +               error = xfs_bmap_longest_free_extent(pag, args->tp,
> > blen,
> >                                                      &notinit);
> >                 if (error)
> > -                       return error;
> > -
> > -               if (++ag == mp->m_sb.sb_agcount)
> > -                       ag = 0;
> > -               if (ag == startag)
> > +                       break;
> > +               if (*blen >= args->maxlen)
> >                         break;
> >         }
> > +       if (pag)
> > +               xfs_perag_rele(pag);
> >  
> >         xfs_bmap_select_minlen(ap, args, blen, notinit);
> >         return 0;
> Hmm, did you want to return error here?  Since now we only break on
> error in the loop body above?

Yup, good catch Allison, that needs fixing.
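The corrected loop shape - propagating the error out of the wrapping
walk instead of always returning 0 - can be modelled in userspace C.
walk_ags_wrap() and record_visit() are illustrative names, not the
kernel helpers:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of a wrapping AG walk: visit every AG exactly
 * once, starting at start_agno and wrapping at agcount, and return
 * the first error from the loop body rather than swallowing it.
 */
typedef int (*ag_visit_fn)(unsigned int agno, void *priv);

static int walk_ags_wrap(unsigned int agcount, unsigned int start_agno,
			 ag_visit_fn fn, void *priv)
{
	unsigned int i;

	for (i = 0; i < agcount; i++) {
		unsigned int agno = (start_agno + i) % agcount;
		int error = fn(agno, priv);

		if (error)
			return error;	/* don't mask the error */
	}
	return 0;
}

/* Example visitor: record visit order; fail at a chosen AG. */
struct walk_state {
	unsigned int order[8];
	unsigned int count;
	unsigned int fail_at;	/* agno that returns an error, or -1u */
};

static int record_visit(unsigned int agno, void *priv)
{
	struct walk_state *ws = priv;

	if (agno == ws->fail_at)
		return -5;	/* stand-in for a negative errno */
	ws->order[ws->count++] = agno;
	return 0;
}
```

Starting at AG 2 of 4 visits 2, 3, 0, 1; an error at any point stops
the walk and is handed back to the caller.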

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 20/42] xfs: use xfs_alloc_vextent_first_ag() where appropriate
  2023-02-01 22:43   ` Darrick J. Wong
@ 2023-02-06 23:16     ` Dave Chinner
  0 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-02-06 23:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Feb 01, 2023 at 02:43:13PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 19, 2023 at 09:44:43AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Change obvious callers of single AG allocation to use
> > xfs_alloc_vextent_first_ag(). This gets rid of
> > XFS_ALLOCTYPE_FIRST_AG as the type used within
> > xfs_alloc_vextent_first_ag() during iteration is _THIS_AG. Hence we
> > can remove the setting of args->type from all the callers of
> > _first_ag() and remove the alloctype.
> > 
> > While doing this, pass the allocation target fsb as a parameter
> > rather than encoding it in args->fsbno. This starts the process
> > of making args->fsbno an output only variable rather than
> > input/output.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 35 +++++++++++++++++++----------------
> >  fs/xfs/libxfs/xfs_alloc.h | 10 ++++++++--
> >  fs/xfs/libxfs/xfs_bmap.c  | 31 ++++++++++++++++---------------
> >  3 files changed, 43 insertions(+), 33 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 28b79facf2e3..186ce3aee9e0 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -3183,7 +3183,8 @@ xfs_alloc_read_agf(
> >   */
> >  static int
> >  xfs_alloc_vextent_check_args(
> > -	struct xfs_alloc_arg	*args)
> > +	struct xfs_alloc_arg	*args,
> > +	xfs_rfsblock_t		target)
> 
> Isn't xfs_rfsblock_t supposed to be used to measure quantities of raw fs
> blocks, and not the segmented agno/agbno numbers that we encode in most
> places?

Yup, just a minor braino. I'll fix those.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 29/42] xfs: convert trim to use for_each_perag_range
  2023-02-01 23:15   ` Darrick J. Wong
@ 2023-02-06 23:19     ` Dave Chinner
  0 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-02-06 23:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Feb 01, 2023 at 03:15:15PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 19, 2023 at 09:44:52AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > To convert it to using active perag references and hence make it
> > shrink safe.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_discard.c | 50 ++++++++++++++++++++------------------------
> >  1 file changed, 23 insertions(+), 27 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> > index bfc829c07f03..afc4c78b9eed 100644
> > --- a/fs/xfs/xfs_discard.c
> > +++ b/fs/xfs/xfs_discard.c
> > @@ -21,23 +21,20 @@
> >  
> >  STATIC int
> >  xfs_trim_extents(
> > -	struct xfs_mount	*mp,
> > -	xfs_agnumber_t		agno,
> > +	struct xfs_perag	*pag,
> >  	xfs_daddr_t		start,
> >  	xfs_daddr_t		end,
> >  	xfs_daddr_t		minlen,
> >  	uint64_t		*blocks_trimmed)
> >  {
> > +	struct xfs_mount	*mp = pag->pag_mount;
> >  	struct block_device	*bdev = mp->m_ddev_targp->bt_bdev;
> >  	struct xfs_btree_cur	*cur;
> >  	struct xfs_buf		*agbp;
> >  	struct xfs_agf		*agf;
> > -	struct xfs_perag	*pag;
> >  	int			error;
> >  	int			i;
> >  
> > -	pag = xfs_perag_get(mp, agno);
> > -
> >  	/*
> >  	 * Force out the log.  This means any transactions that might have freed
> 
> This is a tangent, but one thing I've wondered is if it's really
> necessary to force the log for *every* AG that we want to trim?  Even if
> we've just come from trimming the previous AG?

I suspect the thought behind this is that TRIM operations can be
really slow, so there can be a big build-up of new busy extents as a
large fragmented AG is trimmed.

I don't think it really matters at this point - if you are running a
multi-AG trim range, a few extra log forces is the least of your
performance worries. If someone reports it as a perf problem, let's
look at it then....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 41/42] xfs: return a referenced perag from filestreams allocator
  2023-02-02  0:01   ` Darrick J. Wong
@ 2023-02-06 23:22     ` Dave Chinner
  0 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-02-06 23:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Feb 01, 2023 at 04:01:24PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 19, 2023 at 09:45:04AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Now that the filestreams AG selection tracks active perags, we need
> > to return an active perag to the core allocator code. This is
> > because the allocations the filestreams code will run are
> > AG-specific and so need to pin the AG until they complete.
> > 
> > We cannot rely on the filestreams item reference to do this - the
> > filestreams association can be torn down at any time, hence we
> > need to have a separate reference for the allocation process to pin
> > the AG after it has been selected.
> > 
> > This means there is some perag juggling in allocation failure
> > fallback paths as they will do all AG scans in the case the AG
> > specific allocation fails. Hence we need to track the perag
> > reference that the filestream allocator returned to make sure we
> > don't leak it on repeated allocation failure.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c | 38 +++++++++++-----
> >  fs/xfs/xfs_filestream.c  | 93 ++++++++++++++++++++++++----------------
> >  2 files changed, 84 insertions(+), 47 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 098b46f3f3e3..7f56002b545d 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -3427,6 +3427,7 @@ xfs_bmap_btalloc_at_eof(
> >  	bool			ag_only)
> >  {
> >  	struct xfs_mount	*mp = args->mp;
> > +	struct xfs_perag	*caller_pag = args->pag;
> >  	int			error;
> >  
> >  	/*
> > @@ -3454,9 +3455,11 @@ xfs_bmap_btalloc_at_eof(
> >  		else
> >  			args->minalignslop = 0;
> >  
> > -		args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> > +		if (!caller_pag)
> > +			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
> >  		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> > -		xfs_perag_put(args->pag);
> > +		if (!caller_pag)
> > +			xfs_perag_put(args->pag);
> >  		if (error)
> >  			return error;
> >  
> > @@ -3482,10 +3485,13 @@ xfs_bmap_btalloc_at_eof(
> >  		args->minalignslop = 0;
> >  	}
> >  
> > -	if (ag_only)
> > +	if (ag_only) {
> >  		error = xfs_alloc_vextent_near_bno(args, ap->blkno);
> > -	else
> > +	} else {
> > +		args->pag = NULL;
> >  		error = xfs_alloc_vextent_start_ag(args, ap->blkno);
> > +		args->pag = caller_pag;
> 
> At first glance I wondered if we end up leaking any args->pag set by the
> _iterate_ags function, but I think it's the case that _finish will
> release args->pag and set it back to NULL?

*nod*

> So in effect we're
> preserving the caller's args->pag here, and nothing leaks.  In that
> case, I think we should check that assumption:
> 
> 		ASSERT(args->pag == NULL);
> 		args->pag = caller_pag;

Sure. I'm going to try to remove this conditional caller_pag
situation as we get further down the "per-ags everywhere" hole, but
for the moment this is a necessary quirk...
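The conditional caller_pag quirk discussed above boils down to one
pattern: take a temporary reference only when the caller did not pass
one in, and restore the caller's pointer before returning so that a
caller-pinned AG is neither dropped nor leaked. A hedged userspace
sketch (all names illustrative, not the kernel API):

```c
#include <assert.h>
#include <stddef.h>

struct perag { int refs; };

static void perag_get(struct perag *pag) { pag->refs++; }
static void perag_put(struct perag *pag) { pag->refs--; }

struct alloc_args {
	struct perag *pag;	/* may be pre-set by the caller */
	struct perag *ag_pool;	/* stand-in for the AG lookup result */
};

/*
 * Take a temporary reference only when the caller did not supply
 * one, and restore the caller's pointer on the way out, so any
 * reference the caller pinned survives the call unchanged.
 */
static int alloc_with_optional_ref(struct alloc_args *args)
{
	struct perag *caller_pag = args->pag;
	int error = 0;

	if (!caller_pag) {
		args->pag = args->ag_pool;
		perag_get(args->pag);
	}
	/* ... allocation against args->pag would happen here ... */
	if (!caller_pag)
		perag_put(args->pag);
	args->pag = caller_pag;
	return error;
}
```

Either way the function is called, args->pag and the reference counts
are exactly as the caller left them when it returns.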

-Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 42/42] xfs: refactor the filestreams allocator pick functions
  2023-02-02  0:08   ` Darrick J. Wong
@ 2023-02-06 23:26     ` Dave Chinner
  0 siblings, 0 replies; 77+ messages in thread
From: Dave Chinner @ 2023-02-06 23:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Feb 01, 2023 at 04:08:48PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 19, 2023 at 09:45:05AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Now that the filestreams allocator is largely rewritten,
> > restructure the main entry point and pick function to separate out
> > the different operations cleanly. The MRU lookup function should not
> > handle the start AG selection on MRU lookup failure, nor should
> > the pick function handle building the association that is inserted
> > into the MRU.
> > 
> > This leaves the filestreams allocator fairly clean and easy to
> > understand, returning to the caller with an active perag reference
> > and a target block to allocate at.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_filestream.c | 247 +++++++++++++++++++++-------------------
> >  fs/xfs/xfs_trace.h      |   9 +-
> >  2 files changed, 132 insertions(+), 124 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> > index 523a3b8b5754..0a1d316ebdba 100644
> > --- a/fs/xfs/xfs_filestream.c
> > +++ b/fs/xfs/xfs_filestream.c
> > @@ -48,19 +48,19 @@ xfs_fstrm_free_func(
> >  }
> >  
> >  /*
> > - * Scan the AGs starting at startag looking for an AG that isn't in use and has
> > - * at least minlen blocks free.
> > + * Scan the AGs starting at start_agno looking for an AG that isn't in use and
> > + * has at least minlen blocks free. If no AG is found to match the allocation
> > + * requirements, pick the AG with the most free space in it.
> >   */
> >  static int
> >  xfs_filestream_pick_ag(
> >  	struct xfs_alloc_arg	*args,
> > -	struct xfs_inode	*ip,
> > +	xfs_ino_t		pino,
> >  	xfs_agnumber_t		start_agno,
> >  	int			flags,
> >  	xfs_extlen_t		*longest)
> >  {
> > -	struct xfs_mount	*mp = ip->i_mount;
> > -	struct xfs_fstrm_item	*item;
> > +	struct xfs_mount	*mp = args->mp;
> >  	struct xfs_perag	*pag;
> >  	struct xfs_perag	*max_pag = NULL;
> >  	xfs_extlen_t		minlen = *longest;
> > @@ -68,8 +68,6 @@ xfs_filestream_pick_ag(
> >  	xfs_agnumber_t		agno;
> >  	int			err, trylock;
> 
> Who consumes trylock?  Is this supposed to get passed through to
> xfs_bmap_longest_free_extent, or is the goal here merely to run the
> for_each_perag_wrap loop twice before going for the most free or any old
> perag?

It was originally used in this loop for directing the AGF locking,
but it looks like I removed all the cases where we directly read
and lock AGFs in this loop. Hence it's now only used to run the loop
a second time. I'll change it to a boolean flag instead.
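The boolean-flag version of that two-pass pick can be sketched in
userspace C. This is a simplified model of the selection policy only
(strict pass, then a relaxed pass, then a most-free-space fallback);
pick_ag() and struct ag_info are illustrative names, not the kernel
code:

```c
#include <assert.h>
#include <stdbool.h>

struct ag_info {
	unsigned int free_blocks;
	bool in_use;	/* already owned by another filestream */
};

static int pick_ag(const struct ag_info *ags, unsigned int agcount,
		   unsigned int minfree)
{
	bool relaxed = false;	/* replaces the old trylock int */
	unsigned int best = 0;
	unsigned int agno;

	/* Track the AG with the most free space as a last resort. */
	for (agno = 0; agno < agcount; agno++)
		if (ags[agno].free_blocks > ags[best].free_blocks)
			best = agno;
restart:
	for (agno = 0; agno < agcount; agno++) {
		if (!relaxed && ags[agno].in_use)
			continue;
		if (ags[agno].free_blocks >= minfree)
			return (int)agno;
	}
	if (!relaxed) {
		relaxed = true;	/* second pass: allow in-use AGs */
		goto restart;
	}
	return (int)best;	/* fall back to most free space */
}
```

Running the loop "a second time" is now just flipping one flag, which
is all the old trylock variable was still being used for.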

-Dave.

-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2023-02-06 23:26 UTC | newest]

Thread overview: 77+ messages
2023-01-18 22:44 [PATCH 00/42] xfs: per-ag centric allocation alogrithms Dave Chinner
2023-01-18 22:44 ` [PATCH 01/42] xfs: fix low space alloc deadlock Dave Chinner
2023-01-19 16:39   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 02/42] xfs: prefer free inodes at ENOSPC over chunk allocation Dave Chinner
2023-01-19 19:08   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 03/42] xfs: block reservation too large for minleft allocation Dave Chinner
2023-01-19 20:38   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 04/42] xfs: drop firstblock constraints from allocation setup Dave Chinner
2023-01-19 22:03   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 05/42] xfs: t_firstblock is tracking AGs not blocks Dave Chinner
2023-01-19 22:12   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 06/42] xfs: don't assert fail on transaction cancel with deferred ops Dave Chinner
2023-01-19 22:18   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 07/42] xfs: active perag reference counting Dave Chinner
2023-01-21  5:16   ` Allison Henderson
2023-02-01 19:08   ` Darrick J. Wong
2023-02-06 22:56     ` Dave Chinner
2023-01-18 22:44 ` [PATCH 08/42] xfs: rework the perag trace points to be perag centric Dave Chinner
2023-01-21  5:16   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 09/42] xfs: convert xfs_imap() to take a perag Dave Chinner
2023-02-01 19:10   ` Darrick J. Wong
2023-01-18 22:44 ` [PATCH 10/42] xfs: use active perag references for inode allocation Dave Chinner
2023-01-22  6:48   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 11/42] xfs: inobt can use perags in many more places than it does Dave Chinner
2023-01-22  6:48   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 12/42] xfs: convert xfs_ialloc_next_ag() to an atomic Dave Chinner
2023-01-22  7:03   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 13/42] xfs: perags need atomic operational state Dave Chinner
2023-01-23  4:04   ` Allison Henderson
2023-01-18 22:44 ` [PATCH 14/42] xfs: introduce xfs_for_each_perag_wrap() Dave Chinner
2023-01-23  5:41   ` Allison Henderson
2023-02-06 23:14     ` Dave Chinner
2023-02-01 19:28   ` Darrick J. Wong
2023-01-18 22:44 ` [PATCH 15/42] xfs: rework xfs_alloc_vextent() Dave Chinner
2023-02-01 19:39   ` Darrick J. Wong
2023-01-18 22:44 ` [PATCH 16/42] xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags() Dave Chinner
2023-01-18 22:44 ` [PATCH 17/42] xfs: combine __xfs_alloc_vextent_this_ag and xfs_alloc_ag_vextent Dave Chinner
2023-02-01 22:25   ` Darrick J. Wong
2023-01-18 22:44 ` [PATCH 18/42] xfs: use xfs_alloc_vextent_this_ag() where appropriate Dave Chinner
2023-01-18 22:44 ` [PATCH 19/42] xfs: factor xfs_bmap_btalloc() Dave Chinner
2023-01-18 22:44 ` [PATCH 20/42] xfs: use xfs_alloc_vextent_first_ag() where appropriate Dave Chinner
2023-02-01 22:43   ` Darrick J. Wong
2023-02-06 23:16     ` Dave Chinner
2023-01-18 22:44 ` [PATCH 21/42] xfs: use xfs_alloc_vextent_start_bno() " Dave Chinner
2023-02-01 22:51   ` Darrick J. Wong
2023-01-18 22:44 ` [PATCH 22/42] xfs: introduce xfs_alloc_vextent_near_bno() Dave Chinner
2023-02-01 22:52   ` Darrick J. Wong
2023-01-18 22:44 ` [PATCH 23/42] xfs: introduce xfs_alloc_vextent_exact_bno() Dave Chinner
2023-02-01 23:00   ` Darrick J. Wong
2023-01-18 22:44 ` [PATCH 24/42] xfs: introduce xfs_alloc_vextent_prepare() Dave Chinner
2023-01-18 22:44 ` [PATCH 25/42] xfs: move allocation accounting to xfs_alloc_vextent_set_fsbno() Dave Chinner
2023-01-18 22:44 ` [PATCH 26/42] xfs: fold xfs_alloc_ag_vextent() into callers Dave Chinner
2023-01-18 22:44 ` [PATCH 27/42] xfs: move the minimum agno checks into xfs_alloc_vextent_check_args Dave Chinner
2023-01-18 22:44 ` [PATCH 28/42] xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walker Dave Chinner
2023-02-01 23:13   ` Darrick J. Wong
2023-01-18 22:44 ` [PATCH 29/42] xfs: convert trim to use for_each_perag_range Dave Chinner
2023-02-01 23:15   ` Darrick J. Wong
2023-02-06 23:19     ` Dave Chinner
2023-01-18 22:44 ` [PATCH 30/42] xfs: factor out filestreams from xfs_bmap_btalloc_nullfb Dave Chinner
2023-01-18 22:44 ` [PATCH 31/42] xfs: get rid of notinit from xfs_bmap_longest_free_extent Dave Chinner
2023-01-18 22:44 ` [PATCH 32/42] xfs: use xfs_bmap_longest_free_extent() in filestreams Dave Chinner
2023-01-18 22:44 ` [PATCH 33/42] xfs: move xfs_bmap_btalloc_filestreams() to xfs_filestreams.c Dave Chinner
2023-01-18 22:44 ` [PATCH 34/42] xfs: merge filestream AG lookup into xfs_filestream_select_ag() Dave Chinner
2023-01-18 22:44 ` [PATCH 35/42] xfs: merge new filestream AG selection " Dave Chinner
2023-01-18 22:44 ` [PATCH 36/42] xfs: remove xfs_filestream_select_ag() longest extent check Dave Chinner
2023-01-18 22:45 ` [PATCH 37/42] xfs: factor out MRU hit case in xfs_filestream_select_ag Dave Chinner
2023-01-18 22:45 ` [PATCH 38/42] xfs: track an active perag reference in filestreams Dave Chinner
2023-01-18 22:45 ` [PATCH 39/42] xfs: use for_each_perag_wrap in xfs_filestream_pick_ag Dave Chinner
2023-01-18 22:45 ` [PATCH 40/42] xfs: pass perag to filestreams tracing Dave Chinner
2023-01-18 22:45 ` [PATCH 41/42] xfs: return a referenced perag from filestreams allocator Dave Chinner
2023-02-02  0:01   ` Darrick J. Wong
2023-02-06 23:22     ` Dave Chinner
2023-01-18 22:45 ` [PATCH 42/42] xfs: refactor the filestreams allocator pick functions Dave Chinner
2023-02-02  0:08   ` Darrick J. Wong
2023-02-06 23:26     ` Dave Chinner
2023-02-02  0:14 ` [PATCH 00/42] xfs: per-ag centric allocation alogrithms Darrick J. Wong
2023-02-06 23:13   ` Dave Chinner
