All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: don't include bnobt blocks when reserving free block pool
@ 2022-03-14 18:08 Darrick J. Wong
  2022-03-16 11:28 ` Brian Foster
  0 siblings, 1 reply; 14+ messages in thread
From: Darrick J. Wong @ 2022-03-14 18:08 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

From: Darrick J. Wong <djwong@kernel.org>

xfs_reserve_blocks controls the size of the user-visible free space
reserve pool.  Given the difference between the current and requested
pool sizes, it will try to reserve free space from fdblocks.  However,
the amount requested from fdblocks is also constrained by the amount of
space that we think xfs_mod_fdblocks will give us.  We'll keep trying to
reserve space so long as xfs_mod_fdblocks returns ENOSPC.

In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand
out the "free space" used by the free space btrees, because some portion
of the free space btrees hold in reserve space for future btree
expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
of blocks that it could request from xfs_mod_fdblocks was not updated to
include m_allocbt_blks, so if space is extremely low, the caller hangs.

Fix this by including m_allocbt_blks in the estimation, and modify the
loop so that it will not retry infinitely.

Found by running xfs/306 (which formats a single-AG 20MB filesystem)
with an fstests configuration that specifies a 1k blocksize and a
specially crafted log size that will consume 7/8 of the space (17920
blocks, specifically) in that AG.

Cc: Brian Foster <bfoster@redhat.com>
Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 33e26690a8c4..78b6982ea5b0 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -379,6 +379,7 @@ xfs_reserve_blocks(
 	int64_t			fdblks_delta = 0;
 	uint64_t		request;
 	int64_t			free;
+	unsigned int		tries;
 	int			error = 0;
 
 	/* If inval is null, report current values and return */
@@ -432,9 +433,16 @@ xfs_reserve_blocks(
 	 * perform a partial reservation if the request exceeds free space.
 	 */
 	error = -ENOSPC;
-	do {
-		free = percpu_counter_sum(&mp->m_fdblocks) -
-						mp->m_alloc_set_aside;
+	for (tries = 0; tries < 30 && error == -ENOSPC; tries++) {
+		/*
+		 * The reservation pool cannot take space that xfs_mod_fdblocks
+		 * will not give us.  This includes the per-AG set-aside space
+		 * and free space btree blocks that are not available for
+		 * allocation due to per-AG metadata reservations.
+		 */
+		free = percpu_counter_sum(&mp->m_fdblocks);
+		free -= mp->m_alloc_set_aside;
+		free -= atomic64_read(&mp->m_allocbt_blks);
 		if (free <= 0)
 			break;
 
@@ -459,7 +467,7 @@ xfs_reserve_blocks(
 		spin_unlock(&mp->m_sb_lock);
 		error = xfs_mod_fdblocks(mp, -fdblks_delta, 0);
 		spin_lock(&mp->m_sb_lock);
-	} while (error == -ENOSPC);
+	}
 
 	/*
 	 * Update the reserve counters if blocks have been successfully

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-03-17 17:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-14 18:08 [PATCH] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
2022-03-16 11:28 ` Brian Foster
2022-03-16 16:32   ` Darrick J. Wong
2022-03-16 17:29     ` Brian Foster
2022-03-16 18:17       ` Darrick J. Wong
2022-03-16 18:48         ` Brian Foster
2022-03-16 19:17           ` Brian Foster
2022-03-16 21:15             ` Darrick J. Wong
2022-03-17  2:19               ` Dave Chinner
2022-03-17 12:53               ` Brian Foster
2022-03-17  2:05         ` Dave Chinner
2022-03-17 12:56           ` Brian Foster
2022-03-17 15:46             ` Darrick J. Wong
2022-03-17 17:05           ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.