All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting
@ 2022-03-27 16:58 Darrick J. Wong
  2022-03-27 16:58 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Darrick J. Wong @ 2022-03-27 16:58 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Brian Foster, linux-xfs, bfoster, david

Hi all,

Last week, I was running xfs/306 (which formats a single-AG 20MB
filesystem) with an fstests configuration that specifies a 1k blocksize
and a specially crafted log size that will consume 7/8 of the space
(17920 blocks, specifically) in that AG.  This resulted mount hanging on
the infinite loop in xfs_reserve_blocks because we overestimate the
number of blocks that xfs_mod_fdblocks will give us, and the code
retries forever.

v2: Initially, this was a single patch that fixed the calculation and
transformed the loop into a finite loop.  However, further discussion
revealed several more issues:

 * People had a hard time grokking that the "alloc_set_aside" is
   actually the amount of space we hide so that a bmbt split will always
   succeed;
 * The author didn't understand what XFS_ALLOC_AGFL_RESERVE actually
   mean or whether it was correctly calculated;
 * The "alloc_set_aside" underestimated the blocks needed to handle any
   bmap btree split;
 * Both statfs and XFS_IOC_FSCOUNTS forgot to exclude the amount of
   space used by the free space btrees on behalf of per-AG reservations,
   leading to overreporting of available space;
 * The loop control change really should have been separate.

Hence, this patchset has now grown to six patches to address each of
these issues separately.

v3: only add one helper for calculating the total fdblocks setaside and
tighten some documentation

v4: It turns out I was wrong about the purpose of alloc_set_aside, so
fix the new comments in the first patch, delete the last patch, and
leave the "btree split calculations" for another patchset.  Eliminate
the looping behavior in xfs_reserve_blocks, and fix a case where we
could race to set the pool size and thereby overfill it.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=reserve-block-fixes-5.18
---
 fs/xfs/libxfs/xfs_alloc.c |   28 +++++++++++++++++++----
 fs/xfs/libxfs/xfs_alloc.h |    1 -
 fs/xfs/xfs_fsops.c        |   54 ++++++++++++++++++---------------------------
 fs/xfs/xfs_mount.c        |    2 +-
 fs/xfs/xfs_mount.h        |   15 +++++++++++++
 fs/xfs/xfs_super.c        |    3 ++-
 6 files changed, 63 insertions(+), 40 deletions(-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant
  2022-03-27 16:58 [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
@ 2022-03-27 16:58 ` Darrick J. Wong
  2022-03-28  1:18   ` Dave Chinner
  2022-04-01  5:51   ` Christoph Hellwig
  2022-03-27 16:58 ` [PATCH 2/6] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Darrick J. Wong @ 2022-03-27 16:58 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, bfoster, david

From: Darrick J. Wong <djwong@kernel.org>

Currently, we use this undocumented macro to encode the minimum number
of blocks needed to replenish a completely empty AGFL when an AG is
nearly full.  This has lead to confusion on the part of the maintainers,
so let's document what the value actually means, and move it to
xfs_alloc.c since it's not used outside of that module.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_alloc.c |   28 +++++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_alloc.h |    1 -
 2 files changed, 23 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 353e53b892e6..b52ed339727f 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -82,6 +82,24 @@ xfs_prealloc_blocks(
 }
 
 /*
+ * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
+ * guarantee that we can refill the AGFL prior to allocating space in a nearly
+ * full AG.  Although the the space described by the free space btrees, the
+ * blocks used by the freesp btrees themselves, and the blocks owned by the
+ * AGFL are counted in the ondisk fdblocks, it's a mistake to let the ondisk
+ * free space in the AG drop so low that the free space btrees cannot refill an
+ * empty AGFL up to the minimum level.  Rather than grind through empty AGs
+ * until the fs goes down, we subtract this many AG blocks from the incore
+ * fdblocks to ensure user allocation does not overcommit the space the
+ * filesystem needs for the AGFLs.  The rmap btree uses a per-AG reservation to
+ * withhold space from xfs_mod_fdblocks, so we do not account for that here.
+ */
+#define XFS_ALLOCBT_AGFL_RESERVE	4
+
+/*
+ * Compute the number of blocks that we set aside to guarantee the ability to
+ * refill the AGFL and handle a full bmap btree split.
+ *
  * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
  * AGF buffer (PV 947395), we place constraints on the relationship among
  * actual allocations for data blocks, freelist blocks, and potential file data
@@ -93,14 +111,14 @@ xfs_prealloc_blocks(
  * extents need to be actually allocated. To get around this, we explicitly set
  * aside a few blocks which will not be reserved in delayed allocation.
  *
- * We need to reserve 4 fsbs _per AG_ for the freelist and 4 more to handle a
- * potential split of the file's bmap btree.
+ * For each AG, we need to reserve enough blocks to replenish a totally empty
+ * AGFL and 4 more to handle a potential split of the file's bmap btree.
  */
 unsigned int
 xfs_alloc_set_aside(
 	struct xfs_mount	*mp)
 {
-	return mp->m_sb.sb_agcount * (XFS_ALLOC_AGFL_RESERVE + 4);
+	return mp->m_sb.sb_agcount * (XFS_ALLOCBT_AGFL_RESERVE + 4);
 }
 
 /*
@@ -124,12 +142,12 @@ xfs_alloc_ag_max_usable(
 	unsigned int		blocks;
 
 	blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
-	blocks += XFS_ALLOC_AGFL_RESERVE;
+	blocks += XFS_ALLOCBT_AGFL_RESERVE;
 	blocks += 3;			/* AGF, AGI btree root blocks */
 	if (xfs_has_finobt(mp))
 		blocks++;		/* finobt root block */
 	if (xfs_has_rmapbt(mp))
-		blocks++; 		/* rmap root block */
+		blocks++;		/* rmap root block */
 	if (xfs_has_reflink(mp))
 		blocks++;		/* refcount root block */
 
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 1c14a0b1abea..d4c057b764f9 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -88,7 +88,6 @@ typedef struct xfs_alloc_arg {
 #define XFS_ALLOC_NOBUSY		(1 << 2)/* Busy extents not allowed */
 
 /* freespace limit calculations */
-#define XFS_ALLOC_AGFL_RESERVE	4
 unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
 unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
 


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/6] xfs: don't include bnobt blocks when reserving free block pool
  2022-03-27 16:58 [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
  2022-03-27 16:58 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
@ 2022-03-27 16:58 ` Darrick J. Wong
  2022-04-06 16:39   ` Christoph Hellwig
  2022-03-27 16:58 ` [PATCH 3/6] xfs: remove infinite loop " Darrick J. Wong
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2022-03-27 16:58 UTC (permalink / raw)
  To: djwong
  Cc: Brian Foster, Brian Foster, Dave Chinner, linux-xfs, bfoster, david

From: Darrick J. Wong <djwong@kernel.org>

xfs_reserve_blocks controls the size of the user-visible free space
reserve pool.  Given the difference between the current and requested
pool sizes, it will try to reserve free space from fdblocks.  However,
the amount requested from fdblocks is also constrained by the amount of
space that we think xfs_mod_fdblocks will give us.  If we forget to
subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will will
return ENOSPC and we'll hang the kernel at mount due to the infinite
loop.

In commit fd43cf600cf6, we decided that xfs_mod_fdblocks should not hand
out the "free space" used by the free space btrees, because some portion
of the free space btrees hold in reserve space for future btree
expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
of blocks that it could request from xfs_mod_fdblocks was not updated to
include m_allocbt_blks, so if space is extremely low, the caller hangs.

Fix this by creating a function to estimate the number of blocks that
can be reserved from fdblocks, which needs to exclude the set-aside and
m_allocbt_blks.

Found by running xfs/306 (which formats a single-AG 20MB filesystem)
with an fstests configuration that specifies a 1k blocksize and a
specially crafted log size that will consume 7/8 of the space (17920
blocks, specifically) in that AG.

Cc: Brian Foster <bfoster@redhat.com>
Fixes: fd43cf600cf6 ("xfs: set aside allocation btree blocks from block reservation")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c |    2 +-
 fs/xfs/xfs_mount.c |    2 +-
 fs/xfs/xfs_mount.h |   15 +++++++++++++++
 3 files changed, 17 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 33e26690a8c4..710e857bb825 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -434,7 +434,7 @@ xfs_reserve_blocks(
 	error = -ENOSPC;
 	do {
 		free = percpu_counter_sum(&mp->m_fdblocks) -
-						mp->m_alloc_set_aside;
+						xfs_fdblocks_unavailable(mp);
 		if (free <= 0)
 			break;
 
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index bed73e8002a5..29ffa8c42795 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1146,7 +1146,7 @@ xfs_mod_fdblocks(
 	 * problems (i.e. transaction abort, pagecache discards, etc.) than
 	 * slightly premature -ENOSPC.
 	 */
-	set_aside = mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
+	set_aside = xfs_fdblocks_unavailable(mp);
 	percpu_counter_add_batch(&mp->m_fdblocks, delta, batch);
 	if (__percpu_counter_compare(&mp->m_fdblocks, set_aside,
 				     XFS_FDBLOCKS_BATCH) >= 0) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 00720a02e761..f6dc19de8322 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -479,6 +479,21 @@ extern void	xfs_unmountfs(xfs_mount_t *);
  */
 #define XFS_FDBLOCKS_BATCH	1024
 
+/*
+ * Estimate the amount of free space that is not available to userspace and is
+ * not explicitly reserved from the incore fdblocks.  This includes:
+ *
+ * - The minimum number of blocks needed to support splitting a bmap btree
+ * - The blocks currently in use by the freespace btrees because they record
+ *   the actual blocks that will fill per-AG metadata space reservations
+ */
+static inline uint64_t
+xfs_fdblocks_unavailable(
+	struct xfs_mount	*mp)
+{
+	return mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
+}
+
 extern int	xfs_mod_fdblocks(struct xfs_mount *mp, int64_t delta,
 				 bool reserved);
 extern int	xfs_mod_frextents(struct xfs_mount *mp, int64_t delta);


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/6] xfs: remove infinite loop when reserving free block pool
  2022-03-27 16:58 [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
  2022-03-27 16:58 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
  2022-03-27 16:58 ` [PATCH 2/6] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
@ 2022-03-27 16:58 ` Darrick J. Wong
  2022-03-28  1:20   ` Dave Chinner
  2022-04-06 16:40   ` Christoph Hellwig
  2022-03-27 16:58 ` [PATCH 4/6] xfs: always succeed at setting the reserve pool size Darrick J. Wong
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Darrick J. Wong @ 2022-03-27 16:58 UTC (permalink / raw)
  To: djwong; +Cc: Brian Foster, linux-xfs, bfoster, david

From: Darrick J. Wong <djwong@kernel.org>

Infinite loops in kernel code are scary.  Calls to xfs_reserve_blocks
should be rare (people should just use the defaults!) so we really don't
need to try so hard.  Simplify the logic here by removing the infinite
loop.

Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c |   50 ++++++++++++++++++++------------------------------
 1 file changed, 20 insertions(+), 30 deletions(-)


diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 710e857bb825..3c6d9d6836ef 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -430,46 +430,36 @@ xfs_reserve_blocks(
 	 * If the request is larger than the current reservation, reserve the
 	 * blocks before we update the reserve counters. Sample m_fdblocks and
 	 * perform a partial reservation if the request exceeds free space.
+	 *
+	 * The code below estimates how many blocks it can request from
+	 * fdblocks to stash in the reserve pool.  This is a classic TOCTOU
+	 * race since fdblocks updates are not always coordinated via
+	 * m_sb_lock.
 	 */
-	error = -ENOSPC;
-	do {
-		free = percpu_counter_sum(&mp->m_fdblocks) -
+	free = percpu_counter_sum(&mp->m_fdblocks) -
 						xfs_fdblocks_unavailable(mp);
-		if (free <= 0)
-			break;
-
-		delta = request - mp->m_resblks;
-		lcounter = free - delta;
-		if (lcounter < 0)
-			/* We can't satisfy the request, just get what we can */
-			fdblks_delta = free;
-		else
-			fdblks_delta = delta;
-
+	delta = request - mp->m_resblks;
+	if (delta > 0 && free > 0) {
 		/*
 		 * We'll either succeed in getting space from the free block
-		 * count or we'll get an ENOSPC. If we get a ENOSPC, it means
-		 * things changed while we were calculating fdblks_delta and so
-		 * we should try again to see if there is anything left to
-		 * reserve.
-		 *
-		 * Don't set the reserved flag here - we don't want to reserve
-		 * the extra reserve blocks from the reserve.....
+		 * count or we'll get an ENOSPC.  Don't set the reserved flag
+		 * here - we don't want to reserve the extra reserve blocks
+		 * from the reserve.
 		 */
+		fdblks_delta = min(free, delta);
 		spin_unlock(&mp->m_sb_lock);
 		error = xfs_mod_fdblocks(mp, -fdblks_delta, 0);
 		spin_lock(&mp->m_sb_lock);
-	} while (error == -ENOSPC);
 
-	/*
-	 * Update the reserve counters if blocks have been successfully
-	 * allocated.
-	 */
-	if (!error && fdblks_delta) {
-		mp->m_resblks += fdblks_delta;
-		mp->m_resblks_avail += fdblks_delta;
+		/*
+		 * Update the reserve counters if blocks have been successfully
+		 * allocated.
+		 */
+		if (!error) {
+			mp->m_resblks += fdblks_delta;
+			mp->m_resblks_avail += fdblks_delta;
+		}
 	}
-
 out:
 	if (outval) {
 		outval->resblks = mp->m_resblks;


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/6] xfs: always succeed at setting the reserve pool size
  2022-03-27 16:58 [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
                   ` (2 preceding siblings ...)
  2022-03-27 16:58 ` [PATCH 3/6] xfs: remove infinite loop " Darrick J. Wong
@ 2022-03-27 16:58 ` Darrick J. Wong
  2022-03-28  1:21   ` Dave Chinner
  2022-04-06 16:40   ` Christoph Hellwig
  2022-03-27 16:58 ` [PATCH 5/6] xfs: fix overfilling of reserve pool Darrick J. Wong
  2022-03-27 16:58 ` [PATCH 6/6] xfs: don't report reserved bnobt space as available Darrick J. Wong
  5 siblings, 2 replies; 17+ messages in thread
From: Darrick J. Wong @ 2022-03-27 16:58 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, bfoster, david

From: Darrick J. Wong <djwong@kernel.org>

Nowadays, xfs_mod_fdblocks will always choose to fill the reserve pool
with freed blocks before adding to fdblocks.  Therefore, we can change
the behavior of xfs_reserve_blocks slightly -- setting the target size
of the pool should always succeed, since a deficiency will eventually
be made up as blocks get freed.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 3c6d9d6836ef..5c2bea1e12a8 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -434,11 +434,14 @@ xfs_reserve_blocks(
 	 * The code below estimates how many blocks it can request from
 	 * fdblocks to stash in the reserve pool.  This is a classic TOCTOU
 	 * race since fdblocks updates are not always coordinated via
-	 * m_sb_lock.
+	 * m_sb_lock.  Set the reserve size even if there's not enough free
+	 * space to fill it because mod_fdblocks will refill an undersized
+	 * reserve when it can.
 	 */
 	free = percpu_counter_sum(&mp->m_fdblocks) -
 						xfs_fdblocks_unavailable(mp);
 	delta = request - mp->m_resblks;
+	mp->m_resblks = request;
 	if (delta > 0 && free > 0) {
 		/*
 		 * We'll either succeed in getting space from the free block
@@ -455,10 +458,8 @@ xfs_reserve_blocks(
 		 * Update the reserve counters if blocks have been successfully
 		 * allocated.
 		 */
-		if (!error) {
-			mp->m_resblks += fdblks_delta;
+		if (!error)
 			mp->m_resblks_avail += fdblks_delta;
-		}
 	}
 out:
 	if (outval) {


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/6] xfs: fix overfilling of reserve pool
  2022-03-27 16:58 [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
                   ` (3 preceding siblings ...)
  2022-03-27 16:58 ` [PATCH 4/6] xfs: always succeed at setting the reserve pool size Darrick J. Wong
@ 2022-03-27 16:58 ` Darrick J. Wong
  2022-03-28  1:22   ` Dave Chinner
  2022-04-06 16:42   ` Christoph Hellwig
  2022-03-27 16:58 ` [PATCH 6/6] xfs: don't report reserved bnobt space as available Darrick J. Wong
  5 siblings, 2 replies; 17+ messages in thread
From: Darrick J. Wong @ 2022-03-27 16:58 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, bfoster, david

From: Darrick J. Wong <djwong@kernel.org>

Due to cycling of m_sb_lock, it's possible for multiple callers of
xfs_reserve_blocks to race at changing the pool size, subtracting blocks
from fdblocks, and actually putting it in the pool.  The result of all
this is that we can overfill the reserve pool to hilarious levels.

xfs_mod_fdblocks, when called with a positive value, already knows how
to take freed blocks and either fill the reserve until it's full, or put
them in fdblocks.  Use that instead of setting m_resblks_avail directly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 5c2bea1e12a8..5b5b68affe66 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -448,18 +448,17 @@ xfs_reserve_blocks(
 		 * count or we'll get an ENOSPC.  Don't set the reserved flag
 		 * here - we don't want to reserve the extra reserve blocks
 		 * from the reserve.
+		 *
+		 * The desired reserve size can change after we drop the lock.
+		 * Use mod_fdblocks to put the space into the reserve or into
+		 * fdblocks as appropriate.
 		 */
 		fdblks_delta = min(free, delta);
 		spin_unlock(&mp->m_sb_lock);
 		error = xfs_mod_fdblocks(mp, -fdblks_delta, 0);
-		spin_lock(&mp->m_sb_lock);
-
-		/*
-		 * Update the reserve counters if blocks have been successfully
-		 * allocated.
-		 */
 		if (!error)
-			mp->m_resblks_avail += fdblks_delta;
+			xfs_mod_fdblocks(mp, fdblks_delta, 0);
+		spin_lock(&mp->m_sb_lock);
 	}
 out:
 	if (outval) {


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/6] xfs: don't report reserved bnobt space as available
  2022-03-27 16:58 [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
                   ` (4 preceding siblings ...)
  2022-03-27 16:58 ` [PATCH 5/6] xfs: fix overfilling of reserve pool Darrick J. Wong
@ 2022-03-27 16:58 ` Darrick J. Wong
  2022-04-06 16:43   ` Christoph Hellwig
  5 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2022-03-27 16:58 UTC (permalink / raw)
  To: djwong; +Cc: Brian Foster, Dave Chinner, linux-xfs, bfoster, david

From: Darrick J. Wong <djwong@kernel.org>

On a modern filesystem, we don't allow userspace to allocate blocks for
data storage from the per-AG space reservations, the user-controlled
reservation pool that prevents ENOSPC in the middle of internal
operations, or the internal per-AG set-aside that prevents unwanted
filesystem shutdowns due to ENOSPC during a bmap btree split.

Since we now consider freespace btree blocks as unavailable for
allocation for data storage, we shouldn't report those blocks via statfs
either.  This makes the numbers that we return via the statfs f_bavail
and f_bfree fields a more conservative estimate of actual free space.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c |    2 +-
 fs/xfs/xfs_super.c |    3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 5b5b68affe66..196e2c51309c 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -347,7 +347,7 @@ xfs_fs_counts(
 	cnt->allocino = percpu_counter_read_positive(&mp->m_icount);
 	cnt->freeino = percpu_counter_read_positive(&mp->m_ifree);
 	cnt->freedata = percpu_counter_read_positive(&mp->m_fdblocks) -
-						mp->m_alloc_set_aside;
+						xfs_fdblocks_unavailable(mp);
 
 	spin_lock(&mp->m_sb_lock);
 	cnt->freertx = mp->m_sb.sb_frextents;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d84714e4e46a..54be9d64093e 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -815,7 +815,8 @@ xfs_fs_statfs(
 	spin_unlock(&mp->m_sb_lock);
 
 	/* make sure statp->f_bfree does not underflow */
-	statp->f_bfree = max_t(int64_t, fdblocks - mp->m_alloc_set_aside, 0);
+	statp->f_bfree = max_t(int64_t, 0,
+				fdblocks - xfs_fdblocks_unavailable(mp));
 	statp->f_bavail = statp->f_bfree;
 
 	fakeinos = XFS_FSB_TO_INO(mp, statp->f_bfree);


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant
  2022-03-27 16:58 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
@ 2022-03-28  1:18   ` Dave Chinner
  2022-04-01  5:51   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2022-03-28  1:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster

On Sun, Mar 27, 2022 at 09:58:22AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Currently, we use this undocumented macro to encode the minimum number
> of blocks needed to replenish a completely empty AGFL when an AG is
> nearly full.  This has lead to confusion on the part of the maintainers,
> so let's document what the value actually means, and move it to
> xfs_alloc.c since it's not used outside of that module.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good now with the comment update.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/6] xfs: remove infinite loop when reserving free block pool
  2022-03-27 16:58 ` [PATCH 3/6] xfs: remove infinite loop " Darrick J. Wong
@ 2022-03-28  1:20   ` Dave Chinner
  2022-04-06 16:40   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2022-03-28  1:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs

On Sun, Mar 27, 2022 at 09:58:33AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Infinite loops in kernel code are scary.  Calls to xfs_reserve_blocks
> should be rare (people should just use the defaults!) so we really don't
> need to try so hard.  Simplify the logic here by removing the infinite
> loop.
> 
> Cc: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good. Callers are already expected to handle failure because
ENOSPC, so removing the loop and returning ENOSPC on failure is fine
by me.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] xfs: always succeed at setting the reserve pool size
  2022-03-27 16:58 ` [PATCH 4/6] xfs: always succeed at setting the reserve pool size Darrick J. Wong
@ 2022-03-28  1:21   ` Dave Chinner
  2022-04-06 16:40   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2022-03-28  1:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster

On Sun, Mar 27, 2022 at 09:58:39AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Nowadays, xfs_mod_fdblocks will always choose to fill the reserve pool
> with freed blocks before adding to fdblocks.  Therefore, we can change
> the behavior of xfs_reserve_blocks slightly -- setting the target size
> of the pool should always succeed, since a deficiency will eventually
> be made up as blocks get freed.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] xfs: fix overfilling of reserve pool
  2022-03-27 16:58 ` [PATCH 5/6] xfs: fix overfilling of reserve pool Darrick J. Wong
@ 2022-03-28  1:22   ` Dave Chinner
  2022-04-06 16:42   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2022-03-28  1:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster

On Sun, Mar 27, 2022 at 09:58:44AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Due to cycling of m_sb_lock, it's possible for multiple callers of
> xfs_reserve_blocks to race at changing the pool size, subtracting blocks
> from fdblocks, and actually putting it in the pool.  The result of all
> this is that we can overfill the reserve pool to hilarious levels.
> 
> xfs_mod_fdblocks, when called with a positive value, already knows how
> to take freed blocks and either fill the reserve until it's full, or put
> them in fdblocks.  Use that instead of setting m_resblks_avail directly.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Another corner case fixed :)

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant
  2022-03-27 16:58 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
  2022-03-28  1:18   ` Dave Chinner
@ 2022-04-01  5:51   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-04-01  5:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster, david

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] xfs: don't include bnobt blocks when reserving free block pool
  2022-03-27 16:58 ` [PATCH 2/6] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
@ 2022-04-06 16:39   ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-04-06 16:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, Dave Chinner, linux-xfs, david

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/6] xfs: remove infinite loop when reserving free block pool
  2022-03-27 16:58 ` [PATCH 3/6] xfs: remove infinite loop " Darrick J. Wong
  2022-03-28  1:20   ` Dave Chinner
@ 2022-04-06 16:40   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-04-06 16:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs, david

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] xfs: always succeed at setting the reserve pool size
  2022-03-27 16:58 ` [PATCH 4/6] xfs: always succeed at setting the reserve pool size Darrick J. Wong
  2022-03-28  1:21   ` Dave Chinner
@ 2022-04-06 16:40   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-04-06 16:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster, david

On Sun, Mar 27, 2022 at 09:58:39AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Nowadays, xfs_mod_fdblocks will always choose to fill the reserve pool
> with freed blocks before adding to fdblocks.  Therefore, we can change
> the behavior of xfs_reserve_blocks slightly -- setting the target size
> of the pool should always succeed, since a deficiency will eventually
> be made up as blocks get freed.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] xfs: fix overfilling of reserve pool
  2022-03-27 16:58 ` [PATCH 5/6] xfs: fix overfilling of reserve pool Darrick J. Wong
  2022-03-28  1:22   ` Dave Chinner
@ 2022-04-06 16:42   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-04-06 16:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, bfoster, david

On Sun, Mar 27, 2022 at 09:58:44AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Due to cycling of m_sb_lock, it's possible for multiple callers of
> xfs_reserve_blocks to race at changing the pool size, subtracting blocks
> from fdblocks, and actually putting it in the pool.  The result of all
> this is that we can overfill the reserve pool to hilarious levels.
> 
> xfs_mod_fdblocks, when called with a positive value, already knows how
> to take freed blocks and either fill the reserve until it's full, or put
> them in fdblocks.  Use that instead of setting m_resblks_avail directly.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] xfs: don't report reserved bnobt space as available
  2022-03-27 16:58 ` [PATCH 6/6] xfs: don't report reserved bnobt space as available Darrick J. Wong
@ 2022-04-06 16:43   ` Christoph Hellwig
  0 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2022-04-06 16:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, Dave Chinner, linux-xfs, david

On Sun, Mar 27, 2022 at 09:58:50AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> On a modern filesystem, we don't allow userspace to allocate blocks for
> data storage from the per-AG space reservations, the user-controlled
> reservation pool that prevents ENOSPC in the middle of internal
> operations, or the internal per-AG set-aside that prevents unwanted
> filesystem shutdowns due to ENOSPC during a bmap btree split.
> 
> Since we now consider freespace btree blocks as unavailable for
> allocation for data storage, we shouldn't report those blocks via statfs
> either.  This makes the numbers that we return via the statfs f_bavail
> and f_bfree fields a more conservative estimate of actual free space.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-04-06 18:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-27 16:58 [PATCHSET v4 0/6] xfs: fix incorrect reserve pool calculations and reporting Darrick J. Wong
2022-03-27 16:58 ` [PATCH 1/6] xfs: document the XFS_ALLOC_AGFL_RESERVE constant Darrick J. Wong
2022-03-28  1:18   ` Dave Chinner
2022-04-01  5:51   ` Christoph Hellwig
2022-03-27 16:58 ` [PATCH 2/6] xfs: don't include bnobt blocks when reserving free block pool Darrick J. Wong
2022-04-06 16:39   ` Christoph Hellwig
2022-03-27 16:58 ` [PATCH 3/6] xfs: remove infinite loop " Darrick J. Wong
2022-03-28  1:20   ` Dave Chinner
2022-04-06 16:40   ` Christoph Hellwig
2022-03-27 16:58 ` [PATCH 4/6] xfs: always succeed at setting the reserve pool size Darrick J. Wong
2022-03-28  1:21   ` Dave Chinner
2022-04-06 16:40   ` Christoph Hellwig
2022-03-27 16:58 ` [PATCH 5/6] xfs: fix overfilling of reserve pool Darrick J. Wong
2022-03-28  1:22   ` Dave Chinner
2022-04-06 16:42   ` Christoph Hellwig
2022-03-27 16:58 ` [PATCH 6/6] xfs: don't report reserved bnobt space as available Darrick J. Wong
2022-04-06 16:43   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.