* bring back RT delalloc support
@ 2024-02-19  6:34 Christoph Hellwig
  2024-02-19  6:34 ` [PATCH 1/9] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions Christoph Hellwig
                   ` (8 more replies)
  0 siblings, 9 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

Hi all,

this series adds back delalloc support for RT inodes, at least if the RT
extent size is a single file system block.  This shows really nice
performance improvements for workloads that frequently rewrite or append
to files, and reduces fragmentation for larger writes.  On other workloads
it sometimes shows small performance improvements or flat performance.

Diffstat:
 libxfs/xfs_ag.c       |    4 -
 libxfs/xfs_ag_resv.c  |   24 ++--------
 libxfs/xfs_ag_resv.h  |    2 
 libxfs/xfs_alloc.c    |    4 -
 libxfs/xfs_bmap.c     |  102 ++++++++++++++++++++++++++-----------------
 libxfs/xfs_rtbitmap.c |   14 +++++
 libxfs/xfs_shared.h   |    6 +-
 scrub/fscounters.c    |    5 +-
 scrub/repair.c        |    5 --
 xfs_fsops.c           |   29 +++---------
 xfs_fsops.h           |    2 
 xfs_inode.c           |    3 -
 xfs_iomap.c           |   44 ++++++++++++------
 xfs_iops.c            |    2 
 xfs_mount.c           |  117 +++++++++++++++++++++++++-------------------------
 xfs_mount.h           |   41 ++++++++++++++---
 xfs_rtalloc.c         |    2 
 xfs_super.c           |   17 ++++---
 xfs_trace.h           |    1 
 xfs_trans.c           |   25 +++-------
 20 files changed, 252 insertions(+), 197 deletions(-)

* [PATCH 1/9] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions
  2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
@ 2024-02-19  6:34 ` Christoph Hellwig
  2024-02-19  6:34 ` [PATCH 2/9] xfs: move RT inode locking out of __xfs_bunmapi Christoph Hellwig
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

Commit bb7b1c9c5dd3 ("xfs: tag transactions that contain intent done
items") switched the XFS_TRANS_ definitions to be bit based, with
comments above the definitions.  As XFS_TRANS_LOWMODE was last and has
a big fat comment, it was missed.  Switch it to the same style.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_shared.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 4220d3584c1b0b..6f1cedb850eb39 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -70,7 +70,6 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
 #define XFS_TRANS_RES_FDBLKS		(1u << 6)
 /* Transaction contains an intent done log item */
 #define XFS_TRANS_HAS_INTENT_DONE	(1u << 7)
-
 /*
  * LOWMODE is used by the allocator to activate the lowspace algorithm - when
  * free space is running low the extent allocator may choose to allocate an
@@ -82,7 +81,7 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
  * for free space from AG 0. If the correct transaction reservations have been
  * made then this algorithm will eventually find all the space it needs.
  */
-#define XFS_TRANS_LOWMODE	0x100	/* allocate in low space mode */
+#define XFS_TRANS_LOWMODE		(1u << 8)
 
 /*
  * Field values for xfs_trans_mod_sb.
-- 
2.39.2


* [PATCH 2/9] xfs: move RT inode locking out of __xfs_bunmapi
  2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
  2024-02-19  6:34 ` [PATCH 1/9] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions Christoph Hellwig
@ 2024-02-19  6:34 ` Christoph Hellwig
  2024-02-19 23:55   ` Dave Chinner
  2024-02-19  6:34 ` [PATCH 3/9] xfs: split xfs_mod_freecounter Christoph Hellwig
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

__xfs_bunmapi is a bit of an odd place to lock the rtbitmap and rtsummary
inodes given that it is very high level code.  While this only looks ugly
right now, it will become a problem when supporting delayed allocations
for RT inodes as __xfs_bunmapi might end up deleting only delalloc extents
and thus never unlock the rt inodes.

Move the locking into xfs_rtfree_blocks instead (where it will also be
helpful once we support extfree items for RT allocations), and use a new
flag in the transaction to ensure they aren't locked twice.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c     | 10 ----------
 fs/xfs/libxfs/xfs_rtbitmap.c | 14 ++++++++++++++
 fs/xfs/libxfs/xfs_shared.h   |  3 +++
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index b525524a2da4ef..f8cc7c510d7bd5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5321,16 +5321,6 @@ __xfs_bunmapi(
 	} else
 		cur = NULL;
 
-	if (isrt) {
-		/*
-		 * Synchronize by locking the bitmap inode.
-		 */
-		xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL|XFS_ILOCK_RTBITMAP);
-		xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL);
-		xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL|XFS_ILOCK_RTSUM);
-		xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL);
-	}
-
 	extno = 0;
 	while (end != (xfs_fileoff_t)-1 && end >= start &&
 	       (nexts == 0 || extno < nexts)) {
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index e31663cb7b4349..2759c48390241d 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1001,6 +1001,20 @@ xfs_rtfree_blocks(
 		return -EIO;
 	}
 
+	/*
+	 * Ensure the bitmap and summary inodes are locked before modifying
+	 * them.  We can get called multiple times per transaction, so record
+	 * the fact that they are locked in the transaction.
+	 */
+	if (!(tp->t_flags & XFS_TRANS_RTBITMAP_LOCKED)) {
+		tp->t_flags |= XFS_TRANS_RTBITMAP_LOCKED;
+
+		xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL|XFS_ILOCK_RTBITMAP);
+		xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL);
+		xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL|XFS_ILOCK_RTSUM);
+		xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL);
+	}
+
 	return xfs_rtfree_extent(tp, start, len);
 }
 
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 6f1cedb850eb39..1598ff00f6805f 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -83,6 +83,9 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
  */
 #define XFS_TRANS_LOWMODE		(1u << 8)
 
+/* Transaction has locked the rtbitmap and rtsum inodes */
+#define XFS_TRANS_RTBITMAP_LOCKED	(1u << 9)
+
 /*
  * Field values for xfs_trans_mod_sb.
  */
-- 
2.39.2


* [PATCH 3/9] xfs: split xfs_mod_freecounter
  2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
  2024-02-19  6:34 ` [PATCH 1/9] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions Christoph Hellwig
  2024-02-19  6:34 ` [PATCH 2/9] xfs: move RT inode locking out of __xfs_bunmapi Christoph Hellwig
@ 2024-02-19  6:34 ` Christoph Hellwig
  2024-02-19 23:21   ` Dave Chinner
  2024-02-19  6:34 ` [PATCH 4/9] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc Christoph Hellwig
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

xfs_mod_freecounter has two entirely separate code paths for adding or
subtracting from the free counters.  Only the subtract case looks at the
rsvd flag and can return an error.

Split xfs_mod_freecounter into separate helpers for subtracting from or
adding to the freecounter, and remove all the impossible-to-reach error
handling for the addition case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_ag.c      |   4 +-
 fs/xfs/libxfs/xfs_ag_resv.c |  24 +++------
 fs/xfs/libxfs/xfs_ag_resv.h |   2 +-
 fs/xfs/libxfs/xfs_alloc.c   |   4 +-
 fs/xfs/libxfs/xfs_bmap.c    |  23 ++++----
 fs/xfs/scrub/fscounters.c   |   2 +-
 fs/xfs/scrub/repair.c       |   5 +-
 fs/xfs/xfs_fsops.c          |  29 +++--------
 fs/xfs/xfs_fsops.h          |   2 +-
 fs/xfs/xfs_mount.c          | 101 ++++++++++++++++--------------------
 fs/xfs/xfs_mount.h          |  32 +++++++++---
 fs/xfs/xfs_super.c          |   6 +--
 fs/xfs/xfs_trace.h          |   1 -
 fs/xfs/xfs_trans.c          |  25 ++++-----
 14 files changed, 115 insertions(+), 145 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 036f4ee43fd3c7..f89fa03b2db7f9 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -967,9 +967,7 @@ xfs_ag_shrink_space(
 	 * Disable perag reservations so it doesn't cause the allocation request
 	 * to fail. We'll reestablish reservation before we return.
 	 */
-	error = xfs_ag_resv_free(pag);
-	if (error)
-		return error;
+	xfs_ag_resv_free(pag);
 
 	/* internal log shouldn't also show up in the free space btrees */
 	error = xfs_alloc_vextent_exact_bno(&args,
diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
index da1057bd0e6067..216423df939e5c 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.c
+++ b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -126,14 +126,13 @@ xfs_ag_resv_needed(
 }
 
 /* Clean out a reservation */
-static int
+static void
 __xfs_ag_resv_free(
 	struct xfs_perag		*pag,
 	enum xfs_ag_resv_type		type)
 {
 	struct xfs_ag_resv		*resv;
 	xfs_extlen_t			oldresv;
-	int				error;
 
 	trace_xfs_ag_resv_free(pag, type, 0);
 
@@ -149,30 +148,19 @@ __xfs_ag_resv_free(
 		oldresv = resv->ar_orig_reserved;
 	else
 		oldresv = resv->ar_reserved;
-	error = xfs_mod_fdblocks(pag->pag_mount, oldresv, true);
+	xfs_add_fdblocks(pag->pag_mount, oldresv);
 	resv->ar_reserved = 0;
 	resv->ar_asked = 0;
 	resv->ar_orig_reserved = 0;
-
-	if (error)
-		trace_xfs_ag_resv_free_error(pag->pag_mount, pag->pag_agno,
-				error, _RET_IP_);
-	return error;
 }
 
 /* Free a per-AG reservation. */
-int
+void
 xfs_ag_resv_free(
 	struct xfs_perag		*pag)
 {
-	int				error;
-	int				err2;
-
-	error = __xfs_ag_resv_free(pag, XFS_AG_RESV_RMAPBT);
-	err2 = __xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA);
-	if (err2 && !error)
-		error = err2;
-	return error;
+	__xfs_ag_resv_free(pag, XFS_AG_RESV_RMAPBT);
+	__xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA);
 }
 
 static int
@@ -216,7 +204,7 @@ __xfs_ag_resv_init(
 	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_AG_RESV_FAIL))
 		error = -ENOSPC;
 	else
-		error = xfs_mod_fdblocks(mp, -(int64_t)hidden_space, true);
+		error = xfs_dec_fdblocks(mp, hidden_space, true);
 	if (error) {
 		trace_xfs_ag_resv_init_error(pag->pag_mount, pag->pag_agno,
 				error, _RET_IP_);
diff --git a/fs/xfs/libxfs/xfs_ag_resv.h b/fs/xfs/libxfs/xfs_ag_resv.h
index b74b210008ea7e..ff20ed93de7724 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.h
+++ b/fs/xfs/libxfs/xfs_ag_resv.h
@@ -6,7 +6,7 @@
 #ifndef __XFS_AG_RESV_H__
 #define	__XFS_AG_RESV_H__
 
-int xfs_ag_resv_free(struct xfs_perag *pag);
+void xfs_ag_resv_free(struct xfs_perag *pag);
 int xfs_ag_resv_init(struct xfs_perag *pag, struct xfs_trans *tp);
 
 bool xfs_ag_resv_critical(struct xfs_perag *pag, enum xfs_ag_resv_type type);
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 3bd0a33fee0a64..ba131fecbd236d 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -78,7 +78,7 @@ xfs_prealloc_blocks(
 }
 
 /*
- * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
+ * The number of blocks per AG that we withhold from xfs_dec_fdblocks to
  * guarantee that we can refill the AGFL prior to allocating space in a nearly
  * full AG.  Although the space described by the free space btrees, the
  * blocks used by the freesp btrees themselves, and the blocks owned by the
@@ -88,7 +88,7 @@ xfs_prealloc_blocks(
  * until the fs goes down, we subtract this many AG blocks from the incore
  * fdblocks to ensure user allocation does not overcommit the space the
  * filesystem needs for the AGFLs.  The rmap btree uses a per-AG reservation to
- * withhold space from xfs_mod_fdblocks, so we do not account for that here.
+ * withhold space from xfs_dec_fdblocks, so we do not account for that here.
  */
 #define XFS_ALLOCBT_AGFL_RESERVE	4
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index f8cc7c510d7bd5..cc788cde8bffd6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1934,10 +1934,11 @@ xfs_bmap_add_extent_delay_real(
 	}
 
 	/* adjust for changes in reserved delayed indirect blocks */
-	if (da_new != da_old) {
-		ASSERT(state == 0 || da_new < da_old);
-		error = xfs_mod_fdblocks(mp, (int64_t)(da_old - da_new),
-				false);
+	if (da_new < da_old) {
+		xfs_add_fdblocks(mp, da_old - da_new);
+	} else if (da_new > da_old) {
+		ASSERT(state == 0);
+		error = xfs_dec_fdblocks(mp, da_new - da_old, false);
 	}
 
 	xfs_bmap_check_leaf_extents(bma->cur, bma->ip, whichfork);
@@ -2616,8 +2617,8 @@ xfs_bmap_add_extent_hole_delay(
 	}
 	if (oldlen != newlen) {
 		ASSERT(oldlen > newlen);
-		xfs_mod_fdblocks(ip->i_mount, (int64_t)(oldlen - newlen),
-				 false);
+		xfs_add_fdblocks(ip->i_mount, oldlen - newlen);
+
 		/*
 		 * Nothing to do for disk quota accounting here.
 		 */
@@ -4025,11 +4026,11 @@ xfs_bmapi_reserve_delalloc(
 	indlen = (xfs_extlen_t)xfs_bmap_worst_indlen(ip, alen);
 	ASSERT(indlen > 0);
 
-	error = xfs_mod_fdblocks(mp, -((int64_t)alen), false);
+	error = xfs_dec_fdblocks(mp, alen, false);
 	if (error)
 		goto out_unreserve_quota;
 
-	error = xfs_mod_fdblocks(mp, -((int64_t)indlen), false);
+	error = xfs_dec_fdblocks(mp, indlen, false);
 	if (error)
 		goto out_unreserve_blocks;
 
@@ -4057,7 +4058,7 @@ xfs_bmapi_reserve_delalloc(
 	return 0;
 
 out_unreserve_blocks:
-	xfs_mod_fdblocks(mp, alen, false);
+	xfs_add_fdblocks(mp, alen);
 out_unreserve_quota:
 	if (XFS_IS_QUOTA_ON(mp))
 		xfs_quota_unreserve_blkres(ip, alen);
@@ -4842,7 +4843,7 @@ xfs_bmap_del_extent_delay(
 	ASSERT(got_endoff >= del_endoff);
 
 	if (isrt)
-		xfs_mod_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
+		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
 
 	/*
 	 * Update the inode delalloc counter now and wait to update the
@@ -4929,7 +4930,7 @@ xfs_bmap_del_extent_delay(
 	if (!isrt)
 		da_diff += del->br_blockcount;
 	if (da_diff) {
-		xfs_mod_fdblocks(mp, da_diff, false);
+		xfs_add_fdblocks(mp, da_diff);
 		xfs_mod_delalloc(mp, -da_diff);
 	}
 	return error;
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index 5799e9a94f1f66..5c6d7244078942 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -516,7 +516,7 @@ xchk_fscounters(
 
 	/*
 	 * If the filesystem is not frozen, the counter summation calls above
-	 * can race with xfs_mod_freecounter, which subtracts a requested space
+	 * can race with xfs_dec_freecounter, which subtracts a requested space
 	 * reservation from the counter and undoes the subtraction if that made
 	 * the counter go negative.  Therefore, it's possible to see negative
 	 * values here, and we should only flag that as a corruption if we
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 745d5b8f405a91..0412bf7c78e727 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -929,9 +929,7 @@ xrep_reset_perag_resv(
 	ASSERT(sc->tp);
 
 	sc->flags &= ~XREP_RESET_PERAG_RESV;
-	error = xfs_ag_resv_free(sc->sa.pag);
-	if (error)
-		goto out;
+	xfs_ag_resv_free(sc->sa.pag);
 	error = xfs_ag_resv_init(sc->sa.pag, sc->tp);
 	if (error == -ENOSPC) {
 		xfs_err(sc->mp,
@@ -940,7 +938,6 @@ xrep_reset_perag_resv(
 		error = 0;
 	}
 
-out:
 	return error;
 }
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 83f708f62ed9f2..c211ea2b63c4dd 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -213,10 +213,8 @@ xfs_growfs_data_private(
 			struct xfs_perag	*pag;
 
 			pag = xfs_perag_get(mp, id.agno);
-			error = xfs_ag_resv_free(pag);
+			xfs_ag_resv_free(pag);
 			xfs_perag_put(pag);
-			if (error)
-				return error;
 		}
 		/*
 		 * Reserve AG metadata blocks. ENOSPC here does not mean there
@@ -385,14 +383,14 @@ xfs_reserve_blocks(
 	 */
 	if (mp->m_resblks > request) {
 		lcounter = mp->m_resblks_avail - request;
-		if (lcounter  > 0) {		/* release unused blocks */
+		if (lcounter > 0) {		/* release unused blocks */
 			fdblks_delta = lcounter;
 			mp->m_resblks_avail -= lcounter;
 		}
 		mp->m_resblks = request;
 		if (fdblks_delta) {
 			spin_unlock(&mp->m_sb_lock);
-			error = xfs_mod_fdblocks(mp, fdblks_delta, 0);
+			xfs_add_fdblocks(mp, fdblks_delta);
 			spin_lock(&mp->m_sb_lock);
 		}
 
@@ -428,9 +426,9 @@ xfs_reserve_blocks(
 		 */
 		fdblks_delta = min(free, delta);
 		spin_unlock(&mp->m_sb_lock);
-		error = xfs_mod_fdblocks(mp, -fdblks_delta, 0);
+		error = xfs_dec_fdblocks(mp, fdblks_delta, 0);
 		if (!error)
-			xfs_mod_fdblocks(mp, fdblks_delta, 0);
+			xfs_add_fdblocks(mp, fdblks_delta);
 		spin_lock(&mp->m_sb_lock);
 	}
 out:
@@ -556,24 +554,13 @@ xfs_fs_reserve_ag_blocks(
 /*
  * Free space reserved for per-AG metadata.
  */
-int
+void
 xfs_fs_unreserve_ag_blocks(
 	struct xfs_mount	*mp)
 {
 	xfs_agnumber_t		agno;
 	struct xfs_perag	*pag;
-	int			error = 0;
-	int			err2;
 
-	for_each_perag(mp, agno, pag) {
-		err2 = xfs_ag_resv_free(pag);
-		if (err2 && !error)
-			error = err2;
-	}
-
-	if (error)
-		xfs_warn(mp,
-	"Error %d freeing per-AG metadata reserve pool.", error);
-
-	return error;
+	for_each_perag(mp, agno, pag)
+		xfs_ag_resv_free(pag);
 }
diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
index 44457b0a059376..3e2f73bcf8314b 100644
--- a/fs/xfs/xfs_fsops.h
+++ b/fs/xfs/xfs_fsops.h
@@ -12,6 +12,6 @@ int xfs_reserve_blocks(struct xfs_mount *mp, uint64_t request);
 int xfs_fs_goingdown(struct xfs_mount *mp, uint32_t inflags);
 
 int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
-int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
+void xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
 
 #endif	/* __XFS_FSOPS_H__ */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 7328034d42ed8d..2e06837051d6b0 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1129,50 +1129,40 @@ xfs_fs_writable(
 	return true;
 }
 
-/* Adjust m_fdblocks or m_frextents. */
+void
+__xfs_add_fdblocks(
+	struct xfs_mount	*mp,
+	uint64_t		delta)
+{
+	long long		res_used;
+
+	spin_lock(&mp->m_sb_lock);
+	res_used = (long long)(mp->m_resblks - mp->m_resblks_avail);
+	if (res_used > delta) {
+		mp->m_resblks_avail += delta;
+	} else {
+		delta -= res_used;
+		mp->m_resblks_avail = mp->m_resblks;
+		percpu_counter_add(&mp->m_fdblocks, delta);
+	}
+	spin_unlock(&mp->m_sb_lock);
+}
+
 int
-xfs_mod_freecounter(
+xfs_dec_freecounter(
 	struct xfs_mount	*mp,
 	struct percpu_counter	*counter,
-	int64_t			delta,
+	uint64_t		delta,
 	bool			rsvd)
 {
-	int64_t			lcounter;
-	long long		res_used;
+	bool			has_resv_pool = (counter == &mp->m_fdblocks);
 	uint64_t		set_aside = 0;
 	s32			batch;
-	bool			has_resv_pool;
 
 	ASSERT(counter == &mp->m_fdblocks || counter == &mp->m_frextents);
-	has_resv_pool = (counter == &mp->m_fdblocks);
 	if (rsvd)
 		ASSERT(has_resv_pool);
 
-	if (delta > 0) {
-		/*
-		 * If the reserve pool is depleted, put blocks back into it
-		 * first. Most of the time the pool is full.
-		 */
-		if (likely(!has_resv_pool ||
-			   mp->m_resblks == mp->m_resblks_avail)) {
-			percpu_counter_add(counter, delta);
-			return 0;
-		}
-
-		spin_lock(&mp->m_sb_lock);
-		res_used = (long long)(mp->m_resblks - mp->m_resblks_avail);
-
-		if (res_used > delta) {
-			mp->m_resblks_avail += delta;
-		} else {
-			delta -= res_used;
-			mp->m_resblks_avail = mp->m_resblks;
-			percpu_counter_add(counter, delta);
-		}
-		spin_unlock(&mp->m_sb_lock);
-		return 0;
-	}
-
 	/*
 	 * Taking blocks away, need to be more accurate the closer we
 	 * are to zero.
@@ -1200,34 +1190,35 @@ xfs_mod_freecounter(
 	 */
 	if (has_resv_pool)
 		set_aside = xfs_fdblocks_unavailable(mp);
-	percpu_counter_add_batch(counter, delta, batch);
-	if (__percpu_counter_compare(counter, set_aside,
-				     XFS_FDBLOCKS_BATCH) >= 0) {
-		/* we had space! */
-		return 0;
-	}
 
-	/*
-	 * lock up the sb for dipping into reserves before releasing the space
-	 * that took us to ENOSPC.
-	 */
-	spin_lock(&mp->m_sb_lock);
-	percpu_counter_add(counter, -delta);
-	if (!has_resv_pool || !rsvd)
-		goto fdblocks_enospc;
+	percpu_counter_add_batch(counter, -((int64_t)delta), batch);
+	if (__percpu_counter_compare(counter, set_aside,
+				     XFS_FDBLOCKS_BATCH) < 0) {
+		/*
+		 * Take the SB lock to prevent other threads from racing with us
+		 * before putting back the reserved blocks, and then try to dip
+		 * into the reserved pool if we are allowed to.
+		 */
+		spin_lock(&mp->m_sb_lock);
+		percpu_counter_add(counter, delta);
+		if (has_resv_pool && rsvd) {
+			int64_t	lcounter;
+
+			lcounter = (long long)mp->m_resblks_avail - delta;
+			if (lcounter >= 0) {
+				mp->m_resblks_avail = lcounter;
+				spin_unlock(&mp->m_sb_lock);
+				return 0;
+			}
+		}
 
-	lcounter = (long long)mp->m_resblks_avail + delta;
-	if (lcounter >= 0) {
-		mp->m_resblks_avail = lcounter;
+		xfs_warn_once(mp,
+"Reserve blocks depleted! Consider increasing reserve pool size.");
 		spin_unlock(&mp->m_sb_lock);
-		return 0;
+		return -ENOSPC;
 	}
-	xfs_warn_once(mp,
-"Reserve blocks depleted! Consider increasing reserve pool size.");
 
-fdblocks_enospc:
-	spin_unlock(&mp->m_sb_lock);
-	return -ENOSPC;
+	return 0;
 }
 
 /*
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 503fe3c7edbf82..891a54d57f576d 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -534,19 +534,39 @@ xfs_fdblocks_unavailable(
 	return mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
 }
 
-int xfs_mod_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
-		int64_t delta, bool rsvd);
+int xfs_dec_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
+		uint64_t delta, bool rsvd);
 
 static inline int
-xfs_mod_fdblocks(struct xfs_mount *mp, int64_t delta, bool reserved)
+xfs_dec_fdblocks(struct xfs_mount *mp, uint64_t delta, bool reserved)
 {
-	return xfs_mod_freecounter(mp, &mp->m_fdblocks, delta, reserved);
+	return xfs_dec_freecounter(mp, &mp->m_fdblocks, delta, reserved);
+}
+
+
+void __xfs_add_fdblocks(struct xfs_mount *mp, uint64_t delta);
+static inline void xfs_add_fdblocks(struct xfs_mount *mp, uint64_t delta)
+{
+	/*
+	 * If the reserve pool is depleted, put blocks back into it first.
+	 * Most of the time the pool is full.
+	 */
+	if (unlikely(mp->m_resblks != mp->m_resblks_avail))
+		__xfs_add_fdblocks(mp, delta);
+	else
+		percpu_counter_add(&mp->m_fdblocks, delta);
 }
 
 static inline int
-xfs_mod_frextents(struct xfs_mount *mp, int64_t delta)
+xfs_dec_frextents(struct xfs_mount *mp, uint64_t delta)
+{
+	return xfs_dec_freecounter(mp, &mp->m_frextents, delta, false);
+}
+
+static inline void
+xfs_add_frextents(struct xfs_mount *mp, uint64_t delta)
 {
-	return xfs_mod_freecounter(mp, &mp->m_frextents, delta, false);
+	percpu_counter_add(&mp->m_frextents, delta);
 }
 
 extern int	xfs_readsb(xfs_mount_t *, int);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 6ce1e6deb7ec5f..b16828410ec19b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1875,11 +1875,7 @@ xfs_remount_ro(
 	xfs_inodegc_stop(mp);
 
 	/* Free the per-AG metadata reservation pool. */
-	error = xfs_fs_unreserve_ag_blocks(mp);
-	if (error) {
-		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
-		return error;
-	}
+	xfs_fs_unreserve_ag_blocks(mp);
 
 	/*
 	 * Before we sync the metadata, we need to free up the reserve block
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index c7e57efe035666..d7abb3539d2d92 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2932,7 +2932,6 @@ DEFINE_AG_RESV_EVENT(xfs_ag_resv_free_extent);
 DEFINE_AG_RESV_EVENT(xfs_ag_resv_critical);
 DEFINE_AG_RESV_EVENT(xfs_ag_resv_needed);
 
-DEFINE_AG_ERROR_EVENT(xfs_ag_resv_free_error);
 DEFINE_AG_ERROR_EVENT(xfs_ag_resv_init_error);
 
 /* refcount tracepoint classes */
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 12d45e93f07d50..049bbe0d5df7a9 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -163,7 +163,7 @@ xfs_trans_reserve(
 	 * fail if the count would go below zero.
 	 */
 	if (blocks > 0) {
-		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
+		error = xfs_dec_fdblocks(mp, blocks, rsvd);
 		if (error != 0)
 			return -ENOSPC;
 		tp->t_blk_res += blocks;
@@ -210,7 +210,7 @@ xfs_trans_reserve(
 	 * fail if the count would go below zero.
 	 */
 	if (rtextents > 0) {
-		error = xfs_mod_frextents(mp, -((int64_t)rtextents));
+		error = xfs_dec_frextents(mp, rtextents);
 		if (error) {
 			error = -ENOSPC;
 			goto undo_log;
@@ -234,7 +234,7 @@ xfs_trans_reserve(
 
 undo_blocks:
 	if (blocks > 0) {
-		xfs_mod_fdblocks(mp, (int64_t)blocks, rsvd);
+		xfs_add_fdblocks(mp, blocks);
 		tp->t_blk_res = 0;
 	}
 	return error;
@@ -593,12 +593,10 @@ xfs_trans_unreserve_and_mod_sb(
 	struct xfs_trans	*tp)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
-	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
 	int64_t			blkdelta = 0;
 	int64_t			rtxdelta = 0;
 	int64_t			idelta = 0;
 	int64_t			ifreedelta = 0;
-	int			error;
 
 	/* calculate deltas */
 	if (tp->t_blk_res > 0)
@@ -621,10 +619,8 @@ xfs_trans_unreserve_and_mod_sb(
 	}
 
 	/* apply the per-cpu counters */
-	if (blkdelta) {
-		error = xfs_mod_fdblocks(mp, blkdelta, rsvd);
-		ASSERT(!error);
-	}
+	if (blkdelta)
+		xfs_add_fdblocks(mp, blkdelta);
 
 	if (idelta)
 		percpu_counter_add_batch(&mp->m_icount, idelta,
@@ -633,10 +629,8 @@ xfs_trans_unreserve_and_mod_sb(
 	if (ifreedelta)
 		percpu_counter_add(&mp->m_ifree, ifreedelta);
 
-	if (rtxdelta) {
-		error = xfs_mod_frextents(mp, rtxdelta);
-		ASSERT(!error);
-	}
+	if (rtxdelta)
+		xfs_add_frextents(mp, rtxdelta);
 
 	if (!(tp->t_flags & XFS_TRANS_SB_DIRTY))
 		return;
@@ -672,7 +666,6 @@ xfs_trans_unreserve_and_mod_sb(
 	 */
 	ASSERT(mp->m_sb.sb_imax_pct >= 0);
 	ASSERT(mp->m_sb.sb_rextslog >= 0);
-	return;
 }
 
 /* Add the given log item to the transaction's list of log items. */
@@ -1291,9 +1284,9 @@ xfs_trans_reserve_more_inode(
 		return 0;
 
 	/* Quota failed, give back the new reservation. */
-	xfs_mod_fdblocks(mp, dblocks, tp->t_flags & XFS_TRANS_RESERVE);
+	xfs_add_fdblocks(mp, dblocks);
 	tp->t_blk_res -= dblocks;
-	xfs_mod_frextents(mp, rtx);
+	xfs_add_frextents(mp, rtx);
 	tp->t_rtx_res -= rtx;
 	return error;
 }
-- 
2.39.2


* [PATCH 4/9] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc
  2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-02-19  6:34 ` [PATCH 3/9] xfs: split xfs_mod_freecounter Christoph Hellwig
@ 2024-02-19  6:34 ` Christoph Hellwig
  2024-02-19  6:34 ` [PATCH 5/9] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay Christoph Hellwig
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

Allocate data blocks for RT inodes using xfs_dec_frextents.  While at
it, optimize the data device case by doing only a single xfs_dec_fdblocks
call for the extent itself and the indirect blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index cc788cde8bffd6..95e93534cd1264 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3984,6 +3984,7 @@ xfs_bmapi_reserve_delalloc(
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	xfs_extlen_t		alen;
 	xfs_extlen_t		indlen;
+	uint64_t		fdblocks;
 	int			error;
 	xfs_fileoff_t		aoff = off;
 
@@ -4026,14 +4027,18 @@ xfs_bmapi_reserve_delalloc(
 	indlen = (xfs_extlen_t)xfs_bmap_worst_indlen(ip, alen);
 	ASSERT(indlen > 0);
 
-	error = xfs_dec_fdblocks(mp, alen, false);
-	if (error)
-		goto out_unreserve_quota;
+	fdblocks = indlen;
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		error = xfs_dec_frextents(mp, xfs_rtb_to_rtx(mp, alen));
+		if (error)
+			goto out_unreserve_quota;
+	} else {
+		fdblocks += alen;
+	}
 
-	error = xfs_dec_fdblocks(mp, indlen, false);
+	error = xfs_dec_fdblocks(mp, fdblocks, false);
 	if (error)
-		goto out_unreserve_blocks;
-
+		goto out_unreserve_frextents;
 
 	ip->i_delayed_blks += alen;
 	xfs_mod_delalloc(ip->i_mount, alen + indlen);
@@ -4057,8 +4062,9 @@ xfs_bmapi_reserve_delalloc(
 
 	return 0;
 
-out_unreserve_blocks:
-	xfs_add_fdblocks(mp, alen);
+out_unreserve_frextents:
+	if (XFS_IS_REALTIME_INODE(ip))
+		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, alen));
 out_unreserve_quota:
 	if (XFS_IS_QUOTA_ON(mp))
 		xfs_quota_unreserve_blkres(ip, alen);
-- 
2.39.2


* [PATCH 5/9] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay
  2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-02-19  6:34 ` [PATCH 4/9] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc Christoph Hellwig
@ 2024-02-19  6:34 ` Christoph Hellwig
  2024-02-19  6:34 ` [PATCH 6/9] xfs: support RT inodes in xfs_mod_delalloc Christoph Hellwig
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

The code to account fdblocks and frextents in xfs_bmap_del_extent_delay
is a bit weird in that it accounts frextents before the iext tree
manipulations and fdblocks after it.  Given that the iext tree
manipulations currently can't fail, that's not really a problem, but
still odd.  Move the frextent manipulation to the end, and use a
fdblocks variable to account for the unconditional indirect blocks and
the data blocks that are only freed for !RT.  This prepares for following
updates in the area and already makes the code more readable.

Also remove the !isrt assert given that this code clearly handles
rt extents correctly, and we'll soon reinstate delalloc support for
RT inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 95e93534cd1264..074d833e845af3 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4833,6 +4833,7 @@ xfs_bmap_del_extent_delay(
 	xfs_fileoff_t		del_endoff, got_endoff;
 	xfs_filblks_t		got_indlen, new_indlen, stolen;
 	uint32_t		state = xfs_bmap_fork_to_state(whichfork);
+	uint64_t		fdblocks;
 	int			error = 0;
 	bool			isrt;
 
@@ -4848,15 +4849,11 @@ xfs_bmap_del_extent_delay(
 	ASSERT(got->br_startoff <= del->br_startoff);
 	ASSERT(got_endoff >= del_endoff);
 
-	if (isrt)
-		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
-
 	/*
 	 * Update the inode delalloc counter now and wait to update the
 	 * sb counters as we might have to borrow some blocks for the
 	 * indirect block accounting.
 	 */
-	ASSERT(!isrt);
 	error = xfs_quota_unreserve_blkres(ip, del->br_blockcount);
 	if (error)
 		return error;
@@ -4933,12 +4930,15 @@ xfs_bmap_del_extent_delay(
 
 	ASSERT(da_old >= da_new);
 	da_diff = da_old - da_new;
-	if (!isrt)
-		da_diff += del->br_blockcount;
-	if (da_diff) {
-		xfs_add_fdblocks(mp, da_diff);
-		xfs_mod_delalloc(mp, -da_diff);
-	}
+	fdblocks = da_diff;
+
+	if (isrt)
+		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
+	else
+		fdblocks += del->br_blockcount;
+
+	xfs_add_fdblocks(mp, fdblocks);
+	xfs_mod_delalloc(mp, -(int64_t)fdblocks);
 	return error;
 }
 
-- 
2.39.2


* [PATCH 6/9] xfs: support RT inodes in xfs_mod_delalloc
  2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-02-19  6:34 ` [PATCH 5/9] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay Christoph Hellwig
@ 2024-02-19  6:34 ` Christoph Hellwig
  2024-02-19 23:30   ` Dave Chinner
  2024-02-19  6:34 ` [PATCH 7/9] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations Christoph Hellwig
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

To prepare for re-enabling delalloc on RT devices, pass the data blocks
(which use the RT device when the inode sits on it) and the indirect
blocks (which don't) separately to xfs_mod_delalloc, and add a new
percpu counter to also track the RT delalloc blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c  | 12 ++++++------
 fs/xfs/scrub/fscounters.c |  3 +++
 fs/xfs/xfs_mount.c        | 16 +++++++++++++---
 fs/xfs/xfs_mount.h        |  9 ++++++++-
 fs/xfs/xfs_super.c        | 11 ++++++++++-
 5 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 074d833e845af3..8a84b7f0b55f38 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1926,7 +1926,7 @@ xfs_bmap_add_extent_delay_real(
 	}
 
 	if (da_new != da_old)
-		xfs_mod_delalloc(mp, (int64_t)da_new - da_old);
+		xfs_mod_delalloc(bma->ip, 0, (int64_t)da_new - da_old);
 
 	if (bma->cur) {
 		da_new += bma->cur->bc_ino.allocated;
@@ -2622,7 +2622,7 @@ xfs_bmap_add_extent_hole_delay(
 		/*
 		 * Nothing to do for disk quota accounting here.
 		 */
-		xfs_mod_delalloc(ip->i_mount, (int64_t)newlen - oldlen);
+		xfs_mod_delalloc(ip, 0, (int64_t)newlen - oldlen);
 	}
 }
 
@@ -3292,7 +3292,7 @@ xfs_bmap_alloc_account(
 		 * yet.
 		 */
 		if (ap->wasdel) {
-			xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length);
+			xfs_mod_delalloc(ap->ip, -(int64_t)ap->length, 0);
 			return;
 		}
 
@@ -3316,7 +3316,7 @@ xfs_bmap_alloc_account(
 	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
 	if (ap->wasdel) {
 		ap->ip->i_delayed_blks -= ap->length;
-		xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length);
+		xfs_mod_delalloc(ap->ip, -(int64_t)ap->length, 0);
 		fld = isrt ? XFS_TRANS_DQ_DELRTBCOUNT : XFS_TRANS_DQ_DELBCOUNT;
 	} else {
 		fld = isrt ? XFS_TRANS_DQ_RTBCOUNT : XFS_TRANS_DQ_BCOUNT;
@@ -4041,7 +4041,7 @@ xfs_bmapi_reserve_delalloc(
 		goto out_unreserve_frextents;
 
 	ip->i_delayed_blks += alen;
-	xfs_mod_delalloc(ip->i_mount, alen + indlen);
+	xfs_mod_delalloc(ip, alen, indlen);
 
 	got->br_startoff = aoff;
 	got->br_startblock = nullstartblock(indlen);
@@ -4938,7 +4938,7 @@ xfs_bmap_del_extent_delay(
 		fdblocks += del->br_blockcount;
 
 	xfs_add_fdblocks(mp, fdblocks);
-	xfs_mod_delalloc(mp, -(int64_t)fdblocks);
+	xfs_mod_delalloc(ip, -(long)del->br_blockcount, -da_diff);
 	return error;
 }
 
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index 5c6d7244078942..b442e50e81555f 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -426,6 +426,9 @@ xchk_fscount_count_frextents(
 		goto out_unlock;
 	}
 
+	fsc->frextents -=
+		xfs_rtb_to_rtx(mp, percpu_counter_sum(&mp->m_delalloc_rtblks));
+
 out_unlock:
 	xfs_iunlock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
 	return error;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 2e06837051d6b0..afc30d3333e8ad 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1388,9 +1388,19 @@ xfs_clear_incompat_log_features(
 #define XFS_DELALLOC_BATCH	(4096)
 void
 xfs_mod_delalloc(
-	struct xfs_mount	*mp,
-	int64_t			delta)
+	struct xfs_inode	*ip,
+	int64_t			data_delta,
+	int64_t			ind_delta)
 {
-	percpu_counter_add_batch(&mp->m_delalloc_blks, delta,
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		percpu_counter_add_batch(&mp->m_delalloc_rtblks, data_delta,
+				XFS_DELALLOC_BATCH);
+		if (!ind_delta)
+			return;
+		data_delta = 0;
+	}
+	percpu_counter_add_batch(&mp->m_delalloc_blks, data_delta + ind_delta,
 			XFS_DELALLOC_BATCH);
 }
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 891a54d57f576d..71c7d06f3210c8 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -195,6 +195,12 @@ typedef struct xfs_mount {
 	 * extents or anything related to the rt device.
 	 */
 	struct percpu_counter	m_delalloc_blks;
+
+	/*
+	 * RT version of the above.
+	 */
+	struct percpu_counter	m_delalloc_rtblks;
+
 	/*
 	 * Global count of allocation btree blocks in use across all AGs. Only
 	 * used when perag reservation is enabled. Helps prevent block
@@ -586,6 +592,7 @@ struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
 void xfs_force_summary_recalc(struct xfs_mount *mp);
 int xfs_add_incompat_log_feature(struct xfs_mount *mp, uint32_t feature);
 bool xfs_clear_incompat_log_features(struct xfs_mount *mp);
-void xfs_mod_delalloc(struct xfs_mount *mp, int64_t delta);
+void xfs_mod_delalloc(struct xfs_inode *ip, int64_t data_delta,
+		int64_t ind_delta);
 
 #endif	/* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index b16828410ec19b..1e21f8b953cf77 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1053,12 +1053,18 @@ xfs_init_percpu_counters(
 	if (error)
 		goto free_fdblocks;
 
-	error = percpu_counter_init(&mp->m_frextents, 0, GFP_KERNEL);
+	error = percpu_counter_init(&mp->m_delalloc_rtblks, 0, GFP_KERNEL);
 	if (error)
 		goto free_delalloc;
 
+	error = percpu_counter_init(&mp->m_frextents, 0, GFP_KERNEL);
+	if (error)
+		goto free_delalloc_rt;
+
 	return 0;
 
+free_delalloc_rt:
+	percpu_counter_destroy(&mp->m_delalloc_rtblks);
 free_delalloc:
 	percpu_counter_destroy(&mp->m_delalloc_blks);
 free_fdblocks:
@@ -1087,6 +1093,9 @@ xfs_destroy_percpu_counters(
 	percpu_counter_destroy(&mp->m_icount);
 	percpu_counter_destroy(&mp->m_ifree);
 	percpu_counter_destroy(&mp->m_fdblocks);
+	ASSERT(xfs_is_shutdown(mp) ||
+	       percpu_counter_sum(&mp->m_delalloc_rtblks) == 0);
+	percpu_counter_destroy(&mp->m_delalloc_rtblks);
 	ASSERT(xfs_is_shutdown(mp) ||
 	       percpu_counter_sum(&mp->m_delalloc_blks) == 0);
 	percpu_counter_destroy(&mp->m_delalloc_blks);
-- 
2.39.2


* [PATCH 7/9] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations
  2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-02-19  6:34 ` [PATCH 6/9] xfs: support RT inodes in xfs_mod_delalloc Christoph Hellwig
@ 2024-02-19  6:34 ` Christoph Hellwig
  2024-02-19  6:34 ` [PATCH 8/9] xfs: stop the steal (of data blocks for RT indirect blocks) Christoph Hellwig
  2024-02-19  6:34 ` [PATCH 9/9] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1) Christoph Hellwig
  8 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

Add a check for files on the RT subvolume and use m_frextents instead
of m_fdblocks to adjust the preallocation size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_iomap.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 18c8f168b1532d..e6abe56d1f1f23 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -27,6 +27,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
+#include "xfs_rtbitmap.h"
 
 #define XFS_ALLOC_ALIGN(mp, off) \
 	(((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
@@ -398,6 +399,29 @@ xfs_quota_calc_throttle(
 	}
 }
 
+static int64_t
+xfs_iomap_freesp(
+	struct percpu_counter	*counter,
+	uint64_t		low_space[XFS_LOWSP_MAX],
+	int			*shift)
+{
+	int64_t			freesp;
+
+	freesp = percpu_counter_read_positive(counter);
+	if (freesp < low_space[XFS_LOWSP_5_PCNT]) {
+		*shift = 2;
+		if (freesp < low_space[XFS_LOWSP_4_PCNT])
+			(*shift)++;
+		if (freesp < low_space[XFS_LOWSP_3_PCNT])
+			(*shift)++;
+		if (freesp < low_space[XFS_LOWSP_2_PCNT])
+			(*shift)++;
+		if (freesp < low_space[XFS_LOWSP_1_PCNT])
+			(*shift)++;
+	}
+	return freesp;
+}
+
 /*
  * If we don't have a user specified preallocation size, dynamically increase
  * the preallocation size as the size of the file grows.  Cap the maximum size
@@ -480,18 +504,12 @@ xfs_iomap_prealloc_size(
 	alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(XFS_MAX_BMBT_EXTLEN),
 				       alloc_blocks);
 
-	freesp = percpu_counter_read_positive(&mp->m_fdblocks);
-	if (freesp < mp->m_low_space[XFS_LOWSP_5_PCNT]) {
-		shift = 2;
-		if (freesp < mp->m_low_space[XFS_LOWSP_4_PCNT])
-			shift++;
-		if (freesp < mp->m_low_space[XFS_LOWSP_3_PCNT])
-			shift++;
-		if (freesp < mp->m_low_space[XFS_LOWSP_2_PCNT])
-			shift++;
-		if (freesp < mp->m_low_space[XFS_LOWSP_1_PCNT])
-			shift++;
-	}
+	if (unlikely(XFS_IS_REALTIME_INODE(ip)))
+		freesp = xfs_rtx_to_rtb(mp, xfs_iomap_freesp(&mp->m_frextents,
+				mp->m_low_rtexts, &shift));
+	else
+		freesp = xfs_iomap_freesp(&mp->m_fdblocks, mp->m_low_space,
+				&shift);
 
 	/*
 	 * Check each quota to cap the prealloc size, provide a shift value to
-- 
2.39.2


* [PATCH 8/9] xfs: stop the steal (of data blocks for RT indirect blocks)
  2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-02-19  6:34 ` [PATCH 7/9] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations Christoph Hellwig
@ 2024-02-19  6:34 ` Christoph Hellwig
  2024-02-19 23:47   ` Dave Chinner
  2024-02-19  6:34 ` [PATCH 9/9] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1) Christoph Hellwig
  8 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

When xfs_bmap_del_extent_delay has to split an indirect block it tries
to steal blocks from the part that gets unmapped to increase the
indirect block reservation that now needs to cover two extents
instead of one.

This works perfectly fine on the data device, where the data and
indirect blocks come from the same pool.  It has no chance of working
when the inode sits on the RT device.  To support re-enabling delalloc
for inodes on the RT device, make this behavior conditional on not
being for rt extents.  For an RT extent, try to allocate new blocks or
otherwise just give up.

Note that splitting a delalloc extent should only happen on writeback
failure, as for other kinds of hole punching we first write back all
data and thus convert the delalloc reservations covering the hole to
a real allocation.

Note that restoring a quota reservation is always a bit problematic,
but the force flag should take care of it.  That is, if we actually
supported quota with the RT volume, which seems to not be the case
at the moment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 8a84b7f0b55f38..a137abf435eeba 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4912,6 +4912,30 @@ xfs_bmap_del_extent_delay(
 		WARN_ON_ONCE(!got_indlen || !new_indlen);
 		stolen = xfs_bmap_split_indlen(da_old, &got_indlen, &new_indlen,
 						       del->br_blockcount);
+		if (isrt && stolen) {
+			/*
+			 * Ugg, we can't just steal reservations from the data
+			 * blocks as the data blocks come from a different pool.
+			 *
+			 * So we have to try to increase our reservations here,
+			 * and if that fails we have to fail the unmap.  To
+			 * avoid that as much as possible dip into the reserve
+			 * pool.
+			 *
+			 * Note that in theory the user/group/project could
+			 * be over the quota limit in the meantime, thus we
+			 * force the quota accounting even if it was over the
+			 * limit.
+			 */
+			error = xfs_dec_fdblocks(mp, stolen, true);
+			if (error) {
+				ip->i_delayed_blks += del->br_blockcount;
+				xfs_trans_reserve_quota_nblks(NULL, ip, 0,
+						del->br_blockcount, true);
+				return error;
+			}
+			xfs_mod_delalloc(ip, 0, stolen);
+		}
 
 		got->br_startblock = nullstartblock((int)got_indlen);
 
@@ -4924,7 +4948,8 @@ xfs_bmap_del_extent_delay(
 		xfs_iext_insert(ip, icur, &new, state);
 
 		da_new = got_indlen + new_indlen - stolen;
-		del->br_blockcount -= stolen;
+		if (!isrt)
+			del->br_blockcount -= stolen;
 		break;
 	}
 
-- 
2.39.2


* [PATCH 9/9] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1)
  2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
                   ` (7 preceding siblings ...)
  2024-02-19  6:34 ` [PATCH 8/9] xfs: stop the steal (of data blocks for RT indirect blocks) Christoph Hellwig
@ 2024-02-19  6:34 ` Christoph Hellwig
  8 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-19  6:34 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, linux-xfs

Commit aff3a9edb708 ("xfs: Use preallocation for inodes with extsz
hints") disabled delayed allocation for all inodes with extent size
hints due to a data exposure problem.  It turns out we have since fixed
this data exposure problem by always creating unwritten extents for
delalloc conversions (to address other data exposure problems), but the
writeback path doesn't actually support extent size hints when
converting delalloc these days, which probably isn't a problem given
that people using the hints know what they get.

However, due to the way xfs_get_extsz_hint is implemented, it
always claims an extent size hint for RT inodes even if the RT
extent size is a single FSB.  Due to that, the above commit effectively
disabled delalloc support for RT inodes.

Switch xfs_get_extsz_hint to return 0 for this case and work around
that in a few places to reinstate delalloc support for RT inodes on
file systems with an sb_rextsize of 1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_inode.c   | 3 ++-
 fs/xfs/xfs_iomap.c   | 2 --
 fs/xfs/xfs_iops.c    | 2 +-
 fs/xfs/xfs_rtalloc.c | 2 ++
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 37ec247edc1332..9e12278d1b62cd 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -60,7 +60,8 @@ xfs_get_extsz_hint(
 		return 0;
 	if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
 		return ip->i_extsize;
-	if (XFS_IS_REALTIME_INODE(ip))
+	if (XFS_IS_REALTIME_INODE(ip) &&
+	    ip->i_mount->m_sb.sb_rextsize > 1)
 		return ip->i_mount->m_sb.sb_rextsize;
 	return 0;
 }
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e6abe56d1f1f23..aea4e29ebd6785 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -992,8 +992,6 @@ xfs_buffered_write_iomap_begin(
 		return xfs_direct_write_iomap_begin(inode, offset, count,
 				flags, iomap, srcmap);
 
-	ASSERT(!XFS_IS_REALTIME_INODE(ip));
-
 	error = xfs_qm_dqattach(ip);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index be102fd49560dc..ca60ba060fd5c9 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -521,7 +521,7 @@ xfs_stat_blksize(
 	 * always return the realtime extent size.
 	 */
 	if (XFS_IS_REALTIME_INODE(ip))
-		return XFS_FSB_TO_B(mp, xfs_get_extsz_hint(ip));
+		return XFS_FSB_TO_B(mp, xfs_get_extsz_hint(ip) ? : 1);
 
 	/*
 	 * Allow large block sizes to be reported to userspace programs if the
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 2f85567f3d756b..9c7fba175b9025 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1340,6 +1340,8 @@ xfs_bmap_rtalloc(
 	int			error;
 
 	align = xfs_get_extsz_hint(ap->ip);
+	if (!align)
+		align = 1;
 retry:
 	error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
 					align, 1, ap->eof, 0,
-- 
2.39.2


* Re: [PATCH 3/9] xfs: split xfs_mod_freecounter
  2024-02-19  6:34 ` [PATCH 3/9] xfs: split xfs_mod_freecounter Christoph Hellwig
@ 2024-02-19 23:21   ` Dave Chinner
  2024-02-20  7:28     ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2024-02-19 23:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Mon, Feb 19, 2024 at 07:34:44AM +0100, Christoph Hellwig wrote:
> xfs_mod_freecounter has two entirely separate code paths for adding or
> subtracting from the free counters.  Only the subtract case looks at the
> rsvd flag and can return an error.
> 
> Split xfs_mod_freecounter into separate helpers for subtracting from or
> adding to the freecounter, and remove all the impossible-to-reach error
> handling for the addition case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

.....

> @@ -593,12 +593,10 @@ xfs_trans_unreserve_and_mod_sb(
>  	struct xfs_trans	*tp)
>  {
>  	struct xfs_mount	*mp = tp->t_mountp;
> -	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
>  	int64_t			blkdelta = 0;
>  	int64_t			rtxdelta = 0;
>  	int64_t			idelta = 0;
>  	int64_t			ifreedelta = 0;
> -	int			error;
>  
>  	/* calculate deltas */
>  	if (tp->t_blk_res > 0)
> @@ -621,10 +619,8 @@ xfs_trans_unreserve_and_mod_sb(
>  	}
>  
>  	/* apply the per-cpu counters */
> -	if (blkdelta) {
> -		error = xfs_mod_fdblocks(mp, blkdelta, rsvd);
> -		ASSERT(!error);
> -	}
> +	if (blkdelta)
> +		xfs_add_fdblocks(mp, blkdelta);
>  
>  	if (idelta)
>  		percpu_counter_add_batch(&mp->m_icount, idelta,
> @@ -633,10 +629,8 @@ xfs_trans_unreserve_and_mod_sb(
>  	if (ifreedelta)
>  		percpu_counter_add(&mp->m_ifree, ifreedelta);
>  
> -	if (rtxdelta) {
> -		error = xfs_mod_frextents(mp, rtxdelta);
> -		ASSERT(!error);
> -	}
> +	if (rtxdelta)
> +		xfs_add_frextents(mp, rtxdelta);
>  
>  	if (!(tp->t_flags & XFS_TRANS_SB_DIRTY))
>  		return;

I don't think these hunks are correct. blkdelta and rtxdelta can be
negative - they are int64_t, and they are set via
xfs_trans_mod_sb(). e.g. in xfs_ag_resv_alloc_extent() we do:

	case XFS_AG_RESV_NONE:
                field = args->wasdel ? XFS_TRANS_SB_RES_FDBLOCKS :
                                       XFS_TRANS_SB_FDBLOCKS;
                xfs_trans_mod_sb(args->tp, field, -(int64_t)args->len);
                return;
        }

Which passes a negative delta to xfs_trans_mod_sb() and adds it to
tp->t_fdblocks_delta. So that field can hold a negative number, and
now we pass a negative int64_t to xfs_add_fdblocks() as an unsigned
uint64_t.....

While it might kinda work because of implicit overflow behaviour,
it won't allow that block usage to correctly account
for the reserve pool usage that it should have accounted for near
ENOSPC....
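
Something like this (untested, and keeping the rsvd flag from the old
code) would preserve that for the negative case:

	if (blkdelta > 0)
		xfs_add_fdblocks(mp, blkdelta);
	else if (blkdelta < 0) {
		error = xfs_dec_fdblocks(mp, -blkdelta, rsvd);
		ASSERT(!error);
	}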

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 6/9] xfs: support RT inodes in xfs_mod_delalloc
  2024-02-19  6:34 ` [PATCH 6/9] xfs: support RT inodes in xfs_mod_delalloc Christoph Hellwig
@ 2024-02-19 23:30   ` Dave Chinner
  2024-02-20  5:14     ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2024-02-19 23:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Mon, Feb 19, 2024 at 07:34:47AM +0100, Christoph Hellwig wrote:
> To prepare for re-enabling delalloc on RT devices, pass the data blocks
> (which use the RT device when the inode sits on it) and the indirect
> blocks (which don't) separately to xfs_mod_delalloc, and add a new
> percpu counter to also track the RT delalloc blocks.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
.....

> @@ -4938,7 +4938,7 @@ xfs_bmap_del_extent_delay(
>  		fdblocks += del->br_blockcount;
>  
>  	xfs_add_fdblocks(mp, fdblocks);
> -	xfs_mod_delalloc(mp, -(int64_t)fdblocks);
> +	xfs_mod_delalloc(ip, -(long)del->br_blockcount, -da_diff);
>  	return error;

That change of cast type looks wrong.
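
Presumably the cast should stay int64_t so the negation doesn't
truncate on 32 bit builds, i.e. something like:

	xfs_mod_delalloc(ip, -(int64_t)del->br_blockcount, -da_diff);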

-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 8/9] xfs: stop the steal (of data blocks for RT indirect blocks)
  2024-02-19  6:34 ` [PATCH 8/9] xfs: stop the steal (of data blocks for RT indirect blocks) Christoph Hellwig
@ 2024-02-19 23:47   ` Dave Chinner
  2024-02-20  5:13     ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2024-02-19 23:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Mon, Feb 19, 2024 at 07:34:49AM +0100, Christoph Hellwig wrote:
> When xfs_bmap_del_extent_delay has to split an indirect block it tries
> to steal blocks from the part that gets unmapped to increase the
> indirect block reservation that now needs to cover two extents
> instead of one.
> 
> This works perfectly fine on the data device, where the data and
> indirect blocks come from the same pool.  It has no chance of working
> when the inode sits on the RT device.  To support re-enabling delalloc
> for inodes on the RT device, make this behavior conditional on not
> being for rt extents.  For an RT extent, try to allocate new blocks or
> otherwise just give up.
> 
> Note that splitting a delalloc extent should only happen on writeback
> failure, as for other kinds of hole punching we first write back all
> data and thus convert the delalloc reservations covering the hole to
> a real allocation.
> 
> Note that restoring a quota reservation is always a bit problematic,
> but the force flag should take care of it.  That is, if we actually
> supported quota with the RT volume, which seems to not be the case
> at the moment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 8a84b7f0b55f38..a137abf435eeba 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4912,6 +4912,30 @@ xfs_bmap_del_extent_delay(
>  		WARN_ON_ONCE(!got_indlen || !new_indlen);
>  		stolen = xfs_bmap_split_indlen(da_old, &got_indlen, &new_indlen,
>  						       del->br_blockcount);
> +		if (isrt && stolen) {
> +			/*
> +			 * Ugg, we can't just steal reservations from the data
> +			 * blocks as the data blocks come from a different pool.
> +			 *
> +			 * So we have to try to increase our reservations here,
> +			 * and if that fails we have to fail the unmap.  To
> +			 * avoid that as much as possible dip into the reserve
> +			 * pool.
> +			 *
> +			 * Note that in theory the user/group/project could
> +			 * be over the quota limit in the meantime, thus we
> +			 * force the quota accounting even if it was over the
> +			 * limit.
> +			 */
> +			error = xfs_dec_fdblocks(mp, stolen, true);
> +			if (error) {
> +				ip->i_delayed_blks += del->br_blockcount;
> +				xfs_trans_reserve_quota_nblks(NULL, ip, 0,
> +						del->br_blockcount, true);
> +				return error;
> +			}
> +			xfs_mod_delalloc(ip, 0, stolen);
> +		}

Ok. If you delay the ip->i_delayed_blks and quota accounting until
after the incore extent tree updates are done, this code doesn't
need to undo anything and can just return an error. We should also
keep in mind that an error here will likely cause a filesystem
shutdown when the transaction is canceled....
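
i.e. something like this (untested) in the isrt branch, with the quota
unreserve and i_delayed_blks updates moved after the iext tree changes:

	error = xfs_dec_fdblocks(mp, stolen, true);
	if (error)
		return error;
	xfs_mod_delalloc(ip, 0, stolen);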

FWIW, if we are going to do this for rt, we should probably also
consider doing it for normal delalloc conversion when the indlen
reservation runs out due to excessive fragmentation of large
extents.  Separate patch and all that, but it doesn't really make
sense to me to only do this for RT when we know it is also needed in
rare cases on non-rt workloads...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 2/9] xfs: move RT inode locking out of __xfs_bunmapi
  2024-02-19  6:34 ` [PATCH 2/9] xfs: move RT inode locking out of __xfs_bunmapi Christoph Hellwig
@ 2024-02-19 23:55   ` Dave Chinner
  2024-02-20  5:10     ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2024-02-19 23:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Mon, Feb 19, 2024 at 07:34:43AM +0100, Christoph Hellwig wrote:
> __xfs_bunmapi is a bit of an odd place to lock the rtbitmap and rtsummary
> inodes given that it is very high level code.  While this only looks ugly
> right now, it will become a problem when supporting delayed allocations
> for RT inodes as __xfs_bunmapi might end up deleting only delalloc extents
> and thus never unlock the rt inodes.
> 
> Move the locking into xfs_rtfree_blocks instead (where it will also be
> helpful once we support extfree items for RT allocations), and use a new
> flag in the transaction to ensure they aren't locked twice.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_bmap.c     | 10 ----------
>  fs/xfs/libxfs/xfs_rtbitmap.c | 14 ++++++++++++++
>  fs/xfs/libxfs/xfs_shared.h   |  3 +++
>  3 files changed, 17 insertions(+), 10 deletions(-)

Ok, nice cleanup.
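
Roughly, the flag-guarded locking could then live in xfs_rtfree_blocks()
along these lines (a sketch only; XFS_TRANS_RTBITMAP_LOCKED stands in
for whatever flag name the patch actually introduces):

	struct xfs_mount	*mp = tp->t_mountp;

	if (!(tp->t_flags & XFS_TRANS_RTBITMAP_LOCKED)) {
		tp->t_flags |= XFS_TRANS_RTBITMAP_LOCKED;
		xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
		xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL);
		xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
		xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL);
	}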

I'd also like to see the rest of the rt-only code in __xfs_bunmapi()
lifted out of the function into a helper. It's a big chunk of code
inside the loop, and the code structure is:

	loop() {
		/* common stuff */

		if (!rt)
			goto delete;

		/* lots of rt only stuff */

	delete:
		/* common stuff */
	}

I think this would be much better as

	loop() {
		/* common stuff */

		if (rt) {
			error = xfs_bunmapi_rtext();
			if (error)
				goto error0;
		}

		/* common stuff */
	}

Separate cleanup though, not necessary for this patchset...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/9] xfs: move RT inode locking out of __xfs_bunmapi
  2024-02-19 23:55   ` Dave Chinner
@ 2024-02-20  5:10     ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-20  5:10 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, linux-xfs

On Tue, Feb 20, 2024 at 10:55:59AM +1100, Dave Chinner wrote:
> I'd also like to see the rest of the rt-only code in __xfs_bunmapi()
> lifted out of the function into a helper. It's a big chunk of code
> inside the loop, and the code structure is:

Yes, this code has been bothering me forever, and I've started at least
three attempts at cleaning it up, but for now given them up due to the
really complicated exit conditions.  I think it really should be done
eventually, but I don't plan to add it to this series.  It will probably
end up changing some of the existing loop logic as some of that is
pretty questionable and predates the nice iext iterators.
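
(For reference, the iext iterators allow straightforward walks like
this, purely illustrative, instead of the old hand-rolled extent index
arithmetic:

	struct xfs_iext_cursor	icur;
	struct xfs_bmbt_irec	got;

	for_each_xfs_iext(ifp, &icur, &got) {
		/* inspect or adjust the mapping in "got" */
	}

where "ifp" is the fork being walked.)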

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 8/9] xfs: stop the steal (of data blocks for RT indirect blocks)
  2024-02-19 23:47   ` Dave Chinner
@ 2024-02-20  5:13     ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-20  5:13 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, linux-xfs

On Tue, Feb 20, 2024 at 10:47:12AM +1100, Dave Chinner wrote:
> Ok. If you delay the ip->i_delayed_blks and quota accounting until
> after the incore extent tree updates are done, this code doesn't
> need to undo anything and can just return an error. We should also
> keep in mind that an error here will likely cause a filesystem
> shutdown when the transaction is canceled....

Yes.  However (as documented in the commit log), the only place where
I think it can actually happen is on buffered write errors, as "real"
hole punches always flush out the delalloc space beforehand.

> FWIW, if we are going to do this for rt, we should probably also
> consider do it for normal delalloc conversion when the indlen
> reservation runs out due to excessive fragmentation of large
> extents. Separate patch and all that, but it doesn't really make
> sense to me to only do this for RT when we know it is also needed in
> reare cases on non-rt workloads...

Can it happen for non-RT extents?  That would require the new indirect
block reservation needed to split an extent to be larger than the
amount we punch out.
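
For a purely theoretical worked example (made-up numbers): take a large
delalloc extent whose worst case indlen reservation is 4 blocks, and
punch a single block out of its middle.  Each of the two remaining
halves can still be large enough to need a worst case reservation of 4
blocks on its own, so the split wants 8 blocks of indlen while only the
old 4 plus the 1 punched block are available to steal.  Whether that
shortfall actually bites in practice is the question.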

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 6/9] xfs: support RT inodes in xfs_mod_delalloc
  2024-02-19 23:30   ` Dave Chinner
@ 2024-02-20  5:14     ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-20  5:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, linux-xfs

On Tue, Feb 20, 2024 at 10:30:03AM +1100, Dave Chinner wrote:
> On Mon, Feb 19, 2024 at 07:34:47AM +0100, Christoph Hellwig wrote:
> > To prepare for re-enabling delalloc on RT devices, track the data blocks
> > (which use the RT device when the inode sits on it) and the indirect
> > blocks (which don't) separately to xfs_mod_delalloc, and add a new
> > percpu counter to also track the RT delalloc blocks.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> .....
> 
> > @@ -4938,7 +4938,7 @@ xfs_bmap_del_extent_delay(
> >  		fdblocks += del->br_blockcount;
> >  
> >  	xfs_add_fdblocks(mp, fdblocks);
> > -	xfs_mod_delalloc(mp, -(int64_t)fdblocks);
> > +	xfs_mod_delalloc(ip, -(long)del->br_blockcount, -da_diff);
> >  	return error;
> 
> That change of cast type looks wrong.

Yes, this should be a int64_t.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] xfs: split xfs_mod_freecounter
  2024-02-19 23:21   ` Dave Chinner
@ 2024-02-20  7:28     ` Christoph Hellwig
  2024-02-20 16:08       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-20  7:28 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, linux-xfs

On Tue, Feb 20, 2024 at 10:21:12AM +1100, Dave Chinner wrote:
> I don't think these hunks are correct. blkdelta and rtxdelta can be
> negative - they are int64_t, and they are set via
> xfs_trans_mod_sb(). e.g. in xfs_ag_resv_alloc_extent() we do:
> 
> 	case XFS_AG_RESV_NONE:
>                 field = args->wasdel ? XFS_TRANS_SB_RES_FDBLOCKS :
>                                        XFS_TRANS_SB_FDBLOCKS;
>                 xfs_trans_mod_sb(args->tp, field, -(int64_t)args->len);
>                 return;
>         }
> 
> Which passes a negative delta to xfs_trans_mod_sb() and adds it to
> tp->t_fdblocks_delta. So that field can hold a negative number, and
> now we pass a negative int64_t to xfs_add_fdblocks() as an unsigned
> uint64_t.....

This area is rather subtle.

For XFS_TRANS_SB_FDBLOCKS, xfs_trans_mod_sb expects enough t_blk_res to
be held to at least balance out the t_fdblocks_delta value.  That is,
xfs_trans_unreserve_and_mod_sb always starts out with a positive value
from t_blk_res, then subtracts the blocks actually allocated (recorded
as a negative t_fdblocks_delta), and must still end up with zero or a
positive value; whatever positive value is left it "unreserves", per
the function name.  Same for the rtextent version.
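
As a made-up example: a transaction reserves t_blk_res = 10 blocks up
front and then allocates 4 of them, leaving t_fdblocks_delta at -4.  At
commit time blkdelta = 10 + (-4) = 6, and those 6 unused blocks are put
back into the free block counter.  Freeing blocks only makes
t_fdblocks_delta more positive, so blkdelta cannot go negative as long
as every allocation is covered by the reservation.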

So we should be fine here, but the code could really use documentation,
a few more asserts and a slightly different structure that makes this
more obvious.  I'll throw in a patch for that.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] xfs: split xfs_mod_freecounter
  2024-02-20  7:28     ` Christoph Hellwig
@ 2024-02-20 16:08       ` Christoph Hellwig
  2024-02-21  0:00         ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2024-02-20 16:08 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, linux-xfs

On Tue, Feb 20, 2024 at 08:28:21AM +0100, Christoph Hellwig wrote:
> So we should be fine here, but the code could really use documentation,
> a few more asserts and a slightly different structure that makes this
> more obvious.  I'll throw in a patch for that.

This is what I ended up with:

---
From 22cba925f1f94b22cfa6143a814f1d14a3521621 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 20 Feb 2024 08:35:27 +0100
Subject: xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positive

And to make that more clear, rearrange the code a bit and add asserts
and a comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_trans.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 12d45e93f07d50..befb508638ca1f 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -594,28 +594,38 @@ xfs_trans_unreserve_and_mod_sb(
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
-	int64_t			blkdelta = 0;
-	int64_t			rtxdelta = 0;
+	int64_t			blkdelta = tp->t_blk_res;
+	int64_t			rtxdelta = tp->t_rtx_res;
 	int64_t			idelta = 0;
 	int64_t			ifreedelta = 0;
 	int			error;
 
-	/* calculate deltas */
-	if (tp->t_blk_res > 0)
-		blkdelta = tp->t_blk_res;
-	if ((tp->t_fdblocks_delta != 0) &&
-	    (xfs_has_lazysbcount(mp) ||
-	     (tp->t_flags & XFS_TRANS_SB_DIRTY)))
+	/*
+	 * Calculate the deltas.
+	 *
+	 * t_fdblocks_delta and t_frextents_delta can be positive or negative:
+	 *
> +	 *  - positive values indicate blocks freed in the transaction.
> +	 *  - negative values indicate blocks allocated in the transaction.
> +	 *
> +	 * Negative values can only happen if the transaction has a block
> +	 * reservation that covers the allocated blocks.  The end result is
> +	 * that the calculated delta values must always be positive and we
> +	 * can only put back previously allocated or reserved blocks here.
+	 */
+	ASSERT(tp->t_blk_res || tp->t_fdblocks_delta >= 0);
+	if (xfs_has_lazysbcount(mp) || (tp->t_flags & XFS_TRANS_SB_DIRTY)) {
 	        blkdelta += tp->t_fdblocks_delta;
+		ASSERT(blkdelta >= 0);
+	}
 
-	if (tp->t_rtx_res > 0)
-		rtxdelta = tp->t_rtx_res;
-	if ((tp->t_frextents_delta != 0) &&
-	    (tp->t_flags & XFS_TRANS_SB_DIRTY))
+	ASSERT(tp->t_rtx_res || tp->t_frextents_delta >= 0);
+	if (tp->t_flags & XFS_TRANS_SB_DIRTY) {
 		rtxdelta += tp->t_frextents_delta;
+		ASSERT(rtxdelta >= 0);
+	}
 
-	if (xfs_has_lazysbcount(mp) ||
-	     (tp->t_flags & XFS_TRANS_SB_DIRTY)) {
+	if (xfs_has_lazysbcount(mp) || (tp->t_flags & XFS_TRANS_SB_DIRTY)) {
 		idelta = tp->t_icount_delta;
 		ifreedelta = tp->t_ifree_delta;
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] xfs: split xfs_mod_freecounter
  2024-02-20 16:08       ` Christoph Hellwig
@ 2024-02-21  0:00         ` Dave Chinner
  0 siblings, 0 replies; 20+ messages in thread
From: Dave Chinner @ 2024-02-21  0:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Tue, Feb 20, 2024 at 05:08:58PM +0100, Christoph Hellwig wrote:
> On Tue, Feb 20, 2024 at 08:28:21AM +0100, Christoph Hellwig wrote:
> > So we should be fine here, but the code could really use documentation,
> > a few more asserts and a slightly different structure that makes this
> > more obvious.  I'll throw in a patch for that.
> 
> This is what I ended up with:
> 
> ---
> From 22cba925f1f94b22cfa6143a814f1d14a3521621 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Tue, 20 Feb 2024 08:35:27 +0100
> Subject: xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positive
> 
> And to make that more clear, rearrange the code a bit and add asserts
> and a comment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_trans.c | 38 ++++++++++++++++++++++++--------------
>  1 file changed, 24 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 12d45e93f07d50..befb508638ca1f 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -594,28 +594,38 @@ xfs_trans_unreserve_and_mod_sb(
>  {
>  	struct xfs_mount	*mp = tp->t_mountp;
>  	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
> -	int64_t			blkdelta = 0;
> -	int64_t			rtxdelta = 0;
> +	int64_t			blkdelta = tp->t_blk_res;
> +	int64_t			rtxdelta = tp->t_rtx_res;
>  	int64_t			idelta = 0;
>  	int64_t			ifreedelta = 0;
>  	int			error;
>  
> -	/* calculate deltas */
> -	if (tp->t_blk_res > 0)
> -		blkdelta = tp->t_blk_res;
> -	if ((tp->t_fdblocks_delta != 0) &&
> -	    (xfs_has_lazysbcount(mp) ||
> -	     (tp->t_flags & XFS_TRANS_SB_DIRTY)))
> +	/*
> +	 * Calculate the deltas.
> +	 *
> +	 * t_fdblocks_delta and t_frextents_delta can be positive or negative:
> +	 *
> > +	 *  - positive values indicate blocks freed in the transaction.
> > +	 *  - negative values indicate blocks allocated in the transaction.
> > +	 *
> > +	 * Negative values can only happen if the transaction has a block
> > +	 * reservation that covers the allocated blocks.  The end result is
> > +	 * that the calculated delta values must always be positive and we
> > +	 * can only put back previously allocated or reserved blocks here.
> +	 */
> +	ASSERT(tp->t_blk_res || tp->t_fdblocks_delta >= 0);
> +	if (xfs_has_lazysbcount(mp) || (tp->t_flags & XFS_TRANS_SB_DIRTY)) {
>  	        blkdelta += tp->t_fdblocks_delta;
> +		ASSERT(blkdelta >= 0);
> +	}
>  
> -	if (tp->t_rtx_res > 0)
> -		rtxdelta = tp->t_rtx_res;
> -	if ((tp->t_frextents_delta != 0) &&
> -	    (tp->t_flags & XFS_TRANS_SB_DIRTY))
> +	ASSERT(tp->t_rtx_res || tp->t_frextents_delta >= 0);
> +	if (tp->t_flags & XFS_TRANS_SB_DIRTY) {
>  		rtxdelta += tp->t_frextents_delta;
> +		ASSERT(rtxdelta >= 0);
> +	}
>  
> -	if (xfs_has_lazysbcount(mp) ||
> -	     (tp->t_flags & XFS_TRANS_SB_DIRTY)) {
> +	if (xfs_has_lazysbcount(mp) || (tp->t_flags & XFS_TRANS_SB_DIRTY)) {
>  		idelta = tp->t_icount_delta;
>  		ifreedelta = tp->t_ifree_delta;
>  	}

That seems reasonable - at least it documents the expectations.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-02-21  0:00 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-19  6:34 bring back RT delalloc support Christoph Hellwig
2024-02-19  6:34 ` [PATCH 1/9] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions Christoph Hellwig
2024-02-19  6:34 ` [PATCH 2/9] xfs: move RT inode locking out of __xfs_bunmapi Christoph Hellwig
2024-02-19 23:55   ` Dave Chinner
2024-02-20  5:10     ` Christoph Hellwig
2024-02-19  6:34 ` [PATCH 3/9] xfs: split xfs_mod_freecounter Christoph Hellwig
2024-02-19 23:21   ` Dave Chinner
2024-02-20  7:28     ` Christoph Hellwig
2024-02-20 16:08       ` Christoph Hellwig
2024-02-21  0:00         ` Dave Chinner
2024-02-19  6:34 ` [PATCH 4/9] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc Christoph Hellwig
2024-02-19  6:34 ` [PATCH 5/9] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay Christoph Hellwig
2024-02-19  6:34 ` [PATCH 6/9] xfs: support RT inodes in xfs_mod_delalloc Christoph Hellwig
2024-02-19 23:30   ` Dave Chinner
2024-02-20  5:14     ` Christoph Hellwig
2024-02-19  6:34 ` [PATCH 7/9] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations Christoph Hellwig
2024-02-19  6:34 ` [PATCH 8/9] xfs: stop the steal (of data blocks for RT indirect blocks) Christoph Hellwig
2024-02-19 23:47   ` Dave Chinner
2024-02-20  5:13     ` Christoph Hellwig
2024-02-19  6:34 ` [PATCH 9/9] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1) Christoph Hellwig
