All of lore.kernel.org
 help / color / mirror / Atom feed
* bring back RT delalloc support v5
@ 2024-03-27 11:03 Christoph Hellwig
  2024-03-27 11:03 ` [PATCH 01/13] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions Christoph Hellwig
                   ` (12 more replies)
  0 siblings, 13 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

Hi all,

this series adds back delalloc support for RT inodes, at least if the RT
extent size is a single file system block.  This shows really nice
performance improvements for workloads that frequently rewrite or append
to files, and improves fragmentation for larger writes.  On other
workloads it sometimes shows small performance improvements or flat
performance.

Changes since v4:
 - fix a bug in the fix for splitting RT delalloc reservations
 - rework where rtbitmap locking happens in the bunmapi patch to
   avoid churn with a series in development, stealing a patch from
   Darrick's giant patch queue in the process

Changes since v3:
 - fix a bug in splitting RT delalloc reservations
 - fix online repair of RT delalloc fscounters
 - adjust the freespace counter refactoing to play nicer with an
   upcoming series of mine

Changes since v3:
 - drop gratuitous changes to xfs_mod_freecounter/xfs_add_freecounter
   that caused a warning to be logged when it shouldn't.
 - track delalloc rtextents instead of rtblocks in xfs_mount
 - fix commit message spelling an typos
 
Changes since v2:
 - keep casting to int64_t for xfs_mod_delalloc
 - add a patch to clarify and assert that the block delta in
   xfs_trans_unreserve_and_mod_sb can only be positive

Diffstat:
 libxfs/xfs_ag.c           |    4 -
 libxfs/xfs_ag_resv.c      |   24 +------
 libxfs/xfs_ag_resv.h      |    2 
 libxfs/xfs_alloc.c        |    4 -
 libxfs/xfs_bmap.c         |  152 ++++++++++++++++++++++------------------------
 libxfs/xfs_rtbitmap.c     |   57 +++++++++++++++++
 libxfs/xfs_rtbitmap.h     |   17 +++++
 libxfs/xfs_shared.h       |    6 +
 scrub/common.c            |    1 
 scrub/fscounters.c        |   12 ++-
 scrub/fscounters.h        |    1 
 scrub/fscounters_repair.c |    3 
 scrub/repair.c            |    5 -
 xfs_fsmap.c               |    4 -
 xfs_fsops.c               |   29 ++------
 xfs_fsops.h               |    2 
 xfs_inode.c               |    3 
 xfs_iomap.c               |   44 +++++++++----
 xfs_iops.c                |    2 
 xfs_mount.c               |   85 +++++++++++++++----------
 xfs_mount.h               |   36 ++++++++--
 xfs_rtalloc.c             |   22 ++----
 xfs_super.c               |   17 +++--
 xfs_trace.h               |    1 
 xfs_trans.c               |   63 +++++++++----------
 25 files changed, 348 insertions(+), 248 deletions(-)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 01/13] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-28  3:08   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 02/13] xfs: refactor realtime inode locking Christoph Hellwig
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

Commit bb7b1c9c5dd3 ("xfs: tag transactions that contain intent done
items") switched the XFS_TRANS_ definitions to be bit based, and using
comments above the definitions.  As XFS_TRANS_LOWMODE was last and has
a big fat comment it was missed.  Switch it to the same style.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_shared.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index dfd61fa8332e1a..f35640ad3e7fe4 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -124,7 +124,6 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
 #define XFS_TRANS_RES_FDBLKS		(1u << 6)
 /* Transaction contains an intent done log item */
 #define XFS_TRANS_HAS_INTENT_DONE	(1u << 7)
-
 /*
  * LOWMODE is used by the allocator to activate the lowspace algorithm - when
  * free space is running low the extent allocator may choose to allocate an
@@ -136,7 +135,7 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
  * for free space from AG 0. If the correct transaction reservations have been
  * made then this algorithm will eventually find all the space it needs.
  */
-#define XFS_TRANS_LOWMODE	0x100	/* allocate in low space mode */
+#define XFS_TRANS_LOWMODE		(1u << 8)
 
 /*
  * Field values for xfs_trans_mod_sb.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 02/13] xfs: refactor realtime inode locking
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
  2024-03-27 11:03 ` [PATCH 01/13] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-28  3:15   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 03/13] xfs: free RT extents after updating the bmap btree Christoph Hellwig
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

From: "Darrick J. Wong" <djwong@kernel.org>

Create helper functions to deal with locking realtime metadata inodes.
This enables us to maintain correct locking order once we start adding
the realtime rmap and refcount btree inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c     |  7 ++---
 fs/xfs/libxfs/xfs_rtbitmap.c | 57 ++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtbitmap.h | 17 +++++++++++
 fs/xfs/scrub/common.c        |  1 +
 fs/xfs/scrub/fscounters.c    |  4 +--
 fs/xfs/xfs_fsmap.c           |  4 +--
 fs/xfs/xfs_rtalloc.c         | 20 ++++---------
 7 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 656c95a22f2e6d..09d4b730ee9709 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5416,12 +5416,9 @@ __xfs_bunmapi(
 
 	if (isrt) {
 		/*
-		 * Synchronize by locking the bitmap inode.
+		 * Synchronize by locking the realtime bitmap.
 		 */
-		xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL|XFS_ILOCK_RTBITMAP);
-		xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL);
-		xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL|XFS_ILOCK_RTSUM);
-		xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL);
+		xfs_rtbitmap_lock(tp, mp);
 	}
 
 	extno = 0;
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index f246d6dbf4eca8..386b672c505896 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -1168,3 +1168,60 @@ xfs_rtsummary_wordcount(
 	blocks = xfs_rtsummary_blockcount(mp, rsumlevels, rbmblocks);
 	return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG;
 }
+
+/*
+ * Lock both realtime free space metadata inodes for a freespace update.  If a
+ * transaction is given, the inodes will be joined to the transaction and the
+ * ILOCKs will be released on transaction commit.
+ */
+void
+xfs_rtbitmap_lock(
+	struct xfs_trans	*tp,
+	struct xfs_mount	*mp)
+{
+	xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
+	if (tp)
+		xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL);
+
+	xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
+	if (tp)
+		xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL);
+}
+
+/* Unlock both realtime free space metadata inodes after a freespace update. */
+void
+xfs_rtbitmap_unlock(
+	struct xfs_mount	*mp)
+{
+	xfs_iunlock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
+	xfs_iunlock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
+}
+
+/*
+ * Lock the realtime free space metadata inodes for a freespace scan.  Callers
+ * must walk metadata blocks in order of increasing file offset.
+ */
+void
+xfs_rtbitmap_lock_shared(
+	struct xfs_mount	*mp,
+	unsigned int		rbmlock_flags)
+{
+	if (rbmlock_flags & XFS_RBMLOCK_BITMAP)
+		xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
+
+	if (rbmlock_flags & XFS_RBMLOCK_SUMMARY)
+		xfs_ilock(mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM);
+}
+
+/* Unlock the realtime free space metadata inodes after a freespace scan. */
+void
+xfs_rtbitmap_unlock_shared(
+	struct xfs_mount	*mp,
+	unsigned int		rbmlock_flags)
+{
+	if (rbmlock_flags & XFS_RBMLOCK_SUMMARY)
+		xfs_iunlock(mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM);
+
+	if (rbmlock_flags & XFS_RBMLOCK_BITMAP)
+		xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
+}
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index 152a66750af554..6186585f2c376d 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -360,6 +360,19 @@ xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp,
 		unsigned int rsumlevels, xfs_extlen_t rbmblocks);
 unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp,
 		unsigned int rsumlevels, xfs_extlen_t rbmblocks);
+
+void xfs_rtbitmap_lock(struct xfs_trans *tp, struct xfs_mount *mp);
+void xfs_rtbitmap_unlock(struct xfs_mount *mp);
+
+/* Lock the rt bitmap inode in shared mode */
+#define XFS_RBMLOCK_BITMAP	(1U << 0)
+/* Lock the rt summary inode in shared mode */
+#define XFS_RBMLOCK_SUMMARY	(1U << 1)
+
+void xfs_rtbitmap_lock_shared(struct xfs_mount *mp,
+		unsigned int rbmlock_flags);
+void xfs_rtbitmap_unlock_shared(struct xfs_mount *mp,
+		unsigned int rbmlock_flags);
 #else /* CONFIG_XFS_RT */
 # define xfs_rtfree_extent(t,b,l)			(-ENOSYS)
 # define xfs_rtfree_blocks(t,rb,rl)			(-ENOSYS)
@@ -378,6 +391,10 @@ xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents)
 # define xfs_rtbitmap_wordcount(mp, r)			(0)
 # define xfs_rtsummary_blockcount(mp, l, b)		(0)
 # define xfs_rtsummary_wordcount(mp, l, b)		(0)
+# define xfs_rtbitmap_lock(tp, mp)		do { } while (0)
+# define xfs_rtbitmap_unlock(mp)		do { } while (0)
+# define xfs_rtbitmap_lock_shared(mp, lf)	do { } while (0)
+# define xfs_rtbitmap_unlock_shared(mp, lf)	do { } while (0)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __XFS_RTBITMAP_H__ */
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 47a20cf5205f00..48887a163e6a3b 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -31,6 +31,7 @@
 #include "xfs_ag.h"
 #include "xfs_error.h"
 #include "xfs_quota.h"
+#include "xfs_rtbitmap.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index d310737c882367..2b4bd2eb71b57d 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -415,7 +415,7 @@ xchk_fscount_count_frextents(
 	if (!xfs_has_realtime(mp))
 		return 0;
 
-	xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
+	xfs_rtbitmap_lock_shared(sc->mp, XFS_RBMLOCK_BITMAP);
 	error = xfs_rtalloc_query_all(sc->mp, sc->tp,
 			xchk_fscount_add_frextent, fsc);
 	if (error) {
@@ -424,7 +424,7 @@ xchk_fscount_count_frextents(
 	}
 
 out_unlock:
-	xfs_iunlock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
+	xfs_rtbitmap_unlock_shared(sc->mp, XFS_RBMLOCK_BITMAP);
 	return error;
 }
 #else
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index de59eec7476575..85dbb46452ca0b 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -533,7 +533,7 @@ xfs_getfsmap_rtdev_rtbitmap(
 	trace_xfs_fsmap_low_key_linear(mp, info->dev, start_rtb);
 	trace_xfs_fsmap_high_key_linear(mp, info->dev, end_rtb);
 
-	xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
+	xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP);
 
 	/*
 	 * Set up query parameters to return free rtextents covering the range
@@ -557,7 +557,7 @@ xfs_getfsmap_rtdev_rtbitmap(
 	if (error)
 		goto err;
 err:
-	xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
+	xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP);
 	return error;
 }
 #endif /* CONFIG_XFS_RT */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index e66f9bd5de5cff..86f928d30feda9 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -957,10 +957,10 @@ xfs_growfs_rt(
 		nargs.tp = tp;
 
 		/*
-		 * Lock out other callers by grabbing the bitmap inode lock.
+		 * Lock out other callers by grabbing the bitmap and summary
+		 * inode locks and joining them to the transaction.
 		 */
-		xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
-		xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL);
+		xfs_rtbitmap_lock(tp, mp);
 		/*
 		 * Update the bitmap inode's size ondisk and incore.  We need
 		 * to update the incore size so that inode inactivation won't
@@ -970,11 +970,6 @@ xfs_growfs_rt(
 			nsbp->sb_rbmblocks * nsbp->sb_blocksize;
 		i_size_write(VFS_I(mp->m_rbmip), mp->m_rbmip->i_disk_size);
 		xfs_trans_log_inode(tp, mp->m_rbmip, XFS_ILOG_CORE);
-		/*
-		 * Get the summary inode into the transaction.
-		 */
-		xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
-		xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL);
 		/*
 		 * Update the summary inode's size.  We need to update the
 		 * incore size so that inode inactivation won't punch what it
@@ -1142,10 +1137,10 @@ xfs_rtalloc_reinit_frextents(
 	uint64_t		val = 0;
 	int			error;
 
-	xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
+	xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP);
 	error = xfs_rtalloc_query_all(mp, NULL, xfs_rtalloc_count_frextent,
 			&val);
-	xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP);
+	xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP);
 	if (error)
 		return error;
 
@@ -1382,10 +1377,7 @@ xfs_bmap_rtalloc(
 	 * Lock out modifications to both the RT bitmap and summary inodes
 	 */
 	if (!rtlocked) {
-		xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL|XFS_ILOCK_RTBITMAP);
-		xfs_trans_ijoin(ap->tp, mp->m_rbmip, XFS_ILOCK_EXCL);
-		xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL|XFS_ILOCK_RTSUM);
-		xfs_trans_ijoin(ap->tp, mp->m_rsumip, XFS_ILOCK_EXCL);
+		xfs_rtbitmap_lock(ap->tp, mp);
 		rtlocked = true;
 	}
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 03/13] xfs: free RT extents after updating the bmap btree
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
  2024-03-27 11:03 ` [PATCH 01/13] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions Christoph Hellwig
  2024-03-27 11:03 ` [PATCH 02/13] xfs: refactor realtime inode locking Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-27 14:55   ` Darrick J. Wong
  2024-03-28  4:13   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi Christoph Hellwig
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

Currently xfs_bmap_del_extent_real frees RT extents before updating
the bmap btree, while it frees regular blocks after performing the bmap
btree update.  While this behavior goes back to the original commit,
I can't find any good reason for handling RT extent vs regular block
freeing differently.  We use the same transaction, and unless rmaps
or reflink are enabled (which currently aren't support for RT inodes)
there are no transactions rolls or deferred ops that can rely on this
ordering.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 09d4b730ee9709..282b44deb9f864 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5107,8 +5107,7 @@ xfs_bmap_del_extent_real(
 {
 	xfs_fsblock_t		del_endblock=0;	/* first block past del */
 	xfs_fileoff_t		del_endoff;	/* first offset past del */
-	int			do_fx;	/* free extent at end of routine */
-	int			error;	/* error return value */
+	int			error = 0;	/* error return value */
 	struct xfs_bmbt_irec	got;	/* current extent entry */
 	xfs_fileoff_t		got_endoff;	/* first offset past got */
 	int			i;	/* temp state */
@@ -5151,20 +5150,10 @@ xfs_bmap_del_extent_real(
 		return -ENOSPC;
 
 	*logflagsp = XFS_ILOG_CORE;
-	if (xfs_ifork_is_realtime(ip, whichfork)) {
-		if (!(bflags & XFS_BMAPI_REMAP)) {
-			error = xfs_rtfree_blocks(tp, del->br_startblock,
-					del->br_blockcount);
-			if (error)
-				return error;
-		}
-
-		do_fx = 0;
+	if (xfs_ifork_is_realtime(ip, whichfork))
 		qfield = XFS_TRANS_DQ_RTBCOUNT;
-	} else {
-		do_fx = 1;
+	else
 		qfield = XFS_TRANS_DQ_BCOUNT;
-	}
 	nblks = del->br_blockcount;
 
 	del_endblock = del->br_startblock + del->br_blockcount;
@@ -5312,18 +5301,21 @@ xfs_bmap_del_extent_real(
 	/*
 	 * If we need to, add to list of extents to delete.
 	 */
-	if (do_fx && !(bflags & XFS_BMAPI_REMAP)) {
+	if (!(bflags & XFS_BMAPI_REMAP)) {
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
+		} else if (xfs_ifork_is_realtime(ip, whichfork)) {
+			error = xfs_rtfree_blocks(tp, del->br_startblock,
+					del->br_blockcount);
 		} else {
 			error = xfs_free_extent_later(tp, del->br_startblock,
 					del->br_blockcount, NULL,
 					XFS_AG_RESV_NONE,
 					((bflags & XFS_BMAPI_NODISCARD) ||
 					del->br_state == XFS_EXT_UNWRITTEN));
-			if (error)
-				return error;
 		}
+		if (error)
+			return error;
 	}
 
 	/*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 03/13] xfs: free RT extents after updating the bmap btree Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-27 15:07   ` Darrick J. Wong
  2024-03-28  4:15   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 05/13] xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positive Christoph Hellwig
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

__xfs_bunmapi is a bit of an odd place to lock the rtbitmap and rtsummary
inodes given that it is very high level code.  While this only looks ugly
right now, it will become a problem when supporting delayed allocations
for RT inodes as __xfs_bunmapi might end up deleting only delalloc extents
and thus never unlock the rt inodes.

Move the locking into xfs_bmap_del_extent_real just before the call to
xfs_rtfree_blocks instead and use a new flag in the transaction to ensure
that the locking happens only once.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c   | 15 ++++++++-------
 fs/xfs/libxfs/xfs_shared.h |  3 +++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 282b44deb9f864..e5e199d325982f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5305,6 +5305,14 @@ xfs_bmap_del_extent_real(
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
 		} else if (xfs_ifork_is_realtime(ip, whichfork)) {
+			/*
+			 * Ensure the bitmap and summary inodes are locked
+			 * and joined to the transaction before modifying them.
+			 */
+			if (!(tp->t_flags & XFS_TRANS_RTBITMAP_LOCKED)) {
+				tp->t_flags |= XFS_TRANS_RTBITMAP_LOCKED;
+				xfs_rtbitmap_lock(tp, mp);
+			}
 			error = xfs_rtfree_blocks(tp, del->br_startblock,
 					del->br_blockcount);
 		} else {
@@ -5406,13 +5414,6 @@ __xfs_bunmapi(
 	} else
 		cur = NULL;
 
-	if (isrt) {
-		/*
-		 * Synchronize by locking the realtime bitmap.
-		 */
-		xfs_rtbitmap_lock(tp, mp);
-	}
-
 	extno = 0;
 	while (end != (xfs_fileoff_t)-1 && end >= start &&
 	       (nexts == 0 || extno < nexts)) {
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index f35640ad3e7fe4..34f104ed372c09 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -137,6 +137,9 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
  */
 #define XFS_TRANS_LOWMODE		(1u << 8)
 
+/* Transaction has locked the rtbitmap and rtsum inodes */
+#define XFS_TRANS_RTBITMAP_LOCKED	(1u << 9)
+
 /*
  * Field values for xfs_trans_mod_sb.
  */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 05/13] xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positive
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-28  4:16   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 06/13] xfs: split xfs_mod_freecounter Christoph Hellwig
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

And to make that more clear, rearrange the code a bit and add asserts
and a comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_trans.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7350640059cc60..924b460229e951 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -594,28 +594,38 @@ xfs_trans_unreserve_and_mod_sb(
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
-	int64_t			blkdelta = 0;
-	int64_t			rtxdelta = 0;
+	int64_t			blkdelta = tp->t_blk_res;
+	int64_t			rtxdelta = tp->t_rtx_res;
 	int64_t			idelta = 0;
 	int64_t			ifreedelta = 0;
 	int			error;
 
-	/* calculate deltas */
-	if (tp->t_blk_res > 0)
-		blkdelta = tp->t_blk_res;
-	if ((tp->t_fdblocks_delta != 0) &&
-	    (xfs_has_lazysbcount(mp) ||
-	     (tp->t_flags & XFS_TRANS_SB_DIRTY)))
+	/*
+	 * Calculate the deltas.
+	 *
+	 * t_fdblocks_delta and t_frextents_delta can be positive or negative:
+	 *
+	 *  - positive values indicate blocks freed in the transaction.
+	 *  - negative values indicate blocks allocated in the transaction
+	 *
+	 * Negative values can only happen if the transaction has a block
+	 * reservation that covers the allocated block.  The end result is
+	 * that the calculated delta values must always be positive and we
+	 * can only put back previous allocated or reserved blocks here.
+	 */
+	ASSERT(tp->t_blk_res || tp->t_fdblocks_delta >= 0);
+	if (xfs_has_lazysbcount(mp) || (tp->t_flags & XFS_TRANS_SB_DIRTY)) {
 	        blkdelta += tp->t_fdblocks_delta;
+		ASSERT(blkdelta >= 0);
+	}
 
-	if (tp->t_rtx_res > 0)
-		rtxdelta = tp->t_rtx_res;
-	if ((tp->t_frextents_delta != 0) &&
-	    (tp->t_flags & XFS_TRANS_SB_DIRTY))
+	ASSERT(tp->t_rtx_res || tp->t_frextents_delta >= 0);
+	if (tp->t_flags & XFS_TRANS_SB_DIRTY) {
 		rtxdelta += tp->t_frextents_delta;
+		ASSERT(rtxdelta >= 0);
+	}
 
-	if (xfs_has_lazysbcount(mp) ||
-	     (tp->t_flags & XFS_TRANS_SB_DIRTY)) {
+	if (xfs_has_lazysbcount(mp) || (tp->t_flags & XFS_TRANS_SB_DIRTY)) {
 		idelta = tp->t_icount_delta;
 		ifreedelta = tp->t_ifree_delta;
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/13] xfs: split xfs_mod_freecounter
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 05/13] xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positive Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-27 15:12   ` Darrick J. Wong
  2024-03-28  4:18   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 07/13] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc Christoph Hellwig
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

xfs_mod_freecounter has two entirely separate code paths for adding or
subtracting from the free counters.  Only the subtract case looks at the
rsvd flag and can return an error.

Split xfs_mod_freecounter into separate helpers for subtracting or
adding the freecounter, and remove all the impossible to reach error
handling for the addition case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_ag.c      |  4 +--
 fs/xfs/libxfs/xfs_ag_resv.c | 24 ++++---------
 fs/xfs/libxfs/xfs_ag_resv.h |  2 +-
 fs/xfs/libxfs/xfs_alloc.c   |  4 +--
 fs/xfs/libxfs/xfs_bmap.c    | 23 +++++++------
 fs/xfs/scrub/fscounters.c   |  2 +-
 fs/xfs/scrub/repair.c       |  5 +--
 fs/xfs/xfs_fsops.c          | 29 +++++-----------
 fs/xfs/xfs_fsops.h          |  2 +-
 fs/xfs/xfs_mount.c          | 67 +++++++++++++++++++------------------
 fs/xfs/xfs_mount.h          | 27 ++++++++++-----
 fs/xfs/xfs_super.c          |  6 +---
 fs/xfs/xfs_trace.h          |  1 -
 fs/xfs/xfs_trans.c          | 25 +++++---------
 14 files changed, 97 insertions(+), 124 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index dc1873f76bffd5..189e3296fef6de 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -963,9 +963,7 @@ xfs_ag_shrink_space(
 	 * Disable perag reservations so it doesn't cause the allocation request
 	 * to fail. We'll reestablish reservation before we return.
 	 */
-	error = xfs_ag_resv_free(pag);
-	if (error)
-		return error;
+	xfs_ag_resv_free(pag);
 
 	/* internal log shouldn't also show up in the free space btrees */
 	error = xfs_alloc_vextent_exact_bno(&args,
diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
index da1057bd0e6067..216423df939e5c 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.c
+++ b/fs/xfs/libxfs/xfs_ag_resv.c
@@ -126,14 +126,13 @@ xfs_ag_resv_needed(
 }
 
 /* Clean out a reservation */
-static int
+static void
 __xfs_ag_resv_free(
 	struct xfs_perag		*pag,
 	enum xfs_ag_resv_type		type)
 {
 	struct xfs_ag_resv		*resv;
 	xfs_extlen_t			oldresv;
-	int				error;
 
 	trace_xfs_ag_resv_free(pag, type, 0);
 
@@ -149,30 +148,19 @@ __xfs_ag_resv_free(
 		oldresv = resv->ar_orig_reserved;
 	else
 		oldresv = resv->ar_reserved;
-	error = xfs_mod_fdblocks(pag->pag_mount, oldresv, true);
+	xfs_add_fdblocks(pag->pag_mount, oldresv);
 	resv->ar_reserved = 0;
 	resv->ar_asked = 0;
 	resv->ar_orig_reserved = 0;
-
-	if (error)
-		trace_xfs_ag_resv_free_error(pag->pag_mount, pag->pag_agno,
-				error, _RET_IP_);
-	return error;
 }
 
 /* Free a per-AG reservation. */
-int
+void
 xfs_ag_resv_free(
 	struct xfs_perag		*pag)
 {
-	int				error;
-	int				err2;
-
-	error = __xfs_ag_resv_free(pag, XFS_AG_RESV_RMAPBT);
-	err2 = __xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA);
-	if (err2 && !error)
-		error = err2;
-	return error;
+	__xfs_ag_resv_free(pag, XFS_AG_RESV_RMAPBT);
+	__xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA);
 }
 
 static int
@@ -216,7 +204,7 @@ __xfs_ag_resv_init(
 	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_AG_RESV_FAIL))
 		error = -ENOSPC;
 	else
-		error = xfs_mod_fdblocks(mp, -(int64_t)hidden_space, true);
+		error = xfs_dec_fdblocks(mp, hidden_space, true);
 	if (error) {
 		trace_xfs_ag_resv_init_error(pag->pag_mount, pag->pag_agno,
 				error, _RET_IP_);
diff --git a/fs/xfs/libxfs/xfs_ag_resv.h b/fs/xfs/libxfs/xfs_ag_resv.h
index b74b210008ea7e..ff20ed93de7724 100644
--- a/fs/xfs/libxfs/xfs_ag_resv.h
+++ b/fs/xfs/libxfs/xfs_ag_resv.h
@@ -6,7 +6,7 @@
 #ifndef __XFS_AG_RESV_H__
 #define	__XFS_AG_RESV_H__
 
-int xfs_ag_resv_free(struct xfs_perag *pag);
+void xfs_ag_resv_free(struct xfs_perag *pag);
 int xfs_ag_resv_init(struct xfs_perag *pag, struct xfs_trans *tp);
 
 bool xfs_ag_resv_critical(struct xfs_perag *pag, enum xfs_ag_resv_type type);
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 9da52e92172aba..6cb8b2ddc541b4 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -79,7 +79,7 @@ xfs_prealloc_blocks(
 }
 
 /*
- * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
+ * The number of blocks per AG that we withhold from xfs_dec_fdblocks to
  * guarantee that we can refill the AGFL prior to allocating space in a nearly
  * full AG.  Although the space described by the free space btrees, the
  * blocks used by the freesp btrees themselves, and the blocks owned by the
@@ -89,7 +89,7 @@ xfs_prealloc_blocks(
  * until the fs goes down, we subtract this many AG blocks from the incore
  * fdblocks to ensure user allocation does not overcommit the space the
  * filesystem needs for the AGFLs.  The rmap btree uses a per-AG reservation to
- * withhold space from xfs_mod_fdblocks, so we do not account for that here.
+ * withhold space from xfs_dec_fdblocks, so we do not account for that here.
  */
 #define XFS_ALLOCBT_AGFL_RESERVE	4
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index e5e199d325982f..4e81baf5e95301 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1983,10 +1983,11 @@ xfs_bmap_add_extent_delay_real(
 	}
 
 	/* adjust for changes in reserved delayed indirect blocks */
-	if (da_new != da_old) {
-		ASSERT(state == 0 || da_new < da_old);
-		error = xfs_mod_fdblocks(mp, (int64_t)(da_old - da_new),
-				false);
+	if (da_new < da_old) {
+		xfs_add_fdblocks(mp, da_old - da_new);
+	} else if (da_new > da_old) {
+		ASSERT(state == 0);
+		error = xfs_dec_fdblocks(mp, da_new - da_old, false);
 	}
 
 	xfs_bmap_check_leaf_extents(bma->cur, bma->ip, whichfork);
@@ -2688,8 +2689,8 @@ xfs_bmap_add_extent_hole_delay(
 	}
 	if (oldlen != newlen) {
 		ASSERT(oldlen > newlen);
-		xfs_mod_fdblocks(ip->i_mount, (int64_t)(oldlen - newlen),
-				 false);
+		xfs_add_fdblocks(ip->i_mount, oldlen - newlen);
+
 		/*
 		 * Nothing to do for disk quota accounting here.
 		 */
@@ -4108,11 +4109,11 @@ xfs_bmapi_reserve_delalloc(
 	indlen = (xfs_extlen_t)xfs_bmap_worst_indlen(ip, alen);
 	ASSERT(indlen > 0);
 
-	error = xfs_mod_fdblocks(mp, -((int64_t)alen), false);
+	error = xfs_dec_fdblocks(mp, alen, false);
 	if (error)
 		goto out_unreserve_quota;
 
-	error = xfs_mod_fdblocks(mp, -((int64_t)indlen), false);
+	error = xfs_dec_fdblocks(mp, indlen, false);
 	if (error)
 		goto out_unreserve_blocks;
 
@@ -4140,7 +4141,7 @@ xfs_bmapi_reserve_delalloc(
 	return 0;
 
 out_unreserve_blocks:
-	xfs_mod_fdblocks(mp, alen, false);
+	xfs_add_fdblocks(mp, alen);
 out_unreserve_quota:
 	if (XFS_IS_QUOTA_ON(mp))
 		xfs_quota_unreserve_blkres(ip, alen);
@@ -4926,7 +4927,7 @@ xfs_bmap_del_extent_delay(
 	ASSERT(got_endoff >= del_endoff);
 
 	if (isrt)
-		xfs_mod_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
+		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
 
 	/*
 	 * Update the inode delalloc counter now and wait to update the
@@ -5013,7 +5014,7 @@ xfs_bmap_del_extent_delay(
 	if (!isrt)
 		da_diff += del->br_blockcount;
 	if (da_diff) {
-		xfs_mod_fdblocks(mp, da_diff, false);
+		xfs_add_fdblocks(mp, da_diff);
 		xfs_mod_delalloc(mp, -da_diff);
 	}
 	return error;
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index 2b4bd2eb71b57d..c910b5f1306913 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -517,7 +517,7 @@ xchk_fscounters(
 
 	/*
 	 * If the filesystem is not frozen, the counter summation calls above
-	 * can race with xfs_mod_freecounter, which subtracts a requested space
+	 * can race with xfs_dec_freecounter, which subtracts a requested space
 	 * reservation from the counter and undoes the subtraction if that made
 	 * the counter go negative.  Therefore, it's possible to see negative
 	 * values here, and we should only flag that as a corruption if we
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index f43dce771cdd26..6123e6c7ac7d67 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -963,9 +963,7 @@ xrep_reset_perag_resv(
 	ASSERT(sc->tp);
 
 	sc->flags &= ~XREP_RESET_PERAG_RESV;
-	error = xfs_ag_resv_free(sc->sa.pag);
-	if (error)
-		goto out;
+	xfs_ag_resv_free(sc->sa.pag);
 	error = xfs_ag_resv_init(sc->sa.pag, sc->tp);
 	if (error == -ENOSPC) {
 		xfs_err(sc->mp,
@@ -974,7 +972,6 @@ xrep_reset_perag_resv(
 		error = 0;
 	}
 
-out:
 	return error;
 }
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 83f708f62ed9f2..c211ea2b63c4dd 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -213,10 +213,8 @@ xfs_growfs_data_private(
 			struct xfs_perag	*pag;
 
 			pag = xfs_perag_get(mp, id.agno);
-			error = xfs_ag_resv_free(pag);
+			xfs_ag_resv_free(pag);
 			xfs_perag_put(pag);
-			if (error)
-				return error;
 		}
 		/*
 		 * Reserve AG metadata blocks. ENOSPC here does not mean there
@@ -385,14 +383,14 @@ xfs_reserve_blocks(
 	 */
 	if (mp->m_resblks > request) {
 		lcounter = mp->m_resblks_avail - request;
-		if (lcounter  > 0) {		/* release unused blocks */
+		if (lcounter > 0) {		/* release unused blocks */
 			fdblks_delta = lcounter;
 			mp->m_resblks_avail -= lcounter;
 		}
 		mp->m_resblks = request;
 		if (fdblks_delta) {
 			spin_unlock(&mp->m_sb_lock);
-			error = xfs_mod_fdblocks(mp, fdblks_delta, 0);
+			xfs_add_fdblocks(mp, fdblks_delta);
 			spin_lock(&mp->m_sb_lock);
 		}
 
@@ -428,9 +426,9 @@ xfs_reserve_blocks(
 		 */
 		fdblks_delta = min(free, delta);
 		spin_unlock(&mp->m_sb_lock);
-		error = xfs_mod_fdblocks(mp, -fdblks_delta, 0);
+		error = xfs_dec_fdblocks(mp, fdblks_delta, 0);
 		if (!error)
-			xfs_mod_fdblocks(mp, fdblks_delta, 0);
+			xfs_add_fdblocks(mp, fdblks_delta);
 		spin_lock(&mp->m_sb_lock);
 	}
 out:
@@ -556,24 +554,13 @@ xfs_fs_reserve_ag_blocks(
 /*
  * Free space reserved for per-AG metadata.
  */
-int
+void
 xfs_fs_unreserve_ag_blocks(
 	struct xfs_mount	*mp)
 {
 	xfs_agnumber_t		agno;
 	struct xfs_perag	*pag;
-	int			error = 0;
-	int			err2;
 
-	for_each_perag(mp, agno, pag) {
-		err2 = xfs_ag_resv_free(pag);
-		if (err2 && !error)
-			error = err2;
-	}
-
-	if (error)
-		xfs_warn(mp,
-	"Error %d freeing per-AG metadata reserve pool.", error);
-
-	return error;
+	for_each_perag(mp, agno, pag)
+		xfs_ag_resv_free(pag);
 }
diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
index 44457b0a059376..3e2f73bcf8314b 100644
--- a/fs/xfs/xfs_fsops.h
+++ b/fs/xfs/xfs_fsops.h
@@ -12,6 +12,6 @@ int xfs_reserve_blocks(struct xfs_mount *mp, uint64_t request);
 int xfs_fs_goingdown(struct xfs_mount *mp, uint32_t inflags);
 
 int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
-int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
+void xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
 
 #endif	/* __XFS_FSOPS_H__ */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index df370eb5dc15e1..e7dfa259d0e6a2 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1131,16 +1131,44 @@ xfs_fs_writable(
 	return true;
 }
 
-/* Adjust m_fdblocks or m_frextents. */
+void
+xfs_add_freecounter(
+	struct xfs_mount	*mp,
+	struct percpu_counter	*counter,
+	uint64_t		delta)
+{
+	bool			has_resv_pool = (counter == &mp->m_fdblocks);
+	uint64_t		res_used;
+
+	/*
+	 * If the reserve pool is depleted, put blocks back into it first.
+	 * Most of the time the pool is full.
+	 */
+	if (!has_resv_pool || mp->m_resblks == mp->m_resblks_avail) {
+		percpu_counter_add(counter, delta);
+		return;
+	}
+
+	spin_lock(&mp->m_sb_lock);
+	res_used = mp->m_resblks - mp->m_resblks_avail;
+	if (res_used > delta) {
+		mp->m_resblks_avail += delta;
+	} else {
+		delta -= res_used;
+		mp->m_resblks_avail = mp->m_resblks;
+		percpu_counter_add(counter, delta);
+	}
+	spin_unlock(&mp->m_sb_lock);
+}
+
 int
-xfs_mod_freecounter(
+xfs_dec_freecounter(
 	struct xfs_mount	*mp,
 	struct percpu_counter	*counter,
-	int64_t			delta,
+	uint64_t		delta,
 	bool			rsvd)
 {
 	int64_t			lcounter;
-	long long		res_used;
 	uint64_t		set_aside = 0;
 	s32			batch;
 	bool			has_resv_pool;
@@ -1150,31 +1178,6 @@ xfs_mod_freecounter(
 	if (rsvd)
 		ASSERT(has_resv_pool);
 
-	if (delta > 0) {
-		/*
-		 * If the reserve pool is depleted, put blocks back into it
-		 * first. Most of the time the pool is full.
-		 */
-		if (likely(!has_resv_pool ||
-			   mp->m_resblks == mp->m_resblks_avail)) {
-			percpu_counter_add(counter, delta);
-			return 0;
-		}
-
-		spin_lock(&mp->m_sb_lock);
-		res_used = (long long)(mp->m_resblks - mp->m_resblks_avail);
-
-		if (res_used > delta) {
-			mp->m_resblks_avail += delta;
-		} else {
-			delta -= res_used;
-			mp->m_resblks_avail = mp->m_resblks;
-			percpu_counter_add(counter, delta);
-		}
-		spin_unlock(&mp->m_sb_lock);
-		return 0;
-	}
-
 	/*
 	 * Taking blocks away, need to be more accurate the closer we
 	 * are to zero.
@@ -1202,7 +1205,7 @@ xfs_mod_freecounter(
 	 */
 	if (has_resv_pool)
 		set_aside = xfs_fdblocks_unavailable(mp);
-	percpu_counter_add_batch(counter, delta, batch);
+	percpu_counter_add_batch(counter, -((int64_t)delta), batch);
 	if (__percpu_counter_compare(counter, set_aside,
 				     XFS_FDBLOCKS_BATCH) >= 0) {
 		/* we had space! */
@@ -1214,11 +1217,11 @@ xfs_mod_freecounter(
 	 * that took us to ENOSPC.
 	 */
 	spin_lock(&mp->m_sb_lock);
-	percpu_counter_add(counter, -delta);
+	percpu_counter_add(counter, delta);
 	if (!has_resv_pool || !rsvd)
 		goto fdblocks_enospc;
 
-	lcounter = (long long)mp->m_resblks_avail + delta;
+	lcounter = (long long)mp->m_resblks_avail - delta;
 	if (lcounter >= 0) {
 		mp->m_resblks_avail = lcounter;
 		spin_unlock(&mp->m_sb_lock);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e880aa48de68bb..d941437a0c7369 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -534,19 +534,30 @@ xfs_fdblocks_unavailable(
 	return mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
 }
 
-int xfs_mod_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
-		int64_t delta, bool rsvd);
+int xfs_dec_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
+		uint64_t delta, bool rsvd);
+void xfs_add_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
+		uint64_t delta);
 
-static inline int
-xfs_mod_fdblocks(struct xfs_mount *mp, int64_t delta, bool reserved)
+static inline int xfs_dec_fdblocks(struct xfs_mount *mp, uint64_t delta,
+		bool reserved)
 {
-	return xfs_mod_freecounter(mp, &mp->m_fdblocks, delta, reserved);
+	return xfs_dec_freecounter(mp, &mp->m_fdblocks, delta, reserved);
 }
 
-static inline int
-xfs_mod_frextents(struct xfs_mount *mp, int64_t delta)
+static inline void xfs_add_fdblocks(struct xfs_mount *mp, uint64_t delta)
 {
-	return xfs_mod_freecounter(mp, &mp->m_frextents, delta, false);
+	xfs_add_freecounter(mp, &mp->m_fdblocks, delta);
+}
+
+static inline int xfs_dec_frextents(struct xfs_mount *mp, uint64_t delta)
+{
+	return xfs_dec_freecounter(mp, &mp->m_frextents, delta, false);
+}
+
+static inline void xfs_add_frextents(struct xfs_mount *mp, uint64_t delta)
+{
+	xfs_add_freecounter(mp, &mp->m_frextents, delta);
 }
 
 extern int	xfs_readsb(xfs_mount_t *, int);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index c21f10ab0f5dbe..784ff7d1c986ba 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1873,11 +1873,7 @@ xfs_remount_ro(
 	xfs_inodegc_stop(mp);
 
 	/* Free the per-AG metadata reservation pool. */
-	error = xfs_fs_unreserve_ag_blocks(mp);
-	if (error) {
-		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
-		return error;
-	}
+	xfs_fs_unreserve_ag_blocks(mp);
 
 	/*
 	 * Before we sync the metadata, we need to free up the reserve block
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index aea97fc074f8de..1c5753f91c388e 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3062,7 +3062,6 @@ DEFINE_AG_RESV_EVENT(xfs_ag_resv_free_extent);
 DEFINE_AG_RESV_EVENT(xfs_ag_resv_critical);
 DEFINE_AG_RESV_EVENT(xfs_ag_resv_needed);
 
-DEFINE_AG_ERROR_EVENT(xfs_ag_resv_free_error);
 DEFINE_AG_ERROR_EVENT(xfs_ag_resv_init_error);
 
 /* refcount tracepoint classes */
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 924b460229e951..b18d478e2c9e85 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -163,7 +163,7 @@ xfs_trans_reserve(
 	 * fail if the count would go below zero.
 	 */
 	if (blocks > 0) {
-		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
+		error = xfs_dec_fdblocks(mp, blocks, rsvd);
 		if (error != 0)
 			return -ENOSPC;
 		tp->t_blk_res += blocks;
@@ -210,7 +210,7 @@ xfs_trans_reserve(
 	 * fail if the count would go below zero.
 	 */
 	if (rtextents > 0) {
-		error = xfs_mod_frextents(mp, -((int64_t)rtextents));
+		error = xfs_dec_frextents(mp, rtextents);
 		if (error) {
 			error = -ENOSPC;
 			goto undo_log;
@@ -234,7 +234,7 @@ xfs_trans_reserve(
 
 undo_blocks:
 	if (blocks > 0) {
-		xfs_mod_fdblocks(mp, (int64_t)blocks, rsvd);
+		xfs_add_fdblocks(mp, blocks);
 		tp->t_blk_res = 0;
 	}
 	return error;
@@ -593,12 +593,10 @@ xfs_trans_unreserve_and_mod_sb(
 	struct xfs_trans	*tp)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
-	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
 	int64_t			blkdelta = tp->t_blk_res;
 	int64_t			rtxdelta = tp->t_rtx_res;
 	int64_t			idelta = 0;
 	int64_t			ifreedelta = 0;
-	int			error;
 
 	/*
 	 * Calculate the deltas.
@@ -631,10 +629,8 @@ xfs_trans_unreserve_and_mod_sb(
 	}
 
 	/* apply the per-cpu counters */
-	if (blkdelta) {
-		error = xfs_mod_fdblocks(mp, blkdelta, rsvd);
-		ASSERT(!error);
-	}
+	if (blkdelta)
+		xfs_add_fdblocks(mp, blkdelta);
 
 	if (idelta)
 		percpu_counter_add_batch(&mp->m_icount, idelta,
@@ -643,10 +639,8 @@ xfs_trans_unreserve_and_mod_sb(
 	if (ifreedelta)
 		percpu_counter_add(&mp->m_ifree, ifreedelta);
 
-	if (rtxdelta) {
-		error = xfs_mod_frextents(mp, rtxdelta);
-		ASSERT(!error);
-	}
+	if (rtxdelta)
+		xfs_add_frextents(mp, rtxdelta);
 
 	if (!(tp->t_flags & XFS_TRANS_SB_DIRTY))
 		return;
@@ -682,7 +676,6 @@ xfs_trans_unreserve_and_mod_sb(
 	 */
 	ASSERT(mp->m_sb.sb_imax_pct >= 0);
 	ASSERT(mp->m_sb.sb_rextslog >= 0);
-	return;
 }
 
 /* Add the given log item to the transaction's list of log items. */
@@ -1301,9 +1294,9 @@ xfs_trans_reserve_more_inode(
 		return 0;
 
 	/* Quota failed, give back the new reservation. */
-	xfs_mod_fdblocks(mp, dblocks, tp->t_flags & XFS_TRANS_RESERVE);
+	xfs_add_fdblocks(mp, dblocks);
 	tp->t_blk_res -= dblocks;
-	xfs_mod_frextents(mp, rtx);
+	xfs_add_frextents(mp, rtx);
 	tp->t_rtx_res -= rtx;
 	return error;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/13] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 06/13] xfs: split xfs_mod_freecounter Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-28  4:20   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 08/13] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay Christoph Hellwig
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

Allocate data blocks for RT inodes using xfs_dec_frextents.  While at
it optimize the data device case by doing only a single xfs_dec_fdblocks
call for the extent itself and the indirect blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 4e81baf5e95301..572c91c986b6af 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4067,6 +4067,7 @@ xfs_bmapi_reserve_delalloc(
 	struct xfs_ifork	*ifp = xfs_ifork_ptr(ip, whichfork);
 	xfs_extlen_t		alen;
 	xfs_extlen_t		indlen;
+	uint64_t		fdblocks;
 	int			error;
 	xfs_fileoff_t		aoff = off;
 
@@ -4109,14 +4110,18 @@ xfs_bmapi_reserve_delalloc(
 	indlen = (xfs_extlen_t)xfs_bmap_worst_indlen(ip, alen);
 	ASSERT(indlen > 0);
 
-	error = xfs_dec_fdblocks(mp, alen, false);
-	if (error)
-		goto out_unreserve_quota;
+	fdblocks = indlen;
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		error = xfs_dec_frextents(mp, xfs_rtb_to_rtx(mp, alen));
+		if (error)
+			goto out_unreserve_quota;
+	} else {
+		fdblocks += alen;
+	}
 
-	error = xfs_dec_fdblocks(mp, indlen, false);
+	error = xfs_dec_fdblocks(mp, fdblocks, false);
 	if (error)
-		goto out_unreserve_blocks;
-
+		goto out_unreserve_frextents;
 
 	ip->i_delayed_blks += alen;
 	xfs_mod_delalloc(ip->i_mount, alen + indlen);
@@ -4140,8 +4145,9 @@ xfs_bmapi_reserve_delalloc(
 
 	return 0;
 
-out_unreserve_blocks:
-	xfs_add_fdblocks(mp, alen);
+out_unreserve_frextents:
+	if (XFS_IS_REALTIME_INODE(ip))
+		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, alen));
 out_unreserve_quota:
 	if (XFS_IS_QUOTA_ON(mp))
 		xfs_quota_unreserve_blkres(ip, alen);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/13] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 07/13] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-28  4:22   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc Christoph Hellwig
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

The code to account fdblocks and frextents in xfs_bmap_del_extent_delay
is a bit weird in that it accounts frextents before the iext tree
manipulations and fdblocks after it.  Given that the iext tree
manipulations cannot fail currently that's not really a problem, but
still odd.  Move the frextent manipulation to the end, and use a
fdblocks variable to account of the unconditional indirect blocks and
the data blocks only freed for !RT.  This prepares for following
updates in the area and already makes the code more readable.

Also remove the !isrt assert given that this code clearly handles
rt extents correctly, and we'll soon reinstate delalloc support for
RT inodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 572c91c986b6af..a9a23a5d2e487f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4917,6 +4917,7 @@ xfs_bmap_del_extent_delay(
 	xfs_fileoff_t		del_endoff, got_endoff;
 	xfs_filblks_t		got_indlen, new_indlen, stolen;
 	uint32_t		state = xfs_bmap_fork_to_state(whichfork);
+	uint64_t		fdblocks;
 	int			error = 0;
 	bool			isrt;
 
@@ -4932,15 +4933,11 @@ xfs_bmap_del_extent_delay(
 	ASSERT(got->br_startoff <= del->br_startoff);
 	ASSERT(got_endoff >= del_endoff);
 
-	if (isrt)
-		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
-
 	/*
 	 * Update the inode delalloc counter now and wait to update the
 	 * sb counters as we might have to borrow some blocks for the
 	 * indirect block accounting.
 	 */
-	ASSERT(!isrt);
 	error = xfs_quota_unreserve_blkres(ip, del->br_blockcount);
 	if (error)
 		return error;
@@ -5017,12 +5014,15 @@ xfs_bmap_del_extent_delay(
 
 	ASSERT(da_old >= da_new);
 	da_diff = da_old - da_new;
-	if (!isrt)
-		da_diff += del->br_blockcount;
-	if (da_diff) {
-		xfs_add_fdblocks(mp, da_diff);
-		xfs_mod_delalloc(mp, -da_diff);
-	}
+	fdblocks = da_diff;
+
+	if (isrt)
+		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
+	else
+		fdblocks += del->br_blockcount;
+
+	xfs_add_fdblocks(mp, fdblocks);
+	xfs_mod_delalloc(mp, -(int64_t)fdblocks);
 	return error;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 08/13] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-27 15:20   ` Darrick J. Wong
  2024-03-28  4:27   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 10/13] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations Christoph Hellwig
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

To prepare for re-enabling delalloc on RT devices, track the data blocks
(which use the RT device when the inode sits on it) and the indirect
blocks (which don't) separately to xfs_mod_delalloc, and add a new
percpu counter to also track the RT delalloc blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c         | 12 ++++++------
 fs/xfs/scrub/fscounters.c        |  6 +++++-
 fs/xfs/scrub/fscounters.h        |  1 +
 fs/xfs/scrub/fscounters_repair.c |  3 ++-
 fs/xfs/xfs_mount.c               | 18 +++++++++++++++---
 fs/xfs/xfs_mount.h               |  9 ++++++++-
 fs/xfs/xfs_super.c               | 11 ++++++++++-
 7 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a9a23a5d2e487f..131d4f063b660a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1975,7 +1975,7 @@ xfs_bmap_add_extent_delay_real(
 	}
 
 	if (da_new != da_old)
-		xfs_mod_delalloc(mp, (int64_t)da_new - da_old);
+		xfs_mod_delalloc(bma->ip, 0, (int64_t)da_new - da_old);
 
 	if (bma->cur) {
 		da_new += bma->cur->bc_bmap.allocated;
@@ -2694,7 +2694,7 @@ xfs_bmap_add_extent_hole_delay(
 		/*
 		 * Nothing to do for disk quota accounting here.
 		 */
-		xfs_mod_delalloc(ip->i_mount, (int64_t)newlen - oldlen);
+		xfs_mod_delalloc(ip, 0, (int64_t)newlen - oldlen);
 	}
 }
 
@@ -3371,7 +3371,7 @@ xfs_bmap_alloc_account(
 		 * yet.
 		 */
 		if (ap->wasdel) {
-			xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length);
+			xfs_mod_delalloc(ap->ip, -(int64_t)ap->length, 0);
 			return;
 		}
 
@@ -3395,7 +3395,7 @@ xfs_bmap_alloc_account(
 	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
 	if (ap->wasdel) {
 		ap->ip->i_delayed_blks -= ap->length;
-		xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length);
+		xfs_mod_delalloc(ap->ip, -(int64_t)ap->length, 0);
 		fld = isrt ? XFS_TRANS_DQ_DELRTBCOUNT : XFS_TRANS_DQ_DELBCOUNT;
 	} else {
 		fld = isrt ? XFS_TRANS_DQ_RTBCOUNT : XFS_TRANS_DQ_BCOUNT;
@@ -4124,7 +4124,7 @@ xfs_bmapi_reserve_delalloc(
 		goto out_unreserve_frextents;
 
 	ip->i_delayed_blks += alen;
-	xfs_mod_delalloc(ip->i_mount, alen + indlen);
+	xfs_mod_delalloc(ip, alen, indlen);
 
 	got->br_startoff = aoff;
 	got->br_startblock = nullstartblock(indlen);
@@ -5022,7 +5022,7 @@ xfs_bmap_del_extent_delay(
 		fdblocks += del->br_blockcount;
 
 	xfs_add_fdblocks(mp, fdblocks);
-	xfs_mod_delalloc(mp, -(int64_t)fdblocks);
+	xfs_mod_delalloc(ip, -(int64_t)del->br_blockcount, -da_diff);
 	return error;
 }
 
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index c910b5f1306913..d18a3cac30479d 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -412,6 +412,7 @@ xchk_fscount_count_frextents(
 	int			error;
 
 	fsc->frextents = 0;
+	fsc->frextents_delayed = 0;
 	if (!xfs_has_realtime(mp))
 		return 0;
 
@@ -423,6 +424,8 @@ xchk_fscount_count_frextents(
 		goto out_unlock;
 	}
 
+	fsc->frextents_delayed = percpu_counter_sum(&mp->m_delalloc_rtextents);
+
 out_unlock:
 	xfs_rtbitmap_unlock_shared(sc->mp, XFS_RBMLOCK_BITMAP);
 	return error;
@@ -434,6 +437,7 @@ xchk_fscount_count_frextents(
 	struct xchk_fscounters	*fsc)
 {
 	fsc->frextents = 0;
+	fsc->frextents_delayed = 0;
 	return 0;
 }
 #endif /* CONFIG_XFS_RT */
@@ -593,7 +597,7 @@ xchk_fscounters(
 	}
 
 	if (!xchk_fscount_within_range(sc, frextents, &mp->m_frextents,
-			fsc->frextents)) {
+			fsc->frextents - fsc->frextents_delayed)) {
 		if (fsc->frozen)
 			xchk_set_corrupt(sc);
 		else
diff --git a/fs/xfs/scrub/fscounters.h b/fs/xfs/scrub/fscounters.h
index 461a13d25f4b38..bcf56e1c36f91c 100644
--- a/fs/xfs/scrub/fscounters.h
+++ b/fs/xfs/scrub/fscounters.h
@@ -12,6 +12,7 @@ struct xchk_fscounters {
 	uint64_t		ifree;
 	uint64_t		fdblocks;
 	uint64_t		frextents;
+	uint64_t		frextents_delayed;
 	unsigned long long	icount_min;
 	unsigned long long	icount_max;
 	bool			frozen;
diff --git a/fs/xfs/scrub/fscounters_repair.c b/fs/xfs/scrub/fscounters_repair.c
index 94cdb852bee462..210ebbcf3e1520 100644
--- a/fs/xfs/scrub/fscounters_repair.c
+++ b/fs/xfs/scrub/fscounters_repair.c
@@ -65,7 +65,8 @@ xrep_fscounters(
 	percpu_counter_set(&mp->m_icount, fsc->icount);
 	percpu_counter_set(&mp->m_ifree, fsc->ifree);
 	percpu_counter_set(&mp->m_fdblocks, fsc->fdblocks);
-	percpu_counter_set(&mp->m_frextents, fsc->frextents);
+	percpu_counter_set(&mp->m_frextents,
+		fsc->frextents - fsc->frextents_delayed);
 	mp->m_sb.sb_frextents = fsc->frextents;
 
 	return 0;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index e7dfa259d0e6a2..2feb1a81bb87b5 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -34,6 +34,7 @@
 #include "xfs_health.h"
 #include "xfs_trace.h"
 #include "xfs_ag.h"
+#include "xfs_rtbitmap.h"
 #include "scrub/stats.h"
 
 static DEFINE_MUTEX(xfs_uuid_table_mutex);
@@ -1402,9 +1403,20 @@ xfs_clear_incompat_log_features(
 #define XFS_DELALLOC_BATCH	(4096)
 void
 xfs_mod_delalloc(
-	struct xfs_mount	*mp,
-	int64_t			delta)
+	struct xfs_inode	*ip,
+	int64_t			data_delta,
+	int64_t			ind_delta)
 {
-	percpu_counter_add_batch(&mp->m_delalloc_blks, delta,
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		percpu_counter_add_batch(&mp->m_delalloc_rtextents,
+				xfs_rtb_to_rtx(mp, data_delta),
+				XFS_DELALLOC_BATCH);
+		if (!ind_delta)
+			return;
+		data_delta = 0;
+	}
+	percpu_counter_add_batch(&mp->m_delalloc_blks, data_delta + ind_delta,
 			XFS_DELALLOC_BATCH);
 }
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d941437a0c7369..0e8d7779c0a561 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -195,6 +195,12 @@ typedef struct xfs_mount {
 	 * extents or anything related to the rt device.
 	 */
 	struct percpu_counter	m_delalloc_blks;
+
+	/*
+	 * RT version of the above.
+	 */
+	struct percpu_counter	m_delalloc_rtextents;
+
 	/*
 	 * Global count of allocation btree blocks in use across all AGs. Only
 	 * used when perag reservation is enabled. Helps prevent block
@@ -577,6 +583,7 @@ struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
 void xfs_force_summary_recalc(struct xfs_mount *mp);
 int xfs_add_incompat_log_feature(struct xfs_mount *mp, uint32_t feature);
 bool xfs_clear_incompat_log_features(struct xfs_mount *mp);
-void xfs_mod_delalloc(struct xfs_mount *mp, int64_t delta);
+void xfs_mod_delalloc(struct xfs_inode *ip, int64_t data_delta,
+		int64_t ind_delta);
 
 #endif	/* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 784ff7d1c986ba..39a5e664870485 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1051,12 +1051,18 @@ xfs_init_percpu_counters(
 	if (error)
 		goto free_fdblocks;
 
-	error = percpu_counter_init(&mp->m_frextents, 0, GFP_KERNEL);
+	error = percpu_counter_init(&mp->m_delalloc_rtextents, 0, GFP_KERNEL);
 	if (error)
 		goto free_delalloc;
 
+	error = percpu_counter_init(&mp->m_frextents, 0, GFP_KERNEL);
+	if (error)
+		goto free_delalloc_rt;
+
 	return 0;
 
+free_delalloc_rt:
+	percpu_counter_destroy(&mp->m_delalloc_rtextents);
 free_delalloc:
 	percpu_counter_destroy(&mp->m_delalloc_blks);
 free_fdblocks:
@@ -1085,6 +1091,9 @@ xfs_destroy_percpu_counters(
 	percpu_counter_destroy(&mp->m_icount);
 	percpu_counter_destroy(&mp->m_ifree);
 	percpu_counter_destroy(&mp->m_fdblocks);
+	ASSERT(xfs_is_shutdown(mp) ||
+	       percpu_counter_sum(&mp->m_delalloc_rtextents) == 0);
+	percpu_counter_destroy(&mp->m_delalloc_rtextents);
 	ASSERT(xfs_is_shutdown(mp) ||
 	       percpu_counter_sum(&mp->m_delalloc_blks) == 0);
 	percpu_counter_destroy(&mp->m_delalloc_blks);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/13] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-28  4:32   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 11/13] xfs: rework splitting of indirect block reservations Christoph Hellwig
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

Add a check for files on the RT subvolume and use m_frextents instead
of m_fdblocks to adjust the preallocation size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_iomap.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 4087af7f3c9f3f..e0c205bcf03404 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -28,6 +28,7 @@
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
 #include "xfs_health.h"
+#include "xfs_rtbitmap.h"
 
 #define XFS_ALLOC_ALIGN(mp, off) \
 	(((off) >> mp->m_allocsize_log) << mp->m_allocsize_log)
@@ -404,6 +405,29 @@ xfs_quota_calc_throttle(
 	}
 }
 
+static int64_t
+xfs_iomap_freesp(
+	struct percpu_counter	*counter,
+	uint64_t		low_space[XFS_LOWSP_MAX],
+	int			*shift)
+{
+	int64_t			freesp;
+
+	freesp = percpu_counter_read_positive(counter);
+	if (freesp < low_space[XFS_LOWSP_5_PCNT]) {
+		*shift = 2;
+		if (freesp < low_space[XFS_LOWSP_4_PCNT])
+			(*shift)++;
+		if (freesp < low_space[XFS_LOWSP_3_PCNT])
+			(*shift)++;
+		if (freesp < low_space[XFS_LOWSP_2_PCNT])
+			(*shift)++;
+		if (freesp < low_space[XFS_LOWSP_1_PCNT])
+			(*shift)++;
+	}
+	return freesp;
+}
+
 /*
  * If we don't have a user specified preallocation size, dynamically increase
  * the preallocation size as the size of the file grows.  Cap the maximum size
@@ -486,18 +510,12 @@ xfs_iomap_prealloc_size(
 	alloc_blocks = XFS_FILEOFF_MIN(roundup_pow_of_two(XFS_MAX_BMBT_EXTLEN),
 				       alloc_blocks);
 
-	freesp = percpu_counter_read_positive(&mp->m_fdblocks);
-	if (freesp < mp->m_low_space[XFS_LOWSP_5_PCNT]) {
-		shift = 2;
-		if (freesp < mp->m_low_space[XFS_LOWSP_4_PCNT])
-			shift++;
-		if (freesp < mp->m_low_space[XFS_LOWSP_3_PCNT])
-			shift++;
-		if (freesp < mp->m_low_space[XFS_LOWSP_2_PCNT])
-			shift++;
-		if (freesp < mp->m_low_space[XFS_LOWSP_1_PCNT])
-			shift++;
-	}
+	if (unlikely(XFS_IS_REALTIME_INODE(ip)))
+		freesp = xfs_rtx_to_rtb(mp, xfs_iomap_freesp(&mp->m_frextents,
+				mp->m_low_rtexts, &shift));
+	else
+		freesp = xfs_iomap_freesp(&mp->m_fdblocks, mp->m_low_space,
+				&shift);
 
 	/*
 	 * Check each quota to cap the prealloc size, provide a shift value to
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 11/13] xfs: rework splitting of indirect block reservations
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 10/13] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-27 15:14   ` Darrick J. Wong
  2024-03-28  4:35   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 12/13] xfs: stop the steal (of data blocks for RT indirect blocks) Christoph Hellwig
  2024-03-27 11:03 ` [PATCH 13/13] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1) Christoph Hellwig
  12 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

Move the check if we have enough indirect blocks and the stealing of
the deleted extent blocks out of xfs_bmap_split_indlen and into the
caller to prepare for handling delayed allocation of RT extents that
can't easily be stolen.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 131d4f063b660a..9d0b7caa9a036c 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4829,31 +4829,17 @@ xfs_bmapi_remap(
  * ores == 1). The number of stolen blocks is returned. The availability and
  * subsequent accounting of stolen blocks is the responsibility of the caller.
  */
-static xfs_filblks_t
+static void
 xfs_bmap_split_indlen(
 	xfs_filblks_t			ores,		/* original res. */
 	xfs_filblks_t			*indlen1,	/* ext1 worst indlen */
-	xfs_filblks_t			*indlen2,	/* ext2 worst indlen */
-	xfs_filblks_t			avail)		/* stealable blocks */
+	xfs_filblks_t			*indlen2)	/* ext2 worst indlen */
 {
 	xfs_filblks_t			len1 = *indlen1;
 	xfs_filblks_t			len2 = *indlen2;
 	xfs_filblks_t			nres = len1 + len2; /* new total res. */
-	xfs_filblks_t			stolen = 0;
 	xfs_filblks_t			resfactor;
 
-	/*
-	 * Steal as many blocks as we can to try and satisfy the worst case
-	 * indlen for both new extents.
-	 */
-	if (ores < nres && avail)
-		stolen = XFS_FILBLKS_MIN(nres - ores, avail);
-	ores += stolen;
-
-	 /* nothing else to do if we've satisfied the new reservation */
-	if (ores >= nres)
-		return stolen;
-
 	/*
 	 * We can't meet the total required reservation for the two extents.
 	 * Calculate the percent of the overall shortage between both extents
@@ -4898,8 +4884,6 @@ xfs_bmap_split_indlen(
 
 	*indlen1 = len1;
 	*indlen2 = len2;
-
-	return stolen;
 }
 
 int
@@ -4915,7 +4899,7 @@ xfs_bmap_del_extent_delay(
 	struct xfs_bmbt_irec	new;
 	int64_t			da_old, da_new, da_diff = 0;
 	xfs_fileoff_t		del_endoff, got_endoff;
-	xfs_filblks_t		got_indlen, new_indlen, stolen;
+	xfs_filblks_t		got_indlen, new_indlen, stolen = 0;
 	uint32_t		state = xfs_bmap_fork_to_state(whichfork);
 	uint64_t		fdblocks;
 	int			error = 0;
@@ -4994,8 +4978,19 @@ xfs_bmap_del_extent_delay(
 		new_indlen = xfs_bmap_worst_indlen(ip, new.br_blockcount);
 
 		WARN_ON_ONCE(!got_indlen || !new_indlen);
-		stolen = xfs_bmap_split_indlen(da_old, &got_indlen, &new_indlen,
-						       del->br_blockcount);
+		/*
+		 * Steal as many blocks as we can to try and satisfy the worst
+		 * case indlen for both new extents.
+		 */
+		da_new = got_indlen + new_indlen;
+		if (da_new > da_old) {
+			stolen = XFS_FILBLKS_MIN(da_new - da_old,
+						 del->br_blockcount);
+			da_old += stolen;
+		}
+		if (da_new > da_old)
+			xfs_bmap_split_indlen(da_old, &got_indlen, &new_indlen);
+		da_new = got_indlen + new_indlen;
 
 		got->br_startblock = nullstartblock((int)got_indlen);
 
@@ -5007,7 +5002,6 @@ xfs_bmap_del_extent_delay(
 		xfs_iext_next(ifp, icur);
 		xfs_iext_insert(ip, icur, &new, state);
 
-		da_new = got_indlen + new_indlen - stolen;
 		del->br_blockcount -= stolen;
 		break;
 	}
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 12/13] xfs: stop the steal (of data blocks for RT indirect blocks)
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 11/13] xfs: rework splitting of indirect block reservations Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-27 15:18   ` Darrick J. Wong
  2024-03-28  4:36   ` Dave Chinner
  2024-03-27 11:03 ` [PATCH 13/13] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1) Christoph Hellwig
  12 siblings, 2 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

When xfs_bmap_del_extent_delay has to split an indirect block it tries
to steal blocks from the the part that gets unmapped to increase the
indirect block reservation that now needs to cover for two extents
instead of one.

This works perfectly fine on the data device, where the data and
indirect blocks come from the same pool.  It has no chance of working
when the inode sits on the RT device.  To support re-enabling delalloc
for inodes on the RT device, make this behavior conditional on not
beeing for rt extents.

Note that split of delalloc extents should only happen on writeback
failure, as for other kinds of hole punching we first write back all
data and thus convert the delalloc reservations covering the hole to
a real allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_bmap.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9d0b7caa9a036c..ef34738fb0fedd 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4981,9 +4981,14 @@ xfs_bmap_del_extent_delay(
 		/*
 		 * Steal as many blocks as we can to try and satisfy the worst
 		 * case indlen for both new extents.
+		 *
+		 * However, we can't just steal reservations from the data
+		 * blocks if this is an RT inodes as the data and metadata
+		 * blocks come from different pools.  We'll have to live with
+		 * under-filled indirect reservation in this case.
 		 */
 		da_new = got_indlen + new_indlen;
-		if (da_new > da_old) {
+		if (da_new > da_old && !isrt) {
 			stolen = XFS_FILBLKS_MIN(da_new - da_old,
 						 del->br_blockcount);
 			da_old += stolen;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 13/13] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1)
  2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2024-03-27 11:03 ` [PATCH 12/13] xfs: stop the steal (of data blocks for RT indirect blocks) Christoph Hellwig
@ 2024-03-27 11:03 ` Christoph Hellwig
  2024-03-28  4:39   ` Dave Chinner
  12 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 11:03 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs

Commit aff3a9edb708 ("xfs: Use preallocation for inodes with extsz
hints") disabled delayed allocation for all inodes with extent size
hints due a data exposure problem.  It turns out we fixed this data
exposure problem since by always creating unwritten extents for
delalloc conversions due to more data exposure problems, but the
writeback path doesn't actually support extent size hints when
converting delalloc these days, which probably isn't a problem given
that people using the hints know what they get.

However due to the way how xfs_get_extsz_hint is implemented, it
always claims an extent size hint for RT inodes even if the RT
extent size is a single FSB.  Due to that the above commit effectively
disabled delalloc support for RT inodes.

Switch xfs_get_extsz_hint to return 0 for this case and work around
that in a few places to reinstate delalloc support for RT inodes on
file systems with an sb_rextsize of 1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c   | 3 ++-
 fs/xfs/xfs_iomap.c   | 2 --
 fs/xfs/xfs_iops.c    | 2 +-
 fs/xfs/xfs_rtalloc.c | 2 ++
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ea48774f6b76d3..aa62fe2ed76834 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -60,7 +60,8 @@ xfs_get_extsz_hint(
 		return 0;
 	if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
 		return ip->i_extsize;
-	if (XFS_IS_REALTIME_INODE(ip))
+	if (XFS_IS_REALTIME_INODE(ip) &&
+	    ip->i_mount->m_sb.sb_rextsize > 1)
 		return ip->i_mount->m_sb.sb_rextsize;
 	return 0;
 }
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index e0c205bcf03404..f6fd9aed3a7f4b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1000,8 +1000,6 @@ xfs_buffered_write_iomap_begin(
 		return xfs_direct_write_iomap_begin(inode, offset, count,
 				flags, iomap, srcmap);
 
-	ASSERT(!XFS_IS_REALTIME_INODE(ip));
-
 	error = xfs_qm_dqattach(ip);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 66f8c47642e884..62f91392b281dc 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -521,7 +521,7 @@ xfs_stat_blksize(
 	 * always return the realtime extent size.
 	 */
 	if (XFS_IS_REALTIME_INODE(ip))
-		return XFS_FSB_TO_B(mp, xfs_get_extsz_hint(ip));
+		return XFS_FSB_TO_B(mp, xfs_get_extsz_hint(ip) ? : 1);
 
 	/*
 	 * Allow large block sizes to be reported to userspace programs if the
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 86f928d30feda9..b476a876478d93 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1341,6 +1341,8 @@ xfs_bmap_rtalloc(
 	int			error;
 
 	align = xfs_get_extsz_hint(ap->ip);
+	if (!align)
+		align = 1;
 retry:
 	error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
 					align, 1, ap->eof, 0,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/13] xfs: free RT extents after updating the bmap btree
  2024-03-27 11:03 ` [PATCH 03/13] xfs: free RT extents after updating the bmap btree Christoph Hellwig
@ 2024-03-27 14:55   ` Darrick J. Wong
  2024-03-27 17:03     ` Christoph Hellwig
  2024-03-28  4:13   ` Dave Chinner
  1 sibling, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2024-03-27 14:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 12:03:08PM +0100, Christoph Hellwig wrote:
> Currently xfs_bmap_del_extent_real frees RT extents before updating
> the bmap btree, while it frees regular blocks after performing the bmap
> btree update.  While this behavior goes back to the original commit,
> I can't find any good reason for handling RT extent vs regular block
> freeing differently.  We use the same transaction, and unless rmaps
> or reflink are enabled (which currently aren't support for RT inodes)
> there are no transactions rolls or deferred ops that can rely on this
> ordering.

...and the realtime rmap/reflink patchsets will want to reuse the data
device's ordering (bmap -> rmap -> refcount -> efi) for the rt volume.

> Signed-off-by: Christoph Hellwig <hch@lst.de>

I'm ok with moving this now since I'm mostly going to pave over it later
anyway :)

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 26 +++++++++-----------------
>  1 file changed, 9 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 09d4b730ee9709..282b44deb9f864 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5107,8 +5107,7 @@ xfs_bmap_del_extent_real(
>  {
>  	xfs_fsblock_t		del_endblock=0;	/* first block past del */
>  	xfs_fileoff_t		del_endoff;	/* first offset past del */
> -	int			do_fx;	/* free extent at end of routine */
> -	int			error;	/* error return value */
> +	int			error = 0;	/* error return value */
>  	struct xfs_bmbt_irec	got;	/* current extent entry */
>  	xfs_fileoff_t		got_endoff;	/* first offset past got */
>  	int			i;	/* temp state */
> @@ -5151,20 +5150,10 @@ xfs_bmap_del_extent_real(
>  		return -ENOSPC;
>  
>  	*logflagsp = XFS_ILOG_CORE;
> -	if (xfs_ifork_is_realtime(ip, whichfork)) {
> -		if (!(bflags & XFS_BMAPI_REMAP)) {
> -			error = xfs_rtfree_blocks(tp, del->br_startblock,
> -					del->br_blockcount);
> -			if (error)
> -				return error;
> -		}
> -
> -		do_fx = 0;
> +	if (xfs_ifork_is_realtime(ip, whichfork))
>  		qfield = XFS_TRANS_DQ_RTBCOUNT;
> -	} else {
> -		do_fx = 1;
> +	else
>  		qfield = XFS_TRANS_DQ_BCOUNT;
> -	}
>  	nblks = del->br_blockcount;
>  
>  	del_endblock = del->br_startblock + del->br_blockcount;
> @@ -5312,18 +5301,21 @@ xfs_bmap_del_extent_real(
>  	/*
>  	 * If we need to, add to list of extents to delete.
>  	 */
> -	if (do_fx && !(bflags & XFS_BMAPI_REMAP)) {
> +	if (!(bflags & XFS_BMAPI_REMAP)) {
>  		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
>  			xfs_refcount_decrease_extent(tp, del);
> +		} else if (xfs_ifork_is_realtime(ip, whichfork)) {
> +			error = xfs_rtfree_blocks(tp, del->br_startblock,
> +					del->br_blockcount);
>  		} else {
>  			error = xfs_free_extent_later(tp, del->br_startblock,
>  					del->br_blockcount, NULL,
>  					XFS_AG_RESV_NONE,
>  					((bflags & XFS_BMAPI_NODISCARD) ||
>  					del->br_state == XFS_EXT_UNWRITTEN));
> -			if (error)
> -				return error;
>  		}
> +		if (error)
> +			return error;
>  	}
>  
>  	/*
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi
  2024-03-27 11:03 ` [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi Christoph Hellwig
@ 2024-03-27 15:07   ` Darrick J. Wong
  2024-03-27 17:06     ` Christoph Hellwig
  2024-03-28  4:15   ` Dave Chinner
  1 sibling, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2024-03-27 15:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 12:03:09PM +0100, Christoph Hellwig wrote:
> __xfs_bunmapi is a bit of an odd place to lock the rtbitmap and rtsummary
> inodes given that it is very high level code.  While this only looks ugly
> right now, it will become a problem when supporting delayed allocations
> for RT inodes as __xfs_bunmapi might end up deleting only delalloc extents
> and thus never unlock the rt inodes.
> 
> Move the locking into xfs_bmap_del_extent_real just before the call to
> xfs_rtfree_blocks instead and use a new flag in the transaction to ensure
> that the locking happens only once.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_bmap.c   | 15 ++++++++-------
>  fs/xfs/libxfs/xfs_shared.h |  3 +++
>  2 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 282b44deb9f864..e5e199d325982f 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5305,6 +5305,14 @@ xfs_bmap_del_extent_real(
>  		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
>  			xfs_refcount_decrease_extent(tp, del);
>  		} else if (xfs_ifork_is_realtime(ip, whichfork)) {
> +			/*
> +			 * Ensure the bitmap and summary inodes are locked
> +			 * and joined to the transaction before modifying them.
> +			 */
> +			if (!(tp->t_flags & XFS_TRANS_RTBITMAP_LOCKED)) {
> +				tp->t_flags |= XFS_TRANS_RTBITMAP_LOCKED;

How does it happen that xfs_rtfree_blocks gets called more than once in
the same transaction?  Is that simply the effect of xfs_bunmapi_range
and xfs_unmap_exten calling __xfs_bunmapi with
nextents == XFS_ITRUNC_MAX_EXTENTS==2?

What if we simply didn't unmap multiple extents per bunmapi call for
realtime files?  Would that eliminate the need for
XFS_TRANS_RTBITMAP_LOCKED?

--D

> +				xfs_rtbitmap_lock(tp, mp);
> +			}
>  			error = xfs_rtfree_blocks(tp, del->br_startblock,
>  					del->br_blockcount);
>  		} else {
> @@ -5406,13 +5414,6 @@ __xfs_bunmapi(
>  	} else
>  		cur = NULL;
>  
> -	if (isrt) {
> -		/*
> -		 * Synchronize by locking the realtime bitmap.
> -		 */
> -		xfs_rtbitmap_lock(tp, mp);
> -	}
> -
>  	extno = 0;
>  	while (end != (xfs_fileoff_t)-1 && end >= start &&
>  	       (nexts == 0 || extno < nexts)) {
> diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
> index f35640ad3e7fe4..34f104ed372c09 100644
> --- a/fs/xfs/libxfs/xfs_shared.h
> +++ b/fs/xfs/libxfs/xfs_shared.h
> @@ -137,6 +137,9 @@ void	xfs_log_get_max_trans_res(struct xfs_mount *mp,
>   */
>  #define XFS_TRANS_LOWMODE		(1u << 8)
>  
> +/* Transaction has locked the rtbitmap and rtsum inodes */
> +#define XFS_TRANS_RTBITMAP_LOCKED	(1u << 9)
> +
>  /*
>   * Field values for xfs_trans_mod_sb.
>   */
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 06/13] xfs: split xfs_mod_freecounter
  2024-03-27 11:03 ` [PATCH 06/13] xfs: split xfs_mod_freecounter Christoph Hellwig
@ 2024-03-27 15:12   ` Darrick J. Wong
  2024-03-28  4:18   ` Dave Chinner
  1 sibling, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2024-03-27 15:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 12:03:11PM +0100, Christoph Hellwig wrote:
> xfs_mod_freecounter has two entirely separate code paths for adding or
> subtracting from the free counters.  Only the subtract case looks at the
> rsvd flag and can return an error.
> 
> Split xfs_mod_freecounter into separate helpers for subtracting or
> adding the freecounter, and remove all the impossible to reach error
> handling for the addition case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks fine still,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_ag.c      |  4 +--
>  fs/xfs/libxfs/xfs_ag_resv.c | 24 ++++---------
>  fs/xfs/libxfs/xfs_ag_resv.h |  2 +-
>  fs/xfs/libxfs/xfs_alloc.c   |  4 +--
>  fs/xfs/libxfs/xfs_bmap.c    | 23 +++++++------
>  fs/xfs/scrub/fscounters.c   |  2 +-
>  fs/xfs/scrub/repair.c       |  5 +--
>  fs/xfs/xfs_fsops.c          | 29 +++++-----------
>  fs/xfs/xfs_fsops.h          |  2 +-
>  fs/xfs/xfs_mount.c          | 67 +++++++++++++++++++------------------
>  fs/xfs/xfs_mount.h          | 27 ++++++++++-----
>  fs/xfs/xfs_super.c          |  6 +---
>  fs/xfs/xfs_trace.h          |  1 -
>  fs/xfs/xfs_trans.c          | 25 +++++---------
>  14 files changed, 97 insertions(+), 124 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index dc1873f76bffd5..189e3296fef6de 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -963,9 +963,7 @@ xfs_ag_shrink_space(
>  	 * Disable perag reservations so it doesn't cause the allocation request
>  	 * to fail. We'll reestablish reservation before we return.
>  	 */
> -	error = xfs_ag_resv_free(pag);
> -	if (error)
> -		return error;
> +	xfs_ag_resv_free(pag);
>  
>  	/* internal log shouldn't also show up in the free space btrees */
>  	error = xfs_alloc_vextent_exact_bno(&args,
> diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c
> index da1057bd0e6067..216423df939e5c 100644
> --- a/fs/xfs/libxfs/xfs_ag_resv.c
> +++ b/fs/xfs/libxfs/xfs_ag_resv.c
> @@ -126,14 +126,13 @@ xfs_ag_resv_needed(
>  }
>  
>  /* Clean out a reservation */
> -static int
> +static void
>  __xfs_ag_resv_free(
>  	struct xfs_perag		*pag,
>  	enum xfs_ag_resv_type		type)
>  {
>  	struct xfs_ag_resv		*resv;
>  	xfs_extlen_t			oldresv;
> -	int				error;
>  
>  	trace_xfs_ag_resv_free(pag, type, 0);
>  
> @@ -149,30 +148,19 @@ __xfs_ag_resv_free(
>  		oldresv = resv->ar_orig_reserved;
>  	else
>  		oldresv = resv->ar_reserved;
> -	error = xfs_mod_fdblocks(pag->pag_mount, oldresv, true);
> +	xfs_add_fdblocks(pag->pag_mount, oldresv);
>  	resv->ar_reserved = 0;
>  	resv->ar_asked = 0;
>  	resv->ar_orig_reserved = 0;
> -
> -	if (error)
> -		trace_xfs_ag_resv_free_error(pag->pag_mount, pag->pag_agno,
> -				error, _RET_IP_);
> -	return error;
>  }
>  
>  /* Free a per-AG reservation. */
> -int
> +void
>  xfs_ag_resv_free(
>  	struct xfs_perag		*pag)
>  {
> -	int				error;
> -	int				err2;
> -
> -	error = __xfs_ag_resv_free(pag, XFS_AG_RESV_RMAPBT);
> -	err2 = __xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA);
> -	if (err2 && !error)
> -		error = err2;
> -	return error;
> +	__xfs_ag_resv_free(pag, XFS_AG_RESV_RMAPBT);
> +	__xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA);
>  }
>  
>  static int
> @@ -216,7 +204,7 @@ __xfs_ag_resv_init(
>  	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_AG_RESV_FAIL))
>  		error = -ENOSPC;
>  	else
> -		error = xfs_mod_fdblocks(mp, -(int64_t)hidden_space, true);
> +		error = xfs_dec_fdblocks(mp, hidden_space, true);
>  	if (error) {
>  		trace_xfs_ag_resv_init_error(pag->pag_mount, pag->pag_agno,
>  				error, _RET_IP_);
> diff --git a/fs/xfs/libxfs/xfs_ag_resv.h b/fs/xfs/libxfs/xfs_ag_resv.h
> index b74b210008ea7e..ff20ed93de7724 100644
> --- a/fs/xfs/libxfs/xfs_ag_resv.h
> +++ b/fs/xfs/libxfs/xfs_ag_resv.h
> @@ -6,7 +6,7 @@
>  #ifndef __XFS_AG_RESV_H__
>  #define	__XFS_AG_RESV_H__
>  
> -int xfs_ag_resv_free(struct xfs_perag *pag);
> +void xfs_ag_resv_free(struct xfs_perag *pag);
>  int xfs_ag_resv_init(struct xfs_perag *pag, struct xfs_trans *tp);
>  
>  bool xfs_ag_resv_critical(struct xfs_perag *pag, enum xfs_ag_resv_type type);
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 9da52e92172aba..6cb8b2ddc541b4 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -79,7 +79,7 @@ xfs_prealloc_blocks(
>  }
>  
>  /*
> - * The number of blocks per AG that we withhold from xfs_mod_fdblocks to
> + * The number of blocks per AG that we withhold from xfs_dec_fdblocks to
>   * guarantee that we can refill the AGFL prior to allocating space in a nearly
>   * full AG.  Although the space described by the free space btrees, the
>   * blocks used by the freesp btrees themselves, and the blocks owned by the
> @@ -89,7 +89,7 @@ xfs_prealloc_blocks(
>   * until the fs goes down, we subtract this many AG blocks from the incore
>   * fdblocks to ensure user allocation does not overcommit the space the
>   * filesystem needs for the AGFLs.  The rmap btree uses a per-AG reservation to
> - * withhold space from xfs_mod_fdblocks, so we do not account for that here.
> + * withhold space from xfs_dec_fdblocks, so we do not account for that here.
>   */
>  #define XFS_ALLOCBT_AGFL_RESERVE	4
>  
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index e5e199d325982f..4e81baf5e95301 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -1983,10 +1983,11 @@ xfs_bmap_add_extent_delay_real(
>  	}
>  
>  	/* adjust for changes in reserved delayed indirect blocks */
> -	if (da_new != da_old) {
> -		ASSERT(state == 0 || da_new < da_old);
> -		error = xfs_mod_fdblocks(mp, (int64_t)(da_old - da_new),
> -				false);
> +	if (da_new < da_old) {
> +		xfs_add_fdblocks(mp, da_old - da_new);
> +	} else if (da_new > da_old) {
> +		ASSERT(state == 0);
> +		error = xfs_dec_fdblocks(mp, da_new - da_old, false);
>  	}
>  
>  	xfs_bmap_check_leaf_extents(bma->cur, bma->ip, whichfork);
> @@ -2688,8 +2689,8 @@ xfs_bmap_add_extent_hole_delay(
>  	}
>  	if (oldlen != newlen) {
>  		ASSERT(oldlen > newlen);
> -		xfs_mod_fdblocks(ip->i_mount, (int64_t)(oldlen - newlen),
> -				 false);
> +		xfs_add_fdblocks(ip->i_mount, oldlen - newlen);
> +
>  		/*
>  		 * Nothing to do for disk quota accounting here.
>  		 */
> @@ -4108,11 +4109,11 @@ xfs_bmapi_reserve_delalloc(
>  	indlen = (xfs_extlen_t)xfs_bmap_worst_indlen(ip, alen);
>  	ASSERT(indlen > 0);
>  
> -	error = xfs_mod_fdblocks(mp, -((int64_t)alen), false);
> +	error = xfs_dec_fdblocks(mp, alen, false);
>  	if (error)
>  		goto out_unreserve_quota;
>  
> -	error = xfs_mod_fdblocks(mp, -((int64_t)indlen), false);
> +	error = xfs_dec_fdblocks(mp, indlen, false);
>  	if (error)
>  		goto out_unreserve_blocks;
>  
> @@ -4140,7 +4141,7 @@ xfs_bmapi_reserve_delalloc(
>  	return 0;
>  
>  out_unreserve_blocks:
> -	xfs_mod_fdblocks(mp, alen, false);
> +	xfs_add_fdblocks(mp, alen);
>  out_unreserve_quota:
>  	if (XFS_IS_QUOTA_ON(mp))
>  		xfs_quota_unreserve_blkres(ip, alen);
> @@ -4926,7 +4927,7 @@ xfs_bmap_del_extent_delay(
>  	ASSERT(got_endoff >= del_endoff);
>  
>  	if (isrt)
> -		xfs_mod_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
> +		xfs_add_frextents(mp, xfs_rtb_to_rtx(mp, del->br_blockcount));
>  
>  	/*
>  	 * Update the inode delalloc counter now and wait to update the
> @@ -5013,7 +5014,7 @@ xfs_bmap_del_extent_delay(
>  	if (!isrt)
>  		da_diff += del->br_blockcount;
>  	if (da_diff) {
> -		xfs_mod_fdblocks(mp, da_diff, false);
> +		xfs_add_fdblocks(mp, da_diff);
>  		xfs_mod_delalloc(mp, -da_diff);
>  	}
>  	return error;
> diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
> index 2b4bd2eb71b57d..c910b5f1306913 100644
> --- a/fs/xfs/scrub/fscounters.c
> +++ b/fs/xfs/scrub/fscounters.c
> @@ -517,7 +517,7 @@ xchk_fscounters(
>  
>  	/*
>  	 * If the filesystem is not frozen, the counter summation calls above
> -	 * can race with xfs_mod_freecounter, which subtracts a requested space
> +	 * can race with xfs_dec_freecounter, which subtracts a requested space
>  	 * reservation from the counter and undoes the subtraction if that made
>  	 * the counter go negative.  Therefore, it's possible to see negative
>  	 * values here, and we should only flag that as a corruption if we
> diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> index f43dce771cdd26..6123e6c7ac7d67 100644
> --- a/fs/xfs/scrub/repair.c
> +++ b/fs/xfs/scrub/repair.c
> @@ -963,9 +963,7 @@ xrep_reset_perag_resv(
>  	ASSERT(sc->tp);
>  
>  	sc->flags &= ~XREP_RESET_PERAG_RESV;
> -	error = xfs_ag_resv_free(sc->sa.pag);
> -	if (error)
> -		goto out;
> +	xfs_ag_resv_free(sc->sa.pag);
>  	error = xfs_ag_resv_init(sc->sa.pag, sc->tp);
>  	if (error == -ENOSPC) {
>  		xfs_err(sc->mp,
> @@ -974,7 +972,6 @@ xrep_reset_perag_resv(
>  		error = 0;
>  	}
>  
> -out:
>  	return error;
>  }
>  
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 83f708f62ed9f2..c211ea2b63c4dd 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -213,10 +213,8 @@ xfs_growfs_data_private(
>  			struct xfs_perag	*pag;
>  
>  			pag = xfs_perag_get(mp, id.agno);
> -			error = xfs_ag_resv_free(pag);
> +			xfs_ag_resv_free(pag);
>  			xfs_perag_put(pag);
> -			if (error)
> -				return error;
>  		}
>  		/*
>  		 * Reserve AG metadata blocks. ENOSPC here does not mean there
> @@ -385,14 +383,14 @@ xfs_reserve_blocks(
>  	 */
>  	if (mp->m_resblks > request) {
>  		lcounter = mp->m_resblks_avail - request;
> -		if (lcounter  > 0) {		/* release unused blocks */
> +		if (lcounter > 0) {		/* release unused blocks */
>  			fdblks_delta = lcounter;
>  			mp->m_resblks_avail -= lcounter;
>  		}
>  		mp->m_resblks = request;
>  		if (fdblks_delta) {
>  			spin_unlock(&mp->m_sb_lock);
> -			error = xfs_mod_fdblocks(mp, fdblks_delta, 0);
> +			xfs_add_fdblocks(mp, fdblks_delta);
>  			spin_lock(&mp->m_sb_lock);
>  		}
>  
> @@ -428,9 +426,9 @@ xfs_reserve_blocks(
>  		 */
>  		fdblks_delta = min(free, delta);
>  		spin_unlock(&mp->m_sb_lock);
> -		error = xfs_mod_fdblocks(mp, -fdblks_delta, 0);
> +		error = xfs_dec_fdblocks(mp, fdblks_delta, 0);
>  		if (!error)
> -			xfs_mod_fdblocks(mp, fdblks_delta, 0);
> +			xfs_add_fdblocks(mp, fdblks_delta);
>  		spin_lock(&mp->m_sb_lock);
>  	}
>  out:
> @@ -556,24 +554,13 @@ xfs_fs_reserve_ag_blocks(
>  /*
>   * Free space reserved for per-AG metadata.
>   */
> -int
> +void
>  xfs_fs_unreserve_ag_blocks(
>  	struct xfs_mount	*mp)
>  {
>  	xfs_agnumber_t		agno;
>  	struct xfs_perag	*pag;
> -	int			error = 0;
> -	int			err2;
>  
> -	for_each_perag(mp, agno, pag) {
> -		err2 = xfs_ag_resv_free(pag);
> -		if (err2 && !error)
> -			error = err2;
> -	}
> -
> -	if (error)
> -		xfs_warn(mp,
> -	"Error %d freeing per-AG metadata reserve pool.", error);
> -
> -	return error;
> +	for_each_perag(mp, agno, pag)
> +		xfs_ag_resv_free(pag);
>  }
> diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
> index 44457b0a059376..3e2f73bcf8314b 100644
> --- a/fs/xfs/xfs_fsops.h
> +++ b/fs/xfs/xfs_fsops.h
> @@ -12,6 +12,6 @@ int xfs_reserve_blocks(struct xfs_mount *mp, uint64_t request);
>  int xfs_fs_goingdown(struct xfs_mount *mp, uint32_t inflags);
>  
>  int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
> -int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
> +void xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
>  
>  #endif	/* __XFS_FSOPS_H__ */
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index df370eb5dc15e1..e7dfa259d0e6a2 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1131,16 +1131,44 @@ xfs_fs_writable(
>  	return true;
>  }
>  
> -/* Adjust m_fdblocks or m_frextents. */
> +void
> +xfs_add_freecounter(
> +	struct xfs_mount	*mp,
> +	struct percpu_counter	*counter,
> +	uint64_t		delta)
> +{
> +	bool			has_resv_pool = (counter == &mp->m_fdblocks);
> +	uint64_t		res_used;
> +
> +	/*
> +	 * If the reserve pool is depleted, put blocks back into it first.
> +	 * Most of the time the pool is full.
> +	 */
> +	if (!has_resv_pool || mp->m_resblks == mp->m_resblks_avail) {
> +		percpu_counter_add(counter, delta);
> +		return;
> +	}
> +
> +	spin_lock(&mp->m_sb_lock);
> +	res_used = mp->m_resblks - mp->m_resblks_avail;
> +	if (res_used > delta) {
> +		mp->m_resblks_avail += delta;
> +	} else {
> +		delta -= res_used;
> +		mp->m_resblks_avail = mp->m_resblks;
> +		percpu_counter_add(counter, delta);
> +	}
> +	spin_unlock(&mp->m_sb_lock);
> +}
> +
>  int
> -xfs_mod_freecounter(
> +xfs_dec_freecounter(
>  	struct xfs_mount	*mp,
>  	struct percpu_counter	*counter,
> -	int64_t			delta,
> +	uint64_t		delta,
>  	bool			rsvd)
>  {
>  	int64_t			lcounter;
> -	long long		res_used;
>  	uint64_t		set_aside = 0;
>  	s32			batch;
>  	bool			has_resv_pool;
> @@ -1150,31 +1178,6 @@ xfs_mod_freecounter(
>  	if (rsvd)
>  		ASSERT(has_resv_pool);
>  
> -	if (delta > 0) {
> -		/*
> -		 * If the reserve pool is depleted, put blocks back into it
> -		 * first. Most of the time the pool is full.
> -		 */
> -		if (likely(!has_resv_pool ||
> -			   mp->m_resblks == mp->m_resblks_avail)) {
> -			percpu_counter_add(counter, delta);
> -			return 0;
> -		}
> -
> -		spin_lock(&mp->m_sb_lock);
> -		res_used = (long long)(mp->m_resblks - mp->m_resblks_avail);
> -
> -		if (res_used > delta) {
> -			mp->m_resblks_avail += delta;
> -		} else {
> -			delta -= res_used;
> -			mp->m_resblks_avail = mp->m_resblks;
> -			percpu_counter_add(counter, delta);
> -		}
> -		spin_unlock(&mp->m_sb_lock);
> -		return 0;
> -	}
> -
>  	/*
>  	 * Taking blocks away, need to be more accurate the closer we
>  	 * are to zero.
> @@ -1202,7 +1205,7 @@ xfs_mod_freecounter(
>  	 */
>  	if (has_resv_pool)
>  		set_aside = xfs_fdblocks_unavailable(mp);
> -	percpu_counter_add_batch(counter, delta, batch);
> +	percpu_counter_add_batch(counter, -((int64_t)delta), batch);
>  	if (__percpu_counter_compare(counter, set_aside,
>  				     XFS_FDBLOCKS_BATCH) >= 0) {
>  		/* we had space! */
> @@ -1214,11 +1217,11 @@ xfs_mod_freecounter(
>  	 * that took us to ENOSPC.
>  	 */
>  	spin_lock(&mp->m_sb_lock);
> -	percpu_counter_add(counter, -delta);
> +	percpu_counter_add(counter, delta);
>  	if (!has_resv_pool || !rsvd)
>  		goto fdblocks_enospc;
>  
> -	lcounter = (long long)mp->m_resblks_avail + delta;
> +	lcounter = (long long)mp->m_resblks_avail - delta;
>  	if (lcounter >= 0) {
>  		mp->m_resblks_avail = lcounter;
>  		spin_unlock(&mp->m_sb_lock);
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index e880aa48de68bb..d941437a0c7369 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -534,19 +534,30 @@ xfs_fdblocks_unavailable(
>  	return mp->m_alloc_set_aside + atomic64_read(&mp->m_allocbt_blks);
>  }
>  
> -int xfs_mod_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
> -		int64_t delta, bool rsvd);
> +int xfs_dec_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
> +		uint64_t delta, bool rsvd);
> +void xfs_add_freecounter(struct xfs_mount *mp, struct percpu_counter *counter,
> +		uint64_t delta);
>  
> -static inline int
> -xfs_mod_fdblocks(struct xfs_mount *mp, int64_t delta, bool reserved)
> +static inline int xfs_dec_fdblocks(struct xfs_mount *mp, uint64_t delta,
> +		bool reserved)
>  {
> -	return xfs_mod_freecounter(mp, &mp->m_fdblocks, delta, reserved);
> +	return xfs_dec_freecounter(mp, &mp->m_fdblocks, delta, reserved);
>  }
>  
> -static inline int
> -xfs_mod_frextents(struct xfs_mount *mp, int64_t delta)
> +static inline void xfs_add_fdblocks(struct xfs_mount *mp, uint64_t delta)
>  {
> -	return xfs_mod_freecounter(mp, &mp->m_frextents, delta, false);
> +	xfs_add_freecounter(mp, &mp->m_fdblocks, delta);
> +}
> +
> +static inline int xfs_dec_frextents(struct xfs_mount *mp, uint64_t delta)
> +{
> +	return xfs_dec_freecounter(mp, &mp->m_frextents, delta, false);
> +}
> +
> +static inline void xfs_add_frextents(struct xfs_mount *mp, uint64_t delta)
> +{
> +	xfs_add_freecounter(mp, &mp->m_frextents, delta);
>  }
>  
>  extern int	xfs_readsb(xfs_mount_t *, int);
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index c21f10ab0f5dbe..784ff7d1c986ba 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1873,11 +1873,7 @@ xfs_remount_ro(
>  	xfs_inodegc_stop(mp);
>  
>  	/* Free the per-AG metadata reservation pool. */
> -	error = xfs_fs_unreserve_ag_blocks(mp);
> -	if (error) {
> -		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> -		return error;
> -	}
> +	xfs_fs_unreserve_ag_blocks(mp);
>  
>  	/*
>  	 * Before we sync the metadata, we need to free up the reserve block
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index aea97fc074f8de..1c5753f91c388e 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3062,7 +3062,6 @@ DEFINE_AG_RESV_EVENT(xfs_ag_resv_free_extent);
>  DEFINE_AG_RESV_EVENT(xfs_ag_resv_critical);
>  DEFINE_AG_RESV_EVENT(xfs_ag_resv_needed);
>  
> -DEFINE_AG_ERROR_EVENT(xfs_ag_resv_free_error);
>  DEFINE_AG_ERROR_EVENT(xfs_ag_resv_init_error);
>  
>  /* refcount tracepoint classes */
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 924b460229e951..b18d478e2c9e85 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -163,7 +163,7 @@ xfs_trans_reserve(
>  	 * fail if the count would go below zero.
>  	 */
>  	if (blocks > 0) {
> -		error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
> +		error = xfs_dec_fdblocks(mp, blocks, rsvd);
>  		if (error != 0)
>  			return -ENOSPC;
>  		tp->t_blk_res += blocks;
> @@ -210,7 +210,7 @@ xfs_trans_reserve(
>  	 * fail if the count would go below zero.
>  	 */
>  	if (rtextents > 0) {
> -		error = xfs_mod_frextents(mp, -((int64_t)rtextents));
> +		error = xfs_dec_frextents(mp, rtextents);
>  		if (error) {
>  			error = -ENOSPC;
>  			goto undo_log;
> @@ -234,7 +234,7 @@ xfs_trans_reserve(
>  
>  undo_blocks:
>  	if (blocks > 0) {
> -		xfs_mod_fdblocks(mp, (int64_t)blocks, rsvd);
> +		xfs_add_fdblocks(mp, blocks);
>  		tp->t_blk_res = 0;
>  	}
>  	return error;
> @@ -593,12 +593,10 @@ xfs_trans_unreserve_and_mod_sb(
>  	struct xfs_trans	*tp)
>  {
>  	struct xfs_mount	*mp = tp->t_mountp;
> -	bool			rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
>  	int64_t			blkdelta = tp->t_blk_res;
>  	int64_t			rtxdelta = tp->t_rtx_res;
>  	int64_t			idelta = 0;
>  	int64_t			ifreedelta = 0;
> -	int			error;
>  
>  	/*
>  	 * Calculate the deltas.
> @@ -631,10 +629,8 @@ xfs_trans_unreserve_and_mod_sb(
>  	}
>  
>  	/* apply the per-cpu counters */
> -	if (blkdelta) {
> -		error = xfs_mod_fdblocks(mp, blkdelta, rsvd);
> -		ASSERT(!error);
> -	}
> +	if (blkdelta)
> +		xfs_add_fdblocks(mp, blkdelta);
>  
>  	if (idelta)
>  		percpu_counter_add_batch(&mp->m_icount, idelta,
> @@ -643,10 +639,8 @@ xfs_trans_unreserve_and_mod_sb(
>  	if (ifreedelta)
>  		percpu_counter_add(&mp->m_ifree, ifreedelta);
>  
> -	if (rtxdelta) {
> -		error = xfs_mod_frextents(mp, rtxdelta);
> -		ASSERT(!error);
> -	}
> +	if (rtxdelta)
> +		xfs_add_frextents(mp, rtxdelta);
>  
>  	if (!(tp->t_flags & XFS_TRANS_SB_DIRTY))
>  		return;
> @@ -682,7 +676,6 @@ xfs_trans_unreserve_and_mod_sb(
>  	 */
>  	ASSERT(mp->m_sb.sb_imax_pct >= 0);
>  	ASSERT(mp->m_sb.sb_rextslog >= 0);
> -	return;
>  }
>  
>  /* Add the given log item to the transaction's list of log items. */
> @@ -1301,9 +1294,9 @@ xfs_trans_reserve_more_inode(
>  		return 0;
>  
>  	/* Quota failed, give back the new reservation. */
> -	xfs_mod_fdblocks(mp, dblocks, tp->t_flags & XFS_TRANS_RESERVE);
> +	xfs_add_fdblocks(mp, dblocks);
>  	tp->t_blk_res -= dblocks;
> -	xfs_mod_frextents(mp, rtx);
> +	xfs_add_frextents(mp, rtx);
>  	tp->t_rtx_res -= rtx;
>  	return error;
>  }
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/13] xfs: rework splitting of indirect block reservations
  2024-03-27 11:03 ` [PATCH 11/13] xfs: rework splitting of indirect block reservations Christoph Hellwig
@ 2024-03-27 15:14   ` Darrick J. Wong
  2024-03-28  4:35   ` Dave Chinner
  1 sibling, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2024-03-27 15:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 12:03:16PM +0100, Christoph Hellwig wrote:
> Move the check if we have enough indirect blocks and the stealing of
> the deleted extent blocks out of xfs_bmap_split_indlen and into the
> caller to prepare for handling delayed allocation of RT extents that
> can't easily be stolen.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks correct now,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 38 ++++++++++++++++----------------------
>  1 file changed, 16 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 131d4f063b660a..9d0b7caa9a036c 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4829,31 +4829,17 @@ xfs_bmapi_remap(
>   * ores == 1). The number of stolen blocks is returned. The availability and
>   * subsequent accounting of stolen blocks is the responsibility of the caller.
>   */
> -static xfs_filblks_t
> +static void
>  xfs_bmap_split_indlen(
>  	xfs_filblks_t			ores,		/* original res. */
>  	xfs_filblks_t			*indlen1,	/* ext1 worst indlen */
> -	xfs_filblks_t			*indlen2,	/* ext2 worst indlen */
> -	xfs_filblks_t			avail)		/* stealable blocks */
> +	xfs_filblks_t			*indlen2)	/* ext2 worst indlen */
>  {
>  	xfs_filblks_t			len1 = *indlen1;
>  	xfs_filblks_t			len2 = *indlen2;
>  	xfs_filblks_t			nres = len1 + len2; /* new total res. */
> -	xfs_filblks_t			stolen = 0;
>  	xfs_filblks_t			resfactor;
>  
> -	/*
> -	 * Steal as many blocks as we can to try and satisfy the worst case
> -	 * indlen for both new extents.
> -	 */
> -	if (ores < nres && avail)
> -		stolen = XFS_FILBLKS_MIN(nres - ores, avail);
> -	ores += stolen;
> -
> -	 /* nothing else to do if we've satisfied the new reservation */
> -	if (ores >= nres)
> -		return stolen;
> -
>  	/*
>  	 * We can't meet the total required reservation for the two extents.
>  	 * Calculate the percent of the overall shortage between both extents
> @@ -4898,8 +4884,6 @@ xfs_bmap_split_indlen(
>  
>  	*indlen1 = len1;
>  	*indlen2 = len2;
> -
> -	return stolen;
>  }
>  
>  int
> @@ -4915,7 +4899,7 @@ xfs_bmap_del_extent_delay(
>  	struct xfs_bmbt_irec	new;
>  	int64_t			da_old, da_new, da_diff = 0;
>  	xfs_fileoff_t		del_endoff, got_endoff;
> -	xfs_filblks_t		got_indlen, new_indlen, stolen;
> +	xfs_filblks_t		got_indlen, new_indlen, stolen = 0;
>  	uint32_t		state = xfs_bmap_fork_to_state(whichfork);
>  	uint64_t		fdblocks;
>  	int			error = 0;
> @@ -4994,8 +4978,19 @@ xfs_bmap_del_extent_delay(
>  		new_indlen = xfs_bmap_worst_indlen(ip, new.br_blockcount);
>  
>  		WARN_ON_ONCE(!got_indlen || !new_indlen);
> -		stolen = xfs_bmap_split_indlen(da_old, &got_indlen, &new_indlen,
> -						       del->br_blockcount);
> +		/*
> +		 * Steal as many blocks as we can to try and satisfy the worst
> +		 * case indlen for both new extents.
> +		 */
> +		da_new = got_indlen + new_indlen;
> +		if (da_new > da_old) {
> +			stolen = XFS_FILBLKS_MIN(da_new - da_old,
> +						 del->br_blockcount);
> +			da_old += stolen;
> +		}
> +		if (da_new > da_old)
> +			xfs_bmap_split_indlen(da_old, &got_indlen, &new_indlen);
> +		da_new = got_indlen + new_indlen;
>  
>  		got->br_startblock = nullstartblock((int)got_indlen);
>  
> @@ -5007,7 +5002,6 @@ xfs_bmap_del_extent_delay(
>  		xfs_iext_next(ifp, icur);
>  		xfs_iext_insert(ip, icur, &new, state);
>  
> -		da_new = got_indlen + new_indlen - stolen;
>  		del->br_blockcount -= stolen;
>  		break;
>  	}
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 12/13] xfs: stop the steal (of data blocks for RT indirect blocks)
  2024-03-27 11:03 ` [PATCH 12/13] xfs: stop the steal (of data blocks for RT indirect blocks) Christoph Hellwig
@ 2024-03-27 15:18   ` Darrick J. Wong
  2024-03-28  4:36   ` Dave Chinner
  1 sibling, 0 replies; 46+ messages in thread
From: Darrick J. Wong @ 2024-03-27 15:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 12:03:17PM +0100, Christoph Hellwig wrote:
> When xfs_bmap_del_extent_delay has to split an indirect block it tries
> to steal blocks from the the part that gets unmapped to increase the
> indirect block reservation that now needs to cover for two extents
> instead of one.
> 
> This works perfectly fine on the data device, where the data and
> indirect blocks come from the same pool.  It has no chance of working
> when the inode sits on the RT device.  To support re-enabling delalloc
> for inodes on the RT device, make this behavior conditional on not
> beeing for rt extents.

  being

> Note that split of delalloc extents should only happen on writeback
> failure, as for other kinds of hole punching we first write back all
> data and thus convert the delalloc reservations covering the hole to
> a real allocation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

With that one spelling fix,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 9d0b7caa9a036c..ef34738fb0fedd 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4981,9 +4981,14 @@ xfs_bmap_del_extent_delay(
>  		/*
>  		 * Steal as many blocks as we can to try and satisfy the worst
>  		 * case indlen for both new extents.
> +		 *
> +		 * However, we can't just steal reservations from the data
> +		 * blocks if this is an RT inodes as the data and metadata
> +		 * blocks come from different pools.  We'll have to live with
> +		 * under-filled indirect reservation in this case.
>  		 */
>  		da_new = got_indlen + new_indlen;
> -		if (da_new > da_old) {
> +		if (da_new > da_old && !isrt) {
>  			stolen = XFS_FILBLKS_MIN(da_new - da_old,
>  						 del->br_blockcount);
>  			da_old += stolen;
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc
  2024-03-27 11:03 ` [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc Christoph Hellwig
@ 2024-03-27 15:20   ` Darrick J. Wong
  2024-03-27 17:05     ` Christoph Hellwig
  2024-03-28  4:27   ` Dave Chinner
  1 sibling, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2024-03-27 15:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 12:03:14PM +0100, Christoph Hellwig wrote:
> To prepare for re-enabling delalloc on RT devices, track the data blocks
> (which use the RT device when the inode sits on it) and the indirect
> blocks (which don't) separately to xfs_mod_delalloc, and add a new
> percpu counter to also track the RT delalloc blocks.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Hmm this looks nearly the same as V4, right?  I couldn't spot the
differences, and I used to follow Hocus Focus every Sunday as a kid. :)

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c         | 12 ++++++------
>  fs/xfs/scrub/fscounters.c        |  6 +++++-
>  fs/xfs/scrub/fscounters.h        |  1 +
>  fs/xfs/scrub/fscounters_repair.c |  3 ++-
>  fs/xfs/xfs_mount.c               | 18 +++++++++++++++---
>  fs/xfs/xfs_mount.h               |  9 ++++++++-
>  fs/xfs/xfs_super.c               | 11 ++++++++++-
>  7 files changed, 47 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index a9a23a5d2e487f..131d4f063b660a 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -1975,7 +1975,7 @@ xfs_bmap_add_extent_delay_real(
>  	}
>  
>  	if (da_new != da_old)
> -		xfs_mod_delalloc(mp, (int64_t)da_new - da_old);
> +		xfs_mod_delalloc(bma->ip, 0, (int64_t)da_new - da_old);
>  
>  	if (bma->cur) {
>  		da_new += bma->cur->bc_bmap.allocated;
> @@ -2694,7 +2694,7 @@ xfs_bmap_add_extent_hole_delay(
>  		/*
>  		 * Nothing to do for disk quota accounting here.
>  		 */
> -		xfs_mod_delalloc(ip->i_mount, (int64_t)newlen - oldlen);
> +		xfs_mod_delalloc(ip, 0, (int64_t)newlen - oldlen);
>  	}
>  }
>  
> @@ -3371,7 +3371,7 @@ xfs_bmap_alloc_account(
>  		 * yet.
>  		 */
>  		if (ap->wasdel) {
> -			xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length);
> +			xfs_mod_delalloc(ap->ip, -(int64_t)ap->length, 0);
>  			return;
>  		}
>  
> @@ -3395,7 +3395,7 @@ xfs_bmap_alloc_account(
>  	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
>  	if (ap->wasdel) {
>  		ap->ip->i_delayed_blks -= ap->length;
> -		xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length);
> +		xfs_mod_delalloc(ap->ip, -(int64_t)ap->length, 0);
>  		fld = isrt ? XFS_TRANS_DQ_DELRTBCOUNT : XFS_TRANS_DQ_DELBCOUNT;
>  	} else {
>  		fld = isrt ? XFS_TRANS_DQ_RTBCOUNT : XFS_TRANS_DQ_BCOUNT;
> @@ -4124,7 +4124,7 @@ xfs_bmapi_reserve_delalloc(
>  		goto out_unreserve_frextents;
>  
>  	ip->i_delayed_blks += alen;
> -	xfs_mod_delalloc(ip->i_mount, alen + indlen);
> +	xfs_mod_delalloc(ip, alen, indlen);
>  
>  	got->br_startoff = aoff;
>  	got->br_startblock = nullstartblock(indlen);
> @@ -5022,7 +5022,7 @@ xfs_bmap_del_extent_delay(
>  		fdblocks += del->br_blockcount;
>  
>  	xfs_add_fdblocks(mp, fdblocks);
> -	xfs_mod_delalloc(mp, -(int64_t)fdblocks);
> +	xfs_mod_delalloc(ip, -(int64_t)del->br_blockcount, -da_diff);
>  	return error;
>  }
>  
> diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
> index c910b5f1306913..d18a3cac30479d 100644
> --- a/fs/xfs/scrub/fscounters.c
> +++ b/fs/xfs/scrub/fscounters.c
> @@ -412,6 +412,7 @@ xchk_fscount_count_frextents(
>  	int			error;
>  
>  	fsc->frextents = 0;
> +	fsc->frextents_delayed = 0;
>  	if (!xfs_has_realtime(mp))
>  		return 0;
>  
> @@ -423,6 +424,8 @@ xchk_fscount_count_frextents(
>  		goto out_unlock;
>  	}
>  
> +	fsc->frextents_delayed = percpu_counter_sum(&mp->m_delalloc_rtextents);
> +
>  out_unlock:
>  	xfs_rtbitmap_unlock_shared(sc->mp, XFS_RBMLOCK_BITMAP);
>  	return error;
> @@ -434,6 +437,7 @@ xchk_fscount_count_frextents(
>  	struct xchk_fscounters	*fsc)
>  {
>  	fsc->frextents = 0;
> +	fsc->frextents_delayed = 0;
>  	return 0;
>  }
>  #endif /* CONFIG_XFS_RT */
> @@ -593,7 +597,7 @@ xchk_fscounters(
>  	}
>  
>  	if (!xchk_fscount_within_range(sc, frextents, &mp->m_frextents,
> -			fsc->frextents)) {
> +			fsc->frextents - fsc->frextents_delayed)) {
>  		if (fsc->frozen)
>  			xchk_set_corrupt(sc);
>  		else
> diff --git a/fs/xfs/scrub/fscounters.h b/fs/xfs/scrub/fscounters.h
> index 461a13d25f4b38..bcf56e1c36f91c 100644
> --- a/fs/xfs/scrub/fscounters.h
> +++ b/fs/xfs/scrub/fscounters.h
> @@ -12,6 +12,7 @@ struct xchk_fscounters {
>  	uint64_t		ifree;
>  	uint64_t		fdblocks;
>  	uint64_t		frextents;
> +	uint64_t		frextents_delayed;
>  	unsigned long long	icount_min;
>  	unsigned long long	icount_max;
>  	bool			frozen;
> diff --git a/fs/xfs/scrub/fscounters_repair.c b/fs/xfs/scrub/fscounters_repair.c
> index 94cdb852bee462..210ebbcf3e1520 100644
> --- a/fs/xfs/scrub/fscounters_repair.c
> +++ b/fs/xfs/scrub/fscounters_repair.c
> @@ -65,7 +65,8 @@ xrep_fscounters(
>  	percpu_counter_set(&mp->m_icount, fsc->icount);
>  	percpu_counter_set(&mp->m_ifree, fsc->ifree);
>  	percpu_counter_set(&mp->m_fdblocks, fsc->fdblocks);
> -	percpu_counter_set(&mp->m_frextents, fsc->frextents);
> +	percpu_counter_set(&mp->m_frextents,
> +		fsc->frextents - fsc->frextents_delayed);
>  	mp->m_sb.sb_frextents = fsc->frextents;
>  
>  	return 0;
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index e7dfa259d0e6a2..2feb1a81bb87b5 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -34,6 +34,7 @@
>  #include "xfs_health.h"
>  #include "xfs_trace.h"
>  #include "xfs_ag.h"
> +#include "xfs_rtbitmap.h"
>  #include "scrub/stats.h"
>  
>  static DEFINE_MUTEX(xfs_uuid_table_mutex);
> @@ -1402,9 +1403,20 @@ xfs_clear_incompat_log_features(
>  #define XFS_DELALLOC_BATCH	(4096)
>  void
>  xfs_mod_delalloc(
> -	struct xfs_mount	*mp,
> -	int64_t			delta)
> +	struct xfs_inode	*ip,
> +	int64_t			data_delta,
> +	int64_t			ind_delta)
>  {
> -	percpu_counter_add_batch(&mp->m_delalloc_blks, delta,
> +	struct xfs_mount	*mp = ip->i_mount;
> +
> +	if (XFS_IS_REALTIME_INODE(ip)) {
> +		percpu_counter_add_batch(&mp->m_delalloc_rtextents,
> +				xfs_rtb_to_rtx(mp, data_delta),
> +				XFS_DELALLOC_BATCH);
> +		if (!ind_delta)
> +			return;
> +		data_delta = 0;
> +	}
> +	percpu_counter_add_batch(&mp->m_delalloc_blks, data_delta + ind_delta,
>  			XFS_DELALLOC_BATCH);
>  }
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index d941437a0c7369..0e8d7779c0a561 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -195,6 +195,12 @@ typedef struct xfs_mount {
>  	 * extents or anything related to the rt device.
>  	 */
>  	struct percpu_counter	m_delalloc_blks;
> +
> +	/*
> +	 * RT version of the above.
> +	 */
> +	struct percpu_counter	m_delalloc_rtextents;
> +
>  	/*
>  	 * Global count of allocation btree blocks in use across all AGs. Only
>  	 * used when perag reservation is enabled. Helps prevent block
> @@ -577,6 +583,7 @@ struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
>  void xfs_force_summary_recalc(struct xfs_mount *mp);
>  int xfs_add_incompat_log_feature(struct xfs_mount *mp, uint32_t feature);
>  bool xfs_clear_incompat_log_features(struct xfs_mount *mp);
> -void xfs_mod_delalloc(struct xfs_mount *mp, int64_t delta);
> +void xfs_mod_delalloc(struct xfs_inode *ip, int64_t data_delta,
> +		int64_t ind_delta);
>  
>  #endif	/* __XFS_MOUNT_H__ */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 784ff7d1c986ba..39a5e664870485 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1051,12 +1051,18 @@ xfs_init_percpu_counters(
>  	if (error)
>  		goto free_fdblocks;
>  
> -	error = percpu_counter_init(&mp->m_frextents, 0, GFP_KERNEL);
> +	error = percpu_counter_init(&mp->m_delalloc_rtextents, 0, GFP_KERNEL);
>  	if (error)
>  		goto free_delalloc;
>  
> +	error = percpu_counter_init(&mp->m_frextents, 0, GFP_KERNEL);
> +	if (error)
> +		goto free_delalloc_rt;
> +
>  	return 0;
>  
> +free_delalloc_rt:
> +	percpu_counter_destroy(&mp->m_delalloc_rtextents);
>  free_delalloc:
>  	percpu_counter_destroy(&mp->m_delalloc_blks);
>  free_fdblocks:
> @@ -1085,6 +1091,9 @@ xfs_destroy_percpu_counters(
>  	percpu_counter_destroy(&mp->m_icount);
>  	percpu_counter_destroy(&mp->m_ifree);
>  	percpu_counter_destroy(&mp->m_fdblocks);
> +	ASSERT(xfs_is_shutdown(mp) ||
> +	       percpu_counter_sum(&mp->m_delalloc_rtextents) == 0);
> +	percpu_counter_destroy(&mp->m_delalloc_rtextents);
>  	ASSERT(xfs_is_shutdown(mp) ||
>  	       percpu_counter_sum(&mp->m_delalloc_blks) == 0);
>  	percpu_counter_destroy(&mp->m_delalloc_blks);
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/13] xfs: free RT extents after updating the bmap btree
  2024-03-27 14:55   ` Darrick J. Wong
@ 2024-03-27 17:03     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 17:03 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 07:55:40AM -0700, Darrick J. Wong wrote:
> ...and the realtime rmap/reflink patchsets will want to reuse the data
> device's ordering (bmap -> rmap -> refcount -> efi) for the rt volume.

Yes.

> 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> I'm ok with moving this now since I'm mostly going to pave over it later
> anyway :)

Actually the rt extent free item and refcount work will fit in very
nicely with this.  Basically the rt branch added here needs another
condition so that's it's skipped when using extent free items for
RT.  And the extent free item branch needs to pass the RT flag
as already done in your series of course.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc
  2024-03-27 15:20   ` Darrick J. Wong
@ 2024-03-27 17:05     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 17:05 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 08:20:22AM -0700, Darrick J. Wong wrote:
> Hmm this looks nearly the same as V4, right?  I couldn't spot the
> differences, and I used to follow Hocus Focus every Sunday as a kid. :)

Yes, no changes here.  Sorry for missing the rvb tag.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi
  2024-03-27 15:07   ` Darrick J. Wong
@ 2024-03-27 17:06     ` Christoph Hellwig
  2024-03-27 18:06       ` Darrick J. Wong
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 17:06 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 08:07:55AM -0700, Darrick J. Wong wrote:
> How does it happen that xfs_rtfree_blocks gets called more than once in
> the same transaction?  Is that simply the effect of xfs_bunmapi_range
> and xfs_unmap_exten calling __xfs_bunmapi with
> nextents == XFS_ITRUNC_MAX_EXTENTS==2?

Yes.

> What if we simply didn't unmap multiple extents per bunmapi call for
> realtime files?  Would that eliminate the need for
> XFS_TRANS_RTBITMAP_LOCKED?

Probably.  Not that I really want to rock that boat now when we'll
also have the extent free item / defer ops based path soon.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi
  2024-03-27 17:06     ` Christoph Hellwig
@ 2024-03-27 18:06       ` Darrick J. Wong
  2024-03-27 18:12         ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Darrick J. Wong @ 2024-03-27 18:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 06:06:32PM +0100, Christoph Hellwig wrote:
> On Wed, Mar 27, 2024 at 08:07:55AM -0700, Darrick J. Wong wrote:
> > How does it happen that xfs_rtfree_blocks gets called more than once in
> > the same transaction?  Is that simply the effect of xfs_bunmapi_range
> > and xfs_unmap_exten calling __xfs_bunmapi with
> > nextents == XFS_ITRUNC_MAX_EXTENTS==2?
> 
> Yes.
> 
> > What if we simply didn't unmap multiple extents per bunmapi call for
> > realtime files?  Would that eliminate the need for
> > XFS_TRANS_RTBITMAP_LOCKED?
> 
> Probably.  Not that I really want to rock that boat now when we'll
> also have the extent free item / defer ops based path soon.

<nod> I think once we get to the rtgroups patchset then all we have to
do in __xfs_bunmapi is:

	if (XFS_IS_REALTIME_INODE(ip) && whichfork == XFS_DATA_FORK &&
	    !xfs_has_rtgroups(mp))
		nexts = 1;

and then the XFS_TRANS_RTBITMAP_LOCKED flag can go away?

If so, then
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi
  2024-03-27 18:06       ` Darrick J. Wong
@ 2024-03-27 18:12         ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-27 18:12 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, linux-xfs

On Wed, Mar 27, 2024 at 11:06:00AM -0700, Darrick J. Wong wrote:
> <nod> I think once we get to the rtgroups patchset then all we have to
> do in __xfs_bunmapi is:
> 
> 	if (XFS_IS_REALTIME_INODE(ip) && whichfork == XFS_DATA_FORK &&
> 	    !xfs_has_rtgroups(mp))
> 		nexts = 1;
> 
> and then the XFS_TRANS_RTBITMAP_LOCKED flag can go away?

Probably.  We could also try to do it now, but that's just yet
another variable changing..


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 01/13] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions
  2024-03-27 11:03 ` [PATCH 01/13] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions Christoph Hellwig
@ 2024-03-28  3:08   ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  3:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:06PM +0100, Christoph Hellwig wrote:
> Commit bb7b1c9c5dd3 ("xfs: tag transactions that contain intent done
> items") switched the XFS_TRANS_ definitions to be bit based, and using
> comments above the definitions.  As XFS_TRANS_LOWMODE was last and has
> a big fat comment it was missed.  Switch it to the same style.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/13] xfs: refactor realtime inode locking
  2024-03-27 11:03 ` [PATCH 02/13] xfs: refactor realtime inode locking Christoph Hellwig
@ 2024-03-28  3:15   ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  3:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:07PM +0100, Christoph Hellwig wrote:
> From: "Darrick J. Wong" <djwong@kernel.org>
> 
> Create helper functions to deal with locking realtime metadata inodes.
> This enables us to maintain correct locking order once we start adding
> the realtime rmap and refcount btree inodes.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Looks ok.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/13] xfs: free RT extents after updating the bmap btree
  2024-03-27 11:03 ` [PATCH 03/13] xfs: free RT extents after updating the bmap btree Christoph Hellwig
  2024-03-27 14:55   ` Darrick J. Wong
@ 2024-03-28  4:13   ` Dave Chinner
  2024-03-29  4:14     ` Christoph Hellwig
  1 sibling, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:08PM +0100, Christoph Hellwig wrote:
> Currently xfs_bmap_del_extent_real frees RT extents before updating
> the bmap btree, while it frees regular blocks after performing the bmap
> btree update.  While this behavior goes back to the original commit,
> I can't find any good reason for handling RT extent vs regular block
> freeing differently. 

It's to do with free space btree manipulations and ENOSPC.  The
truncate for data device extents was originally a two-phase
operation. First it removed the bmapbt record, but because this can
free BMBT extents, it can use up all the free space tree reservation
space. So the transaction gets rolled to commit the BMBT change and
the xfs_bmap_finish() call that frees the data extent runs with a
new transaction reservation that allows different free space btrees
to be logged without overrun.

However, on crash, this could lose the free space because there was
nothing to tell recovery about the extents removed from the BMBT,
hence EFIs were introduced. They tie the extent free operation to the
bmapbt record removal commit for recovery of the second phase of the
extent removal process.

Then RT extents came along. RT extent freeing does not require a
free space btree reservation because the free space metadata is
static and transaction size is bound. Hence we don't need to care if
the BMBT record removal modifies the per-ag free space trees and we
don't need a two-phase extent remove transaction. The only thing we
have to care about is not losing space on crash.

Hence instead of recording the extent for freeing in the bmap list
for xfs_bmap_finish() to process in a new transaction, it simply
freed the rtextent directly. So the original code (from 1994) simply
replaced the "free AG extent later" queueing with a direct free:

@@ -920,7 +937,10 @@ xfs_bmap_del_extent(
               (got.br_startblock == NULLSTARTBLOCK));
        delay = got.br_startblock == NULLSTARTBLOCK;
        if (!delay) {
-               xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist);
+               if (ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+                       xfs_rtfree_extent(ip->i_mount, ip->i_transp, del->br_startblock, del->br_blockcount);
+               else
+                       xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist);
                del_endblock = del->br_startblock + del->br_blockcount;
                if (cur)
                        xfs_bmbt_lookup_eq(cur, got.br_startoff, got.br_startblock, got.br_blockcount);

This code was originally at the start of xfs_dmap_del_extent(), but
the xfs_bmap_add_free() got moved to the end of the function via the
"do_fx" flag (the current code logic) in 1997 because:

commit c4fac74eaa580edcc6b65e977151d73f2b6e9aa5
Author: Doug Doucette <doucette@engr.sgi.com>
Date:   Wed Jul 9 17:34:17 1997 +0000

    Fix xfs_bmap_del_extent, so it can back out of the case where an
    extent is being split, and the space allocation fails.

There was a shutdown occurring because of a case where splitting the
extent record failed because the BMBT split and the filesystem
didn't have enough space for the split to be done. (FWIW, I'm not
sure this can happen anymore.)

The commit backed out the BMBT change on ENOSPC error, and in doing
so I think this actually breaks RT free space tracking. However, it
then returns an ENOSPC error, and we have a dirty transaction in the
RT case so this will shut down the filesysetm when the transaction
is cancelled. Hence the corrupted "bmbt now points at freed rt dev
space" condition never make it to disk, but it's still the wrong way
to handle the issue.

IOWs, this proposed change fixes that "shutdown at ENOSPC on rt
devices" situation that was introduced by the above commit back in
1997.

Nice!

> We use the same transaction, and unless rmaps
> or reflink are enabled (which currently aren't support for RT inodes)
> there are no transactions rolls or deferred ops that can rely on this
> ordering.

Yup, I see no problem there.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi
  2024-03-27 11:03 ` [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi Christoph Hellwig
  2024-03-27 15:07   ` Darrick J. Wong
@ 2024-03-28  4:15   ` Dave Chinner
  1 sibling, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:09PM +0100, Christoph Hellwig wrote:
> __xfs_bunmapi is a bit of an odd place to lock the rtbitmap and rtsummary
> inodes given that it is very high level code.  While this only looks ugly
> right now, it will become a problem when supporting delayed allocations
> for RT inodes as __xfs_bunmapi might end up deleting only delalloc extents
> and thus never unlock the rt inodes.
> 
> Move the locking into xfs_bmap_del_extent_real just before the call to
> xfs_rtfree_blocks instead and use a new flag in the transaction to ensure
> that the locking happens only once.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Not a great fan of using transaction flags for object state, but
I don't have a better way of doing this.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 05/13] xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positive
  2024-03-27 11:03 ` [PATCH 05/13] xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positive Christoph Hellwig
@ 2024-03-28  4:16   ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:10PM +0100, Christoph Hellwig wrote:
> And to make that more clear, rearrange the code a bit and add asserts
> and a comment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Yup, that helps a lot.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 06/13] xfs: split xfs_mod_freecounter
  2024-03-27 11:03 ` [PATCH 06/13] xfs: split xfs_mod_freecounter Christoph Hellwig
  2024-03-27 15:12   ` Darrick J. Wong
@ 2024-03-28  4:18   ` Dave Chinner
  1 sibling, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:11PM +0100, Christoph Hellwig wrote:
> xfs_mod_freecounter has two entirely separate code paths for adding or
> subtracting from the free counters.  Only the subtract case looks at the
> rsvd flag and can return an error.
> 
> Split xfs_mod_freecounter into separate helpers for subtracting or
> adding the freecounter, and remove all the impossible to reach error
> handling for the addition case.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

A nice improvement.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 07/13] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc
  2024-03-27 11:03 ` [PATCH 07/13] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc Christoph Hellwig
@ 2024-03-28  4:20   ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:12PM +0100, Christoph Hellwig wrote:
> Allocate data blocks for RT inodes using xfs_dec_frextents.  While at
> it optimize the data device case by doing only a single xfs_dec_fdblocks
> call for the extent itself and the indirect blocks.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Simple enough.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 08/13] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay
  2024-03-27 11:03 ` [PATCH 08/13] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay Christoph Hellwig
@ 2024-03-28  4:22   ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:13PM +0100, Christoph Hellwig wrote:
> The code to account fdblocks and frextents in xfs_bmap_del_extent_delay
> is a bit weird in that it accounts frextents before the iext tree
> manipulations and fdblocks after it.  Given that the iext tree
> manipulations cannot fail currently that's not really a problem, but
> still odd.  Move the frextent manipulation to the end, and use a
> fdblocks variable to account of the unconditional indirect blocks and
> the data blocks only freed for !RT.  This prepares for following
> updates in the area and already makes the code more readable.
> 
> Also remove the !isrt assert given that this code clearly handles
> rt extents correctly, and we'll soon reinstate delalloc support for
> RT inodes.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc
  2024-03-27 11:03 ` [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc Christoph Hellwig
  2024-03-27 15:20   ` Darrick J. Wong
@ 2024-03-28  4:27   ` Dave Chinner
  2024-03-28  4:34     ` Christoph Hellwig
  1 sibling, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:14PM +0100, Christoph Hellwig wrote:
> To prepare for re-enabling delalloc on RT devices, track the data blocks
> (which use the RT device when the inode sits on it) and the indirect
> blocks (which don't) separately to xfs_mod_delalloc, and add a new
> percpu counter to also track the RT delalloc blocks.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
.....
> diff --git a/fs/xfs/scrub/fscounters_repair.c b/fs/xfs/scrub/fscounters_repair.c
> index 94cdb852bee462..210ebbcf3e1520 100644
> --- a/fs/xfs/scrub/fscounters_repair.c
> +++ b/fs/xfs/scrub/fscounters_repair.c
> @@ -65,7 +65,8 @@ xrep_fscounters(
>  	percpu_counter_set(&mp->m_icount, fsc->icount);
>  	percpu_counter_set(&mp->m_ifree, fsc->ifree);
>  	percpu_counter_set(&mp->m_fdblocks, fsc->fdblocks);
> -	percpu_counter_set(&mp->m_frextents, fsc->frextents);
> +	percpu_counter_set(&mp->m_frextents,
> +		fsc->frextents - fsc->frextents_delayed);
>  	mp->m_sb.sb_frextents = fsc->frextents;

Why do we set mp->m_frextents differently to mp->m_fdblocks?
Surely if we have to care about delalloc blocks here, we have to
process both data device and rt device delalloc block accounting the
same way, right?

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/13] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations
  2024-03-27 11:03 ` [PATCH 10/13] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations Christoph Hellwig
@ 2024-03-28  4:32   ` Dave Chinner
  2024-03-28  4:34     ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:15PM +0100, Christoph Hellwig wrote:
> Add a check for files on the RT subvolume and use m_frextents instead
> of m_fdblocks to adjust the preallocation size.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

One minor nit:

> +	if (unlikely(XFS_IS_REALTIME_INODE(ip)))
> +		freesp = xfs_rtx_to_rtb(mp, xfs_iomap_freesp(&mp->m_frextents,
> +				mp->m_low_rtexts, &shift));

It took me extra brain cells to realise that this had nested
function calls because of the way the long line is split. Can we
change it to look like this:

		freesp = xfs_rtx_to_rtb(mp,
				xfs_iomap_freesp(&mp->m_frextents,
						mp->m_low_rtexts, &shift));

Just to make it a little easier to spot the nested function and
which parameters belong to which function?

Regardless, the code looks correct, so:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc
  2024-03-28  4:27   ` Dave Chinner
@ 2024-03-28  4:34     ` Christoph Hellwig
  2024-03-28  4:42       ` Dave Chinner
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-28  4:34 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, linux-xfs

On Thu, Mar 28, 2024 at 03:27:36PM +1100, Dave Chinner wrote:
> >  	percpu_counter_set(&mp->m_fdblocks, fsc->fdblocks);
> > -	percpu_counter_set(&mp->m_frextents, fsc->frextents);
> > +	percpu_counter_set(&mp->m_frextents,
> > +		fsc->frextents - fsc->frextents_delayed);
> >  	mp->m_sb.sb_frextents = fsc->frextents;
> 
> Why do we set mp->m_frextents differently to mp->m_fdblocks?
> Surely if we have to care about delalloc blocks here, we have to
> process both data device and rt device delalloc block accounting the
> same way, right?

Unfortunately there are different.  For data device blocks we use the
lazy sb counters and thus never updated the sb version for any file
system new enough to support scrub.  For RT extents lazy sb counters
only appear half way down Darrick's giant stack and aren't even
upstream yet.

> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
---end quoted text---

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 10/13] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations
  2024-03-28  4:32   ` Dave Chinner
@ 2024-03-28  4:34     ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-28  4:34 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, linux-xfs

On Thu, Mar 28, 2024 at 03:32:50PM +1100, Dave Chinner wrote:
> It took me extra brain cells to realise that this had nested
> function calls because of the way the long line is split. Can we
> change it to look like this:
> 
> 		freesp = xfs_rtx_to_rtb(mp,
> 				xfs_iomap_freesp(&mp->m_frextents,
> 						mp->m_low_rtexts, &shift));
> 
> Just to make it a little easier to spot the nested function and
> which parameters belong to which function?

Sure.  Or maybe even add a local variable.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/13] xfs: rework splitting of indirect block reservations
  2024-03-27 11:03 ` [PATCH 11/13] xfs: rework splitting of indirect block reservations Christoph Hellwig
  2024-03-27 15:14   ` Darrick J. Wong
@ 2024-03-28  4:35   ` Dave Chinner
  1 sibling, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:35 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:16PM +0100, Christoph Hellwig wrote:
> Move the check if we have enough indirect blocks and the stealing of
> the deleted extent blocks out of xfs_bmap_split_indlen and into the
> caller to prepare for handling delayed allocation of RT extents that
> can't easily be stolen.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 12/13] xfs: stop the steal (of data blocks for RT indirect blocks)
  2024-03-27 11:03 ` [PATCH 12/13] xfs: stop the steal (of data blocks for RT indirect blocks) Christoph Hellwig
  2024-03-27 15:18   ` Darrick J. Wong
@ 2024-03-28  4:36   ` Dave Chinner
  1 sibling, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:17PM +0100, Christoph Hellwig wrote:
> When xfs_bmap_del_extent_delay has to split an indirect block it tries
> to steal blocks from the the part that gets unmapped to increase the
> indirect block reservation that now needs to cover for two extents
> instead of one.
> 
> This works perfectly fine on the data device, where the data and
> indirect blocks come from the same pool.  It has no chance of working
> when the inode sits on the RT device.  To support re-enabling delalloc
> for inodes on the RT device, make this behavior conditional on not
> beeing for rt extents.
> 
> Note that split of delalloc extents should only happen on writeback
> failure, as for other kinds of hole punching we first write back all
> data and thus convert the delalloc reservations covering the hole to
> a real allocation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 13/13] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1)
  2024-03-27 11:03 ` [PATCH 13/13] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1) Christoph Hellwig
@ 2024-03-28  4:39   ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Wed, Mar 27, 2024 at 12:03:18PM +0100, Christoph Hellwig wrote:
> Commit aff3a9edb708 ("xfs: Use preallocation for inodes with extsz
> hints") disabled delayed allocation for all inodes with extent size
> hints due a data exposure problem.  It turns out we fixed this data
> exposure problem since by always creating unwritten extents for
> delalloc conversions due to more data exposure problems, but the
> writeback path doesn't actually support extent size hints when
> converting delalloc these days, which probably isn't a problem given
> that people using the hints know what they get.
>
> However due to the way how xfs_get_extsz_hint is implemented, it
> always claims an extent size hint for RT inodes even if the RT
> extent size is a single FSB.  Due to that the above commit effectively
> disabled delalloc support for RT inodes.
> 
> Switch xfs_get_extsz_hint to return 0 for this case and work around
> that in a few places to reinstate delalloc support for RT inodes on
> file systems with an sb_rextsize of 1.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Looks OK.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc
  2024-03-28  4:34     ` Christoph Hellwig
@ 2024-03-28  4:42       ` Dave Chinner
  2024-03-28  8:25         ` Christoph Hellwig
  0 siblings, 1 reply; 46+ messages in thread
From: Dave Chinner @ 2024-03-28  4:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Thu, Mar 28, 2024 at 05:34:11AM +0100, Christoph Hellwig wrote:
> On Thu, Mar 28, 2024 at 03:27:36PM +1100, Dave Chinner wrote:
> > >  	percpu_counter_set(&mp->m_fdblocks, fsc->fdblocks);
> > > -	percpu_counter_set(&mp->m_frextents, fsc->frextents);
> > > +	percpu_counter_set(&mp->m_frextents,
> > > +		fsc->frextents - fsc->frextents_delayed);
> > >  	mp->m_sb.sb_frextents = fsc->frextents;
> > 
> > Why do we set mp->m_frextents differently to mp->m_fdblocks?
> > Surely if we have to care about delalloc blocks here, we have to
> > process both data device and rt device delalloc block accounting the
> > same way, right?
> 
> Unfortunately there are different.  For data device blocks we use the
> lazy sb counters and thus never updated the sb version for any file
> system new enough to support scrub.  For RT extents lazy sb counters
> only appear half way down Darrick's giant stack and aren't even
> upstream yet.

Can you add a comment to either the code or commit message to that
effect? Otherwise I'm going to forget about that and not be able to
discover it from looking at the code and/or commit messages...

With such a comment, you can also add

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc
  2024-03-28  4:42       ` Dave Chinner
@ 2024-03-28  8:25         ` Christoph Hellwig
  2024-03-28 21:25           ` Dave Chinner
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-28  8:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, linux-xfs

On Thu, Mar 28, 2024 at 03:42:19PM +1100, Dave Chinner wrote:
> > Unfortunately there are different.  For data device blocks we use the
> > lazy sb counters and thus never updated the sb version for any file
> > system new enough to support scrub.  For RT extents lazy sb counters
> > only appear half way down Darrick's giant stack and aren't even
> > upstream yet.
> 
> Can you add a comment to either the code or commit message to that
> effect? Otherwise I'm going to forget about that and not be able to
> discover it from looking at the code and/or commit messages...

Does this look good?

http://git.infradead.org/?p=users/hch/xfs.git;a=commitdiff;h=91bbe4c2d5518a1f991d7f19aad350d636e42d32


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc
  2024-03-28  8:25         ` Christoph Hellwig
@ 2024-03-28 21:25           ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-28 21:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Thu, Mar 28, 2024 at 09:25:16AM +0100, Christoph Hellwig wrote:
> On Thu, Mar 28, 2024 at 03:42:19PM +1100, Dave Chinner wrote:
> > > Unfortunately there are different.  For data device blocks we use the
> > > lazy sb counters and thus never updated the sb version for any file
> > > system new enough to support scrub.  For RT extents lazy sb counters
> > > only appear half way down Darrick's giant stack and aren't even
> > > upstream yet.
> > 
> > Can you add a comment to either the code or commit message to that
> > effect? Otherwise I'm going to forget about that and not be able to
> > discover it from looking at the code and/or commit messages...
> 
> Does this look good?
> 
> http://git.infradead.org/?p=users/hch/xfs.git;a=commitdiff;h=91bbe4c2d5518a1f991d7f19aad350d636e42d32

Spot on. Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/13] xfs: free RT extents after updating the bmap btree
  2024-03-28  4:13   ` Dave Chinner
@ 2024-03-29  4:14     ` Christoph Hellwig
  2024-03-30 21:55       ` Dave Chinner
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Hellwig @ 2024-03-29  4:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Chandan Babu R, Darrick J. Wong, linux-xfs

FYI, I've picked up your comments with minor edits for the commit
message, let me know if that is ok with you:

http://git.infradead.org/?p=users/hch/xfs.git;a=commitdiff;h=445d786ae6e50f6631118e0fa378ad0cd72076a9


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/13] xfs: free RT extents after updating the bmap btree
  2024-03-29  4:14     ` Christoph Hellwig
@ 2024-03-30 21:55       ` Dave Chinner
  0 siblings, 0 replies; 46+ messages in thread
From: Dave Chinner @ 2024-03-30 21:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chandan Babu R, Darrick J. Wong, linux-xfs

On Fri, Mar 29, 2024 at 05:14:26AM +0100, Christoph Hellwig wrote:
> FYI, I've picked up your comments with minor edits for the commit
> message, let me know if that is ok with you:
> 
> http://git.infradead.org/?p=users/hch/xfs.git;a=commitdiff;h=445d786ae6e50f6631118e0fa378ad0cd72076a9

Yes, that's fine. It certainly can't hurt having some of the
historic context of the code in new commits as it avoids the need
for archive spelunking to make sense of the change...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 03/13] xfs: free RT extents after updating the bmap btree
  2024-04-22 11:20 bring back RT delalloc support v6 Christoph Hellwig
@ 2024-04-22 11:20 ` Christoph Hellwig
  0 siblings, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2024-04-22 11:20 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: Darrick J. Wong, Dave Chinner, linux-xfs, Dave Chinner

Currently xfs_bmap_del_extent_real frees RT extents before updating
the bmap btree, while it frees regular blocks after performing the bmap
btree update for convoluted historic reasons.  Switch to free the RT
blocks in the same place as the regular data blocks instead to simply
the code and fix a very theoretical bug.

A short history of this code researched by Dave Chiner below:

The truncate for data device extents was originally a two-phase
operation. First it removed the bmapbt record, but because this can
free BMBT extents, it can use up all the free space tree reservation
space. So the transaction gets rolled to commit the BMBT change and
the xfs_bmap_finish() call that frees the data extent runs with a
new transaction reservation that allows different free space btrees
to be logged without overrun.

However, on crash, this could lose the free space because there was
nothing to tell recovery about the extents removed from the BMBT,
hence EFIs were introduced. They tie the extent free operation to the
bmapbt record removal commit for recovery of the second phase of the
extent removal process.

Then RT extents came along. RT extent freeing does not require a
free space btree reservation because the free space metadata is
static and transaction size is bound. Hence we don't need to care if
the BMBT record removal modifies the per-ag free space trees and we
don't need a two-phase extent remove transaction. The only thing we
have to care about is not losing space on crash.

Hence instead of recording the extent for freeing in the bmap list
for xfs_bmap_finish() to process in a new transaction, it simply
freed the rtextent directly. So the original code (from 1994) simply
replaced the "free AG extent later" queueing with a direct free.

This code was originally at the start of xfs_dmap_del_extent(), but
the xfs_bmap_add_free() got moved to the end of the function via the
"do_fx" flag (the current code logic) in 1997 (commit c4fac74eaa58
in the historic xfs-import tree) because there was a shutdown occurring
because of a case where splitting the extent record failed because the
BMBT split and the filesystem didn't have enough space for the split to
be done. (FWIW, I'm not sure this can happen anymore.)

The commit backed out the BMBT change on ENOSPC error, and in doing
so I think this actually breaks RT free space tracking. However, it
then returns an ENOSPC error, and we have a dirty transaction in the
RT case so this will shut down the filesysetm when the transaction
is cancelled. Hence the corrupted "bmbt now points at freed rt dev
space" condition never make it to disk, but it's still the wrong way
to handle the issue.

IOWs, this proposed change fixes that "shutdown at ENOSPC on rt
devices" situation that was introduced by the above commit back in
1997.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 3e56173a8fe4db..f529aa40710924 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5109,8 +5109,7 @@ xfs_bmap_del_extent_real(
 {
 	xfs_fsblock_t		del_endblock=0;	/* first block past del */
 	xfs_fileoff_t		del_endoff;	/* first offset past del */
-	int			do_fx;	/* free extent at end of routine */
-	int			error;	/* error return value */
+	int			error = 0;	/* error return value */
 	struct xfs_bmbt_irec	got;	/* current extent entry */
 	xfs_fileoff_t		got_endoff;	/* first offset past got */
 	int			i;	/* temp state */
@@ -5153,20 +5152,10 @@ xfs_bmap_del_extent_real(
 		return -ENOSPC;
 
 	*logflagsp = XFS_ILOG_CORE;
-	if (xfs_ifork_is_realtime(ip, whichfork)) {
-		if (!(bflags & XFS_BMAPI_REMAP)) {
-			error = xfs_rtfree_blocks(tp, del->br_startblock,
-					del->br_blockcount);
-			if (error)
-				return error;
-		}
-
-		do_fx = 0;
+	if (xfs_ifork_is_realtime(ip, whichfork))
 		qfield = XFS_TRANS_DQ_RTBCOUNT;
-	} else {
-		do_fx = 1;
+	else
 		qfield = XFS_TRANS_DQ_BCOUNT;
-	}
 	nblks = del->br_blockcount;
 
 	del_endblock = del->br_startblock + del->br_blockcount;
@@ -5314,18 +5303,21 @@ xfs_bmap_del_extent_real(
 	/*
 	 * If we need to, add to list of extents to delete.
 	 */
-	if (do_fx && !(bflags & XFS_BMAPI_REMAP)) {
+	if (!(bflags & XFS_BMAPI_REMAP)) {
 		if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) {
 			xfs_refcount_decrease_extent(tp, del);
+		} else if (xfs_ifork_is_realtime(ip, whichfork)) {
+			error = xfs_rtfree_blocks(tp, del->br_startblock,
+					del->br_blockcount);
 		} else {
 			error = xfs_free_extent_later(tp, del->br_startblock,
 					del->br_blockcount, NULL,
 					XFS_AG_RESV_NONE,
 					((bflags & XFS_BMAPI_NODISCARD) ||
 					del->br_state == XFS_EXT_UNWRITTEN));
-			if (error)
-				return error;
 		}
+		if (error)
+			return error;
 	}
 
 	/*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2024-04-22 11:20 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-27 11:03 bring back RT delalloc support v5 Christoph Hellwig
2024-03-27 11:03 ` [PATCH 01/13] xfs: make XFS_TRANS_LOWMODE match the other XFS_TRANS_ definitions Christoph Hellwig
2024-03-28  3:08   ` Dave Chinner
2024-03-27 11:03 ` [PATCH 02/13] xfs: refactor realtime inode locking Christoph Hellwig
2024-03-28  3:15   ` Dave Chinner
2024-03-27 11:03 ` [PATCH 03/13] xfs: free RT extents after updating the bmap btree Christoph Hellwig
2024-03-27 14:55   ` Darrick J. Wong
2024-03-27 17:03     ` Christoph Hellwig
2024-03-28  4:13   ` Dave Chinner
2024-03-29  4:14     ` Christoph Hellwig
2024-03-30 21:55       ` Dave Chinner
2024-03-27 11:03 ` [PATCH 04/13] xfs: move RT inode locking out of __xfs_bunmapi Christoph Hellwig
2024-03-27 15:07   ` Darrick J. Wong
2024-03-27 17:06     ` Christoph Hellwig
2024-03-27 18:06       ` Darrick J. Wong
2024-03-27 18:12         ` Christoph Hellwig
2024-03-28  4:15   ` Dave Chinner
2024-03-27 11:03 ` [PATCH 05/13] xfs: block deltas in xfs_trans_unreserve_and_mod_sb must be positive Christoph Hellwig
2024-03-28  4:16   ` Dave Chinner
2024-03-27 11:03 ` [PATCH 06/13] xfs: split xfs_mod_freecounter Christoph Hellwig
2024-03-27 15:12   ` Darrick J. Wong
2024-03-28  4:18   ` Dave Chinner
2024-03-27 11:03 ` [PATCH 07/13] xfs: reinstate RT support in xfs_bmapi_reserve_delalloc Christoph Hellwig
2024-03-28  4:20   ` Dave Chinner
2024-03-27 11:03 ` [PATCH 08/13] xfs: cleanup fdblock/frextent accounting in xfs_bmap_del_extent_delay Christoph Hellwig
2024-03-28  4:22   ` Dave Chinner
2024-03-27 11:03 ` [PATCH 09/13] xfs: support RT inodes in xfs_mod_delalloc Christoph Hellwig
2024-03-27 15:20   ` Darrick J. Wong
2024-03-27 17:05     ` Christoph Hellwig
2024-03-28  4:27   ` Dave Chinner
2024-03-28  4:34     ` Christoph Hellwig
2024-03-28  4:42       ` Dave Chinner
2024-03-28  8:25         ` Christoph Hellwig
2024-03-28 21:25           ` Dave Chinner
2024-03-27 11:03 ` [PATCH 10/13] xfs: look at m_frextents in xfs_iomap_prealloc_size for RT allocations Christoph Hellwig
2024-03-28  4:32   ` Dave Chinner
2024-03-28  4:34     ` Christoph Hellwig
2024-03-27 11:03 ` [PATCH 11/13] xfs: rework splitting of indirect block reservations Christoph Hellwig
2024-03-27 15:14   ` Darrick J. Wong
2024-03-28  4:35   ` Dave Chinner
2024-03-27 11:03 ` [PATCH 12/13] xfs: stop the steal (of data blocks for RT indirect blocks) Christoph Hellwig
2024-03-27 15:18   ` Darrick J. Wong
2024-03-28  4:36   ` Dave Chinner
2024-03-27 11:03 ` [PATCH 13/13] xfs: reinstate delalloc for RT inodes (if sb_rextsize == 1) Christoph Hellwig
2024-03-28  4:39   ` Dave Chinner
2024-04-22 11:20 bring back RT delalloc support v6 Christoph Hellwig
2024-04-22 11:20 ` [PATCH 03/13] xfs: free RT extents after updating the bmap btree Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.