All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing
@ 2021-05-27  4:51 Dave Chinner
  2021-05-27  4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Dave Chinner @ 2021-05-27  4:51 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

I pulled on a loose thread when I started looking into the 64kB
directory block size assert failure I was seeing while trying to
test the bulk page allocation changes.

I posted the first patch in the series separately - it fixed the
immediate assert failure (5.13-rc1 regression) I was seeing, but in
fixing that it only then dropped back to the previous assert failure
that g/538 was triggering with 64kb directory block sizes. This can
only be reproduced on 5.12, because that's when the error injection
that g/538 uses was added. So I went looking deeper.

It turns out that xfs_bunmapi() has some code in it to avoid locking
AGFs in the wrong order and this is what was triggering. Many of the
xfs_bunmapi() callers can not/do not handle partial unmaps that
return success, and that's what the directory code is tripping over
trying to free badly fragmented directory blocks.

This AGF locking order constraint was added to xfs_bunmapu in 2017
to avoid a deadlock in g/299. Sad thing is that shortly after this,
we converted xfs-bunmapi to use deferred freeing, so it never
actually locks AGFs anymore. But the deadlock avoiding landmine
remained. And xfs_bmap_finish() went away, too, and we now only ever
put one extent in any EFI we log for deferred freeing.

That means we now only free one extent per transaction via deferred
freeing, and there are no limitations on what order xfs_bunmapi()
can unmap extents. 64kB directories on a 1kB block size filesystem
already unmap 64 extents in a single loop, so there's no real
limitation here.

This means that the limitations of how many extents we can unmap per
loop in xfs_itruncate_extents_flags() goes away for data device
extents (and will eventually go away for RT devices, too, when
Darrick's RT EFI stuff gets merged).

This "one data deveice extent free per transaction" change now means
that all of the transaction reservations that include
"xfs_bmap_finish" based freeing reservations are wrong. These extent
frees are now done by deferred freeing, and so they only need a
single extent free reservation instead of up to 4 (as truncate was
reserving).

This series fixes the btree fork regression, the bunmapi partial
unmap regression from 2017, extends xfs_itruncate_extents to unmap
64 extents at a time for data device (AG) resident extents, and
reworks the transaction reservations to use a consistent and correct
reservation for allocation and freeing extents. The size of some
transaction reservations drops dramatically as a result.

The first two patches are -rcX candidates, the rest are for the next
merge cycle....

Cheers,

Dave.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/6] xfs: btree format inode forks can have zero extents
  2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
@ 2021-05-27  4:51 ` Dave Chinner
  2021-05-27  6:15   ` Darrick J. Wong
  2021-05-27  4:51 ` [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues Dave Chinner
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2021-05-27  4:51 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

xfs/538 is assert failing with this trace when testing with
directory block sizes of 64kB:

XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608
....
Call Trace:
 xfs_bmap_btree_to_extents+0x2a9/0x470
 ? kmem_cache_alloc+0xe7/0x220
 __xfs_bunmapi+0x4ca/0xdf0
 xfs_bunmapi+0x1a/0x30
 xfs_dir2_shrink_inode+0x71/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_remove+0x244/0x460
 xfs_vn_unlink+0x53/0xa0
 ? selinux_inode_unlink+0x13/0x20
 vfs_unlink+0x117/0x220
 do_unlinkat+0x1a2/0x2d0
 __x64_sys_unlink+0x42/0x60
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

This is a check to ensure that the extents have been read into
memory before we are doing a ifork btree manipulation. This assert
is bogus in the above case.

We have a fragmented directory block that has more extents in it
than can fit in extent format, so the inode data fork is in btree
format. xfs_dir2_shrink_inode() asks to remove all remaining 16
filesystem blocks from the inode so it can convert to short form,
and __xfs_bunmapi() removes all the extents. We now have a data fork
in btree format but have zero extents in the fork. This incorrectly
trips the xfs_need_iread_extents() assert because it assumes that an
empty extent btree means the extent tree has not been read into
memory yet. This is clearly not the case with xfs_bunmapi(), as it
has an explicit call to xfs_iread_extents() in it to pull the
extents into memory before it starts unmapping.

Also, the assert directly after this bogus one is:

	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);

Which covers the context in which it is legal to call
xfs_bmap_btree_to_extents just fine. Hence we should just remove the
bogus assert as it is clearly wrong and causes a regression.

The returns the test behaviour to the pre-existing assert failure in
xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to
remove all the extents in the range it was asked to unmap.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7e3b9b01431e..3f8b6da09261 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -605,7 +605,6 @@ xfs_bmap_btree_to_extents(
 
 	ASSERT(cur);
 	ASSERT(whichfork != XFS_COW_FORK);
-	ASSERT(!xfs_need_iread_extents(ifp));
 	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);
 	ASSERT(be16_to_cpu(rblock->bb_level) == 1);
 	ASSERT(be16_to_cpu(rblock->bb_numrecs) == 1);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues
  2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
  2021-05-27  4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
@ 2021-05-27  4:51 ` Dave Chinner
  2021-05-27  6:16   ` Darrick J. Wong
  2021-05-27  4:51 ` [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation Dave Chinner
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2021-05-27  4:51 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

large directory block size operations are assert failing because
xfs_bunmapi() is not completely removing fragmented directory blocks
like so:

XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
....
Call Trace:
 xfs_dir2_shrink_inode+0x1a8/0x210
 xfs_dir2_block_to_sf+0x2ae/0x410
 xfs_dir2_block_removename+0x21a/0x280
 xfs_dir_removename+0x195/0x1d0
 xfs_rename+0xb79/0xc50
 ? avc_has_perm+0x8d/0x1a0
 ? avc_has_perm_noaudit+0x9a/0x120
 xfs_vn_rename+0xdb/0x150
 vfs_rename+0x719/0xb50
 ? __lookup_hash+0x6a/0xa0
 do_renameat2+0x413/0x5e0
 __x64_sys_rename+0x45/0x50
 do_syscall_64+0x3a/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xae

We are aborting the bunmapi() pass because of this specific chunk of
code:

                /*
                 * Make sure we don't touch multiple AGF headers out of order
                 * in a single transaction, as that could cause AB-BA deadlocks.
                 */
                if (!wasdel && !isrt) {
                        agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
                        if (prev_agno != NULLAGNUMBER && prev_agno > agno)
                                break;
                        prev_agno = agno;
                }

This is designed to prevent deadlocks in AGF locking when freeing
multiple extents by ensuring that we only ever lock in increasing
AG number order. Unfortunately, this also violates the "bunmapi will
always succeed" semantic that some high level callers depend on,
such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
xfs_inactive_symlink_rmt().

This AG lock ordering was introduced back in 2017 to fix deadlocks
triggered by generic/299 as reported here:

https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/

This codebase is old enough that it was before we were defering all
AG based extent freeing from within xfs_bunmapi(). THat is, we never
actually lock AGs in xfs_bunmapi() any more - every non-rt based
extent free is added to the defer ops list, as is all BMBT block
freeing. And RT extents are not RT based, so there's no lock
ordering issues associated with them.

Hence this AGF lock ordering code is both broken and dead. Let's
just remove it so that the large directory block code works reliably
again.

Tested against xfs/538 and generic/299 which is the original test
that exposed the deadlocks that this code fixed.

Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 3f8b6da09261..a3e0e6f672d6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5349,7 +5349,6 @@ __xfs_bunmapi(
 	xfs_fsblock_t		sum;
 	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
 	xfs_fileoff_t		max_len;
-	xfs_agnumber_t		prev_agno = NULLAGNUMBER, agno;
 	xfs_fileoff_t		end;
 	struct xfs_iext_cursor	icur;
 	bool			done = false;
@@ -5441,16 +5440,6 @@ __xfs_bunmapi(
 		del = got;
 		wasdel = isnullstartblock(del.br_startblock);
 
-		/*
-		 * Make sure we don't touch multiple AGF headers out of order
-		 * in a single transaction, as that could cause AB-BA deadlocks.
-		 */
-		if (!wasdel && !isrt) {
-			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
-			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
-				break;
-			prev_agno = agno;
-		}
 		if (got.br_startoff < start) {
 			del.br_startoff = start;
 			del.br_blockcount -= start - got.br_startoff;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation
  2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
  2021-05-27  4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
  2021-05-27  4:51 ` [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues Dave Chinner
@ 2021-05-27  4:51 ` Dave Chinner
  2021-05-31 12:55   ` Chandan Babu R
  2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2021-05-27  4:51 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Ever since we moved to freeing of extents by deferred operations,
we've already freed extents via individual transactions. Hence the
only limitation of how many extents we can mark for freeing in a
single xfs_bunmapi() call bound only by how many deferrals we want
to queue.

That is xfs_bunmapi() doesn't actually do any AG based extent
freeing, so there's no actually transaction reservation used up by
calling bunmapi with a large count of extents to be freed. RT
extents have always been freed directly by bunmapi, but that doesn't
require modification of large number of blocks as there are no
btrees to split.

Some callers of xfs_bunmapi assume that the extent count being freed
is bound by geometry (e.g. directories) and these can ask bunmapi to
free up to 64 extents in a single call. These functions just work as
tehy stand, so there's no reason for truncate to have a limit of
just two extents per bunmapi call anymore.

Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
that can be deferred in a single loop to match what the directory
code already uses.

For realtime inodes, where xfs_bunmapi() directly frees extents,
leave the limit at 2 extents per loop as this is all the space that
the transaction reservation will cover.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_inode.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 0369eb22c1bb..db220eaa34b8 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
 
 /*
  * Used in xfs_itruncate_extents().  This is the maximum number of extents
- * freed from a file in a single transaction.
+ * we will unmap and defer for freeing in a single call to xfs_bunmapi().
+ * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
+ * by transaction reservation size to 2 extents.
  */
-#define	XFS_ITRUNC_MAX_EXTENTS	2
+static inline int
+xfs_itrunc_max_extents(
+	struct xfs_inode	*ip)
+{
+	if (XFS_IS_REALTIME_INODE(ip))
+		return 2;
+	return 64;
+}
 
 STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
 STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
@@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
 	while (unmap_len > 0) {
 		ASSERT(tp->t_firstblock == NULLFSBLOCK);
 		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
-				flags, XFS_ITRUNC_MAX_EXTENTS);
+				flags, xfs_itrunc_max_extents(ip));
 		if (error)
 			goto out;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/6] xfs: add a free space extent change reservation
  2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
                   ` (2 preceding siblings ...)
  2021-05-27  4:51 ` [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation Dave Chinner
@ 2021-05-27  4:52 ` Dave Chinner
  2021-05-27  6:38     ` kernel test robot
                     ` (4 more replies)
  2021-05-27  4:52 ` [PATCH 5/6] xfs: factor free space tree transaciton reservations Dave Chinner
                   ` (2 subsequent siblings)
  6 siblings, 5 replies; 30+ messages in thread
From: Dave Chinner @ 2021-05-27  4:52 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Lots of the transaction reservation code reserves space for an
extent allocation. It is inconsistently implemented, and many of
them get it wrong. Introduce a new function to calculate the log
space reservation for adding or removing an extent from the free
space btrees.

This function reserves space for logging the AGF, the AGFL and the
free space btrees, avoiding the need to account for them seperately
in every reservation that manipulates free space.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index d1a0848cb52e..6363cacb790f 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -79,6 +79,23 @@ xfs_allocfree_log_count(
 	return blocks;
 }
 
+/*
+ * Log reservation required to add or remove a single extent to the free space
+ * btrees.  This requires modifying:
+ *
+ * the agf header: 1 sector
+ * the agfl header: 1 sector
+ * the allocation btrees: 2 trees * (max depth - 1) * block size
+ */
+uint
+xfs_allocfree_extent_res(
+	struct xfs_mount *mp)
+{
+	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
+	       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
+				XFS_FSB_TO_B(mp, 1));
+}
+
 /*
  * Logging inodes is really tricksy. They are logged in memory format,
  * which means that what we write into the log doesn't directly translate into
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/6] xfs: factor free space tree transaciton reservations
  2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
                   ` (3 preceding siblings ...)
  2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
@ 2021-05-27  4:52 ` Dave Chinner
  2021-06-02 21:36   ` Darrick J. Wong
  2021-05-27  4:52 ` [PATCH 6/6] xfs: reduce transaction reservation for freeing extents Dave Chinner
  2021-05-31 10:02 ` [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Chandan Babu R
  6 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2021-05-27  4:52 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Convert all the open coded free space tree modification reservations
to use the new xfs_allocfree_extent_res() function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c | 122 ++++++++++++---------------------
 1 file changed, 45 insertions(+), 77 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 6363cacb790f..02079f55ef20 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -143,18 +143,16 @@ xfs_calc_inode_res(
  * reservation:
  *
  * the inode btree: max depth * blocksize
- * the allocation btrees: 2 trees * (max depth - 1) * block size
+ * one extent allocfree reservation for the AG.
  *
- * The caller must account for SB and AG header modifications, etc.
  */
 STATIC uint
 xfs_calc_inobt_res(
 	struct xfs_mount	*mp)
 {
 	return xfs_calc_buf_res(M_IGEO(mp)->inobt_maxlevels,
-			XFS_FSB_TO_B(mp, 1)) +
-				xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-			XFS_FSB_TO_B(mp, 1));
+				XFS_FSB_TO_B(mp, 1)) +
+		xfs_allocfree_extent_res(mp);
 }
 
 /*
@@ -182,7 +180,7 @@ xfs_calc_finobt_res(
  * Calculate the reservation required to allocate or free an inode chunk. This
  * includes:
  *
- * the allocation btrees: 2 trees * (max depth - 1) * block size
+ * one extent allocfree reservation for the AG.
  * the inode chunk: m_ino_geo.ialloc_blks * N
  *
  * The size N of the inode chunk reservation depends on whether it is for
@@ -200,8 +198,7 @@ xfs_calc_inode_chunk_res(
 {
 	uint			res, size = 0;
 
-	res = xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-			       XFS_FSB_TO_B(mp, 1));
+	res = xfs_allocfree_extent_res(mp);
 	if (alloc) {
 		/* icreate tx uses ordered buffers */
 		if (xfs_sb_version_has_v3inode(&mp->m_sb))
@@ -256,22 +253,18 @@ xfs_rtalloc_log_count(
  * extents.  This gives (t1):
  *    the inode getting the new extents: inode size
  *    the inode's bmap btree: max depth * block size
- *    the agfs of the ags from which the extents are allocated: 2 * sector
  *    the superblock free block counter: sector size
- *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  * Or, if we're writing to a realtime file (t2):
  *    the inode getting the new extents: inode size
  *    the inode's bmap btree: max depth * block size
- *    the agfs of the ags from which the extents are allocated: 2 * sector
+ *    one extent allocfree reservation for the AG.
  *    the superblock free block counter: sector size
  *    the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes
  *    the realtime summary: 1 block
- *    the allocation btrees: 2 trees * (2 * max depth - 1) * block size
  * And the bmap_finish transaction can free bmap blocks in a join (t3):
- *    the agfs of the ags containing the blocks: 2 * sector size
- *    the agfls of the ags containing the blocks: 2 * sector size
  *    the super block free block counter: sector size
- *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_write_reservation(
@@ -282,8 +275,8 @@ xfs_calc_write_reservation(
 
 	t1 = xfs_calc_inode_res(mp, 1) +
 	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), blksz) +
-	     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
+	     xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp) * 2;
 
 	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
 		t2 = xfs_calc_inode_res(mp, 1) +
@@ -291,13 +284,13 @@ xfs_calc_write_reservation(
 				     blksz) +
 		     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
 		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 1), blksz) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1), blksz);
+		     xfs_allocfree_extent_res(mp);
 	} else {
 		t2 = 0;
 	}
 
-	t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
+	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp) * 2;
 
 	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
 }
@@ -307,19 +300,13 @@ xfs_calc_write_reservation(
  *    the inode being truncated: inode size
  *    the inode's bmap btree: (max depth + 1) * block size
  * And the bmap_finish transaction can free the blocks and bmap blocks (t2):
- *    the agf for each of the ags: 4 * sector size
- *    the agfl for each of the ags: 4 * sector size
  *    the super block to reflect the freed blocks: sector size
- *    worst case split in allocation btrees per extent assuming 4 extents:
- *		4 exts * 2 trees * (2 * max depth - 1) * block size
+ *    four extent allocfree reservations for the AG.
  * Or, if it's a realtime file (t3):
- *    the agf for each of the ags: 2 * sector size
- *    the agfl for each of the ags: 2 * sector size
  *    the super block to reflect the freed blocks: sector size
  *    the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
  *    the realtime summary: 2 exts * 1 block
- *    worst case split in allocation btrees per extent assuming 2 extents:
- *		2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_itruncate_reservation(
@@ -331,13 +318,13 @@ xfs_calc_itruncate_reservation(
 	t1 = xfs_calc_inode_res(mp, 1) +
 	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1, blksz);
 
-	t2 = xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4), blksz);
+	t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp) * 4;
 
 	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
-		t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
+		t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
 		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
+		     xfs_allocfree_extent_res(mp) * 2;
 	} else {
 		t3 = 0;
 	}
@@ -352,10 +339,8 @@ xfs_calc_itruncate_reservation(
  *    the two directory bmap btrees: 2 * max depth * block size
  * And the bmap_finish transaction can free dir and bmap blocks (two sets
  *	of bmap blocks) giving:
- *    the agf for the ags in which the blocks live: 3 * sector size
- *    the agfl for the ags in which the blocks live: 3 * sector size
  *    the superblock for the free block count: sector size
- *    the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
+ *    three extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_rename_reservation(
@@ -365,9 +350,8 @@ xfs_calc_rename_reservation(
 		max((xfs_calc_inode_res(mp, 4) +
 		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 3),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp) * 3));
 }
 
 /*
@@ -381,20 +365,19 @@ xfs_calc_iunlink_remove_reservation(
 	struct xfs_mount        *mp)
 {
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-	       2 * M_IGEO(mp)->inode_cluster_size;
+	       xfs_calc_buf_res(2, M_IGEO(mp)->inode_cluster_size);
 }
 
 /*
  * For creating a link to an inode:
+ *    the inode is removed from the iunlink list (O_TMPFILE)
  *    the parent directory inode: inode size
  *    the linked inode: inode size
  *    the directory btree could split: (max depth + v2) * dir block size
  *    the directory bmap btree could join or split: (max depth + v2) * blocksize
  * And the bmap_finish transaction can free some bmap blocks giving:
- *    the agf for the ag in which the blocks live: sector size
- *    the agfl for the ag in which the blocks live: sector size
  *    the superblock for the free block count: sector size
- *    the allocation btrees: 2 trees * (2 * max depth - 1) * block size
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_link_reservation(
@@ -405,9 +388,8 @@ xfs_calc_link_reservation(
 		max((xfs_calc_inode_res(mp, 2) +
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp)));
 }
 
 /*
@@ -419,20 +401,19 @@ STATIC uint
 xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
 {
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-			M_IGEO(mp)->inode_cluster_size;
+	       xfs_calc_buf_res(1, M_IGEO(mp)->inode_cluster_size);
 }
 
 /*
  * For removing a directory entry we can modify:
+ *    the inode is added to the agi unlinked list
  *    the parent directory inode: inode size
  *    the removed inode: inode size
  *    the directory btree could join: (max depth + v2) * dir block size
  *    the directory bmap btree could join or split: (max depth + v2) * blocksize
  * And the bmap_finish transaction can free the dir and bmap blocks giving:
- *    the agf for the ag in which the blocks live: 2 * sector size
- *    the agfl for the ag in which the blocks live: 2 * sector size
  *    the superblock for the free block count: sector size
- *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_remove_reservation(
@@ -443,9 +424,8 @@ xfs_calc_remove_reservation(
 		max((xfs_calc_inode_res(mp, 1) +
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp) * 2));
 }
 
 /*
@@ -581,16 +561,14 @@ xfs_calc_ichange_reservation(
 /*
  * Growing the data section of the filesystem.
  *	superblock
- *	agi and agf
- *	allocation btrees
+ *      one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_growdata_reservation(
 	struct xfs_mount	*mp)
 {
-	return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1));
+	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	       xfs_allocfree_extent_res(mp);
 }
 
 /*
@@ -598,10 +576,9 @@ xfs_calc_growdata_reservation(
  * In the first set of transactions (ALLOC) we allocate space to the
  * bitmap or summary files.
  *	superblock: sector size
- *	agf of the ag from which the extent is allocated: sector size
  *	bmap btree for bitmap/summary inode: max depth * blocksize
  *	bitmap/summary inode: inode size
- *	allocation btrees for 1 block alloc: 2 * (2 * maxdepth - 1) * blocksize
+ *      one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_growrtalloc_reservation(
@@ -611,8 +588,7 @@ xfs_calc_growrtalloc_reservation(
 		xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_inode_res(mp, 1) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1));
+		xfs_allocfree_extent_res(mp);
 }
 
 /*
@@ -675,7 +651,7 @@ xfs_calc_writeid_reservation(
  *	agf block and superblock (for block allocation)
  *	the new block (directory sized)
  *	bmap blocks for the new directory block
- *	allocation btrees
+ *      one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_addafork_reservation(
@@ -687,8 +663,7 @@ xfs_calc_addafork_reservation(
 		xfs_calc_buf_res(1, mp->m_dir_geo->blksize) +
 		xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
 				 XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
-				 XFS_FSB_TO_B(mp, 1));
+		xfs_allocfree_extent_res(mp);
 }
 
 /*
@@ -696,11 +671,8 @@ xfs_calc_addafork_reservation(
  *    the inode being truncated: inode size
  *    the inode's bmap btree: max depth * block size
  * And the bmap_finish transaction can free the blocks and bmap blocks:
- *    the agf for each of the ags: 4 * sector size
- *    the agfl for each of the ags: 4 * sector size
  *    the super block to reflect the freed blocks: sector size
- *    worst case split in allocation btrees per extent assuming 4 extents:
- *		4 exts * 2 trees * (2 * max depth - 1) * block size
+ *    four extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_attrinval_reservation(
@@ -709,9 +681,8 @@ xfs_calc_attrinval_reservation(
 	return max((xfs_calc_inode_res(mp, 1) +
 		    xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
 				     XFS_FSB_TO_B(mp, 1))),
-		   (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
-				     XFS_FSB_TO_B(mp, 1))));
+		   (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		    xfs_allocfree_extent_res(mp) * 4));
 }
 
 /*
@@ -760,10 +731,8 @@ xfs_calc_attrsetrt_reservation(
  *    the attribute btree could join: max depth * block size
  *    the inode bmap btree could join or split: max depth * block size
  * And the bmap_finish transaction can free the attr blocks freed giving:
- *    the agf for the ag in which the blocks live: 2 * sector size
- *    the agfl for the ag in which the blocks live: 2 * sector size
  *    the superblock for the free block count: sector size
- *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
+ *    two extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_attrrm_reservation(
@@ -776,9 +745,8 @@ xfs_calc_attrrm_reservation(
 		     (uint)XFS_FSB_TO_B(mp,
 					XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
-		    (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
-				      XFS_FSB_TO_B(mp, 1))));
+		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_allocfree_extent_res(mp) * 2));
 }
 
 /*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/6] xfs: reduce transaction reservation for freeing extents
  2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
                   ` (4 preceding siblings ...)
  2021-05-27  4:52 ` [PATCH 5/6] xfs: factor free space tree transaciton reservations Dave Chinner
@ 2021-05-27  4:52 ` Dave Chinner
  2021-05-27  6:19   ` Darrick J. Wong
  2021-05-31 10:02 ` [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Chandan Babu R
  6 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2021-05-27  4:52 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Ever since we moved to deferred freeing of extents, we only every
free one extent per transaction. We separated the bulk unmapping of
extents from the submission of EFI/free/EFD transactions, and hence
while we unmap extents in bulk, we only every free one per
transaction.

Our transaction reservations still live in the era from before
deferred freeing of extents, so still refer to "xfs_bmap_finish"
and it needing to free multiple extents per transaction. These
freeing reservations can now all be reduced to a single extent to
reflect how we currently free extents.

This significantly reduces the reservation sizes for operations like
truncate and directory operations where they currently reserve space
for freeing up to 4 extents per transaction.

For a 4kB block size filesytsem with reflink=1,rmapbt=1, the
reservation sizes change like this:

Reservation		Before			After
(index)			logres	logcount	logres	logcount
 0	write		314104	    8		314104	    8
 1	itruncate	579456	    8           148608	    8
 2	rename		435840	    2           307936	    2
 3	link		191600	    2           191600	    2
 4	remove		312960	    2           174328	    2
 5	symlink		470656	    3           470656	    3
 6	create		469504	    2           469504	    2
 7	create_tmpfile	490240	    2           490240	    2
 8	mkdir		469504	    3           469504	    3
 9	ifree		508664	    2           508664	    2
 10	ichange		  5752	    0             5752	    0
 11	growdata	147840	    2           147840	    2
 12	addafork	178936	    2           178936	    2
 13	writeid		   760	    0              760	    0
 14	attrinval	578688	    1           147840	    1
 15	attrsetm	 26872	    3            26872	    3
 16	attrsetrt	 16896	    0            16896	    0
 17	attrrm		292224	    3           148608	    3
 18	clearagi	  4224	    0             4224	    0
 19	growrtalloc	173944	    2           173944	    2
 20	growrtzero	  4224	    0             4224	    0
 21	growrtfree	 10096	    0            10096	    0
 22	qm_setqlim	   232	    1              232	    1
 23	qm_dqalloc	318327	    8           318327	    8
 24	qm_quotaoff	  4544	    1             4544	    1
 25	qm_equotaoff	   320	    1              320	    1
 26	sb		  4224	    1             4224	    1
 27	fsyncts		   760	    0              760	    0
MAX			579456	    8           318327	    8

So we can see that many of the reservations have gone substantially
down in size. itruncate, rename, remove, attrinval and attrrm are
much smaller now. The maximum reservation size has gone from being
attrinval at 579456*8 bytes to dqalloc at 318327*8 bytes. This is a
substantial improvement for common operations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c | 63 +++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 02079f55ef20..f5e76eeae281 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -232,8 +232,7 @@ xfs_rtalloc_log_count(
  * Various log reservation values.
  *
  * These are based on the size of the file system block because that is what
- * most transactions manipulate.  Each adds in an additional 128 bytes per
- * item logged to try to account for the overhead of the transaction mechanism.
+ * most transactions manipulate.
  *
  * Note:  Most of the reservations underestimate the number of allocation
  * groups into which they could free extents in the xfs_defer_finish() call.
@@ -262,9 +261,9 @@ xfs_rtalloc_log_count(
  *    the superblock free block counter: sector size
  *    the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes
  *    the realtime summary: 1 block
- * And the bmap_finish transaction can free bmap blocks in a join (t3):
+ * And the deferred freeing can free bmap blocks in a join (t3):
  *    the super block free block counter: sector size
- *    two extent allocfree reservations for the AG.
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_write_reservation(
@@ -290,23 +289,25 @@ xfs_calc_write_reservation(
 	}
 
 	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-	     xfs_allocfree_extent_res(mp) * 2;
+	     xfs_allocfree_extent_res(mp);
 
 	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
 }
 
 /*
- * In truncating a file we free up to two extents at once.  We can modify (t1):
+ * In truncating a file we defer freeing so we only free one extent per
+ * transaction for normal files. For rt files we limit to 2 extents per
+ * transaction.
+ * We can modify (t1):
  *    the inode being truncated: inode size
  *    the inode's bmap btree: (max depth + 1) * block size
- * And the bmap_finish transaction can free the blocks and bmap blocks (t2):
- *    the super block to reflect the freed blocks: sector size
- *    four extent allocfree reservations for the AG.
- * Or, if it's a realtime file (t3):
+ * Or, if it's a realtime file (t2):
  *    the super block to reflect the freed blocks: sector size
  *    the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
  *    the realtime summary: 2 exts * 1 block
- *    two extent allocfree reservations for the AG.
+ * And the deferred freeing can free the blocks and bmap blocks (t3):
+ *    the super block to reflect the freed blocks: sector size
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_itruncate_reservation(
@@ -318,17 +319,16 @@ xfs_calc_itruncate_reservation(
 	t1 = xfs_calc_inode_res(mp, 1) +
 	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1, blksz);
 
-	t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-	     xfs_allocfree_extent_res(mp) * 4;
-
 	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
-		t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz) +
-		     xfs_allocfree_extent_res(mp) * 2;
+		t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz);
 	} else {
-		t3 = 0;
+		t2 = 0;
 	}
 
+	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
+	     xfs_allocfree_extent_res(mp);
+
 	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
 }
 
@@ -337,10 +337,9 @@ xfs_calc_itruncate_reservation(
  *    the four inodes involved: 4 * inode size
  *    the two directory btrees: 2 * (max depth + v2) * dir block size
  *    the two directory bmap btrees: 2 * max depth * block size
- * And the bmap_finish transaction can free dir and bmap blocks (two sets
- *	of bmap blocks) giving:
+ * And the deferred freeing can free dir and bmap blocks giving:
  *    the superblock for the free block count: sector size
- *    three extent allocfree reservations for the AG.
+ *    one extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_rename_reservation(
@@ -351,7 +350,7 @@ xfs_calc_rename_reservation(
 		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		     xfs_allocfree_extent_res(mp) * 3));
+		     xfs_allocfree_extent_res(mp)));
 }
 
 /*
@@ -375,7 +374,7 @@ xfs_calc_iunlink_remove_reservation(
  *    the linked inode: inode size
  *    the directory btree could split: (max depth + v2) * dir block size
  *    the directory bmap btree could join or split: (max depth + v2) * blocksize
- * And the bmap_finish transaction can free some bmap blocks giving:
+ * And the deferred freeing can free bmap blocks giving:
  *    the superblock for the free block count: sector size
  *    one extent allocfree reservation for the AG.
  */
@@ -411,9 +410,9 @@ xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
  *    the removed inode: inode size
  *    the directory btree could join: (max depth + v2) * dir block size
  *    the directory bmap btree could join or split: (max depth + v2) * blocksize
- * And the bmap_finish transaction can free the dir and bmap blocks giving:
+ * And the deferred freeing can free the dir and bmap blocks giving:
  *    the superblock for the free block count: sector size
- *    two extent allocfree reservations for the AG.
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_remove_reservation(
@@ -425,7 +424,7 @@ xfs_calc_remove_reservation(
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		     xfs_allocfree_extent_res(mp) * 2));
+		     xfs_allocfree_extent_res(mp)));
 }
 
 /*
@@ -670,9 +669,9 @@ xfs_calc_addafork_reservation(
  * Removing the attribute fork of a file
  *    the inode being truncated: inode size
  *    the inode's bmap btree: max depth * block size
- * And the bmap_finish transaction can free the blocks and bmap blocks:
+ * And the deferred freeing can free the blocks and bmap blocks:
  *    the super block to reflect the freed blocks: sector size
- *    four extent allocfree reservations for the AG.
+ *    one extent allocfree reservation for the AG.
  */
 STATIC uint
 xfs_calc_attrinval_reservation(
@@ -682,7 +681,7 @@ xfs_calc_attrinval_reservation(
 		    xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
 				     XFS_FSB_TO_B(mp, 1))),
 		   (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		    xfs_allocfree_extent_res(mp) * 4));
+		    xfs_allocfree_extent_res(mp)));
 }
 
 /*
@@ -730,9 +729,9 @@ xfs_calc_attrsetrt_reservation(
  *    the inode: inode size
  *    the attribute btree could join: max depth * block size
  *    the inode bmap btree could join or split: max depth * block size
- * And the bmap_finish transaction can free the attr blocks freed giving:
+ * And the deferred freeing can free the attr blocks freed giving:
  *    the superblock for the free block count: sector size
- *    two extent allocfree reservations for the AG.
+ *    one extent allocfree reservations for the AG.
  */
 STATIC uint
 xfs_calc_attrrm_reservation(
@@ -746,7 +745,7 @@ xfs_calc_attrrm_reservation(
 					XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
 		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
-		     xfs_allocfree_extent_res(mp) * 2));
+		     xfs_allocfree_extent_res(mp)));
 }
 
 /*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/6] xfs: btree format inode forks can have zero extents
  2021-05-27  4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
@ 2021-05-27  6:15   ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2021-05-27  6:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, May 27, 2021 at 02:51:57PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs/538 is assert failing with this trace when testing with
> directory block sizes of 64kB:
> 
> XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608
> ....
> Call Trace:
>  xfs_bmap_btree_to_extents+0x2a9/0x470
>  ? kmem_cache_alloc+0xe7/0x220
>  __xfs_bunmapi+0x4ca/0xdf0
>  xfs_bunmapi+0x1a/0x30
>  xfs_dir2_shrink_inode+0x71/0x210
>  xfs_dir2_block_to_sf+0x2ae/0x410
>  xfs_dir2_block_removename+0x21a/0x280
>  xfs_dir_removename+0x195/0x1d0
>  xfs_remove+0x244/0x460
>  xfs_vn_unlink+0x53/0xa0
>  ? selinux_inode_unlink+0x13/0x20
>  vfs_unlink+0x117/0x220
>  do_unlinkat+0x1a2/0x2d0
>  __x64_sys_unlink+0x42/0x60
>  do_syscall_64+0x3a/0x70
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> This is a check to ensure that the extents have been read into
> memory before we are doing a ifork btree manipulation. This assert
> is bogus in the above case.
> 
> We have a fragmented directory block that has more extents in it
> than can fit in extent format, so the inode data fork is in btree
> format. xfs_dir2_shrink_inode() asks to remove all remaining 16
> filesystem blocks from the inode so it can convert to short form,
> and __xfs_bunmapi() removes all the extents. We now have a data fork
> in btree format but have zero extents in the fork. This incorrectly
> trips the xfs_need_iread_extents() assert because it assumes that an
> empty extent btree means the extent tree has not been read into
> memory yet. This is clearly not the case with xfs_bunmapi(), as it
> has an explicit call to xfs_iread_extents() in it to pull the
> extents into memory before it starts unmapping.
> 
> Also, the assert directly after this bogus one is:
> 
> 	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);
> 
> Which covers the context in which it is legal to call
> xfs_bmap_btree_to_extents just fine. Hence we should just remove the
> bogus assert as it is clearly wrong and causes a regression.
> 
> The returns the test behaviour to the pre-existing assert failure in
> xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to
> remove all the extents in the range it was asked to unmap.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Makes sense to me.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7e3b9b01431e..3f8b6da09261 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -605,7 +605,6 @@ xfs_bmap_btree_to_extents(
>  
>  	ASSERT(cur);
>  	ASSERT(whichfork != XFS_COW_FORK);
> -	ASSERT(!xfs_need_iread_extents(ifp));
>  	ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);
>  	ASSERT(be16_to_cpu(rblock->bb_level) == 1);
>  	ASSERT(be16_to_cpu(rblock->bb_numrecs) == 1);
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues
  2021-05-27  4:51 ` [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues Dave Chinner
@ 2021-05-27  6:16   ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2021-05-27  6:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, May 27, 2021 at 02:51:58PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> large directory block size operations are assert failing because
> xfs_bunmapi() is not completely removing fragmented directory blocks
> like so:
> 
> XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677
> ....
> Call Trace:
>  xfs_dir2_shrink_inode+0x1a8/0x210
>  xfs_dir2_block_to_sf+0x2ae/0x410
>  xfs_dir2_block_removename+0x21a/0x280
>  xfs_dir_removename+0x195/0x1d0
>  xfs_rename+0xb79/0xc50
>  ? avc_has_perm+0x8d/0x1a0
>  ? avc_has_perm_noaudit+0x9a/0x120
>  xfs_vn_rename+0xdb/0x150
>  vfs_rename+0x719/0xb50
>  ? __lookup_hash+0x6a/0xa0
>  do_renameat2+0x413/0x5e0
>  __x64_sys_rename+0x45/0x50
>  do_syscall_64+0x3a/0x70
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> We are aborting the bunmapi() pass because of this specific chunk of
> code:
> 
>                 /*
>                  * Make sure we don't touch multiple AGF headers out of order
>                  * in a single transaction, as that could cause AB-BA deadlocks.
>                  */
>                 if (!wasdel && !isrt) {
>                         agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
>                         if (prev_agno != NULLAGNUMBER && prev_agno > agno)
>                                 break;
>                         prev_agno = agno;
>                 }
> 
> This is designed to prevent deadlocks in AGF locking when freeing
> multiple extents by ensuring that we only ever lock in increasing
> AG number order. Unfortunately, this also violates the "bunmapi will
> always succeed" semantic that some high level callers depend on,
> such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and
> xfs_inactive_symlink_rmt().
> 
> This AG lock ordering was introduced back in 2017 to fix deadlocks
> triggered by generic/299 as reported here:
> 
> https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/
> 
> This codebase is old enough that it was before we were defering all
> AG based extent freeing from within xfs_bunmapi(). THat is, we never
> actually lock AGs in xfs_bunmapi() any more - every non-rt based
> extent free is added to the defer ops list, as is all BMBT block
> freeing. And RT extents are not RT based, so there's no lock
> ordering issues associated with them.
> 
> Hence this AGF lock ordering code is both broken and dead. Let's
> just remove it so that the large directory block code works reliably
> again.
> 
> Tested against xfs/538 and generic/299 which is the original test
> that exposed the deadlocks that this code fixed.
> 
> Fixes: 5b094d6dac04 ("xfs: fix multi-AG deadlock in xfs_bunmapi")
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Seems reasonable nowadays; will throw it at the test cloud and see what
happens.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 3f8b6da09261..a3e0e6f672d6 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5349,7 +5349,6 @@ __xfs_bunmapi(
>  	xfs_fsblock_t		sum;
>  	xfs_filblks_t		len = *rlen;	/* length to unmap in file */
>  	xfs_fileoff_t		max_len;
> -	xfs_agnumber_t		prev_agno = NULLAGNUMBER, agno;
>  	xfs_fileoff_t		end;
>  	struct xfs_iext_cursor	icur;
>  	bool			done = false;
> @@ -5441,16 +5440,6 @@ __xfs_bunmapi(
>  		del = got;
>  		wasdel = isnullstartblock(del.br_startblock);
>  
> -		/*
> -		 * Make sure we don't touch multiple AGF headers out of order
> -		 * in a single transaction, as that could cause AB-BA deadlocks.
> -		 */
> -		if (!wasdel && !isrt) {
> -			agno = XFS_FSB_TO_AGNO(mp, del.br_startblock);
> -			if (prev_agno != NULLAGNUMBER && prev_agno > agno)
> -				break;
> -			prev_agno = agno;
> -		}
>  		if (got.br_startoff < start) {
>  			del.br_startoff = start;
>  			del.br_blockcount -= start - got.br_startoff;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/6] xfs: reduce transaction reservation for freeing extents
  2021-05-27  4:52 ` [PATCH 6/6] xfs: reduce transaction reservation for freeing extents Dave Chinner
@ 2021-05-27  6:19   ` Darrick J. Wong
  2021-05-27  8:52     ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2021-05-27  6:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, May 27, 2021 at 02:52:02PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Ever since we moved to deferred freeing of extents, we only every
> free one extent per transaction. We separated the bulk unmapping of
> extents from the submission of EFI/free/EFD transactions, and hence
> while we unmap extents in bulk, we only every free one per
> transaction.
> 
> Our transaction reservations still live in the era from before
> deferred freeing of extents, so still refer to "xfs_bmap_finish"
> and it needing to free multiple extents per transaction. These
> freeing reservations can now all be reduced to a single extent to
> reflect how we currently free extents.
> 
> This significantly reduces the reservation sizes for operations like
> truncate and directory operations where they currently reserve space
> for freeing up to 4 extents per transaction.
> 
> For a 4kB block size filesytsem with reflink=1,rmapbt=1, the
> reservation sizes change like this:
> 
> Reservation		Before			After
> (index)			logres	logcount	logres	logcount
>  0	write		314104	    8		314104	    8
>  1	itruncate	579456	    8           148608	    8
>  2	rename		435840	    2           307936	    2
>  3	link		191600	    2           191600	    2
>  4	remove		312960	    2           174328	    2
>  5	symlink		470656	    3           470656	    3
>  6	create		469504	    2           469504	    2
>  7	create_tmpfile	490240	    2           490240	    2
>  8	mkdir		469504	    3           469504	    3
>  9	ifree		508664	    2           508664	    2
>  10	ichange		  5752	    0             5752	    0
>  11	growdata	147840	    2           147840	    2
>  12	addafork	178936	    2           178936	    2
>  13	writeid		   760	    0              760	    0
>  14	attrinval	578688	    1           147840	    1
>  15	attrsetm	 26872	    3            26872	    3
>  16	attrsetrt	 16896	    0            16896	    0
>  17	attrrm		292224	    3           148608	    3
>  18	clearagi	  4224	    0             4224	    0
>  19	growrtalloc	173944	    2           173944	    2
>  20	growrtzero	  4224	    0             4224	    0
>  21	growrtfree	 10096	    0            10096	    0
>  22	qm_setqlim	   232	    1              232	    1
>  23	qm_dqalloc	318327	    8           318327	    8
>  24	qm_quotaoff	  4544	    1             4544	    1
>  25	qm_equotaoff	   320	    1              320	    1
>  26	sb		  4224	    1             4224	    1
>  27	fsyncts		   760	    0              760	    0
> MAX			579456	    8           318327	    8
> 
> So we can see that many of the reservations have gone substantially
> down in size. itruncate, rename, remove, attrinval and attrrm are
> much smaller now. The maximum reservation size has gone from being
> attrinval at 579456*8 bytes to dqalloc at 318327*8 bytes. This is a
> substantial improvement for common operations.

If you're going to play around with log reservations, can you have a
quick look at the branch I made to fix all the oversized reservations
that we make for rmap and reflink?

https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=reflink-speedups

That's what's next after deferred inode inactivation lands.

--D

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_trans_resv.c | 63 +++++++++++++++++-----------------
>  1 file changed, 31 insertions(+), 32 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> index 02079f55ef20..f5e76eeae281 100644
> --- a/fs/xfs/libxfs/xfs_trans_resv.c
> +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> @@ -232,8 +232,7 @@ xfs_rtalloc_log_count(
>   * Various log reservation values.
>   *
>   * These are based on the size of the file system block because that is what
> - * most transactions manipulate.  Each adds in an additional 128 bytes per
> - * item logged to try to account for the overhead of the transaction mechanism.
> + * most transactions manipulate.
>   *
>   * Note:  Most of the reservations underestimate the number of allocation
>   * groups into which they could free extents in the xfs_defer_finish() call.
> @@ -262,9 +261,9 @@ xfs_rtalloc_log_count(
>   *    the superblock free block counter: sector size
>   *    the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes
>   *    the realtime summary: 1 block
> - * And the bmap_finish transaction can free bmap blocks in a join (t3):
> + * And the deferred freeing can free bmap blocks in a join (t3):
>   *    the super block free block counter: sector size
> - *    two extent allocfree reservations for the AG.
> + *    one extent allocfree reservation for the AG.
>   */
>  STATIC uint
>  xfs_calc_write_reservation(
> @@ -290,23 +289,25 @@ xfs_calc_write_reservation(
>  	}
>  
>  	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -	     xfs_allocfree_extent_res(mp) * 2;
> +	     xfs_allocfree_extent_res(mp);
>  
>  	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
>  }
>  
>  /*
> - * In truncating a file we free up to two extents at once.  We can modify (t1):
> + * In truncating a file we defer freeing so we only free one extent per
> + * transaction for normal files. For rt files we limit to 2 extents per
> + * transaction.
> + * We can modify (t1):
>   *    the inode being truncated: inode size
>   *    the inode's bmap btree: (max depth + 1) * block size
> - * And the bmap_finish transaction can free the blocks and bmap blocks (t2):
> - *    the super block to reflect the freed blocks: sector size
> - *    four extent allocfree reservations for the AG.
> - * Or, if it's a realtime file (t3):
> + * Or, if it's a realtime file (t2):
>   *    the super block to reflect the freed blocks: sector size
>   *    the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
>   *    the realtime summary: 2 exts * 1 block
> - *    two extent allocfree reservations for the AG.
> + * And the deferred freeing can free the blocks and bmap blocks (t3):
> + *    the super block to reflect the freed blocks: sector size
> + *    one extent allocfree reservation for the AG.
>   */
>  STATIC uint
>  xfs_calc_itruncate_reservation(
> @@ -318,17 +319,16 @@ xfs_calc_itruncate_reservation(
>  	t1 = xfs_calc_inode_res(mp, 1) +
>  	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1, blksz);
>  
> -	t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -	     xfs_allocfree_extent_res(mp) * 4;
> -
>  	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
> -		t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz) +
> -		     xfs_allocfree_extent_res(mp) * 2;
> +		t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz);
>  	} else {
> -		t3 = 0;
> +		t2 = 0;
>  	}
>  
> +	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +	     xfs_allocfree_extent_res(mp);
> +
>  	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
>  }
>  
> @@ -337,10 +337,9 @@ xfs_calc_itruncate_reservation(
>   *    the four inodes involved: 4 * inode size
>   *    the two directory btrees: 2 * (max depth + v2) * dir block size
>   *    the two directory bmap btrees: 2 * max depth * block size
> - * And the bmap_finish transaction can free dir and bmap blocks (two sets
> - *	of bmap blocks) giving:
> + * And the deferred freeing can free dir and bmap blocks giving:
>   *    the superblock for the free block count: sector size
> - *    three extent allocfree reservations for the AG.
> + *    one extent allocfree reservations for the AG.
>   */
>  STATIC uint
>  xfs_calc_rename_reservation(
> @@ -351,7 +350,7 @@ xfs_calc_rename_reservation(
>  		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
>  				      XFS_FSB_TO_B(mp, 1))),
>  		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -		     xfs_allocfree_extent_res(mp) * 3));
> +		     xfs_allocfree_extent_res(mp)));
>  }
>  
>  /*
> @@ -375,7 +374,7 @@ xfs_calc_iunlink_remove_reservation(
>   *    the linked inode: inode size
>   *    the directory btree could split: (max depth + v2) * dir block size
>   *    the directory bmap btree could join or split: (max depth + v2) * blocksize
> - * And the bmap_finish transaction can free some bmap blocks giving:
> + * And the deferred freeing can free bmap blocks giving:
>   *    the superblock for the free block count: sector size
>   *    one extent allocfree reservation for the AG.
>   */
> @@ -411,9 +410,9 @@ xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
>   *    the removed inode: inode size
>   *    the directory btree could join: (max depth + v2) * dir block size
>   *    the directory bmap btree could join or split: (max depth + v2) * blocksize
> - * And the bmap_finish transaction can free the dir and bmap blocks giving:
> + * And the deferred freeing can free the dir and bmap blocks giving:
>   *    the superblock for the free block count: sector size
> - *    two extent allocfree reservations for the AG.
> + *    one extent allocfree reservation for the AG.
>   */
>  STATIC uint
>  xfs_calc_remove_reservation(
> @@ -425,7 +424,7 @@ xfs_calc_remove_reservation(
>  		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
>  				      XFS_FSB_TO_B(mp, 1))),
>  		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -		     xfs_allocfree_extent_res(mp) * 2));
> +		     xfs_allocfree_extent_res(mp)));
>  }
>  
>  /*
> @@ -670,9 +669,9 @@ xfs_calc_addafork_reservation(
>   * Removing the attribute fork of a file
>   *    the inode being truncated: inode size
>   *    the inode's bmap btree: max depth * block size
> - * And the bmap_finish transaction can free the blocks and bmap blocks:
> + * And the deferred freeing can free the blocks and bmap blocks:
>   *    the super block to reflect the freed blocks: sector size
> - *    four extent allocfree reservations for the AG.
> + *    one extent allocfree reservation for the AG.
>   */
>  STATIC uint
>  xfs_calc_attrinval_reservation(
> @@ -682,7 +681,7 @@ xfs_calc_attrinval_reservation(
>  		    xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
>  				     XFS_FSB_TO_B(mp, 1))),
>  		   (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -		    xfs_allocfree_extent_res(mp) * 4));
> +		    xfs_allocfree_extent_res(mp)));
>  }
>  
>  /*
> @@ -730,9 +729,9 @@ xfs_calc_attrsetrt_reservation(
>   *    the inode: inode size
>   *    the attribute btree could join: max depth * block size
>   *    the inode bmap btree could join or split: max depth * block size
> - * And the bmap_finish transaction can free the attr blocks freed giving:
> + * And the deferred freeing can free the attr blocks freed giving:
>   *    the superblock for the free block count: sector size
> - *    two extent allocfree reservations for the AG.
> + *    one extent allocfree reservations for the AG.
>   */
>  STATIC uint
>  xfs_calc_attrrm_reservation(
> @@ -746,7 +745,7 @@ xfs_calc_attrrm_reservation(
>  					XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
>  		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
>  		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -		     xfs_allocfree_extent_res(mp) * 2));
> +		     xfs_allocfree_extent_res(mp)));
>  }
>  
>  /*
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] xfs: add a free space extent change reservation
  2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
@ 2021-05-27  6:38     ` kernel test robot
  2021-05-27  6:38     ` kernel test robot
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-05-27  6:38 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2403 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on v5.13-rc3 next-20210526]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: parisc-randconfig-r036-20210527 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/79189c83717c1c3653f6870ab803b4f1eb5d6159
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
        git checkout 79189c83717c1c3653f6870ab803b4f1eb5d6159
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/xfs/libxfs/xfs_trans_resv.c:91:1: warning: no previous prototype for 'xfs_allocfree_extent_res' [-Wmissing-prototypes]
      91 | xfs_allocfree_extent_res(
         | ^~~~~~~~~~~~~~~~~~~~~~~~


vim +/xfs_allocfree_extent_res +91 fs/xfs/libxfs/xfs_trans_resv.c

    81	
    82	/*
    83	 * Log reservation required to add or remove a single extent to the free space
    84	 * btrees.  This requires modifying:
    85	 *
    86	 * the agf header: 1 sector
    87	 * the agfl header: 1 sector
    88	 * the allocation btrees: 2 trees * (max depth - 1) * block size
    89	 */
    90	uint
  > 91	xfs_allocfree_extent_res(
    92		struct xfs_mount *mp)
    93	{
    94		return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
    95		       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
    96					XFS_FSB_TO_B(mp, 1));
    97	}
    98	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29801 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] xfs: add a free space extent change reservation
@ 2021-05-27  6:38     ` kernel test robot
  0 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-05-27  6:38 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2463 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on v5.13-rc3 next-20210526]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: parisc-randconfig-r036-20210527 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/79189c83717c1c3653f6870ab803b4f1eb5d6159
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
        git checkout 79189c83717c1c3653f6870ab803b4f1eb5d6159
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/xfs/libxfs/xfs_trans_resv.c:91:1: warning: no previous prototype for 'xfs_allocfree_extent_res' [-Wmissing-prototypes]
      91 | xfs_allocfree_extent_res(
         | ^~~~~~~~~~~~~~~~~~~~~~~~


vim +/xfs_allocfree_extent_res +91 fs/xfs/libxfs/xfs_trans_resv.c

    81	
    82	/*
    83	 * Log reservation required to add or remove a single extent to the free space
    84	 * btrees.  This requires modifying:
    85	 *
    86	 * the agf header: 1 sector
    87	 * the agfl header: 1 sector
    88	 * the allocation btrees: 2 trees * (max depth - 1) * block size
    89	 */
    90	uint
  > 91	xfs_allocfree_extent_res(
    92		struct xfs_mount *mp)
    93	{
    94		return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
    95		       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
    96					XFS_FSB_TO_B(mp, 1));
    97	}
    98	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 29801 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] xfs: add a free space extent change reservation
  2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
@ 2021-05-27  6:38     ` kernel test robot
  2021-05-27  6:38     ` kernel test robot
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-05-27  6:38 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2384 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on v5.13-rc3 next-20210526]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: arc-allyesconfig (attached as .config)
compiler: arceb-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/79189c83717c1c3653f6870ab803b4f1eb5d6159
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
        git checkout 79189c83717c1c3653f6870ab803b4f1eb5d6159
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/xfs/libxfs/xfs_trans_resv.c:91:1: warning: no previous prototype for 'xfs_allocfree_extent_res' [-Wmissing-prototypes]
      91 | xfs_allocfree_extent_res(
         | ^~~~~~~~~~~~~~~~~~~~~~~~


vim +/xfs_allocfree_extent_res +91 fs/xfs/libxfs/xfs_trans_resv.c

    81	
    82	/*
    83	 * Log reservation required to add or remove a single extent to the free space
    84	 * btrees.  This requires modifying:
    85	 *
    86	 * the agf header: 1 sector
    87	 * the agfl header: 1 sector
    88	 * the allocation btrees: 2 trees * (max depth - 1) * block size
    89	 */
    90	uint
  > 91	xfs_allocfree_extent_res(
    92		struct xfs_mount *mp)
    93	{
    94		return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
    95		       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
    96					XFS_FSB_TO_B(mp, 1));
    97	}
    98	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 68132 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] xfs: add a free space extent change reservation
@ 2021-05-27  6:38     ` kernel test robot
  0 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-05-27  6:38 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2444 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on v5.13-rc3 next-20210526]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: arc-allyesconfig (attached as .config)
compiler: arceb-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/79189c83717c1c3653f6870ab803b4f1eb5d6159
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
        git checkout 79189c83717c1c3653f6870ab803b4f1eb5d6159
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/xfs/libxfs/xfs_trans_resv.c:91:1: warning: no previous prototype for 'xfs_allocfree_extent_res' [-Wmissing-prototypes]
      91 | xfs_allocfree_extent_res(
         | ^~~~~~~~~~~~~~~~~~~~~~~~


vim +/xfs_allocfree_extent_res +91 fs/xfs/libxfs/xfs_trans_resv.c

    81	
    82	/*
    83	 * Log reservation required to add or remove a single extent to the free space
    84	 * btrees.  This requires modifying:
    85	 *
    86	 * the agf header: 1 sector
    87	 * the agfl header: 1 sector
    88	 * the allocation btrees: 2 trees * (max depth - 1) * block size
    89	 */
    90	uint
  > 91	xfs_allocfree_extent_res(
    92		struct xfs_mount *mp)
    93	{
    94		return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
    95		       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
    96					XFS_FSB_TO_B(mp, 1));
    97	}
    98	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 68132 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] xfs: add a free space extent change reservation
  2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
@ 2021-05-27  7:03     ` kernel test robot
  2021-05-27  6:38     ` kernel test robot
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-05-27  7:03 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs; +Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1677 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on v5.13-rc3 next-20210526]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: x86_64-randconfig-s021-20210527 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/79189c83717c1c3653f6870ab803b4f1eb5d6159
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
        git checkout 79189c83717c1c3653f6870ab803b4f1eb5d6159
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> fs/xfs/libxfs/xfs_trans_resv.c:91:1: sparse: sparse: symbol 'xfs_allocfree_extent_res' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33951 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] xfs: add a free space extent change reservation
@ 2021-05-27  7:03     ` kernel test robot
  0 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-05-27  7:03 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1715 bytes --]

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on v5.13-rc3 next-20210526]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
config: x86_64-randconfig-s021-20210527 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/79189c83717c1c3653f6870ab803b4f1eb5d6159
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Chinner/xfs-bunmapi-needs-updating-for-deferred-freeing/20210527-125525
        git checkout 79189c83717c1c3653f6870ab803b4f1eb5d6159
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> fs/xfs/libxfs/xfs_trans_resv.c:91:1: sparse: sparse: symbol 'xfs_allocfree_extent_res' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33951 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC PATCH] xfs: xfs_allocfree_extent_res can be static
  2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
@ 2021-05-27  7:03     ` kernel test robot
  2021-05-27  6:38     ` kernel test robot
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-05-27  7:03 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs; +Cc: kbuild-all

fs/xfs/libxfs/xfs_trans_resv.c:91:1: warning: symbol 'xfs_allocfree_extent_res' was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
---
 xfs_trans_resv.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 6363cacb790fe..1a5ee1da34a67 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -87,7 +87,7 @@ xfs_allocfree_log_count(
  * the agfl header: 1 sector
  * the allocation btrees: 2 trees * (max depth - 1) * block size
  */
-uint
+static uint
 xfs_allocfree_extent_res(
 	struct xfs_mount *mp)
 {

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH] xfs: xfs_allocfree_extent_res can be static
@ 2021-05-27  7:03     ` kernel test robot
  0 siblings, 0 replies; 30+ messages in thread
From: kernel test robot @ 2021-05-27  7:03 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 725 bytes --]

fs/xfs/libxfs/xfs_trans_resv.c:91:1: warning: symbol 'xfs_allocfree_extent_res' was not declared. Should it be static?

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
---
 xfs_trans_resv.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 6363cacb790fe..1a5ee1da34a67 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -87,7 +87,7 @@ xfs_allocfree_log_count(
  * the agfl header: 1 sector
  * the allocation btrees: 2 trees * (max depth - 1) * block size
  */
-uint
+static uint
 xfs_allocfree_extent_res(
 	struct xfs_mount *mp)
 {

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/6] xfs: reduce transaction reservation for freeing extents
  2021-05-27  6:19   ` Darrick J. Wong
@ 2021-05-27  8:52     ` Dave Chinner
  2021-05-28  0:01       ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2021-05-27  8:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, May 26, 2021 at 11:19:47PM -0700, Darrick J. Wong wrote:
> On Thu, May 27, 2021 at 02:52:02PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Ever since we moved to deferred freeing of extents, we only every
> > free one extent per transaction. We separated the bulk unmapping of
> > extents from the submission of EFI/free/EFD transactions, and hence
> > while we unmap extents in bulk, we only every free one per
> > transaction.
> > 
> > Our transaction reservations still live in the era from before
> > deferred freeing of extents, so still refer to "xfs_bmap_finish"
> > and it needing to free multiple extents per transaction. These
> > freeing reservations can now all be reduced to a single extent to
> > reflect how we currently free extents.
> > 
> > This significantly reduces the reservation sizes for operations like
> > truncate and directory operations where they currently reserve space
> > for freeing up to 4 extents per transaction.
> > 
> > For a 4kB block size filesytsem with reflink=1,rmapbt=1, the
> > reservation sizes change like this:
> > 
> > Reservation		Before			After
> > (index)			logres	logcount	logres	logcount
> >  0	write		314104	    8		314104	    8
> >  1	itruncate	579456	    8           148608	    8
> >  2	rename		435840	    2           307936	    2
> >  3	link		191600	    2           191600	    2
> >  4	remove		312960	    2           174328	    2
> >  5	symlink		470656	    3           470656	    3
> >  6	create		469504	    2           469504	    2
> >  7	create_tmpfile	490240	    2           490240	    2
> >  8	mkdir		469504	    3           469504	    3
> >  9	ifree		508664	    2           508664	    2
> >  10	ichange		  5752	    0             5752	    0
> >  11	growdata	147840	    2           147840	    2
> >  12	addafork	178936	    2           178936	    2
> >  13	writeid		   760	    0              760	    0
> >  14	attrinval	578688	    1           147840	    1
> >  15	attrsetm	 26872	    3            26872	    3
> >  16	attrsetrt	 16896	    0            16896	    0
> >  17	attrrm		292224	    3           148608	    3
> >  18	clearagi	  4224	    0             4224	    0
> >  19	growrtalloc	173944	    2           173944	    2
> >  20	growrtzero	  4224	    0             4224	    0
> >  21	growrtfree	 10096	    0            10096	    0
> >  22	qm_setqlim	   232	    1              232	    1
> >  23	qm_dqalloc	318327	    8           318327	    8
> >  24	qm_quotaoff	  4544	    1             4544	    1
> >  25	qm_equotaoff	   320	    1              320	    1
> >  26	sb		  4224	    1             4224	    1
> >  27	fsyncts		   760	    0              760	    0
> > MAX			579456	    8           318327	    8
> > 
> > So we can see that many of the reservations have gone substantially
> > down in size. itruncate, rename, remove, attrinval and attrrm are
> > much smaller now. The maximum reservation size has gone from being
> > attrinval at 579456*8 bytes to dqalloc at 318327*8 bytes. This is a
> > substantial improvement for common operations.
> 
> If you're going to play around with log reservations, can you have a
> quick look at the branch I made to fix all the oversized reservations
> that we make for rmap and reflink?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=reflink-speedups
> 
> That's what's next after deferred inode inactivation lands.

They all look reasonable at a first pass. I'd need to do more than
read the patches to say they are definitely correct, but they
certainly don't seem unreasonable to me. I'd also have to run
numbers like the table above so I can quantify the reductions,
but nothing shouted "don't do this!" to me....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/6] xfs: reduce transaction reservation for freeing extents
  2021-05-27  8:52     ` Dave Chinner
@ 2021-05-28  0:01       ` Darrick J. Wong
  2021-05-28  2:30         ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Darrick J. Wong @ 2021-05-28  0:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, May 27, 2021 at 06:52:15PM +1000, Dave Chinner wrote:
> On Wed, May 26, 2021 at 11:19:47PM -0700, Darrick J. Wong wrote:
> > On Thu, May 27, 2021 at 02:52:02PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Ever since we moved to deferred freeing of extents, we only every
> > > free one extent per transaction. We separated the bulk unmapping of
> > > extents from the submission of EFI/free/EFD transactions, and hence
> > > while we unmap extents in bulk, we only every free one per
> > > transaction.
> > > 
> > > Our transaction reservations still live in the era from before
> > > deferred freeing of extents, so still refer to "xfs_bmap_finish"
> > > and it needing to free multiple extents per transaction. These
> > > freeing reservations can now all be reduced to a single extent to
> > > reflect how we currently free extents.
> > > 
> > > This significantly reduces the reservation sizes for operations like
> > > truncate and directory operations where they currently reserve space
> > > for freeing up to 4 extents per transaction.
> > > 
> > > For a 4kB block size filesytsem with reflink=1,rmapbt=1, the
> > > reservation sizes change like this:
> > > 
> > > Reservation		Before			After
> > > (index)			logres	logcount	logres	logcount
> > >  0	write		314104	    8		314104	    8
> > >  1	itruncate	579456	    8           148608	    8
> > >  2	rename		435840	    2           307936	    2
> > >  3	link		191600	    2           191600	    2
> > >  4	remove		312960	    2           174328	    2
> > >  5	symlink		470656	    3           470656	    3
> > >  6	create		469504	    2           469504	    2
> > >  7	create_tmpfile	490240	    2           490240	    2
> > >  8	mkdir		469504	    3           469504	    3
> > >  9	ifree		508664	    2           508664	    2
> > >  10	ichange		  5752	    0             5752	    0
> > >  11	growdata	147840	    2           147840	    2
> > >  12	addafork	178936	    2           178936	    2
> > >  13	writeid		   760	    0              760	    0
> > >  14	attrinval	578688	    1           147840	    1
> > >  15	attrsetm	 26872	    3            26872	    3
> > >  16	attrsetrt	 16896	    0            16896	    0
> > >  17	attrrm		292224	    3           148608	    3
> > >  18	clearagi	  4224	    0             4224	    0
> > >  19	growrtalloc	173944	    2           173944	    2
> > >  20	growrtzero	  4224	    0             4224	    0
> > >  21	growrtfree	 10096	    0            10096	    0
> > >  22	qm_setqlim	   232	    1              232	    1
> > >  23	qm_dqalloc	318327	    8           318327	    8
> > >  24	qm_quotaoff	  4544	    1             4544	    1
> > >  25	qm_equotaoff	   320	    1              320	    1
> > >  26	sb		  4224	    1             4224	    1
> > >  27	fsyncts		   760	    0              760	    0
> > > MAX			579456	    8           318327	    8
> > > 
> > > So we can see that many of the reservations have gone substantially
> > > down in size. itruncate, rename, remove, attrinval and attrrm are
> > > much smaller now. The maximum reservation size has gone from being
> > > attrinval at 579456*8 bytes to dqalloc at 318327*8 bytes. This is a
> > > substantial improvement for common operations.
> > 
> > If you're going to play around with log reservations, can you have a
> > quick look at the branch I made to fix all the oversized reservations
> > that we make for rmap and reflink?
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=reflink-speedups
> > 
> > That's what's next after deferred inode inactivation lands.
> 
> They all look reasonable at a first pass. I'd need to do more than
> read the patches to say they are definitely correct, but they
> certainly don't seem unreasonable to me. I'd also have to run
> numbers like the table above so I can quantify the reductions,
> but nothing shouted "don't do this!" to me....

Reservations before and after for a sample 100G filesystem:

# mkfs.xfs /tmp/a.img
meta-data=/tmp/a.img             isize=512    agcount=4, agsize=6553600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=12800, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Look like:

<before>				<after>
type 0 res 294904 count 8             | type 0 res 185080 count 5
type 1 res 547200 count 8             | type 1 res 327552 count 5
type 2 res 410752 count 2             | type 2 res 307936 count 2
type 3 res 187760 count 2               type 3 res 187760 count 2
type 4 res 290688 count 2             | type 4 res 180864 count 2
type 5 res 434560 count 3             | type 5 res 269824 count 3
type 6 res 433408 count 2             | type 6 res 268672 count 2
type 7 res 450432 count 2             | type 7 res 285696 count 2
type 8 res 433408 count 3             | type 8 res 268672 count 3
type 9 res 468728 count 2             | type 9 res 303992 count 2
type 10 res 2168 count 0                type 10 res 2168 count 0
type 11 res 137088 count 2            | type 11 res 82176 count 2
type 12 res 163320 count 2            | type 12 res 108408 count 2
type 13 res 760 count 0                 type 13 res 760 count 0
type 14 res 546432 count 1            | type 14 res 326784 count 1
type 15 res 23288 count 3               type 15 res 23288 count 3
type 16 res 13312 count 0               type 16 res 13312 count 0
type 17 res 274304 count 3            | type 17 res 164480 count 3
type 18 res 640 count 0                 type 18 res 640 count 0
type 19 res 158328 count 2            | type 19 res 103416 count 2
type 20 res 4224 count 0                type 20 res 4224 count 0
type 21 res 6512 count 0                type 21 res 6512 count 0
type 22 res 232 count 1                 type 22 res 232 count 1
type 23 res 299127 count 8            | type 23 res 189303 count 5
type 24 res 848 count 1                 type 24 res 848 count 1
type 25 res 208 count 1                 type 25 res 208 count 1
type 26 res 640 count 1                 type 26 res 640 count 1
type 27 res 760 count 0                 type 27 res 760 count 0
type -1 res 547200 count 8              type -1 res 327552 count 5

IOWs, a 63% reduction in the size of an itruncate reservations.

Note that the patchset also includes the necessary pieces to continue
formatting filesystems with the same minimum log size rules as before so
that you can format with a new xfsprogs and still be able to mount on an
older kernel.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/6] xfs: reduce transaction reservation for freeing extents
  2021-05-28  0:01       ` Darrick J. Wong
@ 2021-05-28  2:30         ` Dave Chinner
  2021-05-28  5:30           ` Darrick J. Wong
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2021-05-28  2:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, May 27, 2021 at 05:01:53PM -0700, Darrick J. Wong wrote:
> On Thu, May 27, 2021 at 06:52:15PM +1000, Dave Chinner wrote:
> > On Wed, May 26, 2021 at 11:19:47PM -0700, Darrick J. Wong wrote:
> > > On Thu, May 27, 2021 at 02:52:02PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > Ever since we moved to deferred freeing of extents, we only every
> > > > free one extent per transaction. We separated the bulk unmapping of
> > > > extents from the submission of EFI/free/EFD transactions, and hence
> > > > while we unmap extents in bulk, we only every free one per
> > > > transaction.
> > > > 
> > > > Our transaction reservations still live in the era from before
> > > > deferred freeing of extents, so still refer to "xfs_bmap_finish"
> > > > and it needing to free multiple extents per transaction. These
> > > > freeing reservations can now all be reduced to a single extent to
> > > > reflect how we currently free extents.
> > > > 
> > > > This significantly reduces the reservation sizes for operations like
> > > > truncate and directory operations where they currently reserve space
> > > > for freeing up to 4 extents per transaction.
> > > > 
> > > > For a 4kB block size filesytsem with reflink=1,rmapbt=1, the
> > > > reservation sizes change like this:
> > > > 
> > > > Reservation		Before			After
> > > > (index)			logres	logcount	logres	logcount
> > > >  0	write		314104	    8		314104	    8
> > > >  1	itruncate	579456	    8           148608	    8
> > > >  2	rename		435840	    2           307936	    2
> > > >  3	link		191600	    2           191600	    2
> > > >  4	remove		312960	    2           174328	    2
> > > >  5	symlink		470656	    3           470656	    3
> > > >  6	create		469504	    2           469504	    2
> > > >  7	create_tmpfile	490240	    2           490240	    2
> > > >  8	mkdir		469504	    3           469504	    3
> > > >  9	ifree		508664	    2           508664	    2
> > > >  10	ichange		  5752	    0             5752	    0
> > > >  11	growdata	147840	    2           147840	    2
> > > >  12	addafork	178936	    2           178936	    2
> > > >  13	writeid		   760	    0              760	    0
> > > >  14	attrinval	578688	    1           147840	    1
> > > >  15	attrsetm	 26872	    3            26872	    3
> > > >  16	attrsetrt	 16896	    0            16896	    0
> > > >  17	attrrm		292224	    3           148608	    3
> > > >  18	clearagi	  4224	    0             4224	    0
> > > >  19	growrtalloc	173944	    2           173944	    2
> > > >  20	growrtzero	  4224	    0             4224	    0
> > > >  21	growrtfree	 10096	    0            10096	    0
> > > >  22	qm_setqlim	   232	    1              232	    1
> > > >  23	qm_dqalloc	318327	    8           318327	    8
> > > >  24	qm_quotaoff	  4544	    1             4544	    1
> > > >  25	qm_equotaoff	   320	    1              320	    1
> > > >  26	sb		  4224	    1             4224	    1
> > > >  27	fsyncts		   760	    0              760	    0
> > > > MAX			579456	    8           318327	    8
> > > > 
> > > > So we can see that many of the reservations have gone substantially
> > > > down in size. itruncate, rename, remove, attrinval and attrrm are
> > > > much smaller now. The maximum reservation size has gone from being
> > > > attrinval at 579456*8 bytes to dqalloc at 318327*8 bytes. This is a
> > > > substantial improvement for common operations.
> > > 
> > > If you're going to play around with log reservations, can you have a
> > > quick look at the branch I made to fix all the oversized reservations
> > > that we make for rmap and reflink?
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=reflink-speedups
> > > 
> > > That's what's next after deferred inode inactivation lands.
> > 
> > They all look reasonable at a first pass. I'd need to do more than
> > read the patches to say they are definitely correct, but they
> > certainly don't seem unreasonable to me. I'd also have to run
> > numbers like the table above so I can quantify the reductions,
> > but nothing shouted "don't do this!" to me....
> 
> Reservations before and after for a sample 100G filesystem:
....
> type -1 res 547200 count 8              type -1 res 327552 count 5
> 
> IOWs, a 63% reduction in the size of an itruncate reservations.

Nice! Do you want to add the three reservation patches from this
patchset to your branch? Because...

> Note that the patchset also includes the necessary pieces to continue
> formatting filesystems with the same minimum log size rules as before so
> that you can format with a new xfsprogs and still be able to mount on an
> older kernel.

... this will be needed for either of these reduction patchsets.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 6/6] xfs: reduce transaction reservation for freeing extents
  2021-05-28  2:30         ` Dave Chinner
@ 2021-05-28  5:30           ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2021-05-28  5:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, May 28, 2021 at 12:30:00PM +1000, Dave Chinner wrote:
> On Thu, May 27, 2021 at 05:01:53PM -0700, Darrick J. Wong wrote:
> > On Thu, May 27, 2021 at 06:52:15PM +1000, Dave Chinner wrote:
> > > On Wed, May 26, 2021 at 11:19:47PM -0700, Darrick J. Wong wrote:
> > > > On Thu, May 27, 2021 at 02:52:02PM +1000, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > Ever since we moved to deferred freeing of extents, we only every
> > > > > free one extent per transaction. We separated the bulk unmapping of
> > > > > extents from the submission of EFI/free/EFD transactions, and hence
> > > > > while we unmap extents in bulk, we only every free one per
> > > > > transaction.
> > > > > 
> > > > > Our transaction reservations still live in the era from before
> > > > > deferred freeing of extents, so still refer to "xfs_bmap_finish"
> > > > > and it needing to free multiple extents per transaction. These
> > > > > freeing reservations can now all be reduced to a single extent to
> > > > > reflect how we currently free extents.
> > > > > 
> > > > > This significantly reduces the reservation sizes for operations like
> > > > > truncate and directory operations where they currently reserve space
> > > > > for freeing up to 4 extents per transaction.
> > > > > 
> > > > > For a 4kB block size filesytsem with reflink=1,rmapbt=1, the
> > > > > reservation sizes change like this:
> > > > > 
> > > > > Reservation		Before			After
> > > > > (index)			logres	logcount	logres	logcount
> > > > >  0	write		314104	    8		314104	    8
> > > > >  1	itruncate	579456	    8           148608	    8
> > > > >  2	rename		435840	    2           307936	    2
> > > > >  3	link		191600	    2           191600	    2
> > > > >  4	remove		312960	    2           174328	    2
> > > > >  5	symlink		470656	    3           470656	    3
> > > > >  6	create		469504	    2           469504	    2
> > > > >  7	create_tmpfile	490240	    2           490240	    2
> > > > >  8	mkdir		469504	    3           469504	    3
> > > > >  9	ifree		508664	    2           508664	    2
> > > > >  10	ichange		  5752	    0             5752	    0
> > > > >  11	growdata	147840	    2           147840	    2
> > > > >  12	addafork	178936	    2           178936	    2
> > > > >  13	writeid		   760	    0              760	    0
> > > > >  14	attrinval	578688	    1           147840	    1
> > > > >  15	attrsetm	 26872	    3            26872	    3
> > > > >  16	attrsetrt	 16896	    0            16896	    0
> > > > >  17	attrrm		292224	    3           148608	    3
> > > > >  18	clearagi	  4224	    0             4224	    0
> > > > >  19	growrtalloc	173944	    2           173944	    2
> > > > >  20	growrtzero	  4224	    0             4224	    0
> > > > >  21	growrtfree	 10096	    0            10096	    0
> > > > >  22	qm_setqlim	   232	    1              232	    1
> > > > >  23	qm_dqalloc	318327	    8           318327	    8
> > > > >  24	qm_quotaoff	  4544	    1             4544	    1
> > > > >  25	qm_equotaoff	   320	    1              320	    1
> > > > >  26	sb		  4224	    1             4224	    1
> > > > >  27	fsyncts		   760	    0              760	    0
> > > > > MAX			579456	    8           318327	    8
> > > > > 
> > > > > So we can see that many of the reservations have gone substantially
> > > > > down in size. itruncate, rename, remove, attrinval and attrrm are
> > > > > much smaller now. The maximum reservation size has gone from being
> > > > > attrinval at 579456*8 bytes to dqalloc at 318327*8 bytes. This is a
> > > > > substantial improvement for common operations.
> > > > 
> > > > If you're going to play around with log reservations, can you have a
> > > > quick look at the branch I made to fix all the oversized reservations
> > > > that we make for rmap and reflink?
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=reflink-speedups
> > > > 
> > > > That's what's next after deferred inode inactivation lands.
> > > 
> > > They all look reasonable at a first pass. I'd need to do more than
> > > read the patches to say they are definitely correct, but they
> > > certainly don't seem unreasonable to me. I'd also have to run
> > > numbers like the table above so I can quantify the reductions,
> > > but nothing shouted "don't do this!" to me....
> > 
> > Reservations before and after for a sample 100G filesystem:
> ....
> > type -1 res 547200 count 8              type -1 res 327552 count 5
> > 
> > IOWs, a 63% reduction in the size of an itruncate reservations.
> 
> Nice! Do you want to add the three reservation patches from this
> patchset to your branch? Because...
> 
> > Note that the patchset also includes the necessary pieces to continue
> > formatting filesystems with the same minimum log size rules as before so
> > that you can format with a new xfsprogs and still be able to mount on an
> > older kernel.
> 
> ... this will be needed for either of these reduction patchsets.

I'll add them to the set, but if you want to see that branch submitted
any time soon I'll need your help next week to get deferred inode
inactivation reviewed.  Deal? :)

--D

> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing
  2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
                   ` (5 preceding siblings ...)
  2021-05-27  4:52 ` [PATCH 6/6] xfs: reduce transaction reservation for freeing extents Dave Chinner
@ 2021-05-31 10:02 ` Chandan Babu R
  2021-05-31 22:41   ` Dave Chinner
  6 siblings, 1 reply; 30+ messages in thread
From: Chandan Babu R @ 2021-05-31 10:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On 27 May 2021 at 10:21, Dave Chinner wrote:
> Hi folks,
>
> I pulled on a loose thread when I started looking into the 64kB
> directory block size assert failure I was seeing while trying to
> test the bulk page allocation changes.
>
> I posted the first patch in the series separately - it fixed the
> immediate assert failure (5.13-rc1 regression) I was seeing, but in
> fixing that it only then dropped back to the previous assert failure
> that g/538 was triggering with 64kb directory block sizes. This can
> only be reproduced on 5.12, because that's when the error injection
> that g/538 uses was added. So I went looking deeper.
>
> It turns out that xfs_bunmapi() has some code in it to avoid locking
> AGFs in the wrong order and this is what was triggering. Many of the
> xfs_bunmapi() callers can not/do not handle partial unmaps that
> return success, and that's what the directory code is tripping over
> trying to free badly fragmented directory blocks.
>
> This AGF locking order constraint was added to xfs_bunmapu in 2017
> to avoid a deadlock in g/299. Sad thing is that shortly after this,
> we converted xfs-bunmapi to use deferred freeing, so it never
> actually locks AGFs anymore. But the deadlock avoiding landmine
> remained. And xfs_bmap_finish() went away, too, and we now only ever
> put one extent in any EFI we log for deferred freeing.

I did come across a scenario (when executing xfs/538 with 1k fs block size and
64k directory block size) where an EFI item contained three extents:

- Two of those extents belonged to the file whose extents were being freed.
- One more extent was added by xfs_bmap_btree_to_extents().
  The corresponding call trace was,
    CPU: 3 PID: 1367 Comm: fsstress Not tainted 5.12.0-rc8-next-20210419-chandan #125
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
    Call Trace:
     dump_stack+0x64/0x7c
     xfs_defer_add.cold+0x1d/0x22
     xfs_bmap_btree_to_extents+0x1f6/0x470
     __xfs_bunmapi+0x50a/0xe60
     ? xfs_trans_alloc_inode+0xbb/0x180
     xfs_bunmapi+0x15/0x30
     xfs_free_file_space+0x241/0x2c0
     xfs_file_fallocate+0x1ca/0x430
     ? __cond_resched+0x16/0x40
     ? inode_security+0x22/0x60
     ? selinux_file_permission+0xe2/0x120
     vfs_fallocate+0x146/0x2e0
     ioctl_preallocate+0x8f/0xc0
     __x64_sys_ioctl+0x62/0xb0
     do_syscall_64+0x40/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae

>
> That means we now only free one extent per transaction via deferred
> freeing,

With three instances of xfs_extent_free_items associated with one instance of
xfs_defer_pending, xfs_defer_finish_noroll() would,
1. Create an EFI item containing information about the three extents to be
   freed.
   - The extents in xfs_defer_pending->dfp_work list are sorted based on AG
     number.
2. Roll the transaction.
3. The new transaction would,
   - Create an EFD item to hold information about the three extents to be
     freed.
   - Free the three extents in a single transaction.

> and there are no limitations on what order xfs_bunmapi()
> can unmap extents.

I think the sorting of extent items mentioned above is the reason that AG
locks are obtained in increasing AGNO order while freeing extents.

> 64kB directories on a 1kB block size filesystem
> already unmap 64 extents in a single loop, so there's no real
> limitation here.

I think, in the worst case, we can free atmost XFS_EFI_MAX_FAST_EXTENTS
(i.e. 16) extents in a single transaction assuming that they were all added
in a sequence without any non-XFS_DEFER_OPS_TYPE_FREE deferred objects
added in between.

>
> This means that the limitations of how many extents we can unmap per
> loop in xfs_itruncate_extents_flags() goes away for data device
> extents (and will eventually go away for RT devices, too, when
> Darrick's RT EFI stuff gets merged).
>
> This "one data deveice extent free per transaction" change now means
> that all of the transaction reservations that include
> "xfs_bmap_finish" based freeing reservations are wrong. These extent
> frees are now done by deferred freeing, and so they only need a
> single extent free reservation instead of up to 4 (as truncate was
> reserving).
>
> This series fixes the btree fork regression, the bunmapi partial
> unmap regression from 2017, extends xfs_itruncate_extents to unmap
> 64 extents at a time for data device (AG) resident extents, and
> reworks the transaction reservations to use a consistent and correct
> reservation for allocation and freeing extents. The size of some
> transaction reservations drops dramatically as a result.
>
> The first two patches are -rcX candidates, the rest are for the next
> merge cycle....
>

--
chandan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation
  2021-05-27  4:51 ` [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation Dave Chinner
@ 2021-05-31 12:55   ` Chandan Babu R
  2021-05-31 13:05     ` Chandan Babu R
  0 siblings, 1 reply; 30+ messages in thread
From: Chandan Babu R @ 2021-05-31 12:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On 27 May 2021 at 10:21, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Ever since we moved to freeing of extents by deferred operations,
> we've already freed extents via individual transactions. Hence the
> only limitation of how many extents we can mark for freeing in a
> single xfs_bunmapi() call bound only by how many deferrals we want
> to queue.
>
> That is xfs_bunmapi() doesn't actually do any AG based extent
> freeing, so there's no actually transaction reservation used up by
> calling bunmapi with a large count of extents to be freed. RT
> extents have always been freed directly by bunmapi, but that doesn't
> require modification of large number of blocks as there are no
> btrees to split.
>
> Some callers of xfs_bunmapi assume that the extent count being freed
> is bound by geometry (e.g. directories) and these can ask bunmapi to
> free up to 64 extents in a single call. These functions just work as
> tehy stand, so there's no reason for truncate to have a limit of
> just two extents per bunmapi call anymore.
>
> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
> that can be deferred in a single loop to match what the directory
> code already uses.
>
> For realtime inodes, where xfs_bunmapi() directly frees extents,
> leave the limit at 2 extents per loop as this is all the space that
> the transaction reservation will cover.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_inode.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 0369eb22c1bb..db220eaa34b8 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
>
>  /*
>   * Used in xfs_itruncate_extents().  This is the maximum number of extents
> - * freed from a file in a single transaction.
> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
> + * by transaction reservation size to 2 extents.
>   */
> -#define	XFS_ITRUNC_MAX_EXTENTS	2
> +static inline int
> +xfs_itrunc_max_extents(
> +	struct xfs_inode	*ip)
> +{
> +	if (XFS_IS_REALTIME_INODE(ip))
> +		return 2;
> +	return 64;
> +}
>
>  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
>  STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
>  	while (unmap_len > 0) {
>  		ASSERT(tp->t_firstblock == NULLFSBLOCK);
>  		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
> -				flags, XFS_ITRUNC_MAX_EXTENTS);
> +				flags, xfs_itrunc_max_extents(ip));
>  		if (error)
>  			goto out;

The list of free extent items at xfs_defer_pending->dfp_work could
now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.

For a single transaction, xfs_calc_itruncate_reservation() reserves space for
logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
size). But with the above change, a single transaction can now free upto 16
extents. Wouldn't this overflow the reserved log space?

--
chandan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation
  2021-05-31 12:55   ` Chandan Babu R
@ 2021-05-31 13:05     ` Chandan Babu R
  2021-05-31 23:28       ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Chandan Babu R @ 2021-05-31 13:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On 31 May 2021 at 18:25, Chandan Babu R wrote:
> On 27 May 2021 at 10:21, Dave Chinner wrote:
>> From: Dave Chinner <dchinner@redhat.com>
>>
>> Ever since we moved to freeing of extents by deferred operations,
>> we've already freed extents via individual transactions. Hence the
>> only limitation of how many extents we can mark for freeing in a
>> single xfs_bunmapi() call bound only by how many deferrals we want
>> to queue.
>>
>> That is xfs_bunmapi() doesn't actually do any AG based extent
>> freeing, so there's no actually transaction reservation used up by
>> calling bunmapi with a large count of extents to be freed. RT
>> extents have always been freed directly by bunmapi, but that doesn't
>> require modification of large number of blocks as there are no
>> btrees to split.
>>
>> Some callers of xfs_bunmapi assume that the extent count being freed
>> is bound by geometry (e.g. directories) and these can ask bunmapi to
>> free up to 64 extents in a single call. These functions just work as
>> tehy stand, so there's no reason for truncate to have a limit of
>> just two extents per bunmapi call anymore.
>>
>> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
>> that can be deferred in a single loop to match what the directory
>> code already uses.
>>
>> For realtime inodes, where xfs_bunmapi() directly frees extents,
>> leave the limit at 2 extents per loop as this is all the space that
>> the transaction reservation will cover.
>>
>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>> ---
>>  fs/xfs/xfs_inode.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> index 0369eb22c1bb..db220eaa34b8 100644
>> --- a/fs/xfs/xfs_inode.c
>> +++ b/fs/xfs/xfs_inode.c
>> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
>>
>>  /*
>>   * Used in xfs_itruncate_extents().  This is the maximum number of extents
>> - * freed from a file in a single transaction.
>> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
>> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
>> + * by transaction reservation size to 2 extents.
>>   */
>> -#define	XFS_ITRUNC_MAX_EXTENTS	2
>> +static inline int
>> +xfs_itrunc_max_extents(
>> +	struct xfs_inode	*ip)
>> +{
>> +	if (XFS_IS_REALTIME_INODE(ip))
>> +		return 2;
>> +	return 64;
>> +}
>>
>>  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
>>  STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
>> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
>>  	while (unmap_len > 0) {
>>  		ASSERT(tp->t_firstblock == NULLFSBLOCK);
>>  		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
>> -				flags, XFS_ITRUNC_MAX_EXTENTS);
>> +				flags, xfs_itrunc_max_extents(ip));
>>  		if (error)
>>  			goto out;
>
> The list of free extent items at xfs_defer_pending->dfp_work could
> now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.
>
> For a single transaction, xfs_calc_itruncate_reservation() reserves space for
> logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
> size).

... Sorry, I meant to say "xfs_calc_itruncate_reservation() reserves log space
required for freeing 4 extents ..."

> But with the above change, a single transaction can now free upto 16
> extents. Wouldn't this overflow the reserved log space?


-- 
chandan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing
  2021-05-31 10:02 ` [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Chandan Babu R
@ 2021-05-31 22:41   ` Dave Chinner
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2021-05-31 22:41 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs

On Mon, May 31, 2021 at 03:32:05PM +0530, Chandan Babu R wrote:
> On 27 May 2021 at 10:21, Dave Chinner wrote:
> > Hi folks,
> >
> > I pulled on a loose thread when I started looking into the 64kB
> > directory block size assert failure I was seeing while trying to
> > test the bulk page allocation changes.
> >
> > I posted the first patch in the series separately - it fixed the
> > immediate assert failure (5.13-rc1 regression) I was seeing, but in
> > fixing that it only then dropped back to the previous assert failure
> > that g/538 was triggering with 64kb directory block sizes. This can
> > only be reproduced on 5.12, because that's when the error injection
> > that g/538 uses was added. So I went looking deeper.
> >
> > It turns out that xfs_bunmapi() has some code in it to avoid locking
> > AGFs in the wrong order and this is what was triggering. Many of the
> > xfs_bunmapi() callers can not/do not handle partial unmaps that
> > return success, and that's what the directory code is tripping over
> > trying to free badly fragmented directory blocks.
> >
> > This AGF locking order constraint was added to xfs_bunmapu in 2017
> > to avoid a deadlock in g/299. Sad thing is that shortly after this,
> > we converted xfs-bunmapi to use deferred freeing, so it never
> > actually locks AGFs anymore. But the deadlock avoiding landmine
> > remained. And xfs_bmap_finish() went away, too, and we now only ever
> > put one extent in any EFI we log for deferred freeing.
> 
> I did come across a scenario (when executing xfs/538 with 1k fs block size and
> 64k directory block size) where an EFI item contained three extents:
> 
> - Two of those extents belonged to the file whose extents were being freed.
> - One more extent was added by xfs_bmap_btree_to_extents().
>   The corresponding call trace was,
>     CPU: 3 PID: 1367 Comm: fsstress Not tainted 5.12.0-rc8-next-20210419-chandan #125
>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
>     Call Trace:
>      dump_stack+0x64/0x7c
>      xfs_defer_add.cold+0x1d/0x22
>      xfs_bmap_btree_to_extents+0x1f6/0x470
>      __xfs_bunmapi+0x50a/0xe60
>      ? xfs_trans_alloc_inode+0xbb/0x180
>      xfs_bunmapi+0x15/0x30
>      xfs_free_file_space+0x241/0x2c0
>      xfs_file_fallocate+0x1ca/0x430
>      ? __cond_resched+0x16/0x40
>      ? inode_security+0x22/0x60
>      ? selinux_file_permission+0xe2/0x120
>      vfs_fallocate+0x146/0x2e0
>      ioctl_preallocate+0x8f/0xc0
>      __x64_sys_ioctl+0x62/0xb0
>      do_syscall_64+0x40/0x80
>      entry_SYSCALL_64_after_hwframe+0x44/0xae

That's not a directory operation that tripped over. :/

> > That means we now only free one extent per transaction via deferred
> > freeing,
> 
> With three instances of xfs_extent_free_items associated with one instance of
> xfs_defer_pending, xfs_defer_finish_noroll() would,
> 1. Create an EFI item containing information about the three extents to be
>    freed.
>    - The extents in xfs_defer_pending->dfp_work list are sorted based on AG
>      number.
> 2. Roll the transaction.
> 3. The new transaction would,
>    - Create an EFD item to hold information about the three extents to be
>      freed.
>    - Free the three extents in a single transaction.

Yeah, so I missed that xfs_defer_add() will combine like types into
the the same list if they are consecutively logged. I was looking at
intent creation being limited to one intent per deferred operation,
but missed the aggregation at queuing time.

But this is largely irrelevant to the operation of bunmapi, because
it's just queuing extents to be freed, not running the transactions
that free them...

> > and there are no limitations on what order xfs_bunmapi()
> > can unmap extents.
> 
> I think the sorting of extent items mentioned above is the reason that AG
> locks are obtained in increasing AGNO order while freeing extents.

Nope, now that I look at it, the EFI intent creation sorts the
queued extents to be freed into ascending AG order, so they'll
always be freed in an order that allows for correct AG locking order
in the freeing transaction. IOWs, we still don't need AG ordering
anywhere in the bunmapi path, because AG ordering is done at the
intent level where ordering might matter...

> > 64kB directories on a 1kB block size filesystem already unmap 64
> > extents in a single loop, so there's no real limitation here.
> 
> I think, in the worst case, we can free atmost
> XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) extents in a single transaction
> assuming that they were all added in a sequence without any
> non-XFS_DEFER_OPS_TYPE_FREE deferred objects added in between.

Yup. Seems that way, but there's not deadlock issue with doing this
because of the sorting. That's an issue for transaction reservations
for EFI processing, but these days that is not something that
bunmapi has to care about.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation
  2021-05-31 13:05     ` Chandan Babu R
@ 2021-05-31 23:28       ` Dave Chinner
  2021-06-01  6:42         ` Chandan Babu R
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2021-05-31 23:28 UTC (permalink / raw)
  To: Chandan Babu R; +Cc: linux-xfs

On Mon, May 31, 2021 at 06:35:40PM +0530, Chandan Babu R wrote:
> On 31 May 2021 at 18:25, Chandan Babu R wrote:
> > On 27 May 2021 at 10:21, Dave Chinner wrote:
> >> From: Dave Chinner <dchinner@redhat.com>
> >>
> >> Ever since we moved to freeing of extents by deferred operations,
> >> we've already freed extents via individual transactions. Hence the
> >> only limitation of how many extents we can mark for freeing in a
> >> single xfs_bunmapi() call bound only by how many deferrals we want
> >> to queue.
> >>
> >> That is xfs_bunmapi() doesn't actually do any AG based extent
> >> freeing, so there's no actually transaction reservation used up by
> >> calling bunmapi with a large count of extents to be freed. RT
> >> extents have always been freed directly by bunmapi, but that doesn't
> >> require modification of large number of blocks as there are no
> >> btrees to split.
> >>
> >> Some callers of xfs_bunmapi assume that the extent count being freed
> >> is bound by geometry (e.g. directories) and these can ask bunmapi to
> >> free up to 64 extents in a single call. These functions just work as
> >> tehy stand, so there's no reason for truncate to have a limit of
> >> just two extents per bunmapi call anymore.
> >>
> >> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
> >> that can be deferred in a single loop to match what the directory
> >> code already uses.
> >>
> >> For realtime inodes, where xfs_bunmapi() directly frees extents,
> >> leave the limit at 2 extents per loop as this is all the space that
> >> the transaction reservation will cover.
> >>
> >> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> >> ---
> >>  fs/xfs/xfs_inode.c | 15 ++++++++++++---
> >>  1 file changed, 12 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> index 0369eb22c1bb..db220eaa34b8 100644
> >> --- a/fs/xfs/xfs_inode.c
> >> +++ b/fs/xfs/xfs_inode.c
> >> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
> >>
> >>  /*
> >>   * Used in xfs_itruncate_extents().  This is the maximum number of extents
> >> - * freed from a file in a single transaction.
> >> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
> >> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
> >> + * by transaction reservation size to 2 extents.
> >>   */
> >> -#define	XFS_ITRUNC_MAX_EXTENTS	2
> >> +static inline int
> >> +xfs_itrunc_max_extents(
> >> +	struct xfs_inode	*ip)
> >> +{
> >> +	if (XFS_IS_REALTIME_INODE(ip))
> >> +		return 2;
> >> +	return 64;
> >> +}
> >>
> >>  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
> >>  STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
> >> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
> >>  	while (unmap_len > 0) {
> >>  		ASSERT(tp->t_firstblock == NULLFSBLOCK);
> >>  		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
> >> -				flags, XFS_ITRUNC_MAX_EXTENTS);
> >> +				flags, xfs_itrunc_max_extents(ip));
> >>  		if (error)
> >>  			goto out;
> >
> > The list of free extent items at xfs_defer_pending->dfp_work could
> > now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.

Yes, but we do exactly this when freeing a large fragmented directly
block. That is, we ask xfs_bunmapi to unmap a 64kB range regardless
of how many extents map that range.

IOWs, the limitation in extent count placed in
xfs_itruncate_extents() doesn't actually address the underlying
problem - it's just a band-aid that has been placed over the easy to
trigger transaction overrun symptom that has always been present in
the underlying extent freeing code.

> > For a single transaction, xfs_calc_itruncate_reservation() reserves space for
> > logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
> > size).
> 
> ... Sorry, I meant to say "xfs_calc_itruncate_reservation() reserves log space
> required for freeing 4 extents ..."

My point exactly - the code has always had a mismatch between
reservations and what we can stuff into an EFI.  EFIs are unbound in
size, so having a fixed "4 extents per EFI" reservation limit has
never made any real sense given that unmaping a 64kB directory block
on a 1kb block size filesystem has a worst case of freeing 64
extents in a single transaction. As I said above, the limitations
placed on xfs_itruncate_extents is largely a hack because it doesn't
address other avenues to the same overruns...

This "4 extents per transaction" reservation makes even less sense
now that extents are freed by defer ops, not by the transaction that
unmaps them. We've completely decoupled extent freeing from the
higher level code that unmaps them, and so now the freeing
transactions are independent of the high level code that runs the
freeing operations.

IOWs, having a reservation big enough to free a single extent is all
we should need now as we can break the extent freeing up into
individual transactions and still have them complete atomically even
after a crash. That's where I'm trying to get to here, but it's
clear that I need to refine the code a bit further such that we only
allow individual EFIs to queue up and free multiple extents within
an AG to be freed in the same transaction....

FWIW, I suspect that the right thing to do here is make use of the
xfs_defer_finish_one() mechanism for relogging the remaining intents
in work list while it is processing that list. All we need is for
xfs_extent_free_finish_item() to return -EAGAIN instead of running
the free transaction, and we will log the completed EFDs and the
remaining EFIs to be run and roll the transaction.

Hence we can control exactly when we roll the extent freeing
transaction (e.g. when the next extent to free is in a different AG
to the one we've just been freeing extents in) and as a result we
can define a fixed transaction reservation for extent freeing that
works in every situation without having to care about arbitrary
"how many extents can we unmap without overrun" concerns.

That's the goal here - fixed, small reservation for extent freeing
that is decoupled from and independent of the number of extents that
need to be freed atomically by the high level operation...

Hence I think that a minor amount of rework will allow the EFI code
to log large numbers of extents to be freed whilst still processing
them within single AG free space tree modification reservation
bounds...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation
  2021-05-31 23:28       ` Dave Chinner
@ 2021-06-01  6:42         ` Chandan Babu R
  0 siblings, 0 replies; 30+ messages in thread
From: Chandan Babu R @ 2021-06-01  6:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On 01 Jun 2021 at 04:58, Dave Chinner wrote:
> On Mon, May 31, 2021 at 06:35:40PM +0530, Chandan Babu R wrote:
>> On 31 May 2021 at 18:25, Chandan Babu R wrote:
>> > On 27 May 2021 at 10:21, Dave Chinner wrote:
>> >> From: Dave Chinner <dchinner@redhat.com>
>> >>
>> >> Ever since we moved to freeing of extents by deferred operations,
>> >> we've already freed extents via individual transactions. Hence the
>> >> only limitation of how many extents we can mark for freeing in a
>> >> single xfs_bunmapi() call bound only by how many deferrals we want
>> >> to queue.
>> >>
>> >> That is xfs_bunmapi() doesn't actually do any AG based extent
>> >> freeing, so there's no actually transaction reservation used up by
>> >> calling bunmapi with a large count of extents to be freed. RT
>> >> extents have always been freed directly by bunmapi, but that doesn't
>> >> require modification of large number of blocks as there are no
>> >> btrees to split.
>> >>
>> >> Some callers of xfs_bunmapi assume that the extent count being freed
>> >> is bound by geometry (e.g. directories) and these can ask bunmapi to
>> >> free up to 64 extents in a single call. These functions just work as
>> >> tehy stand, so there's no reason for truncate to have a limit of
>> >> just two extents per bunmapi call anymore.
>> >>
>> >> Increase XFS_ITRUNC_MAX_EXTENTS to 64 to match the number of extents
>> >> that can be deferred in a single loop to match what the directory
>> >> code already uses.
>> >>
>> >> For realtime inodes, where xfs_bunmapi() directly frees extents,
>> >> leave the limit at 2 extents per loop as this is all the space that
>> >> the transaction reservation will cover.
>> >>
>> >> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>> >> ---
>> >>  fs/xfs/xfs_inode.c | 15 ++++++++++++---
>> >>  1 file changed, 12 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> >> index 0369eb22c1bb..db220eaa34b8 100644
>> >> --- a/fs/xfs/xfs_inode.c
>> >> +++ b/fs/xfs/xfs_inode.c
>> >> @@ -40,9 +40,18 @@ kmem_zone_t *xfs_inode_zone;
>> >>
>> >>  /*
>> >>   * Used in xfs_itruncate_extents().  This is the maximum number of extents
>> >> - * freed from a file in a single transaction.
>> >> + * we will unmap and defer for freeing in a single call to xfs_bunmapi().
>> >> + * Realtime inodes directly free extents in xfs_bunmapi(), so are bound
>> >> + * by transaction reservation size to 2 extents.
>> >>   */
>> >> -#define	XFS_ITRUNC_MAX_EXTENTS	2
>> >> +static inline int
>> >> +xfs_itrunc_max_extents(
>> >> +	struct xfs_inode	*ip)
>> >> +{
>> >> +	if (XFS_IS_REALTIME_INODE(ip))
>> >> +		return 2;
>> >> +	return 64;
>> >> +}
>> >>
>> >>  STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
>> >>  STATIC int xfs_iunlink_remove(struct xfs_trans *, struct xfs_inode *);
>> >> @@ -1402,7 +1411,7 @@ xfs_itruncate_extents_flags(
>> >>  	while (unmap_len > 0) {
>> >>  		ASSERT(tp->t_firstblock == NULLFSBLOCK);
>> >>  		error = __xfs_bunmapi(tp, ip, first_unmap_block, &unmap_len,
>> >> -				flags, XFS_ITRUNC_MAX_EXTENTS);
>> >> +				flags, xfs_itrunc_max_extents(ip));
>> >>  		if (error)
>> >>  			goto out;
>> >
>> > The list of free extent items at xfs_defer_pending->dfp_work could
>> > now contain XFS_EFI_MAX_FAST_EXTENTS (i.e. 16) entries in the worst case.
>
> Yes, but we do exactly this when freeing a large fragmented directly
> block. That is, we ask xfs_bunmapi to unmap a 64kB range regardless
> of how many extents map that range.
>
> IOWs, the limitation in extent count placed in
> xfs_itruncate_extents() doesn't actually address the underlying
> problem - it's just a band-aid that has been placed over the easy to
> trigger transaction overrun symptom that has always been present in
> the underlying extent freeing code.
>
>> > For a single transaction, xfs_calc_itruncate_reservation() reserves space for
>> > logging only 4 extents (i.e. 4 exts * 2 trees * (2 * max depth - 1) * block
>> > size).
>>
>> ... Sorry, I meant to say "xfs_calc_itruncate_reservation() reserves log space
>> required for freeing 4 extents ..."
>
> My point exactly - the code has always had a mismatch between
> reservations and what we can stuff into an EFI.  EFIs are unbound in
> size, so having a fixed "4 extents per EFI" reservation limit has
> never made any real sense given that unmaping a 64kB directory block
> on a 1kb block size filesystem has a worst case of freeing 64
> extents in a single transaction. As I said above, the limitations
> placed on xfs_itruncate_extents is largely a hack because it doesn't
> address other avenues to the same overruns...
>

Ok. Thanks for the explanation.

> This "4 extents per transaction" reservation makes even less sense
> now that extents are freed by defer ops, not by the transaction that
> unmaps them. We've completely decoupled extent freeing from the
> higher level code that unmaps them, and so now the freeing
> transactions are independent of the high level code that runs the
> freeing operations.
>
> IOWs, having a reservation big enough to free a single extent is all
> we should need now as we can break the extent freeing up into
> individual transactions and still have them complete atomically even
> after a crash. That's where I'm trying to get to here, but it's
> clear that I need to refine the code a bit further such that we only
> allow individual EFIs to queue up and free multiple extents within
> an AG to be freed in the same transaction....

Ok. That should limit the number of CNTBT/BNOBT blocks being logged by a
transaction which is freeing extents.

>
> FWIW, I suspect that the right thing to do here is make use of the
> xfs_defer_finish_one() mechanism for relogging the remaining intents
> in work list while it is processing that list. All we need is for
> xfs_extent_free_finish_item() to return -EAGAIN instead of running
> the free transaction, and we will log the completed EFDs and the
> remaining EFIs to be run and roll the transaction.
>
> Hence we can control exactly when we roll the extent freeing
> transaction (e.g. when the next extent to free is in a different AG
> to the one we've just been freeing extents in) and as a result we
> can define a fixed transaction reservation for extent freeing that
> works in every situation without having to care about arbitrary
> "how many extents can we unmap without overrun" concerns.
>

That sounds perfect.

> That's the goal here - fixed, small reservation for extent freeing
> that is decoupled from and independent of the number of extents that
> need to be freed atomically by the high level operation...
>
> Hence I think that a minor amount of rework will allow the EFI code
> to log large numbers of extents to be freed whilst still processing
> them within single AG free space tree modification reservation
> bounds...


--
chandan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] xfs: factor free space tree transaciton reservations
  2021-05-27  4:52 ` [PATCH 5/6] xfs: factor free space tree transaciton reservations Dave Chinner
@ 2021-06-02 21:36   ` Darrick J. Wong
  0 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2021-06-02 21:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, May 27, 2021 at 02:52:01PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Convert all the open coded free space tree modification reservations
> to use the new xfs_allocfree_extent_res() function.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_trans_resv.c | 122 ++++++++++++---------------------
>  1 file changed, 45 insertions(+), 77 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> index 6363cacb790f..02079f55ef20 100644
> --- a/fs/xfs/libxfs/xfs_trans_resv.c
> +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> @@ -143,18 +143,16 @@ xfs_calc_inode_res(
>   * reservation:
>   *
>   * the inode btree: max depth * blocksize
> - * the allocation btrees: 2 trees * (max depth - 1) * block size
> + * one extent allocfree reservation for the AG.
>   *
> - * The caller must account for SB and AG header modifications, etc.
>   */
>  STATIC uint
>  xfs_calc_inobt_res(
>  	struct xfs_mount	*mp)
>  {
>  	return xfs_calc_buf_res(M_IGEO(mp)->inobt_maxlevels,
> -			XFS_FSB_TO_B(mp, 1)) +
> -				xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -			XFS_FSB_TO_B(mp, 1));
> +				XFS_FSB_TO_B(mp, 1)) +
> +		xfs_allocfree_extent_res(mp);

Hm.  xfs_allocfree_extent_res computes the space to handle full splits
of both free space btrees and updates to the AGFL and AGF.

This function did not previously account for AG header updates -- it
only computes the space needed to log a split of an inode btree and
splits of both free space btrees.  It looks like xfs_calc_inobt_res's
callers themselves add reservation space for AG headers.

So, doesn't this introduce double-counting of the AGF for inode
operations?  It's probably not a big deal to allocate more space, but
this doesn't look like a straight refactoring.

>  }
>  
>  /*
> @@ -182,7 +180,7 @@ xfs_calc_finobt_res(
>   * Calculate the reservation required to allocate or free an inode chunk. This
>   * includes:
>   *
> - * the allocation btrees: 2 trees * (max depth - 1) * block size
> + * one extent allocfree reservation for the AG.
>   * the inode chunk: m_ino_geo.ialloc_blks * N
>   *
>   * The size N of the inode chunk reservation depends on whether it is for
> @@ -200,8 +198,7 @@ xfs_calc_inode_chunk_res(
>  {
>  	uint			res, size = 0;
>  
> -	res = xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -			       XFS_FSB_TO_B(mp, 1));
> +	res = xfs_allocfree_extent_res(mp);

Same question here.

>  	if (alloc) {
>  		/* icreate tx uses ordered buffers */
>  		if (xfs_sb_version_has_v3inode(&mp->m_sb))
> @@ -256,22 +253,18 @@ xfs_rtalloc_log_count(
>   * extents.  This gives (t1):
>   *    the inode getting the new extents: inode size
>   *    the inode's bmap btree: max depth * block size
> - *    the agfs of the ags from which the extents are allocated: 2 * sector
>   *    the superblock free block counter: sector size
> - *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> + *    two extent allocfree reservations for the AG.
>   * Or, if we're writing to a realtime file (t2):
>   *    the inode getting the new extents: inode size
>   *    the inode's bmap btree: max depth * block size
> - *    the agfs of the ags from which the extents are allocated: 2 * sector
> + *    one extent allocfree reservation for the AG.
>   *    the superblock free block counter: sector size
>   *    the realtime bitmap: ((MAXEXTLEN / rtextsize) / NBBY) bytes
>   *    the realtime summary: 1 block
> - *    the allocation btrees: 2 trees * (2 * max depth - 1) * block size
>   * And the bmap_finish transaction can free bmap blocks in a join (t3):
> - *    the agfs of the ags containing the blocks: 2 * sector size
> - *    the agfls of the ags containing the blocks: 2 * sector size
>   *    the super block free block counter: sector size
> - *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> + *    two extent allocfree reservations for the AG.
>   */
>  STATIC uint
>  xfs_calc_write_reservation(
> @@ -282,8 +275,8 @@ xfs_calc_write_reservation(
>  
>  	t1 = xfs_calc_inode_res(mp, 1) +
>  	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), blksz) +
> -	     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
> -	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
> +	     xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +	     xfs_allocfree_extent_res(mp) * 2;

This conversion looks ok.

>  
>  	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
>  		t2 = xfs_calc_inode_res(mp, 1) +
> @@ -291,13 +284,13 @@ xfs_calc_write_reservation(
>  				     blksz) +
>  		     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
>  		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 1), blksz) +
> -		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1), blksz);
> +		     xfs_allocfree_extent_res(mp);

Shouldn't this change the 3 above to 1?

Same questions for the rest of this patch.

--D

>  	} else {
>  		t2 = 0;
>  	}
>  
> -	t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
> -	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
> +	t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +	     xfs_allocfree_extent_res(mp) * 2;
>  
>  	return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3);
>  }
> @@ -307,19 +300,13 @@ xfs_calc_write_reservation(
>   *    the inode being truncated: inode size
>   *    the inode's bmap btree: (max depth + 1) * block size
>   * And the bmap_finish transaction can free the blocks and bmap blocks (t2):
> - *    the agf for each of the ags: 4 * sector size
> - *    the agfl for each of the ags: 4 * sector size
>   *    the super block to reflect the freed blocks: sector size
> - *    worst case split in allocation btrees per extent assuming 4 extents:
> - *		4 exts * 2 trees * (2 * max depth - 1) * block size
> + *    four extent allocfree reservations for the AG.
>   * Or, if it's a realtime file (t3):
> - *    the agf for each of the ags: 2 * sector size
> - *    the agfl for each of the ags: 2 * sector size
>   *    the super block to reflect the freed blocks: sector size
>   *    the realtime bitmap: 2 exts * ((MAXEXTLEN / rtextsize) / NBBY) bytes
>   *    the realtime summary: 2 exts * 1 block
> - *    worst case split in allocation btrees per extent assuming 2 extents:
> - *		2 exts * 2 trees * (2 * max depth - 1) * block size
> + *    two extent allocfree reservations for the AG.
>   */
>  STATIC uint
>  xfs_calc_itruncate_reservation(
> @@ -331,13 +318,13 @@ xfs_calc_itruncate_reservation(
>  	t1 = xfs_calc_inode_res(mp, 1) +
>  	     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1, blksz);
>  
> -	t2 = xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
> -	     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4), blksz);
> +	t2 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +	     xfs_allocfree_extent_res(mp) * 4;
>  
>  	if (xfs_sb_version_hasrealtime(&mp->m_sb)) {
> -		t3 = xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
> +		t3 = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
>  		     xfs_calc_buf_res(xfs_rtalloc_log_count(mp, 2), blksz) +
> -		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2), blksz);
> +		     xfs_allocfree_extent_res(mp) * 2;
>  	} else {
>  		t3 = 0;
>  	}
> @@ -352,10 +339,8 @@ xfs_calc_itruncate_reservation(
>   *    the two directory bmap btrees: 2 * max depth * block size
>   * And the bmap_finish transaction can free dir and bmap blocks (two sets
>   *	of bmap blocks) giving:
> - *    the agf for the ags in which the blocks live: 3 * sector size
> - *    the agfl for the ags in which the blocks live: 3 * sector size
>   *    the superblock for the free block count: sector size
> - *    the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
> + *    three extent allocfree reservations for the AG.
>   */
>  STATIC uint
>  xfs_calc_rename_reservation(
> @@ -365,9 +350,8 @@ xfs_calc_rename_reservation(
>  		max((xfs_calc_inode_res(mp, 4) +
>  		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
>  				      XFS_FSB_TO_B(mp, 1))),
> -		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
> -		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 3),
> -				      XFS_FSB_TO_B(mp, 1))));
> +		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +		     xfs_allocfree_extent_res(mp) * 3));
>  }
>  
>  /*
> @@ -381,20 +365,19 @@ xfs_calc_iunlink_remove_reservation(
>  	struct xfs_mount        *mp)
>  {
>  	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -	       2 * M_IGEO(mp)->inode_cluster_size;
> +	       xfs_calc_buf_res(2, M_IGEO(mp)->inode_cluster_size);
>  }
>  
>  /*
>   * For creating a link to an inode:
> + *    the inode is removed from the iunlink list (O_TMPFILE)
>   *    the parent directory inode: inode size
>   *    the linked inode: inode size
>   *    the directory btree could split: (max depth + v2) * dir block size
>   *    the directory bmap btree could join or split: (max depth + v2) * blocksize
>   * And the bmap_finish transaction can free some bmap blocks giving:
> - *    the agf for the ag in which the blocks live: sector size
> - *    the agfl for the ag in which the blocks live: sector size
>   *    the superblock for the free block count: sector size
> - *    the allocation btrees: 2 trees * (2 * max depth - 1) * block size
> + *    one extent allocfree reservation for the AG.
>   */
>  STATIC uint
>  xfs_calc_link_reservation(
> @@ -405,9 +388,8 @@ xfs_calc_link_reservation(
>  		max((xfs_calc_inode_res(mp, 2) +
>  		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
>  				      XFS_FSB_TO_B(mp, 1))),
> -		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
> -		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -				      XFS_FSB_TO_B(mp, 1))));
> +		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +		     xfs_allocfree_extent_res(mp)));
>  }
>  
>  /*
> @@ -419,20 +401,19 @@ STATIC uint
>  xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
>  {
>  	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> -			M_IGEO(mp)->inode_cluster_size;
> +	       xfs_calc_buf_res(1, M_IGEO(mp)->inode_cluster_size);
>  }
>  
>  /*
>   * For removing a directory entry we can modify:
> + *    the inode is added to the agi unlinked list
>   *    the parent directory inode: inode size
>   *    the removed inode: inode size
>   *    the directory btree could join: (max depth + v2) * dir block size
>   *    the directory bmap btree could join or split: (max depth + v2) * blocksize
>   * And the bmap_finish transaction can free the dir and bmap blocks giving:
> - *    the agf for the ag in which the blocks live: 2 * sector size
> - *    the agfl for the ag in which the blocks live: 2 * sector size
>   *    the superblock for the free block count: sector size
> - *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> + *    two extent allocfree reservations for the AG.
>   */
>  STATIC uint
>  xfs_calc_remove_reservation(
> @@ -443,9 +424,8 @@ xfs_calc_remove_reservation(
>  		max((xfs_calc_inode_res(mp, 1) +
>  		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
>  				      XFS_FSB_TO_B(mp, 1))),
> -		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
> -		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
> -				      XFS_FSB_TO_B(mp, 1))));
> +		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +		     xfs_allocfree_extent_res(mp) * 2));
>  }
>  
>  /*
> @@ -581,16 +561,14 @@ xfs_calc_ichange_reservation(
>  /*
>   * Growing the data section of the filesystem.
>   *	superblock
> - *	agi and agf
> - *	allocation btrees
> + *      one extent allocfree reservation for the AG.
>   */
>  STATIC uint
>  xfs_calc_growdata_reservation(
>  	struct xfs_mount	*mp)
>  {
> -	return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
> -		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -				 XFS_FSB_TO_B(mp, 1));
> +	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +	       xfs_allocfree_extent_res(mp);
>  }
>  
>  /*
> @@ -598,10 +576,9 @@ xfs_calc_growdata_reservation(
>   * In the first set of transactions (ALLOC) we allocate space to the
>   * bitmap or summary files.
>   *	superblock: sector size
> - *	agf of the ag from which the extent is allocated: sector size
>   *	bmap btree for bitmap/summary inode: max depth * blocksize
>   *	bitmap/summary inode: inode size
> - *	allocation btrees for 1 block alloc: 2 * (2 * maxdepth - 1) * blocksize
> + *      one extent allocfree reservation for the AG.
>   */
>  STATIC uint
>  xfs_calc_growrtalloc_reservation(
> @@ -611,8 +588,7 @@ xfs_calc_growrtalloc_reservation(
>  		xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
>  				 XFS_FSB_TO_B(mp, 1)) +
>  		xfs_calc_inode_res(mp, 1) +
> -		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -				 XFS_FSB_TO_B(mp, 1));
> +		xfs_allocfree_extent_res(mp);
>  }
>  
>  /*
> @@ -675,7 +651,7 @@ xfs_calc_writeid_reservation(
>   *	agf block and superblock (for block allocation)
>   *	the new block (directory sized)
>   *	bmap blocks for the new directory block
> - *	allocation btrees
> + *      one extent allocfree reservation for the AG.
>   */
>  STATIC uint
>  xfs_calc_addafork_reservation(
> @@ -687,8 +663,7 @@ xfs_calc_addafork_reservation(
>  		xfs_calc_buf_res(1, mp->m_dir_geo->blksize) +
>  		xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
>  				 XFS_FSB_TO_B(mp, 1)) +
> -		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> -				 XFS_FSB_TO_B(mp, 1));
> +		xfs_allocfree_extent_res(mp);
>  }
>  
>  /*
> @@ -696,11 +671,8 @@ xfs_calc_addafork_reservation(
>   *    the inode being truncated: inode size
>   *    the inode's bmap btree: max depth * block size
>   * And the bmap_finish transaction can free the blocks and bmap blocks:
> - *    the agf for each of the ags: 4 * sector size
> - *    the agfl for each of the ags: 4 * sector size
>   *    the super block to reflect the freed blocks: sector size
> - *    worst case split in allocation btrees per extent assuming 4 extents:
> - *		4 exts * 2 trees * (2 * max depth - 1) * block size
> + *    four extent allocfree reservations for the AG.
>   */
>  STATIC uint
>  xfs_calc_attrinval_reservation(
> @@ -709,9 +681,8 @@ xfs_calc_attrinval_reservation(
>  	return max((xfs_calc_inode_res(mp, 1) +
>  		    xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
>  				     XFS_FSB_TO_B(mp, 1))),
> -		   (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
> -		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
> -				     XFS_FSB_TO_B(mp, 1))));
> +		   (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +		    xfs_allocfree_extent_res(mp) * 4));
>  }
>  
>  /*
> @@ -760,10 +731,8 @@ xfs_calc_attrsetrt_reservation(
>   *    the attribute btree could join: max depth * block size
>   *    the inode bmap btree could join or split: max depth * block size
>   * And the bmap_finish transaction can free the attr blocks freed giving:
> - *    the agf for the ag in which the blocks live: 2 * sector size
> - *    the agfl for the ag in which the blocks live: 2 * sector size
>   *    the superblock for the free block count: sector size
> - *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> + *    two extent allocfree reservations for the AG.
>   */
>  STATIC uint
>  xfs_calc_attrrm_reservation(
> @@ -776,9 +745,8 @@ xfs_calc_attrrm_reservation(
>  		     (uint)XFS_FSB_TO_B(mp,
>  					XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
>  		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
> -		    (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
> -		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
> -				      XFS_FSB_TO_B(mp, 1))));
> +		    (xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) +
> +		     xfs_allocfree_extent_res(mp) * 2));
>  }
>  
>  /*
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/6] xfs: add a free space extent change reservation
  2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
                     ` (3 preceding siblings ...)
  2021-05-27  7:03     ` kernel test robot
@ 2021-06-02 21:37   ` Darrick J. Wong
  4 siblings, 0 replies; 30+ messages in thread
From: Darrick J. Wong @ 2021-06-02 21:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, May 27, 2021 at 02:52:00PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Lots of the transaction reservation code reserves space for an
> extent allocation. It is inconsistently implemented, and many of
> them get it wrong. Introduce a new function to calculate the log
> space reservation for adding or removing an extent from the free
> space btrees.
> 
> This function reserves space for logging the AGF, the AGFL and the
> free space btrees, avoiding the need to account for them seperately
> in every reservation that manipulates free space.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_trans_resv.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> index d1a0848cb52e..6363cacb790f 100644
> --- a/fs/xfs/libxfs/xfs_trans_resv.c
> +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> @@ -79,6 +79,23 @@ xfs_allocfree_log_count(
>  	return blocks;
>  }
>  
> +/*
> + * Log reservation required to add or remove a single extent to the free space
> + * btrees.  This requires modifying:
> + *
> + * the agf header: 1 sector
> + * the agfl header: 1 sector
> + * the allocation btrees: 2 trees * (max depth - 1) * block size
> + */
> +uint
> +xfs_allocfree_extent_res(
> +	struct xfs_mount *mp)
> +{
> +	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
> +	       xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
> +				XFS_FSB_TO_B(mp, 1));
> +}
> +

No caller?  I think the next patch should get merged into this one.

--D

>  /*
>   * Logging inodes is really tricksy. They are logged in memory format,
>   * which means that what we write into the log doesn't directly translate into
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-06-02 21:37 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-27  4:51 [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Dave Chinner
2021-05-27  4:51 ` [PATCH 1/6] xfs: btree format inode forks can have zero extents Dave Chinner
2021-05-27  6:15   ` Darrick J. Wong
2021-05-27  4:51 ` [PATCH 2/6] xfs: bunmapi has unnecessary AG lock ordering issues Dave Chinner
2021-05-27  6:16   ` Darrick J. Wong
2021-05-27  4:51 ` [PATCH 3/6] xfs: xfs_itruncate_extents has no extent count limitation Dave Chinner
2021-05-31 12:55   ` Chandan Babu R
2021-05-31 13:05     ` Chandan Babu R
2021-05-31 23:28       ` Dave Chinner
2021-06-01  6:42         ` Chandan Babu R
2021-05-27  4:52 ` [PATCH 4/6] xfs: add a free space extent change reservation Dave Chinner
2021-05-27  6:38   ` kernel test robot
2021-05-27  6:38     ` kernel test robot
2021-05-27  6:38   ` kernel test robot
2021-05-27  6:38     ` kernel test robot
2021-05-27  7:03   ` kernel test robot
2021-05-27  7:03     ` kernel test robot
2021-05-27  7:03   ` [RFC PATCH] xfs: xfs_allocfree_extent_res can be static kernel test robot
2021-05-27  7:03     ` kernel test robot
2021-06-02 21:37   ` [PATCH 4/6] xfs: add a free space extent change reservation Darrick J. Wong
2021-05-27  4:52 ` [PATCH 5/6] xfs: factor free space tree transaciton reservations Dave Chinner
2021-06-02 21:36   ` Darrick J. Wong
2021-05-27  4:52 ` [PATCH 6/6] xfs: reduce transaction reservation for freeing extents Dave Chinner
2021-05-27  6:19   ` Darrick J. Wong
2021-05-27  8:52     ` Dave Chinner
2021-05-28  0:01       ` Darrick J. Wong
2021-05-28  2:30         ` Dave Chinner
2021-05-28  5:30           ` Darrick J. Wong
2021-05-31 10:02 ` [PATCH 0/6] xfs: bunmapi needs updating for deferred freeing Chandan Babu R
2021-05-31 22:41   ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.