linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res
@ 2019-09-12 14:32 Brian Foster
  2019-09-12 14:32 ` [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs Brian Foster
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Brian Foster @ 2019-09-12 14:32 UTC (permalink / raw)
  To: linux-xfs; +Cc: Carlos Maiolino

Hi all,

This is a repost of a couple patches I posted a few months ago[1]. There
are no changes other than a rebase to for-next. Any thoughts on these? I
think Carlos had also run into some related generic/223 failures fairly
recently...

Carlos,

Any chance you could give these a try?

Brian

[1] https://lore.kernel.org/linux-xfs/20190501140504.16435-1-bfoster@redhat.com/

Brian Foster (2):
  xfs: drop minlen before tossing alignment on bmap allocs
  xfs: don't set bmapi total block req where minleft is sufficient

 fs/xfs/libxfs/xfs_bmap.c | 13 +++++++++----
 fs/xfs/xfs_bmap_util.c   |  4 ++--
 fs/xfs/xfs_dquot.c       |  4 ++--
 fs/xfs/xfs_iomap.c       |  4 ++--
 fs/xfs/xfs_reflink.c     |  4 ++--
 fs/xfs/xfs_rtalloc.c     |  3 +--
 6 files changed, 18 insertions(+), 14 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-12 14:32 [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
@ 2019-09-12 14:32 ` Brian Foster
  2019-09-12 22:35   ` Dave Chinner
  2019-09-18 21:41   ` Darrick J. Wong
  2019-09-12 14:32 ` [PATCH REPOST 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Brian Foster @ 2019-09-12 14:32 UTC (permalink / raw)
  To: linux-xfs; +Cc: Carlos Maiolino

The bmap block allocation code issues a sequence of retries to
perform an optimal allocation, gradually loosening constraints as
allocations fail. For example, the first attempt might begin at a
particular bno, with maxlen == minlen and alignment incorporated. As
allocations fail, the parameters fall back to different modes, drop
alignment requirements and reduce the minlen and total block
requirements.

For large extent allocations with an args.total value that exceeds
the allocation length (i.e., non-delalloc), the total value tends to
dominate despite these fallbacks. For example, an aligned extent
allocation request of tens to hundreds of MB that cannot be
satisfied from a particular AG will not succeed after dropping
alignment or minlen because xfs_alloc_space_available() never
selects an AG that can't satisfy args.total. The retry sequence
eventually reduces total and ultimately succeeds if a minlen extent
is available somewhere, but the first several retries are
effectively pointless in this scenario.

Beyond simply being inefficient, another side effect of this
behavior is that we drop alignment requirements too aggressively.
Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
unit:

 # xfs_io -c "falloc 0 1g" /mnt/file
 # <xfstests>/src/t_stripealign /mnt/file 32
 /mnt/file: Start block 347176 not multiple of sunit 32

Despite the filesystem being completely empty, the fact that the
allocation request cannot be satisifed from a single AG means the
allocation doesn't succeed until xfs_bmap_btalloc() drops total from
the original value based on maxlen. This occurs after we've dropped
minlen and alignment (unnecessarily).

As a step towards addressing this problem, insert a new retry in the
bmap allocation sequence to drop minlen (from maxlen) before tossing
alignment. This should still result in as large of an extent as
possible as the block allocator prioritizes extent size in all but
exact allocation modes. By itself, this does not change the behavior
of the command above because the preallocation code still specifies
total based on maxlen. Instead, this facilitates preservation of
alignment once extra reservation is separated from the extent length
portion of the total block requirement.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 054b4ce30033..eaa965920a03 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3573,6 +3573,14 @@ xfs_bmap_btalloc(
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
 	}
+	if (args.fsbno == NULLFSBLOCK && nullfb &&
+	    args.minlen > ap->minlen) {
+		args.minlen = ap->minlen;
+		args.fsbno = ap->blkno;
+		error = xfs_alloc_vextent(&args);
+		if (error)
+			return error;
+	}
 	if (isaligned && args.fsbno == NULLFSBLOCK) {
 		/*
 		 * allocation failed, so turn off alignment and
@@ -3584,9 +3592,7 @@ xfs_bmap_btalloc(
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
 	}
-	if (args.fsbno == NULLFSBLOCK && nullfb &&
-	    args.minlen > ap->minlen) {
-		args.minlen = ap->minlen;
+	if (args.fsbno == NULLFSBLOCK && nullfb) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 		args.fsbno = ap->blkno;
 		if ((error = xfs_alloc_vextent(&args)))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH REPOST 2/2] xfs: don't set bmapi total block req where minleft is sufficient
  2019-09-12 14:32 [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
  2019-09-12 14:32 ` [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs Brian Foster
@ 2019-09-12 14:32 ` Brian Foster
  2019-09-18 21:55   ` Darrick J. Wong
  2019-09-16  8:18 ` [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res Carlos Maiolino
  2019-10-18 17:17 ` Darrick J. Wong
  3 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2019-09-12 14:32 UTC (permalink / raw)
  To: linux-xfs; +Cc: Carlos Maiolino

xfs_bmapi_write() takes a total block requirement parameter that is
passed down to the block allocation code and is used to specify the
total block requirement of the associated transaction. This is used
to try and select an AG that can not only satisfy the requested
extent allocation, but can also accommodate subsequent allocations
that might be required to complete the transaction. For example,
additional bmbt block allocations may be required on insertion of
the resulting extent to an inode data fork.

While it's important for callers to calculate and reserve such extra
blocks in the transaction, it is not necessary to pass the total
value to xfs_bmapi_write() in all cases. The latter automatically
sets minleft to ensure that sufficient free blocks remain after the
allocation attempt to expand the format of the associated inode
(i.e., such as extent to btree conversion, btree splits, etc).
Therefore, any callers that pass a total block requirement of the
bmap mapping length plus worst case bmbt expansion essentially
specify the additional reservation requirement twice. These callers
can pass a total of zero to rely on the bmapi minleft policy.

Beyond being superfluous, the primary motivation for this change is
that the total reservation logic in the bmbt code is dubious in
scenarios where minlen < maxlen and a maxlen extent cannot be
allocated (which is more common for data extent allocations where
contiguity is not required). The total value is based on maxlen in
the xfs_bmapi_write() caller. If the bmbt code falls back to an
allocation between minlen and maxlen, that allocation will not
succeed until total is reset to minlen, which essentially throws
away any additional reservation included in total by the caller. In
addition, the total value is not reset until after alignment is
dropped, which means that such callers drop alignment far too
aggressively than necessary.

Update all callers of xfs_bmapi_write() that pass a total block
value of the mapping length plus bmbt reservation to instead pass
zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
requirement. This trades off slightly less conservative AG selection
for the ability to preserve alignment in more scenarios.
xfs_bmapi_write() callers that incorporate unrelated or additional
reservations in total beyond what is already included in minleft
must continue to use the former.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 1 -
 fs/xfs/xfs_bmap_util.c   | 4 ++--
 fs/xfs/xfs_dquot.c       | 4 ++--
 fs/xfs/xfs_iomap.c       | 4 ++--
 fs/xfs/xfs_reflink.c     | 4 ++--
 fs/xfs/xfs_rtalloc.c     | 3 +--
 6 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index eaa965920a03..c2f0afdf2f65 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4505,7 +4505,6 @@ xfs_bmapi_convert_delalloc(
 	bma.wasdel = true;
 	bma.offset = bma.got.br_startoff;
 	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
-	bma.total = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
 	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
 	if (whichfork == XFS_COW_FORK)
 		bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 0910cb75b65d..079aedade656 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -962,8 +962,8 @@ xfs_alloc_file_space(
 		xfs_trans_ijoin(tp, ip, 0);
 
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
-					allocatesize_fsb, alloc_type, resblks,
-					imapp, &nimaps);
+					allocatesize_fsb, alloc_type, 0, imapp,
+					&nimaps);
 		if (error)
 			goto error0;
 
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index aeb95e7391c1..b924dbd63a7d 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -305,8 +305,8 @@ xfs_dquot_disk_alloc(
 	/* Create the block mapping. */
 	xfs_trans_ijoin(tp, quotip, XFS_ILOCK_EXCL);
 	error = xfs_bmapi_write(tp, quotip, dqp->q_fileoffset,
-			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA,
-			XFS_QM_DQALLOC_SPACE_RES(mp), &map, &nmaps);
+			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0, &map,
+			&nmaps);
 	if (error)
 		return error;
 	ASSERT(map.br_blockcount == XFS_DQUOT_CLUSTER_SIZE_FSB);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f780e223b118..27f2030690e2 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -277,8 +277,8 @@ xfs_iomap_write_direct(
 	 * caller gave to us.
 	 */
 	nimaps = 1;
-	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
-				bmapi_flags, resblks, imap, &nimaps);
+	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flags, 0,
+				imap, &nimaps);
 	if (error)
 		goto out_res_cancel;
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 0f08153b4994..3aa56cac1a47 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -410,8 +410,8 @@ xfs_reflink_allocate_cow(
 	/* Allocate the entire reservation as unwritten blocks. */
 	nimaps = 1;
 	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
-			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
-			resblks, imap, &nimaps);
+			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, imap,
+			&nimaps);
 	if (error)
 		goto out_unreserve;
 
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 4a48a8c75b4f..d42b5a2047e0 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -792,8 +792,7 @@ xfs_growfs_rt_alloc(
 		 */
 		nmap = 1;
 		error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
-					XFS_BMAPI_METADATA, resblks, &map,
-					&nmap);
+					XFS_BMAPI_METADATA, 0, &map, &nmap);
 		if (!error && nmap < 1)
 			error = -ENOSPC;
 		if (error)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-12 14:32 ` [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs Brian Foster
@ 2019-09-12 22:35   ` Dave Chinner
  2019-09-12 22:46     ` Dave Chinner
                       ` (2 more replies)
  2019-09-18 21:41   ` Darrick J. Wong
  1 sibling, 3 replies; 17+ messages in thread
From: Dave Chinner @ 2019-09-12 22:35 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, Carlos Maiolino

On Thu, Sep 12, 2019 at 10:32:22AM -0400, Brian Foster wrote:
> The bmap block allocation code issues a sequence of retries to
> perform an optimal allocation, gradually loosening constraints as
> allocations fail. For example, the first attempt might begin at a
> particular bno, with maxlen == minlen and alignment incorporated. As
> allocations fail, the parameters fall back to different modes, drop
> alignment requirements and reduce the minlen and total block
> requirements.
> 
> For large extent allocations with an args.total value that exceeds
> the allocation length (i.e., non-delalloc), the total value tends to
> dominate despite these fallbacks. For example, an aligned extent
> allocation request of tens to hundreds of MB that cannot be
> satisfied from a particular AG will not succeed after dropping
> alignment or minlen because xfs_alloc_space_available() never
> selects an AG that can't satisfy args.total. The retry sequence
> eventually reduces total and ultimately succeeds if a minlen extent
> is available somewhere, but the first several retries are
> effectively pointless in this scenario.
> 
> Beyond simply being inefficient, another side effect of this
> behavior is that we drop alignment requirements too aggressively.
> Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
> unit:
> 
>  # xfs_io -c "falloc 0 1g" /mnt/file
>  # <xfstests>/src/t_stripealign /mnt/file 32
>  /mnt/file: Start block 347176 not multiple of sunit 32

Ok, so what Carlos and I found last night was an issue with the
the agresv code leading to the maximum free extent calculated
by xfs_alloc_longest_free_extent() being longer than the largest
allowable extent allocation (mp->m_ag_max_usable) resulting in the
situation where blen > args->maxlen, and so in the case of initial
allocation here, we never run this:

	/*
	 * Adjust for alignment
	 */
	if (blen > args.alignment && blen <= args.maxlen)
		args.minlen = blen - args.alignment;
	args.minalignslop = 0;

this is how we end up with args.minlen = args.maxlen and the
initial allocation failing.

The issue is the way mp->m_ag_max_usable is calculated versus how
the pag->pag_meta_resv.ar_reserved value is set up for the finobt.
That is, "ask" = max tree size, and "used" = 1 because we have a
root block allocated. that code does:

	mp->m_ag_max_unused -= ask;
...
	pag->pag_meta_resv.ar_reserved = ask - used

That means when we calculate the longest extent in the AG, we do:

	longest = pag->pagf_longest - min_needed - resv(NONE)
		= pag->pagf_longest - min_needed - pag->pag_meta_resv.ar_reserved

whilst mp->m_ag_max_usable is calculated as

	usable = agf_length - AG header blocks - AGFL - resv(ask)

When the AG is empty, this ends up with

	pag->pagf_longest = agf_length - AG header blocks
and
	min_needed = AGFL blocks
and
	resv(ask) = pag->pag_meta_resv.ar_reserved + 1

and so:
	longest = usable + 1

And that's how we get blen = maxlen + 1, and that's why alignment is
being dropped for the initial allocation in this "allocate full AG"
corner case.

> Despite the filesystem being completely empty, the fact that the
> allocation request cannot be satisifed from a single AG means the
> allocation doesn't succeed until xfs_bmap_btalloc() drops total from
> the original value based on maxlen. This occurs after we've dropped
> minlen and alignment (unnecessarily).

Right, we'll continue to fail until minlen is reduced appropriately.
But that's not an issue in the fallback algorithms, that's a problem
with the initial conditions not being set up correctly.

> As a step towards addressing this problem, insert a new retry in the
> bmap allocation sequence to drop minlen (from maxlen) before tossing
> alignment. This should still result in as large of an extent as
> possible as the block allocator prioritizes extent size in all but
> exact allocation modes. By itself, this does not change the behavior
> of the command above because the preallocation code still specifies
> total based on maxlen. Instead, this facilitates preservation of
> alignment once extra reservation is separated from the extent length
> portion of the total block requirement.

AFAICT this is not necessary. The prototypoe patch I wrote last
night while working through this with Carlos is attached below. I
updated with a variant of your patch 2 to demonstrate that it does
actually solve the problem of full AG allocation failing to be
aligned.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

xfs: cap longest free extent to maximum allocatable

From: Dave Chinner <dchinner@redhat.com>

Cap longest extent to the largest we can allocate based on limits
calculated at mount time. Dynamic state (such as finobt blocks)
can result in the longest free extent exceeding the size we can
allocate, and that results in failure to align full AG allocations
when the AG is empty.

Result:

xfs_io-4413  [003]   426.412459: xfs_alloc_vextent_loopfailed: dev 8:96 agno 0 agbno 32 minlen 243968 maxlen 244000 mod 0 prod 1 minleft 1 total 262148 alignment 32 minalignslop 0 len 0 type NEAR_BNO otype START_BNO wasdel 0 wasfromfl 0 resv 0 datatype 0x5 firstblock 0xffffffffffffffff

minlen and maxlen are now separated by the alignment size, and
allocation fails because args.total > free space in the AG.

Update: add a hacked version of Brian's xfs_bmapi_write() args.total
hack to actually allow this initial aligned allocation to succeed:

$ sudo mkfs.xfs -f -d su=128k,sw=4,size=15g /dev/sdg
meta-data=/dev/sdg               isize=512    agcount=16, agsize=245728 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=3931648, imaxpct=25
         =                       sunit=32     swidth=128 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount /dev/sdg /mnt/test
$ sudo xfs_io -f -c "falloc 0 1g" -c "bmap -vvp" /mnt/test/file
/mnt/test/file:
 EXT: FILE-OFFSET         BLOCK-RANGE      AG AG-OFFSET          TOTAL FLAGS
   0: [0..1951999]:       1966080..3918079  1 (256..1952255)   1952000 010001
   1: [1952000..2097151]: 3931904..4077055  2 (256..145407)     145152 010011
 FLAG Values:
    0100000 Shared extent
    0010000 Unwritten preallocated extent
    0001000 Doesn't begin on stripe unit
    0000100 Doesn't end   on stripe unit
    0000010 Doesn't begin on stripe width
    0000001 Doesn't end   on stripe width
$ sudo ~/src/xfstests-dev/src/t_stripealign /mnt/test/file 32
/mnt/test/file: well-aligned
$

Note that it skips AG 1 now because AG 0 is not the AG with the
longest free space - it's got a root inode chunk in it so it has
less space in it than the other AGs. Hence it can't do a maximally
sized aligned allocation, while AG 1 can. Note the difference in
total compared to the initial trace above.

xfs_io-4398  [003]   207.663502: xfs_ag_resv_needed:   dev 8:96 agno 0 resv 0 freeblks 245707 flcount 4 resv 0 ask 0 len 1713
xfs_io-4398  [003]   207.663502: xfs_alloc_vextent_loopfailed: dev 8:96 agno 0 agbno 32 minlen 243968 maxlen 244000 mod 0 prod 1 minleft 1 total 0 alignment 32 minalignslop 0 len 0 type NEAR_BNO otype START_BNO wasdel 0 wasfromfl 0 resv 0 datatype 0x5 firstblock 0xffffffffffffffff
....
xfs_io-4398  [003]   207.663503: xfs_ag_resv_needed:   dev 8:96 agno 1 resv 0 freeblks 245715 flcount 4 resv 0 ask 0 len 1713
.....
xfs_io-4398  [003]   207.666010: xfs_alloc_size_done:  dev 8:96 agno 1 agbno 32 minlen 243968 maxlen 244000 mod 0 prod 1 minleft 1 total 0 alignment 32 minalignslop 0 len 244000 type THIS_AG otype START_BNO wasdel 0 wasfromfl 0 resv 0 datatype 0x5 firstblock 0xffffffffffffffff

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c |  3 ++-
 fs/xfs/libxfs/xfs_bmap.c  | 14 ++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 533b04aaf6f6..9dead25d2e70 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1989,7 +1989,8 @@ xfs_alloc_longest_free_extent(
 	 * reservations and AGFL rules in place, we can return this extent.
 	 */
 	if (pag->pagf_longest > delta)
-		return pag->pagf_longest - delta;
+		return min_t(xfs_extlen_t, pag->pag_mount->m_ag_max_usable,
+				pag->pagf_longest - delta);
 
 	/* Otherwise, let the caller try for 1 block if there's space. */
 	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 054b4ce30033..b05683f649a6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4286,6 +4286,20 @@ xfs_bmapi_write(
 #endif
 	whichfork = xfs_bmapi_whichfork(flags);
 
+	/*
+	 * XXX: Hack!
+	 *
+	 * If the total blocks requested is larger than an AG, we can't allocate
+	 * all the space atomically and within a single AG. This will be a
+	 * "short" allocation. In this case, just ignore the total block count
+	 * and rely on minleft calculations to ensure the allocation we do fits
+	 * inside an AG properly.
+	 *
+	 * Based on a patch from Brian.
+	 */
+	if (bma.total > mp->m_ag_max_usable)
+		bma.total = 0;
+
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
 	ASSERT(tp != NULL);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-12 22:35   ` Dave Chinner
@ 2019-09-12 22:46     ` Dave Chinner
  2019-09-13 14:58     ` Brian Foster
  2019-09-17 12:22     ` Carlos Maiolino
  2 siblings, 0 replies; 17+ messages in thread
From: Dave Chinner @ 2019-09-12 22:46 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, Carlos Maiolino

On Fri, Sep 13, 2019 at 08:35:19AM +1000, Dave Chinner wrote:
> On Thu, Sep 12, 2019 at 10:32:22AM -0400, Brian Foster wrote:
> > The bmap block allocation code issues a sequence of retries to
> > perform an optimal allocation, gradually loosening constraints as
> > allocations fail. For example, the first attempt might begin at a
> > particular bno, with maxlen == minlen and alignment incorporated. As
> > allocations fail, the parameters fall back to different modes, drop
> > alignment requirements and reduce the minlen and total block
> > requirements.
> > 
> > For large extent allocations with an args.total value that exceeds
> > the allocation length (i.e., non-delalloc), the total value tends to
> > dominate despite these fallbacks. For example, an aligned extent
> > allocation request of tens to hundreds of MB that cannot be
> > satisfied from a particular AG will not succeed after dropping
> > alignment or minlen because xfs_alloc_space_available() never
> > selects an AG that can't satisfy args.total. The retry sequence
> > eventually reduces total and ultimately succeeds if a minlen extent
> > is available somewhere, but the first several retries are
> > effectively pointless in this scenario.
> > 
> > Beyond simply being inefficient, another side effect of this
> > behavior is that we drop alignment requirements too aggressively.
> > Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
> > unit:
> > 
> >  # xfs_io -c "falloc 0 1g" /mnt/file
> >  # <xfstests>/src/t_stripealign /mnt/file 32
> >  /mnt/file: Start block 347176 not multiple of sunit 32
> 
> Ok, so what Carlos and I found last night was an issue with the
> the agresv code leading to the maximum free extent calculated
> by xfs_alloc_longest_free_extent() being longer than the largest
> allowable extent allocation (mp->m_ag_max_usable) resulting in the
> situation where blen > args->maxlen, and so in the case of initial
> allocation here, we never run this:

Just to make it clear: carlos did all the hard work of narrowing it
down and isolating the accounting discrepancy in the allocation
code. All I did was put 2 and 2 together - the agresv discrepancy -
wrote a quick patch and did a trace to make sure I didn't get 5...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-12 22:35   ` Dave Chinner
  2019-09-12 22:46     ` Dave Chinner
@ 2019-09-13 14:58     ` Brian Foster
  2019-09-14 22:00       ` Dave Chinner
  2019-09-17 12:22     ` Carlos Maiolino
  2 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2019-09-13 14:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Carlos Maiolino

On Fri, Sep 13, 2019 at 08:35:19AM +1000, Dave Chinner wrote:
> On Thu, Sep 12, 2019 at 10:32:22AM -0400, Brian Foster wrote:
> > The bmap block allocation code issues a sequence of retries to
> > perform an optimal allocation, gradually loosening constraints as
> > allocations fail. For example, the first attempt might begin at a
> > particular bno, with maxlen == minlen and alignment incorporated. As
> > allocations fail, the parameters fall back to different modes, drop
> > alignment requirements and reduce the minlen and total block
> > requirements.
> > 
> > For large extent allocations with an args.total value that exceeds
> > the allocation length (i.e., non-delalloc), the total value tends to
> > dominate despite these fallbacks. For example, an aligned extent
> > allocation request of tens to hundreds of MB that cannot be
> > satisfied from a particular AG will not succeed after dropping
> > alignment or minlen because xfs_alloc_space_available() never
> > selects an AG that can't satisfy args.total. The retry sequence
> > eventually reduces total and ultimately succeeds if a minlen extent
> > is available somewhere, but the first several retries are
> > effectively pointless in this scenario.
> > 
> > Beyond simply being inefficient, another side effect of this
> > behavior is that we drop alignment requirements too aggressively.
> > Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
> > unit:
> > 
> >  # xfs_io -c "falloc 0 1g" /mnt/file
> >  # <xfstests>/src/t_stripealign /mnt/file 32
> >  /mnt/file: Start block 347176 not multiple of sunit 32
> 
> Ok, so what Carlos and I found last night was an issue with the
> the agresv code leading to the maximum free extent calculated
> by xfs_alloc_longest_free_extent() being longer than the largest
> allowable extent allocation (mp->m_ag_max_usable) resulting in the
> situation where blen > args->maxlen, and so in the case of initial
> allocation here, we never run this:
> 
> 	/*
> 	 * Adjust for alignment
> 	 */
> 	if (blen > args.alignment && blen <= args.maxlen)
> 		args.minlen = blen - args.alignment;
> 	args.minalignslop = 0;
> 

Interesting, I think I missed this logic when I looked at this
originally. It makes sense that we should be allowing this to handle the
maxlen = minlen alignment case.

> this is how we end up with args.minlen = args.maxlen and the
> initial allocation failing.
> 
> The issue is the way mp->m_ag_max_usable is calculated versus how
> the pag->pag_meta_resv.ar_reserved value is set up for the finobt.
> That is, "ask" = max tree size, and "used" = 1 because we have a
> root block allocated. that code does:
> 

Ok, I see that this behavior changes without perag reservations in the
mix (i.e., disable finobt, reflink, etc.). The perag reservation
calculation stuff always confuses me, however..

> 	mp->m_ag_max_unused -= ask;
> ...
> 	pag->pag_meta_resv.ar_reserved = ask - used
> 
> That means when we calculate the longest extent in the AG, we do:
> 
> 	longest = pag->pagf_longest - min_needed - resv(NONE)
> 		= pag->pagf_longest - min_needed - pag->pag_meta_resv.ar_reserved
> 

So here we use ar_reserved, which reflects outstanding and so far unused
reservation for the metadata type.

> whilst mp->m_ag_max_usable is calculated as
> 
> 	usable = agf_length - AG header blocks - AGFL - resv(ask)
> 

And here we apparently use the total reservation requirement, regardless
of how much is used, because we're calculating the maximum amount ever
available from the AG.

> When the AG is empty, this ends up with
> 
> 	pag->pagf_longest = agf_length - AG header blocks
> and
> 	min_needed = AGFL blocks
> and
> 	resv(ask) = pag->pag_meta_resv.ar_reserved + 1
> 
> and so:
> 	longest = usable + 1
> 
> And that's how we get blen = maxlen + 1, and that's why alignment is
> being dropped for the initial allocation in this "allocate full AG"
> corner case.
> 

And maxlen is capped to ->m_ag_max_usable, hence the delta between the
two values. I think this makes sense. Thanks for the breakdown.

> > Despite the filesystem being completely empty, the fact that the
> > allocation request cannot be satisifed from a single AG means the
> > allocation doesn't succeed until xfs_bmap_btalloc() drops total from
> > the original value based on maxlen. This occurs after we've dropped
> > minlen and alignment (unnecessarily).
> 
> Right, we'll continue to fail until minlen is reduced appropriately.
> But that's not an issue in the fallback algorithms, that's a problem
> with the initial conditions not being set up correctly.
> 
> > As a step towards addressing this problem, insert a new retry in the
> > bmap allocation sequence to drop minlen (from maxlen) before tossing
> > alignment. This should still result in as large of an extent as
> > possible as the block allocator prioritizes extent size in all but
> > exact allocation modes. By itself, this does not change the behavior
> > of the command above because the preallocation code still specifies
> > total based on maxlen. Instead, this facilitates preservation of
> > alignment once extra reservation is separated from the extent length
> > portion of the total block requirement.
> 
> AFAICT this is not necessary. The prototypoe patch I wrote last
> night while working through this with Carlos is attached below. I
> updated with a variant of your patch 2 to demonstrate that it does
> actually solve the problem of full AG allocation failing to be
> aligned.
> 

I agree that this addresses the reported issue, but I can reproduce
other corner cases affected by the original patch that aren't affected
by this one. For example, if the allocation request happens to be
slightly less than blen but not enough to allow for alignment, minlen
isn't dropped and we can run through the same allocation retry sequence
that kills off alignment before success.

This does raise an interesting question in that current allocation
behavior allocates one extent out of whatever that largest free extent
happened to be in the AG, regardless of whether it's aligned (because
args.alignment drops to 1 before minlen). With the retry in this patch,
we'd satisfy alignment but end up allocating two extents in the file
after chopping alignment off the first free extent.

I could see a reasonable argument for either. On one hand contiguity may
be preferable to alignment. On the other, the difference between one or
two extents in this case is basically the alignment size. Getting back
an unaligned extent of say 200MB over an aligned extent 128k smaller
might not be worth it, or at the very least unexpected to users who
don't understand XFS geometry when there are multiples of that amount of
free space still available in the broader fs. I don't feel too strongly
about it either way, but figured it's worth noting..

...
> 
> xfs: cap longest free extent to maximum allocatable
> 
> From: Dave Chinner <dchinner@redhat.com>
> 
> Cap longest extent to the largest we can allocate based on limits
> calculated at mount time. Dynamic state (such as finobt blocks)
> can result in the longest free extent exceeding the size we can
> allocate, and that results in failure to align full AG allocations
> when the AG is empty.
> 
...
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Are either of you guys planning to post this patch without the
bmapi.total hack?

Brian

>  fs/xfs/libxfs/xfs_alloc.c |  3 ++-
>  fs/xfs/libxfs/xfs_bmap.c  | 14 ++++++++++++++
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 533b04aaf6f6..9dead25d2e70 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1989,7 +1989,8 @@ xfs_alloc_longest_free_extent(
>  	 * reservations and AGFL rules in place, we can return this extent.
>  	 */
>  	if (pag->pagf_longest > delta)
> -		return pag->pagf_longest - delta;
> +		return min_t(xfs_extlen_t, pag->pag_mount->m_ag_max_usable,
> +				pag->pagf_longest - delta);
>  
>  	/* Otherwise, let the caller try for 1 block if there's space. */
>  	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 054b4ce30033..b05683f649a6 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4286,6 +4286,20 @@ xfs_bmapi_write(
>  #endif
>  	whichfork = xfs_bmapi_whichfork(flags);
>  
> +	/*
> +	 * XXX: Hack!
> +	 *
> +	 * If the total blocks requested is larger than an AG, we can't allocate
> +	 * all the space atomically and within a single AG. This will be a
> +	 * "short" allocation. In this case, just ignore the total block count
> +	 * and rely on minleft calculations to ensure the allocation we do fits
> +	 * inside an AG properly.
> +	 *
> +	 * Based on a patch from Brian.
> +	 */
> +	if (bma.total > mp->m_ag_max_usable)
> +		bma.total = 0;
> +
>  	ASSERT(*nmap >= 1);
>  	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
>  	ASSERT(tp != NULL);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-13 14:58     ` Brian Foster
@ 2019-09-14 22:00       ` Dave Chinner
  2019-09-15 13:09         ` Brian Foster
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Chinner @ 2019-09-14 22:00 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, Carlos Maiolino

On Fri, Sep 13, 2019 at 10:58:02AM -0400, Brian Foster wrote:
> On Fri, Sep 13, 2019 at 08:35:19AM +1000, Dave Chinner wrote:
> > On Thu, Sep 12, 2019 at 10:32:22AM -0400, Brian Foster wrote:
> > > The bmap block allocation code issues a sequence of retries to
> > > perform an optimal allocation, gradually loosening constraints as
> > > allocations fail. For example, the first attempt might begin at a
> > > particular bno, with maxlen == minlen and alignment incorporated. As
> > > allocations fail, the parameters fall back to different modes, drop
> > > alignment requirements and reduce the minlen and total block
> > > requirements.
> > > 
> > > For large extent allocations with an args.total value that exceeds
> > > the allocation length (i.e., non-delalloc), the total value tends to
> > > dominate despite these fallbacks. For example, an aligned extent
> > > allocation request of tens to hundreds of MB that cannot be
> > > satisfied from a particular AG will not succeed after dropping
> > > alignment or minlen because xfs_alloc_space_available() never
> > > selects an AG that can't satisfy args.total. The retry sequence
> > > eventually reduces total and ultimately succeeds if a minlen extent
> > > is available somewhere, but the first several retries are
> > > effectively pointless in this scenario.
> > > 
> > > Beyond simply being inefficient, another side effect of this
> > > behavior is that we drop alignment requirements too aggressively.
> > > Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
> > > unit:
> > > 
> > >  # xfs_io -c "falloc 0 1g" /mnt/file
> > >  # <xfstests>/src/t_stripealign /mnt/file 32
> > >  /mnt/file: Start block 347176 not multiple of sunit 32
> > 
> > Ok, so what Carlos and I found last night was an issue with the
> > the agresv code leading to the maximum free extent calculated
> > by xfs_alloc_longest_free_extent() being longer than the largest
> > allowable extent allocation (mp->m_ag_max_usable) resulting in the
> > situation where blen > args->maxlen, and so in the case of initial
> > allocation here, we never run this:
> > 
> > 	/*
> > 	 * Adjust for alignment
> > 	 */
> > 	if (blen > args.alignment && blen <= args.maxlen)
> > 		args.minlen = blen - args.alignment;
> > 	args.minalignslop = 0;
> > 
....
> > > As a step towards addressing this problem, insert a new retry in the
> > > bmap allocation sequence to drop minlen (from maxlen) before tossing
> > > alignment. This should still result in as large of an extent as
> > > possible as the block allocator prioritizes extent size in all but
> > > exact allocation modes. By itself, this does not change the behavior
> > > of the command above because the preallocation code still specifies
> > > total based on maxlen. Instead, this facilitates preservation of
> > > alignment once extra reservation is separated from the extent length
> > > portion of the total block requirement.
> > 
> > AFAICT this is not necessary. The prototypoe patch I wrote last
> > night while working through this with Carlos is attached below. I
> > updated with a variant of your patch 2 to demonstrate that it does
> > actually solve the problem of full AG allocation failing to be
> > aligned.
> > 
> 
> I agree that this addresses the reported issue, but I can reproduce
> other corner cases affected by the original patch that aren't affected
> by this one. For example, if the allocation request happens to be
> slightly less than blen but not enough to allow for alignment, minlen
> isn't dropped and we can run through the same allocation retry sequence
> that kills off alignment before success.

But isn't that just another variation of the initial conditions
(minlen/maxlen) not being set up correctly for alignment when the AG
is empty?

i.e. Take the above condition and change it like this:

 	/*
 	 * Adjust for alignment
 	 */
-	if (blen > args.alignment && blen <= args.maxlen)
+	if (blen > args.alignment && blen <= args.maxlen + args.alignment)
 		args.minlen = blen - args.alignment;
 	args.minalignslop = 0;

and now we cover all the cases when blen covers an aligned maxlen
allocation...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-14 22:00       ` Dave Chinner
@ 2019-09-15 13:09         ` Brian Foster
  2019-09-16  8:36           ` Carlos Maiolino
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Foster @ 2019-09-15 13:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, Carlos Maiolino

On Sun, Sep 15, 2019 at 08:00:35AM +1000, Dave Chinner wrote:
> On Fri, Sep 13, 2019 at 10:58:02AM -0400, Brian Foster wrote:
> > On Fri, Sep 13, 2019 at 08:35:19AM +1000, Dave Chinner wrote:
> > > On Thu, Sep 12, 2019 at 10:32:22AM -0400, Brian Foster wrote:
> > > > The bmap block allocation code issues a sequence of retries to
> > > > perform an optimal allocation, gradually loosening constraints as
> > > > allocations fail. For example, the first attempt might begin at a
> > > > particular bno, with maxlen == minlen and alignment incorporated. As
> > > > allocations fail, the parameters fall back to different modes, drop
> > > > alignment requirements and reduce the minlen and total block
> > > > requirements.
> > > > 
> > > > For large extent allocations with an args.total value that exceeds
> > > > the allocation length (i.e., non-delalloc), the total value tends to
> > > > dominate despite these fallbacks. For example, an aligned extent
> > > > allocation request of tens to hundreds of MB that cannot be
> > > > satisfied from a particular AG will not succeed after dropping
> > > > alignment or minlen because xfs_alloc_space_available() never
> > > > selects an AG that can't satisfy args.total. The retry sequence
> > > > eventually reduces total and ultimately succeeds if a minlen extent
> > > > is available somewhere, but the first several retries are
> > > > effectively pointless in this scenario.
> > > > 
> > > > Beyond simply being inefficient, another side effect of this
> > > > behavior is that we drop alignment requirements too aggressively.
> > > > Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
> > > > unit:
> > > > 
> > > >  # xfs_io -c "falloc 0 1g" /mnt/file
> > > >  # <xfstests>/src/t_stripealign /mnt/file 32
> > > >  /mnt/file: Start block 347176 not multiple of sunit 32
> > > 
> > > Ok, so what Carlos and I found last night was an issue with the
> > > the agresv code leading to the maximum free extent calculated
> > > by xfs_alloc_longest_free_extent() being longer than the largest
> > > allowable extent allocation (mp->m_ag_max_usable) resulting in the
> > > situation where blen > args->maxlen, and so in the case of initial
> > > allocation here, we never run this:
> > > 
> > > 	/*
> > > 	 * Adjust for alignment
> > > 	 */
> > > 	if (blen > args.alignment && blen <= args.maxlen)
> > > 		args.minlen = blen - args.alignment;
> > > 	args.minalignslop = 0;
> > > 
> ....
> > > > As a step towards addressing this problem, insert a new retry in the
> > > > bmap allocation sequence to drop minlen (from maxlen) before tossing
> > > > alignment. This should still result in as large of an extent as
> > > > possible as the block allocator prioritizes extent size in all but
> > > > exact allocation modes. By itself, this does not change the behavior
> > > > of the command above because the preallocation code still specifies
> > > > total based on maxlen. Instead, this facilitates preservation of
> > > > alignment once extra reservation is separated from the extent length
> > > > portion of the total block requirement.
> > > 
> > > AFAICT this is not necessary. The prototypoe patch I wrote last
> > > night while working through this with Carlos is attached below. I
> > > updated with a variant of your patch 2 to demonstrate that it does
> > > actually solve the problem of full AG allocation failing to be
> > > aligned.
> > > 
> > 
> > I agree that this addresses the reported issue, but I can reproduce
> > other corner cases affected by the original patch that aren't affected
> > by this one. For example, if the allocation request happens to be
> > slightly less than blen but not enough to allow for alignment, minlen
> > isn't dropped and we can run through the same allocation retry sequence
> > that kills off alignment before success.
> 
> But isn't that just another variation of the initial conditions
> (minlen/maxlen) not being set up correctly for alignment when the AG
> is empty?
> 

Perhaps, though I don't think it's exclusive to an empty AG.

> i.e. Take the above condition and change it like this:
> 
>  	/*
>  	 * Adjust for alignment
>  	 */
> -	if (blen > args.alignment && blen <= args.maxlen)
> +	if (blen > args.alignment && blen <= args.maxlen + args.alignment)
>  		args.minlen = blen - args.alignment;
>  	args.minalignslop = 0;
> 
> and now we cover all the cases when blen covers an aligned maxlen
> allocation...
> 

Do we want to consider whether minlen goes to 1? Otherwise that looks
reasonable to me. What I was trying to get at is just that we should
consider whether there are any other corner cases (that we might care
about) where this particular allocation might not behave as expected vs.
just the example used in the original commit log.

If somebody wants to send a finalized patch or two with these fixes
along with the bma.total one (or I can tack it on in reply..?), I'll
think about it further on review as well..

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res
  2019-09-12 14:32 [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
  2019-09-12 14:32 ` [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs Brian Foster
  2019-09-12 14:32 ` [PATCH REPOST 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
@ 2019-09-16  8:18 ` Carlos Maiolino
  2019-10-18 17:17 ` Darrick J. Wong
  3 siblings, 0 replies; 17+ messages in thread
From: Carlos Maiolino @ 2019-09-16  8:18 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Sep 12, 2019 at 10:32:21AM -0400, Brian Foster wrote:
> Hi all,
> 
> This is a repost of a couple patches I posted a few months ago[1]. There
> are no changes other than a rebase to for-next. Any thoughts on these? I
> think Carlos had also run into some related generic/223 failures fairly
> recently...

Hi Brian,

Sure, I'll take a look at these patches now, my apologies, I only saw these
patches now.

> 
> Carlos,
> 
> Any chance you could give these a try?
> 
> Brian
> 
> [1] https://lore.kernel.org/linux-xfs/20190501140504.16435-1-bfoster@redhat.com/
> 
> Brian Foster (2):
>   xfs: drop minlen before tossing alignment on bmap allocs
>   xfs: don't set bmapi total block req where minleft is sufficient
> 
>  fs/xfs/libxfs/xfs_bmap.c | 13 +++++++++----
>  fs/xfs/xfs_bmap_util.c   |  4 ++--
>  fs/xfs/xfs_dquot.c       |  4 ++--
>  fs/xfs/xfs_iomap.c       |  4 ++--
>  fs/xfs/xfs_reflink.c     |  4 ++--
>  fs/xfs/xfs_rtalloc.c     |  3 +--
>  6 files changed, 18 insertions(+), 14 deletions(-)
> 
> -- 
> 2.20.1
> 

-- 
Carlos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-15 13:09         ` Brian Foster
@ 2019-09-16  8:36           ` Carlos Maiolino
  0 siblings, 0 replies; 17+ messages in thread
From: Carlos Maiolino @ 2019-09-16  8:36 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

> > > > aligned.
> > > > 
> > > 
> > > I agree that this addresses the reported issue, but I can reproduce
> > > other corner cases affected by the original patch that aren't affected
> > > by this one. For example, if the allocation request happens to be
> > > slightly less than blen but not enough to allow for alignment, minlen
> > > isn't dropped and we can run through the same allocation retry sequence
> > > that kills off alignment before success.
> > 
> > But isn't that just another variation of the initial conditions
> > (minlen/maxlen) not being set up correctly for alignment when the AG
> > is empty?
> > 
> 
> Perhaps, though I don't think it's exclusive to an empty AG.
> 
> > i.e. Take the above condition and change it like this:
> > 
> >  	/*
> >  	 * Adjust for alignment
> >  	 */
> > -	if (blen > args.alignment && blen <= args.maxlen)
> > +	if (blen > args.alignment && blen <= args.maxlen + args.alignment)
> >  		args.minlen = blen - args.alignment;
> >  	args.minalignslop = 0;
> > 
> > and now we cover all the cases when blen covers an aligned maxlen
> > allocation...
> > 
> 
> Do we want to consider whether minlen goes to 1? Otherwise that looks
> reasonable to me. What I was trying to get at is just that we should
> consider whether there are any other corner cases (that we might care
> about) where this particular allocation might not behave as expected vs.
> just the example used in the original commit log.
> 
> If somebody wants to send a finalized patch or two with these fixes
> along with the bma.total one (or I can tack it on in reply..?), I'll
> think about it further on review as well..

I didn't realize the conversation was already going on before I replied for the
first time, my apologies for unnecessary emails.

I've been working on some patches about this issue since I had this chat with
Dave, but I do not have anything 'mature' yet, exactly because some of the
issues you mentioned above, like the behavior not being exclusive to an empty
AG, and the fact the generic/223 was still failing after the 'fix' has been
applied (the single large fallocated file worked, but generic/223 no), so I was
kind of chasing my own tails on Friday :D

I'll get back to it today and see what I can do with fresh eyes.

Thanks Dave and Brian.

> 
> Brian
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com

-- 
Carlos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-12 22:35   ` Dave Chinner
  2019-09-12 22:46     ` Dave Chinner
  2019-09-13 14:58     ` Brian Foster
@ 2019-09-17 12:22     ` Carlos Maiolino
  2 siblings, 0 replies; 17+ messages in thread
From: Carlos Maiolino @ 2019-09-17 12:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, linux-xfs

Hi Dave.

I've been playing a bit with it, and, based on our talk on IRC:

> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4286,6 +4286,20 @@ xfs_bmapi_write(
>  #endif
>  	whichfork = xfs_bmapi_whichfork(flags);
>  
> +	/*
> +	 * XXX: Hack!
> +	 *
> +	 * If the total blocks requested is larger than an AG, we can't allocate
> +	 * all the space atomically and within a single AG. This will be a
> +	 * "short" allocation. In this case, just ignore the total block count
> +	 * and rely on minleft calculations to ensure the allocation we do fits
> +	 * inside an AG properly.
> +	 *
> +	 * Based on a patch from Brian.
> +	 */
> +	if (bma.total > mp->m_ag_max_usable)
> +		bma.total = 0;

Instead zeroing bma.total here, can't we crop it to blen in xfs_bmap_btalloc()?

I did some tests here and looks like the result is the same, although I'm not
sure if it's a good approach.

Cheers

> +
>  	ASSERT(*nmap >= 1);
>  	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
>  	ASSERT(tp != NULL);

-- 
Carlos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-12 14:32 ` [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs Brian Foster
  2019-09-12 22:35   ` Dave Chinner
@ 2019-09-18 21:41   ` Darrick J. Wong
  2019-09-19 11:49     ` Brian Foster
  1 sibling, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2019-09-18 21:41 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, Carlos Maiolino

On Thu, Sep 12, 2019 at 10:32:22AM -0400, Brian Foster wrote:
> The bmap block allocation code issues a sequence of retries to
> perform an optimal allocation, gradually loosening constraints as
> allocations fail. For example, the first attempt might begin at a
> particular bno, with maxlen == minlen and alignment incorporated. As
> allocations fail, the parameters fall back to different modes, drop
> alignment requirements and reduce the minlen and total block
> requirements.
> 
> For large extent allocations with an args.total value that exceeds
> the allocation length (i.e., non-delalloc), the total value tends to
> dominate despite these fallbacks. For example, an aligned extent
> allocation request of tens to hundreds of MB that cannot be
> satisfied from a particular AG will not succeed after dropping
> alignment or minlen because xfs_alloc_space_available() never
> selects an AG that can't satisfy args.total. The retry sequence

"..that can satisfy args.total"?

> eventually reduces total and ultimately succeeds if a minlen extent
> is available somewhere, but the first several retries are
> effectively pointless in this scenario.
> 
> Beyond simply being inefficient, another side effect of this
> behavior is that we drop alignment requirements too aggressively.
> Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
> unit:
> 
>  # xfs_io -c "falloc 0 1g" /mnt/file
>  # <xfstests>/src/t_stripealign /mnt/file 32
>  /mnt/file: Start block 347176 not multiple of sunit 32
> 
> Despite the filesystem being completely empty, the fact that the
> allocation request cannot be satisifed from a single AG means the
> allocation doesn't succeed until xfs_bmap_btalloc() drops total from
> the original value based on maxlen. This occurs after we've dropped
> minlen and alignment (unnecessarily).
> 
> As a step towards addressing this problem, insert a new retry in the
> bmap allocation sequence to drop minlen (from maxlen) before tossing
> alignment. This should still result in as large of an extent as
> possible as the block allocator prioritizes extent size in all but
> exact allocation modes. By itself, this does not change the behavior
> of the command above because the preallocation code still specifies
> total based on maxlen. Instead, this facilitates preservation of
> alignment once extra reservation is separated from the extent length
> portion of the total block requirement.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 054b4ce30033..eaa965920a03 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3573,6 +3573,14 @@ xfs_bmap_btalloc(
>  		if ((error = xfs_alloc_vextent(&args)))
>  			return error;
>  	}
> +	if (args.fsbno == NULLFSBLOCK && nullfb &&
> +	    args.minlen > ap->minlen) {

Maybe a comment here to point out that we're retrying the allocation
with the minimum acceptable minlen?  I say this mostly because (some of)
the other clauses have a quick description of the constraints that are
being fed to the allocation request, and it makes it easier to keep
track of what's going on.

> +		args.minlen = ap->minlen;
> +		args.fsbno = ap->blkno;
> +		error = xfs_alloc_vextent(&args);
> +		if (error)
> +			return error;
> +	}
>  	if (isaligned && args.fsbno == NULLFSBLOCK) {
>  		/*
>  		 * allocation failed, so turn off alignment and
> @@ -3584,9 +3592,7 @@ xfs_bmap_btalloc(
>  		if ((error = xfs_alloc_vextent(&args)))
>  			return error;
>  	}
> -	if (args.fsbno == NULLFSBLOCK && nullfb &&
> -	    args.minlen > ap->minlen) {
> -		args.minlen = ap->minlen;
> +	if (args.fsbno == NULLFSBLOCK && nullfb) {
>  		args.type = XFS_ALLOCTYPE_START_BNO;

Particularly when we get here and I have to look pretty closely to
figure out what this retry is now attempting to do.  This one is trying
the allocation again, but now with no alignment and the caller's minlen,
right?

--D

>  		args.fsbno = ap->blkno;
>  		if ((error = xfs_alloc_vextent(&args)))
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 2/2] xfs: don't set bmapi total block req where minleft is sufficient
  2019-09-12 14:32 ` [PATCH REPOST 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
@ 2019-09-18 21:55   ` Darrick J. Wong
  2019-09-19 11:55     ` Brian Foster
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2019-09-18 21:55 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, Carlos Maiolino

On Thu, Sep 12, 2019 at 10:32:23AM -0400, Brian Foster wrote:
> xfs_bmapi_write() takes a total block requirement parameter that is
> passed down to the block allocation code and is used to specify the
> total block requirement of the associated transaction. This is used
> to try and select an AG that can not only satisfy the requested
> extent allocation, but can also accommodate subsequent allocations
> that might be required to complete the transaction. For example,
> additional bmbt block allocations may be required on insertion of
> the resulting extent to an inode data fork.
> 
> While it's important for callers to calculate and reserve such extra
> blocks in the transaction, it is not necessary to pass the total
> value to xfs_bmapi_write() in all cases. The latter automatically
> sets minleft to ensure that sufficient free blocks remain after the
> allocation attempt to expand the format of the associated inode
> (i.e., such as extent to btree conversion, btree splits, etc).
> Therefore, any callers that pass a total block requirement of the
> bmap mapping length plus worst case bmbt expansion essentially
> specify the additional reservation requirement twice. These callers
> can pass a total of zero to rely on the bmapi minleft policy.
> 
> Beyond being superfluous, the primary motivation for this change is
> that the total reservation logic in the bmbt code is dubious in
> scenarios where minlen < maxlen and a maxlen extent cannot be
> allocated (which is more common for data extent allocations where
> contiguity is not required). The total value is based on maxlen in
> the xfs_bmapi_write() caller. If the bmbt code falls back to an
> allocation between minlen and maxlen, that allocation will not
> succeed until total is reset to minlen, which essentially throws
> away any additional reservation included in total by the caller. In

Hm, are you talking about lowmode allocations and the "retry with fewer
constraints" behavior in xfs_bmap_btalloc?

> addition, the total value is not reset until after alignment is
> dropped, which means that such callers drop alignment far too
> aggressively than necessary.

Does that need fixing?

> Update all callers of xfs_bmapi_write() that pass a total block
> value of the mapping length plus bmbt reservation to instead pass
> zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
> requirement. This trades off slightly less conservative AG selection
> for the ability to preserve alignment in more scenarios.
> xfs_bmapi_write() callers that incorporate unrelated or additional
> reservations in total beyond what is already included in minleft
> must continue to use the former.

Does doing this affect the outcome of where bmbt blocks get allocated
with respect to whichever data extent allocation triggered the reshaping
of the bmbt?  I would imagine that it /could/ result in somewhat better
allocation decisions?  But that the primary outcome of these two patches
is that a large fallocate on a filesystem with alignment hints and small
AGs (relative to the fallocate request size) are more likely to spit out
aligned extents?

The code changes look ok, but at the same time I wonder if there's a
bigger picture I'm missing?  FWIW that might just be due to Dave and
Carlos discussing something resulting in the "A small improvement in the
allocation algorithm" series and just a gut feeling that better
coordination (or maintainer soothing :P) is needed.

--D

> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 1 -
>  fs/xfs/xfs_bmap_util.c   | 4 ++--
>  fs/xfs/xfs_dquot.c       | 4 ++--
>  fs/xfs/xfs_iomap.c       | 4 ++--
>  fs/xfs/xfs_reflink.c     | 4 ++--
>  fs/xfs/xfs_rtalloc.c     | 3 +--
>  6 files changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index eaa965920a03..c2f0afdf2f65 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4505,7 +4505,6 @@ xfs_bmapi_convert_delalloc(
>  	bma.wasdel = true;
>  	bma.offset = bma.got.br_startoff;
>  	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
> -	bma.total = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
>  	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
>  	if (whichfork == XFS_COW_FORK)
>  		bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 0910cb75b65d..079aedade656 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -962,8 +962,8 @@ xfs_alloc_file_space(
>  		xfs_trans_ijoin(tp, ip, 0);
>  
>  		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
> -					allocatesize_fsb, alloc_type, resblks,
> -					imapp, &nimaps);
> +					allocatesize_fsb, alloc_type, 0, imapp,
> +					&nimaps);
>  		if (error)
>  			goto error0;
>  
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index aeb95e7391c1..b924dbd63a7d 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -305,8 +305,8 @@ xfs_dquot_disk_alloc(
>  	/* Create the block mapping. */
>  	xfs_trans_ijoin(tp, quotip, XFS_ILOCK_EXCL);
>  	error = xfs_bmapi_write(tp, quotip, dqp->q_fileoffset,
> -			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA,
> -			XFS_QM_DQALLOC_SPACE_RES(mp), &map, &nmaps);
> +			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0, &map,
> +			&nmaps);
>  	if (error)
>  		return error;
>  	ASSERT(map.br_blockcount == XFS_DQUOT_CLUSTER_SIZE_FSB);
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index f780e223b118..27f2030690e2 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -277,8 +277,8 @@ xfs_iomap_write_direct(
>  	 * caller gave to us.
>  	 */
>  	nimaps = 1;
> -	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
> -				bmapi_flags, resblks, imap, &nimaps);
> +	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flags, 0,
> +				imap, &nimaps);
>  	if (error)
>  		goto out_res_cancel;
>  
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 0f08153b4994..3aa56cac1a47 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -410,8 +410,8 @@ xfs_reflink_allocate_cow(
>  	/* Allocate the entire reservation as unwritten blocks. */
>  	nimaps = 1;
>  	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
> -			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
> -			resblks, imap, &nimaps);
> +			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, imap,
> +			&nimaps);
>  	if (error)
>  		goto out_unreserve;
>  
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index 4a48a8c75b4f..d42b5a2047e0 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -792,8 +792,7 @@ xfs_growfs_rt_alloc(
>  		 */
>  		nmap = 1;
>  		error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
> -					XFS_BMAPI_METADATA, resblks, &map,
> -					&nmap);
> +					XFS_BMAPI_METADATA, 0, &map, &nmap);
>  		if (!error && nmap < 1)
>  			error = -ENOSPC;
>  		if (error)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs
  2019-09-18 21:41   ` Darrick J. Wong
@ 2019-09-19 11:49     ` Brian Foster
  0 siblings, 0 replies; 17+ messages in thread
From: Brian Foster @ 2019-09-19 11:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Carlos Maiolino

On Wed, Sep 18, 2019 at 02:41:41PM -0700, Darrick J. Wong wrote:
> On Thu, Sep 12, 2019 at 10:32:22AM -0400, Brian Foster wrote:
> > The bmap block allocation code issues a sequence of retries to
> > perform an optimal allocation, gradually loosening constraints as
> > allocations fail. For example, the first attempt might begin at a
> > particular bno, with maxlen == minlen and alignment incorporated. As
> > allocations fail, the parameters fall back to different modes, drop
> > alignment requirements and reduce the minlen and total block
> > requirements.
> > 
> > For large extent allocations with an args.total value that exceeds
> > the allocation length (i.e., non-delalloc), the total value tends to
> > dominate despite these fallbacks. For example, an aligned extent
> > allocation request of tens to hundreds of MB that cannot be
> > satisfied from a particular AG will not succeed after dropping
> > alignment or minlen because xfs_alloc_space_available() never
> > selects an AG that can't satisfy args.total. The retry sequence
> 
> "..that can satisfy args.total"?
> 

Heh. "can't" was intended there, but after reading it back it is poorly
worded as a double negative. :P Basically it's just saying that
args.total is what restricts AG selection in this corner case.

> > eventually reduces total and ultimately succeeds if a minlen extent
> > is available somewhere, but the first several retries are
> > effectively pointless in this scenario.
> > 
> > Beyond simply being inefficient, another side effect of this
> > behavior is that we drop alignment requirements too aggressively.
> > Consider a 1GB fallocate on a 15GB fs with 16 AGs and 128k stripe
> > unit:
> > 
> >  # xfs_io -c "falloc 0 1g" /mnt/file
> >  # <xfstests>/src/t_stripealign /mnt/file 32
> >  /mnt/file: Start block 347176 not multiple of sunit 32
> > 
> > Despite the filesystem being completely empty, the fact that the
> > allocation request cannot be satisifed from a single AG means the
> > allocation doesn't succeed until xfs_bmap_btalloc() drops total from
> > the original value based on maxlen. This occurs after we've dropped
> > minlen and alignment (unnecessarily).
> > 
> > As a step towards addressing this problem, insert a new retry in the
> > bmap allocation sequence to drop minlen (from maxlen) before tossing
> > alignment. This should still result in as large of an extent as
> > possible as the block allocator prioritizes extent size in all but
> > exact allocation modes. By itself, this does not change the behavior
> > of the command above because the preallocation code still specifies
> > total based on maxlen. Instead, this facilitates preservation of
> > alignment once extra reservation is separated from the extent length
> > portion of the total block requirement.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 054b4ce30033..eaa965920a03 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -3573,6 +3573,14 @@ xfs_bmap_btalloc(
> >  		if ((error = xfs_alloc_vextent(&args)))
> >  			return error;
> >  	}
> > +	if (args.fsbno == NULLFSBLOCK && nullfb &&
> > +	    args.minlen > ap->minlen) {
> 
> Maybe a comment here to point out that we're retrying the allocation
> with the minimum acceptable minlen?  I say this mostly because (some of)
> the other clauses have a quick description of the constraints that are
> being fed to the allocation request, and it makes it easier to keep
> track of what's going on.
> 

Yeah..

> > +		args.minlen = ap->minlen;
> > +		args.fsbno = ap->blkno;
> > +		error = xfs_alloc_vextent(&args);
> > +		if (error)
> > +			return error;
> > +	}
> >  	if (isaligned && args.fsbno == NULLFSBLOCK) {
> >  		/*
> >  		 * allocation failed, so turn off alignment and
> > @@ -3584,9 +3592,7 @@ xfs_bmap_btalloc(
> >  		if ((error = xfs_alloc_vextent(&args)))
> >  			return error;
> >  	}
> > -	if (args.fsbno == NULLFSBLOCK && nullfb &&
> > -	    args.minlen > ap->minlen) {
> > -		args.minlen = ap->minlen;
> > +	if (args.fsbno == NULLFSBLOCK && nullfb) {
> >  		args.type = XFS_ALLOCTYPE_START_BNO;
> 
> Particularly when we get here and I have to look pretty closely to
> figure out what this retry is now attempting to do.  This one is trying
> the allocation again, but now with no alignment and the caller's minlen,
> right?
> 

Yeah, but this branch also (potentially) changes the allocation type.
IIRC, this wasn't relevant to the corner case I was trying to address
with this patch. I basically just wanted to get minlen dropped before
tossing alignment and at the same time didn't want to screw around with
the existing logic that was unrelated (hence the separation of the
minlen update into a new retry as opposed to just reordering code).

That said, the other subthread suggests this patch can be replaced with
more localized fixes to bma.minlen alignment handling code. My original
reproducer of this problem was based on some of the extent allocation
rework bits that have been deferred (rework of the size allocation
mode). I wasn't aware of the bma.minlen alignment logic at the time, so
I may have misanalyzed the problem and my current development tree
doesn't reproduce. I'll make the tweaks to this patch locally in the
event that I run into the problem again when I get back to that work and
if there still happens to be corner cases not covered by the minlen
fixup patch that Carlos has posted..

Brian

> --D
> 
> >  		args.fsbno = ap->blkno;
> >  		if ((error = xfs_alloc_vextent(&args)))
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 2/2] xfs: don't set bmapi total block req where minleft is sufficient
  2019-09-18 21:55   ` Darrick J. Wong
@ 2019-09-19 11:55     ` Brian Foster
  0 siblings, 0 replies; 17+ messages in thread
From: Brian Foster @ 2019-09-19 11:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Carlos Maiolino

On Wed, Sep 18, 2019 at 02:55:16PM -0700, Darrick J. Wong wrote:
> On Thu, Sep 12, 2019 at 10:32:23AM -0400, Brian Foster wrote:
> > xfs_bmapi_write() takes a total block requirement parameter that is
> > passed down to the block allocation code and is used to specify the
> > total block requirement of the associated transaction. This is used
> > to try and select an AG that can not only satisfy the requested
> > extent allocation, but can also accommodate subsequent allocations
> > that might be required to complete the transaction. For example,
> > additional bmbt block allocations may be required on insertion of
> > the resulting extent to an inode data fork.
> > 
> > While it's important for callers to calculate and reserve such extra
> > blocks in the transaction, it is not necessary to pass the total
> > value to xfs_bmapi_write() in all cases. The latter automatically
> > sets minleft to ensure that sufficient free blocks remain after the
> > allocation attempt to expand the format of the associated inode
> > (i.e., such as extent to btree conversion, btree splits, etc).
> > Therefore, any callers that pass a total block requirement of the
> > bmap mapping length plus worst case bmbt expansion essentially
> > specify the additional reservation requirement twice. These callers
> > can pass a total of zero to rely on the bmapi minleft policy.
> > 
> > Beyond being superfluous, the primary motivation for this change is
> > that the total reservation logic in the bmbt code is dubious in
> > scenarios where minlen < maxlen and a maxlen extent cannot be
> > allocated (which is more common for data extent allocations where
> > contiguity is not required). The total value is based on maxlen in
> > the xfs_bmapi_write() caller. If the bmbt code falls back to an
> > allocation between minlen and maxlen, that allocation will not
> > succeed until total is reset to minlen, which essentially throws
> > away any additional reservation included in total by the caller. In
> 
> Hm, are you talking about lowmode allocations and the "retry with fewer
> constraints" behavior in xfs_bmap_btalloc?
> 

This isn't related to low space mode. Consider a simple example of
xfs_alloc_file_space() calling xfs_bmapi_write() with a total param of
the maxlen + bmbt reservation. xfs_bmapi_write() assigns bma.total, this
makes its way to args.total and the allocation code thus won't pick an
AG with less space available than specified in args.total (see
xfs_alloc_space_available()). If args.total is what prevents AG
selection for a particular allocation, the allocation retries in
xfs_bmap_btalloc() are going to fail until we get to the one that does:

	args.total = ap->minlen;

... which basically means we're now free to select an AG that satisfies
minlen without any consideration for the "+ bmbt reservation" part that
was added to bma.total in the first place.

This is separate from the observation that the bmap code already assigns
[bma|args].minleft (xfs_bmapi_minleft()) to a value that considers that
additional bmbt blocks might be required for btree splits due to the
extent allocation/mapping. With that, my understanding was that a
bma.total of maxlen + bmbt res is unnecessary because the bmap code
already takes the bmbt res into consideration itself.

> > addition, the total value is not reset until after alignment is
> > dropped, which means that such callers drop alignment far too
> > aggressively than necessary.
> 
> Does that need fixing?
> 

That was the intent of the first patch. :)

> > Update all callers of xfs_bmapi_write() that pass a total block
> > value of the mapping length plus bmbt reservation to instead pass
> > zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
> > requirement. This trades off slightly less conservative AG selection
> > for the ability to preserve alignment in more scenarios.
> > xfs_bmapi_write() callers that incorporate unrelated or additional
> > reservations in total beyond what is already included in minleft
> > must continue to use the former.
> 
> Does doing this affect the outcome of where bmbt blocks get allocated
> with respect to whichever data extent allocation triggered the reshaping
> of the bmbt?  I would imagine that it /could/ result in somewhat better
> allocation decisions?  But that the primary outcome of these two patches
> is that a large fallocate on a filesystem with alignment hints and small
> AGs (relative to the fallocate request size) are more likely to spit out
> aligned extents?
> 

Yeah, the intent is to try and honor alignment in more cases. I don't
think this affects bmbt block allocation because of the minleft bits
mentioned above. Rather, this just means that if we _did_ have some
subset of AGs where bma.total (maxlen+bmbt res) could be satisfied,
we're no longer restricting ourselves to those AGs over others where
minlen+minleft+alignment might be possible. IOW, this takes into
consideration the behavior change from the previous patch (or Carlos'
variant thereof).

> The code changes look ok, but at the same time I wonder if there's a
> bigger picture I'm missing?  FWIW that might just be due to Dave and
> Carlos discussing something resulting in the "A small improvement in the
> allocation algorithm" series and just a gut feeling that better
> coordination (or maintainer soothing :P) is needed.
> 

Yeah, I think what fell out of that ends up replacing the first patch.
AFAICT, this patch is still necessary to prevent bma.total from getting
in the way, though some discussion over Carlos' series is still in
progress. Either way, it probably makes sense for us to work things out
in that series first..

Brian

> --D
> 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c | 1 -
> >  fs/xfs/xfs_bmap_util.c   | 4 ++--
> >  fs/xfs/xfs_dquot.c       | 4 ++--
> >  fs/xfs/xfs_iomap.c       | 4 ++--
> >  fs/xfs/xfs_reflink.c     | 4 ++--
> >  fs/xfs/xfs_rtalloc.c     | 3 +--
> >  6 files changed, 9 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index eaa965920a03..c2f0afdf2f65 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -4505,7 +4505,6 @@ xfs_bmapi_convert_delalloc(
> >  	bma.wasdel = true;
> >  	bma.offset = bma.got.br_startoff;
> >  	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
> > -	bma.total = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
> >  	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
> >  	if (whichfork == XFS_COW_FORK)
> >  		bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 0910cb75b65d..079aedade656 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -962,8 +962,8 @@ xfs_alloc_file_space(
> >  		xfs_trans_ijoin(tp, ip, 0);
> >  
> >  		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
> > -					allocatesize_fsb, alloc_type, resblks,
> > -					imapp, &nimaps);
> > +					allocatesize_fsb, alloc_type, 0, imapp,
> > +					&nimaps);
> >  		if (error)
> >  			goto error0;
> >  
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index aeb95e7391c1..b924dbd63a7d 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -305,8 +305,8 @@ xfs_dquot_disk_alloc(
> >  	/* Create the block mapping. */
> >  	xfs_trans_ijoin(tp, quotip, XFS_ILOCK_EXCL);
> >  	error = xfs_bmapi_write(tp, quotip, dqp->q_fileoffset,
> > -			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA,
> > -			XFS_QM_DQALLOC_SPACE_RES(mp), &map, &nmaps);
> > +			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0, &map,
> > +			&nmaps);
> >  	if (error)
> >  		return error;
> >  	ASSERT(map.br_blockcount == XFS_DQUOT_CLUSTER_SIZE_FSB);
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index f780e223b118..27f2030690e2 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -277,8 +277,8 @@ xfs_iomap_write_direct(
> >  	 * caller gave to us.
> >  	 */
> >  	nimaps = 1;
> > -	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
> > -				bmapi_flags, resblks, imap, &nimaps);
> > +	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flags, 0,
> > +				imap, &nimaps);
> >  	if (error)
> >  		goto out_res_cancel;
> >  
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 0f08153b4994..3aa56cac1a47 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -410,8 +410,8 @@ xfs_reflink_allocate_cow(
> >  	/* Allocate the entire reservation as unwritten blocks. */
> >  	nimaps = 1;
> >  	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
> > -			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
> > -			resblks, imap, &nimaps);
> > +			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, imap,
> > +			&nimaps);
> >  	if (error)
> >  		goto out_unreserve;
> >  
> > diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> > index 4a48a8c75b4f..d42b5a2047e0 100644
> > --- a/fs/xfs/xfs_rtalloc.c
> > +++ b/fs/xfs/xfs_rtalloc.c
> > @@ -792,8 +792,7 @@ xfs_growfs_rt_alloc(
> >  		 */
> >  		nmap = 1;
> >  		error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
> > -					XFS_BMAPI_METADATA, resblks, &map,
> > -					&nmap);
> > +					XFS_BMAPI_METADATA, 0, &map, &nmap);
> >  		if (!error && nmap < 1)
> >  			error = -ENOSPC;
> >  		if (error)
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res
  2019-09-12 14:32 [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
                   ` (2 preceding siblings ...)
  2019-09-16  8:18 ` [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res Carlos Maiolino
@ 2019-10-18 17:17 ` Darrick J. Wong
  2019-10-21 12:14   ` Brian Foster
  3 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2019-10-18 17:17 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs, Carlos Maiolino

On Thu, Sep 12, 2019 at 10:32:21AM -0400, Brian Foster wrote:
> Hi all,
> 
> This is a repost of a couple patches I posted a few months ago[1]. There
> are no changes other than a rebase to for-next. Any thoughts on these? I
> think Carlos had also run into some related generic/223 failures fairly
> recently...
> 
> Carlos,
> 
> Any chance you could give these a try?

Any progress on this in the last month?  I /think/ this is related to
the unaligned allocations that Dan's complaining about this morning.

--D

> Brian
> 
> [1] https://lore.kernel.org/linux-xfs/20190501140504.16435-1-bfoster@redhat.com/
> 
> Brian Foster (2):
>   xfs: drop minlen before tossing alignment on bmap allocs
>   xfs: don't set bmapi total block req where minleft is sufficient
> 
>  fs/xfs/libxfs/xfs_bmap.c | 13 +++++++++----
>  fs/xfs/xfs_bmap_util.c   |  4 ++--
>  fs/xfs/xfs_dquot.c       |  4 ++--
>  fs/xfs/xfs_iomap.c       |  4 ++--
>  fs/xfs/xfs_reflink.c     |  4 ++--
>  fs/xfs/xfs_rtalloc.c     |  3 +--
>  6 files changed, 18 insertions(+), 14 deletions(-)
> 
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res
  2019-10-18 17:17 ` Darrick J. Wong
@ 2019-10-21 12:14   ` Brian Foster
  0 siblings, 0 replies; 17+ messages in thread
From: Brian Foster @ 2019-10-21 12:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Carlos Maiolino

On Fri, Oct 18, 2019 at 10:17:20AM -0700, Darrick J. Wong wrote:
> On Thu, Sep 12, 2019 at 10:32:21AM -0400, Brian Foster wrote:
> > Hi all,
> > 
> > This is a repost of a couple patches I posted a few months ago[1]. There
> > are no changes other than a rebase to for-next. Any thoughts on these? I
> > think Carlos had also run into some related generic/223 failures fairly
> > recently...
> > 
> > Carlos,
> > 
> > Any chance you could give these a try?
> 
> Any progress on this in the last month?  I /think/ this is related to
> the unaligned allocations that Dan's complaining about this morning.
> 

Not that I'm aware of. It looks similar and IIRC all that is really
needed here is a tweak to Carlos' and Dave's patch 1 that Carlos posted
(which replaces patch 1 of this series) added with patch 2 of this
series. I've just reposted a v2 series with that combination (including
links/references to hopefully reduce confusion).

Brian

> --D
> 
> > Brian
> > 
> > [1] https://lore.kernel.org/linux-xfs/20190501140504.16435-1-bfoster@redhat.com/
> > 
> > Brian Foster (2):
> >   xfs: drop minlen before tossing alignment on bmap allocs
> >   xfs: don't set bmapi total block req where minleft is sufficient
> > 
> >  fs/xfs/libxfs/xfs_bmap.c | 13 +++++++++----
> >  fs/xfs/xfs_bmap_util.c   |  4 ++--
> >  fs/xfs/xfs_dquot.c       |  4 ++--
> >  fs/xfs/xfs_iomap.c       |  4 ++--
> >  fs/xfs/xfs_reflink.c     |  4 ++--
> >  fs/xfs/xfs_rtalloc.c     |  3 +--
> >  6 files changed, 18 insertions(+), 14 deletions(-)
> > 
> > -- 
> > 2.20.1
> > 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-10-21 12:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-12 14:32 [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
2019-09-12 14:32 ` [PATCH REPOST 1/2] xfs: drop minlen before tossing alignment on bmap allocs Brian Foster
2019-09-12 22:35   ` Dave Chinner
2019-09-12 22:46     ` Dave Chinner
2019-09-13 14:58     ` Brian Foster
2019-09-14 22:00       ` Dave Chinner
2019-09-15 13:09         ` Brian Foster
2019-09-16  8:36           ` Carlos Maiolino
2019-09-17 12:22     ` Carlos Maiolino
2019-09-18 21:41   ` Darrick J. Wong
2019-09-19 11:49     ` Brian Foster
2019-09-12 14:32 ` [PATCH REPOST 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
2019-09-18 21:55   ` Darrick J. Wong
2019-09-19 11:55     ` Brian Foster
2019-09-16  8:18 ` [PATCH REPOST 0/2] xfs: rely on minleft instead of total for bmbt res Carlos Maiolino
2019-10-18 17:17 ` Darrick J. Wong
2019-10-21 12:14   ` Brian Foster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).