linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] xfs: rely on minleft instead of total for bmbt res
@ 2019-10-21 12:13 Brian Foster
  2019-10-21 12:13 ` [PATCH v2 1/2] xfs: cap longest free extent to maximum allocatable Brian Foster
  2019-10-21 12:13 ` [PATCH v2 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
  0 siblings, 2 replies; 6+ messages in thread
From: Brian Foster @ 2019-10-21 12:13 UTC (permalink / raw)
  To: linux-xfs

This is a v2 that is a combination of two series. Patch 1 is Carlos' and
Dave's patch [1] (that replaces my original patch 1) with an additional
modification based on discussion with Dave [2] in the original v1 post
of this series. Patch 2 is a repost from my previous v1 [3].

Note that this combination of changes is currently untested AFAIA. I'm
planning to run through some tests today and will report back on any
issues...

Brian

v2:
- Added xfs_bmap_btalloc() changes to patch 1.

[1] https://lore.kernel.org/linux-xfs/20190918082453.25266-2-cmaiolino@redhat.com/
[2] https://lore.kernel.org/linux-xfs/20190914220035.GY16973@dread.disaster.area/
[3] https://lore.kernel.org/linux-xfs/20190912143223.24194-3-bfoster@redhat.com/

Brian Foster (1):
  xfs: don't set bmapi total block req where minleft is sufficient

Dave Chinner (1):
  xfs: cap longest free extent to maximum allocatable

 fs/xfs/libxfs/xfs_alloc.c |  3 ++-
 fs/xfs/libxfs/xfs_bmap.c  | 19 +++++++++----------
 fs/xfs/xfs_bmap_util.c    |  4 ++--
 fs/xfs/xfs_dquot.c        |  4 ++--
 fs/xfs/xfs_iomap.c        |  4 ++--
 fs/xfs/xfs_reflink.c      |  4 ++--
 fs/xfs/xfs_rtalloc.c      |  3 +--
 7 files changed, 20 insertions(+), 21 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] xfs: cap longest free extent to maximum allocatable
  2019-10-21 12:13 [PATCH v2 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
@ 2019-10-21 12:13 ` Brian Foster
  2019-10-22 16:16   ` Darrick J. Wong
  2019-10-21 12:13 ` [PATCH v2 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
  1 sibling, 1 reply; 6+ messages in thread
From: Brian Foster @ 2019-10-21 12:13 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Cap longest extent to the largest we can allocate based on limits
calculated at mount time. Dynamic state (such as finobt blocks)
can result in the longest free extent exceeding the size we can
allocate, and that results in failure to align full AG allocations
when the AG is empty.

Result:

xfs_io-4413  [003]   426.412459: xfs_alloc_vextent_loopfailed: dev 8:96 agno 0 agbno 32 minlen 243968 maxlen 244000 mod 0 prod 1 minleft 1 total 262148 alignment 32 minalignslop 0 len 0 type NEAR_BNO otype START_BNO wasdel 0 wasfromfl 0 resv 0 datatype 0x5 firstblock 0xffffffffffffffff

minlen and maxlen are now separated by the alignment size, and
allocation fails because args.total > free space in the AG.

[bfoster: Added xfs_bmap_btalloc() changes.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c |  3 ++-
 fs/xfs/libxfs/xfs_bmap.c  | 18 +++++++++---------
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 533b04aaf6f6..9dead25d2e70 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1989,7 +1989,8 @@ xfs_alloc_longest_free_extent(
 	 * reservations and AGFL rules in place, we can return this extent.
 	 */
 	if (pag->pagf_longest > delta)
-		return pag->pagf_longest - delta;
+		return min_t(xfs_extlen_t, pag->pag_mount->m_ag_max_usable,
+				pag->pagf_longest - delta);
 
 	/* Otherwise, let the caller try for 1 block if there's space. */
 	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 02469d59c787..c118577deaa9 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3500,13 +3500,11 @@ xfs_bmap_btalloc(
 			args.mod = args.prod - args.mod;
 	}
 	/*
-	 * If we are not low on available data blocks, and the
-	 * underlying logical volume manager is a stripe, and
-	 * the file offset is zero then try to allocate data
-	 * blocks on stripe unit boundary.
-	 * NOTE: ap->aeof is only set if the allocation length
-	 * is >= the stripe unit and the allocation offset is
-	 * at the end of file.
+	 * If we are not low on available data blocks, and the underlying
+	 * logical volume manager is a stripe, and the file offset is zero then
+	 * try to allocate data blocks on stripe unit boundary. NOTE: ap->aeof
+	 * is only set if the allocation length is >= the stripe unit and the
+	 * allocation offset is at the end of file.
 	 */
 	if (!(ap->tp->t_flags & XFS_TRANS_LOWMODE) && ap->aeof) {
 		if (!ap->offset) {
@@ -3514,9 +3512,11 @@ xfs_bmap_btalloc(
 			atype = args.type;
 			isaligned = 1;
 			/*
-			 * Adjust for alignment
+			 * Adjust minlen to try and preserve alignment if we
+			 * can't guarantee an aligned maxlen extent.
 			 */
-			if (blen > args.alignment && blen <= args.maxlen)
+			if (blen > args.alignment &&
+			    blen <= args.maxlen + args.alignment)
 				args.minlen = blen - args.alignment;
 			args.minalignslop = 0;
 		} else {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] xfs: don't set bmapi total block req where minleft is sufficient
  2019-10-21 12:13 [PATCH v2 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
  2019-10-21 12:13 ` [PATCH v2 1/2] xfs: cap longest free extent to maximum allocatable Brian Foster
@ 2019-10-21 12:13 ` Brian Foster
  2019-10-22 16:51   ` Darrick J. Wong
  1 sibling, 1 reply; 6+ messages in thread
From: Brian Foster @ 2019-10-21 12:13 UTC (permalink / raw)
  To: linux-xfs

xfs_bmapi_write() takes a total block requirement parameter that is
passed down to the block allocation code and is used to specify the
total block requirement of the associated transaction. This is used
to try and select an AG that can not only satisfy the requested
extent allocation, but can also accommodate subsequent allocations
that might be required to complete the transaction. For example,
additional bmbt block allocations may be required on insertion of
the resulting extent to an inode data fork.

While it's important for callers to calculate and reserve such extra
blocks in the transaction, it is not necessary to pass the total
value to xfs_bmapi_write() in all cases. The latter automatically
sets minleft to ensure that sufficient free blocks remain after the
allocation attempt to expand the format of the associated inode
(i.e., such as extent to btree conversion, btree splits, etc).
Therefore, any callers that pass a total block requirement of the
bmap mapping length plus worst case bmbt expansion essentially
specify the additional reservation requirement twice. These callers
can pass a total of zero to rely on the bmapi minleft policy.

Beyond being superfluous, the primary motivation for this change is
that the total reservation logic in the bmbt code is dubious in
scenarios where minlen < maxlen and a maxlen extent cannot be
allocated (which is more common for data extent allocations where
contiguity is not required). The total value is based on maxlen in
the xfs_bmapi_write() caller. If the bmbt code falls back to an
allocation between minlen and maxlen, that allocation will not
succeed until total is reset to minlen, which essentially throws
away any additional reservation included in total by the caller. In
addition, the total value is not reset until after alignment is
dropped, which means that such callers drop alignment far too
aggressively than necessary.

Update all callers of xfs_bmapi_write() that pass a total block
value of the mapping length plus bmbt reservation to instead pass
zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
requirement. This trades off slightly less conservative AG selection
for the ability to preserve alignment in more scenarios.
xfs_bmapi_write() callers that incorporate unrelated or additional
reservations in total beyond what is already included in minleft
must continue to use the former.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 1 -
 fs/xfs/xfs_bmap_util.c   | 4 ++--
 fs/xfs/xfs_dquot.c       | 4 ++--
 fs/xfs/xfs_iomap.c       | 4 ++--
 fs/xfs/xfs_reflink.c     | 4 ++--
 fs/xfs/xfs_rtalloc.c     | 3 +--
 6 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index c118577deaa9..65e38bd954d1 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4505,7 +4505,6 @@ xfs_bmapi_convert_delalloc(
 	bma.wasdel = true;
 	bma.offset = bma.got.br_startoff;
 	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
-	bma.total = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
 	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
 	if (whichfork == XFS_COW_FORK)
 		bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 4f443703065e..15547e089565 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -964,8 +964,8 @@ xfs_alloc_file_space(
 		xfs_trans_ijoin(tp, ip, 0);
 
 		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
-					allocatesize_fsb, alloc_type, resblks,
-					imapp, &nimaps);
+					allocatesize_fsb, alloc_type, 0, imapp,
+					&nimaps);
 		if (error)
 			goto error0;
 
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index aeb95e7391c1..b924dbd63a7d 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -305,8 +305,8 @@ xfs_dquot_disk_alloc(
 	/* Create the block mapping. */
 	xfs_trans_ijoin(tp, quotip, XFS_ILOCK_EXCL);
 	error = xfs_bmapi_write(tp, quotip, dqp->q_fileoffset,
-			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA,
-			XFS_QM_DQALLOC_SPACE_RES(mp), &map, &nmaps);
+			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0, &map,
+			&nmaps);
 	if (error)
 		return error;
 	ASSERT(map.br_blockcount == XFS_DQUOT_CLUSTER_SIZE_FSB);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f780e223b118..27f2030690e2 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -277,8 +277,8 @@ xfs_iomap_write_direct(
 	 * caller gave to us.
 	 */
 	nimaps = 1;
-	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
-				bmapi_flags, resblks, imap, &nimaps);
+	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flags, 0,
+				imap, &nimaps);
 	if (error)
 		goto out_res_cancel;
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 0f08153b4994..3aa56cac1a47 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -410,8 +410,8 @@ xfs_reflink_allocate_cow(
 	/* Allocate the entire reservation as unwritten blocks. */
 	nimaps = 1;
 	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
-			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
-			resblks, imap, &nimaps);
+			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, imap,
+			&nimaps);
 	if (error)
 		goto out_unreserve;
 
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 4a48a8c75b4f..d42b5a2047e0 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -792,8 +792,7 @@ xfs_growfs_rt_alloc(
 		 */
 		nmap = 1;
 		error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
-					XFS_BMAPI_METADATA, resblks, &map,
-					&nmap);
+					XFS_BMAPI_METADATA, 0, &map, &nmap);
 		if (!error && nmap < 1)
 			error = -ENOSPC;
 		if (error)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/2] xfs: cap longest free extent to maximum allocatable
  2019-10-21 12:13 ` [PATCH v2 1/2] xfs: cap longest free extent to maximum allocatable Brian Foster
@ 2019-10-22 16:16   ` Darrick J. Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2019-10-22 16:16 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Oct 21, 2019 at 08:13:21AM -0400, Brian Foster wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Cap longest extent to the largest we can allocate based on limits
> calculated at mount time. Dynamic state (such as finobt blocks)
> can result in the longest free extent exceeding the size we can
> allocate, and that results in failure to align full AG allocations
> when the AG is empty.
> 
> Result:
> 
> xfs_io-4413  [003]   426.412459: xfs_alloc_vextent_loopfailed: dev 8:96 agno 0 agbno 32 minlen 243968 maxlen 244000 mod 0 prod 1 minleft 1 total 262148 alignment 32 minalignslop 0 len 0 type NEAR_BNO otype START_BNO wasdel 0 wasfromfl 0 resv 0 datatype 0x5 firstblock 0xffffffffffffffff
> 
> minlen and maxlen are now separated by the alignment size, and
> allocation fails because args.total > free space in the AG.
> 
> [bfoster: Added xfs_bmap_btalloc() changes.]
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_alloc.c |  3 ++-
>  fs/xfs/libxfs/xfs_bmap.c  | 18 +++++++++---------
>  2 files changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 533b04aaf6f6..9dead25d2e70 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1989,7 +1989,8 @@ xfs_alloc_longest_free_extent(
>  	 * reservations and AGFL rules in place, we can return this extent.
>  	 */
>  	if (pag->pagf_longest > delta)
> -		return pag->pagf_longest - delta;
> +		return min_t(xfs_extlen_t, pag->pag_mount->m_ag_max_usable,
> +				pag->pagf_longest - delta);
>  
>  	/* Otherwise, let the caller try for 1 block if there's space. */
>  	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 02469d59c787..c118577deaa9 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3500,13 +3500,11 @@ xfs_bmap_btalloc(
>  			args.mod = args.prod - args.mod;
>  	}
>  	/*
> -	 * If we are not low on available data blocks, and the
> -	 * underlying logical volume manager is a stripe, and
> -	 * the file offset is zero then try to allocate data
> -	 * blocks on stripe unit boundary.
> -	 * NOTE: ap->aeof is only set if the allocation length
> -	 * is >= the stripe unit and the allocation offset is
> -	 * at the end of file.
> +	 * If we are not low on available data blocks, and the underlying
> +	 * logical volume manager is a stripe, and the file offset is zero then
> +	 * try to allocate data blocks on stripe unit boundary. NOTE: ap->aeof
> +	 * is only set if the allocation length is >= the stripe unit and the
> +	 * allocation offset is at the end of file.
>  	 */
>  	if (!(ap->tp->t_flags & XFS_TRANS_LOWMODE) && ap->aeof) {
>  		if (!ap->offset) {
> @@ -3514,9 +3512,11 @@ xfs_bmap_btalloc(
>  			atype = args.type;
>  			isaligned = 1;
>  			/*
> -			 * Adjust for alignment
> +			 * Adjust minlen to try and preserve alignment if we
> +			 * can't guarantee an aligned maxlen extent.
>  			 */
> -			if (blen > args.alignment && blen <= args.maxlen)
> +			if (blen > args.alignment &&
> +			    blen <= args.maxlen + args.alignment)
>  				args.minlen = blen - args.alignment;
>  			args.minalignslop = 0;
>  		} else {
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] xfs: don't set bmapi total block req where minleft is sufficient
  2019-10-21 12:13 ` [PATCH v2 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
@ 2019-10-22 16:51   ` Darrick J. Wong
  2019-10-22 17:10     ` Brian Foster
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2019-10-22 16:51 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Oct 21, 2019 at 08:13:22AM -0400, Brian Foster wrote:
> xfs_bmapi_write() takes a total block requirement parameter that is
> passed down to the block allocation code and is used to specify the
> total block requirement of the associated transaction. This is used
> to try and select an AG that can not only satisfy the requested
> extent allocation, but can also accommodate subsequent allocations
> that might be required to complete the transaction. For example,
> additional bmbt block allocations may be required on insertion of
> the resulting extent to an inode data fork.
> 
> While it's important for callers to calculate and reserve such extra
> blocks in the transaction, it is not necessary to pass the total
> value to xfs_bmapi_write() in all cases. The latter automatically
> sets minleft to ensure that sufficient free blocks remain after the
> allocation attempt to expand the format of the associated inode
> (i.e., such as extent to btree conversion, btree splits, etc).
> Therefore, any callers that pass a total block requirement of the
> bmap mapping length plus worst case bmbt expansion essentially
> specify the additional reservation requirement twice. These callers
> can pass a total of zero to rely on the bmapi minleft policy.
> 
> Beyond being superfluous, the primary motivation for this change is
> that the total reservation logic in the bmbt code is dubious in
> scenarios where minlen < maxlen and a maxlen extent cannot be
> allocated (which is more common for data extent allocations where
> contiguity is not required). The total value is based on maxlen in
> the xfs_bmapi_write() caller. If the bmbt code falls back to an
> allocation between minlen and maxlen, that allocation will not
> succeed until total is reset to minlen, which essentially throws
> away any additional reservation included in total by the caller. In
> addition, the total value is not reset until after alignment is
> dropped, which means that such callers drop alignment far too
> aggressively than necessary.
> 
> Update all callers of xfs_bmapi_write() that pass a total block
> value of the mapping length plus bmbt reservation to instead pass
> zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
> requirement. This trades off slightly less conservative AG selection
> for the ability to preserve alignment in more scenarios.
> xfs_bmapi_write() callers that incorporate unrelated or additional
> reservations in total beyond what is already included in minleft
> must continue to use the former.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Hmm, /me wonders about the other xfs_bmapi_write cases (remote attrs,
dir/attr allocations, and symlinks) which pass in a nonzero @total?  I
guess those are metadata so we have no alignment requirements and
over-reserving isn't a huge deal?

I also wonder if the xfs_bmapi_write call in xfs_iomap_write_unwritten
needs the same treatment, but the resblks calculation inexplicably(?)
doubles itself.  Not sure why, since in the "splitting the middle of an
extent" case we hold the ilock across both insertions, and one split
should leave us with two half-full leaves, right?  (Maybe this is its
own investigate-and-fix patch...)

<shrug> In the mean time, this part looks ok.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/xfs/libxfs/xfs_bmap.c | 1 -
>  fs/xfs/xfs_bmap_util.c   | 4 ++--
>  fs/xfs/xfs_dquot.c       | 4 ++--
>  fs/xfs/xfs_iomap.c       | 4 ++--
>  fs/xfs/xfs_reflink.c     | 4 ++--
>  fs/xfs/xfs_rtalloc.c     | 3 +--
>  6 files changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index c118577deaa9..65e38bd954d1 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -4505,7 +4505,6 @@ xfs_bmapi_convert_delalloc(
>  	bma.wasdel = true;
>  	bma.offset = bma.got.br_startoff;
>  	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
> -	bma.total = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
>  	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
>  	if (whichfork == XFS_COW_FORK)
>  		bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 4f443703065e..15547e089565 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -964,8 +964,8 @@ xfs_alloc_file_space(
>  		xfs_trans_ijoin(tp, ip, 0);
>  
>  		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
> -					allocatesize_fsb, alloc_type, resblks,
> -					imapp, &nimaps);
> +					allocatesize_fsb, alloc_type, 0, imapp,
> +					&nimaps);
>  		if (error)
>  			goto error0;
>  
> diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> index aeb95e7391c1..b924dbd63a7d 100644
> --- a/fs/xfs/xfs_dquot.c
> +++ b/fs/xfs/xfs_dquot.c
> @@ -305,8 +305,8 @@ xfs_dquot_disk_alloc(
>  	/* Create the block mapping. */
>  	xfs_trans_ijoin(tp, quotip, XFS_ILOCK_EXCL);
>  	error = xfs_bmapi_write(tp, quotip, dqp->q_fileoffset,
> -			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA,
> -			XFS_QM_DQALLOC_SPACE_RES(mp), &map, &nmaps);
> +			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0, &map,
> +			&nmaps);
>  	if (error)
>  		return error;
>  	ASSERT(map.br_blockcount == XFS_DQUOT_CLUSTER_SIZE_FSB);
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index f780e223b118..27f2030690e2 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -277,8 +277,8 @@ xfs_iomap_write_direct(
>  	 * caller gave to us.
>  	 */
>  	nimaps = 1;
> -	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
> -				bmapi_flags, resblks, imap, &nimaps);
> +	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flags, 0,
> +				imap, &nimaps);
>  	if (error)
>  		goto out_res_cancel;
>  
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 0f08153b4994..3aa56cac1a47 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -410,8 +410,8 @@ xfs_reflink_allocate_cow(
>  	/* Allocate the entire reservation as unwritten blocks. */
>  	nimaps = 1;
>  	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
> -			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
> -			resblks, imap, &nimaps);
> +			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, imap,
> +			&nimaps);
>  	if (error)
>  		goto out_unreserve;
>  
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index 4a48a8c75b4f..d42b5a2047e0 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -792,8 +792,7 @@ xfs_growfs_rt_alloc(
>  		 */
>  		nmap = 1;
>  		error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
> -					XFS_BMAPI_METADATA, resblks, &map,
> -					&nmap);
> +					XFS_BMAPI_METADATA, 0, &map, &nmap);
>  		if (!error && nmap < 1)
>  			error = -ENOSPC;
>  		if (error)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 2/2] xfs: don't set bmapi total block req where minleft is sufficient
  2019-10-22 16:51   ` Darrick J. Wong
@ 2019-10-22 17:10     ` Brian Foster
  0 siblings, 0 replies; 6+ messages in thread
From: Brian Foster @ 2019-10-22 17:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, Oct 22, 2019 at 09:51:47AM -0700, Darrick J. Wong wrote:
> On Mon, Oct 21, 2019 at 08:13:22AM -0400, Brian Foster wrote:
> > xfs_bmapi_write() takes a total block requirement parameter that is
> > passed down to the block allocation code and is used to specify the
> > total block requirement of the associated transaction. This is used
> > to try and select an AG that can not only satisfy the requested
> > extent allocation, but can also accommodate subsequent allocations
> > that might be required to complete the transaction. For example,
> > additional bmbt block allocations may be required on insertion of
> > the resulting extent to an inode data fork.
> > 
> > While it's important for callers to calculate and reserve such extra
> > blocks in the transaction, it is not necessary to pass the total
> > value to xfs_bmapi_write() in all cases. The latter automatically
> > sets minleft to ensure that sufficient free blocks remain after the
> > allocation attempt to expand the format of the associated inode
> > (i.e., such as extent to btree conversion, btree splits, etc).
> > Therefore, any callers that pass a total block requirement of the
> > bmap mapping length plus worst case bmbt expansion essentially
> > specify the additional reservation requirement twice. These callers
> > can pass a total of zero to rely on the bmapi minleft policy.
> > 
> > Beyond being superfluous, the primary motivation for this change is
> > that the total reservation logic in the bmbt code is dubious in
> > scenarios where minlen < maxlen and a maxlen extent cannot be
> > allocated (which is more common for data extent allocations where
> > contiguity is not required). The total value is based on maxlen in
> > the xfs_bmapi_write() caller. If the bmbt code falls back to an
> > allocation between minlen and maxlen, that allocation will not
> > succeed until total is reset to minlen, which essentially throws
> > away any additional reservation included in total by the caller. In
> > addition, the total value is not reset until after alignment is
> > dropped, which means that such callers drop alignment far too
> > aggressively than necessary.
> > 
> > Update all callers of xfs_bmapi_write() that pass a total block
> > value of the mapping length plus bmbt reservation to instead pass
> > zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
> > requirement. This trades off slightly less conservative AG selection
> > for the ability to preserve alignment in more scenarios.
> > xfs_bmapi_write() callers that incorporate unrelated or additional
> > reservations in total beyond what is already included in minleft
> > must continue to use the former.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> 
> Hmm, /me wonders about the other xfs_bmapi_write cases (remote attrs,
> dir/attr allocations, and symlinks) which pass in a nonzero @total?  I
> guess those are metadata so we have no alignment requirements and
> over-reserving isn't a huge deal?
> 

I skipped over callsites like symlink just because they appeared to use
total for things other than btree expansion. That's what the bmapi layer
already handles via minleft, so those calls where total is equivalent
are straightforward conversions. We could probably do further work to
clean up bma.total in a more general sense..

> I also wonder if the xfs_bmapi_write call in xfs_iomap_write_unwritten
> needs the same treatment, but the resblks calculation inexplicably(?)
> doubles itself.  Not sure why, since in the "splitting the middle of an
> extent" case we hold the ilock across both insertions, and one split
> should leave us with two half-full leaves, right?  (Maybe this is its
> own investigate-and-fix patch...)
> 

Yeah, I'm guessing I skipped over that one without thinking too hard
about it because it doubles resblks, but I agree that calculation doesn't
make a whole lot of practical sense. I'll make a note to send another
patch for this one..

> <shrug> In the mean time, this part looks ok.
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 

Thanks!

Brian

> --D
> 
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c | 1 -
> >  fs/xfs/xfs_bmap_util.c   | 4 ++--
> >  fs/xfs/xfs_dquot.c       | 4 ++--
> >  fs/xfs/xfs_iomap.c       | 4 ++--
> >  fs/xfs/xfs_reflink.c     | 4 ++--
> >  fs/xfs/xfs_rtalloc.c     | 3 +--
> >  6 files changed, 9 insertions(+), 11 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index c118577deaa9..65e38bd954d1 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -4505,7 +4505,6 @@ xfs_bmapi_convert_delalloc(
> >  	bma.wasdel = true;
> >  	bma.offset = bma.got.br_startoff;
> >  	bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN);
> > -	bma.total = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
> >  	bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork);
> >  	if (whichfork == XFS_COW_FORK)
> >  		bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 4f443703065e..15547e089565 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -964,8 +964,8 @@ xfs_alloc_file_space(
> >  		xfs_trans_ijoin(tp, ip, 0);
> >  
> >  		error = xfs_bmapi_write(tp, ip, startoffset_fsb,
> > -					allocatesize_fsb, alloc_type, resblks,
> > -					imapp, &nimaps);
> > +					allocatesize_fsb, alloc_type, 0, imapp,
> > +					&nimaps);
> >  		if (error)
> >  			goto error0;
> >  
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index aeb95e7391c1..b924dbd63a7d 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> > @@ -305,8 +305,8 @@ xfs_dquot_disk_alloc(
> >  	/* Create the block mapping. */
> >  	xfs_trans_ijoin(tp, quotip, XFS_ILOCK_EXCL);
> >  	error = xfs_bmapi_write(tp, quotip, dqp->q_fileoffset,
> > -			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA,
> > -			XFS_QM_DQALLOC_SPACE_RES(mp), &map, &nmaps);
> > +			XFS_DQUOT_CLUSTER_SIZE_FSB, XFS_BMAPI_METADATA, 0, &map,
> > +			&nmaps);
> >  	if (error)
> >  		return error;
> >  	ASSERT(map.br_blockcount == XFS_DQUOT_CLUSTER_SIZE_FSB);
> > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > index f780e223b118..27f2030690e2 100644
> > --- a/fs/xfs/xfs_iomap.c
> > +++ b/fs/xfs/xfs_iomap.c
> > @@ -277,8 +277,8 @@ xfs_iomap_write_direct(
> >  	 * caller gave to us.
> >  	 */
> >  	nimaps = 1;
> > -	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb,
> > -				bmapi_flags, resblks, imap, &nimaps);
> > +	error = xfs_bmapi_write(tp, ip, offset_fsb, count_fsb, bmapi_flags, 0,
> > +				imap, &nimaps);
> >  	if (error)
> >  		goto out_res_cancel;
> >  
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index 0f08153b4994..3aa56cac1a47 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -410,8 +410,8 @@ xfs_reflink_allocate_cow(
> >  	/* Allocate the entire reservation as unwritten blocks. */
> >  	nimaps = 1;
> >  	error = xfs_bmapi_write(tp, ip, imap->br_startoff, imap->br_blockcount,
> > -			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC,
> > -			resblks, imap, &nimaps);
> > +			XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC, 0, imap,
> > +			&nimaps);
> >  	if (error)
> >  		goto out_unreserve;
> >  
> > diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> > index 4a48a8c75b4f..d42b5a2047e0 100644
> > --- a/fs/xfs/xfs_rtalloc.c
> > +++ b/fs/xfs/xfs_rtalloc.c
> > @@ -792,8 +792,7 @@ xfs_growfs_rt_alloc(
> >  		 */
> >  		nmap = 1;
> >  		error = xfs_bmapi_write(tp, ip, oblocks, nblocks - oblocks,
> > -					XFS_BMAPI_METADATA, resblks, &map,
> > -					&nmap);
> > +					XFS_BMAPI_METADATA, 0, &map, &nmap);
> >  		if (!error && nmap < 1)
> >  			error = -ENOSPC;
> >  		if (error)
> > -- 
> > 2.20.1
> > 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-10-22 17:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-21 12:13 [PATCH v2 0/2] xfs: rely on minleft instead of total for bmbt res Brian Foster
2019-10-21 12:13 ` [PATCH v2 1/2] xfs: cap longest free extent to maximum allocatable Brian Foster
2019-10-22 16:16   ` Darrick J. Wong
2019-10-21 12:13 ` [PATCH v2 2/2] xfs: don't set bmapi total block req where minleft is sufficient Brian Foster
2019-10-22 16:51   ` Darrick J. Wong
2019-10-22 17:10     ` Brian Foster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).