linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags
@ 2023-03-16 16:47 Darrick J. Wong
  2023-03-18  0:38 ` Dave Chinner
  0 siblings, 1 reply; 2+ messages in thread
From: Darrick J. Wong @ 2023-03-16 16:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

From: Darrick J. Wong <djwong@kernel.org>

Callers of xfs_alloc_vextent_iterate_ags that pass in the TRYLOCK flag
want us to perform a non-blocking scan of the AGs for free space.  There
are no ordering constraints for non-blocking AGF lock acquisition, so
the scan can freely start over at AG 0 even when minimum_agno > 0.

This manifests fairly reliably on xfs/294 on 6.3-rc2 with the parent
pointer patchset applied and the realtime volume enabled.  I observed
the following sequence as part of an xfs_dir_createname call:

0. Fragment the free space, then allocate nearly all the free space in
   all AGs except AG 0.

1. Create a directory in AG 2 and let it grow for a while.

2. Try to allocate 2 blocks to expand the dirent part of a directory.
   The space will be allocated out of AG 0, but the allocation will not
   be contiguous.  This (I think) activates the LOWMODE allocator.

3. The bmapi call decides to convert from extents to bmbt format and
   tries to allocate 1 block.  This allocation request calls
   xfs_alloc_vextent_start_ag with the inode number, which starts the
   scan at AG 2.  We ignore AG 0 (with all its free space) and instead
   scrape AG 2 and 3 for more space.  We find one block, but this now
   kicks t_highest_agno to 3.

4. The createname call decides it needs to split the dabtree.  It tries
   to allocate even more space with xfs_alloc_vextent_start_ag, but now
   we're constrained to AG 3, and we don't find the space.  The
   createname returns ENOSPC and the filesystem shuts down.

This change fixes the problem by making the trylock scan wrap around to
AG 0 if it doesn't like the AGs that it finds.  Since the current
transaction itself holds AGF 0, the trylock of AGF 0 will succeed, and
we take space from the AG that has plenty.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_alloc.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 8999e38e1bed..bd7112d430b6 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3326,11 +3326,14 @@ xfs_alloc_vextent_iterate_ags(
 	uint32_t		flags)
 {
 	struct xfs_mount	*mp = args->mp;
+	xfs_agnumber_t		restart_agno = minimum_agno;
 	xfs_agnumber_t		agno;
 	int			error = 0;
 
+	if (flags & XFS_ALLOC_FLAG_TRYLOCK)
+		restart_agno = 0;
 restart:
-	for_each_perag_wrap_range(mp, start_agno, minimum_agno,
+	for_each_perag_wrap_range(mp, start_agno, restart_agno,
 			mp->m_sb.sb_agcount, agno, args->pag) {
 		args->agno = agno;
 		error = xfs_alloc_vextent_prepare_ag(args);
@@ -3369,6 +3372,7 @@ xfs_alloc_vextent_iterate_ags(
 	 */
 	if (flags) {
 		flags = 0;
+		restart_agno = minimum_agno;
 		goto restart;
 	}
 

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags
  2023-03-16 16:47 [PATCH] xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags Darrick J. Wong
@ 2023-03-18  0:38 ` Dave Chinner
  0 siblings, 0 replies; 2+ messages in thread
From: Dave Chinner @ 2023-03-18  0:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Thu, Mar 16, 2023 at 09:47:21AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Callers of xfs_alloc_vextent_iterate_ags that pass in the TRYLOCK flag
> want us to perform a non-blocking scan of the AGs for free space.  There
> are no ordering constraints for non-blocking AGF lock acquisition, so
> the scan can freely start over at AG 0 even when minimum_agno > 0.
> 
> This manifests fairly reliably on xfs/294 on 6.3-rc2 with the parent
> pointer patchset applied and the realtime volume enabled.  I observed
> the following sequence as part of an xfs_dir_createname call:
> 
> 0. Fragment the free space, then allocate nearly all the free space in
>    all AGs except AG 0.
> 
> 1. Create a directory in AG 2 and let it grow for a while.
> 
> 2. Try to allocate 2 blocks to expand the dirent part of a directory.
>    The space will be allocated out of AG 0, but the allocation will not
>    be contiguous.  This (I think) activates the LOWMODE allocator.
> 
> 3. The bmapi call decides to convert from extents to bmbt format and
>    tries to allocate 1 block.  This allocation request calls
>    xfs_alloc_vextent_start_ag with the inode number, which starts the
>    scan at AG 2.  We ignore AG 0 (with all its free space) and instead
>    scrape AG 2 and 3 for more space.  We find one block, but this now
>    kicks t_highest_agno to 3.
> 
> 4. The createname call decides it needs to split the dabtree.  It tries
>    to allocate even more space with xfs_alloc_vextent_start_ag, but now
>    we're constrained to AG 3, and we don't find the space.  The
>    createname returns ENOSPC and the filesystem shuts down.
> 
> This change fixes the problem by making the trylock scan wrap around to
> AG 0 if it doesn't like the AGs that it finds.  Since the current
> transaction itself holds AGF 0, the trylock of AGF 0 will succeed, and
> we take space from the AG that has plenty.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_alloc.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 8999e38e1bed..bd7112d430b6 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -3326,11 +3326,14 @@ xfs_alloc_vextent_iterate_ags(
>  	uint32_t		flags)
>  {
>  	struct xfs_mount	*mp = args->mp;
> +	xfs_agnumber_t		restart_agno = minimum_agno;
>  	xfs_agnumber_t		agno;
>  	int			error = 0;
>  
> +	if (flags & XFS_ALLOC_FLAG_TRYLOCK)
> +		restart_agno = 0;
>  restart:
> -	for_each_perag_wrap_range(mp, start_agno, minimum_agno,
> +	for_each_perag_wrap_range(mp, start_agno, restart_agno,
>  			mp->m_sb.sb_agcount, agno, args->pag) {
>  		args->agno = agno;
>  		error = xfs_alloc_vextent_prepare_ag(args);
> @@ -3369,6 +3372,7 @@ xfs_alloc_vextent_iterate_ags(
>  	 */
>  	if (flags) {
>  		flags = 0;
> +		restart_agno = minimum_agno;
>  		goto restart;
>  	}

Looks good. Pretty much what we talked about on #xfs.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-03-18  0:38 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-16 16:47 [PATCH] xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags Darrick J. Wong
2023-03-18  0:38 ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).