All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 21/22] xfs: clean up and simplify xfs_dialloc()
Date: Wed, 12 May 2021 14:49:30 -0700	[thread overview]
Message-ID: <20210512214930.GZ8582@magnolia> (raw)
In-Reply-To: <20210506072054.271157-22-david@fromorbit.com>

On Thu, May 06, 2021 at 05:20:53PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Because it's a mess.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_ialloc.c | 270 +++++++++++++++++++++----------------
>  1 file changed, 153 insertions(+), 117 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index d749bb7c7a69..340bb95d7bc1 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -604,9 +604,10 @@ xfs_inobt_insert_sprec(
>  }
>  
>  /*
> - * Allocate new inodes in the allocation group specified by agbp.
> - * Returns 0 if inodes were allocated in this AG; 1 if there was no space
> - * in this AG; or the usual negative error code.
> + * Allocate new inodes in the allocation group specified by agbp.  Returns 0 if
> + * inodes were allocated in this AG; -EAGAIN if there was no space in this AG so
> + * the caller knows it can try another AG, a hard -ENOSPC when over the maximum
> + * inode count threshold, or the usual negative error code for other errors.
>   */
>  STATIC int
>  xfs_ialloc_ag_alloc(
> @@ -792,7 +793,7 @@ xfs_ialloc_ag_alloc(
>  	}
>  
>  	if (args.fsbno == NULLFSBLOCK)
> -		return 1;
> +		return -EAGAIN;
>  
>  	ASSERT(args.len == args.minlen);
>  
> @@ -1568,14 +1569,17 @@ xfs_dialloc_roll(
>  	/* Re-attach the quota info that we detached from prev trx. */
>  	tp->t_dqinfo = dqinfo;
>  
> -	*tpp = tp;
> -	if (error)
> -		return error;
> +	/*
> +	 * Join the buffer even on commit error so that the buffer is released
> +	 * when the caller cancels the transaction and doesn't have to handle
> +	 * this error case specially.
> +	 */
>  	xfs_trans_bjoin(tp, agibp);
> -	return 0;
> +	*tpp = tp;
> +	return error;
>  }
>  
> -STATIC xfs_agnumber_t
> +static xfs_agnumber_t
>  xfs_ialloc_next_ag(
>  	xfs_mount_t	*mp)
>  {
> @@ -1590,16 +1594,136 @@ xfs_ialloc_next_ag(
>  	return agno;
>  }
>  
> +static bool
> +xfs_dialloc_good_ag(
> +	struct xfs_trans	*tp,
> +	struct xfs_perag	*pag,
> +	umode_t			mode,
> +	int			flags,
> +	bool			ok_alloc)
> +{
> +	struct xfs_mount	*mp = tp->t_mountp;
> +	xfs_extlen_t		ineed;
> +	xfs_extlen_t		longest = 0;
> +	int			needspace;
> +	int			error;
> +
> +	if (!pag->pagi_inodeok)
> +		return false;
> +
> +	if (!pag->pagi_init) {
> +		error = xfs_ialloc_pagi_init(mp, tp, pag->pag_agno);
> +		if (error)
> +			return false;
> +	}
> +
> +	if (pag->pagi_freecount)
> +		return true;
> +	if (!ok_alloc)
> +		return false;
> +
> +	if (!pag->pagf_init) {
> +		error = xfs_alloc_pagf_init(mp, tp, pag->pag_agno, flags);
> +		if (error)
> +			return false;
> +	}
> +
> +	/*
> +	 * Check that there is enough free space for the file plus a chunk of
> +	 * inodes if we need to allocate some. If this is the first pass across
> +	 * the AGs, take into account the potential space needed for alignment
> +	 * of inode chunks when checking the longest contiguous free space in
> +	 * the AG - this prevents us from getting ENOSPC because we have free
> +	 * space larger than ialloc_blks but alignment constraints prevent us
> +	 * from using it.
> +	 *
> +	 * If we can't find an AG with space for full alignment slack to be
> +	 * taken into account, we must be near ENOSPC in all AGs.  Hence we
> +	 * don't include alignment for the second pass and so if we fail
> +	 * allocation due to alignment issues then it is most likely a real
> +	 * ENOSPC condition.
> +	 *
> +	 * XXX(dgc): this calculation is now bogus thanks to the per-ag
> +	 * reservations that xfs_alloc_fix_freelist() now does via
> +	 * xfs_alloc_space_available(). When the AG fills up, pagf_freeblks will
> +	 * be more than large enough for the check below to succeed, but
> +	 * xfs_alloc_space_available() will fail because of the non-zero
> +	 * metadata reservation and hence we won't actually be able to allocate
> +	 * more inodes in this AG. We do soooo much unnecessary work near ENOSPC
> +	 * because of this.

Yuck.  Can this be fixed by doing:

	really_free = pag->pagf_freeblks -
				xfs_ag_resv_needed(pag, XFS_AG_RESV_NONE);
	return really_free >= needspace + ineed && longest >= ineed)

to account for those reservations, perhaps?

> +	 */
> +	ineed = M_IGEO(mp)->ialloc_min_blks;
> +	if (flags && ineed > 1)
> +		ineed += M_IGEO(mp)->cluster_align;
> +	longest = pag->pagf_longest;
> +	if (!longest)
> +		longest = pag->pagf_flcount > 0;
> +	needspace = S_ISDIR(mode) || S_ISREG(mode) || S_ISLNK(mode);
> +
> +	if (pag->pagf_freeblks < needspace + ineed || longest < ineed)
> +		return false;
> +	return true;
> +}
> +
> +static int
> +xfs_dialloc_try_ag(
> +	struct xfs_trans	**tpp,
> +	struct xfs_perag	*pag,
> +	xfs_ino_t		parent,
> +	xfs_ino_t		*new_ino,
> +	bool			ok_alloc)
> +{
> +	struct xfs_buf		*agbp;
> +	xfs_ino_t		ino;
> +	int			error;
> +
> +	/*
> +	 * Then read in the AGI buffer and recheck with the AGI buffer
> +	 * lock held.
> +	 */
> +	error = xfs_ialloc_read_agi(pag->pag_mount, *tpp, pag->pag_agno, &agbp);
> +	if (error)
> +		return error;
> +
> +	if (!pag->pagi_freecount) {
> +		if (!ok_alloc) {
> +			error = -EAGAIN;
> +			goto out_release;
> +		}
> +
> +		error = xfs_ialloc_ag_alloc(*tpp, agbp, pag);
> +		if (error < 0)
> +			goto out_release;
> +
> +		/*
> +		 * We successfully allocated space for an inode cluster in this
> +		 * AG.  Roll the transaction so that we can allocate one of the
> +		 * new inodes.
> +		 */
> +		ASSERT(pag->pagi_freecount > 0);
> +		error = xfs_dialloc_roll(tpp, agbp);
> +		if (error)
> +			goto out_release;
> +	}
> +
> +	/* Allocate an inode in the found AG */
> +	error = xfs_dialloc_ag(*tpp, agbp, pag, parent, &ino);
> +	if (!error)
> +		*new_ino = ino;
> +	return error;
> +
> +out_release:
> +	xfs_trans_brelse(*tpp, agbp);
> +	return error;
> +}
> +
>  /*
> - * Select and prepare an AG for inode allocation.
> + * Allocate an on-disk inode.
>   *
>   * Mode is used to tell whether the new inode is a directory and hence where to
> - * locate it.
> - *
> - * This function will ensure that the selected AG has free inodes available to
> - * allocate from. The selected AGI will be returned locked to the caller, and it
> - * will allocate more free inodes if required. If no free inodes are found or
> - * can be allocated, -ENOSPC be returned.
> + * locate it. The on-disk inode that is allocated will be returned in @new_ino
> + * on success, otherwise an error will be set to indicate the failure (e.g.
> + * -ENOSPC).
>   */
>  int
>  xfs_dialloc(
> @@ -1609,14 +1733,12 @@ xfs_dialloc(
>  	xfs_ino_t		*new_ino)
>  {
>  	struct xfs_mount	*mp = (*tpp)->t_mountp;
> -	struct xfs_buf		*agbp;
>  	xfs_agnumber_t		agno;
>  	int			error = 0;
>  	xfs_agnumber_t		start_agno;
>  	struct xfs_perag	*pag;
>  	struct xfs_ino_geometry	*igeo = M_IGEO(mp);
> -	bool			okalloc = true;
> -	int			needspace;
> +	bool			ok_alloc = true;
>  	int			flags;
>  	xfs_ino_t		ino;
>  
> @@ -1624,7 +1746,6 @@ xfs_dialloc(
>  	 * Files of these types need at least one block if length > 0
>  	 * (and they won't fit in the inode, but that's hard to figure out).
>  	 */
> -	needspace = S_ISDIR(mode) || S_ISREG(mode) || S_ISLNK(mode);
>  	if (S_ISDIR(mode))
>  		start_agno = xfs_ialloc_next_ag(mp);
>  	else {
> @@ -1635,7 +1756,7 @@ xfs_dialloc(
>  
>  	/*
>  	 * If we have already hit the ceiling of inode blocks then clear
> -	 * okalloc so we scan all available agi structures for a free
> +	 * ok_alloc so we scan all available agi structures for a free
>  	 * inode.
>  	 *
>  	 * Read rough value of mp->m_icount by percpu_counter_read_positive,
> @@ -1644,7 +1765,7 @@ xfs_dialloc(

Er... didn't this logic get split into xfs_dialloc_select_ag in 5.11?

Nice cleanup, at least...

--D

>  	if (igeo->maxicount &&
>  	    percpu_counter_read_positive(&mp->m_icount) + igeo->ialloc_inos
>  							> igeo->maxicount) {
> -		okalloc = false;
> +		ok_alloc = false;
>  	}
>  
>  	/*
> @@ -1655,95 +1776,14 @@ xfs_dialloc(
>  	agno = start_agno;
>  	flags = XFS_ALLOC_FLAG_TRYLOCK;
>  	for (;;) {
> -		xfs_extlen_t	ineed;
> -		xfs_extlen_t	longest = 0;
> -
>  		pag = xfs_perag_get(mp, agno);
> -		if (!pag->pagi_inodeok)
> -			goto nextag;
> -
> -		if (!pag->pagi_init) {
> -			error = xfs_ialloc_pagi_init(mp, *tpp, agno);
> -			if (error)
> -				break;
> -		}
> -
> -		if (!pag->pagi_freecount)
> -			goto nextag;
> -		if (!okalloc)
> -			goto nextag;
> -
> -		if (!pag->pagf_init) {
> -			error = xfs_alloc_pagf_init(mp, *tpp, agno, flags);
> -			if (error)
> -				goto nextag;
> -		}
> -
> -		/*
> -		 * Check that there is enough free space for the file plus a
> -		 * chunk of inodes if we need to allocate some. If this is the
> -		 * first pass across the AGs, take into account the potential
> -		 * space needed for alignment of inode chunks when checking the
> -		 * longest contiguous free space in the AG - this prevents us
> -		 * from getting ENOSPC because we have free space larger than
> -		 * ialloc_blks but alignment constraints prevent us from using
> -		 * it.
> -		 *
> -		 * If we can't find an AG with space for full alignment slack to
> -		 * be taken into account, we must be near ENOSPC in all AGs.
> -		 * Hence we don't include alignment for the second pass and so
> -		 * if we fail allocation due to alignment issues then it is most
> -		 * likely a real ENOSPC condition.
> -		 */
> -		ineed = M_IGEO(mp)->ialloc_min_blks;
> -		if (flags && ineed > 1)
> -			ineed += M_IGEO(mp)->cluster_align;
> -		longest = pag->pagf_longest;
> -		if (!longest)
> -			longest = pag->pagf_flcount > 0;
> -
> -		if (pag->pagf_freeblks < needspace + ineed || longest < ineed)
> -			goto nextag;
> -
> -		/*
> -		 * Then read in the AGI buffer and recheck with the AGI buffer
> -		 * lock held.
> -		 */
> -		error = xfs_ialloc_read_agi(mp, *tpp, agno, &agbp);
> -		if (error)
> -			break;
> -
> -		if (pag->pagi_freecount)
> -			goto found_ag;
> -
> -		if (!okalloc)
> -			goto nextag_relse_buffer;
> -
> -		error = xfs_ialloc_ag_alloc(*tpp, agbp, pag);
> -		if (error < 0) {
> -			xfs_trans_brelse(*tpp, agbp);
> -			break;
> -		}
> -
> -		if (error == 0) {
> -			/*
> -			 * We successfully allocated space for an inode cluster
> -			 * in this AG.  Roll the transaction so that we can
> -			 * allocate one of the new inodes.
> -			 */
> -			ASSERT(pag->pagi_freecount > 0);
> -
> -			error = xfs_dialloc_roll(tpp, agbp);
> -			if (error) {
> -				xfs_buf_relse(agbp);
> +		if (xfs_dialloc_good_ag(*tpp, pag, mode, flags, ok_alloc)) {
> +			error = xfs_dialloc_try_ag(tpp, pag, parent,
> +					&ino, ok_alloc);
> +			if (error != -EAGAIN)
>  				break;
> -			}
> -			goto found_ag;
>  		}
>  
> -nextag_relse_buffer:
> -		xfs_trans_brelse(*tpp, agbp);
> -nextag:
>  		if (XFS_FORCED_SHUTDOWN(mp)) {
>  			error = -EFSCORRUPTED;
>  			break;
> @@ -1751,23 +1791,19 @@ xfs_dialloc(
>  		if (++agno == mp->m_maxagi)
>  			agno = 0;
>  		if (agno == start_agno) {
> -			if (!flags)
> +			if (!flags) {
> +				error = -ENOSPC;
>  				break;
> +			}
>  			flags = 0;
>  		}
>  		xfs_perag_put(pag);
>  	}
>  
> +	if (!error)
> +		*new_ino = ino;
>  	xfs_perag_put(pag);
> -	return error ? error : -ENOSPC;
> -found_ag:
> -	/* Allocate an inode in the found AG */
> -	error = xfs_dialloc_ag(*tpp, agbp, pag, parent, &ino);
> -	xfs_perag_put(pag);
> -	if (error)
> -		return error;
> -	*new_ino = ino;
> -	return 0;
> +	return error;
>  }
>  
>  /*
> -- 
> 2.31.1
> 

  reply	other threads:[~2021-05-12 22:42 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-06  7:20 [RFC 00/22] xfs: initial agnumber -> perag conversions for shrink Dave Chinner
2021-05-06  7:20 ` [PATCH 01/22] xfs: move xfs_perag_get/put to xfs_ag.[ch] Dave Chinner
2021-05-10 12:52   ` Brian Foster
2021-05-11  7:18     ` Dave Chinner
2021-05-10 22:28   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 02/22] xfs: prepare for moving perag definitions and support to libxfs Dave Chinner
2021-05-10 12:53   ` Brian Foster
2021-05-11  7:19     ` Dave Chinner
2021-05-06  7:20 ` [PATCH 03/22] xfs: move perag structure and setup to libxfs/xfs_ag.[ch] Dave Chinner
2021-05-10 22:26   ` Darrick J. Wong
2021-05-10 23:38     ` Dave Chinner
2021-05-06  7:20 ` [PATCH 04/22] xfs: make for_each_perag... a first class citizen Dave Chinner
2021-05-10 12:53   ` Brian Foster
2021-05-11  7:35     ` Dave Chinner
2021-05-11 12:29       ` Brian Foster
2021-05-11 21:33         ` Dave Chinner
2021-05-12 21:58           ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 05/22] xfs: convert raw ag walks to use for_each_perag Dave Chinner
2021-05-10 12:54   ` Brian Foster
2021-05-06  7:20 ` [PATCH 06/22] xfs: convert xfs_iwalk to use perag references Dave Chinner
2021-05-10 13:41   ` Brian Foster
2021-05-12 22:08   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 07/22] xfs: convert secondary superblock walk to use perags Dave Chinner
2021-05-10 13:41   ` Brian Foster
2021-05-12 22:09   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 08/22] xfs: pass perags through to the busy extent code Dave Chinner
2021-05-11 12:29   ` Brian Foster
2021-05-12 22:13   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 09/22] xfs: push perags through the ag reservation callouts Dave Chinner
2021-05-11 12:29   ` Brian Foster
2021-05-13  0:29     ` Dave Chinner
2021-05-12 22:16   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 10/22] xfs: pass perags around in fsmap data dev functions Dave Chinner
2021-05-11 12:30   ` Brian Foster
2021-05-12 22:23   ` Darrick J. Wong
2021-05-18  1:00     ` Dave Chinner
2021-05-06  7:20 ` [PATCH 11/22] xfs: add a perag to the btree cursor Dave Chinner
2021-05-11 12:30   ` Brian Foster
2021-05-11 20:51     ` Darrick J. Wong
2021-05-11 21:52       ` Dave Chinner
2021-05-12 12:49         ` Brian Foster
2021-05-12 22:41           ` Darrick J. Wong
2021-05-12 22:40   ` Darrick J. Wong
2021-05-13  0:12     ` Dave Chinner
2021-05-13  0:55       ` Darrick J. Wong
2021-05-13  1:07         ` Dave Chinner
2021-05-13  3:49           ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 12/22] xfs: convert rmap btree cursor to using a perag Dave Chinner
2021-05-11 12:30   ` Brian Foster
2021-05-12 22:45   ` Darrick J. Wong
2021-05-13  3:54     ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 13/22] xfs: convert refcount btree cursor to use perags Dave Chinner
2021-05-11 12:30   ` Brian Foster
2021-05-12 22:47   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 14/22] xfs: convert allocbt cursors " Dave Chinner
2021-05-11 12:30   ` Brian Foster
2021-05-13  3:55   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 15/22] xfs: use perag for ialloc btree cursors Dave Chinner
2021-05-11 12:30   ` Brian Foster
2021-05-13  3:55   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 16/22] xfs: remove agno from btree cursor Dave Chinner
2021-05-11 12:34   ` Brian Foster
2021-05-11 22:02     ` Dave Chinner
2021-05-12 22:52   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 17/22] xfs: simplify xfs_dialloc_select_ag() return values Dave Chinner
2021-05-12 12:49   ` Brian Foster
2021-05-12 22:55   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 18/22] xfs: collapse AG selection for inode allocation Dave Chinner
2021-05-12 12:52   ` Brian Foster
2021-05-18  1:21     ` Dave Chinner
2021-05-12 23:11   ` Darrick J. Wong
2021-05-18  1:29     ` Dave Chinner
2021-05-06  7:20 ` [PATCH 19/22] xfs: get rid of xfs_dir_ialloc() Dave Chinner
2021-05-06 11:26   ` kernel test robot
2021-05-06 11:26     ` kernel test robot
2021-05-06 11:26   ` [RFC PATCH] xfs: xfs_dialloc_ag can be static kernel test robot
2021-05-06 11:26     ` kernel test robot
2021-05-12 12:52   ` [PATCH 19/22] xfs: get rid of xfs_dir_ialloc() Brian Foster
2021-05-12 23:19   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 20/22] xfs: inode allocation can use a single perag instance Dave Chinner
2021-05-12 12:52   ` Brian Foster
2021-05-12 23:19   ` Darrick J. Wong
2021-05-06  7:20 ` [PATCH 21/22] xfs: clean up and simplify xfs_dialloc() Dave Chinner
2021-05-12 21:49   ` Darrick J. Wong [this message]
2021-05-18  1:46     ` Dave Chinner
2021-05-06  7:20 ` [PATCH 22/22] xfs: use perag through unlink processing Dave Chinner
2021-05-12 21:37   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210512214930.GZ8582@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.