linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 3/4] xfs: amortize agfl block frees across multiple transactions
Date: Mon, 27 Nov 2017 15:24:33 -0500	[thread overview]
Message-ID: <20171127202434.43125-4-bfoster@redhat.com> (raw)
In-Reply-To: <20171127202434.43125-1-bfoster@redhat.com>

We've had rare reports of transaction overruns in
xfs_inactive_ifree() for quite some time. Analysis of a reproducing
metadump has shown the problem is essentially caused by performing
too many agfl block frees in a single transaction.

For example, an inode chunk is freed and the associated agfl fixup
algorithm discovers it needs to free a single agfl block before the
chunk free occurs. This single block ends up causing a space btree
join and adds one or more blocks back onto the agfl. This causes
xfs_alloc_fix_freelist() to free up to 3 blocks just to rectify a
single block discrepency.

The transaction overrun occurs under several other unfortunate
circumstances:

 - Each agfl block free is left+right contiguous. This requires 2
   record deletions and 1 insertion for the cntbt and thus requires
   more log reservation than normal.
 - The associated transaction is the first in the CIL ctx and thus
   the ctx header reservation is consumed.
 - The transaction reservation is larger than a log buffer and thus
   extra split header reservation is consumed.

As a result of the agfl and free space state of the filesystem, the
agfl fixup code has dirtied more cntbt buffer space than allowed by
the portion of the reservation allotted for block allocation. This
is all before the real allocation even starts!

Note that the log related conditions above are correctly covered by
the existing transaction reservation. The above demonstrates that
the reservation allotted for the context/split headers may help
suppress overruns in the more common case where that reservation
goes unused for its intended purpose.

To address this problem, update xfs_alloc_fix_freelist() to amortize
agfl block frees over multiple transactions. Free one block per
transaction so long as the agfl is less than half free. The agfl
minimum allocation requirement is dynamic, but is based on the
geometry of the associated btrees (i.e., level count) and therefore
should be easily rectified over multiple allocation transactions.
Further, there is no real harm in leaving extraneous blocks on the
agfl so long as there are enough free slots available for btree
blocks freed as a result of the upcoming allocation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 0da80019a917..d8d58e35da00 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2117,11 +2117,6 @@ xfs_alloc_fix_freelist(
 	 * appropriately based on the recursion count and dirty state of the
 	 * buffer.
 	 *
-	 * XXX (dgc): When we have lots of free space, does this buy us
-	 * anything other than extra overhead when we need to put more blocks
-	 * back on the free list? Maybe we should only do this when space is
-	 * getting low or the AGFL is more than half full?
-	 *
 	 * The NOSHRINK flag prevents the AGFL from being shrunk if it's too
 	 * big; the NORMAP flag prevents AGFL expand/shrink operations from
 	 * updating the rmapbt.  Both flags are used in xfs_repair while we're
@@ -2151,6 +2146,16 @@ xfs_alloc_fix_freelist(
 			goto out_agbp_relse;
 		}
 		xfs_trans_binval(tp, bp);
+
+		/*
+		 * Freeing all extra agfl blocks adds too much log reservation
+		 * overhead to a single transaction, particularly considering
+		 * that freeing a block can cause a btree join and put one right
+		 * back on the agfl. Try to free one block per tx so long as
+		 * we've left enough free slots for the upcoming modifications.
+		 */
+		if (pag->pagf_flcount <= (XFS_AGFL_SIZE(mp) >> 1))
+			break;
 	}
 
 	targs.tp = tp;
-- 
2.13.6


  parent reply	other threads:[~2017-11-27 20:24 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-27 20:24 [PATCH 0/4] xfs: inode transaction reservation fixups Brian Foster
2017-11-27 20:24 ` [PATCH 1/4] xfs: print transaction log reservation on overrun Brian Foster
2017-11-27 22:14   ` Dave Chinner
2017-11-27 20:24 ` [PATCH 2/4] xfs: include inobt buffers in ifree tx log reservation Brian Foster
2017-11-27 22:28   ` Dave Chinner
2017-11-28 13:30     ` Brian Foster
2017-11-28 21:38       ` Dave Chinner
2017-11-29 14:31         ` Brian Foster
2017-11-27 20:24 ` Brian Foster [this message]
2017-11-27 23:07   ` [PATCH 3/4] xfs: amortize agfl block frees across multiple transactions Dave Chinner
2017-11-28 13:57     ` Brian Foster
2017-11-28 22:09       ` Dave Chinner
2017-11-29 18:24         ` Brian Foster
2017-11-29 20:36           ` Brian Foster
2017-12-05 20:53             ` Brian Foster
2017-11-27 20:24 ` [PATCH RFC 4/4] xfs: include an allocfree res for inobt modifications Brian Foster
2017-11-27 23:27   ` Dave Chinner
2017-11-28 14:04     ` Brian Foster
2017-11-28 22:26       ` Dave Chinner
2017-11-29 14:32         ` Brian Foster
2017-11-28 15:49     ` Brian Foster
2017-11-28 22:34       ` Dave Chinner
2017-11-29 14:32         ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171127202434.43125-4-bfoster@redhat.com \
    --to=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).