All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org, Brian Foster <bfoster@redhat.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>
Subject: [PATCH 17/26] xfs: split indlen reservations fairly when under reserved
Date: Mon, 27 Mar 2017 10:38:28 +0200	[thread overview]
Message-ID: <20170327083837.11027-18-hch@lst.de> (raw)
In-Reply-To: <20170327083837.11027-1-hch@lst.de>

From: Brian Foster <bfoster@redhat.com>

commit 75d65361cf3c0dae2af970c305e19c727b28a510 upstream.

Certain workoads that punch holes into speculative preallocation can
cause delalloc indirect reservation splits when the delalloc extent is
split in two. If further splits occur, an already short-handed extent
can be split into two in a manner that leaves zero indirect blocks for
one of the two new extents. This occurs because the shortage is large
enough that the xfs_bmap_split_indlen() algorithm completely drains the
requested indlen of one of the extents before it honors the existing
reservation.

This ultimately results in a warning from xfs_bmap_del_extent(). This
has been observed during file copies of large, sparse files using 'cp
--sparse=always.'

To avoid this problem, update xfs_bmap_split_indlen() to explicitly
apply the reservation shortage fairly between both extents. This smooths
out the overall indlen shortage and defers the situation where we end up
with a delalloc extent with zero indlen reservation to extreme
circumstances.

Reported-by: Patrick Dung <mpatdung@gmail.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 61 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6eb9ae8ed22d..0eece4c13500 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4805,34 +4805,59 @@ xfs_bmap_split_indlen(
 	xfs_filblks_t			len2 = *indlen2;
 	xfs_filblks_t			nres = len1 + len2; /* new total res. */
 	xfs_filblks_t			stolen = 0;
+	xfs_filblks_t			resfactor;
 
 	/*
 	 * Steal as many blocks as we can to try and satisfy the worst case
 	 * indlen for both new extents.
 	 */
-	while (nres > ores && avail) {
-		nres--;
-		avail--;
-		stolen++;
-	}
+	if (ores < nres && avail)
+		stolen = XFS_FILBLKS_MIN(nres - ores, avail);
+	ores += stolen;
+
+	 /* nothing else to do if we've satisfied the new reservation */
+	if (ores >= nres)
+		return stolen;
+
+	/*
+	 * We can't meet the total required reservation for the two extents.
+	 * Calculate the percent of the overall shortage between both extents
+	 * and apply this percentage to each of the requested indlen values.
+	 * This distributes the shortage fairly and reduces the chances that one
+	 * of the two extents is left with nothing when extents are repeatedly
+	 * split.
+	 */
+	resfactor = (ores * 100);
+	do_div(resfactor, nres);
+	len1 *= resfactor;
+	do_div(len1, 100);
+	len2 *= resfactor;
+	do_div(len2, 100);
+	ASSERT(len1 + len2 <= ores);
+	ASSERT(len1 < *indlen1 && len2 < *indlen2);
 
 	/*
-	 * The only blocks available are those reserved for the original
-	 * extent and what we can steal from the extent being removed.
-	 * If this still isn't enough to satisfy the combined
-	 * requirements for the two new extents, skim blocks off of each
-	 * of the new reservations until they match what is available.
+	 * Hand out the remainder to each extent. If one of the two reservations
+	 * is zero, we want to make sure that one gets a block first. The loop
+	 * below starts with len1, so hand len2 a block right off the bat if it
+	 * is zero.
 	 */
-	while (nres > ores) {
-		if (len1) {
-			len1--;
-			nres--;
+	ores -= (len1 + len2);
+	ASSERT((*indlen1 - len1) + (*indlen2 - len2) >= ores);
+	if (ores && !len2 && *indlen2) {
+		len2++;
+		ores--;
+	}
+	while (ores) {
+		if (len1 < *indlen1) {
+			len1++;
+			ores--;
 		}
-		if (nres == ores)
+		if (!ores)
 			break;
-		if (len2) {
-			len2--;
-			nres--;
+		if (len2 < *indlen2) {
+			len2++;
+			ores--;
 		}
 	}
 
-- 
2.11.0


  parent reply	other threads:[~2017-03-27  8:40 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-27  8:38 Christoph Hellwig
2017-03-27  8:38 ` [PATCH 01/26] xfs: pull up iolock from xfs_free_eofblocks() Christoph Hellwig
2017-03-27  8:38 ` [PATCH 02/26] xfs: sync eofblocks scans under iolock are livelock prone Christoph Hellwig
2017-03-27  8:38 ` [PATCH 03/26] xfs: fix eofblocks race with file extending async dio writes Christoph Hellwig
2017-03-27  8:38 ` [PATCH 04/26] xfs: fix toctou race when locking an inode to access the data map Christoph Hellwig
2017-03-27  8:38 ` [PATCH 05/26] xfs: fail _dir_open when readahead fails Christoph Hellwig
2017-03-27  8:38 ` [PATCH 06/26] xfs: filter out obviously bad btree pointers Christoph Hellwig
2017-03-27  8:38 ` [PATCH 07/26] xfs: check for obviously bad level values in the bmbt root Christoph Hellwig
2017-03-27  8:38 ` [PATCH 08/26] xfs: verify free block header fields Christoph Hellwig
2017-03-27  8:38 ` [PATCH 09/26] xfs: allow unwritten extents in the CoW fork Christoph Hellwig
2017-03-27  8:38 ` [PATCH 10/26] xfs: mark speculative prealloc CoW fork extents unwritten Christoph Hellwig
2017-03-27  8:38 ` [PATCH 11/26] xfs: reset b_first_retry_time when clear the retry status of xfs_buf_t Christoph Hellwig
2017-03-27  8:38 ` [PATCH 12/26] xfs: reject all unaligned direct writes to reflinked files Christoph Hellwig
2017-03-27  8:38 ` [PATCH 13/26] xfs: update ctime and mtime on clone destinatation inodes Christoph Hellwig
2017-03-27  8:38 ` [PATCH 14/26] xfs: correct null checks and error processing in xfs_initialize_perag Christoph Hellwig
2017-03-27  8:38 ` [PATCH 15/26] xfs: don't fail xfs_extent_busy allocation Christoph Hellwig
2017-03-27  8:38 ` [PATCH 16/26] xfs: handle indlen shortage on delalloc extent merge Christoph Hellwig
2017-03-27  8:38 ` Christoph Hellwig [this message]
2017-03-27  8:38 ` [PATCH 18/26] xfs: fix uninitialized variable in _reflink_convert_cow Christoph Hellwig
2017-03-27  8:38 ` [PATCH 19/26] xfs: don't reserve blocks for right shift transactions Christoph Hellwig
2017-03-27  8:38 ` [PATCH 20/26] xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment Christoph Hellwig
2017-03-27  8:38 ` [PATCH 21/26] xfs: tune down agno asserts in the bmap code Christoph Hellwig
2017-03-27  8:38 ` [PATCH 22/26] xfs: only reclaim unwritten COW extents periodically Christoph Hellwig
2017-03-27  8:38 ` [PATCH 23/26] xfs: fix and streamline error handling in xfs_end_io Christoph Hellwig
2017-03-27  8:38 ` [PATCH 24/26] xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask Christoph Hellwig
2017-03-27  8:38 ` [PATCH 25/26] xfs: use iomap new flag for newly allocated delalloc blocks Christoph Hellwig
2017-03-27  8:38 ` [PATCH 26/26] xfs: try any AG when allocating the first btree block when reflinking Christoph Hellwig
2017-04-01  6:34 4.10-stable updates for XFS Christoph Hellwig
2017-04-01  6:35 ` [PATCH 17/26] xfs: split indlen reservations fairly when under reserved Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170327083837.11027-18-hch@lst.de \
    --to=hch@lst.de \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.