[PATCH 0/4] clean up collapse range and handle post-eof delalloc

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] clean up collapse range and handle post-eof delalloc
@ 2014-09-07 12:25 Brian Foster
  2014-09-07 12:25 ` [PATCH 1/4] xfs: track collapse via file offset rather than extent index Brian Foster
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Brian Foster @ 2014-09-07 12:25 UTC (permalink / raw)
  To: xfs

Hi all,

Here's a new drop of the patches that tweak collapse to be fsb based and
organize the code a bit. This has seen ~1.5b fsx ops without a failure
and some light xfstests testing so far.

The primary difference from this version and the rfc is the retention of
the eofblocks trim prior to collapse. Patch 4 is also new. It isolates
the pagecache writeback/truncate in xfs_free_file_space() to the range
affected by the free. Thoughts, reviews, flames appreciated.

Brian

v1:
- Retain the eofblocks trim and writeback/inval. the range of shifted
  data only.
- Added the xfs_free_file_space() patch to no longer writeback the
  entire file.
rfc: http://oss.sgi.com/archives/xfs/2014-08/msg00462.html

Brian Foster (4):
  xfs: track collapse via file offset rather than extent index
  xfs: refactor xfs_bmap_shift_extents() into multiple functions
  xfs: writeback and inval. file range to be shifted by collapse
  xfs: only writeback and truncate pages for the freed range

 fs/xfs/libxfs/xfs_bmap.c | 294 +++++++++++++++++++++++++++++++----------------
 fs/xfs/libxfs/xfs_bmap.h |   7 +-
 fs/xfs/xfs_bmap_util.c   |  54 +++++----
 3 files changed, 230 insertions(+), 125 deletions(-)

-- 
1.8.3.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] xfs: track collapse via file offset rather than extent index
  2014-09-07 12:25 [PATCH 0/4] clean up collapse range and handle post-eof delalloc Brian Foster
@ 2014-09-07 12:25 ` Brian Foster
  2014-09-09  4:03   ` Dave Chinner
  2014-09-07 12:25 ` [PATCH 2/4] xfs: refactor xfs_bmap_shift_extents() into multiple functions Brian Foster
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Brian Foster @ 2014-09-07 12:25 UTC (permalink / raw)
  To: xfs

The collapse range implementation uses a transaction per extent shift.
The progress of the overall operation is tracked via the current extent
index of the in-core extent list. This is racy because the ilock must be
dropped and reacquired for each transaction according to locking and log
reservation rules. Therefore, writeback to prior regions of the file is
possible and can change the extent count. This changes the extent to
which the current index refers and causes the collapse to fail mid
operation. To avoid this problem, the entire file is currently written
back before the collapse operation starts.

To eliminate the need to flush the entire file, use the file offset
(fsb) to track the progress of the overall extent shift operation rather
than the extent index. Modify xfs_bmap_shift_extents() to
unconditionally convert the start_fsb parameter to an extent index and
return the file offset of the extent where the shift left off, if
further extents exist. The bulk of ths function can remain based on
extent index as ilock is held by the caller. xfs_collapse_file_space()
now uses the fsb output as the starting point for the subsequent shift.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 85 +++++++++++++++++++++++++-----------------------
 fs/xfs/libxfs/xfs_bmap.h |  7 ++--
 fs/xfs/xfs_bmap_util.c   | 12 +++----
 3 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 86df952..4b3f1b9 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5406,20 +5406,21 @@ error0:
 /*
  * Shift extent records to the left to cover a hole.
  *
- * The maximum number of extents to be shifted in a single operation
- * is @num_exts, and @current_ext keeps track of the current extent
- * index we have shifted. @offset_shift_fsb is the length by which each
- * extent is shifted. If there is no hole to shift the extents
- * into, this will be considered invalid operation and we abort immediately.
+ * The maximum number of extents to be shifted in a single operation is
+ * @num_exts. @start_fsb specifies the file offset to start the shift and the
+ * file offset where we've left off is returned in @next_fsb. @offset_shift_fsb
+ * is the length by which each extent is shifted. If there is no hole to shift
+ * the extents into, this will be considered invalid operation and we abort
+ * immediately.
  */
 int
 xfs_bmap_shift_extents(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip,
-	int			*done,
 	xfs_fileoff_t		start_fsb,
 	xfs_fileoff_t		offset_shift_fsb,
-	xfs_extnum_t		*current_ext,
+	int			*done,
+	xfs_fileoff_t		*next_fsb,
 	xfs_fsblock_t		*firstblock,
 	struct xfs_bmap_free	*flist,
 	int			num_exts)
@@ -5431,6 +5432,7 @@ xfs_bmap_shift_extents(
 	struct xfs_mount		*mp = ip->i_mount;
 	struct xfs_ifork		*ifp;
 	xfs_extnum_t			nexts = 0;
+	xfs_extnum_t			current_ext;
 	xfs_fileoff_t			startoff;
 	int				error = 0;
 	int				i;
@@ -5451,7 +5453,8 @@ xfs_bmap_shift_extents(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	ASSERT(current_ext != NULL);
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
@@ -5462,20 +5465,18 @@ xfs_bmap_shift_extents(
 	}
 
 	/*
-	 * If *current_ext is 0, we would need to lookup the extent
-	 * from where we would start shifting and store it in gotp.
+	 * Look up the extent index for the fsb where we start shifting. We can
+	 * henceforth iterate with current_ext as extent list changes are locked
+	 * out via ilock.
+	 *
+	 * gotp can be null in 2 cases: 1) if there are no extents or 2)
+	 * start_fsb lies in a hole beyond which there are no extents. Either
+	 * way, we are done.
 	 */
-	if (!*current_ext) {
-		gotp = xfs_iext_bno_to_ext(ifp, start_fsb, current_ext);
-		/*
-		 * gotp can be null in 2 cases: 1) if there are no extents
-		 * or 2) start_fsb lies in a hole beyond which there are
-		 * no extents. Either way, we are done.
-		 */
-		if (!gotp) {
-			*done = 1;
-			return 0;
-		}
+	gotp = xfs_iext_bno_to_ext(ifp, start_fsb, &current_ext);
+	if (!gotp) {
+		*done = 1;
+		return 0;
 	}
 
 	if (ifp->if_flags & XFS_IFBROOT) {
@@ -5487,36 +5488,32 @@ xfs_bmap_shift_extents(
 
 	/*
 	 * There may be delalloc extents in the data fork before the range we
-	 * are collapsing out, so we cannot
-	 * use the count of real extents here. Instead we have to calculate it
-	 * from the incore fork.
+	 * are collapsing out, so we cannot use the count of real extents here.
+	 * Instead we have to calculate it from the incore fork.
 	 */
 	total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
-	while (nexts++ < num_exts && *current_ext < total_extents) {
+	while (nexts++ < num_exts && current_ext < total_extents) {
 
-		gotp = xfs_iext_get_ext(ifp, *current_ext);
+		gotp = xfs_iext_get_ext(ifp, current_ext);
 		xfs_bmbt_get_all(gotp, &got);
 		startoff = got.br_startoff - offset_shift_fsb;
 
 		/*
-		 * Before shifting extent into hole, make sure that the hole
-		 * is large enough to accomodate the shift.
+		 * Before shifting extent into hole, make sure that the hole is
+		 * large enough to accommodate the shift.
 		 */
-		if (*current_ext) {
-			xfs_bmbt_get_all(xfs_iext_get_ext(ifp,
-						*current_ext - 1), &left);
-
+		if (current_ext > 0) {
+			xfs_bmbt_get_all(xfs_iext_get_ext(ifp, current_ext - 1),
+					 &left);
 			if (startoff < left.br_startoff + left.br_blockcount)
 				error = -EINVAL;
 		} else if (offset_shift_fsb > got.br_startoff) {
 			/*
-			 * When first extent is shifted, offset_shift_fsb
-			 * should be less than the stating offset of
-			 * the first extent.
+			 * When first extent is shifted, offset_shift_fsb should
+			 * be less than the stating offset of the first extent.
 			 */
 			error = -EINVAL;
 		}
-
 		if (error)
 			goto del_cursor;
 
@@ -5531,7 +5528,7 @@ xfs_bmap_shift_extents(
 		}
 
 		/* Check if we can merge 2 adjacent extents */
-		if (*current_ext &&
+		if (current_ext &&
 		    left.br_startoff + left.br_blockcount == startoff &&
 		    left.br_startblock + left.br_blockcount ==
 				got.br_startblock &&
@@ -5539,7 +5536,7 @@ xfs_bmap_shift_extents(
 		    left.br_blockcount + got.br_blockcount <= MAXEXTLEN) {
 			blockcount = left.br_blockcount +
 				got.br_blockcount;
-			xfs_iext_remove(ip, *current_ext, 1, 0);
+			xfs_iext_remove(ip, current_ext, 1, 0);
 			logflags |= XFS_ILOG_CORE;
 			if (cur) {
 				error = xfs_btree_delete(cur, &i);
@@ -5551,7 +5548,7 @@ xfs_bmap_shift_extents(
 			}
 			XFS_IFORK_NEXT_SET(ip, whichfork,
 				XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
-			gotp = xfs_iext_get_ext(ifp, --*current_ext);
+			gotp = xfs_iext_get_ext(ifp, --current_ext);
 			xfs_bmbt_get_all(gotp, &got);
 
 			/* Make cursor point to the extent we will update */
@@ -5585,13 +5582,18 @@ xfs_bmap_shift_extents(
 			logflags |= XFS_ILOG_DEXT;
 		}
 
-		(*current_ext)++;
+		current_ext++;
 		total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
 	}
 
 	/* Check if we are done */
-	if (*current_ext == total_extents)
+	if (current_ext == total_extents)
 		*done = 1;
+	else if (next_fsb) {
+		gotp = xfs_iext_get_ext(ifp, current_ext);
+		xfs_bmbt_get_all(gotp, &got);
+		*next_fsb = got.br_startoff;
+	}
 
 del_cursor:
 	if (cur)
@@ -5600,5 +5602,6 @@ del_cursor:
 
 	if (logflags)
 		xfs_trans_log_inode(tp, ip, logflags);
+
 	return error;
 }
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index b879ca5..44db6db 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -178,9 +178,8 @@ int	xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
 		xfs_extnum_t num);
 uint	xfs_default_attroffset(struct xfs_inode *ip);
 int	xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
-		int *done, xfs_fileoff_t start_fsb,
-		xfs_fileoff_t offset_shift_fsb, xfs_extnum_t *current_ext,
-		xfs_fsblock_t *firstblock, struct xfs_bmap_free	*flist,
-		int num_exts);
+		xfs_fileoff_t start_fsb, xfs_fileoff_t offset_shift_fsb,
+		int *done, xfs_fileoff_t *next_fsb, xfs_fsblock_t *firstblock,
+		struct xfs_bmap_free *flist, int num_exts);
 
 #endif	/* __XFS_BMAP_H__ */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 1707980..1e96d77 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1456,18 +1456,18 @@ xfs_collapse_file_space(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
 	int			error;
-	xfs_extnum_t		current_ext = 0;
 	struct xfs_bmap_free	free_list;
 	xfs_fsblock_t		first_block;
 	int			committed;
 	xfs_fileoff_t		start_fsb;
+	xfs_fileoff_t		next_fsb;
 	xfs_fileoff_t		shift_fsb;
 
 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
 
 	trace_xfs_collapse_file_space(ip);
 
-	start_fsb = XFS_B_TO_FSB(mp, offset + len);
+	next_fsb = XFS_B_TO_FSB(mp, offset + len);
 	shift_fsb = XFS_B_TO_FSB(mp, len);
 
 	/*
@@ -1525,10 +1525,10 @@ xfs_collapse_file_space(
 		 * We are using the write transaction in which max 2 bmbt
 		 * updates are allowed
 		 */
-		error = xfs_bmap_shift_extents(tp, ip, &done, start_fsb,
-					       shift_fsb, &current_ext,
-					       &first_block, &free_list,
-					       XFS_BMAP_MAX_SHIFT_EXTENTS);
+		start_fsb = next_fsb;
+		error = xfs_bmap_shift_extents(tp, ip, start_fsb, shift_fsb,
+				&done, &next_fsb, &first_block, &free_list,
+				XFS_BMAP_MAX_SHIFT_EXTENTS);
 		if (error)
 			goto out;
 
-- 
1.8.3.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/4] xfs: refactor xfs_bmap_shift_extents() into multiple functions
  2014-09-07 12:25 [PATCH 0/4] clean up collapse range and handle post-eof delalloc Brian Foster
  2014-09-07 12:25 ` [PATCH 1/4] xfs: track collapse via file offset rather than extent index Brian Foster
@ 2014-09-07 12:25 ` Brian Foster
  2014-09-09  5:08   ` Dave Chinner
  2014-09-07 12:25 ` [PATCH 3/4] xfs: writeback and inval. file range to be shifted by collapse Brian Foster
  2014-09-07 12:26 ` [PATCH 4/4] xfs: only writeback and truncate pages for the freed range Brian Foster
  3 siblings, 1 reply; 11+ messages in thread
From: Brian Foster @ 2014-09-07 12:25 UTC (permalink / raw)
  To: xfs

The extent shift mechanism in xfs_bmap_shift_extents() is complicated
and handles several different, non-deterministic scenarios. These
include extent shifts, extent merges and potential btree updates in
either of the former scenarios.

Refactor the code to be more linear and readable. The loop logic in
xfs_bmap_shift_extents() and some initial error checking is adjusted
slightly. The associated btree lookup and update/delete operations are
condensed into single blocks of code. This reduces the number of
btree-specific blocks and facilitates the separation of the merge
operation into a new xfs_bmap_shift_extents_merge() helper.  The merge
check is also separated into an inline.

This is a code refactor only. The behavior of extent shift and collapse
range is not modified.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 243 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 168 insertions(+), 75 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 4b3f1b9..449a016 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5404,6 +5404,120 @@ error0:
 }
 
 /*
+ * Determine whether an extent shift can be accomplished by a merge with the
+ * extent that precedes the target hole of the shift.
+ */
+static inline bool
+xfs_bmap_shift_extents_can_merge(
+	struct xfs_bmbt_irec	*left,	/* preceding extent */
+	struct xfs_bmbt_irec	*got,	/* current extent to shift */
+	xfs_fileoff_t		shift)	/* shift fsb */
+{
+	xfs_fileoff_t		startoff;
+
+	startoff = got->br_startoff - shift;
+
+	/*
+	 * The extent, once shifted, must be adjacent in-file and on-disk with
+	 * the preceding extent.
+	 */
+	if ((left->br_startoff + left->br_blockcount != startoff) ||
+	    (left->br_startblock + left->br_blockcount != got->br_startblock) ||
+	    (left->br_state != got->br_state) ||
+	    (left->br_blockcount + got->br_blockcount > MAXEXTLEN))
+		return false;
+
+	return true;
+}
+
+/*
+ * An extent shift adjusts the file offset of an extent to fill a preceding hole
+ * in the file. If an extent shift would result in the extent being fully
+ * adjacent to the extent that currently precedes the hole, we can merge with
+ * the preceding extent rather than do the shift.
+ *
+ * This function assumes the caller has verified a shift-by-merge is possible
+ * with the provided extents via xfs_bmap_shift_extents_can_merge().
+ */
+static int
+xfs_bmap_shift_extents_merge(
+	struct xfs_inode		*ip,
+	int				whichfork,
+	xfs_fileoff_t			shift,		/* shift fsb */
+	int				current_ext,	/* idx of gotp */
+	struct xfs_bmbt_rec_host	*gotp,		/* extent to shift */
+	struct xfs_bmbt_rec_host	*leftp,		/* preceding extent */
+	struct xfs_btree_cur		*cur,
+	int				*logflags)	/* output */
+{
+	struct xfs_ifork		*ifp;
+	struct xfs_bmbt_irec		got;
+	struct xfs_bmbt_irec		left;
+	xfs_filblks_t			blockcount;
+	int				error, i;
+
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+	xfs_bmbt_get_all(gotp, &got);
+	xfs_bmbt_get_all(leftp, &left);
+	blockcount = left.br_blockcount + got.br_blockcount;
+
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+	ASSERT(xfs_bmap_shift_extents_can_merge(&left, &got, shift));
+
+	/*
+	 * Merge the in-core extents. Note that the host record pointers and
+	 * current_ext index are invalid once the extent has been removed via
+	 * xfs_iext_remove().
+	 */
+	xfs_bmbt_set_blockcount(leftp, blockcount);
+	xfs_iext_remove(ip, current_ext, 1, 0);
+
+	/*
+	 * Update the on-disk extent count, the btree if necessary and log the
+	 * inode.
+	 */
+	XFS_IFORK_NEXT_SET(ip, whichfork,
+			   XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
+	*logflags |= XFS_ILOG_CORE;
+	if (cur) {
+		/* lookup and remove the extent to merge */
+		error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
+				got.br_startblock, got.br_blockcount, &i);
+		if (error)
+			goto error;
+		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
+
+		error = xfs_btree_delete(cur, &i);
+		if (error)
+			goto error;
+		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
+
+		/* lookup and update size of the previous extent */
+		error = xfs_bmbt_lookup_eq(cur, left.br_startoff,
+				left.br_startblock, left.br_blockcount, &i);
+		if (error)
+			goto error;
+		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
+
+		left.br_blockcount = blockcount;
+
+		error = xfs_bmbt_update(cur, left.br_startoff,
+				left.br_startblock, left.br_blockcount,
+				left.br_state);
+		if (error)
+			goto error;
+	} else {
+		*logflags |= XFS_ILOG_DEXT;
+	}
+
+	return 0;
+
+error:
+	return error;
+}
+
+/*
  * Shift extent records to the left to cover a hole.
  *
  * The maximum number of extents to be shifted in a single operation is
@@ -5427,6 +5541,7 @@ xfs_bmap_shift_extents(
 {
 	struct xfs_btree_cur		*cur = NULL;
 	struct xfs_bmbt_rec_host	*gotp;
+	struct xfs_bmbt_rec_host	*leftp;
 	struct xfs_bmbt_irec            got;
 	struct xfs_bmbt_irec		left;
 	struct xfs_mount		*mp = ip->i_mount;
@@ -5438,7 +5553,6 @@ xfs_bmap_shift_extents(
 	int				i;
 	int				whichfork = XFS_DATA_FORK;
 	int				logflags = 0;
-	xfs_filblks_t			blockcount = 0;
 	int				total_extents;
 
 	if (unlikely(XFS_TEST_ERROR(
@@ -5464,6 +5578,13 @@ xfs_bmap_shift_extents(
 			return error;
 	}
 
+	if (ifp->if_flags & XFS_IFBROOT) {
+		cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
+		cur->bc_private.b.firstblock = *firstblock;
+		cur->bc_private.b.flist = flist;
+		cur->bc_private.b.flags = 0;
+	}
+
 	/*
 	 * Look up the extent index for the fsb where we start shifting. We can
 	 * henceforth iterate with current_ext as extent list changes are locked
@@ -5476,14 +5597,17 @@ xfs_bmap_shift_extents(
 	gotp = xfs_iext_bno_to_ext(ifp, start_fsb, &current_ext);
 	if (!gotp) {
 		*done = 1;
-		return 0;
+		goto del_cursor;
 	}
+	xfs_bmbt_get_all(gotp, &got);
 
-	if (ifp->if_flags & XFS_IFBROOT) {
-		cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
-		cur->bc_private.b.firstblock = *firstblock;
-		cur->bc_private.b.flist = flist;
-		cur->bc_private.b.flags = 0;
+	/*
+	 * If the first extent is shifted, offset_shift_fsb cannot be larger
+	 * than the starting offset of the first extent.
+	 */
+	if (current_ext == 0 && got.br_startoff < offset_shift_fsb) {
+		error = -EINVAL;
+		goto del_cursor;
 	}
 
 	/*
@@ -5493,30 +5617,42 @@ xfs_bmap_shift_extents(
 	 */
 	total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
 	while (nexts++ < num_exts && current_ext < total_extents) {
-
-		gotp = xfs_iext_get_ext(ifp, current_ext);
-		xfs_bmbt_get_all(gotp, &got);
 		startoff = got.br_startoff - offset_shift_fsb;
 
-		/*
-		 * Before shifting extent into hole, make sure that the hole is
-		 * large enough to accommodate the shift.
-		 */
+		/* grab the left extent and check for a potential merge */
 		if (current_ext > 0) {
-			xfs_bmbt_get_all(xfs_iext_get_ext(ifp, current_ext - 1),
-					 &left);
-			if (startoff < left.br_startoff + left.br_blockcount)
+			leftp = xfs_iext_get_ext(ifp, current_ext - 1);
+			xfs_bmbt_get_all(leftp, &left);
+
+			/* make sure hole is large enough for shift */
+			if (startoff < left.br_startoff + left.br_blockcount) {
 				error = -EINVAL;
-		} else if (offset_shift_fsb > got.br_startoff) {
-			/*
-			 * When first extent is shifted, offset_shift_fsb should
-			 * be less than the stating offset of the first extent.
-			 */
-			error = -EINVAL;
+				goto del_cursor;
+			}
+
+			if (xfs_bmap_shift_extents_can_merge(&left, &got,
+							offset_shift_fsb)) {
+				error = xfs_bmap_shift_extents_merge(ip, whichfork,
+						offset_shift_fsb, current_ext, gotp,
+						leftp, cur, &logflags);
+				if (error)
+					goto del_cursor;
+
+				/*
+				 * The extent was merged so adjust the extent
+				 * index and move onto the next.
+				 */
+				current_ext--;
+				goto next;
+			}
 		}
-		if (error)
-			goto del_cursor;
 
+		/*
+		 * We didn't merge the extent so do the shift. Update the start
+		 * offset in the in-core extent and btree, if necessary.
+		 */
+		xfs_bmbt_set_startoff(gotp, startoff);
+		logflags |= XFS_ILOG_CORE;
 		if (cur) {
 			error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
 						   got.br_startblock,
@@ -5525,53 +5661,8 @@ xfs_bmap_shift_extents(
 			if (error)
 				goto del_cursor;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
-		}
-
-		/* Check if we can merge 2 adjacent extents */
-		if (current_ext &&
-		    left.br_startoff + left.br_blockcount == startoff &&
-		    left.br_startblock + left.br_blockcount ==
-				got.br_startblock &&
-		    left.br_state == got.br_state &&
-		    left.br_blockcount + got.br_blockcount <= MAXEXTLEN) {
-			blockcount = left.br_blockcount +
-				got.br_blockcount;
-			xfs_iext_remove(ip, current_ext, 1, 0);
-			logflags |= XFS_ILOG_CORE;
-			if (cur) {
-				error = xfs_btree_delete(cur, &i);
-				if (error)
-					goto del_cursor;
-				XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
-			} else {
-				logflags |= XFS_ILOG_DEXT;
-			}
-			XFS_IFORK_NEXT_SET(ip, whichfork,
-				XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
-			gotp = xfs_iext_get_ext(ifp, --current_ext);
-			xfs_bmbt_get_all(gotp, &got);
-
-			/* Make cursor point to the extent we will update */
-			if (cur) {
-				error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
-							   got.br_startblock,
-							   got.br_blockcount,
-							   &i);
-				if (error)
-					goto del_cursor;
-				XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
-			}
 
-			xfs_bmbt_set_blockcount(gotp, blockcount);
-			got.br_blockcount = blockcount;
-		} else {
-			/* We have to update the startoff */
-			xfs_bmbt_set_startoff(gotp, startoff);
 			got.br_startoff = startoff;
-		}
-
-		logflags |= XFS_ILOG_CORE;
-		if (cur) {
 			error = xfs_bmbt_update(cur, got.br_startoff,
 						got.br_startblock,
 						got.br_blockcount,
@@ -5582,18 +5673,20 @@ xfs_bmap_shift_extents(
 			logflags |= XFS_ILOG_DEXT;
 		}
 
-		current_ext++;
+next:
+		/* update total extent count and grab the next record */
 		total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
+		if (++current_ext >= total_extents)
+			break;
+		gotp = xfs_iext_get_ext(ifp, current_ext);
+		xfs_bmbt_get_all(gotp, &got);
 	}
 
 	/* Check if we are done */
 	if (current_ext == total_extents)
 		*done = 1;
-	else if (next_fsb) {
-		gotp = xfs_iext_get_ext(ifp, current_ext);
-		xfs_bmbt_get_all(gotp, &got);
+	else if (next_fsb)
 		*next_fsb = got.br_startoff;
-	}
 
 del_cursor:
 	if (cur)
-- 
1.8.3.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] xfs: writeback and inval. file range to be shifted by collapse
  2014-09-07 12:25 [PATCH 0/4] clean up collapse range and handle post-eof delalloc Brian Foster
  2014-09-07 12:25 ` [PATCH 1/4] xfs: track collapse via file offset rather than extent index Brian Foster
  2014-09-07 12:25 ` [PATCH 2/4] xfs: refactor xfs_bmap_shift_extents() into multiple functions Brian Foster
@ 2014-09-07 12:25 ` Brian Foster
  2014-09-09  5:13   ` Dave Chinner
  2014-09-07 12:26 ` [PATCH 4/4] xfs: only writeback and truncate pages for the freed range Brian Foster
  3 siblings, 1 reply; 11+ messages in thread
From: Brian Foster @ 2014-09-07 12:25 UTC (permalink / raw)
  To: xfs

The collapse range operation currently writes the entire file before
starting the collapse to avoid changes in the in-core extent list due to
writeback causing the extent count to change. Now that collapse range is
fsb based rather than extent index based it can sustain changes in the
extent list during the shift sequence without disruption.

Modify xfs_collapse_file_space() to writeback and invalidate pages
associated with the range of the file to be shifted.
xfs_free_file_space() currently has similar behavior, but the space free
need only affect the region of the file that is freed and this could
change in the future.

Also update the comments to reflect the current implementation. We
retain the eofblocks trim permanently as a best option for dealing with
delalloc extents. We don't shift delalloc extents because this scenario
only occurs with post-eof preallocation (since data must be flushed such
that the cache can be invalidated and data can be shifted). That means
said space must also be initialized before being shifted into the
accessible region of the file only to be immediately truncated off as
the last part of the collapse. In other words, the eofblocks trim will
happen anyways, we just run it first to ensure the file remains in a
consistent state throughout the collapse.

Finally, BUG() in the event of a delalloc extent during the extent shift
such that a failure is obvious. The implementation explicitly does not
support delalloc extents and the caller is expected to prevent this
scenario in advance as is done by collapse.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_bmap.c |  2 ++
 fs/xfs/xfs_bmap_util.c   | 32 +++++++++++++++++++-------------
 2 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 449a016..1dd04c2 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5617,6 +5617,8 @@ xfs_bmap_shift_extents(
 	 */
 	total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
 	while (nexts++ < num_exts && current_ext < total_extents) {
+		/* can't handle delalloc extents */
+		BUG_ON(isnullstartblock(got.br_startblock));
 		startoff = got.br_startoff - offset_shift_fsb;

 		/* grab the left extent and check for a potential merge */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 1e96d77..eae763f 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1470,27 +1470,33 @@ xfs_collapse_file_space(
 	next_fsb = XFS_B_TO_FSB(mp, offset + len);
 	shift_fsb = XFS_B_TO_FSB(mp, len);

-	/*
-	 * Writeback the entire file and force remove any post-eof blocks. The
-	 * writeback prevents changes to the extent list via concurrent
-	 * writeback and the eofblocks trim prevents the extent shift algorithm
-	 * from running into a post-eof delalloc extent.
-	 *
-	 * XXX: This is a temporary fix until the extent shift loop below is
-	 * converted to use offsets and lookups within the ILOCK rather than
-	 * carrying around the index into the extent list for the next
-	 * iteration.
-	 */
-	error = filemap_write_and_wait(VFS_I(ip)->i_mapping);
+	error = xfs_free_file_space(ip, offset, len);
 	if (error)
 		return error;
+
+	/*
+	 * Trim eofblocks to avoid shifting uninitialized post-eof preallocation
+	 * into the accessible region of the file.
+	 */
 	if (xfs_can_free_eofblocks(ip, true)) {
 		error = xfs_free_eofblocks(mp, ip, false);
 		if (error)
 			return error;
 	}

-	error = xfs_free_file_space(ip, offset, len);
+	/*
+	 * Writeback and invalidate cache for the remainder of the file as we're
+	 * about to shift down every extent from the collapse range to EOF. The
+	 * free of the collapse range above might have already done some of
+	 * this, but we shouldn't rely on it to do anything outside of the range
+	 * that was freed.
+	 */
+	error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
+					     offset + len, -1);
+	if (error)
+		return error;
+	error = invalidate_inode_pages2_range(VFS_I(ip)->i_mapping,
+					(offset + len) >> PAGE_CACHE_SHIFT, -1);
 	if (error)
 		return error;

-- 
1.8.3.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] xfs: only writeback and truncate pages for the freed range
  2014-09-07 12:25 [PATCH 0/4] clean up collapse range and handle post-eof delalloc Brian Foster
                   ` (2 preceding siblings ...)
  2014-09-07 12:25 ` [PATCH 3/4] xfs: writeback and inval. file range to be shifted by collapse Brian Foster
@ 2014-09-07 12:26 ` Brian Foster
  2014-09-09  5:14   ` Dave Chinner
  3 siblings, 1 reply; 11+ messages in thread
From: Brian Foster @ 2014-09-07 12:26 UTC (permalink / raw)
  To: xfs

xfs_free_file_space() only affects the range of the file for which space
is being freed. It currently writes and truncates the page cache from
the start offset of the free to EOF.

Modify xfs_free_file_space() to write back and truncate page cache of
just the range being freed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_bmap_util.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index eae763f..809ae7d 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1205,6 +1205,7 @@ xfs_free_file_space(
 	xfs_bmap_free_t		free_list;
 	xfs_bmbt_irec_t		imap;
 	xfs_off_t		ioffset;
+	xfs_off_t		iendoffset;
 	xfs_extlen_t		mod=0;
 	xfs_mount_t		*mp;
 	int			nimap;
@@ -1233,12 +1234,13 @@ xfs_free_file_space(
 	inode_dio_wait(VFS_I(ip));
 
 	rounding = max_t(xfs_off_t, 1 << mp->m_sb.sb_blocklog, PAGE_CACHE_SIZE);
-	ioffset = offset & ~(rounding - 1);
-	error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
-					      ioffset, -1);
+	ioffset = round_down(offset, rounding);
+	iendoffset = round_up(offset + len, rounding) - 1;
+	error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, ioffset,
+					     iendoffset);
 	if (error)
 		goto out;
-	truncate_pagecache_range(VFS_I(ip), ioffset, -1);
+	truncate_pagecache_range(VFS_I(ip), ioffset, iendoffset);
 
 	/*
 	 * Need to zero the stuff we're not freeing, on disk.
-- 
1.8.3.1

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] xfs: track collapse via file offset rather than extent index
  2014-09-07 12:25 ` [PATCH 1/4] xfs: track collapse via file offset rather than extent index Brian Foster
@ 2014-09-09  4:03   ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2014-09-09  4:03 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Sun, Sep 07, 2014 at 08:25:57AM -0400, Brian Foster wrote:
> The collapse range implementation uses a transaction per extent shift.
> The progress of the overall operation is tracked via the current extent
> index of the in-core extent list. This is racy because the ilock must be
> dropped and reacquired for each transaction according to locking and log
> reservation rules. Therefore, writeback to prior regions of the file is
> possible and can change the extent count. This changes the extent to
> which the current index refers and causes the collapse to fail mid
> operation. To avoid this problem, the entire file is currently written
> back before the collapse operation starts.
> 
> To eliminate the need to flush the entire file, use the file offset
> (fsb) to track the progress of the overall extent shift operation rather
> than the extent index. Modify xfs_bmap_shift_extents() to
> unconditionally convert the start_fsb parameter to an extent index and
> return the file offset of the extent where the shift left off, if
> further extents exist. The bulk of ths function can remain based on
> extent index as ilock is held by the caller. xfs_collapse_file_space()
> now uses the fsb output as the starting point for the subsequent shift.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks good. There's a couple of small things I noticed, but
they aren't worth redoing the patches again to fix.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/4] xfs: refactor xfs_bmap_shift_extents() into multiple functions
  2014-09-07 12:25 ` [PATCH 2/4] xfs: refactor xfs_bmap_shift_extents() into multiple functions Brian Foster
@ 2014-09-09  5:08   ` Dave Chinner
  2014-09-09 15:04     ` Brian Foster
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2014-09-09  5:08 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Sun, Sep 07, 2014 at 08:25:58AM -0400, Brian Foster wrote:
> The extent shift mechanism in xfs_bmap_shift_extents() is complicated
> and handles several different, non-deterministic scenarios. These
> include extent shifts, extent merges and potential btree updates in
> either of the former scenarios.
> 
> Refactor the code to be more linear and readable. The loop logic in
> xfs_bmap_shift_extents() and some initial error checking is adjusted
> slightly. The associated btree lookup and update/delete operations are
> condensed into single blocks of code. This reduces the number of
> btree-specific blocks and facilitates the separation of the merge
> operation into a new xfs_bmap_shift_extents_merge() helper.  The merge
> check is also separated into an inline.
> 
> This is a code refactor only. The behavior of extent shift and collapse
> range is not modified.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 243 ++++++++++++++++++++++++++++++++---------------
>  1 file changed, 168 insertions(+), 75 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 4b3f1b9..449a016 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5404,6 +5404,120 @@ error0:
>  }
>  
>  /*
> + * Determine whether an extent shift can be accomplished by a merge with the
> + * extent that precedes the target hole of the shift.
> + */
> +static inline bool
> +xfs_bmap_shift_extents_can_merge(

No need for "inline" for static functions. The compiler will do that
as an optimisation if appropriate.

> +	struct xfs_bmbt_irec	*left,	/* preceding extent */
> +	struct xfs_bmbt_irec	*got,	/* current extent to shift */
> +	xfs_fileoff_t		shift)	/* shift fsb */
> +{
> +	xfs_fileoff_t		startoff;
> +
> +	startoff = got->br_startoff - shift;
> +
> +	/*
> +	 * The extent, once shifted, must be adjacent in-file and on-disk with
> +	 * the preceding extent.
> +	 */
> +	if ((left->br_startoff + left->br_blockcount != startoff) ||
> +	    (left->br_startblock + left->br_blockcount != got->br_startblock) ||
> +	    (left->br_state != got->br_state) ||
> +	    (left->br_blockcount + got->br_blockcount > MAXEXTLEN))
> +		return false;
> +
> +	return true;
> +}
> +
> +/*
> + * An extent shift adjusts the file offset of an extent to fill a preceding hole
> + * in the file. If an extent shift would result in the extent being fully
> + * adjacent to the extent that currently precedes the hole, we can merge with
> + * the preceding extent rather than do the shift.
> + *
> + * This function assumes the caller has verified a shift-by-merge is possible
> + * with the provided extents via xfs_bmap_shift_extents_can_merge().
> + */
> +static int
> +xfs_bmap_shift_extents_merge(
> +	struct xfs_inode		*ip,
> +	int				whichfork,
> +	xfs_fileoff_t			shift,		/* shift fsb */
> +	int				current_ext,	/* idx of gotp */
> +	struct xfs_bmbt_rec_host	*gotp,		/* extent to shift */
> +	struct xfs_bmbt_rec_host	*leftp,		/* preceding extent */
> +	struct xfs_btree_cur		*cur,
> +	int				*logflags)	/* output */
> +{
> +	struct xfs_ifork		*ifp;
> +	struct xfs_bmbt_irec		got;
> +	struct xfs_bmbt_irec		left;
> +	xfs_filblks_t			blockcount;
> +	int				error, i;
> +
> +	ifp = XFS_IFORK_PTR(ip, whichfork);
> +	xfs_bmbt_get_all(gotp, &got);
> +	xfs_bmbt_get_all(leftp, &left);
> +	blockcount = left.br_blockcount + got.br_blockcount;
> +
> +	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> +	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> +	ASSERT(xfs_bmap_shift_extents_can_merge(&left, &got, shift));
> +
> +	/*
> +	 * Merge the in-core extents. Note that the host record pointers and
> +	 * current_ext index are invalid once the extent has been removed via
> +	 * xfs_iext_remove().
> +	 */
> +	xfs_bmbt_set_blockcount(leftp, blockcount);
> +	xfs_iext_remove(ip, current_ext, 1, 0);
> +
> +	/*
> +	 * Update the on-disk extent count, the btree if necessary and log the
> +	 * inode.
> +	 */
> +	XFS_IFORK_NEXT_SET(ip, whichfork,
> +			   XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
> +	*logflags |= XFS_ILOG_CORE;
> +	if (cur) {
> +		/* lookup and remove the extent to merge */
> +		error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
> +				got.br_startblock, got.br_blockcount, &i);
> +		if (error)
> +			goto error;

Probably shouldn't give the jump label the same name as a variable
in the function. "out_error" is commonly used here.

> +		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
> +
> +		error = xfs_btree_delete(cur, &i);
> +		if (error)
> +			goto error;
> +		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
> +
> +		/* lookup and update size of the previous extent */
> +		error = xfs_bmbt_lookup_eq(cur, left.br_startoff,
> +				left.br_startblock, left.br_blockcount, &i);
> +		if (error)
> +			goto error;
> +		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
> +
> +		left.br_blockcount = blockcount;
> +
> +		error = xfs_bmbt_update(cur, left.br_startoff,
> +				left.br_startblock, left.br_blockcount,
> +				left.br_state);
> +		if (error)
> +			goto error;
> +	} else {
> +		*logflags |= XFS_ILOG_DEXT;
> +	}
> +
> +	return 0;
> +
> +error:
> +	return error;

This code is much clearer, though I'd get rid of further
indents by changing the logic around like so:

....
	*logflags |= XFS_ILOG_CORE;
	if (!cur) {
		*logflags |= XFS_ILOG_DEXT;
		return 0;
	}

	/* lookup and remove the extent to merge */
	error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
				got.br_startblock, got.br_blockcount, &i);
	if (error)
		goto out_error;
.....

	error = xfs_bmbt_update(cur, left.br_startoff,
				left.br_startblock, left.br_blockcount,
				left.br_state);
out_error:
	return error;
}

> @@ -5493,30 +5617,42 @@ xfs_bmap_shift_extents(
>  	 */
>  	total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
>  	while (nexts++ < num_exts && current_ext < total_extents) {
> -
> -		gotp = xfs_iext_get_ext(ifp, current_ext);
> -		xfs_bmbt_get_all(gotp, &got);
>  		startoff = got.br_startoff - offset_shift_fsb;
>  
> -		/*
> -		 * Before shifting extent into hole, make sure that the hole is
> -		 * large enough to accommodate the shift.
> -		 */
> +		/* grab the left extent and check for a potential merge */
>  		if (current_ext > 0) {
> -			xfs_bmbt_get_all(xfs_iext_get_ext(ifp, current_ext - 1),
> -					 &left);
> -			if (startoff < left.br_startoff + left.br_blockcount)
> +			leftp = xfs_iext_get_ext(ifp, current_ext - 1);
> +			xfs_bmbt_get_all(leftp, &left);
> +
> +			/* make sure hole is large enough for shift */
> +			if (startoff < left.br_startoff + left.br_blockcount) {
>  				error = -EINVAL;
> -		} else if (offset_shift_fsb > got.br_startoff) {
> -			/*
> -			 * When first extent is shifted, offset_shift_fsb should
> -			 * be less than the stating offset of the first extent.
> -			 */
> -			error = -EINVAL;
> +				goto del_cursor;
> +			}
> +
> +			if (xfs_bmap_shift_extents_can_merge(&left, &got,
> +							offset_shift_fsb)) {
> +				error = xfs_bmap_shift_extents_merge(ip, whichfork,
> +						offset_shift_fsb, current_ext, gotp,
> +						leftp, cur, &logflags);
> +				if (error)
> +					goto del_cursor;
> +
> +				/*
> +				 * The extent was merged so adjust the extent
> +				 * index and move onto the next.
> +				 */
> +				current_ext--;
> +				goto next;
> +			}
>  		}
> -		if (error)
> -			goto del_cursor;
>  
> +		/*
> +		 * We didn't merge the extent so do the shift. Update the start
> +		 * offset in the in-core extent and btree, if necessary.
> +		 */
> +		xfs_bmbt_set_startoff(gotp, startoff);
> +		logflags |= XFS_ILOG_CORE;
>  		if (cur) {
>  			error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
>  						   got.br_startblock,

This doesn't do a lot to improve the code. It increases the level of
indent and, IMO, makes it harder to read. It's a classic case of
function names getting so long there's no space left for the code...

So, how about s/xfs_bmap_shift_extents/xfs_bmse/ as a means of
reducing the namespace verbosity? And, really, that whole loop body
could be pushed into a separate function. i.e

        while (nexts++ < num_exts) {
		error = xfs_bmse_collapse_one(ip, ifp, cur, &current_ext, gotp,
					      offset_shift_fsb, &logflags);
		if (error)
			goto del_cursor;

		/* check against total extent count, grab the next record */
		total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
		if (current_ext >= total_extents)
			break;
		gotp = xfs_iext_get_ext(ifp, current_ext);
		xfs_bmbt_get_all(gotp, &got);
	}

That reduces all the jumps/error cases simply to return statements,
allowing them to be ordered clearly to minimise the indent of
the code. It also means that way the value of current_ext changes is
obvious, rather than having the decrement/increment because of the
"next iteration" handling in the loop always incrementing the value.

Even the initial check outside the loop for:

        if (current_ext == 0 && got.br_startoff < offset_shift_fsb) {
		error = -EINVAL;
		goto del_cursor
	}

could be driven inside xfs_bmse_collapse_one() as the first check that
is done, and that would further simplify xfs_bmap_shift_extents().
i.e.

xfs_bmse_collapse_one()
{
	if (*current_ext == 0) {
		if (got.br_startoff < offset_shift_fsb)
			return -EINVAL;
		goto shift_extent;
	}

	/* get left, do merge checks */
	....
	if (!xfs_bmse_can_merge(&left, &got, offset_shift_fsb))
		goto shift_extent;

	return xfs_bmse_merge(ip, whichfork, ...)

shift_extent:
	(*current_ext)++;
	xfs_bmbt_set_startoff(gotp, startoff);
	logflags |= XFS_ILOG_CORE;
	if (!cur)
		*logflags |= XFS_ILOG_DEXT;
		return 0;
	}

	/* do btree shift */
	....
	return error;
}

This way we end up with a set of well defined operations, along
with clear logic on how they are iterated to make do the overall
shift operation.

What do you think?

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] xfs: writeback and inval. file range to be shifted by collapse
  2014-09-07 12:25 ` [PATCH 3/4] xfs: writeback and inval. file range to be shifted by collapse Brian Foster
@ 2014-09-09  5:13   ` Dave Chinner
  2014-09-09 15:16     ` Brian Foster
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2014-09-09  5:13 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Sun, Sep 07, 2014 at 08:25:59AM -0400, Brian Foster wrote:
> The collapse range operation currently writes the entire file before
> starting the collapse to avoid changes in the in-core extent list due to
> writeback causing the extent count to change. Now that collapse range is
> fsb based rather than extent index based it can sustain changes in the
> extent list during the shift sequence without disruption.
> 
> Modify xfs_collapse_file_space() to writeback and invalidate pages
> associated with the range of the file to be shifted.
> xfs_free_file_space() currently has similar behavior, but the space free
> need only affect the region of the file that is freed and this could
> change in the future.
> 
> Also update the comments to reflect the current implementation. We
> retain the eofblocks trim permanently as a best option for dealing with
> delalloc extents. We don't shift delalloc extents because this scenario
> only occurs with post-eof preallocation (since data must be flushed such
> that the cache can be invalidated and data can be shifted). That means
> said space must also be initialized before being shifted into the
> accessible region of the file only to be immediately truncated off as
> the last part of the collapse. In other words, the eofblocks trim will
> happen anyways, we just run it first to ensure the file remains in a
> consistent state throughout the collapse.
> 
> Finally, BUG() in the event of a delalloc extent during the extent shift
> such that a failure is obvious. The implementation explicitly does not
> support delalloc extents and the caller is expected to prevent this
> scenario in advance as is done by collapse.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c |  2 ++
>  fs/xfs/xfs_bmap_util.c   | 32 +++++++++++++++++++-------------
>  2 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 449a016..1dd04c2 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5617,6 +5617,8 @@ xfs_bmap_shift_extents(
>  	 */
>  	total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
>  	while (nexts++ < num_exts && current_ext < total_extents) {
> +		/* can't handle delalloc extents */
> +		BUG_ON(isnullstartblock(got.br_startblock));

XFS_WANT_CORRUPTED_GOTO() would be better, I think.

Otherwise OK.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] xfs: only writeback and truncate pages for the freed range
  2014-09-07 12:26 ` [PATCH 4/4] xfs: only writeback and truncate pages for the freed range Brian Foster
@ 2014-09-09  5:14   ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2014-09-09  5:14 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Sun, Sep 07, 2014 at 08:26:00AM -0400, Brian Foster wrote:
> xfs_free_file_space() only affects the range of the file for which space
> is being freed. It currently writes and truncates the page cache from
> the start offset of the free to EOF.
> 
> Modify xfs_free_file_space() to write back and truncate page cache of
> just the range being freed.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

looks good.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/4] xfs: refactor xfs_bmap_shift_extents() into multiple functions
  2014-09-09  5:08   ` Dave Chinner
@ 2014-09-09 15:04     ` Brian Foster
  0 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2014-09-09 15:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Sep 09, 2014 at 03:08:00PM +1000, Dave Chinner wrote:
> On Sun, Sep 07, 2014 at 08:25:58AM -0400, Brian Foster wrote:
> > The extent shift mechanism in xfs_bmap_shift_extents() is complicated
> > and handles several different, non-deterministic scenarios. These
> > include extent shifts, extent merges and potential btree updates in
> > either of the former scenarios.
> > 
> > Refactor the code to be more linear and readable. The loop logic in
> > xfs_bmap_shift_extents() and some initial error checking is adjusted
> > slightly. The associated btree lookup and update/delete operations are
> > condensed into single blocks of code. This reduces the number of
> > btree-specific blocks and facilitates the separation of the merge
> > operation into a new xfs_bmap_shift_extents_merge() helper.  The merge
> > check is also separated into an inline.
> > 
> > This is a code refactor only. The behavior of extent shift and collapse
> > range is not modified.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c | 243 ++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 168 insertions(+), 75 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 4b3f1b9..449a016 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -5404,6 +5404,120 @@ error0:
> >  }
> >  
> >  /*
> > + * Determine whether an extent shift can be accomplished by a merge with the
> > + * extent that precedes the target hole of the shift.
> > + */
> > +static inline bool
> > +xfs_bmap_shift_extents_can_merge(
> 
> No need for "inline" for static functions. The compiler will do that
> as an optimisation if appropriate.
> 

Ok.

> > +	struct xfs_bmbt_irec	*left,	/* preceding extent */
> > +	struct xfs_bmbt_irec	*got,	/* current extent to shift */
> > +	xfs_fileoff_t		shift)	/* shift fsb */
> > +{
> > +	xfs_fileoff_t		startoff;
> > +
> > +	startoff = got->br_startoff - shift;
> > +
> > +	/*
> > +	 * The extent, once shifted, must be adjacent in-file and on-disk with
> > +	 * the preceding extent.
> > +	 */
> > +	if ((left->br_startoff + left->br_blockcount != startoff) ||
> > +	    (left->br_startblock + left->br_blockcount != got->br_startblock) ||
> > +	    (left->br_state != got->br_state) ||
> > +	    (left->br_blockcount + got->br_blockcount > MAXEXTLEN))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +/*
> > + * An extent shift adjusts the file offset of an extent to fill a preceding hole
> > + * in the file. If an extent shift would result in the extent being fully
> > + * adjacent to the extent that currently precedes the hole, we can merge with
> > + * the preceding extent rather than do the shift.
> > + *
> > + * This function assumes the caller has verified a shift-by-merge is possible
> > + * with the provided extents via xfs_bmap_shift_extents_can_merge().
> > + */
> > +static int
> > +xfs_bmap_shift_extents_merge(
> > +	struct xfs_inode		*ip,
> > +	int				whichfork,
> > +	xfs_fileoff_t			shift,		/* shift fsb */
> > +	int				current_ext,	/* idx of gotp */
> > +	struct xfs_bmbt_rec_host	*gotp,		/* extent to shift */
> > +	struct xfs_bmbt_rec_host	*leftp,		/* preceding extent */
> > +	struct xfs_btree_cur		*cur,
> > +	int				*logflags)	/* output */
> > +{
> > +	struct xfs_ifork		*ifp;
> > +	struct xfs_bmbt_irec		got;
> > +	struct xfs_bmbt_irec		left;
> > +	xfs_filblks_t			blockcount;
> > +	int				error, i;
> > +
> > +	ifp = XFS_IFORK_PTR(ip, whichfork);
> > +	xfs_bmbt_get_all(gotp, &got);
> > +	xfs_bmbt_get_all(leftp, &left);
> > +	blockcount = left.br_blockcount + got.br_blockcount;
> > +
> > +	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> > +	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > +	ASSERT(xfs_bmap_shift_extents_can_merge(&left, &got, shift));
> > +
> > +	/*
> > +	 * Merge the in-core extents. Note that the host record pointers and
> > +	 * current_ext index are invalid once the extent has been removed via
> > +	 * xfs_iext_remove().
> > +	 */
> > +	xfs_bmbt_set_blockcount(leftp, blockcount);
> > +	xfs_iext_remove(ip, current_ext, 1, 0);
> > +
> > +	/*
> > +	 * Update the on-disk extent count, the btree if necessary and log the
> > +	 * inode.
> > +	 */
> > +	XFS_IFORK_NEXT_SET(ip, whichfork,
> > +			   XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
> > +	*logflags |= XFS_ILOG_CORE;
> > +	if (cur) {
> > +		/* lookup and remove the extent to merge */
> > +		error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
> > +				got.br_startblock, got.br_blockcount, &i);
> > +		if (error)
> > +			goto error;
> 
> Probably shouldn't give the jump label the same name as a variable
> in the function. "out_error" is commonly used here.
> 

Ok.

> > +		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
> > +
> > +		error = xfs_btree_delete(cur, &i);
> > +		if (error)
> > +			goto error;
> > +		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
> > +
> > +		/* lookup and update size of the previous extent */
> > +		error = xfs_bmbt_lookup_eq(cur, left.br_startoff,
> > +				left.br_startblock, left.br_blockcount, &i);
> > +		if (error)
> > +			goto error;
> > +		XFS_WANT_CORRUPTED_GOTO(i == 1, error);
> > +
> > +		left.br_blockcount = blockcount;
> > +
> > +		error = xfs_bmbt_update(cur, left.br_startoff,
> > +				left.br_startblock, left.br_blockcount,
> > +				left.br_state);
> > +		if (error)
> > +			goto error;
> > +	} else {
> > +		*logflags |= XFS_ILOG_DEXT;
> > +	}
> > +
> > +	return 0;
> > +
> > +error:
> > +	return error;
> 
> This code is much clearer, though I'd get rid of further
> indents by changing the logic around like so:
> 

Yeah, good idea.

> ....
> 	*logflags |= XFS_ILOG_CORE;
> 	if (!cur) {
> 		*logflags |= XFS_ILOG_DEXT;
> 		return 0;
> 	}
> 
> 	/* lookup and remove the extent to merge */
> 	error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
> 				got.br_startblock, got.br_blockcount, &i);
> 	if (error)
> 		goto out_error;
> .....
> 
> 	error = xfs_bmbt_update(cur, left.br_startoff,
> 				left.br_startblock, left.br_blockcount,
> 				left.br_state);
> out_error:
> 	return error;
> }
> 
> > @@ -5493,30 +5617,42 @@ xfs_bmap_shift_extents(
> >  	 */
> >  	total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> >  	while (nexts++ < num_exts && current_ext < total_extents) {
> > -
> > -		gotp = xfs_iext_get_ext(ifp, current_ext);
> > -		xfs_bmbt_get_all(gotp, &got);
> >  		startoff = got.br_startoff - offset_shift_fsb;
> >  
> > -		/*
> > -		 * Before shifting extent into hole, make sure that the hole is
> > -		 * large enough to accommodate the shift.
> > -		 */
> > +		/* grab the left extent and check for a potential merge */
> >  		if (current_ext > 0) {
> > -			xfs_bmbt_get_all(xfs_iext_get_ext(ifp, current_ext - 1),
> > -					 &left);
> > -			if (startoff < left.br_startoff + left.br_blockcount)
> > +			leftp = xfs_iext_get_ext(ifp, current_ext - 1);
> > +			xfs_bmbt_get_all(leftp, &left);
> > +
> > +			/* make sure hole is large enough for shift */
> > +			if (startoff < left.br_startoff + left.br_blockcount) {
> >  				error = -EINVAL;
> > -		} else if (offset_shift_fsb > got.br_startoff) {
> > -			/*
> > -			 * When first extent is shifted, offset_shift_fsb should
> > -			 * be less than the stating offset of the first extent.
> > -			 */
> > -			error = -EINVAL;
> > +				goto del_cursor;
> > +			}
> > +
> > +			if (xfs_bmap_shift_extents_can_merge(&left, &got,
> > +							offset_shift_fsb)) {
> > +				error = xfs_bmap_shift_extents_merge(ip, whichfork,
> > +						offset_shift_fsb, current_ext, gotp,
> > +						leftp, cur, &logflags);
> > +				if (error)
> > +					goto del_cursor;
> > +
> > +				/*
> > +				 * The extent was merged so adjust the extent
> > +				 * index and move onto the next.
> > +				 */
> > +				current_ext--;
> > +				goto next;
> > +			}
> >  		}
> > -		if (error)
> > -			goto del_cursor;
> >  
> > +		/*
> > +		 * We didn't merge the extent so do the shift. Update the start
> > +		 * offset in the in-core extent and btree, if necessary.
> > +		 */
> > +		xfs_bmbt_set_startoff(gotp, startoff);
> > +		logflags |= XFS_ILOG_CORE;
> >  		if (cur) {
> >  			error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
> >  						   got.br_startblock,
> 
> This doesn't do a lot to improve the code. It increases the level of
> indent and, IMO, makes it harder to read. It's a classic case of
> function names getting so long there's no space left for the code...
> 
> So, how about s/xfs_bmap_shift_extents/xfs_bmse/ as a means of
> reducing the namespace verbosity? And, really, that whole loop body
> could be pushed into a separate function. i.e
> 
>         while (nexts++ < num_exts) {
> 		error = xfs_bmse_collapse_one(ip, ifp, cur, &current_ext, gotp,
> 					      offset_shift_fsb, &logflags);
> 		if (error)
> 			goto del_cursor;
> 
> 		/* check against total extent count, grab the next record */
> 		total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> 		if (current_ext >= total_extents)
> 			break;
> 		gotp = xfs_iext_get_ext(ifp, current_ext);
> 		xfs_bmbt_get_all(gotp, &got);
> 	}
> 
> That reduces all the jumps/error cases simply to return statements,
> allowing them to be ordered clearly to minimise the indent of
> the code. It also means that way the value of current_ext changes is
> obvious, rather than having the decrement/increment because of the
> "next iteration" handling in the loop always incrementing the value.
> 
> Even the initial check outside the loop for:
> 
>         if (current_ext == 0 && got.br_startoff < offset_shift_fsb) {
> 		error = -EINVAL;
> 		goto del_cursor
> 	}
> 
> could be driven inside xfs_bmse_collapse_one() as the first check that
> is done, and that would further simplify xfs_bmap_shift_extents().
> i.e.
> 
> xfs_bmse_collapse_one()
> {
> 	if (*current_ext == 0) {
> 		if (got.br_startoff < offset_shift_fsb)
> 			return -EINVAL;
> 		goto shift_extent;
> 	}
> 
> 	/* get left, do merge checks */
> 	....
> 	if (!xfs_bmse_can_merge(&left, &got, offset_shift_fsb))
> 		goto shift_extent;
> 
> 	return xfs_bmse_merge(ip, whichfork, ...)
> 
> shift_extent:
> 	(*current_ext)++;
> 	xfs_bmbt_set_startoff(gotp, startoff);
> 	logflags |= XFS_ILOG_CORE;
> 	if (!cur)
> 		*logflags |= XFS_ILOG_DEXT;
> 		return 0;
> 	}
> 
> 	/* do btree shift */
> 	....
> 	return error;
> }
> 
> This way we end up with a set of well defined operations, along
> with clear logic on how they are iterated to make do the overall
> shift operation.
> 
> What do you think?
> 

That all sounds pretty good to me. I may incorporate the aforementioned
updates to the merge bits to this patch and create a new patch for the
subsequent refactor of the loop body. E.g., this patch becomes "...
refactor extent merge into new function." But I'll play with it and see
what falls out... thanks for the feedback.

Brian

> Cheers,
> 
> Dave.
> 
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] xfs: writeback and inval. file range to be shifted by collapse
  2014-09-09  5:13   ` Dave Chinner
@ 2014-09-09 15:16     ` Brian Foster
  0 siblings, 0 replies; 11+ messages in thread
From: Brian Foster @ 2014-09-09 15:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Sep 09, 2014 at 03:13:26PM +1000, Dave Chinner wrote:
> On Sun, Sep 07, 2014 at 08:25:59AM -0400, Brian Foster wrote:
> > The collapse range operation currently writes the entire file before
> > starting the collapse to avoid changes in the in-core extent list due to
> > writeback causing the extent count to change. Now that collapse range is
> > fsb based rather than extent index based it can sustain changes in the
> > extent list during the shift sequence without disruption.
> > 
> > Modify xfs_collapse_file_space() to writeback and invalidate pages
> > associated with the range of the file to be shifted.
> > xfs_free_file_space() currently has similar behavior, but the space free
> > need only affect the region of the file that is freed and this could
> > change in the future.
> > 
> > Also update the comments to reflect the current implementation. We
> > retain the eofblocks trim permanently as a best option for dealing with
> > delalloc extents. We don't shift delalloc extents because this scenario
> > only occurs with post-eof preallocation (since data must be flushed such
> > that the cache can be invalidated and data can be shifted). That means
> > said space must also be initialized before being shifted into the
> > accessible region of the file only to be immediately truncated off as
> > the last part of the collapse. In other words, the eofblocks trim will
> > happen anyways, we just run it first to ensure the file remains in a
> > consistent state throughout the collapse.
> > 
> > Finally, BUG() in the event of a delalloc extent during the extent shift
> > such that a failure is obvious. The implementation explicitly does not
> > support delalloc extents and the caller is expected to prevent this
> > scenario in advance as is done by collapse.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c |  2 ++
> >  fs/xfs/xfs_bmap_util.c   | 32 +++++++++++++++++++-------------
> >  2 files changed, 21 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 449a016..1dd04c2 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -5617,6 +5617,8 @@ xfs_bmap_shift_extents(
> >  	 */
> >  	total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> >  	while (nexts++ < num_exts && current_ext < total_extents) {
> > +		/* can't handle delalloc extents */
> > +		BUG_ON(isnullstartblock(got.br_startblock));
> 
> XFS_WANT_CORRUPTED_GOTO() would be better, I think.
> 

Ok. I suppose we'll shutdown the fs if the transaction was dirtied by
that point anyways. I want to make sure the failure is explicit more
than anything, as opposed to an unclear and non-guaranteed lookup
failure in the event of a btree, so that works for me.

Brian

> Otherwise OK.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-09-09 15:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-07 12:25 [PATCH 0/4] clean up collapse range and handle post-eof delalloc Brian Foster
2014-09-07 12:25 ` [PATCH 1/4] xfs: track collapse via file offset rather than extent index Brian Foster
2014-09-09  4:03   ` Dave Chinner
2014-09-07 12:25 ` [PATCH 2/4] xfs: refactor xfs_bmap_shift_extents() into multiple functions Brian Foster
2014-09-09  5:08   ` Dave Chinner
2014-09-09 15:04     ` Brian Foster
2014-09-07 12:25 ` [PATCH 3/4] xfs: writeback and inval. file range to be shifted by collapse Brian Foster
2014-09-09  5:13   ` Dave Chinner
2014-09-09 15:16     ` Brian Foster
2014-09-07 12:26 ` [PATCH 4/4] xfs: only writeback and truncate pages for the freed range Brian Foster
2014-09-09  5:14   ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.