linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] xfs: some end COW remapping optimization
@ 2022-02-16  3:08 Gao Xiang
  2022-02-16  3:08 ` [PATCH v2 1/3] xfs: get rid of LEFT, RIGHT, PREV in xfs_bmap_add_extent_unwritten_real() Gao Xiang
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Gao Xiang @ 2022-02-16  3:08 UTC (permalink / raw)
  To: xfs; +Cc: LKML, Darrick J. Wong, Gao Xiang

Hi folks,

Currently, xfs_reflink_end_cow_extent() will unconditionally unmap an
extent from DATA fork and then remap an extent from COW fork. It seems
somewhat ineffective since for many cases we could update real bmbt
records directly by sightly enhancing old
xfs_bmap_add_extent_unwritten_real() implementation, thus reduce some
measurable extra metadata overhead.

It's important to us since, actually, we're planing to use a modified
alway-cow like atomic write approach internally for database
applications, therefore it'd be nice to do some optimization over
simple end COW approach. Also I think it's still generic and can
benefit other reflink use cases as well.

I did some tests with ramdisk in order to measure metadata overhead:

echo 1 > /sys/fs/xfs/debug/always_cow
mkfs.xfs -f -mreflink=1 /dev/ram0
mount /dev/ram0 testdir
fio -filename=testdir/1 -size=1G -ioengine=psync -bs=4k -rw=randwrite -overwrite=1 -direct=1 -end_fsync=1 -name=job1

Test results as below:
Vanilla:
(1)   iops        : min= 7986, max=16434, avg=12505.76, stdev=2400.05, samples=41
(2)   iops        : min= 7636, max=16376, avg=12474.19, stdev=2258.18, samples=42
(3)   iops        : min= 8346, max=16439, avg=12227.95, stdev=2432.12, samples=42
(4)   iops        : min= 8580, max=16496, avg=12779.41, stdev=2297.42, samples=41
(5)   iops        : min= 8286, max=16556, avg=12500.76, stdev=2123.90, samples=41

Patched:
(1)   iops        : min= 7086, max=17132, avg=12931.20, stdev=2729.10, samples=40
(2)   iops        : min= 7704, max=17508, avg=13204.62, stdev=2507.70, samples=39
(3)   iops        : min= 8736, max=17634, avg=13253.08, stdev=2545.18, samples=39
(4)   iops        : min= 7188, max=17550, avg=12928.40, stdev=2633.64, samples=40
(5)   iops        : min= 8268, max=17446, avg=12837.55, stdev=2717.98, samples=40

xfstests seems survived. Comments are much welcomed and
thanks for your time!

Thanks,
Gao Xiang

Changes since v1:
 - fix missing tmp_logflags initialization;
 - drop unnecessary realtime inode check pointed out by Darrick.

Gao Xiang (3):
  xfs: get rid of LEFT, RIGHT, PREV in
    xfs_bmap_add_extent_unwritten_real()
  xfs: introduce xfs_bmap_update_extent_real()
  xfs: introduce xfs_bremapi_from_cowfork()

 fs/xfs/libxfs/xfs_bmap.c | 377 +++++++++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_bmap.h |   7 +-
 fs/xfs/xfs_reflink.c     |  24 +--
 3 files changed, 252 insertions(+), 156 deletions(-)

-- 
2.24.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/3] xfs: get rid of LEFT, RIGHT, PREV in xfs_bmap_add_extent_unwritten_real()
  2022-02-16  3:08 [PATCH v2 0/3] xfs: some end COW remapping optimization Gao Xiang
@ 2022-02-16  3:08 ` Gao Xiang
  2022-02-16  3:08 ` [PATCH v2 2/3] xfs: introduce xfs_bmap_update_extent_real() Gao Xiang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Gao Xiang @ 2022-02-16  3:08 UTC (permalink / raw)
  To: xfs; +Cc: LKML, Darrick J. Wong, Gao Xiang

It doesn't seems that such macros are easier to read. Also, they
could polluate the identifier namespace.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 200 +++++++++++++++++++--------------------
 1 file changed, 96 insertions(+), 104 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 74198dd82b03..14d1a806ba15 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1935,11 +1935,11 @@ xfs_bmap_add_extent_delay_real(
 int					/* error */
 xfs_bmap_add_extent_unwritten_real(
 	struct xfs_trans	*tp,
-	xfs_inode_t		*ip,	/* incore inode pointer */
+	struct xfs_inode	*ip,	/* incore inode pointer */
 	int			whichfork,
 	struct xfs_iext_cursor	*icur,
 	struct xfs_btree_cur	**curp,	/* if *curp is null, not a btree */
-	xfs_bmbt_irec_t		*new,	/* new data to add to file extents */
+	struct xfs_bmbt_irec	*new,	/* new data to add to file extents */
 	int			*logflagsp) /* inode logging flags */
 {
 	struct xfs_btree_cur	*cur;	/* btree cursor */
@@ -1947,8 +1947,8 @@ xfs_bmap_add_extent_unwritten_real(
 	int			i;	/* temp state */
 	struct xfs_ifork	*ifp;	/* inode fork pointer */
 	xfs_fileoff_t		new_endoff;	/* end offset of new entry */
-	xfs_bmbt_irec_t		r[3];	/* neighbor extent entries */
-					/* left is 0, right is 1, prev is 2 */
+	struct xfs_bmbt_irec	left, right;	/* neighbor extent entries */
+	struct xfs_bmbt_irec	prev;		/* previous old extent */
 	int			rval=0;	/* return value (logging flags) */
 	int			state = xfs_bmap_fork_to_state(whichfork);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1963,44 +1963,40 @@ xfs_bmap_add_extent_unwritten_real(
 
 	XFS_STATS_INC(mp, xs_add_exlist);
 
-#define	LEFT		r[0]
-#define	RIGHT		r[1]
-#define	PREV		r[2]
-
 	/*
 	 * Set up a bunch of variables to make the tests simpler.
 	 */
 	error = 0;
-	xfs_iext_get_extent(ifp, icur, &PREV);
-	ASSERT(new->br_state != PREV.br_state);
+	xfs_iext_get_extent(ifp, icur, &prev);
+	ASSERT(new->br_state != prev.br_state);
 	new_endoff = new->br_startoff + new->br_blockcount;
-	ASSERT(PREV.br_startoff <= new->br_startoff);
-	ASSERT(PREV.br_startoff + PREV.br_blockcount >= new_endoff);
+	ASSERT(prev.br_startoff <= new->br_startoff);
+	ASSERT(prev.br_startoff + prev.br_blockcount >= new_endoff);
 
 	/*
 	 * Set flags determining what part of the previous oldext allocation
 	 * extent is being replaced by a newext allocation.
 	 */
-	if (PREV.br_startoff == new->br_startoff)
+	if (prev.br_startoff == new->br_startoff)
 		state |= BMAP_LEFT_FILLING;
-	if (PREV.br_startoff + PREV.br_blockcount == new_endoff)
+	if (prev.br_startoff + prev.br_blockcount == new_endoff)
 		state |= BMAP_RIGHT_FILLING;
 
 	/*
 	 * Check and set flags if this segment has a left neighbor.
 	 * Don't set contiguous if the combined extent would be too large.
 	 */
-	if (xfs_iext_peek_prev_extent(ifp, icur, &LEFT)) {
+	if (xfs_iext_peek_prev_extent(ifp, icur, &left)) {
 		state |= BMAP_LEFT_VALID;
-		if (isnullstartblock(LEFT.br_startblock))
+		if (isnullstartblock(left.br_startblock))
 			state |= BMAP_LEFT_DELAY;
 	}
 
 	if ((state & BMAP_LEFT_VALID) && !(state & BMAP_LEFT_DELAY) &&
-	    LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff &&
-	    LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock &&
-	    LEFT.br_state == new->br_state &&
-	    LEFT.br_blockcount + new->br_blockcount <= MAXEXTLEN)
+	    left.br_startoff + left.br_blockcount == new->br_startoff &&
+	    left.br_startblock + left.br_blockcount == new->br_startblock &&
+	    left.br_state == new->br_state &&
+	    left.br_blockcount + new->br_blockcount <= MAXEXTLEN)
 		state |= BMAP_LEFT_CONTIG;
 
 	/*
@@ -2008,22 +2004,22 @@ xfs_bmap_add_extent_unwritten_real(
 	 * Don't set contiguous if the combined extent would be too large.
 	 * Also check for all-three-contiguous being too large.
 	 */
-	if (xfs_iext_peek_next_extent(ifp, icur, &RIGHT)) {
+	if (xfs_iext_peek_next_extent(ifp, icur, &right)) {
 		state |= BMAP_RIGHT_VALID;
-		if (isnullstartblock(RIGHT.br_startblock))
+		if (isnullstartblock(right.br_startblock))
 			state |= BMAP_RIGHT_DELAY;
 	}
 
 	if ((state & BMAP_RIGHT_VALID) && !(state & BMAP_RIGHT_DELAY) &&
-	    new_endoff == RIGHT.br_startoff &&
-	    new->br_startblock + new->br_blockcount == RIGHT.br_startblock &&
-	    new->br_state == RIGHT.br_state &&
-	    new->br_blockcount + RIGHT.br_blockcount <= MAXEXTLEN &&
+	    new_endoff == right.br_startoff &&
+	    new->br_startblock + new->br_blockcount == right.br_startblock &&
+	    new->br_state == right.br_state &&
+	    new->br_blockcount + right.br_blockcount <= MAXEXTLEN &&
 	    ((state & (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
 		       BMAP_RIGHT_FILLING)) !=
 		      (BMAP_LEFT_CONTIG | BMAP_LEFT_FILLING |
 		       BMAP_RIGHT_FILLING) ||
-	     LEFT.br_blockcount + new->br_blockcount + RIGHT.br_blockcount
+	     left.br_blockcount + new->br_blockcount + right.br_blockcount
 			<= MAXEXTLEN))
 		state |= BMAP_RIGHT_CONTIG;
 
@@ -2038,18 +2034,18 @@ xfs_bmap_add_extent_unwritten_real(
 		 * Setting all of a previous oldext extent to newext.
 		 * The left and right neighbors are both contiguous with new.
 		 */
-		LEFT.br_blockcount += PREV.br_blockcount + RIGHT.br_blockcount;
+		left.br_blockcount += prev.br_blockcount + right.br_blockcount;
 
 		xfs_iext_remove(ip, icur, state);
 		xfs_iext_remove(ip, icur, state);
 		xfs_iext_prev(ifp, icur);
-		xfs_iext_update_extent(ip, state, icur, &LEFT);
+		xfs_iext_update_extent(ip, state, icur, &left);
 		ifp->if_nextents -= 2;
-		if (cur == NULL)
+		if (cur == NULL) {
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
-		else {
+		} else {
 			rval = XFS_ILOG_CORE;
-			error = xfs_bmbt_lookup_eq(cur, &RIGHT, &i);
+			error = xfs_bmbt_lookup_eq(cur, &right, &i);
 			if (error)
 				goto done;
 			if (XFS_IS_CORRUPT(mp, i != 1)) {
@@ -2080,7 +2076,7 @@ xfs_bmap_add_extent_unwritten_real(
 				error = -EFSCORRUPTED;
 				goto done;
 			}
-			error = xfs_bmbt_update(cur, &LEFT);
+			error = xfs_bmbt_update(cur, &left);
 			if (error)
 				goto done;
 		}
@@ -2091,17 +2087,17 @@ xfs_bmap_add_extent_unwritten_real(
 		 * Setting all of a previous oldext extent to newext.
 		 * The left neighbor is contiguous, the right is not.
 		 */
-		LEFT.br_blockcount += PREV.br_blockcount;
+		left.br_blockcount += prev.br_blockcount;
 
 		xfs_iext_remove(ip, icur, state);
 		xfs_iext_prev(ifp, icur);
-		xfs_iext_update_extent(ip, state, icur, &LEFT);
+		xfs_iext_update_extent(ip, state, icur, &left);
 		ifp->if_nextents--;
-		if (cur == NULL)
+		if (cur == NULL) {
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
-		else {
+		} else {
 			rval = XFS_ILOG_CORE;
-			error = xfs_bmbt_lookup_eq(cur, &PREV, &i);
+			error = xfs_bmbt_lookup_eq(cur, &prev, &i);
 			if (error)
 				goto done;
 			if (XFS_IS_CORRUPT(mp, i != 1)) {
@@ -2120,7 +2116,7 @@ xfs_bmap_add_extent_unwritten_real(
 				error = -EFSCORRUPTED;
 				goto done;
 			}
-			error = xfs_bmbt_update(cur, &LEFT);
+			error = xfs_bmbt_update(cur, &left);
 			if (error)
 				goto done;
 		}
@@ -2131,20 +2127,20 @@ xfs_bmap_add_extent_unwritten_real(
 		 * Setting all of a previous oldext extent to newext.
 		 * The right neighbor is contiguous, the left is not.
 		 */
-		PREV.br_blockcount += RIGHT.br_blockcount;
-		PREV.br_state = new->br_state;
+		prev.br_blockcount += right.br_blockcount;
+		prev.br_state = new->br_state;
 
 		xfs_iext_next(ifp, icur);
 		xfs_iext_remove(ip, icur, state);
 		xfs_iext_prev(ifp, icur);
-		xfs_iext_update_extent(ip, state, icur, &PREV);
+		xfs_iext_update_extent(ip, state, icur, &prev);
 		ifp->if_nextents--;
 
-		if (cur == NULL)
+		if (cur == NULL) {
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
-		else {
+		} else {
 			rval = XFS_ILOG_CORE;
-			error = xfs_bmbt_lookup_eq(cur, &RIGHT, &i);
+			error = xfs_bmbt_lookup_eq(cur, &right, &i);
 			if (error)
 				goto done;
 			if (XFS_IS_CORRUPT(mp, i != 1)) {
@@ -2163,7 +2159,7 @@ xfs_bmap_add_extent_unwritten_real(
 				error = -EFSCORRUPTED;
 				goto done;
 			}
-			error = xfs_bmbt_update(cur, &PREV);
+			error = xfs_bmbt_update(cur, &prev);
 			if (error)
 				goto done;
 		}
@@ -2175,12 +2171,12 @@ xfs_bmap_add_extent_unwritten_real(
 		 * Neither the left nor right neighbors are contiguous with
 		 * the new one.
 		 */
-		PREV.br_state = new->br_state;
-		xfs_iext_update_extent(ip, state, icur, &PREV);
+		prev.br_state = new->br_state;
+		xfs_iext_update_extent(ip, state, icur, &prev);
 
-		if (cur == NULL)
+		if (cur == NULL) {
 			rval = XFS_ILOG_DEXT;
-		else {
+		} else {
 			rval = 0;
 			error = xfs_bmbt_lookup_eq(cur, new, &i);
 			if (error)
@@ -2189,7 +2185,7 @@ xfs_bmap_add_extent_unwritten_real(
 				error = -EFSCORRUPTED;
 				goto done;
 			}
-			error = xfs_bmbt_update(cur, &PREV);
+			error = xfs_bmbt_update(cur, &prev);
 			if (error)
 				goto done;
 		}
@@ -2200,20 +2196,20 @@ xfs_bmap_add_extent_unwritten_real(
 		 * Setting the first part of a previous oldext extent to newext.
 		 * The left neighbor is contiguous.
 		 */
-		LEFT.br_blockcount += new->br_blockcount;
+		left.br_blockcount += new->br_blockcount;
 
-		old = PREV;
-		PREV.br_startoff += new->br_blockcount;
-		PREV.br_startblock += new->br_blockcount;
-		PREV.br_blockcount -= new->br_blockcount;
+		old = prev;
+		prev.br_startoff += new->br_blockcount;
+		prev.br_startblock += new->br_blockcount;
+		prev.br_blockcount -= new->br_blockcount;
 
-		xfs_iext_update_extent(ip, state, icur, &PREV);
+		xfs_iext_update_extent(ip, state, icur, &prev);
 		xfs_iext_prev(ifp, icur);
-		xfs_iext_update_extent(ip, state, icur, &LEFT);
+		xfs_iext_update_extent(ip, state, icur, &left);
 
-		if (cur == NULL)
+		if (cur == NULL) {
 			rval = XFS_ILOG_DEXT;
-		else {
+		} else {
 			rval = 0;
 			error = xfs_bmbt_lookup_eq(cur, &old, &i);
 			if (error)
@@ -2222,13 +2218,13 @@ xfs_bmap_add_extent_unwritten_real(
 				error = -EFSCORRUPTED;
 				goto done;
 			}
-			error = xfs_bmbt_update(cur, &PREV);
+			error = xfs_bmbt_update(cur, &prev);
 			if (error)
 				goto done;
 			error = xfs_btree_decrement(cur, 0, &i);
 			if (error)
 				goto done;
-			error = xfs_bmbt_update(cur, &LEFT);
+			error = xfs_bmbt_update(cur, &left);
 			if (error)
 				goto done;
 		}
@@ -2239,18 +2235,18 @@ xfs_bmap_add_extent_unwritten_real(
 		 * Setting the first part of a previous oldext extent to newext.
 		 * The left neighbor is not contiguous.
 		 */
-		old = PREV;
-		PREV.br_startoff += new->br_blockcount;
-		PREV.br_startblock += new->br_blockcount;
-		PREV.br_blockcount -= new->br_blockcount;
+		old = prev;
+		prev.br_startoff += new->br_blockcount;
+		prev.br_startblock += new->br_blockcount;
+		prev.br_blockcount -= new->br_blockcount;
 
-		xfs_iext_update_extent(ip, state, icur, &PREV);
+		xfs_iext_update_extent(ip, state, icur, &prev);
 		xfs_iext_insert(ip, icur, new, state);
 		ifp->if_nextents++;
 
-		if (cur == NULL)
+		if (cur == NULL) {
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
-		else {
+		} else {
 			rval = XFS_ILOG_CORE;
 			error = xfs_bmbt_lookup_eq(cur, &old, &i);
 			if (error)
@@ -2259,7 +2255,7 @@ xfs_bmap_add_extent_unwritten_real(
 				error = -EFSCORRUPTED;
 				goto done;
 			}
-			error = xfs_bmbt_update(cur, &PREV);
+			error = xfs_bmbt_update(cur, &prev);
 			if (error)
 				goto done;
 			cur->bc_rec.b = *new;
@@ -2277,20 +2273,20 @@ xfs_bmap_add_extent_unwritten_real(
 		 * Setting the last part of a previous oldext extent to newext.
 		 * The right neighbor is contiguous with the new allocation.
 		 */
-		old = PREV;
-		PREV.br_blockcount -= new->br_blockcount;
+		old = prev;
+		prev.br_blockcount -= new->br_blockcount;
 
-		RIGHT.br_startoff = new->br_startoff;
-		RIGHT.br_startblock = new->br_startblock;
-		RIGHT.br_blockcount += new->br_blockcount;
+		right.br_startoff = new->br_startoff;
+		right.br_startblock = new->br_startblock;
+		right.br_blockcount += new->br_blockcount;
 
-		xfs_iext_update_extent(ip, state, icur, &PREV);
+		xfs_iext_update_extent(ip, state, icur, &prev);
 		xfs_iext_next(ifp, icur);
-		xfs_iext_update_extent(ip, state, icur, &RIGHT);
+		xfs_iext_update_extent(ip, state, icur, &right);
 
-		if (cur == NULL)
+		if (cur == NULL) {
 			rval = XFS_ILOG_DEXT;
-		else {
+		} else {
 			rval = 0;
 			error = xfs_bmbt_lookup_eq(cur, &old, &i);
 			if (error)
@@ -2299,13 +2295,13 @@ xfs_bmap_add_extent_unwritten_real(
 				error = -EFSCORRUPTED;
 				goto done;
 			}
-			error = xfs_bmbt_update(cur, &PREV);
+			error = xfs_bmbt_update(cur, &prev);
 			if (error)
 				goto done;
 			error = xfs_btree_increment(cur, 0, &i);
 			if (error)
 				goto done;
-			error = xfs_bmbt_update(cur, &RIGHT);
+			error = xfs_bmbt_update(cur, &right);
 			if (error)
 				goto done;
 		}
@@ -2316,17 +2312,17 @@ xfs_bmap_add_extent_unwritten_real(
 		 * Setting the last part of a previous oldext extent to newext.
 		 * The right neighbor is not contiguous.
 		 */
-		old = PREV;
-		PREV.br_blockcount -= new->br_blockcount;
+		old = prev;
+		prev.br_blockcount -= new->br_blockcount;
 
-		xfs_iext_update_extent(ip, state, icur, &PREV);
+		xfs_iext_update_extent(ip, state, icur, &prev);
 		xfs_iext_next(ifp, icur);
 		xfs_iext_insert(ip, icur, new, state);
 		ifp->if_nextents++;
 
-		if (cur == NULL)
+		if (cur == NULL) {
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
-		else {
+		} else {
 			rval = XFS_ILOG_CORE;
 			error = xfs_bmbt_lookup_eq(cur, &old, &i);
 			if (error)
@@ -2335,7 +2331,7 @@ xfs_bmap_add_extent_unwritten_real(
 				error = -EFSCORRUPTED;
 				goto done;
 			}
-			error = xfs_bmbt_update(cur, &PREV);
+			error = xfs_bmbt_update(cur, &prev);
 			if (error)
 				goto done;
 			error = xfs_bmbt_lookup_eq(cur, new, &i);
@@ -2360,25 +2356,24 @@ xfs_bmap_add_extent_unwritten_real(
 		 * newext.  Contiguity is impossible here.
 		 * One extent becomes three extents.
 		 */
-		old = PREV;
-		PREV.br_blockcount = new->br_startoff - PREV.br_startoff;
+		old = prev;
+		prev.br_blockcount = new->br_startoff - prev.br_startoff;
 
-		r[0] = *new;
-		r[1].br_startoff = new_endoff;
-		r[1].br_blockcount =
+		right.br_startoff = new_endoff;
+		right.br_blockcount =
 			old.br_startoff + old.br_blockcount - new_endoff;
-		r[1].br_startblock = new->br_startblock + new->br_blockcount;
-		r[1].br_state = PREV.br_state;
+		right.br_startblock = new->br_startblock + new->br_blockcount;
+		right.br_state = prev.br_state;
 
-		xfs_iext_update_extent(ip, state, icur, &PREV);
+		xfs_iext_update_extent(ip, state, icur, &prev);
 		xfs_iext_next(ifp, icur);
-		xfs_iext_insert(ip, icur, &r[1], state);
-		xfs_iext_insert(ip, icur, &r[0], state);
+		xfs_iext_insert(ip, icur, &right, state);
+		xfs_iext_insert(ip, icur, new, state);
 		ifp->if_nextents += 2;
 
-		if (cur == NULL)
+		if (cur == NULL) {
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
-		else {
+		} else {
 			rval = XFS_ILOG_CORE;
 			error = xfs_bmbt_lookup_eq(cur, &old, &i);
 			if (error)
@@ -2388,11 +2383,11 @@ xfs_bmap_add_extent_unwritten_real(
 				goto done;
 			}
 			/* new right extent - oldext */
-			error = xfs_bmbt_update(cur, &r[1]);
+			error = xfs_bmbt_update(cur, &right);
 			if (error)
 				goto done;
 			/* new left extent - oldext */
-			cur->bc_rec.b = PREV;
+			cur->bc_rec.b = prev;
 			if ((error = xfs_btree_insert(cur, &i)))
 				goto done;
 			if (XFS_IS_CORRUPT(mp, i != 1)) {
@@ -2459,9 +2454,6 @@ xfs_bmap_add_extent_unwritten_real(
 done:
 	*logflagsp |= rval;
 	return error;
-#undef	LEFT
-#undef	RIGHT
-#undef	PREV
 }
 
 /*
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/3] xfs: introduce xfs_bmap_update_extent_real()
  2022-02-16  3:08 [PATCH v2 0/3] xfs: some end COW remapping optimization Gao Xiang
  2022-02-16  3:08 ` [PATCH v2 1/3] xfs: get rid of LEFT, RIGHT, PREV in xfs_bmap_add_extent_unwritten_real() Gao Xiang
@ 2022-02-16  3:08 ` Gao Xiang
  2022-02-16  3:08 ` [PATCH v2 3/3] xfs: introduce xfs_bremapi_from_cowfork() Gao Xiang
  2022-03-21 22:21 ` [PATCH v2 0/3] xfs: some end COW remapping optimization Darrick J. Wong
  3 siblings, 0 replies; 7+ messages in thread
From: Gao Xiang @ 2022-02-16  3:08 UTC (permalink / raw)
  To: xfs; +Cc: LKML, Darrick J. Wong, Gao Xiang

Previously, xfs_bmap_add_extent_unwritten_real() was just used for
unwritten conversion. However, the code could be sightly modified
to update a real allocated extent. It can be then used to avoid
unnecessary bmbt unmap due to end COW.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 65 ++++++++++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_bmap.h |  4 +--
 fs/xfs/xfs_reflink.c     |  5 ++--
 3 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 14d1a806ba15..8eb1d1ed7b75 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1930,22 +1930,26 @@ xfs_bmap_add_extent_delay_real(
 }
 
 /*
- * Convert an unwritten allocation to a real allocation or vice versa.
+ * Update a real allocated extent (including converting an unwritten
+ * allocation to a real allocation or vice versa.)
  */
 int					/* error */
-xfs_bmap_add_extent_unwritten_real(
+xfs_bmap_update_extent_real(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip,	/* incore inode pointer */
 	int			whichfork,
 	struct xfs_iext_cursor	*icur,
 	struct xfs_btree_cur	**curp,	/* if *curp is null, not a btree */
 	struct xfs_bmbt_irec	*new,	/* new data to add to file extents */
-	int			*logflagsp) /* inode logging flags */
+	int			*logflagsp, /* inode logging flags */
+	bool			convert)
 {
 	struct xfs_btree_cur	*cur;	/* btree cursor */
 	int			error;	/* error return value */
 	int			i;	/* temp state */
 	struct xfs_ifork	*ifp;	/* inode fork pointer */
+	xfs_fileoff_t		del_startoff;	/* start offset of del entry */
+	xfs_exntst_t		del_state;
 	xfs_fileoff_t		new_endoff;	/* end offset of new entry */
 	struct xfs_bmbt_irec	left, right;	/* neighbor extent entries */
 	struct xfs_bmbt_irec	prev;		/* previous old extent */
@@ -1953,6 +1957,7 @@ xfs_bmap_add_extent_unwritten_real(
 	int			state = xfs_bmap_fork_to_state(whichfork);
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_bmbt_irec	old;
+	int			tmp_logflags = 0;
 
 	*logflagsp = 0;
 
@@ -1968,8 +1973,11 @@ xfs_bmap_add_extent_unwritten_real(
 	 */
 	error = 0;
 	xfs_iext_get_extent(ifp, icur, &prev);
-	ASSERT(new->br_state != prev.br_state);
+	ASSERT(!convert || new->br_state != prev.br_state);
 	new_endoff = new->br_startoff + new->br_blockcount;
+	del_startoff = prev.br_startblock +
+		new->br_startoff - prev.br_startoff;
+	del_state = prev.br_state;
 	ASSERT(prev.br_startoff <= new->br_startoff);
 	ASSERT(prev.br_startoff + prev.br_blockcount >= new_endoff);
 
@@ -2129,6 +2137,7 @@ xfs_bmap_add_extent_unwritten_real(
 		 */
 		prev.br_blockcount += right.br_blockcount;
 		prev.br_state = new->br_state;
+		prev.br_startblock = new->br_startblock;
 
 		xfs_iext_next(ifp, icur);
 		xfs_iext_remove(ip, icur, state);
@@ -2171,6 +2180,7 @@ xfs_bmap_add_extent_unwritten_real(
 		 * Neither the left nor right neighbors are contiguous with
 		 * the new one.
 		 */
+		prev.br_startblock = new->br_startblock;
 		prev.br_state = new->br_state;
 		xfs_iext_update_extent(ip, state, icur, &prev);
 
@@ -2362,7 +2372,8 @@ xfs_bmap_add_extent_unwritten_real(
 		right.br_startoff = new_endoff;
 		right.br_blockcount =
 			old.br_startoff + old.br_blockcount - new_endoff;
-		right.br_startblock = new->br_startblock + new->br_blockcount;
+		right.br_startblock = old.br_startblock + prev.br_blockcount +
+			new->br_blockcount;
 		right.br_state = prev.br_state;
 
 		xfs_iext_update_extent(ip, state, icur, &prev);
@@ -2430,20 +2441,30 @@ xfs_bmap_add_extent_unwritten_real(
 	}
 
 	/* update reverse mappings */
-	xfs_rmap_convert_extent(mp, tp, ip, whichfork, new);
+	if (!convert) {
+		old = *new;
+		old.br_startblock = del_startoff;
+		old.br_state = del_state;
+		xfs_rmap_unmap_extent(tp, ip, whichfork, &old);
+		xfs_rmap_map_extent(tp, ip, whichfork, new);
+	} else {
+		xfs_rmap_convert_extent(mp, tp, ip, whichfork, new);
+	}
 
-	/* convert to a btree if necessary */
+	/* convert to a btree or extents if necessary */
 	if (xfs_bmap_needs_btree(ip, whichfork)) {
-		int	tmp_logflags;	/* partial log flag return val */
-
 		ASSERT(cur == NULL);
 		error = xfs_bmap_extents_to_btree(tp, ip, &cur, 0,
 				&tmp_logflags, whichfork);
-		*logflagsp |= tmp_logflags;
-		if (error)
-			goto done;
+	} else if (!convert) {
+		error = xfs_bmap_btree_to_extents(tp, ip, cur,
+				&tmp_logflags, whichfork);
 	}
 
+	*logflagsp |= tmp_logflags;
+	if (error)
+		goto done;
+
 	/* clear out the allocated field, done with it now in any case. */
 	if (cur) {
 		cur->bc_ino.allocated = 0;
@@ -4216,8 +4237,8 @@ xfs_bmapi_convert_unwritten(
 			return error;
 	}
 
-	error = xfs_bmap_add_extent_unwritten_real(bma->tp, bma->ip, whichfork,
-			&bma->icur, &bma->cur, mval, &tmp_logflags);
+	error = xfs_bmap_update_extent_real(bma->tp, bma->ip, whichfork,
+			&bma->icur, &bma->cur, mval, &tmp_logflags, true);
 	/*
 	 * Log the inode core unconditionally in the unwritten extent conversion
 	 * path because the conversion might not have done so (e.g., if the
@@ -5444,9 +5465,9 @@ __xfs_bunmapi(
 				del.br_blockcount = mod;
 			}
 			del.br_state = XFS_EXT_UNWRITTEN;
-			error = xfs_bmap_add_extent_unwritten_real(tp, ip,
+			error = xfs_bmap_update_extent_real(tp, ip,
 					whichfork, &icur, &cur, &del,
-					&logflags);
+					&logflags, true);
 			if (error)
 				goto error0;
 			goto nodelete;
@@ -5503,18 +5524,18 @@ __xfs_bunmapi(
 				prev.br_startblock += mod;
 				prev.br_blockcount -= mod;
 				prev.br_state = XFS_EXT_UNWRITTEN;
-				error = xfs_bmap_add_extent_unwritten_real(tp,
-						ip, whichfork, &icur, &cur,
-						&prev, &logflags);
+				error = xfs_bmap_update_extent_real(tp, ip,
+						whichfork, &icur, &cur, &prev,
+						&logflags, true);
 				if (error)
 					goto error0;
 				goto nodelete;
 			} else {
 				ASSERT(del.br_state == XFS_EXT_NORM);
 				del.br_state = XFS_EXT_UNWRITTEN;
-				error = xfs_bmap_add_extent_unwritten_real(tp,
-						ip, whichfork, &icur, &cur,
-						&del, &logflags);
+				error = xfs_bmap_update_extent_real(tp, ip,
+						whichfork, &icur, &cur, &del,
+						&logflags, true);
 				if (error)
 					goto error0;
 				goto nodelete;
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 03d9aaf87413..c52ff94786e2 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -216,10 +216,10 @@ int	xfs_bmapi_reserve_delalloc(struct xfs_inode *ip, int whichfork,
 		int eof);
 int	xfs_bmapi_convert_delalloc(struct xfs_inode *ip, int whichfork,
 		xfs_off_t offset, struct iomap *iomap, unsigned int *seq);
-int	xfs_bmap_add_extent_unwritten_real(struct xfs_trans *tp,
+int	xfs_bmap_update_extent_real(struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp,
-		struct xfs_bmbt_irec *new, int *logflagsp);
+		struct xfs_bmbt_irec *new, int *logflagsp, bool convert);
 
 enum xfs_bmap_intent_type {
 	XFS_BMAP_MAP = 1,
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index db70060e7bf6..276387a6a85d 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -266,9 +266,8 @@ xfs_reflink_convert_cow_locked(
 			continue;
 
 		got.br_state = XFS_EXT_NORM;
-		error = xfs_bmap_add_extent_unwritten_real(NULL, ip,
-				XFS_COW_FORK, &icur, &dummy_cur, &got,
-				&dummy_logflags);
+		error = xfs_bmap_update_extent_real(NULL, ip, XFS_COW_FORK,
+				&icur, &dummy_cur, &got, &dummy_logflags, true);
 		if (error)
 			return error;
 	} while (xfs_iext_next_extent(ip->i_cowfp, &icur, &got));
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 3/3] xfs: introduce xfs_bremapi_from_cowfork()
  2022-02-16  3:08 [PATCH v2 0/3] xfs: some end COW remapping optimization Gao Xiang
  2022-02-16  3:08 ` [PATCH v2 1/3] xfs: get rid of LEFT, RIGHT, PREV in xfs_bmap_add_extent_unwritten_real() Gao Xiang
  2022-02-16  3:08 ` [PATCH v2 2/3] xfs: introduce xfs_bmap_update_extent_real() Gao Xiang
@ 2022-02-16  3:08 ` Gao Xiang
  2022-02-17 12:16   ` [PATCH v3 " Gao Xiang
  2022-03-21 22:21 ` [PATCH v2 0/3] xfs: some end COW remapping optimization Darrick J. Wong
  3 siblings, 1 reply; 7+ messages in thread
From: Gao Xiang @ 2022-02-16  3:08 UTC (permalink / raw)
  To: xfs; +Cc: LKML, Darrick J. Wong, Gao Xiang

Previously, xfs_reflink_end_cow_extent() will unconditionally unmap
the corresponding old extent and remap an extent from COW fork.
However, it seems somewhat ineffective since the old bmbt records can
be directly updated for many cases instead.

This patch uses introduced xfs_bmap_update_extent_real() in the
previous patch for most extent inclusive cases or it will fall back
to the old way if such replacement is not possible.

Actually, we're planing to use a modified alway-cow like atomic write
approach internally, therefore it'd be nice to do some optimization
to reduce some metadata overhead.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/xfs/libxfs/xfs_bmap.c | 116 ++++++++++++++++++++++++++++++++++++---
 fs/xfs/libxfs/xfs_bmap.h |   3 +
 fs/xfs/xfs_reflink.c     |  19 +------
 3 files changed, 111 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 8eb1d1ed7b75..f74ab489d9c8 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5880,6 +5880,113 @@ xfs_bmap_collapse_extents(
 	return error;
 }
 
+/* Deferred mapping is only for real extents in the data fork. */
+static bool
+xfs_bmap_is_update_needed(
+	struct xfs_bmbt_irec	*bmap)
+{
+	return  bmap->br_startblock != HOLESTARTBLOCK &&
+		bmap->br_startblock != DELAYSTARTBLOCK;
+}
+
+/* del is an extent from COW fork */
+int
+xfs_bremapi_from_cowfork(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*icow)
+{
+	int			error;
+	xfs_filblks_t		rlen;
+
+	if (xfs_bmap_is_update_needed(icow)) {
+		xfs_fileoff_t		start, end, max_len;
+		struct xfs_bmbt_irec	got;
+		struct xfs_iext_cursor	icur;
+		struct xfs_btree_cur	*cur = NULL;
+		struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+		int			logflags = 0;
+
+		error = xfs_iread_extents(tp, ip, XFS_DATA_FORK);
+		if (error)
+			return error;
+
+		max_len = xfs_refcount_max_unmap(tp->t_log_res);
+		if (max_len < icow->br_blockcount) {
+			icow->br_startoff += icow->br_blockcount - max_len;
+			icow->br_startblock += icow->br_blockcount - max_len;
+			icow->br_blockcount = max_len;
+		}
+
+		end = icow->br_startoff + icow->br_blockcount;
+		if (!xfs_iext_count(ifp) || !xfs_iext_lookup_extent_before(ip,
+				ifp, &end, &icur, &got) ||
+		    isnullstartblock(got.br_startblock) ||
+		    icow->br_startoff + icow->br_blockcount > got.br_startoff +
+				got.br_blockcount) {
+			error = -EAGAIN;
+		} else {
+			end = icow->br_startoff + icow->br_blockcount;
+			start = XFS_FILEOFF_MAX(icow->br_startoff,
+						got.br_startoff);
+			ASSERT(start < end);
+
+			/* Trim the extent to what we need */
+			xfs_trim_extent(icow, start, end - start);
+			xfs_trim_extent(&got, start, end - start);
+
+			if (ifp->if_format == XFS_DINODE_FMT_BTREE) {
+				cur = xfs_bmbt_init_cursor(tp->t_mountp, tp, ip,
+							   XFS_DATA_FORK);
+				cur->bc_ino.flags = 0;
+			}
+
+			/*
+			 * Free the CoW orphan record (it should be done here
+			 * before updating extent due to rmapbt update)
+			 */
+			xfs_refcount_free_cow_extent(tp, icow->br_startblock,
+						     icow->br_blockcount);
+
+			xfs_bmap_update_extent_real(tp, ip, XFS_DATA_FORK,
+					&icur, &cur, icow, &logflags, false);
+
+			/* Free previous referenced space */
+			xfs_refcount_decrease_extent(tp, &got);
+
+			trace_xfs_reflink_cow_remap(ip, icow);
+			error = 0;
+		}
+		if (cur)
+			xfs_btree_del_cursor(cur, 0);
+		if (logflags)
+			xfs_trans_log_inode(tp, ip, logflags);
+		if (!error)
+			return 0;
+	}
+
+	rlen = icow->br_blockcount;
+	error = __xfs_bunmapi(tp, ip, icow->br_startoff, &rlen, 0, 1);
+	if (error)
+		return error;
+
+	/* Trim the extent to whatever got unmapped. */
+	xfs_trim_extent(icow, icow->br_startoff + rlen,
+			icow->br_blockcount - rlen);
+	/* Free the CoW orphan record. */
+	xfs_refcount_free_cow_extent(tp, icow->br_startblock,
+				     icow->br_blockcount);
+
+	/* Map the new blocks into the data fork. */
+	xfs_bmap_map_extent(tp, ip, icow);
+
+	/* Charge this new data fork mapping to the on-disk quota. */
+	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
+			(long)icow->br_blockcount);
+	trace_xfs_reflink_cow_remap(ip, icow);
+	return 0;
+}
+
 /* Make sure we won't be right-shifting an extent past the maximum bound. */
 int
 xfs_bmap_can_insert_extents(
@@ -6123,15 +6230,6 @@ xfs_bmap_split_extent(
 	return error;
 }
 
-/* Deferred mapping is only for real extents in the data fork. */
-static bool
-xfs_bmap_is_update_needed(
-	struct xfs_bmbt_irec	*bmap)
-{
-	return  bmap->br_startblock != HOLESTARTBLOCK &&
-		bmap->br_startblock != DELAYSTARTBLOCK;
-}
-
 /* Record a bmap intent. */
 static int
 __xfs_bmap_add(
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index c52ff94786e2..9da1cff41c1c 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -220,6 +220,9 @@ int	xfs_bmap_update_extent_real(struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp,
 		struct xfs_bmbt_irec *new, int *logflagsp, bool convert);
+int
+xfs_bremapi_from_cowfork(struct xfs_trans *tp, struct xfs_inode *ip,
+		struct xfs_bmbt_irec *icow);
 
 enum xfs_bmap_intent_type {
 	XFS_BMAP_MAP = 1,
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 276387a6a85d..75bd2e03cd5b 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -590,7 +590,6 @@ xfs_reflink_end_cow_extent(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	xfs_filblks_t		rlen;
 	unsigned int		resblks;
 	int			error;
 
@@ -651,26 +650,10 @@ xfs_reflink_end_cow_extent(
 		goto out_cancel;
 	}
 
-	/* Unmap the old blocks in the data fork. */
-	rlen = del.br_blockcount;
-	error = __xfs_bunmapi(tp, ip, del.br_startoff, &rlen, 0, 1);
+	error = xfs_bremapi_from_cowfork(tp, ip, &del);
 	if (error)
 		goto out_cancel;
 
-	/* Trim the extent to whatever got unmapped. */
-	xfs_trim_extent(&del, del.br_startoff + rlen, del.br_blockcount - rlen);
-	trace_xfs_reflink_cow_remap(ip, &del);
-
-	/* Free the CoW orphan record. */
-	xfs_refcount_free_cow_extent(tp, del.br_startblock, del.br_blockcount);
-
-	/* Map the new blocks into the data fork. */
-	xfs_bmap_map_extent(tp, ip, &del);
-
-	/* Charge this new data fork mapping to the on-disk quota. */
-	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
-			(long)del.br_blockcount);
-
 	/* Remove the mapping from the CoW fork. */
 	xfs_bmap_del_extent_cow(ip, &icur, &got, &del);
 
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 3/3] xfs: introduce xfs_bremapi_from_cowfork()
  2022-02-16  3:08 ` [PATCH v2 3/3] xfs: introduce xfs_bremapi_from_cowfork() Gao Xiang
@ 2022-02-17 12:16   ` Gao Xiang
  0 siblings, 0 replies; 7+ messages in thread
From: Gao Xiang @ 2022-02-17 12:16 UTC (permalink / raw)
  To: xfs; +Cc: LKML, Darrick J. Wong, Gao Xiang

Previously, xfs_reflink_end_cow_extent() will unconditionally unmap
the corresponding old extent and remap an extent from COW fork.
However, it seems somewhat ineffective since the old bmbt records can
be directly updated for many cases instead.

This patch uses introduced xfs_bmap_update_extent_real() in the
previous patch for most extent inclusive cases or it will fall back
to the old way if such replacement is not possible.

Actually, we're planing to use a modified alway-cow like atomic write
approach internally, therefore it'd be nice to do some optimization
to reduce some metadata overhead.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
changes since v2:
 - fix quota accounting found by tests which were not run by mistake;
   Also, after this patch, xfs/312 will be false positive since
   xfs_bmap_finish_one() can be bypassed and error injection has no
   effect.

 fs/xfs/libxfs/xfs_bmap.c | 115 ++++++++++++++++++++++++++++++++++++---
 fs/xfs/libxfs/xfs_bmap.h |   3 +
 fs/xfs/xfs_reflink.c     |  15 +----
 3 files changed, 110 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 8eb1d1ed7b75..b7649ad5f18d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5880,6 +5880,112 @@ xfs_bmap_collapse_extents(
 	return error;
 }
 
+/* Deferred mapping is only for real extents in the data fork. */
+static bool
+xfs_bmap_is_update_needed(
+	struct xfs_bmbt_irec	*bmap)
+{
+	return  bmap->br_startblock != HOLESTARTBLOCK &&
+		bmap->br_startblock != DELAYSTARTBLOCK;
+}
+
+/* del is an extent from COW fork */
+int
+xfs_bremapi_from_cowfork(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*icow)
+{
+	int			error;
+	xfs_filblks_t		rlen;
+
+	if (xfs_bmap_is_update_needed(icow)) {
+		xfs_fileoff_t		start, end, max_len;
+		struct xfs_bmbt_irec	got;
+		struct xfs_iext_cursor	icur;
+		struct xfs_btree_cur	*cur = NULL;
+		struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+		int			logflags = 0;
+
+		error = xfs_iread_extents(tp, ip, XFS_DATA_FORK);
+		if (error)
+			return error;
+
+		max_len = xfs_refcount_max_unmap(tp->t_log_res);
+		if (max_len < icow->br_blockcount) {
+			icow->br_startoff += icow->br_blockcount - max_len;
+			icow->br_startblock += icow->br_blockcount - max_len;
+			icow->br_blockcount = max_len;
+		}
+
+		end = icow->br_startoff + icow->br_blockcount;
+		if (!xfs_iext_count(ifp) || !xfs_iext_lookup_extent_before(ip,
+				ifp, &end, &icur, &got) ||
+		    isnullstartblock(got.br_startblock) ||
+		    icow->br_startoff + icow->br_blockcount > got.br_startoff +
+				got.br_blockcount) {
+			error = -EAGAIN;
+		} else {
+			end = icow->br_startoff + icow->br_blockcount;
+			start = XFS_FILEOFF_MAX(icow->br_startoff,
+						got.br_startoff);
+			ASSERT(start < end);
+
+			/* Trim the extent to what we need */
+			xfs_trim_extent(icow, start, end - start);
+			xfs_trim_extent(&got, start, end - start);
+
+			if (ifp->if_format == XFS_DINODE_FMT_BTREE) {
+				cur = xfs_bmbt_init_cursor(tp->t_mountp, tp, ip,
+							   XFS_DATA_FORK);
+				cur->bc_ino.flags = 0;
+			}
+
+			/*
+			 * Free the CoW orphan record (it should be done here
+			 * before updating extent due to rmapbt update)
+			 */
+			xfs_refcount_free_cow_extent(tp, icow->br_startblock,
+						     icow->br_blockcount);
+
+			xfs_bmap_update_extent_real(tp, ip, XFS_DATA_FORK,
+					&icur, &cur, icow, &logflags, false);
+
+			/* Free previous referenced space */
+			xfs_refcount_decrease_extent(tp, &got);
+
+			/* Unchange the previous referenced quota */
+			xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT,
+						  -(int)got.br_blockcount);
+			trace_xfs_reflink_cow_remap(ip, icow);
+			error = 0;
+		}
+		if (cur)
+			xfs_btree_del_cursor(cur, 0);
+		if (logflags)
+			xfs_trans_log_inode(tp, ip, logflags);
+		if (!error)
+			return 0;
+	}
+
+	rlen = icow->br_blockcount;
+	error = __xfs_bunmapi(tp, ip, icow->br_startoff, &rlen, 0, 1);
+	if (error)
+		return error;
+
+	/* Trim the extent to whatever got unmapped. */
+	xfs_trim_extent(icow, icow->br_startoff + rlen,
+			icow->br_blockcount - rlen);
+	/* Free the CoW orphan record. */
+	xfs_refcount_free_cow_extent(tp, icow->br_startblock,
+				     icow->br_blockcount);
+
+	/* Map the new blocks into the data fork. */
+	xfs_bmap_map_extent(tp, ip, icow);
+	trace_xfs_reflink_cow_remap(ip, icow);
+	return 0;
+}
+
 /* Make sure we won't be right-shifting an extent past the maximum bound. */
 int
 xfs_bmap_can_insert_extents(
@@ -6123,15 +6229,6 @@ xfs_bmap_split_extent(
 	return error;
 }
 
-/* Deferred mapping is only for real extents in the data fork. */
-static bool
-xfs_bmap_is_update_needed(
-	struct xfs_bmbt_irec	*bmap)
-{
-	return  bmap->br_startblock != HOLESTARTBLOCK &&
-		bmap->br_startblock != DELAYSTARTBLOCK;
-}
-
 /* Record a bmap intent. */
 static int
 __xfs_bmap_add(
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index c52ff94786e2..9da1cff41c1c 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -220,6 +220,9 @@ int	xfs_bmap_update_extent_real(struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_iext_cursor *icur, struct xfs_btree_cur **curp,
 		struct xfs_bmbt_irec *new, int *logflagsp, bool convert);
+int
+xfs_bremapi_from_cowfork(struct xfs_trans *tp, struct xfs_inode *ip,
+		struct xfs_bmbt_irec *icow);
 
 enum xfs_bmap_intent_type {
 	XFS_BMAP_MAP = 1,
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 276387a6a85d..0cddee8feda0 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -590,7 +590,6 @@ xfs_reflink_end_cow_extent(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_trans	*tp;
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
-	xfs_filblks_t		rlen;
 	unsigned int		resblks;
 	int			error;
 
@@ -651,22 +650,10 @@ xfs_reflink_end_cow_extent(
 		goto out_cancel;
 	}
 
-	/* Unmap the old blocks in the data fork. */
-	rlen = del.br_blockcount;
-	error = __xfs_bunmapi(tp, ip, del.br_startoff, &rlen, 0, 1);
+	error = xfs_bremapi_from_cowfork(tp, ip, &del);
 	if (error)
 		goto out_cancel;
 
-	/* Trim the extent to whatever got unmapped. */
-	xfs_trim_extent(&del, del.br_startoff + rlen, del.br_blockcount - rlen);
-	trace_xfs_reflink_cow_remap(ip, &del);
-
-	/* Free the CoW orphan record. */
-	xfs_refcount_free_cow_extent(tp, del.br_startblock, del.br_blockcount);
-
-	/* Map the new blocks into the data fork. */
-	xfs_bmap_map_extent(tp, ip, &del);
-
 	/* Charge this new data fork mapping to the on-disk quota. */
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT,
 			(long)del.br_blockcount);
-- 
2.24.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/3] xfs: some end COW remapping optimization
  2022-02-16  3:08 [PATCH v2 0/3] xfs: some end COW remapping optimization Gao Xiang
                   ` (2 preceding siblings ...)
  2022-02-16  3:08 ` [PATCH v2 3/3] xfs: introduce xfs_bremapi_from_cowfork() Gao Xiang
@ 2022-03-21 22:21 ` Darrick J. Wong
  2022-03-22  2:57   ` Gao Xiang
  3 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2022-03-21 22:21 UTC (permalink / raw)
  To: Gao Xiang; +Cc: xfs, LKML

On Wed, Feb 16, 2022 at 11:08:51AM +0800, Gao Xiang wrote:
> Hi folks,
> 
> Currently, xfs_reflink_end_cow_extent() will unconditionally unmap an
> extent from DATA fork and then remap an extent from COW fork. It seems
> somewhat ineffective since for many cases we could update real bmbt
> records directly by sightly enhancing old
> xfs_bmap_add_extent_unwritten_real() implementation, thus reduce some
> measurable extra metadata overhead.

Does it work with rmap enabled?

Reading between the lines, I'm guessing the performance boost might
come from avoiding a transaction roll and (possibly) reducing the need
to log bmbt updates?  Particularly in the worst case where we split the
bmbt only to rejoin the blocks immediately after.

Recently, Dave and Allison have been pondering making an addition to the
deferred log item code so that we could ->finish_item the first defer op
in the same transaction that logs the caller's deferred operations.
Might that get you most of the speed advantage that you're seeking?

> It's important to us since, actually, we're planing to use a modified
> alway-cow like atomic write approach internally for database
> applications, therefore it'd be nice to do some optimization over
> simple end COW approach. Also I think it's still generic and can
> benefit other reflink use cases as well.

Hmm, that sounds /awfully/ similar to what sqlite does with f2fs' atomic
write ioctls.

Alternately, would this[1] feature that's been sitting around in
djwong-dev since late 2019 help?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates

--D

> 
> I did some tests with ramdisk in order to measure metadata overhead:
> 
> echo 1 > /sys/fs/xfs/debug/always_cow
> mkfs.xfs -f -mreflink=1 /dev/ram0
> mount /dev/ram0 testdir
> fio -filename=testdir/1 -size=1G -ioengine=psync -bs=4k -rw=randwrite -overwrite=1 -direct=1 -end_fsync=1 -name=job1
> 
> Test results as below:
> Vanilla:
> (1)   iops        : min= 7986, max=16434, avg=12505.76, stdev=2400.05, samples=41
> (2)   iops        : min= 7636, max=16376, avg=12474.19, stdev=2258.18, samples=42
> (3)   iops        : min= 8346, max=16439, avg=12227.95, stdev=2432.12, samples=42
> (4)   iops        : min= 8580, max=16496, avg=12779.41, stdev=2297.42, samples=41
> (5)   iops        : min= 8286, max=16556, avg=12500.76, stdev=2123.90, samples=41
> 
> Patched:
> (1)   iops        : min= 7086, max=17132, avg=12931.20, stdev=2729.10, samples=40
> (2)   iops        : min= 7704, max=17508, avg=13204.62, stdev=2507.70, samples=39
> (3)   iops        : min= 8736, max=17634, avg=13253.08, stdev=2545.18, samples=39
> (4)   iops        : min= 7188, max=17550, avg=12928.40, stdev=2633.64, samples=40
> (5)   iops        : min= 8268, max=17446, avg=12837.55, stdev=2717.98, samples=40
> 
> xfstests seems survived. Comments are much welcomed and
> thanks for your time!
> 
> Thanks,
> Gao Xiang
> 
> Changes since v1:
>  - fix missing tmp_logflags initialization;
>  - drop unnecessary realtime inode check pointed out by Darrick.
> 
> Gao Xiang (3):
>   xfs: get rid of LEFT, RIGHT, PREV in
>     xfs_bmap_add_extent_unwritten_real()
>   xfs: introduce xfs_bmap_update_extent_real()
>   xfs: introduce xfs_bremapi_from_cowfork()
> 
>  fs/xfs/libxfs/xfs_bmap.c | 377 +++++++++++++++++++++++++--------------
>  fs/xfs/libxfs/xfs_bmap.h |   7 +-
>  fs/xfs/xfs_reflink.c     |  24 +--
>  3 files changed, 252 insertions(+), 156 deletions(-)
> 
> -- 
> 2.24.4
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/3] xfs: some end COW remapping optimization
  2022-03-21 22:21 ` [PATCH v2 0/3] xfs: some end COW remapping optimization Darrick J. Wong
@ 2022-03-22  2:57   ` Gao Xiang
  0 siblings, 0 replies; 7+ messages in thread
From: Gao Xiang @ 2022-03-22  2:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs, LKML

Hi Darrick,

On Mon, Mar 21, 2022 at 03:21:25PM -0700, Darrick J. Wong wrote:
> On Wed, Feb 16, 2022 at 11:08:51AM +0800, Gao Xiang wrote:
> > Hi folks,
> > 
> > Currently, xfs_reflink_end_cow_extent() will unconditionally unmap an
> > extent from DATA fork and then remap an extent from COW fork. It seems
> > somewhat ineffective since for many cases we could update real bmbt
> > records directly by sightly enhancing old
> > xfs_bmap_add_extent_unwritten_real() implementation, thus reduce some
> > measurable extra metadata overhead.
> 
> Does it work with rmap enabled?

Yeah, I've tested with rmap enabled. But for now, I have to prioritize
`erofs over fscache' stuff since it's really needed for cloud vendors and
cloud-native ecosystem to achieve high-dense and proformance image
solution for runC and Kata containers. And this patchset still left
some work to do (since our tester found some cases could cause
performance degration but I haven't checked/confirm it yet.)

> Reading between the lines, I'm guessing the performance boost might
> come from avoiding a transaction roll and (possibly) reducing the need
> to log bmbt updates?  Particularly in the worst case where we split the
> bmbt only to rejoin the blocks immediately after.
> 
> Recently, Dave and Allison have been pondering making an addition to the
> deferred log item code so that we could ->finish_item the first defer op
> in the same transaction that logs the caller's deferred operations.
> Might that get you most of the speed advantage that you're seeking?

Ok, I will recheck later after my container stuff are almost done..

> 
> > It's important to us since, actually, we're planing to use a modified
> > alway-cow like atomic write approach internally for database
> > applications, therefore it'd be nice to do some optimization over
> > simple end COW approach. Also I think it's still generic and can
> > benefit other reflink use cases as well.
> 
> Hmm, that sounds /awfully/ similar to what sqlite does with f2fs' atomic
> write ioctls.
> 
> Alternately, would this[1] feature that's been sitting around in
> djwong-dev since late 2019 help?
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=atomic-file-updates
> 

The problem is that, for example, mysql directly uses direct I/O to
write data, and we don't want to touch applications (since there are
still old e.g. mysql versions in production.) so a transparent COW
atomic write would be much helpful for products (yeah, I admit it's a
bit tricky, but until applications choose to use swapext, we still
need some way that we don't need to touch database apps).

Thanks for the reply and the time!

Thanks,
Gao Xiang 

> --D
> 
> > 
> > I did some tests with ramdisk in order to measure metadata overhead:
> > 
> > echo 1 > /sys/fs/xfs/debug/always_cow
> > mkfs.xfs -f -mreflink=1 /dev/ram0
> > mount /dev/ram0 testdir
> > fio -filename=testdir/1 -size=1G -ioengine=psync -bs=4k -rw=randwrite -overwrite=1 -direct=1 -end_fsync=1 -name=job1
> > 
> > Test results as below:
> > Vanilla:
> > (1)   iops        : min= 7986, max=16434, avg=12505.76, stdev=2400.05, samples=41
> > (2)   iops        : min= 7636, max=16376, avg=12474.19, stdev=2258.18, samples=42
> > (3)   iops        : min= 8346, max=16439, avg=12227.95, stdev=2432.12, samples=42
> > (4)   iops        : min= 8580, max=16496, avg=12779.41, stdev=2297.42, samples=41
> > (5)   iops        : min= 8286, max=16556, avg=12500.76, stdev=2123.90, samples=41
> > 
> > Patched:
> > (1)   iops        : min= 7086, max=17132, avg=12931.20, stdev=2729.10, samples=40
> > (2)   iops        : min= 7704, max=17508, avg=13204.62, stdev=2507.70, samples=39
> > (3)   iops        : min= 8736, max=17634, avg=13253.08, stdev=2545.18, samples=39
> > (4)   iops        : min= 7188, max=17550, avg=12928.40, stdev=2633.64, samples=40
> > (5)   iops        : min= 8268, max=17446, avg=12837.55, stdev=2717.98, samples=40
> > 
> > xfstests seems survived. Comments are much welcomed and
> > thanks for your time!
> > 
> > Thanks,
> > Gao Xiang
> > 
> > Changes since v1:
> >  - fix missing tmp_logflags initialization;
> >  - drop unnecessary realtime inode check pointed out by Darrick.
> > 
> > Gao Xiang (3):
> >   xfs: get rid of LEFT, RIGHT, PREV in
> >     xfs_bmap_add_extent_unwritten_real()
> >   xfs: introduce xfs_bmap_update_extent_real()
> >   xfs: introduce xfs_bremapi_from_cowfork()
> > 
> >  fs/xfs/libxfs/xfs_bmap.c | 377 +++++++++++++++++++++++++--------------
> >  fs/xfs/libxfs/xfs_bmap.h |   7 +-
> >  fs/xfs/xfs_reflink.c     |  24 +--
> >  3 files changed, 252 insertions(+), 156 deletions(-)
> > 
> > -- 
> > 2.24.4
> > 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-03-22  2:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-16  3:08 [PATCH v2 0/3] xfs: some end COW remapping optimization Gao Xiang
2022-02-16  3:08 ` [PATCH v2 1/3] xfs: get rid of LEFT, RIGHT, PREV in xfs_bmap_add_extent_unwritten_real() Gao Xiang
2022-02-16  3:08 ` [PATCH v2 2/3] xfs: introduce xfs_bmap_update_extent_real() Gao Xiang
2022-02-16  3:08 ` [PATCH v2 3/3] xfs: introduce xfs_bremapi_from_cowfork() Gao Xiang
2022-02-17 12:16   ` [PATCH v3 " Gao Xiang
2022-03-21 22:21 ` [PATCH v2 0/3] xfs: some end COW remapping optimization Darrick J. Wong
2022-03-22  2:57   ` Gao Xiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).