All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support
@ 2015-12-19  8:56 Darrick J. Wong
  2015-12-19  8:56 ` [PATCH 01/76] libxfs: make xfs_alloc_fix_freelist non-static Darrick J. Wong
                   ` (76 more replies)
  0 siblings, 77 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:56 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Hi all,

This is the fourth revision of an RFC for adding to XFS kernel support
for tracking reverse-mappings of physical blocks to file and metadata;
and support for mapping multiple file logical blocks to the same
physical block, more commonly known as reflinking.  Given the
significant amount of re-engineering required to make the initial rmap
implementation compatible with reflink, I decided to publish both
features as an integrated patchset off of upstream.  This means that
rmap and reflink are now compatible with each other.

Since the previous RFC, I've integrated Anna Schumaker and Christoph
Hellwig's patches to hoist the reflink ioctls into the VFS, and done
the same to the dedupe ioctl.  These patches were posted separately to
a wider list, and are a build requirement for this patch set.

Dave Chinner's initial rmap implementation featured a simple b+tree
containing (_physical_block_, blockcount, owner) records and enough
code to stuff the rmap btree (rmapbt) whenever a block was allocated
or freed.  However, a generic reflink implementation requires the
ability to map a block to any logical block offset in any file.
Therefore it is necessary to expand the rmapbt record definition to be
(_physical block_, _owner_, _offset_, blockcount) to maintain uniquely
identifiable records.  The upper two bits of the offset field are used
to flag attr fork records and bmbt block records, respectively.  The
highest bit of the blockcount is used to indicate an unwritten extent.
It is intended that in the future the rmapbt will some day be used to
reconstruct a corrupt block map btree (bmbt).

The reflink implementation features a simple b+tree containing
(_physical block_, blockcount, refcount) records to track the
reference counts of extents of physical blocks.  There's also support
code to provide the desired copy-on-write behavior and the userland
interfaces to reflink, query the status of, and a new fallocate mode
to un-reflink parts of files.

For single-owner blocks (i.e. metadata) the rmapbt records are still
managed at alloc/free time.  To enable reflink and rmap at the same
time, however, it becomes necessary to manage rmapbt records for file
extents at map/unmap time.  In the current implementation, file extent
records exactly mirror bmbt contents.  It should be easy to merge
file extent rmaps on non-reflink filesystems, but that is not yet
written.  In theory merging can happen for file extent rmaps on
reflink filesystems too, but that could involve a lot of searching
through the tree since records are not indexed on the last physical
block of the extent.

The ioctl interface to XFS reflink looks surprisingly like the btrfs
ioctl interface -- you can reflink a file, reflink subranges
of a file, or dedupe subranges of files.  To un-reflink a file, I'm
proposing a new fallocate flag which will (try to) fork all shared
blocks within a certain file range.  xfs_fsr is a better candidate
for de-reflinking a file since it also defragments the file; the
extent swap ioctl has also been upgraded (crappily) to support
updating the rmapbt as needed.

The patch set is based on the current (4.4-rc5) upstream kernel.
There are plenty of bugs in this code.  There are too many patches to
discuss individually, but they are grouped by subject area:

1. Cleanups
2. rmapbt support
3. Re-engineering rmapbt to support reflink
4. refcntbt support
5. Implement the data block sharing pieces of reflink
6. Reflink/dedupe control for userspace

Fixed since RFCv3:

 * The reflink and dedupe ioctls are being hoisted to the VFS, as
   provided in the first few patches.  Patch 81 connects to this
   functionality.

 * Copy on write has been rewritten for v4.  We now use the existing
   delayed allocation mechanism to coalesce writes together, deferring
   allocation until writeout time.  This enables CoW to make better
   block placement decisions and significantly reduces overhead.
   CoW is still pretty slow, but not as slow as before.

 * Direct IO CoW has been implemented using the same mechanism as
   above, but modified to perform the allocation and remapping right
   then and there.  Throughput is much higher than pushing data
   through the page cache CoW.  (It's the same mechanism, but we're
   playing with chunks bigger than a single memory page.)

 * CoW ENOSPC works correctly now, except in the pathological case
   that the AG fills up and the rmap btree cannot expand.  That will
   be addressed for v5.

 * fallocate will now unshare blocks to prevent future ENOSPC, as
   you'd expect.

 * refcount btree blocks are preallocated at mount time to prevent
   ENOSPC while trying to expand the tree.  This also has the effect
   of grouping the btree blocks together, which can speed up CoW
   remapping.

Issues: 

 * The extent swapping ioctl still allocates a bigger fixed-size
   transaction.  That's most likely a stupid thing to do, so getting a
   better grip on how the journalling code works and auditing all the
   new transaction users will have to happen.  Right now it mostly
   gets lucky.

 * EFI tracking for the allocated-but-not-yet-mapped blocks is
   nonexistant.  A crash will leak them.

 * ENOSPC while expanding the rmap btree can crash the FS.  For now we
   work around this problem by making the AGFL as big as possible,
   failing CoW attempts with ENOSPC if there aren't enough AGFL blocks
   available, and hoping that doesn't actually happen.

If you're going to start using this mess, you probably ought to just
pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
There are also updates for xfs-docs[4] and man-pages[5].

The patches have been xfstested with x64, i386, and ppc64; while in
general the tests run to completion, there are still periodic bugs
that will be addressed by the next RFC.  There's a persistent crash on
arm64 and ppc64el that I haven't been able to triage.

This is an extraordinary way to eat your data.  Enjoy! 
Comments and questions are, as always, welcome.

--D

[1] https://github.com/djwong/linux/tree/for-dave
[2] https://github.com/djwong/xfsprogs/tree/for-dave
[3] https://github.com/djwong/xfstests/tree/for-dave
[4] https://github.com/djwong/xfs-documentation/tree/for-dave
[5] https://github.com/djwong/man-pages/commits/for-mtk

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH 01/76] libxfs: make xfs_alloc_fix_freelist non-static
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
@ 2015-12-19  8:56 ` Darrick J. Wong
  2015-12-19  8:56 ` [PATCH 02/76] xfs: fix log ticket type printing Darrick J. Wong
                   ` (75 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:56 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Since xfs_repair wants to use xfs_alloc_fix_freelist, remove the
static designation.  xfsprogs already has this; this simply brings
the kernel up to date.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c |    2 +-
 fs/xfs/libxfs/xfs_alloc.h |    1 +
 2 files changed, 2 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 3479294..89c4775 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1926,7 +1926,7 @@ xfs_alloc_space_available(
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
  */
-STATIC int			/* error */
+int			/* error */
 xfs_alloc_fix_freelist(
 	struct xfs_alloc_arg	*args,	/* allocation argument structure */
 	int			flags)	/* XFS_ALLOC_FLAG_... */
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 0ecde4d..135eb3d 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -235,5 +235,6 @@ xfs_alloc_get_rec(
 
 int xfs_read_agf(struct xfs_mount *mp, struct xfs_trans *tp,
 			xfs_agnumber_t agno, int flags, struct xfs_buf **bpp);
+int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, int flags);
 
 #endif	/* __XFS_ALLOC_H__ */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 02/76] xfs: fix log ticket type printing
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
  2015-12-19  8:56 ` [PATCH 01/76] libxfs: make xfs_alloc_fix_freelist non-static Darrick J. Wong
@ 2015-12-19  8:56 ` Darrick J. Wong
  2016-01-03 12:13   ` Christoph Hellwig
  2015-12-19  8:56 ` [PATCH 03/76] libxfs: refactor the btree size calculator code Darrick J. Wong
                   ` (74 subsequent siblings)
  76 siblings, 1 reply; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:56 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Update the log ticket reservation type printing code to reflect
all the types of log tickets, to avoid incorrect debug output and
avoid running off the end of the array.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_log.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index f52c72a..2aa187e 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -2045,12 +2045,14 @@ xlog_print_tic_res(
 	    "QM_DQCLUSTER",
 	    "QM_QINOCREATE",
 	    "QM_QUOTAOFF_END",
-	    "SB_UNIT",
 	    "FSYNC_TS",
 	    "GROWFSRT_ALLOC",
 	    "GROWFSRT_ZERO",
 	    "GROWFSRT_FREE",
-	    "SWAPEXT"
+	    "SWAPEXT",
+	    "CHECKPOINT",
+	    "ICREATE",
+	    "CREATE_TMPFILE"
 	};
 
 	xfs_warn(mp, "xlog_write: reservation summary:");

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 03/76] libxfs: refactor the btree size calculator code
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
  2015-12-19  8:56 ` [PATCH 01/76] libxfs: make xfs_alloc_fix_freelist non-static Darrick J. Wong
  2015-12-19  8:56 ` [PATCH 02/76] xfs: fix log ticket type printing Darrick J. Wong
@ 2015-12-19  8:56 ` Darrick J. Wong
  2015-12-20 20:39   ` Dave Chinner
  2015-12-19  8:56 ` [PATCH 04/76] libxfs: use a convenience variable instead of open-coding the fork Darrick J. Wong
                   ` (73 subsequent siblings)
  76 siblings, 1 reply; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:56 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a macro to generate btree height calculator functions.
This will be used (much) later when we get to the refcount
btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c       |   18 +-----------------
 fs/xfs/libxfs/xfs_bmap_btree.c |    2 ++
 fs/xfs/libxfs/xfs_bmap_btree.h |    2 ++
 fs/xfs/libxfs/xfs_inode_fork.c |    1 +
 fs/xfs/libxfs/xfs_shared.h     |   29 +++++++++++++++++++++++++++++
 5 files changed, 35 insertions(+), 17 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 119c242..2386377 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -180,25 +180,9 @@ xfs_bmap_worst_indlen(
 	xfs_inode_t	*ip,		/* incore inode pointer */
 	xfs_filblks_t	len)		/* delayed extent length */
 {
-	int		level;		/* btree level number */
-	int		maxrecs;	/* maximum record count at this level */
-	xfs_mount_t	*mp;		/* mount structure */
 	xfs_filblks_t	rval;		/* return value */
 
-	mp = ip->i_mount;
-	maxrecs = mp->m_bmap_dmxr[0];
-	for (level = 0, rval = 0;
-	     level < XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK);
-	     level++) {
-		len += maxrecs - 1;
-		do_div(len, maxrecs);
-		rval += len;
-		if (len == 1)
-			return rval + XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) -
-				level - 1;
-		if (level == 0)
-			maxrecs = mp->m_bmap_dmxr[1];
-	}
+	rval = xfs_bmbt_calc_btree_size(ip->i_mount, len);
 	return rval;
 }
 
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 6b0cf65..3d947d6 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -882,3 +882,5 @@ xfs_bmbt_change_owner(
 	xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
 	return error;
 }
+
+DEFINE_BTREE_SIZE_FN(bmbt, m_bmap_dmxr, XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK))
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h
index 819a8a4..7165a1b0 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.h
+++ b/fs/xfs/libxfs/xfs_bmap_btree.h
@@ -140,4 +140,6 @@ extern int xfs_bmbt_change_owner(struct xfs_trans *tp, struct xfs_inode *ip,
 extern struct xfs_btree_cur *xfs_bmbt_init_cursor(struct xfs_mount *,
 		struct xfs_trans *, struct xfs_inode *, int);
 
+DECLARE_BTREE_SIZE_FN(bmbt);
+
 #endif	/* __XFS_BMAP_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 0defbd0..46ea657 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -19,6 +19,7 @@
 
 #include "xfs.h"
 #include "xfs_fs.h"
+#include "xfs_shared.h"
 #include "xfs_format.h"
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 5be5297..544d0e9 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -234,4 +234,33 @@ bool xfs_symlink_hdr_ok(xfs_ino_t ino, uint32_t offset,
 void xfs_symlink_local_to_remote(struct xfs_trans *tp, struct xfs_buf *bp,
 				 struct xfs_inode *ip, struct xfs_ifork *ifp);
 
+/* btree size calculator templates */
+#define DECLARE_BTREE_SIZE_FN(btree) \
+xfs_filblks_t xfs_##btree##_calc_btree_size(struct xfs_mount *mp, \
+		unsigned long len);
+
+#define DEFINE_BTREE_SIZE_FN(btree, limitfield, maxlevels) \
+xfs_filblks_t \
+xfs_##btree##_calc_btree_size( \
+	struct xfs_mount	*mp, \
+	unsigned long		len) \
+{ \
+	int			level; \
+	int			maxrecs; \
+	xfs_filblks_t		rval; \
+\
+	maxrecs = mp->limitfield[0]; \
+	for (level = 0, rval = 0; level < maxlevels; level++) { \
+		len += maxrecs - 1; \
+		do_div(len, maxrecs); \
+		rval += len; \
+		if (len == 1) \
+			return rval + maxlevels - \
+				level - 1; \
+		if (level == 0) \
+			maxrecs = mp->limitfield[1]; \
+	} \
+	return rval; \
+}
+
 #endif /* __XFS_SHARED_H__ */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 04/76] libxfs: use a convenience variable instead of open-coding the fork
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (2 preceding siblings ...)
  2015-12-19  8:56 ` [PATCH 03/76] libxfs: refactor the btree size calculator code Darrick J. Wong
@ 2015-12-19  8:56 ` Darrick J. Wong
  2015-12-19  8:56 ` [PATCH 05/76] libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct Darrick J. Wong
                   ` (72 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:56 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Use a convenience variable instead of open-coding the inode fork.
This isn't really needed for now, but will become important when we
add the copy-on-write fork later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 2386377..817c994 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1708,9 +1708,10 @@ xfs_bmap_add_extent_delay_real(
 	xfs_filblks_t		temp2=0;/* value for da_new calculations */
 	int			tmp_rval;	/* partial logging flags */
 	struct xfs_mount	*mp;
+	int			whichfork = XFS_DATA_FORK;
 
 	mp  = bma->tp ? bma->tp->t_mountp : NULL;
-	ifp = XFS_IFORK_PTR(bma->ip, XFS_DATA_FORK);
+	ifp = XFS_IFORK_PTR(bma->ip, whichfork);
 
 	ASSERT(bma->idx >= 0);
 	ASSERT(bma->idx <= ifp->if_bytes / sizeof(struct xfs_bmbt_rec));
@@ -1769,7 +1770,7 @@ xfs_bmap_add_extent_delay_real(
 	 * Don't set contiguous if the combined extent would be too large.
 	 * Also check for all-three-contiguous being too large.
 	 */
-	if (bma->idx < bma->ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t) - 1) {
+	if (bma->idx < ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) - 1) {
 		state |= BMAP_RIGHT_VALID;
 		xfs_bmbt_get_all(xfs_iext_get_ext(ifp, bma->idx + 1), &RIGHT);
 
@@ -2000,10 +2001,10 @@ xfs_bmap_add_extent_delay_real(
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
 
-		if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
+		if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
 					bma->firstblock, bma->flist,
-					&bma->cur, 1, &tmp_rval, XFS_DATA_FORK);
+					&bma->cur, 1, &tmp_rval, whichfork);
 			rval |= tmp_rval;
 			if (error)
 				goto done;
@@ -2084,10 +2085,10 @@ xfs_bmap_add_extent_delay_real(
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
 
-		if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
+		if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
 				bma->firstblock, bma->flist, &bma->cur, 1,
-				&tmp_rval, XFS_DATA_FORK);
+				&tmp_rval, whichfork);
 			rval |= tmp_rval;
 			if (error)
 				goto done;
@@ -2153,10 +2154,10 @@ xfs_bmap_add_extent_delay_real(
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
 
-		if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
+		if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
 					bma->firstblock, bma->flist, &bma->cur,
-					1, &tmp_rval, XFS_DATA_FORK);
+					1, &tmp_rval, whichfork);
 			rval |= tmp_rval;
 			if (error)
 				goto done;
@@ -2199,13 +2200,13 @@ xfs_bmap_add_extent_delay_real(
 	}
 
 	/* convert to a btree if necessary */
-	if (xfs_bmap_needs_btree(bma->ip, XFS_DATA_FORK)) {
+	if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 		int	tmp_logflags;	/* partial log flag return val */
 
 		ASSERT(bma->cur == NULL);
 		error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
 				bma->firstblock, bma->flist, &bma->cur,
-				da_old > 0, &tmp_logflags, XFS_DATA_FORK);
+				da_old > 0, &tmp_logflags, whichfork);
 		bma->logflags |= tmp_logflags;
 		if (error)
 			goto done;
@@ -2226,7 +2227,7 @@ xfs_bmap_add_extent_delay_real(
 	if (bma->cur)
 		bma->cur->bc_private.b.allocated = 0;
 
-	xfs_bmap_check_leaf_extents(bma->cur, bma->ip, XFS_DATA_FORK);
+	xfs_bmap_check_leaf_extents(bma->cur, bma->ip, whichfork);
 done:
 	bma->logflags |= rval;
 	return error;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 05/76] libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (3 preceding siblings ...)
  2015-12-19  8:56 ` [PATCH 04/76] libxfs: use a convenience variable instead of open-coding the fork Darrick J. Wong
@ 2015-12-19  8:56 ` Darrick J. Wong
  2016-01-03 12:15   ` Christoph Hellwig
  2015-12-19  8:57 ` [PATCH 06/76] xfs: introduce rmap btree definitions Darrick J. Wong
                   ` (71 subsequent siblings)
  76 siblings, 1 reply; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:56 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
inside it, gcc will quietly round the structure size up to the nearest
64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
macro returning incorrect results for v5 filesystems on 64-bit
machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
will see garbage in AGFL item 119 and complain.

Therefore, tell gcc not to pad the structure so that the AGFL size
calculation is correct.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 8774498..e2536bb 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -786,7 +786,7 @@ typedef struct xfs_agfl {
 	__be64		agfl_lsn;
 	__be32		agfl_crc;
 	__be32		agfl_bno[];	/* actually XFS_AGFL_SIZE(mp) */
-} xfs_agfl_t;
+} __attribute__((packed)) xfs_agfl_t;
 
 #define XFS_AGFL_CRC_OFF	offsetof(struct xfs_agfl, agfl_crc)
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 06/76] xfs: introduce rmap btree definitions
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (4 preceding siblings ...)
  2015-12-19  8:56 ` [PATCH 05/76] libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct Darrick J. Wong
@ 2015-12-19  8:57 ` Darrick J. Wong
  2015-12-19  8:57 ` [PATCH 07/76] xfs: add rmap btree stats infrastructure Darrick J. Wong
                   ` (70 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:57 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit inthe empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |    6 ++++++
 fs/xfs/libxfs/xfs_btree.c  |    4 ++--
 fs/xfs/libxfs/xfs_btree.h  |    3 +++
 fs/xfs/libxfs/xfs_format.h |   22 +++++++++++++++++-----
 fs/xfs/libxfs/xfs_types.h  |    4 ++--
 5 files changed, 30 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 89c4775..05c87a2 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2281,6 +2281,10 @@ xfs_agf_verify(
 	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]) > XFS_BTREE_MAXLEVELS)
 		return false;
 
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb) &&
+	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) > XFS_BTREE_MAXLEVELS)
+		return false;
+
 	/*
 	 * during growfs operations, the perag is not fully initialised,
 	 * so we can't use it for any useful checking. growfs ensures we can't
@@ -2411,6 +2415,8 @@ xfs_alloc_read_agf(
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
 		pag->pagf_levels[XFS_BTNUM_CNTi] =
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+		pag->pagf_levels[XFS_BTNUM_RMAPi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
 		spin_lock_init(&pag->pagb_lock);
 		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index af1bbee..4ff6b80 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -43,9 +43,9 @@ kmem_zone_t	*xfs_btree_cur_zone;
  * Btree magic numbers.
  */
 static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
-	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
+	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
 	  XFS_FIBT_MAGIC },
-	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC,
+	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
 	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
 };
 #define xfs_btree_magic(cur) \
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 992dec0..4942c4a 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -63,6 +63,7 @@ union xfs_btree_rec {
 #define	XFS_BTNUM_BMAP	((xfs_btnum_t)XFS_BTNUM_BMAPi)
 #define	XFS_BTNUM_INO	((xfs_btnum_t)XFS_BTNUM_INOi)
 #define	XFS_BTNUM_FINO	((xfs_btnum_t)XFS_BTNUM_FINOi)
+#define	XFS_BTNUM_RMAP	((xfs_btnum_t)XFS_BTNUM_RMAPi)
 
 /*
  * For logging record fields.
@@ -95,6 +96,7 @@ do {    \
 	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(__mp, bmbt, stat); break; \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
+	case XFS_BTNUM_RMAP: break;	\
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
 	}       \
 } while (0)
@@ -115,6 +117,7 @@ do {    \
 		__XFS_BTREE_STATS_ADD(__mp, ibt, stat, val); break; \
 	case XFS_BTNUM_FINO:	\
 		__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
+	case XFS_BTNUM_RMAP: break; \
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
 	}       \
 } while (0)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index e2536bb..5c2703d 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -455,6 +455,7 @@ xfs_sb_has_compat_feature(
 }
 
 #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
+#define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
@@ -526,6 +527,12 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
 		xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
 }
 
+static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
+{
+	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
+}
+
 /*
  * XFS_SB_FEAT_INCOMPAT_META_UUID indicates that the metadata UUID
  * is stored separately from the user-visible UUID; this allows the
@@ -598,10 +605,10 @@ xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino)
 #define	XFS_AGI_GOOD_VERSION(v)	((v) == XFS_AGI_VERSION)
 
 /*
- * Btree number 0 is bno, 1 is cnt.  This value gives the size of the
+ * Btree number 0 is bno, 1 is cnt, 2 is rmap. This value gives the size of the
  * arrays below.
  */
-#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_CNTi + 1)
+#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_RMAPi + 1)
 
 /*
  * The second word of agf_levels in the first a.g. overlaps the EFS
@@ -618,12 +625,10 @@ typedef struct xfs_agf {
 	__be32		agf_seqno;	/* sequence # starting from 0 */
 	__be32		agf_length;	/* size in blocks of a.g. */
 	/*
-	 * Freespace information
+	 * Freespace and rmap information
 	 */
 	__be32		agf_roots[XFS_BTNUM_AGF];	/* root blocks */
-	__be32		agf_spare0;	/* spare field */
 	__be32		agf_levels[XFS_BTNUM_AGF];	/* btree levels */
-	__be32		agf_spare1;	/* spare field */
 
 	__be32		agf_flfirst;	/* first freelist block's index */
 	__be32		agf_fllast;	/* last freelist block's index */
@@ -1301,6 +1306,13 @@ typedef __be32 xfs_inobt_ptr_t;
 #define	XFS_FIBT_BLOCK(mp)		((xfs_agblock_t)(XFS_IBT_BLOCK(mp) + 1))
 
 /*
+ * Reverse mapping btree format definitions
+ *
+ * There is a btree for the reverse map per allocation group
+ */
+#define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
+
+/*
  * The first data block of an AG depends on whether the filesystem was formatted
  * with the finobt feature. If so, account for the finobt reserved root btree
  * block.
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index b79dc66..3d50364 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -108,8 +108,8 @@ typedef enum {
 } xfs_lookup_t;
 
 typedef enum {
-	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi,
-	XFS_BTNUM_FINOi, XFS_BTNUM_MAX
+	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
+	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
 } xfs_btnum_t;
 
 struct xfs_name {

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 07/76] xfs: add rmap btree stats infrastructure
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (5 preceding siblings ...)
  2015-12-19  8:57 ` [PATCH 06/76] xfs: introduce rmap btree definitions Darrick J. Wong
@ 2015-12-19  8:57 ` Darrick J. Wong
  2015-12-19  8:57 ` [PATCH 08/76] xfs: rmap btree add more reserved blocks Darrick J. Wong
                   ` (69 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:57 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

The rmap btree will require the same stats as all the other generic
btrees, so add al the code for that now.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_btree.h |    7 ++++---
 fs/xfs/xfs_stats.c        |    1 +
 fs/xfs/xfs_stats.h        |   18 +++++++++++++++++-
 3 files changed, 22 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 4942c4a..0d0e546 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -96,8 +96,8 @@ do {    \
 	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(__mp, bmbt, stat); break; \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
-	case XFS_BTNUM_RMAP: break;	\
-	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
+	case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(__mp, rmap, stat); break; \
+	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
 	}       \
 } while (0)
 
@@ -117,7 +117,8 @@ do {    \
 		__XFS_BTREE_STATS_ADD(__mp, ibt, stat, val); break; \
 	case XFS_BTNUM_FINO:	\
 		__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
-	case XFS_BTNUM_RMAP: break; \
+	case XFS_BTNUM_RMAP:	\
+		__XFS_BTREE_STATS_ADD(__mp, rmap, stat, val); break; \
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
 	}       \
 } while (0)
diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
index 8686df6..f04f547 100644
--- a/fs/xfs/xfs_stats.c
+++ b/fs/xfs/xfs_stats.c
@@ -61,6 +61,7 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
 		{ "bmbt2",		XFSSTAT_END_BMBT_V2		},
 		{ "ibt2",		XFSSTAT_END_IBT_V2		},
 		{ "fibt2",		XFSSTAT_END_FIBT_V2		},
+		{ "rmapbt",		XFSSTAT_END_RMAP_V2		},
 		/* we print both series of quota information together */
 		{ "qm",			XFSSTAT_END_QM			},
 	};
diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h
index 483b0ef..657865f 100644
--- a/fs/xfs/xfs_stats.h
+++ b/fs/xfs/xfs_stats.h
@@ -197,7 +197,23 @@ struct xfsstats {
 	__uint32_t		xs_fibt_2_alloc;
 	__uint32_t		xs_fibt_2_free;
 	__uint32_t		xs_fibt_2_moves;
-#define XFSSTAT_END_XQMSTAT		(XFSSTAT_END_FIBT_V2+6)
+#define XFSSTAT_END_RMAP_V2		(XFSSTAT_END_FIBT_V2+15)
+	__uint32_t		xs_rmap_2_lookup;
+	__uint32_t		xs_rmap_2_compare;
+	__uint32_t		xs_rmap_2_insrec;
+	__uint32_t		xs_rmap_2_delrec;
+	__uint32_t		xs_rmap_2_newroot;
+	__uint32_t		xs_rmap_2_killroot;
+	__uint32_t		xs_rmap_2_increment;
+	__uint32_t		xs_rmap_2_decrement;
+	__uint32_t		xs_rmap_2_lshift;
+	__uint32_t		xs_rmap_2_rshift;
+	__uint32_t		xs_rmap_2_split;
+	__uint32_t		xs_rmap_2_join;
+	__uint32_t		xs_rmap_2_alloc;
+	__uint32_t		xs_rmap_2_free;
+	__uint32_t		xs_rmap_2_moves;
+#define XFSSTAT_END_XQMSTAT		(XFSSTAT_END_RMAP_V2+6)
 	__uint32_t		xs_qm_dqreclaims;
 	__uint32_t		xs_qm_dqreclaim_misses;
 	__uint32_t		xs_qm_dquot_dups;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 08/76] xfs: rmap btree add more reserved blocks
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (6 preceding siblings ...)
  2015-12-19  8:57 ` [PATCH 07/76] xfs: add rmap btree stats infrastructure Darrick J. Wong
@ 2015-12-19  8:57 ` Darrick J. Wong
  2015-12-19  8:57 ` [PATCH 09/76] xfs: add owner field to extent allocation and freeing Darrick J. Wong
                   ` (68 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:57 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

XFS reserves a small amount of space in each AG for the minimum
number of free blocks needed for operation. Adding the rmap btree
increases the number of reserved blocks, but it also increases the
complexity of the calculation as the free inode btree is optional
(like the rmbt).

Rather than calculate the prealloc blocks every time we need to
check it, add a function to calculate it at mount time and store it
in the struct xfs_mount, and convert the XFS_PREALLOC_BLOCKS macro
just to use the xfs-mount variable directly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |   11 +++++++++++
 fs/xfs/libxfs/xfs_alloc.h  |    2 ++
 fs/xfs/libxfs/xfs_format.h |    9 +--------
 fs/xfs/xfs_fsops.c         |    6 +++---
 fs/xfs/xfs_mount.c         |    2 ++
 fs/xfs/xfs_mount.h         |    1 +
 6 files changed, 20 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 05c87a2..24032ef 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -49,6 +49,17 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
 
+xfs_extlen_t
+xfs_prealloc_blocks(
+	struct xfs_mount	*mp)
+{
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return XFS_RMAP_BLOCK(mp) + 1;
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		return XFS_FIBT_BLOCK(mp) + 1;
+	return XFS_IBT_BLOCK(mp) + 1;
+}
+
 /*
  * Lookup the record equal to [bno, len] in the btree given by cur.
  */
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 135eb3d..d260916 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -237,4 +237,6 @@ int xfs_read_agf(struct xfs_mount *mp, struct xfs_trans *tp,
 			xfs_agnumber_t agno, int flags, struct xfs_buf **bpp);
 int xfs_alloc_fix_freelist(struct xfs_alloc_arg *args, int flags);
 
+xfs_extlen_t xfs_prealloc_blocks(struct xfs_mount *mp);
+
 #endif	/* __XFS_ALLOC_H__ */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 5c2703d..28635ea 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1312,18 +1312,11 @@ typedef __be32 xfs_inobt_ptr_t;
  */
 #define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
 
-/*
- * The first data block of an AG depends on whether the filesystem was formatted
- * with the finobt feature. If so, account for the finobt reserved root btree
- * block.
- */
-#define XFS_PREALLOC_BLOCKS(mp) \
+#define	XFS_RMAP_BLOCK(mp) \
 	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
 	 XFS_FIBT_BLOCK(mp) + 1 : \
 	 XFS_IBT_BLOCK(mp) + 1)
 
-
-
 /*
  * BMAP Btree format definitions
  *
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index ee3aaa0a..32e24ec 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -246,7 +246,7 @@ xfs_growfs_data_private(
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
 		agf->agf_flcount = 0;
-		tmpsize = agsize - XFS_PREALLOC_BLOCKS(mp);
+		tmpsize = agsize - mp->m_ag_prealloc_blocks;
 		agf->agf_freeblks = cpu_to_be32(tmpsize);
 		agf->agf_longest = cpu_to_be32(tmpsize);
 		if (xfs_sb_version_hascrc(&mp->m_sb))
@@ -343,7 +343,7 @@ xfs_growfs_data_private(
 						agno, 0);
 
 		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
-		arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
+		arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
 		arec->ar_blockcount = cpu_to_be32(
 			agsize - be32_to_cpu(arec->ar_startblock));
 
@@ -372,7 +372,7 @@ xfs_growfs_data_private(
 						agno, 0);
 
 		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
-		arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
+		arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
 		arec->ar_blockcount = cpu_to_be32(
 			agsize - be32_to_cpu(arec->ar_startblock));
 		nfree += be32_to_cpu(arec->ar_blockcount);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index bb753b3..3d4a4fb 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -249,6 +249,8 @@ xfs_initialize_perag(
 
 	if (maxagi)
 		*maxagi = index;
+
+	mp->m_ag_prealloc_blocks = xfs_prealloc_blocks(mp);
 	return 0;
 
 out_unwind:
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b570984..77def45 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -93,6 +93,7 @@ typedef struct xfs_mount {
 	uint			m_ag_maxlevels;	/* XFS_AG_MAXLEVELS */
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* max inobt btree levels. */
+	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
 	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
 	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
 	struct mutex		m_growlock;	/* growfs mutex */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 09/76] xfs: add owner field to extent allocation and freeing
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (7 preceding siblings ...)
  2015-12-19  8:57 ` [PATCH 08/76] xfs: rmap btree add more reserved blocks Darrick J. Wong
@ 2015-12-19  8:57 ` Darrick J. Wong
  2015-12-19  8:57 ` [PATCH 10/76] xfs: add extended " Darrick J. Wong
                   ` (67 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:57 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

For the rmap btree to work, we have to fed the extent owner
information to the the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.

We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.

There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...

While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c        |   11 ++++++++---
 fs/xfs/libxfs/xfs_alloc.h        |    4 +++-
 fs/xfs/libxfs/xfs_bmap.c         |   17 ++++++++++++-----
 fs/xfs/libxfs/xfs_bmap.h         |    5 +++--
 fs/xfs/libxfs/xfs_bmap_btree.c   |    3 ++-
 fs/xfs/libxfs/xfs_format.h       |   16 ++++++++++++++++
 fs/xfs/libxfs/xfs_ialloc.c       |   10 +++++-----
 fs/xfs/libxfs/xfs_ialloc_btree.c |    3 ++-
 fs/xfs/xfs_bmap_util.c           |    3 ++-
 fs/xfs/xfs_fsops.c               |   13 +++++++++----
 fs/xfs/xfs_log_recover.c         |    3 ++-
 fs/xfs/xfs_trans.h               |    2 +-
 fs/xfs/xfs_trans_extfree.c       |    5 +++--
 13 files changed, 68 insertions(+), 27 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 24032ef..e3d25a6 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1594,6 +1594,7 @@ xfs_free_ag_extent(
 	xfs_agnumber_t	agno,	/* allocation group number */
 	xfs_agblock_t	bno,	/* starting block number */
 	xfs_extlen_t	len,	/* length of extent */
+	uint64_t	owner,	/* extent owner */
 	int		isfl)	/* set if is freelist blocks - no sb acctg */
 {
 	xfs_btree_cur_t	*bno_cur;	/* cursor for by-block btree */
@@ -2020,7 +2021,8 @@ xfs_alloc_fix_freelist(
 		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
 		if (error)
 			goto out_agbp_relse;
-		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1);
+		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
+					   XFS_RMAP_OWN_AG, 1);
 		if (error)
 			goto out_agbp_relse;
 		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
@@ -2030,6 +2032,7 @@ xfs_alloc_fix_freelist(
 	memset(&targs, 0, sizeof(targs));
 	targs.tp = tp;
 	targs.mp = mp;
+	targs.owner = XFS_RMAP_OWN_AG;
 	targs.agbp = agbp;
 	targs.agno = args->agno;
 	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
@@ -2682,7 +2685,8 @@ int				/* error */
 xfs_free_extent(
 	xfs_trans_t	*tp,	/* transaction pointer */
 	xfs_fsblock_t	bno,	/* starting block number of extent */
-	xfs_extlen_t	len)	/* length of extent */
+	xfs_extlen_t	len,	/* length of extent */
+	uint64_t	owner)	/* extent owner */
 {
 	xfs_alloc_arg_t	args;
 	int		error;
@@ -2718,7 +2722,8 @@ xfs_free_extent(
 		goto error0;
 	}
 
-	error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
+	error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno,
+				   len, owner, 0);
 	if (!error)
 		xfs_extent_busy_insert(tp, args.agno, args.agbno, len, 0);
 error0:
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index d260916..ee86f44 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -123,6 +123,7 @@ typedef struct xfs_alloc_arg {
 	char		isfl;		/* set if is freelist blocks - !acctg */
 	char		userdata;	/* mask defining userdata treatment */
 	xfs_fsblock_t	firstblock;	/* io first block allocated */
+	uint64_t	owner;		/* owner of blocks being allocated */
 } xfs_alloc_arg_t;
 
 /*
@@ -210,7 +211,8 @@ int				/* error */
 xfs_free_extent(
 	struct xfs_trans *tp,	/* transaction pointer */
 	xfs_fsblock_t	bno,	/* starting block number of extent */
-	xfs_extlen_t	len);	/* length of extent */
+	xfs_extlen_t	len,	/* length of extent */
+	uint64_t	owner);	/* extent owner */
 
 int					/* error */
 xfs_alloc_lookup_le(
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 817c994..6944157 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -551,10 +551,11 @@ xfs_bmap_validate_ret(
  */
 void
 xfs_bmap_add_free(
+	struct xfs_mount	*mp,		/* mount point structure */
+	struct xfs_bmap_free	*flist,		/* list of extents */
 	xfs_fsblock_t		bno,		/* fs block number of extent */
 	xfs_filblks_t		len,		/* length of extent */
-	xfs_bmap_free_t		*flist,		/* list of extents */
-	xfs_mount_t		*mp)		/* mount point structure */
+	uint64_t		owner)		/* extent owner */
 {
 	xfs_bmap_free_item_t	*cur;		/* current (next) element */
 	xfs_bmap_free_item_t	*new;		/* new element */
@@ -575,9 +576,12 @@ xfs_bmap_add_free(
 	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
 #endif
 	ASSERT(xfs_bmap_free_item_zone != NULL);
+	ASSERT(owner);
+
 	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
 	new->xbfi_startblock = bno;
 	new->xbfi_blockcount = (xfs_extlen_t)len;
+	new->xbfi_owner = owner;
 	for (prev = NULL, cur = flist->xbf_first;
 	     cur != NULL;
 	     prev = cur, cur = cur->xbfi_next) {
@@ -680,7 +684,7 @@ xfs_bmap_btree_to_extents(
 	cblock = XFS_BUF_TO_BLOCK(cbp);
 	if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
 		return error;
-	xfs_bmap_add_free(cbno, 1, cur->bc_private.b.flist, mp);
+	xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, ip->i_ino);
 	ip->i_d.di_nblocks--;
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
 	xfs_trans_binval(tp, cbp);
@@ -761,6 +765,7 @@ xfs_bmap_extents_to_btree(
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = mp;
+	args.owner = ip->i_ino;
 	args.firstblock = *firstblock;
 	if (*firstblock == NULLFSBLOCK) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
@@ -907,6 +912,7 @@ xfs_bmap_local_to_extents(
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = ip->i_mount;
+	args.owner = ip->i_ino;
 	args.firstblock = *firstblock;
 	/*
 	 * Allocate a block.  We know we need only one, since the
@@ -3690,6 +3696,7 @@ xfs_bmap_btalloc(
 	memset(&args, 0, sizeof(args));
 	args.tp = ap->tp;
 	args.mp = mp;
+	args.owner = ap->ip->i_ino;
 	args.fsbno = ap->blkno;
 
 	/* Trim the allocation back to the maximum an AG can fit. */
@@ -4995,8 +5002,8 @@ xfs_bmap_del_extent(
 	 * If we need to, add to list of extents to delete.
 	 */
 	if (do_fx)
-		xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist,
-			mp);
+		xfs_bmap_add_free(mp, flist, del->br_startblock,
+				  del->br_blockcount, ip->i_ino);
 	/*
 	 * Adjust inode # blocks in the file.
 	 */
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index a160f8a..1f8eb6e 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -66,6 +66,7 @@ typedef struct xfs_bmap_free_item
 {
 	xfs_fsblock_t		xbfi_startblock;/* starting fs block number */
 	xfs_extlen_t		xbfi_blockcount;/* number of blocks in extent */
+	uint64_t		xbfi_owner;	/* extent owner */
 	struct xfs_bmap_free_item *xbfi_next;	/* link to next entry */
 } xfs_bmap_free_item_t;
 
@@ -191,8 +192,8 @@ void	xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
 
 int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
-void	xfs_bmap_add_free(xfs_fsblock_t bno, xfs_filblks_t len,
-		struct xfs_bmap_free *flist, struct xfs_mount *mp);
+void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
+			  xfs_fsblock_t bno, xfs_filblks_t len, uint64_t owner);
 void	xfs_bmap_cancel(struct xfs_bmap_free *flist);
 int	xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
 			int *committed);
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 3d947d6..45b994b 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -446,6 +446,7 @@ xfs_bmbt_alloc_block(
 	args.mp = cur->bc_mp;
 	args.fsbno = cur->bc_private.b.firstblock;
 	args.firstblock = args.fsbno;
+	args.owner = cur->bc_private.b.ip->i_ino;
 
 	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = be64_to_cpu(start->l);
@@ -526,7 +527,7 @@ xfs_bmbt_free_block(
 	struct xfs_trans	*tp = cur->bc_tp;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
 
-	xfs_bmap_add_free(fsbno, 1, cur->bc_private.b.flist, mp);
+	xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, ip->i_ino);
 	ip->i_d.di_nblocks--;
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 28635ea..cb30f8b 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1312,6 +1312,22 @@ typedef __be32 xfs_inobt_ptr_t;
  */
 #define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
 
+/*
+ * Special owner types.
+ *
+ * Seeing as we only support up to 8EB, we have the upper bit of the owner field
+ * to tell us we have a special owner value. We use these for static metadata
+ * allocated at mkfs/growfs time, as well as for freespace management metadata.
+ */
+#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
+#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* Unknown owner, for EFI recovery */
+#define XFS_RMAP_OWN_FS		(-3ULL)	/* static fs metadata */
+#define XFS_RMAP_OWN_LOG	(-4ULL)	/* static fs metadata */
+#define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
+#define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
+#define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
+#define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
+
 #define	XFS_RMAP_BLOCK(mp) \
 	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
 	 XFS_FIBT_BLOCK(mp) + 1 : \
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 70c1db9..7609f8c 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -614,6 +614,7 @@ xfs_ialloc_ag_alloc(
 	args.tp = tp;
 	args.mp = tp->t_mountp;
 	args.fsbno = NULLFSBLOCK;
+	args.owner = XFS_RMAP_OWN_INODES;
 
 #ifdef DEBUG
 	/* randomly do sparse inode allocations */
@@ -1828,9 +1829,8 @@ xfs_difree_inode_chunk(
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
-		xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno,
-				  XFS_AGINO_TO_AGBNO(mp, rec->ir_startino)),
-				  mp->m_ialloc_blks, flist, mp);
+		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, sagbno),
+				  mp->m_ialloc_blks, XFS_RMAP_OWN_INODES);
 		return;
 	}
 
@@ -1873,8 +1873,8 @@ xfs_difree_inode_chunk(
 
 		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
 		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
-		xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno, agbno), contigblk,
-				  flist, mp);
+		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, agbno),
+				  contigblk, XFS_RMAP_OWN_INODES);
 
 		/* reset range to current bit and carry on... */
 		startidx = endidx = nextbit;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index f39b285..445245f 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -96,6 +96,7 @@ xfs_inobt_alloc_block(
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
+	args.owner = XFS_RMAP_OWN_INOBT;
 	args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.a.agno, sbno);
 	args.minlen = 1;
 	args.maxlen = 1;
@@ -129,7 +130,7 @@ xfs_inobt_free_block(
 	int			error;
 
 	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp));
-	error = xfs_free_extent(cur->bc_tp, fsbno, 1);
+	error = xfs_free_extent(cur->bc_tp, fsbno, 1, XFS_RMAP_OWN_INOBT);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index dbae649..b447b09 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -151,7 +151,8 @@ xfs_bmap_finish(
 		next = free->xbfi_next;
 
 		error = xfs_trans_free_extent(*tp, efd, free->xbfi_startblock,
-					      free->xbfi_blockcount);
+					      free->xbfi_blockcount,
+					      free->xbfi_owner);
 		if (error)
 			return error;
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 32e24ec..5711700 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -466,14 +466,19 @@ xfs_growfs_data_private(
 		       be32_to_cpu(agi->agi_length));
 
 		xfs_alloc_log_agf(tp, bp, XFS_AGF_LENGTH);
+
 		/*
 		 * Free the new space.
+		 *
+		 * XFS_RMAP_OWN_NULL is used here to tell the rmap btree that
+		 * this doesn't actually exist in the rmap btree.
 		 */
-		error = xfs_free_extent(tp, XFS_AGB_TO_FSB(mp, agno,
-			be32_to_cpu(agf->agf_length) - new), new);
-		if (error) {
+		error = xfs_free_extent(tp,
+				XFS_AGB_TO_FSB(mp, agno,
+					be32_to_cpu(agf->agf_length) - new),
+				new, XFS_RMAP_OWN_NULL);
+		if (error)
 			goto error0;
-		}
 	}
 
 	/*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index c5ecaac..b0a1f62 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3821,7 +3821,8 @@ xlog_recover_process_efi(
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		extp = &(efip->efi_format.efi_extents[i]);
 		error = xfs_trans_free_extent(tp, efdp, extp->ext_start,
-					      extp->ext_len);
+					      extp->ext_len,
+					      XFS_RMAP_OWN_UNKNOWN);
 		if (error)
 			goto abort_error;
 
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 4643070..ee277fa 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -222,7 +222,7 @@ struct xfs_efd_log_item	*xfs_trans_get_efd(xfs_trans_t *,
 				  uint);
 int		xfs_trans_free_extent(struct xfs_trans *,
 				      struct xfs_efd_log_item *, xfs_fsblock_t,
-				      xfs_extlen_t);
+				      xfs_extlen_t, uint64_t);
 int		xfs_trans_commit(struct xfs_trans *);
 int		__xfs_trans_roll(struct xfs_trans **, struct xfs_inode *, int *);
 int		xfs_trans_roll(struct xfs_trans **, struct xfs_inode *);
diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
index a96ae54..1b7d6d5 100644
--- a/fs/xfs/xfs_trans_extfree.c
+++ b/fs/xfs/xfs_trans_extfree.c
@@ -118,13 +118,14 @@ xfs_trans_free_extent(
 	struct xfs_trans	*tp,
 	struct xfs_efd_log_item	*efdp,
 	xfs_fsblock_t		start_block,
-	xfs_extlen_t		ext_len)
+	xfs_extlen_t		ext_len,
+	uint64_t		owner)
 {
 	uint			next_extent;
 	struct xfs_extent	*extp;
 	int			error;
 
-	error = xfs_free_extent(tp, start_block, ext_len);
+	error = xfs_free_extent(tp, start_block, ext_len, owner);
 
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 10/76] xfs: add extended owner field to extent allocation and freeing
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2015-12-19  8:57 ` [PATCH 09/76] xfs: add owner field to extent allocation and freeing Darrick J. Wong
@ 2015-12-19  8:57 ` Darrick J. Wong
  2015-12-19  8:57 ` [PATCH 11/76] xfs: introduce rmap extent operation stubs Darrick J. Wong
                   ` (66 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:57 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Extend the owner field to include both the owner type and some sort
of index within the owner.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c        |   11 +++++----
 fs/xfs/libxfs/xfs_alloc.h        |    4 ++-
 fs/xfs/libxfs/xfs_bmap.c         |   20 +++++++++-------
 fs/xfs/libxfs/xfs_bmap.h         |    5 ++--
 fs/xfs/libxfs/xfs_bmap_btree.c   |    7 ++++-
 fs/xfs/libxfs/xfs_format.h       |   49 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_ialloc.c       |    8 ++++--
 fs/xfs/libxfs/xfs_ialloc_btree.c |    6 +++--
 fs/xfs/xfs_bmap_util.c           |    2 +-
 fs/xfs/xfs_fsops.c               |    5 +++-
 fs/xfs/xfs_log_recover.c         |    4 ++-
 fs/xfs/xfs_trans.h               |    2 +-
 fs/xfs/xfs_trans_extfree.c       |    4 ++-
 13 files changed, 97 insertions(+), 30 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index e3d25a6..1fc3d28 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1594,7 +1594,7 @@ xfs_free_ag_extent(
 	xfs_agnumber_t	agno,	/* allocation group number */
 	xfs_agblock_t	bno,	/* starting block number */
 	xfs_extlen_t	len,	/* length of extent */
-	uint64_t	owner,	/* extent owner */
+	struct xfs_owner_info	*oinfo,	/* extent owner */
 	int		isfl)	/* set if is freelist blocks - no sb acctg */
 {
 	xfs_btree_cur_t	*bno_cur;	/* cursor for by-block btree */
@@ -2015,6 +2015,7 @@ xfs_alloc_fix_freelist(
 	 * back on the free list? Maybe we should only do this when space is
 	 * getting low or the AGFL is more than half full?
 	 */
+	XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
 	while (pag->pagf_flcount > need) {
 		struct xfs_buf	*bp;
 
@@ -2022,7 +2023,7 @@ xfs_alloc_fix_freelist(
 		if (error)
 			goto out_agbp_relse;
 		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
-					   XFS_RMAP_OWN_AG, 1);
+					   &targs.oinfo, 1);
 		if (error)
 			goto out_agbp_relse;
 		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
@@ -2032,7 +2033,7 @@ xfs_alloc_fix_freelist(
 	memset(&targs, 0, sizeof(targs));
 	targs.tp = tp;
 	targs.mp = mp;
-	targs.owner = XFS_RMAP_OWN_AG;
+	XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
 	targs.agbp = agbp;
 	targs.agno = args->agno;
 	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
@@ -2686,7 +2687,7 @@ xfs_free_extent(
 	xfs_trans_t	*tp,	/* transaction pointer */
 	xfs_fsblock_t	bno,	/* starting block number of extent */
 	xfs_extlen_t	len,	/* length of extent */
-	uint64_t	owner)	/* extent owner */
+	struct xfs_owner_info	*oinfo)	/* extent owner */
 {
 	xfs_alloc_arg_t	args;
 	int		error;
@@ -2723,7 +2724,7 @@ xfs_free_extent(
 	}
 
 	error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno,
-				   len, owner, 0);
+				   len, oinfo, 0);
 	if (!error)
 		xfs_extent_busy_insert(tp, args.agno, args.agbno, len, 0);
 error0:
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index ee86f44..6d0f328 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -123,7 +123,7 @@ typedef struct xfs_alloc_arg {
 	char		isfl;		/* set if is freelist blocks - !acctg */
 	char		userdata;	/* mask defining userdata treatment */
 	xfs_fsblock_t	firstblock;	/* io first block allocated */
-	uint64_t	owner;		/* owner of blocks being allocated */
+	struct xfs_owner_info	oinfo;		/* owner of blocks being allocated */
 } xfs_alloc_arg_t;
 
 /*
@@ -212,7 +212,7 @@ xfs_free_extent(
 	struct xfs_trans *tp,	/* transaction pointer */
 	xfs_fsblock_t	bno,	/* starting block number of extent */
 	xfs_extlen_t	len,	/* length of extent */
-	uint64_t	owner);	/* extent owner */
+	struct xfs_owner_info	*oinfo);	/* extent owner */
 
 int					/* error */
 xfs_alloc_lookup_le(
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6944157..9183602 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -555,7 +555,7 @@ xfs_bmap_add_free(
 	struct xfs_bmap_free	*flist,		/* list of extents */
 	xfs_fsblock_t		bno,		/* fs block number of extent */
 	xfs_filblks_t		len,		/* length of extent */
-	uint64_t		owner)		/* extent owner */
+	struct xfs_owner_info	*oinfo)		/* extent owner */
 {
 	xfs_bmap_free_item_t	*cur;		/* current (next) element */
 	xfs_bmap_free_item_t	*new;		/* new element */
@@ -576,12 +576,14 @@ xfs_bmap_add_free(
 	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
 #endif
 	ASSERT(xfs_bmap_free_item_zone != NULL);
-	ASSERT(owner);
 
 	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
 	new->xbfi_startblock = bno;
 	new->xbfi_blockcount = (xfs_extlen_t)len;
-	new->xbfi_owner = owner;
+	if (oinfo)
+		memcpy(&new->xbfi_oinfo, oinfo, sizeof(struct xfs_owner_info));
+	else
+		memset(&new->xbfi_oinfo, 0, sizeof(struct xfs_owner_info));
 	for (prev = NULL, cur = flist->xbf_first;
 	     cur != NULL;
 	     prev = cur, cur = cur->xbfi_next) {
@@ -661,6 +663,7 @@ xfs_bmap_btree_to_extents(
 	xfs_mount_t		*mp;	/* mount point structure */
 	__be64			*pp;	/* ptr to block address */
 	struct xfs_btree_block	*rblock;/* root btree block */
+	struct xfs_owner_info	oinfo;
 
 	mp = ip->i_mount;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
@@ -684,7 +687,8 @@ xfs_bmap_btree_to_extents(
 	cblock = XFS_BUF_TO_BLOCK(cbp);
 	if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
 		return error;
-	xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, ip->i_ino);
+	XFS_RMAP_INO_BMBT_OWNER(&oinfo, ip->i_ino, whichfork);
+	xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, &oinfo);
 	ip->i_d.di_nblocks--;
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
 	xfs_trans_binval(tp, cbp);
@@ -765,7 +769,7 @@ xfs_bmap_extents_to_btree(
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = mp;
-	args.owner = ip->i_ino;
+	XFS_RMAP_INO_BMBT_OWNER(&args.oinfo, ip->i_ino, whichfork);
 	args.firstblock = *firstblock;
 	if (*firstblock == NULLFSBLOCK) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
@@ -912,7 +916,7 @@ xfs_bmap_local_to_extents(
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = ip->i_mount;
-	args.owner = ip->i_ino;
+	XFS_RMAP_INO_OWNER(&args.oinfo, ip->i_ino, whichfork, 0);
 	args.firstblock = *firstblock;
 	/*
 	 * Allocate a block.  We know we need only one, since the
@@ -3696,7 +3700,6 @@ xfs_bmap_btalloc(
 	memset(&args, 0, sizeof(args));
 	args.tp = ap->tp;
 	args.mp = mp;
-	args.owner = ap->ip->i_ino;
 	args.fsbno = ap->blkno;
 
 	/* Trim the allocation back to the maximum an AG can fit. */
@@ -4817,6 +4820,7 @@ xfs_bmap_del_extent(
 		nblks = 0;
 		do_fx = 0;
 	}
+
 	/*
 	 * Set flag value to use in switch statement.
 	 * Left-contig is 2, right-contig is 1.
@@ -5003,7 +5007,7 @@ xfs_bmap_del_extent(
 	 */
 	if (do_fx)
 		xfs_bmap_add_free(mp, flist, del->br_startblock,
-				  del->br_blockcount, ip->i_ino);
+				  del->br_blockcount, NULL);
 	/*
 	 * Adjust inode # blocks in the file.
 	 */
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 1f8eb6e..e7cd789 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -66,7 +66,7 @@ typedef struct xfs_bmap_free_item
 {
 	xfs_fsblock_t		xbfi_startblock;/* starting fs block number */
 	xfs_extlen_t		xbfi_blockcount;/* number of blocks in extent */
-	uint64_t		xbfi_owner;	/* extent owner */
+	struct xfs_owner_info	xbfi_oinfo;	/* extent owner */
 	struct xfs_bmap_free_item *xbfi_next;	/* link to next entry */
 } xfs_bmap_free_item_t;
 
@@ -193,7 +193,8 @@ void	xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
 int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
 void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
-			  xfs_fsblock_t bno, xfs_filblks_t len, uint64_t owner);
+			  xfs_fsblock_t bno, xfs_filblks_t len,
+			  struct xfs_owner_info *oinfo);
 void	xfs_bmap_cancel(struct xfs_bmap_free *flist);
 int	xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
 			int *committed);
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 45b994b..789eb48 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -446,7 +446,8 @@ xfs_bmbt_alloc_block(
 	args.mp = cur->bc_mp;
 	args.fsbno = cur->bc_private.b.firstblock;
 	args.firstblock = args.fsbno;
-	args.owner = cur->bc_private.b.ip->i_ino;
+	XFS_RMAP_INO_BMBT_OWNER(&args.oinfo, cur->bc_private.b.ip->i_ino,
+			cur->bc_private.b.whichfork);
 
 	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = be64_to_cpu(start->l);
@@ -526,8 +527,10 @@ xfs_bmbt_free_block(
 	struct xfs_inode	*ip = cur->bc_private.b.ip;
 	struct xfs_trans	*tp = cur->bc_tp;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
+	struct xfs_owner_info	oinfo;
 
-	xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, ip->i_ino);
+	XFS_RMAP_INO_BMBT_OWNER(&oinfo, ip->i_ino, cur->bc_private.b.whichfork);
+	xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, &oinfo);
 	ip->i_d.di_nblocks--;
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index cb30f8b..a751ee6 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1313,6 +1313,55 @@ typedef __be32 xfs_inobt_ptr_t;
 #define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
 
 /*
+ * Ownership info for an extent.  This is used to create reverse-mapping
+ * entries.
+ */
+#define XFS_RMAP_INO_ATTR_FORK	(1)
+#define XFS_RMAP_BMBT_BLOCK	(2)
+struct xfs_owner_info {
+	uint64_t		oi_owner;
+	xfs_fileoff_t		oi_offset;
+	unsigned int		oi_flags;
+};
+
+static inline void
+XFS_RMAP_AG_OWNER(
+	struct xfs_owner_info	*oi,
+	uint64_t		owner)
+{
+	oi->oi_owner = owner;
+	oi->oi_offset = 0;
+	oi->oi_flags = 0;
+}
+
+static inline void
+XFS_RMAP_INO_BMBT_OWNER(
+	struct xfs_owner_info	*oi,
+	xfs_ino_t		ino,
+	int			whichfork)
+{
+	oi->oi_owner = ino;
+	oi->oi_offset = 0;
+	oi->oi_flags = XFS_RMAP_BMBT_BLOCK;
+	if (whichfork == XFS_ATTR_FORK)
+		oi->oi_flags |= XFS_RMAP_INO_ATTR_FORK;
+}
+
+static inline void
+XFS_RMAP_INO_OWNER(
+	struct xfs_owner_info	*oi,
+	xfs_ino_t		ino,
+	int			whichfork,
+	xfs_fileoff_t		offset)
+{
+	oi->oi_owner = ino;
+	oi->oi_offset = offset;
+	oi->oi_flags = 0;
+	if (whichfork == XFS_ATTR_FORK)
+		oi->oi_flags |= XFS_RMAP_INO_ATTR_FORK;
+}
+
+/*
  * Special owner types.
  *
  * Seeing as we only support up to 8EB, we have the upper bit of the owner field
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 7609f8c..0b8f748 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -614,7 +614,7 @@ xfs_ialloc_ag_alloc(
 	args.tp = tp;
 	args.mp = tp->t_mountp;
 	args.fsbno = NULLFSBLOCK;
-	args.owner = XFS_RMAP_OWN_INODES;
+	XFS_RMAP_AG_OWNER(&args.oinfo, XFS_RMAP_OWN_INODES);
 
 #ifdef DEBUG
 	/* randomly do sparse inode allocations */
@@ -1825,12 +1825,14 @@ xfs_difree_inode_chunk(
 	int		nextbit;
 	xfs_agblock_t	agbno;
 	int		contigblk;
+	struct xfs_owner_info	oinfo;
 	DECLARE_BITMAP(holemask, XFS_INOBT_HOLEMASK_BITS);
+	XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_INODES);
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
 		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, sagbno),
-				  mp->m_ialloc_blks, XFS_RMAP_OWN_INODES);
+				  mp->m_ialloc_blks, &oinfo);
 		return;
 	}
 
@@ -1874,7 +1876,7 @@ xfs_difree_inode_chunk(
 		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
 		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
 		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, agbno),
-				  contigblk, XFS_RMAP_OWN_INODES);
+				  contigblk, &oinfo);
 
 		/* reset range to current bit and carry on... */
 		startidx = endidx = nextbit;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 445245f..bd8a1da 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -96,7 +96,7 @@ xfs_inobt_alloc_block(
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
-	args.owner = XFS_RMAP_OWN_INOBT;
+	XFS_RMAP_AG_OWNER(&args.oinfo, XFS_RMAP_OWN_INOBT);
 	args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.a.agno, sbno);
 	args.minlen = 1;
 	args.maxlen = 1;
@@ -128,9 +128,11 @@ xfs_inobt_free_block(
 {
 	xfs_fsblock_t		fsbno;
 	int			error;
+	struct xfs_owner_info	oinfo;
 
+	XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_INOBT);
 	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp));
-	error = xfs_free_extent(cur->bc_tp, fsbno, 1, XFS_RMAP_OWN_INOBT);
+	error = xfs_free_extent(cur->bc_tp, fsbno, 1, &oinfo);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index b447b09..0543779 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -152,7 +152,7 @@ xfs_bmap_finish(
 
 		error = xfs_trans_free_extent(*tp, efd, free->xbfi_startblock,
 					      free->xbfi_blockcount,
-					      free->xbfi_owner);
+					      &free->xbfi_oinfo);
 		if (error)
 			return error;
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 5711700..b3d1665 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -439,6 +439,8 @@ xfs_growfs_data_private(
 	 * There are new blocks in the old last a.g.
 	 */
 	if (new) {
+		struct xfs_owner_info	oinfo;
+
 		/*
 		 * Change the agi length.
 		 */
@@ -473,10 +475,11 @@ xfs_growfs_data_private(
 		 * XFS_RMAP_OWN_NULL is used here to tell the rmap btree that
 		 * this doesn't actually exist in the rmap btree.
 		 */
+		XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_NULL);
 		error = xfs_free_extent(tp,
 				XFS_AGB_TO_FSB(mp, agno,
 					be32_to_cpu(agf->agf_length) - new),
-				new, XFS_RMAP_OWN_NULL);
+				new, &oinfo);
 		if (error)
 			goto error0;
 	}
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index b0a1f62..c3abf50 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3786,6 +3786,7 @@ xlog_recover_process_efi(
 	int			error = 0;
 	xfs_extent_t		*extp;
 	xfs_fsblock_t		startblock_fsb;
+	struct xfs_owner_info	oinfo;
 
 	ASSERT(!test_bit(XFS_EFI_RECOVERED, &efip->efi_flags));
 
@@ -3818,11 +3819,12 @@ xlog_recover_process_efi(
 		goto abort_error;
 	efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
 
+	XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_UNKNOWN);
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		extp = &(efip->efi_format.efi_extents[i]);
 		error = xfs_trans_free_extent(tp, efdp, extp->ext_start,
 					      extp->ext_len,
-					      XFS_RMAP_OWN_UNKNOWN);
+					      &oinfo);
 		if (error)
 			goto abort_error;
 
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index ee277fa..50fe77e 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -222,7 +222,7 @@ struct xfs_efd_log_item	*xfs_trans_get_efd(xfs_trans_t *,
 				  uint);
 int		xfs_trans_free_extent(struct xfs_trans *,
 				      struct xfs_efd_log_item *, xfs_fsblock_t,
-				      xfs_extlen_t, uint64_t);
+				      xfs_extlen_t, struct xfs_owner_info *);
 int		xfs_trans_commit(struct xfs_trans *);
 int		__xfs_trans_roll(struct xfs_trans **, struct xfs_inode *, int *);
 int		xfs_trans_roll(struct xfs_trans **, struct xfs_inode *);
diff --git a/fs/xfs/xfs_trans_extfree.c b/fs/xfs/xfs_trans_extfree.c
index 1b7d6d5..d1b8833 100644
--- a/fs/xfs/xfs_trans_extfree.c
+++ b/fs/xfs/xfs_trans_extfree.c
@@ -119,13 +119,13 @@ xfs_trans_free_extent(
 	struct xfs_efd_log_item	*efdp,
 	xfs_fsblock_t		start_block,
 	xfs_extlen_t		ext_len,
-	uint64_t		owner)
+	struct xfs_owner_info	*oinfo)
 {
 	uint			next_extent;
 	struct xfs_extent	*extp;
 	int			error;
 
-	error = xfs_free_extent(tp, start_block, ext_len, owner);
+	error = xfs_free_extent(tp, start_block, ext_len, oinfo);
 
 	/*
 	 * Mark the transaction dirty, even on error. This ensures the

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 11/76] xfs: introduce rmap extent operation stubs
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2015-12-19  8:57 ` [PATCH 10/76] xfs: add extended " Darrick J. Wong
@ 2015-12-19  8:57 ` Darrick J. Wong
  2015-12-19  8:57 ` [PATCH 12/76] xfs: extend rmap extent operation stubs to take full owner info Darrick J. Wong
                   ` (65 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:57 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

Add the stubs into the extent allocation and freeing paths that the
rmap btree implementation will hook into. While doing this, add the
trace points that will be used to track rmap btree extent
manipulations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/Makefile                |    1 
 fs/xfs/libxfs/xfs_alloc.c      |   11 +++++
 fs/xfs/libxfs/xfs_rmap.c       |   89 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.h |   30 +++++++++++++
 fs/xfs/xfs_trace.h             |   37 +++++++++++++++++
 5 files changed, 168 insertions(+)
 create mode 100644 fs/xfs/libxfs/xfs_rmap.c
 create mode 100644 fs/xfs/libxfs/xfs_rmap_btree.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index f646391..c202ce3 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -51,6 +51,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_inode_fork.o \
 				   xfs_inode_buf.o \
 				   xfs_log_rlimit.o \
+				   xfs_rmap.o \
 				   xfs_sb.o \
 				   xfs_symlink_remote.o \
 				   xfs_trans_resv.o \
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 1fc3d28..46a8906 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -26,6 +26,7 @@
 #include "xfs_mount.h"
 #include "xfs_inode.h"
 #include "xfs_btree.h"
+#include "xfs_rmap_btree.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_alloc.h"
 #include "xfs_extent_busy.h"
@@ -646,6 +647,12 @@ xfs_alloc_ag_vextent(
 	ASSERT(!args->wasfromfl || !args->isfl);
 	ASSERT(args->agbno % args->alignment == 0);
 
+	/* insert new block into the reverse map btree */
+	error = xfs_rmap_alloc(args->tp, args->agbp, args->agno,
+			       args->agbno, args->len, args->owner);
+	if (error)
+		return error;
+
 	if (!args->wasfromfl) {
 		error = xfs_alloc_update_counters(args->tp, args->pag,
 						  args->agbp,
@@ -1612,6 +1619,10 @@ xfs_free_ag_extent(
 	xfs_extlen_t	nlen;		/* new length of freespace */
 	xfs_perag_t	*pag;		/* per allocation group data */
 
+	error = xfs_rmap_free(tp, agbp, agno, bno, len, owner);
+	if (error)
+		goto error0;
+
 	mp = tp->t_mountp;
 	/*
 	 * Allocate and initialize a cursor for the by-block btree.
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
new file mode 100644
index 0000000..3958cf8
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -0,0 +1,89 @@
+
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_btree.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_trace.h"
+#include "xfs_error.h"
+#include "xfs_extent_busy.h"
+
+int
+xfs_rmap_free(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	uint64_t		owner)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	int			error = 0;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	trace_xfs_rmap_free_extent(mp, agno, bno, len, owner);
+	if (1)
+		goto out_error;
+	trace_xfs_rmap_free_extent_done(mp, agno, bno, len, owner);
+	return 0;
+
+out_error:
+	trace_xfs_rmap_free_extent_error(mp, agno, bno, len, owner);
+	return error;
+}
+
+int
+xfs_rmap_alloc(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	uint64_t		owner)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	int			error = 0;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, owner);
+	if (1)
+		goto out_error;
+	trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, owner);
+	return 0;
+
+out_error:
+	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, owner);
+	return error;
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
new file mode 100644
index 0000000..f1caa40
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#ifndef __XFS_RMAP_BTREE_H__
+#define	__XFS_RMAP_BTREE_H__
+
+struct xfs_buf;
+
+int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
+		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+		   uint64_t owner);
+int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
+		  xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+		  uint64_t owner);
+
+#endif	/* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 877079eb..f506396 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1676,6 +1676,43 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
 
+DECLARE_EVENT_CLASS(xfs_rmap_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner),
+	TP_ARGS(mp, agno, agbno, len, owner),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, agbno)
+		__field(xfs_extlen_t, len)
+		__field(uint64_t, owner)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->agbno = agbno;
+		__entry->len = len;
+		__entry->owner = owner;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u len %u, owner 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->agbno,
+		  __entry->len,
+		  __entry->owner)
+);
+#define DEFINE_RMAP_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner), \
+	TP_ARGS(mp, agno, agbno, len, owner))
+DEFINE_RMAP_EVENT(xfs_rmap_free_extent);
+DEFINE_RMAP_EVENT(xfs_rmap_free_extent_done);
+DEFINE_RMAP_EVENT(xfs_rmap_free_extent_error);
+DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent);
+DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_done);
+DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_error);
+
 DECLARE_EVENT_CLASS(xfs_da_class,
 	TP_PROTO(struct xfs_da_args *args),
 	TP_ARGS(args),

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 12/76] xfs: extend rmap extent operation stubs to take full owner info
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (10 preceding siblings ...)
  2015-12-19  8:57 ` [PATCH 11/76] xfs: introduce rmap extent operation stubs Darrick J. Wong
@ 2015-12-19  8:57 ` Darrick J. Wong
  2015-12-19  8:57 ` [PATCH 13/76] xfs: define the on-disk rmap btree format Darrick J. Wong
                   ` (64 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:57 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Extend the stubs to take full owner info.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c      |   25 +++++++++++++++----------
 fs/xfs/libxfs/xfs_rmap.c       |   16 ++++++++--------
 fs/xfs/libxfs/xfs_rmap_btree.h |    4 ++--
 fs/xfs/xfs_trace.h             |   22 +++++++++++++++-------
 4 files changed, 40 insertions(+), 27 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 46a8906..6c82021 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -647,11 +647,13 @@ xfs_alloc_ag_vextent(
 	ASSERT(!args->wasfromfl || !args->isfl);
 	ASSERT(args->agbno % args->alignment == 0);
 
-	/* insert new block into the reverse map btree */
-	error = xfs_rmap_alloc(args->tp, args->agbp, args->agno,
-			       args->agbno, args->len, args->owner);
-	if (error)
-		return error;
+	/* if not file data, insert new block into the reverse map btree */
+	if (args->oinfo.oi_owner) {
+		error = xfs_rmap_alloc(args->tp, args->agbp, args->agno,
+				       args->agbno, args->len, &args->oinfo);
+		if (error)
+			return error;
+	}
 
 	if (!args->wasfromfl) {
 		error = xfs_alloc_update_counters(args->tp, args->pag,
@@ -1619,16 +1621,19 @@ xfs_free_ag_extent(
 	xfs_extlen_t	nlen;		/* new length of freespace */
 	xfs_perag_t	*pag;		/* per allocation group data */
 
-	error = xfs_rmap_free(tp, agbp, agno, bno, len, owner);
-	if (error)
-		goto error0;
-
+	bno_cur = cnt_cur = NULL;
 	mp = tp->t_mountp;
+
+	if (oinfo->oi_owner) {
+		error = xfs_rmap_free(tp, agbp, agno, bno, len, oinfo);
+		if (error)
+			goto error0;
+	}
+
 	/*
 	 * Allocate and initialize a cursor for the by-block btree.
 	 */
 	bno_cur = xfs_allocbt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_BNO);
-	cnt_cur = NULL;
 	/*
 	 * Look for a neighboring block on the left (lower block numbers)
 	 * that is contiguous with this space.
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 3958cf8..3e17294 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -43,7 +43,7 @@ xfs_rmap_free(
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len,
-	uint64_t		owner)
+	struct xfs_owner_info	*oinfo)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 	int			error = 0;
@@ -51,14 +51,14 @@ xfs_rmap_free(
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
 
-	trace_xfs_rmap_free_extent(mp, agno, bno, len, owner);
+	trace_xfs_rmap_free_extent(mp, agno, bno, len, oinfo);
 	if (1)
 		goto out_error;
-	trace_xfs_rmap_free_extent_done(mp, agno, bno, len, owner);
+	trace_xfs_rmap_free_extent_done(mp, agno, bno, len, oinfo);
 	return 0;
 
 out_error:
-	trace_xfs_rmap_free_extent_error(mp, agno, bno, len, owner);
+	trace_xfs_rmap_free_extent_error(mp, agno, bno, len, oinfo);
 	return error;
 }
 
@@ -69,7 +69,7 @@ xfs_rmap_alloc(
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len,
-	uint64_t		owner)
+	struct xfs_owner_info	*oinfo)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
 	int			error = 0;
@@ -77,13 +77,13 @@ xfs_rmap_alloc(
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
 
-	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, owner);
+	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, oinfo);
 	if (1)
 		goto out_error;
-	trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, owner);
+	trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, oinfo);
 	return 0;
 
 out_error:
-	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, owner);
+	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, oinfo);
 	return error;
 }
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index f1caa40..a3b8f90 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -22,9 +22,9 @@ struct xfs_buf;
 
 int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
 		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
-		   uint64_t owner);
+		   struct xfs_owner_info *oinfo);
 int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
 		  xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
-		  uint64_t owner);
+		  struct xfs_owner_info *oinfo);
 
 #endif	/* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f506396..d41b093 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1678,34 +1678,42 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
 
 DECLARE_EVENT_CLASS(xfs_rmap_class,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
-		 xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner),
-	TP_ARGS(mp, agno, agbno, len, owner),
+		 xfs_agblock_t agbno, xfs_extlen_t len,
+		 struct xfs_owner_info *oinfo),
+	TP_ARGS(mp, agno, agbno, len, oinfo),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
 		__field(xfs_agblock_t, agbno)
 		__field(xfs_extlen_t, len)
 		__field(uint64_t, owner)
+		__field(uint64_t, offset)
+		__field(unsigned long, flags)
 	),
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
 		__entry->agno = agno;
 		__entry->agbno = agbno;
 		__entry->len = len;
-		__entry->owner = owner;
+		__entry->owner = oinfo->oi_owner;
+		__entry->offset = oinfo->oi_offset;
+		__entry->flags = oinfo->oi_flags;
 	),
-	TP_printk("dev %d:%d agno %u agbno %u len %u, owner 0x%llx",
+	TP_printk("dev %d:%d agno %u agbno %u len %u, owner 0x%llx, offset %llu, flags 0x%lx",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->agno,
 		  __entry->agbno,
 		  __entry->len,
-		  __entry->owner)
+		  __entry->owner,
+		  __entry->offset,
+		  __entry->flags)
 );
 #define DEFINE_RMAP_EVENT(name) \
 DEFINE_EVENT(xfs_rmap_class, name, \
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
-		 xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner), \
-	TP_ARGS(mp, agno, agbno, len, owner))
+		 xfs_agblock_t agbno, xfs_extlen_t len, \
+		 struct xfs_owner_info *oinfo), \
+	TP_ARGS(mp, agno, agbno, len, oinfo))
 DEFINE_RMAP_EVENT(xfs_rmap_free_extent);
 DEFINE_RMAP_EVENT(xfs_rmap_free_extent_done);
 DEFINE_RMAP_EVENT(xfs_rmap_free_extent_error);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 13/76] xfs: define the on-disk rmap btree format
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (11 preceding siblings ...)
  2015-12-19  8:57 ` [PATCH 12/76] xfs: extend rmap extent operation stubs to take full owner info Darrick J. Wong
@ 2015-12-19  8:57 ` Darrick J. Wong
  2015-12-19  8:57 ` [PATCH 14/76] xfs: enhance " Darrick J. Wong
                   ` (63 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:57 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

Now we have all the surrounding call infrastructure in place, we can
start fillin gout the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/Makefile                |    1 
 fs/xfs/libxfs/xfs_btree.c      |    3 +
 fs/xfs/libxfs/xfs_btree.h      |   18 ++--
 fs/xfs/libxfs/xfs_format.h     |   27 ++++++
 fs/xfs/libxfs/xfs_rmap_btree.c |  187 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.h |   31 +++++++
 fs/xfs/libxfs/xfs_sb.c         |    6 +
 fs/xfs/libxfs/xfs_shared.h     |    2 
 fs/xfs/xfs_mount.h             |    2 
 9 files changed, 269 insertions(+), 8 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_rmap_btree.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index c202ce3..9391080 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -52,6 +52,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_inode_buf.o \
 				   xfs_log_rlimit.o \
 				   xfs_rmap.o \
+				   xfs_rmap_btree.o \
 				   xfs_sb.o \
 				   xfs_symlink_remote.o \
 				   xfs_trans_resv.o \
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 4ff6b80..56a84e7 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1129,6 +1129,9 @@ xfs_btree_set_refs(
 	case XFS_BTNUM_BMAP:
 		xfs_buf_set_ref(bp, XFS_BMAP_BTREE_REF);
 		break;
+	case XFS_BTNUM_RMAP:
+		xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF);
+		break;
 	default:
 		ASSERT(0);
 	}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 0d0e546..f7a049b 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -38,17 +38,19 @@ union xfs_btree_ptr {
 };
 
 union xfs_btree_key {
-	xfs_bmbt_key_t		bmbt;
-	xfs_bmdr_key_t		bmbr;	/* bmbt root block */
-	xfs_alloc_key_t		alloc;
-	xfs_inobt_key_t		inobt;
+	struct xfs_bmbt_key		bmbt;
+	xfs_bmdr_key_t			bmbr;	/* bmbt root block */
+	xfs_alloc_key_t			alloc;
+	struct xfs_inobt_key		inobt;
+	struct xfs_rmap_key		rmap;
 };
 
 union xfs_btree_rec {
-	xfs_bmbt_rec_t		bmbt;
-	xfs_bmdr_rec_t		bmbr;	/* bmbt root block */
-	xfs_alloc_rec_t		alloc;
-	xfs_inobt_rec_t		inobt;
+	struct xfs_bmbt_rec		bmbt;
+	xfs_bmdr_rec_t			bmbr;	/* bmbt root block */
+	struct xfs_alloc_rec		alloc;
+	struct xfs_inobt_rec		inobt;
+	struct xfs_rmap_rec		rmap;
 };
 
 /*
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index a751ee6..991d67f 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1377,6 +1377,33 @@ XFS_RMAP_INO_OWNER(
 #define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
 #define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
 
+/*
+ * Data record structure
+ */
+struct xfs_rmap_rec {
+	__be32		rm_startblock;	/* extent start block */
+	__be32		rm_blockcount;	/* extent length */
+	__be64		rm_owner;	/* extent owner */
+};
+
+struct xfs_rmap_irec {
+	xfs_agblock_t	rm_startblock;	/* extent start block */
+	xfs_extlen_t	rm_blockcount;	/* extent length */
+	__uint64_t	rm_owner;	/* extent owner */
+};
+
+/*
+ * Key structure
+ *
+ * We don't use the length for lookups
+ */
+struct xfs_rmap_key {
+	__be32		rm_startblock;	/* extent start block */
+};
+
+/* btree pointer type */
+typedef __be32 xfs_rmap_ptr_t;
+
 #define	XFS_RMAP_BLOCK(mp) \
 	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
 	 XFS_FIBT_BLOCK(mp) + 1 : \
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
new file mode 100644
index 0000000..9a02699
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_error.h"
+#include "xfs_extent_busy.h"
+
+static struct xfs_btree_cur *
+xfs_rmapbt_dup_cursor(
+	struct xfs_btree_cur	*cur)
+{
+	return xfs_rmapbt_init_cursor(cur->bc_mp, cur->bc_tp,
+			cur->bc_private.a.agbp, cur->bc_private.a.agno);
+}
+
+static bool
+xfs_rmapbt_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	struct xfs_perag	*pag = bp->b_pag;
+	unsigned int		level;
+
+	/*
+	 * magic number and level verification
+	 *
+	 * During growfs operations, we can't verify the exact level or owner as
+	 * the perag is not fully initialised and hence not attached to the
+	 * buffer.  In this case, check against the maximum tree depth.
+	 *
+	 * Similarly, during log recovery we will have a perag structure
+	 * attached, but the agf information will not yet have been initialised
+	 * from the on disk AGF. Again, we can only check against maximum limits
+	 * in this case.
+	 */
+	if (block->bb_magic!= cpu_to_be32(XFS_RMAP_CRC_MAGIC))
+		return false;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return false;
+	if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
+		return false;
+	if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
+		return false;
+	if (pag && be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+		return false;
+
+	level = be16_to_cpu(block->bb_level);
+	if (pag && pag->pagf_init) {
+		if (level >= pag->pagf_levels[XFS_BTNUM_RMAPi])
+			return false;
+	} else if (level >= mp->m_ag_maxlevels)
+		return false;
+
+	/* numrecs verification */
+	if (be16_to_cpu(block->bb_numrecs) > mp->m_rmap_mxr[level != 0])
+		return false;
+
+	/* sibling pointer verification */
+	if (!block->bb_u.s.bb_leftsib ||
+	    (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
+	     block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
+		return false;
+	if (!block->bb_u.s.bb_rightsib ||
+	    (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
+	     block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
+		return false;
+
+	return true;
+}
+
+static void
+xfs_rmapbt_read_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_btree_sblock_verify_crc(bp))
+		xfs_buf_ioerror(bp, -EFSBADCRC);
+	else if (!xfs_rmapbt_verify(bp))
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+
+	if (bp->b_error) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_verifier_error(bp);
+	}
+}
+
+static void
+xfs_rmapbt_write_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_rmapbt_verify(bp)) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+		xfs_verifier_error(bp);
+		return;
+	}
+	xfs_btree_sblock_calc_crc(bp);
+
+}
+
+const struct xfs_buf_ops xfs_rmapbt_buf_ops = {
+	.verify_read = xfs_rmapbt_read_verify,
+	.verify_write = xfs_rmapbt_write_verify,
+};
+
+static const struct xfs_btree_ops xfs_rmapbt_ops = {
+	.rec_len		= sizeof(struct xfs_rmap_rec),
+	.key_len		= sizeof(struct xfs_rmap_key),
+
+	.dup_cursor		= xfs_rmapbt_dup_cursor,
+	.buf_ops		= &xfs_rmapbt_buf_ops,
+};
+
+/*
+ * Allocate a new allocation btree cursor.
+ */
+struct xfs_btree_cur *
+xfs_rmapbt_init_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	struct xfs_btree_cur	*cur;
+
+	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_SLEEP);
+	cur->bc_tp = tp;
+	cur->bc_mp = mp;
+	cur->bc_btnum = XFS_BTNUM_RMAP;
+	cur->bc_flags = XFS_BTREE_CRC_BLOCKS;
+	cur->bc_blocklog = mp->m_sb.sb_blocklog;
+	cur->bc_ops = &xfs_rmapbt_ops;
+	cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
+
+	cur->bc_private.a.agbp = agbp;
+	cur->bc_private.a.agno = agno;
+
+	return cur;
+}
+
+/*
+ * Calculate number of records in an rmap btree block.
+ */
+int
+xfs_rmapbt_maxrecs(
+	struct xfs_mount	*mp,
+	int			blocklen,
+	int			leaf)
+{
+	blocklen -= XFS_RMAP_BLOCK_LEN;
+
+	if (leaf)
+		return blocklen / sizeof(struct xfs_rmap_rec);
+	return blocklen /
+		(sizeof(struct xfs_rmap_key) + sizeof(xfs_rmap_ptr_t));
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index a3b8f90..2e02362 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -19,6 +19,37 @@
 #define	__XFS_RMAP_BTREE_H__
 
 struct xfs_buf;
+struct xfs_btree_cur;
+struct xfs_mount;
+
+/* rmaps only exist on crc enabled filesystems */
+#define XFS_RMAP_BLOCK_LEN	XFS_BTREE_SBLOCK_CRC_LEN
+
+/*
+ * Record, key, and pointer address macros for btree blocks.
+ *
+ * (note that some of these may appear unused, but they are used in userspace)
+ */
+#define XFS_RMAP_REC_ADDR(block, index) \
+	((struct xfs_rmap_rec *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 (((index) - 1) * sizeof(struct xfs_rmap_rec))))
+
+#define XFS_RMAP_KEY_ADDR(block, index) \
+	((struct xfs_rmap_key *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 ((index) - 1) * sizeof(struct xfs_rmap_key)))
+
+#define XFS_RMAP_PTR_ADDR(block, index, maxrecs) \
+	((xfs_rmap_ptr_t *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 (maxrecs) * sizeof(struct xfs_rmap_key) + \
+		 ((index) - 1) * sizeof(xfs_rmap_ptr_t)))
+
+struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
+				struct xfs_trans *tp, struct xfs_buf *bp,
+				xfs_agnumber_t agno);
+int xfs_rmapbt_maxrecs(struct xfs_mount *mp, int blocklen, int leaf);
 
 int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
 		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index a0b071d..7c3eb6c 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -36,6 +36,7 @@
 #include "xfs_alloc_btree.h"
 #include "xfs_ialloc_btree.h"
 #include "xfs_log.h"
+#include "xfs_rmap_btree.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -727,6 +728,11 @@ xfs_sb_mount_common(
 	mp->m_bmap_dmnr[0] = mp->m_bmap_dmxr[0] / 2;
 	mp->m_bmap_dmnr[1] = mp->m_bmap_dmxr[1] / 2;
 
+	mp->m_rmap_mxr[0] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, 1);
+	mp->m_rmap_mxr[1] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, 0);
+	mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
+	mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
+
 	mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
 	mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
 					sbp->sb_inopblock);
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 544d0e9..fa2bb9b 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -38,6 +38,7 @@ extern const struct xfs_buf_ops xfs_agi_buf_ops;
 extern const struct xfs_buf_ops xfs_agf_buf_ops;
 extern const struct xfs_buf_ops xfs_agfl_buf_ops;
 extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
+extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
 extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
@@ -210,6 +211,7 @@ int	xfs_log_calc_minimum_size(struct xfs_mount *);
 #define	XFS_INO_BTREE_REF	3
 #define	XFS_ALLOC_BTREE_REF	2
 #define	XFS_BMAP_BTREE_REF	2
+#define	XFS_RMAP_BTREE_REF	2
 #define	XFS_DIR_BTREE_REF	2
 #define	XFS_INO_REF		2
 #define	XFS_ATTR_BTREE_REF	1
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 77def45..6b52575 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -90,6 +90,8 @@ typedef struct xfs_mount {
 	uint			m_bmap_dmnr[2];	/* min bmap btree records */
 	uint			m_inobt_mxr[2];	/* max inobt btree records */
 	uint			m_inobt_mnr[2];	/* min inobt btree records */
+	uint			m_rmap_mxr[2];	/* max rmap btree records */
+	uint			m_rmap_mnr[2];	/* min rmap btree records */
 	uint			m_ag_maxlevels;	/* XFS_AG_MAXLEVELS */
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* max inobt btree levels. */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 14/76] xfs: enhance the on-disk rmap btree format
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (12 preceding siblings ...)
  2015-12-19  8:57 ` [PATCH 13/76] xfs: define the on-disk rmap btree format Darrick J. Wong
@ 2015-12-19  8:57 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 15/76] xfs: add rmap btree growfs support Darrick J. Wong
                   ` (62 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:57 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Expand the rmap btree to record owner and offset info.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h     |   70 +++++++++++++++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_rmap_btree.c |    2 +
 2 files changed, 70 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 991d67f..2799746 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1377,6 +1377,8 @@ XFS_RMAP_INO_OWNER(
 #define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
 #define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
 
+#define XFS_RMAP_NON_INODE_OWNER(owner)	(!!((owner) & (1ULL << 63)))
+
 /*
  * Data record structure
  */
@@ -1384,12 +1386,44 @@ struct xfs_rmap_rec {
 	__be32		rm_startblock;	/* extent start block */
 	__be32		rm_blockcount;	/* extent length */
 	__be64		rm_owner;	/* extent owner */
+	__be64		rm_offset;	/* offset within the owner */
 };
 
+/*
+ * rmap btree record
+ *  rm_blockcount:31 is the unwritten extent flag (same as l0:63 in bmbt)
+ *  rm_blockcount:0-30 are the extent length
+ *  rm_offset:63 is the attribute fork flag
+ *  rm_offset:62 is the bmbt block flag
+ *  rm_offset:0-61 is the block offset within the inode
+ */
+#define XFS_RMAP_OFF_ATTR	((__uint64_t)1ULL << 63)
+#define XFS_RMAP_OFF_BMBT	((__uint64_t)1ULL << 62)
+#define XFS_RMAP_LEN_UNWRITTEN	((xfs_extlen_t)1U << 31)
+
+#define XFS_RMAP_OFF_MASK	~(XFS_RMAP_OFF_ATTR | XFS_RMAP_OFF_BMBT)
+#define XFS_RMAP_LEN_MASK	~XFS_RMAP_LEN_UNWRITTEN
+
+#define XFS_RMAP_OFF(off)		((off) & XFS_RMAP_OFF_MASK)
+#define XFS_RMAP_LEN(len)		((len) & XFS_RMAP_LEN_MASK)
+
+#define XFS_RMAP_IS_BMBT(off)		(!!((off) & XFS_RMAP_OFF_BMBT))
+#define XFS_RMAP_IS_ATTR_FORK(off)	(!!((off) & XFS_RMAP_OFF_ATTR))
+#define XFS_RMAP_IS_UNWRITTEN(len)	(!!((len) & XFS_RMAP_LEN_UNWRITTEN))
+
+#define RMAPBT_STARTBLOCK_BITLEN	32
+#define RMAPBT_EXNTFLAG_BITLEN		1
+#define RMAPBT_BLOCKCOUNT_BITLEN	31
+#define RMAPBT_OWNER_BITLEN		64
+#define RMAPBT_ATTRFLAG_BITLEN		1
+#define RMAPBT_BMBTFLAG_BITLEN		1
+#define RMAPBT_OFFSET_BITLEN		62
+
 struct xfs_rmap_irec {
 	xfs_agblock_t	rm_startblock;	/* extent start block */
 	xfs_extlen_t	rm_blockcount;	/* extent length */
 	__uint64_t	rm_owner;	/* extent owner */
+	__uint64_t	rm_offset;	/* offset within the owner */
 };
 
 /*
@@ -1399,7 +1433,9 @@ struct xfs_rmap_irec {
  */
 struct xfs_rmap_key {
 	__be32		rm_startblock;	/* extent start block */
-};
+	__be64		rm_owner;	/* extent owner */
+	__be64		rm_offset;	/* offset within the owner */
+} __attribute__((packed));
 
 /* btree pointer type */
 typedef __be32 xfs_rmap_ptr_t;
@@ -1409,6 +1445,38 @@ typedef __be32 xfs_rmap_ptr_t;
 	 XFS_FIBT_BLOCK(mp) + 1 : \
 	 XFS_IBT_BLOCK(mp) + 1)
 
+static inline void
+xfs_owner_info_unpack(
+	struct xfs_owner_info	*oinfo,
+	uint64_t		*owner,
+	uint64_t		*offset)
+{
+	__uint64_t		r;
+
+	*owner = oinfo->oi_owner;
+	r = oinfo->oi_offset;
+	if (oinfo->oi_flags & XFS_RMAP_INO_ATTR_FORK)
+		r |= XFS_RMAP_OFF_ATTR;
+	if (oinfo->oi_flags & XFS_RMAP_BMBT_BLOCK)
+		r |= XFS_RMAP_OFF_BMBT;
+	*offset = r;
+}
+
+static inline void
+xfs_owner_info_pack(
+	struct xfs_owner_info	*oinfo,
+	uint64_t		owner,
+	uint64_t		offset)
+{
+	oinfo->oi_owner = owner;
+	oinfo->oi_offset = XFS_RMAP_OFF(offset);
+	oinfo->oi_flags = 0;
+	if (XFS_RMAP_IS_ATTR_FORK(offset))
+		oinfo->oi_flags |= XFS_RMAP_INO_ATTR_FORK;
+	if (XFS_RMAP_IS_BMBT(offset))
+		oinfo->oi_flags |= XFS_RMAP_BMBT_BLOCK;
+}
+
 /*
  * BMAP Btree format definitions
  *
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 9a02699..5671771 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -63,7 +63,7 @@ xfs_rmapbt_verify(
 	 * from the on disk AGF. Again, we can only check against maximum limits
 	 * in this case.
 	 */
-	if (block->bb_magic!= cpu_to_be32(XFS_RMAP_CRC_MAGIC))
+	if (block->bb_magic != cpu_to_be32(XFS_RMAP_CRC_MAGIC))
 		return false;
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 15/76] xfs: add rmap btree growfs support
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (13 preceding siblings ...)
  2015-12-19  8:57 ` [PATCH 14/76] xfs: enhance " Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 16/76] xfs: enhance " Darrick J. Wong
                   ` (61 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

Now we can read and write rmap btree blocks, we can add support to
the growfs code to initialise new rmap btree blocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)


diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index b3d1665..38c78da 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -32,6 +32,7 @@
 #include "xfs_btree.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
 #include "xfs_ialloc.h"
 #include "xfs_fsops.h"
 #include "xfs_itable.h"
@@ -243,6 +244,12 @@ xfs_growfs_data_private(
 		agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(XFS_CNT_BLOCK(mp));
 		agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
 		agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
+		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+			agf->agf_roots[XFS_BTNUM_RMAPi] =
+						cpu_to_be32(XFS_RMAP_BLOCK(mp));
+			agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
+		}
+
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
 		agf->agf_flcount = 0;
@@ -382,6 +389,67 @@ xfs_growfs_data_private(
 		if (error)
 			goto error0;
 
+		/* RMAP btree root block */
+		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+			struct xfs_rmap_rec	*rrec;
+			struct xfs_btree_block	*block;
+			bp = xfs_growfs_get_hdr_buf(mp,
+				XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
+				BTOBB(mp->m_sb.sb_blocksize), 0,
+				&xfs_rmapbt_buf_ops);
+			if (!bp) {
+				error = -ENOMEM;
+				goto error0;
+			}
+
+			xfs_btree_init_block(mp, bp, XFS_RMAP_CRC_MAGIC, 0, 2,
+						agno, XFS_BTREE_CRC_BLOCKS);
+			block = XFS_BUF_TO_BLOCK(bp);
+
+
+			/*
+			 * mark the AG header regions as static metadata The BNO
+			 * btree block is the first block after the headers, so
+			 * it's location defines the size of region the static
+			 * metadata consumes.
+			 *
+			 * Note: unlike mkfs, we never have to account for log
+			 * space when growing the data regions
+			 */
+			rrec = XFS_RMAP_REC_ADDR(block, 1);
+			rrec->rm_startblock = 0;
+			rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account freespace btree root blocks */
+			rrec = XFS_RMAP_REC_ADDR(block, 2);
+			rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
+			rrec->rm_blockcount = cpu_to_be32(2);
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account inode btree root blocks */
+			rrec = XFS_RMAP_REC_ADDR(block, 3);
+			rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
+			rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
+							XFS_IBT_BLOCK(mp));
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account for rmap btree root */ 
+			rrec = XFS_RMAP_REC_ADDR(block, 4);
+			rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
+			rrec->rm_blockcount = cpu_to_be32(1);
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			error = xfs_bwrite(bp);
+			xfs_buf_relse(bp);
+			if (error)
+				goto error0;
+		}
+
 		/*
 		 * INO btree root block
 		 */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 16/76] xfs: enhance rmap btree growfs support
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (14 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 15/76] xfs: add rmap btree growfs support Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 17/76] xfs: rmap btree transaction reservations Darrick J. Wong
                   ` (60 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Fill out the rmap offset fields.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_fsops.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 38c78da..119be0a 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -393,6 +393,7 @@ xfs_growfs_data_private(
 		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
 			struct xfs_rmap_rec	*rrec;
 			struct xfs_btree_block	*block;
+
 			bp = xfs_growfs_get_hdr_buf(mp,
 				XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
 				BTOBB(mp->m_sb.sb_blocksize), 0,
@@ -402,7 +403,7 @@ xfs_growfs_data_private(
 				goto error0;
 			}
 
-			xfs_btree_init_block(mp, bp, XFS_RMAP_CRC_MAGIC, 0, 2,
+			xfs_btree_init_block(mp, bp, XFS_RMAP_CRC_MAGIC, 0, 0,
 						agno, XFS_BTREE_CRC_BLOCKS);
 			block = XFS_BUF_TO_BLOCK(bp);
 
@@ -420,6 +421,7 @@ xfs_growfs_data_private(
 			rrec->rm_startblock = 0;
 			rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
 			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
+			rrec->rm_offset = 0;
 			be16_add_cpu(&block->bb_numrecs, 1);
 
 			/* account freespace btree root blocks */
@@ -427,6 +429,7 @@ xfs_growfs_data_private(
 			rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
 			rrec->rm_blockcount = cpu_to_be32(2);
 			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+			rrec->rm_offset = 0;
 			be16_add_cpu(&block->bb_numrecs, 1);
 
 			/* account inode btree root blocks */
@@ -435,13 +438,15 @@ xfs_growfs_data_private(
 			rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
 							XFS_IBT_BLOCK(mp));
 			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
+			rrec->rm_offset = 0;
 			be16_add_cpu(&block->bb_numrecs, 1);
 
-			/* account for rmap btree root */ 
+			/* account for rmap btree root */
 			rrec = XFS_RMAP_REC_ADDR(block, 4);
 			rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
 			rrec->rm_blockcount = cpu_to_be32(1);
 			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+			rrec->rm_offset = 0;
 			be16_add_cpu(&block->bb_numrecs, 1);
 
 			error = xfs_bwrite(bp);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 17/76] xfs: rmap btree transaction reservations
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (15 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 16/76] xfs: enhance " Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 18/76] xfs: rmap btree requires more reserved free space Darrick J. Wong
                   ` (59 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

The rmap btrees will use the AGFL as the block allocation source, so
we need to ensure that the transaction reservations reflect the fact
this tree is modified by allocation and freeing. Hence we need to
extend all the extent allocation/free reservations used in
transactions to handle this.

Note that this also gets rid of the unused XFS_ALLOCFREE_LOG_RES
macro, as we now do buffer reservations based on the number of
buffers logged via xfs_calc_buf_res(). Hence we only need the buffer
count calculation now.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c |   56 ++++++++++++++++++++++++++++------------
 fs/xfs/libxfs/xfs_trans_resv.h |   10 -------
 2 files changed, 39 insertions(+), 27 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 68cb1e7..d495f82 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -64,6 +64,28 @@ xfs_calc_buf_res(
 }
 
 /*
+ * Per-extent log reservation for the allocation btree changes
+ * involved in freeing or allocating an extent. When rmap is not enabled,
+ * there are only two trees that will be modified (free space trees), and when
+ * rmap is enabled there will be three (freespace + rmap trees). The number of
+ * blocks reserved is based on the formula:
+ *
+ * num trees * ((2 blocks/level * max depth) - 1)
+ */
+static uint
+xfs_allocfree_log_count(
+	struct xfs_mount *mp,
+	uint		num_ops)
+{
+	uint		num_trees = 2;
+
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		num_trees++;
+
+	return num_ops * num_trees * (2 * mp->m_ag_maxlevels - 1);
+}
+
+/*
  * Logging inodes is really tricksy. They are logged in memory format,
  * which means that what we write into the log doesn't directly translate into
  * the amount of space they use on disk.
@@ -126,7 +148,7 @@ xfs_calc_inode_res(
  */
 STATIC uint
 xfs_calc_finobt_res(
-	struct xfs_mount 	*mp,
+	struct xfs_mount	*mp,
 	int			alloc,
 	int			modify)
 {
@@ -137,7 +159,7 @@ xfs_calc_finobt_res(
 
 	res = xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1));
 	if (alloc)
-		res += xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1), 
+		res += xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 					XFS_FSB_TO_B(mp, 1));
 	if (modify)
 		res += (uint)XFS_FSB_TO_B(mp, 1);
@@ -188,10 +210,10 @@ xfs_calc_write_reservation(
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
 				      XFS_FSB_TO_B(mp, 1)) +
 		     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -217,10 +239,10 @@ xfs_calc_itruncate_reservation(
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1,
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
 				      XFS_FSB_TO_B(mp, 1)) +
 		    xfs_calc_buf_res(5, 0) +
-		    xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				     XFS_FSB_TO_B(mp, 1)) +
 		    xfs_calc_buf_res(2 + mp->m_ialloc_blks +
 				     mp->m_in_maxlevels, 0)));
@@ -247,7 +269,7 @@ xfs_calc_rename_reservation(
 		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 3),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 3),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -286,7 +308,7 @@ xfs_calc_link_reservation(
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -324,7 +346,7 @@ xfs_calc_remove_reservation(
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -371,7 +393,7 @@ xfs_calc_create_resv_alloc(
 		mp->m_sb.sb_sectsize +
 		xfs_calc_buf_res(mp->m_ialloc_blks, XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -399,7 +421,7 @@ xfs_calc_icreate_resv_alloc(
 	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
 		mp->m_sb.sb_sectsize +
 		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_finobt_res(mp, 0, 0);
 }
@@ -483,7 +505,7 @@ xfs_calc_ifree_reservation(
 		xfs_calc_buf_res(1, 0) +
 		xfs_calc_buf_res(2 + mp->m_ialloc_blks +
 				 mp->m_in_maxlevels, 0) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_finobt_res(mp, 0, 1);
 }
@@ -513,7 +535,7 @@ xfs_calc_growdata_reservation(
 	struct xfs_mount	*mp)
 {
 	return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -535,7 +557,7 @@ xfs_calc_growrtalloc_reservation(
 		xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_inode_res(mp, 1) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -611,7 +633,7 @@ xfs_calc_addafork_reservation(
 		xfs_calc_buf_res(1, mp->m_dir_geo->blksize) +
 		xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
 				 XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -634,7 +656,7 @@ xfs_calc_attrinval_reservation(
 		    xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
 				     XFS_FSB_TO_B(mp, 1))),
 		   (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-		    xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
 				     XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -701,7 +723,7 @@ xfs_calc_attrrm_reservation(
 					XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
 		    (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h
index 7978150..0eb46ed 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.h
+++ b/fs/xfs/libxfs/xfs_trans_resv.h
@@ -68,16 +68,6 @@ struct xfs_trans_resv {
 #define M_RES(mp)	(&(mp)->m_resv)
 
 /*
- * Per-extent log reservation for the allocation btree changes
- * involved in freeing or allocating an extent.
- * 2 trees * (2 blocks/level * max depth - 1) * block size
- */
-#define	XFS_ALLOCFREE_LOG_RES(mp,nx) \
-	((nx) * (2 * XFS_FSB_TO_B((mp), 2 * (mp)->m_ag_maxlevels - 1)))
-#define	XFS_ALLOCFREE_LOG_COUNT(mp,nx) \
-	((nx) * (2 * (2 * (mp)->m_ag_maxlevels - 1)))
-
-/*
  * Per-directory log reservation for any directory change.
  * dir blocks: (1 btree block per level + data block + free block) * dblock size
  * bmap btree: (levels + 2) * max depth * block size

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 18/76] xfs: rmap btree requires more reserved free space
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (16 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 17/76] xfs: rmap btree transaction reservations Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 19/76] libxfs: fix min freelist length calculation Darrick J. Wong
                   ` (58 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functiosn to handle this.

Also, because the macros are now executing conditional code and are called quite
frequently, convert them to functions that initialise varaibles in the struct
xfs_mount, use the new variables everywhere and document the calculations
better.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c |   69 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_alloc.h |   41 +++------------------------
 fs/xfs/libxfs/xfs_bmap.c  |    2 +
 fs/xfs/libxfs/xfs_sb.c    |    2 +
 fs/xfs/xfs_discard.c      |    2 +
 fs/xfs/xfs_fsops.c        |    4 +--
 fs/xfs/xfs_mount.c        |    2 +
 fs/xfs/xfs_mount.h        |    2 +
 fs/xfs/xfs_super.c        |    2 +
 9 files changed, 84 insertions(+), 42 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6c82021..5a829d7 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -62,6 +62,72 @@ xfs_prealloc_blocks(
 }
 
 /*
+ * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
+ * AGF buffer (PV 947395), we place constraints on the relationship among actual
+ * allocations for data blocks, freelist blocks, and potential file data bmap
+ * btree blocks. However, these restrictions may result in no actual space
+ * allocated for a delayed extent, for example, a data block in a certain AG is
+ * allocated but there is no additional block for the additional bmap btree
+ * block due to a split of the bmap btree of the file. The result of this may
+ * lead to an infinite loop when the file gets flushed to disk and all delayed
+ * extents need to be actually allocated. To get around this, we explicitly set
+ * aside a few blocks which will not be reserved in delayed allocation.
+ *
+ * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
+ * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,
+ * so we set the number of set-aside blocks to 4 + 4*agcount when not using rmap
+ * btrees.
+ *
+ * When rmap btrees are active, we have to consider that using the last block in
+ * the AG can cause a full height rmap btree split and we need enough blocks on
+ * the AGFL to be able to handle this. That means we have, in addition to the
+ * above consideration, another (2 * mp->m_ag_levels) - 1 blocks required to be
+ * available to the free list.
+ */
+unsigned int
+xfs_alloc_set_aside(
+	struct xfs_mount *mp)
+{
+	unsigned int	blocks;
+
+	blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return blocks;
+	return blocks + (mp->m_sb.sb_agcount * (2 * mp->m_ag_maxlevels) - 1);
+}
+
+/*
+ * When deciding how much space to allocate out of an AG, we limit the
+ * allocation maximum size to the size the AG. However, we cannot use all the
+ * blocks in the AG - some are permanently used by metadata. These
+ * blocks are generally:
+ *	- the AG superblock, AGF, AGI and AGFL
+ *	- the AGF (bno and cnt) and AGI btree root blocks, and optionally
+ *	  the AGI free inode and rmap btree root blocks.
+ *	- blocks on the AGFL according to xfs_alloc_set_aside() limits
+ *
+ * The AG headers are sector sized, so the amount of space they take up is
+ * dependent on filesystem geometry. The others are all single blocks.
+ */
+unsigned int
+xfs_alloc_ag_max_usable(struct xfs_mount *mp)
+{
+	unsigned int	blocks;
+
+	blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
+	blocks += XFS_ALLOC_AGFL_RESERVE;
+	blocks += 3;			/* AGF, AGI btree root blocks */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		blocks++;		/* finobt root block */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		/* rmap root block + full tree split on full AG */
+		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
+	}
+
+	return mp->m_sb.sb_agblocks - blocks;
+}
+
+/*
  * Lookup the record equal to [bno, len] in the btree given by cur.
  */
 STATIC int				/* error */
@@ -1913,6 +1979,9 @@ xfs_alloc_min_freelist(
 	/* space needed by-size freespace btree */
 	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
 				       mp->m_ag_maxlevels);
+	/* space needed reverse mapping used space btree */
+	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
+				       mp->m_ag_maxlevels);
 
 	return min_free;
 }
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 6d0f328..ea2868d 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -56,42 +56,6 @@ typedef unsigned int xfs_alloctype_t;
 #define	XFS_ALLOC_FLAG_FREEING	0x00000002  /* indicate caller is freeing extents*/
 
 /*
- * In order to avoid ENOSPC-related deadlock caused by
- * out-of-order locking of AGF buffer (PV 947395), we place
- * constraints on the relationship among actual allocations for
- * data blocks, freelist blocks, and potential file data bmap
- * btree blocks. However, these restrictions may result in no
- * actual space allocated for a delayed extent, for example, a data
- * block in a certain AG is allocated but there is no additional
- * block for the additional bmap btree block due to a split of the
- * bmap btree of the file. The result of this may lead to an
- * infinite loop in xfssyncd when the file gets flushed to disk and
- * all delayed extents need to be actually allocated. To get around
- * this, we explicitly set aside a few blocks which will not be
- * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks
- * to 4 + 4*agcount.
- */
-#define XFS_ALLOC_SET_ASIDE(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
-
-/*
- * When deciding how much space to allocate out of an AG, we limit the
- * allocation maximum size to the size the AG. However, we cannot use all the
- * blocks in the AG - some are permanently used by metadata. These
- * blocks are generally:
- *	- the AG superblock, AGF, AGI and AGFL
- *	- the AGF (bno and cnt) and AGI btree root blocks
- *	- 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
- *
- * The AG headers are sector sized, so the amount of space they take up is
- * dependent on filesystem geometry. The others are all single blocks.
- */
-#define XFS_ALLOC_AG_MAX_USABLE(mp)	\
-	((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) - 7)
-
-
-/*
  * Argument structure for xfs_alloc routines.
  * This is turned into a structure to avoid having 20 arguments passed
  * down several levels of the stack.
@@ -133,6 +97,11 @@ typedef struct xfs_alloc_arg {
 #define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
 #define XFS_ALLOC_USERDATA_ZERO		(1 << 2)/* zero extent on allocation */
 
+/* freespace limit calculations */
+#define XFS_ALLOC_AGFL_RESERVE	4
+unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
+unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
+
 xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
 		struct xfs_perag *pag, xfs_extlen_t need);
 unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9183602..9fd02de 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3703,7 +3703,7 @@ xfs_bmap_btalloc(
 	args.fsbno = ap->blkno;
 
 	/* Trim the allocation back to the maximum an AG can fit. */
-	args.maxlen = MIN(ap->length, XFS_ALLOC_AG_MAX_USABLE(mp));
+	args.maxlen = MIN(ap->length, mp->m_ag_max_usable);
 	args.firstblock = *ap->firstblock;
 	blen = 0;
 	if (nullfb) {
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 7c3eb6c..c9114b7 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -742,6 +742,8 @@ xfs_sb_mount_common(
 		mp->m_ialloc_min_blks = sbp->sb_spino_align;
 	else
 		mp->m_ialloc_min_blks = mp->m_ialloc_blks;
+	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
+	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
 }
 
 /*
diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index e85a951..ec7bb8b 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -179,7 +179,7 @@ xfs_ioc_trim(
 	 * matter as trimming blocks is an advisory interface.
 	 */
 	if (range.start >= XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks) ||
-	    range.minlen > XFS_FSB_TO_B(mp, XFS_ALLOC_AG_MAX_USABLE(mp)) ||
+	    range.minlen > XFS_FSB_TO_B(mp, mp->m_ag_max_usable) ||
 	    range.len < mp->m_sb.sb_blocksize)
 		return -EINVAL;
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 119be0a..031dd92 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -723,7 +723,7 @@ xfs_fs_counts(
 	cnt->allocino = percpu_counter_read_positive(&mp->m_icount);
 	cnt->freeino = percpu_counter_read_positive(&mp->m_ifree);
 	cnt->freedata = percpu_counter_read_positive(&mp->m_fdblocks) -
-							XFS_ALLOC_SET_ASIDE(mp);
+						mp->m_alloc_set_aside;
 
 	spin_lock(&mp->m_sb_lock);
 	cnt->freertx = mp->m_sb.sb_frextents;
@@ -796,7 +796,7 @@ retry:
 		__int64_t	free;
 
 		free = percpu_counter_sum(&mp->m_fdblocks) -
-							XFS_ALLOC_SET_ASIDE(mp);
+						mp->m_alloc_set_aside;
 		if (!free)
 			goto out; /* ENOSPC and fdblks_delta = 0 */
 
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 3d4a4fb..335bcad 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1215,7 +1215,7 @@ xfs_mod_fdblocks(
 		batch = XFS_FDBLOCKS_BATCH;
 
 	__percpu_counter_add(&mp->m_fdblocks, delta, batch);
-	if (__percpu_counter_compare(&mp->m_fdblocks, XFS_ALLOC_SET_ASIDE(mp),
+	if (__percpu_counter_compare(&mp->m_fdblocks, mp->m_alloc_set_aside,
 				     XFS_FDBLOCKS_BATCH) >= 0) {
 		/* we had space! */
 		return 0;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 6b52575..16f31f3 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -96,6 +96,8 @@ typedef struct xfs_mount {
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* max inobt btree levels. */
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
+	uint			m_alloc_set_aside; /* space we can't use */
+	uint			m_ag_max_usable; /* max space per AG */
 	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
 	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
 	struct mutex		m_growlock;	/* growfs mutex */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 36bd882..85e582d 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1073,7 +1073,7 @@ xfs_fs_statfs(
 	statp->f_blocks = sbp->sb_dblocks - lsize;
 	spin_unlock(&mp->m_sb_lock);
 
-	statp->f_bfree = fdblocks - XFS_ALLOC_SET_ASIDE(mp);
+	statp->f_bfree = fdblocks - mp->m_alloc_set_aside;
 	statp->f_bavail = statp->f_bfree;
 
 	fakeinos = statp->f_bfree << sbp->sb_inopblog;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 19/76] libxfs: fix min freelist length calculation
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (17 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 18/76] xfs: rmap btree requires more reserved free space Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 20/76] xfs: add rmap btree operations Darrick J. Wong
                   ` (57 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

If rmapbt is disabled, it is incorrect to require 1 extra AGFL block
for the rmapbt (due to the + 1); the entire clause needs to be gated
on the feature flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 5a829d7..aa3cc24 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1980,8 +1980,10 @@ xfs_alloc_min_freelist(
 	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
 				       mp->m_ag_maxlevels);
 	/* space needed reverse mapping used space btree */
-	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
-				       mp->m_ag_maxlevels);
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		min_free += min_t(unsigned int,
+				  pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
+				  mp->m_ag_maxlevels);
 
 	return min_free;
 }

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 20/76] xfs: add rmap btree operations
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (18 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 19/76] libxfs: fix min freelist length calculation Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 21/76] xfs: enhance " Darrick J. Wong
                   ` (56 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

Implement the generic btree operations needed to manipulate rmap
btree blocks. This is very similar to the per-ag freespace btree
implementation, and uses the AGFL for allocation and freeing of
blocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_btree.h      |    1 
 fs/xfs/libxfs/xfs_rmap.c       |   58 +++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.c |  204 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 263 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index f7a049b..35e7754 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -212,6 +212,7 @@ typedef struct xfs_btree_cur
 		xfs_alloc_rec_incore_t	a;
 		xfs_bmbt_irec_t		b;
 		xfs_inobt_rec_incore_t	i;
+		struct xfs_rmap_irec	r;
 	}		bc_rec;		/* current insert/search record value */
 	struct xfs_buf	*bc_bufs[XFS_BTREE_MAXLEVELS];	/* buf ptr per level */
 	int		bc_ptrs[XFS_BTREE_MAXLEVELS];	/* key/record # */
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 3e17294..64b2525 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -36,6 +36,64 @@
 #include "xfs_error.h"
 #include "xfs_extent_busy.h"
 
+/*
+ * Lookup the first record less than or equal to [bno, len]
+ * in the btree given by cur.
+ */
+STATIC int
+xfs_rmap_lookup_le(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	int			*stat)
+{
+	cur->bc_rec.r.rm_startblock = bno;
+	cur->bc_rec.r.rm_blockcount = len;
+	cur->bc_rec.r.rm_owner = owner;
+	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+}
+
+/*
+ * Update the record referred to by cur to the value given
+ * by [bno, len, ref].
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_rmap_update(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*irec)
+{
+	union xfs_btree_rec	rec;
+
+	rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock);
+	rec.rmap.rm_blockcount = cpu_to_be32(irec->rm_blockcount);
+	rec.rmap.rm_owner = cpu_to_be64(irec->rm_owner);
+	return xfs_btree_update(cur, &rec);
+}
+
+/*
+ * Get the data from the pointed-to record.
+ */
+STATIC int
+xfs_rmap_get_rec(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*irec,
+	int			*stat)
+{
+	union xfs_btree_rec	*rec;
+	int			error;
+
+	error = xfs_btree_get_rec(cur, &rec, stat);
+	if (error || !*stat)
+		return error;
+
+	irec->rm_startblock = be32_to_cpu(rec->rmap.rm_startblock);
+	irec->rm_blockcount = be32_to_cpu(rec->rmap.rm_blockcount);
+	irec->rm_owner = be64_to_cpu(rec->rmap.rm_owner);
+	return 0;
+}
+
 int
 xfs_rmap_free(
 	struct xfs_trans	*tp,
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 5671771..58bdac3 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -34,6 +34,26 @@
 #include "xfs_error.h"
 #include "xfs_extent_busy.h"
 
+/*
+ * Reverse map btree.
+ *
+ * This is a per-ag tree used to track the owner of a given extent. Owner
+ * records are inserted when an extent is allocated, and removed when an extent
+ * is freed. There can only be one owner of an extent, usually an inode or some
+ * other metadata structure like a AG btree.
+ *
+ * The rmap btree is part of the free space management, so blocks for the tree
+ * are sourced from the agfl. Hence we need transaction reservation support for
+ * this tree so that the freelist is always large enough. This also impacts on
+ * the minimum space we need to leave free in the AG.
+ *
+ * The tree is ordered by block number - there's no need to order/search by
+ * extent size for online updating/management of the tree, and the reverse
+ * lookups are going to be "who owns this block" and so are by-block ordering is
+ * perfect for this.
+ *
+ */
+
 static struct xfs_btree_cur *
 xfs_rmapbt_dup_cursor(
 	struct xfs_btree_cur	*cur)
@@ -42,6 +62,153 @@ xfs_rmapbt_dup_cursor(
 			cur->bc_private.a.agbp, cur->bc_private.a.agno);
 }
 
+STATIC void
+xfs_rmapbt_set_root(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	int			inc)
+{
+	struct xfs_buf		*agbp = cur->bc_private.a.agbp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
+	int			btnum = cur->bc_btnum;
+	struct xfs_perag	*pag = xfs_perag_get(cur->bc_mp, seqno);
+
+	ASSERT(ptr->s != 0);
+
+	agf->agf_roots[btnum] = ptr->s;
+	be32_add_cpu(&agf->agf_levels[btnum], inc);
+	pag->pagf_levels[btnum] += inc;
+	xfs_perag_put(pag);
+
+	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
+}
+
+STATIC int
+xfs_rmapbt_alloc_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*start,
+	union xfs_btree_ptr	*new,
+	int			*stat)
+{
+	int			error;
+	xfs_agblock_t		bno;
+
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
+
+	/* Allocate the new block from the freelist. If we can't, give up.  */
+	error = xfs_alloc_get_freelist(cur->bc_tp, cur->bc_private.a.agbp,
+				       &bno, 1);
+	if (error) {
+		XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR);
+		return error;
+	}
+
+	if (bno == NULLAGBLOCK) {
+		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+		*stat = 0;
+		return 0;
+	}
+
+	xfs_extent_busy_reuse(cur->bc_mp, cur->bc_private.a.agno, bno, 1, false);
+
+	xfs_trans_agbtree_delta(cur->bc_tp, 1);
+	new->s = cpu_to_be32(bno);
+
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+	*stat = 1;
+	return 0;
+}
+
+STATIC int
+xfs_rmapbt_free_block(
+	struct xfs_btree_cur	*cur,
+	struct xfs_buf		*bp)
+{
+	struct xfs_buf		*agbp = cur->bc_private.a.agbp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	xfs_agblock_t		bno;
+	int			error;
+
+	bno = xfs_daddr_to_agbno(cur->bc_mp, XFS_BUF_ADDR(bp));
+	error = xfs_alloc_put_freelist(cur->bc_tp, agbp, NULL, bno, 1);
+	if (error)
+		return error;
+
+	xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1,
+			      XFS_EXTENT_BUSY_SKIP_DISCARD);
+	xfs_trans_agbtree_delta(cur->bc_tp, -1);
+
+	xfs_trans_binval(cur->bc_tp, bp);
+	return 0;
+}
+
+STATIC int
+xfs_rmapbt_get_minrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_rmap_mnr[level != 0];
+}
+
+STATIC int
+xfs_rmapbt_get_maxrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_rmap_mxr[level != 0];
+}
+
+STATIC void
+xfs_rmapbt_init_key_from_rec(
+	union xfs_btree_key	*key,
+	union xfs_btree_rec	*rec)
+{
+	key->rmap.rm_startblock = rec->rmap.rm_startblock;
+}
+
+STATIC void
+xfs_rmapbt_init_rec_from_key(
+	union xfs_btree_key	*key,
+	union xfs_btree_rec	*rec)
+{
+	rec->rmap.rm_startblock = key->rmap.rm_startblock;
+}
+
+STATIC void
+xfs_rmapbt_init_rec_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*rec)
+{
+	rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock);
+	rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount);
+	rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner);
+}
+
+STATIC void
+xfs_rmapbt_init_ptr_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp);
+
+	ASSERT(cur->bc_private.a.agno == be32_to_cpu(agf->agf_seqno));
+	ASSERT(agf->agf_roots[cur->bc_btnum] != 0);
+
+	ptr->s = agf->agf_roots[cur->bc_btnum];
+}
+
+STATIC __int64_t
+xfs_rmapbt_key_diff(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*key)
+{
+	struct xfs_rmap_irec	*rec = &cur->bc_rec.r;
+	struct xfs_rmap_key	*kp = &key->rmap;
+
+	return (__int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+}
+
 static bool
 xfs_rmapbt_verify(
 	struct xfs_buf		*bp)
@@ -133,12 +300,49 @@ const struct xfs_buf_ops xfs_rmapbt_buf_ops = {
 	.verify_write = xfs_rmapbt_write_verify,
 };
 
+#if defined(DEBUG) || defined(XFS_WARN)
+STATIC int
+xfs_rmapbt_keys_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	return be32_to_cpu(k1->rmap.rm_startblock) <
+	       be32_to_cpu(k2->rmap.rm_startblock);
+}
+
+STATIC int
+xfs_rmapbt_recs_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*r1,
+	union xfs_btree_rec	*r2)
+{
+	return be32_to_cpu(r1->rmap.rm_startblock) +
+		be32_to_cpu(r1->rmap.rm_blockcount) <=
+		be32_to_cpu(r2->rmap.rm_startblock);
+}
+#endif	/* DEBUG */
+
 static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.rec_len		= sizeof(struct xfs_rmap_rec),
 	.key_len		= sizeof(struct xfs_rmap_key),
 
 	.dup_cursor		= xfs_rmapbt_dup_cursor,
+	.set_root		= xfs_rmapbt_set_root,
+	.alloc_block		= xfs_rmapbt_alloc_block,
+	.free_block		= xfs_rmapbt_free_block,
+	.get_minrecs		= xfs_rmapbt_get_minrecs,
+	.get_maxrecs		= xfs_rmapbt_get_maxrecs,
+	.init_key_from_rec	= xfs_rmapbt_init_key_from_rec,
+	.init_rec_from_key	= xfs_rmapbt_init_rec_from_key,
+	.init_rec_from_cur	= xfs_rmapbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfs_rmapbt_init_ptr_from_cur,
+	.key_diff		= xfs_rmapbt_key_diff,
 	.buf_ops		= &xfs_rmapbt_buf_ops,
+#if defined(DEBUG) || defined(XFS_WARN)
+	.keys_inorder		= xfs_rmapbt_keys_inorder,
+	.recs_inorder		= xfs_rmapbt_recs_inorder,
+#endif
 };
 
 /*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 21/76] xfs: enhance rmap btree operations
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (19 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 20/76] xfs: add rmap btree operations Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 22/76] xfs: add an extent to the rmap btree Darrick J. Wong
                   ` (55 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Adapt the rmap btree to store owner offsets within each rmap record,
and to handle the primary key being extended to [agblk, owner, offset].
The expansion of the primary key is crucial to allowing multiple owners
per extent.  Unfortunately, doing so adds the requirement that all rmap
records for file extents (metadata always has one owner) correspond to
some bmbt entry somewhere.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_rmap.c       |   32 +++++++++++++++++---
 fs/xfs/libxfs/xfs_rmap_btree.c |   65 ++++++++++++++++++++++++++++++----------
 fs/xfs/libxfs/xfs_rmap_btree.h |    7 ++++
 3 files changed, 84 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 64b2525..f6fe742 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -37,26 +37,48 @@
 #include "xfs_extent_busy.h"
 
 /*
- * Lookup the first record less than or equal to [bno, len]
+ * Lookup the first record less than or equal to [bno, len, owner, offset]
  * in the btree given by cur.
  */
-STATIC int
+int
 xfs_rmap_lookup_le(
 	struct xfs_btree_cur	*cur,
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len,
 	uint64_t		owner,
+	uint64_t		offset,
 	int			*stat)
 {
 	cur->bc_rec.r.rm_startblock = bno;
 	cur->bc_rec.r.rm_blockcount = len;
 	cur->bc_rec.r.rm_owner = owner;
+	cur->bc_rec.r.rm_offset = offset;
 	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
 }
 
 /*
+ * Lookup the record exactly matching [bno, len, owner, offset]
+ * in the btree given by cur.
+ */
+int
+xfs_rmap_lookup_eq(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	uint64_t		offset,
+	int			*stat)
+{
+	cur->bc_rec.r.rm_startblock = bno;
+	cur->bc_rec.r.rm_blockcount = len;
+	cur->bc_rec.r.rm_owner = owner;
+	cur->bc_rec.r.rm_offset = offset;
+	return xfs_btree_lookup(cur, XFS_LOOKUP_EQ, stat);
+}
+
+/*
  * Update the record referred to by cur to the value given
- * by [bno, len, ref].
+ * by [bno, len, owner, offset].
  * This either works (return 0) or gets an EFSCORRUPTED error.
  */
 STATIC int
@@ -69,13 +91,14 @@ xfs_rmap_update(
 	rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock);
 	rec.rmap.rm_blockcount = cpu_to_be32(irec->rm_blockcount);
 	rec.rmap.rm_owner = cpu_to_be64(irec->rm_owner);
+	rec.rmap.rm_offset = cpu_to_be64(irec->rm_offset);
 	return xfs_btree_update(cur, &rec);
 }
 
 /*
  * Get the data from the pointed-to record.
  */
-STATIC int
+int
 xfs_rmap_get_rec(
 	struct xfs_btree_cur	*cur,
 	struct xfs_rmap_irec	*irec,
@@ -91,6 +114,7 @@ xfs_rmap_get_rec(
 	irec->rm_startblock = be32_to_cpu(rec->rmap.rm_startblock);
 	irec->rm_blockcount = be32_to_cpu(rec->rmap.rm_blockcount);
 	irec->rm_owner = be64_to_cpu(rec->rmap.rm_owner);
+	irec->rm_offset = be64_to_cpu(rec->rmap.rm_offset);
 	return 0;
 }
 
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 58bdac3..5fe717b 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -37,21 +37,26 @@
 /*
  * Reverse map btree.
  *
- * This is a per-ag tree used to track the owner of a given extent. Owner
- * records are inserted when an extent is allocated, and removed when an extent
- * is freed. There can only be one owner of an extent, usually an inode or some
- * other metadata structure like a AG btree.
+ * This is a per-ag tree used to track the owner(s) of a given extent. With
+ * reflink it is possible for there to be multiple owners, which is a departure
+ * from classic XFS. Owner records for data extents are inserted when the
+ * extent is mapped and removed when an extent is unmapped.  Owner records for
+ * all other block types (i.e. metadata) are inserted when an extent is
+ * allocated and removed when an extent is freed. There can only be one owner
+ * of a metadata extent, usually an inode or some other metadata structure like
+ * an AG btree.
  *
  * The rmap btree is part of the free space management, so blocks for the tree
  * are sourced from the agfl. Hence we need transaction reservation support for
  * this tree so that the freelist is always large enough. This also impacts on
  * the minimum space we need to leave free in the AG.
  *
- * The tree is ordered by block number - there's no need to order/search by
- * extent size for online updating/management of the tree, and the reverse
- * lookups are going to be "who owns this block" and so are by-block ordering is
- * perfect for this.
- *
+ * The tree is ordered by [ag block, owner, offset]. This is a large key size,
+ * but it is the only way to enforce unique keys when a block can be owned by
+ * multiple files at any offset. There's no need to order/search by extent
+ * size for online updating/management of the tree. It is intended that most
+ * reverse lookups will be to find the owner(s) of a particular block, or to
+ * try to recover tree and file data from corrupt primary metadata.
  */
 
 static struct xfs_btree_cur *
@@ -165,6 +170,8 @@ xfs_rmapbt_init_key_from_rec(
 	union xfs_btree_rec	*rec)
 {
 	key->rmap.rm_startblock = rec->rmap.rm_startblock;
+	key->rmap.rm_owner = rec->rmap.rm_owner;
+	key->rmap.rm_offset = rec->rmap.rm_offset;
 }
 
 STATIC void
@@ -173,6 +180,8 @@ xfs_rmapbt_init_rec_from_key(
 	union xfs_btree_rec	*rec)
 {
 	rec->rmap.rm_startblock = key->rmap.rm_startblock;
+	rec->rmap.rm_owner = key->rmap.rm_owner;
+	rec->rmap.rm_offset = key->rmap.rm_offset;
 }
 
 STATIC void
@@ -183,6 +192,7 @@ xfs_rmapbt_init_rec_from_cur(
 	rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock);
 	rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount);
 	rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner);
+	rec->rmap.rm_offset = cpu_to_be64(cur->bc_rec.r.rm_offset);
 }
 
 STATIC void
@@ -205,8 +215,16 @@ xfs_rmapbt_key_diff(
 {
 	struct xfs_rmap_irec	*rec = &cur->bc_rec.r;
 	struct xfs_rmap_key	*kp = &key->rmap;
-
-	return (__int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+	__int64_t		d;
+
+	d = (__int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+	if (d)
+		return d;
+	d = (__int64_t)be64_to_cpu(kp->rm_owner) - rec->rm_owner;
+	if (d)
+		return d;
+	d = (__int64_t)be64_to_cpu(kp->rm_offset) - rec->rm_offset;
+	return d;
 }
 
 static bool
@@ -307,8 +325,16 @@ xfs_rmapbt_keys_inorder(
 	union xfs_btree_key	*k1,
 	union xfs_btree_key	*k2)
 {
-	return be32_to_cpu(k1->rmap.rm_startblock) <
-	       be32_to_cpu(k2->rmap.rm_startblock);
+	if (be32_to_cpu(k1->rmap.rm_startblock) <
+	    be32_to_cpu(k2->rmap.rm_startblock))
+		return 1;
+	if (be64_to_cpu(k1->rmap.rm_owner) <
+	    be64_to_cpu(k2->rmap.rm_owner))
+		return 1;
+	if (be64_to_cpu(k1->rmap.rm_offset) <=
+	    be64_to_cpu(k2->rmap.rm_offset))
+		return 1;
+	return 0;
 }
 
 STATIC int
@@ -317,9 +343,16 @@ xfs_rmapbt_recs_inorder(
 	union xfs_btree_rec	*r1,
 	union xfs_btree_rec	*r2)
 {
-	return be32_to_cpu(r1->rmap.rm_startblock) +
-		be32_to_cpu(r1->rmap.rm_blockcount) <=
-		be32_to_cpu(r2->rmap.rm_startblock);
+	if (be32_to_cpu(r1->rmap.rm_startblock) <
+	    be32_to_cpu(r2->rmap.rm_startblock))
+		return 1;
+	if (be64_to_cpu(r1->rmap.rm_offset) <
+	    be64_to_cpu(r2->rmap.rm_offset))
+		return 1;
+	if (be64_to_cpu(r1->rmap.rm_owner) <=
+	    be64_to_cpu(r2->rmap.rm_owner))
+		return 1;
+	return 0;
 }
 #endif	/* DEBUG */
 
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index 2e02362..a5c97f8 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -51,6 +51,13 @@ struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
 				xfs_agnumber_t agno);
 int xfs_rmapbt_maxrecs(struct xfs_mount *mp, int blocklen, int leaf);
 
+int xfs_rmap_lookup_le(struct xfs_btree_cur *cur, xfs_agblock_t	bno,
+		xfs_extlen_t len, uint64_t owner, uint64_t offset, int *stat);
+int xfs_rmap_lookup_eq(struct xfs_btree_cur *cur, xfs_agblock_t	bno,
+		xfs_extlen_t len, uint64_t owner, uint64_t offset, int *stat);
+int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
+		int *stat);
+
 int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
 		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
 		   struct xfs_owner_info *oinfo);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 22/76] xfs: add an extent to the rmap btree
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (20 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 21/76] xfs: enhance " Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 23/76] xfs: add tracepoints for the rmap-mirrors-bmbt functions Darrick J. Wong
                   ` (54 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

Now all the btree, free space and transaction infrastructure is in
place, we can finally add the code to insert reverse mappings to the
rmap btree. Freeing will be done in a spearate patch, so just the
addition operation can be focussed on here.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_rmap.c |  139 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index f6fe742..d1b6c82 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -144,6 +144,18 @@ out_error:
 	return error;
 }
 
+/*
+ * When we allocate a new block, the first thing we do is add a reference to the
+ * extent in the rmap btree. This takes the form of a [agbno, length, owner]
+ * record.  Newly inserted extents should never overlap with an existing extent
+ * in the rmap btree. Hence the insertion is a relatively trivial exercise,
+ * involving checking for adjacent records and merging if the new extent is
+ * contiguous and has the same owner.
+ *
+ * Note that we have no MAXEXTLEN limits here when merging as the length in the
+ * record has the full 32 bits available and hence a single record can track the
+ * entire space in the AG.
+ */
 int
 xfs_rmap_alloc(
 	struct xfs_trans	*tp,
@@ -154,18 +166,143 @@ xfs_rmap_alloc(
 	struct xfs_owner_info	*oinfo)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_btree_cur	*cur;
+	struct xfs_rmap_irec	ltrec;
+	struct xfs_rmap_irec	gtrec;
+	int			have_gt;
 	int			error = 0;
+	int			i;
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
 
 	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, oinfo);
-	if (1)
+	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+
+	/*
+	 * For the initial lookup, look for and exact match or the left-adjacent
+	 * record for our insertion point. This will also give us the record for
+	 * start block contiguity tests.
+	 */
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+	error = xfs_rmap_get_rec(cur, &ltrec, &i);
+	if (error)
 		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	//printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, ltrec 0x%x/0x%x/0x%llx\n",
+	//		agno, bno, len, owner, ltrec.rm_startblock,
+	//		ltrec.rm_blockcount, ltrec.rm_owner);
+
+	XFS_WANT_CORRUPTED_GOTO(mp,
+		ltrec.rm_startblock + ltrec.rm_blockcount <= bno, out_error);
+
+	/*
+	 * Increment the cursor to see if we have a right-adjacent record to our
+	 * insertion point. This will give us the record for end block
+	 * contiguity tests.
+	 */
+	error = xfs_btree_increment(cur, 0, &have_gt);
+	if (error)
+		goto out_error;
+	if (have_gt) {
+		error = xfs_rmap_get_rec(cur, &gtrec, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	//printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, gtrec 0x%x/0x%x/0x%llx\n",
+	//		agno, bno, len, owner, gtrec.rm_startblock,
+	//		gtrec.rm_blockcount, gtrec.rm_owner);
+		XFS_WANT_CORRUPTED_GOTO(mp, bno + len <= gtrec.rm_startblock,
+					out_error);
+	} else {
+		gtrec.rm_owner = XFS_RMAP_OWN_NULL;
+	}
+
+	/*
+	 * Note: cursor currently points one record to the right of ltrec, even
+	 * if there is no record in the tree to the right.
+	 */
+	if (ltrec.rm_owner == owner &&
+	    ltrec.rm_startblock + ltrec.rm_blockcount == bno) {
+		/*
+		 * left edge contiguous, merge into left record.
+		 *
+		 *       ltbno     ltlen
+		 * orig:   |ooooooooo|
+		 * adding:           |aaaaaaaaa|
+		 * result: |rrrrrrrrrrrrrrrrrrr|
+		 *                  bno       len
+		 */
+		//printk("add left\n");
+		ltrec.rm_blockcount += len;
+		if (gtrec.rm_owner == owner &&
+		    bno + len == gtrec.rm_startblock) {
+			//printk("add middle\n");
+			/*
+			 * right edge also contiguous, delete right record
+			 * and merge into left record.
+			 *
+			 *       ltbno     ltlen    gtbno     gtlen
+			 * orig:   |ooooooooo|         |ooooooooo|
+			 * adding:           |aaaaaaaaa|
+			 * result: |rrrrrrrrrrrrrrrrrrrrrrrrrrrrr|
+			 */
+			ltrec.rm_blockcount += gtrec.rm_blockcount;
+			error = xfs_btree_delete(cur, &i);
+			if (error)
+				goto out_error;
+			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+		}
+
+		/* point the cursor back to the left record and update */
+		error = xfs_btree_decrement(cur, 0, &have_gt);
+		if (error)
+			goto out_error;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else if (gtrec.rm_owner == owner &&
+		   bno + len == gtrec.rm_startblock) {
+		/*
+		 * right edge contiguous, merge into right record.
+		 *
+		 *                 gtbno     gtlen
+		 * Orig:             |ooooooooo|
+		 * adding: |aaaaaaaaa|
+		 * Result: |rrrrrrrrrrrrrrrrrrr|
+		 *        bno       len
+		 */
+		//printk("add right\n");
+		gtrec.rm_startblock = bno;
+		gtrec.rm_blockcount += len;
+		error = xfs_rmap_update(cur, &gtrec);
+		if (error)
+			goto out_error;
+	} else {
+		//printk("add no match\n");
+		/*
+		 * no contiguous edge with identical owner, insert
+		 * new record at current cursor position.
+		 */
+		cur->bc_rec.r.rm_startblock = bno;
+		cur->bc_rec.r.rm_blockcount = len;
+		cur->bc_rec.r.rm_owner = owner;
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	}
+
 	trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, oinfo);
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
 	return 0;
 
 out_error:
 	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, oinfo);
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
 	return error;
 }

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 23/76] xfs: add tracepoints for the rmap-mirrors-bmbt functions
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (21 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 22/76] xfs: add an extent to the rmap btree Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:58 ` [PATCH 24/76] xfs: teach rmap_alloc how to deal with our larger rmap btree Darrick J. Wong
                   ` (53 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_trace.h |  251 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 251 insertions(+)


diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d41b093..bdd6b18 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1721,6 +1721,257 @@ DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent);
 DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_done);
 DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_error);
 
+/* rmap-mirrors-bmbt traces */
+DECLARE_EVENT_CLASS(xfs_rmap_bmbt3_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_ino_t ino,
+		 int whichfork,
+		 struct xfs_bmbt_irec *left,
+		 struct xfs_bmbt_irec *prev,
+		 struct xfs_bmbt_irec *right),
+	TP_ARGS(mp, agno, ino, whichfork, left, prev, right),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(xfs_fileoff_t, l_loff)
+		__field(xfs_fsblock_t, l_poff)
+		__field(xfs_filblks_t, l_len)
+		__field(xfs_fileoff_t, p_loff)
+		__field(xfs_fsblock_t, p_poff)
+		__field(xfs_filblks_t, p_len)
+		__field(xfs_fileoff_t, r_loff)
+		__field(xfs_fsblock_t, r_poff)
+		__field(xfs_filblks_t, r_len)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->ino = ino;
+		__entry->whichfork = whichfork;
+		__entry->l_loff = left->br_startoff;
+		__entry->l_poff = left->br_startblock;
+		__entry->l_len = left->br_blockcount;
+		__entry->p_loff = prev->br_startoff;
+		__entry->p_poff = prev->br_startblock;
+		__entry->p_len = prev->br_blockcount;
+		__entry->r_loff = right->br_startoff;
+		__entry->r_poff = right->br_startblock;
+		__entry->r_len = right->br_blockcount;
+	),
+	TP_printk("dev %d:%d agno %u ino 0x%llx %s (%llu:%lld:%lld):(%llu:%lld:%lld):(%llu:%lld:%lld)",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->ino,
+		  __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
+		  __entry->l_poff,
+		  __entry->l_len,
+		  __entry->l_loff,
+		  __entry->p_poff,
+		  __entry->p_len,
+		  __entry->p_loff,
+		  __entry->r_poff,
+		  __entry->r_len,
+		  __entry->r_loff)
+);
+#define DEFINE_RMAP_BMBT3_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_bmbt3_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 xfs_ino_t ino, \
+		 int whichfork, \
+		 struct xfs_bmbt_irec *left, \
+		 struct xfs_bmbt_irec *prev, \
+		 struct xfs_bmbt_irec *right), \
+	TP_ARGS(mp, agno, ino, whichfork, left, prev, right))
+
+DECLARE_EVENT_CLASS(xfs_rmap_bmbt2_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_ino_t ino,
+		 int whichfork,
+		 struct xfs_bmbt_irec *left,
+		 struct xfs_bmbt_irec *prev),
+	TP_ARGS(mp, agno, ino, whichfork, left, prev),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(xfs_fileoff_t, l_loff)
+		__field(xfs_fsblock_t, l_poff)
+		__field(xfs_filblks_t, l_len)
+		__field(xfs_fileoff_t, p_loff)
+		__field(xfs_fsblock_t, p_poff)
+		__field(xfs_filblks_t, p_len)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->ino = ino;
+		__entry->whichfork = whichfork;
+		__entry->l_loff = left->br_startoff;
+		__entry->l_poff = left->br_startblock;
+		__entry->l_len = left->br_blockcount;
+		__entry->p_loff = prev->br_startoff;
+		__entry->p_poff = prev->br_startblock;
+		__entry->p_len = prev->br_blockcount;
+	),
+	TP_printk("dev %d:%d agno %u ino 0x%llx %s (%llu:%lld:%lld):(%llu:%lld:%lld)",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->ino,
+		  __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
+		  __entry->l_poff,
+		  __entry->l_len,
+		  __entry->l_loff,
+		  __entry->p_poff,
+		  __entry->p_len,
+		  __entry->p_loff)
+);
+#define DEFINE_RMAP_BMBT2_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_bmbt2_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 xfs_ino_t ino, \
+		 int whichfork, \
+		 struct xfs_bmbt_irec *left, \
+		 struct xfs_bmbt_irec *prev), \
+	TP_ARGS(mp, agno, ino, whichfork, left, prev))
+
+DECLARE_EVENT_CLASS(xfs_rmap_bmbt1_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_ino_t ino,
+		 int whichfork,
+		 struct xfs_bmbt_irec *left),
+	TP_ARGS(mp, agno, ino, whichfork, left),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(xfs_fileoff_t, l_loff)
+		__field(xfs_fsblock_t, l_poff)
+		__field(xfs_filblks_t, l_len)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->ino = ino;
+		__entry->whichfork = whichfork;
+		__entry->l_loff = left->br_startoff;
+		__entry->l_poff = left->br_startblock;
+		__entry->l_len = left->br_blockcount;
+	),
+	TP_printk("dev %d:%d agno %u ino 0x%llx %s (%llu:%lld:%lld)",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->ino,
+		  __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
+		  __entry->l_poff,
+		  __entry->l_len,
+		  __entry->l_loff)
+);
+#define DEFINE_RMAP_BMBT1_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_bmbt1_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 xfs_ino_t ino, \
+		 int whichfork, \
+		 struct xfs_bmbt_irec *left), \
+	TP_ARGS(mp, agno, ino, whichfork, left))
+
+DECLARE_EVENT_CLASS(xfs_rmap_adjust_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_ino_t ino,
+		 int whichfork,
+		 struct xfs_bmbt_irec *left,
+		 long adj),
+	TP_ARGS(mp, agno, ino, whichfork, left, adj),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, ino)
+		__field(int, whichfork)
+		__field(xfs_fileoff_t, l_loff)
+		__field(xfs_fsblock_t, l_poff)
+		__field(xfs_filblks_t, l_len)
+		__field(long, adj)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->ino = ino;
+		__entry->whichfork = whichfork;
+		__entry->l_loff = left->br_startoff;
+		__entry->l_poff = left->br_startblock;
+		__entry->l_len = left->br_blockcount;
+		__entry->adj = adj;
+	),
+	TP_printk("dev %d:%d agno %u ino 0x%llx %s (%llu:%lld:%lld) adj %ld",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->ino,
+		  __entry->whichfork == XFS_ATTR_FORK ? "attr" : "data",
+		  __entry->l_poff,
+		  __entry->l_len,
+		  __entry->l_loff,
+		  __entry->adj)
+);
+#define DEFINE_RMAP_ADJUST_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_adjust_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 xfs_ino_t ino, \
+		 int whichfork, \
+		 struct xfs_bmbt_irec *left, \
+		 long adj), \
+	TP_ARGS(mp, agno, ino, whichfork, left, adj))
+
+DECLARE_EVENT_CLASS(xfs_rmapbt_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_agblock_t agbno, xfs_extlen_t len,
+		 uint64_t owner, uint64_t offset),
+	TP_ARGS(mp, agno, agbno, len, owner, offset),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, agbno)
+		__field(xfs_extlen_t, len)
+		__field(uint64_t, owner)
+		__field(uint64_t, offset)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->agbno = agbno;
+		__entry->len = len;
+		__entry->owner = owner;
+		__entry->offset = offset;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u len %u, owner 0x%llx, offset %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->agbno,
+		  __entry->len,
+		  __entry->owner,
+		  __entry->offset)
+);
+#define DEFINE_RMAPBT_EVENT(name) \
+DEFINE_EVENT(xfs_rmapbt_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 xfs_agblock_t agbno, xfs_extlen_t len, \
+		 uint64_t owner, uint64_t offset), \
+	TP_ARGS(mp, agno, agbno, len, owner, offset))
+
+DEFINE_RMAP_BMBT3_EVENT(xfs_rmap_combine);
+DEFINE_RMAP_BMBT2_EVENT(xfs_rmap_lcombine);
+DEFINE_RMAP_BMBT2_EVENT(xfs_rmap_rcombine);
+DEFINE_RMAP_BMBT1_EVENT(xfs_rmap_insert);
+DEFINE_RMAP_BMBT1_EVENT(xfs_rmap_delete);
+DEFINE_RMAP_ADJUST_EVENT(xfs_rmap_move);
+DEFINE_RMAP_ADJUST_EVENT(xfs_rmap_slide);
+DEFINE_RMAP_ADJUST_EVENT(xfs_rmap_resize);
+DEFINE_RMAPBT_EVENT(xfs_rmapbt_update);
+DEFINE_RMAPBT_EVENT(xfs_rmapbt_insert);
+DEFINE_RMAPBT_EVENT(xfs_rmapbt_delete);
+
 DECLARE_EVENT_CLASS(xfs_da_class,
 	TP_PROTO(struct xfs_da_args *args),
 	TP_ARGS(args),

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 24/76] xfs: teach rmap_alloc how to deal with our larger rmap btree
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (22 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 23/76] xfs: add tracepoints for the rmap-mirrors-bmbt functions Darrick J. Wong
@ 2015-12-19  8:58 ` Darrick J. Wong
  2015-12-19  8:59 ` [PATCH 25/76] xfs: remove an extent from the " Darrick J. Wong
                   ` (52 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:58 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Since reflink will require the rmapbt to mirror bmbt entries exactly, the
xfs_rmap_alloc function is only necessary to handle rmaps for bmbt blocks
and AG btrees.  Therefore, enhancing the function (and its extent merging
code) to handle the larger rmap records can be done simply.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_rmap.c       |   50 +++++++++++++++++++++++++++++++---------
 fs/xfs/libxfs/xfs_rmap_btree.h |    1 +
 2 files changed, 40 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index d1b6c82..a225a64 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -145,16 +145,34 @@ out_error:
 }
 
 /*
- * When we allocate a new block, the first thing we do is add a reference to the
- * extent in the rmap btree. This takes the form of a [agbno, length, owner]
- * record.  Newly inserted extents should never overlap with an existing extent
- * in the rmap btree. Hence the insertion is a relatively trivial exercise,
- * involving checking for adjacent records and merging if the new extent is
- * contiguous and has the same owner.
- *
- * Note that we have no MAXEXTLEN limits here when merging as the length in the
- * record has the full 32 bits available and hence a single record can track the
- * entire space in the AG.
+ * A mergeable rmap should have the same owner, cannot be unwritten, and
+ * must be a bmbt rmap if we're asking about a bmbt rmap.
+ */
+static bool
+is_mergeable_rmap(
+	struct xfs_rmap_irec	*irec,
+	uint64_t		owner,
+	uint64_t		offset)
+{
+	if (irec->rm_owner == XFS_RMAP_OWN_NULL)
+		return false;
+	if (irec->rm_owner != owner)
+		return false;
+	if (XFS_RMAP_IS_UNWRITTEN(irec->rm_blockcount))
+		return false;
+	if (XFS_RMAP_IS_ATTR_FORK(offset) ^
+	    XFS_RMAP_IS_ATTR_FORK(irec->rm_offset))
+		return false;
+	if (XFS_RMAP_IS_BMBT(offset) ^ XFS_RMAP_IS_BMBT(irec->rm_offset))
+		return false;
+	return true;
+}
+
+/*
+ * When we allocate a new block, the first thing we do is add a reference to
+ * the extent in the rmap btree. This takes the form of a [agbno, length,
+ * owner, offset] record.  Flags are encoded in the high bits of the offset
+ * field.
  */
 int
 xfs_rmap_alloc(
@@ -172,6 +190,8 @@ xfs_rmap_alloc(
 	int			have_gt;
 	int			error = 0;
 	int			i;
+	uint64_t		owner;
+	uint64_t		offset;
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
@@ -179,12 +199,14 @@ xfs_rmap_alloc(
 	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, oinfo);
 	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
 
+	xfs_owner_info_unpack(oinfo, &owner, &offset);
+
 	/*
 	 * For the initial lookup, look for and exact match or the left-adjacent
 	 * record for our insertion point. This will also give us the record for
 	 * start block contiguity tests.
 	 */
-	error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, offset, &i);
 	if (error)
 		goto out_error;
 	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
@@ -196,8 +218,11 @@ xfs_rmap_alloc(
 	//printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, ltrec 0x%x/0x%x/0x%llx\n",
 	//		agno, bno, len, owner, ltrec.rm_startblock,
 	//		ltrec.rm_blockcount, ltrec.rm_owner);
+	if (!is_mergeable_rmap(&ltrec, owner, offset))
+		ltrec.rm_owner = XFS_RMAP_OWN_NULL;
 
 	XFS_WANT_CORRUPTED_GOTO(mp,
+		ltrec.rm_owner == XFS_RMAP_OWN_NULL ||
 		ltrec.rm_startblock + ltrec.rm_blockcount <= bno, out_error);
 
 	/*
@@ -221,6 +246,8 @@ xfs_rmap_alloc(
 	} else {
 		gtrec.rm_owner = XFS_RMAP_OWN_NULL;
 	}
+	if (!is_mergeable_rmap(&gtrec, owner, offset))
+		gtrec.rm_owner = XFS_RMAP_OWN_NULL;
 
 	/*
 	 * Note: cursor currently points one record to the right of ltrec, even
@@ -291,6 +318,7 @@ xfs_rmap_alloc(
 		cur->bc_rec.r.rm_startblock = bno;
 		cur->bc_rec.r.rm_blockcount = len;
 		cur->bc_rec.r.rm_owner = owner;
+		cur->bc_rec.r.rm_offset = offset;
 		error = xfs_btree_insert(cur, &i);
 		if (error)
 			goto out_error;
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index a5c97f8..0dfc151 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -58,6 +58,7 @@ int xfs_rmap_lookup_eq(struct xfs_btree_cur *cur, xfs_agblock_t	bno,
 int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
 		int *stat);
 
+/* functions for updating the rmapbt for bmbt blocks and AG btree blocks */
 int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
 		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
 		   struct xfs_owner_info *oinfo);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 25/76] xfs: remove an extent from the rmap btree
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (23 preceding siblings ...)
  2015-12-19  8:58 ` [PATCH 24/76] xfs: teach rmap_alloc how to deal with our larger rmap btree Darrick J. Wong
@ 2015-12-19  8:59 ` Darrick J. Wong
  2015-12-19  8:59 ` [PATCH 26/76] xfs: enhanced " Darrick J. Wong
                   ` (51 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:59 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

Now that we have records in the rmap btree, we need to remove them
when extents are freed. This needs to find the relevant record in
the btree and remove/trim/split it accordingly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_rmap.c |  154 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index a225a64..7ad3cd5 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -118,6 +118,31 @@ xfs_rmap_get_rec(
 	return 0;
 }
 
+/*
+ * Find the extent in the rmap btree and remove it.
+ *
+ * The record we find should always span a range greater than or equal to the
+ * the extent being freed. This makes the code simple as, in theory, we do not
+ * have to handle ranges that are split across multiple records as extents that
+ * result in bmap btree extent merges should also result in rmap btree extent
+ * merges.  The owner field ensures we don't merge extents from different
+ * structures into the same record, hence this property should always hold true
+ * if we ensure that the rmap btree supports at least the same size maximum
+ * extent as the bmap btree (bmbt MAXEXTLEN is 2^21 blocks at present, rmap
+ * btree record can hold 2^32 blocks in a single extent).
+ *
+ * Special Case #1: when growing the filesystem, we "free" an extent when
+ * growing the last AG. This extent is new space and so it is not tracked as
+ * used space in the btree. The growfs code will pass in an owner of
+ * XFS_RMAP_OWN_NULL to indicate that it expected that there is no owner of this
+ * extent. We verify that - the extent lookup result in a record that does not
+ * overlap.
+ *
+ * Special Case #2: EFIs do not record the owner of the extent, so when
+ * recovering EFIs from the log we pass in XFS_RMAP_OWN_UNKNOWN to tell the rmap
+ * btree to ignore the owner (i.e. wildcard match) so we don't trigger
+ * corruption checks during log recovery.
+ */
 int
 xfs_rmap_free(
 	struct xfs_trans	*tp,
@@ -128,19 +153,146 @@ xfs_rmap_free(
 	struct xfs_owner_info	*oinfo)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_btree_cur	*cur;
+	struct xfs_rmap_irec	ltrec;
 	int			error = 0;
+	int			i;
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
 
 	trace_xfs_rmap_free_extent(mp, agno, bno, len, oinfo);
-	if (1)
+	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+
+	/*
+	 * We should always have a left record because there's a static record
+	 * for the AG headers at rm_startblock == 0 created by mkfs/growfs that
+	 * will not ever be removed from the tree.
+	 */
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+	error = xfs_rmap_get_rec(cur, &ltrec, &i);
+	if (error)
 		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+	/*
+	 * For growfs, the incoming extent must be beyond the left record we
+	 * just found as it is new space and won't be used by anyone. This is
+	 * just a corruption check as we don't actually do anything with this
+	 * extent.
+	 */
+	if (owner == XFS_RMAP_OWN_NULL) {
+		XFS_WANT_CORRUPTED_GOTO(mp, bno > ltrec.rm_startblock +
+						ltrec.rm_blockcount, out_error);
+		goto out_done;
+	}
+
+/*
+	if (owner != ltrec.rm_owner ||
+	    bno > ltrec.rm_startblock + ltrec.rm_blockcount)
+ */
+	//printk("rmfree  ag %d bno 0x%x/0x%x/0x%llx, ltrec 0x%x/0x%x/0x%llx\n",
+	//		agno, bno, len, owner, ltrec.rm_startblock,
+	//		ltrec.rm_blockcount, ltrec.rm_owner);
+
+	/* make sure the extent we found covers the entire freeing range. */
+	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno, out_error);
+	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_blockcount >= len, out_error);
+	XFS_WANT_CORRUPTED_GOTO(mp,
+		bno <= ltrec.rm_startblock + ltrec.rm_blockcount, out_error);
+
+	/* make sure the owner matches what we expect to find in the tree */
+	XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner ||
+				    (owner < XFS_RMAP_OWN_NULL &&
+				     owner >= XFS_RMAP_OWN_MIN), out_error);
+
+	if (ltrec.rm_startblock == bno && ltrec.rm_blockcount == len) {
+	//printk("remove exact\n");
+		/* exact match, simply remove the record from rmap tree */
+		error = xfs_btree_delete(cur, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	} else if (ltrec.rm_startblock == bno) {
+	//printk("remove left\n");
+		/*
+		 * overlap left hand side of extent: move the start, trim the
+		 * length and update the current record.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing: |fffffffff|
+		 * Result:            |rrrrrrrrrr|
+		 *         bno       len
+		 */
+		ltrec.rm_startblock += len;
+		ltrec.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else if (ltrec.rm_startblock + ltrec.rm_blockcount == bno + len) {
+	//printk("remove right\n");
+		/*
+		 * overlap right hand side of extent: trim the length and update
+		 * the current record.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing:            |fffffffff|
+		 * Result:  |rrrrrrrrrr|
+		 *                    bno       len
+		 */
+		ltrec.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else {
+
+		/*
+		 * overlap middle of extent: trim the length of the existing
+		 * record to the length of the new left-extent size, increment
+		 * the insertion position so we can insert a new record
+		 * containing the remaining right-extent space.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing:       |fffffffff|
+		 * Result:  |rrrrr|         |rrrr|
+		 *               bno       len
+		 */
+		xfs_extlen_t	orig_len = ltrec.rm_blockcount;
+	//printk("remove middle\n");
+
+		ltrec.rm_blockcount = bno - ltrec.rm_startblock;;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+
+		error = xfs_btree_increment(cur, 0, &i);
+		if (error)
+			goto out_error;
+
+		cur->bc_rec.r.rm_startblock = bno + len;
+		cur->bc_rec.r.rm_blockcount = orig_len - len -
+						     ltrec.rm_blockcount;
+		cur->bc_rec.r.rm_owner = ltrec.rm_owner;
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto out_error;
+	}
+
+out_done:
 	trace_xfs_rmap_free_extent_done(mp, agno, bno, len, oinfo);
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
 	return 0;
 
 out_error:
 	trace_xfs_rmap_free_extent_error(mp, agno, bno, len, oinfo);
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
 	return error;
 }
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 26/76] xfs: enhanced remove an extent from the rmap btree
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (24 preceding siblings ...)
  2015-12-19  8:59 ` [PATCH 25/76] xfs: remove an extent from the " Darrick J. Wong
@ 2015-12-19  8:59 ` Darrick J. Wong
  2015-12-19  8:59 ` [PATCH 27/76] xfs: add rmap btree insert and delete helpers Darrick J. Wong
                   ` (50 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:59 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Simplify the rmap extent removal since we never merge extents or anything
fancy like that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_rmap.c |   49 ++++++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 17 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 7ad3cd5..8b244c4 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -121,15 +121,8 @@ xfs_rmap_get_rec(
 /*
  * Find the extent in the rmap btree and remove it.
  *
- * The record we find should always span a range greater than or equal to the
- * the extent being freed. This makes the code simple as, in theory, we do not
- * have to handle ranges that are split across multiple records as extents that
- * result in bmap btree extent merges should also result in rmap btree extent
- * merges.  The owner field ensures we don't merge extents from different
- * structures into the same record, hence this property should always hold true
- * if we ensure that the rmap btree supports at least the same size maximum
- * extent as the bmap btree (bmbt MAXEXTLEN is 2^21 blocks at present, rmap
- * btree record can hold 2^32 blocks in a single extent).
+ * The record we find should always be an exact match for the extent that we're
+ * looking for, since we insert them into the btree without modification.
  *
  * Special Case #1: when growing the filesystem, we "free" an extent when
  * growing the last AG. This extent is new space and so it is not tracked as
@@ -155,8 +148,11 @@ xfs_rmap_free(
 	struct xfs_mount	*mp = tp->t_mountp;
 	struct xfs_btree_cur	*cur;
 	struct xfs_rmap_irec	ltrec;
+	uint64_t		ltoff;
 	int			error = 0;
 	int			i;
+	uint64_t		owner;
+	uint64_t		offset;
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
@@ -164,12 +160,15 @@ xfs_rmap_free(
 	trace_xfs_rmap_free_extent(mp, agno, bno, len, oinfo);
 	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
 
+	xfs_owner_info_unpack(oinfo, &owner, &offset);
+
+	ltoff = ltrec.rm_offset & ~XFS_RMAP_OFF_BMBT;
 	/*
 	 * We should always have a left record because there's a static record
 	 * for the AG headers at rm_startblock == 0 created by mkfs/growfs that
 	 * will not ever be removed from the tree.
 	 */
-	error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, offset, &i);
 	if (error)
 		goto out_error;
 	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
@@ -200,15 +199,30 @@ xfs_rmap_free(
 	//		ltrec.rm_blockcount, ltrec.rm_owner);
 
 	/* make sure the extent we found covers the entire freeing range. */
-	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno, out_error);
-	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_blockcount >= len, out_error);
-	XFS_WANT_CORRUPTED_GOTO(mp,
-		bno <= ltrec.rm_startblock + ltrec.rm_blockcount, out_error);
+	XFS_WANT_CORRUPTED_GOTO(mp, !XFS_RMAP_IS_UNWRITTEN(ltrec.rm_blockcount),
+		out_error);
+	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno &&
+		ltrec.rm_startblock + XFS_RMAP_LEN(ltrec.rm_blockcount) >=
+		bno + len, out_error);
 
 	/* make sure the owner matches what we expect to find in the tree */
 	XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner ||
-				    (owner < XFS_RMAP_OWN_NULL &&
-				     owner >= XFS_RMAP_OWN_MIN), out_error);
+				    XFS_RMAP_NON_INODE_OWNER(owner), out_error);
+
+	/* check the offset, if necessary */
+	if (!XFS_RMAP_NON_INODE_OWNER(owner)) {
+		if (XFS_RMAP_IS_BMBT(offset)) {
+			XFS_WANT_CORRUPTED_GOTO(mp,
+					XFS_RMAP_IS_BMBT(ltrec.rm_offset),
+					out_error);
+		} else {
+			XFS_WANT_CORRUPTED_GOTO(mp,
+					ltrec.rm_offset <= offset, out_error);
+			XFS_WANT_CORRUPTED_GOTO(mp,
+					offset <= ltoff + ltrec.rm_blockcount,
+					out_error);
+		}
+	}
 
 	if (ltrec.rm_startblock == bno && ltrec.rm_blockcount == len) {
 	//printk("remove exact\n");
@@ -267,7 +281,7 @@ xfs_rmap_free(
 		xfs_extlen_t	orig_len = ltrec.rm_blockcount;
 	//printk("remove middle\n");
 
-		ltrec.rm_blockcount = bno - ltrec.rm_startblock;;
+		ltrec.rm_blockcount = bno - ltrec.rm_startblock;
 		error = xfs_rmap_update(cur, &ltrec);
 		if (error)
 			goto out_error;
@@ -280,6 +294,7 @@ xfs_rmap_free(
 		cur->bc_rec.r.rm_blockcount = orig_len - len -
 						     ltrec.rm_blockcount;
 		cur->bc_rec.r.rm_owner = ltrec.rm_owner;
+		cur->bc_rec.r.rm_offset = offset;
 		error = xfs_btree_insert(cur, &i);
 		if (error)
 			goto out_error;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 27/76] xfs: add rmap btree insert and delete helpers
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (25 preceding siblings ...)
  2015-12-19  8:59 ` [PATCH 26/76] xfs: enhanced " Darrick J. Wong
@ 2015-12-19  8:59 ` Darrick J. Wong
  2015-12-19  8:59 ` [PATCH 28/76] xfs: piggyback rmapbt update intents in the bmap free structure Darrick J. Wong
                   ` (49 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:59 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add a couple of helper functions to encapsulate rmap btree insert and
delete operations.  Add tracepoints to the update function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_rmap.c       |   62 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.h |    2 +
 2 files changed, 64 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 8b244c4..6733873 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -88,6 +88,10 @@ xfs_rmap_update(
 {
 	union xfs_btree_rec	rec;
 
+	trace_xfs_rmapbt_update(cur->bc_mp, cur->bc_private.a.agno,
+			irec->rm_startblock, irec->rm_blockcount,
+			irec->rm_owner, irec->rm_offset);
+
 	rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock);
 	rec.rmap.rm_blockcount = cpu_to_be32(irec->rm_blockcount);
 	rec.rmap.rm_owner = cpu_to_be64(irec->rm_owner);
@@ -95,6 +99,64 @@ xfs_rmap_update(
 	return xfs_btree_update(cur, &rec);
 }
 
+int
+xfs_rmapbt_insert(
+	struct xfs_btree_cur	*rcur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	uint64_t		offset)
+{
+	int			i;
+	int			error;
+
+	trace_xfs_rmapbt_insert(rcur->bc_mp, rcur->bc_private.a.agno, agbno,
+			len, owner, offset);
+
+	error = xfs_rmap_lookup_eq(rcur, agbno, len, owner, offset, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 0, done);
+
+	rcur->bc_rec.r.rm_startblock = agbno;
+	rcur->bc_rec.r.rm_blockcount = len;
+	rcur->bc_rec.r.rm_owner = owner;
+	rcur->bc_rec.r.rm_offset = offset;
+	error = xfs_btree_insert(rcur, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+done:
+	return error;
+}
+
+STATIC int
+xfs_rmapbt_delete(
+	struct xfs_btree_cur	*rcur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	uint64_t		offset)
+{
+	int			i;
+	int			error;
+
+	trace_xfs_rmapbt_delete(rcur->bc_mp, rcur->bc_private.a.agno, agbno,
+			len, owner, offset);
+
+	error = xfs_rmap_lookup_eq(rcur, agbno, len, owner, offset, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+
+	error = xfs_btree_delete(rcur, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+done:
+	return error;
+}
+
 /*
  * Get the data from the pointed-to record.
  */
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index 0dfc151..d7c9722 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -55,6 +55,8 @@ int xfs_rmap_lookup_le(struct xfs_btree_cur *cur, xfs_agblock_t	bno,
 		xfs_extlen_t len, uint64_t owner, uint64_t offset, int *stat);
 int xfs_rmap_lookup_eq(struct xfs_btree_cur *cur, xfs_agblock_t	bno,
 		xfs_extlen_t len, uint64_t owner, uint64_t offset, int *stat);
+int xfs_rmapbt_insert(struct xfs_btree_cur *rcur, xfs_agblock_t	agbno,
+		xfs_extlen_t len, uint64_t owner, uint64_t offset);
 int xfs_rmap_get_rec(struct xfs_btree_cur *cur, struct xfs_rmap_irec *irec,
 		int *stat);
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 28/76] xfs: piggyback rmapbt update intents in the bmap free structure
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (26 preceding siblings ...)
  2015-12-19  8:59 ` [PATCH 27/76] xfs: add rmap btree insert and delete helpers Darrick J. Wong
@ 2015-12-19  8:59 ` Darrick J. Wong
  2015-12-19  8:59 ` [PATCH 29/76] xfs: bmap btree changes should update rmap btree Darrick J. Wong
                   ` (48 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:59 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Extend the xfs_bmap_free structure to track a list of rmapbt update
intents.  In a subsequent patch, we'll defer all data fork rmapbt
edits until we're done making changes to the bmbt, at which point we
can replay the rmap edits in order of increasing AG number.  This
enables us to avoid deadlocks by complying with AG lock ordering
rules.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c        |   18 +++++++++---------
 fs/xfs/libxfs/xfs_attr_remote.c |    4 ++--
 fs/xfs/libxfs/xfs_bmap.c        |    4 ++--
 fs/xfs/libxfs/xfs_bmap.h        |   13 ++++++++++++-
 fs/xfs/xfs_bmap_util.c          |   10 ++++++----
 fs/xfs/xfs_dquot.c              |    4 ++--
 fs/xfs/xfs_inode.c              |   12 ++++++------
 fs/xfs/xfs_iomap.c              |    7 ++++---
 fs/xfs/xfs_rtalloc.c            |    2 +-
 fs/xfs/xfs_symlink.c            |    4 ++--
 10 files changed, 46 insertions(+), 32 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index f949818..ea49e6f 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -336,7 +336,7 @@ xfs_attr_set(
 		error = xfs_attr_shortform_to_leaf(&args);
 		if (!error) {
 			error = xfs_bmap_finish(&args.trans, args.flist,
-						&committed);
+						&committed, dp);
 		}
 		if (error) {
 			ASSERT(committed);
@@ -630,7 +630,7 @@ xfs_attr_leaf_addname(xfs_da_args_t *args)
 		error = xfs_attr3_leaf_to_node(args);
 		if (!error) {
 			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
+						&committed, dp);
 		}
 		if (error) {
 			ASSERT(committed);
@@ -732,7 +732,7 @@ xfs_attr_leaf_addname(xfs_da_args_t *args)
 			if (!error) {
 				error = xfs_bmap_finish(&args->trans,
 							args->flist,
-							&committed);
+							&committed, dp);
 			}
 			if (error) {
 				ASSERT(committed);
@@ -805,7 +805,7 @@ xfs_attr_leaf_removename(xfs_da_args_t *args)
 		/* bp is gone due to xfs_da_shrink_inode */
 		if (!error) {
 			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
+						&committed, dp);
 		}
 		if (error) {
 			ASSERT(committed);
@@ -941,7 +941,7 @@ restart:
 			if (!error) {
 				error = xfs_bmap_finish(&args->trans,
 							args->flist,
-							&committed);
+							&committed, dp);
 			}
 			if (error) {
 				ASSERT(committed);
@@ -979,7 +979,7 @@ restart:
 		error = xfs_da3_split(state);
 		if (!error) {
 			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
+						&committed, dp);
 		}
 		if (error) {
 			ASSERT(committed);
@@ -1089,7 +1089,7 @@ restart:
 			if (!error) {
 				error = xfs_bmap_finish(&args->trans,
 							args->flist,
-							&committed);
+							&committed, dp);
 			}
 			if (error) {
 				ASSERT(committed);
@@ -1222,7 +1222,7 @@ xfs_attr_node_removename(xfs_da_args_t *args)
 		error = xfs_da3_join(state);
 		if (!error) {
 			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
+						&committed, dp);
 		}
 		if (error) {
 			ASSERT(committed);
@@ -1268,7 +1268,7 @@ xfs_attr_node_removename(xfs_da_args_t *args)
 			if (!error) {
 				error = xfs_bmap_finish(&args->trans,
 							args->flist,
-							&committed);
+							&committed, dp);
 			}
 			if (error) {
 				ASSERT(committed);
diff --git a/fs/xfs/libxfs/xfs_attr_remote.c b/fs/xfs/libxfs/xfs_attr_remote.c
index 5ab95ff..d0d1fd2 100644
--- a/fs/xfs/libxfs/xfs_attr_remote.c
+++ b/fs/xfs/libxfs/xfs_attr_remote.c
@@ -468,7 +468,7 @@ xfs_attr_rmtval_set(
 				  args->total, &map, &nmap, args->flist);
 		if (!error) {
 			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
+						&committed, dp);
 		}
 		if (error) {
 			ASSERT(committed);
@@ -622,7 +622,7 @@ xfs_attr_rmtval_remove(
 				    args->flist, &done);
 		if (!error) {
 			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
+						&committed, args->dp);
 		}
 		if (error) {
 			ASSERT(committed);
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9fd02de..dfc634a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1208,7 +1208,7 @@ xfs_bmap_add_attrfork(
 			xfs_log_sb(tp);
 	}
 
-	error = xfs_bmap_finish(&tp, &flist, &committed);
+	error = xfs_bmap_finish(&tp, &flist, &committed, NULL);
 	if (error)
 		goto bmap_cancel;
 	error = xfs_trans_commit(tp);
@@ -5967,7 +5967,7 @@ xfs_bmap_split_extent(
 	if (error)
 		goto out;
 
-	error = xfs_bmap_finish(&tp, &free_list, &committed);
+	error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 	if (error)
 		goto out;
 
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index e7cd789..fcea496 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -56,6 +56,7 @@ struct xfs_bmalloca {
 	bool			conv;	/* overwriting unwritten extents */
 	char			userdata;/* userdata mask */
 	int			flags;
+	struct xfs_rmap_list	*rlist;
 };
 
 /*
@@ -70,6 +71,13 @@ typedef struct xfs_bmap_free_item
 	struct xfs_bmap_free_item *xbfi_next;	/* link to next entry */
 } xfs_bmap_free_item_t;
 
+struct xfs_rmap_intent;
+
+struct xfs_rmap_list {
+	struct xfs_rmap_intent			*rl_first;
+	unsigned int				rl_count;
+};
+
 /*
  * Header for free extent list.
  *
@@ -89,6 +97,7 @@ typedef	struct xfs_bmap_free
 	xfs_bmap_free_item_t	*xbf_first;	/* list of to-be-free extents */
 	int			xbf_count;	/* count of items on list */
 	int			xbf_low;	/* alloc in low mode */
+	struct xfs_rmap_list	xbf_rlist;	/* rmap intent list */
 } xfs_bmap_free_t;
 
 #define	XFS_BMAP_MAX_NMAP	4
@@ -144,6 +153,8 @@ static inline void xfs_bmap_init(xfs_bmap_free_t *flp, xfs_fsblock_t *fbp)
 {
 	((flp)->xbf_first = NULL, (flp)->xbf_count = 0, \
 		(flp)->xbf_low = 0, *(fbp) = NULLFSBLOCK);
+	flp->xbf_rlist.rl_first = NULL;
+	flp->xbf_rlist.rl_count = 0;
 }
 
 /*
@@ -197,7 +208,7 @@ void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
 			  struct xfs_owner_info *oinfo);
 void	xfs_bmap_cancel(struct xfs_bmap_free *flist);
 int	xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
-			int *committed);
+			int *committed, struct xfs_inode *ip);
 void	xfs_bmap_compute_maxlevels(struct xfs_mount *mp, int whichfork);
 int	xfs_bmap_first_unused(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_extlen_t len, xfs_fileoff_t *unused, int whichfork);
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 0543779..b8dfa93 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -40,6 +40,7 @@
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_log.h"
+#include "xfs_rmap_btree.h"
 
 /* Kernel only BMAP related definitions and functions */
 
@@ -98,7 +99,8 @@ int						/* error */
 xfs_bmap_finish(
 	struct xfs_trans		**tp,	/* transaction pointer addr */
 	struct xfs_bmap_free		*flist,	/* i/o: list extents to free */
-	int				*committed)/* xact committed or not */
+	int				*committed,/* xact committed or not */
+	struct xfs_inode		*ip)	/* inode to join with rmap transaction */
 {
 	struct xfs_efd_log_item		*efd;	/* extent free data */
 	struct xfs_efi_log_item		*efi;	/* extent free intention */
@@ -1072,7 +1074,7 @@ xfs_alloc_file_space(
 		/*
 		 * Complete the transaction
 		 */
-		error = xfs_bmap_finish(&tp, &free_list, &committed);
+		error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 		if (error) {
 			goto error0;
 		}
@@ -1354,7 +1356,7 @@ xfs_free_file_space(
 		/*
 		 * complete the transaction
 		 */
-		error = xfs_bmap_finish(&tp, &free_list, &committed);
+		error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 		if (error) {
 			goto error0;
 		}
@@ -1527,7 +1529,7 @@ xfs_shift_file_space(
 		if (error)
 			goto out_bmap_cancel;
 
-		error = xfs_bmap_finish(&tp, &free_list, &committed);
+		error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 		if (error)
 			goto out_bmap_cancel;
 
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 7ac6c5c..8c7c9fb 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -379,9 +379,9 @@ xfs_qm_dqalloc(
 
 	xfs_trans_bhold(tp, bp);
 
-	if ((error = xfs_bmap_finish(tpp, &flist, &committed))) {
+	error = xfs_bmap_finish(tpp, &flist, &committed, NULL);
+	if (error)
 		goto error1;
-	}
 
 	if (committed) {
 		tp = *tpp;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 8ee3939..55a7e00 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1275,7 +1275,7 @@ xfs_create(
 	 */
 	xfs_qm_vop_create_dqattach(tp, ip, udqp, gdqp, pdqp);
 
-	error = xfs_bmap_finish(&tp, &free_list, &committed);
+	error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 	if (error)
 		goto out_bmap_cancel;
 
@@ -1506,7 +1506,7 @@ xfs_link(
 		xfs_trans_set_sync(tp);
 	}
 
-	error = xfs_bmap_finish (&tp, &free_list, &committed);
+	error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 	if (error) {
 		xfs_bmap_cancel(&free_list);
 		goto error_return;
@@ -1601,7 +1601,7 @@ xfs_itruncate_extents(
 		 * Duplicate the transaction that has the permanent
 		 * reservation and commit the old transaction.
 		 */
-		error = xfs_bmap_finish(&tp, &free_list, &committed);
+		error = xfs_bmap_finish(&tp, &free_list, &committed, ip);
 		if (committed)
 			xfs_trans_ijoin(tp, ip, 0);
 		if (error)
@@ -1841,7 +1841,7 @@ xfs_inactive_ifree(
 	 * Just ignore errors at this point.  There is nothing we can do except
 	 * to try to keep going. Make sure it's not a silent error.
 	 */
-	error = xfs_bmap_finish(&tp,  &free_list, &committed);
+	error = xfs_bmap_finish(&tp,  &free_list, &committed, NULL);
 	if (error) {
 		xfs_notice(mp, "%s: xfs_bmap_finish returned error %d",
 			__func__, error);
@@ -2624,7 +2624,7 @@ xfs_remove(
 	if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
 		xfs_trans_set_sync(tp);
 
-	error = xfs_bmap_finish(&tp, &free_list, &committed);
+	error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 	if (error)
 		goto out_bmap_cancel;
 
@@ -2711,7 +2711,7 @@ xfs_finish_rename(
 	if (tp->t_mountp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
 		xfs_trans_set_sync(tp);
 
-	error = xfs_bmap_finish(&tp, free_list, &committed);
+	error = xfs_bmap_finish(&tp, free_list, &committed, NULL);
 	if (error) {
 		xfs_bmap_cancel(free_list);
 		xfs_trans_cancel(tp);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f4f5b43..96dcaeb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -247,7 +247,7 @@ xfs_iomap_write_direct(
 	/*
 	 * Complete the transaction
 	 */
-	error = xfs_bmap_finish(&tp, &free_list, &committed);
+	error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 	if (error)
 		goto out_bmap_cancel;
 
@@ -794,7 +794,8 @@ xfs_iomap_write_allocate(
 			if (error)
 				goto trans_cancel;
 
-			error = xfs_bmap_finish(&tp, &free_list, &committed);
+			error = xfs_bmap_finish(&tp, &free_list, &committed,
+					NULL);
 			if (error)
 				goto trans_cancel;
 
@@ -924,7 +925,7 @@ xfs_iomap_write_unwritten(
 			xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 		}
 
-		error = xfs_bmap_finish(&tp, &free_list, &committed);
+		error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 		if (error)
 			goto error_on_bmapi_transaction;
 
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index ab1bac6..96db032 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -811,7 +811,7 @@ xfs_growfs_rt_alloc(
 		/*
 		 * Free any blocks freed up in the transaction, then commit.
 		 */
-		error = xfs_bmap_finish(&tp, &flist, &committed);
+		error = xfs_bmap_finish(&tp, &flist, &committed, NULL);
 		if (error)
 			goto out_bmap_cancel;
 		error = xfs_trans_commit(tp);
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 996481e..6303362 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -387,7 +387,7 @@ xfs_symlink(
 		xfs_trans_set_sync(tp);
 	}
 
-	error = xfs_bmap_finish(&tp, &free_list, &committed);
+	error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
 	if (error)
 		goto out_bmap_cancel;
 
@@ -510,7 +510,7 @@ xfs_inactive_symlink_rmt(
 	/*
 	 * Commit the first transaction.  This logs the EFI and the inode.
 	 */
-	error = xfs_bmap_finish(&tp, &free_list, &committed);
+	error = xfs_bmap_finish(&tp, &free_list, &committed, ip);
 	if (error)
 		goto error_bmap_cancel;
 	/*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 29/76] xfs: bmap btree changes should update rmap btree
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (27 preceding siblings ...)
  2015-12-19  8:59 ` [PATCH 28/76] xfs: piggyback rmapbt update intents in the bmap free structure Darrick J. Wong
@ 2015-12-19  8:59 ` Darrick J. Wong
  2015-12-19  8:59 ` [PATCH 30/76] xfs: add rmap btree geometry feature flag Darrick J. Wong
                   ` (47 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:59 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Any update to a file's bmap should make the corresponding change to
the rmapbt.  On a reflink filesystem, this is absolutely required
because a given (file data) physical block can have multiple owners
and the only sane way to find an rmap given a bmap is if there is a
1:1 correspondence.

(At some point we can optimize this for non-reflink filesystems
because regular merge still works there.)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c       |  179 ++++++++++
 fs/xfs/libxfs/xfs_rmap.c       |  698 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.h |   60 +++
 fs/xfs/xfs_bmap_util.c         |   12 +
 4 files changed, 941 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index dfc634a..5d1290e 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -45,6 +45,7 @@
 #include "xfs_symlink.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_filestream.h"
+#include "xfs_rmap_btree.h"
 
 
 kmem_zone_t		*xfs_bmap_free_item_zone;
@@ -626,6 +627,8 @@ xfs_bmap_cancel(
 	xfs_bmap_free_item_t	*free;	/* free list item */
 	xfs_bmap_free_item_t	*next;
 
+	xfs_rmap_cancel(&flist->xbf_rlist);
+
 	if (flist->xbf_count == 0)
 		return;
 	ASSERT(flist->xbf_first != NULL);
@@ -1848,6 +1851,10 @@ xfs_bmap_add_extent_delay_real(
 			if (error)
 				goto done;
 		}
+		error = xfs_rmap_combine(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, &LEFT, &RIGHT, &PREV);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG:
@@ -1880,6 +1887,10 @@ xfs_bmap_add_extent_delay_real(
 			if (error)
 				goto done;
 		}
+		error = xfs_rmap_resize(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, &LEFT, PREV.br_blockcount);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
@@ -1911,6 +1922,10 @@ xfs_bmap_add_extent_delay_real(
 			if (error)
 				goto done;
 		}
+		error = xfs_rmap_move(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, &RIGHT, -PREV.br_blockcount);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
@@ -1940,6 +1955,10 @@ xfs_bmap_add_extent_delay_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
+		error = xfs_rmap_insert(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, new);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG:
@@ -1975,6 +1994,10 @@ xfs_bmap_add_extent_delay_real(
 			if (error)
 				goto done;
 		}
+		error = xfs_rmap_resize(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, &LEFT, new->br_blockcount);
+		if (error)
+			goto done;
 		da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
 			startblockval(PREV.br_startblock));
 		xfs_bmbt_set_startblock(ep, nullstartblock(da_new));
@@ -2010,6 +2033,10 @@ xfs_bmap_add_extent_delay_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
+		error = xfs_rmap_insert(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, new);
+		if (error)
+			goto done;
 
 		if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
@@ -2058,6 +2085,8 @@ xfs_bmap_add_extent_delay_real(
 			if (error)
 				goto done;
 		}
+		error = xfs_rmap_move(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, &RIGHT, -new->br_blockcount);
 
 		da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
 			startblockval(PREV.br_startblock));
@@ -2094,6 +2123,10 @@ xfs_bmap_add_extent_delay_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
+		error = xfs_rmap_insert(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, new);
+		if (error)
+			goto done;
 
 		if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
@@ -2163,6 +2196,10 @@ xfs_bmap_add_extent_delay_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
+		error = xfs_rmap_insert(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, new);
+		if (error)
+			goto done;
 
 		if (xfs_bmap_needs_btree(bma->ip, whichfork)) {
 			error = xfs_bmap_extents_to_btree(bma->tp, bma->ip,
@@ -2404,6 +2441,10 @@ xfs_bmap_add_extent_unwritten_real(
 				RIGHT.br_blockcount, LEFT.br_state)))
 				goto done;
 		}
+		error = xfs_rmap_combine(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &LEFT, &RIGHT, &PREV);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_LEFT_CONTIG:
@@ -2441,6 +2482,10 @@ xfs_bmap_add_extent_unwritten_real(
 				LEFT.br_state)))
 				goto done;
 		}
+		error = xfs_rmap_lcombine(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &LEFT, &PREV);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
@@ -2476,6 +2521,10 @@ xfs_bmap_add_extent_unwritten_real(
 				newext)))
 				goto done;
 		}
+		error = xfs_rmap_rcombine(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &RIGHT, &PREV);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
@@ -2502,6 +2551,11 @@ xfs_bmap_add_extent_unwritten_real(
 				newext)))
 				goto done;
 		}
+
+		error = xfs_rmap_resize(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, new, 0);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG:
@@ -2549,6 +2603,14 @@ xfs_bmap_add_extent_unwritten_real(
 			if (error)
 				goto done;
 		}
+		error = xfs_rmap_move(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &PREV, new->br_blockcount);
+		if (error)
+			goto done;
+		error = xfs_rmap_resize(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &LEFT, new->br_blockcount);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING:
@@ -2587,6 +2649,14 @@ xfs_bmap_add_extent_unwritten_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
+		error = xfs_rmap_move(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &PREV, new->br_blockcount);
+		if (error)
+			goto done;
+		error = xfs_rmap_insert(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, new);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
@@ -2629,6 +2699,14 @@ xfs_bmap_add_extent_unwritten_real(
 				newext)))
 				goto done;
 		}
+		error = xfs_rmap_resize(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &PREV, -new->br_blockcount);
+		if (error)
+			goto done;
+		error = xfs_rmap_move(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &RIGHT, -new->br_blockcount);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_RIGHT_FILLING:
@@ -2669,6 +2747,14 @@ xfs_bmap_add_extent_unwritten_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
+		error = xfs_rmap_resize(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &PREV, -new->br_blockcount);
+		if (error)
+			goto done;
+		error = xfs_rmap_insert(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, new);
+		if (error)
+			goto done;
 		break;
 
 	case 0:
@@ -2730,6 +2816,19 @@ xfs_bmap_add_extent_unwritten_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
+		error = xfs_rmap_resize(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &PREV, new->br_startoff -
+				PREV.br_startoff - PREV.br_blockcount);
+		if (error)
+			goto done;
+		error = xfs_rmap_insert(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, new);
+		if (error)
+			goto done;
+		error = xfs_rmap_insert(mp, &flist->xbf_rlist, ip->i_ino,
+				XFS_DATA_FORK, &r[1]);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG | BMAP_RIGHT_CONTIG:
@@ -2933,6 +3032,7 @@ xfs_bmap_add_extent_hole_real(
 	int			rval=0;	/* return value (logging flags) */
 	int			state;	/* state bits, accessed thru macros */
 	struct xfs_mount	*mp;
+	struct xfs_bmbt_irec	prev;	/* fake previous extent entry */
 
 	mp = bma->tp ? bma->tp->t_mountp : NULL;
 	ifp = XFS_IFORK_PTR(bma->ip, whichfork);
@@ -3040,6 +3140,12 @@ xfs_bmap_add_extent_hole_real(
 			if (error)
 				goto done;
 		}
+		prev = *new;
+		prev.br_startblock = nullstartblock(0);
+		error = xfs_rmap_combine(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, &left, &right, &prev);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_LEFT_CONTIG:
@@ -3072,6 +3178,10 @@ xfs_bmap_add_extent_hole_real(
 			if (error)
 				goto done;
 		}
+		error = xfs_rmap_resize(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, &left, new->br_blockcount);
+		if (error)
+			goto done;
 		break;
 
 	case BMAP_RIGHT_CONTIG:
@@ -3106,6 +3216,10 @@ xfs_bmap_add_extent_hole_real(
 			if (error)
 				goto done;
 		}
+		error = xfs_rmap_move(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, &right, -new->br_blockcount);
+		if (error)
+			goto done;
 		break;
 
 	case 0:
@@ -3134,6 +3248,10 @@ xfs_bmap_add_extent_hole_real(
 				goto done;
 			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done);
 		}
+		error = xfs_rmap_insert(mp, bma->rlist, bma->ip->i_ino,
+				whichfork, new);
+		if (error)
+			goto done;
 		break;
 	}
 
@@ -4268,7 +4386,6 @@ xfs_bmapi_delay(
 	return 0;
 }
 
-
 static int
 xfs_bmapi_allocate(
 	struct xfs_bmalloca	*bma)
@@ -4582,6 +4699,7 @@ xfs_bmapi_write(
 	bma.userdata = 0;
 	bma.flist = flist;
 	bma.firstblock = firstblock;
+	bma.rlist = &flist->xbf_rlist;
 
 	while (bno < end && n < *nmap) {
 		inhole = eof || bma.got.br_startoff > bno;
@@ -4840,6 +4958,10 @@ xfs_bmap_del_extent(
 		XFS_IFORK_NEXT_SET(ip, whichfork,
 			XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
 		flags |= XFS_ILOG_CORE;
+		error = xfs_rmap_delete(mp, &flist->xbf_rlist, ip->i_ino,
+				whichfork, &got);
+		if (error)
+			goto done;
 		if (!cur) {
 			flags |= xfs_ilog_fext(whichfork);
 			break;
@@ -4867,6 +4989,10 @@ xfs_bmap_del_extent(
 		}
 		xfs_bmbt_set_startblock(ep, del_endblock);
 		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
+		error = xfs_rmap_move(mp, &flist->xbf_rlist, ip->i_ino,
+				whichfork, &got, del->br_blockcount);
+		if (error)
+			goto done;
 		if (!cur) {
 			flags |= xfs_ilog_fext(whichfork);
 			break;
@@ -4893,6 +5019,10 @@ xfs_bmap_del_extent(
 			break;
 		}
 		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
+		error = xfs_rmap_resize(mp, &flist->xbf_rlist, ip->i_ino,
+				whichfork, &got, -del->br_blockcount);
+		if (error)
+			goto done;
 		if (!cur) {
 			flags |= xfs_ilog_fext(whichfork);
 			break;
@@ -4918,6 +5048,15 @@ xfs_bmap_del_extent(
 		if (!delay) {
 			new.br_startblock = del_endblock;
 			flags |= XFS_ILOG_CORE;
+			error = xfs_rmap_resize(mp, &flist->xbf_rlist,
+					ip->i_ino, whichfork, &got,
+					temp - got.br_blockcount);
+			if (error)
+				goto done;
+			error = xfs_rmap_insert(mp, &flist->xbf_rlist,
+					ip->i_ino, whichfork, &new);
+			if (error)
+				goto done;
 			if (cur) {
 				if ((error = xfs_bmbt_update(cur,
 						got.br_startoff,
@@ -5154,6 +5293,7 @@ xfs_bunmapi(
 			got.br_startoff + got.br_blockcount - 1);
 		if (bno < start)
 			break;
+
 		/*
 		 * Then deal with the (possibly delayed) allocated space
 		 * we found.
@@ -5456,7 +5596,8 @@ xfs_bmse_merge(
 	struct xfs_bmbt_rec_host	*gotp,		/* extent to shift */
 	struct xfs_bmbt_rec_host	*leftp,		/* preceding extent */
 	struct xfs_btree_cur		*cur,
-	int				*logflags)	/* output */
+	int				*logflags,	/* output */
+	struct xfs_rmap_list		*rlist)		/* rmap intent list */
 {
 	struct xfs_bmbt_irec		got;
 	struct xfs_bmbt_irec		left;
@@ -5487,6 +5628,13 @@ xfs_bmse_merge(
 	XFS_IFORK_NEXT_SET(ip, whichfork,
 			   XFS_IFORK_NEXTENTS(ip, whichfork) - 1);
 	*logflags |= XFS_ILOG_CORE;
+	error = xfs_rmap_resize(mp, rlist, ip->i_ino, whichfork, &left,
+			blockcount - left.br_blockcount);
+	if (error)
+		return error;
+	error = xfs_rmap_delete(mp, rlist, ip->i_ino, whichfork, &got);
+	if (error)
+		return error;
 	if (!cur) {
 		*logflags |= XFS_ILOG_DEXT;
 		return 0;
@@ -5529,7 +5677,8 @@ xfs_bmse_shift_one(
 	struct xfs_bmbt_rec_host	*gotp,
 	struct xfs_btree_cur		*cur,
 	int				*logflags,
-	enum shift_direction		direction)
+	enum shift_direction		direction,
+	struct xfs_rmap_list		*rlist)
 {
 	struct xfs_ifork		*ifp;
 	struct xfs_mount		*mp;
@@ -5579,7 +5728,7 @@ xfs_bmse_shift_one(
 				       offset_shift_fsb)) {
 			return xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
 					      *current_ext, gotp, adj_irecp,
-					      cur, logflags);
+					      cur, logflags, rlist);
 		}
 	} else {
 		startoff = got.br_startoff + offset_shift_fsb;
@@ -5616,6 +5765,10 @@ update_current_ext:
 		(*current_ext)--;
 	xfs_bmbt_set_startoff(gotp, startoff);
 	*logflags |= XFS_ILOG_CORE;
+	error = xfs_rmap_slide(mp, rlist, ip->i_ino, whichfork,
+			&got, startoff - got.br_startoff);
+	if (error)
+		return error;
 	if (!cur) {
 		*logflags |= XFS_ILOG_DEXT;
 		return 0;
@@ -5755,9 +5908,11 @@ xfs_bmap_shift_extents(
 	}
 
 	while (nexts++ < num_exts) {
+		xfs_bmbt_get_all(gotp, &got);
+
 		error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb,
 					   &current_ext, gotp, cur, &logflags,
-					   direction);
+					   direction, &flist->xbf_rlist);
 		if (error)
 			goto del_cursor;
 		/*
@@ -5810,6 +5965,7 @@ xfs_bmap_split_extent_at(
 	int				whichfork = XFS_DATA_FORK;
 	struct xfs_btree_cur		*cur = NULL;
 	struct xfs_bmbt_rec_host	*gotp;
+	struct xfs_bmbt_irec		rgot;
 	struct xfs_bmbt_irec		got;
 	struct xfs_bmbt_irec		new; /* split extent */
 	struct xfs_mount		*mp = ip->i_mount;
@@ -5819,6 +5975,7 @@ xfs_bmap_split_extent_at(
 	int				error = 0;
 	int				logflags = 0;
 	int				i = 0;
+	long				adj;
 
 	if (unlikely(XFS_TEST_ERROR(
 	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
@@ -5858,6 +6015,7 @@ xfs_bmap_split_extent_at(
 	if (got.br_startoff >= split_fsb)
 		return 0;
 
+	rgot = got;
 	gotblkcnt = split_fsb - got.br_startoff;
 	new.br_startoff = split_fsb;
 	new.br_startblock = got.br_startblock + gotblkcnt;
@@ -5913,6 +6071,17 @@ xfs_bmap_split_extent_at(
 		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, del_cursor);
 	}
 
+	/* update rmapbt */
+	adj = -(long)rgot.br_blockcount + gotblkcnt;
+	error = xfs_rmap_resize(mp, &free_list->xbf_rlist, ip->i_ino,
+			whichfork, &rgot, adj);
+	if (error)
+		goto del_cursor;
+	error = xfs_rmap_insert(mp, &free_list->xbf_rlist, ip->i_ino,
+			whichfork, &new);
+	if (error)
+		goto del_cursor;
+
 	/*
 	 * Convert to a btree if necessary.
 	 */
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 6733873..46d87315 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -35,6 +35,7 @@
 #include "xfs_trace.h"
 #include "xfs_error.h"
 #include "xfs_extent_busy.h"
+#include "xfs_bmap.h"
 
 /*
  * Lookup the first record less than or equal to [bno, len, owner, offset]
@@ -563,3 +564,700 @@ out_error:
 	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
 	return error;
 }
+
+/* Encode logical offset for a rmapbt record */
+STATIC uint64_t
+b2r_off(
+	int		whichfork,
+	xfs_fileoff_t	off)
+{
+	uint64_t	x;
+
+	x = off;
+	if (whichfork == XFS_ATTR_FORK)
+		x |= XFS_RMAP_OFF_ATTR;
+	return x;
+}
+
+/* Encode blockcount for a rmapbt record */
+STATIC xfs_extlen_t
+b2r_len(
+	struct xfs_bmbt_irec	*irec)
+{
+	xfs_extlen_t		x;
+
+	x = irec->br_blockcount;
+	if (irec->br_state == XFS_EXT_UNWRITTEN)
+		x |= XFS_RMAP_LEN_UNWRITTEN;
+	return x;
+}
+
+static int
+__xfs_rmap_move(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV,
+	long			start_adj);
+
+static int
+__xfs_rmap_resize(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV,
+	long			size_adj);
+
+/* Combine two adjacent rmap extents */
+static int
+__xfs_rmap_combine(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*LEFT,
+	struct xfs_bmbt_irec	*RIGHT,
+	struct xfs_bmbt_irec	*PREV)
+{
+	int			error;
+
+	if (!rcur)
+		return 0;
+
+	trace_xfs_rmap_combine(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+			whichfork, LEFT, PREV, RIGHT);
+
+	/* Delete right rmap */
+	error = xfs_rmapbt_delete(rcur,
+			XFS_FSB_TO_AGBNO(rcur->bc_mp, RIGHT->br_startblock),
+			b2r_len(RIGHT), ino,
+			b2r_off(whichfork, RIGHT->br_startoff));
+	if (error)
+		goto done;
+
+	/* Delete prev rmap */
+	if (!isnullstartblock(PREV->br_startblock)) {
+		error = xfs_rmapbt_delete(rcur,
+				XFS_FSB_TO_AGBNO(rcur->bc_mp,
+						PREV->br_startblock),
+				b2r_len(PREV), ino,
+				b2r_off(whichfork, PREV->br_startoff));
+		if (error)
+			goto done;
+	}
+
+	/* Enlarge left rmap */
+	return __xfs_rmap_resize(rcur, ino, whichfork, LEFT,
+			PREV->br_blockcount + RIGHT->br_blockcount);
+done:
+	return error;
+}
+
+/* Extend a left rmap extent */
+static int
+__xfs_rmap_lcombine(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*LEFT,
+	struct xfs_bmbt_irec	*PREV)
+{
+	int			error;
+
+	if (!rcur)
+		return 0;
+
+	trace_xfs_rmap_lcombine(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+			whichfork, LEFT, PREV);
+
+	/* Delete prev rmap */
+	if (!isnullstartblock(PREV->br_startblock)) {
+		error = xfs_rmapbt_delete(rcur,
+				XFS_FSB_TO_AGBNO(rcur->bc_mp,
+						PREV->br_startblock),
+				b2r_len(PREV), ino,
+				b2r_off(whichfork, PREV->br_startoff));
+		if (error)
+			goto done;
+	}
+
+	/* Enlarge left rmap */
+	return __xfs_rmap_resize(rcur, ino, whichfork, LEFT,
+			PREV->br_blockcount);
+done:
+	return error;
+}
+
+/* Extend a right rmap extent */
+static int
+__xfs_rmap_rcombine(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*RIGHT,
+	struct xfs_bmbt_irec	*PREV)
+{
+	int			error;
+
+	if (!rcur)
+		return 0;
+
+	trace_xfs_rmap_rcombine(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+			whichfork, RIGHT, PREV);
+
+	/* Delete prev rmap */
+	if (!isnullstartblock(PREV->br_startblock)) {
+		error = xfs_rmapbt_delete(rcur,
+				XFS_FSB_TO_AGBNO(rcur->bc_mp,
+						PREV->br_startblock),
+				b2r_len(PREV), ino,
+				b2r_off(whichfork, PREV->br_startoff));
+		if (error)
+			goto done;
+	}
+
+	/* Enlarge right rmap */
+	return __xfs_rmap_move(rcur, ino, whichfork, RIGHT,
+			-PREV->br_blockcount);
+done:
+	return error;
+}
+
+/* Insert a rmap extent */
+static int
+__xfs_rmap_insert(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*rec)
+{
+	if (!rcur)
+		return 0;
+
+	trace_xfs_rmap_insert(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+			whichfork, rec);
+
+	return xfs_rmapbt_insert(rcur,
+			XFS_FSB_TO_AGBNO(rcur->bc_mp, rec->br_startblock),
+			b2r_len(rec), ino,
+			b2r_off(whichfork, rec->br_startoff));
+}
+
+/* Delete a rmap extent */
+static int
+__xfs_rmap_delete(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*rec)
+{
+	if (!rcur)
+		return 0;
+
+	trace_xfs_rmap_delete(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+			whichfork, rec);
+
+	return xfs_rmapbt_delete(rcur,
+			XFS_FSB_TO_AGBNO(rcur->bc_mp, rec->br_startblock),
+			b2r_len(rec), ino,
+			b2r_off(whichfork, rec->br_startoff));
+}
+
+/* Change the start of an rmap */
+static int
+__xfs_rmap_move(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV,
+	long			start_adj)
+{
+	int			error;
+	struct xfs_bmbt_irec	irec;
+
+	if (!rcur)
+		return 0;
+
+	trace_xfs_rmap_move(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+			whichfork, PREV, start_adj);
+
+	/* Delete prev rmap */
+	error = xfs_rmapbt_delete(rcur,
+			XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
+			b2r_len(PREV), ino,
+			b2r_off(whichfork, PREV->br_startoff));
+	if (error)
+		goto done;
+
+	/* Re-add rmap with new start */
+	irec = *PREV;
+	irec.br_startblock += start_adj;
+	irec.br_startoff += start_adj;
+	irec.br_blockcount -= start_adj;
+	return xfs_rmapbt_insert(rcur,
+			XFS_FSB_TO_AGBNO(rcur->bc_mp, irec.br_startblock),
+			b2r_len(&irec), ino,
+			b2r_off(whichfork, irec.br_startoff));
+done:
+	return error;
+}
+
+/* Change the logical offset of an rmap */
+static int
+__xfs_rmap_slide(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV,
+	long			start_adj)
+{
+	int			error;
+
+	if (!rcur)
+		return 0;
+
+	trace_xfs_rmap_slide(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+			whichfork, PREV, start_adj);
+
+	/* Delete prev rmap */
+	error = xfs_rmapbt_delete(rcur,
+			XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
+			b2r_len(PREV), ino,
+			b2r_off(whichfork, PREV->br_startoff));
+	if (error)
+		goto done;
+
+	/* Re-add rmap with new logical offset */
+	return xfs_rmapbt_insert(rcur,
+			XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
+			b2r_len(PREV), ino,
+			b2r_off(whichfork, PREV->br_startoff + start_adj));
+done:
+	return error;
+}
+
+/* Change the size of an rmap */
+static int
+__xfs_rmap_resize(
+	struct xfs_btree_cur	*rcur,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV,
+	long			size_adj)
+{
+	int			i;
+	int			error;
+	struct xfs_bmbt_irec	irec;
+	struct xfs_rmap_irec	rrec;
+
+	if (!rcur)
+		return 0;
+
+	trace_xfs_rmap_resize(rcur->bc_mp, rcur->bc_private.a.agno, ino,
+			whichfork, PREV, size_adj);
+
+	error = xfs_rmap_lookup_eq(rcur,
+			XFS_FSB_TO_AGBNO(rcur->bc_mp, PREV->br_startblock),
+			b2r_len(PREV), ino,
+			b2r_off(whichfork, PREV->br_startoff), &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+	error = xfs_rmap_get_rec(rcur, &rrec, &i);
+	if (error)
+		goto done;
+	XFS_WANT_CORRUPTED_GOTO(rcur->bc_mp, i == 1, done);
+	irec = *PREV;
+	irec.br_blockcount += size_adj;
+	rrec.rm_blockcount = b2r_len(&irec);
+	error = xfs_rmap_update(rcur, &rrec);
+	if (error)
+		goto done;
+done:
+	return error;
+}
+
+/*
+ * Free up any items left in the list.
+ */
+void
+xfs_rmap_cancel(
+	struct xfs_rmap_list	*rlist)	/* list of bmap_free_items */
+{
+	struct xfs_rmap_intent	*free;	/* free list item */
+	struct xfs_rmap_intent	*next;
+
+	if (rlist->rl_count == 0)
+		return;
+	ASSERT(rlist->rl_first != NULL);
+	for (free = rlist->rl_first; free; free = next) {
+		next = free->ri_next;
+		kmem_free(free);
+	}
+	rlist->rl_count = 0;
+	rlist->rl_first = NULL;
+}
+
+static xfs_agnumber_t
+rmap_ag(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_intent	*ri)
+{
+	switch (ri->ri_type) {
+	case XFS_RMAP_COMBINE:
+	case XFS_RMAP_LCOMBINE:
+		return XFS_FSB_TO_AGNO(mp, ri->ri_u.a.left.br_startblock);
+	case XFS_RMAP_RCOMBINE:
+		return XFS_FSB_TO_AGNO(mp, ri->ri_u.a.right.br_startblock);
+	case XFS_RMAP_INSERT:
+	case XFS_RMAP_DELETE:
+	case XFS_RMAP_MOVE:
+	case XFS_RMAP_SLIDE:
+	case XFS_RMAP_RESIZE:
+		return XFS_FSB_TO_AGNO(mp, ri->ri_prev.br_startblock);
+	default:
+		ASSERT(0);
+	}
+	return 0; /* shut up, gcc */
+}
+
+/*
+ * Free up any items left in the extent list, using the given transaction.
+ */
+int
+__xfs_rmap_finish(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_rmap_list	*rlist)
+{
+	struct xfs_rmap_intent	*free;	/* free list item */
+	struct xfs_rmap_intent	*next;
+	struct xfs_btree_cur	*rcur = NULL;
+	struct xfs_buf		*agbp = NULL;
+	int			error = 0;
+	xfs_agnumber_t		agno;
+
+	if (rlist->rl_count == 0)
+		return 0;
+
+	ASSERT(rlist->rl_first != NULL);
+	for (free = rlist->rl_first; free; free = next) {
+		agno = rmap_ag(mp, free);
+		ASSERT(agno != NULLAGNUMBER);
+		if (rcur && agno < rcur->bc_private.a.agno) {
+			error = -EFSCORRUPTED;
+			break;
+		}
+
+		ASSERT(rcur == NULL || agno >= rcur->bc_private.a.agno);
+		if (rcur == NULL || agno > rcur->bc_private.a.agno) {
+			if (rcur) {
+				xfs_btree_del_cursor(rcur, XFS_BTREE_NOERROR);
+				xfs_trans_brelse(tp, agbp);
+			}
+
+			error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+			if (error)
+				break;
+
+			rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+			if (!rcur) {
+				xfs_trans_brelse(tp, agbp);
+				error = -ENOMEM;
+				break;
+			}
+		}
+
+		switch (free->ri_type) {
+		case XFS_RMAP_COMBINE:
+			error = __xfs_rmap_combine(rcur, free->ri_ino,
+					free->ri_whichfork, &free->ri_u.a.left,
+					&free->ri_u.a.right, &free->ri_prev);
+			break;
+		case XFS_RMAP_LCOMBINE:
+			error = __xfs_rmap_lcombine(rcur, free->ri_ino,
+					free->ri_whichfork, &free->ri_u.a.left,
+					&free->ri_prev);
+			break;
+		case XFS_RMAP_RCOMBINE:
+			error = __xfs_rmap_rcombine(rcur, free->ri_ino,
+					free->ri_whichfork, &free->ri_u.a.right,
+					&free->ri_prev);
+			break;
+		case XFS_RMAP_INSERT:
+			error = __xfs_rmap_insert(rcur, free->ri_ino,
+					free->ri_whichfork, &free->ri_prev);
+			break;
+		case XFS_RMAP_DELETE:
+			error = __xfs_rmap_delete(rcur, free->ri_ino,
+					free->ri_whichfork, &free->ri_prev);
+			break;
+		case XFS_RMAP_MOVE:
+			error = __xfs_rmap_move(rcur, free->ri_ino,
+					free->ri_whichfork, &free->ri_prev,
+					free->ri_u.b.adj);
+			break;
+		case XFS_RMAP_SLIDE:
+			error = __xfs_rmap_slide(rcur, free->ri_ino,
+					free->ri_whichfork, &free->ri_prev,
+					free->ri_u.b.adj);
+			break;
+		case XFS_RMAP_RESIZE:
+			error = __xfs_rmap_resize(rcur, free->ri_ino,
+					free->ri_whichfork, &free->ri_prev,
+					free->ri_u.b.adj);
+			break;
+		default:
+			ASSERT(0);
+		}
+
+		if (error)
+			break;
+		next = free->ri_next;
+		kmem_free(free);
+	}
+
+	if (rcur)
+		xfs_btree_del_cursor(rcur, error ? XFS_BTREE_ERROR :
+				XFS_BTREE_NOERROR);
+	if (agbp)
+		xfs_trans_brelse(tp, agbp);
+
+	for (; free; free = next) {
+		next = free->ri_next;
+		kmem_free(free);
+	}
+
+	rlist->rl_count = 0;
+	rlist->rl_first = NULL;
+	return error;
+}
+
+/*
+ * Free up any items left in the intent list.
+ */
+int
+xfs_rmap_finish(
+	struct xfs_mount	*mp,
+	struct xfs_trans	**tpp,
+	struct xfs_inode	*ip,
+	struct xfs_rmap_list	*rlist,
+	int			*committed)
+{
+	int			error;
+
+	*committed = 0;
+	if (rlist->rl_count == 0)
+		return 0;
+
+	error = xfs_trans_roll(tpp, ip);
+	if (error)
+		return error;
+	*committed = 1;
+
+	return __xfs_rmap_finish(mp, *tpp, rlist);
+}
+
+/*
+ * Record a rmap intent; the list is kept sorted first by AG and then by
+ * increasing age.
+ */
+static int
+__xfs_rmap_add(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_list	*rlist,
+	struct xfs_rmap_intent	*ri)
+{
+	struct xfs_rmap_intent	*cur;		/* current (next) element */
+	struct xfs_rmap_intent	*new;
+	struct xfs_rmap_intent	*prev;		/* previous element */
+	xfs_agnumber_t		new_agno, cur_agno;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	new = kmem_zalloc(sizeof(struct xfs_rmap_intent), KM_SLEEP | KM_NOFS);
+	*new = *ri;
+	new_agno = rmap_ag(mp, new);
+	ASSERT(new_agno != NULLAGNUMBER);
+
+	for (prev = NULL, cur = rlist->rl_first;
+	     cur != NULL;
+	     prev = cur, cur = cur->ri_next) {
+		cur_agno = rmap_ag(mp, cur);
+		if (cur_agno > new_agno)
+			break;
+	}
+	if (prev)
+		prev->ri_next = new;
+	else
+		rlist->rl_first = new;
+	new->ri_next = cur;
+	rlist->rl_count++;
+	return 0;
+}
+
+/* Combine two adjacent rmap extents */
+int
+xfs_rmap_combine(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_list	*rlist,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*left,
+	struct xfs_bmbt_irec	*right,
+	struct xfs_bmbt_irec	*prev)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_COMBINE;
+	ri.ri_ino = ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_prev = *prev;
+	ri.ri_u.a.left = *left;
+	ri.ri_u.a.right = *right;
+
+	return __xfs_rmap_add(mp, rlist, &ri);
+}
+
+/* Extend a left rmap extent */
+int
+xfs_rmap_lcombine(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_list	*rlist,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*LEFT,
+	struct xfs_bmbt_irec	*PREV)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_LCOMBINE;
+	ri.ri_ino = ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_prev = *PREV;
+	ri.ri_u.a.left = *LEFT;
+
+	return __xfs_rmap_add(mp, rlist, &ri);
+}
+
+/* Extend a right rmap extent */
+int
+xfs_rmap_rcombine(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_list	*rlist,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*RIGHT,
+	struct xfs_bmbt_irec	*PREV)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_RCOMBINE;
+	ri.ri_ino = ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_prev = *PREV;
+	ri.ri_u.a.right = *RIGHT;
+
+	return __xfs_rmap_add(mp, rlist, &ri);
+}
+
+/* Insert a rmap extent */
+int
+xfs_rmap_insert(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_list	*rlist,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*new)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_INSERT;
+	ri.ri_ino = ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_prev = *new;
+
+	return __xfs_rmap_add(mp, rlist, &ri);
+}
+
+/* Delete a rmap extent */
+int
+xfs_rmap_delete(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_list	*rlist,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*new)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_DELETE;
+	ri.ri_ino = ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_prev = *new;
+
+	return __xfs_rmap_add(mp, rlist, &ri);
+}
+
+/* Change the start of an rmap */
+int
+xfs_rmap_move(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_list	*rlist,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV,
+	long			start_adj)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_MOVE;
+	ri.ri_ino = ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_prev = *PREV;
+	ri.ri_u.b.adj = start_adj;
+
+	return __xfs_rmap_add(mp, rlist, &ri);
+}
+
+/* Change the logical offset of an rmap */
+int
+xfs_rmap_slide(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_list	*rlist,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV,
+	long			start_adj)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_SLIDE;
+	ri.ri_ino = ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_prev = *PREV;
+	ri.ri_u.b.adj = start_adj;
+
+	return __xfs_rmap_add(mp, rlist, &ri);
+}
+
+/* Change the size of an rmap */
+int
+xfs_rmap_resize(
+	struct xfs_mount	*mp,
+	struct xfs_rmap_list	*rlist,
+	xfs_ino_t		ino,
+	int			whichfork,
+	struct xfs_bmbt_irec	*PREV,
+	long			size_adj)
+{
+	struct xfs_rmap_intent	ri;
+
+	ri.ri_type = XFS_RMAP_RESIZE;
+	ri.ri_ino = ino;
+	ri.ri_whichfork = whichfork;
+	ri.ri_prev = *PREV;
+	ri.ri_u.b.adj = size_adj;
+
+	return __xfs_rmap_add(mp, rlist, &ri);
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index d7c9722..4fe13f3 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -21,6 +21,7 @@
 struct xfs_buf;
 struct xfs_btree_cur;
 struct xfs_mount;
+struct xfs_rmap_list;
 
 /* rmaps only exist on crc enabled filesystems */
 #define XFS_RMAP_BLOCK_LEN	XFS_BTREE_SBLOCK_CRC_LEN
@@ -68,4 +69,63 @@ int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
 		  xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
 		  struct xfs_owner_info *oinfo);
 
+/* functions for updating the rmapbt based on bmbt map/unmap operations */
+int xfs_rmap_combine(struct xfs_mount *mp, struct xfs_rmap_list *rlist,
+		xfs_ino_t ino, int whichfork, struct xfs_bmbt_irec *LEFT,
+		struct xfs_bmbt_irec *RIGHT, struct xfs_bmbt_irec *PREV);
+int xfs_rmap_lcombine(struct xfs_mount *mp, struct xfs_rmap_list *rlist,
+		xfs_ino_t ino, int whichfork, struct xfs_bmbt_irec *LEFT,
+		struct xfs_bmbt_irec *PREV);
+int xfs_rmap_rcombine(struct xfs_mount *mp, struct xfs_rmap_list *rlist,
+		xfs_ino_t ino, int whichfork, struct xfs_bmbt_irec *RIGHT,
+		struct xfs_bmbt_irec *PREV);
+int xfs_rmap_insert(struct xfs_mount *mp, struct xfs_rmap_list *rlist,
+		xfs_ino_t ino, int whichfork, struct xfs_bmbt_irec *rec);
+int xfs_rmap_delete(struct xfs_mount *mp, struct xfs_rmap_list *rlist,
+		xfs_ino_t ino, int whichfork, struct xfs_bmbt_irec *rec);
+int xfs_rmap_move(struct xfs_mount *mp, struct xfs_rmap_list *rlist,
+		xfs_ino_t ino, int whichfork, struct xfs_bmbt_irec *PREV,
+		long start_adj);
+int xfs_rmap_slide(struct xfs_mount *mp, struct xfs_rmap_list *rlist,
+		xfs_ino_t ino, int whichfork, struct xfs_bmbt_irec *PREV,
+		long start_adj);
+int xfs_rmap_resize(struct xfs_mount *mp, struct xfs_rmap_list *rlist,
+		xfs_ino_t ino, int whichfork, struct xfs_bmbt_irec *PREV,
+		long size_adj);
+
+enum xfs_rmap_intent_type {
+	XFS_RMAP_COMBINE,
+	XFS_RMAP_LCOMBINE,
+	XFS_RMAP_RCOMBINE,
+	XFS_RMAP_INSERT,
+	XFS_RMAP_DELETE,
+	XFS_RMAP_MOVE,
+	XFS_RMAP_SLIDE,
+	XFS_RMAP_RESIZE,
+};
+
+struct xfs_rmap_intent {
+	struct xfs_rmap_intent			*ri_next;
+	enum xfs_rmap_intent_type		ri_type;
+	xfs_ino_t				ri_ino;
+	int					ri_whichfork;
+	struct xfs_bmbt_irec			ri_prev;
+	union {
+		struct {
+			struct xfs_bmbt_irec	left;
+			struct xfs_bmbt_irec	right;
+		} a;
+		struct {
+			long			adj;
+		} b;
+	} ri_u;
+};
+
+void	xfs_rmap_cancel(struct xfs_rmap_list *rlist);
+int	__xfs_rmap_finish(struct xfs_mount *mp, struct xfs_trans *tp,
+			struct xfs_rmap_list *rlist);
+int	xfs_rmap_finish(struct xfs_mount *mp, struct xfs_trans **tpp,
+			struct xfs_inode *ip, struct xfs_rmap_list *rlist,
+			int *committed);
+
 #endif	/* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index b8dfa93..d844997 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -109,10 +109,16 @@ xfs_bmap_finish(
 	struct xfs_bmap_free_item	*next;	/* next item on free list */
 
 	ASSERT((*tp)->t_flags & XFS_TRANS_PERM_LOG_RES);
-	if (flist->xbf_count == 0) {
-		*committed = 0;
+
+	*committed = 0;
+	error = xfs_rmap_finish((*tp)->t_mountp, tp, ip, &flist->xbf_rlist,
+			committed);
+	if (error)
+		return error;
+
+	if (flist->xbf_count == 0)
 		return 0;
-	}
+
 	efi = xfs_trans_get_efi(*tp, flist->xbf_count);
 	for (free = flist->xbf_first; free; free = free->xbfi_next)
 		xfs_trans_log_efi_extent(*tp, efi, free->xbfi_startblock,

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 30/76] xfs: add rmap btree geometry feature flag
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (28 preceding siblings ...)
  2015-12-19  8:59 ` [PATCH 29/76] xfs: bmap btree changes should update rmap btree Darrick J. Wong
@ 2015-12-19  8:59 ` Darrick J. Wong
  2015-12-19  8:59 ` [PATCH 31/76] xfs: add rmap btree block detection to log recovery Darrick J. Wong
                   ` (46 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:59 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

So xfs_info and other userspace utilities know the filesystem is
using this feature.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_fs.h |    1 +
 fs/xfs/xfs_fsops.c     |    4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index b2b73a9..d2a58ff 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -240,6 +240,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_FTYPE	0x10000	/* inode directory types */
 #define XFS_FSOP_GEOM_FLAGS_FINOBT	0x20000	/* free inode btree */
 #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
+#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* Reverse mapping btree */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 031dd92..64bb02b 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -104,7 +104,9 @@ xfs_fs_geometry(
 			(xfs_sb_version_hasfinobt(&mp->m_sb) ?
 				XFS_FSOP_GEOM_FLAGS_FINOBT : 0) |
 			(xfs_sb_version_hassparseinodes(&mp->m_sb) ?
-				XFS_FSOP_GEOM_FLAGS_SPINODES : 0);
+				XFS_FSOP_GEOM_FLAGS_SPINODES : 0) |
+			(xfs_sb_version_hasrmapbt(&mp->m_sb) ?
+				XFS_FSOP_GEOM_FLAGS_RMAPBT : 0);
 		geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
 				mp->m_sb.sb_logsectsize : BBSIZE;
 		geo->rtsectsize = mp->m_sb.sb_blocksize;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 31/76] xfs: add rmap btree block detection to log recovery
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (29 preceding siblings ...)
  2015-12-19  8:59 ` [PATCH 30/76] xfs: add rmap btree geometry feature flag Darrick J. Wong
@ 2015-12-19  8:59 ` Darrick J. Wong
  2015-12-19  8:59 ` [PATCH 32/76] xfs: enable the rmap btree functionality Darrick J. Wong
                   ` (45 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:59 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

So such blocks can be correctly identified and have their operations
structutes attached to validate recovery has not resulted in a
correct block.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_recover.c |    4 ++++
 1 file changed, 4 insertions(+)


diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index c3abf50..c8f056e 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1847,6 +1847,7 @@ xlog_recover_get_buf_lsn(
 	case XFS_ABTC_CRC_MAGIC:
 	case XFS_ABTB_MAGIC:
 	case XFS_ABTC_MAGIC:
+	case XFS_RMAP_CRC_MAGIC:
 	case XFS_IBT_CRC_MAGIC:
 	case XFS_IBT_MAGIC: {
 		struct xfs_btree_block *btb = blk;
@@ -2015,6 +2016,9 @@ xlog_recover_validate_buf_type(
 		case XFS_BMAP_MAGIC:
 			bp->b_ops = &xfs_bmbt_buf_ops;
 			break;
+		case XFS_RMAP_CRC_MAGIC:
+			bp->b_ops = &xfs_rmapbt_buf_ops;
+			break;
 		default:
 			xfs_warn(mp, "Bad btree block magic!");
 			ASSERT(0);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 32/76] xfs: enable the rmap btree functionality
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (30 preceding siblings ...)
  2015-12-19  8:59 ` [PATCH 31/76] xfs: add rmap btree block detection to log recovery Darrick J. Wong
@ 2015-12-19  8:59 ` Darrick J. Wong
  2015-12-19  9:00 ` [PATCH 33/76] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled Darrick J. Wong
                   ` (44 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  8:59 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

Add the feature flag to the supported matrix so that the kernel can
mount and use rmap btree enabled filesystems

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h |    3 ++-
 fs/xfs/libxfs/xfs_sb.c     |    5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 2799746..4678831 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -457,7 +457,8 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
-		(XFS_SB_FEAT_RO_COMPAT_FINOBT)
+		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
+		 XFS_SB_FEAT_RO_COMPAT_RMAPBT)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index c9114b7..2590c89 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -223,6 +223,11 @@ xfs_mount_validate_sb(
 		return -EINVAL;
 	}
 
+	if (xfs_sb_version_hasrmapbt(sbp)) {
+		xfs_alert(mp,
+"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
+	}
+
 	/*
 	 * More sanity checking.  Most of these were stolen directly from
 	 * xfs_repair.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 33/76] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (31 preceding siblings ...)
  2015-12-19  8:59 ` [PATCH 32/76] xfs: enable the rmap btree functionality Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2015-12-19  9:00 ` [PATCH 34/76] xfs: implement " Darrick J. Wong
                   ` (43 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: Dave Chinner, xfs

>From : Dave Chinner <dchinner@redhat.com>

Swapping extents between two inodes requires the owner to be updated
in the rmap tree for all the extents that are swapped. This code
does not yet exist, so switch off the XFS_IOC_SWAPEXT ioctl until
support has been implemented. This will nee dto be done before the
rmap btree code can have the experimental tag removed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap_util.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index d844997..677e3aa6 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1729,6 +1729,19 @@ xfs_swap_extents(
 	__uint64_t	tmp;
 	int		lock_flags;
 
+	/*
+	 * We can't swap extents on rmap btree enabled filesystems yet
+	 * as there is no mechanism to update the owner of extents in
+	 * the rmap tree yet. Hence, for the moment, just reject attempts
+	 * to swap extents with EINVAL after emitting a warning once to remind
+	 * us this needs fixing.
+	 */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		WARN_ONCE(1,
+	"XFS: XFS_IOC_SWAPEXT not supported on RMAP enabled filesystems\n");
+		return -EINVAL;
+	}
+
 	tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
 	if (!tempifp) {
 		error = -ENOMEM;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 34/76] xfs: implement XFS_IOC_SWAPEXT when rmap btree is enabled
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (32 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 33/76] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2016-01-03 12:17   ` Christoph Hellwig
  2015-12-19  9:00 ` [PATCH 35/76] libxfs: refactor short btree block verification Darrick J. Wong
                   ` (42 subsequent siblings)
  76 siblings, 1 reply; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Implement extent swapping when reverse-mapping is enabled.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_btree.c      |   17 +++++++
 fs/xfs/libxfs/xfs_rmap.c       |   91 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.h |    9 ++++
 fs/xfs/xfs_bmap_util.c         |   46 ++++++++++++++------
 4 files changed, 149 insertions(+), 14 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 56a84e7..34dddd3 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -33,6 +33,7 @@
 #include "xfs_cksum.h"
 #include "xfs_alloc.h"
 #include "xfs_log.h"
+#include "xfs_rmap_btree.h"
 
 /*
  * Cursor allocation zone.
@@ -4001,6 +4002,8 @@ xfs_btree_block_change_owner(
 	struct xfs_btree_block	*block;
 	struct xfs_buf		*bp;
 	union xfs_btree_ptr     rptr;
+	struct xfs_owner_info	old_oinfo, new_oinfo;
+	int			error;
 
 	/* do right sibling readahead */
 	xfs_btree_readahead(cur, level, XFS_BTCUR_RIGHTRA);
@@ -4012,6 +4015,20 @@ xfs_btree_block_change_owner(
 	else
 		block->bb_u.s.bb_owner = cpu_to_be32(new_owner);
 
+	/* change rmap owners (bmbt blocks only) */
+	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+		XFS_RMAP_INO_BMBT_OWNER(&old_oinfo,
+				cur->bc_private.b.ip->i_ino,
+				cur->bc_private.b.whichfork);
+		XFS_RMAP_INO_BMBT_OWNER(&new_oinfo,
+				new_owner,
+				cur->bc_private.b.whichfork);
+		error = xfs_rmap_change_bmbt_owner(cur, bp, &old_oinfo,
+				&new_oinfo);
+		if (error)
+			return error;
+	}
+
 	/*
 	 * If the block is a root block hosted in an inode, we might not have a
 	 * buffer pointer here and we shouldn't attempt to log the change as the
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 46d87315..32172d1 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -36,6 +36,7 @@
 #include "xfs_error.h"
 #include "xfs_extent_busy.h"
 #include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
 
 /*
  * Lookup the first record less than or equal to [bno, len, owner, offset]
@@ -1261,3 +1262,93 @@ xfs_rmap_resize(
 
 	return __xfs_rmap_add(mp, rlist, &ri);
 }
+
+/**
+ * Change ownership of a file's BMBT block reverse-mappings.
+ */
+int
+xfs_rmap_change_bmbt_owner(
+	struct xfs_btree_cur	*bcur,
+	struct xfs_buf		*bp,
+	struct xfs_owner_info	*old_owner,
+	struct xfs_owner_info	*new_owner)
+{
+	struct xfs_buf		*agfbp;
+	xfs_fsblock_t		fsbno;
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+	int			error;
+
+	if (!xfs_sb_version_hasrmapbt(&bcur->bc_mp->m_sb) || !bp)
+		return 0;
+
+	fsbno = XFS_DADDR_TO_FSB(bcur->bc_mp, XFS_BUF_ADDR(bp));
+	agno = XFS_FSB_TO_AGNO(bcur->bc_mp, fsbno);
+	agbno = XFS_FSB_TO_AGBNO(bcur->bc_mp, fsbno);
+
+	error = xfs_read_agf(bcur->bc_mp, bcur->bc_tp, agno, 0, &agfbp);
+
+	error = xfs_rmap_free(bcur->bc_tp, agfbp, agno, agbno, 1, old_owner);
+	if (error)
+		goto err;
+
+	error = xfs_rmap_alloc(bcur->bc_tp, agfbp, agno, agbno, 1, new_owner);
+	if (error)
+		goto err;
+
+err:
+	xfs_trans_brelse(bcur->bc_tp, agfbp);
+	return error;
+}
+
+/**
+ * Change the ownership on a file's extent's reverse-mappings.
+ */
+int
+xfs_rmap_change_extent_owner(
+	struct xfs_mount	*mp,
+	struct xfs_inode	*ip,
+	xfs_ino_t		ino,
+	xfs_fileoff_t		isize,
+	struct xfs_trans	*tp,
+	int			whichfork,
+	xfs_ino_t		new_owner,
+	struct xfs_rmap_list	*rlist)
+{
+	struct xfs_bmbt_irec	imap;
+	int			nimaps;
+	xfs_fileoff_t		offset;
+	xfs_filblks_t		len;
+	int			flags = 0;
+	int			error;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	if (whichfork == XFS_ATTR_FORK)
+		flags |= XFS_BMAPI_ATTRFORK;
+
+	offset = 0;
+	len = XFS_B_TO_FSB(mp, isize);
+	nimaps = 1;
+	error = xfs_bmapi_read(ip, offset, len, &imap, &nimaps, flags);
+	while (error == 0 && nimaps > 0) {
+		if (imap.br_startblock == HOLESTARTBLOCK ||
+		    imap.br_startblock == DELAYSTARTBLOCK)
+			goto advloop;
+
+		error = xfs_rmap_delete(mp, rlist, ino, whichfork, &imap);
+		if (error)
+			break;
+		error = xfs_rmap_insert(mp, rlist, new_owner, whichfork, &imap);
+		if (error)
+			break;
+advloop:
+		offset += imap.br_blockcount;
+		len -= imap.br_blockcount;
+		nimaps = 1;
+		error = xfs_bmapi_read(ip, offset, len, &imap, &nimaps, flags);
+	}
+
+	return error;
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index 4fe13f3..b4e085c 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -128,4 +128,13 @@ int	xfs_rmap_finish(struct xfs_mount *mp, struct xfs_trans **tpp,
 			struct xfs_inode *ip, struct xfs_rmap_list *rlist,
 			int *committed);
 
+/* functions for changing rmap ownership */
+int xfs_rmap_change_extent_owner(struct xfs_mount *mp, struct xfs_inode *ip,
+		xfs_ino_t ino, xfs_fileoff_t isize, struct xfs_trans *tp,
+		int whichfork, xfs_ino_t new_owner,
+		struct xfs_rmap_list *rlist);
+int xfs_rmap_change_bmbt_owner(struct xfs_btree_cur *bcur, struct xfs_buf *bp,
+		struct xfs_owner_info *old_owner,
+		struct xfs_owner_info *new_owner);
+
 #endif	/* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 677e3aa6..ad4eca7 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1712,6 +1712,18 @@ xfs_swap_extent_flush(
 	return 0;
 }
 
+static int
+change_extent_owner(
+	struct xfs_inode	*ip,
+	struct xfs_trans	*tp,
+	int			whichfork,
+	struct xfs_inode	*tip,
+	struct xfs_rmap_list	*rlist)
+{
+	return xfs_rmap_change_extent_owner(ip->i_mount, ip, ip->i_ino,
+			ip->i_d.di_size, tp, whichfork, tip->i_ino, rlist);
+}
+
 int
 xfs_swap_extents(
 	xfs_inode_t	*ip,	/* target inode */
@@ -1722,25 +1734,15 @@ xfs_swap_extents(
 	xfs_trans_t	*tp;
 	xfs_bstat_t	*sbp = &sxp->sx_stat;
 	xfs_ifork_t	*tempifp, *ifp, *tifp;
+	struct xfs_bmap_free	flist;
 	int		src_log_flags, target_log_flags;
 	int		error = 0;
 	int		aforkblks = 0;
 	int		taforkblks = 0;
 	__uint64_t	tmp;
 	int		lock_flags;
-
-	/*
-	 * We can't swap extents on rmap btree enabled filesystems yet
-	 * as there is no mechanism to update the owner of extents in
-	 * the rmap tree yet. Hence, for the moment, just reject attempts
-	 * to swap extents with EINVAL after emitting a warning once to remind
-	 * us this needs fixing.
-	 */
-	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
-		WARN_ONCE(1,
-	"XFS: XFS_IOC_SWAPEXT not supported on RMAP enabled filesystems\n");
-		return -EINVAL;
-	}
+	xfs_fsblock_t	firstfsb;
+	struct xfs_trans_res	tres = {.tr_logres = 262144, .tr_logcount = 1, .tr_logflags = 0};
 
 	tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
 	if (!tempifp) {
@@ -1778,7 +1780,9 @@ xfs_swap_extents(
 		goto out_unlock;
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_SWAPEXT);
-	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
+	/* XXX How do we create a potentially huge transaction here? */
+	/* error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0); */
+	error = xfs_trans_reserve(tp, &tres, 0, 0);
 	if (error) {
 		xfs_trans_cancel(tp);
 		goto out_unlock;
@@ -1878,6 +1882,20 @@ xfs_swap_extents(
 			goto out_trans_cancel;
 	}
 
+	/* Change owners in the extent rmaps */
+	xfs_bmap_init(&flist, &firstfsb);
+	error = change_extent_owner(ip, tp, XFS_DATA_FORK, tip,
+			&flist.xbf_rlist);
+	if (error)
+		goto out_trans_cancel;
+	error = change_extent_owner(tip, tp, XFS_DATA_FORK, ip,
+			&flist.xbf_rlist);
+	if (error)
+		goto out_trans_cancel;
+	error = __xfs_rmap_finish(mp, tp, &flist.xbf_rlist);
+	if (error)
+		goto out_trans_cancel;
+
 	/*
 	 * Swap the data forks of the inodes
 	 */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 35/76] libxfs: refactor short btree block verification
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (33 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 34/76] xfs: implement " Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2016-01-03 12:18   ` Christoph Hellwig
  2015-12-19  9:00 ` [PATCH 36/76] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
                   ` (41 subsequent siblings)
  76 siblings, 1 reply; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create xfs_btree_sblock_verify() to verify short-format btree blocks
(i.e. the per-AG btrees with 32-bit block pointers) instead of
open-coding them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc_btree.c  |   34 ++--------------------
 fs/xfs/libxfs/xfs_btree.c        |   58 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_btree.h        |    3 ++
 fs/xfs/libxfs/xfs_ialloc_btree.c |   26 ++---------------
 fs/xfs/libxfs/xfs_rmap_btree.c   |   23 ++-------------
 5 files changed, 70 insertions(+), 74 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 90de071..1352322 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -293,14 +293,7 @@ xfs_allocbt_verify(
 	level = be16_to_cpu(block->bb_level);
 	switch (block->bb_magic) {
 	case cpu_to_be32(XFS_ABTB_CRC_MAGIC):
-		if (!xfs_sb_version_hascrc(&mp->m_sb))
-			return false;
-		if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
-			return false;
-		if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
-			return false;
-		if (pag &&
-		    be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+		if (!xfs_btree_sblock_v5hdr_verify(bp))
 			return false;
 		/* fall through */
 	case cpu_to_be32(XFS_ABTB_MAGIC):
@@ -311,14 +304,7 @@ xfs_allocbt_verify(
 			return false;
 		break;
 	case cpu_to_be32(XFS_ABTC_CRC_MAGIC):
-		if (!xfs_sb_version_hascrc(&mp->m_sb))
-			return false;
-		if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
-			return false;
-		if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
-			return false;
-		if (pag &&
-		    be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+		if (!xfs_btree_sblock_v5hdr_verify(bp))
 			return false;
 		/* fall through */
 	case cpu_to_be32(XFS_ABTC_MAGIC):
@@ -332,21 +318,7 @@ xfs_allocbt_verify(
 		return false;
 	}
 
-	/* numrecs verification */
-	if (be16_to_cpu(block->bb_numrecs) > mp->m_alloc_mxr[level != 0])
-		return false;
-
-	/* sibling pointer verification */
-	if (!block->bb_u.s.bb_leftsib ||
-	    (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
-	     block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
-		return false;
-	if (!block->bb_u.s.bb_rightsib ||
-	    (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
-	     block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
-		return false;
-
-	return true;
+	return xfs_btree_sblock_verify(bp, mp->m_alloc_mxr[level != 0]);
 }
 
 static void
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 34dddd3..b810154 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4100,3 +4100,61 @@ xfs_btree_change_owner(
 
 	return 0;
 }
+
+/**
+ * xfs_btree_sblock_v5hdr_verify() -- verify the v5 fields of a short-format
+ *				      btree block
+ *
+ * @bp: buffer containing the btree block
+ * @max_recs: pointer to the m_*_mxr max records field in the xfs mount
+ * @pag_max_level: pointer to the per-ag max level field
+ */
+bool
+xfs_btree_sblock_v5hdr_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	struct xfs_perag	*pag = bp->b_pag;
+
+	if (!xfs_sb_version_hascrc(&mp->m_sb))
+		return false;
+	if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
+		return false;
+	if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
+		return false;
+	if (pag && be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+		return false;
+	return true;
+}
+
+/**
+ * xfs_btree_sblock_verify() -- verify a short-format btree block
+ *
+ * @bp: buffer containing the btree block
+ * @max_recs: maximum records allowed in this btree node
+ */
+bool
+xfs_btree_sblock_verify(
+	struct xfs_buf		*bp,
+	unsigned int		max_recs)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+
+	/* numrecs verification */
+	if (be16_to_cpu(block->bb_numrecs) > max_recs)
+		return false;
+
+	/* sibling pointer verification */
+	if (!block->bb_u.s.bb_leftsib ||
+	    (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
+	     block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
+		return false;
+	if (!block->bb_u.s.bb_rightsib ||
+	    (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
+	     block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
+		return false;
+
+	return true;
+}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 35e7754..6443c74 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -479,4 +479,7 @@ static inline int xfs_btree_get_level(struct xfs_btree_block *block)
 #define XFS_BTREE_TRACE_ARGR(c, r)
 #define	XFS_BTREE_TRACE_CURSOR(c, t)
 
+bool xfs_btree_sblock_v5hdr_verify(struct xfs_buf *bp);
+bool xfs_btree_sblock_verify(struct xfs_buf *bp, unsigned int max_recs);
+
 #endif	/* __XFS_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index bd8a1da..97228d6 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -224,7 +224,6 @@ xfs_inobt_verify(
 {
 	struct xfs_mount	*mp = bp->b_target->bt_mount;
 	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
-	struct xfs_perag	*pag = bp->b_pag;
 	unsigned int		level;
 
 	/*
@@ -240,14 +239,7 @@ xfs_inobt_verify(
 	switch (block->bb_magic) {
 	case cpu_to_be32(XFS_IBT_CRC_MAGIC):
 	case cpu_to_be32(XFS_FIBT_CRC_MAGIC):
-		if (!xfs_sb_version_hascrc(&mp->m_sb))
-			return false;
-		if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_meta_uuid))
-			return false;
-		if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
-			return false;
-		if (pag &&
-		    be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+		if (!xfs_btree_sblock_v5hdr_verify(bp))
 			return false;
 		/* fall through */
 	case cpu_to_be32(XFS_IBT_MAGIC):
@@ -257,24 +249,12 @@ xfs_inobt_verify(
 		return 0;
 	}
 
-	/* numrecs and level verification */
+	/* level verification */
 	level = be16_to_cpu(block->bb_level);
 	if (level >= mp->m_in_maxlevels)
 		return false;
-	if (be16_to_cpu(block->bb_numrecs) > mp->m_inobt_mxr[level != 0])
-		return false;
-
-	/* sibling pointer verification */
-	if (!block->bb_u.s.bb_leftsib ||
-	    (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
-	     block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
-		return false;
-	if (!block->bb_u.s.bb_rightsib ||
-	    (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
-	     block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
-		return false;
 
-	return true;
+	return xfs_btree_sblock_verify(bp, mp->m_inobt_mxr[level != 0]);
 }
 
 static void
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 5fe717b..6546d80 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -253,13 +253,10 @@ xfs_rmapbt_verify(
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return false;
-	if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
-		return false;
-	if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
-		return false;
-	if (pag && be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+	if (!xfs_btree_sblock_v5hdr_verify(bp))
 		return false;
 
+	/* level verification */
 	level = be16_to_cpu(block->bb_level);
 	if (pag && pag->pagf_init) {
 		if (level >= pag->pagf_levels[XFS_BTNUM_RMAPi])
@@ -267,21 +264,7 @@ xfs_rmapbt_verify(
 	} else if (level >= mp->m_ag_maxlevels)
 		return false;
 
-	/* numrecs verification */
-	if (be16_to_cpu(block->bb_numrecs) > mp->m_rmap_mxr[level != 0])
-		return false;
-
-	/* sibling pointer verification */
-	if (!block->bb_u.s.bb_leftsib ||
-	    (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
-	     block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
-		return false;
-	if (!block->bb_u.s.bb_rightsib ||
-	    (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
-	     block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
-		return false;
-
-	return true;
+	return xfs_btree_sblock_verify(bp, mp->m_rmap_mxr[level != 0]);
 }
 
 static void

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 36/76] xfs: don't update rmapbt when fixing agfl
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (34 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 35/76] libxfs: refactor short btree block verification Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2015-12-19  9:00 ` [PATCH 37/76] xfs: define tracepoints for refcount btree activities Darrick J. Wong
                   ` (40 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Allow a caller of xfs_alloc_fix_freelist to disable rmapbt updates
when fixing the AG freelist.  xfs_repair needs this during phase 5
to be able to adjust the freelist while it's reconstructing the rmap
btree; the missing entries will be added back at the very end of
phase 5 once the AGFL contents settle down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c |   40 ++++++++++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_alloc.h |    5 ++++-
 2 files changed, 30 insertions(+), 15 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index aa3cc24..cc23a30 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2101,26 +2101,38 @@ xfs_alloc_fix_freelist(
 	 * anything other than extra overhead when we need to put more blocks
 	 * back on the free list? Maybe we should only do this when space is
 	 * getting low or the AGFL is more than half full?
+	 *
+	 * The NOSHRINK flag prevents the AGFL from being shrunk if it's too
+	 * big; the NORMAP flag prevents AGFL expand/shrink operations from
+	 * updating the rmapbt.  Both flags are used in xfs_repair while we're
+	 * rebuilding the rmapbt, and neither are used by the kernel.  They're
+	 * both required to ensure that rmaps are correctly recorded for the
+	 * regenerated AGFL, bnobt, and cntbt.  See repair/phase5.c and
+	 * repair/rmap.c in xfsprogs for details.
 	 */
-	XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
-	while (pag->pagf_flcount > need) {
-		struct xfs_buf	*bp;
+	memset(&targs, 0, sizeof(targs));
+	if (!(flags & XFS_ALLOC_FLAG_NOSHRINK)) {
+		if (!(flags & XFS_ALLOC_FLAG_NORMAP))
+			XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
+		while (pag->pagf_flcount > need) {
+			struct xfs_buf	*bp;
 
-		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
-		if (error)
-			goto out_agbp_relse;
-		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
-					   &targs.oinfo, 1);
-		if (error)
-			goto out_agbp_relse;
-		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
-		xfs_trans_binval(tp, bp);
+			error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
+			if (error)
+				goto out_agbp_relse;
+			error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
+						   &targs.oinfo, 1);
+			if (error)
+				goto out_agbp_relse;
+			bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
+			xfs_trans_binval(tp, bp);
+		}
 	}
 
-	memset(&targs, 0, sizeof(targs));
 	targs.tp = tp;
 	targs.mp = mp;
-	XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
+	if (!(flags & XFS_ALLOC_FLAG_NORMAP))
+		XFS_RMAP_AG_OWNER(&targs.oinfo, XFS_RMAP_OWN_AG);
 	targs.agbp = agbp;
 	targs.agno = args->agno;
 	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index ea2868d..f21ec2b 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -54,6 +54,9 @@ typedef unsigned int xfs_alloctype_t;
  */
 #define	XFS_ALLOC_FLAG_TRYLOCK	0x00000001  /* use trylock for buffer locking */
 #define	XFS_ALLOC_FLAG_FREEING	0x00000002  /* indicate caller is freeing extents*/
+#define	XFS_ALLOC_FLAG_NORMAP	0x00000004  /* don't modify the rmapbt */
+#define	XFS_ALLOC_FLAG_NOSHRINK	0x00000008  /* don't shrink the freelist */
+
 
 /*
  * Argument structure for xfs_alloc routines.
@@ -87,7 +90,7 @@ typedef struct xfs_alloc_arg {
 	char		isfl;		/* set if is freelist blocks - !acctg */
 	char		userdata;	/* mask defining userdata treatment */
 	xfs_fsblock_t	firstblock;	/* io first block allocated */
-	struct xfs_owner_info	oinfo;		/* owner of blocks being allocated */
+	struct xfs_owner_info	oinfo;	/* owner of blocks being allocated */
 } xfs_alloc_arg_t;
 
 /*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 37/76] xfs: define tracepoints for refcount btree activities
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (35 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 36/76] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2015-12-19  9:00 ` [PATCH 38/76] xfs: introduce refcount btree definitions Darrick J. Wong
                   ` (39 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Define all the tracepoints we need to inspect the refcount btree
runtime operation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_trace.h |  329 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 329 insertions(+)


diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index bdd6b18..1c91439 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -39,6 +39,16 @@ struct xfs_buf_log_format;
 struct xfs_inode_log_format;
 struct xfs_bmbt_irec;
 
+#ifndef XFS_REFCOUNT_IREC_PLACEHOLDER
+#define XFS_REFCOUNT_IREC_PLACEHOLDER
+/* Placeholder definition to avoid breaking bisectability. */
+struct xfs_refcount_irec {
+	xfs_agblock_t	rc_startblock;	/* starting block number */
+	xfs_extlen_t	rc_blockcount;	/* count of free blocks */
+	xfs_nlink_t	rc_refcount;	/* number of inodes linked here */
+};
+#endif
+
 DECLARE_EVENT_CLASS(xfs_attr_list_class,
 	TP_PROTO(struct xfs_attr_list_context *ctx),
 	TP_ARGS(ctx),
@@ -2455,6 +2465,325 @@ DEFINE_DISCARD_EVENT(xfs_discard_toosmall);
 DEFINE_DISCARD_EVENT(xfs_discard_exclude);
 DEFINE_DISCARD_EVENT(xfs_discard_busy);
 
+/* refcount tracepoint classes */
+
+/* reuse the discard trace class for agbno/aglen-based traces */
+#define DEFINE_AG_EXTENT_EVENT(name) DEFINE_DISCARD_EVENT(name)
+
+/* ag btree lookup tracepoint class */
+#define XFS_AG_BTREE_CMP_FORMAT_STR \
+	{ XFS_LOOKUP_EQ,	"eq" }, \
+	{ XFS_LOOKUP_LE,	"le" }, \
+	{ XFS_LOOKUP_GE,	"ge" }
+DECLARE_EVENT_CLASS(xfs_ag_btree_lookup_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_agblock_t agbno, xfs_lookup_t dir),
+	TP_ARGS(mp, agno, agbno, dir),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, agbno)
+		__field(xfs_lookup_t, dir)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->agbno = agbno;
+		__entry->dir = dir;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u cmp %s(%d)\n",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->agbno,
+		  __print_symbolic(__entry->dir, XFS_AG_BTREE_CMP_FORMAT_STR),
+		  __entry->dir)
+)
+
+#define DEFINE_AG_BTREE_LOOKUP_EVENT(name) \
+DEFINE_EVENT(xfs_ag_btree_lookup_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 xfs_agblock_t agbno, xfs_lookup_t dir), \
+	TP_ARGS(mp, agno, agbno, dir))
+
+/* single-rcext tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_extent_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 struct xfs_refcount_irec *irec),
+	TP_ARGS(mp, agno, irec),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, startblock)
+		__field(xfs_extlen_t, blockcount)
+		__field(xfs_nlink_t, refcount)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->startblock = irec->rc_startblock;
+		__entry->blockcount = irec->rc_blockcount;
+		__entry->refcount = irec->rc_refcount;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u\n",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->startblock,
+		  __entry->blockcount,
+		  __entry->refcount)
+)
+
+#define DEFINE_REFCOUNT_EXTENT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_extent_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 struct xfs_refcount_irec *irec), \
+	TP_ARGS(mp, agno, irec))
+
+/* single-rcext and an agbno tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_extent_at_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 struct xfs_refcount_irec *irec, xfs_agblock_t agbno),
+	TP_ARGS(mp, agno, irec, agbno),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, startblock)
+		__field(xfs_extlen_t, blockcount)
+		__field(xfs_nlink_t, refcount)
+		__field(xfs_agblock_t, agbno)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->startblock = irec->rc_startblock;
+		__entry->blockcount = irec->rc_blockcount;
+		__entry->refcount = irec->rc_refcount;
+		__entry->agbno = agbno;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u @ agbno %u\n",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->startblock,
+		  __entry->blockcount,
+		  __entry->refcount,
+		  __entry->agbno)
+)
+
+#define DEFINE_REFCOUNT_EXTENT_AT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_extent_at_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 struct xfs_refcount_irec *irec, xfs_agblock_t agbno), \
+	TP_ARGS(mp, agno, irec, agbno))
+
+/* double-rcext tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2),
+	TP_ARGS(mp, agno, i1, i2),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, i1_startblock)
+		__field(xfs_extlen_t, i1_blockcount)
+		__field(xfs_nlink_t, i1_refcount)
+		__field(xfs_agblock_t, i2_startblock)
+		__field(xfs_extlen_t, i2_blockcount)
+		__field(xfs_nlink_t, i2_refcount)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->i1_startblock = i1->rc_startblock;
+		__entry->i1_blockcount = i1->rc_blockcount;
+		__entry->i1_refcount = i1->rc_refcount;
+		__entry->i2_startblock = i2->rc_startblock;
+		__entry->i2_blockcount = i2->rc_blockcount;
+		__entry->i2_refcount = i2->rc_refcount;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u -- "
+		  "agbno %u len %u refcount %u\n",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->i1_startblock,
+		  __entry->i1_blockcount,
+		  __entry->i1_refcount,
+		  __entry->i2_startblock,
+		  __entry->i2_blockcount,
+		  __entry->i2_refcount)
+)
+
+#define DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_double_extent_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2), \
+	TP_ARGS(mp, agno, i1, i2))
+
+/* double-rcext and an agbno tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2,
+		 xfs_agblock_t agbno),
+	TP_ARGS(mp, agno, i1, i2, agbno),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, i1_startblock)
+		__field(xfs_extlen_t, i1_blockcount)
+		__field(xfs_nlink_t, i1_refcount)
+		__field(xfs_agblock_t, i2_startblock)
+		__field(xfs_extlen_t, i2_blockcount)
+		__field(xfs_nlink_t, i2_refcount)
+		__field(xfs_agblock_t, agbno)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->i1_startblock = i1->rc_startblock;
+		__entry->i1_blockcount = i1->rc_blockcount;
+		__entry->i1_refcount = i1->rc_refcount;
+		__entry->i2_startblock = i2->rc_startblock;
+		__entry->i2_blockcount = i2->rc_blockcount;
+		__entry->i2_refcount = i2->rc_refcount;
+		__entry->agbno = agbno;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u -- "
+		  "agbno %u len %u refcount %u @ agbno %u\n",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->i1_startblock,
+		  __entry->i1_blockcount,
+		  __entry->i1_refcount,
+		  __entry->i2_startblock,
+		  __entry->i2_blockcount,
+		  __entry->i2_refcount,
+		  __entry->agbno)
+)
+
+#define DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_double_extent_at_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2, \
+		 xfs_agblock_t agbno), \
+	TP_ARGS(mp, agno, i1, i2, agbno))
+
+/* triple-rcext tracepoint class */
+DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2,
+		 struct xfs_refcount_irec *i3),
+	TP_ARGS(mp, agno, i1, i2, i3),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, i1_startblock)
+		__field(xfs_extlen_t, i1_blockcount)
+		__field(xfs_nlink_t, i1_refcount)
+		__field(xfs_agblock_t, i2_startblock)
+		__field(xfs_extlen_t, i2_blockcount)
+		__field(xfs_nlink_t, i2_refcount)
+		__field(xfs_agblock_t, i3_startblock)
+		__field(xfs_extlen_t, i3_blockcount)
+		__field(xfs_nlink_t, i3_refcount)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->i1_startblock = i1->rc_startblock;
+		__entry->i1_blockcount = i1->rc_blockcount;
+		__entry->i1_refcount = i1->rc_refcount;
+		__entry->i2_startblock = i2->rc_startblock;
+		__entry->i2_blockcount = i2->rc_blockcount;
+		__entry->i2_refcount = i2->rc_refcount;
+		__entry->i3_startblock = i3->rc_startblock;
+		__entry->i3_blockcount = i3->rc_blockcount;
+		__entry->i3_refcount = i3->rc_refcount;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u len %u refcount %u -- "
+		  "agbno %u len %u refcount %u -- "
+		  "agbno %u len %u refcount %u\n",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->i1_startblock,
+		  __entry->i1_blockcount,
+		  __entry->i1_refcount,
+		  __entry->i2_startblock,
+		  __entry->i2_blockcount,
+		  __entry->i2_refcount,
+		  __entry->i3_startblock,
+		  __entry->i3_blockcount,
+		  __entry->i3_refcount)
+);
+
+#define DEFINE_REFCOUNT_TRIPLE_EXTENT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_triple_extent_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2, \
+		 struct xfs_refcount_irec *i3), \
+	TP_ARGS(mp, agno, i1, i2, i3))
+
+/* simple AG-based error/%ip tracepoint class */
+DECLARE_EVENT_CLASS(xfs_ag_error_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int error,
+		 unsigned long caller_ip),
+	TP_ARGS(mp, agno, error, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(int, error)
+		__field(unsigned long, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->error = error;
+		__entry->caller_ip = caller_ip;
+	),
+	TP_printk("dev %d:%d agno %u error %d caller %ps",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->error,
+		  (char *)__entry->caller_ip)
+);
+
+#define DEFINE_AG_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_ag_error_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int error, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, agno, error, caller_ip))
+
+/* refcount btree tracepoints */
+DEFINE_AG_BTREE_LOOKUP_EVENT(xfs_refcountbt_lookup);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcountbt_get);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcountbt_update);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcountbt_insert);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcountbt_delete);
+
+/* refcount adjustment tracepoints */
+DEFINE_AG_EXTENT_EVENT(xfs_refcount_increase);
+DEFINE_AG_EXTENT_EVENT(xfs_refcount_decrease);
+DEFINE_REFCOUNT_TRIPLE_EXTENT_EVENT(xfs_refcount_merge_center_extents);
+DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_modify_extent);
+DEFINE_REFCOUNT_EXTENT_AT_EVENT(xfs_refcount_split_left_extent);
+DEFINE_REFCOUNT_EXTENT_AT_EVENT(xfs_refcount_split_right_extent);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_merge_left_extent);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_merge_right_extent);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(xfs_refcount_find_left_extent);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(xfs_refcount_find_right_extent);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_adjust_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_merge_center_extents_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_modify_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_split_left_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_split_right_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_merge_left_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_merge_right_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_find_left_extent_error);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_find_right_extent_error);
+DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_rec_order_error);
+
+/* reflink helpers */
+DEFINE_AG_EXTENT_EVENT(xfs_refcount_find_shared);
+DEFINE_AG_EXTENT_EVENT(xfs_refcount_find_shared_result);
+DEFINE_AG_ERROR_EVENT(xfs_refcount_find_shared_error);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 38/76] xfs: introduce refcount btree definitions
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (36 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 37/76] xfs: define tracepoints for refcount btree activities Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2015-12-19  9:00 ` [PATCH 39/76] xfs: add refcount btree stats infrastructure Darrick J. Wong
                   ` (38 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add new per-AG refcount btree definitions to the per-AG structures.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |    5 +++++
 fs/xfs/libxfs/xfs_btree.c  |    5 +++--
 fs/xfs/libxfs/xfs_btree.h  |    4 ++++
 fs/xfs/libxfs/xfs_format.h |   34 ++++++++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_types.h  |    2 +-
 fs/xfs/xfs_inode.h         |    5 +++++
 fs/xfs/xfs_mount.h         |    3 +++
 7 files changed, 53 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index cc23a30..b69993d 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2412,6 +2412,10 @@ xfs_agf_verify(
 	    be32_to_cpu(agf->agf_btreeblks) > be32_to_cpu(agf->agf_length))
 		return false;
 
+	if (xfs_sb_version_hasreflink(&mp->m_sb) &&
+	    be32_to_cpu(agf->agf_refcount_level) > XFS_BTREE_MAXLEVELS)
+		return false;
+
 	return true;;
 
 }
@@ -2531,6 +2535,7 @@ xfs_alloc_read_agf(
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
 		pag->pagf_levels[XFS_BTNUM_RMAPi] =
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
+		pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
 		spin_lock_init(&pag->pagb_lock);
 		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index b810154..03ff4e4 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -45,9 +45,10 @@ kmem_zone_t	*xfs_btree_cur_zone;
  */
 static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
 	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
-	  XFS_FIBT_MAGIC },
+	  XFS_FIBT_MAGIC, 0 },
 	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
-	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
+	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC,
+	  XFS_REFC_CRC_MAGIC }
 };
 #define xfs_btree_magic(cur) \
 	xfs_magics[!!((cur)->bc_flags & XFS_BTREE_CRC_BLOCKS)][cur->bc_btnum]
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 6443c74..ddb66fd 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -66,6 +66,7 @@ union xfs_btree_rec {
 #define	XFS_BTNUM_INO	((xfs_btnum_t)XFS_BTNUM_INOi)
 #define	XFS_BTNUM_FINO	((xfs_btnum_t)XFS_BTNUM_FINOi)
 #define	XFS_BTNUM_RMAP	((xfs_btnum_t)XFS_BTNUM_RMAPi)
+#define	XFS_BTNUM_REFC	((xfs_btnum_t)XFS_BTNUM_REFCi)
 
 /*
  * For logging record fields.
@@ -99,6 +100,7 @@ do {    \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
 	case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(__mp, rmap, stat); break; \
+	case XFS_BTNUM_REFC: break; \
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
 	}       \
 } while (0)
@@ -121,6 +123,7 @@ do {    \
 		__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
 	case XFS_BTNUM_RMAP:	\
 		__XFS_BTREE_STATS_ADD(__mp, rmap, stat, val); break; \
+	case XFS_BTNUM_REFC: break;	\
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
 	}       \
 } while (0)
@@ -225,6 +228,7 @@ typedef struct xfs_btree_cur
 	union {
 		struct {			/* needed for BNO, CNT, INO */
 			struct xfs_buf	*agbp;	/* agf/agi buffer pointer */
+			struct xfs_bmap_free *flist;	/* list to free after */
 			xfs_agnumber_t	agno;	/* ag number */
 		} a;
 		struct {			/* needed for BMAP */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 4678831..4e5c918 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -456,6 +456,7 @@ xfs_sb_has_compat_feature(
 
 #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
+#define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
 		 XFS_SB_FEAT_RO_COMPAT_RMAPBT)
@@ -534,6 +535,12 @@ static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
 		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
 }
 
+static inline bool xfs_sb_version_hasreflink(struct xfs_sb *sbp)
+{
+	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_REFLINK);
+}
+
 /*
  * XFS_SB_FEAT_INCOMPAT_META_UUID indicates that the metadata UUID
  * is stored separately from the user-visible UUID; this allows the
@@ -640,12 +647,15 @@ typedef struct xfs_agf {
 	__be32		agf_btreeblks;	/* # of blocks held in AGF btrees */
 	uuid_t		agf_uuid;	/* uuid of filesystem */
 
+	__be32		agf_refcount_root;	/* refcount tree root block */
+	__be32		agf_refcount_level;	/* refcount btree levels */
+
 	/*
 	 * reserve some contiguous space for future logged fields before we add
 	 * the unlogged fields. This makes the range logging via flags and
 	 * structure offsets much simpler.
 	 */
-	__be64		agf_spare64[16];
+	__be64		agf_spare64[15];
 
 	/* unlogged fields, written during buffer writeback. */
 	__be64		agf_lsn;	/* last write sequence */
@@ -1032,6 +1042,18 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev)
 	 XFS_DIFLAG_EXTSZINHERIT | XFS_DIFLAG_NODEFRAG | XFS_DIFLAG_FILESTREAM)
 
 /*
+ * Values for di_flags2
+ * There should be a one-to-one correspondence between these flags and the
+ * XFS_XFLAG_s.
+ */
+#define XFS_DIFLAG2_REFLINK_BIT   0	/* file's blocks may be reflinked */
+#define XFS_DIFLAG2_REFLINK      (1 << XFS_DIFLAG2_REFLINK_BIT)
+
+#define XFS_DIFLAG2_ANY \
+	(XFS_DIFLAG2_REFLINK)
+
+
+/*
  * Inode number format:
  * low inopblog bits - offset in block
  * next agblklog bits - block number in ag
@@ -1376,7 +1398,8 @@ XFS_RMAP_INO_OWNER(
 #define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
 #define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
 #define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
-#define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
+#define XFS_RMAP_OWN_REFC	(-8ULL) /* refcount tree */
+#define XFS_RMAP_OWN_MIN	(-9ULL) /* guard */
 
 #define XFS_RMAP_NON_INODE_OWNER(owner)	(!!((owner) & (1ULL << 63)))
 
@@ -1479,6 +1502,13 @@ xfs_owner_info_pack(
 }
 
 /*
+ * Reference Count Btree format definitions
+ *
+ */
+#define	XFS_REFC_CRC_MAGIC	0x52334643	/* 'R3FC' */
+
+
+/*
  * BMAP Btree format definitions
  *
  * This includes both the root block definition that sits inside an inode fork
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index 3d50364..be7b6de 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -109,7 +109,7 @@ typedef enum {
 
 typedef enum {
 	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
-	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
+	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_MAX
 } xfs_btnum_t;
 
 struct xfs_name {
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index ca9e119..6436a96 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -202,6 +202,11 @@ xfs_get_initial_prid(struct xfs_inode *dp)
 	return XFS_PROJID_DEFAULT;
 }
 
+static inline bool xfs_is_reflink_inode(struct xfs_inode *ip)
+{
+	return ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK;
+}
+
 /*
  * In-core inode flags.
  */
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 16f31f3..edced9c 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -316,6 +316,9 @@ typedef struct xfs_perag {
 	/* for rcu-safe freeing */
 	struct rcu_head	rcu_head;
 	int		pagb_count;	/* pagb slots in use */
+
+	/* reference count */
+	__uint8_t	pagf_refcount_level;
 } xfs_perag_t;
 
 extern void	xfs_uuid_table_free(void);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 39/76] xfs: add refcount btree stats infrastructure
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (37 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 38/76] xfs: introduce refcount btree definitions Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2015-12-19  9:00 ` [PATCH 40/76] xfs: refcount btree add more reserved blocks Darrick J. Wong
                   ` (37 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

The refcount btree presents the same stats as the other btrees, so
add all the code for that now.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_btree.h |    5 +++--
 fs/xfs/xfs_stats.c        |    1 +
 fs/xfs/xfs_stats.h        |   18 +++++++++++++++++-
 3 files changed, 21 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index ddb66fd..ec37ac7 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -100,7 +100,7 @@ do {    \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
 	case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(__mp, rmap, stat); break; \
-	case XFS_BTNUM_REFC: break; \
+	case XFS_BTNUM_REFC: __XFS_BTREE_STATS_INC(__mp, refcbt, stat); break; \
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
 	}       \
 } while (0)
@@ -123,7 +123,8 @@ do {    \
 		__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
 	case XFS_BTNUM_RMAP:	\
 		__XFS_BTREE_STATS_ADD(__mp, rmap, stat, val); break; \
-	case XFS_BTNUM_REFC: break;	\
+	case XFS_BTNUM_REFC:	\
+		__XFS_BTREE_STATS_ADD(__mp, refcbt, stat, val); break; \
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break; \
 	}       \
 } while (0)
diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
index f04f547..e447deb 100644
--- a/fs/xfs/xfs_stats.c
+++ b/fs/xfs/xfs_stats.c
@@ -62,6 +62,7 @@ int xfs_stats_format(struct xfsstats __percpu *stats, char *buf)
 		{ "ibt2",		XFSSTAT_END_IBT_V2		},
 		{ "fibt2",		XFSSTAT_END_FIBT_V2		},
 		{ "rmapbt",		XFSSTAT_END_RMAP_V2		},
+		{ "refcntbt",		XFSSTAT_END_REFCOUNT		},
 		/* we print both series of quota information together */
 		{ "qm",			XFSSTAT_END_QM			},
 	};
diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h
index 657865f..79ad2e6 100644
--- a/fs/xfs/xfs_stats.h
+++ b/fs/xfs/xfs_stats.h
@@ -213,7 +213,23 @@ struct xfsstats {
 	__uint32_t		xs_rmap_2_alloc;
 	__uint32_t		xs_rmap_2_free;
 	__uint32_t		xs_rmap_2_moves;
-#define XFSSTAT_END_XQMSTAT		(XFSSTAT_END_RMAP_V2+6)
+#define XFSSTAT_END_REFCOUNT		(XFSSTAT_END_RMAP_V2 + 15)
+	__uint32_t		xs_refcbt_2_lookup;
+	__uint32_t		xs_refcbt_2_compare;
+	__uint32_t		xs_refcbt_2_insrec;
+	__uint32_t		xs_refcbt_2_delrec;
+	__uint32_t		xs_refcbt_2_newroot;
+	__uint32_t		xs_refcbt_2_killroot;
+	__uint32_t		xs_refcbt_2_increment;
+	__uint32_t		xs_refcbt_2_decrement;
+	__uint32_t		xs_refcbt_2_lshift;
+	__uint32_t		xs_refcbt_2_rshift;
+	__uint32_t		xs_refcbt_2_split;
+	__uint32_t		xs_refcbt_2_join;
+	__uint32_t		xs_refcbt_2_alloc;
+	__uint32_t		xs_refcbt_2_free;
+	__uint32_t		xs_refcbt_2_moves;
+#define XFSSTAT_END_XQMSTAT		(XFSSTAT_END_REFCOUNT + 6)
 	__uint32_t		xs_qm_dqreclaims;
 	__uint32_t		xs_qm_dqreclaim_misses;
 	__uint32_t		xs_qm_dquot_dups;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 40/76] xfs: refcount btree add more reserved blocks
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (38 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 39/76] xfs: add refcount btree stats infrastructure Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2015-12-19  9:00 ` [PATCH 41/76] xfs: define the on-disk refcount btree format Darrick J. Wong
                   ` (36 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Since XFS reserves a small amount of space in each AG as the minimum
free space needed for an operation, save some more space in case we
touch the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |   13 +++++++++++++
 fs/xfs/libxfs/xfs_format.h |    2 ++
 2 files changed, 15 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index b69993d..71dac69 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -50,10 +50,23 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
 
+unsigned int
+xfs_refc_block(
+	struct xfs_mount	*mp)
+{
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return XFS_RMAP_BLOCK(mp) + 1;
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		return XFS_FIBT_BLOCK(mp) + 1;
+	return XFS_IBT_BLOCK(mp) + 1;
+}
+
 xfs_extlen_t
 xfs_prealloc_blocks(
 	struct xfs_mount	*mp)
 {
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		return xfs_refc_block(mp) + 1;
 	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return XFS_RMAP_BLOCK(mp) + 1;
 	if (xfs_sb_version_hasfinobt(&mp->m_sb))
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 4e5c918..611083e 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1507,6 +1507,8 @@ xfs_owner_info_pack(
  */
 #define	XFS_REFC_CRC_MAGIC	0x52334643	/* 'R3FC' */
 
+unsigned int xfs_refc_block(struct xfs_mount *mp);
+
 
 /*
  * BMAP Btree format definitions

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 41/76] xfs: define the on-disk refcount btree format
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (39 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 40/76] xfs: refcount btree add more reserved blocks Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2015-12-19  9:00 ` [PATCH 42/76] xfs: add refcount btree support to growfs Darrick J. Wong
                   ` (35 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Start constructing the refcount btree implementation by establishing
the on-disk format and everything needed to read, write, and
manipulate the refcount btree blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile                    |    1 
 fs/xfs/libxfs/xfs_btree.c          |    3 +
 fs/xfs/libxfs/xfs_btree.h          |    3 +
 fs/xfs/libxfs/xfs_format.h         |   32 +++++++
 fs/xfs/libxfs/xfs_refcount_btree.c |  173 ++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_refcount_btree.h |   65 ++++++++++++++
 fs/xfs/libxfs/xfs_sb.c             |    9 ++
 fs/xfs/libxfs/xfs_shared.h         |    2 
 fs/xfs/xfs_mount.h                 |    2 
 fs/xfs/xfs_trace.h                 |   11 --
 10 files changed, 291 insertions(+), 10 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_refcount_btree.c
 create mode 100644 fs/xfs/libxfs/xfs_refcount_btree.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 9391080..9bac577 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -53,6 +53,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_log_rlimit.o \
 				   xfs_rmap.o \
 				   xfs_rmap_btree.o \
+				   xfs_refcount_btree.o \
 				   xfs_sb.o \
 				   xfs_symlink_remote.o \
 				   xfs_trans_resv.o \
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 03ff4e4..79b8137 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1134,6 +1134,9 @@ xfs_btree_set_refs(
 	case XFS_BTNUM_RMAP:
 		xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF);
 		break;
+	case XFS_BTNUM_REFC:
+		xfs_buf_set_ref(bp, XFS_REFC_BTREE_REF);
+		break;
 	default:
 		ASSERT(0);
 	}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index ec37ac7..0ca80d3 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -43,6 +43,7 @@ union xfs_btree_key {
 	xfs_alloc_key_t			alloc;
 	struct xfs_inobt_key		inobt;
 	struct xfs_rmap_key		rmap;
+	struct xfs_refcount_key		refc;
 };
 
 union xfs_btree_rec {
@@ -51,6 +52,7 @@ union xfs_btree_rec {
 	struct xfs_alloc_rec		alloc;
 	struct xfs_inobt_rec		inobt;
 	struct xfs_rmap_rec		rmap;
+	struct xfs_refcount_rec		refc;
 };
 
 /*
@@ -217,6 +219,7 @@ typedef struct xfs_btree_cur
 		xfs_bmbt_irec_t		b;
 		xfs_inobt_rec_incore_t	i;
 		struct xfs_rmap_irec	r;
+		struct xfs_refcount_irec	rc;
 	}		bc_rec;		/* current insert/search record value */
 	struct xfs_buf	*bc_bufs[XFS_BTREE_MAXLEVELS];	/* buf ptr per level */
 	int		bc_ptrs[XFS_BTREE_MAXLEVELS];	/* key/record # */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 611083e..7482531 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1509,6 +1509,38 @@ xfs_owner_info_pack(
 
 unsigned int xfs_refc_block(struct xfs_mount *mp);
 
+/*
+ * Data record/key structure
+ *
+ * Each record associates a range of physical blocks (starting at
+ * rc_startblock and ending rc_blockcount blocks later) with a
+ * reference count (rc_refcount).  A record is only stored in the
+ * btree if the refcount is > 2.  An entry in the free block btree
+ * means that the refcount is 0, and no entries anywhere means that
+ * the refcount is 1, as was true in XFS before reflinking.
+ */
+struct xfs_refcount_rec {
+	__be32		rc_startblock;	/* starting block number */
+	__be32		rc_blockcount;	/* count of blocks */
+	__be32		rc_refcount;	/* number of inodes linked here */
+};
+
+struct xfs_refcount_key {
+	__be32		rc_startblock;	/* starting block number */
+};
+
+struct xfs_refcount_irec {
+	xfs_agblock_t	rc_startblock;	/* starting block number */
+	xfs_extlen_t	rc_blockcount;	/* count of free blocks */
+	xfs_nlink_t	rc_refcount;	/* number of inodes linked here */
+};
+
+#define MAXREFCOUNT	((xfs_nlink_t)~0U)
+#define MAXREFCEXTLEN	((xfs_extlen_t)~0U)
+
+/* btree pointer type */
+typedef __be32 xfs_refcount_ptr_t;
+
 
 /*
  * BMAP Btree format definitions
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
new file mode 100644
index 0000000..7067a05
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -0,0 +1,173 @@
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_trans.h"
+#include "xfs_bit.h"
+
+static struct xfs_btree_cur *
+xfs_refcountbt_dup_cursor(
+	struct xfs_btree_cur	*cur)
+{
+	return xfs_refcountbt_init_cursor(cur->bc_mp, cur->bc_tp,
+			cur->bc_private.a.agbp, cur->bc_private.a.agno,
+			cur->bc_private.a.flist);
+}
+
+STATIC bool
+xfs_refcountbt_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	struct xfs_perag	*pag = bp->b_pag;
+	unsigned int		level;
+
+	if (block->bb_magic != cpu_to_be32(XFS_REFC_CRC_MAGIC))
+		return false;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return false;
+	if (!xfs_btree_sblock_v5hdr_verify(bp))
+		return false;
+
+	level = be16_to_cpu(block->bb_level);
+	if (pag && pag->pagf_init) {
+		if (level >= pag->pagf_refcount_level)
+			return false;
+	} else if (level >= mp->m_ag_maxlevels)
+		return false;
+
+	return xfs_btree_sblock_verify(bp, mp->m_refc_mxr[level != 0]);
+}
+
+STATIC void
+xfs_refcountbt_read_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_btree_sblock_verify_crc(bp))
+		xfs_buf_ioerror(bp, -EFSBADCRC);
+	else if (!xfs_refcountbt_verify(bp))
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+
+	if (bp->b_error) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_verifier_error(bp);
+	}
+}
+
+STATIC void
+xfs_refcountbt_write_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_refcountbt_verify(bp)) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+		xfs_verifier_error(bp);
+		return;
+	}
+	xfs_btree_sblock_calc_crc(bp);
+
+}
+
+const struct xfs_buf_ops xfs_refcountbt_buf_ops = {
+	.verify_read		= xfs_refcountbt_read_verify,
+	.verify_write		= xfs_refcountbt_write_verify,
+};
+
+static const struct xfs_btree_ops xfs_refcountbt_ops = {
+	.rec_len		= sizeof(struct xfs_refcount_rec),
+	.key_len		= sizeof(struct xfs_refcount_key),
+
+	.dup_cursor		= xfs_refcountbt_dup_cursor,
+	.buf_ops		= &xfs_refcountbt_buf_ops,
+};
+
+/**
+ * xfs_refcountbt_init_cursor() -- Allocate a new refcount btree cursor.
+ *
+ * @mp: XFS mount object
+ * @tp: XFS transaction
+ * @agbp: Buffer containing the AGF
+ * @agno: AG number
+ */
+struct xfs_btree_cur *
+xfs_refcountbt_init_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	struct xfs_bmap_free	*flist)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	struct xfs_btree_cur	*cur;
+
+	ASSERT(agno != NULLAGNUMBER);
+	ASSERT(agno < mp->m_sb.sb_agcount);
+	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_SLEEP);
+
+	cur->bc_tp = tp;
+	cur->bc_mp = mp;
+	cur->bc_btnum = XFS_BTNUM_REFC;
+	cur->bc_blocklog = mp->m_sb.sb_blocklog;
+	cur->bc_ops = &xfs_refcountbt_ops;
+
+	cur->bc_nlevels = be32_to_cpu(agf->agf_refcount_level);
+
+	cur->bc_private.a.agbp = agbp;
+	cur->bc_private.a.agno = agno;
+	cur->bc_private.a.flist = flist;
+	cur->bc_flags |= XFS_BTREE_CRC_BLOCKS;
+
+	return cur;
+}
+
+/**
+ * xfs_refcountbt_maxrecs() -- Calculate number of records in a refcount
+ *			       btree block.
+ * @mp: XFS mount object
+ * @blocklen: Length of block, in bytes.
+ * @leaf: true if this is a leaf btree block, false otherwise
+ */
+int
+xfs_refcountbt_maxrecs(
+	struct xfs_mount	*mp,
+	int			blocklen,
+	bool			leaf)
+{
+	blocklen -= XFS_REFCOUNT_BLOCK_LEN;
+
+	if (leaf)
+		return blocklen / sizeof(struct xfs_refcount_rec);
+	return blocklen / (sizeof(struct xfs_refcount_key) +
+			   sizeof(xfs_refcount_ptr_t));
+}
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h
new file mode 100644
index 0000000..d51dc1a
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_refcount_btree.h
@@ -0,0 +1,65 @@
+/*
+ * Copyright (c) 2000,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#ifndef __XFS_REFCOUNT_BTREE_H__
+#define	__XFS_REFCOUNT_BTREE_H__
+
+/*
+ * Reference Count Btree on-disk structures
+ */
+
+struct xfs_buf;
+struct xfs_btree_cur;
+struct xfs_mount;
+
+/*
+ * Btree block header size
+ */
+#define XFS_REFCOUNT_BLOCK_LEN	XFS_BTREE_SBLOCK_CRC_LEN
+
+/*
+ * Record, key, and pointer address macros for btree blocks.
+ *
+ * (note that some of these may appear unused, but they are used in userspace)
+ */
+#define XFS_REFCOUNT_REC_ADDR(block, index) \
+	((struct xfs_refcount_rec *) \
+		((char *)(block) + \
+		 XFS_REFCOUNT_BLOCK_LEN + \
+		 (((index) - 1) * sizeof(struct xfs_refcount_rec))))
+
+#define XFS_REFCOUNT_KEY_ADDR(block, index) \
+	((struct xfs_refcount_key *) \
+		((char *)(block) + \
+		 XFS_REFCOUNT_BLOCK_LEN + \
+		 ((index) - 1) * sizeof(struct xfs_refcount_key)))
+
+#define XFS_REFCOUNT_PTR_ADDR(block, index, maxrecs) \
+	((xfs_refcount_ptr_t *) \
+		((char *)(block) + \
+		 XFS_REFCOUNT_BLOCK_LEN + \
+		 (maxrecs) * sizeof(struct xfs_refcount_key) + \
+		 ((index) - 1) * sizeof(xfs_refcount_ptr_t)))
+
+extern struct xfs_btree_cur *xfs_refcountbt_init_cursor(struct xfs_mount *mp,
+		struct xfs_trans *tp, struct xfs_buf *agbp, xfs_agnumber_t agno,
+		struct xfs_bmap_free *flist);
+extern int xfs_refcountbt_maxrecs(struct xfs_mount *mp, int blocklen,
+		bool leaf);
+
+#endif	/* __XFS_REFCOUNT_BTREE_H__ */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 2590c89..c195ea7 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -37,6 +37,8 @@
 #include "xfs_ialloc_btree.h"
 #include "xfs_log.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_refcount_btree.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -738,6 +740,13 @@ xfs_sb_mount_common(
 	mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
 	mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
 
+	mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize,
+			true);
+	mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize,
+			false);
+	mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2;
+	mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2;
+
 	mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
 	mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
 					sbp->sb_inopblock);
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index fa2bb9b..bffef9e 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -39,6 +39,7 @@ extern const struct xfs_buf_ops xfs_agf_buf_ops;
 extern const struct xfs_buf_ops xfs_agfl_buf_ops;
 extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
 extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
+extern const struct xfs_buf_ops xfs_refcountbt_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
 extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
@@ -216,6 +217,7 @@ int	xfs_log_calc_minimum_size(struct xfs_mount *);
 #define	XFS_INO_REF		2
 #define	XFS_ATTR_BTREE_REF	1
 #define	XFS_DQUOT_REF		1
+#define	XFS_REFC_BTREE_REF	1
 
 /*
  * Flags for xfs_trans_ichgtime().
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index edced9c..caed8d3 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -92,6 +92,8 @@ typedef struct xfs_mount {
 	uint			m_inobt_mnr[2];	/* min inobt btree records */
 	uint			m_rmap_mxr[2];	/* max rmap btree records */
 	uint			m_rmap_mnr[2];	/* min rmap btree records */
+	uint			m_refc_mxr[2];	/* max refc btree records */
+	uint			m_refc_mnr[2];	/* min refc btree records */
 	uint			m_ag_maxlevels;	/* XFS_AG_MAXLEVELS */
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* max inobt btree levels. */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 1c91439..11b159a 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -38,16 +38,7 @@ struct xlog_recover_item;
 struct xfs_buf_log_format;
 struct xfs_inode_log_format;
 struct xfs_bmbt_irec;
-
-#ifndef XFS_REFCOUNT_IREC_PLACEHOLDER
-#define XFS_REFCOUNT_IREC_PLACEHOLDER
-/* Placeholder definition to avoid breaking bisectability. */
-struct xfs_refcount_irec {
-	xfs_agblock_t	rc_startblock;	/* starting block number */
-	xfs_extlen_t	rc_blockcount;	/* count of free blocks */
-	xfs_nlink_t	rc_refcount;	/* number of inodes linked here */
-};
-#endif
+struct xfs_refcount_irec;
 
 DECLARE_EVENT_CLASS(xfs_attr_list_class,
 	TP_PROTO(struct xfs_attr_list_context *ctx),

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 42/76] xfs: add refcount btree support to growfs
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (40 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 41/76] xfs: define the on-disk refcount btree format Darrick J. Wong
@ 2015-12-19  9:00 ` Darrick J. Wong
  2015-12-19  9:01 ` [PATCH 43/76] xfs: add refcount btree operations Darrick J. Wong
                   ` (34 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:00 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Modify the growfs code to initialize new refcount btree blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_fsops.c |   38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)


diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 64bb02b..7c37c22 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -260,6 +260,11 @@ xfs_growfs_data_private(
 		agf->agf_longest = cpu_to_be32(tmpsize);
 		if (xfs_sb_version_hascrc(&mp->m_sb))
 			uuid_copy(&agf->agf_uuid, &mp->m_sb.sb_meta_uuid);
+		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+			agf->agf_refcount_root = cpu_to_be32(
+					xfs_refc_block(mp));
+			agf->agf_refcount_level = cpu_to_be32(1);
+		}
 
 		error = xfs_bwrite(bp);
 		xfs_buf_relse(bp);
@@ -451,6 +456,17 @@ xfs_growfs_data_private(
 			rrec->rm_offset = 0;
 			be16_add_cpu(&block->bb_numrecs, 1);
 
+			/* account for refc btree root */
+			if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+				rrec = XFS_RMAP_REC_ADDR(block, 5);
+				rrec->rm_startblock = cpu_to_be32(
+						xfs_refc_block(mp));
+				rrec->rm_blockcount = cpu_to_be32(1);
+				rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_REFC);
+				rrec->rm_offset = 0;
+				be16_add_cpu(&block->bb_numrecs, 1);
+			}
+
 			error = xfs_bwrite(bp);
 			xfs_buf_relse(bp);
 			if (error)
@@ -508,6 +524,28 @@ xfs_growfs_data_private(
 				goto error0;
 		}
 
+		/*
+		 * refcount btree root block
+		 */
+		if (xfs_sb_version_hasreflink(&mp->m_sb)) {
+			bp = xfs_growfs_get_hdr_buf(mp,
+				XFS_AGB_TO_DADDR(mp, agno, xfs_refc_block(mp)),
+				BTOBB(mp->m_sb.sb_blocksize), 0,
+				&xfs_refcountbt_buf_ops);
+			if (!bp) {
+				error = -ENOMEM;
+				goto error0;
+			}
+
+			xfs_btree_init_block(mp, bp, XFS_REFC_CRC_MAGIC,
+					     0, 0, agno,
+					     XFS_BTREE_CRC_BLOCKS);
+
+			error = xfs_bwrite(bp);
+			xfs_buf_relse(bp);
+			if (error)
+				goto error0;
+		}
 	}
 	xfs_trans_agblocks_delta(tp, nfree);
 	/*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 43/76] xfs: add refcount btree operations
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (41 preceding siblings ...)
  2015-12-19  9:00 ` [PATCH 42/76] xfs: add refcount btree support to growfs Darrick J. Wong
@ 2015-12-19  9:01 ` Darrick J. Wong
  2015-12-19  9:01 ` [PATCH 44/76] libxfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
                   ` (33 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:01 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Implement the generic btree operations required to manipulate refcount
btree blocks.  The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile                    |    1 
 fs/xfs/libxfs/xfs_refcount.c       |  169 ++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_refcount.h       |   29 +++++
 fs/xfs/libxfs/xfs_refcount_btree.c |  203 ++++++++++++++++++++++++++++++++++++
 4 files changed, 402 insertions(+)
 create mode 100644 fs/xfs/libxfs/xfs_refcount.c
 create mode 100644 fs/xfs/libxfs/xfs_refcount.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 9bac577..7d84c62 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -53,6 +53,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_log_rlimit.o \
 				   xfs_rmap.o \
 				   xfs_rmap_btree.o \
+				   xfs_refcount.o \
 				   xfs_refcount_btree.o \
 				   xfs_sb.o \
 				   xfs_symlink_remote.o \
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
new file mode 100644
index 0000000..b3f2c25
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -0,0 +1,169 @@
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_btree.h"
+#include "xfs_bmap.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_alloc.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_trans.h"
+#include "xfs_bit.h"
+#include "xfs_refcount.h"
+
+/**
+ * xfs_refcountbt_lookup_le() -- Look up the first record less than or equal to
+ *				 [bno, len] in the btree given by cur.
+ * @cur: refcount btree cursor
+ * @bno: AG block number to look up
+ * @stat: set to 1 if successful, 0 otherwise
+ */
+int
+xfs_refcountbt_lookup_le(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	int			*stat)
+{
+	trace_xfs_refcountbt_lookup(cur->bc_mp, cur->bc_private.a.agno, bno,
+			XFS_LOOKUP_LE);
+	cur->bc_rec.rc.rc_startblock = bno;
+	cur->bc_rec.rc.rc_blockcount = 0;
+	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+}
+
+/**
+ * xfs_refcountbt_lookup_ge() -- Look up the first record greater than or equal
+ *				 to [bno, len] in the btree given by cur.
+ * @cur: refcount btree cursor
+ * @bno: AG block number to look up
+ * @stat: set to 1 if successful, 0 otherwise
+ */
+int					/* error */
+xfs_refcountbt_lookup_ge(
+	struct xfs_btree_cur	*cur,	/* btree cursor */
+	xfs_agblock_t		bno,	/* starting block of extent */
+	int			*stat)	/* success/failure */
+{
+	trace_xfs_refcountbt_lookup(cur->bc_mp, cur->bc_private.a.agno, bno,
+			XFS_LOOKUP_GE);
+	cur->bc_rec.rc.rc_startblock = bno;
+	cur->bc_rec.rc.rc_blockcount = 0;
+	return xfs_btree_lookup(cur, XFS_LOOKUP_GE, stat);
+}
+
+/**
+ * xfs_refcountbt_get_rec() -- Get the data from the pointed-to record.
+ *
+ * @cur: refcount btree cursor
+ * @irec: set to the record currently pointed to by the btree cursor
+ * @stat: set to 1 if successful, 0 otherwise
+ */
+int
+xfs_refcountbt_get_rec(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*irec,
+	int				*stat)
+{
+	union xfs_btree_rec	*rec;
+	int			error;
+
+	error = xfs_btree_get_rec(cur, &rec, stat);
+	if (!error && *stat == 1) {
+		irec->rc_startblock = be32_to_cpu(rec->refc.rc_startblock);
+		irec->rc_blockcount = be32_to_cpu(rec->refc.rc_blockcount);
+		irec->rc_refcount = be32_to_cpu(rec->refc.rc_refcount);
+		trace_xfs_refcountbt_get(cur->bc_mp, cur->bc_private.a.agno,
+				irec);
+	}
+	return error;
+}
+
+/*
+ * Update the record referred to by cur to the value given
+ * by [bno, len, refcount].
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_refcountbt_update(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*irec)
+{
+	union xfs_btree_rec	rec;
+
+	trace_xfs_refcountbt_update(cur->bc_mp, cur->bc_private.a.agno, irec);
+	rec.refc.rc_startblock = cpu_to_be32(irec->rc_startblock);
+	rec.refc.rc_blockcount = cpu_to_be32(irec->rc_blockcount);
+	rec.refc.rc_refcount = cpu_to_be32(irec->rc_refcount);
+	return xfs_btree_update(cur, &rec);
+}
+
+/*
+ * Insert the record referred to by cur to the value given
+ * by [bno, len, refcount].
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_refcountbt_insert(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*irec,
+	int				*i)
+{
+	trace_xfs_refcountbt_insert(cur->bc_mp, cur->bc_private.a.agno, irec);
+	cur->bc_rec.rc.rc_startblock = irec->rc_startblock;
+	cur->bc_rec.rc.rc_blockcount = irec->rc_blockcount;
+	cur->bc_rec.rc.rc_refcount = irec->rc_refcount;
+	return xfs_btree_insert(cur, i);
+}
+
+/*
+ * Remove the record referred to by cur, then set the pointer to the spot
+ * where the record could be re-inserted, in case we want to increment or
+ * decrement the cursor.
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_refcountbt_delete(
+	struct xfs_btree_cur	*cur,
+	int			*i)
+{
+	struct xfs_refcount_irec	irec;
+	int			found_rec;
+	int			error;
+
+	error = xfs_refcountbt_get_rec(cur, &irec, &found_rec);
+	if (error)
+		return error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+	trace_xfs_refcountbt_delete(cur->bc_mp, cur->bc_private.a.agno, &irec);
+	error = xfs_btree_delete(cur, i);
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, *i == 1, out_error);
+	if (error)
+		return error;
+	error = xfs_refcountbt_lookup_ge(cur, irec.rc_startblock, &found_rec);
+out_error:
+	return error;
+}
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
new file mode 100644
index 0000000..033a9b1
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2000,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#ifndef __XFS_REFCOUNT_H__
+#define __XFS_REFCOUNT_H__
+
+extern int xfs_refcountbt_lookup_le(struct xfs_btree_cur *cur,
+		xfs_agblock_t bno, int *stat);
+extern int xfs_refcountbt_lookup_ge(struct xfs_btree_cur *cur,
+		xfs_agblock_t bno, int *stat);
+extern int xfs_refcountbt_get_rec(struct xfs_btree_cur *cur,
+		struct xfs_refcount_irec *irec, int *stat);
+
+#endif	/* __XFS_REFCOUNT_H__ */
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index 7067a05..d7574cd 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -43,6 +43,158 @@ xfs_refcountbt_dup_cursor(
 			cur->bc_private.a.flist);
 }
 
+STATIC void
+xfs_refcountbt_set_root(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	int			inc)
+{
+	struct xfs_buf		*agbp = cur->bc_private.a.agbp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
+	struct xfs_perag	*pag = xfs_perag_get(cur->bc_mp, seqno);
+
+	ASSERT(ptr->s != 0);
+
+	agf->agf_refcount_root = ptr->s;
+	be32_add_cpu(&agf->agf_refcount_level, inc);
+	pag->pagf_refcount_level += inc;
+	xfs_perag_put(pag);
+
+	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
+}
+
+STATIC int
+xfs_refcountbt_alloc_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*start,
+	union xfs_btree_ptr	*new,
+	int			*stat)
+{
+	struct xfs_alloc_arg	args;		/* block allocation args */
+	int			error;		/* error return value */
+
+	memset(&args, 0, sizeof(args));
+	args.tp = cur->bc_tp;
+	args.mp = cur->bc_mp;
+	args.type = XFS_ALLOCTYPE_NEAR_BNO;
+	args.fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_private.a.agno,
+			xfs_refc_block(args.mp));
+	args.firstblock = args.fsbno;
+	XFS_RMAP_AG_OWNER(&args.oinfo, XFS_RMAP_OWN_REFC);
+	args.minlen = args.maxlen = args.prod = 1;
+
+	error = xfs_alloc_vextent(&args);
+	if (error)
+		goto out_error;
+	if (args.fsbno == NULLFSBLOCK) {
+		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+		*stat = 0;
+		return 0;
+	}
+	ASSERT(args.agno == cur->bc_private.a.agno);
+	ASSERT(args.len == 1);
+
+	new->s = cpu_to_be32(args.agbno);
+
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+	*stat = 1;
+	return 0;
+
+out_error:
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR);
+	return error;
+}
+
+STATIC int
+xfs_refcountbt_free_block(
+	struct xfs_btree_cur	*cur,
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = cur->bc_mp;
+	struct xfs_trans	*tp = cur->bc_tp;
+	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
+	struct xfs_owner_info	oinfo;
+
+	XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_REFC);
+	xfs_bmap_add_free(mp, cur->bc_private.a.flist, fsbno, 1,
+			&oinfo);
+	xfs_trans_binval(tp, bp);
+	return 0;
+}
+
+STATIC int
+xfs_refcountbt_get_minrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_refc_mnr[level != 0];
+}
+
+STATIC int
+xfs_refcountbt_get_maxrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_refc_mxr[level != 0];
+}
+
+STATIC void
+xfs_refcountbt_init_key_from_rec(
+	union xfs_btree_key	*key,
+	union xfs_btree_rec	*rec)
+{
+	ASSERT(rec->refc.rc_startblock != 0);
+
+	key->refc.rc_startblock = rec->refc.rc_startblock;
+}
+
+STATIC void
+xfs_refcountbt_init_rec_from_key(
+	union xfs_btree_key	*key,
+	union xfs_btree_rec	*rec)
+{
+	ASSERT(key->refc.rc_startblock != 0);
+
+	rec->refc.rc_startblock = key->refc.rc_startblock;
+}
+
+STATIC void
+xfs_refcountbt_init_rec_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*rec)
+{
+	ASSERT(cur->bc_rec.rc.rc_startblock != 0);
+
+	rec->refc.rc_startblock = cpu_to_be32(cur->bc_rec.rc.rc_startblock);
+	rec->refc.rc_blockcount = cpu_to_be32(cur->bc_rec.rc.rc_blockcount);
+	rec->refc.rc_refcount = cpu_to_be32(cur->bc_rec.rc.rc_refcount);
+}
+
+STATIC void
+xfs_refcountbt_init_ptr_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp);
+
+	ASSERT(cur->bc_private.a.agno == be32_to_cpu(agf->agf_seqno));
+	ASSERT(agf->agf_refcount_root != 0);
+
+	ptr->s = agf->agf_refcount_root;
+}
+
+STATIC __int64_t
+xfs_refcountbt_key_diff(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*key)
+{
+	struct xfs_refcount_irec	*rec = &cur->bc_rec.rc;
+	struct xfs_refcount_key		*kp = &key->refc;
+
+	return (__int64_t)be32_to_cpu(kp->rc_startblock) - rec->rc_startblock;
+}
+
 STATIC bool
 xfs_refcountbt_verify(
 	struct xfs_buf		*bp)
@@ -104,12 +256,63 @@ const struct xfs_buf_ops xfs_refcountbt_buf_ops = {
 	.verify_write		= xfs_refcountbt_write_verify,
 };
 
+#if defined(DEBUG) || defined(XFS_WARN)
+STATIC int
+xfs_refcountbt_keys_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	return be32_to_cpu(k1->refc.rc_startblock) <
+	       be32_to_cpu(k2->refc.rc_startblock);
+}
+
+STATIC int
+xfs_refcountbt_recs_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*r1,
+	union xfs_btree_rec	*r2)
+{
+	struct xfs_refcount_irec	a, b;
+
+	int ret = be32_to_cpu(r1->refc.rc_startblock) +
+		be32_to_cpu(r1->refc.rc_blockcount) <=
+		be32_to_cpu(r2->refc.rc_startblock);
+	if (!ret) {
+		a.rc_startblock = be32_to_cpu(r1->refc.rc_startblock);
+		a.rc_blockcount = be32_to_cpu(r1->refc.rc_blockcount);
+		a.rc_refcount = be32_to_cpu(r1->refc.rc_refcount);
+		b.rc_startblock = be32_to_cpu(r2->refc.rc_startblock);
+		b.rc_blockcount = be32_to_cpu(r2->refc.rc_blockcount);
+		b.rc_refcount = be32_to_cpu(r2->refc.rc_refcount);
+		trace_xfs_refcount_rec_order_error(cur->bc_mp,
+				cur->bc_private.a.agno, &a, &b);
+	}
+
+	return ret;
+}
+#endif	/* DEBUG */
+
 static const struct xfs_btree_ops xfs_refcountbt_ops = {
 	.rec_len		= sizeof(struct xfs_refcount_rec),
 	.key_len		= sizeof(struct xfs_refcount_key),
 
 	.dup_cursor		= xfs_refcountbt_dup_cursor,
+	.set_root		= xfs_refcountbt_set_root,
+	.alloc_block		= xfs_refcountbt_alloc_block,
+	.free_block		= xfs_refcountbt_free_block,
+	.get_minrecs		= xfs_refcountbt_get_minrecs,
+	.get_maxrecs		= xfs_refcountbt_get_maxrecs,
+	.init_key_from_rec	= xfs_refcountbt_init_key_from_rec,
+	.init_rec_from_key	= xfs_refcountbt_init_rec_from_key,
+	.init_rec_from_cur	= xfs_refcountbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfs_refcountbt_init_ptr_from_cur,
+	.key_diff		= xfs_refcountbt_key_diff,
 	.buf_ops		= &xfs_refcountbt_buf_ops,
+#if defined(DEBUG) || defined(XFS_WARN)
+	.keys_inorder		= xfs_refcountbt_keys_inorder,
+	.recs_inorder		= xfs_refcountbt_recs_inorder,
+#endif
 };
 
 /**

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 44/76] libxfs: adjust refcount of an extent of blocks in refcount btree
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (42 preceding siblings ...)
  2015-12-19  9:01 ` [PATCH 43/76] xfs: add refcount btree operations Darrick J. Wong
@ 2015-12-19  9:01 ` Darrick J. Wong
  2015-12-19  9:01 ` [PATCH 45/76] libxfs: adjust refcount when unmapping file blocks Darrick J. Wong
                   ` (32 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:01 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Provide functions to adjust the reference counts for an extent of
physical blocks stored in the refcount btree.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_refcount.c |  803 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_refcount.h |    8 
 2 files changed, 811 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index b3f2c25..dae6453 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -167,3 +167,806 @@ xfs_refcountbt_delete(
 out_error:
 	return error;
 }
+
+/*
+ * Adjusting the Reference Count
+ *
+ * As stated elsewhere, the reference count btree (refcbt) stores
+ * >1 reference counts for extents of physical blocks.  In this
+ * operation, we're either raising or lowering the reference count of
+ * some subrange stored in the tree:
+ *
+ *      <------ adjustment range ------>
+ * ----+   +---+-----+ +--+--------+---------
+ *  2  |   | 3 |  4  | |17|   55   |   10
+ * ----+   +---+-----+ +--+--------+---------
+ * X axis is physical blocks number;
+ * reference counts are the numbers inside the rectangles
+ *
+ * The first thing we need to do is to ensure that there are no
+ * refcount extents crossing either boundary of the range to be
+ * adjusted.  For any extent that does cross a boundary, split it into
+ * two extents so that we can increment the refcount of one of the
+ * pieces later:
+ *
+ *      <------ adjustment range ------>
+ * ----+   +---+-----+ +--+--------+----+----
+ *  2  |   | 3 |  2  | |17|   55   | 10 | 10
+ * ----+   +---+-----+ +--+--------+----+----
+ *
+ * For this next step, let's assume that all the physical blocks in
+ * the adjustment range are mapped to a file and are therefore in use
+ * at least once.  Therefore, we can infer that any gap in the
+ * refcount tree within the adjustment range represents a physical
+ * extent with refcount == 1:
+ *
+ *      <------ adjustment range ------>
+ * ----+---+---+-----+-+--+--------+----+----
+ *  2  |"1"| 3 |  2  |1|17|   55   | 10 | 10
+ * ----+---+---+-----+-+--+--------+----+----
+ *      ^
+ *
+ * For each extent that falls within the interval range, figure out
+ * which extent is to the left or the right of that extent.  Now we
+ * have a left, current, and right extent.  If the new reference count
+ * of the center extent enables us to merge left, center, and right
+ * into one record covering all three, do so.  If the center extent is
+ * at the left end of the range, abuts the left extent, and its new
+ * reference count matches the left extent's record, then merge them.
+ * If the center extent is at the right end of the range, abuts the
+ * right extent, and the reference counts match, merge those.  In the
+ * example, we can left merge (assuming an increment operation):
+ *
+ *      <------ adjustment range ------>
+ * --------+---+-----+-+--+--------+----+----
+ *    2    | 3 |  2  |1|17|   55   | 10 | 10
+ * --------+---+-----+-+--+--------+----+----
+ *          ^
+ *
+ * For all other extents within the range, adjust the reference count
+ * or delete it if the refcount falls below 2.  If we were
+ * incrementing, the end result looks like this:
+ *
+ *      <------ adjustment range ------>
+ * --------+---+-----+-+--+--------+----+----
+ *    2    | 4 |  3  |2|18|   56   | 11 | 10
+ * --------+---+-----+-+--+--------+----+----
+ *
+ * The result of a decrement operation looks as such:
+ *
+ *      <------ adjustment range ------>
+ * ----+   +---+       +--+--------+----+----
+ *  2  |   | 2 |       |16|   54   |  9 | 10
+ * ----+   +---+       +--+--------+----+----
+ *      DDDD    111111DD
+ *
+ * The blocks marked "D" are freed; the blocks marked "1" are only
+ * referenced once and therefore the record is removed from the
+ * refcount btree.
+ */
+
+#define RCNEXT(rc)	((rc).rc_startblock + (rc).rc_blockcount)
+/*
+ * Split a left rcextent that crosses agbno.
+ */
+STATIC int
+try_split_left_rcextent(
+	struct xfs_btree_cur		*cur,
+	xfs_agblock_t			agbno)
+{
+	struct xfs_refcount_irec	left, tmp;
+	int				found_rec;
+	int				error;
+
+	error = xfs_refcountbt_lookup_le(cur, agbno, &found_rec);
+	if (error)
+		goto out_error;
+	if (!found_rec)
+		return 0;
+
+	error = xfs_refcountbt_get_rec(cur, &left, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+	if (left.rc_startblock >= agbno || RCNEXT(left) <= agbno)
+		return 0;
+
+	trace_xfs_refcount_split_left_extent(cur->bc_mp, cur->bc_private.a.agno,
+			&left, agbno);
+	tmp = left;
+	tmp.rc_blockcount = agbno - left.rc_startblock;
+	error = xfs_refcountbt_update(cur, &tmp);
+	if (error)
+		goto out_error;
+
+	error = xfs_btree_increment(cur, 0, &found_rec);
+	if (error)
+		goto out_error;
+
+	tmp = left;
+	tmp.rc_startblock = agbno;
+	tmp.rc_blockcount -= (agbno - left.rc_startblock);
+	error = xfs_refcountbt_insert(cur, &tmp, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+	return error;
+
+out_error:
+	trace_xfs_refcount_split_left_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Split a right rcextent that crosses agbno.
+ */
+STATIC int
+try_split_right_rcextent(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		agbnext)
+{
+	struct xfs_refcount_irec	right, tmp;
+	int				found_rec;
+	int				error;
+
+	error = xfs_refcountbt_lookup_le(cur, agbnext - 1, &found_rec);
+	if (error)
+		goto out_error;
+	if (!found_rec)
+		return 0;
+
+	error = xfs_refcountbt_get_rec(cur, &right, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+	if (RCNEXT(right) <= agbnext)
+		return 0;
+
+	trace_xfs_refcount_split_right_extent(cur->bc_mp,
+			cur->bc_private.a.agno, &right, agbnext);
+	tmp = right;
+	tmp.rc_startblock = agbnext;
+	tmp.rc_blockcount -= (agbnext - right.rc_startblock);
+	error = xfs_refcountbt_update(cur, &tmp);
+	if (error)
+		goto out_error;
+
+	tmp = right;
+	tmp.rc_blockcount = agbnext - right.rc_startblock;
+	error = xfs_refcountbt_insert(cur, &tmp, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+	return error;
+
+out_error:
+	trace_xfs_refcount_split_right_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Merge the left, center, and right extents.
+ */
+STATIC int
+merge_center(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*left,
+	struct xfs_refcount_irec	*center,
+	unsigned long long		extlen,
+	xfs_agblock_t			*agbno,
+	xfs_extlen_t			*aglen)
+{
+	int				error;
+	int				found_rec;
+
+	error = xfs_refcountbt_lookup_ge(cur, center->rc_startblock,
+			&found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	error = xfs_refcountbt_delete(cur, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	if (center->rc_refcount > 1) {
+		error = xfs_refcountbt_delete(cur, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+	}
+
+	error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
+			&found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	left->rc_blockcount = extlen;
+	error = xfs_refcountbt_update(cur, left);
+	if (error)
+		goto out_error;
+
+	*aglen = 0;
+	return error;
+
+out_error:
+	trace_xfs_refcount_merge_center_extents_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Merge with the left extent.
+ */
+STATIC int
+merge_left(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*left,
+	struct xfs_refcount_irec	*cleft,
+	xfs_agblock_t			*agbno,
+	xfs_extlen_t			*aglen)
+{
+	int				error;
+	int				found_rec;
+
+	if (cleft->rc_refcount > 1) {
+		error = xfs_refcountbt_lookup_le(cur, cleft->rc_startblock,
+				&found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+
+		error = xfs_refcountbt_delete(cur, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+	}
+
+	error = xfs_refcountbt_lookup_le(cur, left->rc_startblock,
+			&found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	left->rc_blockcount += cleft->rc_blockcount;
+	error = xfs_refcountbt_update(cur, left);
+	if (error)
+		goto out_error;
+
+	*agbno += cleft->rc_blockcount;
+	*aglen -= cleft->rc_blockcount;
+	return error;
+
+out_error:
+	trace_xfs_refcount_merge_left_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Merge with the right extent.
+ */
+STATIC int
+merge_right(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*right,
+	struct xfs_refcount_irec	*cright,
+	xfs_agblock_t			*agbno,
+	xfs_extlen_t			*aglen)
+{
+	int				error;
+	int				found_rec;
+
+	if (cright->rc_refcount > 1) {
+		error = xfs_refcountbt_lookup_le(cur, cright->rc_startblock,
+			&found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+
+		error = xfs_refcountbt_delete(cur, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+	}
+
+	error = xfs_refcountbt_lookup_le(cur, right->rc_startblock,
+			&found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	right->rc_startblock -= cright->rc_blockcount;
+	right->rc_blockcount += cright->rc_blockcount;
+	error = xfs_refcountbt_update(cur, right);
+	if (error)
+		goto out_error;
+
+	*aglen -= cright->rc_blockcount;
+	return error;
+
+out_error:
+	trace_xfs_refcount_merge_right_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Find the left extent and the one after it (cleft).  This function assumes
+ * that we've already split any extent crossing agbno.
+ */
+STATIC int
+find_left_extent(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*left,
+	struct xfs_refcount_irec	*cleft,
+	xfs_agblock_t			agbno,
+	xfs_extlen_t			aglen)
+{
+	struct xfs_refcount_irec	tmp;
+	int				error;
+	int				found_rec;
+
+	left->rc_blockcount = cleft->rc_blockcount = 0;
+	error = xfs_refcountbt_lookup_le(cur, agbno - 1, &found_rec);
+	if (error)
+		goto out_error;
+	if (!found_rec)
+		return 0;
+
+	error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	if (RCNEXT(tmp) != agbno)
+		return 0;
+	/* We have a left extent; retrieve (or invent) the next right one */
+	*left = tmp;
+
+	error = xfs_btree_increment(cur, 0, &found_rec);
+	if (error)
+		goto out_error;
+	if (found_rec) {
+		error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+
+		/* if tmp starts at the end of our range, just use that */
+		if (tmp.rc_startblock == agbno)
+			*cleft = tmp;
+		else {
+			/*
+			 * There's a gap in the refcntbt at the start of the
+			 * range we're interested in (refcount == 1) so
+			 * create the implied extent and pass it back.
+			 */
+			cleft->rc_startblock = agbno;
+			cleft->rc_blockcount = min(aglen,
+					tmp.rc_startblock - agbno);
+			cleft->rc_refcount = 1;
+		}
+	} else {
+		/*
+		 * No extents, so pretend that there's one covering the whole
+		 * range.
+		 */
+		cleft->rc_startblock = agbno;
+		cleft->rc_blockcount = aglen;
+		cleft->rc_refcount = 1;
+	}
+	trace_xfs_refcount_find_left_extent(cur->bc_mp, cur->bc_private.a.agno,
+			left, cleft, agbno);
+	return error;
+
+out_error:
+	trace_xfs_refcount_find_left_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Find the right extent and the one before it (cright).  This function
+ * assumes that we've already split any extents crossing agbno + aglen.
+ */
+STATIC int
+find_right_extent(
+	struct xfs_btree_cur		*cur,
+	struct xfs_refcount_irec	*right,
+	struct xfs_refcount_irec	*cright,
+	xfs_agblock_t			agbno,
+	xfs_extlen_t			aglen)
+{
+	struct xfs_refcount_irec	tmp;
+	int				error;
+	int				found_rec;
+
+	right->rc_blockcount = cright->rc_blockcount = 0;
+	error = xfs_refcountbt_lookup_ge(cur, agbno + aglen, &found_rec);
+	if (error)
+		goto out_error;
+	if (!found_rec)
+		return 0;
+
+	error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1, out_error);
+
+	if (tmp.rc_startblock != agbno + aglen)
+		return 0;
+	/* We have a right extent; retrieve (or invent) the next left one */
+	*right = tmp;
+
+	error = xfs_btree_decrement(cur, 0, &found_rec);
+	if (error)
+		goto out_error;
+	if (found_rec) {
+		error = xfs_refcountbt_get_rec(cur, &tmp, &found_rec);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, found_rec == 1,
+				out_error);
+
+		/* if tmp ends at the end of our range, just use that */
+		if (RCNEXT(tmp) == agbno + aglen)
+			*cright = tmp;
+		else {
+			/*
+			 * There's a gap in the refcntbt at the end of the
+			 * range we're interested in (refcount == 1) so
+			 * create the implied extent and pass it back.
+			 */
+			cright->rc_startblock = max(agbno, RCNEXT(tmp));
+			cright->rc_blockcount = right->rc_startblock -
+					cright->rc_startblock;
+			cright->rc_refcount = 1;
+		}
+	} else {
+		/*
+		 * No extents, so pretend that there's one covering the whole
+		 * range.
+		 */
+		cright->rc_startblock = agbno;
+		cright->rc_blockcount = aglen;
+		cright->rc_refcount = 1;
+	}
+	trace_xfs_refcount_find_right_extent(cur->bc_mp, cur->bc_private.a.agno,
+			cright, right, agbno + aglen);
+	return error;
+
+out_error:
+	trace_xfs_refcount_find_right_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+#undef RCNEXT
+
+/*
+ * Try to merge with any extents on the boundaries of the adjustment range.
+ */
+STATIC int
+try_merge_rcextents(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		*agbno,
+	xfs_extlen_t		*aglen,
+	int			adjust)
+{
+	struct xfs_refcount_irec	left = {0}, cleft = {0};
+	struct xfs_refcount_irec	cright = {0}, right = {0};
+	int				error;
+	unsigned long long		ulen;
+	bool				cequal;
+
+	/*
+	 * Find the extent just below agbno [left], just above agbno [cleft],
+	 * just below (agbno + aglen) [cright], and just above (agbno + aglen)
+	 * [right].
+	 */
+	error = find_left_extent(cur, &left, &cleft, *agbno, *aglen);
+	if (error)
+		return error;
+	error = find_right_extent(cur, &right, &cright, *agbno, *aglen);
+	if (error)
+		return error;
+
+	/* No left or right extent to merge; exit. */
+	if (left.rc_blockcount == 0 && right.rc_blockcount == 0)
+		return 0;
+
+	cequal = (cleft.rc_startblock == cright.rc_startblock) &&
+		 (cleft.rc_blockcount == cright.rc_blockcount);
+
+	/* Try to merge left, cleft, and right.  cleft must == cright. */
+	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount +
+			right.rc_blockcount;
+	if (left.rc_blockcount != 0 && right.rc_blockcount != 0 &&
+	    cleft.rc_blockcount != 0 && cright.rc_blockcount != 0 &&
+	    cequal &&
+	    left.rc_refcount == cleft.rc_refcount + adjust &&
+	    right.rc_refcount == cleft.rc_refcount + adjust &&
+	    ulen < MAXREFCEXTLEN) {
+		trace_xfs_refcount_merge_center_extents(cur->bc_mp,
+			cur->bc_private.a.agno, &left, &cleft, &right);
+		return merge_center(cur, &left, &cleft, ulen, agbno, aglen);
+	}
+
+	/* Try to merge left and cleft. */
+	ulen = (unsigned long long)left.rc_blockcount + cleft.rc_blockcount;
+	if (left.rc_blockcount != 0 && cleft.rc_blockcount != 0 &&
+	    left.rc_refcount == cleft.rc_refcount + adjust &&
+	    ulen < MAXREFCEXTLEN) {
+		trace_xfs_refcount_merge_left_extent(cur->bc_mp,
+			cur->bc_private.a.agno, &left, &cleft);
+		error = merge_left(cur, &left, &cleft, agbno, aglen);
+		if (error)
+			return error;
+
+		/*
+		 * If we just merged left + cleft and cleft == cright,
+		 * we no longer have a cright to merge with right.  We're done.
+		 */
+		if (cequal)
+			return 0;
+	}
+
+	/* Try to merge cright and right. */
+	ulen = (unsigned long long)right.rc_blockcount + cright.rc_blockcount;
+	if (right.rc_blockcount != 0 && cright.rc_blockcount != 0 &&
+	    right.rc_refcount == cright.rc_refcount + adjust &&
+	    ulen < MAXREFCEXTLEN) {
+		trace_xfs_refcount_merge_right_extent(cur->bc_mp,
+			cur->bc_private.a.agno, &cright, &right);
+		return merge_right(cur, &right, &cright, agbno, aglen);
+	}
+
+	return error;
+}
+
+/*
+ * Adjust the refcounts of middle extents.  At this point we should have
+ * split extents that crossed the adjustment range; merged with adjacent
+ * extents; and updated agbno/aglen to reflect the merges.  Therefore,
+ * all we have to do is update the extents inside [agbno, agbno + aglen].
+ */
+STATIC int
+adjust_rcextents(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	int			adj,
+	struct xfs_bmap_free	*flist,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_refcount_irec	ext, tmp;
+	int				error;
+	int				found_rec, found_tmp;
+	xfs_fsblock_t			fsbno;
+
+	error = xfs_refcountbt_lookup_ge(cur, agbno, &found_rec);
+	if (error)
+		goto out_error;
+
+	while (aglen > 0) {
+		error = xfs_refcountbt_get_rec(cur, &ext, &found_rec);
+		if (error)
+			goto out_error;
+		if (!found_rec) {
+			ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks;
+			ext.rc_blockcount = 0;
+			ext.rc_refcount = 0;
+		}
+
+		/*
+		 * Deal with a hole in the refcount tree; if a file maps to
+		 * these blocks and there's no refcountbt recourd, pretend that
+		 * there is one with refcount == 1.
+		 */
+		if (ext.rc_startblock != agbno) {
+			tmp.rc_startblock = agbno;
+			tmp.rc_blockcount = min(aglen,
+					ext.rc_startblock - agbno);
+			tmp.rc_refcount = 1 + adj;
+			trace_xfs_refcount_modify_extent(cur->bc_mp,
+					cur->bc_private.a.agno, &tmp);
+
+			/*
+			 * Either cover the hole (increment) or
+			 * delete the range (decrement).
+			 */
+			if (tmp.rc_refcount) {
+				error = xfs_refcountbt_insert(cur, &tmp,
+						&found_tmp);
+				if (error)
+					goto out_error;
+				XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+						found_tmp == 1, out_error);
+			} else {
+				fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
+						cur->bc_private.a.agno,
+						tmp.rc_startblock);
+				xfs_bmap_add_free(cur->bc_mp, flist, fsbno,
+						tmp.rc_blockcount, oinfo);
+			}
+
+			agbno += tmp.rc_blockcount;
+			aglen -= tmp.rc_blockcount;
+
+			error = xfs_refcountbt_lookup_ge(cur, agbno,
+					&found_rec);
+			if (error)
+				goto out_error;
+		}
+
+		/* Stop if there's nothing left to modify */
+		if (aglen == 0)
+			break;
+
+		/*
+		 * Adjust the reference count and either update the tree
+		 * (incr) or free the blocks (decr).
+		 */
+		ext.rc_refcount += adj;
+		trace_xfs_refcount_modify_extent(cur->bc_mp,
+				cur->bc_private.a.agno, &ext);
+		if (ext.rc_refcount > 1) {
+			error = xfs_refcountbt_update(cur, &ext);
+			if (error)
+				goto out_error;
+		} else if (ext.rc_refcount == 1) {
+			error = xfs_refcountbt_delete(cur, &found_rec);
+			if (error)
+				goto out_error;
+			XFS_WANT_CORRUPTED_GOTO(cur->bc_mp,
+					found_rec == 1, out_error);
+			goto advloop;
+		} else {
+			fsbno = XFS_AGB_TO_FSB(cur->bc_mp,
+					cur->bc_private.a.agno,
+					ext.rc_startblock);
+			xfs_bmap_add_free(cur->bc_mp, flist, fsbno,
+					ext.rc_blockcount, oinfo);
+		}
+
+		error = xfs_btree_increment(cur, 0, &found_rec);
+		if (error)
+			goto out_error;
+
+advloop:
+		agbno += ext.rc_blockcount;
+		aglen -= ext.rc_blockcount;
+	}
+
+	return error;
+out_error:
+	trace_xfs_refcount_modify_extent_error(cur->bc_mp,
+			cur->bc_private.a.agno, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Adjust the reference count of a range of AG blocks.
+ *
+ * @mp: XFS mount object
+ * @tp: XFS transaction object
+ * @agbp: Buffer containing the AGF
+ * @agno: AG number
+ * @agbno: Start of range to adjust
+ * @aglen: Length of range to adjust
+ * @adj: +1 to increment, -1 to decrement reference count
+ * @flist: freelist (only required if adj == -1)
+ * @owner: owner of the blocks (only required if adj == -1)
+ */
+STATIC int
+xfs_refcountbt_adjust_refcount(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	int			adj,
+	struct xfs_bmap_free	*flist,
+	struct xfs_owner_info	*oinfo)
+{
+	struct xfs_btree_cur	*cur;
+	int			error;
+
+	cur = xfs_refcountbt_init_cursor(mp, tp, agbp, agno, flist);
+
+	/*
+	 * Ensure that no rcextents cross the boundary of the adjustment range.
+	 */
+	error = try_split_left_rcextent(cur, agbno);
+	if (error)
+		goto out_error;
+
+	error = try_split_right_rcextent(cur, agbno + aglen);
+	if (error)
+		goto out_error;
+
+	/*
+	 * Try to merge with the left or right extents of the range.
+	 */
+	error = try_merge_rcextents(cur, &agbno, &aglen, adj);
+	if (error)
+		goto out_error;
+
+	/* Now that we've taken care of the ends, adjust the middle extents */
+	error = adjust_rcextents(cur, agbno, aglen, adj, flist, oinfo);
+	if (error)
+		goto out_error;
+
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+	return 0;
+
+out_error:
+	trace_xfs_refcount_adjust_error(mp, agno, error, _RET_IP_);
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
+	return error;
+}
+
+/**
+ * Increase the reference count of a range of AG blocks.
+ *
+ * @mp: XFS mount object
+ * @tp: XFS transaction object
+ * @agbp: Buffer containing the AGF
+ * @agno: AG number
+ * @agbno: Start of range to adjust
+ * @aglen: Length of range to adjust
+ * @flist: List of blocks to free
+ */
+int
+xfs_refcount_increase(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	struct xfs_bmap_free	*flist)
+{
+	trace_xfs_refcount_increase(mp, agno, agbno, aglen);
+	return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
+			aglen, 1, flist, NULL);
+}
+
+/**
+ * Decrease the reference count of a range of AG blocks.
+ *
+ * @mp: XFS mount object
+ * @tp: XFS transaction object
+ * @agbp: Buffer containing the AGF
+ * @agno: AG number
+ * @agbno: Start of range to adjust
+ * @aglen: Length of range to adjust
+ * @flist: List of blocks to free
+ * @owner: Extent owner
+ */
+int
+xfs_refcount_decrease(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	struct xfs_bmap_free	*flist,
+	struct xfs_owner_info	*oinfo)
+{
+	trace_xfs_refcount_decrease(mp, agno, agbno, aglen);
+	return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
+			aglen, -1, flist, oinfo);
+}
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 033a9b1..6640e3d 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -26,4 +26,12 @@ extern int xfs_refcountbt_lookup_ge(struct xfs_btree_cur *cur,
 extern int xfs_refcountbt_get_rec(struct xfs_btree_cur *cur,
 		struct xfs_refcount_irec *irec, int *stat);
 
+extern int xfs_refcount_increase(struct xfs_mount *mp, struct xfs_trans *tp,
+		struct xfs_buf *agbp, xfs_agnumber_t agno, xfs_agblock_t agbno,
+		xfs_extlen_t  aglen, struct xfs_bmap_free *flist);
+extern int xfs_refcount_decrease(struct xfs_mount *mp, struct xfs_trans *tp,
+		struct xfs_buf *agbp, xfs_agnumber_t agno, xfs_agblock_t agbno,
+		xfs_extlen_t aglen, struct xfs_bmap_free *flist,
+		struct xfs_owner_info *oinfo);
+
 #endif	/* __XFS_REFCOUNT_H__ */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 45/76] libxfs: adjust refcount when unmapping file blocks
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (43 preceding siblings ...)
  2015-12-19  9:01 ` [PATCH 44/76] libxfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
@ 2015-12-19  9:01 ` Darrick J. Wong
  2015-12-19  9:01 ` [PATCH 46/76] xfs: add refcount btree block detection to log recovery Darrick J. Wong
                   ` (31 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:01 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When we're unmapping blocks from a reflinked file, decrease the
refcount of the affected blocks and free the extents that are no
longer in use.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c     |   16 +++++++++++++---
 fs/xfs/libxfs/xfs_refcount.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_refcount.h |    4 ++++
 3 files changed, 59 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 5d1290e..6cc4aeb 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -46,6 +46,7 @@
 #include "xfs_attr_leaf.h"
 #include "xfs_filestream.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_refcount.h"
 
 
 kmem_zone_t		*xfs_bmap_free_item_zone;
@@ -5144,9 +5145,18 @@ xfs_bmap_del_extent(
 	/*
 	 * If we need to, add to list of extents to delete.
 	 */
-	if (do_fx)
-		xfs_bmap_add_free(mp, flist, del->br_startblock,
-				  del->br_blockcount, NULL);
+	if (do_fx) {
+		if (xfs_is_reflink_inode(ip)) {
+			error = xfs_refcount_put_extent(mp, tp, flist,
+						del->br_startblock,
+						del->br_blockcount, NULL);
+			if (error)
+				goto done;
+		} else
+			xfs_bmap_add_free(mp, flist, del->br_startblock,
+					  del->br_blockcount, NULL);
+	}
+
 	/*
 	 * Adjust inode # blocks in the file.
 	 */
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index dae6453..c9d6e5d 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -970,3 +970,45 @@ xfs_refcount_decrease(
 	return xfs_refcountbt_adjust_refcount(mp, tp, agbp, agno, agbno,
 			aglen, -1, flist, oinfo);
 }
+
+/**
+ * xfs_refcount_put_extent() - release a range of blocks
+ *
+ * @mp: XFS mount object
+ * @tp: transaction that goes with the free operation
+ * @flist: List of blocks to be freed at the end of the transaction
+ * @fsbno: First fs block of the range to release
+ * @len: Length of range
+ * @owner: owner of the extent
+ */
+int
+xfs_refcount_put_extent(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_bmap_free	*flist,
+	xfs_fsblock_t		fsbno,
+	xfs_filblks_t		fslen,
+	struct xfs_owner_info	*oinfo)
+{
+	int			error;
+	struct xfs_buf		*agbp;
+	xfs_agnumber_t		agno;		/* allocation group number */
+	xfs_agblock_t		agbno;		/* ag start of range to free */
+	xfs_extlen_t		aglen;		/* ag length of range to free */
+
+	agno = XFS_FSB_TO_AGNO(mp, fsbno);
+	agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+	aglen = fslen;
+
+	/*
+	 * Drop reference counts in the refcount tree.
+	 */
+	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+	if (error)
+		return error;
+
+	error = xfs_refcount_decrease(mp, tp, agbp, agno, agbno, aglen, flist,
+			oinfo);
+	xfs_trans_brelse(tp, agbp);
+	return error;
+}
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 6640e3d..074d620 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -34,4 +34,8 @@ extern int xfs_refcount_decrease(struct xfs_mount *mp, struct xfs_trans *tp,
 		xfs_extlen_t aglen, struct xfs_bmap_free *flist,
 		struct xfs_owner_info *oinfo);
 
+extern int xfs_refcount_put_extent(struct xfs_mount *mp, struct xfs_trans *tp,
+		struct xfs_bmap_free *flist, xfs_fsblock_t fsbno,
+		xfs_filblks_t len, struct xfs_owner_info *oinfo);
+
 #endif	/* __XFS_REFCOUNT_H__ */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 46/76] xfs: add refcount btree block detection to log recovery
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (44 preceding siblings ...)
  2015-12-19  9:01 ` [PATCH 45/76] libxfs: adjust refcount when unmapping file blocks Darrick J. Wong
@ 2015-12-19  9:01 ` Darrick J. Wong
  2015-12-19  9:01 ` [PATCH 47/76] xfs: refcount btree requires more reserved space Darrick J. Wong
                   ` (30 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:01 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Teach log recovery how to deal with refcount btree blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_log_recover.c |    4 ++++
 1 file changed, 4 insertions(+)


diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index c8f056e..15bf5ed 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1848,6 +1848,7 @@ xlog_recover_get_buf_lsn(
 	case XFS_ABTB_MAGIC:
 	case XFS_ABTC_MAGIC:
 	case XFS_RMAP_CRC_MAGIC:
+	case XFS_REFC_CRC_MAGIC:
 	case XFS_IBT_CRC_MAGIC:
 	case XFS_IBT_MAGIC: {
 		struct xfs_btree_block *btb = blk;
@@ -2019,6 +2020,9 @@ xlog_recover_validate_buf_type(
 		case XFS_RMAP_CRC_MAGIC:
 			bp->b_ops = &xfs_rmapbt_buf_ops;
 			break;
+		case XFS_REFC_CRC_MAGIC:
+			bp->b_ops = &xfs_refcountbt_buf_ops;
+			break;
 		default:
 			xfs_warn(mp, "Bad btree block magic!");
 			ASSERT(0);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 47/76] xfs: refcount btree requires more reserved space
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (45 preceding siblings ...)
  2015-12-19  9:01 ` [PATCH 46/76] xfs: add refcount btree block detection to log recovery Darrick J. Wong
@ 2015-12-19  9:01 ` Darrick J. Wong
  2015-12-19  9:01 ` [PATCH 48/76] xfs: introduce reflink utility functions Darrick J. Wong
                   ` (29 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:01 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

The reference count btree is allocated from the free space, which
means that we have to ensure that an AG can't run out of free space
while performing a refcount operation.  In the pathological case each
AG block has its own refcntbt record, so we have to keep that much
space available.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c          |    3 +++
 fs/xfs/libxfs/xfs_refcount_btree.c |   16 ++++++++++++++++
 fs/xfs/libxfs/xfs_refcount_btree.h |    3 +++
 3 files changed, 22 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 71dac69..637e2e7 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -36,6 +36,7 @@
 #include "xfs_trans.h"
 #include "xfs_buf_item.h"
 #include "xfs_log.h"
+#include "xfs_refcount_btree.h"
 
 struct workqueue_struct *xfs_alloc_wq;
 
@@ -136,6 +137,8 @@ xfs_alloc_ag_max_usable(struct xfs_mount *mp)
 		/* rmap root block + full tree split on full AG */
 		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
 	}
+	if (xfs_sb_version_hasreflink(&mp->m_sb))
+		blocks += xfs_refcountbt_max_btree_size(mp);
 
 	return mp->m_sb.sb_agblocks - blocks;
 }
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index d7574cd..c785433 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -374,3 +374,19 @@ xfs_refcountbt_maxrecs(
 	return blocklen / (sizeof(struct xfs_refcount_key) +
 			   sizeof(xfs_refcount_ptr_t));
 }
+
+DEFINE_BTREE_SIZE_FN(refcountbt, m_refc_mxr, XFS_BTREE_MAXLEVELS);
+
+/**
+ * xfs_refcountbt_max_btree_size() -- Calculate the maximum refcount btree size.
+ */
+unsigned int
+xfs_refcountbt_max_btree_size(
+	struct xfs_mount	*mp)
+{
+	/* Bail out if we're uninitialized, which can happen in mkfs. */
+	if (mp->m_refc_mxr[0] == 0)
+		return 0;
+
+	return xfs_refcountbt_calc_btree_size(mp, mp->m_sb.sb_agblocks);
+}
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h
index d51dc1a..0f55544 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.h
+++ b/fs/xfs/libxfs/xfs_refcount_btree.h
@@ -62,4 +62,7 @@ extern struct xfs_btree_cur *xfs_refcountbt_init_cursor(struct xfs_mount *mp,
 extern int xfs_refcountbt_maxrecs(struct xfs_mount *mp, int blocklen,
 		bool leaf);
 
+DECLARE_BTREE_SIZE_FN(refcountbt);
+extern unsigned int xfs_refcountbt_max_btree_size(struct xfs_mount *mp);
+
 #endif	/* __XFS_REFCOUNT_BTREE_H__ */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 48/76] xfs: introduce reflink utility functions
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (46 preceding siblings ...)
  2015-12-19  9:01 ` [PATCH 47/76] xfs: refcount btree requires more reserved space Darrick J. Wong
@ 2015-12-19  9:01 ` Darrick J. Wong
  2015-12-19  9:01 ` [PATCH 49/76] xfs: define tracepoints for reflink activities Darrick J. Wong
                   ` (28 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:01 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

These functions will be used by the other reflink functions to find
the maximum length of a range of shared blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.coM>
---
 fs/xfs/libxfs/xfs_refcount.c |  118 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_refcount.h |    4 +
 2 files changed, 122 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index c9d6e5d..49f6a7a 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1012,3 +1012,121 @@ xfs_refcount_put_extent(
 	xfs_trans_brelse(tp, agbp);
 	return error;
 }
+
+/**
+ * xfs_refcount_find_shared() -- Given an AG extent, find the lowest-numbered
+ *				 run of shared blocks within that range.
+ *
+ * @mp: XFS mount.
+ * @agno: AG number.
+ * @agbno: AG block number to start searching.
+ * @aglen: Length of the range to search.
+ * @fbno: Returns the AG block number of the first shared range, or
+ *     agbno + aglen if no shared blocks are found.
+ * @flen: Returns the length of the shared range found, or 0 if no shared
+ *     blocks are found.
+ * @find_maximal: If true, find the length of the run of shared blocks.
+ *     Otherwise, the length of the first refcount extent is found.
+ */
+int
+xfs_refcount_find_shared(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		agbno,
+	xfs_extlen_t		aglen,
+	xfs_agblock_t		*fbno,
+	xfs_extlen_t		*flen,
+	bool			find_maximal)
+{
+	struct xfs_btree_cur	*cur;
+	struct xfs_buf		*agbp;
+	struct xfs_refcount_irec	tmp;
+	int			error;
+	int			i, have;
+	int			bt_error = XFS_BTREE_ERROR;
+
+	trace_xfs_refcount_find_shared(mp, agno, agbno, aglen);
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agbp);
+	if (error)
+		goto out;
+	cur = xfs_refcountbt_init_cursor(mp, NULL, agbp, agno, NULL);
+
+	/* By default, skip the whole range */
+	*fbno = agbno + aglen;
+	*flen = 0;
+
+	/* Try to find a refcount extent that crosses the start */
+	error = xfs_refcountbt_lookup_le(cur, agbno, &have);
+	if (error)
+		goto out_error;
+	if (!have) {
+		/* No left extent, look at the next one */
+		error = xfs_btree_increment(cur, 0, &have);
+		if (error)
+			goto out_error;
+		if (!have)
+			goto done;
+	}
+	error = xfs_refcountbt_get_rec(cur, &tmp, &i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+	/* If the extent ends before the start, look at the next one */
+	if (tmp.rc_startblock + tmp.rc_blockcount <= agbno) {
+		error = xfs_btree_increment(cur, 0, &have);
+		if (error)
+			goto out_error;
+		if (!have)
+			goto done;
+		error = xfs_refcountbt_get_rec(cur, &tmp, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	}
+
+	/* If the extent ends after the range we want, bail out */
+	if (tmp.rc_startblock >= agbno + aglen)
+		goto done;
+
+	/* We found the start of a shared extent! */
+	if (tmp.rc_startblock < agbno) {
+		tmp.rc_blockcount -= (agbno - tmp.rc_startblock);
+		tmp.rc_startblock = agbno;
+	}
+
+	*fbno = tmp.rc_startblock;
+	*flen = min(tmp.rc_blockcount, agbno + aglen - *fbno);
+	if (!find_maximal)
+		goto done;
+
+	/* Otherwise, find the end of this shared extent */
+	while (*fbno + *flen < agbno + aglen) {
+		error = xfs_btree_increment(cur, 0, &have);
+		if (error)
+			goto out_error;
+		if (!have)
+			break;
+		error = xfs_refcountbt_get_rec(cur, &tmp, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+		if (tmp.rc_startblock >= agbno + aglen ||
+		    tmp.rc_startblock != *fbno + *flen)
+			break;
+		*flen = min(*flen + tmp.rc_blockcount, agbno + aglen - *fbno);
+	}
+
+done:
+	bt_error = XFS_BTREE_NOERROR;
+	trace_xfs_refcount_find_shared_result(mp, agno, *fbno, *flen);
+
+out_error:
+	xfs_btree_del_cursor(cur, bt_error);
+	xfs_buf_relse(agbp);
+out:
+	if (error)
+		trace_xfs_refcount_find_shared_error(mp, agno, error, _RET_IP_);
+	return error;
+}
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 074d620..e8e0beb 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -38,4 +38,8 @@ extern int xfs_refcount_put_extent(struct xfs_mount *mp, struct xfs_trans *tp,
 		struct xfs_bmap_free *flist, xfs_fsblock_t fsbno,
 		xfs_filblks_t len, struct xfs_owner_info *oinfo);
 
+extern int xfs_refcount_find_shared(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno,
+		xfs_extlen_t *flen, bool find_maximal);
+
 #endif	/* __XFS_REFCOUNT_H__ */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 49/76] xfs: define tracepoints for reflink activities
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (47 preceding siblings ...)
  2015-12-19  9:01 ` [PATCH 48/76] xfs: introduce reflink utility functions Darrick J. Wong
@ 2015-12-19  9:01 ` Darrick J. Wong
  2015-12-19  9:01 ` [PATCH 50/76] xfs: map an inode's offset to an exact physical block Darrick J. Wong
                   ` (27 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:01 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Define all the tracepoints we need to inspect the runtime operation
of reflink/dedupe/copy-on-write.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_trace.h |  385 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 385 insertions(+)


diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 11b159a..9e3a107 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2775,6 +2775,391 @@ DEFINE_AG_EXTENT_EVENT(xfs_refcount_find_shared);
 DEFINE_AG_EXTENT_EVENT(xfs_refcount_find_shared_result);
 DEFINE_AG_ERROR_EVENT(xfs_refcount_find_shared_error);
 
+/* reflink tracepoint classes */
+
+/* two-file io tracepoint class */
+DECLARE_EVENT_CLASS(xfs_double_io_class,
+	TP_PROTO(struct xfs_inode *src, xfs_off_t soffset, xfs_off_t len,
+		 struct xfs_inode *dest, xfs_off_t doffset),
+	TP_ARGS(src, soffset, len, dest, doffset),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, src_ino)
+		__field(loff_t, src_isize)
+		__field(loff_t, src_disize)
+		__field(loff_t, src_offset)
+		__field(size_t, len)
+		__field(xfs_ino_t, dest_ino)
+		__field(loff_t, dest_isize)
+		__field(loff_t, dest_disize)
+		__field(loff_t, dest_offset)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(src)->i_sb->s_dev;
+		__entry->src_ino = src->i_ino;
+		__entry->src_isize = VFS_I(src)->i_size;
+		__entry->src_disize = src->i_d.di_size;
+		__entry->src_offset = soffset;
+		__entry->len = len;
+		__entry->dest_ino = dest->i_ino;
+		__entry->dest_isize = VFS_I(dest)->i_size;
+		__entry->dest_disize = dest->i_d.di_size;
+		__entry->dest_offset = doffset;
+	),
+	TP_printk("dev %d:%d count %zd "
+		  "ino 0x%llx isize 0x%llx disize 0x%llx offset 0x%llx -> "
+		  "ino 0x%llx isize 0x%llx disize 0x%llx offset 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->len,
+		  __entry->src_ino,
+		  __entry->src_isize,
+		  __entry->src_disize,
+		  __entry->src_offset,
+		  __entry->dest_ino,
+		  __entry->dest_isize,
+		  __entry->dest_disize,
+		  __entry->dest_offset)
+)
+
+#define DEFINE_DOUBLE_IO_EVENT(name)	\
+DEFINE_EVENT(xfs_double_io_class, name,	\
+	TP_PROTO(struct xfs_inode *src, xfs_off_t soffset, xfs_off_t len, \
+		 struct xfs_inode *dest, xfs_off_t doffset), \
+	TP_ARGS(src, soffset, len, dest, doffset))
+
+/* two-file vfs io tracepoint class */
+DECLARE_EVENT_CLASS(xfs_double_vfs_io_class,
+	TP_PROTO(struct inode *src, u64 soffset, u64 len,
+		 struct inode *dest, u64 doffset),
+	TP_ARGS(src, soffset, len, dest, doffset),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, src_ino)
+		__field(loff_t, src_isize)
+		__field(loff_t, src_offset)
+		__field(size_t, len)
+		__field(unsigned long, dest_ino)
+		__field(loff_t, dest_isize)
+		__field(loff_t, dest_offset)
+	),
+	TP_fast_assign(
+		__entry->dev = src->i_sb->s_dev;
+		__entry->src_ino = src->i_ino;
+		__entry->src_isize = i_size_read(src);
+		__entry->src_offset = soffset;
+		__entry->len = len;
+		__entry->dest_ino = dest->i_ino;
+		__entry->dest_isize = i_size_read(dest);
+		__entry->dest_offset = doffset;
+	),
+	TP_printk("dev %d:%d count %zd "
+		  "ino 0x%lx isize 0x%llx offset 0x%llx -> "
+		  "ino 0x%lx isize 0x%llx offset 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->len,
+		  __entry->src_ino,
+		  __entry->src_isize,
+		  __entry->src_offset,
+		  __entry->dest_ino,
+		  __entry->dest_isize,
+		  __entry->dest_offset)
+)
+
+#define DEFINE_DOUBLE_VFS_IO_EVENT(name)	\
+DEFINE_EVENT(xfs_double_vfs_io_class, name,	\
+	TP_PROTO(struct inode *src, u64 soffset, u64 len, \
+		 struct inode *dest, u64 doffset), \
+	TP_ARGS(src, soffset, len, dest, doffset))
+
+/* CoW write tracepoint */
+DECLARE_EVENT_CLASS(xfs_copy_on_write_class,
+	TP_PROTO(struct xfs_inode *ip, xfs_fileoff_t lblk, xfs_fsblock_t pblk,
+		 xfs_extlen_t len, xfs_fsblock_t new_pblk),
+	TP_ARGS(ip, lblk, pblk, len, new_pblk),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_fileoff_t, lblk)
+		__field(xfs_fsblock_t, pblk)
+		__field(xfs_extlen_t, len)
+		__field(xfs_fsblock_t, new_pblk)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(ip)->i_sb->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->lblk = lblk;
+		__entry->pblk = pblk;
+		__entry->len = len;
+		__entry->new_pblk = new_pblk;
+	),
+	TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx pblk 0x%llx "
+		  "len 0x%x new_pblk %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->lblk,
+		  __entry->pblk,
+		  __entry->len,
+		  __entry->new_pblk)
+)
+
+#define DEFINE_COW_EVENT(name)	\
+DEFINE_EVENT(xfs_copy_on_write_class, name,	\
+	TP_PROTO(struct xfs_inode *ip, xfs_fileoff_t lblk, xfs_fsblock_t pblk, \
+		 xfs_extlen_t len, xfs_fsblock_t new_pblk), \
+	TP_ARGS(ip, lblk, pblk, len, new_pblk))
+
+/* inode/irec events */
+DECLARE_EVENT_CLASS(xfs_inode_irec_class,
+	TP_PROTO(struct xfs_inode *ip, struct xfs_bmbt_irec *irec),
+	TP_ARGS(ip, irec),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_fileoff_t, lblk)
+		__field(xfs_extlen_t, len)
+		__field(xfs_fsblock_t, pblk)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(ip)->i_sb->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->lblk = irec->br_startoff;
+		__entry->len = irec->br_blockcount;
+		__entry->pblk = irec->br_startblock;
+	),
+	TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx len 0x%x pblk %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->lblk,
+		  __entry->len,
+		  __entry->pblk)
+);
+#define DEFINE_INODE_IREC_EVENT(name) \
+DEFINE_EVENT(xfs_inode_irec_class, name, \
+	TP_PROTO(struct xfs_inode *ip, struct xfs_bmbt_irec *irec), \
+	TP_ARGS(ip, irec))
+
+/* simple inode-based error/%ip tracepoint class */
+DECLARE_EVENT_CLASS(xfs_inode_error_class,
+	TP_PROTO(struct xfs_inode *ip, int error, unsigned long caller_ip),
+	TP_ARGS(ip, error, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(int, error)
+		__field(unsigned long, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(ip)->i_sb->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->error = error;
+		__entry->caller_ip = caller_ip;
+	),
+	TP_printk("dev %d:%d ino %llx error %d caller %ps",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->error,
+		  (char *)__entry->caller_ip)
+);
+
+#define DEFINE_INODE_ERROR_EVENT(name) \
+DEFINE_EVENT(xfs_inode_error_class, name, \
+	TP_PROTO(struct xfs_inode *ip, int error, \
+		 unsigned long caller_ip), \
+	TP_ARGS(ip, error, caller_ip))
+
+/* refcount/reflink tracepoint definitions */
+
+/* reflink allocator */
+TRACE_EVENT(xfs_bmap_remap_alloc,
+	TP_PROTO(struct xfs_inode *ip, xfs_fsblock_t fsbno,
+		 xfs_extlen_t len),
+	TP_ARGS(ip, fsbno, len),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_fsblock_t, fsbno)
+		__field(xfs_extlen_t, len)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(ip)->i_sb->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->fsbno = fsbno;
+		__entry->len = len;
+	),
+	TP_printk("dev %d:%d ino 0x%llx fsbno 0x%llx len %x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->fsbno,
+		  __entry->len)
+);
+DEFINE_INODE_ERROR_EVENT(xfs_bmap_remap_alloc_error);
+
+/* reflink tracepoints */
+DEFINE_INODE_EVENT(xfs_reflink_set_inode_flag);
+DEFINE_INODE_EVENT(xfs_reflink_unset_inode_flag);
+DEFINE_ITRUNC_EVENT(xfs_reflink_update_inode_size);
+DEFINE_IOMAP_EVENT(xfs_reflink_remap_imap);
+TRACE_EVENT(xfs_reflink_remap_blocks_loop,
+	TP_PROTO(struct xfs_inode *src, xfs_fileoff_t soffset,
+		 xfs_filblks_t len, struct xfs_inode *dest,
+		 xfs_fileoff_t doffset),
+	TP_ARGS(src, soffset, len, dest, doffset),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, src_ino)
+		__field(xfs_fileoff_t, src_lblk)
+		__field(xfs_filblks_t, len)
+		__field(xfs_ino_t, dest_ino)
+		__field(xfs_fileoff_t, dest_lblk)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(src)->i_sb->s_dev;
+		__entry->src_ino = src->i_ino;
+		__entry->src_lblk = soffset;
+		__entry->len = len;
+		__entry->dest_ino = dest->i_ino;
+		__entry->dest_lblk = doffset;
+	),
+	TP_printk("dev %d:%d len 0x%llx "
+		  "ino 0x%llx offset 0x%llx blocks -> "
+		  "ino 0x%llx offset 0x%llx blocks",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->len,
+		  __entry->src_ino,
+		  __entry->src_lblk,
+		  __entry->dest_ino,
+		  __entry->dest_lblk)
+);
+TRACE_EVENT(xfs_reflink_punch_range,
+	TP_PROTO(struct xfs_inode *ip, xfs_fileoff_t lblk,
+		 xfs_extlen_t len),
+	TP_ARGS(ip, lblk, len),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_fileoff_t, lblk)
+		__field(xfs_extlen_t, len)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(ip)->i_sb->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->lblk = lblk;
+		__entry->len = len;
+	),
+	TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx len 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->lblk,
+		  __entry->len)
+);
+TRACE_EVENT(xfs_reflink_remap,
+	TP_PROTO(struct xfs_inode *ip, xfs_fileoff_t lblk,
+		 xfs_extlen_t len, xfs_fsblock_t new_pblk),
+	TP_ARGS(ip, lblk, len, new_pblk),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_fileoff_t, lblk)
+		__field(xfs_extlen_t, len)
+		__field(xfs_fsblock_t, new_pblk)
+	),
+	TP_fast_assign(
+		__entry->dev = VFS_I(ip)->i_sb->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->lblk = lblk;
+		__entry->len = len;
+		__entry->new_pblk = new_pblk;
+	),
+	TP_printk("dev %d:%d ino 0x%llx lblk 0x%llx len 0x%x new_pblk %llu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->lblk,
+		  __entry->len,
+		  __entry->new_pblk)
+);
+DEFINE_DOUBLE_IO_EVENT(xfs_reflink_remap_range);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_remap_range_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_set_inode_flag_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_update_inode_size_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_reflink_main_loop_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_read_iomap_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_remap_blocks_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_remap_extent_error);
+
+/* dedupe tracepoints */
+DEFINE_DOUBLE_IO_EVENT(xfs_reflink_compare_extents);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_compare_extents_error);
+
+/* ioctl tracepoints */
+DEFINE_DOUBLE_VFS_IO_EVENT(xfs_ioctl_reflink);
+DEFINE_DOUBLE_VFS_IO_EVENT(xfs_ioctl_clone_range);
+DEFINE_DOUBLE_VFS_IO_EVENT(xfs_ioctl_file_extent_same);
+TRACE_EVENT(xfs_ioctl_clone,
+	TP_PROTO(struct inode *src, struct inode *dest),
+	TP_ARGS(src, dest),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, src_ino)
+		__field(loff_t, src_isize)
+		__field(unsigned long, dest_ino)
+		__field(loff_t, dest_isize)
+	),
+	TP_fast_assign(
+		__entry->dev = src->i_sb->s_dev;
+		__entry->src_ino = src->i_ino;
+		__entry->src_isize = i_size_read(src);
+		__entry->dest_ino = dest->i_ino;
+		__entry->dest_isize = i_size_read(dest);
+	),
+	TP_printk("dev %d:%d "
+		  "ino 0x%lx isize 0x%llx -> "
+		  "ino 0x%lx isize 0x%llx\n",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->src_ino,
+		  __entry->src_isize,
+		  __entry->dest_ino,
+		  __entry->dest_isize)
+);
+
+/* unshare tracepoints */
+DEFINE_SIMPLE_IO_EVENT(xfs_reflink_unshare);
+DEFINE_SIMPLE_IO_EVENT(xfs_reflink_truncate);
+DEFINE_PAGE_EVENT(xfs_reflink_unshare_page);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_unshare_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_truncate_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_dirty_page_error);
+
+/* copy on write */
+DEFINE_INODE_IREC_EVENT(xfs_reflink_irec_is_shared);
+
+DEFINE_RW_EVENT(xfs_reflink_reserve_cow_range);
+DEFINE_INODE_IREC_EVENT(xfs_reflink_reserve_cow_extent);
+DEFINE_RW_EVENT(xfs_reflink_allocate_cow_range);
+DEFINE_INODE_IREC_EVENT(xfs_reflink_allocate_cow_extent);
+
+DEFINE_INODE_IREC_EVENT(xfs_reflink_bounce_dio_write);
+DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
+DEFINE_SIMPLE_IO_EVENT(xfs_iomap_cow_delay);
+
+DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow_failed);
+DEFINE_SIMPLE_IO_EVENT(xfs_reflink_end_cow);
+DEFINE_INODE_IREC_EVENT(xfs_reflink_cow_remap);
+
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_reserve_cow_range_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_reserve_cow_extent_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_allocate_cow_range_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_end_cow_failed_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_end_cow_error);
+
+DEFINE_COW_EVENT(xfs_reflink_fork_buf);
+DEFINE_COW_EVENT(xfs_reflink_finish_fork_buf);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_fork_buf_error);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_finish_fork_buf_error);
+
+DEFINE_INODE_EVENT(xfs_reflink_cancel_pending_cow);
+DEFINE_INODE_IREC_EVENT(xfs_reflink_cancel_cow);
+DEFINE_INODE_ERROR_EVENT(xfs_reflink_cancel_pending_cow_error);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 50/76] xfs: map an inode's offset to an exact physical block
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (48 preceding siblings ...)
  2015-12-19  9:01 ` [PATCH 49/76] xfs: define tracepoints for reflink activities Darrick J. Wong
@ 2015-12-19  9:01 ` Darrick J. Wong
  2015-12-19  9:02 ` [PATCH 51/76] xfs: add reflink feature flag to geometry Darrick J. Wong
                   ` (26 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:01 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Teach the bmap routine to know how to map a range of file blocks to a
specific range of physical blocks, instead of simply allocating fresh
blocks.  This enables reflink to map a file to blocks that are already
in use.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_bmap.h |   10 +++++++
 2 files changed, 71 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6cc4aeb..ac669d58 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4003,6 +4003,54 @@ xfs_bmap_btalloc(
 }
 
 /*
+ * For a remap operation, just "allocate" an extent at the address that the
+ * caller passed in, and ensure that the AGFL is the right size.  The caller
+ * will then map the "allocated" extent into the file somewhere.
+ */
+STATIC int
+xfs_bmap_remap_alloc(
+	struct xfs_bmalloca	*ap)
+{
+	struct xfs_trans	*tp = ap->tp;
+	struct xfs_mount	*mp = tp->t_mountp;
+	xfs_agblock_t		bno;
+	struct xfs_alloc_arg	args;
+	int			error;
+
+	/*
+	 * validate that the block number is legal - the enables us to detect
+	 * and handle a silent filesystem corruption rather than crashing.
+	 */
+	memset(&args, 0, sizeof(struct xfs_alloc_arg));
+	args.tp = ap->tp;
+	args.mp = ap->tp->t_mountp;
+	bno = *ap->firstblock;
+	args.agno = XFS_FSB_TO_AGNO(mp, bno);
+	ASSERT(args.agno < mp->m_sb.sb_agcount);
+	args.agbno = XFS_FSB_TO_AGBNO(mp, bno);
+	ASSERT(args.agbno < mp->m_sb.sb_agblocks);
+
+	/* "Allocate" the extent from the range we passed in. */
+	trace_xfs_bmap_remap_alloc(ap->ip, *ap->firstblock, ap->length);
+	ap->blkno = bno;
+	ap->ip->i_d.di_nblocks += ap->length;
+	xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
+
+	/* Fix the freelist, like a real allocator does. */
+	args.pag = xfs_perag_get(args.mp, args.agno);
+	ASSERT(args.pag);
+
+	error = xfs_alloc_fix_freelist(&args, XFS_ALLOC_FLAG_FREEING);
+	if (error)
+		goto error0;
+error0:
+	xfs_perag_put(args.pag);
+	if (error)
+		trace_xfs_bmap_remap_alloc_error(ap->ip, error, _RET_IP_);
+	return error;
+}
+
+/*
  * xfs_bmap_alloc is called by xfs_bmapi to allocate an extent for a file.
  * It figures out where to ask the underlying allocator to put the new extent.
  */
@@ -4010,6 +4058,8 @@ STATIC int
 xfs_bmap_alloc(
 	struct xfs_bmalloca	*ap)	/* bmap alloc argument struct */
 {
+	if (ap->flags & XFS_BMAPI_REMAP)
+		return xfs_bmap_remap_alloc(ap);
 	if (XFS_IS_REALTIME_INODE(ap->ip) && ap->userdata)
 		return xfs_bmap_rtalloc(ap);
 	return xfs_bmap_btalloc(ap);
@@ -4645,6 +4695,12 @@ xfs_bmapi_write(
 	ASSERT(len > 0);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+	if (whichfork == XFS_ATTR_FORK)
+		ASSERT(!(flags & XFS_BMAPI_REMAP));
+	if (flags & XFS_BMAPI_REMAP) {
+		ASSERT(!(flags & XFS_BMAPI_PREALLOC));
+		ASSERT(!(flags & XFS_BMAPI_CONVERT));
+	}
 
 	/* zeroing is for currently only for data extents, not metadata */
 	ASSERT((flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ZERO)) !=
@@ -4707,6 +4763,12 @@ xfs_bmapi_write(
 		wasdelay = !inhole && isnullstartblock(bma.got.br_startblock);
 
 		/*
+		 * Make sure we only reflink into a hole.
+		 */
+		if (flags & XFS_BMAPI_REMAP)
+			ASSERT(inhole);
+
+		/*
 		 * First, deal with the hole before the allocated space
 		 * that we found, if any.
 		 */
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index fcea496..7b78fb8 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -127,6 +127,13 @@ typedef	struct xfs_bmap_free
  */
 #define XFS_BMAPI_ZERO		0x080
 
+/*
+ * Map the inode offset to the block given in ap->firstblock.  Primarily
+ * used for reflink.  The range must be in a hole, and this flag cannot be
+ * turned on with PREALLOC or CONVERT, and cannot be used on the attr fork.
+ */
+#define XFS_BMAPI_REMAP		0x100
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
@@ -135,7 +142,8 @@ typedef	struct xfs_bmap_free
 	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
 	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
 	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
-	{ XFS_BMAPI_ZERO,	"ZERO" }
+	{ XFS_BMAPI_ZERO,	"ZERO" }, \
+	{ XFS_BMAPI_REMAP,	"REMAP" }
 
 
 static inline int xfs_bmapi_aflag(int w)

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 51/76] xfs: add reflink feature flag to geometry
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (49 preceding siblings ...)
  2015-12-19  9:01 ` [PATCH 50/76] xfs: map an inode's offset to an exact physical block Darrick J. Wong
@ 2015-12-19  9:02 ` Darrick J. Wong
  2015-12-19  9:02 ` [PATCH 52/76] xfs: don't allow reflinked dir/dev/fifo/socket/pipe files Darrick J. Wong
                   ` (25 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:02 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Report the reflink feature in the XFS geometry so that xfs_info and
friends know the filesystem has this feature.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h |    3 ++-
 fs/xfs/xfs_fsops.c     |    4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index d2a58ff..22a8fd9 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -240,7 +240,8 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_FTYPE	0x10000	/* inode directory types */
 #define XFS_FSOP_GEOM_FLAGS_FINOBT	0x20000	/* free inode btree */
 #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
-#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* Reverse mapping btree */
+#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */
+#define XFS_FSOP_GEOM_FLAGS_REFLINK	0x100000 /* files can share blocks */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 7c37c22..920db9d 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -106,7 +106,9 @@ xfs_fs_geometry(
 			(xfs_sb_version_hassparseinodes(&mp->m_sb) ?
 				XFS_FSOP_GEOM_FLAGS_SPINODES : 0) |
 			(xfs_sb_version_hasrmapbt(&mp->m_sb) ?
-				XFS_FSOP_GEOM_FLAGS_RMAPBT : 0);
+				XFS_FSOP_GEOM_FLAGS_RMAPBT : 0) |
+			(xfs_sb_version_hasreflink(&mp->m_sb) ?
+				XFS_FSOP_GEOM_FLAGS_REFLINK : 0);
 		geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
 				mp->m_sb.sb_logsectsize : BBSIZE;
 		geo->rtsectsize = mp->m_sb.sb_blocksize;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 52/76] xfs: don't allow reflinked dir/dev/fifo/socket/pipe files
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (50 preceding siblings ...)
  2015-12-19  9:02 ` [PATCH 51/76] xfs: add reflink feature flag to geometry Darrick J. Wong
@ 2015-12-19  9:02 ` Darrick J. Wong
  2015-12-19  9:02 ` [PATCH 53/76] xfs: introduce the CoW fork Darrick J. Wong
                   ` (24 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:02 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Only non-rt files can be reflinked, so check that when we load an
inode.  Also, don't leak the attr fork if there's a failure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_inode_fork.c |   23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 46ea657..bbdee05 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -121,6 +121,26 @@ xfs_iformat_fork(
 		return -EFSCORRUPTED;
 	}
 
+	if (unlikely(xfs_is_reflink_inode(ip) &&
+	    (ip->i_d.di_mode & S_IFMT) != S_IFREG)) {
+		xfs_warn(ip->i_mount,
+			"corrupt dinode %llu, wrong file type for reflink.",
+			ip->i_ino);
+		XFS_CORRUPTION_ERROR("xfs_iformat(reflink)",
+				     XFS_ERRLEVEL_LOW, ip->i_mount, dip);
+		return -EFSCORRUPTED;
+	}
+
+	if (unlikely(xfs_is_reflink_inode(ip) &&
+	    (ip->i_d.di_flags & XFS_DIFLAG_REALTIME))) {
+		xfs_warn(ip->i_mount,
+			"corrupt dinode %llu, has reflink+realtime flag set.",
+			ip->i_ino);
+		XFS_CORRUPTION_ERROR("xfs_iformat(reflink)",
+				     XFS_ERRLEVEL_LOW, ip->i_mount, dip);
+		return -EFSCORRUPTED;
+	}
+
 	switch (ip->i_d.di_mode & S_IFMT) {
 	case S_IFIFO:
 	case S_IFCHR:
@@ -208,7 +228,8 @@ xfs_iformat_fork(
 			XFS_CORRUPTION_ERROR("xfs_iformat(8)",
 					     XFS_ERRLEVEL_LOW,
 					     ip->i_mount, dip);
-			return -EFSCORRUPTED;
+			error = -EFSCORRUPTED;
+			break;
 		}
 
 		error = xfs_iformat_local(ip, dip, XFS_ATTR_FORK, size);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 53/76] xfs: introduce the CoW fork
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (51 preceding siblings ...)
  2015-12-19  9:02 ` [PATCH 52/76] xfs: don't allow reflinked dir/dev/fifo/socket/pipe files Darrick J. Wong
@ 2015-12-19  9:02 ` Darrick J. Wong
  2015-12-19  9:02 ` [PATCH 54/76] xfs: support bmapping delalloc extents in " Darrick J. Wong
                   ` (23 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:02 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Introduce a new in-core fork for storing copy-on-write delalloc
reservations and allocated extents that are in the process of being
written out.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile                |    1 
 fs/xfs/libxfs/xfs_bmap.c       |   15 +++--
 fs/xfs/libxfs/xfs_bmap.h       |   19 ++++++-
 fs/xfs/libxfs/xfs_bmap_btree.c |    1 
 fs/xfs/libxfs/xfs_inode_fork.c |   49 +++++++++++++++++-
 fs/xfs/libxfs/xfs_inode_fork.h |   28 ++++++++--
 fs/xfs/libxfs/xfs_rmap.c       |    2 +
 fs/xfs/libxfs/xfs_types.h      |    1 
 fs/xfs/xfs_icache.c            |    5 ++
 fs/xfs/xfs_inode.h             |    4 +
 fs/xfs/xfs_reflink.c           |  111 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h           |   21 ++++++++
 fs/xfs/xfs_trace.h             |    4 +
 13 files changed, 242 insertions(+), 19 deletions(-)
 create mode 100644 fs/xfs/xfs_reflink.c
 create mode 100644 fs/xfs/xfs_reflink.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 7d84c62..798e2b0 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -88,6 +88,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_message.o \
 				   xfs_mount.o \
 				   xfs_mru_cache.o \
+				   xfs_reflink.o \
 				   xfs_stats.o \
 				   xfs_super.o \
 				   xfs_symlink.o \
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index ac669d58..53a8ebc 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3043,6 +3043,7 @@ xfs_bmap_add_extent_hole_real(
 	ASSERT(!isnullstartblock(new->br_startblock));
 	ASSERT(!bma->cur ||
 	       !(bma->cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL));
+	ASSERT(whichfork != XFS_COW_FORK);
 
 	XFS_STATS_INC(mp, xs_add_exlist);
 
@@ -4188,8 +4189,7 @@ xfs_bmapi_read(
 	int			error;
 	int			eof;
 	int			n = 0;
-	int			whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
-						XFS_ATTR_FORK : XFS_DATA_FORK;
+	int			whichfork = xfs_bmapi_whichfork(flags);
 
 	ASSERT(*nmap >= 1);
 	ASSERT(!(flags & ~(XFS_BMAPI_ATTRFORK|XFS_BMAPI_ENTIRE|
@@ -4562,8 +4562,7 @@ xfs_bmapi_convert_unwritten(
 	xfs_filblks_t		len,
 	int			flags)
 {
-	int			whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
-						XFS_ATTR_FORK : XFS_DATA_FORK;
+	int			whichfork = xfs_bmapi_whichfork(flags);
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(bma->ip, whichfork);
 	int			tmp_logflags = 0;
 	int			error;
@@ -4579,6 +4578,8 @@ xfs_bmapi_convert_unwritten(
 			(XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT))
 		return 0;
 
+	ASSERT(whichfork != XFS_COW_FORK);
+
 	/*
 	 * Modify (by adding) the state flag, if writing.
 	 */
@@ -4932,6 +4933,8 @@ xfs_bmap_del_extent(
 
 	if (whichfork == XFS_ATTR_FORK)
 		state |= BMAP_ATTRFORK;
+	else if (whichfork == XFS_COW_FORK)
+		state |= BMAP_COWFORK;
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	ASSERT((*idx >= 0) && (*idx < ifp->if_bytes /
@@ -5284,8 +5287,8 @@ xfs_bunmapi(
 
 	trace_xfs_bunmap(ip, bno, len, flags, _RET_IP_);
 
-	whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
-		XFS_ATTR_FORK : XFS_DATA_FORK;
+	whichfork = xfs_bmapi_whichfork(flags);
+	ASSERT(whichfork != XFS_COW_FORK);
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	if (unlikely(
 	    XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 7b78fb8..2c18ed8a 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -134,6 +134,9 @@ typedef	struct xfs_bmap_free
  */
 #define XFS_BMAPI_REMAP		0x100
 
+/* Map something in the CoW fork. */
+#define XFS_BMAPI_COWFORK	0x200
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
@@ -143,7 +146,8 @@ typedef	struct xfs_bmap_free
 	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
 	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
 	{ XFS_BMAPI_ZERO,	"ZERO" }, \
-	{ XFS_BMAPI_REMAP,	"REMAP" }
+	{ XFS_BMAPI_REMAP,	"REMAP" }, \
+	{ XFS_BMAPI_COWFORK,	"COWFORK" }
 
 
 static inline int xfs_bmapi_aflag(int w)
@@ -151,6 +155,15 @@ static inline int xfs_bmapi_aflag(int w)
 	return (w == XFS_ATTR_FORK ? XFS_BMAPI_ATTRFORK : 0);
 }
 
+static inline int xfs_bmapi_whichfork(int bmapi_flags)
+{
+	if (bmapi_flags & XFS_BMAPI_COWFORK)
+		return XFS_COW_FORK;
+	else if (bmapi_flags & XFS_BMAPI_ATTRFORK)
+		return XFS_ATTR_FORK;
+	return XFS_DATA_FORK;
+}
+
 /*
  * Special values for xfs_bmbt_irec_t br_startblock field.
  */
@@ -177,13 +190,15 @@ static inline void xfs_bmap_init(xfs_bmap_free_t *flp, xfs_fsblock_t *fbp)
 #define BMAP_LEFT_VALID		(1 << 6)
 #define BMAP_RIGHT_VALID	(1 << 7)
 #define BMAP_ATTRFORK		(1 << 8)
+#define BMAP_COWFORK		(1 << 9)
 
 #define XFS_BMAP_EXT_FLAGS \
 	{ BMAP_LEFT_CONTIG,	"LC" }, \
 	{ BMAP_RIGHT_CONTIG,	"RC" }, \
 	{ BMAP_LEFT_FILLING,	"LF" }, \
 	{ BMAP_RIGHT_FILLING,	"RF" }, \
-	{ BMAP_ATTRFORK,	"ATTR" }
+	{ BMAP_ATTRFORK,	"ATTR" }, \
+	{ BMAP_COWFORK,		"COW" }
 
 
 /*
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 789eb48..adaceb1 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -787,6 +787,7 @@ xfs_bmbt_init_cursor(
 {
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	struct xfs_btree_cur	*cur;
+	ASSERT(whichfork != XFS_COW_FORK);
 
 	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_SLEEP);
 
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index bbdee05..6221c7a 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -206,9 +206,14 @@ xfs_iformat_fork(
 		XFS_ERROR_REPORT("xfs_iformat(7)", XFS_ERRLEVEL_LOW, ip->i_mount);
 		return -EFSCORRUPTED;
 	}
-	if (error) {
+	if (error)
 		return error;
+
+	if (xfs_is_reflink_inode(ip)) {
+		ASSERT(ip->i_cowfp == NULL);
+		xfs_ifork_init_cow(ip);
 	}
+
 	if (!XFS_DFORK_Q(dip))
 		return 0;
 
@@ -247,6 +252,9 @@ xfs_iformat_fork(
 	if (error) {
 		kmem_zone_free(xfs_ifork_zone, ip->i_afp);
 		ip->i_afp = NULL;
+		if (ip->i_cowfp)
+			kmem_zone_free(xfs_ifork_zone, ip->i_cowfp);
+		ip->i_cowfp = NULL;
 		xfs_idestroy_fork(ip, XFS_DATA_FORK);
 	}
 	return error;
@@ -737,6 +745,9 @@ xfs_idestroy_fork(
 	if (whichfork == XFS_ATTR_FORK) {
 		kmem_zone_free(xfs_ifork_zone, ip->i_afp);
 		ip->i_afp = NULL;
+	} else if (whichfork == XFS_COW_FORK) {
+		kmem_zone_free(xfs_ifork_zone, ip->i_cowfp);
+		ip->i_cowfp = NULL;
 	}
 }
 
@@ -924,6 +935,19 @@ xfs_iext_get_ext(
 	}
 }
 
+/* XFS_IEXT_STATE_TO_FORK() -- Convert BMAP state flags to an inode fork. */
+xfs_ifork_t *
+XFS_IEXT_STATE_TO_FORK(
+	struct xfs_inode	*ip,
+	int			state)
+{
+	if (state & BMAP_COWFORK)
+		return ip->i_cowfp;
+	else if (state & BMAP_ATTRFORK)
+		return ip->i_afp;
+	return &ip->i_df;
+}
+
 /*
  * Insert new item(s) into the extent records for incore inode
  * fork 'ifp'.  'count' new items are inserted at index 'idx'.
@@ -936,7 +960,7 @@ xfs_iext_insert(
 	xfs_bmbt_irec_t	*new,		/* items to insert */
 	int		state)		/* type of extent conversion */
 {
-	xfs_ifork_t	*ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
+	xfs_ifork_t	*ifp = XFS_IEXT_STATE_TO_FORK(ip, state);
 	xfs_extnum_t	i;		/* extent record index */
 
 	trace_xfs_iext_insert(ip, idx, new, state, _RET_IP_);
@@ -1186,7 +1210,7 @@ xfs_iext_remove(
 	int		ext_diff,	/* number of extents to remove */
 	int		state)		/* type of extent conversion */
 {
-	xfs_ifork_t	*ifp = (state & BMAP_ATTRFORK) ? ip->i_afp : &ip->i_df;
+	xfs_ifork_t	*ifp = XFS_IEXT_STATE_TO_FORK(ip, state);
 	xfs_extnum_t	nextents;	/* number of extents in file */
 	int		new_size;	/* size of extents after removal */
 
@@ -1922,3 +1946,22 @@ xfs_iext_irec_update_extoffs(
 		ifp->if_u1.if_ext_irec[i].er_extoff += ext_diff;
 	}
 }
+
+/**
+ * xfs_ifork_init_cow() -- Initialize an inode's copy-on-write fork.
+ *
+ * @ip: XFS inode.
+ */
+void
+xfs_ifork_init_cow(
+	struct xfs_inode	*ip)
+{
+	if (ip->i_cowfp)
+		return;
+
+	ip->i_cowfp = kmem_zone_zalloc(xfs_ifork_zone,
+				       KM_SLEEP | KM_NOFS);
+	ip->i_cowfp->if_flags = XFS_IFEXTENTS;
+	ip->i_cformat = XFS_DINODE_FMT_EXTENTS;
+	ip->i_cnextents = 0;
+}
diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h
index 7d3b1ed..a9f5270 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.h
+++ b/fs/xfs/libxfs/xfs_inode_fork.h
@@ -92,7 +92,9 @@ typedef struct xfs_ifork {
 #define XFS_IFORK_PTR(ip,w)		\
 	((w) == XFS_DATA_FORK ? \
 		&(ip)->i_df : \
-		(ip)->i_afp)
+		((w) == XFS_ATTR_FORK ? \
+			(ip)->i_afp : \
+			(ip)->i_cowfp))
 #define XFS_IFORK_DSIZE(ip) \
 	(XFS_IFORK_Q(ip) ? \
 		XFS_IFORK_BOFF(ip) : \
@@ -105,26 +107,38 @@ typedef struct xfs_ifork {
 #define XFS_IFORK_SIZE(ip,w) \
 	((w) == XFS_DATA_FORK ? \
 		XFS_IFORK_DSIZE(ip) : \
-		XFS_IFORK_ASIZE(ip))
+		((w) == XFS_ATTR_FORK ? \
+			XFS_IFORK_ASIZE(ip) : \
+			0))
 #define XFS_IFORK_FORMAT(ip,w) \
 	((w) == XFS_DATA_FORK ? \
 		(ip)->i_d.di_format : \
-		(ip)->i_d.di_aformat)
+		((w) == XFS_ATTR_FORK ? \
+			(ip)->i_d.di_aformat : \
+			(ip)->i_cformat))
 #define XFS_IFORK_FMT_SET(ip,w,n) \
 	((w) == XFS_DATA_FORK ? \
 		((ip)->i_d.di_format = (n)) : \
-		((ip)->i_d.di_aformat = (n)))
+		((w) == XFS_ATTR_FORK ? \
+			((ip)->i_d.di_aformat = (n)) : \
+			((ip)->i_cformat = (n))))
 #define XFS_IFORK_NEXTENTS(ip,w) \
 	((w) == XFS_DATA_FORK ? \
 		(ip)->i_d.di_nextents : \
-		(ip)->i_d.di_anextents)
+		((w) == XFS_ATTR_FORK ? \
+			(ip)->i_d.di_anextents : \
+			(ip)->i_cnextents))
 #define XFS_IFORK_NEXT_SET(ip,w,n) \
 	((w) == XFS_DATA_FORK ? \
 		((ip)->i_d.di_nextents = (n)) : \
-		((ip)->i_d.di_anextents = (n)))
+		((w) == XFS_ATTR_FORK ? \
+			((ip)->i_d.di_anextents = (n)) : \
+			((ip)->i_cnextents = (n))))
 #define XFS_IFORK_MAXEXT(ip, w) \
 	(XFS_IFORK_SIZE(ip, w) / sizeof(xfs_bmbt_rec_t))
 
+xfs_ifork_t	*XFS_IEXT_STATE_TO_FORK(struct xfs_inode *ip, int state);
+
 int		xfs_iformat_fork(struct xfs_inode *, struct xfs_dinode *);
 void		xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *,
 				struct xfs_inode_log_item *, int);
@@ -168,4 +182,6 @@ void		xfs_iext_irec_update_extoffs(struct xfs_ifork *, int, int);
 
 extern struct kmem_zone	*xfs_ifork_zone;
 
+extern void xfs_ifork_init_cow(struct xfs_inode *ip);
+
 #endif	/* __XFS_INODE_FORK_H__ */
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 32172d1..9a863a1 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -1075,6 +1075,8 @@ __xfs_rmap_add(
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
+	if (ri->ri_whichfork == XFS_COW_FORK)
+		return 0;
 
 	new = kmem_zalloc(sizeof(struct xfs_rmap_intent), KM_SLEEP | KM_NOFS);
 	*new = *ri;
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index be7b6de..8d74870 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -90,6 +90,7 @@ typedef __int64_t	xfs_sfiloff_t;	/* signed block number in a file */
  */
 #define	XFS_DATA_FORK	0
 #define	XFS_ATTR_FORK	1
+#define	XFS_COW_FORK	2
 
 /*
  * Min numbers of data/attr fork btree root pointers.
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index d7a490f..efe2a19 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -76,6 +76,9 @@ xfs_inode_alloc(
 	ip->i_mount = mp;
 	memset(&ip->i_imap, 0, sizeof(struct xfs_imap));
 	ip->i_afp = NULL;
+	ip->i_cowfp = NULL;
+	ip->i_cnextents = 0;
+	ip->i_cformat = XFS_DINODE_FMT_EXTENTS;
 	memset(&ip->i_df, 0, sizeof(xfs_ifork_t));
 	ip->i_flags = 0;
 	ip->i_delayed_blks = 0;
@@ -108,6 +111,8 @@ xfs_inode_free(
 
 	if (ip->i_afp)
 		xfs_idestroy_fork(ip, XFS_ATTR_FORK);
+	if (ip->i_cowfp)
+		xfs_idestroy_fork(ip, XFS_COW_FORK);
 
 	if (ip->i_itemp) {
 		ASSERT(!(ip->i_itemp->ili_item.li_flags & XFS_LI_IN_AIL));
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 6436a96..c1054d1 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -47,6 +47,7 @@ typedef struct xfs_inode {
 
 	/* Extent information. */
 	xfs_ifork_t		*i_afp;		/* attribute fork pointer */
+	xfs_ifork_t		*i_cowfp;	/* copy on write extents */
 	xfs_ifork_t		i_df;		/* data fork */
 
 	/* operations vectors */
@@ -65,6 +66,9 @@ typedef struct xfs_inode {
 
 	xfs_icdinode_t		i_d;		/* most of ondisk inode */
 
+	xfs_extnum_t		i_cnextents;	/* # of extents in cow fork */
+	unsigned int		i_cformat;	/* format of cow fork */
+
 	/* VFS inode */
 	struct inode		i_vnode;	/* embedded VFS inode */
 } xfs_inode_t;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
new file mode 100644
index 0000000..bbcd50c
--- /dev/null
+++ b/fs/xfs/xfs_reflink.c
@@ -0,0 +1,111 @@
+/*
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_inode_item.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_error.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_ioctl.h"
+#include "xfs_trace.h"
+#include "xfs_log.h"
+#include "xfs_icache.h"
+#include "xfs_pnfs.h"
+#include "xfs_refcount_btree.h"
+#include "xfs_refcount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_bit.h"
+#include "xfs_alloc.h"
+#include "xfs_quota_defs.h"
+#include "xfs_quota.h"
+#include "xfs_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_reflink.h"
+
+/*
+ * Copy on Write of Shared Blocks
+ *
+ * XFS must preserve "the usual" file semantics even when two files share
+ * the same physical blocks.  This means that a write to one file must not
+ * alter the blocks in a different file; the way that we'll do that is
+ * through the use of a copy-on-write mechanism.  At a high level, that
+ * means that when we want to write to a shared block, we allocate a new
+ * block, write the data to the new block, and if that succeeds we map the
+ * new block into the file.
+ *
+ * XFS provides a "delayed allocation" mechanism that defers the allocation
+ * of disk blocks to dirty-but-not-yet-mapped file blocks as long as
+ * possible.  This reduces fragmentation by enabling the filesystem to ask
+ * for bigger chunks less often, which is exactly what we want for CoW.
+ *
+ * The delalloc mechanism begins when the kernel wants to make a block
+ * writable (write_begin or page_mkwrite).  If the offset is not mapped, we
+ * create a delalloc mapping, which is a regular in-core extent, but without
+ * a real startblock.  (For delalloc mappings, the startblock encodes both
+ * a flag that this is a delalloc mapping, and a worst-case estimate of how
+ * many blocks might be required to put the mapping into the BMBT.)  delalloc
+ * mappings are a reservation against the free space in the filesystem;
+ * adjacent mappings can also be combined into fewer larger mappings.
+ *
+ * When dirty pages are being written out (typically in writepage), the
+ * delalloc reservations are converted into real mappings by allocating
+ * blocks and replacing the delalloc mapping with real ones.  A delalloc
+ * mapping can be replaced by several real ones if the free space is
+ * fragmented.
+ *
+ * We want to adapt the delalloc mechanism for copy-on-write, since the
+ * write paths are similar.  The first two steps (creating the reservation
+ * and allocating the blocks) are exactly the same as delalloc except that
+ * the mappings must be stored in a separate CoW fork because we do not want
+ * to disturb the mapping in the data fork until we're sure that the write
+ * succeeded.  IO completion in this case is the process of removing the old
+ * mapping from the data fork and moving the new mapping from the CoW fork to
+ * the data fork.  This will be discussed shortly.
+ *
+ * For now, unaligned directio writes will be bounced back to the page cache.
+ * Block-aligned directio writes will use the same mechanism as buffered
+ * writes.
+ *
+ * CoW remapping must be done after the data block write completes,
+ * because we don't want to destroy the old data fork map until we're sure
+ * the new block has been written.  Since the new mappings are kept in a
+ * separate fork, we can simply iterate these mappings to find the ones
+ * that cover the file blocks that we just CoW'd.  For each extent, simply
+ * unmap the corresponding range in the data fork, map the new range into
+ * the data fork, and remove the extent from the CoW fork.
+ *
+ * Since the remapping operation can be applied to an arbitrary file
+ * range, we record the need for the remap step as a flag in the ioend
+ * instead of declaring a new IO type.  This is required for direct io
+ * because we only have ioend for the whole dio, and we have to be able to
+ * remember the presence of unwritten blocks and CoW blocks with a single
+ * ioend structure.  Better yet, the more ground we can cover with one
+ * ioend, the better.
+ */
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
new file mode 100644
index 0000000..e967cff
--- /dev/null
+++ b/fs/xfs/xfs_reflink.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#ifndef __XFS_REFLINK_H
+#define __XFS_REFLINK_H 1
+
+#endif /* __XFS_REFLINK_H */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 9e3a107..0773938 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -268,10 +268,10 @@ DECLARE_EVENT_CLASS(xfs_bmap_class,
 		__field(unsigned long, caller_ip)
 	),
 	TP_fast_assign(
-		struct xfs_ifork	*ifp = (state & BMAP_ATTRFORK) ?
-						ip->i_afp : &ip->i_df;
+		struct xfs_ifork	*ifp;
 		struct xfs_bmbt_irec	r;
 
+		ifp = XFS_IEXT_STATE_TO_FORK(ip, state);
 		xfs_bmbt_get_all(xfs_iext_get_ext(ifp, idx), &r);
 		__entry->dev = VFS_I(ip)->i_sb->s_dev;
 		__entry->ino = ip->i_ino;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 54/76] xfs: support bmapping delalloc extents in the CoW fork
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (52 preceding siblings ...)
  2015-12-19  9:02 ` [PATCH 53/76] xfs: introduce the CoW fork Darrick J. Wong
@ 2015-12-19  9:02 ` Darrick J. Wong
  2015-12-19  9:02 ` [PATCH 55/76] xfs: create delalloc extents in " Darrick J. Wong
                   ` (22 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:02 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Allow the creation of delayed allocation extents in the CoW fork.
In a subsequent patch we'll wire up write_begin and page_mkwrite to
actually do this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |   29 +++++++++++++++++-----------
 fs/xfs/libxfs/xfs_bmap.h |    2 +-
 fs/xfs/xfs_iomap.c       |   48 +++++++++++++++++++++++++++++++++++++---------
 fs/xfs/xfs_iomap.h       |    2 ++
 4 files changed, 60 insertions(+), 21 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 53a8ebc..9737353 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -2878,6 +2878,7 @@ done:
 STATIC void
 xfs_bmap_add_extent_hole_delay(
 	xfs_inode_t		*ip,	/* incore inode pointer */
+	int			whichfork,
 	xfs_extnum_t		*idx,	/* extent number to update/insert */
 	xfs_bmbt_irec_t		*new)	/* new data to add to file extents */
 {
@@ -2889,8 +2890,10 @@ xfs_bmap_add_extent_hole_delay(
 	int			state;  /* state bits, accessed thru macros */
 	xfs_filblks_t		temp=0;	/* temp for indirect calculations */
 
-	ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	ifp = XFS_IFORK_PTR(ip, whichfork);
 	state = 0;
+	if (whichfork == XFS_COW_FORK)
+		state |= BMAP_COWFORK;
 	ASSERT(isnullstartblock(new->br_startblock));
 
 	/*
@@ -2908,7 +2911,7 @@ xfs_bmap_add_extent_hole_delay(
 	 * Check and set flags if the current (right) segment exists.
 	 * If it doesn't exist, we're converting the hole at end-of-file.
 	 */
-	if (*idx < ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) {
+	if (*idx < ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)) {
 		state |= BMAP_RIGHT_VALID;
 		xfs_bmbt_get_all(xfs_iext_get_ext(ifp, *idx), &right);
 
@@ -4260,6 +4263,7 @@ xfs_bmapi_read(
 STATIC int
 xfs_bmapi_reserve_delalloc(
 	struct xfs_inode	*ip,
+	int			whichfork,
 	xfs_fileoff_t		aoff,
 	xfs_filblks_t		len,
 	struct xfs_bmbt_irec	*got,
@@ -4268,7 +4272,7 @@ xfs_bmapi_reserve_delalloc(
 	int			eof)
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	xfs_extlen_t		alen;
 	xfs_extlen_t		indlen;
 	char			rt = XFS_IS_REALTIME_INODE(ip);
@@ -4327,7 +4331,7 @@ xfs_bmapi_reserve_delalloc(
 	got->br_startblock = nullstartblock(indlen);
 	got->br_blockcount = alen;
 	got->br_state = XFS_EXT_NORM;
-	xfs_bmap_add_extent_hole_delay(ip, lastx, got);
+	xfs_bmap_add_extent_hole_delay(ip, whichfork, lastx, got);
 
 	/*
 	 * Update our extent pointer, given that xfs_bmap_add_extent_hole_delay
@@ -4359,6 +4363,7 @@ out_unreserve_quota:
 int
 xfs_bmapi_delay(
 	struct xfs_inode	*ip,	/* incore inode */
+	int			whichfork, /* data or cow fork? */
 	xfs_fileoff_t		bno,	/* starting file offs. mapped */
 	xfs_filblks_t		len,	/* length to map in file */
 	struct xfs_bmbt_irec	*mval,	/* output: map values */
@@ -4366,7 +4371,7 @@ xfs_bmapi_delay(
 	int			flags)	/* XFS_BMAPI_... */
 {
 	struct xfs_mount	*mp = ip->i_mount;
-	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 	struct xfs_bmbt_irec	got;	/* current file extent record */
 	struct xfs_bmbt_irec	prev;	/* previous file extent record */
 	xfs_fileoff_t		obno;	/* old block number (offset) */
@@ -4376,14 +4381,15 @@ xfs_bmapi_delay(
 	int			n = 0;	/* current extent index */
 	int			error = 0;
 
+	ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_COW_FORK);
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
 	ASSERT(!(flags & ~XFS_BMAPI_ENTIRE));
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
 	if (unlikely(XFS_TEST_ERROR(
-	    (XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_EXTENTS &&
-	     XFS_IFORK_FORMAT(ip, XFS_DATA_FORK) != XFS_DINODE_FMT_BTREE),
+	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
 	     mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
 		XFS_ERROR_REPORT("xfs_bmapi_delay", XFS_ERRLEVEL_LOW, mp);
 		return -EFSCORRUPTED;
@@ -4394,19 +4400,20 @@ xfs_bmapi_delay(
 
 	XFS_STATS_INC(mp, xs_blk_mapw);
 
-	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
-		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
+	if (whichfork == XFS_DATA_FORK && !(ifp->if_flags & XFS_IFEXTENTS)) {
+		error = xfs_iread_extents(NULL, ip, whichfork);
 		if (error)
 			return error;
 	}
 
-	xfs_bmap_search_extents(ip, bno, XFS_DATA_FORK, &eof, &lastx, &got, &prev);
+	xfs_bmap_search_extents(ip, bno, whichfork, &eof, &lastx, &got, &prev);
 	end = bno + len;
 	obno = bno;
 
 	while (bno < end && n < *nmap) {
 		if (eof || got.br_startoff > bno) {
-			error = xfs_bmapi_reserve_delalloc(ip, bno, len, &got,
+			error = xfs_bmapi_reserve_delalloc(ip, whichfork,
+							   bno, len, &got,
 							   &prev, &lastx, eof);
 			if (error) {
 				if (n == 0) {
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 2c18ed8a..fb68ac4 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -245,7 +245,7 @@ int	xfs_bmap_read_extents(struct xfs_trans *tp, struct xfs_inode *ip,
 int	xfs_bmapi_read(struct xfs_inode *ip, xfs_fileoff_t bno,
 		xfs_filblks_t len, struct xfs_bmbt_irec *mval,
 		int *nmap, int flags);
-int	xfs_bmapi_delay(struct xfs_inode *ip, xfs_fileoff_t bno,
+int	xfs_bmapi_delay(struct xfs_inode *ip, int whichfork, xfs_fileoff_t bno,
 		xfs_filblks_t len, struct xfs_bmbt_irec *mval,
 		int *nmap, int flags);
 int	xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 96dcaeb..53e027c 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -565,12 +565,13 @@ check_writeio:
 	return alloc_blocks;
 }
 
-int
-xfs_iomap_write_delay(
+STATIC int
+__xfs_iomap_write_delay(
 	xfs_inode_t	*ip,
 	xfs_off_t	offset,
 	size_t		count,
-	xfs_bmbt_irec_t *ret_imap)
+	xfs_bmbt_irec_t *ret_imap,
+	int		whichfork)
 {
 	xfs_mount_t	*mp = ip->i_mount;
 	xfs_fileoff_t	offset_fsb;
@@ -596,10 +597,14 @@ xfs_iomap_write_delay(
 	extsz = xfs_get_extsz_hint(ip);
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 
-	error = xfs_iomap_eof_want_preallocate(mp, ip, offset, count,
-				imap, XFS_WRITE_IMAPS, &prealloc);
-	if (error)
-		return error;
+	if (whichfork == XFS_DATA_FORK) {
+		error = xfs_iomap_eof_want_preallocate(mp, ip, offset, count,
+					imap, XFS_WRITE_IMAPS, &prealloc);
+		if (error)
+			return error;
+	} else {
+		prealloc = 0;
+	}
 
 retry:
 	if (prealloc) {
@@ -631,8 +636,8 @@ retry:
 	ASSERT(last_fsb > offset_fsb);
 
 	nimaps = XFS_WRITE_IMAPS;
-	error = xfs_bmapi_delay(ip, offset_fsb, last_fsb - offset_fsb,
-				imap, &nimaps, XFS_BMAPI_ENTIRE);
+	error = xfs_bmapi_delay(ip, whichfork, offset_fsb,
+			last_fsb - offset_fsb, imap, &nimaps, XFS_BMAPI_ENTIRE);
 	switch (error) {
 	case 0:
 	case -ENOSPC:
@@ -670,6 +675,31 @@ retry:
 	return 0;
 }
 
+int
+xfs_iomap_write_delay(
+	xfs_inode_t	*ip,
+	xfs_off_t	offset,
+	size_t		count,
+	xfs_bmbt_irec_t *ret_imap)
+{
+	return __xfs_iomap_write_delay(ip, offset, count, ret_imap,
+				       XFS_DATA_FORK);
+}
+
+int
+xfs_iomap_cow_delay(
+	xfs_inode_t	*ip,
+	xfs_off_t	offset,
+	size_t		count,
+	xfs_bmbt_irec_t *ret_imap)
+{
+	ASSERT(XFS_IFORK_PTR(ip, XFS_COW_FORK) != NULL);
+	trace_xfs_iomap_cow_delay(ip, offset, count);
+
+	return __xfs_iomap_write_delay(ip, offset, count, ret_imap,
+				       XFS_COW_FORK);
+}
+
 /*
  * Pass in a delayed allocate extent, convert it to real extents;
  * return to the caller the extent we create which maps on top of
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 8688e66..f6a9adf 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -28,5 +28,7 @@ int xfs_iomap_write_delay(struct xfs_inode *, xfs_off_t, size_t,
 int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t,
 			struct xfs_bmbt_irec *);
 int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t);
+int xfs_iomap_cow_delay(struct xfs_inode *, xfs_off_t, size_t,
+			struct xfs_bmbt_irec *);
 
 #endif /* __XFS_IOMAP_H__*/

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 55/76] xfs: create delalloc extents in CoW fork
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (53 preceding siblings ...)
  2015-12-19  9:02 ` [PATCH 54/76] xfs: support bmapping delalloc extents in " Darrick J. Wong
@ 2015-12-19  9:02 ` Darrick J. Wong
  2015-12-19  9:02 ` [PATCH 56/76] xfs: support allocating delayed " Darrick J. Wong
                   ` (21 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:02 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Wire up write_begin and page_mkwrite to detect shared extents and
create delayed allocation extents in the CoW fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c    |   10 ++++
 fs/xfs/xfs_file.c    |   10 ++++
 fs/xfs/xfs_reflink.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h |    3 +
 4 files changed, 163 insertions(+)


diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 29e7e5d..afb62e0 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -31,6 +31,7 @@
 #include "xfs_bmap.h"
 #include "xfs_bmap_util.h"
 #include "xfs_bmap_btree.h"
+#include "xfs_reflink.h"
 #include <linux/gfp.h>
 #include <linux/mpage.h>
 #include <linux/pagevec.h>
@@ -1831,6 +1832,15 @@ xfs_vm_write_begin(
 	if (!page)
 		return -ENOMEM;
 
+	/* Reserve delalloc blocks for CoW. */
+	status = xfs_reflink_reserve_cow_range(XFS_I(mapping->host), pos, len);
+	if (status) {
+		unlock_page(page);
+		page_cache_release(page);
+		*pagep = NULL;
+		return status;
+	}
+
 	status = __block_write_begin(page, pos, len, xfs_get_blocks);
 	if (unlikely(status)) {
 		struct inode	*inode = mapping->host;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index f5392ab..0fbcb38 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -37,6 +37,7 @@
 #include "xfs_log.h"
 #include "xfs_icache.h"
 #include "xfs_pnfs.h"
+#include "xfs_reflink.h"
 
 #include <linux/dcache.h>
 #include <linux/falloc.h>
@@ -1518,6 +1519,14 @@ xfs_filemap_page_mkwrite(
 	file_update_time(vma->vm_file);
 	xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 
+	/* Reserve delalloc blocks for CoW. */
+	ret = xfs_reflink_reserve_cow_range(XFS_I(inode),
+			vmf->page->index << PAGE_CACHE_SHIFT, PAGE_CACHE_SIZE);
+	if (ret) {
+		ret = block_page_mkwrite_return(ret);
+		goto out;
+	}
+
 	if (IS_DAX(inode)) {
 		ret = __dax_mkwrite(vma, vmf, xfs_get_blocks_dax_fault, NULL);
 	} else {
@@ -1525,6 +1534,7 @@ xfs_filemap_page_mkwrite(
 		ret = block_page_mkwrite_return(ret);
 	}
 
+out:
 	xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
 	sb_end_pagefault(inode->i_sb);
 
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index bbcd50c..fdc538e 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -48,6 +48,7 @@
 #include "xfs_btree.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_reflink.h"
+#include "xfs_iomap.h"
 
 /*
  * Copy on Write of Shared Blocks
@@ -109,3 +110,142 @@
  * ioend structure.  Better yet, the more ground we can cover with one
  * ioend, the better.
  */
+
+/* Trim extent to fit a logical block range. */
+static void
+xfs_trim_extent(
+	struct xfs_bmbt_irec	*irec,
+	xfs_fileoff_t		bno,
+	xfs_extlen_t		len)
+{
+	xfs_fileoff_t		distance;
+	xfs_fileoff_t		end = bno + len;
+
+	if (irec->br_startoff < bno) {
+		distance = bno - irec->br_startoff;
+		irec->br_startblock += distance;
+		irec->br_startoff += distance;
+		irec->br_blockcount -= distance;
+	}
+
+	if (end < irec->br_startoff + irec->br_blockcount) {
+		distance = irec->br_startoff + irec->br_blockcount - end;
+		irec->br_blockcount -= distance;
+	}
+}
+
+/* Find the shared ranges under an irec, and set up delalloc extents. */
+STATIC int
+xfs_reflink_reserve_cow_extent(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*irec)
+{
+	struct xfs_bmbt_irec	rec;
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+	xfs_extlen_t		aglen;
+	xfs_agblock_t		fbno;
+	xfs_extlen_t		flen;
+	xfs_fileoff_t		lblk;
+	xfs_off_t		foffset;
+	xfs_extlen_t		distance;
+	size_t			fsize;
+	int			error = 0;
+
+	/* Holes, unwritten, and delalloc extents cannot be shared */
+	if (ISUNWRITTEN(irec) ||
+	    irec->br_startblock == HOLESTARTBLOCK ||
+	    irec->br_startblock == DELAYSTARTBLOCK)
+		return 0;
+
+	trace_xfs_reflink_reserve_cow_extent(ip, irec);
+	agno = XFS_FSB_TO_AGNO(ip->i_mount, irec->br_startblock);
+	agbno = XFS_FSB_TO_AGBNO(ip->i_mount, irec->br_startblock);
+	lblk = irec->br_startoff;
+	aglen = irec->br_blockcount;
+
+	while (aglen > 0) {
+		/* Find maximal fork range within this extent */
+		error = xfs_refcount_find_shared(ip->i_mount, agno, agbno,
+				aglen, &fbno, &flen, true);
+		if (error)
+			break;
+		if (flen == 0) {
+			distance = fbno - agbno;
+			goto advloop;
+		}
+
+		/* Add as much as we can to the cow fork */
+		foffset = XFS_FSB_TO_B(ip->i_mount, lblk + fbno - agbno);
+		fsize = XFS_FSB_TO_B(ip->i_mount, flen);
+		error = xfs_iomap_cow_delay(ip, foffset, fsize, &rec);
+		if (error)
+			break;
+
+		distance = (rec.br_startoff - lblk) + rec.br_blockcount;
+advloop:
+		if (aglen < distance)
+			break;
+		aglen -= distance;
+		agbno += distance;
+		lblk += distance;
+	}
+
+	if (error)
+		trace_xfs_reflink_reserve_cow_extent_error(ip, error, _RET_IP_);
+	return error;
+}
+
+/**
+ * xfs_reflink_reserve_cow_range() -- Reserve blocks to satisfy a copy on
+ *				      write operation.
+ * @ip: XFS inode.
+ * @pos: file offset to start CoWing.
+ * @len: number of bytes to CoW.
+ */
+int
+xfs_reflink_reserve_cow_range(
+	struct xfs_inode	*ip,
+	xfs_off_t		pos,
+	xfs_off_t		len)
+{
+	struct xfs_bmbt_irec	imap;
+	int			nimaps;
+	int			error = 0;
+	xfs_fileoff_t		lblk;
+	xfs_fileoff_t		next_lblk;
+	xfs_off_t		offset;
+
+	if (!xfs_is_reflink_inode(ip))
+		return 0;
+
+	trace_xfs_reflink_reserve_cow_range(ip, len, pos, 0);
+
+	lblk = XFS_B_TO_FSBT(ip->i_mount, pos);
+	next_lblk = XFS_B_TO_FSB(ip->i_mount, pos + len - 1);
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	while (lblk < next_lblk) {
+		offset = XFS_FSB_TO_B(ip->i_mount, lblk);
+		/* Read extent from the source file. */
+		nimaps = 1;
+		error = xfs_bmapi_read(ip, lblk, next_lblk - lblk, &imap,
+				&nimaps, 0);
+		if (error)
+			break;
+
+		if (nimaps == 0)
+			break;
+
+		/* Fork all the shared blocks in this extent. */
+		error = xfs_reflink_reserve_cow_extent(ip, &imap);
+		if (error)
+			break;
+
+		lblk += imap.br_blockcount;
+	}
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	if (error)
+		trace_xfs_reflink_reserve_cow_range_error(ip, error, _RET_IP_);
+	return error;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index e967cff..41afdbe 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -18,4 +18,7 @@
 #ifndef __XFS_REFLINK_H
 #define __XFS_REFLINK_H 1
 
+extern int xfs_reflink_reserve_cow_range(struct xfs_inode *ip, xfs_off_t pos,
+		xfs_off_t len);
+
 #endif /* __XFS_REFLINK_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 56/76] xfs: support allocating delayed extents in CoW fork
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (54 preceding siblings ...)
  2015-12-19  9:02 ` [PATCH 55/76] xfs: create delalloc extents in " Darrick J. Wong
@ 2015-12-19  9:02 ` Darrick J. Wong
  2015-12-19  9:02 ` [PATCH 57/76] xfs: allocate " Darrick J. Wong
                   ` (20 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:02 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Modify xfs_bmap_add_extent_delay_real() so that we can convert delayed
allocation extents in the CoW fork to real allocations, and wire this
up all the way back to xfs_iomap_write_allocate().  In a subsequent
patch, we'll modify the writepage handler to call this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |   51 ++++++++++++++++++++++++++++++++--------------
 fs/xfs/xfs_aops.c        |    6 ++++-
 fs/xfs/xfs_iomap.c       |    7 +++++-
 fs/xfs/xfs_iomap.h       |    2 +-
 4 files changed, 46 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 9737353..de0ac37 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -139,7 +139,8 @@ xfs_bmbt_lookup_ge(
  */
 static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
 {
-	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
+	return whichfork != XFS_COW_FORK &&
+		XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS &&
 		XFS_IFORK_NEXTENTS(ip, whichfork) >
 			XFS_IFORK_MAXEXT(ip, whichfork);
 }
@@ -149,7 +150,8 @@ static inline bool xfs_bmap_needs_btree(struct xfs_inode *ip, int whichfork)
  */
 static inline bool xfs_bmap_wants_extents(struct xfs_inode *ip, int whichfork)
 {
-	return XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
+	return whichfork != XFS_COW_FORK &&
+		XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE &&
 		XFS_IFORK_NEXTENTS(ip, whichfork) <=
 			XFS_IFORK_MAXEXT(ip, whichfork);
 }
@@ -671,6 +673,7 @@ xfs_bmap_btree_to_extents(
 
 	mp = ip->i_mount;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
+	ASSERT(whichfork != XFS_COW_FORK);
 	ASSERT(ifp->if_flags & XFS_IFEXTENTS);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE);
 	rblock = ifp->if_broot;
@@ -737,6 +740,7 @@ xfs_bmap_extents_to_btree(
 	xfs_bmbt_ptr_t		*pp;		/* root block address pointer */
 
 	mp = ip->i_mount;
+	ASSERT(whichfork != XFS_COW_FORK);
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_EXTENTS);
 
@@ -868,6 +872,7 @@ xfs_bmap_local_to_extents_empty(
 {
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, whichfork);
 
+	ASSERT(whichfork != XFS_COW_FORK);
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL);
 	ASSERT(ifp->if_bytes == 0);
 	ASSERT(XFS_IFORK_NEXTENTS(ip, whichfork) == 0);
@@ -1703,7 +1708,8 @@ xfs_bmap_one_block(
  */
 STATIC int				/* error */
 xfs_bmap_add_extent_delay_real(
-	struct xfs_bmalloca	*bma)
+	struct xfs_bmalloca	*bma,
+	int			whichfork)
 {
 	struct xfs_bmbt_irec	*new = &bma->got;
 	int			diff;	/* temp value */
@@ -1722,10 +1728,13 @@ xfs_bmap_add_extent_delay_real(
 	xfs_filblks_t		temp2=0;/* value for da_new calculations */
 	int			tmp_rval;	/* partial logging flags */
 	struct xfs_mount	*mp;
-	int			whichfork = XFS_DATA_FORK;
+	xfs_extnum_t		*nextents;
 
 	mp  = bma->tp ? bma->tp->t_mountp : NULL;
 	ifp = XFS_IFORK_PTR(bma->ip, whichfork);
+	ASSERT(whichfork != XFS_ATTR_FORK);
+	nextents = (whichfork == XFS_COW_FORK ? &bma->ip->i_cnextents :
+						&bma->ip->i_d.di_nextents);
 
 	ASSERT(bma->idx >= 0);
 	ASSERT(bma->idx <= ifp->if_bytes / sizeof(struct xfs_bmbt_rec));
@@ -1739,6 +1748,9 @@ xfs_bmap_add_extent_delay_real(
 #define	RIGHT		r[1]
 #define	PREV		r[2]
 
+	if (whichfork == XFS_COW_FORK)
+		state |= BMAP_COWFORK;
+
 	/*
 	 * Set up a bunch of variables to make the tests simpler.
 	 */
@@ -1825,7 +1837,7 @@ xfs_bmap_add_extent_delay_real(
 		trace_xfs_bmap_post_update(bma->ip, bma->idx, state, _THIS_IP_);
 
 		xfs_iext_remove(bma->ip, bma->idx + 1, 2, state);
-		bma->ip->i_d.di_nextents--;
+		(*nextents)--;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -1939,7 +1951,7 @@ xfs_bmap_add_extent_delay_real(
 		xfs_bmbt_set_startblock(ep, new->br_startblock);
 		trace_xfs_bmap_post_update(bma->ip, bma->idx, state, _THIS_IP_);
 
-		bma->ip->i_d.di_nextents++;
+		(*nextents)++;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2017,7 +2029,7 @@ xfs_bmap_add_extent_delay_real(
 		temp = PREV.br_blockcount - new->br_blockcount;
 		xfs_bmbt_set_blockcount(ep, temp);
 		xfs_iext_insert(bma->ip, bma->idx, 1, new, state);
-		bma->ip->i_d.di_nextents++;
+		(*nextents)++;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2107,7 +2119,7 @@ xfs_bmap_add_extent_delay_real(
 		trace_xfs_bmap_pre_update(bma->ip, bma->idx, state, _THIS_IP_);
 		xfs_bmbt_set_blockcount(ep, temp);
 		xfs_iext_insert(bma->ip, bma->idx + 1, 1, new, state);
-		bma->ip->i_d.di_nextents++;
+		(*nextents)++;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2180,7 +2192,7 @@ xfs_bmap_add_extent_delay_real(
 		RIGHT.br_blockcount = temp2;
 		/* insert LEFT (r[0]) and RIGHT (r[1]) at the same time */
 		xfs_iext_insert(bma->ip, bma->idx + 1, 2, &LEFT, state);
-		bma->ip->i_d.di_nextents++;
+		(*nextents)++;
 		if (bma->cur == NULL)
 			rval = XFS_ILOG_CORE | XFS_ILOG_DEXT;
 		else {
@@ -2277,7 +2289,8 @@ xfs_bmap_add_extent_delay_real(
 
 	xfs_bmap_check_leaf_extents(bma->cur, bma->ip, whichfork);
 done:
-	bma->logflags |= rval;
+	if (whichfork != XFS_COW_FORK)
+		bma->logflags |= rval;
 	return error;
 #undef	LEFT
 #undef	RIGHT
@@ -3987,7 +4000,8 @@ xfs_bmap_btalloc(
 		ASSERT(nullfb || fb_agno == args.agno ||
 		       (ap->flist->xbf_low && fb_agno < args.agno));
 		ap->length = args.len;
-		ap->ip->i_d.di_nblocks += args.len;
+		if (!(ap->flags & XFS_BMAPI_COWFORK))
+			ap->ip->i_d.di_nblocks += args.len;
 		xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE);
 		if (ap->wasdel)
 			ap->ip->i_delayed_blks -= args.len;
@@ -4449,8 +4463,7 @@ xfs_bmapi_allocate(
 	struct xfs_bmalloca	*bma)
 {
 	struct xfs_mount	*mp = bma->ip->i_mount;
-	int			whichfork = (bma->flags & XFS_BMAPI_ATTRFORK) ?
-						XFS_ATTR_FORK : XFS_DATA_FORK;
+	int			whichfork = xfs_bmapi_whichfork(bma->flags);
 	struct xfs_ifork	*ifp = XFS_IFORK_PTR(bma->ip, whichfork);
 	int			tmp_logflags = 0;
 	int			error;
@@ -4539,7 +4552,7 @@ xfs_bmapi_allocate(
 		bma->got.br_state = XFS_EXT_UNWRITTEN;
 
 	if (bma->wasdel)
-		error = xfs_bmap_add_extent_delay_real(bma);
+		error = xfs_bmap_add_extent_delay_real(bma, whichfork);
 	else
 		error = xfs_bmap_add_extent_hole_real(bma, whichfork);
 
@@ -4693,8 +4706,7 @@ xfs_bmapi_write(
 	orig_mval = mval;
 	orig_nmap = *nmap;
 #endif
-	whichfork = (flags & XFS_BMAPI_ATTRFORK) ?
-		XFS_ATTR_FORK : XFS_DATA_FORK;
+	whichfork = xfs_bmapi_whichfork(flags);
 
 	ASSERT(*nmap >= 1);
 	ASSERT(*nmap <= XFS_BMAP_MAX_NMAP);
@@ -4705,6 +4717,11 @@ xfs_bmapi_write(
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 	if (whichfork == XFS_ATTR_FORK)
 		ASSERT(!(flags & XFS_BMAPI_REMAP));
+	if (whichfork == XFS_COW_FORK) {
+		ASSERT(!(flags & XFS_BMAPI_REMAP));
+		ASSERT(!(flags & XFS_BMAPI_PREALLOC));
+		ASSERT(!(flags & XFS_BMAPI_CONVERT));
+	}
 	if (flags & XFS_BMAPI_REMAP) {
 		ASSERT(!(flags & XFS_BMAPI_PREALLOC));
 		ASSERT(!(flags & XFS_BMAPI_CONVERT));
@@ -4775,6 +4792,8 @@ xfs_bmapi_write(
 		 */
 		if (flags & XFS_BMAPI_REMAP)
 			ASSERT(inhole);
+		if (flags & XFS_BMAPI_COWFORK)
+			ASSERT(!inhole);
 
 		/*
 		 * First, deal with the hole before the allocated space
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index afb62e0..13629d2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -324,9 +324,11 @@ xfs_map_blocks(
 
 	if (type == XFS_IO_DELALLOC &&
 	    (!nimaps || isnullstartblock(imap->br_startblock))) {
-		error = xfs_iomap_write_allocate(ip, offset, imap);
+		error = xfs_iomap_write_allocate(ip, XFS_DATA_FORK, offset,
+				imap);
 		if (!error)
-			trace_xfs_map_blocks_alloc(ip, offset, count, type, imap);
+			trace_xfs_map_blocks_alloc(ip, offset, count, type,
+					imap);
 		return error;
 	}
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 53e027c..d183b2a 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -713,6 +713,7 @@ xfs_iomap_cow_delay(
 int
 xfs_iomap_write_allocate(
 	xfs_inode_t	*ip,
+	int		whichfork,
 	xfs_off_t	offset,
 	xfs_bmbt_irec_t *imap)
 {
@@ -725,8 +726,12 @@ xfs_iomap_write_allocate(
 	xfs_trans_t	*tp;
 	int		nimaps, committed;
 	int		error = 0;
+	int		flags = 0;
 	int		nres;
 
+	if (whichfork == XFS_COW_FORK)
+		flags |= XFS_BMAPI_COWFORK;
+
 	/*
 	 * Make sure that the dquots are there.
 	 */
@@ -818,7 +823,7 @@ xfs_iomap_write_allocate(
 			 * pointer that the caller gave to us.
 			 */
 			error = xfs_bmapi_write(tp, ip, map_start_fsb,
-						count_fsb, 0, &first_block,
+						count_fsb, flags, &first_block,
 						nres, imap, &nimaps,
 						&free_list);
 			if (error)
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index f6a9adf..ba037d6 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -25,7 +25,7 @@ int xfs_iomap_write_direct(struct xfs_inode *, xfs_off_t, size_t,
 			struct xfs_bmbt_irec *, int);
 int xfs_iomap_write_delay(struct xfs_inode *, xfs_off_t, size_t,
 			struct xfs_bmbt_irec *);
-int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t,
+int xfs_iomap_write_allocate(struct xfs_inode *, int, xfs_off_t,
 			struct xfs_bmbt_irec *);
 int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t);
 int xfs_iomap_cow_delay(struct xfs_inode *, xfs_off_t, size_t,

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 57/76] xfs: allocate delayed extents in CoW fork
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (55 preceding siblings ...)
  2015-12-19  9:02 ` [PATCH 56/76] xfs: support allocating delayed " Darrick J. Wong
@ 2015-12-19  9:02 ` Darrick J. Wong
  2016-01-03 12:20   ` Christoph Hellwig
  2016-01-09  9:59   ` Darrick J. Wong
  2015-12-19  9:02 ` [PATCH 58/76] xfs: support removing extents from " Darrick J. Wong
                   ` (19 subsequent siblings)
  76 siblings, 2 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:02 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Modify the writepage handler to find and convert pending delalloc
extents to real allocations.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c    |   63 +++++++++++++++++++++++++++++++++++---------
 fs/xfs/xfs_aops.h    |   10 +++++++
 fs/xfs/xfs_reflink.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h |    3 ++
 4 files changed, 135 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 13629d2..7179b25 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -285,7 +285,8 @@ xfs_map_blocks(
 	loff_t			offset,
 	struct xfs_bmbt_irec	*imap,
 	int			type,
-	int			nonblocking)
+	int			nonblocking,
+	bool			is_cow)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -294,10 +295,15 @@ xfs_map_blocks(
 	int			error = 0;
 	int			bmapi_flags = XFS_BMAPI_ENTIRE;
 	int			nimaps = 1;
+	int			whichfork;
+	bool			need_alloc;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	whichfork = (is_cow ? XFS_COW_FORK : XFS_DATA_FORK);
+	need_alloc = (type == XFS_IO_DELALLOC);
+
 	if (type == XFS_IO_UNWRITTEN)
 		bmapi_flags |= XFS_BMAPI_IGSTATE;
 
@@ -315,16 +321,21 @@ xfs_map_blocks(
 		count = mp->m_super->s_maxbytes - offset;
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
-	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				imap, &nimaps, bmapi_flags);
+
+	if (is_cow)
+		error = xfs_reflink_find_cow_mapping(ip, offset, imap,
+						     &need_alloc);
+	else
+		error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
+				       imap, &nimaps, bmapi_flags);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 	if (error)
 		return error;
 
-	if (type == XFS_IO_DELALLOC &&
+	if (need_alloc &&
 	    (!nimaps || isnullstartblock(imap->br_startblock))) {
-		error = xfs_iomap_write_allocate(ip, XFS_DATA_FORK, offset,
+		error = xfs_iomap_write_allocate(ip, whichfork, offset,
 				imap);
 		if (!error)
 			trace_xfs_map_blocks_alloc(ip, offset, count, type,
@@ -575,7 +586,8 @@ xfs_add_to_ioend(
 	xfs_off_t		offset,
 	unsigned int		type,
 	xfs_ioend_t		**result,
-	int			need_ioend)
+	int			need_ioend,
+	bool			is_cow)
 {
 	xfs_ioend_t		*ioend = *result;
 
@@ -593,6 +605,8 @@ xfs_add_to_ioend(
 		ioend->io_buffer_tail->b_private = bh;
 		ioend->io_buffer_tail = bh;
 	}
+	if (is_cow)
+		ioend->io_flags |= XFS_IOEND_COW;
 
 	bh->b_private = NULL;
 	ioend->io_size += bh->b_size;
@@ -703,6 +717,8 @@ xfs_convert_page(
 	int			len, page_dirty;
 	int			count = 0, done = 0, uptodate = 1;
  	xfs_off_t		offset = page_offset(page);
+	bool			is_cow;
+	struct xfs_inode	*ip = XFS_I(inode);
 
 	if (page->index != tindex)
 		goto fail;
@@ -786,6 +802,15 @@ xfs_convert_page(
 			else
 				type = XFS_IO_OVERWRITE;
 
+			/* Figure out if CoW is pending for the other pages. */
+			is_cow = false;
+			if (type == XFS_IO_OVERWRITE &&
+			    xfs_sb_version_hasreflink(&ip->i_mount->m_sb)) {
+				xfs_ilock(ip, XFS_ILOCK_SHARED);
+				is_cow = xfs_reflink_is_cow_pending(ip, offset);
+				xfs_iunlock(ip, XFS_ILOCK_SHARED);
+			}
+
 			/*
 			 * imap should always be valid because of the above
 			 * partial page end_offset check on the imap.
@@ -793,10 +818,10 @@ xfs_convert_page(
 			ASSERT(xfs_imap_valid(inode, imap, offset));
 
 			lock_buffer(bh);
-			if (type != XFS_IO_OVERWRITE)
+			if (type != XFS_IO_OVERWRITE || is_cow)
 				xfs_map_at_offset(inode, bh, imap, offset);
 			xfs_add_to_ioend(inode, bh, offset, type,
-					 ioendp, done);
+					 ioendp, done, is_cow);
 
 			page_dirty--;
 			count++;
@@ -959,6 +984,8 @@ xfs_vm_writepage(
 	int			err, imap_valid = 0, uptodate = 1;
 	int			count = 0;
 	int			nonblocking = 0;
+	struct xfs_inode	*ip = XFS_I(inode);
+	bool			was_cow, is_cow;
 
 	trace_xfs_writepage(inode, page, 0, 0);
 
@@ -1057,6 +1084,7 @@ xfs_vm_writepage(
 	bh = head = page_buffers(page);
 	offset = page_offset(page);
 	type = XFS_IO_OVERWRITE;
+	was_cow = false;
 
 	if (wbc->sync_mode == WB_SYNC_NONE)
 		nonblocking = 1;
@@ -1069,6 +1097,7 @@ xfs_vm_writepage(
 		if (!buffer_uptodate(bh))
 			uptodate = 0;
 
+		is_cow = false;
 		/*
 		 * set_page_dirty dirties all buffers in a page, independent
 		 * of their state.  The dirty state however is entirely
@@ -1091,7 +1120,15 @@ xfs_vm_writepage(
 				imap_valid = 0;
 			}
 		} else if (buffer_uptodate(bh)) {
-			if (type != XFS_IO_OVERWRITE) {
+			if (xfs_sb_version_hasreflink(&ip->i_mount->m_sb)) {
+				xfs_ilock(ip, XFS_ILOCK_SHARED);
+				is_cow = xfs_reflink_is_cow_pending(ip,
+						offset);
+				xfs_iunlock(ip, XFS_ILOCK_SHARED);
+			}
+
+			if (type != XFS_IO_OVERWRITE ||
+			    is_cow != was_cow) {
 				type = XFS_IO_OVERWRITE;
 				imap_valid = 0;
 			}
@@ -1121,23 +1158,23 @@ xfs_vm_writepage(
 			 */
 			new_ioend = 1;
 			err = xfs_map_blocks(inode, offset, &imap, type,
-					     nonblocking);
+					     nonblocking, is_cow);
 			if (err)
 				goto error;
 			imap_valid = xfs_imap_valid(inode, &imap, offset);
 		}
 		if (imap_valid) {
 			lock_buffer(bh);
-			if (type != XFS_IO_OVERWRITE)
+			if (type != XFS_IO_OVERWRITE || is_cow)
 				xfs_map_at_offset(inode, bh, &imap, offset);
 			xfs_add_to_ioend(inode, bh, offset, type, &ioend,
-					 new_ioend);
+					 new_ioend, is_cow);
 			count++;
 		}
 
 		if (!iohead)
 			iohead = ioend;
-
+		was_cow = is_cow;
 	} while (offset += len, ((bh = bh->b_this_page) != head));
 
 	if (uptodate && bh == head)
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index f6ffc9a..ac59548 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -34,6 +34,8 @@ enum {
 	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
 	{ XFS_IO_OVERWRITE,		"overwrite" }
 
+#define XFS_IOEND_COW		(1)
+
 /*
  * xfs_ioend struct manages large extent writes for XFS.
  * It can manage several multi-page bio's at once.
@@ -50,8 +52,16 @@ typedef struct xfs_ioend {
 	xfs_off_t		io_offset;	/* offset in the file */
 	struct work_struct	io_work;	/* xfsdatad work queue */
 	struct xfs_trans	*io_append_trans;/* xact. for size update */
+	unsigned long		io_flags;	/* status flags */
 } xfs_ioend_t;
 
+static inline bool
+xfs_ioend_is_cow(
+	struct xfs_ioend	*ioend)
+{
+	return ioend->io_flags & XFS_IOEND_COW;
+}
+
 extern const struct address_space_operations xfs_address_space_operations;
 
 int	xfs_get_blocks(struct inode *inode, sector_t offset,
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index fdc538e..65d4c2d 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -249,3 +249,75 @@ xfs_reflink_reserve_cow_range(
 		trace_xfs_reflink_reserve_cow_range_error(ip, error, _RET_IP_);
 	return error;
 }
+
+/**
+ * xfs_reflink_is_cow_pending() -- Determine if CoW is pending for a given
+ *				   file and offset.
+ *
+ * @ip: XFS inode object.
+ * @offset: The file offset, in bytes.
+ */
+bool
+xfs_reflink_is_cow_pending(
+	struct xfs_inode		*ip,
+	xfs_off_t			offset)
+{
+	struct xfs_ifork		*ifp;
+	struct xfs_bmbt_rec_host	*gotp;
+	struct xfs_bmbt_irec		irec;
+	xfs_fileoff_t			bno;
+	xfs_extnum_t			idx;
+
+	if (!xfs_is_reflink_inode(ip))
+		return false;
+
+	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+	bno = XFS_B_TO_FSBT(ip->i_mount, offset);
+	gotp = xfs_iext_bno_to_ext(ifp, bno, &idx);
+
+	if (!gotp)
+		return false;
+
+	xfs_bmbt_get_all(gotp, &irec);
+	if (bno >= irec.br_startoff + irec.br_blockcount ||
+	    bno < irec.br_startoff)
+		return false;
+	return true;
+}
+
+/**
+ * xfs_reflink_find_cow_mapping() -- Find the mapping for a CoW block.
+ *
+ * @ip: The XFS inode object.
+ * @offset: The file offset, in bytes.
+ * @imap: The mapping we're going to use for this block.
+ * @nimaps: Number of mappings we're returning.
+ */
+int
+xfs_reflink_find_cow_mapping(
+	struct xfs_inode		*ip,
+	xfs_off_t			offset,
+	struct xfs_bmbt_irec		*imap,
+	bool				*need_alloc)
+{
+	struct xfs_bmbt_irec		irec;
+	struct xfs_ifork		*ifp;
+	struct xfs_bmbt_rec_host	*gotp;
+	xfs_fileoff_t			bno;
+	xfs_extnum_t			idx;
+
+	/* Find the extent in the CoW fork. */
+	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+	bno = XFS_B_TO_FSBT(ip->i_mount, offset);
+	gotp = xfs_iext_bno_to_ext(ifp, bno, &idx);
+	xfs_bmbt_get_all(gotp, &irec);
+
+	trace_xfs_reflink_find_cow_mapping(ip, offset, 1, XFS_IO_OVERWRITE,
+			&irec);
+
+	/* If it's still delalloc, we must allocate later. */
+	*imap = irec;
+	*need_alloc = !!(isnullstartblock(irec.br_startblock));
+
+	return 0;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 41afdbe..1018ac9 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -20,5 +20,8 @@
 
 extern int xfs_reflink_reserve_cow_range(struct xfs_inode *ip, xfs_off_t pos,
 		xfs_off_t len);
+extern bool xfs_reflink_is_cow_pending(struct xfs_inode *ip, xfs_off_t offset);
+extern int xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
+		struct xfs_bmbt_irec *imap, bool *need_alloc);
 
 #endif /* __XFS_REFLINK_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 58/76] xfs: support removing extents from CoW fork
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (56 preceding siblings ...)
  2015-12-19  9:02 ` [PATCH 57/76] xfs: allocate " Darrick J. Wong
@ 2015-12-19  9:02 ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 59/76] xfs: move mappings from cow fork to data fork after copy-write Darrick J. Wong
                   ` (18 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:02 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Create a helper method to remove extents from the CoW fork without
any of the side effects (rmapbt/bmbt updates) of the regular extent
deletion routine.  We'll eventually use this to clear out the CoW fork
during ioend processing.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_bmap.c |  173 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_bmap.h |    2 +
 2 files changed, 175 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index de0ac37..21548b9 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5272,6 +5272,179 @@ done:
 }
 
 /*
+ * xfs_bunmapi_cow() -- Remove the relevant parts of the CoW fork.
+ *			See xfs_bmap_del_extent.
+ * @ip: XFS inode.
+ * @idx: Extent number to delete.
+ * @del: Extent to remove.
+ */
+int
+xfs_bunmapi_cow(
+	xfs_inode_t		*ip,
+	xfs_extnum_t		*idx,
+	xfs_bmbt_irec_t		*del)
+{
+	xfs_filblks_t		da_new;	/* new delay-alloc indirect blocks */
+	xfs_filblks_t		da_old;	/* old delay-alloc indirect blocks */
+	xfs_fsblock_t		del_endblock = 0;/* first block past del */
+	xfs_fileoff_t		del_endoff;	/* first offset past del */
+	int			delay;	/* current block is delayed allocated */
+	xfs_bmbt_rec_host_t	*ep;	/* current extent entry pointer */
+	int			error;	/* error return value */
+	xfs_bmbt_irec_t		got;	/* current extent entry */
+	xfs_fileoff_t		got_endoff;	/* first offset past got */
+	xfs_ifork_t		*ifp;	/* inode fork pointer */
+	xfs_mount_t		*mp;	/* mount structure */
+	xfs_filblks_t		nblks;	/* quota/sb block count */
+	xfs_bmbt_irec_t		new;	/* new record to be inserted */
+	/* REFERENCED */
+	uint			qfield;	/* quota field to update */
+	xfs_filblks_t		temp;	/* for indirect length calculations */
+	xfs_filblks_t		temp2;	/* for indirect length calculations */
+	int			state = BMAP_COWFORK;
+
+	mp = ip->i_mount;
+	XFS_STATS_INC(mp, xs_del_exlist);
+
+	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+	ASSERT((*idx >= 0) && (*idx < ifp->if_bytes /
+		(uint)sizeof(xfs_bmbt_rec_t)));
+	ASSERT(del->br_blockcount > 0);
+	ep = xfs_iext_get_ext(ifp, *idx);
+	xfs_bmbt_get_all(ep, &got);
+	ASSERT(got.br_startoff <= del->br_startoff);
+	del_endoff = del->br_startoff + del->br_blockcount;
+	got_endoff = got.br_startoff + got.br_blockcount;
+	ASSERT(got_endoff >= del_endoff);
+	delay = isnullstartblock(got.br_startblock);
+	ASSERT(isnullstartblock(del->br_startblock) == delay);
+	qfield = 0;
+	error = 0;
+	/*
+	 * If deleting a real allocation, must free up the disk space.
+	 */
+	if (!delay) {
+		nblks = del->br_blockcount;
+		qfield = XFS_TRANS_DQ_BCOUNT;
+		/*
+		 * Set up del_endblock and cur for later.
+		 */
+		del_endblock = del->br_startblock + del->br_blockcount;
+		da_old = da_new = 0;
+	} else {
+		da_old = startblockval(got.br_startblock);
+		da_new = 0;
+		nblks = 0;
+	}
+	qfield = qfield;
+	nblks = nblks;
+
+	/*
+	 * Set flag value to use in switch statement.
+	 * Left-contig is 2, right-contig is 1.
+	 */
+	switch (((got.br_startoff == del->br_startoff) << 1) |
+		(got_endoff == del_endoff)) {
+	case 3:
+		/*
+		 * Matches the whole extent.  Delete the entry.
+		 */
+		xfs_iext_remove(ip, *idx, 1, BMAP_COWFORK);
+		--*idx;
+		break;
+
+	case 2:
+		/*
+		 * Deleting the first part of the extent.
+		 */
+		trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_);
+		xfs_bmbt_set_startoff(ep, del_endoff);
+		temp = got.br_blockcount - del->br_blockcount;
+		xfs_bmbt_set_blockcount(ep, temp);
+		if (delay) {
+			temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
+				da_old);
+			xfs_bmbt_set_startblock(ep, nullstartblock((int)temp));
+			trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
+			da_new = temp;
+			break;
+		}
+		xfs_bmbt_set_startblock(ep, del_endblock);
+		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
+		break;
+
+	case 1:
+		/*
+		 * Deleting the last part of the extent.
+		 */
+		temp = got.br_blockcount - del->br_blockcount;
+		trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_);
+		xfs_bmbt_set_blockcount(ep, temp);
+		if (delay) {
+			temp = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(ip, temp),
+				da_old);
+			xfs_bmbt_set_startblock(ep, nullstartblock((int)temp));
+			trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
+			da_new = temp;
+			break;
+		}
+		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
+		break;
+
+	case 0:
+		/*
+		 * Deleting the middle of the extent.
+		 */
+		temp = del->br_startoff - got.br_startoff;
+		trace_xfs_bmap_pre_update(ip, *idx, state, _THIS_IP_);
+		xfs_bmbt_set_blockcount(ep, temp);
+		new.br_startoff = del_endoff;
+		temp2 = got_endoff - del_endoff;
+		new.br_blockcount = temp2;
+		new.br_state = got.br_state;
+		if (!delay) {
+			new.br_startblock = del_endblock;
+		} else {
+			temp = xfs_bmap_worst_indlen(ip, temp);
+			xfs_bmbt_set_startblock(ep, nullstartblock((int)temp));
+			temp2 = xfs_bmap_worst_indlen(ip, temp2);
+			new.br_startblock = nullstartblock((int)temp2);
+			da_new = temp + temp2;
+			while (da_new > da_old) {
+				if (temp) {
+					temp--;
+					da_new--;
+					xfs_bmbt_set_startblock(ep,
+						nullstartblock((int)temp));
+				}
+				if (da_new == da_old)
+					break;
+				if (temp2) {
+					temp2--;
+					da_new--;
+					new.br_startblock =
+						nullstartblock((int)temp2);
+				}
+			}
+		}
+		trace_xfs_bmap_post_update(ip, *idx, state, _THIS_IP_);
+		xfs_iext_insert(ip, *idx + 1, 1, &new, state);
+		++*idx;
+		break;
+	}
+
+	/*
+	 * Account for change in delayed indirect blocks.
+	 * Nothing to do for disk quota accounting here.
+	 */
+	ASSERT(da_old >= da_new);
+	if (da_old > da_new)
+		xfs_mod_fdblocks(mp, (int64_t)(da_old - da_new), false);
+
+	return error;
+}
+
+/*
  * Unmap (remove) blocks from a file.
  * If nexts is nonzero then the number of extents to remove is limited to
  * that value.  If not all extents in the block range can be removed then
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index fb68ac4..bfa0294 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -253,6 +253,8 @@ int	xfs_bmapi_write(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fsblock_t *firstblock, xfs_extlen_t total,
 		struct xfs_bmbt_irec *mval, int *nmap,
 		struct xfs_bmap_free *flist);
+int	xfs_bunmapi_cow(struct xfs_inode *ip, xfs_extnum_t *idx,
+		struct xfs_bmbt_irec *del);
 int	xfs_bunmapi(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_fileoff_t bno, xfs_filblks_t len, int flags,
 		xfs_extnum_t nexts, xfs_fsblock_t *firstblock,

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 59/76] xfs: move mappings from cow fork to data fork after copy-write
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (57 preceding siblings ...)
  2015-12-19  9:02 ` [PATCH 58/76] xfs: support removing extents from " Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 60/76] xfs: implement CoW for directio writes Darrick J. Wong
                   ` (17 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

After the write component of a copy-write operation finishes, clean up
the bookkeeping left behind.  On error, we simply free the new blocks
and pass the error up.  If we succeed, however, then we must remove
the old data fork mapping and move the cow fork mapping to the data
fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c    |   23 +++++
 fs/xfs/xfs_reflink.c |  223 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h |    4 +
 3 files changed, 248 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 7179b25..8101d6a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -195,7 +195,8 @@ xfs_finish_ioend(
 	if (atomic_dec_and_test(&ioend->io_remaining)) {
 		struct xfs_mount	*mp = XFS_I(ioend->io_inode)->i_mount;
 
-		if (ioend->io_type == XFS_IO_UNWRITTEN)
+		if (ioend->io_type == XFS_IO_UNWRITTEN ||
+		    xfs_ioend_is_cow(ioend))
 			queue_work(mp->m_unwritten_workqueue, &ioend->io_work);
 		else if (ioend->io_append_trans)
 			queue_work(mp->m_data_workqueue, &ioend->io_work);
@@ -221,6 +222,23 @@ xfs_end_io(
 	}
 
 	/*
+	 * For a CoW extent, we need to move the mapping from the CoW fork
+	 * to the data fork.  If instead an error happened, just dump the
+	 * new blocks.
+	 */
+	if (xfs_ioend_is_cow(ioend)) {
+		if (ioend->io_error) {
+			error = xfs_reflink_end_cow_failed(ip, ioend->io_offset,
+					ioend->io_size);
+			goto done;
+		}
+		error = xfs_reflink_end_cow(ip, ioend->io_offset,
+				ioend->io_size);
+		if (error)
+			goto done;
+	}
+
+	/*
 	 * For unwritten extents we need to issue transactions to convert a
 	 * range to normal written extens after the data I/O has finished.
 	 * Detecting and handling completion IO errors is done individually
@@ -235,7 +253,7 @@ xfs_end_io(
 	} else if (ioend->io_append_trans) {
 		error = xfs_setfilesize_ioend(ioend);
 	} else {
-		ASSERT(!xfs_ioend_is_append(ioend));
+		ASSERT(!xfs_ioend_is_append(ioend) || xfs_ioend_is_cow(ioend));
 	}
 
 done:
@@ -274,6 +292,7 @@ xfs_alloc_ioend(
 	ioend->io_offset = 0;
 	ioend->io_size = 0;
 	ioend->io_append_trans = NULL;
+	ioend->io_flags = 0;
 
 	INIT_WORK(&ioend->io_work, xfs_end_io);
 	return ioend;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 65d4c2d..9c1c262 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -321,3 +321,226 @@ xfs_reflink_find_cow_mapping(
 
 	return 0;
 }
+
+/**
+ * xfs_reflink_end_cow_failed() -- Throw out a CoW mapping if there was an I/O
+ *				   error while writing the new block.
+ * @ip: The XFS inode.
+ * @offset: File offset, in bytes.
+ * @count: Number of bytes.
+ */
+int
+xfs_reflink_end_cow_failed(
+	struct xfs_inode	*ip,
+	xfs_off_t		offset,
+	size_t			count)
+{
+	struct xfs_bmbt_irec	irec;
+	struct xfs_trans	*tp;
+	struct xfs_ifork	*ifp;
+	xfs_fileoff_t		offset_fsb;
+	xfs_fileoff_t		end_fsb;
+	xfs_filblks_t		count_fsb;
+	xfs_fsblock_t		firstfsb;
+	xfs_extnum_t		idx;
+	struct xfs_bmap_free	free_list;
+	int			committed;
+	int			error;
+	struct xfs_bmbt_rec_host	*gotp;
+
+	trace_xfs_reflink_end_cow_failed(ip, offset, count);
+
+	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
+	end_fsb = XFS_B_TO_FSB(ip->i_mount, (xfs_ufsize_t)offset + count);
+	count_fsb = (xfs_filblks_t)(end_fsb - offset_fsb);
+
+	/* Start a rolling transaction to switch the mappings */
+	tp = xfs_trans_alloc(ip->i_mount, XFS_TRANS_STRAT_WRITE);
+	error = xfs_trans_reserve(tp, &M_RES(ip->i_mount)->tr_write, 0, 0);
+	if (error) {
+		xfs_trans_cancel(tp);
+		goto out;
+	}
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
+	xfs_bmap_init(&free_list, &firstfsb);
+
+	/* Go find the old extent in the CoW fork. */
+	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+	gotp = xfs_iext_bno_to_ext(ifp, offset_fsb, &idx);
+	while (gotp) {
+		xfs_bmbt_get_all(gotp, &irec);
+
+		if (irec.br_startoff >= end_fsb)
+			break;
+
+		ASSERT(!isnullstartblock(irec.br_startblock));
+		xfs_trim_extent(&irec, offset_fsb, count_fsb);
+		trace_xfs_reflink_cow_remap(ip, &irec);
+
+		/* Remove the EFD. */
+#if 0
+		efd = xfs_trans_get_efd(tp, eio->rlei_efi, 1);
+		xfs_trans_undelete_extent(tp, efd, imap->br_startblock,
+					 imap->br_blockcount);
+#endif
+
+		xfs_bmap_add_free(ip->i_mount, &free_list, irec.br_startblock,
+				irec.br_blockcount, NULL);
+
+		/* Roll on... */
+		idx++;
+		if (idx >= ifp->if_bytes / sizeof(xfs_bmbt_rec_t))
+			break;
+		gotp = xfs_iext_get_ext(ifp, idx);
+	}
+
+	/* Process all the deferred stuff. */
+	error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
+	if (error)
+		goto out_cancel;
+
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (error)
+		goto out;
+	return 0;
+
+out_cancel:
+	xfs_trans_cancel(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out:
+	trace_xfs_reflink_end_cow_failed_error(ip, error, _RET_IP_);
+	return error;
+}
+
+/**
+ * xfs_reflink_end_cow() -- Remap the data fork after a successful CoW.
+ * @ip: The XFS inode.
+ * @offset: File offset, in bytes.
+ * @count: Number of bytes.
+ */
+int
+xfs_reflink_end_cow(
+	struct xfs_inode	*ip,
+	xfs_off_t		offset,
+	size_t			count)
+{
+	struct xfs_bmbt_irec	irec;
+	struct xfs_bmbt_irec	imap;
+	int			nimaps = 1;
+	struct xfs_trans	*tp;
+	struct xfs_ifork	*ifp;
+	xfs_fileoff_t		offset_fsb;
+	xfs_fileoff_t		end_fsb;
+	xfs_filblks_t		count_fsb;
+	xfs_fsblock_t		firstfsb;
+	xfs_extnum_t		idx;
+	struct xfs_bmap_free	free_list;
+	int			committed;
+	int			done;
+	int			error;
+	unsigned int		resblks;
+	struct xfs_bmbt_rec_host	*gotp;
+
+	trace_xfs_reflink_end_cow(ip, offset, count);
+
+	offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
+	end_fsb = XFS_B_TO_FSB(ip->i_mount, (xfs_ufsize_t)offset + count);
+	count_fsb = (xfs_filblks_t)(end_fsb - offset_fsb);
+
+	/* Start a rolling transaction to switch the mappings */
+	resblks = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
+	tp = xfs_trans_alloc(ip->i_mount, XFS_TRANS_STRAT_WRITE);
+	error = xfs_trans_reserve(tp, &M_RES(ip->i_mount)->tr_write,
+			resblks, 0);
+	if (error) {
+		xfs_trans_cancel(tp);
+		goto out;
+	}
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
+	xfs_bmap_init(&free_list, &firstfsb);
+
+	/* Go find the old extent in the CoW fork. */
+	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+	gotp = xfs_iext_bno_to_ext(ifp, offset_fsb, &idx);
+	while (gotp) {
+		xfs_bmbt_get_all(gotp, &irec);
+
+		if (irec.br_startoff >= end_fsb)
+			break;
+
+		ASSERT(!isnullstartblock(irec.br_startblock));
+		xfs_trim_extent(&irec, offset_fsb, count_fsb);
+		trace_xfs_reflink_cow_remap(ip, &irec);
+
+		/* Unmap the old blocks in the data fork. */
+		done = false;
+		while (!done) {
+			error = xfs_bunmapi(tp, ip, irec.br_startoff,
+					irec.br_blockcount, 0, 1,
+					&firstfsb, &free_list, &done);
+			if (error)
+				goto out_freelist;
+
+			error = xfs_trans_roll(&tp, ip);
+			if (error)
+				goto out_freelist;
+		}
+
+		/* Remove the EFD. */
+#if 0
+		efd = xfs_trans_get_efd(tp, eio->rlei_efi, 1);
+		xfs_trans_undelete_extent(tp, efd, imap->br_startblock,
+					 imap->br_blockcount);
+#endif
+
+		/* Map the new blocks into the data fork. */
+		firstfsb = irec.br_startblock;
+		error = xfs_bmapi_write(tp, ip, irec.br_startoff,
+					irec.br_blockcount,
+					XFS_BMAPI_REMAP, &firstfsb,
+					irec.br_blockcount, &imap, &nimaps,
+					&free_list);
+		if (error)
+			goto out_freelist;
+
+		/* Remove the mapping from the CoW fork. */
+		error = xfs_bunmapi_cow(ip, &idx, &irec);
+		if (error)
+			goto out_freelist;
+
+		error = xfs_trans_roll(&tp, ip);
+		if (error)
+			goto out_freelist;
+
+		/* Roll on... */
+		idx++;
+		if (idx >= ifp->if_bytes / sizeof(xfs_bmbt_rec_t))
+			break;
+		gotp = xfs_iext_get_ext(ifp, idx);
+	}
+
+	/* Process all the deferred stuff. */
+	error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
+	if (error)
+		goto out_cancel;
+
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (error)
+		goto out;
+	return 0;
+
+out_freelist:
+	xfs_bmap_cancel(&free_list);
+out_cancel:
+	xfs_trans_cancel(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out:
+	trace_xfs_reflink_end_cow_error(ip, error, _RET_IP_);
+	return error;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 1018ac9..8ec1ebb 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -23,5 +23,9 @@ extern int xfs_reflink_reserve_cow_range(struct xfs_inode *ip, xfs_off_t pos,
 extern bool xfs_reflink_is_cow_pending(struct xfs_inode *ip, xfs_off_t offset);
 extern int xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
 		struct xfs_bmbt_irec *imap, bool *need_alloc);
+extern int xfs_reflink_end_cow_failed(struct xfs_inode *ip, xfs_off_t offset,
+		size_t count);
+extern int xfs_reflink_end_cow(struct xfs_inode *ip, xfs_off_t offset,
+		size_t count);
 
 #endif /* __XFS_REFLINK_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 60/76] xfs: implement CoW for directio writes
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (58 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 59/76] xfs: move mappings from cow fork to data fork after copy-write Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2016-01-08  9:34   ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 61/76] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks Darrick J. Wong
                   ` (16 subsequent siblings)
  76 siblings, 1 reply; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

For O_DIRECT writes to shared blocks, we have to CoW them just like
we would with buffered writes.  For writes that are not block-aligned,
just bounce them to the page cache.

For block-aligned writes, however, we can do better than that.  Use
the same mechanisms that we employ for buffered CoW to set up a
delalloc reservation, allocate all the blocks at once, issue the
writes against the new blocks and use the same ioend functions to
remap the blocks after the write.  This should be fairly performant.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c    |   63 +++++++++++++++++++++++++---
 fs/xfs/xfs_file.c    |   12 ++++-
 fs/xfs/xfs_reflink.c |  114 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h |    5 ++
 4 files changed, 186 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 8101d6a..4b77d07 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1339,7 +1339,8 @@ xfs_map_direct(
 	struct buffer_head	*bh_result,
 	struct xfs_bmbt_irec	*imap,
 	xfs_off_t		offset,
-	bool			dax_fault)
+	bool			dax_fault,
+	bool			is_cow)
 {
 	struct xfs_ioend	*ioend;
 	xfs_off_t		size = bh_result->b_size;
@@ -1368,20 +1369,23 @@ xfs_map_direct(
 
 		if (type == XFS_IO_UNWRITTEN && type != ioend->io_type)
 			ioend->io_type = XFS_IO_UNWRITTEN;
+		if (is_cow)
+			ioend->io_flags |= XFS_IOEND_COW;
 
 		trace_xfs_gbmap_direct_update(XFS_I(inode), ioend->io_offset,
 					      ioend->io_size, ioend->io_type,
 					      imap);
-	} else if (type == XFS_IO_UNWRITTEN ||
+	} else if (type == XFS_IO_UNWRITTEN || is_cow ||
 		   offset + size > i_size_read(inode) ||
 		   offset + size < 0) {
 		ioend = xfs_alloc_ioend(inode, type);
 		ioend->io_offset = offset;
 		ioend->io_size = size;
+		if (is_cow)
+			ioend->io_flags |= XFS_IOEND_COW;
 
 		bh_result->b_private = ioend;
 		set_buffer_defer_completion(bh_result);
-
 		trace_xfs_gbmap_direct_new(XFS_I(inode), offset, size, type,
 					   imap);
 	} else {
@@ -1449,6 +1453,8 @@ __xfs_get_blocks(
 	xfs_off_t		offset;
 	ssize_t			size;
 	int			new = 0;
+	bool			is_cow = false;
+	bool			need_alloc = false;
 
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
@@ -1480,8 +1486,15 @@ __xfs_get_blocks(
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + size);
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 
-	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
-				&imap, &nimaps, XFS_BMAPI_ENTIRE);
+	if (create && direct)
+		is_cow = xfs_reflink_is_cow_pending(ip, offset);
+	if (is_cow)
+		error = xfs_reflink_find_cow_mapping(ip, offset, &imap,
+						     &need_alloc);
+	else
+		error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
+					&imap, &nimaps, XFS_BMAPI_ENTIRE);
+	ASSERT(!need_alloc);
 	if (error)
 		goto out_unlock;
 
@@ -1553,13 +1566,33 @@ __xfs_get_blocks(
 	if (imap.br_startblock != HOLESTARTBLOCK &&
 	    imap.br_startblock != DELAYSTARTBLOCK &&
 	    (create || !ISUNWRITTEN(&imap))) {
+		if (create && direct && !is_cow) {
+			bool shared;
+
+			error = xfs_reflink_irec_is_shared(ip, &imap, &shared);
+			if (error)
+				return error;
+			/*
+			 * Are we doing a DIO write to a shared block?  In
+			 * the ideal world we at least would fork full blocks,
+			 * but for now just fall back to buffered mode.  Yuck.
+			 * Use -EREMCHG ("remote address changed") to signal
+			 * this, since in general XFS doesn't do this sort of
+			 * fallback.
+			 */
+			if (shared) {
+				trace_xfs_reflink_bounce_dio_write(ip, &imap);
+				return -EREMCHG;
+			}
+		}
+
 		xfs_map_buffer(inode, bh_result, &imap, offset);
 		if (ISUNWRITTEN(&imap))
 			set_buffer_unwritten(bh_result);
 		/* direct IO needs special help */
 		if (create && direct)
 			xfs_map_direct(inode, bh_result, &imap, offset,
-				       dax_fault);
+				       dax_fault, is_cow);
 	}
 
 	/*
@@ -1738,6 +1771,24 @@ xfs_vm_do_dio(
 	int			flags)
 {
 	struct block_device	*bdev;
+	loff_t			end;
+	loff_t			block_mask;
+	int			error;
+
+	/* If this is a block-aligned directio CoW, remap immediately. */
+	end = offset + iov_iter_count(iter);
+	block_mask = (1 << inode->i_blkbits) - 1;
+	if (xfs_is_reflink_inode(XFS_I(inode)) && iov_iter_rw(iter) == WRITE &&
+	    !(offset & block_mask) && !(end & block_mask)) {
+		error = xfs_reflink_reserve_cow_range(XFS_I(inode), offset,
+				iov_iter_count(iter));
+		if (error)
+			return error;
+		error = xfs_reflink_allocate_cow_range(XFS_I(inode), offset,
+				iov_iter_count(iter));
+		if (error)
+			return error;
+	}
 
 	if (IS_DAX(inode))
 		return dax_do_io(iocb, inode, iter, offset,
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 0fbcb38..31b002e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -892,10 +892,18 @@ xfs_file_write_iter(
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
 		return -EIO;
 
-	if ((iocb->ki_flags & IOCB_DIRECT) || IS_DAX(inode))
+	/*
+	 * Allow DIO to fall back to buffered *only* in the case that we're
+	 * doing a reflink CoW.
+	 */
+	if ((iocb->ki_flags & IOCB_DIRECT) || IS_DAX(inode)) {
 		ret = xfs_file_dio_aio_write(iocb, from);
-	else
+		if (ret == -EREMCHG)
+			goto buffered;
+	} else {
+buffered:
 		ret = xfs_file_buffered_aio_write(iocb, from);
+	}
 
 	if (ret > 0) {
 		ssize_t err;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 9c1c262..8594bc4 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -134,6 +134,56 @@ xfs_trim_extent(
 	}
 }
 
+/**
+ * xfs_reflink_irec_is_shared() -- Are any of the blocks in this mapping
+ *				   shared?
+ *
+ * @ip: XFS inode object
+ * @irec: the fileoff:fsblock mapping that we might fork
+ * @shared: set to true if the mapping is shared.
+ */
+int
+xfs_reflink_irec_is_shared(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*irec,
+	bool			*shared)
+{
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+	xfs_extlen_t		aglen;
+	xfs_agblock_t		fbno;
+	xfs_extlen_t		flen;
+	int			error = 0;
+
+	/* Holes, unwritten, and delalloc extents cannot be shared */
+	if (!xfs_is_reflink_inode(ip) ||
+	    ISUNWRITTEN(irec) ||
+	    irec->br_startblock == HOLESTARTBLOCK ||
+	    irec->br_startblock == DELAYSTARTBLOCK) {
+		*shared = false;
+		return 0;
+	}
+
+	trace_xfs_reflink_irec_is_shared(ip, irec);
+
+	agno = XFS_FSB_TO_AGNO(ip->i_mount, irec->br_startblock);
+	agbno = XFS_FSB_TO_AGBNO(ip->i_mount, irec->br_startblock);
+	aglen = irec->br_blockcount;
+
+	/* Are there any shared blocks here? */
+	error = xfs_refcount_find_shared(ip->i_mount, agno, agbno,
+			aglen, &fbno, &flen, false);
+	if (error)
+		return error;
+	if (flen == 0) {
+		*shared = false;
+		return 0;
+	}
+
+	*shared = true;
+	return 0;
+}
+
 /* Find the shared ranges under an irec, and set up delalloc extents. */
 STATIC int
 xfs_reflink_reserve_cow_extent(
@@ -251,6 +301,70 @@ xfs_reflink_reserve_cow_range(
 }
 
 /**
+ * xfs_reflink_allocate_cow_range() -- Allocate blocks to satisfy a copy on
+ *				       write operation.
+ * @ip: XFS inode.
+ * @pos: file offset to start CoWing.
+ * @len: number of bytes to CoW.
+ */
+int
+xfs_reflink_allocate_cow_range(
+	struct xfs_inode	*ip,
+	xfs_off_t		pos,
+	xfs_off_t		len)
+{
+	struct xfs_ifork	*ifp;
+	struct xfs_bmbt_rec_host	*gotp;
+	struct xfs_bmbt_irec	imap;
+	int			error = 0;
+	xfs_fileoff_t		start_lblk;
+	xfs_fileoff_t		end_lblk;
+	xfs_extnum_t		idx;
+
+	if (!xfs_is_reflink_inode(ip))
+		return 0;
+
+	trace_xfs_reflink_allocate_cow_range(ip, len, pos, 0);
+
+	start_lblk = XFS_B_TO_FSBT(ip->i_mount, pos);
+	end_lblk = XFS_B_TO_FSB(ip->i_mount, pos + len);
+	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	gotp = xfs_iext_bno_to_ext(ifp, start_lblk, &idx);
+	while (gotp) {
+		xfs_bmbt_get_all(gotp, &imap);
+
+		if (imap.br_startoff >= end_lblk)
+			break;
+		if (!isnullstartblock(imap.br_startblock))
+			goto advloop;
+		xfs_trim_extent(&imap, start_lblk, end_lblk - start_lblk);
+		trace_xfs_reflink_allocate_cow_extent(ip, &imap);
+
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		error = xfs_iomap_write_allocate(ip, XFS_COW_FORK,
+				XFS_FSB_TO_B(ip->i_mount, imap.br_startoff +
+						imap.br_blockcount - 1), &imap);
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		if (error)
+			break;
+advloop:
+		/* Roll on... */
+		idx++;
+		if (idx >= ifp->if_bytes / sizeof(xfs_bmbt_rec_t))
+			break;
+		gotp = xfs_iext_get_ext(ifp, idx);
+	}
+
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	if (error)
+		trace_xfs_reflink_allocate_cow_range_error(ip, error, _RET_IP_);
+	return error;
+}
+
+/**
  * xfs_reflink_is_cow_pending() -- Determine if CoW is pending for a given
  *				   file and offset.
  *
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 8ec1ebb..d356c00 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -18,8 +18,13 @@
 #ifndef __XFS_REFLINK_H
 #define __XFS_REFLINK_H 1
 
+extern int xfs_reflink_irec_is_shared(struct xfs_inode *ip,
+		struct xfs_bmbt_irec *imap, bool *shared);
+
 extern int xfs_reflink_reserve_cow_range(struct xfs_inode *ip, xfs_off_t pos,
 		xfs_off_t len);
+extern int xfs_reflink_allocate_cow_range(struct xfs_inode *ip, xfs_off_t pos,
+		xfs_off_t len);
 extern bool xfs_reflink_is_cow_pending(struct xfs_inode *ip, xfs_off_t offset);
 extern int xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
 		struct xfs_bmbt_irec *imap, bool *need_alloc);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 61/76] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (59 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 60/76] xfs: implement CoW for directio writes Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 62/76] xfs: clear inode reflink flag when freeing blocks Darrick J. Wong
                   ` (15 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When we're writing zeroes to a reflinked block (such as when we're
punching a reflinked range), we need to fork the the block and write
to that, otherwise we can corrupt the other reflinks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c      |   17 +++++++++++++++++
 fs/xfs/xfs_bmap_util.c |   34 ++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_reflink.h   |    3 +++
 3 files changed, 52 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4b77d07..185415a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -374,6 +374,23 @@ xfs_map_blocks(
 	return 0;
 }
 
+/*
+ * xfs_map_cow_blocks() -- Find a CoW mapping and ensure that blocks have been
+ *			   allocated to it.
+ * @inode: VFS inode.
+ * @offset: File offset to allocate.
+ * @imap: (out) Block mapping.
+ */
+int
+xfs_map_cow_blocks(
+	struct inode		*inode,
+	xfs_off_t		offset,
+	struct xfs_bmbt_irec	*imap)
+{
+	return xfs_map_blocks(inode, offset, imap, XFS_IO_OVERWRITE,
+			false, true);
+}
+
 STATIC int
 xfs_imap_valid(
 	struct inode		*inode,
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index ad4eca7..983d8f6 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -41,6 +41,8 @@
 #include "xfs_icache.h"
 #include "xfs_log.h"
 #include "xfs_rmap_btree.h"
+#include "xfs_iomap.h"
+#include "xfs_reflink.h"
 
 /* Kernel only BMAP related definitions and functions */
 
@@ -1138,7 +1140,8 @@ xfs_zero_remaining_bytes(
 	xfs_buf_t		*bp;
 	xfs_mount_t		*mp = ip->i_mount;
 	int			nimap;
-	int			error = 0;
+	int			error = 0, err2;
+	bool			should_fork = false;
 
 	/*
 	 * Avoid doing I/O beyond eof - it's not necessary
@@ -1151,6 +1154,11 @@ xfs_zero_remaining_bytes(
 	if (endoff > XFS_ISIZE(ip))
 		endoff = XFS_ISIZE(ip);
 
+	error = xfs_reflink_reserve_cow_range(ip, startoff,
+			endoff - startoff + 1);
+	if (error)
+		return error;
+
 	for (offset = startoff; offset <= endoff; offset = lastoffset + 1) {
 		uint lock_mode;
 
@@ -1159,6 +1167,10 @@ xfs_zero_remaining_bytes(
 
 		lock_mode = xfs_ilock_data_map_shared(ip);
 		error = xfs_bmapi_read(ip, offset_fsb, 1, &imap, &nimap, 0);
+
+		/* Do we need to CoW this block? */
+		if (error == 0 && nimap == 1)
+			should_fork = xfs_reflink_is_cow_pending(ip, offset);
 		xfs_iunlock(ip, lock_mode);
 
 		if (error || nimap < 1)
@@ -1180,7 +1192,7 @@ xfs_zero_remaining_bytes(
 			lastoffset = endoff;
 
 		/* DAX can just zero the backing device directly */
-		if (IS_DAX(VFS_I(ip))) {
+		if (IS_DAX(VFS_I(ip)) && !should_fork) {
 			error = dax_zero_page_range(VFS_I(ip), offset,
 						    lastoffset - offset + 1,
 						    xfs_get_blocks_direct);
@@ -1201,8 +1213,26 @@ xfs_zero_remaining_bytes(
 				(offset - XFS_FSB_TO_B(mp, imap.br_startoff)),
 		       0, lastoffset - offset + 1);
 
+		if (should_fork) {
+			xfs_fsblock_t	new_fsbno;
+
+			error = xfs_map_cow_blocks(VFS_I(ip), offset, &imap);
+			if (error)
+				return error;
+			new_fsbno = imap.br_startblock +
+					(offset_fsb - imap.br_startoff);
+			XFS_BUF_SET_ADDR(bp, XFS_FSB_TO_DADDR(mp, new_fsbno));
+		}
+
 		error = xfs_bwrite(bp);
 		xfs_buf_relse(bp);
+		if (error) {
+			err2 = xfs_reflink_end_cow_failed(ip, offset,
+					lastoffset - offset + 1);
+			return error;
+		}
+		error = xfs_reflink_end_cow(ip, offset,
+				lastoffset - offset + 1);
 		if (error)
 			return error;
 	}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index d356c00..2d2832b 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -33,4 +33,7 @@ extern int xfs_reflink_end_cow_failed(struct xfs_inode *ip, xfs_off_t offset,
 extern int xfs_reflink_end_cow(struct xfs_inode *ip, xfs_off_t offset,
 		size_t count);
 
+int	xfs_map_cow_blocks(struct inode *inode, xfs_off_t offset,
+			   struct xfs_bmbt_irec	*imap);
+
 #endif /* __XFS_REFLINK_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 62/76] xfs: clear inode reflink flag when freeing blocks
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (60 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 61/76] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 63/76] xfs: cancel pending CoW reservations when destroying inodes Darrick J. Wong
                   ` (14 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Clear the inode reflink flag when freeing or truncating all blocks
in a file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_bmap_util.c |    8 ++++++++
 fs/xfs/xfs_inode.c     |    6 ++++++
 2 files changed, 14 insertions(+)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 983d8f6..dd5a2f7 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1390,6 +1390,14 @@ xfs_free_file_space(
 		}
 
 		/*
+		 * Clear the reflink flag if we freed everything.
+		 */
+		if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip)) {
+			ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+			xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+		}
+
+		/*
 		 * complete the transaction
 		 */
 		error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 55a7e00..a323631 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1613,6 +1613,12 @@ xfs_itruncate_extents(
 	}
 
 	/*
+	 * Clear the reflink flag if we truncated everything.
+	 */
+	if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip))
+		ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+
+	/*
 	 * Always re-log the inode so that our permanent transaction can keep
 	 * on rolling it forward in the log.
 	 */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 63/76] xfs: cancel pending CoW reservations when destroying inodes
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (61 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 62/76] xfs: clear inode reflink flag when freeing blocks Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 64/76] xfs: reflink extents from one file to another Darrick J. Wong
                   ` (13 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When destroying the inode, cancel all pending reservations in the CoW
fork so that all the reserved blocks go back to the free pile.  Also
free the reservations whenever we truncate or punch the entire file.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_bmap_util.c |    1 +
 fs/xfs/xfs_inode.c     |    5 +++-
 fs/xfs/xfs_reflink.c   |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h   |    1 +
 fs/xfs/xfs_super.c     |   10 ++++++++
 5 files changed, 80 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index dd5a2f7..3e274f6 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1393,6 +1393,7 @@ xfs_free_file_space(
 		 * Clear the reflink flag if we freed everything.
 		 */
 		if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip)) {
+			xfs_reflink_cancel_pending_cow(ip);
 			ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
 			xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 		}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index a323631..59964c3 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -48,6 +48,7 @@
 #include "xfs_trans_priv.h"
 #include "xfs_log.h"
 #include "xfs_bmap_btree.h"
+#include "xfs_reflink.h"
 
 kmem_zone_t *xfs_inode_zone;
 
@@ -1615,8 +1616,10 @@ xfs_itruncate_extents(
 	/*
 	 * Clear the reflink flag if we truncated everything.
 	 */
-	if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip))
+	if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip)) {
+		xfs_reflink_cancel_pending_cow(ip);
 		ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+	}
 
 	/*
 	 * Always re-log the inode so that our permanent transaction can keep
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 8594bc4..dcc71b9 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -658,3 +658,67 @@ out:
 	trace_xfs_reflink_end_cow_error(ip, error, _RET_IP_);
 	return error;
 }
+
+/**
+ * xfs_reflink_cancel_pending_cow() -- Cancel all pending CoW fork reservations.
+ * @ip: The XFS inode.
+ */
+int
+xfs_reflink_cancel_pending_cow(
+	struct xfs_inode	*ip)
+{
+	struct xfs_bmbt_irec	irec;
+	struct xfs_ifork	*ifp;
+	xfs_extnum_t		idx;
+	struct xfs_bmbt_rec_host	*gotp;
+	int			error;
+
+	/*
+	 * If this isn't a reflink inode there had better not be anything
+	 * in the CoW fork!
+	 */
+	if (!xfs_is_reflink_inode(ip)) {
+		ASSERT(ip->i_cowfp == NULL || ip->i_cowfp->if_bytes == 0);
+		return 0;
+	}
+
+	trace_xfs_reflink_cancel_pending_cow(ip);
+
+	ASSERT(ip->i_cowfp != NULL);
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+
+	/* Go find the old extent in the CoW fork. */
+	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+	gotp = xfs_iext_bno_to_ext(ifp, 0, &idx);
+	while (gotp) {
+		xfs_bmbt_get_all(gotp, &irec);
+
+		/*
+		 * We should never be asked to cancel allocated CoW extents
+		 * unless the FS is broken, because doing so prevents the
+		 * block remapping after the write finishes.
+		 */
+		ASSERT(isnullstartblock(irec.br_startblock) ||
+		       XFS_FORCED_SHUTDOWN(ip->i_mount));
+		if (!isnullstartblock(irec.br_startblock))
+			goto advloop;
+
+		trace_xfs_reflink_cancel_cow(ip, &irec);
+
+		/* Give the blocks back. */
+		xfs_mod_fdblocks(ip->i_mount, irec.br_blockcount, false);
+		ip->i_delayed_blks -= irec.br_blockcount;
+		error = xfs_bunmapi_cow(ip, &idx, &irec);
+		if (error)
+			break;
+
+		/* Roll on... */
+advloop:
+		idx++;
+		if (idx >= ifp->if_bytes / sizeof(xfs_bmbt_rec_t))
+			break;
+		gotp = xfs_iext_get_ext(ifp, idx);
+	}
+
+	return 0;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 2d2832b..cf4b43b 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -32,6 +32,7 @@ extern int xfs_reflink_end_cow_failed(struct xfs_inode *ip, xfs_off_t offset,
 		size_t count);
 extern int xfs_reflink_end_cow(struct xfs_inode *ip, xfs_off_t offset,
 		size_t count);
+extern int xfs_reflink_cancel_pending_cow(struct xfs_inode *ip);
 
 int	xfs_map_cow_blocks(struct inode *inode, xfs_off_t offset,
 			   struct xfs_bmbt_irec	*imap);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 85e582d..36df40b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -45,6 +45,7 @@
 #include "xfs_filestream.h"
 #include "xfs_quota.h"
 #include "xfs_sysfs.h"
+#include "xfs_reflink.h"
 
 #include <linux/namei.h>
 #include <linux/init.h>
@@ -920,11 +921,20 @@ xfs_fs_destroy_inode(
 	struct inode		*inode)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
+	int			error;
 
 	trace_xfs_destroy_inode(ip);
 
 	XFS_STATS_INC(ip->i_mount, vn_reclaim);
 
+	/* Cancel the CoW reservations. */
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	error = xfs_reflink_cancel_pending_cow(ip);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (error)
+		xfs_warn(ip->i_mount, "Error %d destroying inode %llu.\n",
+				error, ip->i_ino);
+
 	ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);
 
 	/*

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 64/76] xfs: reflink extents from one file to another
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (62 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 63/76] xfs: cancel pending CoW reservations when destroying inodes Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 65/76] xfs: add clone file and clone range ioctls Darrick J. Wong
                   ` (12 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Reflink extents from one file to another; that is to say, iteratively
remove the mappings from the destination file, copy the mappings from
the source file to the destination file, and increment the reference
count of all the blocks that got remapped.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_reflink.c |  445 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h |    3 
 2 files changed, 448 insertions(+)


diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index dcc71b9..3de3c9a 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -722,3 +722,448 @@ advloop:
 
 	return 0;
 }
+
+/*
+ * Reflinking (Block) Ranges of Two Files Together
+ *
+ * First, ensure that the reflink flag is set on both inodes.  The flag is an
+ * optimization to avoid unnecessary refcount btree lookups in the write path.
+ *
+ * Now we can iteratively remap the range of extents (and holes) in src to the
+ * corresponding ranges in dest.  Let drange and srange denote the ranges of
+ * logical blocks in dest and src touched by the reflink operation.
+ *
+ * While the length of drange is greater than zero,
+ *    - Read src's bmbt at the start of srange ("imap")
+ *    - If imap doesn't exist, make imap appear to start at the end of srange
+ *      with zero length.
+ *    - If imap starts before srange, advance imap to start at srange.
+ *    - If imap goes beyond srange, truncate imap to end at the end of srange.
+ *    - Punch (imap start - srange start + imap len) blocks from dest at
+ *      offset (drange start).
+ *    - If imap points to a real range of pblks,
+ *         > Increase the refcount of the imap's pblks
+ *         > Map imap's pblks into dest at the offset
+ *           (drange start + imap start - srange start)
+ *    - Advance drange and srange by (imap start - srange start + imap len)
+ *
+ * Finally, if the reflink made dest longer, update both the in-core and
+ * on-disk file sizes.
+ *
+ * ASCII Art Demonstration:
+ *
+ * Let's say we want to reflink this source file:
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS (src file)
+ *   <-------------------->
+ *
+ * into this destination file:
+ *
+ * --DDDDDDDDDDDDDDDDDDD--DDD (dest file)
+ *        <-------------------->
+ * '-' means a hole, and 'S' and 'D' are written blocks in the src and dest.
+ * Observe that the range has different logical offsets in either file.
+ *
+ * Consider that the first extent in the source file doesn't line up with our
+ * reflink range.  Unmapping  and remapping are separate operations, so we can
+ * unmap more blocks from the destination file than we remap.
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS
+ *   <------->
+ * --DDDDD---------DDDDD--DDD
+ *        <------->
+ *
+ * Now remap the source extent into the destination file:
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS
+ *   <------->
+ * --DDDDD--SSSSSSSDDDDD--DDD
+ *        <------->
+ *
+ * Do likewise with the second hole and extent in our range.  Holes in the
+ * unmap range don't affect our operation.
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS
+ *            <---->
+ * --DDDDD--SSSSSSS-SSSSS-DDD
+ *                 <---->
+ *
+ * Finally, unmap and remap part of the third extent.  This will increase the
+ * size of the destination file.
+ *
+ * ----SSSSSSS-SSSSS----SSSSSS
+ *                  <----->
+ * --DDDDD--SSSSSSS-SSSSS----SSS
+ *                       <----->
+ *
+ * Once we update the destination file's i_size, we're done.
+ */
+
+/*
+ * Ensure the reflink bit is set in both inodes.
+ */
+static int
+xfs_reflink_set_inode_flag(
+	struct xfs_inode	*src,
+	struct xfs_inode	*dest)
+{
+	struct xfs_mount	*mp = src->i_mount;
+	int			error;
+	struct xfs_trans	*tp;
+
+	if (xfs_is_reflink_inode(src) && xfs_is_reflink_inode(dest))
+		return 0;
+
+	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
+
+	/*
+	 * check for running out of space
+	 */
+	if (error) {
+		/*
+		 * Free the transaction structure.
+		 */
+		ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+		goto out_cancel;
+	}
+
+	/* Lock both files against IO */
+	if (src->i_ino == dest->i_ino)
+		xfs_ilock(src, XFS_ILOCK_EXCL);
+	else
+		xfs_lock_two_inodes(src, dest, XFS_ILOCK_EXCL);
+
+	if (!xfs_is_reflink_inode(src)) {
+		trace_xfs_reflink_set_inode_flag(src);
+		xfs_trans_ijoin(tp, src, XFS_ILOCK_EXCL);
+		src->i_d.di_flags2 |= XFS_DIFLAG2_REFLINK;
+		xfs_trans_log_inode(tp, src, XFS_ILOG_CORE);
+		xfs_ifork_init_cow(src);
+	} else
+		xfs_iunlock(src, XFS_ILOCK_EXCL);
+
+	if (src->i_ino == dest->i_ino)
+		goto commit_flags;
+
+	if (!xfs_is_reflink_inode(dest)) {
+		trace_xfs_reflink_set_inode_flag(dest);
+		xfs_trans_ijoin(tp, dest, XFS_ILOCK_EXCL);
+		dest->i_d.di_flags2 |= XFS_DIFLAG2_REFLINK;
+		xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE);
+		xfs_ifork_init_cow(dest);
+	} else
+		xfs_iunlock(dest, XFS_ILOCK_EXCL);
+
+commit_flags:
+	error = xfs_trans_commit(tp);
+	if (error)
+		goto out_error;
+	return error;
+
+out_cancel:
+	xfs_trans_cancel(tp);
+out_error:
+	trace_xfs_reflink_set_inode_flag_error(dest, error, _RET_IP_);
+	return error;
+}
+
+/*
+ * Update destination inode size, if necessary.
+ */
+static int
+xfs_reflink_update_dest_isize(
+	struct xfs_inode	*dest,
+	xfs_off_t		newlen)
+{
+	struct xfs_mount	*mp = dest->i_mount;
+	struct xfs_trans	*tp;
+	int			error;
+
+	if (newlen <= i_size_read(VFS_I(dest)))
+		return 0;
+
+	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0);
+
+	/*
+	 * check for running out of space
+	 */
+	if (error) {
+		/*
+		 * Free the transaction structure.
+		 */
+		ASSERT(error == -ENOSPC || XFS_FORCED_SHUTDOWN(mp));
+		goto out_cancel;
+	}
+
+	xfs_ilock(dest, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, dest, XFS_ILOCK_EXCL);
+
+	trace_xfs_reflink_update_inode_size(dest, newlen);
+	i_size_write(VFS_I(dest), newlen);
+	dest->i_d.di_size = newlen;
+	xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE);
+
+	error = xfs_trans_commit(tp);
+	if (error)
+		goto out_error;
+	return error;
+
+out_cancel:
+	xfs_trans_cancel(tp);
+out_error:
+	trace_xfs_reflink_update_inode_size_error(dest, error, _RET_IP_);
+	return error;
+}
+
+/**
+ * xfs_reflink_remap_extent() -- Unmap a range of blocks from a file, then
+ *				 map other blocks into the hole.  The range
+ *				 to unmap is:
+ *				 (@destoff:@destoff+@srcioff+@irec.blockcount).
+ *				 The extent @irec is mapped at
+ *				 @destoff+@srcioff.
+ * @ip: XFS inode.
+ * @irec: The mapping to put into the file.
+ * @destoff: How far into the the file to start unmapping.
+ * @srcioff: How far past @destoff to start mapping.
+ */
+static int
+xfs_reflink_remap_extent(
+	struct xfs_inode	*ip,
+	struct xfs_bmbt_irec	*irec,
+	xfs_fileoff_t		destoff,
+	xfs_fileoff_t		srcioff)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	xfs_fsblock_t		firstfsb;
+	unsigned int		resblks;
+	struct xfs_bmap_free	free_list;
+	struct xfs_bmbt_irec	imap;
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+	struct xfs_buf		*agbp;
+	int			nimaps;
+	int			done;
+	int			committed;
+	int			error;
+
+	trace_xfs_reflink_punch_range(ip, destoff,
+			srcioff + irec->br_blockcount);
+
+	/* Start a rolling transaction to switch the mappings */
+	resblks = XFS_EXTENTADD_SPACE_RES(ip->i_mount, XFS_DATA_FORK);
+	tp = xfs_trans_alloc(ip->i_mount, XFS_TRANS_STRAT_WRITE);
+	error = xfs_trans_reserve(tp, &M_RES(ip->i_mount)->tr_write,
+			resblks, 0);
+	if (error) {
+		xfs_trans_cancel(tp);
+		goto out;
+	}
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
+	xfs_bmap_init(&free_list, &firstfsb);
+
+	/* Unmap the old blocks in the data fork. */
+	done = false;
+	while (!done) {
+		error = xfs_bunmapi(tp, ip, destoff,
+				srcioff + irec->br_blockcount, 0, 1,
+				&firstfsb, &free_list, &done);
+		if (error)
+			goto out_freelist;
+
+		error = xfs_trans_roll(&tp, ip);
+		if (error)
+			goto out_freelist;
+	}
+
+	/* If this isn't a real mapping, we're done. */
+	if (irec->br_startblock == HOLESTARTBLOCK ||
+	    irec->br_startblock == DELAYSTARTBLOCK ||
+	    ISUNWRITTEN(irec))
+		goto done;
+
+	trace_xfs_reflink_remap(ip, destoff + srcioff,
+			irec->br_blockcount, irec->br_startblock);
+
+	/* Update the refcount tree */
+	agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock);
+	agbno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock);
+	error = xfs_alloc_read_agf(mp, tp, agno, 0, &agbp);
+	if (error)
+		goto out_freelist;
+	error = xfs_refcount_increase(mp, tp, agbp, agno, agbno,
+				      irec->br_blockcount, &free_list);
+	xfs_trans_brelse(tp, agbp);
+	if (error)
+		goto out_freelist;
+
+	error = xfs_trans_roll(&tp, ip);
+	if (error)
+		goto out_freelist;
+
+	/* Map the new blocks into the data fork. */
+	firstfsb = irec->br_startblock;
+	nimaps = 1;
+	error = xfs_bmapi_write(tp, ip, destoff + srcioff,
+				irec->br_blockcount,
+				XFS_BMAPI_REMAP, &firstfsb,
+				irec->br_blockcount, &imap, &nimaps,
+				&free_list);
+	if (error)
+		goto out_freelist;
+
+	/* Process all the deferred stuff. */
+done:
+	error = xfs_bmap_finish(&tp, &free_list, &committed, NULL);
+	if (error)
+		goto out_cancel;
+
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	if (error)
+		goto out;
+	return 0;
+
+out_freelist:
+	xfs_bmap_cancel(&free_list);
+out_cancel:
+	xfs_trans_cancel(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out:
+	trace_xfs_reflink_remap_extent_error(ip, error, _RET_IP_);
+	return error;
+}
+
+/**
+ * Iteratively remap one file's extents (and holes) to another's.
+ */
+static int
+xfs_reflink_remap_blocks(
+	struct xfs_inode	*src,
+	xfs_fileoff_t		srcoff,
+	struct xfs_inode	*dest,
+	xfs_fileoff_t		destoff,
+	xfs_filblks_t		len)
+{
+	struct xfs_bmbt_irec	imap;
+	int			nimaps;
+	int			error = 0;
+	xfs_fileoff_t		srcioff;
+
+	/* drange = (destoff, destoff + len); srange = (srcoff, srcoff + len) */
+	while (len) {
+		trace_xfs_reflink_remap_blocks_loop(src, srcoff, len,
+				dest, destoff);
+		/* Read extent from the source file */
+		nimaps = 1;
+		xfs_ilock(src, XFS_ILOCK_EXCL);
+		error = xfs_bmapi_read(src, srcoff, len, &imap, &nimaps, 0);
+		xfs_iunlock(src, XFS_ILOCK_EXCL);
+		if (error)
+			goto err;
+
+		/*
+		 * If imap doesn't exist, pretend that it does just past
+		 * srange.
+		 */
+		if (nimaps == 0) {
+			imap.br_startoff = srcoff + len;
+			imap.br_startblock = HOLESTARTBLOCK;
+			imap.br_blockcount = 0;
+			imap.br_state = XFS_EXT_INVALID;
+		} else
+			xfs_trim_extent(&imap, srcoff, len);
+
+		trace_xfs_reflink_remap_imap(src, srcoff, len, XFS_IO_OVERWRITE,
+				&imap);
+
+		srcioff = imap.br_startoff - srcoff;
+		error = xfs_reflink_remap_extent(dest, &imap, destoff, srcioff);
+		if (error)
+			goto err;
+
+		/* Advance drange/srange */
+		srcoff += srcioff + imap.br_blockcount;
+		destoff += srcioff + imap.br_blockcount;
+		len -= srcioff + imap.br_blockcount;
+	}
+
+	return 0;
+
+err:
+	trace_xfs_reflink_remap_blocks_error(dest, error, _RET_IP_);
+	return error;
+}
+
+/**
+ * xfs_reflink_remap_range() -- Link a range of blocks from one file to another.
+ *
+ * @src: Inode to clone from
+ * @srcoff: Offset within source to start clone from
+ * @dest: Inode to clone to
+ * @destoff: Offset within @inode to start clone
+ * @len: Original length, passed by user, of range to clone
+ */
+int
+xfs_reflink_remap_range(
+	struct xfs_inode	*src,
+	xfs_off_t		srcoff,
+	struct xfs_inode	*dest,
+	xfs_off_t		destoff,
+	xfs_off_t		len)
+{
+	struct xfs_mount	*mp = src->i_mount;
+	xfs_fileoff_t		sfsbno, dfsbno;
+	xfs_filblks_t		fsblen;
+	int			error;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return -EOPNOTSUPP;
+
+	if (XFS_FORCED_SHUTDOWN(mp))
+		return -EIO;
+
+	/* Don't reflink realtime inodes */
+	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
+		return -EINVAL;
+
+	trace_xfs_reflink_remap_range(src, srcoff, len, dest, destoff);
+
+	/* Lock both files against IO */
+	if (src->i_ino == dest->i_ino) {
+		xfs_ilock(src, XFS_IOLOCK_EXCL);
+		xfs_ilock(src, XFS_MMAPLOCK_EXCL);
+	} else {
+		xfs_lock_two_inodes(src, dest, XFS_IOLOCK_EXCL);
+		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
+	}
+
+	error = xfs_reflink_set_inode_flag(src, dest);
+	if (error)
+		goto out_error;
+
+	dfsbno = XFS_B_TO_FSBT(mp, destoff);
+	sfsbno = XFS_B_TO_FSBT(mp, srcoff);
+	fsblen = XFS_B_TO_FSB(mp, len);
+	error = xfs_reflink_remap_blocks(src, sfsbno, dest, dfsbno, fsblen);
+	if (error)
+		goto out_error;
+
+	error = xfs_reflink_update_dest_isize(dest, destoff + len);
+
+out_error:
+	xfs_iunlock(src, XFS_MMAPLOCK_EXCL);
+	xfs_iunlock(src, XFS_IOLOCK_EXCL);
+	if (src->i_ino != dest->i_ino) {
+		xfs_iunlock(dest, XFS_MMAPLOCK_EXCL);
+		xfs_iunlock(dest, XFS_IOLOCK_EXCL);
+	}
+	if (error)
+		trace_xfs_reflink_remap_range_error(dest, error, _RET_IP_);
+	return error;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index cf4b43b..df33044 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -37,4 +37,7 @@ extern int xfs_reflink_cancel_pending_cow(struct xfs_inode *ip);
 int	xfs_map_cow_blocks(struct inode *inode, xfs_off_t offset,
 			   struct xfs_bmbt_irec	*imap);
 
+extern int xfs_reflink_remap_range(struct xfs_inode *src, xfs_off_t srcoff,
+		struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len);
+
 #endif /* __XFS_REFLINK_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 65/76] xfs: add clone file and clone range ioctls
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (63 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 64/76] xfs: reflink extents from one file to another Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 66/76] xfs: emulate the btrfs dedupe extent same ioctl Darrick J. Wong
                   ` (11 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Define two ioctls which allow userspace to reflink a range of blocks
between two files or to reflink one file's contents to another.
These ioctls must have the same ABI as the btrfs ioctls with similar
names.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h |   11 +++++
 fs/xfs/xfs_file.c      |  115 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_ioctl.c     |   87 ++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_ioctl32.c   |    2 +
 4 files changed, 215 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 22a8fd9..a3cd93e 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -571,6 +571,17 @@ typedef struct xfs_swapext
 #define XFS_IOC_GOINGDOWN	     _IOR ('X', 125, __uint32_t)
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */
 
+/* reflink ioctls; these MUST match the btrfs ioctl definitions */
+/* from struct btrfs_ioctl_clone_range_args */
+struct xfs_clone_args {
+	__s64 src_fd;
+	__u64 src_offset;
+	__u64 src_length;
+	__u64 dest_offset;
+};
+
+#define XFS_IOC_CLONE		 _IOW (0x94, 9, int)
+#define XFS_IOC_CLONE_RANGE	 _IOW (0x94, 13, struct xfs_clone_args)
 
 #ifndef HAVE_BBMACROS
 /*
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 31b002e..44d89ea 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1050,6 +1050,121 @@ out_unlock:
 	return error;
 }
 
+/*
+ * Flush all file writes out to disk.
+ */
+static int
+xfs_file_wait_for_io(
+	struct inode	*inode,
+	loff_t		offset,
+	size_t		len)
+{
+	loff_t		rounding;
+	loff_t		ioffset;
+	loff_t		iendoffset;
+	loff_t		bs;
+	int		ret;
+
+	bs = inode->i_sb->s_blocksize;
+	inode_dio_wait(inode);
+
+	rounding = max_t(xfs_off_t, bs, PAGE_CACHE_SIZE);
+	ioffset = round_down(offset, rounding);
+	iendoffset = round_up(offset + len, rounding) - 1;
+	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
+					   iendoffset);
+	return ret;
+}
+
+/* Hook up to the VFS reflink function */
+int
+xfs_file_share_range(
+	struct file	*file_in,
+	loff_t		pos_in,
+	struct file	*file_out,
+	loff_t		pos_out,
+	u64		len)
+{
+	struct inode	*inode_in;
+	struct inode	*inode_out;
+	ssize_t		ret;
+	loff_t		bs;
+	loff_t		isize;
+	int		same_inode;
+	loff_t		blen;
+
+	inode_in = file_inode(file_in);
+	inode_out = file_inode(file_out);
+	bs = inode_out->i_sb->s_blocksize;
+
+	/* Don't touch certain kinds of inodes */
+	if (IS_IMMUTABLE(inode_out))
+		return -EPERM;
+	if (IS_SWAPFILE(inode_in) ||
+	    IS_SWAPFILE(inode_out))
+		return -ETXTBSY;
+
+	/* Reflink only works within this filesystem. */
+	if (inode_in->i_sb != inode_out->i_sb ||
+	    file_in->f_path.mnt != file_out->f_path.mnt)
+		return -EXDEV;
+	same_inode = (inode_in->i_ino == inode_out->i_ino);
+
+	/* Don't reflink dirs, pipes, sockets... */
+	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
+		return -EISDIR;
+	if (S_ISFIFO(inode_in->i_mode) || S_ISFIFO(inode_out->i_mode))
+		return -EINVAL;
+	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
+		return -EINVAL;
+
+	/* Are we going all the way to the end? */
+	isize = i_size_read(inode_in);
+	if (isize == 0)
+		return 0;
+	if (len == 0)
+		len = isize - pos_in;
+
+	/* Ensure offsets don't wrap and the input is inside i_size */
+	if (pos_in + len < pos_in || pos_out + len < pos_out ||
+	    pos_in + len > isize)
+		return -EINVAL;
+
+	/* If we're linking to EOF, continue to the block boundary. */
+	if (pos_in + len == isize)
+		blen = ALIGN(isize, bs) - pos_in;
+	else
+		blen = len;
+
+	/* Only reflink if we're aligned to block boundaries */
+	if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_in + blen, bs) ||
+	    !IS_ALIGNED(pos_out, bs) || !IS_ALIGNED(pos_out + blen, bs))
+		return -EINVAL;
+
+	/* Don't allow overlapped reflink within the same file */
+	if (same_inode && pos_out + blen > pos_in && pos_out < pos_in + blen)
+		return -EINVAL;
+
+	/* Wait for the completion of any pending IOs on srcfile */
+	ret = xfs_file_wait_for_io(inode_in, pos_in, len);
+	if (ret)
+		goto out_unlock;
+	ret = xfs_file_wait_for_io(inode_out, pos_out, len);
+	if (ret)
+		goto out_unlock;
+
+	ret = xfs_reflink_remap_range(XFS_I(inode_in), pos_in, XFS_I(inode_out),
+			pos_out, len);
+	if (ret < 0)
+		goto out_unlock;
+
+	/* Truncate the page cache so we don't see stale data */
+	truncate_inode_pages_range(&inode_out->i_data, pos_out,
+				   PAGE_CACHE_ALIGN(pos_out + len) - 1);
+
+out_unlock:
+	return ret;
+}
 
 STATIC int
 xfs_file_open(
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d42738d..1d836dc 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -41,6 +41,7 @@
 #include "xfs_trans.h"
 #include "xfs_pnfs.h"
 #include "xfs_acl.h"
+#include "xfs_reflink.h"
 
 #include <linux/capability.h>
 #include <linux/dcache.h>
@@ -49,6 +50,7 @@
 #include <linux/pagemap.h>
 #include <linux/slab.h>
 #include <linux/exportfs.h>
+#include <linux/fsnotify.h>
 
 /*
  * xfs_find_handle maps from userspace xfs_fsop_handlereq structure to
@@ -1513,6 +1515,49 @@ xfs_ioc_swapext(
 	return error;
 }
 
+extern int xfs_file_share_range(struct file *file_in, loff_t pos_in,
+		struct file *file_out, loff_t pos_out, size_t len);
+
+/*
+ * For reflink, validate the VFS parameters, convert them into the XFS
+ * equivalents, and then call the internal reflink function.
+ */
+STATIC int
+xfs_ioctl_reflink(
+	struct file	*file_in,
+	loff_t		pos_in,
+	struct file	*file_out,
+	loff_t		pos_out,
+	size_t		len)
+{
+	int		error;
+
+	/* Do we have the correct permissions? */
+	if (!(file_in->f_mode & FMODE_READ) ||
+	    !(file_out->f_mode & FMODE_WRITE) ||
+	    (file_out->f_flags & O_APPEND))
+		return -EBADF;
+
+	error = mnt_want_write_file(file_out);
+	if (error)
+		return error;
+
+	error = xfs_file_share_range(file_in, pos_in, file_out, pos_out, len);
+	if (error)
+		goto out_drop;
+
+	fsnotify_access(file_in);
+	add_rchar(current, len);
+	fsnotify_modify(file_out);
+	add_wchar(current, len);
+	inc_syscr(current);
+	inc_syscw(current);
+
+out_drop:
+	mnt_drop_write_file(file_out);
+	return error;
+}
+
 /*
  * Note: some of the ioctl's return positive numbers as a
  * byte count indicating success, such as readlink_by_handle.
@@ -1811,6 +1856,48 @@ xfs_file_ioctl(
 		return xfs_icache_free_eofblocks(mp, &keofb);
 	}
 
+	case XFS_IOC_CLONE: {
+		struct fd src;
+
+		src = fdget(p);
+		if (!src.file)
+			return -EBADF;
+
+		trace_xfs_ioctl_clone(file_inode(src.file), file_inode(filp));
+
+		error = xfs_ioctl_reflink(src.file, 0, filp, 0, ~0ULL);
+		fdput(src);
+		if (error > 0)
+			error = 0;
+
+		return error;
+	}
+
+	case XFS_IOC_CLONE_RANGE: {
+		struct fd src;
+		struct xfs_clone_args args;
+
+		if (copy_from_user(&args, arg, sizeof(args)))
+			return -EFAULT;
+		src = fdget(args.src_fd);
+		if (!src.file)
+			return -EBADF;
+		if (args.src_length == 0)
+			args.src_length = ~0ULL;
+
+		trace_xfs_ioctl_clone_range(file_inode(src.file),
+				args.src_offset, args.src_length,
+				file_inode(filp), args.dest_offset);
+
+		error = xfs_ioctl_reflink(src.file, args.src_offset, filp,
+					  args.dest_offset, args.src_length);
+		fdput(src);
+		if (error > 0)
+			error = 0;
+
+		return error;
+	}
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index 1a05d8a..dde2c7b 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -558,6 +558,8 @@ xfs_file_compat_ioctl(
 	case XFS_IOC_GOINGDOWN:
 	case XFS_IOC_ERROR_INJECTION:
 	case XFS_IOC_ERROR_CLEARALL:
+	case XFS_IOC_CLONE:
+	case XFS_IOC_CLONE_RANGE:
 		return xfs_file_ioctl(filp, cmd, p);
 #ifndef BROKEN_X86_ALIGNMENT
 	/* These are handled fine if no alignment issues */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 66/76] xfs: emulate the btrfs dedupe extent same ioctl
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (64 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 65/76] xfs: add clone file and clone range ioctls Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 67/76] xfs: teach fiemap about reflink'd extents Darrick J. Wong
                   ` (10 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Emulate the BTRFS_IOC_EXTENT_SAME ioctl.  This operation is similar
to clone_range, but the kernel must confirm that the contents of the
two extents are identical before performing the reflink.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h |   30 ++++++++++++
 fs/xfs/xfs_file.c      |   11 +++-
 fs/xfs/xfs_ioctl.c     |  124 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_ioctl32.c   |    1 
 fs/xfs/xfs_reflink.c   |  120 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h   |    6 ++
 6 files changed, 282 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index a3cd93e..5c66459 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -580,8 +580,38 @@ struct xfs_clone_args {
 	__u64 dest_offset;
 };
 
+/* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
+#define XFS_EXTENT_DATA_SAME	0
+#define XFS_EXTENT_DATA_DIFFERS	1
+
+/* from struct btrfs_ioctl_file_extent_same_info */
+struct xfs_extent_data_info {
+	__s64 fd;		/* in - destination file */
+	__u64 logical_offset;	/* in - start of extent in destination */
+	__u64 bytes_deduped;	/* out - total # of bytes we were able
+				 * to dedupe from this file */
+	/* status of this dedupe operation:
+	 * < 0 for error
+	 * == XFS_EXTENT_DATA_SAME if dedupe succeeds
+	 * == XFS_EXTENT_DATA_DIFFERS if data differs
+	 */
+	__s32 status;		/* out - see above description */
+	__u32 reserved;
+};
+
+/* from struct btrfs_ioctl_file_extent_same_args */
+struct xfs_extent_data {
+	__u64 logical_offset;	/* in - start of extent in source */
+	__u64 length;		/* in - length of extent */
+	__u16 dest_count;	/* in - total elements in info array */
+	__u16 reserved1;
+	__u32 reserved2;
+	struct xfs_extent_data_info info[0];
+};
+
 #define XFS_IOC_CLONE		 _IOW (0x94, 9, int)
 #define XFS_IOC_CLONE_RANGE	 _IOW (0x94, 13, struct xfs_clone_args)
+#define XFS_IOC_FILE_EXTENT_SAME _IOWR(0x94, 54, struct xfs_extent_data)
 
 #ifndef HAVE_BBMACROS
 /*
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 44d89ea..afca713 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1083,7 +1083,8 @@ xfs_file_share_range(
 	loff_t		pos_in,
 	struct file	*file_out,
 	loff_t		pos_out,
-	u64		len)
+	u64		len,
+	bool		is_dedupe)
 {
 	struct inode	*inode_in;
 	struct inode	*inode_out;
@@ -1092,6 +1093,7 @@ xfs_file_share_range(
 	loff_t		isize;
 	int		same_inode;
 	loff_t		blen;
+	unsigned int	flags = 0;
 
 	inode_in = file_inode(file_in);
 	inode_out = file_inode(file_out);
@@ -1153,13 +1155,16 @@ xfs_file_share_range(
 	if (ret)
 		goto out_unlock;
 
+	if (is_dedupe)
+		flags |= XFS_REFLINK_DEDUPE;
 	ret = xfs_reflink_remap_range(XFS_I(inode_in), pos_in, XFS_I(inode_out),
-			pos_out, len);
+			pos_out, len, flags);
 	if (ret < 0)
 		goto out_unlock;
 
 	/* Truncate the page cache so we don't see stale data */
-	truncate_inode_pages_range(&inode_out->i_data, pos_out,
+	if (!is_dedupe)
+		truncate_inode_pages_range(&inode_out->i_data, pos_out,
 				   PAGE_CACHE_ALIGN(pos_out + len) - 1);
 
 out_unlock:
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 1d836dc..07c4eb6 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1516,7 +1516,8 @@ xfs_ioc_swapext(
 }
 
 extern int xfs_file_share_range(struct file *file_in, loff_t pos_in,
-		struct file *file_out, loff_t pos_out, size_t len);
+		struct file *file_out, loff_t pos_out, size_t len,
+		bool is_dedupe);
 
 /*
  * For reflink, validate the VFS parameters, convert them into the XFS
@@ -1528,7 +1529,8 @@ xfs_ioctl_reflink(
 	loff_t		pos_in,
 	struct file	*file_out,
 	loff_t		pos_out,
-	size_t		len)
+	size_t		len,
+	bool		is_dedupe)
 {
 	int		error;
 
@@ -1542,7 +1544,8 @@ xfs_ioctl_reflink(
 	if (error)
 		return error;
 
-	error = xfs_file_share_range(file_in, pos_in, file_out, pos_out, len);
+	error = xfs_file_share_range(file_in, pos_in, file_out, pos_out, len,
+			is_dedupe);
 	if (error)
 		goto out_drop;
 
@@ -1558,6 +1561,113 @@ out_drop:
 	return error;
 }
 
+#define XFS_MAX_DEDUPE_LEN	(16 * 1024 * 1024)
+
+static long
+xfs_ioctl_file_extent_same(
+	struct file			*file,
+	struct xfs_extent_data __user	*argp)
+{
+	struct xfs_extent_data		*same = NULL;
+	struct xfs_extent_data_info	*info;
+	struct inode			*src;
+	u64				off;
+	u64				len;
+	int				i;
+	int				ret;
+	unsigned long			size;
+	bool				is_admin;
+	u16				count;
+
+	is_admin = capable(CAP_SYS_ADMIN);
+	src = file_inode(file);
+	if (!(file->f_mode & FMODE_READ))
+		return -EINVAL;
+
+	if (get_user(count, &argp->dest_count)) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	size = offsetof(struct xfs_extent_data __user,
+			info[count]);
+
+	same = memdup_user(argp, size);
+
+	if (IS_ERR(same)) {
+		ret = PTR_ERR(same);
+		goto out;
+	}
+
+	off = same->logical_offset;
+	len = same->length;
+
+	/*
+	 * Limit the total length we will dedupe for each operation.
+	 * This is intended to bound the total time spent in this
+	 * ioctl to something sane.
+	 */
+	if (len > XFS_MAX_DEDUPE_LEN)
+		len = XFS_MAX_DEDUPE_LEN;
+
+	ret = -EISDIR;
+	if (S_ISDIR(src->i_mode))
+		goto out;
+
+	ret = -EACCES;
+	if (!S_ISREG(src->i_mode))
+		goto out;
+
+	/* pre-format output fields to sane values */
+	for (i = 0; i < count; i++) {
+		same->info[i].bytes_deduped = 0ULL;
+		same->info[i].status = 0;
+	}
+
+	for (i = 0, info = same->info; i < count; i++, info++) {
+		struct inode *dst;
+		struct fd dst_file = fdget(info->fd);
+
+		if (!dst_file.file) {
+			info->status = -EBADF;
+			continue;
+		}
+		dst = file_inode(dst_file.file);
+
+		trace_xfs_ioctl_file_extent_same(file_inode(file), off, len,
+				dst, info->logical_offset);
+
+		info->bytes_deduped = 0;
+		if (!(is_admin || (dst_file.file->f_mode & FMODE_WRITE))) {
+			info->status = -EINVAL;
+		} else if (file->f_path.mnt != dst_file.file->f_path.mnt) {
+			info->status = -EXDEV;
+		} else if (S_ISDIR(dst->i_mode)) {
+			info->status = -EISDIR;
+		} else if (!S_ISREG(dst->i_mode)) {
+			info->status = -EOPNOTSUPP;
+		} else {
+			ret = xfs_ioctl_reflink(file, off, dst_file.file,
+					info->logical_offset, len, true);
+			if (ret == -EBADE)
+				info->status = XFS_EXTENT_DATA_DIFFERS;
+			else if (ret == 0)
+				info->bytes_deduped = len;
+			else
+				info->status = ret;
+		}
+		fdput(dst_file);
+	}
+
+	ret = copy_to_user(argp, same, size);
+	if (ret)
+		ret = -EFAULT;
+
+out:
+	kfree(same);
+	return ret;
+}
+
 /*
  * Note: some of the ioctl's return positive numbers as a
  * byte count indicating success, such as readlink_by_handle.
@@ -1865,7 +1975,7 @@ xfs_file_ioctl(
 
 		trace_xfs_ioctl_clone(file_inode(src.file), file_inode(filp));
 
-		error = xfs_ioctl_reflink(src.file, 0, filp, 0, ~0ULL);
+		error = xfs_ioctl_reflink(src.file, 0, filp, 0, ~0ULL, false);
 		fdput(src);
 		if (error > 0)
 			error = 0;
@@ -1890,7 +2000,8 @@ xfs_file_ioctl(
 				file_inode(filp), args.dest_offset);
 
 		error = xfs_ioctl_reflink(src.file, args.src_offset, filp,
-					  args.dest_offset, args.src_length);
+					  args.dest_offset, args.src_length,
+					  false);
 		fdput(src);
 		if (error > 0)
 			error = 0;
@@ -1898,6 +2009,9 @@ xfs_file_ioctl(
 		return error;
 	}
 
+	case XFS_IOC_FILE_EXTENT_SAME:
+		return xfs_ioctl_file_extent_same(filp, arg);
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index dde2c7b..80b7b3c 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -560,6 +560,7 @@ xfs_file_compat_ioctl(
 	case XFS_IOC_ERROR_CLEARALL:
 	case XFS_IOC_CLONE:
 	case XFS_IOC_CLONE_RANGE:
+	case XFS_IOC_FILE_EXTENT_SAME:
 		return xfs_file_ioctl(filp, cmd, p);
 #ifndef BROKEN_X86_ALIGNMENT
 	/* These are handled fine if no alignment issues */
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 3de3c9a..c686583 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1100,6 +1100,103 @@ err:
 	return error;
 }
 
+/*
+ * Read a page's worth of file data into the page cache.
+ */
+STATIC struct page *
+xfs_get_page(
+	struct inode	*inode,		/* inode */
+	xfs_off_t	offset)		/* where in the inode to read */
+{
+	struct address_space	*mapping;
+	struct page		*page;
+	pgoff_t			n;
+
+	n = offset >> PAGE_CACHE_SHIFT;
+	mapping = inode->i_mapping;
+	page = read_mapping_page(mapping, n, NULL);
+	if (IS_ERR(page))
+		return page;
+	if (!PageUptodate(page)) {
+		page_cache_release(page);
+		return NULL;
+	}
+	return page;
+}
+
+/*
+ * Compare extents of two files to see if they are the same.
+ */
+STATIC int
+xfs_compare_extents(
+	struct inode	*src,		/* first inode */
+	xfs_off_t	srcoff,		/* offset of first inode */
+	struct inode	*dest,		/* second inode */
+	xfs_off_t	destoff,	/* offset of second inode */
+	xfs_off_t	len,		/* length of data to compare */
+	bool		*is_same)	/* out: true if the contents match */
+{
+	xfs_off_t	src_poff;
+	xfs_off_t	dest_poff;
+	void		*src_addr;
+	void		*dest_addr;
+	struct page	*src_page;
+	struct page	*dest_page;
+	xfs_off_t	cmp_len;
+	bool		same;
+	int		error;
+
+	error = -EINVAL;
+	same = true;
+	while (len) {
+		src_poff = srcoff & (PAGE_CACHE_SIZE - 1);
+		dest_poff = destoff & (PAGE_CACHE_SIZE - 1);
+		cmp_len = min(PAGE_CACHE_SIZE - src_poff,
+			      PAGE_CACHE_SIZE - dest_poff);
+		cmp_len = min(cmp_len, len);
+		ASSERT(cmp_len > 0);
+
+		trace_xfs_reflink_compare_extents(XFS_I(src), srcoff, cmp_len,
+				XFS_I(dest), destoff);
+
+		src_page = xfs_get_page(src, srcoff);
+		if (!src_page)
+			goto out_error;
+		dest_page = xfs_get_page(dest, destoff);
+		if (!dest_page) {
+			page_cache_release(src_page);
+			goto out_error;
+		}
+		src_addr = kmap_atomic(src_page);
+		dest_addr = kmap_atomic(dest_page);
+
+		flush_dcache_page(src_page);
+		flush_dcache_page(dest_page);
+
+		if (memcmp(src_addr + src_poff, dest_addr + dest_poff, cmp_len))
+			same = false;
+
+		kunmap_atomic(src_addr);
+		kunmap_atomic(dest_addr);
+		page_cache_release(src_page);
+		page_cache_release(dest_page);
+
+		if (!same)
+			break;
+
+		srcoff += cmp_len;
+		destoff += cmp_len;
+		len -= cmp_len;
+	}
+
+	*is_same = same;
+	return 0;
+
+out_error:
+	trace_xfs_reflink_compare_extents_error(XFS_I(dest), error, _RET_IP_);
+	return error;
+}
+
 /**
  * xfs_reflink_remap_range() -- Link a range of blocks from one file to another.
  *
@@ -1108,6 +1205,7 @@ err:
  * @dest: Inode to clone to
  * @destoff: Offset within @inode to start clone
  * @len: Original length, passed by user, of range to clone
+ * @flags: Flags to modify reflink's behavior
  */
 int
 xfs_reflink_remap_range(
@@ -1115,12 +1213,14 @@ xfs_reflink_remap_range(
 	xfs_off_t		srcoff,
 	struct xfs_inode	*dest,
 	xfs_off_t		destoff,
-	xfs_off_t		len)
+	xfs_off_t		len,
+	unsigned int		flags)
 {
 	struct xfs_mount	*mp = src->i_mount;
 	xfs_fileoff_t		sfsbno, dfsbno;
 	xfs_filblks_t		fsblen;
 	int			error;
+	bool			is_same;
 
 	if (!xfs_sb_version_hasreflink(&mp->m_sb))
 		return -EOPNOTSUPP;
@@ -1132,6 +1232,9 @@ xfs_reflink_remap_range(
 	if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest))
 		return -EINVAL;
 
+	if (flags & ~XFS_REFLINK_ALL)
+		return -EINVAL;
+
 	trace_xfs_reflink_remap_range(src, srcoff, len, dest, destoff);
 
 	/* Lock both files against IO */
@@ -1143,6 +1246,21 @@ xfs_reflink_remap_range(
 		xfs_lock_two_inodes(src, dest, XFS_MMAPLOCK_EXCL);
 	}
 
+	/*
+	 * Check that the extents are the same.
+	 */
+	if (flags & XFS_REFLINK_DEDUPE) {
+		is_same = false;
+		error = xfs_compare_extents(VFS_I(src), srcoff, VFS_I(dest),
+				destoff, len, &is_same);
+		if (error)
+			goto out_error;
+		if (!is_same) {
+			error = -EBADE;
+			goto out_error;
+		}
+	}
+
 	error = xfs_reflink_set_inode_flag(src, dest);
 	if (error)
 		goto out_error;
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index df33044..42eb860 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -37,7 +37,11 @@ extern int xfs_reflink_cancel_pending_cow(struct xfs_inode *ip);
 int	xfs_map_cow_blocks(struct inode *inode, xfs_off_t offset,
 			   struct xfs_bmbt_irec	*imap);
 
+#define XFS_REFLINK_DEDUPE	1	/* only reflink if contents match */
+#define XFS_REFLINK_ALL		(XFS_REFLINK_DEDUPE)
+
 extern int xfs_reflink_remap_range(struct xfs_inode *src, xfs_off_t srcoff,
-		struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len);
+		struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len,
+		unsigned int flags);
 
 #endif /* __XFS_REFLINK_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 67/76] xfs: teach fiemap about reflink'd extents
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (65 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 66/76] xfs: emulate the btrfs dedupe extent same ioctl Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2015-12-19  9:03 ` [PATCH 68/76] xfs: swap inode reflink flags when swapping inode extents Darrick J. Wong
                   ` (9 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Teach FIEMAP to report shared (i.e. reflinked) extents.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_bmap_util.c |    2 +
 fs/xfs/xfs_bmap_util.h |    3 +-
 fs/xfs/xfs_ioctl.c     |   12 +++++++-
 fs/xfs/xfs_iops.c      |   68 ++++++++++++++++++++++++++++++++++++++++--------
 4 files changed, 69 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 3e274f6..61d616e 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -742,7 +742,7 @@ xfs_getbmap(
 		int full = 0;	/* user array is full */
 
 		/* format results & advance arg */
-		error = formatter(&arg, &out[i], &full);
+		error = formatter(ip, &arg, &out[i], &full);
 		if (error || full)
 			break;
 	}
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index af97d9a..d0dc504 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -37,7 +37,8 @@ int	xfs_bmap_punch_delalloc_range(struct xfs_inode *ip,
 		xfs_fileoff_t start_fsb, xfs_fileoff_t length);
 
 /* bmap to userspace formatter - copy to user & advance pointer */
-typedef int (*xfs_bmap_format_t)(void **, struct getbmapx *, int *);
+typedef int (*xfs_bmap_format_t)(struct xfs_inode *, void **, struct getbmapx *,
+		int *);
 int	xfs_getbmap(struct xfs_inode *ip, struct getbmapx *bmv,
 		xfs_bmap_format_t formatter, void *arg);
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 07c4eb6..cf712641 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1362,7 +1362,11 @@ out_drop_write:
 }
 
 STATIC int
-xfs_getbmap_format(void **ap, struct getbmapx *bmv, int *full)
+xfs_getbmap_format(
+	struct xfs_inode	*ip,
+	void			**ap,
+	struct getbmapx		*bmv,
+	int			*full)
 {
 	struct getbmap __user	*base = (struct getbmap __user *)*ap;
 
@@ -1406,7 +1410,11 @@ xfs_ioc_getbmap(
 }
 
 STATIC int
-xfs_getbmapx_format(void **ap, struct getbmapx *bmv, int *full)
+xfs_getbmapx_format(
+	struct xfs_inode	*ip,
+	void			**ap,
+	struct getbmapx		*bmv,
+	int			*full)
 {
 	struct getbmapx __user	*base = (struct getbmapx __user *)*ap;
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 245268a..ff1341c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -38,6 +38,8 @@
 #include "xfs_dir2.h"
 #include "xfs_trans_space.h"
 #include "xfs_pnfs.h"
+#include "xfs_bit.h"
+#include "xfs_refcount.h"
 
 #include <linux/capability.h>
 #include <linux/xattr.h>
@@ -1014,14 +1016,23 @@ xfs_vn_update_time(
  */
 STATIC int
 xfs_fiemap_format(
+	struct xfs_inode	*ip,
 	void			**arg,
 	struct getbmapx		*bmv,
 	int			*full)
 {
-	int			error;
+	int			error = 0;
 	struct fiemap_extent_info *fieinfo = *arg;
 	u32			fiemap_flags = 0;
-	u64			logical, physical, length;
+	u64			logical, physical, length, loop_len, len;
+	xfs_agblock_t		ebno;
+	xfs_extlen_t		elen;
+	xfs_nlink_t		nr;
+	xfs_fsblock_t		fsbno;
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+	xfs_extlen_t		aglen;
+	struct xfs_mount	*mp = ip->i_mount;
 
 	/* Do nothing for a hole */
 	if (bmv->bmv_block == -1LL)
@@ -1029,7 +1040,7 @@ xfs_fiemap_format(
 
 	logical = BBTOB(bmv->bmv_offset);
 	physical = BBTOB(bmv->bmv_block);
-	length = BBTOB(bmv->bmv_length);
+	length = loop_len = BBTOB(bmv->bmv_length);
 
 	if (bmv->bmv_oflags & BMV_OF_PREALLOC)
 		fiemap_flags |= FIEMAP_EXTENT_UNWRITTEN;
@@ -1038,16 +1049,49 @@ xfs_fiemap_format(
 				 FIEMAP_EXTENT_UNKNOWN);
 		physical = 0;   /* no block yet */
 	}
-	if (bmv->bmv_oflags & BMV_OF_LAST)
-		fiemap_flags |= FIEMAP_EXTENT_LAST;
-
-	error = fiemap_fill_next_extent(fieinfo, logical, physical,
-					length, fiemap_flags);
-	if (error > 0) {
-		error = 0;
-		*full = 1;	/* user array now full */
-	}
 
+	while (loop_len > 0) {
+		u32 ext_flags = 0;
+
+		if (bmv->bmv_oflags & BMV_OF_DELALLOC) {
+			physical = 0;
+			len = loop_len;
+			nr = 1;
+		} else if (xfs_is_reflink_inode(ip)) {
+			fsbno = XFS_DADDR_TO_FSB(mp, BTOBB(physical));
+			agno = XFS_FSB_TO_AGNO(mp, fsbno);
+			agbno = XFS_FSB_TO_AGBNO(mp, fsbno);
+			aglen = XFS_B_TO_FSB(mp, loop_len);
+			error = xfs_refcount_find_shared(mp, agno, agbno, aglen,
+							 &ebno, &elen, true);
+			if (error)
+				goto out;
+			if (elen == 0) {
+				len = loop_len;
+			} else if (ebno == agbno) {
+				len = XFS_FSB_TO_B(mp, elen);
+				ext_flags |= FIEMAP_EXTENT_SHARED;
+			} else {
+				len = XFS_FSB_TO_B(mp, ebno - agbno);
+			}
+		} else
+			len = loop_len;
+		if ((bmv->bmv_oflags & BMV_OF_LAST) &&
+		    len == loop_len)
+			ext_flags |= FIEMAP_EXTENT_LAST;
+
+		error = fiemap_fill_next_extent(fieinfo, logical, physical,
+						len, fiemap_flags | ext_flags);
+		if (error > 0) {
+			error = 0;
+			*full = 1;	/* user array now full */
+			goto out;
+		}
+		logical += len;
+		physical += len;
+		loop_len -= len;
+	}
+out:
 	return error;
 }
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 68/76] xfs: swap inode reflink flags when swapping inode extents
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (66 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 67/76] xfs: teach fiemap about reflink'd extents Darrick J. Wong
@ 2015-12-19  9:03 ` Darrick J. Wong
  2015-12-19  9:04 ` [PATCH 69/76] xfs: unshare a range of blocks via fallocate Darrick J. Wong
                   ` (8 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:03 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When we're swapping the extents of two inodes, be sure to swap the
reflink inode flag too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_bmap_util.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)


diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 61d616e..7bee3c7 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1782,6 +1782,8 @@ xfs_swap_extents(
 	int		lock_flags;
 	xfs_fsblock_t	firstfsb;
 	struct xfs_trans_res	tres = {.tr_logres = 262144, .tr_logcount = 1, .tr_logflags = 0};
+	struct xfs_ifork	*cowfp;
+	__uint64_t	f;
 
 	tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
 	if (!tempifp) {
@@ -2010,6 +2012,19 @@ xfs_swap_extents(
 		break;
 	}
 
+	/* Do we have to swap reflink flags? */
+	if ((ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK) ^
+	    (tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK)) {
+		f = ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK;
+		ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+		ip->i_d.di_flags2 |= tip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK;
+		tip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+		tip->i_d.di_flags2 |= f & XFS_DIFLAG2_REFLINK;
+		cowfp = ip->i_cowfp;
+		ip->i_cowfp = tip->i_cowfp;
+		tip->i_cowfp = cowfp;
+	}
+
 	xfs_trans_log_inode(tp, ip,  src_log_flags);
 	xfs_trans_log_inode(tp, tip, target_log_flags);
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 69/76] xfs: unshare a range of blocks via fallocate
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (67 preceding siblings ...)
  2015-12-19  9:03 ` [PATCH 68/76] xfs: swap inode reflink flags when swapping inode extents Darrick J. Wong
@ 2015-12-19  9:04 ` Darrick J. Wong
  2015-12-19  9:04 ` [PATCH 70/76] xfs: fork shared EOF block when truncating file Darrick J. Wong
                   ` (7 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Now that we have an fallocate flag to unshare a range of blocks, make
XFS actually implement it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_file.c    |    6 +
 fs/xfs/xfs_reflink.c |  317 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h |    3 
 3 files changed, 325 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index afca713..e6bc6ab 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1011,9 +1011,13 @@ xfs_file_fallocate(
 
 		if (mode & FALLOC_FL_ZERO_RANGE)
 			error = xfs_zero_file_space(ip, offset, len);
-		else
+		else {
+			error = xfs_reflink_unshare(ip, file, offset, len);
+			if (error)
+				goto out_unlock;
 			error = xfs_alloc_file_space(ip, offset, len,
 						     XFS_BMAPI_PREALLOC);
+		}
 		if (error)
 			goto out_unlock;
 	}
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index c686583..f68df66 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1285,3 +1285,320 @@ out_error:
 		trace_xfs_reflink_remap_range_error(dest, error, _RET_IP_);
 	return error;
 }
+
+/**
+ * xfs_reflink_dirty_range() -- Dirty all the shared blocks in the file so that
+ * they're rewritten elsewhere.  Similar to generic_perform_write().
+ *
+ * @filp: VFS file pointer
+ * @pos: offset to start dirtying
+ * @len: number of bytes to dirty
+ */
+STATIC int
+xfs_reflink_dirty_range(
+	struct file		*filp,
+	xfs_off_t		pos,
+	xfs_off_t		len)
+{
+	struct address_space	*mapping;
+	const struct address_space_operations *a_ops;
+	int			error;
+	unsigned int		flags;
+	struct page		*page;
+	struct page		*rpage;
+	unsigned long		offset;	/* Offset into pagecache page */
+	unsigned long		bytes;	/* Bytes to write to page */
+	void			*fsdata;
+
+	mapping = filp->f_mapping;
+	a_ops = mapping->a_ops;
+	flags = AOP_FLAG_UNINTERRUPTIBLE;
+	do {
+
+		offset = (pos & (PAGE_CACHE_SIZE - 1));
+		bytes = min_t(unsigned long, len, PAGE_CACHE_SIZE) - offset;
+		rpage = xfs_get_page(file_inode(filp), pos);
+		if (IS_ERR(rpage)) {
+			error = PTR_ERR(rpage);
+			break;
+		} else if (!rpage) {
+			error = -ENOMEM;
+			break;
+		}
+
+		error = a_ops->write_begin(filp, mapping, pos, bytes, flags,
+					   &page, &fsdata);
+		page_cache_release(rpage);
+		if (error < 0)
+			break;
+
+		trace_xfs_reflink_unshare_page(file_inode(filp), page,
+				pos, bytes);
+
+		if (!PageUptodate(page)) {
+			pr_err("%s: STALE? ino=%lu pos=%llu\n",
+				__func__, filp->f_inode->i_ino, pos);
+			WARN_ON(1);
+		}
+		if (mapping_writably_mapped(mapping))
+			flush_dcache_page(page);
+
+		error = a_ops->write_end(filp, mapping, pos, bytes, bytes,
+					 page, fsdata);
+		if (error < 0)
+			break;
+		else if (error == 0) {
+			error = -EIO;
+			break;
+		} else {
+			bytes = error;
+			error = 0;
+		}
+
+		cond_resched();
+
+		pos += bytes;
+		len -= bytes;
+
+		balance_dirty_pages_ratelimited(mapping);
+		if (fatal_signal_pending(current)) {
+			error = -EINTR;
+			break;
+		}
+	} while (len > 0);
+
+	return error;
+}
+
+/*
+ * The user wants to preemptively CoW all shared blocks in this file,
+ * which enables us to turn off the reflink flag.  Iterate all
+ * extents which are not prealloc/delalloc to see which ranges are
+ * mentioned in the refcount tree, then read those blocks into the
+ * pagecache, dirty them, fsync them back out, and then we can update
+ * the inode flag.  What happens if we run out of memory? :)
+ */
+STATIC int
+xfs_reflink_dirty_extents(
+	struct xfs_inode	*ip,
+	struct file		*filp,
+	xfs_fileoff_t		fbno,
+	xfs_filblks_t		end,
+	xfs_off_t		isize)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+	xfs_extlen_t		aglen;
+	xfs_agblock_t		rbno;
+	xfs_extlen_t		rlen;
+	xfs_off_t		fpos;
+	xfs_off_t		flen;
+	struct xfs_bmbt_irec	map[2];
+	int			nmaps;
+	int			error;
+
+	while (end - fbno > 0) {
+		nmaps = 1;
+		/*
+		 * Look for extents in the file.  Skip holes, delalloc, or
+		 * unwritten extents; they can't be reflinked.
+		 */
+		error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0);
+		if (error)
+			goto out;
+		if (nmaps == 0)
+			break;
+		if (map[0].br_startblock == HOLESTARTBLOCK ||
+		    map[0].br_startblock == DELAYSTARTBLOCK ||
+		    ISUNWRITTEN(&map[0]))
+			goto next;
+
+		map[1] = map[0];
+		while (map[1].br_blockcount) {
+			agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock);
+			agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock);
+			aglen = map[1].br_blockcount;
+
+			error = xfs_refcount_find_shared(mp, agno, agbno, aglen,
+							 &rbno, &rlen, true);
+			if (error)
+				goto out;
+			if (rlen == 0)
+				goto skip_copy;
+
+			/* Dirty the pages */
+			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			fpos = XFS_FSB_TO_B(mp, map[1].br_startoff +
+					(rbno - agbno));
+			flen = XFS_FSB_TO_B(mp, rlen);
+			if (fpos + flen > isize)
+				flen = isize - fpos;
+			error = xfs_reflink_dirty_range(filp, fpos, flen);
+			xfs_ilock(ip, XFS_ILOCK_EXCL);
+			if (error)
+				goto out;
+skip_copy:
+			map[1].br_blockcount -= (rbno - agbno + rlen);
+			map[1].br_startoff += (rbno - agbno + rlen);
+			map[1].br_startblock += (rbno - agbno + rlen);
+		}
+
+next:
+		fbno = map[0].br_startoff + map[0].br_blockcount;
+	}
+out:
+	return error;
+}
+
+/* Iterate the extents; if there are no reflinked blocks, clear the flag. */
+STATIC int
+xfs_reflink_try_clear_inode_flag(
+	struct xfs_inode	*ip,
+	xfs_off_t		old_isize)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;
+	xfs_fileoff_t		fbno;
+	xfs_filblks_t		end;
+	xfs_agnumber_t		agno;
+	xfs_agblock_t		agbno;
+	xfs_extlen_t		aglen;
+	xfs_agblock_t		rbno;
+	xfs_extlen_t		rlen;
+	struct xfs_bmbt_irec	map[2];
+	int			nmaps;
+	int			error = 0;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	if (old_isize != i_size_read(VFS_I(ip)))
+		goto out;
+	if (!(ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK))
+		goto out;
+
+	fbno = 0;
+	end = XFS_B_TO_FSB(mp, old_isize);
+	while (end - fbno > 0) {
+		nmaps = 1;
+		/*
+		 * Look for extents in the file.  Skip holes, delalloc, or
+		 * unwritten extents; they can't be reflinked.
+		 */
+		error = xfs_bmapi_read(ip, fbno, end - fbno, map, &nmaps, 0);
+		if (error)
+			goto out;
+		if (nmaps == 0)
+			break;
+		if (map[0].br_startblock == HOLESTARTBLOCK ||
+		    map[0].br_startblock == DELAYSTARTBLOCK ||
+		    ISUNWRITTEN(&map[0]))
+			goto next;
+
+		map[1] = map[0];
+		while (map[1].br_blockcount) {
+			agno = XFS_FSB_TO_AGNO(mp, map[1].br_startblock);
+			agbno = XFS_FSB_TO_AGBNO(mp, map[1].br_startblock);
+			aglen = map[1].br_blockcount;
+
+			error = xfs_refcount_find_shared(mp, agno, agbno, aglen,
+							 &rbno, &rlen, false);
+			if (error)
+				goto out;
+			/* Is there still a shared block here? */
+			if (rlen > 0) {
+				error = 0;
+				goto out;
+			}
+
+			map[1].br_blockcount -= aglen;
+			map[1].br_startoff += aglen;
+			map[1].br_startblock += aglen;
+		}
+
+next:
+		fbno = map[0].br_startoff + map[0].br_blockcount;
+	}
+
+	/* No reflinked blocks, so clear the flag */
+	tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_NOT_SIZE);
+	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_ichange, 0, 0);
+	if (error) {
+		xfs_trans_cancel(tp);
+		goto out;
+	}
+	trace_xfs_reflink_unset_inode_flag(ip);
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	error = xfs_trans_commit(tp);
+	if (error) {
+		xfs_trans_cancel(tp);
+		goto out;
+	}
+
+	return 0;
+out:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+}
+
+/**
+ * xfs_reflink_unshare() - Pre-COW all shared blocks within a given range
+ *			   of a file and turn off the reflink flag if we
+ *			   unshare all of the file's blocks.
+ * @ip: XFS inode
+ * @filp: VFS file structure
+ * @offset: Offset to start
+ * @len: Length to ...
+ */
+int
+xfs_reflink_unshare(
+	struct xfs_inode	*ip,
+	struct file		*filp,
+	xfs_off_t		offset,
+	xfs_off_t		len)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fileoff_t		fbno;
+	xfs_filblks_t		end;
+	xfs_off_t		old_isize, isize;
+	int			error;
+
+	if (!xfs_is_reflink_inode(ip))
+		return 0;
+
+	trace_xfs_reflink_unshare(ip, offset, len);
+
+	inode_dio_wait(VFS_I(ip));
+
+	/* Try to CoW the selected ranges */
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	fbno = XFS_B_TO_FSB(mp, offset);
+	old_isize = isize = i_size_read(VFS_I(ip));
+	end = XFS_B_TO_FSB(mp, offset + len);
+	error = xfs_reflink_dirty_extents(ip, filp, fbno, end, isize);
+	if (error)
+		goto out_unlock;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	/* Wait for the IO to finish */
+	error = filemap_write_and_wait(filp->f_mapping);
+	if (error)
+		goto out;
+
+	/* Turn off the reflink flag if we unshared the whole file */
+	if (offset == 0 && len == isize) {
+		error = xfs_reflink_try_clear_inode_flag(ip, old_isize);
+		if (error)
+			goto out;
+	}
+
+	return 0;
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+out:
+	trace_xfs_reflink_unshare_error(ip, error, _RET_IP_);
+	return error;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 42eb860..1904b32 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -44,4 +44,7 @@ extern int xfs_reflink_remap_range(struct xfs_inode *src, xfs_off_t srcoff,
 		struct xfs_inode *dest, xfs_off_t destoff, xfs_off_t len,
 		unsigned int flags);
 
+extern int xfs_reflink_unshare(struct xfs_inode *ip, struct file *filp,
+		xfs_off_t offset, xfs_off_t len);
+
 #endif /* __XFS_REFLINK_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 70/76] xfs: fork shared EOF block when truncating file
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (68 preceding siblings ...)
  2015-12-19  9:04 ` [PATCH 69/76] xfs: unshare a range of blocks via fallocate Darrick J. Wong
@ 2015-12-19  9:04 ` Darrick J. Wong
  2015-12-19  9:04 ` [PATCH 71/76] xfs: support XFS_XFLAG_REFLINK (and FS_NOCOW_FL) on reflink filesystems Darrick J. Wong
                   ` (6 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

When shrinking a file, the VFS zeroes everything in the associated
page between the new EOF and the previous EOF to avoid leaking data.
If this block is shared we need to fork it before the VFS does its
zeroing to avoid corrupting the other files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_iops.c    |    9 +++++++++
 fs/xfs/xfs_reflink.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h |    2 ++
 3 files changed, 58 insertions(+)


diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ff1341c..56a1368 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -40,6 +40,7 @@
 #include "xfs_pnfs.h"
 #include "xfs_bit.h"
 #include "xfs_refcount.h"
+#include "xfs_reflink.h"
 
 #include <linux/capability.h>
 #include <linux/xattr.h>
@@ -814,6 +815,14 @@ xfs_setattr_size(
 	}
 
 	/*
+	 * Fork the last block of the file if it's necessary to avoid
+	 * corrupting other files.
+	 */
+	error = xfs_reflink_truncate(ip, iattr->ia_file, newsize);
+	if (error)
+		return error;
+
+	/*
 	 * We are going to log the inode size change in this transaction so
 	 * any previous writes that are beyond the on disk EOF and the new
 	 * EOF that have not been written out need to be written here.  If we
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index f68df66..04daa7c 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1602,3 +1602,50 @@ out:
 	trace_xfs_reflink_unshare_error(ip, error, _RET_IP_);
 	return error;
 }
+
+/**
+ * xfs_reflink_truncate() - If we're trying to truncate a file whose last
+ *			    block is shared and the new size isn't aligned to
+ *			    a block boundary, we need to dirty that last block
+ *			    ahead of the VFS zeroing the page.
+ * @ip: XFS inode
+ * @filp: VFS file structure
+ * @len: New file length
+ */
+int
+xfs_reflink_truncate(
+	struct xfs_inode	*ip,
+	struct file		*filp,
+	xfs_off_t		newsize)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_fileoff_t		fbno;
+	xfs_off_t		isize;
+	int			error;
+
+	if (!xfs_is_reflink_inode(ip) ||
+	    (newsize & ((1 << VFS_I(ip)->i_blkbits) - 1)) == 0)
+		return 0;
+
+	/* Try to CoW the shared last block */
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	fbno = XFS_B_TO_FSBT(mp, newsize);
+	isize = i_size_read(VFS_I(ip));
+
+	if (newsize > isize)
+		trace_xfs_reflink_truncate(ip, isize, newsize - isize);
+	else
+		trace_xfs_reflink_truncate(ip, newsize, isize - newsize);
+
+	error = xfs_reflink_dirty_extents(ip, filp, fbno, fbno + 1, isize);
+	if (error)
+		goto out_unlock;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+	return 0;
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	trace_xfs_reflink_truncate_error(ip, error, _RET_IP_);
+	return error;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 1904b32..b6d7ce6 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -46,5 +46,7 @@ extern int xfs_reflink_remap_range(struct xfs_inode *src, xfs_off_t srcoff,
 
 extern int xfs_reflink_unshare(struct xfs_inode *ip, struct file *filp,
 		xfs_off_t offset, xfs_off_t len);
+extern int xfs_reflink_truncate(struct xfs_inode *ip, struct file *filp,
+		xfs_off_t newsize);
 
 #endif /* __XFS_REFLINK_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 71/76] xfs: support XFS_XFLAG_REFLINK (and FS_NOCOW_FL) on reflink filesystems
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (69 preceding siblings ...)
  2015-12-19  9:04 ` [PATCH 70/76] xfs: fork shared EOF block when truncating file Darrick J. Wong
@ 2015-12-19  9:04 ` Darrick J. Wong
  2015-12-19  9:04 ` [PATCH 72/76] xfs: recognize the reflink feature bit Darrick J. Wong
                   ` (5 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Report the reflink/nocow flags as appropriate in the XFS-specific and
"standard" getattr ioctls.

Allow the user to clear the reflink flag (or set the nocow flag) if the
size is zero.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_fs.h |    1 +
 fs/xfs/xfs_inode.c     |   10 +++++++--
 fs/xfs/xfs_ioctl.c     |   15 +++++++++++++-
 fs/xfs/xfs_reflink.c   |   51 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_reflink.h   |    4 ++++
 5 files changed, 77 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 5c66459..913c015 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -67,6 +67,7 @@ struct fsxattr {
 #define XFS_XFLAG_EXTSZINHERIT	0x00001000	/* inherit inode extent size */
 #define XFS_XFLAG_NODEFRAG	0x00002000  	/* do not defragment */
 #define XFS_XFLAG_FILESTREAM	0x00004000	/* use filestream allocator */
+#define XFS_XFLAG_REFLINK	0x00008000	/* file is reflinked */
 #define XFS_XFLAG_HASATTR	0x80000000	/* no DIFLAG for this	*/
 
 /*
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 59964c3..79765e0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -611,7 +611,8 @@ __xfs_iflock(
 
 STATIC uint
 _xfs_dic2xflags(
-	__uint16_t		di_flags)
+	__uint16_t		di_flags,
+	__uint64_t		di_flags2)
 {
 	uint			flags = 0;
 
@@ -644,6 +645,8 @@ _xfs_dic2xflags(
 			flags |= XFS_XFLAG_NODEFRAG;
 		if (di_flags & XFS_DIFLAG_FILESTREAM)
 			flags |= XFS_XFLAG_FILESTREAM;
+		if (di_flags2 & XFS_DIFLAG2_REFLINK)
+			flags |= XFS_XFLAG_REFLINK;
 	}
 
 	return flags;
@@ -655,7 +658,7 @@ xfs_ip2xflags(
 {
 	xfs_icdinode_t		*dic = &ip->i_d;
 
-	return _xfs_dic2xflags(dic->di_flags) |
+	return _xfs_dic2xflags(dic->di_flags, dic->di_flags2) |
 				(XFS_IFORK_Q(ip) ? XFS_XFLAG_HASATTR : 0);
 }
 
@@ -663,7 +666,8 @@ uint
 xfs_dic2xflags(
 	xfs_dinode_t		*dip)
 {
-	return _xfs_dic2xflags(be16_to_cpu(dip->di_flags)) |
+	return _xfs_dic2xflags(be16_to_cpu(dip->di_flags),
+			       be64_to_cpu(dip->di_flags2)) |
 				(XFS_DFORK_Q(dip) ? XFS_XFLAG_HASATTR : 0);
 }
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index cf712641..29dc36c 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -880,6 +880,10 @@ xfs_merge_ioc_xflags(
 		xflags |= XFS_XFLAG_NODUMP;
 	else
 		xflags &= ~XFS_XFLAG_NODUMP;
+	if (flags & FS_NOCOW_FL)
+		xflags &= ~XFS_XFLAG_REFLINK;
+	else
+		xflags |= XFS_XFLAG_REFLINK;
 
 	return xflags;
 }
@@ -1191,6 +1195,10 @@ xfs_ioctl_setattr(
 
 	trace_xfs_ioctl_setattr(ip);
 
+	code = xfs_reflink_check_flag_adjust(ip, &fa->fsx_xflags);
+	if (code)
+		return code;
+
 	code = xfs_ioctl_setattr_check_projid(ip, fa);
 	if (code)
 		return code;
@@ -1313,6 +1321,7 @@ xfs_ioc_getxflags(
 	unsigned int		flags;
 
 	flags = xfs_di2lxflags(ip->i_d.di_flags);
+	xfs_reflink_get_lxflags(ip, &flags);
 	if (copy_to_user(arg, &flags, sizeof(flags)))
 		return -EFAULT;
 	return 0;
@@ -1334,11 +1343,15 @@ xfs_ioc_setxflags(
 
 	if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
 		      FS_NOATIME_FL | FS_NODUMP_FL | \
-		      FS_SYNC_FL))
+		      FS_SYNC_FL | FS_NOCOW_FL))
 		return -EOPNOTSUPP;
 
 	fa.fsx_xflags = xfs_merge_ioc_xflags(flags, xfs_ip2xflags(ip));
 
+	error = xfs_reflink_check_flag_adjust(ip, &fa.fsx_xflags);
+	if (error)
+		return error;
+
 	error = mnt_want_write_file(filp);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 04daa7c..299a957 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1649,3 +1649,54 @@ out_unlock:
 	trace_xfs_reflink_truncate_error(ip, error, _RET_IP_);
 	return error;
 }
+
+/**
+ * xfs_reflink_get_lxflags() - set reflink-related linux inode flags
+ *
+ * @ip: XFS inode
+ * @flags: Pointer to the user-visible inode flags
+ */
+void
+xfs_reflink_get_lxflags(
+	struct xfs_inode	*ip,		/* XFS inode */
+	unsigned int		*flags)		/* user flags */
+{
+	/*
+	 * If this is a reflink-capable filesystem and there are no shared
+	 * blocks, then this is a "nocow" file.
+	 */
+	if (!xfs_sb_version_hasreflink(&ip->i_mount->m_sb) ||
+	    xfs_is_reflink_inode(ip))
+		return;
+	*flags |= FS_NOCOW_FL;
+}
+
+/**
+ * xfs_reflink_check_flag_adjust() - The only change we allow to the inode
+ *				     reflink flag is to clear it when the
+ *				     fs supports reflink and the size is zero.
+ *
+ * @ip: XFS inode
+ * @xflags: XFS in-core inode flags
+ */
+int
+xfs_reflink_check_flag_adjust(
+	struct xfs_inode	*ip,
+	unsigned int		*xflags)
+{
+	unsigned int		chg;
+
+	chg = !!(*xflags & XFS_XFLAG_REFLINK) ^ !!xfs_is_reflink_inode(ip);
+
+	if (!chg)
+		return 0;
+	if (!xfs_sb_version_hasreflink(&ip->i_mount->m_sb))
+		return -EOPNOTSUPP;
+	if (i_size_read(VFS_I(ip)) != 0)
+		return -EINVAL;
+	if (*xflags & XFS_XFLAG_REFLINK) {
+		*xflags &= ~XFS_XFLAG_REFLINK;
+		return 0;
+	}
+	return 0;
+}
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index b6d7ce6..15aa6b0 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -49,4 +49,8 @@ extern int xfs_reflink_unshare(struct xfs_inode *ip, struct file *filp,
 extern int xfs_reflink_truncate(struct xfs_inode *ip, struct file *filp,
 		xfs_off_t newsize);
 
+extern void xfs_reflink_get_lxflags(struct xfs_inode *ip, unsigned int *flags);
+extern int xfs_reflink_check_flag_adjust(struct xfs_inode *ip,
+		unsigned int *xflags);
+
 #endif /* __XFS_REFLINK_H */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 72/76] xfs: recognize the reflink feature bit
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (70 preceding siblings ...)
  2015-12-19  9:04 ` [PATCH 71/76] xfs: support XFS_XFLAG_REFLINK (and FS_NOCOW_FL) on reflink filesystems Darrick J. Wong
@ 2015-12-19  9:04 ` Darrick J. Wong
  2015-12-19  9:04 ` [PATCH 73/76] xfs: use new vfs reflink and dedup function pointers Darrick J. Wong
                   ` (4 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Add the reflink feature flag to the set of recognized feature flags.
This enables users to write to reflink filesystems.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h |    3 ++-
 fs/xfs/libxfs/xfs_sb.c     |    5 +++++
 fs/xfs/xfs_super.c         |    3 +++
 3 files changed, 10 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 7482531..ab0341f 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -459,7 +459,8 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_REFLINK  (1 << 2)		/* reflinked files */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
-		 XFS_SB_FEAT_RO_COMPAT_RMAPBT)
+		 XFS_SB_FEAT_RO_COMPAT_RMAPBT | \
+		 XFS_SB_FEAT_RO_COMPAT_REFLINK)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index c195ea7..a99eb01 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -230,6 +230,11 @@ xfs_mount_validate_sb(
 "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
 	}
 
+	if (xfs_sb_version_hasreflink(sbp)) {
+		xfs_alert(mp,
+"EXPERIMENTAL reflink feature enabled. Use at your own risk!");
+	}
+
 	/*
 	 * More sanity checking.  Most of these were stolen directly from
 	 * xfs_repair.
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 36df40b..ede714b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1536,6 +1536,9 @@ xfs_fs_fill_super(
 		"Block device does not support DAX Turning DAX off.");
 			mp->m_flags &= ~XFS_MOUNT_DAX;
 		}
+		if (xfs_sb_version_hasreflink(&mp->m_sb))
+			xfs_alert(mp,
+		"DAX and reflink have not been tested together!");
 	}
 
 	if (xfs_sb_version_hassparseinodes(&mp->m_sb))

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 73/76] xfs: use new vfs reflink and dedup function pointers
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (71 preceding siblings ...)
  2015-12-19  9:04 ` [PATCH 72/76] xfs: recognize the reflink feature bit Darrick J. Wong
@ 2015-12-19  9:04 ` Darrick J. Wong
  2015-12-19  9:04 ` [PATCH 74/76] xfs: set up per-AG preallocated block pools Darrick J. Wong
                   ` (3 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

Use the new VFS function pointers for copy_file_range and dedupe_data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_file.c  |   61 ++++++++++++++++
 fs/xfs/xfs_ioctl.c |  199 ----------------------------------------------------
 2 files changed, 60 insertions(+), 200 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e6bc6ab..0d96a37 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1081,7 +1081,7 @@ xfs_file_wait_for_io(
 }
 
 /* Hook up to the VFS reflink function */
-int
+STATIC int
 xfs_file_share_range(
 	struct file	*file_in,
 	loff_t		pos_in,
@@ -1175,6 +1175,62 @@ out_unlock:
 	return ret;
 }
 
+STATIC ssize_t
+xfs_file_copy_range(
+	struct file	*file_in,
+	loff_t		pos_in,
+	struct file	*file_out,
+	loff_t		pos_out,
+	size_t		len,
+	unsigned int	flags)
+{
+	int		error;
+
+	error = xfs_file_share_range(file_in, pos_in, file_out, pos_out,
+				     len, false);
+	if (error)
+		return error;
+	return len;
+}
+
+STATIC int
+xfs_file_clone_range(
+	struct file	*file_in,
+	loff_t		pos_in,
+	struct file	*file_out,
+	loff_t		pos_out,
+	u64		len)
+{
+	return xfs_file_share_range(file_in, pos_in, file_out, pos_out,
+				     len, false);
+}
+
+#define XFS_MAX_DEDUPE_LEN	(16 * 1024 * 1024)
+STATIC ssize_t
+xfs_file_dedupe_range(
+	struct file	*src_file,
+	u64		loff,
+	u64		len,
+	struct file	*dst_file,
+	u64		dst_loff)
+{
+	int		error;
+
+	/*
+	 * Limit the total length we will dedupe for each operation.
+	 * This is intended to bound the total time spent in this
+	 * ioctl to something sane.
+	 */
+	if (len > XFS_MAX_DEDUPE_LEN)
+		len = XFS_MAX_DEDUPE_LEN;
+
+	error = xfs_file_share_range(src_file, loff, dst_file, dst_loff,
+				     len, true);
+	if (error)
+		return error;
+	return len;
+}
+
 STATIC int
 xfs_file_open(
 	struct inode	*inode,
@@ -1811,6 +1867,9 @@ const struct file_operations xfs_file_operations = {
 	.release	= xfs_file_release,
 	.fsync		= xfs_file_fsync,
 	.fallocate	= xfs_file_fallocate,
+	.copy_file_range = xfs_file_copy_range,
+	.clone_file_range = xfs_file_clone_range,
+	.dedupe_file_range = xfs_file_dedupe_range,
 };
 
 const struct file_operations xfs_dir_file_operations = {
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 29dc36c..7562f14 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1536,159 +1536,6 @@ xfs_ioc_swapext(
 	return error;
 }
 
-extern int xfs_file_share_range(struct file *file_in, loff_t pos_in,
-		struct file *file_out, loff_t pos_out, size_t len,
-		bool is_dedupe);
-
-/*
- * For reflink, validate the VFS parameters, convert them into the XFS
- * equivalents, and then call the internal reflink function.
- */
-STATIC int
-xfs_ioctl_reflink(
-	struct file	*file_in,
-	loff_t		pos_in,
-	struct file	*file_out,
-	loff_t		pos_out,
-	size_t		len,
-	bool		is_dedupe)
-{
-	int		error;
-
-	/* Do we have the correct permissions? */
-	if (!(file_in->f_mode & FMODE_READ) ||
-	    !(file_out->f_mode & FMODE_WRITE) ||
-	    (file_out->f_flags & O_APPEND))
-		return -EBADF;
-
-	error = mnt_want_write_file(file_out);
-	if (error)
-		return error;
-
-	error = xfs_file_share_range(file_in, pos_in, file_out, pos_out, len,
-			is_dedupe);
-	if (error)
-		goto out_drop;
-
-	fsnotify_access(file_in);
-	add_rchar(current, len);
-	fsnotify_modify(file_out);
-	add_wchar(current, len);
-	inc_syscr(current);
-	inc_syscw(current);
-
-out_drop:
-	mnt_drop_write_file(file_out);
-	return error;
-}
-
-#define XFS_MAX_DEDUPE_LEN	(16 * 1024 * 1024)
-
-static long
-xfs_ioctl_file_extent_same(
-	struct file			*file,
-	struct xfs_extent_data __user	*argp)
-{
-	struct xfs_extent_data		*same = NULL;
-	struct xfs_extent_data_info	*info;
-	struct inode			*src;
-	u64				off;
-	u64				len;
-	int				i;
-	int				ret;
-	unsigned long			size;
-	bool				is_admin;
-	u16				count;
-
-	is_admin = capable(CAP_SYS_ADMIN);
-	src = file_inode(file);
-	if (!(file->f_mode & FMODE_READ))
-		return -EINVAL;
-
-	if (get_user(count, &argp->dest_count)) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	size = offsetof(struct xfs_extent_data __user,
-			info[count]);
-
-	same = memdup_user(argp, size);
-
-	if (IS_ERR(same)) {
-		ret = PTR_ERR(same);
-		goto out;
-	}
-
-	off = same->logical_offset;
-	len = same->length;
-
-	/*
-	 * Limit the total length we will dedupe for each operation.
-	 * This is intended to bound the total time spent in this
-	 * ioctl to something sane.
-	 */
-	if (len > XFS_MAX_DEDUPE_LEN)
-		len = XFS_MAX_DEDUPE_LEN;
-
-	ret = -EISDIR;
-	if (S_ISDIR(src->i_mode))
-		goto out;
-
-	ret = -EACCES;
-	if (!S_ISREG(src->i_mode))
-		goto out;
-
-	/* pre-format output fields to sane values */
-	for (i = 0; i < count; i++) {
-		same->info[i].bytes_deduped = 0ULL;
-		same->info[i].status = 0;
-	}
-
-	for (i = 0, info = same->info; i < count; i++, info++) {
-		struct inode *dst;
-		struct fd dst_file = fdget(info->fd);
-
-		if (!dst_file.file) {
-			info->status = -EBADF;
-			continue;
-		}
-		dst = file_inode(dst_file.file);
-
-		trace_xfs_ioctl_file_extent_same(file_inode(file), off, len,
-				dst, info->logical_offset);
-
-		info->bytes_deduped = 0;
-		if (!(is_admin || (dst_file.file->f_mode & FMODE_WRITE))) {
-			info->status = -EINVAL;
-		} else if (file->f_path.mnt != dst_file.file->f_path.mnt) {
-			info->status = -EXDEV;
-		} else if (S_ISDIR(dst->i_mode)) {
-			info->status = -EISDIR;
-		} else if (!S_ISREG(dst->i_mode)) {
-			info->status = -EOPNOTSUPP;
-		} else {
-			ret = xfs_ioctl_reflink(file, off, dst_file.file,
-					info->logical_offset, len, true);
-			if (ret == -EBADE)
-				info->status = XFS_EXTENT_DATA_DIFFERS;
-			else if (ret == 0)
-				info->bytes_deduped = len;
-			else
-				info->status = ret;
-		}
-		fdput(dst_file);
-	}
-
-	ret = copy_to_user(argp, same, size);
-	if (ret)
-		ret = -EFAULT;
-
-out:
-	kfree(same);
-	return ret;
-}
-
 /*
  * Note: some of the ioctl's return positive numbers as a
  * byte count indicating success, such as readlink_by_handle.
@@ -1987,52 +1834,6 @@ xfs_file_ioctl(
 		return xfs_icache_free_eofblocks(mp, &keofb);
 	}
 
-	case XFS_IOC_CLONE: {
-		struct fd src;
-
-		src = fdget(p);
-		if (!src.file)
-			return -EBADF;
-
-		trace_xfs_ioctl_clone(file_inode(src.file), file_inode(filp));
-
-		error = xfs_ioctl_reflink(src.file, 0, filp, 0, ~0ULL, false);
-		fdput(src);
-		if (error > 0)
-			error = 0;
-
-		return error;
-	}
-
-	case XFS_IOC_CLONE_RANGE: {
-		struct fd src;
-		struct xfs_clone_args args;
-
-		if (copy_from_user(&args, arg, sizeof(args)))
-			return -EFAULT;
-		src = fdget(args.src_fd);
-		if (!src.file)
-			return -EBADF;
-		if (args.src_length == 0)
-			args.src_length = ~0ULL;
-
-		trace_xfs_ioctl_clone_range(file_inode(src.file),
-				args.src_offset, args.src_length,
-				file_inode(filp), args.dest_offset);
-
-		error = xfs_ioctl_reflink(src.file, args.src_offset, filp,
-					  args.dest_offset, args.src_length,
-					  false);
-		fdput(src);
-		if (error > 0)
-			error = 0;
-
-		return error;
-	}
-
-	case XFS_IOC_FILE_EXTENT_SAME:
-		return xfs_ioctl_file_extent_same(filp, arg);
-
 	default:
 		return -ENOTTY;
 	}

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 74/76] xfs: set up per-AG preallocated block pools
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (72 preceding siblings ...)
  2015-12-19  9:04 ` [PATCH 73/76] xfs: use new vfs reflink and dedup function pointers Darrick J. Wong
@ 2015-12-19  9:04 ` Darrick J. Wong
  2015-12-19  9:04 ` [PATCH 75/76] xfs: preallocate blocks for worst-case refcount btree expansion Darrick J. Wong
                   ` (2 subsequent siblings)
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

One unfortunate quirk of the reference count btree -- it can expand in
size when blocks are written to *other* allocation groups if, say, one
large extent becomes a lot of tiny extents.  Since we don't want to
start throwing errors in the middle of CoWing, establish a pool of
reserved blocks in each AG to feed such an expansion.  Reserved pools
can be large enough to obviate the need for external allocations and
use EFI/EFDs so that the the reserved blocks will be freed if the
system crashes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/Makefile                |    1 
 fs/xfs/libxfs/xfs_perag_pool.c |  379 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_perag_pool.h |   47 +++++
 fs/xfs/xfs_trace.h             |   15 ++
 4 files changed, 442 insertions(+)
 create mode 100644 fs/xfs/libxfs/xfs_perag_pool.c
 create mode 100644 fs/xfs/libxfs/xfs_perag_pool.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 798e2b0..d2ab008 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -51,6 +51,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_inode_fork.o \
 				   xfs_inode_buf.o \
 				   xfs_log_rlimit.o \
+				   xfs_perag_pool.o \
 				   xfs_rmap.o \
 				   xfs_rmap_btree.o \
 				   xfs_refcount.o \
diff --git a/fs/xfs/libxfs/xfs_perag_pool.c b/fs/xfs/libxfs/xfs_perag_pool.c
new file mode 100644
index 0000000..b49ffd2
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_perag_pool.c
@@ -0,0 +1,379 @@
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_alloc.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_trans.h"
+#include "xfs_bit.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_perag_pool.h"
+#include "xfs_trans_space.h"
+
+/**
+ * xfs_perag_pool_free() -- Free a per-AG reserved block pool.
+ */
+int
+xfs_perag_pool_free(
+	struct xfs_perag_pool		*p)
+{
+	struct xfs_mount		*mp;
+	struct xfs_perag_pool_entry	*ppe, *n;
+	struct xfs_trans		*tp;
+	xfs_fsblock_t			fsb;
+	struct xfs_bmap_free		freelist;
+	int				committed;
+	int				error = 0, err;
+
+	if (!p)
+		return 0;
+
+	mp = p->pp_mount;
+	list_for_each_entry_safe(ppe, n, &p->pp_entries, ppe_list) {
+		list_del(&ppe->ppe_list);
+		if (XFS_FORCED_SHUTDOWN(mp)) {
+			kmem_free(ppe);
+			continue;
+		}
+
+		/* Set up transaction. */
+		tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE);
+		tp->t_flags |= XFS_TRANS_RESERVE;
+		err = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, 0, 0);
+		if (err)
+			goto loop_cancel;
+		xfs_bmap_init(&freelist, &fsb);
+		fsb = XFS_AGB_TO_FSB(p->pp_mount, p->pp_agno, ppe->ppe_bno);
+
+		trace_xfs_perag_pool_free_extent(mp, p->pp_agno, ppe->ppe_bno,
+				ppe->ppe_len, &p->pp_oinfo);
+
+		/* Free the block. */
+		xfs_bmap_add_free(mp, &freelist, fsb, ppe->ppe_len,
+				&p->pp_oinfo);
+
+		err = xfs_bmap_finish(&tp, &freelist, &committed, NULL);
+		if (err)
+			goto loop_cancel;
+
+		err = xfs_trans_commit(tp);
+		if (!error)
+			error = err;
+		kmem_free(ppe);
+		continue;
+loop_cancel:
+		if (!error)
+			error = err;
+		xfs_trans_cancel(tp);
+		kmem_free(ppe);
+	}
+
+	kmem_free(p);
+	if (error)
+		trace_xfs_perag_pool_free_error(mp, p->pp_agno, error,
+				_RET_IP_);
+	return error;
+}
+
+/* Allocate a block for the pool. */
+static int
+xfs_perag_pool_grab_block(
+	struct xfs_perag_pool		*p,
+	struct xfs_trans		*tp,
+	xfs_extlen_t			*len)
+{
+	struct xfs_mount		*mp;
+	struct xfs_perag_pool_entry	*ppe;
+	struct xfs_alloc_arg		args;
+	int				error;
+
+	mp = p->pp_mount;
+
+	/* Set up the allocation. */
+	memset(&args, 0, sizeof(args));
+	args.mp = mp;
+	args.type = XFS_ALLOCTYPE_NEAR_BNO;
+	args.fsbno = XFS_AGB_TO_FSB(mp, p->pp_agno, p->pp_agbno);
+	args.firstblock = args.fsbno;
+	args.oinfo = p->pp_oinfo;
+	args.minlen = 1;
+
+	/* Allocate blocks. */
+	args.tp = tp;
+	args.maxlen = args.prod = *len;
+	p->pp_allocating = true;
+	error = xfs_alloc_vextent(&args);
+	p->pp_allocating = false;
+	if (error)
+		goto out_error;
+	if (args.fsbno == NULLFSBLOCK) {
+		/* oh well, we're headed towards failure. */
+		error = -ENOSPC;
+		goto out_error;
+	}
+	*len = args.len;
+
+	trace_xfs_perag_pool_grab_block(mp, p->pp_agno, args.agbno, args.len,
+			&p->pp_oinfo);
+
+	/* Add to our list. */
+	ASSERT(args.agno == p->pp_agno);
+	ppe = kmem_alloc(sizeof(struct xfs_perag_pool_entry), KM_SLEEP);
+	ppe->ppe_bno = args.agbno;
+	ppe->ppe_len = args.len;
+	list_add_tail(&ppe->ppe_list, &p->pp_entries);
+	return 0;
+
+out_error:
+	trace_xfs_perag_pool_grab_block_error(mp, p->pp_agno, error, _RET_IP_);
+	return error;
+}
+
+/* Ensure the pool has some capacity. */
+static int
+__xfs_perag_pool_ensure_capacity(
+	struct xfs_perag_pool		*p,
+	xfs_extlen_t			sz,
+	bool				force)
+{
+	struct xfs_mount		*mp = p->pp_mount;
+	struct xfs_trans		*tp;
+	struct xfs_perag		*pag;
+	uint				resblks;
+	xfs_extlen_t			alloc_len;
+	int				error;
+
+	if (sz <= p->pp_len - p->pp_inuse)
+		return 0;
+	sz -= p->pp_len - p->pp_inuse;
+
+	trace_xfs_perag_pool_ensure_capacity(mp, p->pp_agno,
+			p->pp_len - p->pp_inuse, sz, &p->pp_oinfo);
+
+	/* Do we even have enough free blocks? */
+	pag = xfs_perag_get(mp, p->pp_agno);
+	resblks = pag->pagf_freeblks;
+	xfs_perag_put(pag);
+	if (force && resblks < sz)
+		sz = resblks;
+	if (resblks < sz) {
+		error = -ENOSPC;
+		goto out_error;
+	}
+
+	while (sz) {
+		/* Set up a transaction */
+		resblks = XFS_DIOSTRAT_SPACE_RES(mp, sz);
+		tp = xfs_trans_alloc(mp, XFS_TRANS_STRAT_WRITE);
+		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write, resblks, 0);
+		if (error)
+			goto out_cancel;
+
+		/* Allocate the blocks */
+		alloc_len = sz;
+		error = xfs_perag_pool_grab_block(p, tp, &alloc_len);
+		if (error)
+			goto out_cancel;
+
+		/* Commit the transaction */
+		error = xfs_trans_commit(tp);
+		if (error)
+			goto out_error;
+
+		p->pp_len += alloc_len;
+		sz -= alloc_len;
+	}
+	return 0;
+
+out_cancel:
+	xfs_trans_cancel(tp);
+out_error:
+	trace_xfs_perag_pool_ensure_capacity_error(mp, p->pp_agno, error,
+			_RET_IP_);
+	return error;
+}
+
+/**
+ * xfs_perag_pool_ensure_capacity() -- Ensure the pool has some capacity.
+ *
+ * @p: per-AG reserved blocks pool.
+ * @sz: Ensure that there are at least this many free blocks.
+ */
+int
+xfs_perag_pool_ensure_capacity(
+	struct xfs_perag_pool		*p,
+	xfs_extlen_t			sz)
+{
+	if (!p)
+		return 0;
+	return __xfs_perag_pool_ensure_capacity(p, sz, false);
+}
+
+/**
+ * xfs_perag_pool_init() -- Initialize a per-AG reserved block pool.
+ */
+int
+xfs_perag_pool_init(
+	struct xfs_mount		*mp,
+	xfs_agnumber_t			agno,
+	xfs_agblock_t			agbno,
+	xfs_extlen_t			len,
+	xfs_extlen_t			inuse,
+	uint64_t			owner,
+	struct xfs_perag_pool		**pp)
+{
+	struct xfs_perag_pool		*p;
+	struct xfs_owner_info		oinfo;
+	int				error;
+
+	XFS_RMAP_AG_OWNER(&oinfo, owner);
+	trace_xfs_perag_pool_init(mp, agno, agbno, len, &oinfo);
+	trace_xfs_perag_pool_init(mp, agno, agbno, inuse, &oinfo);
+
+	p = kmem_alloc(sizeof(struct xfs_perag_pool), KM_SLEEP);
+	p->pp_mount = mp;
+	p->pp_agno = agno;
+	p->pp_agbno = agbno;
+	p->pp_inuse = p->pp_len = inuse;
+	p->pp_oinfo = oinfo;
+	p->pp_allocating = false;
+	INIT_LIST_HEAD(&p->pp_entries);
+	*pp = p;
+
+	/* Try to reserve some blocks. */
+	error = __xfs_perag_pool_ensure_capacity(p, len - inuse, true);
+	if (error == -ENOSPC)
+		error = 0;
+
+	if (error)
+		trace_xfs_perag_pool_init_error(mp, agno, error, _RET_IP_);
+	return error;
+}
+
+/**
+ * xfs_perag_pool_alloc_block() -- Allocate a block from the pool.
+ *
+ * @p: Reserved block pool.
+ * @tp: Transaction to record the allocation.
+ * @bno: (out) The allocated block number.
+ */
+int
+xfs_perag_pool_alloc_block(
+	struct xfs_perag_pool		*p,
+	struct xfs_trans		*tp,
+	xfs_agblock_t			*bno)
+{
+	struct xfs_mount		*mp;
+	struct xfs_perag_pool_entry	*ppe;
+	xfs_extlen_t			len;
+	int				error;
+
+	if (p == NULL || p->pp_allocating)
+		return -EINVAL;
+
+	mp = p->pp_mount;
+	mp = mp;
+	/* Empty pool?  Grab another block. */
+	if (list_empty(&p->pp_entries)) {
+		len = 1;
+		error = xfs_perag_pool_grab_block(p, tp, &len);
+		if (error)
+			goto err;
+		ASSERT(len == 1);
+		if (list_empty(&p->pp_entries)) {
+			error = -ENOSPC;
+			goto err;
+		}
+	}
+
+	/* Find an available block. */
+	ppe = list_first_entry(&p->pp_entries, struct xfs_perag_pool_entry,
+			ppe_list);
+	*bno = ppe->ppe_bno;
+
+	trace_xfs_perag_pool_alloc_block(mp, p->pp_agno, *bno, 1, &p->pp_oinfo);
+
+	/* Update the accounting. */
+	ppe->ppe_len--;
+	ppe->ppe_bno++;
+	if (ppe->ppe_len == 0)
+		list_del(&ppe->ppe_list);
+	p->pp_inuse++;
+
+	return 0;
+err:
+	trace_xfs_perag_pool_alloc_block_error(mp, p->pp_agno, error, _RET_IP_);
+	return error;
+}
+
+/**
+ * xfs_perag_pool_free_block() -- Put a block back in the pool.
+ *
+ * @p: Reserved block pool.
+ * @tp: Transaction to record the free operation.
+ * @bno: Block to put back.
+ */
+int
+xfs_perag_pool_free_block(
+	struct xfs_perag_pool		*p,
+	struct xfs_trans		*tp,
+	xfs_agblock_t			bno)
+{
+	struct xfs_mount		*mp;
+	struct xfs_perag_pool_entry	*ppe;
+
+	if (p == NULL)
+		return -EINVAL;
+
+	mp = p->pp_mount;
+	mp = mp;
+	trace_xfs_perag_pool_free_block(mp, p->pp_agno, bno, 1, &p->pp_oinfo);
+
+	list_for_each_entry(ppe, &p->pp_entries, ppe_list) {
+		if (ppe->ppe_bno - 1 == bno) {
+
+			/* Adjust bookkeeping. */
+			p->pp_inuse--;
+			ppe->ppe_bno--;
+			ppe->ppe_len++;
+			return 0;
+		}
+		if (ppe->ppe_bno + ppe->ppe_len == bno) {
+			p->pp_inuse--;
+			ppe->ppe_len++;
+			return 0;
+		}
+	}
+	ppe = kmem_alloc(sizeof(struct xfs_perag_pool_entry), KM_SLEEP);
+	ppe->ppe_bno = bno;
+	ppe->ppe_len = 1;
+	p->pp_inuse--;
+
+	list_add_tail(&ppe->ppe_list, &p->pp_entries);
+	return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_perag_pool.h b/fs/xfs/libxfs/xfs_perag_pool.h
new file mode 100644
index 0000000..ecdcd2a
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_perag_pool.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc.
+ * Copyright (c) 2015 Oracle.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+
+struct xfs_perag_pool_entry {
+	struct list_head	ppe_list;	/* pool list */
+	xfs_agblock_t		ppe_bno;	/* AG block number */
+	xfs_extlen_t		ppe_len;	/* length */
+};
+
+struct xfs_perag_pool {
+	struct xfs_mount	*pp_mount;	/* XFS mount */
+	xfs_agnumber_t		pp_agno;	/* AG number */
+	xfs_agblock_t		pp_agbno;	/* suggested AG block number */
+	xfs_extlen_t		pp_len;		/* blocks in pool */
+	xfs_extlen_t		pp_inuse;	/* blocks in use */
+	struct xfs_owner_info	pp_oinfo;	/* owner */
+	struct list_head	pp_entries;	/* pool entries */
+	bool			pp_allocating;	/* are we allocating? */
+};
+
+int xfs_perag_pool_free(struct xfs_perag_pool *p);
+int xfs_perag_pool_init(struct xfs_mount *mp, xfs_agnumber_t agno,
+		xfs_agblock_t agbno, xfs_extlen_t len, xfs_extlen_t inuse,
+		uint64_t owner, struct xfs_perag_pool **pp);
+
+int xfs_perag_pool_ensure_capacity(struct xfs_perag_pool *p, xfs_extlen_t sz);
+
+int xfs_perag_pool_alloc_block(struct xfs_perag_pool *p, struct xfs_trans *tp,
+		xfs_agblock_t *bno);
+int xfs_perag_pool_free_block(struct xfs_perag_pool *p, struct xfs_trans *tp,
+		xfs_agblock_t bno);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 0773938..dad57dc 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3160,6 +3160,21 @@ DEFINE_INODE_EVENT(xfs_reflink_cancel_pending_cow);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_cancel_cow);
 DEFINE_INODE_ERROR_EVENT(xfs_reflink_cancel_pending_cow_error);
 
+/* perag pool tracepoints */
+#define DEFINE_PERAG_POOL_EVENT	DEFINE_RMAP_EVENT
+DEFINE_PERAG_POOL_EVENT(xfs_perag_pool_free_extent);
+DEFINE_PERAG_POOL_EVENT(xfs_perag_pool_grab_block);
+DEFINE_PERAG_POOL_EVENT(xfs_perag_pool_init);
+DEFINE_PERAG_POOL_EVENT(xfs_perag_pool_ensure_capacity);
+DEFINE_PERAG_POOL_EVENT(xfs_perag_pool_alloc_block);
+DEFINE_PERAG_POOL_EVENT(xfs_perag_pool_free_block);
+
+DEFINE_AG_ERROR_EVENT(xfs_perag_pool_free_error);
+DEFINE_AG_ERROR_EVENT(xfs_perag_pool_grab_block_error);
+DEFINE_AG_ERROR_EVENT(xfs_perag_pool_init_error);
+DEFINE_AG_ERROR_EVENT(xfs_perag_pool_ensure_capacity_error);
+DEFINE_AG_ERROR_EVENT(xfs_perag_pool_alloc_block_error);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 75/76] xfs: preallocate blocks for worst-case refcount btree expansion
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (73 preceding siblings ...)
  2015-12-19  9:04 ` [PATCH 74/76] xfs: set up per-AG preallocated block pools Darrick J. Wong
@ 2015-12-19  9:04 ` Darrick J. Wong
  2015-12-19  9:04 ` [PATCH 76/76] xfs: try to prevent failed rmap btree expansion during cow Darrick J. Wong
  2015-12-20 14:02 ` [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Brian Foster
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

To gracefully handle the situation where a CoW operation turns a
single refcount extent into a lot of tiny ones and then run out of
space when a tree split has to happen, use the per-AG reserved block
pool to pre-allocate all the space we'll ever need for a maximal
btree.  For a 4K block size, this only costs an overhead of 0.3% of
available disk space.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_refcount_btree.c |  184 ++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_refcount_btree.h |    3 +
 fs/xfs/xfs_fsops.c                 |    4 +
 fs/xfs/xfs_mount.c                 |   10 ++
 fs/xfs/xfs_mount.h                 |    1 
 fs/xfs/xfs_super.c                 |   14 +++
 6 files changed, 216 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c
index c785433..7f8bdc4 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.c
+++ b/fs/xfs/libxfs/xfs_refcount_btree.c
@@ -33,6 +33,7 @@
 #include "xfs_cksum.h"
 #include "xfs_trans.h"
 #include "xfs_bit.h"
+#include "xfs_perag_pool.h"
 
 static struct xfs_btree_cur *
 xfs_refcountbt_dup_cursor(
@@ -72,8 +73,32 @@ xfs_refcountbt_alloc_block(
 	int			*stat)
 {
 	struct xfs_alloc_arg	args;		/* block allocation args */
+	struct xfs_perag	*pag;
+	xfs_agblock_t		bno;
 	int			error;		/* error return value */
 
+	/* First try the per-AG reserve pool. */
+	pag = xfs_perag_get(cur->bc_mp, cur->bc_private.a.agno);
+	error = xfs_perag_pool_alloc_block(pag->pagf_refcountbt_pool,
+			cur->bc_tp, &bno);
+	xfs_perag_put(pag);
+
+	switch (error) {
+	case 0:
+		*stat = 1;
+		new->s = cpu_to_be32(bno);
+		return 0;
+	case -EINVAL:
+		break;
+	case -ENOSPC:
+		error = 0;
+		/* fall through */
+	default:
+		*stat = 0;
+		return error;
+	}
+
+	/* No pool; try a regular allocation. */
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
@@ -113,9 +138,27 @@ xfs_refcountbt_free_block(
 {
 	struct xfs_mount	*mp = cur->bc_mp;
 	struct xfs_trans	*tp = cur->bc_tp;
+	struct xfs_perag	*pag;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
 	struct xfs_owner_info	oinfo;
+	int			error;
 
+	/* Try to give it back to the pool. */
+	pag = xfs_perag_get(cur->bc_mp, cur->bc_private.a.agno);
+	error = xfs_perag_pool_free_block(pag->pagf_refcountbt_pool, cur->bc_tp,
+			XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno));
+	xfs_perag_put(pag);
+
+	switch (error) {
+	case 0:
+		return 0;
+	case -EINVAL:
+		break;
+	default:
+		return error;
+	}
+
+	/* Return it to the AG. */
 	XFS_RMAP_AG_OWNER(&oinfo, XFS_RMAP_OWN_REFC);
 	xfs_bmap_add_free(mp, cur->bc_private.a.flist, fsbno, 1,
 			&oinfo);
@@ -390,3 +433,144 @@ xfs_refcountbt_max_btree_size(
 
 	return xfs_refcountbt_calc_btree_size(mp, mp->m_sb.sb_agblocks);
 }
+
+/* Count the blocks in the reference count tree. */
+static int
+xfs_refcountbt_count_tree_blocks(
+	struct xfs_mount	*mp,
+	xfs_agnumber_t		agno,
+	xfs_extlen_t		*tree_len)
+{
+	struct xfs_buf		*agfbp;
+	struct xfs_buf		*bp = NULL;
+	struct xfs_agf		*agfp;
+	struct xfs_btree_block	*block = NULL;
+	int			level;
+	xfs_agblock_t		bno;
+	xfs_fsblock_t		fsbno;
+	__be32			*pp;
+	int			error;
+	xfs_extlen_t		nr_blocks = 0;
+
+	error = xfs_alloc_read_agf(mp, NULL, agno, 0, &agfbp);
+	if (error)
+		goto out;
+	agfp = XFS_BUF_TO_AGF(agfbp);
+	level = be32_to_cpu(agfp->agf_refcount_level);
+	bno = be32_to_cpu(agfp->agf_refcount_root);
+
+	/*
+	 * Go down the tree until leaf level is reached, following the first
+	 * pointer (leftmost) at each level.
+	 */
+	while (level-- > 0) {
+		fsbno = XFS_AGB_TO_FSB(mp, agno, bno);
+		error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
+				XFS_FSB_TO_DADDR(mp, fsbno),
+				XFS_FSB_TO_BB(mp, 1), 0, &bp,
+				&xfs_refcountbt_buf_ops);
+		if (error)
+			goto err;
+		block = XFS_BUF_TO_BLOCK(bp);
+		if (level == 0)
+			break;
+		pp = XFS_REFCOUNT_PTR_ADDR(block, 1, mp->m_refc_mxr[1]);
+		bno = be32_to_cpu(*pp);
+		xfs_trans_brelse(NULL, bp);
+	}
+
+	/* Jog rightward though level zero. */
+	while (block) {
+		nr_blocks++;
+		bno = be32_to_cpu(block->bb_u.s.bb_rightsib);
+		if (bno == NULLAGBLOCK)
+			break;
+		fsbno = XFS_AGB_TO_FSB(mp, agno, bno);
+		xfs_trans_brelse(NULL, bp);
+		error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
+				XFS_FSB_TO_DADDR(mp, fsbno),
+				XFS_FSB_TO_BB(mp, 1), 0, &bp,
+				&xfs_refcountbt_buf_ops);
+		if (error)
+			goto err;
+		block = XFS_BUF_TO_BLOCK(bp);
+	}
+
+	if (bp)
+		xfs_trans_brelse(NULL, bp);
+
+	/* Add in the upper levels of tree. */
+	*tree_len = nr_blocks;
+err:
+	xfs_trans_brelse(NULL, agfbp);
+out:
+	return error;
+}
+
+/**
+ * xfs_refcountbt_alloc_reserve_pool() -- Create reserved block pools for each
+ *					  allocation group.
+ */
+int
+xfs_refcountbt_alloc_reserve_pool(
+	struct xfs_mount	*mp)
+{
+	xfs_agnumber_t		agno;
+	struct xfs_perag	*pag;
+	xfs_extlen_t		pool_len;
+	xfs_extlen_t		tree_len;
+	int			error = 0;
+	int			err;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+
+	pool_len = xfs_refcountbt_max_btree_size(mp);
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		pag = xfs_perag_get(mp, agno);
+		if (pag->pagf_refcountbt_pool) {
+			xfs_perag_put(pag);
+			continue;
+		}
+		tree_len = 0;
+		xfs_refcountbt_count_tree_blocks(mp, agno, &tree_len);
+		err = xfs_perag_pool_init(mp, agno,
+				xfs_refc_block(mp),
+				pool_len, tree_len,
+				XFS_RMAP_OWN_REFC,
+				&pag->pagf_refcountbt_pool);
+		xfs_perag_put(pag);
+		if (err && !error)
+			error = err;
+	}
+
+	return error;
+}
+
+/**
+ * xfs_refcountbt_free_reserve_pool() -- Free the reference count btree pools.
+ */
+int
+xfs_refcountbt_free_reserve_pool(
+	struct xfs_mount	*mp)
+{
+	xfs_agnumber_t		agno;
+	struct xfs_perag	*pag;
+	int			error = 0;
+	int			err;
+
+	if (!xfs_sb_version_hasreflink(&mp->m_sb))
+		return 0;
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		pag = xfs_perag_get(mp, agno);
+		err = xfs_perag_pool_free(pag->pagf_refcountbt_pool);
+		pag->pagf_refcountbt_pool = NULL;
+		xfs_perag_put(pag);
+		if (err && !error)
+			error = err;
+	}
+
+	return error;
+}
diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h
index 0f55544..93eebda 100644
--- a/fs/xfs/libxfs/xfs_refcount_btree.h
+++ b/fs/xfs/libxfs/xfs_refcount_btree.h
@@ -65,4 +65,7 @@ extern int xfs_refcountbt_maxrecs(struct xfs_mount *mp, int blocklen,
 DECLARE_BTREE_SIZE_FN(refcountbt);
 extern unsigned int xfs_refcountbt_max_btree_size(struct xfs_mount *mp);
 
+extern int xfs_refcountbt_alloc_reserve_pool(struct xfs_mount *mp);
+extern int xfs_refcountbt_free_reserve_pool(struct xfs_mount *mp);
+
 #endif	/* __XFS_REFCOUNT_BTREE_H__ */
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 920db9d..4158e07 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -41,6 +41,7 @@
 #include "xfs_trace.h"
 #include "xfs_log.h"
 #include "xfs_filestream.h"
+#include "xfs_refcount_btree.h"
 
 /*
  * File system operations
@@ -679,6 +680,9 @@ xfs_growfs_data_private(
 			continue;
 		}
 	}
+
+	error = xfs_refcountbt_alloc_reserve_pool(mp);
+
 	return saved_error ? saved_error : error;
 
  error0:
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 335bcad..29841d6 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -41,6 +41,7 @@
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_sysfs.h"
+#include "xfs_refcount_btree.h"
 
 
 static DEFINE_MUTEX(xfs_uuid_table_mutex);
@@ -966,6 +967,10 @@ xfs_mountfs(
 		if (error)
 			xfs_warn(mp,
 	"Unable to allocate reserve blocks. Continuing without reserve pool.");
+		error = xfs_refcountbt_alloc_reserve_pool(mp);
+		if (error)
+			xfs_err(mp,
+	"Error %d allocating refcount btree reserve blocks.", error);
 	}
 
 	return 0;
@@ -1007,6 +1012,11 @@ xfs_unmountfs(
 	__uint64_t		resblks;
 	int			error;
 
+	error = xfs_refcountbt_free_reserve_pool(mp);
+	if (error)
+		xfs_warn(mp,
+	"Error %d freeing refcount btree reserve blocks.", error);
+
 	cancel_delayed_work_sync(&mp->m_eofblocks_work);
 
 	xfs_qm_unmount_quotas(mp);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index caed8d3..75ff130 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -321,6 +321,7 @@ typedef struct xfs_perag {
 
 	/* reference count */
 	__uint8_t	pagf_refcount_level;
+	struct xfs_perag_pool	*pagf_refcountbt_pool;
 } xfs_perag_t;
 
 extern void	xfs_uuid_table_free(void);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index ede714b..87f44a2 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -46,6 +46,7 @@
 #include "xfs_quota.h"
 #include "xfs_sysfs.h"
 #include "xfs_reflink.h"
+#include "xfs_refcount_btree.h"
 
 #include <linux/namei.h>
 #include <linux/init.h>
@@ -1264,6 +1265,12 @@ xfs_fs_remount(
 		 */
 		xfs_restore_resvblks(mp);
 		xfs_log_work_queue(mp);
+
+		/* Save space for the refcount btree! */
+		error = xfs_refcountbt_alloc_reserve_pool(mp);
+		if (error)
+			xfs_err(mp,
+	"Error %d allocating refcount btree reserve blocks.", error);
 	}
 
 	/* rw -> ro */
@@ -1275,6 +1282,13 @@ xfs_fs_remount(
 		 * reserve pool size so that if we get remounted rw, we can
 		 * return it to the same size.
 		 */
+
+		/* Save space for the refcount btree! */
+		error = xfs_refcountbt_free_reserve_pool(mp);
+		if (error)
+			xfs_warn(mp,
+	"Error %d freeing refcount btree reserve blocks.", error);
+
 		xfs_save_resvblks(mp);
 		xfs_quiesce_attr(mp);
 		mp->m_flags |= XFS_MOUNT_RDONLY;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [PATCH 76/76] xfs: try to prevent failed rmap btree expansion during cow
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (74 preceding siblings ...)
  2015-12-19  9:04 ` [PATCH 75/76] xfs: preallocate blocks for worst-case refcount btree expansion Darrick J. Wong
@ 2015-12-19  9:04 ` Darrick J. Wong
  2015-12-20 14:02 ` [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Brian Foster
  76 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2015-12-19  9:04 UTC (permalink / raw)
  To: david, darrick.wong; +Cc: xfs

It's possible, if reflink and rmap are both enabled, for CoW to create
a lot of small mappings that cause an rmap btree expansion to run out
of space.  If both are enabled, keep the AGFL fully stocked at all
times, refuse to CoW if we start to run out of space in the AGFL, and
hope that's good enough.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_alloc.c |    8 ++++++++
 fs/xfs/xfs_reflink.c      |   20 ++++++++++++++++++++
 2 files changed, 28 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 637e2e7..70aad82 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2000,6 +2000,14 @@ xfs_alloc_min_freelist(
 		min_free += min_t(unsigned int,
 				  pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
 				  mp->m_ag_maxlevels);
+	/*
+	 * The rmapbt can explode if we have reflink enabled and someone
+	 * creates a lot of small mappings... so max out the AGFL to try
+	 * to prevent that.
+	 */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb) &&
+	    xfs_sb_version_hasreflink(&mp->m_sb))
+		min_free = XFS_AGFL_SIZE(mp) - min_free;
 
 	return min_free;
 }
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 299a957..da4a715 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -49,6 +49,7 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_reflink.h"
 #include "xfs_iomap.h"
+#include "xfs_sb.h"
 
 /*
  * Copy on Write of Shared Blocks
@@ -191,6 +192,8 @@ xfs_reflink_reserve_cow_extent(
 	struct xfs_bmbt_irec	*irec)
 {
 	struct xfs_bmbt_irec	rec;
+	struct xfs_perag	*pag;
+	unsigned int		resblks;
 	xfs_agnumber_t		agno;
 	xfs_agblock_t		agbno;
 	xfs_extlen_t		aglen;
@@ -225,6 +228,23 @@ xfs_reflink_reserve_cow_extent(
 			goto advloop;
 		}
 
+		/*
+		 * XXX: The rmapbt can ENOSPC if we do evil things like
+		 * CoW every other block of a long shared segment.  Don't
+		 * let the CoW happen if we lack sufficient space in the AG.
+		 * We overinflate the AGFL in the hopes that this will never
+		 * have to happen.
+		 */
+		if (xfs_sb_version_hasrmapbt(&ip->i_mount->m_sb)) {
+			pag = xfs_perag_get(ip->i_mount, agno);
+			resblks = pag->pagf_flcount;
+			xfs_perag_put(pag);
+			if (resblks < 50) {
+				error = -ENOSPC;
+				break;
+			}
+		}
+
 		/* Add as much as we can to the cow fork */
 		foffset = XFS_FSB_TO_B(ip->i_mount, lblk + fbno - agbno);
 		fsize = XFS_FSB_TO_B(ip->i_mount, flen);

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support
  2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
                   ` (75 preceding siblings ...)
  2015-12-19  9:04 ` [PATCH 76/76] xfs: try to prevent failed rmap btree expansion during cow Darrick J. Wong
@ 2015-12-20 14:02 ` Brian Foster
  2016-01-04 23:59   ` Darrick J. Wong
  76 siblings, 1 reply; 102+ messages in thread
From: Brian Foster @ 2015-12-20 14:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Sat, Dec 19, 2015 at 12:56:23AM -0800, Darrick J. Wong wrote:
> Hi all,
> 
...
> Fixed since RFCv3:
> 
>  * The reflink and dedupe ioctls are being hoisted to the VFS, as
>    provided in the first few patches.  Patch 81 connects to this
>    functionality.
> 
>  * Copy on write has been rewritten for v4.  We now use the existing
>    delayed allocation mechanism to coalesce writes together, deferring
>    allocation until writeout time.  This enables CoW to make better
>    block placement decisions and significantly reduces overhead.
>    CoW is still pretty slow, but not as slow as before.
> 
>  * Direct IO CoW has been implemented using the same mechanism as
>    above, but modified to perform the allocation and remapping right
>    then and there.  Throughput is much higher than pushing data
>    through the page cache CoW.  (It's the same mechanism, but we're
>    playing with chunks bigger than a single memory page.)
> 
>  * CoW ENOSPC works correctly now, except in the pathological case
>    that the AG fills up and the rmap btree cannot expand.  That will
>    be addressed for v5.
> 
>  * fallocate will now unshare blocks to prevent future ENOSPC, as
>    you'd expect.
> 
>  * refcount btree blocks are preallocated at mount time to prevent
>    ENOSPC while trying to expand the tree.  This also has the effect
>    of grouping the btree blocks together, which can speed up CoW
>    remapping.
> 

Can you elaborate on how these blocks are preallocated? E.g., is the
tree "preconstructed" in some sense? However that is done, is this the
anticipated solution or a temporary workaround..?

Also, shouldn't the enospc condition be handled by the agfl? I take it
there is something going on here that renders that solution flawed, so
I'm just curious what it is.

(Sorry if this is all explained elsewhere, but I haven't yet had a
chance to take a close enough look at this feature..).

Brian

> Issues: 
> 
>  * The extent swapping ioctl still allocates a bigger fixed-size
>    transaction.  That's most likely a stupid thing to do, so getting a
>    better grip on how the journalling code works and auditing all the
>    new transaction users will have to happen.  Right now it mostly
>    gets lucky.
> 
>  * EFI tracking for the allocated-but-not-yet-mapped blocks is
>    nonexistant.  A crash will leak them.
> 
>  * ENOSPC while expanding the rmap btree can crash the FS.  For now we
>    work around this problem by making the AGFL as big as possible,
>    failing CoW attempts with ENOSPC if there aren't enough AGFL blocks
>    available, and hoping that doesn't actually happen.
> 
> If you're going to start using this mess, you probably ought to just
> pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
> There are also updates for xfs-docs[4] and man-pages[5].
> 
> The patches have been xfstested with x64, i386, and ppc64; while in
> general the tests run to completion, there are still periodic bugs
> that will be addressed by the next RFC.  There's a persistent crash on
> arm64 and ppc64el that I haven't been able to triage.
> 
> This is an extraordinary way to eat your data.  Enjoy! 
> Comments and questions are, as always, welcome.
> 
> --D
> 
> [1] https://github.com/djwong/linux/tree/for-dave
> [2] https://github.com/djwong/xfsprogs/tree/for-dave
> [3] https://github.com/djwong/xfstests/tree/for-dave
> [4] https://github.com/djwong/xfs-documentation/tree/for-dave
> [5] https://github.com/djwong/man-pages/commits/for-mtk
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 03/76] libxfs: refactor the btree size calculator code
  2015-12-19  8:56 ` [PATCH 03/76] libxfs: refactor the btree size calculator code Darrick J. Wong
@ 2015-12-20 20:39   ` Dave Chinner
  2016-01-04 22:06     ` Darrick J. Wong
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Chinner @ 2015-12-20 20:39 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Sat, Dec 19, 2015 at 12:56:42AM -0800, Darrick J. Wong wrote:
> Create a macro to generate btree height calculator functions.
> This will be used (much) later when we get to the refcount
> btree.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
....
> +/* btree size calculator templates */
> +#define DECLARE_BTREE_SIZE_FN(btree) \
> +xfs_filblks_t xfs_##btree##_calc_btree_size(struct xfs_mount *mp, \
> +		unsigned long len);
> +
> +#define DEFINE_BTREE_SIZE_FN(btree, limitfield, maxlevels) \
> +xfs_filblks_t \
> +xfs_##btree##_calc_btree_size( \
> +	struct xfs_mount	*mp, \
> +	unsigned long		len) \
> +{ \
> +	int			level; \
> +	int			maxrecs; \
> +	xfs_filblks_t		rval; \
> +\
> +	maxrecs = mp->limitfield[0]; \
> +	for (level = 0, rval = 0; level < maxlevels; level++) { \
> +		len += maxrecs - 1; \
> +		do_div(len, maxrecs); \
> +		rval += len; \
> +		if (len == 1) \
> +			return rval + maxlevels - \
> +				level - 1; \
> +		if (level == 0) \
> +			maxrecs = mp->limitfield[1]; \
> +	} \
> +	return rval; \
> +}

I really don't like using macros like this. The code becomes hard to
debug, hard to edit, the functions don't show up in grep/cscope,
etc.

A helper function like this:

xfs_filblks_t
xfs_btree_calc_size(
	struct xfs_mount        *mp,
	int			*limits,
	int			maxlevels,
	unsigned long           len)
{
	int			level;
	int			maxrecs;
	xfs_filblks_t		rval;

	maxrecs = limits[0];
	for (level = 0, rval = 0; level < maxlevels; level++) {
		len += maxrecs - 1;
		do_div(len, maxrecs);
		rval += len;
		if (len == 1)
			return rval + maxlevels - level - 1;
		if (level == 0)
			maxrecs = limits[1];
	}
	return rval;
}

will work just as well when wrapped with the btree specific calling
function and that will have none of the problems using a macro to
build the functions has...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 02/76] xfs: fix log ticket type printing
  2015-12-19  8:56 ` [PATCH 02/76] xfs: fix log ticket type printing Darrick J. Wong
@ 2016-01-03 12:13   ` Christoph Hellwig
  2016-01-03 21:29     ` Dave Chinner
  0 siblings, 1 reply; 102+ messages in thread
From: Christoph Hellwig @ 2016-01-03 12:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Sat, Dec 19, 2015 at 12:56:36AM -0800, Darrick J. Wong wrote:
> Update the log ticket reservation type printing code to reflect
> all the types of log tickets, to avoid incorrect debug output and
> avoid running off the end of the array.

This looks okay, but I think we should use name array initializers
to make sure it doesn't got out of sync ever.

Also I don't think we'd ever run past the array as the users
does a size check.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 05/76] libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
  2015-12-19  8:56 ` [PATCH 05/76] libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct Darrick J. Wong
@ 2016-01-03 12:15   ` Christoph Hellwig
  2016-01-04 22:12     ` Darrick J. Wong
  0 siblings, 1 reply; 102+ messages in thread
From: Christoph Hellwig @ 2016-01-03 12:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Sat, Dec 19, 2015 at 12:56:55AM -0800, Darrick J. Wong wrote:
> Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
> inside it, gcc will quietly round the structure size up to the nearest
> 64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
> macro returning incorrect results for v5 filesystems on 64-bit
> machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
> will see garbage in AGFL item 119 and complain.
> 
> Therefore, tell gcc not to pad the structure so that the AGFL size
> calculation is correct.

Do you have a testcase for this?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 34/76] xfs: implement XFS_IOC_SWAPEXT when rmap btree is enabled
  2015-12-19  9:00 ` [PATCH 34/76] xfs: implement " Darrick J. Wong
@ 2016-01-03 12:17   ` Christoph Hellwig
  2016-01-04 23:40     ` Darrick J. Wong
  0 siblings, 1 reply; 102+ messages in thread
From: Christoph Hellwig @ 2016-01-03 12:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Sat, Dec 19, 2015 at 01:00:08AM -0800, Darrick J. Wong wrote:
> Implement extent swapping when reverse-mapping is enabled.

Can you just fold this into the previous patch?  There seem to be a lot
of patches in the series that would benefit from merging, but this
is the one that really screams for it :)

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 35/76] libxfs: refactor short btree block verification
  2015-12-19  9:00 ` [PATCH 35/76] libxfs: refactor short btree block verification Darrick J. Wong
@ 2016-01-03 12:18   ` Christoph Hellwig
  2016-01-03 21:30     ` Dave Chinner
  0 siblings, 1 reply; 102+ messages in thread
From: Christoph Hellwig @ 2016-01-03 12:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

> +/**
> + * xfs_btree_sblock_v5hdr_verify() -- verify the v5 fields of a short-format
> + *				      btree block
> + *
> + * @bp: buffer containing the btree block
> + * @max_recs: pointer to the m_*_mxr max records field in the xfs mount
> + * @pag_max_level: pointer to the per-ag max level field
> + */

We're so far avoided using kerneldoc comments and they silly super
verbosity in XFS.  It might be good to tone this down to the normal
XFS level (same for some other patches).

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 57/76] xfs: allocate delayed extents in CoW fork
  2015-12-19  9:02 ` [PATCH 57/76] xfs: allocate " Darrick J. Wong
@ 2016-01-03 12:20   ` Christoph Hellwig
  2016-01-05  1:13     ` Darrick J. Wong
  2016-01-09  9:59   ` Darrick J. Wong
  1 sibling, 1 reply; 102+ messages in thread
From: Christoph Hellwig @ 2016-01-03 12:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

I really don't like the separate XFS_IOEND_COW flag.  This should
be an ioend type just like XFS_IO_DELALLOC, XFS_IO_UNWRITTEN and
XFS_IO_OVERWRITE.  We might need some work in the direct I/O
completions for that to really work, though.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 02/76] xfs: fix log ticket type printing
  2016-01-03 12:13   ` Christoph Hellwig
@ 2016-01-03 21:29     ` Dave Chinner
  2016-01-04 19:57       ` Darrick J. Wong
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Chinner @ 2016-01-03 21:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, Darrick J. Wong

On Sun, Jan 03, 2016 at 04:13:53AM -0800, Christoph Hellwig wrote:
> On Sat, Dec 19, 2015 at 12:56:36AM -0800, Darrick J. Wong wrote:
> > Update the log ticket reservation type printing code to reflect
> > all the types of log tickets, to avoid incorrect debug output and
> > avoid running off the end of the array.
> 
> This looks okay, but I think we should use name array initializers
> to make sure it doesn't got out of sync ever.

Good idea, separate patch is fine.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 35/76] libxfs: refactor short btree block verification
  2016-01-03 12:18   ` Christoph Hellwig
@ 2016-01-03 21:30     ` Dave Chinner
  0 siblings, 0 replies; 102+ messages in thread
From: Dave Chinner @ 2016-01-03 21:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs, Darrick J. Wong

On Sun, Jan 03, 2016 at 04:18:32AM -0800, Christoph Hellwig wrote:
> > +/**
> > + * xfs_btree_sblock_v5hdr_verify() -- verify the v5 fields of a short-format
> > + *				      btree block
> > + *
> > + * @bp: buffer containing the btree block
> > + * @max_recs: pointer to the m_*_mxr max records field in the xfs mount
> > + * @pag_max_level: pointer to the per-ag max level field
> > + */
> 
> We're so far avoided using kerneldoc comments and they silly super
> verbosity in XFS.  It might be good to tone this down to the normal
> XFS level (same for some other patches).

I'll do that before I commit it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 02/76] xfs: fix log ticket type printing
  2016-01-03 21:29     ` Dave Chinner
@ 2016-01-04 19:57       ` Darrick J. Wong
  0 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-04 19:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Mon, Jan 04, 2016 at 08:29:05AM +1100, Dave Chinner wrote:
> On Sun, Jan 03, 2016 at 04:13:53AM -0800, Christoph Hellwig wrote:
> > On Sat, Dec 19, 2015 at 12:56:36AM -0800, Darrick J. Wong wrote:
> > > Update the log ticket reservation type printing code to reflect
> > > all the types of log tickets, to avoid incorrect debug output and
> > > avoid running off the end of the array.
> > 
> > This looks okay, but I think we should use name array initializers
> > to make sure it doesn't got out of sync ever.
> 
> Good idea, separate patch is fine.

Will do.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 03/76] libxfs: refactor the btree size calculator code
  2015-12-20 20:39   ` Dave Chinner
@ 2016-01-04 22:06     ` Darrick J. Wong
  0 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-04 22:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Dec 21, 2015 at 07:39:28AM +1100, Dave Chinner wrote:
> On Sat, Dec 19, 2015 at 12:56:42AM -0800, Darrick J. Wong wrote:
> > Create a macro to generate btree height calculator functions.
> > This will be used (much) later when we get to the refcount
> > btree.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ....
> > +/* btree size calculator templates */
> > +#define DECLARE_BTREE_SIZE_FN(btree) \
> > +xfs_filblks_t xfs_##btree##_calc_btree_size(struct xfs_mount *mp, \
> > +		unsigned long len);
> > +
> > +#define DEFINE_BTREE_SIZE_FN(btree, limitfield, maxlevels) \
> > +xfs_filblks_t \
> > +xfs_##btree##_calc_btree_size( \
> > +	struct xfs_mount	*mp, \
> > +	unsigned long		len) \
> > +{ \
> > +	int			level; \
> > +	int			maxrecs; \
> > +	xfs_filblks_t		rval; \
> > +\
> > +	maxrecs = mp->limitfield[0]; \
> > +	for (level = 0, rval = 0; level < maxlevels; level++) { \
> > +		len += maxrecs - 1; \
> > +		do_div(len, maxrecs); \
> > +		rval += len; \
> > +		if (len == 1) \
> > +			return rval + maxlevels - \
> > +				level - 1; \
> > +		if (level == 0) \
> > +			maxrecs = mp->limitfield[1]; \
> > +	} \
> > +	return rval; \
> > +}
> 
> I really don't like using macros like this. The code becomes hard to
> debug, hard to edit, the functions don't show up in grep/cscope,
> etc.
> 
> A helper function like this:
> 
> xfs_filblks_t
> xfs_btree_calc_size(
> 	struct xfs_mount        *mp,
> 	int			*limits,
> 	int			maxlevels,
> 	unsigned long           len)
> {
> 	int			level;
> 	int			maxrecs;
> 	xfs_filblks_t		rval;
> 
> 	maxrecs = limits[0];
> 	for (level = 0, rval = 0; level < maxlevels; level++) {
> 		len += maxrecs - 1;
> 		do_div(len, maxrecs);
> 		rval += len;
> 		if (len == 1)
> 			return rval + maxlevels - level - 1;
> 		if (level == 0)
> 			maxrecs = limits[1];
> 	}
> 	return rval;
> }
> 
> will work just as well when wrapped with the btree specific calling
> function and that will have none of the problems using a macro to
> build the functions has...

Will change to use a helper function.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 05/76] libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
  2016-01-03 12:15   ` Christoph Hellwig
@ 2016-01-04 22:12     ` Darrick J. Wong
  2016-01-04 23:23       ` Darrick J. Wong
  2016-01-04 23:51       ` Dave Chinner
  0 siblings, 2 replies; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-04 22:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Jan 03, 2016 at 04:15:25AM -0800, Christoph Hellwig wrote:
> On Sat, Dec 19, 2015 at 12:56:55AM -0800, Darrick J. Wong wrote:
> > Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
> > inside it, gcc will quietly round the structure size up to the nearest
> > 64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
> > macro returning incorrect results for v5 filesystems on 64-bit
> > machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
> > will see garbage in AGFL item 119 and complain.
> > 
> > Therefore, tell gcc not to pad the structure so that the AGFL size
> > calculation is correct.
> 
> Do you have a testcase for this?

Not much aside from:

0. Build kernel/xfsprogs with RFCv4 patches on a 64bit machine.
1. Build kernel/xfsprogs with RFCv4 patches on a 32bit machine.
2. Format a XFS with reflink and rmap on a 64-bit machine, so that the AGFL
   size is maximized.
3. Mount FS and create a reflinked file.
4. Unmount and xfs_repair with the 32-bit build.

I guess we could create a program that compares all the known sizeof(struct
xfs_disk_object) values against known good values and stuff that into the
xfsprogs build process.

--D

> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 05/76] libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
  2016-01-04 22:12     ` Darrick J. Wong
@ 2016-01-04 23:23       ` Darrick J. Wong
  2016-01-04 23:51       ` Dave Chinner
  1 sibling, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-04 23:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Jan 04, 2016 at 02:12:18PM -0800, Darrick J. Wong wrote:
> On Sun, Jan 03, 2016 at 04:15:25AM -0800, Christoph Hellwig wrote:
> > On Sat, Dec 19, 2015 at 12:56:55AM -0800, Darrick J. Wong wrote:
> > > Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
> > > inside it, gcc will quietly round the structure size up to the nearest
> > > 64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
> > > macro returning incorrect results for v5 filesystems on 64-bit
> > > machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
> > > will see garbage in AGFL item 119 and complain.
> > > 
> > > Therefore, tell gcc not to pad the structure so that the AGFL size
> > > calculation is correct.
> > 
> > Do you have a testcase for this?
> 
> Not much aside from:
> 
> 0. Build kernel/xfsprogs with RFCv4 patches on a 64bit machine.
> 1. Build kernel/xfsprogs with RFCv4 patches on a 32bit machine.
> 2. Format a XFS with reflink and rmap on a 64-bit machine, so that the AGFL
>    size is maximized.
> 3. Mount FS and create a reflinked file.
> 4. Unmount and xfs_repair with the 32-bit build.
> 
> I guess we could create a program that compares all the known sizeof(struct
> xfs_disk_object) values against known good values and stuff that into the
> xfsprogs build process.

I created a patch that uses BUILD_BUG_ON to stop kbuild if the sizes of the
on-disk data structures don't match known values.  I've also ported it to
xfsprogs, so we can check both.

--D

> 
> --D
> 
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 34/76] xfs: implement XFS_IOC_SWAPEXT when rmap btree is enabled
  2016-01-03 12:17   ` Christoph Hellwig
@ 2016-01-04 23:40     ` Darrick J. Wong
  2016-01-05  2:41       ` Dave Chinner
  0 siblings, 1 reply; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-04 23:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Jan 03, 2016 at 04:17:28AM -0800, Christoph Hellwig wrote:
> On Sat, Dec 19, 2015 at 01:00:08AM -0800, Darrick J. Wong wrote:
> > Implement extent swapping when reverse-mapping is enabled.
> 
> Can you just fold this into the previous patch?  There seem to be a lot
> of patches in the series that would benefit from merging, but this
> is the one that really screams for it :)

The only reason I'm keeping Dave's original rmap patches separate is because
the last time I checked with Dave, he said that he'd made a few bugfixes to his
patchset that hadn't yet been sent out, so I'd like to merge those fixes into
the original patches before starting any combining work.  It also might be more
convenient (for him, anyway) to keep my rmapbt changes separate for review
purposes.

At this point, though, I've made so many changes that I'd like to pull in
Dave's changes and combine them with mine into fewer patches.  (AFAIK, Dave
hasn't sent any further reverse-mapping patches since 3 June 2015.)

--D

> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 05/76] libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct
  2016-01-04 22:12     ` Darrick J. Wong
  2016-01-04 23:23       ` Darrick J. Wong
@ 2016-01-04 23:51       ` Dave Chinner
  1 sibling, 0 replies; 102+ messages in thread
From: Dave Chinner @ 2016-01-04 23:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs

On Mon, Jan 04, 2016 at 02:12:18PM -0800, Darrick J. Wong wrote:
> On Sun, Jan 03, 2016 at 04:15:25AM -0800, Christoph Hellwig wrote:
> > On Sat, Dec 19, 2015 at 12:56:55AM -0800, Darrick J. Wong wrote:
> > > Because struct xfs_agfl is 36 bytes long and has a 64-bit integer
> > > inside it, gcc will quietly round the structure size up to the nearest
> > > 64 bits -- in this case, 40 bytes.  This results in the XFS_AGFL_SIZE
> > > macro returning incorrect results for v5 filesystems on 64-bit
> > > machines (118 items instead of 119).  As a result, a 32-bit xfs_repair
> > > will see garbage in AGFL item 119 and complain.
> > > 
> > > Therefore, tell gcc not to pad the structure so that the AGFL size
> > > calculation is correct.
> > 
> > Do you have a testcase for this?
> 
> Not much aside from:
> 
> 0. Build kernel/xfsprogs with RFCv4 patches on a 64bit machine.
> 1. Build kernel/xfsprogs with RFCv4 patches on a 32bit machine.
> 2. Format a XFS with reflink and rmap on a 64-bit machine, so that the AGFL
>    size is maximized.
> 3. Mount FS and create a reflinked file.
> 4. Unmount and xfs_repair with the 32-bit build.
> 
> I guess we could create a program that compares all the known sizeof(struct
> xfs_disk_object) values against known good values and stuff that into the
> xfsprogs build process.

There's an xfstest for that: xfs/122. It notruns on my systems since
the big xfsprogs build/header rework, and I haven't found the time
to work out what it needs to run again. Also, I don't think it
covers the AGFL structure, because that it relatively new and the
test doesn't check any of the v5 specific structures....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support
  2015-12-20 14:02 ` [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Brian Foster
@ 2016-01-04 23:59   ` Darrick J. Wong
  2016-01-05 12:42     ` Brian Foster
  0 siblings, 1 reply; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-04 23:59 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Sun, Dec 20, 2015 at 09:02:54AM -0500, Brian Foster wrote:
> On Sat, Dec 19, 2015 at 12:56:23AM -0800, Darrick J. Wong wrote:
> > Hi all,
> > 
> ...
> > Fixed since RFCv3:
> > 
> >  * The reflink and dedupe ioctls are being hoisted to the VFS, as
> >    provided in the first few patches.  Patch 81 connects to this
> >    functionality.
> > 
> >  * Copy on write has been rewritten for v4.  We now use the existing
> >    delayed allocation mechanism to coalesce writes together, deferring
> >    allocation until writeout time.  This enables CoW to make better
> >    block placement decisions and significantly reduces overhead.
> >    CoW is still pretty slow, but not as slow as before.
> > 
> >  * Direct IO CoW has been implemented using the same mechanism as
> >    above, but modified to perform the allocation and remapping right
> >    then and there.  Throughput is much higher than pushing data
> >    through the page cache CoW.  (It's the same mechanism, but we're
> >    playing with chunks bigger than a single memory page.)
> > 
> >  * CoW ENOSPC works correctly now, except in the pathological case
> >    that the AG fills up and the rmap btree cannot expand.  That will
> >    be addressed for v5.
> > 
> >  * fallocate will now unshare blocks to prevent future ENOSPC, as
> >    you'd expect.
> > 
> >  * refcount btree blocks are preallocated at mount time to prevent
> >    ENOSPC while trying to expand the tree.  This also has the effect
> >    of grouping the btree blocks together, which can speed up CoW
> >    remapping.
> > 
> 
> Can you elaborate on how these blocks are preallocated? E.g., is the
> tree "preconstructed" in some sense? However that is done, is this the
> anticipated solution or a temporary workaround..?
> 
> Also, shouldn't the enospc condition be handled by the agfl? I take it
> there is something going on here that renders that solution flawed, so
> I'm just curious what it is.
> 
> (Sorry if this is all explained elsewhere, but I haven't yet had a
> chance to take a close enough look at this feature..).

Reference count btree blocks aren't allocated from the AGFL; they're allocated
from the free space in the same manner as the inobt, per a review comment from
Dave a looong time ago. :) 

As such, we can get ourselves into the nasty situation where every block in the
AG has been allocated to file data.  If we then see a bunch of reference count
changes that are scattered around the AG, the reference count btree has to
expand to hold all the new records... but there isn't space, and the operation
fails.  Given that we know the maximum possible size of the refcount btree
(it's 0.3% of the AG size with 4k blocks), I figured it was easy enough to
avoid ENOSPC for reflink operations.

I've temporarily fixed this by adding code that figures out how many blocks we
need if the reference count btree has to have a unique record for every block
in the AG and holding that many blocks until either they're allocated to the
refcount btree or freed at umount time.  Right now it's a temporary fix (if the
FS crashes, the reserved blocks are lost) but it wouldn't be difficult for the
FS to make a permanent reservation that's recorded on disk somehow.  But that's
involves writing things to disk + making xfsprogs understand the reservation;
let's see what people say about the reserved pool idea at all.

Does that make sense? :)

--D

> 
> Brian
> 
> > Issues: 
> > 
> >  * The extent swapping ioctl still allocates a bigger fixed-size
> >    transaction.  That's most likely a stupid thing to do, so getting a
> >    better grip on how the journalling code works and auditing all the
> >    new transaction users will have to happen.  Right now it mostly
> >    gets lucky.
> > 
> >  * EFI tracking for the allocated-but-not-yet-mapped blocks is
> >    nonexistant.  A crash will leak them.
> > 
> >  * ENOSPC while expanding the rmap btree can crash the FS.  For now we
> >    work around this problem by making the AGFL as big as possible,
> >    failing CoW attempts with ENOSPC if there aren't enough AGFL blocks
> >    available, and hoping that doesn't actually happen.
> > 
> > If you're going to start using this mess, you probably ought to just
> > pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
> > There are also updates for xfs-docs[4] and man-pages[5].
> > 
> > The patches have been xfstested with x64, i386, and ppc64; while in
> > general the tests run to completion, there are still periodic bugs
> > that will be addressed by the next RFC.  There's a persistent crash on
> > arm64 and ppc64el that I haven't been able to triage.
> > 
> > This is an extraordinary way to eat your data.  Enjoy! 
> > Comments and questions are, as always, welcome.
> > 
> > --D
> > 
> > [1] https://github.com/djwong/linux/tree/for-dave
> > [2] https://github.com/djwong/xfsprogs/tree/for-dave
> > [3] https://github.com/djwong/xfstests/tree/for-dave
> > [4] https://github.com/djwong/xfs-documentation/tree/for-dave
> > [5] https://github.com/djwong/man-pages/commits/for-mtk
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 57/76] xfs: allocate delayed extents in CoW fork
  2016-01-03 12:20   ` Christoph Hellwig
@ 2016-01-05  1:13     ` Darrick J. Wong
  0 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-05  1:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Jan 03, 2016 at 04:20:58AM -0800, Christoph Hellwig wrote:
> I really don't like the separate XFS_IOEND_COW flag.  This should
> be an ioend type just like XFS_IO_DELALLOC, XFS_IO_UNWRITTEN and
> XFS_IO_OVERWRITE.  We might need some work in the direct I/O
> completions for that to really work, though.

The COW completion is an ioend flag (instead of an ioend type) *because* of how
I thought directio completions work. :)

It looked to me like you could only have one ioend for a dio, and that there
wasn't a clean way to make XFS have multiple ioends for a particular dio.
Since the remap operation remaps everything it finds in the COW fork for a
given part of a file and leaves the rest alone, it seemed natural to encode COW
as a flag in the ioend, leave the ioend type as XFS_IO_OVERWRITE, and try to
have as few ioends as possible.  For dio writes I can COW write almost as fast
as a regular overwrite.

I further observed with the RFCv3 COW (which did have a separate ioend type for
COW) that random writes had a tendency to create a /lot/ of ioends.  If you let
all these writes accumulate and then flushed them out with a single fsync, the
scatteredness of the COW blocks caused the creation of a lot of ioends, which
then caused a thundering herd when all those ioends had to be pushed separately
through the workqueue.  Changing the ioend handling to the way it is in RFCv4
greatly reduced the number of ioends that have to be processed, and
dramatically improved write speeds.

(I also fixed a bug where it would create ioends for each COW block in a
streaming write; both fixes make v4's COW 5x faster on a ramdisk than v3's.)

Did I overlook some subtlety with ioends?  So far it's worked fine as-is.

--D

> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 34/76] xfs: implement XFS_IOC_SWAPEXT when rmap btree is enabled
  2016-01-04 23:40     ` Darrick J. Wong
@ 2016-01-05  2:41       ` Dave Chinner
  2016-01-07  0:09         ` Darrick J. Wong
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Chinner @ 2016-01-05  2:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, xfs

On Mon, Jan 04, 2016 at 03:40:33PM -0800, Darrick J. Wong wrote:
> On Sun, Jan 03, 2016 at 04:17:28AM -0800, Christoph Hellwig wrote:
> > On Sat, Dec 19, 2015 at 01:00:08AM -0800, Darrick J. Wong wrote:
> > > Implement extent swapping when reverse-mapping is enabled.
> > 
> > Can you just fold this into the previous patch?  There seem to be a lot
> > of patches in the series that would benefit from merging, but this
> > is the one that really screams for it :)
> 
> The only reason I'm keeping Dave's original rmap patches separate is because
> the last time I checked with Dave, he said that he'd made a few bugfixes to his
> patchset that hadn't yet been sent out, so I'd like to merge those fixes into
> the original patches before starting any combining work.

I haven't had a chance to sort any of those out fully, so it's
probably best if you combine all your patches to minimise the
separation and I'll send delta patches (if still relevant) as I work
through it all in the next couple of weeks...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support
  2016-01-04 23:59   ` Darrick J. Wong
@ 2016-01-05 12:42     ` Brian Foster
  2016-01-06  2:04       ` Darrick J. Wong
  0 siblings, 1 reply; 102+ messages in thread
From: Brian Foster @ 2016-01-05 12:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Mon, Jan 04, 2016 at 03:59:51PM -0800, Darrick J. Wong wrote:
> On Sun, Dec 20, 2015 at 09:02:54AM -0500, Brian Foster wrote:
> > On Sat, Dec 19, 2015 at 12:56:23AM -0800, Darrick J. Wong wrote:
> > > Hi all,
> > > 
> > ...
> > > Fixed since RFCv3:
> > > 
> > >  * The reflink and dedupe ioctls are being hoisted to the VFS, as
> > >    provided in the first few patches.  Patch 81 connects to this
> > >    functionality.
> > > 
> > >  * Copy on write has been rewritten for v4.  We now use the existing
> > >    delayed allocation mechanism to coalesce writes together, deferring
> > >    allocation until writeout time.  This enables CoW to make better
> > >    block placement decisions and significantly reduces overhead.
> > >    CoW is still pretty slow, but not as slow as before.
> > > 
> > >  * Direct IO CoW has been implemented using the same mechanism as
> > >    above, but modified to perform the allocation and remapping right
> > >    then and there.  Throughput is much higher than pushing data
> > >    through the page cache CoW.  (It's the same mechanism, but we're
> > >    playing with chunks bigger than a single memory page.)
> > > 
> > >  * CoW ENOSPC works correctly now, except in the pathological case
> > >    that the AG fills up and the rmap btree cannot expand.  That will
> > >    be addressed for v5.
> > > 
> > >  * fallocate will now unshare blocks to prevent future ENOSPC, as
> > >    you'd expect.
> > > 
> > >  * refcount btree blocks are preallocated at mount time to prevent
> > >    ENOSPC while trying to expand the tree.  This also has the effect
> > >    of grouping the btree blocks together, which can speed up CoW
> > >    remapping.
> > > 
> > 
> > Can you elaborate on how these blocks are preallocated? E.g., is the
> > tree "preconstructed" in some sense? However that is done, is this the
> > anticipated solution or a temporary workaround..?
> > 
> > Also, shouldn't the enospc condition be handled by the agfl? I take it
> > there is something going on here that renders that solution flawed, so
> > I'm just curious what it is.
> > 
> > (Sorry if this is all explained elsewhere, but I haven't yet had a
> > chance to take a close enough look at this feature..).
> 
> Reference count btree blocks aren't allocated from the AGFL; they're allocated
> from the free space in the same manner as the inobt, per a review comment from
> Dave a looong time ago. :) 
> 

Ah, Ok.

> As such, we can get ourselves into the nasty situation where every block in the
> AG has been allocated to file data.  If we then see a bunch of reference count
> changes that are scattered around the AG, the reference count btree has to
> expand to hold all the new records... but there isn't space, and the operation
> fails.  Given that we know the maximum possible size of the refcount btree
> (it's 0.3% of the AG size with 4k blocks), I figured it was easy enough to
> avoid ENOSPC for reflink operations.
> 

Sounds reasonable.

> I've temporarily fixed this by adding code that figures out how many blocks we
> need if the reference count btree has to have a unique record for every block
> in the AG and holding that many blocks until either they're allocated to the
> refcount btree or freed at umount time.  Right now it's a temporary fix (if the
> FS crashes, the reserved blocks are lost) but it wouldn't be difficult for the
> FS to make a permanent reservation that's recorded on disk somehow.  But that's
> involves writing things to disk + making xfsprogs understand the reservation;
> let's see what people say about the reserved pool idea at all.
> 
> Does that make sense? :)
> 

Yep, it sounds sort of like the reserve pool mechanism used to protect
against ENOSPC when freeing blocks. Curious... why are the reserved
blocks lost on fs crash? Wouldn't they be reserved again on the
subsequent mount?

Thanks for the explanation...

Brian

> --D
> 
> > 
> > Brian
> > 
> > > Issues: 
> > > 
> > >  * The extent swapping ioctl still allocates a bigger fixed-size
> > >    transaction.  That's most likely a stupid thing to do, so getting a
> > >    better grip on how the journalling code works and auditing all the
> > >    new transaction users will have to happen.  Right now it mostly
> > >    gets lucky.
> > > 
> > >  * EFI tracking for the allocated-but-not-yet-mapped blocks is
> > >    nonexistant.  A crash will leak them.
> > > 
> > >  * ENOSPC while expanding the rmap btree can crash the FS.  For now we
> > >    work around this problem by making the AGFL as big as possible,
> > >    failing CoW attempts with ENOSPC if there aren't enough AGFL blocks
> > >    available, and hoping that doesn't actually happen.
> > > 
> > > If you're going to start using this mess, you probably ought to just
> > > pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
> > > There are also updates for xfs-docs[4] and man-pages[5].
> > > 
> > > The patches have been xfstested with x64, i386, and ppc64; while in
> > > general the tests run to completion, there are still periodic bugs
> > > that will be addressed by the next RFC.  There's a persistent crash on
> > > arm64 and ppc64el that I haven't been able to triage.
> > > 
> > > This is an extraordinary way to eat your data.  Enjoy! 
> > > Comments and questions are, as always, welcome.
> > > 
> > > --D
> > > 
> > > [1] https://github.com/djwong/linux/tree/for-dave
> > > [2] https://github.com/djwong/xfsprogs/tree/for-dave
> > > [3] https://github.com/djwong/xfstests/tree/for-dave
> > > [4] https://github.com/djwong/xfs-documentation/tree/for-dave
> > > [5] https://github.com/djwong/man-pages/commits/for-mtk
> > > 
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@oss.sgi.com
> > > http://oss.sgi.com/mailman/listinfo/xfs
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support
  2016-01-05 12:42     ` Brian Foster
@ 2016-01-06  2:04       ` Darrick J. Wong
  2016-01-06  3:44         ` Dave Chinner
  0 siblings, 1 reply; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-06  2:04 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Tue, Jan 05, 2016 at 07:42:26AM -0500, Brian Foster wrote:
> On Mon, Jan 04, 2016 at 03:59:51PM -0800, Darrick J. Wong wrote:
> > On Sun, Dec 20, 2015 at 09:02:54AM -0500, Brian Foster wrote:
> > > On Sat, Dec 19, 2015 at 12:56:23AM -0800, Darrick J. Wong wrote:
> > > > Hi all,
> > > > 
> > > ...
> > > > Fixed since RFCv3:
> > > > 
> > > >  * The reflink and dedupe ioctls are being hoisted to the VFS, as
> > > >    provided in the first few patches.  Patch 81 connects to this
> > > >    functionality.
> > > > 
> > > >  * Copy on write has been rewritten for v4.  We now use the existing
> > > >    delayed allocation mechanism to coalesce writes together, deferring
> > > >    allocation until writeout time.  This enables CoW to make better
> > > >    block placement decisions and significantly reduces overhead.
> > > >    CoW is still pretty slow, but not as slow as before.
> > > > 
> > > >  * Direct IO CoW has been implemented using the same mechanism as
> > > >    above, but modified to perform the allocation and remapping right
> > > >    then and there.  Throughput is much higher than pushing data
> > > >    through the page cache CoW.  (It's the same mechanism, but we're
> > > >    playing with chunks bigger than a single memory page.)
> > > > 
> > > >  * CoW ENOSPC works correctly now, except in the pathological case
> > > >    that the AG fills up and the rmap btree cannot expand.  That will
> > > >    be addressed for v5.
> > > > 
> > > >  * fallocate will now unshare blocks to prevent future ENOSPC, as
> > > >    you'd expect.
> > > > 
> > > >  * refcount btree blocks are preallocated at mount time to prevent
> > > >    ENOSPC while trying to expand the tree.  This also has the effect
> > > >    of grouping the btree blocks together, which can speed up CoW
> > > >    remapping.
> > > > 
> > > 
> > > Can you elaborate on how these blocks are preallocated? E.g., is the
> > > tree "preconstructed" in some sense? However that is done, is this the
> > > anticipated solution or a temporary workaround..?
> > > 
> > > Also, shouldn't the enospc condition be handled by the agfl? I take it
> > > there is something going on here that renders that solution flawed, so
> > > I'm just curious what it is.
> > > 
> > > (Sorry if this is all explained elsewhere, but I haven't yet had a
> > > chance to take a close enough look at this feature..).
> > 
> > Reference count btree blocks aren't allocated from the AGFL; they're allocated
> > from the free space in the same manner as the inobt, per a review comment from
> > Dave a looong time ago. :) 
> > 
> 
> Ah, Ok.
> 
> > As such, we can get ourselves into the nasty situation where every block in the
> > AG has been allocated to file data.  If we then see a bunch of reference count
> > changes that are scattered around the AG, the reference count btree has to
> > expand to hold all the new records... but there isn't space, and the operation
> > fails.  Given that we know the maximum possible size of the refcount btree
> > (it's 0.3% of the AG size with 4k blocks), I figured it was easy enough to
> > avoid ENOSPC for reflink operations.
> > 
> 
> Sounds reasonable.
> 
> > I've temporarily fixed this by adding code that figures out how many blocks we
> > need if the reference count btree has to have a unique record for every block
> > in the AG and holding that many blocks until either they're allocated to the
> > refcount btree or freed at umount time.  Right now it's a temporary fix (if the
> > FS crashes, the reserved blocks are lost) but it wouldn't be difficult for the
> > FS to make a permanent reservation that's recorded on disk somehow.  But that's
> > involves writing things to disk + making xfsprogs understand the reservation;
> > let's see what people say about the reserved pool idea at all.
> > 
> > Does that make sense? :)
> > 
> 
> Yep, it sounds sort of like the reserve pool mechanism used to protect
> against ENOSPC when freeing blocks. Curious... why are the reserved
> blocks lost on fs crash? Wouldn't they be reserved again on the
> subsequent mount?

They will, but the pre-crash reservation isn't (yet) written down anywhere on
disk.

Thank /you/ for having a look at the reflink code! :)

--D

> 
> Thanks for the explanation...
> 
> Brian
> 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > > Issues: 
> > > > 
> > > >  * The extent swapping ioctl still allocates a bigger fixed-size
> > > >    transaction.  That's most likely a stupid thing to do, so getting a
> > > >    better grip on how the journalling code works and auditing all the
> > > >    new transaction users will have to happen.  Right now it mostly
> > > >    gets lucky.
> > > > 
> > > >  * EFI tracking for the allocated-but-not-yet-mapped blocks is
> > > >    nonexistant.  A crash will leak them.
> > > > 
> > > >  * ENOSPC while expanding the rmap btree can crash the FS.  For now we
> > > >    work around this problem by making the AGFL as big as possible,
> > > >    failing CoW attempts with ENOSPC if there aren't enough AGFL blocks
> > > >    available, and hoping that doesn't actually happen.
> > > > 
> > > > If you're going to start using this mess, you probably ought to just
> > > > pull from my github trees for kernel[1], xfsprogs[2], and xfstests[3].
> > > > There are also updates for xfs-docs[4] and man-pages[5].
> > > > 
> > > > The patches have been xfstested with x64, i386, and ppc64; while in
> > > > general the tests run to completion, there are still periodic bugs
> > > > that will be addressed by the next RFC.  There's a persistent crash on
> > > > arm64 and ppc64el that I haven't been able to triage.
> > > > 
> > > > This is an extraordinary way to eat your data.  Enjoy! 
> > > > Comments and questions are, as always, welcome.
> > > > 
> > > > --D
> > > > 
> > > > [1] https://github.com/djwong/linux/tree/for-dave
> > > > [2] https://github.com/djwong/xfsprogs/tree/for-dave
> > > > [3] https://github.com/djwong/xfstests/tree/for-dave
> > > > [4] https://github.com/djwong/xfs-documentation/tree/for-dave
> > > > [5] https://github.com/djwong/man-pages/commits/for-mtk
> > > > 
> > > > _______________________________________________
> > > > xfs mailing list
> > > > xfs@oss.sgi.com
> > > > http://oss.sgi.com/mailman/listinfo/xfs
> > > 
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@oss.sgi.com
> > > http://oss.sgi.com/mailman/listinfo/xfs
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support
  2016-01-06  2:04       ` Darrick J. Wong
@ 2016-01-06  3:44         ` Dave Chinner
  2016-02-02 23:06           ` Darrick J. Wong
  0 siblings, 1 reply; 102+ messages in thread
From: Dave Chinner @ 2016-01-06  3:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, xfs

On Tue, Jan 05, 2016 at 06:04:40PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 05, 2016 at 07:42:26AM -0500, Brian Foster wrote:
> > On Mon, Jan 04, 2016 at 03:59:51PM -0800, Darrick J. Wong wrote:
> > > I've temporarily fixed this by adding code that figures out how many blocks we
> > > need if the reference count btree has to have a unique record for every block
> > > in the AG and holding that many blocks until either they're allocated to the
> > > refcount btree or freed at umount time.  Right now it's a temporary fix (if the
> > > FS crashes, the reserved blocks are lost) but it wouldn't be difficult for the
> > > FS to make a permanent reservation that's recorded on disk somehow.  But that's
> > > involves writing things to disk + making xfsprogs understand the reservation;
> > > let's see what people say about the reserved pool idea at all.
> > > 
> > > Does that make sense? :)
> > > 
> > 
> > Yep, it sounds sort of like the reserve pool mechanism used to protect
> > against ENOSPC when freeing blocks. Curious... why are the reserved
> > blocks lost on fs crash? Wouldn't they be reserved again on the
> > subsequent mount?
> 
> They will, but the pre-crash reservation isn't (yet) written down anywhere on
> disk.

Does it need to be? The global reserve pool is not "written down"
anywhere. When we mount, we pull the reserve from the global free
space accounting. Hence we given ENOSPC when we've used "total fs
blocks - reserve pool blocks" in memory, and so if we crash we've
still got at least that many free blocks on disk. hence on mount we
re-reserve those blocks in memory and everything is back to the way
it was prior to the crash.

I suspect the per-ag code is a bit different, but it should be able
to work the same way. i.e. when we initialise the per-ag structure,
we pull the reserve from the free block count in the AG, as well as
from the global free space count. Then we will get correct global
ENOSPC detection, as well as leave enough space free in each AG as
we scan and skip them during allocation...

As long as the per-ag reservation is restored during mount before we
do EFI recovery processing (i.e. between the two log recovery
phases), it should restore the reserve pool to the same size as it
was before a crash occurred....

Unless, of course, I'm missing something newly introduced by the
reflink code...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 34/76] xfs: implement XFS_IOC_SWAPEXT when rmap btree is enabled
  2016-01-05  2:41       ` Dave Chinner
@ 2016-01-07  0:09         ` Darrick J. Wong
  0 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-07  0:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Tue, Jan 05, 2016 at 01:41:30PM +1100, Dave Chinner wrote:
> On Mon, Jan 04, 2016 at 03:40:33PM -0800, Darrick J. Wong wrote:
> > On Sun, Jan 03, 2016 at 04:17:28AM -0800, Christoph Hellwig wrote:
> > > On Sat, Dec 19, 2015 at 01:00:08AM -0800, Darrick J. Wong wrote:
> > > > Implement extent swapping when reverse-mapping is enabled.
> > > 
> > > Can you just fold this into the previous patch?  There seem to be a lot
> > > of patches in the series that would benefit from merging, but this
> > > is the one that really screams for it :)
> > 
> > The only reason I'm keeping Dave's original rmap patches separate is because
> > the last time I checked with Dave, he said that he'd made a few bugfixes to his
> > patchset that hadn't yet been sent out, so I'd like to merge those fixes into
> > the original patches before starting any combining work.
> 
> I haven't had a chance to sort any of those out fully, so it's
> probably best if you combine all your patches to minimise the
> separation and I'll send delta patches (if still relevant) as I work
> through it all in the next couple of weeks...

Ok, I'll start combining them as I remove all the kerneldoc comments.  I think
I've gotten all the "CoW reservations left over at unmount" bugs out, so I'll
probably have a slimmer patchset in a week or two.

(Or everything will go nuts on some weird arch like arm64 and kablooie...)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 60/76] xfs: implement CoW for directio writes
  2015-12-19  9:03 ` [PATCH 60/76] xfs: implement CoW for directio writes Darrick J. Wong
@ 2016-01-08  9:34   ` Darrick J. Wong
  0 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-08  9:34 UTC (permalink / raw)
  To: david; +Cc: xfs

On Sat, Dec 19, 2015 at 01:03:07AM -0800, Darrick J. Wong wrote:
> For O_DIRECT writes to shared blocks, we have to CoW them just like
> we would with buffered writes.  For writes that are not block-aligned,
> just bounce them to the page cache.
> 
> For block-aligned writes, however, we can do better than that.  Use
> the same mechanisms that we employ for buffered CoW to set up a
> delalloc reservation, allocate all the blocks at once, issue the
> writes against the new blocks and use the same ioend functions to
> remap the blocks after the write.  This should be fairly performant.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_aops.c    |   63 +++++++++++++++++++++++++---
>  fs/xfs/xfs_file.c    |   12 ++++-
>  fs/xfs/xfs_reflink.c |  114 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_reflink.h |    5 ++
>  4 files changed, 186 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 8101d6a..4b77d07 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1339,7 +1339,8 @@ xfs_map_direct(
>  	struct buffer_head	*bh_result,
>  	struct xfs_bmbt_irec	*imap,
>  	xfs_off_t		offset,
> -	bool			dax_fault)
> +	bool			dax_fault,
> +	bool			is_cow)
>  {
>  	struct xfs_ioend	*ioend;
>  	xfs_off_t		size = bh_result->b_size;
> @@ -1368,20 +1369,23 @@ xfs_map_direct(
>  
>  		if (type == XFS_IO_UNWRITTEN && type != ioend->io_type)
>  			ioend->io_type = XFS_IO_UNWRITTEN;
> +		if (is_cow)
> +			ioend->io_flags |= XFS_IOEND_COW;
>  
>  		trace_xfs_gbmap_direct_update(XFS_I(inode), ioend->io_offset,
>  					      ioend->io_size, ioend->io_type,
>  					      imap);
> -	} else if (type == XFS_IO_UNWRITTEN ||
> +	} else if (type == XFS_IO_UNWRITTEN || is_cow ||
>  		   offset + size > i_size_read(inode) ||
>  		   offset + size < 0) {
>  		ioend = xfs_alloc_ioend(inode, type);
>  		ioend->io_offset = offset;
>  		ioend->io_size = size;
> +		if (is_cow)
> +			ioend->io_flags |= XFS_IOEND_COW;

NAK.  Further testing demonstrates incorrect remapping when the DIO CoW write
fails because xfs_end_io_direct_write doesn't know if the write succeeded or
not.  xfs_vm_do_dio does, however, so we should move the remapping/cancelling
code to that function and avoid using the ioend entirely for directio cow.

--D

>  
>  		bh_result->b_private = ioend;
>  		set_buffer_defer_completion(bh_result);
> -
>  		trace_xfs_gbmap_direct_new(XFS_I(inode), offset, size, type,
>  					   imap);
>  	} else {
> @@ -1449,6 +1453,8 @@ __xfs_get_blocks(
>  	xfs_off_t		offset;
>  	ssize_t			size;
>  	int			new = 0;
> +	bool			is_cow = false;
> +	bool			need_alloc = false;
>  
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
> @@ -1480,8 +1486,15 @@ __xfs_get_blocks(
>  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + size);
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  
> -	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> -				&imap, &nimaps, XFS_BMAPI_ENTIRE);
> +	if (create && direct)
> +		is_cow = xfs_reflink_is_cow_pending(ip, offset);
> +	if (is_cow)
> +		error = xfs_reflink_find_cow_mapping(ip, offset, &imap,
> +						     &need_alloc);
> +	else
> +		error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> +					&imap, &nimaps, XFS_BMAPI_ENTIRE);
> +	ASSERT(!need_alloc);
>  	if (error)
>  		goto out_unlock;
>  
> @@ -1553,13 +1566,33 @@ __xfs_get_blocks(
>  	if (imap.br_startblock != HOLESTARTBLOCK &&
>  	    imap.br_startblock != DELAYSTARTBLOCK &&
>  	    (create || !ISUNWRITTEN(&imap))) {
> +		if (create && direct && !is_cow) {
> +			bool shared;
> +
> +			error = xfs_reflink_irec_is_shared(ip, &imap, &shared);
> +			if (error)
> +				return error;
> +			/*
> +			 * Are we doing a DIO write to a shared block?  In
> +			 * the ideal world we at least would fork full blocks,
> +			 * but for now just fall back to buffered mode.  Yuck.
> +			 * Use -EREMCHG ("remote address changed") to signal
> +			 * this, since in general XFS doesn't do this sort of
> +			 * fallback.
> +			 */
> +			if (shared) {
> +				trace_xfs_reflink_bounce_dio_write(ip, &imap);
> +				return -EREMCHG;
> +			}
> +		}
> +
>  		xfs_map_buffer(inode, bh_result, &imap, offset);
>  		if (ISUNWRITTEN(&imap))
>  			set_buffer_unwritten(bh_result);
>  		/* direct IO needs special help */
>  		if (create && direct)
>  			xfs_map_direct(inode, bh_result, &imap, offset,
> -				       dax_fault);
> +				       dax_fault, is_cow);
>  	}
>  
>  	/*
> @@ -1738,6 +1771,24 @@ xfs_vm_do_dio(
>  	int			flags)
>  {
>  	struct block_device	*bdev;
> +	loff_t			end;
> +	loff_t			block_mask;
> +	int			error;
> +
> +	/* If this is a block-aligned directio CoW, remap immediately. */
> +	end = offset + iov_iter_count(iter);
> +	block_mask = (1 << inode->i_blkbits) - 1;
> +	if (xfs_is_reflink_inode(XFS_I(inode)) && iov_iter_rw(iter) == WRITE &&
> +	    !(offset & block_mask) && !(end & block_mask)) {
> +		error = xfs_reflink_reserve_cow_range(XFS_I(inode), offset,
> +				iov_iter_count(iter));
> +		if (error)
> +			return error;
> +		error = xfs_reflink_allocate_cow_range(XFS_I(inode), offset,
> +				iov_iter_count(iter));
> +		if (error)
> +			return error;
> +	}
>  
>  	if (IS_DAX(inode))
>  		return dax_do_io(iocb, inode, iter, offset,
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 0fbcb38..31b002e 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -892,10 +892,18 @@ xfs_file_write_iter(
>  	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
>  		return -EIO;
>  
> -	if ((iocb->ki_flags & IOCB_DIRECT) || IS_DAX(inode))
> +	/*
> +	 * Allow DIO to fall back to buffered *only* in the case that we're
> +	 * doing a reflink CoW.
> +	 */
> +	if ((iocb->ki_flags & IOCB_DIRECT) || IS_DAX(inode)) {
>  		ret = xfs_file_dio_aio_write(iocb, from);
> -	else
> +		if (ret == -EREMCHG)
> +			goto buffered;
> +	} else {
> +buffered:
>  		ret = xfs_file_buffered_aio_write(iocb, from);
> +	}
>  
>  	if (ret > 0) {
>  		ssize_t err;
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index 9c1c262..8594bc4 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -134,6 +134,56 @@ xfs_trim_extent(
>  	}
>  }
>  
> +/**
> + * xfs_reflink_irec_is_shared() -- Are any of the blocks in this mapping
> + *				   shared?
> + *
> + * @ip: XFS inode object
> + * @irec: the fileoff:fsblock mapping that we might fork
> + * @shared: set to true if the mapping is shared.
> + */
> +int
> +xfs_reflink_irec_is_shared(
> +	struct xfs_inode	*ip,
> +	struct xfs_bmbt_irec	*irec,
> +	bool			*shared)
> +{
> +	xfs_agnumber_t		agno;
> +	xfs_agblock_t		agbno;
> +	xfs_extlen_t		aglen;
> +	xfs_agblock_t		fbno;
> +	xfs_extlen_t		flen;
> +	int			error = 0;
> +
> +	/* Holes, unwritten, and delalloc extents cannot be shared */
> +	if (!xfs_is_reflink_inode(ip) ||
> +	    ISUNWRITTEN(irec) ||
> +	    irec->br_startblock == HOLESTARTBLOCK ||
> +	    irec->br_startblock == DELAYSTARTBLOCK) {
> +		*shared = false;
> +		return 0;
> +	}
> +
> +	trace_xfs_reflink_irec_is_shared(ip, irec);
> +
> +	agno = XFS_FSB_TO_AGNO(ip->i_mount, irec->br_startblock);
> +	agbno = XFS_FSB_TO_AGBNO(ip->i_mount, irec->br_startblock);
> +	aglen = irec->br_blockcount;
> +
> +	/* Are there any shared blocks here? */
> +	error = xfs_refcount_find_shared(ip->i_mount, agno, agbno,
> +			aglen, &fbno, &flen, false);
> +	if (error)
> +		return error;
> +	if (flen == 0) {
> +		*shared = false;
> +		return 0;
> +	}
> +
> +	*shared = true;
> +	return 0;
> +}
> +
>  /* Find the shared ranges under an irec, and set up delalloc extents. */
>  STATIC int
>  xfs_reflink_reserve_cow_extent(
> @@ -251,6 +301,70 @@ xfs_reflink_reserve_cow_range(
>  }
>  
>  /**
> + * xfs_reflink_allocate_cow_range() -- Allocate blocks to satisfy a copy on
> + *				       write operation.
> + * @ip: XFS inode.
> + * @pos: file offset to start CoWing.
> + * @len: number of bytes to CoW.
> + */
> +int
> +xfs_reflink_allocate_cow_range(
> +	struct xfs_inode	*ip,
> +	xfs_off_t		pos,
> +	xfs_off_t		len)
> +{
> +	struct xfs_ifork	*ifp;
> +	struct xfs_bmbt_rec_host	*gotp;
> +	struct xfs_bmbt_irec	imap;
> +	int			error = 0;
> +	xfs_fileoff_t		start_lblk;
> +	xfs_fileoff_t		end_lblk;
> +	xfs_extnum_t		idx;
> +
> +	if (!xfs_is_reflink_inode(ip))
> +		return 0;
> +
> +	trace_xfs_reflink_allocate_cow_range(ip, len, pos, 0);
> +
> +	start_lblk = XFS_B_TO_FSBT(ip->i_mount, pos);
> +	end_lblk = XFS_B_TO_FSB(ip->i_mount, pos + len);
> +	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> +
> +	gotp = xfs_iext_bno_to_ext(ifp, start_lblk, &idx);
> +	while (gotp) {
> +		xfs_bmbt_get_all(gotp, &imap);
> +
> +		if (imap.br_startoff >= end_lblk)
> +			break;
> +		if (!isnullstartblock(imap.br_startblock))
> +			goto advloop;
> +		xfs_trim_extent(&imap, start_lblk, end_lblk - start_lblk);
> +		trace_xfs_reflink_allocate_cow_extent(ip, &imap);
> +
> +		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +		error = xfs_iomap_write_allocate(ip, XFS_COW_FORK,
> +				XFS_FSB_TO_B(ip->i_mount, imap.br_startoff +
> +						imap.br_blockcount - 1), &imap);
> +		xfs_ilock(ip, XFS_ILOCK_EXCL);
> +		if (error)
> +			break;
> +advloop:
> +		/* Roll on... */
> +		idx++;
> +		if (idx >= ifp->if_bytes / sizeof(xfs_bmbt_rec_t))
> +			break;
> +		gotp = xfs_iext_get_ext(ifp, idx);
> +	}
> +
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +
> +	if (error)
> +		trace_xfs_reflink_allocate_cow_range_error(ip, error, _RET_IP_);
> +	return error;
> +}
> +
> +/**
>   * xfs_reflink_is_cow_pending() -- Determine if CoW is pending for a given
>   *				   file and offset.
>   *
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 8ec1ebb..d356c00 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -18,8 +18,13 @@
>  #ifndef __XFS_REFLINK_H
>  #define __XFS_REFLINK_H 1
>  
> +extern int xfs_reflink_irec_is_shared(struct xfs_inode *ip,
> +		struct xfs_bmbt_irec *imap, bool *shared);
> +
>  extern int xfs_reflink_reserve_cow_range(struct xfs_inode *ip, xfs_off_t pos,
>  		xfs_off_t len);
> +extern int xfs_reflink_allocate_cow_range(struct xfs_inode *ip, xfs_off_t pos,
> +		xfs_off_t len);
>  extern bool xfs_reflink_is_cow_pending(struct xfs_inode *ip, xfs_off_t offset);
>  extern int xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
>  		struct xfs_bmbt_irec *imap, bool *need_alloc);
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH 57/76] xfs: allocate delayed extents in CoW fork
  2015-12-19  9:02 ` [PATCH 57/76] xfs: allocate " Darrick J. Wong
  2016-01-03 12:20   ` Christoph Hellwig
@ 2016-01-09  9:59   ` Darrick J. Wong
  1 sibling, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2016-01-09  9:59 UTC (permalink / raw)
  To: david; +Cc: xfs

On Sat, Dec 19, 2015 at 01:02:48AM -0800, Darrick J. Wong wrote:
> Modify the writepage handler to find and convert pending delalloc
> extents to real allocations.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_aops.c    |   63 +++++++++++++++++++++++++++++++++++---------
>  fs/xfs/xfs_aops.h    |   10 +++++++
>  fs/xfs/xfs_reflink.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_reflink.h |    3 ++
>  4 files changed, 135 insertions(+), 13 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 13629d2..7179b25 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -285,7 +285,8 @@ xfs_map_blocks(
>  	loff_t			offset,
>  	struct xfs_bmbt_irec	*imap,
>  	int			type,
> -	int			nonblocking)
> +	int			nonblocking,
> +	bool			is_cow)
>  {
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
> @@ -294,10 +295,15 @@ xfs_map_blocks(
>  	int			error = 0;
>  	int			bmapi_flags = XFS_BMAPI_ENTIRE;
>  	int			nimaps = 1;
> +	int			whichfork;
> +	bool			need_alloc;
>  
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> +	whichfork = (is_cow ? XFS_COW_FORK : XFS_DATA_FORK);
> +	need_alloc = (type == XFS_IO_DELALLOC);
> +
>  	if (type == XFS_IO_UNWRITTEN)
>  		bmapi_flags |= XFS_BMAPI_IGSTATE;
>  
> @@ -315,16 +321,21 @@ xfs_map_blocks(
>  		count = mp->m_super->s_maxbytes - offset;
>  	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)offset + count);
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
> -	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> -				imap, &nimaps, bmapi_flags);
> +
> +	if (is_cow)
> +		error = xfs_reflink_find_cow_mapping(ip, offset, imap,
> +						     &need_alloc);
> +	else
> +		error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
> +				       imap, &nimaps, bmapi_flags);

This isn't correct -- for an XFS_IO_OVERWRITE, the extent returned in imap
could have some (shared) blocks with a corresponding delalloc reservation in
the CoW fork.  If that's the case, then feeding the full extent to
xfs_cluster_write results in the ioend flag being set, which can cause the
ioend processing to crash because it'll see the delalloc reservation and freak
out.  Therefore, in the non-cow reflink-inode overwrite case, we have to find
trim the extent to just before the next CoW reservation.  That forces a call
to xfs_vm_writepage at the start of the reservation, which is the only way
that the delalloc->regular conversion can happen in the CoW fork.

(This fixes a rare crash in generic/163 and a regular corruption failure in
generic/898.)

--D

>  	xfs_iunlock(ip, XFS_ILOCK_SHARED);
>  
>  	if (error)
>  		return error;
>  
> -	if (type == XFS_IO_DELALLOC &&
> +	if (need_alloc &&
>  	    (!nimaps || isnullstartblock(imap->br_startblock))) {
> -		error = xfs_iomap_write_allocate(ip, XFS_DATA_FORK, offset,
> +		error = xfs_iomap_write_allocate(ip, whichfork, offset,
>  				imap);
>  		if (!error)
>  			trace_xfs_map_blocks_alloc(ip, offset, count, type,
> @@ -575,7 +586,8 @@ xfs_add_to_ioend(
>  	xfs_off_t		offset,
>  	unsigned int		type,
>  	xfs_ioend_t		**result,
> -	int			need_ioend)
> +	int			need_ioend,
> +	bool			is_cow)
>  {
>  	xfs_ioend_t		*ioend = *result;
>  
> @@ -593,6 +605,8 @@ xfs_add_to_ioend(
>  		ioend->io_buffer_tail->b_private = bh;
>  		ioend->io_buffer_tail = bh;
>  	}
> +	if (is_cow)
> +		ioend->io_flags |= XFS_IOEND_COW;
>  
>  	bh->b_private = NULL;
>  	ioend->io_size += bh->b_size;
> @@ -703,6 +717,8 @@ xfs_convert_page(
>  	int			len, page_dirty;
>  	int			count = 0, done = 0, uptodate = 1;
>   	xfs_off_t		offset = page_offset(page);
> +	bool			is_cow;
> +	struct xfs_inode	*ip = XFS_I(inode);
>  
>  	if (page->index != tindex)
>  		goto fail;
> @@ -786,6 +802,15 @@ xfs_convert_page(
>  			else
>  				type = XFS_IO_OVERWRITE;
>  
> +			/* Figure out if CoW is pending for the other pages. */
> +			is_cow = false;
> +			if (type == XFS_IO_OVERWRITE &&
> +			    xfs_sb_version_hasreflink(&ip->i_mount->m_sb)) {
> +				xfs_ilock(ip, XFS_ILOCK_SHARED);
> +				is_cow = xfs_reflink_is_cow_pending(ip, offset);
> +				xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +			}
> +
>  			/*
>  			 * imap should always be valid because of the above
>  			 * partial page end_offset check on the imap.
> @@ -793,10 +818,10 @@ xfs_convert_page(
>  			ASSERT(xfs_imap_valid(inode, imap, offset));
>  
>  			lock_buffer(bh);
> -			if (type != XFS_IO_OVERWRITE)
> +			if (type != XFS_IO_OVERWRITE || is_cow)
>  				xfs_map_at_offset(inode, bh, imap, offset);
>  			xfs_add_to_ioend(inode, bh, offset, type,
> -					 ioendp, done);
> +					 ioendp, done, is_cow);
>  
>  			page_dirty--;
>  			count++;
> @@ -959,6 +984,8 @@ xfs_vm_writepage(
>  	int			err, imap_valid = 0, uptodate = 1;
>  	int			count = 0;
>  	int			nonblocking = 0;
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	bool			was_cow, is_cow;
>  
>  	trace_xfs_writepage(inode, page, 0, 0);
>  
> @@ -1057,6 +1084,7 @@ xfs_vm_writepage(
>  	bh = head = page_buffers(page);
>  	offset = page_offset(page);
>  	type = XFS_IO_OVERWRITE;
> +	was_cow = false;
>  
>  	if (wbc->sync_mode == WB_SYNC_NONE)
>  		nonblocking = 1;
> @@ -1069,6 +1097,7 @@ xfs_vm_writepage(
>  		if (!buffer_uptodate(bh))
>  			uptodate = 0;
>  
> +		is_cow = false;
>  		/*
>  		 * set_page_dirty dirties all buffers in a page, independent
>  		 * of their state.  The dirty state however is entirely
> @@ -1091,7 +1120,15 @@ xfs_vm_writepage(
>  				imap_valid = 0;
>  			}
>  		} else if (buffer_uptodate(bh)) {
> -			if (type != XFS_IO_OVERWRITE) {
> +			if (xfs_sb_version_hasreflink(&ip->i_mount->m_sb)) {
> +				xfs_ilock(ip, XFS_ILOCK_SHARED);
> +				is_cow = xfs_reflink_is_cow_pending(ip,
> +						offset);
> +				xfs_iunlock(ip, XFS_ILOCK_SHARED);
> +			}
> +
> +			if (type != XFS_IO_OVERWRITE ||
> +			    is_cow != was_cow) {
>  				type = XFS_IO_OVERWRITE;
>  				imap_valid = 0;
>  			}
> @@ -1121,23 +1158,23 @@ xfs_vm_writepage(
>  			 */
>  			new_ioend = 1;
>  			err = xfs_map_blocks(inode, offset, &imap, type,
> -					     nonblocking);
> +					     nonblocking, is_cow);
>  			if (err)
>  				goto error;
>  			imap_valid = xfs_imap_valid(inode, &imap, offset);
>  		}
>  		if (imap_valid) {
>  			lock_buffer(bh);
> -			if (type != XFS_IO_OVERWRITE)
> +			if (type != XFS_IO_OVERWRITE || is_cow)
>  				xfs_map_at_offset(inode, bh, &imap, offset);
>  			xfs_add_to_ioend(inode, bh, offset, type, &ioend,
> -					 new_ioend);
> +					 new_ioend, is_cow);
>  			count++;
>  		}
>  
>  		if (!iohead)
>  			iohead = ioend;
> -
> +		was_cow = is_cow;
>  	} while (offset += len, ((bh = bh->b_this_page) != head));
>  
>  	if (uptodate && bh == head)
> diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
> index f6ffc9a..ac59548 100644
> --- a/fs/xfs/xfs_aops.h
> +++ b/fs/xfs/xfs_aops.h
> @@ -34,6 +34,8 @@ enum {
>  	{ XFS_IO_UNWRITTEN,		"unwritten" }, \
>  	{ XFS_IO_OVERWRITE,		"overwrite" }
>  
> +#define XFS_IOEND_COW		(1)
> +
>  /*
>   * xfs_ioend struct manages large extent writes for XFS.
>   * It can manage several multi-page bio's at once.
> @@ -50,8 +52,16 @@ typedef struct xfs_ioend {
>  	xfs_off_t		io_offset;	/* offset in the file */
>  	struct work_struct	io_work;	/* xfsdatad work queue */
>  	struct xfs_trans	*io_append_trans;/* xact. for size update */
> +	unsigned long		io_flags;	/* status flags */
>  } xfs_ioend_t;
>  
> +static inline bool
> +xfs_ioend_is_cow(
> +	struct xfs_ioend	*ioend)
> +{
> +	return ioend->io_flags & XFS_IOEND_COW;
> +}
> +
>  extern const struct address_space_operations xfs_address_space_operations;
>  
>  int	xfs_get_blocks(struct inode *inode, sector_t offset,
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index fdc538e..65d4c2d 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -249,3 +249,75 @@ xfs_reflink_reserve_cow_range(
>  		trace_xfs_reflink_reserve_cow_range_error(ip, error, _RET_IP_);
>  	return error;
>  }
> +
> +/**
> + * xfs_reflink_is_cow_pending() -- Determine if CoW is pending for a given
> + *				   file and offset.
> + *
> + * @ip: XFS inode object.
> + * @offset: The file offset, in bytes.
> + */
> +bool
> +xfs_reflink_is_cow_pending(
> +	struct xfs_inode		*ip,
> +	xfs_off_t			offset)
> +{
> +	struct xfs_ifork		*ifp;
> +	struct xfs_bmbt_rec_host	*gotp;
> +	struct xfs_bmbt_irec		irec;
> +	xfs_fileoff_t			bno;
> +	xfs_extnum_t			idx;
> +
> +	if (!xfs_is_reflink_inode(ip))
> +		return false;
> +
> +	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> +	bno = XFS_B_TO_FSBT(ip->i_mount, offset);
> +	gotp = xfs_iext_bno_to_ext(ifp, bno, &idx);
> +
> +	if (!gotp)
> +		return false;
> +
> +	xfs_bmbt_get_all(gotp, &irec);
> +	if (bno >= irec.br_startoff + irec.br_blockcount ||
> +	    bno < irec.br_startoff)
> +		return false;
> +	return true;
> +}
> +
> +/**
> + * xfs_reflink_find_cow_mapping() -- Find the mapping for a CoW block.
> + *
> + * @ip: The XFS inode object.
> + * @offset: The file offset, in bytes.
> + * @imap: The mapping we're going to use for this block.
> + * @nimaps: Number of mappings we're returning.
> + */
> +int
> +xfs_reflink_find_cow_mapping(
> +	struct xfs_inode		*ip,
> +	xfs_off_t			offset,
> +	struct xfs_bmbt_irec		*imap,
> +	bool				*need_alloc)
> +{
> +	struct xfs_bmbt_irec		irec;
> +	struct xfs_ifork		*ifp;
> +	struct xfs_bmbt_rec_host	*gotp;
> +	xfs_fileoff_t			bno;
> +	xfs_extnum_t			idx;
> +
> +	/* Find the extent in the CoW fork. */
> +	ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> +	bno = XFS_B_TO_FSBT(ip->i_mount, offset);
> +	gotp = xfs_iext_bno_to_ext(ifp, bno, &idx);
> +	xfs_bmbt_get_all(gotp, &irec);
> +
> +	trace_xfs_reflink_find_cow_mapping(ip, offset, 1, XFS_IO_OVERWRITE,
> +			&irec);
> +
> +	/* If it's still delalloc, we must allocate later. */
> +	*imap = irec;
> +	*need_alloc = !!(isnullstartblock(irec.br_startblock));
> +
> +	return 0;
> +}
> diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
> index 41afdbe..1018ac9 100644
> --- a/fs/xfs/xfs_reflink.h
> +++ b/fs/xfs/xfs_reflink.h
> @@ -20,5 +20,8 @@
>  
>  extern int xfs_reflink_reserve_cow_range(struct xfs_inode *ip, xfs_off_t pos,
>  		xfs_off_t len);
> +extern bool xfs_reflink_is_cow_pending(struct xfs_inode *ip, xfs_off_t offset);
> +extern int xfs_reflink_find_cow_mapping(struct xfs_inode *ip, xfs_off_t offset,
> +		struct xfs_bmbt_irec *imap, bool *need_alloc);
>  
>  #endif /* __XFS_REFLINK_H */
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support
  2016-01-06  3:44         ` Dave Chinner
@ 2016-02-02 23:06           ` Darrick J. Wong
  0 siblings, 0 replies; 102+ messages in thread
From: Darrick J. Wong @ 2016-02-02 23:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, xfs

On Wed, Jan 06, 2016 at 02:44:15PM +1100, Dave Chinner wrote:
> On Tue, Jan 05, 2016 at 06:04:40PM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 05, 2016 at 07:42:26AM -0500, Brian Foster wrote:
> > > On Mon, Jan 04, 2016 at 03:59:51PM -0800, Darrick J. Wong wrote:
> > > > I've temporarily fixed this by adding code that figures out how many blocks we
> > > > need if the reference count btree has to have a unique record for every block
> > > > in the AG and holding that many blocks until either they're allocated to the
> > > > refcount btree or freed at umount time.  Right now it's a temporary fix (if the
> > > > FS crashes, the reserved blocks are lost) but it wouldn't be difficult for the
> > > > FS to make a permanent reservation that's recorded on disk somehow.  But that's
> > > > involves writing things to disk + making xfsprogs understand the reservation;
> > > > let's see what people say about the reserved pool idea at all.
> > > > 
> > > > Does that make sense? :)
> > > > 
> > > 
> > > Yep, it sounds sort of like the reserve pool mechanism used to protect
> > > against ENOSPC when freeing blocks. Curious... why are the reserved
> > > blocks lost on fs crash? Wouldn't they be reserved again on the
> > > subsequent mount?
> > 
> > They will, but the pre-crash reservation isn't (yet) written down anywhere on
> > disk.
> 
> Does it need to be? The global reserve pool is not "written down"
> anywhere. When we mount, we pull the reserve from the global free
> space accounting. Hence we given ENOSPC when we've used "total fs
> blocks - reserve pool blocks" in memory, and so if we crash we've
> still got at least that many free blocks on disk. hence on mount we
> re-reserve those blocks in memory and everything is back to the way
> it was prior to the crash.
> 
> I suspect the per-ag code is a bit different, but it should be able
> to work the same way. i.e. when we initialise the per-ag structure,
> we pull the reserve from the free block count in the AG, as well as
> from the global free space count. Then we will get correct global
> ENOSPC detection, as well as leave enough space free in each AG as
> we scan and skip them during allocation...
> 
> As long as the per-ag reservation is restored during mount before we
> do EFI recovery processing (i.e. between the two log recovery
> phases), it should restore the reserve pool to the same size as it
> was before a crash occurred....
> 
> Unless, of course, I'm missing something newly introduced by the
> reflink code...

Technically you were, but I've fixed the reservation code to exist purely as
in-core magic that works more or less how you outlined above.  No more on-disk
artifacts, no more need to write a persistence and recovery mechanism. :)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 102+ messages in thread

end of thread, other threads:[~2016-02-02 23:06 UTC | newest]

Thread overview: 102+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-19  8:56 [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Darrick J. Wong
2015-12-19  8:56 ` [PATCH 01/76] libxfs: make xfs_alloc_fix_freelist non-static Darrick J. Wong
2015-12-19  8:56 ` [PATCH 02/76] xfs: fix log ticket type printing Darrick J. Wong
2016-01-03 12:13   ` Christoph Hellwig
2016-01-03 21:29     ` Dave Chinner
2016-01-04 19:57       ` Darrick J. Wong
2015-12-19  8:56 ` [PATCH 03/76] libxfs: refactor the btree size calculator code Darrick J. Wong
2015-12-20 20:39   ` Dave Chinner
2016-01-04 22:06     ` Darrick J. Wong
2015-12-19  8:56 ` [PATCH 04/76] libxfs: use a convenience variable instead of open-coding the fork Darrick J. Wong
2015-12-19  8:56 ` [PATCH 05/76] libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct Darrick J. Wong
2016-01-03 12:15   ` Christoph Hellwig
2016-01-04 22:12     ` Darrick J. Wong
2016-01-04 23:23       ` Darrick J. Wong
2016-01-04 23:51       ` Dave Chinner
2015-12-19  8:57 ` [PATCH 06/76] xfs: introduce rmap btree definitions Darrick J. Wong
2015-12-19  8:57 ` [PATCH 07/76] xfs: add rmap btree stats infrastructure Darrick J. Wong
2015-12-19  8:57 ` [PATCH 08/76] xfs: rmap btree add more reserved blocks Darrick J. Wong
2015-12-19  8:57 ` [PATCH 09/76] xfs: add owner field to extent allocation and freeing Darrick J. Wong
2015-12-19  8:57 ` [PATCH 10/76] xfs: add extended " Darrick J. Wong
2015-12-19  8:57 ` [PATCH 11/76] xfs: introduce rmap extent operation stubs Darrick J. Wong
2015-12-19  8:57 ` [PATCH 12/76] xfs: extend rmap extent operation stubs to take full owner info Darrick J. Wong
2015-12-19  8:57 ` [PATCH 13/76] xfs: define the on-disk rmap btree format Darrick J. Wong
2015-12-19  8:57 ` [PATCH 14/76] xfs: enhance " Darrick J. Wong
2015-12-19  8:58 ` [PATCH 15/76] xfs: add rmap btree growfs support Darrick J. Wong
2015-12-19  8:58 ` [PATCH 16/76] xfs: enhance " Darrick J. Wong
2015-12-19  8:58 ` [PATCH 17/76] xfs: rmap btree transaction reservations Darrick J. Wong
2015-12-19  8:58 ` [PATCH 18/76] xfs: rmap btree requires more reserved free space Darrick J. Wong
2015-12-19  8:58 ` [PATCH 19/76] libxfs: fix min freelist length calculation Darrick J. Wong
2015-12-19  8:58 ` [PATCH 20/76] xfs: add rmap btree operations Darrick J. Wong
2015-12-19  8:58 ` [PATCH 21/76] xfs: enhance " Darrick J. Wong
2015-12-19  8:58 ` [PATCH 22/76] xfs: add an extent to the rmap btree Darrick J. Wong
2015-12-19  8:58 ` [PATCH 23/76] xfs: add tracepoints for the rmap-mirrors-bmbt functions Darrick J. Wong
2015-12-19  8:58 ` [PATCH 24/76] xfs: teach rmap_alloc how to deal with our larger rmap btree Darrick J. Wong
2015-12-19  8:59 ` [PATCH 25/76] xfs: remove an extent from the " Darrick J. Wong
2015-12-19  8:59 ` [PATCH 26/76] xfs: enhanced " Darrick J. Wong
2015-12-19  8:59 ` [PATCH 27/76] xfs: add rmap btree insert and delete helpers Darrick J. Wong
2015-12-19  8:59 ` [PATCH 28/76] xfs: piggyback rmapbt update intents in the bmap free structure Darrick J. Wong
2015-12-19  8:59 ` [PATCH 29/76] xfs: bmap btree changes should update rmap btree Darrick J. Wong
2015-12-19  8:59 ` [PATCH 30/76] xfs: add rmap btree geometry feature flag Darrick J. Wong
2015-12-19  8:59 ` [PATCH 31/76] xfs: add rmap btree block detection to log recovery Darrick J. Wong
2015-12-19  8:59 ` [PATCH 32/76] xfs: enable the rmap btree functionality Darrick J. Wong
2015-12-19  9:00 ` [PATCH 33/76] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled Darrick J. Wong
2015-12-19  9:00 ` [PATCH 34/76] xfs: implement " Darrick J. Wong
2016-01-03 12:17   ` Christoph Hellwig
2016-01-04 23:40     ` Darrick J. Wong
2016-01-05  2:41       ` Dave Chinner
2016-01-07  0:09         ` Darrick J. Wong
2015-12-19  9:00 ` [PATCH 35/76] libxfs: refactor short btree block verification Darrick J. Wong
2016-01-03 12:18   ` Christoph Hellwig
2016-01-03 21:30     ` Dave Chinner
2015-12-19  9:00 ` [PATCH 36/76] xfs: don't update rmapbt when fixing agfl Darrick J. Wong
2015-12-19  9:00 ` [PATCH 37/76] xfs: define tracepoints for refcount btree activities Darrick J. Wong
2015-12-19  9:00 ` [PATCH 38/76] xfs: introduce refcount btree definitions Darrick J. Wong
2015-12-19  9:00 ` [PATCH 39/76] xfs: add refcount btree stats infrastructure Darrick J. Wong
2015-12-19  9:00 ` [PATCH 40/76] xfs: refcount btree add more reserved blocks Darrick J. Wong
2015-12-19  9:00 ` [PATCH 41/76] xfs: define the on-disk refcount btree format Darrick J. Wong
2015-12-19  9:00 ` [PATCH 42/76] xfs: add refcount btree support to growfs Darrick J. Wong
2015-12-19  9:01 ` [PATCH 43/76] xfs: add refcount btree operations Darrick J. Wong
2015-12-19  9:01 ` [PATCH 44/76] libxfs: adjust refcount of an extent of blocks in refcount btree Darrick J. Wong
2015-12-19  9:01 ` [PATCH 45/76] libxfs: adjust refcount when unmapping file blocks Darrick J. Wong
2015-12-19  9:01 ` [PATCH 46/76] xfs: add refcount btree block detection to log recovery Darrick J. Wong
2015-12-19  9:01 ` [PATCH 47/76] xfs: refcount btree requires more reserved space Darrick J. Wong
2015-12-19  9:01 ` [PATCH 48/76] xfs: introduce reflink utility functions Darrick J. Wong
2015-12-19  9:01 ` [PATCH 49/76] xfs: define tracepoints for reflink activities Darrick J. Wong
2015-12-19  9:01 ` [PATCH 50/76] xfs: map an inode's offset to an exact physical block Darrick J. Wong
2015-12-19  9:02 ` [PATCH 51/76] xfs: add reflink feature flag to geometry Darrick J. Wong
2015-12-19  9:02 ` [PATCH 52/76] xfs: don't allow reflinked dir/dev/fifo/socket/pipe files Darrick J. Wong
2015-12-19  9:02 ` [PATCH 53/76] xfs: introduce the CoW fork Darrick J. Wong
2015-12-19  9:02 ` [PATCH 54/76] xfs: support bmapping delalloc extents in " Darrick J. Wong
2015-12-19  9:02 ` [PATCH 55/76] xfs: create delalloc extents in " Darrick J. Wong
2015-12-19  9:02 ` [PATCH 56/76] xfs: support allocating delayed " Darrick J. Wong
2015-12-19  9:02 ` [PATCH 57/76] xfs: allocate " Darrick J. Wong
2016-01-03 12:20   ` Christoph Hellwig
2016-01-05  1:13     ` Darrick J. Wong
2016-01-09  9:59   ` Darrick J. Wong
2015-12-19  9:02 ` [PATCH 58/76] xfs: support removing extents from " Darrick J. Wong
2015-12-19  9:03 ` [PATCH 59/76] xfs: move mappings from cow fork to data fork after copy-write Darrick J. Wong
2015-12-19  9:03 ` [PATCH 60/76] xfs: implement CoW for directio writes Darrick J. Wong
2016-01-08  9:34   ` Darrick J. Wong
2015-12-19  9:03 ` [PATCH 61/76] xfs: copy-on-write reflinked blocks when zeroing ranges of blocks Darrick J. Wong
2015-12-19  9:03 ` [PATCH 62/76] xfs: clear inode reflink flag when freeing blocks Darrick J. Wong
2015-12-19  9:03 ` [PATCH 63/76] xfs: cancel pending CoW reservations when destroying inodes Darrick J. Wong
2015-12-19  9:03 ` [PATCH 64/76] xfs: reflink extents from one file to another Darrick J. Wong
2015-12-19  9:03 ` [PATCH 65/76] xfs: add clone file and clone range ioctls Darrick J. Wong
2015-12-19  9:03 ` [PATCH 66/76] xfs: emulate the btrfs dedupe extent same ioctl Darrick J. Wong
2015-12-19  9:03 ` [PATCH 67/76] xfs: teach fiemap about reflink'd extents Darrick J. Wong
2015-12-19  9:03 ` [PATCH 68/76] xfs: swap inode reflink flags when swapping inode extents Darrick J. Wong
2015-12-19  9:04 ` [PATCH 69/76] xfs: unshare a range of blocks via fallocate Darrick J. Wong
2015-12-19  9:04 ` [PATCH 70/76] xfs: fork shared EOF block when truncating file Darrick J. Wong
2015-12-19  9:04 ` [PATCH 71/76] xfs: support XFS_XFLAG_REFLINK (and FS_NOCOW_FL) on reflink filesystems Darrick J. Wong
2015-12-19  9:04 ` [PATCH 72/76] xfs: recognize the reflink feature bit Darrick J. Wong
2015-12-19  9:04 ` [PATCH 73/76] xfs: use new vfs reflink and dedup function pointers Darrick J. Wong
2015-12-19  9:04 ` [PATCH 74/76] xfs: set up per-AG preallocated block pools Darrick J. Wong
2015-12-19  9:04 ` [PATCH 75/76] xfs: preallocate blocks for worst-case refcount btree expansion Darrick J. Wong
2015-12-19  9:04 ` [PATCH 76/76] xfs: try to prevent failed rmap btree expansion during cow Darrick J. Wong
2015-12-20 14:02 ` [RFCv4 00/76] xfs: add reverse-mapping, reflink, and dedupe support Brian Foster
2016-01-04 23:59   ` Darrick J. Wong
2016-01-05 12:42     ` Brian Foster
2016-01-06  2:04       ` Darrick J. Wong
2016-01-06  3:44         ` Dave Chinner
2016-02-02 23:06           ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.