* [RFC PATCH 00/20] xfs: reverse mapping btree support
@ 2015-06-03  6:04 Dave Chinner
  2015-06-03  6:04 ` [PATCH 01/20] xfs: xfs_alloc_fix_freelist() can use incore perag structures Dave Chinner
                   ` (19 more replies)
  0 siblings, 20 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

Hi Folks,

This is the first pass at kernel support for a reverse block mapping
btree in XFS. It is a single [block, owner] tree, so it does not
support multiple mappings of a single block. That's left as an
exercise for Darrick to solve with the reflink btree code he is
working on. :P
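
As a rough illustration of what a single [block, owner] mapping means,
here is a minimal userspace sketch. It is not the on-disk format or the
btree code from this series: struct rmap_rec, its fields and
rmap_lookup() are hypothetical names, and a sorted array stands in for
the btree. Because extents never overlap, each block resolves to at most
one owner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one extent of blocks and its single owner. */
struct rmap_rec {
	uint32_t	startblock;	/* first block of the extent */
	uint32_t	blockcount;	/* length of the extent in blocks */
	uint64_t	owner;		/* e.g. an inode number */
};

/*
 * Find the record covering @bno. @recs must be sorted by startblock and
 * must not contain overlapping extents (one owner per block). Returns
 * NULL if the block is not mapped.
 */
static const struct rmap_rec *
rmap_lookup(const struct rmap_rec *recs, size_t nrecs, uint32_t bno)
{
	size_t	lo = 0;
	size_t	hi = nrecs;

	assert(recs != NULL || nrecs == 0);

	/* Binary search over the half-open range [lo, hi). */
	while (lo < hi) {
		size_t			mid = lo + (hi - lo) / 2;
		const struct rmap_rec	*r = &recs[mid];

		if (bno < r->startblock)
			hi = mid;
		else if (bno - r->startblock >= r->blockcount)
			lo = mid + 1;
		else
			return r;
	}
	return NULL;
}
```

A shared block would need two records covering the same block number,
which this layout cannot express; that is the limitation described
above.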

The patch set is based on the current for-next branch (i.e. on top
of the sparse inode branch). I've also got my DAX branch and all the
current pending fixes applied ahead of this series too, so if you
have problems applying it, those will go away in the next couple of
days as those patchsets are pushed into the for-next branch.

The first 4 patches are not rmap btree patches - they do an initial
cleanup of xfs_alloc_fix_freelist() to make it easier to read and
also clean up some of the macros we need to add rmap support to.
These 4 patches stand alone and can be reviewed and committed
separately from the rest of the rmap btree code.

In general, the approach I've taken is to add a piece at a time in
each patch. I've added stubs to show how owner information will be
passed around, as well as how the rmap btree will be modified, long
before the rmap implementation is added. I did this to clearly
separate the rmap btree implementation details from the surrounding
code that drives it. Hence it should be easier to review, as the
changes are as simple as possible.

It's not 100% complete - there are still things like XFS_IOC_SWAPEXT
that need to be handled, but, as with the original CRC
implementation, swapping the owners of extents is not
straightforward, and so that will be done once the core
implementation has settled down.

As for testing, this really is only lightly smoke tested right now;
sufficient to post it as an RFC, but that's really it because the
userspace side is not yet complete. For example, xfs_repair will
typically SEGV in phase 5 because there's no code to rebuild the rmap
btree yet. xfs_repair -n, however, does work, and it does validate the
consistency of the rmap btree against the free space btrees and the
used space indexed by the filesystem in phases 3 and 4. That said,
it has passed the entire "quick" group in xfstests, except for the
tests dependent on swapext or repair working correctly, so it's not
entirely broken. ;)

The current userspace code is still based on the older
single-patch code that I started with a couple of days ago before
splitting the kernel code into this series. It's just a single patch
right now based on top of the libxfs-4.1-update branch in the
kernel.org tree. Get it from the following repo branch:

git://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git rmap-btree

I've been testing with:

# mkfs.xfs -f -m crc=1,finobt=1,rmapbt=1 <dev>

Over the next couple of days, I'll rebase the libxfs-4.1-update
branch to the current master branch and push Brian's sparse inode
support patchset into it as well. Then I'll start cleaning up the
userspace patchset and implementing some of the missing pieces.

Comments, flames, insanity pleas, testing and discussion all
welcome.

-Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 01/20] xfs: xfs_alloc_fix_freelist() can use incore perag structures
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-15 14:57   ` Brian Foster
  2015-06-03  6:04 ` [PATCH 02/20] xfs: factor out free space extent length check Dave Chinner
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

At the moment, xfs_alloc_fix_freelist() uses a mix of per-ag based
access and agf buffer based access to freelist and space usage
information. However, once the AGF buffer is locked inside this
function, it is guaranteed that both the in-memory and on-disk
values are identical. xfs_alloc_fix_freelist() doesn't modify the
values in the structures directly, so it is a read-only user of the
information, and hence can use the per-ag structure exclusively for
determining what it should do.

This opens up an avenue for cleaning up a lot of duplicated logic
whose only difference is the structure it gets the data from, and in
doing so removes a lot of needless byte swapping overhead when
fixing up the free list.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 69 +++++++++++++++++++++--------------------------
 fs/xfs/libxfs/xfs_alloc.h |  8 ++----
 fs/xfs/libxfs/xfs_bmap.c  |  3 ++-
 fs/xfs/xfs_filestream.c   |  3 ++-
 4 files changed, 37 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index bc78ac0..08b45f8 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1857,11 +1857,11 @@ xfs_alloc_compute_maxlevels(
 xfs_extlen_t
 xfs_alloc_longest_free_extent(
 	struct xfs_mount	*mp,
-	struct xfs_perag	*pag)
+	struct xfs_perag	*pag,
+	xfs_extlen_t		need)
 {
-	xfs_extlen_t		need, delta = 0;
+	xfs_extlen_t		delta = 0;
 
-	need = XFS_MIN_FREELIST_PAG(pag, mp);
 	if (need > pag->pagf_flcount)
 		delta = need - pag->pagf_flcount;
 
@@ -1880,10 +1880,8 @@ xfs_alloc_fix_freelist(
 	int		flags)	/* XFS_ALLOC_FLAG_... */
 {
 	xfs_buf_t	*agbp;	/* agf buffer pointer */
-	xfs_agf_t	*agf;	/* a.g. freespace structure pointer */
 	xfs_buf_t	*agflbp;/* agfl buffer pointer */
 	xfs_agblock_t	bno;	/* freelist block */
-	xfs_extlen_t	delta;	/* new blocks needed in freelist */
 	int		error;	/* error result code */
 	xfs_extlen_t	longest;/* longest extent in allocation group */
 	xfs_mount_t	*mp;	/* file system mount point structure */
@@ -1927,7 +1925,7 @@ xfs_alloc_fix_freelist(
 		 * total blocks, reject it.
 		 */
 		need = XFS_MIN_FREELIST_PAG(pag, mp);
-		longest = xfs_alloc_longest_free_extent(mp, pag);
+		longest = xfs_alloc_longest_free_extent(mp, pag, need);
 		if ((args->minlen + args->alignment + args->minalignslop - 1) >
 				longest ||
 		    ((int)(pag->pagf_freeblks + pag->pagf_flcount -
@@ -1954,25 +1952,16 @@ xfs_alloc_fix_freelist(
 			return 0;
 		}
 	}
-	/*
-	 * Figure out how many blocks we should have in the freelist.
-	 */
-	agf = XFS_BUF_TO_AGF(agbp);
-	need = XFS_MIN_FREELIST(agf, mp);
-	/*
-	 * If there isn't enough total or single-extent, reject it.
-	 */
+
+
+	/* If there isn't enough total space or single-extent, reject it. */
+	need = XFS_MIN_FREELIST_PAG(pag, mp);
 	if (!(flags & XFS_ALLOC_FLAG_FREEING)) {
-		delta = need > be32_to_cpu(agf->agf_flcount) ?
-			(need - be32_to_cpu(agf->agf_flcount)) : 0;
-		longest = be32_to_cpu(agf->agf_longest);
-		longest = (longest > delta) ? (longest - delta) :
-			(be32_to_cpu(agf->agf_flcount) > 0 || longest > 0);
+		longest = xfs_alloc_longest_free_extent(mp, pag, need);
 		if ((args->minlen + args->alignment + args->minalignslop - 1) >
 				longest ||
-		    ((int)(be32_to_cpu(agf->agf_freeblks) +
-		     be32_to_cpu(agf->agf_flcount) - need - args->total) <
-				(int)args->minleft)) {
+		    ((int)(pag->pagf_freeblks + pag->pagf_flcount -
+			   need - args->total) < (int)args->minleft)) {
 			xfs_trans_brelse(tp, agbp);
 			args->agbp = NULL;
 			return 0;
@@ -1980,21 +1969,25 @@ xfs_alloc_fix_freelist(
 	}
 	/*
 	 * Make the freelist shorter if it's too long.
+	 *
+	 * XXX (dgc): When we have lots of free space, does this buy us
+	 * anything other than extra overhead when we need to put more blocks
+	 * back on the free list? Maybe we should only do this when space is
+	 * getting low or the AGFL is more than half full?
 	 */
-	while (be32_to_cpu(agf->agf_flcount) > need) {
-		xfs_buf_t	*bp;
+	while (pag->pagf_flcount > need) {
+		struct xfs_buf	*bp;
 
 		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
 		if (error)
 			return error;
-		if ((error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1)))
+		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1);
+		if (error)
 			return error;
 		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
 		xfs_trans_binval(tp, bp);
 	}
-	/*
-	 * Initialize the args structure.
-	 */
+
 	memset(&targs, 0, sizeof(targs));
 	targs.tp = tp;
 	targs.mp = mp;
@@ -2003,18 +1996,18 @@ xfs_alloc_fix_freelist(
 	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
 	targs.type = XFS_ALLOCTYPE_THIS_AG;
 	targs.pag = pag;
-	if ((error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp)))
+	error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp);
+	if (error)
 		return error;
-	/*
-	 * Make the freelist longer if it's too short.
-	 */
-	while (be32_to_cpu(agf->agf_flcount) < need) {
+
+	/* Make the freelist longer if it's too short. */
+	while (pag->pagf_flcount < need) {
 		targs.agbno = 0;
-		targs.maxlen = need - be32_to_cpu(agf->agf_flcount);
-		/*
-		 * Allocate as many blocks as possible at once.
-		 */
-		if ((error = xfs_alloc_ag_vextent(&targs))) {
+		targs.maxlen = need - pag->pagf_flcount;
+
+		/* Allocate as many blocks as possible at once. */
+		error = xfs_alloc_ag_vextent(&targs);
+		if (error) {
 			xfs_trans_brelse(tp, agflbp);
 			return error;
 		}
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 29f27b2..a4d3b9a 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -130,12 +130,8 @@ typedef struct xfs_alloc_arg {
 #define XFS_ALLOC_USERDATA		1	/* allocation is for user data*/
 #define XFS_ALLOC_INITIAL_USER_DATA	2	/* special case start of file */
 
-/*
- * Find the length of the longest extent in an AG.
- */
-xfs_extlen_t
-xfs_alloc_longest_free_extent(struct xfs_mount *mp,
-		struct xfs_perag *pag);
+xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
+		struct xfs_perag *pag, xfs_extlen_t need);
 
 /*
  * Compute and fill in value of m_ag_maxlevels.
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 5cb3e85..7382cce 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3521,7 +3521,8 @@ xfs_bmap_longest_free_extent(
 		}
 	}
 
-	longest = xfs_alloc_longest_free_extent(mp, pag);
+	longest = xfs_alloc_longest_free_extent(mp, pag,
+						XFS_MIN_FREELIST_PAG(pag, mp));
 	if (*blen < longest)
 		*blen = longest;
 
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index da82f1c..9ac5eaa 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -196,7 +196,8 @@ xfs_filestream_pick_ag(
 			goto next_ag;
 		}
 
-		longest = xfs_alloc_longest_free_extent(mp, pag);
+		longest = xfs_alloc_longest_free_extent(mp, pag,
+						XFS_MIN_FREELIST_PAG(pag, mp));
 		if (((minlen && longest >= minlen) ||
 		     (!minlen && pag->pagf_freeblks >= minfree)) &&
 		    (!pag->pagf_metadata || !(flags & XFS_PICK_USERDATA) ||
-- 
2.0.0
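
For readers following along, the helper touched by this patch can be
modelled in userspace roughly as below. This is a sketch, not the kernel
function: struct perag_model and extlen_t are stand-ins for
struct xfs_perag and xfs_extlen_t, and the middle branch is inferred
from the open-coded logic this patch removes from
xfs_alloc_fix_freelist(). The point it shows is that "need" is now
supplied by the caller rather than recomputed inside the helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t extlen_t;	/* stand-in for xfs_extlen_t */

/* Stand-in for the two struct xfs_perag fields the helper reads. */
struct perag_model {
	extlen_t	pagf_flcount;	/* blocks currently on the AGFL */
	extlen_t	pagf_longest;	/* longest free extent in the AG */
};

/*
 * Longest free extent usable by the caller once the freelist has been
 * topped up to @need blocks.
 */
static extlen_t
longest_free_extent(const struct perag_model *pag, extlen_t need)
{
	extlen_t	delta = 0;

	assert(pag != NULL);

	/* Blocks that refilling the freelist would take from free space. */
	if (need > pag->pagf_flcount)
		delta = need - pag->pagf_flcount;

	if (pag->pagf_longest > delta)
		return pag->pagf_longest - delta;

	/* Otherwise report 1 if any space exists at all, else 0. */
	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
}
```

With 4 blocks on the freelist, a longest extent of 100 and need = 6,
the model reports 98: two blocks are set aside to refill the freelist.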


* [PATCH 02/20] xfs: factor out free space extent length check
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
  2015-06-03  6:04 ` [PATCH 01/20] xfs: xfs_alloc_fix_freelist() can use incore perag structures Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-15 14:58   ` Brian Foster
  2015-06-03  6:04 ` [PATCH 03/20] xfs: sanitise error handling in xfs_alloc_fix_freelist Dave Chinner
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The longest extent length checks in xfs_alloc_fix_freelist() are now
essentially identical. Factor them out into a helper function, so we
know they are checking exactly the same thing before and after we
lock the AGF.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 71 +++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 08b45f8..2471cb5 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1871,6 +1871,39 @@ xfs_alloc_longest_free_extent(
 }
 
 /*
+ * Check if the operation we are fixing up the freelist for should go ahead or
+ * not. If we are freeing blocks, we always allow it, otherwise the allocation
+ * is dependent on whether the size and shape of free space available will
+ * permit the requested allocation to take place.
+ */
+static bool
+xfs_alloc_space_available(
+	struct xfs_alloc_arg	*args,
+	xfs_extlen_t		min_free,
+	int			flags)
+{
+	struct xfs_perag	*pag = args->pag;
+	xfs_extlen_t		longest;
+	int			available;
+
+	if (flags & XFS_ALLOC_FLAG_FREEING)
+		return true;
+
+	/* do we have enough contiguous free space for the allocation? */
+	longest = xfs_alloc_longest_free_extent(args->mp, pag, min_free);
+	if ((args->minlen + args->alignment + args->minalignslop - 1) > longest)
+		return false;
+
+	/* do we have enough free space remaining for the allocation? */
+	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
+			  min_free - args->total);
+	if (available < (int)args->minleft)
+		return false;
+
+	return true;
+}
+
+/*
  * Decide whether to use this allocation group for this allocation.
  * If so, fix up the btree freelist's size.
  */
@@ -1883,7 +1916,6 @@ xfs_alloc_fix_freelist(
 	xfs_buf_t	*agflbp;/* agfl buffer pointer */
 	xfs_agblock_t	bno;	/* freelist block */
 	int		error;	/* error result code */
-	xfs_extlen_t	longest;/* longest extent in allocation group */
 	xfs_mount_t	*mp;	/* file system mount point structure */
 	xfs_extlen_t	need;	/* total blocks needed in freelist */
 	xfs_perag_t	*pag;	/* per-ag information structure */
@@ -1919,22 +1951,12 @@ xfs_alloc_fix_freelist(
 		return 0;
 	}
 
-	if (!(flags & XFS_ALLOC_FLAG_FREEING)) {
-		/*
-		 * If it looks like there isn't a long enough extent, or enough
-		 * total blocks, reject it.
-		 */
-		need = XFS_MIN_FREELIST_PAG(pag, mp);
-		longest = xfs_alloc_longest_free_extent(mp, pag, need);
-		if ((args->minlen + args->alignment + args->minalignslop - 1) >
-				longest ||
-		    ((int)(pag->pagf_freeblks + pag->pagf_flcount -
-			   need - args->total) < (int)args->minleft)) {
-			if (agbp)
-				xfs_trans_brelse(tp, agbp);
-			args->agbp = NULL;
-			return 0;
-		}
+	need = XFS_MIN_FREELIST_PAG(pag, mp);
+	if (!xfs_alloc_space_available(args, need, flags)) {
+		if (agbp)
+			xfs_trans_brelse(tp, agbp);
+		args->agbp = NULL;
+		return 0;
 	}
 
 	/*
@@ -1956,17 +1978,12 @@ xfs_alloc_fix_freelist(
 
 	/* If there isn't enough total space or single-extent, reject it. */
 	need = XFS_MIN_FREELIST_PAG(pag, mp);
-	if (!(flags & XFS_ALLOC_FLAG_FREEING)) {
-		longest = xfs_alloc_longest_free_extent(mp, pag, need);
-		if ((args->minlen + args->alignment + args->minalignslop - 1) >
-				longest ||
-		    ((int)(pag->pagf_freeblks + pag->pagf_flcount -
-			   need - args->total) < (int)args->minleft)) {
-			xfs_trans_brelse(tp, agbp);
-			args->agbp = NULL;
-			return 0;
-		}
+	if (!xfs_alloc_space_available(args, need, flags)) {
+		xfs_trans_brelse(tp, agbp);
+		args->agbp = NULL;
+		return 0;
 	}
+
 	/*
 	 * Make the freelist shorter if it's too long.
 	 *
-- 
2.0.0
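
The check factored out by this patch can be exercised outside the kernel
with a small model like the one below. It is a sketch under stated
assumptions: struct perag_model and struct alloc_args_model are
stand-ins carrying only the fields the check reads, a bool replaces the
XFS_ALLOC_FLAG_FREEING test, and longest_free_extent() is a userspace
approximation of xfs_alloc_longest_free_extent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the struct xfs_perag fields the check reads. */
struct perag_model {
	uint32_t	pagf_freeblks;	/* free blocks in the AG */
	uint32_t	pagf_flcount;	/* blocks on the AGFL */
	uint32_t	pagf_longest;	/* longest free extent */
};

/* Stand-in for the struct xfs_alloc_arg fields the check reads. */
struct alloc_args_model {
	uint32_t	minlen;
	uint32_t	alignment;
	uint32_t	minalignslop;
	uint32_t	total;
	uint32_t	minleft;
};

static uint32_t
longest_free_extent(const struct perag_model *pag, uint32_t need)
{
	uint32_t	delta = 0;

	if (need > pag->pagf_flcount)
		delta = need - pag->pagf_flcount;
	if (pag->pagf_longest > delta)
		return pag->pagf_longest - delta;
	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
}

static bool
space_available(const struct perag_model *pag,
		const struct alloc_args_model *args,
		uint32_t min_free, bool freeing)
{
	uint32_t	longest;
	int		available;

	assert(pag != NULL && args != NULL);

	/* Freeing blocks is always allowed to proceed. */
	if (freeing)
		return true;

	/* Is there enough contiguous free space for the allocation? */
	longest = longest_free_extent(pag, min_free);
	if (args->minlen + args->alignment + args->minalignslop - 1 > longest)
		return false;

	/* Is enough free space left over after the allocation? */
	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
			  min_free - args->total);
	return available >= (int)args->minleft;
}
```

As in the patch, the second test relies on the unsigned sum being
converted to int so that an over-committed AG shows up as a negative
value; the model keeps that behaviour rather than changing it.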


* [PATCH 03/20] xfs: sanitise error handling in xfs_alloc_fix_freelist
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
  2015-06-03  6:04 ` [PATCH 01/20] xfs: xfs_alloc_fix_freelist() can use incore perag structures Dave Chinner
  2015-06-03  6:04 ` [PATCH 02/20] xfs: factor out free space extent length check Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-15 14:58   ` Brian Foster
  2015-06-03  6:04 ` [PATCH 04/20] xfs: clean up XFS_MIN_FREELIST macros Dave Chinner
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The error handling is currently an inconsistent mess as every error
condition handles return values and releases buffers individually.
Clean this up by using gotos and a sane error label stack.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 103 +++++++++++++++++++++-------------------------
 1 file changed, 47 insertions(+), 56 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 2471cb5..352db46 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1909,80 +1909,65 @@ xfs_alloc_space_available(
  */
 STATIC int			/* error */
 xfs_alloc_fix_freelist(
-	xfs_alloc_arg_t	*args,	/* allocation argument structure */
-	int		flags)	/* XFS_ALLOC_FLAG_... */
+	struct xfs_alloc_arg	*args,	/* allocation argument structure */
+	int			flags)	/* XFS_ALLOC_FLAG_... */
 {
-	xfs_buf_t	*agbp;	/* agf buffer pointer */
-	xfs_buf_t	*agflbp;/* agfl buffer pointer */
-	xfs_agblock_t	bno;	/* freelist block */
-	int		error;	/* error result code */
-	xfs_mount_t	*mp;	/* file system mount point structure */
-	xfs_extlen_t	need;	/* total blocks needed in freelist */
-	xfs_perag_t	*pag;	/* per-ag information structure */
-	xfs_alloc_arg_t	targs;	/* local allocation arguments */
-	xfs_trans_t	*tp;	/* transaction pointer */
-
-	mp = args->mp;
+	struct xfs_mount	*mp = args->mp;
+	struct xfs_perag	*pag = args->pag;
+	struct xfs_trans	*tp = args->tp;
+	struct xfs_buf		*agbp = NULL;
+	struct xfs_buf		*agflbp = NULL;
+	struct xfs_alloc_arg	targs;	/* local allocation arguments */
+	xfs_agblock_t		bno;	/* freelist block */
+	xfs_extlen_t		need;	/* total blocks needed in freelist */
+	int			error = 0;
 
-	pag = args->pag;
-	tp = args->tp;
 	if (!pag->pagf_init) {
-		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
-				&agbp)))
-			return error;
+		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
+		if (error)
+			goto out_no_agbp;
 		if (!pag->pagf_init) {
 			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
 			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
-			args->agbp = NULL;
-			return 0;
+			goto out_agbp_relse;
 		}
-	} else
-		agbp = NULL;
+	}
 
 	/*
-	 * If this is a metadata preferred pag and we are user data
-	 * then try somewhere else if we are not being asked to
-	 * try harder at this point
+	 * If this is a metadata preferred pag and we are user data then try
+	 * somewhere else if we are not being asked to try harder at this
+	 * point
 	 */
 	if (pag->pagf_metadata && args->userdata &&
 	    (flags & XFS_ALLOC_FLAG_TRYLOCK)) {
 		ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
-		args->agbp = NULL;
-		return 0;
+		goto out_agbp_relse;
 	}
 
 	need = XFS_MIN_FREELIST_PAG(pag, mp);
-	if (!xfs_alloc_space_available(args, need, flags)) {
-		if (agbp)
-			xfs_trans_brelse(tp, agbp);
-		args->agbp = NULL;
-		return 0;
-	}
+	if (!xfs_alloc_space_available(args, need, flags))
+		goto out_agbp_relse;
 
 	/*
 	 * Get the a.g. freespace buffer.
 	 * Can fail if we're not blocking on locks, and it's held.
 	 */
-	if (agbp == NULL) {
-		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
-				&agbp)))
-			return error;
-		if (agbp == NULL) {
+	if (!agbp) {
+		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
+		if (error)
+			goto out_no_agbp;
+		if (!agbp) {
 			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
 			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
-			args->agbp = NULL;
-			return 0;
+			goto out_no_agbp;
 		}
 	}
 
 
 	/* If there isn't enough total space or single-extent, reject it. */
 	need = XFS_MIN_FREELIST_PAG(pag, mp);
-	if (!xfs_alloc_space_available(args, need, flags)) {
-		xfs_trans_brelse(tp, agbp);
-		args->agbp = NULL;
-		return 0;
-	}
+	if (!xfs_alloc_space_available(args, need, flags))
+		goto out_agbp_relse;
 
 	/*
 	 * Make the freelist shorter if it's too long.
@@ -1997,10 +1982,10 @@ xfs_alloc_fix_freelist(
 
 		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
 		if (error)
-			return error;
+			goto out_agbp_relse;
 		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1);
 		if (error)
-			return error;
+			goto out_agbp_relse;
 		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
 		xfs_trans_binval(tp, bp);
 	}
@@ -2015,7 +2000,7 @@ xfs_alloc_fix_freelist(
 	targs.pag = pag;
 	error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp);
 	if (error)
-		return error;
+		goto out_agbp_relse;
 
 	/* Make the freelist longer if it's too short. */
 	while (pag->pagf_flcount < need) {
@@ -2024,10 +2009,9 @@ xfs_alloc_fix_freelist(
 
 		/* Allocate as many blocks as possible at once. */
 		error = xfs_alloc_ag_vextent(&targs);
-		if (error) {
-			xfs_trans_brelse(tp, agflbp);
-			return error;
-		}
+		if (error)
+			goto out_agflbp_relse;
+
 		/*
 		 * Stop if we run out.  Won't happen if callers are obeying
 		 * the restrictions correctly.  Can happen for free calls
@@ -2036,9 +2020,7 @@ xfs_alloc_fix_freelist(
 		if (targs.agbno == NULLAGBLOCK) {
 			if (flags & XFS_ALLOC_FLAG_FREEING)
 				break;
-			xfs_trans_brelse(tp, agflbp);
-			args->agbp = NULL;
-			return 0;
+			goto out_agflbp_relse;
 		}
 		/*
 		 * Put each allocated block on the list.
@@ -2047,12 +2029,21 @@ xfs_alloc_fix_freelist(
 			error = xfs_alloc_put_freelist(tp, agbp,
 							agflbp, bno, 0);
 			if (error)
-				return error;
+				goto out_agflbp_relse;
 		}
 	}
 	xfs_trans_brelse(tp, agflbp);
 	args->agbp = agbp;
 	return 0;
+
+out_agflbp_relse:
+	xfs_trans_brelse(tp, agflbp);
+out_agbp_relse:
+	if (agbp)
+		xfs_trans_brelse(tp, agbp);
+out_no_agbp:
+	args->agbp = NULL;
+	return error;
 }
 
 /*
-- 
2.0.0
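
The label stack used by this patch is the usual kernel unwind idiom:
resources are released in the reverse order they were acquired, and each
failure jumps to the label matching what it currently holds. Below is a
minimal userspace sketch of the same shape. Everything in it is
hypothetical: acquire_pair(), the fail_at injection parameter and the
live counter exist only for illustration, and malloc()/free() stand in
for reading and releasing buffers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Acquire two buffers, keep the first for the caller and drop the
 * second. @fail_at injects a failure after the first (1) or second (2)
 * acquisition. @live counts buffers currently held.
 */
static int
acquire_pair(int fail_at, void **keptp, int *live)
{
	void	*first = NULL;
	void	*second = NULL;
	int	error = 0;	/* initialised: some exits never assign it */

	assert(keptp != NULL && live != NULL);

	first = malloc(64);
	if (!first) {
		error = -ENOMEM;
		goto out_no_first;
	}
	(*live)++;

	if (fail_at == 1) {
		error = -EIO;
		goto out_first_free;
	}

	second = malloc(64);
	if (!second) {
		error = -ENOMEM;
		goto out_first_free;
	}
	(*live)++;

	if (fail_at == 2) {
		error = -EIO;
		goto out_second_free;
	}

	/* Success: release the scratch buffer, hand the first one back. */
	free(second);
	(*live)--;
	*keptp = first;
	return 0;

out_second_free:
	free(second);
	(*live)--;
out_first_free:
	free(first);
	(*live)--;
out_no_first:
	*keptp = NULL;
	return error;
}
```

Since every exit funnels through one return of "error", the variable
needs a defined value on paths that bail out without a failed call,
which is why the sketch initialises it to zero.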


* [PATCH 04/20] xfs: clean up XFS_MIN_FREELIST macros
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (2 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 03/20] xfs: sanitise error handling in xfs_alloc_fix_freelist Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-15 14:58   ` Brian Foster
  2015-06-03  6:04 ` [PATCH 05/20] xfs: introduce rmap btree definitions Dave Chinner
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

We no longer calculate the minimum freelist size from the on-disk
AGF, so we don't need the macros used for this. That means the
nested macros can be cleaned up and turned into an actual function
so the logic is clear and concise. This will make it much easier to
add support for the rmap btree when the time comes.

This also gets rid of the XFS_AG_MAXLEVELS macro used by these
freelist macros as it is simply a wrapper around a single variable.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c       | 22 +++++++++++++++++++---
 fs/xfs/libxfs/xfs_alloc.h       |  2 ++
 fs/xfs/libxfs/xfs_bmap.c        |  2 +-
 fs/xfs/libxfs/xfs_format.h      | 13 -------------
 fs/xfs/libxfs/xfs_trans_resv.h  |  4 ++--
 fs/xfs/libxfs/xfs_trans_space.h |  2 +-
 fs/xfs/xfs_filestream.c         |  2 +-
 7 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 352db46..d4aa844 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1870,6 +1870,23 @@ xfs_alloc_longest_free_extent(
 	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
 }
 
+unsigned int
+xfs_alloc_min_freelist(
+	struct xfs_mount	*mp,
+	struct xfs_perag	*pag)
+{
+	unsigned int		min_free;
+
+	/* space needed by-bno freespace btree */
+	min_free = min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_BNOi] + 1,
+				       mp->m_ag_maxlevels);
+	/* space needed by-size freespace btree */
+	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
+				       mp->m_ag_maxlevels);
+
+	return min_free;
+}
+
 /*
  * Check if the operation we are fixing up the freelist for should go ahead or
  * not. If we are freeing blocks, we always allow it, otherwise the allocation
@@ -1944,7 +1961,7 @@ xfs_alloc_fix_freelist(
 		goto out_agbp_relse;
 	}
 
-	need = XFS_MIN_FREELIST_PAG(pag, mp);
+	need = xfs_alloc_min_freelist(mp, pag);
 	if (!xfs_alloc_space_available(args, need, flags))
 		goto out_agbp_relse;
 
@@ -1963,9 +1980,8 @@ xfs_alloc_fix_freelist(
 		}
 	}
 
-
 	/* If there isn't enough total space or single-extent, reject it. */
-	need = XFS_MIN_FREELIST_PAG(pag, mp);
+	need = xfs_alloc_min_freelist(mp, pag);
 	if (!xfs_alloc_space_available(args, need, flags))
 		goto out_agbp_relse;
 
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index a4d3b9a..ca1c816 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -132,6 +132,8 @@ typedef struct xfs_alloc_arg {
 
 xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
 		struct xfs_perag *pag, xfs_extlen_t need);
+unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
+		struct xfs_perag *pag);
 
 /*
  * Compute and fill in value of m_ag_maxlevels.
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 7382cce..983a5d0 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3522,7 +3522,7 @@ xfs_bmap_longest_free_extent(
 	}
 
 	longest = xfs_alloc_longest_free_extent(mp, pag,
-						XFS_MIN_FREELIST_PAG(pag, mp));
+					xfs_alloc_min_freelist(mp, pag));
 	if (*blen < longest)
 		*blen = longest;
 
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 815f61b..a0ae572 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -766,19 +766,6 @@ typedef struct xfs_agfl {
 
 #define XFS_AGFL_CRC_OFF	offsetof(struct xfs_agfl, agfl_crc)
 
-
-#define	XFS_AG_MAXLEVELS(mp)		((mp)->m_ag_maxlevels)
-#define	XFS_MIN_FREELIST_RAW(bl,cl,mp)	\
-	(MIN(bl + 1, XFS_AG_MAXLEVELS(mp)) + MIN(cl + 1, XFS_AG_MAXLEVELS(mp)))
-#define	XFS_MIN_FREELIST(a,mp)		\
-	(XFS_MIN_FREELIST_RAW(		\
-		be32_to_cpu((a)->agf_levels[XFS_BTNUM_BNOi]), \
-		be32_to_cpu((a)->agf_levels[XFS_BTNUM_CNTi]), mp))
-#define	XFS_MIN_FREELIST_PAG(pag,mp)	\
-	(XFS_MIN_FREELIST_RAW(		\
-		(unsigned int)(pag)->pagf_levels[XFS_BTNUM_BNOi], \
-		(unsigned int)(pag)->pagf_levels[XFS_BTNUM_CNTi], mp))
-
 #define XFS_AGB_TO_FSB(mp,agno,agbno)	\
 	(((xfs_fsblock_t)(agno) << (mp)->m_sb.sb_agblklog) | (agbno))
 #define	XFS_FSB_TO_AGNO(mp,fsbno)	\
diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h
index 2d5bdfc..7978150 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.h
+++ b/fs/xfs/libxfs/xfs_trans_resv.h
@@ -73,9 +73,9 @@ struct xfs_trans_resv {
  * 2 trees * (2 blocks/level * max depth - 1) * block size
  */
 #define	XFS_ALLOCFREE_LOG_RES(mp,nx) \
-	((nx) * (2 * XFS_FSB_TO_B((mp), 2 * XFS_AG_MAXLEVELS(mp) - 1)))
+	((nx) * (2 * XFS_FSB_TO_B((mp), 2 * (mp)->m_ag_maxlevels - 1)))
 #define	XFS_ALLOCFREE_LOG_COUNT(mp,nx) \
-	((nx) * (2 * (2 * XFS_AG_MAXLEVELS(mp) - 1)))
+	((nx) * (2 * (2 * (mp)->m_ag_maxlevels - 1)))
 
 /*
  * Per-directory log reservation for any directory change.
diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index bf9c457..41e0428 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -67,7 +67,7 @@
 #define	XFS_DIOSTRAT_SPACE_RES(mp, v)	\
 	(XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK) + (v))
 #define	XFS_GROWFS_SPACE_RES(mp)	\
-	(2 * XFS_AG_MAXLEVELS(mp))
+	(2 * (mp)->m_ag_maxlevels)
 #define	XFS_GROWFSRT_SPACE_RES(mp,b)	\
 	((b) + XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK))
 #define	XFS_LINK_SPACE_RES(mp,nl)	\
diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
index 9ac5eaa..c4c130f 100644
--- a/fs/xfs/xfs_filestream.c
+++ b/fs/xfs/xfs_filestream.c
@@ -197,7 +197,7 @@ xfs_filestream_pick_ag(
 		}
 
 		longest = xfs_alloc_longest_free_extent(mp, pag,
-						XFS_MIN_FREELIST_PAG(pag, mp));
+					xfs_alloc_min_freelist(mp, pag));
 		if (((minlen && longest >= minlen) ||
 		     (!minlen && pag->pagf_freeblks >= minfree)) &&
 		    (!pag->pagf_metadata || !(flags & XFS_PICK_USERDATA) ||
-- 
2.0.0
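
To make the arithmetic concrete, here is a userspace model of the new
function. It is only a sketch: struct perag_model, the BTNUM_* enum and
min_uint() are stand-ins for struct xfs_perag, the XFS_BTNUM_*i indices
and min_t(), and the maximum level count is passed in directly instead
of being read from mp->m_ag_maxlevels.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for XFS_BTNUM_BNOi and XFS_BTNUM_CNTi. */
enum {
	BTNUM_BNO,
	BTNUM_CNT,
	BTNUM_AGF_MAX
};

/* Stand-in for the struct xfs_perag field the calculation reads. */
struct perag_model {
	unsigned int	pagf_levels[BTNUM_AGF_MAX];
};

static unsigned int
min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Minimum freelist length: one block per level of each free space
 * btree plus one for a possible new root, capped at the maximum height.
 */
static unsigned int
min_freelist(const struct perag_model *pag, unsigned int ag_maxlevels)
{
	unsigned int	min_free;

	assert(pag != NULL);

	/* space needed by the by-bno freespace btree */
	min_free = min_uint(pag->pagf_levels[BTNUM_BNO] + 1, ag_maxlevels);
	/* space needed by the by-size freespace btree */
	min_free += min_uint(pag->pagf_levels[BTNUM_CNT] + 1, ag_maxlevels);

	return min_free;
}
```

For two single-level btrees and a maximum height of 5 this gives 4
blocks; once both trees are at the maximum height the cap applies and
the result is 10. Adding a third btree later is then a matter of adding
one more term, which is what makes the function form convenient.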


* [PATCH 05/20] xfs: introduce rmap btree definitions
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (3 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 04/20] xfs: clean up XFS_MIN_FREELIST macros Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:30   ` Darrick J. Wong
  2015-06-03  6:04 ` [PATCH 06/20] xfs: add rmap btree stats infrastructure Dave Chinner
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Add new per-ag rmap btree definitions to the per-ag structures. The
rmap btree will sit in the empty slots on disk after the free space
btrees, and hence form a part of the array of space management
btrees. This requires the definition of the btree to be contiguous
with the free space btrees.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c  |  6 ++++++
 fs/xfs/libxfs/xfs_btree.c  |  4 ++--
 fs/xfs/libxfs/xfs_btree.h  |  3 +++
 fs/xfs/libxfs/xfs_format.h | 22 +++++++++++++++++-----
 fs/xfs/libxfs/xfs_types.h  |  4 ++--
 5 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index d4aa844..c7206b5 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2267,6 +2267,10 @@ xfs_agf_verify(
 	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]) > XFS_BTREE_MAXLEVELS)
 		return false;
 
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb) &&
+	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) > XFS_BTREE_MAXLEVELS)
+		return false;
+
 	/*
 	 * during growfs operations, the perag is not fully initialised,
 	 * so we can't use it for any useful checking. growfs ensures we can't
@@ -2397,6 +2401,8 @@ xfs_alloc_read_agf(
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
 		pag->pagf_levels[XFS_BTNUM_CNTi] =
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
+		pag->pagf_levels[XFS_BTNUM_RMAPi] =
+			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
 		spin_lock_init(&pag->pagb_lock);
 		pag->pagb_count = 0;
 		pag->pagb_tree = RB_ROOT;
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index c72283d..0426152 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -42,9 +42,9 @@ kmem_zone_t	*xfs_btree_cur_zone;
  * Btree magic numbers.
  */
 static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
-	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
+	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
 	  XFS_FIBT_MAGIC },
-	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC,
+	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
 	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
 };
 #define xfs_btree_magic(cur) \
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 8f18bab..ace1995 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -63,6 +63,7 @@ union xfs_btree_rec {
 #define	XFS_BTNUM_BMAP	((xfs_btnum_t)XFS_BTNUM_BMAPi)
 #define	XFS_BTNUM_INO	((xfs_btnum_t)XFS_BTNUM_INOi)
 #define	XFS_BTNUM_FINO	((xfs_btnum_t)XFS_BTNUM_FINOi)
+#define	XFS_BTNUM_RMAP	((xfs_btnum_t)XFS_BTNUM_RMAPi)
 
 /*
  * For logging record fields.
@@ -94,6 +95,7 @@ do {    \
 	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(bmbt, stat); break;	\
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(ibt, stat); break;	\
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(fibt, stat); break;	\
+	case XFS_BTNUM_RMAP: break;	\
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
 	}       \
 } while (0)
@@ -108,6 +110,7 @@ do {    \
 	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_ADD(bmbt, stat, val); break; \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_ADD(ibt, stat, val); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_ADD(fibt, stat, val); break; \
+	case XFS_BTNUM_RMAP: break; \
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
 	}       \
 } while (0)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index a0ae572..d120af4 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -445,6 +445,7 @@ xfs_sb_has_compat_feature(
 }
 
 #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
+#define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
 		(XFS_SB_FEAT_RO_COMPAT_FINOBT)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
@@ -514,6 +515,12 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
 		xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
 }
 
+static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
+{
+	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
+		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
+}
+
 /*
  * end of superblock version macros
  */
@@ -574,10 +581,10 @@ xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino)
 #define	XFS_AGI_GOOD_VERSION(v)	((v) == XFS_AGI_VERSION)
 
 /*
- * Btree number 0 is bno, 1 is cnt.  This value gives the size of the
+ * Btree number 0 is bno, 1 is cnt, 2 is rmap. This value gives the size of the
  * arrays below.
  */
-#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_CNTi + 1)
+#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_RMAPi + 1)
 
 /*
  * The second word of agf_levels in the first a.g. overlaps the EFS
@@ -594,12 +601,10 @@ typedef struct xfs_agf {
 	__be32		agf_seqno;	/* sequence # starting from 0 */
 	__be32		agf_length;	/* size in blocks of a.g. */
 	/*
-	 * Freespace information
+	 * Freespace and rmap information
 	 */
 	__be32		agf_roots[XFS_BTNUM_AGF];	/* root blocks */
-	__be32		agf_spare0;	/* spare field */
 	__be32		agf_levels[XFS_BTNUM_AGF];	/* btree levels */
-	__be32		agf_spare1;	/* spare field */
 
 	__be32		agf_flfirst;	/* first freelist block's index */
 	__be32		agf_fllast;	/* last freelist block's index */
@@ -1277,6 +1282,13 @@ typedef __be32 xfs_inobt_ptr_t;
 #define	XFS_FIBT_BLOCK(mp)		((xfs_agblock_t)(XFS_IBT_BLOCK(mp) + 1))
 
 /*
+ * Reverse mapping btree format definitions
+ *
+ * There is a btree for the reverse map per allocation group
+ */
+#define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
+
+/*
  * The first data block of an AG depends on whether the filesystem was formatted
  * with the finobt feature. If so, account for the finobt reserved root btree
  * block.
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index b79dc66..3d50364 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -108,8 +108,8 @@ typedef enum {
 } xfs_lookup_t;
 
 typedef enum {
-	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi,
-	XFS_BTNUM_FINOi, XFS_BTNUM_MAX
+	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
+	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
 } xfs_btnum_t;
 
 struct xfs_name {
-- 
2.0.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 06/20] xfs: add rmap btree stats infrastructure
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (4 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 05/20] xfs: introduce rmap btree definitions Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 07/20] xfs: rmap btree add more reserved blocks Dave Chinner
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The rmap btree will require the same stats as all the other generic
btrees, so add all the code for that now.
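
As an illustration of how the end-of-group markers chain together, here
is a minimal standalone sketch. It assumes a cut-down model of the tail
of struct xfsstats in which every generic btree exports the same 15
counters; all demo_/DEMO_ names are hypothetical and are not part of the
kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: every generic btree exports the same 15 counters. */
enum demo_btree_stat {
	DEMO_LOOKUP, DEMO_COMPARE, DEMO_INSREC, DEMO_DELREC,
	DEMO_NEWROOT, DEMO_KILLROOT, DEMO_INCREMENT, DEMO_DECREMENT,
	DEMO_LSHIFT, DEMO_RSHIFT, DEMO_SPLIT, DEMO_JOIN,
	DEMO_ALLOC, DEMO_FREE, DEMO_MOVES,
	DEMO_BTREE_NSTATS	/* number of counters per btree: 15 */
};

/* Cut-down stand-in for the tail of the stats structure. */
struct demo_stats {
	uint32_t	fibt[DEMO_BTREE_NSTATS];
	uint32_t	rmap[DEMO_BTREE_NSTATS];
	uint32_t	qm[6];
};

/* End-of-group markers, counted in 32-bit words from the start of fibt. */
#define DEMO_END_FIBT	(DEMO_BTREE_NSTATS)
#define DEMO_END_RMAP	(DEMO_END_FIBT + 15)
#define DEMO_END_QM	(DEMO_END_RMAP + 6)

/* Convert a byte offset into a count of 32-bit counters. */
static size_t demo_words(size_t bytes)
{
	return bytes / sizeof(uint32_t);
}

/* Bump one rmap counter, as a stats macro might do for the rmap btree. */
static void demo_rmap_stat_inc(struct demo_stats *st, enum demo_btree_stat s)
{
	st->rmap[s]++;
}
```

In this model, inserting the 15 rmap counters ahead of the quota group
means the quota end marker has to be rebased on the rmap marker, which
mirrors what the XFSSTAT_END_XQMSTAT change in this patch does.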

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_btree.h |  4 ++--
 fs/xfs/xfs_stats.c        |  1 +
 fs/xfs/xfs_stats.h        | 18 +++++++++++++++++-
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index ace1995..494ee0b 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -95,7 +95,7 @@ do {    \
 	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(bmbt, stat); break;	\
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(ibt, stat); break;	\
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(fibt, stat); break;	\
-	case XFS_BTNUM_RMAP: break;	\
+	case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_INC(rmap, stat); break;	\
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
 	}       \
 } while (0)
@@ -110,7 +110,7 @@ do {    \
 	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_ADD(bmbt, stat, val); break; \
 	case XFS_BTNUM_INO: __XFS_BTREE_STATS_ADD(ibt, stat, val); break; \
 	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_ADD(fibt, stat, val); break; \
-	case XFS_BTNUM_RMAP: break; \
+	case XFS_BTNUM_RMAP: __XFS_BTREE_STATS_ADD(rmap, stat, val); break; \
 	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
 	}       \
 } while (0)
diff --git a/fs/xfs/xfs_stats.c b/fs/xfs/xfs_stats.c
index f224038..67bbfa2 100644
--- a/fs/xfs/xfs_stats.c
+++ b/fs/xfs/xfs_stats.c
@@ -60,6 +60,7 @@ static int xfs_stat_proc_show(struct seq_file *m, void *v)
 		{ "bmbt2",		XFSSTAT_END_BMBT_V2		},
 		{ "ibt2",		XFSSTAT_END_IBT_V2		},
 		{ "fibt2",		XFSSTAT_END_FIBT_V2		},
+		{ "rmapbt",		XFSSTAT_END_RMAP_V2		},
 		/* we print both series of quota information together */
 		{ "qm",			XFSSTAT_END_QM			},
 	};
diff --git a/fs/xfs/xfs_stats.h b/fs/xfs/xfs_stats.h
index c8f238b..8414db2 100644
--- a/fs/xfs/xfs_stats.h
+++ b/fs/xfs/xfs_stats.h
@@ -199,7 +199,23 @@ struct xfsstats {
 	__uint32_t		xs_fibt_2_alloc;
 	__uint32_t		xs_fibt_2_free;
 	__uint32_t		xs_fibt_2_moves;
-#define XFSSTAT_END_XQMSTAT		(XFSSTAT_END_FIBT_V2+6)
+#define XFSSTAT_END_RMAP_V2		(XFSSTAT_END_FIBT_V2+15)
+	__uint32_t		xs_rmap_2_lookup;
+	__uint32_t		xs_rmap_2_compare;
+	__uint32_t		xs_rmap_2_insrec;
+	__uint32_t		xs_rmap_2_delrec;
+	__uint32_t		xs_rmap_2_newroot;
+	__uint32_t		xs_rmap_2_killroot;
+	__uint32_t		xs_rmap_2_increment;
+	__uint32_t		xs_rmap_2_decrement;
+	__uint32_t		xs_rmap_2_lshift;
+	__uint32_t		xs_rmap_2_rshift;
+	__uint32_t		xs_rmap_2_split;
+	__uint32_t		xs_rmap_2_join;
+	__uint32_t		xs_rmap_2_alloc;
+	__uint32_t		xs_rmap_2_free;
+	__uint32_t		xs_rmap_2_moves;
+#define XFSSTAT_END_XQMSTAT		(XFSSTAT_END_RMAP_V2+6)
 	__uint32_t		xs_qm_dqreclaims;
 	__uint32_t		xs_qm_dqreclaim_misses;
 	__uint32_t		xs_qm_dquot_dups;
-- 
2.0.0


* [PATCH 07/20] xfs: rmap btree add more reserved blocks
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (5 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 06/20] xfs: add rmap btree stats infrastructure Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 08/20] xfs: add owner field to extent allocation and freeing Dave Chinner
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

XFS reserves a small amount of space in each AG for the minimum
number of free blocks needed for operation. Adding the rmap btree
increases the number of reserved blocks, but it also increases the
complexity of the calculation as the free inode btree is optional
(like the rmapbt).

Rather than calculate the prealloc blocks every time we need to
check it, add a function to calculate it at mount time and store it
in the struct xfs_mount, and convert users of the XFS_PREALLOC_BLOCKS
macro to use the xfs_mount variable directly.
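
The calculate-once-and-cache pattern can be sketched in isolation as
follows. This is a simplified model, not the kernel code: struct
demo_mount and the demo_* helpers are hypothetical stand-ins for struct
xfs_mount, XFS_FIBT_BLOCK(), XFS_RMAP_BLOCK() and xfs_prealloc_blocks(),
and the feature checks are plain booleans rather than superblock bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few mount fields this sketch needs. */
struct demo_mount {
	bool		has_finobt;		/* free inode btree enabled */
	bool		has_rmapbt;		/* reverse map btree enabled */
	uint32_t	ibt_block;		/* AG block of the inobt root */
	uint32_t	ag_prealloc_blocks;	/* cached at mount time */
};

/* The optional finobt root sits directly after the inobt root. */
static uint32_t demo_fibt_block(const struct demo_mount *mp)
{
	return mp->ibt_block + 1;
}

/* The rmapbt root follows whichever inode btree root comes last. */
static uint32_t demo_rmap_block(const struct demo_mount *mp)
{
	return mp->has_finobt ? demo_fibt_block(mp) + 1 : mp->ibt_block + 1;
}

/* First AG block that is not taken by a fixed btree root. */
static uint32_t demo_prealloc_blocks(const struct demo_mount *mp)
{
	if (mp->has_rmapbt)
		return demo_rmap_block(mp) + 1;
	if (mp->has_finobt)
		return demo_fibt_block(mp) + 1;
	return mp->ibt_block + 1;
}

/* "Mount time": evaluate the feature checks once and cache the result. */
static void demo_mount_init(struct demo_mount *mp, bool finobt, bool rmapbt,
			    uint32_t ibt_block)
{
	mp->has_finobt = finobt;
	mp->has_rmapbt = rmapbt;
	mp->ibt_block = ibt_block;
	mp->ag_prealloc_blocks = demo_prealloc_blocks(mp);
}
```

Callers then read the cached field instead of re-evaluating the feature
checks each time, which is the role mp->m_ag_prealloc_blocks plays in
the growfs hunks of this patch.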

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c  | 11 +++++++++++
 fs/xfs/libxfs/xfs_alloc.h  |  2 ++
 fs/xfs/libxfs/xfs_format.h |  9 +--------
 fs/xfs/xfs_fsops.c         |  6 +++---
 fs/xfs/xfs_mount.c         |  2 ++
 fs/xfs/xfs_mount.h         |  1 +
 6 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index c7206b5..a683d7a 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -49,6 +49,17 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 		xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
 
+xfs_extlen_t
+xfs_prealloc_blocks(
+	struct xfs_mount	*mp)
+{
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return XFS_RMAP_BLOCK(mp) + 1;
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		return XFS_FIBT_BLOCK(mp) + 1;
+	return XFS_IBT_BLOCK(mp) + 1;
+}
+
 /*
  * Lookup the record equal to [bno, len] in the btree given by cur.
  */
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index ca1c816..71379f6 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -234,4 +234,6 @@ xfs_alloc_get_rec(
 int xfs_read_agf(struct xfs_mount *mp, struct xfs_trans *tp,
 			xfs_agnumber_t agno, int flags, struct xfs_buf **bpp);
 
+xfs_extlen_t xfs_prealloc_blocks(struct xfs_mount *mp);
+
 #endif	/* __XFS_ALLOC_H__ */
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index d120af4..e81ffec 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1288,18 +1288,11 @@ typedef __be32 xfs_inobt_ptr_t;
  */
 #define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
 
-/*
- * The first data block of an AG depends on whether the filesystem was formatted
- * with the finobt feature. If so, account for the finobt reserved root btree
- * block.
- */
-#define XFS_PREALLOC_BLOCKS(mp) \
+#define	XFS_RMAP_BLOCK(mp) \
 	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
 	 XFS_FIBT_BLOCK(mp) + 1 : \
 	 XFS_IBT_BLOCK(mp) + 1)
 
-
-
 /*
  * BMAP Btree format definitions
  *
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 4bd6463..a564c4c 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -246,7 +246,7 @@ xfs_growfs_data_private(
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
 		agf->agf_flcount = 0;
-		tmpsize = agsize - XFS_PREALLOC_BLOCKS(mp);
+		tmpsize = agsize - mp->m_ag_prealloc_blocks;
 		agf->agf_freeblks = cpu_to_be32(tmpsize);
 		agf->agf_longest = cpu_to_be32(tmpsize);
 		if (xfs_sb_version_hascrc(&mp->m_sb))
@@ -343,7 +343,7 @@ xfs_growfs_data_private(
 						agno, 0);
 
 		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
-		arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
+		arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
 		arec->ar_blockcount = cpu_to_be32(
 			agsize - be32_to_cpu(arec->ar_startblock));
 
@@ -372,7 +372,7 @@ xfs_growfs_data_private(
 						agno, 0);
 
 		arec = XFS_ALLOC_REC_ADDR(mp, XFS_BUF_TO_BLOCK(bp), 1);
-		arec->ar_startblock = cpu_to_be32(XFS_PREALLOC_BLOCKS(mp));
+		arec->ar_startblock = cpu_to_be32(mp->m_ag_prealloc_blocks);
 		arec->ar_blockcount = cpu_to_be32(
 			agsize - be32_to_cpu(arec->ar_startblock));
 		nfree += be32_to_cpu(arec->ar_blockcount);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 461e791..9d6be55 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -239,6 +239,8 @@ xfs_initialize_perag(
 
 	if (maxagi)
 		*maxagi = index;
+
+	mp->m_ag_prealloc_blocks = xfs_prealloc_blocks(mp);
 	return 0;
 
 out_unwind:
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 7999e91..d9c9834 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -93,6 +93,7 @@ typedef struct xfs_mount {
 	uint			m_ag_maxlevels;	/* XFS_AG_MAXLEVELS */
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* max inobt btree levels. */
+	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
 	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
 	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
 	struct mutex		m_growlock;	/* growfs mutex */
-- 
2.0.0


* [PATCH 08/20] xfs: add owner field to extent allocation and freeing
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (6 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 07/20] xfs: rmap btree add more reserved blocks Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-24 19:09   ` Brian Foster
  2015-06-03  6:04 ` [PATCH 09/20] xfs: introduce rmap extent operation stubs Dave Chinner
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

For the rmap btree to work, we have to feed the extent owner
information to the allocation and freeing functions. This
information is what will end up in the rmap btree that tracks
allocated extents. While we technically don't need the owner
information when freeing extents, passing it allows us to validate
that the extent we are removing from the rmap btree actually
belonged to the owner we expected it to belong to.

We also define a special set of owner values for internal metadata
that would otherwise have no owner. This allows us to tell the
difference between metadata owned by different per-ag btrees, as
well as static fs metadata (e.g. AG headers) and internal journal
blocks.
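
To make the owner encoding concrete, here is a small standalone sketch.
The DEMO_RMAP_OWN_* constants copy the values this patch adds to
xfs_format.h; the demo_owner_is_special() and demo_owner_name() helpers
are hypothetical and do not exist in the series. The sketch assumes that
anything at or below the guard value is an ordinary inode number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Values mirror the XFS_RMAP_OWN_* definitions added by this patch. */
#define DEMO_RMAP_OWN_NULL	(-1ULL)	/* no owner, for growfs */
#define DEMO_RMAP_OWN_UNKNOWN	(-2ULL)	/* unknown owner, for EFI recovery */
#define DEMO_RMAP_OWN_FS	(-3ULL)	/* static fs metadata */
#define DEMO_RMAP_OWN_LOG	(-4ULL)	/* internal journal blocks */
#define DEMO_RMAP_OWN_AG	(-5ULL)	/* AG freespace btree blocks */
#define DEMO_RMAP_OWN_INOBT	(-6ULL)	/* inode btree blocks */
#define DEMO_RMAP_OWN_INODES	(-7ULL)	/* inode chunk */
#define DEMO_RMAP_OWN_MIN	(-8ULL)	/* guard */

/* Hypothetical helper: true for the reserved values above the guard. */
static bool demo_owner_is_special(uint64_t owner)
{
	return owner > DEMO_RMAP_OWN_MIN;
}

/* Hypothetical helper: human-readable owner class, e.g. for debugging. */
static const char *demo_owner_name(uint64_t owner)
{
	switch (owner) {
	case DEMO_RMAP_OWN_NULL:	return "null";
	case DEMO_RMAP_OWN_UNKNOWN:	return "unknown";
	case DEMO_RMAP_OWN_FS:		return "fs";
	case DEMO_RMAP_OWN_LOG:		return "log";
	case DEMO_RMAP_OWN_AG:		return "ag";
	case DEMO_RMAP_OWN_INOBT:	return "inobt";
	case DEMO_RMAP_OWN_INODES:	return "inodes";
	default:			return "inode";	/* assumed inode number */
	}
}
```

Because the reserved values are small negative numbers cast to an
unsigned 64-bit type, they all have the top bit set and so cannot
collide with real inode numbers on a filesystem limited to 8EB.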

There are also a couple of special cases we need to take care of -
during EFI recovery, we don't actually know who the original owner
was, so we need to pass a wildcard to indicate that we aren't
checking the owner for validity. We also need special handling in
growfs, as we "free" the space in the last AG when extending it, but
because it's new space it has no actual owner...
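
One way the two special cases could be handled when a free is validated
is sketched below. This is purely illustrative: at this point in the
series xfs_rmap_free() is only a stub, so demo_rmap_free_check(), its
arguments and its return convention are assumptions, not the eventual
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RMAP_OWN_NULL	(-1ULL)	/* no owner, for growfs */
#define DEMO_RMAP_OWN_UNKNOWN	(-2ULL)	/* unknown owner, for EFI recovery */

/*
 * Hypothetical owner check for an extent being freed.
 *
 * have_record: whether a reverse mapping exists for the extent
 * recorded:    the owner stored in that mapping (ignored if none)
 * expected:    the owner the caller passed to the free function
 *
 * Returns 0 if the free looks consistent, -1 otherwise.
 */
static int demo_rmap_free_check(bool have_record, uint64_t recorded,
				uint64_t expected)
{
	/* growfs: the space is brand new, so no mapping should exist. */
	if (expected == DEMO_RMAP_OWN_NULL)
		return have_record ? -1 : 0;

	/* Every other free must be removing an existing mapping. */
	if (!have_record)
		return -1;

	/* EFI recovery: the original owner is not known, accept any. */
	if (expected == DEMO_RMAP_OWN_UNKNOWN)
		return 0;

	/* Normal case: the mapping must belong to the expected owner. */
	return recorded == expected ? 0 : -1;
}
```

The point of the sketch is only that the wildcard skips the owner
comparison while the growfs value skips the lookup expectation
altogether; every other caller gets a strict owner match.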

While touching the xfs_bmap_add_free() function, re-order the
parameters to put the struct xfs_mount first.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c        | 11 ++++++++---
 fs/xfs/libxfs/xfs_alloc.h        |  4 +++-
 fs/xfs/libxfs/xfs_bmap.c         | 17 ++++++++++++-----
 fs/xfs/libxfs/xfs_bmap.h         |  5 +++--
 fs/xfs/libxfs/xfs_bmap_btree.c   |  3 ++-
 fs/xfs/libxfs/xfs_format.h       | 16 ++++++++++++++++
 fs/xfs/libxfs/xfs_ialloc.c       | 10 +++++-----
 fs/xfs/libxfs/xfs_ialloc_btree.c |  3 ++-
 fs/xfs/xfs_bmap_util.c           | 17 +++++++++--------
 fs/xfs/xfs_fsops.c               | 13 +++++++++----
 fs/xfs/xfs_log_recover.c         |  3 ++-
 11 files changed, 71 insertions(+), 31 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index a683d7a..4353135 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -1592,6 +1592,7 @@ xfs_free_ag_extent(
 	xfs_agnumber_t	agno,	/* allocation group number */
 	xfs_agblock_t	bno,	/* starting block number */
 	xfs_extlen_t	len,	/* length of extent */
+	uint64_t	owner,	/* extent owner */
 	int		isfl)	/* set if is freelist blocks - no sb acctg */
 {
 	xfs_btree_cur_t	*bno_cur;	/* cursor for by-block btree */
@@ -2010,7 +2011,8 @@ xfs_alloc_fix_freelist(
 		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
 		if (error)
 			goto out_agbp_relse;
-		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1);
+		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
+					   XFS_RMAP_OWN_AG, 1);
 		if (error)
 			goto out_agbp_relse;
 		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
@@ -2020,6 +2022,7 @@ xfs_alloc_fix_freelist(
 	memset(&targs, 0, sizeof(targs));
 	targs.tp = tp;
 	targs.mp = mp;
+	targs.owner = XFS_RMAP_OWN_AG;
 	targs.agbp = agbp;
 	targs.agno = args->agno;
 	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
@@ -2660,7 +2663,8 @@ int				/* error */
 xfs_free_extent(
 	xfs_trans_t	*tp,	/* transaction pointer */
 	xfs_fsblock_t	bno,	/* starting block number of extent */
-	xfs_extlen_t	len)	/* length of extent */
+	xfs_extlen_t	len,	/* length of extent */
+	uint64_t	owner)	/* extent owner */
 {
 	xfs_alloc_arg_t	args;
 	int		error;
@@ -2696,7 +2700,8 @@ xfs_free_extent(
 		goto error0;
 	}
 
-	error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
+	error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno,
+				   len, owner, 0);
 	if (!error)
 		xfs_extent_busy_insert(tp, args.agno, args.agbno, len, 0);
 error0:
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 71379f6..39ca815 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -122,6 +122,7 @@ typedef struct xfs_alloc_arg {
 	char		isfl;		/* set if is freelist blocks - !acctg */
 	char		userdata;	/* set if this is user data */
 	xfs_fsblock_t	firstblock;	/* io first block allocated */
+	uint64_t	owner;		/* owner of blocks being allocated */
 } xfs_alloc_arg_t;
 
 /*
@@ -208,7 +209,8 @@ int				/* error */
 xfs_free_extent(
 	struct xfs_trans *tp,	/* transaction pointer */
 	xfs_fsblock_t	bno,	/* starting block number of extent */
-	xfs_extlen_t	len);	/* length of extent */
+	xfs_extlen_t	len,	/* length of extent */
+	uint64_t	owner);	/* extent owner */
 
 int					/* error */
 xfs_alloc_lookup_le(
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 983a5d0..0b40a29 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -567,10 +567,11 @@ xfs_bmap_validate_ret(
  */
 void
 xfs_bmap_add_free(
+	struct xfs_mount	*mp,		/* mount point structure */
+	struct xfs_bmap_free	*flist,		/* list of extents */
 	xfs_fsblock_t		bno,		/* fs block number of extent */
 	xfs_filblks_t		len,		/* length of extent */
-	xfs_bmap_free_t		*flist,		/* list of extents */
-	xfs_mount_t		*mp)		/* mount point structure */
+	uint64_t		owner)		/* extent owner */
 {
 	xfs_bmap_free_item_t	*cur;		/* current (next) element */
 	xfs_bmap_free_item_t	*new;		/* new element */
@@ -591,9 +592,12 @@ xfs_bmap_add_free(
 	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
 #endif
 	ASSERT(xfs_bmap_free_item_zone != NULL);
+	ASSERT(owner);
+
 	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
 	new->xbfi_startblock = bno;
 	new->xbfi_blockcount = (xfs_extlen_t)len;
+	new->xbfi_owner = owner;
 	for (prev = NULL, cur = flist->xbf_first;
 	     cur != NULL;
 	     prev = cur, cur = cur->xbfi_next) {
@@ -696,7 +700,7 @@ xfs_bmap_btree_to_extents(
 	cblock = XFS_BUF_TO_BLOCK(cbp);
 	if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
 		return error;
-	xfs_bmap_add_free(cbno, 1, cur->bc_private.b.flist, mp);
+	xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, ip->i_ino);
 	ip->i_d.di_nblocks--;
 	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
 	xfs_trans_binval(tp, cbp);
@@ -777,6 +781,7 @@ xfs_bmap_extents_to_btree(
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = mp;
+	args.owner = ip->i_ino;
 	args.firstblock = *firstblock;
 	if (*firstblock == NULLFSBLOCK) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
@@ -923,6 +928,7 @@ xfs_bmap_local_to_extents(
 	memset(&args, 0, sizeof(args));
 	args.tp = tp;
 	args.mp = ip->i_mount;
+	args.owner = ip->i_ino;
 	args.firstblock = *firstblock;
 	/*
 	 * Allocate a block.  We know we need only one, since the
@@ -3706,6 +3712,7 @@ xfs_bmap_btalloc(
 	memset(&args, 0, sizeof(args));
 	args.tp = ap->tp;
 	args.mp = mp;
+	args.owner = ap->ip->i_ino;
 	args.fsbno = ap->blkno;
 
 	/* Trim the allocation back to the maximum an AG can fit. */
@@ -4980,8 +4987,8 @@ xfs_bmap_del_extent(
 	 * If we need to, add to list of extents to delete.
 	 */
 	if (do_fx)
-		xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist,
-			mp);
+		xfs_bmap_add_free(mp, flist, del->br_startblock,
+				  del->br_blockcount, ip->i_ino);
 	/*
 	 * Adjust inode # blocks in the file.
 	 */
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 6aaa0c1..674819f 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -66,6 +66,7 @@ typedef struct xfs_bmap_free_item
 {
 	xfs_fsblock_t		xbfi_startblock;/* starting fs block number */
 	xfs_extlen_t		xbfi_blockcount;/* number of blocks in extent */
+	uint64_t		xbfi_owner;	/* extent owner */
 	struct xfs_bmap_free_item *xbfi_next;	/* link to next entry */
 } xfs_bmap_free_item_t;
 
@@ -182,8 +183,8 @@ void	xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
 
 int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
 void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
-void	xfs_bmap_add_free(xfs_fsblock_t bno, xfs_filblks_t len,
-		struct xfs_bmap_free *flist, struct xfs_mount *mp);
+void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
+			  xfs_fsblock_t bno, xfs_filblks_t len, uint64_t owner);
 void	xfs_bmap_cancel(struct xfs_bmap_free *flist);
 int	xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
 			int *committed);
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 2c44c8e..18fe394 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -445,6 +445,7 @@ xfs_bmbt_alloc_block(
 	args.mp = cur->bc_mp;
 	args.fsbno = cur->bc_private.b.firstblock;
 	args.firstblock = args.fsbno;
+	args.owner = cur->bc_private.b.ip->i_ino;
 
 	if (args.fsbno == NULLFSBLOCK) {
 		args.fsbno = be64_to_cpu(start->l);
@@ -525,7 +526,7 @@ xfs_bmbt_free_block(
 	struct xfs_trans	*tp = cur->bc_tp;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
 
-	xfs_bmap_add_free(fsbno, 1, cur->bc_private.b.flist, mp);
+	xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, ip->i_ino);
 	ip->i_d.di_nblocks--;
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index e81ffec..4c9e7e1 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1288,6 +1288,22 @@ typedef __be32 xfs_inobt_ptr_t;
  */
 #define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
 
+/*
+ * Special owner types.
+ *
+ * Seeing as we only support up to 8EB, we have the upper bit of the owner field
+ * to tell us we have a special owner value. We use these for static metadata
+ * allocated at mkfs/growfs time, as well as for freespace management metadata.
+ */
+#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
+#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* Unknown owner, for EFI recovery */
+#define XFS_RMAP_OWN_FS		(-3ULL)	/* static fs metadata */
+#define XFS_RMAP_OWN_LOG	(-4ULL)	/* internal journal blocks */
+#define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
+#define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
+#define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
+#define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
+
 #define	XFS_RMAP_BLOCK(mp) \
 	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
 	 XFS_FIBT_BLOCK(mp) + 1 : \
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index 66efc70..b08823a 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -612,6 +612,7 @@ xfs_ialloc_ag_alloc(
 	args.tp = tp;
 	args.mp = tp->t_mountp;
 	args.fsbno = NULLFSBLOCK;
+	args.owner = XFS_RMAP_OWN_INODES;
 
 #ifdef DEBUG
 	/* randomly do sparse inode allocations */
@@ -1826,9 +1827,8 @@ xfs_difree_inode_chunk(
 
 	if (!xfs_inobt_issparse(rec->ir_holemask)) {
 		/* not sparse, calculate extent info directly */
-		xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno,
-				  XFS_AGINO_TO_AGBNO(mp, rec->ir_startino)),
-				  mp->m_ialloc_blks, flist, mp);
+		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, sagbno),
+				  mp->m_ialloc_blks, XFS_RMAP_OWN_INODES);
 		return;
 	}
 
@@ -1871,8 +1871,8 @@ xfs_difree_inode_chunk(
 
 		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
 		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
-		xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno, agbno), contigblk,
-				  flist, mp);
+		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, agbno),
+				  contigblk, XFS_RMAP_OWN_INODES);
 
 		/* reset range to current bit and carry on... */
 		startidx = endidx = nextbit;
diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
index 674ad8f..b96db1c 100644
--- a/fs/xfs/libxfs/xfs_ialloc_btree.c
+++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
@@ -96,6 +96,7 @@ xfs_inobt_alloc_block(
 	memset(&args, 0, sizeof(args));
 	args.tp = cur->bc_tp;
 	args.mp = cur->bc_mp;
+	args.owner = XFS_RMAP_OWN_INOBT;
 	args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.a.agno, sbno);
 	args.minlen = 1;
 	args.maxlen = 1;
@@ -129,7 +130,7 @@ xfs_inobt_free_block(
 	int			error;
 
 	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp));
-	error = xfs_free_extent(cur->bc_tp, fsbno, 1);
+	error = xfs_free_extent(cur->bc_tp, fsbno, 1, XFS_RMAP_OWN_INOBT);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 4a29655..5ed272b 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -117,15 +117,16 @@ xfs_bmap_finish(
 	efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
 	for (free = flist->xbf_first; free != NULL; free = next) {
 		next = free->xbfi_next;
-		if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
-				free->xbfi_blockcount))) {
+		error = xfs_free_extent(ntp, free->xbfi_startblock,
+					free->xbfi_blockcount,
+					free->xbfi_owner);
+		if (error) {
 			/*
-			 * The bmap free list will be cleaned up at a
-			 * higher level.  The EFI will be canceled when
-			 * this transaction is aborted.
-			 * Need to force shutdown here to make sure it
-			 * happens, since this transaction may not be
-			 * dirty yet.
+			 * The bmap free list will be cleaned up at a higher
+			 * level.  The EFI will be canceled when this
+			 * transaction is aborted.  Need to force shutdown here
+			 * to make sure it happens, since this transaction may
+			 * not be dirty yet.
 			 */
 			mp = ntp->t_mountp;
 			if (!XFS_FORCED_SHUTDOWN(mp))
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index a564c4c..ebfeb84 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -466,14 +466,19 @@ xfs_growfs_data_private(
 		       be32_to_cpu(agi->agi_length));
 
 		xfs_alloc_log_agf(tp, bp, XFS_AGF_LENGTH);
+
 		/*
 		 * Free the new space.
+		 *
+		 * XFS_RMAP_OWN_NULL is used here to tell the rmap btree that
+		 * this doesn't actually exist in the rmap btree.
 		 */
-		error = xfs_free_extent(tp, XFS_AGB_TO_FSB(mp, agno,
-			be32_to_cpu(agf->agf_length) - new), new);
-		if (error) {
+		error = xfs_free_extent(tp,
+				XFS_AGB_TO_FSB(mp, agno,
+					be32_to_cpu(agf->agf_length) - new),
+				new, XFS_RMAP_OWN_NULL);
+		if (error)
 			goto error0;
-		}
 	}
 
 	/*
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 4a8c440..5dad26c 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3753,7 +3753,8 @@ xlog_recover_process_efi(
 
 	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
 		extp = &(efip->efi_format.efi_extents[i]);
-		error = xfs_free_extent(tp, extp->ext_start, extp->ext_len);
+		error = xfs_free_extent(tp, extp->ext_start, extp->ext_len,
+					XFS_RMAP_OWN_UNKNOWN);
 		if (error)
 			goto abort_error;
 		xfs_trans_log_efd_extent(tp, efdp, extp->ext_start,
-- 
2.0.0


* [PATCH 09/20] xfs: introduce rmap extent operation stubs
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (7 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 08/20] xfs: add owner field to extent allocation and freeing Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 10/20] xfs: define the on-disk rmap btree format Dave Chinner
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Add the stubs into the extent allocation and freeing paths that the
rmap btree implementation will hook into. While doing this, add the
trace points that will be used to track rmap btree extent
manipulations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/Makefile                |  1 +
 fs/xfs/libxfs/xfs_alloc.c      | 11 ++++++
 fs/xfs/libxfs/xfs_rmap.c       | 89 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.h | 30 ++++++++++++++
 fs/xfs/xfs_trace.h             | 37 ++++++++++++++++++
 5 files changed, 168 insertions(+)
 create mode 100644 fs/xfs/libxfs/xfs_rmap.c
 create mode 100644 fs/xfs/libxfs/xfs_rmap_btree.h

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index b346326..07f36d3 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -50,6 +50,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_inode_fork.o \
 				   xfs_inode_buf.o \
 				   xfs_log_rlimit.o \
+				   xfs_rmap.o \
 				   xfs_sb.o \
 				   xfs_symlink_remote.o \
 				   xfs_trans_resv.o \
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 4353135..f62775a 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -26,6 +26,7 @@
 #include "xfs_mount.h"
 #include "xfs_inode.h"
 #include "xfs_btree.h"
+#include "xfs_rmap_btree.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_alloc.h"
 #include "xfs_extent_busy.h"
@@ -644,6 +645,12 @@ xfs_alloc_ag_vextent(
 	ASSERT(!args->wasfromfl || !args->isfl);
 	ASSERT(args->agbno % args->alignment == 0);
 
+	/* insert new block into the reverse map btree */
+	error = xfs_rmap_alloc(args->tp, args->agbp, args->agno,
+			       args->agbno, args->len, args->owner);
+	if (error)
+		return error;
+
 	if (!args->wasfromfl) {
 		error = xfs_alloc_update_counters(args->tp, args->pag,
 						  args->agbp,
@@ -1610,6 +1617,10 @@ xfs_free_ag_extent(
 	xfs_extlen_t	nlen;		/* new length of freespace */
 	xfs_perag_t	*pag;		/* per allocation group data */
 
+	error = xfs_rmap_free(tp, agbp, agno, bno, len, owner);
+	if (error)
+		goto error0;
+
 	mp = tp->t_mountp;
 	/*
 	 * Allocate and initialize a cursor for the by-block btree.
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
new file mode 100644
index 0000000..3958cf8
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -0,0 +1,89 @@
+
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_btree.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_trace.h"
+#include "xfs_error.h"
+#include "xfs_extent_busy.h"
+
+int
+xfs_rmap_free(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	uint64_t		owner)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	int			error = 0;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	trace_xfs_rmap_free_extent(mp, agno, bno, len, owner);
+	if (1)
+		goto out_error;
+	trace_xfs_rmap_free_extent_done(mp, agno, bno, len, owner);
+	return 0;
+
+out_error:
+	trace_xfs_rmap_free_extent_error(mp, agno, bno, len, owner);
+	return error;
+}
+
+int
+xfs_rmap_alloc(
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	uint64_t		owner)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	int			error = 0;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return 0;
+
+	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, owner);
+	if (1)
+		goto out_error;
+	trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, owner);
+	return 0;
+
+out_error:
+	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, owner);
+	return error;
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
new file mode 100644
index 0000000..f1caa40
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#ifndef __XFS_RMAP_BTREE_H__
+#define	__XFS_RMAP_BTREE_H__
+
+struct xfs_buf;
+
+int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
+		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+		   uint64_t owner);
+int xfs_rmap_free(struct xfs_trans *tp, struct xfs_buf *agbp,
+		  xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
+		  uint64_t owner);
+
+#endif	/* __XFS_RMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 8d916d3..25bd4f5 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -1673,6 +1673,43 @@ DEFINE_ALLOC_EVENT(xfs_alloc_vextent_noagbp);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_loopfailed);
 DEFINE_ALLOC_EVENT(xfs_alloc_vextent_allfailed);
 
+DECLARE_EVENT_CLASS(xfs_rmap_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
+		 xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner),
+	TP_ARGS(mp, agno, agbno, len, owner),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, agbno)
+		__field(xfs_extlen_t, len)
+		__field(uint64_t, owner)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+		__entry->agbno = agbno;
+		__entry->len = len;
+		__entry->owner = owner;
+	),
+	TP_printk("dev %d:%d agno %u agbno %u len %u, owner 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->agbno,
+		  __entry->len,
+		  __entry->owner)
+);
+#define DEFINE_RMAP_EVENT(name) \
+DEFINE_EVENT(xfs_rmap_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+		 xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner), \
+	TP_ARGS(mp, agno, agbno, len, owner))
+DEFINE_RMAP_EVENT(xfs_rmap_free_extent);
+DEFINE_RMAP_EVENT(xfs_rmap_free_extent_done);
+DEFINE_RMAP_EVENT(xfs_rmap_free_extent_error);
+DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent);
+DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_done);
+DEFINE_RMAP_EVENT(xfs_rmap_alloc_extent_error);
+
 DECLARE_EVENT_CLASS(xfs_da_class,
 	TP_PROTO(struct xfs_da_args *args),
 	TP_ARGS(args),
-- 
2.0.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 10/20] xfs: define the on-disk rmap btree format
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (8 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 09/20] xfs: introduce rmap extent operation stubs Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 11/20] xfs: add rmap btree growfs support Dave Chinner
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Now we have all the surrounding call infrastructure in place, we can
start filling out the rmap btree implementation. Start with the
on-disk btree format; add everything needed to read, write and
manipulate rmap btree blocks. This prepares the way for adding the
btree operations implementation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/Makefile                |   1 +
 fs/xfs/libxfs/xfs_btree.c      |   3 +
 fs/xfs/libxfs/xfs_btree.h      |  18 ++--
 fs/xfs/libxfs/xfs_format.h     |  27 ++++++
 fs/xfs/libxfs/xfs_rmap_btree.c | 187 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.h |  31 +++++++
 fs/xfs/libxfs/xfs_sb.c         |   6 ++
 fs/xfs/libxfs/xfs_shared.h     |   2 +
 fs/xfs/xfs_mount.h             |   2 +
 9 files changed, 269 insertions(+), 8 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_rmap_btree.c
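
As a rough illustration of the block geometry this format implies, the
following standalone sketch mirrors the arithmetic in
xfs_rmapbt_maxrecs() and the XFS_RMAP_*_ADDR macros. Everything with a
sketch_/SKETCH_ prefix is hypothetical and exists only for this note. In
particular, the 56-byte header length is an assumption about the value
of XFS_BTREE_SBLOCK_CRC_LEN, and the record, key and pointer sizes are
taken from the structures defined in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes; see the note above for where each comes from. */
#define SKETCH_RMAP_BLOCK_LEN	56u	/* assumed XFS_BTREE_SBLOCK_CRC_LEN */
#define SKETCH_RMAP_REC_LEN	16u	/* __be32 + __be32 + __be64 */
#define SKETCH_RMAP_KEY_LEN	4u	/* __be32 start block */
#define SKETCH_RMAP_PTR_LEN	4u	/* __be32 AG block pointer */

/* Maximum records (leaf) or key/pointer pairs (node) in one block. */
static inline unsigned int
sketch_rmapbt_maxrecs(
	unsigned int	blocklen,
	int		leaf)
{
	assert(blocklen > SKETCH_RMAP_BLOCK_LEN);
	blocklen -= SKETCH_RMAP_BLOCK_LEN;

	if (leaf)
		return blocklen / SKETCH_RMAP_REC_LEN;
	return blocklen / (SKETCH_RMAP_KEY_LEN + SKETCH_RMAP_PTR_LEN);
}

/* Byte offset of the 1-based record 'index' within a leaf block. */
static inline size_t
sketch_rmap_rec_offset(
	unsigned int	index)
{
	assert(index >= 1);
	return SKETCH_RMAP_BLOCK_LEN +
	       (size_t)(index - 1) * SKETCH_RMAP_REC_LEN;
}

/*
 * Byte offset of the 1-based pointer 'index' within a node block.
 * Pointers are stored after the full array of 'maxrecs' keys.
 */
static inline size_t
sketch_rmap_ptr_offset(
	unsigned int	index,
	unsigned int	maxrecs)
{
	assert(index >= 1 && index <= maxrecs);
	return SKETCH_RMAP_BLOCK_LEN +
	       (size_t)maxrecs * SKETCH_RMAP_KEY_LEN +
	       (size_t)(index - 1) * SKETCH_RMAP_PTR_LEN;
}
```

Under those assumptions a 4096 byte block holds 252 leaf records or 505
key/pointer pairs, and the first record starts immediately after the
header.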

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 07f36d3..654e28c 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -51,6 +51,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_inode_buf.o \
 				   xfs_log_rlimit.o \
 				   xfs_rmap.o \
+				   xfs_rmap_btree.o \
 				   xfs_sb.o \
 				   xfs_symlink_remote.o \
 				   xfs_trans_resv.o \
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 0426152..4c9b9b3 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1114,6 +1114,9 @@ xfs_btree_set_refs(
 	case XFS_BTNUM_BMAP:
 		xfs_buf_set_ref(bp, XFS_BMAP_BTREE_REF);
 		break;
+	case XFS_BTNUM_RMAP:
+		xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF);
+		break;
 	default:
 		ASSERT(0);
 	}
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 494ee0b..67e2bbd 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -38,17 +38,19 @@ union xfs_btree_ptr {
 };
 
 union xfs_btree_key {
-	xfs_bmbt_key_t		bmbt;
-	xfs_bmdr_key_t		bmbr;	/* bmbt root block */
-	xfs_alloc_key_t		alloc;
-	xfs_inobt_key_t		inobt;
+	struct xfs_bmbt_key		bmbt;
+	xfs_bmdr_key_t			bmbr;	/* bmbt root block */
+	xfs_alloc_key_t			alloc;
+	struct xfs_inobt_key		inobt;
+	struct xfs_rmap_key		rmap;
 };
 
 union xfs_btree_rec {
-	xfs_bmbt_rec_t		bmbt;
-	xfs_bmdr_rec_t		bmbr;	/* bmbt root block */
-	xfs_alloc_rec_t		alloc;
-	xfs_inobt_rec_t		inobt;
+	struct xfs_bmbt_rec		bmbt;
+	xfs_bmdr_rec_t			bmbr;	/* bmbt root block */
+	struct xfs_alloc_rec		alloc;
+	struct xfs_inobt_rec		inobt;
+	struct xfs_rmap_rec		rmap;
 };
 
 /*
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 4c9e7e1..cd61ce9 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1304,6 +1304,33 @@ typedef __be32 xfs_inobt_ptr_t;
 #define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
 #define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
 
+/*
+ * Data record structure
+ */
+struct xfs_rmap_rec {
+	__be32		rm_startblock;	/* extent start block */
+	__be32		rm_blockcount;	/* extent length */
+	__be64		rm_owner;	/* extent owner */
+};
+
+struct xfs_rmap_irec {
+	xfs_agblock_t	rm_startblock;	/* extent start block */
+	xfs_extlen_t	rm_blockcount;	/* extent length */
+	__uint64_t	rm_owner;	/* extent owner */
+};
+
+/*
+ * Key structure
+ *
+ * We don't use the length for lookups
+ */
+struct xfs_rmap_key {
+	__be32		rm_startblock;	/* extent start block */
+};
+
+/* btree pointer type */
+typedef __be32 xfs_rmap_ptr_t;
+
 #define	XFS_RMAP_BLOCK(mp) \
 	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
 	 XFS_FIBT_BLOCK(mp) + 1 : \
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
new file mode 100644
index 0000000..9a02699
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright (c) 2014 Red Hat, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_bit.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_alloc.h"
+#include "xfs_btree.h"
+#include "xfs_rmap_btree.h"
+#include "xfs_trace.h"
+#include "xfs_cksum.h"
+#include "xfs_error.h"
+#include "xfs_extent_busy.h"
+
+static struct xfs_btree_cur *
+xfs_rmapbt_dup_cursor(
+	struct xfs_btree_cur	*cur)
+{
+	return xfs_rmapbt_init_cursor(cur->bc_mp, cur->bc_tp,
+			cur->bc_private.a.agbp, cur->bc_private.a.agno);
+}
+
+static bool
+xfs_rmapbt_verify(
+	struct xfs_buf		*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+	struct xfs_btree_block	*block = XFS_BUF_TO_BLOCK(bp);
+	struct xfs_perag	*pag = bp->b_pag;
+	unsigned int		level;
+
+	/*
+	 * magic number and level verification
+	 *
+	 * During growfs operations, we can't verify the exact level or owner as
+	 * the perag is not fully initialised and hence not attached to the
+	 * buffer.  In this case, check against the maximum tree depth.
+	 *
+	 * Similarly, during log recovery we will have a perag structure
+	 * attached, but the agf information will not yet have been initialised
+	 * from the on disk AGF. Again, we can only check against maximum limits
+	 * in this case.
+	 */
+	if (block->bb_magic != cpu_to_be32(XFS_RMAP_CRC_MAGIC))
+		return false;
+
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return false;
+	if (!uuid_equal(&block->bb_u.s.bb_uuid, &mp->m_sb.sb_uuid))
+		return false;
+	if (block->bb_u.s.bb_blkno != cpu_to_be64(bp->b_bn))
+		return false;
+	if (pag && be32_to_cpu(block->bb_u.s.bb_owner) != pag->pag_agno)
+		return false;
+
+	level = be16_to_cpu(block->bb_level);
+	if (pag && pag->pagf_init) {
+		if (level >= pag->pagf_levels[XFS_BTNUM_RMAPi])
+			return false;
+	} else if (level >= mp->m_ag_maxlevels)
+		return false;
+
+	/* numrecs verification */
+	if (be16_to_cpu(block->bb_numrecs) > mp->m_rmap_mxr[level != 0])
+		return false;
+
+	/* sibling pointer verification */
+	if (!block->bb_u.s.bb_leftsib ||
+	    (be32_to_cpu(block->bb_u.s.bb_leftsib) >= mp->m_sb.sb_agblocks &&
+	     block->bb_u.s.bb_leftsib != cpu_to_be32(NULLAGBLOCK)))
+		return false;
+	if (!block->bb_u.s.bb_rightsib ||
+	    (be32_to_cpu(block->bb_u.s.bb_rightsib) >= mp->m_sb.sb_agblocks &&
+	     block->bb_u.s.bb_rightsib != cpu_to_be32(NULLAGBLOCK)))
+		return false;
+
+	return true;
+}
+
+static void
+xfs_rmapbt_read_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_btree_sblock_verify_crc(bp))
+		xfs_buf_ioerror(bp, -EFSBADCRC);
+	else if (!xfs_rmapbt_verify(bp))
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+
+	if (bp->b_error) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_verifier_error(bp);
+	}
+}
+
+static void
+xfs_rmapbt_write_verify(
+	struct xfs_buf	*bp)
+{
+	if (!xfs_rmapbt_verify(bp)) {
+		trace_xfs_btree_corrupt(bp, _RET_IP_);
+		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+		xfs_verifier_error(bp);
+		return;
+	}
+	xfs_btree_sblock_calc_crc(bp);
+
+}
+
+const struct xfs_buf_ops xfs_rmapbt_buf_ops = {
+	.verify_read = xfs_rmapbt_read_verify,
+	.verify_write = xfs_rmapbt_write_verify,
+};
+
+static const struct xfs_btree_ops xfs_rmapbt_ops = {
+	.rec_len		= sizeof(struct xfs_rmap_rec),
+	.key_len		= sizeof(struct xfs_rmap_key),
+
+	.dup_cursor		= xfs_rmapbt_dup_cursor,
+	.buf_ops		= &xfs_rmapbt_buf_ops,
+};
+
+/*
+ * Allocate a new rmap btree cursor.
+ */
+struct xfs_btree_cur *
+xfs_rmapbt_init_cursor(
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_buf		*agbp,
+	xfs_agnumber_t		agno)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	struct xfs_btree_cur	*cur;
+
+	cur = kmem_zone_zalloc(xfs_btree_cur_zone, KM_SLEEP);
+	cur->bc_tp = tp;
+	cur->bc_mp = mp;
+	cur->bc_btnum = XFS_BTNUM_RMAP;
+	cur->bc_flags = XFS_BTREE_CRC_BLOCKS;
+	cur->bc_blocklog = mp->m_sb.sb_blocklog;
+	cur->bc_ops = &xfs_rmapbt_ops;
+	cur->bc_nlevels = be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]);
+
+	cur->bc_private.a.agbp = agbp;
+	cur->bc_private.a.agno = agno;
+
+	return cur;
+}
+
+/*
+ * Calculate number of records in an rmap btree block.
+ */
+int
+xfs_rmapbt_maxrecs(
+	struct xfs_mount	*mp,
+	int			blocklen,
+	int			leaf)
+{
+	blocklen -= XFS_RMAP_BLOCK_LEN;
+
+	if (leaf)
+		return blocklen / sizeof(struct xfs_rmap_rec);
+	return blocklen /
+		(sizeof(struct xfs_rmap_key) + sizeof(xfs_rmap_ptr_t));
+}
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h
index f1caa40..f04c9a1 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rmap_btree.h
@@ -19,6 +19,37 @@
 #define	__XFS_RMAP_BTREE_H__
 
 struct xfs_buf;
+struct xfs_btree_cur;
+struct xfs_mount;
+
+/* rmaps only exist on crc enabled filesystems */
+#define XFS_RMAP_BLOCK_LEN	XFS_BTREE_SBLOCK_CRC_LEN
+
+/*
+ * Record, key, and pointer address macros for btree blocks.
+ *
+ * (note that some of these may appear unused, but they are used in userspace)
+ */
+#define XFS_RMAP_REC_ADDR(block, index) \
+	((struct xfs_rmap_rec *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 (((index) - 1) * sizeof(struct xfs_rmap_rec))))
+
+#define XFS_RMAP_KEY_ADDR(block, index) \
+	((struct xfs_rmap_key *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 ((index) - 1) * sizeof(struct xfs_rmap_key)))
+
+#define XFS_RMAP_PTR_ADDR(block, index, maxrecs) \
+	((xfs_rmap_ptr_t *) \
+		((char *)(block) + XFS_RMAP_BLOCK_LEN + \
+		 (maxrecs) * sizeof(struct xfs_rmap_key) + \
+		 ((index) - 1) * sizeof(xfs_rmap_ptr_t)))
+
+struct xfs_btree_cur *xfs_rmapbt_init_cursor(struct xfs_mount *mp,
+				struct xfs_trans *tp, struct xfs_buf *bp,
+				xfs_agnumber_t agno);
+int xfs_rmapbt_maxrecs(struct xfs_mount *mp, int blocklen, int leaf);
 
 int xfs_rmap_alloc(struct xfs_trans *tp, struct xfs_buf *agbp,
 		   xfs_agnumber_t agno, xfs_agblock_t bno, xfs_extlen_t len,
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 019dc32..89d8052 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -35,6 +35,7 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_ialloc_btree.h"
+#include "xfs_rmap_btree.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -706,6 +707,11 @@ xfs_sb_mount_common(
 	mp->m_bmap_dmnr[0] = mp->m_bmap_dmxr[0] / 2;
 	mp->m_bmap_dmnr[1] = mp->m_bmap_dmxr[1] / 2;
 
+	mp->m_rmap_mxr[0] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, 1);
+	mp->m_rmap_mxr[1] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, 0);
+	mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2;
+	mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2;
+
 	mp->m_bsize = XFS_FSB_TO_BB(mp, 1);
 	mp->m_ialloc_inos = (int)MAX((__uint16_t)XFS_INODES_PER_CHUNK,
 					sbp->sb_inopblock);
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 8dda4b3..401cdba 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -38,6 +38,7 @@ extern const struct xfs_buf_ops xfs_agi_buf_ops;
 extern const struct xfs_buf_ops xfs_agf_buf_ops;
 extern const struct xfs_buf_ops xfs_agfl_buf_ops;
 extern const struct xfs_buf_ops xfs_allocbt_buf_ops;
+extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_leaf_buf_ops;
 extern const struct xfs_buf_ops xfs_attr3_rmt_buf_ops;
 extern const struct xfs_buf_ops xfs_bmbt_buf_ops;
@@ -216,6 +217,7 @@ int	xfs_log_calc_minimum_size(struct xfs_mount *);
 #define	XFS_INO_BTREE_REF	3
 #define	XFS_ALLOC_BTREE_REF	2
 #define	XFS_BMAP_BTREE_REF	2
+#define	XFS_RMAP_BTREE_REF	2
 #define	XFS_DIR_BTREE_REF	2
 #define	XFS_INO_REF		2
 #define	XFS_ATTR_BTREE_REF	1
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d9c9834..8030627 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -90,6 +90,8 @@ typedef struct xfs_mount {
 	uint			m_bmap_dmnr[2];	/* min bmap btree records */
 	uint			m_inobt_mxr[2];	/* max inobt btree records */
 	uint			m_inobt_mnr[2];	/* min inobt btree records */
+	uint			m_rmap_mxr[2];	/* max rmap btree records */
+	uint			m_rmap_mnr[2];	/* min rmap btree records */
 	uint			m_ag_maxlevels;	/* XFS_AG_MAXLEVELS */
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* max inobt btree levels. */
-- 
2.0.0

* [PATCH 11/20] xfs: add rmap btree growfs support
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (9 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 10/20] xfs: define the on-disk rmap btree format Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 12/20] xfs: rmap btree transaction reservations Dave Chinner
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Now we can read and write rmap btree blocks, we can add support to
the growfs code to initialise new rmap btree blocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_fsops.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
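
To make the initial contents of a new AG's rmap root block easier to
picture, here is a standalone sketch of the four extents the growfs code
in this patch records. It is not the kernel code: the sketch_/SKETCH_
names are hypothetical, the owner tags stand in for the XFS_RMAP_OWN_*
constants, and the block numbers are passed in rather than derived from
XFS_BNO_BLOCK(), XFS_IBT_BLOCK() and XFS_RMAP_BLOCK().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical owner tags standing in for XFS_RMAP_OWN_{FS,AG,INOBT}. */
enum sketch_owner {
	SKETCH_OWN_FS,
	SKETCH_OWN_AG,
	SKETCH_OWN_INOBT,
};

/* In-core style record: [start, start + count) belongs to owner. */
struct sketch_rmap_irec {
	uint32_t		startblock;
	uint32_t		blockcount;
	enum sketch_owner	owner;
};

#define SKETCH_NEW_AG_RECS	4

/* Fill in the four static metadata extents of a freshly grown AG. */
static inline void
sketch_new_ag_rmaps(
	uint32_t		bno_block,
	uint32_t		ibt_block,
	uint32_t		rmap_block,
	struct sketch_rmap_irec	recs[SKETCH_NEW_AG_RECS])
{
	/* BNO and CNT roots are adjacent and precede the inode btrees. */
	assert(bno_block + 2 <= ibt_block && ibt_block < rmap_block);

	/* AG headers: everything before the BNO btree root. */
	recs[0] = (struct sketch_rmap_irec){ 0, bno_block, SKETCH_OWN_FS };
	/* Free space btree roots: BNO and CNT. */
	recs[1] = (struct sketch_rmap_irec){ bno_block, 2, SKETCH_OWN_AG };
	/* Inode btree roots: everything from IBT up to the rmap root. */
	recs[2] = (struct sketch_rmap_irec){ ibt_block,
					     rmap_block - ibt_block,
					     SKETCH_OWN_INOBT };
	/* The rmap btree root itself. */
	recs[3] = (struct sketch_rmap_irec){ rmap_block, 1, SKETCH_OWN_AG };
}

/* Return 1 if the records are sorted by start block and do not overlap. */
static inline int
sketch_rmaps_ordered(
	const struct sketch_rmap_irec	*recs,
	int				nrecs)
{
	int	i;

	for (i = 1; i < nrecs; i++) {
		if (recs[i - 1].startblock + recs[i - 1].blockcount >
		    recs[i].startblock)
			return 0;
	}
	return 1;
}
```

With illustrative block numbers of 1, 3 and 5 for the BNO, IBT and rmap
roots, the records cover [0,1), [1,3), [3,5) and [5,6), which is the
sorted, non-overlapping order a btree leaf needs.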

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index ebfeb84..b8b2f06 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -32,6 +32,7 @@
 #include "xfs_btree.h"
 #include "xfs_alloc_btree.h"
 #include "xfs_alloc.h"
+#include "xfs_rmap_btree.h"
 #include "xfs_ialloc.h"
 #include "xfs_fsops.h"
 #include "xfs_itable.h"
@@ -243,6 +244,12 @@ xfs_growfs_data_private(
 		agf->agf_roots[XFS_BTNUM_CNTi] = cpu_to_be32(XFS_CNT_BLOCK(mp));
 		agf->agf_levels[XFS_BTNUM_BNOi] = cpu_to_be32(1);
 		agf->agf_levels[XFS_BTNUM_CNTi] = cpu_to_be32(1);
+		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+			agf->agf_roots[XFS_BTNUM_RMAPi] =
+						cpu_to_be32(XFS_RMAP_BLOCK(mp));
+			agf->agf_levels[XFS_BTNUM_RMAPi] = cpu_to_be32(1);
+		}
+
 		agf->agf_flfirst = 0;
 		agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
 		agf->agf_flcount = 0;
@@ -382,6 +389,67 @@ xfs_growfs_data_private(
 		if (error)
 			goto error0;
 
+		/* RMAP btree root block */
+		if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+			struct xfs_rmap_rec	*rrec;
+			struct xfs_btree_block	*block;
+			bp = xfs_growfs_get_hdr_buf(mp,
+				XFS_AGB_TO_DADDR(mp, agno, XFS_RMAP_BLOCK(mp)),
+				BTOBB(mp->m_sb.sb_blocksize), 0,
+				&xfs_rmapbt_buf_ops);
+			if (!bp) {
+				error = -ENOMEM;
+				goto error0;
+			}
+
+			xfs_btree_init_block(mp, bp, XFS_RMAP_CRC_MAGIC, 0, 2,
+						agno, XFS_BTREE_CRC_BLOCKS);
+			block = XFS_BUF_TO_BLOCK(bp);
+
+
+			/*
+			 * Mark the AG header regions as static metadata. The
+			 * BNO btree block is the first block after the headers,
+			 * so its location defines the size of the region the
+			 * static metadata consumes.
+			 *
+			 * Note: unlike mkfs, we never have to account for log
+			 * space when growing the data regions
+			 */
+			rrec = XFS_RMAP_REC_ADDR(block, 1);
+			rrec->rm_startblock = 0;
+			rrec->rm_blockcount = cpu_to_be32(XFS_BNO_BLOCK(mp));
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_FS);
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account freespace btree root blocks */
+			rrec = XFS_RMAP_REC_ADDR(block, 2);
+			rrec->rm_startblock = cpu_to_be32(XFS_BNO_BLOCK(mp));
+			rrec->rm_blockcount = cpu_to_be32(2);
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account inode btree root blocks */
+			rrec = XFS_RMAP_REC_ADDR(block, 3);
+			rrec->rm_startblock = cpu_to_be32(XFS_IBT_BLOCK(mp));
+			rrec->rm_blockcount = cpu_to_be32(XFS_RMAP_BLOCK(mp) -
+							XFS_IBT_BLOCK(mp));
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_INOBT);
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			/* account for rmap btree root */
+			rrec = XFS_RMAP_REC_ADDR(block, 4);
+			rrec->rm_startblock = cpu_to_be32(XFS_RMAP_BLOCK(mp));
+			rrec->rm_blockcount = cpu_to_be32(1);
+			rrec->rm_owner = cpu_to_be64(XFS_RMAP_OWN_AG);
+			be16_add_cpu(&block->bb_numrecs, 1);
+
+			error = xfs_bwrite(bp);
+			xfs_buf_relse(bp);
+			if (error)
+				goto error0;
+		}
+
 		/*
 		 * INO btree root block
 		 */
-- 
2.0.0

* [PATCH 12/20] xfs: rmap btree transaction reservations
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (10 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 11/20] xfs: add rmap btree growfs support Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 13/20] xfs: rmap btree requires more reserved free space Dave Chinner
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The rmap btrees will use the AGFL as the block allocation source, so
we need to ensure that the transaction reservations reflect the fact
this tree is modified by allocation and freeing. Hence we need to
extend all the extent allocation/free reservations used in
transactions to handle this.

Note that this also gets rid of the unused XFS_ALLOCFREE_LOG_RES
macro, as we now do buffer reservations based on the number of
buffers logged via xfs_calc_buf_res(). Hence we only need the buffer
count calculation now.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c | 56 +++++++++++++++++++++++++++++-------------
 fs/xfs/libxfs/xfs_trans_resv.h | 10 --------
 2 files changed, 39 insertions(+), 27 deletions(-)
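
For a feel of how much the buffer count grows, the formula used by the
new xfs_allocfree_log_count() helper can be evaluated in isolation. The
sketch below is a standalone restatement of that formula, not the kernel
function: sketch_allocfree_log_count() is a hypothetical name, and the
maximum btree height and feature flag are plain parameters instead of
being read from struct xfs_mount.

```c
#include <assert.h>

/*
 * Number of btree buffers that may be logged for 'num_ops' extent
 * allocations or frees:
 *
 *	num_ops * num_trees * ((2 blocks/level * max depth) - 1)
 *
 * Two free space btrees are always modified; the rmap btree adds a third.
 */
static inline unsigned int
sketch_allocfree_log_count(
	unsigned int	ag_maxlevels,
	int		has_rmapbt,
	unsigned int	num_ops)
{
	unsigned int	num_trees = 2;

	assert(ag_maxlevels >= 1);
	if (has_rmapbt)
		num_trees++;

	return num_ops * num_trees * (2 * ag_maxlevels - 1);
}
```

Taking a maximum height of 5 purely as an example, a single extent
operation accounts for 18 buffers without the rmap btree and 27 with it,
so every reservation built on this count grows by half when rmap is
enabled.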

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 68cb1e7..d495f82 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -64,6 +64,28 @@ xfs_calc_buf_res(
 }
 
 /*
+ * Per-extent log reservation for the allocation btree changes
+ * involved in freeing or allocating an extent. When rmap is not enabled,
+ * there are only two trees that will be modified (free space trees), and when
+ * rmap is enabled there will be three (freespace + rmap trees). The number of
+ * blocks reserved is based on the formula:
+ *
+ * num trees * ((2 blocks/level * max depth) - 1)
+ */
+static uint
+xfs_allocfree_log_count(
+	struct xfs_mount *mp,
+	uint		num_ops)
+{
+	uint		num_trees = 2;
+
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb))
+		num_trees++;
+
+	return num_ops * num_trees * (2 * mp->m_ag_maxlevels - 1);
+}
+
+/*
  * Logging inodes is really tricksy. They are logged in memory format,
  * which means that what we write into the log doesn't directly translate into
  * the amount of space they use on disk.
@@ -126,7 +148,7 @@ xfs_calc_inode_res(
  */
 STATIC uint
 xfs_calc_finobt_res(
-	struct xfs_mount 	*mp,
+	struct xfs_mount	*mp,
 	int			alloc,
 	int			modify)
 {
@@ -137,7 +159,7 @@ xfs_calc_finobt_res(
 
 	res = xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1));
 	if (alloc)
-		res += xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1), 
+		res += xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 					XFS_FSB_TO_B(mp, 1));
 	if (modify)
 		res += (uint)XFS_FSB_TO_B(mp, 1);
@@ -188,10 +210,10 @@ xfs_calc_write_reservation(
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
 				      XFS_FSB_TO_B(mp, 1)) +
 		     xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -217,10 +239,10 @@ xfs_calc_itruncate_reservation(
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) + 1,
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
 				      XFS_FSB_TO_B(mp, 1)) +
 		    xfs_calc_buf_res(5, 0) +
-		    xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				     XFS_FSB_TO_B(mp, 1)) +
 		    xfs_calc_buf_res(2 + mp->m_ialloc_blks +
 				     mp->m_in_maxlevels, 0)));
@@ -247,7 +269,7 @@ xfs_calc_rename_reservation(
 		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 3),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 3),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -286,7 +308,7 @@ xfs_calc_link_reservation(
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -324,7 +346,7 @@ xfs_calc_remove_reservation(
 		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
 				      XFS_FSB_TO_B(mp, 1))),
 		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -371,7 +393,7 @@ xfs_calc_create_resv_alloc(
 		mp->m_sb.sb_sectsize +
 		xfs_calc_buf_res(mp->m_ialloc_blks, XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -399,7 +421,7 @@ xfs_calc_icreate_resv_alloc(
 	return xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
 		mp->m_sb.sb_sectsize +
 		xfs_calc_buf_res(mp->m_in_maxlevels, XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_finobt_res(mp, 0, 0);
 }
@@ -483,7 +505,7 @@ xfs_calc_ifree_reservation(
 		xfs_calc_buf_res(1, 0) +
 		xfs_calc_buf_res(2 + mp->m_ialloc_blks +
 				 mp->m_in_maxlevels, 0) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_finobt_res(mp, 0, 1);
 }
@@ -513,7 +535,7 @@ xfs_calc_growdata_reservation(
 	struct xfs_mount	*mp)
 {
 	return xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -535,7 +557,7 @@ xfs_calc_growrtalloc_reservation(
 		xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK),
 				 XFS_FSB_TO_B(mp, 1)) +
 		xfs_calc_inode_res(mp, 1) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -611,7 +633,7 @@ xfs_calc_addafork_reservation(
 		xfs_calc_buf_res(1, mp->m_dir_geo->blksize) +
 		xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1,
 				 XFS_FSB_TO_B(mp, 1)) +
-		xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 1),
+		xfs_calc_buf_res(xfs_allocfree_log_count(mp, 1),
 				 XFS_FSB_TO_B(mp, 1));
 }
 
@@ -634,7 +656,7 @@ xfs_calc_attrinval_reservation(
 		    xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK),
 				     XFS_FSB_TO_B(mp, 1))),
 		   (xfs_calc_buf_res(9, mp->m_sb.sb_sectsize) +
-		    xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 4),
+		    xfs_calc_buf_res(xfs_allocfree_log_count(mp, 4),
 				     XFS_FSB_TO_B(mp, 1))));
 }
 
@@ -701,7 +723,7 @@ xfs_calc_attrrm_reservation(
 					XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
 		     xfs_calc_buf_res(XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK), 0)),
 		    (xfs_calc_buf_res(5, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(XFS_ALLOCFREE_LOG_COUNT(mp, 2),
+		     xfs_calc_buf_res(xfs_allocfree_log_count(mp, 2),
 				      XFS_FSB_TO_B(mp, 1))));
 }
 
diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h
index 7978150..0eb46ed 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.h
+++ b/fs/xfs/libxfs/xfs_trans_resv.h
@@ -68,16 +68,6 @@ struct xfs_trans_resv {
 #define M_RES(mp)	(&(mp)->m_resv)
 
 /*
- * Per-extent log reservation for the allocation btree changes
- * involved in freeing or allocating an extent.
- * 2 trees * (2 blocks/level * max depth - 1) * block size
- */
-#define	XFS_ALLOCFREE_LOG_RES(mp,nx) \
-	((nx) * (2 * XFS_FSB_TO_B((mp), 2 * (mp)->m_ag_maxlevels - 1)))
-#define	XFS_ALLOCFREE_LOG_COUNT(mp,nx) \
-	((nx) * (2 * (2 * (mp)->m_ag_maxlevels - 1)))
-
-/*
  * Per-directory log reservation for any directory change.
  * dir blocks: (1 btree block per level + data block + free block) * dblock size
  * bmap btree: (levels + 2) * max depth * block size
-- 
2.0.0

* [PATCH 13/20] xfs: rmap btree requires more reserved free space
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (11 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 12/20] xfs: rmap btree transaction reservations Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-25 16:41   ` Brian Foster
  2015-06-03  6:04 ` [PATCH 14/20] xfs: add rmap btree operations Dave Chinner
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The rmap btree is allocated from the AGFL, which means we have to
ensure ENOSPC is reported to userspace before we run out of free
space in each AG. The last allocation in an AG can cause a full
height rmap btree split, and that means we have to reserve at least
this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
Update the various space calculation functions to handle this.

Also, because the macros are now executing conditional code and are
called quite frequently, convert them to functions that initialise
variables in the struct xfs_mount, use the new variables everywhere
and document the calculations better.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
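Reviewer aid: the set-aside arithmetic described in the commit message can
be sketched in standalone C. This is a minimal model under stated
assumptions, not the kernel code: model_set_aside() and
model_ag_max_usable() are hypothetical names, the AG header size is passed
in as a block count instead of being derived from the sector size, and the
rmap term is applied per AG, as the code comment in the patch describes.

```c
#include <stdbool.h>

#define MODEL_AGFL_RESERVE 4	/* mirrors XFS_ALLOC_AGFL_RESERVE */

/*
 * Filesystem-wide set-aside: 4 blocks plus the per-AG freelist reserve.
 * With rmap btrees, add one full-height split (2 * maxlevels - 1 blocks)
 * for every AG.
 */
unsigned int
model_set_aside(unsigned int agcount, unsigned int maxlevels, bool has_rmapbt)
{
	unsigned int blocks = 4 + agcount * MODEL_AGFL_RESERVE;

	if (!has_rmapbt)
		return blocks;
	return blocks + agcount * (2 * maxlevels - 1);
}

/*
 * Largest allocation a single AG can satisfy: AG size minus headers,
 * freelist reserve, btree roots and (with rmap) a full-height split.
 */
unsigned int
model_ag_max_usable(unsigned int agblocks, unsigned int hdr_blocks,
		    unsigned int maxlevels, bool has_finobt, bool has_rmapbt)
{
	unsigned int blocks = hdr_blocks;	/* sb, AGF, AGI, AGFL */

	blocks += MODEL_AGFL_RESERVE;
	blocks += 3;				/* bno, cnt, ino roots */
	if (has_finobt)
		blocks++;			/* finobt root */
	if (has_rmapbt)
		blocks += 1 + (2 * maxlevels - 1);	/* root + split */
	return agblocks - blocks;
}
```

With 4 AGs and a maximum btree height of 5, the model sets aside 20 blocks
without rmap and 56 with it. Without finobt or rmap, the per-AG figure
reduces to agblocks - hdr_blocks - 7, which matches the macro being removed.
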
 fs/xfs/libxfs/xfs_alloc.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_alloc.h | 41 ++++------------------------
 fs/xfs/libxfs/xfs_bmap.c  |  2 +-
 fs/xfs/libxfs/xfs_sb.c    |  3 +++
 fs/xfs/xfs_discard.c      |  2 +-
 fs/xfs/xfs_fsops.c        |  4 +--
 fs/xfs/xfs_mount.c        |  2 +-
 fs/xfs/xfs_mount.h        |  2 ++
 fs/xfs/xfs_super.c        |  2 +-
 9 files changed, 85 insertions(+), 42 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index f62775a..c6a1372 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -62,6 +62,72 @@ xfs_prealloc_blocks(
 }
 
 /*
+ * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
+ * AGF buffer (PV 947395), we place constraints on the relationship among actual
+ * allocations for data blocks, freelist blocks, and potential file data bmap
+ * btree blocks. However, these restrictions may result in no actual space
+ * allocated for a delayed extent, for example, a data block in a certain AG is
+ * allocated but there is no additional block for the additional bmap btree
+ * block due to a split of the bmap btree of the file. The result of this may
+ * lead to an infinite loop when the file gets flushed to disk and all delayed
+ * extents need to be actually allocated. To get around this, we explicitly set
+ * aside a few blocks which will not be reserved in delayed allocation.
+ *
+ * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
+ * not using rmap btrees, and a potential split of a file's bmap btree requires
+ * 1 fsb, so we set the number of set-aside blocks to 4 + 4*agcount when not
+ * using rmap btrees.
+ *
+ * When rmap btrees are active, we have to consider that using the last block in
+ * the AG can cause a full height rmap btree split and we need enough blocks on
+ * the AGFL to be able to handle this. That means we need, in addition to the
+ * above, another (2 * mp->m_ag_maxlevels) - 1 blocks _per AG_ available to the
+ * free list.
+ */
+unsigned int
+xfs_alloc_set_aside(
+	struct xfs_mount *mp)
+{
+	unsigned int	blocks;
+
+	blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
+	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
+		return blocks;
+	return blocks + (mp->m_sb.sb_agcount * ((2 * mp->m_ag_maxlevels) - 1));
+}
+
+/*
+ * When deciding how much space to allocate out of an AG, we limit the
+ * allocation maximum size to the size of the AG. However, we cannot use all
+ * the blocks in the AG - some are permanently used by metadata. These
+ * blocks are generally:
+ *	- the AG superblock, AGF, AGI and AGFL
+ *	- the AGF (bno and cnt) and AGI btree root blocks, and optionally
+ *	  the AGI free inode and rmap btree root blocks.
+ *	- blocks on the AGFL according to xfs_alloc_set_aside() limits
+ *
+ * The AG headers are sector sized, so the amount of space they take up is
+ * dependent on filesystem geometry. The others are all single blocks.
+ */
+unsigned int
+xfs_alloc_ag_max_usable(struct xfs_mount *mp)
+{
+	unsigned int	blocks;
+
+	blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
+	blocks += XFS_ALLOC_AGFL_RESERVE;
+	blocks += 3;			/* AGF, AGI btree root blocks */
+	if (xfs_sb_version_hasfinobt(&mp->m_sb))
+		blocks++;		/* finobt root block */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		/* rmap root block + full tree split on full AG */
+		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
+	}
+
+	return mp->m_sb.sb_agblocks - blocks;
+}
+
+/*
  * Lookup the record equal to [bno, len] in the btree given by cur.
  */
 STATIC int				/* error */
@@ -1906,6 +1972,9 @@ xfs_alloc_min_freelist(
 	/* space needed by-size freespace btree */
 	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
 				       mp->m_ag_maxlevels);
+	/* space needed by the reverse mapping btree */
+	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
+				       mp->m_ag_maxlevels);
 
 	return min_free;
 }
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
index 39ca815..18e4080 100644
--- a/fs/xfs/libxfs/xfs_alloc.h
+++ b/fs/xfs/libxfs/xfs_alloc.h
@@ -56,42 +56,6 @@ typedef unsigned int xfs_alloctype_t;
 #define	XFS_ALLOC_FLAG_FREEING	0x00000002  /* indicate caller is freeing extents*/
 
 /*
- * In order to avoid ENOSPC-related deadlock caused by
- * out-of-order locking of AGF buffer (PV 947395), we place
- * constraints on the relationship among actual allocations for
- * data blocks, freelist blocks, and potential file data bmap
- * btree blocks. However, these restrictions may result in no
- * actual space allocated for a delayed extent, for example, a data
- * block in a certain AG is allocated but there is no additional
- * block for the additional bmap btree block due to a split of the
- * bmap btree of the file. The result of this may lead to an
- * infinite loop in xfssyncd when the file gets flushed to disk and
- * all delayed extents need to be actually allocated. To get around
- * this, we explicitly set aside a few blocks which will not be
- * reserved in delayed allocation. Considering the minimum number of
- * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
- * btree requires 1 fsb, so we set the number of set-aside blocks
- * to 4 + 4*agcount.
- */
-#define XFS_ALLOC_SET_ASIDE(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
-
-/*
- * When deciding how much space to allocate out of an AG, we limit the
- * allocation maximum size to the size the AG. However, we cannot use all the
- * blocks in the AG - some are permanently used by metadata. These
- * blocks are generally:
- *	- the AG superblock, AGF, AGI and AGFL
- *	- the AGF (bno and cnt) and AGI btree root blocks
- *	- 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
- *
- * The AG headers are sector sized, so the amount of space they take up is
- * dependent on filesystem geometry. The others are all single blocks.
- */
-#define XFS_ALLOC_AG_MAX_USABLE(mp)	\
-	((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) - 7)
-
-
-/*
  * Argument structure for xfs_alloc routines.
  * This is turned into a structure to avoid having 20 arguments passed
  * down several levels of the stack.
@@ -131,6 +95,11 @@ typedef struct xfs_alloc_arg {
 #define XFS_ALLOC_USERDATA		1	/* allocation is for user data*/
 #define XFS_ALLOC_INITIAL_USER_DATA	2	/* special case start of file */
 
+/* freespace limit calculations */
+#define XFS_ALLOC_AGFL_RESERVE	4
+unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
+unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
+
 xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
 		struct xfs_perag *pag, xfs_extlen_t need);
 unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 0b40a29..dfb9f28 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3716,7 +3716,7 @@ xfs_bmap_btalloc(
 	args.fsbno = ap->blkno;
 
 	/* Trim the allocation back to the maximum an AG can fit. */
-	args.maxlen = MIN(ap->length, XFS_ALLOC_AG_MAX_USABLE(mp));
+	args.maxlen = MIN(ap->length, mp->m_ag_max_usable);
 	args.firstblock = *ap->firstblock;
 	blen = 0;
 	if (nullfb) {
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 89d8052..c136abd 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -721,6 +721,9 @@ xfs_sb_mount_common(
 		mp->m_ialloc_min_blks = sbp->sb_spino_align;
 	else
 		mp->m_ialloc_min_blks = mp->m_ialloc_blks;
+
+	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
+	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
 }
 
 /*
diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index e85a951..ec7bb8b 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -179,7 +179,7 @@ xfs_ioc_trim(
 	 * matter as trimming blocks is an advisory interface.
 	 */
 	if (range.start >= XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks) ||
-	    range.minlen > XFS_FSB_TO_B(mp, XFS_ALLOC_AG_MAX_USABLE(mp)) ||
+	    range.minlen > XFS_FSB_TO_B(mp, mp->m_ag_max_usable) ||
 	    range.len < mp->m_sb.sb_blocksize)
 		return -EINVAL;
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index b8b2f06..d914a51 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -715,7 +715,7 @@ xfs_fs_counts(
 	cnt->allocino = percpu_counter_read_positive(&mp->m_icount);
 	cnt->freeino = percpu_counter_read_positive(&mp->m_ifree);
 	cnt->freedata = percpu_counter_read_positive(&mp->m_fdblocks) -
-							XFS_ALLOC_SET_ASIDE(mp);
+						mp->m_alloc_set_aside;
 
 	spin_lock(&mp->m_sb_lock);
 	cnt->freertx = mp->m_sb.sb_frextents;
@@ -788,7 +788,7 @@ retry:
 		__int64_t	free;
 
 		free = percpu_counter_sum(&mp->m_fdblocks) -
-							XFS_ALLOC_SET_ASIDE(mp);
+						mp->m_alloc_set_aside;
 		if (!free)
 			goto out; /* ENOSPC and fdblks_delta = 0 */
 
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 9d6be55..05d3878 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1192,7 +1192,7 @@ xfs_mod_fdblocks(
 		batch = XFS_FDBLOCKS_BATCH;
 
 	__percpu_counter_add(&mp->m_fdblocks, delta, batch);
-	if (__percpu_counter_compare(&mp->m_fdblocks, XFS_ALLOC_SET_ASIDE(mp),
+	if (__percpu_counter_compare(&mp->m_fdblocks, mp->m_alloc_set_aside,
 				     XFS_FDBLOCKS_BATCH) >= 0) {
 		/* we had space! */
 		return 0;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 8030627..cdced0b 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -96,6 +96,8 @@ typedef struct xfs_mount {
 	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
 	uint			m_in_maxlevels;	/* max inobt btree levels. */
 	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
+	uint			m_alloc_set_aside; /* space we can't use */
+	uint			m_ag_max_usable; /* max space per AG */
 	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
 	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
 	struct mutex		m_growlock;	/* growfs mutex */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 1fb16562..796ccb5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1080,7 +1080,7 @@ xfs_fs_statfs(
 	statp->f_blocks = sbp->sb_dblocks - lsize;
 	spin_unlock(&mp->m_sb_lock);
 
-	statp->f_bfree = fdblocks - XFS_ALLOC_SET_ASIDE(mp);
+	statp->f_bfree = fdblocks - mp->m_alloc_set_aside;
 	statp->f_bavail = statp->f_bfree;
 
 	fakeinos = statp->f_bfree << sbp->sb_inopblog;
-- 
2.0.0


* [PATCH 14/20] xfs: add rmap btree operations
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (12 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 13/20] xfs: rmap btree requires more reserved free space Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 15/20] xfs: add an extent to the rmap btree Dave Chinner
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Implement the generic btree operations needed to manipulate rmap
btree blocks. This is very similar to the per-ag freespace btree
implementation, and uses the AGFL for allocation and freeing of
blocks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
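Reviewer aid: the ordering rules this patch implements (keys compared by
start block only, records required not to overlap) can be modelled in
standalone C. This is a sketch, not the kernel code: the model_* names are
hypothetical, the on-disk big-endian conversion is left out, and the end
block is computed in 64 bits so the model cannot wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* In-core view of one reverse mapping: [startblock, blockcount, owner]. */
struct model_rmap_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

/*
 * Negative if the key sorts before the search record, zero if it matches,
 * positive if it sorts after. Only the start block takes part.
 */
int64_t
model_key_diff(uint32_t key_startblock, const struct model_rmap_rec *search)
{
	return (int64_t)key_startblock - (int64_t)search->startblock;
}

/* Two adjacent records are in order if r1 ends at or before r2 starts. */
bool
model_recs_inorder(const struct model_rmap_rec *r1,
		   const struct model_rmap_rec *r2)
{
	return (uint64_t)r1->startblock + r1->blockcount <= r2->startblock;
}
```

Because the owner is not part of the key, a lookup answers "which record
covers or precedes this block"; the caller then checks the owner itself.
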
 fs/xfs/libxfs/xfs_btree.h      |   1 +
 fs/xfs/libxfs/xfs_rmap.c       |  58 ++++++++++++
 fs/xfs/libxfs/xfs_rmap_btree.c | 204 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 263 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 67e2bbd..48ab2b1 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -204,6 +204,7 @@ typedef struct xfs_btree_cur
 		xfs_alloc_rec_incore_t	a;
 		xfs_bmbt_irec_t		b;
 		xfs_inobt_rec_incore_t	i;
+		struct xfs_rmap_irec	r;
 	}		bc_rec;		/* current insert/search record value */
 	struct xfs_buf	*bc_bufs[XFS_BTREE_MAXLEVELS];	/* buf ptr per level */
 	int		bc_ptrs[XFS_BTREE_MAXLEVELS];	/* key/record # */
diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 3958cf8..38a92a1 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -36,6 +36,64 @@
 #include "xfs_error.h"
 #include "xfs_extent_busy.h"
 
+/*
+ * Lookup the first record less than or equal to [bno, len]
+ * in the btree given by cur.
+ */
+STATIC int
+xfs_rmap_lookup_le(
+	struct xfs_btree_cur	*cur,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	uint64_t		owner,
+	int			*stat)
+{
+	cur->bc_rec.r.rm_startblock = bno;
+	cur->bc_rec.r.rm_blockcount = len;
+	cur->bc_rec.r.rm_owner = owner;
+	return xfs_btree_lookup(cur, XFS_LOOKUP_LE, stat);
+}
+
+/*
+ * Update the record referred to by cur to the value given
+ * by [bno, len, owner].
+ * This either works (return 0) or gets an EFSCORRUPTED error.
+ */
+STATIC int
+xfs_rmap_update(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*irec)
+{
+	union xfs_btree_rec	rec;
+
+	rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock);
+	rec.rmap.rm_blockcount = cpu_to_be32(irec->rm_blockcount);
+	rec.rmap.rm_owner = cpu_to_be64(irec->rm_owner);
+	return xfs_btree_update(cur, &rec);
+}
+
+/*
+ * Get the data from the pointed-to record.
+ */
+STATIC int
+xfs_rmap_get_rec(
+	struct xfs_btree_cur	*cur,
+	struct xfs_rmap_irec	*irec,
+	int			*stat)
+{
+	union xfs_btree_rec	*rec;
+	int			error;
+
+	error = xfs_btree_get_rec(cur, &rec, stat);
+	if (error || !*stat)
+		return error;
+
+	irec->rm_startblock = be32_to_cpu(rec->rmap.rm_startblock);
+	irec->rm_blockcount = be32_to_cpu(rec->rmap.rm_blockcount);
+	irec->rm_owner = be64_to_cpu(rec->rmap.rm_owner);
+	return 0;
+}
+
 int
 xfs_rmap_free(
 	struct xfs_trans	*tp,
diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c
index 9a02699..0b396e6 100644
--- a/fs/xfs/libxfs/xfs_rmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rmap_btree.c
@@ -34,6 +34,26 @@
 #include "xfs_error.h"
 #include "xfs_extent_busy.h"
 
+/*
+ * Reverse map btree.
+ *
+ * This is a per-ag tree used to track the owner of a given extent. Owner
+ * records are inserted when an extent is allocated, and removed when an extent
+ * is freed. There can only be one owner of an extent, usually an inode or some
+ * other metadata structure like an AG btree.
+ *
+ * The rmap btree is part of the free space management, so blocks for the tree
+ * are sourced from the agfl. Hence we need transaction reservation support for
+ * this tree so that the freelist is always large enough. This also impacts on
+ * the minimum space we need to leave free in the AG.
+ *
+ * The tree is ordered by block number - there's no need to order/search by
+ * extent size for online updating/management of the tree, and the reverse
+ * lookups are going to be "who owns this block", so by-block ordering is
+ * perfect for this.
+ *
+ */
+
 static struct xfs_btree_cur *
 xfs_rmapbt_dup_cursor(
 	struct xfs_btree_cur	*cur)
@@ -42,6 +62,153 @@ xfs_rmapbt_dup_cursor(
 			cur->bc_private.a.agbp, cur->bc_private.a.agno);
 }
 
+STATIC void
+xfs_rmapbt_set_root(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr,
+	int			inc)
+{
+	struct xfs_buf		*agbp = cur->bc_private.a.agbp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	xfs_agnumber_t		seqno = be32_to_cpu(agf->agf_seqno);
+	int			btnum = cur->bc_btnum;
+	struct xfs_perag	*pag = xfs_perag_get(cur->bc_mp, seqno);
+
+	ASSERT(ptr->s != 0);
+
+	agf->agf_roots[btnum] = ptr->s;
+	be32_add_cpu(&agf->agf_levels[btnum], inc);
+	pag->pagf_levels[btnum] += inc;
+	xfs_perag_put(pag);
+
+	xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_ROOTS | XFS_AGF_LEVELS);
+}
+
+STATIC int
+xfs_rmapbt_alloc_block(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*start,
+	union xfs_btree_ptr	*new,
+	int			*stat)
+{
+	int			error;
+	xfs_agblock_t		bno;
+
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_ENTRY);
+
+	/* Allocate the new block from the freelist. If we can't, give up.  */
+	error = xfs_alloc_get_freelist(cur->bc_tp, cur->bc_private.a.agbp,
+				       &bno, 1);
+	if (error) {
+		XFS_BTREE_TRACE_CURSOR(cur, XBT_ERROR);
+		return error;
+	}
+
+	if (bno == NULLAGBLOCK) {
+		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+		*stat = 0;
+		return 0;
+	}
+
+	xfs_extent_busy_reuse(cur->bc_mp, cur->bc_private.a.agno, bno, 1, false);
+
+	xfs_trans_agbtree_delta(cur->bc_tp, 1);
+	new->s = cpu_to_be32(bno);
+
+	XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
+	*stat = 1;
+	return 0;
+}
+
+STATIC int
+xfs_rmapbt_free_block(
+	struct xfs_btree_cur	*cur,
+	struct xfs_buf		*bp)
+{
+	struct xfs_buf		*agbp = cur->bc_private.a.agbp;
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
+	xfs_agblock_t		bno;
+	int			error;
+
+	bno = xfs_daddr_to_agbno(cur->bc_mp, XFS_BUF_ADDR(bp));
+	error = xfs_alloc_put_freelist(cur->bc_tp, agbp, NULL, bno, 1);
+	if (error)
+		return error;
+
+	xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1,
+			      XFS_EXTENT_BUSY_SKIP_DISCARD);
+	xfs_trans_agbtree_delta(cur->bc_tp, -1);
+
+	xfs_trans_binval(cur->bc_tp, bp);
+	return 0;
+}
+
+STATIC int
+xfs_rmapbt_get_minrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_rmap_mnr[level != 0];
+}
+
+STATIC int
+xfs_rmapbt_get_maxrecs(
+	struct xfs_btree_cur	*cur,
+	int			level)
+{
+	return cur->bc_mp->m_rmap_mxr[level != 0];
+}
+
+STATIC void
+xfs_rmapbt_init_key_from_rec(
+	union xfs_btree_key	*key,
+	union xfs_btree_rec	*rec)
+{
+	key->rmap.rm_startblock = rec->rmap.rm_startblock;
+}
+
+STATIC void
+xfs_rmapbt_init_rec_from_key(
+	union xfs_btree_key	*key,
+	union xfs_btree_rec	*rec)
+{
+	rec->rmap.rm_startblock = key->rmap.rm_startblock;
+}
+
+STATIC void
+xfs_rmapbt_init_rec_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*rec)
+{
+	rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock);
+	rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount);
+	rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner);
+}
+
+STATIC void
+xfs_rmapbt_init_ptr_from_cur(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_ptr	*ptr)
+{
+	struct xfs_agf		*agf = XFS_BUF_TO_AGF(cur->bc_private.a.agbp);
+
+	ASSERT(cur->bc_private.a.agno == be32_to_cpu(agf->agf_seqno));
+	ASSERT(agf->agf_roots[cur->bc_btnum] != 0);
+
+	ptr->s = agf->agf_roots[cur->bc_btnum];
+}
+
+STATIC __int64_t
+xfs_rmapbt_key_diff(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*key)
+{
+	struct xfs_rmap_irec	*rec = &cur->bc_rec.r;
+	struct xfs_rmap_key	*kp = &key->rmap;
+
+	return (__int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock;
+}
+
 static bool
 xfs_rmapbt_verify(
 	struct xfs_buf		*bp)
@@ -133,12 +300,49 @@ const struct xfs_buf_ops xfs_rmapbt_buf_ops = {
 	.verify_write = xfs_rmapbt_write_verify,
 };
 
+#if defined(DEBUG) || defined(XFS_WARN)
+STATIC int
+xfs_rmapbt_keys_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_key	*k1,
+	union xfs_btree_key	*k2)
+{
+	return be32_to_cpu(k1->rmap.rm_startblock) <
+	       be32_to_cpu(k2->rmap.rm_startblock);
+}
+
+STATIC int
+xfs_rmapbt_recs_inorder(
+	struct xfs_btree_cur	*cur,
+	union xfs_btree_rec	*r1,
+	union xfs_btree_rec	*r2)
+{
+	return be32_to_cpu(r1->rmap.rm_startblock) +
+		be32_to_cpu(r1->rmap.rm_blockcount) <=
+		be32_to_cpu(r2->rmap.rm_startblock);
+}
+#endif	/* DEBUG */
+
 static const struct xfs_btree_ops xfs_rmapbt_ops = {
 	.rec_len		= sizeof(struct xfs_rmap_rec),
 	.key_len		= sizeof(struct xfs_rmap_key),
 
 	.dup_cursor		= xfs_rmapbt_dup_cursor,
+	.set_root		= xfs_rmapbt_set_root,
+	.alloc_block		= xfs_rmapbt_alloc_block,
+	.free_block		= xfs_rmapbt_free_block,
+	.get_minrecs		= xfs_rmapbt_get_minrecs,
+	.get_maxrecs		= xfs_rmapbt_get_maxrecs,
+	.init_key_from_rec	= xfs_rmapbt_init_key_from_rec,
+	.init_rec_from_key	= xfs_rmapbt_init_rec_from_key,
+	.init_rec_from_cur	= xfs_rmapbt_init_rec_from_cur,
+	.init_ptr_from_cur	= xfs_rmapbt_init_ptr_from_cur,
+	.key_diff		= xfs_rmapbt_key_diff,
 	.buf_ops		= &xfs_rmapbt_buf_ops,
+#if defined(DEBUG) || defined(XFS_WARN)
+	.keys_inorder		= xfs_rmapbt_keys_inorder,
+	.recs_inorder		= xfs_rmapbt_recs_inorder,
+#endif
 };
 
 /*
-- 
2.0.0


* [PATCH 15/20] xfs: add an extent to the rmap btree
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (13 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 14/20] xfs: add rmap btree operations Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-25 16:41   ` Brian Foster
  2015-06-03  6:04 ` [PATCH 16/20] xfs: remove an extent from " Dave Chinner
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Now that all the btree, free space and transaction infrastructure is
in place, we can finally add the code to insert reverse mappings to
the rmap btree. Freeing will be done in a separate patch, so just the
addition operation can be focussed on here.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
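Reviewer aid: the insertion logic (merge left, merge left and right, merge
right, or insert a new record) can be modelled on a sorted array in
standalone C. This is a sketch under stated assumptions, not the kernel
code: the model_* names and the fixed-size array are hypothetical, and the
btree cursor is replaced by a linear scan.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_RECS	16

struct model_rmap_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

struct model_rmap {
	struct model_rmap_rec	recs[MODEL_MAX_RECS];	/* sorted by start */
	int			nrecs;
};

/* Returns 0 on success, -1 on overlap, no left record, or a full table. */
int
model_rmap_alloc(struct model_rmap *t, uint32_t bno, uint32_t len,
		 uint64_t owner)
{
	struct model_rmap_rec	*l, *g;
	int			lt = -1, gt, i, left, right;

	/* "lookup_le": the last record starting at or before bno */
	for (i = 0; i < t->nrecs && t->recs[i].startblock <= bno; i++)
		lt = i;
	if (lt < 0)
		return -1;
	gt = lt + 1;
	l = &t->recs[lt];
	g = gt < t->nrecs ? &t->recs[gt] : NULL;

	/* the new extent must not overlap either neighbour */
	if ((uint64_t)l->startblock + l->blockcount > bno)
		return -1;
	if (g && (uint64_t)bno + len > g->startblock)
		return -1;

	left = l->owner == owner && l->startblock + l->blockcount == bno;
	right = g && g->owner == owner && bno + len == g->startblock;

	if (left) {
		l->blockcount += len;
		if (right) {
			/* bridge: absorb the right record, then delete it */
			l->blockcount += g->blockcount;
			memmove(g, g + 1, (t->nrecs - gt - 1) * sizeof(*g));
			t->nrecs--;
		}
	} else if (right) {
		g->startblock = bno;
		g->blockcount += len;
	} else {
		if (t->nrecs == MODEL_MAX_RECS)
			return -1;
		memmove(&t->recs[gt + 1], &t->recs[gt],
			(t->nrecs - gt) * sizeof(t->recs[0]));
		t->recs[gt] = (struct model_rmap_rec){ bno, len, owner };
		t->nrecs++;
	}
	return 0;
}
```

As in the patch, the model relies on a record always existing to the left
of the insertion point, which the static AG header record at block 0
provides.
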
 fs/xfs/libxfs/xfs_rmap.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 138 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 38a92a1..c1e5d23 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -120,6 +120,18 @@ out_error:
 	return error;
 }
 
+/*
+ * When we allocate a new block, the first thing we do is add a reference to the
+ * extent in the rmap btree. This takes the form of a [agbno, length, owner]
+ * record.  Newly inserted extents should never overlap with an existing extent
+ * in the rmap btree. Hence the insertion is a relatively trivial exercise,
+ * involving checking for adjacent records and merging if the new extent is
+ * contiguous and has the same owner.
+ *
+ * Note that we have no MAXEXTLEN limits here when merging as the length in the
+ * record has the full 32 bits available and hence a single record can track the
+ * entire space in the AG.
+ */
 int
 xfs_rmap_alloc(
 	struct xfs_trans	*tp,
@@ -130,18 +142,143 @@ xfs_rmap_alloc(
 	uint64_t		owner)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_btree_cur	*cur;
+	struct xfs_rmap_irec	ltrec;
+	struct xfs_rmap_irec	gtrec;
+	int			have_gt;
 	int			error = 0;
+	int			i;
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
 
 	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, owner);
-	if (1)
+	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+
+	/*
+	 * For the initial lookup, look for an exact match or the left-adjacent
+	 * record for our insertion point. This will also give us the record for
+	 * start block contiguity tests.
+	 */
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+	error = xfs_rmap_get_rec(cur, &ltrec, &i);
+	if (error)
 		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	//printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, ltrec 0x%x/0x%x/0x%llx\n",
+	//		agno, bno, len, owner, ltrec.rm_startblock,
+	//		ltrec.rm_blockcount, ltrec.rm_owner);
+
+	XFS_WANT_CORRUPTED_GOTO(mp,
+		ltrec.rm_startblock + ltrec.rm_blockcount <= bno, out_error);
+
+	/*
+	 * Increment the cursor to see if we have a right-adjacent record to our
+	 * insertion point. This will give us the record for end block
+	 * contiguity tests.
+	 */
+	error = xfs_btree_increment(cur, 0, &have_gt);
+	if (error)
+		goto out_error;
+	if (have_gt) {
+		error = xfs_rmap_get_rec(cur, &gtrec, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	//printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, gtrec 0x%x/0x%x/0x%llx\n",
+	//		agno, bno, len, owner, gtrec.rm_startblock,
+	//		gtrec.rm_blockcount, gtrec.rm_owner);
+		XFS_WANT_CORRUPTED_GOTO(mp, bno + len <= gtrec.rm_startblock,
+					out_error);
+	} else {
+		gtrec.rm_owner = XFS_RMAP_OWN_NULL;
+	}
+
+	/*
+	 * Note: cursor currently points one record to the right of ltrec, even
+	 * if there is no record in the tree to the right.
+	 */
+	if (ltrec.rm_owner == owner &&
+	    ltrec.rm_startblock + ltrec.rm_blockcount == bno) {
+		/*
+		 * left edge contiguous, merge into left record.
+		 *
+		 *       ltbno     ltlen
+		 * orig:   |ooooooooo|
+		 * adding:           |aaaaaaaaa|
+		 * result: |rrrrrrrrrrrrrrrrrrr|
+		 *                  bno       len
+		 */
+		//printk("add left\n");
+		ltrec.rm_blockcount += len;
+		if (gtrec.rm_owner == owner &&
+		    bno + len == gtrec.rm_startblock) {
+			//printk("add middle\n");
+			/*
+			 * right edge also contiguous, delete right record
+			 * and merge into left record.
+			 *
+			 *       ltbno     ltlen    gtbno     gtlen
+			 * orig:   |ooooooooo|         |ooooooooo|
+			 * adding:           |aaaaaaaaa|
+			 * result: |rrrrrrrrrrrrrrrrrrrrrrrrrrrrr|
+			 */
+			ltrec.rm_blockcount += gtrec.rm_blockcount;
+			error = xfs_btree_delete(cur, &i);
+			if (error)
+				goto out_error;
+			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+		}
+
+		/* point the cursor back to the left record and update */
+		error = xfs_btree_decrement(cur, 0, &have_gt);
+		if (error)
+			goto out_error;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else if (gtrec.rm_owner == owner &&
+		   bno + len == gtrec.rm_startblock) {
+		/*
+		 * right edge contiguous, merge into right record.
+		 *
+		 *                 gtbno     gtlen
+		 * Orig:             |ooooooooo|
+		 * adding: |aaaaaaaaa|
+		 * Result: |rrrrrrrrrrrrrrrrrrr|
+		 *        bno       len
+		 */
+		//printk("add right\n");
+		gtrec.rm_startblock = bno;
+		gtrec.rm_blockcount += len;
+		error = xfs_rmap_update(cur, &gtrec);
+		if (error)
+			goto out_error;
+	} else {
+		//printk("add no match\n");
+		/*
+		 * no contiguous edge with identical owner, insert
+		 * new record at current cursor position.
+		 */
+		cur->bc_rec.r.rm_startblock = bno;
+		cur->bc_rec.r.rm_blockcount = len;
+		cur->bc_rec.r.rm_owner = owner;
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	}
+
 	trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, owner);
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
 	return 0;
 
 out_error:
 	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, owner);
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
 	return error;
 }
-- 
2.0.0


* [PATCH 16/20] xfs: remove an extent from the rmap btree
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (14 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 15/20] xfs: add an extent to the rmap btree Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 17/20] xfs: add rmap btree geometry feature flag Dave Chinner
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Now that we have records in the rmap btree, we need to remove them
when extents are freed. This needs to find the relevant record in
the btree and remove/trim/split it accordingly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
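Reviewer aid: the four removal cases (exact match, trim left, trim right,
split in the middle) can be modelled on a single record in standalone C.
This is a sketch under stated assumptions, not the kernel code: the model_*
names are hypothetical, the owner checks and the growfs/EFI special cases
are left out, and the containment check is stricter than the individual
corruption checks in the patch.

```c
#include <stdint.h>

struct model_rmap_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

enum model_free_case {
	MODEL_FREE_BAD = -1,	/* range not covered by the record */
	MODEL_FREE_EXACT,	/* record deleted */
	MODEL_FREE_LEFT,	/* start moved up, length trimmed */
	MODEL_FREE_RIGHT,	/* length trimmed */
	MODEL_FREE_MIDDLE,	/* record split in two */
};

/*
 * Remove [bno, bno + len) from *rec. The surviving pieces are written to
 * out[] and their count (0, 1 or 2) to *nout.
 */
enum model_free_case
model_rmap_free(const struct model_rmap_rec *rec, uint32_t bno, uint32_t len,
		struct model_rmap_rec out[2], int *nout)
{
	uint64_t	rec_end = (uint64_t)rec->startblock + rec->blockcount;
	uint64_t	end = (uint64_t)bno + len;

	*nout = 0;
	if (bno < rec->startblock || end > rec_end)
		return MODEL_FREE_BAD;

	if (rec->startblock == bno && rec->blockcount == len)
		return MODEL_FREE_EXACT;

	out[0] = *rec;
	*nout = 1;
	if (rec->startblock == bno) {
		out[0].startblock += len;
		out[0].blockcount -= len;
		return MODEL_FREE_LEFT;
	}
	if (rec_end == end) {
		out[0].blockcount -= len;
		return MODEL_FREE_RIGHT;
	}

	/* middle: shorten the left piece, create a new right piece */
	out[0].blockcount = bno - rec->startblock;
	out[1].startblock = (uint32_t)end;
	out[1].blockcount = (uint32_t)(rec_end - end);
	out[1].owner = rec->owner;
	*nout = 2;
	return MODEL_FREE_MIDDLE;
}
```

Only the middle case grows the tree by a record, which is why freeing an
extent can still require a block from the AGFL.
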
 fs/xfs/libxfs/xfs_rmap.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 153 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index c1e5d23..8fd356f 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -94,6 +94,31 @@ xfs_rmap_get_rec(
 	return 0;
 }
 
+/*
+ * Find the extent in the rmap btree and remove it.
+ *
+ * The record we find should always span a range greater than or equal to the
+ * extent being freed. This makes the code simple as, in theory, we do not
+ * have to handle ranges that are split across multiple records as extents that
+ * result in bmap btree extent merges should also result in rmap btree extent
+ * merges.  The owner field ensures we don't merge extents from different
+ * structures into the same record, hence this property should always hold true
+ * if we ensure that the rmap btree supports at least the same size maximum
+ * extent as the bmap btree (bmbt MAXEXTLEN is 2^21 blocks at present, rmap
+ * btree record can hold 2^32 blocks in a single extent).
+ *
+ * Special Case #1: when growing the filesystem, we "free" an extent when
+ * growing the last AG. This extent is new space and so it is not tracked as
+ * used space in the btree. The growfs code will pass in an owner of
+ * XFS_RMAP_OWN_NULL to indicate that it expects that there is no owner of this
+ * extent. We verify that - the extent lookup must result in a record that does
+ * not overlap.
+ *
+ * Special Case #2: EFIs do not record the owner of the extent, so when
+ * recovering EFIs from the log we pass in XFS_RMAP_OWN_UNKNOWN to tell the rmap
+ * btree to ignore the owner (i.e. wildcard match) so we don't trigger
+ * corruption checks during log recovery.
+ */
 int
 xfs_rmap_free(
 	struct xfs_trans	*tp,
@@ -104,19 +129,146 @@ xfs_rmap_free(
 	uint64_t		owner)
 {
 	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_btree_cur	*cur;
+	struct xfs_rmap_irec	ltrec;
 	int			error = 0;
+	int			i;
 
 	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
 		return 0;
 
 	trace_xfs_rmap_free_extent(mp, agno, bno, len, owner);
-	if (1)
+	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
+
+	/*
+	 * We should always have a left record because there's a static record
+	 * for the AG headers at rm_startblock == 0 created by mkfs/growfs that
+	 * will not ever be removed from the tree.
+	 */
+	error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
+	if (error)
+		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+	error = xfs_rmap_get_rec(cur, &ltrec, &i);
+	if (error)
 		goto out_error;
+	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+
+	/*
+	 * For growfs, the incoming extent must be beyond the left record we
+	 * just found as it is new space and won't be used by anyone. This is
+	 * just a corruption check as we don't actually do anything with this
+	 * extent.
+	 */
+	if (owner == XFS_RMAP_OWN_NULL) {
+		XFS_WANT_CORRUPTED_GOTO(mp, bno > ltrec.rm_startblock +
+						ltrec.rm_blockcount, out_error);
+		goto out_done;
+	}
+
+/*
+	if (owner != ltrec.rm_owner ||
+	    bno > ltrec.rm_startblock + ltrec.rm_blockcount)
+ */
+	//printk("rmfree  ag %d bno 0x%x/0x%x/0x%llx, ltrec 0x%x/0x%x/0x%llx\n",
+	//		agno, bno, len, owner, ltrec.rm_startblock,
+	//		ltrec.rm_blockcount, ltrec.rm_owner);
+
+	/* make sure the extent we found covers the entire freeing range. */
+	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno, out_error);
+	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_blockcount >= len, out_error);
+	XFS_WANT_CORRUPTED_GOTO(mp,
+		bno <= ltrec.rm_startblock + ltrec.rm_blockcount, out_error);
+
+	/* make sure the owner matches what we expect to find in the tree */
+	XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner ||
+				    (owner < XFS_RMAP_OWN_NULL &&
+				     owner >= XFS_RMAP_OWN_MIN), out_error);
+
+	if (ltrec.rm_startblock == bno && ltrec.rm_blockcount == len) {
+	//printk("remove exact\n");
+		/* exact match, simply remove the record from rmap tree */
+		error = xfs_btree_delete(cur, &i);
+		if (error)
+			goto out_error;
+		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
+	} else if (ltrec.rm_startblock == bno) {
+	//printk("remove left\n");
+		/*
+		 * overlap left hand side of extent: move the start, trim the
+		 * length and update the current record.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing: |fffffffff|
+		 * Result:            |rrrrrrrrrr|
+		 *         bno       len
+		 */
+		ltrec.rm_startblock += len;
+		ltrec.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else if (ltrec.rm_startblock + ltrec.rm_blockcount == bno + len) {
+	//printk("remove right\n");
+		/*
+		 * overlap right hand side of extent: trim the length and update
+		 * the current record.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing:            |fffffffff|
+		 * Result:  |rrrrrrrrrr|
+		 *                    bno       len
+		 */
+		ltrec.rm_blockcount -= len;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+	} else {
+
+		/*
+		 * overlap middle of extent: trim the length of the existing
+		 * record to the length of the new left-extent size, increment
+		 * the insertion position so we can insert a new record
+		 * containing the remaining right-extent space.
+		 *
+		 *       ltbno                ltlen
+		 * Orig:    |oooooooooooooooooooo|
+		 * Freeing:       |fffffffff|
+		 * Result:  |rrrrr|         |rrrr|
+		 *               bno       len
+		 */
+		xfs_extlen_t	orig_len = ltrec.rm_blockcount;
+	//printk("remove middle\n");
+
+		ltrec.rm_blockcount = bno - ltrec.rm_startblock;
+		error = xfs_rmap_update(cur, &ltrec);
+		if (error)
+			goto out_error;
+
+		error = xfs_btree_increment(cur, 0, &i);
+		if (error)
+			goto out_error;
+
+		cur->bc_rec.r.rm_startblock = bno + len;
+		cur->bc_rec.r.rm_blockcount = orig_len - len -
+						     ltrec.rm_blockcount;
+		cur->bc_rec.r.rm_owner = ltrec.rm_owner;
+		error = xfs_btree_insert(cur, &i);
+		if (error)
+			goto out_error;
+	}
+
+out_done:
 	trace_xfs_rmap_free_extent_done(mp, agno, bno, len, owner);
+	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
 	return 0;
 
 out_error:
 	trace_xfs_rmap_free_extent_error(mp, agno, bno, len, owner);
+	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
 	return error;
 }
 
-- 
2.0.0
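
The four cases handled by xfs_rmap_free() above (exact match, left
overlap, right overlap, middle split) can be sketched outside the kernel
as plain arithmetic on a single record. This is only an illustration:
struct rmap_rec and rmap_trim() are hypothetical names, no btree cursor
is involved, and the coverage check used here is an assumption that is
stricter than the three XFS_WANT_CORRUPTED_GOTO() range checks in the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_rmap_irec; not the kernel type. */
struct rmap_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

/*
 * Remove [bno, bno + len) from *rec, which must fully cover that range.
 * Returns the number of records that remain (0, 1 or 2), or -1 if the
 * range is not covered. For the middle case the new right-hand record
 * is written to *right.
 */
int
rmap_trim(struct rmap_rec *rec, struct rmap_rec *right,
	  uint32_t bno, uint32_t len)
{
	uint32_t	end = rec->startblock + rec->blockcount;

	if (bno < rec->startblock || bno + len > end)
		return -1;

	/* exact match: the record would simply be deleted */
	if (rec->startblock == bno && rec->blockcount == len)
		return 0;

	/* overlap left hand side: move the start, trim the length */
	if (rec->startblock == bno) {
		rec->startblock += len;
		rec->blockcount -= len;
		return 1;
	}

	/* overlap right hand side: trim the length only */
	if (end == bno + len) {
		rec->blockcount -= len;
		return 1;
	}

	/* overlap middle: shrink the left part, emit a new right part */
	right->startblock = bno + len;
	right->blockcount = end - (bno + len);
	right->owner = rec->owner;
	rec->blockcount = bno - rec->startblock;
	return 2;
}
```

For a record covering blocks [100, 120), freeing 5 blocks at 105 leaves
[100, 105) in place and yields a new record [110, 120) with the same
owner, which is the "overlap middle" diagram in the patch.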

* [PATCH 17/20] xfs: add rmap btree geometry feature flag
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (15 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 16/20] xfs: remove an extent from " Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 18/20] xfs: add rmap btree block detection to log recovery Dave Chinner
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Export the rmap btree feature through the geometry flags so that
xfs_info and other userspace utilities can tell that the filesystem is
using it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_fs.h | 1 +
 fs/xfs/xfs_fsops.c     | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 89689c6..9fbdb86 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -240,6 +240,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_FTYPE	0x10000	/* inode directory types */
 #define XFS_FSOP_GEOM_FLAGS_FINOBT	0x20000	/* free inode btree */
 #define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks	*/
+#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* Reverse mapping btree */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index d914a51..0eba0d0 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -105,6 +105,8 @@ xfs_fs_geometry(
 				XFS_FSOP_GEOM_FLAGS_FINOBT : 0) |
 			(xfs_sb_version_hassparseinodes(&mp->m_sb) ?
-				XFS_FSOP_GEOM_FLAGS_SPINODES : 0);
+				XFS_FSOP_GEOM_FLAGS_SPINODES : 0) |
+			(xfs_sb_version_hasrmapbt(&mp->m_sb) ?
+				XFS_FSOP_GEOM_FLAGS_RMAPBT : 0);
 		geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
 				mp->m_sb.sb_logsectsize : BBSIZE;
 		geo->rtsectsize = mp->m_sb.sb_blocksize;
-- 
2.0.0
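
A minimal sketch of how the geometry flags word is composed and tested,
using the flag values from the xfs_fs.h hunk in this patch. The helpers
geom_flags() and geom_has_rmapbt() are hypothetical and exist only for
illustration. The point is that every feature term has to sit in the
same "|" chain, otherwise the flag never reaches userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the xfs_fs.h hunk of this patch. */
#define XFS_FSOP_GEOM_FLAGS_FINOBT	0x20000	/* free inode btree */
#define XFS_FSOP_GEOM_FLAGS_SPINODES	0x40000	/* sparse inode chunks */
#define XFS_FSOP_GEOM_FLAGS_RMAPBT	0x80000	/* reverse mapping btree */

/* Hypothetical helper: OR the per-feature terms into one flags word. */
uint32_t
geom_flags(bool finobt, bool spinodes, bool rmapbt)
{
	return (finobt ? XFS_FSOP_GEOM_FLAGS_FINOBT : 0) |
	       (spinodes ? XFS_FSOP_GEOM_FLAGS_SPINODES : 0) |
	       (rmapbt ? XFS_FSOP_GEOM_FLAGS_RMAPBT : 0);
}

/* Hypothetical helper: what a userspace tool would test for. */
bool
geom_has_rmapbt(uint32_t flags)
{
	return (flags & XFS_FSOP_GEOM_FLAGS_RMAPBT) != 0;
}
```

A tool such as xfs_info would obtain the real flags word from the
geometry ioctl rather than from geom_flags(); only the bit test carries
over.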

* [PATCH 18/20] xfs: add rmap btree block detection to log recovery
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (16 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 17/20] xfs: add rmap btree geometry feature flag Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 19/20] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled Dave Chinner
  2015-06-03  6:04 ` [PATCH 20/20] xfs: enable the rmap btree functionality Dave Chinner
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

So such blocks can be correctly identified and have their operations
structures attached to validate that recovery has resulted in a
correct block.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 5dad26c..239b19c 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1848,6 +1848,7 @@ xlog_recover_get_buf_lsn(
 	case XFS_ABTC_CRC_MAGIC:
 	case XFS_ABTB_MAGIC:
 	case XFS_ABTC_MAGIC:
+	case XFS_RMAP_CRC_MAGIC:
 	case XFS_IBT_CRC_MAGIC:
 	case XFS_IBT_MAGIC: {
 		struct xfs_btree_block *btb = blk;
@@ -2001,6 +2002,9 @@ xlog_recover_validate_buf_type(
 		case XFS_BMAP_MAGIC:
 			bp->b_ops = &xfs_bmbt_buf_ops;
 			break;
+		case XFS_RMAP_CRC_MAGIC:
+			bp->b_ops = &xfs_rmapbt_buf_ops;
+			break;
 		default:
 			xfs_warn(mp, "Bad btree block magic!");
 			ASSERT(0);
-- 
2.0.0
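
Log recovery identifies btree blocks by their on-disk magic number. A
small sketch of that idea, assuming the XFS_RMAP_CRC_MAGIC value from
patch 05 of this series: magic_to_str() and buf_ops_name() are
hypothetical helpers, and the string returned merely names the
operations structure the patch attaches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3', from patch 05 */

/* Hypothetical helper: render a 32-bit magic as big-endian ASCII. */
void
magic_to_str(uint32_t magic, char out[5])
{
	out[0] = (char)((magic >> 24) & 0xff);
	out[1] = (char)((magic >> 16) & 0xff);
	out[2] = (char)((magic >> 8) & 0xff);
	out[3] = (char)(magic & 0xff);
	out[4] = '\0';
}

/*
 * Hypothetical stand-in for the switch in
 * xlog_recover_validate_buf_type(): map a magic to the name of the
 * buffer ops that would be attached, or NULL for an unknown magic.
 */
const char *
buf_ops_name(uint32_t magic)
{
	switch (magic) {
	case XFS_RMAP_CRC_MAGIC:
		return "xfs_rmapbt_buf_ops";
	default:
		return NULL;	/* "Bad btree block magic!" in the kernel */
	}
}
```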

* [PATCH 19/20] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (17 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 18/20] xfs: add rmap btree block detection to log recovery Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  2015-06-03  6:04 ` [PATCH 20/20] xfs: enable the rmap btree functionality Dave Chinner
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Swapping extents between two inodes requires the owner to be updated
in the rmap tree for all the extents that are swapped. This code
does not yet exist, so switch off the XFS_IOC_SWAPEXT ioctl until
support has been implemented. This will need to be done before the
rmap btree code can have the experimental tag removed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_bmap_util.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 5ed272b..b080cea 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1696,6 +1696,19 @@ xfs_swap_extents(
 	__uint64_t	tmp;
 	int		lock_flags;
 
+	/*
+	 * We can't swap extents on rmap btree enabled filesystems yet
+	 * as there is no mechanism to update the owner of extents in
+	 * the rmap tree. Hence, for the moment, just reject attempts
+	 * to swap extents with EINVAL after emitting a warning once to remind
+	 * us this needs fixing.
+	 */
+	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
+		WARN_ONCE(1,
+	"XFS: XFS_IOC_SWAPEXT not supported on RMAP enabled filesystems\n");
+		return -EINVAL;
+	}
+
 	tempifp = kmem_alloc(sizeof(xfs_ifork_t), KM_MAYFAIL);
 	if (!tempifp) {
 		error = -ENOMEM;
-- 
2.0.0

* [PATCH 20/20] xfs: enable the rmap btree functionality
  2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
                   ` (18 preceding siblings ...)
  2015-06-03  6:04 ` [PATCH 19/20] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled Dave Chinner
@ 2015-06-03  6:04 ` Dave Chinner
  19 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-03  6:04 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Add the feature flag to the supported matrix so that the kernel can
mount and use rmap btree enabled filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_format.h | 3 ++-
 fs/xfs/libxfs/xfs_sb.c     | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index cd61ce9..9cff517 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -447,7 +447,8 @@ xfs_sb_has_compat_feature(
 #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
 #define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
 #define XFS_SB_FEAT_RO_COMPAT_ALL \
-		(XFS_SB_FEAT_RO_COMPAT_FINOBT)
+		(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
+		 XFS_SB_FEAT_RO_COMPAT_RMAPBT)
 #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
 static inline bool
 xfs_sb_has_ro_compat_feature(
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index c136abd..5fb5410 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -196,6 +196,12 @@ xfs_mount_validate_sb(
 		}
 	}
 
+	if (xfs_sb_version_hasrmapbt(sbp)) {
+		xfs_alert(mp,
+"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
+	}
+
+
 	if (unlikely(
 	    sbp->sb_logstart == 0 && mp->m_logdev_targp == mp->m_ddev_targp)) {
 		xfs_warn(mp,
-- 
2.0.0
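
The effect of adding the bit to XFS_SB_FEAT_RO_COMPAT_ALL can be shown
with the mask arithmetic alone. This is a sketch, not the kernel code:
the RO_COMPAT_ALL_OLD and RO_COMPAT_ALL_NEW names and the
has_unknown_ro_compat() helper are hypothetical, while the two feature
bit values are the ones defined in xfs_format.h by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFS_SB_FEAT_RO_COMPAT_FINOBT	(1 << 0)	/* free inode btree */
#define XFS_SB_FEAT_RO_COMPAT_RMAPBT	(1 << 1)	/* reverse map btree */

/* Supported masks before and after this patch (illustrative names). */
#define RO_COMPAT_ALL_OLD	(XFS_SB_FEAT_RO_COMPAT_FINOBT)
#define RO_COMPAT_ALL_NEW	(XFS_SB_FEAT_RO_COMPAT_FINOBT | \
				 XFS_SB_FEAT_RO_COMPAT_RMAPBT)

/* True if the superblock carries ro-compat bits outside the mask. */
bool
has_unknown_ro_compat(uint32_t sb_features, uint32_t supported)
{
	return (sb_features & ~supported) != 0;
}
```

With the old mask an rmapbt filesystem reports an unknown ro-compat
feature; with the new mask it does not, while a bit such as (1 << 2)
is still treated as unknown.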

* Re: [PATCH 05/20] xfs: introduce rmap btree definitions
  2015-06-03  6:04 ` [PATCH 05/20] xfs: introduce rmap btree definitions Dave Chinner
@ 2015-06-03  6:30   ` Darrick J. Wong
  2015-06-03  6:34     ` Darrick J. Wong
  0 siblings, 1 reply; 37+ messages in thread
From: Darrick J. Wong @ 2015-06-03  6:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jun 03, 2015 at 04:04:42PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Add new per-ag rmap btree definitions to the per-ag structures. The
> rmap btree will sit in the empty slots on disk after the free space
> btrees, and hence form a part of the array of space management
> btrees. This requires the definition of the btree to be contiguous
> with the free space btrees.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c  |  6 ++++++
>  fs/xfs/libxfs/xfs_btree.c  |  4 ++--
>  fs/xfs/libxfs/xfs_btree.h  |  3 +++
>  fs/xfs/libxfs/xfs_format.h | 22 +++++++++++++++++-----
>  fs/xfs/libxfs/xfs_types.h  |  4 ++--
>  5 files changed, 30 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index d4aa844..c7206b5 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2267,6 +2267,10 @@ xfs_agf_verify(
>  	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]) > XFS_BTREE_MAXLEVELS)
>  		return false;
>  
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb) &&
> +	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) > XFS_BTREE_MAXLEVELS)
> +		return false;
> +
>  	/*
>  	 * during growfs operations, the perag is not fully initialised,
>  	 * so we can't use it for any useful checking. growfs ensures we can't
> @@ -2397,6 +2401,8 @@ xfs_alloc_read_agf(
>  			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
>  		pag->pagf_levels[XFS_BTNUM_CNTi] =
>  			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
> +		pag->pagf_levels[XFS_BTNUM_RMAPi] =
> +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
>  		spin_lock_init(&pag->pagb_lock);
>  		pag->pagb_count = 0;
>  		pag->pagb_tree = RB_ROOT;
> diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> index c72283d..0426152 100644
> --- a/fs/xfs/libxfs/xfs_btree.c
> +++ b/fs/xfs/libxfs/xfs_btree.c
> @@ -42,9 +42,9 @@ kmem_zone_t	*xfs_btree_cur_zone;
>   * Btree magic numbers.
>   */
>  static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
> -	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
> +	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
>  	  XFS_FIBT_MAGIC },
> -	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC,
> +	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
>  	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
>  };
>  #define xfs_btree_magic(cur) \
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index 8f18bab..ace1995 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -63,6 +63,7 @@ union xfs_btree_rec {
>  #define	XFS_BTNUM_BMAP	((xfs_btnum_t)XFS_BTNUM_BMAPi)
>  #define	XFS_BTNUM_INO	((xfs_btnum_t)XFS_BTNUM_INOi)
>  #define	XFS_BTNUM_FINO	((xfs_btnum_t)XFS_BTNUM_FINOi)
> +#define	XFS_BTNUM_RMAP	((xfs_btnum_t)XFS_BTNUM_RMAPi)
>  
>  /*
>   * For logging record fields.
> @@ -94,6 +95,7 @@ do {    \
>  	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(bmbt, stat); break;	\
>  	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(ibt, stat); break;	\
>  	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(fibt, stat); break;	\
> +	case XFS_BTNUM_RMAP: break;	\

Hmm, so we don't have to provide stats for all the btrees?

<shrug> Actually, I thought it was rather clever that one could grep
/proc/fs/xfs/stat for 'rlbt' to find out if the running xfs driver supports
reflink.

(Not that I want someone to think this is some kind of ABI...)

>  	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
>  	}       \
>  } while (0)
> @@ -108,6 +110,7 @@ do {    \
>  	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_ADD(bmbt, stat, val); break; \
>  	case XFS_BTNUM_INO: __XFS_BTREE_STATS_ADD(ibt, stat, val); break; \
>  	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_ADD(fibt, stat, val); break; \
> +	case XFS_BTNUM_RMAP: break; \
>  	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
>  	}       \
>  } while (0)
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index a0ae572..d120af4 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -445,6 +445,7 @@ xfs_sb_has_compat_feature(
>  }
>  
>  #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
> +#define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
>  #define XFS_SB_FEAT_RO_COMPAT_ALL \
>  		(XFS_SB_FEAT_RO_COMPAT_FINOBT)

... | XFS_SB_FEAT_RO_COMPAT_RMAPBT) ?

(/me shifts the reflink feature flag to (1 << 2))

>  #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
> @@ -514,6 +515,12 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
>  		xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
>  }
>  
> +static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
> +{
> +	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
> +		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
> +}
> +
>  /*
>   * end of superblock version macros
>   */
> @@ -574,10 +581,10 @@ xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino)
>  #define	XFS_AGI_GOOD_VERSION(v)	((v) == XFS_AGI_VERSION)
>  
>  /*
> - * Btree number 0 is bno, 1 is cnt.  This value gives the size of the
> + * Btree number 0 is bno, 1 is cnt, 2 is rmap. This value gives the size of the
>   * arrays below.
>   */
> -#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_CNTi + 1)
> +#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_RMAPi + 1)
>  
>  /*
>   * The second word of agf_levels in the first a.g. overlaps the EFS
> @@ -594,12 +601,10 @@ typedef struct xfs_agf {
>  	__be32		agf_seqno;	/* sequence # starting from 0 */
>  	__be32		agf_length;	/* size in blocks of a.g. */
>  	/*
> -	 * Freespace information
> +	 * Freespace and rmap information
>  	 */
>  	__be32		agf_roots[XFS_BTNUM_AGF];	/* root blocks */
> -	__be32		agf_spare0;	/* spare field */
>  	__be32		agf_levels[XFS_BTNUM_AGF];	/* btree levels */
> -	__be32		agf_spare1;	/* spare field */

Doh, field collision! :)

Guess I'll use up one of the agf_spare64's.

--D

>  
>  	__be32		agf_flfirst;	/* first freelist block's index */
>  	__be32		agf_fllast;	/* last freelist block's index */
> @@ -1277,6 +1282,13 @@ typedef __be32 xfs_inobt_ptr_t;
>  #define	XFS_FIBT_BLOCK(mp)		((xfs_agblock_t)(XFS_IBT_BLOCK(mp) + 1))
>  
>  /*
> + * Reverse mapping btree format definitions
> + *
> + * There is a btree for the reverse map per allocation group
> + */
> +#define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
> +
> +/*
>   * The first data block of an AG depends on whether the filesystem was formatted
>   * with the finobt feature. If so, account for the finobt reserved root btree
>   * block.
> diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
> index b79dc66..3d50364 100644
> --- a/fs/xfs/libxfs/xfs_types.h
> +++ b/fs/xfs/libxfs/xfs_types.h
> @@ -108,8 +108,8 @@ typedef enum {
>  } xfs_lookup_t;
>  
>  typedef enum {
> -	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi,
> -	XFS_BTNUM_FINOi, XFS_BTNUM_MAX
> +	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
> +	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
>  } xfs_btnum_t;
>  
>  struct xfs_name {
> -- 
> 2.0.0
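
The AGF change quoted above grows agf_roots[] and agf_levels[] from two
to three entries by absorbing agf_spare0 and agf_spare1. A simplified
sketch of why the on-disk offsets of the surrounding fields do not move:
struct agf_old and struct agf_new are hypothetical cut-down layouts that
use uint32_t in place of __be32 and contain only the fields around the
arrays, not the full struct xfs_agf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down layout before the patch (XFS_BTNUM_AGF == 2). */
struct agf_old {
	uint32_t	roots[2];	/* bno, cnt */
	uint32_t	spare0;
	uint32_t	levels[2];	/* bno, cnt */
	uint32_t	spare1;
	uint32_t	flfirst;
};

/* Cut-down layout after the patch (XFS_BTNUM_AGF == 3). */
struct agf_new {
	uint32_t	roots[3];	/* bno, cnt, rmap */
	uint32_t	levels[3];	/* bno, cnt, rmap */
	uint32_t	flfirst;
};

/* Byte offset of the rmap root slot in the new layout. */
size_t
rmap_root_offset(void)
{
	return offsetof(struct agf_new, roots) + 2 * sizeof(uint32_t);
}

/* Byte offset of the rmap level slot in the new layout. */
size_t
rmap_level_offset(void)
{
	return offsetof(struct agf_new, levels) + 2 * sizeof(uint32_t);
}
```

In this sketch the rmap root lands exactly where agf_spare0 used to be
and the rmap level where agf_spare1 used to be, so any other feature
that wanted those spare words has to find space elsewhere.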

* Re: [PATCH 05/20] xfs: introduce rmap btree definitions
  2015-06-03  6:30   ` Darrick J. Wong
@ 2015-06-03  6:34     ` Darrick J. Wong
  0 siblings, 0 replies; 37+ messages in thread
From: Darrick J. Wong @ 2015-06-03  6:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jun 02, 2015 at 11:30:22PM -0700, Darrick J. Wong wrote:
> On Wed, Jun 03, 2015 at 04:04:42PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Add new per-ag rmap btree definitions to the per-ag structures. The
> > rmap btree will sit in the empty slots on disk after the free space
> > btrees, and hence form a part of the array of space management
> > btrees. This requires the definition of the btree to be contiguous
> > with the free space btrees.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c  |  6 ++++++
> >  fs/xfs/libxfs/xfs_btree.c  |  4 ++--
> >  fs/xfs/libxfs/xfs_btree.h  |  3 +++
> >  fs/xfs/libxfs/xfs_format.h | 22 +++++++++++++++++-----
> >  fs/xfs/libxfs/xfs_types.h  |  4 ++--
> >  5 files changed, 30 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index d4aa844..c7206b5 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2267,6 +2267,10 @@ xfs_agf_verify(
> >  	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNT]) > XFS_BTREE_MAXLEVELS)
> >  		return false;
> >  
> > +	if (xfs_sb_version_hasrmapbt(&mp->m_sb) &&
> > +	    be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAP]) > XFS_BTREE_MAXLEVELS)
> > +		return false;
> > +
> >  	/*
> >  	 * during growfs operations, the perag is not fully initialised,
> >  	 * so we can't use it for any useful checking. growfs ensures we can't
> > @@ -2397,6 +2401,8 @@ xfs_alloc_read_agf(
> >  			be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]);
> >  		pag->pagf_levels[XFS_BTNUM_CNTi] =
> >  			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
> > +		pag->pagf_levels[XFS_BTNUM_RMAPi] =
> > +			be32_to_cpu(agf->agf_levels[XFS_BTNUM_RMAPi]);
> >  		spin_lock_init(&pag->pagb_lock);
> >  		pag->pagb_count = 0;
> >  		pag->pagb_tree = RB_ROOT;
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index c72283d..0426152 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -42,9 +42,9 @@ kmem_zone_t	*xfs_btree_cur_zone;
> >   * Btree magic numbers.
> >   */
> >  static const __uint32_t xfs_magics[2][XFS_BTNUM_MAX] = {
> > -	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
> > +	{ XFS_ABTB_MAGIC, XFS_ABTC_MAGIC, 0, XFS_BMAP_MAGIC, XFS_IBT_MAGIC,
> >  	  XFS_FIBT_MAGIC },
> > -	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC,
> > +	{ XFS_ABTB_CRC_MAGIC, XFS_ABTC_CRC_MAGIC, XFS_RMAP_CRC_MAGIC,
> >  	  XFS_BMAP_CRC_MAGIC, XFS_IBT_CRC_MAGIC, XFS_FIBT_CRC_MAGIC }
> >  };
> >  #define xfs_btree_magic(cur) \
> > diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> > index 8f18bab..ace1995 100644
> > --- a/fs/xfs/libxfs/xfs_btree.h
> > +++ b/fs/xfs/libxfs/xfs_btree.h
> > @@ -63,6 +63,7 @@ union xfs_btree_rec {
> >  #define	XFS_BTNUM_BMAP	((xfs_btnum_t)XFS_BTNUM_BMAPi)
> >  #define	XFS_BTNUM_INO	((xfs_btnum_t)XFS_BTNUM_INOi)
> >  #define	XFS_BTNUM_FINO	((xfs_btnum_t)XFS_BTNUM_FINOi)
> > +#define	XFS_BTNUM_RMAP	((xfs_btnum_t)XFS_BTNUM_RMAPi)
> >  
> >  /*
> >   * For logging record fields.
> > @@ -94,6 +95,7 @@ do {    \
> >  	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(bmbt, stat); break;	\
> >  	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(ibt, stat); break;	\
> >  	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(fibt, stat); break;	\
> > +	case XFS_BTNUM_RMAP: break;	\
> 
> Hmm, so we don't have to provide stats for all the btrees?

Aha, they're in the later patches, which my mailer helpfully ... delivered
later.

I think I need sleep.  Disregard most of this email. :)

I'll move the reflink fields so they don't conflict, and read these patches
more closely tomorrow.  Sorry for the noise.

--D

> 
> <shrug> Actually, I thought it was rather clever that one could grep
> /proc/fs/xfs/stat for 'rlbt' to find out if the running xfs driver supports
> reflink.
> 
> (Not that I want someone to think this is some kind of ABI...)
> 
> >  	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
> >  	}       \
> >  } while (0)
> > @@ -108,6 +110,7 @@ do {    \
> >  	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_ADD(bmbt, stat, val); break; \
> >  	case XFS_BTNUM_INO: __XFS_BTREE_STATS_ADD(ibt, stat, val); break; \
> >  	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_ADD(fibt, stat, val); break; \
> > +	case XFS_BTNUM_RMAP: break; \
> >  	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
> >  	}       \
> >  } while (0)
> > diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> > index a0ae572..d120af4 100644
> > --- a/fs/xfs/libxfs/xfs_format.h
> > +++ b/fs/xfs/libxfs/xfs_format.h
> > @@ -445,6 +445,7 @@ xfs_sb_has_compat_feature(
> >  }
> >  
> >  #define XFS_SB_FEAT_RO_COMPAT_FINOBT   (1 << 0)		/* free inode btree */
> > +#define XFS_SB_FEAT_RO_COMPAT_RMAPBT   (1 << 1)		/* reverse map btree */
> >  #define XFS_SB_FEAT_RO_COMPAT_ALL \
> >  		(XFS_SB_FEAT_RO_COMPAT_FINOBT)
> 
> ... | XFS_SB_FEAT_RO_COMPAT_RMAPBT) ?
> 
> (/me shifts the reflink feature flag to (1 << 2))
> 
> >  #define XFS_SB_FEAT_RO_COMPAT_UNKNOWN	~XFS_SB_FEAT_RO_COMPAT_ALL
> > @@ -514,6 +515,12 @@ static inline bool xfs_sb_version_hassparseinodes(struct xfs_sb *sbp)
> >  		xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_SPINODES);
> >  }
> >  
> > +static inline bool xfs_sb_version_hasrmapbt(struct xfs_sb *sbp)
> > +{
> > +	return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5) &&
> > +		(sbp->sb_features_ro_compat & XFS_SB_FEAT_RO_COMPAT_RMAPBT);
> > +}
> > +
> >  /*
> >   * end of superblock version macros
> >   */
> > @@ -574,10 +581,10 @@ xfs_is_quota_inode(struct xfs_sb *sbp, xfs_ino_t ino)
> >  #define	XFS_AGI_GOOD_VERSION(v)	((v) == XFS_AGI_VERSION)
> >  
> >  /*
> > - * Btree number 0 is bno, 1 is cnt.  This value gives the size of the
> > + * Btree number 0 is bno, 1 is cnt, 2 is rmap. This value gives the size of the
> >   * arrays below.
> >   */
> > -#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_CNTi + 1)
> > +#define	XFS_BTNUM_AGF	((int)XFS_BTNUM_RMAPi + 1)
> >  
> >  /*
> >   * The second word of agf_levels in the first a.g. overlaps the EFS
> > @@ -594,12 +601,10 @@ typedef struct xfs_agf {
> >  	__be32		agf_seqno;	/* sequence # starting from 0 */
> >  	__be32		agf_length;	/* size in blocks of a.g. */
> >  	/*
> > -	 * Freespace information
> > +	 * Freespace and rmap information
> >  	 */
> >  	__be32		agf_roots[XFS_BTNUM_AGF];	/* root blocks */
> > -	__be32		agf_spare0;	/* spare field */
> >  	__be32		agf_levels[XFS_BTNUM_AGF];	/* btree levels */
> > -	__be32		agf_spare1;	/* spare field */
> 
> Doh, field collision! :)
> 
> Guess I'll use up one of the agf_spare64's.
> 
> --D
> 
> >  
> >  	__be32		agf_flfirst;	/* first freelist block's index */
> >  	__be32		agf_fllast;	/* last freelist block's index */
> > @@ -1277,6 +1282,13 @@ typedef __be32 xfs_inobt_ptr_t;
> >  #define	XFS_FIBT_BLOCK(mp)		((xfs_agblock_t)(XFS_IBT_BLOCK(mp) + 1))
> >  
> >  /*
> > + * Reverse mapping btree format definitions
> > + *
> > + * There is a btree for the reverse map per allocation group
> > + */
> > +#define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
> > +
> > +/*
> >   * The first data block of an AG depends on whether the filesystem was formatted
> >   * with the finobt feature. If so, account for the finobt reserved root btree
> >   * block.
> > diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
> > index b79dc66..3d50364 100644
> > --- a/fs/xfs/libxfs/xfs_types.h
> > +++ b/fs/xfs/libxfs/xfs_types.h
> > @@ -108,8 +108,8 @@ typedef enum {
> >  } xfs_lookup_t;
> >  
> >  typedef enum {
> > -	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi,
> > -	XFS_BTNUM_FINOi, XFS_BTNUM_MAX
> > +	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
> > +	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_MAX
> >  } xfs_btnum_t;
> >  
> >  struct xfs_name {
> > -- 
> > 2.0.0

* Re: [PATCH 01/20] xfs: xfs_alloc_fix_freelist() can use incore perag structures
  2015-06-03  6:04 ` [PATCH 01/20] xfs: xfs_alloc_fix_freelist() can use incore perag structures Dave Chinner
@ 2015-06-15 14:57   ` Brian Foster
  0 siblings, 0 replies; 37+ messages in thread
From: Brian Foster @ 2015-06-15 14:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jun 03, 2015 at 04:04:38PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> At the moment, xfs_alloc_fix_freelist() uses a mix of per-ag based
> access and agf buffer based access to freelist and space usage
> information. However, once the AGF buffer is locked inside this
> function, it is guaranteed that both the in-memory and on-disk
> values are identical. xfs_alloc_fix_freelist() doesn't modify the
> values in the structures directly, so it is a read-only user of the
> information, and hence can use the per-ag structure exclusively for
> determining what it should do.
> 
> This opens up an avenue for cleaning up a lot of duplicated logic
> whose only difference is the structure it gets the data from, and in
> doing so removes a lot of needless byte swapping overhead when
> fixing up the free list.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/libxfs/xfs_alloc.c | 69 +++++++++++++++++++++--------------------------
>  fs/xfs/libxfs/xfs_alloc.h |  8 ++----
>  fs/xfs/libxfs/xfs_bmap.c  |  3 ++-
>  fs/xfs/xfs_filestream.c   |  3 ++-
>  4 files changed, 37 insertions(+), 46 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index bc78ac0..08b45f8 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1857,11 +1857,11 @@ xfs_alloc_compute_maxlevels(
>  xfs_extlen_t
>  xfs_alloc_longest_free_extent(
>  	struct xfs_mount	*mp,
> -	struct xfs_perag	*pag)
> +	struct xfs_perag	*pag,
> +	xfs_extlen_t		need)
>  {
> -	xfs_extlen_t		need, delta = 0;
> +	xfs_extlen_t		delta = 0;
>  
> -	need = XFS_MIN_FREELIST_PAG(pag, mp);
>  	if (need > pag->pagf_flcount)
>  		delta = need - pag->pagf_flcount;
>  
> @@ -1880,10 +1880,8 @@ xfs_alloc_fix_freelist(
>  	int		flags)	/* XFS_ALLOC_FLAG_... */
>  {
>  	xfs_buf_t	*agbp;	/* agf buffer pointer */
> -	xfs_agf_t	*agf;	/* a.g. freespace structure pointer */
>  	xfs_buf_t	*agflbp;/* agfl buffer pointer */
>  	xfs_agblock_t	bno;	/* freelist block */
> -	xfs_extlen_t	delta;	/* new blocks needed in freelist */
>  	int		error;	/* error result code */
>  	xfs_extlen_t	longest;/* longest extent in allocation group */
>  	xfs_mount_t	*mp;	/* file system mount point structure */
> @@ -1927,7 +1925,7 @@ xfs_alloc_fix_freelist(
>  		 * total blocks, reject it.
>  		 */
>  		need = XFS_MIN_FREELIST_PAG(pag, mp);
> -		longest = xfs_alloc_longest_free_extent(mp, pag);
> +		longest = xfs_alloc_longest_free_extent(mp, pag, need);
>  		if ((args->minlen + args->alignment + args->minalignslop - 1) >
>  				longest ||
>  		    ((int)(pag->pagf_freeblks + pag->pagf_flcount -
> @@ -1954,25 +1952,16 @@ xfs_alloc_fix_freelist(
>  			return 0;
>  		}
>  	}
> -	/*
> -	 * Figure out how many blocks we should have in the freelist.
> -	 */
> -	agf = XFS_BUF_TO_AGF(agbp);
> -	need = XFS_MIN_FREELIST(agf, mp);
> -	/*
> -	 * If there isn't enough total or single-extent, reject it.
> -	 */
> +
> +
> +	/* If there isn't enough total space or single-extent, reject it. */
> +	need = XFS_MIN_FREELIST_PAG(pag, mp);
>  	if (!(flags & XFS_ALLOC_FLAG_FREEING)) {
> -		delta = need > be32_to_cpu(agf->agf_flcount) ?
> -			(need - be32_to_cpu(agf->agf_flcount)) : 0;
> -		longest = be32_to_cpu(agf->agf_longest);
> -		longest = (longest > delta) ? (longest - delta) :
> -			(be32_to_cpu(agf->agf_flcount) > 0 || longest > 0);
> +		longest = xfs_alloc_longest_free_extent(mp, pag, need);
>  		if ((args->minlen + args->alignment + args->minalignslop - 1) >
>  				longest ||
> -		    ((int)(be32_to_cpu(agf->agf_freeblks) +
> -		     be32_to_cpu(agf->agf_flcount) - need - args->total) <
> -				(int)args->minleft)) {
> +		    ((int)(pag->pagf_freeblks + pag->pagf_flcount -
> +			   need - args->total) < (int)args->minleft)) {
>  			xfs_trans_brelse(tp, agbp);
>  			args->agbp = NULL;
>  			return 0;
> @@ -1980,21 +1969,25 @@ xfs_alloc_fix_freelist(
>  	}
>  	/*
>  	 * Make the freelist shorter if it's too long.
> +	 *
> +	 * XXX (dgc): When we have lots of free space, does this buy us
> +	 * anything other than extra overhead when we need to put more blocks
> +	 * back on the free list? Maybe we should only do this when space is
> +	 * getting low or the AGFL is more than half full?
>  	 */
> -	while (be32_to_cpu(agf->agf_flcount) > need) {
> -		xfs_buf_t	*bp;
> +	while (pag->pagf_flcount > need) {
> +		struct xfs_buf	*bp;
>  
>  		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
>  		if (error)
>  			return error;
> -		if ((error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1)))
> +		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1);
> +		if (error)
>  			return error;
>  		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
>  		xfs_trans_binval(tp, bp);
>  	}
> -	/*
> -	 * Initialize the args structure.
> -	 */
> +
>  	memset(&targs, 0, sizeof(targs));
>  	targs.tp = tp;
>  	targs.mp = mp;
> @@ -2003,18 +1996,18 @@ xfs_alloc_fix_freelist(
>  	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
>  	targs.type = XFS_ALLOCTYPE_THIS_AG;
>  	targs.pag = pag;
> -	if ((error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp)))
> +	error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp);
> +	if (error)
>  		return error;
> -	/*
> -	 * Make the freelist longer if it's too short.
> -	 */
> -	while (be32_to_cpu(agf->agf_flcount) < need) {
> +
> +	/* Make the freelist longer if it's too short. */
> +	while (pag->pagf_flcount < need) {
>  		targs.agbno = 0;
> -		targs.maxlen = need - be32_to_cpu(agf->agf_flcount);
> -		/*
> -		 * Allocate as many blocks as possible at once.
> -		 */
> -		if ((error = xfs_alloc_ag_vextent(&targs))) {
> +		targs.maxlen = need - pag->pagf_flcount;
> +
> +		/* Allocate as many blocks as possible at once. */
> +		error = xfs_alloc_ag_vextent(&targs);
> +		if (error) {
>  			xfs_trans_brelse(tp, agflbp);
>  			return error;
>  		}
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 29f27b2..a4d3b9a 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -130,12 +130,8 @@ typedef struct xfs_alloc_arg {
>  #define XFS_ALLOC_USERDATA		1	/* allocation is for user data*/
>  #define XFS_ALLOC_INITIAL_USER_DATA	2	/* special case start of file */
>  
> -/*
> - * Find the length of the longest extent in an AG.
> - */
> -xfs_extlen_t
> -xfs_alloc_longest_free_extent(struct xfs_mount *mp,
> -		struct xfs_perag *pag);
> +xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
> +		struct xfs_perag *pag, xfs_extlen_t need);
>  
>  /*
>   * Compute and fill in value of m_ag_maxlevels.
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 5cb3e85..7382cce 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3521,7 +3521,8 @@ xfs_bmap_longest_free_extent(
>  		}
>  	}
>  
> -	longest = xfs_alloc_longest_free_extent(mp, pag);
> +	longest = xfs_alloc_longest_free_extent(mp, pag,
> +						XFS_MIN_FREELIST_PAG(pag, mp));
>  	if (*blen < longest)
>  		*blen = longest;
>  
> diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> index da82f1c..9ac5eaa 100644
> --- a/fs/xfs/xfs_filestream.c
> +++ b/fs/xfs/xfs_filestream.c
> @@ -196,7 +196,8 @@ xfs_filestream_pick_ag(
>  			goto next_ag;
>  		}
>  
> -		longest = xfs_alloc_longest_free_extent(mp, pag);
> +		longest = xfs_alloc_longest_free_extent(mp, pag,
> +						XFS_MIN_FREELIST_PAG(pag, mp));
>  		if (((minlen && longest >= minlen) ||
>  		     (!minlen && pag->pagf_freeblks >= minfree)) &&
>  		    (!pag->pagf_metadata || !(flags & XFS_PICK_USERDATA) ||
> -- 
> 2.0.0
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 02/20] xfs: factor out free space extent length check
  2015-06-03  6:04 ` [PATCH 02/20] xfs: factor out free space extent length check Dave Chinner
@ 2015-06-15 14:58   ` Brian Foster
  0 siblings, 0 replies; 37+ messages in thread
From: Brian Foster @ 2015-06-15 14:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jun 03, 2015 at 04:04:39PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The longest extent length checks in xfs_alloc_fix_freelist() are now
> essentially identical. Factor them out into a helper function, so we
> know they are checking exactly the same thing before and after we
> lock the AGF.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 71 +++++++++++++++++++++++++++++------------------
>  1 file changed, 44 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 08b45f8..2471cb5 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1871,6 +1871,39 @@ xfs_alloc_longest_free_extent(
>  }
>  
>  /*
> + * Check if the operation we are fixing up the freelist for should go ahead or
> + * not. If we are freeing blocks, we always allow it, otherwise the allocation
> + * is dependent on whether the size and shape of free space available will
> + * permit the requested allocation to take place.
> + */
> +static bool
> +xfs_alloc_space_available(
> +	struct xfs_alloc_arg	*args,
> +	xfs_extlen_t		min_free,
> +	int			flags)
> +{
> +	struct xfs_perag	*pag = args->pag;
> +	xfs_extlen_t		longest;
> +	int			available;
> +
> +	if (flags & XFS_ALLOC_FLAG_FREEING)
> +		return true;
> +
> +	/* do we have enough contiguous free space for the allocation? */
> +	longest = xfs_alloc_longest_free_extent(args->mp, pag, min_free);
> +	if ((args->minlen + args->alignment + args->minalignslop - 1) > longest)
> +		return false;
> +
> +	/* do have enough free space remaining for the allocation? */
> +	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
> +			  min_free - args->total);
> +	if (available < (int)args->minleft)
> +		return false;
> +
> +	return true;
> +}
> +
> +/*
>   * Decide whether to use this allocation group for this allocation.
>   * If so, fix up the btree freelist's size.
>   */
> @@ -1883,7 +1916,6 @@ xfs_alloc_fix_freelist(
>  	xfs_buf_t	*agflbp;/* agfl buffer pointer */
>  	xfs_agblock_t	bno;	/* freelist block */
>  	int		error;	/* error result code */
> -	xfs_extlen_t	longest;/* longest extent in allocation group */
>  	xfs_mount_t	*mp;	/* file system mount point structure */
>  	xfs_extlen_t	need;	/* total blocks needed in freelist */
>  	xfs_perag_t	*pag;	/* per-ag information structure */
> @@ -1919,22 +1951,12 @@ xfs_alloc_fix_freelist(
>  		return 0;
>  	}
>  
> -	if (!(flags & XFS_ALLOC_FLAG_FREEING)) {
> -		/*
> -		 * If it looks like there isn't a long enough extent, or enough
> -		 * total blocks, reject it.
> -		 */
> -		need = XFS_MIN_FREELIST_PAG(pag, mp);
> -		longest = xfs_alloc_longest_free_extent(mp, pag, need);
> -		if ((args->minlen + args->alignment + args->minalignslop - 1) >
> -				longest ||
> -		    ((int)(pag->pagf_freeblks + pag->pagf_flcount -
> -			   need - args->total) < (int)args->minleft)) {
> -			if (agbp)
> -				xfs_trans_brelse(tp, agbp);
> -			args->agbp = NULL;
> -			return 0;
> -		}
> +	need = XFS_MIN_FREELIST_PAG(pag, mp);

'need' could probably be initialized once at the top of the function at
this point. This is duplicated in the subsequent hunk and doesn't look
like the function modifies it anywhere. That minor bit aside, the rest
looks good:

Reviewed-by: Brian Foster <bfoster@redhat.com>
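
For illustration only, the check performed by xfs_alloc_space_available()
in the hunk quoted above can be modelled in userspace roughly as follows.
This is a sketch, not the kernel code: perag_model, alloc_model,
longest_free_extent() and space_available() are hypothetical stand-ins
that carry only the fields the check reads, and the longest-extent
calculation follows the open-coded logic removed earlier in the series.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the struct xfs_perag fields the check reads. */
struct perag_model {
	unsigned int freeblks;	/* models pagf_freeblks */
	unsigned int flcount;	/* models pagf_flcount */
	unsigned int longest;	/* models pagf_longest */
};

/* Hypothetical stand-in for the struct xfs_alloc_arg fields the check reads. */
struct alloc_model {
	unsigned int minlen;
	unsigned int alignment;
	unsigned int minalignslop;
	unsigned int total;
	unsigned int minleft;
};

/*
 * Longest usable free extent once the freelist has been topped up to
 * 'need' blocks: any shortfall in the freelist comes out of the longest
 * extent.
 */
static unsigned int
longest_free_extent(const struct perag_model *pag, unsigned int need)
{
	unsigned int delta = 0;

	if (need > pag->flcount)
		delta = need - pag->flcount;
	if (pag->longest > delta)
		return pag->longest - delta;
	return pag->flcount > 0 || pag->longest > 0;
}

/* Freeing is always allowed; allocation needs enough contiguous and total space. */
static bool
space_available(const struct perag_model *pag, const struct alloc_model *args,
		unsigned int min_free, bool freeing)
{
	unsigned int longest;
	int available;

	if (freeing)
		return true;

	/* Is there enough contiguous free space for the allocation? */
	longest = longest_free_extent(pag, min_free);
	if (args->minlen + args->alignment + args->minalignslop - 1 > longest)
		return false;

	/* Is enough free space left over once the allocation is done? */
	available = (int)(pag->freeblks + pag->flcount -
			  min_free - args->total);
	if (available < (int)args->minleft)
		return false;

	return true;
}
```

With freeblks = 100, flcount = 4, longest = 50 and min_free = 6, the
freelist is two blocks short, so the usable longest extent is 48: a
request with minlen = 8 and alignment = 1 passes, while one with
minlen = 64 is rejected.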

> +	if (!xfs_alloc_space_available(args, need, flags)) {
> +		if (agbp)
> +			xfs_trans_brelse(tp, agbp);
> +		args->agbp = NULL;
> +		return 0;
>  	}
>  
>  	/*
> @@ -1956,17 +1978,12 @@ xfs_alloc_fix_freelist(
>  
>  	/* If there isn't enough total space or single-extent, reject it. */
>  	need = XFS_MIN_FREELIST_PAG(pag, mp);
> -	if (!(flags & XFS_ALLOC_FLAG_FREEING)) {
> -		longest = xfs_alloc_longest_free_extent(mp, pag, need);
> -		if ((args->minlen + args->alignment + args->minalignslop - 1) >
> -				longest ||
> -		    ((int)(pag->pagf_freeblks + pag->pagf_flcount -
> -			   need - args->total) < (int)args->minleft)) {
> -			xfs_trans_brelse(tp, agbp);
> -			args->agbp = NULL;
> -			return 0;
> -		}
> +	if (!xfs_alloc_space_available(args, need, flags)) {
> +		xfs_trans_brelse(tp, agbp);
> +		args->agbp = NULL;
> +		return 0;
>  	}
> +
>  	/*
>  	 * Make the freelist shorter if it's too long.
>  	 *
> -- 
> 2.0.0
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 03/20] xfs: sanitise error handling in xfs_alloc_fix_freelist
  2015-06-03  6:04 ` [PATCH 03/20] xfs: sanitise error handling in xfs_alloc_fix_freelist Dave Chinner
@ 2015-06-15 14:58   ` Brian Foster
  2015-06-15 21:51     ` Dave Chinner
  0 siblings, 1 reply; 37+ messages in thread
From: Brian Foster @ 2015-06-15 14:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jun 03, 2015 at 04:04:40PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The error handling is currently an inconsistent mess as every error
> condition handles return values and releasing buffers individually.
> Clean this up by using gotos and a sane error label stack.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 103 +++++++++++++++++++++-------------------------
>  1 file changed, 47 insertions(+), 56 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 2471cb5..352db46 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1909,80 +1909,65 @@ xfs_alloc_space_available(
>   */
>  STATIC int			/* error */
>  xfs_alloc_fix_freelist(
> -	xfs_alloc_arg_t	*args,	/* allocation argument structure */
> -	int		flags)	/* XFS_ALLOC_FLAG_... */
> +	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> +	int			flags)	/* XFS_ALLOC_FLAG_... */
>  {
> -	xfs_buf_t	*agbp;	/* agf buffer pointer */
> -	xfs_buf_t	*agflbp;/* agfl buffer pointer */
> -	xfs_agblock_t	bno;	/* freelist block */
> -	int		error;	/* error result code */
> -	xfs_mount_t	*mp;	/* file system mount point structure */
> -	xfs_extlen_t	need;	/* total blocks needed in freelist */
> -	xfs_perag_t	*pag;	/* per-ag information structure */
> -	xfs_alloc_arg_t	targs;	/* local allocation arguments */
> -	xfs_trans_t	*tp;	/* transaction pointer */
> -
> -	mp = args->mp;
> +	struct xfs_mount	*mp = args->mp;
> +	struct xfs_perag	*pag = args->pag;
> +	struct xfs_trans	*tp = args->tp;
> +	struct xfs_buf		*agbp = NULL;
> +	struct xfs_buf		*agflbp = NULL;
> +	struct xfs_alloc_arg	targs;	/* local allocation arguments */
> +	xfs_agblock_t		bno;	/* freelist block */
> +	xfs_extlen_t		need;	/* total blocks needed in freelist */
> +	int			error;
>  
> -	pag = args->pag;
> -	tp = args->tp;
>  	if (!pag->pagf_init) {
> -		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
> -				&agbp)))
> -			return error;
> +		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
> +		if (error)
> +			goto out_no_agbp;
>  		if (!pag->pagf_init) {
>  			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
>  			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> -			args->agbp = NULL;
> -			return 0;
> +			goto out_agbp_relse;
>  		}
> -	} else
> -		agbp = NULL;
> +	}
>  
>  	/*
> -	 * If this is a metadata preferred pag and we are user data
> -	 * then try somewhere else if we are not being asked to
> -	 * try harder at this point
> +	 * If this is a metadata preferred pag and we are user data then try
> +	 * somewhere else if we are not being asked to try harder at this
> +	 * point
>  	 */
>  	if (pag->pagf_metadata && args->userdata &&
>  	    (flags & XFS_ALLOC_FLAG_TRYLOCK)) {
>  		ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> -		args->agbp = NULL;
> -		return 0;
> +		goto out_agbp_relse;
>  	}
>  
>  	need = XFS_MIN_FREELIST_PAG(pag, mp);
> -	if (!xfs_alloc_space_available(args, need, flags)) {
> -		if (agbp)
> -			xfs_trans_brelse(tp, agbp);
> -		args->agbp = NULL;
> -		return 0;
> -	}
> +	if (!xfs_alloc_space_available(args, need, flags))
> +		goto out_agbp_relse;
>  
>  	/*
>  	 * Get the a.g. freespace buffer.
>  	 * Can fail if we're not blocking on locks, and it's held.
>  	 */
> -	if (agbp == NULL) {
> -		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
> -				&agbp)))
> -			return error;
> -		if (agbp == NULL) {
> +	if (!agbp) {
> +		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
> +		if (error)
> +			goto out_no_agbp;
> +		if (!agbp) {
>  			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
>  			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> -			args->agbp = NULL;
> -			return 0;
> +			goto out_no_agbp;
>  		}
>  	}
>  
>  
>  	/* If there isn't enough total space or single-extent, reject it. */
>  	need = XFS_MIN_FREELIST_PAG(pag, mp);
> -	if (!xfs_alloc_space_available(args, need, flags)) {
> -		xfs_trans_brelse(tp, agbp);
> -		args->agbp = NULL;
> -		return 0;
> -	}
> +	if (!xfs_alloc_space_available(args, need, flags))
> +		goto out_agbp_relse;
>  
>  	/*
>  	 * Make the freelist shorter if it's too long.
> @@ -1997,10 +1982,10 @@ xfs_alloc_fix_freelist(
>  
>  		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
>  		if (error)
> -			return error;
> +			goto out_agbp_relse;

So at this point it looks like the buffer could be logged (i.e., dirty
in the transaction). Perhaps this is the reason for the lack of agbp
releases from this point forward in the current error handling. That
said, we do the agflbp release unconditionally at the end of the
function even when it might be logged. xfs_trans_brelse() appears to
handle this case as it just skips removing the item from the tp.

This is a bit confusing and at the very least seems like an unexpected
use of xfs_trans_brelse(). Is this intentional?

Brian

>  		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1);
>  		if (error)
> -			return error;
> +			goto out_agbp_relse;
>  		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
>  		xfs_trans_binval(tp, bp);
>  	}
> @@ -2015,7 +2000,7 @@ xfs_alloc_fix_freelist(
>  	targs.pag = pag;
>  	error = xfs_alloc_read_agfl(mp, tp, targs.agno, &agflbp);
>  	if (error)
> -		return error;
> +		goto out_agbp_relse;
>  
>  	/* Make the freelist longer if it's too short. */
>  	while (pag->pagf_flcount < need) {
> @@ -2024,10 +2009,9 @@ xfs_alloc_fix_freelist(
>  
>  		/* Allocate as many blocks as possible at once. */
>  		error = xfs_alloc_ag_vextent(&targs);
> -		if (error) {
> -			xfs_trans_brelse(tp, agflbp);
> -			return error;
> -		}
> +		if (error)
> +			goto out_agflbp_relse;
> +
>  		/*
>  		 * Stop if we run out.  Won't happen if callers are obeying
>  		 * the restrictions correctly.  Can happen for free calls
> @@ -2036,9 +2020,7 @@ xfs_alloc_fix_freelist(
>  		if (targs.agbno == NULLAGBLOCK) {
>  			if (flags & XFS_ALLOC_FLAG_FREEING)
>  				break;
> -			xfs_trans_brelse(tp, agflbp);
> -			args->agbp = NULL;
> -			return 0;
> +			goto out_agflbp_relse;
>  		}
>  		/*
>  		 * Put each allocated block on the list.
> @@ -2047,12 +2029,21 @@ xfs_alloc_fix_freelist(
>  			error = xfs_alloc_put_freelist(tp, agbp,
>  							agflbp, bno, 0);
>  			if (error)
> -				return error;
> +				goto out_agflbp_relse;
>  		}
>  	}
>  	xfs_trans_brelse(tp, agflbp);
>  	args->agbp = agbp;
>  	return 0;
> +
> +out_agflbp_relse:
> +	xfs_trans_brelse(tp, agflbp);
> +out_agbp_relse:
> +	if (agbp)
> +		xfs_trans_brelse(tp, agbp);
> +out_no_agbp:
> +	args->agbp = NULL;
> +	return error;
>  }
>  
>  /*
> -- 
> 2.0.0
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 04/20] xfs: clean up XFS_MIN_FREELIST macros
  2015-06-03  6:04 ` [PATCH 04/20] xfs: clean up XFS_MIN_FREELIST macros Dave Chinner
@ 2015-06-15 14:58   ` Brian Foster
  0 siblings, 0 replies; 37+ messages in thread
From: Brian Foster @ 2015-06-15 14:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jun 03, 2015 at 04:04:41PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We no longer calculate the minimum freelist size from the on-disk
> AGF, so we don't need the macros used for this. That means the
> nested macros can be cleaned up, and turn this into an actual
> function so the logic is clear and concise. This will make it much
> easier to add support for the rmap btree when the time comes.
> 
> This also gets rid of the XFS_AG_MAXLEVELS macro used by these
> freelist macros as it is simply a wrapper around a single variable.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>
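
As a worked example of the calculation that xfs_alloc_min_freelist() in
this patch performs, here is a minimal userspace sketch. The function
name min_freelist() and its plain integer parameters are assumptions made
for illustration: bno_levels and cnt_levels stand in for
pag->pagf_levels[XFS_BTNUM_BNOi] and pag->pagf_levels[XFS_BTNUM_CNTi],
and maxlevels stands in for mp->m_ag_maxlevels.

```c
/* Smaller of two unsigned values; a local stand-in for min_t(). */
static unsigned int
min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Each free space btree reserves one block per level plus one for a
 * possible root split, capped at the maximum tree height.
 */
static unsigned int
min_freelist(unsigned int bno_levels, unsigned int cnt_levels,
	     unsigned int maxlevels)
{
	unsigned int min_free;

	/* space needed by the by-bno freespace btree */
	min_free = min_u(bno_levels + 1, maxlevels);
	/* space needed by the by-size freespace btree */
	min_free += min_u(cnt_levels + 1, maxlevels);

	return min_free;
}
```

For single-level trees with maxlevels = 5 this gives 2 + 2 = 4 blocks;
for trees of height 2 and 3 it gives 3 + 4 = 7; and once both trees are
at the maximum height the cap applies, giving 5 + 5 = 10.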

>  fs/xfs/libxfs/xfs_alloc.c       | 22 +++++++++++++++++++---
>  fs/xfs/libxfs/xfs_alloc.h       |  2 ++
>  fs/xfs/libxfs/xfs_bmap.c        |  2 +-
>  fs/xfs/libxfs/xfs_format.h      | 13 -------------
>  fs/xfs/libxfs/xfs_trans_resv.h  |  4 ++--
>  fs/xfs/libxfs/xfs_trans_space.h |  2 +-
>  fs/xfs/xfs_filestream.c         |  2 +-
>  7 files changed, 26 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index 352db46..d4aa844 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1870,6 +1870,23 @@ xfs_alloc_longest_free_extent(
>  	return pag->pagf_flcount > 0 || pag->pagf_longest > 0;
>  }
>  
> +unsigned int
> +xfs_alloc_min_freelist(
> +	struct xfs_mount	*mp,
> +	struct xfs_perag	*pag)
> +{
> +	unsigned int		min_free;
> +
> +	/* space needed by-bno freespace btree */
> +	min_free = min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_BNOi] + 1,
> +				       mp->m_ag_maxlevels);
> +	/* space needed by-size freespace btree */
> +	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
> +				       mp->m_ag_maxlevels);
> +
> +	return min_free;
> +}
> +
>  /*
>   * Check if the operation we are fixing up the freelist for should go ahead or
>   * not. If we are freeing blocks, we always allow it, otherwise the allocation
> @@ -1944,7 +1961,7 @@ xfs_alloc_fix_freelist(
>  		goto out_agbp_relse;
>  	}
>  
> -	need = XFS_MIN_FREELIST_PAG(pag, mp);
> +	need = xfs_alloc_min_freelist(mp, pag);
>  	if (!xfs_alloc_space_available(args, need, flags))
>  		goto out_agbp_relse;
>  
> @@ -1963,9 +1980,8 @@ xfs_alloc_fix_freelist(
>  		}
>  	}
>  
> -
>  	/* If there isn't enough total space or single-extent, reject it. */
> -	need = XFS_MIN_FREELIST_PAG(pag, mp);
> +	need = xfs_alloc_min_freelist(mp, pag);
>  	if (!xfs_alloc_space_available(args, need, flags))
>  		goto out_agbp_relse;
>  
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index a4d3b9a..ca1c816 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -132,6 +132,8 @@ typedef struct xfs_alloc_arg {
>  
>  xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
>  		struct xfs_perag *pag, xfs_extlen_t need);
> +unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
> +		struct xfs_perag *pag);
>  
>  /*
>   * Compute and fill in value of m_ag_maxlevels.
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 7382cce..983a5d0 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3522,7 +3522,7 @@ xfs_bmap_longest_free_extent(
>  	}
>  
>  	longest = xfs_alloc_longest_free_extent(mp, pag,
> -						XFS_MIN_FREELIST_PAG(pag, mp));
> +					xfs_alloc_min_freelist(mp, pag));
>  	if (*blen < longest)
>  		*blen = longest;
>  
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index 815f61b..a0ae572 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -766,19 +766,6 @@ typedef struct xfs_agfl {
>  
>  #define XFS_AGFL_CRC_OFF	offsetof(struct xfs_agfl, agfl_crc)
>  
> -
> -#define	XFS_AG_MAXLEVELS(mp)		((mp)->m_ag_maxlevels)
> -#define	XFS_MIN_FREELIST_RAW(bl,cl,mp)	\
> -	(MIN(bl + 1, XFS_AG_MAXLEVELS(mp)) + MIN(cl + 1, XFS_AG_MAXLEVELS(mp)))
> -#define	XFS_MIN_FREELIST(a,mp)		\
> -	(XFS_MIN_FREELIST_RAW(		\
> -		be32_to_cpu((a)->agf_levels[XFS_BTNUM_BNOi]), \
> -		be32_to_cpu((a)->agf_levels[XFS_BTNUM_CNTi]), mp))
> -#define	XFS_MIN_FREELIST_PAG(pag,mp)	\
> -	(XFS_MIN_FREELIST_RAW(		\
> -		(unsigned int)(pag)->pagf_levels[XFS_BTNUM_BNOi], \
> -		(unsigned int)(pag)->pagf_levels[XFS_BTNUM_CNTi], mp))
> -
>  #define XFS_AGB_TO_FSB(mp,agno,agbno)	\
>  	(((xfs_fsblock_t)(agno) << (mp)->m_sb.sb_agblklog) | (agbno))
>  #define	XFS_FSB_TO_AGNO(mp,fsbno)	\
> diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h
> index 2d5bdfc..7978150 100644
> --- a/fs/xfs/libxfs/xfs_trans_resv.h
> +++ b/fs/xfs/libxfs/xfs_trans_resv.h
> @@ -73,9 +73,9 @@ struct xfs_trans_resv {
>   * 2 trees * (2 blocks/level * max depth - 1) * block size
>   */
>  #define	XFS_ALLOCFREE_LOG_RES(mp,nx) \
> -	((nx) * (2 * XFS_FSB_TO_B((mp), 2 * XFS_AG_MAXLEVELS(mp) - 1)))
> +	((nx) * (2 * XFS_FSB_TO_B((mp), 2 * (mp)->m_ag_maxlevels - 1)))
>  #define	XFS_ALLOCFREE_LOG_COUNT(mp,nx) \
> -	((nx) * (2 * (2 * XFS_AG_MAXLEVELS(mp) - 1)))
> +	((nx) * (2 * (2 * (mp)->m_ag_maxlevels - 1)))
>  
>  /*
>   * Per-directory log reservation for any directory change.
> diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
> index bf9c457..41e0428 100644
> --- a/fs/xfs/libxfs/xfs_trans_space.h
> +++ b/fs/xfs/libxfs/xfs_trans_space.h
> @@ -67,7 +67,7 @@
>  #define	XFS_DIOSTRAT_SPACE_RES(mp, v)	\
>  	(XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK) + (v))
>  #define	XFS_GROWFS_SPACE_RES(mp)	\
> -	(2 * XFS_AG_MAXLEVELS(mp))
> +	(2 * (mp)->m_ag_maxlevels)
>  #define	XFS_GROWFSRT_SPACE_RES(mp,b)	\
>  	((b) + XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK))
>  #define	XFS_LINK_SPACE_RES(mp,nl)	\
> diff --git a/fs/xfs/xfs_filestream.c b/fs/xfs/xfs_filestream.c
> index 9ac5eaa..c4c130f 100644
> --- a/fs/xfs/xfs_filestream.c
> +++ b/fs/xfs/xfs_filestream.c
> @@ -197,7 +197,7 @@ xfs_filestream_pick_ag(
>  		}
>  
>  		longest = xfs_alloc_longest_free_extent(mp, pag,
> -						XFS_MIN_FREELIST_PAG(pag, mp));
> +					xfs_alloc_min_freelist(mp, pag));
>  		if (((minlen && longest >= minlen) ||
>  		     (!minlen && pag->pagf_freeblks >= minfree)) &&
>  		    (!pag->pagf_metadata || !(flags & XFS_PICK_USERDATA) ||
> -- 
> 2.0.0
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 03/20] xfs: sanitise error handling in xfs_alloc_fix_freelist
  2015-06-15 14:58   ` Brian Foster
@ 2015-06-15 21:51     ` Dave Chinner
  2015-06-16 11:27       ` Brian Foster
  0 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-15 21:51 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Mon, Jun 15, 2015 at 10:58:14AM -0400, Brian Foster wrote:
> On Wed, Jun 03, 2015 at 04:04:40PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The error handling is currently an inconsistent mess as every error
> > condition handles return values and releasing buffers individually.
> > Clean this up by using gotos and a sane error label stack.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_alloc.c | 103 +++++++++++++++++++++-------------------------
> >  1 file changed, 47 insertions(+), 56 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index 2471cb5..352db46 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -1909,80 +1909,65 @@ xfs_alloc_space_available(
> >   */
> >  STATIC int			/* error */
> >  xfs_alloc_fix_freelist(
> > -	xfs_alloc_arg_t	*args,	/* allocation argument structure */
> > -	int		flags)	/* XFS_ALLOC_FLAG_... */
> > +	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> > +	int			flags)	/* XFS_ALLOC_FLAG_... */
> >  {
> > -	xfs_buf_t	*agbp;	/* agf buffer pointer */
> > -	xfs_buf_t	*agflbp;/* agfl buffer pointer */
> > -	xfs_agblock_t	bno;	/* freelist block */
> > -	int		error;	/* error result code */
> > -	xfs_mount_t	*mp;	/* file system mount point structure */
> > -	xfs_extlen_t	need;	/* total blocks needed in freelist */
> > -	xfs_perag_t	*pag;	/* per-ag information structure */
> > -	xfs_alloc_arg_t	targs;	/* local allocation arguments */
> > -	xfs_trans_t	*tp;	/* transaction pointer */
> > -
> > -	mp = args->mp;
> > +	struct xfs_mount	*mp = args->mp;
> > +	struct xfs_perag	*pag = args->pag;
> > +	struct xfs_trans	*tp = args->tp;
> > +	struct xfs_buf		*agbp = NULL;
> > +	struct xfs_buf		*agflbp = NULL;
> > +	struct xfs_alloc_arg	targs;	/* local allocation arguments */
> > +	xfs_agblock_t		bno;	/* freelist block */
> > +	xfs_extlen_t		need;	/* total blocks needed in freelist */
> > +	int			error;
> >  
> > -	pag = args->pag;
> > -	tp = args->tp;
> >  	if (!pag->pagf_init) {
> > -		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
> > -				&agbp)))
> > -			return error;
> > +		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
> > +		if (error)
> > +			goto out_no_agbp;
> >  		if (!pag->pagf_init) {
> >  			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
> >  			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> > -			args->agbp = NULL;
> > -			return 0;
> > +			goto out_agbp_relse;
> >  		}
> > -	} else
> > -		agbp = NULL;
> > +	}
> >  
> >  	/*
> > -	 * If this is a metadata preferred pag and we are user data
> > -	 * then try somewhere else if we are not being asked to
> > -	 * try harder at this point
> > +	 * If this is a metadata preferred pag and we are user data then try
> > +	 * somewhere else if we are not being asked to try harder at this
> > +	 * point
> >  	 */
> >  	if (pag->pagf_metadata && args->userdata &&
> >  	    (flags & XFS_ALLOC_FLAG_TRYLOCK)) {
> >  		ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> > -		args->agbp = NULL;
> > -		return 0;
> > +		goto out_agbp_relse;
> >  	}
> >  
> >  	need = XFS_MIN_FREELIST_PAG(pag, mp);
> > -	if (!xfs_alloc_space_available(args, need, flags)) {
> > -		if (agbp)
> > -			xfs_trans_brelse(tp, agbp);
> > -		args->agbp = NULL;
> > -		return 0;
> > -	}
> > +	if (!xfs_alloc_space_available(args, need, flags))
> > +		goto out_agbp_relse;
> >  
> >  	/*
> >  	 * Get the a.g. freespace buffer.
> >  	 * Can fail if we're not blocking on locks, and it's held.
> >  	 */
> > -	if (agbp == NULL) {
> > -		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
> > -				&agbp)))
> > -			return error;
> > -		if (agbp == NULL) {
> > +	if (!agbp) {
> > +		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
> > +		if (error)
> > +			goto out_no_agbp;
> > +		if (!agbp) {
> >  			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
> >  			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> > -			args->agbp = NULL;
> > -			return 0;
> > +			goto out_no_agbp;
> >  		}
> >  	}
> >  
> >  
> >  	/* If there isn't enough total space or single-extent, reject it. */
> >  	need = XFS_MIN_FREELIST_PAG(pag, mp);
> > -	if (!xfs_alloc_space_available(args, need, flags)) {
> > -		xfs_trans_brelse(tp, agbp);
> > -		args->agbp = NULL;
> > -		return 0;
> > -	}
> > +	if (!xfs_alloc_space_available(args, need, flags))
> > +		goto out_agbp_relse;
> >  
> >  	/*
> >  	 * Make the freelist shorter if it's too long.
> > @@ -1997,10 +1982,10 @@ xfs_alloc_fix_freelist(
> >  
> >  		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
> >  		if (error)
> > -			return error;
> > +			goto out_agbp_relse;
> 
> So at this point it looks like the buffer could be logged (i.e., dirty
> in the transaction). Perhaps this is the reason for the lack of agbp
> releases from this point forward in the current error handling. That
> said, we do the agflbp release unconditionally at the end of the
> function even when it might be logged. xfs_trans_brelse() appears to
> handle this case as it just skips removing the item from the tp.
> 
> This is a bit confusing and at the very least seems like an unexpected
> use of xfs_trans_brelse(). Is this intentional?

Yes, very much so. If the agf is clean, then releasing it
immediately on error in the function that grabbed the buffer is the
right thing to do. This can happen if the first call to
xfs_alloc_get_freelist() fails (which only occurs if we fail to read
the AGFL buffer).

And, as you noticed, if it is dirty it will remain held by the
transaction until it is cancelled, as happens everywhere else in the
code. So this is effectively making the current code consistent with
the usual error handling patterns...
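
The behaviour described above can be sketched as a small userspace model.
Everything in it is hypothetical: buf_model, trans_brelse_model() and
fix_freelist_model() are illustrative stand-ins rather than kernel
interfaces, the fail_step parameter only exists to force an error at a
chosen point, and -5 is a placeholder error code. The one property it
takes from the discussion is that releasing a clean buffer drops it,
while releasing a dirty buffer leaves it held by the transaction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer state: held by the transaction, and dirty or clean. */
struct buf_model {
	bool held;
	bool dirty;
};

/* Clean buffers are released; dirty ones stay with the transaction. */
static void
trans_brelse_model(struct buf_model *bp)
{
	if (bp->dirty)
		return;
	bp->held = false;
}

/* fail_step selects which step fails (0 means none). */
static int
fix_freelist_model(struct buf_model *agf, struct buf_model *agfl,
		   int fail_step, struct buf_model **out_agbp)
{
	struct buf_model *agbp = NULL;
	struct buf_model *agflbp = NULL;
	int error = 0;

	/* Step 1: read the AGF. */
	if (fail_step == 1) {
		error = -5;
		goto out_no_agbp;
	}
	agbp = agf;
	agbp->held = true;

	/* Step 2: shrink the freelist; on success the AGF is dirty. */
	if (fail_step == 2) {
		error = -5;
		goto out_agbp_relse;
	}
	agbp->dirty = true;

	/* Step 3: read the AGFL. */
	if (fail_step == 3) {
		error = -5;
		goto out_agbp_relse;
	}
	agflbp = agfl;
	agflbp->held = true;

	/* Step 4: extend the freelist. */
	if (fail_step == 4) {
		error = -5;
		goto out_agflbp_relse;
	}

	trans_brelse_model(agflbp);
	*out_agbp = agbp;
	return 0;

out_agflbp_relse:
	trans_brelse_model(agflbp);
out_agbp_relse:
	if (agbp)
		trans_brelse_model(agbp);
out_no_agbp:
	*out_agbp = NULL;
	return error;
}
```

A failure at step 2 finds the AGF still clean, so the label stack
releases it immediately; a failure at step 3 finds it dirty, so the
release is a no-op and the buffer stays held until the transaction is
cancelled.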

FWIW, in looking at this, I noticed that the freelist modification
loops are kind of inefficient - extending it re-reads the agfl
buffer for every call, and then we re-read the agfl buffer again
before making the free list longer if needed. I think this can be
cleaned up further and better optimised. IOWs, it can only fail on
the first read of the AGFL buffer, because if it succeeds then it
will be found in the transaction item list on subsequent reads...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 03/20] xfs: sanitise error handling in xfs_alloc_fix_freelist
  2015-06-15 21:51     ` Dave Chinner
@ 2015-06-16 11:27       ` Brian Foster
  2015-06-22  0:10         ` Dave Chinner
  0 siblings, 1 reply; 37+ messages in thread
From: Brian Foster @ 2015-06-16 11:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jun 16, 2015 at 07:51:19AM +1000, Dave Chinner wrote:
> On Mon, Jun 15, 2015 at 10:58:14AM -0400, Brian Foster wrote:
> > On Wed, Jun 03, 2015 at 04:04:40PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > The error handling is currently an inconsistent mess as every error
> > > condition handles return values and releasing buffers individually.
> > > Clean this up by using gotos and a sane error label stack.
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/xfs/libxfs/xfs_alloc.c | 103 +++++++++++++++++++++-------------------------
> > >  1 file changed, 47 insertions(+), 56 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > index 2471cb5..352db46 100644
> > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > @@ -1909,80 +1909,65 @@ xfs_alloc_space_available(
> > >   */
> > >  STATIC int			/* error */
> > >  xfs_alloc_fix_freelist(
> > > -	xfs_alloc_arg_t	*args,	/* allocation argument structure */
> > > -	int		flags)	/* XFS_ALLOC_FLAG_... */
> > > +	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> > > +	int			flags)	/* XFS_ALLOC_FLAG_... */
> > >  {
> > > -	xfs_buf_t	*agbp;	/* agf buffer pointer */
> > > -	xfs_buf_t	*agflbp;/* agfl buffer pointer */
> > > -	xfs_agblock_t	bno;	/* freelist block */
> > > -	int		error;	/* error result code */
> > > -	xfs_mount_t	*mp;	/* file system mount point structure */
> > > -	xfs_extlen_t	need;	/* total blocks needed in freelist */
> > > -	xfs_perag_t	*pag;	/* per-ag information structure */
> > > -	xfs_alloc_arg_t	targs;	/* local allocation arguments */
> > > -	xfs_trans_t	*tp;	/* transaction pointer */
> > > -
> > > -	mp = args->mp;
> > > +	struct xfs_mount	*mp = args->mp;
> > > +	struct xfs_perag	*pag = args->pag;
> > > +	struct xfs_trans	*tp = args->tp;
> > > +	struct xfs_buf		*agbp = NULL;
> > > +	struct xfs_buf		*agflbp = NULL;
> > > +	struct xfs_alloc_arg	targs;	/* local allocation arguments */
> > > +	xfs_agblock_t		bno;	/* freelist block */
> > > +	xfs_extlen_t		need;	/* total blocks needed in freelist */
> > > +	int			error;
> > >  
> > > -	pag = args->pag;
> > > -	tp = args->tp;
> > >  	if (!pag->pagf_init) {
> > > -		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
> > > -				&agbp)))
> > > -			return error;
> > > +		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
> > > +		if (error)
> > > +			goto out_no_agbp;
> > >  		if (!pag->pagf_init) {
> > >  			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
> > >  			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> > > -			args->agbp = NULL;
> > > -			return 0;
> > > +			goto out_agbp_relse;
> > >  		}
> > > -	} else
> > > -		agbp = NULL;
> > > +	}
> > >  
> > >  	/*
> > > -	 * If this is a metadata preferred pag and we are user data
> > > -	 * then try somewhere else if we are not being asked to
> > > -	 * try harder at this point
> > > +	 * If this is a metadata preferred pag and we are user data then try
> > > +	 * somewhere else if we are not being asked to try harder at this
> > > +	 * point
> > >  	 */
> > >  	if (pag->pagf_metadata && args->userdata &&
> > >  	    (flags & XFS_ALLOC_FLAG_TRYLOCK)) {
> > >  		ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> > > -		args->agbp = NULL;
> > > -		return 0;
> > > +		goto out_agbp_relse;
> > >  	}
> > >  
> > >  	need = XFS_MIN_FREELIST_PAG(pag, mp);
> > > -	if (!xfs_alloc_space_available(args, need, flags)) {
> > > -		if (agbp)
> > > -			xfs_trans_brelse(tp, agbp);
> > > -		args->agbp = NULL;
> > > -		return 0;
> > > -	}
> > > +	if (!xfs_alloc_space_available(args, need, flags))
> > > +		goto out_agbp_relse;
> > >  
> > >  	/*
> > >  	 * Get the a.g. freespace buffer.
> > >  	 * Can fail if we're not blocking on locks, and it's held.
> > >  	 */
> > > -	if (agbp == NULL) {
> > > -		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
> > > -				&agbp)))
> > > -			return error;
> > > -		if (agbp == NULL) {
> > > +	if (!agbp) {
> > > +		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
> > > +		if (error)
> > > +			goto out_no_agbp;
> > > +		if (!agbp) {
> > >  			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
> > >  			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> > > -			args->agbp = NULL;
> > > -			return 0;
> > > +			goto out_no_agbp;
> > >  		}
> > >  	}
> > >  
> > >  
> > >  	/* If there isn't enough total space or single-extent, reject it. */
> > >  	need = XFS_MIN_FREELIST_PAG(pag, mp);
> > > -	if (!xfs_alloc_space_available(args, need, flags)) {
> > > -		xfs_trans_brelse(tp, agbp);
> > > -		args->agbp = NULL;
> > > -		return 0;
> > > -	}
> > > +	if (!xfs_alloc_space_available(args, need, flags))
> > > +		goto out_agbp_relse;
> > >  
> > >  	/*
> > >  	 * Make the freelist shorter if it's too long.
> > > @@ -1997,10 +1982,10 @@ xfs_alloc_fix_freelist(
> > >  
> > >  		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
> > >  		if (error)
> > > -			return error;
> > > +			goto out_agbp_relse;
> > 
> > So at this point it looks like the buffer could be logged (i.e., dirty
> > in the transaction). Perhaps this is the reason for the lack of agbp
> > releases from this point forward in the current error handling. That
> > said, we do the agflbp release unconditionally at the end of the
> > function even when it might be logged. xfs_trans_brelse() appears to
> > handle this case as it just skips removing the item from the tp.
> > 
> > This is a bit confusing and at the very least seems like an unexpected
> > use of xfs_trans_brelse(). Is this intentional?
> 
> Yes, very much so. If the agf is clean, then releasing it
> immediately on error in the function that grabbed the buffer is the
> right thing to do. This can happen if the first call to
> xfs_alloc_get_freelist() fails (which only occurs if we fail to read
> the AGFL buffer).
> 

Ok.

> And, as you noticed, if it is dirty it will remain held by the
> transaction until it is cancelled, as happens everywhere else in the
> code. So this is effectively making the current code consistent with
> the usual error handling patterns...
> 

Effectively, yeah. My point was more that it doesn't seem to be the
prevalent pattern. Case in point: this code prior to these changes, the
attr code (xfs_attr_leaf_[add|remove]name()), inode allocation, etc.
don't seem to use brelse() after potential modification. Anyways, it
seems valid and is probably the simpler method for handling errors
in this situation:

Reviewed-by: Brian Foster <bfoster@redhat.com>

... though I think a comment around it couldn't hurt.

> FWIW, in looking at this, I noticed that the freelist modification
> loops are kind of inefficient - extending it re-reads the agfl
> buffer for every call, and then we re-read the agfl buffer again
> before making the free list longer if needed. I think this can be
> cleaned up further and better optimised. IOWs, it can only fail on
> the first read of the AGFL buffer, because if it succeeds then it
> will be found in the transaction item list on subsequent reads...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH 03/20] xfs: sanitise error handling in xfs_alloc_fix_freelist
  2015-06-16 11:27       ` Brian Foster
@ 2015-06-22  0:10         ` Dave Chinner
  0 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-06-22  0:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Tue, Jun 16, 2015 at 07:27:06AM -0400, Brian Foster wrote:
> On Tue, Jun 16, 2015 at 07:51:19AM +1000, Dave Chinner wrote:
> > On Mon, Jun 15, 2015 at 10:58:14AM -0400, Brian Foster wrote:
> > > On Wed, Jun 03, 2015 at 04:04:40PM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > 
> > > > The error handling is currently an inconsistent mess as every error
> > > > condition handles return values and releasing buffers individually.
> > > > Clean this up by using gotos and a sane error label stack.
> > > > 
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_alloc.c | 103 +++++++++++++++++++++-------------------------
> > > >  1 file changed, 47 insertions(+), 56 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > > > index 2471cb5..352db46 100644
> > > > --- a/fs/xfs/libxfs/xfs_alloc.c
> > > > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > > > @@ -1909,80 +1909,65 @@ xfs_alloc_space_available(
> > > >   */
> > > >  STATIC int			/* error */
> > > >  xfs_alloc_fix_freelist(
> > > > -	xfs_alloc_arg_t	*args,	/* allocation argument structure */
> > > > -	int		flags)	/* XFS_ALLOC_FLAG_... */
> > > > +	struct xfs_alloc_arg	*args,	/* allocation argument structure */
> > > > +	int			flags)	/* XFS_ALLOC_FLAG_... */
> > > >  {
> > > > -	xfs_buf_t	*agbp;	/* agf buffer pointer */
> > > > -	xfs_buf_t	*agflbp;/* agfl buffer pointer */
> > > > -	xfs_agblock_t	bno;	/* freelist block */
> > > > -	int		error;	/* error result code */
> > > > -	xfs_mount_t	*mp;	/* file system mount point structure */
> > > > -	xfs_extlen_t	need;	/* total blocks needed in freelist */
> > > > -	xfs_perag_t	*pag;	/* per-ag information structure */
> > > > -	xfs_alloc_arg_t	targs;	/* local allocation arguments */
> > > > -	xfs_trans_t	*tp;	/* transaction pointer */
> > > > -
> > > > -	mp = args->mp;
> > > > +	struct xfs_mount	*mp = args->mp;
> > > > +	struct xfs_perag	*pag = args->pag;
> > > > +	struct xfs_trans	*tp = args->tp;
> > > > +	struct xfs_buf		*agbp = NULL;
> > > > +	struct xfs_buf		*agflbp = NULL;
> > > > +	struct xfs_alloc_arg	targs;	/* local allocation arguments */
> > > > +	xfs_agblock_t		bno;	/* freelist block */
> > > > +	xfs_extlen_t		need;	/* total blocks needed in freelist */
> > > > +	int			error;
> > > >  
> > > > -	pag = args->pag;
> > > > -	tp = args->tp;
> > > >  	if (!pag->pagf_init) {
> > > > -		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
> > > > -				&agbp)))
> > > > -			return error;
> > > > +		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
> > > > +		if (error)
> > > > +			goto out_no_agbp;
> > > >  		if (!pag->pagf_init) {
> > > >  			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
> > > >  			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> > > > -			args->agbp = NULL;
> > > > -			return 0;
> > > > +			goto out_agbp_relse;
> > > >  		}
> > > > -	} else
> > > > -		agbp = NULL;
> > > > +	}
> > > >  
> > > >  	/*
> > > > -	 * If this is a metadata preferred pag and we are user data
> > > > -	 * then try somewhere else if we are not being asked to
> > > > -	 * try harder at this point
> > > > +	 * If this is a metadata preferred pag and we are user data then try
> > > > +	 * somewhere else if we are not being asked to try harder at this
> > > > +	 * point
> > > >  	 */
> > > >  	if (pag->pagf_metadata && args->userdata &&
> > > >  	    (flags & XFS_ALLOC_FLAG_TRYLOCK)) {
> > > >  		ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> > > > -		args->agbp = NULL;
> > > > -		return 0;
> > > > +		goto out_agbp_relse;
> > > >  	}
> > > >  
> > > >  	need = XFS_MIN_FREELIST_PAG(pag, mp);
> > > > -	if (!xfs_alloc_space_available(args, need, flags)) {
> > > > -		if (agbp)
> > > > -			xfs_trans_brelse(tp, agbp);
> > > > -		args->agbp = NULL;
> > > > -		return 0;
> > > > -	}
> > > > +	if (!xfs_alloc_space_available(args, need, flags))
> > > > +		goto out_agbp_relse;
> > > >  
> > > >  	/*
> > > >  	 * Get the a.g. freespace buffer.
> > > >  	 * Can fail if we're not blocking on locks, and it's held.
> > > >  	 */
> > > > -	if (agbp == NULL) {
> > > > -		if ((error = xfs_alloc_read_agf(mp, tp, args->agno, flags,
> > > > -				&agbp)))
> > > > -			return error;
> > > > -		if (agbp == NULL) {
> > > > +	if (!agbp) {
> > > > +		error = xfs_alloc_read_agf(mp, tp, args->agno, flags, &agbp);
> > > > +		if (error)
> > > > +			goto out_no_agbp;
> > > > +		if (!agbp) {
> > > >  			ASSERT(flags & XFS_ALLOC_FLAG_TRYLOCK);
> > > >  			ASSERT(!(flags & XFS_ALLOC_FLAG_FREEING));
> > > > -			args->agbp = NULL;
> > > > -			return 0;
> > > > +			goto out_no_agbp;
> > > >  		}
> > > >  	}
> > > >  
> > > >  
> > > >  	/* If there isn't enough total space or single-extent, reject it. */
> > > >  	need = XFS_MIN_FREELIST_PAG(pag, mp);
> > > > -	if (!xfs_alloc_space_available(args, need, flags)) {
> > > > -		xfs_trans_brelse(tp, agbp);
> > > > -		args->agbp = NULL;
> > > > -		return 0;
> > > > -	}
> > > > +	if (!xfs_alloc_space_available(args, need, flags))
> > > > +		goto out_agbp_relse;
> > > >  
> > > >  	/*
> > > >  	 * Make the freelist shorter if it's too long.
> > > > @@ -1997,10 +1982,10 @@ xfs_alloc_fix_freelist(
> > > >  
> > > >  		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
> > > >  		if (error)
> > > > -			return error;
> > > > +			goto out_agbp_relse;
> > > 
> > > So at this point it looks like the buffer could be logged (i.e., dirty
> > > in the transaction). Perhaps this is the reason for the lack of agbp
> > > releases from this point forward in the current error handling. That
> > > said, we do the agflbp release unconditionally at the end of the
> > > function even when it might be logged. xfs_trans_brelse() appears to
> > > handle this case as it just skips removing the item from the tp.
> > > 
> > > This is a bit confusing and at the very least seems like an unexpected
> > > use of xfs_trans_brelse(). Is this intentional?
> > 
> > Yes, very much so. If the agf is clean, then releasing it
> > immediately on error in the function that grabbed the buffer is the
> > right thing to do. This can happen if the first call to
> > xfs_alloc_get_freelist() fails (which only occurs if we fail to read
> > the AGFL buffer).
> > 
> 
> Ok.
> 
> > And, as you noticed, if it is dirty it will remain held by the
> > transaction until it is cancelled, as happens everywhere else in the
> > code. So this is effectively making the current code consistent with
> > the usual error handling patterns...
> > 
> 
> Effectively, yeah. My point was more that it doesn't seem to be the
> prevalent pattern. Case in point: this code prior to these changes, the
> attr code (xfs_attr_leaf_[add|remove]name()), inode allocation, etc.
> don't seem to use brelse() after potential modification. Anyways, it
> seems valid and is probably the simpler method for handling errors
> in this situation:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
> ... though I think a comment around it couldn't hurt.

No worries, I added this hunk:

        /*
         * Make the freelist shorter if it's too long.
         *
+        * Note that from this point onwards, we will always release the agf and
+        * agfl buffers on error. This means that if we error out and the
+        * buffers are clean, they are correctly handled as they may not have
+        * been joined to the transaction and hence need to be released
+        * manually. If they have been joined to the transaction, then
+        * xfs_trans_brelse() will handle them according to the recursion count
+        * and dirty state of the buffer.
+        *
         * XXX (dgc): When we have lots of free space, does this buy us
         * anything other than extra overhead when we need to put more blocks
         * back on the free list? Maybe we should only do this when space is
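
As a rough illustration of the rule in that comment, here is a toy model of
the label stack. It is only a sketch under stated assumptions: struct buf,
toy_brelse() and toy_fix_freelist() are made up for this example and are not
the kernel's types or functions. The one behaviour it borrows from the
discussion above is that releasing a clean buffer drops it, while releasing a
dirty buffer leaves it attached to the transaction.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer; NOT the kernel's struct xfs_buf. */
struct buf {
	bool	dirty;	/* modified (logged) in the transaction */
	bool	held;	/* still attached to the transaction */
};

/*
 * Toy model of the release semantics discussed in this thread: a clean
 * buffer is dropped, a dirty one stays with the transaction until the
 * transaction is committed or cancelled.
 */
static void toy_brelse(struct buf *bp)
{
	if (bp && !bp->dirty)
		bp->held = false;
}

/*
 * fail_step selects where the toy function fails:
 *   0 = no failure, 1 = the buffer read fails,
 *   2 = failure before the buffer is modified,
 *   3 = failure after the buffer is modified.
 */
static int toy_fix_freelist(struct buf *agf, int fail_step)
{
	struct buf	*agbp = NULL;
	int		error = 0;

	if (fail_step == 1) {
		error = -EIO;
		goto out_no_agbp;	/* nothing to release yet */
	}
	agbp = agf;
	agbp->held = true;

	if (fail_step == 2) {
		error = -EIO;
		goto out_agbp_relse;	/* clean: release drops it */
	}

	agbp->dirty = true;
	if (fail_step == 3) {
		error = -EIO;
		goto out_agbp_relse;	/* dirty: release is a no-op */
	}
	return 0;			/* success: caller keeps the buffer */

out_agbp_relse:
	toy_brelse(agbp);
out_no_agbp:
	return error;
}
```

Calling toy_fix_freelist() with fail_step 2 leaves the buffer released, while
fail_step 3 leaves it held and dirty for the (imaginary) transaction cancel to
clean up, so every error exit can share the same release label.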

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 08/20] xfs: add owner field to extent allocation and freeing
  2015-06-03  6:04 ` [PATCH 08/20] xfs: add owner field to extent allocation and freeing Dave Chinner
@ 2015-06-24 19:09   ` Brian Foster
  2015-06-24 21:13     ` Dave Chinner
  0 siblings, 1 reply; 37+ messages in thread
From: Brian Foster @ 2015-06-24 19:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jun 03, 2015 at 04:04:45PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> For the rmap btree to work, we have to feed the extent owner
> information to the allocation and freeing functions. This
> information is what will end up in the rmap btree that tracks
> allocated extents. While we technically don't need the owner
> information when freeing extents, passing it allows us to validate
> that the extent we are removing from the rmap btree actually
> belonged to the owner we expected it to belong to.
> 
> We also define a special set of owner values for internal metadata
> that would otherwise have no owner. This allows us to tell the
> difference between metadata owned by different per-ag btrees, as
> well as static fs metadata (e.g. AG headers) and internal journal
> blocks.
> 
> There are also a couple of special cases we need to take care of -
> during EFI recovery, we don't actually know who the original owner
> was, so we need to pass a wildcard to indicate that we aren't
> checking the owner for validity. We also need special handling in
> growfs, as we "free" the space in the last AG when extending it, but
> because it's new space it has no actual owner...
> 

Any reason not to support passing the owner through the efi/efd log
structures? You've already plumbed it through the bmap_free struct. I
suppose that could make this a backwards incompatible feature rather
than read-only incompatible, though.
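
To make sure I'm reading the wildcard semantics from the commit message
correctly, here is how I picture the check on the freeing side. This is only
a sketch: the XFS_RMAP_OWN_* values are copied from the xfs_format.h hunk
below, but toy_owner_is_special(), toy_free_needs_rmap_lookup() and
toy_owner_matches() are hypothetical helpers that do not exist in this patch,
and the real rmap code may well end up structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the xfs_format.h hunk in this patch. */
#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* Unknown owner, for EFI recovery */
#define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
#define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
#define XFS_RMAP_OWN_MIN	(-8ULL)	/* guard */

/*
 * Hypothetical helper: special owners live above the guard value, so
 * anything at or below XFS_RMAP_OWN_MIN is treated as an inode number.
 */
static bool toy_owner_is_special(uint64_t owner)
{
	return owner > XFS_RMAP_OWN_MIN;
}

/*
 * Hypothetical helper: growfs "frees" space that was never allocated,
 * so there is no rmap record to look up for XFS_RMAP_OWN_NULL.
 */
static bool toy_free_needs_rmap_lookup(uint64_t claimed)
{
	return claimed != XFS_RMAP_OWN_NULL;
}

/*
 * Hypothetical check: does the owner passed to the free call agree with
 * the owner recorded for the extent? XFS_RMAP_OWN_UNKNOWN acts as a
 * wildcard because EFI recovery does not know the original owner.
 */
static bool toy_owner_matches(uint64_t recorded, uint64_t claimed)
{
	if (claimed == XFS_RMAP_OWN_UNKNOWN)
		return true;
	return recorded == claimed;
}
```

If the EFI carried the owner, the recovery path could pass the real owner
instead of XFS_RMAP_OWN_UNKNOWN and get the same validation as a normal free.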

Brian

> While touching the xfs_bmap_add_free() function, re-order the
> parameters to put the struct xfs_mount first.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c        | 11 ++++++++---
>  fs/xfs/libxfs/xfs_alloc.h        |  4 +++-
>  fs/xfs/libxfs/xfs_bmap.c         | 17 ++++++++++++-----
>  fs/xfs/libxfs/xfs_bmap.h         |  5 +++--
>  fs/xfs/libxfs/xfs_bmap_btree.c   |  3 ++-
>  fs/xfs/libxfs/xfs_format.h       | 16 ++++++++++++++++
>  fs/xfs/libxfs/xfs_ialloc.c       | 10 +++++-----
>  fs/xfs/libxfs/xfs_ialloc_btree.c |  3 ++-
>  fs/xfs/xfs_bmap_util.c           | 17 +++++++++--------
>  fs/xfs/xfs_fsops.c               | 13 +++++++++----
>  fs/xfs/xfs_log_recover.c         |  3 ++-
>  11 files changed, 71 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index a683d7a..4353135 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1592,6 +1592,7 @@ xfs_free_ag_extent(
>  	xfs_agnumber_t	agno,	/* allocation group number */
>  	xfs_agblock_t	bno,	/* starting block number */
>  	xfs_extlen_t	len,	/* length of extent */
> +	uint64_t	owner,	/* extent owner */
>  	int		isfl)	/* set if is freelist blocks - no sb acctg */
>  {
>  	xfs_btree_cur_t	*bno_cur;	/* cursor for by-block btree */
> @@ -2010,7 +2011,8 @@ xfs_alloc_fix_freelist(
>  		error = xfs_alloc_get_freelist(tp, agbp, &bno, 0);
>  		if (error)
>  			goto out_agbp_relse;
> -		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1, 1);
> +		error = xfs_free_ag_extent(tp, agbp, args->agno, bno, 1,
> +					   XFS_RMAP_OWN_AG, 1);
>  		if (error)
>  			goto out_agbp_relse;
>  		bp = xfs_btree_get_bufs(mp, tp, args->agno, bno, 0);
> @@ -2020,6 +2022,7 @@ xfs_alloc_fix_freelist(
>  	memset(&targs, 0, sizeof(targs));
>  	targs.tp = tp;
>  	targs.mp = mp;
> +	targs.owner = XFS_RMAP_OWN_AG;
>  	targs.agbp = agbp;
>  	targs.agno = args->agno;
>  	targs.alignment = targs.minlen = targs.prod = targs.isfl = 1;
> @@ -2660,7 +2663,8 @@ int				/* error */
>  xfs_free_extent(
>  	xfs_trans_t	*tp,	/* transaction pointer */
>  	xfs_fsblock_t	bno,	/* starting block number of extent */
> -	xfs_extlen_t	len)	/* length of extent */
> +	xfs_extlen_t	len,	/* length of extent */
> +	uint64_t	owner)	/* extent owner */
>  {
>  	xfs_alloc_arg_t	args;
>  	int		error;
> @@ -2696,7 +2700,8 @@ xfs_free_extent(
>  		goto error0;
>  	}
>  
> -	error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
> +	error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno,
> +				   len, owner, 0);
>  	if (!error)
>  		xfs_extent_busy_insert(tp, args.agno, args.agbno, len, 0);
>  error0:
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 71379f6..39ca815 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -122,6 +122,7 @@ typedef struct xfs_alloc_arg {
>  	char		isfl;		/* set if is freelist blocks - !acctg */
>  	char		userdata;	/* set if this is user data */
>  	xfs_fsblock_t	firstblock;	/* io first block allocated */
> +	uint64_t	owner;		/* owner of blocks being allocated */
>  } xfs_alloc_arg_t;
>  
>  /*
> @@ -208,7 +209,8 @@ int				/* error */
>  xfs_free_extent(
>  	struct xfs_trans *tp,	/* transaction pointer */
>  	xfs_fsblock_t	bno,	/* starting block number of extent */
> -	xfs_extlen_t	len);	/* length of extent */
> +	xfs_extlen_t	len,	/* length of extent */
> +	uint64_t	owner);	/* extent owner */
>  
>  int					/* error */
>  xfs_alloc_lookup_le(
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 983a5d0..0b40a29 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -567,10 +567,11 @@ xfs_bmap_validate_ret(
>   */
>  void
>  xfs_bmap_add_free(
> +	struct xfs_mount	*mp,		/* mount point structure */
> +	struct xfs_bmap_free	*flist,		/* list of extents */
>  	xfs_fsblock_t		bno,		/* fs block number of extent */
>  	xfs_filblks_t		len,		/* length of extent */
> -	xfs_bmap_free_t		*flist,		/* list of extents */
> -	xfs_mount_t		*mp)		/* mount point structure */
> +	uint64_t		owner)		/* extent owner */
>  {
>  	xfs_bmap_free_item_t	*cur;		/* current (next) element */
>  	xfs_bmap_free_item_t	*new;		/* new element */
> @@ -591,9 +592,12 @@ xfs_bmap_add_free(
>  	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
>  #endif
>  	ASSERT(xfs_bmap_free_item_zone != NULL);
> +	ASSERT(owner);
> +
>  	new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
>  	new->xbfi_startblock = bno;
>  	new->xbfi_blockcount = (xfs_extlen_t)len;
> +	new->xbfi_owner = owner;
>  	for (prev = NULL, cur = flist->xbf_first;
>  	     cur != NULL;
>  	     prev = cur, cur = cur->xbfi_next) {
> @@ -696,7 +700,7 @@ xfs_bmap_btree_to_extents(
>  	cblock = XFS_BUF_TO_BLOCK(cbp);
>  	if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
>  		return error;
> -	xfs_bmap_add_free(cbno, 1, cur->bc_private.b.flist, mp);
> +	xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, ip->i_ino);
>  	ip->i_d.di_nblocks--;
>  	xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
>  	xfs_trans_binval(tp, cbp);
> @@ -777,6 +781,7 @@ xfs_bmap_extents_to_btree(
>  	memset(&args, 0, sizeof(args));
>  	args.tp = tp;
>  	args.mp = mp;
> +	args.owner = ip->i_ino;
>  	args.firstblock = *firstblock;
>  	if (*firstblock == NULLFSBLOCK) {
>  		args.type = XFS_ALLOCTYPE_START_BNO;
> @@ -923,6 +928,7 @@ xfs_bmap_local_to_extents(
>  	memset(&args, 0, sizeof(args));
>  	args.tp = tp;
>  	args.mp = ip->i_mount;
> +	args.owner = ip->i_ino;
>  	args.firstblock = *firstblock;
>  	/*
>  	 * Allocate a block.  We know we need only one, since the
> @@ -3706,6 +3712,7 @@ xfs_bmap_btalloc(
>  	memset(&args, 0, sizeof(args));
>  	args.tp = ap->tp;
>  	args.mp = mp;
> +	args.owner = ap->ip->i_ino;
>  	args.fsbno = ap->blkno;
>  
>  	/* Trim the allocation back to the maximum an AG can fit. */
> @@ -4980,8 +4987,8 @@ xfs_bmap_del_extent(
>  	 * If we need to, add to list of extents to delete.
>  	 */
>  	if (do_fx)
> -		xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist,
> -			mp);
> +		xfs_bmap_add_free(mp, flist, del->br_startblock,
> +				  del->br_blockcount, ip->i_ino);
>  	/*
>  	 * Adjust inode # blocks in the file.
>  	 */
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index 6aaa0c1..674819f 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -66,6 +66,7 @@ typedef struct xfs_bmap_free_item
>  {
>  	xfs_fsblock_t		xbfi_startblock;/* starting fs block number */
>  	xfs_extlen_t		xbfi_blockcount;/* number of blocks in extent */
> +	uint64_t		xbfi_owner;	/* extent owner */
>  	struct xfs_bmap_free_item *xbfi_next;	/* link to next entry */
>  } xfs_bmap_free_item_t;
>  
> @@ -182,8 +183,8 @@ void	xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
>  
>  int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
>  void	xfs_bmap_local_to_extents_empty(struct xfs_inode *ip, int whichfork);
> -void	xfs_bmap_add_free(xfs_fsblock_t bno, xfs_filblks_t len,
> -		struct xfs_bmap_free *flist, struct xfs_mount *mp);
> +void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
> +			  xfs_fsblock_t bno, xfs_filblks_t len, uint64_t owner);
>  void	xfs_bmap_cancel(struct xfs_bmap_free *flist);
>  int	xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
>  			int *committed);
> diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
> index 2c44c8e..18fe394 100644
> --- a/fs/xfs/libxfs/xfs_bmap_btree.c
> +++ b/fs/xfs/libxfs/xfs_bmap_btree.c
> @@ -445,6 +445,7 @@ xfs_bmbt_alloc_block(
>  	args.mp = cur->bc_mp;
>  	args.fsbno = cur->bc_private.b.firstblock;
>  	args.firstblock = args.fsbno;
> +	args.owner = cur->bc_private.b.ip->i_ino;
>  
>  	if (args.fsbno == NULLFSBLOCK) {
>  		args.fsbno = be64_to_cpu(start->l);
> @@ -525,7 +526,7 @@ xfs_bmbt_free_block(
>  	struct xfs_trans	*tp = cur->bc_tp;
>  	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
>  
> -	xfs_bmap_add_free(fsbno, 1, cur->bc_private.b.flist, mp);
> +	xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, ip->i_ino);
>  	ip->i_d.di_nblocks--;
>  
>  	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
> index e81ffec..4c9e7e1 100644
> --- a/fs/xfs/libxfs/xfs_format.h
> +++ b/fs/xfs/libxfs/xfs_format.h
> @@ -1288,6 +1288,22 @@ typedef __be32 xfs_inobt_ptr_t;
>   */
>  #define	XFS_RMAP_CRC_MAGIC	0x524d4233	/* 'RMB3' */
>  
> +/*
> + * Special owner types.
> + *
> + * Seeing as we only support up to 8EB, we have the upper bit of the owner field
> + * to tell us we have a special owner value. We use these for static metadata
> + * allocated at mkfs/growfs time, as well as for freespace management metadata.
> + */
> +#define XFS_RMAP_OWN_NULL	(-1ULL)	/* No owner, for growfs */
> +#define XFS_RMAP_OWN_UNKNOWN	(-2ULL)	/* Unknown owner, for EFI recovery */
> +#define XFS_RMAP_OWN_FS		(-3ULL)	/* static fs metadata */
> +#define XFS_RMAP_OWN_LOG	(-4ULL)	/* static fs metadata */
> +#define XFS_RMAP_OWN_AG		(-5ULL)	/* AG freespace btree blocks */
> +#define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
> +#define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
> +#define XFS_RMAP_OWN_MIN	(-8ULL) /* guard */
> +
>  #define	XFS_RMAP_BLOCK(mp) \
>  	(xfs_sb_version_hasfinobt(&((mp)->m_sb)) ? \
>  	 XFS_FIBT_BLOCK(mp) + 1 : \
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 66efc70..b08823a 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -612,6 +612,7 @@ xfs_ialloc_ag_alloc(
>  	args.tp = tp;
>  	args.mp = tp->t_mountp;
>  	args.fsbno = NULLFSBLOCK;
> +	args.owner = XFS_RMAP_OWN_INODES;
>  
>  #ifdef DEBUG
>  	/* randomly do sparse inode allocations */
> @@ -1826,9 +1827,8 @@ xfs_difree_inode_chunk(
>  
>  	if (!xfs_inobt_issparse(rec->ir_holemask)) {
>  		/* not sparse, calculate extent info directly */
> -		xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno,
> -				  XFS_AGINO_TO_AGBNO(mp, rec->ir_startino)),
> -				  mp->m_ialloc_blks, flist, mp);
> +		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, sagbno),
> +				  mp->m_ialloc_blks, XFS_RMAP_OWN_INODES);
>  		return;
>  	}
>  
> @@ -1871,8 +1871,8 @@ xfs_difree_inode_chunk(
>  
>  		ASSERT(agbno % mp->m_sb.sb_spino_align == 0);
>  		ASSERT(contigblk % mp->m_sb.sb_spino_align == 0);
> -		xfs_bmap_add_free(XFS_AGB_TO_FSB(mp, agno, agbno), contigblk,
> -				  flist, mp);
> +		xfs_bmap_add_free(mp, flist, XFS_AGB_TO_FSB(mp, agno, agbno),
> +				  contigblk, XFS_RMAP_OWN_INODES);
>  
>  		/* reset range to current bit and carry on... */
>  		startidx = endidx = nextbit;
> diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c
> index 674ad8f..b96db1c 100644
> --- a/fs/xfs/libxfs/xfs_ialloc_btree.c
> +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c
> @@ -96,6 +96,7 @@ xfs_inobt_alloc_block(
>  	memset(&args, 0, sizeof(args));
>  	args.tp = cur->bc_tp;
>  	args.mp = cur->bc_mp;
> +	args.owner = XFS_RMAP_OWN_INOBT;
>  	args.fsbno = XFS_AGB_TO_FSB(args.mp, cur->bc_private.a.agno, sbno);
>  	args.minlen = 1;
>  	args.maxlen = 1;
> @@ -129,7 +130,7 @@ xfs_inobt_free_block(
>  	int			error;
>  
>  	fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp));
> -	error = xfs_free_extent(cur->bc_tp, fsbno, 1);
> +	error = xfs_free_extent(cur->bc_tp, fsbno, 1, XFS_RMAP_OWN_INOBT);
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 4a29655..5ed272b 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -117,15 +117,16 @@ xfs_bmap_finish(
>  	efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
>  	for (free = flist->xbf_first; free != NULL; free = next) {
>  		next = free->xbfi_next;
> -		if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
> -				free->xbfi_blockcount))) {
> +		error = xfs_free_extent(ntp, free->xbfi_startblock,
> +					free->xbfi_blockcount,
> +					free->xbfi_owner);
> +		if (error) {
>  			/*
> -			 * The bmap free list will be cleaned up at a
> -			 * higher level.  The EFI will be canceled when
> -			 * this transaction is aborted.
> -			 * Need to force shutdown here to make sure it
> -			 * happens, since this transaction may not be
> -			 * dirty yet.
> +			 * The bmap free list will be cleaned up at a higher
> +			 * level.  The EFI will be canceled when this
> +			 * transaction is aborted.  Need to force shutdown here
> +			 * to make sure it happens, since this transaction may
> +			 * not be dirty yet.
>  			 */
>  			mp = ntp->t_mountp;
>  			if (!XFS_FORCED_SHUTDOWN(mp))
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index a564c4c..ebfeb84 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -466,14 +466,19 @@ xfs_growfs_data_private(
>  		       be32_to_cpu(agi->agi_length));
>  
>  		xfs_alloc_log_agf(tp, bp, XFS_AGF_LENGTH);
> +
>  		/*
>  		 * Free the new space.
> +		 *
> +		 * XFS_RMAP_OWN_NULL is used here to tell the rmap btree that
> +		 * this doesn't actually exist in the rmap btree.
>  		 */
> -		error = xfs_free_extent(tp, XFS_AGB_TO_FSB(mp, agno,
> -			be32_to_cpu(agf->agf_length) - new), new);
> -		if (error) {
> +		error = xfs_free_extent(tp,
> +				XFS_AGB_TO_FSB(mp, agno,
> +					be32_to_cpu(agf->agf_length) - new),
> +				new, XFS_RMAP_OWN_NULL);
> +		if (error)
>  			goto error0;
> -		}
>  	}
>  
>  	/*
> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> index 4a8c440..5dad26c 100644
> --- a/fs/xfs/xfs_log_recover.c
> +++ b/fs/xfs/xfs_log_recover.c
> @@ -3753,7 +3753,8 @@ xlog_recover_process_efi(
>  
>  	for (i = 0; i < efip->efi_format.efi_nextents; i++) {
>  		extp = &(efip->efi_format.efi_extents[i]);
> -		error = xfs_free_extent(tp, extp->ext_start, extp->ext_len);
> +		error = xfs_free_extent(tp, extp->ext_start, extp->ext_len,
> +					XFS_RMAP_OWN_UNKNOWN);
>  		if (error)
>  			goto abort_error;
>  		xfs_trans_log_efd_extent(tp, efdp, extp->ext_start,
> -- 
> 2.0.0

* Re: [PATCH 08/20] xfs: add owner field to extent allocation and freeing
  2015-06-24 19:09   ` Brian Foster
@ 2015-06-24 21:13     ` Dave Chinner
  2015-06-25 13:03       ` Brian Foster
  0 siblings, 1 reply; 37+ messages in thread
From: Dave Chinner @ 2015-06-24 21:13 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, Jun 24, 2015 at 03:09:19PM -0400, Brian Foster wrote:
> On Wed, Jun 03, 2015 at 04:04:45PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > For the rmap btree to work, we have to feed the extent owner
> > information to the allocation and freeing functions. This
> > information is what will end up in the rmap btree that tracks
> > allocated extents. While we technically don't need the owner
> > information when freeing extents, passing it allows us to validate
> > that the extent we are removing from the rmap btree actually
> > belonged to the owner we expected it to belong to.
> > 
> > We also define a special set of owner values for internal metadata
> > that would otherwise have no owner. This allows us to tell the
> > difference between metadata owned by different per-ag btrees, as
> > well as static fs metadata (e.g. AG headers) and internal journal
> > blocks.
> > 
> > There are also a couple of special cases we need to take care of -
> > during EFI recovery, we don't actually know who the original owner
> > was, so we need to pass a wildcard to indicate that we aren't
> > checking the owner for validity. We also need special handling in
> > growfs, as we "free" the space in the last AG when extending it, but
> > because it's new space it has no actual owner...
> > 
> 
> Any reason not to support passing the owner through the efi/efd log
> structures? You've already plumbed it through the bmap_free struct. I
> suppose that could make this a backwards incompatible feature rather
> than read-only incompatible, though.

That's an interesting idea that I didn't really consider.

I'll have a think about it, along with a couple of other suggestions
from Darrick (e.g. increasing the rmap record size and keeping
owner-related location information (file offset) in it) as that will
also impact on any changes to EFI/EFD structure.

As for ro compat vs incompat, an EFI/EFD change would be a log
incompat flag, so it would only be relevant if the log is dirty at
mount time. Hence if the log was clean then the ro-compat flag would
be used, and so a change of EFI/EFD format isn't a huge deal...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 37+ messages in thread
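
The ro-compat versus log-incompat distinction described above can be
sketched in standalone C. This is only an illustration of the rule as
stated in the message (an unknown log incompat bit matters only when the
log is dirty); the enum, struct and function names are hypothetical and
are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mount outcomes for this sketch; not kernel definitions. */
enum mount_result { MOUNT_RW, MOUNT_RO_ONLY, MOUNT_REFUSED };

/* Hypothetical feature masks, one bit per feature. */
struct sb_features {
	uint32_t ro_compat;	/* unknown bits: read-only mount still allowed */
	uint32_t incompat;	/* unknown bits: no mount at all */
	uint32_t log_incompat;	/* unknown bits: only matter if the log is dirty */
};

/*
 * Decide what a kernel that understands 'known' may do with a filesystem
 * that has 'ondisk' features set.
 */
static enum mount_result
check_features(const struct sb_features *ondisk,
	       const struct sb_features *known, bool log_dirty)
{
	assert(ondisk && known);

	if (ondisk->incompat & ~known->incompat)
		return MOUNT_REFUSED;
	/* A dirty log must be recovered, so unknown log formats are fatal. */
	if (log_dirty && (ondisk->log_incompat & ~known->log_incompat))
		return MOUNT_REFUSED;
	if (ondisk->ro_compat & ~known->ro_compat)
		return MOUNT_RO_ONLY;
	return MOUNT_RW;
}
```

With a clean log, a kernel that knows neither feature bit falls through
to the ro-compat check and can still mount read-only; with a dirty log
the same kernel has to refuse the mount.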

* Re: [PATCH 08/20] xfs: add owner field to extent allocation and freeing
  2015-06-24 21:13     ` Dave Chinner
@ 2015-06-25 13:03       ` Brian Foster
  0 siblings, 0 replies; 37+ messages in thread
From: Brian Foster @ 2015-06-25 13:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, Jun 25, 2015 at 07:13:06AM +1000, Dave Chinner wrote:
> On Wed, Jun 24, 2015 at 03:09:19PM -0400, Brian Foster wrote:
> > On Wed, Jun 03, 2015 at 04:04:45PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > For the rmap btree to work, we have to feed the extent owner
> > > information to the allocation and freeing functions. This
> > > information is what will end up in the rmap btree that tracks
> > > allocated extents. While we technically don't need the owner
> > > information when freeing extents, passing it allows us to validate
> > > that the extent we are removing from the rmap btree actually
> > > belonged to the owner we expected it to belong to.
> > > 
> > > We also define a special set of owner values for internal metadata
> > > that would otherwise have no owner. This allows us to tell the
> > > difference between metadata owned by different per-ag btrees, as
> > > well as static fs metadata (e.g. AG headers) and internal journal
> > > blocks.
> > > 
> > > There are also a couple of special cases we need to take care of -
> > > during EFI recovery, we don't actually know who the original owner
> > > was, so we need to pass a wildcard to indicate that we aren't
> > > checking the owner for validity. We also need special handling in
> > > growfs, as we "free" the space in the last AG when extending it, but
> > > because it's new space it has no actual owner...
> > > 
> > 
> > Any reason not to support passing the owner through the efi/efd log
> > structures? You've already plumbed it through the bmap_free struct. I
> > suppose that could make this a backwards incompatible feature rather
> > than read-only incompatible, though.
> 
> That's an interesting idea that I didn't really consider.
> 
> I'll have a think about it, along with a couple of other suggestions
> from Darrick (e.g. increasing the rmap record size and keeping
> owner-related location information (file offset) in it) as that will
> also impact on any changes to EFI/EFD structure.
> 

Ok.. it's not really a problem, it just seems like a gap given that this
work presumably pushes the owner through the rest of the extent free
path for verification purposes. I think it would be a bit of a shame to
do that work and not have the same verification in the event of a crash,
assuming it can be done relatively cleanly.

> As for ro compat vs incompat, an EFI/EFD change would be a log
> incompat flag, so it would only be relevant if the log is dirty at
> mount time. Hence if the log was clean then the ro-compat flag would
> be used, and so a change of EFI/EFD format isn't a huge deal...
> 

That makes sense. Hmm, I have yet to get through the rest of this series
so I'm not sure whether extent free results in any rmapbt modifications,
but if so, wouldn't we need that log incompat. flag anyways? Even if the
EFI/EFD doesn't change, how can we allow an older kernel to recover
extent free transactions if they don't know how to update the rmapbt
appropriately?

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 13/20] xfs: rmap btree requires more reserved free space
  2015-06-03  6:04 ` [PATCH 13/20] xfs: rmap btree requires more reserved free space Dave Chinner
@ 2015-06-25 16:41   ` Brian Foster
  2015-07-10  0:37     ` Dave Chinner
  0 siblings, 1 reply; 37+ messages in thread
From: Brian Foster @ 2015-06-25 16:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jun 03, 2015 at 04:04:50PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The rmap btree is allocated from the AGFL, which means we have to
> ensure ENOSPC is reported to userspace before we run out of free
> space in each AG. The last allocation in an AG can cause a full
> height rmap btree split, and that means we have to reserve at least
> this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
> Update the various space calculation functions to handle this.
> 
> Also, because the macros are now executing conditional code and are called quite
> frequently, convert them to functions that initialise variables in the struct
> xfs_mount, use the new variables everywhere and document the calculations
> better.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_alloc.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_alloc.h | 41 ++++------------------------
>  fs/xfs/libxfs/xfs_bmap.c  |  2 +-
>  fs/xfs/libxfs/xfs_sb.c    |  3 +++
>  fs/xfs/xfs_discard.c      |  2 +-
>  fs/xfs/xfs_fsops.c        |  4 +--
>  fs/xfs/xfs_mount.c        |  2 +-
>  fs/xfs/xfs_mount.h        |  2 ++
>  fs/xfs/xfs_super.c        |  2 +-
>  9 files changed, 85 insertions(+), 42 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index f62775a..c6a1372 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -62,6 +62,72 @@ xfs_prealloc_blocks(
>  }
>  
>  /*
> + * In order to avoid ENOSPC-related deadlock caused by out-of-order locking of
> + * AGF buffer (PV 947395), we place constraints on the relationship among actual
> + * allocations for data blocks, freelist blocks, and potential file data bmap
> + * btree blocks. However, these restrictions may result in no actual space
> + * allocated for a delayed extent, for example, a data block in a certain AG is
> + * allocated but there is no additional block for the additional bmap btree
> + * block due to a split of the bmap btree of the file. The result of this may
> + * lead to an infinite loop when the file gets flushed to disk and all delayed
> + * extents need to be actually allocated. To get around this, we explicitly set
> + * aside a few blocks which will not be reserved in delayed allocation.
> + *
> + * The minimum number of needed freelist blocks is 4 fsbs _per AG_ when we are
> + * not using rmap btrees a potential split of file's bmap btree requires 1 fsb,

Somewhere in the above line might be a good place to end one sentence
and start another. ;) I had to read that a few times to get what it's
trying to say.

> + * so we set the number of set-aside blocks to 4 + 4*agcount when not using rmap
> + * btrees.
> + *
> + * When rmap btrees are active, we have to consider that using the last block in
> + * the AG can cause a full height rmap btree split and we need enough blocks on
> + * the AGFL to be able to handle this. That means we have, in addition to the
> + * above consideration, another (2 * mp->m_ag_levels) - 1 blocks required to be
> + * available to the free list.
> + */

BTW, I think I get the 2 block per level log requirement in that a split
requires logging the two blocks involved. Where does the 2nd block per
level come in as an allocation requirement?

Brian

> +unsigned int
> +xfs_alloc_set_aside(
> +	struct xfs_mount *mp)
> +{
> +	unsigned int	blocks;
> +
> +	blocks = 4 + (mp->m_sb.sb_agcount * XFS_ALLOC_AGFL_RESERVE);
> +	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> +		return blocks;
> +	return blocks + (mp->m_sb.sb_agcount * (2 * mp->m_ag_maxlevels) - 1);
> +}
> +
> +/*
> + * When deciding how much space to allocate out of an AG, we limit the
> + * allocation maximum size to the size the AG. However, we cannot use all the
> + * blocks in the AG - some are permanently used by metadata. These
> + * blocks are generally:
> + *	- the AG superblock, AGF, AGI and AGFL
> + *	- the AGF (bno and cnt) and AGI btree root blocks, and optionally
> + *	  the AGI free inode and rmap btree root blocks.
> + *	- blocks on the AGFL according to xfs_alloc_set_aside() limits
> + *
> + * The AG headers are sector sized, so the amount of space they take up is
> + * dependent on filesystem geometry. The others are all single blocks.
> + */
> +unsigned int
> +xfs_alloc_ag_max_usable(struct xfs_mount *mp)
> +{
> +	unsigned int	blocks;
> +
> +	blocks = XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)); /* ag headers */
> +	blocks += XFS_ALLOC_AGFL_RESERVE;
> +	blocks += 3;			/* AGF, AGI btree root blocks */
> +	if (xfs_sb_version_hasfinobt(&mp->m_sb))
> +		blocks++;		/* finobt root block */
> +	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +		/* rmap root block + full tree split on full AG */
> +		blocks += 1 + (2 * mp->m_ag_maxlevels) - 1;
> +	}
> +
> +	return mp->m_sb.sb_agblocks - blocks;
> +}
> +
> +/*
>   * Lookup the record equal to [bno, len] in the btree given by cur.
>   */
>  STATIC int				/* error */
> @@ -1906,6 +1972,9 @@ xfs_alloc_min_freelist(
>  	/* space needed by-size freespace btree */
>  	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_CNTi] + 1,
>  				       mp->m_ag_maxlevels);
> +	/* space needed reverse mapping used space btree */
> +	min_free += min_t(unsigned int, pag->pagf_levels[XFS_BTNUM_RMAPi] + 1,
> +				       mp->m_ag_maxlevels);
>  
>  	return min_free;
>  }
> diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h
> index 39ca815..18e4080 100644
> --- a/fs/xfs/libxfs/xfs_alloc.h
> +++ b/fs/xfs/libxfs/xfs_alloc.h
> @@ -56,42 +56,6 @@ typedef unsigned int xfs_alloctype_t;
>  #define	XFS_ALLOC_FLAG_FREEING	0x00000002  /* indicate caller is freeing extents*/
>  
>  /*
> - * In order to avoid ENOSPC-related deadlock caused by
> - * out-of-order locking of AGF buffer (PV 947395), we place
> - * constraints on the relationship among actual allocations for
> - * data blocks, freelist blocks, and potential file data bmap
> - * btree blocks. However, these restrictions may result in no
> - * actual space allocated for a delayed extent, for example, a data
> - * block in a certain AG is allocated but there is no additional
> - * block for the additional bmap btree block due to a split of the
> - * bmap btree of the file. The result of this may lead to an
> - * infinite loop in xfssyncd when the file gets flushed to disk and
> - * all delayed extents need to be actually allocated. To get around
> - * this, we explicitly set aside a few blocks which will not be
> - * reserved in delayed allocation. Considering the minimum number of
> - * needed freelist blocks is 4 fsbs _per AG_, a potential split of file's bmap
> - * btree requires 1 fsb, so we set the number of set-aside blocks
> - * to 4 + 4*agcount.
> - */
> -#define XFS_ALLOC_SET_ASIDE(mp)  (4 + ((mp)->m_sb.sb_agcount * 4))
> -
> -/*
> - * When deciding how much space to allocate out of an AG, we limit the
> - * allocation maximum size to the size the AG. However, we cannot use all the
> - * blocks in the AG - some are permanently used by metadata. These
> - * blocks are generally:
> - *	- the AG superblock, AGF, AGI and AGFL
> - *	- the AGF (bno and cnt) and AGI btree root blocks
> - *	- 4 blocks on the AGFL according to XFS_ALLOC_SET_ASIDE() limits
> - *
> - * The AG headers are sector sized, so the amount of space they take up is
> - * dependent on filesystem geometry. The others are all single blocks.
> - */
> -#define XFS_ALLOC_AG_MAX_USABLE(mp)	\
> -	((mp)->m_sb.sb_agblocks - XFS_BB_TO_FSB(mp, XFS_FSS_TO_BB(mp, 4)) - 7)
> -
> -
> -/*
>   * Argument structure for xfs_alloc routines.
>   * This is turned into a structure to avoid having 20 arguments passed
>   * down several levels of the stack.
> @@ -131,6 +95,11 @@ typedef struct xfs_alloc_arg {
>  #define XFS_ALLOC_USERDATA		1	/* allocation is for user data*/
>  #define XFS_ALLOC_INITIAL_USER_DATA	2	/* special case start of file */
>  
> +/* freespace limit calculations */
> +#define XFS_ALLOC_AGFL_RESERVE	4
> +unsigned int xfs_alloc_set_aside(struct xfs_mount *mp);
> +unsigned int xfs_alloc_ag_max_usable(struct xfs_mount *mp);
> +
>  xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
>  		struct xfs_perag *pag, xfs_extlen_t need);
>  unsigned int xfs_alloc_min_freelist(struct xfs_mount *mp,
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 0b40a29..dfb9f28 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3716,7 +3716,7 @@ xfs_bmap_btalloc(
>  	args.fsbno = ap->blkno;
>  
>  	/* Trim the allocation back to the maximum an AG can fit. */
> -	args.maxlen = MIN(ap->length, XFS_ALLOC_AG_MAX_USABLE(mp));
> +	args.maxlen = MIN(ap->length, mp->m_ag_max_usable);
>  	args.firstblock = *ap->firstblock;
>  	blen = 0;
>  	if (nullfb) {
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index 89d8052..c136abd 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -721,6 +721,9 @@ xfs_sb_mount_common(
>  		mp->m_ialloc_min_blks = sbp->sb_spino_align;
>  	else
>  		mp->m_ialloc_min_blks = mp->m_ialloc_blks;
> +
> +	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
> +	mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
> index e85a951..ec7bb8b 100644
> --- a/fs/xfs/xfs_discard.c
> +++ b/fs/xfs/xfs_discard.c
> @@ -179,7 +179,7 @@ xfs_ioc_trim(
>  	 * matter as trimming blocks is an advisory interface.
>  	 */
>  	if (range.start >= XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks) ||
> -	    range.minlen > XFS_FSB_TO_B(mp, XFS_ALLOC_AG_MAX_USABLE(mp)) ||
> +	    range.minlen > XFS_FSB_TO_B(mp, mp->m_ag_max_usable) ||
>  	    range.len < mp->m_sb.sb_blocksize)
>  		return -EINVAL;
>  
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index b8b2f06..d914a51 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -715,7 +715,7 @@ xfs_fs_counts(
>  	cnt->allocino = percpu_counter_read_positive(&mp->m_icount);
>  	cnt->freeino = percpu_counter_read_positive(&mp->m_ifree);
>  	cnt->freedata = percpu_counter_read_positive(&mp->m_fdblocks) -
> -							XFS_ALLOC_SET_ASIDE(mp);
> +						mp->m_alloc_set_aside;
>  
>  	spin_lock(&mp->m_sb_lock);
>  	cnt->freertx = mp->m_sb.sb_frextents;
> @@ -788,7 +788,7 @@ retry:
>  		__int64_t	free;
>  
>  		free = percpu_counter_sum(&mp->m_fdblocks) -
> -							XFS_ALLOC_SET_ASIDE(mp);
> +						mp->m_alloc_set_aside;
>  		if (!free)
>  			goto out; /* ENOSPC and fdblks_delta = 0 */
>  
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 9d6be55..05d3878 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1192,7 +1192,7 @@ xfs_mod_fdblocks(
>  		batch = XFS_FDBLOCKS_BATCH;
>  
>  	__percpu_counter_add(&mp->m_fdblocks, delta, batch);
> -	if (__percpu_counter_compare(&mp->m_fdblocks, XFS_ALLOC_SET_ASIDE(mp),
> +	if (__percpu_counter_compare(&mp->m_fdblocks, mp->m_alloc_set_aside,
>  				     XFS_FDBLOCKS_BATCH) >= 0) {
>  		/* we had space! */
>  		return 0;
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 8030627..cdced0b 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -96,6 +96,8 @@ typedef struct xfs_mount {
>  	uint			m_bm_maxlevels[2]; /* XFS_BM_MAXLEVELS */
>  	uint			m_in_maxlevels;	/* max inobt btree levels. */
>  	xfs_extlen_t		m_ag_prealloc_blocks; /* reserved ag blocks */
> +	uint			m_alloc_set_aside; /* space we can't use */
> +	uint			m_ag_max_usable; /* max space per AG */
>  	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
>  	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
>  	struct mutex		m_growlock;	/* growfs mutex */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 1fb16562..796ccb5 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1080,7 +1080,7 @@ xfs_fs_statfs(
>  	statp->f_blocks = sbp->sb_dblocks - lsize;
>  	spin_unlock(&mp->m_sb_lock);
>  
> -	statp->f_bfree = fdblocks - XFS_ALLOC_SET_ASIDE(mp);
> +	statp->f_bfree = fdblocks - mp->m_alloc_set_aside;
>  	statp->f_bavail = statp->f_bfree;
>  
>  	fakeinos = statp->f_bfree << sbp->sb_inopblog;
> -- 
> 2.0.0

^ permalink raw reply	[flat|nested] 37+ messages in thread
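
The arithmetic in xfs_alloc_ag_max_usable() as posted can be followed
with a small standalone sketch. The struct and helper names below are
hypothetical stand-ins for the xfs_mount fields, and the header
calculation assumes the kernel macros round four sector-sized headers up
to whole filesystem blocks. The rmap term is copied from the patch as
posted, before any correction to the split reservation.

```c
#include <assert.h>
#include <stdbool.h>

#define AGFL_RESERVE	4u	/* mirrors XFS_ALLOC_AGFL_RESERVE in the patch */

/* Hypothetical, simplified geometry; the kernel reads this from xfs_mount. */
struct geom {
	unsigned int	agblocks;	/* blocks per AG */
	unsigned int	sectsize;	/* bytes per sector */
	unsigned int	blocksize;	/* bytes per filesystem block */
	unsigned int	ag_maxlevels;	/* maximum per-AG btree height */
	bool		has_finobt;
	bool		has_rmapbt;
};

/* Four sector-sized AG headers, rounded up to filesystem blocks. */
static unsigned int header_blocks(const struct geom *g)
{
	assert(g->blocksize != 0);
	return (4 * g->sectsize + g->blocksize - 1) / g->blocksize;
}

/* Same steps as the function in the patch as posted. */
static unsigned int ag_max_usable(const struct geom *g)
{
	unsigned int	blocks;

	blocks = header_blocks(g);	/* ag headers */
	blocks += AGFL_RESERVE;
	blocks += 3;			/* bno, cnt and inode btree roots */
	if (g->has_finobt)
		blocks++;		/* finobt root block */
	if (g->has_rmapbt)
		blocks += 1 + (2 * g->ag_maxlevels) - 1; /* root + split */
	return g->agblocks - blocks;
}

/* The macro being removed: headers plus a fixed 7 blocks. */
static unsigned int old_ag_max_usable(const struct geom *g)
{
	return g->agblocks - header_blocks(g) - 7;
}
```

Without finobt or rmapbt the new function reduces to the old macro
(4 AGFL blocks + 3 root blocks = 7), which is a quick way to check that
the conversion from macro to function does not change existing
behaviour.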

* Re: [PATCH 15/20] xfs: add an extent to the rmap btree
  2015-06-03  6:04 ` [PATCH 15/20] xfs: add an extent to the rmap btree Dave Chinner
@ 2015-06-25 16:41   ` Brian Foster
  2015-07-10  0:39     ` Dave Chinner
  0 siblings, 1 reply; 37+ messages in thread
From: Brian Foster @ 2015-06-25 16:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jun 03, 2015 at 04:04:52PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now all the btree, free space and transaction infrastructure is in
> place, we can finally add the code to insert reverse mappings to the
> rmap btree. Freeing will be done in a separate patch, so just the
> addition operation can be focussed on here.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_rmap.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 138 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> index 38a92a1..c1e5d23 100644
> --- a/fs/xfs/libxfs/xfs_rmap.c
> +++ b/fs/xfs/libxfs/xfs_rmap.c
> @@ -120,6 +120,18 @@ out_error:
>  	return error;
>  }
>  
> +/*
> + * When we allocate a new block, the first thing we do is add a reference to the
> + * extent in the rmap btree. This takes the form of a [agbno, length, owner]
> + * record.  Newly inserted extents should never overlap with an existing extent
> + * in the rmap btree. Hence the insertion is a relatively trivial exercise,
> + * involving checking for adjacent records and merging if the new extent is
> + * contiguous and has the same owner.
> + *
> + * Note that we have no MAXEXTLEN limits here when merging as the length in the
> + * record has the full 32 bits available and hence a single record can track the
> + * entire space in the AG.
> + */
>  int
>  xfs_rmap_alloc(
>  	struct xfs_trans	*tp,
> @@ -130,18 +142,143 @@ xfs_rmap_alloc(
>  	uint64_t		owner)
>  {
>  	struct xfs_mount	*mp = tp->t_mountp;
> +	struct xfs_btree_cur	*cur;
> +	struct xfs_rmap_irec	ltrec;
> +	struct xfs_rmap_irec	gtrec;
> +	int			have_gt;
>  	int			error = 0;
> +	int			i;
>  
>  	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
>  		return 0;
>  
>  	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, owner);
> -	if (1)
> +	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
> +
> +	/*
> +	 * For the initial lookup, look for an exact match or the left-adjacent
> +	 * record for our insertion point. This will also give us the record for
> +	 * start block contiguity tests.
> +	 */
> +	error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
> +	if (error)
> +		goto out_error;
> +	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
> +

Are we intentionally relying on the fact that a left record will always
exist due to the static metadata blocks that start the AG? If so, I'd
just suggest to note that in the comment above.

Brian

> +	error = xfs_rmap_get_rec(cur, &ltrec, &i);
> +	if (error)
>  		goto out_error;
> +	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
> +	//printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, ltrec 0x%x/0x%x/0x%llx\n",
> +	//		agno, bno, len, owner, ltrec.rm_startblock,
> +	//		ltrec.rm_blockcount, ltrec.rm_owner);
> +
> +	XFS_WANT_CORRUPTED_GOTO(mp,
> +		ltrec.rm_startblock + ltrec.rm_blockcount <= bno, out_error);
> +
> +	/*
> +	 * Increment the cursor to see if we have a right-adjacent record to our
> +	 * insertion point. This will give us the record for end block
> +	 * contiguity tests.
> +	 */
> +	error = xfs_btree_increment(cur, 0, &have_gt);
> +	if (error)
> +		goto out_error;
> +	if (have_gt) {
> +		error = xfs_rmap_get_rec(cur, &gtrec, &i);
> +		if (error)
> +			goto out_error;
> +		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
> +	//printk("rmalloc ag %d bno 0x%x/0x%x/0x%llx, gtrec 0x%x/0x%x/0x%llx\n",
> +	//		agno, bno, len, owner, gtrec.rm_startblock,
> +	//		gtrec.rm_blockcount, gtrec.rm_owner);
> +		XFS_WANT_CORRUPTED_GOTO(mp, bno + len <= gtrec.rm_startblock,
> +					out_error);
> +	} else {
> +		gtrec.rm_owner = XFS_RMAP_OWN_NULL;
> +	}
> +
> +	/*
> +	 * Note: cursor currently points one record to the right of ltrec, even
> +	 * if there is no record in the tree to the right.
> +	 */
> +	if (ltrec.rm_owner == owner &&
> +	    ltrec.rm_startblock + ltrec.rm_blockcount == bno) {
> +		/*
> +		 * left edge contiguous, merge into left record.
> +		 *
> +		 *       ltbno     ltlen
> +		 * orig:   |ooooooooo|
> +		 * adding:           |aaaaaaaaa|
> +		 * result: |rrrrrrrrrrrrrrrrrrr|
> +		 *                  bno       len
> +		 */
> +		//printk("add left\n");
> +		ltrec.rm_blockcount += len;
> +		if (gtrec.rm_owner == owner &&
> +		    bno + len == gtrec.rm_startblock) {
> +			//printk("add middle\n");
> +			/*
> +			 * right edge also contiguous, delete right record
> +			 * and merge into left record.
> +			 *
> +			 *       ltbno     ltlen    gtbno     gtlen
> +			 * orig:   |ooooooooo|         |ooooooooo|
> +			 * adding:           |aaaaaaaaa|
> +			 * result: |rrrrrrrrrrrrrrrrrrrrrrrrrrrrr|
> +			 */
> +			ltrec.rm_blockcount += gtrec.rm_blockcount;
> +			error = xfs_btree_delete(cur, &i);
> +			if (error)
> +				goto out_error;
> +			XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
> +		}
> +
> +		/* point the cursor back to the left record and update */
> +		error = xfs_btree_decrement(cur, 0, &have_gt);
> +		if (error)
> +			goto out_error;
> +		error = xfs_rmap_update(cur, &ltrec);
> +		if (error)
> +			goto out_error;
> +	} else if (gtrec.rm_owner == owner &&
> +		   bno + len == gtrec.rm_startblock) {
> +		/*
> +		 * right edge contiguous, merge into right record.
> +		 *
> +		 *                 gtbno     gtlen
> +		 * Orig:             |ooooooooo|
> +		 * adding: |aaaaaaaaa|
> +		 * Result: |rrrrrrrrrrrrrrrrrrr|
> +		 *        bno       len
> +		 */
> +		//printk("add right\n");
> +		gtrec.rm_startblock = bno;
> +		gtrec.rm_blockcount += len;
> +		error = xfs_rmap_update(cur, &gtrec);
> +		if (error)
> +			goto out_error;
> +	} else {
> +		//printk("add no match\n");
> +		/*
> +		 * no contiguous edge with identical owner, insert
> +		 * new record at current cursor position.
> +		 */
> +		cur->bc_rec.r.rm_startblock = bno;
> +		cur->bc_rec.r.rm_blockcount = len;
> +		cur->bc_rec.r.rm_owner = owner;
> +		error = xfs_btree_insert(cur, &i);
> +		if (error)
> +			goto out_error;
> +		XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
> +	}
> +
>  	trace_xfs_rmap_alloc_extent_done(mp, agno, bno, len, owner);
> +	xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
>  	return 0;
>  
>  out_error:
>  	trace_xfs_rmap_alloc_extent_error(mp, agno, bno, len, owner);
> +	xfs_btree_del_cursor(cur, XFS_BTREE_ERROR);
>  	return error;
>  }
> -- 
> 2.0.0

^ permalink raw reply	[flat|nested] 37+ messages in thread
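
The merge decision made by xfs_rmap_alloc() in the patch above can be
modelled without any btree code. The sketch below is a userspace
illustration only: the record struct, the enum and the function name are
hypothetical, and a missing right-hand neighbour is represented by a
NULL pointer rather than by the XFS_RMAP_OWN_NULL owner the patch uses.
As in the patch, a left record is assumed to always exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory record: [start block, length, owner]. */
struct rmap_rec {
	uint32_t	start;
	uint32_t	count;
	uint64_t	owner;
};

enum rmap_action {
	RMAP_MERGE_LEFT,	/* extend the left record */
	RMAP_MERGE_BOTH,	/* extend left, delete right */
	RMAP_MERGE_RIGHT,	/* extend the right record downwards */
	RMAP_INSERT,		/* no contiguous same-owner neighbour */
};

/*
 * Pick the action for adding [bno, len, owner] between the left record
 * 'lt' and the optional right record 'gt' (NULL if there is none).
 */
static enum rmap_action
rmap_alloc_action(const struct rmap_rec *lt, const struct rmap_rec *gt,
		  uint32_t bno, uint32_t len, uint64_t owner)
{
	bool	left, right;

	/* New extents must never overlap an existing record. */
	assert(lt->start + lt->count <= bno);
	assert(!gt || bno + len <= gt->start);

	left = lt->owner == owner && lt->start + lt->count == bno;
	right = gt && gt->owner == owner && bno + len == gt->start;

	if (left)
		return right ? RMAP_MERGE_BOTH : RMAP_MERGE_LEFT;
	if (right)
		return RMAP_MERGE_RIGHT;
	return RMAP_INSERT;
}
```

Keeping the decision separate from the cursor manipulation makes the
four cases easy to check in isolation; the patch interleaves them with
the btree update, delete and insert calls.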

* Re: [PATCH 13/20] xfs: rmap btree requires more reserved free space
  2015-06-25 16:41   ` Brian Foster
@ 2015-07-10  0:37     ` Dave Chinner
  0 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-07-10  0:37 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Jun 25, 2015 at 12:41:04PM -0400, Brian Foster wrote:
> On Wed, Jun 03, 2015 at 04:04:50PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The rmap btree is allocated from the AGFL, which means we have to
> > ensure ENOSPC is reported to userspace before we run out of free
> > space in each AG. The last allocation in an AG can cause a full
> > height rmap btree split, and that means we have to reserve at least
> > this many blocks *in each AG* to be placed on the AGFL at ENOSPC.
> > Update the various space calculation functions to handle this.
> > 
> > Also, because the macros are now executing conditional code and are called quite
> > frequently, convert them to functions that initialise variables in the struct
> > xfs_mount, use the new variables everywhere and document the calculations
> > better.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
.....
> > + * so we set the number of set-aside blocks to 4 + 4*agcount when not using rmap
> > + * btrees.
> > + *
> > + * When rmap btrees are active, we have to consider that using the last block in
> > + * the AG can cause a full height rmap btree split and we need enough blocks on
> > + * the AGFL to be able to handle this. That means we have, in addition to the
> > + * above consideration, another (2 * mp->m_ag_levels) - 1 blocks required to be
> > + * available to the free list.
> > + */
> 
> BTW, I think I get the 2 block per level log requirement in that a split
> requires logging the two blocks involved. Where does the 2nd block per
> level come in as an allocation requirement?

Yup, you are right, I've mixed the two conditions up. Split only
requires an extra block per level, plus a new root block. e.g. see
xfs_alloc_min_freelist()...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 37+ messages in thread
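
The reservation arithmetic discussed in this sub-thread can be compared
side by side in a small standalone sketch. The function names are
hypothetical. The first function is how C parses the rmap term in
xfs_alloc_set_aside() as posted; the second is the per-AG figure the
comment describes, assuming that is what was intended; the third follows
the correction above (one block per level plus a new root), capped at
the maximum height in the same way as the min_t() expression in
xfs_alloc_min_freelist().

```c
#include <assert.h>

/*
 * The rmap term as posted: multiplication binds tighter than subtraction,
 * so the "- 1" is applied once to the total, not once per AG.
 */
static unsigned int
rmap_extra_as_posted(unsigned int agcount, unsigned int maxlevels)
{
	return agcount * (2 * maxlevels) - 1;
}

/* The per-AG reading of the comment: (2 * maxlevels) - 1 blocks in each AG. */
static unsigned int
rmap_extra_per_ag(unsigned int agcount, unsigned int maxlevels)
{
	return agcount * ((2 * maxlevels) - 1);
}

/*
 * Blocks needed for a full split of a tree that is currently 'levels'
 * high: one new block per level plus a new root, never more than the
 * maximum height.
 */
static unsigned int
split_blocks(unsigned int levels, unsigned int maxlevels)
{
	unsigned int	need = levels + 1;

	assert(maxlevels > 0);
	return need < maxlevels ? need : maxlevels;
}
```

For 4 AGs and a maximum height of 5, the expression as posted yields 39
blocks while the per-AG reading yields 36, and a full split of a 2-level
tree needs only 3 blocks. How the final patch folds the corrected split
figure into the set-aside total is not shown in this thread, so the
sketch stops at the per-tree number.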

* Re: [PATCH 15/20] xfs: add an extent to the rmap btree
  2015-06-25 16:41   ` Brian Foster
@ 2015-07-10  0:39     ` Dave Chinner
  0 siblings, 0 replies; 37+ messages in thread
From: Dave Chinner @ 2015-07-10  0:39 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Thu, Jun 25, 2015 at 12:41:23PM -0400, Brian Foster wrote:
> On Wed, Jun 03, 2015 at 04:04:52PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Now all the btree, free space and transaction infrastructure is in
> > place, we can finally add the code to insert reverse mappings to the
> > rmap btree. Freeing will be done in a separate patch, so just the
> > addition operation can be focussed on here.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/libxfs/xfs_rmap.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 138 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> > index 38a92a1..c1e5d23 100644
> > --- a/fs/xfs/libxfs/xfs_rmap.c
> > +++ b/fs/xfs/libxfs/xfs_rmap.c
> > @@ -120,6 +120,18 @@ out_error:
> >  	return error;
> >  }
> >  
> > +/*
> > + * When we allocate a new block, the first thing we do is add a reference to the
> > + * extent in the rmap btree. This takes the form of a [agbno, length, owner]
> > + * record.  Newly inserted extents should never overlap with an existing extent
> > + * in the rmap btree. Hence the insertion is a relatively trivial exercise,
> > + * involving checking for adjacent records and merging if the new extent is
> > + * contiguous and has the same owner.
> > + *
> > + * Note that we have no MAXEXTLEN limits here when merging as the length in the
> > + * record has the full 32 bits available and hence a single record can track the
> > + * entire space in the AG.
> > + */
> >  int
> >  xfs_rmap_alloc(
> >  	struct xfs_trans	*tp,
> > @@ -130,18 +142,143 @@ xfs_rmap_alloc(
> >  	uint64_t		owner)
> >  {
> >  	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_btree_cur	*cur;
> > +	struct xfs_rmap_irec	ltrec;
> > +	struct xfs_rmap_irec	gtrec;
> > +	int			have_gt;
> >  	int			error = 0;
> > +	int			i;
> >  
> >  	if (!xfs_sb_version_hasrmapbt(&mp->m_sb))
> >  		return 0;
> >  
> >  	trace_xfs_rmap_alloc_extent(mp, agno, bno, len, owner);
> > -	if (1)
> > +	cur = xfs_rmapbt_init_cursor(mp, tp, agbp, agno);
> > +
> > +	/*
> > +	 * For the initial lookup, look for an exact match or the left-adjacent
> > +	 * record for our insertion point. This will also give us the record for
> > +	 * start block contiguity tests.
> > +	 */
> > +	error = xfs_rmap_lookup_le(cur, bno, len, owner, &i);
> > +	if (error)
> > +		goto out_error;
> > +	XFS_WANT_CORRUPTED_GOTO(mp, i == 1, out_error);
> > +
> 
> Are we intentionally relying on the fact that a left record will always
> exist due to the static metadata blocks that start the AG? If so, I'd
> just suggest to note that in the comment above.

Yes. I thought there was a note somewhere about this, but I'll add a
new one...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-07-10  0:39 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-03  6:04 [RFC PATCH 00/20] xfs: reverse mapping btree support Dave Chinner
2015-06-03  6:04 ` [PATCH 01/20] xfs: xfs_alloc_fix_freelist() can use incore perag structures Dave Chinner
2015-06-15 14:57   ` Brian Foster
2015-06-03  6:04 ` [PATCH 02/20] xfs: factor out free space extent length check Dave Chinner
2015-06-15 14:58   ` Brian Foster
2015-06-03  6:04 ` [PATCH 03/20] xfs: sanitise error handling in xfs_alloc_fix_freelist Dave Chinner
2015-06-15 14:58   ` Brian Foster
2015-06-15 21:51     ` Dave Chinner
2015-06-16 11:27       ` Brian Foster
2015-06-22  0:10         ` Dave Chinner
2015-06-03  6:04 ` [PATCH 04/20] xfs: clean up XFS_MIN_FREELIST macros Dave Chinner
2015-06-15 14:58   ` Brian Foster
2015-06-03  6:04 ` [PATCH 05/20] xfs: introduce rmap btree definitions Dave Chinner
2015-06-03  6:30   ` Darrick J. Wong
2015-06-03  6:34     ` Darrick J. Wong
2015-06-03  6:04 ` [PATCH 06/20] xfs: add rmap btree stats infrastructure Dave Chinner
2015-06-03  6:04 ` [PATCH 07/20] xfs: rmap btree add more reserved blocks Dave Chinner
2015-06-03  6:04 ` [PATCH 08/20] xfs: add owner field to extent allocation and freeing Dave Chinner
2015-06-24 19:09   ` Brian Foster
2015-06-24 21:13     ` Dave Chinner
2015-06-25 13:03       ` Brian Foster
2015-06-03  6:04 ` [PATCH 09/20] xfs: introduce rmap extent operation stubs Dave Chinner
2015-06-03  6:04 ` [PATCH 10/20] xfs: define the on-disk rmap btree format Dave Chinner
2015-06-03  6:04 ` [PATCH 11/20] xfs: add rmap btree growfs support Dave Chinner
2015-06-03  6:04 ` [PATCH 12/20] xfs: rmap btree transaction reservations Dave Chinner
2015-06-03  6:04 ` [PATCH 13/20] xfs: rmap btree requires more reserved free space Dave Chinner
2015-06-25 16:41   ` Brian Foster
2015-07-10  0:37     ` Dave Chinner
2015-06-03  6:04 ` [PATCH 14/20] xfs: add rmap btree operations Dave Chinner
2015-06-03  6:04 ` [PATCH 15/20] xfs: add an extent to the rmap btree Dave Chinner
2015-06-25 16:41   ` Brian Foster
2015-07-10  0:39     ` Dave Chinner
2015-06-03  6:04 ` [PATCH 16/20] xfs: remove an extent from " Dave Chinner
2015-06-03  6:04 ` [PATCH 17/20] xfs: add rmap btree geometry feature flag Dave Chinner
2015-06-03  6:04 ` [PATCH 18/20] xfs: add rmap btree block detection to log recovery Dave Chinner
2015-06-03  6:04 ` [PATCH 19/20] xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled Dave Chinner
2015-06-03  6:04 ` [PATCH 20/20] xfs: enable the rmap btree functionality Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.